docs: update roadmap and todo with recent changes

Username
2025-12-28 17:00:52 +01:00
parent e758ce7178
commit 9e2fc3e09d
2 changed files with 44 additions and 27 deletions


@@ -329,6 +329,25 @@ PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework design
- [x] Content hash deduplication for stale proxy list detection
- [x] stale_count reset when content hash changes
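
The two content-hash items above can be illustrated with a minimal sketch. It is not the project's code: `update_stale_count`, the `source` dict, its `content_hash` key, and the choice of SHA-256 are assumptions made for illustration; only `stale_count` is named in the roadmap.

```python
import hashlib


def update_stale_count(source, content):
    """Track how many consecutive fetches returned identical content.

    `source` is a hypothetical per-URL record (a plain dict here);
    `content` is the fetched proxy list as bytes.
    """
    digest = hashlib.sha256(content).hexdigest()
    if source.get("content_hash") == digest:
        # Same bytes as last time: the list looks stale.
        source["stale_count"] = source.get("stale_count", 0) + 1
    else:
        # Content changed: remember the new hash and reset the counter.
        source["content_hash"] = digest
        source["stale_count"] = 0
    return source["stale_count"]
```

A caller could then skip or deprioritize a source once its `stale_count` passes some threshold; the threshold itself is not specified here.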
### Distributed Workers (Done)
- [x] Worker registration and heartbeat system
- [x] /api/workers endpoint for worker status monitoring
- [x] Tor connectivity check before workers claim work
- [x] Worker test rate tracking with sliding window calculation
- [x] Combined rate aggregation across all workers
- [x] Dashboard worker cards showing per-worker stats
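
As a rough illustration of the sliding-window rate and the combined rate listed above, the sketch below keeps timestamped test counts per worker and sums the per-worker rates. The class name `SlidingWindowRate`, the helper `combined_rate`, and the 60-second default window are assumptions, not taken from the PPF sources.

```python
import time
from collections import deque


class SlidingWindowRate(object):
    """Tests per second over the last `window_seconds` (hypothetical helper)."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, count) pairs, oldest first

    def _prune(self, now):
        # Drop events that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def record(self, count=1, now=None):
        now = time.time() if now is None else now
        self._events.append((now, count))
        self._prune(now)

    def rate(self, now=None):
        now = time.time() if now is None else now
        self._prune(now)
        total = sum(count for _, count in self._events)
        return total / float(self.window_seconds)


def combined_rate(trackers, now=None):
    """Aggregate rate across all workers: the sum of per-worker rates."""
    return sum(t.rate(now) for t in trackers)
```

Passing `now` explicitly keeps the sketch deterministic in tests; in a running worker the default `time.time()` would be used.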
### Dashboard Performance (Done)
- [x] Keyboard shortcuts: r=refresh, 1-9=tabs, t=theme, p=pause
- [x] Tab-aware chart rendering - skip expensive renders for hidden tabs
- [x] Visibility API - pause polling when browser tab hidden
- [x] Dark/muted-dark/light theme cycling
### Proxy Validation Cache (Done)
- [x] LRU cache for is_usable_proxy() using OrderedDict
- [x] Thread-safe with lock for concurrent access
- [x] Proper LRU eviction (move_to_end on hits, popitem oldest when full)
---
## Technical Debt
@@ -350,13 +369,22 @@ PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework design
| ppf.py | Main URL harvester daemon | Active, cleaned |
| proxywatchd.py | Proxy validation daemon | Active, enhanced |
| scraper.py | Searx search integration | Active, cleaned |
| fetch.py | HTTP fetching with proxy support | Active |
| fetch.py | HTTP fetching with proxy support | Active, LRU cache |
| dbs.py | Database schema and inserts | Active |
| mysqlite.py | SQLite wrapper | Active |
| rocksock.py | Socket/proxy abstraction (3rd party) | Stable |
| http2.py | HTTP client implementation | Stable |
| httpd.py | Web dashboard and REST API server | Active, enhanced |
| config.py | Configuration management | Active |
| comboparse.py | Config/arg parser framework | Stable, cleaned |
| soup_parser.py | BeautifulSoup wrapper | Stable, cleaned |
| misc.py | Utilities (timestamp, logging) | Stable, cleaned |
| export.py | Proxy export CLI tool | Active |
| engines.py | Search engine implementations | Active |
| connection_pool.py | Tor connection pooling | Active |
| network_stats.py | Network statistics tracking | Active |
| dns.py | DNS resolution with caching | Active |
| mitm.py | MITM certificate detection | Active |
| job.py | Priority job queue | Active |
| static/dashboard.js | Dashboard frontend logic | Active, enhanced |
| static/dashboard.html | Dashboard HTML template | Active |

41
TODO.md

@@ -288,27 +288,16 @@ assessed for real-world impact based on measured data.
---
### [ ] 2. Proxy Validation Caching
### [x] 2. Proxy Validation Caching
**Current State:**
- `is_usable_proxy()`: 174,620 calls, 1.79s total
- `fetch.py:242 <genexpr>`: 3,403,165 calls, 3.66s total (proxy iteration)
- Many repeated validations for same proxy strings
**Completed.** Converted is_usable_proxy() cache to proper LRU with OrderedDict.
**Proposed Change:**
- Add LRU cache decorator to `is_usable_proxy()`
- Cache size: 10,000 entries (covers typical working set)
- TTL: None needed (IP validity doesn't change)
**Assessment:**
```
Current cost: 5.5s per 30min = 11s/hour = 4.4min/day
Potential saving: 50-70% cache hit rate = 2.7-3.8s per 30min = 5-8s/hour
Effort: Very low (add @lru_cache decorator)
Risk: None (pure function, deterministic output)
```
**Verdict:** LOW PRIORITY. Minimal gain for minimal effort. Do if convenient.
**Implementation:**
- fetch.py: Changed _proxy_valid_cache from dict to OrderedDict
- Added thread-safe _proxy_valid_cache_lock
- move_to_end() on cache hits to maintain LRU order
- Evict oldest entries when cache reaches max size (10,000)
- Proper LRU eviction instead of stopping inserts when full
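
The implementation notes above can be sketched as follows. This is an illustrative sketch, not the code in fetch.py: the class `LRUValidationCache` and its `get`/`put` methods are hypothetical, and the sketch assumes a Python 3 interpreter because `OrderedDict.move_to_end()` was added in Python 3.2. On Python 2.7, popping the key and reinserting it gives the same reordering.

```python
import threading
from collections import OrderedDict


class LRUValidationCache(object):
    """Thread-safe LRU cache for validation results (hypothetical wrapper)."""

    def __init__(self, max_size=10000):
        self.max_size = max_size
        self._cache = OrderedDict()      # oldest entry first, newest last
        self._lock = threading.Lock()

    def get(self, key):
        """Return the cached value, or None on a miss."""
        with self._lock:
            if key not in self._cache:
                return None
            # A hit makes the entry the most recently used.
            self._cache.move_to_end(key)
            return self._cache[key]

    def put(self, key, value):
        with self._lock:
            if key in self._cache:
                self._cache.move_to_end(key)
            self._cache[key] = value
            # Evict least recently used entries instead of refusing inserts.
            while len(self._cache) > self.max_size:
                self._cache.popitem(last=False)
```

Since cached values here are booleans, `None` is safe as a miss marker. A caller would check `get(proxy)` first and fall back to the real validation plus `put(proxy, result)` on a miss.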
---
@@ -414,15 +403,15 @@ SQLite's lightweight connections don't justify pooling complexity.
│ Optimization │ Effort │ Risk │ Savings │ Status
├─────────────────────────────────────┼────────┼────────┼─────────┼───────────┤
│ 1. SQLite Query Batching │ Low │ Low │ 20-34s/h│ DONE
│ 2. Proxy Validation Caching │ V.Low │ None │ 5-8s/h │ Maybe
│ 2. Proxy Validation Caching │ V.Low │ None │ 5-8s/h │ DONE
│ 3. Regex Pre-compilation │ Low │ None │ 5-8s/h │ DONE
│ 4. JSON Response Caching │ Medium │ Low │ 7-9s/h │ Later
│ 5. Object Pooling │ High │ Medium │ 11-15s/h│ Skip
│ 6. SQLite Connection Reuse │ Medium │ Medium │ 0.3s/h │ Skip
└─────────────────────────────────────┴────────┴────────┴─────────┴───────────┘
Completed: 1 (SQLite Batching), 3 (Regex Pre-compilation)
Remaining: 2 (Proxy Caching - Maybe), 4 (JSON Caching - Later)
Completed: 1 (SQLite Batching), 2 (Proxy Caching), 3 (Regex Pre-compilation)
Remaining: 4 (JSON Caching - Later)
Realized savings from completed optimizations:
Per hour: 25-42 seconds saved
@@ -455,19 +444,20 @@ above target the remaining 31.3% of CPU-bound operations.
- [ ] Lazy-load historical data (only when scrolled into view)
- [ ] WebSocket option for push updates (reduce polling overhead)
- [ ] Configurable refresh interval via URL param or localStorage
- [ ] Disable auto-refresh when tab not visible (Page Visibility API)
- [x] Pause polling when browser tab not visible (Page Visibility API)
- [x] Skip chart rendering for inactive dashboard tabs (reduces CPU)
### [ ] Dashboard Feature Ideas
**Low priority - consider when time permits:**
- [x] Geographic map visualization - /map endpoint with Leaflet.js
- [ ] Dark/light theme toggle
- [x] Dark/light/muted theme toggle - t key cycles themes
- [ ] Export stats as CSV/JSON from dashboard
- [ ] Historical graphs (24h, 7d) using stats_history table
- [ ] Per-ASN performance analysis
- [ ] Alert thresholds (success rate < X%, MITM detected)
- [ ] Mobile-responsive improvements
- [ ] Keyboard shortcuts (r=refresh, t=toggle sections)
- [x] Keyboard shortcuts (r=refresh, 1-9=tabs, t=theme, p=pause)
### [ ] Local JS Library Serving
@@ -870,4 +860,3 @@ Note: If podman ps shows empty but port is listening and health check passes,
the service is running correctly despite metadata issues. See "Podman Container
Metadata Disappears" section above.
```
- Dashboard: pause API polling for inactive tabs (only update persistent items + active tab)