diff --git a/ROADMAP.md b/ROADMAP.md
index e8e4b3a..b25a995 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -329,6 +329,25 @@ PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework design
 - [x] Content hash deduplication for stale proxy list detection
 - [x] stale_count reset when content hash changes
 
+### Distributed Workers (Done)
+- [x] Worker registration and heartbeat system
+- [x] /api/workers endpoint for worker status monitoring
+- [x] Tor connectivity check before workers claim work
+- [x] Worker test rate tracking with sliding window calculation
+- [x] Combined rate aggregation across all workers
+- [x] Dashboard worker cards showing per-worker stats
+
+### Dashboard Performance (Done)
+- [x] Keyboard shortcuts: r=refresh, 1-9=tabs, t=theme, p=pause
+- [x] Tab-aware chart rendering - skip expensive renders for hidden tabs
+- [x] Visibility API - pause polling when browser tab hidden
+- [x] Dark/muted-dark/light theme cycling
+
+### Proxy Validation Cache (Done)
+- [x] LRU cache for is_usable_proxy() using OrderedDict
+- [x] Thread-safe with lock for concurrent access
+- [x] Proper LRU eviction (move_to_end on hits, popitem oldest when full)
+
 ---
 
 ## Technical Debt
@@ -350,13 +369,22 @@ PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework design
 | ppf.py | Main URL harvester daemon | Active, cleaned |
 | proxywatchd.py | Proxy validation daemon | Active, enhanced |
 | scraper.py | Searx search integration | Active, cleaned |
-| fetch.py | HTTP fetching with proxy support | Active |
+| fetch.py | HTTP fetching with proxy support | Active, LRU cache |
 | dbs.py | Database schema and inserts | Active |
 | mysqlite.py | SQLite wrapper | Active |
 | rocksock.py | Socket/proxy abstraction (3rd party) | Stable |
 | http2.py | HTTP client implementation | Stable |
+| httpd.py | Web dashboard and REST API server | Active, enhanced |
 | config.py | Configuration management | Active |
 | comboparse.py | Config/arg parser framework | Stable, cleaned |
 | soup_parser.py | BeautifulSoup wrapper | Stable, cleaned |
 | misc.py | Utilities (timestamp, logging) | Stable, cleaned |
 | export.py | Proxy export CLI tool | Active |
+| engines.py | Search engine implementations | Active |
+| connection_pool.py | Tor connection pooling | Active |
+| network_stats.py | Network statistics tracking | Active |
+| dns.py | DNS resolution with caching | Active |
+| mitm.py | MITM certificate detection | Active |
+| job.py | Priority job queue | Active |
+| static/dashboard.js | Dashboard frontend logic | Active, enhanced |
+| static/dashboard.html | Dashboard HTML template | Active |
diff --git a/TODO.md b/TODO.md
index 8ec2644..393e58d 100644
--- a/TODO.md
+++ b/TODO.md
@@ -288,27 +288,16 @@ assessed for real-world impact based on measured data.
 
 ---
 
-### [ ] 2. Proxy Validation Caching
+### [x] 2. Proxy Validation Caching
 
-**Current State:**
-- `is_usable_proxy()`: 174,620 calls, 1.79s total
-- `fetch.py:242 `: 3,403,165 calls, 3.66s total (proxy iteration)
-- Many repeated validations for same proxy strings
+**Completed.** Converted is_usable_proxy() cache to proper LRU with OrderedDict.
 
-**Proposed Change:**
-- Add LRU cache decorator to `is_usable_proxy()`
-- Cache size: 10,000 entries (covers typical working set)
-- TTL: None needed (IP validity doesn't change)
-
-**Assessment:**
-```
-Current cost: 5.5s per 30min = 11s/hour = 4.4min/day
-Potential saving: 50-70% cache hit rate = 2.7-3.8s per 30min = 5-8s/hour
-Effort: Very low (add @lru_cache decorator)
-Risk: None (pure function, deterministic output)
-```
-
-**Verdict:** LOW PRIORITY. Minimal gain for minimal effort. Do if convenient.
+**Implementation:**
+- fetch.py: Changed _proxy_valid_cache from dict to OrderedDict
+- Added thread-safe _proxy_valid_cache_lock
+- move_to_end() on cache hits to maintain LRU order
+- Evict oldest entries when cache reaches max size (10,000)
+- Proper LRU eviction instead of stopping inserts when full
 
 ---
 
@@ -414,15 +403,15 @@ SQLite's lightweight connections don't justify pooling complexity.
 │ Optimization                        │ Effort │ Risk   │ Savings │ Status
 ├─────────────────────────────────────┼────────┼────────┼─────────┼───────────┤
 │ 1. SQLite Query Batching            │ Low    │ Low    │ 20-34s/h│ DONE
-│ 2. Proxy Validation Caching         │ V.Low  │ None   │ 5-8s/h  │ Maybe
+│ 2. Proxy Validation Caching         │ V.Low  │ None   │ 5-8s/h  │ DONE
 │ 3. Regex Pre-compilation            │ Low    │ None   │ 5-8s/h  │ DONE
 │ 4. JSON Response Caching            │ Medium │ Low    │ 7-9s/h  │ Later
 │ 5. Object Pooling                   │ High   │ Medium │ 11-15s/h│ Skip
 │ 6. SQLite Connection Reuse          │ Medium │ Medium │ 0.3s/h  │ Skip
 └─────────────────────────────────────┴────────┴────────┴─────────┴───────────┘
 
-Completed: 1 (SQLite Batching), 3 (Regex Pre-compilation)
-Remaining: 2 (Proxy Caching - Maybe), 4 (JSON Caching - Later)
+Completed: 1 (SQLite Batching), 2 (Proxy Caching), 3 (Regex Pre-compilation)
+Remaining: 4 (JSON Caching - Later)
 
 Realized savings from completed optimizations:
 Per hour: 25-42 seconds saved
@@ -455,19 +444,20 @@ above target the remaining 31.3% of CPU-bound operations.
 - [ ] Lazy-load historical data (only when scrolled into view)
 - [ ] WebSocket option for push updates (reduce polling overhead)
 - [ ] Configurable refresh interval via URL param or localStorage
-- [ ] Disable auto-refresh when tab not visible (Page Visibility API)
+- [x] Pause polling when browser tab not visible (Page Visibility API)
+- [x] Skip chart rendering for inactive dashboard tabs (reduces CPU)
 
 ### [ ] Dashboard Feature Ideas
 
 **Low priority - consider when time permits:**
 - [x] Geographic map visualization - /map endpoint with Leaflet.js
-- [ ] Dark/light theme toggle
+- [x] Dark/light/muted theme toggle - t key cycles themes
 - [ ] Export stats as CSV/JSON from dashboard
 - [ ] Historical graphs (24h, 7d) using stats_history table
 - [ ] Per-ASN performance analysis
 - [ ] Alert thresholds (success rate < X%, MITM detected)
 - [ ] Mobile-responsive improvements
-- [ ] Keyboard shortcuts (r=refresh, t=toggle sections)
+- [x] Keyboard shortcuts (r=refresh, 1-9=tabs, t=theme, p=pause)
 
 ### [ ] Local JS Library Serving
 
@@ -870,4 +860,3 @@ Note: If podman ps shows empty but port is listening and health check passes,
 the service is running correctly despite metadata issues.
 See "Podman Container Metadata Disappears" section above.
 ```
-- Dashboard: pause API polling for inactive tabs (only update persistent items + active tab)