docs: update roadmap and todo with recent changes

Username
2025-12-28 17:00:52 +01:00
parent e758ce7178
commit 9e2fc3e09d
2 changed files with 44 additions and 27 deletions


@@ -329,6 +329,25 @@ PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework design
 - [x] Content hash deduplication for stale proxy list detection
 - [x] stale_count reset when content hash changes
+### Distributed Workers (Done)
+- [x] Worker registration and heartbeat system
+- [x] /api/workers endpoint for worker status monitoring
+- [x] Tor connectivity check before workers claim work
+- [x] Worker test rate tracking with sliding window calculation
+- [x] Combined rate aggregation across all workers
+- [x] Dashboard worker cards showing per-worker stats
+### Dashboard Performance (Done)
+- [x] Keyboard shortcuts: r=refresh, 1-9=tabs, t=theme, p=pause
+- [x] Tab-aware chart rendering - skip expensive renders for hidden tabs
+- [x] Visibility API - pause polling when browser tab hidden
+- [x] Dark/muted-dark/light theme cycling
+### Proxy Validation Cache (Done)
+- [x] LRU cache for is_usable_proxy() using OrderedDict
+- [x] Thread-safe with lock for concurrent access
+- [x] Proper LRU eviction (move_to_end on hits, popitem oldest when full)
 ---
 ## Technical Debt
@@ -350,13 +369,22 @@ PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework design
 | ppf.py | Main URL harvester daemon | Active, cleaned |
 | proxywatchd.py | Proxy validation daemon | Active, enhanced |
 | scraper.py | Searx search integration | Active, cleaned |
-| fetch.py | HTTP fetching with proxy support | Active |
+| fetch.py | HTTP fetching with proxy support | Active, LRU cache |
 | dbs.py | Database schema and inserts | Active |
 | mysqlite.py | SQLite wrapper | Active |
 | rocksock.py | Socket/proxy abstraction (3rd party) | Stable |
 | http2.py | HTTP client implementation | Stable |
+| httpd.py | Web dashboard and REST API server | Active, enhanced |
 | config.py | Configuration management | Active |
 | comboparse.py | Config/arg parser framework | Stable, cleaned |
 | soup_parser.py | BeautifulSoup wrapper | Stable, cleaned |
 | misc.py | Utilities (timestamp, logging) | Stable, cleaned |
 | export.py | Proxy export CLI tool | Active |
+| engines.py | Search engine implementations | Active |
+| connection_pool.py | Tor connection pooling | Active |
+| network_stats.py | Network statistics tracking | Active |
+| dns.py | DNS resolution with caching | Active |
+| mitm.py | MITM certificate detection | Active |
+| job.py | Priority job queue | Active |
+| static/dashboard.js | Dashboard frontend logic | Active, enhanced |
+| static/dashboard.html | Dashboard HTML template | Active |
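
The roadmap entries above mention worker test rate tracking with a sliding window and a combined rate across all workers. The diff does not show how this is implemented, so the following is only a minimal sketch of the general technique. `SlidingWindowRate`, `combined_rate`, and the 60-second default window are hypothetical names and values, not taken from the PPF code.

```python
import time
from collections import deque


class SlidingWindowRate(object):
    """Hypothetical helper: tests per second over the last `window` seconds."""

    def __init__(self, window=60.0):
        self.window = float(window)
        self._events = deque()  # timestamps of recorded tests, oldest first

    def _trim(self, now):
        # Drop timestamps that have fallen out of the window.
        cutoff = now - self.window
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()

    def record(self, now=None):
        """Record one completed test at time `now` (defaults to the wall clock)."""
        now = time.time() if now is None else now
        self._events.append(now)
        self._trim(now)

    def rate(self, now=None):
        """Return the average number of tests per second inside the window."""
        now = time.time() if now is None else now
        self._trim(now)
        return len(self._events) / self.window


def combined_rate(trackers, now=None):
    """Aggregate the per-worker rates into one combined rate (assumed to be a plain sum)."""
    return sum(t.rate(now) for t in trackers)
```

In this sketch each worker would own one tracker and call `record()` after every proxy test, while the server sums the per-worker rates for the dashboard. Passing `now` explicitly keeps the helper deterministic in tests.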

TODO.md (41 lines changed)

@@ -288,27 +288,16 @@ assessed for real-world impact based on measured data.
 ---
-### [ ] 2. Proxy Validation Caching
+### [x] 2. Proxy Validation Caching
-**Current State:**
-- `is_usable_proxy()`: 174,620 calls, 1.79s total
-- `fetch.py:242 <genexpr>`: 3,403,165 calls, 3.66s total (proxy iteration)
-- Many repeated validations for same proxy strings
-**Proposed Change:**
-- Add LRU cache decorator to `is_usable_proxy()`
-- Cache size: 10,000 entries (covers typical working set)
-- TTL: None needed (IP validity doesn't change)
-**Assessment:**
-```
-Current cost: 5.5s per 30min = 11s/hour = 4.4min/day
-Potential saving: 50-70% cache hit rate = 2.7-3.8s per 30min = 5-8s/hour
-Effort: Very low (add @lru_cache decorator)
-Risk: None (pure function, deterministic output)
-```
-**Verdict:** LOW PRIORITY. Minimal gain for minimal effort. Do if convenient.
+**Completed.** Converted is_usable_proxy() cache to proper LRU with OrderedDict.
+**Implementation:**
+- fetch.py: Changed _proxy_valid_cache from dict to OrderedDict
+- Added thread-safe _proxy_valid_cache_lock
+- move_to_end() on cache hits to maintain LRU order
+- Evict oldest entries when cache reaches max size (10,000)
+- Proper LRU eviction instead of stopping inserts when full
 ---
@@ -414,15 +403,15 @@ SQLite's lightweight connections don't justify pooling complexity.
 │ Optimization                        │ Effort │ Risk   │ Savings │ Status    │
 ├─────────────────────────────────────┼────────┼────────┼─────────┼───────────┤
 │ 1. SQLite Query Batching            │ Low    │ Low    │ 20-34s/h│ DONE      │
-│ 2. Proxy Validation Caching         │ V.Low  │ None   │ 5-8s/h  │ Maybe     │
+│ 2. Proxy Validation Caching         │ V.Low  │ None   │ 5-8s/h  │ DONE      │
 │ 3. Regex Pre-compilation            │ Low    │ None   │ 5-8s/h  │ DONE      │
 │ 4. JSON Response Caching            │ Medium │ Low    │ 7-9s/h  │ Later     │
 │ 5. Object Pooling                   │ High   │ Medium │ 11-15s/h│ Skip      │
 │ 6. SQLite Connection Reuse          │ Medium │ Medium │ 0.3s/h  │ Skip      │
 └─────────────────────────────────────┴────────┴────────┴─────────┴───────────┘
-Completed: 1 (SQLite Batching), 3 (Regex Pre-compilation)
-Remaining: 2 (Proxy Caching - Maybe), 4 (JSON Caching - Later)
+Completed: 1 (SQLite Batching), 2 (Proxy Caching), 3 (Regex Pre-compilation)
+Remaining: 4 (JSON Caching - Later)
 Realized savings from completed optimizations:
 Per hour: 25-42 seconds saved
@@ -455,19 +444,20 @@ above target the remaining 31.3% of CPU-bound operations.
 - [ ] Lazy-load historical data (only when scrolled into view)
 - [ ] WebSocket option for push updates (reduce polling overhead)
 - [ ] Configurable refresh interval via URL param or localStorage
-- [ ] Disable auto-refresh when tab not visible (Page Visibility API)
+- [x] Pause polling when browser tab not visible (Page Visibility API)
+- [x] Skip chart rendering for inactive dashboard tabs (reduces CPU)
 ### [ ] Dashboard Feature Ideas
 **Low priority - consider when time permits:**
 - [x] Geographic map visualization - /map endpoint with Leaflet.js
-- [ ] Dark/light theme toggle
+- [x] Dark/light/muted theme toggle - t key cycles themes
 - [ ] Export stats as CSV/JSON from dashboard
 - [ ] Historical graphs (24h, 7d) using stats_history table
 - [ ] Per-ASN performance analysis
 - [ ] Alert thresholds (success rate < X%, MITM detected)
 - [ ] Mobile-responsive improvements
-- [ ] Keyboard shortcuts (r=refresh, t=toggle sections)
+- [x] Keyboard shortcuts (r=refresh, 1-9=tabs, t=theme, p=pause)
 ### [ ] Local JS Library Serving
@@ -870,4 +860,3 @@ Note: If podman ps shows empty but port is listening and health check passes,
 the service is running correctly despite metadata issues. See "Podman Container
 Metadata Disappears" section above.
``` ```
-- Dashboard: pause API polling for inactive tabs (only update persistent items + active tab)