docs: update roadmap and todo with recent changes
ROADMAP.md (30 changed lines)
@@ -329,6 +329,25 @@ PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework design
 - [x] Content hash deduplication for stale proxy list detection
 - [x] stale_count reset when content hash changes
 
+### Distributed Workers (Done)
+- [x] Worker registration and heartbeat system
+- [x] /api/workers endpoint for worker status monitoring
+- [x] Tor connectivity check before workers claim work
+- [x] Worker test rate tracking with sliding window calculation
+- [x] Combined rate aggregation across all workers
+- [x] Dashboard worker cards showing per-worker stats
+
+### Dashboard Performance (Done)
+- [x] Keyboard shortcuts: r=refresh, 1-9=tabs, t=theme, p=pause
+- [x] Tab-aware chart rendering - skip expensive renders for hidden tabs
+- [x] Visibility API - pause polling when browser tab hidden
+- [x] Dark/muted-dark/light theme cycling
+
+### Proxy Validation Cache (Done)
+- [x] LRU cache for is_usable_proxy() using OrderedDict
+- [x] Thread-safe with lock for concurrent access
+- [x] Proper LRU eviction (move_to_end on hits, popitem oldest when full)
+
 ---
 
 ## Technical Debt
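The Distributed Workers items combine three pieces: heartbeats, a sliding-window test rate for each worker, and a combined rate across all workers. Below is a minimal sketch of how these could fit together in a single in-process registry. It is written for Python 3. The class and method names, the 60 s window, and the 120 s heartbeat timeout are all hypothetical and are not taken from PPF's /api/workers code.

```python
import threading
import time
from collections import deque


class WorkerRegistry:
    """Hypothetical in-process registry of worker heartbeats and test rates."""

    def __init__(self, window_seconds=60.0, heartbeat_timeout=120.0):
        self.window = window_seconds
        self.heartbeat_timeout = heartbeat_timeout
        self._lock = threading.Lock()
        self._last_seen = {}  # worker_id -> time of last heartbeat
        self._events = {}     # worker_id -> deque of (timestamp, tests_completed)

    def heartbeat(self, worker_id, now=None):
        now = time.time() if now is None else now
        with self._lock:
            self._last_seen[worker_id] = now
            self._events.setdefault(worker_id, deque())

    def record_tests(self, worker_id, count, now=None):
        now = time.time() if now is None else now
        with self._lock:
            self._events.setdefault(worker_id, deque()).append((now, count))

    def _rate(self, events, now):
        # Slide the window: drop batches older than `window` seconds.
        cutoff = now - self.window
        while events and events[0][0] <= cutoff:
            events.popleft()
        return sum(count for _, count in events) / self.window

    def rates(self, now=None):
        """Tests per second for every worker whose heartbeat is still fresh."""
        now = time.time() if now is None else now
        with self._lock:
            return {
                worker_id: self._rate(self._events[worker_id], now)
                for worker_id, seen in self._last_seen.items()
                if now - seen <= self.heartbeat_timeout
            }

    def combined_rate(self, now=None):
        # Aggregate across all live workers by summing their individual rates.
        return sum(self.rates(now).values())
```

The registry stores `(timestamp, count)` batches rather than one entry per test. This keeps the deque short when workers report results in bulk. Workers whose last heartbeat is older than the timeout simply drop out of both the per-worker rates and the combined rate.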
@@ -350,13 +369,22 @@ PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework design
 | ppf.py | Main URL harvester daemon | Active, cleaned |
 | proxywatchd.py | Proxy validation daemon | Active, enhanced |
 | scraper.py | Searx search integration | Active, cleaned |
-| fetch.py | HTTP fetching with proxy support | Active |
+| fetch.py | HTTP fetching with proxy support | Active, LRU cache |
 | dbs.py | Database schema and inserts | Active |
 | mysqlite.py | SQLite wrapper | Active |
 | rocksock.py | Socket/proxy abstraction (3rd party) | Stable |
 | http2.py | HTTP client implementation | Stable |
 | httpd.py | Web dashboard and REST API server | Active, enhanced |
 | config.py | Configuration management | Active |
 | comboparse.py | Config/arg parser framework | Stable, cleaned |
 | soup_parser.py | BeautifulSoup wrapper | Stable, cleaned |
 | misc.py | Utilities (timestamp, logging) | Stable, cleaned |
+| export.py | Proxy export CLI tool | Active |
+| engines.py | Search engine implementations | Active |
+| connection_pool.py | Tor connection pooling | Active |
+| network_stats.py | Network statistics tracking | Active |
+| dns.py | DNS resolution with caching | Active |
+| mitm.py | MITM certificate detection | Active |
+| job.py | Priority job queue | Active |
+| static/dashboard.js | Dashboard frontend logic | Active, enhanced |
+| static/dashboard.html | Dashboard HTML template | Active |
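The first ROADMAP.md hunk keeps two context items: content-hash deduplication for detecting stale proxy lists, and resetting `stale_count` when the hash changes. Below is a minimal sketch of that bookkeeping for a single source URL. The function name, the dict-based state, and the choice of SHA-1 are assumptions for illustration, not PPF's actual implementation.

```python
import hashlib


def update_source_state(state, content):
    """Hypothetical per-URL bookkeeping for content-hash deduplication.

    `state` holds 'content_hash' and 'stale_count'; `content` is the fetched
    body as bytes. Returns True when the content changed since the last fetch.
    """
    digest = hashlib.sha1(content).hexdigest()
    if digest == state.get("content_hash"):
        # Identical bytes to the previous fetch: the list was not refreshed.
        state["stale_count"] = state.get("stale_count", 0) + 1
        return False
    # New content: remember its hash and reset the staleness counter.
    state["content_hash"] = digest
    state["stale_count"] = 0
    return True
```

Comparing a fixed-size digest avoids storing or re-parsing whole proxy lists just to notice that a source has not changed.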
TODO.md (41 changed lines)
@@ -288,27 +288,16 @@ assessed for real-world impact based on measured data.
 
 ---
 
-### [ ] 2. Proxy Validation Caching
+### [x] 2. Proxy Validation Caching
 
-**Current State:**
-- `is_usable_proxy()`: 174,620 calls, 1.79s total
-- `fetch.py:242 <genexpr>`: 3,403,165 calls, 3.66s total (proxy iteration)
-- Many repeated validations for same proxy strings
+**Completed.** Converted is_usable_proxy() cache to proper LRU with OrderedDict.
 
-**Proposed Change:**
-- Add LRU cache decorator to `is_usable_proxy()`
-- Cache size: 10,000 entries (covers typical working set)
-- TTL: None needed (IP validity doesn't change)
-
-**Assessment:**
-```
-Current cost: 5.5s per 30min = 11s/hour = 4.4min/day
-Potential saving: 50-70% cache hit rate = 2.7-3.8s per 30min = 5-8s/hour
-Effort: Very low (add @lru_cache decorator)
-Risk: None (pure function, deterministic output)
-```
-
-**Verdict:** LOW PRIORITY. Minimal gain for minimal effort. Do if convenient.
+**Implementation:**
+- fetch.py: Changed _proxy_valid_cache from dict to OrderedDict
+- Added thread-safe _proxy_valid_cache_lock
+- move_to_end() on cache hits to maintain LRU order
+- Evict oldest entries when cache reaches max size (10,000)
+- Proper LRU eviction instead of stopping inserts when full
 
 ---
 
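The Implementation list describes an `OrderedDict`-backed LRU cache guarded by a lock: hits call `move_to_end()`, and the oldest entries are evicted once the cache reaches 10,000 entries. Below is a minimal sketch of that pattern. It is written for Python 3, because `OrderedDict.move_to_end()` was added in Python 3.2. `ValidationCache` and `looks_like_proxy()` are hypothetical names, and `looks_like_proxy()` is only a placeholder for whatever checks `is_usable_proxy()` actually performs.

```python
import threading
from collections import OrderedDict


class ValidationCache:
    """Thread-safe LRU memo for a one-argument predicate (hypothetical helper)."""

    def __init__(self, func, maxsize=10000):
        self._func = func
        self._maxsize = maxsize
        self._data = OrderedDict()
        self._lock = threading.Lock()

    def __call__(self, key):
        with self._lock:
            if key in self._data:
                self._data.move_to_end(key)  # hit: mark as most recently used
                return self._data[key]
        value = self._func(key)  # miss: compute without holding the lock
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            while len(self._data) > self._maxsize:
                self._data.popitem(last=False)  # evict least recently used
        return value

    def __len__(self):
        with self._lock:
            return len(self._data)

    def __contains__(self, key):
        with self._lock:
            return key in self._data


def looks_like_proxy(proxy):
    """Placeholder check: 'host:port' with a port in 1..65535."""
    host, _, port = proxy.rpartition(":")
    return bool(host) and port.isdigit() and 0 < int(port) < 65536
```

The validation runs outside the lock so that a slow check does not block other threads. As a result, two threads may occasionally validate the same uncached proxy at the same time. For a deterministic check, this only costs a duplicate computation. With `maxsize=10000`, the sketch's eviction matches the limit quoted above.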
@@ -414,15 +403,15 @@ SQLite's lightweight connections don't justify pooling complexity.
 │ Optimization                        │ Effort │ Risk   │ Savings │ Status
 ├─────────────────────────────────────┼────────┼────────┼─────────┼───────────┤
 │ 1. SQLite Query Batching            │ Low    │ Low    │ 20-34s/h│ DONE
-│ 2. Proxy Validation Caching         │ V.Low  │ None   │ 5-8s/h  │ Maybe
+│ 2. Proxy Validation Caching         │ V.Low  │ None   │ 5-8s/h  │ DONE
 │ 3. Regex Pre-compilation            │ Low    │ None   │ 5-8s/h  │ DONE
 │ 4. JSON Response Caching            │ Medium │ Low    │ 7-9s/h  │ Later
 │ 5. Object Pooling                   │ High   │ Medium │ 11-15s/h│ Skip
 │ 6. SQLite Connection Reuse          │ Medium │ Medium │ 0.3s/h  │ Skip
 └─────────────────────────────────────┴────────┴────────┴─────────┴───────────┘
 
-Completed: 1 (SQLite Batching), 3 (Regex Pre-compilation)
-Remaining: 2 (Proxy Caching - Maybe), 4 (JSON Caching - Later)
+Completed: 1 (SQLite Batching), 2 (Proxy Caching), 3 (Regex Pre-compilation)
+Remaining: 4 (JSON Caching - Later)
 
 Realized savings from completed optimizations:
 Per hour: 25-42 seconds saved
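The realized-savings line is a sum over the Savings column of the summary table. The short sketch below makes that arithmetic explicit. The ranges are copied from the table, and treating row 6's single 0.3s/h value as a zero-width range is an assumption. On these estimates, rows 1 and 3 reproduce the 25-42 s/h figure. Rows 1, 2 and 3, which are the three rows marked DONE after this change, sum to 30-50 s/h.

```python
# Savings estimates (seconds per hour) from the summary table, keyed by row number.
SAVINGS_PER_HOUR = {
    1: (20, 34),    # SQLite Query Batching
    2: (5, 8),      # Proxy Validation Caching
    3: (5, 8),      # Regex Pre-compilation
    4: (7, 9),      # JSON Response Caching
    5: (11, 15),    # Object Pooling
    6: (0.3, 0.3),  # SQLite Connection Reuse (single value in the table)
}


def total_savings(rows):
    """Sum the low and high ends of the Savings ranges for the given rows."""
    low = sum(SAVINGS_PER_HOUR[r][0] for r in rows)
    high = sum(SAVINGS_PER_HOUR[r][1] for r in rows)
    return low, high


print(total_savings([1, 3]))     # (25, 42)
print(total_savings([1, 2, 3]))  # (30, 50)
```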
@@ -455,19 +444,20 @@ above target the remaining 31.3% of CPU-bound operations.
 - [ ] Lazy-load historical data (only when scrolled into view)
 - [ ] WebSocket option for push updates (reduce polling overhead)
 - [ ] Configurable refresh interval via URL param or localStorage
-- [ ] Disable auto-refresh when tab not visible (Page Visibility API)
+- [x] Pause polling when browser tab not visible (Page Visibility API)
+- [x] Skip chart rendering for inactive dashboard tabs (reduces CPU)
 
 ### [ ] Dashboard Feature Ideas
 
 **Low priority - consider when time permits:**
 - [x] Geographic map visualization - /map endpoint with Leaflet.js
-- [ ] Dark/light theme toggle
+- [x] Dark/light/muted theme toggle - t key cycles themes
 - [ ] Export stats as CSV/JSON from dashboard
 - [ ] Historical graphs (24h, 7d) using stats_history table
 - [ ] Per-ASN performance analysis
 - [ ] Alert thresholds (success rate < X%, MITM detected)
 - [ ] Mobile-responsive improvements
-- [ ] Keyboard shortcuts (r=refresh, t=toggle sections)
+- [x] Keyboard shortcuts (r=refresh, 1-9=tabs, t=theme, p=pause)
 
 ### [ ] Local JS Library Serving
 
@@ -870,4 +860,3 @@ Note: If podman ps shows empty but port is listening and health check passes,
 the service is running correctly despite metadata issues. See "Podman Container
 Metadata Disappears" section above.
 ```
-- Dashboard: pause API polling for inactive tabs (only update persistent items + active tab)