PPF Roadmap
Architecture
                ┌──────────────────────────────────────────┐
                │ Odin (Master)                            │
                │ httpd.py ─ API + SSL-only verification   │
                │ proxywatchd.py ─ proxy recheck daemon    │
                │ SQLite: proxies.db, websites.db          │
                └──────────┬───────────────────────────────┘
                           │ WireGuard (10.200.1.0/24)
          ┌────────────────┼────────────────┐
          v                v                v
    ┌───────────┐    ┌───────────┐    ┌───────────┐
    │ cassius   │    │ edge      │    │ sentinel  │
    │ Worker    │    │ Worker    │    │ Worker    │
    │ ppf.py    │    │ ppf.py    │    │ ppf.py    │
    └───────────┘    └───────────┘    └───────────┘
Workers claim URLs, extract proxies, test them, and report back. The master verifies results (SSL-only), serves the API, and coordinates distribution.
Constraints
- Python 2.7 runtime (container-based)
- Minimal external dependencies
- All traffic via Tor
Phase 1: Performance and Quality (current)
Profiling-driven optimizations and source pipeline hardening.
| Item | Status | Description |
|---|---|---|
| Extraction short-circuits | done | Guard clauses in fetch.py extractors |
| Skip shutdown on failed sockets | pending | Avoid 39s/session wasted on dead connections |
| SQLite connection reuse (odin) | pending | Cache per-greenlet, eliminate 2.7k opens/session |
| Lazy-load ASN database | pending | Defer 3.6s startup cost to first lookup |
| Add more seed sources (100+) | pending | Expand beyond 37 hardcoded URLs |
| Protocol-aware source weighting | pending | Prioritize SOCKS5-yielding sources |
Phase 2: Proxy Diversity and Consumer API
Address customer-reported quality gaps.
| Item | Status | Description |
|---|---|---|
| ASN diversity scoring | pending | Deprioritize over-represented ASNs in testing |
| Graduated recheck intervals | pending | Fresh proxies rechecked more often than stale |
| API filters (proto/country/ASN/latency) | pending | Consumer-facing query parameters on /proxies |
| Latency-based ranking | pending | Expose latency percentiles per proxy |
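
One way to picture "graduated recheck intervals" is a small tier table keyed on how long ago a proxy was last seen working. The tier boundaries, the interval values, and the names `RECHECK_TIERS`, `recheck_interval`, and `is_due` below are all assumptions made for illustration; the roadmap does not specify them.

```python
# Hypothetical tiers: (maximum age in seconds, recheck interval in seconds).
# "Age" here means time since the proxy was last seen working.
RECHECK_TIERS = [
    (3600, 300),          # seen within the last hour: recheck every 5 minutes
    (86400, 1800),        # seen within the last day: recheck every 30 minutes
    (7 * 86400, 21600),   # seen within the last week: recheck every 6 hours
]
DEFAULT_INTERVAL = 86400  # older than every tier: recheck once a day


def recheck_interval(age_seconds):
    """Return the recheck interval for a proxy last seen `age_seconds` ago."""
    for max_age, interval in RECHECK_TIERS:
        if age_seconds <= max_age:
            return interval
    return DEFAULT_INTERVAL


def is_due(now, last_seen, last_checked):
    """True if the proxy's tier interval has elapsed since its last check."""
    return now - last_checked >= recheck_interval(now - last_seen)
```

A recheck daemon could call `is_due(time.time(), last_seen, last_checked)` per row when building its work queue, so fresh proxies are revisited often while stale ones cost little.
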
Phase 3: Self-Expanding Source Pool
Worker-driven link discovery from productive pages.
| Item | Status | Description |
|---|---|---|
| Link extraction from productive pages | pending | Parse HTML for links when page yields proxies |
| Report discovered URLs to master | pending | New endpoint for worker URL submissions |
| Conditional discovery | pending | Only extract links from confirmed-productive pages |
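
The Phase 3 items can be sketched together: parse links only when a page has already yielded proxies, then hand the result to whatever reporting call the master ends up exposing. This is a minimal sketch using only the standard library, written to run on both Python 2.7 and 3; `LinkCollector`, `discover_links`, and the `min_proxies` threshold are hypothetical names, and the reporting endpoint itself is not shown because the roadmap does not define it.

```python
try:
    from html.parser import HTMLParser   # Python 3
    from urllib.parse import urljoin
except ImportError:
    from HTMLParser import HTMLParser    # Python 2.7
    from urlparse import urljoin


class LinkCollector(HTMLParser):
    """Collects the raw href value of every <a> tag fed to the parser."""

    def __init__(self):
        # Explicit base call: HTMLParser is an old-style class on Python 2.7.
        HTMLParser.__init__(self)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def discover_links(base_url, html, proxies_found, min_proxies=1):
    """Return absolute http(s) links, but only for pages that yielded proxies."""
    if proxies_found < min_proxies:
        return []  # conditional discovery: skip unproductive pages entirely
    parser = LinkCollector()
    parser.feed(html)
    seen = set()
    links = []
    for href in parser.hrefs:
        url = urljoin(base_url, href)
        if url.startswith(("http://", "https://")) and url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

Skipping the parse for unproductive pages keeps the extra cost proportional to the number of pages that actually contain proxies.
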
Phase 4: Long-Term
| Item | Status | Description |
|---|---|---|
| Python 3 migration | deferred | Unblocks modern dependencies, security patches, and native pyasn |
| Worker trust scoring | pending | Activate spot-check verification framework |
| Dynamic target pool | pending | Auto-discover and rotate validation targets |
| Geographic target spread | pending | Ensure targets span multiple regions |
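
As a rough picture of what "worker trust scoring" with spot checks might involve, the sketch below has the master re-test a random sample of a worker's reported proxies and track how often the two agree. It does not describe the existing spot-check framework: the `WorkerTrust` class, the `pick_spot_checks` function, the 0.5 starting prior, and the 10% sampling fraction are all assumptions.

```python
import random


class WorkerTrust(object):
    """Running agreement rate between a worker's reports and master spot checks."""

    def __init__(self, prior_agree=1, prior_total=2):
        # Hypothetical prior: every worker starts at 0.5 until evidence arrives.
        self.agree = prior_agree
        self.total = prior_total

    def record(self, worker_said_alive, master_says_alive):
        """Fold one spot-check result into the running counts."""
        self.total += 1
        if worker_said_alive == master_says_alive:
            self.agree += 1

    @property
    def score(self):
        return float(self.agree) / self.total


def pick_spot_checks(reported, fraction=0.1, rng=random):
    """Choose a random subset of reported proxies for the master to re-test."""
    reported = list(reported)
    if not reported:
        return []
    count = max(1, int(len(reported) * fraction))
    return rng.sample(reported, count)
```

A master could weight or discard reports from workers whose `score` falls below some threshold; choosing that threshold is left open here.
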
Completed
| Item | Date | Description |
|---|---|---|
| last_seen freshness fix | 2026-02-22 | Watchd updates last_seen on verification |
| Periodic re-seeding | 2026-02-22 | Reset errored sources every 6h |
| ASN enrichment | 2026-02-22 | Pure-Python ipasn.dat reader + backfill |
| URL pipeline stats | 2026-02-22 | /api/stats exposes source health metrics |
| Extraction short-circuits | 2026-02-22 | Guard clauses + precompiled table regexes |
| Target health tracking | prior | Cooldown-based health for all target pools |
| MITM field in proxy list | prior | Expose mitm boolean in JSON endpoints |
| V1 worker protocol removal | prior | Cleaned up legacy --worker code path |
File Reference
| File | Purpose |
|---|---|
| ppf.py | URL harvester, worker main loop |
| proxywatchd.py | Proxy validation daemon |
| fetch.py | HTTP fetching, proxy extraction |
| httpd.py | API server, worker coordination |
| dbs.py | Database schema, seed sources |
| config.py | Configuration management |
| rocksock.py | Socket/proxy abstraction |
| http2.py | HTTP client implementation |
| tools/ppf-deploy | Deployment wrapper |