PPF Roadmap
Architecture
Workers claim URLs, extract proxies, test them, report back.
Master verifies (SSL-only), serves API, coordinates distribution.
Constraints
- Python 2.7 runtime (container-based)
- Minimal external dependencies
- All traffic via Tor
Phase 1: Performance and Quality (current)
Profiling-driven optimizations and source pipeline hardening.
| Item | Status | Description |
|------|--------|-------------|
| Extraction short-circuits | done | Guard clauses in fetch.py extractors |
| Skip shutdown on failed sockets | done | Track _connected flag, skip shutdown on dead sockets |
| SQLite connection reuse (odin) | done | Per-greenlet cached handles via threading.local |
| Lazy-load ASN database | done | Defer ipasn.dat parsing to first lookup |
| Add more seed sources (100+) | done | Expanded to 120+ URLs with SOCKS5-specific sources |
| Protocol-aware source weighting | done | Dynamic SOCKS boost in claim_urls scoring |
| Sharpen error penalty in URL scoring | done | Reduce erroring URL claim frequency |

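
The SQLite connection reuse item can be illustrated with a minimal sketch. It assumes gevent monkey-patching is active, which makes `threading.local` greenlet-local; without it the cache is per-thread. The names `get_conn` and `_local` are hypothetical and are not taken from the PPF code base.

```python
import sqlite3
import threading

# One storage slot per thread (per greenlet if gevent has patched threading).
_local = threading.local()


def get_conn(path):
    """Return a cached SQLite handle for `path`, opening it on first use."""
    cache = getattr(_local, 'conns', None)
    if cache is None:
        # First call in this thread/greenlet: create its private cache.
        cache = _local.conns = {}
    conn = cache.get(path)
    if conn is None:
        conn = cache[path] = sqlite3.connect(path)
    return conn
```

Repeated calls to `get_conn` with the same path from the same greenlet return the same handle, so the connect cost is paid once per greenlet rather than once per query.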
Phase 2: Proxy Diversity and Consumer API
Address customer-reported quality gaps.
| Item | Status | Description |
|------|--------|-------------|
| ASN diversity scoring | pending | Deprioritize over-represented ASNs in testing |
| Graduated recheck intervals | pending | Fresh proxies rechecked more often than stale |
| API filters (proto/country/ASN/latency) | pending | Consumer-facing query parameters on /proxies |
| Latency-based ranking | pending | Expose latency percentiles per proxy |

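
Since the API filters are still pending, the following is only a sketch of how such filtering might behave once query parameters are parsed. The function name `filter_proxies` and the record keys `proto`, `country`, `asn`, and `latency` are assumptions for illustration, not the actual schema.

```python
def filter_proxies(proxies, proto=None, country=None, asn=None,
                   max_latency=None):
    """Return the proxies that match every filter that is not None.

    `proxies` is a list of dicts; the key names are hypothetical.
    """
    result = []
    for p in proxies:
        if proto is not None and p.get('proto') != proto:
            continue
        if country is not None and p.get('country') != country:
            continue
        if asn is not None and p.get('asn') != asn:
            continue
        if max_latency is not None:
            latency = p.get('latency')
            # Proxies with unknown latency are excluded when a cap is set.
            if latency is None or latency > max_latency:
                continue
        result.append(p)
    return result
```

Treating an omitted parameter as "no constraint" keeps the unfiltered /proxies response unchanged for existing consumers.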
Phase 3: Self-Expanding Source Pool
Worker-driven link discovery from productive pages.
| Item | Status | Description |
|------|--------|-------------|
| Link extraction from productive pages | pending | Parse HTML for links when page yields proxies |
| Report discovered URLs to master | pending | New endpoint for worker URL submissions |
| Conditional discovery | pending | Only extract links from confirmed-productive pages |

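
Conditional link discovery could look roughly like the sketch below, which uses only the standard library and runs on both Python 2.7 and Python 3. `LinkCollector` and `discover_links` are hypothetical names, and the rule "productive means at least one proxy was extracted" is an assumption.

```python
try:  # Python 3
    from html.parser import HTMLParser
    from urllib.parse import urljoin
except ImportError:  # Python 2.7
    from HTMLParser import HTMLParser
    from urlparse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from <a href=...> tags."""

    def __init__(self, base_url):
        HTMLParser.__init__(self)
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value:
                url = urljoin(self.base_url, value)
                # Keep web links only and drop duplicates.
                if (url.startswith(('http://', 'https://'))
                        and url not in self.links):
                    self.links.append(url)


def discover_links(base_url, html, proxies_found):
    """Parse links only when the page actually yielded proxies."""
    if proxies_found <= 0:
        return []
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links
```

Skipping the parse for unproductive pages avoids both the CPU cost and the risk of flooding the master with links from irrelevant sites.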
Phase 4: Long-Term
| Item | Status | Description |
|------|--------|-------------|
| Python 3 migration | deferred | Unblocks modern deps, security patches, pyasn native |
| Worker trust scoring | pending | Activate spot-check verification framework |
| Dynamic target pool | pending | Auto-discover and rotate validation targets |
| Geographic target spread | pending | Ensure targets span multiple regions |

Completed
| Item | Date | Description |
|------|------|-------------|
| Sharpen URL error penalty | 2026-02-22 | error0.5 cap 4.0 + stale0.2 cap 1.5 |
| SOCKS5 source expansion | 2026-02-22 | Added 10 new SOCKS5-specific sources |
| SQLite connection reuse | 2026-02-22 | Per-greenlet cached handles via threading.local |
| Lazy-load ASN database | 2026-02-22 | Deferred ipasn.dat to first lookup |
| Socket shutdown skip | 2026-02-22 | _connected flag, skip shutdown on dead sockets |
| Protocol-aware weighting | 2026-02-22 | Dynamic SOCKS boost in claim_urls scoring |
| Seed sources expanded | 2026-02-22 | 37 -> 120+ URLs |
| last_seen freshness fix | 2026-02-22 | Watchd updates last_seen on verification |
| Periodic re-seeding | 2026-02-22 | Reset errored sources every 6h |
| ASN enrichment | 2026-02-22 | Pure-Python ipasn.dat reader + backfill |
| URL pipeline stats | 2026-02-22 | /api/stats exposes source health metrics |
| Extraction short-circuits | 2026-02-22 | Guard clauses + precompiled table regexes |
| Target health tracking | prior | Cooldown-based health for all target pools |
| MITM field in proxy list | prior | Expose mitm boolean in JSON endpoints |
| V1 worker protocol removal | prior | Cleaned up legacy --worker code path |

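
One possible reading of the URL error penalty entry is a pair of capped linear terms: 0.5 per error up to 4.0, plus 0.2 per stale result up to 1.5. That reading is an assumption, as are the function and parameter names below. The sketch does not show how claim_urls actually combines the penalty with the rest of the score.

```python
def url_penalty(error_count, stale_count):
    """Capped linear penalty for a source URL (assumed interpretation)."""
    error_part = min(error_count * 0.5, 4.0)  # saturates at 8 errors
    stale_part = min(stale_count * 0.2, 1.5)  # saturates at 8 stale results
    return error_part + stale_part
```

Under this reading the total penalty never exceeds 5.5, so a persistently failing URL is claimed less often but is not excluded outright, which fits with the periodic re-seeding of errored sources.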
File Reference
| File | Purpose |
|------|---------|
| ppf.py | URL harvester, worker main loop |
| proxywatchd.py | Proxy validation daemon |
| fetch.py | HTTP fetching, proxy extraction |
| httpd.py | API server, worker coordination |
| dbs.py | Database schema, seed sources |
| config.py | Configuration management |
| rocksock.py | Socket/proxy abstraction |
| http2.py | HTTP client implementation |
| tools/ppf-deploy | Deployment wrapper |