# PPF Roadmap

## Architecture

```
            ┌──────────────────────────────────────────┐
            │  Odin (Master)                           │
            │  httpd.py ─ API + SSL-only verification  │
            │  proxywatchd.py ─ proxy recheck daemon   │
            │  SQLite: proxies.db, websites.db         │
            └──────────┬───────────────────────────────┘
                       │ WireGuard (10.200.1.0/24)
      ┌────────────────┼────────────────┐
      v                v                v
┌───────────┐    ┌───────────┐    ┌───────────┐
│ cassius   │    │ edge      │    │ sentinel  │
│ Worker    │    │ Worker    │    │ Worker    │
│ ppf.py    │    │ ppf.py    │    │ ppf.py    │
└───────────┘    └───────────┘    └───────────┘
```

Workers claim URLs, extract proxies, test them, and report back. The master verifies reported proxies (SSL-only), serves the API, and coordinates work distribution.

## Constraints

- Python 2.7 runtime (container-based)
- Minimal external dependencies
- All traffic via Tor

---

## Phase 1: Performance and Quality (current)

Profiling-driven optimizations and source pipeline hardening.

| Item | Status | Description |
|------|--------|-------------|
| Extraction short-circuits | done | Guard clauses in fetch.py extractors |
| Skip shutdown on failed sockets | done | Track _connected flag, skip shutdown on dead sockets |
| SQLite connection reuse (odin) | done | Per-greenlet cached handles via threading.local |
| Lazy-load ASN database | done | Defer ipasn.dat parsing to first lookup |
| Add more seed sources (100+) | done | Expanded to 120+ URLs with SOCKS5-specific sources |
| Protocol-aware source weighting | done | Dynamic SOCKS boost in claim_urls scoring |
| Sharpen error penalty in URL scoring | done | Reduce erroring URL claim frequency |

## Phase 2: Proxy Diversity and Consumer API

Address customer-reported quality gaps.
| Item | Status | Description |
|------|--------|-------------|
| ASN diversity scoring | pending | Deprioritize over-represented ASNs in testing |
| Graduated recheck intervals | pending | Fresh proxies rechecked more often than stale |
| API filters (proto/country/ASN/latency) | pending | Consumer-facing query parameters on /proxies |
| Latency-based ranking | pending | Expose latency percentiles per proxy |

## Phase 3: Self-Expanding Source Pool

Worker-driven link discovery from productive pages.

| Item | Status | Description |
|------|--------|-------------|
| Link extraction from productive pages | pending | Parse HTML for links when page yields proxies |
| Report discovered URLs to master | pending | New endpoint for worker URL submissions |
| Conditional discovery | pending | Only extract links from confirmed-productive pages |

## Phase 4: Long-Term

| Item | Status | Description |
|------|--------|-------------|
| Python 3 migration | deferred | Unblocks modern deps, security patches, pyasn native |
| Worker trust scoring | pending | Activate spot-check verification framework |
| Dynamic target pool | pending | Auto-discover and rotate validation targets |
| Geographic target spread | pending | Ensure targets span multiple regions |

---

## Completed

| Item | Date | Description |
|------|------|-------------|
| Sharpen URL error penalty | 2026-02-22 | `error*0.5` cap 4.0 + `stale*0.2` cap 1.5 |
| SOCKS5 source expansion | 2026-02-22 | Added 10 new SOCKS5-specific sources |
| SQLite connection reuse | 2026-02-22 | Per-greenlet cached handles via threading.local |
| Lazy-load ASN database | 2026-02-22 | Deferred ipasn.dat to first lookup |
| Socket shutdown skip | 2026-02-22 | _connected flag, skip shutdown on dead sockets |
| Protocol-aware weighting | 2026-02-22 | Dynamic SOCKS boost in claim_urls scoring |
| Seed sources expanded | 2026-02-22 | 37 -> 120+ URLs |
| last_seen freshness fix | 2026-02-22 | Watchd updates last_seen on verification |
| Periodic re-seeding | 2026-02-22 | Reset errored sources every 6h |
| ASN enrichment | 2026-02-22 | Pure-Python ipasn.dat reader + backfill |
| URL pipeline stats | 2026-02-22 | /api/stats exposes source health metrics |
| Extraction short-circuits | 2026-02-22 | Guard clauses + precompiled table regexes |
| Target health tracking | prior | Cooldown-based health for all target pools |
| MITM field in proxy list | prior | Expose mitm boolean in JSON endpoints |
| V1 worker protocol removal | prior | Cleaned up legacy --worker code path |

---

## File Reference

| File | Purpose |
|------|---------|
| ppf.py | URL harvester, worker main loop |
| proxywatchd.py | Proxy validation daemon |
| fetch.py | HTTP fetching, proxy extraction |
| httpd.py | API server, worker coordination |
| dbs.py | Database schema, seed sources |
| config.py | Configuration management |
| rocksock.py | Socket/proxy abstraction |
| http2.py | HTTP client implementation |
| tools/ppf-deploy | Deployment wrapper |
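The URL error penalty recorded in the Completed table (`error*0.5` cap 4.0 + `stale*0.2` cap 1.5) can be read as two independently capped terms that are summed. The sketch below follows that reading under several assumptions, none confirmed by this roadmap:

- `error` and `stale` are assumed to be per-URL counters: fetch errors, and fetches that yielded nothing new.
- The result is assumed to be subtracted from a URL's claim score.
- `url_penalty` is a hypothetical helper, not the actual `claim_urls` code.

```python
# Hypothetical sketch of the "error*0.5 cap 4.0 + stale*0.2 cap 1.5" penalty.
# Assumptions: error_count and stale_count are per-URL counters, and the
# returned value is subtracted from the URL's claim score elsewhere.

ERROR_WEIGHT, ERROR_CAP = 0.5, 4.0
STALE_WEIGHT, STALE_CAP = 0.2, 1.5


def url_penalty(error_count, stale_count):
    """Return the combined, capped penalty for one source URL."""
    error_term = min(error_count * ERROR_WEIGHT, ERROR_CAP)  # at most 4.0
    stale_term = min(stale_count * STALE_WEIGHT, STALE_CAP)  # at most 1.5
    return error_term + stale_term


if __name__ == '__main__':
    # %-formatting keeps this runnable on both Python 2.7 and 3.
    for errors, stale in [(0, 0), (2, 3), (20, 20)]:
        print('%d errors, %d stale -> penalty %.2f'
              % (errors, stale, url_penalty(errors, stale)))
```

Because each term is capped separately, the combined penalty is bounded at 4.0 + 1.5 = 5.5. Errors alone can never remove more than 4.0 points from a URL's score.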