PPF Roadmap

Architecture

                    ┌───────────────────────────────────────────┐
                    │               Odin (Master)               │
                    │  httpd.py ─ API + SSL-only verification   │
                    │  proxywatchd.py ─ proxy recheck daemon    │
                    │  SQLite: proxies.db, websites.db          │
                    └──────────┬────────────────────────────────┘
                               │ WireGuard (10.200.1.0/24)
              ┌────────────────┼────────────────┐
              v                v                v
        ┌───────────┐   ┌───────────┐   ┌───────────┐
        │  cassius  │   │   edge    │   │ sentinel  │
        │  Worker   │   │  Worker   │   │  Worker   │
        │  ppf.py   │   │  ppf.py   │   │  ppf.py   │
        └───────────┘   └───────────┘   └───────────┘

Workers claim URLs, extract proxies, test them, and report back. The master verifies (SSL-only), serves the API, and coordinates distribution.

Constraints

  • Python 2.7 runtime (container-based)
  • Minimal external dependencies
  • All traffic via Tor

Phase 1: Performance and Quality (current)

Profiling-driven optimizations and source pipeline hardening.

| Item | Status | Description |
|------|--------|-------------|
| Extraction short-circuits | done | Guard clauses in fetch.py extractors |
| Skip shutdown on failed sockets | done | Track _connected flag, skip shutdown on dead sockets |
| SQLite connection reuse (odin) | done | Per-greenlet cached handles via threading.local |
| Lazy-load ASN database | done | Defer ipasn.dat parsing to first lookup |
| Add more seed sources (100+) | done | Expanded to 120+ URLs with SOCKS5-specific sources |
| Protocol-aware source weighting | done | Dynamic SOCKS boost in claim_urls scoring |
| Sharpen error penalty in URL scoring | done | Reduce erroring URL claim frequency |
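
As an illustration of the "SQLite connection reuse" item, here is a minimal sketch of caching one connection per thread (or per greenlet) with threading.local. The names `get_db` and `_local` are hypothetical and not taken from the PPF sources. The sketch also assumes that gevent monkey-patching is active on the master, in which case threading.local storage becomes greenlet-local; without it the cache is simply per-thread.

```python
import sqlite3
import threading

# Hypothetical module-level cache. With gevent monkey-patching (an assumption
# here), threading.local is greenlet-local, so every greenlet gets its own
# handle and no connection is shared across greenlets.
_local = threading.local()


def get_db(path):
    """Return a cached SQLite connection for the current thread/greenlet."""
    cache = getattr(_local, 'conns', None)
    if cache is None:
        cache = _local.conns = {}
    conn = cache.get(path)
    if conn is None:
        # First use in this thread/greenlet: open once, reuse afterwards.
        conn = cache[path] = sqlite3.connect(path)
    return conn
```

Repeated calls such as `get_db('proxies.db')` from the same greenlet then return the same handle instead of reopening the database on every request.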

Phase 2: Proxy Diversity and Consumer API

Address customer-reported quality gaps.

| Item | Status | Description |
|------|--------|-------------|
| ASN diversity scoring | pending | Deprioritize over-represented ASNs in testing |
| Graduated recheck intervals | pending | Fresh proxies rechecked more often than stale |
| API filters (proto/country/ASN/latency) | pending | Consumer-facing query parameters on /proxies |
| Latency-based ranking | pending | Expose latency percentiles per proxy |
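
One possible shape for the "Graduated recheck intervals" item is a tier table keyed on how long ago a proxy was last seen working. This is a sketch only: the tier boundaries, the intervals, and the function names `recheck_interval` and `is_due` are all hypothetical, and the roadmap does not specify any of these values.

```python
# Hypothetical tiers: (max seconds since last success, recheck interval in seconds).
TIERS = [
    (3600, 300),         # seen working within the last hour: every 5 minutes
    (6 * 3600, 1800),    # within the last 6 hours: every 30 minutes
    (24 * 3600, 7200),   # within the last day: every 2 hours
]
STALE_INTERVAL = 6 * 3600  # anything older: every 6 hours


def recheck_interval(now, last_seen):
    """Pick a recheck interval from the age of the last successful check."""
    age = max(0, now - last_seen)
    for max_age, interval in TIERS:
        if age <= max_age:
            return interval
    return STALE_INTERVAL


def is_due(now, last_seen, last_checked):
    """True when the proxy has waited at least its tier's interval."""
    return now - last_checked >= recheck_interval(now, last_seen)
```

A recheck daemon could call `is_due` while scanning candidates, so recently verified proxies are retested often and long-dead ones consume little test capacity.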

Phase 3: Self-Expanding Source Pool

Worker-driven link discovery from productive pages.

| Item | Status | Description |
|------|--------|-------------|
| Link extraction from productive pages | pending | Parse HTML for links when page yields proxies |
| Report discovered URLs to master | pending | New endpoint for worker URL submissions |
| Conditional discovery | pending | Only extract links from confirmed-productive pages |
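
The link extraction and conditional discovery items could be combined roughly as follows. This is a sketch using only the standard library, written to import under both Python 2.7 and Python 3. The names `LinkCollector` and `discover_links` are hypothetical, and restricting results to http/https URLs is an assumption rather than a stated requirement.

```python
try:
    from html.parser import HTMLParser   # Python 3
    from urllib.parse import urljoin
except ImportError:
    from HTMLParser import HTMLParser    # Python 2.7
    from urlparse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from <a href=...> tags."""

    def __init__(self, base_url):
        HTMLParser.__init__(self)
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value:
                url = urljoin(self.base_url, value)
                # Keep only web URLs and skip duplicates.
                if url.startswith(('http://', 'https://')) and url not in self.links:
                    self.links.append(url)


def discover_links(base_url, html, proxies_found):
    """Extract links only when the page actually yielded proxies."""
    if not proxies_found:
        return []
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links
```

Gating on `proxies_found` keeps workers from crawling unproductive pages, which matches the "confirmed-productive pages" condition in the table.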

Phase 4: Long-Term

| Item | Status | Description |
|------|--------|-------------|
| Python 3 migration | deferred | Unblocks modern deps, security patches, pyasn native |
| Worker trust scoring | pending | Activate spot-check verification framework |
| Dynamic target pool | pending | Auto-discover and rotate validation targets |
| Geographic target spread | pending | Ensure targets span multiple regions |

Completed

| Item | Date | Description |
|------|------|-------------|
| Sharpen URL error penalty | 2026-02-22 | error0.5 cap 4.0 + stale0.2 cap 1.5 |
| SOCKS5 source expansion | 2026-02-22 | Added 10 new SOCKS5-specific sources |
| SQLite connection reuse | 2026-02-22 | Per-greenlet cached handles via threading.local |
| Lazy-load ASN database | 2026-02-22 | Deferred ipasn.dat to first lookup |
| Socket shutdown skip | 2026-02-22 | _connected flag, skip shutdown on dead sockets |
| Protocol-aware weighting | 2026-02-22 | Dynamic SOCKS boost in claim_urls scoring |
| Seed sources expanded | 2026-02-22 | 37 -> 120+ URLs |
| last_seen freshness fix | 2026-02-22 | Watchd updates last_seen on verification |
| Periodic re-seeding | 2026-02-22 | Reset errored sources every 6h |
| ASN enrichment | 2026-02-22 | Pure-Python ipasn.dat reader + backfill |
| URL pipeline stats | 2026-02-22 | /api/stats exposes source health metrics |
| Extraction short-circuits | 2026-02-22 | Guard clauses + precompiled table regexes |
| Target health tracking | prior | Cooldown-based health for all target pools |
| MITM field in proxy list | prior | Expose mitm boolean in JSON endpoints |
| V1 worker protocol removal | prior | Cleaned up legacy --worker code path |
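
The "Sharpen URL error penalty" entry can be read as two capped linear terms. The sketch below assumes that reading: an error count weighted by 0.5 and capped at 4.0, plus a stale count weighted by 0.2 and capped at 1.5. The function name, the argument names, and the idea that the result is subtracted from a URL's claim score are all assumptions, not taken from the PPF sources.

```python
def url_penalty(error_count, stale_count):
    """Penalty for a source URL under the assumed reading of the roadmap entry.

    Each term grows linearly and is then capped, so a single bad source
    cannot accumulate an unbounded penalty.
    """
    error_term = min(error_count * 0.5, 4.0)   # assumed: 0.5 per error, cap 4.0
    stale_term = min(stale_count * 0.2, 1.5)   # assumed: 0.2 per stale fetch, cap 1.5
    return error_term + stale_term
```

Under these assumptions the error term saturates after 8 errors and the stale term after 8 stale fetches, so the combined penalty never exceeds 5.5.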

File Reference

| File | Purpose |
|------|---------|
| ppf.py | URL harvester, worker main loop |
| proxywatchd.py | Proxy validation daemon |
| fetch.py | HTTP fetching, proxy extraction |
| httpd.py | API server, worker coordination |
| dbs.py | Database schema, seed sources |
| config.py | Configuration management |
| rocksock.py | Socket/proxy abstraction |
| http2.py | HTTP client implementation |
| tools/ppf-deploy | Deployment wrapper |