# PPF Roadmap

## Architecture

```
            ┌──────────────────────────────────────────┐
            │  Odin (Master)                           │
            │  httpd.py ─ API + SSL-only verification  │
            │  proxywatchd.py ─ proxy recheck daemon   │
            │  SQLite: proxies.db, websites.db         │
            └──────────┬───────────────────────────────┘
                       │ WireGuard (10.200.1.0/24)
      ┌────────────────┼────────────────┐
      v                v                v
┌───────────┐    ┌───────────┐    ┌───────────┐
│ cassius   │    │ edge      │    │ sentinel  │
│ Worker    │    │ Worker    │    │ Worker    │
│ ppf.py    │    │ ppf.py    │    │ ppf.py    │
└───────────┘    └───────────┘    └───────────┘
```

Workers claim URLs, extract proxies, test them, and report back.
The master verifies (SSL-only), serves the API, and coordinates distribution.

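
As a rough illustration of the "extract proxies" step in the worker cycle above, the sketch below pulls `ip:port` candidates out of fetched text. The names `PROXY_RE` and `extract_proxies` are hypothetical and are not taken from fetch.py; the real extractors may work quite differently.

```python
import re

# Hypothetical pattern: a dotted quad followed by a 2-5 digit port.
PROXY_RE = re.compile(r"\b((?:\d{1,3}\.){3}\d{1,3}):(\d{2,5})\b")

def extract_proxies(text):
    """Return unique, syntactically valid ip:port strings in order of appearance."""
    found = []
    for ip, port in PROXY_RE.findall(text):
        octets_ok = all(int(octet) <= 255 for octet in ip.split("."))
        port_ok = 0 < int(port) <= 65535
        if not (octets_ok and port_ok):
            continue  # reject things like 999.1.1.1:80
        candidate = "%s:%s" % (ip, port)
        if candidate not in found:
            found.append(candidate)
    return found
```

Candidates found this way would still need to be tested by the worker before being reported.
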

## Constraints

- Python 2.7 runtime (container-based)
- Minimal external dependencies
- All traffic via Tor

---

## Phase 1: Performance and Quality (current)

Profiling-driven optimizations and source pipeline hardening.

| Item | Status | Description |
|------|--------|-------------|
| Extraction short-circuits | done | Guard clauses in fetch.py extractors |
| Skip shutdown on failed sockets | done | Track _connected flag, skip shutdown on dead sockets |
| SQLite connection reuse (odin) | done | Per-greenlet cached handles via threading.local |
| Lazy-load ASN database | done | Defer ipasn.dat parsing to first lookup |
| Add more seed sources (100+) | done | Expanded to 120+ URLs with SOCKS5-specific sources |
| Protocol-aware source weighting | done | Dynamic SOCKS boost in claim_urls scoring |
| Sharpen error penalty in URL scoring | done | Reduce erroring URL claim frequency |

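
The "SQLite connection reuse" item can be pictured with a minimal sketch like the one below. It assumes a hypothetical helper named `get_conn`; the actual function names on odin are not shown in this roadmap. When gevent's thread monkey-patching is active, `threading.local` is replaced by a greenlet-local equivalent, which is presumably what makes the handles per-greenlet.

```python
import sqlite3
import threading

# One storage slot per thread (or per greenlet under gevent monkey-patching).
_local = threading.local()

def get_conn(path):
    """Return a cached SQLite connection for the current thread/greenlet.

    Hypothetical helper: opens the database on first use and reuses the
    same handle on later calls instead of reconnecting each time.
    """
    cache = getattr(_local, "conns", None)
    if cache is None:
        cache = _local.conns = {}
    conn = cache.get(path)
    if conn is None:
        conn = cache[path] = sqlite3.connect(path)
    return conn
```

Repeated calls such as `get_conn("proxies.db")` from the same thread return the same handle, while a different thread gets its own connection.
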

## Phase 2: Proxy Diversity and Consumer API

Address customer-reported quality gaps.

| Item | Status | Description |
|------|--------|-------------|
| ASN diversity scoring | pending | Deprioritize over-represented ASNs in testing |
| Graduated recheck intervals | pending | Fresh proxies rechecked more often than stale |
| API filters (proto/country/ASN/latency) | pending | Consumer-facing query parameters on /proxies |
| Latency-based ranking | pending | Expose latency percentiles per proxy |

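
One possible shape for "graduated recheck intervals" is sketched below. This item is still pending, so everything here is an assumption: the function name, the 15-minute base, the doubling per day of staleness, and the one-day cap are illustrative values, not planned settings.

```python
def recheck_interval(seconds_since_last_seen, base=900, factor=2.0, cap=86400):
    """Seconds to wait before rechecking a proxy.

    Hypothetical policy: start at `base` seconds for fresh proxies and
    multiply by `factor` for every full day since the proxy was last
    seen working, never exceeding `cap`.
    """
    days_stale = max(0, int(seconds_since_last_seen // 86400))
    days_stale = min(days_stale, 32)  # bound the exponent to avoid float overflow
    return int(min(cap, base * (factor ** days_stale)))
```

With these illustrative defaults, a proxy seen within the last day is rechecked every 15 minutes, while one unseen for a week or more is rechecked once a day.
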

## Phase 3: Self-Expanding Source Pool

Worker-driven link discovery from productive pages.

| Item | Status | Description |
|------|--------|-------------|
| Link extraction from productive pages | pending | Parse HTML for links when page yields proxies |
| Report discovered URLs to master | pending | New endpoint for worker URL submissions |
| Conditional discovery | pending | Only extract links from confirmed-productive pages |

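
A minimal sketch of conditional link discovery, using only the standard library, might look like the following. `LinkCollector`, `discover_links`, and the `min_proxies` threshold are hypothetical names introduced for illustration; the pending implementation may differ.

```python
try:
    from html.parser import HTMLParser      # Python 3
    from urllib.parse import urljoin
except ImportError:
    from HTMLParser import HTMLParser       # Python 2.7
    from urlparse import urljoin

class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from <a href=...> tags."""

    def __init__(self, base_url):
        HTMLParser.__init__(self)
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative links
        if url.startswith(("http://", "https://")) and url not in self.links:
            self.links.append(url)

def discover_links(base_url, html, proxies_found, min_proxies=1):
    """Return links only when the page was productive (yielded proxies)."""
    if proxies_found < min_proxies:
        return []
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links
```

Gating on `proxies_found` keeps workers from crawling outward from pages that yielded nothing, which matches the "confirmed-productive pages" condition in the table.
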

## Phase 4: Long-Term

| Item | Status | Description |
|------|--------|-------------|
| Python 3 migration | deferred | Unblocks modern deps, security patches, pyasn native |
| Worker trust scoring | pending | Activate spot-check verification framework |
| Dynamic target pool | pending | Auto-discover and rotate validation targets |
| Geographic target spread | pending | Ensure targets span multiple regions |

---

## Completed

| Item | Date | Description |
|------|------|-------------|
| Sharpen URL error penalty | 2026-02-22 | `error*0.5` cap 4.0 + `stale*0.2` cap 1.5 |
| SOCKS5 source expansion | 2026-02-22 | Added 10 new SOCKS5-specific sources |
| SQLite connection reuse | 2026-02-22 | Per-greenlet cached handles via threading.local |
| Lazy-load ASN database | 2026-02-22 | Deferred ipasn.dat to first lookup |
| Socket shutdown skip | 2026-02-22 | _connected flag, skip shutdown on dead sockets |
| Protocol-aware weighting | 2026-02-22 | Dynamic SOCKS boost in claim_urls scoring |
| Seed sources expanded | 2026-02-22 | 37 -> 120+ URLs |
| last_seen freshness fix | 2026-02-22 | Watchd updates last_seen on verification |
| Periodic re-seeding | 2026-02-22 | Reset errored sources every 6h |
| ASN enrichment | 2026-02-22 | Pure-Python ipasn.dat reader + backfill |
| URL pipeline stats | 2026-02-22 | /api/stats exposes source health metrics |
| Extraction short-circuits | 2026-02-22 | Guard clauses + precompiled table regexes |
| Target health tracking | prior | Cooldown-based health for all target pools |
| MITM field in proxy list | prior | Expose mitm boolean in JSON endpoints |
| V1 worker protocol removal | prior | Cleaned up legacy --worker code path |

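
The URL error penalty entry in the table above is terse, so here is one reading of it as code. This is an interpretation, not the actual claim_urls scoring: it assumes the two terms are capped independently and then summed, and the names `url_penalty`, `error_count`, and `stale_count` are hypothetical.

```python
def url_penalty(error_count, stale_count):
    """Penalty subtracted from a URL's claim score (assumed interpretation).

    Each term grows linearly with its counter and is capped separately,
    so the total penalty can never exceed 4.0 + 1.5 = 5.5.
    """
    error_term = min(error_count * 0.5, 4.0)   # error*0.5, cap 4.0
    stale_term = min(stale_count * 0.2, 1.5)   # stale*0.2, cap 1.5
    return error_term + stale_term
```

Under this reading the error term saturates after 8 errors, so a persistently failing URL is claimed less often but its penalty stays bounded.
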

---

## File Reference

| File | Purpose |
|------|---------|
| ppf.py | URL harvester, worker main loop |
| proxywatchd.py | Proxy validation daemon |
| fetch.py | HTTP fetching, proxy extraction |
| httpd.py | API server, worker coordination |
| dbs.py | Database schema, seed sources |
| config.py | Configuration management |
| rocksock.py | Socket/proxy abstraction |
| http2.py | HTTP client implementation |
| tools/ppf-deploy | Deployment wrapper |