# PPF Roadmap

## Architecture

```
            ┌──────────────────────────────────────────┐
            │              Odin (Master)               │
            │  httpd.py ─ API + SSL-only verification  │
            │  proxywatchd.py ─ proxy recheck daemon   │
            │  SQLite: proxies.db, websites.db         │
            └──────────┬───────────────────────────────┘
                       │ WireGuard (10.200.1.0/24)
      ┌────────────────┼────────────────┐
      v                v                v
┌───────────┐    ┌───────────┐    ┌───────────┐
│  cassius  │    │   edge    │    │ sentinel  │
│  Worker   │    │  Worker   │    │  Worker   │
│  ppf.py   │    │  ppf.py   │    │  ppf.py   │
└───────────┘    └───────────┘    └───────────┘
```

Workers claim URLs, extract proxies, test them, report back.
Master verifies (SSL-only), serves API, coordinates distribution.
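
The worker side of this loop can be sketched as a single claim, extract, test, report cycle. This is a minimal illustration, not the actual ppf.py or fetch.py code. The four callables (`claim_urls`, `fetch`, `test_proxy`, `report`) and the `ip:port` regex are hypothetical stand-ins.

```python
import re

# Illustrative ip:port matcher; it does not validate octet ranges.
PROXY_RE = re.compile(r'\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b')


def extract_proxies(page):
    """Return unique 'ip:port' strings from page text, in order of appearance."""
    found = []
    for ip, port in PROXY_RE.findall(page):
        candidate = '%s:%s' % (ip, port)
        if candidate not in found:
            found.append(candidate)
    return found


def worker_cycle(claim_urls, fetch, test_proxy, report):
    """Run one claim -> extract -> test -> report cycle.

    claim_urls, fetch, test_proxy and report are hypothetical callables
    standing in for the master API client and the fetch/test code.
    """
    results = []
    for url in claim_urls():
        page = fetch(url)
        if not page:
            continue  # dead or empty source: nothing to extract
        for proxy in extract_proxies(page):
            results.append((url, proxy, test_proxy(proxy)))
    report(results)  # one batched submission back to the master
    return results
```

Injecting the callables keeps the cycle testable without network access or a running master.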
## Constraints

- Python 2.7 runtime (container-based)
- Minimal external dependencies
- All traffic via Tor

---

## Phase 1: Performance and Quality (current)

Profiling-driven optimizations and source pipeline hardening.

| Item | Status | Description |
|------|--------|-------------|
| Extraction short-circuits | done | Guard clauses in fetch.py extractors |
| Skip shutdown on failed sockets | pending | Avoid 39s/session wasted on dead connections |
| SQLite connection reuse (odin) | pending | Cache per-greenlet, eliminate 2.7k opens/session |
| Lazy-load ASN database | pending | Defer 3.6s startup cost to first lookup |
| Add more seed sources (100+) | pending | Expand beyond 37 hardcoded URLs |
| Protocol-aware source weighting | pending | Prioritize SOCKS5-yielding sources |

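
A lazy-loading wrapper is one way to move the ASN database cost from startup to the first lookup. In this sketch, `LazyASNDB` and its `loader` callable are hypothetical. The loader stands in for whatever parses ipasn.dat and is assumed to return an object with a `lookup(ip)` method.

```python
import threading


class LazyASNDB(object):
    """Defer loading an IP->ASN table until the first lookup (hypothetical helper)."""

    def __init__(self, loader):
        self._loader = loader  # called at most once
        self._db = None
        self._lock = threading.Lock()

    def _get(self):
        if self._db is None:
            with self._lock:
                # Re-check under the lock so concurrent first callers load only once.
                if self._db is None:
                    self._db = self._loader()
        return self._db

    def lookup(self, ip):
        return self._get().lookup(ip)
```

The double-checked lock is cheap once the table is loaded, because later calls skip the lock entirely.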
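
For connection reuse on odin, a minimal sketch could cache one `sqlite3` connection per database path in thread-local storage. This assumes gevent monkey-patching is active, which makes `threading.local` greenlet-local. Without it, the same code caches one connection per OS thread. `get_conn` is a hypothetical helper name.

```python
import sqlite3
import threading

# Greenlet-local under gevent monkey-patching, thread-local otherwise.
_local = threading.local()


def get_conn(path):
    """Return the cached connection for `path` in the current greenlet/thread.

    proxies.db and websites.db each get their own cached handle; connections
    are released when the local storage for that greenlet/thread goes away.
    """
    conns = getattr(_local, 'conns', None)
    if conns is None:
        conns = _local.conns = {}
    conn = conns.get(path)
    if conn is None:
        conn = conns[path] = sqlite3.connect(path)
    return conn
```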
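
Protocol-aware weighting could be expressed as weighted sampling over sources, with weight rising with each source's SOCKS5 share. The following are all illustrative assumptions:

- the stats shape (`total` and `socks5` counts),
- the bonus factor,
- both function names.

Sources with no history keep the base weight so they still get explored.

```python
import random


def source_weight(stats, base=1.0, socks5_bonus=4.0):
    """Base weight plus a bonus proportional to the source's SOCKS5 yield share."""
    total = stats.get('total', 0)
    if total <= 0:
        return base  # untested source: keep it eligible
    return base + socks5_bonus * stats.get('socks5', 0) / float(total)


def pick_sources(sources, k, rng=random):
    """Weighted sampling without replacement; `sources` maps url -> stats dict."""
    pool = dict((url, source_weight(s)) for url, s in sources.items())
    chosen = []
    for _ in range(min(k, len(pool))):
        r = rng.uniform(0, sum(pool.values()))
        acc = 0.0
        for url, weight in sorted(pool.items()):  # sorted for reproducibility
            acc += weight
            if r <= acc:
                break
        # If rounding leaves r just above acc, the last url is used.
        chosen.append(url)
        del pool[url]
    return chosen
```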
## Phase 2: Proxy Diversity and Consumer API

Address customer-reported quality gaps.

| Item | Status | Description |
|------|--------|-------------|
| ASN diversity scoring | pending | Deprioritize over-represented ASNs in testing |
| Graduated recheck intervals | pending | Fresh proxies rechecked more often than stale |
| API filters (proto/country/ASN/latency) | pending | Consumer-facing query parameters on /proxies |
| Latency-based ranking | pending | Expose latency percentiles per proxy |

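
One simple form of ASN diversity scoring orders the test queue by 1 / (1 + n). Here n counts proxies from the same ASN already in the working pool, plus earlier candidates from that ASN in the same batch. The formula and the `asn_counts` input are assumptions, not a settled design.

```python
from collections import Counter


def diversity_order(candidates, asn_counts):
    """Order (proxy, asn) pairs so under-represented ASNs are tested first."""
    seen = Counter()
    scored = []
    for index, (proxy, asn) in enumerate(candidates):
        # Counting earlier candidates in this batch stops one large ASN
        # from dominating a single batch.
        score = 1.0 / (1 + asn_counts.get(asn, 0) + seen[asn])
        seen[asn] += 1
        scored.append((-score, index, proxy))  # index keeps ties stable
    scored.sort()
    return [proxy for _, _, proxy in scored]
```

Every ASN stays eligible; crowded ones are only pushed back in the queue.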
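
Graduated recheck intervals can be expressed as a tier table keyed on time since `last_seen`. The tier boundaries and intervals below are placeholder values for illustration, not tuned numbers.

```python
import bisect

# Hypothetical tiers: (max seconds since last_seen, recheck interval in seconds).
TIERS = [
    (3600, 300),        # seen working within 1 h: recheck every 5 min
    (6 * 3600, 1800),   # within 6 h: every 30 min
    (24 * 3600, 7200),  # within 24 h: every 2 h
]
STALE_INTERVAL = 6 * 3600  # older than that: every 6 h
_BOUNDS = [limit for limit, _ in TIERS]


def recheck_interval(age_seconds):
    i = bisect.bisect_left(_BOUNDS, age_seconds)
    return TIERS[i][1] if i < len(TIERS) else STALE_INTERVAL


def is_due(now, last_seen, last_checked):
    """True when enough time has passed since the last check for this proxy's tier."""
    return now - last_checked >= recheck_interval(now - last_seen)
```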
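
The /proxies filters could reduce to a predicate over stored proxy records. The query parameter names (`proto`, `country`, `asn`, `max_latency`) and the record keys are hypothetical. Parameter values are treated as strings, as they would arrive from a query string.

```python
def filter_proxies(proxies, params):
    """Apply optional query filters to a list of proxy dicts."""
    proto = params.get('proto')
    country = params.get('country')
    asn = params.get('asn')
    max_latency = params.get('max_latency')
    out = []
    for p in proxies:
        if proto and p.get('proto') != proto.lower():
            continue
        if country and (p.get('country') or '').upper() != country.upper():
            continue
        if asn and p.get('asn') != int(asn):
            continue
        # Proxies without a latency measurement never pass a latency cap.
        if max_latency and (p.get('latency') is None or p['latency'] > float(max_latency)):
            continue
        out.append(p)
    return out
```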
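
Per-proxy latency percentiles need only the recorded samples. This sketch uses the nearest-rank definition. The choice of p50/p90/p99 and the function names are assumptions.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of latency samples; p in (0, 100]."""
    if not samples:
        return None
    ordered = sorted(samples)
    rank = int(math.ceil(p * len(ordered) / 100.0))
    return ordered[max(rank, 1) - 1]


def latency_summary(samples):
    return dict((name, percentile(samples, p))
                for name, p in (('p50', 50), ('p90', 90), ('p99', 99)))
```

Nearest rank always returns an observed sample, which keeps the exposed values easy to interpret with few measurements.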
## Phase 3: Self-Expanding Source Pool

Worker-driven link discovery from productive pages.

| Item | Status | Description |
|------|--------|-------------|
| Link extraction from productive pages | pending | Parse HTML for links when page yields proxies |
| Report discovered URLs to master | pending | New endpoint for worker URL submissions |
| Conditional discovery | pending | Only extract links from confirmed-productive pages |

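
Conditional link discovery can be sketched with the standard-library HTML parser, collecting links only when the page already yielded proxies. The function and class names are hypothetical. The import fallback lets the sketch run on both Python 2.7 and 3 without external dependencies.

```python
try:  # Python 3
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urldefrag
except ImportError:  # Python 2.7
    from HTMLParser import HTMLParser
    from urlparse import urljoin, urldefrag


class _LinkParser(HTMLParser):
    """Collect raw href values from <a> tags."""

    def __init__(self):
        HTMLParser.__init__(self)  # works for Python 2 old-style classes too
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


def discover_links(base_url, html, proxies_found):
    """Return unique absolute http(s) links, but only from productive pages."""
    if not proxies_found:
        return []
    parser = _LinkParser()
    parser.feed(html)
    parser.close()
    links = []
    for href in parser.hrefs:
        url = urldefrag(urljoin(base_url, href))[0]  # resolve, drop #fragment
        if url.startswith(('http://', 'https://')) and url not in links:
            links.append(url)
    return links
```

The returned list is what a worker would submit to the master for de-duplication and scheduling.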
## Phase 4: Long-Term

| Item | Status | Description |
|------|--------|-------------|
| Python 3 migration | deferred | Unblocks modern deps, security patches, pyasn native |
| Worker trust scoring | pending | Activate spot-check verification framework |
| Dynamic target pool | pending | Auto-discover and rotate validation targets |
| Geographic target spread | pending | Ensure targets span multiple regions |

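
Worker trust scoring on top of spot checks might work as follows. The master re-tests a random sample of a worker's reported proxies, then folds the agreement ratio into an exponentially weighted score, so one bad round lowers trust without zeroing it. The sampling rate, the 0.2 weight, and the function names are assumptions. The existing spot-check framework may expose a different interface.

```python
import random


def pick_spot_checks(reports, rate=0.1, rng=random):
    """Choose a random subset of a worker's reports for the master to re-verify."""
    if not reports:
        return []
    k = max(1, int(len(reports) * rate))
    return rng.sample(reports, k)


def update_trust(trust, agreed, checked, weight=0.2):
    """Blend this round's agreement ratio into the running trust score (EWMA)."""
    if checked == 0:
        return trust  # no evidence this round
    ratio = agreed / float(checked)
    return (1 - weight) * trust + weight * ratio
```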
---

## Completed

| Item | Date | Description |
|------|------|-------------|
| last_seen freshness fix | 2026-02-22 | Watchd updates last_seen on verification |
| Periodic re-seeding | 2026-02-22 | Reset errored sources every 6h |
| ASN enrichment | 2026-02-22 | Pure-Python ipasn.dat reader + backfill |
| URL pipeline stats | 2026-02-22 | /api/stats exposes source health metrics |
| Extraction short-circuits | 2026-02-22 | Guard clauses + precompiled table regexes |
| Target health tracking | prior | Cooldown-based health for all target pools |
| MITM field in proxy list | prior | Expose mitm boolean in JSON endpoints |
| V1 worker protocol removal | prior | Cleaned up legacy --worker code path |

---

## File Reference

| File | Purpose |
|------|---------|
| ppf.py | URL harvester, worker main loop |
| proxywatchd.py | Proxy validation daemon |
| fetch.py | HTTP fetching, proxy extraction |
| httpd.py | API server, worker coordination |
| dbs.py | Database schema, seed sources |
| config.py | Configuration management |
| rocksock.py | Socket/proxy abstraction |
| http2.py | HTTP client implementation |
| tools/ppf-deploy | Deployment wrapper |