feat: worker-driven discovery and validation tightening #1

Merged
username merged 24 commits from feature/worker-driven-discovery into master 2026-02-17 17:39:49 +00:00

Summary

  • V2 worker mode: URL-driven proxy discovery (claim URLs from master, fetch via Tor, extract and report proxies)
  • Score-based URL scheduling with EMA tracking
  • Tighten secondary check validation: neutral judge blocks, HTTP response validation, reject private IPs, remove weak HEAD regexes
  • Configurable secondary check: checktype = none disables Phase 2 (SSL-only mode)
  • Compose files for master and worker container management

Commits

  • dbs: add last_seen, url scoring columns
  • httpd: claim-urls, report-urls, report-proxies endpoints
  • config: worker V2 flags, checktype none/false
  • ppf: worker_v2_main(), HTTP client functions
  • fetch: encode unicode URLs for HTTP/SOCKS
  • watchd: tighten secondary check validation
  • compose: rewrite master and worker compose files
username added 24 commits 2026-02-17 17:39:38 +00:00
The WSGI _handle_route had a hardcoded LIMIT 100 query for /proxies,
ignoring limit, proto, country, asn, and format parameters. Align
with the BaseHTTPRequestHandler path that already supported them.
Filter out entries with proto IS NULL from /proxies and /proxies/count
endpoints. These are proxies added to the database but never validated,
leaking into results with null proto, asn, and zero latency.
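The filtering described in the two commits above can be sketched as a single parameterized query. This is only an illustration under stated assumptions: the table name `proxylist`, the column set, and the helper name `query_proxies` are hypothetical and not taken from the project.

```python
import sqlite3

def query_proxies(conn, limit=100, proto=None, country=None, asn=None):
    """Return validated proxies, applying the optional filters."""
    # "proto IS NOT NULL" drops rows that were added but never validated.
    clauses = ["proto IS NOT NULL"]
    params = []
    for column, value in (("proto", proto), ("country", country), ("asn", asn)):
        if value is not None:
            # Column names come from the fixed tuple above, never from user input.
            clauses.append("%s = ?" % column)
            params.append(value)
    sql = ("SELECT ip, port, proto, country, asn FROM proxylist "
           "WHERE %s LIMIT ?" % " AND ".join(clauses))
    params.append(int(limit))
    return conn.execute(sql, params).fetchall()
```

Building the WHERE clause from a fixed column list keeps user-supplied values in bound parameters only, so the same helper could back both request-handling paths.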
Expand PROXY_SOURCES with proxifly, vakhov, prxchk, sunny9577,
officialputuid, hookzof, and iplocate lists. Add source_proto
and protos_working schema columns for protocol intelligence.
Remove completed proxy source expansion task from roadmap.
Architecture proposal to move proxy list fetching from master to
workers. Workers claim URLs, fetch lists, extract and test proxies,
report working proxies and URL health back to master. Trust-based
model: workers report working proxies only, no consensus needed.
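One pass of the proposed worker flow might look roughly like the sketch below. All six callables are injected and hypothetical; in the real system they would presumably wrap the claim-urls, report-urls, and report-proxies endpoints and the Tor fetch, and the shape of the report dictionaries is an assumption.

```python
def run_worker_cycle(claim_urls, fetch, extract, is_working,
                     report_proxies, report_urls):
    """One claim -> fetch -> extract -> test -> report pass."""
    url_reports = []
    working = []
    for url in claim_urls():
        try:
            body = fetch(url)
        except Exception as exc:
            # A failed fetch is still reported so the master can track URL health.
            url_reports.append({"url": url, "ok": False,
                                "error": str(exc), "found": 0})
            continue
        candidates = extract(body)
        # Trust-based model: only proxies that pass the worker's own test are kept.
        working.extend(p for p in candidates if is_working(p))
        url_reports.append({"url": url, "ok": True,
                            "error": None, "found": len(candidates)})
    report_proxies(working)
    report_urls(url_reports)
    return working, url_reports
```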
Migration functions for check_interval, working_ratio, avg_fetch_time,
last_worker, and yield_rate columns with sensible defaults.
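An idempotent migration for these columns could be written along the following lines. The table name `uris`, the column types, and the default values are assumptions; the commit only says the defaults are sensible.

```python
import sqlite3

# Hypothetical types and defaults for the new URL-scoring columns.
URL_COLUMNS = [
    ("check_interval", "INTEGER DEFAULT 3600"),
    ("working_ratio", "REAL DEFAULT 0.0"),
    ("avg_fetch_time", "REAL DEFAULT 0.0"),
    ("last_worker", "TEXT"),
    ("yield_rate", "REAL DEFAULT 0.0"),
]

def migrate_url_columns(conn, table="uris"):
    """Add any missing scoring columns; return the names that were added."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute("PRAGMA table_info(%s)" % table)}
    added = []
    for name, decl in URL_COLUMNS:
        if name not in existing:
            conn.execute("ALTER TABLE %s ADD COLUMN %s %s" % (table, name, decl))
            added.append(name)
    conn.commit()
    return added
```

Checking the existing columns first makes the migration safe to run on every startup.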
Replace ORDER BY RANDOM() in claim_urls with composite score:
age/interval ratio, yield bonus, quality bonus, error/stale penalties.
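A composite score of this shape could be computed as in the sketch below. The function name, the weights, and the caps are all hypothetical; only the five ingredients (age/interval ratio, yield bonus, quality bonus, error and stale penalties) come from the commit message.

```python
def url_score(age, check_interval, yield_rate, working_ratio,
              error_count, stale_count):
    """Higher score means the URL should be claimed sooner."""
    # How overdue the URL is: 1.0 means exactly one interval has elapsed.
    due = age / float(max(check_interval, 1))
    # Reward URLs that historically yield many proxies (capped at 1.0).
    yield_bonus = min(yield_rate / 100.0, 1.0)
    # Reward URLs whose proxies tend to actually work.
    quality_bonus = 0.5 * working_ratio
    # Penalize repeated errors and fetches that returned nothing new.
    penalty = 0.2 * error_count + 0.1 * stale_count
    return due + yield_bonus + quality_bonus - penalty
```

Unlike `ORDER BY RANDOM()`, a deterministic score lets overdue, productive URLs be claimed first while failing ones sink down the queue.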

Rewrite submit_url_reports with adaptive check_interval and EMA for
avg_fetch_time and yield_rate. Add working_ratio correlation in
submit_proxy_reports via pending count tracking.
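The EMA update and an adaptive interval can be illustrated with two small helpers. The smoothing factor, the halve/double rule, and the clamping bounds are assumptions chosen for the example, not values from the project.

```python
def ema(previous, sample, alpha=0.3):
    """Exponential moving average; the first sample seeds the average."""
    if previous is None:
        return sample
    return alpha * sample + (1.0 - alpha) * previous

def next_check_interval(current, found_new, minimum=300, maximum=86400):
    """Check productive URLs more often and unproductive ones less often."""
    factor = 0.5 if found_new else 2.0
    return int(min(max(current * factor, minimum), maximum))
```

For instance, `ema(10.0, 20.0, alpha=0.5)` gives `15.0`, and an interval of 3600 seconds becomes 1800 after a productive fetch or 7200 after an empty one.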
When URLs arrive as unicode (e.g. from JSON API responses), the unicode
type propagates through _parse_url into the SOCKS5 packet construction
in rocksock. Port bytes > 127, formatted via %c into a unicode string,
become non-ASCII characters, and the implicit ASCII encode performed by
socket sendall() then fails on them.

Encode URLs to UTF-8 bytes at fetch entry points to keep the entire
request pipeline in str (bytes) domain.
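The failure mode can be reproduced in a few lines. The sketch below is written for Python 3, where the ASCII encode has to be spelled out explicitly, whereas the commit describes it happening implicitly; the helper names are hypothetical and are not the rocksock functions.

```python
import struct

def to_bytes(url):
    """Encode a text URL to UTF-8 bytes; pass bytes through unchanged."""
    return url.encode("utf-8") if isinstance(url, str) else url

def port_field_text(port):
    # Mirrors the failing pattern: %c formatting into a text string.
    return "%c%c" % (port >> 8, port & 0xFF)

def port_field_bytes(port):
    # Big-endian 16-bit port as raw bytes, the form a SOCKS5 request carries.
    return struct.pack(">H", port)
```

Port 8080 is 0x1F90, so its low byte (0x90) is above 127: `port_field_text(8080).encode("ascii")` raises `UnicodeEncodeError`, while `port_field_bytes(8080)` is simply `b"\x1f\x90"`. Port 80 (0x0050) survives either way, which is why the bug only shows up for some ports.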
Replace raw podman run with declarative compose.yml per host type.
Master (odin) gets compose.master.yml, workers get compose.worker.yml.
- judge blocks record as neutral (judge_block category), not success;
  evaluate() filters them out so they affect neither pass nor fail count
- require HTTP/1.x response line for non-IRC checks; non-HTTP garbage
  (captive portals, proxy error pages) fails immediately
- add is_public_ip() rejecting RFC 1918, loopback, link-local, and
  multicast ranges from judge exit IP extraction
- remove 5 weak HEAD regex targets whose fingerprint headers appear on
  error pages and captive portals (p3p, X-XSS-Protection,
  x-frame-options, referrer-policy, X-UA-Compatible)
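Three of the checks above can be sketched as follows. This is an illustration under assumptions, not the watchd code: `is_public_ip` here is IPv4-only and uses an explicit network list, while `looks_like_http` and `count_results` are hypothetical names (the latter stands in for the neutral-filtering behaviour attributed to evaluate()).

```python
import ipaddress
import re

_NON_PUBLIC = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC 1918
    "127.0.0.0/8",     # loopback
    "169.254.0.0/16",  # link-local
    "224.0.0.0/4",     # multicast
)]

def is_public_ip(text):
    """True only for a parseable IPv4 address outside the rejected ranges."""
    try:
        ip = ipaddress.IPv4Address(text)
    except ValueError:
        return False
    return not any(ip in net for net in _NON_PUBLIC)

_STATUS_LINE = re.compile(rb"^HTTP/1\.[01] \d{3}")

def looks_like_http(response):
    """Require an HTTP/1.x status line at the very start of the response."""
    return bool(_STATUS_LINE.match(response))

def count_results(results):
    """Return (passed, failed), ignoring neutral judge_block entries."""
    counted = [r for r in results if r != "judge_block"]
    passed = sum(1 for r in counted if r == "success")
    return passed, len(counted) - passed
```

With these definitions `is_public_ip("172.16.0.1")` is false while `is_public_ip("172.32.0.1")` is true, and `count_results(["success", "judge_block", "fail"])` returns `(1, 1)`.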
Accepts none/false/off/disabled as checktype value, normalized to
'none' internally. When set, ssl_first is forced on and no Phase 2
check runs -- only successful TLS handshakes count as working.
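The normalization could look like the sketch below. The function name and the case-insensitive, whitespace-tolerant matching are assumptions; the four accepted spellings and the forced ssl_first come from the commit message.

```python
_DISABLED = {"none", "false", "off", "disabled"}

def normalize_checktype(value, ssl_first=False):
    """Return (checktype, ssl_first) after normalizing disabled spellings."""
    checktype = str(value).strip().lower()
    if checktype in _DISABLED:
        # No Phase 2 check will run, so the TLS handshake must come first.
        return "none", True
    return checktype, ssl_first
```

Here `normalize_checktype("off")` returns `("none", True)`, while any other value (for example a hypothetical `"head"`) is passed through with the caller's ssl_first setting.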
compose: rewrite master and worker compose files
Some checks failed
CI / syntax-check (pull_request) Failing after 0s
CI / syntax-check (push) Failing after 0s
CI / memory-leak-check (pull_request) Failing after 16s
CI / memory-leak-check (push) Successful in 16s
fab1e1d110
Drop deprecated version key, add SELinux volume labels, SIGTERM
handling with 30s grace period, configurable master URL via
PPF_MASTER_URL env var, and usage documentation in headers.
username merged commit fab1e1d110 into master 2026-02-17 17:39:49 +00:00
username deleted branch feature/worker-driven-discovery 2026-02-17 17:39:49 +00:00
Reference: username/ppf#1