feat: worker-driven discovery and validation tightening #1

Merged
username merged 24 commits from feature/worker-driven-discovery into master 2026-02-17 17:39:49 +00:00

24 Commits

Author SHA1 Message Date
Username
fab1e1d110 compose: rewrite master and worker compose files
Drop deprecated version key, add SELinux volume labels, SIGTERM
handling with 30s grace period, configurable master URL via
PPF_MASTER_URL env var, and usage documentation in headers.
2026-02-17 18:37:49 +01:00
Username
716d60898b config: allow checktype = none to disable secondary check
Accepts none/false/off/disabled as checktype value, normalized to
'none' internally. When set, ssl_first is forced on and no Phase 2
check runs -- only successful TLS handshakes count as working.
2026-02-17 18:37:44 +01:00
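
For commit 716d60898b above, the normalization it describes could look roughly like the sketch below. The function name `normalize_checktype`, the case-insensitive matching, and the tuple return value are illustrative assumptions, not taken from the ppf config module.

```python
# Values the commit message lists as meaning "no secondary check".
_DISABLED_VALUES = {"none", "false", "off", "disabled"}


def normalize_checktype(value, ssl_first=False):
    """Return (checktype, ssl_first) after normalization.

    Hypothetical helper: any disabled spelling becomes 'none' and
    forces ssl_first on, as the commit message describes.
    """
    checktype = str(value).strip().lower()
    if checktype in _DISABLED_VALUES:
        return "none", True
    return checktype, ssl_first
```

With this shape, `normalize_checktype("Off")` yields `("none", True)`, while any other value passes through with the caller's `ssl_first` setting unchanged.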
Username
2e3ce149f9 watchd: tighten secondary check validation
- record judge blocks as neutral (judge_block category), not success;
  evaluate() filters them out so they count toward neither pass nor fail
- require HTTP/1.x response line for non-IRC checks; non-HTTP garbage
  (captive portals, proxy error pages) fails immediately
- add is_public_ip() rejecting RFC 1918, loopback, link-local, and
  multicast ranges from judge exit IP extraction
- remove 5 weak HEAD regex targets whose fingerprint headers appear on
  error pages and captive portals (p3p, X-XSS-Protection,
  x-frame-options, referrer-policy, X-UA-Compatible)
2026-02-17 18:37:38 +01:00
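
Two of the checks in commit 2e3ce149f9 above can be sketched as follows. Only the name `is_public_ip` and the list of rejected ranges come from the commit message; the IPv4-only handling, the helper `has_http_status_line`, and the exact regular expression are assumptions. The sketch is written for Python 3 and uses the standard `ipaddress` module, which may differ from what watchd does.

```python
import ipaddress
import re

# Ranges named in the commit message: RFC 1918, loopback, link-local, multicast.
_NON_PUBLIC_NETS = [ipaddress.ip_network(net) for net in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC 1918
    "127.0.0.0/8",                                     # loopback
    "169.254.0.0/16",                                  # link-local
    "224.0.0.0/4",                                     # multicast
)]

# Assumed pattern for an HTTP/1.x status line at the start of a response.
_STATUS_LINE = re.compile(rb"^HTTP/1\.\d \d{3}")


def is_public_ip(text):
    """Return True only for IPv4 addresses outside the ranges above."""
    try:
        ip = ipaddress.IPv4Address(text)
    except ValueError:
        return False  # not a parseable IPv4 address
    return not any(ip in net for net in _NON_PUBLIC_NETS)


def has_http_status_line(response):
    """Return True if the raw response bytes begin with an HTTP/1.x status line."""
    return _STATUS_LINE.match(response) is not None
```

A captive portal that answers with a bare HTML page fails `has_http_status_line`, and a judge that echoes a private address such as `192.168.1.1` fails `is_public_ip`.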
Username
1236ddbd2d add compose files for container management
Replace raw podman run with declarative compose.yml per host type.
Master (odin) gets compose.master.yml, workers get compose.worker.yml.
2026-02-17 18:17:12 +01:00
Username
0311abb46a fetch: encode unicode URLs to bytes before HTTP/SOCKS ops
When URLs arrive as unicode (e.g. from JSON API responses), the unicode
type propagates through _parse_url into the SOCKS5 packet construction
in rocksock. Port bytes > 127 formatted via %c in a unicode string
produce non-ASCII characters, which then fail the implicit ASCII
encode in socket sendall().

Encode URLs to UTF-8 bytes at the fetch entry points to keep the entire
request pipeline in the str (bytes) domain.
2026-02-17 16:43:26 +01:00
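
The fix in commit 0311abb46a above can be illustrated with a short sketch. The commit message reads as Python 2 code (unicode versus str); the version below is a Python 3 analogue, and the helper names `to_bytes` and `pack_port` are hypothetical. It also shows why port bytes are a problem: port 8080 is 0x1F90, and the low byte 0x90 is above 127.

```python
import struct


def to_bytes(url, encoding="utf-8"):
    """Encode a text URL to bytes at the entry point; pass bytes through."""
    if isinstance(url, str):
        return url.encode(encoding)
    return url


def pack_port(port):
    """Pack a port as two big-endian bytes, as SOCKS5 requests expect.

    Working in bytes avoids the text-formatting path where a byte
    value above 127 becomes a non-ASCII character.
    """
    return struct.pack(">H", port)
```

Encoding once at the boundary means downstream code never has to guess which string type it holds.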
Username
e74782ad3f ppf: fix worker_id undefined when using --worker-key 2026-02-17 16:15:04 +01:00
Username
c710555aad ppf: pass url scoring config to httpd module 2026-02-17 15:20:15 +01:00
Username
c5287073bf httpd: add score-based url scheduling with EMA tracking
Replace ORDER BY RANDOM() in claim_urls with composite score:
age/interval ratio, yield bonus, quality bonus, error/stale penalties.

Rewrite submit_url_reports with adaptive check_interval and EMA for
avg_fetch_time and yield_rate. Add working_ratio correlation in
submit_proxy_reports via pending count tracking.
2026-02-17 15:20:07 +01:00
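
As a rough illustration of commit c5287073bf above, the sketch below shows an exponential moving average update and a composite score built from the terms the message lists. The weights, the smoothing factor, and the function signatures are assumptions chosen for readability; the actual httpd scoring may combine these terms differently.

```python
def ema(previous, sample, alpha=0.3):
    """Exponential moving average; the first sample seeds the average."""
    if previous is None:
        return float(sample)
    return alpha * sample + (1.0 - alpha) * previous


def url_score(age, check_interval, yield_rate=0.0, working_ratio=0.0,
              error_count=0, stale_count=0):
    """Composite score: higher means the URL should be claimed sooner.

    All weights below are hypothetical.
    """
    score = age / float(max(check_interval, 1))  # age/interval ratio
    score += 1.0 * yield_rate                    # yield bonus
    score += 0.5 * working_ratio                 # quality bonus
    score -= 0.25 * error_count                  # error penalty
    score -= 0.25 * stale_count                  # stale penalty
    return score
```

Sorting by such a score in descending order replaces `ORDER BY RANDOM()`: URLs that are overdue and historically productive rise to the top, while repeatedly failing ones sink.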
Username
66441f9292 dbs: add url scoring columns to uris table
Migration functions for check_interval, working_ratio, avg_fetch_time,
last_worker, and yield_rate columns with sensible defaults.
2026-02-17 15:19:59 +01:00
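
A migration like the one in commit 66441f9292 above is often written as an idempotent "add column if missing" pass. The sketch assumes an SQLite backend, which the commit does not state; the table name `uris` and the column names come from the commit message, while the column types and default values are guesses.

```python
import sqlite3

# Column names from the commit message; types and defaults are assumed.
URL_SCORING_COLUMNS = [
    ("check_interval", "INTEGER DEFAULT 3600"),
    ("working_ratio", "REAL DEFAULT 0.0"),
    ("avg_fetch_time", "REAL DEFAULT 0.0"),
    ("last_worker", "TEXT"),
    ("yield_rate", "REAL DEFAULT 0.0"),
]


def migrate_url_scoring(conn):
    """Add any missing scoring columns to uris; return the names added."""
    # PRAGMA table_info rows carry the column name at index 1.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(uris)")}
    added = []
    for name, declaration in URL_SCORING_COLUMNS:
        if name not in existing:
            conn.execute("ALTER TABLE uris ADD COLUMN %s %s" % (name, declaration))
            added.append(name)
    conn.commit()
    return added
```

Running the function twice is safe: the second call finds every column present and adds nothing.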
Username
862eeed5c8 ppf: add worker_v2_main() for URL-driven discovery 2026-02-17 14:23:58 +01:00
Username
0685c2bc4c ppf: add HTTP client functions for V2 worker endpoints 2026-02-17 14:23:44 +01:00
Username
4a5210f9f7 config: add worker V2 config items and --worker-v2 flag 2026-02-17 14:23:13 +01:00
Username
18c7118ed8 docs: update worker hosts to cassius, edge, sentinel 2026-02-17 14:05:29 +01:00
Username
6c111af630 httpd: add /api/report-proxies endpoint 2026-02-17 13:44:57 +01:00
Username
66157b5216 httpd: add /api/report-urls endpoint 2026-02-17 13:43:56 +01:00
Username
3162c65549 httpd: add /api/claim-urls endpoint 2026-02-17 13:42:59 +01:00
Username
5197c3b7e6 httpd: pass url database to api server 2026-02-17 13:42:01 +01:00
Username
da832d94b7 dbs: add last_seen column to proxylist 2026-02-17 13:41:25 +01:00
Username
96e6f06e0d docs: add worker-driven discovery design doc
Architecture proposal to move proxy list fetching from master to
workers. Workers claim URLs, fetch lists, extract and test proxies,
report working proxies and URL health back to master. Trust-based
model: workers report working proxies only, no consensus needed.
2026-02-17 13:32:42 +01:00
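
The flow proposed in the design doc commit above (claim, fetch, extract, test, report) can be sketched as a single worker cycle. Every callable is injected, so the sketch makes no claim about the real ppf functions or the payloads of `/api/claim-urls`, `/api/report-urls`, and `/api/report-proxies`; the report dictionary keys are hypothetical.

```python
def worker_cycle(claim_urls, fetch_list, extract_proxies, test_proxy,
                 report_urls, report_proxies):
    """Run one claim/fetch/test/report cycle and return what was reported."""
    url_reports = []
    working = []
    for url in claim_urls():
        try:
            body = fetch_list(url)
        except Exception as exc:
            # A failed fetch is still reported so the master can track URL health.
            url_reports.append({"url": url, "ok": False, "error": str(exc)})
            continue
        candidates = extract_proxies(body)
        good = [proxy for proxy in candidates if test_proxy(proxy)]
        working.extend(good)
        url_reports.append({"url": url, "ok": True,
                            "found": len(candidates), "working": len(good)})
    report_urls(url_reports)
    # Trust-based model: only working proxies are sent back.
    report_proxies(working)
    return url_reports, working
```

Keeping the network calls behind callables makes the cycle easy to exercise with stubs before pointing it at a live master.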
Username
c19959cda2 dbs: add 19 proxy sources from 7 new repositories
Expand PROXY_SOURCES with proxifly, vakhov, prxchk, sunny9577,
officialputuid, hookzof, and iplocate lists. Add source_proto
and protos_working schema columns for protocol intelligence.
Remove completed proxy source expansion task from roadmap.
2026-02-17 13:13:23 +01:00
Username
e6b736a577 docs: remove completed items from TODO and ROADMAP 2026-02-17 12:06:49 +01:00
Username
00afd141ae httpd: add /proxies/all endpoint for unlimited proxy list 2026-02-15 12:27:55 +01:00
Username
6ba4b3e1e9 httpd: exclude untested proxies from results
Filter out entries with proto IS NULL from the /proxies and /proxies/count
endpoints. These proxies were added to the database but never validated,
and they leaked into results with null proto, null asn, and zero latency.
2026-02-15 04:02:00 +01:00
Username
2960458825 httpd: fix wsgi /proxies route ignoring query params
The WSGI _handle_route had a hardcoded LIMIT 100 query for /proxies,
ignoring limit, proto, country, asn, and format parameters. Align
with the BaseHTTPRequestHandler path that already supported them.
2026-02-15 03:58:57 +01:00
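
For commit 2960458825 above, honoring the query parameters in a WSGI handler might look like the sketch below. The parameter names and the default limit of 100 come from the commit message; the function name `build_proxies_query`, the `proxylist` table, the default format `json`, and the returned tuple are assumptions.

```python
from urllib.parse import parse_qs


def build_proxies_query(query_string, default_limit=100):
    """Translate a /proxies query string into (sql, args, format)."""
    params = parse_qs(query_string)
    where, args = [], []
    # Column names are fixed here, never taken from user input.
    for column in ("proto", "country", "asn"):
        if column in params:
            where.append("%s = ?" % column)
            args.append(params[column][0])
    try:
        limit = int(params.get("limit", [default_limit])[0])
    except ValueError:
        limit = default_limit  # fall back on malformed input
    sql = "SELECT * FROM proxylist"
    if where:
        sql += " WHERE " + " AND ".join(where)
    sql += " LIMIT ?"
    args.append(limit)
    output_format = params.get("format", ["json"])[0]
    return sql, args, output_format
```

In a WSGI application the query string would come from `environ.get("QUERY_STRING", "")`. Values are bound as parameters rather than interpolated, so user input cannot alter the SQL text.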