Commit Graph

447 Commits

Author SHA1 Message Date
Username
1236ddbd2d add compose files for container management
Replace raw podman run with declarative compose.yml per host type.
Master (odin) gets compose.master.yml, workers get compose.worker.yml.
2026-02-17 18:17:12 +01:00
Username
0311abb46a fetch: encode unicode URLs to bytes before HTTP/SOCKS ops
When URLs arrive as unicode (e.g. from JSON API responses), the unicode
type propagates through _parse_url into the SOCKS5 packet construction
in rocksock. Port bytes > 127 formatted via %c into a unicode string
produce non-ASCII characters, which then fail when socket sendall()
implicitly encodes the string as ASCII.

Encode URLs to UTF-8 bytes at the fetch entry points to keep the entire
request pipeline in the str (bytes) domain.
2026-02-17 16:43:26 +01:00
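
Note on 0311abb46a: a minimal sketch of the idea, not the project's code. The helper names to_bytes_url and socks5_port_bytes are hypothetical, and the sketch is written for Python 3, where the text type is str rather than unicode. It shows the two pieces the message describes: normalizing a URL to bytes at the entry point, and building the two-byte SOCKS5 port field as bytes so no implicit text encoding is involved.

```python
def to_bytes_url(url, encoding="utf-8"):
    """Return url as bytes; text input is encoded, bytes pass through unchanged."""
    if isinstance(url, bytes):
        return url
    return url.encode(encoding)


def socks5_port_bytes(port):
    """Pack a TCP port as two bytes in network (big-endian) order."""
    return bytes(bytearray([(port >> 8) & 0xFF, port & 0xFF]))


if __name__ == "__main__":
    # A URL as it might arrive from a JSON API response (text, non-ASCII path).
    print(to_bytes_url(u"http://example.com/caf\u00e9"))
    # Port 8080 has a low byte of 0x90, which is above 127.
    print(socks5_port_bytes(8080))
```

Encoding once at the entry point means every later layer only ever sees bytes, which is the property the commit message is after.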
Username
e74782ad3f ppf: fix worker_id undefined when using --worker-key 2026-02-17 16:15:04 +01:00
Username
c710555aad ppf: pass url scoring config to httpd module 2026-02-17 15:20:15 +01:00
Username
c5287073bf httpd: add score-based url scheduling with EMA tracking
Replace ORDER BY RANDOM() in claim_urls with composite score:
age/interval ratio, yield bonus, quality bonus, error/stale penalties.

Rewrite submit_url_reports with adaptive check_interval and EMA for
avg_fetch_time and yield_rate. Add working_ratio correlation in
submit_proxy_reports via pending count tracking.
2026-02-17 15:20:07 +01:00
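
Note on c5287073bf: a rough sketch of an EMA update and a composite score built from the terms the message lists. The function names, the smoothing factor, and every weight below are assumptions for illustration; the commit does not state the actual values or the exact formula.

```python
def ema_update(previous, sample, alpha=0.3):
    """Exponential moving average; the first sample seeds the average."""
    if previous is None:
        return float(sample)
    return alpha * sample + (1.0 - alpha) * previous


def url_score(age, check_interval, yield_rate, working_ratio,
              error_count, stale_count):
    """Composite score: higher means the URL should be claimed sooner.

    All weights are placeholders, not the project's values.
    """
    score = age / float(max(check_interval, 1))   # age/interval ratio
    score += 0.5 * yield_rate                     # yield bonus
    score += 0.5 * working_ratio                  # quality bonus
    score -= 0.25 * error_count                   # error penalty
    score -= 0.25 * stale_count                   # stale penalty
    return score


if __name__ == "__main__":
    avg = None
    for fetch_time in (2.0, 4.0, 3.0):
        avg = ema_update(avg, fetch_time, alpha=0.5)
    print(avg)
    print(url_score(600, 300, 0.0, 0.0, 0, 0))
```

Sorting candidate URLs by such a score in descending order gives a deterministic replacement for ORDER BY RANDOM(): overdue, productive URLs come first and repeatedly failing ones sink.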
Username
66441f9292 dbs: add url scoring columns to uris table
Migration functions for check_interval, working_ratio, avg_fetch_time,
last_worker, and yield_rate columns with sensible defaults.
2026-02-17 15:19:59 +01:00
Username
862eeed5c8 ppf: add worker_v2_main() for URL-driven discovery 2026-02-17 14:23:58 +01:00
Username
0685c2bc4c ppf: add HTTP client functions for V2 worker endpoints 2026-02-17 14:23:44 +01:00
Username
4a5210f9f7 config: add worker V2 config items and --worker-v2 flag 2026-02-17 14:23:13 +01:00
Username
18c7118ed8 docs: update worker hosts to cassius, edge, sentinel 2026-02-17 14:05:29 +01:00
Username
6c111af630 httpd: add /api/report-proxies endpoint 2026-02-17 13:44:57 +01:00
Username
66157b5216 httpd: add /api/report-urls endpoint 2026-02-17 13:43:56 +01:00
Username
3162c65549 httpd: add /api/claim-urls endpoint 2026-02-17 13:42:59 +01:00
Username
5197c3b7e6 httpd: pass url database to api server 2026-02-17 13:42:01 +01:00
Username
da832d94b7 dbs: add last_seen column to proxylist 2026-02-17 13:41:25 +01:00
Username
96e6f06e0d docs: add worker-driven discovery design doc
Architecture proposal to move proxy list fetching from master to
workers. Workers claim URLs, fetch lists, extract and test proxies,
report working proxies and URL health back to master. Trust-based
model: workers report working proxies only, no consensus needed.
2026-02-17 13:32:42 +01:00
Username
c19959cda2 dbs: add 19 proxy sources from 7 new repositories
Expand PROXY_SOURCES with proxifly, vakhov, prxchk, sunny9577,
officialputuid, hookzof, and iplocate lists. Add source_proto
and protos_working schema columns for protocol intelligence.
Remove completed proxy source expansion task from roadmap.
2026-02-17 13:13:23 +01:00
Username
e6b736a577 docs: remove completed items from TODO and ROADMAP 2026-02-17 12:06:49 +01:00
Username
00afd141ae httpd: add /proxies/all endpoint for unlimited proxy list 2026-02-15 12:27:55 +01:00
Username
6ba4b3e1e9 httpd: exclude untested proxies from results
Filter out entries with proto IS NULL from /proxies and /proxies/count
endpoints. These are proxies added to the database but never validated,
leaking into results with null proto, asn, and zero latency.
2026-02-15 04:02:00 +01:00
Username
2960458825 httpd: fix wsgi /proxies route ignoring query params
The WSGI _handle_route had a hardcoded LIMIT 100 query for /proxies,
ignoring limit, proto, country, asn, and format parameters. Align
with the BaseHTTPRequestHandler path that already supported them.
2026-02-15 03:58:57 +01:00
Username
92d6e57fb8 dockerfile: apply debian 10 security updates
- add debian-security archive repository
- run apt-get upgrade for all available patches
- upgrade pip/setuptools/wheel to latest py2.7 versions

reduces container vulnerabilities from 293 to 130
2026-01-18 09:14:48 +01:00
Username
d87ff73d95 httpd: remove memory profiling code
Remove objgraph/pympler imports, gc.get_objects() caching,
and memory_samples tracking. Keep basic RSS tracking for dashboard.
2026-01-17 19:25:33 +01:00
Username
12174b0d9d fetch: fix LRU cache for python 2 compatibility 2026-01-08 09:05:59 +01:00
Username
8b606efa6d docs: update project instructions 2026-01-08 09:05:44 +01:00
Username
ae0b11d60f docs: update roadmap with completed items 2026-01-08 09:05:39 +01:00
Username
5da5f3025d dashboard: update UI for queue status display 2026-01-08 09:05:34 +01:00
Username
2156319bad ppf: worker heartbeat includes thread count 2026-01-08 09:05:30 +01:00
Username
1cb7d93a5f proxywatchd: add ssl_only mode and schedule improvements
- ssl_only mode: skip secondary check when SSL handshake fails
- _build_due_sql(): unified query for proxies due for testing
- working_checktime/fail_retry_interval: new schedule formula
- fail_retry_backoff: linear backoff option for failing proxies
2026-01-08 09:05:25 +01:00
Username
8272cf06e0 config: add verification and schedule settings
- [verification] section: enabled, threads, batch_size, interval, max_queue
- working_checktime: retest interval for working proxies (default: 300s)
- fail_retry_interval: retry interval for failing proxies (default: 60s)
- fail_retry_backoff: linear backoff for failures (default: True)
- ssl_only: skip secondary check on SSL failure (default: False)
2026-01-08 09:05:20 +01:00
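
Note on 8272cf06e0: one plausible reading of how these settings could combine into a retest delay. The function name next_check_delay and the formula itself are assumptions; only the setting names and defaults come from the message.

```python
def next_check_delay(consecutive_fails,
                     working_checktime=300,
                     fail_retry_interval=60,
                     fail_retry_backoff=True):
    """Seconds to wait before testing a proxy again (assumed formula)."""
    if consecutive_fails <= 0:
        # Working proxy: retest at the fixed working interval.
        return working_checktime
    if fail_retry_backoff:
        # Linear backoff: the delay grows by one interval per failure.
        return fail_retry_interval * consecutive_fails
    # Backoff disabled: constant retry interval.
    return fail_retry_interval


if __name__ == "__main__":
    for fails in (0, 1, 2, 3):
        print(fails, next_check_delay(fails))
```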
Username
64b3629585 dbs: add CDN filtering and verification tables
- CDN_PREFIXES: filter Cloudflare, Fastly, Akamai, CloudFront, Google
- is_cdn_ip(): check if IP belongs to known CDN ranges
- insert_proxies(): skip CDN IPs with count in log message
- verification tables: worker_results, verification_queue, worker_trust
- queue_verification(): add proxies for manager re-testing
- get_verification_stats(): queue size and trigger breakdown
- get_all_worker_trust(): trust scores for all workers
2026-01-08 09:05:13 +01:00
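
Note on 64b3629585: a sketch of prefix-based CDN filtering, assuming the prefixes are plain dotted-string prefixes. The names CDN_PREFIXES and is_cdn_ip come from the message, but their bodies here are guesses. The prefixes are placeholders taken from documentation address ranges, not real CDN ranges, and split_cdn is a hypothetical helper.

```python
# Placeholder prefixes (documentation ranges), NOT real CDN ranges.
CDN_PREFIXES = ("192.0.2.", "198.51.100.")


def is_cdn_ip(ip):
    """True if the dotted-quad string starts with a known prefix."""
    return ip.startswith(CDN_PREFIXES)


def split_cdn(proxies):
    """Drop 'ip:port' entries on CDN ranges; return (kept, skipped_count)."""
    kept = [p for p in proxies if not is_cdn_ip(p.split(":", 1)[0])]
    return kept, len(proxies) - len(kept)


if __name__ == "__main__":
    kept, skipped = split_cdn(["192.0.2.10:8080", "203.0.113.5:1080"])
    print("kept %d, skipped %d CDN entries" % (len(kept), skipped))
```

The trailing dot in each prefix matters: without it, "192.0.2" would also match "192.0.25.1".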
Username
721a602dd9 misc: simplify tor proxy URL to avoid circuit exhaustion 2026-01-08 09:05:03 +01:00
Username
39731e25b3 docs: document batch API endpoint 2026-01-08 09:03:01 +01:00
Username
6cc903c924 httpd: add batch API endpoint and worker improvements
- /api/dashboard: single endpoint returning stats + workers + countries
- dashboard.js: use batch endpoint (2 requests -> 1 per poll cycle)
- _get_workers_data: refactored from /api/workers for code reuse
- worker verification: trust scoring based on result accuracy
- fair distribution: dynamic batch sizing based on queue and workers
- queue tracking: session progress, due/claimed/pending counts
2026-01-08 09:02:56 +01:00
Username
44604f1ce3 tests: add unit test infrastructure
pytest-based test suite with fixtures for database testing.
Covers misc.py utilities, dbs.py operations, and fetch.py validation.
Includes mock_network.py for future network testing.
2026-01-08 01:42:38 +01:00
Username
c1ec5d593b worker: check tor every 30s instead of exponential backoff 2025-12-28 18:41:05 +01:00
Username
966c0d641d docs: mark low-effort tasks as completed 2025-12-28 17:25:06 +01:00
Username
dfb4739b66 proxywatchd: add __slots__ to hot objects for memory reduction 2025-12-28 17:23:51 +01:00
Username
480a652889 httpd: add stats export endpoint with CSV/JSON support 2025-12-28 17:23:44 +01:00
Username
1f09c75345 docs: add database context manager to completed work 2025-12-28 17:13:24 +01:00
Username
5a797a9b97 proxywatchd: use context manager for all DB operations 2025-12-28 17:11:56 +01:00
Username
9e2fc3e09d docs: update roadmap and todo with recent changes 2025-12-28 17:00:52 +01:00
Username
e758ce7178 dashboard: add keyboard shortcuts and optimize polling
- fetch.py: convert proxy validation cache to LRU with OrderedDict
  - thread-safe lock, move_to_end() on hits, evict oldest when full
- dashboard.js: add keyboard shortcuts (r=refresh, 1-9=tabs, t=theme, p=pause)
- dashboard.js: skip chart rendering for inactive tabs (reduces CPU)
2025-12-28 16:52:52 +01:00
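
Note on e758ce7178: a minimal sketch of the LRU pattern described for the proxy validation cache, assuming Python 3. The class name LRUCache and its methods are hypothetical. OrderedDict.move_to_end() exists only from Python 3.2 onward, so this exact form would not run on Python 2.7, which may be what the later commit 12174b0d9d addresses.

```python
import threading
from collections import OrderedDict


class LRUCache(object):
    """Bounded mapping that evicts the least recently used key."""

    def __init__(self, max_size=1024):
        self._data = OrderedDict()
        self._lock = threading.Lock()
        self._max_size = max_size

    def get(self, key, default=None):
        with self._lock:
            if key not in self._data:
                return default
            # A hit marks the key as most recently used.
            self._data.move_to_end(key)
            return self._data[key]

    def put(self, key, value):
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            # Evict from the oldest end until within bounds.
            while len(self._data) > self._max_size:
                self._data.popitem(last=False)


if __name__ == "__main__":
    cache = LRUCache(max_size=2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")        # "a" becomes most recent
    cache.put("c", 3)     # evicts "b"
    print(cache.get("b"), cache.get("a"), cache.get("c"))
```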
Username
18ae73bfb8 httpd: add worker test rate tracking
Track per-worker test rates using 120s sliding window.
Display combined rate in dashboard and individual rates
in worker cards.
2025-12-28 16:43:53 +01:00
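
Note on 18ae73bfb8: a sketch of a per-worker rate over a 120s sliding window. The class name SlidingRate and the choice to divide by the full window length are assumptions; the commit only states the window size.

```python
import time
from collections import deque


class SlidingRate(object):
    """Tests per second over the last `window` seconds."""

    def __init__(self, window=120.0):
        self.window = window
        self._events = deque()  # (timestamp, count) pairs, oldest first

    def _trim(self, now):
        # Drop events that have fallen out of the window.
        cutoff = now - self.window
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def record(self, count, now=None):
        now = time.time() if now is None else now
        self._events.append((now, count))
        self._trim(now)

    def rate(self, now=None):
        now = time.time() if now is None else now
        self._trim(now)
        return sum(c for _, c in self._events) / self.window


if __name__ == "__main__":
    r = SlidingRate(120.0)
    r.record(60, now=0.0)
    r.record(60, now=60.0)
    print(r.rate(now=100.0))
```

With one tracker per worker, a combined rate can be taken as the sum of the individual rates.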
Username
2bc00d3ebd worker: check tor before claiming work
2025-12-28 16:09:40 +01:00
Username
0d7d2dce70 refactor: extract modules from proxywatchd.py
Extract focused modules to reduce proxywatchd.py complexity:
- stats.py: JudgeStats, Stats, regexes, ssl_targets (557 lines)
- mitm.py: MITMCertStats, cert extraction functions (239 lines)
- dns.py: socks4_resolve with TTL caching (86 lines)
- job.py: PriorityJobQueue, calculate_priority (103 lines)

proxywatchd.py reduced from 2488 to 1591 lines (-36%).
2025-12-28 15:45:24 +01:00
Username
35f24bb8b0 dashboard: refactor layout and add worker stats
2025-12-28 15:19:50 +01:00
Username
e89db20f5b scraper: add Bing and Yahoo engines 2025-12-28 15:19:39 +01:00
Username
0fbfee2855 httpd: add worker registration and distributed testing API 2025-12-28 15:19:08 +01:00
Username
3b361916fa fetch, dbs: minor refactoring 2025-12-28 15:18:42 +01:00