119 Commits

Author SHA1 Message Date
Username
82c909d7c0 rename --worker-v2 to --worker
No V1 means no need for the suffix. Update flag, function name,
compose command, log messages, and docs.
2026-02-17 22:30:09 +01:00
Username
2782e6d754 ppf: remove V1 worker functions and main loop
Drop worker_get_work(), worker_submit_results(), and the entire
worker_main() V1 loop. Rewire --register to use worker_v2_main().
2026-02-17 22:10:38 +01:00
Username
dfcd8f0c00 add test provenance columns and worker report fields
Add last_check/last_target columns to proxylist schema with migration.
Include checktype and target in V2 worker report payload.
2026-02-17 21:06:21 +01:00
Username
e74782ad3f ppf: fix worker_id undefined when using --worker-key 2026-02-17 16:15:04 +01:00
Username
c710555aad ppf: pass url scoring config to httpd module 2026-02-17 15:20:15 +01:00
Username
862eeed5c8 ppf: add worker_v2_main() for URL-driven discovery 2026-02-17 14:23:58 +01:00
Username
0685c2bc4c ppf: add HTTP client functions for V2 worker endpoints 2026-02-17 14:23:44 +01:00
Username
5197c3b7e6 httpd: pass url database to api server 2026-02-17 13:42:01 +01:00
Username
2156319bad ppf: worker heartbeat includes thread count 2026-01-08 09:05:30 +01:00
Username
c1ec5d593b worker: check tor every 30s instead of exponential backoff 2025-12-28 18:41:05 +01:00
Username
2bc00d3ebd worker: check tor before claiming work
All checks were successful
CI / syntax-check (push) Successful in 3s
CI / memory-leak-check (push) Successful in 11s
2025-12-28 16:09:40 +01:00
Username
f4286ea515 ppf: remove num_targets param (removed in phase 2)
All checks were successful
CI / syntax-check (push) Successful in 3s
CI / memory-leak-check (push) Successful in 11s
2025-12-28 15:16:52 +01:00
Username
d219cc567f phase 2: code cleanup and simplification
- Remove unused result_queue from WorkerThread and worker mode
- Remove num_targets abstraction, simplify to single-target mode
- Add _db_context() context manager for database connections
- Refactor 5 call sites to use context manager (finish, init, cleanup_stale, periodic saves)
- Mark _prep_db/_close_db as deprecated
- Add __version__ = '2.0.0' to ppf.py
- Add thread spawn stagger (0-100ms) in worker mode for Tor-friendly startup
All checks were successful
CI / syntax-check (push) Successful in 3s
CI / memory-leak-check (push) Successful in 11s
2025-12-28 14:31:37 +01:00
Username
72a2dcdaf4 ppf: add worker mode with distributed testing
- Add --worker mode for distributed proxy testing
- Workers claim batches from manager, test via local Tor, submit results
- Add --register to register new workers with manager
- Add thread spawn stagger (0-100ms) to avoid overwhelming Tor
- Verify Tor connectivity before claiming work
- Add heartbeat and batch timeout handling
- Track worker profiling state for dashboard display
All checks were successful
CI / syntax-check (push) Successful in 3s
CI / memory-leak-check (push) Successful in 11s
2025-12-28 14:12:59 +01:00
Username
7232846b0f ppf: add --reset flag to clear all state 2025-12-26 20:57:15 +01:00
Username
a20b5525f0 ppf: handle confidence field in proxy tuples 2025-12-26 19:34:22 +01:00
Username
269fed55ff refactor core modules, integrate network stats 2025-12-25 11:13:20 +01:00
Username
9360c35add ppf: add format_duration helper and stale log improvements
- Add format_duration() for compact time display
- Improve stale proxy logging with duration info
2025-12-24 00:20:13 +01:00
Username
68a34f2638 fetch: detect proxy protocol from source URL path
- detect_proto_from_path() infers socks4/socks5/http from URL
- extract_proxies() now returns (address, proto) tuples
- ppf.py updated to handle protocol-tagged proxies
- profiler signal handler for SIGTERM stats dump
2025-12-23 17:23:17 +01:00
Username
267035802a ppf: reset stale_count when content hash changes 2025-12-22 00:05:06 +01:00
Username
f382a4ab6a ppf: add content hash for duplicate proxy list detection 2025-12-22 00:03:12 +01:00
Username
747e6dd7aa ppf: improve exception handling and logging 2025-12-21 23:37:57 +01:00
Username
e24f68500c style: normalize indentation and improve code style
- convert tabs to 4-space indentation
- add docstrings to modules and classes
- remove unused import (copy)
- use explicit object inheritance
- use 'while True' over 'while 1'
- use 'while args' over 'while len(args)'
- use '{}' over 'dict()'
- consistent string formatting
- Python 2/3 compatible Queue import
2025-12-20 23:18:45 +01:00
Username
4780b6f095 fetch: consolidate extract_proxies into single implementation 2025-12-20 22:50:39 +01:00
Username
c759f7197e ppf: use shared proxy cache from fetch module 2025-12-20 22:28:42 +01:00
Username
1d865d5250 ppf: use soup_parser instead of direct bs4 import 2025-12-20 17:33:40 +01:00
Username
57a7687b08 ppf: remove dead http server code 2025-12-20 16:46:08 +01:00
Your Name
15ff16b8d6 force py2 usage 2021-10-30 07:13:04 +02:00
Your Name
ee481ea31e ppf: make scraper use extra proxies if available 2021-07-27 22:36:15 +02:00
Your Name
6b6cd94cec spaces to tabs 2021-06-27 12:31:15 +02:00
Your Name
d3d83e1d90 changes 2021-05-12 08:06:03 +02:00
Your Name
cae6f75643 changes 2021-05-02 00:22:12 +02:00
Your Name
1a4d51f08c ppf: play nice with cpu 2021-02-10 22:26:27 +01:00
Your Name
60c78be3fb import new url as bulk list, misc cleansing 2021-02-06 23:25:12 +01:00
Your Name
7e91ae5237 changes 2021-02-06 21:50:08 +01:00
Your Name
68394da9ab misc changes and fixes and 2021-02-06 15:36:14 +01:00
Your Name
b29c734002 fix: url → self.url, make thread option configurable 2021-02-06 14:33:44 +01:00
Your Name
5965312a9a make leeching multithreaded, misc changes 2021-02-06 14:30:07 +01:00
Your Name
dd3d3c3518 fix: always check if is_bad_url 2021-02-06 12:20:34 +01:00
Your Name
01bded472f tabs to space 2021-02-06 12:14:22 +01:00
Your Name
78b29a1187 some changes 2021-01-24 03:52:56 +01:00
Mickaël Serneels
eeedf9d0a1 extract url only from same domains ? (default: False)
setting this option will make ppf not follow external links when extracting uris
2019-05-14 21:24:29 +02:00
Mickaël Serneels
b226bc0b03 check if bad url *after* building the url 2019-05-14 19:31:19 +02:00
Mickaël Serneels
eeae849e12 space2tab 2019-05-14 19:29:30 +02:00
Mickaël Serneels
bcaf7af0e7 extract_urls(): only when stale_count = 0 2019-05-13 23:49:35 +02:00
Mickaël Serneels
e2122a27d9 ppf: strip extracted uris 2019-05-13 23:48:55 +02:00
Mickaël Serneels
225b76462c import_from_file: don't add empty url 2019-05-13 23:48:55 +02:00
Mickaël Serneels
c241f1a766 make use of dbs.insert_urls() 2019-05-01 23:19:50 +02:00
Mickaël Serneels
c8d594fb73 add url extraction
urls get extracted from a webpage when the page contains proxies

this allows ppf to "learn" as many links as possible from a working website
2019-05-01 22:58:23 +02:00
Mickaël Serneels
0fb706eeae clean code 2019-05-01 17:43:29 +02:00