Username
98b232f3d3
fetch: add short-circuit guards to extraction functions
Skip expensive regex scans when content lacks required markers:
- extract_auth_proxies: skip if no '@' in content
- extract_proxies_from_table: skip if no '<table' tag
- extract_proxies_from_json: skip if no '{' or '['
- Hoist table regexes to module-level precompiled constants
2026-02-22 13:50:29 +01:00
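Note on the guard pattern described in this commit: a minimal sketch, assuming a table layout with the IP in the first cell and the port in the second. The regex constants and the cell layout are hypothetical; only the function name and the '<table' check come from the commit message.

```python
import re

# Hypothetical module-level constants, compiled once at import time.
# The project's real patterns are not shown in this log.
TABLE_ROW_RE = re.compile(r'<tr[^>]*>(.*?)</tr>', re.I | re.S)
TABLE_CELL_RE = re.compile(r'<td[^>]*>(.*?)</td>', re.I | re.S)
IP_RE = re.compile(r'^\d{1,3}(?:\.\d{1,3}){3}$')
PORT_RE = re.compile(r'^\d{1,5}$')


def extract_proxies_from_table(content):
    """Return 'ip:port' strings found in HTML table rows."""
    # Short-circuit guard: a substring test is much cheaper than the
    # regex scans below, so bail out when no table can be present.
    if '<table' not in content:
        return []
    proxies = []
    for row in TABLE_ROW_RE.findall(content):
        cells = [c.strip() for c in TABLE_CELL_RE.findall(row)]
        # Assumed layout: first cell is the IP, second cell is the port.
        if len(cells) >= 2 and IP_RE.match(cells[0]) and PORT_RE.match(cells[1]):
            proxies.append('%s:%s' % (cells[0], cells[1]))
    return proxies
```

The same shape would apply to the '@' and '{' / '[' guards: test for the marker first, return an empty result if it is absent.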
Username
0311abb46a
fetch: encode unicode URLs to bytes before HTTP/SOCKS ops
When URLs arrive as unicode (e.g. from JSON API responses), the unicode
type propagates through _parse_url into the SOCKS5 packet construction
in rocksock. Port bytes > 127 formatted via %c into a unicode string
become non-ASCII characters, which then fail the implicit ASCII encode
in socket sendall().
Encode URLs to UTF-8 bytes at the fetch entry points to keep the entire
request pipeline in the str (bytes) domain.
2026-02-17 16:43:26 +01:00
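Note on the encoding step described in this commit: a minimal sketch of a normalizing helper that a fetch entry point could call first. The helper name to_bytes_url is hypothetical and is not taken from fetch.py; the version check exists only so the sketch also runs on Python 3.

```python
import sys

PY2 = sys.version_info[0] == 2
# On Python 2 the text type is `unicode`; on Python 3 it is `str`.
# The unused branch of the conditional is never evaluated.
text_type = unicode if PY2 else str  # noqa: F821


def to_bytes_url(url):
    """Return url as UTF-8 bytes; byte strings pass through unchanged.

    Hypothetical helper, intended to be called once at each entry point
    so that no unicode object reaches the socket layer.
    """
    if isinstance(url, text_type):
        return url.encode('utf-8')
    return url
```

Encoding once at the boundary avoids having to audit every string-formatting site further down the pipeline.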
Username
12174b0d9d
fetch: fix LRU cache for python 2 compatibility
2026-01-08 09:05:59 +01:00
Username
e758ce7178
dashboard: add keyboard shortcuts and optimize polling
- fetch.py: convert proxy validation cache to LRU with OrderedDict
  - thread-safe lock, move_to_end() on hits, evict oldest when full
- dashboard.js: add keyboard shortcuts (r=refresh, 1-9=tabs, t=theme, p=pause)
- dashboard.js: skip chart rendering for inactive tabs (reduces CPU)
2025-12-28 16:52:52 +01:00
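Note on the LRU cache described in this commit: a minimal sketch of an OrderedDict-based cache with a lock. The class name, method names, and default size are hypothetical. It uses pop-and-reinsert instead of move_to_end() so that it also runs on Python 2, where OrderedDict has no move_to_end(); whether the later compatibility fix took this route is not stated in the log.

```python
import threading
from collections import OrderedDict


class LRUCache(object):
    """Small thread-safe LRU cache (illustrative, not the fetch.py code)."""

    def __init__(self, maxsize=1024):
        self._data = OrderedDict()
        self._lock = threading.Lock()
        self._maxsize = maxsize

    def get(self, key, default=None):
        with self._lock:
            if key not in self._data:
                return default
            # Re-insert on a hit so the key becomes the most recent entry.
            value = self._data.pop(key)
            self._data[key] = value
            return value

    def put(self, key, value):
        with self._lock:
            self._data.pop(key, None)
            self._data[key] = value
            # Evict the oldest entry once the cache is over capacity.
            if len(self._data) > self._maxsize:
                self._data.popitem(last=False)
```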
Username
3b361916fa
fetch, dbs: minor refactoring
2025-12-28 15:18:42 +01:00
Username
d2bd7d4f34
fetch: retry with different Tor circuit on failure
CI / syntax-check (push) Successful in 3s
CI / memory-leak-check (push) Successful in 12s
2025-12-26 20:57:28 +01:00
Username
906d1b33ae
fetch: cache is_usable_proxy results
CI / syntax-check (push) Successful in 3s
CI / memory-leak-check (push) Successful in 11s
2025-12-26 20:04:01 +01:00
Username
481dc514fb
fetch: add IPv6, auth proxy, and confidence scoring support
2025-12-26 19:13:36 +01:00
Username
272eba0f05
scraper: reuse connections, cycle circuit on block
CI / syntax-check (push) Successful in 6s
CI / memory-leak-check (push) Successful in 15s
2025-12-25 19:26:23 +01:00
Username
68e8b88afa
tor: use random credentials for circuit isolation
CI / syntax-check (push) Successful in 6s
CI / memory-leak-check (push) Successful in 14s
2025-12-25 19:18:25 +01:00
Username
269fed55ff
refactor core modules, integrate network stats
2025-12-25 11:13:20 +01:00
Username
97a7dc3316
fetch: use raw strings for regex patterns
CI / syntax-check (push) Successful in 6s
CI / memory-leak-check (push) Successful in 14s
2025-12-24 01:06:49 +01:00
Username
5e788c06d1
fetch: precompile proxy extraction regex
Move regex pattern compilation to module load time
for better performance in repeated calls.
2025-12-24 00:20:06 +01:00
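Note on precompilation as described in this commit: a minimal sketch of a pattern compiled once at module load and reused on every call. The pattern and the function name find_proxy_strings are hypothetical; the project's actual extraction regex is not shown in this log.

```python
import re

# Compiled once when the module is imported, not on every call.
# Hypothetical pattern: dotted-quad IPv4 address followed by ':port'.
PROXY_RE = re.compile(r'\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{2,5})\b')


def find_proxy_strings(content):
    """Return every 'ip:port' candidate found in content."""
    return ['%s:%s' % pair for pair in PROXY_RE.findall(content)]
```

The re module does cache recently compiled patterns, so the gain is mostly in skipping the cache lookup and keeping the pattern definition in one place.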
Username
68a34f2638
fetch: detect proxy protocol from source URL path
- detect_proto_from_path() infers socks4/socks5/http from URL
- extract_proxies() now returns (address, proto) tuples
- ppf.py updated to handle protocol-tagged proxies
- profiler signal handler for SIGTERM stats dump
2025-12-23 17:23:17 +01:00
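Note on protocol detection as described in this commit: a minimal sketch, assuming the protocol name appears literally in the URL path. Only the function name and the three protocol labels come from the commit message; the matching rule and the None fallback are assumptions.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2


def detect_proto_from_path(url):
    """Guess 'socks5', 'socks4' or 'http' from the URL path, else None."""
    # Look at the path only, so the 'http' in the URL scheme is ignored.
    path = urlparse(url).path.lower()
    # Check the more specific names first.
    for proto in ('socks5', 'socks4', 'http'):
        if proto in path:
            return proto
    return None
```

A caller could then tag each extracted address with the detected value to build the (address, proto) tuples mentioned above.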
Username
6b5eb83bf4
fetch: add robust proxy string validation
2025-12-21 23:49:02 +01:00
Username
9e7c8d78b3
fetch: unify known proxies cache
2025-12-21 23:37:58 +01:00
Username
e24f68500c
style: normalize indentation and improve code style
- convert tabs to 4-space indentation
- add docstrings to modules and classes
- remove unused import (copy)
- use explicit object inheritance
- use 'while True' over 'while 1'
- use 'while args' over 'while len(args)'
- use '{}' over 'dict()'
- consistent string formatting
- Python 2/3 compatible Queue import
2025-12-20 23:18:45 +01:00
Username
4780b6f095
fetch: consolidate extract_proxies into single implementation
2025-12-20 22:50:39 +01:00
Username
3c88bc3298
fetch: add unified proxy cache functions
2025-12-20 22:28:37 +01:00
Your Name
d7db366857
split to ip/port, "cleanse" ips and ports, bugfixes
2021-08-22 20:39:50 +02:00
Your Name
ee481ea31e
ppf: make scraper use extra proxies if available
2021-07-27 22:36:15 +02:00
Your Name
6b6cd94cec
spaces to tabs
2021-06-27 12:31:15 +02:00
Your Name
f321e5a934
fetch: more descriptive debug message
2021-02-06 23:23:47 +01:00
Your Name
abd9b5bb9f
tabs to spaces
2021-02-06 14:30:18 +01:00
Mickaël Serneels
0155c6f2ad
ppf: check content-type (once) before trying to download/extract proxies
avoid trying to extract stuff from pdf and such (only accept text/*)
REQUIRES:
sqlite3 websites.sqlite "alter table uris add content_type text"
Don't test known uris:
sqlite3 websites.sqlite "update uris set content_type='text/manual' WHERE error=0"
2019-05-01 17:43:28 +02:00
rofl0r
bf7ec03fbf
fetch.py: factor out twice used var
2019-05-01 17:43:28 +02:00
rofl0r
b99f83a991
fetch.py: improve readability of extract_urls
2019-01-18 19:32:37 +00:00
rofl0r
4a41796b19
factor out http related code from ppf.py
2019-01-18 19:30:42 +00:00