18 Commits

Author SHA1 Message Date
Username
269fed55ff refactor core modules, integrate network stats 2025-12-25 11:13:20 +01:00
Username
97a7dc3316 fetch: use raw strings for regex patterns
All checks were successful
CI / syntax-check (push) Successful in 6s
CI / memory-leak-check (push) Successful in 14s
2025-12-24 01:06:49 +01:00
Username
5e788c06d1 fetch: precompile proxy extraction regex
Move regex pattern compilation to module load time
for better performance in repeated calls.
2025-12-24 00:20:06 +01:00
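The precompilation described in 5e788c06d1 can be illustrated with a minimal sketch. The pattern and the names `PROXY_RE` and `find_proxies` are hypothetical and are not taken from fetch.py; the point is only that `re.compile()` runs once at import time rather than on every call.

```python
import re

# Hypothetical pattern (not the one used in fetch.py): a dotted-quad
# IPv4 address followed by ":port". Compiled once, when the module loads.
PROXY_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b")


def find_proxies(text):
    """Return 'ip:port' strings found in text using the precompiled pattern."""
    # findall() yields (ip, port) tuples because the pattern has two groups.
    return ["%s:%s" % match for match in PROXY_RE.findall(text)]
```

Calling `find_proxies()` repeatedly reuses the same compiled pattern object, so the per-call cost is limited to the search itself.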
Username
68a34f2638 fetch: detect proxy protocol from source URL path
- detect_proto_from_path() infers socks4/socks5/http from URL
- extract_proxies() now returns (address, proto) tuples
- ppf.py updated to handle protocol-tagged proxies
- profiler signal handler for SIGTERM stats dump
2025-12-23 17:23:17 +01:00
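A rough sketch of the idea in 68a34f2638, under stated assumptions: the function names come from the commit message, but the signatures, the matching rule (protocol name appearing anywhere in the URL path), and both regular expressions are guesses made for illustration, not the code in fetch.py.

```python
import re

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

# Assumed rule: the protocol name appears somewhere in the URL path.
_PROTO_RE = re.compile(r"(socks5|socks4|https?)", re.IGNORECASE)
# Assumed address format: dotted-quad IPv4 plus ":port".
_ADDR_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}:\d{1,5})\b")


def detect_proto_from_path(url):
    """Guess 'socks4', 'socks5' or 'http' from the URL path, else None."""
    match = _PROTO_RE.search(urlparse(url).path)
    if match is None:
        return None
    proto = match.group(1).lower()
    # Treat an "https" path hint as a plain HTTP proxy list (assumption).
    return "http" if proto.startswith("http") else proto


def extract_proxies(text, source_url):
    """Return (address, proto) tuples; proto is None when it cannot be inferred."""
    proto = detect_proto_from_path(source_url)
    return [(addr, proto) for addr in _ADDR_RE.findall(text)]
```

Only the path is inspected, so the `https://` scheme of the source URL itself does not influence the result.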
Username
6b5eb83bf4 fetch: add robust proxy string validation 2025-12-21 23:49:02 +01:00
Username
9e7c8d78b3 fetch: unify known proxies cache 2025-12-21 23:37:58 +01:00
Username
e24f68500c style: normalize indentation and improve code style
- convert tabs to 4-space indentation
- add docstrings to modules and classes
- remove unused import (copy)
- use explicit object inheritance
- use 'while True' over 'while 1'
- use 'while args' over 'while len(args)'
- use '{}' over 'dict()'
- consistent string formatting
- Python 2/3 compatible Queue import
2025-12-20 23:18:45 +01:00
Username
4780b6f095 fetch: consolidate extract_proxies into single implementation 2025-12-20 22:50:39 +01:00
Username
3c88bc3298 fetch: add unified proxy cache functions 2025-12-20 22:28:37 +01:00
Your Name
d7db366857 split to ip/port, "cleanse" ips and ports, bugfixes 2021-08-22 20:39:50 +02:00
Your Name
ee481ea31e ppf: make scraper use extra proxies if available 2021-07-27 22:36:15 +02:00
Your Name
6b6cd94cec spaces to tabs 2021-06-27 12:31:15 +02:00
Your Name
f321e5a934 fetch: more describing debug message 2021-02-06 23:23:47 +01:00
Your Name
abd9b5bb9f tabs to spaces 2021-02-06 14:30:18 +01:00
Mickaël Serneels
0155c6f2ad ppf: check content-type (once) before trying to download/extract proxies
avoid trying to extract stuff from pdf and such (only accept text/*)

REQUIRES:
sqlite3 websites.sqlite "alter table uris add content_type text"

Don't test known uris:
sqlite3 websites.sqlite "update uris set content_type='text/manual' WHERE error=0"
2019-05-01 17:43:28 +02:00
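The "only accept text/*" rule from 0155c6f2ad could look roughly like the sketch below. The helper name `is_text_content` is hypothetical, and how ppf.py obtains the header value is not shown in the commit message, so the function simply takes the raw Content-Type string.

```python
def is_text_content(content_type):
    """Return True when a Content-Type header value denotes a text/* media type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("text/")
```

With this check, a value such as `application/pdf` is rejected, while the `text/manual` marker set by the second SQL statement above passes.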
rofl0r
bf7ec03fbf fetch.py: factor out twice used var 2019-05-01 17:43:28 +02:00
rofl0r
b99f83a991 fetch.py: improve readability of extract_urls 2019-01-18 19:32:37 +00:00
rofl0r
4a41796b19 factor out http related code from ppf.py 2019-01-18 19:30:42 +00:00