Username
269fed55ff
refactor core modules, integrate network stats
2025-12-25 11:13:20 +01:00
Username
97a7dc3316
fetch: use raw strings for regex patterns
CI / syntax-check (push) Successful in 6s
CI / memory-leak-check (push) Successful in 14s
2025-12-24 01:06:49 +01:00
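Here is a small example of the problem this commit avoids. The pattern is a hypothetical IP:port matcher, not taken from fetch.py. Writing `\d` in an ordinary string literal only works because Python leaves unknown escape sequences untouched, and newer Python 3 releases warn about such invalid escapes. A raw string passes every backslash to `re` unchanged.

```python
import re

# Hypothetical IP:port pattern (not the actual one in fetch.py).
# In a raw string every backslash is kept literally, so re receives \d and \.
IP_PORT_RAW = r'(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})'

# The same pattern as an ordinary literal needs every backslash doubled.
IP_PORT_ESCAPED = '(\\d{1,3}(?:\\.\\d{1,3}){3}):(\\d{1,5})'

assert IP_PORT_RAW == IP_PORT_ESCAPED

match = re.search(IP_PORT_RAW, 'proxy: 10.0.0.1:8080')
print(match.groups())  # ('10.0.0.1', '8080')
```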
Username
5e788c06d1
fetch: precompile proxy extraction regex
Move regex pattern compilation to module load time
for better performance in repeated calls.
2025-12-24 00:20:06 +01:00
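The sketch below shows the module-load-time compilation this commit describes. The pattern and function names are hypothetical. Note that Python's `re` module already caches recently used patterns internally, so the per-call variant usually does not recompile. The saving from precompiling is mainly the per-call cache lookup, which adds up over many calls.

```python
import re

# Compiled once, when the module is first imported, rather than per call.
# Hypothetical pattern; the real one used by fetch.py is not shown in this log.
_PROXY_RE = re.compile(r'(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})')


def find_proxies(text):
    """Return every 'ip:port' string in text, reusing the compiled pattern."""
    return ['%s:%s' % m.groups() for m in _PROXY_RE.finditer(text)]


def find_proxies_per_call(text):
    """Same result, but hands the pattern string to re on every call.

    re keeps an internal cache of compiled patterns, so this usually does
    not recompile each time, yet it still pays a cache lookup per call.
    """
    pattern = r'(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})'
    return ['%s:%s' % m.groups() for m in re.finditer(pattern, text)]


print(find_proxies('a 1.2.3.4:80 b 5.6.7.8:3128'))
# ['1.2.3.4:80', '5.6.7.8:3128']
```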
Username
68a34f2638
fetch: detect proxy protocol from source URL path
- detect_proto_from_path() infers socks4/socks5/http from URL
- extract_proxies() now returns (address, proto) tuples
- ppf.py updated to handle protocol-tagged proxies
- profiler signal handler for SIGTERM stats dump
2025-12-23 17:23:17 +01:00
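This is a sketch of the idea described in the list above: infer the protocol from the source URL's path and tag every extracted address with it. It reuses the names detect_proto_from_path() and extract_proxies() for readability, but the bodies are assumptions and are not the actual fetch.py code. Three rules are assumed here: only the path is inspected (not the hostname), "socks5" and "socks4" are matched as substrings, and anything else defaults to http. The `(content, url)` signature and the IP:port regex are likewise hypothetical.

```python
import re

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

# Hypothetical IP:port pattern used only for this sketch.
_PROXY_RE = re.compile(r'(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})')


def detect_proto_from_path(url):
    """Guess 'socks5', 'socks4' or 'http' from keywords in the URL path.

    Assumption: proxy lists are published at paths like '/socks5.txt'
    or '/socks4/all.txt'; the hostname is deliberately ignored.
    """
    path = urlparse(url).path.lower()
    for proto in ('socks5', 'socks4'):
        if proto in path:
            return proto
    return 'http'


def extract_proxies(content, url):
    """Return (address, proto) tuples for every IP:port found in content."""
    proto = detect_proto_from_path(url)
    return [('%s:%s' % m.groups(), proto) for m in _PROXY_RE.finditer(content)]


print(extract_proxies('1.2.3.4:1080\n5.6.7.8:9050',
                      'https://example.com/lists/socks5.txt'))
```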
Username
6b5eb83bf4
fetch: add robust proxy string validation
2025-12-21 23:49:02 +01:00
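The log does not say what the validation rules are, so the following is only one plausible sketch of a strict "a.b.c.d:port" check. The rules are assumptions: exactly one colon, four ASCII-decimal octets in 0-255 without leading zeros, and a port in 1-65535. The function name is hypothetical. Matching with an ASCII-only regex rather than `str.isdigit()` avoids characters such as '²', which `isdigit()` accepts but `int()` rejects.

```python
import re

# ASCII digits only; \Z anchors the end so trailing junk is rejected.
_DIGITS = re.compile(r'[0-9]+\Z')


def is_valid_proxy(proxy):
    """Return True if proxy looks like 'a.b.c.d:port'.

    Hypothetical rules: exactly one colon, four decimal octets in 0-255
    with no leading zeros, and a port in 1-65535.
    """
    if not isinstance(proxy, str) or proxy.count(':') != 1:
        return False
    host, port = proxy.split(':')
    octets = host.split('.')
    if len(octets) != 4:
        return False
    for octet in octets:
        if not _DIGITS.match(octet) or int(octet) > 255:
            return False
        if len(octet) > 1 and octet.startswith('0'):
            return False
    return bool(_DIGITS.match(port)) and 1 <= int(port) <= 65535
```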
Username
9e7c8d78b3
fetch: unify known proxies cache
2025-12-21 23:37:58 +01:00
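As a rough illustration of a single, unified known-proxies cache, here is a minimal sketch with hypothetical function names, not the fetch.py API. It assumes one process-wide set that every caller consults instead of keeping its own list. A lock is included on the assumption that the cache may be shared between worker threads.

```python
import threading

# Hypothetical process-wide cache shared by every caller (fetcher, checker, ...).
_known_proxies = set()
_known_lock = threading.Lock()


def is_known_proxy(proxy):
    """Return True if proxy has already been recorded."""
    with _known_lock:
        return proxy in _known_proxies


def add_known_proxies(proxies):
    """Record proxies and return the ones not seen before, in input order.

    A duplicate inside the same batch is returned only once.
    """
    new = []
    with _known_lock:
        for proxy in proxies:
            if proxy not in _known_proxies:
                _known_proxies.add(proxy)
                new.append(proxy)
    return new


def clear_known_proxies():
    """Forget every recorded proxy."""
    with _known_lock:
        _known_proxies.clear()
```

Returning only the new entries lets a caller write `for p in add_known_proxies(found): ...` and never process the same address twice.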
Username
e24f68500c
style: normalize indentation and improve code style
- convert tabs to 4-space indentation
- add docstrings to modules and classes
- remove unused import (copy)
- use explicit object inheritance
- use 'while True' over 'while 1'
- use 'while args' over 'while len(args)'
- use '{}' over 'dict()'
- consistent string formatting
- Python 2/3 compatible Queue import
2025-12-20 23:18:45 +01:00
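The "Python 2/3 compatible Queue import" item refers to a common idiom: the module is named `Queue` in Python 2 and `queue` in Python 3. Below is a minimal sketch of that idiom; the `drain()` helper is hypothetical and only shows that the `Queue` class and `Empty` exception are then reachable under one name. It also uses `while True` as listed above.

```python
try:
    import Queue as queue  # Python 2 module name
except ImportError:
    import queue  # Python 3 renamed the module to 'queue'


def drain(q):
    """Return every item currently in q without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


jobs = queue.Queue()
jobs.put('http://example.com/a.txt')
jobs.put('http://example.com/b.txt')
print(drain(jobs))
```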
Username
4780b6f095
fetch: consolidate extract_proxies into single implementation
2025-12-20 22:50:39 +01:00
Username
3c88bc3298
fetch: add unified proxy cache functions
2025-12-20 22:28:37 +01:00
Your Name
d7db366857
split into ip/port, "cleanse" IPs and ports, bugfixes
2021-08-22 20:39:50 +02:00
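The commit does not define "cleanse". This sketch assumes it means splitting an "ip:port" string into its two parts and normalising them: surrounding whitespace and leading zeros are stripped, and out-of-range values are rejected. The function name is hypothetical.

```python
import re

# Four dotted groups of ASCII digits, a colon, then the port; \Z anchors the end.
_IP_PORT = re.compile(r'([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+):([0-9]+)\Z')


def split_and_cleanse(proxy):
    """Split 'ip:port' into (ip, port) with normalised values.

    Assumed cleansing: '010.000.001.002:08080' -> ('10.0.1.2', 8080).
    Returns None when the string is malformed or a value is out of range.
    """
    m = _IP_PORT.match(proxy.strip())
    if m is None:
        return None
    octets = [int(g) for g in m.groups()[:4]]
    port = int(m.group(5))
    if any(o > 255 for o in octets) or not 1 <= port <= 65535:
        return None
    return '.'.join(str(o) for o in octets), port
```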
Your Name
ee481ea31e
ppf: make scraper use extra proxies if available
2021-07-27 22:36:15 +02:00
Your Name
6b6cd94cec
spaces to tabs
2021-06-27 12:31:15 +02:00
Your Name
f321e5a934
fetch: more descriptive debug message
2021-02-06 23:23:47 +01:00
Your Name
abd9b5bb9f
tabs to spaces
2021-02-06 14:30:18 +01:00
Mickaël Serneels
0155c6f2ad
ppf: check content-type (once) before trying to download/extract proxies
avoid trying to extract proxies from PDFs and other non-text content (only accept text/*)
REQUIRES:
sqlite3 websites.sqlite "alter table uris add content_type text"
Don't test known uris:
sqlite3 websites.sqlite "update uris set content_type='text/manual' WHERE error=0"
2019-05-01 17:43:28 +02:00
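The "check once" logic implied by the migration above can be sketched as follows. Two details come from the commit: the content type is stored in `uris.content_type`, and known rows get 'text/manual', which passes a text/* check without a network probe. Everything else is hypothetical: the `uri` column name, the function names, and the caller-supplied `fetch_content_type` callback (for example a HEAD request). Treating a missing header as non-text is also an assumption.

```python
import sqlite3


def is_text_content_type(content_type):
    """Return True for text/* media types, ignoring parameters and case."""
    if not content_type:
        return False  # assumption: a missing header counts as non-text
    media_type = content_type.split(';', 1)[0].strip().lower()
    return media_type.startswith('text/')


def should_extract(db, uri, fetch_content_type):
    """Decide whether to download uri, probing its Content-Type only once.

    The probe result is stored in uris.content_type, so later runs reuse
    it. Rows preset to 'text/manual' are never probed.
    """
    row = db.execute('SELECT content_type FROM uris WHERE uri = ?',
                     (uri,)).fetchone()
    if row is None:
        return False  # uri is not tracked in the table
    content_type = row[0]
    if content_type is None:
        content_type = fetch_content_type(uri) or ''
        db.execute('UPDATE uris SET content_type = ? WHERE uri = ?',
                   (content_type, uri))
        db.commit()
    return is_text_content_type(content_type)
```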
rofl0r
bf7ec03fbf
fetch.py: factor out twice-used var
2019-05-01 17:43:28 +02:00
rofl0r
b99f83a991
fetch.py: improve readability of extract_urls
2019-01-18 19:32:37 +00:00
rofl0r
4a41796b19
factor out http related code from ppf.py
2019-01-18 19:30:42 +00:00