Commit Graph

297 Commits

Author SHA1 Message Date
Username
e24f68500c style: normalize indentation and improve code style
- convert tabs to 4-space indentation
- add docstrings to modules and classes
- remove unused import (copy)
- use explicit object inheritance
- use 'while True' over 'while 1'
- use 'while args' over 'while len(args)'
- use '{}' over 'dict()'
- consistent string formatting
- Python 2/3 compatible Queue import
2025-12-20 23:18:45 +01:00
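
Note on e24f68500c: a "Python 2/3 compatible Queue import" is usually written as a try/except fallback. The following is a minimal sketch of that common pattern, not necessarily the code in this commit; the alias `queue` and the helper `drain` are illustrative names.

```python
# Python 3 renamed the Queue module to queue; try the new name first.
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2 fallback


def drain(q):
    """Return all items currently in q without blocking (hypothetical helper)."""
    items = []
    while True:              # 'while True' rather than 'while 1'
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


jobs = queue.Queue()
for n in (1, 2, 3):
    jobs.put(n)
drained = drain(jobs)
```

The rest of the module can then refer to `queue.Queue` and `queue.Empty` regardless of the interpreter version.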
Username
d356cdf6ee docs: mark priority queue complete 2025-12-20 23:11:54 +01:00
Username
a694e441a4 proxywatchd: add priority queue for job scheduling 2025-12-20 23:11:49 +01:00
Username
c224c55afe docs: mark tor connection pooling complete 2025-12-20 23:02:30 +01:00
Username
af5e1ce4b0 proxywatchd: integrate tor connection pool 2025-12-20 23:02:26 +01:00
Username
bc945a33ff add tor connection pool with health monitoring 2025-12-20 23:02:21 +01:00
Username
ce79ef7d7f engines: consolidate extract_urls with base class method 2025-12-20 22:50:46 +01:00
Username
4780b6f095 fetch: consolidate extract_proxies into single implementation 2025-12-20 22:50:39 +01:00
Username
9588da92e7 scraper: remove dead InstanceTracker class 2025-12-20 22:50:34 +01:00
Username
3188d50707 docs: update TODO and ROADMAP with completed work 2025-12-20 22:28:57 +01:00
Username
bef12e6bcf searx.instances: update with active SearXNG instances 2025-12-20 22:28:52 +01:00
Username
f289057267 cleanup: minor fixes in comboparse and soup_parser 2025-12-20 22:28:47 +01:00
Username
c759f7197e ppf: use shared proxy cache from fetch module 2025-12-20 22:28:42 +01:00
Username
3c88bc3298 fetch: add unified proxy cache functions 2025-12-20 22:28:37 +01:00
Username
2f2ff9a385 proxywatchd: add stats tracking and httpd integration
- Stats class with failure category tracking
- Configurable stats_interval for periodic reports
- Optional httpd server startup when enabled
- cleanup_stale() for removing dead proxies
2025-12-20 22:28:23 +01:00
Username
3f2074f0cf misc: add log levels and failure categorization
- LOG_LEVELS dict with debug, info, warn, error levels
- set_log_level(), get_log_level() functions
- categorize_error() for RocksockException classification
- FAIL_* constants: timeout, refused, auth, unreachable, dns, ssl, closed, proxy, other
2025-12-20 22:28:16 +01:00
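
Note on 3f2074f0cf: the names `LOG_LEVELS`, `set_log_level()` and `get_log_level()` come from the commit message; everything else below is an assumption. This sketch shows one plausible shape for level-based filtering. The numeric values and the `should_log` helper are hypothetical.

```python
# Assumed ordering: a higher number means a more severe message.
LOG_LEVELS = {"debug": 0, "info": 1, "warn": 2, "error": 3}

_current_level = LOG_LEVELS["info"]  # assumed default


def set_log_level(name):
    """Set the minimum level that will be emitted."""
    global _current_level
    _current_level = LOG_LEVELS[name.lower()]


def get_log_level():
    """Return the numeric value of the current minimum level."""
    return _current_level


def should_log(name):
    """Hypothetical helper: True if a message at this level passes the filter."""
    return LOG_LEVELS[name] >= _current_level


set_log_level("WARN")
```

With the level set to `warn`, `should_log("info")` is false and `should_log("error")` is true.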
Username
2212a9e00a httpd: add HTTP API server for proxy queries
- Endpoints: /proxies, /proxies/count, /health
- Query params: limit, proto, country, format (json/plain)
- Threaded server with CORS support
2025-12-20 22:28:10 +01:00
Username
3b3267d0db engines: add modular search engine abstraction
- SearchEngine base class with build_url, extract_urls, is_rate_limited
- Implementations: DuckDuckGo, Startpage, Mojeek, Qwant, Yandex, Ecosia, Brave
- Git hosters: GitHub, GitLab, Codeberg, Gitea
- Searx wrapper for SearXNG instances
2025-12-20 22:28:04 +01:00
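
Note on 3b3267d0db: the method names `build_url`, `extract_urls` and `is_rate_limited` are taken from the commit message. Their signatures, the `url_template` attribute, the rate-limit heuristic and the `ExampleEngine` subclass with its URL are assumptions made for illustration. The sketch shows how a base class with shared link extraction might look.

```python
import re

try:
    from urllib.parse import quote_plus   # Python 3
except ImportError:
    from urllib import quote_plus         # Python 2


class SearchEngine(object):
    """Base class: subclasses supply a URL template, parsing is shared."""
    name = "base"
    url_template = None  # subclasses must set this (assumed attribute)

    def build_url(self, query, page=0):
        """Fill the template with the URL-encoded query and page number."""
        return self.url_template.format(query=quote_plus(query), page=page)

    def extract_urls(self, html):
        """Return absolute http(s) links found in double-quoted href attributes."""
        return re.findall(r'href="(https?://[^"]+)"', html)

    def is_rate_limited(self, status, body):
        """Assumed heuristic: HTTP 429 or a captcha page."""
        return status == 429 or "captcha" in body.lower()


class ExampleEngine(SearchEngine):
    """Hypothetical engine; the host below is a placeholder."""
    name = "example"
    url_template = "https://search.example.org/?q={query}&p={page}"


engine = ExampleEngine()
url = engine.build_url("free proxy list", page=2)
links = engine.extract_urls('<a href="http://a.test/x">x</a><a href="/rel">r</a>')
```

Keeping `extract_urls` on the base class matches the later commit ce79ef7d7f, which consolidates extraction into a base class method.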
Username
8ce6900244 scraper: integrate multi-lingual search terms
- Use translations module for 70% non-English search terms
- Initialize translations config on startup
- Add engines module for multi-engine support
2025-12-20 22:27:51 +01:00
Username
eeb71a1d55 config: add LibreTranslate settings
- libretranslate_url: API endpoint (default: https://lt.mymx.me/translate)
- libretranslate_enabled: toggle for dynamic translations (default: True)
2025-12-20 22:27:45 +01:00
Username
8132023c97 translations: add multi-lingual search term generation
- Static translations for 15 languages (ru, zh, es, pt, de, fr, ja, ko, ar, id, tr, vi, th, pl, uk)
- LibreTranslate API integration with configurable endpoint
- Dynamic language detection from API /languages endpoint
- Persistent JSON cache with 30-day TTL
- Categorized search terms: generic, protocol, anonymity, freshness, format, sources, geographic, use-case, search operators
- Dynamic year substitution for freshness terms
2025-12-20 22:27:37 +01:00
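
Note on 8132023c97: a persistent JSON cache with a 30-day TTL can be sketched as below. This is an illustration under stated assumptions, not the module's code. The function names, the on-disk layout (`{"value": ..., "ts": ...}` per key) and the sample keys are all hypothetical.

```python
import json
import os
import tempfile
import time

CACHE_TTL = 30 * 24 * 3600  # 30 days, in seconds


def load_cache(path, now=None):
    """Load cached entries from path, dropping any that have reached CACHE_TTL."""
    now = time.time() if now is None else now
    if not os.path.exists(path):
        return {}
    with open(path) as fh:
        raw = json.load(fh)
    return {k: v for k, v in raw.items() if now - v["ts"] < CACHE_TTL}


def store(cache, key, value, now=None):
    """Record value under key together with its creation timestamp."""
    cache[key] = {"value": value, "ts": time.time() if now is None else now}


def save_cache(path, cache):
    """Write the whole cache to disk as JSON."""
    with open(path, "w") as fh:
        json.dump(cache, fh)


# Demonstration with explicit timestamps so the outcome is deterministic.
cache_path = os.path.join(tempfile.mkdtemp(), "cache.json")
cache = {}
store(cache, "term-a", "old translation", now=0)          # will be stale
store(cache, "term-b", "new translation", now=CACHE_TTL)  # still fresh
save_cache(cache_path, cache)
fresh = load_cache(cache_path, now=CACHE_TTL + 1)
```

Expiry is applied at load time, so stale entries disappear from disk the next time the cache is saved.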
Username
4547ec3188 roadmap: update completed work 2025-12-20 18:25:55 +01:00
Username
90a6756ade dbs: add indexes and optimize batch inserts 2025-12-20 18:25:33 +01:00
Username
c054fa3c11 mysqlite: enable WAL mode for better concurrency 2025-12-20 18:25:33 +01:00
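
Note on c054fa3c11 and 90a6756ade: with Python's standard `sqlite3` module, WAL mode is enabled through a pragma, an index is an ordinary `CREATE INDEX`, and batch inserts typically go through `executemany` inside a single transaction. The sketch below assumes a hypothetical table `proxylist` with illustrative columns; the project's actual schema is not shown in this log.

```python
import os
import sqlite3
import tempfile

# WAL needs a file-backed database; in-memory databases do not support it.
db_path = os.path.join(tempfile.mkdtemp(), "example.sqlite")
conn = sqlite3.connect(db_path)

# The pragma returns the journal mode now in effect.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

# Hypothetical schema for illustration only.
conn.execute(
    "CREATE TABLE IF NOT EXISTS proxylist (proxy TEXT PRIMARY KEY, failed INTEGER)"
)
conn.execute("CREATE INDEX IF NOT EXISTS idx_failed ON proxylist (failed)")

# Batch insert: one executemany call, one commit.
rows = [("192.0.2.1:8080", 0), ("192.0.2.2:3128", 1), ("192.0.2.3:1080", 0)]
conn.executemany("INSERT OR IGNORE INTO proxylist VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM proxylist").fetchone()[0]
working = conn.execute("SELECT COUNT(*) FROM proxylist WHERE failed = 0").fetchone()[0]
conn.close()
```

In WAL mode readers do not block the writer and the writer does not block readers, which is the concurrency benefit the commit message refers to.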
Username
86cabd1562 standardize code style: shebangs, class definitions, comments 2025-12-20 18:05:41 +01:00
Username
4c9a658d26 add test infrastructure for --nobs 2025-12-20 17:33:40 +01:00
Username
1d865d5250 ppf: use soup_parser instead of direct bs4 import 2025-12-20 17:33:40 +01:00
Username
0fd8424d33 config: add --nobs flag to disable BeautifulSoup 2025-12-20 17:33:39 +01:00
Username
31a3ac9a8b soup_parser: add stdlib HTMLParser fallback 2025-12-20 17:33:39 +01:00
Username
2a21bd44ed make IP2Location optional 2025-12-20 16:53:00 +01:00
Username
52e82f1f33 remove lxml dependency 2025-12-20 16:52:11 +01:00
Username
7846eb22c9 add project context documentation 2025-12-20 16:47:10 +01:00
Username
ac254873a5 gitignore: add local config directory 2025-12-20 16:47:09 +01:00
Username
fc72640f75 add copilot instructions 2025-12-20 16:46:44 +01:00
Username
d9cea386e9 add project roadmap and task list 2025-12-20 16:46:43 +01:00
Username
a4e2c2ed7c add docker support 2025-12-20 16:46:37 +01:00
Username
9d2701dfa0 add requirements.txt 2025-12-20 16:46:36 +01:00
Username
6ab04a77c7 add gitignore 2025-12-20 16:46:36 +01:00
Username
68d6a8e15f proxywatchd: implement multi-target validation with work-stealing queue 2025-12-20 16:46:09 +01:00
Username
57a7687b08 ppf: remove dead http server code 2025-12-20 16:46:08 +01:00
Username
dc545494b9 soup_parser: remove dead gumbo code 2025-12-20 16:46:08 +01:00
Username
e7a8ff7df7 scraper: remove debug print 2025-12-20 16:46:00 +01:00
Username
67eb5413e4 misc: remove unused random_string function 2025-12-20 16:45:59 +01:00
Username
a62c46600c comboparse: remove test code 2025-12-20 16:45:58 +01:00
Your Name
15ff16b8d6 force py2 usage 2021-10-30 07:13:04 +02:00
Your Name
d7db366857 split to ip/port, "cleanse" ips and ports, bugfixes 2021-08-22 20:39:50 +02:00
Your Name
c3bb49d229 proxywatchd: make use of verifycert 2021-07-27 22:36:24 +02:00
Your Name
ee481ea31e ppf: make scraper use extra proxies if available 2021-07-27 22:36:15 +02:00
Your Name
d4dd2a42ea proxywatchd.py: randomly choose from available regex keys 2021-07-05 16:54:17 +02:00
Your Name
d78212ac50 proxywatchd.py: add more websites 2021-07-05 16:45:02 +02:00