Username
e7478de79e
scraper: add engine stats API for dashboard
- EngineTracker.get_stats() returns detailed per-engine metrics
- get_scraper_stats() module function for external access
- includes success counts, backoff status, availability
2025-12-23 17:23:28 +01:00
Username
68a34f2638
fetch: detect proxy protocol from source URL path
- detect_proto_from_path() infers socks4/socks5/http from URL
- extract_proxies() now returns (address, proto) tuples
- ppf.py updated to handle protocol-tagged proxies
- profiler signal handler for SIGTERM stats dump
2025-12-23 17:23:17 +01:00
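A minimal sketch of protocol detection from the source URL path and of returning (address, proto) tuples. The keyword order, the `http` default and the IPv4:port regex are assumptions, not the actual logic in fetch.py.

```python
import re
from urllib.parse import urlparse

# Assumed hint order: check the specific SOCKS versions before the generic "http".
_PROTO_HINTS = (("socks5", "socks5"), ("socks4", "socks4"), ("http", "http"))
# Assumed address format: dotted IPv4 followed by a port.
_PROXY_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}:\d{1,5})\b")


def detect_proto_from_path(url, default="http"):
    """Guess a proxy protocol from keywords in the URL path (sketch)."""
    path = urlparse(url).path.lower()
    for hint, proto in _PROTO_HINTS:
        if hint in path:
            return proto
    return default


def extract_proxies(text, url):
    """Return (address, proto) tuples, tagging each address with the path-derived protocol."""
    proto = detect_proto_from_path(url)
    return [(addr, proto) for addr in _PROXY_RE.findall(text)]
```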
Username
e0e330301a
dbs: add session persistence and stats history
- session_state table for persisting runtime stats across restarts
- stats_history table for hourly snapshots (24h graphs)
- latency tracking with exponential moving average
- anonymity detection columns (transparent/anonymous/elite)
- curated PROXY_SOURCES list for seeding
- migration functions for existing databases
2025-12-23 17:23:04 +01:00
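One common way to fill a transparent/anonymous/elite column, sketched under the assumption that the checker sees the request headers echoed back by a judge server. The header list and the function name `classify_anonymity` are hypothetical, not taken from dbs.py.

```python
# Headers that reveal a proxy is in the path (assumed list).
PROXY_HEADERS = ("via", "x-forwarded-for", "forwarded", "proxy-connection", "x-real-ip")


def classify_anonymity(real_ip, echoed_headers):
    """Classify a proxy from the headers an echo/judge server received (sketch)."""
    headers = {k.lower(): v for k, v in echoed_headers.items()}
    # The client's own address leaked through: transparent.
    if any(real_ip in value for value in headers.values()):
        return "transparent"
    # Proxy is announced but the client address is hidden: anonymous.
    if any(name in headers for name in PROXY_HEADERS):
        return "anonymous"
    # No trace of a proxy at all: elite.
    return "elite"
```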
Username
71fb3800ee
config: add gevent dependency and min_threads option
- requirements.txt: add gevent for cooperative concurrency
- config.py: add min_threads setting for thread scaling
2025-12-23 17:22:52 +01:00
Username
267035802a
ppf: reset stale_count when content hash changes
2025-12-22 00:05:06 +01:00
Username
f382a4ab6a
ppf: add content hash for duplicate proxy list detection
2025-12-22 00:03:12 +01:00
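A sketch of how a content hash can flag an unchanged proxy list and drive the stale_count reset from the commit above. The dict-based source record and the choice of SHA-1 over the sorted, de-duplicated entries are assumptions.

```python
import hashlib


def content_hash(proxies):
    """Order-independent SHA-1 over the set of extracted proxy strings (sketch)."""
    digest = hashlib.sha1()
    for entry in sorted(set(proxies)):
        digest.update(entry.encode("utf-8") + b"\n")
    return digest.hexdigest()


def update_source(source, proxies):
    """Bump stale_count on an unchanged list; reset it when the content changes.

    Returns True when the list is new and should be imported.
    """
    new_hash = content_hash(proxies)
    if new_hash == source.get("content_hash"):
        source["stale_count"] = source.get("stale_count", 0) + 1
        return False  # duplicate list, nothing new to import
    source["content_hash"] = new_hash
    source["stale_count"] = 0
    return True
```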
Username
6b5eb83bf4
fetch: add robust proxy string validation
2025-12-21 23:49:02 +01:00
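A minimal validation sketch for "ip:port" strings using the standard `ipaddress` module. Restricting to IPv4 and rejecting non-global (private, loopback, reserved) addresses are assumptions about what "robust" means here.

```python
import ipaddress


def is_valid_proxy(value):
    """Validate an 'ip:port' string: global IPv4 address and port 1-65535 (sketch)."""
    host, sep, port = value.strip().rpartition(":")
    if not sep or not port.isdigit():
        return False
    if not 1 <= int(port) <= 65535:
        return False
    try:
        addr = ipaddress.IPv4Address(host)
    except ValueError:  # out-of-range octets, garbage, hostnames
        return False
    # Drop private, loopback and reserved ranges that cannot be public proxies.
    return addr.is_global
```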
Username
73192311f3
docs: update dockerfile and readme
2025-12-21 23:38:04 +01:00
Username
01d5dfd477
minor: cleanup comboparse and http2
2025-12-21 23:37:59 +01:00
Username
9e7c8d78b3
fetch: unify known proxies cache
2025-12-21 23:37:58 +01:00
Username
747e6dd7aa
ppf: improve exception handling and logging
2025-12-21 23:37:57 +01:00
Username
901f2c1aee
httpd: improve api error handling
2025-12-21 23:37:49 +01:00
Username
b88aa2a878
scraper: add multi-engine support and tracking
2025-12-21 23:37:48 +01:00
Username
0274b84af8
engines: improve search engine rate limiting
2025-12-21 23:37:48 +01:00
Username
00623f3a18
connection_pool: add health tracking and backoff
2025-12-21 23:37:39 +01:00
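A sketch of per-connection health tracking with capped exponential backoff, as a pool of Tor SOCKS endpoints might use it. The class name `PooledConnection` and the delay parameters are hypothetical.

```python
import time


class PooledConnection(object):
    """Health record for one pooled upstream, e.g. a Tor SOCKS endpoint (sketch)."""

    def __init__(self, address, base_delay=1.0, max_delay=300.0):
        self.address = address
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.failures = 0
        self.next_retry = 0.0

    def record_success(self):
        # A working connection clears its penalty immediately.
        self.failures = 0
        self.next_retry = 0.0

    def record_failure(self, now=None):
        now = time.time() if now is None else now
        self.failures += 1
        # Exponential backoff: base * 2^(failures - 1), capped at max_delay.
        delay = min(self.max_delay, self.base_delay * 2 ** (self.failures - 1))
        self.next_retry = now + delay
        return delay

    def is_available(self, now=None):
        now = time.time() if now is None else now
        return now >= self.next_retry
```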
Username
77867d0b2d
dbs: add latency columns and migration
2025-12-21 23:37:38 +01:00
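A sketch of the latency migration and an exponential-moving-average update with `sqlite3`. The table name `proxylist`, the key column `proxy` and the smoothing factor 0.3 are assumptions; only the column names avg_latency and latency_samples come from the log.

```python
import sqlite3

ALPHA = 0.3  # assumed smoothing factor; not taken from dbs.py


def migrate_latency_columns(conn):
    """Add latency columns to an existing table if they are missing (idempotent)."""
    existing = {row[1] for row in conn.execute("PRAGMA table_info(proxylist)")}
    if "avg_latency" not in existing:
        conn.execute("ALTER TABLE proxylist ADD COLUMN avg_latency REAL DEFAULT 0")
    if "latency_samples" not in existing:
        conn.execute("ALTER TABLE proxylist ADD COLUMN latency_samples INTEGER DEFAULT 0")


def update_proxy_latency(conn, proxy, latency_ms):
    """Blend a new measurement into the stored average: avg = a*x + (1-a)*avg."""
    row = conn.execute(
        "SELECT avg_latency, latency_samples FROM proxylist WHERE proxy = ?", (proxy,)
    ).fetchone()
    if row is None:
        return None
    avg, samples = row
    # The first sample seeds the average instead of being blended with 0.
    new_avg = latency_ms if samples == 0 else ALPHA * latency_ms + (1 - ALPHA) * avg
    conn.execute(
        "UPDATE proxylist SET avg_latency = ?, latency_samples = ? WHERE proxy = ?",
        (new_avg, samples + 1, proxy),
    )
    return new_avg
```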
Username
2e1d9b7d3f
gitignore: add database and data file patterns
2025-12-21 23:37:38 +01:00
Username
e2ef1b7e36
docs: mark geolocation and ssl testing as completed
2025-12-21 23:37:23 +01:00
Username
95bafcacff
proxywatchd: add startup logging, fix geolocation error handling
2025-12-21 23:37:19 +01:00
Username
b6c0a89afd
add IP2Location for geolocation
2025-12-21 23:37:13 +01:00
Username
d48cc3f9eb
config: fix defaults for database and checktype
2025-12-21 23:37:09 +01:00
Username
8718d33276
fix: use canonical schema from dbs.py in proxywatchd
2025-12-21 10:30:31 +01:00
Username
f4e242fc18
add systemd unit for rootless podman container
2025-12-21 10:23:27 +01:00
Username
55bc9a635e
docs: add README and update ROADMAP
- README.md: installation, configuration, usage, deployment
- ROADMAP.md: mark completed items (pooling, scaling, latency, containers)
- priority matrix updated with completion status
2025-12-21 10:19:18 +01:00
Username
79475c2bff
add latency tracking and dynamic thread scaling
- dbs.py: add avg_latency, latency_samples columns with migration
- dbs.py: update_proxy_latency() with exponential moving average
- proxywatchd.py: ThreadScaler class for dynamic thread count
- proxywatchd.py: calculate/record latency for successful proxies
- proxywatchd.py: _spawn_thread(), _remove_thread(), _adjust_threads()
- scaler reports status alongside periodic stats
2025-12-21 00:08:19 +01:00
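A sketch of the kind of decision a ThreadScaler could make: size the pool from the job backlog, clamp it to [min_threads, max_threads], and move a bounded step per adjustment. The jobs-per-thread heuristic and the step limit are assumptions, not the logic in proxywatchd.py.

```python
class ThreadScaler(object):
    """Pick a worker count from queue backlog, clamped to [min_threads, max_threads] (sketch)."""

    def __init__(self, min_threads=5, max_threads=50, jobs_per_thread=10, step=5):
        self.min_threads = min_threads
        self.max_threads = max_threads
        self.jobs_per_thread = jobs_per_thread
        self.step = step

    def target(self, current, queue_size):
        # Ceiling division: one thread per jobs_per_thread queued jobs.
        wanted = -(-queue_size // self.jobs_per_thread)
        wanted = max(self.min_threads, min(self.max_threads, wanted))
        # Move at most `step` threads per adjustment to avoid oscillation.
        if wanted > current:
            return min(wanted, current + self.step)
        if wanted < current:
            return max(wanted, current - self.step)
        return current
```

The caller would then spawn or remove threads until the pool size matches `target()`.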
Username
1e43f50aa6
style: normalize test file indentation
2025-12-20 23:19:22 +01:00
Username
e24f68500c
style: normalize indentation and improve code style
- convert tabs to 4-space indentation
- add docstrings to modules and classes
- remove unused import (copy)
- use explicit object inheritance
- use 'while True' over 'while 1'
- use 'while args' over 'while len(args)'
- use '{}' over 'dict()'
- consistent string formatting
- Python 2/3 compatible Queue import
2025-12-20 23:18:45 +01:00
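The usual idiom for a Python 2/3 compatible Queue import, shown as an assumed form of the change; the module was renamed from `Queue` to `queue` in Python 3.

```python
try:
    import Queue  # Python 2 module name
except ImportError:
    import queue as Queue  # Python 3 renamed the module to lowercase

jobs = Queue.Queue()
jobs.put("check")
```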
Username
d356cdf6ee
docs: mark priority queue complete
2025-12-20 23:11:54 +01:00
Username
a694e441a4
proxywatchd: add priority queue for job scheduling
2025-12-20 23:11:49 +01:00
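A minimal priority queue for job scheduling built on `heapq`, with a counter as tie-breaker so jobs of equal priority keep FIFO order. The class name and the meaning of the priority values are assumptions.

```python
import heapq
import itertools


class JobQueue(object):
    """Min-heap of (priority, seq, job); lower priority values run first (sketch)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def put(self, job, priority):
        heapq.heappush(self._heap, (priority, next(self._seq), job))

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

With several worker threads, the standard library's thread-safe `queue.PriorityQueue` (or a lock around this class) would be the safer choice.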
Username
c224c55afe
docs: mark tor connection pooling complete
2025-12-20 23:02:30 +01:00
Username
af5e1ce4b0
proxywatchd: integrate tor connection pool
2025-12-20 23:02:26 +01:00
Username
bc945a33ff
add tor connection pool with health monitoring
2025-12-20 23:02:21 +01:00
Username
ce79ef7d7f
engines: consolidate extract_urls with base class method
2025-12-20 22:50:46 +01:00
Username
4780b6f095
fetch: consolidate extract_proxies into single implementation
2025-12-20 22:50:39 +01:00
Username
9588da92e7
scraper: remove dead InstanceTracker class
2025-12-20 22:50:34 +01:00
Username
3188d50707
docs: update TODO and ROADMAP with completed work
2025-12-20 22:28:57 +01:00
Username
bef12e6bcf
searx.instances: update with active SearXNG instances
2025-12-20 22:28:52 +01:00
Username
f289057267
cleanup: minor fixes in comboparse and soup_parser
2025-12-20 22:28:47 +01:00
Username
c759f7197e
ppf: use shared proxy cache from fetch module
2025-12-20 22:28:42 +01:00
Username
3c88bc3298
fetch: add unified proxy cache functions
2025-12-20 22:28:37 +01:00
Username
2f2ff9a385
proxywatchd: add stats tracking and httpd integration
- Stats class with failure category tracking
- Configurable stats_interval for periodic reports
- Optional httpd server startup when enabled
- cleanup_stale() for removing dead proxies
2025-12-20 22:28:23 +01:00
Username
3f2074f0cf
misc: add log levels and failure categorization
- LOG_LEVELS dict with debug, info, warn, error levels
- set_log_level(), get_log_level() functions
- categorize_error() for RocksockException classification
- FAIL_* constants: timeout, refused, auth, unreachable, dns, ssl, closed, proxy, other
2025-12-20 22:28:16 +01:00
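A sketch of level filtering and error categorization. RocksockException is project-specific, so this version substitutes standard-library socket exceptions; the numeric level values and the `should_log` helper are hypothetical.

```python
import errno
import socket
import ssl

LOG_LEVELS = {"debug": 10, "info": 20, "warn": 30, "error": 40}  # assumed values
_current_level = LOG_LEVELS["info"]


def set_log_level(name):
    global _current_level
    _current_level = LOG_LEVELS[name]


def get_log_level():
    return _current_level


def should_log(name):
    """Emit a message only if its level is at or above the configured one."""
    return LOG_LEVELS[name] >= _current_level


FAIL_TIMEOUT, FAIL_REFUSED, FAIL_UNREACHABLE = "timeout", "refused", "unreachable"
FAIL_DNS, FAIL_SSL, FAIL_CLOSED, FAIL_OTHER = "dns", "ssl", "closed", "other"
# FAIL_AUTH and FAIL_PROXY come from proxy handshake replies, not socket errors.
FAIL_AUTH, FAIL_PROXY = "auth", "proxy"


def categorize_error(exc):
    """Map a standard-library exception to a FAIL_* category (stand-in for RocksockException)."""
    if isinstance(exc, socket.gaierror):
        return FAIL_DNS
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return FAIL_TIMEOUT
    if isinstance(exc, ssl.SSLError):
        return FAIL_SSL
    if isinstance(exc, ConnectionRefusedError):
        return FAIL_REFUSED
    if isinstance(exc, (ConnectionResetError, BrokenPipeError)):
        return FAIL_CLOSED
    if isinstance(exc, OSError) and exc.errno in (errno.ENETUNREACH, errno.EHOSTUNREACH):
        return FAIL_UNREACHABLE
    return FAIL_OTHER
```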
Username
2212a9e00a
httpd: add HTTP API server for proxy queries
- Endpoints: /proxies, /proxies/count, /health
- Query params: limit, proto, country, format (json/plain)
- Threaded server with CORS support
2025-12-20 22:28:10 +01:00
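A sketch of how the /proxies query parameters (limit, proto, country, format) might be applied, kept separate from the server so it can be tested directly. The in-memory row shape, the default limit of 100 and the function name `query_proxies` are assumptions.

```python
import json
from urllib.parse import parse_qs, urlparse


def query_proxies(path, proxies, default_limit=100):
    """Apply /proxies query params to in-memory rows; returns (content_type, body)."""
    params = {k: v[0] for k, v in parse_qs(urlparse(path).query).items()}
    rows = proxies
    if "proto" in params:
        rows = [r for r in rows if r["proto"] == params["proto"]]
    if "country" in params:
        rows = [r for r in rows if r["country"] == params["country"].upper()]
    limit = int(params.get("limit", default_limit))  # ValueError on bad input
    rows = rows[:max(0, limit)]
    if params.get("format") == "plain":
        return "text/plain", "\n".join(r["proxy"] for r in rows)
    return "application/json", json.dumps(rows)
```

A request handler would wrap this, map a ValueError to HTTP 400, and add `Access-Control-Allow-Origin: *` for the CORS support mentioned above.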
Username
3b3267d0db
engines: add modular search engine abstraction
- SearchEngine base class with build_url, extract_urls, is_rate_limited
- Implementations: DuckDuckGo, Startpage, Mojeek, Qwant, Yandex, Ecosia, Brave
- Git hosters: GitHub, GitLab, Codeberg, Gitea
- Searx wrapper for SearXNG instances
2025-12-20 22:28:04 +01:00
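A sketch of the base-class shape named in the commit (build_url, extract_urls, is_rate_limited). The URL template, the href regex and the rate-limit marker strings are assumptions; `ExampleEngine` is a hypothetical subclass pointing at a placeholder host, not one of the real engines.

```python
import re
from urllib.parse import quote_plus


class SearchEngine(object):
    """Base class; subclasses set url_template and may override extract_urls (sketch)."""
    name = "base"
    url_template = None
    rate_limit_markers = ("captcha", "too many requests", "unusual traffic")

    def build_url(self, query, page=0):
        return self.url_template.format(q=quote_plus(query), page=page)

    def extract_urls(self, html):
        # Shared default: pull absolute http(s) links out of href attributes.
        return re.findall(r'href="(https?://[^"]+)"', html)

    def is_rate_limited(self, html):
        lowered = html.lower()
        return any(marker in lowered for marker in self.rate_limit_markers)


class ExampleEngine(SearchEngine):
    """Hypothetical engine used only to illustrate subclassing."""
    name = "example"
    url_template = "https://search.example/?q={q}&page={page}"
```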
Username
8ce6900244
scraper: integrate multi-lingual search terms
- Use translations module so that ~70% of search terms are non-English
- Initialize translations config on startup
- Add engines module for multi-engine support
2025-12-20 22:27:51 +01:00
Username
eeb71a1d55
config: add LibreTranslate settings
- libretranslate_url: API endpoint (default: https://lt.mymx.me/translate)
- libretranslate_enabled: toggle for dynamic translations (default: True)
2025-12-20 22:27:45 +01:00
Username
8132023c97
translations: add multi-lingual search term generation
- Static translations for 15 languages (ru, zh, es, pt, de, fr, ja, ko, ar, id, tr, vi, th, pl, uk)
- LibreTranslate API integration with configurable endpoint
- Dynamic language detection from API /languages endpoint
- Persistent JSON cache with 30-day TTL
- Categorized search terms: generic, protocol, anonymity, freshness, format, sources, geographic, use-case, search operators
- Dynamic year substitution for freshness terms
2025-12-20 22:27:37 +01:00
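A sketch of a persistent JSON cache with a 30-day TTL plus year substitution for freshness terms. The file layout (key mapped to a timestamp and value), the "lang:term" key format and the `{year}` placeholder are assumptions.

```python
import datetime
import json
import os
import time

CACHE_TTL = 30 * 24 * 3600  # 30 days, per the commit message


def _read(path):
    if not os.path.exists(path):
        return {}
    with open(path) as fh:
        return json.load(fh)


def load_cached(path, key, now=None):
    """Return a cached translation, or None if missing or older than the TTL."""
    now = time.time() if now is None else now
    entry = _read(path).get(key)
    if entry is None or now - entry["ts"] > CACHE_TTL:
        return None
    return entry["value"]


def store_cached(path, key, value, now=None):
    now = time.time() if now is None else now
    cache = _read(path)
    cache[key] = {"ts": now, "value": value}
    with open(path, "w") as fh:
        json.dump(cache, fh)


def fill_year(term, year=None):
    """Replace a {year} placeholder so freshness terms stay current."""
    year = datetime.date.today().year if year is None else year
    return term.replace("{year}", str(year))
```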
Username
4547ec3188
roadmap: update completed work
2025-12-20 18:25:55 +01:00
Username
90a6756ade
dbs: add indexes and optimize batch inserts
2025-12-20 18:25:33 +01:00
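A sketch of indexes plus a batched insert with `sqlite3`: `executemany` inside one transaction commits once instead of once per row. The table `proxylist` and the indexed columns `failed` and `tested` are hypothetical names.

```python
import sqlite3


def create_indexes(conn):
    # Indexes on columns a checker might filter or sort by (column names are assumptions).
    conn.execute("CREATE INDEX IF NOT EXISTS idx_proxylist_failed ON proxylist (failed)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_proxylist_tested ON proxylist (tested)")


def insert_proxies(conn, proxies):
    """Insert (proxy, proto) rows in one transaction; existing keys are skipped."""
    with conn:  # commits once at the end, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO proxylist (proxy, proto) VALUES (?, ?)", proxies
        )
```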
Username
c054fa3c11
mysqlite: enable WAL mode for better concurrency
2025-12-20 18:25:33 +01:00
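A sketch of enabling WAL mode when opening the database; the PRAGMA returns the resulting journal mode, which lets the caller confirm the switch. Pairing it with `synchronous=NORMAL` and `check_same_thread=False` are assumptions, not necessarily what mysqlite does.

```python
import sqlite3


def connect_wal(path, timeout=30.0):
    """Open SQLite in WAL mode so readers do not block the writer (sketch)."""
    conn = sqlite3.connect(path, timeout=timeout, check_same_thread=False)
    # The pragma returns the active mode; WAL requires a file-backed database.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    # NORMAL is the commonly paired synchronous level for WAL (an assumption here).
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn, mode
```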