diff --git a/README.md b/README.md
index 1c77281..9757f56 100644
--- a/README.md
+++ b/README.md
@@ -201,7 +201,7 @@ stale_count INT -- checks without new proxies
 
 ```ini
 [Unit]
-Description=PPF Proxy Validator
+Description=PPF Proxy Fetcher
 After=network-online.target tor.service
 Wants=network-online.target
 
@@ -209,7 +209,8 @@ Wants=network-online.target
 Type=simple
 User=ppf
 WorkingDirectory=/opt/ppf
-ExecStart=/usr/bin/python2 proxywatchd.py
+# ppf.py is the main entry point (runs harvester + validator)
+ExecStart=/usr/bin/python2 ppf.py
 Restart=on-failure
 RestartSec=30
 
@@ -224,15 +225,19 @@ WantedBy=multi-user.target
 podman build -t ppf:latest .
 
 # Run with persistent storage
+# IMPORTANT: use ppf.py as the entry point (runs both harvester + validator)
 podman run -d --name ppf \
+  --network=host \
   -v ./data:/app/data:Z \
   -v ./config.ini:/app/config.ini:ro \
-  ppf:latest python proxywatchd.py
+  ppf:latest python ppf.py
 
 # Generate systemd unit
 podman generate systemd --name ppf --files --new
 ```
+Note: `--network=host` is required for Tor access at 127.0.0.1:9050.
+
 ## Troubleshooting
 
 ### Low Success Rate
diff --git a/ROADMAP.md b/ROADMAP.md
index 3103b2b..2e0563b 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -187,7 +187,7 @@ PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework design
 │                          │                                                  │
 │ [x] Standardize logging  │ [x] Geographic validation                        │
 │ [x] Config validation    │ [x] Additional scrapers                          │
-│ [ ] Export functionality │ [ ] API sources                                  │
+│ [x] Export functionality │ [ ] API sources                                  │
 │ [x] Status output        │ [ ] Protocol fingerprinting                      │
 │                          │                                                  │
 └──────────────────────────┴──────────────────────────────────────────────────┘
@@ -281,6 +281,22 @@ PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework design
 - [x] TLS handshake validation with certificate verification
 - [x] Detects MITM proxies that intercept SSL connections
 
+### Export Functionality (Done)
+- [x] export.py CLI tool for exporting working proxies
+- [x] Multiple formats: txt, json, csv, len (length-prefixed)
+- [x] Filters: proto, country, anonymity, max_latency
+- [x] Sort options: latency, added, tested, success
+- [x] Output to stdout or file
+
+### Web Dashboard (Done)
+- [x] /dashboard endpoint with dark theme HTML UI
+- [x] /api/stats endpoint for JSON runtime statistics
+- [x] Auto-refresh with JavaScript fetch every 5 seconds
+- [x] Stats provider callback from proxywatchd.py to httpd.py
+- [x] Displays: tested/passed/success rate, thread count, uptime
+- [x] Tor pool health: per-host latency, success rate, availability
+- [x] Failure categories breakdown: timeout, proxy, ssl, closed
+
 ---
 
 ## Technical Debt
@@ -311,3 +327,4 @@ PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework design
 | comboparse.py | Config/arg parser framework | Stable, cleaned |
 | soup_parser.py | BeautifulSoup wrapper | Stable, cleaned |
 | misc.py | Utilities (timestamp, logging) | Stable, cleaned |
+| export.py | Proxy export CLI tool | Active |
diff --git a/TODO.md b/TODO.md
index 3dd5e5e..4657825 100644
--- a/TODO.md
+++ b/TODO.md
@@ -133,37 +133,14 @@ and report() methods. Integrated into main loop with configurable stats_interval
 
 ---
 
-### [ ] 14. Export Functionality
+### [x] 14. Export Functionality
 
-**Problem:** No easy way to export working proxies for use elsewhere.
-
-**Implementation:**
-```python
-# new file: export.py
-def export_proxies(proxydb, format='txt', filters=None):
-    """Export working proxies to various formats."""
-
-    query = 'SELECT proto, proxy FROM proxylist WHERE failed=0'
-    if filters:
-        if 'proto' in filters:
-            query += ' AND proto=?'
-
-    rows = proxydb.execute(query).fetchall()
-
-    if format == 'txt':
-        return '\n'.join('%s://%s' % (r[0], r[1]) for r in rows)
-    elif format == 'json':
-        import json
-        return json.dumps([{'proto': r[0], 'address': r[1]} for r in rows])
-    elif format == 'csv':
-        return 'proto,address\n' + '\n'.join('%s,%s' % r for r in rows)
-
-# CLI: python export.py --format json --proto socks5 > proxies.json
-```
-
-**Files:** new export.py
-**Effort:** Low
-**Risk:** Low
+**Completed.** Added export.py CLI tool for exporting working proxies.
+- Formats: txt (default), json, csv, len (length-prefixed)
+- Filters: --proto, --country, --anonymity, --max-latency
+- Options: --sort (latency, added, tested, success), --limit, --pretty
+- Output: stdout or --output file
+- Usage: `python export.py --proto http --country US --sort latency --limit 100`
 
 ---
 
@@ -251,8 +228,16 @@ if __name__ == '__main__':
 - Integrated into proxywatchd.py (starts when httpd.enabled=True)
 - Config: [httpd] section with listenip, port, enabled
 
-### [ ] 20. Web Dashboard
-Status page showing live statistics.
+### [x] 20. Web Dashboard
+
+**Completed.** Added web dashboard with live statistics.
+- httpd.py: DASHBOARD_HTML template with dark theme UI
+- Endpoint: /dashboard (HTML page with auto-refresh)
+- Endpoint: /api/stats (JSON runtime statistics)
+- Stats include: tested/passed counts, success rate, thread count, uptime
+- Tor pool health: per-host latency, success rate, availability
+- Failure categories: timeout, proxy, ssl, closed, etc.
+- proxywatchd.py: get_runtime_stats() method provides stats callback
 
 ---
diff --git a/config.ini.sample b/config.ini.sample
index fcd02a2..25c6016 100644
--- a/config.ini.sample
+++ b/config.ini.sample
@@ -4,6 +4,7 @@ tor_hosts = 127.0.0.1:9050
 [watchd]
 max_fail = 5
 threads = 10
+min_threads = 5
 timeout = 9
 submit_after = 200
 use_ssl = 0
@@ -26,8 +27,7 @@ threads = 3
 tor_safeguard = 0
 
 [scraper]
-
-[flood]
+enabled = 1
 
 [httpd]
 listenip = 127.0.0.1
diff --git a/http2.py b/http2.py
index 93c0e59..f1e311e 100644
--- a/http2.py
+++ b/http2.py
@@ -159,7 +159,7 @@ class RsHttp():
         if postdata != '':
             s += postdata
         if self.debugreq:
-            print ">>>\n", s
+            print(">>>\n", s)
         return s
 
     def _make_head_request(self, url, extras=None):
@@ -268,7 +268,7 @@ class RsHttp():
             res = res.decode(charset)
 
         if self.debugreq:
-            print "<<<\n", s, res
+            print("<<<\n", s, res)
 
         return (s, res, redirect)
 
@@ -377,7 +377,7 @@ class RsHttp():
             l = self.conn.recvline().strip()
             s += l + '\n'
             if l == '': break
-        if self.debugreq: print "<<<\n", s
+        if self.debugreq: print("<<<\n", s)
         return s
 
     def head(self, url, extras=None):
@@ -433,7 +433,7 @@ if __name__ == '__main__':
     http = RsHttp(host=host, port=port, timeout=15, ssl=use_ssl, follow_redirects=True, auto_set_cookies=True)
     http.debugreq = True
     if not http.connect():
-        print "sorry, couldn't connect"
+        print("sorry, couldn't connect")
     else:
         hdr = http.head(uri)
         hdr, res = http.get(uri)
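The `len` format that export.py gains in this patch is only named ("length-prefixed"), not specified. For illustration only, here is a round-trip sketch of one plausible length-prefixed framing — the 2-byte big-endian record header, the `proto://host:port` record text, and the `encode_len`/`decode_len` function names are all assumptions, not export.py's actual layout:

```python
import io
import struct

# Hypothetical framing (assumption): each record is a 2-byte big-endian
# length followed by that many bytes of UTF-8 "proto://host:port" text.

def encode_len(proxies):
    """Serialize (proto, address) pairs into a length-prefixed byte stream."""
    out = io.BytesIO()
    for proto, addr in proxies:
        rec = ('%s://%s' % (proto, addr)).encode('utf-8')
        out.write(struct.pack('>H', len(rec)))  # 2-byte record length
        out.write(rec)
    return out.getvalue()

def decode_len(blob):
    """Parse a length-prefixed byte stream back into proxy URL strings."""
    records, offset = [], 0
    while offset < len(blob):
        (n,) = struct.unpack_from('>H', blob, offset)
        offset += 2
        records.append(blob[offset:offset + n].decode('utf-8'))
        offset += n
    return records
```

A consumer can then read records without scanning for delimiters, which is the usual motivation for a length-prefixed format; if export.py uses a different header size or byte order, only the `'>H'` struct format would change.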