docs: update roadmap and task tracking

- README: update feature list
- ROADMAP: add completed features, update priorities
- TODO: mark completed tasks, add new items
- config.ini.sample: update example values
- http2: minor cleanup
Username
2025-12-23 17:24:25 +01:00
parent 1c437d2246
commit eb1bba0e13
5 changed files with 49 additions and 42 deletions


@@ -201,7 +201,7 @@ stale_count INT -- checks without new proxies
```ini
[Unit]
-Description=PPF Proxy Validator
+Description=PPF Proxy Fetcher
After=network-online.target tor.service
Wants=network-online.target
@@ -209,7 +209,8 @@ Wants=network-online.target
Type=simple
User=ppf
WorkingDirectory=/opt/ppf
-ExecStart=/usr/bin/python2 proxywatchd.py
+# ppf.py is the main entry point (runs harvester + validator)
+ExecStart=/usr/bin/python2 ppf.py
Restart=on-failure
RestartSec=30
@@ -224,15 +225,19 @@ WantedBy=multi-user.target
podman build -t ppf:latest .
# Run with persistent storage
# IMPORTANT: Use ppf.py as entry point (runs both harvester + validator)
podman run -d --name ppf \
--network=host \
-v ./data:/app/data:Z \
-v ./config.ini:/app/config.ini:ro \
-ppf:latest python proxywatchd.py
+ppf:latest python ppf.py
# Generate systemd unit
podman generate systemd --name ppf --files --new
```
Note: `--network=host` is required so the container can reach Tor at 127.0.0.1:9050.
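Before starting the container it can be worth confirming that the Tor SOCKS port is actually reachable. A minimal sketch (the host and port come from the note above; the `port_open` helper is not part of PPF):

```python
import socket

def port_open(host, port, timeout=3):
    """Return True if a TCP connect to host:port succeeds within timeout."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        return s.connect_ex((host, port)) == 0
    finally:
        s.close()

if __name__ == '__main__':
    # True when Tor's SOCKS listener is up; False otherwise
    print(port_open('127.0.0.1', 9050))
```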
## Troubleshooting
### Low Success Rate


@@ -187,7 +187,7 @@ PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework design
│ │ │
│ [x] Standardize logging │ [x] Geographic validation │
│ [x] Config validation │ [x] Additional scrapers │
-│ [ ] Export functionality │ [ ] API sources │
+│ [x] Export functionality │ [ ] API sources │
│ [x] Status output │ [ ] Protocol fingerprinting │
│ │ │
└──────────────────────────┴──────────────────────────────────────────────────┘
@@ -281,6 +281,22 @@ PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework design
- [x] TLS handshake validation with certificate verification
- [x] Detects MITM proxies that intercept SSL connections
### Export Functionality (Done)
- [x] export.py CLI tool for exporting working proxies
- [x] Multiple formats: txt, json, csv, len (length-prefixed)
- [x] Filters: proto, country, anonymity, max_latency
- [x] Sort options: latency, added, tested, success
- [x] Output to stdout or file
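The exact framing of the `len` (length-prefixed) format is defined by export.py and not shown here; a minimal sketch of what such a format typically looks like, assuming a 2-byte big-endian length before each `proto://address` record:

```python
import struct

def encode_len(proxies):
    """Frame each record as <2-byte big-endian length><record bytes>.
    Hypothetical framing -- the real layout is whatever export.py emits."""
    out = b''
    for proto, addr in proxies:
        rec = ('%s://%s' % (proto, addr)).encode('ascii')
        out += struct.pack('>H', len(rec)) + rec
    return out

def decode_len(blob):
    """Inverse of encode_len: walk the buffer, reading one record per prefix."""
    recs, i = [], 0
    while i < len(blob):
        (n,) = struct.unpack_from('>H', blob, i)
        i += 2
        recs.append(blob[i:i + n].decode('ascii'))
        i += n
    return recs
```

Length prefixes let a consumer split records without scanning for delimiters, which is why the format is handy for piping into other tools.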
### Web Dashboard (Done)
- [x] /dashboard endpoint with dark theme HTML UI
- [x] /api/stats endpoint for JSON runtime statistics
- [x] Auto-refresh with JavaScript fetch every 5 seconds
- [x] Stats provider callback from proxywatchd.py to httpd.py
- [x] Displays: tested/passed/success rate, thread count, uptime
- [x] Tor pool health: per-host latency, success rate, availability
- [x] Failure categories breakdown: timeout, proxy, ssl, closed
---
## Technical Debt
@@ -311,3 +327,4 @@ PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework design
| comboparse.py | Config/arg parser framework | Stable, cleaned |
| soup_parser.py | BeautifulSoup wrapper | Stable, cleaned |
| misc.py | Utilities (timestamp, logging) | Stable, cleaned |
| export.py | Proxy export CLI tool | Active |

TODO.md

@@ -133,37 +133,14 @@ and report() methods. Integrated into main loop with configurable stats_interval
---
-### [ ] 14. Export Functionality
+### [x] 14. Export Functionality
**Problem:** No easy way to export working proxies for use elsewhere.
**Implementation:**
```python
# new file: export.py
import json

def export_proxies(proxydb, format='txt', filters=None):
    """Export working proxies to various formats."""
    query = 'SELECT proto, proxy FROM proxylist WHERE failed=0'
    params = []
    if filters and 'proto' in filters:
        query += ' AND proto=?'
        params.append(filters['proto'])
    rows = proxydb.execute(query, params).fetchall()
    if format == 'txt':
        return '\n'.join('%s://%s' % (r[0], r[1]) for r in rows)
    elif format == 'json':
        return json.dumps([{'proto': r[0], 'address': r[1]} for r in rows])
    elif format == 'csv':
        return 'proto,address\n' + '\n'.join('%s,%s' % (r[0], r[1]) for r in rows)
    raise ValueError('unknown format: %s' % format)

# CLI: python export.py --format json --proto socks5 > proxies.json
```
**Files:** new export.py
**Effort:** Low
**Risk:** Low
**Completed.** Added export.py CLI tool for exporting working proxies.
- Formats: txt (default), json, csv, len (length-prefixed)
- Filters: --proto, --country, --anonymity, --max-latency
- Options: --sort (latency, added, tested, success), --limit, --pretty
- Output: stdout or --output file
- Usage: `python export.py --proto http --country US --sort latency --limit 100`
---
@@ -251,8 +228,16 @@ if __name__ == '__main__':
- Integrated into proxywatchd.py (starts when httpd.enabled=True)
- Config: [httpd] section with listenip, port, enabled
-### [ ] 20. Web Dashboard
-Status page showing live statistics.
+### [x] 20. Web Dashboard
**Completed.** Added web dashboard with live statistics.
- httpd.py: DASHBOARD_HTML template with dark theme UI
- Endpoint: /dashboard (HTML page with auto-refresh)
- Endpoint: /api/stats (JSON runtime statistics)
- Stats include: tested/passed counts, success rate, thread count, uptime
- Tor pool health: per-host latency, success rate, availability
- Failure categories: timeout, proxy, ssl, closed, etc.
- proxywatchd.py: get_runtime_stats() method provides stats callback
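The callback wiring described above might look like this. A minimal sketch: the `get_runtime_stats` name follows the bullet, but the stand-in classes, field names, and defaults are assumptions, not PPF's actual internals:

```python
import json
import time

class Watchd(object):
    """Stand-in for proxywatchd.py's stats side (field names assumed)."""
    def __init__(self):
        self.started = time.time()
        self.tested = 0
        self.passed = 0

    def get_runtime_stats(self):
        rate = (100.0 * self.passed / self.tested) if self.tested else 0.0
        return {
            'tested': self.tested,
            'passed': self.passed,
            'success_rate': round(rate, 1),
            'uptime': int(time.time() - self.started),
        }

class Httpd(object):
    """Stand-in for httpd.py: calls the provider when /api/stats is hit."""
    def __init__(self, stats_provider):
        self.stats_provider = stats_provider

    def api_stats(self):
        return json.dumps(self.stats_provider())

watchd = Watchd()
watchd.tested, watchd.passed = 200, 150
httpd = Httpd(watchd.get_runtime_stats)
```

Passing the bound method as a callback keeps httpd.py free of any import of proxywatchd.py, so the HTTP layer never touches validator state directly.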
---


@@ -4,6 +4,7 @@ tor_hosts = 127.0.0.1:9050
[watchd]
max_fail = 5
threads = 10
min_threads = 5
timeout = 9
submit_after = 200
use_ssl = 0
@@ -26,8 +27,7 @@ threads = 3
tor_safeguard = 0
[scraper]
[flood]
enabled = 1
[httpd]
listenip = 127.0.0.1
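Since `min_threads` is a newly added key, older config.ini files won't have it; a sketch of reading it with a fallback (Python 3 `configparser` shown for the example, the assumed default of half of `threads` is illustrative, not PPF's actual behavior):

```python
from configparser import ConfigParser

sample = u"""
[watchd]
max_fail = 5
threads = 10
min_threads = 5
timeout = 9
"""

cfg = ConfigParser()
cfg.read_string(sample)  # use cfg.read('config.ini') for a real file

threads = cfg.getint('watchd', 'threads')
# Fall back when min_threads is absent (fallback value is an assumption)
if cfg.has_option('watchd', 'min_threads'):
    min_threads = cfg.getint('watchd', 'min_threads')
else:
    min_threads = max(1, threads // 2)
```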


@@ -159,7 +159,7 @@ class RsHttp():
if postdata != '':
s += postdata
if self.debugreq:
-print ">>>\n", s
+print(">>>\n", s)
return s
def _make_head_request(self, url, extras=None):
@@ -268,7 +268,7 @@ class RsHttp():
res = res.decode(charset)
if self.debugreq:
-print "<<<\n", s, res
+print("<<<\n", s, res)
return (s, res, redirect)
@@ -377,7 +377,7 @@ class RsHttp():
l = self.conn.recvline().strip()
s += l + '\n'
if l == '': break
-if self.debugreq: print "<<<\n", s
+if self.debugreq: print("<<<\n", s)
return s
def head(self, url, extras=None):
@@ -433,7 +433,7 @@ if __name__ == '__main__':
http = RsHttp(host=host, port=port, timeout=15, ssl=use_ssl, follow_redirects=True, auto_set_cookies=True)
http.debugreq = True
if not http.connect():
-print "sorry, couldn't connect"
+print("sorry, couldn't connect")
else:
hdr = http.head(uri)
hdr, res = http.get(uri)