docs: add README and update ROADMAP

- README.md: installation, configuration, usage, deployment
- ROADMAP.md: mark completed items (pooling, scaling, latency, containers)
- priority matrix updated with completion status
Username
2025-12-21 10:19:18 +01:00
parent 79475c2bff
commit 55bc9a635e
2 changed files with 304 additions and 8 deletions

@@ -177,18 +177,18 @@ PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework design
 ├──────────────────────────┬──────────────────────────────────────────────────┤
 │ HIGH IMPACT / LOW EFFORT │ HIGH IMPACT / HIGH EFFORT                        │
 │                          │                                                  │
-│ ● Unify _known_proxies   │ ● Connection pooling                             │
-│ ● Graceful DB errors     │ ● Dynamic thread scaling                         │
-│ ● Batch inserts          │ ● Unit test infrastructure                       │
-│ ● WAL mode for SQLite    │ ● Latency tracking                               │
+│ [x] Unify _known_proxies │ [x] Connection pooling                           │
+│ [x] Graceful DB errors   │ [x] Dynamic thread scaling                       │
+│ [x] Batch inserts        │ [ ] Unit test infrastructure                     │
+│ [x] WAL mode for SQLite  │ [x] Latency tracking                             │
 │                          │                                                  │
 ├──────────────────────────┼──────────────────────────────────────────────────┤
 │ LOW IMPACT / LOW EFFORT  │ LOW IMPACT / HIGH EFFORT                         │
 │                          │                                                  │
-│ ● Standardize logging    │ ● Geographic validation                          │
-│ ● Config validation      │ ● Additional scrapers                            │
-│ ● Export functionality   │ ● API sources                                    │
-│ ● Status output          │ ● Protocol fingerprinting                        │
+│ [x] Standardize logging  │ [ ] Geographic validation                        │
+│ [x] Config validation    │ [x] Additional scrapers                          │
+│ [ ] Export functionality │ [ ] API sources                                  │
+│ [x] Status output        │ [ ] Protocol fingerprinting                      │
 │                          │                                                  │
 └──────────────────────────┴──────────────────────────────────────────────────┘
 ```
@@ -233,6 +233,41 @@ PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework design
 - [x] Stale proxy cleanup (cleanup_stale() with configurable stale_days)
 - [x] Timeout config options (timeout_connect, timeout_read)
+
+### Connection Pooling (Done)
+- [x] TorHostState class tracking per-host health and latency
+- [x] TorConnectionPool with worker affinity for circuit reuse
+- [x] Exponential backoff (5s, 10s, 20s, 40s, max 60s) on failures
+- [x] Pool warmup and health status reporting
+
+### Priority Queue (Done)
+- [x] PriorityJobQueue class with heap-based ordering
+- [x] calculate_priority() assigns priority 0-4 by proxy state
+- [x] New proxies tested first, high-fail proxies last
+
+### Dynamic Thread Scaling (Done)
+- [x] ThreadScaler class adjusts thread count dynamically
+- [x] Scales up when queue deep and success rate acceptable
+- [x] Scales down when queue shallow or success rate drops
+- [x] Respects min/max bounds with cooldown period
+
+### Latency Tracking (Done)
+- [x] avg_latency, latency_samples columns in proxylist
+- [x] Exponential moving average calculation
+- [x] Migration function for existing databases
+- [x] Latency recorded for successful proxy tests
+
+### Container Support (Done)
+- [x] Dockerfile with Python 2.7-slim base
+- [x] docker-compose.yml for local development
+- [x] Rootless podman deployment documentation
+- [x] Volume mounts for persistent data
+
+### Code Style (Done)
+- [x] Normalized indentation (4-space, no tabs)
+- [x] Removed dead code and unused imports
+- [x] Added docstrings to classes and functions
+- [x] Python 2/3 compatible imports (Queue/queue)

 ---

 ## Technical Debt