# PPF Project Roadmap

## Project Purpose

PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework designed to:

1. **Discover** proxy addresses by crawling websites and search engines
2. **Validate** proxies through multi-target testing via Tor
3. **Maintain** a database of working proxies with protocol detection (SOCKS4/SOCKS5/HTTP)

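
Much of the discovery step comes down to pulling `ip:port` pairs out of fetched page text. The following is a minimal sketch only. The names `PROXY_RE` and `extract_proxies` are hypothetical and do not come from ppf.py, which may extract proxies differently. This version uses only the standard library, stays Python 2.7 compatible, and rejects impossible octets and ports:

```python
import re

# ip:port not preceded by a digit or dot, with the port not followed by
# another digit. Octet and port ranges are checked in code to keep the
# pattern readable.
PROXY_RE = re.compile(r'(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})(?!\d)')


def extract_proxies(text):
    """Return unique (ip, port) tuples found in text, in first-seen order."""
    seen = set()
    found = []
    for ip, port_str in PROXY_RE.findall(text):
        port = int(port_str)
        # Drop impossible values such as 300.1.1.1 or port 0 / 70000.
        if any(int(o) > 255 for o in ip.split('.')) or not 0 < port < 65536:
            continue
        if (ip, port) not in seen:
            seen.add((ip, port))
            found.append((ip, port))
    return found


if __name__ == '__main__':
    page = 'socks5 1.2.3.4:1080, http 10.0.0.1:8080, again 1.2.3.4:1080'
    print(extract_proxies(page))
```

Doing the range checks in Python keeps the regex short. IPv6 literals and `hostname:port` entries are deliberately out of scope for this sketch.
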
## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                              PPF Architecture                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│         ┌───────────────┐    ┌───────────────┐    ┌───────────────┐         │
│         │ scraper.py    │    │ ppf.py        │    │ proxywatchd   │         │
│         │               │    │               │    │               │         │
│         │ Searx query   │───>│ URL harvest   │───>│ Proxy test    │         │
│         │ URL finding   │    │ Proxy extract │    │ Validation    │         │
│         └───────────────┘    └───────────────┘    └───────────────┘         │
│                 │                    │                    │                 │
│                 v                    v                    v                 │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                           SQLite Databases                            │  │
│  │               uris.db (URLs)    proxies.db (proxy list)               │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                             Network Layer                             │  │
│  │      rocksock.py ─── Tor SOCKS ─── Test Proxy ─── Target Server       │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

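
The diagram implies a simple hand-off through SQLite, where each stage reads what the previous one wrote. The following rough sketch shows that flow with the stdlib `sqlite3` module. It assumes hypothetical, heavily simplified tables; the real schema lives in dbs.py. The stage functions are also hypothetical, and stub callables stand in for the actual fetching and Tor-based testing:

```python
import sqlite3


def open_dbs(uris_path=':memory:', proxies_path=':memory:'):
    """Open the two databases with simplified, hypothetical schemas."""
    uris = sqlite3.connect(uris_path)
    uris.execute('CREATE TABLE IF NOT EXISTS uris '
                 '(url TEXT PRIMARY KEY, fetched INTEGER DEFAULT 0)')
    proxies = sqlite3.connect(proxies_path)
    proxies.execute('CREATE TABLE IF NOT EXISTS proxies '
                    '(proxy TEXT PRIMARY KEY, proto TEXT, working INTEGER DEFAULT 0)')
    return uris, proxies


def scraper_stage(uris, found_urls):
    # scraper.py role: record candidate URLs; duplicates are ignored.
    uris.executemany('INSERT OR IGNORE INTO uris (url) VALUES (?)',
                     [(u,) for u in found_urls])
    uris.commit()


def harvest_stage(uris, proxies, fetch_and_extract):
    # ppf.py role: fetch each unfetched URL, store its proxies, mark it done.
    rows = uris.execute('SELECT url FROM uris WHERE fetched = 0').fetchall()
    for (url,) in rows:
        for proxy in fetch_and_extract(url):
            proxies.execute('INSERT OR IGNORE INTO proxies (proxy) VALUES (?)', (proxy,))
        uris.execute('UPDATE uris SET fetched = 1 WHERE url = ?', (url,))
    uris.commit()
    proxies.commit()


def validate_stage(proxies, test_proxy):
    # proxywatchd role: test each proxy; test_proxy returns a protocol or None.
    rows = proxies.execute('SELECT proxy FROM proxies').fetchall()
    for (proxy,) in rows:
        proto = test_proxy(proxy)
        proxies.execute('UPDATE proxies SET proto = ?, working = ? WHERE proxy = ?',
                        (proto, 1 if proto else 0, proxy))
    proxies.commit()


if __name__ == '__main__':
    uris, proxies = open_dbs()
    scraper_stage(uris, ['http://example.com/list.txt'])
    harvest_stage(uris, proxies, lambda url: ['1.2.3.4:1080', '5.6.7.8:3128'])
    validate_stage(proxies, lambda p: 'socks5' if p.endswith(':1080') else None)
    print(proxies.execute('SELECT proxy, proto, working FROM proxies').fetchall())
```

Using `INSERT OR IGNORE` against a primary key makes re-running a stage harmless. This is an assumption of the sketch, not a description of how dbs.py deduplicates.
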
## Constraints

- **Python 2.7** compatibility required
- **Minimal external dependencies** (avoid adding new third-party modules)
- Current dependencies: beautifulsoup4, pyasn, IP2Location
- Data files: IP2LOCATION-LITE-DB1.BIN (country), ipasn.dat (ASN)

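
The two data files support lookup enrichment: pyasn reads `ipasn.dat` for ASN lookups, and IP2Location reads `IP2LOCATION-LITE-DB1.BIN` for country codes. The following is a minimal sketch. It assumes the `pyasn.pyasn(path).lookup(ip)` and `IP2Location.IP2Location(path).get_country_short(ip)` calls from those libraries' Python APIs. The helper names `load_geo_dbs` and `enrich` are hypothetical. A missing module or unreadable file degrades to `None` instead of aborting, which keeps the enrichment optional:

```python
def load_geo_dbs(asn_path='ipasn.dat', country_path='IP2LOCATION-LITE-DB1.BIN'):
    """Open the ASN and country databases; either may come back as None."""
    asndb = geodb = None
    try:
        import pyasn
        asndb = pyasn.pyasn(asn_path)
    except Exception:  # module not installed, or file missing/unreadable
        asndb = None
    try:
        import IP2Location
        geodb = IP2Location.IP2Location(country_path)
    except Exception:  # same fallback for the country database
        geodb = None
    return asndb, geodb


def enrich(ip, asndb, geodb):
    """Return {'asn': ..., 'country': ...}, with None where no database is loaded."""
    info = {'asn': None, 'country': None}
    if asndb is not None:
        # pyasn's lookup returns an (asn, prefix) pair; asn may be None.
        asn, _prefix = asndb.lookup(ip)
        info['asn'] = asn
    if geodb is not None:
        info['country'] = geodb.get_country_short(ip)
    return info
```
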
---

## Open Work

### Validation

| Task | Description | File(s) |
|------|-------------|---------|
| Protocol fingerprinting | Better SOCKS4/SOCKS5/HTTP detection | rocksock.py |

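
One common approach to protocol fingerprinting is to send a protocol-specific probe and classify the first bytes of the reply. A SOCKS5 server answers the greeting `05 01 00` with `05` plus a method byte. A SOCKS4 server answers a CONNECT request with an 8-byte reply that starts with `00` followed by a status code from `0x5A` to `0x5D`. An HTTP proxy may answer anything with an `HTTP/` status line. The sketch below covers only the classification side and omits sending the probes (through Tor via rocksock.py in this project). The names `socks4_connect_request` and `classify_reply` are hypothetical and are not part of rocksock.py's API:

```python
import struct

# SOCKS5 greeting: VER=5, NMETHODS=1, METHODS=[0x00 "no authentication"]
SOCKS5_GREETING = b'\x05\x01\x00'
# SOCKS4 reply status codes 0x5A-0x5D (granted, rejected, two identd errors)
SOCKS4_STATUS = (b'\x5a', b'\x5b', b'\x5c', b'\x5d')


def socks4_connect_request(ip, port, userid=b''):
    """Build a SOCKS4 CONNECT request (VN=4, CD=1) for an IPv4 destination."""
    packed_ip = struct.pack('!4B', *[int(o) for o in ip.split('.')])
    return b'\x04\x01' + struct.pack('!H', port) + packed_ip + userid + b'\x00'


def classify_reply(probe, reply):
    """Guess 'socks5', 'socks4' or 'http' from the reply to probe, else None."""
    if reply.startswith(b'HTTP/'):
        # An HTTP proxy may answer even a SOCKS probe with a status line.
        return 'http'
    if probe == SOCKS5_GREETING and len(reply) >= 2 and reply[:1] == b'\x05':
        # Method-selection reply; 0xFF ("no acceptable method") is still SOCKS5.
        return 'socks5'
    if (probe[:1] == b'\x04' and len(reply) >= 8 and reply[:1] == b'\x00'
            and reply[1:2] in SOCKS4_STATUS):
        return 'socks4'
    return None
```

A SOCKS5 reply of `05 FF` still identifies the protocol. It is therefore classified as `socks5`, even though that proxy would refuse an unauthenticated session.
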
### Target Management

| Task | Description | File(s) |
|------|-------------|---------|
| Dynamic target pool | Auto-discover and rotate validation targets | proxywatchd.py |
| Target health tracking | Remove unresponsive targets from pool | proxywatchd.py |
| Geographic target spread | Ensure targets span multiple regions | config.py |

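
The three Target Management tasks fit into one structure: a pool that accepts newly discovered targets, evicts targets that keep failing, and samples across regions. The following is a minimal sketch. It assumes each target carries a region label, and it treats `max_failures` consecutive failed probes as "unresponsive". `TargetPool` and its methods are hypothetical, and discovery itself is left to whatever calls `add()`:

```python
import random
from collections import defaultdict


class TargetPool(object):
    """Rotating pool of validation targets with failure tracking (hypothetical)."""

    def __init__(self, max_failures=3, rng=None):
        self.max_failures = max_failures
        self.rng = rng or random.Random()
        self.region = {}    # target -> region label, e.g. 'eu'
        self.failures = {}  # target -> consecutive failed probes

    def add(self, target, region):
        """Register a newly discovered target (no-op if already known)."""
        if target not in self.region:
            self.region[target] = region
            self.failures[target] = 0

    def report(self, target, ok):
        """Record a probe result; evict a target after max_failures in a row."""
        if target not in self.failures:
            return
        if ok:
            self.failures[target] = 0
            return
        self.failures[target] += 1
        if self.failures[target] >= self.max_failures:
            del self.failures[target]
            del self.region[target]

    def pick(self, count):
        """Return up to count targets, taking one per region per round."""
        by_region = defaultdict(list)
        for target, region in self.region.items():
            by_region[region].append(target)
        regions = sorted(by_region)
        self.rng.shuffle(regions)  # rotate which region is served first
        for targets in by_region.values():
            self.rng.shuffle(targets)  # rotate targets within each region
        picked = []
        while len(picked) < count and any(by_region[r] for r in regions):
            for r in regions:
                if by_region[r] and len(picked) < count:
                    picked.append(by_region[r].pop())
        return picked
```

Resetting the counter on success means only consecutive failures count, so a target with occasional timeouts stays in the pool. A sliding-window failure rate would be a stricter alternative.
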
---

## File Reference

| File | Purpose |
|------|---------|
| ppf.py | Main URL harvester daemon |
| proxywatchd.py | Proxy validation daemon |
| scraper.py | Searx search integration |
| fetch.py | HTTP fetching with proxy support |
| dbs.py | Database schema and inserts |
| mysqlite.py | SQLite wrapper |
| rocksock.py | Socket/proxy abstraction (3rd party) |
| http2.py | HTTP client implementation |
| httpd.py | Web dashboard and REST API server |
| config.py | Configuration management |
| comboparse.py | Config/arg parser framework |
| soup_parser.py | BeautifulSoup wrapper |
| misc.py | Utilities (timestamp, logging) |
| export.py | Proxy export CLI tool |
| engines.py | Search engine implementations |
| connection_pool.py | Tor connection pooling |
| network_stats.py | Network statistics tracking |
| dns.py | DNS resolution with caching |
| mitm.py | MITM certificate detection |
| job.py | Priority job queue |
| static/dashboard.js | Dashboard frontend logic |
| static/dashboard.html | Dashboard HTML template |