PPF Project Roadmap
Project Purpose
PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework designed to:
- Discover proxy addresses by crawling websites and search engines
- Validate each proxy against multiple targets, with test connections routed through Tor
- Maintain a database of working proxies with protocol detection (SOCKS4/SOCKS5/HTTP)
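The discovery step can be illustrated with a minimal sketch of pulling `ip:port` candidates out of fetched page text. The pattern and the function name `extract_proxies` are assumptions made for illustration, not the extraction logic in ppf.py; the code avoids external modules and runs on Python 2.7 as well as Python 3.

```python
import re

# Hypothetical pattern: dotted-quad IPv4, a colon, then a 1-5 digit port.
PROXY_RE = re.compile(r'\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b')


def extract_proxies(text):
    """Return a sorted list of unique 'ip:port' strings found in text."""
    found = set()
    for ip, port in PROXY_RE.findall(text):
        octets = [int(o) for o in ip.split('.')]
        # Reject candidates with out-of-range octets or ports.
        if all(o <= 255 for o in octets) and 0 < int(port) <= 65535:
            found.add('%s:%s' % (ip, port))
    # Plain string sort, used only to make the output deterministic.
    return sorted(found)
```

Deduplicating before validation keeps the same address from being tested repeatedly when it appears on several pages.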
Architecture Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│                              PPF Architecture                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐                      │
│  │ scraper.py  │    │   ppf.py    │    │ proxywatchd │                      │
│  │             │    │             │    │             │                      │
│  │ Searx query │───>│ URL harvest │───>│ Proxy test  │                      │
│  │ URL finding │    │Proxy extract│    │ Validation  │                      │
│  └─────────────┘    └─────────────┘    └─────────────┘                      │
│         │                  │                  │                             │
│         v                  v                  v                             │
│  ┌─────────────────────────────────────────────────────────────────┐        │
│  │                        SQLite Databases                         │        │
│  │     uris.db (URLs)              proxies.db (proxy list)         │        │
│  └─────────────────────────────────────────────────────────────────┘        │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────┐        │
│  │                          Network Layer                          │        │
│  │   rocksock.py ─── Tor SOCKS ─── Test Proxy ─── Target Server    │        │
│  └─────────────────────────────────────────────────────────────────┘        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
Constraints
- Python 2.7 compatibility required
- Minimal external dependencies (avoid adding new modules)
- Current dependencies: beautifulsoup4, pyasn, IP2Location
- Data files: IP2LOCATION-LITE-DB1.BIN (country), ipasn.dat (ASN)
Open Work
Validation
| Task | Description | File(s) |
|---|---|---|
| Protocol fingerprinting | Better SOCKS4/SOCKS5/HTTP detection | rocksock.py |
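One possible direction for the fingerprinting task is to classify a proxy by the first bytes of its handshake reply. The sketch below is an assumption about how that could look, not the detection code in rocksock.py; `classify_reply` is a hypothetical name, and it presumes the caller has already sent a matching probe (a SOCKS5 greeting, a SOCKS4 CONNECT request, or an HTTP CONNECT). The byte values follow the SOCKS4 and SOCKS5 reply formats.

```python
SOCKS4_REPLY_CODES = (b'\x5a', b'\x5b', b'\x5c', b'\x5d')  # 90..93


def classify_reply(data):
    """Guess the proxy protocol from the first bytes of a handshake reply.

    Returns 'http', 'socks5', 'socks4', or None when nothing matches.
    """
    # An HTTP proxy answers with a status line such as "HTTP/1.1 200 ...".
    if data[:5] == b'HTTP/':
        return 'http'
    if len(data) < 2:
        return None
    # SOCKS5 method-selection reply: version byte 0x05, then a method byte.
    if data[:1] == b'\x05':
        return 'socks5'
    # SOCKS4 reply: null version byte, then a result code in 90..93.
    if data[:1] == b'\x00' and data[1:2] in SOCKS4_REPLY_CODES:
        return 'socks4'
    return None
```

Slicing (`data[:1]`) rather than indexing keeps the comparisons identical on Python 2.7 and Python 3, where indexing a byte string yields different types.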
Target Management
| Task | Description | File(s) |
|---|---|---|
| Dynamic target pool | Auto-discover and rotate validation targets | proxywatchd.py |
| Target health tracking | Remove unresponsive targets from pool | proxywatchd.py |
| Geographic target spread | Ensure targets span multiple regions | config.py |
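The rotation and health-tracking tasks above could be combined in one small structure. The following is a minimal sketch under stated assumptions: the class `TargetPool`, its method names, and the `max_failures` threshold are hypothetical and do not exist in proxywatchd.py. It hands out targets round-robin and drops a target after a run of consecutive failures.

```python
class TargetPool(object):
    """Round-robin pool that evicts targets after repeated failures."""

    def __init__(self, targets, max_failures=3):
        self.max_failures = max_failures
        self.order = list(targets)
        self.failures = dict((t, 0) for t in self.order)
        self.pos = 0  # index of the next target to hand out

    def next_target(self):
        """Return the next target in rotation, or None if the pool is empty."""
        if not self.order:
            return None
        target = self.order[self.pos]
        self.pos = (self.pos + 1) % len(self.order)
        return target

    def report(self, target, ok):
        """Record a test outcome; evict the target once it fails too often."""
        if target not in self.failures:
            return  # already evicted or never known
        if ok:
            self.failures[target] = 0  # a success resets the failure streak
            return
        self.failures[target] += 1
        if self.failures[target] >= self.max_failures:
            idx = self.order.index(target)
            del self.order[idx]
            del self.failures[target]
            # Keep the rotation pointing at the same upcoming target.
            if idx < self.pos:
                self.pos -= 1
            self.pos = self.pos % len(self.order) if self.order else 0
```

Counting consecutive rather than total failures means a target that recovers is not penalized for an earlier outage. Geographic spread is not modeled here.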
File Reference
| File | Purpose |
|---|---|
| ppf.py | Main URL harvester daemon |
| proxywatchd.py | Proxy validation daemon |
| scraper.py | Searx search integration |
| fetch.py | HTTP fetching with proxy support |
| dbs.py | Database schema and inserts |
| mysqlite.py | SQLite wrapper |
| rocksock.py | Socket/proxy abstraction (3rd party) |
| http2.py | HTTP client implementation |
| httpd.py | Web dashboard and REST API server |
| config.py | Configuration management |
| comboparse.py | Config/arg parser framework |
| soup_parser.py | BeautifulSoup wrapper |
| misc.py | Utilities (timestamp, logging) |
| export.py | Proxy export CLI tool |
| engines.py | Search engine implementations |
| connection_pool.py | Tor connection pooling |
| network_stats.py | Network statistics tracking |
| dns.py | DNS resolution with caching |
| mitm.py | MITM certificate detection |
| job.py | Priority job queue |
| static/dashboard.js | Dashboard frontend logic |
| static/dashboard.html | Dashboard HTML template |
| tools/lib/ppf-common.sh | Shared ops library (hosts, wrappers, colors) |
| tools/ppf-deploy | Deploy wrapper (validation + playbook) |
| tools/ppf-logs | View container logs |
| tools/ppf-service | Container lifecycle management |
| tools/playbooks/deploy.yml | Ansible deploy playbook |
| tools/playbooks/inventory.ini | Host inventory (WireGuard IPs) |