# PPF Project Roadmap

## Project Purpose

PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework designed to:

1. **Discover** proxy addresses by crawling websites and search engines
2. **Validate** proxies through multi-target testing via Tor
3. **Maintain** a database of working proxies with protocol detection (SOCKS4/SOCKS5/HTTP)

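As a rough illustration of the Discover step, the sketch below pulls `ip:port` candidates out of fetched page text with a regular expression and drops entries whose octets or ports are out of range. It is not PPF's actual extraction code: `PROXY_RE` and `extract_proxies` are hypothetical names chosen for this example, and the snippet sticks to syntax that also runs on Python 2.7.

```python
import re

# Candidate "ip:port" tokens; octet and port ranges are checked separately below.
PROXY_RE = re.compile(r'\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b')


def extract_proxies(text):
    """Return unique, syntactically valid 'ip:port' strings in order of appearance."""
    seen = set()
    found = []
    for ip, port in PROXY_RE.findall(text):
        if any(int(octet) > 255 for octet in ip.split('.')):
            continue  # not a valid IPv4 address
        if not 1 <= int(port) <= 65535:
            continue  # port out of range
        candidate = '%s:%d' % (ip, int(port))
        if candidate not in seen:
            seen.add(candidate)
            found.append(candidate)
    return found


sample = 'alive: 10.0.0.1:8080, dup 10.0.0.1:8080, bad 999.1.1.1:80, 192.168.1.5:1080'
print(extract_proxies(sample))
```

Candidates found this way are only syntactically plausible. Whether a proxy actually works, and which protocol it speaks, is still decided by the validation step.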
## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                              PPF Architecture                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐                      │
│  │ scraper.py  │    │ ppf.py      │    │ proxywatchd │                      │
│  │             │    │             │    │             │                      │
│  │ Searx query │───>│ URL harvest │───>│ Proxy test  │                      │
│  │ URL finding │    │Proxy extract│    │ Validation  │                      │
│  └─────────────┘    └─────────────┘    └─────────────┘                      │
│         │                  │                  │                             │
│         v                  v                  v                             │
│  ┌─────────────────────────────────────────────────────────────────┐        │
│  │                        SQLite Databases                         │        │
│  │  uris.db (URLs)              proxies.db (proxy list)            │        │
│  └─────────────────────────────────────────────────────────────────┘        │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────┐        │
│  │                          Network Layer                          │        │
│  │  rocksock.py ─── Tor SOCKS ─── Test Proxy ─── Target Server     │        │
│  └─────────────────────────────────────────────────────────────────┘        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Constraints

- **Python 2.7** compatibility required
- **Minimal external dependencies** (avoid adding new modules)
- Current dependencies: beautifulsoup4, pyasn, IP2Location
- Data files: IP2LOCATION-LITE-DB1.BIN (country), ipasn.dat (ASN)

---

## Completed

### Target Management

| Task | Description | File(s) |
|------|-------------|---------|
| Target health tracking | Cooldown-based health tracking for all target pools (head, SSL, IRC, judges) | stats.py, proxywatchd.py |
| MITM field in proxy list | Expose mitm boolean in JSON proxy list endpoints | httpd.py |

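One way to picture the cooldown-based health tracking listed above is the sketch below: a target that fails several times in a row is skipped until its cooldown expires. This is an assumption-laden illustration, not the code in stats.py or proxywatchd.py. The class `TargetHealth`, its methods, the thresholds, and the example host names are all hypothetical.

```python
import time


class TargetHealth(object):
    """Track failures per target and bench a target after repeated failures."""

    def __init__(self, max_failures=3, cooldown=300, clock=time.time):
        self.max_failures = max_failures  # consecutive failures before cooldown
        self.cooldown = cooldown          # seconds a failing target is skipped
        self.clock = clock                # injectable so tests can control time
        self.failures = {}                # target -> consecutive failure count
        self.benched_until = {}           # target -> timestamp when usable again

    def record_success(self, target):
        # A success clears both the failure streak and any active cooldown.
        self.failures.pop(target, None)
        self.benched_until.pop(target, None)

    def record_failure(self, target):
        count = self.failures.get(target, 0) + 1
        self.failures[target] = count
        if count >= self.max_failures:
            self.benched_until[target] = self.clock() + self.cooldown
            self.failures[target] = 0  # start fresh once the cooldown ends

    def is_available(self, target):
        return self.clock() >= self.benched_until.get(target, 0)

    def available(self, pool):
        return [t for t in pool if self.is_available(t)]


# Usage with a fake clock (hypothetical target names).
now = [1000.0]
health = TargetHealth(max_failures=2, cooldown=60, clock=lambda: now[0])
pool = ['judge-a.example', 'judge-b.example']
health.record_failure('judge-a.example')
health.record_failure('judge-a.example')
print(health.available(pool))  # judge-a.example is benched here
now[0] += 61
print(health.available(pool))  # cooldown elapsed, both targets usable again
```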
---

## Open Work

### Target Management

| Task | Description | File(s) |
|------|-------------|---------|
| Dynamic target pool | Auto-discover and rotate validation targets | proxywatchd.py |
| Geographic target spread | Ensure targets span multiple regions | config.py |

---

## File Reference

| File | Purpose |
|------|---------|
| ppf.py | Main URL harvester daemon |
| proxywatchd.py | Proxy validation daemon |
| scraper.py | Searx search integration |
| fetch.py | HTTP fetching with proxy support |
| dbs.py | Database schema and inserts |
| mysqlite.py | SQLite wrapper |
| rocksock.py | Socket/proxy abstraction (3rd party) |
| http2.py | HTTP client implementation |
| httpd.py | Web dashboard and REST API server |
| config.py | Configuration management |
| comboparse.py | Config/arg parser framework |
| soup_parser.py | BeautifulSoup wrapper |
| misc.py | Utilities (timestamp, logging) |
| export.py | Proxy export CLI tool |
| engines.py | Search engine implementations |
| connection_pool.py | Tor connection pooling |
| network_stats.py | Network statistics tracking |
| dns.py | DNS resolution with caching |
| mitm.py | MITM certificate detection |
| job.py | Priority job queue |
| static/dashboard.js | Dashboard frontend logic |
| static/dashboard.html | Dashboard HTML template |
| tools/lib/ppf-common.sh | Shared ops library (hosts, wrappers, colors) |
| tools/ppf-deploy | Deploy wrapper (validation + playbook) |
| tools/ppf-logs | View container logs |
| tools/ppf-service | Container lifecycle management |
| tools/playbooks/deploy.yml | Ansible deploy playbook |
| tools/playbooks/inventory.ini | Host inventory (WireGuard IPs) |