# PPF Project Roadmap

## Project Purpose

PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework designed to:

1. **Discover** proxy addresses by crawling websites and search engines
2. **Validate** proxies through multi-target testing via Tor
3. **Maintain** a database of working proxies with protocol detection (SOCKS4/SOCKS5/HTTP)

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                              PPF Architecture                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐                      │
│  │ scraper.py  │    │   ppf.py    │    │proxywatchd  │                      │
│  │             │    │             │    │             │                      │
│  │ Searx query │───>│ URL harvest │───>│ Proxy test  │                      │
│  │ URL finding │    │Proxy extract│    │ Validation  │                      │
│  └─────────────┘    └─────────────┘    └─────────────┘                      │
│         │                  │                  │                             │
│         v                  v                  v                             │
│  ┌─────────────────────────────────────────────────────────────────┐        │
│  │                        SQLite Databases                         │        │
│  │  uris.db (URLs)              proxies.db (proxy list)            │        │
│  └─────────────────────────────────────────────────────────────────┘        │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────┐        │
│  │                          Network Layer                          │        │
│  │  rocksock.py ─── Tor SOCKS ─── Test Proxy ─── Target Server     │        │
│  └─────────────────────────────────────────────────────────────────┘        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Constraints

- **Python 2.7** compatibility required
- **Minimal external dependencies** (avoid adding new modules)
- Current dependencies: beautifulsoup4, pyasn, IP2Location
- Data files: IP2LOCATION-LITE-DB1.BIN (country), ipasn.dat (ASN)

---

## Open Work

### Validation

| Task | Description | File(s) |
|------|-------------|---------|
| Protocol fingerprinting | Better SOCKS4/SOCKS5/HTTP detection | rocksock.py |

### Target Management

| Task | Description | File(s) |
|------|-------------|---------|
| Dynamic target pool | Auto-discover and rotate validation targets | proxywatchd.py |
| Target health tracking | Remove unresponsive targets from pool | proxywatchd.py |
| Geographic target spread | Ensure targets span multiple regions | config.py |

---

## File Reference

| File | Purpose |
|------|---------|
| ppf.py | Main URL harvester daemon |
| proxywatchd.py | Proxy validation daemon |
| scraper.py | Searx search integration |
| fetch.py | HTTP fetching with proxy support |
| dbs.py | Database schema and inserts |
| mysqlite.py | SQLite wrapper |
| rocksock.py | Socket/proxy abstraction (3rd party) |
| http2.py | HTTP client implementation |
| httpd.py | Web dashboard and REST API server |
| config.py | Configuration management |
| comboparse.py | Config/arg parser framework |
| soup_parser.py | BeautifulSoup wrapper |
| misc.py | Utilities (timestamp, logging) |
| export.py | Proxy export CLI tool |
| engines.py | Search engine implementations |
| connection_pool.py | Tor connection pooling |
| network_stats.py | Network statistics tracking |
| dns.py | DNS resolution with caching |
| mitm.py | MITM certificate detection |
| job.py | Priority job queue |
| static/dashboard.js | Dashboard frontend logic |
| static/dashboard.html | Dashboard HTML template |
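
As a rough illustration of the "Target health tracking" item under Open Work, the sketch below shows one way unresponsive targets could be dropped from a pool after repeated failures. It is not taken from proxywatchd.py: the `TargetPool` class, its `report` and `active` methods, the `max_failures` threshold, and the example host names are all hypothetical. It uses only the standard library and should run on Python 2.7 as well as Python 3.

```python
import time


class TargetPool(object):
    """Hypothetical pool that evicts targets after consecutive failures."""

    def __init__(self, targets, max_failures=3):
        # max_failures is an assumed threshold, not a PPF config value
        self.max_failures = max_failures
        # consecutive failure count for every target still in the pool
        self.failures = dict((t, 0) for t in targets)
        # evicted target -> timestamp of eviction
        self.removed = {}

    def report(self, target, ok):
        """Record one check result for a target."""
        if target not in self.failures:
            return  # unknown or already evicted
        if ok:
            self.failures[target] = 0  # a success resets the streak
            return
        self.failures[target] += 1
        if self.failures[target] >= self.max_failures:
            del self.failures[target]
            self.removed[target] = time.time()

    def active(self):
        """Return the targets still considered healthy, sorted."""
        return sorted(self.failures)


pool = TargetPool(["a.example:80", "b.example:80"], max_failures=2)
pool.report("a.example:80", False)
pool.report("a.example:80", False)
print(pool.active())  # ['b.example:80']
```

Counting consecutive failures rather than total failures keeps a target that fails only occasionally in rotation. Recording the eviction time would allow a later retry policy, which this sketch does not implement.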