PPF Project Roadmap

Project Purpose

PPF (Proxy Fetcher) is a Python 2 proxy scraping and validation framework designed to:

  1. Discover proxy addresses by crawling websites and search engines
  2. Validate proxies through multi-target testing via Tor
  3. Maintain a database of working proxies with protocol detection (SOCKS4/SOCKS5/HTTP)
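
The discovery step can be illustrated with a minimal sketch. It assumes proxies appear in fetched pages as plain `ip:port` text; the pattern and the `extract_proxies` helper are hypothetical and are not taken from ppf.py, which may use a different approach.

```python
import re

# Hypothetical pattern: dotted-quad IPv4 address followed by ":port".
PROXY_RE = re.compile(r'\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b')

def extract_proxies(text):
    """Return a sorted list of unique 'ip:port' strings found in text."""
    found = set()
    for ip, port in PROXY_RE.findall(text):
        octets = [int(o) for o in ip.split('.')]
        # Drop matches that look like addresses but are out of range.
        if all(o <= 255 for o in octets) and 0 < int(port) <= 65535:
            found.add('%s:%s' % (ip, port))
    return sorted(found)
```

Range-checking the octets and the port after the regex match keeps the pattern simple while still rejecting strings such as `999.1.1.1:80`.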

Architecture Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                              PPF Architecture                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐                     │
│  │ scraper.py  │    │   ppf.py    │    │proxywatchd  │                     │
│  │             │    │             │    │             │                     │
│  │ Searx query │───>│ URL harvest │───>│ Proxy test  │                     │
│  │ URL finding │    │ Proxy extract│   │ Validation  │                     │
│  └─────────────┘    └─────────────┘    └─────────────┘                     │
│         │                  │                  │                             │
│         v                  v                  v                             │
│  ┌─────────────────────────────────────────────────────────────────┐       │
│  │                        SQLite Databases                          │       │
│  │  uris.db (URLs)                    proxies.db (proxy list)       │       │
│  └─────────────────────────────────────────────────────────────────┘       │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────┐       │
│  │                         Network Layer                            │       │
│  │  rocksock.py ─── Tor SOCKS ─── Test Proxy ─── Target Server      │       │
│  └─────────────────────────────────────────────────────────────────┘       │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
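
The hand-off between the harvester and the validator through SQLite might look roughly like the sketch below. It uses an in-memory database as a stand-in for proxies.db. The table name, the column names, and all four helper functions are assumptions made for illustration; the real schema lives in dbs.py and may differ.

```python
import sqlite3

def open_db():
    # In-memory stand-in for proxies.db; table and columns are hypothetical.
    db = sqlite3.connect(':memory:')
    db.execute('CREATE TABLE proxylist ('
               'proxy TEXT PRIMARY KEY, proto TEXT, failed INTEGER DEFAULT 0)')
    return db

def add_candidates(db, proxies):
    # Harvester side: INSERT OR IGNORE skips proxies that are already stored.
    db.executemany('INSERT OR IGNORE INTO proxylist (proxy) VALUES (?)',
                   [(p,) for p in proxies])
    db.commit()

def record_result(db, proxy, proto):
    # Validator side: proto is None when every test against the proxy failed.
    if proto is None:
        db.execute('UPDATE proxylist SET failed = failed + 1 WHERE proxy = ?',
                   (proxy,))
    else:
        db.execute('UPDATE proxylist SET proto = ?, failed = 0 WHERE proxy = ?',
                   (proto, proxy))
    db.commit()

def working(db):
    # Proxies with a detected protocol and no outstanding failures.
    rows = db.execute('SELECT proxy, proto FROM proxylist '
                      'WHERE proto IS NOT NULL AND failed = 0 ORDER BY proxy')
    return rows.fetchall()
```

Keeping the two daemons coupled only through the database means either one can be restarted without the other losing state.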

Constraints

  • Python 2.7 compatibility required
  • Minimal external dependencies (avoid adding new modules)
  • Current dependencies: beautifulsoup4, pyasn, IP2Location
  • Data files: IP2LOCATION-LITE-DB1.BIN (country), ipasn.dat (ASN)

Open Work

Validation

| Task | Description | File(s) |
|------|-------------|---------|
| Protocol fingerprinting | Better SOCKS4/SOCKS5/HTTP detection | rocksock.py |
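
One possible starting point for fingerprinting is to classify a proxy by the first bytes of its reply. The sketch below relies only on the published wire formats: a SOCKS5 method-selection reply starts with version byte 0x05, a SOCKS4 reply starts with 0x00 followed by a status code in 0x5A to 0x5D, and an HTTP proxy answers with a status line beginning `HTTP/`. The `classify_reply` function is hypothetical and is not part of rocksock.py.

```python
def classify_reply(data):
    """Guess 'socks5', 'socks4' or 'http' from the first reply bytes.

    Returns None when the reply matches none of the three formats.
    """
    # bytearray yields integers on both Python 2.7 and Python 3.
    head = bytearray(data[:2])
    if data[:5] == b'HTTP/':
        return 'http'
    if len(head) < 2:
        return None
    # SOCKS5: version 0x05, then the chosen auth method
    # (0x00 none, 0x02 username/password, 0xFF no acceptable method).
    if head[0] == 0x05 and head[1] in (0x00, 0x02, 0xFF):
        return 'socks5'
    # SOCKS4: null byte, then a status code (0x5A granted, 0x5B-0x5D rejected).
    if head[0] == 0x00 and 0x5A <= head[1] <= 0x5D:
        return 'socks4'
    return None
```

A reply only means something relative to the probe that was sent, so a real detector would still need to send each protocol's handshake in turn; this function covers just the classification step.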

Target Management

| Task | Description | File(s) |
|------|-------------|---------|
| Dynamic target pool | Auto-discover and rotate validation targets | proxywatchd.py |
| Target health tracking | Remove unresponsive targets from pool | proxywatchd.py |
| Geographic target spread | Ensure targets span multiple regions | config.py |
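
Rotation and health tracking could be combined in a small pool object, sketched below. The `TargetPool` class, its method names, and the `max_failures` threshold are all assumptions for illustration; nothing here reflects existing code in proxywatchd.py.

```python
class TargetPool(object):
    """Round-robin pool that drops targets after repeated failures."""

    def __init__(self, targets, max_failures=3):
        self.max_failures = max_failures
        self.order = list(targets)
        self.failures = dict((t, 0) for t in self.order)
        self.pos = 0

    def next_target(self):
        # Return the next target in rotation, or None if the pool is empty.
        if not self.order:
            return None
        target = self.order[self.pos % len(self.order)]
        self.pos = (self.pos + 1) % len(self.order)
        return target

    def report(self, target, ok):
        # Record one test outcome; a success resets the failure streak.
        if target not in self.failures:
            return
        if ok:
            self.failures[target] = 0
            return
        self.failures[target] += 1
        if self.failures[target] >= self.max_failures:
            idx = self.order.index(target)
            self.order.remove(target)
            del self.failures[target]
            # Keep the rotation cursor pointing at the same next target.
            if idx < self.pos:
                self.pos -= 1
            self.pos = self.pos % len(self.order) if self.order else 0
```

Counting consecutive failures rather than total failures keeps a target in the pool through isolated timeouts, which are likely when every request travels through Tor and an untested proxy.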

File Reference

| File | Purpose |
|------|---------|
| ppf.py | Main URL harvester daemon |
| proxywatchd.py | Proxy validation daemon |
| scraper.py | Searx search integration |
| fetch.py | HTTP fetching with proxy support |
| dbs.py | Database schema and inserts |
| mysqlite.py | SQLite wrapper |
| rocksock.py | Socket/proxy abstraction (3rd party) |
| http2.py | HTTP client implementation |
| httpd.py | Web dashboard and REST API server |
| config.py | Configuration management |
| comboparse.py | Config/arg parser framework |
| soup_parser.py | BeautifulSoup wrapper |
| misc.py | Utilities (timestamp, logging) |
| export.py | Proxy export CLI tool |
| engines.py | Search engine implementations |
| connection_pool.py | Tor connection pooling |
| network_stats.py | Network statistics tracking |
| dns.py | DNS resolution with caching |
| mitm.py | MITM certificate detection |
| job.py | Priority job queue |
| static/dashboard.js | Dashboard frontend logic |
| static/dashboard.html | Dashboard HTML template |
| tools/lib/ppf-common.sh | Shared ops library (hosts, wrappers, colors) |
| tools/ppf-deploy | Deploy wrapper (validation + playbook) |
| tools/ppf-logs | View container logs |
| tools/ppf-service | Container lifecycle management |
| tools/playbooks/deploy.yml | Ansible deploy playbook |
| tools/playbooks/inventory.ini | Host inventory (WireGuard IPs) |