PPF - Python Proxy Finder

A Python 2.7 proxy discovery and validation framework.

Overview

PPF discovers proxy addresses by crawling websites and search engines, validates them through multi-target testing via Tor, and maintains a database of working proxies with automatic protocol detection (SOCKS4/SOCKS5/HTTP).

scraper.py ──> ppf.py ──> proxywatchd.py
   │             │              │
   │ search      │ harvest      │ validate
   │ engines     │ proxies      │ via tor
   v             v              v
         SQLite databases

Requirements

  • Python 2.7
  • Tor SOCKS proxy (default: 127.0.0.1:9050)
  • beautifulsoup4 (optional; pass --nobs to use the stdlib HTMLParser instead)

Installation

Local

pip install -r requirements.txt
cp config.ini.sample config.ini
cp servers.txt.sample servers.txt

Container (Rootless)

# On container host, as dedicated user
podman build -t ppf:latest .
podman run --rm ppf:latest python ppf.py --help

Prerequisites for rootless containers:

  • subuid/subgid mappings configured
  • linger enabled (loginctl enable-linger $USER)
  • passt installed for networking

Configuration

Copy config.ini.sample to config.ini and adjust:

[common]
tor_hosts = 127.0.0.1:9050      # Comma-separated Tor SOCKS addresses
timeout_connect = 10             # Connection timeout (seconds)
timeout_read = 15                # Read timeout (seconds)

[watchd]
threads = 10                     # Parallel validation threads
max_fail = 5                     # Failures before proxy marked dead
checktime = 1800                 # Base recheck interval (seconds)
database = proxies.sqlite        # Proxy database path
stale_days = 30                  # Days before removing dead proxies
stats_interval = 300             # Seconds between status reports

[ppf]
threads = 3                      # URL harvesting threads
search = 1                       # Enable search engine discovery
database = websites.sqlite       # URL database path

[scraper]
engines = searx,duckduckgo       # Comma-separated search engines
max_pages = 5                    # Max pages per engine query

[httpd]
enabled = 0                      # Enable REST API
port = 8081                      # API listen port
listenip = 127.0.0.1             # API bind address
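
The tor_hosts value is described above as a comma-separated list of Tor SOCKS addresses. The following is a minimal sketch of how such a value could be split into (host, port) pairs. parse_tor_hosts is a hypothetical helper, not taken from PPF's config.py, and stripping a trailing "# comment" is an assumption about how the inline comments in the sample are handled.

```python
def parse_tor_hosts(value, default_port=9050):
    """Split a comma-separated host:port string into (host, port) tuples.

    Hypothetical helper for illustration; PPF's own config.py may differ.
    """
    # Drop an inline "# comment" in case the raw value still carries one.
    value = value.split('#', 1)[0]
    hosts = []
    for item in value.split(','):
        item = item.strip()
        if not item:
            continue
        host, sep, port = item.rpartition(':')
        if not sep:
            # No port given: fall back to the default Tor SOCKS port.
            host, port = item, default_port
        hosts.append((host, int(port)))
    return hosts
```

For example, parse_tor_hosts("127.0.0.1:9050, 127.0.0.1:9051") yields two (host, port) tuples that a connection pool could rotate through. IPv6 literals are not handled by this sketch.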

Usage

Proxy Validation Daemon

python proxywatchd.py

Validates proxies from the database against multiple targets. Requires:

  • servers.txt listing IRC servers (IRC mode only; otherwise the built-in HTTP targets are used)
  • Running Tor instance

URL Harvester

python ppf.py

Crawls URLs for proxy addresses and feeds them to the validator. It also starts the validation daemon (watchd) internally, so proxywatchd.py does not need to be run separately.
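
To illustrate what "crawling for proxy addresses" can look like, here is a minimal sketch that pulls ip:port pairs out of arbitrary page text with a regular expression. The pattern and the extract_proxies name are assumptions made for this example; the extractor in ppf.py may use different rules.

```python
import re

# IPv4 address followed by ":port"; octet and port ranges are checked below.
PROXY_RE = re.compile(r'\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b')


def extract_proxies(text):
    """Return unique ip:port strings found in text, in order of appearance."""
    seen = set()
    found = []
    for ip, port in PROXY_RE.findall(text):
        # Reject impossible octets such as 999.
        if any(int(octet) > 255 for octet in ip.split('.')):
            continue
        # Reject ports outside 1..65535.
        if not 0 < int(port) <= 65535:
            continue
        proxy = '%s:%s' % (ip, port)
        if proxy not in seen:
            seen.add(proxy)
            found.append(proxy)
    return found
```

The function deduplicates while preserving order, which mirrors the UNIQUE constraint on the proxy column without relying on the database to reject repeats.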

Search Engine Scraper

python scraper.py

Queries search engines for proxy list URLs. Supports:

  • SearXNG instances
  • DuckDuckGo, Startpage, Brave, Ecosia
  • GitHub, GitLab, Codeberg (code search)

Import From File

python ppf.py --file proxies.txt

CLI Flags

--nobs          Use stdlib HTMLParser instead of BeautifulSoup
--file FILE     Import proxies from file
-q, --quiet     Show warnings and errors only
-v, --verbose   Show debug messages
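
The flags above could be declared with the standard argparse module as sketched below. This is an illustration only: whether PPF uses argparse, and how it names the parsed attributes, is not stated here, so build_parser and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog='ppf.py')
    parser.add_argument('--nobs', action='store_true',
                        help='use stdlib HTMLParser instead of BeautifulSoup')
    parser.add_argument('--file', metavar='FILE',
                        help='import proxies from file')
    parser.add_argument('-q', '--quiet', action='store_true',
                        help='show warnings and errors only')
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='show debug messages')
    return parser
```

Calling build_parser().parse_args(['--nobs', '--file', 'proxies.txt']) returns a namespace with nobs set to True and file set to 'proxies.txt'.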

REST API

Enable it by setting enabled = 1 in the [httpd] section of config.ini.

# Get working proxies
curl 'http://localhost:8081/proxies?limit=10&proto=socks5'

# Get count
curl http://localhost:8081/proxies/count

# Health check
curl http://localhost:8081/health

Query parameters:

  • limit - Max results (default: 100)
  • proto - Filter by protocol (socks4/socks5/http)
  • country - Filter by country code
  • asn - Filter by autonomous system number (ASN)
  • format - Output format (json/plain)
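
When calling the API from Python rather than curl, the query string is best built with urlencode so that values are escaped correctly. The sketch below only constructs the URL from the parameters listed above; proxies_url is a hypothetical helper, and the response layout is not documented here, so the example stops before parsing anything.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2.7


def proxies_url(base='http://localhost:8081', **params):
    """Build a /proxies URL from the documented query parameters."""
    allowed = ('limit', 'proto', 'country', 'asn', 'format')
    unknown = sorted(set(params) - set(allowed))
    if unknown:
        raise ValueError('unsupported parameter(s): %s' % ', '.join(unknown))
    # Emit parameters in a fixed order so the URL is reproducible.
    query = urlencode([(k, params[k]) for k in allowed if k in params])
    return base + '/proxies' + ('?' + query if query else '')
```

The resulting string can be passed to urllib2.urlopen on Python 2.7 (or urllib.request.urlopen on Python 3).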

Architecture

Components

File                 Purpose
proxywatchd.py       Proxy validation daemon with multi-target voting
ppf.py               URL harvester and proxy extractor
scraper.py           Search engine integration
fetch.py             HTTP client with proxy support
dbs.py               Database operations
mysqlite.py          SQLite wrapper with WAL mode
connection_pool.py   Tor connection pooling with health tracking
config.py            Configuration management
httpd.py             REST API server

Validation Logic

Each proxy is tested against 3 random targets:

  • 2/3 majority required for success
  • Protocol auto-detected (tries HTTP, SOCKS5, SOCKS4)
  • SSL/TLS tested periodically
  • MITM detection via certificate validation
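
The 2-of-3 voting rule can be sketched as follows. This is not the code from proxywatchd.py: validate and its parameters are hypothetical, and the actual per-target test is abstracted into a caller-supplied check(proxy, target) callable.

```python
import random


def validate(proxy, targets, check, sample_size=3, required=2, rng=random):
    """Return True if proxy passes `required` of `sample_size` random targets.

    `check(proxy, target)` is a caller-supplied callable returning True/False.
    """
    chosen = rng.sample(targets, min(sample_size, len(targets)))
    passed = 0
    for index, target in enumerate(chosen):
        if check(proxy, target):
            passed += 1
        if passed >= required:
            return True   # majority reached, no need to test further
        remaining = len(chosen) - (index + 1)
        if passed + remaining < required:
            return False  # majority can no longer be reached
    return False
```

Requiring a majority rather than a single success means that one unreachable or blocking target does not mark a good proxy as failed, and one lucky response does not mark a bad proxy as working.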

Database Schema

-- proxylist (proxies.sqlite)
proxy TEXT UNIQUE      -- ip:port
proto TEXT             -- socks4/socks5/http
country TEXT           -- 2-letter code
asn INT                -- autonomous system number
failed INT             -- consecutive failures
success_count INT      -- total successes
avg_latency REAL       -- rolling average (ms)
tested INT             -- last test timestamp

-- uris (websites.sqlite)
url TEXT UNIQUE        -- source URL
error INT              -- consecutive errors
stale_count INT        -- checks without new proxies
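
A minimal sketch of querying the proxylist table with the standard sqlite3 module is shown below. It recreates only the columns listed above in an in-memory database; the real table may have further columns or constraints. Treating failed = 0 as "working" is an assumption, and the sample rows use documentation-reserved addresses and AS numbers.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS proxylist (
    proxy TEXT UNIQUE,
    proto TEXT,
    country TEXT,
    asn INT,
    failed INT,
    success_count INT,
    avg_latency REAL,
    tested INT
)
"""


def fastest_working(conn, proto, limit=10):
    """Return (proxy, avg_latency) rows with no consecutive failures."""
    cur = conn.execute(
        "SELECT proxy, avg_latency FROM proxylist "
        "WHERE proto = ? AND failed = 0 "
        "ORDER BY avg_latency ASC LIMIT ?",
        (proto, limit))
    return cur.fetchall()


conn = sqlite3.connect(':memory:')
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO proxylist VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [('192.0.2.1:1080', 'socks5', 'DE', 64500, 0, 12, 850.0, 1766600000),
     ('192.0.2.2:1080', 'socks5', 'NL', 64501, 0, 30, 420.5, 1766600100),
     ('192.0.2.3:8080', 'http', 'US', 64502, 3, 1, 1900.0, 1766500000),
     ('192.0.2.4:1080', 'socks5', 'FR', 64503, 2, 5, 300.0, 1766500000)])
```

Against the real database, replace ':memory:' with the path configured under [watchd] database and skip the schema and insert steps.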

Threading Model

  • Priority queue orders jobs by proxy health
  • Dynamic thread scaling based on success rate
  • Work-stealing ensures even load distribution
  • Tor connection pooling with worker affinity
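
The first bullet, a priority queue ordered by proxy health, can be illustrated with heapq. This sketch assumes that "health" means the consecutive failure count and that healthier proxies are served first; the actual ordering key in proxywatchd.py is not documented here, and JobQueue is a hypothetical name.

```python
import heapq
import itertools


class JobQueue(object):
    """Min-heap of proxies keyed by consecutive failures (healthiest first)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def put(self, proxy, failed):
        heapq.heappush(self._heap, (failed, next(self._counter), proxy))

    def get(self):
        # Pop the entry with the fewest failures; ties resolve by insertion.
        _failed, _order, proxy = heapq.heappop(self._heap)
        return proxy

    def __len__(self):
        return len(self._heap)
```

This class is not thread-safe on its own. With multiple worker threads, the standard library's Queue.PriorityQueue (queue.PriorityQueue on Python 3) provides the same ordering with locking built in.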

Deployment

Systemd Service

[Unit]
Description=PPF Python Proxy Finder
After=network-online.target tor.service
Wants=network-online.target

[Service]
Type=simple
User=ppf
WorkingDirectory=/opt/ppf
# ppf.py is the main entry point (runs harvester + validator)
ExecStart=/usr/bin/python2 ppf.py
Restart=on-failure
RestartSec=30

[Install]
WantedBy=multi-user.target

Container Deployment

# Build
podman build -t ppf:latest .

# Run with persistent storage
# IMPORTANT: Use ppf.py as entry point (runs both harvester + validator)
podman run -d --name ppf \
  --network=host \
  -v ./data:/app/data:Z \
  -v ./config.ini:/app/config.ini:ro \
  ppf:latest python ppf.py

# Generate systemd unit
podman generate systemd --name ppf --files --new

Note: --network=host is required so the container can reach Tor at 127.0.0.1:9050 on the host.

Troubleshooting

Low Success Rate

WATCHD X.XX% SUCCESS RATE - tor circuit blocked?

Possible causes when this message is logged:

  • The Tor circuit may be flagged; restart Tor
  • Target servers may be blocking requests; wait for target rotation
  • Network issues; check connectivity

Database Locked

WAL mode handles most concurrency. If issues persist:

  • Reduce thread count
  • Check disk I/O
  • Verify single instance running

No Proxies Found

  • Check search engines in config
  • Verify Tor connectivity
  • Review scraper logs for rate limiting
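
A quick first step for the "Verify Tor connectivity" item is to check that the SOCKS port accepts TCP connections at all. The sketch below does only that; it does not prove that Tor has finished bootstrapping or that circuits work. tor_reachable is a hypothetical helper, with defaults taken from the configuration sample.

```python
import socket


def tor_reachable(host='127.0.0.1', port=9050, timeout=10):
    """Return True if a TCP connection to the Tor SOCKS port succeeds."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    sock.close()
    return True
```

If this returns False, check that the Tor service is running and that tor_hosts in config.ini points at the right address.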

License

See LICENSE file.
