docs: update project instructions

CLAUDE.md

# PPF Project Instructions

## Current State

PPF is a Python 3 proxy scraping and validation framework with:

- Multi-target validation (2/3 majority voting; sketched below)
- SSL/TLS proxy testing with MITM detection
- Web dashboard with electric cyan theme
- Interactive world map (/map endpoint)
- Memory profiling (/api/memory endpoint)
- Tor connection pooling with health monitoring
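
A minimal sketch of the 2/3 majority vote, assuming a simple per-target reachability check; the target URLs and helper names are illustrative, not PPF's actual code:

```python
# Illustrative only: accept a proxy when at least 2 of 3 validation
# targets respond through it. Targets and helper names are hypothetical.
import urllib.request

TARGETS = [
    "https://example.com/",
    "https://example.org/",
    "https://example.net/",
]

def reachable_via(proxy, url, timeout=9):
    """True if `url` answers with a non-error status through `proxy` (host:port)."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False

def majority_vote(proxy, targets=TARGETS, needed=2):
    """2/3 majority: the proxy counts as working if >= `needed` targets pass."""
    return sum(reachable_via(proxy, u) for u in targets) >= needed
```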

## Architecture

```
┌──────────┬─────────────┬────────────────────────────────────────────────────────┐
│ Host │ Role │ Notes
├──────────┼─────────────┼────────────────────────────────────────────────────────┤
│ odin │ Master │ Scrapes proxy lists, verifies conflicts, port 8081
│ forge │ Worker │ Tests proxies, reports to master via WireGuard
│ hermes │ Worker │ Tests proxies, reports to master via WireGuard
│ janus │ Worker │ Tests proxies, reports to master via WireGuard
└──────────┴─────────────┴────────────────────────────────────────────────────────┘
```

### Role Separation

- **Odin (Master)**: Scrapes proxy sources and runs verification tests only; no routine testing. Local Tor only.
- **Workers**: All routine proxy testing. Each uses only its local Tor (127.0.0.1:9050).

## CRITICAL: Directory Structure Differences

```
┌──────────┬─────────────────────────┬──────────────────────────────────────────┐
│ Host │ Code Location │ Container Mount
├──────────┼─────────────────────────┼──────────────────────────────────────────┤
│ odin │ /home/podman/ppf/*.py │ Mounts ppf/ directly to /app
│ workers │ /home/podman/ppf/src/ │ Mounts ppf/src/ to /app (via systemd)
└──────────┴─────────────────────────┴──────────────────────────────────────────┘
```

**ODIN uses the root ppf/ directory. WORKERS use the ppf/src/ subdirectory.**

## Host Access

**ALWAYS use Ansible from `/opt/ansible` with venv activated:**

```bash
cd /opt/ansible && source venv/bin/activate
```

### Quick Reference Commands

```bash
# Check worker status
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible forge,hermes,janus -m shell -a "hostname"

# Check worker config
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible forge,hermes,janus -m shell -a "grep -E 'threads|timeout|ssl' /home/podman/ppf/config.ini"

# Check worker logs
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible forge -m shell -a "sudo -u podman journalctl --user -u ppf-worker -n 20"

# Modify a config option
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible forge,hermes,janus -m lineinfile -a "path=/home/podman/ppf/config.ini line='ssl_only = 1' insertafter='ssl_first'"

# Restart workers (different UIDs!)
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible janus,forge -m raw -a "sudo -u podman XDG_RUNTIME_DIR=/run/user/996 systemctl --user restart ppf-worker"
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible hermes -m raw -a "sudo -u podman XDG_RUNTIME_DIR=/run/user/1001 systemctl --user restart ppf-worker"
```

## Full Deployment Procedure

### Step 1: Validate Syntax Locally

```bash
cd /home/user/git/ppf
for f in *.py; do python3 -m py_compile "$f" && echo "OK: $f"; done
```

### Step 2: Deploy to ALL Hosts

```bash
cd /opt/ansible && source venv/bin/activate

# Deploy to ODIN (root ppf/ directory)
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin -m synchronize \
  -a "src=/home/user/git/ppf/ dest=/home/podman/ppf/ rsync_opts='--include=*.py,--include=servers.txt,--exclude=*'"

# Deploy to WORKERS (ppf/src/ subdirectory)
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible forge,hermes,janus -m synchronize \
  -a "src=/home/user/git/ppf/ dest=/home/podman/ppf/src/ rsync_opts='--include=*.py,--include=servers.txt,--exclude=*'"

# CRITICAL: Fix ownership on ALL hosts (rsync uses ansible user, containers need podman)
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin,forge,hermes,janus -m raw \
  -a "chown -R podman:podman /home/podman/ppf/"
```

**Note:** Ownership must be fixed after every deploy. rsync runs as the ansible user, but the containers run as the podman user; skipping the chown causes `ImportError: No module named X` errors.

### Step 3: Restart Services

```bash
# Restart ODIN (UID 1005)
ansible odin -m raw \
  -a "cd /tmp && XDG_RUNTIME_DIR=/run/user/1005 runuser -u podman -- podman restart ppf"

# Restart WORKERS (note different UIDs)
ansible janus,forge -m raw \
  -a "sudo -u podman XDG_RUNTIME_DIR=/run/user/996 systemctl --user restart ppf-worker"
ansible hermes -m raw \
  -a "sudo -u podman XDG_RUNTIME_DIR=/run/user/1001 systemctl --user restart ppf-worker"
```

### Step 4: Verify All Running

```bash
# Check odin (UID 1005)
ansible odin -m raw \
  -a "cd /tmp && XDG_RUNTIME_DIR=/run/user/1005 runuser -u podman -- podman ps"

# Check workers
ansible janus,forge -m raw \
  -a "sudo -u podman XDG_RUNTIME_DIR=/run/user/996 systemctl --user is-active ppf-worker"
ansible hermes -m raw \
  -a "sudo -u podman XDG_RUNTIME_DIR=/run/user/1001 systemctl --user is-active ppf-worker"
```

## Podman User IDs

```
┌──────────┬───────┬─────────────────────────────┐
│ Host │ UID │ XDG_RUNTIME_DIR
├──────────┼───────┼─────────────────────────────┤
│ odin │ 1005 │ /run/user/1005
│ hermes │ 1001 │ /run/user/1001
│ janus │ 996 │ /run/user/996
│ forge │ 996 │ /run/user/996
└──────────┴───────┴─────────────────────────────┘
```

## Configuration

### Odin config.ini

```ini
[common]
tor_hosts = 127.0.0.1:9050  # Local Tor ONLY

[watchd]
threads = 0                 # NO routine testing
database = data/ppf.sqlite

[scraper]
threads = 10
```

### Worker config.ini

```ini
[common]
tor_hosts = 127.0.0.1:9050  # Local Tor ONLY

[watchd]
threads = 35
timeout = 9
ssl_first = 1               # Try SSL handshake first
ssl_only = 0                # Set to 1 to skip secondary check on SSL failure
checktype = head            # Secondary check type: head, irc, judges
```

### Config Options

```
┌───────────────┬─────────┬────────────────────────────────────────────────────┐
│ Option │ Default │ Description
├───────────────┼─────────┼────────────────────────────────────────────────────┤
│ ssl_first │ 1 │ Try SSL handshake first, fallback to checktype
│ ssl_only │ 0 │ Skip secondary check when SSL fails (faster)
│ checktype │ head │ Secondary check: head, irc, judges
│ threads │ 20 │ Number of test threads
│ timeout │ 15 │ Socket timeout in seconds
└───────────────┴─────────┴────────────────────────────────────────────────────┘
```
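
A minimal sketch of loading the `[watchd]` options above with `configparser`, applying the listed defaults; the dataclass and helper are illustrative, not PPF's actual config code:

```python
# Illustrative reader for the [watchd] options listed above.
# Defaults mirror the table; the dataclass is not PPF's actual API.
import configparser
from dataclasses import dataclass

@dataclass
class WatchdOptions:
    ssl_first: bool = True   # try SSL handshake first, fall back to checktype
    ssl_only: bool = False   # skip secondary check when SSL fails
    checktype: str = "head"  # head, irc, judges
    threads: int = 20
    timeout: int = 15

def load_watchd(path="config.ini"):
    # inline_comment_prefixes lets lines like "threads = 35  # comment" parse cleanly
    cfg = configparser.ConfigParser(inline_comment_prefixes=("#", ";"))
    cfg.read(path)
    d = WatchdOptions()
    if not cfg.has_section("watchd"):
        return d
    s = cfg["watchd"]
    return WatchdOptions(
        ssl_first=s.getboolean("ssl_first", d.ssl_first),
        ssl_only=s.getboolean("ssl_only", d.ssl_only),
        checktype=s.get("checktype", d.checktype),
        threads=s.getint("threads", d.threads),
        timeout=s.getint("timeout", d.timeout),
    )
```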

## Work Distribution

Fair distribution algorithm (httpd.py):

```
fair_share = (due_proxies / active_workers) * 1.2
batch_size = clamp(fair_share, min=100, max=1000)
```

- The master calculates the batch size from queue depth and the number of active workers (see the sketch below)
- Workers shuffle their batch locally to avoid testing the same proxies simultaneously
- Claims expire after 5 minutes if not completed
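
A small sketch of the batch-size calculation, assuming the clamp bounds shown above; it mirrors the formula, not httpd.py's actual code:

```python
# Illustrative batch sizing: ~20% above an even split, clamped to [100, 1000].
def batch_size(due_proxies: int, active_workers: int,
               lo: int = 100, hi: int = 1000) -> int:
    if active_workers < 1:
        return 0
    fair_share = (due_proxies / active_workers) * 1.2
    return int(max(lo, min(hi, fair_share)))

# Examples: a deep queue hits the cap, a shallow one gets the minimum batch.
assert batch_size(260_000, 3) == 1000
assert batch_size(150, 3) == 100
```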

## Worker systemd Unit

Located at `/home/podman/.config/systemd/user/ppf-worker.service`:

```ini
[Unit]
Description=PPF Worker Container
After=network-online.target tor.service

[Service]
Type=simple
Restart=on-failure
RestartSec=10
WorkingDirectory=%h
ExecStartPre=-/usr/bin/podman stop -t 10 ppf-worker
ExecStartPre=-/usr/bin/podman rm -f ppf-worker
ExecStart=/usr/bin/podman run \
  --name ppf-worker --rm --log-driver=journald --network=host \
  -v %h/ppf/src:/app:ro \
  -v %h/ppf/data:/app/data \
  -v %h/ppf/config.ini:/app/config.ini:ro \
  -e PYTHONUNBUFFERED=1 \
  localhost/ppf-worker:latest \
  python -u ppf.py --worker --server http://10.200.1.250:8081
ExecStop=/usr/bin/podman stop -t 10 ppf-worker

[Install]
WantedBy=default.target
```

## Rebuilding Images

```bash
# Workers - from ppf/ directory (Dockerfile copies from src/)
ansible forge,hermes,janus -m raw \
  -a "cd /home/podman/ppf && sudo -u podman podman build -t localhost/ppf-worker:latest ."

# Odin - from ppf/ directory
ansible odin -m raw \
  -a "cd /home/podman/ppf && sudo -u podman podman build -t localhost/ppf:latest ."
```

## API Endpoints

```
/dashboard       Web UI with live statistics
/map             Interactive world map (Leaflet.js)
/health          Health check: {"status": "ok"}
/api/stats       Runtime statistics (JSON)
/api/memory      Memory profiling data (JSON)
/api/workers     Connected worker status
/api/countries   Proxy counts by country
/api/locations   Precise proxy locations (requires DB5)
/proxies         Working proxies (limit, proto, country params)
```
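
A small sketch of consuming these endpoints from a script; the base URL is an assumption (workers reach the master at http://10.200.1.250:8081), and the `/proxies` query values are just examples:

```python
# Illustrative client for the read-only endpoints listed above.
import json
import urllib.request

BASE = "http://10.200.1.250:8081"  # master address as used by the workers

def fetch(path, timeout=10):
    with urllib.request.urlopen(BASE + path, timeout=timeout) as resp:
        return resp.read().decode()

stats = json.loads(fetch("/api/stats"))      # runtime statistics
workers = json.loads(fetch("/api/workers"))  # connected worker status
# /proxies accepts limit, proto and country parameters; the response body
# format is not specified here, so it is kept as plain text.
proxies = fetch("/proxies?limit=50&proto=https&country=DE")
print(json.dumps(stats, indent=2))
print(proxies)
```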

## Memory Analysis

Query production memory state:

```bash
ssh odin "curl -s localhost:8081/api/memory" | python3 -m json.tool
```

Key metrics:
- `start_rss` / `process.VmRSS` - memory growth
- `objgraph_common` - top object types by count
- `samples` - RSS history over time
- `gc.objects` - total GC-tracked objects

Current baseline (~260k queue):
- Start: 442 MB
- Running: 1.6 GB
- Per-job overhead: ~4.5 KB

At ~260,000 queued jobs, 4.5 KB per job is roughly 1.2 GB, which accounts for the growth from 442 MB to 1.6 GB.
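
For scripted checks, a short sketch that pulls `/api/memory` and prints the metrics above; the field names follow the list, but the exact JSON layout is an assumption:

```python
# Illustrative: report RSS growth and the most common object types from
# /api/memory. Field names follow the "Key metrics" list; nesting is assumed.
import json
import urllib.request

def memory_report(base="http://localhost:8081"):
    with urllib.request.urlopen(base + "/api/memory", timeout=10) as resp:
        mem = json.load(resp)
    start = mem.get("start_rss")
    current = mem.get("process", {}).get("VmRSS")
    print("RSS: start=%s current=%s" % (start, current))
    for entry in mem.get("objgraph_common", [])[:5]:
        print(entry)  # e.g. a (type name, count) pair

memory_report()
```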

## Troubleshooting

### Missing servers.txt

Workers need `servers.txt` in src/:

```bash
ansible forge,hermes,janus -m copy \
  -a "src=/home/user/git/ppf/servers.txt dest=/home/podman/ppf/src/servers.txt owner=podman group=podman"
```

### Exit Code 126 (Permission/Storage)

Exit code 126 usually means the container command could not be executed (a permissions or storage problem). Reset podman storage and rebuild:

```bash
sudo -u podman podman system reset --force
# Then rebuild image
```

### Dashboard Shows NaN or Missing Data

Odin is likely running old code. Redeploy to odin:

```bash
ansible odin -m synchronize \
  -a "src=/home/user/git/ppf/ dest=/home/podman/ppf/ rsync_opts='--include=*.py,--include=servers.txt,--exclude=*'"
ansible odin -m raw -a "chown -R podman:podman /home/podman/ppf/"
ansible odin -m raw -a "cd /tmp; sudo -u podman podman restart ppf"
```
### Worker Keeps Crashing

1. Check systemd status with the correct UID
2. Verify servers.txt exists in src/
3. Check ownership
4. Run manually to see the error:

```bash
sudo -u podman podman run --rm --network=host \
  -v /home/podman/ppf/src:/app:ro \
  -v /home/podman/ppf/data:/app/data \
  -v /home/podman/ppf/config.ini:/app/config.ini:ro \
  localhost/ppf-worker:latest \
  python -u ppf.py --worker --server http://10.200.1.250:8081
```

## Files to Deploy

- All *.py files (proxywatchd.py main daemon, httpd.py web dashboard, config.py, dbs.py, fetch.py, etc.)
- servers.txt

## Do NOT Deploy

- config.ini (server-specific)
- data/ contents
- *.sqlite files