PPF Implementation Tasks
Legend
[ ] Not started
[~] In progress
[x] Completed
[!] Blocked/needs discussion
Immediate Priority (Next Sprint)
[x] 1. Unify _known_proxies Cache
Completed. Added init_known_proxies(), add_known_proxies(), is_known_proxy()
to fetch.py. Updated ppf.py to use these functions instead of local cache.
[x] 2. Graceful SQLite Error Handling
Completed. mysqlite.py now retries on "locked" errors with exponential backoff.
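The retry-on-locked behaviour can be sketched standalone (this is an illustration, not the actual mysqlite.py code; execute_with_retry and its parameters are hypothetical names):

```python
import sqlite3
import time

def execute_with_retry(conn, sql, params=(), retries=5, base_delay=0.1):
    """Retry on 'database is locked' with exponential backoff
    (0.1s, 0.2s, 0.4s, ...); re-raise any other error immediately."""
    for attempt in range(retries):
        try:
            return conn.execute(sql, params)
        except sqlite3.OperationalError as e:
            if "locked" not in str(e) or attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

conn = sqlite3.connect(":memory:")
execute_with_retry(conn, "CREATE TABLE t (x INTEGER)")
execute_with_retry(conn, "INSERT INTO t VALUES (?)", (1,))
print(conn.execute("SELECT x FROM t").fetchone()[0])
```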
[x] 3. Enable SQLite WAL Mode
Completed. mysqlite.py enables WAL mode and NORMAL synchronous on init.
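The WAL setup amounts to two PRAGMAs at connection time; a minimal sketch (open_db is a hypothetical wrapper, WAL requires a file-backed database, not :memory:):

```python
import os
import sqlite3
import tempfile

def open_db(path):
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a writer is active;
    # synchronous=NORMAL is safe under WAL and reduces fsync cost.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn

path = os.path.join(tempfile.mkdtemp(), "proxy.db")
conn = open_db(path)
mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
print(mode)  # wal
```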
[x] 4. Batch Database Inserts
Completed. dbs.py uses executemany() for batch inserts.
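The batching pattern, shown against a simplified stand-in for the proxylist table (column names here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE proxylist (ip TEXT, port INTEGER, proto TEXT)")

rows = [("1.2.3.4", 8080, "http"), ("5.6.7.8", 1080, "socks5")]
# One executemany() + one commit() instead of a round trip per row.
conn.executemany("INSERT INTO proxylist VALUES (?, ?, ?)", rows)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM proxylist").fetchone()[0])  # 2
```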
[x] 5. Add Database Indexes
Completed. dbs.py creates indexes on failed, tested, proto, error, check_time.
Short Term (This Month)
[x] 6. Log Level Filtering
Completed. Added log level filtering with -q/--quiet and -v/--verbose CLI flags.
- misc.py: LOG_LEVELS dict, set_log_level(), get_log_level()
- config.py: Added -q/--quiet and -v/--verbose arguments
- Log levels: debug=0, info=1, warn=2, error=3
- --quiet: only show warn/error
- --verbose: show debug messages
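The filtering logic described above can be sketched like this (a simplified stand-in for the misc.py helpers, not the actual code; the boolean return is added here only for testability):

```python
LOG_LEVELS = {'debug': 0, 'info': 1, 'warn': 2, 'error': 3}
_current_level = LOG_LEVELS['info']

def set_log_level(name):
    global _current_level
    _current_level = LOG_LEVELS[name]

def log(level, msg):
    """Print the message only if its level passes the current filter."""
    if LOG_LEVELS[level] >= _current_level:
        print("[%s] %s" % (level, msg))
        return True
    return False

set_log_level('warn')        # --quiet behaviour: warn/error only
log('info', 'suppressed')
log('error', 'shown')
```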
[x] 7. Connection Timeout Standardization
Completed. Added timeout_connect and timeout_read to [common] section in config.py.
[x] 8. Failure Categorization
Completed. Added failure categorization for proxy errors.
- misc.py: categorize_error() function, FAIL_* constants
- Categories: timeout, refused, auth, unreachable, dns, ssl, closed, proxy, other
- proxywatchd.py: Stats.record() now accepts category parameter
- Stats.report() shows failure breakdown by category
- ProxyTestState.evaluate() returns (success, category) tuple
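A minimal sketch of substring-based categorization (the pattern table is illustrative; the real categorize_error() may also inspect errno values and cover the remaining categories):

```python
FAIL_TIMEOUT = 'timeout'
FAIL_REFUSED = 'refused'
FAIL_DNS = 'dns'
FAIL_SSL = 'ssl'
FAIL_OTHER = 'other'

# Ordered (needle, category) pairs; first match wins.
_PATTERNS = [
    ('timed out', FAIL_TIMEOUT),
    ('refused', FAIL_REFUSED),
    ('name or service not known', FAIL_DNS),
    ('certificate', FAIL_SSL),
]

def categorize_error(message):
    m = message.lower()
    for needle, category in _PATTERNS:
        if needle in m:
            return category
    return FAIL_OTHER

print(categorize_error('Connection refused'))  # refused
```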
[x] 9. Priority Queue for Proxy Testing
Completed. Added priority-based job scheduling for proxy tests.
- PriorityJobQueue class with heap-based ordering
- calculate_priority() assigns priority 0-4 based on proxy state
- Priority 0: New proxies (never tested)
- Priority 1: Working proxies (no failures)
- Priority 2: Low fail count (< 3)
- Priority 3-4: Medium/high fail count
- Integrated into prepare_jobs() for automatic prioritization
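The scheme above can be sketched with heapq (a simplified stand-in for the real classes; the cutoff between priorities 3 and 4 is an assumption):

```python
import heapq
import itertools

class PriorityJobQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order per priority

    def push(self, priority, job):
        heapq.heappush(self._heap, (priority, next(self._counter), job))

    def pop(self):
        return heapq.heappop(self._heap)[2]

def calculate_priority(last_tested, fail_count):
    if last_tested is None:
        return 0                      # new proxy, never tested
    if fail_count == 0:
        return 1                      # known working
    if fail_count < 3:
        return 2                      # low fail count
    return 3 if fail_count < 10 else 4  # medium/high (threshold assumed)

q = PriorityJobQueue()
q.push(calculate_priority(12345, 5), 'flaky')
q.push(calculate_priority(None, 0), 'new')
print(q.pop())  # new
```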
[x] 10. Periodic Statistics Output
Completed. Added Stats class to proxywatchd.py with record(), should_report(), and report() methods. Integrated into main loop with configurable stats_interval.
Medium Term (Next Quarter)
[x] 11. Tor Connection Pooling
Completed. Added connection pooling with worker-Tor affinity and health monitoring.
- connection_pool.py: TorHostState class tracks per-host health, latency, backoff
- connection_pool.py: TorConnectionPool with worker affinity, warmup, statistics
- proxywatchd.py: Workers get consistent Tor host assignment for circuit reuse
- Success/failure tracking with exponential backoff (5s, 10s, 20s, 40s, max 60s)
- Latency tracking with rolling averages
- Pool status reported alongside periodic stats
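The backoff schedule (5s doubling to a 60s cap) can be sketched in isolation (a trimmed-down illustration of the per-host state, not the full connection_pool.py class):

```python
class TorHostState:
    """Per-host health tracking; backoff doubles 5s -> 10s -> 20s -> 40s, capped at 60s."""
    def __init__(self, host):
        self.host = host
        self.backoff = 0  # seconds to wait before retrying this host

    def record_failure(self):
        self.backoff = min(60, self.backoff * 2 if self.backoff else 5)

    def record_success(self):
        self.backoff = 0  # healthy again, no delay

h = TorHostState('127.0.0.1:9050')
for _ in range(5):
    h.record_failure()
print(h.backoff)  # 60
```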
[x] 12. Dynamic Thread Scaling
Completed. Added dynamic thread scaling based on queue depth and success rate.
- ThreadScaler class in proxywatchd.py with should_scale(), status_line()
- Scales up when queue is deep (2x target) and success rate > 10%
- Scales down when queue is shallow or success rate drops
- Min/max threads derived from config.watchd.threads (1/4x to 2x)
- 30-second cooldown between scaling decisions
- _spawn_thread(), _remove_thread(), _adjust_threads() helper methods
- Scaler status reported alongside periodic stats
[x] 13. Latency Tracking
Completed. Added per-proxy latency tracking with exponential moving average.
- dbs.py: avg_latency, latency_samples columns added to proxylist schema
- dbs.py: _migrate_latency_columns() for backward-compatible migration
- dbs.py: update_proxy_latency() with EMA (alpha = 2/(samples+1))
- proxywatchd.py: ProxyTestState.last_latency_ms field
- proxywatchd.py: evaluate() calculates average latency from successful tests
- proxywatchd.py: submit_collected() records latency for passing proxies
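The EMA update with alpha = 2/(samples+1) works out as follows (standalone illustration; update_latency is a hypothetical helper, not the dbs.py function):

```python
def update_latency(avg_latency, samples, new_ms):
    """Exponential moving average with alpha = 2/(samples+1)."""
    samples += 1
    if samples == 1:
        return float(new_ms), samples     # first sample seeds the average
    alpha = 2.0 / (samples + 1)
    return avg_latency + alpha * (new_ms - avg_latency), samples

avg, n = 0.0, 0
for ms in (100, 200, 300):
    avg, n = update_latency(avg, n, ms)
print("%.1f %d" % (avg, n))  # 233.3 3
```

Recent samples dominate: after 100, 200, 300 the average sits at ~233 rather than the plain mean of 200.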
[x] 14. Export Functionality
Completed. Added export.py CLI tool for exporting working proxies.
- Formats: txt (default), json, csv, len (length-prefixed)
- Filters: --proto, --country, --anonymity, --max-latency
- Options: --sort (latency, added, tested, success), --limit, --pretty
- Output: stdout or --output file
- Usage:
python export.py --proto http --country US --sort latency --limit 100
[ ] 15. Unit Test Infrastructure
Problem: No automated tests. Changes can break existing functionality silently.
Implementation:
tests/
├── __init__.py
├── test_proxy_utils.py # Test IP validation, cleansing
├── test_extract.py # Test proxy/URL extraction
├── test_database.py # Test DB operations with temp DB
└── mock_network.py # Mock rocksock for offline testing
# tests/test_proxy_utils.py
import unittest
import sys
sys.path.insert(0, '..')
import fetch

class TestProxyValidation(unittest.TestCase):
    def test_valid_proxy(self):
        self.assertTrue(fetch.is_usable_proxy('8.8.8.8:8080'))

    def test_private_ip_rejected(self):
        self.assertFalse(fetch.is_usable_proxy('192.168.1.1:8080'))
        self.assertFalse(fetch.is_usable_proxy('10.0.0.1:8080'))
        self.assertFalse(fetch.is_usable_proxy('172.16.0.1:8080'))

    def test_invalid_port_rejected(self):
        self.assertFalse(fetch.is_usable_proxy('8.8.8.8:0'))
        self.assertFalse(fetch.is_usable_proxy('8.8.8.8:99999'))

if __name__ == '__main__':
    unittest.main()
Files: tests/ directory
Effort: High (initial), Low (ongoing)
Risk: Low
Long Term (Future)
[x] 16. Geographic Validation
Completed. Added IP2Location and pyasn for proxy geolocation.
- requirements.txt: Added IP2Location package
- proxywatchd.py: IP2Location for country lookup, pyasn for ASN lookup
- proxywatchd.py: Fixed ValueError handling when database files missing
- data/: IP2LOCATION-LITE-DB1.BIN (2.7M), ipasn.dat (23M)
- Output shows country codes, e.g. http://1.2.3.4:8080 (US), (IN), (DE), etc.
[x] 17. SSL Proxy Testing
Completed. Added SSL checktype for TLS handshake validation.
- config.py: Default checktype changed to 'ssl'
- proxywatchd.py: ssl_targets list with major HTTPS sites
- Validates TLS handshake with certificate verification
- Detects MITM proxies that intercept SSL connections
[x] 18. Additional Search Engines
Completed. Added modular search engine architecture.
- engines.py: SearchEngine base class with build_url(), extract_urls(), is_rate_limited()
- Engines: DuckDuckGo, Startpage, Mojeek (UK), Qwant (FR), Yandex (RU), Ecosia, Brave
- Git hosters: GitHub, GitLab, Codeberg, Gitea
- scraper.py: EngineTracker class for multi-engine rate limiting
- Config: [scraper] engines, max_pages settings
- searx.instances: Updated with 51 active SearXNG instances
[x] 19. REST API
Completed. Added HTTP API server for querying working proxies.
- httpd.py: ProxyAPIServer class with BaseHTTPServer
- Endpoints: /proxies, /proxies/count, /health
- Params: limit, proto, country, format (json/plain)
- Integrated into proxywatchd.py (starts when httpd.enabled=True)
- Config: [httpd] section with listenip, port, enabled
[x] 20. Web Dashboard
Completed. Added web dashboard with live statistics.
- httpd.py: DASHBOARD_HTML template with dark theme UI
- Endpoint: /dashboard (HTML page with auto-refresh)
- Endpoint: /api/stats (JSON runtime statistics)
- Stats include: tested/passed counts, success rate, thread count, uptime
- Tor pool health: per-host latency, success rate, availability
- Failure categories: timeout, proxy, ssl, closed, etc.
- proxywatchd.py: get_runtime_stats() method provides stats callback
[x] 21. Dashboard Enhancements (v2)
Completed. Major dashboard improvements for better visibility.
- Prominent check type badge in header (SSL/JUDGES/HTTP/IRC with color coding)
- System monitor bar: load average, memory usage, disk usage, process RSS
- Anonymity breakdown: elite/anonymous/transparent proxy counts
- Database health indicators: size, tested/hour, added/day, dead count
- Enhanced Tor pool: total requests, success rate, healthy nodes, avg latency
- SQLite ANALYZE/VACUUM functions for query optimization (dbs.py)
- Database statistics API (get_database_stats())
[x] 22. Completion Queue Optimization
Completed. Eliminated polling bottleneck in proxy test collection.
- Added completion_queue for event-driven state signaling
- ProxyTestState.record_result() signals when all targets complete
- collect_work() drains queue instead of polling all pending states
- Changed pending_states from list to dict for O(1) removal
- Result: is_complete() eliminated from hot path, collect_work() 54x faster
Profiling-Based Performance Optimizations
Baseline: 30-minute profiling session, 25.6M function calls, 1842s runtime
The following optimizations were identified through cProfile analysis. Each is assessed for real-world impact based on measured data.
[x] 1. SQLite Query Batching
Completed. Added batch update functions and optimized submit_collected().
Implementation:
- batch_update_proxy_latency(): single SELECT with IN clause, compute EMA in Python, batch UPDATE with executemany()
- batch_update_proxy_anonymity(): batch all anonymity updates in a single executemany()
- submit_collected(): uses batch functions instead of per-proxy loops
Previous State:
- 18,182 execute() calls consuming 50.6s (2.7% of runtime)
- Individual UPDATE for each proxy latency and anonymity
Improvement:
- Reduced from N execute() + N commit() to 1 SELECT + 1 executemany() per batch
- Estimated 15-25% reduction in SQLite overhead
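The batch pattern (one SELECT with an IN clause, recompute in Python, one executemany UPDATE) can be illustrated standalone; the table layout is simplified and a plain average stands in for the EMA:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE proxylist (ip TEXT PRIMARY KEY, avg_latency REAL)")
conn.executemany("INSERT INTO proxylist VALUES (?, ?)",
                 [("1.1.1.1", 100.0), ("2.2.2.2", 200.0)])

new_latency = {"1.1.1.1": 50.0, "2.2.2.2": 400.0}

# One SELECT with an IN clause instead of N SELECTs.
placeholders = ",".join("?" * len(new_latency))
rows = conn.execute(
    "SELECT ip, avg_latency FROM proxylist WHERE ip IN (%s)" % placeholders,
    list(new_latency)).fetchall()

# Compute new values in Python, then one executemany() instead of N UPDATEs.
updates = [((old + new_latency[ip]) / 2.0, ip) for ip, old in rows]
conn.executemany("UPDATE proxylist SET avg_latency = ? WHERE ip = ?", updates)
conn.commit()
```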
[ ] 2. Proxy Validation Caching
Current State:
- is_usable_proxy(): 174,620 calls, 1.79s total
- fetch.py:242 <genexpr>: 3,403,165 calls, 3.66s total (proxy iteration)
- Many repeated validations for the same proxy strings
Proposed Change:
- Add LRU cache decorator to is_usable_proxy()
- Cache size: 10,000 entries (covers typical working set)
- TTL: none needed (IP validity doesn't change)
Assessment:
Current cost: 5.5s per 30min = 11s/hour = 4.4min/day
Potential saving: 50-70% cache hit rate = 2.7-3.8s per 30min = 5-8s/hour
Effort: Very low (add @lru_cache decorator)
Risk: None (pure function, deterministic output)
Verdict: LOW PRIORITY. Minimal gain for minimal effort. Do if convenient.
[x] 3. Regex Pattern Pre-compilation
Completed. Pre-compiled proxy extraction pattern at module load.
Implementation:
- fetch.py: added PROXY_PATTERN = re.compile(r'...') at module level
- extract_proxies(): changed re.findall(pattern, ...) to PROXY_PATTERN.findall(...)
- Pattern compiled once at import, not on each call
Previous State:
- extract_proxies(): 166 calls, 2.87s total (17.3ms each)
- Pattern recompiled on each call
Improvement:
- Eliminated per-call regex compilation overhead
- Estimated 30-50% reduction in extract_proxies() time
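The module-level pre-compilation looks like this (the pattern below is a simplified stand-in for the real one in fetch.py):

```python
import re

# Compiled once at import time, not on each extract_proxies() call.
PROXY_PATTERN = re.compile(r'(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})')

def extract_proxies(text):
    return ['%s:%s' % m for m in PROXY_PATTERN.findall(text)]

print(extract_proxies('found 1.2.3.4:8080 and 5.6.7.8:3128 today'))
```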
[ ] 4. JSON Stats Response Caching
Current State:
- 1.9M calls to JSON encoder functions
- _iterencode_dict: 1.4s, _iterencode_list: 0.8s
- Dashboard polls every 3 seconds = 600 requests per 30min
- Most stats data unchanged between requests
Proposed Change:
- Cache serialized JSON response with short TTL (1-2 seconds)
- Only regenerate when underlying stats change
- Use ETag/If-None-Match for client-side caching
Assessment:
Current cost: ~5.5s per 30min (JSON encoding overhead)
Potential saving: 60-80% = 3.3-4.4s per 30min = 6.6-8.8s/hour
Effort: Medium (add caching layer to httpd.py)
Risk: Low (stale stats for 1-2 seconds acceptable)
Verdict: LOW PRIORITY. Only matters with frequent dashboard access.
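The proposed TTL cache can be sketched as a thin wrapper around the serializer (CachedStats is a hypothetical name; httpd.py would call render() per request):

```python
import json
import time

class CachedStats:
    """Serialize stats at most once per TTL window (the proposed 1-2s)."""
    def __init__(self, ttl=1.5):
        self.ttl = ttl
        self._body = None
        self._expires = 0.0

    def render(self, get_stats):
        now = time.time()
        if self._body is None or now >= self._expires:
            self._body = json.dumps(get_stats())  # encoder runs only on expiry
            self._expires = now + self.ttl
        return self._body

calls = []
cache = CachedStats(ttl=60)
stats = lambda: calls.append(1) or {'tested': 42}
print(cache.render(stats))
print(cache.render(stats))   # second call served from cache
```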
[ ] 5. Object Pooling for Test States
Current State:
- __new__ calls: 43,413, 10.1s total
- ProxyTestState.__init__: 18,150 calls, 0.87s
- TargetTestJob creation: similar overhead
- Objects created and discarded each test cycle
Proposed Change:
- Implement object pool for ProxyTestState and TargetTestJob
- Reset and reuse objects instead of creating new
- Pool size: 2x thread count
Assessment:
Current cost: ~11s per 30min = 22s/hour = 14.7min/day
Potential saving: 50-70% = 5.5-7.7s per 30min = 11-15s/hour = 7-10min/day
Effort: High (significant refactoring, reset logic needed)
Risk: Medium (state leakage bugs if reset incomplete)
Verdict: NOT RECOMMENDED. High effort, medium risk, modest gain. Python's object creation is already optimized. Focus elsewhere.
[ ] 6. SQLite Connection Reuse
Current State:
- 718 connection opens in 30min session
- Each open: 0.26ms (total 0.18s for connects)
- Connection per operation pattern in mysqlite.py
Proposed Change:
- Maintain persistent connection per thread
- Implement connection pool with health checks
- Reuse connections across operations
Assessment:
Current cost: 0.18s per 30min (connection overhead only)
Potential saving: 90% = 0.16s per 30min = 0.32s/hour
Effort: Medium (thread-local storage, lifecycle management)
Risk: Medium (connection state, locking issues)
Verdict: NOT RECOMMENDED. Negligible time savings (0.16s per 30min). SQLite's lightweight connections don't justify pooling complexity.
Summary: Optimization Priority Matrix
┌─────────────────────────────────────┬────────┬────────┬─────────┬───────────┐
│ Optimization                        │ Effort │ Risk   │ Savings │ Status    │
├─────────────────────────────────────┼────────┼────────┼─────────┼───────────┤
│ 1. SQLite Query Batching            │ Low    │ Low    │ 20-34s/h│ DONE      │
│ 2. Proxy Validation Caching         │ V.Low  │ None   │ 5-8s/h  │ Maybe     │
│ 3. Regex Pre-compilation            │ Low    │ None   │ 5-8s/h  │ DONE      │
│ 4. JSON Response Caching            │ Medium │ Low    │ 7-9s/h  │ Later     │
│ 5. Object Pooling                   │ High   │ Medium │ 11-15s/h│ Skip      │
│ 6. SQLite Connection Reuse          │ Medium │ Medium │ 0.3s/h  │ Skip      │
└─────────────────────────────────────┴────────┴────────┴─────────┴───────────┘
Completed: 1 (SQLite Batching), 3 (Regex Pre-compilation)
Remaining: 2 (Proxy Caching - Maybe), 4 (JSON Caching - Later)
Realized savings from completed optimizations:
Per hour: 25-42 seconds saved
Per day: 10-17 minutes saved
Per week: 1.2-2.0 hours saved
Note: 68.7% of runtime is socket I/O (recv/send) which cannot be optimized
without changing the fundamental network architecture. The optimizations
above target the remaining 31.3% of CPU-bound operations.
Potential Dashboard Improvements
[ ] Dashboard Performance Optimizations
Goal: Ensure dashboard remains lightweight and doesn't impact system performance.
Current safeguards:
- No polling on server side (client-initiated via fetch)
- 3-second refresh interval (configurable)
- Minimal DOM updates (targeted element updates, not full re-render)
- Static CSS/JS (no server-side templating per request)
- No persistent connections (stateless HTTP)
Future considerations:
- Add rate limiting on /api/stats endpoint
- Cache expensive DB queries (top countries, protocol breakdown)
- Lazy-load historical data (only when scrolled into view)
- WebSocket option for push updates (reduce polling overhead)
- Configurable refresh interval via URL param or localStorage
- Disable auto-refresh when tab not visible (Page Visibility API)
[ ] Dashboard Feature Ideas
Low priority - consider when time permits:
- Geographic map visualization - /map endpoint with Leaflet.js
- Dark/light theme toggle
- Export stats as CSV/JSON from dashboard
- Historical graphs (24h, 7d) using stats_history table
- Per-ASN performance analysis
- Alert thresholds (success rate < X%, MITM detected)
- Mobile-responsive improvements
- Keyboard shortcuts (r=refresh, t=toggle sections)
[ ] Local JS Library Serving
Goal: Serve all JavaScript libraries locally instead of CDN for reliability and offline use.
Current CDN dependencies:
- Leaflet.js 1.9.4 (map) - https://unpkg.com/leaflet@1.9.4/
Implementation:
- Bundle libraries into container image
- Serve from /static/lib/ endpoint
- Update HTML to reference local paths
Candidate libraries for future enhancements:
┌─────────────────┬─────────┬───────────────────────────────────────────────┐
│ Library │ Size │ Use Case
├─────────────────┼─────────┼───────────────────────────────────────────────┤
│ Chart.js │ 65 KB │ Line/bar/pie charts (simpler API than D3)
│ uPlot │ 15 KB │ Fast time-series charts (minimal, performant)
│ ApexCharts │ 125 KB │ Modern charts with animations
│ Frappe Charts │ 25 KB │ Simple, modern SVG charts
│ Sparkline │ 2 KB │ Tiny inline charts (already have custom impl)
├─────────────────┼─────────┼───────────────────────────────────────────────┤
│ D3.js │ 85 KB │ Full control, complex visualizations
│ D3-geo │ 30 KB │ Geographic projections (alternative to Leaflet)
├─────────────────┼─────────┼───────────────────────────────────────────────┤
│ Leaflet │ 40 KB │ Interactive maps (already using)
│ Leaflet.heat │ 5 KB │ Heatmap layer for proxy density
│ Leaflet.cluster │ 10 KB │ Marker clustering for many points
└─────────────────┴─────────┴───────────────────────────────────────────────┘
Recommendations:
● uPlot - Best for time-series (rate history, success rate history)
● Chart.js - Best for pie/bar charts (failure breakdown, protocol stats)
● Leaflet - Keep for maps, add heatmap plugin for density viz
Current custom implementations (no library):
- Sparkline charts (Test Rate History, Success Rate History) - inline SVG
- Histogram bars (Response Time Distribution) - CSS divs
- Pie charts (Failure Breakdown, Protocol Stats) - CSS conic-gradient
Decision: Current custom implementations are lightweight and sufficient. Add libraries only when custom becomes unmaintainable or new features needed.
[ ] Memory Optimization Candidates
Based on memory analysis (production metrics):
Current State (260k queue):
Start RSS: 442 MB
Current RSS: 1,615 MB
Per-job: ~4.5 KB overhead
Object Distribution:
259,863 TargetTestJob (1 per job)
259,863 ProxyTestState (1 per job)
259,950 LockType (1 per job - threading locks)
523,395 dict (2 per job - state + metadata)
522,807 list (2 per job - results + targets)
Potential optimizations (not yet implemented):
- Lock consolidation - reduce per-proxy locks (260k LockType objects)
- Leaner state objects - reduce dict/list count per job
- Slot-based classes - use __slots__ on hot objects
- Object pooling - reuse ProxyTestState/TargetTestJob objects
Verdict: Memory scales linearly with queue (~4.5 KB/job). No leaks detected. Current usage acceptable for production workloads. Optimize only if memory becomes a constraint.
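The __slots__ idea from the list above, shown on illustrative classes (WithDict/WithSlots are examples, not the real ProxyTestState): defining __slots__ drops the per-instance __dict__, which is where most of the per-job dict overhead comes from at 260k instances.

```python
class WithDict(object):
    """Regular class: every instance carries its own __dict__."""
    def __init__(self):
        self.proxy = '1.2.3.4:8080'
        self.results = []

class WithSlots(object):
    """__slots__ replaces the per-instance __dict__ with fixed attribute slots."""
    __slots__ = ('proxy', 'results')
    def __init__(self):
        self.proxy = '1.2.3.4:8080'
        self.results = []

a, b = WithDict(), WithSlots()
print("%s %s" % (hasattr(a, '__dict__'), hasattr(b, '__dict__')))  # True False
```

The trade-off: slotted instances cannot gain ad-hoc attributes, so every field must be declared up front.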
Completed
[x] Work-Stealing Queue
- Implemented shared Queue.Queue() for job distribution
- Workers pull from shared queue instead of pre-assigned lists
- Better utilization across threads
[x] Multi-Target Validation
- Test each proxy against 3 random targets
- 2/3 majority required for success
- Reduces false negatives from single target failures
[x] Interleaved Testing
- Jobs shuffled across all proxies before queueing
- Prevents burst of 3 connections to same proxy
- ProxyTestState accumulates results from TargetTestJobs
[x] Code Cleanup
- Removed 93 lines dead HTTP server code (ppf.py)
- Removed dead gumbo parser (soup_parser.py)
- Removed test code (comboparse.py)
- Removed unused functions (misc.py)
- Fixed IP/port cleansing (ppf.py)
- Updated .gitignore
[x] Rate Limiting & Instance Tracking (scraper.py)
- InstanceTracker class with exponential backoff
- Configurable backoff_base, backoff_max, fail_threshold
- Instance cycling when rate limited
[x] Exception Logging with Context
- Replaced bare except: with typed exceptions across all files
- Added context logging to exception handlers (e.g., URL, error message)
[x] Timeout Standardization
- Added timeout_connect, timeout_read to [common] config section
- Added stale_days, stats_interval to [watchd] config section
[x] Periodic Stats & Stale Cleanup (proxywatchd.py)
- Stats class tracks tested/passed/failed with thread-safe counters
- Configurable stats_interval (default: 300s)
- cleanup_stale() removes dead proxies older than stale_days (default: 30)
[x] Unified Proxy Cache
- Moved _known_proxies to fetch.py with helper functions
- init_known_proxies(), add_known_proxies(), is_known_proxy()
- ppf.py now uses shared cache via fetch module
[x] Config Validation
- config.py: validate() method checks config values on startup
- Validates: port ranges, timeout values, thread counts, engine names
- Warns on missing source_file, unknown engines
- Errors on unwritable database directories
- Integrated into ppf.py, proxywatchd.py, scraper.py main entry points
[x] Profiling Support
- config.py: Added --profile CLI argument
- ppf.py: Refactored main logic into main() function
- ppf.py: cProfile wrapper with stats output to profile.stats
- Prints top 20 functions by cumulative time on exit
- Usage:
python2 ppf.py --profile
[x] SIGTERM Graceful Shutdown
- ppf.py: Added signal handler converting SIGTERM to KeyboardInterrupt
- Ensures profile stats are written before container exit
- Allows clean thread shutdown in containerized environments
- Podman stop now triggers proper cleanup instead of SIGKILL
[x] Unicode Exception Handling (Python 2)
- Problem: repr(e) on exceptions with unicode content caused encoding errors
- Files affected: ppf.py, scraper.py (3 exception handlers)
- Solution: check isinstance(err_msg, unicode), then encode with 'backslashreplace'
- Pattern applied:
    try:
        err_msg = repr(e)
        if isinstance(err_msg, unicode):
            err_msg = err_msg.encode('ascii', 'backslashreplace')
    except:
        err_msg = type(e).__name__
- Handles Korean/CJK characters in search queries without crashing
[x] Interactive World Map (/map endpoint)
- Added Leaflet.js interactive map showing proxy distribution by country
- Modern glassmorphism UI with backdrop-filter: blur(12px)
- CartoDB dark tiles for dark theme
- Circle markers sized proportionally to proxy count per country
- Hover effects with smooth transitions
- Stats overlay showing total countries/proxies
- Legend with proxy count scale
- Country coordinates and names lookup tables
[x] Dashboard v3 - Electric Cyan Theme
- Translucent glass-morphism effects with backdrop-filter: blur()
- Electric cyan glow borders rgba(56,189,248,...) on all graph wrappers
- Gradient overlays using ::before pseudo-elements
- Unified styling across: .chart-wrap, .histo-wrap, .stats-wrap, .lb-wrap, .pie-wrap
- New .tor-card wrapper for Tor Exit Nodes with hover effects
- Lighter background color scheme (#1e2738 bg, #181f2a card)
[x] Map Endpoint Styling Update
- Converted from gold/bronze theme (#c8b48c) to electric cyan (#38bdf8)
- Glass panels with electric glow matching dashboard
- Map markers for approximate locations now cyan instead of gold
- Unified map_bg color with dashboard background (#1e2738)
- Updated Leaflet controls, popups, and legend to cyan theme
[x] MITM Re-test Optimization
- Skip redundant SSL checks for proxies already known to be MITM
- Added mitm_retest_skipped counter to Stats class
- Optimization in _try_ssl_check() checks existing MITM flag before testing
- Avoids 6k+ unnecessary re-tests per session (based on production metrics)
[x] Memory Profiling Endpoint
- /api/memory endpoint with comprehensive memory analysis
- objgraph integration for object type distribution
- pympler integration for memory summaries
- Memory sample history tracking (RSS over time)
- Process memory from /proc/self/status
- GC statistics and collection counts
Deployment Troubleshooting Log
[x] Container Crash on Startup (2024-12-24)
Symptoms:
- Container starts then immediately disappears
- podman ps shows no running containers
- podman logs ppf returns "no such container"
- Port 8081 not listening
Debugging Process:
1. Initial diagnosis - SSH to odin, checked container state:
   sudo -u podman podman ps -a      # Empty
   sudo ss -tlnp | grep 8081        # Nothing listening
2. Ran container in foreground to capture output:
   sudo -u podman bash -c 'cd /home/podman/ppf && \
     timeout 25 podman run --rm --name ppf --network=host \
     -v ./src:/app:ro -v ./data:/app/data \
     -v ./config.ini:/app/config.ini:ro \
     localhost/ppf python2 -u proxywatchd.py 2>&1'
3. Found the error in httpd thread startup:
   error: [Errno 98] Address already in use: ('0.0.0.0', 8081)
   Container started, httpd failed to bind, process continued but HTTP unavailable.
4. Identified root cause - orphaned processes from previous debug attempts:
   ps aux | grep -E "[p]pf|[p]roxy"
   # Found: python2 ppf.py (PID 6421) still running, holding port 8081
   # Found: conmon, timeout, bash processes from stale container
5. Why orphans existed:
   - Previous timeout 15 podman run commands timed out
   - podman rm -f doesn't kill processes when container metadata is corrupted
   - Orphaned python2 process kept running with port bound
Root Cause: Stale container processes from interrupted debug sessions held port 8081. The container started successfully but httpd thread failed to bind, causing silent failure (no HTTP endpoints) while proxy testing continued.
Fix Applied:
# Force kill all orphaned processes
sudo pkill -9 -f "ppf.py"
sudo pkill -9 -f "proxywatchd.py"
sudo pkill -9 -f "conmon.*ppf"
sleep 2
# Verify port is free
sudo ss -tlnp | grep 8081 # Should show nothing
# Clean podman state
sudo -u podman podman rm -f -a
sudo -u podman podman container prune -f
# Start fresh
sudo -u podman bash -c 'cd /home/podman/ppf && \
podman run -d --rm --name ppf --network=host \
-v ./src:/app:ro -v ./data:/app/data \
-v ./config.ini:/app/config.ini:ro \
localhost/ppf python2 -u proxywatchd.py'
Verification:
curl -sf http://localhost:8081/health
# {"status": "ok", "timestamp": 1766573885}
Prevention:
- Use podman-compose for reliable container management
- Use pkill -9 -f to kill orphaned processes before restart
- Check port availability before starting: ss -tlnp | grep 8081
- Run container foreground first to capture startup errors
Correct Deployment Procedure:
# As root or with sudo
sudo -i -u podman bash
cd /home/podman/ppf
podman-compose down
podman-compose up -d
podman ps
podman logs -f ppf
docker-compose.yml (updated):
version: '3.8'
services:
ppf:
image: localhost/ppf:latest
container_name: ppf
network_mode: host
volumes:
- ./src:/app:ro
- ./data:/app/data
- ./config.ini:/app/config.ini:ro
command: python2 -u proxywatchd.py
restart: unless-stopped
environment:
- PYTHONUNBUFFERED=1
[x] SSH Connection Flooding / fail2ban (2024-12-24)
Symptoms:
- SSH connections timing out or reset
- "Connection refused" errors
- Intermittent access to odin
Root Cause: Multiple individual SSH commands triggered fail2ban rate limiting.
Fix Applied:
Created ~/.claude/rules/ssh-usage.md with batching best practices.
Key Pattern:
# BAD: 5 separate connections
ssh host 'cmd1'
ssh host 'cmd2'
ssh host 'cmd3'
# GOOD: 1 connection, all commands
ssh host bash <<'EOF'
cmd1
cmd2
cmd3
EOF
[!] Podman Container Metadata Disappears (2024-12-24)
Symptoms:
- podman ps -a shows empty even though process is running
- podman logs ppf returns "no such container"
- Port is listening and service responds to health checks
Observed Behavior:
# Container starts
podman run -d --name ppf ...
# Returns container ID: dc55f0a218b7...
# Immediately after
podman ps -a # Empty!
ss -tlnp | grep 8081 # Shows python2 listening
curl localhost:8081/health # {"status": "ok"}
Analysis:
- The process runs correctly inside the container namespace
- Container metadata in podman's database is lost/corrupted
- May be related to --rm flag interaction with detached mode
- Rootless podman with overlayfs can have state sync issues
Workaround: Service works despite missing metadata. Monitor via:
- ss -tlnp | grep 8081 - port listening
- ps aux | grep proxywatchd - process running
- curl localhost:8081/health - service responding
Impact: Low. Service functions correctly. Only podman logs unavailable.
Container Debugging Checklist
When container fails to start or crashes:
┌───┬─────────────────────────────────────────────────────────────────────────┐
│ 1 │ Check for orphans: ps aux | grep -E "[p]rocess_name"
│ 2 │ Check port conflicts: ss -tlnp | grep PORT
│ 3 │ Run foreground: podman run --rm (no -d) to see output
│ 4 │ Check podman state: podman ps -a
│ 5 │ Clean stale: pkill -9 -f "pattern" && podman rm -f -a
│ 6 │ Verify deps: config files, data dirs, volumes exist
│ 7 │ Check logs: podman logs container_name 2>&1 | tail -50
│ 8 │ Health check: curl -sf http://localhost:PORT/health
└───┴─────────────────────────────────────────────────────────────────────────┘
Note: If podman ps shows empty but port is listening and health check passes,
the service is running correctly despite metadata issues. See "Podman Container
Metadata Disappears" section above.
- Dashboard: pause API polling for inactive tabs (only update persistent items + active tab)