# s5p -- Project

## Purpose

A lightweight SOCKS5 proxy server that chains connections through Tor and/or
arbitrary proxy hops (SOCKS4, SOCKS5, HTTP CONNECT).

## Motivation

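Existing solutions such as `proxychains-ng` rely on `LD_PRELOAD` hooks that
intercept calls at the library level, so they only work where the dynamic
linker supports preloading and fail for statically linked binaries. s5p is a
proper SOCKS5 server that any SOCKS5-capable application can use natively --
no injection required.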

## Architecture

```
                  TCP          tunnel        tunnel
Client -------> s5p -------> Hop 1 -------> Hop 2 -------> Target
        SOCKS5       proto1         proto2         protoN
```
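
The diagram can be read as a sequence of handshakes: s5p opens one TCP connection to the first hop, and each hop is then asked, in its own protocol, to connect to the next hop (or to the target for the last one). Below is a minimal sketch of that ordering. The function `plan_handshakes`, the `(protocol, host, port)` tuple shape, and the example addresses are illustrative assumptions, not names taken from the s5p source.

```python
from typing import List, Tuple

# Hypothetical hop shape for illustration: (protocol, host, port).
Hop = Tuple[str, str, int]


def plan_handshakes(chain: List[Hop], target: Tuple[str, int]) -> List[Tuple[str, str, int]]:
    """Return (protocol, dest_host, dest_port) for each handshake, in order.

    The only direct TCP connection goes to chain[0]. Every handshake runs
    over the tunnel opened by the previous one and asks hop i to connect to
    hop i+1, or to the final target if hop i is the last in the chain.
    """
    steps = []
    for i, (proto, _host, _port) in enumerate(chain):
        if i + 1 < len(chain):
            _, next_host, next_port = chain[i + 1]
        else:
            next_host, next_port = target
        steps.append((proto, next_host, next_port))
    return steps


# Example chain: Tor first, then an HTTP CONNECT proxy (addresses are made up).
chain = [("socks5", "127.0.0.1", 9050), ("http", "proxy.example", 8080)]
steps = plan_handshakes(chain, ("example.org", 443))
for proto, host, port in steps:
    print(f"{proto} handshake: connect to {host}:{port}")
```

With this ordering, adding a hop only appends one more handshake over the existing stream; no additional local sockets are opened.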

- **server.py** -- asyncio SOCKS5 server, bidirectional relay, signal handling, multi-pool orchestration
- **proto.py** -- protocol handshakes (SOCKS5, SOCKS4/4a, HTTP CONNECT), chain builder
- **config.py** -- YAML config loading, proxy URL parsing, API response parsing, pool/listener config
- **pool.py** -- named proxy pool (multi-source, health-tested, persistent, MITM filtering)
- **http.py** -- minimal async HTTP/1.1 client (GET/POST JSON, no external deps)
- **connpool.py** -- pre-warmed TCP connection pool to first chain hop
- **api.py** -- built-in HTTP control API (runtime metrics, multi-pool state, config reload)
- **tor.py** -- Tor control port integration (NEWNYM signaling, periodic circuit rotation)
- **cli.py** -- argparse CLI, logging setup, cProfile support
- **metrics.py** -- connection counters, per-listener latency, rate tracking (lock-free, asyncio-only)
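
As an illustration of the kind of handshake `proto.py` is described as handling, here is a sketch of a SOCKS5 CONNECT request as laid out in RFC 1928, using the domain-name address type so the hostname is forwarded to the next hop rather than resolved locally. The function name `socks5_connect_request` is hypothetical, and the method-negotiation step that precedes this request is omitted.

```python
import struct


def socks5_connect_request(host: str, port: int) -> bytes:
    """Build a SOCKS5 CONNECT request that carries the hostname unresolved.

    Layout (RFC 1928): VER=0x05, CMD=0x01 (CONNECT), RSV=0x00,
    ATYP=0x03 (domain name), one length byte, the name, 2-byte port.
    """
    name = host.encode("idna")  # ASCII-compatible form; no DNS lookup happens here
    if not 1 <= len(name) <= 255:
        raise ValueError("hostname must encode to 1..255 bytes")
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    # "!H" packs the port as an unsigned 16-bit integer in network byte order.
    return b"\x05\x01\x00\x03" + bytes([len(name)]) + name + struct.pack("!H", port)


request = socks5_connect_request("example.org", 443)
print(request.hex())
```

Because the name travels inside the request, the lookup happens at whichever hop finally connects to the target.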

## Deployment

| Method | Command |
|--------|---------|
| Local venv | `pip install -e .` then `s5p -c config/s5p.yaml` |
| Container | `make build && make up` (Alpine, ~59 MB) |

Production images bake the source into the image via `COPY src/ /app/src/`.
Config and data are mounted at runtime: `./config/s5p.yaml` (read-only) and
`~/.cache/s5p` as `/data` for pool state and profile output.
For local development, the volume mount in `compose.yaml` overrides the
baked-in source.

CI pushes `harbor.mymx.me/s5p/s5p:latest` on every push to `main`
(lint + tests must pass first).

## Dependencies

| Package | Purpose |
|---------|---------|
| pyyaml  | Config file parsing |

All other functionality uses the Python standard library (`asyncio`, `socket`, `struct`).

## Design Decisions

- **No LD_PRELOAD** -- clean SOCKS5 server, works with any client
- **asyncio** -- single-threaded event loop, efficient for I/O-bound proxying
- **Domain passthrough** -- never resolve DNS locally to prevent leaks
- **Tor as a hop** -- no special Tor handling; it's just `socks5://127.0.0.1:9050`
- **Graceful shutdown** -- SIGTERM/SIGINT registered before startup for clean container stops
- **Config split** -- tracked example template, gitignored live config with real addresses
- **Proxy pool** -- multi-source (API + file), health-tested, persistent, auto-cleaned
- **Weighted selection** -- recently-tested proxies preferred via recency decay weight
- **Failure backoff** -- a connection failure penalizes the proxy's weight for 60 s, avoiding wasted retries
- **Stale expiry** -- proxies no longer listed by their sources are evicted after 3 refresh cycles unless still alive
- **Chain pre-flight** -- the static chain is tested before pool health tests, which are skipped if it fails
- **Warm start** -- trust cached alive state on restart, defer all health tests to background
- **SIGHUP reload** -- re-read config, update pool settings, re-fetch sources
- **Dead reporting** -- POST evicted proxies to upstream API for list quality feedback
- **Connection semaphore** -- cap concurrent connections to prevent fd exhaustion
- **Async HTTP** -- native asyncio HTTP client replaces blocking urllib, parallel fetches
- **First-hop pool** -- pre-warmed TCP connections to chain[0], stale-evicted, auto-refilled
- **Control API** -- built-in asyncio HTTP server, no Flask/external deps, disabled by default
- **Tor integration** -- control port NEWNYM signaling, periodic circuit rotation
- **Multi-Tor** -- round-robin traffic across multiple Tor nodes (`tor_nodes`)
- **Multi-listener** -- per-port chain depth and pool assignment
- **Named pools** -- independent proxy pools with per-listener binding (`proxy_pools:`)
- **MITM filtering** -- `mitm: true/false` source filter, `?mitm=0/1` API query param
- **Per-listener latency** -- independent latency tracking per listener in `/status`
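
The weighted selection and failure backoff decisions above can be combined into a single weight function. The sketch below is one possible shape, not the s5p implementation: only the 60-second penalty window comes from the list above, while the hyperbolic decay formula, the `DECAY_SECONDS` and `PENALTY` constants, and the `Proxy`, `weight`, and `pick` names are assumptions made for illustration.

```python
import random
import time
from dataclasses import dataclass
from typing import List, Optional

BACKOFF_SECONDS = 60.0  # penalty window stated in the design decisions
DECAY_SECONDS = 300.0   # assumed: age at which the recency weight halves
PENALTY = 0.1           # assumed: weight multiplier during backoff


@dataclass
class Proxy:
    url: str
    last_tested: float                   # timestamp of the last health test
    last_failure: Optional[float] = None # timestamp of the last failed connection


def weight(p: Proxy, now: float) -> float:
    """Recency weight in (0, 1], reduced while the proxy is in backoff."""
    age = max(0.0, now - p.last_tested)
    w = 1.0 / (1.0 + age / DECAY_SECONDS)  # hyperbolic decay, never reaches zero
    if p.last_failure is not None and now - p.last_failure < BACKOFF_SECONDS:
        w *= PENALTY
    return w


def pick(proxies: List[Proxy], now: Optional[float] = None,
         rng: Optional[random.Random] = None) -> Proxy:
    """Choose one proxy at random, with probability proportional to its weight."""
    now = time.time() if now is None else now
    rng = rng or random.Random()
    weights = [weight(p, now) for p in proxies]
    return rng.choices(proxies, weights=weights, k=1)[0]


now = 1000.0
pool = [
    Proxy("socks5://10.0.0.1:1080", last_tested=now),
    Proxy("socks5://10.0.0.2:1080", last_tested=now - 300.0),
    Proxy("socks5://10.0.0.3:1080", last_tested=now, last_failure=now - 10.0),
]
for p in pool:
    print(p.url, round(weight(p, now), 3))
```

A penalty multiplier rather than outright exclusion keeps a recently failed proxy selectable when nothing better is available, at the cost of occasionally retrying it early.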