diff --git a/CLAUDE.md b/CLAUDE.md
index 901f677..cd26b9a 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -31,89 +31,71 @@
 
 **ODIN uses root ppf/ directory. WORKERS use ppf/src/ subdirectory.**
 
-## Host Access
+## Operations Toolkit
 
-**ALWAYS use Ansible from `/opt/ansible` with venv activated:**
+All deployment and service management is handled by `tools/`:
+
+```
+tools/
+  lib/ppf-common.sh    shared library (hosts, wrappers, colors)
+  ppf-deploy           deploy code to nodes
+  ppf-logs             view container logs
+  ppf-service          manage containers (status/start/stop/restart)
+```
+
+Symlinked to `~/.local/bin/` for direct use.
+
+### Deployment
+
+```bash
+ppf-deploy                  # all nodes: validate, sync, restart
+ppf-deploy odin             # master only
+ppf-deploy workers          # cassius, edge, sentinel
+ppf-deploy cassius edge     # specific hosts
+ppf-deploy --no-restart     # sync only, skip restart
+```
+
+Steps performed automatically:
+1. Validate Python syntax locally
+2. Rsync `*.py` + `servers.txt` (root for odin, `src/` for workers)
+3. Copy compose file per role (`compose.master.yml` / `compose.worker.yml`)
+4. Fix ownership (`chown -R podman:podman`)
+5. Restart containers and show status
+
+### Container Logs
+
+```bash
+ppf-logs                    # last 40 lines from odin
+ppf-logs cassius            # specific worker
+ppf-logs -f edge            # follow mode
+ppf-logs -n 100 sentinel    # last N lines
+```
+
+### Service Management
+
+```bash
+ppf-service status          # all nodes: compose ps + health
+ppf-service status workers  # workers only
+ppf-service restart odin    # restart master
+ppf-service stop cassius    # stop specific worker
+ppf-service start workers   # start all workers
+```
+
+### Direct Ansible (for operations not covered by tools)
+
+Tools use `/opt/ansible` venv and `ANSIBLE_REMOTE_TMP=/tmp/.ansible`
+internally. For ad-hoc operations:
 
 ```bash
 cd /opt/ansible && source venv/bin/activate
-```
-
-### Quick Reference Commands
-
-```bash
-# Check worker status
-ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m shell -a "hostname"
 
 # Check worker config
-ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m shell -a "grep -E 'threads|timeout|ssl' /home/podman/ppf/config.ini"
-
-# Check worker logs (dynamic UID)
-ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius -m raw \
-  -a "uid=\$(id -u podman) && sudo -u podman podman logs --tail 20 ppf-worker"
+ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel \
+  -m shell -a "grep -E 'threads|timeout|ssl' /home/podman/ppf/config.ini"
 
 # Modify config option
-ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m lineinfile -a "path=/home/podman/ppf/config.ini line='ssl_only = 1' insertafter='ssl_first'"
-
-# Restart workers via compose
-ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m raw \
-  -a "uid=\$(id -u podman) && cd /home/podman/ppf && sudo -u podman XDG_RUNTIME_DIR=/run/user/\$uid podman-compose restart"
-```
-
-## Full Deployment Procedure
-
-All hosts use `podman-compose` with `compose.yml` for container management.
-Rsync deploys code; compose handles restart.
-
-### Step 1: Validate Syntax Locally
-
-```bash
-cd /home/user/git/ppf
-for f in *.py; do python3 -m py_compile "$f" && echo "OK: $f"; done
-```
-
-### Step 2: Deploy to ALL Hosts
-
-```bash
-cd /opt/ansible && source venv/bin/activate
-
-# Deploy to ODIN (root ppf/ directory + compose.master.yml as compose.yml)
-ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin -m synchronize \
-  -a "src=/home/user/git/ppf/ dest=/home/podman/ppf/ rsync_opts='--include=*.py,--include=servers.txt,--include=Dockerfile,--exclude=*'"
-ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin -m copy \
-  -a "src=/home/user/git/ppf/compose.master.yml dest=/home/podman/ppf/compose.yml owner=podman group=podman"
-
-# Deploy to WORKERS (ppf/src/ subdirectory + compose.worker.yml as compose.yml)
-ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m synchronize \
-  -a "src=/home/user/git/ppf/ dest=/home/podman/ppf/src/ rsync_opts='--include=*.py,--include=servers.txt,--include=Dockerfile,--exclude=*'"
-ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m copy \
-  -a "src=/home/user/git/ppf/compose.worker.yml dest=/home/podman/ppf/compose.yml owner=podman group=podman"
-
-# CRITICAL: Fix ownership on ALL hosts (rsync uses ansible user, containers need podman)
-ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin,cassius,edge,sentinel -m raw \
-  -a "chown -R podman:podman /home/podman/ppf/"
-```
-
-**Note:** Ownership must be fixed after every deploy. rsync runs as ansible user, but containers run as podman user. Missing ownership fix causes `ImportError: No module named X` errors.
-
-### Step 3: Restart Services
-
-```bash
-# Restart ODIN via compose
-ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin -m raw \
-  -a "uid=\$(id -u podman) && cd /home/podman/ppf && sudo -u podman XDG_RUNTIME_DIR=/run/user/\$uid podman-compose restart"
-
-# Restart WORKERS via compose
-ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m raw \
-  -a "uid=\$(id -u podman) && cd /home/podman/ppf && sudo -u podman XDG_RUNTIME_DIR=/run/user/\$uid podman-compose restart"
-```
-
-### Step 4: Verify All Running
-
-```bash
-# Check all hosts via compose
-ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin,cassius,edge,sentinel -m raw \
-  -a "uid=\$(id -u podman) && cd /home/podman/ppf && sudo -u podman XDG_RUNTIME_DIR=/run/user/\$uid podman-compose ps"
+ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel \
+  -m lineinfile -a "path=/home/podman/ppf/config.ini line='ssl_only = 1' insertafter='ssl_first'"
 ```
 
 ## Podman User IDs
@@ -234,10 +216,9 @@ ansible odin -m raw \
 
 ### Missing servers.txt
 
-Workers need `servers.txt` in src/:
+Redeploy syncs `servers.txt` automatically:
 ```bash
-ansible cassius,edge,sentinel -m copy \
-  -a "src=/home/user/git/ppf/servers.txt dest=/home/podman/ppf/src/servers.txt owner=podman group=podman"
+ppf-deploy workers
 ```
 
 ### Exit Code 126 (Permission/Storage)
@@ -249,22 +230,17 @@ sudo -u podman podman system reset --force
 
 ### Dashboard Shows NaN or Missing Data
 
-Odin likely running old code. Redeploy to odin:
+Odin likely running old code:
 ```bash
-ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin -m synchronize \
-  -a "src=/home/user/git/ppf/ dest=/home/podman/ppf/ rsync_opts='--include=*.py,--include=servers.txt,--exclude=*'"
-ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin -m raw -a "chown -R podman:podman /home/podman/ppf/"
-ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin -m raw \
-  -a "uid=\$(id -u podman) && cd /home/podman/ppf && sudo -u podman XDG_RUNTIME_DIR=/run/user/\$uid podman-compose restart"
+ppf-deploy odin
 ```
 
 ### Worker Keeps Crashing
 
-1. Check container status: `sudo -u podman podman ps -a`
-2. Check logs: `sudo -u podman podman logs --tail 50 ppf-worker`
-3. Verify servers.txt exists in src/
-4. Check ownership: `ls -la /home/podman/ppf/src/`
-5. Run manually to see error:
+1. Check status: `ppf-service status workers`
+2. Check logs: `ppf-logs -n 50 cassius`
+3. Redeploy (fixes ownership + servers.txt): `ppf-deploy cassius`
+4. If still failing, run manually on the host to see error:
 ```bash
 sudo -u podman podman run --rm --network=host \
   -v /home/podman/ppf/src:/app:ro,Z \