docs: update CLAUDE.md for operations toolkit

Replace verbose ansible deployment commands with ppf-deploy,
ppf-logs, and ppf-service references. Keep raw ansible only
for ad-hoc config operations not covered by tools.
Username
2026-02-17 22:52:54 +01:00
parent 1f14173595
commit 12f6b1d8eb

154
CLAUDE.md

@@ -31,89 +31,71 @@
**ODIN uses root ppf/ directory. WORKERS use ppf/src/ subdirectory.**
## Host Access
## Operations Toolkit
**ALWAYS use Ansible from `/opt/ansible` with venv activated:**
All deployment and service management is handled by `tools/`:
```
tools/
lib/ppf-common.sh shared library (hosts, wrappers, colors)
ppf-deploy deploy code to nodes
ppf-logs view container logs
ppf-service manage containers (status/start/stop/restart)
```
Symlinked to `~/.local/bin/` for direct use.
### Deployment
```bash
ppf-deploy # all nodes: validate, sync, restart
ppf-deploy odin # master only
ppf-deploy workers # cassius, edge, sentinel
ppf-deploy cassius edge # specific hosts
ppf-deploy --no-restart # sync only, skip restart
```
Steps performed automatically:
1. Validate Python syntax locally
2. Rsync `*.py` + `servers.txt` (root for odin, `src/` for workers)
3. Copy compose file per role (`compose.master.yml` / `compose.worker.yml`)
4. Fix ownership (`chown -R podman:podman`)
5. Restart containers and show status
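The role-dependent parts of steps 2 and 3 can be sketched as a small lookup. This is a hypothetical illustration, not the actual `ppf-deploy` source: the function names `role_for_host`, `dest_for_role`, `compose_for_role`, and `plan_deploy` are assumptions, while the host names, paths, and compose file names are taken from this document. It only prints a plan and does not contact any node.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; the real ppf-deploy may be structured differently.

# Map a host name to its role (hosts as listed in this document).
role_for_host() {
    case "$1" in
        odin) echo master ;;
        cassius|edge|sentinel) echo worker ;;
        *) echo "unknown host: $1" >&2; return 1 ;;
    esac
}

# Remote code directory: root ppf/ for odin, ppf/src/ for workers.
dest_for_role() {
    case "$1" in
        master) echo /home/podman/ppf/ ;;
        worker) echo /home/podman/ppf/src/ ;;
        *) return 1 ;;
    esac
}

# Compose file that gets copied for each role.
compose_for_role() {
    case "$1" in
        master) echo compose.master.yml ;;
        worker) echo compose.worker.yml ;;
        *) return 1 ;;
    esac
}

# Print what would be synced where, one line per host.
plan_deploy() {
    local host role
    for host in "$@"; do
        role=$(role_for_host "$host") || return 1
        printf '%s: rsync -> %s, compose <- %s\n' \
            "$host" "$(dest_for_role "$role")" "$(compose_for_role "$role")"
    done
}
```

Calling `plan_deploy odin cassius` prints one line per host with the destination directory and compose file that apply to it.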
### Container Logs
```bash
ppf-logs # last 40 lines from odin
ppf-logs cassius # specific worker
ppf-logs -f edge # follow mode
ppf-logs -n 100 sentinel # last N lines
```
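The option handling implied by these invocations could look roughly like the sketch below. It is an assumption about how such a wrapper might parse its arguments, not the actual `ppf-logs` code; the function name `parse_logs_args` and the variables `TAIL_N`, `FOLLOW`, and `HOST` are hypothetical. The defaults (40 lines, host odin) come from the usage comments above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of ppf-logs argument parsing; not the real source.

parse_logs_args() {
    local OPTIND=1 opt
    TAIL_N=40      # default number of lines, per the usage above
    FOLLOW=0       # 1 when -f is given
    HOST=odin      # default host, per the usage above
    while getopts 'fn:' opt; do
        case "$opt" in
            f) FOLLOW=1 ;;
            n) TAIL_N=$OPTARG ;;
            *) return 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    # First remaining argument, if any, selects the host.
    if [ $# -gt 0 ]; then
        HOST=$1
    fi
    return 0
}
```

After `parse_logs_args -n 100 sentinel`, `TAIL_N` is `100` and `HOST` is `sentinel`.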
### Service Management
```bash
ppf-service status # all nodes: compose ps + health
ppf-service status workers # workers only
ppf-service restart odin # restart master
ppf-service stop cassius # stop specific worker
ppf-service start workers # start all workers
```
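For orientation, the raw Ansible commands this commit replaces ran `podman-compose` as the podman user with `XDG_RUNTIME_DIR` set from that user's UID. A wrapper could assemble the same remote command as in the sketch below. This assumes `ppf-service` behaves like those earlier commands, which the commit does not confirm; the function name `compose_remote_cmd` and the mapping of `status` to `podman-compose ps` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; assumes ppf-service mirrors the earlier raw commands.

# Print the remote shell command for a given action.
# The string is meant to be evaluated on the target host, so $(id -u podman)
# and $uid are deliberately left unexpanded here.
compose_remote_cmd() {
    local action
    case "$1" in
        status) action=ps ;;
        start|stop|restart) action=$1 ;;
        *) echo "unknown action: $1" >&2; return 1 ;;
    esac
    printf 'uid=$(id -u podman) && cd /home/podman/ppf && sudo -u podman XDG_RUNTIME_DIR=/run/user/$uid podman-compose %s\n' "$action"
}
```

The printed string matches the argument that was previously passed to `ansible ... -m raw -a` for restarts and status checks.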
### Direct Ansible (for operations not covered by tools)
Tools use `/opt/ansible` venv and `ANSIBLE_REMOTE_TMP=/tmp/.ansible`
internally. For ad-hoc operations:
```bash
cd /opt/ansible && source venv/bin/activate
```
### Quick Reference Commands
```bash
# Check worker status
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m shell -a "hostname"
# Check worker config
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m shell -a "grep -E 'threads|timeout|ssl' /home/podman/ppf/config.ini"
# Check worker logs (dynamic UID)
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius -m raw \
-a "uid=\$(id -u podman) && sudo -u podman podman logs --tail 20 ppf-worker"
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel \
-m shell -a "grep -E 'threads|timeout|ssl' /home/podman/ppf/config.ini"
# Modify config option
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m lineinfile -a "path=/home/podman/ppf/config.ini line='ssl_only = 1' insertafter='ssl_first'"
# Restart workers via compose
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m raw \
-a "uid=\$(id -u podman) && cd /home/podman/ppf && sudo -u podman XDG_RUNTIME_DIR=/run/user/\$uid podman-compose restart"
```
## Full Deployment Procedure
All hosts use `podman-compose` with `compose.yml` for container management.
Rsync deploys code; compose handles restart.
### Step 1: Validate Syntax Locally
```bash
cd /home/user/git/ppf
for f in *.py; do python3 -m py_compile "$f" && echo "OK: $f"; done
```
### Step 2: Deploy to ALL Hosts
```bash
cd /opt/ansible && source venv/bin/activate
# Deploy to ODIN (root ppf/ directory + compose.master.yml as compose.yml)
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin -m synchronize \
-a "src=/home/user/git/ppf/ dest=/home/podman/ppf/ rsync_opts='--include=*.py,--include=servers.txt,--include=Dockerfile,--exclude=*'"
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin -m copy \
-a "src=/home/user/git/ppf/compose.master.yml dest=/home/podman/ppf/compose.yml owner=podman group=podman"
# Deploy to WORKERS (ppf/src/ subdirectory + compose.worker.yml as compose.yml)
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m synchronize \
-a "src=/home/user/git/ppf/ dest=/home/podman/ppf/src/ rsync_opts='--include=*.py,--include=servers.txt,--include=Dockerfile,--exclude=*'"
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m copy \
-a "src=/home/user/git/ppf/compose.worker.yml dest=/home/podman/ppf/compose.yml owner=podman group=podman"
# CRITICAL: Fix ownership on ALL hosts (rsync uses ansible user, containers need podman)
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin,cassius,edge,sentinel -m raw \
-a "chown -R podman:podman /home/podman/ppf/"
```
**Note:** Ownership must be fixed after every deploy. rsync runs as the ansible user, but containers run as the podman user. Skipping the ownership fix causes `ImportError: No module named X` errors.
### Step 3: Restart Services
```bash
# Restart ODIN via compose
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin -m raw \
-a "uid=\$(id -u podman) && cd /home/podman/ppf && sudo -u podman XDG_RUNTIME_DIR=/run/user/\$uid podman-compose restart"
# Restart WORKERS via compose
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m raw \
-a "uid=\$(id -u podman) && cd /home/podman/ppf && sudo -u podman XDG_RUNTIME_DIR=/run/user/\$uid podman-compose restart"
```
### Step 4: Verify All Running
```bash
# Check all hosts via compose
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin,cassius,edge,sentinel -m raw \
-a "uid=\$(id -u podman) && cd /home/podman/ppf && sudo -u podman XDG_RUNTIME_DIR=/run/user/\$uid podman-compose ps"
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel \
-m lineinfile -a "path=/home/podman/ppf/config.ini line='ssl_only = 1' insertafter='ssl_first'"
```
## Podman User IDs
@@ -234,10 +216,9 @@ ansible odin -m raw \
### Missing servers.txt
Workers need `servers.txt` in src/:
Redeploy syncs `servers.txt` automatically:
```bash
ansible cassius,edge,sentinel -m copy \
-a "src=/home/user/git/ppf/servers.txt dest=/home/podman/ppf/src/servers.txt owner=podman group=podman"
ppf-deploy workers
```
### Exit Code 126 (Permission/Storage)
@@ -249,22 +230,17 @@ sudo -u podman podman system reset --force
### Dashboard Shows NaN or Missing Data
Odin likely running old code. Redeploy to odin:
Odin likely running old code:
```bash
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin -m synchronize \
-a "src=/home/user/git/ppf/ dest=/home/podman/ppf/ rsync_opts='--include=*.py,--include=servers.txt,--exclude=*'"
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin -m raw -a "chown -R podman:podman /home/podman/ppf/"
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin -m raw \
-a "uid=\$(id -u podman) && cd /home/podman/ppf && sudo -u podman XDG_RUNTIME_DIR=/run/user/\$uid podman-compose restart"
ppf-deploy odin
```
### Worker Keeps Crashing
1. Check container status: `sudo -u podman podman ps -a`
2. Check logs: `sudo -u podman podman logs --tail 50 ppf-worker`
3. Verify servers.txt exists in src/
4. Check ownership: `ls -la /home/podman/ppf/src/`
5. Run manually to see error:
1. Check status: `ppf-service status workers`
2. Check logs: `ppf-logs -n 50 cassius`
3. Redeploy (fixes ownership + servers.txt): `ppf-deploy cassius`
4. If still failing, run manually on the host to see error:
```bash
sudo -u podman podman run --rm --network=host \
-v /home/podman/ppf/src:/app:ro,Z \