docs: update CLAUDE.md for operations toolkit
Replace verbose ansible deployment commands with ppf-deploy, ppf-logs, and ppf-service references. Keep raw ansible only for ad-hoc config operations not covered by tools.
CLAUDE.md
@@ -31,89 +31,71 @@
**ODIN uses root ppf/ directory. WORKERS use ppf/src/ subdirectory.**

## Operations Toolkit

All deployment and service management is handled by `tools/`:

```
tools/
  lib/ppf-common.sh   shared library (hosts, wrappers, colors)
  ppf-deploy          deploy code to nodes
  ppf-logs            view container logs
  ppf-service         manage containers (status/start/stop/restart)
```

Symlinked to `~/.local/bin/` for direct use.

### Deployment

```bash
ppf-deploy                  # all nodes: validate, sync, restart
ppf-deploy odin             # master only
ppf-deploy workers          # cassius, edge, sentinel
ppf-deploy cassius edge     # specific hosts
ppf-deploy --no-restart     # sync only, skip restart
```
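
The target arguments above resolve to host lists. Here is a minimal sketch of how that resolution could work. It assumes the four host names used throughout this file, and `resolve_hosts` is a hypothetical helper, not the actual code in `lib/ppf-common.sh`:

```shell
# Hypothetical sketch: expand ppf-deploy style targets into host names.
# Host names are taken from this document; the function is illustrative only.
MASTER="odin"
WORKERS="cassius edge sentinel"

resolve_hosts() {
  local out="" arg
  # No targets means every node, master first
  if [ "$#" -eq 0 ]; then
    echo "$MASTER $WORKERS"
    return 0
  fi
  for arg in "$@"; do
    case "$arg" in
      workers) out="$out $WORKERS" ;;
      odin|cassius|edge|sentinel) out="$out $arg" ;;
      *) echo "unknown host: $arg" >&2; return 1 ;;
    esac
  done
  # Drop the leading space added by the loop
  echo "${out# }"
}
```

For example, `resolve_hosts workers` prints `cassius edge sentinel`. Option flags such as `--no-restart` are assumed to be consumed before this step. An unknown name fails fast instead of silently targeting nothing.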

Steps performed automatically:

1. Validate Python syntax locally
2. Rsync `*.py` + `servers.txt` (root `ppf/` for odin, `ppf/src/` for workers)
3. Copy the compose file for the role (`compose.master.yml` / `compose.worker.yml`) as `compose.yml`
4. Fix ownership (`chown -R podman:podman`)
5. Restart containers and show status
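
Steps 2 and 3 are the only role-specific parts. The sketch below prints the per-host plan. It assumes the paths from the manual procedure below: code under `/home/podman/ppf/` on odin and `/home/podman/ppf/src/` on workers, with the compose file installed as `/home/podman/ppf/compose.yml`. `deploy_plan` is a hypothetical helper, not part of `ppf-deploy`:

```shell
# Hypothetical helper: show where code and the compose file go for a host.
PPF_ROOT=/home/podman/ppf

deploy_plan() {
  local host=$1 dest compose
  if [ "$host" = "odin" ]; then
    dest="$PPF_ROOT/"               # master runs from the root ppf/ directory
    compose="compose.master.yml"
  else
    dest="$PPF_ROOT/src/"           # workers run from the src/ subdirectory
    compose="compose.worker.yml"
  fi
  echo "$host: sync -> $dest; $compose -> $PPF_ROOT/compose.yml"
}
```

For example, `deploy_plan cassius` prints `cassius: sync -> /home/podman/ppf/src/; compose.worker.yml -> /home/podman/ppf/compose.yml`.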

### Container Logs

```bash
ppf-logs                    # last 40 lines from odin
ppf-logs cassius            # specific worker
ppf-logs -f edge            # follow mode
ppf-logs -n 100 sentinel    # last N lines
```
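
`podman logs` accepts `--tail N` and `-f` (follow), so the options above plausibly translate to those flags. Below is a hypothetical sketch of that translation, not the real `ppf-logs`. The 40-line default and the `odin` default host come from the examples above, and the function only prints the resulting command instead of running it:

```shell
# Hypothetical sketch: parse ppf-logs style arguments into podman logs flags.
logs_args() {
  local follow=0 tail=40 host=odin
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -f) follow=1; shift ;;
      -n)
        # Guard against a missing value, which would otherwise loop forever
        [ "$#" -ge 2 ] || { echo "-n needs a value" >&2; return 1; }
        tail=$2; shift 2 ;;
      *) host=$1; shift ;;
    esac
  done
  if [ "$follow" -eq 1 ]; then
    echo "$host: podman logs --tail $tail -f"
  else
    echo "$host: podman logs --tail $tail"
  fi
}
```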

### Service Management

```bash
ppf-service status              # all nodes: compose ps + health
ppf-service status workers      # workers only
ppf-service restart odin        # restart master
ppf-service stop cassius        # stop specific worker
ppf-service start workers       # start all workers
```
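
Each action has to run `podman-compose` as the rootless `podman` user with its runtime directory set. The helper below builds the same remote command string that the raw Ansible examples in this file pass to `-m raw`. `compose_cmd` is a hypothetical name, `status` is assumed to map to `podman-compose ps` (the health part of `status` is not covered), and only the four actions listed above are accepted:

```shell
# Hypothetical helper: build the remote podman-compose command for an action.
compose_cmd() {
  local sub
  case "$1" in
    status)             sub="ps" ;;
    start|stop|restart) sub="$1" ;;
    *) echo "unknown action: $1" >&2; return 1 ;;
  esac
  # Single quotes keep $(id -u podman) and $uid unexpanded until the remote shell runs them
  printf '%s' 'uid=$(id -u podman) && cd /home/podman/ppf && sudo -u podman XDG_RUNTIME_DIR=/run/user/$uid podman-compose '"$sub"
}
```

Usage: `ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m raw -a "$(compose_cmd restart)"`. Single-quoting the template avoids the `\$` escapes the inline commands need inside double quotes.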

### Direct Ansible (for operations not covered by tools)

The tools use the `/opt/ansible` venv and set `ANSIBLE_REMOTE_TMP=/tmp/.ansible`
internally. For ad-hoc operations, activate the venv first:

```bash
cd /opt/ansible && source venv/bin/activate
```
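
Instead of prefixing every command with `ANSIBLE_REMOTE_TMP=/tmp/.ansible`, you can export the variable once per shell alongside the venv activation. In the small sketch below, `ansible_env` is a hypothetical helper name. It assumes that `ansible.cfg` and the inventory live in `/opt/ansible`, and it only changes directory or sources the activate script when they exist:

```shell
# Hypothetical helper: prepare the current shell for ad-hoc Ansible commands.
ansible_env() {
  export ANSIBLE_REMOTE_TMP=/tmp/.ansible
  # Assumption: ansible.cfg and inventory are picked up from /opt/ansible
  if [ -d /opt/ansible ]; then
    cd /opt/ansible || return 1
  fi
  if [ -f /opt/ansible/venv/bin/activate ]; then
    . /opt/ansible/venv/bin/activate
  fi
}
```

Usage: `ansible_env && ansible cassius,edge,sentinel -m shell -a "hostname"`.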

### Quick Reference Commands

```bash
# Check that workers are reachable
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m shell -a "hostname"

# Check worker config
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel \
    -m shell -a "grep -E 'threads|timeout|ssl' /home/podman/ppf/config.ini"

# Check worker logs (dynamic UID)
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius -m raw \
    -a "uid=\$(id -u podman) && sudo -u podman XDG_RUNTIME_DIR=/run/user/\$uid podman logs --tail 20 ppf-worker"

# Modify config option
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel \
    -m lineinfile -a "path=/home/podman/ppf/config.ini line='ssl_only = 1' insertafter='ssl_first'"

# Restart workers via compose
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m raw \
    -a "uid=\$(id -u podman) && cd /home/podman/ppf && sudo -u podman XDG_RUNTIME_DIR=/run/user/\$uid podman-compose restart"
```
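
With `insertafter`, `lineinfile` behaves as follows. If the exact line is already present, it leaves the file unchanged. Otherwise it inserts the line after the last line matching the regex, or at the end of the file if nothing matches. To preview that change on a local copy of `config.ini` before touching the workers, here is a rough awk emulation. `preview_lineinfile` is a hypothetical helper that covers only this `line` + `insertafter` case, not the full module:

```shell
# Hypothetical helper: print what lineinfile(line=..., insertafter=...) would produce.
# Args: file, exact line, insertafter regex (awk ERE). Does not modify the file.
preview_lineinfile() {
  local file=$1 line=$2 after=$3
  # Exact line already present: no change
  if grep -qxF -- "$line" "$file"; then
    cat "$file"
    return 0
  fi
  awk -v line="$line" -v re="$after" '
    { lines[NR] = $0; if ($0 ~ re) last = NR }   # remember the LAST matching line
    END {
      for (i = 1; i <= NR; i++) { print lines[i]; if (i == last) print line }
      if (!last) print line                      # no match: append at end of file
    }' "$file"
}
```

Usage: `preview_lineinfile ./config.ini 'ssl_only = 1' 'ssl_first' | diff ./config.ini -`.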

## Full Deployment Procedure

These are the manual steps that `ppf-deploy` automates. All hosts use `podman-compose` with `compose.yml` for container management: rsync deploys the code and compose handles the restart.

### Step 1: Validate Syntax Locally

```bash
cd /home/user/git/ppf
for f in *.py; do python3 -m py_compile "$f" && echo "OK: $f"; done
```
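
The loop above prints `OK` per file, but it keeps going after a failure and its exit status only reflects the last file. The variant below reports every failing file and returns non-zero if any of them fail, so it can gate the deploy steps. `validate_py` is a hypothetical helper name:

```shell
# Hypothetical helper: compile every *.py in a directory; fail if any file fails.
validate_py() {
  local dir=${1:-.} rc=0 f
  for f in "$dir"/*.py; do
    [ -e "$f" ] || continue            # no *.py files: the glob stays literal
    if python3 -m py_compile "$f"; then
      echo "OK: $f"
    else
      echo "FAIL: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `validate_py /home/user/git/ppf && echo "safe to deploy"`.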

### Step 2: Deploy to ALL Hosts

```bash
cd /opt/ansible && source venv/bin/activate

# Deploy to ODIN (root ppf/ directory + compose.master.yml as compose.yml)
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin -m synchronize \
    -a "src=/home/user/git/ppf/ dest=/home/podman/ppf/ rsync_opts='--include=*.py,--include=servers.txt,--include=Dockerfile,--exclude=*'"
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin -m copy \
    -a "src=/home/user/git/ppf/compose.master.yml dest=/home/podman/ppf/compose.yml owner=podman group=podman"

# Deploy to WORKERS (ppf/src/ subdirectory + compose.worker.yml as compose.yml)
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m synchronize \
    -a "src=/home/user/git/ppf/ dest=/home/podman/ppf/src/ rsync_opts='--include=*.py,--include=servers.txt,--include=Dockerfile,--exclude=*'"
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m copy \
    -a "src=/home/user/git/ppf/compose.worker.yml dest=/home/podman/ppf/compose.yml owner=podman group=podman"

# CRITICAL: Fix ownership on ALL hosts (rsync uses ansible user, containers need podman)
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin,cassius,edge,sentinel -m raw \
    -a "chown -R podman:podman /home/podman/ppf/"
```

**Note:** Ownership must be fixed after every deploy. rsync runs as the Ansible user, but the containers run as the `podman` user; skipping the ownership fix causes `ImportError: No module named X` errors.
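
To confirm that the ownership fix took effect, `find` can list anything under the deploy directory that is not owned by the expected user. In the small sketch below, `wrong_owner` is a hypothetical helper. On a host it would run against `/home/podman/ppf` with the user `podman`:

```shell
# Hypothetical helper: print paths under a directory not owned by the given user.
# Empty output means ownership is correct.
wrong_owner() {
  local dir=$1 user=$2
  find "$dir" ! -user "$user" -print
}
```

The same check across all hosts, without the helper: `ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin,cassius,edge,sentinel -m raw -a "find /home/podman/ppf ! -user podman"`.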

### Step 3: Restart Services

```bash
# Restart ODIN via compose
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin -m raw \
    -a "uid=\$(id -u podman) && cd /home/podman/ppf && sudo -u podman XDG_RUNTIME_DIR=/run/user/\$uid podman-compose restart"

# Restart WORKERS via compose
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m raw \
    -a "uid=\$(id -u podman) && cd /home/podman/ppf && sudo -u podman XDG_RUNTIME_DIR=/run/user/\$uid podman-compose restart"
```

### Step 4: Verify All Running

```bash
# Check all hosts via compose
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin,cassius,edge,sentinel -m raw \
    -a "uid=\$(id -u podman) && cd /home/podman/ppf && sudo -u podman XDG_RUNTIME_DIR=/run/user/\$uid podman-compose ps"
```

## Podman User IDs
@@ -234,10 +216,9 @@ ansible odin -m raw \

### Missing servers.txt

Workers need `servers.txt` in `src/`. Redeploying syncs it automatically:

```bash
ppf-deploy workers

# Manual copy with raw Ansible:
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible cassius,edge,sentinel -m copy \
    -a "src=/home/user/git/ppf/servers.txt dest=/home/podman/ppf/src/servers.txt owner=podman group=podman"
```

### Exit Code 126 (Permission/Storage)
@@ -249,22 +230,17 @@ sudo -u podman podman system reset --force

### Dashboard Shows NaN or Missing Data

Odin is likely running old code. Redeploy to odin:

```bash
ppf-deploy odin

# Manual alternative with raw Ansible (sync, fix ownership, restart):
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin -m synchronize \
    -a "src=/home/user/git/ppf/ dest=/home/podman/ppf/ rsync_opts='--include=*.py,--include=servers.txt,--exclude=*'"
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin -m raw -a "chown -R podman:podman /home/podman/ppf/"
ANSIBLE_REMOTE_TMP=/tmp/.ansible ansible odin -m raw \
    -a "uid=\$(id -u podman) && cd /home/podman/ppf && sudo -u podman XDG_RUNTIME_DIR=/run/user/\$uid podman-compose restart"
```

### Worker Keeps Crashing

1. Check status: `ppf-service status workers` (on the host: `sudo -u podman podman ps -a`)
2. Check logs: `ppf-logs -n 50 cassius` (on the host: `sudo -u podman podman logs --tail 50 ppf-worker`)
3. Redeploy (fixes ownership + servers.txt): `ppf-deploy cassius`. To check by hand, verify that `servers.txt` exists in `src/` and inspect ownership with `ls -la /home/podman/ppf/src/`.
4. If it still fails, run the container manually on the host to see the error:

```bash
sudo -u podman podman run --rm --network=host \
    -v /home/podman/ppf/src:/app:ro,Z \