# TODO

## Done

- [x] Firecracker CLI runner with snapshots (~1.1s)
- [x] Alpine rootfs with ca-certificates, podman, python3
- [x] Global `fireclaw` command
- [x] Multi-agent system: overseer + agent VMs + IRC + Ollama
- [x] 5 agent templates (worker, coder, researcher, quick, creative)
- [x] 5 Ollama models (qwen2.5-coder, qwen2.5, llama3.1, gemma3, phi4-mini)
- [x] Agent tool access: shell commands + podman containers
- [x] Persistent workspace + memory system (MEMORY.md pattern)
- [x] Agent hot-reload: model/persona swap via SSH + SIGHUP
- [x] Non-root agents: unprivileged `agent` user
- [x] Agent-to-agent via IRC mentions (10s cooldown)
- [x] DM support: private messages, no mention needed
- [x] /invite support: agents auto-join invited channels
- [x] Channel layout: #control (commands), #agents (common), DMs
- [x] Overseer resilience: crash recovery, agent adoption
- [x] Graceful shutdown: IRC QUIT before VM kill
- [x] Systemd service (KillMode=process)
- [x] Regression test suite (20 tests)
- [ ] Refactor duplicated code: waitForSocket, boot sequence, tap setup, rootfs mount/inject are copy-pasted across vm.ts, snapshot.ts, agent-manager.ts. Extract shared helpers.

## Next up

- [ ] Network policies per agent: restrict internet access
- [ ] Warm pool: pre-booted VMs for instant agent spawns
- [ ] Persistent agent memory improvements: richer memory structure, auto-save from conversations
- [ ] Thin provisioning: device-mapper snapshots instead of full rootfs copies

## Polish

- [ ] Agent-to-agent response quality: small models (7B) parrot messages instead of answering. Needs better prompting ("don't repeat the question, answer it") or larger models (14B+). Claude API would help here.
- [ ] Cost tracking per agent interaction
- [ ] Execution recording / audit trail
- [ ] Agent health checks: overseer pings agents, restarts dead ones
- [ ] Thread safety in agent.py: lock around IRC socket writes
- [ ] Update regression tests for new channel layout
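
Rough sketch for the thread-safety item above, assuming agent.py writes to a single blocking IRC socket from more than one thread. `LockedIRCWriter` and `send_line` are hypothetical names used for illustration only; they are not taken from the current code.

```python
import socket
import threading


class LockedIRCWriter:
    """Serialize writes to one IRC socket shared by several threads.

    Hypothetical helper: the name and interface are assumptions, not
    the existing agent.py API.
    """

    def __init__(self, sock: socket.socket) -> None:
        self._sock = sock
        self._lock = threading.Lock()

    def send_line(self, line: str) -> None:
        # IRC messages end with CRLF. Replace embedded line breaks so a
        # caller cannot smuggle a second command into one message.
        clean = line.replace("\r", " ").replace("\n", " ")
        data = (clean + "\r\n").encode("utf-8")
        with self._lock:
            # sendall() may need several send() calls under the hood;
            # holding the lock keeps another thread's message from
            # landing in the middle of this one.
            self._sock.sendall(data)
```

Every thread that writes to the socket would go through the same `LockedIRCWriter` instance, for example `writer.send_line("PRIVMSG #agents :hello")`. Reads are not covered by the lock, which assumes a single reader thread.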