# Fireclaw Ideas

Future features and experiments, loosely prioritized by usefulness.

## Operator Tools

### !status command
Quick dashboard in IRC: agent count, RAM/CPU per VM, Ollama model currently loaded, system uptime, disk free. One command to see the health of everything.

### !logs <agent> [n]
Tail the last N interactions an agent had, read from the log stored in the agent's workspace. Useful to see what an agent's been doing while you were away.
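
A minimal sketch of how the `!logs` handler could work, assuming interactions are appended one per line to a plain-text log in the workspace. The function names and the default of 10 lines are illustrative, not existing fireclaw code.

```python
from collections import deque
from pathlib import Path


def tail_interactions(log_path, n=10):
    """Return the last n lines of an agent's log file (fewer if the file is short)."""
    path = Path(log_path)
    if not path.exists():
        return []
    with path.open("r", encoding="utf-8", errors="replace") as fh:
        # A deque with maxlen keeps only the newest n lines while streaming the file.
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


def parse_logs_command(text, default_n=10):
    """Parse '!logs <agent> [n]' into (agent, n); return None if malformed."""
    parts = text.split()
    if len(parts) not in (2, 3) or parts[0] != "!logs":
        return None
    n = default_n
    if len(parts) == 3:
        if not parts[2].isdecimal() or int(parts[2]) == 0:
            return None
        n = int(parts[2])
    return parts[1], n
```

Streaming through a bounded deque avoids loading a large log into memory just to show the last few lines.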

### !persona <agent> [new persona]
View or live-edit an agent's persona via IRC. "Make the worker more sarcastic" without touching files or restarting. Uses hot-reload under the hood.

### !pause / !resume <agent>
Temporarily mute an agent without destroying it. Agent stays alive but stops responding. Useful when you need a channel to yourself.

## Skill System (inspired by mitsuhiko/agent-stuff)

### SKILL.md pattern for agent tools
Replace hardcoded tools in agent.py with a discoverable skill directory. Each skill is a folder with a SKILL.md (description, parameters, examples) and a script (run.sh/run.py).

```
~/.fireclaw/skills/
  web_search/
    SKILL.md        # name, description, parameters — parsed into tool definition
    run.py          # actual implementation
  fetch_url/
    SKILL.md
    run.py
  git_diff/
    SKILL.md
    run.sh
```

Agent discovers skills at boot, loads SKILL.md into Ollama tool definitions, invokes scripts on tool call. Adding a new tool = drop a folder. No agent.py changes needed.
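
The discovery step could look roughly like this. The SKILL.md layout is an assumption made for the sketch (`name:` and `description:` lines, plus parameter lines such as `- query (string): search terms`); the real format is still undecided. The output dict follows the function-style tool schema that Ollama's chat API accepts.

```python
import re
from pathlib import Path

# Assumed parameter line format: "- query (string): search terms"
PARAM_RE = re.compile(r"^-\s*(\w+)\s*\((\w+)\)\s*:\s*(.*)$")


def parse_skill_md(text):
    """Turn SKILL.md text into a function-style tool definition dict."""
    meta = {"name": "", "description": ""}
    properties = {}
    for raw in text.splitlines():
        line = raw.strip()
        param = PARAM_RE.match(line)
        if param:
            pname, ptype, pdesc = param.groups()
            properties[pname] = {"type": ptype, "description": pdesc}
        elif ":" in line:
            key, _, value = line.partition(":")
            if key.strip().lower() in meta:
                meta[key.strip().lower()] = value.strip()
    return {
        "type": "function",
        "function": {
            "name": meta["name"],
            "description": meta["description"],
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": list(properties),  # sketch: every parameter is required
            },
        },
    }


def discover_skills(skills_dir):
    """Map skill name -> (tool definition, script path) for each valid skill folder."""
    root = Path(skills_dir).expanduser()
    skills = {}
    if not root.is_dir():
        return skills
    for folder in sorted(root.iterdir()):
        if not folder.is_dir():
            continue
        manifest = folder / "SKILL.md"
        script = next((s for s in (folder / "run.py", folder / "run.sh") if s.exists()), None)
        if manifest.exists() and script is not None:
            tool = parse_skill_md(manifest.read_text(encoding="utf-8"))
            # Fall back to the folder name if SKILL.md has no name line.
            tool["function"]["name"] = tool["function"]["name"] or folder.name
            skills[tool["function"]["name"]] = (tool, script)
    return skills
```

Folders missing either SKILL.md or a run script are skipped silently, so a half-written skill never reaches the model's tool list.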

Could also support per-template skill selection — coder gets git/code skills, researcher gets search/fetch skills, worker gets everything.

Reference: https://github.com/mitsuhiko/agent-stuff — Pi Coding Agent skill/extension architecture.

## Agent Tools

### Web search
Agents can search via the searx instance on mymx. Either bake the searx CLI into the rootfs, or add a proper `web_search(query)` tool that calls the searx API from inside the VM. Agents could actually research topics instead of relying on training data.

### Fetch URL
`fetch_url(url)` tool to grab a webpage, strip HTML, return text. Combined with web search, agents become genuine research assistants. Could use `curl | python3 -c "from html.parser import..."` or a lightweight readability script.
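
A standard-library sketch of the "strip HTML, return text" step, using `html.parser`. The timeout and size cap are assumed defaults, and a proper readability pass would do a better job on real pages.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text nodes of an HTML document, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch_url(url, timeout=10, max_bytes=1_000_000):
    """Download a page (capped at max_bytes) and return its plain text."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        body = resp.read(max_bytes)
    return html_to_text(body.decode(charset, errors="replace"))
```

Capping the download size keeps a single large page from flooding the model's context window.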

### File sharing between agents
A shared `/shared` mount (third virtio drive, or a common ext4 image) that all agents can read/write. Drop a file from one agent, pick it up from another. Enables collaboration: researcher writes findings, coder reads and implements.

### Code execution sandbox
A `run_python(code)` tool that's safer than `run_command`. Executes in a subprocess with resource limits (timeout, memory cap). Better for code agents that need to test their own output.
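
One way to sketch this on Linux is a fresh interpreter started via `subprocess`, with a wall-clock timeout and an address-space cap set through `resource.setrlimit`. The limit values and the shape of the returned dict are assumptions. This reduces the blast radius but is not a real sandbox; the VM is still the security boundary.

```python
import resource
import subprocess
import sys


def _limit_memory(max_bytes):
    """Build a pre-exec hook that caps the child's address space (Unix only)."""
    def apply():
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        # Never try to raise the limit above the existing hard limit.
        cap = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, cap))
    return apply


def run_python(code, timeout=5, max_bytes=512 * 1024 * 1024):
    """Run code in a separate interpreter; return exit status and captured output."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env and user site
            capture_output=True,
            text=True,
            timeout=timeout,
            preexec_fn=_limit_memory(max_bytes),
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "exit_code": None, "stdout": "", "stderr": "timed out"}
    return {
        "ok": proc.returncode == 0,
        "exit_code": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

Returning stderr alongside stdout lets a code agent read its own traceback and retry.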

## Automation

### Cron agents
Template gets an optional `schedule` field: `"schedule": "0 8 * * *"`. The overseer spawns the agent on schedule, it does its task, reports to #agents, and self-destructs. Use cases:
- Morning health check: "any disk/memory/service issues on grogbox?"
- Daily digest: "summarize what happened in #agents yesterday"
- Backup verification: "check that last night's backups completed"
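
The overseer's schedule check could be sketched as below. It is a deliberately small cron matcher, not a full implementation: it handles `*`, single values, ranges, lists and `/step`, treats all five fields as AND conditions (real cron ORs day-of-month and day-of-week when both are restricted), and does not accept month or weekday names. The `templates` dict shape is an assumption.

```python
from datetime import datetime


def _field_matches(field, value, lo, hi):
    """Check one cron field: supports '*', 'a', 'a-b', 'a,b' and '/step'."""
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            start, end = (int(x) for x in rng.split("-", 1))
        else:
            start = int(rng)
            end = hi if "/" in part else start
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def schedule_matches(expr, when):
    """True if the 5-field cron expression fires during the minute of `when`."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # cron: 0 = Sunday; Python: 0 = Monday
    return (
        _field_matches(minute, when.minute, 0, 59)
        and _field_matches(hour, when.hour, 0, 23)
        and _field_matches(dom, when.day, 1, 31)
        and _field_matches(month, when.month, 1, 12)
        and _field_matches(dow, cron_dow, 0, 6)
    )


def due_templates(templates, when):
    """Names of templates whose optional 'schedule' field matches `when`."""
    return [
        name
        for name, cfg in templates.items()
        if "schedule" in cfg and schedule_matches(cfg["schedule"], when)
    ]
```

Calling `due_templates(templates, datetime.now())` once a minute would be enough to drive the spawning loop.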

### Webhook triggers
HTTP endpoint on the host (e.g., `:8080/hook/<template>`) that spawns an ephemeral agent with the webhook payload as context. Examples:
- Gitea push webhook → coder agent reviews the commit in #dev
- Uptime monitor → agent investigates and reports
- RSS feed → researcher summarizes new articles
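
A rough sketch of the endpoint using only `http.server`. `spawn_ephemeral` is a hypothetical placeholder for whatever the overseer exposes, the template whitelist reuses the template names mentioned earlier, and binding to localhost is an assumption (a Gitea webhook would need a reachable address).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

KNOWN_TEMPLATES = {"coder", "researcher", "worker"}


def template_from_path(path):
    """Extract <template> from '/hook/<template>'; None if the path is invalid."""
    parts = path.split("?", 1)[0].strip("/").split("/")
    if len(parts) == 2 and parts[0] == "hook" and parts[1] in KNOWN_TEMPLATES:
        return parts[1]
    return None


def spawn_ephemeral(template, payload):
    """Hypothetical placeholder for the overseer call that boots the agent VM."""
    print(f"would spawn {template} with payload keys: {sorted(payload)}")


class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        template = template_from_path(self.path)
        if template is None:
            self.send_error(404, "unknown hook")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:
            payload = None
        if not isinstance(payload, dict):
            self.send_error(400, "payload must be a JSON object")
            return
        spawn_ephemeral(template, payload)
        self.send_response(202)  # accepted: the agent runs asynchronously
        self.end_headers()


def serve(port=8080):
    """Block forever, handling webhook POSTs on the given port."""
    HTTPServer(("127.0.0.1", port), HookHandler).serve_forever()
```

Checking the template against a whitelist before spawning keeps arbitrary URL paths from turning into VM launches.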

### Alert forwarding
Pipe system alerts (fail2ban, smartmontools, systemd failures, journal errors) into #agents via a simple bridge script. An always-on agent could triage: "fail2ban banned 3 IPs today, all SSH brute force from China, nothing to worry about."

### Git integration
Agent can clone repos from Gitea (on mymx), read code, create branches, commit changes, open PRs. Would need git in the rootfs (installable via `apk add git`) and Gitea API access via the bridge network.

## Agent Personality & Memory

### Evolving personalities
Instruct agents to actively develop opinions, preferences, and communication styles over time. The memory system supports this — agents could save "I prefer concise answers" or "human likes dry humor" and adapt. Give them character arcs.

### Agent journals
Each agent maintains a daily journal in its workspace. Auto-saved summary of conversations, decisions made, things learned. Creates a narrative over time. Useful for debugging agent behavior and understanding their "thought process."

### Cross-agent memory
Agents can read (but not write) each other's MEMORY.md. A new agent spawned for a task can inherit context from an existing agent. "Spawn a coder that knows what the researcher found."

### Agent self-improvement
After each conversation, the agent reflects: "What could I have done better?" Saves lessons to memory. Over time, agents get better at their specific role. Needs a meta-prompt that triggers self-reflection.

## Multi-Agent Orchestration

### Task delegation
Human gives a complex task to one agent, it breaks it down and delegates subtasks to other agents via IRC. Researcher does the research, coder implements, worker tests. All visible in #agents.

### Agent voting
Multiple agents weigh in on a question. "Should we upgrade the kernel?" Each agent responds in #agents, human gets multiple perspectives. Could formalize with a `!poll` command.

### Agent debates
Two agents argue opposite sides of a technical decision. Useful for exploring trade-offs. "Should we use Rust or Go for this?" Coder argues one side, researcher the other.

## MCP Servers as Firecracker VMs

Run MCP tool servers in their own Firecracker VMs, same isolation model as agents. Managed by the overseer with the same lifecycle (!invoke, !destroy).

### Approach: single Firecracker VM with podman containers
```
Firecracker VMs (fcbr0, 172.16.0.x)
  ├── worker (agent VM)
  ├── coder (agent VM)
  └── mcp-services (service VM, 172.16.0.10)
        └── podman
              ├── mcp-fs (:8081)
              ├── mcp-git (:8082)
              └── mcp-searx (:8083)
```

One VM hosts all MCP servers in separate containers. Firecracker isolates from the host, podman separates services from each other. Lightweight — MCP servers are just HTTP wrappers, don't need their own VMs.

Agents call them at `172.16.0.10:<port>`. Overseer manages the VM and lists available tools via `!services`.

One-VM-per-service is overkill for trusted MCP servers but could be used for untrusted third-party tools.

### Why a Firecracker VM instead of host podman
- MCP servers can't access the host filesystem directly
- Consistent isolation model with agents
- The VM is independently restartable without affecting the host
- Podman-in-Firecracker is already working in the agent rootfs

### Candidate MCP servers
- **filesystem** — read/write to a shared volume (mounted as virtio drive)
- **git** — clone, read, diff, commit (Gitea on mymx accessible via bridge)
- **searx** — web search via searx.mymx.me
- **database** — SQLite or PostgreSQL query tool
- **fetch** — HTTP fetch + readability extraction

### Cron / scheduled agents
Implementation note for the cron agents idea under Automation: the overseer checks each template's `schedule` field (cron syntax) every minute and spawns any agents that match. They do their task, report to #agents, and self-destruct after a timeout.

## Logging

### Centralized log viewer
Agent logs go to /workspace/agent.log inside each VM. For a centralized web UI:
- rsyslog on host (agents send to 172.16.0.1:514) for aggregation
- frontail (`npx frontail /var/log/fireclaw/*.log --port 9001`) for browser-based real-time viewing
- Or GoTTY (`gotty tail -f ...`) for zero-config web terminal

Start simple (plain files + !logs), add rsyslog + frontail when needed.
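
The "start simple, add rsyslog later" path maps onto Python's `logging` module fairly directly. In this sketch the logger naming scheme and function name are assumptions; passing `syslog_addr=("172.16.0.1", 514)` would forward over UDP once rsyslog is listening on the host.

```python
import logging
import logging.handlers


def make_agent_logger(agent_name, log_file, syslog_addr=None):
    """Log to a local file, and optionally forward to a syslog collector over UDP."""
    logger = logging.getLogger(f"fireclaw.{agent_name}")
    if logger.handlers:
        return logger  # already configured; avoid duplicate handlers
    logger.setLevel(logging.INFO)
    fmt = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")

    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setFormatter(fmt)
    logger.addHandler(file_handler)

    if syslog_addr is not None:
        # SysLogHandler sends UDP datagrams by default, so no connection is needed.
        remote = logging.handlers.SysLogHandler(address=syslog_addr)
        remote.setFormatter(fmt)
        logger.addHandler(remote)
    return logger
```

Because the file handler is always attached, `!logs` keeps working whether or not the remote collector is up.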

## Infrastructure

### Agent metrics dashboard
Simple HTML page served from the host showing: running agents, response times, model usage, memory contents, conversation history. No framework — just a static page with data from agents.json and workspace files.

### Agent backup/restore
Export an agent's complete state (workspace, config, rootfs diff) as a tarball. Import on another machine. Portable agent identities.
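
A partial sketch covering only the workspace directory, using `tarfile`. Config and the rootfs diff would need their own handling, and the function names are illustrative.

```python
import tarfile
from pathlib import Path


def export_agent(workspace_dir, archive_path):
    """Pack an agent's workspace directory into a gzip-compressed tarball."""
    workspace = Path(workspace_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the workspace's own folder name.
        tar.add(workspace, arcname=workspace.name)
    return Path(archive_path)


def import_agent(archive_path, dest_dir):
    """Unpack a tarball from export_agent, rejecting member paths outside dest_dir."""
    dest = Path(dest_dir).resolve()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            # Basic path check only; it does not defend against malicious symlinks.
            if dest != target and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    return dest
```

The path check matters because an imported tarball may come from another machine and should not be trusted blindly.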

### Multi-host agents
Run agents on multiple machines (grogbox + odin). Overseer manages VMs across hosts via SSH. Agents on different hosts communicate via IRC federation.

### GPU deployment
Remote machine available: Xeon E5-1620 v4, 32GB RAM, Quadro P5000 (16GB VRAM). Enough for 14B-30B models at 2-5s inference. Standalone fireclaw deployment — its own ngircd, its own agents, completely independent from grogbox.

### Install script
`scripts/install.sh` — one-command deployment to new machines. Installs firecracker, ollama (with GPU if available), ngircd, Node.js, builds rootfs, configures everything. `curl -fsSL .../install.sh | bash` or just `./scripts/install.sh`. No Ansible dependency — plain bash.

## Fun & Experimental

### Agent challenges
Post a challenge in #agents: "shortest Python script that sorts a list." Agents compete, see each other's answers, iterate. Gamified agent development.

### Honeypot agents
Agent with fake credentials, fake services, fake data. See what it tries to do. Test agent safety before trusting it with real access. Could also test prompt injection resistance.

### Agent-written agents
An agent creates a new template (persona + config) and asks the overseer to spawn it. Self-replicating agent system. Needs careful guardrails.

### IRC games
Agents play text-based games with each other or with humans. Trivia, 20 questions, collaborative storytelling. Tests agent personality and creativity in a low-stakes way.

### Dream mode
An agent left running overnight with `trigger: all` in an empty channel, talking to itself. Stream of consciousness. Review in the morning. Probably nonsense, but occasionally insightful.