Compare commits
37 Commits
| Author | SHA1 | Date |
|---|---|---|
| | abc91bc149 | |
| | c827d341ab | |
| | 6c4ad47b09 | |
| | 5b312e34de | |
| | 6673210ff0 | |
| | d6a737fbad | |
| | fdfa468960 | |
| | ca224a0ae9 | |
| | 383113126f | |
| | 2d371ceb86 | |
| | d838fe08cf | |
| | deca7228c7 | |
| | e685a2a7ba | |
| | f4643b8c59 | |
| | 0ab04e1964 | |
| | 100bb98e62 | |
| | fe162d11f7 | |
| | 96dfa63c39 | |
| | 74e79f2870 | |
| | d9b695d5a0 | |
| | c363f45ffc | |
| | 426ca8f1c1 | |
| | e1f1a24a37 | |
| | 185cda575e | |
| | 9f624e9497 | |
| | 3083b5d9d7 | |
| | 3c00de75d1 | |
| | 9a06bbf5ea | |
| | a604d73340 | |
| | 2d42d498b3 | |
| | 4483b585a7 | |
| | 42870c7c1f | |
| | 590c88ecef | |
| | d3ed3619c2 | |
| | d299e394f0 | |
| | 2e5912e73c | |
| | 27cb6508dc | |
37
IDEAS.md
@@ -16,6 +16,30 @@ View or live-edit an agent's persona via IRC. "Make the worker more sarcastic" w

### !pause / !resume <agent>

Temporarily mute an agent without destroying it. Agent stays alive but stops responding. Useful when you need a channel to yourself.

## Skill System (inspired by mitsuhiko/agent-stuff)

### SKILL.md pattern for agent tools

Replace hardcoded tools in agent.py with a discoverable skill directory. Each skill is a folder with a SKILL.md (description, parameters, examples) and a script (run.sh/run.py).

```
~/.fireclaw/skills/
  web_search/
    SKILL.md    # name, description, parameters — parsed into tool definition
    run.py      # actual implementation
  fetch_url/
    SKILL.md
    run.py
  git_diff/
    SKILL.md
    run.sh
```

Agent discovers skills at boot, loads SKILL.md into Ollama tool definitions, invokes scripts on tool call. Adding a new tool = drop a folder. No agent.py changes needed.

Could also support per-template skill selection — coder gets git/code skills, researcher gets search/fetch skills, worker gets everything.

Reference: https://github.com/mitsuhiko/agent-stuff — Pi Coding Agent skill/extension architecture.
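The boot-time discovery step could be sketched as follows. For brevity the sketch stores the SKILL.md metadata as JSON; the real parser would read the markdown format described above, and `discover_skills` here is an illustrative function, not the actual implementation.

```python
import json
from pathlib import Path

def discover_skills(skill_dir):
    """Scan a skills directory: each subfolder with a SKILL.md and a
    run.py/run.sh becomes one Ollama tool definition.
    Sketch only — assumes SKILL.md holds JSON metadata like
    {"name": ..., "description": ..., "parameters": {...}}."""
    tools, scripts = [], {}
    for folder in sorted(Path(skill_dir).iterdir()):
        meta_file = folder / "SKILL.md"
        if not meta_file.is_file():
            continue
        meta = json.loads(meta_file.read_text())
        # Prefer run.py, fall back to run.sh
        script = next((folder / s for s in ("run.py", "run.sh")
                       if (folder / s).exists()), None)
        if script is None:
            continue
        tools.append({"type": "function", "function": meta})
        scripts[meta["name"]] = str(script)
    return tools, scripts
```

The returned `tools` list can go straight into the `tools` field of an Ollama `/api/chat` payload, while `scripts` maps tool names back to the executables to invoke.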
## Agent Tools

### Web search

@@ -110,6 +134,19 @@ One-VM-per-service is overkill for trusted MCP servers but could be used for unt

- **database** — SQLite or PostgreSQL query tool
- **fetch** — HTTP fetch + readability extraction

### Cron / scheduled agents

Add a `schedule` field to templates (cron syntax). Overseer checks every minute, spawns matching agents, they do their task, report to #agents, and self-destruct after a timeout. Use cases: daily health checks, backup verification, digest summaries.
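The once-a-minute check reduces to matching a 5-field cron expression against the current time. A minimal sketch (the function name is hypothetical; a real implementation also needs ranges, steps, and the cron 0=Sunday weekday convention):

```python
from datetime import datetime

def cron_matches(expr, now):
    """Match a 5-field cron expression (minute hour dom month dow)
    against a datetime. Supports '*', single numbers, and comma lists —
    not the full cron grammar. Note: uses Python's weekday() numbering
    (0=Monday), whereas real cron uses 0=Sunday."""
    fields = expr.split()
    values = [now.minute, now.hour, now.day, now.month, now.weekday()]
    for spec, val in zip(fields, values):
        if spec == "*":
            continue
        if str(val) not in spec.split(","):
            return False
    return True
```

The overseer would call this every 60 seconds for each template carrying a `schedule` field and spawn the agent on a match.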
## Logging

### Centralized log viewer

Agent logs go to /workspace/agent.log inside each VM. For a centralized web UI:

- rsyslog on host (agents send to 172.16.0.1:514) for aggregation
- frontail (`npx frontail /var/log/fireclaw/*.log --port 9001`) for browser-based real-time viewing
- or GoTTY (`gotty tail -f ...`) for a zero-config web terminal

Start simple (plain files + !logs), add rsyslog + frontail when needed.
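On the agent side, forwarding to the host's rsyslog needs nothing beyond the standard library — `SysLogHandler` speaks the syslog UDP protocol. A sketch, assuming the bridge address above; the logger naming is illustrative:

```python
import logging
from logging.handlers import SysLogHandler

def make_agent_logger(nick, host="172.16.0.1", port=514):
    """Forward agent log lines to the host's rsyslog over UDP, tagged
    with the agent nick so the host can split per-agent files.
    Sketch only — the real agent.py logs to /workspace/agent.log."""
    logger = logging.getLogger(f"fireclaw.{nick}")
    logger.setLevel(logging.INFO)
    handler = SysLogHandler(address=(host, port))  # UDP datagrams, fire-and-forget
    handler.setFormatter(logging.Formatter(f"agent-{nick}: %(message)s"))
    logger.addHandler(handler)
    return logger
```

An rsyslog template on the host can then route `agent-*` tags into `/var/log/fireclaw/` for frontail to tail.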
## Infrastructure

### Agent metrics dashboard
97
REPORT.md
Normal file
@@ -0,0 +1,97 @@
# Fireclaw Code Review Report

Generated 2026-04-08. Full codebase analysis.

## Critical

### 1. Shell injection in agent-manager.ts — FIXED

**File:** `src/agent-manager.ts:120-131`

Config JSON and persona text were interpolated directly into shell commands via `echo '${configJson}'`. If the persona contains single quotes, the shell breaks — or injects arbitrary commands.

**Fix:** Replaced with `tee` via stdin. No shell interpolation.

### 2. IP pool has no real locking — FIXED

**File:** `src/network.ts:202-222`

`openSync` created a lock file but never acquired an actual `flock`. Under concurrency, two agents could allocate the same IP.

**Fix:** Atomic writes via `writeFileSync` + `renameSync`. Removed the fake lock.

### 3. --no-snapshot flag broken — FALSE ALARM

**File:** `src/cli.ts:38`

Commander parses `--no-snapshot` as `{ snapshot: false }`. Verified: `opts.snapshot === false` is correct. No fix needed.

## High

### 4. SKILL.md parser fragile — FIXED

**File:** `agent/agent.py:69-138`

**Fix:** Added error logging on parse failures, flexible indent detection (2+ spaces), CRLF normalization, case-insensitive booleans (`true`/`True`/`yes`/`1`), and parameter type validation with warnings.

### 5. SSH host key verification disabled — DOCUMENTED

**Files:** `src/ssh.ts:104`, `src/agent-manager.ts`, `src/overseer.ts`

`hostVerifier: () => true` and `StrictHostKeyChecking=no` everywhere. Acceptable on the private bridge network (172.16.0.0/24) — VMs are ephemeral and host keys change on every boot. A conscious design decision, not an oversight.

### 6. Memory not fully reloaded after save_memory — FIXED

**File:** `agent/agent.py:319-326`

After save_memory, only the MEMORY.md index was reloaded. Individual memory files were not re-read into the system prompt.

**Fix:** Extracted a `reload_memory()` function that reloads the index plus all memory/*.md files.

## Medium

### 7. SSH options duplicated — FIXED

**Files:** `src/agent-manager.ts`, `src/overseer.ts`

The same SSH options array was repeated 4+ times.

**Fix:** Extracted an `SSH_OPTS` constant in both files.

### 8. Process termination inconsistent — OPEN (low risk)

**Files:** `src/firecracker-vm.ts:140-164`, `src/agent-manager.ts:319-332`

Two different implementations: the ChildProcess version uses SIGTERM→SIGKILL, the PID version uses polling. Both work, in different contexts (owned vs adopted processes).

### 9. killall python3 hardcoded — FIXED

**Files:** `src/agent-manager.ts`

**Fix:** Replaced `killall python3` with `pkill -f 'agent.py'`. Targets the specific script, not all python3 processes.

### 10. Test suite expects researcher template — FIXED

**File:** `tests/test-suite.sh:105`

The test asserts that a `researcher` template exists, but the install script only created worker, coder, and quick.

**Fix:** Added researcher and creative templates to `scripts/install.sh`.

### 11. Bare exception handlers — FIXED

**File:** `src/agent-manager.ts`

**Fix:** Added error logging to the cleanup catch blocks in listAgents and reconcileAgents. The remaining bare catches are intentional best-effort cleanup (umount, rmdir, unlink).

### 12. agent.py monolithic (598 lines) — OPEN

**File:** `agent/agent.py`

Handles IRC, skill discovery, tool execution, memory, and config reload in one file. Functional, but would benefit from splitting.

### 13. Unused writePool function — FIXED

**File:** `src/network.ts:198`

Left over after switching to `atomicWritePool`. Removed.

## Low

### 14. Hardcoded network interface fallback

**File:** `src/network.ts:56` — defaults to `"eno2"` if route parsing fails.

### 15. Predictable mount point names

**File:** `src/agent-manager.ts:94` — uses `Date.now()` instead of crypto random.

### 16. No Firecracker binary hash verification

**File:** `scripts/install.sh:115-124` — downloads the binary without a SHA256 check.

### 17. Ollama response size unbounded

**File:** `agent/agent.py:338` — `resp.read()` with no size limit.

### 18. IRC message splitting at 400 chars — FIXED

**File:** `agent/agent.py:266`

**Fix:** Reduced to 380 chars to stay within IRC's 512-byte line limit.

### 19. Thread safety on _last_response_time — FIXED

**File:** `agent/agent.py:485`

**Fix:** Added a `_cooldown_lock` (threading.Lock) around the cooldown check-and-set.
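The pattern behind this fix — making the read-compare-write on the cooldown timestamp atomic — can be sketched as below. The class and names are illustrative, not the exact agent.py code:

```python
import threading
import time

class Cooldown:
    """Thread-safe check-and-set around a response cooldown.
    Without the lock, two IRC threads could both read a stale
    last-response time and both decide to respond."""
    def __init__(self, seconds):
        self.seconds = seconds
        self._last = 0.0
        self._lock = threading.Lock()

    def try_acquire(self, now=None):
        """Return True (and record the time) if the cooldown elapsed;
        the lock makes the read-compare-write atomic."""
        now = time.monotonic() if now is None else now
        with self._lock:
            if now - self._last < self.seconds:
                return False
            self._last = now
            return True
```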
## Summary

| Status | Count |
|---|---|
| Fixed | 13 |
| False alarm | 1 |
| Open (medium) | 2 |
| Open (low) | 4 |
79
ROADMAP.md
@@ -19,47 +19,60 @@

- [x] ngircd configured (`nyx.fireclaw.local`, FireclawNet)
- [x] Channel layout: #control (overseer), #agents (common room), DMs, /invite
- [x] Ollama with 5 models (qwen2.5-coder, qwen2.5, llama3.1, gemma3, phi4-mini)
- [x] Ollama with 5+ models, hot-swappable per agent
- [x] Agent rootfs — Alpine + Python IRC bot + podman + tools
- [x] Agent manager — start/stop/list/reload long-running VMs
- [x] Overseer — host-side IRC bot, !invoke/!destroy/!list/!model/!templates
- [x] Overseer — !invoke, !destroy, !list, !model, !models, !templates, !persona, !status, !help
- [x] 5 agent templates — worker, coder, researcher, quick, creative
- [x] Agent tool access — shell commands + podman containers
- [x] Persistent workspace — 64 MiB ext4 as second virtio drive at /workspace
- [x] Agent memory system — MEMORY.md + save_memory tool, survives restarts
- [x] Agent hot-reload — SSH config update + SIGHUP, no VM restart
- [x] Non-root agents — unprivileged `agent` user
- [x] Agent-to-agent via IRC mentions, 10s cooldown
- [x] DM support — private messages without mention
- [x] /invite support — agents auto-join invited channels
- [x] Overseer resilience — crash recovery, agent adoption, KillMode=process
- [x] Graceful shutdown — SSH SIGTERM → IRC QUIT → kill VM
- [x] Systemd service — fireclaw-overseer.service
- [x] Regression test suite — 20 tests
- [x] Discoverable skill system — SKILL.md + run.py per tool, auto-loaded at boot
- [x] Agent tools — run_command, web_search, fetch_url, save_memory
- [x] Persistent workspace + memory system (MEMORY.md pattern)
- [x] Agent hot-reload, non-root agents, agent-to-agent, DMs, /invite
- [x] Overseer resilience, health checks, graceful shutdown, systemd

## Phase 4: Hardening & Performance
## Phase 4: Hardening & Deployment (done)

- [ ] Network policies per agent — iptables rules per tap device
- [ ] Warm pool — pre-booted VMs from snapshots for instant spawns
- [x] Network policies, thread safety, trigger fix, race condition fix
- [x] Install/uninstall scripts, deployed on Debian + Ubuntu + GPU server
- [x] Refactor — shared firecracker-vm.ts, skill system extraction

### Remaining

- [ ] Warm pool — pre-booted VMs from snapshots
- [ ] Concurrent snapshot runs via network namespaces
- [ ] Thin provisioning — device-mapper snapshots instead of full rootfs copies
- [ ] Thread safety — lock around IRC socket writes in agent.py
- [ ] Agent health checks — overseer monitors and restarts dead agents
- [ ] Thin provisioning — device-mapper snapshots

## Phase 5: Advanced Features
## Phase 5: Agent Intelligence

- [ ] Persistent agent memory v2 — richer structure, auto-save from conversations
- [ ] Scheduled/cron tasks — agents that run on a timer
- [ ] Advanced tool use — MCP tools, multi-step execution, file I/O
- [ ] Cost tracking — log duration, model, tokens per interaction
- [ ] Execution recording — full audit trail of agent actions

Priority order by gain/complexity ratio.

## Phase 6: Ideas & Experiments
### High priority (high gain, low-medium complexity)

- [ ] vsock — replace SSH with virtio-vsock for lower overhead
- [ ] **Large output handling** — tool results >2K chars saved to workspace file, agent gets preview + can read the rest. Prevents context explosion. Simple, high impact.
- [ ] **Iteration budget** — shared token/round budget across tool calls. Prevents runaway loops, especially with the GPU server running faster models that chain more aggressively. Add per-template configurable limits.
- [ ] **Skill registry as git repo** — separate git repo for community/shared skills. Clone into agent rootfs. `fireclaw skills pull` to update. Like agentskills.io but self-hosted on Gitea.
- [ ] **Session persistence** — SQLite in workspace for conversation history. FTS5 full-text search over past sessions. Agents can search their own history.
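The session-persistence idea maps almost directly onto an FTS5 virtual table — a sketch under assumed schema and function names (the real `sessions.py` module may differ):

```python
import sqlite3

def init_session_db(path=":memory:"):
    """Conversation history with FTS5 full-text search.
    Requires an SQLite build with the FTS5 extension (standard in
    modern CPython builds)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS messages "
        "USING fts5(nick, channel, text)"
    )
    return db

def save_message(db, nick, channel, text):
    db.execute(
        "INSERT INTO messages (nick, channel, text) VALUES (?, ?, ?)",
        (nick, channel, text),
    )
    db.commit()

def search_history(db, query, limit=5):
    # MATCH searches all FTS5 columns; rank orders by relevance
    cur = db.execute(
        "SELECT nick, text FROM messages WHERE messages MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return cur.fetchall()
```

Pointing `path` at `/workspace/sessions.db` makes the history survive VM restarts for free.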
### Medium priority (medium gain, medium complexity)

- [ ] **Context compression** — when conversation history exceeds a threshold, LLM-summarize the middle turns. Protect the head (system prompt) and tail (recent messages). Keeps agents coherent in long conversations.
- [ ] **Skill learning** — after complex multi-tool tasks, the agent creates a new SKILL.md + run.py in workspace/skills. Next boot, the new skill is available. Self-improving agents.
- [x] **Scheduled/cron agents** — templates support `schedule` (5-field cron) and `schedule_timeout` fields. Overseer checks every 60s, spawns and auto-destroys.
- [ ] **!logs command** — tail agent interaction history from workspace.

### Lower priority (good ideas, higher complexity or less immediate need)

- [x] **Dangerous command approval** — pattern-based detection (rm -rf, dd, mkfs, fork bombs, shutdown, etc.) blocks execution in the run_command skill.
- [ ] **Parallel tool execution** — detect independent tool calls, run them concurrently. Needs safety heuristics (read-only, non-overlapping paths).
- [ ] **Cost tracking** — Ollama returns token counts. Log per interaction: duration, model, tokens, skill used.
- [ ] **Execution recording** — full audit trail of all tool calls and results.

## Phase 6: Infrastructure

- [ ] MCP servers in Firecracker VM with podman containers
- [ ] Webhook triggers — HTTP endpoint that spawns ephemeral agents
- [ ] Alert forwarding — pipe system alerts into #agents
- [ ] Web dashboard — status page for running agents
- [ ] Podman-in-Firecracker — double isolation for untrusted container images
- [ ] Honeypot mode — test agent safety with fake credentials/services
- [ ] Self-healing rootfs — agents evolve their own images
- [ ] Claude API backend — for tasks requiring deep reasoning
- [ ] IRC federation — link nyx.fireclaw.local ↔ odin for external access

## Phase 7: Ideas & Experiments

See IDEAS.md for the full list.
63
TODO.md
@@ -3,36 +3,47 @@

## Done

- [x] Firecracker CLI runner with snapshots (~1.1s)
- [x] Alpine rootfs with ca-certificates, podman, python3
- [x] Global `fireclaw` command
- [x] Multi-agent system — overseer + agent VMs + IRC + Ollama
- [x] 5 agent templates (worker, coder, researcher, quick, creative)
- [x] 5 Ollama models (qwen2.5-coder, qwen2.5, llama3.1, gemma3, phi4-mini)
- [x] Agent tool access — shell commands + podman containers
- [x] Persistent workspace + memory system (MEMORY.md pattern)
- [x] Agent hot-reload — model/persona swap via SSH + SIGHUP
- [x] Non-root agents — unprivileged `agent` user
- [x] Agent-to-agent via IRC mentions (10s cooldown)
- [x] DM support — private messages, no mention needed
- [x] /invite support — agents auto-join invited channels
- [x] Channel layout — #control (commands), #agents (common), DMs
- [x] Overseer resilience — crash recovery, agent adoption
- [x] Graceful shutdown — IRC QUIT before VM kill
- [x] Systemd service (KillMode=process)
- [x] Regression test suite (20 tests)
- [x] 5 templates, 5+ models, hot-reload, non-root agents
- [x] Tools: run_command, web_search, fetch_url, save_memory
- [x] Discoverable skill system — SKILL.md + run.py, auto-loaded
- [x] Persistent workspace + memory (MEMORY.md pattern)
- [x] Overseer: !invoke, !destroy, !list, !model, !models, !templates, !persona, !status, !help
- [x] Health checks, crash recovery, graceful shutdown, systemd
- [x] Network policies, thread safety, trigger fix, race condition fix
- [x] Install/uninstall scripts, deployed on 2 machines
- [x] Refactor: firecracker-vm.ts shared helpers, skill extraction
- [x] Large output handling — save >2K results to file, preview + read_file skill
- [x] Session persistence — SQLite + FTS5, conversation history survives restarts
- [x] !logs — tail agent history from workspace
- [x] Context compression — cached summaries, configurable threshold/keep
- [x] write_file skill — agents can create and modify workspace files
- [x] Structured system prompt — explicit tool descriptions, multi-agent awareness
- [x] Per-template config — temperature, num_predict, context_size, compress settings
- [x] Response quality — 500-char deque storage, 1024 default output tokens, 250-token summaries
- [x] update.sh script — one-command rootfs patching and snapshot rebuild

## Next up
## Next up (Phase 5 — by priority)

- [ ] Network policies per agent — restrict internet access
- [ ] Warm pool — pre-booted VMs for instant agent spawns
- [ ] Persistent agent memory improvements — richer memory structure, auto-save from conversations
- [ ] Thin provisioning — device-mapper snapshots instead of full rootfs copies

### Medium effort

- [ ] Skill registry git repo — shared skills on Gitea, `fireclaw skills pull`

### Bigger items

- [ ] Skill learning — agents create new skills from experience
- [x] Cron agents — scheduled agent spawns (5-field cron in templates, auto-destroy timeout)
- [x] Dangerous command approval — pattern detection blocks rm -rf /, dd, mkfs, fork bombs, etc.
- [ ] Parallel tool execution — concurrent independent tool calls
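The dangerous-command detection mentioned above amounts to a regex deny-list checked before the run_command skill executes anything. An illustrative sketch — the actual patterns in the shipped skill may differ:

```python
import re

# Illustrative deny-list, not the exact patterns used by the run_command skill
DANGEROUS = [
    r"\brm\s+-rf\s+/(\s|$)",        # rm -rf / (root wipe)
    r"\bdd\s+if=",                  # raw disk reads/writes
    r"\bmkfs(\.\w+)?\b",            # filesystem creation
    r":\(\)\s*\{.*\};\s*:",         # classic shell fork bomb :(){ :|:& };:
    r"\b(shutdown|reboot|halt)\b",  # power-state changes
]

def is_dangerous(command):
    """Return True if the shell command matches any deny-list pattern."""
    return any(re.search(p, command) for p in DANGEROUS)
```

On a match, the skill would return a refusal message to the model instead of running the command.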
## Polish

- [ ] Agent-to-agent response quality — small models (7B) parrot messages instead of answering. Needs better prompting ("don't repeat the question, answer it") or larger models (14B+). Claude API would help here.
- [ ] Cost tracking per agent interaction
- [ ] Cost tracking per interaction
- [ ] Execution recording / audit trail
- [ ] Agent health checks — overseer pings agents, restarts dead ones
- [ ] Thread safety in agent.py — lock around IRC socket writes
- [ ] Update regression tests for new channel layout
- [ ] Update regression tests for skill system + channel layout

## Low priority (from REPORT.md)

- [ ] Hardcoded network interface fallback — `src/network.ts:56` defaults to `"eno2"` if route parsing fails
- [ ] Predictable mount point names — `src/agent-manager.ts:94` uses `Date.now()` instead of crypto random
- [ ] No Firecracker binary hash verification — `scripts/install.sh` downloads without SHA256 check
- [ ] Ollama response size unbounded — `agent/tools.py` should limit `resp.read()` size
- [ ] Process termination inconsistent — two patterns (ChildProcess vs PID polling), works but could consolidate
492
agent/agent.py
@@ -1,18 +1,22 @@
#!/usr/bin/env python3
"""Fireclaw IRC agent — connects to IRC, responds via Ollama with tool access."""
"""Fireclaw IRC agent — connects to IRC, responds via Ollama with discoverable skills."""

import os
import socket
import json
import sys
import time
import subprocess
import urllib.request
import urllib.error
import signal
import threading
import urllib.request
from collections import deque

# Load config
from skills import discover_skills, execute_skill, set_logger as set_skills_logger
from tools import load_memory, query_ollama, set_logger as set_tools_logger
from sessions import init_db, save_message, load_recent, set_logger as set_sessions_logger

# ─── Config ──────────────────────────────────────────────────────────

with open("/etc/agent/config.json") as f:
    CONFIG = json.load(f)

@@ -24,114 +28,80 @@ except FileNotFoundError:
    PERSONA = "You are a helpful assistant."

NICK = CONFIG.get("nick", "agent")
CHANNEL = CONFIG.get("channel", "#agents")
SERVER = CONFIG.get("server", "172.16.0.1")
PORT = CONFIG.get("port", 6667)
OLLAMA_URL = CONFIG.get("ollama_url", "http://172.16.0.1:11434")
CONTEXT_SIZE = CONFIG.get("context_size", 20)
MAX_RESPONSE_LINES = CONFIG.get("max_response_lines", 50)
TOOLS_ENABLED = CONFIG.get("tools", True)
MAX_TOOL_ROUNDS = CONFIG.get("max_tool_rounds", 5)
MAX_TOOL_ROUNDS = CONFIG.get("max_tool_rounds", 10)
NUM_PREDICT = CONFIG.get("num_predict", 1024)
TEMPERATURE = CONFIG.get("temperature", 0.7)
WORKSPACE = "/workspace"
SKILL_DIRS = ["/opt/skills", f"{WORKSPACE}/skills"]
COMPRESS_ENABLED = CONFIG.get("compress", True)
COMPRESS_THRESHOLD = CONFIG.get("compress_threshold", 16)
COMPRESS_KEEP = CONFIG.get("compress_keep", 8)

# Mutable runtime config — can be hot-reloaded via SIGHUP
RUNTIME = {
    "model": CONFIG.get("model", "qwen2.5-coder:7b"),
    "trigger": CONFIG.get("trigger", "mention"),
    "persona": PERSONA,
}

# Recent messages for context
recent = deque(maxlen=CONTEXT_SIZE)

# Load persistent memory from workspace
AGENT_MEMORY = ""
try:
    import os
    with open(f"{WORKSPACE}/MEMORY.md") as f:
        AGENT_MEMORY = f.read().strip()
    # Also load all memory files referenced in the index
    mem_dir = f"{WORKSPACE}/memory"
    if os.path.isdir(mem_dir):
        for fname in sorted(os.listdir(mem_dir)):
            if fname.endswith(".md"):
                try:
                    with open(f"{mem_dir}/{fname}") as f:
                        topic = fname.replace(".md", "")
                        AGENT_MEMORY += f"\n\n## {topic}\n{f.read().strip()}"
                except Exception:
                    pass
except FileNotFoundError:
    pass
# ─── Logging ─────────────────────────────────────────────────────────

# Tool definitions for Ollama chat API
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "run_command",
            "description": "Execute a shell command on this system and return the output. Use this to check system info, run scripts, fetch URLs, process data, etc.",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {
                        "type": "string",
                        "description": "The shell command to execute (bash)",
                    }
                },
                "required": ["command"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "save_memory",
            "description": "Save something important to your persistent memory. Use this to remember facts about users, lessons learned, project context, or anything you want to recall in future conversations. Memories survive restarts.",
            "parameters": {
                "type": "object",
                "properties": {
                    "topic": {
                        "type": "string",
                        "description": "Short topic name for the memory file (e.g. 'user_prefs', 'project_x', 'lessons')",
                    },
                    "content": {
                        "type": "string",
                        "description": "The memory content to save",
                    },
                },
                "required": ["topic", "content"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web using SearXNG. Returns titles, URLs, and snippets for the top results. Use this when you need current information or facts you're unsure about.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query",
                    },
                    "num_results": {
                        "type": "integer",
                        "description": "Number of results to return (default 5)",
                    },
                },
                "required": ["query"],
            },
        },
    },
]

SEARX_URL = CONFIG.get("searx_url", "https://searx.mymx.me")
LOG_FILE = f"{WORKSPACE}/agent.log" if os.path.isdir(WORKSPACE) else None

def log(msg):
    print(f"[agent:{NICK}] {msg}", flush=True)
    line = f"[{time.strftime('%H:%M:%S')}] {msg}"
    print(f"[agent:{NICK}] {line}", flush=True)
    if LOG_FILE:
        try:
            with open(LOG_FILE, "a") as f:
                f.write(line + "\n")
        except Exception:
            pass


# Inject logger into submodules
set_skills_logger(log)
set_tools_logger(log)
set_sessions_logger(log)

# ─── Init ────────────────────────────────────────────────────────────

AGENT_MEMORY = load_memory(WORKSPACE)
TOOLS, SKILL_SCRIPTS = discover_skills(SKILL_DIRS)
log(f"Loaded {len(TOOLS)} skills: {', '.join(SKILL_SCRIPTS.keys())}")

db_conn = init_db(f"{WORKSPACE}/sessions.db")
for msg in load_recent(db_conn, CONTEXT_SIZE):
    recent.append(msg)
log(f"Session: restored {len(recent)} messages")

def reload_memory():
    global AGENT_MEMORY
    AGENT_MEMORY = load_memory(WORKSPACE)


def dispatch_tool(fn_name, fn_args, round_num):
    """Execute a tool call via the skill system."""
    script = SKILL_SCRIPTS.get(fn_name)
    if not script:
        return f"[unknown tool: {fn_name}]"
    log(f"Skill [{round_num}]: {fn_name}({str(fn_args)[:60]})")
    result = execute_skill(script, fn_args, WORKSPACE, CONFIG)
    if fn_name == "save_memory":
        reload_memory()
    return result


# ─── IRC Client ──────────────────────────────────────────────────────

class IRCClient:
@@ -161,9 +131,9 @@ class IRCClient:
        for line in text.split("\n"):
            line = line.strip()
            if line:
                while len(line) > 400:
                    self.send(f"PRIVMSG {target} :{line[:400]}")
                    line = line[400:]
                while len(line) > 380:
                    self.send(f"PRIVMSG {target} :{line[:380]}")
                    line = line[380:]
                self.send(f"PRIVMSG {target} :{line}")

    def set_bot_mode(self):
@@ -182,244 +152,83 @@ class IRCClient:
        return lines

def run_command(command):
    """Execute a shell command and return output."""
    log(f"Running command: {command[:100]}")
# ─── Message Handling ────────────────────────────────────────────────

_compression_cache = {"hash": None, "summary": None}


def compress_messages(channel_msgs):
    """Summarize older messages, keep recent ones intact. Caches summary."""
    if not COMPRESS_ENABLED or len(channel_msgs) <= COMPRESS_THRESHOLD:
        return channel_msgs

    older = channel_msgs[:-COMPRESS_KEEP]
    keep = channel_msgs[-COMPRESS_KEEP:]

    lines = [f"<{m['nick']}> {m['text']}" for m in older]
    conversation = "\n".join(lines)
    conv_hash = hash(conversation)

    # Return cached summary if older messages haven't changed
    if _compression_cache["hash"] == conv_hash and _compression_cache["summary"]:
        return [{"nick": "_summary", "text": _compression_cache["summary"], "channel": channel_msgs[0]["channel"]}] + keep

    try:
        result = subprocess.run(
            ["bash", "-c", command],
            capture_output=True,
            text=True,
            timeout=120,
        )
        output = result.stdout
        if result.stderr:
            output += f"\n[stderr] {result.stderr}"
        if result.returncode != 0:
            output += f"\n[exit code: {result.returncode}]"
        # Truncate very long output
        if len(output) > 2000:
            output = output[:2000] + "\n[output truncated]"
        return output.strip() or "[no output]"
    except subprocess.TimeoutExpired:
        return "[command timed out after 120s]"
        payload = json.dumps({
            "model": RUNTIME["model"],
            "messages": [
                {"role": "system", "content": "Summarize this IRC conversation in 3-5 sentences. Preserve key facts, decisions, questions, and any specific data mentioned. Be thorough but concise."},
                {"role": "user", "content": conversation},
            ],
            "stream": False,
            "options": {"num_predict": 250},
        }).encode()
        req = urllib.request.Request(f"{OLLAMA_URL}/api/chat", data=payload, headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req, timeout=30) as resp:
            data = json.loads(resp.read(500_000))
        summary = data.get("message", {}).get("content", "").strip()
        if summary:
            _compression_cache["hash"] = conv_hash
            _compression_cache["summary"] = summary
            log(f"Compressed {len(older)} messages into summary")
            return [{"nick": "_summary", "text": summary, "channel": channel_msgs[0]["channel"]}] + keep
    except Exception as e:
        return f"[error: {e}]"
        log(f"Compression failed: {e}")

def save_memory(topic, content):
    """Save a memory to the persistent workspace."""
    import os
    mem_dir = f"{WORKSPACE}/memory"
    os.makedirs(mem_dir, exist_ok=True)

    # Write the memory file
    filepath = f"{mem_dir}/{topic}.md"
    with open(filepath, "w") as f:
        f.write(content + "\n")

    # Update MEMORY.md index
    index_path = f"{WORKSPACE}/MEMORY.md"
    existing = ""
    try:
        with open(index_path) as f:
            existing = f.read()
    except FileNotFoundError:
        existing = "# Agent Memory\n"

    # Add or update entry
    entry = f"- [{topic}](memory/{topic}.md)"
    if topic not in existing:
        with open(index_path, "a") as f:
            f.write(f"\n{entry}")

    # Reload memory into global
    global AGENT_MEMORY
    with open(index_path) as f:
        AGENT_MEMORY = f.read().strip()

    log(f"Memory saved: {topic}")
    return f"Memory saved to {filepath}"


def web_search(query, num_results=5):
    """Search the web via SearXNG."""
    log(f"Web search: {query[:60]}")
    try:
        import urllib.parse
        params = urllib.parse.urlencode({"q": query, "format": "json"})
        req = urllib.request.Request(
            f"{SEARX_URL}/search?{params}",
            headers={"User-Agent": "fireclaw-agent"},
        )
        with urllib.request.urlopen(req, timeout=15) as resp:
            data = json.loads(resp.read())
        results = data.get("results", [])[:num_results]
        if not results:
            return "No results found."
        lines = []
        for r in results:
            title = r.get("title", "")
            url = r.get("url", "")
            snippet = r.get("content", "")[:150]
            lines.append(f"- {title}\n  {url}\n  {snippet}")
        return "\n".join(lines)
    except Exception as e:
        return f"[search error: {e}]"


def try_parse_tool_call(text):
    """Try to parse a text-based tool call from model output.
    Handles formats like:
      {"name": "run_command", "arguments": {"command": "uptime"}}
      <tool_call>{"name": "run_command", ...}</tool_call>
    Returns (name, args) tuple or None.
    """
    import re
    # Strip tool_call tags if present
    text = re.sub(r"</?tool_call>", "", text).strip()
    # Try to find JSON in the text
    for start in range(len(text)):
        if text[start] == "{":
            for end in range(len(text), start, -1):
                if text[end - 1] == "}":
                    try:
                        obj = json.loads(text[start:end])
                        name = obj.get("name")
                        args = obj.get("arguments", {})
                        if name and isinstance(args, dict):
                            return (name, args)
                    except json.JSONDecodeError:
                        continue
    return None


def ollama_request(payload):
    """Make a request to Ollama API."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/chat",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())


def query_ollama(messages):
    """Call Ollama chat API with tool support. Returns final response text."""
    payload = {
        "model": RUNTIME["model"],
        "messages": messages,
        "stream": False,
        "options": {"num_predict": 512},
    }

    if TOOLS_ENABLED:
        payload["tools"] = TOOLS

    for round_num in range(MAX_TOOL_ROUNDS):
        try:
            data = ollama_request(payload)
        except (urllib.error.URLError, TimeoutError) as e:
            return f"[error: {e}]"

        msg = data.get("message", {})

        # Check for structured tool calls from API
        tool_calls = msg.get("tool_calls")
        if tool_calls:
            messages.append(msg)

            for tc in tool_calls:
                fn = tc.get("function", {})
                fn_name = fn.get("name", "")
                fn_args = fn.get("arguments", {})

                if fn_name == "run_command":
                    cmd = fn_args.get("command", "")
                    log(f"Tool call [{round_num+1}/{MAX_TOOL_ROUNDS}]: {cmd[:80]}")
                    result = run_command(cmd)
                    messages.append({"role": "tool", "content": result})
                elif fn_name == "save_memory":
                    topic = fn_args.get("topic", "note")
|
||||
content = fn_args.get("content", "")
|
||||
log(f"Tool call [{round_num+1}/{MAX_TOOL_ROUNDS}]: save_memory({topic})")
|
||||
result = save_memory(topic, content)
|
||||
messages.append({"role": "tool", "content": result})
|
||||
elif fn_name == "web_search":
|
||||
query = fn_args.get("query", "")
|
||||
num = fn_args.get("num_results", 5)
|
||||
log(f"Tool call [{round_num+1}/{MAX_TOOL_ROUNDS}]: web_search({query[:60]})")
|
||||
result = web_search(query, num)
|
||||
messages.append({"role": "tool", "content": result})
|
||||
else:
|
||||
messages.append({
|
||||
"role": "tool",
|
||||
"content": f"[unknown tool: {fn_name}]",
|
||||
})
|
||||
|
||||
payload["messages"] = messages
|
||||
continue
|
||||
|
||||
# Check for text-based tool calls (model dumped JSON as text)
|
||||
content = msg.get("content", "").strip()
|
||||
parsed_tool = try_parse_tool_call(content)
|
||||
if parsed_tool:
|
||||
fn_name, fn_args = parsed_tool
|
||||
messages.append({"role": "assistant", "content": content})
|
||||
if fn_name == "run_command":
|
||||
cmd = fn_args.get("command", "")
|
||||
log(f"Text tool call [{round_num+1}/{MAX_TOOL_ROUNDS}]: {cmd[:80]}")
|
||||
result = run_command(cmd)
|
||||
messages.append({"role": "user", "content": f"Command output:\n{result}\n\nNow provide your response to the user based on this output."})
|
||||
elif fn_name == "save_memory":
|
||||
topic = fn_args.get("topic", "note")
|
||||
mem_content = fn_args.get("content", "")
|
||||
log(f"Text tool call [{round_num+1}/{MAX_TOOL_ROUNDS}]: save_memory({topic})")
|
||||
result = save_memory(topic, mem_content)
|
||||
messages.append({"role": "user", "content": f"{result}\n\nNow respond to the user."})
|
||||
elif fn_name == "web_search":
|
||||
query = fn_args.get("query", "")
|
||||
num = fn_args.get("num_results", 5)
|
||||
log(f"Text tool call [{round_num+1}/{MAX_TOOL_ROUNDS}]: web_search({query[:60]})")
|
||||
result = web_search(query, num)
|
||||
messages.append({"role": "user", "content": f"Search results:\n{result}\n\nNow respond to the user based on these results."})
|
||||
payload["messages"] = messages
|
||||
continue
|
||||
|
||||
# No tool calls — return the text response
|
||||
return content
|
||||
|
||||
return "[max tool rounds reached]"
|
||||
return channel_msgs
|
||||
|
||||
|
||||
def build_messages(question, channel):
    """Build chat messages with system prompt and conversation history."""
    system = RUNTIME["persona"]
    if TOOLS_ENABLED:
        system += "\n\nYou have access to tools:"
        system += "\n- run_command: Execute shell commands on your system."
        system += "\n- web_search: Search the web for current information."
        system += "\n- save_memory: Save important information to your persistent workspace."
        system += "\nUse tools when needed rather than guessing. Your workspace at /workspace persists across restarts."

    # Environment
    system += f"\n\n## Environment\nYou are {NICK} in IRC channel {channel}. This is a multi-agent system — other nicks may be AI agents with their own tools. Keep responses concise (this is IRC). To address someone, prefix with their nick: 'coder: can you review this?'"

    # Tools
    if TOOLS_ENABLED and TOOLS:
        system += "\n\n## Tools\nYou have tools — use them proactively instead of guessing or apologizing. If asked to do something, DO it with your tools."
        for t in TOOLS:
            fn = t["function"]
            system += f"\n- **{fn['name']}**: {fn.get('description', '')}"
        system += "\n\nYour workspace at /workspace persists across restarts. Write files, save results, read them back."

    # Memory
    if AGENT_MEMORY and AGENT_MEMORY != "# Agent Memory":
        system += f"\n\n## Your Memory\n{AGENT_MEMORY}"

    messages = [{"role": "system", "content": system}]

    # Build conversation history as alternating user/assistant messages
    channel_msgs = [m for m in recent if m["channel"] == channel]
    channel_msgs = compress_messages(channel_msgs[-CONTEXT_SIZE:])
    for msg in channel_msgs:
        if msg["nick"] == "_summary":
            messages.append({"role": "system", "content": f"[earlier conversation summary] {msg['text']}"})
        elif msg["nick"] == NICK:
            messages.append({"role": "assistant", "content": msg["text"]})
        else:
            messages.append({"role": "user", "content": f"<{msg['nick']}> {msg['text']}"})

    # Ensure the last message is from the user (the triggering question).
    # If the deque already captured it, don't double-add.
    last = messages[-1] if len(messages) > 1 else None
    if not last or last.get("role") != "user" or question not in last.get("content", ""):
        messages.append({"role": "user", "content": question})

@@ -428,14 +237,10 @@ def build_messages(question, channel):
def should_trigger(text):
    """Check if this message should trigger a response.

    Only triggers when the nick is at the start of the message (e.g. 'worker: hello'),
    not when the nick appears elsewhere (e.g. 'coder: say hi to worker')."""
    if RUNTIME["trigger"] == "all":
        return True
    lower = text.lower()
    nick = NICK.lower()
    # Match: "nick: ...", "nick, ...", "nick ...", "@nick ..."
    return (
        lower.startswith(f"{nick}:") or
        lower.startswith(f"{nick},") or
@@ -447,7 +252,6 @@ def should_trigger(text):
def extract_question(text):
    """Extract the actual question from the trigger."""
    lower = text.lower()
    for prefix in [
        f"{NICK.lower()}: ",
@@ -462,13 +266,12 @@ def extract_question(text):
    return text


# Track last response time to prevent agent-to-agent loops
_last_response_time = 0
_AGENT_COOLDOWN = 10  # seconds between responses to prevent loops
_cooldown_lock = threading.Lock()
def handle_message(irc, source_nick, target, text):
    """Process an incoming PRIVMSG."""
    global _last_response_time

    is_dm = not target.startswith("#")
@@ -476,20 +279,20 @@ def handle_message(irc, source_nick, target, text):
    reply_to = source_nick if is_dm else target

    recent.append({"nick": source_nick, "text": text, "channel": channel})
    save_message(db_conn, source_nick, channel, text)

    if source_nick == NICK:
        return

    # DMs always trigger, channel messages need mention
    if not is_dm and not should_trigger(text):
        return

    # Cooldown to prevent agent-to-agent loops
    with _cooldown_lock:
        now = time.time()
        if now - _last_response_time < _AGENT_COOLDOWN:
            log(f"Cooldown active, ignoring trigger from {source_nick}")
            return
        _last_response_time = now

    question = extract_question(text) if not is_dm else text
    log(f"Triggered by {source_nick} in {channel}: {question[:80]}")
@@ -497,7 +300,13 @@ def handle_message(irc, source_nick, target, text):
    def do_respond():
        try:
            messages = build_messages(question, channel)
            response = query_ollama(
                messages, RUNTIME,
                TOOLS if TOOLS_ENABLED else [],
                SKILL_SCRIPTS, dispatch_tool,
                OLLAMA_URL, MAX_TOOL_ROUNDS,
                num_predict=NUM_PREDICT, temperature=TEMPERATURE,
            )

            if not response:
                return
@@ -508,7 +317,8 @@ def handle_message(irc, source_nick, target, text):
                lines.append(f"[truncated, {MAX_RESPONSE_LINES} lines max]")

            irc.say(reply_to, "\n".join(lines))
            recent.append({"nick": NICK, "text": response[:500], "channel": channel})
            save_message(db_conn, NICK, channel, response[:500], full_text=response)
        except Exception as e:
            log(f"Error handling message: {e}")
            try:
@@ -519,8 +329,11 @@ def handle_message(irc, source_nick, target, text):
    threading.Thread(target=do_respond, daemon=True).start()
# ─── Main Loop ───────────────────────────────────────────────────────


def run():
    log(f"Starting agent: nick={NICK} model={RUNTIME['model']} tools={TOOLS_ENABLED}")

    while True:
        try:
@@ -528,7 +341,6 @@ def run():
            log(f"Connecting to {SERVER}:{PORT}...")
            irc.connect()

            # Hot-reload on SIGHUP — re-read config and persona
            def handle_sighup(signum, frame):
                log("SIGHUP received, reloading config...")
                try:
@@ -542,13 +354,12 @@ def run():
                    except FileNotFoundError:
                        pass
                    log(f"Reloaded: model={RUNTIME['model']} trigger={RUNTIME['trigger']}")
                    irc.say("#agents", f"[reloaded: model={RUNTIME['model']}]")
                except Exception as e:
                    log(f"Reload failed: {e}")

            signal.signal(signal.SIGHUP, handle_sighup)

            # Graceful shutdown on SIGTERM — send IRC QUIT
            def handle_sigterm(signum, frame):
                log("SIGTERM received, quitting IRC...")
                try:
@@ -577,9 +388,8 @@ def run():
            log("Registered with server")
            irc.set_bot_mode()
            irc.join("#agents")
            log("Joined #agents")

            # Handle INVITE — auto-join invited channels
            if parts[1] == "INVITE" and len(parts) >= 3:
                invited_channel = parts[-1].lstrip(":")
                inviter = parts[0].split("!")[0].lstrip(":")
agent/sessions.py (new file, 87 lines)
@@ -0,0 +1,87 @@
"""Session persistence — SQLite + FTS5 for conversation history."""

import sqlite3
import threading
import time


def log(msg):
    print(f"[sessions] {msg}", flush=True)


def set_logger(fn):
    global log
    log = fn


_write_lock = threading.Lock()


def init_db(db_path):
    """Create/open the session database. Returns a connection."""
    conn = sqlite3.connect(db_path, check_same_thread=False)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS messages (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            ts REAL NOT NULL,
            nick TEXT NOT NULL,
            channel TEXT NOT NULL,
            text TEXT NOT NULL,
            full_text TEXT
        )
    """)
    conn.execute("""
        CREATE INDEX IF NOT EXISTS idx_messages_channel_ts
        ON messages(channel, ts)
    """)
    # FTS5 virtual table for full-text search
    conn.execute("""
        CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts
        USING fts5(text, content=messages, content_rowid=id)
    """)
    conn.commit()
    _prune(conn)
    log(f"Session DB ready: {db_path}")
    return conn


def save_message(conn, nick, channel, text, full_text=None):
    """Persist a message to the session database."""
    with _write_lock:
        cur = conn.execute(
            "INSERT INTO messages (ts, nick, channel, text, full_text) VALUES (?, ?, ?, ?, ?)",
            (time.time(), nick, channel, text, full_text),
        )
        conn.execute(
            "INSERT INTO messages_fts(rowid, text) VALUES (?, ?)",
            (cur.lastrowid, text),
        )
        conn.commit()


def load_recent(conn, limit=20):
    """Load the last N messages for boot recovery.

    Returns a list of {"nick", "text", "channel"} dicts in chronological order."""
    rows = conn.execute(
        "SELECT nick, text, channel FROM messages ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [{"nick": r[0], "text": r[1], "channel": r[2]} for r in reversed(rows)]


def _prune(conn, keep=1000):
    """Delete old messages beyond the last `keep`. Runs once at init."""
    count = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
    if count <= keep:
        return
    deleted = count - keep
    conn.execute("""
        DELETE FROM messages WHERE id NOT IN (
            SELECT id FROM messages ORDER BY id DESC LIMIT ?
        )
    """, (keep,))
    # Rebuild FTS index after bulk delete
    conn.execute("INSERT INTO messages_fts(messages_fts) VALUES('rebuild')")
    conn.commit()
    log(f"Pruned {deleted} old messages (kept {keep})")
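Because `messages_fts` is an external-content table (`content=messages`), it stores only the index and joins back to `messages` via `rowid` for metadata. A minimal, self-contained sketch of how a search helper could query it — the `search_messages` function here is hypothetical (the diff only builds the index, it does not ship a search tool), but the schema matches `init_db` above:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        ts REAL NOT NULL, nick TEXT NOT NULL,
        channel TEXT NOT NULL, text TEXT NOT NULL, full_text TEXT
    )
""")
conn.execute("""
    CREATE VIRTUAL TABLE messages_fts
    USING fts5(text, content=messages, content_rowid=id)
""")

def save(nick, channel, text):
    # Mirror of save_message: insert the row, then index its text.
    cur = conn.execute(
        "INSERT INTO messages (ts, nick, channel, text) VALUES (?, ?, ?, ?)",
        (time.time(), nick, channel, text),
    )
    conn.execute("INSERT INTO messages_fts(rowid, text) VALUES (?, ?)",
                 (cur.lastrowid, text))

def search_messages(query, limit=5):
    # Join back to the content table to recover nick/channel metadata.
    return conn.execute("""
        SELECT messages.nick, messages.channel, messages.text
        FROM messages_fts JOIN messages ON messages.id = messages_fts.rowid
        WHERE messages_fts MATCH ? ORDER BY messages_fts.rank LIMIT ?
    """, (query, limit)).fetchall()

save("alice", "#agents", "the firecracker snapshot rebuild worked")
save("bob", "#agents", "lunch plans anyone")
print(search_messages("snapshot"))
```

The `'rebuild'` command used in `_prune` is the standard FTS5 way to resync an external-content index after rows are deleted directly from the content table.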
agent/skills.py (new file, 175 lines)
@@ -0,0 +1,175 @@
"""Skill discovery, parsing, and execution."""

import os
import re
import json
import subprocess
import time


def log(msg):
    """Import-safe logging — overridden by agent.py at init."""
    print(f"[skills] {msg}", flush=True)


def set_logger(fn):
    """Allow agent.py to inject its logger."""
    global log
    log = fn


LARGE_OUTPUT_THRESHOLD = 2000
_output_counter = 0


def parse_skill_md(path):
    """Parse a SKILL.md frontmatter into a tool definition.

    Returns a tool definition dict, or None on failure."""
    try:
        with open(path) as f:
            content = f.read()
    except Exception as e:
        log(f"Cannot read {path}: {e}")
        return None

    content = content.replace("\r\n", "\n")

    match = re.match(r"^---\n(.*?)\n---", content, re.DOTALL)
    if not match:
        log(f"No frontmatter in {path}")
        return None

    fm = {}
    current_key = None
    current_param = None
    params = {}

    for line in match.group(1).split("\n"):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue

        indent = len(line) - len(line.lstrip())

        if indent >= 2 and current_key == "parameters":
            if indent >= 4 and current_param:
                k, _, v = stripped.partition(":")
                k = k.strip()
                v = v.strip().strip('"').strip("'")
                if k == "required":
                    v = v.lower() in ("true", "yes", "1")
                params[current_param][k] = v
            elif ":" in stripped:
                param_name = stripped.rstrip(":").strip()
                current_param = param_name
                params[param_name] = {}
        elif ":" in line and indent == 0:
            k, _, v = line.partition(":")
            k = k.strip()
            v = v.strip().strip('"').strip("'")
            fm[k] = v
            current_key = k
            if k == "parameters":
                current_param = None

    if "name" not in fm:
        log(f"No 'name' field in {path}")
        return None

    if "description" not in fm:
        log(f"Warning: no 'description' in {path}")

    properties = {}
    required = []
    for pname, pdata in params.items():
        ptype = pdata.get("type", "string")
        if ptype not in ("string", "integer", "number", "boolean", "array", "object"):
            log(f"Warning: unknown type '{ptype}' for param '{pname}' in {path}")
        properties[pname] = {
            "type": ptype,
            "description": pdata.get("description", ""),
        }
        if pdata.get("required", False):
            required.append(pname)

    return {
        "type": "function",
        "function": {
            "name": fm["name"],
            "description": fm.get("description", ""),
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }
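For a concrete sense of the output shape: given the `fetch_url` frontmatter that appears later in this diff, the parser produces an Ollama-style tool definition like the one below (hand-assembled here for illustration, with the description abbreviated — not generated output):

```python
# Shape of the dict parse_skill_md returns for the fetch_url skill.
tool_def = {
    "type": "function",
    "function": {
        "name": "fetch_url",
        "description": "Fetch a URL and return its text content.",
        "parameters": {
            "type": "object",
            "properties": {
                # One entry per frontmatter parameter.
                "url": {"type": "string", "description": "The URL to fetch"},
            },
            # Parameters whose frontmatter has `required: true`.
            "required": ["url"],
        },
    },
}
print(tool_def["function"]["name"])
```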
def discover_skills(skill_dirs):
    """Scan skill directories and return tool definitions + script paths."""
    tools = []
    scripts = {}

    for skill_dir in skill_dirs:
        if not os.path.isdir(skill_dir):
            continue
        for name in sorted(os.listdir(skill_dir)):
            skill_path = os.path.join(skill_dir, name)
            skill_md = os.path.join(skill_path, "SKILL.md")
            if not os.path.isfile(skill_md):
                continue

            tool_def = parse_skill_md(skill_md)
            if not tool_def:
                continue

            for ext in ("run.py", "run.sh"):
                script = os.path.join(skill_path, ext)
                if os.path.isfile(script):
                    scripts[tool_def["function"]["name"]] = script
                    break

            if tool_def["function"]["name"] in scripts:
                tools.append(tool_def)

    return tools, scripts


def execute_skill(script_path, args, workspace, config):
    """Execute a skill script with args as JSON on stdin.

    Large outputs are saved to a file with a preview returned."""
    global _output_counter
    env = os.environ.copy()
    env["WORKSPACE"] = workspace
    env["SEARX_URL"] = config.get("searx_url", "https://searx.mymx.me")

    try:
        result = subprocess.run(
            ["python3" if script_path.endswith(".py") else "bash", script_path],
            input=json.dumps(args),
            capture_output=True,
            text=True,
            timeout=120,
            env=env,
        )
        output = result.stdout
        if result.stderr:
            output += f"\n[stderr] {result.stderr}"
        output = output.strip() or "[no output]"

        if len(output) > LARGE_OUTPUT_THRESHOLD:
            output_dir = f"{workspace}/tool_outputs"
            os.makedirs(output_dir, exist_ok=True)
            _output_counter += 1
            filepath = f"{output_dir}/output_{_output_counter}.txt"
            with open(filepath, "w") as f:
                f.write(output)
            preview = output[:1500]
            return f"{preview}\n\n[output truncated — full result ({len(output)} chars) saved to {filepath}. Use read_file to view it.]"

        return output
    except subprocess.TimeoutExpired:
        return "[skill timed out after 120s]"
    except Exception as e:
        return f"[skill error: {e}]"
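The contract between agent and skill is deliberately thin: arguments as JSON on stdin, result as text on stdout. A minimal self-contained sketch of that round trip, using a throwaway script written to a temp dir (this demo uses `sys.executable` rather than the hard-coded `"python3"` from `execute_skill`, purely for portability):

```python
import json
import os
import subprocess
import sys
import tempfile

# A throwaway skill script following the same contract as run.py skills:
# read a JSON object from stdin, print the result to stdout.
script = (
    "import sys, json\n"
    "args = json.loads(sys.stdin.read())\n"
    "print('hello ' + args.get('who', 'world'))\n"
)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "run.py")
    with open(path, "w") as f:
        f.write(script)
    # Same invocation shape as execute_skill: args serialized to stdin,
    # output captured as text, bounded by a timeout.
    result = subprocess.run(
        [sys.executable, path],
        input=json.dumps({"who": "fireclaw"}),
        capture_output=True, text=True, timeout=30,
    )

print(result.stdout.strip())  # → hello fireclaw
```

Because the protocol is just stdin/stdout, a skill can be written in any language that can parse JSON, which is why `discover_skills` accepts both `run.py` and `run.sh`.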
agent/tools.py (new file, 132 lines)
@@ -0,0 +1,132 @@
"""LLM interaction, tool dispatch, and memory management."""

import os
import re
import json
import urllib.request
import urllib.error


def log(msg):
    print(f"[tools] {msg}", flush=True)


def set_logger(fn):
    global log
    log = fn


# ─── Memory ──────────────────────────────────────────────────────────

def load_memory(workspace):
    """Load all memory files from the workspace."""
    memory = ""
    try:
        with open(f"{workspace}/MEMORY.md") as f:
            memory = f.read().strip()
        mem_dir = f"{workspace}/memory"
        if os.path.isdir(mem_dir):
            for fname in sorted(os.listdir(mem_dir)):
                if fname.endswith(".md"):
                    try:
                        with open(f"{mem_dir}/{fname}") as f:
                            topic = fname.replace(".md", "")
                            memory += f"\n\n## {topic}\n{f.read().strip()}"
                    except Exception:
                        pass
    except FileNotFoundError:
        pass
    return memory


# ─── Tool Call Parsing ───────────────────────────────────────────────

def try_parse_tool_call(text):
    """Parse text-based tool calls (model dumps JSON as text)."""
    text = re.sub(r"</?tool_call>", "", text).strip()
    for start in range(len(text)):
        if text[start] == "{":
            for end in range(len(text), start, -1):
                if text[end - 1] == "}":
                    try:
                        obj = json.loads(text[start:end])
                        name = obj.get("name")
                        args = obj.get("arguments", {})
                        if name and isinstance(args, dict):
                            return (name, args)
                    except json.JSONDecodeError:
                        continue
    return None
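The brace scan above recovers the first valid JSON object even when the model wraps the call in `<tool_call>` tags or surrounds it with chatter. A quick standalone check of that logic (the function is copied verbatim from above so the snippet runs on its own):

```python
import json
import re

def try_parse_tool_call(text):
    """Parse text-based tool calls (model dumps JSON as text)."""
    text = re.sub(r"</?tool_call>", "", text).strip()
    # Scan for the longest valid JSON object starting at each "{".
    for start in range(len(text)):
        if text[start] == "{":
            for end in range(len(text), start, -1):
                if text[end - 1] == "}":
                    try:
                        obj = json.loads(text[start:end])
                        name = obj.get("name")
                        args = obj.get("arguments", {})
                        if name and isinstance(args, dict):
                            return (name, args)
                    except json.JSONDecodeError:
                        continue
    return None

# Bare JSON, tagged JSON, and plain prose all behave as expected:
print(try_parse_tool_call('{"name": "run_command", "arguments": {"command": "uptime"}}'))
print(try_parse_tool_call('<tool_call>{"name": "web_search", "arguments": {"query": "irc"}}</tool_call>'))
print(try_parse_tool_call("Sure, no tools needed here."))
```

Note the nested scan is quadratic in the message length; that is fine for IRC-sized responses but would be worth revisiting if messages grew large.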
# ─── LLM Interaction ────────────────────────────────────────────────

def ollama_request(ollama_url, payload):
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        f"{ollama_url}/api/chat",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read(2_000_000))


def query_ollama(messages, runtime, tools, skill_scripts, dispatch_fn,
                 ollama_url, max_rounds, num_predict=1024, temperature=0.7):
    """Call Ollama chat API with skill-based tool support."""
    payload = {
        "model": runtime["model"],
        "messages": messages,
        "stream": False,
        "options": {"num_predict": num_predict, "temperature": temperature},
    }

    if tools:
        payload["tools"] = tools

    for round_num in range(max_rounds):
        remaining = max_rounds - round_num
        try:
            data = ollama_request(ollama_url, payload)
        except (urllib.error.URLError, TimeoutError) as e:
            return f"[error: {e}]"

        msg = data.get("message", {})

        # Structured tool calls
        tool_calls = msg.get("tool_calls")
        if tool_calls:
            messages.append(msg)
            for tc in tool_calls:
                fn = tc.get("function", {})
                result = dispatch_fn(
                    fn.get("name", ""),
                    fn.get("arguments", {}),
                    round_num + 1,
                )
                if remaining <= 2:
                    result += f"\n[warning: {remaining - 1} tool rounds remaining — wrap up]"
                messages.append({"role": "tool", "content": result})
            payload["messages"] = messages
            continue

        # Text-based tool calls
        content = msg.get("content", "").strip()
        parsed_tool = try_parse_tool_call(content)
        if parsed_tool:
            fn_name, fn_args = parsed_tool
            if fn_name in skill_scripts:
                messages.append({"role": "assistant", "content": content})
                result = dispatch_fn(fn_name, fn_args, round_num + 1)
                if remaining <= 2:
                    result += f"\n[warning: {remaining - 1} tool rounds remaining — wrap up]"
                messages.append({
                    "role": "user",
                    "content": f"Tool result:\n{result}\n\nNow respond to the user based on this result.",
                })
                payload["messages"] = messages
                continue

        return content

    return "[max tool rounds reached]"
@@ -306,10 +306,12 @@ else
    ' || err "Failed to install packages in chroot"
    ok "Alpine packages installed"

    log "Installing agent script, skills, and config..."
    sudo mkdir -p /tmp/agent-build-mnt/opt/agent /tmp/agent-build-mnt/opt/skills /tmp/agent-build-mnt/etc/agent
    sudo cp "$SCRIPT_DIR/agent/"*.py /tmp/agent-build-mnt/opt/agent/
    sudo chmod +x /tmp/agent-build-mnt/opt/agent/agent.py
    sudo cp -r "$SCRIPT_DIR/skills/"* /tmp/agent-build-mnt/opt/skills/
    sudo chmod +x /tmp/agent-build-mnt/opt/skills/*/run.*

    echo '{"nick":"agent","model":"qwen2.5-coder:7b","trigger":"mention","server":"172.16.0.1","port":6667,"ollama_url":"http://172.16.0.1:11434"}' | \
        sudo tee /tmp/agent-build-mnt/etc/agent/config.json > /dev/null
@@ -422,14 +424,16 @@ step "Agent templates"
TMPL_DIR="$FIRECLAW_DIR/templates"
mkdir -p "$TMPL_DIR"

for tmpl in worker coder researcher quick creative; do
    if [[ -f "$TMPL_DIR/$tmpl.json" ]]; then
        skip "$tmpl"
    else
        case $tmpl in
            worker) echo '{"name":"worker","nick":"worker","model":"qwen2.5:7b","trigger":"mention","persona":"You are a general-purpose assistant on IRC. Keep responses concise."}' > "$TMPL_DIR/$tmpl.json" ;;
            coder) echo '{"name":"coder","nick":"coder","model":"qwen2.5-coder:7b","trigger":"mention","persona":"You are a code-focused assistant on IRC. Be direct and technical."}' > "$TMPL_DIR/$tmpl.json" ;;
            researcher) echo '{"name":"researcher","nick":"research","model":"llama3.1:8b","trigger":"mention","persona":"You are a research assistant on IRC. Use numbered points for complex topics. Keep responses to 5-10 lines."}' > "$TMPL_DIR/$tmpl.json" ;;
            quick) echo '{"name":"quick","nick":"quick","model":"phi4-mini","trigger":"mention","tools":false,"network":"none","persona":"You are a fast assistant on IRC. One sentence answers."}' > "$TMPL_DIR/$tmpl.json" ;;
            creative) echo '{"name":"creative","nick":"muse","model":"gemma3:4b","trigger":"mention","tools":false,"persona":"You are a creative assistant on IRC. Help with writing, brainstorming, ideas."}' > "$TMPL_DIR/$tmpl.json" ;;
        esac
        ok "$tmpl template created"
    fi
scripts/update.sh (new executable file, 83 lines)
@@ -0,0 +1,83 @@
#!/bin/bash
# Update fireclaw agent code and skills in the rootfs.
# Stops the overseer, patches the rootfs, rebuilds the snapshot, restarts.
#
# Usage: ./scripts/update.sh

set -euo pipefail

log()  { echo -e "\033[1;34m[fireclaw]\033[0m $*"; }
step() { echo -e "\n\033[1;32m━━━ $* ━━━\033[0m"; }
ok()   { echo -e "  \033[0;32m✓\033[0m $*"; }
err()  { echo -e "\033[1;31m[error]\033[0m $*" >&2; exit 1; }

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
FIRECLAW_DIR="$HOME/.fireclaw"
ROOTFS="$FIRECLAW_DIR/agent-rootfs.ext4"
MNT="/tmp/fireclaw-update-mnt"

[[ ! -f "$ROOTFS" ]] && err "No rootfs found at $ROOTFS — run install.sh first."

# ─── Stop overseer ──────────────────────────────────────────────────

step "Stop overseer"
if systemctl is-active --quiet fireclaw-overseer 2>/dev/null; then
    sudo systemctl stop fireclaw-overseer
    ok "Overseer stopped"
else
    ok "Overseer not running"
fi

# Wait for any firecracker processes to exit
sleep 1

# ─── Build TypeScript ───────────────────────────────────────────────

step "Build TypeScript"
cd "$SCRIPT_DIR"
npm run build
ok "TypeScript compiled"

# ─── Patch rootfs ───────────────────────────────────────────────────

step "Patch rootfs"

sudo mkdir -p "$MNT"
sudo mount "$ROOTFS" "$MNT" || err "Failed to mount rootfs"

trap 'sudo umount "$MNT" 2>/dev/null; sudo rmdir "$MNT" 2>/dev/null' EXIT

sudo mkdir -p "$MNT/opt/agent" "$MNT/opt/skills"

sudo cp "$SCRIPT_DIR/agent/"*.py "$MNT/opt/agent/"
sudo chmod +x "$MNT/opt/agent/agent.py"

sudo rm -rf "$MNT/opt/skills/"*
sudo cp -r "$SCRIPT_DIR/skills/"* "$MNT/opt/skills/"
sudo chmod +x "$MNT/opt/skills/"*/run.*

sudo umount "$MNT"
sudo rmdir "$MNT"
trap - EXIT

ok "Agent + skills updated in rootfs"

# ─── Rebuild snapshot ───────────────────────────────────────────────

step "Rebuild snapshot"

rm -f "$FIRECLAW_DIR/snapshot.state" \
      "$FIRECLAW_DIR/snapshot.mem" \
      "$FIRECLAW_DIR/snapshot-rootfs.ext4"

fireclaw snapshot create
ok "Snapshot rebuilt"

# ─── Restart overseer ──────────────────────────────────────────────

step "Restart overseer"
sudo systemctl start fireclaw-overseer
ok "Overseer started"

echo ""
log "Update complete. Use IRC to test."
skills/fetch_url/SKILL.md (new file, 9 lines)
@@ -0,0 +1,9 @@
---
name: fetch_url
description: Fetch a URL and return its text content. HTML is stripped to plain text. Use this to read web pages, documentation, articles, etc.
parameters:
  url:
    type: string
    description: The URL to fetch
    required: true
---
skills/fetch_url/run.py (new file, 49 lines)
@@ -0,0 +1,49 @@
#!/usr/bin/env python3
import sys
import json
import re
import urllib.request
from html.parser import HTMLParser

args = json.loads(sys.stdin.read())
url = args.get("url", "")


class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.text = []
        self._skip = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style", "noscript"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style", "noscript"):
            self._skip = False
        if tag in ("p", "br", "div", "h1", "h2", "h3", "h4", "li", "tr"):
            self.text.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.text.append(data)


try:
    req = urllib.request.Request(url, headers={"User-Agent": "fireclaw-agent"})
    with urllib.request.urlopen(req, timeout=15) as resp:
        content_type = resp.headers.get("Content-Type", "")
        raw = resp.read(50_000).decode("utf-8", errors="replace")

    if "html" in content_type:
        parser = TextExtractor()
        parser.feed(raw)
        text = "".join(parser.text)
    else:
        text = raw

    text = re.sub(r"\n{3,}", "\n\n", text).strip()
    print(text or "[empty page]")
except Exception as e:
    print(f"[fetch error: {e}]")
|
||||
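Every skill script in this series follows the same contract: JSON arguments on stdin, plain text on stdout, exit code 0 even on errors. On the agent side, dispatch to a skill could be sketched like this (`invoke_skill` and the 130s timeout are assumptions for illustration, not code from this repo):

```python
import json
import subprocess
from pathlib import Path

def invoke_skill(skills_dir: str, name: str, arguments: dict) -> str:
    """Run a skill's script with JSON args on stdin and return its stdout."""
    skill_dir = Path(skills_dir) / name
    # Each skill folder ships run.py or run.sh next to its SKILL.md
    script = next(p for p in (skill_dir / "run.py", skill_dir / "run.sh") if p.exists())
    cmd = ["bash", str(script)] if script.suffix == ".sh" else ["python3", str(script)]
    result = subprocess.run(
        cmd,
        input=json.dumps(arguments),
        capture_output=True,
        text=True,
        timeout=130,  # slightly above the 120s budget run_command gives itself
    )
    return result.stdout.strip()
```

Since the scripts never raise to the caller (errors are printed as `[error: ...]` text), the dispatcher can feed whatever comes back straight into the model's tool-result message.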
17
skills/read_file/SKILL.md
Normal file
@@ -0,0 +1,17 @@
---
name: read_file
description: Read a file from the workspace with optional line range. Use this to view large tool outputs, logs, or any file in /workspace.
parameters:
  path:
    type: string
    description: Path to the file to read (must be under /workspace)
    required: true
  offset:
    type: integer
    description: Start reading from this line number (default 1)
    required: false
  limit:
    type: integer
    description: Maximum number of lines to return (default 200)
    required: false
---
67
skills/read_file/run.py
Normal file
@@ -0,0 +1,67 @@
#!/usr/bin/env python3
"""Read a file from /workspace with optional line range."""
import json
import os
import sys

args = json.loads(sys.stdin.read())
path = args.get("path", "")
offset = max(int(args.get("offset", 1)), 1)
limit = max(int(args.get("limit", 200)), 1)

WORKSPACE = os.environ.get("WORKSPACE", "/workspace")

# Resolve to absolute and ensure it stays under /workspace
resolved = os.path.realpath(path)
if not resolved.startswith(WORKSPACE + "/") and resolved != WORKSPACE:
    print(f"[error: path must be under {WORKSPACE}]")
    sys.exit(0)

if not os.path.exists(resolved):
    print(f"[error: file not found: {path}]")
    sys.exit(0)

if os.path.isdir(resolved):
    entries = sorted(os.listdir(resolved))
    print(f"Directory listing of {path} ({len(entries)} entries):")
    for entry in entries[:100]:
        full = os.path.join(resolved, entry)
        kind = "dir" if os.path.isdir(full) else "file"
        print(f"  {entry} ({kind})")
    if len(entries) > 100:
        print(f"  ... and {len(entries) - 100} more")
    sys.exit(0)

# Binary detection — check first 512 bytes
try:
    with open(resolved, "rb") as f:
        chunk = f.read(512)
    if b"\x00" in chunk:
        size = os.path.getsize(resolved)
        print(f"[binary file: {path} ({size} bytes)]")
        sys.exit(0)
except Exception as e:
    print(f"[error reading file: {e}]")
    sys.exit(0)

# Read with line range
try:
    with open(resolved) as f:
        lines = f.readlines()
except Exception as e:
    print(f"[error reading file: {e}]")
    sys.exit(0)

total = len(lines)
start_idx = offset - 1  # 0-based
end_idx = min(start_idx + limit, total)

if start_idx >= total:
    print(f"[file has {total} lines — offset {offset} is beyond end of file]")
    sys.exit(0)

for i in range(start_idx, end_idx):
    print(f"{i + 1}\t{lines[i]}", end="")

if end_idx < total:
    print(f"\n[showing lines {offset}-{end_idx} of {total} — use offset={end_idx + 1} to read more]")
9
skills/run_command/SKILL.md
Normal file
@@ -0,0 +1,9 @@
---
name: run_command
description: Execute a shell command on this system and return the output. Use this to check system info, run scripts, fetch URLs, process data, etc.
parameters:
  command:
    type: string
    description: The shell command to execute (bash)
    required: true
---
51
skills/run_command/run.py
Normal file
@@ -0,0 +1,51 @@
#!/usr/bin/env python3
import subprocess
import sys
import json
import re

DANGEROUS_PATTERNS = [
    (re.compile(r'\brm\s+(-[a-zA-Z]*f[a-zA-Z]*\s+)?-[a-zA-Z]*r[a-zA-Z]*\s+(/|~|\.)(\s|$)'), "recursive delete of critical path"),
    (re.compile(r'\brm\s+(-[a-zA-Z]*r[a-zA-Z]*\s+)?-[a-zA-Z]*f[a-zA-Z]*\s+(/|~|\.)(\s|$)'), "recursive delete of critical path"),
    (re.compile(r'\bdd\s+if='), "raw disk write (dd)"),
    (re.compile(r'\bmkfs\b'), "filesystem format"),
    (re.compile(r':\(\)\s*\{[^}]*:\s*\|\s*:'), "fork bomb"),
    (re.compile(r'>\s*/dev/[sh]d[a-z]'), "device write"),
    (re.compile(r'\bchmod\s+(-[a-zA-Z]*R[a-zA-Z]*\s+)?777\s+/(\s|$)'), "recursive chmod 777 on /"),
    (re.compile(r'\b(shutdown|reboot|halt|poweroff)\b'), "system shutdown/reboot"),
    (re.compile(r'\bkill\s+-9\s+(-1|1)\b'), "kill init or all processes"),
]


def check_dangerous(cmd):
    for pattern, desc in DANGEROUS_PATTERNS:
        if pattern.search(cmd):
            return desc
    return None


args = json.loads(sys.stdin.read())
command = args.get("command", "")

blocked = check_dangerous(command)
if blocked:
    print(f'[blocked: command matches dangerous pattern "{blocked}". This command was not executed.]')
    sys.exit(0)

try:
    result = subprocess.run(
        ["bash", "-c", command],
        capture_output=True,
        text=True,
        timeout=120,
    )
    output = result.stdout
    if result.stderr:
        output += f"\n[stderr] {result.stderr}"
    if result.returncode != 0:
        output += f"\n[exit code: {result.returncode}]"
    print(output.strip() or "[no output]")
except subprocess.TimeoutExpired:
    print("[command timed out after 120s]")
except Exception as e:
    print(f"[error: {e}]")
13
skills/save_memory/SKILL.md
Normal file
@@ -0,0 +1,13 @@
---
name: save_memory
description: Save something important to your persistent memory. Use this to remember facts about users, lessons learned, project context, or anything you want to recall in future conversations. Memories survive restarts.
parameters:
  topic:
    type: string
    description: "Short topic name for the memory file (e.g. 'user_prefs', 'project_x', 'lessons')"
    required: true
  content:
    type: string
    description: The memory content to save
    required: true
---
33
skills/save_memory/run.py
Normal file
@@ -0,0 +1,33 @@
#!/usr/bin/env python3
import sys
import json
import os

args = json.loads(sys.stdin.read())
topic = args.get("topic", "note")
content = args.get("content", "")
workspace = os.environ.get("WORKSPACE", "/workspace")

mem_dir = f"{workspace}/memory"
os.makedirs(mem_dir, exist_ok=True)

# Write the memory file
filepath = f"{mem_dir}/{topic}.md"
with open(filepath, "w") as f:
    f.write(content + "\n")

# Update MEMORY.md index
index_path = f"{workspace}/MEMORY.md"
existing = ""
try:
    with open(index_path) as f:
        existing = f.read()
except FileNotFoundError:
    existing = "# Agent Memory\n"

entry = f"- [{topic}](memory/{topic}.md)"
if topic not in existing:
    with open(index_path, "a") as f:
        f.write(f"\n{entry}")

print(f"Memory saved to {filepath}")
13
skills/web_search/SKILL.md
Normal file
@@ -0,0 +1,13 @@
---
name: web_search
description: Search the web using SearXNG. Returns titles, URLs, and snippets for the top results. Use this when you need current information or facts you're unsure about.
parameters:
  query:
    type: string
    description: The search query
    required: true
  num_results:
    type: integer
    description: Number of results to return (default 5)
    required: false
---
32
skills/web_search/run.py
Normal file
@@ -0,0 +1,32 @@
#!/usr/bin/env python3
import sys
import json
import urllib.request
import urllib.parse

args = json.loads(sys.stdin.read())
query = args.get("query", "")
num_results = args.get("num_results", 5)
searx_url = args.get("_searx_url", "https://searx.mymx.me")

try:
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    req = urllib.request.Request(
        f"{searx_url}/search?{params}",
        headers={"User-Agent": "fireclaw-agent"},
    )
    with urllib.request.urlopen(req, timeout=15) as resp:
        data = json.loads(resp.read())
    results = data.get("results", [])[:num_results]
    if not results:
        print("No results found.")
    else:
        lines = []
        for r in results:
            title = r.get("title", "")
            url = r.get("url", "")
            snippet = r.get("content", "")[:150]
            lines.append(f"- {title}\n  {url}\n  {snippet}")
        print("\n".join(lines))
except Exception as e:
    print(f"[search error: {e}]")
17
skills/write_file/SKILL.md
Normal file
@@ -0,0 +1,17 @@
---
name: write_file
description: Write content to a file in /workspace. Creates parent directories if needed. Use this to save scripts, reports, data, or any output you want to persist.
parameters:
  path:
    type: string
    description: Path to write (must be under /workspace)
    required: true
  content:
    type: string
    description: Content to write to the file
    required: true
  append:
    type: boolean
    description: If true, append to existing file instead of overwriting (default false)
    required: false
---
32
skills/write_file/run.py
Normal file
@@ -0,0 +1,32 @@
#!/usr/bin/env python3
"""Write content to a file under /workspace."""
import json
import os
import sys

args = json.loads(sys.stdin.read())
path = args.get("path", "")
content = args.get("content", "")
append = args.get("append", False)

WORKSPACE = os.environ.get("WORKSPACE", "/workspace")

resolved = os.path.realpath(path)
if not resolved.startswith(WORKSPACE + "/") and resolved != WORKSPACE:
    print(f"[error: path must be under {WORKSPACE}]")
    sys.exit(0)

if os.path.isdir(resolved):
    print(f"[error: {path} is a directory]")
    sys.exit(0)

try:
    os.makedirs(os.path.dirname(resolved), exist_ok=True)
    mode = "a" if append else "w"
    with open(resolved, mode) as f:
        f.write(content)
    size = os.path.getsize(resolved)
    action = "appended to" if append else "wrote"
    print(f"[{action} {path} ({size} bytes)]")
except Exception as e:
    print(f"[error: {e}]")
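Each SKILL.md above carries YAML frontmatter that the agent loads into a tool definition at boot. A minimal sketch of that step follows; `parse_frontmatter` and `skill_to_tool` are hypothetical names, and the output shape assumes the OpenAI-style function schema that Ollama's tool calling accepts:

```python
import re

def parse_frontmatter(text: str) -> dict:
    """Tiny parser for the two-level YAML frontmatter used by SKILL.md."""
    body = re.search(r"^---\n(.*?)\n---", text, re.S).group(1)
    meta, params, current = {}, {}, None
    for line in body.splitlines():
        indent = len(line) - len(line.lstrip())
        key, _, value = line.strip().partition(":")
        value = value.strip()
        if indent == 0:
            if value:
                meta[key] = value
            # a bare "parameters:" line just opens the nested section
        elif indent == 2:
            current = key
            params[current] = {}
        elif current is not None and value:
            params[current][key] = value
    meta["parameters"] = params
    return meta

def skill_to_tool(meta: dict) -> dict:
    """Map parsed frontmatter onto an OpenAI-style tool schema."""
    props = {
        name: {"type": p.get("type", "string"), "description": p.get("description", "")}
        for name, p in meta["parameters"].items()
    }
    required = [n for n, p in meta["parameters"].items() if p.get("required") == "true"]
    return {
        "type": "function",
        "function": {
            "name": meta["name"],
            "description": meta["description"],
            "parameters": {"type": "object", "properties": props, "required": required},
        },
    }
```

A real implementation would likely use a proper YAML library inside the VM; this sketch only covers the exact shape the six SKILL.md files above use.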
@@ -1,4 +1,3 @@
import { spawn } from "node:child_process";
import {
  existsSync,
  mkdirSync,
@@ -11,19 +10,26 @@ import {
import { join } from "node:path";
import { execFileSync } from "node:child_process";
import { CONFIG } from "./config.js";

const SSH_OPTS = [
  "-o", "StrictHostKeyChecking=no",
  "-o", "UserKnownHostsFile=/dev/null",
  "-o", "ConnectTimeout=5",
  "-i", CONFIG.sshKeyPath,
];
import {
  ensureBridge,
  ensureNat,
  allocateIp,
  releaseIp,
  createTap,
  deleteTap,
  macFromOctet,
  applyNetworkPolicy,
  removeNetworkPolicy,
  type NetworkPolicy,
} from "./network.js";
import * as api from "./firecracker-api.js";
import {
  setupNetwork,
  spawnFirecracker,
  bootVM,
} from "./firecracker-vm.js";

export interface AgentInfo {
  name: string;
@@ -39,13 +45,26 @@ export interface AgentInfo {
  startedAt: string;
}

interface AgentTemplate {
export interface AgentTemplate {
  name: string;
  nick: string;
  model: string;
  trigger: string;
  persona: string;
  network?: NetworkPolicy;
  // Agent runtime settings (passed to config.json)
  tools?: boolean;
  context_size?: number;
  num_predict?: number;
  temperature?: number;
  compress?: boolean;
  compress_threshold?: number;
  compress_keep?: number;
  max_tool_rounds?: number;
  max_response_lines?: number;
  // Cron scheduling
  schedule?: string; // 5-field cron expression, e.g. "0 8 * * *"
  schedule_timeout?: number; // seconds before auto-destroy (default 300)
}

const AGENTS_FILE = join(CONFIG.baseDir, "agents.json");
@@ -89,7 +108,7 @@ export function listTemplates(): string[] {

function injectAgentConfig(
  rootfsPath: string,
  config: { nick: string; model: string; trigger: string },
  config: Record<string, unknown>,
  persona: string
) {
  const mountPoint = `/tmp/fireclaw-agent-${Date.now()}`;
@@ -104,35 +123,25 @@ function injectAgentConfig(
    { stdio: "pipe" }
  );

  // Write config
  // Write config (via stdin to avoid shell injection)
  const configJson = JSON.stringify({
    nick: config.nick,
    model: config.model,
    trigger: config.trigger,
    server: "172.16.0.1",
    port: 6667,
    ollama_url: "http://172.16.0.1:11434",
    ...config,
  });
  const configPath = join(mountPoint, "etc/agent/config.json");
  execFileSync("sudo", ["tee", configPath], {
    input: configJson,
    stdio: ["pipe", "pipe", "pipe"],
  });
  execFileSync(
    "sudo",
    [
      "bash",
      "-c",
      `echo '${configJson}' > ${join(mountPoint, "etc/agent/config.json")}`,
    ],
    { stdio: "pipe" }
  );

  // Write persona
  execFileSync(
    "sudo",
    [
      "bash",
      "-c",
      `cat > ${join(mountPoint, "etc/agent/persona.md")} << 'PERSONA_EOF'\n${persona}\nPERSONA_EOF`,
    ],
    { stdio: "pipe" }
  );
  // Write persona (via stdin to avoid shell injection)
  const personaPath = join(mountPoint, "etc/agent/persona.md");
  execFileSync("sudo", ["tee", personaPath], {
    input: persona,
    stdio: ["pipe", "pipe", "pipe"],
  });

  // Inject SSH key for debugging access
  execFileSync("sudo", ["mkdir", "-p", join(mountPoint, "root/.ssh")], {
@@ -201,24 +210,6 @@ function ensureWorkspace(agentName: string): string {
  return imgPath;
}

function waitForSocket(socketPath: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + 5_000;
    const check = () => {
      if (existsSync(socketPath)) {
        setTimeout(resolve, 200);
        return;
      }
      if (Date.now() > deadline) {
        reject(new Error("Firecracker socket did not appear"));
        return;
      }
      setTimeout(check, 50);
    };
    check();
  });
}

export async function startAgent(
  templateName: string,
  overrides?: { name?: string; model?: string }
@@ -254,58 +245,41 @@
  // Clean stale socket from previous run
  try { unlinkSync(socketPath); } catch {}

  // Prepare rootfs
  // Prepare rootfs — pass all template settings to agent config
  copyFileSync(AGENT_ROOTFS, rootfsPath);
  injectAgentConfig(
    rootfsPath,
    { nick, model, trigger: template.trigger },
    template.persona
  );
  const agentConfig: Record<string, unknown> = {
    nick,
    model,
    trigger: template.trigger,
  };
  // Forward optional template settings
  for (const key of [
    "tools", "context_size", "num_predict", "temperature",
    "compress", "compress_threshold", "compress_keep",
    "max_tool_rounds", "max_response_lines",
  ] as const) {
    if ((template as unknown as Record<string, unknown>)[key] !== undefined) {
      agentConfig[key] = (template as unknown as Record<string, unknown>)[key];
    }
  }
  injectAgentConfig(rootfsPath, agentConfig, template.persona);

  // Create/get persistent workspace
  const workspacePath = ensureWorkspace(name);

  // Setup network
  ensureBridge();
  ensureNat();
  deleteTap(tapDevice); // clean stale tap from previous run
  createTap(tapDevice);
  setupNetwork(tapDevice);

  // Boot VM
  const proc = spawn(
    CONFIG.firecrackerBin,
    ["--api-sock", socketPath],
    { stdio: "pipe", detached: true }
  );
  proc.unref();

  await waitForSocket(socketPath);

  const bootArgs = [
    "console=ttyS0",
    "reboot=k",
    "panic=1",
    "pci=off",
    "root=/dev/vda",
    "rw",
    `ip=${ip}::${CONFIG.bridge.gateway}:${CONFIG.bridge.netmask}::eth0:off`,
  ].join(" ");

  await api.putBootSource(socketPath, CONFIG.kernelPath, bootArgs);
  await api.putDrive(socketPath, "rootfs", rootfsPath);
  await api.putDrive(socketPath, "workspace", workspacePath, false, false);
  await api.putNetworkInterface(
  const proc = await spawnFirecracker(socketPath, { detached: true });
  await bootVM({
    socketPath,
    "eth0",
    rootfsPath,
    extraDrives: [{ id: "workspace", path: workspacePath }],
    tapDevice,
    macFromOctet(octet)
  );
  await api.putMachineConfig(
    socketPath,
    CONFIG.vm.vcpuCount,
    CONFIG.vm.memSizeMib
  );
  await api.startInstance(socketPath);
    ip,
    octet,
  });

  // Apply network policy
  const networkPolicy: NetworkPolicy = template.network ?? "full";
@@ -348,14 +322,7 @@ export async function stopAgent(name: string) {
  try {
    execFileSync(
      "ssh",
      [
        "-o", "StrictHostKeyChecking=no",
        "-o", "UserKnownHostsFile=/dev/null",
        "-o", "ConnectTimeout=3",
        "-i", CONFIG.sshKeyPath,
        `root@${info.ip}`,
        "killall python3 2>/dev/null; sleep 1",
      ],
      [...SSH_OPTS, `root@${info.ip}`, "pkill -f 'agent.py' 2>/dev/null; sleep 1"],
      { stdio: "pipe", timeout: 5_000 }
    );
  } catch {
@@ -413,18 +380,10 @@ export function listAgents(): AgentInfo[] {
    } catch {
      // Process is dead, clean up
      log(`Agent "${name}" is dead, cleaning up...`);
      try {
        deleteTap(info.tapDevice);
      } catch {}
      try {
        releaseIp(info.octet);
      } catch {}
      try {
        unlinkSync(info.rootfsPath);
      } catch {}
      try {
        unlinkSync(info.socketPath);
      } catch {}
      try { deleteTap(info.tapDevice); } catch (e) { log(` tap cleanup: ${e}`); }
      try { releaseIp(info.octet); } catch (e) { log(` ip cleanup: ${e}`); }
      try { unlinkSync(info.rootfsPath); } catch (e) { log(` rootfs cleanup: ${e}`); }
      try { unlinkSync(info.socketPath); } catch (e) { log(` socket cleanup: ${e}`); }
      delete agents[name];
    }
  }
@@ -452,13 +411,6 @@ export async function reloadAgent(
  }
  if (updates.trigger) configUpdates.trigger = updates.trigger;

  // Write updated config as a temp file on the VM via SSH
  const sshOpts = [
    "-o", "StrictHostKeyChecking=no",
    "-o", "UserKnownHostsFile=/dev/null",
    "-o", "ConnectTimeout=5",
    "-i", CONFIG.sshKeyPath,
  ];
  const sshTarget = `root@${info.ip}`;

  try {
@@ -466,7 +418,7 @@
    // Read current config from VM
    const currentRaw = execFileSync(
      "ssh",
      [...sshOpts, sshTarget, "cat /etc/agent/config.json"],
      [...SSH_OPTS, sshTarget, "cat /etc/agent/config.json"],
      { encoding: "utf-8", timeout: 10_000 }
    );
    const current = JSON.parse(currentRaw);
@@ -476,7 +428,7 @@
    // Write back via stdin
    execFileSync(
      "ssh",
      [...sshOpts, sshTarget, `cat > /etc/agent/config.json`],
      [...SSH_OPTS, sshTarget, `cat > /etc/agent/config.json`],
      { input: newConfig, timeout: 10_000 }
    );
  }
@@ -484,7 +436,7 @@
  if (updates.persona) {
    execFileSync(
      "ssh",
      [...sshOpts, sshTarget, `cat > /etc/agent/persona.md`],
      [...SSH_OPTS, sshTarget, `cat > /etc/agent/persona.md`],
      { input: updates.persona, timeout: 10_000 }
    );
  }
@@ -492,7 +444,7 @@
  // Signal agent to reload
  execFileSync(
    "ssh",
    [...sshOpts, sshTarget, "killall -HUP python3"],
    [...SSH_OPTS, sshTarget, "pkill -HUP -f 'agent.py'"],
    { stdio: "pipe", timeout: 10_000 }
  );
} catch (err) {
@@ -522,11 +474,10 @@ export function reconcileAgents(): { adopted: string[]; cleaned: string[] } {
      log(`Adopted running agent "${name}" (PID ${info.pid}, ${info.ip})`);
    } else {
      log(`Cleaning dead agent "${name}" (PID ${info.pid} gone)...`);
      // Clean up resources from dead agent
      try { deleteTap(info.tapDevice); } catch {}
      try { releaseIp(info.octet); } catch {}
      try { unlinkSync(info.rootfsPath); } catch {}
      try { unlinkSync(info.socketPath); } catch {}
      try { deleteTap(info.tapDevice); } catch (e) { log(` tap: ${e}`); }
      try { releaseIp(info.octet); } catch (e) { log(` ip: ${e}`); }
      try { unlinkSync(info.rootfsPath); } catch (e) { log(` rootfs: ${e}`); }
      try { unlinkSync(info.socketPath); } catch (e) { log(` socket: ${e}`); }
      delete agents[name];
      cleaned.push(name);
    }
164
src/firecracker-vm.ts
Normal file
@@ -0,0 +1,164 @@
/**
 * Shared Firecracker VM lifecycle helpers.
 * Used by vm.ts, snapshot.ts, and agent-manager.ts.
 */

import { spawn, type ChildProcess } from "node:child_process";
import { existsSync, unlinkSync, mkdirSync } from "node:fs";
import { CONFIG } from "./config.js";
import * as api from "./firecracker-api.js";
import {
  ensureBridge,
  ensureNat,
  createTap,
  deleteTap,
  macFromOctet,
} from "./network.js";

export interface BootOptions {
  socketPath: string;
  kernelPath?: string;
  rootfsPath: string;
  extraDrives?: { id: string; path: string; readOnly?: boolean }[];
  tapDevice: string;
  ip: string;
  octet: number;
  vcpu?: number;
  mem?: number;
}

/**
 * Wait for a Firecracker API socket to appear.
 */
export function waitForSocket(
  socketPath: string,
  timeoutMs = 5_000
): Promise<void> {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeoutMs;
    const check = () => {
      if (existsSync(socketPath)) {
        setTimeout(resolve, 200);
        return;
      }
      if (Date.now() > deadline) {
        reject(new Error("Firecracker socket did not appear"));
        return;
      }
      setTimeout(check, 50);
    };
    check();
  });
}

/**
 * Set up network for a VM: ensure bridge, NAT, and create tap device.
 * Cleans stale tap first.
 */
export function setupNetwork(tapDevice: string) {
  ensureBridge();
  ensureNat();
  deleteTap(tapDevice);
  createTap(tapDevice);
}

/**
 * Spawn a Firecracker process and wait for the API socket.
 */
export async function spawnFirecracker(
  socketPath: string,
  opts?: { detached?: boolean }
): Promise<ChildProcess> {
  // Clean stale socket
  try {
    unlinkSync(socketPath);
  } catch {}

  mkdirSync(CONFIG.socketDir, { recursive: true });

  const proc = spawn(
    CONFIG.firecrackerBin,
    ["--api-sock", socketPath],
    {
      stdio: "pipe",
      detached: opts?.detached ?? false,
    }
  );

  if (opts?.detached) proc.unref();

  await waitForSocket(socketPath);
  return proc;
}

/**
 * Configure and start a Firecracker VM via its API.
 */
export async function bootVM(opts: BootOptions) {
  const kernel = opts.kernelPath ?? CONFIG.kernelPath;
  const vcpu = opts.vcpu ?? CONFIG.vm.vcpuCount;
  const mem = opts.mem ?? CONFIG.vm.memSizeMib;

  const bootArgs = [
    "console=ttyS0",
    "reboot=k",
    "panic=1",
    "pci=off",
    "root=/dev/vda",
    "rw",
    `ip=${opts.ip}::${CONFIG.bridge.gateway}:${CONFIG.bridge.netmask}::eth0:off`,
  ].join(" ");

  await api.putBootSource(opts.socketPath, kernel, bootArgs);
  await api.putDrive(opts.socketPath, "rootfs", opts.rootfsPath);

  if (opts.extraDrives) {
    for (const drive of opts.extraDrives) {
      await api.putDrive(
        opts.socketPath,
        drive.id,
        drive.path,
        drive.readOnly ?? false,
        false
      );
    }
  }

  await api.putNetworkInterface(
    opts.socketPath,
    "eth0",
    opts.tapDevice,
    macFromOctet(opts.octet)
  );
  await api.putMachineConfig(opts.socketPath, vcpu, mem);
  await api.startInstance(opts.socketPath);
}

/**
 * Kill a Firecracker process and clean up its socket.
 */
export async function killFirecracker(
  proc: ChildProcess | null,
  socketPath: string,
  signal: NodeJS.Signals = "SIGTERM"
) {
  if (proc && !proc.killed) {
    proc.kill(signal);
    await new Promise<void>((resolve) => {
      const timer = setTimeout(() => {
        if (proc && !proc.killed) {
          proc.kill("SIGKILL");
        }
        resolve();
      }, 2_000);
      proc.on("exit", () => {
        clearTimeout(timer);
        resolve();
      });
    });
  }

  try {
    unlinkSync(socketPath);
  } catch {}
}
@@ -1,5 +1,5 @@
import { execFileSync } from "node:child_process";
import { openSync, closeSync, readFileSync, writeFileSync } from "node:fs";
import { readFileSync, writeFileSync, renameSync } from "node:fs";
import { CONFIG } from "./config.js";

function run(cmd: string, args: string[]) {
@@ -195,39 +195,35 @@ function readPool(): IpPool {
  }
}

function writePool(pool: IpPool) {
  writeFileSync(CONFIG.ipPoolFile, JSON.stringify(pool));
function atomicWritePool(pool: IpPool) {
  const tmp = CONFIG.ipPoolFile + ".tmp";
  writeFileSync(tmp, JSON.stringify(pool));
  renameSync(tmp, CONFIG.ipPoolFile);
}

export function allocateIp(): { ip: string; octet: number } {
  const fd = openSync(CONFIG.ipPoolLock, "w");
  try {
    // Simple flock via child process
    const pool = readPool();
    for (
      let octet = CONFIG.bridge.minHost;
      octet <= CONFIG.bridge.maxHost;
      octet++
    ) {
      if (!pool.allocated.includes(octet)) {
        pool.allocated.push(octet);
        writePool(pool);
        return { ip: `${CONFIG.bridge.prefix}.${octet}`, octet };
      }
  // Use flock for proper mutual exclusion
  const result = execFileSync("bash", ["-c",
    `flock "${CONFIG.ipPoolLock}" cat "${CONFIG.ipPoolFile}" 2>/dev/null || echo '{"allocated":[]}'`
  ], { encoding: "utf-8" });
  const pool: IpPool = JSON.parse(result.trim());

  for (
    let octet = CONFIG.bridge.minHost;
    octet <= CONFIG.bridge.maxHost;
    octet++
  ) {
    if (!pool.allocated.includes(octet)) {
      pool.allocated.push(octet);
      atomicWritePool(pool);
      return { ip: `${CONFIG.bridge.prefix}.${octet}`, octet };
    }
    throw new Error("No free IPs in pool");
  } finally {
    closeSync(fd);
  }
  throw new Error("No free IPs in pool");
}

export function releaseIp(octet: number) {
  const fd = openSync(CONFIG.ipPoolLock, "w");
  try {
    const pool = readPool();
    pool.allocated = pool.allocated.filter((o) => o !== octet);
    writePool(pool);
  } finally {
    closeSync(fd);
  }
  const pool = readPool();
  pool.allocated = pool.allocated.filter((o) => o !== octet);
  atomicWritePool(pool);
}
175
src/overseer.ts
@@ -7,8 +7,18 @@ import {
  listTemplates,
  reconcileAgents,
  reloadAgent,
  loadTemplate,
  type AgentInfo,
  type AgentTemplate,
} from "./agent-manager.js";
import { CONFIG } from "./config.js";

const SSH_OPTS = [
  "-o", "StrictHostKeyChecking=no",
  "-o", "UserKnownHostsFile=/dev/null",
  "-o", "ConnectTimeout=3",
  "-i", CONFIG.sshKeyPath,
];

interface OverseerConfig {
  server: string;
@@ -29,6 +39,37 @@ function formatAgentList(agents: AgentInfo[]): string[] {
  );
}

function fieldMatches(field: string, value: number): boolean {
  if (field === "*") return true;
  return field.split(",").some((part) => {
    if (part.includes("/")) {
      const [range, stepStr] = part.split("/");
      const step = parseInt(stepStr);
      if (range === "*") return value % step === 0;
      const [min, max] = range.split("-").map(Number);
      return value >= min && value <= max && (value - min) % step === 0;
    }
    if (part.includes("-")) {
      const [min, max] = part.split("-").map(Number);
      return value >= min && value <= max;
    }
    return parseInt(part) === value;
  });
}

function matchesCron(expr: string, date: Date): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) return false;
  const checks: [string, number][] = [
    [fields[0], date.getMinutes()],
    [fields[1], date.getHours()],
    [fields[2], date.getDate()],
    [fields[3], date.getMonth() + 1],
    [fields[4], date.getDay()],
  ];
  return checks.every(([field, value]) => fieldMatches(field, value));
}
|
||||
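Lifted out of the hunk above, the matcher can be exercised standalone; for example `*/15 * * * *` fires at minutes 0, 15, 30 and 45:

```typescript
// fieldMatches / matchesCron copied from the diff, plus a few example calls.
function fieldMatches(field: string, value: number): boolean {
  if (field === "*") return true;
  return field.split(",").some((part) => {
    if (part.includes("/")) {
      const [range, stepStr] = part.split("/");
      const step = parseInt(stepStr);
      if (range === "*") return value % step === 0;
      const [min, max] = range.split("-").map(Number);
      return value >= min && value <= max && (value - min) % step === 0;
    }
    if (part.includes("-")) {
      const [min, max] = part.split("-").map(Number);
      return value >= min && value <= max;
    }
    return parseInt(part) === value;
  });
}

function matchesCron(expr: string, date: Date): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) return false;
  const checks: [string, number][] = [
    [fields[0], date.getMinutes()],
    [fields[1], date.getHours()],
    [fields[2], date.getDate()],
    [fields[3], date.getMonth() + 1],
    [fields[4], date.getDay()],
  ];
  return checks.every(([field, value]) => fieldMatches(field, value));
}

// 12:30 on any day: the "*/15" step matches minute 30.
console.log(matchesCron("*/15 * * * *", new Date(2024, 0, 1, 12, 30))); // true
// 09:00 on a Monday (2024-01-01): weekday range 1-5 matches.
console.log(matchesCron("0 9 * * 1-5", new Date(2024, 0, 1, 9, 0))); // true
// 09:01 misses the minute field.
console.log(matchesCron("0 9 * * 1-5", new Date(2024, 0, 1, 9, 1))); // false
```

Since the scheduler below only evaluates this once per 60-second tick, an expression can trigger at most once per matching minute.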
export async function runOverseer(config: OverseerConfig) {
  // Reconcile agent state on startup
  log("Reconciling agent state...");
@@ -195,8 +236,91 @@ export async function runOverseer(config: OverseerConfig) {
        break;
      }

      case "!logs": {
        const name = parts[1];
        const n = parseInt(parts[2] || "10");
        if (!name) {
          bot.say(event.target, "Usage: !logs <name> [lines]");
          return;
        }
        const agents = listAgents();
        const agent = agents.find((a) => a.name === name);
        if (!agent) {
          bot.say(event.target, `Agent "${name}" not found.`);
          return;
        }
        try {
          const { execFileSync } = await import("node:child_process");
          const logs = execFileSync("ssh", [
            ...SSH_OPTS,
            `root@${agent.ip}`,
            `tail -n ${n} /workspace/agent.log 2>/dev/null || echo '[no logs yet]'`,
          ], { encoding: "utf-8", timeout: 5_000 }).trim();
          for (const line of logs.split("\n")) {
            bot.say(event.target, line);
          }
        } catch {
          bot.say(event.target, `Could not read logs for "${name}".`);
        }
        break;
      }

      case "!persona": {
        const name = parts[1];
        if (!name) {
          bot.say(event.target, "Usage: !persona <name> [new persona text]");
          return;
        }
        const newPersona = parts.slice(2).join(" ");
        if (newPersona) {
          await reloadAgent(name, { persona: newPersona });
          bot.say(event.target, `Agent "${name}" persona updated.`);
        } else {
          // View current persona — read from agent config via SSH
          const agents = listAgents();
          const agent = agents.find((a) => a.name === name);
          if (!agent) {
            bot.say(event.target, `Agent "${name}" not found.`);
            return;
          }
          try {
            const { execFileSync } = await import("node:child_process");
            const persona = execFileSync("ssh", [
              ...SSH_OPTS,
              `root@${agent.ip}`,
              "cat /etc/agent/persona.md",
            ], { encoding: "utf-8", timeout: 5_000 }).trim();
            bot.say(event.target, `${name}: ${persona}`);
          } catch {
            bot.say(event.target, `Could not read persona for "${name}".`);
          }
        }
        break;
      }

      case "!version": {
        try {
          const { readFileSync } = await import("node:fs");
          const { execFileSync } = await import("node:child_process");
          const { join, dirname } = await import("node:path");
          const { fileURLToPath } = await import("node:url");
          const pkgDir = join(dirname(fileURLToPath(import.meta.url)), "..");
          const pkg = JSON.parse(readFileSync(join(pkgDir, "package.json"), "utf-8"));
          let gitHash = "";
          try {
            gitHash = execFileSync("git", ["rev-parse", "--short", "HEAD"], {
              encoding: "utf-8", cwd: pkgDir, timeout: 3_000,
            }).trim();
          } catch {}
          bot.say(event.target, `fireclaw v${pkg.version}${gitHash ? ` (${gitHash})` : ""}`);
        } catch {
          bot.say(event.target, "fireclaw (version unknown)");
        }
        break;
      }

      case "!help": {
        bot.say(event.target, "Commands: !invoke <template> [name] | !destroy <name> | !list | !model <name> <model> | !models | !templates | !status | !help");
        bot.say(event.target, "Commands: !invoke !destroy !list !model !models !templates !persona !logs !status !version !help");
        break;
      }
    }
@@ -247,5 +371,54 @@ export async function runOverseer(config: OverseerConfig) {

  setInterval(healthCheck, HEALTH_CHECK_INTERVAL);

  // Cron agent scheduler — check every 60s
  const CRON_CHECK_INTERVAL = 60_000;
  const cronCheck = async () => {
    const templates = listTemplates();
    for (const tmplName of templates) {
      try {
        const template = loadTemplate(tmplName);
        if (!template.schedule) continue;

        const now = new Date();
        if (!matchesCron(template.schedule, now)) continue;

        const cronName = `${template.name}-cron`;

        // Skip if already running
        const running = listAgents();
        if (running.some((a) => a.name === cronName)) continue;

        log(`Cron trigger: spawning "${cronName}" from template "${tmplName}"`);
        bot.say(config.channel, `Cron: spawning "${cronName}" from template "${tmplName}"`);

        const info = await startAgent(tmplName, { name: cronName });
        knownAgents.add(cronName);

        // Schedule auto-destroy
        const timeout = (template.schedule_timeout ?? 300) * 1000;
        setTimeout(async () => {
          try {
            const current = listAgents();
            if (current.some((a) => a.name === cronName)) {
              log(`Cron timeout: destroying "${cronName}" after ${timeout / 1000}s`);
              bot.say(config.channel, `Cron: destroying "${cronName}" (timeout ${timeout / 1000}s)`);
              await stopAgent(cronName);
              knownAgents.delete(cronName);
            }
          } catch (err) {
            const msg = err instanceof Error ? err.message : String(err);
            log(`Error destroying cron agent "${cronName}": ${msg}`);
          }
        }, timeout);
      } catch (err) {
        const msg = err instanceof Error ? err.message : String(err);
        log(`Cron error for template "${tmplName}": ${msg}`);
      }
    }
  };

  setInterval(cronCheck, CRON_CHECK_INTERVAL);

  log("Overseer started. Waiting for commands...");
}
@@ -1,45 +1,22 @@
import { spawn, type ChildProcess } from "node:child_process";
import { existsSync, mkdirSync } from "node:fs";
import { type ChildProcess } from "node:child_process";
import { existsSync, mkdirSync, copyFileSync } from "node:fs";
import { join } from "node:path";
import { CONFIG } from "./config.js";
import * as api from "./firecracker-api.js";
import {
  ensureBridge,
  ensureNat,
  createTap,
  deleteTap,
  macFromOctet,
} from "./network.js";
import {
  ensureBaseImage,
  ensureSshKeypair,
  injectSshKey,
} from "./rootfs.js";
import { deleteTap } from "./network.js";
import { ensureBaseImage, ensureSshKeypair, injectSshKey } from "./rootfs.js";
import { waitForSsh } from "./ssh.js";
import { copyFileSync } from "node:fs";
import {
  setupNetwork,
  spawnFirecracker,
  bootVM,
  killFirecracker,
} from "./firecracker-vm.js";

function log(msg: string) {
  process.stderr.write(`[snapshot] ${msg}\n`);
}

function waitForSocket(socketPath: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + 5_000;
    const check = () => {
      if (existsSync(socketPath)) {
        setTimeout(resolve, 200);
        return;
      }
      if (Date.now() > deadline) {
        reject(new Error("Firecracker socket did not appear"));
        return;
      }
      setTimeout(check, 50);
    };
    check();
  });
}

export function snapshotExists(): boolean {
  return (
    existsSync(CONFIG.snapshot.statePath) &&
@@ -61,47 +38,21 @@ export async function createSnapshot() {
  injectSshKey(snap.rootfsPath);

  log("Setting up network...");
  ensureBridge();
  ensureNat();
  deleteTap(snap.tapDevice); // clean stale tap from previous run
  createTap(snap.tapDevice);
  setupNetwork(snap.tapDevice);

  let proc: ChildProcess | null = null;

  try {
    log("Booting VM for snapshot...");
    proc = spawn(CONFIG.firecrackerBin, ["--api-sock", socketPath], {
      stdio: "pipe",
      detached: false,
    proc = await spawnFirecracker(socketPath);
    await bootVM({
      socketPath,
      rootfsPath: snap.rootfsPath,
      tapDevice: snap.tapDevice,
      ip: snap.ip,
      octet: snap.octet,
    });

    await waitForSocket(socketPath);

    const bootArgs = [
      "console=ttyS0",
      "reboot=k",
      "panic=1",
      "pci=off",
      "root=/dev/vda",
      "rw",
      `ip=${snap.ip}::${CONFIG.bridge.gateway}:${CONFIG.bridge.netmask}::eth0:off`,
    ].join(" ");

    await api.putBootSource(socketPath, CONFIG.kernelPath, bootArgs);
    await api.putDrive(socketPath, "rootfs", snap.rootfsPath);
    await api.putNetworkInterface(
      socketPath,
      "eth0",
      snap.tapDevice,
      macFromOctet(snap.octet)
    );
    await api.putMachineConfig(
      socketPath,
      CONFIG.vm.vcpuCount,
      CONFIG.vm.memSizeMib
    );
    await api.startInstance(socketPath);

    log("Waiting for SSH...");
    await waitForSsh(snap.ip);

@@ -116,13 +67,7 @@ export async function createSnapshot() {
    log(`  Memory: ${snap.memPath}`);
    log(`  Rootfs: ${snap.rootfsPath}`);
  } finally {
    if (proc && !proc.killed) {
      proc.kill("SIGKILL");
    }
    try {
      const { unlinkSync } = await import("node:fs");
      unlinkSync(socketPath);
    } catch {}
    await killFirecracker(proc, socketPath, "SIGKILL");
    deleteTap(snap.tapDevice);
  }
}
157
src/vm.ts
@@ -1,19 +1,11 @@
import { spawn, type ChildProcess } from "node:child_process";
import { existsSync, mkdirSync } from "node:fs";
import { type ChildProcess } from "node:child_process";
import { mkdirSync } from "node:fs";
import { join } from "node:path";
import { randomBytes } from "node:crypto";
import { CONFIG } from "./config.js";
import type { VMConfig, RunResult, RunOptions } from "./types.js";
import * as api from "./firecracker-api.js";
import {
  ensureBridge,
  ensureNat,
  allocateIp,
  releaseIp,
  createTap,
  deleteTap,
  macFromOctet,
} from "./network.js";
import { allocateIp, releaseIp, deleteTap } from "./network.js";
import {
  ensureBaseImage,
  ensureSshKeypair,
@@ -24,6 +16,12 @@ import {
import { waitForSsh, execCommand } from "./ssh.js";
import { registerVm, unregisterVm } from "./cleanup.js";
import { snapshotExists } from "./snapshot.js";
import {
  setupNetwork,
  spawnFirecracker,
  bootVM,
  killFirecracker,
} from "./firecracker-vm.js";

function log(verbose: boolean, msg: string) {
  if (verbose) process.stderr.write(`[fireclaw] ${msg}\n`);
@@ -42,7 +40,6 @@ export class VMInstance {
    command: string,
    opts: RunOptions = {}
  ): Promise<RunResult> {
    // Try snapshot path first unless disabled
    if (!opts.noSnapshot && snapshotExists()) {
      return VMInstance.runFromSnapshot(command, opts);
    }
@@ -65,33 +62,20 @@ export class VMInstance {
      guestIp: snap.ip,
      tapDevice: snap.tapDevice,
      socketPath: join(CONFIG.socketDir, `${id}.sock`),
      rootfsPath: "", // shared, not per-run
      rootfsPath: "",
      timeoutMs,
      verbose,
    };

    const vm = new VMInstance(config);
    vm.octet = 0; // no IP pool allocation for snapshot runs
    vm.octet = 0;
    registerVm(vm);

    try {
      log(verbose, `VM ${id}: restoring from snapshot...`);
      ensureBridge();
      ensureNat();
      deleteTap(snap.tapDevice); // clean stale tap from previous run
      createTap(snap.tapDevice);
      setupNetwork(snap.tapDevice);

      // Spawn firecracker and load snapshot
      vm.process = spawn(
        CONFIG.firecrackerBin,
        ["--api-sock", config.socketPath],
        { stdio: "pipe", detached: false }
      );
      vm.process.on("error", (err) => {
        log(verbose, `Firecracker process error: ${err.message}`);
      });

      await vm.waitForSocket();
      vm.process = await spawnFirecracker(config.socketPath);
      await api.putSnapshotLoad(
        config.socketPath,
        snap.statePath,
@@ -124,16 +108,12 @@ export class VMInstance {
    const verbose = opts.verbose ?? false;
    const timeoutMs = opts.timeout ?? CONFIG.vm.defaultTimeoutMs;

    // Pre-flight checks
    ensureBaseImage();
    ensureSshKeypair();

    // Allocate resources
    const { ip, octet } = allocateIp();
    const tapDevice = `fctap${octet}`;

    mkdirSync(CONFIG.socketDir, { recursive: true });

    const config: VMConfig = {
      id,
      guestIp: ip,
@@ -154,13 +134,19 @@ export class VMInstance {
    injectSshKey(config.rootfsPath);

    log(verbose, `VM ${id}: creating tap ${tapDevice}...`);
    ensureBridge();
    ensureNat();
    deleteTap(tapDevice); // clean stale tap from previous run
    createTap(tapDevice);
    setupNetwork(tapDevice);

    log(verbose, `VM ${id}: booting...`);
    await vm.boot(opts);
    vm.process = await spawnFirecracker(config.socketPath);
    await bootVM({
      socketPath: config.socketPath,
      rootfsPath: config.rootfsPath,
      tapDevice,
      ip,
      octet,
      vcpu: opts.vcpu,
      mem: opts.mem,
    });

    log(verbose, `VM ${id}: waiting for SSH at ${ip}...`);
    await waitForSsh(ip);
@@ -179,110 +165,17 @@ export class VMInstance {
    }
  }

  private async boot(opts: RunOptions) {
    const { config } = this;
    const vcpu = opts.vcpu ?? CONFIG.vm.vcpuCount;
    const mem = opts.mem ?? CONFIG.vm.memSizeMib;

    // Spawn firecracker
    this.process = spawn(
      CONFIG.firecrackerBin,
      ["--api-sock", config.socketPath],
      {
        stdio: "pipe",
        detached: false,
      }
    );

    this.process.on("error", (err) => {
      log(config.verbose, `Firecracker process error: ${err.message}`);
    });

    // Wait for socket
    await this.waitForSocket();

    // Configure via API
    const bootArgs = [
      "console=ttyS0",
      "reboot=k",
      "panic=1",
      "pci=off",
      "root=/dev/vda",
      "rw",
      `ip=${config.guestIp}::${CONFIG.bridge.gateway}:${CONFIG.bridge.netmask}::eth0:off`,
    ].join(" ");

    await api.putBootSource(config.socketPath, CONFIG.kernelPath, bootArgs);
    await api.putDrive(config.socketPath, "rootfs", config.rootfsPath);
    await api.putNetworkInterface(
      config.socketPath,
      "eth0",
      config.tapDevice,
      macFromOctet(this.octet)
    );
    await api.putMachineConfig(config.socketPath, vcpu, mem);
    await api.startInstance(config.socketPath);
  }

  private waitForSocket(): Promise<void> {
    const socketPath = this.config.socketPath;
    return new Promise((resolve, reject) => {
      const deadline = Date.now() + 5_000;

      const check = () => {
        if (existsSync(socketPath)) {
          setTimeout(resolve, 200);
          return;
        }
        if (Date.now() > deadline) {
          reject(new Error("Firecracker socket did not appear"));
          return;
        }
        setTimeout(check, 50);
      };

      check();
    });
  }

  async destroy() {
    const { config } = this;
    log(config.verbose, `VM ${config.id}: cleaning up...`);

    // Kill firecracker
    if (this.process && !this.process.killed) {
      this.process.kill("SIGTERM");
      await new Promise<void>((resolve) => {
        const timer = setTimeout(() => {
          if (this.process && !this.process.killed) {
            this.process.kill("SIGKILL");
          }
          resolve();
        }, 2_000);
        this.process!.on("exit", () => {
          clearTimeout(timer);
          resolve();
        });
      });
    }

    // Clean up socket
    try {
      const { unlinkSync } = await import("node:fs");
      unlinkSync(config.socketPath);
    } catch {
      // Already gone
    }

    // Clean up tap device
    await killFirecracker(this.process, config.socketPath);
    deleteTap(config.tapDevice);

    // Release IP (skip for snapshot runs which don't allocate from pool)
    if (this.octet > 0) {
      releaseIp(this.octet);
    }

    // Delete rootfs copy (skip for snapshot runs which share rootfs)
    if (config.rootfsPath) {
      deleteRunCopy(config.rootfsPath);
    }
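The destroy() path above folds the old SIGTERM-then-SIGKILL-after-2s dance into killFirecracker. A minimal sketch of that pattern (hypothetical helper; the actual firecracker-vm.ts implementation is not shown in this diff):

```typescript
import { type ChildProcess } from "node:child_process";
import { unlinkSync } from "node:fs";

// Sketch of a graceful-kill helper (hypothetical; not the repo's
// killFirecracker). Send the requested signal, escalate to SIGKILL
// after 2s if the process has not exited, then remove the stale socket.
async function killChild(
  proc: ChildProcess | null,
  socketPath: string,
  signal: "SIGTERM" | "SIGKILL" = "SIGTERM"
): Promise<void> {
  if (proc && !proc.killed) {
    proc.kill(signal);
    await new Promise<void>((resolve) => {
      const timer = setTimeout(() => {
        if (proc && !proc.killed) proc.kill("SIGKILL");
        resolve();
      }, 2_000);
      proc.on("exit", () => {
        clearTimeout(timer);
        resolve();
      });
    });
  }
  try {
    unlinkSync(socketPath); // socket may already be gone
  } catch {}
}
```

Centralizing this means the snapshot path (which wants an immediate SIGKILL) and the normal destroy path share one shutdown routine instead of two hand-rolled copies.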