32 Commits

Author SHA1 Message Date
Mathias Bergqvist
6328766c7f fix(main): re-evaluate chain per call to respect caller model override
All checks were successful
CI / Lint / Test / Vet (push) Successful in 1m8s
CI / Mirror to GitHub (push) Has been skipped
buildOrch now returns a closure instead of *Orchestrator. Each invocation
calls models.ChainFor(skill, req.Model) so a non-empty caller override
collapses to a single-entry chain (no escalation). The attempts slice is
also allocated fresh per call, preventing unbounded growth across requests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 11:34:09 +02:00
Mathias Bergqvist
f1deedd39d feat(main): wire per-skill Orchestrators replacing single executor.Run
Each skill now gets its own Orchestrator built from its ChainFor entry,
with LiteLLM for local tiers and Claude for cloud tiers. Removes the
defunct models.Resolve calls and single shared executor.Run pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 11:28:10 +02:00
Mathias Bergqvist
5cb272a869 feat(exec): add Orchestrator chain walker with verification and warm-state logging 2026-04-20 11:06:13 +02:00
Mathias Bergqvist
e96b39a812 feat(exec): add Claude verifier for local model output quality gate 2026-04-20 11:02:59 +02:00
Mathias Bergqvist
5db5b33cd7 feat(exec): add LiteLLM HTTP executor for local model dispatch 2026-04-20 10:46:08 +02:00
Mathias Bergqvist
a32457b5bc feat(exec): pass --model flag to claude subprocess for cloud-tier dispatch 2026-04-20 08:55:03 +02:00
Mathias Bergqvist
e0be5f0f98 feat(config): replace single-model config with chain-based routing
Implements escalation chains per skill with three-layer priority:
1. Caller override (model param) — no escalation
2. Per-skill chain from models.yaml
3. default_chain fallback

New APIs:
- Verifier() — fixed verifier for output validation
- LlamaSwapURL() — base URL for warm-state probing
- ChainFor(skill, override) — ordered model list for escalation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 08:48:33 +02:00
Mathias Bergqvist
6d410b810b feat(session): extend Attempt with tier, timing, and verdict fields 2026-04-20 08:35:27 +02:00
Mathias Bergqvist
76f195de2a docs: model orchestration design spec for Phase 3
All checks were successful
CI / Lint / Test / Vet (push) Successful in 1m8s
CI / Mirror to GitHub (push) Successful in 4s
2026-04-20 07:45:32 +02:00
Mathias Bergqvist
f901d4e67d fix(ci): use --follow-tags instead of --tags to avoid re-pushing existing tags
All checks were successful
CI / Lint / Test / Vet (push) Successful in 1m9s
CI / Mirror to GitHub (push) Successful in 3s
2026-04-19 19:16:22 +02:00
Mathias Bergqvist
509c04b6e4 fix(session): use fmt.Fprintf with nolint to satisfy both staticcheck and errcheck
Some checks failed
CI / Lint / Test / Vet (push) Successful in 1m7s
CI / Mirror to GitHub (push) Failing after 3s
2026-04-19 18:56:12 +02:00
Mathias Bergqvist
738275252c feat: hyperguild phase 2 — review/debug/spec/trainer skills with session history injection
Some checks failed
CI / Lint / Test / Vet (push) Failing after 3s
CI / Mirror to GitHub (push) Has been skipped
2026-04-19 14:38:05 +02:00
Mathias Bergqvist
38fcac4cba feat(trainer): add trainer MCP skill with reader→writer sub-agent chain
Reader agent scans session logs for SFT/DPO candidates; writer receives
reader output and formats+writes training pairs to brain/training-data/.
Adds trainer-reader.md and trainer-writer.md discipline prompts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 14:06:00 +02:00
Mathias Bergqvist
7697e901d2 feat(spec): add spec writing MCP skill
Adds the spec skill that generates structured implementation specs from
requirements and writes them to a configurable output path in the project.
Follows the same pattern as review/debug skills with session history injection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 11:59:28 +02:00
Mathias Bergqvist
8cff57009a feat(debug): add debug MCP skill with hypothesis generation
Implements the debug skill following the same pattern as review. The skill
accepts project_root + error (+ optional context/model/session_id), prepends
session history, and calls the executor to produce 3-5 ordered hypotheses —
diagnosis only, no fixes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 11:29:58 +02:00
Mathias Bergqvist
8fb44affef feat(review): add code review MCP skill with session history injection
Implements the review skill following the same pattern as retrospective/tdd.
Validates project_root and files args, prepends session history when a
session_id is provided, and delegates to the executor with Read,Bash tools.
Iron-law discipline prompt enforces CRITICAL/WARNING/SUGGESTION output format.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 11:11:29 +02:00
Mathias Bergqvist
582ca5019b feat(tdd): inject session history into green and refactor worker prompts
Adds SessionsDir to tdd.Config, session_id to tool input schemas, and a
prependHistory method that reads the session JSONL log and prepends a
formatted history block to the task prompt before worker invocation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 10:18:23 +02:00
Mathias Bergqvist
858a9ba1a1 fix(exec): expand validPhases and remove schema enum constraint for phase 2026-04-19 10:03:21 +02:00
Mathias Bergqvist
cbef2da8de feat(session): add FormatHistory for worker context injection
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 09:40:41 +02:00
Mathias Bergqvist
b493651c26 fix(test): update executor test fixture to match --output-format json envelope
All checks were successful
CI / Lint / Test / Vet (push) Successful in 1m9s
CI / Mirror to GitHub (push) Has been skipped
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 07:42:24 +02:00
Mathias Bergqvist
6169404f34 fix(lint): fix remaining errcheck in brain handlers_test
Some checks failed
CI / Lint / Test / Vet (push) Failing after 1m5s
CI / Mirror to GitHub (push) Has been skipped
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 07:17:45 +02:00
Mathias Bergqvist
a67106026f fix(lint): satisfy errcheck for io.Copy, json.Encode, Body.Close, deferred Close
Some checks failed
CI / Lint / Test / Vet (push) Failing after 3s
CI / Mirror to GitHub (push) Has been skipped
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 07:15:35 +02:00
Mathias Bergqvist
99d523189f ci: add Gitea Actions quality gate and GitHub mirror
Some checks failed
CI / Lint / Test / Vet (push) Failing after 3s
CI / Mirror to GitHub (push) Has been skipped
- check job: lint + test + vet across both Go modules (root + ingestion)
- mirror job: pushes main + tags to github.com/mathiasb/hyperguild after check passes
- Taskfile: add VERSION/SHORT_SHA vars, fix build/lint/test/vet for multi-module,
  add tag and push tasks — matches cobalt-dingo conventions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 06:39:22 +02:00
Mathias Bergqvist
2d219760e5 docs: rewrite README for Phase 1 hyperguild
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 06:11:13 +02:00
Mathias Bergqvist
4bf5edb78e fix(exec): use --output-format json to get structured output from claude
--json-schema combined with --output-format text produces empty stdout.
The structured result is in the "structured_output" field of the json
envelope. Updated executor to unwrap the envelope.

Also removes --bare flag which disables OAuth keychain reads, causing
silent auth failure when ANTHROPIC_API_KEY is not set.

Adds goreman Procfile + stdio bridge (cmd/bridge) for Claude Code MCP
integration. Task start/stop replaced with goreman + port-kill.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 06:04:10 +02:00
Mathias Bergqvist
98acf1c14e feat: add task start/stop for tmux-managed hyperguild session 2026-04-17 21:38:26 +02:00
Mathias Bergqvist
9741d8ba28 Merge branch 'feat/hyperguild-phase1' 2026-04-17 21:23:37 +02:00
Mathias Bergqvist
bf67299a48 chore: ignore .superpowers/ brainstorm sessions 2026-04-17 21:23:37 +02:00
Mathias Bergqvist
24d9216474 fix(ingestion): preserve type and domain metadata as frontmatter in written notes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:22:14 +02:00
Mathias Bergqvist
344def20bb test: phase 1 integration smoke test passing
All 8 MCP tools verified (tdd_red, tdd_green, tdd_refactor, brain_query,
brain_write, tier, session_log, retrospective). Ingestion write/query,
brain_query, tier, and session_log all return correct responses end-to-end.
Brain note written during smoke test committed to raw/ and wiki/concepts/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:18:08 +02:00
Mathias Bergqvist
d084af1af0 chore: add ingestion server tasks and update MCP registration 2026-04-17 20:54:40 +02:00
Mathias Bergqvist
e98bb2ba65 feat: wire brain, org, sessionlog, retrospective skills into supervisor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:52:16 +02:00
55 changed files with 2904 additions and 178 deletions

View File

@@ -6,7 +6,7 @@
}, },
"supervisor": { "supervisor": {
"url": "http://localhost:3200/mcp", "url": "http://localhost:3200/mcp",
"description": "Skill workers — TDD (red/green/refactor), more coming" "description": "Hyperguild SDO — skill workers (tdd, retrospective), brain tools (brain_query, brain_write), session logging, tier detection"
} }
} }
} }

View File

@@ -6,3 +6,12 @@ SUPERVISOR_MODELS_FILE=./config/models.yaml
# LiteLLM gateway (iguana) # LiteLLM gateway (iguana)
LITELLM_BASE_URL=http://iguana:4000 LITELLM_BASE_URL=http://iguana:4000
LITELLM_API_KEY=your-litellm-master-key LITELLM_API_KEY=your-litellm-master-key
# Ingestion server
INGEST_BASE_URL=http://localhost:3300
INGEST_PORT=3300
INGEST_BRAIN_DIR=./brain
# Brain directories
SUPERVISOR_SESSIONS_DIR=./brain/sessions
SUPERVISOR_BRAIN_DIR=./brain

58
.gitea/workflows/ci.yml Normal file
View File

@@ -0,0 +1,58 @@
name: CI
on:
push:
branches: [main]
tags: ["v*"]
pull_request:
branches: [main]
jobs:
# ── 1. Quality gate ─────────────────────────────────────────────────────────
check:
name: Lint / Test / Vet
runs-on: self-hosted
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version-file: go.mod
cache: false # self-hosted: Go cache persists on disk between runs
- name: Verify toolchain
run: |
go version
task --version
govulncheck -version 2>&1 || true
- name: Install golangci-lint
run: |
curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/HEAD/install.sh \
| sh -s -- -b "$(go env GOPATH)/bin" v2.11.4
golangci-lint --version
- name: Run checks
run: task check
# ── 2. Mirror to GitHub ─────────────────────────────────────────────────────
mirror:
name: Mirror to GitHub
needs: check
runs-on: self-hosted
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Push to GitHub
run: |
mkdir -p ~/.ssh
echo '${{ secrets.GH_DEPLOY_KEY }}' > ~/.ssh/id_rsa_gh_mirror
chmod 600 ~/.ssh/id_rsa_gh_mirror
ssh-keyscan github.com >> ~/.ssh/known_hosts 2>/dev/null
GIT_SSH_COMMAND="ssh -i ~/.ssh/id_rsa_gh_mirror -o IdentitiesOnly=yes" \
git push git@github.com:mathiasb/hyperguild.git HEAD:main --follow-tags
rm ~/.ssh/id_rsa_gh_mirror
echo "✓ Mirrored to GitHub"

3
.gitignore vendored
View File

@@ -44,3 +44,6 @@ secrets/
# OS # OS
.DS_Store .DS_Store
Thumbs.db Thumbs.db
# Brainstorm sessions
.superpowers/

2
Procfile Normal file
View File

@@ -0,0 +1,2 @@
ingestion: cd ingestion && INGEST_BRAIN_DIR=../brain INGEST_PORT=3300 go run ./cmd/server/
supervisor: SUPERVISOR_CONFIG_DIR=./config/supervisor SUPERVISOR_MODELS_FILE=./config/models.yaml SUPERVISOR_SESSIONS_DIR=./brain/sessions INGEST_BASE_URL=http://localhost:3300 go run ./cmd/supervisor/

165
README.md
View File

@@ -1,98 +1,109 @@
# Project template # hyperguild
Harness-agnostic project scaffold using the Agent Skills open standard. An MCP server that acts as a disciplined AI supervisor for Claude Code sessions.
Instead of letting Claude Code do whatever it wants, hyperguild enforces structured
workflows (TDD red/green/refactor), logs every session, and accumulates learnings
into a searchable brain.
## Quick start ## How it works
```
Your Claude Code session (in any project)
│ MCP tools (over stdio bridge → HTTP)
supervisor :3200 — skill workers: tdd, retrospective
ingestion :3300 — brain HTTP API: query wiki, write notes
brain/
├── sessions/ — JSONL log, one file per session_id
├── wiki/ — searchable knowledge (full-text)
│ ├── concepts/
│ ├── entities/
│ └── sources/
├── raw/ — retrospective output, staged for review
└── training-data/ — SFT/DPO/RL data (Phase 2)
```
## Phase 1 tools (available now)
| Tool | What it does |
|------|-------------|
| `tdd_red` | Writes a failing test for a spec, verifies it fails |
| `tdd_green` | Writes the minimal implementation to make tests pass |
| `tdd_refactor` | Cleans up implementation while keeping tests green |
| `session_log` | Appends a structured entry to the session JSONL log |
| `retrospective` | Reads the session log, identifies novel learnings, writes to brain/raw/ |
| `brain_query` | Full-text search over brain/wiki/ |
| `brain_write` | Writes a note to brain/raw/ (with optional YAML frontmatter) |
| `tier` | Returns the current connectivity tier (1=cloud, 2=LAN, 3=offline) |
## Start the servers
```bash ```bash
degit mathias/project-template my-new-project # Requires goreman: go install github.com/mattn/goreman@latest
cd my-new-project task start # starts ingestion (:3300) + supervisor (:3200) via goreman
task init task stop # kills both by port
``` ```
## Structure ## Connect a project
``` Create `.mcp.json` in your project root:
.context/
├── PROJECT.md ← Canonical project context (edit this)
├── mcp.json ← MCP server config (generated on first sync)
└── system-prompt.txt ← Generated: generic system prompt
.skills/ ```json
├── go-patterns/ {
└── SKILL.md ← Agent Skills standard format "mcpServers": {
└── htmx-patterns/ "supervisor": {
└── SKILL.md "command": "/Users/mathias/dev/AI/supervisor/bin/supervisor-bridge",
"env": {
scripts/ "SUPERVISOR_URL": "http://localhost:3200/mcp"
└── context-sync.sh ← Adapter generator (finds root AGENT.md automatically) }
}
Taskfile.yml ← Task runner config }
DECISIONS.md ← Why things are the way they are }
``` ```
## Generated files (gitignored) Build the bridge binary once: `task bridge:build`
| File | Consumer | Notes | Then open Claude Code in your project — run `/mcp` to confirm `supervisor` is listed.
|------|----------|-------|
| `AGENTS.md` | Crush, Pi, Antigravity | Root + project concatenated |
| `CLAUDE.md` | Claude Code | Project-only (inherits root via tree walk) |
| `.cursorrules` | Cursor | Root + project concatenated |
| `.aider.conventions.md` | Aider | Root + project concatenated |
| `.context/system-prompt.txt` | Open WebUI, Mods, generic | Root + project concatenated |
## How root context works ## A typical TDD session
The script walks up from the project directory looking for `~/dev/.context/AGENT.md`. ```
1. Call tdd_red → spec in, failing test file out
- **Claude Code**: inherits natively (reads every `CLAUDE.md` up the tree) → project CLAUDE.md is project-only 2. Call tdd_green → test path in, implementation out
- **Everything else**: can't walk the tree → script concatenates root + project into each generated file 3. Call tdd_refactor → impl + test in, cleaned code out
4. Call session_log → log each phase result
## Skills 5. Call retrospective → extracts learnings → brain/raw/
6. Review brain/raw/, move worthy notes to brain/wiki/concepts/
Skills use the [Agent Skills open standard](https://github.com/badlogic/pi-skills). Each skill is a folder with a `SKILL.md` containing frontmatter: 7. Future sessions: call brain_query to retrieve relevant context
```yaml
---
name: my-skill
description: What this skill does. When to use it.
---
# Instructions here
``` ```
Supported natively by Claude Code, Pi, Crush, and Antigravity. No adapter needed for skills. ## Tier detection
### Adding a skill The supervisor probes connectivity at call time:
```bash | Tier | Label | Condition |
mkdir .skills/my-new-skill |------|-------|-----------|
# Create .skills/my-new-skill/SKILL.md with frontmatter + instructions | 1 | full-online | Can reach api.anthropic.com |
``` | 2 | lan-only | Can reach LiteLLM but not Anthropic |
| 3 | airplane | No external connectivity |
### Using pi-skills (cross-compatible) ## Key env vars
```bash | Variable | Default | Purpose |
# User-level (all projects) |----------|---------|---------|
git clone https://github.com/badlogic/pi-skills ~/.pi/agent/skills/pi-skills | `INGEST_BRAIN_DIR` | `../brain` | Brain directory for ingestion server |
| `INGEST_PORT` | `3300` | Ingestion server port |
| `SUPERVISOR_CONFIG_DIR` | `./config/supervisor` | Skill discipline files |
| `SUPERVISOR_SESSIONS_DIR` | `./brain/sessions` | JSONL session logs |
| `INGEST_BASE_URL` | `http://localhost:3300` | Supervisor → ingestion |
| `LITELLM_BASE_URL` | — | LiteLLM proxy for Tier 2 model routing |
# Symlink for Claude Code ## Phase 2 (planned)
ln -s ~/.pi/agent/skills/pi-skills/brave-search ~/.claude/skills/brave-search
```
## Usage with specific tools - `review` skill — structured code review with iron law enforcement
- `debug` skill — hypothesis-driven debugging sessions
**Claude Code**: `task context:sync:claude` → reads `CLAUDE.md` + discovers `.skills/*/SKILL.md` - `spec` skill — generates specs from conversations
- `trainer` — extracts SFT/DPO pairs from session logs for fine-tuning
**Crush**: `task context:sync:agents` → reads `AGENTS.md` + discovers `.skills/*/SKILL.md`
**Pi**: `task context:sync:agents` → reads `AGENTS.md` + discovers `.skills/*/SKILL.md` (or symlink `.skills/` to `.pi/skills/`)
**Antigravity**: `task context:sync:agents` → reads `AGENTS.md` + discovers `.skills/*/SKILL.md`
**Cursor**: `task context:sync:cursor` → reads `.cursorrules`
**Mistral Vibe**: Run root-level `task context:sync:vibe` once → `vibe --agent mathias`
**Open WebUI / Mods**: Copy `.context/system-prompt.txt` into a preset or pipe it
**Any other tool**: Point at `.context/PROJECT.md` directly — it's human-readable markdown

View File

@@ -1,7 +1,11 @@
version: '3' version: '3'
vars: vars:
PROJECT_NAME: '{{.PROJECT_NAME | default "myproject"}}' PROJECT_NAME: hyperguild
VERSION:
sh: git describe --tags --always --dirty 2>/dev/null || echo "dev"
SHORT_SHA:
sh: git rev-parse --short HEAD
tasks: tasks:
context:sync: context:sync:
@@ -19,57 +23,109 @@ tasks:
context:sync:cursor: context:sync:cursor:
cmds: [bash scripts/context-sync.sh cursor] cmds: [bash scripts/context-sync.sh cursor]
dev: # ── Development ────────────────────────────────────────────────────────────
desc: Start development server
cmds:
- go run ./cmd/server
build: start:
desc: Build the binary desc: Start ingestion + supervisor (requires goreman — go install github.com/mattn/goreman@latest)
cmds: cmds:
- go build -o bin/{{.PROJECT_NAME}} ./cmd/server - goreman start
check: stop:
desc: Run all checks (lint + test + vet) desc: Stop all hyperguild processes (Ctrl-C in the goreman terminal, or kill by port)
cmds: cmds:
- task: lint - lsof -ti:3300 | xargs kill -9 2>/dev/null || true
- task: test - lsof -ti:3200 | xargs kill -9 2>/dev/null || true
- task: vet - echo "hyperguild stopped"
lint:
cmds: [golangci-lint run ./...]
test:
cmds: [go test -race -count=1 ./...]
vet:
cmds:
- go vet ./...
- govulncheck ./... || true
up:
desc: Start containers
cmds: [docker compose up -d]
down:
cmds: [docker compose down]
init:
desc: Initialize a new project from this template
cmds:
- bash scripts/init.sh
supervisor:dev: supervisor:dev:
desc: Run supervisor MCP server (development) desc: Run supervisor MCP server (development)
cmds: cmds:
- go run ./cmd/supervisor - go run ./cmd/supervisor
ingestion:dev:
desc: Run ingestion server in development mode
dir: ingestion
env:
INGEST_BRAIN_DIR: "{{.ROOT_DIR}}/brain"
INGEST_PORT: "3300"
cmds:
- go run ./cmd/server
# ── Build ──────────────────────────────────────────────────────────────────
build:
desc: Build all binaries
cmds:
- task: supervisor:build
- task: bridge:build
- task: ingestion:build
supervisor:build: supervisor:build:
desc: Build supervisor binary desc: Build supervisor binary
cmds: cmds:
- go build -o bin/supervisor ./cmd/supervisor - go build -trimpath -ldflags="-s -w -X main.version={{.VERSION}}" -o bin/supervisor ./cmd/supervisor
bridge:build:
desc: Build stdio↔HTTP bridge for Claude Code MCP integration
cmds:
- go build -trimpath -ldflags="-s -w" -o bin/supervisor-bridge ./cmd/bridge
ingestion:build:
desc: Build ingestion server binary
dir: ingestion
cmds:
- go build -trimpath -ldflags="-s -w" -o ../bin/ingestion-server ./cmd/server
# ── Quality ────────────────────────────────────────────────────────────────
check:
desc: Run all checks (lint + test + vet) across all modules
cmds:
- task: lint
- task: test
- task: vet
lint:
cmds:
- golangci-lint run ./...
- cd ingestion && golangci-lint run ./...
test:
cmds:
- go test -race -count=1 ./...
- cd ingestion && go test -race -count=1 ./...
vet:
cmds:
- go vet ./...
- govulncheck ./... || true
- cd ingestion && go vet ./...
supervisor:test:smoke: supervisor:test:smoke:
desc: Smoke test supervisor via MCP (requires supervisor:dev running) desc: Smoke test supervisor via MCP (requires start running)
cmds: cmds:
- | - |
curl -s -X POST http://localhost:${SUPERVISOR_PORT:-3200}/mcp \ curl -s -X POST http://localhost:${SUPERVISOR_PORT:-3200}/mcp \
-H "Content-Type: application/json" \ -H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | jq . -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | jq .
# ── Git / Release ──────────────────────────────────────────────────────────
tag:
desc: Create and push a semver tag (usage — task tag version=v1.2.3)
preconditions:
- sh: '[[ "{{.version}}" =~ ^v[0-9]+\.[0-9]+\.[0-9]+(-[a-zA-Z0-9.]+)?$ ]]'
msg: "version must be semver, e.g. v1.2.3 or v1.2.3-rc.1"
- sh: "git diff --quiet && git diff --cached --quiet"
msg: "working tree must be clean before tagging"
cmds:
- git tag -a {{.version}} -m "Release {{.version}}"
- git push origin {{.version}}
push:
desc: Push current branch and tags to origin
vars:
BRANCH:
sh: git rev-parse --abbrev-ref HEAD
cmds:
- git push origin {{.BRANCH}} --tags

View File

@@ -0,0 +1,3 @@
# TDD Pattern
Always write the failing test first.

View File

@@ -0,0 +1,3 @@
# TDD Pattern
Always write the failing test first.

59
cmd/bridge/main.go Normal file
View File

@@ -0,0 +1,59 @@
// bridge is a stdio↔HTTP adapter that lets Claude Code connect to the
// supervisor MCP server via the stdio transport.
//
// Claude Code spawns this binary as a subprocess and communicates over
// stdin/stdout. Each newline-delimited JSON-RPC message from stdin is
// forwarded to the supervisor HTTP server and the response is written back.
//
// Usage:
//
// SUPERVISOR_URL=http://localhost:3200/mcp bridge
package main
import (
"bufio"
"bytes"
"fmt"
"io"
"net/http"
"os"
)
func main() {
url := os.Getenv("SUPERVISOR_URL")
if url == "" {
url = "http://localhost:3200/mcp"
}
client := &http.Client{}
scanner := bufio.NewScanner(os.Stdin)
scanner.Buffer(make([]byte, 1024*1024), 1024*1024)
for scanner.Scan() {
line := scanner.Bytes()
if len(bytes.TrimSpace(line)) == 0 {
continue
}
req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(line))
if err != nil {
fmt.Fprintf(os.Stderr, "bridge: build request: %v\n", err)
continue
}
req.Header.Set("Content-Type", "application/json")
resp, err := client.Do(req)
if err != nil {
fmt.Fprintf(os.Stderr, "bridge: request failed: %v\n", err)
continue
}
_, _ = io.Copy(os.Stdout, resp.Body)
_ = resp.Body.Close()
_, _ = os.Stdout.Write([]byte("\n"))
}
if err := scanner.Err(); err != nil {
fmt.Fprintf(os.Stderr, "bridge: scanner: %v\n", err)
os.Exit(1)
}
}

View File

@@ -1,6 +1,7 @@
package main package main
import ( import (
"context"
"log/slog" "log/slog"
"net/http" "net/http"
"os" "os"
@@ -9,7 +10,16 @@ import (
iexec "github.com/mathiasbq/supervisor/internal/exec" iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/mcp" "github.com/mathiasbq/supervisor/internal/mcp"
"github.com/mathiasbq/supervisor/internal/registry" "github.com/mathiasbq/supervisor/internal/registry"
"github.com/mathiasbq/supervisor/internal/skills/brain"
"github.com/mathiasbq/supervisor/internal/skills/org"
"github.com/mathiasbq/supervisor/internal/skills/retrospective"
skilldebug "github.com/mathiasbq/supervisor/internal/skills/debug"
"github.com/mathiasbq/supervisor/internal/skills/review"
"github.com/mathiasbq/supervisor/internal/skills/spec"
"github.com/mathiasbq/supervisor/internal/skills/trainer"
"github.com/mathiasbq/supervisor/internal/skills/sessionlog"
"github.com/mathiasbq/supervisor/internal/skills/tdd" "github.com/mathiasbq/supervisor/internal/skills/tdd"
"github.com/mathiasbq/supervisor/internal/tier"
) )
func main() { func main() {
@@ -39,18 +49,114 @@ func main() {
os.Exit(1) os.Exit(1)
} }
executor := iexec.New(iexec.Config{ retroPrompt, err := os.ReadFile(cfg.ConfigDir + "/retrospective.md")
if err != nil {
logger.Error("read retrospective.md", "path", cfg.ConfigDir+"/retrospective.md", "err", err)
os.Exit(1)
}
reviewPrompt, err := os.ReadFile(cfg.ConfigDir + "/review.md")
if err != nil {
logger.Error("read review.md", "path", cfg.ConfigDir+"/review.md", "err", err)
os.Exit(1)
}
debugPrompt, err := os.ReadFile(cfg.ConfigDir + "/debug.md")
if err != nil {
logger.Error("read debug.md", "path", cfg.ConfigDir+"/debug.md", "err", err)
os.Exit(1)
}
specPrompt, err := os.ReadFile(cfg.ConfigDir + "/spec.md")
if err != nil {
logger.Error("read spec.md", "path", cfg.ConfigDir+"/spec.md", "err", err)
os.Exit(1)
}
trainerReaderPrompt, err := os.ReadFile(cfg.ConfigDir + "/trainer-reader.md")
if err != nil {
logger.Error("read trainer-reader.md", "path", cfg.ConfigDir+"/trainer-reader.md", "err", err)
os.Exit(1)
}
trainerWriterPrompt, err := os.ReadFile(cfg.ConfigDir + "/trainer-writer.md")
if err != nil {
logger.Error("read trainer-writer.md", "path", cfg.ConfigDir+"/trainer-writer.md", "err", err)
os.Exit(1)
}
claudeExec := iexec.New(iexec.Config{
SystemPrompt: string(systemPrompt), SystemPrompt: string(systemPrompt),
LiteLLMBaseURL: cfg.LiteLLMBaseURL, LiteLLMBaseURL: cfg.LiteLLMBaseURL,
LiteLLMAPIKey: cfg.LiteLLMAPIKey, LiteLLMAPIKey: cfg.LiteLLMAPIKey,
}) })
litellmExec := iexec.NewLiteLLM(cfg.LiteLLMBaseURL, cfg.LiteLLMAPIKey, 0)
verifier := iexec.NewVerifier("", models.Verifier(), 0)
buildOrch := func(skill string) func(ctx context.Context, req iexec.Request) (iexec.Result, error) {
return func(ctx context.Context, req iexec.Request) (iexec.Result, error) {
rawChain := models.ChainFor(skill, req.Model)
chain := make([]iexec.ChainEntry, len(rawChain))
for i, m := range rawChain {
chain[i] = iexec.EntryFor(m)
}
attempts := make([]iexec.AttemptRecord, 0, len(chain))
orch := iexec.NewOrchestrator(chain, litellmExec.Run, claudeExec.Run, verifier, models.LlamaSwapURL(), &attempts)
return orch.Run(ctx, req)
}
}
tierFn := func(ctx context.Context) tier.Info {
return tier.Detect(ctx, "https://api.anthropic.com", cfg.LiteLLMBaseURL)
}
reg := registry.New() reg := registry.New()
reg.Register(tdd.New(tdd.Config{ reg.Register(tdd.New(tdd.Config{
SystemPrompt: string(systemPrompt), SystemPrompt: string(systemPrompt),
SkillPrompt: string(tddPrompt), SkillPrompt: string(tddPrompt),
DefaultModel: models.Resolve("tdd", ""), DefaultModel: models.ChainFor("tdd", "")[0],
ExecutorFn: executor.Run, ExecutorFn: buildOrch("tdd"),
SessionsDir: cfg.SessionsDir,
}))
reg.Register(brain.New(brain.Config{
IngestBaseURL: cfg.IngestBaseURL,
}))
reg.Register(org.New(org.Config{
TierFn: tierFn,
}))
reg.Register(sessionlog.New(sessionlog.Config{
SessionsDir: cfg.SessionsDir,
}))
reg.Register(retrospective.New(retrospective.Config{
SkillPrompt: string(retroPrompt),
DefaultModel: models.ChainFor("retrospective", "")[0],
SessionsDir: cfg.SessionsDir,
ExecutorFn: buildOrch("retrospective"),
}))
reg.Register(review.New(review.Config{
SkillPrompt: string(reviewPrompt),
DefaultModel: models.ChainFor("review", "")[0],
ExecutorFn: buildOrch("review"),
SessionsDir: cfg.SessionsDir,
}))
reg.Register(skilldebug.New(skilldebug.Config{
SkillPrompt: string(debugPrompt),
DefaultModel: models.ChainFor("debug", "")[0],
ExecutorFn: buildOrch("debug"),
SessionsDir: cfg.SessionsDir,
}))
reg.Register(spec.New(spec.Config{
SkillPrompt: string(specPrompt),
DefaultModel: models.ChainFor("spec", "")[0],
ExecutorFn: buildOrch("spec"),
SessionsDir: cfg.SessionsDir,
}))
reg.Register(trainer.New(trainer.Config{
ReaderPrompt: string(trainerReaderPrompt),
WriterPrompt: string(trainerWriterPrompt),
DefaultModel: models.ChainFor("trainer", "")[0],
ExecutorFn: buildOrch("trainer"),
SessionsDir: cfg.SessionsDir,
BrainDir: cfg.BrainDir,
})) }))
srv := mcp.NewServer(reg) srv := mcp.NewServer(reg)

View File

@@ -1,10 +1,41 @@
# Model routing table — three-layer priority: # Model routing chains — three-layer priority:
# 1. model param in MCP tool call (caller override) # 1. model param in MCP tool call (caller override — collapses to single entry, no escalation)
# 2. per-skill entry here # 2. per-skill chain here
# 3. default (fallback) # 3. default_chain fallback
default: ollama/qwen3-coder-30b-tuned
verifier: claude-sonnet-4-6 # fixed verifier for all local tiers
llama_swap_url: http://koala:8080 # for warm-state probing
default_chain:
- ollama/qwen3-coder-30b-tuned
- claude-sonnet-4-6
skills: skills:
tdd: ollama/qwen3-coder-30b-tuned tdd:
review: ollama/devstral-tuned chain:
debug: ollama/deepseek-r1-tuned - ollama/qwen3-coder-30b-tuned
- claude-sonnet-4-6
review:
chain:
- ollama/devstral-tuned
- ollama/gemma4
- claude-sonnet-4-6
debug:
chain:
- ollama/deepseek-r1-tuned
- claude-sonnet-4-6
spec:
chain:
- ollama/phi4
- ollama/gemma4
- claude-sonnet-4-6
- claude-opus-4-6
retrospective:
chain:
- ollama/qwen3-coder-30b-tuned
- claude-sonnet-4-6
trainer:
chain:
- ollama/qwen3-coder-30b-tuned
- claude-sonnet-4-6

View File

@@ -0,0 +1,31 @@
# Debug Discipline
You are a systematic debugger. Form hypotheses before suggesting fixes.
## Iron laws
1. Never suggest "try X and see what happens" — every hypothesis must have a specific expected outcome if correct
2. Generate exactly 3-5 hypotheses, ordered by likelihood (most likely first)
3. Never fix the bug — diagnose only; the caller decides what to do with the hypotheses
## Output contract
Return JSON result with:
- `status`: "pass" (hypotheses generated) or "error" (error too ambiguous to analyse)
- `phase`: "debug"
- `skill`: "debug"
- `file_path`: the most relevant file to the error (read it)
- `runner_output`: your hypotheses, formatted as:
```
HYPOTHESIS 1 (likelihood: high): <mechanism>
VERIFY: <exact command or file to check> → expected if correct: <specific output>
HYPOTHESIS 2 (likelihood: medium): <mechanism>
VERIFY: <exact command or file to check> → expected if correct: <specific output>
```
- `verified`: false — verification is the caller's job
- `message`: "N hypotheses for: <one-line error summary>"
## Rules
1. Read the error and any context files provided before forming hypotheses
2. Identify the failure mode first — what actually went wrong, not just what the error says
3. For each hypothesis: name the mechanism, explain why it would produce this exact error, give a concrete verification command with expected output
4. If the error is clearly a typo or trivial mistake, still form 3 hypotheses — surface the most likely cause as #1

View File

@@ -0,0 +1,30 @@
# Code Review Discipline
You are a disciplined code reviewer. Read files carefully before commenting.
## Iron laws
1. Never approve security vulnerabilities: command injection, SQL injection, credential exposure, path traversal, unchecked input at system boundaries
2. Never approve silently swallowed errors — `err != nil` without wrapping or handling is always wrong
3. Never approve missing validation at system boundaries (user input, external APIs, file reads)
## Output contract
Return JSON result with:
- `status`: "pass" if no blocking issues; "fail" if any iron law is violated
- `phase`: "review"
- `skill`: "review"
- `file_path`: first file reviewed
- `runner_output`: full review formatted as:
```
CRITICAL: <issue> at <file>:<line>
WARNING: <issue> at <file>:<line>
SUGGESTION: <issue> at <file>:<line>
```
- `verified`: true if you read all specified files; false if any were missing or unreadable
- `message`: "N critical, M warnings, K suggestions" or "clean: <which iron law checks passed and why>"
## Rules
1. Read every file listed before writing feedback
2. Check iron laws first — any violation is CRITICAL and sets status to "fail"
3. Then check: correctness, test coverage for new code, Go style conventions
4. Never rubber-stamp — if nothing is wrong, explain specifically which iron law checks you ran and why they passed
5. Line references are required for every finding — "roughly around the middle" is not acceptable

46
config/supervisor/spec.md Normal file
View File

@@ -0,0 +1,46 @@
# Spec Writing Discipline
You write structured implementation specs. Nothing is left ambiguous.
## Iron laws
1. Success criteria must be measurable — "the system is fast" is banned; "p99 < 200ms under 100 RPS" is valid
2. Always include an explicit "Out of scope" section — if you don't draw the boundary, the developer will guess wrong
3. Every technical decision in the approach must have a rationale
## Output contract
Return JSON result with:
- `status`: "pass" (spec written) or "error" (requirements too ambiguous to spec without more input)
- `phase`: "spec"
- `skill`: "spec"
- `file_path`: the output_path where the spec was written (absolute path)
- `runner_output`: ""
- `verified`: true if the file was written successfully
- `message`: "spec written: <one-line summary of what was specced>"
## Spec structure
Write the spec as markdown to the output_path:
```markdown
# [Feature] Spec
## Problem statement
[What problem does this solve? For whom? Why now?]
## Success criteria
- [ ] [Criterion 1 — measurable and verifiable]
- [ ] [Criterion 2 — measurable and verifiable]
## Constraints
[Non-negotiable requirements the solution must satisfy]
## Out of scope
[What we are explicitly NOT doing in this iteration]
## Technical approach
[Architecture decisions, key components, rationale for each choice]
## Risks
[What could go wrong, and how we'd mitigate it]
```
If the requirements are too vague to produce measurable success criteria, return status "error" with a message listing the specific questions that need answers.

View File

@@ -0,0 +1,31 @@
# Trainer Reader Discipline
You scan session logs and identify candidate learning moments worth converting to training data.
## What to look for
- **SFT candidates**: the worker did exactly the right thing — a clean pattern worth reinforcing
- **DPO candidates**: the worker first produced a wrong or suboptimal response, then corrected — you have both rejected and chosen
## Scoring (15)
- 5: novel pattern, clearly correct, generalises across projects
- 4: good pattern, correct, somewhat project-specific but still useful
- 3: correct but obvious — include only if especially clean
- 2 or below: skip — too ambiguous or too context-specific
## Output contract
Return JSON result with:
- `status`: "pass" or "error"
- `phase`: "trainer"
- `skill`: "trainer"
- `file_path`: ""
- `runner_output`: JSON array of candidates (valid JSON, not markdown):
[{"type":"sft","moment":"<what happened>","prompt":"<what was asked>","completion":"<what was done right>","score":4},
{"type":"dpo","moment":"<what happened>","prompt":"<what was asked>","chosen":"<correct>","rejected":"<incorrect>","score":3}]
- `verified`: true
- `message`: "N sft candidates, M dpo candidates found"
## Rules
1. Read all session entries in the task prompt
2. Score each entry — only include entries scoring >= 3
3. Prompt/completion fields must be phrased to generalise: no project-specific paths or names
4. If no candidates score >= 3, return an empty array `[]` — never force low-quality candidates

View File

@@ -0,0 +1,35 @@
# Trainer Writer Discipline
You receive candidate learning moments from the reader and write clean SFT/DPO training pairs.
## Quality gate (apply before writing)
- SFT: prompt must be phrased so it could come from any project, not just this one
- DPO: chosen and rejected must be clearly distinguishable — skip if a reader can't tell which is better
- Never include project-specific paths, variable names, or identifiers in any pair
## Output contract
Return JSON result with:
- `status`: "pass" (pairs written or skipped due to quality) or "error" (candidates JSON was malformed)
- `phase`: "trainer"
- `skill`: "trainer"
- `file_path`: path of the last file written (empty if nothing passed quality gate)
- `runner_output`: "N SFT pairs written to brain/training-data/sft/, M DPO pairs to brain/training-data/dpo/" or "0 pairs passed quality gate"
- `verified`: true if files were written; false if nothing passed
- `message`: "N sft + M dpo pairs for session <id>" or "no pairs passed quality gate"
## File format
JSONL — one JSON object per line.
SFT: `{"prompt": "...", "completion": "..."}`
DPO: `{"prompt": "...", "chosen": "...", "rejected": "..."}`
Write SFT to: `<brain_dir>/training-data/sft/<session_id>.jsonl`
Write DPO to: `<brain_dir>/training-data/dpo/<session_id>.jsonl`
Append to existing files if they exist (don't overwrite).
## Rules
1. Parse the `reader_candidates` JSON from the task prompt
2. For each candidate: apply quality gate
3. Write passing SFT candidates to sft JSONL, DPO candidates to dpo JSONL
4. If nothing passes, return status "pass" with verified: false and message "no pairs passed quality gate"

View File

@@ -0,0 +1,322 @@
# Model Orchestration Design
**Date:** 2026-04-20
**Status:** Approved for implementation
## Problem statement
The hyperguild supervisor currently spawns a `claude --print` subprocess for every skill call. The model routing config (`models.yaml`) exists but is dead weight — the model name is injected as text into the task prompt and ignored. Every skill call costs Claude tokens regardless of task complexity or data sensitivity.
## Goal
Route skill work to the most appropriate model — weighing cost, latency, and quality — with Claude acting as the real supervisor: verifying outputs and deciding when to escalate. Local models on owned hardware handle the common case; Claude escalates through a chain to frontier models only when local quality is insufficient.
## Success criteria
- [ ] Each skill dispatches generation to its configured local model via LiteLLM by default
- [ ] Claude verifies every local output and either accepts or escalates
- [ ] Escalation walks a per-skill chain (local small → local large → Sonnet → Opus) with one attempt per tier
- [ ] Every attempt (model, tier, duration, warm state, verdict) is logged in the session JSONL
- [ ] Cloud tiers (Sonnet/Opus) self-certify — no separate verifier call
- [ ] Zero changes to skill handlers — they call `ExecutorFn` exactly as today
- [ ] `LiteLTMBaseURL` already in config; no new env vars required beyond `LLAMA_SWAP_URL`
## Constraints
- One attempt per tier before escalating (no retry within a tier)
- Anthropic T&C: Claude is called normally via Anthropic API; local models are called directly via LiteLLM HTTP — no API redirection
- `models.yaml` remains the single routing config file
## Out of scope
- Auto-rerouting based on real-time warm state (logged, not acted on — Phase 4)
- Multi-tenant / public service exposure
- RAG/CAG model boosting
- Managed Agent cloud delegation (chain stub only in Phase 3)
---
## Architecture
```
MCP tool call (Claude Code)
Skill handler — calls ExecutorFn (unchanged)
Orchestrator.Run (implements ExecutorFn)
├─ Resolve chain from models.yaml
├─ For each model in chain:
│ ├─ [ollama/*] → LiteLLM executor → generate
│ │ ↓
│ │ Claude verifier (task + output + discipline)
│ │ ├─ accept → return Result (log attempt)
│ │ └─ escalate → next tier (log attempt)
│ │
│ └─ [claude-*] → Claude executor (current) → generate + self-certify
│ └─ return Result (log attempt)
└─ All tiers exhausted → return best attempt with escalation note
```
Claude is always the verifier for local tiers. At cloud tiers, Claude generates and self-certifies — the verifier call is skipped.
---
## Components
### 1. `internal/exec/litellm.go` — LiteLLM executor
Calls `POST /v1/chat/completions` on the configured LiteLLM server. Implements the same `ExecutorFn` signature as the existing claude executor.
```go
type LiteLLMExecutor struct {
BaseURL string
APIKey string
HTTPClient *http.Client
Timeout time.Duration
}
func NewLiteLLM(baseURL, apiKey string, timeout time.Duration) *LiteLLMExecutor
func (e *LiteLLMExecutor) Run(ctx context.Context, req Request) (Result, error)
```
Request mapping:
- `req.SkillPrompt` → system message
- `req.TaskPrompt` → user message
- `req.Model``model` field in the chat completions request
Response handling: local models are prompted (via the discipline file output contract) to return a JSON object matching the `Result` schema. The executor attempts `json.Unmarshal` into `Result` directly — no envelope unwrapping needed (unlike the `--output-format json` claude envelope). If unmarshalling fails, the executor returns an error that the orchestrator treats as an automatic escalation trigger.
### 2. `internal/exec/verifier.go` — Claude verifier
A focused Claude call that judges local model output. Uses the existing `Executor` (claude subprocess) internally.
```go
type Verdict struct {
Accept bool `json:"accept"`
Feedback string `json:"feedback"` // reason if not accepting; empty if accept
}
type Verifier struct {
executor *Executor // the existing claude executor
}
func NewVerifier(executor *Executor) *Verifier
func (v *Verifier) Verify(ctx context.Context, skillPrompt, taskPrompt string, output Result) (Verdict, error)
```
The verifier prompt gives Claude:
1. The skill discipline file (so it knows the iron laws and output contract)
2. The original task prompt (informed verification — Claude sees what was asked)
3. The generated output
4. A short instruction: "Does this output satisfy the discipline's iron laws and output contract? Reply with JSON: `{\"accept\": true|false, \"feedback\": \"...\"}`"
The verifier uses a lightweight JSON schema for its own output (a `Verdict` schema), keeping the call fast.
### 3. `internal/exec/orchestrator.go` — chain walker
Implements `ExecutorFn`. Walks the escalation chain, delegating generation and verification per tier.
```go
type Chain []ChainEntry
type ChainEntry struct {
Model string // e.g. "ollama/phi4", "claude-sonnet-4-5"
Tier string // "local" | "subagent" | "managed"
IsCloud bool // true for claude-* models; skips verifier
}
type Orchestrator struct {
chain Chain
litellm *LiteLLMExecutor
claude *Executor
verifier *Verifier
llamaSwapURL string // for warm-state probe
}
func NewOrchestrator(chain Chain, litellm *LiteLLMExecutor, claude *Executor, verifier *Verifier, llamaSwapURL string) *Orchestrator
func (o *Orchestrator) Run(ctx context.Context, req Request) (Result, error)
```
Algorithm:
```
for each entry in chain:
warm = probe llama-swap (if local tier)
start = now()
if entry.IsCloud:
result, err = claude.Run(ctx, req with entry.Model)
log attempt(model, tier, duration, warm, verified=true)
if err == nil: return result
else:
result, err = litellm.Run(ctx, req with entry.Model)
duration = now() - start
if err != nil:
log attempt(model, tier, duration, warm, verified=false)
continue // automatic escalation on parse/network error
verdict = verifier.Verify(ctx, req.SkillPrompt, req.TaskPrompt, result)
log attempt(model, tier, duration, warm, verified=verdict.Accept)
if verdict.Accept: return result
// inject verifier feedback into next tier's task prompt
req.TaskPrompt = req.TaskPrompt + "\n\nPrior attempt feedback: " + verdict.Feedback
return error("all tiers exhausted")
```
### 4. `internal/config/models.go` — chain parser
Replaces the current single-model resolution with chain parsing.
Updated `models.yaml` format:
```yaml
verifier: claude-sonnet-4-6 # fixed verifier for all local tiers
llama_swap_url: http://koala:8080 # for warm-state probing
default_chain:
- ollama/qwen3-coder-30b-tuned
- claude-sonnet-4-5
skills:
tdd:
chain:
- ollama/qwen3-coder-30b-tuned
- claude-sonnet-4-5
review:
chain:
- ollama/devstral-tuned
- ollama/gemma4
- claude-sonnet-4-5
debug:
chain:
- ollama/deepseek-r1-tuned
- claude-sonnet-4-5
spec:
chain:
- ollama/phi4
- ollama/gemma4
- claude-sonnet-4-5
- claude-opus-4-6
retrospective:
chain:
- ollama/qwen3-coder-30b-tuned
- claude-sonnet-4-5
trainer:
chain:
- ollama/qwen3-coder-30b-tuned
- claude-sonnet-4-5
```
The parser exposes:
```go
func (m *Models) ChainFor(skill string) Chain
func (m *Models) Verifier() string
func (m *Models) LlamaSwapURL() string
```
Caller override (`model` param in MCP tool call) pins the chain to a single entry — one model, no escalation. This preserves the existing override behaviour for power users.
### 5. `internal/session/session.go` — updated `Attempt` struct
```go
type Attempt struct {
Attempt int `json:"attempt"`
Model string `json:"model"`
Tier string `json:"tier"` // local | subagent | managed
DurationMs int64 `json:"duration_ms"`
WarmStart bool `json:"warm_start"` // model was already loaded in llama-swap
Verified bool `json:"verified"`
Verdict string `json:"verdict,omitempty"` // accept | escalate | error
Feedback string `json:"feedback,omitempty"` // verifier feedback on escalation
OutputSummary string `json:"output_summary,omitempty"`
RunnerOutput string `json:"runner_output,omitempty"`
}
```
### 6. `cmd/supervisor/main.go` — one wiring change
```go
// Before:
reg.Register(review.New(review.Config{ExecutorFn: executor.Run, ...}))
// After:
chain := models.ChainFor("review")
orch := exec.NewOrchestrator(chain, litellmExec, claudeExec, verifier, models.LlamaSwapURL())
reg.Register(review.New(review.Config{ExecutorFn: orch.Run, ...}))
```
One orchestrator per skill, sharing the same `litellmExec`, `claudeExec`, and `verifier` instances.
---
## Data flow example: `review` skill call
1. Claude Code calls `review` tool with `files: ["internal/foo.go"]`
2. Skill handler builds task prompt, calls `orch.Run`
3. Orchestrator resolves chain: `[devstral, gemma4, sonnet]`
4. Probes llama-swap: devstral is warm
5. LiteLLM calls devstral → returns JSON result
6. Verifier asks Claude: "does this review satisfy the iron laws?"
7. Claude: `{"accept": false, "feedback": "missing line references for all findings"}`
8. Orchestrator logs attempt #1 (devstral, local, 4200ms, warm, escalate)
9. Injects feedback into task prompt, calls gemma4
10. Verifier: `{"accept": true}`
11. Orchestrator logs attempt #2 (gemma4, local, 6100ms, cold, accept)
12. Returns result to skill handler → MCP response
Session JSONL records both attempts. You can see: devstral was warm but produced weak output; gemma4 was cold but passed.
---
## Observability
Session JSONL is the primary store. Each `Entry.Attempts` slice records the full escalation trail. To analyse across sessions:
```bash
# Which models are escalating most?
jq -r '.attempts[] | select(.verdict == "escalate") | .model' brain/sessions/*.jsonl | sort | uniq -c
# Average latency per model
jq -r '.attempts[] | [.model, .duration_ms] | @tsv' brain/sessions/*.jsonl | awk '{sum[$1]+=$2; n[$1]++} END {for (m in sum) print m, sum[m]/n[m]}'
# Cold start frequency
jq -r '.attempts[] | select(.warm_start == false) | .model' brain/sessions/*.jsonl | sort | uniq -c
```
No new metrics infrastructure needed for Phase 3. Phase 4 can build a dashboard on top of this data.
---
## Error handling
| Scenario | Behaviour |
|----------|-----------|
| LiteLLM unreachable | Log attempt as error, escalate immediately |
| Local model returns unparseable JSON | Log attempt as error, escalate |
| Verifier call fails | Log, treat as escalate (safe default) |
| All tiers exhausted | Return error to skill handler; skill returns MCP error to caller |
| Caller passes `model` override | Single-entry chain, no escalation, no verifier call |
---
## Testing approach
- `TestLiteLLMExecutor`: mock HTTP server returning valid/invalid JSON; verify parse logic and error escalation
- `TestVerifier`: fake claude executor returning accept/escalate verdicts; verify prompt construction
- `TestOrchestrator`: table-driven — chains of 1/2/3 tiers, various accept/escalate/error combinations; verify attempt log contents and final result
- `TestModelsChainFor`: YAML parsing for all skill overrides and default_chain fallback
- Integration smoke test: start real LiteLLM (or mock), call `review` tool via MCP, verify attempt log written
---
## Risks
| Risk | Mitigation |
|------|------------|
| Local models ignore output contract → bad JSON | Discipline files already specify JSON output contract; parse failure auto-escalates |
| Verifier Claude call adds latency to every local attempt | Verifier prompt is small and fast; acceptable tradeoff for quality gate |
| llama-swap warm probe adds overhead | Probe is a single lightweight HTTP GET; timeout at 200ms, treat failure as `warm_start: false` |
| Chain exhaustion leaves caller with no result | Return structured error via MCP; caller can retry with explicit `model` override |

View File

@@ -33,6 +33,8 @@ type queryRequest struct {
type writeRequest struct { type writeRequest struct {
Content string `json:"content"` Content string `json:"content"`
Filename string `json:"filename,omitempty"` Filename string `json:"filename,omitempty"`
Type string `json:"type,omitempty"`
Domain string `json:"domain,omitempty"`
} }
// Query handles POST /query — full-text search across the brain wiki. // Query handles POST /query — full-text search across the brain wiki.
@@ -83,8 +85,22 @@ func (h *Handler) Write(w http.ResponseWriter, r *http.Request) {
return return
} }
finalContent := req.Content
if req.Type != "" || req.Domain != "" {
var fm strings.Builder
fm.WriteString("---\n")
if req.Type != "" {
fmt.Fprintf(&fm, "type: %s\n", req.Type)
}
if req.Domain != "" {
fmt.Fprintf(&fm, "domain: %s\n", req.Domain)
}
fm.WriteString("---\n")
finalContent = fm.String() + req.Content
}
dest := filepath.Join(rawDir, filepath.Base(filename)) dest := filepath.Join(rawDir, filepath.Base(filename))
if err := os.WriteFile(dest, []byte(req.Content), 0o644); err != nil { if err := os.WriteFile(dest, []byte(finalContent), 0o644); err != nil {
h.logger.Error("write failed", "err", err) h.logger.Error("write failed", "err", err)
http.Error(w, "write error", http.StatusInternalServerError) http.Error(w, "write error", http.StatusInternalServerError)
return return

View File

@@ -79,6 +79,27 @@ func TestQuery_RequiresQuery(t *testing.T) {
assert.Equal(t, http.StatusBadRequest, rec.Code) assert.Equal(t, http.StatusBadRequest, rec.Code)
} }
func TestWrite_IncludesFrontmatterWhenTypeProvided(t *testing.T) {
dir, h := setup(t)
body, _ := json.Marshal(map[string]any{
"content": "Some learning.",
"filename": "typed-note.md",
"type": "concept",
"domain": "software",
})
req := httptest.NewRequest(http.MethodPost, "/write", bytes.NewReader(body))
rec := httptest.NewRecorder()
h.Write(rec, req)
assert.Equal(t, http.StatusOK, rec.Code)
content, err := os.ReadFile(filepath.Join(dir, "raw", "typed-note.md"))
require.NoError(t, err)
assert.Contains(t, string(content), "type: concept")
assert.Contains(t, string(content), "domain: software")
assert.Contains(t, string(content), "Some learning.")
}
func TestWrite_GeneratesFilenameIfAbsent(t *testing.T) { func TestWrite_GeneratesFilenameIfAbsent(t *testing.T) {
dir, h := setup(t) dir, h := setup(t)
body, _ := json.Marshal(map[string]any{"content": "auto name"}) body, _ := json.Marshal(map[string]any{"content": "auto name"})

View File

@@ -8,6 +8,9 @@ type Config struct {
LiteLLMAPIKey string // LITELLM_API_KEY LiteLLMAPIKey string // LITELLM_API_KEY
ConfigDir string // SUPERVISOR_CONFIG_DIR, default ./config/supervisor ConfigDir string // SUPERVISOR_CONFIG_DIR, default ./config/supervisor
ModelsFile string // SUPERVISOR_MODELS_FILE, default <ConfigDir>/../models.yaml ModelsFile string // SUPERVISOR_MODELS_FILE, default <ConfigDir>/../models.yaml
IngestBaseURL string // INGEST_BASE_URL, default http://localhost:3300
SessionsDir string // SUPERVISOR_SESSIONS_DIR, default ./brain/sessions
BrainDir string // SUPERVISOR_BRAIN_DIR, default ./brain
} }
func Load() (Config, error) { func Load() (Config, error) {
@@ -18,6 +21,9 @@ func Load() (Config, error) {
ConfigDir: envOr("SUPERVISOR_CONFIG_DIR", "./config/supervisor"), ConfigDir: envOr("SUPERVISOR_CONFIG_DIR", "./config/supervisor"),
} }
cfg.ModelsFile = envOr("SUPERVISOR_MODELS_FILE", cfg.ConfigDir+"/../models.yaml") cfg.ModelsFile = envOr("SUPERVISOR_MODELS_FILE", cfg.ConfigDir+"/../models.yaml")
cfg.IngestBaseURL = envOr("INGEST_BASE_URL", "http://localhost:3300")
cfg.SessionsDir = envOr("SUPERVISOR_SESSIONS_DIR", "./brain/sessions")
cfg.BrainDir = envOr("SUPERVISOR_BRAIN_DIR", "./brain")
return cfg, nil return cfg, nil
} }

View File

@@ -13,12 +13,18 @@ func TestLoadDefaults(t *testing.T) {
t.Setenv("LITELLM_BASE_URL", "") t.Setenv("LITELLM_BASE_URL", "")
t.Setenv("LITELLM_API_KEY", "") t.Setenv("LITELLM_API_KEY", "")
t.Setenv("SUPERVISOR_CONFIG_DIR", "") t.Setenv("SUPERVISOR_CONFIG_DIR", "")
t.Setenv("INGEST_BASE_URL", "")
t.Setenv("SUPERVISOR_SESSIONS_DIR", "")
t.Setenv("SUPERVISOR_BRAIN_DIR", "")
cfg, err := config.Load() cfg, err := config.Load()
require.NoError(t, err) require.NoError(t, err)
assert.Equal(t, "3200", cfg.Port) assert.Equal(t, "3200", cfg.Port)
assert.Equal(t, "http://iguana:4000", cfg.LiteLLMBaseURL) assert.Equal(t, "http://iguana:4000", cfg.LiteLLMBaseURL)
assert.Equal(t, "./config/supervisor", cfg.ConfigDir) assert.Equal(t, "./config/supervisor", cfg.ConfigDir)
assert.Equal(t, "http://localhost:3300", cfg.IngestBaseURL)
assert.Equal(t, "./brain/sessions", cfg.SessionsDir)
assert.Equal(t, "./brain", cfg.BrainDir)
} }
func TestLoadFromEnv(t *testing.T) { func TestLoadFromEnv(t *testing.T) {

View File

@@ -7,9 +7,15 @@ import (
"gopkg.in/yaml.v3" "gopkg.in/yaml.v3"
) )
type skillChain struct {
Chain []string `yaml:"chain"`
}
type modelsFile struct { type modelsFile struct {
Default string `yaml:"default"` Verifier string `yaml:"verifier"`
Skills map[string]string `yaml:"skills"` LlamaSwapURL string `yaml:"llama_swap_url"`
DefaultChain []string `yaml:"default_chain"`
Skills map[string]skillChain `yaml:"skills"`
} }
type Models struct { type Models struct {
@@ -28,16 +34,23 @@ func LoadModels(path string) (Models, error) {
return Models{data: f}, nil return Models{data: f}, nil
} }
// Resolve returns the model for a skill, respecting three-layer priority: // Verifier returns the model name to use for all local-tier output verification.
// 1. override (from MCP call) — highest func (m Models) Verifier() string { return m.data.Verifier }
// 2. per-skill default from models.yaml
// 3. global default // LlamaSwapURL returns the llama-swap base URL for warm-state probing.
func (m Models) Resolve(skill, override string) string { func (m Models) LlamaSwapURL() string { return m.data.LlamaSwapURL }
// ChainFor returns the ordered list of model names for a skill.
// If override is non-empty, returns a single-entry chain (no escalation).
// Falls back to default_chain when the skill has no explicit entry.
func (m Models) ChainFor(skill, override string) []string {
if override != "" { if override != "" {
return override return []string{override}
} }
if model, ok := m.data.Skills[skill]; ok { if sc, ok := m.data.Skills[skill]; ok && len(sc.Chain) > 0 {
return model return sc.Chain
} }
return m.data.Default out := make([]string, len(m.data.DefaultChain))
copy(out, m.data.DefaultChain)
return out
} }

View File

@@ -10,35 +10,71 @@ import (
"github.com/stretchr/testify/require" "github.com/stretchr/testify/require"
) )
func TestModelsResolve(t *testing.T) { const testYAML = `
yaml := ` verifier: claude-sonnet-4-6
default: ollama/default-model llama_swap_url: http://koala:8080
default_chain:
- ollama/qwen3-coder-30b-tuned
- claude-sonnet-4-6
skills: skills:
tdd: ollama/qwen3-coder-30b-tuned review:
review: ollama/devstral-tuned chain:
- ollama/devstral-tuned
- ollama/gemma4
- claude-sonnet-4-6
spec:
chain:
- ollama/phi4
- claude-opus-4-6
` `
func writeModels(t *testing.T, content string) string {
t.Helper()
f := filepath.Join(t.TempDir(), "models.yaml") f := filepath.Join(t.TempDir(), "models.yaml")
require.NoError(t, os.WriteFile(f, []byte(yaml), 0644)) require.NoError(t, os.WriteFile(f, []byte(content), 0644))
return f
m, err := config.LoadModels(f)
require.NoError(t, err)
assert.Equal(t, "ollama/qwen3-coder-30b-tuned", m.Resolve("tdd", ""))
assert.Equal(t, "ollama/devstral-tuned", m.Resolve("review", ""))
assert.Equal(t, "ollama/default-model", m.Resolve("unknown", ""))
} }
func TestModelsOverride(t *testing.T) { func TestModelsVerifier(t *testing.T) {
yaml := ` m, err := config.LoadModels(writeModels(t, testYAML))
default: ollama/default-model require.NoError(t, err)
skills: assert.Equal(t, "claude-sonnet-4-6", m.Verifier())
tdd: ollama/qwen3-coder-30b-tuned }
`
f := filepath.Join(t.TempDir(), "models.yaml")
require.NoError(t, os.WriteFile(f, []byte(yaml), 0644))
m, err := config.LoadModels(f) func TestModelsLlamaSwapURL(t *testing.T) {
m, err := config.LoadModels(writeModels(t, testYAML))
require.NoError(t, err)
assert.Equal(t, "http://koala:8080", m.LlamaSwapURL())
}
func TestModelsChainForSkillOverride(t *testing.T) {
m, err := config.LoadModels(writeModels(t, testYAML))
require.NoError(t, err) require.NoError(t, err)
assert.Equal(t, "anthropic/claude-sonnet-4-6", m.Resolve("tdd", "anthropic/claude-sonnet-4-6")) chain := m.ChainFor("review", "")
require.Len(t, chain, 3)
assert.Equal(t, "ollama/devstral-tuned", chain[0])
assert.Equal(t, "ollama/gemma4", chain[1])
assert.Equal(t, "claude-sonnet-4-6", chain[2])
}
func TestModelsChainForDefaultFallback(t *testing.T) {
m, err := config.LoadModels(writeModels(t, testYAML))
require.NoError(t, err)
chain := m.ChainFor("trainer", "") // not in skills map
require.Len(t, chain, 2)
assert.Equal(t, "ollama/qwen3-coder-30b-tuned", chain[0])
assert.Equal(t, "claude-sonnet-4-6", chain[1])
}
func TestModelsChainForCallerOverride(t *testing.T) {
m, err := config.LoadModels(writeModels(t, testYAML))
require.NoError(t, err)
chain := m.ChainFor("review", "claude-opus-4-6")
require.Len(t, chain, 1)
assert.Equal(t, "claude-opus-4-6", chain[0])
} }

View File

@@ -68,13 +68,15 @@ func (e *Executor) Run(ctx context.Context, req Request) (Result, error) {
args := []string{ args := []string{
"--print", "--print",
"--bare",
"--permission-mode", "bypassPermissions", "--permission-mode", "bypassPermissions",
"--tools", tools, "--tools", tools,
"--json-schema", Schema, "--json-schema", Schema,
"--output-format", "text", "--output-format", "json",
prompt,
} }
if strings.HasPrefix(req.Model, "claude-") {
args = append(args, "--model", req.Model)
}
args = append(args, prompt)
cmd := exec.CommandContext(ctx, e.cfg.ClaudeBinary, args...) cmd := exec.CommandContext(ctx, e.cfg.ClaudeBinary, args...)
cmd.Env = append(os.Environ(), "LITELLM_API_KEY="+e.cfg.LiteLLMAPIKey) cmd.Env = append(os.Environ(), "LITELLM_API_KEY="+e.cfg.LiteLLMAPIKey)
@@ -89,12 +91,21 @@ func (e *Executor) Run(ctx context.Context, req Request) (Result, error) {
return Result{}, fmt.Errorf("claude exited with error: %w — stderr: %s", err, stderr.String()) return Result{}, fmt.Errorf("claude exited with error: %w — stderr: %s", err, stderr.String())
} }
var r Result // --output-format json wraps the response in an envelope; structured output
if err := json.Unmarshal(stdout.Bytes(), &r); err != nil { // from --json-schema is in the "structured_output" field.
return Result{}, fmt.Errorf("parse result JSON: %w — raw output: %s", err, stdout.String()) var envelope struct {
StructuredOutput *Result `json:"structured_output"`
IsError bool `json:"is_error"`
Result string `json:"result"` // fallback text result for error messages
} }
if err := r.Validate(); err != nil { if err := json.Unmarshal(stdout.Bytes(), &envelope); err != nil {
return Result{}, fmt.Errorf("parse envelope JSON: %w — raw: %s — stderr: %s", err, stdout.String(), stderr.String())
}
if envelope.StructuredOutput == nil {
return Result{}, fmt.Errorf("no structured_output in response — result: %s — stderr: %s", envelope.Result, stderr.String())
}
if err := envelope.StructuredOutput.Validate(); err != nil {
return Result{}, fmt.Errorf("invalid result: %w", err) return Result{}, fmt.Errorf("invalid result: %w", err)
} }
return r, nil return *envelope.StructuredOutput, nil
} }

View File

@@ -28,8 +28,10 @@ func fakeClaudePath(t *testing.T, output string, exitCode int) string {
} }
func TestExecutorParsesValidResult(t *testing.T) { func TestExecutorParsesValidResult(t *testing.T) {
validJSON := `{"status":"pass","phase":"red","skill":"tdd","file_path":"/tmp/x_test.go","runner_output":"FAIL","verified":true,"model_used":"self","message":"ok"}` // Fake claude emits the --output-format json envelope that the real CLI produces.
claude := fakeClaudePath(t, validJSON, 0) // The executor extracts the result from the "structured_output" field.
envelope := `{"type":"result","subtype":"success","is_error":false,"structured_output":{"status":"pass","phase":"red","skill":"tdd","file_path":"/tmp/x_test.go","runner_output":"FAIL","verified":true,"model_used":"self","message":"ok"}}`
claude := fakeClaudePath(t, envelope, 0)
ex := iexec.New(iexec.Config{ ex := iexec.New(iexec.Config{
ClaudeBinary: claude, ClaudeBinary: claude,
@@ -73,3 +75,58 @@ func TestExecutorTimesOut(t *testing.T) {
_, err := ex.Run(context.Background(), iexec.Request{TaskPrompt: "slow"}) _, err := ex.Run(context.Background(), iexec.Request{TaskPrompt: "slow"})
assert.ErrorContains(t, err, "timeout") assert.ErrorContains(t, err, "timeout")
} }
func TestExecutorPassesModelFlagForCloudModel(t *testing.T) {
// The script captures its args to a temp file so we can assert --model was passed.
argsFile := filepath.Join(t.TempDir(), "args.txt")
envelope := `{"type":"result","subtype":"success","is_error":false,"structured_output":{"status":"pass","phase":"review","skill":"review","file_path":"","runner_output":"","verified":true,"model_used":"claude-sonnet-4-6","message":"ok"}}`
dir := t.TempDir()
script := filepath.Join(dir, "claude")
content := "#!/bin/sh\necho \"$@\" > " + argsFile + "\necho '" + envelope + "'\n"
require.NoError(t, os.WriteFile(script, []byte(content), 0755))
ex := iexec.New(iexec.Config{
ClaudeBinary: script,
SystemPrompt: "sys",
Timeout: 5 * time.Second,
})
_, err := ex.Run(context.Background(), iexec.Request{
SkillPrompt: "review rules",
TaskPrompt: "do review",
Model: "claude-sonnet-4-6",
})
require.NoError(t, err)
argsData, err := os.ReadFile(argsFile)
require.NoError(t, err)
assert.Contains(t, string(argsData), "--model claude-sonnet-4-6")
}
func TestExecutorSkipsModelFlagForLocalModel(t *testing.T) {
argsFile := filepath.Join(t.TempDir(), "args.txt")
envelope := `{"type":"result","subtype":"success","is_error":false,"structured_output":{"status":"pass","phase":"review","skill":"review","file_path":"","runner_output":"","verified":true,"model_used":"ollama/devstral","message":"ok"}}`
dir := t.TempDir()
script := filepath.Join(dir, "claude")
content := "#!/bin/sh\necho \"$@\" > " + argsFile + "\necho '" + envelope + "'\n"
require.NoError(t, os.WriteFile(script, []byte(content), 0755))
ex := iexec.New(iexec.Config{
ClaudeBinary: script,
SystemPrompt: "sys",
Timeout: 5 * time.Second,
})
_, err := ex.Run(context.Background(), iexec.Request{
SkillPrompt: "review rules",
TaskPrompt: "do review",
Model: "ollama/devstral",
})
require.NoError(t, err)
argsData, err := os.ReadFile(argsFile)
require.NoError(t, err)
assert.NotContains(t, string(argsData), "--model")
}

103
internal/exec/litellm.go Normal file
View File

@@ -0,0 +1,103 @@
package exec
import (
"bytes"
"context"
"encoding/json"
"fmt"
"net/http"
"time"
)
// LiteLLMExecutor calls a LiteLLM-compatible /v1/chat/completions endpoint.
// Local models are expected to return a JSON object matching the Result schema
// as their response content — no envelope.
type LiteLLMExecutor struct {
baseURL string
apiKey string
httpClient *http.Client
}
// NewLiteLLM creates a LiteLLMExecutor.
// timeout applies to the full HTTP round-trip per call.
func NewLiteLLM(baseURL, apiKey string, timeout time.Duration) *LiteLLMExecutor {
return &LiteLLMExecutor{
baseURL: baseURL,
apiKey: apiKey,
httpClient: &http.Client{Timeout: timeout},
}
}
type litellmMessage struct {
Role string `json:"role"`
Content string `json:"content"`
}
type litellmRequest struct {
Model string `json:"model"`
Messages []litellmMessage `json:"messages"`
}
type litellmChoice struct {
Message litellmMessage `json:"message"`
}
type litellmResponse struct {
Choices []litellmChoice `json:"choices"`
}
// Run dispatches req to the LiteLLM server and parses the Result from the
// assistant message content. Returns an error on network failure, non-200
// status, or unparseable/invalid JSON — all of which the Orchestrator treats
// as automatic escalation triggers.
func (e *LiteLLMExecutor) Run(ctx context.Context, req Request) (Result, error) {
body := litellmRequest{
Model: req.Model,
Messages: []litellmMessage{
{Role: "system", Content: req.SkillPrompt},
{Role: "user", Content: req.TaskPrompt},
},
}
bodyBytes, err := json.Marshal(body)
if err != nil {
return Result{}, fmt.Errorf("litellm: marshal request: %w", err)
}
httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, e.baseURL+"/v1/chat/completions", bytes.NewReader(bodyBytes))
if err != nil {
return Result{}, fmt.Errorf("litellm: create request: %w", err)
}
httpReq.Header.Set("Content-Type", "application/json")
if e.apiKey != "" {
httpReq.Header.Set("Authorization", "Bearer "+e.apiKey)
}
resp, err := e.httpClient.Do(httpReq)
if err != nil {
return Result{}, fmt.Errorf("litellm: request failed: %w", err)
}
defer resp.Body.Close() //nolint:errcheck
if resp.StatusCode != http.StatusOK {
return Result{}, fmt.Errorf("litellm: server returned status %d", resp.StatusCode)
}
var chatResp litellmResponse
if err := json.NewDecoder(resp.Body).Decode(&chatResp); err != nil {
return Result{}, fmt.Errorf("litellm: decode response: %w", err)
}
if len(chatResp.Choices) == 0 {
return Result{}, fmt.Errorf("litellm: no choices in response")
}
content := chatResp.Choices[0].Message.Content
var result Result
if err := json.Unmarshal([]byte(content), &result); err != nil {
return Result{}, fmt.Errorf("litellm: parse result JSON: %w — content: %s", err, content)
}
if err := result.Validate(); err != nil {
return Result{}, fmt.Errorf("litellm: invalid result: %w", err)
}
return result, nil
}

View File

@@ -0,0 +1,112 @@
package exec_test
import (
"context"
"encoding/json"
"net/http"
"net/http/httptest"
"testing"
"time"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func validLiteLLMResult() iexec.Result {
return iexec.Result{
Status: "pass",
Phase: "review",
Skill: "review",
ModelUsed: "ollama/devstral",
Message: "looks good",
}
}
func chatResponseFor(t *testing.T, result iexec.Result) []byte {
t.Helper()
content, err := json.Marshal(result)
require.NoError(t, err)
resp := map[string]any{
"choices": []map[string]any{
{"message": map[string]any{"role": "assistant", "content": string(content)}},
},
}
data, err := json.Marshal(resp)
require.NoError(t, err)
return data
}
func TestLiteLLMParsesValidResult(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
assert.Equal(t, "/v1/chat/completions", r.URL.Path)
assert.Equal(t, "application/json", r.Header.Get("Content-Type"))
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusOK)
_, _ = w.Write(chatResponseFor(t, validLiteLLMResult()))
}))
defer srv.Close()
ex := iexec.NewLiteLLM(srv.URL, "", 5*time.Second)
result, err := ex.Run(context.Background(), iexec.Request{
SkillPrompt: "review rules",
TaskPrompt: "review the code",
Model: "ollama/devstral",
})
require.NoError(t, err)
assert.Equal(t, "pass", result.Status)
assert.Equal(t, "review", result.Skill)
}
func TestLiteLLMSendsAuthHeader(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
assert.Equal(t, "Bearer secret", r.Header.Get("Authorization"))
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusOK)
_, _ = w.Write(chatResponseFor(t, validLiteLLMResult()))
}))
defer srv.Close()
ex := iexec.NewLiteLLM(srv.URL, "secret", 5*time.Second)
_, err := ex.Run(context.Background(), iexec.Request{Model: "x", TaskPrompt: "t", SkillPrompt: "s"})
require.NoError(t, err)
}
func TestLiteLLMErrorOnNonOKStatus(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusServiceUnavailable)
}))
defer srv.Close()
ex := iexec.NewLiteLLM(srv.URL, "", 5*time.Second)
_, err := ex.Run(context.Background(), iexec.Request{Model: "x", TaskPrompt: "t"})
assert.ErrorContains(t, err, "503")
}
func TestLiteLLMErrorOnUnparsableJSON(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusOK)
resp := map[string]any{
"choices": []map[string]any{
{"message": map[string]any{"role": "assistant", "content": "not json at all"}},
},
}
data, _ := json.Marshal(resp)
_, _ = w.Write(data)
}))
defer srv.Close()
ex := iexec.NewLiteLLM(srv.URL, "", 5*time.Second)
_, err := ex.Run(context.Background(), iexec.Request{Model: "x", TaskPrompt: "t"})
assert.Error(t, err)
}
func TestLiteLLMRespectsContextCancellation(t *testing.T) {
ctx, cancel := context.WithCancel(context.Background())
cancel() // Cancel immediately
ex := iexec.NewLiteLLM("http://invalid.example.com", "", 1*time.Second)
_, err := ex.Run(ctx, iexec.Request{Model: "x", TaskPrompt: "t"})
assert.Error(t, err)
}

View File

@@ -0,0 +1,197 @@
package exec
import (
"context"
"fmt"
"io"
"net/http"
"strings"
"time"
)
// ChainEntry is one tier in an escalation chain.
type ChainEntry struct {
Model string // e.g. "ollama/phi4", "claude-sonnet-4-6"
Tier string // "local" | "subagent" | "managed"
IsCloud bool // true for claude-* models; skips verifier call
}
// EntryFor builds a ChainEntry from a model name string.
func EntryFor(model string) ChainEntry {
cloud := strings.HasPrefix(model, "claude-")
tier := "local"
if cloud {
tier = "subagent"
}
return ChainEntry{Model: model, Tier: tier, IsCloud: cloud}
}
// AttemptRecord captures the outcome of one tier attempt for session logging.
type AttemptRecord struct {
Model string
Tier string
DurationMs int64
WarmStart bool
Verdict string // "accept" | "escalate" | "error"
Feedback string
}
// VerifierFn is the interface the orchestrator uses to verify local output.
type VerifierFn interface {
Verify(ctx context.Context, skillPrompt, taskPrompt string, output Result) (Verdict, error)
}
// ExecutorRunFn is the signature of Executor.Run and LiteLLMExecutor.Run.
type ExecutorRunFn func(ctx context.Context, req Request) (Result, error)
// Orchestrator walks an escalation chain, delegating generation and verification.
// It implements the ExecutorFn shape expected by skill handlers.
type Orchestrator struct {
chain []ChainEntry
localRun ExecutorRunFn // for local (non-cloud) tiers; may be nil
cloudRun ExecutorRunFn // for cloud tiers; may be nil
verifier VerifierFn
llamaSwapURL string
attempts *[]AttemptRecord
}
// NewOrchestrator creates an Orchestrator.
// attempts is a pointer to a slice that will be appended to on each tier attempt.
// Pass nil for localRun or cloudRun if no tiers of that type exist in the chain.
func NewOrchestrator(
chain []ChainEntry,
localRun ExecutorRunFn,
cloudRun ExecutorRunFn,
verifier VerifierFn,
llamaSwapURL string,
attempts *[]AttemptRecord,
) *Orchestrator {
return &Orchestrator{
chain: chain,
localRun: localRun,
cloudRun: cloudRun,
verifier: verifier,
llamaSwapURL: llamaSwapURL,
attempts: attempts,
}
}
// Run walks the escalation chain and returns the first accepted result.
// Satisfies the ExecutorFn signature: func(context.Context, Request) (Result, error).
func (o *Orchestrator) Run(ctx context.Context, req Request) (Result, error) {
taskPrompt := req.TaskPrompt
for _, entry := range o.chain {
warm := o.probeWarm(entry.Model)
start := time.Now()
tierReq := req
tierReq.Model = entry.Model
tierReq.TaskPrompt = taskPrompt
if entry.IsCloud {
result, genErr := o.cloudRun(ctx, tierReq)
dur := time.Since(start).Milliseconds()
verdict := "accept"
if genErr != nil {
verdict = "error"
}
o.appendAttempt(AttemptRecord{
Model: entry.Model,
Tier: entry.Tier,
DurationMs: dur,
WarmStart: warm,
Verdict: verdict,
})
if genErr == nil {
return result, nil
}
continue
}
// Local tier.
result, genErr := o.localRun(ctx, tierReq)
dur := time.Since(start).Milliseconds()
if genErr != nil {
o.appendAttempt(AttemptRecord{
Model: entry.Model,
Tier: entry.Tier,
DurationMs: dur,
WarmStart: warm,
Verdict: "error",
Feedback: genErr.Error(),
})
continue
}
verdict, verErr := o.verifier.Verify(ctx, req.SkillPrompt, taskPrompt, result)
if verErr != nil {
// Treat verifier failure as escalate (safe default).
o.appendAttempt(AttemptRecord{
Model: entry.Model,
Tier: entry.Tier,
DurationMs: dur,
WarmStart: warm,
Verdict: "escalate",
Feedback: "verifier error: " + verErr.Error(),
})
continue
}
if verdict.Accept {
o.appendAttempt(AttemptRecord{
Model: entry.Model,
Tier: entry.Tier,
DurationMs: dur,
WarmStart: warm,
Verdict: "accept",
})
return result, nil
}
o.appendAttempt(AttemptRecord{
Model: entry.Model,
Tier: entry.Tier,
DurationMs: dur,
WarmStart: warm,
Verdict: "escalate",
Feedback: verdict.Feedback,
})
// Inject verifier feedback into the next tier's task prompt.
taskPrompt = taskPrompt + "\n\nPrior attempt feedback: " + verdict.Feedback
}
return Result{}, fmt.Errorf("all tiers exhausted after %d attempt(s)", len(o.chain))
}
func (o *Orchestrator) appendAttempt(rec AttemptRecord) {
if o.attempts != nil {
*o.attempts = append(*o.attempts, rec)
}
}
// probeWarm checks whether the model is currently loaded in llama-swap.
// Returns false on any error or if llamaSwapURL is empty.
func (o *Orchestrator) probeWarm(model string) bool {
if o.llamaSwapURL == "" {
return false
}
ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
defer cancel()
req, err := http.NewRequestWithContext(ctx, http.MethodGet, o.llamaSwapURL+"/v1/models", nil)
if err != nil {
return false
}
resp, err := http.DefaultClient.Do(req)
if err != nil {
return false
}
defer resp.Body.Close() //nolint:errcheck
body, err := io.ReadAll(resp.Body)
if err != nil {
return false
}
return strings.Contains(string(body), model)
}

View File

@@ -0,0 +1,151 @@
package exec_test
import (
"context"
"errors"
"testing"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// stubRunFn returns preset results sequentially.
type stubRunFn struct {
calls []stubCall
callIdx int
}
type stubCall struct {
result iexec.Result
err error
}
func (s *stubRunFn) Run(_ context.Context, _ iexec.Request) (iexec.Result, error) {
if s.callIdx >= len(s.calls) {
return iexec.Result{}, errors.New("unexpected call")
}
c := s.calls[s.callIdx]
s.callIdx++
return c.result, c.err
}
// stubVerifier returns preset verdicts sequentially.
type stubVerifier struct {
verdicts []iexec.Verdict
idx int
}
func (s *stubVerifier) Verify(_ context.Context, _, _ string, _ iexec.Result) (iexec.Verdict, error) {
if s.idx >= len(s.verdicts) {
return iexec.Verdict{}, errors.New("unexpected verify call")
}
v := s.verdicts[s.idx]
s.idx++
return v, nil
}
func okResult(skill string) iexec.Result {
return iexec.Result{Status: "pass", Phase: "review", Skill: skill, Message: "ok", ModelUsed: "m"}
}
func TestOrchestratorSingleLocalAccept(t *testing.T) {
local := &stubRunFn{calls: []stubCall{{result: okResult("review")}}}
verifier := &stubVerifier{verdicts: []iexec.Verdict{{Accept: true}}}
var attempts []iexec.AttemptRecord
orch := iexec.NewOrchestrator(
[]iexec.ChainEntry{{Model: "ollama/devstral", Tier: "local", IsCloud: false}},
local.Run, nil, verifier, "", &attempts,
)
result, err := orch.Run(context.Background(), iexec.Request{TaskPrompt: "review"})
require.NoError(t, err)
assert.Equal(t, "pass", result.Status)
require.Len(t, attempts, 1)
assert.Equal(t, "local", attempts[0].Tier)
assert.Equal(t, "accept", attempts[0].Verdict)
}
func TestOrchestratorEscalatesOnVerifierReject(t *testing.T) {
local := &stubRunFn{calls: []stubCall{
{result: iexec.Result{Status: "fail", Phase: "review", Skill: "review", Message: "weak"}},
{result: okResult("review")},
}}
verifier := &stubVerifier{verdicts: []iexec.Verdict{
{Accept: false, Feedback: "missing line refs"},
{Accept: true},
}}
var attempts []iexec.AttemptRecord
orch := iexec.NewOrchestrator(
[]iexec.ChainEntry{
{Model: "ollama/devstral", Tier: "local", IsCloud: false},
{Model: "ollama/gemma4", Tier: "local", IsCloud: false},
},
local.Run, nil, verifier, "", &attempts,
)
result, err := orch.Run(context.Background(), iexec.Request{TaskPrompt: "review"})
require.NoError(t, err)
assert.Equal(t, "pass", result.Status)
require.Len(t, attempts, 2)
assert.Equal(t, "escalate", attempts[0].Verdict)
assert.Equal(t, "missing line refs", attempts[0].Feedback)
assert.Equal(t, "accept", attempts[1].Verdict)
}
func TestOrchestratorEscalatesOnLocalError(t *testing.T) {
local := &stubRunFn{calls: []stubCall{
{err: errors.New("network failure")},
{result: okResult("review")},
}}
verifier := &stubVerifier{verdicts: []iexec.Verdict{{Accept: true}}}
var attempts []iexec.AttemptRecord
orch := iexec.NewOrchestrator(
[]iexec.ChainEntry{
{Model: "ollama/devstral", Tier: "local", IsCloud: false},
{Model: "ollama/gemma4", Tier: "local", IsCloud: false},
},
local.Run, nil, verifier, "", &attempts,
)
_, err := orch.Run(context.Background(), iexec.Request{TaskPrompt: "review"})
require.NoError(t, err)
require.Len(t, attempts, 2)
assert.Equal(t, "error", attempts[0].Verdict)
assert.Equal(t, "accept", attempts[1].Verdict)
}
func TestOrchestratorCloudTierSelfCertifies(t *testing.T) {
cloud := &stubRunFn{calls: []stubCall{{result: okResult("review")}}}
verifier := &stubVerifier{} // no verdicts — must not be called
var attempts []iexec.AttemptRecord
orch := iexec.NewOrchestrator(
[]iexec.ChainEntry{{Model: "claude-sonnet-4-6", Tier: "subagent", IsCloud: true}},
nil, cloud.Run, verifier, "", &attempts,
)
result, err := orch.Run(context.Background(), iexec.Request{TaskPrompt: "review"})
require.NoError(t, err)
assert.Equal(t, "pass", result.Status)
require.Len(t, attempts, 1)
assert.Equal(t, "subagent", attempts[0].Tier)
assert.Equal(t, "accept", attempts[0].Verdict)
assert.Equal(t, 0, verifier.idx) // verifier never called
}
func TestOrchestratorAllTiersExhausted(t *testing.T) {
local := &stubRunFn{calls: []stubCall{{err: errors.New("unavailable")}}}
var attempts []iexec.AttemptRecord
orch := iexec.NewOrchestrator(
[]iexec.ChainEntry{{Model: "ollama/devstral", Tier: "local", IsCloud: false}},
local.Run, nil, &stubVerifier{}, "", &attempts,
)
_, err := orch.Run(context.Background(), iexec.Request{TaskPrompt: "review"})
assert.ErrorContains(t, err, "all tiers exhausted")
}

View File

@@ -10,7 +10,7 @@ import (
// validates its own output before returning. // validates its own output before returning.
type Result struct { type Result struct {
Status string `json:"status"` // pass | fail | error Status string `json:"status"` // pass | fail | error
Phase string `json:"phase"` // red | green | refactor Phase string `json:"phase"` // red | green | refactor | retrospective | review | debug | spec | trainer
Skill string `json:"skill"` // tdd | review | ... Skill string `json:"skill"` // tdd | review | ...
FilePath string `json:"file_path"` // absolute path to generated file FilePath string `json:"file_path"` // absolute path to generated file
RunnerOutput string `json:"runner_output"` // raw stdout+stderr from test runner RunnerOutput string `json:"runner_output"` // raw stdout+stderr from test runner
@@ -25,6 +25,10 @@ var validPhases = map[string]bool{
"green": true, "green": true,
"refactor": true, "refactor": true,
"retrospective": true, "retrospective": true,
"review": true,
"debug": true,
"spec": true,
"trainer": true,
} }
func (r Result) Validate() error { func (r Result) Validate() error {
@@ -33,7 +37,7 @@ func (r Result) Validate() error {
errs = append(errs, "status must be pass|fail|error, got: "+r.Status) errs = append(errs, "status must be pass|fail|error, got: "+r.Status)
} }
if !validPhases[r.Phase] { if !validPhases[r.Phase] {
errs = append(errs, "phase must be red|green|refactor, got: "+r.Phase) errs = append(errs, "phase must be one of red|green|refactor|retrospective|review|debug|spec|trainer, got: "+r.Phase)
} }
if r.Skill == "" { if r.Skill == "" {
errs = append(errs, "skill is required") errs = append(errs, "skill is required")
@@ -50,7 +54,7 @@ const Schema = `{
"required": ["status","phase","skill","file_path","runner_output","verified","model_used","message"], "required": ["status","phase","skill","file_path","runner_output","verified","model_used","message"],
"properties": { "properties": {
"status": {"type": "string", "enum": ["pass","fail","error"]}, "status": {"type": "string", "enum": ["pass","fail","error"]},
"phase": {"type": "string", "enum": ["red","green","refactor"]}, "phase": {"type": "string"},
"skill": {"type": "string"}, "skill": {"type": "string"},
"file_path": {"type": "string"}, "file_path": {"type": "string"},
"runner_output": {"type": "string"}, "runner_output": {"type": "string"},

View File

@@ -69,3 +69,11 @@ func TestResultValidation(t *testing.T) {
}) })
} }
} }
func TestValidateAcceptsAllPhases(t *testing.T) {
phases := []string{"red", "green", "refactor", "retrospective", "review", "debug", "spec", "trainer"}
for _, phase := range phases {
r := exec.Result{Status: "pass", Phase: phase, Skill: "test", ModelUsed: "self", Message: "ok"}
assert.NoError(t, r.Validate(), "phase %q should be valid", phase)
}
}

99
internal/exec/verifier.go Normal file
View File

@@ -0,0 +1,99 @@
package exec
import (
"bytes"
"context"
"encoding/json"
"fmt"
"os"
"os/exec"
"time"
)
// Verdict is the output of a Claude verification call.
type Verdict struct {
Accept bool `json:"accept"`
Feedback string `json:"feedback"` // empty when Accept is true
}
// Verifier runs a focused Claude call to judge local model output.
type Verifier struct {
claudeBinary string
model string
timeout time.Duration
}
// NewVerifier creates a Verifier that calls claude with the given binary path and model.
// Empty claudeBinary defaults to "claude". Zero timeout defaults to 30s.
func NewVerifier(claudeBinary, model string, timeout time.Duration) *Verifier {
if claudeBinary == "" {
claudeBinary = "claude"
}
if timeout == 0 {
timeout = 30 * time.Second
}
return &Verifier{
claudeBinary: claudeBinary,
model: model,
timeout: timeout,
}
}
// Verify asks Claude whether output satisfies the skill discipline's iron laws.
// Returns Verdict{Accept: true} to accept or Verdict{Accept: false, Feedback: "..."}
// to escalate. Returns an error on subprocess failure or unparseable response.
func (v *Verifier) Verify(ctx context.Context, skillPrompt, taskPrompt string, output Result) (Verdict, error) {
ctx, cancel := context.WithTimeout(ctx, v.timeout)
defer cancel()
outputJSON, err := json.Marshal(output)
if err != nil {
return Verdict{}, fmt.Errorf("verifier: marshal output: %w", err)
}
prompt := fmt.Sprintf(`You are a quality verifier for an AI supervisor system.
Given the skill discipline, the original task, and the generated output, decide whether the output satisfies the discipline's iron laws and output contract.
Reply with JSON only — no other text:
{"accept": true, "feedback": ""}
or
{"accept": false, "feedback": "<one sentence reason>"}
## Skill discipline
%s
## Original task
%s
## Generated output
%s`, skillPrompt, taskPrompt, string(outputJSON))
args := []string{
"--print",
"--permission-mode", "bypassPermissions",
}
if v.model != "" {
args = append(args, "--model", v.model)
}
args = append(args, prompt)
cmd := exec.CommandContext(ctx, v.claudeBinary, args...)
cmd.Env = os.Environ()
var stdout, stderr bytes.Buffer
cmd.Stdout = &stdout
cmd.Stderr = &stderr
if err := cmd.Run(); err != nil {
if ctx.Err() != nil {
return Verdict{}, fmt.Errorf("verifier: timeout after %s", v.timeout)
}
return Verdict{}, fmt.Errorf("verifier: claude exited with error: %w — stderr: %s", err, stderr.String())
}
var verdict Verdict
if err := json.Unmarshal(bytes.TrimSpace(stdout.Bytes()), &verdict); err != nil {
return Verdict{}, fmt.Errorf("verifier: parse verdict JSON: %w — raw: %s", err, stdout.String())
}
return verdict, nil
}

View File

@@ -0,0 +1,74 @@
package exec_test
import (
"context"
"encoding/json"
"fmt"
"os"
"path/filepath"
"testing"
"time"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func fakeVerifierClaude(t *testing.T, verdict iexec.Verdict) string {
t.Helper()
data, err := json.Marshal(verdict)
require.NoError(t, err)
dir := t.TempDir()
script := filepath.Join(dir, "claude")
content := fmt.Sprintf("#!/bin/sh\necho '%s'\n", string(data))
require.NoError(t, os.WriteFile(script, []byte(content), 0755))
return script
}
func TestVerifierAccepts(t *testing.T) {
claude := fakeVerifierClaude(t, iexec.Verdict{Accept: true, Feedback: ""})
v := iexec.NewVerifier(claude, "claude-sonnet-4-6", 5*time.Second)
verdict, err := v.Verify(context.Background(), "skill rules", "do the task", iexec.Result{
Status: "pass", Phase: "review", Skill: "review", Message: "ok",
})
require.NoError(t, err)
assert.True(t, verdict.Accept)
assert.Empty(t, verdict.Feedback)
}
func TestVerifierEscalates(t *testing.T) {
claude := fakeVerifierClaude(t, iexec.Verdict{Accept: false, Feedback: "missing line references"})
v := iexec.NewVerifier(claude, "claude-sonnet-4-6", 5*time.Second)
verdict, err := v.Verify(context.Background(), "skill rules", "do the task", iexec.Result{
Status: "pass", Phase: "review", Skill: "review", Message: "incomplete",
})
require.NoError(t, err)
assert.False(t, verdict.Accept)
assert.Equal(t, "missing line references", verdict.Feedback)
}
func TestVerifierErrorOnUnparsableOutput(t *testing.T) {
dir := t.TempDir()
script := filepath.Join(dir, "claude")
require.NoError(t, os.WriteFile(script, []byte("#!/bin/sh\necho 'not json'\n"), 0755))
v := iexec.NewVerifier(script, "claude-sonnet-4-6", 5*time.Second)
_, err := v.Verify(context.Background(), "rules", "task", iexec.Result{
Status: "pass", Phase: "review", Skill: "review", Message: "ok",
})
assert.Error(t, err)
}
func TestVerifierErrorOnNonZeroExit(t *testing.T) {
dir := t.TempDir()
script := filepath.Join(dir, "claude")
require.NoError(t, os.WriteFile(script, []byte("#!/bin/sh\nexit 1\n"), 0755))
v := iexec.NewVerifier(script, "claude-sonnet-4-6", 5*time.Second)
_, err := v.Verify(context.Background(), "rules", "task", iexec.Result{
Status: "pass", Phase: "review", Skill: "review", Message: "ok",
})
assert.Error(t, err)
}

View File

@@ -80,7 +80,7 @@ func (s *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
} }
w.Header().Set("Content-Type", "application/json") w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(response{ _ = json.NewEncoder(w).Encode(response{
JSONRPC: "2.0", JSONRPC: "2.0",
ID: req.ID, ID: req.ID,
Result: result, Result: result,
@@ -90,7 +90,7 @@ func (s *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
func writeError(w http.ResponseWriter, id any, code int, msg string) { func writeError(w http.ResponseWriter, id any, code int, msg string) {
w.Header().Set("Content-Type", "application/json") w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(response{ _ = json.NewEncoder(w).Encode(response{
JSONRPC: "2.0", JSONRPC: "2.0",
ID: id, ID: id,
Error: &rpcError{Code: code, Message: msg}, Error: &rpcError{Code: code, Message: msg},

View File

@@ -0,0 +1,38 @@
// internal/session/history.go
package session
import (
"fmt"
"strings"
)
// FormatHistory formats prior session entries as a structured block for
// injection into a worker task prompt. Entries matching excludePhase are
// omitted (pass the current phase to avoid circular injection).
func FormatHistory(entries []Entry, excludePhase string) string {
var filtered []Entry
for _, e := range entries {
if e.Phase != excludePhase {
filtered = append(filtered, e)
}
}
if len(filtered) == 0 {
return ""
}
var b strings.Builder
b.WriteString("## Session history\n\n")
for _, e := range filtered {
fmt.Fprintf(&b, "### Phase: %s\n", e.Phase) //nolint:errcheck // strings.Builder never errors
fmt.Fprintf(&b, "- Skill: %s\n", e.Skill) //nolint:errcheck
fmt.Fprintf(&b, "- Status: %s\n", e.FinalStatus) //nolint:errcheck
if e.FilePath != "" {
fmt.Fprintf(&b, "- File: %s\n", e.FilePath) //nolint:errcheck
}
if e.Message != "" {
fmt.Fprintf(&b, "- Summary: %s\n", e.Message) //nolint:errcheck
}
b.WriteString("\n")
}
return b.String()
}

View File

@@ -0,0 +1,41 @@
// internal/session/history_test.go
package session_test
import (
"testing"
"time"
"github.com/mathiasbq/supervisor/internal/session"
"github.com/stretchr/testify/assert"
)
func TestFormatHistoryEmpty(t *testing.T) {
result := session.FormatHistory(nil, "")
assert.Equal(t, "", result)
}
func TestFormatHistoryFormatsEntries(t *testing.T) {
entries := []session.Entry{
{
Skill: "tdd", Phase: "red", FinalStatus: "pass",
FilePath: "internal/foo/foo_test.go",
Message: "wrote failing test for Foo",
Timestamp: time.Now(),
},
}
result := session.FormatHistory(entries, "")
assert.Contains(t, result, "## Session history")
assert.Contains(t, result, "Phase: red")
assert.Contains(t, result, "wrote failing test for Foo")
assert.Contains(t, result, "internal/foo/foo_test.go")
}
func TestFormatHistoryExcludesCurrentPhase(t *testing.T) {
entries := []session.Entry{
{Skill: "tdd", Phase: "red", Message: "red done", FinalStatus: "pass"},
{Skill: "tdd", Phase: "green", Message: "green done", FinalStatus: "pass"},
}
result := session.FormatHistory(entries, "green")
assert.Contains(t, result, "red done")
assert.NotContains(t, result, "green done")
}

View File

@@ -32,9 +32,14 @@ type Entry struct {
type Attempt struct { type Attempt struct {
Attempt int `json:"attempt"` Attempt int `json:"attempt"`
Model string `json:"model"` Model string `json:"model"`
Tier string `json:"tier"` // local | subagent | managed
DurationMs int64 `json:"duration_ms"`
WarmStart bool `json:"warm_start"` // model already loaded in llama-swap
Verified bool `json:"verified"`
Verdict string `json:"verdict,omitempty"` // accept | escalate | error
Feedback string `json:"feedback,omitempty"` // verifier feedback on escalation
OutputSummary string `json:"output_summary,omitempty"` OutputSummary string `json:"output_summary,omitempty"`
RunnerOutput string `json:"runner_output,omitempty"` RunnerOutput string `json:"runner_output,omitempty"`
Verified bool `json:"verified"`
} }
// Append writes entry as a single JSON line to sessionsDir/{sessionID}.jsonl. // Append writes entry as a single JSON line to sessionsDir/{sessionID}.jsonl.
@@ -73,7 +78,7 @@ func Read(sessionsDir, sessionID string) ([]Entry, error) {
if err != nil { if err != nil {
return nil, fmt.Errorf("open session log: %w", err) return nil, fmt.Errorf("open session log: %w", err)
} }
defer f.Close() defer f.Close() //nolint:errcheck
var entries []Entry var entries []Entry
scanner := bufio.NewScanner(f) scanner := bufio.NewScanner(f)

View File

@@ -61,3 +61,22 @@ func TestRead_EmptyWhenNoFile(t *testing.T) {
require.NoError(t, err) require.NoError(t, err)
assert.Empty(t, entries) assert.Empty(t, entries)
} }
func TestAttemptRoundTrip(t *testing.T) {
a := session.Attempt{
Attempt: 1,
Model: "ollama/devstral",
Tier: "local",
DurationMs: 4200,
WarmStart: true,
Verified: false,
Verdict: "escalate",
Feedback: "missing line references",
}
data, err := json.Marshal(a)
require.NoError(t, err)
var got session.Attempt
require.NoError(t, json.Unmarshal(data, &got))
assert.Equal(t, a, got)
}

View File

@@ -18,7 +18,7 @@ func TestHandle_BrainQuery_CallsIngestServer(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
assert.Equal(t, "/query", r.URL.Path) assert.Equal(t, "/query", r.URL.Path)
called = true called = true
json.NewEncoder(w).Encode(map[string]any{ _ = json.NewEncoder(w).Encode(map[string]any{
"results": []map[string]any{ "results": []map[string]any{
{"path": "wiki/concepts/tdd.md", "title": "TDD", "excerpt": "Test-driven development.", "score": 3}, {"path": "wiki/concepts/tdd.md", "title": "TDD", "excerpt": "Test-driven development.", "score": 3},
}, },
@@ -45,7 +45,7 @@ func TestHandle_BrainWrite_CallsIngestServer(t *testing.T) {
require.NoError(t, json.NewDecoder(r.Body).Decode(&body)) require.NoError(t, json.NewDecoder(r.Body).Decode(&body))
assert.Equal(t, "concept", body["type"]) assert.Equal(t, "concept", body["type"])
assert.Equal(t, "# Test\n\nSome learning.", body["content"]) assert.Equal(t, "# Test\n\nSome learning.", body["content"])
json.NewEncoder(w).Encode(map[string]string{"path": "raw/test.md"}) _ = json.NewEncoder(w).Encode(map[string]string{"path": "raw/test.md"})
})) }))
defer srv.Close() defer srv.Close()

View File

@@ -0,0 +1,80 @@
// internal/skills/debug/handlers.go
package debug
import (
"context"
"encoding/json"
"fmt"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/session"
)
type debugArgs struct {
ProjectRoot string `json:"project_root"`
Error string `json:"error"`
Context string `json:"context"`
Model string `json:"model"`
SessionID string `json:"session_id"`
}
// Handle dispatches the MCP tool call to the appropriate handler.
func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
if tool != "debug" {
return nil, fmt.Errorf("unknown tool: %s", tool)
}
var a debugArgs
if err := json.Unmarshal(args, &a); err != nil {
return nil, fmt.Errorf("parse args: %w", err)
}
if a.ProjectRoot == "" {
return nil, fmt.Errorf("project_root is required")
}
if a.Error == "" {
return nil, fmt.Errorf("error is required")
}
model := a.Model
if model == "" {
model = s.cfg.DefaultModel
}
task := fmt.Sprintf(
"phase: debug\nproject_root: %s\nerror: %s\ncontext: %s\nmodel: %s",
a.ProjectRoot, a.Error, a.Context, model,
)
task = s.prependHistory(a.SessionID, "debug", task)
if s.cfg.ExecutorFn == nil {
return nil, fmt.Errorf("no executor configured")
}
result, err := s.cfg.ExecutorFn(ctx, iexec.Request{
SkillPrompt: s.cfg.SkillPrompt,
TaskPrompt: task,
Model: model,
Tools: "Read,Bash",
})
if err != nil {
return nil, err
}
b, err := json.Marshal(result)
if err != nil {
return nil, fmt.Errorf("marshal result: %w", err)
}
return b, nil
}
func (s *Skill) prependHistory(sessionID, currentPhase, task string) string {
if sessionID == "" || s.cfg.SessionsDir == "" {
return task
}
entries, err := session.Read(s.cfg.SessionsDir, sessionID)
if err != nil || len(entries) == 0 {
return task
}
history := session.FormatHistory(entries, currentPhase)
if history == "" {
return task
}
return history + "\n---\n\n" + task
}

View File

@@ -0,0 +1,61 @@
// internal/skills/debug/handlers_test.go
package debug_test
import (
"context"
"encoding/json"
"testing"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/skills/debug"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestDebugToolRegistered(t *testing.T) {
sk := debug.New(debug.Config{SkillPrompt: "debug rules"})
names := make([]string, 0)
for _, tool := range sk.Tools() {
names = append(names, tool.Name)
}
assert.Contains(t, names, "debug")
}
func TestDebugRequiresProjectRoot(t *testing.T) {
sk := debug.New(debug.Config{SkillPrompt: "d"})
_, err := sk.Handle(context.Background(), "debug", json.RawMessage(`{"error":"panic: nil pointer"}`))
assert.ErrorContains(t, err, "project_root")
}
func TestDebugRequiresError(t *testing.T) {
sk := debug.New(debug.Config{SkillPrompt: "d"})
_, err := sk.Handle(context.Background(), "debug", json.RawMessage(`{"project_root":"/tmp"}`))
assert.ErrorContains(t, err, "error")
}
func TestDebugCallsExecutor(t *testing.T) {
called := false
var capturedTask string
fakeFn := func(_ context.Context, req iexec.Request) (iexec.Result, error) {
called = true
capturedTask = req.TaskPrompt
return iexec.Result{
Status: "pass", Phase: "debug", Skill: "debug",
RunnerOutput: "HYPOTHESIS 1 (likelihood: high): nil map access\nVERIFY: go test ./... → expected: panic line reference",
Verified: false, ModelUsed: "self", Message: "3 hypotheses for: panic nil pointer at foo.go:42",
}, nil
}
sk := debug.New(debug.Config{SkillPrompt: "debug rules", ExecutorFn: fakeFn, SessionsDir: t.TempDir()})
out, err := sk.Handle(context.Background(), "debug", json.RawMessage(
`{"project_root":"/tmp/proj","error":"panic: nil pointer dereference at foo.go:42","context":"occurs on startup"}`,
))
require.NoError(t, err)
assert.True(t, called)
assert.Contains(t, capturedTask, "panic: nil pointer dereference")
assert.Contains(t, capturedTask, "occurs on startup")
var result iexec.Result
require.NoError(t, json.Unmarshal(out, &result))
assert.Equal(t, "debug", result.Phase)
}

View File

@@ -0,0 +1,55 @@
// internal/skills/debug/skill.go
package debug
import (
"context"
"encoding/json"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/registry"
)
// ExecutorFn is the function signature for running a worker subprocess.
type ExecutorFn func(ctx context.Context, req iexec.Request) (iexec.Result, error)
// Config holds dependencies for the debug skill.
type Config struct {
SkillPrompt string
DefaultModel string
ExecutorFn ExecutorFn
SessionsDir string
}
// Skill implements the debug MCP tool.
type Skill struct{ cfg Config }
// New creates a new debug Skill.
func New(cfg Config) *Skill { return &Skill{cfg: cfg} }
// Name returns the skill identifier.
func (s *Skill) Name() string { return "debug" }
// Tools returns the MCP tool definitions for this skill.
func (s *Skill) Tools() []registry.ToolDef {
schema := func(required []string, props map[string]any) json.RawMessage {
b, _ := json.Marshal(map[string]any{"type": "object", "required": required, "properties": props})
return b
}
str := map[string]any{"type": "string"}
return []registry.ToolDef{
{
Name: "debug",
Description: "Analyse an error and return 3-5 hypotheses ordered by likelihood, each with a concrete verification step.",
InputSchema: schema(
[]string{"project_root", "error"},
map[string]any{
"project_root": str,
"error": str,
"context": str,
"model": str,
"session_id": str,
},
),
},
}
}

View File

@@ -0,0 +1,81 @@
// internal/skills/review/handlers.go
package review
import (
"context"
"encoding/json"
"fmt"
"strings"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/session"
)
type reviewArgs struct {
ProjectRoot string `json:"project_root"`
Files []string `json:"files"`
Context string `json:"context"`
Model string `json:"model"`
SessionID string `json:"session_id"`
}
// Handle dispatches the MCP tool call to the appropriate handler.
func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
if tool != "review" {
return nil, fmt.Errorf("unknown tool: %s", tool)
}
var a reviewArgs
if err := json.Unmarshal(args, &a); err != nil {
return nil, fmt.Errorf("parse args: %w", err)
}
if a.ProjectRoot == "" {
return nil, fmt.Errorf("project_root is required")
}
if len(a.Files) == 0 {
return nil, fmt.Errorf("files is required")
}
model := a.Model
if model == "" {
model = s.cfg.DefaultModel
}
task := fmt.Sprintf(
"phase: review\nproject_root: %s\nfiles: %s\ncontext: %s\nmodel: %s",
a.ProjectRoot, strings.Join(a.Files, ", "), a.Context, model,
)
task = s.prependHistory(a.SessionID, "review", task)
if s.cfg.ExecutorFn == nil {
return nil, fmt.Errorf("no executor configured")
}
result, err := s.cfg.ExecutorFn(ctx, iexec.Request{
SkillPrompt: s.cfg.SkillPrompt,
TaskPrompt: task,
Model: model,
Tools: "Read,Bash",
})
if err != nil {
return nil, err
}
b, err := json.Marshal(result)
if err != nil {
return nil, fmt.Errorf("marshal result: %w", err)
}
return b, nil
}
func (s *Skill) prependHistory(sessionID, currentPhase, task string) string {
if sessionID == "" || s.cfg.SessionsDir == "" {
return task
}
entries, err := session.Read(s.cfg.SessionsDir, sessionID)
if err != nil || len(entries) == 0 {
return task
}
history := session.FormatHistory(entries, currentPhase)
if history == "" {
return task
}
return history + "\n---\n\n" + task
}

View File

@@ -0,0 +1,61 @@
// internal/skills/review/handlers_test.go
package review_test
import (
"context"
"encoding/json"
"testing"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/skills/review"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestReviewToolRegistered(t *testing.T) {
sk := review.New(review.Config{SkillPrompt: "review rules"})
names := make([]string, 0)
for _, tool := range sk.Tools() {
names = append(names, tool.Name)
}
assert.Contains(t, names, "review")
}
func TestReviewRequiresProjectRoot(t *testing.T) {
sk := review.New(review.Config{SkillPrompt: "r"})
_, err := sk.Handle(context.Background(), "review", json.RawMessage(`{"files":["main.go"]}`))
assert.ErrorContains(t, err, "project_root")
}
func TestReviewRequiresFiles(t *testing.T) {
sk := review.New(review.Config{SkillPrompt: "r"})
_, err := sk.Handle(context.Background(), "review", json.RawMessage(`{"project_root":"/tmp"}`))
assert.ErrorContains(t, err, "files")
}
func TestReviewCallsExecutor(t *testing.T) {
called := false
var capturedTask string
fakeFn := func(_ context.Context, req iexec.Request) (iexec.Result, error) {
called = true
capturedTask = req.TaskPrompt
return iexec.Result{
Status: "pass", Phase: "review", Skill: "review",
Verified: true, ModelUsed: "self", Message: "2 warnings found",
}, nil
}
sk := review.New(review.Config{SkillPrompt: "review rules", ExecutorFn: fakeFn, SessionsDir: t.TempDir()})
out, err := sk.Handle(context.Background(), "review", json.RawMessage(
`{"project_root":"/tmp/proj","files":["internal/foo/foo.go"],"context":"PR: add Foo helper"}`,
))
require.NoError(t, err)
assert.True(t, called)
assert.Contains(t, capturedTask, "internal/foo/foo.go")
assert.Contains(t, capturedTask, "PR: add Foo helper")
var result iexec.Result
require.NoError(t, json.Unmarshal(out, &result))
assert.Equal(t, "pass", result.Status)
assert.Equal(t, "review", result.Phase)
}

View File

@@ -0,0 +1,55 @@
// internal/skills/review/skill.go
package review
import (
"context"
"encoding/json"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/registry"
)
// ExecutorFn is the function signature for running a worker subprocess.
type ExecutorFn func(ctx context.Context, req iexec.Request) (iexec.Result, error)
// Config holds dependencies for the review skill.
type Config struct {
SkillPrompt string
DefaultModel string
ExecutorFn ExecutorFn
SessionsDir string
}
// Skill implements the review MCP tool.
type Skill struct{ cfg Config }
// New creates a new review Skill.
func New(cfg Config) *Skill { return &Skill{cfg: cfg} }
// Name returns the skill identifier.
func (s *Skill) Name() string { return "review" }
// Tools returns the MCP tool definitions for this skill.
func (s *Skill) Tools() []registry.ToolDef {
schema := func(required []string, props map[string]any) json.RawMessage {
b, _ := json.Marshal(map[string]any{"type": "object", "required": required, "properties": props})
return b
}
str := map[string]any{"type": "string"}
return []registry.ToolDef{
{
Name: "review",
Description: "Perform a structured code review of the specified files. Returns findings with severity levels.",
InputSchema: schema(
[]string{"project_root", "files"},
map[string]any{
"project_root": str,
"files": map[string]any{"type": "array", "items": map[string]any{"type": "string"}},
"context": str,
"model": str,
"session_id": str,
},
),
},
}
}

View File

@@ -0,0 +1,85 @@
// internal/skills/spec/handlers.go
package spec
import (
"context"
"encoding/json"
"fmt"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/session"
)
type specArgs struct {
ProjectRoot string `json:"project_root"`
Requirements string `json:"requirements"`
OutputPath string `json:"output_path"`
Context string `json:"context"`
Model string `json:"model"`
SessionID string `json:"session_id"`
}
// Handle dispatches the MCP tool call to the appropriate handler.
func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
if tool != "spec" {
return nil, fmt.Errorf("unknown tool: %s", tool)
}
var a specArgs
if err := json.Unmarshal(args, &a); err != nil {
return nil, fmt.Errorf("parse args: %w", err)
}
if a.ProjectRoot == "" {
return nil, fmt.Errorf("project_root is required")
}
if a.Requirements == "" {
return nil, fmt.Errorf("requirements is required")
}
outputPath := a.OutputPath
if outputPath == "" {
outputPath = "docs/spec.md"
}
model := a.Model
if model == "" {
model = s.cfg.DefaultModel
}
task := fmt.Sprintf(
"phase: spec\nproject_root: %s\nrequirements: %s\noutput_path: %s\ncontext: %s\nmodel: %s",
a.ProjectRoot, a.Requirements, outputPath, a.Context, model,
)
task = s.prependHistory(a.SessionID, "spec", task)
if s.cfg.ExecutorFn == nil {
return nil, fmt.Errorf("no executor configured")
}
result, err := s.cfg.ExecutorFn(ctx, iexec.Request{
SkillPrompt: s.cfg.SkillPrompt,
TaskPrompt: task,
Model: model,
Tools: "Read,Write",
})
if err != nil {
return nil, err
}
b, err := json.Marshal(result)
if err != nil {
return nil, fmt.Errorf("marshal result: %w", err)
}
return b, nil
}
func (s *Skill) prependHistory(sessionID, currentPhase, task string) string {
if sessionID == "" || s.cfg.SessionsDir == "" {
return task
}
entries, err := session.Read(s.cfg.SessionsDir, sessionID)
if err != nil || len(entries) == 0 {
return task
}
history := session.FormatHistory(entries, currentPhase)
if history == "" {
return task
}
return history + "\n---\n\n" + task
}

View File

@@ -0,0 +1,61 @@
// internal/skills/spec/handlers_test.go
package spec_test
import (
"context"
"encoding/json"
"testing"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/skills/spec"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestSpecToolRegistered(t *testing.T) {
sk := spec.New(spec.Config{SkillPrompt: "spec rules"})
names := make([]string, 0)
for _, tool := range sk.Tools() {
names = append(names, tool.Name)
}
assert.Contains(t, names, "spec")
}
func TestSpecRequiresProjectRoot(t *testing.T) {
sk := spec.New(spec.Config{SkillPrompt: "s"})
_, err := sk.Handle(context.Background(), "spec", json.RawMessage(`{"requirements":"add login"}`))
assert.ErrorContains(t, err, "project_root")
}
func TestSpecRequiresRequirements(t *testing.T) {
sk := spec.New(spec.Config{SkillPrompt: "s"})
_, err := sk.Handle(context.Background(), "spec", json.RawMessage(`{"project_root":"/tmp"}`))
assert.ErrorContains(t, err, "requirements")
}
func TestSpecCallsExecutor(t *testing.T) {
called := false
var capturedTask string
fakeFn := func(_ context.Context, req iexec.Request) (iexec.Result, error) {
called = true
capturedTask = req.TaskPrompt
return iexec.Result{
Status: "pass", Phase: "spec", Skill: "spec",
FilePath: "/tmp/proj/docs/login-spec.md",
Verified: true, ModelUsed: "self", Message: "spec written: login feature",
}, nil
}
sk := spec.New(spec.Config{SkillPrompt: "spec rules", ExecutorFn: fakeFn, SessionsDir: t.TempDir()})
out, err := sk.Handle(context.Background(), "spec", json.RawMessage(
`{"project_root":"/tmp/proj","requirements":"add OAuth2 login","output_path":"docs/login-spec.md"}`,
))
require.NoError(t, err)
assert.True(t, called)
assert.Contains(t, capturedTask, "OAuth2 login")
assert.Contains(t, capturedTask, "docs/login-spec.md")
var result iexec.Result
require.NoError(t, json.Unmarshal(out, &result))
assert.Equal(t, "spec", result.Phase)
}

View File

@@ -0,0 +1,56 @@
// internal/skills/spec/skill.go
package spec
import (
"context"
"encoding/json"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/registry"
)
// ExecutorFn is the function signature for running a worker subprocess.
type ExecutorFn func(ctx context.Context, req iexec.Request) (iexec.Result, error)
// Config holds dependencies for the spec skill.
type Config struct {
SkillPrompt string
DefaultModel string
ExecutorFn ExecutorFn
SessionsDir string
}
// Skill implements the spec MCP tool.
type Skill struct{ cfg Config }
// New creates a new spec Skill.
func New(cfg Config) *Skill { return &Skill{cfg: cfg} }
// Name returns the skill identifier.
func (s *Skill) Name() string { return "spec" }
// Tools returns the MCP tool definitions for this skill.
func (s *Skill) Tools() []registry.ToolDef {
schema := func(required []string, props map[string]any) json.RawMessage {
b, _ := json.Marshal(map[string]any{"type": "object", "required": required, "properties": props})
return b
}
str := map[string]any{"type": "string"}
return []registry.ToolDef{
{
Name: "spec",
Description: "Generate a structured implementation spec from requirements. Writes the spec to output_path in the project.",
InputSchema: schema(
[]string{"project_root", "requirements"},
map[string]any{
"project_root": str,
"requirements": str,
"output_path": str,
"context": str,
"model": str,
"session_id": str,
},
),
},
}
}

View File

@@ -6,6 +6,7 @@ import (
"fmt" "fmt"
iexec "github.com/mathiasbq/supervisor/internal/exec" iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/session"
) )
func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) { func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
@@ -51,6 +52,7 @@ type greenArgs struct {
TestPath string `json:"test_path"` TestPath string `json:"test_path"`
Model string `json:"model"` Model string `json:"model"`
TestCmd string `json:"test_cmd"` TestCmd string `json:"test_cmd"`
SessionID string `json:"session_id"`
} }
func (s *Skill) handleGreen(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) { func (s *Skill) handleGreen(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) {
@@ -68,6 +70,7 @@ func (s *Skill) handleGreen(ctx context.Context, raw json.RawMessage) (json.RawM
"phase: green\nproject_root: %s\ntest_path: %s\nmodel: %s\ntest_cmd: %s", "phase: green\nproject_root: %s\ntest_path: %s\nmodel: %s\ntest_cmd: %s",
args.ProjectRoot, args.TestPath, s.resolveModel(args.Model), args.TestCmd, args.ProjectRoot, args.TestPath, s.resolveModel(args.Model), args.TestCmd,
) )
task = s.prependHistory(args.SessionID, "green", task)
return s.execute(ctx, task) return s.execute(ctx, task)
} }
@@ -77,6 +80,7 @@ type refactorArgs struct {
ImplPath string `json:"impl_path"` ImplPath string `json:"impl_path"`
Model string `json:"model"` Model string `json:"model"`
TestCmd string `json:"test_cmd"` TestCmd string `json:"test_cmd"`
SessionID string `json:"session_id"`
} }
func (s *Skill) handleRefactor(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) { func (s *Skill) handleRefactor(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) {
@@ -97,9 +101,25 @@ func (s *Skill) handleRefactor(ctx context.Context, raw json.RawMessage) (json.R
"phase: refactor\nproject_root: %s\ntest_path: %s\nimpl_path: %s\nmodel: %s\ntest_cmd: %s", "phase: refactor\nproject_root: %s\ntest_path: %s\nimpl_path: %s\nmodel: %s\ntest_cmd: %s",
args.ProjectRoot, args.TestPath, args.ImplPath, s.resolveModel(args.Model), args.TestCmd, args.ProjectRoot, args.TestPath, args.ImplPath, s.resolveModel(args.Model), args.TestCmd,
) )
task = s.prependHistory(args.SessionID, "refactor", task)
return s.execute(ctx, task) return s.execute(ctx, task)
} }
func (s *Skill) prependHistory(sessionID, currentPhase, task string) string {
if sessionID == "" || s.cfg.SessionsDir == "" {
return task
}
entries, err := session.Read(s.cfg.SessionsDir, sessionID)
if err != nil || len(entries) == 0 {
return task
}
history := session.FormatHistory(entries, currentPhase)
if history == "" {
return task
}
return history + "\n---\n\n" + task
}
func (s *Skill) resolveModel(override string) string { func (s *Skill) resolveModel(override string) string {
if override != "" { if override != "" {
return override return override

View File

@@ -5,6 +5,8 @@ import (
"encoding/json" "encoding/json"
"testing" "testing"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/session"
"github.com/mathiasbq/supervisor/internal/skills/tdd" "github.com/mathiasbq/supervisor/internal/skills/tdd"
"github.com/stretchr/testify/assert" "github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require" "github.com/stretchr/testify/require"
@@ -41,5 +43,43 @@ func TestTDDRedRequiresSpec(t *testing.T) {
assert.ErrorContains(t, err, "spec") assert.ErrorContains(t, err, "spec")
} }
func TestTDDGreenInjectsSessionHistory(t *testing.T) {
sessDir := t.TempDir()
require.NoError(t, session.Append(sessDir, "sess-1", session.Entry{
SessionID: "sess-1", Skill: "tdd", Phase: "red", FinalStatus: "pass",
FilePath: "internal/foo/foo_test.go",
Message: "wrote failing test for Foo",
}))
var capturedPrompt string
fakeFn := func(_ context.Context, req iexec.Request) (iexec.Result, error) {
capturedPrompt = req.TaskPrompt
return iexec.Result{Status: "pass", Phase: "green", Skill: "tdd", Verified: true, ModelUsed: "self", Message: "ok"}, nil
}
sk := tdd.New(tdd.Config{SkillPrompt: "tdd", ExecutorFn: fakeFn, SessionsDir: sessDir})
_, err := sk.Handle(context.Background(), "tdd_green", json.RawMessage(
`{"project_root":"/tmp","test_path":"internal/foo/foo_test.go","test_cmd":"go test ./...","session_id":"sess-1"}`,
))
require.NoError(t, err)
assert.Contains(t, capturedPrompt, "## Session history")
assert.Contains(t, capturedPrompt, "wrote failing test for Foo")
}
func TestTDDGreenNoHistoryWhenSessionIDEmpty(t *testing.T) {
var capturedPrompt string
fakeFn := func(_ context.Context, req iexec.Request) (iexec.Result, error) {
capturedPrompt = req.TaskPrompt
return iexec.Result{Status: "pass", Phase: "green", Skill: "tdd", Verified: true, ModelUsed: "self", Message: "ok"}, nil
}
sk := tdd.New(tdd.Config{SkillPrompt: "tdd", ExecutorFn: fakeFn, SessionsDir: t.TempDir()})
_, err := sk.Handle(context.Background(), "tdd_green", json.RawMessage(
`{"project_root":"/tmp","test_path":"internal/foo/foo_test.go"}`,
))
require.NoError(t, err)
assert.NotContains(t, capturedPrompt, "## Session history")
}
// Ensure require is used (avoids import error). // Ensure require is used (avoids import error).
var _ = require.New var _ = require.New

View File

@@ -16,6 +16,7 @@ type Config struct {
SkillPrompt string SkillPrompt string
ExecutorFn ExecutorFn // nil = no executor (tests that don't reach execute()) ExecutorFn ExecutorFn // nil = no executor (tests that don't reach execute())
DefaultModel string DefaultModel string
SessionsDir string // optional: path to brain/sessions/ for history injection
} }
type Skill struct { type Skill struct {
@@ -63,6 +64,7 @@ func (s *Skill) Tools() []registry.ToolDef {
"test_path": strProp, "test_path": strProp,
"model": strProp, "model": strProp,
"test_cmd": strProp, "test_cmd": strProp,
"session_id": strProp,
}, },
), ),
}, },
@@ -77,6 +79,7 @@ func (s *Skill) Tools() []registry.ToolDef {
"impl_path": strProp, "impl_path": strProp,
"model": strProp, "model": strProp,
"test_cmd": strProp, "test_cmd": strProp,
"session_id": strProp,
}, },
), ),
}, },

View File

@@ -0,0 +1,80 @@
// internal/skills/trainer/handlers.go
package trainer
import (
"context"
"encoding/json"
"fmt"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/session"
)
type trainArgs struct {
SessionID string `json:"session_id"`
Model string `json:"model"`
}
// Handle dispatches the MCP tool call to the trainer handler.
func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
if tool != "trainer" {
return nil, fmt.Errorf("unknown tool: %s", tool)
}
var a trainArgs
if err := json.Unmarshal(args, &a); err != nil {
return nil, fmt.Errorf("parse args: %w", err)
}
if a.SessionID == "" {
return nil, fmt.Errorf("session_id is required")
}
if s.cfg.ExecutorFn == nil {
return nil, fmt.Errorf("no executor configured")
}
model := a.Model
if model == "" {
model = s.cfg.DefaultModel
}
entries, err := session.Read(s.cfg.SessionsDir, a.SessionID)
if err != nil {
return nil, fmt.Errorf("read session log: %w", err)
}
// ── Step 1: Reader agent ─────────────────────────────────────────────────
history := session.FormatHistory(entries, "")
readerTask := fmt.Sprintf(
"role: reader\nsession_id: %s\nbrain_dir: %s\n\n%s",
a.SessionID, s.cfg.BrainDir, history,
)
readerResult, err := s.cfg.ExecutorFn(ctx, iexec.Request{
SkillPrompt: s.cfg.ReaderPrompt,
TaskPrompt: readerTask,
Model: model,
Tools: "Read",
})
if err != nil {
return nil, fmt.Errorf("reader agent: %w", err)
}
// ── Step 2: Writer agent (receives reader candidates) ────────────────────
writerTask := fmt.Sprintf(
"role: writer\nsession_id: %s\nbrain_dir: %s\n\nreader_summary: %s\nreader_candidates:\n%s",
a.SessionID, s.cfg.BrainDir, readerResult.Message, readerResult.RunnerOutput,
)
writerResult, err := s.cfg.ExecutorFn(ctx, iexec.Request{
SkillPrompt: s.cfg.WriterPrompt,
TaskPrompt: writerTask,
Model: model,
Tools: "Read,Write",
})
if err != nil {
return nil, fmt.Errorf("writer agent: %w", err)
}
b, err := json.Marshal(writerResult)
if err != nil {
return nil, fmt.Errorf("marshal result: %w", err)
}
return b, nil
}

View File

@@ -0,0 +1,82 @@
// internal/skills/trainer/handlers_test.go
package trainer_test
import (
"context"
"encoding/json"
"testing"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/session"
"github.com/mathiasbq/supervisor/internal/skills/trainer"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestTrainerToolRegistered(t *testing.T) {
sk := trainer.New(trainer.Config{ReaderPrompt: "r", WriterPrompt: "w"})
names := make([]string, 0)
for _, tool := range sk.Tools() {
names = append(names, tool.Name)
}
assert.Contains(t, names, "trainer")
}
func TestTrainerRequiresSessionID(t *testing.T) {
sk := trainer.New(trainer.Config{ReaderPrompt: "r", WriterPrompt: "w"})
_, err := sk.Handle(context.Background(), "trainer", json.RawMessage(`{}`))
assert.ErrorContains(t, err, "session_id")
}
func TestTrainerCallsReaderThenWriter(t *testing.T) {
sessDir := t.TempDir()
require.NoError(t, session.Append(sessDir, "sess-1", session.Entry{
SessionID: "sess-1", Skill: "tdd", Phase: "red", FinalStatus: "pass",
Message: "wrote failing test", FilePath: "internal/foo/foo_test.go",
}))
callCount := 0
var readerTask, writerTask string
fakeFn := func(_ context.Context, req iexec.Request) (iexec.Result, error) {
callCount++
if callCount == 1 {
// reader call
readerTask = req.TaskPrompt
return iexec.Result{
Status: "pass", Phase: "trainer", Skill: "trainer",
RunnerOutput: `[{"type":"sft","moment":"first-pass clean TDD","score":4}]`,
Verified: true, ModelUsed: "self", Message: "1 sft candidate found",
}, nil
}
// writer call
writerTask = req.TaskPrompt
return iexec.Result{
Status: "pass", Phase: "trainer", Skill: "trainer",
FilePath: sessDir + "/training-data/sft/sess-1.jsonl",
Verified: true, ModelUsed: "self", Message: "1 sft pair written",
}, nil
}
sk := trainer.New(trainer.Config{
ReaderPrompt: "reader rules",
WriterPrompt: "writer rules",
ExecutorFn: fakeFn,
SessionsDir: sessDir,
BrainDir: t.TempDir(),
})
out, err := sk.Handle(context.Background(), "trainer", json.RawMessage(`{"session_id":"sess-1"}`))
require.NoError(t, err)
assert.Equal(t, 2, callCount, "executor must be called exactly twice: reader then writer")
assert.Contains(t, readerTask, "role: reader")
assert.Contains(t, readerTask, "sess-1")
assert.Contains(t, readerTask, "wrote failing test") // session history in reader prompt
assert.Contains(t, writerTask, "role: writer")
assert.Contains(t, writerTask, "sft candidate") // reader output passed to writer
var result iexec.Result
require.NoError(t, json.Unmarshal(out, &result))
assert.Equal(t, "trainer", result.Phase)
assert.Equal(t, "pass", result.Status)
}

View File

@@ -0,0 +1,53 @@
// internal/skills/trainer/skill.go
package trainer
import (
"context"
"encoding/json"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/registry"
)
// ExecutorFn is the function signature for running a worker subprocess.
type ExecutorFn func(ctx context.Context, req iexec.Request) (iexec.Result, error)
// Config holds dependencies for the trainer skill.
type Config struct {
ReaderPrompt string
WriterPrompt string
DefaultModel string
ExecutorFn ExecutorFn
SessionsDir string
BrainDir string // root of brain/ directory; writer writes to BrainDir/training-data/
}
// Skill implements the trainer MCP tool.
type Skill struct{ cfg Config }
// New creates a new trainer Skill.
func New(cfg Config) *Skill { return &Skill{cfg: cfg} }
// Name returns the skill identifier.
func (s *Skill) Name() string { return "trainer" }
// Tools returns the MCP tool definitions for this skill.
func (s *Skill) Tools() []registry.ToolDef {
schema := func(required []string, props map[string]any) json.RawMessage {
b, _ := json.Marshal(map[string]any{"type": "object", "required": required, "properties": props})
return b
}
return []registry.ToolDef{
{
Name: "trainer",
Description: "Extract SFT and DPO training pairs from a session log. Runs a reader→writer chain: reader identifies learning moments, writer formats and writes pairs to brain/training-data/.",
InputSchema: schema(
[]string{"session_id"},
map[string]any{
"session_id": map[string]any{"type": "string"},
"model": map[string]any{"type": "string"},
},
),
},
}
}