docs: add supervisor MCP server design spec

Captures the full architecture: Go MCP server on flamingo, Claude
supervisor+worker instance, LiteLLM/Ollama gradual delegation, TDD
as first skill, language-agnostic test runner detection, structured
JSON output contract, and three-layer model selection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Mathias Bergqvist
2026-04-16 22:57:10 +02:00
parent 92c9c42e24
commit 848001f27f

View File

@@ -0,0 +1,192 @@
# Supervisor MCP Server — Design Spec
**Date:** 2026-04-16
**Status:** approved
**Author:** Mathias + Claude
---
## Overview
The supervisor is a Go MCP server running locally on flamingo. It exposes skill workers as MCP tools that Claude Code can call natively during coding sessions. The first skill is TDD (red/green/refactor). The supervisor spawns a Claude instance to act as supervisor + worker, which runs verification commands directly on the target project's filesystem and optionally delegates code generation to Ollama models via LiteLLM.
---
## Architecture
```
Claude Code (orchestrator session)
│ MCP tool call e.g. tdd_red(project_root, spec)
┌─────────────────────────────┐
│ supervisor (Go, flamingo) │ thin — MCP protocol + process spawner only
└─────────────┬───────────────┘
│ exec: claude --print "<prompt>"
│ cwd: config/supervisor/ ← loads supervisor CLAUDE.md
┌─────────────────────────────────────────┐
│ Claude supervisor instance │
│ reads: config/supervisor/CLAUDE.md │ orchestration rules, output contract
│ reads: config/supervisor/tdd.md │ injected per skill invocation
│ tools: Read, Write, Bash │ filesystem access + test runner
│ optional: LiteLLM → Ollama worker │ code generation delegation (gradual)
└─────────────────────────────────────────┘
│ structured JSON response
MCP response → Claude Code
```
**Key properties:**
- Go server knows nothing about TDD — it routes, spawns, and validates JSON
- All orchestration logic is declarative in `config/supervisor/CLAUDE.md`
- Ollama migration path requires no architectural changes — supervisor Claude delegates generation to LiteLLM and retains verification
- Stateless — project path passed per call, no session registration
---
## Project Structure
```
supervisor/
├── cmd/
│ └── supervisor/
│ └── main.go ← starts MCP server, wires dependencies
├── internal/
│ ├── config/ ← env vars + models.yaml → typed Config struct
│ ├── mcp/ ← HTTP/SSE MCP server, tool dispatch
│ ├── registry/ ← maps tool names to skill handlers
│ ├── skills/
│ │ └── tdd/ ← tdd_red, tdd_green, tdd_refactor handlers
│ └── exec/ ← spawns claude --print, validates JSON output
├── config/
│ └── supervisor/
│ ├── CLAUDE.md ← permanent orchestration rules
│ └── tdd.md ← TDD discipline, injected per TDD call
├── docs/
│ └── superpowers/specs/ ← design specs (this file)
├── .env.example
└── Taskfile.yml
```
---
## MCP Tools
### TDD Skill
All tools are stateless. Project path is passed on every call.
| Tool | Required inputs | Optional inputs | Returns |
|------|----------------|-----------------|---------|
| `tdd_red` | `project_root`, `spec` | `model`, `test_cmd` | JSON result |
| `tdd_green` | `project_root`, `test_path` | `model`, `test_cmd` | JSON result |
| `tdd_refactor` | `project_root`, `test_path`, `impl_path` | `model`, `test_cmd` | JSON result |
### Response schema (all tools)
```json
{
"status": "pass | fail | error",
"phase": "red | green | refactor",
"skill": "tdd",
"file_path": "<absolute path to generated/modified file>",
"runner_output": "<raw stdout+stderr from test runner>",
"verified": true,
"model_used": "<model that generated the code, or 'self'>",
"message": "<one sentence summary>"
}
```
`verified` reflects the subprocess exit code only — never the model's self-assessment.
---
## Model Selection
Three-layer priority (highest to lowest):
1. `model` parameter in the MCP tool call — caller override
2. Per-skill default in `config/models.yaml`
3. Global fallback default
The Go skill handler reads the resolved model name and passes it to the Claude instance via the prompt (not an env var), so the supervisor Claude knows which model to delegate to per invocation.
```yaml
# config/models.yaml
default: ollama/qwen3-coder-30b-tuned
skills:
tdd: ollama/qwen3-coder-30b-tuned
review: ollama/devstral-tuned
debug: ollama/deepseek-r1-tuned
```
LiteLLM on iguana is the gateway for all Ollama and cloud model calls. The Go server passes `LITELLM_BASE_URL` and the resolved model name to the spawned Claude instance via the prompt context.
---
## Supervisor CLAUDE.md
Loaded for every invocation. Defines identity, output contract, verification rules, loop prevention, and test runner detection.
**Output contract:** Every response is raw JSON matching the schema above. No preamble, no markdown, no prose. Malformed output is treated as a failed invocation and triggers a retry.
**Verification:** Exit code is the only source of truth. `verified: true` only when the subprocess exit code matches the phase expectation.
**Loop prevention:**
- Maximum 3 attempts per phase
- Two consecutive identical outputs → immediate `status: error`
- No retries after 3 failures
**Test runner detection** (inspects `project_root`):
| Signal | Command |
|--------|---------|
| `go.mod` | `go test ./...` |
| `package.json` | `npm test` |
| `pyproject.toml` / `pytest.ini` | `pytest` |
| `Cargo.toml` | `cargo test` |
| `Gemfile` | `bundle exec rspec` |
| `mix.exs` | `mix test` |
`test_cmd` parameter bypasses detection entirely.
---
## Skill Discipline: TDD (`config/supervisor/tdd.md`)
Read from disk by the Go TDD handler and appended to the prompt string on every TDD tool call. Not part of the always-on CLAUDE.md.
- **Red:** Write one test, one behavior. Run suite. Confirm it fails. If it passes, the test is wrong — return `fail`.
- **Green:** Write minimal code to pass. Nothing beyond what the test requires. Run suite. Confirm it passes. Fix code, not test.
- **Refactor:** Structure only, no new behavior. Tests must stay green after every change.
---
## Gradual Ollama Migration
Start: Claude supervisor + worker (does everything itself).
Next: Claude delegates code generation to `qwen3-coder-30b-tuned` via LiteLLM, retains verification and judgment.
Later: More skills, more workers, model routing by task type.
No architectural changes required between stages — only `config/models.yaml` and `supervisor/CLAUDE.md` prompting strategy evolve.
---
## Infrastructure
- **Runs on:** flamingo (daily driver, local filesystem access)
- **LiteLLM gateway:** iguana:4000 (Tailscale)
- **MCP transport:** HTTP/SSE
- **Port:** 3200 (default, configurable)
- **Registered in:** `.context/mcp.json` → propagated to all harnesses via `task context:sync`
---
## Deferred
- **Generic phase engine (Option C):** revisit when 4+ skills exist and shared phase shape is visible. Saved in project memory.
- **Session registration:** stateless for now, revisit if per-session context becomes valuable.
- **Authentication:** internal Tailscale network only for now.