# Supervisor MCP Server Implementation Plan > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. **Goal:** Build a Go MCP server that exposes TDD skill workers as tools Claude Code can call natively, with a Claude supervisor instance handling orchestration and verification. **Architecture:** A thin Go HTTP server routes MCP tool calls to skill handlers. Each handler spawns `claude --print` with `--bare --system-prompt --json-schema --permission-mode bypassPermissions`, injecting supervisor rules and skill discipline. The spawned Claude instance uses its own Bash/Read/Write tools to run tests and generate code, returning a structured JSON result. LiteLLM on iguana handles Ollama model delegation. **Tech Stack:** Go 1.26, net/http (stdlib), gopkg.in/yaml.v3, testify, claude CLI, LiteLLM --- ## File Map | File | Responsibility | |------|---------------| | `go.mod` | Module definition | | `cmd/supervisor/main.go` | Entry point, wires all dependencies | | `internal/config/config.go` | Env vars → typed Config struct | | `internal/config/config_test.go` | Config parsing tests | | `internal/config/models.go` | YAML model routing table | | `internal/config/models_test.go` | Model resolution tests | | `internal/exec/result.go` | Result struct + JSON schema constant | | `internal/exec/result_test.go` | JSON parsing/validation tests | | `internal/exec/executor.go` | Spawn claude, capture output, parse result | | `internal/exec/executor_test.go` | Executor tests using fake binary | | `internal/registry/registry.go` | Skill interface + registry | | `internal/registry/registry_test.go` | Registry routing tests | | `internal/mcp/server.go` | HTTP/JSON-RPC MCP server | | `internal/mcp/server_test.go` | MCP protocol tests | | `internal/skills/tdd/skill.go` | Tool definitions (MCP schema) | | `internal/skills/tdd/handlers.go` | tdd_red, tdd_green, tdd_refactor | | `internal/skills/tdd/handlers_test.go` | Handler prompt-building tests | | `config/supervisor/CLAUDE.md` | Supervisor orchestration rules | | `config/supervisor/tdd.md` | TDD iron law (injected per call) | | `config/models.yaml` | Per-skill model routing table | | `.env.example` | Documented env vars | | `Taskfile.yml` | Add `supervisor:dev` and `supervisor:build` tasks | | `.context/mcp.json` | Register supervisor as MCP server | --- ## Task 1: Go module + project skeleton **Files:** - Create: `go.mod` - Create: `cmd/supervisor/main.go` - [ ] **Step 1: Write a failing test to verify the binary compiles and exits cleanly** Create `cmd/supervisor/main_test.go`: ```go package main import ( "os/exec" "testing" ) func TestBinaryCompiles(t *testing.T) { cmd := exec.Command("go", "build", "./...") out, err := cmd.CombinedOutput() if err != nil { t.Fatalf("build failed: %s\n%s", err, out) } } ``` - [ ] **Step 2: Run test — expect it to fail (no go.mod yet)** ```bash go test ./... ``` Expected: error — no go.mod - [ ] **Step 3: Initialize the module** ```bash go mod init github.com/mathiasbq/supervisor ``` - [ ] **Step 4: Create `cmd/supervisor/main.go`** ```go package main import ( "fmt" "os" ) func main() { fmt.Fprintln(os.Stderr, "supervisor: not yet implemented") os.Exit(1) } ``` - [ ] **Step 5: Run test — expect it to pass** ```bash go test ./cmd/supervisor/... ``` Expected: PASS - [ ] **Step 6: Commit** ```bash git add go.mod cmd/ git commit -m "chore: initialize go module and cmd skeleton" ``` --- ## Task 2: Result type **Files:** - Create: `internal/exec/result.go` - Create: `internal/exec/result_test.go` - [ ] **Step 1: Write failing tests for result parsing** Create `internal/exec/result_test.go`: ```go package exec_test import ( "encoding/json" "testing" "github.com/mathiasbq/supervisor/internal/exec" "github.com/stretchr/testify/assert" "github.com/stretchr/testify/require" ) func TestResultParsesValidJSON(t *testing.T) { raw := `{ "status": "pass", "phase": "red", "skill": "tdd", "file_path": "/tmp/foo_test.go", "runner_output": "--- FAIL: TestFoo", "verified": true, "model_used": "self", "message": "test fails as expected" }` var r exec.Result require.NoError(t, json.Unmarshal([]byte(raw), &r)) assert.Equal(t, "pass", r.Status) assert.Equal(t, "red", r.Phase) assert.True(t, r.Verified) } func TestResultValidation(t *testing.T) { tests := []struct { name string result exec.Result wantErr bool }{ { name: "valid pass result", result: exec.Result{ Status: "pass", Phase: "red", Skill: "tdd", FilePath: "/tmp/x_test.go", RunnerOutput: "FAIL", Verified: true, ModelUsed: "self", Message: "ok", }, wantErr: false, }, { name: "empty status", result: exec.Result{Phase: "red", Skill: "tdd"}, wantErr: true, }, { name: "invalid status", result: exec.Result{Status: "unknown", Phase: "red", Skill: "tdd"}, wantErr: true, }, { name: "invalid phase", result: exec.Result{Status: "pass", Phase: "bad", Skill: "tdd"}, wantErr: true, }, } for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { err := tt.result.Validate() if tt.wantErr { assert.Error(t, err) } else { assert.NoError(t, err) } }) } } ``` - [ ] **Step 2: Run — expect FAIL (package does not exist)** ```bash go test ./internal/exec/... ``` Expected: FAIL — cannot find package - [ ] **Step 3: Install testify** ```bash go get github.com/stretchr/testify@latest ``` - [ ] **Step 4: Create `internal/exec/result.go`** ```go package exec import ( "errors" "strings" ) // Result is the structured JSON output from every supervisor invocation. // The JSON schema constant is passed to claude via --json-schema so Claude // validates its own output before returning. type Result struct { Status string `json:"status"` // pass | fail | error Phase string `json:"phase"` // red | green | refactor Skill string `json:"skill"` // tdd | review | ... FilePath string `json:"file_path"` // absolute path to generated file RunnerOutput string `json:"runner_output"` // raw stdout+stderr from test runner Verified bool `json:"verified"` // based on exit code, never self-report ModelUsed string `json:"model_used"` // model name or "self" Message string `json:"message"` // one sentence summary } var validStatuses = map[string]bool{"pass": true, "fail": true, "error": true} var validPhases = map[string]bool{"red": true, "green": true, "refactor": true} func (r Result) Validate() error { var errs []string if !validStatuses[r.Status] { errs = append(errs, "status must be pass|fail|error, got: "+r.Status) } if !validPhases[r.Phase] { errs = append(errs, "phase must be red|green|refactor, got: "+r.Phase) } if r.Skill == "" { errs = append(errs, "skill is required") } if len(errs) > 0 { return errors.New(strings.Join(errs, "; ")) } return nil } // Schema is passed to claude --json-schema to enforce structured output. const Schema = `{ "type": "object", "required": ["status","phase","skill","file_path","runner_output","verified","model_used","message"], "properties": { "status": {"type": "string", "enum": ["pass","fail","error"]}, "phase": {"type": "string", "enum": ["red","green","refactor"]}, "skill": {"type": "string"}, "file_path": {"type": "string"}, "runner_output": {"type": "string"}, "verified": {"type": "boolean"}, "model_used": {"type": "string"}, "message": {"type": "string"} } }` ``` - [ ] **Step 5: Run — expect PASS** ```bash go test ./internal/exec/... ``` Expected: PASS - [ ] **Step 6: Commit** ```bash git add internal/exec/result.go internal/exec/result_test.go git commit -m "feat: add Result type with JSON schema and validation" ``` --- ## Task 3: Config package **Files:** - Create: `internal/config/config.go` - Create: `internal/config/config_test.go` - [ ] **Step 1: Write failing tests** Create `internal/config/config_test.go`: ```go package config_test import ( "testing" "github.com/mathiasbq/supervisor/internal/config" "github.com/stretchr/testify/assert" "github.com/stretchr/testify/require" ) func TestLoadDefaults(t *testing.T) { t.Setenv("SUPERVISOR_PORT", "") t.Setenv("LITELLM_BASE_URL", "") t.Setenv("LITELLM_API_KEY", "") t.Setenv("SUPERVISOR_CONFIG_DIR", "") cfg, err := config.Load() require.NoError(t, err) assert.Equal(t, "3200", cfg.Port) assert.Equal(t, "http://iguana:4000", cfg.LiteLLMBaseURL) assert.Equal(t, "./config/supervisor", cfg.ConfigDir) } func TestLoadFromEnv(t *testing.T) { t.Setenv("SUPERVISOR_PORT", "4000") t.Setenv("LITELLM_BASE_URL", "http://localhost:4000") t.Setenv("LITELLM_API_KEY", "test-key") t.Setenv("SUPERVISOR_CONFIG_DIR", "/etc/supervisor") cfg, err := config.Load() require.NoError(t, err) assert.Equal(t, "4000", cfg.Port) assert.Equal(t, "http://localhost:4000", cfg.LiteLLMBaseURL) assert.Equal(t, "test-key", cfg.LiteLLMAPIKey) assert.Equal(t, "/etc/supervisor", cfg.ConfigDir) } ``` - [ ] **Step 2: Run — expect FAIL** ```bash go test ./internal/config/... ``` Expected: FAIL — cannot find package - [ ] **Step 3: Create `internal/config/config.go`** ```go package config import "os" type Config struct { Port string // SUPERVISOR_PORT, default 3200 LiteLLMBaseURL string // LITELLM_BASE_URL, default http://iguana:4000 LiteLLMAPIKey string // LITELLM_API_KEY ConfigDir string // SUPERVISOR_CONFIG_DIR, default ./config/supervisor ModelsFile string // SUPERVISOR_MODELS_FILE, default /../models.yaml } func Load() (Config, error) { cfg := Config{ Port: envOr("SUPERVISOR_PORT", "3200"), LiteLLMBaseURL: envOr("LITELLM_BASE_URL", "http://iguana:4000"), LiteLLMAPIKey: os.Getenv("LITELLM_API_KEY"), ConfigDir: envOr("SUPERVISOR_CONFIG_DIR", "./config/supervisor"), } cfg.ModelsFile = envOr("SUPERVISOR_MODELS_FILE", cfg.ConfigDir+"/../models.yaml") return cfg, nil } func envOr(key, def string) string { if v := os.Getenv(key); v != "" { return v } return def } ``` - [ ] **Step 4: Run — expect PASS** ```bash go test ./internal/config/... ``` Expected: PASS - [ ] **Step 5: Commit** ```bash git add internal/config/ git commit -m "feat: add config package with env-var loading" ``` --- ## Task 4: Models config **Files:** - Create: `internal/config/models.go` - Create: `internal/config/models_test.go` - Create: `config/models.yaml` - [ ] **Step 1: Write failing tests** Append to `internal/config/models_test.go` (new file): ```go package config_test import ( "os" "path/filepath" "testing" "github.com/mathiasbq/supervisor/internal/config" "github.com/stretchr/testify/assert" "github.com/stretchr/testify/require" ) func TestModelsResolve(t *testing.T) { yaml := ` default: ollama/default-model skills: tdd: ollama/qwen3-coder-30b-tuned review: ollama/devstral-tuned ` f := filepath.Join(t.TempDir(), "models.yaml") require.NoError(t, os.WriteFile(f, []byte(yaml), 0644)) m, err := config.LoadModels(f) require.NoError(t, err) assert.Equal(t, "ollama/qwen3-coder-30b-tuned", m.Resolve("tdd", "")) assert.Equal(t, "ollama/devstral-tuned", m.Resolve("review", "")) assert.Equal(t, "ollama/default-model", m.Resolve("unknown", "")) } func TestModelsOverride(t *testing.T) { yaml := ` default: ollama/default-model skills: tdd: ollama/qwen3-coder-30b-tuned ` f := filepath.Join(t.TempDir(), "models.yaml") require.NoError(t, os.WriteFile(f, []byte(yaml), 0644)) m, err := config.LoadModels(f) require.NoError(t, err) // caller override wins over skill default assert.Equal(t, "anthropic/claude-sonnet-4-6", m.Resolve("tdd", "anthropic/claude-sonnet-4-6")) } ``` - [ ] **Step 2: Run — expect FAIL** ```bash go test ./internal/config/... ``` Expected: FAIL — Models type undefined - [ ] **Step 3: Install yaml dependency** ```bash go get gopkg.in/yaml.v3 ``` - [ ] **Step 4: Create `internal/config/models.go`** ```go package config import ( "fmt" "os" "gopkg.in/yaml.v3" ) type modelsFile struct { Default string `yaml:"default"` Skills map[string]string `yaml:"skills"` } type Models struct { data modelsFile } func LoadModels(path string) (Models, error) { raw, err := os.ReadFile(path) if err != nil { return Models{}, fmt.Errorf("load models: %w", err) } var f modelsFile if err := yaml.Unmarshal(raw, &f); err != nil { return Models{}, fmt.Errorf("parse models: %w", err) } return Models{data: f}, nil } // Resolve returns the model for a skill, respecting three-layer priority: // 1. override (from MCP call) — highest // 2. per-skill default from models.yaml // 3. global default func (m Models) Resolve(skill, override string) string { if override != "" { return override } if model, ok := m.data.Skills[skill]; ok { return model } return m.data.Default } ``` - [ ] **Step 5: Run — expect PASS** ```bash go test ./internal/config/... ``` Expected: PASS - [ ] **Step 6: Create `config/models.yaml`** ```yaml # Model routing table — three-layer priority: # 1. model param in MCP tool call (caller override) # 2. per-skill entry here # 3. default (fallback) default: ollama/qwen3-coder-30b-tuned skills: tdd: ollama/qwen3-coder-30b-tuned review: ollama/devstral-tuned debug: ollama/deepseek-r1-tuned ``` - [ ] **Step 7: Commit** ```bash git add internal/config/models.go internal/config/models_test.go config/models.yaml git commit -m "feat: add model routing table with three-layer priority" ``` --- ## Task 5: Exec package **Files:** - Create: `internal/exec/executor.go` - Create: `internal/exec/executor_test.go` - Modify: `internal/exec/result.go` (add Schema constant — already done in Task 2) - [ ] **Step 1: Write failing tests** Create `internal/exec/executor_test.go`: ```go package exec_test import ( "context" "os" "path/filepath" "testing" "time" iexec "github.com/mathiasbq/supervisor/internal/exec" "github.com/stretchr/testify/assert" "github.com/stretchr/testify/require" ) // fakeClaudePath writes a shell script that prints a fixed JSON result // and returns its path. Used to test executor without invoking real claude. func fakeClaudePath(t *testing.T, output string, exitCode int) string { t.Helper() dir := t.TempDir() script := filepath.Join(dir, "claude") var content string if exitCode != 0 { content = "#!/bin/sh\necho 'error' >&2\nexit 1\n" } else { content = "#!/bin/sh\necho '" + output + "'\n" } require.NoError(t, os.WriteFile(script, []byte(content), 0755)) return script } func TestExecutorParsesValidResult(t *testing.T) { validJSON := `{"status":"pass","phase":"red","skill":"tdd","file_path":"/tmp/x_test.go","runner_output":"FAIL","verified":true,"model_used":"self","message":"ok"}` claude := fakeClaudePath(t, validJSON, 0) ex := iexec.New(iexec.Config{ ClaudeBinary: claude, SystemPrompt: "you are a supervisor", Timeout: 5 * time.Second, }) result, err := ex.Run(context.Background(), iexec.Request{ SkillPrompt: "tdd rules", TaskPrompt: "run red phase", }) require.NoError(t, err) assert.Equal(t, "pass", result.Status) assert.True(t, result.Verified) } func TestExecutorReturnsErrorOnNonZeroExit(t *testing.T) { claude := fakeClaudePath(t, "", 1) ex := iexec.New(iexec.Config{ ClaudeBinary: claude, SystemPrompt: "you are a supervisor", Timeout: 5 * time.Second, }) _, err := ex.Run(context.Background(), iexec.Request{TaskPrompt: "fail"}) assert.Error(t, err) } func TestExecutorTimesOut(t *testing.T) { dir := t.TempDir() script := filepath.Join(dir, "claude") require.NoError(t, os.WriteFile(script, []byte("#!/bin/sh\nsleep 60\n"), 0755)) ex := iexec.New(iexec.Config{ ClaudeBinary: script, SystemPrompt: "you are a supervisor", Timeout: 100 * time.Millisecond, }) _, err := ex.Run(context.Background(), iexec.Request{TaskPrompt: "slow"}) assert.ErrorContains(t, err, "timeout") } ``` - [ ] **Step 2: Verify the test file compiles** ```bash go vet ./internal/exec/... ``` Expected: no errors - [ ] **Step 3: Run — expect FAIL** ```bash go test ./internal/exec/... ``` Expected: FAIL — Executor type undefined - [ ] **Step 4: Create `internal/exec/executor.go`** ```go package exec import ( "bytes" "context" "encoding/json" "fmt" "os/exec" "strings" "time" ) // Config holds executor configuration. type Config struct { ClaudeBinary string // path to claude binary, defaults to "claude" SystemPrompt string // contents of supervisor CLAUDE.md Timeout time.Duration // per-invocation timeout, default 120s LiteLLMBaseURL string // passed to Claude so it can delegate to Ollama LiteLLMAPIKey string // passed to Claude for LiteLLM auth } // Request is the input to a single supervisor invocation. type Request struct { SkillPrompt string // skill-specific discipline (e.g. tdd.md contents) TaskPrompt string // the specific task (phase, project_root, spec, model) Model string // resolved model name, passed in task prompt Tools string // comma-separated allowed tools, default "Bash,Read,Write" } // Executor spawns a claude instance and captures its structured JSON output. type Executor struct { cfg Config } func New(cfg Config) *Executor { if cfg.ClaudeBinary == "" { cfg.ClaudeBinary = "claude" } if cfg.Timeout == 0 { cfg.Timeout = 120 * time.Second } return &Executor{cfg: cfg} } func (e *Executor) Run(ctx context.Context, req Request) (Result, error) { ctx, cancel := context.WithTimeout(ctx, e.cfg.Timeout) defer cancel() tools := req.Tools if tools == "" { tools = "Bash,Read,Write" } // Build the full prompt: system rules + skill rules + task + infra context litellmCtx := fmt.Sprintf( "LITELLM_BASE_URL: %s\nLITELLM_API_KEY: %s", e.cfg.LiteLLMBaseURL, e.cfg.LiteLLMAPIKey, ) prompt := strings.Join([]string{ e.cfg.SystemPrompt, "---", req.SkillPrompt, "---", litellmCtx, "---", req.TaskPrompt, }, "\n\n") args := []string{ "--print", "--bare", "--permission-mode", "bypassPermissions", "--tools", tools, "--json-schema", Schema, "--output-format", "text", prompt, } cmd := exec.CommandContext(ctx, e.cfg.ClaudeBinary, args...) var stdout, stderr bytes.Buffer cmd.Stdout = &stdout cmd.Stderr = &stderr if err := cmd.Run(); err != nil { if ctx.Err() != nil { return Result{}, fmt.Errorf("timeout after %s", e.cfg.Timeout) } return Result{}, fmt.Errorf("claude exited with error: %w — stderr: %s", err, stderr.String()) } var r Result if err := json.Unmarshal(stdout.Bytes(), &r); err != nil { return Result{}, fmt.Errorf("parse result JSON: %w — raw output: %s", err, stdout.String()) } if err := r.Validate(); err != nil { return Result{}, fmt.Errorf("invalid result: %w", err) } return r, nil } ``` - [ ] **Step 5: Run — expect PASS** ```bash go test ./internal/exec/... ``` Expected: PASS - [ ] **Step 6: Commit** ```bash git add internal/exec/executor.go internal/exec/executor_test.go git commit -m "feat: add executor that spawns claude and parses JSON result" ``` --- ## Task 6: Registry + Skill interface **Files:** - Create: `internal/registry/registry.go` - Create: `internal/registry/registry_test.go` - [ ] **Step 1: Write failing tests** Create `internal/registry/registry_test.go`: ```go package registry_test import ( "context" "encoding/json" "testing" "github.com/mathiasbq/supervisor/internal/registry" "github.com/stretchr/testify/assert" "github.com/stretchr/testify/require" ) type stubSkill struct { name string } func (s stubSkill) Name() string { return s.name } func (s stubSkill) Tools() []registry.ToolDef { return []registry.ToolDef{{ Name: s.name + "_tool", Description: "stub tool", InputSchema: json.RawMessage(`{"type":"object","properties":{}}`), }} } func (s stubSkill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) { return json.RawMessage(`{"ok":true}`), nil } func TestRegistryRoutes(t *testing.T) { r := registry.New() r.Register(stubSkill{name: "tdd"}) result, err := r.Dispatch(context.Background(), "tdd_tool", json.RawMessage(`{}`)) require.NoError(t, err) assert.JSONEq(t, `{"ok":true}`, string(result)) } func TestRegistryUnknownTool(t *testing.T) { r := registry.New() _, err := r.Dispatch(context.Background(), "unknown_tool", json.RawMessage(`{}`)) assert.ErrorContains(t, err, "unknown tool") } func TestRegistryListsTools(t *testing.T) { r := registry.New() r.Register(stubSkill{name: "tdd"}) tools := r.Tools() require.Len(t, tools, 1) assert.Equal(t, "tdd_tool", tools[0].Name) } ``` - [ ] **Step 2: Run — expect FAIL** ```bash go test ./internal/registry/... ``` Expected: FAIL — registry package not found - [ ] **Step 3: Create `internal/registry/registry.go`** ```go package registry import ( "context" "encoding/json" "fmt" ) // ToolDef describes a single MCP tool exposed by a skill. type ToolDef struct { Name string `json:"name"` Description string `json:"description"` InputSchema json.RawMessage `json:"inputSchema"` } // Skill is implemented by each skill package (tdd, review, etc.). type Skill interface { Name() string Tools() []ToolDef Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) } // Registry routes MCP tool calls to the correct skill handler. type Registry struct { skills map[string]Skill // tool name → skill tools []ToolDef } func New() *Registry { return &Registry{skills: make(map[string]Skill)} } func (r *Registry) Register(s Skill) { for _, t := range s.Tools() { r.skills[t.Name] = s r.tools = append(r.tools, t) } } func (r *Registry) Tools() []ToolDef { return r.tools } func (r *Registry) Dispatch(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) { s, ok := r.skills[tool] if !ok { return nil, fmt.Errorf("unknown tool: %s", tool) } return s.Handle(ctx, tool, args) } ``` - [ ] **Step 4: Run — expect PASS** ```bash go test ./internal/registry/... ``` Expected: PASS - [ ] **Step 5: Commit** ```bash git add internal/registry/ git commit -m "feat: add skill registry with tool routing" ``` --- ## Task 7: MCP server **Files:** - Create: `internal/mcp/server.go` - Create: `internal/mcp/server_test.go` - [ ] **Step 1: Write failing tests** Create `internal/mcp/server_test.go`: ```go package mcp_test import ( "bytes" "context" "encoding/json" "net/http" "net/http/httptest" "testing" "github.com/mathiasbq/supervisor/internal/mcp" "github.com/mathiasbq/supervisor/internal/registry" "github.com/stretchr/testify/assert" "github.com/stretchr/testify/require" ) func jsonBody(t *testing.T, v any) *bytes.Buffer { t.Helper() b, err := json.Marshal(v) require.NoError(t, err) return bytes.NewBuffer(b) } func TestMCPInitialize(t *testing.T) { reg := registry.New() srv := mcp.NewServer(reg) req := httptest.NewRequest(http.MethodPost, "/mcp", jsonBody(t, map[string]any{ "jsonrpc": "2.0", "id": 1, "method": "initialize", "params": map[string]any{}, })) req.Header.Set("Content-Type", "application/json") rr := httptest.NewRecorder() srv.ServeHTTP(rr, req) assert.Equal(t, http.StatusOK, rr.Code) var resp map[string]any require.NoError(t, json.Unmarshal(rr.Body.Bytes(), &resp)) result := resp["result"].(map[string]any) assert.Equal(t, "2024-11-05", result["protocolVersion"]) } func TestMCPToolsList(t *testing.T) { reg := registry.New() srv := mcp.NewServer(reg) req := httptest.NewRequest(http.MethodPost, "/mcp", jsonBody(t, map[string]any{ "jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": map[string]any{}, })) req.Header.Set("Content-Type", "application/json") rr := httptest.NewRecorder() srv.ServeHTTP(rr, req) assert.Equal(t, http.StatusOK, rr.Code) var resp map[string]any require.NoError(t, json.Unmarshal(rr.Body.Bytes(), &resp)) result := resp["result"].(map[string]any) assert.NotNil(t, result["tools"]) } func TestMCPUnknownMethod(t *testing.T) { reg := registry.New() srv := mcp.NewServer(reg) req := httptest.NewRequest(http.MethodPost, "/mcp", jsonBody(t, map[string]any{ "jsonrpc": "2.0", "id": 3, "method": "unknown/method", "params": map[string]any{}, })) req.Header.Set("Content-Type", "application/json") rr := httptest.NewRecorder() srv.ServeHTTP(rr, req) assert.Equal(t, http.StatusOK, rr.Code) var resp map[string]any require.NoError(t, json.Unmarshal(rr.Body.Bytes(), &resp)) assert.NotNil(t, resp["error"]) } ``` - [ ] **Step 2: Run — expect FAIL** ```bash go test ./internal/mcp/... ``` Expected: FAIL — mcp package not found - [ ] **Step 3: Create `internal/mcp/server.go`** ```go package mcp import ( "context" "encoding/json" "net/http" "github.com/mathiasbq/supervisor/internal/registry" ) type request struct { JSONRPC string `json:"jsonrpc"` ID any `json:"id"` Method string `json:"method"` Params json.RawMessage `json:"params"` } type response struct { JSONRPC string `json:"jsonrpc"` ID any `json:"id,omitempty"` Result any `json:"result,omitempty"` Error *rpcError `json:"error,omitempty"` } type rpcError struct { Code int `json:"code"` Message string `json:"message"` } // Server is an HTTP handler implementing the MCP JSON-RPC protocol. type Server struct { reg *registry.Registry } func NewServer(reg *registry.Registry) *Server { return &Server{reg: reg} } func (s *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) { var req request if err := json.NewDecoder(r.Body).Decode(&req); err != nil { writeError(w, nil, -32700, "parse error") return } var result any var rpcErr *rpcError switch req.Method { case "initialize": result = map[string]any{ "protocolVersion": "2024-11-05", "capabilities": map[string]any{"tools": map[string]any{}}, "serverInfo": map[string]any{"name": "supervisor", "version": "0.1.0"}, } case "tools/list": result = map[string]any{"tools": s.reg.Tools()} case "tools/call": var p struct { Name string `json:"name"` Arguments json.RawMessage `json:"arguments"` } if err := json.Unmarshal(req.Params, &p); err != nil { rpcErr = &rpcError{Code: -32602, Message: "invalid params"} break } out, err := s.reg.Dispatch(context.Background(), p.Name, p.Arguments) if err != nil { rpcErr = &rpcError{Code: -32000, Message: err.Error()} break } result = map[string]any{ "content": []map[string]any{{"type": "text", "text": string(out)}}, } default: rpcErr = &rpcError{Code: -32601, Message: "method not found: " + req.Method} } w.Header().Set("Content-Type", "application/json") json.NewEncoder(w).Encode(response{ JSONRPC: "2.0", ID: req.ID, Result: result, Error: rpcErr, }) } func writeError(w http.ResponseWriter, id any, code int, msg string) { w.Header().Set("Content-Type", "application/json") json.NewEncoder(w).Encode(response{ JSONRPC: "2.0", ID: id, Error: &rpcError{Code: code, Message: msg}, }) } ``` - [ ] **Step 4: Run — expect PASS** ```bash go test ./internal/mcp/... ``` Expected: PASS - [ ] **Step 5: Commit** ```bash git add internal/mcp/ git commit -m "feat: add MCP HTTP server with JSON-RPC 2.0 transport" ``` --- ## Task 8: TDD skill **Files:** - Create: `internal/skills/tdd/skill.go` - Create: `internal/skills/tdd/handlers.go` - Create: `internal/skills/tdd/handlers_test.go` - [ ] **Step 1: Write failing tests** Create `internal/skills/tdd/handlers_test.go`: ```go package tdd_test import ( "context" "encoding/json" "testing" "github.com/mathiasbq/supervisor/internal/skills/tdd" "github.com/stretchr/testify/assert" "github.com/stretchr/testify/require" ) func TestTDDSkillTools(t *testing.T) { skill := tdd.New(tdd.Config{ SystemPrompt: "supervisor rules", SkillPrompt: "tdd rules", ExecutorFn: nil, // will be set in TestHandle }) tools := skill.Tools() names := make([]string, len(tools)) for i, t := range tools { names[i] = t.Name } assert.ElementsMatch(t, []string{"tdd_red", "tdd_green", "tdd_refactor"}, names) } func TestTDDSkillHandleUnknown(t *testing.T) { skill := tdd.New(tdd.Config{SystemPrompt: "s", SkillPrompt: "t"}) _, err := skill.Handle(context.Background(), "tdd_unknown", json.RawMessage(`{}`)) assert.ErrorContains(t, err, "unknown tool") } func TestTDDRedRequiresProjectRoot(t *testing.T) { skill := tdd.New(tdd.Config{SystemPrompt: "s", SkillPrompt: "t"}) _, err := skill.Handle(context.Background(), "tdd_red", json.RawMessage(`{"spec":"add two numbers"}`)) assert.ErrorContains(t, err, "project_root") } func TestTDDRedRequiresSpec(t *testing.T) { skill := tdd.New(tdd.Config{SystemPrompt: "s", SkillPrompt: "t"}) _, err := skill.Handle(context.Background(), "tdd_red", json.RawMessage(`{"project_root":"/tmp/proj"}`)) assert.ErrorContains(t, err, "spec") } ``` - [ ] **Step 2: Run — expect FAIL** ```bash go test ./internal/skills/tdd/... ``` Expected: FAIL — tdd package not found - [ ] **Step 3: Create `internal/skills/tdd/skill.go`** ```go package tdd import ( "context" "encoding/json" iexec "github.com/mathiasbq/supervisor/internal/exec" "github.com/mathiasbq/supervisor/internal/registry" ) // ExecutorFn allows injecting a test double for the executor. type ExecutorFn func(ctx context.Context, req iexec.Request) (iexec.Result, error) type Config struct { SystemPrompt string SkillPrompt string ExecutorFn ExecutorFn // nil = use real executor DefaultModel string } type Skill struct { cfg Config } func New(cfg Config) *Skill { return &Skill{cfg: cfg} } func (s *Skill) Name() string { return "tdd" } func (s *Skill) Tools() []registry.ToolDef { schema := func(required []string, props map[string]any) json.RawMessage { b, _ := json.Marshal(map[string]any{ "type": "object", "required": required, "properties": props, }) return b } strProp := map[string]any{"type": "string"} return []registry.ToolDef{ { Name: "tdd_red", Description: "Write a failing test for the described behavior. Verifies the test fails before returning.", InputSchema: schema( []string{"project_root", "spec"}, map[string]any{ "project_root": strProp, "spec": strProp, "model": strProp, "test_cmd": strProp, }, ), }, { Name: "tdd_green", Description: "Write minimal implementation to make the test at test_path pass.", InputSchema: schema( []string{"project_root", "test_path"}, map[string]any{ "project_root": strProp, "test_path": strProp, "model": strProp, "test_cmd": strProp, }, ), }, { Name: "tdd_refactor", Description: "Refactor the implementation at impl_path while keeping tests green.", InputSchema: schema( []string{"project_root", "test_path", "impl_path"}, map[string]any{ "project_root": strProp, "test_path": strProp, "impl_path": strProp, "model": strProp, "test_cmd": strProp, }, ), }, } } ``` - [ ] **Step 4: Create `internal/skills/tdd/handlers.go`** ```go package tdd import ( "context" "encoding/json" "fmt" iexec "github.com/mathiasbq/supervisor/internal/exec" ) func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) { switch tool { case "tdd_red": return s.handleRed(ctx, args) case "tdd_green": return s.handleGreen(ctx, args) case "tdd_refactor": return s.handleRefactor(ctx, args) default: return nil, fmt.Errorf("unknown tool: %s", tool) } } type redArgs struct { ProjectRoot string `json:"project_root"` Spec string `json:"spec"` Model string `json:"model"` TestCmd string `json:"test_cmd"` } func (s *Skill) handleRed(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) { var args redArgs if err := json.Unmarshal(raw, &args); err != nil { return nil, fmt.Errorf("parse args: %w", err) } if args.ProjectRoot == "" { return nil, fmt.Errorf("project_root is required") } if args.Spec == "" { return nil, fmt.Errorf("spec is required") } task := fmt.Sprintf( "phase: red\nproject_root: %s\nspec: %s\nmodel: %s\ntest_cmd: %s", args.ProjectRoot, args.Spec, s.resolveModel(args.Model), args.TestCmd, ) return s.execute(ctx, task) } type greenArgs struct { ProjectRoot string `json:"project_root"` TestPath string `json:"test_path"` Model string `json:"model"` TestCmd string `json:"test_cmd"` } func (s *Skill) handleGreen(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) { var args greenArgs if err := json.Unmarshal(raw, &args); err != nil { return nil, fmt.Errorf("parse args: %w", err) } if args.ProjectRoot == "" { return nil, fmt.Errorf("project_root is required") } if args.TestPath == "" { return nil, fmt.Errorf("test_path is required") } task := fmt.Sprintf( "phase: green\nproject_root: %s\ntest_path: %s\nmodel: %s\ntest_cmd: %s", args.ProjectRoot, args.TestPath, s.resolveModel(args.Model), args.TestCmd, ) return s.execute(ctx, task) } type refactorArgs struct { ProjectRoot string `json:"project_root"` TestPath string `json:"test_path"` ImplPath string `json:"impl_path"` Model string `json:"model"` TestCmd string `json:"test_cmd"` } func (s *Skill) handleRefactor(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) { var args refactorArgs if err := json.Unmarshal(raw, &args); err != nil { return nil, fmt.Errorf("parse args: %w", err) } if args.ProjectRoot == "" { return nil, fmt.Errorf("project_root is required") } if args.TestPath == "" { return nil, fmt.Errorf("test_path is required") } if args.ImplPath == "" { return nil, fmt.Errorf("impl_path is required") } task := fmt.Sprintf( "phase: refactor\nproject_root: %s\ntest_path: %s\nimpl_path: %s\nmodel: %s\ntest_cmd: %s", args.ProjectRoot, args.TestPath, args.ImplPath, s.resolveModel(args.Model), args.TestCmd, ) return s.execute(ctx, task) } func (s *Skill) resolveModel(override string) string { if override != "" { return override } return s.cfg.DefaultModel } func (s *Skill) execute(ctx context.Context, task string) (json.RawMessage, error) { req := iexec.Request{ SkillPrompt: s.cfg.SkillPrompt, TaskPrompt: task, } var result iexec.Result var err error if s.cfg.ExecutorFn != nil { result, err = s.cfg.ExecutorFn(ctx, req) } else { return nil, fmt.Errorf("no executor configured") } if err != nil { return nil, err } return json.Marshal(result) } ``` - [ ] **Step 5: Run — expect PASS** ```bash go test ./internal/skills/tdd/... ``` Expected: PASS - [ ] **Step 6: Commit** ```bash git add internal/skills/tdd/ git commit -m "feat: add TDD skill with red/green/refactor handlers" ``` --- ## Task 9: Wire main.go **Files:** - Modify: `cmd/supervisor/main.go` - [ ] **Step 1: Write failing integration test** Create `cmd/supervisor/integration_test.go`: ```go //go:build integration package main_test import ( "bytes" "encoding/json" "net/http" "testing" "time" "github.com/stretchr/testify/assert" "github.com/stretchr/testify/require" ) func TestServerStartsAndResponds(t *testing.T) { // Start the server in a goroutine (assumes it's running on :3200) // This test is run manually after `task supervisor:dev` time.Sleep(500 * time.Millisecond) body, _ := json.Marshal(map[string]any{ "jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": map[string]any{}, }) resp, err := http.Post("http://localhost:3200/mcp", "application/json", bytes.NewBuffer(body)) require.NoError(t, err) defer resp.Body.Close() assert.Equal(t, http.StatusOK, resp.StatusCode) } ``` - [ ] **Step 2: Run unit tests — all still pass** ```bash go test ./... ``` Expected: PASS (integration test skipped without build tag) - [ ] **Step 3: Rewrite `cmd/supervisor/main.go`** ```go package main import ( "log/slog" "net/http" "os" "github.com/mathiasbq/supervisor/internal/config" iexec "github.com/mathiasbq/supervisor/internal/exec" "github.com/mathiasbq/supervisor/internal/mcp" "github.com/mathiasbq/supervisor/internal/registry" "github.com/mathiasbq/supervisor/internal/skills/tdd" ) func main() { logger := slog.New(slog.NewJSONHandler(os.Stdout, nil)) cfg, err := config.Load() if err != nil { logger.Error("load config", "err", err) os.Exit(1) } models, err := config.LoadModels(cfg.ModelsFile) if err != nil { logger.Error("load models", "err", err) os.Exit(1) } systemPrompt, err := os.ReadFile(cfg.ConfigDir + "/CLAUDE.md") if err != nil { logger.Error("read supervisor CLAUDE.md", "err", err) os.Exit(1) } tddPrompt, err := os.ReadFile(cfg.ConfigDir + "/tdd.md") if err != nil { logger.Error("read tdd.md", "err", err) os.Exit(1) } executor := iexec.New(iexec.Config{ SystemPrompt: string(systemPrompt), LiteLLMBaseURL: cfg.LiteLLMBaseURL, LiteLLMAPIKey: cfg.LiteLLMAPIKey, }) reg := registry.New() reg.Register(tdd.New(tdd.Config{ SystemPrompt: string(systemPrompt), SkillPrompt: string(tddPrompt), DefaultModel: models.Resolve("tdd", ""), ExecutorFn: executor.Run, })) srv := mcp.NewServer(reg) mux := http.NewServeMux() mux.Handle("/mcp", srv) addr := ":" + cfg.Port logger.Info("supervisor starting", "addr", addr) if err := http.ListenAndServe(addr, mux); err != nil { logger.Error("server error", "err", err) os.Exit(1) } } ``` - [ ] **Step 4: Run all tests — expect PASS** ```bash go test ./... ``` Expected: PASS - [ ] **Step 5: Verify it builds** ```bash go build ./cmd/supervisor/ ``` Expected: binary produced, no errors - [ ] **Step 6: Commit** ```bash git add cmd/supervisor/main.go cmd/supervisor/integration_test.go git commit -m "feat: wire all components in main.go" ``` --- ## Task 10: Config files **Files:** - Create: `config/supervisor/CLAUDE.md` - Create: `config/supervisor/tdd.md` - Create: `.env.example` - [ ] **Step 1: Create `config/supervisor/CLAUDE.md`** ```markdown # Supervisor Agent You are a supervisor agent. You orchestrate skill workers. You do not converse. You do not explain. You execute and report. ## Output contract — NON-NEGOTIABLE Every response must be raw JSON conforming to the provided schema. No preamble, no markdown, no prose. Raw JSON only. If you cannot produce valid JSON for any reason, output: {"status":"error","phase":"","skill":"","file_path":"","runner_output":"","verified":false,"model_used":"self","message":""} ## Verification — IRON LAW Never trust your own output. Always run the test suite. Use the subprocess exit code as the only source of truth. `verified: true` only when exit code matches the expected outcome for the phase. `verified: false` even if you believe the code is correct. ## Loop prevention - Maximum 3 attempts per phase. - If output is identical across two consecutive attempts, stop immediately. - After 3 failures set status "error" and explain the blocker in message. - Never retry indefinitely. ## Test runner detection Inspect project_root for these signals in order: | Signal | Command | |---------------------------------|----------------------| | go.mod | go test ./... | | package.json | npm test | | pyproject.toml / pytest.ini | pytest | | Cargo.toml | cargo test | | Gemfile | bundle exec rspec | | mix.exs | mix test | If test_cmd is provided in the task, use it directly — skip detection. If runner cannot be determined, return status "error". ## Model delegation When a model is specified in the task, you may use it for code generation by calling LiteLLM at the LITELLM_BASE_URL with the specified model name. Always retain verification yourself — never delegate test execution. Tag model_used in your response with the model that generated the code, or "self" if you generated it directly. ``` - [ ] **Step 2: Create `config/supervisor/tdd.md`** ```markdown # TDD Skill ## Iron Law NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST. ## Red phase - Write exactly one test. One behavior. Name must describe the behavior clearly. - Run the test suite. Confirm the test FAILS. - If the test passes immediately: it tests existing behavior or is vacuous. Return status "fail" with message explaining why the test is wrong. - Do not write any implementation code in this phase. ## Green phase - Write the minimal code to make the failing test pass. Nothing more. - YAGNI: no extra parameters, no future-proofing, no clever abstractions. - Run the test suite. Confirm it PASSES. - If tests fail: fix the implementation, not the test. Max 3 attempts. ## Refactor phase - Improve structure, naming, or clarity only. No new behavior. - Tests must remain green after every change. - If tests break during refactor: revert that change, return status "fail". ``` - [ ] **Step 3: Create `.env.example`** ```bash # Supervisor MCP server SUPERVISOR_PORT=3200 SUPERVISOR_CONFIG_DIR=./config/supervisor SUPERVISOR_MODELS_FILE=./config/models.yaml # LiteLLM gateway (iguana) LITELLM_BASE_URL=http://iguana:4000 LITELLM_API_KEY=your-litellm-master-key ``` - [ ] **Step 4: Run all tests — still pass** ```bash task check ``` Expected: PASS - [ ] **Step 5: Commit** ```bash git add config/ .env.example git commit -m "feat: add supervisor CLAUDE.md, tdd skill prompt, and env example" ``` --- ## Task 11: Taskfile + MCP registration **Files:** - Modify: `Taskfile.yml` - Modify: `.context/mcp.json` - [ ] **Step 1: Add tasks to `Taskfile.yml`** Add these tasks to the existing `Taskfile.yml`: ```yaml supervisor:dev: desc: Run supervisor MCP server (development) cmds: - go run ./cmd/supervisor supervisor:build: desc: Build supervisor binary cmds: - go build -o bin/supervisor ./cmd/supervisor supervisor:test:smoke: desc: Smoke test supervisor via MCP (requires supervisor:dev running) cmds: - | curl -s -X POST http://localhost:${SUPERVISOR_PORT:-3200}/mcp \ -H "Content-Type: application/json" \ -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | jq . ``` - [ ] **Step 2: Register supervisor in `.context/mcp.json`** Replace contents of `.context/mcp.json`: ```json { "mcpServers": { "knowledge": { "url": "http://localhost:3100/mcp", "description": "Project knowledge base — vector + graph retrieval" }, "supervisor": { "url": "http://localhost:3200/mcp", "description": "Skill workers — TDD (red/green/refactor), more coming" } } } ``` - [ ] **Step 3: Sync context files** ```bash task context:sync ``` Expected: CLAUDE.md, AGENTS.md updated with new MCP server entry - [ ] **Step 4: Commit** ```bash git add Taskfile.yml .context/mcp.json CLAUDE.md AGENTS.md git commit -m "feat: register supervisor MCP server and add task targets" ``` --- ## Task 12: End-to-end smoke test Manual verification that all layers connect. - [ ] **Step 1: Start the supervisor** In a terminal: ```bash cp .env.example .env # fill in LITELLM_API_KEY if needed task supervisor:dev ``` Expected: `{"time":"...","level":"INFO","msg":"supervisor starting","addr":":3200"}` - [ ] **Step 2: Verify tools/list** ```bash task supervisor:test:smoke ``` Expected: JSON response with `tdd_red`, `tdd_green`, `tdd_refactor` in `result.tools` - [ ] **Step 3: Verify tdd_red responds** ```bash curl -s -X POST http://localhost:3200/mcp \ -H "Content-Type: application/json" \ -d '{ "jsonrpc":"2.0","id":1,"method":"tools/call", "params":{ "name":"tdd_red", "arguments":{ "project_root":"'"$PWD"'", "spec":"a function that returns the sum of two integers" } } }' | jq . ``` Expected: JSON-RPC response with `result.content[0].text` containing a valid supervisor Result JSON - [ ] **Step 4: Reload Claude Code MCP config** In Claude Code, run `/mcp` to verify the supervisor server appears and its tools are listed. - [ ] **Step 5: Final commit** ```bash git add -p # stage any notes or fixes from smoke test git commit -m "chore: smoke test complete, supervisor MCP server operational" ``` --- ## go.sum After all `go get` commands in the tasks above, run: ```bash go mod tidy git add go.sum git commit -m "chore: tidy go.sum" ```