12 TDD tasks covering: Go module, Result type, Config, Models, Executor (claude --print), Registry, MCP server, TDD skill, main wiring, config files, Taskfile/MCP registration, smoke test. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
47 KiB
Supervisor MCP Server Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (
- [ ]) syntax for tracking.
Goal: Build a Go MCP server that exposes TDD skill workers as tools Claude Code can call natively, with a Claude supervisor instance handling orchestration and verification.
Architecture: A thin Go HTTP server routes MCP tool calls to skill handlers. Each handler spawns claude --print with --bare --system-prompt --json-schema --permission-mode bypassPermissions, injecting supervisor rules and skill discipline. The spawned Claude instance uses its own Bash/Read/Write tools to run tests and generate code, returning a structured JSON result. LiteLLM on iguana handles Ollama model delegation.
Tech Stack: Go 1.26, net/http (stdlib), gopkg.in/yaml.v3, testify, claude CLI, LiteLLM
File Map
| File | Responsibility |
|---|---|
go.mod |
Module definition |
cmd/supervisor/main.go |
Entry point, wires all dependencies |
internal/config/config.go |
Env vars → typed Config struct |
internal/config/config_test.go |
Config parsing tests |
internal/config/models.go |
YAML model routing table |
internal/config/models_test.go |
Model resolution tests |
internal/exec/result.go |
Result struct + JSON schema constant |
internal/exec/result_test.go |
JSON parsing/validation tests |
internal/exec/executor.go |
Spawn claude, capture output, parse result |
internal/exec/executor_test.go |
Executor tests using fake binary |
internal/registry/registry.go |
Skill interface + registry |
internal/registry/registry_test.go |
Registry routing tests |
internal/mcp/server.go |
HTTP/JSON-RPC MCP server |
internal/mcp/server_test.go |
MCP protocol tests |
internal/skills/tdd/skill.go |
Tool definitions (MCP schema) |
internal/skills/tdd/handlers.go |
tdd_red, tdd_green, tdd_refactor |
internal/skills/tdd/handlers_test.go |
Handler prompt-building tests |
config/supervisor/CLAUDE.md |
Supervisor orchestration rules |
config/supervisor/tdd.md |
TDD iron law (injected per call) |
config/models.yaml |
Per-skill model routing table |
.env.example |
Documented env vars |
Taskfile.yml |
Add supervisor:dev and supervisor:build tasks |
.context/mcp.json |
Register supervisor as MCP server |
Task 1: Go module + project skeleton
Files:
-
Create:
go.mod -
Create:
cmd/supervisor/main.go -
Step 1: Write a failing test to verify the binary compiles and exits cleanly
Create cmd/supervisor/main_test.go:
package main
import (
"os/exec"
"testing"
)
func TestBinaryCompiles(t *testing.T) {
cmd := exec.Command("go", "build", "./...")
out, err := cmd.CombinedOutput()
if err != nil {
t.Fatalf("build failed: %s\n%s", err, out)
}
}
- Step 2: Run test — expect it to fail (no go.mod yet)
go test ./...
Expected: error — no go.mod
- Step 3: Initialize the module
go mod init github.com/mathiasbq/supervisor
- Step 4: Create
cmd/supervisor/main.go
package main
import (
"fmt"
"os"
)
func main() {
fmt.Fprintln(os.Stderr, "supervisor: not yet implemented")
os.Exit(1)
}
- Step 5: Run test — expect it to pass
go test ./cmd/supervisor/...
Expected: PASS
- Step 6: Commit
git add go.mod cmd/
git commit -m "chore: initialize go module and cmd skeleton"
Task 2: Result type
Files:
-
Create:
internal/exec/result.go -
Create:
internal/exec/result_test.go -
Step 1: Write failing tests for result parsing
Create internal/exec/result_test.go:
package exec_test
import (
"encoding/json"
"testing"
"github.com/mathiasbq/supervisor/internal/exec"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestResultParsesValidJSON(t *testing.T) {
raw := `{
"status": "pass",
"phase": "red",
"skill": "tdd",
"file_path": "/tmp/foo_test.go",
"runner_output": "--- FAIL: TestFoo",
"verified": true,
"model_used": "self",
"message": "test fails as expected"
}`
var r exec.Result
require.NoError(t, json.Unmarshal([]byte(raw), &r))
assert.Equal(t, "pass", r.Status)
assert.Equal(t, "red", r.Phase)
assert.True(t, r.Verified)
}
func TestResultValidation(t *testing.T) {
tests := []struct {
name string
result exec.Result
wantErr bool
}{
{
name: "valid pass result",
result: exec.Result{
Status: "pass", Phase: "red", Skill: "tdd",
FilePath: "/tmp/x_test.go", RunnerOutput: "FAIL",
Verified: true, ModelUsed: "self", Message: "ok",
},
wantErr: false,
},
{
name: "empty status",
result: exec.Result{Phase: "red", Skill: "tdd"},
wantErr: true,
},
{
name: "invalid status",
result: exec.Result{Status: "unknown", Phase: "red", Skill: "tdd"},
wantErr: true,
},
{
name: "invalid phase",
result: exec.Result{Status: "pass", Phase: "bad", Skill: "tdd"},
wantErr: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
err := tt.result.Validate()
if tt.wantErr {
assert.Error(t, err)
} else {
assert.NoError(t, err)
}
})
}
}
- Step 2: Run — expect FAIL (package does not exist)
go test ./internal/exec/...
Expected: FAIL — cannot find package
- Step 3: Install testify
go get github.com/stretchr/testify@latest
- Step 4: Create
internal/exec/result.go
package exec
import (
"errors"
"strings"
)
// Result is the structured JSON output from every supervisor invocation.
// The JSON schema constant is passed to claude via --json-schema so Claude
// validates its own output before returning.
type Result struct {
Status string `json:"status"` // pass | fail | error
Phase string `json:"phase"` // red | green | refactor
Skill string `json:"skill"` // tdd | review | ...
FilePath string `json:"file_path"` // absolute path to generated file
RunnerOutput string `json:"runner_output"` // raw stdout+stderr from test runner
Verified bool `json:"verified"` // based on exit code, never self-report
ModelUsed string `json:"model_used"` // model name or "self"
Message string `json:"message"` // one sentence summary
}
var validStatuses = map[string]bool{"pass": true, "fail": true, "error": true}
var validPhases = map[string]bool{"red": true, "green": true, "refactor": true}
func (r Result) Validate() error {
var errs []string
if !validStatuses[r.Status] {
errs = append(errs, "status must be pass|fail|error, got: "+r.Status)
}
if !validPhases[r.Phase] {
errs = append(errs, "phase must be red|green|refactor, got: "+r.Phase)
}
if r.Skill == "" {
errs = append(errs, "skill is required")
}
if len(errs) > 0 {
return errors.New(strings.Join(errs, "; "))
}
return nil
}
// Schema is passed to claude --json-schema to enforce structured output.
const Schema = `{
"type": "object",
"required": ["status","phase","skill","file_path","runner_output","verified","model_used","message"],
"properties": {
"status": {"type": "string", "enum": ["pass","fail","error"]},
"phase": {"type": "string", "enum": ["red","green","refactor"]},
"skill": {"type": "string"},
"file_path": {"type": "string"},
"runner_output": {"type": "string"},
"verified": {"type": "boolean"},
"model_used": {"type": "string"},
"message": {"type": "string"}
}
}`
- Step 5: Run — expect PASS
go test ./internal/exec/...
Expected: PASS
- Step 6: Commit
git add internal/exec/result.go internal/exec/result_test.go
git commit -m "feat: add Result type with JSON schema and validation"
Task 3: Config package
Files:
-
Create:
internal/config/config.go -
Create:
internal/config/config_test.go -
Step 1: Write failing tests
Create internal/config/config_test.go:
package config_test
import (
"testing"
"github.com/mathiasbq/supervisor/internal/config"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestLoadDefaults(t *testing.T) {
t.Setenv("SUPERVISOR_PORT", "")
t.Setenv("LITELLM_BASE_URL", "")
t.Setenv("LITELLM_API_KEY", "")
t.Setenv("SUPERVISOR_CONFIG_DIR", "")
cfg, err := config.Load()
require.NoError(t, err)
assert.Equal(t, "3200", cfg.Port)
assert.Equal(t, "http://iguana:4000", cfg.LiteLLMBaseURL)
assert.Equal(t, "./config/supervisor", cfg.ConfigDir)
}
func TestLoadFromEnv(t *testing.T) {
t.Setenv("SUPERVISOR_PORT", "4000")
t.Setenv("LITELLM_BASE_URL", "http://localhost:4000")
t.Setenv("LITELLM_API_KEY", "test-key")
t.Setenv("SUPERVISOR_CONFIG_DIR", "/etc/supervisor")
cfg, err := config.Load()
require.NoError(t, err)
assert.Equal(t, "4000", cfg.Port)
assert.Equal(t, "http://localhost:4000", cfg.LiteLLMBaseURL)
assert.Equal(t, "test-key", cfg.LiteLLMAPIKey)
assert.Equal(t, "/etc/supervisor", cfg.ConfigDir)
}
- Step 2: Run — expect FAIL
go test ./internal/config/...
Expected: FAIL — cannot find package
- Step 3: Create
internal/config/config.go
package config
import "os"
type Config struct {
Port string // SUPERVISOR_PORT, default 3200
LiteLLMBaseURL string // LITELLM_BASE_URL, default http://iguana:4000
LiteLLMAPIKey string // LITELLM_API_KEY
ConfigDir string // SUPERVISOR_CONFIG_DIR, default ./config/supervisor
ModelsFile string // SUPERVISOR_MODELS_FILE, default <ConfigDir>/../models.yaml
}
func Load() (Config, error) {
cfg := Config{
Port: envOr("SUPERVISOR_PORT", "3200"),
LiteLLMBaseURL: envOr("LITELLM_BASE_URL", "http://iguana:4000"),
LiteLLMAPIKey: os.Getenv("LITELLM_API_KEY"),
ConfigDir: envOr("SUPERVISOR_CONFIG_DIR", "./config/supervisor"),
}
cfg.ModelsFile = envOr("SUPERVISOR_MODELS_FILE", cfg.ConfigDir+"/../models.yaml")
return cfg, nil
}
func envOr(key, def string) string {
if v := os.Getenv(key); v != "" {
return v
}
return def
}
- Step 4: Run — expect PASS
go test ./internal/config/...
Expected: PASS
- Step 5: Commit
git add internal/config/
git commit -m "feat: add config package with env-var loading"
Task 4: Models config
Files:
-
Create:
internal/config/models.go -
Create:
internal/config/models_test.go -
Create:
config/models.yaml -
Step 1: Write failing tests
Append to internal/config/models_test.go (new file):
package config_test
import (
"os"
"path/filepath"
"testing"
"github.com/mathiasbq/supervisor/internal/config"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestModelsResolve(t *testing.T) {
yaml := `
default: ollama/default-model
skills:
tdd: ollama/qwen3-coder-30b-tuned
review: ollama/devstral-tuned
`
f := filepath.Join(t.TempDir(), "models.yaml")
require.NoError(t, os.WriteFile(f, []byte(yaml), 0644))
m, err := config.LoadModels(f)
require.NoError(t, err)
assert.Equal(t, "ollama/qwen3-coder-30b-tuned", m.Resolve("tdd", ""))
assert.Equal(t, "ollama/devstral-tuned", m.Resolve("review", ""))
assert.Equal(t, "ollama/default-model", m.Resolve("unknown", ""))
}
func TestModelsOverride(t *testing.T) {
yaml := `
default: ollama/default-model
skills:
tdd: ollama/qwen3-coder-30b-tuned
`
f := filepath.Join(t.TempDir(), "models.yaml")
require.NoError(t, os.WriteFile(f, []byte(yaml), 0644))
m, err := config.LoadModels(f)
require.NoError(t, err)
// caller override wins over skill default
assert.Equal(t, "anthropic/claude-sonnet-4-6", m.Resolve("tdd", "anthropic/claude-sonnet-4-6"))
}
- Step 2: Run — expect FAIL
go test ./internal/config/...
Expected: FAIL — Models type undefined
- Step 3: Install yaml dependency
go get gopkg.in/yaml.v3
- Step 4: Create
internal/config/models.go
package config
import (
"fmt"
"os"
"gopkg.in/yaml.v3"
)
type modelsFile struct {
Default string `yaml:"default"`
Skills map[string]string `yaml:"skills"`
}
type Models struct {
data modelsFile
}
func LoadModels(path string) (Models, error) {
raw, err := os.ReadFile(path)
if err != nil {
return Models{}, fmt.Errorf("load models: %w", err)
}
var f modelsFile
if err := yaml.Unmarshal(raw, &f); err != nil {
return Models{}, fmt.Errorf("parse models: %w", err)
}
return Models{data: f}, nil
}
// Resolve returns the model for a skill, respecting three-layer priority:
// 1. override (from MCP call) — highest
// 2. per-skill default from models.yaml
// 3. global default
func (m Models) Resolve(skill, override string) string {
if override != "" {
return override
}
if model, ok := m.data.Skills[skill]; ok {
return model
}
return m.data.Default
}
- Step 5: Run — expect PASS
go test ./internal/config/...
Expected: PASS
- Step 6: Create
config/models.yaml
# Model routing table — three-layer priority:
# 1. model param in MCP tool call (caller override)
# 2. per-skill entry here
# 3. default (fallback)
default: ollama/qwen3-coder-30b-tuned
skills:
tdd: ollama/qwen3-coder-30b-tuned
review: ollama/devstral-tuned
debug: ollama/deepseek-r1-tuned
- Step 7: Commit
git add internal/config/models.go internal/config/models_test.go config/models.yaml
git commit -m "feat: add model routing table with three-layer priority"
Task 5: Exec package
Files:
-
Create:
internal/exec/executor.go -
Create:
internal/exec/executor_test.go -
Modify:
internal/exec/result.go(add Schema constant — already done in Task 2) -
Step 1: Write failing tests
Create internal/exec/executor_test.go:
package exec_test
import (
"context"
"os"
"path/filepath"
"testing"
"time"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// fakeClaudePath writes a shell script that prints a fixed JSON result
// and returns its path. Used to test executor without invoking real claude.
func fakeClaudePath(t *testing.T, output string, exitCode int) string {
t.Helper()
dir := t.TempDir()
script := filepath.Join(dir, "claude")
var content string
if exitCode != 0 {
content = "#!/bin/sh\necho 'error' >&2\nexit 1\n"
} else {
content = "#!/bin/sh\necho '" + output + "'\n"
}
require.NoError(t, os.WriteFile(script, []byte(content), 0755))
return script
}
func TestExecutorParsesValidResult(t *testing.T) {
validJSON := `{"status":"pass","phase":"red","skill":"tdd","file_path":"/tmp/x_test.go","runner_output":"FAIL","verified":true,"model_used":"self","message":"ok"}`
claude := fakeClaudePath(t, validJSON, 0)
ex := iexec.New(iexec.Config{
ClaudeBinary: claude,
SystemPrompt: "you are a supervisor",
Timeout: 5 * time.Second,
})
result, err := ex.Run(context.Background(), iexec.Request{
SkillPrompt: "tdd rules",
TaskPrompt: "run red phase",
})
require.NoError(t, err)
assert.Equal(t, "pass", result.Status)
assert.True(t, result.Verified)
}
func TestExecutorReturnsErrorOnNonZeroExit(t *testing.T) {
claude := fakeClaudePath(t, "", 1)
ex := iexec.New(iexec.Config{
ClaudeBinary: claude,
SystemPrompt: "you are a supervisor",
Timeout: 5 * time.Second,
})
_, err := ex.Run(context.Background(), iexec.Request{TaskPrompt: "fail"})
assert.Error(t, err)
}
func TestExecutorTimesOut(t *testing.T) {
dir := t.TempDir()
script := filepath.Join(dir, "claude")
require.NoError(t, os.WriteFile(script, []byte("#!/bin/sh\nsleep 60\n"), 0755))
ex := iexec.New(iexec.Config{
ClaudeBinary: script,
SystemPrompt: "you are a supervisor",
Timeout: 100 * time.Millisecond,
})
_, err := ex.Run(context.Background(), iexec.Request{TaskPrompt: "slow"})
assert.ErrorContains(t, err, "timeout")
}
- Step 2: Verify the test file compiles
go vet ./internal/exec/...
Expected: no errors
- Step 3: Run — expect FAIL
go test ./internal/exec/...
Expected: FAIL — Executor type undefined
- Step 4: Create
internal/exec/executor.go
package exec
import (
"bytes"
"context"
"encoding/json"
"fmt"
"os/exec"
"strings"
"time"
)
// Config holds executor configuration.
type Config struct {
ClaudeBinary string // path to claude binary, defaults to "claude"
SystemPrompt string // contents of supervisor CLAUDE.md
Timeout time.Duration // per-invocation timeout, default 120s
LiteLLMBaseURL string // passed to Claude so it can delegate to Ollama
LiteLLMAPIKey string // passed to Claude for LiteLLM auth
}
// Request is the input to a single supervisor invocation.
type Request struct {
SkillPrompt string // skill-specific discipline (e.g. tdd.md contents)
TaskPrompt string // the specific task (phase, project_root, spec, model)
Model string // resolved model name, passed in task prompt
Tools string // comma-separated allowed tools, default "Bash,Read,Write"
}
// Executor spawns a claude instance and captures its structured JSON output.
type Executor struct {
cfg Config
}
func New(cfg Config) *Executor {
if cfg.ClaudeBinary == "" {
cfg.ClaudeBinary = "claude"
}
if cfg.Timeout == 0 {
cfg.Timeout = 120 * time.Second
}
return &Executor{cfg: cfg}
}
func (e *Executor) Run(ctx context.Context, req Request) (Result, error) {
ctx, cancel := context.WithTimeout(ctx, e.cfg.Timeout)
defer cancel()
tools := req.Tools
if tools == "" {
tools = "Bash,Read,Write"
}
// Build the full prompt: system rules + skill rules + task + infra context
litellmCtx := fmt.Sprintf(
"LITELLM_BASE_URL: %s\nLITELLM_API_KEY: %s",
e.cfg.LiteLLMBaseURL, e.cfg.LiteLLMAPIKey,
)
prompt := strings.Join([]string{
e.cfg.SystemPrompt,
"---",
req.SkillPrompt,
"---",
litellmCtx,
"---",
req.TaskPrompt,
}, "\n\n")
args := []string{
"--print",
"--bare",
"--permission-mode", "bypassPermissions",
"--tools", tools,
"--json-schema", Schema,
"--output-format", "text",
prompt,
}
cmd := exec.CommandContext(ctx, e.cfg.ClaudeBinary, args...)
var stdout, stderr bytes.Buffer
cmd.Stdout = &stdout
cmd.Stderr = &stderr
if err := cmd.Run(); err != nil {
if ctx.Err() != nil {
return Result{}, fmt.Errorf("timeout after %s", e.cfg.Timeout)
}
return Result{}, fmt.Errorf("claude exited with error: %w — stderr: %s", err, stderr.String())
}
var r Result
if err := json.Unmarshal(stdout.Bytes(), &r); err != nil {
return Result{}, fmt.Errorf("parse result JSON: %w — raw output: %s", err, stdout.String())
}
if err := r.Validate(); err != nil {
return Result{}, fmt.Errorf("invalid result: %w", err)
}
return r, nil
}
- Step 5: Run — expect PASS
go test ./internal/exec/...
Expected: PASS
- Step 6: Commit
git add internal/exec/executor.go internal/exec/executor_test.go
git commit -m "feat: add executor that spawns claude and parses JSON result"
Task 6: Registry + Skill interface
Files:
-
Create:
internal/registry/registry.go -
Create:
internal/registry/registry_test.go -
Step 1: Write failing tests
Create internal/registry/registry_test.go:
package registry_test
import (
"context"
"encoding/json"
"testing"
"github.com/mathiasbq/supervisor/internal/registry"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
type stubSkill struct {
name string
}
func (s stubSkill) Name() string { return s.name }
func (s stubSkill) Tools() []registry.ToolDef {
return []registry.ToolDef{{
Name: s.name + "_tool",
Description: "stub tool",
InputSchema: json.RawMessage(`{"type":"object","properties":{}}`),
}}
}
func (s stubSkill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
return json.RawMessage(`{"ok":true}`), nil
}
func TestRegistryRoutes(t *testing.T) {
r := registry.New()
r.Register(stubSkill{name: "tdd"})
result, err := r.Dispatch(context.Background(), "tdd_tool", json.RawMessage(`{}`))
require.NoError(t, err)
assert.JSONEq(t, `{"ok":true}`, string(result))
}
func TestRegistryUnknownTool(t *testing.T) {
r := registry.New()
_, err := r.Dispatch(context.Background(), "unknown_tool", json.RawMessage(`{}`))
assert.ErrorContains(t, err, "unknown tool")
}
func TestRegistryListsTools(t *testing.T) {
r := registry.New()
r.Register(stubSkill{name: "tdd"})
tools := r.Tools()
require.Len(t, tools, 1)
assert.Equal(t, "tdd_tool", tools[0].Name)
}
- Step 2: Run — expect FAIL
go test ./internal/registry/...
Expected: FAIL — registry package not found
- Step 3: Create
internal/registry/registry.go
package registry
import (
"context"
"encoding/json"
"fmt"
)
// ToolDef describes a single MCP tool exposed by a skill.
type ToolDef struct {
Name string `json:"name"`
Description string `json:"description"`
InputSchema json.RawMessage `json:"inputSchema"`
}
// Skill is implemented by each skill package (tdd, review, etc.).
type Skill interface {
Name() string
Tools() []ToolDef
Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error)
}
// Registry routes MCP tool calls to the correct skill handler.
type Registry struct {
skills map[string]Skill // tool name → skill
tools []ToolDef
}
func New() *Registry {
return &Registry{skills: make(map[string]Skill)}
}
func (r *Registry) Register(s Skill) {
for _, t := range s.Tools() {
r.skills[t.Name] = s
r.tools = append(r.tools, t)
}
}
func (r *Registry) Tools() []ToolDef {
return r.tools
}
func (r *Registry) Dispatch(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
s, ok := r.skills[tool]
if !ok {
return nil, fmt.Errorf("unknown tool: %s", tool)
}
return s.Handle(ctx, tool, args)
}
- Step 4: Run — expect PASS
go test ./internal/registry/...
Expected: PASS
- Step 5: Commit
git add internal/registry/
git commit -m "feat: add skill registry with tool routing"
Task 7: MCP server
Files:
-
Create:
internal/mcp/server.go -
Create:
internal/mcp/server_test.go -
Step 1: Write failing tests
Create internal/mcp/server_test.go:
package mcp_test
import (
"bytes"
"context"
"encoding/json"
"net/http"
"net/http/httptest"
"testing"
"github.com/mathiasbq/supervisor/internal/mcp"
"github.com/mathiasbq/supervisor/internal/registry"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func jsonBody(t *testing.T, v any) *bytes.Buffer {
t.Helper()
b, err := json.Marshal(v)
require.NoError(t, err)
return bytes.NewBuffer(b)
}
func TestMCPInitialize(t *testing.T) {
reg := registry.New()
srv := mcp.NewServer(reg)
req := httptest.NewRequest(http.MethodPost, "/mcp", jsonBody(t, map[string]any{
"jsonrpc": "2.0",
"id": 1,
"method": "initialize",
"params": map[string]any{},
}))
req.Header.Set("Content-Type", "application/json")
rr := httptest.NewRecorder()
srv.ServeHTTP(rr, req)
assert.Equal(t, http.StatusOK, rr.Code)
var resp map[string]any
require.NoError(t, json.Unmarshal(rr.Body.Bytes(), &resp))
result := resp["result"].(map[string]any)
assert.Equal(t, "2024-11-05", result["protocolVersion"])
}
func TestMCPToolsList(t *testing.T) {
reg := registry.New()
srv := mcp.NewServer(reg)
req := httptest.NewRequest(http.MethodPost, "/mcp", jsonBody(t, map[string]any{
"jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": map[string]any{},
}))
req.Header.Set("Content-Type", "application/json")
rr := httptest.NewRecorder()
srv.ServeHTTP(rr, req)
assert.Equal(t, http.StatusOK, rr.Code)
var resp map[string]any
require.NoError(t, json.Unmarshal(rr.Body.Bytes(), &resp))
result := resp["result"].(map[string]any)
assert.NotNil(t, result["tools"])
}
func TestMCPUnknownMethod(t *testing.T) {
reg := registry.New()
srv := mcp.NewServer(reg)
req := httptest.NewRequest(http.MethodPost, "/mcp", jsonBody(t, map[string]any{
"jsonrpc": "2.0", "id": 3, "method": "unknown/method", "params": map[string]any{},
}))
req.Header.Set("Content-Type", "application/json")
rr := httptest.NewRecorder()
srv.ServeHTTP(rr, req)
assert.Equal(t, http.StatusOK, rr.Code)
var resp map[string]any
require.NoError(t, json.Unmarshal(rr.Body.Bytes(), &resp))
assert.NotNil(t, resp["error"])
}
- Step 2: Run — expect FAIL
go test ./internal/mcp/...
Expected: FAIL — mcp package not found
- Step 3: Create
internal/mcp/server.go
package mcp
import (
"context"
"encoding/json"
"net/http"
"github.com/mathiasbq/supervisor/internal/registry"
)
type request struct {
JSONRPC string `json:"jsonrpc"`
ID any `json:"id"`
Method string `json:"method"`
Params json.RawMessage `json:"params"`
}
type response struct {
JSONRPC string `json:"jsonrpc"`
ID any `json:"id,omitempty"`
Result any `json:"result,omitempty"`
Error *rpcError `json:"error,omitempty"`
}
type rpcError struct {
Code int `json:"code"`
Message string `json:"message"`
}
// Server is an HTTP handler implementing the MCP JSON-RPC protocol.
type Server struct {
reg *registry.Registry
}
func NewServer(reg *registry.Registry) *Server {
return &Server{reg: reg}
}
func (s *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
var req request
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
writeError(w, nil, -32700, "parse error")
return
}
var result any
var rpcErr *rpcError
switch req.Method {
case "initialize":
result = map[string]any{
"protocolVersion": "2024-11-05",
"capabilities": map[string]any{"tools": map[string]any{}},
"serverInfo": map[string]any{"name": "supervisor", "version": "0.1.0"},
}
case "tools/list":
result = map[string]any{"tools": s.reg.Tools()}
case "tools/call":
var p struct {
Name string `json:"name"`
Arguments json.RawMessage `json:"arguments"`
}
if err := json.Unmarshal(req.Params, &p); err != nil {
rpcErr = &rpcError{Code: -32602, Message: "invalid params"}
break
}
out, err := s.reg.Dispatch(context.Background(), p.Name, p.Arguments)
if err != nil {
rpcErr = &rpcError{Code: -32000, Message: err.Error()}
break
}
result = map[string]any{
"content": []map[string]any{{"type": "text", "text": string(out)}},
}
default:
rpcErr = &rpcError{Code: -32601, Message: "method not found: " + req.Method}
}
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(response{
JSONRPC: "2.0",
ID: req.ID,
Result: result,
Error: rpcErr,
})
}
func writeError(w http.ResponseWriter, id any, code int, msg string) {
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(response{
JSONRPC: "2.0",
ID: id,
Error: &rpcError{Code: code, Message: msg},
})
}
- Step 4: Run — expect PASS
go test ./internal/mcp/...
Expected: PASS
- Step 5: Commit
git add internal/mcp/
git commit -m "feat: add MCP HTTP server with JSON-RPC 2.0 transport"
Task 8: TDD skill
Files:
-
Create:
internal/skills/tdd/skill.go -
Create:
internal/skills/tdd/handlers.go -
Create:
internal/skills/tdd/handlers_test.go -
Step 1: Write failing tests
Create internal/skills/tdd/handlers_test.go:
package tdd_test
import (
"context"
"encoding/json"
"testing"
"github.com/mathiasbq/supervisor/internal/skills/tdd"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestTDDSkillTools(t *testing.T) {
skill := tdd.New(tdd.Config{
SystemPrompt: "supervisor rules",
SkillPrompt: "tdd rules",
ExecutorFn: nil, // will be set in TestHandle
})
tools := skill.Tools()
names := make([]string, len(tools))
for i, t := range tools {
names[i] = t.Name
}
assert.ElementsMatch(t, []string{"tdd_red", "tdd_green", "tdd_refactor"}, names)
}
func TestTDDSkillHandleUnknown(t *testing.T) {
skill := tdd.New(tdd.Config{SystemPrompt: "s", SkillPrompt: "t"})
_, err := skill.Handle(context.Background(), "tdd_unknown", json.RawMessage(`{}`))
assert.ErrorContains(t, err, "unknown tool")
}
func TestTDDRedRequiresProjectRoot(t *testing.T) {
skill := tdd.New(tdd.Config{SystemPrompt: "s", SkillPrompt: "t"})
_, err := skill.Handle(context.Background(), "tdd_red", json.RawMessage(`{"spec":"add two numbers"}`))
assert.ErrorContains(t, err, "project_root")
}
func TestTDDRedRequiresSpec(t *testing.T) {
skill := tdd.New(tdd.Config{SystemPrompt: "s", SkillPrompt: "t"})
_, err := skill.Handle(context.Background(), "tdd_red", json.RawMessage(`{"project_root":"/tmp/proj"}`))
assert.ErrorContains(t, err, "spec")
}
- Step 2: Run — expect FAIL
go test ./internal/skills/tdd/...
Expected: FAIL — tdd package not found
- Step 3: Create
internal/skills/tdd/skill.go
package tdd
import (
"context"
"encoding/json"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/registry"
)
// ExecutorFn allows injecting a test double for the executor.
type ExecutorFn func(ctx context.Context, req iexec.Request) (iexec.Result, error)
type Config struct {
SystemPrompt string
SkillPrompt string
ExecutorFn ExecutorFn // nil = use real executor
DefaultModel string
}
type Skill struct {
cfg Config
}
func New(cfg Config) *Skill {
return &Skill{cfg: cfg}
}
func (s *Skill) Name() string { return "tdd" }
func (s *Skill) Tools() []registry.ToolDef {
schema := func(required []string, props map[string]any) json.RawMessage {
b, _ := json.Marshal(map[string]any{
"type": "object",
"required": required,
"properties": props,
})
return b
}
strProp := map[string]any{"type": "string"}
return []registry.ToolDef{
{
Name: "tdd_red",
Description: "Write a failing test for the described behavior. Verifies the test fails before returning.",
InputSchema: schema(
[]string{"project_root", "spec"},
map[string]any{
"project_root": strProp,
"spec": strProp,
"model": strProp,
"test_cmd": strProp,
},
),
},
{
Name: "tdd_green",
Description: "Write minimal implementation to make the test at test_path pass.",
InputSchema: schema(
[]string{"project_root", "test_path"},
map[string]any{
"project_root": strProp,
"test_path": strProp,
"model": strProp,
"test_cmd": strProp,
},
),
},
{
Name: "tdd_refactor",
Description: "Refactor the implementation at impl_path while keeping tests green.",
InputSchema: schema(
[]string{"project_root", "test_path", "impl_path"},
map[string]any{
"project_root": strProp,
"test_path": strProp,
"impl_path": strProp,
"model": strProp,
"test_cmd": strProp,
},
),
},
}
}
- Step 4: Create
internal/skills/tdd/handlers.go
package tdd
import (
"context"
"encoding/json"
"fmt"
iexec "github.com/mathiasbq/supervisor/internal/exec"
)
func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
switch tool {
case "tdd_red":
return s.handleRed(ctx, args)
case "tdd_green":
return s.handleGreen(ctx, args)
case "tdd_refactor":
return s.handleRefactor(ctx, args)
default:
return nil, fmt.Errorf("unknown tool: %s", tool)
}
}
type redArgs struct {
ProjectRoot string `json:"project_root"`
Spec string `json:"spec"`
Model string `json:"model"`
TestCmd string `json:"test_cmd"`
}
func (s *Skill) handleRed(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) {
var args redArgs
if err := json.Unmarshal(raw, &args); err != nil {
return nil, fmt.Errorf("parse args: %w", err)
}
if args.ProjectRoot == "" {
return nil, fmt.Errorf("project_root is required")
}
if args.Spec == "" {
return nil, fmt.Errorf("spec is required")
}
task := fmt.Sprintf(
"phase: red\nproject_root: %s\nspec: %s\nmodel: %s\ntest_cmd: %s",
args.ProjectRoot, args.Spec, s.resolveModel(args.Model), args.TestCmd,
)
return s.execute(ctx, task)
}
type greenArgs struct {
ProjectRoot string `json:"project_root"`
TestPath string `json:"test_path"`
Model string `json:"model"`
TestCmd string `json:"test_cmd"`
}
func (s *Skill) handleGreen(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) {
var args greenArgs
if err := json.Unmarshal(raw, &args); err != nil {
return nil, fmt.Errorf("parse args: %w", err)
}
if args.ProjectRoot == "" {
return nil, fmt.Errorf("project_root is required")
}
if args.TestPath == "" {
return nil, fmt.Errorf("test_path is required")
}
task := fmt.Sprintf(
"phase: green\nproject_root: %s\ntest_path: %s\nmodel: %s\ntest_cmd: %s",
args.ProjectRoot, args.TestPath, s.resolveModel(args.Model), args.TestCmd,
)
return s.execute(ctx, task)
}
type refactorArgs struct {
ProjectRoot string `json:"project_root"`
TestPath string `json:"test_path"`
ImplPath string `json:"impl_path"`
Model string `json:"model"`
TestCmd string `json:"test_cmd"`
}
func (s *Skill) handleRefactor(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) {
var args refactorArgs
if err := json.Unmarshal(raw, &args); err != nil {
return nil, fmt.Errorf("parse args: %w", err)
}
if args.ProjectRoot == "" {
return nil, fmt.Errorf("project_root is required")
}
if args.TestPath == "" {
return nil, fmt.Errorf("test_path is required")
}
if args.ImplPath == "" {
return nil, fmt.Errorf("impl_path is required")
}
task := fmt.Sprintf(
"phase: refactor\nproject_root: %s\ntest_path: %s\nimpl_path: %s\nmodel: %s\ntest_cmd: %s",
args.ProjectRoot, args.TestPath, args.ImplPath, s.resolveModel(args.Model), args.TestCmd,
)
return s.execute(ctx, task)
}
func (s *Skill) resolveModel(override string) string {
if override != "" {
return override
}
return s.cfg.DefaultModel
}
func (s *Skill) execute(ctx context.Context, task string) (json.RawMessage, error) {
req := iexec.Request{
SkillPrompt: s.cfg.SkillPrompt,
TaskPrompt: task,
}
var result iexec.Result
var err error
if s.cfg.ExecutorFn != nil {
result, err = s.cfg.ExecutorFn(ctx, req)
} else {
return nil, fmt.Errorf("no executor configured")
}
if err != nil {
return nil, err
}
return json.Marshal(result)
}
- Step 5: Run — expect PASS
go test ./internal/skills/tdd/...
Expected: PASS
- Step 6: Commit
git add internal/skills/tdd/
git commit -m "feat: add TDD skill with red/green/refactor handlers"
Task 9: Wire main.go
Files:
-
Modify:
cmd/supervisor/main.go -
Step 1: Write failing integration test
Create cmd/supervisor/integration_test.go:
//go:build integration
package main_test
import (
"bytes"
"encoding/json"
"net/http"
"testing"
"time"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestServerStartsAndResponds(t *testing.T) {
// Start the server in a goroutine (assumes it's running on :3200)
// This test is run manually after `task supervisor:dev`
time.Sleep(500 * time.Millisecond)
body, _ := json.Marshal(map[string]any{
"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": map[string]any{},
})
resp, err := http.Post("http://localhost:3200/mcp", "application/json", bytes.NewBuffer(body))
require.NoError(t, err)
defer resp.Body.Close()
assert.Equal(t, http.StatusOK, resp.StatusCode)
}
- Step 2: Run unit tests — all still pass
go test ./...
Expected: PASS (integration test skipped without build tag)
- Step 3: Rewrite
cmd/supervisor/main.go
package main
import (
"log/slog"
"net/http"
"os"
"github.com/mathiasbq/supervisor/internal/config"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/mcp"
"github.com/mathiasbq/supervisor/internal/registry"
"github.com/mathiasbq/supervisor/internal/skills/tdd"
)
func main() {
logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
cfg, err := config.Load()
if err != nil {
logger.Error("load config", "err", err)
os.Exit(1)
}
models, err := config.LoadModels(cfg.ModelsFile)
if err != nil {
logger.Error("load models", "err", err)
os.Exit(1)
}
systemPrompt, err := os.ReadFile(cfg.ConfigDir + "/CLAUDE.md")
if err != nil {
logger.Error("read supervisor CLAUDE.md", "err", err)
os.Exit(1)
}
tddPrompt, err := os.ReadFile(cfg.ConfigDir + "/tdd.md")
if err != nil {
logger.Error("read tdd.md", "err", err)
os.Exit(1)
}
executor := iexec.New(iexec.Config{
SystemPrompt: string(systemPrompt),
LiteLLMBaseURL: cfg.LiteLLMBaseURL,
LiteLLMAPIKey: cfg.LiteLLMAPIKey,
})
reg := registry.New()
reg.Register(tdd.New(tdd.Config{
SystemPrompt: string(systemPrompt),
SkillPrompt: string(tddPrompt),
DefaultModel: models.Resolve("tdd", ""),
ExecutorFn: executor.Run,
}))
srv := mcp.NewServer(reg)
mux := http.NewServeMux()
mux.Handle("/mcp", srv)
addr := ":" + cfg.Port
logger.Info("supervisor starting", "addr", addr)
if err := http.ListenAndServe(addr, mux); err != nil {
logger.Error("server error", "err", err)
os.Exit(1)
}
}
- Step 4: Run all tests — expect PASS
go test ./...
Expected: PASS
- Step 5: Verify it builds
go build ./cmd/supervisor/
Expected: binary produced, no errors
- Step 6: Commit
git add cmd/supervisor/main.go cmd/supervisor/integration_test.go
git commit -m "feat: wire all components in main.go"
Task 10: Config files
Files:
-
Create:
config/supervisor/CLAUDE.md -
Create:
config/supervisor/tdd.md -
Create:
.env.example -
Step 1: Create
config/supervisor/CLAUDE.md
# Supervisor Agent
You are a supervisor agent. You orchestrate skill workers.
You do not converse. You do not explain. You execute and report.
## Output contract — NON-NEGOTIABLE
Every response must be raw JSON conforming to the provided schema.
No preamble, no markdown, no prose. Raw JSON only.
If you cannot produce valid JSON for any reason, output:
{"status":"error","phase":"","skill":"","file_path":"","runner_output":"","verified":false,"model_used":"self","message":"<reason>"}
## Verification — IRON LAW
Never trust your own output. Always run the test suite.
Use the subprocess exit code as the only source of truth.
`verified: true` only when exit code matches the expected outcome for the phase.
`verified: false` even if you believe the code is correct.
## Loop prevention
- Maximum 3 attempts per phase.
- If output is identical across two consecutive attempts, stop immediately.
- After 3 failures set status "error" and explain the blocker in message.
- Never retry indefinitely.
## Test runner detection
Inspect project_root for these signals in order:
| Signal | Command |
|---------------------------------|----------------------|
| go.mod | go test ./... |
| package.json | npm test |
| pyproject.toml / pytest.ini | pytest |
| Cargo.toml | cargo test |
| Gemfile | bundle exec rspec |
| mix.exs | mix test |
If test_cmd is provided in the task, use it directly — skip detection.
If runner cannot be determined, return status "error".
## Model delegation
When a model is specified in the task, you may use it for code generation
by calling LiteLLM at the LITELLM_BASE_URL with the specified model name.
Always retain verification yourself — never delegate test execution.
Tag model_used in your response with the model that generated the code,
or "self" if you generated it directly.
- Step 2: Create
config/supervisor/tdd.md
# TDD Skill
## Iron Law
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST.
## Red phase
- Write exactly one test. One behavior. Name must describe the behavior clearly.
- Run the test suite. Confirm the test FAILS.
- If the test passes immediately: it tests existing behavior or is vacuous.
Return status "fail" with message explaining why the test is wrong.
- Do not write any implementation code in this phase.
## Green phase
- Write the minimal code to make the failing test pass. Nothing more.
- YAGNI: no extra parameters, no future-proofing, no clever abstractions.
- Run the test suite. Confirm it PASSES.
- If tests fail: fix the implementation, not the test. Max 3 attempts.
## Refactor phase
- Improve structure, naming, or clarity only. No new behavior.
- Tests must remain green after every change.
- If tests break during refactor: revert that change, return status "fail".
- Step 3: Create
.env.example
# Supervisor MCP server
SUPERVISOR_PORT=3200
SUPERVISOR_CONFIG_DIR=./config/supervisor
SUPERVISOR_MODELS_FILE=./config/models.yaml
# LiteLLM gateway (iguana)
LITELLM_BASE_URL=http://iguana:4000
LITELLM_API_KEY=your-litellm-master-key
- Step 4: Run all tests — still pass
task check
Expected: PASS
- Step 5: Commit
git add config/ .env.example
git commit -m "feat: add supervisor CLAUDE.md, tdd skill prompt, and env example"
Task 11: Taskfile + MCP registration
Files:
-
Modify:
Taskfile.yml -
Modify:
.context/mcp.json -
Step 1: Add tasks to
Taskfile.yml
Add these tasks to the existing Taskfile.yml:
supervisor:dev:
desc: Run supervisor MCP server (development)
cmds:
- go run ./cmd/supervisor
supervisor:build:
desc: Build supervisor binary
cmds:
- go build -o bin/supervisor ./cmd/supervisor
supervisor:test:smoke:
desc: Smoke test supervisor via MCP (requires supervisor:dev running)
cmds:
- |
curl -s -X POST http://localhost:${SUPERVISOR_PORT:-3200}/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | jq .
- Step 2: Register supervisor in
.context/mcp.json
Replace contents of .context/mcp.json:
{
"mcpServers": {
"knowledge": {
"url": "http://localhost:3100/mcp",
"description": "Project knowledge base — vector + graph retrieval"
},
"supervisor": {
"url": "http://localhost:3200/mcp",
"description": "Skill workers — TDD (red/green/refactor), more coming"
}
}
}
- Step 3: Sync context files
task context:sync
Expected: CLAUDE.md, AGENTS.md updated with new MCP server entry
- Step 4: Commit
git add Taskfile.yml .context/mcp.json CLAUDE.md AGENTS.md
git commit -m "feat: register supervisor MCP server and add task targets"
Task 12: End-to-end smoke test
Manual verification that all layers connect.
- Step 1: Start the supervisor
In a terminal:
cp .env.example .env # fill in LITELLM_API_KEY if needed
task supervisor:dev
Expected: {"time":"...","level":"INFO","msg":"supervisor starting","addr":":3200"}
- Step 2: Verify tools/list
task supervisor:test:smoke
Expected: JSON response with tdd_red, tdd_green, tdd_refactor in result.tools
- Step 3: Verify tdd_red responds
curl -s -X POST http://localhost:3200/mcp \
-H "Content-Type: application/json" \
-d '{
"jsonrpc":"2.0","id":1,"method":"tools/call",
"params":{
"name":"tdd_red",
"arguments":{
"project_root":"'"$PWD"'",
"spec":"a function that returns the sum of two integers"
}
}
}' | jq .
Expected: JSON-RPC response with result.content[0].text containing a valid supervisor Result JSON
- Step 4: Reload Claude Code MCP config
In Claude Code, run /mcp to verify the supervisor server appears and its tools are listed.
- Step 5: Final commit
git add -p # stage any notes or fixes from smoke test
git commit -m "chore: smoke test complete, supervisor MCP server operational"
go.sum
After all go get commands in the tasks above, run:
go mod tidy
git add go.sum
git commit -m "chore: tidy go.sum"