Files

Mathias Bergqvist 7d108b2911 docs: add supervisor MCP server implementation plan

12 TDD tasks covering: Go module, Result type, Config, Models,
Executor (claude --print), Registry, MCP server, TDD skill,
main wiring, config files, Taskfile/MCP registration, smoke test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-17 06:56:48 +02:00

47 KiB

Raw Blame History

Supervisor MCP Server Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Build a Go MCP server that exposes TDD skill workers as tools Claude Code can call natively, with a Claude supervisor instance handling orchestration and verification.

Architecture: A thin Go HTTP server routes MCP tool calls to skill handlers. Each handler spawns claude --print with --bare --system-prompt --json-schema --permission-mode bypassPermissions, injecting supervisor rules and skill discipline. The spawned Claude instance uses its own Bash/Read/Write tools to run tests and generate code, returning a structured JSON result. LiteLLM on iguana handles Ollama model delegation.

Tech Stack: Go 1.26, net/http (stdlib), gopkg.in/yaml.v3, testify, claude CLI, LiteLLM

File Map

File	Responsibility
`go.mod`	Module definition
`cmd/supervisor/main.go`	Entry point, wires all dependencies
`internal/config/config.go`	Env vars → typed Config struct
`internal/config/config_test.go`	Config parsing tests
`internal/config/models.go`	YAML model routing table
`internal/config/models_test.go`	Model resolution tests
`internal/exec/result.go`	Result struct + JSON schema constant
`internal/exec/result_test.go`	JSON parsing/validation tests
`internal/exec/executor.go`	Spawn claude, capture output, parse result
`internal/exec/executor_test.go`	Executor tests using fake binary
`internal/registry/registry.go`	Skill interface + registry
`internal/registry/registry_test.go`	Registry routing tests
`internal/mcp/server.go`	HTTP/JSON-RPC MCP server
`internal/mcp/server_test.go`	MCP protocol tests
`internal/skills/tdd/skill.go`	Tool definitions (MCP schema)
`internal/skills/tdd/handlers.go`	tdd_red, tdd_green, tdd_refactor
`internal/skills/tdd/handlers_test.go`	Handler prompt-building tests
`config/supervisor/CLAUDE.md`	Supervisor orchestration rules
`config/supervisor/tdd.md`	TDD iron law (injected per call)
`config/models.yaml`	Per-skill model routing table
`.env.example`	Documented env vars
`Taskfile.yml`	Add `supervisor:dev` and `supervisor:build` tasks
`.context/mcp.json`	Register supervisor as MCP server

Task 1: Go module + project skeleton

Files:

Create: go.mod
Create: cmd/supervisor/main.go
Step 1: Write a failing test to verify the binary compiles and exits cleanly

Create cmd/supervisor/main_test.go:

package main

import (
    "os/exec"
    "testing"
)

func TestBinaryCompiles(t *testing.T) {
    cmd := exec.Command("go", "build", "./...")
    out, err := cmd.CombinedOutput()
    if err != nil {
        t.Fatalf("build failed: %s\n%s", err, out)
    }
}

Step 2: Run test — expect it to fail (no go.mod yet)

go test ./...

Expected: error — no go.mod

Step 3: Initialize the module

go mod init github.com/mathiasbq/supervisor

Step 4: Create cmd/supervisor/main.go

package main

import (
    "fmt"
    "os"
)

func main() {
    fmt.Fprintln(os.Stderr, "supervisor: not yet implemented")
    os.Exit(1)
}

Step 5: Run test — expect it to pass

go test ./cmd/supervisor/...

Expected: PASS

Step 6: Commit

git add go.mod cmd/
git commit -m "chore: initialize go module and cmd skeleton"

Task 2: Result type

Files:

Create: internal/exec/result.go
Create: internal/exec/result_test.go
Step 1: Write failing tests for result parsing

Create internal/exec/result_test.go:

package exec_test

import (
    "encoding/json"
    "testing"

    "github.com/mathiasbq/supervisor/internal/exec"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func TestResultParsesValidJSON(t *testing.T) {
    raw := `{
        "status": "pass",
        "phase": "red",
        "skill": "tdd",
        "file_path": "/tmp/foo_test.go",
        "runner_output": "--- FAIL: TestFoo",
        "verified": true,
        "model_used": "self",
        "message": "test fails as expected"
    }`
    var r exec.Result
    require.NoError(t, json.Unmarshal([]byte(raw), &r))
    assert.Equal(t, "pass", r.Status)
    assert.Equal(t, "red", r.Phase)
    assert.True(t, r.Verified)
}

func TestResultValidation(t *testing.T) {
    tests := []struct {
        name    string
        result  exec.Result
        wantErr bool
    }{
        {
            name: "valid pass result",
            result: exec.Result{
                Status: "pass", Phase: "red", Skill: "tdd",
                FilePath: "/tmp/x_test.go", RunnerOutput: "FAIL",
                Verified: true, ModelUsed: "self", Message: "ok",
            },
            wantErr: false,
        },
        {
            name:    "empty status",
            result:  exec.Result{Phase: "red", Skill: "tdd"},
            wantErr: true,
        },
        {
            name:    "invalid status",
            result:  exec.Result{Status: "unknown", Phase: "red", Skill: "tdd"},
            wantErr: true,
        },
        {
            name:    "invalid phase",
            result:  exec.Result{Status: "pass", Phase: "bad", Skill: "tdd"},
            wantErr: true,
        },
    }
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            err := tt.result.Validate()
            if tt.wantErr {
                assert.Error(t, err)
            } else {
                assert.NoError(t, err)
            }
        })
    }
}

Step 2: Run — expect FAIL (package does not exist)

go test ./internal/exec/...

Expected: FAIL — cannot find package

Step 3: Install testify

go get github.com/stretchr/testify@latest

Step 4: Create internal/exec/result.go

package exec

import (
    "errors"
    "strings"
)

// Result is the structured JSON output from every supervisor invocation.
// The JSON schema constant is passed to claude via --json-schema so Claude
// validates its own output before returning.
type Result struct {
    Status       string `json:"status"`        // pass | fail | error
    Phase        string `json:"phase"`         // red | green | refactor
    Skill        string `json:"skill"`         // tdd | review | ...
    FilePath     string `json:"file_path"`     // absolute path to generated file
    RunnerOutput string `json:"runner_output"` // raw stdout+stderr from test runner
    Verified     bool   `json:"verified"`      // based on exit code, never self-report
    ModelUsed    string `json:"model_used"`    // model name or "self"
    Message      string `json:"message"`       // one sentence summary
}

var validStatuses = map[string]bool{"pass": true, "fail": true, "error": true}
var validPhases = map[string]bool{"red": true, "green": true, "refactor": true}

func (r Result) Validate() error {
    var errs []string
    if !validStatuses[r.Status] {
        errs = append(errs, "status must be pass|fail|error, got: "+r.Status)
    }
    if !validPhases[r.Phase] {
        errs = append(errs, "phase must be red|green|refactor, got: "+r.Phase)
    }
    if r.Skill == "" {
        errs = append(errs, "skill is required")
    }
    if len(errs) > 0 {
        return errors.New(strings.Join(errs, "; "))
    }
    return nil
}

// Schema is passed to claude --json-schema to enforce structured output.
const Schema = `{
  "type": "object",
  "required": ["status","phase","skill","file_path","runner_output","verified","model_used","message"],
  "properties": {
    "status":        {"type": "string", "enum": ["pass","fail","error"]},
    "phase":         {"type": "string", "enum": ["red","green","refactor"]},
    "skill":         {"type": "string"},
    "file_path":     {"type": "string"},
    "runner_output": {"type": "string"},
    "verified":      {"type": "boolean"},
    "model_used":    {"type": "string"},
    "message":       {"type": "string"}
  }
}`

Step 5: Run — expect PASS

go test ./internal/exec/...

Expected: PASS

Step 6: Commit

git add internal/exec/result.go internal/exec/result_test.go
git commit -m "feat: add Result type with JSON schema and validation"

Task 3: Config package

Files:

Create: internal/config/config.go
Create: internal/config/config_test.go
Step 1: Write failing tests

Create internal/config/config_test.go:

package config_test

import (
    "testing"

    "github.com/mathiasbq/supervisor/internal/config"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func TestLoadDefaults(t *testing.T) {
    t.Setenv("SUPERVISOR_PORT", "")
    t.Setenv("LITELLM_BASE_URL", "")
    t.Setenv("LITELLM_API_KEY", "")
    t.Setenv("SUPERVISOR_CONFIG_DIR", "")

    cfg, err := config.Load()
    require.NoError(t, err)
    assert.Equal(t, "3200", cfg.Port)
    assert.Equal(t, "http://iguana:4000", cfg.LiteLLMBaseURL)
    assert.Equal(t, "./config/supervisor", cfg.ConfigDir)
}

func TestLoadFromEnv(t *testing.T) {
    t.Setenv("SUPERVISOR_PORT", "4000")
    t.Setenv("LITELLM_BASE_URL", "http://localhost:4000")
    t.Setenv("LITELLM_API_KEY", "test-key")
    t.Setenv("SUPERVISOR_CONFIG_DIR", "/etc/supervisor")

    cfg, err := config.Load()
    require.NoError(t, err)
    assert.Equal(t, "4000", cfg.Port)
    assert.Equal(t, "http://localhost:4000", cfg.LiteLLMBaseURL)
    assert.Equal(t, "test-key", cfg.LiteLLMAPIKey)
    assert.Equal(t, "/etc/supervisor", cfg.ConfigDir)
}

Step 2: Run — expect FAIL

go test ./internal/config/...

Expected: FAIL — cannot find package

Step 3: Create internal/config/config.go

package config

import "os"

type Config struct {
    Port           string // SUPERVISOR_PORT, default 3200
    LiteLLMBaseURL string // LITELLM_BASE_URL, default http://iguana:4000
    LiteLLMAPIKey  string // LITELLM_API_KEY
    ConfigDir      string // SUPERVISOR_CONFIG_DIR, default ./config/supervisor
    ModelsFile     string // SUPERVISOR_MODELS_FILE, default <ConfigDir>/../models.yaml
}

func Load() (Config, error) {
    cfg := Config{
        Port:           envOr("SUPERVISOR_PORT", "3200"),
        LiteLLMBaseURL: envOr("LITELLM_BASE_URL", "http://iguana:4000"),
        LiteLLMAPIKey:  os.Getenv("LITELLM_API_KEY"),
        ConfigDir:      envOr("SUPERVISOR_CONFIG_DIR", "./config/supervisor"),
    }
    cfg.ModelsFile = envOr("SUPERVISOR_MODELS_FILE", cfg.ConfigDir+"/../models.yaml")
    return cfg, nil
}

func envOr(key, def string) string {
    if v := os.Getenv(key); v != "" {
        return v
    }
    return def
}

Step 4: Run — expect PASS

go test ./internal/config/...

Expected: PASS

Step 5: Commit

git add internal/config/
git commit -m "feat: add config package with env-var loading"

Task 4: Models config

Files:

Create: internal/config/models.go
Create: internal/config/models_test.go
Create: config/models.yaml
Step 1: Write failing tests

Append to internal/config/models_test.go (new file):

package config_test

import (
    "os"
    "path/filepath"
    "testing"

    "github.com/mathiasbq/supervisor/internal/config"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func TestModelsResolve(t *testing.T) {
    yaml := `
default: ollama/default-model
skills:
  tdd: ollama/qwen3-coder-30b-tuned
  review: ollama/devstral-tuned
`
    f := filepath.Join(t.TempDir(), "models.yaml")
    require.NoError(t, os.WriteFile(f, []byte(yaml), 0644))

    m, err := config.LoadModels(f)
    require.NoError(t, err)

    assert.Equal(t, "ollama/qwen3-coder-30b-tuned", m.Resolve("tdd", ""))
    assert.Equal(t, "ollama/devstral-tuned", m.Resolve("review", ""))
    assert.Equal(t, "ollama/default-model", m.Resolve("unknown", ""))
}

func TestModelsOverride(t *testing.T) {
    yaml := `
default: ollama/default-model
skills:
  tdd: ollama/qwen3-coder-30b-tuned
`
    f := filepath.Join(t.TempDir(), "models.yaml")
    require.NoError(t, os.WriteFile(f, []byte(yaml), 0644))

    m, err := config.LoadModels(f)
    require.NoError(t, err)

    // caller override wins over skill default
    assert.Equal(t, "anthropic/claude-sonnet-4-6", m.Resolve("tdd", "anthropic/claude-sonnet-4-6"))
}

Step 2: Run — expect FAIL

go test ./internal/config/...

Expected: FAIL — Models type undefined

Step 3: Install yaml dependency

go get gopkg.in/yaml.v3

Step 4: Create internal/config/models.go

package config

import (
    "fmt"
    "os"

    "gopkg.in/yaml.v3"
)

type modelsFile struct {
    Default string            `yaml:"default"`
    Skills  map[string]string `yaml:"skills"`
}

type Models struct {
    data modelsFile
}

func LoadModels(path string) (Models, error) {
    raw, err := os.ReadFile(path)
    if err != nil {
        return Models{}, fmt.Errorf("load models: %w", err)
    }
    var f modelsFile
    if err := yaml.Unmarshal(raw, &f); err != nil {
        return Models{}, fmt.Errorf("parse models: %w", err)
    }
    return Models{data: f}, nil
}

// Resolve returns the model for a skill, respecting three-layer priority:
// 1. override (from MCP call) — highest
// 2. per-skill default from models.yaml
// 3. global default
func (m Models) Resolve(skill, override string) string {
    if override != "" {
        return override
    }
    if model, ok := m.data.Skills[skill]; ok {
        return model
    }
    return m.data.Default
}

Step 5: Run — expect PASS

go test ./internal/config/...

Expected: PASS

Step 6: Create config/models.yaml

# Model routing table — three-layer priority:
# 1. model param in MCP tool call (caller override)
# 2. per-skill entry here
# 3. default (fallback)
default: ollama/qwen3-coder-30b-tuned

skills:
  tdd:    ollama/qwen3-coder-30b-tuned
  review: ollama/devstral-tuned
  debug:  ollama/deepseek-r1-tuned

Step 7: Commit

git add internal/config/models.go internal/config/models_test.go config/models.yaml
git commit -m "feat: add model routing table with three-layer priority"

Task 5: Exec package

Files:

Create: internal/exec/executor.go
Create: internal/exec/executor_test.go
Modify: internal/exec/result.go (add Schema constant — already done in Task 2)
Step 1: Write failing tests

Create internal/exec/executor_test.go:

package exec_test

import (
    "context"
    "os"
    "path/filepath"
    "testing"
    "time"

    iexec "github.com/mathiasbq/supervisor/internal/exec"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

// fakeClaudePath writes a shell script that prints a fixed JSON result
// and returns its path. Used to test executor without invoking real claude.
func fakeClaudePath(t *testing.T, output string, exitCode int) string {
    t.Helper()
    dir := t.TempDir()
    script := filepath.Join(dir, "claude")
    var content string
    if exitCode != 0 {
        content = "#!/bin/sh\necho 'error' >&2\nexit 1\n"
    } else {
        content = "#!/bin/sh\necho '" + output + "'\n"
    }
    require.NoError(t, os.WriteFile(script, []byte(content), 0755))
    return script
}

func TestExecutorParsesValidResult(t *testing.T) {
    validJSON := `{"status":"pass","phase":"red","skill":"tdd","file_path":"/tmp/x_test.go","runner_output":"FAIL","verified":true,"model_used":"self","message":"ok"}`
    claude := fakeClaudePath(t, validJSON, 0)

    ex := iexec.New(iexec.Config{
        ClaudeBinary:  claude,
        SystemPrompt:  "you are a supervisor",
        Timeout:       5 * time.Second,
    })

    result, err := ex.Run(context.Background(), iexec.Request{
        SkillPrompt: "tdd rules",
        TaskPrompt:  "run red phase",
    })
    require.NoError(t, err)
    assert.Equal(t, "pass", result.Status)
    assert.True(t, result.Verified)
}

func TestExecutorReturnsErrorOnNonZeroExit(t *testing.T) {
    claude := fakeClaudePath(t, "", 1)

    ex := iexec.New(iexec.Config{
        ClaudeBinary: claude,
        SystemPrompt: "you are a supervisor",
        Timeout:      5 * time.Second,
    })

    _, err := ex.Run(context.Background(), iexec.Request{TaskPrompt: "fail"})
    assert.Error(t, err)
}

func TestExecutorTimesOut(t *testing.T) {
    dir := t.TempDir()
    script := filepath.Join(dir, "claude")
    require.NoError(t, os.WriteFile(script, []byte("#!/bin/sh\nsleep 60\n"), 0755))

    ex := iexec.New(iexec.Config{
        ClaudeBinary: script,
        SystemPrompt: "you are a supervisor",
        Timeout:      100 * time.Millisecond,
    })

    _, err := ex.Run(context.Background(), iexec.Request{TaskPrompt: "slow"})
    assert.ErrorContains(t, err, "timeout")
}

Step 2: Verify the test file compiles

go vet ./internal/exec/...

Expected: no errors

Step 3: Run — expect FAIL

go test ./internal/exec/...

Expected: FAIL — Executor type undefined

Step 4: Create internal/exec/executor.go

package exec

import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "os/exec"
    "strings"
    "time"
)

// Config holds executor configuration.
type Config struct {
    ClaudeBinary   string        // path to claude binary, defaults to "claude"
    SystemPrompt   string        // contents of supervisor CLAUDE.md
    Timeout        time.Duration // per-invocation timeout, default 120s
    LiteLLMBaseURL string        // passed to Claude so it can delegate to Ollama
    LiteLLMAPIKey  string        // passed to Claude for LiteLLM auth
}

// Request is the input to a single supervisor invocation.
type Request struct {
    SkillPrompt string // skill-specific discipline (e.g. tdd.md contents)
    TaskPrompt  string // the specific task (phase, project_root, spec, model)
    Model       string // resolved model name, passed in task prompt
    Tools       string // comma-separated allowed tools, default "Bash,Read,Write"
}

// Executor spawns a claude instance and captures its structured JSON output.
type Executor struct {
    cfg Config
}

func New(cfg Config) *Executor {
    if cfg.ClaudeBinary == "" {
        cfg.ClaudeBinary = "claude"
    }
    if cfg.Timeout == 0 {
        cfg.Timeout = 120 * time.Second
    }
    return &Executor{cfg: cfg}
}

func (e *Executor) Run(ctx context.Context, req Request) (Result, error) {
    ctx, cancel := context.WithTimeout(ctx, e.cfg.Timeout)
    defer cancel()

    tools := req.Tools
    if tools == "" {
        tools = "Bash,Read,Write"
    }

    // Build the full prompt: system rules + skill rules + task + infra context
    litellmCtx := fmt.Sprintf(
        "LITELLM_BASE_URL: %s\nLITELLM_API_KEY: %s",
        e.cfg.LiteLLMBaseURL, e.cfg.LiteLLMAPIKey,
    )
    prompt := strings.Join([]string{
        e.cfg.SystemPrompt,
        "---",
        req.SkillPrompt,
        "---",
        litellmCtx,
        "---",
        req.TaskPrompt,
    }, "\n\n")

    args := []string{
        "--print",
        "--bare",
        "--permission-mode", "bypassPermissions",
        "--tools", tools,
        "--json-schema", Schema,
        "--output-format", "text",
        prompt,
    }

    cmd := exec.CommandContext(ctx, e.cfg.ClaudeBinary, args...)
    var stdout, stderr bytes.Buffer
    cmd.Stdout = &stdout
    cmd.Stderr = &stderr

    if err := cmd.Run(); err != nil {
        if ctx.Err() != nil {
            return Result{}, fmt.Errorf("timeout after %s", e.cfg.Timeout)
        }
        return Result{}, fmt.Errorf("claude exited with error: %w — stderr: %s", err, stderr.String())
    }

    var r Result
    if err := json.Unmarshal(stdout.Bytes(), &r); err != nil {
        return Result{}, fmt.Errorf("parse result JSON: %w — raw output: %s", err, stdout.String())
    }
    if err := r.Validate(); err != nil {
        return Result{}, fmt.Errorf("invalid result: %w", err)
    }
    return r, nil
}

Step 5: Run — expect PASS

go test ./internal/exec/...

Expected: PASS

Step 6: Commit

git add internal/exec/executor.go internal/exec/executor_test.go
git commit -m "feat: add executor that spawns claude and parses JSON result"

Task 6: Registry + Skill interface

Files:

Create: internal/registry/registry.go
Create: internal/registry/registry_test.go
Step 1: Write failing tests

Create internal/registry/registry_test.go:

package registry_test

import (
    "context"
    "encoding/json"
    "testing"

    "github.com/mathiasbq/supervisor/internal/registry"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

type stubSkill struct {
    name string
}

func (s stubSkill) Name() string { return s.name }
func (s stubSkill) Tools() []registry.ToolDef {
    return []registry.ToolDef{{
        Name:        s.name + "_tool",
        Description: "stub tool",
        InputSchema: json.RawMessage(`{"type":"object","properties":{}}`),
    }}
}
func (s stubSkill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
    return json.RawMessage(`{"ok":true}`), nil
}

func TestRegistryRoutes(t *testing.T) {
    r := registry.New()
    r.Register(stubSkill{name: "tdd"})

    result, err := r.Dispatch(context.Background(), "tdd_tool", json.RawMessage(`{}`))
    require.NoError(t, err)
    assert.JSONEq(t, `{"ok":true}`, string(result))
}

func TestRegistryUnknownTool(t *testing.T) {
    r := registry.New()
    _, err := r.Dispatch(context.Background(), "unknown_tool", json.RawMessage(`{}`))
    assert.ErrorContains(t, err, "unknown tool")
}

func TestRegistryListsTools(t *testing.T) {
    r := registry.New()
    r.Register(stubSkill{name: "tdd"})

    tools := r.Tools()
    require.Len(t, tools, 1)
    assert.Equal(t, "tdd_tool", tools[0].Name)
}

Step 2: Run — expect FAIL

go test ./internal/registry/...

Expected: FAIL — registry package not found

Step 3: Create internal/registry/registry.go

package registry

import (
    "context"
    "encoding/json"
    "fmt"
)

// ToolDef describes a single MCP tool exposed by a skill.
type ToolDef struct {
    Name        string          `json:"name"`
    Description string          `json:"description"`
    InputSchema json.RawMessage `json:"inputSchema"`
}

// Skill is implemented by each skill package (tdd, review, etc.).
type Skill interface {
    Name() string
    Tools() []ToolDef
    Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error)
}

// Registry routes MCP tool calls to the correct skill handler.
type Registry struct {
    skills map[string]Skill // tool name → skill
    tools  []ToolDef
}

func New() *Registry {
    return &Registry{skills: make(map[string]Skill)}
}

func (r *Registry) Register(s Skill) {
    for _, t := range s.Tools() {
        r.skills[t.Name] = s
        r.tools = append(r.tools, t)
    }
}

func (r *Registry) Tools() []ToolDef {
    return r.tools
}

func (r *Registry) Dispatch(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
    s, ok := r.skills[tool]
    if !ok {
        return nil, fmt.Errorf("unknown tool: %s", tool)
    }
    return s.Handle(ctx, tool, args)
}

Step 4: Run — expect PASS

go test ./internal/registry/...

Expected: PASS

Step 5: Commit

git add internal/registry/
git commit -m "feat: add skill registry with tool routing"

Task 7: MCP server

Files:

Create: internal/mcp/server.go
Create: internal/mcp/server_test.go
Step 1: Write failing tests

Create internal/mcp/server_test.go:

package mcp_test

import (
    "bytes"
    "context"
    "encoding/json"
    "net/http"
    "net/http/httptest"
    "testing"

    "github.com/mathiasbq/supervisor/internal/mcp"
    "github.com/mathiasbq/supervisor/internal/registry"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func jsonBody(t *testing.T, v any) *bytes.Buffer {
    t.Helper()
    b, err := json.Marshal(v)
    require.NoError(t, err)
    return bytes.NewBuffer(b)
}

func TestMCPInitialize(t *testing.T) {
    reg := registry.New()
    srv := mcp.NewServer(reg)

    req := httptest.NewRequest(http.MethodPost, "/mcp", jsonBody(t, map[string]any{
        "jsonrpc": "2.0",
        "id":      1,
        "method":  "initialize",
        "params":  map[string]any{},
    }))
    req.Header.Set("Content-Type", "application/json")
    rr := httptest.NewRecorder()

    srv.ServeHTTP(rr, req)

    assert.Equal(t, http.StatusOK, rr.Code)
    var resp map[string]any
    require.NoError(t, json.Unmarshal(rr.Body.Bytes(), &resp))
    result := resp["result"].(map[string]any)
    assert.Equal(t, "2024-11-05", result["protocolVersion"])
}

func TestMCPToolsList(t *testing.T) {
    reg := registry.New()
    srv := mcp.NewServer(reg)

    req := httptest.NewRequest(http.MethodPost, "/mcp", jsonBody(t, map[string]any{
        "jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": map[string]any{},
    }))
    req.Header.Set("Content-Type", "application/json")
    rr := httptest.NewRecorder()
    srv.ServeHTTP(rr, req)

    assert.Equal(t, http.StatusOK, rr.Code)
    var resp map[string]any
    require.NoError(t, json.Unmarshal(rr.Body.Bytes(), &resp))
    result := resp["result"].(map[string]any)
    assert.NotNil(t, result["tools"])
}

func TestMCPUnknownMethod(t *testing.T) {
    reg := registry.New()
    srv := mcp.NewServer(reg)

    req := httptest.NewRequest(http.MethodPost, "/mcp", jsonBody(t, map[string]any{
        "jsonrpc": "2.0", "id": 3, "method": "unknown/method", "params": map[string]any{},
    }))
    req.Header.Set("Content-Type", "application/json")
    rr := httptest.NewRecorder()
    srv.ServeHTTP(rr, req)

    assert.Equal(t, http.StatusOK, rr.Code)
    var resp map[string]any
    require.NoError(t, json.Unmarshal(rr.Body.Bytes(), &resp))
    assert.NotNil(t, resp["error"])
}

Step 2: Run — expect FAIL

go test ./internal/mcp/...

Expected: FAIL — mcp package not found

Step 3: Create internal/mcp/server.go

package mcp

import (
    "context"
    "encoding/json"
    "net/http"

    "github.com/mathiasbq/supervisor/internal/registry"
)

type request struct {
    JSONRPC string          `json:"jsonrpc"`
    ID      any             `json:"id"`
    Method  string          `json:"method"`
    Params  json.RawMessage `json:"params"`
}

type response struct {
    JSONRPC string `json:"jsonrpc"`
    ID      any    `json:"id,omitempty"`
    Result  any    `json:"result,omitempty"`
    Error   *rpcError `json:"error,omitempty"`
}

type rpcError struct {
    Code    int    `json:"code"`
    Message string `json:"message"`
}

// Server is an HTTP handler implementing the MCP JSON-RPC protocol.
type Server struct {
    reg *registry.Registry
}

func NewServer(reg *registry.Registry) *Server {
    return &Server{reg: reg}
}

func (s *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    var req request
    if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
        writeError(w, nil, -32700, "parse error")
        return
    }

    var result any
    var rpcErr *rpcError

    switch req.Method {
    case "initialize":
        result = map[string]any{
            "protocolVersion": "2024-11-05",
            "capabilities":    map[string]any{"tools": map[string]any{}},
            "serverInfo":      map[string]any{"name": "supervisor", "version": "0.1.0"},
        }

    case "tools/list":
        result = map[string]any{"tools": s.reg.Tools()}

    case "tools/call":
        var p struct {
            Name      string          `json:"name"`
            Arguments json.RawMessage `json:"arguments"`
        }
        if err := json.Unmarshal(req.Params, &p); err != nil {
            rpcErr = &rpcError{Code: -32602, Message: "invalid params"}
            break
        }
        out, err := s.reg.Dispatch(context.Background(), p.Name, p.Arguments)
        if err != nil {
            rpcErr = &rpcError{Code: -32000, Message: err.Error()}
            break
        }
        result = map[string]any{
            "content": []map[string]any{{"type": "text", "text": string(out)}},
        }

    default:
        rpcErr = &rpcError{Code: -32601, Message: "method not found: " + req.Method}
    }

    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(response{
        JSONRPC: "2.0",
        ID:      req.ID,
        Result:  result,
        Error:   rpcErr,
    })
}

func writeError(w http.ResponseWriter, id any, code int, msg string) {
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(response{
        JSONRPC: "2.0",
        ID:      id,
        Error:   &rpcError{Code: code, Message: msg},
    })
}

Step 4: Run — expect PASS

go test ./internal/mcp/...

Expected: PASS

Step 5: Commit

git add internal/mcp/
git commit -m "feat: add MCP HTTP server with JSON-RPC 2.0 transport"

Task 8: TDD skill

Files:

Create: internal/skills/tdd/skill.go
Create: internal/skills/tdd/handlers.go
Create: internal/skills/tdd/handlers_test.go
Step 1: Write failing tests

Create internal/skills/tdd/handlers_test.go:

package tdd_test

import (
    "context"
    "encoding/json"
    "testing"

    "github.com/mathiasbq/supervisor/internal/skills/tdd"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func TestTDDSkillTools(t *testing.T) {
    skill := tdd.New(tdd.Config{
        SystemPrompt: "supervisor rules",
        SkillPrompt:  "tdd rules",
        ExecutorFn:   nil, // will be set in TestHandle
    })
    tools := skill.Tools()
    names := make([]string, len(tools))
    for i, t := range tools {
        names[i] = t.Name
    }
    assert.ElementsMatch(t, []string{"tdd_red", "tdd_green", "tdd_refactor"}, names)
}

func TestTDDSkillHandleUnknown(t *testing.T) {
    skill := tdd.New(tdd.Config{SystemPrompt: "s", SkillPrompt: "t"})
    _, err := skill.Handle(context.Background(), "tdd_unknown", json.RawMessage(`{}`))
    assert.ErrorContains(t, err, "unknown tool")
}

func TestTDDRedRequiresProjectRoot(t *testing.T) {
    skill := tdd.New(tdd.Config{SystemPrompt: "s", SkillPrompt: "t"})
    _, err := skill.Handle(context.Background(), "tdd_red", json.RawMessage(`{"spec":"add two numbers"}`))
    assert.ErrorContains(t, err, "project_root")
}

func TestTDDRedRequiresSpec(t *testing.T) {
    skill := tdd.New(tdd.Config{SystemPrompt: "s", SkillPrompt: "t"})
    _, err := skill.Handle(context.Background(), "tdd_red", json.RawMessage(`{"project_root":"/tmp/proj"}`))
    assert.ErrorContains(t, err, "spec")
}

Step 2: Run — expect FAIL

go test ./internal/skills/tdd/...

Expected: FAIL — tdd package not found

Step 3: Create internal/skills/tdd/skill.go

package tdd

import (
    "context"
    "encoding/json"

    iexec "github.com/mathiasbq/supervisor/internal/exec"
    "github.com/mathiasbq/supervisor/internal/registry"
)

// ExecutorFn allows injecting a test double for the executor.
type ExecutorFn func(ctx context.Context, req iexec.Request) (iexec.Result, error)

type Config struct {
    SystemPrompt string
    SkillPrompt  string
    ExecutorFn   ExecutorFn // nil = use real executor
    DefaultModel string
}

type Skill struct {
    cfg Config
}

func New(cfg Config) *Skill {
    return &Skill{cfg: cfg}
}

func (s *Skill) Name() string { return "tdd" }

func (s *Skill) Tools() []registry.ToolDef {
    schema := func(required []string, props map[string]any) json.RawMessage {
        b, _ := json.Marshal(map[string]any{
            "type":       "object",
            "required":   required,
            "properties": props,
        })
        return b
    }
    strProp := map[string]any{"type": "string"}

    return []registry.ToolDef{
        {
            Name:        "tdd_red",
            Description: "Write a failing test for the described behavior. Verifies the test fails before returning.",
            InputSchema: schema(
                []string{"project_root", "spec"},
                map[string]any{
                    "project_root": strProp,
                    "spec":         strProp,
                    "model":        strProp,
                    "test_cmd":     strProp,
                },
            ),
        },
        {
            Name:        "tdd_green",
            Description: "Write minimal implementation to make the test at test_path pass.",
            InputSchema: schema(
                []string{"project_root", "test_path"},
                map[string]any{
                    "project_root": strProp,
                    "test_path":    strProp,
                    "model":        strProp,
                    "test_cmd":     strProp,
                },
            ),
        },
        {
            Name:        "tdd_refactor",
            Description: "Refactor the implementation at impl_path while keeping tests green.",
            InputSchema: schema(
                []string{"project_root", "test_path", "impl_path"},
                map[string]any{
                    "project_root": strProp,
                    "test_path":    strProp,
                    "impl_path":    strProp,
                    "model":        strProp,
                    "test_cmd":     strProp,
                },
            ),
        },
    }
}

Step 4: Create internal/skills/tdd/handlers.go

package tdd

import (
    "context"
    "encoding/json"
    "fmt"

    iexec "github.com/mathiasbq/supervisor/internal/exec"
)

func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
    switch tool {
    case "tdd_red":
        return s.handleRed(ctx, args)
    case "tdd_green":
        return s.handleGreen(ctx, args)
    case "tdd_refactor":
        return s.handleRefactor(ctx, args)
    default:
        return nil, fmt.Errorf("unknown tool: %s", tool)
    }
}

type redArgs struct {
    ProjectRoot string `json:"project_root"`
    Spec        string `json:"spec"`
    Model       string `json:"model"`
    TestCmd     string `json:"test_cmd"`
}

func (s *Skill) handleRed(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) {
    var args redArgs
    if err := json.Unmarshal(raw, &args); err != nil {
        return nil, fmt.Errorf("parse args: %w", err)
    }
    if args.ProjectRoot == "" {
        return nil, fmt.Errorf("project_root is required")
    }
    if args.Spec == "" {
        return nil, fmt.Errorf("spec is required")
    }

    task := fmt.Sprintf(
        "phase: red\nproject_root: %s\nspec: %s\nmodel: %s\ntest_cmd: %s",
        args.ProjectRoot, args.Spec, s.resolveModel(args.Model), args.TestCmd,
    )
    return s.execute(ctx, task)
}

type greenArgs struct {
    ProjectRoot string `json:"project_root"`
    TestPath    string `json:"test_path"`
    Model       string `json:"model"`
    TestCmd     string `json:"test_cmd"`
}

func (s *Skill) handleGreen(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) {
    var args greenArgs
    if err := json.Unmarshal(raw, &args); err != nil {
        return nil, fmt.Errorf("parse args: %w", err)
    }
    if args.ProjectRoot == "" {
        return nil, fmt.Errorf("project_root is required")
    }
    if args.TestPath == "" {
        return nil, fmt.Errorf("test_path is required")
    }

    task := fmt.Sprintf(
        "phase: green\nproject_root: %s\ntest_path: %s\nmodel: %s\ntest_cmd: %s",
        args.ProjectRoot, args.TestPath, s.resolveModel(args.Model), args.TestCmd,
    )
    return s.execute(ctx, task)
}

type refactorArgs struct {
    ProjectRoot string `json:"project_root"`
    TestPath    string `json:"test_path"`
    ImplPath    string `json:"impl_path"`
    Model       string `json:"model"`
    TestCmd     string `json:"test_cmd"`
}

func (s *Skill) handleRefactor(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) {
    var args refactorArgs
    if err := json.Unmarshal(raw, &args); err != nil {
        return nil, fmt.Errorf("parse args: %w", err)
    }
    if args.ProjectRoot == "" {
        return nil, fmt.Errorf("project_root is required")
    }
    if args.TestPath == "" {
        return nil, fmt.Errorf("test_path is required")
    }
    if args.ImplPath == "" {
        return nil, fmt.Errorf("impl_path is required")
    }

    task := fmt.Sprintf(
        "phase: refactor\nproject_root: %s\ntest_path: %s\nimpl_path: %s\nmodel: %s\ntest_cmd: %s",
        args.ProjectRoot, args.TestPath, args.ImplPath, s.resolveModel(args.Model), args.TestCmd,
    )
    return s.execute(ctx, task)
}

func (s *Skill) resolveModel(override string) string {
    if override != "" {
        return override
    }
    return s.cfg.DefaultModel
}

func (s *Skill) execute(ctx context.Context, task string) (json.RawMessage, error) {
    req := iexec.Request{
        SkillPrompt: s.cfg.SkillPrompt,
        TaskPrompt:  task,
    }

    var result iexec.Result
    var err error

    if s.cfg.ExecutorFn != nil {
        result, err = s.cfg.ExecutorFn(ctx, req)
    } else {
        return nil, fmt.Errorf("no executor configured")
    }

    if err != nil {
        return nil, err
    }

    return json.Marshal(result)
}

Step 5: Run — expect PASS

go test ./internal/skills/tdd/...

Expected: PASS

Step 6: Commit

git add internal/skills/tdd/
git commit -m "feat: add TDD skill with red/green/refactor handlers"

Task 9: Wire main.go

Files:

Modify: cmd/supervisor/main.go
Step 1: Write failing integration test

Create cmd/supervisor/integration_test.go:

//go:build integration

package main_test

import (
    "bytes"
    "encoding/json"
    "net/http"
    "testing"
    "time"

    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func TestServerStartsAndResponds(t *testing.T) {
    // Start the server in a goroutine (assumes it's running on :3200)
    // This test is run manually after `task supervisor:dev`
    time.Sleep(500 * time.Millisecond)

    body, _ := json.Marshal(map[string]any{
        "jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": map[string]any{},
    })
    resp, err := http.Post("http://localhost:3200/mcp", "application/json", bytes.NewBuffer(body))
    require.NoError(t, err)
    defer resp.Body.Close()

    assert.Equal(t, http.StatusOK, resp.StatusCode)
}

Step 2: Run unit tests — all still pass

go test ./...

Expected: PASS (integration test skipped without build tag)

Step 3: Rewrite cmd/supervisor/main.go

package main

import (
    "log/slog"
    "net/http"
    "os"

    "github.com/mathiasbq/supervisor/internal/config"
    iexec "github.com/mathiasbq/supervisor/internal/exec"
    "github.com/mathiasbq/supervisor/internal/mcp"
    "github.com/mathiasbq/supervisor/internal/registry"
    "github.com/mathiasbq/supervisor/internal/skills/tdd"
)

func main() {
    logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

    cfg, err := config.Load()
    if err != nil {
        logger.Error("load config", "err", err)
        os.Exit(1)
    }

    models, err := config.LoadModels(cfg.ModelsFile)
    if err != nil {
        logger.Error("load models", "err", err)
        os.Exit(1)
    }

    systemPrompt, err := os.ReadFile(cfg.ConfigDir + "/CLAUDE.md")
    if err != nil {
        logger.Error("read supervisor CLAUDE.md", "err", err)
        os.Exit(1)
    }

    tddPrompt, err := os.ReadFile(cfg.ConfigDir + "/tdd.md")
    if err != nil {
        logger.Error("read tdd.md", "err", err)
        os.Exit(1)
    }

    executor := iexec.New(iexec.Config{
        SystemPrompt:   string(systemPrompt),
        LiteLLMBaseURL: cfg.LiteLLMBaseURL,
        LiteLLMAPIKey:  cfg.LiteLLMAPIKey,
    })

    reg := registry.New()
    reg.Register(tdd.New(tdd.Config{
        SystemPrompt: string(systemPrompt),
        SkillPrompt:  string(tddPrompt),
        DefaultModel: models.Resolve("tdd", ""),
        ExecutorFn:   executor.Run,
    }))

    srv := mcp.NewServer(reg)
    mux := http.NewServeMux()
    mux.Handle("/mcp", srv)

    addr := ":" + cfg.Port
    logger.Info("supervisor starting", "addr", addr)
    if err := http.ListenAndServe(addr, mux); err != nil {
        logger.Error("server error", "err", err)
        os.Exit(1)
    }
}

Step 4: Run all tests — expect PASS

go test ./...

Expected: PASS

Step 5: Verify it builds

go build ./cmd/supervisor/

Expected: binary produced, no errors

Step 6: Commit

git add cmd/supervisor/main.go cmd/supervisor/integration_test.go
git commit -m "feat: wire all components in main.go"

Task 10: Config files

Files:

Create: config/supervisor/CLAUDE.md
Create: config/supervisor/tdd.md
Create: .env.example
Step 1: Create config/supervisor/CLAUDE.md

# Supervisor Agent

You are a supervisor agent. You orchestrate skill workers.
You do not converse. You do not explain. You execute and report.

## Output contract — NON-NEGOTIABLE

Every response must be raw JSON conforming to the provided schema.
No preamble, no markdown, no prose. Raw JSON only.

If you cannot produce valid JSON for any reason, output:
{"status":"error","phase":"","skill":"","file_path":"","runner_output":"","verified":false,"model_used":"self","message":"<reason>"}

## Verification — IRON LAW

Never trust your own output. Always run the test suite.
Use the subprocess exit code as the only source of truth.
`verified: true` only when exit code matches the expected outcome for the phase.
`verified: false` even if you believe the code is correct.

## Loop prevention

- Maximum 3 attempts per phase.
- If output is identical across two consecutive attempts, stop immediately.
- After 3 failures set status "error" and explain the blocker in message.
- Never retry indefinitely.

## Test runner detection

Inspect project_root for these signals in order:

| Signal                          | Command              |
|---------------------------------|----------------------|
| go.mod                          | go test ./...        |
| package.json                    | npm test             |
| pyproject.toml / pytest.ini     | pytest               |
| Cargo.toml                      | cargo test           |
| Gemfile                         | bundle exec rspec    |
| mix.exs                         | mix test             |

If test_cmd is provided in the task, use it directly — skip detection.
If runner cannot be determined, return status "error".

## Model delegation

When a model is specified in the task, you may use it for code generation
by calling LiteLLM at the LITELLM_BASE_URL with the specified model name.
Always retain verification yourself — never delegate test execution.
Tag model_used in your response with the model that generated the code,
or "self" if you generated it directly.

Step 2: Create config/supervisor/tdd.md

# TDD Skill

## Iron Law

NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST.

## Red phase

- Write exactly one test. One behavior. Name must describe the behavior clearly.
- Run the test suite. Confirm the test FAILS.
- If the test passes immediately: it tests existing behavior or is vacuous.
  Return status "fail" with message explaining why the test is wrong.
- Do not write any implementation code in this phase.

## Green phase

- Write the minimal code to make the failing test pass. Nothing more.
- YAGNI: no extra parameters, no future-proofing, no clever abstractions.
- Run the test suite. Confirm it PASSES.
- If tests fail: fix the implementation, not the test. Max 3 attempts.

## Refactor phase

- Improve structure, naming, or clarity only. No new behavior.
- Tests must remain green after every change.
- If tests break during refactor: revert that change, return status "fail".

Step 3: Create .env.example

# Supervisor MCP server
SUPERVISOR_PORT=3200
SUPERVISOR_CONFIG_DIR=./config/supervisor
SUPERVISOR_MODELS_FILE=./config/models.yaml

# LiteLLM gateway (iguana)
LITELLM_BASE_URL=http://iguana:4000
LITELLM_API_KEY=your-litellm-master-key

Step 4: Run all tests — still pass

task check

Expected: PASS

Step 5: Commit

git add config/ .env.example
git commit -m "feat: add supervisor CLAUDE.md, tdd skill prompt, and env example"

Task 11: Taskfile + MCP registration

Files:

Modify: Taskfile.yml
Modify: .context/mcp.json
Step 1: Add tasks to Taskfile.yml

Add these tasks to the existing Taskfile.yml:

  supervisor:dev:
    desc: Run supervisor MCP server (development)
    cmds:
      - go run ./cmd/supervisor

  supervisor:build:
    desc: Build supervisor binary
    cmds:
      - go build -o bin/supervisor ./cmd/supervisor

  supervisor:test:smoke:
    desc: Smoke test supervisor via MCP (requires supervisor:dev running)
    cmds:
      - |
        curl -s -X POST http://localhost:${SUPERVISOR_PORT:-3200}/mcp \
          -H "Content-Type: application/json" \
          -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | jq .

Step 2: Register supervisor in .context/mcp.json

Replace contents of .context/mcp.json:

{
  "mcpServers": {
    "knowledge": {
      "url": "http://localhost:3100/mcp",
      "description": "Project knowledge base — vector + graph retrieval"
    },
    "supervisor": {
      "url": "http://localhost:3200/mcp",
      "description": "Skill workers — TDD (red/green/refactor), more coming"
    }
  }
}

Step 3: Sync context files

task context:sync

Expected: CLAUDE.md, AGENTS.md updated with new MCP server entry

Step 4: Commit

git add Taskfile.yml .context/mcp.json CLAUDE.md AGENTS.md
git commit -m "feat: register supervisor MCP server and add task targets"

Task 12: End-to-end smoke test

Manual verification that all layers connect.

Step 1: Start the supervisor

In a terminal:

cp .env.example .env  # fill in LITELLM_API_KEY if needed
task supervisor:dev

Expected: {"time":"...","level":"INFO","msg":"supervisor starting","addr":":3200"}

Step 2: Verify tools/list

task supervisor:test:smoke

Expected: JSON response with tdd_red, tdd_green, tdd_refactor in result.tools

Step 3: Verify tdd_red responds

curl -s -X POST http://localhost:3200/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":1,"method":"tools/call",
    "params":{
      "name":"tdd_red",
      "arguments":{
        "project_root":"'"$PWD"'",
        "spec":"a function that returns the sum of two integers"
      }
    }
  }' | jq .

Expected: JSON-RPC response with result.content[0].text containing a valid supervisor Result JSON

Step 4: Reload Claude Code MCP config

In Claude Code, run /mcp to verify the supervisor server appears and its tools are listed.

Step 5: Final commit

git add -p  # stage any notes or fixes from smoke test
git commit -m "chore: smoke test complete, supervisor MCP server operational"

go.sum

After all go get commands in the tasks above, run:

go mod tidy
git add go.sum
git commit -m "chore: tidy go.sum"

47 KiB Raw Blame History

Supervisor MCP Server Implementation Plan

File Map

Task 1: Go module + project skeleton

Task 2: Result type

Task 3: Config package

Task 4: Models config

Task 5: Exec package

Task 6: Registry + Skill interface

Task 7: MCP server

Task 8: TDD skill

Task 9: Wire main.go

Task 10: Config files

Task 11: Taskfile + MCP registration

Task 12: End-to-end smoke test

go.sum

47 KiB

Raw Blame History