Files
hyperguild/docs/superpowers/plans/2026-04-16-supervisor-mcp-server.md
Mathias Bergqvist 7d108b2911 docs: add supervisor MCP server implementation plan
12 TDD tasks covering: Go module, Result type, Config, Models,
Executor (claude --print), Registry, MCP server, TDD skill,
main wiring, config files, Taskfile/MCP registration, smoke test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 06:56:48 +02:00

47 KiB

Supervisor MCP Server Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Build a Go MCP server that exposes TDD skill workers as tools Claude Code can call natively, with a Claude supervisor instance handling orchestration and verification.

Architecture: A thin Go HTTP server routes MCP tool calls to skill handlers. Each handler spawns claude --print with --bare --system-prompt --json-schema --permission-mode bypassPermissions, injecting supervisor rules and skill discipline. The spawned Claude instance uses its own Bash/Read/Write tools to run tests and generate code, returning a structured JSON result. LiteLLM on iguana handles Ollama model delegation.

Tech Stack: Go 1.26, net/http (stdlib), gopkg.in/yaml.v3, testify, claude CLI, LiteLLM


File Map

File Responsibility
go.mod Module definition
cmd/supervisor/main.go Entry point, wires all dependencies
internal/config/config.go Env vars → typed Config struct
internal/config/config_test.go Config parsing tests
internal/config/models.go YAML model routing table
internal/config/models_test.go Model resolution tests
internal/exec/result.go Result struct + JSON schema constant
internal/exec/result_test.go JSON parsing/validation tests
internal/exec/executor.go Spawn claude, capture output, parse result
internal/exec/executor_test.go Executor tests using fake binary
internal/registry/registry.go Skill interface + registry
internal/registry/registry_test.go Registry routing tests
internal/mcp/server.go HTTP/JSON-RPC MCP server
internal/mcp/server_test.go MCP protocol tests
internal/skills/tdd/skill.go Tool definitions (MCP schema)
internal/skills/tdd/handlers.go tdd_red, tdd_green, tdd_refactor
internal/skills/tdd/handlers_test.go Handler prompt-building tests
config/supervisor/CLAUDE.md Supervisor orchestration rules
config/supervisor/tdd.md TDD iron law (injected per call)
config/models.yaml Per-skill model routing table
.env.example Documented env vars
Taskfile.yml Add supervisor:dev and supervisor:build tasks
.context/mcp.json Register supervisor as MCP server

Task 1: Go module + project skeleton

Files:

  • Create: go.mod

  • Create: cmd/supervisor/main.go

  • Step 1: Write a failing test to verify the binary compiles and exits cleanly

Create cmd/supervisor/main_test.go:

package main

import (
    "os/exec"
    "testing"
)

func TestBinaryCompiles(t *testing.T) {
    cmd := exec.Command("go", "build", "./...")
    out, err := cmd.CombinedOutput()
    if err != nil {
        t.Fatalf("build failed: %s\n%s", err, out)
    }
}
  • Step 2: Run test — expect it to fail (no go.mod yet)
go test ./...

Expected: error — no go.mod

  • Step 3: Initialize the module
go mod init github.com/mathiasbq/supervisor
  • Step 4: Create cmd/supervisor/main.go
package main

import (
    "fmt"
    "os"
)

func main() {
    fmt.Fprintln(os.Stderr, "supervisor: not yet implemented")
    os.Exit(1)
}
  • Step 5: Run test — expect it to pass
go test ./cmd/supervisor/...

Expected: PASS

  • Step 6: Commit
git add go.mod cmd/
git commit -m "chore: initialize go module and cmd skeleton"

Task 2: Result type

Files:

  • Create: internal/exec/result.go

  • Create: internal/exec/result_test.go

  • Step 1: Write failing tests for result parsing

Create internal/exec/result_test.go:

package exec_test

import (
    "encoding/json"
    "testing"

    "github.com/mathiasbq/supervisor/internal/exec"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func TestResultParsesValidJSON(t *testing.T) {
    raw := `{
        "status": "pass",
        "phase": "red",
        "skill": "tdd",
        "file_path": "/tmp/foo_test.go",
        "runner_output": "--- FAIL: TestFoo",
        "verified": true,
        "model_used": "self",
        "message": "test fails as expected"
    }`
    var r exec.Result
    require.NoError(t, json.Unmarshal([]byte(raw), &r))
    assert.Equal(t, "pass", r.Status)
    assert.Equal(t, "red", r.Phase)
    assert.True(t, r.Verified)
}

func TestResultValidation(t *testing.T) {
    tests := []struct {
        name    string
        result  exec.Result
        wantErr bool
    }{
        {
            name: "valid pass result",
            result: exec.Result{
                Status: "pass", Phase: "red", Skill: "tdd",
                FilePath: "/tmp/x_test.go", RunnerOutput: "FAIL",
                Verified: true, ModelUsed: "self", Message: "ok",
            },
            wantErr: false,
        },
        {
            name:    "empty status",
            result:  exec.Result{Phase: "red", Skill: "tdd"},
            wantErr: true,
        },
        {
            name:    "invalid status",
            result:  exec.Result{Status: "unknown", Phase: "red", Skill: "tdd"},
            wantErr: true,
        },
        {
            name:    "invalid phase",
            result:  exec.Result{Status: "pass", Phase: "bad", Skill: "tdd"},
            wantErr: true,
        },
    }
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            err := tt.result.Validate()
            if tt.wantErr {
                assert.Error(t, err)
            } else {
                assert.NoError(t, err)
            }
        })
    }
}
  • Step 2: Run — expect FAIL (package does not exist)
go test ./internal/exec/...

Expected: FAIL — cannot find package

  • Step 3: Install testify
go get github.com/stretchr/testify@latest
  • Step 4: Create internal/exec/result.go
package exec

import (
    "errors"
    "strings"
)

// Result is the structured JSON output from every supervisor invocation.
// The JSON schema constant is passed to claude via --json-schema so Claude
// validates its own output before returning.
type Result struct {
    Status       string `json:"status"`        // pass | fail | error
    Phase        string `json:"phase"`         // red | green | refactor
    Skill        string `json:"skill"`         // tdd | review | ...
    FilePath     string `json:"file_path"`     // absolute path to generated file
    RunnerOutput string `json:"runner_output"` // raw stdout+stderr from test runner
    Verified     bool   `json:"verified"`      // based on exit code, never self-report
    ModelUsed    string `json:"model_used"`    // model name or "self"
    Message      string `json:"message"`       // one sentence summary
}

var validStatuses = map[string]bool{"pass": true, "fail": true, "error": true}
var validPhases = map[string]bool{"red": true, "green": true, "refactor": true}

func (r Result) Validate() error {
    var errs []string
    if !validStatuses[r.Status] {
        errs = append(errs, "status must be pass|fail|error, got: "+r.Status)
    }
    if !validPhases[r.Phase] {
        errs = append(errs, "phase must be red|green|refactor, got: "+r.Phase)
    }
    if r.Skill == "" {
        errs = append(errs, "skill is required")
    }
    if len(errs) > 0 {
        return errors.New(strings.Join(errs, "; "))
    }
    return nil
}

// Schema is passed to claude --json-schema to enforce structured output.
const Schema = `{
  "type": "object",
  "required": ["status","phase","skill","file_path","runner_output","verified","model_used","message"],
  "properties": {
    "status":        {"type": "string", "enum": ["pass","fail","error"]},
    "phase":         {"type": "string", "enum": ["red","green","refactor"]},
    "skill":         {"type": "string"},
    "file_path":     {"type": "string"},
    "runner_output": {"type": "string"},
    "verified":      {"type": "boolean"},
    "model_used":    {"type": "string"},
    "message":       {"type": "string"}
  }
}`
  • Step 5: Run — expect PASS
go test ./internal/exec/...

Expected: PASS

  • Step 6: Commit
git add internal/exec/result.go internal/exec/result_test.go
git commit -m "feat: add Result type with JSON schema and validation"

Task 3: Config package

Files:

  • Create: internal/config/config.go

  • Create: internal/config/config_test.go

  • Step 1: Write failing tests

Create internal/config/config_test.go:

package config_test

import (
    "testing"

    "github.com/mathiasbq/supervisor/internal/config"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func TestLoadDefaults(t *testing.T) {
    t.Setenv("SUPERVISOR_PORT", "")
    t.Setenv("LITELLM_BASE_URL", "")
    t.Setenv("LITELLM_API_KEY", "")
    t.Setenv("SUPERVISOR_CONFIG_DIR", "")

    cfg, err := config.Load()
    require.NoError(t, err)
    assert.Equal(t, "3200", cfg.Port)
    assert.Equal(t, "http://iguana:4000", cfg.LiteLLMBaseURL)
    assert.Equal(t, "./config/supervisor", cfg.ConfigDir)
}

func TestLoadFromEnv(t *testing.T) {
    t.Setenv("SUPERVISOR_PORT", "4000")
    t.Setenv("LITELLM_BASE_URL", "http://localhost:4000")
    t.Setenv("LITELLM_API_KEY", "test-key")
    t.Setenv("SUPERVISOR_CONFIG_DIR", "/etc/supervisor")

    cfg, err := config.Load()
    require.NoError(t, err)
    assert.Equal(t, "4000", cfg.Port)
    assert.Equal(t, "http://localhost:4000", cfg.LiteLLMBaseURL)
    assert.Equal(t, "test-key", cfg.LiteLLMAPIKey)
    assert.Equal(t, "/etc/supervisor", cfg.ConfigDir)
}
  • Step 2: Run — expect FAIL
go test ./internal/config/...

Expected: FAIL — cannot find package

  • Step 3: Create internal/config/config.go
package config

import "os"

type Config struct {
    Port           string // SUPERVISOR_PORT, default 3200
    LiteLLMBaseURL string // LITELLM_BASE_URL, default http://iguana:4000
    LiteLLMAPIKey  string // LITELLM_API_KEY
    ConfigDir      string // SUPERVISOR_CONFIG_DIR, default ./config/supervisor
    ModelsFile     string // SUPERVISOR_MODELS_FILE, default <ConfigDir>/../models.yaml
}

func Load() (Config, error) {
    cfg := Config{
        Port:           envOr("SUPERVISOR_PORT", "3200"),
        LiteLLMBaseURL: envOr("LITELLM_BASE_URL", "http://iguana:4000"),
        LiteLLMAPIKey:  os.Getenv("LITELLM_API_KEY"),
        ConfigDir:      envOr("SUPERVISOR_CONFIG_DIR", "./config/supervisor"),
    }
    cfg.ModelsFile = envOr("SUPERVISOR_MODELS_FILE", cfg.ConfigDir+"/../models.yaml")
    return cfg, nil
}

func envOr(key, def string) string {
    if v := os.Getenv(key); v != "" {
        return v
    }
    return def
}
  • Step 4: Run — expect PASS
go test ./internal/config/...

Expected: PASS

  • Step 5: Commit
git add internal/config/
git commit -m "feat: add config package with env-var loading"

Task 4: Models config

Files:

  • Create: internal/config/models.go

  • Create: internal/config/models_test.go

  • Create: config/models.yaml

  • Step 1: Write failing tests

Append to internal/config/models_test.go (new file):

package config_test

import (
    "os"
    "path/filepath"
    "testing"

    "github.com/mathiasbq/supervisor/internal/config"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func TestModelsResolve(t *testing.T) {
    yaml := `
default: ollama/default-model
skills:
  tdd: ollama/qwen3-coder-30b-tuned
  review: ollama/devstral-tuned
`
    f := filepath.Join(t.TempDir(), "models.yaml")
    require.NoError(t, os.WriteFile(f, []byte(yaml), 0644))

    m, err := config.LoadModels(f)
    require.NoError(t, err)

    assert.Equal(t, "ollama/qwen3-coder-30b-tuned", m.Resolve("tdd", ""))
    assert.Equal(t, "ollama/devstral-tuned", m.Resolve("review", ""))
    assert.Equal(t, "ollama/default-model", m.Resolve("unknown", ""))
}

func TestModelsOverride(t *testing.T) {
    yaml := `
default: ollama/default-model
skills:
  tdd: ollama/qwen3-coder-30b-tuned
`
    f := filepath.Join(t.TempDir(), "models.yaml")
    require.NoError(t, os.WriteFile(f, []byte(yaml), 0644))

    m, err := config.LoadModels(f)
    require.NoError(t, err)

    // caller override wins over skill default
    assert.Equal(t, "anthropic/claude-sonnet-4-6", m.Resolve("tdd", "anthropic/claude-sonnet-4-6"))
}
  • Step 2: Run — expect FAIL
go test ./internal/config/...

Expected: FAIL — Models type undefined

  • Step 3: Install yaml dependency
go get gopkg.in/yaml.v3
  • Step 4: Create internal/config/models.go
package config

import (
    "fmt"
    "os"

    "gopkg.in/yaml.v3"
)

type modelsFile struct {
    Default string            `yaml:"default"`
    Skills  map[string]string `yaml:"skills"`
}

type Models struct {
    data modelsFile
}

func LoadModels(path string) (Models, error) {
    raw, err := os.ReadFile(path)
    if err != nil {
        return Models{}, fmt.Errorf("load models: %w", err)
    }
    var f modelsFile
    if err := yaml.Unmarshal(raw, &f); err != nil {
        return Models{}, fmt.Errorf("parse models: %w", err)
    }
    return Models{data: f}, nil
}

// Resolve returns the model for a skill, respecting three-layer priority:
// 1. override (from MCP call) — highest
// 2. per-skill default from models.yaml
// 3. global default
func (m Models) Resolve(skill, override string) string {
    if override != "" {
        return override
    }
    if model, ok := m.data.Skills[skill]; ok {
        return model
    }
    return m.data.Default
}
  • Step 5: Run — expect PASS
go test ./internal/config/...

Expected: PASS

  • Step 6: Create config/models.yaml
# Model routing table — three-layer priority:
# 1. model param in MCP tool call (caller override)
# 2. per-skill entry here
# 3. default (fallback)
default: ollama/qwen3-coder-30b-tuned

skills:
  tdd:    ollama/qwen3-coder-30b-tuned
  review: ollama/devstral-tuned
  debug:  ollama/deepseek-r1-tuned
  • Step 7: Commit
git add internal/config/models.go internal/config/models_test.go config/models.yaml
git commit -m "feat: add model routing table with three-layer priority"

Task 5: Exec package

Files:

  • Create: internal/exec/executor.go

  • Create: internal/exec/executor_test.go

  • Modify: internal/exec/result.go (add Schema constant — already done in Task 2)

  • Step 1: Write failing tests

Create internal/exec/executor_test.go:

package exec_test

import (
    "context"
    "os"
    "path/filepath"
    "testing"
    "time"

    iexec "github.com/mathiasbq/supervisor/internal/exec"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

// fakeClaudePath writes a shell script that prints a fixed JSON result
// and returns its path. Used to test executor without invoking real claude.
func fakeClaudePath(t *testing.T, output string, exitCode int) string {
    t.Helper()
    dir := t.TempDir()
    script := filepath.Join(dir, "claude")
    var content string
    if exitCode != 0 {
        content = "#!/bin/sh\necho 'error' >&2\nexit 1\n"
    } else {
        content = "#!/bin/sh\necho '" + output + "'\n"
    }
    require.NoError(t, os.WriteFile(script, []byte(content), 0755))
    return script
}

func TestExecutorParsesValidResult(t *testing.T) {
    validJSON := `{"status":"pass","phase":"red","skill":"tdd","file_path":"/tmp/x_test.go","runner_output":"FAIL","verified":true,"model_used":"self","message":"ok"}`
    claude := fakeClaudePath(t, validJSON, 0)

    ex := iexec.New(iexec.Config{
        ClaudeBinary:  claude,
        SystemPrompt:  "you are a supervisor",
        Timeout:       5 * time.Second,
    })

    result, err := ex.Run(context.Background(), iexec.Request{
        SkillPrompt: "tdd rules",
        TaskPrompt:  "run red phase",
    })
    require.NoError(t, err)
    assert.Equal(t, "pass", result.Status)
    assert.True(t, result.Verified)
}

func TestExecutorReturnsErrorOnNonZeroExit(t *testing.T) {
    claude := fakeClaudePath(t, "", 1)

    ex := iexec.New(iexec.Config{
        ClaudeBinary: claude,
        SystemPrompt: "you are a supervisor",
        Timeout:      5 * time.Second,
    })

    _, err := ex.Run(context.Background(), iexec.Request{TaskPrompt: "fail"})
    assert.Error(t, err)
}

func TestExecutorTimesOut(t *testing.T) {
    dir := t.TempDir()
    script := filepath.Join(dir, "claude")
    require.NoError(t, os.WriteFile(script, []byte("#!/bin/sh\nsleep 60\n"), 0755))

    ex := iexec.New(iexec.Config{
        ClaudeBinary: script,
        SystemPrompt: "you are a supervisor",
        Timeout:      100 * time.Millisecond,
    })

    _, err := ex.Run(context.Background(), iexec.Request{TaskPrompt: "slow"})
    assert.ErrorContains(t, err, "timeout")
}
  • Step 2: Verify the test file compiles
go vet ./internal/exec/...

Expected: no errors

  • Step 3: Run — expect FAIL
go test ./internal/exec/...

Expected: FAIL — Executor type undefined

  • Step 4: Create internal/exec/executor.go
package exec

import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "os/exec"
    "strings"
    "time"
)

// Config holds executor configuration.
type Config struct {
    ClaudeBinary   string        // path to claude binary, defaults to "claude"
    SystemPrompt   string        // contents of supervisor CLAUDE.md
    Timeout        time.Duration // per-invocation timeout, default 120s
    LiteLLMBaseURL string        // passed to Claude so it can delegate to Ollama
    LiteLLMAPIKey  string        // passed to Claude for LiteLLM auth
}

// Request is the input to a single supervisor invocation.
type Request struct {
    SkillPrompt string // skill-specific discipline (e.g. tdd.md contents)
    TaskPrompt  string // the specific task (phase, project_root, spec, model)
    Model       string // resolved model name, passed in task prompt
    Tools       string // comma-separated allowed tools, default "Bash,Read,Write"
}

// Executor spawns a claude instance and captures its structured JSON output.
type Executor struct {
    cfg Config
}

func New(cfg Config) *Executor {
    if cfg.ClaudeBinary == "" {
        cfg.ClaudeBinary = "claude"
    }
    if cfg.Timeout == 0 {
        cfg.Timeout = 120 * time.Second
    }
    return &Executor{cfg: cfg}
}

func (e *Executor) Run(ctx context.Context, req Request) (Result, error) {
    ctx, cancel := context.WithTimeout(ctx, e.cfg.Timeout)
    defer cancel()

    tools := req.Tools
    if tools == "" {
        tools = "Bash,Read,Write"
    }

    // Build the full prompt: system rules + skill rules + task + infra context
    litellmCtx := fmt.Sprintf(
        "LITELLM_BASE_URL: %s\nLITELLM_API_KEY: %s",
        e.cfg.LiteLLMBaseURL, e.cfg.LiteLLMAPIKey,
    )
    prompt := strings.Join([]string{
        e.cfg.SystemPrompt,
        "---",
        req.SkillPrompt,
        "---",
        litellmCtx,
        "---",
        req.TaskPrompt,
    }, "\n\n")

    args := []string{
        "--print",
        "--bare",
        "--permission-mode", "bypassPermissions",
        "--tools", tools,
        "--json-schema", Schema,
        "--output-format", "text",
        prompt,
    }

    cmd := exec.CommandContext(ctx, e.cfg.ClaudeBinary, args...)
    var stdout, stderr bytes.Buffer
    cmd.Stdout = &stdout
    cmd.Stderr = &stderr

    if err := cmd.Run(); err != nil {
        if ctx.Err() != nil {
            return Result{}, fmt.Errorf("timeout after %s", e.cfg.Timeout)
        }
        return Result{}, fmt.Errorf("claude exited with error: %w — stderr: %s", err, stderr.String())
    }

    var r Result
    if err := json.Unmarshal(stdout.Bytes(), &r); err != nil {
        return Result{}, fmt.Errorf("parse result JSON: %w — raw output: %s", err, stdout.String())
    }
    if err := r.Validate(); err != nil {
        return Result{}, fmt.Errorf("invalid result: %w", err)
    }
    return r, nil
}
  • Step 5: Run — expect PASS
go test ./internal/exec/...

Expected: PASS

  • Step 6: Commit
git add internal/exec/executor.go internal/exec/executor_test.go
git commit -m "feat: add executor that spawns claude and parses JSON result"

Task 6: Registry + Skill interface

Files:

  • Create: internal/registry/registry.go

  • Create: internal/registry/registry_test.go

  • Step 1: Write failing tests

Create internal/registry/registry_test.go:

package registry_test

import (
    "context"
    "encoding/json"
    "testing"

    "github.com/mathiasbq/supervisor/internal/registry"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

type stubSkill struct {
    name string
}

func (s stubSkill) Name() string { return s.name }
func (s stubSkill) Tools() []registry.ToolDef {
    return []registry.ToolDef{{
        Name:        s.name + "_tool",
        Description: "stub tool",
        InputSchema: json.RawMessage(`{"type":"object","properties":{}}`),
    }}
}
func (s stubSkill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
    return json.RawMessage(`{"ok":true}`), nil
}

func TestRegistryRoutes(t *testing.T) {
    r := registry.New()
    r.Register(stubSkill{name: "tdd"})

    result, err := r.Dispatch(context.Background(), "tdd_tool", json.RawMessage(`{}`))
    require.NoError(t, err)
    assert.JSONEq(t, `{"ok":true}`, string(result))
}

func TestRegistryUnknownTool(t *testing.T) {
    r := registry.New()
    _, err := r.Dispatch(context.Background(), "unknown_tool", json.RawMessage(`{}`))
    assert.ErrorContains(t, err, "unknown tool")
}

func TestRegistryListsTools(t *testing.T) {
    r := registry.New()
    r.Register(stubSkill{name: "tdd"})

    tools := r.Tools()
    require.Len(t, tools, 1)
    assert.Equal(t, "tdd_tool", tools[0].Name)
}
  • Step 2: Run — expect FAIL
go test ./internal/registry/...

Expected: FAIL — registry package not found

  • Step 3: Create internal/registry/registry.go
package registry

import (
    "context"
    "encoding/json"
    "fmt"
)

// ToolDef describes a single MCP tool exposed by a skill.
type ToolDef struct {
    Name        string          `json:"name"`
    Description string          `json:"description"`
    InputSchema json.RawMessage `json:"inputSchema"`
}

// Skill is implemented by each skill package (tdd, review, etc.).
type Skill interface {
    Name() string
    Tools() []ToolDef
    Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error)
}

// Registry routes MCP tool calls to the correct skill handler.
type Registry struct {
    skills map[string]Skill // tool name → skill
    tools  []ToolDef
}

func New() *Registry {
    return &Registry{skills: make(map[string]Skill)}
}

func (r *Registry) Register(s Skill) {
    for _, t := range s.Tools() {
        r.skills[t.Name] = s
        r.tools = append(r.tools, t)
    }
}

func (r *Registry) Tools() []ToolDef {
    return r.tools
}

func (r *Registry) Dispatch(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
    s, ok := r.skills[tool]
    if !ok {
        return nil, fmt.Errorf("unknown tool: %s", tool)
    }
    return s.Handle(ctx, tool, args)
}
  • Step 4: Run — expect PASS
go test ./internal/registry/...

Expected: PASS

  • Step 5: Commit
git add internal/registry/
git commit -m "feat: add skill registry with tool routing"

Task 7: MCP server

Files:

  • Create: internal/mcp/server.go

  • Create: internal/mcp/server_test.go

  • Step 1: Write failing tests

Create internal/mcp/server_test.go:

package mcp_test

import (
    "bytes"
    "context"
    "encoding/json"
    "net/http"
    "net/http/httptest"
    "testing"

    "github.com/mathiasbq/supervisor/internal/mcp"
    "github.com/mathiasbq/supervisor/internal/registry"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func jsonBody(t *testing.T, v any) *bytes.Buffer {
    t.Helper()
    b, err := json.Marshal(v)
    require.NoError(t, err)
    return bytes.NewBuffer(b)
}

func TestMCPInitialize(t *testing.T) {
    reg := registry.New()
    srv := mcp.NewServer(reg)

    req := httptest.NewRequest(http.MethodPost, "/mcp", jsonBody(t, map[string]any{
        "jsonrpc": "2.0",
        "id":      1,
        "method":  "initialize",
        "params":  map[string]any{},
    }))
    req.Header.Set("Content-Type", "application/json")
    rr := httptest.NewRecorder()

    srv.ServeHTTP(rr, req)

    assert.Equal(t, http.StatusOK, rr.Code)
    var resp map[string]any
    require.NoError(t, json.Unmarshal(rr.Body.Bytes(), &resp))
    result := resp["result"].(map[string]any)
    assert.Equal(t, "2024-11-05", result["protocolVersion"])
}

func TestMCPToolsList(t *testing.T) {
    reg := registry.New()
    srv := mcp.NewServer(reg)

    req := httptest.NewRequest(http.MethodPost, "/mcp", jsonBody(t, map[string]any{
        "jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": map[string]any{},
    }))
    req.Header.Set("Content-Type", "application/json")
    rr := httptest.NewRecorder()
    srv.ServeHTTP(rr, req)

    assert.Equal(t, http.StatusOK, rr.Code)
    var resp map[string]any
    require.NoError(t, json.Unmarshal(rr.Body.Bytes(), &resp))
    result := resp["result"].(map[string]any)
    assert.NotNil(t, result["tools"])
}

func TestMCPUnknownMethod(t *testing.T) {
    reg := registry.New()
    srv := mcp.NewServer(reg)

    req := httptest.NewRequest(http.MethodPost, "/mcp", jsonBody(t, map[string]any{
        "jsonrpc": "2.0", "id": 3, "method": "unknown/method", "params": map[string]any{},
    }))
    req.Header.Set("Content-Type", "application/json")
    rr := httptest.NewRecorder()
    srv.ServeHTTP(rr, req)

    assert.Equal(t, http.StatusOK, rr.Code)
    var resp map[string]any
    require.NoError(t, json.Unmarshal(rr.Body.Bytes(), &resp))
    assert.NotNil(t, resp["error"])
}
  • Step 2: Run — expect FAIL
go test ./internal/mcp/...

Expected: FAIL — mcp package not found

  • Step 3: Create internal/mcp/server.go
package mcp

import (
    "context"
    "encoding/json"
    "net/http"

    "github.com/mathiasbq/supervisor/internal/registry"
)

type request struct {
    JSONRPC string          `json:"jsonrpc"`
    ID      any             `json:"id"`
    Method  string          `json:"method"`
    Params  json.RawMessage `json:"params"`
}

type response struct {
    JSONRPC string `json:"jsonrpc"`
    ID      any    `json:"id,omitempty"`
    Result  any    `json:"result,omitempty"`
    Error   *rpcError `json:"error,omitempty"`
}

type rpcError struct {
    Code    int    `json:"code"`
    Message string `json:"message"`
}

// Server is an HTTP handler implementing the MCP JSON-RPC protocol.
type Server struct {
    reg *registry.Registry
}

func NewServer(reg *registry.Registry) *Server {
    return &Server{reg: reg}
}

func (s *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    var req request
    if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
        writeError(w, nil, -32700, "parse error")
        return
    }

    var result any
    var rpcErr *rpcError

    switch req.Method {
    case "initialize":
        result = map[string]any{
            "protocolVersion": "2024-11-05",
            "capabilities":    map[string]any{"tools": map[string]any{}},
            "serverInfo":      map[string]any{"name": "supervisor", "version": "0.1.0"},
        }

    case "tools/list":
        result = map[string]any{"tools": s.reg.Tools()}

    case "tools/call":
        var p struct {
            Name      string          `json:"name"`
            Arguments json.RawMessage `json:"arguments"`
        }
        if err := json.Unmarshal(req.Params, &p); err != nil {
            rpcErr = &rpcError{Code: -32602, Message: "invalid params"}
            break
        }
        out, err := s.reg.Dispatch(context.Background(), p.Name, p.Arguments)
        if err != nil {
            rpcErr = &rpcError{Code: -32000, Message: err.Error()}
            break
        }
        result = map[string]any{
            "content": []map[string]any{{"type": "text", "text": string(out)}},
        }

    default:
        rpcErr = &rpcError{Code: -32601, Message: "method not found: " + req.Method}
    }

    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(response{
        JSONRPC: "2.0",
        ID:      req.ID,
        Result:  result,
        Error:   rpcErr,
    })
}

func writeError(w http.ResponseWriter, id any, code int, msg string) {
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(response{
        JSONRPC: "2.0",
        ID:      id,
        Error:   &rpcError{Code: code, Message: msg},
    })
}
  • Step 4: Run — expect PASS
go test ./internal/mcp/...

Expected: PASS

  • Step 5: Commit
git add internal/mcp/
git commit -m "feat: add MCP HTTP server with JSON-RPC 2.0 transport"

Task 8: TDD skill

Files:

  • Create: internal/skills/tdd/skill.go

  • Create: internal/skills/tdd/handlers.go

  • Create: internal/skills/tdd/handlers_test.go

  • Step 1: Write failing tests

Create internal/skills/tdd/handlers_test.go:

package tdd_test

import (
    "context"
    "encoding/json"
    "testing"

    "github.com/mathiasbq/supervisor/internal/skills/tdd"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func TestTDDSkillTools(t *testing.T) {
    skill := tdd.New(tdd.Config{
        SystemPrompt: "supervisor rules",
        SkillPrompt:  "tdd rules",
        ExecutorFn:   nil, // will be set in TestHandle
    })
    tools := skill.Tools()
    names := make([]string, len(tools))
    for i, t := range tools {
        names[i] = t.Name
    }
    assert.ElementsMatch(t, []string{"tdd_red", "tdd_green", "tdd_refactor"}, names)
}

func TestTDDSkillHandleUnknown(t *testing.T) {
    skill := tdd.New(tdd.Config{SystemPrompt: "s", SkillPrompt: "t"})
    _, err := skill.Handle(context.Background(), "tdd_unknown", json.RawMessage(`{}`))
    assert.ErrorContains(t, err, "unknown tool")
}

func TestTDDRedRequiresProjectRoot(t *testing.T) {
    skill := tdd.New(tdd.Config{SystemPrompt: "s", SkillPrompt: "t"})
    _, err := skill.Handle(context.Background(), "tdd_red", json.RawMessage(`{"spec":"add two numbers"}`))
    assert.ErrorContains(t, err, "project_root")
}

func TestTDDRedRequiresSpec(t *testing.T) {
    skill := tdd.New(tdd.Config{SystemPrompt: "s", SkillPrompt: "t"})
    _, err := skill.Handle(context.Background(), "tdd_red", json.RawMessage(`{"project_root":"/tmp/proj"}`))
    assert.ErrorContains(t, err, "spec")
}
  • Step 2: Run — expect FAIL
go test ./internal/skills/tdd/...

Expected: FAIL — tdd package not found

  • Step 3: Create internal/skills/tdd/skill.go
package tdd

import (
    "context"
    "encoding/json"

    iexec "github.com/mathiasbq/supervisor/internal/exec"
    "github.com/mathiasbq/supervisor/internal/registry"
)

// ExecutorFn allows injecting a test double for the executor.
type ExecutorFn func(ctx context.Context, req iexec.Request) (iexec.Result, error)

type Config struct {
    SystemPrompt string
    SkillPrompt  string
    ExecutorFn   ExecutorFn // nil = use real executor
    DefaultModel string
}

type Skill struct {
    cfg Config
}

func New(cfg Config) *Skill {
    return &Skill{cfg: cfg}
}

func (s *Skill) Name() string { return "tdd" }

func (s *Skill) Tools() []registry.ToolDef {
    schema := func(required []string, props map[string]any) json.RawMessage {
        b, _ := json.Marshal(map[string]any{
            "type":       "object",
            "required":   required,
            "properties": props,
        })
        return b
    }
    strProp := map[string]any{"type": "string"}

    return []registry.ToolDef{
        {
            Name:        "tdd_red",
            Description: "Write a failing test for the described behavior. Verifies the test fails before returning.",
            InputSchema: schema(
                []string{"project_root", "spec"},
                map[string]any{
                    "project_root": strProp,
                    "spec":         strProp,
                    "model":        strProp,
                    "test_cmd":     strProp,
                },
            ),
        },
        {
            Name:        "tdd_green",
            Description: "Write minimal implementation to make the test at test_path pass.",
            InputSchema: schema(
                []string{"project_root", "test_path"},
                map[string]any{
                    "project_root": strProp,
                    "test_path":    strProp,
                    "model":        strProp,
                    "test_cmd":     strProp,
                },
            ),
        },
        {
            Name:        "tdd_refactor",
            Description: "Refactor the implementation at impl_path while keeping tests green.",
            InputSchema: schema(
                []string{"project_root", "test_path", "impl_path"},
                map[string]any{
                    "project_root": strProp,
                    "test_path":    strProp,
                    "impl_path":    strProp,
                    "model":        strProp,
                    "test_cmd":     strProp,
                },
            ),
        },
    }
}
  • Step 4: Create internal/skills/tdd/handlers.go
package tdd

import (
    "context"
    "encoding/json"
    "fmt"

    iexec "github.com/mathiasbq/supervisor/internal/exec"
)

func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
    switch tool {
    case "tdd_red":
        return s.handleRed(ctx, args)
    case "tdd_green":
        return s.handleGreen(ctx, args)
    case "tdd_refactor":
        return s.handleRefactor(ctx, args)
    default:
        return nil, fmt.Errorf("unknown tool: %s", tool)
    }
}

type redArgs struct {
    ProjectRoot string `json:"project_root"`
    Spec        string `json:"spec"`
    Model       string `json:"model"`
    TestCmd     string `json:"test_cmd"`
}

func (s *Skill) handleRed(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) {
    var args redArgs
    if err := json.Unmarshal(raw, &args); err != nil {
        return nil, fmt.Errorf("parse args: %w", err)
    }
    if args.ProjectRoot == "" {
        return nil, fmt.Errorf("project_root is required")
    }
    if args.Spec == "" {
        return nil, fmt.Errorf("spec is required")
    }

    task := fmt.Sprintf(
        "phase: red\nproject_root: %s\nspec: %s\nmodel: %s\ntest_cmd: %s",
        args.ProjectRoot, args.Spec, s.resolveModel(args.Model), args.TestCmd,
    )
    return s.execute(ctx, task)
}

type greenArgs struct {
    ProjectRoot string `json:"project_root"`
    TestPath    string `json:"test_path"`
    Model       string `json:"model"`
    TestCmd     string `json:"test_cmd"`
}

func (s *Skill) handleGreen(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) {
    var args greenArgs
    if err := json.Unmarshal(raw, &args); err != nil {
        return nil, fmt.Errorf("parse args: %w", err)
    }
    if args.ProjectRoot == "" {
        return nil, fmt.Errorf("project_root is required")
    }
    if args.TestPath == "" {
        return nil, fmt.Errorf("test_path is required")
    }

    task := fmt.Sprintf(
        "phase: green\nproject_root: %s\ntest_path: %s\nmodel: %s\ntest_cmd: %s",
        args.ProjectRoot, args.TestPath, s.resolveModel(args.Model), args.TestCmd,
    )
    return s.execute(ctx, task)
}

type refactorArgs struct {
    ProjectRoot string `json:"project_root"`
    TestPath    string `json:"test_path"`
    ImplPath    string `json:"impl_path"`
    Model       string `json:"model"`
    TestCmd     string `json:"test_cmd"`
}

func (s *Skill) handleRefactor(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) {
    var args refactorArgs
    if err := json.Unmarshal(raw, &args); err != nil {
        return nil, fmt.Errorf("parse args: %w", err)
    }
    if args.ProjectRoot == "" {
        return nil, fmt.Errorf("project_root is required")
    }
    if args.TestPath == "" {
        return nil, fmt.Errorf("test_path is required")
    }
    if args.ImplPath == "" {
        return nil, fmt.Errorf("impl_path is required")
    }

    task := fmt.Sprintf(
        "phase: refactor\nproject_root: %s\ntest_path: %s\nimpl_path: %s\nmodel: %s\ntest_cmd: %s",
        args.ProjectRoot, args.TestPath, args.ImplPath, s.resolveModel(args.Model), args.TestCmd,
    )
    return s.execute(ctx, task)
}

func (s *Skill) resolveModel(override string) string {
    if override != "" {
        return override
    }
    return s.cfg.DefaultModel
}

func (s *Skill) execute(ctx context.Context, task string) (json.RawMessage, error) {
    req := iexec.Request{
        SkillPrompt: s.cfg.SkillPrompt,
        TaskPrompt:  task,
    }

    var result iexec.Result
    var err error

    if s.cfg.ExecutorFn != nil {
        result, err = s.cfg.ExecutorFn(ctx, req)
    } else {
        return nil, fmt.Errorf("no executor configured")
    }

    if err != nil {
        return nil, err
    }

    return json.Marshal(result)
}
  • Step 5: Run — expect PASS
go test ./internal/skills/tdd/...

Expected: PASS

  • Step 6: Commit
git add internal/skills/tdd/
git commit -m "feat: add TDD skill with red/green/refactor handlers"

Task 9: Wire main.go

Files:

  • Modify: cmd/supervisor/main.go

  • Step 1: Write failing integration test

Create cmd/supervisor/integration_test.go:

//go:build integration

package main_test

import (
    "bytes"
    "encoding/json"
    "net/http"
    "testing"
    "time"

    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func TestServerStartsAndResponds(t *testing.T) {
    // Start the server in a goroutine (assumes it's running on :3200)
    // This test is run manually after `task supervisor:dev`
    time.Sleep(500 * time.Millisecond)

    body, _ := json.Marshal(map[string]any{
        "jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": map[string]any{},
    })
    resp, err := http.Post("http://localhost:3200/mcp", "application/json", bytes.NewBuffer(body))
    require.NoError(t, err)
    defer resp.Body.Close()

    assert.Equal(t, http.StatusOK, resp.StatusCode)
}
  • Step 2: Run unit tests — all still pass
go test ./...

Expected: PASS (integration test skipped without build tag)

  • Step 3: Rewrite cmd/supervisor/main.go
package main

import (
    "log/slog"
    "net/http"
    "os"

    "github.com/mathiasbq/supervisor/internal/config"
    iexec "github.com/mathiasbq/supervisor/internal/exec"
    "github.com/mathiasbq/supervisor/internal/mcp"
    "github.com/mathiasbq/supervisor/internal/registry"
    "github.com/mathiasbq/supervisor/internal/skills/tdd"
)

func main() {
    logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

    cfg, err := config.Load()
    if err != nil {
        logger.Error("load config", "err", err)
        os.Exit(1)
    }

    models, err := config.LoadModels(cfg.ModelsFile)
    if err != nil {
        logger.Error("load models", "err", err)
        os.Exit(1)
    }

    systemPrompt, err := os.ReadFile(cfg.ConfigDir + "/CLAUDE.md")
    if err != nil {
        logger.Error("read supervisor CLAUDE.md", "err", err)
        os.Exit(1)
    }

    tddPrompt, err := os.ReadFile(cfg.ConfigDir + "/tdd.md")
    if err != nil {
        logger.Error("read tdd.md", "err", err)
        os.Exit(1)
    }

    executor := iexec.New(iexec.Config{
        SystemPrompt:   string(systemPrompt),
        LiteLLMBaseURL: cfg.LiteLLMBaseURL,
        LiteLLMAPIKey:  cfg.LiteLLMAPIKey,
    })

    reg := registry.New()
    reg.Register(tdd.New(tdd.Config{
        SystemPrompt: string(systemPrompt),
        SkillPrompt:  string(tddPrompt),
        DefaultModel: models.Resolve("tdd", ""),
        ExecutorFn:   executor.Run,
    }))

    srv := mcp.NewServer(reg)
    mux := http.NewServeMux()
    mux.Handle("/mcp", srv)

    addr := ":" + cfg.Port
    logger.Info("supervisor starting", "addr", addr)
    if err := http.ListenAndServe(addr, mux); err != nil {
        logger.Error("server error", "err", err)
        os.Exit(1)
    }
}
  • Step 4: Run all tests — expect PASS
go test ./...

Expected: PASS

  • Step 5: Verify it builds
go build ./cmd/supervisor/

Expected: binary produced, no errors

  • Step 6: Commit
git add cmd/supervisor/main.go cmd/supervisor/integration_test.go
git commit -m "feat: wire all components in main.go"

Task 10: Config files

Files:

  • Create: config/supervisor/CLAUDE.md

  • Create: config/supervisor/tdd.md

  • Create: .env.example

  • Step 1: Create config/supervisor/CLAUDE.md

# Supervisor Agent

You are a supervisor agent. You orchestrate skill workers.
You do not converse. You do not explain. You execute and report.

## Output contract — NON-NEGOTIABLE

Every response must be raw JSON conforming to the provided schema.
No preamble, no markdown, no prose. Raw JSON only.

If you cannot produce valid JSON for any reason, output:
{"status":"error","phase":"","skill":"","file_path":"","runner_output":"","verified":false,"model_used":"self","message":"<reason>"}

## Verification — IRON LAW

Never trust your own output. Always run the test suite.
Use the subprocess exit code as the only source of truth.
`verified: true` only when exit code matches the expected outcome for the phase.
`verified: false` even if you believe the code is correct.

## Loop prevention

- Maximum 3 attempts per phase.
- If output is identical across two consecutive attempts, stop immediately.
- After 3 failures set status "error" and explain the blocker in message.
- Never retry indefinitely.

## Test runner detection

Inspect project_root for these signals in order:

| Signal                          | Command              |
|---------------------------------|----------------------|
| go.mod                          | go test ./...        |
| package.json                    | npm test             |
| pyproject.toml / pytest.ini     | pytest               |
| Cargo.toml                      | cargo test           |
| Gemfile                         | bundle exec rspec    |
| mix.exs                         | mix test             |

If test_cmd is provided in the task, use it directly — skip detection.
If runner cannot be determined, return status "error".

## Model delegation

When a model is specified in the task, you may use it for code generation
by calling LiteLLM at the LITELLM_BASE_URL with the specified model name.
Always retain verification yourself — never delegate test execution.
Tag model_used in your response with the model that generated the code,
or "self" if you generated it directly.
  • Step 2: Create config/supervisor/tdd.md
# TDD Skill

## Iron Law

NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST.

## Red phase

- Write exactly one test. One behavior. Name must describe the behavior clearly.
- Run the test suite. Confirm the test FAILS.
- If the test passes immediately: it tests existing behavior or is vacuous.
  Return status "fail" with message explaining why the test is wrong.
- Do not write any implementation code in this phase.

## Green phase

- Write the minimal code to make the failing test pass. Nothing more.
- YAGNI: no extra parameters, no future-proofing, no clever abstractions.
- Run the test suite. Confirm it PASSES.
- If tests fail: fix the implementation, not the test. Max 3 attempts.

## Refactor phase

- Improve structure, naming, or clarity only. No new behavior.
- Tests must remain green after every change.
- If tests break during refactor: revert that change, return status "fail".
  • Step 3: Create .env.example
# Supervisor MCP server
SUPERVISOR_PORT=3200
SUPERVISOR_CONFIG_DIR=./config/supervisor
SUPERVISOR_MODELS_FILE=./config/models.yaml

# LiteLLM gateway (iguana)
LITELLM_BASE_URL=http://iguana:4000
LITELLM_API_KEY=your-litellm-master-key
  • Step 4: Run all tests — still pass
task check

Expected: PASS

  • Step 5: Commit
git add config/ .env.example
git commit -m "feat: add supervisor CLAUDE.md, tdd skill prompt, and env example"

Task 11: Taskfile + MCP registration

Files:

  • Modify: Taskfile.yml

  • Modify: .context/mcp.json

  • Step 1: Add tasks to Taskfile.yml

Add these tasks to the existing Taskfile.yml:

  supervisor:dev:
    desc: Run supervisor MCP server (development)
    cmds:
      - go run ./cmd/supervisor

  supervisor:build:
    desc: Build supervisor binary
    cmds:
      - go build -o bin/supervisor ./cmd/supervisor

  supervisor:test:smoke:
    desc: Smoke test supervisor via MCP (requires supervisor:dev running)
    cmds:
      - |
        curl -s -X POST http://localhost:${SUPERVISOR_PORT:-3200}/mcp \
          -H "Content-Type: application/json" \
          -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | jq .
  • Step 2: Register supervisor in .context/mcp.json

Replace contents of .context/mcp.json:

{
  "mcpServers": {
    "knowledge": {
      "url": "http://localhost:3100/mcp",
      "description": "Project knowledge base — vector + graph retrieval"
    },
    "supervisor": {
      "url": "http://localhost:3200/mcp",
      "description": "Skill workers — TDD (red/green/refactor), more coming"
    }
  }
}
  • Step 3: Sync context files
task context:sync

Expected: CLAUDE.md, AGENTS.md updated with new MCP server entry

  • Step 4: Commit
git add Taskfile.yml .context/mcp.json CLAUDE.md AGENTS.md
git commit -m "feat: register supervisor MCP server and add task targets"

Task 12: End-to-end smoke test

Manual verification that all layers connect.

  • Step 1: Start the supervisor

In a terminal:

cp .env.example .env  # fill in LITELLM_API_KEY if needed
task supervisor:dev

Expected: {"time":"...","level":"INFO","msg":"supervisor starting","addr":":3200"}

  • Step 2: Verify tools/list
task supervisor:test:smoke

Expected: JSON response with tdd_red, tdd_green, tdd_refactor in result.tools

  • Step 3: Verify tdd_red responds
curl -s -X POST http://localhost:3200/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":1,"method":"tools/call",
    "params":{
      "name":"tdd_red",
      "arguments":{
        "project_root":"'"$PWD"'",
        "spec":"a function that returns the sum of two integers"
      }
    }
  }' | jq .

Expected: JSON-RPC response with result.content[0].text containing a valid supervisor Result JSON

  • Step 4: Reload Claude Code MCP config

In Claude Code, run /mcp to verify the supervisor server appears and its tools are listed.

  • Step 5: Final commit
git add -p  # stage any notes or fixes from smoke test
git commit -m "chore: smoke test complete, supervisor MCP server operational"

go.sum

After all go get commands in the tasks above, run:

go mod tidy
git add go.sum
git commit -m "chore: tidy go.sum"