Hyperguild Phase 2 Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Add four new MCP skills (review, debug, spec, trainer) to the hyperguild supervisor, with automatic session history injection into all multi-phase skill workers.
Architecture: The supervisor reads prior session entries and injects them into each worker's task prompt when session_id is provided — the orchestrator no longer re-summarises history. The trainer skill runs a two-step sub-agent chain: a reader agent identifies learning moments from the session log, then a writer agent formats them as SFT/DPO pairs and writes to brain/training-data/. All other new skills are single-worker.
Tech Stack: Go 1.26, internal/session (JSONL log), internal/exec (claude subprocess executor), internal/registry (MCP tool registry), config/supervisor/*.md (discipline files), config/models.yaml (model routing)
File Map
New files
| File | Responsibility |
|---|---|
| internal/session/history.go | FormatHistory(entries, excludePhase): formats prior session entries as a prompt block |
| internal/session/history_test.go | Unit tests for FormatHistory |
| internal/skills/review/skill.go | review skill: Config, Skill, New, Name, Tools |
| internal/skills/review/handlers.go | Handle, handleReview: single-phase code review worker |
| internal/skills/review/handlers_test.go | review handler unit tests |
| internal/skills/debug/skill.go | debug skill |
| internal/skills/debug/handlers.go | handleDebug: hypothesis generation worker |
| internal/skills/debug/handlers_test.go | debug handler unit tests |
| internal/skills/spec/skill.go | spec skill |
| internal/skills/spec/handlers.go | handleSpec: spec writing worker |
| internal/skills/spec/handlers_test.go | spec handler unit tests |
| internal/skills/trainer/skill.go | trainer skill: Config with ReaderPrompt + WriterPrompt + BrainDir |
| internal/skills/trainer/handlers.go | handleTrain: calls ExecutorFn twice, reader then writer |
| internal/skills/trainer/handlers_test.go | trainer handler unit tests (verifies two-call chain) |
| config/supervisor/review.md | Review worker discipline file |
| config/supervisor/debug.md | Debug worker discipline file |
| config/supervisor/spec.md | Spec worker discipline file |
| config/supervisor/trainer-reader.md | Trainer reader agent discipline |
| config/supervisor/trainer-writer.md | Trainer writer agent discipline |
Modified files
| File | Change |
|---|---|
| internal/exec/result.go | Add review/debug/spec/trainer to validPhases and Schema enum |
| internal/skills/tdd/skill.go | Add SessionsDir string to Config; add session_id to green/refactor tool schemas |
| internal/skills/tdd/handlers.go | Add session_id to greenArgs and refactorArgs; inject history when session_id non-empty |
| internal/skills/tdd/handlers_test.go | Add tests for history injection in green and refactor |
| cmd/supervisor/main.go | Pass SessionsDir to tdd.Config; read discipline files and register 4 new skills |
| config/models.yaml | Add trainer model entry |
Task 1: Session history utility
Files:
- Create: internal/session/history.go
- Create: internal/session/history_test.go
- Step 1: Write the failing test
// internal/session/history_test.go
package session_test
import (
"testing"
"time"
"github.com/mathiasbq/supervisor/internal/session"
"github.com/stretchr/testify/assert"
)
func TestFormatHistoryEmpty(t *testing.T) {
result := session.FormatHistory(nil, "")
assert.Equal(t, "", result)
}
func TestFormatHistoryFormatsEntries(t *testing.T) {
entries := []session.Entry{
{
Skill: "tdd", Phase: "red", FinalStatus: "pass",
FilePath: "internal/foo/foo_test.go",
Message: "wrote failing test for Foo",
Timestamp: time.Now(),
},
}
result := session.FormatHistory(entries, "")
assert.Contains(t, result, "## Session history")
assert.Contains(t, result, "Phase: red")
assert.Contains(t, result, "wrote failing test for Foo")
assert.Contains(t, result, "internal/foo/foo_test.go")
}
func TestFormatHistoryExcludesCurrentPhase(t *testing.T) {
entries := []session.Entry{
{Skill: "tdd", Phase: "red", Message: "red done", FinalStatus: "pass"},
{Skill: "tdd", Phase: "green", Message: "green done", FinalStatus: "pass"},
}
result := session.FormatHistory(entries, "green")
assert.Contains(t, result, "red done")
assert.NotContains(t, result, "green done")
}
- Step 2: Run test to confirm it fails
cd /path/to/supervisor
go test ./internal/session/... -run TestFormatHistory -v
Expected: FAIL — session.FormatHistory undefined
- Step 3: Implement FormatHistory
// internal/session/history.go
package session
import (
"fmt"
"strings"
)
// FormatHistory formats prior session entries as a structured block for
// injection into a worker task prompt. Entries matching excludePhase are
// omitted (pass the current phase to avoid circular injection).
func FormatHistory(entries []Entry, excludePhase string) string {
var filtered []Entry
for _, e := range entries {
if e.Phase != excludePhase {
filtered = append(filtered, e)
}
}
if len(filtered) == 0 {
return ""
}
var b strings.Builder
b.WriteString("## Session history\n\n")
for _, e := range filtered {
fmt.Fprintf(&b, "### Phase: %s\n", e.Phase)
fmt.Fprintf(&b, "- Skill: %s\n", e.Skill)
fmt.Fprintf(&b, "- Status: %s\n", e.FinalStatus)
if e.FilePath != "" {
fmt.Fprintf(&b, "- File: %s\n", e.FilePath)
}
if e.Message != "" {
fmt.Fprintf(&b, "- Summary: %s\n", e.Message)
}
b.WriteString("\n")
}
return b.String()
}
- Step 4: Run tests to confirm they pass
go test ./internal/session/... -v
Expected: all TestFormatHistory* tests PASS
- Step 5: Commit
git add internal/session/history.go internal/session/history_test.go
git commit -m "feat(session): add FormatHistory for worker context injection"
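To make the format concrete, the sketch below renders two sample entries and excludes the green phase. It is a self-contained illustration: `Entry` here carries only the fields the formatter reads, and `formatHistory` is a condensed local equivalent of the FormatHistory function above, not the package code itself.

```go
package main

import (
	"fmt"
	"strings"
)

// Entry mirrors only the fields the formatter reads; the real session.Entry
// has more (for example SessionID and Timestamp).
type Entry struct {
	Skill, Phase, FinalStatus, FilePath, Message string
}

// formatHistory is a condensed local equivalent of the plan's FormatHistory,
// included so the example runs on its own.
func formatHistory(entries []Entry, excludePhase string) string {
	var b strings.Builder
	for _, e := range entries {
		if e.Phase == excludePhase {
			continue // skip entries from the phase currently being run
		}
		if b.Len() == 0 {
			b.WriteString("## Session history\n\n")
		}
		fmt.Fprintf(&b, "### Phase: %s\n- Skill: %s\n- Status: %s\n", e.Phase, e.Skill, e.FinalStatus)
		if e.FilePath != "" {
			fmt.Fprintf(&b, "- File: %s\n", e.FilePath)
		}
		if e.Message != "" {
			fmt.Fprintf(&b, "- Summary: %s\n", e.Message)
		}
		b.WriteString("\n")
	}
	return b.String()
}

func main() {
	entries := []Entry{
		{Skill: "tdd", Phase: "red", FinalStatus: "pass", FilePath: "internal/foo/foo_test.go", Message: "wrote failing test for Foo"},
		{Skill: "tdd", Phase: "green", FinalStatus: "pass", Message: "green done"},
	}
	// Excluding "green" leaves only the red entry in the rendered block.
	fmt.Print(formatHistory(entries, "green"))
}
```

One behaviour worth knowing: because the filter compares phases by equality, passing an empty excludePhase also drops any entry whose Phase is empty.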
Task 2: Fix Schema enum and validPhases
Files:
- Modify: internal/exec/result.go
The Schema's phase enum currently lists only ["red","green","refactor"], so a worker running any other phase is forced to pick a value from that list (the retrospective worker, for example, was returning "refactor"). The fix is to remove the enum constraint from the schema, so that phase accepts any string, and to rely solely on validPhases for server-side validation.
- Step 1: Write the failing test
// Add to internal/exec/result_test.go (check existing file first for test helpers)
func TestValidateAcceptsAllPhases(t *testing.T) {
phases := []string{"red", "green", "refactor", "retrospective", "review", "debug", "spec", "trainer"}
for _, phase := range phases {
r := exec.Result{Status: "pass", Phase: phase, Skill: "test", ModelUsed: "self", Message: "ok"}
assert.NoError(t, r.Validate(), "phase %q should be valid", phase)
}
}
- Step 2: Run test to confirm it fails
go test ./internal/exec/... -run TestValidateAcceptsAllPhases -v
Expected: FAIL — "review", "debug", "spec", "trainer" fail validation
- Step 3: Update result.go
In internal/exec/result.go, make these two changes:
Change 1 — update validPhases:
var validPhases = map[string]bool{
"red": true,
"green": true,
"refactor": true,
"retrospective": true,
"review": true,
"debug": true,
"spec": true,
"trainer": true,
}
Change 2 — remove the enum constraint from the Schema phase property (replace the "phase" line):
// Before:
"phase": {"type": "string", "enum": ["red","green","refactor"]},
// After:
"phase": {"type": "string"},
Also fix the error message in Validate():
// Before:
errs = append(errs, "phase must be red|green|refactor, got: "+r.Phase)
// After:
errs = append(errs, "phase must be one of red|green|refactor|retrospective|review|debug|spec|trainer, got: "+r.Phase)
- Step 4: Run tests to confirm they pass
go test ./internal/exec/... -v
Expected: all tests PASS, including TestValidateAcceptsAllPhases
- Step 5: Commit
git add internal/exec/result.go internal/exec/result_test.go
git commit -m "fix(exec): expand validPhases and remove schema enum constraint for phase"
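The hand-written error message in Validate() has to be edited every time validPhases grows. One way to avoid that drift, sketched here as an option rather than as part of the plan, is to derive the list from the map. `phaseList` and `validatePhase` are hypothetical helper names; the map is copied from Change 1 so the example runs on its own.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// validPhases is copied from Change 1 above.
var validPhases = map[string]bool{
	"red":           true,
	"green":         true,
	"refactor":      true,
	"retrospective": true,
	"review":        true,
	"debug":         true,
	"spec":          true,
	"trainer":       true,
}

// phaseList (hypothetical helper) returns the valid phase names sorted
// alphabetically and joined with "|", so the message never goes stale.
func phaseList() string {
	names := make([]string, 0, len(validPhases))
	for name := range validPhases {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random, so sort for stable output
	return strings.Join(names, "|")
}

// validatePhase (hypothetical helper) shows the server-side check in isolation.
func validatePhase(phase string) error {
	if !validPhases[phase] {
		return fmt.Errorf("phase must be one of %s, got: %s", phaseList(), phase)
	}
	return nil
}

func main() {
	fmt.Println(validatePhase("review"))
	fmt.Println(validatePhase("deploy"))
}
```

The trade-off is that the phases appear alphabetically in the message instead of in workflow order.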
Task 3: Session history injection in TDD green and refactor
Files:
- Modify: internal/skills/tdd/skill.go
- Modify: internal/skills/tdd/handlers.go
- Modify: internal/skills/tdd/handlers_test.go
- Modify: cmd/supervisor/main.go
- Step 1: Write the failing tests
Add to internal/skills/tdd/handlers_test.go:
func TestTDDGreenInjectsSessionHistory(t *testing.T) {
sessDir := t.TempDir()
require.NoError(t, session.Append(sessDir, "sess-1", session.Entry{
SessionID: "sess-1", Skill: "tdd", Phase: "red", FinalStatus: "pass",
FilePath: "internal/foo/foo_test.go",
Message: "wrote failing test for Foo",
}))
var capturedPrompt string
fakeFn := func(_ context.Context, req iexec.Request) (iexec.Result, error) {
capturedPrompt = req.TaskPrompt
return iexec.Result{Status: "pass", Phase: "green", Skill: "tdd", Verified: true, ModelUsed: "self", Message: "ok"}, nil
}
sk := tdd.New(tdd.Config{SkillPrompt: "tdd", ExecutorFn: fakeFn, SessionsDir: sessDir})
_, err := sk.Handle(context.Background(), "tdd_green", json.RawMessage(
`{"project_root":"/tmp","test_path":"internal/foo/foo_test.go","test_cmd":"go test ./...","session_id":"sess-1"}`,
))
require.NoError(t, err)
assert.Contains(t, capturedPrompt, "## Session history")
assert.Contains(t, capturedPrompt, "wrote failing test for Foo")
}
func TestTDDGreenNoHistoryWhenSessionIDEmpty(t *testing.T) {
var capturedPrompt string
fakeFn := func(_ context.Context, req iexec.Request) (iexec.Result, error) {
capturedPrompt = req.TaskPrompt
return iexec.Result{Status: "pass", Phase: "green", Skill: "tdd", Verified: true, ModelUsed: "self", Message: "ok"}, nil
}
sk := tdd.New(tdd.Config{SkillPrompt: "tdd", ExecutorFn: fakeFn, SessionsDir: t.TempDir()})
_, err := sk.Handle(context.Background(), "tdd_green", json.RawMessage(
`{"project_root":"/tmp","test_path":"internal/foo/foo_test.go"}`,
))
require.NoError(t, err)
assert.NotContains(t, capturedPrompt, "## Session history")
}
You will need these imports in the test file:
import (
"github.com/mathiasbq/supervisor/internal/session"
iexec "github.com/mathiasbq/supervisor/internal/exec"
)
- Step 2: Run tests to confirm they fail
go test ./internal/skills/tdd/... -run TestTDDGreen -v
Expected: FAIL — tdd.Config has no SessionsDir field, session_id not in args
- Step 3: Add SessionsDir to Config and session_id to tool schemas
In internal/skills/tdd/skill.go, add SessionsDir to Config:
type Config struct {
SystemPrompt string
SkillPrompt string
ExecutorFn ExecutorFn
DefaultModel string
SessionsDir string // optional: path to brain/sessions/ for history injection
}
In Tools(), add "session_id" as an optional property to tdd_green and tdd_refactor input schemas:
// tdd_green InputSchema:
InputSchema: schema(
[]string{"project_root", "test_path"},
map[string]any{
"project_root": strProp,
"test_path": strProp,
"model": strProp,
"test_cmd": strProp,
"session_id": strProp,
},
),
// tdd_refactor InputSchema (same addition):
InputSchema: schema(
[]string{"project_root", "test_path", "impl_path"},
map[string]any{
"project_root": strProp,
"test_path": strProp,
"impl_path": strProp,
"model": strProp,
"test_cmd": strProp,
"session_id": strProp,
},
),
- Step 4: Update greenArgs, refactorArgs, and inject history
In internal/skills/tdd/handlers.go, add imports and update structs + handlers:
import (
// existing imports...
"github.com/mathiasbq/supervisor/internal/session"
)
type greenArgs struct {
ProjectRoot string `json:"project_root"`
TestPath string `json:"test_path"`
Model string `json:"model"`
TestCmd string `json:"test_cmd"`
SessionID string `json:"session_id"`
}
func (s *Skill) handleGreen(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) {
var args greenArgs
if err := json.Unmarshal(raw, &args); err != nil {
return nil, fmt.Errorf("parse args: %w", err)
}
if args.ProjectRoot == "" {
return nil, fmt.Errorf("project_root is required")
}
if args.TestPath == "" {
return nil, fmt.Errorf("test_path is required")
}
task := fmt.Sprintf(
"phase: green\nproject_root: %s\ntest_path: %s\nmodel: %s\ntest_cmd: %s",
args.ProjectRoot, args.TestPath, s.resolveModel(args.Model), args.TestCmd,
)
task = s.prependHistory(args.SessionID, "green", task)
return s.execute(ctx, task)
}
type refactorArgs struct {
ProjectRoot string `json:"project_root"`
TestPath string `json:"test_path"`
ImplPath string `json:"impl_path"`
Model string `json:"model"`
TestCmd string `json:"test_cmd"`
SessionID string `json:"session_id"`
}
func (s *Skill) handleRefactor(ctx context.Context, raw json.RawMessage) (json.RawMessage, error) {
var args refactorArgs
if err := json.Unmarshal(raw, &args); err != nil {
return nil, fmt.Errorf("parse args: %w", err)
}
if args.ProjectRoot == "" {
return nil, fmt.Errorf("project_root is required")
}
if args.TestPath == "" {
return nil, fmt.Errorf("test_path is required")
}
if args.ImplPath == "" {
return nil, fmt.Errorf("impl_path is required")
}
task := fmt.Sprintf(
"phase: refactor\nproject_root: %s\ntest_path: %s\nimpl_path: %s\nmodel: %s\ntest_cmd: %s",
args.ProjectRoot, args.TestPath, args.ImplPath, s.resolveModel(args.Model), args.TestCmd,
)
task = s.prependHistory(args.SessionID, "refactor", task)
return s.execute(ctx, task)
}
// prependHistory reads the session log and prepends prior phase entries to the task prompt.
// If sessionID is empty or SessionsDir is not configured, task is returned unchanged.
func (s *Skill) prependHistory(sessionID, currentPhase, task string) string {
if sessionID == "" || s.cfg.SessionsDir == "" {
return task
}
entries, err := session.Read(s.cfg.SessionsDir, sessionID)
if err != nil || len(entries) == 0 {
return task
}
history := session.FormatHistory(entries, currentPhase)
if history == "" {
return task
}
return history + "\n---\n\n" + task
}
- Step 5: Update main.go to pass SessionsDir to tdd.Config
In cmd/supervisor/main.go, find the tdd.New(tdd.Config{...}) call and add SessionsDir:
reg.Register(tdd.New(tdd.Config{
SystemPrompt: string(systemPrompt),
SkillPrompt: string(tddPrompt),
DefaultModel: models.Resolve("tdd", ""),
ExecutorFn: executor.Run,
SessionsDir: cfg.SessionsDir, // ← add this line
}))
- Step 6: Run all tests
go test ./... -race -count=1
Expected: all tests PASS
- Step 7: Commit
git add internal/skills/tdd/skill.go internal/skills/tdd/handlers.go internal/skills/tdd/handlers_test.go cmd/supervisor/main.go
git commit -m "feat(tdd): inject session history into green and refactor worker prompts"
Task 4: review skill
Files:
- Create: internal/skills/review/skill.go
- Create: internal/skills/review/handlers.go
- Create: internal/skills/review/handlers_test.go
- Create: config/supervisor/review.md
- Modify: cmd/supervisor/main.go
- Step 1: Write the failing test
// internal/skills/review/handlers_test.go
package review_test
import (
"context"
"encoding/json"
"testing"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/skills/review"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestReviewToolRegistered(t *testing.T) {
sk := review.New(review.Config{SkillPrompt: "review rules"})
names := make([]string, 0)
for _, tool := range sk.Tools() {
names = append(names, tool.Name)
}
assert.Contains(t, names, "review")
}
func TestReviewRequiresProjectRoot(t *testing.T) {
sk := review.New(review.Config{SkillPrompt: "r"})
_, err := sk.Handle(context.Background(), "review", json.RawMessage(`{"files":["main.go"]}`))
assert.ErrorContains(t, err, "project_root")
}
func TestReviewRequiresFiles(t *testing.T) {
sk := review.New(review.Config{SkillPrompt: "r"})
_, err := sk.Handle(context.Background(), "review", json.RawMessage(`{"project_root":"/tmp"}`))
assert.ErrorContains(t, err, "files")
}
func TestReviewCallsExecutor(t *testing.T) {
called := false
var capturedTask string
fakeFn := func(_ context.Context, req iexec.Request) (iexec.Result, error) {
called = true
capturedTask = req.TaskPrompt
return iexec.Result{
Status: "pass", Phase: "review", Skill: "review",
Verified: true, ModelUsed: "self", Message: "2 warnings found",
}, nil
}
sk := review.New(review.Config{SkillPrompt: "review rules", ExecutorFn: fakeFn, SessionsDir: t.TempDir()})
out, err := sk.Handle(context.Background(), "review", json.RawMessage(
`{"project_root":"/tmp/proj","files":["internal/foo/foo.go"],"context":"PR: add Foo helper"}`,
))
require.NoError(t, err)
assert.True(t, called)
assert.Contains(t, capturedTask, "internal/foo/foo.go")
assert.Contains(t, capturedTask, "PR: add Foo helper")
var result iexec.Result
require.NoError(t, json.Unmarshal(out, &result))
assert.Equal(t, "pass", result.Status)
assert.Equal(t, "review", result.Phase)
}
var _ = require.New
- Step 2: Run test to confirm it fails
go test ./internal/skills/review/... -v
Expected: FAIL — package review does not exist
- Step 3: Create skill.go
// internal/skills/review/skill.go
package review
import (
"context"
"encoding/json"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/registry"
)
type ExecutorFn func(ctx context.Context, req iexec.Request) (iexec.Result, error)
type Config struct {
SkillPrompt string
DefaultModel string
ExecutorFn ExecutorFn
SessionsDir string
}
type Skill struct{ cfg Config }
func New(cfg Config) *Skill { return &Skill{cfg: cfg} }
func (s *Skill) Name() string { return "review" }
func (s *Skill) Tools() []registry.ToolDef {
schema := func(required []string, props map[string]any) json.RawMessage {
b, _ := json.Marshal(map[string]any{"type": "object", "required": required, "properties": props})
return b
}
return []registry.ToolDef{
{
Name: "review",
Description: "Perform a structured code review of the specified files. Returns findings with severity levels.",
InputSchema: schema(
[]string{"project_root", "files"},
map[string]any{
"project_root": map[string]any{"type": "string"},
"files": map[string]any{"type": "array", "items": map[string]any{"type": "string"}},
"context": map[string]any{"type": "string"},
"model": map[string]any{"type": "string"},
"session_id": map[string]any{"type": "string"},
},
),
},
}
}
- Step 4: Create handlers.go
// internal/skills/review/handlers.go
package review
import (
"context"
"encoding/json"
"fmt"
"strings"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/session"
)
type reviewArgs struct {
ProjectRoot string `json:"project_root"`
Files []string `json:"files"`
Context string `json:"context"`
Model string `json:"model"`
SessionID string `json:"session_id"`
}
func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
if tool != "review" {
return nil, fmt.Errorf("unknown tool: %s", tool)
}
var a reviewArgs
if err := json.Unmarshal(args, &a); err != nil {
return nil, fmt.Errorf("parse args: %w", err)
}
if a.ProjectRoot == "" {
return nil, fmt.Errorf("project_root is required")
}
if len(a.Files) == 0 {
return nil, fmt.Errorf("files is required")
}
model := a.Model
if model == "" {
model = s.cfg.DefaultModel
}
task := fmt.Sprintf(
"phase: review\nproject_root: %s\nfiles: %s\ncontext: %s\nmodel: %s",
a.ProjectRoot, strings.Join(a.Files, ", "), a.Context, model,
)
task = s.prependHistory(a.SessionID, "review", task)
if s.cfg.ExecutorFn == nil {
return nil, fmt.Errorf("no executor configured")
}
result, err := s.cfg.ExecutorFn(ctx, iexec.Request{
SkillPrompt: s.cfg.SkillPrompt,
TaskPrompt: task,
Model: model,
Tools: "Read,Bash",
})
if err != nil {
return nil, err
}
b, err := json.Marshal(result)
if err != nil {
return nil, fmt.Errorf("marshal result: %w", err)
}
return b, nil
}
func (s *Skill) prependHistory(sessionID, currentPhase, task string) string {
if sessionID == "" || s.cfg.SessionsDir == "" {
return task
}
entries, err := session.Read(s.cfg.SessionsDir, sessionID)
if err != nil || len(entries) == 0 {
return task
}
history := session.FormatHistory(entries, currentPhase)
if history == "" {
return task
}
return history + "\n---\n\n" + task
}
- Step 5: Create config/supervisor/review.md
# Code Review Discipline
You are a disciplined code reviewer. Read files carefully before commenting.
## Iron laws
1. Never approve security vulnerabilities: command injection, SQL injection, credential exposure, path traversal, unchecked input at system boundaries
2. Never approve silently swallowed errors — `err != nil` without wrapping or handling is always wrong
3. Never approve missing validation at system boundaries (user input, external APIs, file reads)
## Output contract
Return JSON result with:
- `status`: "pass" if no blocking issues; "fail" if any iron law is violated
- `phase`: "review"
- `skill`: "review"
- `file_path`: first file reviewed
- `runner_output`: full review formatted as:
CRITICAL: <finding> at <file>:<line>
WARNING: <finding> at <file>:<line>
SUGGESTION: <finding> at <file>:<line>
- `verified`: true if you read all specified files; false if any were missing or unreadable
- `message`: "N critical, M warnings, K suggestions" or "clean: <which iron law checks passed and why>"
## Rules
1. Read every file listed before writing feedback
2. Check iron laws first — any violation is CRITICAL and sets status to "fail"
3. Then check: correctness, test coverage for new code, Go style conventions
4. Never rubber-stamp — if nothing is wrong, explain specifically which iron law checks you ran and why they passed
5. Line references are required for every finding — "roughly around the middle" is not acceptable
- Step 6: Wire into main.go
In cmd/supervisor/main.go, add after existing os.ReadFile calls for discipline files:
reviewPrompt, err := os.ReadFile(cfg.ConfigDir + "/review.md")
if err != nil {
logger.Error("read review.md", "path", cfg.ConfigDir+"/review.md", "err", err)
os.Exit(1)
}
Add import: "github.com/mathiasbq/supervisor/internal/skills/review"
Register after existing reg.Register calls:
reg.Register(review.New(review.Config{
SkillPrompt: string(reviewPrompt),
DefaultModel: models.Resolve("review", ""),
ExecutorFn: executor.Run,
SessionsDir: cfg.SessionsDir,
}))
- Step 7: Run all tests
go test ./... -race -count=1
Expected: all tests PASS
- Step 8: Commit
git add internal/skills/review/ config/supervisor/review.md cmd/supervisor/main.go
git commit -m "feat(review): add code review MCP skill with session history injection"
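The output contract in review.md ties `status` and `message` to the severity lines in `runner_output`. A caller that wants to cross-check a worker's result could recompute both from the output, roughly as sketched below. `summarise` is a hypothetical helper that is not part of the plan, and it assumes each finding starts its line with the severity keyword; it does not produce the "clean: ..." message variant.

```go
package main

import (
	"fmt"
	"strings"
)

// summarise (hypothetical helper) counts findings by severity prefix and
// returns the status and message the review output contract describes.
func summarise(runnerOutput string) (status, message string) {
	var critical, warnings, suggestions int
	for _, line := range strings.Split(runnerOutput, "\n") {
		line = strings.TrimSpace(line) // tolerate indented findings
		switch {
		case strings.HasPrefix(line, "CRITICAL:"):
			critical++
		case strings.HasPrefix(line, "WARNING:"):
			warnings++
		case strings.HasPrefix(line, "SUGGESTION:"):
			suggestions++
		}
	}
	status = "pass"
	if critical > 0 {
		// Any iron-law violation is CRITICAL and must fail the review.
		status = "fail"
	}
	message = fmt.Sprintf("%d critical, %d warnings, %d suggestions", critical, warnings, suggestions)
	return status, message
}

func main() {
	out := "CRITICAL: unchecked user input at internal/foo/foo.go:42\n" +
		"WARNING: error not wrapped at internal/foo/foo.go:57"
	status, message := summarise(out)
	fmt.Println(status, message)
}
```

A mismatch between the recomputed values and the worker's own `status` or `message` would be a signal that the worker drifted from the contract.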
Task 5: debug skill
Files:
- Create: internal/skills/debug/skill.go
- Create: internal/skills/debug/handlers.go
- Create: internal/skills/debug/handlers_test.go
- Create: config/supervisor/debug.md
- Modify: cmd/supervisor/main.go
- Step 1: Write the failing test
// internal/skills/debug/handlers_test.go
package debug_test
import (
"context"
"encoding/json"
"testing"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/skills/debug"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestDebugToolRegistered(t *testing.T) {
sk := debug.New(debug.Config{SkillPrompt: "debug rules"})
names := make([]string, 0)
for _, tool := range sk.Tools() {
names = append(names, tool.Name)
}
assert.Contains(t, names, "debug")
}
func TestDebugRequiresProjectRoot(t *testing.T) {
sk := debug.New(debug.Config{SkillPrompt: "d"})
_, err := sk.Handle(context.Background(), "debug", json.RawMessage(`{"error":"panic: nil pointer"}`))
assert.ErrorContains(t, err, "project_root")
}
func TestDebugRequiresError(t *testing.T) {
sk := debug.New(debug.Config{SkillPrompt: "d"})
_, err := sk.Handle(context.Background(), "debug", json.RawMessage(`{"project_root":"/tmp"}`))
assert.ErrorContains(t, err, "error")
}
func TestDebugCallsExecutor(t *testing.T) {
called := false
var capturedTask string
fakeFn := func(_ context.Context, req iexec.Request) (iexec.Result, error) {
called = true
capturedTask = req.TaskPrompt
return iexec.Result{
Status: "pass", Phase: "debug", Skill: "debug",
RunnerOutput: "HYPOTHESIS 1 (likelihood: high): nil map access\nVERIFY: go test ./... → expected: panic line reference",
Verified: false, ModelUsed: "self", Message: "3 hypotheses for: panic nil pointer at foo.go:42",
}, nil
}
sk := debug.New(debug.Config{SkillPrompt: "debug rules", ExecutorFn: fakeFn, SessionsDir: t.TempDir()})
out, err := sk.Handle(context.Background(), "debug", json.RawMessage(
`{"project_root":"/tmp/proj","error":"panic: nil pointer dereference at foo.go:42","context":"occurs on startup"}`,
))
require.NoError(t, err)
assert.True(t, called)
assert.Contains(t, capturedTask, "panic: nil pointer dereference")
assert.Contains(t, capturedTask, "occurs on startup")
var result iexec.Result
require.NoError(t, json.Unmarshal(out, &result))
assert.Equal(t, "debug", result.Phase)
}
var _ = require.New
- Step 2: Run test to confirm it fails
go test ./internal/skills/debug/... -v
Expected: FAIL — package does not exist
- Step 3: Create skill.go
// internal/skills/debug/skill.go
package debug
import (
"context"
"encoding/json"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/registry"
)
type ExecutorFn func(ctx context.Context, req iexec.Request) (iexec.Result, error)
type Config struct {
SkillPrompt string
DefaultModel string
ExecutorFn ExecutorFn
SessionsDir string
}
type Skill struct{ cfg Config }
func New(cfg Config) *Skill { return &Skill{cfg: cfg} }
func (s *Skill) Name() string { return "debug" }
func (s *Skill) Tools() []registry.ToolDef {
schema := func(required []string, props map[string]any) json.RawMessage {
b, _ := json.Marshal(map[string]any{"type": "object", "required": required, "properties": props})
return b
}
str := map[string]any{"type": "string"}
return []registry.ToolDef{
{
Name: "debug",
Description: "Analyse an error and return 3-5 hypotheses ordered by likelihood, each with a concrete verification step.",
InputSchema: schema(
[]string{"project_root", "error"},
map[string]any{
"project_root": str,
"error": str,
"context": str,
"model": str,
"session_id": str,
},
),
},
}
}
- Step 4: Create handlers.go
// internal/skills/debug/handlers.go
package debug
import (
"context"
"encoding/json"
"fmt"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/session"
)
type debugArgs struct {
ProjectRoot string `json:"project_root"`
Error string `json:"error"`
Context string `json:"context"`
Model string `json:"model"`
SessionID string `json:"session_id"`
}
func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
if tool != "debug" {
return nil, fmt.Errorf("unknown tool: %s", tool)
}
var a debugArgs
if err := json.Unmarshal(args, &a); err != nil {
return nil, fmt.Errorf("parse args: %w", err)
}
if a.ProjectRoot == "" {
return nil, fmt.Errorf("project_root is required")
}
if a.Error == "" {
return nil, fmt.Errorf("error is required")
}
model := a.Model
if model == "" {
model = s.cfg.DefaultModel
}
task := fmt.Sprintf(
"phase: debug\nproject_root: %s\nerror: %s\ncontext: %s\nmodel: %s",
a.ProjectRoot, a.Error, a.Context, model,
)
task = s.prependHistory(a.SessionID, "debug", task)
if s.cfg.ExecutorFn == nil {
return nil, fmt.Errorf("no executor configured")
}
result, err := s.cfg.ExecutorFn(ctx, iexec.Request{
SkillPrompt: s.cfg.SkillPrompt,
TaskPrompt: task,
Model: model,
Tools: "Read,Bash",
})
if err != nil {
return nil, err
}
b, err := json.Marshal(result)
if err != nil {
return nil, fmt.Errorf("marshal result: %w", err)
}
return b, nil
}
func (s *Skill) prependHistory(sessionID, currentPhase, task string) string {
if sessionID == "" || s.cfg.SessionsDir == "" {
return task
}
entries, err := session.Read(s.cfg.SessionsDir, sessionID)
if err != nil || len(entries) == 0 {
return task
}
history := session.FormatHistory(entries, currentPhase)
if history == "" {
return task
}
return history + "\n---\n\n" + task
}
- Step 5: Create config/supervisor/debug.md
# Debug Discipline
You are a systematic debugger. Form hypotheses before suggesting fixes.
## Iron laws
1. Never suggest "try X and see what happens" — every hypothesis must have a specific expected outcome if correct
2. Generate 3 to 5 hypotheses, ordered by likelihood (most likely first)
3. Never fix the bug — diagnose only; the caller decides what to do with the hypotheses
## Output contract
Return JSON result with:
- `status`: "pass" (hypotheses generated) or "error" (error too ambiguous to analyse)
- `phase`: "debug"
- `skill`: "debug"
- `file_path`: the most relevant file to the error (read it)
- `runner_output`: your hypotheses, formatted as:
HYPOTHESIS 1 (likelihood: high): <mechanism>
VERIFY: <command> → expected if correct: <output>
HYPOTHESIS 2 (likelihood: medium): <mechanism>
VERIFY: <command> → expected if correct: <output>
- `verified`: false — verification is the caller's job
- `message`: "N hypotheses for: <one-line error summary>"
## Rules
1. Read the error and any context files provided before forming hypotheses
2. Identify the failure mode first — what actually went wrong, not just what the error says
3. For each hypothesis: name the mechanism, explain why it would produce this exact error, give a concrete verification command with expected output
4. If the error is clearly a typo or trivial mistake, still form 3 hypotheses — surface the most likely cause as #1
- Step 6: Wire into main.go
Add file read:
debugPrompt, err := os.ReadFile(cfg.ConfigDir + "/debug.md")
if err != nil {
logger.Error("read debug.md", "path", cfg.ConfigDir+"/debug.md", "err", err)
os.Exit(1)
}
Add import: "github.com/mathiasbq/supervisor/internal/skills/debug"
Register skill:
reg.Register(debug.New(debug.Config{
SkillPrompt: string(debugPrompt),
DefaultModel: models.Resolve("debug", ""),
ExecutorFn: executor.Run,
SessionsDir: cfg.SessionsDir,
}))
- Step 7: Run all tests
go test ./... -race -count=1
Expected: all tests PASS
- Step 8: Commit
git add internal/skills/debug/ config/supervisor/debug.md cmd/supervisor/main.go
git commit -m "feat(debug): add debug MCP skill with hypothesis generation"
Task 6: spec skill
Files:
- Create: internal/skills/spec/skill.go
- Create: internal/skills/spec/handlers.go
- Create: internal/skills/spec/handlers_test.go
- Create: config/supervisor/spec.md
- Modify: cmd/supervisor/main.go
- Step 1: Write the failing test
// internal/skills/spec/handlers_test.go
package spec_test
import (
"context"
"encoding/json"
"testing"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/skills/spec"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestSpecToolRegistered(t *testing.T) {
sk := spec.New(spec.Config{SkillPrompt: "spec rules"})
names := make([]string, 0)
for _, tool := range sk.Tools() {
names = append(names, tool.Name)
}
assert.Contains(t, names, "spec")
}
func TestSpecRequiresProjectRoot(t *testing.T) {
sk := spec.New(spec.Config{SkillPrompt: "s"})
_, err := sk.Handle(context.Background(), "spec", json.RawMessage(`{"requirements":"add login"}`))
assert.ErrorContains(t, err, "project_root")
}
func TestSpecRequiresRequirements(t *testing.T) {
sk := spec.New(spec.Config{SkillPrompt: "s"})
_, err := sk.Handle(context.Background(), "spec", json.RawMessage(`{"project_root":"/tmp"}`))
assert.ErrorContains(t, err, "requirements")
}
func TestSpecCallsExecutor(t *testing.T) {
called := false
var capturedTask string
fakeFn := func(_ context.Context, req iexec.Request) (iexec.Result, error) {
called = true
capturedTask = req.TaskPrompt
return iexec.Result{
Status: "pass", Phase: "spec", Skill: "spec",
FilePath: "/tmp/proj/docs/login-spec.md",
Verified: true, ModelUsed: "self", Message: "spec written: login feature",
}, nil
}
sk := spec.New(spec.Config{SkillPrompt: "spec rules", ExecutorFn: fakeFn, SessionsDir: t.TempDir()})
out, err := sk.Handle(context.Background(), "spec", json.RawMessage(
`{"project_root":"/tmp/proj","requirements":"add OAuth2 login","output_path":"docs/login-spec.md"}`,
))
require.NoError(t, err)
assert.True(t, called)
assert.Contains(t, capturedTask, "OAuth2 login")
assert.Contains(t, capturedTask, "docs/login-spec.md")
var result iexec.Result
require.NoError(t, json.Unmarshal(out, &result))
assert.Equal(t, "spec", result.Phase)
}
var _ = require.New
- Step 2: Run test to confirm it fails
go test ./internal/skills/spec/... -v
Expected: FAIL — package does not exist
- Step 3: Create skill.go
// internal/skills/spec/skill.go
package spec
import (
"context"
"encoding/json"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/registry"
)
type ExecutorFn func(ctx context.Context, req iexec.Request) (iexec.Result, error)
type Config struct {
SkillPrompt string
DefaultModel string
ExecutorFn ExecutorFn
SessionsDir string
}
type Skill struct{ cfg Config }
func New(cfg Config) *Skill { return &Skill{cfg: cfg} }
func (s *Skill) Name() string { return "spec" }
func (s *Skill) Tools() []registry.ToolDef {
schema := func(required []string, props map[string]any) json.RawMessage {
b, _ := json.Marshal(map[string]any{"type": "object", "required": required, "properties": props})
return b
}
str := map[string]any{"type": "string"}
return []registry.ToolDef{
{
Name: "spec",
Description: "Generate a structured implementation spec from requirements. Writes the spec to output_path in the project.",
InputSchema: schema(
[]string{"project_root", "requirements"},
map[string]any{
"project_root": str,
"requirements": str,
"output_path": str,
"context": str,
"model": str,
"session_id": str,
},
),
},
}
}
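The inline `schema` closure in `Tools()` wraps the required keys and the property map in a JSON Schema object. The sketch below isolates that helper so its output can be inspected; `buildSchema` is an illustrative name that does not appear in the plan, and unlike the closure it returns the marshal error instead of discarding it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildSchema mirrors the inline schema closure from Tools(). The name is
// hypothetical; the plan keeps the helper as a local closure.
func buildSchema(required []string, props map[string]any) (json.RawMessage, error) {
	b, err := json.Marshal(map[string]any{
		"type":       "object",
		"required":   required,
		"properties": props,
	})
	if err != nil {
		return nil, fmt.Errorf("marshal schema: %w", err)
	}
	return b, nil
}

func main() {
	str := map[string]any{"type": "string"}
	schema, err := buildSchema(
		[]string{"project_root", "requirements"},
		map[string]any{"project_root": str, "requirements": str, "output_path": str},
	)
	if err != nil {
		fmt.Println(err)
		return
	}
	// encoding/json sorts map keys, so the emitted schema is deterministic.
	fmt.Println(string(schema))
}
```

Because the inputs are static maps of strings, the marshal call is not expected to fail in practice, which is presumably why the plan ignores the error.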
- Step 4: Create handlers.go
// internal/skills/spec/handlers.go
package spec
import (
"context"
"encoding/json"
"fmt"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/session"
)
type specArgs struct {
ProjectRoot string `json:"project_root"`
Requirements string `json:"requirements"`
OutputPath string `json:"output_path"`
Context string `json:"context"`
Model string `json:"model"`
SessionID string `json:"session_id"`
}
func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
if tool != "spec" {
return nil, fmt.Errorf("unknown tool: %s", tool)
}
var a specArgs
if err := json.Unmarshal(args, &a); err != nil {
return nil, fmt.Errorf("parse args: %w", err)
}
if a.ProjectRoot == "" {
return nil, fmt.Errorf("project_root is required")
}
if a.Requirements == "" {
return nil, fmt.Errorf("requirements is required")
}
outputPath := a.OutputPath
if outputPath == "" {
outputPath = "docs/spec.md"
}
model := a.Model
if model == "" {
model = s.cfg.DefaultModel
}
task := fmt.Sprintf(
"phase: spec\nproject_root: %s\nrequirements: %s\noutput_path: %s\ncontext: %s\nmodel: %s",
a.ProjectRoot, a.Requirements, outputPath, a.Context, model,
)
task = s.prependHistory(a.SessionID, "spec", task)
if s.cfg.ExecutorFn == nil {
return nil, fmt.Errorf("no executor configured")
}
result, err := s.cfg.ExecutorFn(ctx, iexec.Request{
SkillPrompt: s.cfg.SkillPrompt,
TaskPrompt: task,
Model: model,
Tools: "Read,Write",
})
if err != nil {
return nil, err
}
b, err := json.Marshal(result)
if err != nil {
return nil, fmt.Errorf("marshal result: %w", err)
}
return b, nil
}
func (s *Skill) prependHistory(sessionID, currentPhase, task string) string {
if sessionID == "" || s.cfg.SessionsDir == "" {
return task
}
entries, err := session.Read(s.cfg.SessionsDir, sessionID)
if err != nil || len(entries) == 0 {
return task
}
history := session.FormatHistory(entries, currentPhase)
if history == "" {
return task
}
return history + "\n---\n\n" + task
}
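`prependHistory` treats history as best-effort: a missing session ID, a read error, or an empty history all fall back to the bare task. The sketch below reproduces those fallbacks in isolation. The `readFn` type is a hypothetical seam standing in for `session.Read` plus `session.FormatHistory`, and the history text used in `main` is a placeholder, not the real `FormatHistory` output.

```go
package main

import (
	"errors"
	"fmt"
)

// readFn is a hypothetical stand-in for session.Read + session.FormatHistory.
type readFn func(sessionID string) (string, error)

// prepend returns task unchanged unless a non-empty history is available.
func prepend(read readFn, sessionID, task string) string {
	if sessionID == "" {
		return task
	}
	history, err := read(sessionID)
	if err != nil || history == "" {
		// History is optional context; a failed read must not fail the call.
		return task
	}
	return history + "\n---\n\n" + task
}

func main() {
	ok := func(string) (string, error) { return "prior: red phase passed", nil }
	broken := func(string) (string, error) { return "", errors.New("log missing") }

	fmt.Printf("%q\n", prepend(ok, "sess-1", "phase: spec"))     // history + separator + task
	fmt.Printf("%q\n", prepend(broken, "sess-1", "phase: spec")) // task only
	fmt.Printf("%q\n", prepend(ok, "", "phase: spec"))           // task only
}
```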
- Step 5: Create config/supervisor/spec.md
# Spec Writing Discipline
You write structured implementation specs. Nothing is left ambiguous.
## Iron laws
1. Success criteria must be measurable — "the system is fast" is banned; "p99 < 200ms under 100 RPS" is valid
2. Always include an explicit "Out of scope" section — if you don't draw the boundary, the developer will guess wrong
3. Every technical decision in the approach must have a rationale
## Output contract
Return JSON result with:
- `status`: "pass" (spec written) or "error" (requirements too ambiguous to spec without more input)
- `phase`: "spec"
- `skill`: "spec"
- `file_path`: the output_path where the spec was written (absolute path)
- `runner_output`: ""
- `verified`: true if the file was written successfully
- `message`: "spec written: <one-line summary of what was specced>"
## Spec structure
Write the spec as markdown to the output_path:
```markdown
# [Feature] Spec
## Problem statement
[What problem does this solve? For whom? Why now?]
## Success criteria
- [ ] [Criterion 1 — measurable and verifiable]
- [ ] [Criterion 2 — measurable and verifiable]
## Constraints
[Non-negotiable requirements the solution must satisfy]
## Out of scope
[What we are explicitly NOT doing in this iteration]
## Technical approach
[Architecture decisions, key components, rationale for each choice]
## Risks
[What could go wrong, and how we'd mitigate it]
```
If the requirements are too vague to produce measurable success criteria, return status "error" with a message listing the specific questions that need answers.
- Step 6: Wire into main.go
Add file read:
specPrompt, err := os.ReadFile(cfg.ConfigDir + "/spec.md")
if err != nil {
logger.Error("read spec.md", "path", cfg.ConfigDir+"/spec.md", "err", err)
os.Exit(1)
}
Add import: "github.com/mathiasbq/supervisor/internal/skills/spec"
Register skill:
reg.Register(spec.New(spec.Config{
SkillPrompt: string(specPrompt),
DefaultModel: models.Resolve("spec", ""),
ExecutorFn: executor.Run,
SessionsDir: cfg.SessionsDir,
}))
- Step 7: Run all tests
go test ./... -race -count=1
Expected: all tests PASS
- Step 8: Commit
git add internal/skills/spec/ config/supervisor/spec.md cmd/supervisor/main.go
git commit -m "feat(spec): add spec writing MCP skill"
Task 7: trainer skill
Files:
- Create: internal/skills/trainer/skill.go
- Create: internal/skills/trainer/handlers.go
- Create: internal/skills/trainer/handlers_test.go
- Create: config/supervisor/trainer-reader.md
- Create: config/supervisor/trainer-writer.md
- Modify: cmd/supervisor/main.go
- Modify: config/models.yaml
- Step 1: Write the failing test
// internal/skills/trainer/handlers_test.go
package trainer_test
import (
"context"
"encoding/json"
"testing"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/session"
"github.com/mathiasbq/supervisor/internal/skills/trainer"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestTrainerToolRegistered(t *testing.T) {
sk := trainer.New(trainer.Config{ReaderPrompt: "r", WriterPrompt: "w"})
names := make([]string, 0)
for _, tool := range sk.Tools() {
names = append(names, tool.Name)
}
assert.Contains(t, names, "trainer")
}
func TestTrainerRequiresSessionID(t *testing.T) {
sk := trainer.New(trainer.Config{ReaderPrompt: "r", WriterPrompt: "w"})
_, err := sk.Handle(context.Background(), "trainer", json.RawMessage(`{}`))
assert.ErrorContains(t, err, "session_id")
}
func TestTrainerCallsReaderThenWriter(t *testing.T) {
sessDir := t.TempDir()
require.NoError(t, session.Append(sessDir, "sess-1", session.Entry{
SessionID: "sess-1", Skill: "tdd", Phase: "red", FinalStatus: "pass",
Message: "wrote failing test", FilePath: "internal/foo/foo_test.go",
}))
callCount := 0
var readerTask, writerTask string
fakeFn := func(_ context.Context, req iexec.Request) (iexec.Result, error) {
callCount++
if callCount == 1 {
// reader call
readerTask = req.TaskPrompt
return iexec.Result{
Status: "pass", Phase: "trainer", Skill: "trainer",
RunnerOutput: `[{"type":"sft","moment":"first-pass clean TDD","score":4}]`,
Verified: true, ModelUsed: "self", Message: "1 sft candidate found",
}, nil
}
// writer call
writerTask = req.TaskPrompt
return iexec.Result{
Status: "pass", Phase: "trainer", Skill: "trainer",
FilePath: sessDir + "/training-data/sft/sess-1.jsonl",
Verified: true, ModelUsed: "self", Message: "1 sft pair written",
}, nil
}
sk := trainer.New(trainer.Config{
ReaderPrompt: "reader rules",
WriterPrompt: "writer rules",
ExecutorFn: fakeFn,
SessionsDir: sessDir,
BrainDir: t.TempDir(),
})
out, err := sk.Handle(context.Background(), "trainer", json.RawMessage(`{"session_id":"sess-1"}`))
require.NoError(t, err)
assert.Equal(t, 2, callCount, "executor must be called exactly twice: reader then writer")
assert.Contains(t, readerTask, "role: reader")
assert.Contains(t, readerTask, "sess-1")
assert.Contains(t, readerTask, "wrote failing test") // session history in reader prompt
assert.Contains(t, writerTask, "role: writer")
assert.Contains(t, writerTask, "first-pass clean TDD") // reader RunnerOutput (not Message) is passed to writer
var result iexec.Result
require.NoError(t, json.Unmarshal(out, &result))
assert.Equal(t, "trainer", result.Phase)
assert.Equal(t, "pass", result.Status)
}
var _ = require.New
- Step 2: Run test to confirm it fails
go test ./internal/skills/trainer/... -v
Expected: FAIL — package does not exist
- Step 3: Create skill.go
// internal/skills/trainer/skill.go
package trainer
import (
"context"
"encoding/json"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/registry"
)
type ExecutorFn func(ctx context.Context, req iexec.Request) (iexec.Result, error)
type Config struct {
ReaderPrompt string
WriterPrompt string
DefaultModel string
ExecutorFn ExecutorFn
SessionsDir string
BrainDir string // root of brain/ directory; writer writes to BrainDir/training-data/
}
type Skill struct{ cfg Config }
func New(cfg Config) *Skill { return &Skill{cfg: cfg} }
func (s *Skill) Name() string { return "trainer" }
func (s *Skill) Tools() []registry.ToolDef {
schema := func(required []string, props map[string]any) json.RawMessage {
b, _ := json.Marshal(map[string]any{"type": "object", "required": required, "properties": props})
return b
}
return []registry.ToolDef{
{
Name: "trainer",
Description: "Extract SFT and DPO training pairs from a session log. Runs a reader→writer chain: reader identifies learning moments, writer formats and writes pairs to brain/training-data/.",
InputSchema: schema(
[]string{"session_id"},
map[string]any{
"session_id": map[string]any{"type": "string"},
"model": map[string]any{"type": "string"},
},
),
},
}
}
- Step 4: Create handlers.go
// internal/skills/trainer/handlers.go
package trainer
import (
"context"
"encoding/json"
"fmt"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/session"
)
type trainArgs struct {
SessionID string `json:"session_id"`
Model string `json:"model"`
}
func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
if tool != "trainer" {
return nil, fmt.Errorf("unknown tool: %s", tool)
}
var a trainArgs
if err := json.Unmarshal(args, &a); err != nil {
return nil, fmt.Errorf("parse args: %w", err)
}
if a.SessionID == "" {
return nil, fmt.Errorf("session_id is required")
}
if s.cfg.ExecutorFn == nil {
return nil, fmt.Errorf("no executor configured")
}
model := a.Model
if model == "" {
model = s.cfg.DefaultModel
}
entries, err := session.Read(s.cfg.SessionsDir, a.SessionID)
if err != nil {
return nil, fmt.Errorf("read session log: %w", err)
}
// ── Step 1: Reader agent ─────────────────────────────────────────────────
history := session.FormatHistory(entries, "")
readerTask := fmt.Sprintf(
"role: reader\nsession_id: %s\nbrain_dir: %s\n\n%s",
a.SessionID, s.cfg.BrainDir, history,
)
readerResult, err := s.cfg.ExecutorFn(ctx, iexec.Request{
SkillPrompt: s.cfg.ReaderPrompt,
TaskPrompt: readerTask,
Model: model,
Tools: "Read",
})
if err != nil {
return nil, fmt.Errorf("reader agent: %w", err)
}
// ── Step 2: Writer agent (receives reader candidates) ────────────────────
writerTask := fmt.Sprintf(
"role: writer\nsession_id: %s\nbrain_dir: %s\n\nreader_candidates:\n%s",
a.SessionID, s.cfg.BrainDir, readerResult.RunnerOutput,
)
writerResult, err := s.cfg.ExecutorFn(ctx, iexec.Request{
SkillPrompt: s.cfg.WriterPrompt,
TaskPrompt: writerTask,
Model: model,
Tools: "Read,Write",
})
if err != nil {
return nil, fmt.Errorf("writer agent: %w", err)
}
b, err := json.Marshal(writerResult)
if err != nil {
return nil, fmt.Errorf("marshal result: %w", err)
}
return b, nil
}
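As written, the handler starts the writer even when the reader reports an error or returns no candidates. One possible guard between the two calls is sketched below. It is not part of the plan: `result` is a cut-down stand-in for `iexec.Result`, and `shouldRunWriter` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// result is a reduced stand-in for iexec.Result with only the fields used here.
type result struct {
	Status       string
	RunnerOutput string
}

// shouldRunWriter reports whether the reader produced at least one candidate.
// It returns an error when the reader failed or its output is not a JSON array.
func shouldRunWriter(reader result) (bool, error) {
	if reader.Status != "pass" {
		return false, fmt.Errorf("reader returned status %q", reader.Status)
	}
	var candidates []json.RawMessage
	if err := json.Unmarshal([]byte(reader.RunnerOutput), &candidates); err != nil {
		return false, fmt.Errorf("reader candidates are not a JSON array: %w", err)
	}
	return len(candidates) > 0, nil
}

func main() {
	for _, r := range []result{
		{Status: "pass", RunnerOutput: `[{"type":"sft","score":4}]`},
		{Status: "pass", RunnerOutput: `[]`},
		{Status: "error"},
	} {
		run, err := shouldRunWriter(r)
		fmt.Println(run, err)
	}
}
```

Skipping the writer on an empty array would save one worker call per session with no learning moments; whether that is worth an extra branch in `Handle` is a judgment call for the implementer.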
- Step 5: Create config/supervisor/trainer-reader.md
# Trainer Reader Discipline
You scan session logs and identify candidate learning moments worth converting to training data.
## What to look for
- **SFT candidates**: the worker did exactly the right thing — a clean pattern worth reinforcing
- **DPO candidates**: the worker first produced a wrong or suboptimal response, then corrected — you have both rejected and chosen
## Scoring (1–5)
- 5: novel pattern, clearly correct, generalises across projects
- 4: good pattern, correct, somewhat project-specific but still useful
- 3: correct but obvious — include only if especially clean
- 2 or below: skip — too ambiguous or too context-specific
## Output contract
Return JSON result with:
- `status`: "pass" or "error"
- `phase`: "trainer"
- `skill`: "trainer"
- `file_path`: ""
- `runner_output`: JSON array of candidates (valid JSON, not markdown):
[{"type":"sft","moment":"<what happened>","prompt":"<what was asked>","completion":"<what was done right>","score":4},
{"type":"dpo","moment":"<what happened>","prompt":"<what was asked>","chosen":"<correct>","rejected":"<incorrect>","score":3}]
- `verified`: true
- `message`: "N sft candidates, M dpo candidates found"
## Rules
1. Read all session entries in the task prompt
2. Score each entry — only include entries scoring >= 3
3. Prompt/completion fields must be phrased to generalise: no project-specific paths or names
4. If no candidates score >= 3, return an empty array `[]` — never force low-quality candidates
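The reader contract asks for a plain JSON array with a `score` on every candidate. A sketch of how a consumer could parse that array and re-apply the `>= 3` cutoff follows. The `candidate` struct is inferred from the example payload in the contract, and `keepCandidates` is a hypothetical helper, not something the plan defines.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// candidate follows the runner_output shape shown in the reader contract.
type candidate struct {
	Type       string `json:"type"`
	Moment     string `json:"moment"`
	Prompt     string `json:"prompt"`
	Completion string `json:"completion,omitempty"`
	Chosen     string `json:"chosen,omitempty"`
	Rejected   string `json:"rejected,omitempty"`
	Score      int    `json:"score"`
}

// keepCandidates parses the reader's JSON array and drops entries that score
// below minScore or carry a type other than "sft" or "dpo".
func keepCandidates(raw string, minScore int) ([]candidate, error) {
	var all []candidate
	if err := json.Unmarshal([]byte(raw), &all); err != nil {
		return nil, fmt.Errorf("parse candidates: %w", err)
	}
	kept := make([]candidate, 0, len(all))
	for _, c := range all {
		if c.Score >= minScore && (c.Type == "sft" || c.Type == "dpo") {
			kept = append(kept, c)
		}
	}
	return kept, nil
}

func main() {
	raw := `[{"type":"sft","moment":"clean red phase","score":4},
	         {"type":"dpo","moment":"unclear fix","score":2}]`
	kept, err := keepCandidates(raw, 3)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, c := range kept {
		fmt.Println(c.Type, c.Score, c.Moment)
	}
}
```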
- Step 6: Create config/supervisor/trainer-writer.md
# Trainer Writer Discipline
You receive candidate learning moments from the reader and write clean SFT/DPO training pairs.
## Quality gate (apply before writing)
- SFT: prompt must be phrased so it could come from any project, not just this one
- DPO: chosen and rejected must be clearly distinguishable — skip if a reader can't tell which is better
- Never include project-specific paths, variable names, or identifiers in any pair
## Output contract
Return JSON result with:
- `status`: "pass" (pairs written or skipped due to quality) or "error" (candidates JSON was malformed)
- `phase`: "trainer"
- `skill`: "trainer"
- `file_path`: path of the last file written (empty if nothing passed quality gate)
- `runner_output`: "N SFT pairs written to brain/training-data/sft/, M DPO pairs to brain/training-data/dpo/" or "0 pairs passed quality gate"
- `verified`: true if files were written; false if nothing passed
- `message`: "N sft + M dpo pairs for session <id>" or "no pairs passed quality gate"
## File format
JSONL — one JSON object per line.
SFT: `{"prompt": "...", "completion": "..."}`
DPO: `{"prompt": "...", "chosen": "...", "rejected": "..."}`
Write SFT to: `<brain_dir>/training-data/sft/<session_id>.jsonl`
Write DPO to: `<brain_dir>/training-data/dpo/<session_id>.jsonl`
Append to existing files if they exist (don't overwrite).
## Rules
1. Parse the `reader_candidates` JSON from the task prompt
2. For each candidate: apply quality gate
3. Write passing SFT candidates to sft JSONL, DPO candidates to dpo JSONL
4. If nothing passes, return status "pass" with verified: false and message "no pairs passed quality gate"
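In the plan the writer is an agent that writes these files through its Write tool, so no Go code performs the append. Purely to illustrate the file format and the append-not-overwrite rule, here is a sketch of the same operation in Go; `sftPair`, `appendSFT`, and `roundTrip` are illustrative names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// sftPair matches the SFT line format: {"prompt": "...", "completion": "..."}.
type sftPair struct {
	Prompt     string `json:"prompt"`
	Completion string `json:"completion"`
}

// appendSFT appends pairs as JSONL to <brainDir>/training-data/sft/<sessionID>.jsonl,
// creating the directory and file when they do not exist yet.
func appendSFT(brainDir, sessionID string, pairs []sftPair) (string, error) {
	dir := filepath.Join(brainDir, "training-data", "sft")
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", fmt.Errorf("create %s: %w", dir, err)
	}
	path := filepath.Join(dir, sessionID+".jsonl")
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return "", fmt.Errorf("open %s: %w", path, err)
	}
	enc := json.NewEncoder(f) // Encode emits one object plus a trailing newline
	for _, p := range pairs {
		if err := enc.Encode(p); err != nil {
			f.Close()
			return "", fmt.Errorf("write pair: %w", err)
		}
	}
	if err := f.Close(); err != nil {
		return "", fmt.Errorf("close %s: %w", path, err)
	}
	return path, nil
}

// roundTrip appends twice to a throwaway brain dir and returns the number of
// lines in the file, showing that the second call does not overwrite the first.
func roundTrip() (int, error) {
	dir, err := os.MkdirTemp("", "brain-*")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	pair := []sftPair{{Prompt: "How should a failing test be written first?", Completion: "..."}}
	if _, err := appendSFT(dir, "sess-1", pair); err != nil {
		return 0, err
	}
	path, err := appendSFT(dir, "sess-1", pair)
	if err != nil {
		return 0, err
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strings.Count(string(data), "\n"), nil
}

func main() {
	lines, err := roundTrip()
	fmt.Println(lines, err)
}
```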
- Step 7: Wire into main.go
Add file reads:
trainerReaderPrompt, err := os.ReadFile(cfg.ConfigDir + "/trainer-reader.md")
if err != nil {
logger.Error("read trainer-reader.md", "path", cfg.ConfigDir+"/trainer-reader.md", "err", err)
os.Exit(1)
}
trainerWriterPrompt, err := os.ReadFile(cfg.ConfigDir + "/trainer-writer.md")
if err != nil {
logger.Error("read trainer-writer.md", "path", cfg.ConfigDir+"/trainer-writer.md", "err", err)
os.Exit(1)
}
Add import: "github.com/mathiasbq/supervisor/internal/skills/trainer"
Register skill:
reg.Register(trainer.New(trainer.Config{
ReaderPrompt: string(trainerReaderPrompt),
WriterPrompt: string(trainerWriterPrompt),
DefaultModel: models.Resolve("trainer", ""),
ExecutorFn: executor.Run,
SessionsDir: cfg.SessionsDir,
BrainDir: cfg.BrainDir,
}))
- Step 8: Update config/models.yaml
Add trainer entry following existing format:
skills:
tdd: ollama/qwen3-coder-30b-tuned
review: ollama/devstral-tuned
debug: ollama/deepseek-r1-tuned
retrospective: ollama/qwen3-coder-30b-tuned
spec: ollama/qwen3-coder-30b-tuned
trainer: ollama/qwen3-coder-30b-tuned
- Step 9: Run all tests
go test ./... -race -count=1
Expected: all tests PASS, including TestTrainerCallsReaderThenWriter
- Step 10: Commit
git add internal/skills/trainer/ config/supervisor/trainer-reader.md config/supervisor/trainer-writer.md config/models.yaml cmd/supervisor/main.go
git commit -m "feat(trainer): add trainer MCP skill with reader→writer sub-agent chain"
Task 8: Integration smoke test and CI push
Files: none new — validates the full system
- Step 1: Run task check (full quality gate)
task check
Expected: lint, test, and vet all pass for both modules
- Step 2: Verify all 12 MCP tools are registered
With servers running (task start in another terminal):
curl -s -X POST http://localhost:3200/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' \
| python3 -c "import json,sys; tools=json.load(sys.stdin)['result']['tools']; [print(t['name']) for t in tools]"
Expected output (12 tools):
tdd_red
tdd_green
tdd_refactor
brain_query
brain_write
tier
session_log
retrospective
review
debug
spec
trainer
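Comparing twelve names by eye is easy to get wrong. As an alternative to the Python one-liner, the check could be sketched in Go as below. The response shape (`result.tools[].name`) is taken from the one-liner above; `missingTools` is a hypothetical helper, and the body in `main` is a canned stand-in for the real server reply.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// listResponse models only the fields of a tools/list reply that the check needs.
type listResponse struct {
	Result struct {
		Tools []struct {
			Name string `json:"name"`
		} `json:"tools"`
	} `json:"result"`
}

// missingTools returns the names in want that the response does not list.
func missingTools(body []byte, want []string) ([]string, error) {
	var resp listResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode tools/list response: %w", err)
	}
	have := make(map[string]bool, len(resp.Result.Tools))
	for _, t := range resp.Result.Tools {
		have[t.Name] = true
	}
	var missing []string
	for _, name := range want {
		if !have[name] {
			missing = append(missing, name)
		}
	}
	return missing, nil
}

func main() {
	// Canned body for illustration; a real run would use the response from
	// http://localhost:3200/mcp.
	body := []byte(`{"jsonrpc":"2.0","id":1,"result":{"tools":[{"name":"tdd_red"},{"name":"spec"}]}}`)
	missing, err := missingTools(body, []string{"tdd_red", "spec", "trainer"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("missing:", missing)
}
```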
- Step 3: Smoke test each new skill
# review
curl -s --max-time 10 -X POST http://localhost:3200/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"review","arguments":{"project_root":"/tmp","files":["nonexistent.go"],"context":"test"}}}'
# debug
curl -s --max-time 10 -X POST http://localhost:3200/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"debug","arguments":{"project_root":"/tmp","error":"test error"}}}'
# spec
curl -s --max-time 10 -X POST http://localhost:3200/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":4,"method":"tools/call","params":{"name":"spec","arguments":{"project_root":"/tmp","requirements":"add login button"}}}'
# trainer (requires a session log entry)
curl -s --max-time 10 -X POST http://localhost:3200/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":5,"method":"tools/call","params":{"name":"trainer","arguments":{"session_id":"2026-04-17-validate-hyperguild"}}}'
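Each call should return a JSON-RPC result rather than a -32000 error. A real worker run can take longer than 10 seconds, in which case curl exits with code 28 (timeout) and prints nothing; --max-time 10 is there only to confirm that the MCP dispatch layer accepts the call, not to wait for the worker to finish.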
- Step 4: Push and verify CI
git push origin main
Watch the CI run complete on Gitea. Check for:
- check job: green (lint + test + vet)
- mirror job: green (pushed to GitHub)
curl -s "https://gitea.d-ma.be/api/v1/repos/mathias/hyperguild/actions/runs?limit=1" \
-H "Authorization: token $GITEA_TOKEN" | python3 -c "
import json,sys
d=json.load(sys.stdin)
for r in d.get('workflow_runs',[]):
print(f'#{r[\"id\"]} {r[\"status\"]:12} {r.get(\"conclusion\",\"\"):8} {r[\"display_title\"][:50]}')
"
- Step 5: Tag v0.2.0
task tag version=v0.2.0
Self-review
Spec coverage check:
- Session history injection into tdd_green and tdd_refactor → Task 3 ✓
- All new skills accept `session_id` for history injection → Tasks 4–7 ✓
- Trainer reader→writer chain → Task 7 ✓
- Schema enum fixed (was causing retrospective to return wrong phase) → Task 2 ✓
- Phase 2 skills registered in main.go → Tasks 4–7 each include main.go wiring ✓
- CI passes → Task 8 ✓
Placeholder scan: None found — all steps include complete code.
Type consistency:
- `ExecutorFn` is defined in each skill package as `func(ctx context.Context, req iexec.Request) (iexec.Result, error)`; consistent across tdd, review, debug, spec, trainer ✓
- `Config.SessionsDir` present in all new skills ✓
- `trainer.Config.BrainDir` used in handlers.go Task 7 writer task prompt ✓
- `session.FormatHistory` signature `(entries []Entry, excludePhase string) string` used consistently ✓
- `prependHistory` method is defined identically in review, debug, spec handlers; this is intentional duplication (YAGNI: not enough skills to justify extracting a shared helper) ✓
Note on tasks 4–7: These are independent of each other and can be executed in parallel by separate subagents after Tasks 1–3 are complete.