Files
hyperguild/docs/superpowers/plans/2026-05-03-pass-rate-logging.md
Mathias Bergqvist 1b9c4905a5
All checks were successful
CI / Lint / Test / Vet (push) Successful in 9s
CI / Mirror to GitHub (push) Successful in 4s
docs(plans): patch Plan 5 — test helper sessions/ subdir + sessionlog docstring step
Two corrections applied during Plan 5 execution:

  - Task 2 test helper writeSession now joins a sessions/ subdir so it
    matches the handler's <brainDir>/sessions/*.jsonl scan path.
    (The original heredoc would have produced 0 records in tests.)

  - Task 6 grew a Step 1.5 to update the session_log MCP tool's
    final_status description, picking up a spec requirement that
    didn't translate into a task in the original plan.
2026-05-03 22:59:21 +02:00

41 KiB

Pass-rate Logging Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Instrument skill invocations with session_log calls, expose pass-rate aggregates via a new /pass-rate HTTP endpoint on the brain pod, and add an optional hyperguild brain pass-rate CLI subcommand. Provides Plan 6 (Mode 2 routing pod) with the per-skill signal it needs to make routing decisions.

Architecture: SKILL.md is the source of truth for the logging contract — each instrumented skill's "Brain MCP Integration" section gets a literal copy-paste-friendly session_log call template. A new GET /pass-rate?skill=X&window=Y handler in the ingestion pod walks brain/sessions/*.jsonl on demand, normalizes legacy vocabulary (passok, failerror, skipskipped), and returns aggregate counts + pass rate. The hyperguild CLI gains a brain pass-rate subcommand that calls the endpoint.

Tech Stack: Go 1.26.1 stdlib (net/http, encoding/json, time); existing testify; markdown for SKILL.md updates. Two repos touched: local-dev (skill content) and hyperguild (endpoint + CLI).


Plan 5 of 7 — Hyperguild Skill Migration

Plans 1-4 done. Plan 5 instruments skills and exposes pass-rate, unblocking Plan 6 (routing pod) which then unblocks Plan 7 (supervisor retirement).

Spec: docs/superpowers/specs/2026-05-03-pass-rate-logging-design.md (committed af52f50).

Two repos, two worktrees:

  • Phase 1 worktree (local-dev): ~/Documents/local-dev-worktrees/pass-rate-instrumentation/ on branch feat/pass-rate-instrumentation.
  • Phase 2 worktree (hyperguild): ~/dev/AI/hyperguild/.worktrees/pass-rate-endpoint/ on branch feat/pass-rate-endpoint.

Each task's "Files" header names the worktree. Implementer subagents must cd into the named worktree before any read/edit/git operation. Internal absolute paths in code (e.g. ${HOME}/dev/.skills) remain unchanged — they're correct for runtime.

Lint convention: project's golangci-lint has errcheck enabled. Append //nolint:errcheck to any fmt.Fprint* call that ignores its return value when writing to stdout/stderr. The plan's heredocs include these markers. (Same as Plans 3 and 4.)

Verb decision (GET vs POST for /pass-rate): the existing brain HTTP REST API uses POST for all endpoints (/query, /write, /ingest, etc.) — historically because those endpoints all take JSON bodies. /pass-rate is a pure read with query-string parameters. We use GET here to follow REST semantics for pure reads; the inconsistency with /query is acknowledged. Future read endpoints SHOULD follow GET.

File Structure

Phase 1 (local-dev worktree)

Path Action Responsibility
.skills/tdd/SKILL.md modify Pilot — add the session_log contract under existing "Brain MCP Integration"
.skills/code-review/SKILL.md modify Rollout — same contract, code-review's phase set
.skills/debug/SKILL.md modify Rollout
.skills/feature-spec/SKILL.md modify Rollout
.skills/session-retrospective/SKILL.md modify Rollout
.skills/trainer/SKILL.md modify Rollout
.skills/spec-driven-dev/SKILL.md modify Rollout

Phase 2 (hyperguild worktree)

Path Action Responsibility
ingestion/internal/api/passrate.go create Aggregator + handler for /pass-rate
ingestion/internal/api/passrate_test.go create Unit tests for the aggregator + handler
ingestion/cmd/server/main.go modify Register GET /pass-rate route
ingestion/internal/api/handler_test.go modify (or create alongside) Integration test that the route serves expected JSON
cmd/hyperguild/brain.go modify Add runBrainPassRate and dispatch case
cmd/hyperguild/brain_test.go modify Tests for runBrainPassRate
cmd/hyperguild/http.go modify Add brainClient.PassRate method
cmd/hyperguild/http_test.go modify Tests for PassRate method
cmd/hyperguild/README.md modify Document the new subcommand and env vars
CLAUDE.md (or .context/PROJECT.md) modify Document /pass-rate endpoint alongside other brain HTTP REST endpoints

Task 1: Pilot — instrument tdd/SKILL.md

Worktree: ~/Documents/local-dev-worktrees/pass-rate-instrumentation/ (local-dev repo)

Add a "Logging" subsection under the existing "Brain MCP Integration" in tdd/SKILL.md. The subsection states the literal session_log call shape with explicit field names for the three TDD phases (red, green, refactor).

Files:

  • Modify: .skills/tdd/SKILL.md

  • Step 1: Read the existing SKILL.md to find the insertion point

cd ~/Documents/local-dev-worktrees/pass-rate-instrumentation
grep -n "## Brain MCP Integration\|## Mode 2 Routing Note\|## Cross-References" .skills/tdd/SKILL.md
# Locate the "Brain MCP Integration" section and the next section after it.
# The new "Logging" subsection inserts at the END of "Brain MCP Integration",
# immediately before whatever section comes next (typically "Mode 2 Routing Note").
  • Step 2: Insert the Logging subsection

Inside .skills/tdd/SKILL.md, find the "## Brain MCP Integration" section and append (before the next section starts) this exact subsection:

### Logging

Call `session_log` once at the end of every phase to record the outcome.
Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which
treats `pass` as success, `fail` as failure, `skip` as neither.

**At end of `red` phase:**
- `session_log` with `{skill: "tdd", phase: "red", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`

**At end of `green` phase:**
- `session_log` with `{skill: "tdd", phase: "green", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`

**At end of `refactor` phase:**
- `session_log` with `{skill: "tdd", phase: "refactor", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`

**Status semantics:**
- `pass` — the phase's intended outcome was reached (red: test fails as expected; green: test passes; refactor: tests still pass after refactor).
- `fail` — the phase's intended outcome was NOT reached (test compiled when it shouldn't, test still fails after green attempt, refactor broke tests).
- `skip` — phase was skipped intentionally (e.g. refactor not warranted).

**Why this matters:** the routing pod (Plan 6) reads pass-rate to decide whether to route a future `tdd` call to a local model. If your skill never logs, the routing pod sees no data and may default-route or default-not-route in a way that doesn't reflect real performance.
  • Step 3: Sanity-check the edit
grep -A2 "### Logging" .skills/tdd/SKILL.md | head -5
# Should show the subsection header

grep -c "session_log" .skills/tdd/SKILL.md
# Should show ≥ 4 (3 phase calls + 1 reference in the subsection intro,
# plus any existing references)

# Confirm the subsection sits inside Brain MCP Integration:
awk '/^## Brain MCP Integration$/,/^## /' .skills/tdd/SKILL.md | tail -20
  • Step 4: Commit
cd ~/Documents/local-dev-worktrees/pass-rate-instrumentation
git add .skills/tdd/SKILL.md
git commit -m "feat(skills): instrument tdd with session_log contract — Plan 5 pilot

Adds a Logging subsection under Brain MCP Integration that names the
session_log fields and per-phase semantics (pass/fail/skip). Pilot
for Plan 5 of the hyperguild migration. Six other binary-outcome
skills will follow once this pilot validates end-to-end via the
endpoint work in Phase 2."

Task 2: Endpoint — /pass-rate aggregator + handler + unit tests

Worktree: ~/dev/AI/hyperguild/.worktrees/pass-rate-endpoint/ (hyperguild repo)

Add a new aggregator and handler in ingestion/internal/api/passrate.go. Walks the pod's brain/sessions/*.jsonl files, filters by skill + window, normalizes status vocabulary, returns aggregates.

Note on dogfooding: This task is itself a TDD cycle. The implementing subagent should follow the now-instrumented tdd/SKILL.md (Task 1) and call session_log at the end of red, green, refactor. This validates Phase 1's pilot end-to-end. If the implementer doesn't have access to the brain MCP, they may skip the logging calls and report DONE_WITH_CONCERNS — the controller will validate manually.

Files:

  • Create: ingestion/internal/api/passrate.go

  • Create: ingestion/internal/api/passrate_test.go

  • Step 1: Write the failing test

Create ingestion/internal/api/passrate_test.go:

package api

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// writeSession writes one or more JSONL entries to <dir>/sessions/<sessionID>.jsonl.
// The handler scans <brainDir>/sessions/*.jsonl, so the helper creates the
// sessions/ subdir to match the production layout.
func writeSession(t *testing.T, dir, sessionID string, entries ...string) {
	t.Helper()
	sessionsDir := filepath.Join(dir, "sessions")
	require.NoError(t, os.MkdirAll(sessionsDir, 0o755))
	path := filepath.Join(sessionsDir, sessionID+".jsonl")
	body := ""
	for _, e := range entries {
		body += e + "\n"
	}
	require.NoError(t, os.WriteFile(path, []byte(body), 0o644))
}

func TestPassRate_HappyPath(t *testing.T) {
	dir := t.TempDir()
	now := time.Now().UTC()
	recent := now.Add(-1 * time.Hour).Format(time.RFC3339)

	writeSession(t, dir, "s1",
		`{"timestamp":"`+recent+`","skill":"tdd","phase":"red","final_status":"pass"}`,
		`{"timestamp":"`+recent+`","skill":"tdd","phase":"green","final_status":"pass"}`,
		`{"timestamp":"`+recent+`","skill":"tdd","phase":"refactor","final_status":"fail"}`,
	)
	writeSession(t, dir, "s2",
		`{"timestamp":"`+recent+`","skill":"code-review","phase":"review","final_status":"pass"}`,
	)

	h := &Handler{brainDir: dir}
	req := httptest.NewRequest(http.MethodGet, "/pass-rate?skill=tdd&window=24h", nil)
	w := httptest.NewRecorder()
	h.PassRate(w, req)

	resp := w.Result()
	require.Equal(t, http.StatusOK, resp.StatusCode)

	var got passRateResponse
	require.NoError(t, json.NewDecoder(resp.Body).Decode(&got))
	assert.Equal(t, "tdd", got.Skill)
	assert.Equal(t, "24h", got.Window)
	assert.Equal(t, 2, got.Pass)
	assert.Equal(t, 1, got.Fail)
	assert.Equal(t, 0, got.Skip)
	assert.Equal(t, 3, got.Total)
	require.NotNil(t, got.PassRate)
	assert.InDelta(t, 0.6667, *got.PassRate, 0.001)
}

func TestPassRate_LegacyVocabulary(t *testing.T) {
	dir := t.TempDir()
	now := time.Now().UTC().Format(time.RFC3339)
	writeSession(t, dir, "s1",
		`{"timestamp":"`+now+`","skill":"tdd","final_status":"ok"}`,
		`{"timestamp":"`+now+`","skill":"tdd","final_status":"error"}`,
		`{"timestamp":"`+now+`","skill":"tdd","final_status":"skipped"}`,
	)

	h := &Handler{brainDir: dir}
	req := httptest.NewRequest(http.MethodGet, "/pass-rate?skill=tdd&window=24h", nil)
	w := httptest.NewRecorder()
	h.PassRate(w, req)

	var got passRateResponse
	require.NoError(t, json.NewDecoder(w.Result().Body).Decode(&got))
	assert.Equal(t, 1, got.Pass, "ok→pass")
	assert.Equal(t, 1, got.Fail, "error→fail")
	assert.Equal(t, 1, got.Skip, "skipped→skip")
}

func TestPassRate_OutsideWindow_Excluded(t *testing.T) {
	dir := t.TempDir()
	old := time.Now().UTC().Add(-30 * 24 * time.Hour).Format(time.RFC3339)
	recent := time.Now().UTC().Add(-1 * time.Hour).Format(time.RFC3339)
	writeSession(t, dir, "s1",
		`{"timestamp":"`+old+`","skill":"tdd","final_status":"pass"}`,
		`{"timestamp":"`+recent+`","skill":"tdd","final_status":"pass"}`,
	)

	h := &Handler{brainDir: dir}
	req := httptest.NewRequest(http.MethodGet, "/pass-rate?skill=tdd&window=24h", nil)
	w := httptest.NewRecorder()
	h.PassRate(w, req)

	var got passRateResponse
	require.NoError(t, json.NewDecoder(w.Result().Body).Decode(&got))
	assert.Equal(t, 1, got.Pass)
	assert.Equal(t, 1, got.Total)
}

func TestPassRate_NoData_ReturnsZerosAndNullRate(t *testing.T) {
	dir := t.TempDir()
	h := &Handler{brainDir: dir}
	req := httptest.NewRequest(http.MethodGet, "/pass-rate?skill=tdd&window=24h", nil)
	w := httptest.NewRecorder()
	h.PassRate(w, req)

	var got passRateResponse
	require.NoError(t, json.NewDecoder(w.Result().Body).Decode(&got))
	assert.Equal(t, 0, got.Pass)
	assert.Equal(t, 0, got.Fail)
	assert.Equal(t, 0, got.Skip)
	assert.Equal(t, 0, got.Total)
	assert.Nil(t, got.PassRate, "pass_rate must be null when pass+fail == 0")
}

func TestPassRate_DefaultsTo7d(t *testing.T) {
	dir := t.TempDir()
	now := time.Now().UTC().Format(time.RFC3339)
	writeSession(t, dir, "s1", `{"timestamp":"`+now+`","skill":"tdd","final_status":"pass"}`)

	h := &Handler{brainDir: dir}
	req := httptest.NewRequest(http.MethodGet, "/pass-rate?skill=tdd", nil) // no window
	w := httptest.NewRecorder()
	h.PassRate(w, req)

	var got passRateResponse
	require.NoError(t, json.NewDecoder(w.Result().Body).Decode(&got))
	assert.Equal(t, "7d", got.Window)
	assert.Equal(t, 1, got.Pass)
}

func TestPassRate_MissingSkill_ReturnsBadRequest(t *testing.T) {
	dir := t.TempDir()
	h := &Handler{brainDir: dir}
	req := httptest.NewRequest(http.MethodGet, "/pass-rate", nil)
	w := httptest.NewRecorder()
	h.PassRate(w, req)
	assert.Equal(t, http.StatusBadRequest, w.Result().StatusCode)
}

func TestPassRate_BadWindow_ReturnsBadRequest(t *testing.T) {
	dir := t.TempDir()
	h := &Handler{brainDir: dir}
	req := httptest.NewRequest(http.MethodGet, "/pass-rate?skill=tdd&window=foo", nil)
	w := httptest.NewRecorder()
	h.PassRate(w, req)
	assert.Equal(t, http.StatusBadRequest, w.Result().StatusCode)
}

func TestPassRate_MalformedLine_Skipped(t *testing.T) {
	dir := t.TempDir()
	now := time.Now().UTC().Format(time.RFC3339)
	writeSession(t, dir, "s1",
		`{"timestamp":"`+now+`","skill":"tdd","final_status":"pass"}`,
		`not valid json`,
		`{"timestamp":"`+now+`","skill":"tdd","final_status":"pass"}`,
	)

	h := &Handler{brainDir: dir}
	req := httptest.NewRequest(http.MethodGet, "/pass-rate?skill=tdd&window=24h", nil)
	w := httptest.NewRecorder()
	h.PassRate(w, req)

	var got passRateResponse
	require.NoError(t, json.NewDecoder(w.Result().Body).Decode(&got))
	assert.Equal(t, 2, got.Pass, "the malformed line is silently skipped")
}
  • Step 2: Run test to confirm build failure
cd ~/dev/AI/hyperguild/.worktrees/pass-rate-endpoint
cd ingestion && go test ./internal/api/... -run TestPassRate 2>&1 | tail -10
# Expected: build failure — `Handler.PassRate`, `passRateResponse` undefined
# Note: Handler.brainDir field already exists (existing pattern); just adding a new method.
  • Step 3: Write the implementation

Create ingestion/internal/api/passrate.go:

package api

import (
	"encoding/json"
	"net/http"
	"os"
	"path/filepath"
	"strings"
	"time"
)

type passRateResponse struct {
	Skill    string   `json:"skill"`
	Window   string   `json:"window"`
	Pass     int      `json:"pass"`
	Fail     int      `json:"fail"`
	Skip     int      `json:"skip"`
	Total    int      `json:"total"`
	PassRate *float64 `json:"pass_rate"`
}

// PassRate handles GET /pass-rate?skill=X&window=Y.
// Walks brainDir/sessions/*.jsonl, filters by skill name and timestamp,
// returns aggregated counts and pass rate.
func (h *Handler) PassRate(w http.ResponseWriter, r *http.Request) {
	skill := r.URL.Query().Get("skill")
	if skill == "" {
		writeError(w, http.StatusBadRequest, "skill is required")
		return
	}

	windowStr := r.URL.Query().Get("window")
	if windowStr == "" {
		windowStr = "7d"
	}
	window, err := parseWindow(windowStr)
	if err != nil {
		writeError(w, http.StatusBadRequest, "invalid window: "+err.Error())
		return
	}

	cutoff := time.Now().UTC().Add(-window)
	pass, fail, skip := 0, 0, 0

	sessionsDir := filepath.Join(h.brainDir, "sessions")
	entries, err := os.ReadDir(sessionsDir)
	if err != nil && !os.IsNotExist(err) {
		writeError(w, http.StatusInternalServerError, "read sessions dir: "+err.Error())
		return
	}

	for _, entry := range entries {
		if entry.IsDir() || !strings.HasSuffix(entry.Name(), ".jsonl") {
			continue
		}
		body, err := os.ReadFile(filepath.Join(sessionsDir, entry.Name()))
		if err != nil {
			continue // skip unreadable files
		}
		for _, line := range strings.Split(string(body), "\n") {
			line = strings.TrimSpace(line)
			if line == "" {
				continue
			}
			var rec struct {
				Timestamp   string `json:"timestamp"`
				Skill       string `json:"skill"`
				FinalStatus string `json:"final_status"`
			}
			if err := json.Unmarshal([]byte(line), &rec); err != nil {
				continue // malformed — skip
			}
			if rec.Skill != skill {
				continue
			}
			ts, err := time.Parse(time.RFC3339, rec.Timestamp)
			if err != nil {
				continue
			}
			if ts.Before(cutoff) {
				continue
			}
			switch normalizeStatus(rec.FinalStatus) {
			case "pass":
				pass++
			case "fail":
				fail++
			case "skip":
				skip++
			}
		}
	}

	total := pass + fail + skip
	resp := passRateResponse{
		Skill:  skill,
		Window: windowStr,
		Pass:   pass,
		Fail:   fail,
		Skip:   skip,
		Total:  total,
	}
	if pass+fail > 0 {
		rate := float64(pass) / float64(pass+fail)
		resp.PassRate = &rate
	}

	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(resp)
}

// normalizeStatus maps both new (pass/fail/skip) and legacy (ok/error/skipped)
// vocabularies to the canonical pass/fail/skip set. Unknown values are treated
// as skip for safety.
func normalizeStatus(s string) string {
	switch s {
	case "pass", "ok":
		return "pass"
	case "fail", "error":
		return "fail"
	case "skip", "skipped":
		return "skip"
	default:
		return "skip"
	}
}

// parseWindow accepts Go-style durations plus "Nd" for days.
func parseWindow(s string) (time.Duration, error) {
	if strings.HasSuffix(s, "d") {
		// Replace "d" with "h" * 24
		days := strings.TrimSuffix(s, "d")
		d, err := time.ParseDuration(days + "h")
		if err != nil {
			return 0, err
		}
		return d * 24, nil
	}
	return time.ParseDuration(s)
}
  • Step 4: Run tests, expect all 8 PASS
cd ~/dev/AI/hyperguild/.worktrees/pass-rate-endpoint/ingestion
go test ./internal/api/... -run TestPassRate -v 2>&1 | tail -25
# Expected: 8/8 PASS
  • Step 5: Run task check
cd ~/dev/AI/hyperguild/.worktrees/pass-rate-endpoint
task check 2>&1 | tail -15
  • Step 6: Commit
cd ~/dev/AI/hyperguild/.worktrees/pass-rate-endpoint
git add ingestion/internal/api/passrate.go ingestion/internal/api/passrate_test.go
git commit -m "feat(brain): /pass-rate aggregator and handler

Adds a new HTTP GET handler at the ingestion pod that walks
brain/sessions/*.jsonl, filters by skill name and timestamp window
(default 7d, accepts Nh and Nd), normalizes legacy status vocabulary
(ok→pass, error→fail, skipped→skip), and returns aggregated counts
plus pass_rate.

Pass rate is null when pass+fail == 0, distinguishing 'no data' from
'always passes'. Plan 6 routing pod will check for null before
making decisions.

Route registration in cmd/server/main.go lands in a follow-up commit."

Task 3: Wire /pass-rate into the route table

Worktree: ~/dev/AI/hyperguild/.worktrees/pass-rate-endpoint/

Register the handler at GET /pass-rate in the ingestion server's mux. Note the verb is GET (consistent with REST semantics for a pure read), unlike the other brain HTTP REST endpoints which are POST.

Files:

  • Modify: ingestion/cmd/server/main.go

  • Step 1: Read the existing route table

grep -n "mux.HandleFunc" ~/dev/AI/hyperguild/.worktrees/pass-rate-endpoint/ingestion/cmd/server/main.go
# Should list the existing POST routes (/query, /write, etc.)
  • Step 2: Add the route

In ingestion/cmd/server/main.go, find the block of mux.HandleFunc(...) calls. After the last existing one (likely mux.HandleFunc("POST /backfill-refs", h.BackfillRefs)), add:

	mux.HandleFunc("GET /pass-rate", h.PassRate)
  • Step 3: Build and check the route is reachable
cd ~/dev/AI/hyperguild/.worktrees/pass-rate-endpoint
go build ./ingestion/cmd/server
# Expected: build succeeds, no errors

# A route-reachability check: spin up the server pointing at a tmp brain dir
# and curl the endpoint.
TMPBRAIN=$(mktemp -d)
mkdir "$TMPBRAIN/sessions"
BRAIN_DIR="$TMPBRAIN" go run ./ingestion/cmd/server &
SERVER_PID=$!
sleep 1
curl -sS "http://localhost:3300/pass-rate?skill=tdd&window=7d" | head -3
# Expected: {"skill":"tdd","window":"7d","pass":0,"fail":0,"skip":0,"total":0,"pass_rate":null}
kill $SERVER_PID
rm -rf "$TMPBRAIN"
# Note: server may bind to a different port if BRAIN_DIR isn't enough config.
# If this step fails due to env var or port issues, fall back to confirming
# the route compiles by `go build ./ingestion/...` and rely on Task 4's
# integration test.
  • Step 4: Run task check
task check 2>&1 | tail -15
  • Step 5: Commit
git add ingestion/cmd/server/main.go
git commit -m "feat(brain): register GET /pass-rate route

Adds the route entry alongside the existing POST routes. Note: this
is the brain HTTP REST API's first GET endpoint — it follows REST
semantics for pure reads, while the legacy POST routes (query, write,
ingest, etc.) all take JSON bodies. Future read endpoints SHOULD use
GET; future write endpoints continue with POST."

Task 4: CLI subcommand hyperguild brain pass-rate

Worktree: ~/dev/AI/hyperguild/.worktrees/pass-rate-endpoint/

Add a third nested verb under brain in the hyperguild CLI. New brainClient.PassRate method calls the endpoint. New runBrainPassRate subcommand wires it into the dispatcher.

Files:

  • Modify: cmd/hyperguild/http.go (add PassRate method)

  • Modify: cmd/hyperguild/http_test.go (tests for PassRate)

  • Modify: cmd/hyperguild/brain.go (add runBrainPassRate, wire into runBrain switch)

  • Modify: cmd/hyperguild/brain_test.go (tests for runBrainPassRate)

  • Step 1: Write the failing tests

Append to cmd/hyperguild/http_test.go:

func TestBrainClient_PassRate_Success(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		assert.Equal(t, http.MethodGet, r.Method)
		assert.Equal(t, "/pass-rate", r.URL.Path)
		assert.Equal(t, "tdd", r.URL.Query().Get("skill"))
		assert.Equal(t, "7d", r.URL.Query().Get("window"))
		w.Header().Set("Content-Type", "application/json")
		_, _ = w.Write([]byte(`{"skill":"tdd","window":"7d","pass":47,"fail":3,"skip":0,"total":50,"pass_rate":0.94}`))
	}))
	defer srv.Close()

	c := &brainClient{baseURL: srv.URL, http: srv.Client()}
	res, err := c.PassRate(context.Background(), "tdd", "7d")
	require.NoError(t, err)
	assert.Equal(t, "tdd", res.Skill)
	assert.Equal(t, 47, res.Pass)
	assert.Equal(t, 3, res.Fail)
	require.NotNil(t, res.PassRate)
	assert.InDelta(t, 0.94, *res.PassRate, 0.001)
}

func TestBrainClient_PassRate_NullRate(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_, _ = w.Write([]byte(`{"skill":"tdd","window":"7d","pass":0,"fail":0,"skip":0,"total":0,"pass_rate":null}`))
	}))
	defer srv.Close()

	c := &brainClient{baseURL: srv.URL, http: srv.Client()}
	res, err := c.PassRate(context.Background(), "tdd", "7d")
	require.NoError(t, err)
	assert.Nil(t, res.PassRate)
}

Append to cmd/hyperguild/brain_test.go:

func TestRunBrainPassRate_Human(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_, _ = w.Write([]byte(`{"skill":"tdd","window":"7d","pass":47,"fail":3,"skip":0,"total":50,"pass_rate":0.94}`))
	}))
	defer srv.Close()
	t.Setenv("BRAIN_URL", srv.URL)

	var out, errBuf bytes.Buffer
	err := runBrain(context.Background(), []string{"pass-rate", "tdd"}, strings.NewReader(""), &out, &errBuf)
	require.NoError(t, err)
	got := out.String()
	assert.Contains(t, got, "tdd")
	assert.Contains(t, got, "47 / 50")
	assert.Contains(t, got, "94%")
}

func TestRunBrainPassRate_NoData(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_, _ = w.Write([]byte(`{"skill":"tdd","window":"7d","pass":0,"fail":0,"skip":0,"total":0,"pass_rate":null}`))
	}))
	defer srv.Close()
	t.Setenv("BRAIN_URL", srv.URL)

	var out, errBuf bytes.Buffer
	err := runBrain(context.Background(), []string{"pass-rate", "tdd"}, strings.NewReader(""), &out, &errBuf)
	require.NoError(t, err)
	assert.Contains(t, out.String(), "no data")
}

func TestRunBrainPassRate_JSON(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_, _ = w.Write([]byte(`{"skill":"tdd","window":"7d","pass":47,"fail":3,"skip":0,"total":50,"pass_rate":0.94}`))
	}))
	defer srv.Close()
	t.Setenv("BRAIN_URL", srv.URL)

	var out, errBuf bytes.Buffer
	err := runBrain(context.Background(), []string{"pass-rate", "--json", "tdd"}, strings.NewReader(""), &out, &errBuf)
	require.NoError(t, err)
	assert.Contains(t, out.String(), `"pass_rate": 0.94`)
}

func TestRunBrainPassRate_MissingSkill(t *testing.T) {
	var out, errBuf bytes.Buffer
	err := runBrain(context.Background(), []string{"pass-rate"}, strings.NewReader(""), &out, &errBuf)
	assert.Error(t, err)
	assert.Contains(t, err.Error(), "skill required")
}

func TestRunBrainPassRate_WindowFlag(t *testing.T) {
	gotWindow := ""
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		gotWindow = r.URL.Query().Get("window")
		_, _ = w.Write([]byte(`{"skill":"tdd","window":"30d","pass":0,"fail":0,"skip":0,"total":0,"pass_rate":null}`))
	}))
	defer srv.Close()
	t.Setenv("BRAIN_URL", srv.URL)

	var out, errBuf bytes.Buffer
	err := runBrain(context.Background(), []string{"pass-rate", "--window", "30d", "tdd"}, strings.NewReader(""), &out, &errBuf)
	require.NoError(t, err)
	assert.Equal(t, "30d", gotWindow)
}
  • Step 2: Run tests, expect failure (undefined symbols)
cd ~/dev/AI/hyperguild/.worktrees/pass-rate-endpoint
go test ./cmd/hyperguild/... -run "TestBrainClient_PassRate|TestRunBrainPassRate" 2>&1 | tail -10
# Expected: build failure — `PassRate`, `runBrainPassRate` undefined
  • Step 3: Add the brainClient method

In cmd/hyperguild/http.go, append below the existing Write method:

// PassRateResult mirrors the /pass-rate response envelope.
type PassRateResult struct {
	Skill    string   `json:"skill"`
	Window   string   `json:"window"`
	Pass     int      `json:"pass"`
	Fail     int      `json:"fail"`
	Skip     int      `json:"skip"`
	Total    int      `json:"total"`
	PassRate *float64 `json:"pass_rate"`
}

func (c *brainClient) PassRate(ctx context.Context, skill, window string) (*PassRateResult, error) {
	q := url.Values{}
	q.Set("skill", skill)
	q.Set("window", window)
	u := c.baseURL + "/pass-rate?" + q.Encode()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	resp, err := c.http.Do(req)
	if err != nil {
		return nil, fmt.Errorf("brain GET /pass-rate: %w", err)
	}
	defer func() { _ = resp.Body.Close() }()
	if resp.StatusCode != http.StatusOK {
		body, _ := io.ReadAll(resp.Body)
		return nil, fmt.Errorf("brain GET /pass-rate: status %d: %s", resp.StatusCode, string(body))
	}
	var out PassRateResult
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, fmt.Errorf("decode /pass-rate response: %w", err)
	}
	return &out, nil
}

The net/url and strconv imports were removed in Plan 4's GET→POST fix; net/url needs to come back for this method. Add it to the imports.

  • Step 4: Add runBrainPassRate and wire into the dispatcher

In cmd/hyperguild/brain.go, update the runBrain switch to add a pass-rate case:

func runBrain(ctx context.Context, args []string, stdin io.Reader, stdout, stderr io.Writer) error {
	if len(args) == 0 {
		return errors.New("subcommand required (query|write|pass-rate)")
	}
	switch args[0] {
	case "query":
		return runBrainQuery(ctx, args[1:], stdin, stdout, stderr)
	case "write":
		return runBrainWrite(ctx, args[1:], stdin, stdout, stderr)
	case "pass-rate":
		return runBrainPassRate(ctx, args[1:], stdin, stdout, stderr)
	default:
		return fmt.Errorf("unknown subcommand: %s (expected query|write|pass-rate)", args[0])
	}
}

Then append runBrainPassRate:

func runBrainPassRate(ctx context.Context, args []string, _ io.Reader, stdout, stderr io.Writer) error {
	fs := flag.NewFlagSet("brain pass-rate", flag.ContinueOnError)
	fs.SetOutput(stderr)
	asJSON := fs.Bool("json", false, "output JSON instead of human-readable")
	window := fs.String("window", "7d", "lookback window (e.g. 1h, 24h, 7d, 30d)")
	if err := fs.Parse(args); err != nil {
		return fmt.Errorf("parse flags: %w", err)
	}
	if fs.NArg() < 1 {
		return errors.New("skill required")
	}
	skill := fs.Arg(0)

	res, err := newBrainClient().PassRate(ctx, skill, *window)
	if err != nil {
		return err
	}

	if *asJSON {
		enc := json.NewEncoder(stdout)
		enc.SetIndent("", "  ")
		return enc.Encode(res)
	}
	if res.PassRate == nil {
		fmt.Fprintf(stdout, "%s: no data (window: %s)\n", res.Skill, res.Window) //nolint:errcheck
		return nil
	}
	fmt.Fprintf(stdout, "%s: %d / %d = %.0f%% (window: %s)\n", res.Skill, res.Pass, res.Total, *res.PassRate*100, res.Window) //nolint:errcheck
	return nil
}
  • Step 5: Run tests, expect 5 + 2 + previous all PASS
go test ./cmd/hyperguild/... -v 2>&1 | tail -50
# Expected: previous tests still pass + 7 new tests pass
  • Step 6: Run task check
task check 2>&1 | tail -15
  • Step 7: Commit
git add cmd/hyperguild/http.go cmd/hyperguild/http_test.go cmd/hyperguild/brain.go cmd/hyperguild/brain_test.go
git commit -m "feat(hyperguild): brain pass-rate subcommand

Adds 'hyperguild brain pass-rate <skill> [--window 7d] [--json]'
calling the new /pass-rate endpoint. Human output:
  tdd: 47 / 50 = 94% (window: 7d)
or 'no data (window: 7d)' when pass_rate is null.

PassRateResult mirrors the response envelope; PassRate is *float64
so null is preserved across decode."

Task 5: Rollout — instrument 6 other binary-outcome SKILL.md files

Worktree: ~/Documents/local-dev-worktrees/pass-rate-instrumentation/ (local-dev repo)

Apply the same Logging subsection pattern from Task 1 to six more skills, each with its own phase set. Single commit covers all six (small, mechanical).

Files (all six, same pattern):

File Phases to log
.skills/code-review/SKILL.md One log call per phase the skill defines (typically phase-0 through phase-3 or similar — check the skill's existing structure)
.skills/debug/SKILL.md One log call at the end of each phase (hypothesize, instrument, verify, fix — or whatever the skill defines)
.skills/feature-spec/SKILL.md One log call per output phase (typically draft, review, approved)
.skills/session-retrospective/SKILL.md One log call at the end (surfaced with final_status reflecting whether candidates were emitted)
.skills/trainer/SKILL.md Two log calls: one at end of READ phase, one at end of WRITE phase
.skills/spec-driven-dev/SKILL.md One log call per output phase (typically per spec, decide, plan)

For each file:

  • Step 1: Open the file and locate "## Brain MCP Integration"
cd ~/Documents/local-dev-worktrees/pass-rate-instrumentation
grep -n "## Brain MCP Integration" .skills/code-review/SKILL.md .skills/debug/SKILL.md .skills/feature-spec/SKILL.md .skills/session-retrospective/SKILL.md .skills/trainer/SKILL.md .skills/spec-driven-dev/SKILL.md
  • Step 2: Append the Logging subsection to each

The template (substitute <skill-name> and the per-skill phase set):

### Logging

Call `session_log` once at the end of every phase to record the outcome.
Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which
treats `pass` as success, `fail` as failure, `skip` as neither.

**At end of each phase:**
- `session_log` with `{skill: "<skill-name>", phase: "<phase-name>", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`

**Phases for this skill:** [list the phases the skill defines, e.g. `red`, `green`, `refactor` for tdd; `phase-0`, `phase-1`, `phase-2`, `phase-3` for code-review; etc.]

**Status semantics:**
- `pass` — the phase's intended outcome was reached.
- `fail` — the phase's intended outcome was NOT reached.
- `skip` — phase was skipped intentionally.

**Why this matters:** the routing pod (Plan 6) reads pass-rate to decide whether to route a future call to a local model. If your skill never logs, the routing pod sees no data.

For each of the six SKILL.md files, insert this subsection at the end of the existing "## Brain MCP Integration" section (immediately before the next ## header). Substitute <skill-name> with the skill's actual name. Read each skill's existing Process or Phases section to know what phases to list.

  • Step 3: Verify all six were updated
for f in code-review debug feature-spec session-retrospective trainer spec-driven-dev; do
  echo "=== $f ==="
  grep -c "session_log" .skills/$f/SKILL.md
done
# Each count should be ≥ 2 (the call site + the introductory mention).
  • Step 4: Commit
git add .skills/code-review/SKILL.md .skills/debug/SKILL.md .skills/feature-spec/SKILL.md .skills/session-retrospective/SKILL.md .skills/trainer/SKILL.md .skills/spec-driven-dev/SKILL.md
git commit -m "feat(skills): instrument 6 binary-outcome skills with session_log — Plan 5 rollout

Following the tdd pilot, applies the Logging subsection template to:

  - code-review
  - debug
  - feature-spec
  - session-retrospective
  - trainer
  - spec-driven-dev

Each skill's Brain MCP Integration section gains a literal session_log
call template with the skill's specific phase set and pass/fail/skip
semantics. Skills with no clear binary outcome (clean-code,
cognitive-load, solid, refactoring, test-design, problem-analysis,
user-stories, planning, atdd, gitea-ci) are intentionally not
instrumented — they're reading or process-shaping content, not
discrete actions."

Task 6: Documentation updates

Worktree: ~/dev/AI/hyperguild/.worktrees/pass-rate-endpoint/

Update cmd/hyperguild/README.md to document the new pass-rate subcommand. Update CLAUDE.md to mention the /pass-rate endpoint alongside the other brain HTTP REST endpoints.

Files:

  • Modify: cmd/hyperguild/README.md (add pass-rate subcommand block)

  • Modify: CLAUDE.md (mention /pass-rate in the brain HTTP REST API section)

  • Modify: internal/skills/sessionlog/skill.go (or wherever the session_log MCP tool is registered) — update the final_status description from ok | error | skipped to pass | fail | skip (legacy: ok | error | skipped accepted) so the agent-facing tool definition matches the new SKILL.md vocabulary

  • Step 1: Update the CLI README

Find the existing ### hyperguild brain write <type> <slug> section in cmd/hyperguild/README.md. After it (before the next ### heading), insert:

### `hyperguild brain pass-rate <skill>`

Returns the pass rate for a skill over a lookback window. Computed
on-demand from `brain/sessions/*.jsonl`.

```bash
$ hyperguild brain pass-rate tdd
tdd: 47 / 50 = 94% (window: 7d)

$ hyperguild brain pass-rate tdd --window 30d --json
{
  "skill": "tdd",
  "window": "30d",
  "pass": 142,
  "fail": 8,
  "skip": 5,
  "total": 155,
  "pass_rate": 0.9467
}

Flags:

  • --window — lookback window (default 7d; accepts Nh, Nd)
  • --json — emit the raw response envelope

Skills with no logged invocations return zero counts and pass_rate: null (indicating "no data", distinct from "always passes").


- [ ] **Step 1.5: Update the `session_log` MCP tool's `final_status` description**

Find the `session_log` tool registration in the supervisor (`internal/skills/sessionlog/`). Update the `final_status` field's description string to read:

pass | fail | skip (legacy: ok | error | skipped also accepted)


This keeps the agent-facing schema honest — Plan 5's SKILL.md contract uses the new vocabulary. The aggregator normalizes both for backward compat, but the tool's docstring should lead with the new vocabulary.

```bash
grep -rn "ok | error | skipped\|final_status.*description" ~/dev/AI/hyperguild/.worktrees/pass-rate-endpoint/internal/skills/sessionlog/ ~/dev/AI/hyperguild/.worktrees/pass-rate-endpoint/ingestion/internal/skills/sessionlog/ 2>/dev/null | head
# Locate the registration; update the description string verbatim.
  • Step 2: Update CLAUDE.md

Find the section in CLAUDE.md that documents the brain HTTP REST API (look for ## MCP endpoints or similar). After the existing endpoint list, add a note about /pass-rate:

The brain HTTP REST API also serves a read-only `GET /pass-rate?skill=X&window=Y`
endpoint that aggregates `final_status` counts from session logs and returns
`{skill, window, pass, fail, skip, total, pass_rate}`. Plan 6 (routing pod)
reads this to decide whether to route skill calls to local models. Pass rate
is `null` when no logged invocations are in the window.
  • Step 3: Run task check
task check 2>&1 | tail -15
  • Step 4: Commit
git add cmd/hyperguild/README.md CLAUDE.md
git commit -m "docs(hyperguild): document brain pass-rate subcommand and /pass-rate endpoint

Adds pass-rate to the CLI README's subcommand block. Updates CLAUDE.md
to note the new /pass-rate endpoint alongside the existing brain
HTTP REST API surface."

Task 7: Push both branches

Both worktrees. Push the local-dev instrumentation branch first (Phase 1), then the hyperguild endpoint+CLI branch (Phase 2).

  • Step 1: Verify both worktrees are clean
cd ~/Documents/local-dev-worktrees/pass-rate-instrumentation
git status
git log --oneline origin/main..HEAD
# Expected: 2 commits (Task 1 pilot + Task 5 rollout), working tree clean

cd ~/dev/AI/hyperguild/.worktrees/pass-rate-endpoint
git status
git log --oneline origin/main..HEAD
# Expected: 5 commits (Task 2-4, 6), working tree clean
  • Step 2: Push both
cd ~/Documents/local-dev-worktrees/pass-rate-instrumentation
git push -u origin HEAD
# Note the gitea PR URL.

cd ~/dev/AI/hyperguild/.worktrees/pass-rate-endpoint
git push -u origin HEAD
# Note the gitea PR URL.
  • Step 3: Surface the two PR URLs to the controller

Stop after both pushes. The controller decides merge timing — both branches are independent and can merge in either order, though Phase 1 (instrumentation) should land first so Phase 2's smoke test against a fresh tdd invocation has a chance of producing real data.


Self-review

1. Spec coverage:

Spec success criterion Implementing task
tdd instrumented (pilot) Task 1
6 other skills instrumented Task 5
GET /pass-rate?skill=X&window=Y returns aggregate JSON Task 2
Aggregator normalizes pass/ok, fail/error, skip/skipped Task 2 (normalizeStatus)
Endpoint returns null pass_rate when pass+fail==0 Task 2 (TestPassRate_NoData_ReturnsZerosAndNullRate)
Optional hyperguild brain pass-rate CLI subcommand Task 4
task check passes per task and on merged branches Step in every task
Real data for tdd 1 week post-merge Manual verification (out-of-band)

2. Placeholder scan: No "TBD"/"TODO"/"appropriate"/"similar to Task N". Each rollout SKILL.md has a concrete template; the per-skill phase set is the only thing the implementer fills in (not a placeholder — it requires reading each skill's existing structure).

3. Type/name consistency:

  • passRateResponse (lowercase) on the brain side; PassRateResult (uppercase, exported) on the CLI side. Different packages, different conventions. Field names match: Skill, Window, Pass, Fail, Skip, Total, PassRate. JSON tags identical.
  • normalizeStatus, parseWindow defined in Task 2, used unchanged.
  • runBrainPassRate, PassRate named consistently in Tasks 2/4.
  • BRAIN_URL env var consumed by newBrainClient() (existing) — no new env vars.

Risks (plan-level)

Risk Mitigation
Phase 1 worktree path conflicts with prior cross-tool-skill-wiring worktree (both under local-dev-worktrees/) Different branch name; the cross-tool-skill-wiring worktree was removed after Plan 3 merge. New worktree is created fresh.
Implementer subagent doesn't have brain MCP access (can't dogfood the new tdd contract) DONE_WITH_CONCERNS path documented in Task 2; controller validates manually post-merge.
Phase 2's smoke test (Task 3 Step 3) requires running the ingestion server locally, which may need additional config Falls back to relying on Task 4's integration test (which uses httptest.Server + the real Handler).
final_status in legacy entry uses pass (not the documented ok) — confirmed during spec phase Aggregator handles both; no migration.
Plan 6 may want per-model or per-mode aggregation that this endpoint doesn't provide Endpoint shape is small and additive; Plan 6 can extend with new query params (?model=X) without breaking existing callers.
6-skill rollout in a single commit could blur per-skill mistakes Acceptable — the rollout is mechanical (same template, six substitutions). If an individual skill has a structural quirk that doesn't fit the template, surface as DONE_WITH_CONCERNS and fix in a follow-up.