17 Commits

Author SHA1 Message Date
Mathias Bergqvist
bee4bb3c1f chore(routing): pre-merge cleanup — Plan 7 reminders, code_review→review, operator note
All checks were successful
CI / Lint / Test / Vet (push) Successful in 11s
CI / Mirror to GitHub (push) Successful in 4s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 23:22:15 +02:00
Mathias Bergqvist
d72454d929 docs(routing): document Mode 2 routing pod + env vars
Add routing pod to README architecture diagram and env vars table.
Add routing MCP endpoint to .context/PROJECT.md. Regenerate derived
context adapters via task context:sync.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 23:00:48 +02:00
Mathias Bergqvist
cf94d14922 chore(routing): drop unused BIN_PID assignment in smoke script 2026-05-05 22:56:44 +02:00
Mathias Bergqvist
78a43d6a42 test(routing): live-contract smoke target
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 22:52:23 +02:00
Mathias Bergqvist
ca933eef46 build(routing): Dockerfile + CD workflow
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 07:19:18 +02:00
Mathias Bergqvist
88782de07c feat(hyperguild): mode client-local writes routing headers
Plan 6 is now deployed; replace the _routing_pending placeholder in the
routing MCP entry with a real headers block carrying X-Hyperguild-Mode:
client-local. The pod treats absent or unknown values as client-local,
so this is forward-compat for future modes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 07:13:24 +02:00
Mathias Bergqvist
083c2d7db9 feat(routing): cmd/routing binary
Wires Config → LiteLLMExecutor → Router → four skills (review, debug,
retrospective, trainer) → Registry → MCP server with bearer auth and
/healthz. Each skill's CompleteFunc is wrapped so the Router decides
local-vs-Claude per call and logs every decision to the brain /mcp.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 23:43:59 +02:00
Mathias Bergqvist
751f410ca6 test(routing): pin tool-schema parity with supervisor
Captures the four routed skills' (review, debug, retrospective, trainer)
tool definitions as a JSON snapshot and asserts the routing pod's registry
advertises byte-equal schemas. A deliberate schema change fails this test,
requiring an intentional snapshot update in lockstep with consumers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 22:59:06 +02:00
Mathias Bergqvist
3a99d5e20e refactor(routing): surface logger errors via slog.Warn
Replace silent `_ = r.Logger.LogDecision(...)` discards with an
if-err check that emits slog.Warn on failure. A brain outage now
produces a visible warn line instead of swallowing the telemetry
error entirely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 22:55:35 +02:00
Mathias Bergqvist
9a258ca32a feat(routing): router dispatch wrapper
Composes Fetcher + Policy + Logger + CompleteFunc into a single Run method.
Falls open to Claude on local-model errors; defaults to local when brain is
unreachable. Skill packages will receive Router.Run as their CompleteFunc.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 22:51:01 +02:00
Mathias Bergqvist
2a5a74f7c0 feat(routing): decision logger via brain MCP session_log
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 15:52:09 +02:00
Mathias Bergqvist
d40a5ac890 test(routing): cover TTL expiry in fetcher
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 15:50:01 +02:00
Mathias Bergqvist
b77820534a feat(routing): pass-rate fetcher with TTL cache
HTTP client that calls GET /pass-rate?skill=X&window=Y on the brain pod.
Caches *float64 results (including nil) per-skill for the configured TTL
(default 60s). On non-200 or network error returns (nil, err) so the
upstream router can fall through to default-to-local.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 15:46:11 +02:00
Mathias Bergqvist
db64ecb1d9 feat(routing): canonical request hash
SHA-256 of (system, user) joined with 0x00 separator, truncated to
uint64. Drives deterministic sample-band routing: identical prompt pair
→ same hash → same local-vs-Claude decision on every call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 15:41:42 +02:00
Mathias Bergqvist
ea29e5ebb8 feat(routing): decision policy
Pure-function Policy{Floor,Ceil} with Decide(*float64, uint64) Decision.
Rules in priority order: nil → local; ≥floor → local; <ceil → claude;
sample band → low bit of requestHash.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 15:36:59 +02:00
Mathias Bergqvist
ccf080db59 refactor(routing): clarify Floor/Ceil semantics + extend test coverage
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 15:34:22 +02:00
Mathias Bergqvist
69c038478b feat(routing): RoutingConfig + LoadRouting
Typed config struct and env parser for the routing pod. Kept separate
from the supervisor Config to avoid forcing routing fields onto the
supervisor and vice versa. Uses the existing envOr helper from config.go.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 15:25:31 +02:00
31 changed files with 1526 additions and 10 deletions

View File

@@ -227,6 +227,12 @@ Two MCP servers expose this project's tooling, both reachable over Tailscale:
`tdd_green`, `tdd_refactor`, `review`, `debug`, `spec`, `retrospective`,
`trainer`, `tier`). Will shrink as skill workers move to SKILL.md in a later
migration.
- **`routing`** at `http://koala:30310/mcp` — Mode 2 routing pod. Advertises
the same four cost-routable skills as the supervisor (`review`, `debug`,
`retrospective`, `trainer`) but per-call decides whether to use a local
model or Claude based on the brain's `/pass-rate` response. Bearer auth
via `ROUTING_MCP_TOKEN` (opt-in). Only `mode client-local` registers this
endpoint; Mode 1 and Mode 3 do not.
The brain HTTP REST API (`/query`, `/write`, `/ingest`, `/ingest-raw`,
`/ingest-path`, `/backfill-refs`) remains available on the same port (3300) for

View File

@@ -56,6 +56,12 @@ Two MCP servers expose this project's tooling, both reachable over Tailscale:
`tdd_green`, `tdd_refactor`, `review`, `debug`, `spec`, `retrospective`,
`trainer`, `tier`). Will shrink as skill workers move to SKILL.md in a later
migration.
- **`routing`** at `http://koala:30310/mcp` — Mode 2 routing pod. Advertises
the same four cost-routable skills as the supervisor (`review`, `debug`,
`retrospective`, `trainer`) but per-call decides whether to use a local
model or Claude based on the brain's `/pass-rate` response. Bearer auth
via `ROUTING_MCP_TOKEN` (opt-in). Only `mode client-local` registers this
endpoint; Mode 1 and Mode 3 do not.
The brain HTTP REST API (`/query`, `/write`, `/ingest`, `/ingest-raw`,
`/ingest-path`, `/backfill-refs`) remains available on the same port (3300) for

View File

@@ -232,6 +232,12 @@ Two MCP servers expose this project's tooling, both reachable over Tailscale:
`tdd_green`, `tdd_refactor`, `review`, `debug`, `spec`, `retrospective`,
`trainer`, `tier`). Will shrink as skill workers move to SKILL.md in a later
migration.
- **`routing`** at `http://koala:30310/mcp` — Mode 2 routing pod. Advertises
the same four cost-routable skills as the supervisor (`review`, `debug`,
`retrospective`, `trainer`) but per-call decides whether to use a local
model or Claude based on the brain's `/pass-rate` response. Bearer auth
via `ROUTING_MCP_TOKEN` (opt-in). Only `mode client-local` registers this
endpoint; Mode 1 and Mode 3 do not.
The brain HTTP REST API (`/query`, `/write`, `/ingest`, `/ingest-raw`,
`/ingest-path`, `/backfill-refs`) remains available on the same port (3300) for

View File

@@ -230,6 +230,12 @@ Two MCP servers expose this project's tooling, both reachable over Tailscale:
`tdd_green`, `tdd_refactor`, `review`, `debug`, `spec`, `retrospective`,
`trainer`, `tier`). Will shrink as skill workers move to SKILL.md in a later
migration.
- **`routing`** at `http://koala:30310/mcp` — Mode 2 routing pod. Advertises
the same four cost-routable skills as the supervisor (`review`, `debug`,
`retrospective`, `trainer`) but per-call decides whether to use a local
model or Claude based on the brain's `/pass-rate` response. Bearer auth
via `ROUTING_MCP_TOKEN` (opt-in). Only `mode client-local` registers this
endpoint; Mode 1 and Mode 3 do not.
The brain HTTP REST API (`/query`, `/write`, `/ingest`, `/ingest-raw`,
`/ingest-path`, `/backfill-refs`) remains available on the same port (3300) for

View File

@@ -15,6 +15,7 @@ jobs:
SERVICE: supervisor
IMAGE: gitea.d-ma.be/mathias/supervisor
INGESTION_IMAGE: gitea.d-ma.be/mathias/ingestion
ROUTING_IMAGE: gitea.d-ma.be/mathias/routing
INFRA_REPO: git@gitea.d-ma.be:mathias/infra.git
BUILDKIT_HOST: unix:///run/buildkit/buildkitd.sock
steps:
@@ -62,6 +63,28 @@ jobs:
echo "Built and pushed ${INGESTION_IMAGE}:${IMAGE_TAG}"
- name: Build and push routing image
run: |
set -e
trap 'rm -f /tmp/routing-image.tar' EXIT
IMAGE_TAG="${{ github.sha }}"
echo "Building ${ROUTING_IMAGE}:${IMAGE_TAG}"
buildctl --addr "${BUILDKIT_HOST}" build \
--frontend dockerfile.v0 \
--local context=. \
--local dockerfile=. \
--opt filename=Dockerfile.routing \
--opt build-arg:VERSION="${IMAGE_TAG}" \
--output type=oci,dest=/tmp/routing-image.tar
skopeo copy \
oci-archive:/tmp/routing-image.tar \
docker://${ROUTING_IMAGE}:${IMAGE_TAG} \
--dest-creds "${{ secrets.REGISTRY_CREDS }}"
echo "Built and pushed ${ROUTING_IMAGE}:${IMAGE_TAG}"
- name: Update infra repo
run: |
set -e
@@ -83,10 +106,15 @@ jobs:
sed -i "s|gitea.d-ma.be/mathias/ingestion:.*|gitea.d-ma.be/mathias/ingestion:${IMAGE_TAG}|" \
"k3s/apps/${SERVICE}/ingestion-deployment.yaml"
sed -i "s|gitea.d-ma.be/mathias/routing:.*|gitea.d-ma.be/mathias/routing:${IMAGE_TAG}|" \
"k3s/apps/routing/deployment.yaml"
git config user.email "cd-bot@d-ma.be"
git config user.name "CD Bot"
git add "k3s/apps/${SERVICE}/deployment.yaml" "k3s/apps/${SERVICE}/ingestion-deployment.yaml"
git commit -m "chore(deploy): ${SERVICE}+ingestion → ${IMAGE_TAG}"
git add "k3s/apps/${SERVICE}/deployment.yaml" \
"k3s/apps/${SERVICE}/ingestion-deployment.yaml" \
"k3s/apps/routing/deployment.yaml"
git commit -m "chore(deploy): supervisor+ingestion+routing → ${IMAGE_TAG}"
GIT_SSH_COMMAND="ssh -i ~/.ssh/infra_deploy_key -o IdentitiesOnly=yes" \
git push

View File

@@ -227,6 +227,12 @@ Two MCP servers expose this project's tooling, both reachable over Tailscale:
`tdd_green`, `tdd_refactor`, `review`, `debug`, `spec`, `retrospective`,
`trainer`, `tier`). Will shrink as skill workers move to SKILL.md in a later
migration.
- **`routing`** at `http://koala:30310/mcp` — Mode 2 routing pod. Advertises
the same four cost-routable skills as the supervisor (`review`, `debug`,
`retrospective`, `trainer`) but per-call decides whether to use a local
model or Claude based on the brain's `/pass-rate` response. Bearer auth
via `ROUTING_MCP_TOKEN` (opt-in). Only `mode client-local` registers this
endpoint; Mode 1 and Mode 3 do not.
The brain HTTP REST API (`/query`, `/write`, `/ingest`, `/ingest-raw`,
`/ingest-path`, `/backfill-refs`) remains available on the same port (3300) for

View File

@@ -56,6 +56,12 @@ Two MCP servers expose this project's tooling, both reachable over Tailscale:
`tdd_green`, `tdd_refactor`, `review`, `debug`, `spec`, `retrospective`,
`trainer`, `tier`). Will shrink as skill workers move to SKILL.md in a later
migration.
- **`routing`** at `http://koala:30310/mcp` — Mode 2 routing pod. Advertises
the same four cost-routable skills as the supervisor (`review`, `debug`,
`retrospective`, `trainer`) but per-call decides whether to use a local
model or Claude based on the brain's `/pass-rate` response. Bearer auth
via `ROUTING_MCP_TOKEN` (opt-in). Only `mode client-local` registers this
endpoint; Mode 1 and Mode 3 do not.
The brain HTTP REST API (`/query`, `/write`, `/ingest`, `/ingest-raw`,
`/ingest-path`, `/backfill-refs`) remains available on the same port (3300) for

View File

@@ -67,6 +67,31 @@ Record *why* things are the way they are. Future-you will thank present-you.
---
## Plan 6: routing pod reuses internal/skills/{review,debug,retrospective,trainer}
Plan 6 (Mode 2 routing pod, 2026-05-04) introduces a second consumer of
the four cost-routable skill packages. The routing pod constructs each
skill via `<pkg>.New(Config{...})` and hands it `routing.Router.Run` as
the `CompleteFunc`. Plan 7 (supervisor retirement) MUST NOT delete the
four packages.
**Plan 7's allowed deletions:**
- `internal/skills/{tdd,spec,tier}/` (not consumed by the routing pod)
- `cmd/supervisor/` (binary)
- `Dockerfile` (supervisor's, at repo root — distinct from `Dockerfile.routing`)
- supervisor manifests in the infra repo
- NodePort `:30320`
**Plan 7's preserved code:**
- `internal/skills/{review,debug,retrospective,trainer}/`
- `internal/registry`
- `internal/mcp`
- `internal/exec/litellm.go`
- `internal/routing/` (entirely new in Plan 6)
- `cmd/routing/`
---
## 2026-04-08 — Mistral Vibe gets its own adapter
**Context**: Vibe doesn't read `AGENTS.md` — it uses `~/.vibe/prompts/` and `~/.vibe/agents/` with TOML config.

30
Dockerfile.routing Normal file
View File

@@ -0,0 +1,30 @@
# syntax=docker/dockerfile:1
# ── Build stage ───────────────────────────────────────────────────────────────
FROM golang:1.26-bookworm AS builder
ARG VERSION=dev
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
go build -trimpath -ldflags="-s -w -X main.version=${VERSION}" \
-o /out/routing ./cmd/routing
# ── Runtime stage ─────────────────────────────────────────────────────────────
FROM gcr.io/distroless/base-debian12
COPY --from=builder /out/routing /usr/local/bin/routing
COPY config/ /app/config/
ENV SUPERVISOR_CONFIG_DIR=/app/config/supervisor
ENV ROUTING_PORT=3210
EXPOSE 3210
USER 65532:65532
ENTRYPOINT ["/usr/local/bin/routing"]

View File

@@ -12,6 +12,7 @@ Your Claude Code session (in any project)
│ MCP over HTTP (Tailscale)
├──▶ supervisor :3200 (NodePort 30320 on koala) — skill workers: tdd, debug, spec, …
├──▶ routing :3210 (NodePort 30310 on koala) — Mode 2 only: review, debug, retrospective, trainer
└──▶ brain :3300 (NodePort 30330 on koala) — brain_query, brain_write, brain_ingest, session_log
└─ also serves the legacy REST endpoints (/query, /write, /ingest, …)
@@ -112,6 +113,16 @@ The supervisor probes connectivity at call time:
| `INGEST_BASE_URL` | `http://localhost:3300` | Supervisor → ingestion |
| `LITELLM_BASE_URL` | — | LiteLLM proxy for Tier 2 model routing |
| `SUPERVISOR_MCP_TOKEN` | — | Optional bearer token for the supervisor MCP HTTP endpoint; when empty, no auth is enforced |
| `ROUTING_PORT` | `3210` | Routing pod's listen port |
| `ROUTING_MCP_TOKEN` | — | Optional bearer token for the routing MCP HTTP endpoint |
| `BRAIN_URL` | `http://ingestion.supervisor:3300` | Routing pod → brain (in-cluster) |
| `HYPERGUILD_LOCAL_MODEL` | `qwen35` | Local model for routed-to-local skill calls |
| `HYPERGUILD_CLAUDE_MODEL` | `claude-sonnet-4-6` | Claude model for routed-to-Claude skill calls |
| `HYPERGUILD_ROUTE_LOCAL_FLOOR` | `0.90` | At/above pass rate, route to local |
| `HYPERGUILD_ROUTE_LOCAL_CEIL` | `0.70` | Below pass rate, route to Claude. Between CEIL and FLOOR is the sample band. |
| `HYPERGUILD_PASS_RATE_TTL_SECONDS` | `60` | Per-skill pass-rate cache TTL |
> **Operator note:** LiteLLM at `LITELLM_BASE_URL` must register both `HYPERGUILD_LOCAL_MODEL` and `HYPERGUILD_CLAUDE_MODEL` for routing to do useful work. If a model is missing, LiteLLM returns 4xx, the routing pod's local route fails, the fail-open retry on Claude likely also fails (since both are missing), and the only signal is `final_status: "fail"` on `_routing` entries in the brain.
## Phase 2 (planned)

View File

@@ -128,6 +128,11 @@ tasks:
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | jq .
smoke:routing:
desc: Boot the routing pod against live LiteLLM + brain and verify _routing logs land
cmds:
- bash scripts/smoke-routing.sh
# ── Git / Release ──────────────────────────────────────────────────────────
tag:

View File

@@ -115,9 +115,10 @@ Flags:
Modes:
- **cloud** — brain MCP only. Claude Code with no routing.
- **client-local** — brain + routing placeholder. The routing entry's
URL points at `koala:30310/mcp`; a `_routing_pending` field marks it
as awaiting Plan 6 of the hyperguild migration.
- **client-local** — brain + routing pod. The `routing` entry points at
`koala:30310/mcp` (the routing pod, deployed in Plan 6). The
`X-Hyperguild-Mode: client-local` header is forward-compat for future
modes; the pod treats absent or unknown values as `client-local`.
- **sovereign** — brain only, with a `_mode_note` explaining that this
mode primarily uses Crush + LiteLLM and the `.mcp.json` is a Claude
Code fallback for emergency offline use.

View File

@@ -78,9 +78,11 @@ func modeClientLocal(brainURL string) map[string]any {
"description": "Brain MCP — knowledge query, write, ingestion, session log",
},
"routing": map[string]any{
"url": "http://koala:30310/mcp",
"description": "Mode 2 routing pod — routes skill calls to LiteLLM/local",
"_routing_pending": "Plan 6 — routing pod not deployed yet; this URL is a placeholder",
"url": "http://koala:30310/mcp",
"description": "Mode 2 routing pod — routes skill calls to LiteLLM/local",
"headers": map[string]any{
"X-Hyperguild-Mode": "client-local",
},
},
},
}

View File

@@ -39,7 +39,7 @@ func TestRunMode_Cloud_Default(t *testing.T) {
assert.NotContains(t, got, "_mode_note")
}
func TestRunMode_ClientLocal_HasRoutingPlaceholder(t *testing.T) {
func TestRunMode_ClientLocal_HasRoutingEntry(t *testing.T) {
dir := t.TempDir()
outPath := filepath.Join(dir, ".mcp.json")
t.Setenv("BRAIN_URL", "http://koala:30330")
@@ -54,7 +54,32 @@ func TestRunMode_ClientLocal_HasRoutingPlaceholder(t *testing.T) {
require.Contains(t, servers, "routing")
routing := servers["routing"].(map[string]any)
assert.Contains(t, routing, "_routing_pending")
assert.NotContains(t, routing, "_routing_pending", "placeholder should be removed once Plan 6 ships")
headers, ok := routing["headers"].(map[string]any)
require.True(t, ok, "routing entry should have headers block")
assert.Equal(t, "client-local", headers["X-Hyperguild-Mode"])
}
func TestModeClientLocalHasRoutingHeader(t *testing.T) {
tmp := t.TempDir() + "/mcp.json"
out := &bytes.Buffer{}
stderr := &bytes.Buffer{}
require.NoError(t, runMode(context.Background(), []string{"client-local", "--out", tmp}, nil, out, stderr))
body, err := os.ReadFile(tmp)
require.NoError(t, err)
var doc map[string]any
require.NoError(t, json.Unmarshal(body, &doc))
servers := doc["mcpServers"].(map[string]any)
routing := servers["routing"].(map[string]any)
assert.Equal(t, "http://koala:30310/mcp", routing["url"])
assert.NotContains(t, routing, "_routing_pending", "placeholder should be removed once Plan 6 ships")
headers, ok := routing["headers"].(map[string]any)
require.True(t, ok, "routing entry should have headers block")
assert.Equal(t, "client-local", headers["X-Hyperguild-Mode"])
}
func TestRunMode_Sovereign_HasModeNote(t *testing.T) {

123
cmd/routing/main.go Normal file
View File

@@ -0,0 +1,123 @@
package main
// The internal/skills/{debug,retrospective,review,trainer} packages imported
// below are also imported by cmd/supervisor. Plan 7 (supervisor retirement)
// MUST NOT delete these four packages — the routing pod is their second
// consumer. Plan 7 deletes only internal/skills/{tdd,spec,tier} (the skills
// that don't route to local), the supervisor binary, and supervisor manifests.
// See docs/superpowers/specs/2026-05-04-mode-2-routing-pod-design.md (Constraints).
import (
"context"
"log/slog"
"net/http"
"os"
"time"
"github.com/mathiasbq/supervisor/internal/config"
iexec "github.com/mathiasbq/supervisor/internal/exec"
"github.com/mathiasbq/supervisor/internal/mcp"
"github.com/mathiasbq/supervisor/internal/registry"
"github.com/mathiasbq/supervisor/internal/routing"
"github.com/mathiasbq/supervisor/internal/skills/debug"
"github.com/mathiasbq/supervisor/internal/skills/retrospective"
"github.com/mathiasbq/supervisor/internal/skills/review"
"github.com/mathiasbq/supervisor/internal/skills/trainer"
)
func main() {
logger := slog.New(slog.NewTextHandler(os.Stderr, nil))
slog.SetDefault(logger)
cfg, err := config.LoadRouting()
if err != nil {
logger.Error("config load failed", "err", err)
os.Exit(1)
}
configDir := envOr("SUPERVISOR_CONFIG_DIR", "/app/config/supervisor")
mustRead := func(path string) string {
b, err := os.ReadFile(configDir + "/" + path)
if err != nil {
logger.Error("read prompt failed", "path", path, "err", err)
os.Exit(1)
}
return string(b)
}
llm := iexec.NewLiteLLM(cfg.LiteLLMBaseURL, cfg.LiteLLMAPIKey, 0)
router := &routing.Router{
Fetcher: routing.NewFetcher(cfg.BrainURL, "7d", time.Duration(cfg.PassRateTTLSeconds)*time.Second),
Logger: routing.NewLogger(cfg.BrainURL),
Policy: routing.Policy{Floor: cfg.RouteLocalFloor, Ceil: cfg.RouteLocalCeil},
LocalModel: cfg.LocalModel,
ClaudeModel: cfg.ClaudeModel,
Complete: llm.Complete,
}
// Skill packages call CompleteFunc(ctx, model, system, user) — no session_id
// or project_root in the signature. Rather than modifying every skill's API
// (and inflating Plan 6's blast radius), the routing pod logs every decision
// under a fixed session_id "_routing". Operators query
// `GET /pass-rate?skill=_routing&window=...` to inspect routing health.
const routingSessionID = "_routing"
wrap := func(skillName string) routing.CompleteFunc {
return func(ctx context.Context, _, system, user string) (string, int64, error) {
// The model param is ignored: the router picks the model based on policy.
return router.Run(ctx, routing.RunInput{
Skill: skillName,
System: system,
User: user,
SessionID: routingSessionID,
ProjectRoot: "",
})
}
}
reg := registry.New()
reg.Register(review.New(review.Config{
SkillPrompt: mustRead("review.md"),
DefaultModel: cfg.LocalModel,
CompleteFunc: review.CompleteFunc(wrap("review")),
}))
reg.Register(debug.New(debug.Config{
SkillPrompt: mustRead("debug.md"),
DefaultModel: cfg.LocalModel,
CompleteFunc: debug.CompleteFunc(wrap("debug")),
}))
reg.Register(retrospective.New(retrospective.Config{
SkillPrompt: mustRead("retrospective.md"),
DefaultModel: cfg.LocalModel,
CompleteFunc: retrospective.CompleteFunc(wrap("retrospective")),
}))
reg.Register(trainer.New(trainer.Config{
ReaderPrompt: mustRead("trainer-reader.md"),
WriterPrompt: mustRead("trainer-writer.md"),
DefaultModel: cfg.LocalModel,
CompleteFunc: trainer.CompleteFunc(wrap("trainer")),
}))
srv := mcp.NewServer(reg, cfg.MCPAuthToken)
mux := http.NewServeMux()
mux.Handle("/mcp", srv)
mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
w.WriteHeader(http.StatusOK)
})
addr := ":" + cfg.Port
logger.Info("routing pod starting", "addr", addr,
"local", cfg.LocalModel, "claude", cfg.ClaudeModel,
"floor", cfg.RouteLocalFloor, "ceil", cfg.RouteLocalCeil)
if err := http.ListenAndServe(addr, mux); err != nil { //nolint:gosec
logger.Error("server stopped", "err", err)
os.Exit(1)
}
}
func envOr(key, def string) string {
if v := os.Getenv(key); v != "" {
return v
}
return def
}

123
cmd/routing/main_test.go Normal file
View File

@@ -0,0 +1,123 @@
package main_test
import (
"context"
"encoding/json"
"io"
"net/http"
"net/http/httptest"
"os/exec"
"strings"
"testing"
"time"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// TestRoutingPodEndToEnd boots the binary against fake LiteLLM + brain servers,
// calls tools/list and one tools/call, and verifies the brain saw a session_log POST.
func TestRoutingPodEndToEnd(t *testing.T) {
if testing.Short() {
t.Skip("end-to-end binary boot")
}
var brainHits int
llm := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
_ = json.NewEncoder(w).Encode(map[string]any{
"choices": []map[string]any{{"message": map[string]any{"role": "assistant", "content": "stub"}}},
})
}))
defer llm.Close()
brain := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
switch r.URL.Path {
case "/pass-rate":
brainHits++
_ = json.NewEncoder(w).Encode(map[string]any{"pass_rate": 0.95})
case "/mcp":
brainHits++
_ = json.NewEncoder(w).Encode(map[string]any{"jsonrpc": "2.0", "id": 1, "result": map[string]any{}})
}
}))
defer brain.Close()
bin := buildRouting(t)
cmd := exec.Command(bin)
cmd.Env = append(cmd.Env,
"ROUTING_PORT=33310",
"LITELLM_BASE_URL="+llm.URL,
"LITELLM_API_KEY=stub",
"BRAIN_URL="+brain.URL,
"SUPERVISOR_CONFIG_DIR=../../config/supervisor",
"PATH="+osPath(),
)
require.NoError(t, cmd.Start())
t.Cleanup(func() { _ = cmd.Process.Kill() })
require.NoError(t, waitForPort(t, "127.0.0.1:33310", 5*time.Second))
resp := mcpCall(t, "http://127.0.0.1:33310/mcp", `{"jsonrpc":"2.0","id":1,"method":"tools/list"}`)
assert.Contains(t, resp, `"review"`)
assert.Contains(t, resp, `"debug"`)
assert.Contains(t, resp, `"retrospective"`)
assert.Contains(t, resp, `"trainer"`)
resp = mcpCall(t, "http://127.0.0.1:33310/mcp", `{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"review","arguments":{"project_root":"/tmp","files":["README.md"]}}}`)
_ = resp // shape varies by skill; we only need a 200
// Wait briefly for the async session_log to land.
deadline := time.Now().Add(2 * time.Second)
for time.Now().Before(deadline) && brainHits < 2 {
time.Sleep(50 * time.Millisecond)
}
assert.GreaterOrEqual(t, brainHits, 2, "expected at least one /pass-rate hit and one /mcp session_log hit")
}
func buildRouting(t *testing.T) string {
t.Helper()
bin := t.TempDir() + "/routing"
out, err := exec.Command("go", "build", "-o", bin, "github.com/mathiasbq/supervisor/cmd/routing").CombinedOutput()
require.NoError(t, err, "build failed: %s", out)
return bin
}
func waitForPort(_ *testing.T, addr string, dur time.Duration) error {
deadline := time.Now().Add(dur)
for time.Now().Before(deadline) {
c, err := http.Get("http://" + addr + "/healthz") //nolint:noctx
if err == nil {
_ = c.Body.Close()
return nil
}
conn, err := http.NewRequest(http.MethodPost, "http://"+addr+"/mcp", strings.NewReader(`{}`))
if err == nil {
r, err := http.DefaultClient.Do(conn)
if err == nil {
_ = r.Body.Close()
return nil
}
}
time.Sleep(50 * time.Millisecond)
}
return context.DeadlineExceeded
}
func mcpCall(t *testing.T, url, body string) string {
t.Helper()
r, err := http.Post(url, "application/json", strings.NewReader(body)) //nolint:noctx
require.NoError(t, err)
defer func() { _ = r.Body.Close() }()
raw, err := io.ReadAll(r.Body)
require.NoError(t, err)
return string(raw)
}
func osPath() string {
for _, e := range append([]string{}, exec.Command("env").Env...) {
if strings.HasPrefix(e, "PATH=") {
return strings.TrimPrefix(e, "PATH=")
}
}
return "/usr/bin:/bin"
}

View File

@@ -0,0 +1,84 @@
package config
import (
"fmt"
"os"
"strconv"
)
// RoutingConfig holds the runtime configuration for the routing pod.
// Separate from Config because the routing pod's surface differs from the supervisor's.
type RoutingConfig struct {
Port string // ROUTING_PORT, default 3210
MCPAuthToken string // ROUTING_MCP_TOKEN, optional bearer token
LiteLLMBaseURL string // LITELLM_BASE_URL, default http://piguard:4000
LiteLLMAPIKey string // LITELLM_API_KEY
BrainURL string // BRAIN_URL, default http://ingestion.supervisor:3300
LocalModel string // HYPERGUILD_LOCAL_MODEL, default qwen35
ClaudeModel string // HYPERGUILD_CLAUDE_MODEL, default claude-sonnet-4-6
// RouteLocalFloor and RouteLocalCeil intentionally invert the usual
// floor < ceil mathematical convention: Floor (default 0.90) is the
// UPPER boundary — at/above it, always route local; Ceil (default 0.70)
// is the LOWER boundary — below it, always route Claude. The band in
// between is the 50/50 sample zone. The naming follows the spec's policy
// vocabulary; see internal/routing/policy.go for the consumer.
RouteLocalFloor float64 // HYPERGUILD_ROUTE_LOCAL_FLOOR, default 0.90
RouteLocalCeil float64 // HYPERGUILD_ROUTE_LOCAL_CEIL, default 0.70
PassRateTTLSeconds int // HYPERGUILD_PASS_RATE_TTL_SECONDS, default 60
}
func LoadRouting() (RoutingConfig, error) {
cfg := RoutingConfig{
Port: envOr("ROUTING_PORT", "3210"),
MCPAuthToken: os.Getenv("ROUTING_MCP_TOKEN"),
LiteLLMBaseURL: envOr("LITELLM_BASE_URL", "http://piguard:4000"),
LiteLLMAPIKey: os.Getenv("LITELLM_API_KEY"),
BrainURL: envOr("BRAIN_URL", "http://ingestion.supervisor:3300"),
LocalModel: envOr("HYPERGUILD_LOCAL_MODEL", "qwen35"),
ClaudeModel: envOr("HYPERGUILD_CLAUDE_MODEL", "claude-sonnet-4-6"),
}
floor, err := parseFloatEnv("HYPERGUILD_ROUTE_LOCAL_FLOOR", 0.90)
if err != nil {
return RoutingConfig{}, err
}
cfg.RouteLocalFloor = floor
ceil, err := parseFloatEnv("HYPERGUILD_ROUTE_LOCAL_CEIL", 0.70)
if err != nil {
return RoutingConfig{}, err
}
cfg.RouteLocalCeil = ceil
ttl, err := parseIntEnv("HYPERGUILD_PASS_RATE_TTL_SECONDS", 60)
if err != nil {
return RoutingConfig{}, err
}
cfg.PassRateTTLSeconds = ttl
return cfg, nil
}
func parseFloatEnv(key string, def float64) (float64, error) {
v := os.Getenv(key)
if v == "" {
return def, nil
}
f, err := strconv.ParseFloat(v, 64)
if err != nil {
return 0, fmt.Errorf("config: %s: %w", key, err)
}
return f, nil
}
func parseIntEnv(key string, def int) (int, error) {
v := os.Getenv(key)
if v == "" {
return def, nil
}
n, err := strconv.Atoi(v)
if err != nil {
return 0, fmt.Errorf("config: %s: %w", key, err)
}
return n, nil
}

View File

@@ -0,0 +1,73 @@
package config_test
import (
"testing"
"github.com/mathiasbq/supervisor/internal/config"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestLoadRoutingDefaults(t *testing.T) {
for _, k := range []string{
"ROUTING_PORT", "ROUTING_MCP_TOKEN", "LITELLM_BASE_URL", "LITELLM_API_KEY",
"BRAIN_URL", "HYPERGUILD_LOCAL_MODEL", "HYPERGUILD_CLAUDE_MODEL",
"HYPERGUILD_ROUTE_LOCAL_FLOOR", "HYPERGUILD_ROUTE_LOCAL_CEIL",
"HYPERGUILD_PASS_RATE_TTL_SECONDS",
} {
t.Setenv(k, "")
}
cfg, err := config.LoadRouting()
require.NoError(t, err)
assert.Equal(t, "3210", cfg.Port)
assert.Equal(t, "", cfg.MCPAuthToken)
assert.Equal(t, "http://piguard:4000", cfg.LiteLLMBaseURL)
assert.Equal(t, "http://ingestion.supervisor:3300", cfg.BrainURL)
assert.Equal(t, "qwen35", cfg.LocalModel)
assert.Equal(t, "claude-sonnet-4-6", cfg.ClaudeModel)
assert.InDelta(t, 0.90, cfg.RouteLocalFloor, 1e-9)
assert.InDelta(t, 0.70, cfg.RouteLocalCeil, 1e-9)
assert.Equal(t, 60, cfg.PassRateTTLSeconds)
assert.Equal(t, "", cfg.LiteLLMAPIKey)
}
func TestLoadRoutingFromEnv(t *testing.T) {
t.Setenv("ROUTING_PORT", "3250")
t.Setenv("ROUTING_MCP_TOKEN", "tok-xyz")
t.Setenv("LITELLM_BASE_URL", "http://localhost:4000")
t.Setenv("LITELLM_API_KEY", "lk")
t.Setenv("BRAIN_URL", "http://localhost:3300")
t.Setenv("HYPERGUILD_LOCAL_MODEL", "qwen2-7b")
t.Setenv("HYPERGUILD_CLAUDE_MODEL", "claude-opus-4-7")
t.Setenv("HYPERGUILD_ROUTE_LOCAL_FLOOR", "0.85")
t.Setenv("HYPERGUILD_ROUTE_LOCAL_CEIL", "0.65")
t.Setenv("HYPERGUILD_PASS_RATE_TTL_SECONDS", "30")
cfg, err := config.LoadRouting()
require.NoError(t, err)
assert.Equal(t, "3250", cfg.Port)
assert.Equal(t, "tok-xyz", cfg.MCPAuthToken)
assert.Equal(t, "http://localhost:4000", cfg.LiteLLMBaseURL)
assert.Equal(t, "lk", cfg.LiteLLMAPIKey)
assert.Equal(t, "http://localhost:3300", cfg.BrainURL)
assert.Equal(t, "qwen2-7b", cfg.LocalModel)
assert.Equal(t, "claude-opus-4-7", cfg.ClaudeModel)
assert.InDelta(t, 0.85, cfg.RouteLocalFloor, 1e-9)
assert.InDelta(t, 0.65, cfg.RouteLocalCeil, 1e-9)
assert.Equal(t, 30, cfg.PassRateTTLSeconds)
}
func TestLoadRoutingRejectsBadFloat(t *testing.T) {
t.Setenv("HYPERGUILD_ROUTE_LOCAL_FLOOR", "not-a-number")
_, err := config.LoadRouting()
require.Error(t, err)
assert.Contains(t, err.Error(), "HYPERGUILD_ROUTE_LOCAL_FLOOR")
}
func TestLoadRoutingRejectsBadInt(t *testing.T) {
t.Setenv("HYPERGUILD_PASS_RATE_TTL_SECONDS", "not-a-number")
_, err := config.LoadRouting()
require.Error(t, err)
assert.Contains(t, err.Error(), "HYPERGUILD_PASS_RATE_TTL_SECONDS")
}

21
internal/routing/hash.go Normal file
View File

@@ -0,0 +1,21 @@
package routing
import (
"crypto/sha256"
"encoding/binary"
)
// CanonicalHash returns a deterministic 64-bit hash of (system, user).
// Used to make sample-band routing decisions reproducible: identical input
// strings produce the same hash on every call, independent of process state.
//
// Inputs are joined with a 0x00 byte separator before hashing — distinguishes
// (system="ab", user="cd") from (system="abcd", user="").
func CanonicalHash(system, user string) uint64 {
h := sha256.New()
h.Write([]byte(system))
h.Write([]byte{0})
h.Write([]byte(user))
sum := h.Sum(nil)
return binary.BigEndian.Uint64(sum[:8])
}

View File

@@ -0,0 +1,46 @@
package routing_test
import (
"testing"
"github.com/mathiasbq/supervisor/internal/routing"
"github.com/stretchr/testify/assert"
)
func TestCanonicalHashDeterministic(t *testing.T) {
a := routing.CanonicalHash("system one", "user one")
b := routing.CanonicalHash("system one", "user one")
assert.Equal(t, a, b, "same inputs must produce same hash")
}
func TestCanonicalHashDistinguishesInputs(t *testing.T) {
cases := [][2]string{
{"sys", "user"},
{"sys", "user2"},
{"sys2", "user"},
{"", "system\x00user"}, // separator collision attempt
{"system\x00user", ""},
}
seen := make(map[uint64]bool)
for _, c := range cases {
h := routing.CanonicalHash(c[0], c[1])
assert.False(t, seen[h], "collision on %v", c)
seen[h] = true
}
}
func TestCanonicalHashLowBitDistribution(t *testing.T) {
// Sanity check: across 1000 distinct inputs, low-bit split is roughly even.
zeros, ones := 0, 0
for i := 0; i < 1000; i++ {
h := routing.CanonicalHash("sys", string(rune('a'+(i%26)))+string(rune(i)))
if h&1 == 0 {
zeros++
} else {
ones++
}
}
// Allow ±15% deviation from 500/500. Tighter would be flaky on real data.
assert.InDelta(t, 500, zeros, 150)
assert.InDelta(t, 500, ones, 150)
}

79
internal/routing/log.go Normal file
View File

@@ -0,0 +1,79 @@
package routing
import (
"bytes"
"context"
"encoding/json"
"fmt"
"net/http"
"time"
)
// LogEntry describes a single routing decision to log via the brain MCP.
type LogEntry struct {
SessionID string
Skill string // the original skill the call routed (e.g., "review")
Decision string // "local" or "claude" or "claude_fallback"
Message string // free-form, e.g. "model=qwen35, pass_rate=0.94"
ProjectRoot string
DurationMs int64
Failed bool // true → final_status: "fail"; false → "skip"
}
// Logger posts session_log entries to a brain MCP at BrainURL + /mcp.
type Logger struct {
BrainURL string
HTTP *http.Client
}
// NewLogger creates a Logger with a 2-second HTTP timeout.
func NewLogger(brainURL string) *Logger {
return &Logger{
BrainURL: brainURL,
HTTP: &http.Client{Timeout: 2 * time.Second},
}
}
// LogDecision posts a session_log MCP call. Errors are returned but the caller
// MUST NOT block real work on them — logging is best-effort.
func (l *Logger) LogDecision(ctx context.Context, e LogEntry) error {
status := "skip"
if e.Failed {
status = "fail"
}
payload := map[string]any{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/call",
"params": map[string]any{
"name": "session_log",
"arguments": map[string]any{
"session_id": e.SessionID,
"skill": "_routing",
"phase": "decide",
"final_status": status,
"message": fmt.Sprintf("%s: %s — %s", e.Skill, e.Decision, e.Message),
"duration_ms": e.DurationMs,
"project_root": e.ProjectRoot,
},
},
}
body, err := json.Marshal(payload)
if err != nil {
return fmt.Errorf("log: marshal: %w", err)
}
req, err := http.NewRequestWithContext(ctx, http.MethodPost, l.BrainURL+"/mcp", bytes.NewReader(body))
if err != nil {
return fmt.Errorf("log: build request: %w", err)
}
req.Header.Set("Content-Type", "application/json")
resp, err := l.HTTP.Do(req)
if err != nil {
return fmt.Errorf("log: request: %w", err)
}
defer func() { _ = resp.Body.Close() }()
if resp.StatusCode != http.StatusOK {
return fmt.Errorf("log: server returned status %d", resp.StatusCode)
}
return nil
}

View File

@@ -0,0 +1,81 @@
package routing_test
import (
"context"
"encoding/json"
"io"
"net/http"
"net/http/httptest"
"testing"
"github.com/mathiasbq/supervisor/internal/routing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestLoggerLogDecision(t *testing.T) {
var captured map[string]any
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
assert.Equal(t, http.MethodPost, r.Method)
assert.Equal(t, "/mcp", r.URL.Path)
body, _ := io.ReadAll(r.Body)
require.NoError(t, json.Unmarshal(body, &captured))
_ = json.NewEncoder(w).Encode(map[string]any{"jsonrpc": "2.0", "id": 1, "result": map[string]any{"content": []map[string]any{{"type": "text", "text": "ok"}}}})
}))
defer srv.Close()
l := routing.NewLogger(srv.URL)
err := l.LogDecision(context.Background(), routing.LogEntry{
SessionID: "sess-1",
Skill: "review",
Decision: "local",
Message: "model=qwen35, pass_rate=0.94",
ProjectRoot: "/home/x/proj",
DurationMs: 1234,
Failed: false,
})
require.NoError(t, err)
params := captured["params"].(map[string]any)
assert.Equal(t, "tools/call", captured["method"])
assert.Equal(t, "session_log", params["name"])
args := params["arguments"].(map[string]any)
assert.Equal(t, "_routing", args["skill"])
assert.Equal(t, "decide", args["phase"])
assert.Equal(t, "skip", args["final_status"])
assert.Contains(t, args["message"].(string), "review: local")
assert.Equal(t, "sess-1", args["session_id"])
assert.Equal(t, "/home/x/proj", args["project_root"])
assert.Equal(t, float64(1234), args["duration_ms"])
}
func TestLoggerLogFailure(t *testing.T) {
var captured map[string]any
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
body, _ := io.ReadAll(r.Body)
_ = json.Unmarshal(body, &captured)
_ = json.NewEncoder(w).Encode(map[string]any{"jsonrpc": "2.0", "id": 1, "result": map[string]any{}})
}))
defer srv.Close()
l := routing.NewLogger(srv.URL)
err := l.LogDecision(context.Background(), routing.LogEntry{
SessionID: "s", Skill: "debug", Decision: "local", Message: "litellm down", Failed: true,
})
require.NoError(t, err)
args := captured["params"].(map[string]any)["arguments"].(map[string]any)
assert.Equal(t, "fail", args["final_status"])
}
func TestLoggerSurfacesUpstreamError(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
http.Error(w, "down", http.StatusBadGateway)
}))
defer srv.Close()
l := routing.NewLogger(srv.URL)
err := l.LogDecision(context.Background(), routing.LogEntry{Skill: "x", SessionID: "y", Decision: "local"})
require.Error(t, err)
}

View File

@@ -0,0 +1,85 @@
package routing
import (
"context"
"encoding/json"
"fmt"
"net/http"
"net/url"
"sync"
"time"
)
// Fetcher reads /pass-rate from the brain pod with a per-skill TTL cache.
type Fetcher struct {
BaseURL string
Window string
TTL time.Duration
HTTP *http.Client
mu sync.Mutex
cache map[string]cachedRate
}
type cachedRate struct {
value *float64
at time.Time
}
type passRateResponse struct {
PassRate *float64 `json:"pass_rate"`
}
// NewFetcher returns a Fetcher that calls baseURL + /pass-rate with the
// given window string. If ttl is zero, defaults to 60 seconds. The HTTP
// client uses a 1-second total timeout.
func NewFetcher(baseURL, window string, ttl time.Duration) *Fetcher {
if ttl == 0 {
ttl = 60 * time.Second
}
return &Fetcher{
BaseURL: baseURL,
Window: window,
TTL: ttl,
HTTP: &http.Client{Timeout: time.Second},
cache: make(map[string]cachedRate),
}
}
// Get returns the pass rate for the named skill, or nil if no data exists,
// or an error if the brain is unreachable. Caches successful results.
func (f *Fetcher) Get(ctx context.Context, skill string) (*float64, error) {
f.mu.Lock()
if c, ok := f.cache[skill]; ok && time.Since(c.at) < f.TTL {
v := c.value
f.mu.Unlock()
return v, nil
}
f.mu.Unlock()
u := fmt.Sprintf("%s/pass-rate?skill=%s&window=%s",
f.BaseURL, url.QueryEscape(skill), url.QueryEscape(f.Window))
req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
if err != nil {
return nil, fmt.Errorf("passrate: build request: %w", err)
}
resp, err := f.HTTP.Do(req)
if err != nil {
return nil, fmt.Errorf("passrate: request: %w", err)
}
defer func() { _ = resp.Body.Close() }()
if resp.StatusCode != http.StatusOK {
return nil, fmt.Errorf("passrate: server returned status %d", resp.StatusCode)
}
var body passRateResponse
if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
return nil, fmt.Errorf("passrate: decode: %w", err)
}
f.mu.Lock()
f.cache[skill] = cachedRate{value: body.PassRate, at: time.Now()}
f.mu.Unlock()
return body.PassRate, nil
}

View File

@@ -0,0 +1,94 @@
package routing_test
import (
"context"
"encoding/json"
"net/http"
"net/http/httptest"
"sync/atomic"
"testing"
"time"
"github.com/mathiasbq/supervisor/internal/routing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestFetcherGetReturnsPassRate(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
assert.Equal(t, http.MethodGet, r.Method)
assert.Equal(t, "/pass-rate", r.URL.Path)
assert.Equal(t, "tdd", r.URL.Query().Get("skill"))
assert.Equal(t, "7d", r.URL.Query().Get("window"))
w.Header().Set("Content-Type", "application/json")
_ = json.NewEncoder(w).Encode(map[string]any{"skill": "tdd", "pass_rate": 0.94})
}))
defer srv.Close()
f := routing.NewFetcher(srv.URL, "7d", time.Minute)
pr, err := f.Get(context.Background(), "tdd")
require.NoError(t, err)
require.NotNil(t, pr)
assert.InDelta(t, 0.94, *pr, 1e-9)
}
func TestFetcherGetReturnsNilWhenNoData(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
_ = json.NewEncoder(w).Encode(map[string]any{"skill": "novel", "pass_rate": nil})
}))
defer srv.Close()
f := routing.NewFetcher(srv.URL, "7d", time.Minute)
pr, err := f.Get(context.Background(), "novel")
require.NoError(t, err)
assert.Nil(t, pr)
}
func TestFetcherCachesWithinTTL(t *testing.T) {
var calls int32
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
atomic.AddInt32(&calls, 1)
_ = json.NewEncoder(w).Encode(map[string]any{"pass_rate": 0.5})
}))
defer srv.Close()
f := routing.NewFetcher(srv.URL, "7d", time.Minute)
for i := 0; i < 5; i++ {
_, err := f.Get(context.Background(), "tdd")
require.NoError(t, err)
}
assert.Equal(t, int32(1), atomic.LoadInt32(&calls), "should hit upstream once and serve four times from cache")
}
func TestFetcherFetchesAgainAfterTTLExpires(t *testing.T) {
var calls int32
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
atomic.AddInt32(&calls, 1)
_ = json.NewEncoder(w).Encode(map[string]any{"pass_rate": 0.5})
}))
defer srv.Close()
// Tight TTL so the test stays fast.
f := routing.NewFetcher(srv.URL, "7d", 5*time.Millisecond)
_, err := f.Get(context.Background(), "tdd")
require.NoError(t, err)
assert.Equal(t, int32(1), atomic.LoadInt32(&calls))
// Sleep past TTL, then a second Get should hit upstream again.
time.Sleep(15 * time.Millisecond)
_, err = f.Get(context.Background(), "tdd")
require.NoError(t, err)
assert.Equal(t, int32(2), atomic.LoadInt32(&calls), "expected fresh upstream call after TTL expiry")
}
func TestFetcherSurfacesUpstreamError(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
http.Error(w, "boom", http.StatusInternalServerError)
}))
defer srv.Close()
f := routing.NewFetcher(srv.URL, "7d", time.Minute)
pr, err := f.Get(context.Background(), "tdd")
require.Error(t, err)
assert.Nil(t, pr)
}

View File

@@ -0,0 +1,47 @@
package routing
// Decision is the route picked for a single skill call.
type Decision int
const (
DecideLocal Decision = iota
DecideClaude
)
func (d Decision) String() string {
if d == DecideLocal {
return "local"
}
return "claude"
}
// Policy holds the floor/ceil thresholds for routing decisions.
//
// Rules (in order):
//
// 1. passRate == nil → DecideLocal (default-to-local for cost-routable skills)
// 2. *passRate >= Floor → DecideLocal (trust local)
// 3. *passRate < Ceil → DecideClaude (don't trust local)
// 4. otherwise (sample band) → requestHash low bit picks: 0=local, 1=claude
type Policy struct {
Floor float64
Ceil float64
}
// Decide returns the routing decision for a single call.
// requestHash is consulted only when passRate is in the sample band [Ceil, Floor).
func (p Policy) Decide(passRate *float64, requestHash uint64) Decision {
if passRate == nil {
return DecideLocal
}
if *passRate >= p.Floor {
return DecideLocal
}
if *passRate < p.Ceil {
return DecideClaude
}
if requestHash&1 == 0 {
return DecideLocal
}
return DecideClaude
}

View File

@@ -0,0 +1,36 @@
package routing_test
import (
"testing"
"github.com/mathiasbq/supervisor/internal/routing"
"github.com/stretchr/testify/assert"
)
func ptr(f float64) *float64 { return &f }
func TestPolicyDecide(t *testing.T) {
p := routing.Policy{Floor: 0.9, Ceil: 0.7}
cases := []struct {
name string
passRate *float64
hash uint64
want routing.Decision
}{
{"null pass rate → local", nil, 0, routing.DecideLocal},
{"null pass rate, hash irrelevant → local", nil, 0xDEADBEEF, routing.DecideLocal},
{"at floor → local", ptr(0.9), 0, routing.DecideLocal},
{"above floor → local", ptr(0.95), 0, routing.DecideLocal},
{"below ceil → claude", ptr(0.5), 0, routing.DecideClaude},
{"at ceil → sample-band even-hash → local", ptr(0.7), 0, routing.DecideLocal},
{"sample band, even hash → local", ptr(0.8), 2, routing.DecideLocal},
{"sample band, odd hash → claude", ptr(0.8), 3, routing.DecideClaude},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
assert.Equal(t, tc.want, p.Decide(tc.passRate, tc.hash))
})
}
}

View File

@@ -0,0 +1,84 @@
package routing
import (
"context"
"fmt"
"log/slog"
)
// CompleteFunc matches the signature used by every skill package's Config.
type CompleteFunc func(ctx context.Context, model, system, user string) (string, int64, error)
// RunInput captures the per-call inputs the dispatch wrapper needs.
type RunInput struct {
Skill string
System string
User string
SessionID string
ProjectRoot string
}
// Router composes a pass-rate fetcher, a decision policy, a session logger,
// and a LiteLLM client. Skill packages receive Router.Run as their CompleteFunc.
type Router struct {
Fetcher *Fetcher
Logger *Logger
Policy Policy
LocalModel string
ClaudeModel string
Complete CompleteFunc
}
// Run executes one skill call: decides local vs claude, calls LiteLLM, logs the
// decision. On local-side error, falls open by retrying once on the Claude model.
func (r *Router) Run(ctx context.Context, in RunInput) (string, int64, error) {
pr, ferr := r.Fetcher.Get(ctx, in.Skill)
if ferr != nil {
slog.Warn("router: pass-rate unreachable, defaulting to local", "skill", in.Skill, "err", ferr)
pr = nil
}
hash := CanonicalHash(in.System, in.User)
decision := r.Policy.Decide(pr, hash)
model := r.ClaudeModel
if decision == DecideLocal {
model = r.LocalModel
}
out, ms, err := r.Complete(ctx, model, in.System, in.User)
if lerr := r.Logger.LogDecision(ctx, LogEntry{
SessionID: in.SessionID,
Skill: in.Skill,
Decision: decision.String(),
Message: fmt.Sprintf("model=%s, pass_rate=%s", model, formatPassRate(pr)),
ProjectRoot: in.ProjectRoot,
DurationMs: ms,
Failed: err != nil,
}); lerr != nil {
slog.Warn("router: log decision failed", "skill", in.Skill, "err", lerr)
}
if err != nil && decision == DecideLocal {
slog.Warn("router: local failed, falling open to claude", "skill", in.Skill, "err", err)
out, ms, err = r.Complete(ctx, r.ClaudeModel, in.System, in.User)
if lerr := r.Logger.LogDecision(ctx, LogEntry{
SessionID: in.SessionID,
Skill: in.Skill,
Decision: "claude_fallback",
Message: fmt.Sprintf("model=%s, after-local-error", r.ClaudeModel),
ProjectRoot: in.ProjectRoot,
DurationMs: ms,
Failed: err != nil,
}); lerr != nil {
slog.Warn("router: log decision failed", "skill", in.Skill, "err", lerr)
}
}
return out, ms, err
}
func formatPassRate(pr *float64) string {
if pr == nil {
return "null"
}
return fmt.Sprintf("%.2f", *pr)
}

View File

@@ -0,0 +1,136 @@
package routing_test
import (
"context"
"encoding/json"
"errors"
"net/http"
"net/http/httptest"
"sync"
"testing"
"time"
"github.com/mathiasbq/supervisor/internal/routing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
type fakeLLM struct {
mu sync.Mutex
calls []struct{ Model, System, User string }
resp string
err error
errOn string // if non-empty, only the named model errors
}
func (f *fakeLLM) Complete(_ context.Context, model, system, user string) (string, int64, error) {
f.mu.Lock()
defer f.mu.Unlock()
f.calls = append(f.calls, struct{ Model, System, User string }{model, system, user})
if f.errOn == model {
return "", 0, f.err
}
if f.err != nil && f.errOn == "" {
return "", 0, f.err
}
return f.resp, 100, nil
}
func newRouter(t *testing.T, llm *fakeLLM, passRate float64) (*routing.Router, *httptest.Server, *httptest.Server) {
t.Helper()
brain := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
switch r.URL.Path {
case "/pass-rate":
_ = json.NewEncoder(w).Encode(map[string]any{"pass_rate": passRate})
case "/mcp":
_ = json.NewEncoder(w).Encode(map[string]any{"jsonrpc": "2.0", "id": 1, "result": map[string]any{}})
}
}))
t.Cleanup(brain.Close)
r := &routing.Router{
Fetcher: routing.NewFetcher(brain.URL, "7d", time.Minute),
Logger: routing.NewLogger(brain.URL),
Policy: routing.Policy{Floor: 0.9, Ceil: 0.7},
LocalModel: "qwen35",
ClaudeModel: "claude-sonnet-4-6",
Complete: llm.Complete,
}
return r, brain, brain
}
func TestRouterRoutesLocalAtHighPassRate(t *testing.T) {
llm := &fakeLLM{resp: "ok"}
r, _, _ := newRouter(t, llm, 0.95)
out, _, err := r.Run(context.Background(), routing.RunInput{
Skill: "review", System: "sys", User: "user", SessionID: "s1", ProjectRoot: "/p",
})
require.NoError(t, err)
assert.Equal(t, "ok", out)
llm.mu.Lock()
defer llm.mu.Unlock()
require.Len(t, llm.calls, 1)
assert.Equal(t, "qwen35", llm.calls[0].Model)
}
func TestRouterRoutesClaudeAtLowPassRate(t *testing.T) {
llm := &fakeLLM{resp: "ok"}
r, _, _ := newRouter(t, llm, 0.3)
_, _, err := r.Run(context.Background(), routing.RunInput{
Skill: "review", System: "sys", User: "user", SessionID: "s2",
})
require.NoError(t, err)
llm.mu.Lock()
defer llm.mu.Unlock()
require.Len(t, llm.calls, 1)
assert.Equal(t, "claude-sonnet-4-6", llm.calls[0].Model)
}
func TestRouterFailsOpenLocalErrorToClaude(t *testing.T) {
llm := &fakeLLM{resp: "ok-after-fallback", err: errors.New("local boom"), errOn: "qwen35"}
r, _, _ := newRouter(t, llm, 0.95) // would route local
out, _, err := r.Run(context.Background(), routing.RunInput{
Skill: "review", System: "sys", User: "user", SessionID: "s3",
})
require.NoError(t, err)
assert.Equal(t, "ok-after-fallback", out)
llm.mu.Lock()
defer llm.mu.Unlock()
require.Len(t, llm.calls, 2)
assert.Equal(t, "qwen35", llm.calls[0].Model)
assert.Equal(t, "claude-sonnet-4-6", llm.calls[1].Model)
}
func TestRouterDefaultsToLocalWhenBrainUnreachable(t *testing.T) {
// Brain returns 500 → fetcher errors → router treats pass rate as nil → local.
brain := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
http.Error(w, "down", http.StatusInternalServerError)
}))
defer brain.Close()
llm := &fakeLLM{resp: "ok"}
r := &routing.Router{
Fetcher: routing.NewFetcher(brain.URL, "7d", time.Minute),
Logger: routing.NewLogger(brain.URL),
Policy: routing.Policy{Floor: 0.9, Ceil: 0.7},
LocalModel: "qwen35",
ClaudeModel: "claude-sonnet-4-6",
Complete: llm.Complete,
}
_, _, err := r.Run(context.Background(), routing.RunInput{
Skill: "review", System: "sys", User: "user", SessionID: "s4",
})
require.NoError(t, err)
llm.mu.Lock()
defer llm.mu.Unlock()
require.Len(t, llm.calls, 1)
assert.Equal(t, "qwen35", llm.calls[0].Model)
}

View File

@@ -0,0 +1,80 @@
package routing_test
import (
"context"
"encoding/json"
"os"
"sort"
"testing"
"github.com/mathiasbq/supervisor/internal/registry"
"github.com/mathiasbq/supervisor/internal/skills/debug"
"github.com/mathiasbq/supervisor/internal/skills/retrospective"
"github.com/mathiasbq/supervisor/internal/skills/review"
"github.com/mathiasbq/supervisor/internal/skills/trainer"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// TestToolsListMatchesSupervisorSnapshot pins the four routed skills' tool
// definitions to the supervisor's current advertisement. A deliberate schema
// change must be reflected here by updating testdata/tools_list.snapshot.json.
func TestToolsListMatchesSupervisorSnapshot(t *testing.T) {
complete := func(_ context.Context, _, _, _ string) (string, int64, error) {
return "", 0, nil
}
reg := registry.New()
reg.Register(review.New(review.Config{
SkillPrompt: "stub",
DefaultModel: "stub",
CompleteFunc: complete,
}))
reg.Register(debug.New(debug.Config{
SkillPrompt: "stub",
DefaultModel: "stub",
CompleteFunc: complete,
}))
reg.Register(retrospective.New(retrospective.Config{
SkillPrompt: "stub",
DefaultModel: "stub",
CompleteFunc: complete,
}))
reg.Register(trainer.New(trainer.Config{
ReaderPrompt: "stub",
WriterPrompt: "stub",
DefaultModel: "stub",
CompleteFunc: complete,
}))
wanted := map[string]bool{
"review": true,
"debug": true,
"retrospective": true,
"trainer": true,
}
var routed []registry.ToolDef
for _, td := range reg.Tools() {
if wanted[td.Name] {
routed = append(routed, td)
}
}
sort.Slice(routed, func(i, j int) bool { return routed[i].Name < routed[j].Name })
got, err := json.MarshalIndent(routed, "", " ")
require.NoError(t, err)
want, err := os.ReadFile("testdata/tools_list.snapshot.json")
require.NoError(t, err)
// Normalize both via re-encode so whitespace differences don't dominate.
var gotV, wantV any
require.NoError(t, json.Unmarshal(got, &gotV))
require.NoError(t, json.Unmarshal(want, &wantV))
gotN, _ := json.MarshalIndent(gotV, "", " ")
wantN, _ := json.MarshalIndent(wantV, "", " ")
assert.Equal(t, string(wantN), string(gotN),
"tool advertisement drifted from supervisor snapshot — update testdata/tools_list.snapshot.json deliberately if the schema change is intentional")
}

View File

@@ -0,0 +1,97 @@
[
{
"name": "debug",
"description": "Consult a local model to analyse an error and return hypotheses ordered by likelihood, each with a concrete verification step.",
"inputSchema": {
"properties": {
"context": {
"type": "string"
},
"error": {
"type": "string"
},
"model": {
"type": "string"
},
"project_root": {
"type": "string"
},
"session_id": {
"type": "string"
}
},
"required": [
"project_root",
"error"
],
"type": "object"
}
},
{
"name": "retrospective",
"description": "Consult a local model to analyse a completed session and identify what is novel or worth preserving as organizational knowledge.",
"inputSchema": {
"type": "object",
"required": [
"session_id"
],
"properties": {
"session_id": {
"type": "string"
},
"model": {
"type": "string"
}
}
}
},
{
"name": "review",
"description": "Consult a local model for a structured code review of the specified files. Returns findings with severity levels.",
"inputSchema": {
"properties": {
"context": {
"type": "string"
},
"files": {
"items": {
"type": "string"
},
"type": "array"
},
"model": {
"type": "string"
},
"project_root": {
"type": "string"
},
"session_id": {
"type": "string"
}
},
"required": [
"project_root",
"files"
],
"type": "object"
}
},
{
"name": "trainer",
"description": "Consult a local model to identify learning moments from a session log and suggest knowledge to preserve in the brain.",
"inputSchema": {
"properties": {
"model": {
"type": "string"
},
"session_id": {
"type": "string"
}
},
"required": [
"session_id"
],
"type": "object"
}
}
]

64
scripts/smoke-routing.sh Executable file
View File

@@ -0,0 +1,64 @@
#!/usr/bin/env bash
set -euo pipefail
# Boot the routing binary and exercise its four tools against live deps.
# Skipped when LITELLM_BASE_URL or BRAIN_URL is unreachable.
LITELLM_BASE_URL="${LITELLM_BASE_URL:-http://piguard:4000}"
BRAIN_URL="${BRAIN_URL:-http://koala:30330}"
if ! curl -sS --max-time 2 "${LITELLM_BASE_URL}/v1/models" >/dev/null 2>&1; then
echo "SKIP: LITELLM at ${LITELLM_BASE_URL} unreachable"
exit 0
fi
if ! curl -sS --max-time 2 "${BRAIN_URL}/query" -X POST -d '{"query":"x","k":1}' -H 'Content-Type: application/json' >/dev/null 2>&1; then
echo "SKIP: BRAIN at ${BRAIN_URL} unreachable"
exit 0
fi
PORT=33310
BIN=$(mktemp)
trap 'rm -f $BIN; pkill -P $$ -f "$BIN" 2>/dev/null || true' EXIT
go build -o "$BIN" ./cmd/routing
LITELLM_BASE_URL="$LITELLM_BASE_URL" BRAIN_URL="$BRAIN_URL" \
ROUTING_PORT="$PORT" SUPERVISOR_CONFIG_DIR="$(pwd)/config/supervisor" \
"$BIN" &
# Wait for the binary to bind.
for _ in $(seq 1 50); do
curl -sS "http://127.0.0.1:${PORT}/healthz" >/dev/null 2>&1 && break
sleep 0.1
done
call_tool() {
local tool="$1"
local args="$2"
curl -sS -X POST "http://127.0.0.1:${PORT}/mcp" \
-H 'Content-Type: application/json' \
-d "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"tools/call\",\"params\":{\"name\":\"${tool}\",\"arguments\":${args}}}" \
| jq -e '.result // .error' > /dev/null
}
echo "calling tools/list..."
curl -sS -X POST "http://127.0.0.1:${PORT}/mcp" \
-H 'Content-Type: application/json' \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' \
| jq -r '.result.tools | map(.name) | sort | .[]'
echo "calling each tool..."
call_tool review '{"project_root":"/tmp","files":["README.md"],"session_id":"smoke-1"}'
call_tool debug '{"project_root":"/tmp","error":"smoke test","session_id":"smoke-1"}'
call_tool retrospective '{"session_id":"smoke-1"}'
call_tool trainer '{"session_id":"smoke-1"}'
echo "checking brain has _routing entries..."
sleep 2
COUNT=$(curl -sS "${BRAIN_URL}/pass-rate?skill=_routing&window=1h" | jq -r '.total // 0')
if [ "${COUNT}" -lt 4 ]; then
echo "FAIL: expected >=4 _routing entries in last 1h, got ${COUNT}"
exit 1
fi
echo "PASS: smoke:routing"