The routing decision is about reasoning capacity, not cost or provider. Fast model (koala/qwen35-9b-fast) handles high-pass-rate calls; thinking model (iguana/gemma4-26b) handles low-pass-rate calls. Removes the implicit Anthropic dependency from the routing pod — both models go through LiteLLM. Renames: HYPERGUILD_LOCAL_MODEL → HYPERGUILD_FAST_MODEL, HYPERGUILD_CLAUDE_MODEL → HYPERGUILD_THINKING_MODEL, Router.LocalModel → FastModel, Router.ClaudeModel → ThinkingModel, log decision "claude_fallback" → "thinking_fallback". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
5.6 KiB
hyperguild
An MCP server that acts as a disciplined AI supervisor for Claude Code sessions. Instead of letting Claude Code do whatever it wants, hyperguild enforces structured workflows (TDD red/green/refactor), logs every session, and accumulates learnings into a searchable brain.
How it works
Your Claude Code session (in any project)
│
│ MCP over HTTP (Tailscale)
├──▶ supervisor :3200 (NodePort 30320 on koala) — skill workers: tdd, debug, spec, …
├──▶ routing :3210 (NodePort 30310 on koala) — Mode 2 only: review, debug, retrospective, trainer
└──▶ brain :3300 (NodePort 30330 on koala) — brain_query, brain_write, brain_ingest, session_log
│
└─ also serves the legacy REST endpoints (/query, /write, /ingest, …)
│
▼
brain/
├── sessions/ — JSONL log, one file per session_id
├── wiki/ — searchable knowledge (full-text)
│ ├── concepts/
│ ├── entities/
│ └── sources/
├── raw/ — retrospective output, staged for review
└── training-data/ — SFT/DPO/RL data (Phase 2)
Phase 1 tools (available now)
| Tool | What it does |
|---|---|
tdd_red |
Writes a failing test for a spec, verifies it fails |
tdd_green |
Writes the minimal implementation to make tests pass |
tdd_refactor |
Cleans up implementation while keeping tests green |
session_log |
Appends a structured entry to the session JSONL log |
retrospective |
Reads the session log, identifies novel learnings, writes to brain/raw/ |
brain_query |
Full-text search over brain/wiki/ |
brain_write |
Writes a note to brain/raw/ (with optional YAML frontmatter) |
tier |
Returns the current connectivity tier (1=cloud, 2=LAN, 3=offline) |
Start the servers
# Requires goreman: go install github.com/mattn/goreman@latest
task start # starts ingestion (:3300) + supervisor (:3200) via goreman
task stop # kills both by port
Connect a project
Create .mcp.json in your project root:
{
"mcpServers": {
"supervisor": {
"type": "http",
"url": "http://koala:30320/mcp"
},
"brain": {
"type": "http",
"url": "http://koala:30330/mcp"
}
}
}
Two MCP servers are exposed today, both reachable over Tailscale:
supervisoratkoala:30320— skill workers (tdd_red/green/refactor,review,debug,spec,retrospective,trainer,tier).brainatkoala:30330— knowledge access (brain_query,brain_write,brain_ingest,brain_ingest_raw) andsession_log. Hosted by the ingestion service directly, no separate pod.
No local binary or stdio shim is required — Claude Code talks to both via HTTP.
Open Claude Code in your project — run /mcp to confirm both servers are listed.
A typical TDD session
1. Call tdd_red → spec in, failing test file out
2. Call tdd_green → test path in, implementation out
3. Call tdd_refactor → impl + test in, cleaned code out
4. Call session_log → log each phase result
5. Call retrospective → extracts learnings → brain/raw/
6. Review brain/raw/, move worthy notes to brain/wiki/concepts/
7. Future sessions: call brain_query to retrieve relevant context
Tier detection
The supervisor probes connectivity at call time:
| Tier | Label | Condition |
|---|---|---|
| 1 | full-online | Can reach api.anthropic.com |
| 2 | lan-only | Can reach LiteLLM but not Anthropic |
| 3 | airplane | No external connectivity |
Key env vars
| Variable | Default | Purpose |
|---|---|---|
INGEST_BRAIN_DIR |
../brain |
Brain directory for ingestion server |
INGEST_PORT |
3300 |
Ingestion server port |
SUPERVISOR_CONFIG_DIR |
./config/supervisor |
Skill discipline files |
SUPERVISOR_SESSIONS_DIR |
./brain/sessions |
JSONL session logs |
INGEST_BASE_URL |
http://localhost:3300 |
Supervisor → ingestion |
LITELLM_BASE_URL |
— | LiteLLM proxy for Tier 2 model routing |
SUPERVISOR_MCP_TOKEN |
— | Optional bearer token for the supervisor MCP HTTP endpoint; when empty, no auth is enforced |
ROUTING_PORT |
3210 |
Routing pod's listen port |
ROUTING_MCP_TOKEN |
— | Optional bearer token for the routing MCP HTTP endpoint |
BRAIN_URL |
http://ingestion.supervisor:3300 |
Routing pod → brain (in-cluster) |
HYPERGUILD_FAST_MODEL |
koala/qwen35-9b-fast |
Fast model for high-pass-rate skill calls |
HYPERGUILD_THINKING_MODEL |
iguana/gemma4-26b |
Thinking model for low-pass-rate skill calls |
HYPERGUILD_ROUTE_LOCAL_FLOOR |
0.90 |
At/above pass rate, route to fast model |
HYPERGUILD_ROUTE_LOCAL_CEIL |
0.70 |
Below pass rate, route to thinking model. Between CEIL and FLOOR is the sample band. |
HYPERGUILD_PASS_RATE_TTL_SECONDS |
60 |
Per-skill pass-rate cache TTL |
Operator note: LiteLLM at
LITELLM_BASE_URLmust register bothHYPERGUILD_FAST_MODELandHYPERGUILD_THINKING_MODELfor routing to do useful work. If a model is missing, LiteLLM returns 4xx, the routing pod's fast route fails, the fail-open retry on the thinking model likely also fails (since both are missing), and the only signal isfinal_status: "fail"on_routingentries in the brain.
Phase 2 (planned)
reviewskill — structured code review with iron law enforcementdebugskill — hypothesis-driven debugging sessionsspecskill — generates specs from conversationstrainer— extracts SFT/DPO pairs from session logs for fine-tuning