Long markdown files (>~8KB) silently failed to embed because nomic-embed-text on iguana has a 2048-token context. embed sync logged errors=1 every cycle with no useful body until #37 added per-item logging. Three files exceed the ceiling: finbert source (8 KB), koala-machine-state (7.1 KB), litellm-absorption (8.8 KB). Curated knowledge entries should never be vector-blind.

Approach: chunk-before-embed, no schema change.

vectorstore/chunk.go (new)
- ChunkMarkdown splits at H1/H2 boundaries; sections over maxBytes are further split at paragraph boundaries, packing greedily under budget.
- NumberChunks assigns "<parent>#NNNN" storage paths (1-based, zero-padded to 4 digits, which handles files with up to ~10k sections in stable sort order).
- ParentPath strips the chunk suffix for retrieval-side dedup.

vectorstore/sync.go
- After ChunkMarkdown produces N pieces, each is embedded + upserted as a separate brain_embeddings row at "<parent>#NNNN". maxChunkBytes = 4000 (≈1000 nomic tokens, well under the 2048 ceiling with headroom for unicode/code blocks).
- "Already embedded?" check now reduces known paths to parent set via ParentPath, so the first chunk hit short-circuits the file.
- Delete walk also reduces via ParentPath; when a parent file disappears, every chunk row (and any pre-existing bare-path row, for backward compatibility with rows written before this change) gets dropped.

search/search.go
- hybridMerge collapses chunk-path vector hits to parent via ParentPath before scope check, RRF accumulation, and hydration. A file with three chunk hits returns one result row, not three.

Backward compatibility: pre-existing bare-path rows in brain_embeddings keep working. ParentPath returns them unchanged, knownParents handles them as if they were "wiki/foo.md#NNNN" hits, sync skips re-embed, and search dedup is a no-op for them. No migration required to ship.

Tests:
- chunk_test.go covers short / heading split / oversized section / content preservation / chunk numbering / parent-path stripping.
- sync_test.go adds long-file chunking, single-chunk-row short file, skip-if-any-chunk-known, delete-all-chunks-of-disappeared-file. Existing tests updated for #NNNN paths.
- search_test.go adds chunk-paths-dedupe-to-parent.

Closes gitea/mathias/infra#38.
hyperguild
An MCP server that acts as a disciplined AI supervisor for Claude Code sessions. Instead of letting Claude Code do whatever it wants, hyperguild enforces structured workflows (TDD red/green/refactor), logs every session, and accumulates learnings into a searchable brain.
How it works
Your Claude Code session (in any project)
        │
        │  MCP over HTTP (Tailscale)
        ├──▶ supervisor :3200 (NodePort 30320 on koala) - skill workers: tdd, debug, spec, …
        ├──▶ routing    :3210 (NodePort 30310 on koala) - Mode 2 only: review, debug, retrospective, trainer
        └──▶ brain      :3300 (NodePort 30330 on koala) - brain_query, brain_write, brain_ingest, session_log
                │
                └─ also serves the legacy REST endpoints (/query, /write, /ingest, …)
                        │
                        ▼
                   brain/
                   ├── sessions/       - JSONL log, one file per session_id
                   ├── wiki/           - searchable knowledge (full-text)
                   │   ├── concepts/
                   │   ├── entities/
                   │   └── sources/
                   ├── raw/            - retrospective output, staged for review
                   └── training-data/  - SFT/DPO/RL data (Phase 2)
Phase 1 tools (available now)
| Tool | What it does |
|---|---|
| `tdd_red` | Writes a failing test for a spec, verifies it fails |
| `tdd_green` | Writes the minimal implementation to make tests pass |
| `tdd_refactor` | Cleans up implementation while keeping tests green |
| `session_log` | Appends a structured entry to the session JSONL log |
| `retrospective` | Reads the session log, identifies novel learnings, writes to brain/raw/ |
| `brain_query` | Full-text search over brain/wiki/ |
| `brain_write` | Writes a note to brain/raw/ (with optional YAML frontmatter) |
| `tier` | Returns the current connectivity tier (1=cloud, 2=LAN, 3=offline) |
Start the servers
# Requires goreman: go install github.com/mattn/goreman@latest
task start # starts ingestion (:3300) + supervisor (:3200) via goreman
task stop # kills both by port
Connect a project
Create .mcp.json in your project root:
{
  "mcpServers": {
    "supervisor": {
      "type": "http",
      "url": "http://koala:30320/mcp"
    },
    "brain": {
      "type": "http",
      "url": "http://koala:30330/mcp"
    }
  }
}
Two MCP servers are exposed today, both reachable over Tailscale:
- `supervisor` at `koala:30320`: skill workers (`tdd_red/green/refactor`, `review`, `debug`, `spec`, `retrospective`, `trainer`, `tier`).
- `brain` at `koala:30330`: knowledge access (`brain_query`, `brain_write`, `brain_ingest`, `brain_ingest_raw`) and `session_log`. Hosted by the ingestion service directly, no separate pod.
No local binary or stdio shim is required — Claude Code talks to both via HTTP.
Open Claude Code in your project — run /mcp to confirm both servers are listed.
A typical TDD session
1. Call tdd_red → spec in, failing test file out
2. Call tdd_green → test path in, implementation out
3. Call tdd_refactor → impl + test in, cleaned code out
4. Call session_log → log each phase result
5. Call retrospective → extracts learnings → brain/raw/
6. Review brain/raw/, move worthy notes to brain/wiki/concepts/
7. Future sessions: call brain_query to retrieve relevant context
Tier detection
The supervisor probes connectivity at call time:
| Tier | Label | Condition |
|---|---|---|
| 1 | full-online | Can reach api.anthropic.com |
| 2 | lan-only | Can reach LiteLLM but not Anthropic |
| 3 | airplane | No external connectivity |
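The tier table can be read as a simple decision over two reachability probes. The sketch below is one way to express it in Go, under assumptions: the function names are hypothetical, the probe is a plain HTTP HEAD with a timeout, and the LiteLLM URL is a placeholder standing in for `LITELLM_BASE_URL`. How the supervisor actually probes is not shown in this README.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// Tier maps the two probe results onto the tier table:
// Anthropic reachable -> 1, only LiteLLM reachable -> 2, neither -> 3.
func Tier(anthropicOK, litellmOK bool) (int, string) {
	switch {
	case anthropicOK:
		return 1, "full-online"
	case litellmOK:
		return 2, "lan-only"
	default:
		return 3, "airplane"
	}
}

// reachable reports whether url returns any HTTP response within timeout.
// Any status code counts; only transport errors count as unreachable.
func reachable(url string, timeout time.Duration) bool {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Head(url)
	if err != nil {
		return false
	}
	resp.Body.Close()
	return true
}

func main() {
	// Placeholder LiteLLM address; in practice this would come from LITELLM_BASE_URL.
	n, label := Tier(
		reachable("https://api.anthropic.com", 2*time.Second),
		reachable("http://localhost:4000", 2*time.Second),
	)
	fmt.Println(n, label)
}
```

Keeping the classification separate from the probing makes the table logic testable without network access.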
Key env vars
| Variable | Default | Purpose |
|---|---|---|
| `INGEST_BRAIN_DIR` | `../brain` | Brain directory for ingestion server |
| `INGEST_PORT` | `3300` | Ingestion server port |
| `SUPERVISOR_CONFIG_DIR` | `./config/supervisor` | Skill discipline files |
| `SUPERVISOR_SESSIONS_DIR` | `./brain/sessions` | JSONL session logs |
| `INGEST_BASE_URL` | `http://localhost:3300` | Supervisor → ingestion |
| `LITELLM_BASE_URL` | (none) | LiteLLM proxy for Tier 2 model routing |
| `SUPERVISOR_MCP_TOKEN` | (none) | Optional bearer token for the supervisor MCP HTTP endpoint; when empty, no auth is enforced |
| `ROUTING_PORT` | `3210` | Routing pod's listen port |
| `ROUTING_MCP_TOKEN` | (none) | Optional bearer token for the routing MCP HTTP endpoint |
| `BRAIN_URL` | `http://ingestion.supervisor:3300` | Routing pod → brain (in-cluster) |
| `HYPERGUILD_FAST_MODEL` | `koala/qwen35-9b-fast` | Fast model for high-pass-rate skill calls |
| `HYPERGUILD_THINKING_MODEL` | `iguana/gemma4-26b` | Thinking model for low-pass-rate skill calls |
| `HYPERGUILD_ROUTE_LOCAL_FLOOR` | `0.90` | At/above pass rate, route to fast model |
| `HYPERGUILD_ROUTE_LOCAL_CEIL` | `0.70` | Below pass rate, route to thinking model. Between CEIL and FLOOR is the sample band. |
| `HYPERGUILD_PASS_RATE_TTL_SECONDS` | `60` | Per-skill pass-rate cache TTL |
Operator note: LiteLLM at `LITELLM_BASE_URL` must register both `HYPERGUILD_FAST_MODEL` and `HYPERGUILD_THINKING_MODEL` for routing to do useful work. If a model is missing, LiteLLM returns 4xx, the routing pod's fast route fails, the fail-open retry on the thinking model likely also fails (since both are missing), and the only signal is `final_status: "fail"` on `_routing` entries in the brain.
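The `HYPERGUILD_ROUTE_LOCAL_FLOOR` and `HYPERGUILD_ROUTE_LOCAL_CEIL` thresholds can be illustrated with a small Go sketch. The `Decide` and `envFloat` helpers and the `Route` type are hypothetical. What the routing pod actually does inside the sample band is not described in this README, so the sketch only labels that case rather than picking a model.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Route is a hypothetical label for the routing outcome.
type Route string

const (
	RouteFast     Route = "fast"     // HYPERGUILD_FAST_MODEL
	RouteThinking Route = "thinking" // HYPERGUILD_THINKING_MODEL
	RouteSample   Route = "sample"   // between CEIL and FLOOR; behavior not specified here
)

// envFloat reads a float from the environment, falling back to def when the
// variable is unset or not a valid number.
func envFloat(key string, def float64) float64 {
	if v, err := strconv.ParseFloat(os.Getenv(key), 64); err == nil {
		return v
	}
	return def
}

// Decide applies the documented thresholds: at or above floor goes to the
// fast model, below ceil goes to the thinking model, anything else falls in
// the sample band.
func Decide(passRate, floor, ceil float64) Route {
	switch {
	case passRate >= floor:
		return RouteFast
	case passRate < ceil:
		return RouteThinking
	default:
		return RouteSample
	}
}

func main() {
	floor := envFloat("HYPERGUILD_ROUTE_LOCAL_FLOOR", 0.90)
	ceil := envFloat("HYPERGUILD_ROUTE_LOCAL_CEIL", 0.70)
	for _, rate := range []float64{0.95, 0.80, 0.50} {
		fmt.Printf("pass rate %.2f -> %s\n", rate, Decide(rate, floor, ceil))
	}
}
```

Note the naming: FLOOR is the higher threshold (the floor for routing to the fast model) and CEIL the lower one (the ceiling for routing to the thinking model), so with the defaults the sample band is the range from 0.70 up to, but not including, 0.90.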
Phase 2 (planned)
- `review` skill: structured code review with iron law enforcement
- `debug` skill: hypothesis-driven debugging sessions
- `spec` skill: generates specs from conversations
- `trainer`: extracts SFT/DPO pairs from session logs for fine-tuning