hyperguild/docs/superpowers/specs/2026-04-17-hyperguild-design.md
Mathias Bergqvist f76c150041 docs: add hyperguild architecture design spec
Full vision for the hyperguild SDO: monorepo structure, two-layer brain
(declarative wiki + parametric training data), operating tiers, MCP tool
surface, session log format, retrospective + trainer workers, and four
implementation phases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 19:53:32 +02:00

Hyperguild — Design Spec

Date: 2026-04-17
Status: approved
Author: Mathias + Claude


Overview

The hyperguild is a local Software Development Organization (SDO) of AI workers, running across the Tailscale mesh (koala, iguana, flamingo). It exposes skill workers as MCP tools that Claude Code can call natively during coding sessions. Workers share a two-layer organizational brain — a declarative wiki and a parametric training data store — and operate under a common set of protocols and discipline files.

The hyperguild is designed for graceful degradation across operating tiers: Claude-orchestrated at full power, Ollama-driven on LAN-only, and minimally functional in airplane mode. The organizational shape (protocols, discipline files, tool signatures, output contracts) stays constant across all tiers — only the reasoning quality changes.

Inspired by Senge's five disciplines: personal mastery (discipline files), mental models (organizational CLAUDE.md), shared vision (protocols.md), team learning (retrospective worker), systems thinking (tier-aware routing and the learning loop).


Architecture

┌─────────────────────────────────────────────────────┐
│  Claude Code (orchestrator)                         │
│  Tier 1: Claude strategic lead                      │
│  Tier 2: Ollama orchestrator (LAN)                  │
│  Tier 3: Ollama, flamingo only                      │
└────────────────────┬────────────────────────────────┘
                     │ MCP (single endpoint)
                     ▼
┌─────────────────────────────────────────────────────┐
│  supervisor (Go, flamingo :3200)                    │
│  ├── skill tools: tdd, review, debug, spec          │
│  ├── session tools: session_log                     │
│  ├── brain tools: brain_query, brain_write          │
│  ├── org tools: tier                                │
│  └── tier detection + model routing                 │
└──────┬───────────────────────┬──────────────────────┘
       │ claude --print        │ HTTP (internal)
       │ or LiteLLM            │
       ▼                       ▼
┌──────────────┐    ┌─────────────────────────────────┐
│  Workers     │    │  ingestion (Go, :3300)           │
│  (subproc)   │    │  ├── /query  → Qdrant search    │
│              │    │  ├── /write  → brain/raw/        │
│  reads:      │    │  └── /ingest → raw → wiki       │
│  - CLAUDE.md │    └──────────────┬──────────────────┘
│  - skill.md  │                   │
│  - protocols │    ┌──────────────▼──────────────────┐
└──────────────┘    │  brain/                         │
                    │  ├── wiki/ (declarative)        │
                    │  │   ├── concepts/              │
                    │  │   ├── entities/              │
                    │  │   └── sources/               │
                    │  ├── raw/ (pending ingestion)   │
                    │  └── training-data/             │
                    │       ├── sft/                  │
                    │       ├── dpo/                  │
                    │       └── rl/                   │
                    └─────────────────────────────────┘

Monorepo Structure

Repository renamed from supervisor to hyperguild. Two Go modules (supervisor and ingestion) with shared HTTP contracts, alongside the brain/ and config/ directories.

hyperguild/
├── supervisor/                  ← Go module: MCP server
│   ├── cmd/supervisor/
│   ├── internal/
│   │   ├── config/              ← env + models.yaml (existing)
│   │   ├── exec/                ← spawns workers (existing)
│   │   ├── mcp/                 ← JSON-RPC server (existing)
│   │   ├── registry/            ← skill routing (existing)
│   │   ├── tier/                ← NEW: tier detection
│   │   ├── session/             ← NEW: session log
│   │   └── skills/
│   │       ├── tdd/             ← existing
│   │       ├── review/          ← NEW
│   │       ├── debug/           ← NEW
│   │       ├── spec/            ← NEW
│   │       ├── retrospective/   ← NEW
│   │       └── trainer/         ← NEW
│   └── go.mod
├── ingestion/                   ← Go module: brain processing
│   ├── cmd/
│   │   ├── ingest/              ← existing CLI
│   │   └── server/              ← NEW: HTTP API for supervisor
│   ├── internal/
│   │   ├── extract/             ← existing
│   │   ├── llm/                 ← existing
│   │   ├── slug/                ← existing
│   │   └── wiki/                ← existing
│   └── go.mod
├── brain/
│   ├── wiki/
│   │   ├── concepts/
│   │   ├── entities/
│   │   └── sources/
│   ├── raw/
│   ├── sessions/                ← session logs: {session_id}.jsonl
│   └── training-data/
│       ├── sft/                 ← JSONL: {messages: [...]}
│       ├── dpo/                 ← JSONL: {prompt, chosen, rejected}
│       └── rl/                  ← JSONL: {trajectory: [...], reward}
├── config/
│   ├── supervisor/
│   │   ├── CLAUDE.md            ← orchestration rules (existing)
│   │   ├── tdd.md               ← existing
│   │   ├── review.md            ← NEW
│   │   ├── debug.md             ← NEW
│   │   ├── spec.md              ← NEW
│   │   ├── retrospective.md     ← NEW
│   │   ├── trainer.md           ← NEW
│   │   └── protocols.md         ← NEW: The Hyperguild Way
│   └── models.yaml
└── Taskfile.yml

Operating Tiers

Tier detection runs at supervisor startup and is re-evaluated every time the tier tool is called.

| Tier | Condition | Orchestrator | Brain | Managed Agents |
|---|---|---|---|---|
| 1: Full online | Tailscale + internet reachable | Claude (API) | Read + Write | Available |
| 2: LAN only | koala/iguana reachable, no internet | Ollama via LiteLLM | Read + Write | Unavailable |
| 3: Airplane | flamingo only | Small local model | Read-only snapshot | Unavailable |

Detection logic (in internal/tier):

  1. Probe https://api.anthropic.com with 2s timeout → Tier 1 if reachable
  2. Probe LiteLLM at LITELLM_BASE_URL → Tier 2 if reachable
  3. Default → Tier 3

Tier is returned by the tier MCP tool as:

{
  "tier": 2,
  "label": "lan-only",
  "available_models": ["ollama/qwen3-coder-30b-tuned"],
  "managed_agents": false
}
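
A caller could decode this response into a typed struct, as in the sketch below. The field names come from the example above; the `TierInfo` and `CanEscalate` names are hypothetical, and the escalation rule simply restates that Managed Agents are a Tier 1 capability.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TierInfo mirrors the fields of the tier tool response shown above.
type TierInfo struct {
	Tier            int      `json:"tier"`
	Label           string   `json:"label"`
	AvailableModels []string `json:"available_models"`
	ManagedAgents   bool     `json:"managed_agents"`
}

// ParseTierInfo decodes a raw tier response.
func ParseTierInfo(raw []byte) (TierInfo, error) {
	var info TierInfo
	err := json.Unmarshal(raw, &info)
	return info, err
}

// CanEscalate is a hypothetical helper: escalation is only an option
// when the response reports Tier 1 with managed_agents set.
func CanEscalate(info TierInfo) bool {
	return info.Tier == 1 && info.ManagedAgents
}

func main() {
	raw := []byte(`{
		"tier": 2,
		"label": "lan-only",
		"available_models": ["ollama/qwen3-coder-30b-tuned"],
		"managed_agents": false
	}`)
	info, err := ParseTierInfo(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(info.Label, CanEscalate(info))
}
```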

MCP Tool Surface

Skill tools

| Tool | Required | Optional | Phase |
|---|---|---|---|
| tdd_red | project_root, spec | model, test_cmd | Existing |
| tdd_green | project_root, test_path | model, test_cmd | Existing |
| tdd_refactor | project_root, test_path, impl_path | model, test_cmd | Existing |
| review | project_root, diff | model, spec_path | Phase 2 |
| debug | project_root, error | model, test_cmd | Phase 2 |
| spec | requirement | model, context | Phase 2 |
| retrospective | session_id | model | Phase 1 |
| trainer | session_id | model, skill | Phase 2 |

Brain tools

| Tool | Required | Optional | Returns |
|---|---|---|---|
| brain_query | query | domain, limit | Array of wiki excerpts with scores |
| brain_write | content, type | domain, title | Confirmation + path written |

Session tools

| Tool | Required | Optional | Effect |
|---|---|---|---|
| session_log | session_id, entry | skill, outcome | Appends to session log JSONL |

Org tools

| Tool | Required | Returns |
|---|---|---|
| tier | (none) | Current tier, available models, capabilities |

Session Log Format

The session log is the raw material for both the retrospective (declarative brain) and trainer (parametric brain) workers. Every entry is one JSONL record stored on a single line; the example below is pretty-printed for readability:

{
  "session_id": "2026-04-17-tdd-http-client",
  "timestamp": "2026-04-17T14:23:01Z",
  "skill": "tdd_green",
  "phase": "green",
  "project_root": "/Users/mathias/dev/myproject",
  "input": { "test_path": "internal/client/client_test.go" },
  "attempts": [
    {
      "attempt": 1,
      "model": "ollama/qwen3-coder-30b-tuned",
      "output_summary": "wrote minimal implementation",
      "runner_output": "FAIL: TestRetry (exit 1)",
      "verified": false
    },
    {
      "attempt": 2,
      "model": "ollama/qwen3-coder-30b-tuned",
      "output_summary": "fixed off-by-one in retry count",
      "runner_output": "ok github.com/... (exit 0)",
      "verified": true
    }
  ],
  "final_status": "pass",
  "file_path": "internal/client/client.go",
  "model_used": "ollama/qwen3-coder-30b-tuned",
  "duration_ms": 42300
}

Session logs live at brain/sessions/{session_id}.jsonl. Each skill invocation appends one entry.


Two-Layer Brain

Layer 1 — Declarative (wiki)

Human-readable, Obsidian-compatible wiki. The Go skill handler queries it before each task and injects the results into the worker prompt. The retrospective worker contributes new entries at session end.

Retrospective worker flow:

  1. Read session log for session_id
  2. Query brain for related existing pages
  3. Identify what's novel (patterns, decisions, failures worth keeping)
  4. Write structured entries to brain/raw/
  5. Flag entries tagged significant for Mathias review before ingestion
  6. Trigger ingestion-svc /ingest for untagged entries
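
Steps 5 and 6 amount to splitting the new entries by tag. A minimal sketch, assuming a hypothetical `RawEntry` type and that the tag is the literal string "significant":

```go
package main

import "fmt"

// RawEntry is a hypothetical stand-in for an entry written to brain/raw/.
type RawEntry struct {
	Title string
	Tags  []string
}

// hasTag reports whether the entry carries the given tag.
func hasTag(e RawEntry, tag string) bool {
	for _, t := range e.Tags {
		if t == tag {
			return true
		}
	}
	return false
}

// Partition separates entries held for human review from entries
// that can go straight to ingestion.
func Partition(entries []RawEntry) (review, ingest []RawEntry) {
	for _, e := range entries {
		if hasTag(e, "significant") {
			review = append(review, e)
		} else {
			ingest = append(ingest, e)
		}
	}
	return review, ingest
}

func main() {
	review, ingest := Partition([]RawEntry{
		{Title: "retry backoff decision", Tags: []string{"significant"}},
		{Title: "table-driven test pattern"},
	})
	fmt.Println(len(review), "held for review,", len(ingest), "ready for /ingest")
}
```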

Layer 2 — Parametric (training data)

Machine-readable JSONL, organized by learning strategy. Fed into fine-tuning pipelines on koala (Axolotl / LLaMA-Factory / Unsloth).

Data the guild generates naturally:

| Type | Source | Ground truth |
|---|---|---|
| SFT | Verified TDD cycles: (spec→test→impl) | Exit code ✓ |
| SFT | Debug sessions: (error→hypothesis→fix) | Exit code ✓ |
| DPO | Multi-attempt sessions: failed vs. passing attempt | Exit code ✓ |
| DPO | Review cycles: original vs. revised code | Mathias approval |
| RL | Full TDD trajectories: state→action→reward | Exit code = reward |

Trainer worker flow (Phase 2):

  1. Read session log for session_id
  2. Extract SFT examples from verified single-attempt completions
  3. Extract DPO pairs from multi-attempt sessions (rejected = failed attempts)
  4. Extract RL trajectories from full sessions (this step is scheduled for Phase 3)
  5. Write to brain/training-data/{type}/{skill}-{date}.jsonl
  6. Flag RL trajectories for Mathias review before use in training
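
Steps 2 and 3 can be illustrated with a small extraction sketch. It assumes the full text of each attempt is available as `Output` (the session log example only stores output_summary), and the `IsSFTCandidate` and `ExtractDPO` names are hypothetical. The `DPOPair` fields follow the {prompt, chosen, rejected} shape listed for brain/training-data/dpo/.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Attempt is a reduced view of a session log attempt.
// Output is an assumption: the log example only records a summary.
type Attempt struct {
	Output   string
	Verified bool
}

// DPOPair matches the {prompt, chosen, rejected} JSONL shape.
type DPOPair struct {
	Prompt   string `json:"prompt"`
	Chosen   string `json:"chosen"`
	Rejected string `json:"rejected"`
}

// IsSFTCandidate is true for a verified single-attempt completion.
func IsSFTCandidate(attempts []Attempt) bool {
	return len(attempts) == 1 && attempts[0].Verified
}

// ExtractDPO pairs the first verified attempt with every failed attempt.
// It returns nil when no attempt was verified, since there is no "chosen" side.
func ExtractDPO(prompt string, attempts []Attempt) []DPOPair {
	chosen, found := "", false
	for _, a := range attempts {
		if a.Verified {
			chosen, found = a.Output, true
			break
		}
	}
	if !found {
		return nil
	}
	var pairs []DPOPair
	for _, a := range attempts {
		if !a.Verified {
			pairs = append(pairs, DPOPair{Prompt: prompt, Chosen: chosen, Rejected: a.Output})
		}
	}
	return pairs
}

func main() {
	attempts := []Attempt{
		{Output: "wrote minimal implementation", Verified: false},
		{Output: "fixed off-by-one in retry count", Verified: true},
	}
	for _, p := range ExtractDPO("make TestRetry pass", attempts) {
		line, err := json.Marshal(p)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(line)) // one JSONL line per pair
	}
}
```

Because `verified` is set from the runner's exit code, the chosen/rejected labels need no human judgment for these sessions.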

When RL applies: Discrete, verifiable tasks with clear reward signals — TDD phases, debug sessions with passing exit code. Not for open-ended tasks (spec writing, review) where the reward is fuzzy — those use DPO from Mathias feedback.


Organizational Protocols

Responsibilities are split between the Go skill handler (the wrapper) and the worker subprocess (the claude --print process). Workers have only the Bash, Read, and Write tools, so they cannot call MCP tools directly.

Go skill handler responsibilities (before/after subprocess)

  • Before spawning: call brain_query for context relevant to the task; inject results into the worker prompt
  • After result: call session_log with the full trajectory (input, attempts, outcome, duration)
  • On tier change: check tier and adjust model selection accordingly

Worker subprocess responsibilities (config/supervisor/protocols.md)

Injected into every worker prompt alongside the skill discipline file:

  • Output contract: every response is raw JSON matching the response schema — no preamble, no prose
  • Quality gate: verified: true only when the subprocess exit code confirms it — never self-assess
  • Escalation: if stuck after 3 attempts, return status: error with reason — do not retry silently
  • Offline behavior: if context from the brain is absent from the prompt, proceed with discipline file only and note the gap in message
  • Handoff: structure output so the next worker in a chain can consume it without transformation
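
The output contract, quality gate, and escalation rules are written for the worker, but a wrapper can also check them from the outside. The sketch below shows one way to do that; the `Result` fields, the `RunWithGate` name, and enforcing the three-attempt limit in the wrapper rather than in the worker are all assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Result is a hypothetical response schema with the fields the protocols mention.
type Result struct {
	Status   string `json:"status"`
	Verified bool   `json:"verified"`
	Message  string `json:"message,omitempty"`
	Reason   string `json:"reason,omitempty"`
}

const maxAttempts = 3

// RunWithGate calls attempt up to three times. Each call returns the worker's
// raw output and the exit code of the verifying subprocess.
func RunWithGate(attempt func(n int) (raw []byte, exitCode int)) Result {
	for n := 1; n <= maxAttempts; n++ {
		raw, code := attempt(n)

		// Output contract: the whole response must be one JSON value.
		// json.Unmarshal rejects leading prose and trailing text alike.
		var r Result
		if err := json.Unmarshal(raw, &r); err != nil {
			continue
		}

		// Quality gate: ignore the worker's self-assessment and
		// derive verified from the exit code alone.
		r.Verified = code == 0
		if r.Verified {
			return r
		}
	}
	// Escalation: surface the failure instead of retrying silently.
	return Result{Status: "error", Reason: "not verified after 3 attempts"}
}

func main() {
	outputs := []struct {
		raw  string
		code int
	}{
		{"Sure, here is the result: {}", 0},        // violates the output contract
		{`{"status":"ok","verified":true}`, 1},     // self-assessed, but the run failed
		{`{"status":"ok","verified":false}`, 0},    // the exit code confirms success
	}
	r := RunWithGate(func(n int) ([]byte, int) {
		o := outputs[n-1]
		return []byte(o.raw), o.code
	})
	fmt.Println(r.Status, r.Verified)
}
```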

Managed Agents Integration (Tier 1 only)

Claude Managed Agents ($0.08/session-hour + tokens) is available in Tier 1 for:

  • Sessions requiring real production credentials (scoped permissions via Managed Agents)
  • Long autonomous runs (hours) that would time out a local subprocess
  • Work requiring a full audit trail for compliance or client delivery

The supervisor does not call Managed Agents directly. Claude Code (the orchestrator) calls tier, sees managed_agents: true, and then decides from the task description whether the task warrants escalation.


Implementation Phases

Phase 1 — Foundation (this spec → implementation plan)

  1. Rename repo to hyperguild, migrate supervisor and ingestion-svc into monorepo
  2. Add ingestion/cmd/server — HTTP API: /query, /write, /ingest
  3. Add internal/tier — tier detection with probing logic
  4. Add internal/session — session log (append JSONL, read by session_id)
  5. Add brain_query, brain_write, session_log, tier MCP tools
  6. Add retrospective skill + config/supervisor/retrospective.md
  7. Write config/supervisor/protocols.md
  8. Update existing skill handlers to call session_log after each invocation
  9. Wire brain/training-data/ directory structure
  10. Update .context/mcp.json and Taskfile

Success criteria: A TDD session automatically produces a session log. Running retrospective writes structured entries to brain/raw/. brain_query returns relevant wiki content. Tier detection correctly identifies Tier 1 vs. Tier 2.

Phase 2 — SDO Skills + Trainer

  • review, debug, spec skills with discipline files
  • trainer skill: SFT and DPO extraction from session logs
  • brain_write with Mathias review gate for significant entries

Phase 3 — Parametric Learning

  • RL trajectory extraction
  • Integration with Axolotl/LLaMA-Factory on koala
  • Fine-tuning pipeline: training-data/ → model checkpoint → llama-swap → live workers

Phase 4 — Tier 3 + Airplane Mode

  • Brain snapshot tooling (pull wiki to flamingo for offline read)
  • Minimal worker set that runs on flamingo without koala
  • Degraded mode detection and graceful capability reduction

Deferred

  • Event-driven architecture (Option C): revisit when worker roster reaches ~8+ skills
  • A2A protocol: Anthropic hasn't shipped it in Claude Code yet — revisit H2 2026
  • Generic phase engine: saved in project memory, revisit at 4+ skills
  • Authentication: Tailscale mesh only for now