Mathias Bergqvist f76c150041 docs: add hyperguild architecture design spec

Full vision for the hyperguild SDO: monorepo structure, two-layer brain
(declarative wiki + parametric training data), operating tiers, MCP tool
surface, session log format, retrospective + trainer workers, and four
implementation phases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-17 19:53:32 +02:00

Hyperguild — Design Spec

Date: 2026-04-17
Status: approved
Author: Mathias + Claude

Overview

The hyperguild is a local Software Development Organization (SDO) of AI workers, running across the Tailscale mesh (koala, iguana, flamingo). It exposes skill workers as MCP tools that Claude Code can call natively during coding sessions. Workers share a two-layer organizational brain — a declarative wiki and a parametric training data store — and operate under a common set of protocols and discipline files.

The hyperguild is designed for graceful degradation across operating tiers: Claude-orchestrated at full power, Ollama-driven on LAN-only, and minimally functional in airplane mode. The organizational shape (protocols, discipline files, tool signatures, output contracts) stays constant across all tiers — only the reasoning quality changes.

The design draws on Senge's five disciplines, each mapped to a component: personal mastery (discipline files), mental models (organizational CLAUDE.md), shared vision (protocols.md), team learning (retrospective worker), and systems thinking (tier-aware routing and the learning loop).

Architecture

┌─────────────────────────────────────────────────────┐
│  Claude Code (orchestrator)                         │
│  Tier 1: Claude strategic lead                      │
│  Tier 2: Ollama orchestrator (LAN)                  │
│  Tier 3: Ollama, flamingo only                      │
└────────────────────┬────────────────────────────────┘
                     │ MCP (single endpoint)
                     ▼
┌─────────────────────────────────────────────────────┐
│  supervisor (Go, flamingo :3200)                    │
│  ├── skill tools: tdd, review, debug, spec          │
│  ├── session tools: session_log                     │
│  ├── brain tools: brain_query, brain_write          │
│  ├── org tools: tier                                │
│  └── tier detection + model routing                 │
└──────┬───────────────────────┬──────────────────────┘
       │ claude --print        │ HTTP (internal)
       │ or LiteLLM            │
       ▼                       ▼
┌──────────────┐    ┌─────────────────────────────────┐
│  Workers     │    │  ingestion (Go, :3300)          │
│  (subproc)   │    │  ├── /query  → Qdrant search    │
│              │    │  ├── /write  → brain/raw/       │
│  reads:      │    │  └── /ingest → raw → wiki       │
│  - CLAUDE.md │    └──────────────┬──────────────────┘
│  - skill.md  │                   │
│  - protocols │    ┌──────────────▼──────────────────┐
└──────────────┘    │  brain/                         │
                    │  ├── wiki/ (declarative)        │
                    │  │   ├── concepts/              │
                    │  │   ├── entities/              │
                    │  │   └── sources/               │
                    │  ├── raw/ (pending ingestion)   │
                    │  └── training-data/             │
                    │      ├── sft/                   │
                    │      ├── dpo/                   │
                    │      └── rl/                    │
                    └─────────────────────────────────┘

Monorepo Structure

The repository is renamed from supervisor to hyperguild. It holds two Go modules (supervisor and ingestion) that share HTTP contracts.

hyperguild/
├── supervisor/                  ← Go module: MCP server
│   ├── cmd/supervisor/
│   ├── internal/
│   │   ├── config/              ← env + models.yaml (existing)
│   │   ├── exec/                ← spawns workers (existing)
│   │   ├── mcp/                 ← JSON-RPC server (existing)
│   │   ├── registry/            ← skill routing (existing)
│   │   ├── tier/                ← NEW: tier detection
│   │   ├── session/             ← NEW: session log
│   │   └── skills/
│   │       ├── tdd/             ← existing
│   │       ├── review/          ← NEW
│   │       ├── debug/           ← NEW
│   │       ├── spec/            ← NEW
│   │       ├── retrospective/   ← NEW
│   │       └── trainer/         ← NEW
│   └── go.mod
├── ingestion/                   ← Go module: brain processing
│   ├── cmd/
│   │   ├── ingest/              ← existing CLI
│   │   └── server/              ← NEW: HTTP API for supervisor
│   ├── internal/
│   │   ├── extract/             ← existing
│   │   ├── llm/                 ← existing
│   │   ├── slug/                ← existing
│   │   └── wiki/                ← existing
│   └── go.mod
├── brain/
│   ├── wiki/
│   │   ├── concepts/
│   │   ├── entities/
│   │   └── sources/
│   ├── raw/
│   ├── sessions/                ← session logs: {session_id}.jsonl
│   └── training-data/
│       ├── sft/                 ← JSONL: {messages: [...]}
│       ├── dpo/                 ← JSONL: {prompt, chosen, rejected}
│       └── rl/                  ← JSONL: {trajectory: [...], reward}
├── config/
│   ├── supervisor/
│   │   ├── CLAUDE.md            ← orchestration rules (existing)
│   │   ├── tdd.md               ← existing
│   │   ├── review.md            ← NEW
│   │   ├── debug.md             ← NEW
│   │   ├── spec.md              ← NEW
│   │   ├── retrospective.md     ← NEW
│   │   ├── trainer.md           ← NEW
│   │   └── protocols.md         ← NEW: The Hyperguild Way
│   └── models.yaml
└── Taskfile.yml

Operating Tiers

Tier detection runs at supervisor startup and is re-evaluated each time the tier tool is called.

Tier	Condition	Orchestrator	Brain	Managed Agents
1 — Full online	Tailscale + internet reachable	Claude (API)	Read + Write	Available
2 — LAN only	koala/iguana reachable, no internet	Ollama via LiteLLM	Read + Write	Unavailable
3 — Airplane	flamingo only	Small local model	Read-only snapshot	Unavailable

Detection logic (in internal/tier):

1. Probe https://api.anthropic.com with a 2s timeout → Tier 1 if reachable
2. Otherwise, probe LiteLLM at LITELLM_BASE_URL → Tier 2 if reachable
3. Otherwise → Tier 3 (default)

Tier is returned by the tier MCP tool as:

{
  "tier": 2,
  "label": "lan-only",
  "available_models": ["ollama/qwen3-coder-30b-tuned"],
  "managed_agents": false
}
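The probe order and the response shape above can be sketched in Go as follows. This is a minimal sketch under stated assumptions, not the contents of internal/tier: the names Probe, NewHTTPProbe, and Detect are hypothetical, "reachable" is taken to mean "any HTTP response arrived", the same timeout is reused for the LiteLLM probe, and only the "lan-only" label comes from this spec (the other two labels are placeholders). The model list is left empty because it would come from models.yaml.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"time"
)

// Probe reports whether a URL answered at all. It is a plain function so the
// decision logic can be exercised without network access.
type Probe func(url string) bool

// TierInfo mirrors the JSON returned by the tier tool.
type TierInfo struct {
	Tier            int      `json:"tier"`
	Label           string   `json:"label"`
	AvailableModels []string `json:"available_models"`
	ManagedAgents   bool     `json:"managed_agents"`
}

// NewHTTPProbe treats any HTTP response, whatever its status code, as
// "reachable" (an assumption; the spec does not define reachability).
func NewHTTPProbe(timeout time.Duration) Probe {
	client := &http.Client{Timeout: timeout}
	return func(url string) bool {
		resp, err := client.Get(url)
		if err != nil {
			return false
		}
		resp.Body.Close()
		return true
	}
}

// Detect applies the probe order from the spec: Anthropic API first, then
// LiteLLM, otherwise Tier 3. The labels other than "lan-only" are placeholders.
func Detect(probe Probe, litellmURL string) TierInfo {
	info := TierInfo{Tier: 3, Label: "airplane", AvailableModels: []string{}}
	switch {
	case probe("https://api.anthropic.com"):
		info.Tier, info.Label, info.ManagedAgents = 1, "full-online", true
	case litellmURL != "" && probe(litellmURL):
		info.Tier, info.Label = 2, "lan-only"
	}
	return info
}

func main() {
	info := Detect(NewHTTPProbe(2*time.Second), os.Getenv("LITELLM_BASE_URL"))
	out, _ := json.MarshalIndent(info, "", "  ")
	fmt.Println(string(out))
}
```

Passing the probe in as a function keeps the tier decision separate from the network, so it can be tested with stubbed probes.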

MCP Tool Surface

Skill tools

Tool	Required	Optional	Phase
`tdd_red`	`project_root`, `spec`	`model`, `test_cmd`	Existing
`tdd_green`	`project_root`, `test_path`	`model`, `test_cmd`	Existing
`tdd_refactor`	`project_root`, `test_path`, `impl_path`	`model`, `test_cmd`	Existing
`review`	`project_root`, `diff`	`model`, `spec_path`	Phase 2
`debug`	`project_root`, `error`	`model`, `test_cmd`	Phase 2
`spec`	`requirement`	`model`, `context`	Phase 2
`retrospective`	`session_id`	`model`	Phase 1
`trainer`	`session_id`	`model`, `skill`	Phase 2

Brain tools

Tool	Required	Optional	Returns
`brain_query`	`query`	`domain`, `limit`	Array of wiki excerpts with scores
`brain_write`	`content`, `type`	`domain`, `title`	Confirmation + path written

Session tools

Tool	Required	Optional	Effect
`session_log`	`session_id`, `entry`	`skill`, `outcome`	Appends to session log JSONL

Org tools

Tool	Required	Returns
`tier`	—	Current tier, available models, capabilities
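One way to read the tables above is as a required-argument registry that the supervisor could check before dispatching a call. The sketch below is illustrative only: the spec does not say how the supervisor validates arguments, and the names required and MissingArgs are hypothetical. The argument lists themselves are copied from the tables.

```go
package main

import (
	"fmt"
	"sort"
)

// required lists the mandatory arguments per tool, copied from the tables.
var required = map[string][]string{
	"tdd_red":       {"project_root", "spec"},
	"tdd_green":     {"project_root", "test_path"},
	"tdd_refactor":  {"project_root", "test_path", "impl_path"},
	"review":        {"project_root", "diff"},
	"debug":         {"project_root", "error"},
	"spec":          {"requirement"},
	"retrospective": {"session_id"},
	"trainer":       {"session_id"},
	"brain_query":   {"query"},
	"brain_write":   {"content", "type"},
	"session_log":   {"session_id", "entry"},
	"tier":          {},
}

// MissingArgs returns the required arguments absent from args, sorted by
// name. The boolean is false when the tool itself is unknown.
func MissingArgs(tool string, args map[string]any) ([]string, bool) {
	req, ok := required[tool]
	if !ok {
		return nil, false
	}
	missing := []string{}
	for _, name := range req {
		if _, present := args[name]; !present {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing, true
}

func main() {
	missing, known := MissingArgs("tdd_refactor", map[string]any{"project_root": "/tmp/p"})
	fmt.Println(known, missing) // true [impl_path test_path]
}
```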

Session Log Format

The session log is the raw material for both the retrospective worker (declarative brain) and the trainer worker (parametric brain). Every entry is one JSONL record stored on a single line; the example below is pretty-printed for readability:

{
  "session_id": "2026-04-17-tdd-http-client",
  "timestamp": "2026-04-17T14:23:01Z",
  "skill": "tdd_green",
  "phase": "green",
  "project_root": "/Users/mathias/dev/myproject",
  "input": { "test_path": "internal/client/client_test.go" },
  "attempts": [
    {
      "attempt": 1,
      "model": "ollama/qwen3-coder-30b-tuned",
      "output_summary": "wrote minimal implementation",
      "runner_output": "FAIL: TestRetry (exit 1)",
      "verified": false
    },
    {
      "attempt": 2,
      "model": "ollama/qwen3-coder-30b-tuned",
      "output_summary": "fixed off-by-one in retry count",
      "runner_output": "ok github.com/... (exit 0)",
      "verified": true
    }
  ],
  "final_status": "pass",
  "file_path": "internal/client/client.go",
  "model_used": "ollama/qwen3-coder-30b-tuned",
  "duration_ms": 42300
}

Session logs live at brain/sessions/{session_id}.jsonl. Each skill invocation appends one entry.
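Appending one record per invocation could look roughly like the sketch below. It is not the contents of internal/session: the type and function names are hypothetical, the struct covers only a subset of the fields in the example record, and it assumes session_id is safe to use as a file name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Attempt and Entry cover a subset of the fields shown in the example record.
type Attempt struct {
	Attempt       int    `json:"attempt"`
	Model         string `json:"model"`
	OutputSummary string `json:"output_summary"`
	RunnerOutput  string `json:"runner_output"`
	Verified      bool   `json:"verified"`
}

type Entry struct {
	SessionID   string         `json:"session_id"`
	Timestamp   string         `json:"timestamp"`
	Skill       string         `json:"skill"`
	Input       map[string]any `json:"input"`
	Attempts    []Attempt      `json:"attempts"`
	FinalStatus string         `json:"final_status"`
	DurationMS  int64          `json:"duration_ms"`
}

// EncodeLine renders one entry as a single JSONL line. json.Marshal emits
// compact JSON with no raw newlines, so the only newline is the terminator.
func EncodeLine(e Entry) ([]byte, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

// Append adds the entry to {dir}/{session_id}.jsonl, creating the file if needed.
func Append(dir string, e Entry) error {
	line, err := EncodeLine(e)
	if err != nil {
		return err
	}
	path := filepath.Join(dir, e.SessionID+".jsonl")
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.Write(line)
	return err
}

func main() {
	dir, err := os.MkdirTemp("", "sessions") // stand-in for brain/sessions/
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	e := Entry{SessionID: "2026-04-17-tdd-http-client", Skill: "tdd_green", FinalStatus: "pass"}
	if err := Append(dir, e); err != nil {
		panic(err)
	}
	fmt.Println("appended one record to", filepath.Join(dir, e.SessionID+".jsonl"))
}
```

Opening the file with O_APPEND keeps each invocation's write at the end of the file, which matches the one-entry-per-invocation rule.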

Two-Layer Brain

Layer 1 — Declarative (wiki)

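Human-readable, Obsidian-compatible wiki. The Go skill handler queries it via brain_query before each task and injects the results into the worker prompt. The retrospective worker feeds it at session end, via brain/raw/ and ingestion.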

Retrospective worker flow:

1. Read the session log for session_id
2. Query the brain for related existing pages
3. Identify what's novel (patterns, decisions, failures worth keeping)
4. Write structured entries to brain/raw/
5. Flag entries tagged significant for Mathias's review before ingestion
6. Trigger ingestion-svc /ingest for the remaining entries (those not tagged significant)

Layer 2 — Parametric (training data)

Machine-readable JSONL, organized by learning strategy. Fed into fine-tuning pipelines on koala (Axolotl / LLaMA-Factory / Unsloth).

Data the guild generates naturally:

Type	Source	Ground truth
SFT	Verified TDD cycles: (spec→test→impl)	Exit code ✓
SFT	Debug sessions: (error→hypothesis→fix)	Exit code ✓
DPO	Multi-attempt sessions: failed vs. passing attempt	Exit code ✓
DPO	Review cycles: original vs. revised code	Mathias approval
RL	Full TDD trajectories: state→action→reward	Exit code = reward

Trainer worker flow (Phase 2):

1. Read the session log for session_id
2. Extract SFT examples from verified single-attempt completions
3. Extract DPO pairs from multi-attempt sessions (rejected = failed attempts, chosen = the passing attempt)
4. Extract RL trajectories from full sessions (deferred to Phase 3)
5. Write to brain/training-data/{type}/{skill}-{date}.jsonl
6. Flag RL trajectories for Mathias's review before use in training
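The SFT and DPO extraction rules in the flow above can be sketched as a pure function over one invocation's attempts. This is an illustration under assumptions, not the trainer's implementation: the function name Extract is hypothetical, the prompt is passed in because the session log format does not record the worker prompt, output_summary stands in for the full completion text, and the user/assistant role names are an assumed chat format (the spec only says {messages: [...]}).

```go
package main

import "fmt"

// Attempt carries the session-log attempt fields this sketch needs.
type Attempt struct {
	OutputSummary string
	Verified      bool
}

type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// SFT and DPO follow the record shapes noted in the monorepo tree.
type SFT struct {
	Messages []Message `json:"messages"`
}

type DPO struct {
	Prompt   string `json:"prompt"`
	Chosen   string `json:"chosen"`
	Rejected string `json:"rejected"`
}

// Extract turns one skill invocation into training records:
//   - a single verified attempt yields one SFT example;
//   - several attempts containing a verified one yield one DPO pair per
//     failed attempt, with the first verified attempt as "chosen".
// Invocations with no verified attempt yield nothing.
func Extract(prompt string, attempts []Attempt) ([]SFT, []DPO) {
	var chosen *Attempt
	for i := range attempts {
		if attempts[i].Verified {
			chosen = &attempts[i]
			break
		}
	}
	if chosen == nil {
		return nil, nil
	}
	if len(attempts) == 1 {
		msgs := []Message{
			{Role: "user", Content: prompt},
			{Role: "assistant", Content: chosen.OutputSummary},
		}
		return []SFT{{Messages: msgs}}, nil
	}
	var pairs []DPO
	for _, a := range attempts {
		if !a.Verified {
			pairs = append(pairs, DPO{Prompt: prompt, Chosen: chosen.OutputSummary, Rejected: a.OutputSummary})
		}
	}
	return nil, pairs
}

func main() {
	attempts := []Attempt{
		{OutputSummary: "wrote minimal implementation", Verified: false},
		{OutputSummary: "fixed off-by-one in retry count", Verified: true},
	}
	sft, dpo := Extract("make internal/client/client_test.go pass", attempts)
	fmt.Println(len(sft), len(dpo)) // 0 1
}
```

Because the verified flag is set from the exit code, the same flag serves as the ground truth for both record types.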

When RL applies: Discrete, verifiable tasks with clear reward signals — TDD phases, debug sessions with passing exit code. Not for open-ended tasks (spec writing, review) where the reward is fuzzy — those use DPO from Mathias feedback.

Organizational Protocols

Responsibilities are split between the Go skill handler (the wrapper) and the worker subprocess (the claude --print process). Workers only have Bash, Read, Write — they cannot call MCP tools directly.

Go skill handler responsibilities (before/after subprocess)

Before spawning: call brain_query for context relevant to the task; inject results into the worker prompt
After result: call session_log with the full trajectory (input, attempts, outcome, duration)
On tier change: check tier and adjust model selection accordingly
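The before/after responsibilities above amount to a wrapper around the subprocess call. A minimal sketch, assuming hypothetical names throughout (Handler, BuildPrompt, and the three function fields are not taken from the codebase), with the prompt layout and the "error" status value invented for illustration:

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Handler wires the three collaborators as plain functions so that each can
// be replaced in tests. All three names are hypothetical.
type Handler struct {
	QueryBrain func(query string) ([]string, error)
	RunWorker  func(prompt string) (string, error)
	LogSession func(entry map[string]any) error
}

// BuildPrompt prepends brain excerpts, when there are any, to the task.
func BuildPrompt(task string, excerpts []string) string {
	var b strings.Builder
	if len(excerpts) > 0 {
		b.WriteString("## Brain context\n")
		for _, e := range excerpts {
			b.WriteString("- " + e + "\n")
		}
		b.WriteString("\n")
	}
	b.WriteString("## Task\n" + task)
	return b.String()
}

// Handle runs one skill invocation: query the brain, spawn the worker, then
// log the outcome, whether or not the worker succeeded.
func (h Handler) Handle(skill, task string) (string, error) {
	start := time.Now()
	excerpts, err := h.QueryBrain(task)
	if err != nil {
		excerpts = nil // brain unreachable: the worker gets its discipline file only
	}
	out, runErr := h.RunWorker(BuildPrompt(task, excerpts))
	status := "pass"
	if runErr != nil {
		status = "error"
	}
	logErr := h.LogSession(map[string]any{
		"skill":        skill,
		"final_status": status,
		"duration_ms":  time.Since(start).Milliseconds(),
	})
	if runErr != nil {
		return "", runErr
	}
	return out, logErr
}

func main() {
	h := Handler{
		QueryBrain: func(string) ([]string, error) { return []string{"retries use exponential backoff"}, nil },
		RunWorker:  func(prompt string) (string, error) { return `{"status":"ok"}`, nil },
		LogSession: func(entry map[string]any) error { fmt.Println("logged:", entry["final_status"]); return nil },
	}
	out, err := h.Handle("tdd_green", "make TestRetry pass")
	fmt.Println(out, err)
}
```

Keeping the brain query and the session log in the handler is what lets workers stay restricted to Bash, Read, and Write.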

Worker subprocess responsibilities (`config/supervisor/protocols.md`)

Injected into every worker prompt alongside the skill discipline file:

Output contract: every response is raw JSON matching the response schema — no preamble, no prose
Quality gate: verified: true only when the subprocess exit code confirms it — never self-assess
Escalation: if stuck after 3 attempts, return status: error with reason — do not retry silently
Offline behavior: if context from the brain is absent from the prompt, proceed with discipline file only and note the gap in message
Handoff: structure output so the next worker in a chain can consume it without transformation
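On the handler side, the output contract and the quality gate could be enforced roughly as sketched below. The Response struct holds only the fields this section mentions (status, verified, message, reason); the full response schema is not given in this spec, and the names ParseResponse and ApplyQualityGate are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Response holds the fields the protocol text mentions; treat it as a
// placeholder for the real response schema.
type Response struct {
	Status   string `json:"status"`
	Verified bool   `json:"verified"`
	Message  string `json:"message,omitempty"`
	Reason   string `json:"reason,omitempty"`
}

// ParseResponse enforces the output contract. json.Unmarshal rejects any
// preamble or trailing prose, because the whole input must be one JSON value.
func ParseResponse(raw string) (Response, error) {
	var r Response
	if err := json.Unmarshal([]byte(raw), &r); err != nil {
		return Response{}, fmt.Errorf("worker output is not raw JSON: %w", err)
	}
	if r.Status == "" {
		return Response{}, errors.New("worker output has no status field")
	}
	return r, nil
}

// ApplyQualityGate overwrites the worker's own verified flag with the
// runner's verdict, so a worker's self-assessment never counts.
func ApplyQualityGate(r Response, exitCode int) Response {
	r.Verified = exitCode == 0
	return r
}

func main() {
	_, err := ParseResponse(`Sure, here you go: {"status":"ok"}`)
	fmt.Println(err != nil) // true: the preamble breaks the contract

	r, _ := ParseResponse(`{"status":"ok","verified":true}`)
	fmt.Println(ApplyQualityGate(r, 1).Verified) // false: exit code 1 wins
}
```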

Managed Agents Integration (Tier 1 only)

Claude Managed Agents ($0.08/session-hour + tokens) is available in Tier 1 for:

Sessions requiring real production credentials (scoped permissions via Managed Agents)
Long autonomous runs (hours) that would time out a local subprocess
Work requiring a full audit trail for compliance or client delivery

The supervisor does not call Managed Agents directly. Claude Code (the orchestrator) calls tier, sees managed_agents: true, and then decides from the task description whether the task warrants escalation.

Implementation Phases

Phase 1 — Foundation (this spec → implementation plan)

Rename repo to hyperguild, migrate supervisor and ingestion-svc into monorepo
Add ingestion/cmd/server — HTTP API: /query, /write, /ingest
Add internal/tier — tier detection with probing logic
Add internal/session — session log (append JSONL, read by session_id)
Add brain_query, brain_write, session_log, tier MCP tools
Add retrospective skill + config/supervisor/retrospective.md
Write config/supervisor/protocols.md
Update existing skill handlers to call session_log after each invocation
Wire brain/training-data/ directory structure
Update .context/mcp.json and Taskfile

Success criteria: A TDD session automatically produces a session log. Running retrospective writes structured entries to brain/raw/. brain_query returns relevant wiki content. Tier detection correctly identifies Tier 1 vs. Tier 2.

Phase 2 — SDO Skills + Trainer

review, debug, spec skills with discipline files
trainer skill: SFT and DPO extraction from session logs
brain_write with Mathias review gate for significant entries

Phase 3 — Parametric Learning

RL trajectory extraction
Integration with Axolotl/LLaMA-Factory on koala
Fine-tuning pipeline: training-data/ → model checkpoint → llama-swap → live workers

Phase 4 — Tier 3 + Airplane Mode

Brain snapshot tooling (pull wiki to flamingo for offline read)
Minimal worker set that runs on flamingo without koala
Degraded mode detection and graceful capability reduction

Deferred

Event-driven architecture (Option C): revisit when the worker roster reaches about 8 skills or more
A2A protocol: Anthropic hasn't shipped it in Claude Code yet — revisit H2 2026
Generic phase engine: saved in project memory, revisit at 4+ skills
Authentication: Tailscale mesh only for now
