# Hyperguild — Design Spec

**Date:** 2026-04-17
**Status:** approved
**Author:** Mathias + Claude

---

## Overview

The hyperguild is a local Software Development Organization (SDO) of AI workers, running across the Tailscale mesh (koala, iguana, flamingo). It exposes skill workers as MCP tools that Claude Code can call natively during coding sessions. Workers share a two-layer organizational brain — a declarative wiki and a parametric training data store — and operate under a common set of protocols and discipline files.

The hyperguild is designed for graceful degradation across operating tiers: Claude-orchestrated at full power, Ollama-driven on LAN-only, and minimally functional in airplane mode. The organizational shape (protocols, discipline files, tool signatures, output contracts) stays constant across all tiers — only the reasoning quality changes.

Inspired by Senge's five disciplines: personal mastery (discipline files), mental models (organizational CLAUDE.md), shared vision (protocols.md), team learning (retrospective worker), systems thinking (tier-aware routing and the learning loop).

---

## Architecture

```
┌─────────────────────────────────────────────────────┐
│  Claude Code (orchestrator)                         │
│  Tier 1: Claude strategic lead                      │
│  Tier 2: Ollama orchestrator (LAN)                  │
│  Tier 3: Ollama, flamingo only                      │
└────────────────────┬────────────────────────────────┘
                     │ MCP (single endpoint)
                     ▼
┌─────────────────────────────────────────────────────┐
│  supervisor (Go, flamingo :3200)                    │
│  ├── skill tools: tdd, review, debug, spec          │
│  ├── session tools: session_log                     │
│  ├── brain tools: brain_query, brain_write          │
│  ├── org tools: tier                                │
│  └── tier detection + model routing                 │
└──────┬───────────────────────┬──────────────────────┘
       │ claude --print        │ HTTP (internal)
       │ or LiteLLM            │
       ▼                       ▼
┌──────────────┐    ┌─────────────────────────────────┐
│  Workers     │    │  ingestion (Go, :3300)          │
│  (subproc)   │    │  ├── /query  → Qdrant search    │
│              │    │  ├── /write  → brain/raw/       │
│  reads:      │    │  └── /ingest → raw → wiki       │
│  - CLAUDE.md │    └──────────────┬──────────────────┘
│  - skill.md  │                   │
│  - protocols │    ┌──────────────▼──────────────────┐
└──────────────┘    │  brain/                         │
                    │  ├── wiki/ (declarative)        │
                    │  │   ├── concepts/              │
                    │  │   ├── entities/              │
                    │  │   └── sources/               │
                    │  ├── raw/ (pending ingestion)   │
                    │  └── training-data/             │
                    │       ├── sft/                  │
                    │       ├── dpo/                  │
                    │       └── rl/                   │
                    └─────────────────────────────────┘
```

---

## Monorepo Structure

Repository renamed from `supervisor` to `hyperguild`. Two Go modules (`supervisor` and `ingestion`), shared HTTP contracts.

```
hyperguild/
├── supervisor/                  ← Go module: MCP server
│   ├── cmd/supervisor/
│   ├── internal/
│   │   ├── config/              ← env + models.yaml (existing)
│   │   ├── exec/                ← spawns workers (existing)
│   │   ├── mcp/                 ← JSON-RPC server (existing)
│   │   ├── registry/            ← skill routing (existing)
│   │   ├── tier/                ← NEW: tier detection
│   │   ├── session/             ← NEW: session log
│   │   └── skills/
│   │       ├── tdd/             ← existing
│   │       ├── review/          ← NEW
│   │       ├── debug/           ← NEW
│   │       ├── spec/            ← NEW
│   │       ├── retrospective/   ← NEW
│   │       └── trainer/         ← NEW
│   └── go.mod
├── ingestion/                   ← Go module: brain processing
│   ├── cmd/
│   │   ├── ingest/              ← existing CLI
│   │   └── server/              ← NEW: HTTP API for supervisor
│   ├── internal/
│   │   ├── extract/             ← existing
│   │   ├── llm/                 ← existing
│   │   ├── slug/                ← existing
│   │   └── wiki/                ← existing
│   └── go.mod
├── brain/
│   ├── wiki/
│   │   ├── concepts/
│   │   ├── entities/
│   │   └── sources/
│   ├── raw/
│   ├── sessions/                ← session logs: {session_id}.jsonl
│   └── training-data/
│       ├── sft/                 ← JSONL: {messages: [...]}
│       ├── dpo/                 ← JSONL: {prompt, chosen, rejected}
│       └── rl/                  ← JSONL: {trajectory: [...], reward}
├── config/
│   ├── models.yaml
│   └── supervisor/
│       ├── CLAUDE.md            ← orchestration rules (existing)
│       ├── tdd.md               ← existing
│       ├── review.md            ← NEW
│       ├── debug.md             ← NEW
│       ├── spec.md              ← NEW
│       ├── retrospective.md     ← NEW
│       ├── trainer.md           ← NEW
│       └── protocols.md         ← NEW: The Hyperguild Way
└── Taskfile.yml
```

---

## Operating Tiers

Tier detection runs once at supervisor startup and is re-evaluated on every call to the `tier` tool.

| Tier | Condition | Orchestrator | Brain | Managed Agents |
|------|-----------|--------------|-------|----------------|
| 1 — Full online | Tailscale + internet reachable | Claude (API) | Read + Write | Available |
| 2 — LAN only | koala/iguana reachable, no internet | Ollama via LiteLLM | Read + Write | Unavailable |
| 3 — Airplane | flamingo only | Small local model | Read-only snapshot | Unavailable |

**Detection logic** (in `internal/tier`):
1. Probe `https://api.anthropic.com` with 2s timeout → Tier 1 if reachable
2. Probe LiteLLM at `LITELLM_BASE_URL` → Tier 2 if reachable
3. Default → Tier 3

Tier is returned by the `tier` MCP tool as:
```json
{
  "tier": 2,
  "label": "lan-only",
  "available_models": ["ollama/qwen3-coder-30b-tuned"],
  "managed_agents": false
}
```
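
A Go type that serializes to exactly this shape keeps the response contract in one place. The sketch below is hypothetical: `TierStatus`, `NewTierStatus`, and the map of labels are illustrative names. Only `lan-only` appears in this spec, so `full-online` and `airplane` are assumed labels taken from the tier table's names.

```go
package main

import "encoding/json"

// TierStatus matches the fields of the tier tool's response.
type TierStatus struct {
	Tier            int      `json:"tier"`
	Label           string   `json:"label"`
	AvailableModels []string `json:"available_models"`
	ManagedAgents   bool     `json:"managed_agents"`
}

// Only "lan-only" appears in the spec; "full-online" and "airplane" are
// hypothetical labels taken from the tier table's names.
var tierLabels = map[int]string{1: "full-online", 2: "lan-only", 3: "airplane"}

// NewTierStatus builds the response for a detected tier. Managed Agents
// are offered only in Tier 1.
func NewTierStatus(tier int, models []string) TierStatus {
	if models == nil {
		models = []string{} // encode as [] rather than null
	}
	return TierStatus{
		Tier:            tier,
		Label:           tierLabels[tier],
		AvailableModels: models,
		ManagedAgents:   tier == 1,
	}
}

// JSON renders the status as the tool's JSON payload.
func (s TierStatus) JSON() (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}
```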

---

## MCP Tool Surface

### Skill tools

| Tool | Required | Optional | Phase |
|------|----------|----------|-------|
| `tdd_red` | `project_root`, `spec` | `model`, `test_cmd` | Existing |
| `tdd_green` | `project_root`, `test_path` | `model`, `test_cmd` | Existing |
| `tdd_refactor` | `project_root`, `test_path`, `impl_path` | `model`, `test_cmd` | Existing |
| `review` | `project_root`, `diff` | `model`, `spec_path` | Phase 2 |
| `debug` | `project_root`, `error` | `model`, `test_cmd` | Phase 2 |
| `spec` | `requirement` | `model`, `context` | Phase 2 |
| `retrospective` | `session_id` | `model` | Phase 1 |
| `trainer` | `session_id` | `model`, `skill` | Phase 2 |

### Brain tools

| Tool | Required | Optional | Returns |
|------|----------|----------|---------|
| `brain_query` | `query` | `domain`, `limit` | Array of wiki excerpts with scores |
| `brain_write` | `content`, `type` | `domain`, `title` | Confirmation + path written |

### Session tools

| Tool | Required | Optional | Effect |
|------|----------|----------|--------|
| `session_log` | `session_id`, `entry` | `skill`, `outcome` | Appends to session log JSONL |

### Org tools

| Tool | Required | Returns |
|------|----------|---------|
| `tier` | — | Current tier, available models, capabilities |

---

## Session Log Format

The session log is the raw material for both the retrospective worker (declarative brain) and the trainer worker (parametric brain). Each entry is one JSONL record on a single line; the example below is pretty-printed for readability:

```json
{
  "session_id": "2026-04-17-tdd-http-client",
  "timestamp": "2026-04-17T14:23:01Z",
  "skill": "tdd_green",
  "phase": "green",
  "project_root": "/Users/mathias/dev/myproject",
  "input": { "test_path": "internal/client/client_test.go" },
  "attempts": [
    {
      "attempt": 1,
      "model": "ollama/qwen3-coder-30b-tuned",
      "output_summary": "wrote minimal implementation",
      "runner_output": "FAIL: TestRetry (exit 1)",
      "verified": false
    },
    {
      "attempt": 2,
      "model": "ollama/qwen3-coder-30b-tuned",
      "output_summary": "fixed off-by-one in retry count",
      "runner_output": "ok github.com/... (exit 0)",
      "verified": true
    }
  ],
  "final_status": "pass",
  "file_path": "internal/client/client.go",
  "model_used": "ollama/qwen3-coder-30b-tuned",
  "duration_ms": 42300
}
```

Session logs live at `brain/sessions/{session_id}.jsonl`. Each skill invocation appends one entry.

---

## Two-Layer Brain

### Layer 1 — Declarative (wiki)

Human-readable, Obsidian-compatible wiki. Go skill handlers query it before spawning a worker and inject the results into the prompt. At session end, the retrospective worker adds to it through `brain/raw/` and ingestion.

**Retrospective worker flow:**
1. Read session log for `session_id`
2. Query brain for related existing pages
3. Identify what's novel (patterns, decisions, failures worth keeping)
4. Write structured entries to `brain/raw/`
5. Flag entries tagged `significant` for Mathias review before ingestion
6. Trigger the ingestion server's `/ingest` endpoint for entries not tagged `significant`
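
Steps 5 and 6 amount to a partition on the `significant` tag. The following is a minimal sketch with a hypothetical `RawEntry` type. The request format for `/ingest` isn't specified here, so the sketch stops at the routing decision.

```go
package main

// RawEntry is a hypothetical in-memory form of one retrospective note
// before it is handed to ingestion.
type RawEntry struct {
	Path string   // file under brain/raw/
	Tags []string // e.g. "significant", "pattern", "failure"
}

// hasTag reports whether tags contains want.
func hasTag(tags []string, want string) bool {
	for _, t := range tags {
		if t == want {
			return true
		}
	}
	return false
}

// Route implements steps 5 and 6: entries tagged "significant" wait for
// Mathias's review; all others go straight to ingestion.
func Route(entries []RawEntry) (needsReview, autoIngest []RawEntry) {
	for _, e := range entries {
		if hasTag(e.Tags, "significant") {
			needsReview = append(needsReview, e)
		} else {
			autoIngest = append(autoIngest, e)
		}
	}
	return needsReview, autoIngest
}
```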

### Layer 2 — Parametric (training data)

Machine-readable JSONL, organized by learning strategy. Fed into fine-tuning pipelines on koala (Axolotl / LLaMA-Factory / Unsloth).

**Data the guild generates naturally:**

| Type | Source | Ground truth |
|------|--------|-------------|
| SFT | Verified TDD cycles: (spec→test→impl) | Exit code ✓ |
| SFT | Debug sessions: (error→hypothesis→fix) | Exit code ✓ |
| DPO | Multi-attempt sessions: failed vs. passing attempt | Exit code ✓ |
| DPO | Review cycles: original vs. revised code | Mathias approval |
| RL | Full TDD trajectories: state→action→reward | Exit code = reward |

**Trainer worker flow** (Phase 2):
1. Read session log for `session_id`
2. Extract SFT examples from verified single-attempt completions
3. Extract DPO pairs from multi-attempt sessions (rejected = failed attempts)
4. Extract RL trajectories from full sessions
5. Write to `brain/training-data/{type}/{skill}-{date}.jsonl`
6. Flag RL trajectories for Mathias review before use in training

**When RL applies:** Discrete, verifiable tasks with clear reward signals — TDD phases, debug sessions with passing exit code. Not for open-ended tasks (spec writing, review) where the reward is fuzzy — those use DPO from Mathias feedback.

---

## Organizational Protocols

Responsibilities are split between the **Go skill handler** (the wrapper) and the **worker subprocess** (the `claude --print` process). Workers have only the `Bash`, `Read`, and `Write` tools; they cannot call MCP tools directly.

### Go skill handler responsibilities (before/after subprocess)

- **Before spawning**: call `brain_query` for context relevant to the task; inject results into the worker prompt
- **After result**: call `session_log` with the full trajectory (input, attempts, outcome, duration)
- **On tier change**: check `tier` and adjust model selection accordingly

### Worker subprocess responsibilities (`config/supervisor/protocols.md`)

Injected into every worker prompt alongside the skill discipline file:

- **Output contract**: every response is raw JSON matching the response schema — no preamble, no prose
- **Quality gate**: set `verified: true` only when a command's exit code (e.g. the test runner's) confirms it; never self-assess
- **Escalation**: if stuck after 3 attempts, return `status: error` with reason — do not retry silently
- **Offline behavior**: if context from the brain is absent from the prompt, proceed with discipline file only and note the gap in `message`
- **Handoff**: structure output so the next worker in a chain can consume it without transformation
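
The output contract is stated in the worker prompt, but the Go handler could also enforce it on receipt. Below is a minimal sketch. The `status`, `verified`, and `message` fields come from the rules above. `file_path` is hypothetical, and rejecting unknown fields is an assumption about how strict the response schema should be. Parsing cannot prove that `verified` is honest; that still rests on an exit code.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// WorkerResult holds the fields the protocols mention; file_path is hypothetical.
type WorkerResult struct {
	Status   string `json:"status"` // "ok" or "error"
	Verified bool   `json:"verified"`
	Message  string `json:"message,omitempty"`
	FilePath string `json:"file_path,omitempty"`
}

// ParseWorkerOutput enforces the output contract: the whole response must be
// exactly one JSON object with known fields, with no preamble or trailing prose.
func ParseWorkerOutput(raw []byte) (WorkerResult, error) {
	var r WorkerResult
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields() // strictness is an assumption about the schema
	if err := dec.Decode(&r); err != nil {
		return WorkerResult{}, fmt.Errorf("output contract: %w", err)
	}
	// Only whitespace may follow the object.
	if _, err := dec.Token(); !errors.Is(err, io.EOF) {
		return WorkerResult{}, errors.New("output contract: content after the JSON object")
	}
	switch r.Status {
	case "ok":
	case "error":
		// Escalation rule: an error must say why.
		if r.Message == "" {
			return WorkerResult{}, errors.New("status error without a message")
		}
	default:
		return WorkerResult{}, fmt.Errorf("unknown status %q", r.Status)
	}
	return r, nil
}
```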

---

## Managed Agents Integration (Tier 1 only)

Claude Managed Agents ($0.08/session-hour + tokens) is available in Tier 1 for:
- Sessions requiring real production credentials (scoped permissions via Managed Agents)
- Long autonomous runs (hours) that would time out a local subprocess
- Work requiring a full audit trail for compliance or client delivery

The supervisor does not call Managed Agents directly. Claude Code (the orchestrator) calls `tier`; when the response reports `managed_agents: true`, it decides from the task description whether the task warrants escalation. No magic: the decision stays with the orchestrator.

---

## Implementation Phases

### Phase 1 — Foundation (this spec → implementation plan)

1. Rename the repo to `hyperguild`; migrate `supervisor` and `ingestion-svc` into the monorepo
2. Add `ingestion/cmd/server` — HTTP API: `/query`, `/write`, `/ingest`
3. Add `internal/tier` — tier detection with probing logic
4. Add `internal/session` — session log (append JSONL, read by session_id)
5. Add `brain_query`, `brain_write`, `session_log`, `tier` MCP tools
6. Add `retrospective` skill + `config/supervisor/retrospective.md`
7. Write `config/supervisor/protocols.md`
8. Update existing skill handlers to call `session_log` after each invocation
9. Wire `brain/training-data/` directory structure
10. Update `.context/mcp.json` and Taskfile

**Success criteria:** A TDD session automatically produces a session log. Running `retrospective` writes structured entries to `brain/raw/`. `brain_query` returns relevant wiki content. Tier detection correctly identifies Tier 1 vs. Tier 2.

### Phase 2 — SDO Skills + Trainer

- `review`, `debug`, `spec` skills with discipline files
- `trainer` skill: SFT and DPO extraction from session logs
- `brain_write` with Mathias review gate for `significant` entries

### Phase 3 — Parametric Learning

- RL trajectory extraction
- Integration with Axolotl/LLaMA-Factory on koala
- Fine-tuning pipeline: training-data/ → model checkpoint → llama-swap → live workers

### Phase 4 — Tier 3 + Airplane Mode

- Brain snapshot tooling (pull wiki to flamingo for offline read)
- Minimal worker set that runs on flamingo without koala
- Degraded mode detection and graceful capability reduction

---

## Deferred

- **Event-driven architecture (Option C)**: revisit when worker roster reaches ~8+ skills
- **A2A protocol**: Anthropic hasn't shipped it in Claude Code yet — revisit H2 2026
- **Generic phase engine**: saved in project memory, revisit at 4+ skills
- **Authentication**: Tailscale mesh only for now