mathias/hyperguild

Mathias a56a4db963

feat(brain_answer): Qwen3-Reranker cross-encoder filter (opt-in)

Adds an opt-in cross-encoder rerank step between BM25 retrieval and LLM
synthesis. With BRAIN_RERANKER_URL set, brain_answer retrieves BM25
top-20, scores each excerpt against the query via Qwen3-Reranker on
Ollama, drops the "no" answers, and forwards up to 5 surviving sources
to the LLM. Unset, behaviour is unchanged (BM25 top-10 → LLM).

The reranker is a *filter*, not a re-ranker: Qwen3-Reranker emits a
binary yes/no token under its native chat template, and ties within the
"yes" set are broken by BM25 rank — what got retrieved first stays
ahead.
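The filter semantics above (keep the "yes" set, let BM25 order break ties, forward at most 5) can be sketched as follows. This is a minimal illustration, not the code in ingestion/internal/reranker: it assumes the scores arrive aligned with the BM25 hits, with 1 meaning "yes" and 0 meaning "no", and the helper name `filterByVerdict` is hypothetical.

```go
package main

// filterByVerdict keeps only documents the reranker answered "yes" for
// (score > 0). Because it walks bm25Docs in retrieval order, ties inside
// the "yes" set are broken by BM25 rank. At most limit documents survive.
// Hypothetical helper: assumes scores[i] belongs to bm25Docs[i].
func filterByVerdict(bm25Docs []string, scores []float64, limit int) []string {
	kept := make([]string, 0, limit)
	for i, doc := range bm25Docs {
		if len(kept) == limit {
			break // enough sources for the LLM
		}
		if i < len(scores) && scores[i] > 0 {
			kept = append(kept, doc)
		}
	}
	return kept
}
```

Because no sorting happens, a "yes" document retrieved at BM25 rank 3 always precedes one retrieved at rank 7, which matches the "what got retrieved first stays ahead" rule.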

New package ingestion/internal/reranker:
- Client with URL, Model, HTTP fields.
- New(url, model) returns nil on empty url so callers can treat
  "feature disabled" as a single nil check.
- Score(ctx, query, docs) issues one /api/generate call per doc using
  the Qwen3-Reranker yes/no chat template (verbatim, because the model
  was trained on this exact wording). Parses the first non-think token.
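A minimal sketch of the nil-on-empty constructor and the token parsing described in the list above. The HTTP field, the /api/generate request and the Qwen3-Reranker prompt are deliberately left out (the template has to be copied verbatim from the model's documentation, not reconstructed). `parseVerdict`, and its handling of case and trailing punctuation, are assumptions for illustration, not the package's actual code.

```go
package main

import "strings"

// Client mirrors the URL and Model fields named above; the HTTP field is
// omitted to keep the sketch dependency-free.
type Client struct {
	URL   string
	Model string
}

// New returns nil when url is empty, so "reranker disabled" is a single
// nil check at the call site.
func New(url, model string) *Client {
	if url == "" {
		return nil
	}
	return &Client{URL: url, Model: model}
}

// parseVerdict (hypothetical) drops everything up to a closing </think>
// tag, then maps the first remaining token to 1 for "yes" and 0 otherwise.
// "no" and ambiguous answers both score 0.
func parseVerdict(response string) float64 {
	s := response
	if i := strings.Index(s, "</think>"); i >= 0 {
		s = s[i+len("</think>"):]
	}
	fields := strings.Fields(s)
	if len(fields) == 0 {
		return 0
	}
	tok := strings.ToLower(strings.Trim(fields[0], ".,:;!\"'"))
	if tok == "yes" {
		return 1
	}
	return 0
}
```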

Wiring:
- mcp.Server gains a WithReranker fluent setter to keep NewServer
  signature stable.
- brain_answer's BM25 limit jumps to 20 only when a reranker is wired,
  to give the filter something to do.
- cmd/server/main.go reads BRAIN_RERANKER_URL (+ optional
  BRAIN_RERANKER_MODEL, default dengcao/Qwen3-Reranker-0.6B:F16).
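The wiring rules above (env-driven opt-in, model default, BM25 limit of 20 only when a reranker is present) reduce to two small decisions. The helper names `rerankerConfig` and `bm25Limit` are hypothetical, and `getenv` is injected so the sketch can be exercised without touching the real environment; in main.go it would presumably be `os.Getenv`.

```go
package main

// defaultRerankerModel is the default named in the commit message.
const defaultRerankerModel = "dengcao/Qwen3-Reranker-0.6B:F16"

// rerankerConfig (hypothetical) reads BRAIN_RERANKER_URL and the optional
// BRAIN_RERANKER_MODEL. enabled is false when the URL is unset.
func rerankerConfig(getenv func(string) string) (url, model string, enabled bool) {
	url = getenv("BRAIN_RERANKER_URL")
	model = getenv("BRAIN_RERANKER_MODEL")
	if model == "" {
		model = defaultRerankerModel
	}
	return url, model, url != ""
}

// bm25Limit widens retrieval to 20 only when a reranker is wired, giving
// the filter candidates to drop; otherwise the original top-10 is kept.
func bm25Limit(rerankerEnabled bool) int {
	if rerankerEnabled {
		return 20
	}
	return 10
}
```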

Tests cover: nil-on-empty-url, ordered yes/no scoring, request shape
(model, prompt contents, yes/no template), ambiguous response → 0,
empty doc slice, upstream-error propagation, plus an end-to-end
brain_answer integration that proves only the relevant note reaches the
LLM when noise.md is rejected.

Closes hyperguild#7.

2026-05-18 22:55:46 +02:00


hyperguild

An MCP server that acts as a disciplined AI supervisor for Claude Code sessions. Instead of letting Claude Code do whatever it wants, hyperguild enforces structured workflows (TDD red/green/refactor), logs every session, and accumulates learnings into a searchable brain.

How it works

Your Claude Code session (in any project)
    │
    │  MCP over HTTP (Tailscale)
    ├──▶ supervisor  :3200 (NodePort 30320 on koala) — skill workers: tdd, debug, spec, …
    ├──▶ routing     :3210 (NodePort 30310 on koala) — Mode 2 only: review, debug, retrospective, trainer
    └──▶ brain       :3300 (NodePort 30330 on koala) — brain_query, brain_write, brain_ingest, session_log
                       │
                       └─ also serves the legacy REST endpoints (/query, /write, /ingest, …)
    │
    ▼
brain/
├── sessions/       — JSONL log, one file per session_id
├── wiki/           — searchable knowledge (full-text)
│   ├── concepts/
│   ├── entities/
│   └── sources/
├── raw/            — retrospective output, staged for review
└── training-data/  — SFT/DPO/RL data (Phase 2)

Phase 1 tools (available now)

Tool	What it does
`tdd_red`	Writes a failing test for a spec, verifies it fails
`tdd_green`	Writes the minimal implementation to make tests pass
`tdd_refactor`	Cleans up implementation while keeping tests green
`session_log`	Appends a structured entry to the session JSONL log
`retrospective`	Reads the session log, identifies novel learnings, writes to brain/raw/
`brain_query`	Full-text search over brain/wiki/
`brain_write`	Writes a note to brain/raw/ (with optional YAML frontmatter)
`tier`	Returns the current connectivity tier (1=cloud, 2=LAN, 3=offline)
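brain_write accepts an optional YAML frontmatter. One plausible way to lay out such a note is sketched below; `renderNote` is hypothetical, keys are sorted only to make the output deterministic, and values are written verbatim, so anything that needs YAML quoting is out of scope for this sketch.

```go
package main

import (
	"sort"
	"strings"
)

// renderNote (hypothetical) prepends a "---" delimited YAML frontmatter
// block to body. With no frontmatter, body is returned unchanged.
func renderNote(frontmatter map[string]string, body string) string {
	if len(frontmatter) == 0 {
		return body
	}
	keys := make([]string, 0, len(frontmatter))
	for k := range frontmatter {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable key order across runs
	var b strings.Builder
	b.WriteString("---\n")
	for _, k := range keys {
		b.WriteString(k + ": " + frontmatter[k] + "\n")
	}
	b.WriteString("---\n\n")
	b.WriteString(body)
	return b.String()
}
```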

Start the servers

# Requires goreman: go install github.com/mattn/goreman@latest
task start    # starts ingestion (:3300) + supervisor (:3200) via goreman
task stop     # kills both by port

Connect a project

Create .mcp.json in your project root:

{
  "mcpServers": {
    "supervisor": {
      "type": "http",
      "url": "http://koala:30320/mcp"
    },
    "brain": {
      "type": "http",
      "url": "http://koala:30330/mcp"
    }
  }
}
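Before running /mcp, a quick pre-flight check can catch a missing or mistyped entry in .mcp.json. The sketch below models only the fields used in the config above; `checkMCPConfig` is a hypothetical helper, not part of hyperguild or Claude Code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mcpConfig models just the fields used in the .mcp.json example.
type mcpConfig struct {
	MCPServers map[string]struct {
		Type string `json:"type"`
		URL  string `json:"url"`
	} `json:"mcpServers"`
}

// checkMCPConfig returns an error if any required server is missing or is
// not declared as an HTTP server with a URL.
func checkMCPConfig(raw []byte, required ...string) error {
	var cfg mcpConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return fmt.Errorf("parse .mcp.json: %w", err)
	}
	for _, name := range required {
		s, ok := cfg.MCPServers[name]
		if !ok {
			return fmt.Errorf("server %q missing", name)
		}
		if s.Type != "http" || s.URL == "" {
			return fmt.Errorf("server %q must be type http with a url", name)
		}
	}
	return nil
}
```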

Two MCP servers are exposed today, both reachable over Tailscale:

supervisor at koala:30320 — skill workers (tdd_red/green/refactor, review, debug, spec, retrospective, trainer, tier).
brain at koala:30330 — knowledge access (brain_query, brain_write, brain_ingest, brain_ingest_raw) and session_log. Hosted by the ingestion service directly, no separate pod.

No local binary or stdio shim is required — Claude Code talks to both via HTTP.

Open Claude Code in your project — run /mcp to confirm both servers are listed.

A typical TDD session

1. Call tdd_red    → spec in, failing test file out
2. Call tdd_green  → test path in, implementation out
3. Call tdd_refactor → impl + test in, cleaned code out
4. Call session_log  → log each phase result
5. Call retrospective → extracts learnings → brain/raw/
6. Review brain/raw/, move worthy notes to brain/wiki/concepts/
7. Future sessions: call brain_query to retrieve relevant context
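Each "Call …" step above is an MCP `tools/call` request. The sketch below only builds the JSON-RPC 2.0 body; a real client must first complete the MCP initialize handshake with the server. The argument key `spec` is a hypothetical example; each tool's input schema, as reported by `tools/list`, is the authority.

```go
package main

import "encoding/json"

// toolCall builds the JSON-RPC 2.0 body for an MCP tools/call request.
// encoding/json sorts map keys, so the output is deterministic.
func toolCall(id int, tool string, args map[string]any) ([]byte, error) {
	return json.Marshal(map[string]any{
		"jsonrpc": "2.0",
		"id":      id,
		"method":  "tools/call",
		"params": map[string]any{
			"name":      tool,
			"arguments": args,
		},
	})
}
```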

Tier detection

The supervisor probes connectivity at call time:

Tier	Label	Condition
1	full-online	Can reach api.anthropic.com
2	lan-only	Can reach LiteLLM but not Anthropic
3	airplane	No external connectivity
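The tier table reads top to bottom as a decision chain. In this sketch the probe is injected as `reachable`, because how the supervisor actually probes (HTTP request, TCP dial, timeouts) is not described here; `detectTier` and the constant names are hypothetical.

```go
package main

// Tier values from the table above.
const (
	TierFullOnline = 1
	TierLANOnly    = 2
	TierAirplane   = 3
)

// detectTier checks Anthropic first, so reaching api.anthropic.com yields
// tier 1 regardless of LiteLLM; otherwise a reachable LiteLLM gives tier 2.
func detectTier(reachable func(target string) bool, liteLLMURL string) int {
	switch {
	case reachable("api.anthropic.com"):
		return TierFullOnline
	case liteLLMURL != "" && reachable(liteLLMURL):
		return TierLANOnly
	default:
		return TierAirplane
	}
}
```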

Key env vars

Variable	Default	Purpose
`INGEST_BRAIN_DIR`	`../brain`	Brain directory for ingestion server
`INGEST_PORT`	`3300`	Ingestion server port
`SUPERVISOR_CONFIG_DIR`	`./config/supervisor`	Skill discipline files
`SUPERVISOR_SESSIONS_DIR`	`./brain/sessions`	JSONL session logs
`INGEST_BASE_URL`	`http://localhost:3300`	Supervisor → ingestion
`LITELLM_BASE_URL`	—	LiteLLM proxy for Tier 2 model routing
`SUPERVISOR_MCP_TOKEN`	—	Optional bearer token for the supervisor MCP HTTP endpoint; when empty, no auth is enforced
`ROUTING_PORT`	`3210`	Routing pod's listen port
`ROUTING_MCP_TOKEN`	—	Optional bearer token for the routing MCP HTTP endpoint
`BRAIN_URL`	`http://ingestion.supervisor:3300`	Routing pod → brain (in-cluster)
`HYPERGUILD_FAST_MODEL`	`koala/qwen35-9b-fast`	Fast model for high-pass-rate skill calls
`HYPERGUILD_THINKING_MODEL`	`iguana/gemma4-26b`	Thinking model for low-pass-rate skill calls
`HYPERGUILD_ROUTE_LOCAL_FLOOR`	`0.90`	At or above this pass rate, route to the fast model
`HYPERGUILD_ROUTE_LOCAL_CEIL`	`0.70`	Below this pass rate, route to the thinking model. Pass rates between CEIL and FLOOR fall in the sample band.
`HYPERGUILD_PASS_RATE_TTL_SECONDS`	`60`	Per-skill pass-rate cache TTL
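The FLOOR and CEIL thresholds in the table above split a skill's pass rate into three bands. A minimal sketch of that split follows; the route labels and `chooseRoute` are hypothetical, and what the routing pod does inside the sample band is not specified here.

```go
package main

// Hypothetical labels for the three bands.
const (
	RouteFast     = "fast"
	RouteThinking = "thinking"
	RouteSample   = "sample"
)

// chooseRoute: passRate >= floor goes to the fast model, passRate < ceil
// goes to the thinking model, and anything in between is the sample band.
func chooseRoute(passRate, floor, ceil float64) string {
	switch {
	case passRate >= floor:
		return RouteFast
	case passRate < ceil:
		return RouteThinking
	default:
		return RouteSample
	}
}
```

With the defaults (FLOOR 0.90, CEIL 0.70), a skill passing 80% of the time lands in the sample band, while exactly 0.90 already routes to the fast model.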

Operator note: LiteLLM at LITELLM_BASE_URL must register both HYPERGUILD_FAST_MODEL and HYPERGUILD_THINKING_MODEL for routing to do useful work. If the fast model is missing, LiteLLM returns a 4xx and the routing pod's fast route fails; if the thinking model is missing too, the fail-open retry on it fails as well, and the only signal is final_status: "fail" on _routing entries in the brain.

Phase 2 (planned)

review skill — structured code review with iron law enforcement
debug skill — hypothesis-driven debugging sessions
spec skill — generates specs from conversations
trainer — extracts SFT/DPO pairs from session logs for fine-tuning