hyperguild/DECISIONS.md

# Decisions log

Record *why* things are the way they are. Future-you will thank present-you.

---

## 2026-04-08 — AGENTS.md as cross-tool standard, not CLAUDE.md

**Context**: Multiple tools (Crush, Pi, Antigravity) read `AGENTS.md` natively. Claude Code reads `CLAUDE.md`. Building on `CLAUDE.md` as the primary format would lock the setup into one vendor.

**Decision**: Canonical source is `.context/AGENT.md` (root) and `.context/PROJECT.md` (per-project). The adapter script generates both `AGENTS.md` and `CLAUDE.md` with identical content under two filenames.

**Consequences**: One canonical file serves five+ tools. Adding a new tool that reads `AGENTS.md` requires zero adapter work.

## 2026-04-08 — Agent Skills standard (SKILL.md in folders) over flat markdown

**Context**: Claude Code, Pi, Crush, and Antigravity all support the Agent Skills open standard: a folder containing `SKILL.md` with frontmatter (`name`, `description`). Skills are discovered on-demand — only the description enters context, full instructions load when triggered.

**Decision**: Skills live in `.skills/{name}/SKILL.md` at project level. This replaces the earlier `.context/skills/{name}.md` flat-file approach.

**Consequences**: Skills are cross-compatible without adaptation. Pi auto-discovers them from `.pi/skills/` (symlink). Crush reads them natively. Progressive disclosure keeps context window lean.
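
To illustrate the progressive-disclosure idea, here is a rough sketch of reading only the `name` and `description` from a `SKILL.md` frontmatter block. It assumes the frontmatter is delimited by `---` lines and contains flat `key: value` pairs; real frontmatter is YAML, so a proper YAML parser would be the safer choice. `SkillMeta` and `parseSkillMeta` are hypothetical names.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// SkillMeta holds the two frontmatter fields named in the decision above.
type SkillMeta struct {
	Name        string
	Description string
}

// parseSkillMeta reads the leading frontmatter block delimited by "---" lines.
// It only understands flat "key: value" pairs, a simplification of YAML.
func parseSkillMeta(doc string) (SkillMeta, error) {
	var meta SkillMeta
	sc := bufio.NewScanner(strings.NewReader(doc))
	if !sc.Scan() || strings.TrimSpace(sc.Text()) != "---" {
		return meta, errors.New("missing opening frontmatter delimiter")
	}
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "---" {
			if meta.Name == "" || meta.Description == "" {
				return meta, errors.New("frontmatter needs name and description")
			}
			return meta, nil // stop here: the body is not needed for discovery
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue // skip lines that are not key: value pairs
		}
		value = strings.Trim(strings.TrimSpace(value), `"'`)
		switch strings.TrimSpace(key) {
		case "name":
			meta.Name = value
		case "description":
			meta.Description = value
		}
	}
	return meta, errors.New("missing closing frontmatter delimiter")
}

func main() {
	doc := "---\nname: review\ndescription: Review a diff for defects\n---\n\n# Full instructions\n"
	meta, err := parseSkillMeta(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s: %s\n", meta.Name, meta.Description)
}
```

The parser returns as soon as it reaches the closing delimiter, which mirrors the point of the standard: the full instructions stay out of context until the skill is triggered.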

## 2026-04-08 — Go + HTMX as default stack

**Context**: Need a default that's fast to prototype, easy to deploy as a single binary, and doesn't require a Node/npm toolchain for the UI layer.

**Decision**: Go with HTMX + Templ for server-rendered UI. Python as fallback for ML/data tasks. TypeScript only when a project genuinely needs a rich client-side SPA.

**Consequences**: Simpler deployment and dependency management. Agents need Go-specific skills.

## 2026-04-08 — Task over Make

**Context**: Makefiles have arcane syntax and poor cross-platform support.

**Decision**: Use Taskfile (taskfile.dev) — YAML-based, cross-platform, supports task dependencies.

**Consequences**: One extra binary to install. All project automation in `Taskfile.yml`.

## 2026-04-08 — Qdrant over ChromaDB for vector store

**Context**: Need collection-level isolation for client separation, payload filtering, and a store that runs well in k3s.

**Decision**: Qdrant. Native collection isolation, rich filtering, mature gRPC API.

**Consequences**: More operational complexity than Chroma, but isolation is non-negotiable for client work. Superseded for the brain use case by the 2026-05-18 pgvector decision.

## 2026-04-22 — Hyperguild scope reset: drop parametric learning, simplify brain

**Context**: After shipping Phases 1–4 (MCP server, 6 skills, model orchestration, session logging, CD pipeline), we critically reviewed what was theater vs genuinely useful.

**Decisions**:

1. **Drop the parametric learning pipeline.** SFT/DPO/RL extraction, the `brain/training-data/` directory structure, and the Axolotl/LLaMA-Factory fine-tuning loop are all cut. The loop requires thousands of high-quality examples to move the needle, which a solo consultant won't generate, and better base models ship faster than any fine-tuning effort could keep up with. The pipeline is a research project, not a productivity tool.

2. **Simplify the brain to plain markdown.** `brain/knowledge/` replaces `brain/wiki/ + brain/raw/ + brain/training-data/`. The trainer and retrospective workers write markdown entries. `brain_query` searches markdown. No ingestion pipeline, no tagging for significance review, no structured JSONL formats.

3. **Measure the escalation chain before assuming it's useful.** The local model (phi4) only belongs in a skill's chain if it passes Claude verification at a meaningful rate. Where it fails >70% of the time, it adds cost, not value. Per-skill hit rate logging is the prerequisite to honest chain configuration.

4. **Keep what's real**: MCP tool surface, session logging with attempt records, tier detection, CD pipeline, bridge to Claude Code.
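
The measurement in item 3 could look roughly like the sketch below. It is an illustration under assumptions: the `PassRates` type, its method names, and the `minSamples` guard are invented here, and only the 70% failure threshold comes from the decision itself.

```go
package main

import "fmt"

// attempts counts verification outcomes for one skill/model pair.
type attempts struct{ passed, total int }

// PassRates accumulates outcomes keyed by skill name, then model name.
// (Hypothetical type; the real session log format may differ.)
type PassRates map[string]map[string]*attempts

// Record stores one verified attempt for the given skill and model.
func (p PassRates) Record(skill, model string, passed bool) {
	if p[skill] == nil {
		p[skill] = map[string]*attempts{}
	}
	a := p[skill][model]
	if a == nil {
		a = &attempts{}
		p[skill][model] = a
	}
	a.total++
	if passed {
		a.passed++
	}
}

// KeepInChain reports whether model should stay in skill's chain.
// A model is dropped only once it has at least minSamples attempts and
// its failure rate exceeds maxFailRate (0.70 in the decision above).
func (p PassRates) KeepInChain(skill, model string, minSamples int, maxFailRate float64) bool {
	a := p[skill][model]
	if a == nil || a.total < minSamples {
		return true // not enough evidence yet
	}
	failRate := float64(a.total-a.passed) / float64(a.total)
	return failRate <= maxFailRate
}

func main() {
	rates := PassRates{}
	for i := 0; i < 10; i++ {
		rates.Record("review", "phi4", i < 2)  // 2 of 10 pass
		rates.Record("trainer", "phi4", i < 5) // 5 of 10 pass
	}
	for _, skill := range []string{"review", "trainer"} {
		fmt.Printf("%s keeps phi4: %v\n", skill, rates.KeepInChain(skill, "phi4", 10, 0.70))
	}
}
```

The `minSamples` guard is a design choice worth keeping even if the rest changes: without it, a single early failure would evict a model before there is any meaningful rate to speak of.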

**What to build next** (in priority order):
- `brain_query` injection into skill handlers before spawning workers — this makes the declarative brain actually function
- `protocols.md` — behavioral contract injected into every worker prompt
- Per-skill pass rate logging and chain tuning

**Consequences**: Simpler system with a shorter feedback loop. The brain becomes real only when skill handlers query it. Training data ambitions deferred indefinitely — revisit if local model capabilities improve enough that fine-tuning becomes worthwhile.

---

## Plan 6: routing pod reuses internal/skills/{review,debug,retrospective,trainer}

Plan 6 (Mode 2 routing pod, 2026-05-04) introduces a second consumer of
the four cost-routable skill packages. The routing pod constructs each
skill via `<pkg>.New(Config{...})` and hands it `routing.Router.Run` as
the `CompleteFunc`.

**Preserved code (do not delete):**
- `internal/skills/{review,debug,retrospective,trainer}/`
- `internal/registry`, `internal/mcp`, `internal/exec/litellm.go`
- `internal/routing/`, `cmd/routing/`

---

## Plan 7: supervisor pod retired (2026-05-12)

**What was deleted:** `cmd/supervisor/`, `internal/skills/{tdd,spec}/`,
root `Dockerfile`, supervisor k8s manifests (Deployment, Service, Ingress,
NodePort 30320), and the `supervisor` entry in all `.mcp.json` configs.

**Coverage:** `tdd`/`spec` → SKILL.md files in `~/dev/.skills/`; `review`,
`debug`, `retrospective`, `trainer` → routing pod; `brain_*`/`session_log` →
brain MCP; `tier` → `hyperguild tier` CLI.

---

## 2026-05-12 — brain_answer and brain_classify: LLM routing via berget.ai → iguana

**Context:** Brain MCP returned raw BM25 excerpts with no synthesis. Adding
LLM-backed tools enables Q&A and ingestion enrichment without a separate service.

**Decision:** Two new MCP tools in the ingestion service (`ingestion/internal/mcp/`):
- `brain_answer(query)` — BM25 top-10 → LLM synthesis → answer + sources
- `brain_classify(text)` — LLM classifies doc into type/title/tags

Primary LLM: berget.ai `gemma4:31b` (EU cloud, spend tokens while available).
Fallback: iguana `gemma4:31b` (local Ollama). Reranker deferred to follow-up.
Router lives in `ingestion/internal/llm.Router`; opt-in via `BRAIN_LLM_PRIMARY_URL`.

**Consequences:** Brain becomes a knowledge assistant, not just a search index.
When berget.ai tokens run out, flip `BRAIN_LLM_PRIMARY_URL` to iguana.

---

## 2026-04-08 — Mistral Vibe gets its own adapter

**Context**: Vibe doesn't read `AGENTS.md` — it uses `~/.vibe/prompts/` and `~/.vibe/agents/` with TOML config.

**Decision**: The root context-sync generates a `mathias.md` prompt and `mathias.toml` agent config in `~/.vibe/`. This is the one tool that needs a custom adapter path.

**Consequences**: Run `vibe --agent mathias` to use your conventions. Other Vibe users on the machine aren't affected.

---

## 2026-05-18 — project_create commits staging namespace directly to infra main

**Context:** `project_create` writes a k8s namespace manifest into the infra
repo so Flux brings up a staging environment for the new project. Initial
implementation pushed to a `staging/<name>` branch, which required manual PR
merge before Flux saw the namespace — defeating the "one tool call, project
exists, staging reconciling within 60s" goal.

**Decision:** Option A — commit directly to `main`. `callInfraCommit` passes
`branch: "main"` to gitea-mcp's `file_write_branch`; no PR, no merge step.

**Consequences:** Staging namespace appears in cluster within ~60s of the
`project_create` call. Consistent with the project-wide trunk-based development (TBD) policy (CLAUDE.md):
commit directly to main, every commit deployable. Acceptable because the
manifest is a fresh namespace under `k3s/staging/<name>/` — isolated, low
blast-radius, and Flux will simply recreate it if the file is bad. Manual
review gating was friction for no compensating safety gain on experiment
namespaces.

---

## 2026-05-18 — pgvector over Qdrant for brain hybrid retrieval (supersedes 2026-04-08)

**Context:** The 2026-04-08 ADR chose Qdrant for vector store. Since then,
postgres18 with pgvector has been deployed in the `databases` namespace on
koala and is already the shared default for the rest of the project
(CLAUDE.md lists `pgvector (vector), BM25` as the primary search layer and
Qdrant only as a fallback "when >1M vectors or hybrid retrieval"). Qdrant
itself has never been deployed — `kubectl get` finds no pod, service, or
manifest. Standing up a new vector engine for a single consumer is friction
that the original ADR did not weigh.

**Decision:** Use pgvector for brain hybrid retrieval. Issue #8 — and any
follow-on embedding work — targets the existing `postgres18` instance:

- one table `brain_embeddings(path TEXT PRIMARY KEY, embedding VECTOR(768), updated_at TIMESTAMPTZ)`,
  IVFFlat or HNSW index to be chosen once volume warrants one
- BM25 stays as today (file walk + token frequency); cosine via pgvector
- hybrid scoring done in SQL or Go; pick once we measure
- nomic-embed-text on iguana ollama provides 768-dim vectors
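
If the hybrid scoring ends up in Go rather than SQL, one possible shape is a weighted blend of a normalized BM25 score and cosine similarity. The sketch below is an assumption-heavy illustration: the `Candidate` and `Scored` types, the max-normalization of BM25, and the `alpha` weight are all invented here, and the demo uses 2-dimensional vectors instead of 768 for readability.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity of a and b, or 0 when the lengths
// differ or either vector has zero norm.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Candidate is a hypothetical row: a document path with its raw BM25 score
// and its stored embedding.
type Candidate struct {
	Path      string
	BM25      float64
	Embedding []float64
}

// Scored pairs a path with its blended score.
type Scored struct {
	Path  string
	Score float64
}

// hybridRank scales BM25 by the maximum score in the candidate set, then
// blends it with cosine similarity: alpha*lexical + (1-alpha)*semantic.
// Results are sorted by descending score.
func hybridRank(query []float64, cands []Candidate, alpha float64) []Scored {
	maxBM25 := 0.0
	for _, c := range cands {
		if c.BM25 > maxBM25 {
			maxBM25 = c.BM25
		}
	}
	out := make([]Scored, 0, len(cands))
	for _, c := range cands {
		lexical := 0.0
		if maxBM25 > 0 {
			lexical = c.BM25 / maxBM25
		}
		score := alpha*lexical + (1-alpha)*cosine(query, c.Embedding)
		out = append(out, Scored{Path: c.Path, Score: score})
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	query := []float64{1, 0}
	cands := []Candidate{
		{Path: "a.md", BM25: 2, Embedding: []float64{0, 1}},
		{Path: "b.md", BM25: 1, Embedding: []float64{1, 0}},
	}
	for _, s := range hybridRank(query, cands, 0.5) {
		fmt.Printf("%s %.2f\n", s.Path, s.Score)
	}
}
```

Max-normalizing BM25 keeps both terms on a comparable scale, but it makes scores relative to each candidate set; whether that or a rank-based fusion works better is exactly the kind of thing the "pick once we measure" bullet leaves open.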

**Consequences:** One database engine instead of two. Backups, monitoring,
and connection pooling already solved. Trade-off: pgvector at >1M vectors
or under hybrid-search load may underperform Qdrant — revisit only when
benchmarks hurt. The 2026-04-08 ADR is superseded for the brain use case;
Qdrant remains the noted fallback path in CLAUDE.md if scale demands it.