Adds a minimal RFC 8414 + RFC 6749 client_credentials flow so claude.ai's custom-MCP integration (no static-Bearer field in the UI) can exchange a client_id + client_secret pair for the existing BRAIN_MCP_TOKEN and use it as a Bearer on /mcp. No JWTs, no refresh, no expiry; the rest of the auth middleware is unchanged.

New package ingestion/internal/oauth:

- MetadataHandler(issuer): serves /.well-known/oauth-authorization-server with grant_types=[client_credentials] and both token_endpoint_auth_methods (post + basic).
- TokenHandler(cfg): serves /oauth/token. Validates client_id and client_secret via constant-time compare; returns BRAIN_MCP_TOKEN as access_token. RFC 6749 §5.2 error JSON on bad grant / bad creds.

Wiring in cmd/server/main.go: opt-in by setting both OAUTH_CLIENT_ID and OAUTH_CLIENT_SECRET. Setting only one is misconfiguration → exit 1. Mounts both endpoints with no auth; MCP_RESOURCE_URL supplies the issuer.

Also pivots issue #8's vector backend from Qdrant to pgvector (see DECISIONS.md 2026-05-18): Qdrant was never deployed and postgres18 with pgvector already runs as the project default; supersedes 2026-04-08 for this use case.

Tests cover post-auth, basic-auth, wrong secret, bad grant, GET rejection, malformed Basic header, and Basic without colon.

Closes hyperguild#5.
# Decisions log

Record *why* things are the way they are. Future-you will thank present-you.

---

## 2026-04-08 — AGENTS.md as cross-tool standard, not CLAUDE.md

**Context**: Multiple tools (Crush, Pi, Antigravity) read `AGENTS.md` natively. Claude Code reads `CLAUDE.md`. Building on `CLAUDE.md` as the primary format locks into one vendor.

**Decision**: Canonical source is `.context/AGENT.md` (root) and `.context/PROJECT.md` (per-project). The adapter script generates both `AGENTS.md` and `CLAUDE.md` — identical content, two filenames. Crush, Pi, and Antigravity read `AGENTS.md`; Claude Code reads `CLAUDE.md`.

**Consequences**: One canonical file serves five+ tools. Adding a new tool that reads `AGENTS.md` requires zero adapter work.

## 2026-04-08 — Agent Skills standard (SKILL.md in folders) over flat markdown

**Context**: Claude Code, Pi, Crush, and Antigravity all support the Agent Skills open standard: a folder containing `SKILL.md` with frontmatter (`name`, `description`). Skills are discovered on-demand — only the description enters context, full instructions load when triggered.

**Decision**: Skills live in `.skills/{name}/SKILL.md` at project level. This replaces the earlier `.context/skills/{name}.md` flat-file approach.

**Consequences**: Skills are cross-compatible without adaptation. Pi auto-discovers them from `.pi/skills/` (symlink). Crush reads them natively. Progressive disclosure keeps context window lean.

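To make the progressive-disclosure idea in the Agent Skills decision concrete, here is a rough sketch of reading only the `name` and `description` from a `SKILL.md` frontmatter block. It assumes simple `key: value` lines between two `---` markers and is not a full YAML parser; `SkillMeta` and `parseFrontmatter` are names invented for this example, not part of any tool mentioned above.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// SkillMeta holds the two frontmatter fields that enter the context window.
type SkillMeta struct {
	Name        string
	Description string
}

// parseFrontmatter reads key: value lines between the opening and closing
// "---" markers and ignores everything after the closing marker.
func parseFrontmatter(doc string) (SkillMeta, error) {
	var meta SkillMeta
	sc := bufio.NewScanner(strings.NewReader(doc))
	if !sc.Scan() || strings.TrimSpace(sc.Text()) != "---" {
		return meta, errors.New("missing frontmatter opening ---")
	}
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "---" {
			if meta.Name == "" || meta.Description == "" {
				return meta, errors.New("frontmatter needs name and description")
			}
			return meta, nil
		}
		// Split on the first colon only, so values may contain colons.
		key, value, found := strings.Cut(line, ":")
		if !found {
			continue
		}
		switch strings.TrimSpace(key) {
		case "name":
			meta.Name = strings.TrimSpace(value)
		case "description":
			meta.Description = strings.TrimSpace(value)
		}
	}
	return meta, errors.New("missing frontmatter closing ---")
}

func main() {
	doc := "---\nname: review\ndescription: Reviews a diff against project conventions\n---\n# Full instructions\n"
	meta, err := parseFrontmatter(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s: %s\n", meta.Name, meta.Description)
}
```

The body below the closing marker is never read here, which matches the idea that full instructions load only when a skill is triggered.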
## 2026-04-08 — Go + HTMX as default stack

**Context**: Need a default that's fast to prototype, easy to deploy as a single binary, and doesn't require a Node/npm toolchain for the UI layer.

**Decision**: Go with HTMX + Templ for server-rendered UI. Python as fallback for ML/data tasks. TypeScript only when a project genuinely needs a rich client-side SPA.

**Consequences**: Simpler deployment and dependency management. Agents need Go-specific skills.

## 2026-04-08 — Task over Make

**Context**: Makefiles have arcane syntax and poor cross-platform support.

**Decision**: Use Taskfile (taskfile.dev) — YAML-based, cross-platform, supports task dependencies.

**Consequences**: One extra binary to install. All project automation in `Taskfile.yml`.

## 2026-04-08 — Qdrant over ChromaDB for vector store

**Context**: Need collection-level isolation for client separation, payload filtering, runs well in k3s.

**Decision**: Qdrant. Native collection isolation, rich filtering, mature gRPC API.

**Consequences**: More operational complexity than Chroma, but isolation is non-negotiable for client work.

## 2026-04-22 — Hyperguild scope reset: drop parametric learning, simplify brain

**Context**: After shipping Phases 1–4 (MCP server, 6 skills, model orchestration, session logging, CD pipeline), we critically reviewed what was theater vs genuinely useful.

**Decisions**:

1. **Drop the parametric learning pipeline.** SFT/DPO/RL extraction, `brain/training-data/` directory structure, Axolotl/LLaMA-Factory fine-tuning loop — all cut. The loop requires thousands of high-quality examples to move the needle, which a solo consultant won't generate. Better base models ship faster than any fine-tuning effort could keep up with. This is a research project, not a productivity tool.

2. **Simplify the brain to plain markdown.** `brain/knowledge/` replaces `brain/wiki/ + brain/raw/ + brain/training-data/`. The trainer and retrospective workers write markdown entries. `brain_query` searches markdown. No ingestion pipeline, no tagging for significance review, no structured JSONL formats.

3. **Measure the escalation chain before assuming it's useful.** Local model (phi4) only belongs in a skill's chain if it passes Claude verification at a meaningful rate. Where it fails >70% of the time, it adds cost not value. Per-skill hit rate logging is the prerequisite to honest chain configuration.

4. **Keep what's real**: MCP tool surface, session logging with attempt records, tier detection, CD pipeline, bridge to Claude Code.

**What to build next** (in priority order):
- `brain_query` injection into skill handlers before spawning workers — this makes the declarative brain actually function
- `protocols.md` — behavioral contract injected into every worker prompt
- Per-skill pass rate logging and chain tuning

**Consequences**: Simpler system with a shorter feedback loop. The brain becomes real only when skill handlers query it. Training data ambitions deferred indefinitely — revisit if local model capabilities improve enough that fine-tuning becomes worthwhile.

---

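Decision 3 of the 2026-04-22 scope reset (measure the escalation chain, drop the local model where it fails more than 70% of the time) can be sketched as below. The `Attempt` record shape, the function names, and the rule that an unmeasured skill is not kept in the chain are assumptions for illustration; the real session-log schema is not described in this log.

```go
package main

import (
	"fmt"
	"sort"
)

// Attempt is a hypothetical session-log record: one model's try at one skill.
type Attempt struct {
	Skill  string
	Model  string
	Passed bool // true if the result passed verification
}

// Stats counts verified passes out of all attempts for one skill.
type Stats struct{ Passed, Total int }

// FailRate returns the share of attempts that failed verification.
func (s Stats) FailRate() float64 {
	if s.Total == 0 {
		return 0
	}
	return float64(s.Total-s.Passed) / float64(s.Total)
}

// maxFailRate mirrors the ">70% failure" threshold from the decision text.
const maxFailRate = 0.70

// tally groups one model's attempts by skill.
func tally(attempts []Attempt, model string) map[string]Stats {
	out := make(map[string]Stats)
	for _, a := range attempts {
		if a.Model != model {
			continue
		}
		s := out[a.Skill]
		s.Total++
		if a.Passed {
			s.Passed++
		}
		out[a.Skill] = s
	}
	return out
}

// keepInChain reports whether the model has been measured for a skill and
// fails no more often than maxFailRate.
func keepInChain(s Stats) bool {
	return s.Total > 0 && s.FailRate() <= maxFailRate
}

func main() {
	attempts := []Attempt{
		{"review", "phi4", true}, {"review", "phi4", false},
		{"debug", "phi4", false}, {"debug", "phi4", false},
		{"debug", "phi4", false}, {"debug", "phi4", false},
	}
	stats := tally(attempts, "phi4")
	skills := make([]string, 0, len(stats))
	for skill := range stats {
		skills = append(skills, skill)
	}
	sort.Strings(skills) // stable output order
	for _, skill := range skills {
		s := stats[skill]
		fmt.Printf("%s: %d/%d passed, keep=%v\n", skill, s.Passed, s.Total, keepInChain(s))
	}
}
```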
## Plan 6: routing pod reuses internal/skills/{review,debug,retrospective,trainer}

Plan 6 (Mode 2 routing pod, 2026-05-04) introduces a second consumer of
the four cost-routable skill packages. The routing pod constructs each
skill via `<pkg>.New(Config{...})` and hands it `routing.Router.Run` as
the `CompleteFunc`.

**Preserved code (do not delete):**
- `internal/skills/{review,debug,retrospective,trainer}/`
- `internal/registry`, `internal/mcp`, `internal/exec/litellm.go`
- `internal/routing/`, `cmd/routing/`

---

## Plan 7: supervisor pod retired (2026-05-12)

**What was deleted:** `cmd/supervisor/`, `internal/skills/{tdd,spec}/`,
root `Dockerfile`, supervisor k8s manifests (Deployment, Service, Ingress,
NodePort 30320), `supervisor` entry removed from all `.mcp.json` configs.

**Coverage:** `tdd`/`spec` → SKILL.md files in `~/dev/.skills/`; `review`,
`debug`, `retrospective`, `trainer` → routing pod; `brain_*`/`session_log` →
brain MCP; `tier` → `hyperguild tier` CLI.

---

## 2026-05-12 — brain_answer and brain_classify: LLM routing via berget.ai → iguana

**Context:** Brain MCP returned raw BM25 excerpts with no synthesis. Adding
LLM-backed tools enables Q&A and ingestion enrichment without a separate service.

**Decision:** Two new MCP tools in the ingestion service (`ingestion/internal/mcp/`):
- `brain_answer(query)` — BM25 top-10 → LLM synthesis → answer + sources
- `brain_classify(text)` — LLM classifies doc into type/title/tags

Primary LLM: berget.ai `gemma4:31b` (EU cloud, spend tokens while available).
Fallback: iguana `gemma4:31b` (local Ollama). Reranker deferred to follow-up.
Router lives in `ingestion/internal/llm.Router`; opt-in via `BRAIN_LLM_PRIMARY_URL`.

**Consequences:** Brain becomes a knowledge assistant, not just a search index.
When berget.ai tokens run out, flip `BRAIN_LLM_PRIMARY_URL` to iguana.

---

## 2026-04-08 — Mistral Vibe gets its own adapter

**Context**: Vibe doesn't read `AGENTS.md` — it uses `~/.vibe/prompts/` and `~/.vibe/agents/` with TOML config.

**Decision**: The root context-sync generates a `mathias.md` prompt and `mathias.toml` agent config in `~/.vibe/`. This is the one tool that needs a custom adapter path.

**Consequences**: Run `vibe --agent mathias` to use your conventions. Other Vibe users on the machine aren't affected.

---

## 2026-05-18 — project_create commits staging namespace directly to infra main

**Context:** `project_create` writes a k8s namespace manifest into the infra
repo so Flux brings up a staging environment for the new project. Initial
implementation pushed to a `staging/<name>` branch, which required manual PR
merge before Flux saw the namespace — defeating the "one tool call, project
exists, staging reconciling within 60s" goal.

**Decision:** Option A — commit directly to `main`. `callInfraCommit` passes
`branch: "main"` to gitea-mcp's `file_write_branch`; no PR, no merge step.

**Consequences:** Staging namespace appears in cluster within ~60s of the
`project_create` call. Consistent with the project-wide trunk-based
development (TBD) policy in CLAUDE.md: commit directly to main, every commit
deployable. Acceptable because the
manifest is a fresh namespace under `k3s/staging/<name>/` — isolated, low
blast-radius, and Flux will simply recreate it if the file is bad. Manual
review gating was friction for no compensating safety gain on experiment
namespaces.

---

## 2026-05-18 — pgvector over Qdrant for brain hybrid retrieval (supersedes 2026-04-08)

**Context:** The 2026-04-08 ADR chose Qdrant for vector store. Since then,
postgres18 with pgvector has been deployed in the `databases` namespace on
koala and is already the shared default for the rest of the project
(CLAUDE.md lists `pgvector (vector), BM25` as the primary search layer and
Qdrant only as a fallback "when >1M vectors or hybrid retrieval"). Qdrant
itself has never been deployed — `kubectl get` finds no pod, service, or
manifest. Standing up a new vector engine for a single consumer is friction
that the original ADR did not weigh.

**Decision:** Use pgvector for brain hybrid retrieval. Issue #8 — and any
follow-on embedding work — targets the existing `postgres18` instance:

- one table `brain_embeddings(path TEXT PRIMARY KEY, embedding VECTOR(768), updated_at TIMESTAMPTZ)`,
  IVFFlat or HNSW index by feel once volume warrants
- BM25 stays as today (file walk + token frequency); cosine via pgvector
- hybrid scoring done in SQL or Go; pick once we measure
- nomic-embed-text on iguana ollama provides 768-dim vectors

**Consequences:** One database engine instead of two. Backups, monitoring,
and connection pooling already solved. Trade-off: pgvector at >1M vectors
or under hybrid-search load may underperform Qdrant — revisit only when
benchmarks hurt. The 2026-04-08 ADR is superseded for the brain use case;
Qdrant remains the noted fallback path in CLAUDE.md if scale demands it.
