mathias/hyperguild

Mathias 58c57412a9

feat(brain-mcp): OAuth 2.0 client_credentials flow for claude.ai

Adds a minimal RFC 8414 + RFC 6749 client_credentials flow so claude.ai's
custom-MCP integration (no static-Bearer field in the UI) can exchange a
client_id + client_secret pair for the existing BRAIN_MCP_TOKEN and use
it as a Bearer on /mcp. No JWTs, no refresh, no expiry — the rest of
the auth middleware is unchanged.

New package ingestion/internal/oauth:
- MetadataHandler(issuer): serves /.well-known/oauth-authorization-server
  with grant_types=[client_credentials] and both
  token_endpoint_auth_methods (post + basic).
- TokenHandler(cfg): serves /oauth/token. Validates client_id and
  client_secret via constant-time compare; returns BRAIN_MCP_TOKEN as
  access_token. RFC 6749 §5.2 error JSON on bad grant / bad creds.

Wiring in cmd/server/main.go: opt-in by setting both OAUTH_CLIENT_ID and
OAUTH_CLIENT_SECRET. Setting only one is misconfiguration → exit 1.
Mounts both endpoints with no auth; MCP_RESOURCE_URL supplies the
issuer.

Also pivots issue #8's vector backend from Qdrant to pgvector (see
DECISIONS.md 2026-05-18) — Qdrant was never deployed and postgres18 with
pgvector already runs as the project default; supersedes 2026-04-08 for
this use case.

Tests cover post-auth, basic-auth, wrong secret, bad grant, GET
rejection, malformed Basic header, and Basic without colon.

Closes hyperguild#5.

2026-05-18 22:21:54 +02:00

Decisions log

Record why things are the way they are. Future-you will thank present-you.

2026-04-08 — AGENTS.md as cross-tool standard, not CLAUDE.md

Context: Multiple tools (Crush, Pi, Antigravity) read AGENTS.md natively. Claude Code reads CLAUDE.md. Building on CLAUDE.md as the primary format locks into one vendor.

Decision: Canonical source is .context/AGENT.md (root) and .context/PROJECT.md (per-project). The adapter script generates both AGENTS.md and CLAUDE.md — identical content, two filenames. Crush, Pi, and Antigravity read AGENTS.md; Claude Code reads CLAUDE.md.

Consequences: One canonical file serves five+ tools. Adding a new tool that reads AGENTS.md requires zero adapter work.

2026-04-08 — Agent Skills standard (SKILL.md in folders) over flat markdown

Context: Claude Code, Pi, Crush, and Antigravity all support the Agent Skills open standard: a folder containing SKILL.md with frontmatter (name, description). Skills are discovered on-demand — only the description enters context, full instructions load when triggered.

Decision: Skills live in .skills/{name}/SKILL.md at project level. This replaces the earlier .context/skills/{name}.md flat-file approach.

Consequences: Skills are cross-compatible without adaptation. Pi auto-discovers them from .pi/skills/ (symlink). Crush reads them natively. Progressive disclosure keeps the context window lean.
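Progressive disclosure means only the frontmatter has to be parsed up front. Below is a hedged sketch with hypothetical parseSkillFrontmatter and contextLine helpers. It handles only flat key: value lines, not full YAML as the real tools likely do.

```go
package main

import (
	"bufio"
	"errors"
	"strings"
)

// SkillMeta is the part of a skill that is loaded eagerly.
type SkillMeta struct {
	Name        string
	Description string
}

// parseSkillFrontmatter reads the leading "---" block of a SKILL.md file.
func parseSkillFrontmatter(doc string) (SkillMeta, error) {
	sc := bufio.NewScanner(strings.NewReader(doc))
	if !sc.Scan() || strings.TrimSpace(sc.Text()) != "---" {
		return SkillMeta{}, errors.New("missing frontmatter")
	}
	var m SkillMeta
	for sc.Scan() {
		line := sc.Text()
		if strings.TrimSpace(line) == "---" {
			if m.Name == "" || m.Description == "" {
				return SkillMeta{}, errors.New("frontmatter needs name and description")
			}
			return m, nil // the body after this line is never read here
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		switch strings.TrimSpace(key) {
		case "name":
			m.Name = strings.TrimSpace(val)
		case "description":
			m.Description = strings.TrimSpace(val)
		}
	}
	return SkillMeta{}, errors.New("unterminated frontmatter")
}

// contextLine is what enters the agent's context before the skill triggers.
func contextLine(m SkillMeta) string {
	return m.Name + ": " + m.Description
}
```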

2026-04-08 — Go + HTMX as default stack

Context: Need a default that's fast to prototype, easy to deploy as a single binary, and doesn't require a Node/npm toolchain for the UI layer.

Decision: Go with HTMX + Templ for server-rendered UI. Python as fallback for ML/data tasks. TypeScript only when a project genuinely needs a rich client-side SPA.

Consequences: Simpler deployment and dependency management. Agents need Go-specific skills.

2026-04-08 — Task over Make

Context: Makefiles have arcane syntax and poor cross-platform support.

Decision: Use Taskfile (taskfile.dev) — YAML-based, cross-platform, supports task dependencies.

Consequences: One extra binary to install. All project automation in Taskfile.yml.

2026-04-08 — Qdrant over ChromaDB for vector store

Context: Need collection-level isolation for client separation, payload filtering, and an engine that runs well in k3s.

Decision: Qdrant. Native collection isolation, rich filtering, mature gRPC API.

Consequences: More operational complexity than Chroma, but isolation is non-negotiable for client work.

2026-04-22 — Hyperguild scope reset: drop parametric learning, simplify brain

Context: After shipping Phases 1–4 (MCP server, 6 skills, model orchestration, session logging, CD pipeline), we critically reviewed which parts were theater and which were genuinely useful.

Decisions:

- Drop the parametric learning pipeline. SFT/DPO/RL extraction, the brain/training-data/ directory structure, and the Axolotl/LLaMA-Factory fine-tuning loop are all cut. The loop needs thousands of high-quality examples to move the needle, which a solo consultant won't generate, and better base models ship faster than any fine-tuning effort could keep pace with. Fine-tuning here would be a research project, not a productivity tool.
- Simplify the brain to plain markdown. brain/knowledge/ replaces brain/wiki/, brain/raw/, and brain/training-data/. The trainer and retrospective workers write markdown entries, and brain_query searches markdown. No ingestion pipeline, no tagging for significance review, no structured JSONL formats.
- Measure the escalation chain before assuming it's useful. The local model (phi4) belongs in a skill's chain only if it passes Claude verification at a meaningful rate. Where it fails more than 70% of the time, it adds cost, not value. Per-skill hit-rate logging is the prerequisite for honest chain configuration.
- Keep what's real: MCP tool surface, session logging with attempt records, tier detection, CD pipeline, bridge to Claude Code.
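A minimal sketch of the gate this implies. It assumes a hypothetical Attempt record derived from the session log's attempt records, plus a hypothetical minSamples floor. The decision itself names only the 70% threshold.

```go
package main

import "sort"

// Attempt is a hypothetical shape for one logged local-model attempt.
type Attempt struct {
	Skill  string
	Passed bool // true if Claude verification accepted the local output
}

// maxFailRate is the ">70% failure" cut-off from the decision.
const maxFailRate = 0.70

type tally struct{ total, failed int }

// skillsToDropLocal returns, sorted, the skills whose local-model failure rate
// exceeds maxFailRate. Skills with fewer than minSamples attempts are skipped,
// since a handful of runs says little about the true rate.
func skillsToDropLocal(attempts []Attempt, minSamples int) []string {
	tallies := map[string]*tally{}
	for _, a := range attempts {
		t, ok := tallies[a.Skill]
		if !ok {
			t = &tally{}
			tallies[a.Skill] = t
		}
		t.total++
		if !a.Passed {
			t.failed++
		}
	}
	var drop []string
	for skill, t := range tallies {
		if t.total >= minSamples && float64(t.failed)/float64(t.total) > maxFailRate {
			drop = append(drop, skill)
		}
	}
	sort.Strings(drop)
	return drop
}
```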

What to build next (in priority order):

1. brain_query injection into skill handlers before they spawn workers; this is what makes the declarative brain actually function.
2. protocols.md: a behavioral contract injected into every worker prompt.
3. Per-skill pass-rate logging and chain tuning.

Consequences: Simpler system with a shorter feedback loop. The brain becomes real only when skill handlers query it. Training data ambitions deferred indefinitely — revisit if local model capabilities improve enough that fine-tuning becomes worthwhile.

Plan 6: routing pod reuses internal/skills/{review,debug,retrospective,trainer}

Plan 6 (Mode 2 routing pod, 2026-05-04) introduces a second consumer of the four cost-routable skill packages. The routing pod constructs each skill via <pkg>.New(Config{...}) and hands it routing.Router.Run as the CompleteFunc.

Preserved code (do not delete):

- internal/skills/{review,debug,retrospective,trainer}/
- internal/registry, internal/mcp, internal/exec/litellm.go
- internal/routing/, cmd/routing/

Plan 7: supervisor pod retired (2026-05-12)

What was deleted: cmd/supervisor/, internal/skills/{tdd,spec}/, root Dockerfile, supervisor k8s manifests (Deployment, Service, Ingress, NodePort 30320), supervisor entry removed from all .mcp.json configs.

Coverage: tdd/spec → SKILL.md files in ~/dev/.skills/; review, debug, retrospective, trainer → routing pod; brain_*/session_log → brain MCP; tier → hyperguild tier CLI.

2026-05-12 — brain_answer and brain_classify: LLM routing via berget.ai → iguana

Context: Brain MCP returned raw BM25 excerpts with no synthesis. Adding LLM-backed tools enables Q&A and ingestion enrichment without a separate service.

Decision: Two new MCP tools in the ingestion service (ingestion/internal/mcp/):

- brain_answer(query): BM25 top-10 → LLM synthesis → answer + sources
- brain_classify(text): the LLM classifies a doc into type/title/tags

Primary LLM: berget.ai gemma4:31b (EU cloud; spend the available tokens while they last). Fallback: iguana gemma4:31b (local Ollama). The reranker is deferred to a follow-up. The router lives in ingestion/internal/llm.Router and is opt-in via BRAIN_LLM_PRIMARY_URL.

Consequences: Brain becomes a knowledge assistant, not just a search index. When berget.ai tokens run out, flip BRAIN_LLM_PRIMARY_URL to iguana.

2026-04-08 — Mistral Vibe gets its own adapter

Context: Vibe doesn't read AGENTS.md — it uses ~/.vibe/prompts/ and ~/.vibe/agents/ with TOML config.

Decision: The root context-sync generates a mathias.md prompt and a mathias.toml agent config in ~/.vibe/. Vibe is the one tool that needs a custom adapter path.

Consequences: Run vibe --agent mathias to use your conventions. Other Vibe users on the machine aren't affected.

2026-05-18 — project_create commits staging namespace directly to infra main

Context: project_create writes a k8s namespace manifest into the infra repo so Flux brings up a staging environment for the new project. The initial implementation pushed to a staging/<name> branch, which required a manual PR merge before Flux saw the namespace, defeating the "one tool call, project exists, staging reconciling within 60s" goal.

Decision: Option A — commit directly to main. callInfraCommit passes branch: "main" to gitea-mcp's file_write_branch; no PR, no merge step.

Consequences: The staging namespace appears in the cluster within ~60s of the project_create call. This is consistent with the project-wide trunk-based development (TBD) policy in CLAUDE.md: commit directly to main, every commit deployable. It is acceptable because the manifest is a fresh namespace under k3s/staging/<name>/: isolated, low blast radius, and Flux will simply recreate it if the file is bad. Manual review gating was friction with no compensating safety gain on experiment namespaces.
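Below is a sketch of the file that project_create might hand to callInfraCommit, which per the decision passes branch: "main". The namespace.yaml filename, the stagingNamespaceFile helper, and naming the namespace after the project are all assumptions. Validating the name as an RFC 1123 label keeps a bad call from producing a manifest the Kubernetes API server would reject.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
)

// dns1123Label matches valid Kubernetes namespace names (max 63 chars, checked below).
var dns1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// stagingNamespaceFile returns the infra-repo path and manifest body for a
// project's staging namespace (hypothetical helper).
func stagingNamespaceFile(project string) (string, string, error) {
	if len(project) > 63 || !dns1123Label.MatchString(project) {
		return "", "", fmt.Errorf("invalid namespace name %q", project)
	}
	p := path.Join("k3s", "staging", project, "namespace.yaml")
	body := fmt.Sprintf("apiVersion: v1\nkind: Namespace\nmetadata:\n  name: %s\n", project)
	return p, body, nil
}
```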

2026-05-18 — pgvector over Qdrant for brain hybrid retrieval (supersedes 2026-04-08)

Context: The 2026-04-08 ADR chose Qdrant as the vector store. Since then, postgres18 with pgvector has been deployed in the databases namespace on koala and is already the shared default for the rest of the project: CLAUDE.md lists pgvector (vector) and BM25 as the primary search layer, with Qdrant only as a fallback "when >1M vectors or hybrid retrieval". Qdrant itself has never been deployed; kubectl get finds no pod, service, or manifest. Standing up a new vector engine for a single consumer adds friction that the original ADR did not weigh.

Decision: Use pgvector for brain hybrid retrieval. Issue #8 — and any follow-on embedding work — targets the existing postgres18 instance:

- one table, brain_embeddings(path TEXT PRIMARY KEY, embedding VECTOR(768), updated_at TIMESTAMPTZ), with an IVFFlat or HNSW index chosen by feel once volume warrants
- BM25 stays as it is today (file walk + token frequency); cosine similarity comes from pgvector
- hybrid scoring happens in SQL or Go; pick once we measure
- nomic-embed-text on the iguana Ollama instance provides the 768-dim vectors

Consequences: One database engine instead of two. Backups, monitoring, and connection pooling already solved. Trade-off: pgvector at >1M vectors or under hybrid-search load may underperform Qdrant — revisit only when benchmarks hurt. The 2026-04-08 ADR is superseded for the brain use case; Qdrant remains the noted fallback path in CLAUDE.md if scale demands it.
