diff --git a/.aider.conventions.md b/.aider.conventions.md index 01003be..43d4fbf 100644 --- a/.aider.conventions.md +++ b/.aider.conventions.md @@ -227,6 +227,12 @@ Two MCP servers expose this project's tooling, both reachable over Tailscale: `tdd_green`, `tdd_refactor`, `review`, `debug`, `spec`, `retrospective`, `trainer`, `tier`). Will shrink as skill workers move to SKILL.md in a later migration. +- **`routing`** at `http://koala:30310/mcp` — Mode 2 routing pod. Advertises + the same four cost-routable skills as the supervisor (`review`, `debug`, + `retrospective`, `trainer`) but per-call decides whether to use a local + model or Claude based on the brain's `/pass-rate` response. Bearer auth + via `ROUTING_MCP_TOKEN` (opt-in). Only `mode client-local` registers this + endpoint; Mode 1 and Mode 3 do not. The brain HTTP REST API (`/query`, `/write`, `/ingest`, `/ingest-raw`, `/ingest-path`, `/backfill-refs`) remains available on the same port (3300) for diff --git a/.context/PROJECT.md b/.context/PROJECT.md index 7a4dab2..8e1b4e5 100644 --- a/.context/PROJECT.md +++ b/.context/PROJECT.md @@ -56,6 +56,12 @@ Two MCP servers expose this project's tooling, both reachable over Tailscale: `tdd_green`, `tdd_refactor`, `review`, `debug`, `spec`, `retrospective`, `trainer`, `tier`). Will shrink as skill workers move to SKILL.md in a later migration. +- **`routing`** at `http://koala:30310/mcp` — Mode 2 routing pod. Advertises + the same four cost-routable skills as the supervisor (`review`, `debug`, + `retrospective`, `trainer`) but per-call decides whether to use a local + model or Claude based on the brain's `/pass-rate` response. Bearer auth + via `ROUTING_MCP_TOKEN` (opt-in). Only `mode client-local` registers this + endpoint; Mode 1 and Mode 3 do not. The brain HTTP REST API (`/query`, `/write`, `/ingest`, `/ingest-raw`, `/ingest-path`, `/backfill-refs`) remains available on the same port (3300) for diff --git a/.context/system-prompt.txt b/.context/system-prompt.txt index 10e407a..0353a6c 100644 --- a/.context/system-prompt.txt +++ b/.context/system-prompt.txt @@ -232,6 +232,12 @@ Two MCP servers expose this project's tooling, both reachable over Tailscale: `tdd_green`, `tdd_refactor`, `review`, `debug`, `spec`, `retrospective`, `trainer`, `tier`). Will shrink as skill workers move to SKILL.md in a later migration. +- **`routing`** at `http://koala:30310/mcp` — Mode 2 routing pod. Advertises + the same four cost-routable skills as the supervisor (`review`, `debug`, + `retrospective`, `trainer`) but per-call decides whether to use a local + model or Claude based on the brain's `/pass-rate` response. Bearer auth + via `ROUTING_MCP_TOKEN` (opt-in). Only `mode client-local` registers this + endpoint; Mode 1 and Mode 3 do not. The brain HTTP REST API (`/query`, `/write`, `/ingest`, `/ingest-raw`, `/ingest-path`, `/backfill-refs`) remains available on the same port (3300) for diff --git a/.cursorrules b/.cursorrules index 33efd99..bd23f96 100644 --- a/.cursorrules +++ b/.cursorrules @@ -230,6 +230,12 @@ Two MCP servers expose this project's tooling, both reachable over Tailscale: `tdd_green`, `tdd_refactor`, `review`, `debug`, `spec`, `retrospective`, `trainer`, `tier`). Will shrink as skill workers move to SKILL.md in a later migration. +- **`routing`** at `http://koala:30310/mcp` — Mode 2 routing pod. Advertises + the same four cost-routable skills as the supervisor (`review`, `debug`, + `retrospective`, `trainer`) but per-call decides whether to use a local + model or Claude based on the brain's `/pass-rate` response. Bearer auth + via `ROUTING_MCP_TOKEN` (opt-in). Only `mode client-local` registers this + endpoint; Mode 1 and Mode 3 do not. The brain HTTP REST API (`/query`, `/write`, `/ingest`, `/ingest-raw`, `/ingest-path`, `/backfill-refs`) remains available on the same port (3300) for diff --git a/AGENTS.md b/AGENTS.md index 01003be..43d4fbf 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -227,6 +227,12 @@ Two MCP servers expose this project's tooling, both reachable over Tailscale: `tdd_green`, `tdd_refactor`, `review`, `debug`, `spec`, `retrospective`, `trainer`, `tier`). Will shrink as skill workers move to SKILL.md in a later migration. +- **`routing`** at `http://koala:30310/mcp` — Mode 2 routing pod. Advertises + the same four cost-routable skills as the supervisor (`review`, `debug`, + `retrospective`, `trainer`) but per-call decides whether to use a local + model or Claude based on the brain's `/pass-rate` response. Bearer auth + via `ROUTING_MCP_TOKEN` (opt-in). Only `mode client-local` registers this + endpoint; Mode 1 and Mode 3 do not. The brain HTTP REST API (`/query`, `/write`, `/ingest`, `/ingest-raw`, `/ingest-path`, `/backfill-refs`) remains available on the same port (3300) for diff --git a/CLAUDE.md b/CLAUDE.md index 7a4dab2..8e1b4e5 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -56,6 +56,12 @@ Two MCP servers expose this project's tooling, both reachable over Tailscale: `tdd_green`, `tdd_refactor`, `review`, `debug`, `spec`, `retrospective`, `trainer`, `tier`). Will shrink as skill workers move to SKILL.md in a later migration. +- **`routing`** at `http://koala:30310/mcp` — Mode 2 routing pod. Advertises + the same four cost-routable skills as the supervisor (`review`, `debug`, + `retrospective`, `trainer`) but per-call decides whether to use a local + model or Claude based on the brain's `/pass-rate` response. Bearer auth + via `ROUTING_MCP_TOKEN` (opt-in). Only `mode client-local` registers this + endpoint; Mode 1 and Mode 3 do not. The brain HTTP REST API (`/query`, `/write`, `/ingest`, `/ingest-raw`, `/ingest-path`, `/backfill-refs`) remains available on the same port (3300) for diff --git a/README.md b/README.md index 3577d84..f619890 100644 --- a/README.md +++ b/README.md @@ -12,6 +12,7 @@ Your Claude Code session (in any project) │ │ MCP over HTTP (Tailscale) ├──▶ supervisor :3200 (NodePort 30320 on koala) — skill workers: tdd, debug, spec, … + ├──▶ routing :3210 (NodePort 30310 on koala) — Mode 2 only: review, debug, retrospective, trainer └──▶ brain :3300 (NodePort 30330 on koala) — brain_query, brain_write, brain_ingest, session_log │ └─ also serves the legacy REST endpoints (/query, /write, /ingest, …) @@ -112,6 +113,14 @@ The supervisor probes connectivity at call time: | `INGEST_BASE_URL` | `http://localhost:3300` | Supervisor → ingestion | | `LITELLM_BASE_URL` | — | LiteLLM proxy for Tier 2 model routing | | `SUPERVISOR_MCP_TOKEN` | — | Optional bearer token for the supervisor MCP HTTP endpoint; when empty, no auth is enforced | +| `ROUTING_PORT` | `3210` | Routing pod's listen port | +| `ROUTING_MCP_TOKEN` | — | Optional bearer token for the routing MCP HTTP endpoint | +| `BRAIN_URL` | `http://ingestion.supervisor:3300` | Routing pod → brain (in-cluster) | +| `HYPERGUILD_LOCAL_MODEL` | `qwen35` | Local model for routed-to-local skill calls | +| `HYPERGUILD_CLAUDE_MODEL` | `claude-sonnet-4-6` | Claude model for routed-to-Claude skill calls | +| `HYPERGUILD_ROUTE_LOCAL_FLOOR` | `0.90` | At/above pass rate, route to local | +| `HYPERGUILD_ROUTE_LOCAL_CEIL` | `0.70` | Below pass rate, route to Claude. Between CEIL and FLOOR is the sample band. | +| `HYPERGUILD_PASS_RATE_TTL_SECONDS` | `60` | Per-skill pass-rate cache TTL | ## Phase 2 (planned)