fix(ingestion): always append .md extension to written filenames

brain_write with a custom filename omitted the .md extension, causing search to skip the file (search.go filters on HasSuffix .md).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
10
.mcp.json
Normal file
@@ -0,0 +1,10 @@
{
  "mcpServers": {
    "supervisor": {
      "command": "/Users/mathias/dev/AI/supervisor/bin/supervisor-bridge",
      "env": {
        "SUPERVISOR_URL": "http://koala:30320/mcp"
      }
    }
  }
}
241
docs/multi-model-routing.md
Normal file
@@ -0,0 +1,241 @@
# Multi-Model Routing for supervisor

Reference document for implementing multi-model access within the supervisor project.
Researched April 2026. Constraints: Claude Max subscription (ToS must be respected).

---

## Goal

Route tasks to specialized, cheaper, or local models during agent and skill flows — without
violating Anthropic's terms or introducing unnecessary infrastructure risk.

---

## Hard Constraints

- Claude Max subscription is in use. Anthropic's April 2026 terms **prohibit using the
  subscription with third-party harnesses that spoof the Anthropic API surface**.
- `ANTHROPIC_BASE_URL` → LiteLLM workaround is explicitly out of scope.
- Claude must remain the reasoning engine. Other models are tools, not replacements.
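
The `ANTHROPIC_BASE_URL` constraint above could be checked mechanically before a subscription-backed session starts. The following is a minimal sketch, not part of supervisor: the names `FORBIDDEN_OVERRIDES` and `find_forbidden_overrides` are hypothetical, and it assumes the only variable worth flagging is the one named in the constraint.

```python
# Hypothetical pre-flight check: flag environment overrides that would
# redirect Claude Code away from Anthropic's own API endpoint.
# ANTHROPIC_BASE_URL is the variable named in the constraint above;
# the tuple can be extended if other overrides need to be ruled out.
FORBIDDEN_OVERRIDES = ("ANTHROPIC_BASE_URL",)


def find_forbidden_overrides(env: dict[str, str]) -> list[str]:
    """Return the forbidden variable names that are set to a non-blank value."""
    return [name for name in FORBIDDEN_OVERRIDES if env.get(name, "").strip()]
```

A wrapper script might call `find_forbidden_overrides(dict(os.environ))` and refuse to launch if the returned list is non-empty.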

---

## Infrastructure Available

| Machine | Role | Relevant services |
|---------|------|-------------------|
| koala | GPU inference | llama-swap, Ollama, Qdrant, LiteLLM proxy |
| iguana | Services, builds | k3s, general services |
| flamingo | Daily driver | Claude Code runs here |

LiteLLM proxy on koala exposes 100+ models (local + cloud) through a unified API.
All machines connected via Tailscale.

---

## Approved Patterns

### Pattern 1 — Native Claude model tiering (zero build)

Claude Code subagents support per-agent model selection via frontmatter.
Use this for cost routing within the Claude model family.

```yaml
# ~/.claude/agents/explorer.md
---
name: explorer
description: File reading, code search, codebase mapping — use for all exploration tasks
model: haiku
---
```

- `haiku` for exploration, summarization, classification
- `sonnet` (default) for main reasoning and implementation
- `opus` for deep analysis, architecture decisions

**When to use**: Always. Add `model: haiku` to any subagent that does read-heavy or
classification work. Cheapest and fastest path to cost control.
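
If several subagents need the same treatment, the frontmatter can be generated rather than hand-edited. This is a rough sketch under stated assumptions: `render_agent` and `write_agent` are hypothetical helpers, the three aliases are the ones listed above, and the description is assumed to contain no characters that need YAML quoting.

```python
from pathlib import Path

# Model aliases taken from the list above.
ALLOWED_MODELS = {"haiku", "sonnet", "opus"}


def render_agent(name: str, description: str, model: str = "sonnet") -> str:
    """Build the frontmatter block for a subagent definition file."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model alias: {model!r}")
    return f"---\nname: {name}\ndescription: {description}\nmodel: {model}\n---\n"


def write_agent(agents_dir: Path, name: str, description: str, model: str) -> Path:
    """Write <agents_dir>/<name>.md and return the path that was written."""
    agents_dir.mkdir(parents=True, exist_ok=True)
    path = agents_dir / f"{name}.md"
    path.write_text(render_agent(name, description, model), encoding="utf-8")
    return path
```

For example, `write_agent(Path.home() / ".claude" / "agents", "explorer", "File reading and code search", "haiku")` would produce a file equivalent to the YAML shown above, minus the leading comment.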

---

### Pattern 2 — MCP tools wrapping local models (primary build target)

Expose local models on koala as named MCP tools. Claude remains the orchestrator and
reasoning engine — it calls local models as tools the same way it calls any other tool.

This is the intended MCP use case and carries zero ToS risk.

**Semantic contract**: Claude decides *when* to delegate based on the tool description.
Write descriptions that tell Claude what the model is good for.

#### MCP server implementation

Small Python server, run on koala or flamingo, registered in Claude Code settings.

```python
# supervisor/scripts/mcp_local_models.py
import requests
from mcp.server.fastmcp import FastMCP

# FastMCP is the decorator-based server class from the official `mcp` Python SDK.
server = FastMCP("local-models")

LITELLM_BASE = "http://koala:4000"
OLLAMA_BASE = "http://koala:11434"

def _litellm_chat(model: str, prompt: str) -> str:
    # Send a single-turn chat request to the LiteLLM proxy (OpenAI-compatible API).
    r = requests.post(f"{LITELLM_BASE}/v1/chat/completions", json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 2048,
    }, timeout=120)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]


@server.tool()
def ask_local_llama(prompt: str) -> str:
    """Ask the local Llama model on koala.
    Use for: bulk summarization, first-pass analysis, classification, simple Q&A,
    anything that does not require deep reasoning or up-to-date knowledge.
    Faster and cheaper than cloud models for routine subtasks."""
    return _litellm_chat("llama3-local", prompt)


@server.tool()
def ask_coding_model(code: str, question: str) -> str:
    """Ask a code-specialized local model.
    Use for: syntax checking, boilerplate generation, code formatting questions,
    simple refactors where pattern-matching is sufficient."""
    return _litellm_chat("codellama-local", f"Code:\n{code}\n\nQuestion: {question}")


@server.tool()
def list_available_local_models() -> list[str]:
    """List all models currently available on the local LiteLLM proxy."""
    r = requests.get(f"{LITELLM_BASE}/v1/models", timeout=30)
    r.raise_for_status()
    return [m["id"] for m in r.json()["data"]]


if __name__ == "__main__":
    # Serve over stdio, the transport used for locally spawned MCP servers.
    server.run(transport="stdio")
```

#### Register in Claude Code

Add to the project-level `.mcp.json` (or register it user-wide with `claude mcp add`):

```json
{
  "mcpServers": {
    "local-models": {
      "command": "python3",
      "args": ["/path/to/supervisor/scripts/mcp_local_models.py"]
    }
  }
}
```

#### LiteLLM config additions needed on koala

```yaml
# litellm config.yaml — add model entries for local models
model_list:
  - model_name: llama3-local
    litellm_params:
      model: ollama/llama3.2
      api_base: http://localhost:11434

  - model_name: codellama-local
    litellm_params:
      model: ollama/codellama
      api_base: http://localhost:11434
```
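
Because the MCP tools hard-code the names `llama3-local` and `codellama-local`, it may be worth confirming that the proxy actually exposes them after the config change. The sketch below assumes the proxy answers `GET /v1/models` without an API key and returns the OpenAI-style `{"data": [{"id": ...}]}` shape that the MCP server already relies on; `parse_model_ids`, `missing_models`, and `check_proxy` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

LITELLM_BASE = "http://koala:4000"  # same proxy address the MCP server uses
REQUIRED_MODELS = ("llama3-local", "codellama-local")


def parse_model_ids(payload: dict) -> set[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return {entry["id"] for entry in payload.get("data", [])}


def missing_models(required, available) -> list[str]:
    """Return the required model names the proxy does not expose, in order."""
    return [name for name in required if name not in available]


def check_proxy(base_url: str = LITELLM_BASE) -> list[str]:
    """Query the proxy and report which required models are missing."""
    with urlopen(f"{base_url}/v1/models", timeout=10) as resp:
        return missing_models(REQUIRED_MODELS, parse_model_ids(json.load(resp)))
```

An empty list from `check_proxy()` would suggest both entries were picked up. If the proxy is configured with a master key, the request would additionally need an `Authorization` header.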

---

### Pattern 3 — External orchestration scripts (for pipeline workflows)

For multi-model pipelines that don't need to live inside a Claude Code session.
These scripts use their own API key (separate from Max subscription — API billing),
so they can call Claude API + LiteLLM freely.

Claude Code invokes them via the Bash tool.

```
Claude Code → [Bash tool] → ./scripts/orchestrate.py → {Claude API, LiteLLM, local models}
```

```python
# supervisor/scripts/orchestrate.py
import anthropic
import requests

claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY — separate from Max subscription

def analyze_document(path: str) -> str:
    with open(path) as f:
        content = f.read()

    # Step 1: local Llama extracts structure (fast, cheap)
    response = requests.post("http://koala:4000/v1/chat/completions", json={
        "model": "llama3-local",
        "messages": [{"role": "user", "content": f"Extract key sections from:\n{content}"}],
    }, timeout=120)
    response.raise_for_status()  # fail loudly instead of indexing into an error body
    structure = response.json()["choices"][0]["message"]["content"]

    # Step 2: Claude synthesizes and reasons over it
    synthesis = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        messages=[{"role": "user", "content": f"Synthesize these findings:\n{structure}"}]
    )
    return synthesis.content[0].text
```

**When to use**: Batch processing, automated pipelines, workflows triggered by cron or
external events. Not for interactive Claude Code sessions.
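
Since the script is meant to be launched from the Bash tool or cron, it needs a command-line entry point, which the listing above leaves out. One possible shape is sketched here. The `--output` flag and the `build_parser` and `run` names are assumptions for illustration, and the analysis function is passed in so the wrapper can be exercised without network access.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    """Define the command-line interface for the pipeline script."""
    parser = argparse.ArgumentParser(description="Run the document analysis pipeline.")
    parser.add_argument("path", help="document to analyze")
    parser.add_argument("--output", default="-", help="output file, or '-' for stdout")
    return parser


def run(argv: list[str], analyze) -> int:
    """Parse argv, call analyze(path), write the result; return an exit code."""
    args = build_parser().parse_args(argv)
    try:
        result = analyze(args.path)
    except Exception as exc:  # report any pipeline failure on stderr
        print(f"orchestrate: {exc}", file=sys.stderr)
        return 1
    if args.output == "-":
        print(result)
    else:
        with open(args.output, "w", encoding="utf-8") as f:
            f.write(result)
    return 0
```

At the bottom of `orchestrate.py` this could be wired up as `if __name__ == "__main__": sys.exit(run(sys.argv[1:], analyze_document))`, so a non-zero exit code tells the calling Bash tool or cron job that the pipeline failed.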

---

## What to Skip

| Approach | Why skip |
|----------|----------|
| `ANTHROPIC_BASE_URL` → LiteLLM | ToS violation with Max subscription (April 2026 terms) |
| Third-party harnesses (OpenClaw etc.) | Explicitly banned for subscription users |
| A2A in Claude Code | Not implemented by Anthropic yet — revisit late 2026 |
| OpenAI agent handoffs | Loses execution context, not worth the complexity |

---

## Protocol Landscape (for awareness, not immediate action)

- **MCP** — production, 97M monthly downloads, your primary tool-access protocol. LiteLLM
  natively supports it as both MCP gateway and MCP client as of v1.60+.
- **A2A v1.0** — Google/Linux Foundation, 150+ orgs in production, but Anthropic has not
  shipped it in Claude Code. The intent is agent-to-agent peer delegation (vs MCP's
  agent-to-tool). Worth watching for H2 2026.
- **AGNTCY** — Cisco/Linux Foundation, discovery and identity layer beneath MCP+A2A.
  Potentially relevant for multi-machine routing across koala/iguana/flamingo once mature.

---

## Build Priority

| Step | Effort | Value | When |
|------|--------|-------|------|
| Add `model: haiku` to explorer subagents | 10 min | Immediate cost saving | Now |
| Write MCP server for local models | 2–3h | Local model access in sessions | Soon |
| Register MCP server in Claude Code settings | 15 min | Activates pattern 2 | With above |
| Write orchestration script template | 1–2h | Pipeline workflows | When needed |

---

## References

- LiteLLM MCP docs: https://docs.litellm.ai/docs/mcp
- Community MCP wrapper for LiteLLM: https://github.com/itsDarianNgo/mcp-server-litellm
- Ollama MCP server: https://github.com/rawveg/ollama-mcp
- A2A protocol status: https://www.linuxfoundation.org/press/a2a-protocol-surpasses-150-organizations-lands-in-major-cloud-platforms-and-sees-enterprise-production-use-in-first-year
- AGNTCY: https://github.com/agntcy
2138
docs/superpowers/plans/2026-04-17-hyperguild-phase1.md
Normal file
File diff suppressed because it is too large
1871
docs/superpowers/plans/2026-04-19-hyperguild-phase2.md
Normal file
File diff suppressed because it is too large
1617
docs/superpowers/plans/2026-04-20-model-orchestration-plan.md
Normal file
File diff suppressed because it is too large
1073
docs/superpowers/plans/2026-04-22-phase4-attempt-wiring.md
Normal file
File diff suppressed because it is too large
@@ -99,7 +99,11 @@ func (h *Handler) Write(w http.ResponseWriter, r *http.Request) {
 		finalContent = fm.String() + req.Content
 	}

-	dest := filepath.Join(rawDir, filepath.Base(filename))
+	base := filepath.Base(filename)
+	if !strings.HasSuffix(base, ".md") {
+		base += ".md"
+	}
+	dest := filepath.Join(rawDir, base)
 	if err := os.WriteFile(dest, []byte(finalContent), 0o644); err != nil {
 		h.logger.Error("write failed", "err", err)
 		http.Error(w, "write error", http.StatusInternalServerError)