hyperguild/brain/eval/baseline-pre-fix.txt

# baseline-pre-fix — 20 questions, k=5

top-1 hit rate: 4/20 = 20%
top-3 hit rate: 13/20 = 65%
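
Both summary rates can be recomputed from the per-question ranks below. This is a minimal sketch, not the actual eval harness. It assumes that a hit at k means the expected note's rank falls between 1 and k, and that rank=0 means the note was not retrieved. `RANKS` and `hit_rate` are hypothetical names, and the ranks are copied in file order.

```python
# Ranks copied from the per-question detail, in file order.
# Assumption: rank is the 1-based position of the expected note in the
# top-5 results, and 0 means the expected note was not retrieved at all.
RANKS = [3, 1, 1, 3, 3, 0, 2, 1, 0, 0, 2, 3, 2, 2, 2, 1, 0, 0, 0, 4]


def hit_rate(ranks: list[int], k: int) -> tuple[int, int]:
    """Return (hits, total), counting a hit when 1 <= rank <= k."""
    hits = sum(1 for r in ranks if 1 <= r <= k)
    return hits, len(ranks)


for k in (1, 3):
    hits, total = hit_rate(RANKS, k)
    print(f"top-{k} hit rate: {hits}/{total} = {hits / total:.0%}")
```

Under that definition the loop prints the same two lines as the summary, 4/20 = 20% and 13/20 = 65%. The same function with k=5 also counts the single rank-4 hit.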

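## per-question detail

rank = 1-based position of the expected note in the top-5 results; rank=0 means it was not retrieved.
markers:
  ★  rank 1
  ·  rank 2-3
  ?  rank 4 (within top-5, outside top-3)
  ✗  rank 0 (miss)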

· rank=3  expected=dex-in-memory-storage-wipes-oauth-tokens-on-every-pod-restart
     q: how do I stop dex from logging users out on every pod restart?
     1. homelab-network-perimeter-model
     2. 2026-05-12-koala-machine-state
     3. dex-in-memory-storage-wipes-oauth-tokens-on-every-pod-restart  <-- expected
     4. infra-litellm-absorption-2026-05-16
     5. Financial Sentiment Analysis on Stock Market Headlines With FinBERT & HuggingFace

★ rank=1  expected=postgres-least-privilege-migration-tenant-grant-bypass-2026-05
     q: my postgres-exporter broke after revoking PUBLIC CONNECT — why?
     1. postgres-least-privilege-migration-tenant-grant-bypass-2026-05  <-- expected
     2. infra-litellm-absorption-2026-05-16
     3. brain-mcp-activation-runbook
     4. extension-version-lags-platform-major-upgrade
     5. ntfy-deny-all-rollout-ordering-keep-alert-pipeline-live-during-auth-flip

★ rank=1  expected=homelab-network-perimeter-model
     q: when is a NodePort acceptable vs needing a public ingress with bearer gate?
     1. homelab-network-perimeter-model  <-- expected
     2. qwen3-thinking-model-empty-content-trap
     3. mcpclient-empty-token-silent-401-envfrom-missing-key
     4. 2026-05-12-koala-machine-state
     5. koala-llama-swap-native-tool-calls-survey-2026-05

· rank=3  expected=exit-255-unknown-reason-not-oom
     q: what does container exit code 255 with reason Unknown mean?
     1. qwen3-thinking-model-empty-content-trap
     2. infra-litellm-absorption-2026-05-16
     3. exit-255-unknown-reason-not-oom  <-- expected
     4. mcpclient-empty-token-silent-401-envfrom-missing-key
     5. koala-llama-swap-native-tool-calls-survey-2026-05

· rank=3  expected=gitea-push-mirror-cannot-create-remote-repo-needs-pre-existing-github-repo
     q: can gitea push-mirror create the github repo automatically?
     1. infra-litellm-absorption-2026-05-16
     2. Autoresearch
     3. gitea-push-mirror-cannot-create-remote-repo-needs-pre-existing-github-repo  <-- expected
     4. adr-new-project-gitea-first-github-mirror
     5. adr-github-as-primary-remote

✗ rank=0  expected=flux-healthcheck-stale-on-resource-removal
     q: a flux kustomization is stuck after I removed a resource — why?
     1. qwen3-thinking-model-empty-content-trap
     2. 2026-05-12-koala-machine-state
     3. homelab-architecture-principles-2026-05
     4. gitea-mcp: full stack shipped end-to-end (2026-05-05)
     5. k8s-configmap-mount-no-reload-needs-pod-restart

· rank=2  expected=go-bytes-buffer-bytes-reset-aliasing-trap
     q: the bytes buffer aliasing trap with Reset in a loop — what's the bug?
     1. Financial Sentiment Analysis on Stock Market Headlines With FinBERT & HuggingFace
     2. go-bytes-buffer-bytes-reset-aliasing-trap  <-- expected
     3. homelab-security-chains-not-bugs
     4. training-on-rtx-5070-pretraining-vs-finetuning
     5. Hash Encoding

★ rank=1  expected=homelab-architecture-principles-2026-05
     q: what are the homelab architecture principles from may 2026?
     1. homelab-architecture-principles-2026-05  <-- expected
     2. homelab-network-perimeter-model
     3. Claude Managed Agents — architecture notes relevant to homelab agent platform
     4. homelab-core-glossary
     5. 2026-05-12-koala-machine-state

✗ rank=0  expected=2026-05-04-sops-age-key-from-flux-cluster
     q: where does the sops age private key live in the cluster?
     1. 2026-05-12-koala-machine-state
     2. homelab-network-perimeter-model
     3. postgres-least-privilege-migration-tenant-grant-bypass-2026-05
     4. brain-mcp-activation-runbook
     5. dex-in-memory-storage-wipes-oauth-tokens-on-every-pod-restart

✗ rank=0  expected=grafana-dashboards-as-code-not-ui-state
     q: why do my grafana dashboards disappear after a pod restart?
     1. infra-litellm-absorption-2026-05-16
     2. 2026-05-12-koala-machine-state
     3. Financial Sentiment Analysis on Stock Market Headlines With FinBERT & HuggingFace
     4. brain-mcp-activation-runbook
     5. dex-in-memory-storage-wipes-oauth-tokens-on-every-pod-restart

· rank=2  expected=double-diamond-methodology
     q: what is the double diamond methodology?
     1. Harnessing the Power of Hash Encoding for Categorical Data in Data Science
     2. double-diamond-methodology  <-- expected
     3. unified-methodology-diamond-futures-autoresearch
     4. futures-thinking-extended-double-diamond
     5. insight-exploration-as-diamond-1

· rank=3  expected=2026-05-04-mcp-transport-version-claude-ai-strict
     q: my MCP server works from claude code but fails on claude.ai — what's different?
     1. qwen3-thinking-model-empty-content-trap
     2. mcp-resource-url-empty-breaks-claude-ai-discovery-silently
     3. 2026-05-04-mcp-transport-version-claude-ai-strict  <-- expected
     4. 2026-05-04-claude-ai-custom-mcp-connectors
     5. finding-github-mcp-claudeai-vs-claudecode

· rank=2  expected=homelab-security-chains-not-bugs
     q: how should I rate security findings — isolated bugs or exploit chains?
     1. homelab-network-perimeter-model
     2. homelab-security-chains-not-bugs  <-- expected
     3. Financial Sentiment Analysis on Stock Market Headlines With FinBERT & HuggingFace
     4. policy-audit-mode-blocks-nothing
     5. homelab-document-accepted-risk-to-break-audit-cycle

· rank=2  expected=2026-05-03-canonical-vs-derived-context-flow
     q: how should canonical context files relate to derived adapter files?
     1. qwen3-thinking-model-empty-content-trap
     2. 2026-05-03-canonical-vs-derived-context-flow  <-- expected
     3. 2026-05-12-koala-machine-state
     4. 2026-05-04-claude-ai-custom-mcp-connectors
     5. koala-llama-swap-native-tool-calls-survey-2026-05

· rank=2  expected=homelab-core-glossary
     q: what is the homelab core vocabulary glossary?
     1. homelab-architecture-principles-2026-05
     2. homelab-core-glossary  <-- expected
     3. Claude Managed Agents — architecture notes relevant to homelab agent platform
     4. 2026-05-12-koala-machine-state
     5. Autoresearch

★ rank=1  expected=koala-llama-swap-native-tool-calls-survey-2026-05
     q: which models on koala llama-swap actually emit native tool_calls correctly?
     1. koala-llama-swap-native-tool-calls-survey-2026-05  <-- expected
     2. 2026-05-12-koala-machine-state
     3. infra-litellm-absorption-2026-05-16
     4. training-on-rtx-5070-pretraining-vs-finetuning
     5. qwen3-thinking-model-empty-content-trap

✗ rank=0  expected=qwen35-9b-fast
     q: what is qwen35-9b-fast and what's it used for?
     1. koala-llama-swap-native-tool-calls-survey-2026-05
     2. qwen3-thinking-model-empty-content-trap
     3. Qwen35-9b-fast
     4. infra-litellm-absorption-2026-05-16
     5. 2026-05-12-koala-machine-state

✗ rank=0  expected=go-defer-errcheck-body-close
     q: in go, how do I prevent defer body close from silently dropping errors?
     1. infra-litellm-absorption-2026-05-16
     2. homelab-network-perimeter-model
     3. go-bytes-buffer-bytes-reset-aliasing-trap
     4. mcpclient-empty-token-silent-401-envfrom-missing-key
     5. brain-mcp-activation-runbook

✗ rank=0  expected=hyperguild-level3-pipeline-rewrite
     q: what was the level 3 rewrite of hyperguild's ingestion pipeline?
     1. 2026-05-12-koala-machine-state
     2. homelab-core-glossary
     3. brain-mcp-activation-runbook
     4. koala-llama-swap-native-tool-calls-survey-2026-05
     5. infra-litellm-absorption-2026-05-16

? rank=4  expected=adr-new-project-gitea-first-github-mirror
     q: what's the new-project ADR — is it gitea-first or github-first?
     1. gitea-push-mirror-cannot-create-remote-repo-needs-pre-existing-github-repo
     2. gitea-mcp: full stack shipped end-to-end (2026-05-05)
     3. mcp-tool-design-get-needs-list-partner
     4. adr-new-project-gitea-first-github-mirror  <-- expected
     5. 2026-05-04-gitea-mcp-build-session
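
The qwen35-9b-fast entry scores rank=0 even though `Qwen35-9b-fast` appears at position 3. This suggests that the expected id is compared to retrieved ids exactly, including case. The sketch below shows one way a rank could be computed with optional case-folding. It assumes the results are plain id strings and that case-folding is an acceptable normalization. Both are assumptions, and `expected_rank` is a hypothetical helper, not the real scorer.

```python
def expected_rank(results: list[str], expected: str, casefold: bool = False) -> int:
    """Return the 1-based position of `expected` in `results`, or 0 if absent.

    With casefold=True, ids are compared case-insensitively (a hypothetical
    normalization, not necessarily what the real scorer does).
    """
    def norm(s: str) -> str:
        return s.casefold() if casefold else s

    target = norm(expected)
    for position, result_id in enumerate(results, start=1):
        if norm(result_id) == target:
            return position
    return 0


# Top-5 results for "what is qwen35-9b-fast and what's it used for?"
qwen_results = [
    "koala-llama-swap-native-tool-calls-survey-2026-05",
    "qwen3-thinking-model-empty-content-trap",
    "Qwen35-9b-fast",
    "infra-litellm-absorption-2026-05-16",
    "2026-05-12-koala-machine-state",
]
print(expected_rank(qwen_results, "qwen35-9b-fast"))                 # → 0
print(expected_rank(qwen_results, "qwen35-9b-fast", casefold=True))  # → 3
```

With exact matching, the question is recorded as a miss. With case-folding, the same results would score it as a rank-3 hit.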