Files
Mathias 2b7bbe38c7
All checks were successful
CI / Lint / Test / Vet (push) Successful in 11s
CI / Mirror to GitHub (push) Successful in 4s
docs(eval): record M4 + M4b scorer runs — phase 2 gate cleared (infra#72)
Tier-weighted retrieval against the qa-2026-05.md 20-question set:

| run                            | top-1 | top-3 |
|--------------------------------|-------|-------|
| baseline (pre-phase-1)         | 20%   | 65%   |
| post phase 1 (parser+content)  | 20%   | 70%   |
| post M4 (tier weighting)       | 30%   | 75%   |
| post M4b (entities → K tier)   | 35%   | 80%   |

Net Phase 2 lift: +15pt top-1, +15pt top-3 — comfortably above the
≥10pt close-gate set in infra#72.

Three remaining misses are content-keyword issues, not structure
issues (the questions don't share enough lexical surface with the
target entries to surface via BM25 alone). Vector search would
help here but the iguana embedder is off-mesh (see infra#64).
2026-05-25 18:51:29 +02:00
..