hyperguild/ingestion/internal/search/search_test.go at 49b188e9c90565a377f1914943c203e33be1e10e

mathias/hyperguild

Mathias 4f78fecd06

CI / Lint / Test / Vet (push) Successful in 12s

CI / Mirror to GitHub (push) Successful in 3s

feat(search): M4 tier-weighted BM25 re-rank (infra#72)

The eval set under brain/eval/qa-2026-05.md showed BM25 top-1 at 20%.
Five of the missed slugs were short, focused knowledge entries that
lost to long aggregate docs on raw term frequency. Tier weighting
addresses that without touching the BM25 algorithm itself.

How

- Result struct gains a Tier field, populated during the file walk
  via extractTier (frontmatter wins, path prefix as fallback; this
  mirrors the graph.inferTierFromPath logic so the two callers stay
  in lockstep).
- After the existing sort (and optional hybridMerge), do a final
  stable re-sort by float64(Score) * tierWeight(Tier). Knowledge
  ×1.5, note ×1.0, inbox ×0.3, unknown ×1.0.
- hydrate() (vector-only hits) also fills Tier so re-ranking covers
  the hybrid path.
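
The tier lookup described in the first bullet could look roughly like the sketch below. It is not the repository's code. The frontmatter shape (a `tier:` key between `---` delimiters), the directory names (`knowledge/`, `notes/`, `inbox/`), and the helper names `tierFromFrontmatter` and `tierFromPath` are all assumptions made for illustration; only the "frontmatter wins, path prefix as fallback" ordering comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// extractTier resolves a document's tier. Frontmatter wins; the path
// prefix is the fallback; "" means unknown. Hypothetical reconstruction.
func extractTier(path, content string) string {
	if t := tierFromFrontmatter(content); t != "" {
		return t
	}
	return tierFromPath(path)
}

// tierFromFrontmatter reads a "tier:" key from a leading block delimited
// by "---" lines (assumed format). Returns "" when no such key exists.
func tierFromFrontmatter(content string) string {
	lines := strings.Split(content, "\n")
	if strings.TrimSpace(lines[0]) != "---" {
		return ""
	}
	for _, line := range lines[1:] {
		line = strings.TrimSpace(line)
		if line == "---" {
			break // end of frontmatter
		}
		key, value, ok := strings.Cut(line, ":")
		if ok && strings.TrimSpace(key) == "tier" {
			return strings.ToLower(strings.TrimSpace(value))
		}
	}
	return ""
}

// tierFromPath maps the first path segment to a tier (assumed layout).
func tierFromPath(path string) string {
	first, _, _ := strings.Cut(strings.TrimPrefix(path, "/"), "/")
	switch first {
	case "knowledge":
		return "knowledge"
	case "notes":
		return "note"
	case "inbox":
		return "inbox"
	}
	return ""
}

func main() {
	// Frontmatter overrides the inbox/ path prefix.
	fmt.Println(extractTier("inbox/a.md", "---\ntier: knowledge\n---\nbody"))
	// No frontmatter, so the path prefix decides.
	fmt.Println(extractTier("notes/b.md", "no frontmatter here"))
}
```

Returning an empty string for unknown tiers keeps the caller's weighting simple, since an unknown tier is weighted the same as a note (×1.0).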

Test covers the load-bearing case: a long note-tier doc with raw=10
loses to a short knowledge-tier doc with raw=8 after weighting
(8×1.5=12 vs 10×1.0=10).
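
The weighting and the test case above can be sketched as follows. This is a minimal illustration, not the code in search.go: the `Result` fields other than `Tier`, the `float32` type of `Score` (the commit only shows it being cast with `float64(Score)`), the tier strings, and the function name `rerankByTier` are assumptions. The weights are the ones listed in the commit message.

```go
package main

import (
	"fmt"
	"sort"
)

// Result is a pared-down stand-in for the search result struct.
type Result struct {
	Slug  string
	Score float32 // assumed type; the commit casts it with float64(Score)
	Tier  string
}

// tierWeight returns the multiplier from the commit message:
// knowledge x1.5, note x1.0, inbox x0.3, unknown x1.0.
func tierWeight(tier string) float64 {
	switch tier {
	case "knowledge":
		return 1.5
	case "inbox":
		return 0.3
	default: // "note" and unknown tiers
		return 1.0
	}
}

// rerankByTier stable-sorts results by weighted score, descending.
// Stability preserves the prior order among equal weighted scores.
func rerankByTier(results []Result) {
	sort.SliceStable(results, func(i, j int) bool {
		wi := float64(results[i].Score) * tierWeight(results[i].Tier)
		wj := float64(results[j].Score) * tierWeight(results[j].Tier)
		return wi > wj
	})
}

func main() {
	results := []Result{
		{Slug: "long-aggregate", Score: 10, Tier: "note"},
		{Slug: "short-focused", Score: 8, Tier: "knowledge"},
	}
	rerankByTier(results)
	for _, r := range results {
		fmt.Printf("%s %.1f\n", r.Slug, float64(r.Score)*tierWeight(r.Tier))
	}
	// Prints:
	// short-focused 12.0
	// long-aggregate 10.0
}
```

A stable sort matters here because the re-rank runs after the existing sort: results whose weighted scores tie keep the order the earlier pass gave them.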

Measurement gate is in infra#72: re-run brain/eval/score.py against
the live brain after this image lands; close the issue when top-1
hit rate lifts by ≥10 absolute points.

2026-05-25 18:45:20 +02:00

8.1 KiB