feat(brain): add Qwen3-Reranker to brain_answer for improved RAG quality #7
Context
brain_answer currently does BM25 top-10 → LLM synthesis. BM25 recall is decent, but ranking is based on keyword frequency: semantically relevant chunks can rank low if they don't share exact terms with the query.
Qwen3-Reranker is available on iguana (cross-encoder, runs via Ollama). Adding a rerank step between retrieval and synthesis should improve answer quality with no change to the LLM call.
Proposed change
- BRAIN_RERANKER_URL env var (opt-in, same pattern as BRAIN_LLM_PRIMARY_URL)
Why deferred
Needs Qwen3-Reranker confirmed running on iguana and a clean HTTP API to call it (Ollama /api/rerank or a custom wrapper). Verify model availability before building.
Acceptance criteria
- BRAIN_RERANKER_URL unset → behaviour unchanged (BM25 top-10 direct to LLM)
- BRAIN_RERANKER_URL set → BM25 top-20 → rerank → top-5 → LLM
- task check passes
Shipped in a56a4db.
Design note (deviation from spec)
The issue described a reranker "score" with a "top-5 by reranker score" cut. Qwen3-Reranker as published on Ollama (no native /api/rerank in v0.21.1) returns a single yes/no token under its trained chat template, so there is no logprob surface from which to extract a fine-grained float per pair.
Implementation choice: treat the reranker as a filter rather than a ranker.
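The filter-not-ranker idea can be sketched roughly as follows. This is a minimal illustration, not the code from a56a4db: the issue names parseYesNo but does not show its signature, so the signature here is hypothetical, as are filterChunks, the judge callback (standing in for the HTTP call to the reranker), and the choice to keep a chunk when the reply is unparseable.

```go
package main

import (
	"fmt"
	"strings"
)

// parseYesNo interprets a reranker reply as a keep/drop verdict.
// ok is false when the first token is neither "yes" nor "no".
// Hypothetical signature: the issue only mentions the function by name.
func parseYesNo(reply string) (keep bool, ok bool) {
	fields := strings.Fields(strings.ToLower(reply))
	if len(fields) == 0 {
		return false, false
	}
	switch strings.Trim(fields[0], ".,!\"'") {
	case "yes":
		return true, true
	case "no":
		return false, true
	}
	return false, false
}

// filterChunks keeps chunks in their original (BM25) order and stops once
// limit chunks have survived. A chunk is dropped only on an explicit "no";
// keeping chunks on an unparseable reply is an assumption, chosen so that a
// misbehaving reranker degrades towards plain BM25.
func filterChunks(chunks []string, limit int, judge func(chunk string) string) []string {
	kept := make([]string, 0, limit)
	for _, c := range chunks {
		if len(kept) == limit {
			break
		}
		keep, ok := parseYesNo(judge(c))
		if !ok || keep {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	chunks := []string{"setup.md", "noise.md", "deploy.md"}
	// Stub judge: a real one would send (query, chunk) to the reranker.
	judge := func(c string) string {
		if c == "noise.md" {
			return "no"
		}
		return "Yes."
	}
	fmt.Println(filterChunks(chunks, 5, judge)) // prints [setup.md deploy.md]
}
```

Because survivors keep their BM25 order, the "≤5" cut is simply the first five chunks the reranker does not reject.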
Net effect matches the spec's intent (better RAG quality, fewer irrelevant chunks sent to the LLM) using the API actually available today. If /api/rerank or a logprob path lands in Ollama, swap parseYesNo for a float decoder; the rest of the wiring is stable.
Acceptance criteria
- BRAIN_RERANKER_URL unset → behaviour unchanged (BM25 top-10 → LLM): TestBrainAnswer_Synthesizes still green with no reranker injected
- BRAIN_RERANKER_URL set → BM25 top-20 → rerank → ≤5 → LLM: TestBrainAnswer_RerankerFiltersBeforeLLM proves noise.md is dropped before the LLM call
- task check clean
New env vars
- BRAIN_RERANKER_URL: http://iguana:11434
- BRAIN_RERANKER_MODEL: dengcao/Qwen3-Reranker-0.6B:F16
Verified
dengcao/Qwen3-Reranker-0.6B:F16 was already loaded in Ollama on iguana before coding (per the issue's "verify model availability" gate).
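For illustration, the opt-in behaviour from the acceptance criteria could be wired from the two env vars roughly like this. It is a sketch under assumptions: rerankConfig, loadRerankConfig and retrievalPlan are hypothetical names, and treating the listed model tag as a fallback when BRAIN_RERANKER_MODEL is unset is a guess, not something the issue states.

```go
package main

import (
	"fmt"
	"os"
)

// rerankConfig holds the opt-in reranker settings (hypothetical type).
type rerankConfig struct {
	URL   string
	Model string
}

// loadRerankConfig reads both env vars through getenv, which is os.Getenv in
// production and a stub in tests. An empty URL means the reranker is off.
func loadRerankConfig(getenv func(string) string) rerankConfig {
	cfg := rerankConfig{
		URL:   getenv("BRAIN_RERANKER_URL"),
		Model: getenv("BRAIN_RERANKER_MODEL"),
	}
	if cfg.Model == "" {
		// Assumed fallback: the model tag listed under "New env vars".
		cfg.Model = "dengcao/Qwen3-Reranker-0.6B:F16"
	}
	return cfg
}

// enabled reports whether the rerank step should run at all.
func (c rerankConfig) enabled() bool { return c.URL != "" }

// retrievalPlan returns how many BM25 hits to fetch and the cap on chunks
// passed to the LLM, following the acceptance criteria:
// unset → top-10 straight through; set → top-20, filtered down to at most 5.
func retrievalPlan(c rerankConfig) (fetch, keep int) {
	if c.enabled() {
		return 20, 5
	}
	return 10, 10
}

func main() {
	cfg := loadRerankConfig(os.Getenv)
	fetch, keep := retrievalPlan(cfg)
	fmt.Printf("reranker enabled=%v model=%s fetch=%d keep<=%d\n",
		cfg.enabled(), cfg.Model, fetch, keep)
}
```

Passing getenv as a parameter keeps the unset and set paths testable without touching the process environment.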
Deploy
CI/CD auto-rebuilds the ingestion image. Add BRAIN_RERANKER_URL=http://iguana:11434 to the supervisor pod env to flip it on once the image rolls out.
Closing.