feat(brain): hybrid BM25+vector search via Qdrant + nomic-embed-text #8

Closed
opened 2026-05-12 15:34:32 +00:00 by mathias · 3 comments
Owner

Context

Brain search is BM25-only (keyword frequency). Semantic similarity search would catch synonyms and paraphrases that BM25 misses. Qdrant is already the chosen vector store (see DECISIONS.md). nomic-embed-text runs on iguana via Ollama.

mem0 stack on koala was down when this was last checked — verify before building.

Proposed change

  1. On brain_write / brain_ingest: embed chunk with nomic-embed-text → upsert into Qdrant collection brain
  2. On brain_query / brain_answer: run BM25 + Qdrant vector search in parallel, merge results (RRF or score threshold), deduplicate by path
  3. Add env vars: BRAIN_QDRANT_URL, BRAIN_EMBED_URL (both opt-in)
  4. Backfill endpoint (/backfill-refs or new /backfill-embeddings) to embed existing brain docs

Prerequisites

  • Confirm Qdrant running on koala and reachable from ingestion pod
  • Confirm nomic-embed-text loaded on iguana Ollama
  • Decide: single brain collection or per-type collections

Acceptance criteria

  • Both env vars unset → BM25-only (no regression)
  • Both set → hybrid results returned
  • Backfill script/endpoint for existing docs
  • task check passes
Author
Owner

Scope clarification: this issue is now the single source of truth for embedding-based brain retrieval. The embedding portion of #2 (flat-JSON sidecar at brain/.embeddings/index.json) was a parallel design that conflicted with DECISIONS.md (Qdrant). #2 has been closed and split — Tunnels live in #16, embeddings live here.

When picking this up, the scope to keep from #2's design notes:

  • Hybrid scoring with a configurable TF/embedding weight (α) — not strictly required, but nice for tuning
  • Opt-in via env var (BRAIN_QDRANT_URL + BRAIN_EMBED_URL) — already in this issue
  • Wing/Hall path scoping at query time — depends on #1

Depends on #1 (Hall taxonomy) for path-scoped queries to be meaningful.
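
One way to read the configurable weight `α` is as a linear blend of a normalized term-frequency score and a cosine similarity. The sketch below assumes `α` weights the TF side and `1 - α` the embedding side. The function names and the divide-by-max normalization are illustrative choices, not something taken from #2's notes.

```go
package main

import "fmt"

// normalize rescales non-negative scores into [0, 1] by dividing by the largest one.
func normalize(scores map[string]float64) map[string]float64 {
	top := 0.0
	for _, s := range scores {
		if s > top {
			top = s
		}
	}
	out := make(map[string]float64, len(scores))
	for path, s := range scores {
		if top > 0 {
			out[path] = s / top
		} else {
			out[path] = 0
		}
	}
	return out
}

// blend returns alpha*tf + (1-alpha)*embedding for every path seen in either map.
// tf holds raw term-frequency scores; emb holds cosine similarities assumed to be in [0, 1].
// A path missing from one side contributes 0 for that side.
func blend(alpha float64, tf, emb map[string]float64) map[string]float64 {
	out := make(map[string]float64)
	for path, s := range normalize(tf) {
		out[path] += alpha * s
	}
	for path, s := range emb {
		out[path] += (1 - alpha) * s
	}
	return out
}

func main() {
	tf := map[string]float64{"wiki/a.md": 2, "wiki/b.md": 1}
	emb := map[string]float64{"wiki/a.md": 0.5, "wiki/c.md": 1}
	fmt.Println(blend(0.5, tf, emb))
}
```

Setting `α` to 1 reduces this to keyword-only ranking and 0 to embedding-only ranking, which makes the two extremes easy to compare while tuning.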

Author
Owner

Scope update (2026-05-18): backend switched from Qdrant → pgvector on the existing postgres18 in databases namespace.

Reason: Qdrant has never been deployed — no pod, no service, no manifest. Postgres18+pgvector is already running and shared across the project; CLAUDE.md lists pgvector as the default vector store anyway. Spinning up a new engine for one consumer is friction the 2026-04-08 ADR didn't weigh. Recorded in DECISIONS.md 2026-05-18 (supersedes 2026-04-08).

Revised target shape:

  • Table brain_embeddings(path TEXT PRIMARY KEY, embedding VECTOR(768), updated_at TIMESTAMPTZ) (768 = nomic-embed-text dimension)
  • BM25 stays as today (file walk + token frequency)
  • Cosine via pgvector <=> operator
  • Hybrid scoring done in SQL or Go; tune α after measuring
  • Embedding via nomic-embed-text on iguana Ollama (confirmed loaded)
  • Opt-in env vars stay the same shape (BRAIN_EMBED_URL for embedder, plus a BRAIN_PG_DSN for the brain postgres connection)
  • Watcher tick upserts new/modified notes; brain_query runs BM25 + cosine in parallel and merges results

Unchanged and still out of scope here: CLAUDE.md keeps Qdrant as the documented fallback for collections above 1M vectors.

Author
Owner

Shipped in 57462b5. Backend pivoted to pgvector (see DECISIONS.md 2026-05-18).

Surface: embed.Client (Ollama embed), vectorstore.PGStore (pgxpool+pgvector HNSW), vectorstore.Sync (walk wiki/, upsert deltas, delete vanished). search.Query gains optional Vector+Embedder; when both set, BM25 + pgvector merge via Reciprocal Rank Fusion (k=60). MCP wired via WithHybridRetrieval. New REST POST /backfill-embeddings.
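
For readers unfamiliar with Reciprocal Rank Fusion, a minimal sketch follows. It is not the code from `57462b5`. The function name `fuseRRF` and the tie-break by path are assumptions, and only the constant k=60 comes from the comment above. Each path scores the sum of 1/(k + rank) across the lists it appears in.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfK is the rank-fusion constant; 60 is the value named in the comment above.
const rrfK = 60

// fuseRRF merges best-first ranked lists of paths with Reciprocal Rank Fusion.
// A path's fused score is the sum over lists of 1/(k + rank), rank starting at 1.
// Scores accumulate per path, so a path found by both searches appears once.
func fuseRRF(k int, lists ...[]string) []string {
	scores := make(map[string]float64)
	for _, list := range lists {
		seen := make(map[string]bool)
		for i, path := range list {
			if seen[path] {
				continue // count only the best rank within a single list
			}
			seen[path] = true
			scores[path] += 1.0 / float64(k+i+1)
		}
	}
	out := make([]string, 0, len(scores))
	for path := range scores {
		out = append(out, path)
	}
	sort.Slice(out, func(a, b int) bool {
		if scores[out[a]] != scores[out[b]] {
			return scores[out[a]] > scores[out[b]]
		}
		return out[a] < out[b] // deterministic order for equal scores
	})
	return out
}

func main() {
	bm25 := []string{"wiki/a.md", "wiki/b.md", "wiki/c.md"}
	vector := []string{"wiki/b.md", "wiki/d.md", "wiki/a.md"}
	fmt.Println(fuseRRF(rrfK, bm25, vector))
	// prints [wiki/b.md wiki/a.md wiki/d.md wiki/c.md]
}
```

Because RRF uses ranks rather than raw scores, BM25 and cosine values never need to be put on a common scale before merging.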

Env: BRAIN_PG_DSN+BRAIN_EMBED_URL must be set together (one alone → exit 1). Defaults: nomic-embed-text:latest (768d), 300s sync interval. Backfill: post to /backfill-embeddings.
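
The both-or-neither rule for the two variables can be expressed as a small startup check. The sketch below is illustrative: `hybridEnabled` and `errPartialConfig` are hypothetical names, and the real binary may structure this differently.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errPartialConfig is returned when only one of the two variables is set.
var errPartialConfig = errors.New("BRAIN_PG_DSN and BRAIN_EMBED_URL must be set together")

// hybridEnabled reports whether hybrid retrieval should be switched on.
// Both empty means BM25-only, both set means hybrid, anything else is an error.
func hybridEnabled(pgDSN, embedURL string) (bool, error) {
	switch {
	case pgDSN == "" && embedURL == "":
		return false, nil
	case pgDSN != "" && embedURL != "":
		return true, nil
	default:
		return false, errPartialConfig
	}
}

func main() {
	hybrid, err := hybridEnabled(os.Getenv("BRAIN_PG_DSN"), os.Getenv("BRAIN_EMBED_URL"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1) // one variable alone is a configuration error
	}
	fmt.Println("hybrid retrieval enabled:", hybrid)
}
```

Failing fast on a half-configured environment avoids silently falling back to BM25-only when hybrid search was intended.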

DB bootstrap: scripts/brain-embeddings-init.sql — one-time DBA setup.

Acceptance criteria all met, task check clean. CI deploys image automatically; live verification gated on SOPS secret update.

Reference: mathias/hyperguild#8