hyperguild/ingestion/internal/vectorstore/sync.go at 078ec029da2caf16a992b25115293376741b8cc3

mathias/hyperguild

Mathias 078ec029da

CI / Lint / Test / Vet (push) Successful in 11s

CI / Mirror to GitHub (push) Has been skipped

fix(ingestion): embed sync also scans brain/knowledge/ + logs per-item errors

The embed sync goroutine only walked brain/wiki/. brain/knowledge/ (112
curated entries, per CLAUDE.md the most-important brain content) had zero
coverage in brain_embeddings — vector retrieval was blind to it. Hybrid
BM25 + pgvector retrieval would never surface a curated knowledge entry
via the vector arm.

Extract the per-root walk into a loop over a small subdir list and add
"knowledge" alongside "wiki". scanDirs is package-level so it stays a
single source of truth for what gets embedded.
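A minimal sketch of the loop described above, not the actual sync.go code. Only the name scanDirs and the "wiki" and "knowledge" entries come from the commit message; collectMarkdown, makeDemoTree, the .md filter, and the demo file names are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"sort"
	"strings"
)

// scanDirs lists the brain/ subdirectories that get embedded.
// Keeping it package-level gives one place to add or remove a root.
var scanDirs = []string{"wiki", "knowledge"}

// collectMarkdown (hypothetical helper) walks every scanDirs entry under
// brainRoot and returns the .md files it finds, relative to brainRoot.
// The .md filter is an assumption, not something the commit states.
func collectMarkdown(brainRoot string) ([]string, error) {
	var paths []string
	for _, sub := range scanDirs {
		root := filepath.Join(brainRoot, sub)
		if _, err := os.Stat(root); os.IsNotExist(err) {
			continue // in this sketch a missing subdir is skipped, not fatal
		}
		err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
			if err != nil {
				return err
			}
			if d.IsDir() || !strings.HasSuffix(d.Name(), ".md") {
				return nil
			}
			rel, err := filepath.Rel(brainRoot, p)
			if err != nil {
				return err
			}
			paths = append(paths, filepath.ToSlash(rel))
			return nil
		})
		if err != nil {
			return nil, fmt.Errorf("walk %s: %w", root, err)
		}
	}
	sort.Strings(paths)
	return paths, nil
}

// makeDemoTree (hypothetical) builds a throwaway brain/ layout for the demo.
func makeDemoTree() (string, error) {
	root, err := os.MkdirTemp("", "brain-demo")
	if err != nil {
		return "", err
	}
	files := []string{"wiki/b.md", "knowledge/a.md", "raw/ignored.md", "wiki/notes.txt"}
	for _, f := range files {
		full := filepath.Join(root, filepath.FromSlash(f))
		if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
			return "", err
		}
		if err := os.WriteFile(full, []byte("demo"), 0o644); err != nil {
			return "", err
		}
	}
	return root, nil
}

func main() {
	root, err := makeDemoTree()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.RemoveAll(root)

	paths, err := collectMarkdown(root)
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

In this sketch, covering another brain/ subdirectory would be a one-line change to scanDirs; the walk itself stays untouched.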

Also log each failing item's path + error string from StartSync.
Previously only the aggregate count was logged, so a persistent
`errors=1` per cycle was opaque. With per-item warnings, the actual
ollama "input length exceeds the context length" error surfaces
immediately.
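The per-item logging could look roughly like the sketch below. It is an illustration under assumptions, not the code in StartSync: the itemError and syncResult types, the logCycle function, the log messages, the sample path, and the use of log/slog are all hypothetical. Only the ollama error text and the `errors=1` aggregate are taken from the commit message.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"log/slog"
	"strings"
)

// itemError (hypothetical) pairs a failing file with the error it produced.
type itemError struct {
	Path string
	Err  error
}

// syncResult (hypothetical) summarizes one sync cycle.
type syncResult struct {
	Embedded int
	Errors   []itemError
}

// logCycle emits one warning per failing item, then the aggregate line.
// Without the loop, only "errors=N" would be visible.
func logCycle(logger *slog.Logger, res syncResult) {
	for _, e := range res.Errors {
		logger.Warn("embed sync: item failed", "path", e.Path, "error", e.Err.Error())
	}
	logger.Info("embed sync: cycle done", "embedded", res.Embedded, "errors", len(res.Errors))
}

// renderCycle logs a cycle into a buffer so the output can be inspected.
func renderCycle(res syncResult) string {
	var buf bytes.Buffer
	logCycle(slog.New(slog.NewTextHandler(&buf, nil)), res)
	return buf.String()
}

// warnLines returns only the WARN-level lines of text-handler output.
func warnLines(out string) []string {
	var lines []string
	for _, l := range strings.Split(out, "\n") {
		if strings.Contains(l, "level=WARN") {
			lines = append(lines, l)
		}
	}
	return lines
}

// sampleResult builds a cycle with one failure; the path is made up.
func sampleResult() syncResult {
	return syncResult{
		Embedded: 3,
		Errors: []itemError{
			{Path: "knowledge/long-entry.md", Err: errors.New("input length exceeds the context length")},
		},
	}
}

func main() {
	out := renderCycle(sampleResult())
	fmt.Print(out)
	fmt.Println("warnings:", len(warnLines(out)))
}
```

Logging the path and error as separate key/value attributes keeps the lines greppable by file, which is what makes a recurring single failure easy to trace.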

Refs gitea/mathias/infra#37 (this commit covers the knowledge/ scan
bug; the long-file chunking bug is a separate change).

2026-05-19 21:27:15 +02:00

4.5 KiB
