Strips slug authority from the LLM. The new RawPage type carries only
{title, type, subtype, domain, content}, with no paths or frontmatter.
The pipeline will derive slugs deterministically (Task 4).
pipeline.go gets a temporary bridge stub (TODO task4) to keep the
package compiling between tasks.
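
A minimal sketch of what deterministic slug derivation could look like. Only the RawPage field set comes from this commit; the Go field names, JSON tags, the slugify and slugFor helpers, and the domain/title layout are all assumptions made for illustration.

```go
package main

import (
	"strings"
	"unicode"
)

// RawPage mirrors the field set named in the commit message.
// The Go field names and JSON tags are assumptions.
type RawPage struct {
	Title   string `json:"title"`
	Type    string `json:"type"`
	Subtype string `json:"subtype"`
	Domain  string `json:"domain"`
	Content string `json:"content"`
}

// slugify is a hypothetical helper: it lowercases s, keeps letters and
// digits, and replaces each run of other characters with a single hyphen.
// Leading and trailing separators are dropped.
func slugify(s string) string {
	var b strings.Builder
	pendingHyphen := false
	for _, r := range strings.ToLower(s) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			if pendingHyphen && b.Len() > 0 {
				b.WriteByte('-')
			}
			pendingHyphen = false
			b.WriteRune(r)
			continue
		}
		pendingHyphen = true
	}
	return b.String()
}

// slugFor derives a slug from page metadata alone, so the same RawPage
// always maps to the same location. The "domain/title" layout is assumed.
func slugFor(p RawPage) string {
	return slugify(p.Domain) + "/" + slugify(p.Title)
}
```

Because the slug depends only on fields the pipeline already holds, the LLM never needs to emit a path.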
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New extract package: Text() dispatcher for .md/.txt passthrough and
  PDF extraction via a pdftotext subprocess
- wiki.Entry gains Aliases []string, loaded from YAML frontmatter
- Fuzzy entity resolution in pipeline: normalizes titles (lowercase,
  strip articles, collapse hyphens) and matches proposed pages against
  existing inventory slugs and aliases to prevent duplicate pages from
  proliferating
- Watcher and API handler now use extract.Text() instead of os.ReadFile
- Dockerfile: apk add poppler-utils in Alpine runtime stage
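
The title normalization and alias matching described above might look roughly like this. It is a sketch only: Entry is a pared-down stand-in for wiki.Entry (only Aliases is named in this commit; Slug is assumed), and "collapse hyphens" is interpreted here as treating hyphens and underscores as word separators.

```go
package main

import "strings"

// Entry is a stand-in for wiki.Entry. Slug is an assumed field.
type Entry struct {
	Slug    string
	Aliases []string
}

// normalize lowercases s, treats hyphens and underscores as spaces,
// drops English articles, and joins the remaining words with single
// hyphens so that spelling variants map to the same key.
func normalize(s string) string {
	s = strings.ToLower(s)
	s = strings.NewReplacer("-", " ", "_", " ").Replace(s)
	var words []string
	for _, w := range strings.Fields(s) {
		switch w {
		case "a", "an", "the":
			continue // skip articles
		}
		words = append(words, w)
	}
	return strings.Join(words, "-")
}

// resolve returns the slug of the first inventory entry whose slug or
// alias normalizes to the same key as title, or false if none matches.
func resolve(title string, inventory []Entry) (string, bool) {
	key := normalize(title)
	if key == "" {
		return "", false
	}
	for _, e := range inventory {
		if normalize(e.Slug) == key {
			return e.Slug, true
		}
		for _, a := range e.Aliases {
			if normalize(a) == key {
				return e.Slug, true
			}
		}
	}
	return "", false
}
```

A proposed page titled "The Apollo Program" would then resolve to an existing apollo-program entry instead of creating a second page.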
Files dropped into brain/raw/ are now copied to processed/ or failed/ rather
than moved. A .processed or .failed marker is written next to the original so
the watcher skips it on subsequent polls without deleting it. This keeps
Syncthing-synced Obsidian vaults intact after ingestion.
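
The marker scheme can be sketched as below. The exact marker naming (suffix appended to the full filename) and the helper names markerPath, isMarker, shouldSkip, and markDone are assumptions; the commit only states that a .processed or .failed marker sits next to the original.

```go
package main

import (
	"os"
	"strings"
)

const (
	processedSuffix = ".processed"
	failedSuffix    = ".failed"
)

// markerPath returns the marker file written beside src.
// Appending the suffix to the full filename is an assumption.
func markerPath(src string, ok bool) string {
	if ok {
		return src + processedSuffix
	}
	return src + failedSuffix
}

// isMarker reports whether name is itself a marker file, so the
// watcher never tries to ingest its own bookkeeping.
func isMarker(name string) bool {
	return strings.HasSuffix(name, processedSuffix) ||
		strings.HasSuffix(name, failedSuffix)
}

// shouldSkip reports whether the watcher should ignore path on this
// poll: either it is a marker, or a marker already exists beside it.
func shouldSkip(path string) bool {
	if isMarker(path) {
		return true
	}
	for _, ok := range []bool{true, false} {
		if _, err := os.Stat(markerPath(path, ok)); err == nil {
			return true
		}
	}
	return false
}

// markDone writes an empty marker beside src and leaves src untouched.
func markDone(src string, ok bool) error {
	return os.WriteFile(markerPath(src, ok), nil, 0o644)
}
```

Since the original is never moved or deleted, a sync tool watching the same directory sees no removal to propagate.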
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CLAUDE.md has a specific meaning in the Claude Code ecosystem (agent
instructions). The wiki schema for the ingestion pipeline should live
in schema.md to avoid confusion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wires pipeline.Run into the HTTP layer so callers can ingest raw text
or files/directories without touching the filesystem directly. Rewrites
main.go to parse LLM and watcher env vars and build pipeline.Config.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Polls brain/raw/ on a configurable ticker, derives human-readable source
names from filenames, runs the pipeline, and moves files to
processed/YYYY-MM-DD/ on success or failed/ on error with a log.md entry.
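
A rough sketch of the filename handling described above. The helper names, the separator rules in sourceName, and the assumption that processed/ and failed/ live under a caller-supplied root are all hypothetical; only the processed/YYYY-MM-DD/ and failed/ layout comes from this commit.

```go
package main

import (
	"path/filepath"
	"strings"
	"time"
)

// sourceName derives a human-readable name from a file path by dropping
// the directory and extension and turning hyphens and underscores into
// single spaces.
func sourceName(path string) string {
	base := filepath.Base(path)
	base = strings.TrimSuffix(base, filepath.Ext(base))
	base = strings.NewReplacer("-", " ", "_", " ").Replace(base)
	return strings.Join(strings.Fields(base), " ")
}

// destination returns where a file lands after a pipeline run:
// root/processed/YYYY-MM-DD/<name> on success, root/failed/<name> on error.
func destination(root, path string, ok bool, now time.Time) string {
	name := filepath.Base(path)
	if !ok {
		return filepath.Join(root, "failed", name)
	}
	// "2006-01-02" is Go's reference layout for YYYY-MM-DD.
	return filepath.Join(root, "processed", now.Format("2006-01-02"), name)
}
```

Passing the clock in as a parameter keeps destination easy to test with a fixed date.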
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds prompt.go (BuildPrompt + systemPrompt) and pipeline.go (Run, Config,
Result, mergeAll) that wire chunking, LLM calls, parse, merge, index rebuild,
and log append into a single ingestion pipeline. Includes integration tests
covering write, dry-run, and duplicate-path merge scenarios.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
brain_write with a custom filename omitted the .md extension, causing
search to skip the file (search.go filters on HasSuffix .md).
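
One way to express the fix, as a sketch; the helper name ensureMarkdownExt is hypothetical, and appending rather than replacing an existing extension is an assumption.

```go
package main

import "strings"

// ensureMarkdownExt appends ".md" unless name already ends with it,
// so a search that filters on the ".md" suffix can see the file.
func ensureMarkdownExt(name string) string {
	if strings.HasSuffix(name, ".md") {
		return name
	}
	return name + ".md"
}
```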
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Empty or whitespace-only queries previously passed silently through to
search and returned meaningless results. Also removes the Domain field
from queryRequest: it was accepted but silently ignored, since
search.Query has no domain parameter, which would confuse callers.
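
A minimal sketch of the validation, assuming empty queries are rejected with an error before search runs. The Query field name, the validate method, and errEmptyQuery are assumptions; only the queryRequest type name comes from this commit.

```go
package main

import (
	"errors"
	"strings"
)

// queryRequest is the request body for the query endpoint.
// The Query field and its JSON tag are assumed.
type queryRequest struct {
	Query string `json:"query"`
}

// errEmptyQuery is a hypothetical sentinel a handler could map to a
// client error response.
var errEmptyQuery = errors.New("query must not be empty")

// validate trims surrounding whitespace and rejects queries that are
// empty afterwards, returning the cleaned query otherwise.
func (r queryRequest) validate() (string, error) {
	q := strings.TrimSpace(r.Query)
	if q == "" {
		return "", errEmptyQuery
	}
	return q, nil
}
```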
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements POST /query (BM25 search via internal/search) and POST /write
(raw file persistence to brain/raw/) as an api.Handler struct. Filename
is auto-generated when absent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both walk-level errors and ReadFile failures now use best-effort
semantics (warn via slog, continue) instead of mixed abort/silent-skip.
filepath.Rel error is now propagated from the callback instead of
discarded.
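
The error policy might be sketched as follows, assuming the walk uses filepath.WalkDir and log/slog (Go 1.21+). The collect function and its return shape are hypothetical.

```go
package main

import (
	"io/fs"
	"log/slog"
	"os"
	"path/filepath"
	"strings"
)

// collect reads every .md file under root into a map keyed by path
// relative to root. Walk-level and read errors are logged and skipped;
// a filepath.Rel failure is returned from the callback and aborts the walk.
func collect(root string) (map[string]string, error) {
	out := make(map[string]string)
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			// Best effort: warn and keep walking.
			slog.Warn("skipping unreadable path", "path", path, "err", err)
			return nil
		}
		if d.IsDir() || !strings.HasSuffix(path, ".md") {
			return nil
		}
		data, err := os.ReadFile(path)
		if err != nil {
			slog.Warn("skipping unreadable file", "path", path, "err", err)
			return nil
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err // propagated rather than discarded
		}
		out[rel] = string(data)
		return nil
	})
	return out, err
}
```

With this policy a single unreadable file costs one missing result rather than the whole query.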
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements search.Query, which walks brainDir/wiki/**/*.md, scores files
by term frequency across query tokens, and returns results sorted by
score descending. Uses only the standard library, with no external
search dependencies.
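
Term-frequency scoring of this kind can be sketched over in-memory documents. The Result shape, the tokenizer, and the tie-break by path are assumptions, and the filesystem walk is left out to keep the example self-contained.

```go
package main

import (
	"sort"
	"strings"
	"unicode"
)

// Result pairs a document path with its score.
type Result struct {
	Path  string
	Score int
}

// tokenize lowercases s and splits it on anything that is not a
// letter or digit.
func tokenize(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// score counts how often any query token occurs in each document and
// returns the matching documents sorted by score descending. Ties are
// broken by path so the output order is stable.
func score(query string, docs map[string]string) []Result {
	terms := make(map[string]bool)
	for _, t := range tokenize(query) {
		terms[t] = true
	}
	var results []Result
	for path, body := range docs {
		n := 0
		for _, tok := range tokenize(body) {
			if terms[tok] {
				n++
			}
		}
		if n > 0 {
			results = append(results, Result{Path: path, Score: n})
		}
	}
	sort.Slice(results, func(i, j int) bool {
		if results[i].Score != results[j].Score {
			return results[i].Score > results[j].Score
		}
		return results[i].Path < results[j].Path
	})
	return results
}
```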
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>