feat(pipeline): add POST /ingest-raw for direct batch ingestion without LLM

Allows callers to provide pre-structured RawPage data directly, bypassing the LLM extraction step. The pipeline still handles slug computation, frontmatter, link canonicalization, source back-references, and dedup; only the extraction is skipped. This is useful when a more capable model or manual curation produces the structured data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
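A minimal Go sketch of what the pipeline still does for a pre-structured page: decode a JSON array and compute each page's wiki path. `RawPage`, `slugify`, and `pagePath` are hypothetical, and the field names are assumed to mirror the output format in `brain/schema.md`; the real endpoint's payload and slug rules may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"unicode"
)

// RawPage is a hypothetical request element for POST /ingest-raw, assumed to
// mirror the JSON element defined in brain/schema.md.
type RawPage struct {
	Title   string `json:"title"`
	Type    string `json:"type"`
	Subtype string `json:"subtype,omitempty"`
	Domain  string `json:"domain,omitempty"`
	Content string `json:"content"`
}

// typeDirs maps each page type to the wiki directory named in brain/schema.md.
var typeDirs = map[string]string{
	"source":  "wiki/sources",
	"concept": "wiki/concepts",
	"entity":  "wiki/entities",
}

// slugify is a hypothetical slug rule: lowercase, with each run of
// non-alphanumeric characters collapsed to a single hyphen.
func slugify(title string) string {
	fields := strings.FieldsFunc(strings.ToLower(title), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	return strings.Join(fields, "-")
}

// pagePath computes the wiki path for a pre-structured page, i.e. work the
// pipeline still performs when LLM extraction is skipped.
func pagePath(p RawPage) (string, error) {
	dir, ok := typeDirs[p.Type]
	if !ok {
		return "", fmt.Errorf("unknown page type %q", p.Type)
	}
	slug := slugify(p.Title)
	if slug == "" {
		return "", fmt.Errorf("title %q yields an empty slug", p.Title)
	}
	return dir + "/" + slug + ".md", nil
}

func main() {
	body := `[{"title": "Ryan Singer", "type": "entity", "subtype": "person", "content": "..."}]`
	var pages []RawPage
	if err := json.Unmarshal([]byte(body), &pages); err != nil {
		panic(err)
	}
	for _, p := range pages {
		path, err := pagePath(p)
		if err != nil {
			panic(err)
		}
		fmt.Println(path) // wiki/entities/ryan-singer.md
	}
}
```

Rejecting unknown types and empty slugs up front keeps a malformed batch from writing files outside the three wiki directories.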
fix(pipeline): repair invalid JSON escape sequences from LLM output before parsing
2026-04-24 11:15:59 +02:00 · 2026-04-23 22:04:27 +02:00 · 2026-04-23 19:55:37 +02:00 · 2026-04-23 19:45:21 +02:00 · 2026-04-23 19:07:33 +02:00 · 2026-04-23 18:59:10 +02:00
54 changed files with 9675 additions and 69 deletions
--- a/.gitea/workflows/cd.yml
+++ b/.gitea/workflows/cd.yml
@@ -1,13 +1,16 @@
 name: cd

 on:
-  push:
+  workflow_run:
+    workflows: ["CI"]
+    types: [completed]
    branches: [main]

 jobs:
  deploy:
    name: Build and deploy
    runs-on: self-hosted
+    if: ${{ github.event.workflow_run.conclusion == 'success' && github.event.workflow_run.event == 'push' }}
    env:
      SERVICE: supervisor
      IMAGE: gitea.d-ma.be/mathias/supervisor
--- a/.gitignore
+++ b/.gitignore
@@ -34,6 +34,7 @@ secrets/
 # ── Documented examples (commit these) ──
 !.env.example
 !config/supervisor/CLAUDE.md
+!brain/CLAUDE.md

 # IDE
 .idea/
--- a/4
+++ b/4
@@ -1,2 +1,2 @@
-ingestion: cd ingestion && INGEST_BRAIN_DIR=../brain INGEST_PORT=3300 go run ./cmd/server/
-supervisor: SUPERVISOR_CONFIG_DIR=./config/supervisor SUPERVISOR_MODELS_FILE=./config/models.yaml SUPERVISOR_SESSIONS_DIR=./brain/sessions INGEST_BASE_URL=http://localhost:3300 go run ./cmd/supervisor/
+ingestion: cd ingestion && INGEST_BRAIN_DIR=../brain INGEST_PORT=3300 INGEST_WATCH_INTERVAL=30 go run ./cmd/server/
+supervisor: SUPERVISOR_CONFIG_DIR=./config/supervisor SUPERVISOR_MODELS_FILE=./config/models.yaml SUPERVISOR_SESSIONS_DIR=./brain/sessions INGEST_BASE_URL=http://localhost:3300 INGEST_SVC_URL=http://localhost:3300 go run ./cmd/supervisor/
--- a/brain/schema.md
+++ b/brain/schema.md
@@ -0,0 +1,137 @@
+# Brain Wiki Schema
+
+This document defines the three page types in the brain wiki.
+The LLM must follow this schema exactly when generating wiki pages.
+
+## Output Format
+
+Return a JSON array. Each element:
+
+```json
+{
+  "title":   "exact page title",
+  "type":    "source | concept | entity",
+  "subtype": "see below — omit for concept",
+  "domain":  "see domains — omit if none fits",
+  "content": "Markdown body only — no frontmatter, no path"
+}
+```
+
+- `subtype` for **source**: `article | pdf | book | video | note | project`
+- `subtype` for **entity**: `person | company | tool | model | framework | technology`
+- The pipeline computes slugs and frontmatter — never include them in output.
+
+## Wikilink Format
+
+All cross-references use `[[Display Name]]` — just the display name, no slug, no pipe.
+
+Rules:
+- Only link to pages in the inventory or pages you are creating in this response
+- The pipeline converts `[[Display Name]]` to `[[slug|Display Name]]` automatically
+- Section links must match their section type (Related Concepts → concept pages only, etc.)
+
+Examples: `[[Domain Driven Design]]`, `[[Ryan Singer]]`, `[[Shape Up]]`
+
+## Domains
+
+Use one of: `ai-llm`, `software-engineering`, `product-strategy`, `finance-markets`,
+`personal`, `consulting`, `climate`, `infrastructure`, `security`
+
+---
+
+## Source Pages — wiki/sources/<slug>.md
+
+One page per ingested source. Books are NEVER split across multiple source pages — update the existing one.
+
+Body sections (in this order):
+
+### Summary
+2–3 sentences. Core argument or finding.
+
+### Key Claims
+Bulleted list. Paraphrase — no verbatim quotes or code.
+
+### Concepts Introduced or Reinforced
+Wikilinks to concept pages ONLY. One per line.
+
+### Entities Mentioned
+Wikilinks to entity pages ONLY. One per line.
+
+### Open Questions Raised
+Gaps or follow-up questions from this source.
+
+For books only, also add:
+
+### Chapters
+One bullet per chapter with 1–2 sentence summary.
+
+### Argument Arc
+Overall narrative as it becomes clear across chapters.
+
+### Updates
+Dated entries appended on re-ingestion. NEVER rewrite — only append.
+
+---
+
+## Concept Pages — wiki/concepts/<slug>.md
+
+One page per idea, framework, methodology, or pattern.
+
+Body sections (in this order):
+
+### Definition
+One-paragraph plain-language explanation.
+
+### Why It Matters
+Practical significance. Why should anyone care?
+
+### Related Concepts
+Wikilinks to concept pages ONLY.
+
+### Related Entities
+Wikilinks to entity pages ONLY.
+
+### Sources
+Wikilinks to source pages ONLY.
+
+### Evolving Notes
+Updated as new sources arrive. Append, do not rewrite.
+
+---
+
+## Entity Pages — wiki/entities/<slug>.md
+
+One page per person, tool, organisation, technology, or product.
+
+Body sections (in this order):
+
+### Description
+One-line description.
+
+### Relevance
+Why this entity matters to this knowledge base.
+
+### Key Positions, Products, or Claims
+With dates where known.
+
+### Related Concepts
+Wikilinks to concept pages ONLY.
+
+### Related Entities
+Wikilinks to entity pages ONLY.
+
+### Sources
+Wikilinks to source pages ONLY.
+
+---
+
+## Non-Negotiable Rules
+
+1. Output ONLY a valid JSON array — no markdown fences, no prose before or after
+2. Each element: `{"title": "...", "type": "...", "subtype": "...", "domain": "...", "content": "..."}`
+3. Never include slugs, paths, or frontmatter in output — the pipeline handles these
+4. Wikilinks: `[[Display Name]]` only — no pipe, no slug
+5. Dates always YYYY-MM-DD (used only in content body where contextually relevant)
+6. Never reproduce verbatim code — describe the pattern or technique
+7. Section links must match their section type
+8. One source page per book — if inventory shows it exists, include it as an UPDATE
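The schema states that the pipeline rewrites `[[Display Name]]` to `[[slug|Display Name]]`. A sketch of that rewrite in Go, assuming slugs are derived directly from the display name (the real pipeline may instead look the name up in the inventory); `canonicalizeLinks` and `slugify` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
)

// bareLinkRe matches [[Display Name]] links that contain no "|", i.e. links
// the LLM emitted that have not been canonicalized yet.
var bareLinkRe = regexp.MustCompile(`\[\[([^\]|]+)\]\]`)

// slugify is a hypothetical slug rule: lowercase, with each run of
// non-alphanumeric characters collapsed to a single hyphen.
func slugify(name string) string {
	fields := strings.FieldsFunc(strings.ToLower(name), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	return strings.Join(fields, "-")
}

// canonicalizeLinks rewrites [[Name]] to [[slug|Name]]. Links already in
// [[slug|Name]] form are left alone because the pattern rejects "|".
func canonicalizeLinks(body string) string {
	return bareLinkRe.ReplaceAllStringFunc(body, func(m string) string {
		name := strings.TrimSpace(m[2 : len(m)-2]) // strip the [[ and ]]
		return "[[" + slugify(name) + "|" + name + "]]"
	})
}

func main() {
	fmt.Println(canonicalizeLinks("See [[Shape Up]] by [[ryan-singer|Ryan Singer]]."))
	// See [[shape-up|Shape Up]] by [[ryan-singer|Ryan Singer]].
}
```

Because already-canonical links are skipped, running the rewrite twice produces the same text, which matters when a page is re-ingested.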
--- a/cmd/supervisor/main.go
+++ b/cmd/supervisor/main.go
@@ -105,6 +105,8 @@ func main() {
 	}))
 	reg.Register(brain.New(brain.Config{
 		IngestBaseURL:  cfg.IngestBaseURL,
+		IngestSvcURL:   cfg.IngestSvcURL,
+		KBRetrievalURL: cfg.KBRetrievalURL,
 	}))
 	reg.Register(org.New(org.Config{
 		TierFn: tierFn,
--- a/docs/superpowers/plans/2026-04-22-brain-ingestion-pipeline.md
+++ b/docs/superpowers/plans/2026-04-22-brain-ingestion-pipeline.md
--- a/docs/superpowers/plans/2026-04-22-brain-ingestion-quality.md
+++ b/docs/superpowers/plans/2026-04-22-brain-ingestion-quality.md
@@ -0,0 +1,858 @@
+# Brain Ingestion Quality: PDF Extraction + Entity Resolution
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Fix PDF ingestion (currently passes raw bytes to LLM) and add fuzzy entity resolution (prevents slug proliferation at scale).
+
+**Architecture:** Two independent improvements wired into the existing pipeline. A new `extract` package handles text extraction by file type (a `pdftotext` subprocess for .pdf, passthrough for .md/.txt). A new `resolve.go` in the `pipeline` package normalizes proposed entity/concept titles against the loaded inventory so existing slugs are reused instead of duplicated. `extract.Text` is wired into `watcher.go` and `api/handler.go`, and `Resolve` into `pipeline.Run`; the only new dependency is `poppler-utils` in the Docker image.
+
+**Tech Stack:** Go stdlib (`os/exec`, `bufio`, `strings`), testify, poppler-utils (`pdftotext`)
+
+---
+
+## File Structure
+
+**New files:**
+- `ingestion/internal/extract/extract.go` — `Text(path string) (string, error)` dispatcher
+- `ingestion/internal/extract/pdf.go` — `pdftotext` subprocess extraction
+- `ingestion/internal/extract/extract_test.go` — table-driven tests for all paths
+- `ingestion/internal/pipeline/resolve.go` — `Resolve(proposed []wiki.Page, inventory map[wiki.PageType][]wiki.Entry) []wiki.Page`
+- `ingestion/internal/pipeline/resolve_test.go` — table-driven tests
+
+**Modified files:**
+- `ingestion/internal/wiki/types.go` — add `Aliases []string` to `Entry`
+- `ingestion/internal/wiki/inventory.go` — `readFrontmatter` reads both title and aliases
+- `ingestion/internal/wiki/inventory_test.go` — add alias coverage
+- `ingestion/internal/pipeline/pipeline.go` — call `Resolve` after `ParsePages`
+- `ingestion/internal/watcher/watcher.go` — call `extract.Text` instead of `os.ReadFile`
+- `ingestion/internal/api/handler.go` — call `extract.Text` for path-based ingestion
+- `ingestion/Dockerfile` — `apk add poppler-utils`
+
+---
+
+### Task 1: `extract` package — Text() dispatcher with .md/.txt passthrough
+
+**Files:**
+- Create: `ingestion/internal/extract/extract.go`
+- Create: `ingestion/internal/extract/extract_test.go`
+
+- [ ] **Step 1: Write the failing test**
+
+```go
+// ingestion/internal/extract/extract_test.go
+package extract
+
+import (
+	"os"
+	"path/filepath"
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func TestText_Markdown(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "note.md")
+	require.NoError(t, os.WriteFile(path, []byte("# Hello\n\nWorld."), 0o644))
+
+	got, err := Text(path)
+	require.NoError(t, err)
+	assert.Equal(t, "# Hello\n\nWorld.", got)
+}
+
+func TestText_Txt(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "note.txt")
+	require.NoError(t, os.WriteFile(path, []byte("plain text"), 0o644))
+
+	got, err := Text(path)
+	require.NoError(t, err)
+	assert.Equal(t, "plain text", got)
+}
+
+func TestText_UnsupportedExtension(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "data.csv")
+	require.NoError(t, os.WriteFile(path, []byte("a,b,c"), 0o644))
+
+	_, err := Text(path)
+	assert.ErrorContains(t, err, "unsupported")
+}
+```
+
+- [ ] **Step 2: Run to verify it fails**
+
+```bash
+cd ingestion && go test ./internal/extract/... -v
+```
+Expected: compile error (`undefined: Text`), because `extract.go` does not exist yet.
+
+- [ ] **Step 3: Implement extract.go**
+
+```go
+// ingestion/internal/extract/extract.go
+package extract
+
+import (
+	"fmt"
+	"os"
+	"strings"
+)
+
+// Text reads the file at path and returns its plain-text content.
+// Supported extensions: .md, .txt (passthrough), .pdf (via pdftotext).
+func Text(path string) (string, error) {
+	ext := strings.ToLower(fileExt(path))
+	switch ext {
+	case ".md", ".txt":
+		b, err := os.ReadFile(path)
+		if err != nil {
+			return "", fmt.Errorf("read %s: %w", path, err)
+		}
+		return string(b), nil
+	case ".pdf":
+		return extractPDF(path)
+	default:
+		return "", fmt.Errorf("unsupported file extension: %s", ext)
+	}
+}
+
+// fileExt returns the extension of the final path element, including the dot.
+// It does not change case; Text lowercases the result before dispatching.
+func fileExt(path string) string {
+	for i := len(path) - 1; i >= 0; i-- {
+		if path[i] == '.' {
+			return path[i:]
+		}
+		if path[i] == '/' || path[i] == '\\' {
+			break
+		}
+	}
+	return ""
+}
+```
+
+- [ ] **Step 4: Add pdf.go stub so it compiles**
+
+```go
+// ingestion/internal/extract/pdf.go
+package extract
+
+import "fmt"
+
+func extractPDF(_ string) (string, error) {
+	return "", fmt.Errorf("PDF extraction not implemented")
+}
+```
+
+- [ ] **Step 5: Run tests to verify they pass**
+
+```bash
+cd ingestion && go test ./internal/extract/... -v
+```
+Expected: PASS — 3 tests passing.
+
+- [ ] **Step 6: Commit**
+
+```bash
+cd ingestion && git add internal/extract/
+git commit -m "feat(extract): add Text() dispatcher with md/txt passthrough"
+```
+
+---
+
+### Task 2: PDF extraction via pdftotext
+
+**Files:**
+- Modify: `ingestion/internal/extract/pdf.go`
+- Modify: `ingestion/internal/extract/extract_test.go`
+
+- [ ] **Step 1: Add PDF test (skip if pdftotext absent)**
+
+Append to `extract_test.go`:
+
+```go
+func TestText_PDF(t *testing.T) {
+	if _, err := exec.LookPath("pdftotext"); err != nil {
+		t.Skip("pdftotext not available")
+	}
+	// Round-trip check: a PDF whose only text is "Hello PDF" must yield that string.
+	dir := t.TempDir()
+	pdfPath := filepath.Join(dir, "test.pdf")
+
+	// Hand-written single-page PDF built from a Go string literal. The /Length
+	// value and xref byte offsets are not exact, so this appears to rely on
+	// poppler recovering from a damaged cross-reference table.
+	minimalPDF := "%PDF-1.4\n1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj\n" +
+		"2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj\n" +
+		"3 0 obj<</Type/Page/MediaBox[0 0 612 792]/Parent 2 0 R/Contents 4 0 R/Resources<</Font<</F1<</Type/Font/Subtype/Type1/BaseFont/Helvetica>>>>>>>>endobj\n" +
+		"4 0 obj<</Length 44>>\nstream\nBT /F1 12 Tf 100 700 Td (Hello PDF) Tj ET\nendstream\nendobj\n" +
+		"xref\n0 5\n0000000000 65535 f\n0000000009 00000 n\n0000000058 00000 n\n0000000115 00000 n\n0000000310 00000 n\n" +
+		"trailer<</Size 5/Root 1 0 R>>\nstartxref\n406\n%%EOF\n"
+	require.NoError(t, os.WriteFile(pdfPath, []byte(minimalPDF), 0o644))
+
+	got, err := Text(pdfPath)
+	require.NoError(t, err)
+	assert.Contains(t, got, "Hello PDF")
+}
+```
+
+Add `"os/exec"` to imports in `extract_test.go`.
+
+- [ ] **Step 2: Run to verify it fails (or skips)**
+
+```bash
+cd ingestion && go test ./internal/extract/... -v -run TestText_PDF
+```
+Expected: SKIP (pdftotext not installed locally) or FAIL with "not implemented".
+
+- [ ] **Step 3: Implement pdf.go**
+
+```go
+// ingestion/internal/extract/pdf.go
+package extract
+
+import (
+	"bytes"
+	"fmt"
+	"os/exec"
+	"strings"
+)
+
+// extractPDF runs pdftotext on path and returns the extracted text.
+// pdftotext must be installed (package: poppler-utils on Alpine/Debian, poppler on Homebrew).
+func extractPDF(path string) (string, error) {
+	cmd := exec.Command("pdftotext", "-q", path, "-")
+	var stdout, stderr bytes.Buffer
+	cmd.Stdout = &stdout
+	cmd.Stderr = &stderr
+
+	if err := cmd.Run(); err != nil {
+		errMsg := strings.TrimSpace(stderr.String())
+		if errMsg == "" {
+			errMsg = err.Error()
+		}
+		return "", fmt.Errorf("pdftotext: %s", errMsg)
+	}
+
+	return strings.TrimSpace(stdout.String()), nil
+}
+```
+
+- [ ] **Step 4: Run all extract tests**
+
+```bash
+cd ingestion && go test ./internal/extract/... -v
+```
+Expected: PASS (PDF test skips if pdftotext absent, passes if present).
+
+- [ ] **Step 5: Commit**
+
+```bash
+cd ingestion && git add internal/extract/pdf.go internal/extract/extract_test.go
+git commit -m "feat(extract): implement PDF extraction via pdftotext"
+```
+
+---
+
+### Task 3: `Entry.Aliases` + inventory reads aliases from frontmatter
+
+**Files:**
+- Modify: `ingestion/internal/wiki/types.go`
+- Modify: `ingestion/internal/wiki/inventory.go`
+- Modify: `ingestion/internal/wiki/inventory_test.go`
+
+- [ ] **Step 1: Write failing test for alias loading**
+
+Add to `inventory_test.go`:
+
+```go
+func TestLoadInventory_ReadsAliases(t *testing.T) {
+	dir := t.TempDir()
+	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "entities"), 0o755))
+	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "concepts"), 0o755))
+	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "sources"), 0o755))
+
+	require.NoError(t, os.WriteFile(
+		filepath.Join(dir, "wiki", "entities", "ryan-singer.md"),
+		[]byte("---\ntitle: Ryan Singer\naliases:\n  - Singer\n  - R. Singer\n---\n\n## Description\n\nDesigner.\n"),
+		0o644,
+	))
+
+	inv, err := LoadInventory(dir)
+	require.NoError(t, err)
+
+	require.Len(t, inv[PageTypeEntity], 1)
+	e := inv[PageTypeEntity][0]
+	assert.Equal(t, "Ryan Singer", e.Title)
+	assert.Equal(t, []string{"Singer", "R. Singer"}, e.Aliases)
+}
+```
+
+- [ ] **Step 2: Run to verify it fails**
+
+```bash
+cd ingestion && go test ./internal/wiki/... -v -run TestLoadInventory_ReadsAliases
+```
+Expected: compile error — `Entry` has no `Aliases` field.
+
+- [ ] **Step 3: Add Aliases to Entry in types.go**
+
+```go
+// Entry is a summary of an existing wiki page used to build the inventory.
+type Entry struct {
+	Slug    string
+	Title   string
+	Aliases []string
+	Type    PageType
+}
+```
+
+- [ ] **Step 4: Replace readTitle with readFrontmatter in inventory.go**
+
+Replace the `readTitle` function and its call site:
+
+```go
+// readFrontmatter extracts title and aliases from YAML frontmatter.
+// Falls back to slug for title and empty aliases on any error.
+func readFrontmatter(path, fallbackSlug string) (title string, aliases []string) {
+	title = fallbackSlug
+	f, err := os.Open(path)
+	if err != nil {
+		return
+	}
+	defer f.Close()
+
+	scanner := bufio.NewScanner(f)
+	inFM := false
+	inAliases := false
+	for scanner.Scan() {
+		line := scanner.Text()
+		if strings.TrimSpace(line) == "---" {
+			if !inFM {
+				inFM = true
+				continue
+			}
+			break // end of frontmatter
+		}
+		if !inFM {
+			continue
+		}
+
+		// Inside an aliases block, collect YAML list items ("- value"),
+		// stripping optional quotes the same way as for title.
+		if inAliases {
+			trimmed := strings.TrimSpace(line)
+			if strings.HasPrefix(trimmed, "- ") {
+				aliases = append(aliases, strings.Trim(strings.TrimPrefix(trimmed, "- "), `"'`))
+				continue
+			}
+			inAliases = false // end of alias block
+		}
+
+		key, val, ok := strings.Cut(line, ":")
+		if !ok {
+			continue
+		}
+		switch strings.TrimSpace(key) {
+		case "title":
+			title = strings.Trim(strings.TrimSpace(val), `"'`)
+		case "aliases":
+			inAliases = true
+		}
+	}
+	return
+}
+```
+
+Update `LoadInventory` to use `readFrontmatter`:
+
+```go
+title, aliases := readFrontmatter(path, slug)
+result[pt] = append(result[pt], Entry{Slug: slug, Title: title, Aliases: aliases, Type: pt})
+```
+
+Remove the old `readTitle` function entirely.
+
+- [ ] **Step 5: Run all wiki tests**
+
+```bash
+cd ingestion && go test ./internal/wiki/... -v
+```
+Expected: PASS — all existing tests plus new alias test.
+
+- [ ] **Step 6: Commit**
+
+```bash
+cd ingestion && git add internal/wiki/types.go internal/wiki/inventory.go internal/wiki/inventory_test.go
+git commit -m "feat(wiki): add Aliases to Entry and read from YAML frontmatter"
+```
+
+---
+
+### Task 4: Fuzzy entity resolution
+
+**Files:**
+- Create: `ingestion/internal/pipeline/resolve.go`
+- Create: `ingestion/internal/pipeline/resolve_test.go`
+
+- [ ] **Step 1: Write failing tests**
+
+```go
+// ingestion/internal/pipeline/resolve_test.go
+package pipeline
+
+import (
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+
+	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
+)
+
+func TestResolve_NoMatch(t *testing.T) {
+	proposed := []wiki.Page{
+		{Path: "wiki/entities/new-person.md", Content: "---\ntitle: New Person\n---\n"},
+	}
+	inventory := map[wiki.PageType][]wiki.Entry{
+		wiki.PageTypeEntity: {
+			{Slug: "ryan-singer", Title: "Ryan Singer", Aliases: []string{"Singer"}},
+		},
+	}
+	got := Resolve(proposed, inventory)
+	assert.Len(t, got, 1)
+	assert.Equal(t, "wiki/entities/new-person.md", got[0].Path)
+}
+
+func TestResolve_TitleMatchRedirectsSlug(t *testing.T) {
+	// Proposed slug differs from existing but title matches.
+	proposed := []wiki.Page{
+		{Path: "wiki/entities/ryan-singer-the-designer.md", Content: "---\ntitle: Ryan Singer\n---\n"},
+	}
+	inventory := map[wiki.PageType][]wiki.Entry{
+		wiki.PageTypeEntity: {
+			{Slug: "ryan-singer", Title: "Ryan Singer", Aliases: nil},
+		},
+	}
+	got := Resolve(proposed, inventory)
+	assert.Len(t, got, 1)
+	assert.Equal(t, "wiki/entities/ryan-singer.md", got[0].Path)
+}
+
+func TestResolve_AliasMatchRedirectsSlug(t *testing.T) {
+	// Proposed title matches an existing alias.
+	proposed := []wiki.Page{
+		{Path: "wiki/entities/singer.md", Content: "---\ntitle: Singer\n---\n"},
+	}
+	inventory := map[wiki.PageType][]wiki.Entry{
+		wiki.PageTypeEntity: {
+			{Slug: "ryan-singer", Title: "Ryan Singer", Aliases: []string{"Singer", "R. Singer"}},
+		},
+	}
+	got := Resolve(proposed, inventory)
+	assert.Len(t, got, 1)
+	assert.Equal(t, "wiki/entities/ryan-singer.md", got[0].Path)
+}
+
+func TestResolve_NormalizationCaseAndArticles(t *testing.T) {
+	// "the shape up method" normalizes to "shape up method" which matches "Shape Up Method".
+	proposed := []wiki.Page{
+		{Path: "wiki/concepts/the-shape-up-method.md", Content: "---\ntitle: The Shape Up Method\n---\n"},
+	}
+	inventory := map[wiki.PageType][]wiki.Entry{
+		wiki.PageTypeConcept: {
+			{Slug: "shape-up-method", Title: "Shape Up Method", Aliases: nil},
+		},
+	}
+	got := Resolve(proposed, inventory)
+	assert.Len(t, got, 1)
+	assert.Equal(t, "wiki/concepts/shape-up-method.md", got[0].Path)
+}
+
+func TestResolve_OnlyMatchesSamePageType(t *testing.T) {
+	// A concept slug must not redirect to an entity with the same normalized name.
+	proposed := []wiki.Page{
+		{Path: "wiki/concepts/ryan-singer.md", Content: "---\ntitle: Ryan Singer\n---\n"},
+	}
+	inventory := map[wiki.PageType][]wiki.Entry{
+		wiki.PageTypeEntity: {
+			{Slug: "ryan-singer", Title: "Ryan Singer", Aliases: nil},
+		},
+		wiki.PageTypeConcept: {},
+	}
+	got := Resolve(proposed, inventory)
+	assert.Len(t, got, 1)
+	// Not redirected — different page type.
+	assert.Equal(t, "wiki/concepts/ryan-singer.md", got[0].Path)
+}
+
+func TestResolve_EmptyInventory(t *testing.T) {
+	proposed := []wiki.Page{
+		{Path: "wiki/entities/first.md", Content: "---\ntitle: First\n---\n"},
+	}
+	inventory := map[wiki.PageType][]wiki.Entry{}
+	got := Resolve(proposed, inventory)
+	assert.Equal(t, proposed, got)
+}
+```
+
+- [ ] **Step 2: Run to verify it fails**
+
+```bash
+cd ingestion && go test ./internal/pipeline/... -v -run TestResolve
+```
+Expected: compile error — `Resolve` not defined.
+
+- [ ] **Step 3: Implement resolve.go**
+
+```go
+// ingestion/internal/pipeline/resolve.go
+package pipeline
+
+import (
+	"path/filepath"
+	"strings"
+
+	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
+)
+
+// Resolve remaps proposed pages to existing slugs when a fuzzy title match is found.
+// It only matches within the same page type (entities→entities, concepts→concepts).
+// Pages with no inventory match are returned unchanged.
+func Resolve(proposed []wiki.Page, inventory map[wiki.PageType][]wiki.Entry) []wiki.Page {
+	// Build normalized lookup: normalized_title → canonical slug, keyed by page type.
+	type key struct {
+		pt         wiki.PageType
+		normalized string
+	}
+	lookup := make(map[key]string) // key → canonical slug
+	for pt, entries := range inventory {
+		for _, e := range entries {
+			k := key{pt: pt, normalized: normalizeTitle(e.Title)}
+			lookup[k] = e.Slug
+			for _, alias := range e.Aliases {
+				ak := key{pt: pt, normalized: normalizeTitle(alias)}
+				if _, exists := lookup[ak]; !exists {
+					lookup[ak] = e.Slug
+				}
+			}
+		}
+	}
+
+	out := make([]wiki.Page, 0, len(proposed))
+	for _, page := range proposed {
+		pt := pageTypeFromPath(page.Path)
+		title := extractTitle(page.Content)
+		k := key{pt: pt, normalized: normalizeTitle(title)}
+		if canonicalSlug, ok := lookup[k]; ok {
+			// Redirect path to canonical slug.
+			dir := filepath.Dir(page.Path)
+			page.Path = dir + "/" + canonicalSlug + ".md"
+		}
+		out = append(out, page)
+	}
+	return out
+}
+
+// normalizeTitle lowercases, removes leading articles, collapses whitespace.
+// "The Shape Up Method" → "shape up method"
+func normalizeTitle(s string) string {
+	s = strings.ToLower(strings.TrimSpace(s))
+	// Strip leading articles.
+	for _, article := range []string{"the ", "a ", "an "} {
+		s = strings.TrimPrefix(s, article)
+	}
+	// Collapse internal whitespace and replace hyphens.
+	s = strings.ReplaceAll(s, "-", " ")
+	return strings.Join(strings.Fields(s), " ")
+}
+
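+// pageTypeFromPath maps the directory in a path like "wiki/entities/foo.md" to a
+// wiki.PageType. Plural directory names are mapped explicitly so the result
+// matches the inventory keys whatever string values the constants hold.
+func pageTypeFromPath(path string) wiki.PageType {
+	parts := strings.Split(filepath.ToSlash(path), "/")
+	if len(parts) < 2 {
+		return ""
+	}
+	switch parts[1] {
+	case "entities":
+		return wiki.PageTypeEntity
+	case "concepts":
+		return wiki.PageTypeConcept
+	default:
+		return wiki.PageType(parts[1])
+	}
+}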
+
+// extractTitle reads the title field from YAML frontmatter in content.
+// Falls back to empty string if not found.
+func extractTitle(content string) string {
+	lines := strings.SplitN(content, "\n", 30)
+	inFM := false
+	for _, line := range lines {
+		if strings.TrimSpace(line) == "---" {
+			if !inFM {
+				inFM = true
+				continue
+			}
+			break
+		}
+		if inFM {
+			key, val, ok := strings.Cut(line, ":")
+			if ok && strings.TrimSpace(key) == "title" {
+				return strings.Trim(strings.TrimSpace(val), `"'`)
+			}
+		}
+	}
+	return ""
+}
+```
+
+- [ ] **Step 4: Run resolve tests**
+
+```bash
+cd ingestion && go test ./internal/pipeline/... -v -run TestResolve
+```
+Expected: PASS — 6 tests passing.
+
+- [ ] **Step 5: Commit**
+
+```bash
+cd ingestion && git add internal/pipeline/resolve.go internal/pipeline/resolve_test.go
+git commit -m "feat(pipeline): add fuzzy entity resolution to prevent slug proliferation"
+```
+
+---
+
+### Task 5: Wire Resolve into pipeline.Run
+
+**Files:**
+- Modify: `ingestion/internal/pipeline/pipeline.go`
+
+- [ ] **Step 1: Add Resolve call after ParsePages in Run()**
+
+In `pipeline.go`, locate the chunk loop that builds `allPages`. Once that loop finishes, `allPages` holds the pages from every chunk. `Resolve` must run at that point, after all chunks are merged, against the inventory snapshot loaded at the start of the run.
+
+Replace the `merged := mergeAll(allPages)` line with:
+
+```go
+resolved := Resolve(allPages, inventory)
+merged := mergeAll(resolved)
+```
+
+The full relevant section of `Run` after this change:
+
+```go
+for _, chunk := range chunks {
+    userPrompt := BuildPrompt(schema, source, chunk, inventory)
+    output, err := cfg.Complete(ctx, systemPrompt, userPrompt)
+    if err != nil {
+        return Result{}, fmt.Errorf("LLM call: %w", err)
+    }
+    pages, warnings := ParsePages(output)
+    allPages = append(allPages, pages...)
+    allWarnings = append(allWarnings, warnings...)
+}
+
+resolved := Resolve(allPages, inventory)
+merged := mergeAll(resolved)
+```
+
+- [ ] **Step 2: Run all pipeline tests**
+
+```bash
+cd ingestion && go test ./internal/pipeline/... -v
+```
+Expected: PASS — all existing tests still pass (Resolve is a no-op when inventory is empty or no title matches).
+
+- [ ] **Step 3: Commit**
+
+```bash
+cd ingestion && git add internal/pipeline/pipeline.go
+git commit -m "feat(pipeline): resolve proposed pages against inventory before writing"
+```
+
+---
+
+### Task 6: Wire extract.Text into watcher and handler
+
+**Files:**
+- Modify: `ingestion/internal/watcher/watcher.go`
+- Modify: `ingestion/internal/api/handler.go`
+
+- [ ] **Step 1: Update watcher.go**
+
+In `processFile`, replace:
+
+```go
+content, err := os.ReadFile(path)
+if err != nil {
+    return fmt.Errorf("read file: %w", err)
+}
+
+_, runErr := pipeline.Run(ctx, cfg.Pipeline, cfg.BrainDir, string(content), source, false)
+```
+
+With:
+
+```go
+content, err := extract.Text(path)
+if err != nil {
+    return fmt.Errorf("extract text: %w", err)
+}
+
+_, runErr := pipeline.Run(ctx, cfg.Pipeline, cfg.BrainDir, content, source, false)
+```
+
+Add import: `"github.com/mathiasbq/hyperguild/ingestion/internal/extract"`
+
+Keep the `"os"` import: `watcher.go` still uses it for `os.MkdirAll`, `os.WriteFile`, and `os.Stat`.
+
+- [ ] **Step 2: Update handler.go — single-file path**
+
+In `IngestPath`, the single-file branch reads:
+
+```go
+content, readErr := os.ReadFile(req.Path)
+if readErr != nil {
+    writeError(w, http.StatusInternalServerError, fmt.Sprintf("read file: %v", readErr))
+    return
+}
+```
+
+Replace with:
+
+```go
+content, readErr := extract.Text(req.Path)
+if readErr != nil {
+    writeError(w, http.StatusInternalServerError, fmt.Sprintf("extract text: %v", readErr))
+    return
+}
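+```
+
+If this branch later passes `string(content)` to `pipeline.Run`, change it to `content`; `extract.Text` already returns a string.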
+
+- [ ] **Step 3: Update handler.go — directory walk branch**
+
+In `IngestPath`, the directory walk reads:
+
+```go
+content, readErr := os.ReadFile(path)
+if readErr != nil {
+    allWarnings = append(allWarnings, fmt.Sprintf("read %s: %v", path, readErr))
+    return nil
+}
+source := req.Source
+if source == "" {
+    source = filepath.Base(path)
+}
+result, runErr := pipeline.Run(r.Context(), h.pipeline, h.brainDir, string(content), source, req.DryRun)
+```
+
+Replace with:
+
+```go
+content, readErr := extract.Text(path)
+if readErr != nil {
+    allWarnings = append(allWarnings, fmt.Sprintf("extract %s: %v", path, readErr))
+    return nil
+}
+source := req.Source
+if source == "" {
+    source = filepath.Base(path)
+}
+result, runErr := pipeline.Run(r.Context(), h.pipeline, h.brainDir, content, source, req.DryRun)
+```
+
+Add import: `"github.com/mathiasbq/hyperguild/ingestion/internal/extract"` to handler.go.
+
+- [ ] **Step 4: Build to verify no compile errors**
+
+```bash
+cd ingestion && go build ./...
+```
+Expected: success, no errors.
+
+- [ ] **Step 5: Run all tests**
+
+```bash
+cd ingestion && go test ./...
+```
+Expected: PASS — all tests pass (watcher tests use .md files, already covered by extract passthrough).
+
+- [ ] **Step 6: Commit**
+
+```bash
+cd ingestion && git add internal/watcher/watcher.go internal/api/handler.go
+git commit -m "feat(watcher,api): use extract.Text() for file reading — fixes PDF ingestion"
+```
+
+---
+
+### Task 7: Add poppler-utils to Dockerfile
+
+**Files:**
+- Modify: `ingestion/Dockerfile`
+
+- [ ] **Step 1: Add apk install for poppler-utils**
+
+In `ingestion/Dockerfile`, add `poppler-utils` to the Alpine runtime stage. The current final stage is:
+
+```dockerfile
+FROM alpine:3.21
+
+COPY --from=builder /out/ingestion /usr/local/bin/ingestion
+
+RUN addgroup -S ingestion && adduser -S -G ingestion ingestion
+```
+
+Replace with:
+
+```dockerfile
+FROM alpine:3.21
+
+RUN apk add --no-cache poppler-utils
+
+COPY --from=builder /out/ingestion /usr/local/bin/ingestion
+
+RUN addgroup -S ingestion && adduser -S -G ingestion ingestion
+```
+
+- [ ] **Step 2: Verify Dockerfile builds (local Docker)**
+
+```bash
+cd ingestion && docker build -t ingestion:test .
+```
+Expected: image builds successfully; `pdftotext` is available inside.
+
+- [ ] **Step 3: Verify pdftotext is accessible in the image**
+
+```bash
+docker run --rm ingestion:test pdftotext -v
+```
+Expected: prints a version string such as `pdftotext version 24.x.x`.
+
+- [ ] **Step 4: Commit**
+
+```bash
+cd ingestion && git add Dockerfile
+git commit -m "chore(docker): add poppler-utils for PDF text extraction"
+```
+
+---
+
+## Self-Review
+
+**Spec coverage check:**
+
+| Requirement | Task |
+|---|---|
+| PDF extraction via pdftotext | Tasks 2, 6, 7 |
+| .md and .txt passthrough (no regression) | Task 1 |
+| Unsupported extension → clear error | Task 1 |
+| Entry.Aliases loaded from frontmatter | Task 3 |
+| Fuzzy normalization (case, articles, hyphens) | Task 4 |
+| Alias matching | Task 4 |
+| Title matching across different proposed slugs | Task 4 |
+| Cross-page-type isolation (concept ≠ entity) | Task 4 |
+| Resolve wired into pipeline.Run | Task 5 |
+| extract.Text wired into watcher | Task 6 |
+| extract.Text wired into handler (single + dir) | Task 6 |
+| Dockerfile includes poppler-utils | Task 7 |
+
+**Placeholder scan:** None found.
+
+**Type consistency:**
+- `Resolve([]wiki.Page, map[wiki.PageType][]wiki.Entry) []wiki.Page` — consistent across Tasks 4 and 5.
+- `extract.Text(path string) (string, error)` — consistent across Tasks 1, 2, and 6.
+- `Entry.Aliases []string` — added in Task 3, used by Resolve in Task 4 (reads `e.Aliases`).
+- `readFrontmatter` replaces `readTitle` entirely in Task 3 — no lingering `readTitle` calls.
--- a/docs/superpowers/plans/2026-04-23-level3-slug-authority.md
+++ b/docs/superpowers/plans/2026-04-23-level3-slug-authority.md
--- a/docs/superpowers/plans/2026-04-23-source-backrefs.md
+++ b/docs/superpowers/plans/2026-04-23-source-backrefs.md
@@ -0,0 +1,433 @@
+# Source Back-References Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** After the LLM produces wiki pages for an ingestion, automatically inject a `## Sources` back-reference on every concept and entity page that the source page links to.
+
+**Architecture:** A new `injectSourceRefs` post-processing step is inserted between `Resolve` and `mergeAll` in `pipeline.Run`. It finds the source page in the proposed batch, extracts all `[[slug|...]]` wikilinks, then calls `wiki.Merge` with a minimal patch page to add the back-reference. `wiki.Merge` already treats `## Sources` as a bullet section with deduplication — no custom section parsing is needed. For concepts/entities that exist on disk but weren't proposed in the current batch (the common case on re-ingestion), the function loads them from disk and adds them to the pages list so they are updated.
+
+**Tech Stack:** Go stdlib (`regexp`, `os`, `path/filepath`, `strings`), existing `wiki.Merge` and `wiki.Page` types.
+
+---
+
+## File Structure
+
+**New files:**
+- `ingestion/internal/pipeline/refs.go` — `injectSourceRefs`, `addSourceRef`, `extractWikilinks`, `findSourcePage`, `findInInventory`
+- `ingestion/internal/pipeline/refs_test.go` — table-driven tests
+
+**Modified files:**
+- `ingestion/internal/pipeline/pipeline.go` — insert `injectSourceRefs` call between `Resolve` and `mergeAll`
+
+---
+
+### Task 1: `refs.go` — source back-reference injection
+
+**Files:**
+- Create: `ingestion/internal/pipeline/refs_test.go`
+- Create: `ingestion/internal/pipeline/refs.go`
+
+- [ ] **Step 1: Write the failing tests**
+
+```go
+// ingestion/internal/pipeline/refs_test.go
+package pipeline
+
+import (
+	"os"
+	"path/filepath"
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+
+	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
+)
+
+// makeInventory builds a minimal inventory for test use.
+func makeInventory(concepts, entities []string) map[wiki.PageType][]wiki.Entry {
+	inv := map[wiki.PageType][]wiki.Entry{
+		wiki.PageTypeConcept: {},
+		wiki.PageTypeEntity:  {},
+		wiki.PageTypeSource:  {},
+	}
+	for _, slug := range concepts {
+		inv[wiki.PageTypeConcept] = append(inv[wiki.PageTypeConcept], wiki.Entry{Slug: slug, Title: slug})
+	}
+	for _, slug := range entities {
+		inv[wiki.PageTypeEntity] = append(inv[wiki.PageTypeEntity], wiki.Entry{Slug: slug, Title: slug})
+	}
+	return inv
+}
+
+func TestInjectSourceRefs_NoSourcePage(t *testing.T) {
+	pages := []wiki.Page{
+		{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Definition\n\nFoo.\n"},
+	}
+	got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
+	assert.Equal(t, pages, got)
+}
+
+func TestInjectSourceRefs_InjectsIntoProposedConcept(t *testing.T) {
+	pages := []wiki.Page{
+		{
+			Path:    "wiki/sources/my-article.md",
+			Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSee [[domain-driven-design|Domain Driven Design]].\n",
+		},
+		{
+			Path:    "wiki/concepts/domain-driven-design.md",
+			Content: "---\ntitle: Domain Driven Design\n---\n\n## Definition\n\nA methodology.\n",
+		},
+	}
+
+	got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
+
+	require.Len(t, got, 2)
+	assert.Contains(t, got[1].Content, "## Sources")
+	assert.Contains(t, got[1].Content, "[[my-article|My Article]]")
+}
+
+func TestInjectSourceRefs_LoadsConceptFromDisk(t *testing.T) {
+	brainDir := t.TempDir()
+	conceptDir := filepath.Join(brainDir, "wiki", "concepts")
+	require.NoError(t, os.MkdirAll(conceptDir, 0o755))
+	require.NoError(t, os.WriteFile(
+		filepath.Join(conceptDir, "shape-up.md"),
+		[]byte("---\ntitle: Shape Up\n---\n\n## Definition\n\nA methodology.\n"),
+		0o644,
+	))
+
+	pages := []wiki.Page{
+		{
+			Path:    "wiki/sources/my-article.md",
+			Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSee [[shape-up|Shape Up]].\n",
+		},
+	}
+	inv := makeInventory([]string{"shape-up"}, nil)
+
+	got := injectSourceRefs(pages, inv, brainDir)
+
+	// Should have loaded shape-up.md from disk and added it with source ref.
+	require.Len(t, got, 2)
+	var conceptPage wiki.Page
+	for _, p := range got {
+		if p.Path == "wiki/concepts/shape-up.md" {
+			conceptPage = p
+		}
+	}
+	assert.Contains(t, conceptPage.Content, "## Sources")
+	assert.Contains(t, conceptPage.Content, "[[my-article|My Article]]")
+	// Original content preserved.
+	assert.Contains(t, conceptPage.Content, "## Definition")
+}
+
+func TestInjectSourceRefs_NoSelfReference(t *testing.T) {
+	pages := []wiki.Page{
+		{
+			Path:    "wiki/sources/my-article.md",
+			Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSelf-link [[my-article|My Article]].\n",
+		},
+	}
+
+	got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
+
+	// Only one page — source should not reference itself.
+	assert.Len(t, got, 1)
+}
+
+func TestInjectSourceRefs_DeduplicatesOnReingestion(t *testing.T) {
+	// Concept already has source ref from a prior ingestion.
+	pages := []wiki.Page{
+		{
+			Path:    "wiki/sources/my-article.md",
+			Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSee [[ddd|DDD]].\n",
+		},
+		{
+			Path:    "wiki/concepts/ddd.md",
+			Content: "---\ntitle: DDD\n---\n\n## Definition\n\nA thing.\n\n## Sources\n\n- [[my-article|My Article]]\n",
+		},
+	}
+
+	got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
+
+	require.Len(t, got, 2)
+	// The source ref must appear exactly once.
+	count := 0
+	for _, line := range splitLines(got[1].Content) {
+		if line == "- [[my-article|My Article]]" {
+			count++
+		}
+	}
+	assert.Equal(t, 1, count, "source ref should appear exactly once")
+}
+
+func TestInjectSourceRefs_InjectsIntoEntity(t *testing.T) {
+	pages := []wiki.Page{
+		{
+			Path:    "wiki/sources/book.md",
+			Content: "---\ntitle: Book\n---\n\n## Summary\n\nBy [[ryan-singer|Ryan Singer]].\n",
+		},
+		{
+			Path:    "wiki/entities/ryan-singer.md",
+			Content: "---\ntitle: Ryan Singer\n---\n\n## Description\n\nA designer.\n",
+		},
+	}
+
+	got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
+
+	require.Len(t, got, 2)
+	var entity wiki.Page
+	for _, p := range got {
+		if p.Path == "wiki/entities/ryan-singer.md" {
+			entity = p
+		}
+	}
+	assert.Contains(t, entity.Content, "[[book|Book]]")
+}
+
+func TestExtractWikilinks(t *testing.T) {
+	content := "See [[foo|Foo]] and [[bar|Bar]] and [[foo|Foo again]]."
+	got := extractWikilinks(content)
+	assert.True(t, got["foo"])
+	assert.True(t, got["bar"])
+	assert.Len(t, got, 2, "duplicate slugs should be deduplicated")
+}
+
+// splitLines is a test helper.
+func splitLines(s string) []string {
+	var out []string
+	for _, l := range splitNewlines(s) {
+		if l != "" {
+			out = append(out, l)
+		}
+	}
+	return out
+}
+
+func splitNewlines(s string) []string {
+	var lines []string
+	start := 0
+	for i, c := range s {
+		if c == '\n' {
+			lines = append(lines, s[start:i])
+			start = i + 1
+		}
+	}
+	lines = append(lines, s[start:])
+	return lines
+}
+```
+
+- [ ] **Step 2: Run to verify they fail**
+
+```bash
+cd /Users/mathias/Documents/local-dev/AI/hyperguild/.worktrees/feat-source-backrefs/ingestion && go test ./internal/pipeline/... -run "TestInjectSourceRefs|TestExtractWikilinks" -v
+```
+Expected: compile error — `injectSourceRefs` and `extractWikilinks` not defined.
+
+- [ ] **Step 3: Implement refs.go**
+
+```go
+// ingestion/internal/pipeline/refs.go
+package pipeline
+
+import (
+	"os"
+	"path/filepath"
+	"regexp"
+	"strings"
+
+	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
+)
+
+var wikilinkRE = regexp.MustCompile(`\[\[([^|\]]+)\|`)
+
+// injectSourceRefs finds the source page in the proposed batch, extracts its wikilinks,
+// and injects a back-reference into every linked concept or entity page.
+// Pages that exist on disk but are not in the current batch are loaded and appended
+// so they will be updated on write.
+func injectSourceRefs(pages []wiki.Page, inventory map[wiki.PageType][]wiki.Entry, brainDir string) []wiki.Page {
+	sourceSlug, sourceTitle, found := findSourcePage(pages)
+	if !found {
+		return pages
+	}
+
+	// Locate source page content for wikilink extraction.
+	var sourceContent string
+	for _, p := range pages {
+		if strings.HasPrefix(p.Path, "wiki/sources/") &&
+			strings.TrimSuffix(filepath.Base(p.Path), ".md") == sourceSlug {
+			sourceContent = p.Content
+			break
+		}
+	}
+
+	linkedSlugs := extractWikilinks(sourceContent)
+	sourceRef := "- [[" + sourceSlug + "|" + sourceTitle + "]]"
+
+	// Build slug → index map for proposed pages (excluding wiki/sources/).
+	bySlug := make(map[string]int, len(pages))
+	for i, p := range pages {
+		if !strings.HasPrefix(p.Path, "wiki/sources/") {
+			bySlug[strings.TrimSuffix(filepath.Base(p.Path), ".md")] = i
+		}
+	}
+
+	for slug := range linkedSlugs {
+		if slug == sourceSlug {
+			continue // no self-reference
+		}
+
+		if idx, ok := bySlug[slug]; ok {
+			// Concept/entity is in the proposed batch — inject inline.
+			pages[idx] = addSourceRef(pages[idx], sourceRef)
+			continue
+		}
+
+		// Not in proposed batch — look for it in the inventory (exists on disk).
+		// Only concept and entity pages receive back-references.
+		pt, ok := findInInventory(slug, inventory)
+		if !ok || pt == wiki.PageTypeSource {
+			continue
+		}
+		diskPath := filepath.Join(brainDir, "wiki", string(pt), slug+".md")
+		b, err := os.ReadFile(diskPath)
+		if err != nil {
+			continue // page not found on disk; skip
+		}
+		page := wiki.Page{
+			Path:    "wiki/" + string(pt) + "/" + slug + ".md",
+			Content: string(b),
+		}
+		pages = append(pages, addSourceRef(page, sourceRef))
+	}
+
+	return pages
+}
+
+// addSourceRef injects sourceRef into the ## Sources bullet section of page.
+// Uses wiki.Merge so that existing Sources entries are deduplicated and all
+// other sections are preserved unchanged.
+func addSourceRef(page wiki.Page, sourceRef string) wiki.Page {
+	patch := wiki.Page{
+		Path:    page.Path,
+		Content: "\n## Sources\n\n" + sourceRef + "\n",
+	}
+	return wiki.Merge(page, patch)
+}
+
+// extractWikilinks returns the set of slugs referenced as [[slug|...]] in content.
+func extractWikilinks(content string) map[string]bool {
+	slugs := make(map[string]bool)
+	for _, m := range wikilinkRE.FindAllStringSubmatch(content, -1) {
+		slugs[m[1]] = true
+	}
+	return slugs
+}
+
+// findSourcePage returns the slug and title of the first wiki/sources/ page in pages.
+func findSourcePage(pages []wiki.Page) (slug, title string, found bool) {
+	for _, p := range pages {
+		if strings.HasPrefix(p.Path, "wiki/sources/") {
+			slug = strings.TrimSuffix(filepath.Base(p.Path), ".md")
+			title = extractTitle(p.Content)
+			if title == "" {
+				title = slug
+			}
+			return slug, title, true
+		}
+	}
+	return "", "", false
+}
+
+// findInInventory returns the PageType for a slug if it appears in the inventory.
+func findInInventory(slug string, inventory map[wiki.PageType][]wiki.Entry) (wiki.PageType, bool) {
+	for pt, entries := range inventory {
+		for _, e := range entries {
+			if e.Slug == slug {
+				return pt, true
+			}
+		}
+	}
+	return "", false
+}
+```
+
+- [ ] **Step 4: Run all pipeline tests**
+
+```bash
+cd /Users/mathias/Documents/local-dev/AI/hyperguild/.worktrees/feat-source-backrefs/ingestion && go test ./internal/pipeline/... -v
+```
+Expected: all existing tests PASS + 7 new refs tests PASS.
+
+- [ ] **Step 5: Commit**
+
+```bash
+cd /Users/mathias/Documents/local-dev/AI/hyperguild/.worktrees/feat-source-backrefs && git add ingestion/internal/pipeline/refs.go ingestion/internal/pipeline/refs_test.go && git commit -m "feat(pipeline): inject source back-references into concept and entity pages"
+```
+
+---
+
+### Task 2: Wire injectSourceRefs into pipeline.Run
+
+**Files:**
+- Modify: `ingestion/internal/pipeline/pipeline.go`
+
+- [ ] **Step 1: Insert the call**
+
+In `pipeline.go`, locate:
+
+```go
+	resolved := Resolve(allPages, inventory)
+	merged := mergeAll(resolved)
+```
+
+Replace with:
+
+```go
+	resolved := Resolve(allPages, inventory)
+	withRefs := injectSourceRefs(resolved, inventory, brainDir)
+	merged := mergeAll(withRefs)
+```
+
+No import changes needed — same package.
+
+- [ ] **Step 2: Run all pipeline tests**
+
+```bash
+cd /Users/mathias/Documents/local-dev/AI/hyperguild/.worktrees/feat-source-backrefs/ingestion && go test ./internal/pipeline/... -v
+```
+Expected: all tests PASS. The existing `TestRun_WritesPages` and `TestRun_DryRunDoesNotWrite` use LLM mocks that return source pages with no wikilinks to concepts — `injectSourceRefs` is a no-op for them.
+
+- [ ] **Step 3: Run full test suite + lint**
+
+```bash
+cd /Users/mathias/Documents/local-dev/AI/hyperguild/.worktrees/feat-source-backrefs/ingestion && go test ./... && golangci-lint run ./...
+```
+Expected: all packages PASS, 0 lint issues.
+
+- [ ] **Step 4: Commit**
+
+```bash
+cd /Users/mathias/Documents/local-dev/AI/hyperguild/.worktrees/feat-source-backrefs && git add ingestion/internal/pipeline/pipeline.go && git commit -m "feat(pipeline): wire source back-reference injection into Run"
+```
+
+---
+
+## Self-Review
+
+**Spec coverage:**
+
+| Requirement | Task |
+|---|---|
+| Concepts get `## Sources` back-link to ingested source | Task 1 |
+| Entities get `## Sources` back-link | Task 1 (TestInjectSourceRefs_InjectsIntoEntity) |
+| Existing pages on disk get updated with new source | Task 1 (TestInjectSourceRefs_LoadsConceptFromDisk) |
+| Re-ingestion of same source does not duplicate the ref | Task 1 (TestInjectSourceRefs_DeduplicatesOnReingestion) |
+| Source page does not reference itself | Task 1 (TestInjectSourceRefs_NoSelfReference) |
+| No-op when batch has no source page | Task 1 (TestInjectSourceRefs_NoSourcePage) |
+| Wired into Run between Resolve and mergeAll | Task 2 |
+| Full test suite and lint pass | Task 2 Step 3 |
+
+**Placeholder scan:** None.
+
+**Type consistency:** `injectSourceRefs([]wiki.Page, map[wiki.PageType][]wiki.Entry, string) []wiki.Page` — used identically in refs.go (definition) and pipeline.go (call site).
--- a/docs/superpowers/specs/2026-04-22-brain-ingestion-pipeline-design.md
+++ b/docs/superpowers/specs/2026-04-22-brain-ingestion-pipeline-design.md
@@ -0,0 +1,240 @@
+# Brain Ingestion Pipeline — Design Spec
+
+**Date:** 2026-04-22
+**Status:** approved
+**Author:** Mathias + Claude
+
+---
+
+## Overview
+
+Add a structured ingestion pipeline to the hyperguild brain. The pipeline accepts raw content (directly or from files) and uses an LLM to produce structured wiki pages in `brain/wiki/` — the declarative layer of the Two-Layer Brain. Three fixed knowledge classes: **concepts**, **entities**, **sources**.
+
+This spec covers:
+- Four new packages in the `ingestion` Go module (`llm`, `wiki`, `pipeline`, `watcher`)
+- Two new HTTP endpoints on the ingestion server (`/ingest`, `/ingest-path`)
+- A background file watcher for `brain/raw/`
+- Config additions to both the ingestion server and the supervisor
+
+It does **not** cover Layer 2 (training data, `brain/training-data/`) — that is the trainer worker's concern.
+
+---
+
+## Information Model
+
+Three fixed wiki page classes, matching the Two-Layer Brain design spec and the existing `ingestion-svc` model:
+
+### `wiki/sources/<slug>.md`
+One page per ingested source (project, book, article, note). Updated (not replaced) on re-ingestion.
+
+Required frontmatter: `title`, `type` (article|pdf|book|video|note|project), `domain`, `source_url`, `date_ingested`, `last_updated`, `aliases`.
+
+Body sections: Summary · Key Claims · Concepts Introduced or Reinforced · Entities Mentioned · Open Questions Raised. Books add: Chapters · Argument Arc · Updates (dated, append-only).
+
+### `wiki/concepts/<slug>.md`
+One page per idea, framework, methodology, or pattern (e.g. Domain Driven Design, TDD, event sourcing).
+
+Required frontmatter: `title`, `domain`, `last_updated`, `aliases`.
+
+Body sections: Definition · Why It Matters · Related Concepts · Related Entities · Sources · Evolving Notes.
+
+### `wiki/entities/<slug>.md`
+One page per person, tool, organisation, technology, or product.
+
+Required frontmatter: `title`, `type` (person|company|tool|model|framework|technology), `domain`, `last_updated`, `aliases`.
+
+Body sections: Description · Relevance · Key Positions/Products/Claims · Related Concepts · Related Entities · Sources.
+
+### Wikilink format
+All cross-references use `[[slug|Display Text]]`. Slug = lowercase title, spaces→hyphens, non-alphanumeric stripped. Slugs must resolve to an existing file in the wiki.
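
Here is a hedged sketch of the slug rule as stated: lowercase, spaces to hyphens, other non-alphanumerics stripped. The sketch also makes three assumptions of its own: existing hyphens are kept, repeated hyphens are collapsed, and only ASCII letters and digits count as alphanumeric. The real `wiki.Slug` may behave differently on those edge cases. `wikilink` is a hypothetical helper that renders the stored `[[slug|Display Text]]` form.

```go
package main

import "strings"

// slug lowercases title, turns spaces into hyphens and drops every other
// non-alphanumeric character. Assumptions: ASCII-only alphanumerics,
// existing hyphens kept, runs of hyphens collapsed, edge hyphens trimmed.
func slug(title string) string {
	var b strings.Builder
	lastHyphen := false
	for _, r := range strings.ToLower(title) {
		switch {
		case r >= 'a' && r <= 'z', r >= '0' && r <= '9':
			b.WriteRune(r)
			lastHyphen = false
		case r == ' ' || r == '-':
			if !lastHyphen {
				b.WriteByte('-')
				lastHyphen = true
			}
		}
		// Any other rune (apostrophes, colons, punctuation) is dropped.
	}
	return strings.Trim(b.String(), "-")
}

// wikilink renders a title in the [[slug|Display Text]] cross-reference format.
func wikilink(title string) string {
	return "[[" + slug(title) + "|" + title + "]]"
}
```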
+
+### Supporting files
+- `brain/wiki/index.md` — auto-rebuilt on every ingest: one-sentence summary per page, grouped by type
+- `brain/log.md` — append-only audit trail: date, source, pages written, warnings
+
+---
+
+## Architecture
+
+### New packages (`ingestion` module)
+
+```
+ingestion/internal/
+  llm/        — OpenAI-compatible HTTP client (chat completions, retry on 429,
+                configurable timeout and temperature)
+  wiki/       — Page types, slug utilities, merge logic, inventory loader,
+                index rebuilder, log appender
+  pipeline/   — Orchestrates one ingest run end-to-end (content or extracted file text)
+  watcher/    — Polls brain/raw/ and triggers pipeline on new files
+```
+
+The existing `api/` and `search/` packages are updated; no other existing packages change.
+
+### Brain directory layout
+
+```
+brain/
+  wiki/
+    concepts/        ← LLM-structured concept pages
+    entities/        ← LLM-structured entity pages
+    sources/         ← LLM-structured source pages
+    index.md         ← auto-rebuilt on each ingest
+  knowledge/         ← quick raw notes via brain_write (BM25-searchable, unchanged)
+  raw/               ← drop zone; watcher picks up files here
+    processed/       ← moved here on success (organised by date: processed/YYYY-MM-DD/)
+    failed/          ← moved here on failure
+  sessions/          ← session logs (retrospective/trainer concern, not touched here)
+  training-data/     ← Layer 2 (trainer worker concern, not touched here)
+  log.md             ← append-only audit trail
+  CLAUDE.md          ← schema document injected into every ingest prompt
+```
+
+If `brain/CLAUDE.md` is absent, the pipeline falls back to an embedded default schema compiled into the binary.
+
+---
+
+## API
+
+### `POST /ingest`
+
+Ingest content provided directly by the caller.
+
+**Request:**
+```json
+{
+  "content": "...",
+  "source": "shape-up-book",
+  "dry_run": false
+}
+```
+
+**Response:**
+```json
+{
+  "pages": ["wiki/sources/shape-up.md", "wiki/concepts/betting-table.md"],
+  "warnings": []
+}
+```
+
+`source` is the human-readable name used when writing/updating `wiki/sources/<slug>.md`. `dry_run: true` returns the page contents without writing.
+
+### `POST /ingest-path`
+
+Ingest a file or walk a directory recursively. Supports `.md`, `.txt`, `.pdf`.
+
+**Request:**
+```json
+{
+  "path": "/Users/mathias/brain/raw/shape-up.pdf",
+  "source": "shape-up-book",
+  "dry_run": false
+}
+```
+
+If `path` is a directory, all supported files within it are ingested in sequence. `source` is optional for directory ingestion — if omitted, the LLM derives it from each file's name and content.
+
+**Response:** same shape as `/ingest`, with pages and warnings aggregated across all files.
+
+### Supervisor skill update
+
+`brain_ingest` in `internal/skills/brain/handlers.go` gains an optional `path` field. If `path` is set, it calls `/ingest-path`; otherwise `/ingest`.
+
+---
+
+## Pipeline
+
+`pipeline.Run(ctx, cfg, brainDir, content, source, dryRun)` — called by both HTTP handlers after any file reading is done.
+
+Steps:
+
+1. **Load inventory** — walk `brain/wiki/{concepts,entities,sources}/`, build slug index grouped by type. Injected into prompt so LLM knows what to update vs create.
+2. **Load schema** — read `brain/CLAUDE.md`; fall back to embedded default if absent.
+3. **Chunk** — split content at `INGEST_CHUNK_SIZE` chars (default 6000; split on paragraph boundary). If `INGEST_CHUNK_SIZE=0`, no chunking.
+4. **LLM call per chunk** — returns JSON array of `{"path": "wiki/concepts/foo.md", "content": "..."}`. Prompt structure: system instruction → date → schema → inventory → non-negotiable slug/wikilink rules → source content.
+5. **Parse + truncation recovery** — strip markdown fences if present. If JSON array is truncated mid-object (token limit), salvage all complete objects before the break and log a warning.
+6. **Merge** — combine pages with the same path across chunks:
+   - Bullet sections (Related Concepts, Related Entities, Sources, Key Claims): union unique lines
+   - Append sections (Evolving Notes, Updates, Open Questions): append new content
+   - All other sections: keep first occurrence
+   - Frontmatter: keep first occurrence
+7. **Write** — create subdirs as needed, write files atomically. In dry-run mode, return page map without writing.
+8. **Rebuild `index.md`** — one-sentence summary per page (derived from first body paragraph), grouped by type, with page count header.
+9. **Append to `log.md`** — date, source, list of pages written, warning count.
+
+---
+
+## File Watcher
+
+Background goroutine started at server startup (when `INGEST_WATCH_INTERVAL > 0`).
+
+**Poll loop:**
+1. Walk `brain/raw/` for files with supported extensions (`.md`, `.txt`, `.pdf`), excluding `processed/` and `failed/` subdirs.
+2. For each file found: derive source from filename (strip extension, kebab-to-title), call `pipeline.Run` with the file content.
+3. On success: move file to `brain/raw/processed/YYYY-MM-DD/<filename>`.
+4. On failure: move file to `brain/raw/failed/<filename>`, append error to `brain/log.md`.
+5. Sleep `INGEST_WATCH_INTERVAL` seconds, repeat.
+
+Files are processed one at a time (no concurrency within the watcher) to avoid LLM rate-limit collisions.
+
+---
+
+## LLM Prompt
+
+**System:**
+> You are a wiki agent. Read the source material and produce structured wiki pages following the schema provided. Output ONLY a valid JSON array — no markdown fences, no other text. Each element must have: `"path"` (relative path within wiki, e.g. `"wiki/sources/foo.md"`) and `"content"` (full markdown including YAML frontmatter). Follow the schema strictly: correct frontmatter fields, wikilinks as `[[slug|Display Text]]`, dates in YYYY-MM-DD format, paraphrase rather than quoting verbatim.
+
+**User (built dynamically):**
+1. Today's date
+2. Full schema (`brain/CLAUDE.md` content)
+3. Existing wiki inventory grouped by type (for update-vs-create decisions)
+4. Non-negotiable rules: slug format, wikilink format, one-source-per-book, section type enforcement
+5. Source content (the chunk)
+
+Temperature: 0.2 for reproducibility.
+
+---
+
+## Configuration
+
+### Ingestion server (new env vars)
+
+| Variable | Default | Description |
+|---|---|---|
+| `INGEST_LLM_URL` | `http://iguana:4000/v1` | OpenAI-compatible endpoint |
+| `INGEST_LLM_KEY` | (empty) | API key |
+| `INGEST_LLM_MODEL` | `koala/qwen35-9b-fast` | Model name |
+| `INGEST_LLM_TIMEOUT` | `15` | LLM call timeout (minutes) |
+| `INGEST_CHUNK_SIZE` | `6000` | Max chars per LLM call (0 = no chunking) |
+| `INGEST_WATCH_INTERVAL` | `30` | Watcher poll interval in seconds (0 = disabled) |
+
+### Supervisor (new env vars + wiring)
+
+| Variable | Default | Description |
+|---|---|---|
+| `INGEST_SVC_URL` | (empty) | URL of ingestion server for `brain_ingest` |
+| `KB_RETRIEVAL_URL` | (empty) | URL of KB retrieval server for `brain_search` |
+
+`config.go` gets two new fields. `main.go` passes them to `brain.New()`. Both tools are only registered as MCP tools when the respective URL is configured (already implemented in `skill.go`).
+
+---
+
+## Testing
+
+| Package | What is tested |
+|---|---|
+| `wiki/` | Slug generation (edge cases: apostrophes, colons, version strings), merge logic (bullets union, append, keep-first), inventory loading from temp dir, truncation recovery (valid partial JSON), index rebuild output |
+| `pipeline/` | Integration test: temp brain dir + mock LLM HTTP server returning fixture JSON; verify files written to correct paths, index rebuilt, log appended |
+| `api/` | Handler tests for `/ingest` and `/ingest-path` using mock pipeline; 400 on missing fields, 200 with expected response shape |
+| `watcher/` | File placed in `brain/raw/` is moved to `processed/` on mock-pipeline success; moved to `failed/` on error |
+
+All tests are table-driven. No real LLM calls in tests.
+
+---
+
+## Out of Scope
+
+- Python validation/correction loop (can be added later; the LLM prompt enforces schema rules as non-negotiable instructions)
+- `brain/training-data/` — trainer worker concern
+- `brain/sessions/` — retrospective/sessionlog concern
+- Upload endpoint (multipart HTTP) — `scp`/rsync to `brain/raw/` + watcher covers this
+- Qdrant vector indexing — `brain_search` calls a separate KB retrieval service; ingestion does not write to Qdrant
--- a/docs/superpowers/specs/2026-04-23-level3-slug-authority-design.md
+++ b/docs/superpowers/specs/2026-04-23-level3-slug-authority-design.md
@@ -0,0 +1,148 @@
+# Level 3: Strip Slug Authority from LLM — Design Spec
+
+## Problem
+
+The ingestion pipeline currently asks the LLM to produce full wiki pages including the file path (e.g. `wiki/sources/finbert-huggingface.md`). This causes two classes of bug:
+
+1. **Slug proliferation** — the LLM invents different slugs for the same concept across chunks or runs, producing duplicate pages that diverge in content.
+2. **Unstable paths** — the LLM may shorten, expand, or vary titles, making deduplication via `Resolve` unreliable because the slug mismatch is upstream of the normalizer.
+
+## Solution
+
+Strip slug authority from the LLM entirely. The LLM returns a minimal structured object. The pipeline computes all slugs deterministically from titles using `wiki.Slug(title)`.
+
+---
+
+## LLM JSON Contract
+
+### Output format (per page)
+
+```json
+{
+  "title": "FinBERT",
+  "type": "concept",
+  "subtype": "framework",
+  "domain": "ai-llm",
+  "content": "## Definition\n\nA BERT-based model fine-tuned for financial sentiment...\n\n## Related\n\n- [[Sentiment Analysis]]\n- [[Hugging Face]]\n"
+}
+```
+
+**Fields:**
+
+| Field | Required | Values |
+|-------|----------|--------|
+| `title` | yes | Human-readable title, e.g. "FinBERT" |
+| `type` | yes | `"source"` \| `"concept"` \| `"entity"` |
+| `subtype` | for entity/source | entity: `person\|company\|tool\|model\|framework\|technology`; source: `article\|pdf\|book\|video\|note\|project` |
+| `domain` | no | tag string, e.g. `ai-llm`, `finance` |
+| `content` | yes | Markdown body sections only — no frontmatter, no path |
+
+**Wikilinks in content:** `[[Display Name]]` only. No slug. The pipeline canonicalizes to `[[slug|Display Name]]` in a post-processing step.
+
+**The LLM never writes slugs, paths, or frontmatter.**
+
+---
+
+## Pipeline Changes
+
+### New type: `RawPage`
+
+```go
+type RawPage struct {
+    Title   string `json:"title"`
+    Type    string `json:"type"` // "source" | "concept" | "entity"
+    Subtype string `json:"subtype"`
+    Domain  string `json:"domain"`
+    Content string `json:"content"`
+}
+```
+
+### New step order
+
+```
+ParseRawPages → BuildPages → Resolve → CanonicalizeLinks → injectSourceRefs → mergeAll → write
+```
+
+### Step descriptions
+
+**`ParseRawPages(output string) ([]RawPage, []string)`**
+Replaces `ParsePages`. Deserializes JSON objects with the new schema. Same truncation-recovery logic as today. Returns `(pages, warnings)`.
+
+**`BuildPages(rawPages []RawPage, sourceSlug, date string) []wiki.Page`**
+Converts `RawPage → wiki.Page`:
+- Computes slug: `wiki.Slug(page.Title)`
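+- Computes path: `wiki/<dir>/<slug>.md`, where `<dir>` is the plural directory for the type (`source` → `sources`, `concept` → `concepts`, `entity` → `entities`)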
+- Assembles frontmatter:
+  ```
+  ---
+  title: <Title>
+  type: <type>
+  subtype: <subtype>        # omitted if empty
+  domain: <domain>          # omitted if empty
+  created: <date>
+  source: <sourceSlug>      # omitted for the source page itself
+  ---
+  ```
+- Concatenates frontmatter + content
+
+**`Resolve(pages []wiki.Page, inventory) []wiki.Page`**
+Unchanged. Normalizes near-duplicate titles to existing inventory slugs.
+
+**`CanonicalizeLinks(pages []wiki.Page, inventory) ([]wiki.Page, []string)`**
+New. Builds a title→slug map from inventory + current batch. Replaces `[[Display Name]]` with `[[slug|Display Name]]` in each page's content. Titles with no known slug are left as-is and returned as warnings.
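
Here is a sketch of the link-rewriting core of `CanonicalizeLinks`. It assumes the title-to-slug map has already been built from the inventory and the current batch. Its keys are lowercased for a case-insensitive lookup, which is an assumption; the spec does not say how titles are compared. Links that already carry a slug (`[[slug|Name]]`) are not matched, so running the step twice does not change the result.

```go
package main

import (
	"regexp"
	"strings"
)

// bareLinkRE matches [[Display Name]] links that have no slug part yet.
var bareLinkRE = regexp.MustCompile(`\[\[([^\]|]+)\]\]`)

// canonicalizeLinks rewrites [[Display Name]] to [[slug|Display Name]] using
// titleToSlug (keys lowercased). Unknown titles are left as-is and reported.
func canonicalizeLinks(content string, titleToSlug map[string]string) (string, []string) {
	var warnings []string
	out := bareLinkRE.ReplaceAllStringFunc(content, func(m string) string {
		title := strings.TrimSpace(bareLinkRE.FindStringSubmatch(m)[1])
		slug, ok := titleToSlug[strings.ToLower(title)]
		if !ok {
			warnings = append(warnings, "unresolved wikilink: "+title)
			return m
		}
		return "[[" + slug + "|" + title + "]]"
	})
	return out, warnings
}
```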
+
+**`injectSourceRefs`**
+Unchanged. Reads `[[slug|...]]` links (post-canonicalization) to inject back-references.
+
+**`mergeAll → write`**
+Unchanged.
+
+### `pipeline.Run` signature change
+
+```go
+func Run(ctx context.Context, cfg Config, brainDir, content, source string, dryRun bool) (Result, error)
+```
+
+`source` is already passed (it's the display name / filename). A new internal `sourceSlug` is derived from it via `wiki.Slug(source)` before calling `BuildPages`. No API change needed.
+
+---
+
+## Files Changed
+
+| File | Change |
+|------|--------|
+| `ingestion/internal/pipeline/parse.go` | Replace `ParsePages` with `ParseRawPages` + `RawPage` type |
+| `ingestion/internal/pipeline/build.go` | New file: `BuildPages` |
+| `ingestion/internal/pipeline/links.go` | New file: `CanonicalizeLinks` |
+| `ingestion/internal/pipeline/pipeline.go` | Wire new steps; derive `sourceSlug` from `source` |
+| `ingestion/internal/pipeline/prompt.go` | New system prompt + `BuildPrompt` for new JSON format |
+| `brain/schema.md` | Update wikilink format and JSON schema docs |
+
+`resolve.go`, `refs.go`, `backfill.go`, `merge.go` — no changes.
+
+---
+
+## Wikilink Format
+
+- **LLM output**: `[[Display Name]]`
+- **Stored on disk**: `[[slug|Display Name]]`
+- **`CanonicalizeLinks`** converts between the two using the inventory
+
+This matches Obsidian's display-alias syntax that the existing codebase already uses.
+
+---
+
+## Testing Strategy
+
+- `ParseRawPages`: table-driven, cover valid JSON, truncated output, unknown type, missing title
+- `BuildPages`: table-driven, cover slug computation, frontmatter assembly, source page (no `source:` field), entity with subtype
+- `CanonicalizeLinks`: cover known title → replaced, unknown title → left as-is + warning, multiple links in one page
+- Integration test: full `Run` call with mock LLM returning new JSON format, assert no slug duplication across two chunks of the same source
+
+---
+
+## Out of Scope
+
+- Re-ingesting existing pages (user will trigger manually after deploy)
+- Changing the `BackfillRefs` endpoint (already correct, slug-based)
+- Changing the `Resolve` fuzzy-match algorithm
--- a/ingestion/Dockerfile
+++ b/ingestion/Dockerfile
@@ -15,6 +15,8 @@ RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \

 FROM alpine:3.21

+RUN apk add --no-cache poppler-utils
+
 COPY --from=builder /out/ingestion /usr/local/bin/ingestion

 RUN addgroup -S ingestion && adduser -S -G ingestion ingestion
--- a/ingestion/cmd/server/main.go
+++ b/ingestion/cmd/server/main.go
@@ -2,34 +2,88 @@
 package main

 import (
+	"context"
+	"fmt"
 	"log/slog"
 	"net/http"
 	"os"
+	"strconv"
+	"time"

 	"github.com/mathiasbq/hyperguild/ingestion/internal/api"
+	"github.com/mathiasbq/hyperguild/ingestion/internal/llm"
+	"github.com/mathiasbq/hyperguild/ingestion/internal/pipeline"
+	"github.com/mathiasbq/hyperguild/ingestion/internal/watcher"
 )

+func envOr(key, fallback string) string {
+	if v := os.Getenv(key); v != "" {
+		return v
+	}
+	return fallback
+}
+
+func envInt(key string, fallback int) int {
+	if v := os.Getenv(key); v != "" {
+		if n, err := strconv.Atoi(v); err == nil {
+			return n
+		}
+	}
+	return fallback
+}
+
 func main() {
 	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

-	brainDir := os.Getenv("INGEST_BRAIN_DIR")
-	if brainDir == "" {
-		brainDir = "../brain"
+	brainDir := envOr("INGEST_BRAIN_DIR", "../brain")
+	port := envOr("INGEST_PORT", "3300")
+
+	llmURL := envOr("INGEST_LLM_URL", "http://iguana:4000/v1")
+	llmKey := os.Getenv("INGEST_LLM_KEY")
+	llmModel := envOr("INGEST_LLM_MODEL", "koala/qwen35-9b-fast")
+	llmTimeoutMins := envInt("INGEST_LLM_TIMEOUT", 15)
+	chunkSize := envInt("INGEST_CHUNK_SIZE", 6000)
+	watchInterval := envInt("INGEST_WATCH_INTERVAL", 30)
+
+	llmClient := llm.New(llmURL, llmKey, llmModel, time.Duration(llmTimeoutMins)*time.Minute)
+
+	pipelineCfg := pipeline.Config{
+		Complete:  llmClient.Complete,
+		ChunkSize: chunkSize,
 	}

-	port := os.Getenv("INGEST_PORT")
-	if port == "" {
-		port = "3300"
-	}
+	h := api.NewHandler(brainDir, logger, pipelineCfg)

-	h := api.NewHandler(brainDir, logger)
+	ctx := context.Background()
+	if watchInterval > 0 {
+		watcher.Start(ctx, watcher.Config{
+			BrainDir: brainDir,
+			Interval: time.Duration(watchInterval) * time.Second,
+			Pipeline: pipelineCfg,
+		})
+	}

 	mux := http.NewServeMux()
-	mux.HandleFunc("/query", h.Query)
-	mux.HandleFunc("/write", h.Write)
+	mux.HandleFunc("POST /query", h.Query)
+	mux.HandleFunc("POST /write", h.Write)
+	mux.HandleFunc("POST /ingest", h.Ingest)
+	mux.HandleFunc("POST /ingest-path", h.IngestPath)
+	mux.HandleFunc("POST /ingest-raw", h.IngestRaw)
+	mux.HandleFunc("POST /backfill-refs", h.BackfillRefs)

 	addr := ":" + port
-	logger.Info("ingestion server starting", "addr", addr, "brain_dir", brainDir)
+	watchIntervalLog := "disabled"
+	if watchInterval > 0 {
+		watchIntervalLog = fmt.Sprintf("%ds", watchInterval)
+	}
+	logger.Info("ingestion server starting",
+		"addr", addr,
+		"brain_dir", brainDir,
+		"llm_url", llmURL,
+		"llm_model", llmModel,
+		"chunk_size", chunkSize,
+		"watch_interval", watchIntervalLog,
+	)
 	if err := http.ListenAndServe(addr, mux); err != nil {
 		logger.Error("server stopped", "err", err)
 		os.Exit(1)
--- a/ingestion/internal/api/handler.go
+++ b/ingestion/internal/api/handler.go
@@ -11,6 +11,8 @@ import (
 	"strings"
 	"time"

+	"github.com/mathiasbq/hyperguild/ingestion/internal/extract"
+	"github.com/mathiasbq/hyperguild/ingestion/internal/pipeline"
 	"github.com/mathiasbq/hyperguild/ingestion/internal/search"
 )

@@ -18,11 +20,15 @@ import (
 type Handler struct {
 	brainDir string
 	logger   *slog.Logger
+	pipeline pipeline.Config
 }

 // NewHandler constructs a Handler. brainDir is the absolute path to brain/.
-func NewHandler(brainDir string, logger *slog.Logger) *Handler {
-	return &Handler{brainDir: brainDir, logger: logger}
+func NewHandler(brainDir string, logger *slog.Logger, pipelineCfg pipeline.Config) *Handler {
+	if logger == nil {
+		logger = slog.Default()
+	}
+	return &Handler{brainDir: brainDir, logger: logger, pipeline: pipelineCfg}
 }

 type queryRequest struct {
@@ -37,15 +43,32 @@ type writeRequest struct {
 	Domain   string `json:"domain,omitempty"`
 }

+type ingestRequest struct {
+	Content string `json:"content"`
+	Source  string `json:"source"`
+	DryRun  bool   `json:"dry_run"`
+}
+
+type ingestPathRequest struct {
+	Path   string `json:"path"`
+	Source string `json:"source"`
+	DryRun bool   `json:"dry_run"`
+}
+
+type ingestResponse struct {
+	Pages    []string `json:"pages"`
+	Warnings []string `json:"warnings"`
+}
+
 // Query handles POST /query — full-text search across the brain wiki.
 func (h *Handler) Query(w http.ResponseWriter, r *http.Request) {
 	var req queryRequest
 	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
-		http.Error(w, "invalid JSON", http.StatusBadRequest)
+		writeError(w, http.StatusBadRequest, "invalid JSON")
 		return
 	}
 	if strings.TrimSpace(req.Query) == "" {
-		http.Error(w, "query is required", http.StatusBadRequest)
+		writeError(w, http.StatusBadRequest, "query is required")
 		return
 	}
 	if req.Limit == 0 {
@@ -55,22 +78,22 @@ func (h *Handler) Query(w http.ResponseWriter, r *http.Request) {
 	results, err := search.Query(h.brainDir, req.Query, req.Limit)
 	if err != nil {
 		h.logger.Error("query failed", "err", err)
-		http.Error(w, "search error", http.StatusInternalServerError)
+		writeError(w, http.StatusInternalServerError, "search error")
 		return
 	}

 	writeJSON(w, map[string]any{"results": results})
 }

-// Write handles POST /write — write raw content to brain/raw/.
+// Write handles POST /write — write raw content to brain/knowledge/.
 func (h *Handler) Write(w http.ResponseWriter, r *http.Request) {
 	var req writeRequest
 	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
-		http.Error(w, "invalid JSON", http.StatusBadRequest)
+		writeError(w, http.StatusBadRequest, "invalid JSON")
 		return
 	}
 	if req.Content == "" {
-		http.Error(w, "content is required", http.StatusBadRequest)
+		writeError(w, http.StatusBadRequest, "content is required")
 		return
 	}

@@ -81,7 +104,7 @@ func (h *Handler) Write(w http.ResponseWriter, r *http.Request) {

 	rawDir := filepath.Join(h.brainDir, "knowledge")
 	if err := os.MkdirAll(rawDir, 0o755); err != nil {
-		http.Error(w, "failed to create raw dir", http.StatusInternalServerError)
+		writeError(w, http.StatusInternalServerError, "failed to create raw dir")
 		return
 	}

@@ -104,9 +127,13 @@ func (h *Handler) Write(w http.ResponseWriter, r *http.Request) {
 		base += ".md"
 	}
 	dest := filepath.Join(rawDir, base)
+	if !strings.HasPrefix(filepath.Clean(dest)+string(os.PathSeparator), filepath.Clean(rawDir)+string(os.PathSeparator)) {
+		writeError(w, http.StatusBadRequest, "invalid filename")
+		return
+	}
 	if err := os.WriteFile(dest, []byte(finalContent), 0o644); err != nil {
 		h.logger.Error("write failed", "err", err)
-		http.Error(w, "write error", http.StatusInternalServerError)
+		writeError(w, http.StatusInternalServerError, "write error")
 		return
 	}

@@ -114,7 +141,198 @@ func (h *Handler) Write(w http.ResponseWriter, r *http.Request) {
 	writeJSON(w, map[string]string{"path": filepath.ToSlash(rel)})
 }

+// Ingest handles POST /ingest — run the pipeline on provided content.
+func (h *Handler) Ingest(w http.ResponseWriter, r *http.Request) {
+	var req ingestRequest
+	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
+		writeError(w, http.StatusBadRequest, "invalid JSON")
+		return
+	}
+	if strings.TrimSpace(req.Content) == "" {
+		writeError(w, http.StatusBadRequest, "content is required")
+		return
+	}
+	if strings.TrimSpace(req.Source) == "" {
+		writeError(w, http.StatusBadRequest, "source is required")
+		return
+	}
+
+	result, err := pipeline.Run(r.Context(), h.pipeline, h.brainDir, req.Content, req.Source, req.DryRun)
+	if err != nil {
+		h.logger.Error("ingest failed", "source", req.Source, "err", err)
+		writeError(w, http.StatusInternalServerError, "ingest error")
+		return
+	}
+
+	pages := result.Pages
+	if pages == nil {
+		pages = []string{}
+	}
+	warnings := result.Warnings
+	if warnings == nil {
+		warnings = []string{}
+	}
+	writeJSON(w, ingestResponse{Pages: pages, Warnings: warnings})
+}
+
+// supportedExtensions lists file extensions that IngestPath will process.
+var supportedExtensions = map[string]bool{
+	".md":  true,
+	".txt": true,
+	".pdf": true,
+}
+
+// IngestPath handles POST /ingest-path — ingest a file or directory.
+func (h *Handler) IngestPath(w http.ResponseWriter, r *http.Request) {
+	var req ingestPathRequest
+	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
+		writeError(w, http.StatusBadRequest, "invalid JSON")
+		return
+	}
+	if strings.TrimSpace(req.Path) == "" {
+		writeError(w, http.StatusBadRequest, "path is required")
+		return
+	}
+
+	info, err := os.Stat(req.Path)
+	if err != nil {
+		writeError(w, http.StatusBadRequest, fmt.Sprintf("path not accessible: %v", err))
+		return
+	}
+
+	var allPages []string
+	var allWarnings []string
+
+	if info.IsDir() {
+		err = filepath.WalkDir(req.Path, func(path string, d os.DirEntry, walkErr error) error {
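+			if walkErr != nil {
+				return walkErr
+			}
+			// Stop walking once the client has gone away instead of running
+			// the pipeline (and collecting a warning) for every remaining file.
+			if ctxErr := r.Context().Err(); ctxErr != nil {
+				return ctxErr
+			}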
+			if d.IsDir() {
+				return nil
+			}
+			ext := strings.ToLower(filepath.Ext(path))
+			if !supportedExtensions[ext] {
+				return nil
+			}
+			content, readErr := extract.Text(path)
+			if readErr != nil {
+				allWarnings = append(allWarnings, fmt.Sprintf("extract %s: %v", path, readErr))
+				return nil
+			}
+			source := req.Source
+			if source == "" {
+				source = filepath.Base(path)
+			}
+			result, runErr := pipeline.Run(r.Context(), h.pipeline, h.brainDir, content, source, req.DryRun)
+			if runErr != nil {
+				allWarnings = append(allWarnings, fmt.Sprintf("ingest %s: %v", path, runErr))
+				return nil
+			}
+			allPages = append(allPages, result.Pages...)
+			allWarnings = append(allWarnings, result.Warnings...)
+			return nil
+		})
+		if err != nil {
+			h.logger.Error("walk dir failed", "path", req.Path, "err", err)
+			writeError(w, http.StatusInternalServerError, fmt.Sprintf("walk error: %v", err))
+			return
+		}
+	} else {
+		ext := strings.ToLower(filepath.Ext(req.Path))
+		if !supportedExtensions[ext] {
+			writeError(w, http.StatusBadRequest, fmt.Sprintf("unsupported file extension: %s", ext))
+			return
+		}
+		content, readErr := extract.Text(req.Path)
+		if readErr != nil {
+			writeError(w, http.StatusInternalServerError, fmt.Sprintf("extract text: %v", readErr))
+			return
+		}
+		source := req.Source
+		if source == "" {
+			source = filepath.Base(req.Path)
+		}
+		result, runErr := pipeline.Run(r.Context(), h.pipeline, h.brainDir, content, source, req.DryRun)
+		if runErr != nil {
+			h.logger.Error("ingest-path failed", "path", req.Path, "err", runErr)
+			writeError(w, http.StatusInternalServerError, "ingest error")
+			return
+		}
+		allPages = result.Pages
+		allWarnings = result.Warnings
+	}
+
+	if allPages == nil {
+		allPages = []string{}
+	}
+	if allWarnings == nil {
+		allWarnings = []string{}
+	}
+	writeJSON(w, ingestResponse{Pages: allPages, Warnings: allWarnings})
+}
+
+type ingestRawRequest struct {
+	Source string             `json:"source"`
+	Pages  []pipeline.RawPage `json:"pages"`
+	DryRun bool               `json:"dry_run"`
+}
+
+// IngestRaw handles POST /ingest-raw — run the pipeline on pre-parsed RawPages,
+// skipping the LLM extraction step. Use when the caller has already produced
+// structured page data (e.g. from a more capable model or manual curation).
+func (h *Handler) IngestRaw(w http.ResponseWriter, r *http.Request) {
+	var req ingestRawRequest
+	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
+		writeError(w, http.StatusBadRequest, "invalid JSON")
+		return
+	}
+	if strings.TrimSpace(req.Source) == "" {
+		writeError(w, http.StatusBadRequest, "source is required")
+		return
+	}
+	if len(req.Pages) == 0 {
+		writeError(w, http.StatusBadRequest, "pages is required and must be non-empty")
+		return
+	}
+
+	result, err := pipeline.RunRaw(h.brainDir, req.Source, req.Pages, req.DryRun)
+	if err != nil {
+		h.logger.Error("ingest-raw failed", "source", req.Source, "err", err)
+		writeError(w, http.StatusInternalServerError, "ingest error")
+		return
+	}
+
+	pages := result.Pages
+	if pages == nil {
+		pages = []string{}
+	}
+	warnings := result.Warnings
+	if warnings == nil {
+		warnings = []string{}
+	}
+	writeJSON(w, ingestResponse{Pages: pages, Warnings: warnings})
+}
+
+// BackfillRefs handles POST /backfill-refs — injects source back-references
+// into every concept and entity page linked from an existing wiki/sources/ page.
+func (h *Handler) BackfillRefs(w http.ResponseWriter, r *http.Request) {
+	n, err := pipeline.BackfillRefs(r.Context(), h.brainDir)
+	if err != nil {
+		h.logger.Error("backfill-refs failed", "err", err)
+		writeError(w, http.StatusInternalServerError, "backfill error")
+		return
+	}
+	writeJSON(w, map[string]int{"updated": n})
+}
+
 func writeJSON(w http.ResponseWriter, v any) {
 	w.Header().Set("Content-Type", "application/json")
 	json.NewEncoder(w).Encode(v) //nolint:errcheck
 }
+
+func writeError(w http.ResponseWriter, code int, msg string) {
+	w.Header().Set("Content-Type", "application/json")
+	w.WriteHeader(code)
+	json.NewEncoder(w).Encode(map[string]string{"error": msg}) //nolint:errcheck
+}
--- a/ingestion/internal/api/handler_test.go
+++ b/ingestion/internal/api/handler_test.go
@@ -3,6 +3,7 @@ package api_test

 import (
 	"bytes"
+	"context"
 	"encoding/json"
 	"log/slog"
 	"net/http"
@@ -12,11 +13,26 @@ import (
 	"strings"
 	"testing"

-	"github.com/mathiasbq/hyperguild/ingestion/internal/api"
 	"github.com/stretchr/testify/assert"
 	"github.com/stretchr/testify/require"
+
+	"github.com/mathiasbq/hyperguild/ingestion/internal/api"
+	"github.com/mathiasbq/hyperguild/ingestion/internal/pipeline"
 )

+// stubComplete returns a fixed JSON array holding one RawPage, so tests never call a real LLM.
+func stubComplete(_ context.Context, _, _ string) (string, error) {
+	return `[{"title":"Test Source","type":"source","subtype":"article","content":"## Summary\n\nSome content here.\n"}]`, nil
+}
+
+func stubPipelineCfg() pipeline.Config {
+	return pipeline.Config{
+		Complete:  stubComplete,
+		ChunkSize: 0,
+		Schema:    "# Test Schema\nwiki/sources/, wiki/concepts/, wiki/entities/",
+	}
+}
+
 func setup(t *testing.T) (string, *api.Handler) {
 	t.Helper()
 	dir := t.TempDir()
@@ -27,9 +43,13 @@ func setup(t *testing.T) (string, *api.Handler) {
 		0o644,
 	))
 	logger := slog.New(slog.NewTextHandler(os.Stderr, nil))
-	return dir, api.NewHandler(dir, logger)
+	return dir, api.NewHandler(dir, logger, stubPipelineCfg())
 }

+// ---------------------------------------------------------------------------
+// Existing tests (Write / Query)
+// ---------------------------------------------------------------------------
+
 func TestQuery_ReturnsResults(t *testing.T) {
 	_, h := setup(t)
 	body, _ := json.Marshal(map[string]any{"query": "test driven", "limit": 5})
@@ -112,3 +132,201 @@ func TestWrite_GeneratesFilenameIfAbsent(t *testing.T) {
 	assert.Len(t, entries, 2)
 	assert.True(t, strings.HasSuffix(entries[1].Name(), ".md"))
 }
+
+// ---------------------------------------------------------------------------
+// POST /ingest
+// ---------------------------------------------------------------------------
+
+func TestIngest_Validation(t *testing.T) {
+	cases := []struct {
+		name string
+		body map[string]any
+	}{
+		{"missing content", map[string]any{"source": "test-source"}},
+		{"missing source", map[string]any{"content": "some content"}},
+		{"whitespace content", map[string]any{"content": "   ", "source": "test-source"}},
+		{"whitespace source", map[string]any{"content": "some content", "source": "  "}},
+	}
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			_, h := setup(t)
+			body, _ := json.Marshal(tc.body)
+			req := httptest.NewRequest(http.MethodPost, "/ingest", bytes.NewReader(body))
+			rec := httptest.NewRecorder()
+
+			h.Ingest(rec, req)
+
+			assert.Equal(t, http.StatusBadRequest, rec.Code)
+		})
+	}
+}
+
+func TestIngest_Success(t *testing.T) {
+	_, h := setup(t)
+	body, _ := json.Marshal(map[string]any{
+		"content": "some content about shape-up methodology",
+		"source":  "shape-up-book",
+		"dry_run": true,
+	})
+	req := httptest.NewRequest(http.MethodPost, "/ingest", bytes.NewReader(body))
+	rec := httptest.NewRecorder()
+
+	h.Ingest(rec, req)
+
+	require.Equal(t, http.StatusOK, rec.Code)
+	var resp map[string]any
+	require.NoError(t, json.Unmarshal(rec.Body.Bytes(), &resp))
+	pages, ok := resp["pages"]
+	require.True(t, ok, "response must have pages field")
+	pagesSlice, ok := pages.([]any)
+	require.True(t, ok, "pages must be an array")
+	assert.NotEmpty(t, pagesSlice)
+}
+
+// ---------------------------------------------------------------------------
+// POST /ingest-path
+// ---------------------------------------------------------------------------
+
+func TestIngestPath_MissingPath(t *testing.T) {
+	_, h := setup(t)
+	body, _ := json.Marshal(map[string]any{"source": "test-source"})
+	req := httptest.NewRequest(http.MethodPost, "/ingest-path", bytes.NewReader(body))
+	rec := httptest.NewRecorder()
+
+	h.IngestPath(rec, req)
+
+	assert.Equal(t, http.StatusBadRequest, rec.Code)
+}
+
+func TestIngestPath_File(t *testing.T) {
+	_, h := setup(t)
+
+	// Create a temp file with content
+	dir := t.TempDir()
+	f := filepath.Join(dir, "doc.md")
+	require.NoError(t, os.WriteFile(f, []byte("# Hello\nThis is markdown content."), 0o644))
+
+	body, _ := json.Marshal(map[string]any{
+		"path":    f,
+		"source":  "test-doc",
+		"dry_run": true,
+	})
+	req := httptest.NewRequest(http.MethodPost, "/ingest-path", bytes.NewReader(body))
+	rec := httptest.NewRecorder()
+
+	h.IngestPath(rec, req)
+
+	require.Equal(t, http.StatusOK, rec.Code)
+	var resp map[string]any
+	require.NoError(t, json.Unmarshal(rec.Body.Bytes(), &resp))
+	pages, ok := resp["pages"]
+	require.True(t, ok, "response must have pages field")
+	pagesSlice, ok := pages.([]any)
+	require.True(t, ok, "pages must be an array")
+	assert.NotEmpty(t, pagesSlice)
+}
+
+// ---------------------------------------------------------------------------
+// POST /ingest-raw
+// ---------------------------------------------------------------------------
+
+func TestIngestRaw_Validation(t *testing.T) {
+	cases := []struct {
+		name string
+		body map[string]any
+	}{
+		{"missing source", map[string]any{"pages": []any{map[string]any{"title": "X", "type": "concept", "content": "x"}}}},
+		{"missing pages", map[string]any{"source": "test-source"}},
+		{"empty pages", map[string]any{"source": "test-source", "pages": []any{}}},
+	}
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			_, h := setup(t)
+			body, _ := json.Marshal(tc.body)
+			req := httptest.NewRequest(http.MethodPost, "/ingest-raw", bytes.NewReader(body))
+			rec := httptest.NewRecorder()
+
+			h.IngestRaw(rec, req)
+
+			assert.Equal(t, http.StatusBadRequest, rec.Code)
+		})
+	}
+}
+
+func TestIngestRaw_Success(t *testing.T) {
+	dir, h := setup(t)
+	body, _ := json.Marshal(map[string]any{
+		"source": "test-article",
+		"pages": []any{
+			map[string]any{"title": "Test Article", "type": "source", "subtype": "article", "domain": "Testing", "content": "## Summary\n\nThis is a test article about [[Test Concept]].\n"},
+			map[string]any{"title": "Test Concept", "type": "concept", "domain": "Testing", "content": "A concept for testing.\n"},
+		},
+	})
+	req := httptest.NewRequest(http.MethodPost, "/ingest-raw", bytes.NewReader(body))
+	rec := httptest.NewRecorder()
+
+	h.IngestRaw(rec, req)
+
+	require.Equal(t, http.StatusOK, rec.Code)
+	var resp map[string]any
+	require.NoError(t, json.Unmarshal(rec.Body.Bytes(), &resp))
+	pages := resp["pages"].([]any)
+	assert.Len(t, pages, 2)
+
+	// Verify files were written
+	sourcePath := filepath.Join(dir, "wiki", "sources", "test-article.md")
+	assert.FileExists(t, sourcePath)
+	conceptPath := filepath.Join(dir, "wiki", "concepts", "test-concept.md")
+	assert.FileExists(t, conceptPath)
+}
+
+func TestIngestRaw_DryRun(t *testing.T) {
+	dir, h := setup(t)
+	body, _ := json.Marshal(map[string]any{
+		"source": "dry-run-test",
+		"pages": []any{
+			map[string]any{"title": "Dry Run Source", "type": "source", "subtype": "article", "content": "Content."},
+		},
+		"dry_run": true,
+	})
+	req := httptest.NewRequest(http.MethodPost, "/ingest-raw", bytes.NewReader(body))
+	rec := httptest.NewRecorder()
+
+	h.IngestRaw(rec, req)
+
+	require.Equal(t, http.StatusOK, rec.Code)
+	var resp map[string]any
+	require.NoError(t, json.Unmarshal(rec.Body.Bytes(), &resp))
+	pages := resp["pages"].([]any)
+	assert.NotEmpty(t, pages)
+
+	// Verify no files were written
+	sourcePath := filepath.Join(dir, "wiki", "sources", "dry-run-test.md")
+	assert.NoFileExists(t, sourcePath)
+}
+
+func TestIngestPath_Directory(t *testing.T) {
+	_, h := setup(t)
+
+	// Create a temp dir with one .md file
+	dir := t.TempDir()
+	require.NoError(t, os.WriteFile(filepath.Join(dir, "notes.md"), []byte("# Notes\nSome notes."), 0o644))
+
+	body, _ := json.Marshal(map[string]any{
+		"path":    dir,
+		"dry_run": true,
+	})
+	req := httptest.NewRequest(http.MethodPost, "/ingest-path", bytes.NewReader(body))
+	rec := httptest.NewRecorder()
+
+	h.IngestPath(rec, req)
+
+	require.Equal(t, http.StatusOK, rec.Code)
+	var resp map[string]any
+	require.NoError(t, json.Unmarshal(rec.Body.Bytes(), &resp))
+	pages, ok := resp["pages"]
+	require.True(t, ok, "response must have pages field")
+	pagesSlice, ok := pages.([]any)
+	require.True(t, ok, "pages must be an array")
+	assert.NotEmpty(t, pagesSlice)
+}
--- a/ingestion/internal/extract/extract.go
+++ b/ingestion/internal/extract/extract.go
@@ -0,0 +1,39 @@
+// ingestion/internal/extract/extract.go
+package extract
+
+import (
+	"fmt"
+	"os"
+	"strings"
+)
+
+// Text reads the file at path and returns its plain-text content.
+// Supported extensions: .md, .txt (passthrough), .pdf (via pdftotext).
+func Text(path string) (string, error) {
+	ext := strings.ToLower(fileExt(path))
+	switch ext {
+	case ".md", ".txt":
+		b, err := os.ReadFile(path)
+		if err != nil {
+			return "", fmt.Errorf("read %s: %w", path, err)
+		}
+		return string(b), nil
+	case ".pdf":
+		return extractPDF(path)
+	default:
+		return "", fmt.Errorf("unsupported file extension: %s", ext)
+	}
+}
+
+// fileExt returns the file extension including the dot, with case preserved
+// (Text lowercases it). Unlike filepath.Ext, it also stops at a backslash.
+func fileExt(path string) string {
+	for i := len(path) - 1; i >= 0; i-- {
+		if path[i] == '.' {
+			return path[i:]
+		}
+		if path[i] == '/' || path[i] == '\\' {
+			break
+		}
+	}
+	return ""
+}
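`fileExt` behaves like `filepath.Ext` except that it also stops at a backslash. A Windows-style separator therefore ends the search even on Linux, where `filepath.Ext` only treats `/` as a separator. The sketch copies `fileExt` verbatim and compares the two on a few made-up paths; the `filepath.Ext` column reflects Linux behaviour.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// fileExt is copied from extract.go: it returns the extension including the
// dot, stopping at either '/' or '\\'. It does not change case.
func fileExt(path string) string {
	for i := len(path) - 1; i >= 0; i-- {
		if path[i] == '.' {
			return path[i:]
		}
		if path[i] == '/' || path[i] == '\\' {
			break
		}
	}
	return ""
}

func main() {
	for _, p := range []string{
		"notes/Report.PDF", // both: ".PDF" (case preserved)
		"archive.tar.gz",   // both: ".gz"
		"dir.v2/README",    // both: ""
		`dir.v2\README`,    // fileExt: ""; filepath.Ext on Linux: ".v2\README"
	} {
		fmt.Printf("%-18q fileExt=%q filepath.Ext=%q\n", p, fileExt(p), filepath.Ext(p))
	}
}
```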
--- a/ingestion/internal/extract/extract_test.go
+++ b/ingestion/internal/extract/extract_test.go
@@ -0,0 +1,62 @@
+// ingestion/internal/extract/extract_test.go
+package extract
+
+import (
+	"os"
+	"os/exec"
+	"path/filepath"
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func TestText_Markdown(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "note.md")
+	require.NoError(t, os.WriteFile(path, []byte("# Hello\n\nWorld."), 0o644))
+
+	got, err := Text(path)
+	require.NoError(t, err)
+	assert.Equal(t, "# Hello\n\nWorld.", got)
+}
+
+func TestText_Txt(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "note.txt")
+	require.NoError(t, os.WriteFile(path, []byte("plain text"), 0o644))
+
+	got, err := Text(path)
+	require.NoError(t, err)
+	assert.Equal(t, "plain text", got)
+}
+
+func TestText_UnsupportedExtension(t *testing.T) {
+	dir := t.TempDir()
+	path := filepath.Join(dir, "data.csv")
+	require.NoError(t, os.WriteFile(path, []byte("a,b,c"), 0o644))
+
+	_, err := Text(path)
+	assert.ErrorContains(t, err, "unsupported")
+}
+
+func TestText_PDF(t *testing.T) {
+	if _, err := exec.LookPath("pdftotext"); err != nil {
+		t.Skip("pdftotext not available")
+	}
+	dir := t.TempDir()
+	pdfPath := filepath.Join(dir, "test.pdf")
+
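+	// Minimal hand-written PDF containing the text "Hello PDF". Its xref offsets
+	// and stream /Length are not byte-exact; pdftotext tolerates the mismatch.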
+	minimalPDF := "%PDF-1.4\n1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj\n" +
+		"2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj\n" +
+		"3 0 obj<</Type/Page/MediaBox[0 0 612 792]/Parent 2 0 R/Contents 4 0 R/Resources<</Font<</F1<</Type/Font/Subtype/Type1/BaseFont/Helvetica>>>>>>>>endobj\n" +
+		"4 0 obj<</Length 44>>\nstream\nBT /F1 12 Tf 100 700 Td (Hello PDF) Tj ET\nendstream\nendobj\n" +
+		"xref\n0 5\n0000000000 65535 f\n0000000009 00000 n\n0000000058 00000 n\n0000000115 00000 n\n0000000310 00000 n\n" +
+		"trailer<</Size 5/Root 1 0 R>>\nstartxref\n406\n%%EOF\n"
+	require.NoError(t, os.WriteFile(pdfPath, []byte(minimalPDF), 0o644))
+
+	got, err := Text(pdfPath)
+	require.NoError(t, err)
+	assert.Contains(t, got, "Hello PDF")
+}
--- a/ingestion/internal/extract/pdf.go
+++ b/ingestion/internal/extract/pdf.go
@@ -0,0 +1,28 @@
+// ingestion/internal/extract/pdf.go
+package extract
+
+import (
+	"bytes"
+	"fmt"
+	"os/exec"
+	"strings"
+)
+
+// extractPDF runs pdftotext on path and returns the extracted text.
+// pdftotext must be installed (package: poppler-utils on Alpine/Debian, poppler on Homebrew).
+func extractPDF(path string) (string, error) {
+	cmd := exec.Command("pdftotext", "-q", path, "-")
+	var stdout, stderr bytes.Buffer
+	cmd.Stdout = &stdout
+	cmd.Stderr = &stderr
+
+	if err := cmd.Run(); err != nil {
+		errMsg := strings.TrimSpace(stderr.String())
+		if errMsg == "" {
+			errMsg = err.Error()
+		}
+		return "", fmt.Errorf("pdftotext: %s", errMsg)
+	}
+
+	return strings.TrimSpace(stdout.String()), nil
+}
--- a/ingestion/internal/llm/client.go
+++ b/ingestion/internal/llm/client.go
@@ -0,0 +1,119 @@
+package llm
+
+import (
+	"bytes"
+	"context"
+	"encoding/json"
+	"fmt"
+	"io"
+	"net/http"
+	"strconv"
+	"strings"
+	"time"
+)
+
+// Client calls an OpenAI-compatible chat completions endpoint.
+type Client struct {
+	baseURL    string
+	apiKey     string
+	model      string
+	httpClient *http.Client
+}
+
+// New constructs a Client.
+func New(baseURL, apiKey, model string, timeout time.Duration) *Client {
+	return &Client{
+		baseURL:    strings.TrimRight(baseURL, "/"),
+		apiKey:     apiKey,
+		model:      model,
+		httpClient: &http.Client{Timeout: timeout},
+	}
+}
+
+type chatRequest struct {
+	Model       string    `json:"model"`
+	Messages    []message `json:"messages"`
+	Temperature float64   `json:"temperature"`
+}
+
+type message struct {
+	Role    string `json:"role"`
+	Content string `json:"content"`
+}
+
+type chatResponse struct {
+	Choices []struct {
+		Message message `json:"message"`
+	} `json:"choices"`
+}
+
+// Complete sends a system + user message and returns the assistant's reply.
+// Retries once on HTTP 429, waiting for the Retry-After delay (delta-seconds
+// form only) or 5s when the header is absent or not an integer.
+func (c *Client) Complete(ctx context.Context, system, user string) (string, error) {
+	body := chatRequest{
+		Model: c.model,
+		Messages: []message{
+			{Role: "system", Content: system},
+			{Role: "user", Content: user},
+		},
+		Temperature: 0.2,
+	}
+	b, err := json.Marshal(body)
+	if err != nil {
+		return "", fmt.Errorf("marshal request: %w", err)
+	}
+
+	do := func() (*http.Response, error) {
+		req, err := http.NewRequestWithContext(ctx, http.MethodPost, c.baseURL+"/chat/completions", bytes.NewReader(b))
+		if err != nil {
+			return nil, fmt.Errorf("build request: %w", err)
+		}
+		req.Header.Set("Content-Type", "application/json")
+		if c.apiKey != "" {
+			req.Header.Set("Authorization", "Bearer "+c.apiKey)
+		}
+		return c.httpClient.Do(req)
+	}
+
+	resp, err := do()
+	if err != nil {
+		return "", fmt.Errorf("call LLM: %w", err)
+	}
+
+	if resp.StatusCode == http.StatusTooManyRequests {
+		_ = resp.Body.Close()
+		wait := 5 * time.Second
+		if ra := resp.Header.Get("Retry-After"); ra != "" {
+			if secs, err := strconv.Atoi(ra); err == nil {
+				wait = time.Duration(secs) * time.Second
+			}
+		}
+		select {
+		case <-ctx.Done():
+			return "", ctx.Err()
+		case <-time.After(wait):
+		}
+		resp, err = do()
+		if err != nil {
+			return "", fmt.Errorf("retry LLM call: %w", err)
+		}
+	}
+	defer resp.Body.Close() //nolint:errcheck
+
+	out, err := io.ReadAll(resp.Body)
+	if err != nil {
+		return "", fmt.Errorf("read response: %w", err)
+	}
+	if resp.StatusCode != http.StatusOK {
+		return "", fmt.Errorf("LLM returned %d: %s", resp.StatusCode, out)
+	}
+
+	var cr chatResponse
+	if err := json.Unmarshal(out, &cr); err != nil {
+		return "", fmt.Errorf("parse response: %w", err)
+	}
+	if len(cr.Choices) == 0 {
+		return "", fmt.Errorf("LLM returned no choices")
+	}
+	return cr.Choices[0].Message.Content, nil
+}
--- a/ingestion/internal/llm/client_test.go
+++ b/ingestion/internal/llm/client_test.go
@@ -0,0 +1,86 @@
+package llm
+
+import (
+	"context"
+	"encoding/json"
+	"net/http"
+	"net/http/httptest"
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func mockServer(t *testing.T, response string) *httptest.Server {
+	t.Helper()
+	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		assert.Equal(t, "/chat/completions", r.URL.Path)
+		assert.Equal(t, "application/json", r.Header.Get("Content-Type"))
+		w.Header().Set("Content-Type", "application/json")
+		_ = json.NewEncoder(w).Encode(map[string]any{
+			"choices": []map[string]any{
+				{"message": map[string]any{"role": "assistant", "content": response}},
+			},
+		})
+	}))
+}
+
+func TestClient_Complete(t *testing.T) {
+	srv := mockServer(t, "hello world")
+	defer srv.Close()
+
+	c := New(srv.URL, "", "test-model", 10*time.Second)
+	got, err := c.Complete(context.Background(), "you are helpful", "say hello")
+	require.NoError(t, err)
+	assert.Equal(t, "hello world", got)
+}
+
+func TestClient_ReturnsErrorOnNon200(t *testing.T) {
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		http.Error(w, "overloaded", http.StatusServiceUnavailable)
+	}))
+	defer srv.Close()
+
+	c := New(srv.URL, "", "test-model", 10*time.Second)
+	_, err := c.Complete(context.Background(), "sys", "user")
+	assert.Error(t, err)
+}
+
+func TestClient_SendsAuthHeader(t *testing.T) {
+	var gotAuth string
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		gotAuth = r.Header.Get("Authorization")
+		_ = json.NewEncoder(w).Encode(map[string]any{
+			"choices": []map[string]any{{"message": map[string]any{"content": "ok"}}},
+		})
+	}))
+	defer srv.Close()
+
+	c := New(srv.URL, "my-key", "test-model", 10*time.Second)
+	_, err := c.Complete(context.Background(), "sys", "user")
+	require.NoError(t, err)
+	assert.Equal(t, "Bearer my-key", gotAuth)
+}
+
+func TestClient_Retries429(t *testing.T) {
+	calls := 0
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		calls++
+		if calls == 1 {
+			w.Header().Set("Retry-After", "0")
+			w.WriteHeader(http.StatusTooManyRequests)
+			return
+		}
+		_ = json.NewEncoder(w).Encode(map[string]any{
+			"choices": []map[string]any{{"message": map[string]any{"content": "retried"}}},
+		})
+	}))
+	defer srv.Close()
+
+	c := New(srv.URL, "", "test-model", 10*time.Second)
+	got, err := c.Complete(context.Background(), "sys", "user")
+	require.NoError(t, err)
+	assert.Equal(t, "retried", got)
+	assert.Equal(t, 2, calls)
+}
--- a/ingestion/internal/pipeline/backfill.go
+++ b/ingestion/internal/pipeline/backfill.go
@@ -0,0 +1,91 @@
+// ingestion/internal/pipeline/backfill.go
+package pipeline
+
+import (
+	"context"
+	"fmt"
+	"os"
+	"path/filepath"
+	"strings"
+
+	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
+)
+
+// BackfillRefs walks wiki/sources/ and injects source back-references into every
+// concept and entity page that each source links to.
+// Changes for all sources are accumulated in memory before writing, so multiple
+// sources referencing the same concept are merged in one pass.
+// Deduplication is handled by wiki.Merge — running this multiple times is safe.
+// Returns the number of concept/entity pages written.
+func BackfillRefs(ctx context.Context, brainDir string) (int, error) {
+	inventory, err := wiki.LoadInventory(brainDir)
+	if err != nil {
+		return 0, fmt.Errorf("load inventory: %w", err)
+	}
+
+	sourcesDir := filepath.Join(brainDir, "wiki", "sources")
+	entries, err := os.ReadDir(sourcesDir)
+	if err != nil {
+		if os.IsNotExist(err) {
+			return 0, nil
+		}
+		return 0, fmt.Errorf("read sources dir: %w", err)
+	}
+
+	// Accumulate all changes before writing: relPath → updated Page.
+	// Collecting first means two sources that both link the same concept
+	// get both refs merged before a single write.
+	pending := make(map[string]wiki.Page)
+
+	for _, e := range entries {
+		if ctx.Err() != nil {
+			return 0, ctx.Err()
+		}
+		if e.IsDir() || !strings.HasSuffix(e.Name(), ".md") {
+			continue
+		}
+
+		b, err := os.ReadFile(filepath.Join(sourcesDir, e.Name()))
+		if err != nil {
+			continue
+		}
+		sourceContent := string(b)
+		sourceSlug := strings.TrimSuffix(e.Name(), ".md")
+		sourceTitle := extractTitle(sourceContent)
+		if sourceTitle == "" {
+			sourceTitle = sourceSlug
+		}
+		sourceRef := "- [[" + sourceSlug + "|" + sourceTitle + "]]"
+
+		for slug := range extractWikilinks(sourceContent) {
+			if slug == sourceSlug {
+				continue
+			}
+			pt, ok := findInInventory(slug, inventory)
+			if !ok {
+				continue
+			}
+			relPath := "wiki/" + string(pt) + "/" + slug + ".md"
+
+			// Start from already-accumulated version if we've seen this page.
+			page, seen := pending[relPath]
+			if !seen {
+				raw, err := os.ReadFile(filepath.Join(brainDir, filepath.FromSlash(relPath)))
+				if err != nil {
+					continue
+				}
+				page = wiki.Page{Path: relPath, Content: string(raw)}
+			}
+			pending[relPath] = addSourceRef(page, sourceRef)
+		}
+	}
+
+	for relPath, page := range pending {
+		dest := filepath.Join(brainDir, filepath.FromSlash(relPath))
+		if err := os.WriteFile(dest, []byte(page.Content), 0o644); err != nil {
+			return 0, fmt.Errorf("write %s: %w", relPath, err)
+		}
+	}
+
+	return len(pending), nil
+}
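BackfillRefs collects every edit in the `pending` map before touching disk, so a concept linked from several sources is rewritten once, with all of its refs. Below is a minimal in-memory sketch of that accumulate-then-write pattern. It is not the package's code. `appendRef` is a hypothetical stand-in for `addSourceRef`, which this diff does not show, and it assumes `## Sources` is the last section of a page. `backfill` and its map inputs are also illustrative.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// appendRef is a hypothetical stand-in for addSourceRef. It assumes "## Sources"
// is the final section, creates it if missing, and skips refs already present.
func appendRef(content, ref string) string {
	for _, line := range strings.Split(content, "\n") {
		if line == ref {
			return content // already referenced: re-running is a no-op
		}
	}
	if !strings.Contains(content, "\n## Sources\n") {
		content = strings.TrimRight(content, "\n") + "\n\n## Sources\n"
	}
	return content + ref + "\n"
}

// backfill accumulates refs per target page before "writing" (returning) them.
// pages maps slug -> current content; links maps source slug -> linked slugs.
func backfill(pages map[string]string, links map[string][]string) map[string]string {
	sources := make([]string, 0, len(links))
	for s := range links {
		sources = append(sources, s)
	}
	sort.Strings(sources) // deterministic ref order, since map iteration is random

	pending := make(map[string]string)
	for _, src := range sources {
		ref := "- [[" + src + "]]"
		for _, target := range links[src] {
			content, seen := pending[target]
			if !seen {
				orig, ok := pages[target]
				if !ok {
					continue // unknown slug: skipped, like a failed inventory lookup
				}
				content = orig
			}
			pending[target] = appendRef(content, ref)
		}
	}
	return pending
}

func main() {
	pages := map[string]string{"shaping": "## Definition\n\nA design activity.\n"}
	links := map[string][]string{"book-a": {"shaping"}, "book-b": {"shaping", "ghost"}}
	out := backfill(pages, links)
	fmt.Println(len(out), "page(s) to write")
	fmt.Print(out["shaping"])
}
```

Running `appendRef` again with a ref that is already present returns the page unchanged. That is the same property `TestBackfillRefs_Deduplication` checks for the real function.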
--- a/ingestion/internal/pipeline/backfill_test.go
+++ b/ingestion/internal/pipeline/backfill_test.go
@@ -0,0 +1,107 @@
+// ingestion/internal/pipeline/backfill_test.go
+package pipeline
+
+import (
+	"context"
+	"os"
+	"path/filepath"
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func setupBrainDir(t *testing.T) string {
+	t.Helper()
+	dir := t.TempDir()
+	for _, sub := range []string{"wiki/sources", "wiki/concepts", "wiki/entities"} {
+		require.NoError(t, os.MkdirAll(filepath.Join(dir, sub), 0o755))
+	}
+	return dir
+}
+
+func writeFile(t *testing.T, path, content string) {
+	t.Helper()
+	require.NoError(t, os.MkdirAll(filepath.Dir(path), 0o755))
+	require.NoError(t, os.WriteFile(path, []byte(content), 0o644))
+}
+
+func TestBackfillRefs_UpdatesConcept(t *testing.T) {
+	dir := setupBrainDir(t)
+	writeFile(t, filepath.Join(dir, "wiki/sources/shape-up.md"),
+		"---\ntitle: Shape Up\n---\n\n## Summary\n\nSee [[betting|Betting]].\n")
+	writeFile(t, filepath.Join(dir, "wiki/concepts/betting.md"),
+		"---\ntitle: Betting\n---\n\n## Definition\n\nA resource allocation technique.\n")
+
+	n, err := BackfillRefs(context.Background(), dir)
+	require.NoError(t, err)
+	assert.Equal(t, 1, n)
+
+	got, err := os.ReadFile(filepath.Join(dir, "wiki/concepts/betting.md"))
+	require.NoError(t, err)
+	assert.Contains(t, string(got), "## Sources")
+	assert.Contains(t, string(got), "[[shape-up|Shape Up]]")
+	assert.Contains(t, string(got), "## Definition") // original content preserved
+}
+
+func TestBackfillRefs_Deduplication(t *testing.T) {
+	dir := setupBrainDir(t)
+	writeFile(t, filepath.Join(dir, "wiki/sources/shape-up.md"),
+		"---\ntitle: Shape Up\n---\n\n## Summary\n\nSee [[betting|Betting]].\n")
+	writeFile(t, filepath.Join(dir, "wiki/concepts/betting.md"),
+		"---\ntitle: Betting\n---\n\n## Definition\n\nA technique.\n")
+
+	// Run twice — should not duplicate the ref.
+	_, err := BackfillRefs(context.Background(), dir)
+	require.NoError(t, err)
+	_, err = BackfillRefs(context.Background(), dir)
+	require.NoError(t, err)
+
+	got, err := os.ReadFile(filepath.Join(dir, "wiki/concepts/betting.md"))
+	require.NoError(t, err)
+
+	count := 0
+	for _, line := range splitLines(string(got)) {
+		if line == "- [[shape-up|Shape Up]]" {
+			count++
+		}
+	}
+	assert.Equal(t, 1, count, "ref should appear exactly once after two runs")
+}
+
+func TestBackfillRefs_MultipleSources(t *testing.T) {
+	dir := setupBrainDir(t)
+	writeFile(t, filepath.Join(dir, "wiki/sources/book-a.md"),
+		"---\ntitle: Book A\n---\n\n## Summary\n\nSee [[shaping|Shaping]].\n")
+	writeFile(t, filepath.Join(dir, "wiki/sources/book-b.md"),
+		"---\ntitle: Book B\n---\n\n## Summary\n\nAlso [[shaping|Shaping]].\n")
+	writeFile(t, filepath.Join(dir, "wiki/concepts/shaping.md"),
+		"---\ntitle: Shaping\n---\n\n## Definition\n\nA design activity.\n")
+
+	n, err := BackfillRefs(context.Background(), dir)
+	require.NoError(t, err)
+	assert.Equal(t, 1, n) // one concept page written
+
+	got, err := os.ReadFile(filepath.Join(dir, "wiki/concepts/shaping.md"))
+	require.NoError(t, err)
+	assert.Contains(t, string(got), "[[book-a|Book A]]")
+	assert.Contains(t, string(got), "[[book-b|Book B]]")
+}
+
+func TestBackfillRefs_NoSourcesDir(t *testing.T) {
+	dir := t.TempDir() // no wiki/sources subdir
+	n, err := BackfillRefs(context.Background(), dir)
+	require.NoError(t, err)
+	assert.Equal(t, 0, n)
+}
+
+func TestBackfillRefs_SkipsUnknownSlugs(t *testing.T) {
+	dir := setupBrainDir(t)
+	// Source links to a slug not in inventory and not on disk.
+	writeFile(t, filepath.Join(dir, "wiki/sources/article.md"),
+		"---\ntitle: Article\n---\n\n## Summary\n\nSee [[ghost-slug|Ghost]].\n")
+
+	n, err := BackfillRefs(context.Background(), dir)
+	require.NoError(t, err)
+	assert.Equal(t, 0, n)
+}
--- a/ingestion/internal/pipeline/build.go
+++ b/ingestion/internal/pipeline/build.go
@@ -0,0 +1,106 @@
+// ingestion/internal/pipeline/build.go
+package pipeline
+
+import (
+	"fmt"
+	"strings"
+
+	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
+)
+
+// BuildPages converts RawPages (from the LLM, or supplied directly via RunRaw) into
+// wiki.Pages with computed slugs, paths, and YAML frontmatter. sourceSlug is the slug
+// of the source being ingested (derived from the filename, not the LLM title). Pages
+// whose title resolves to an empty slug are skipped and returned as warnings instead.
+func BuildPages(rawPages []RawPage, sourceSlug, date string) ([]wiki.Page, []string) {
+	out := make([]wiki.Page, 0, len(rawPages))
+	var warnings []string
+	for _, rp := range rawPages {
+		slug := computeSlug(rp, sourceSlug)
+		if slug == "" {
+			warnings = append(warnings, fmt.Sprintf("skipped page with empty title (type: %s)", rp.Type))
+			continue
+		}
+		out = append(out, buildPage(rp, sourceSlug, date))
+	}
+	return out, warnings
+}
+
+func computeSlug(rp RawPage, sourceSlug string) string {
+	if rp.Type == "source" {
+		return sourceSlug
+	}
+	return wiki.Slug(rp.Title)
+}
+
+func buildPage(rp RawPage, sourceSlug, date string) wiki.Page {
+	var slug, dir string
+	switch rp.Type {
+	case "source":
+		slug = sourceSlug
+		dir = "wiki/sources"
+	case "concept":
+		slug = wiki.Slug(rp.Title)
+		dir = "wiki/concepts"
+	case "entity":
+		slug = wiki.Slug(rp.Title)
+		dir = "wiki/entities"
+	default:
+		slug = wiki.Slug(rp.Title)
+		dir = "wiki/" + wiki.Slug(rp.Type) // Type may be caller-supplied (RunRaw); slug it before using it in a path
+	}
+
+	path := dir + "/" + slug + ".md"
+	fm := buildFrontmatter(rp, date)
+
+	return wiki.Page{
+		Path:    path,
+		Content: fm + "\n" + rp.Content,
+	}
+}
+
+func buildFrontmatter(rp RawPage, date string) string {
+	var sb strings.Builder
+	sb.WriteString("---\n")
+	fmt.Fprintf(&sb, "title: %s\n", yamlScalar(rp.Title))
+
+	switch rp.Type {
+	case "source":
+		subtype := rp.Subtype
+		if subtype == "" {
+			subtype = "article"
+		}
+		fmt.Fprintf(&sb, "type: %s\n", yamlScalar(subtype))
+		if rp.Domain != "" {
+			fmt.Fprintf(&sb, "domain: %s\n", yamlScalar(rp.Domain))
+		}
+		fmt.Fprintf(&sb, "date_ingested: %s\n", date)
+		fmt.Fprintf(&sb, "last_updated: %s\n", date)
+	case "concept":
+		if rp.Domain != "" {
+			fmt.Fprintf(&sb, "domain: %s\n", yamlScalar(rp.Domain))
+		}
+		fmt.Fprintf(&sb, "last_updated: %s\n", date)
+	case "entity":
+		if rp.Subtype != "" {
+			fmt.Fprintf(&sb, "type: %s\n", yamlScalar(rp.Subtype))
+		}
+		if rp.Domain != "" {
+			fmt.Fprintf(&sb, "domain: %s\n", yamlScalar(rp.Domain))
+		}
+		fmt.Fprintf(&sb, "last_updated: %s\n", date)
+	default:
+		if rp.Domain != "" {
+			fmt.Fprintf(&sb, "domain: %s\n", yamlScalar(rp.Domain))
+		}
+		fmt.Fprintf(&sb, "last_updated: %s\n", date)
+	}
+
+	fmt.Fprintf(&sb, "aliases:\n  - %s\n", yamlScalar(rp.Title))
+	sb.WriteString("---\n")
+	return sb.String()
+}
+
+func yamlScalar(s string) string {
+	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
+}
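`yamlScalar` always emits a YAML single-quoted scalar. In that style, for printable single-line text, the only escape is doubling `'`, so titles containing `:`, `#`, or a leading `-` cannot alter the frontmatter's structure. The sketch below repeats `yamlScalar` and pairs it with `unquoteSingle`, a hypothetical inverse that handles single-line values only, to show the round trip. A real YAML parser also folds line breaks inside single-quoted scalars.

```go
package main

import (
	"fmt"
	"strings"
)

// yamlScalar mirrors the helper in build.go: wrap in single quotes and
// double any embedded single quote.
func yamlScalar(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// unquoteSingle is a hypothetical inverse for single-line values only;
// it does not implement YAML line folding.
func unquoteSingle(q string) (string, bool) {
	if len(q) < 2 || q[0] != '\'' || q[len(q)-1] != '\'' {
		return "", false
	}
	return strings.ReplaceAll(q[1:len(q)-1], "''", "'"), true
}

func main() {
	for _, title := range []string{"Shape Up: The Basecamp Method", "Don't Shoot the Dog", "# not a comment"} {
		q := yamlScalar(title)
		back, ok := unquoteSingle(q)
		fmt.Printf("title: %s (round-trips: %v)\n", q, ok && back == title)
	}
}
```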
--- a/ingestion/internal/pipeline/build_test.go
+++ b/ingestion/internal/pipeline/build_test.go
@@ -0,0 +1,167 @@
+// ingestion/internal/pipeline/build_test.go
+package pipeline
+
+import (
+	"strings"
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func TestBuildPages_SourcePage(t *testing.T) {
+	raw := []RawPage{
+		{
+			Title:   "Shape Up",
+			Type:    "source",
+			Subtype: "book",
+			Domain:  "product-strategy",
+			Content: "## Summary\n\nA book about shaping product work.\n",
+		},
+	}
+	pages, warnings := BuildPages(raw, "shape-up", "2026-04-23")
+	require.Len(t, pages, 1)
+	assert.Empty(t, warnings)
+
+	p := pages[0]
+	assert.Equal(t, "wiki/sources/shape-up.md", p.Path)
+	assert.Contains(t, p.Content, "title: 'Shape Up'")
+	assert.Contains(t, p.Content, "type: 'book'")
+	assert.Contains(t, p.Content, "domain: 'product-strategy'")
+	assert.Contains(t, p.Content, "date_ingested: 2026-04-23")
+	assert.Contains(t, p.Content, "last_updated: 2026-04-23")
+	assert.Contains(t, p.Content, "aliases:\n  - 'Shape Up'")
+	assert.Contains(t, p.Content, "## Summary")
+	assert.True(t, strings.HasPrefix(p.Content, "---\n"), "content must start with frontmatter")
+}
+
+func TestBuildPages_ConceptPage(t *testing.T) {
+	raw := []RawPage{
+		{
+			Title:   "Betting",
+			Type:    "concept",
+			Domain:  "product-strategy",
+			Content: "## Definition\n\nA resource allocation technique.\n",
+		},
+	}
+	pages, warnings := BuildPages(raw, "shape-up", "2026-04-23")
+	require.Len(t, pages, 1)
+	assert.Empty(t, warnings)
+
+	p := pages[0]
+	assert.Equal(t, "wiki/concepts/betting.md", p.Path)
+	assert.Contains(t, p.Content, "title: 'Betting'")
+	assert.Contains(t, p.Content, "domain: 'product-strategy'")
+	assert.Contains(t, p.Content, "last_updated: 2026-04-23")
+	assert.Contains(t, p.Content, "aliases:\n  - 'Betting'")
+	assert.NotContains(t, p.Content, "date_ingested")
+	assert.Contains(t, p.Content, "## Definition")
+}
+
+func TestBuildPages_EntityPage(t *testing.T) {
+	raw := []RawPage{
+		{
+			Title:   "Ryan Singer",
+			Type:    "entity",
+			Subtype: "person",
+			Domain:  "product-strategy",
+			Content: "## Description\n\nA product designer.\n",
+		},
+	}
+	pages, warnings := BuildPages(raw, "shape-up", "2026-04-23")
+	require.Len(t, pages, 1)
+	assert.Empty(t, warnings)
+
+	p := pages[0]
+	assert.Equal(t, "wiki/entities/ryan-singer.md", p.Path)
+	assert.Contains(t, p.Content, "title: 'Ryan Singer'")
+	assert.Contains(t, p.Content, "type: 'person'")
+	assert.Contains(t, p.Content, "domain: 'product-strategy'")
+	assert.Contains(t, p.Content, "last_updated: 2026-04-23")
+	assert.Contains(t, p.Content, "aliases:\n  - 'Ryan Singer'")
+	assert.NotContains(t, p.Content, "date_ingested")
+}
+
+func TestBuildPages_SourceSlugUsedForSourcePage(t *testing.T) {
+	// LLM title differs from filename — pipeline uses sourceSlug for the source page path.
+	raw := []RawPage{
+		{Title: "FinBERT: A Pretrained Model", Type: "source", Subtype: "article", Content: "## Summary\n\nA model.\n"},
+	}
+	pages, _ := BuildPages(raw, "finbert-huggingface", "2026-04-23")
+	require.Len(t, pages, 1)
+	assert.Equal(t, "wiki/sources/finbert-huggingface.md", pages[0].Path)
+}
+
+func TestBuildPages_ConceptSlugDerivedFromTitle(t *testing.T) {
+	raw := []RawPage{
+		{Title: "Domain-Driven Design", Type: "concept", Content: "## Definition\n\nFoo.\n"},
+	}
+	pages, _ := BuildPages(raw, "some-source", "2026-04-23")
+	require.Len(t, pages, 1)
+	assert.Equal(t, "wiki/concepts/domain-driven-design.md", pages[0].Path)
+}
+
+func TestBuildPages_SourceDefaultSubtype(t *testing.T) {
+	// If subtype is omitted for a source, default to "article"
+	raw := []RawPage{
+		{Title: "Some Post", Type: "source", Content: "## Summary\n\nA post.\n"},
+	}
+	pages, _ := BuildPages(raw, "some-post", "2026-04-23")
+	require.Len(t, pages, 1)
+	assert.Contains(t, pages[0].Content, "type: 'article'")
+}
+
+func TestBuildPages_OmitsDomainWhenEmpty(t *testing.T) {
+	raw := []RawPage{
+		{Title: "Betting", Type: "concept", Content: "## Definition\n\nFoo.\n"},
+	}
+	pages, _ := BuildPages(raw, "src", "2026-04-23")
+	require.Len(t, pages, 1)
+	assert.NotContains(t, pages[0].Content, "domain:")
+}
+
+func TestBuildPages_MultiplePages(t *testing.T) {
+	raw := []RawPage{
+		{Title: "Shape Up", Type: "source", Subtype: "book", Content: "## Summary\n\nA book.\n"},
+		{Title: "Betting", Type: "concept", Content: "## Definition\n\nA technique.\n"},
+		{Title: "Ryan Singer", Type: "entity", Subtype: "person", Content: "## Description\n\nA designer.\n"},
+	}
+	pages, _ := BuildPages(raw, "shape-up", "2026-04-23")
+	require.Len(t, pages, 3)
+	assert.Equal(t, "wiki/sources/shape-up.md", pages[0].Path)
+	assert.Equal(t, "wiki/concepts/betting.md", pages[1].Path)
+	assert.Equal(t, "wiki/entities/ryan-singer.md", pages[2].Path)
+}
+
+func TestBuildPages_TitleWithColon(t *testing.T) {
+	raw := []RawPage{
+		{Title: "Shape Up: The Basecamp Method", Type: "source", Subtype: "book", Content: "## Summary\n\nA book.\n"},
+	}
+	pages, _ := BuildPages(raw, "shape-up", "2026-04-23")
+	require.Len(t, pages, 1)
+	// Title with colon must be quoted in YAML
+	assert.Contains(t, pages[0].Content, "title: 'Shape Up: The Basecamp Method'")
+	assert.Contains(t, pages[0].Content, "aliases:\n  - 'Shape Up: The Basecamp Method'")
+}
+
+func TestBuildPages_EntityNoSubtype(t *testing.T) {
+	raw := []RawPage{
+		{Title: "Basecamp", Type: "entity", Content: "## Description\n\nA company.\n"},
+	}
+	pages, _ := BuildPages(raw, "src", "2026-04-23")
+	require.Len(t, pages, 1)
+	assert.NotContains(t, pages[0].Content, "type:")
+	assert.Contains(t, pages[0].Content, "title: 'Basecamp'")
+}
+
+func TestBuildPages_EmptyTitleSkippedWithWarning(t *testing.T) {
+	raw := []RawPage{
+		{Title: "", Type: "concept", Content: "## Definition\n\nFoo.\n"},
+		{Title: "Betting", Type: "concept", Content: "## Definition\n\nA technique.\n"},
+	}
+	pages, warnings := BuildPages(raw, "src", "2026-04-23")
+	require.Len(t, pages, 1, "empty-title page should be skipped")
+	assert.Equal(t, "wiki/concepts/betting.md", pages[0].Path)
+	assert.Len(t, warnings, 1)
+	assert.Contains(t, warnings[0], "empty title")
+}
--- a/ingestion/internal/pipeline/chunk.go
+++ b/ingestion/internal/pipeline/chunk.go
@@ -0,0 +1,39 @@
+// ingestion/internal/pipeline/chunk.go
+package pipeline
+
+import "strings"
+
+// Chunk greedily packs paragraphs (split at \n\n) into chunks of at most maxSize bytes;
+// a single paragraph longer than maxSize becomes its own oversized chunk. If maxSize <= 0, returns content as one chunk.
+func Chunk(content string, maxSize int) []string {
+	content = strings.TrimSpace(content)
+	if maxSize <= 0 || len(content) <= maxSize {
+		return []string{content}
+	}
+
+	paragraphs := strings.Split(content, "\n\n")
+	var chunks []string
+	var cur strings.Builder
+
+	for _, para := range paragraphs {
+		para = strings.TrimSpace(para)
+		if para == "" {
+			continue
+		}
+		addition := para
+		if cur.Len() > 0 {
+			addition = "\n\n" + para
+		}
+		if cur.Len() > 0 && cur.Len()+len(addition) > maxSize {
+			chunks = append(chunks, cur.String())
+			cur.Reset()
+			cur.WriteString(para)
+		} else {
+			cur.WriteString(addition)
+		}
+	}
+	if cur.Len() > 0 {
+		chunks = append(chunks, cur.String())
+	}
+	return chunks
+}
--- a/ingestion/internal/pipeline/chunk_test.go
+++ b/ingestion/internal/pipeline/chunk_test.go
@@ -0,0 +1,36 @@
+// ingestion/internal/pipeline/chunk_test.go
+package pipeline
+
+import (
+	"strings"
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+)
+
+func TestChunk_NoChunkingWhenZero(t *testing.T) {
+	content := strings.Repeat("word ", 1000)
+	chunks := Chunk(content, 0)
+	assert.Len(t, chunks, 1)
+}
+
+func TestChunk_SplitsAtParagraph(t *testing.T) {
+	content := "First paragraph here.\n\nSecond paragraph here."
+	chunks := Chunk(content, 40)
+	assert.Len(t, chunks, 2)
+	assert.Equal(t, "First paragraph here.", chunks[0])
+	assert.Equal(t, "Second paragraph here.", chunks[1])
+}
+
+func TestChunk_SingleLargeParagraph(t *testing.T) {
+	content := strings.Repeat("x", 100)
+	chunks := Chunk(content, 50)
+	assert.Len(t, chunks, 1)
+}
+
+func TestChunk_NoChunkingWhenContentFits(t *testing.T) {
+	content := "Short content."
+	chunks := Chunk(content, 1000)
+	assert.Len(t, chunks, 1)
+	assert.Equal(t, "Short content.", chunks[0])
+}
--- a/ingestion/internal/pipeline/links.go
+++ b/ingestion/internal/pipeline/links.go
@@ -0,0 +1,70 @@
+// ingestion/internal/pipeline/links.go
+package pipeline
+
+import (
+	"fmt"
+	"path/filepath"
+	"regexp"
+	"strings"
+
+	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
+)
+
+// plainLinkRE matches [[Display Name]] — wikilinks without a slug pipe.
+// It does NOT match [[slug|Display]] (those already have a pipe).
+var plainLinkRE = regexp.MustCompile(`\[\[([^\]|]+)\]\]`)
+
+// CanonicalizeLinks converts [[Display Name]] wikilinks to [[slug|Display Name]]
+// using a title→slug map built from the inventory and current batch.
+// Unknown titles are left as-is and returned as warnings.
+func CanonicalizeLinks(pages []wiki.Page, inventory map[wiki.PageType][]wiki.Entry) ([]wiki.Page, []string) {
+	titleToSlug := buildTitleMap(pages, inventory)
+
+	var allWarnings []string
+	out := make([]wiki.Page, len(pages))
+	for i, p := range pages {
+		newContent, warnings := canonicalizeContent(p.Content, titleToSlug)
+		p.Content = newContent
+		out[i] = p
+		allWarnings = append(allWarnings, warnings...)
+	}
+	return out, allWarnings
+}
+
+// buildTitleMap builds a lowercase-title → slug map from inventory and current batch.
+// Current batch entries take precedence over inventory (they may be updates).
+func buildTitleMap(pages []wiki.Page, inventory map[wiki.PageType][]wiki.Entry) map[string]string {
+	m := make(map[string]string)
+	for _, entries := range inventory {
+		for _, e := range entries {
+			m[strings.ToLower(e.Title)] = e.Slug
+		}
+	}
+	// Current batch overrides inventory
+	for _, p := range pages {
+		title := extractTitle(p.Content)
+		slug := strings.TrimSuffix(filepath.Base(p.Path), ".md")
+		if title != "" && slug != "" {
+			m[strings.ToLower(title)] = slug
+		}
+	}
+	return m
+}
+
+func canonicalizeContent(content string, titleToSlug map[string]string) (string, []string) {
+	var warnings []string
+	result := plainLinkRE.ReplaceAllStringFunc(content, func(match string) string {
+		sub := plainLinkRE.FindStringSubmatch(match)
+		if len(sub) < 2 {
+			return match
+		}
+		displayName := sub[1]
+		slug, ok := titleToSlug[strings.ToLower(displayName)]
+		if !ok {
+			warnings = append(warnings, fmt.Sprintf("unknown wikilink: [[%s]]", displayName))
+			return match
+		}
+		return "[[" + slug + "|" + displayName + "]]"
+	})
+	return result, warnings
+}
--- a/ingestion/internal/pipeline/links_test.go
+++ b/ingestion/internal/pipeline/links_test.go
@@ -0,0 +1,125 @@
+// ingestion/internal/pipeline/links_test.go
+package pipeline
+
+import (
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+
+	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
+)
+
+func TestCanonicalizeLinks_KnownTitle(t *testing.T) {
+	pages := []wiki.Page{
+		{
+			Path:    "wiki/sources/shape-up.md",
+			Content: "---\ntitle: 'Shape Up'\n---\n\n## Summary\n\nSee [[Betting]].\n",
+		},
+	}
+	inventory := map[wiki.PageType][]wiki.Entry{
+		wiki.PageTypeConcept: {
+			{Slug: "betting", Title: "Betting"},
+		},
+	}
+	got, warnings := CanonicalizeLinks(pages, inventory)
+	require.Len(t, got, 1)
+	assert.Empty(t, warnings)
+	assert.Contains(t, got[0].Content, "[[betting|Betting]]")
+	assert.NotContains(t, got[0].Content, "[[Betting]]")
+}
+
+func TestCanonicalizeLinks_UnknownTitleLeftAsIs(t *testing.T) {
+	pages := []wiki.Page{
+		{
+			Path:    "wiki/sources/shape-up.md",
+			Content: "---\ntitle: 'Shape Up'\n---\n\n## Summary\n\nSee [[Ghost Concept]].\n",
+		},
+	}
+	inventory := map[wiki.PageType][]wiki.Entry{}
+	got, warnings := CanonicalizeLinks(pages, inventory)
+	require.Len(t, got, 1)
+	assert.NotEmpty(t, warnings)
+	assert.Contains(t, got[0].Content, "[[Ghost Concept]]")
+}
+
+func TestCanonicalizeLinks_AlreadyCanonicalLinkUntouched(t *testing.T) {
+	// Links already in [[slug|Display]] format must not be double-converted
+	pages := []wiki.Page{
+		{
+			Path:    "wiki/sources/shape-up.md",
+			Content: "---\ntitle: 'Shape Up'\n---\n\n## Summary\n\nSee [[betting|Betting]].\n",
+		},
+	}
+	inventory := map[wiki.PageType][]wiki.Entry{
+		wiki.PageTypeConcept: {
+			{Slug: "betting", Title: "Betting"},
+		},
+	}
+	got, warnings := CanonicalizeLinks(pages, inventory)
+	require.Len(t, got, 1)
+	assert.Empty(t, warnings)
+	// Should remain exactly as-is — not double-wrapped
+	assert.Contains(t, got[0].Content, "[[betting|Betting]]")
+	assert.NotContains(t, got[0].Content, "[[betting|[[betting|Betting]]]]")
+}
+
+func TestCanonicalizeLinks_CaseInsensitiveMatch(t *testing.T) {
+	pages := []wiki.Page{
+		{
+			Path:    "wiki/sources/foo.md",
+			Content: "---\ntitle: 'Foo'\n---\n\n## Summary\n\nSee [[domain driven design]].\n",
+		},
+	}
+	inventory := map[wiki.PageType][]wiki.Entry{
+		wiki.PageTypeConcept: {
+			{Slug: "domain-driven-design", Title: "Domain Driven Design"},
+		},
+	}
+	got, warnings := CanonicalizeLinks(pages, inventory)
+	require.Len(t, got, 1)
+	assert.Empty(t, warnings)
+	assert.Contains(t, got[0].Content, "[[domain-driven-design|domain driven design]]")
+}
+
+func TestCanonicalizeLinks_CurrentBatchPagesResolved(t *testing.T) {
+	// A concept created in the same batch should be canonicalizable
+	pages := []wiki.Page{
+		{
+			Path:    "wiki/sources/shape-up.md",
+			Content: "---\ntitle: 'Shape Up'\n---\n\n## Summary\n\nSee [[Betting]].\n",
+		},
+		{
+			Path:    "wiki/concepts/betting.md",
+			Content: "---\ntitle: 'Betting'\n---\n\n## Definition\n\nA technique.\n",
+		},
+	}
+	inventory := map[wiki.PageType][]wiki.Entry{} // empty — Betting is in the batch, not inventory
+
+	got, warnings := CanonicalizeLinks(pages, inventory)
+	require.Len(t, got, 2)
+	assert.Empty(t, warnings)
+	assert.Contains(t, got[0].Content, "[[betting|Betting]]")
+}
+
+func TestCanonicalizeLinks_MultipleLinksInOnePage(t *testing.T) {
+	pages := []wiki.Page{
+		{
+			Path:    "wiki/sources/foo.md",
+			Content: "---\ntitle: 'Foo'\n---\n\n## Summary\n\nSee [[Betting]] and [[Shape Up]].\n",
+		},
+	}
+	inventory := map[wiki.PageType][]wiki.Entry{
+		wiki.PageTypeConcept: {
+			{Slug: "betting", Title: "Betting"},
+		},
+		wiki.PageTypeSource: {
+			{Slug: "shape-up", Title: "Shape Up"},
+		},
+	}
+	got, warnings := CanonicalizeLinks(pages, inventory)
+	require.Len(t, got, 1)
+	assert.Empty(t, warnings)
+	assert.Contains(t, got[0].Content, "[[betting|Betting]]")
+	assert.Contains(t, got[0].Content, "[[shape-up|Shape Up]]")
+}
--- a/ingestion/internal/pipeline/parse.go
+++ b/ingestion/internal/pipeline/parse.go
@@ -0,0 +1,110 @@
+// ingestion/internal/pipeline/parse.go
+package pipeline
+
+import (
+	"encoding/json"
+	"fmt"
+	"strings"
+)
+
+// RawPage is the extraction format (LLM output, or supplied directly via RunRaw): minimal structured data with no path or frontmatter.
+// The pipeline derives slugs, paths, and frontmatter from these fields.
+type RawPage struct {
+	Title   string `json:"title"`
+	Type    string `json:"type"`    // "source" | "concept" | "entity"
+	Subtype string `json:"subtype"` // entity: person|company|tool|model|framework|technology; source: article|pdf|book|video|note|project
+	Domain  string `json:"domain"`
+	Content string `json:"content"` // Markdown body only — no frontmatter
+}
+
+// ParseRawPages parses LLM output as a JSON array of RawPage objects.
+// If the output contains invalid JSON escape sequences (e.g. \. from Markdown),
+// it attempts repair before falling back to truncation recovery.
+func ParseRawPages(output string) ([]RawPage, []string) {
+	output = strings.TrimSpace(output)
+	if output == "" {
+		return nil, []string{"LLM returned empty output"}
+	}
+
+	output = stripFences(output)
+
+	// Fast path: valid JSON.
+	var pages []RawPage
+	if err := json.Unmarshal([]byte(output), &pages); err == nil {
+		return pages, nil
+	}
+
+	// Repair pass: fix invalid escape sequences (e.g. \. \d from Markdown content).
+	repaired := repairJSON(output)
+	if err := json.Unmarshal([]byte(repaired), &pages); err == nil {
+		return pages, []string{"repaired invalid JSON escape sequences in LLM output"}
+	}
+
+	// Truncation recovery: find last `}` that closes a complete object.
+	idx := strings.LastIndex(repaired, "}")
+	if idx < 0 {
+		return nil, []string{"LLM output contained no complete JSON objects"}
+	}
+
+	start := strings.Index(repaired, "[")
+	if start < 0 || start > idx {
+		return nil, []string{"LLM output contained no JSON array opening bracket before the last '}'"}
+	}
+
+	candidate := repaired[start:idx+1] + "]"
+	if err := json.Unmarshal([]byte(candidate), &pages); err != nil {
+		return nil, []string{fmt.Sprintf("truncation recovery failed: %v", err)}
+	}
+
+	return pages, []string{fmt.Sprintf("LLM output was truncated; recovered %d page(s)", len(pages))}
+}
+
+// repairJSON doubles the backslash of each invalid JSON escape sequence (e.g. \. \d \p)
+// so it decodes as a literal backslash followed by the same character. It scans
+// byte-by-byte, so already-valid escapes (including \\) are skipped without lookbehind.
+// \u is always treated as valid, so a malformed \u (e.g. from C:\users) is not repaired.
+func repairJSON(s string) string {
+	var b strings.Builder
+	b.Grow(len(s))
+	i := 0
+	for i < len(s) {
+		if s[i] != '\\' {
+			b.WriteByte(s[i])
+			i++
+			continue
+		}
+		// We have a backslash. Peek at the next character.
+		if i+1 >= len(s) {
+			// Trailing backslash — emit as-is.
+			b.WriteByte(s[i])
+			i++
+			continue
+		}
+		next := s[i+1]
+		switch next {
+		case '"', '\\', '/', 'b', 'f', 'n', 'r', 't', 'u':
+			// Valid JSON escape sequence — emit both characters as-is.
+			b.WriteByte(s[i])
+			b.WriteByte(next)
+			i += 2
+		default:
+			// Invalid escape — double the backslash.
+			b.WriteByte('\\')
+			b.WriteByte('\\')
+			b.WriteByte(next)
+			i += 2
+		}
+	}
+	return b.String()
+}
+
+func stripFences(s string) string {
+	for _, prefix := range []string{"```json\n", "```json\r\n", "```\n", "```\r\n"} {
+		if strings.HasPrefix(s, prefix) {
+			s = strings.TrimPrefix(s, prefix)
+			s = strings.TrimSuffix(strings.TrimSpace(s), "```")
+			return strings.TrimSpace(s)
+		}
+	}
+	return s
+}
--- a/ingestion/internal/pipeline/parse_test.go
+++ b/ingestion/internal/pipeline/parse_test.go
@@ -0,0 +1,87 @@
+// ingestion/internal/pipeline/parse_test.go
+package pipeline
+
+import (
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func TestParseRawPages_ValidJSON(t *testing.T) {
+	input := `[{"title":"Shape Up","type":"source","subtype":"book","domain":"product-strategy","content":"## Summary\n\nFoo."},{"title":"Betting","type":"concept","content":"## Definition\n\nA technique."}]`
+	pages, warnings := ParseRawPages(input)
+	require.Len(t, pages, 2)
+	assert.Empty(t, warnings)
+	assert.Equal(t, "Shape Up", pages[0].Title)
+	assert.Equal(t, "source", pages[0].Type)
+	assert.Equal(t, "book", pages[0].Subtype)
+	assert.Equal(t, "product-strategy", pages[0].Domain)
+	assert.Equal(t, "Betting", pages[1].Title)
+	assert.Equal(t, "concept", pages[1].Type)
+	assert.Empty(t, pages[1].Subtype)
+}
+
+func TestParseRawPages_StripsFences(t *testing.T) {
+	input := "```json\n[{\"title\":\"Foo\",\"type\":\"concept\",\"content\":\"## Definition\\n\\nFoo.\"}]\n```"
+	pages, warnings := ParseRawPages(input)
+	require.Len(t, pages, 1)
+	assert.Empty(t, warnings)
+	assert.Equal(t, "Foo", pages[0].Title)
+}
+
+func TestParseRawPages_TruncationRecovery(t *testing.T) {
+	input := `[{"title":"Foo","type":"concept","content":"## Definition\n\nFoo."},{"title":"Bar","type":"concept","content":"trunc`
+	pages, warnings := ParseRawPages(input)
+	require.Len(t, pages, 1)
+	assert.Equal(t, "Foo", pages[0].Title)
+	assert.NotEmpty(t, warnings)
+}
+
+func TestParseRawPages_EmptyInput(t *testing.T) {
+	pages, warnings := ParseRawPages("")
+	assert.Empty(t, pages)
+	assert.NotEmpty(t, warnings)
+}
+
+func TestParseRawPages_PlainFence(t *testing.T) {
+	input := "```\n[{\"title\":\"Foo\",\"type\":\"concept\",\"content\":\"ok\"}]\n```"
+	pages, warnings := ParseRawPages(input)
+	require.Len(t, pages, 1)
+	assert.Empty(t, warnings)
+}
+
+func TestParseRawPages_MissingTitle(t *testing.T) {
+	// Missing title — still parsed, Title is empty string
+	input := `[{"type":"concept","content":"## Definition\n\nFoo."}]`
+	pages, warnings := ParseRawPages(input)
+	require.Len(t, pages, 1)
+	assert.Empty(t, warnings)
+	assert.Empty(t, pages[0].Title)
+}
+
+func TestParseRawPages_InvalidEscapeRepaired(t *testing.T) {
+	// LLM copied markdown escaped list numbers (\.) into JSON — invalid escape
+	raw := "[{\"title\":\"Foo\",\"type\":\"concept\",\"content\":\"Step 4\\. Do it.\"}]"
+	pages, warnings := ParseRawPages(raw)
+	require.Len(t, pages, 1)
+	assert.Equal(t, "Foo", pages[0].Title)
+	assert.Contains(t, pages[0].Content, `4\.`)
+	assert.NotEmpty(t, warnings) // repair warning
+}
+
+func TestRepairJSON_FixesInvalidEscapes(t *testing.T) {
+	cases := []struct {
+		in   string
+		want string
+	}{
+		{`{"a":"foo\.bar"}`, `{"a":"foo\\.bar"}`},
+		{`{"a":"\n is fine"}`, `{"a":"\n is fine"}`}, // valid \n untouched
+		{`{"a":"\d+ items"}`, `{"a":"\\d+ items"}`},
+		{`{"a":"already \\ escaped"}`, `{"a":"already \\ escaped"}`}, // valid \\ untouched
+	}
+	for _, tc := range cases {
+		got := repairJSON(tc.in)
+		assert.Equal(t, tc.want, got, "input: %s", tc.in)
+	}
+}
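`TestParseRawPages_TruncationRecovery` exercises the last fallback in `ParseRawPages`: cut the output after the last `}` and close the array. A standalone sketch of only that step follows. The `start > end` guard and the two-field `rawPage` type are illustrative assumptions. The sketch also shows the main limitation of the approach. If the cut-off object's content contains a `}`, the candidate ends inside an unterminated string, so recovery fails with an error instead of returning wrong data.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// rawPage is an illustrative two-field version of RawPage.
type rawPage struct {
	Title   string `json:"title"`
	Content string `json:"content"`
}

// recoverTruncated keeps everything up to the last '}' and closes the array,
// mirroring the truncation fallback in ParseRawPages.
func recoverTruncated(s string) ([]rawPage, error) {
	end := strings.LastIndex(s, "}")
	start := strings.Index(s, "[")
	if end < 0 || start < 0 || start > end {
		return nil, errors.New("no complete object after '['")
	}
	var pages []rawPage
	if err := json.Unmarshal([]byte(s[start:end+1]+"]"), &pages); err != nil {
		return nil, err
	}
	return pages, nil
}

func main() {
	out := `[{"title":"Foo","content":"ok"},{"title":"Bar","content":"trunc`
	pages, err := recoverTruncated(out)
	fmt.Println(len(pages), err)
}
```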
--- a/ingestion/internal/pipeline/pipeline.go
+++ b/ingestion/internal/pipeline/pipeline.go
@@ -0,0 +1,146 @@
+// ingestion/internal/pipeline/pipeline.go
+package pipeline
+
+import (
+	"context"
+	"fmt"
+	"os"
+	"path/filepath"
+	"strings"
+	"time"
+
+	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
+)
+
+// CompleteFunc is the function signature for LLM calls.
+type CompleteFunc func(ctx context.Context, system, user string) (string, error)
+
+// Config holds pipeline configuration.
+type Config struct {
+	Complete  CompleteFunc
+	ChunkSize int    // 0 = no chunking
+	Schema    string // overrides brain/schema.md when set (useful in tests)
+}
+
+// Result is the outcome of a pipeline run.
+type Result struct {
+	Pages    []string // relative paths written (or that would be written, in dry-run mode)
+	Warnings []string
+}
+
+// Run ingests content and writes structured wiki pages to brainDir/wiki/.
+// In dry-run mode, pages are returned but not written to disk.
+func Run(ctx context.Context, cfg Config, brainDir, content, source string, dryRun bool) (Result, error) {
+	inventory, err := wiki.LoadInventory(brainDir)
+	if err != nil {
+		return Result{}, fmt.Errorf("load inventory: %w", err)
+	}
+
+	schema := cfg.Schema
+	if schema == "" {
+		schema = loadSchema(brainDir)
+	}
+
+	sourceSlug := wiki.Slug(source)
+	date := time.Now().UTC().Format("2006-01-02")
+	chunks := Chunk(content, cfg.ChunkSize)
+
+	var allRaw []RawPage
+	var allWarnings []string
+
+	for _, chunk := range chunks {
+		userPrompt := BuildPrompt(schema, source, chunk, inventory)
+		output, err := cfg.Complete(ctx, systemPrompt, userPrompt)
+		if err != nil {
+			return Result{}, fmt.Errorf("LLM call: %w", err)
+		}
+		raw, warnings := ParseRawPages(output)
+		allRaw = append(allRaw, raw...)
+		allWarnings = append(allWarnings, warnings...)
+	}
+
+	return buildAndWrite(allRaw, sourceSlug, date, brainDir, source, inventory, allWarnings, dryRun)
+}
+
+// RunRaw runs the pipeline on pre-parsed RawPages, skipping the LLM extraction
+// step. Use this when the caller has already produced the structured RawPage data
+// (e.g. from a more capable model or manual curation).
+func RunRaw(brainDir, source string, rawPages []RawPage, dryRun bool) (Result, error) {
+	inventory, err := wiki.LoadInventory(brainDir)
+	if err != nil {
+		return Result{}, fmt.Errorf("load inventory: %w", err)
+	}
+
+	sourceSlug := wiki.Slug(source)
+	date := time.Now().UTC().Format("2006-01-02")
+
+	return buildAndWrite(rawPages, sourceSlug, date, brainDir, source, inventory, nil, dryRun)
+}
+
+// buildAndWrite runs the steps shared by Run and RunRaw: build, resolve, canonicalize links, inject source refs, merge, write.
+func buildAndWrite(rawPages []RawPage, sourceSlug, date, brainDir, source string, inventory map[wiki.PageType][]wiki.Entry, warnings []string, dryRun bool) (Result, error) {
+	pages, buildWarnings := BuildPages(rawPages, sourceSlug, date)
+	warnings = append(warnings, buildWarnings...)
+	resolved := Resolve(pages, inventory)
+	canonicalized, linkWarnings := CanonicalizeLinks(resolved, inventory)
+	warnings = append(warnings, linkWarnings...)
+	withRefs := injectSourceRefs(canonicalized, inventory, brainDir)
+	merged := mergeAll(withRefs)
+
+	var written []string
+	for _, page := range merged {
+		if !dryRun {
+			dest := filepath.Join(brainDir, filepath.FromSlash(page.Path))
+			if err := os.MkdirAll(filepath.Dir(dest), 0o755); err != nil {
+				return Result{}, fmt.Errorf("mkdir for %s: %w", page.Path, err)
+			}
+			if err := os.WriteFile(dest, []byte(page.Content), 0o644); err != nil {
+				return Result{}, fmt.Errorf("write %s: %w", page.Path, err)
+			}
+		}
+		written = append(written, page.Path)
+	}
+
+	if !dryRun {
+		if err := wiki.RebuildIndex(brainDir, date); err != nil {
+			warnings = append(warnings, fmt.Sprintf("rebuild index: %v", err))
+		}
+		if err := wiki.AppendLog(brainDir, source, written, warnings, date); err != nil {
+			warnings = append(warnings, fmt.Sprintf("append log: %v", err))
+		}
+	}
+
+	return Result{Pages: written, Warnings: warnings}, nil
+}
+
+// mergeAll deduplicates pages by path in first-seen order, folding later duplicates in via wiki.Merge.
+func mergeAll(pages []wiki.Page) []wiki.Page {
+	order := make([]string, 0, len(pages))
+	byPath := make(map[string]wiki.Page, len(pages))
+	for _, p := range pages {
+		if _, seen := byPath[p.Path]; !seen {
+			order = append(order, p.Path)
+			byPath[p.Path] = p
+		} else {
+			byPath[p.Path] = wiki.Merge(byPath[p.Path], p)
+		}
+	}
+	result := make([]wiki.Page, 0, len(order))
+	for _, path := range order {
+		result = append(result, byPath[path])
+	}
+	return result
+}
+
+const defaultSchema = `# Brain Wiki Schema
+Three page types: wiki/sources/, wiki/concepts/, wiki/entities/.
+See brain/schema.md for the full schema.
+`
+
+func loadSchema(brainDir string) string {
+	b, err := os.ReadFile(filepath.Join(brainDir, "schema.md"))
+	if err != nil {
+		return defaultSchema
+	}
+	return strings.TrimSpace(string(b))
+}
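mergeAll is an order-preserving dedup: the first occurrence of a path fixes its position in the output, and every later page with the same path is folded into it. The standalone sketch below shows the same pattern. `page` and `concatMerge` are hypothetical simplifications. The real wiki.Merge works section by section: the pipeline tests expect keep-first for `## Definition` and a union for `## Related Concepts`.

```go
package main

import "fmt"

// page is a simplified, hypothetical stand-in for wiki.Page.
type page struct {
	Path    string
	Content string
}

// dedupByPath keeps paths in first-seen order and folds each later page with
// the same path into the earlier one using merge (same shape as mergeAll).
func dedupByPath(pages []page, merge func(first, later page) page) []page {
	order := make([]string, 0, len(pages))
	byPath := make(map[string]page, len(pages))
	for _, p := range pages {
		if prev, seen := byPath[p.Path]; seen {
			byPath[p.Path] = merge(prev, p)
			continue
		}
		order = append(order, p.Path)
		byPath[p.Path] = p
	}
	out := make([]page, 0, len(order))
	for _, path := range order {
		out = append(out, byPath[path])
	}
	return out
}

// concatMerge is a hypothetical merge policy that just appends content.
func concatMerge(first, later page) page {
	first.Content += "\n" + later.Content
	return first
}

func main() {
	pages := []page{
		{Path: "wiki/concepts/foo.md", Content: "first"},
		{Path: "wiki/sources/bar.md", Content: "bar"},
		{Path: "wiki/concepts/foo.md", Content: "second"},
	}
	for _, p := range dedupByPath(pages, concatMerge) {
		fmt.Printf("%s: %q\n", p.Path, p.Content)
	}
	// wiki/concepts/foo.md: "first\nsecond"
	// wiki/sources/bar.md: "bar"
}
```

The separate `order` slice is what keeps the output deterministic. Ranging over `byPath` directly would return pages in random map order.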
--- a/ingestion/internal/pipeline/pipeline_test.go
+++ b/ingestion/internal/pipeline/pipeline_test.go
@@ -0,0 +1,139 @@
+// ingestion/internal/pipeline/pipeline_test.go
+package pipeline
+
+import (
+	"context"
+	"encoding/json"
+	"net/http"
+	"net/http/httptest"
+	"os"
+	"path/filepath"
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+
+	"github.com/mathiasbq/hyperguild/ingestion/internal/llm"
+)
+
+func TestRun_WritesPages(t *testing.T) {
+	brainDir := t.TempDir()
+	for _, sub := range []string{"wiki/concepts", "wiki/entities", "wiki/sources"} {
+		require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755))
+	}
+
+	llmResponse := mustJSON([]RawPage{
+		{
+			Title:   "Test Article",
+			Type:    "source",
+			Subtype: "article",
+			Domain:  "software-engineering",
+			Content: "## Summary\n\nA test article.\n\n## Key Claims\n\n- It tests things.\n\n## Concepts Introduced or Reinforced\n\n[[Testing]]\n\n## Entities Mentioned\n\n## Open Questions Raised\n",
+		},
+		{
+			Title:   "Testing",
+			Type:    "concept",
+			Domain:  "software-engineering",
+			Content: "## Definition\n\nThe practice of verifying software.\n\n## Why It Matters\n\nCatches bugs.\n\n## Related Concepts\n\n## Related Entities\n\n## Sources\n\n## Evolving Notes\n",
+		},
+	})
+
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		w.Header().Set("Content-Type", "application/json")
+		_ = json.NewEncoder(w).Encode(map[string]any{
+			"choices": []map[string]any{
+				{"message": map[string]any{"role": "assistant", "content": llmResponse}},
+			},
+		})
+	}))
+	defer srv.Close()
+
+	cfg := Config{
+		Complete:  llm.New(srv.URL, "", "test-model", 30*time.Second).Complete,
+		ChunkSize: 0,
+	}
+
+	result, err := Run(context.Background(), cfg, brainDir, "An article about testing.", "test-article", false)
+	require.NoError(t, err)
+	assert.Len(t, result.Pages, 2)
+
+	_, err = os.Stat(filepath.Join(brainDir, "wiki", "sources", "test-article.md"))
+	require.NoError(t, err)
+	_, err = os.Stat(filepath.Join(brainDir, "wiki", "concepts", "testing.md"))
+	require.NoError(t, err)
+	_, err = os.Stat(filepath.Join(brainDir, "wiki", "index.md"))
+	require.NoError(t, err)
+	_, err = os.Stat(filepath.Join(brainDir, "log.md"))
+	require.NoError(t, err)
+}
+
+func TestRun_DryRunDoesNotWrite(t *testing.T) {
+	brainDir := t.TempDir()
+	for _, sub := range []string{"wiki/concepts", "wiki/entities", "wiki/sources"} {
+		require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755))
+	}
+
+	llmResponse := mustJSON([]RawPage{{
+		Title:   "Foo",
+		Type:    "source",
+		Subtype: "article",
+		Content: "## Summary\n\nFoo.\n",
+	}})
+
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		_ = json.NewEncoder(w).Encode(map[string]any{
+			"choices": []map[string]any{{"message": map[string]any{"content": llmResponse}}},
+		})
+	}))
+	defer srv.Close()
+
+	cfg := Config{Complete: llm.New(srv.URL, "", "m", 30*time.Second).Complete}
+	result, err := Run(context.Background(), cfg, brainDir, "foo content", "foo", true)
+	require.NoError(t, err)
+	assert.Len(t, result.Pages, 1)
+
+	_, err = os.Stat(filepath.Join(brainDir, "wiki", "sources", "foo.md"))
+	assert.True(t, os.IsNotExist(err))
+}
+
+func TestRun_MergesDuplicatePaths(t *testing.T) {
+	brainDir := t.TempDir()
+	for _, sub := range []string{"wiki/concepts", "wiki/entities", "wiki/sources"} {
+		require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755))
+	}
+
+	// LLM returns same title twice (simulates multi-chunk duplicate)
+	llmResponse := mustJSON([]RawPage{
+		{Title: "Foo", Type: "concept", Content: "## Definition\n\nFirst.\n\n## Related Concepts\n\n[[Bar]]\n"},
+		{Title: "Foo", Type: "concept", Content: "## Definition\n\nSecond.\n\n## Related Concepts\n\n[[Baz]]\n"},
+	})
+
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		_ = json.NewEncoder(w).Encode(map[string]any{
+			"choices": []map[string]any{{"message": map[string]any{"content": llmResponse}}},
+		})
+	}))
+	defer srv.Close()
+
+	cfg := Config{Complete: llm.New(srv.URL, "", "m", 30*time.Second).Complete}
+	result, err := Run(context.Background(), cfg, brainDir, "content", "foo", false)
+	require.NoError(t, err)
+	assert.Len(t, result.Pages, 1) // deduplicated
+
+	content, err := os.ReadFile(filepath.Join(brainDir, "wiki", "concepts", "foo.md"))
+	require.NoError(t, err)
+	// keep-first for Definition, union for Related Concepts
+	assert.Contains(t, string(content), "First.")
+	// Bar and Baz unknown in empty inventory → left as plain [[links]]
+	assert.Contains(t, string(content), "[[Bar]]")
+	assert.Contains(t, string(content), "[[Baz]]")
+}
+
+func mustJSON(v any) string {
+	b, err := json.Marshal(v)
+	if err != nil {
+		panic(err)
+	}
+	return string(b)
+}
--- a/ingestion/internal/pipeline/prompt.go
+++ b/ingestion/internal/pipeline/prompt.go
@@ -0,0 +1,63 @@
+// ingestion/internal/pipeline/prompt.go
+package pipeline
+
+import (
+	"fmt"
+	"strings"
+	"time"
+
+	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
+)
+
+const systemPrompt = `You are a wiki agent. Read the source material and produce structured wiki pages following the schema provided.
+
+Output ONLY a valid JSON array — no markdown fences, no other text before or after.
+Each element must have exactly these fields:
+  "title"   — exact page title (e.g. "FinBERT", "Ryan Singer", "Shape Up")
+  "type"    — exactly one of: "source", "concept", "entity"
+  "subtype" — for source: article|pdf|book|video|note|project; for entity: person|company|tool|model|framework|technology; omit for concept
+  "domain"  — one of the domains in the schema (omit if none fits)
+  "content" — Markdown body only — NO frontmatter, NO path, NO slug
+
+Wikilinks in content: [[Display Name]] — just the display name, no slug, no pipe separator.
+Only link to pages listed in the inventory or pages you are creating in this response.`
+
+// BuildPrompt constructs the user prompt for a single chunk.
+func BuildPrompt(schema, source, content string, inventory map[wiki.PageType][]wiki.Entry) string {
+	var sb strings.Builder
+
+	fmt.Fprintf(&sb, "Today's date is %s.\n\n", time.Now().UTC().Format("2006-01-02"))
+
+	sb.WriteString("## Schema\n\n")
+	sb.WriteString(schema)
+	sb.WriteString("\n\n")
+
+	sb.WriteString("## Existing wiki pages\n\n")
+	sb.WriteString("Reference these pages by display name only — [[Display Name]] — in your content.\n\n")
+
+	for _, pt := range []wiki.PageType{wiki.PageTypeConcept, wiki.PageTypeEntity, wiki.PageTypeSource} {
+		entries := inventory[pt]
+		label := strings.ToUpper(string(pt)[:1]) + string(pt)[1:]
+		if len(entries) == 0 {
+			fmt.Fprintf(&sb, "%s — (none yet)\n\n", label)
+			continue
+		}
+		fmt.Fprintf(&sb, "%s:\n", label)
+		for _, e := range entries {
+			fmt.Fprintf(&sb, "  - %s\n", e.Title)
+		}
+		sb.WriteString("\n")
+	}
+
+	sb.WriteString("## Non-negotiable rules\n\n")
+	sb.WriteString("1. Output ONLY a valid JSON array — no prose, no fences.\n")
+	sb.WriteString("2. Fields: title, type, subtype (if applicable), domain (if applicable), content.\n")
+	sb.WriteString("3. Wikilinks: [[Display Name]] — no slug, no pipe. The pipeline handles slugs.\n")
+	sb.WriteString("4. Section links must match their section type (Related Concepts → concepts only, etc.).\n")
+	sb.WriteString("5. One source page per book — if inventory shows it exists, return it as an UPDATE.\n\n")
+
+	fmt.Fprintf(&sb, "## Source: %s\n\n", source)
+	sb.WriteString(content)
+
+	return sb.String()
+}
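BuildPrompt lists the inventory in a fixed type order rather than ranging over the map, so the prompt text stays stable across calls. The sketch below shows that rendering, assuming the page-type strings are the plural directory names (`concepts`, `entities`, `sources`), as the `wiki/<type>/` paths suggest. `capitalize` is a hypothetical rune-safe alternative to `strings.ToUpper(s[:1]) + s[1:]`, which panics on an empty string. The original avoids that panic only because its type list is hard-coded. The empty-section line format is simplified.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// capitalize upper-cases the first rune of s. It returns s unchanged when s
// is empty and handles multi-byte first characters.
func capitalize(s string) string {
	r, size := utf8.DecodeRuneInString(s)
	if size == 0 {
		return s
	}
	return string(unicode.ToUpper(r)) + s[size:]
}

// renderInventory emits one labelled list per page type, in the given order.
func renderInventory(order []string, titles map[string][]string) string {
	var sb strings.Builder
	for _, pt := range order {
		label := capitalize(pt)
		entries := titles[pt]
		if len(entries) == 0 {
			fmt.Fprintf(&sb, "%s: (none yet)\n\n", label)
			continue
		}
		fmt.Fprintf(&sb, "%s:\n", label)
		for _, t := range entries {
			fmt.Fprintf(&sb, "  - %s\n", t)
		}
		sb.WriteString("\n")
	}
	return sb.String()
}

func main() {
	fmt.Print(renderInventory(
		[]string{"concepts", "entities", "sources"},
		map[string][]string{"concepts": {"Shape Up"}, "entities": {"Ryan Singer"}},
	))
}
```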
--- a/ingestion/internal/pipeline/refs.go
+++ b/ingestion/internal/pipeline/refs.go
@@ -0,0 +1,115 @@
+// ingestion/internal/pipeline/refs.go
+package pipeline
+
+import (
+	"os"
+	"path/filepath"
+	"regexp"
+	"strings"
+
+	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
+)
+
+var wikilinkRE = regexp.MustCompile(`\[\[([^|\]]+)\|`)
+
+// injectSourceRefs finds the source page in the proposed batch, extracts its
+// wikilinks, and injects a back-reference into every linked concept or entity page.
+// Pages that exist on disk but are not in the current batch are loaded and
+// appended so they will be updated on write.
+func injectSourceRefs(pages []wiki.Page, inventory map[wiki.PageType][]wiki.Entry, brainDir string) []wiki.Page {
+	sourceSlug, sourceTitle, found := findSourcePage(pages)
+	if !found {
+		return pages
+	}
+
+	var sourceContent string
+	for _, p := range pages {
+		if strings.HasPrefix(p.Path, "wiki/sources/") &&
+			strings.TrimSuffix(filepath.Base(p.Path), ".md") == sourceSlug {
+			sourceContent = p.Content
+			break
+		}
+	}
+
+	linkedSlugs := extractWikilinks(sourceContent)
+	sourceRef := "- [[" + sourceSlug + "|" + sourceTitle + "]]"
+
+	bySlug := make(map[string]int, len(pages))
+	for i, p := range pages {
+		if !strings.HasPrefix(p.Path, "wiki/sources/") {
+			bySlug[strings.TrimSuffix(filepath.Base(p.Path), ".md")] = i
+		}
+	}
+
+	for slug := range linkedSlugs {
+		if slug == sourceSlug {
+			continue
+		}
+		if idx, ok := bySlug[slug]; ok {
+			pages[idx] = addSourceRef(pages[idx], sourceRef)
+			continue
+		}
+		pt, ok := findInInventory(slug, inventory)
+		if !ok {
+			continue
+		}
+		diskPath := filepath.Join(brainDir, "wiki", string(pt), slug+".md")
+		b, err := os.ReadFile(diskPath)
+		if err != nil {
+			continue
+		}
+		page := wiki.Page{
+			Path:    "wiki/" + string(pt) + "/" + slug + ".md",
+			Content: string(b),
+		}
+		pages = append(pages, addSourceRef(page, sourceRef))
+	}
+
+	return pages
+}
+
+// addSourceRef injects sourceRef into the ## Sources bullet section of page
+// using wiki.Merge, which deduplicates bullets automatically.
+func addSourceRef(page wiki.Page, sourceRef string) wiki.Page {
+	patch := wiki.Page{
+		Path:    page.Path,
+		Content: "\n## Sources\n\n" + sourceRef + "\n",
+	}
+	return wiki.Merge(page, patch)
+}
+
+// extractWikilinks returns the set of slugs referenced as [[slug|...]] in content.
+func extractWikilinks(content string) map[string]bool {
+	slugs := make(map[string]bool)
+	for _, m := range wikilinkRE.FindAllStringSubmatch(content, -1) {
+		slugs[m[1]] = true
+	}
+	return slugs
+}
+
+// findSourcePage returns the slug and title of the first wiki/sources/ page in pages.
+func findSourcePage(pages []wiki.Page) (slug, title string, found bool) {
+	for _, p := range pages {
+		if strings.HasPrefix(p.Path, "wiki/sources/") {
+			slug = strings.TrimSuffix(filepath.Base(p.Path), ".md")
+			title = extractTitle(p.Content)
+			if title == "" {
+				title = slug
+			}
+			return slug, title, true
+		}
+	}
+	return "", "", false
+}
+
+// findInInventory returns the PageType for a slug if it appears in the inventory.
+func findInInventory(slug string, inventory map[wiki.PageType][]wiki.Entry) (wiki.PageType, bool) {
+	for pt, entries := range inventory {
+		for _, e := range entries {
+			if e.Slug == slug {
+				return pt, true
+			}
+		}
+	}
+	return "", false
+}
--- a/ingestion/internal/pipeline/refs_test.go
+++ b/ingestion/internal/pipeline/refs_test.go
@@ -0,0 +1,172 @@
+// ingestion/internal/pipeline/refs_test.go
+package pipeline
+
+import (
+	"os"
+	"path/filepath"
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+
+	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
+)
+
+func makeInventory(concepts, entities []string) map[wiki.PageType][]wiki.Entry {
+	inv := map[wiki.PageType][]wiki.Entry{
+		wiki.PageTypeConcept: {},
+		wiki.PageTypeEntity:  {},
+		wiki.PageTypeSource:  {},
+	}
+	for _, slug := range concepts {
+		inv[wiki.PageTypeConcept] = append(inv[wiki.PageTypeConcept], wiki.Entry{Slug: slug, Title: slug})
+	}
+	for _, slug := range entities {
+		inv[wiki.PageTypeEntity] = append(inv[wiki.PageTypeEntity], wiki.Entry{Slug: slug, Title: slug})
+	}
+	return inv
+}
+
+func TestInjectSourceRefs_NoSourcePage(t *testing.T) {
+	pages := []wiki.Page{
+		{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Definition\n\nFoo.\n"},
+	}
+	got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
+	assert.Equal(t, pages, got)
+}
+
+func TestInjectSourceRefs_InjectsIntoProposedConcept(t *testing.T) {
+	pages := []wiki.Page{
+		{
+			Path:    "wiki/sources/my-article.md",
+			Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSee [[domain-driven-design|Domain Driven Design]].\n",
+		},
+		{
+			Path:    "wiki/concepts/domain-driven-design.md",
+			Content: "---\ntitle: Domain Driven Design\n---\n\n## Definition\n\nA methodology.\n",
+		},
+	}
+
+	got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
+
+	require.Len(t, got, 2)
+	assert.Contains(t, got[1].Content, "## Sources")
+	assert.Contains(t, got[1].Content, "[[my-article|My Article]]")
+}
+
+func TestInjectSourceRefs_LoadsConceptFromDisk(t *testing.T) {
+	brainDir := t.TempDir()
+	conceptDir := filepath.Join(brainDir, "wiki", "concepts")
+	require.NoError(t, os.MkdirAll(conceptDir, 0o755))
+	require.NoError(t, os.WriteFile(
+		filepath.Join(conceptDir, "shape-up.md"),
+		[]byte("---\ntitle: Shape Up\n---\n\n## Definition\n\nA methodology.\n"),
+		0o644,
+	))
+
+	pages := []wiki.Page{
+		{
+			Path:    "wiki/sources/my-article.md",
+			Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSee [[shape-up|Shape Up]].\n",
+		},
+	}
+	inv := makeInventory([]string{"shape-up"}, nil)
+
+	got := injectSourceRefs(pages, inv, brainDir)
+
+	require.Len(t, got, 2)
+	var conceptPage wiki.Page
+	for _, p := range got {
+		if p.Path == "wiki/concepts/shape-up.md" {
+			conceptPage = p
+		}
+	}
+	assert.Contains(t, conceptPage.Content, "## Sources")
+	assert.Contains(t, conceptPage.Content, "[[my-article|My Article]]")
+	assert.Contains(t, conceptPage.Content, "## Definition")
+}
+
+func TestInjectSourceRefs_NoSelfReference(t *testing.T) {
+	pages := []wiki.Page{
+		{
+			Path:    "wiki/sources/my-article.md",
+			Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSelf-link [[my-article|My Article]].\n",
+		},
+	}
+
+	got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
+	assert.Len(t, got, 1)
+}
+
+func TestInjectSourceRefs_DeduplicatesOnReingestion(t *testing.T) {
+	pages := []wiki.Page{
+		{
+			Path:    "wiki/sources/my-article.md",
+			Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSee [[ddd|DDD]].\n",
+		},
+		{
+			Path:    "wiki/concepts/ddd.md",
+			Content: "---\ntitle: DDD\n---\n\n## Definition\n\nA thing.\n\n## Sources\n\n- [[my-article|My Article]]\n",
+		},
+	}
+
+	got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
+
+	require.Len(t, got, 2)
+	count := 0
+	for _, line := range splitLines(got[1].Content) {
+		if line == "- [[my-article|My Article]]" {
+			count++
+		}
+	}
+	assert.Equal(t, 1, count, "source ref should appear exactly once")
+}
+
+func TestInjectSourceRefs_InjectsIntoEntity(t *testing.T) {
+	pages := []wiki.Page{
+		{
+			Path:    "wiki/sources/book.md",
+			Content: "---\ntitle: Book\n---\n\n## Summary\n\nBy [[ryan-singer|Ryan Singer]].\n",
+		},
+		{
+			Path:    "wiki/entities/ryan-singer.md",
+			Content: "---\ntitle: Ryan Singer\n---\n\n## Description\n\nA designer.\n",
+		},
+	}
+
+	got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
+
+	require.Len(t, got, 2)
+	var entity wiki.Page
+	for _, p := range got {
+		if p.Path == "wiki/entities/ryan-singer.md" {
+			entity = p
+		}
+	}
+	assert.Contains(t, entity.Content, "[[book|Book]]")
+}
+
+func TestExtractWikilinks(t *testing.T) {
+	content := "See [[foo|Foo]] and [[bar|Bar]] and [[foo|Foo again]]."
+	got := extractWikilinks(content)
+	assert.True(t, got["foo"])
+	assert.True(t, got["bar"])
+	assert.Len(t, got, 2, "duplicate slugs should be deduplicated")
+}
+
+func splitLines(s string) []string {
+	var out []string
+	start := 0
+	for i := 0; i < len(s); i++ {
+		if s[i] == '\n' {
+			if line := s[start:i]; line != "" {
+				out = append(out, line)
+			}
+			start = i + 1
+		}
+	}
+	if last := s[start:]; last != "" {
+		out = append(out, last)
+	}
+	return out
+}
--- a/ingestion/internal/pipeline/resolve.go
+++ b/ingestion/internal/pipeline/resolve.go
@@ -0,0 +1,88 @@
+// ingestion/internal/pipeline/resolve.go
+package pipeline
+
+import (
+	"path/filepath"
+	"strings"
+
+	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
+)
+
+// Resolve remaps a proposed page to an existing slug when its normalized title matches an entry's title or alias.
+// It only matches within the same page type (entities→entities, concepts→concepts).
+// Pages with no inventory match are returned unchanged.
+func Resolve(proposed []wiki.Page, inventory map[wiki.PageType][]wiki.Entry) []wiki.Page {
+	type key struct {
+		pt         wiki.PageType
+		normalized string
+	}
+	lookup := make(map[key]string) // key → canonical slug
+	for pt, entries := range inventory {
+		for _, e := range entries {
+			k := key{pt: pt, normalized: normalizeTitle(e.Title)}
+			lookup[k] = e.Slug
+			for _, alias := range e.Aliases {
+				ak := key{pt: pt, normalized: normalizeTitle(alias)}
+				if _, exists := lookup[ak]; !exists {
+					lookup[ak] = e.Slug
+				}
+			}
+		}
+	}
+
+	out := make([]wiki.Page, 0, len(proposed))
+	for _, page := range proposed {
+		pt := pageTypeFromPath(page.Path)
+		title := extractTitle(page.Content)
+		k := key{pt: pt, normalized: normalizeTitle(title)}
+		if canonicalSlug, ok := lookup[k]; ok {
+			dir := filepath.ToSlash(filepath.Dir(page.Path))
+			page.Path = dir + "/" + canonicalSlug + ".md"
+		}
+		out = append(out, page)
+	}
+	return out
+}
+
+// normalizeTitle lowercases, strips leading articles, maps hyphens to spaces, collapses whitespace.
+// "The Shape-Up Method" → "shape up method"
+func normalizeTitle(s string) string {
+	s = strings.ToLower(strings.TrimSpace(s))
+	for _, article := range []string{"the ", "a ", "an "} {
+		s = strings.TrimPrefix(s, article)
+	}
+	s = strings.ReplaceAll(s, "-", " ")
+	return strings.Join(strings.Fields(s), " ")
+}
+
+// pageTypeFromPath extracts the wiki.PageType from a path like "wiki/entities/foo.md".
+func pageTypeFromPath(path string) wiki.PageType {
+	parts := strings.Split(filepath.ToSlash(path), "/")
+	if len(parts) >= 2 {
+		return wiki.PageType(parts[1])
+	}
+	return ""
+}
+
+// extractTitle reads the title field from YAML frontmatter near the top of content
+// (only the first 30 SplitN segments are scanned). Returns "" if no title is found.
+func extractTitle(content string) string {
+	lines := strings.SplitN(content, "\n", 30)
+	inFM := false
+	for _, line := range lines {
+		if strings.TrimSpace(line) == "---" {
+			if !inFM {
+				inFM = true
+				continue
+			}
+			break
+		}
+		if inFM {
+			key, val, ok := strings.Cut(line, ":")
+			if ok && strings.TrimSpace(key) == "title" {
+				return strings.Trim(strings.TrimSpace(val), `"'`)
+			}
+		}
+	}
+	return ""
+}
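Resolve's matching is an exact lookup on a normalized key, scoped by page type, rather than a fuzzy comparison. The sketch below shows that lookup under the assumption that page types are the plural directory names. `key` mirrors the struct declared inside Resolve, and `normalizeTitle` copies the rules above.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeTitle lowercases and trims, strips the prefixes "the ", "a ", "an "
// (checked in that order), maps hyphens to spaces and collapses whitespace.
func normalizeTitle(s string) string {
	s = strings.ToLower(strings.TrimSpace(s))
	for _, article := range []string{"the ", "a ", "an "} {
		s = strings.TrimPrefix(s, article)
	}
	s = strings.ReplaceAll(s, "-", " ")
	return strings.Join(strings.Fields(s), " ")
}

// key scopes matches to one page type, so an entity "Ryan Singer" never
// captures a concept page that happens to share the title.
type key struct {
	pageType   string
	normalized string
}

func main() {
	lookup := map[key]string{
		{pageType: "concepts", normalized: normalizeTitle("Shape Up Method")}: "shape-up-method",
		{pageType: "entities", normalized: normalizeTitle("Ryan Singer")}:     "ryan-singer",
	}
	queries := []key{
		{pageType: "concepts", normalized: normalizeTitle("The Shape-Up  Method")},
		{pageType: "concepts", normalized: normalizeTitle("Ryan Singer")}, // wrong type
	}
	for _, q := range queries {
		slug, ok := lookup[q]
		fmt.Printf("%s %q -> %q (found=%v)\n", q.pageType, q.normalized, slug, ok)
	}
	// concepts "shape up method" -> "shape-up-method" (found=true)
	// concepts "ryan singer" -> "" (found=false)
}
```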
--- a/ingestion/internal/pipeline/resolve_test.go
+++ b/ingestion/internal/pipeline/resolve_test.go
@@ -0,0 +1,90 @@
+// ingestion/internal/pipeline/resolve_test.go
+package pipeline
+
+import (
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+
+	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
+)
+
+func TestResolve_NoMatch(t *testing.T) {
+	proposed := []wiki.Page{
+		{Path: "wiki/entities/new-person.md", Content: "---\ntitle: New Person\n---\n"},
+	}
+	inventory := map[wiki.PageType][]wiki.Entry{
+		wiki.PageTypeEntity: {
+			{Slug: "ryan-singer", Title: "Ryan Singer", Aliases: []string{"Singer"}},
+		},
+	}
+	got := Resolve(proposed, inventory)
+	assert.Len(t, got, 1)
+	assert.Equal(t, "wiki/entities/new-person.md", got[0].Path)
+}
+
+func TestResolve_TitleMatchRedirectsSlug(t *testing.T) {
+	proposed := []wiki.Page{
+		{Path: "wiki/entities/ryan-singer-the-designer.md", Content: "---\ntitle: Ryan Singer\n---\n"},
+	}
+	inventory := map[wiki.PageType][]wiki.Entry{
+		wiki.PageTypeEntity: {
+			{Slug: "ryan-singer", Title: "Ryan Singer", Aliases: nil},
+		},
+	}
+	got := Resolve(proposed, inventory)
+	assert.Len(t, got, 1)
+	assert.Equal(t, "wiki/entities/ryan-singer.md", got[0].Path)
+}
+
+func TestResolve_AliasMatchRedirectsSlug(t *testing.T) {
+	proposed := []wiki.Page{
+		{Path: "wiki/entities/singer.md", Content: "---\ntitle: Singer\n---\n"},
+	}
+	inventory := map[wiki.PageType][]wiki.Entry{
+		wiki.PageTypeEntity: {
+			{Slug: "ryan-singer", Title: "Ryan Singer", Aliases: []string{"Singer", "R. Singer"}},
+		},
+	}
+	got := Resolve(proposed, inventory)
+	assert.Len(t, got, 1)
+	assert.Equal(t, "wiki/entities/ryan-singer.md", got[0].Path)
+}
+
+func TestResolve_NormalizationCaseAndArticles(t *testing.T) {
+	proposed := []wiki.Page{
+		{Path: "wiki/concepts/the-shape-up-method.md", Content: "---\ntitle: The Shape Up Method\n---\n"},
+	}
+	inventory := map[wiki.PageType][]wiki.Entry{
+		wiki.PageTypeConcept: {
+			{Slug: "shape-up-method", Title: "Shape Up Method", Aliases: nil},
+		},
+	}
+	got := Resolve(proposed, inventory)
+	assert.Len(t, got, 1)
+	assert.Equal(t, "wiki/concepts/shape-up-method.md", got[0].Path)
+}
+
+func TestResolve_OnlyMatchesSamePageType(t *testing.T) {
+	proposed := []wiki.Page{
+		{Path: "wiki/concepts/ryan-singer.md", Content: "---\ntitle: Ryan Singer\n---\n"},
+	}
+	inventory := map[wiki.PageType][]wiki.Entry{
+		wiki.PageTypeEntity: {
+			{Slug: "ryan-singer", Title: "Ryan Singer", Aliases: nil},
+		},
+		wiki.PageTypeConcept: {},
+	}
+	got := Resolve(proposed, inventory)
+	assert.Len(t, got, 1)
+	assert.Equal(t, "wiki/concepts/ryan-singer.md", got[0].Path)
+}
+
+func TestResolve_EmptyInventory(t *testing.T) {
+	proposed := []wiki.Page{
+		{Path: "wiki/entities/first.md", Content: "---\ntitle: First\n---\n"},
+	}
+	inventory := map[wiki.PageType][]wiki.Entry{}
+	got := Resolve(proposed, inventory)
+	assert.Equal(t, proposed, got)
+}
--- a/ingestion/internal/search/search.go
+++ b/ingestion/internal/search/search.go
@@ -33,7 +33,12 @@ func Query(brainDir, query string, limit int) ([]Result, error) {

 	var results []Result

-	err := filepath.WalkDir(filepath.Join(brainDir, "knowledge"), func(path string, d os.DirEntry, err error) error {
+	for _, subdir := range []string{"knowledge", "wiki"} {
+		dir := filepath.Join(brainDir, subdir)
+		if _, statErr := os.Stat(dir); os.IsNotExist(statErr) {
+			continue
+		}
+		err := filepath.WalkDir(dir, func(path string, d os.DirEntry, err error) error {
 			if err != nil {
 				slog.Warn("search: skipping path", "path", path, "err", err)
 				return nil
@@ -74,6 +79,7 @@ func Query(brainDir, query string, limit int) ([]Result, error) {
 		if err != nil {
 			return nil, err
 		}
+	}

 	sort.Slice(results, func(i, j int) bool {
 		return results[i].Score > results[j].Score
--- a/ingestion/internal/watcher/watcher.go
+++ b/ingestion/internal/watcher/watcher.go
@@ -0,0 +1,210 @@
+// ingestion/internal/watcher/watcher.go
+package watcher
+
+import (
+	"context"
+	"fmt"
+	"io"
+	"log/slog"
+	"os"
+	"path/filepath"
+	"strings"
+	"time"
+	"unicode"
+
+	"github.com/mathiasbq/hyperguild/ingestion/internal/extract"
+	"github.com/mathiasbq/hyperguild/ingestion/internal/pipeline"
+)
+
+// Config holds watcher configuration.
+type Config struct {
+	BrainDir string
+	Interval time.Duration
+	Pipeline pipeline.Config
+}
+
+// Start launches the watcher in a background goroutine.
+// It returns immediately. The watcher stops when ctx is cancelled.
+func Start(ctx context.Context, cfg Config) {
+	go func() {
+		ticker := time.NewTicker(cfg.Interval)
+		defer ticker.Stop()
+		for {
+			select {
+			case <-ctx.Done():
+				return
+			case <-ticker.C:
+				date := time.Now().UTC().Format("2006-01-02")
+				errs := processDir(ctx, cfg, date)
+				for _, err := range errs {
+					slog.Error("watcher: error processing file", "error", err)
+				}
+			}
+		}
+	}()
+}
+
+// processDir walks brain/raw/, processes each eligible file, returns any errors encountered.
+func processDir(ctx context.Context, cfg Config, date string) []error {
+	rawDir := filepath.Join(cfg.BrainDir, "raw")
+
+	var errs []error
+	err := filepath.WalkDir(rawDir, func(path string, d os.DirEntry, err error) error {
+		if err != nil {
+			return err
+		}
+
+		// Skip the root itself.
+		if path == rawDir {
+			return nil
+		}
+
+		// Skip processed/ and failed/ subdirectories entirely.
+		if d.IsDir() {
+			name := d.Name()
+			if name == "processed" || name == "failed" {
+				return filepath.SkipDir
+			}
+			return nil
+		}
+
+		// Only process supported extensions.
+		ext := strings.ToLower(filepath.Ext(path))
+		if ext != ".md" && ext != ".txt" && ext != ".pdf" {
+			return nil
+		}
+
+		// Skip files that have already been processed or permanently failed.
+		if _, err := os.Stat(path + ".processed"); err == nil {
+			return nil
+		}
+		if _, err := os.Stat(path + ".failed"); err == nil {
+			return nil
+		}
+
+		if err := processFile(ctx, cfg, path, date); err != nil {
+			errs = append(errs, fmt.Errorf("process %s: %w", filepath.Base(path), err))
+		}
+		return nil
+	})
+	if err != nil {
+		errs = append(errs, fmt.Errorf("walk raw dir: %w", err))
+	}
+	return errs
+}
+
+// processFile reads a file, calls pipeline.Run, copies it to processed/ or failed/,
+// and writes a marker file next to the original so the watcher skips it next poll.
+// The original file is never deleted, keeping Syncthing-connected vaults (e.g. Obsidian) intact.
+func processFile(ctx context.Context, cfg Config, path, date string) error {
+	filename := filepath.Base(path)
+	source := deriveSource(filename)
+
+	content, err := extract.Text(path)
+	if err != nil {
+		return fmt.Errorf("extract text: %w", err)
+	}
+
+	_, runErr := pipeline.Run(ctx, cfg.Pipeline, cfg.BrainDir, content, source, false)
+	if runErr != nil {
+		// Copy to failed/ and leave a .failed marker so we don't retry.
+		failedDir := filepath.Join(cfg.BrainDir, "raw", "failed")
+		if mkErr := os.MkdirAll(failedDir, 0o755); mkErr != nil {
+			return fmt.Errorf("mkdir failed dir: %w", mkErr)
+		}
+		dest := filepath.Join(failedDir, filename)
+		if cpErr := copyFile(path, dest); cpErr != nil {
+			return fmt.Errorf("copy to failed: %w", cpErr)
+		}
+		if wErr := os.WriteFile(path+".failed", []byte(runErr.Error()), 0o644); wErr != nil {
+			slog.Error("watcher: failed to write .failed marker", "error", wErr)
+		}
+
+		slog.Warn("watcher: file failed", "file", filename, "error", runErr)
+
+		if logErr := appendWatcherLog(cfg.BrainDir, filename, runErr, date); logErr != nil {
+			slog.Error("watcher: failed to write log entry", "error", logErr)
+		}
+		// Return nil: quarantine succeeded; error already logged.
+		return nil
+	}
+
+	// Copy to processed/YYYY-MM-DD/ and leave a .processed marker so we don't re-ingest.
+	processedDir := filepath.Join(cfg.BrainDir, "raw", "processed", date)
+	if err := os.MkdirAll(processedDir, 0o755); err != nil {
+		return fmt.Errorf("mkdir processed dir: %w", err)
+	}
+	dest := filepath.Join(processedDir, filename)
+	if _, err := os.Stat(dest); err == nil {
+		// Archive copy already exists; append timestamp to avoid overwriting.
+		ext := filepath.Ext(filename)
+		base := strings.TrimSuffix(filename, ext)
+		dest = filepath.Join(processedDir, base+"-"+time.Now().UTC().Format("150405")+ext)
+	}
+	if err := copyFile(path, dest); err != nil {
+		return fmt.Errorf("copy to processed: %w", err)
+	}
+	if err := os.WriteFile(path+".processed", []byte(date), 0o644); err != nil {
+		slog.Error("watcher: failed to write .processed marker", "error", err)
+	}
+
+	slog.Info("watcher: file processed", "file", filename, "source", source)
+	return nil
+}
+
+// copyFile copies src to dst, creating dst if it doesn't exist.
+func copyFile(src, dst string) error {
+	in, err := os.Open(src)
+	if err != nil {
+		return fmt.Errorf("open src: %w", err)
+	}
+	defer in.Close() //nolint:errcheck
+
+	out, err := os.Create(dst)
+	if err != nil {
+		return fmt.Errorf("create dst: %w", err)
+	}
+
+	if _, err := io.Copy(out, in); err != nil {
+		out.Close() //nolint:errcheck
+		return fmt.Errorf("copy: %w", err)
+	}
+	return out.Close()
+}
+
+// deriveSource turns a filename into a human-readable source name.
+// "shape-up-book.md" → "Shape Up Book"
+func deriveSource(filename string) string {
+	// Strip extension.
+	name := strings.TrimSuffix(filename, filepath.Ext(filename))
+	// Split on hyphens.
+	words := strings.Split(name, "-")
+	// Title-case each word.
+	for i, w := range words {
+		if w == "" {
+			continue
+		}
+		runes := []rune(w)
+		runes[0] = unicode.ToUpper(runes[0])
+		words[i] = string(runes)
+	}
+	return strings.Join(words, " ")
+}
+
+// appendWatcherLog appends a watcher error entry to brain/log.md.
+func appendWatcherLog(brainDir, filename string, runErr error, date string) error {
+	entry := fmt.Sprintf("## %s — watcher error\n\n- **File:** %s\n- **Error:** %s\n\n",
+		date, filename, runErr.Error())
+
+	logPath := filepath.Join(brainDir, "log.md")
+	f, err := os.OpenFile(logPath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
+	if err != nil {
+		return fmt.Errorf("open log: %w", err)
+	}
+
+	if _, err = f.WriteString(entry); err != nil {
+		f.Close() //nolint:errcheck
+		return fmt.Errorf("write log: %w", err)
+	}
+	return f.Close()
+}
--- a/ingestion/internal/watcher/watcher_test.go
+++ b/ingestion/internal/watcher/watcher_test.go
@@ -0,0 +1,231 @@
+// ingestion/internal/watcher/watcher_test.go
+package watcher
+
+import (
+	"context"
+	"encoding/json"
+	"fmt"
+	"os"
+	"path/filepath"
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+
+	"github.com/mathiasbq/hyperguild/ingestion/internal/pipeline"
+)
+
+// successComplete returns a CompleteFunc that answers every call with a JSON array holding raw.
+func successComplete(raw pipeline.RawPage) pipeline.CompleteFunc {
+	return func(ctx context.Context, system, user string) (string, error) {
+		b, err := json.Marshal([]pipeline.RawPage{raw})
+		if err != nil {
+			return "", err
+		}
+		return string(b), nil
+	}
+}
+
+// errorComplete always returns an error simulating an LLM failure.
+func errorComplete(_ context.Context, _, _ string) (string, error) {
+	return "", fmt.Errorf("LLM unavailable")
+}
+
+func setupBrainDir(t *testing.T) string {
+	t.Helper()
+	brainDir := t.TempDir()
+	for _, sub := range []string{"wiki/concepts", "wiki/entities", "wiki/sources", "raw"} {
+		require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755))
+	}
+	return brainDir
+}
+
+func TestStart_ProcessesFile(t *testing.T) {
+	brainDir := setupBrainDir(t)
+
+	// Place a .md file in raw/.
+	rawFile := filepath.Join(brainDir, "raw", "shape-up-book.md")
+	require.NoError(t, os.WriteFile(rawFile, []byte("Content about Shape Up."), 0o644))
+
+	date := time.Now().UTC().Format("2006-01-02")
+	rawPage := pipeline.RawPage{
+		Title:   "Shape Up Book",
+		Type:    "source",
+		Subtype: "article",
+		Domain:  "product-management",
+		Content: "## Summary\n\nA book about Shape Up.\n",
+	}
+
+	cfg := Config{
+		BrainDir: brainDir,
+		Interval: 50 * time.Millisecond,
+		Pipeline: pipeline.Config{
+			Complete:  successComplete(rawPage),
+			ChunkSize: 0,
+			Schema:    "# Schema\nThree page types.",
+		},
+	}
+
+	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
+	defer cancel()
+
+	Start(ctx, cfg)
+
+	// Poll until the file is copied to processed/<date>/.
+	processedPath := filepath.Join(brainDir, "raw", "processed", date, "shape-up-book.md")
+	var found bool
+	deadline := time.Now().Add(2 * time.Second)
+	for time.Now().Before(deadline) {
+		if _, err := os.Stat(processedPath); err == nil {
+			found = true
+			break
+		}
+		time.Sleep(20 * time.Millisecond)
+	}
+	require.True(t, found, "file should be copied to processed/")
+
+	// Original file should still exist (copy, not move — keeps Obsidian vault intact).
+	_, err := os.Stat(rawFile)
+	assert.NoError(t, err, "original file should remain in raw/")
+
+	// A .processed marker should exist next to the original.
+	_, err = os.Stat(rawFile + ".processed")
+	assert.NoError(t, err, ".processed marker should be written")
+
+	// Wiki page should exist.
+	wikiPath := filepath.Join(brainDir, "wiki", "sources", "shape-up-book.md")
+	_, err = os.Stat(wikiPath)
+	assert.NoError(t, err, "wiki page should be written")
+
+	// log.md should contain an ingest record.
+	logContent, err := os.ReadFile(filepath.Join(brainDir, "log.md"))
+	require.NoError(t, err)
+	assert.Contains(t, string(logContent), "— ingest")
+}
+
+func TestStart_CopiesToFailedOnError(t *testing.T) {
+	brainDir := setupBrainDir(t)
+
+	rawFile := filepath.Join(brainDir, "raw", "bad-file.md")
+	require.NoError(t, os.WriteFile(rawFile, []byte("Some content."), 0o644))
+
+	cfg := Config{
+		BrainDir: brainDir,
+		Interval: 50 * time.Millisecond,
+		Pipeline: pipeline.Config{
+			Complete:  errorComplete,
+			ChunkSize: 0,
+			Schema:    "# Schema\nThree page types.",
+		},
+	}
+
+	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
+	defer cancel()
+
+	Start(ctx, cfg)
+
+	// Poll until the file is copied to failed/.
+	failedPath := filepath.Join(brainDir, "raw", "failed", "bad-file.md")
+	var found bool
+	deadline := time.Now().Add(2 * time.Second)
+	for time.Now().Before(deadline) {
+		if _, err := os.Stat(failedPath); err == nil {
+			found = true
+			break
+		}
+		time.Sleep(20 * time.Millisecond)
+	}
+	require.True(t, found, "file should be copied to failed/")
+
+	// Original file should still exist (copy, not move — keeps Obsidian vault intact).
+	_, err := os.Stat(rawFile)
+	assert.NoError(t, err, "original file should remain in raw/")
+
+	// A .failed marker should exist next to the original.
+	_, err = os.Stat(rawFile + ".failed")
+	assert.NoError(t, err, ".failed marker should be written")
+
+	// log.md should contain a watcher error entry.
+	logContent, err := os.ReadFile(filepath.Join(brainDir, "log.md"))
+	require.NoError(t, err)
+	assert.Contains(t, string(logContent), "— watcher error")
+	assert.Contains(t, string(logContent), "bad-file.md")
+}
+
+func TestDeriveSource(t *testing.T) {
+	tests := []struct {
+		filename string
+		want     string
+	}{
+		{"shape-up-book.md", "Shape Up Book"},
+		{"raft-consensus.txt", "Raft Consensus"},
+		{"my-note.md", "My Note"},
+		{"single.md", "Single"},
+		{"no-extension", "No Extension"},
+	}
+
+	for _, tc := range tests {
+		t.Run(tc.filename, func(t *testing.T) {
+			got := deriveSource(tc.filename)
+			assert.Equal(t, tc.want, got)
+		})
+	}
+}
+
+func TestProcessDir_SkipsSubdirs(t *testing.T) {
+	brainDir := setupBrainDir(t)
+
+	// Create processed/ and failed/ subdirs with files inside.
+	for _, sub := range []string{"processed/2026-04-22", "failed"} {
+		require.NoError(t, os.MkdirAll(filepath.Join(brainDir, "raw", sub), 0o755))
+	}
+
+	processedFile := filepath.Join(brainDir, "raw", "processed", "2026-04-22", "old-file.md")
+	failedFile := filepath.Join(brainDir, "raw", "failed", "broken-file.md")
+	require.NoError(t, os.WriteFile(processedFile, []byte("old"), 0o644))
+	require.NoError(t, os.WriteFile(failedFile, []byte("broken"), 0o644))
+
+	// Also place a valid file in raw/ root that should be processed.
+	validFile := filepath.Join(brainDir, "raw", "valid.md")
+	require.NoError(t, os.WriteFile(validFile, []byte("valid content"), 0o644))
+
+	date := time.Now().UTC().Format("2006-01-02")
+
+	// Count Complete invocations; each call appends one entry.
+	var processedSources []string
+	completeFn := func(ctx context.Context, system, user string) (string, error) {
+		// Record that this was called; return a minimal valid RawPage.
+		raw := pipeline.RawPage{
+			Title:   "Valid",
+			Type:    "source",
+			Subtype: "article",
+			Content: "## Summary\n\nValid.\n",
+		}
+		b, _ := json.Marshal([]pipeline.RawPage{raw})
+		processedSources = append(processedSources, "called")
+		return string(b), nil
+	}
+
+	cfg := Config{
+		BrainDir: brainDir,
+		Interval: time.Hour, // not used; we call processDir directly
+		Pipeline: pipeline.Config{
+			Complete:  completeFn,
+			ChunkSize: 0,
+			Schema:    "# Schema\nThree page types.",
+		},
+	}
+
+	errs := processDir(context.Background(), cfg, date)
+	assert.Empty(t, errs, "no errors expected")
+
+	// Complete should have been called exactly once (for valid.md, not for files in subdirs).
+	assert.Len(t, processedSources, 1, "only the file in raw/ root should be processed")
+
+	// Files in processed/ and failed/ must remain untouched.
+	_, err := os.Stat(processedFile)
+	assert.NoError(t, err, "processed subdir file should be untouched")
+	_, err = os.Stat(failedFile)
+	assert.NoError(t, err, "failed subdir file should be untouched")
+}
--- a/ingestion/internal/wiki/index.go
+++ b/ingestion/internal/wiki/index.go
@@ -0,0 +1,71 @@
+// ingestion/internal/wiki/index.go
+package wiki
+
+import (
+	"fmt"
+	"os"
+	"path/filepath"
+	"strings"
+)
+
+// RebuildIndex writes brain/wiki/index.md from the current wiki contents.
+func RebuildIndex(brainDir, date string) error {
+	inv, err := LoadInventory(brainDir)
+	if err != nil {
+		return fmt.Errorf("load inventory: %w", err)
+	}
+
+	total := len(inv[PageTypeConcept]) + len(inv[PageTypeEntity]) + len(inv[PageTypeSource])
+	var sb strings.Builder
+	sb.WriteString("# Wiki Index\n\n")
+	fmt.Fprintf(&sb, "_Updated: %s — %d pages (%d concepts, %d entities, %d sources)_\n\n",
+		date, total,
+		len(inv[PageTypeConcept]),
+		len(inv[PageTypeEntity]),
+		len(inv[PageTypeSource]))
+
+	for _, pt := range []PageType{PageTypeConcept, PageTypeEntity, PageTypeSource} {
+		entries := inv[pt]
+		if len(entries) == 0 {
+			continue
+		}
+		label := strings.ToUpper(string(pt)[:1]) + string(pt)[1:]
+		fmt.Fprintf(&sb, "## %s\n\n", label)
+		for _, e := range entries {
+			summary := pageFirstSentence(brainDir, e)
+			if summary != "" {
+				fmt.Fprintf(&sb, "- [[%s|%s]] — %s\n", e.Slug, e.Title, summary)
+			} else {
+				fmt.Fprintf(&sb, "- [[%s|%s]]\n", e.Slug, e.Title)
+			}
+		}
+		sb.WriteString("\n")
+	}
+
+	dest := filepath.Join(brainDir, "wiki", "index.md")
+	return os.WriteFile(dest, []byte(sb.String()), 0o644)
+}
+
+func pageFirstSentence(brainDir string, e Entry) string {
+	path := filepath.Join(brainDir, "wiki", string(e.Type), e.Slug+".md")
+	content, err := os.ReadFile(path)
+	if err != nil {
+		return ""
+	}
+	body := string(content)
+	// Strip YAML frontmatter only when the file actually opens with "---".
+	if parts := strings.SplitN(body, "---", 3); strings.HasPrefix(body, "---") && len(parts) == 3 {
+		body = parts[2]
+	}
+	for _, line := range strings.Split(body, "\n") {
+		line = strings.TrimSpace(line)
+		if line == "" || strings.HasPrefix(line, "#") {
+			continue
+		}
+		if r := []rune(line); len(r) > 100 { // count runes, not bytes, so the cut stays valid UTF-8
+			return string(r[:100]) + "…"
+		}
+		return line
+	}
+	return ""
+}
--- a/ingestion/internal/wiki/index_test.go
+++ b/ingestion/internal/wiki/index_test.go
@@ -0,0 +1,76 @@
+// ingestion/internal/wiki/index_test.go
+package wiki
+
+import (
+	"os"
+	"path/filepath"
+	"strings"
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func setupWikiDir(t *testing.T) string {
+	t.Helper()
+	dir := t.TempDir()
+	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "concepts"), 0o755))
+	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "entities"), 0o755))
+	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "sources"), 0o755))
+	require.NoError(t, os.WriteFile(
+		filepath.Join(dir, "wiki", "concepts", "tdd.md"),
+		[]byte("---\ntitle: TDD\n---\n\n## Definition\n\nTest-driven development is a discipline.\n"),
+		0o644,
+	))
+	return dir
+}
+
+func TestRebuildIndex(t *testing.T) {
+	dir := setupWikiDir(t)
+	require.NoError(t, RebuildIndex(dir, "2026-04-22"))
+
+	content, err := os.ReadFile(filepath.Join(dir, "wiki", "index.md"))
+	require.NoError(t, err)
+	s := string(content)
+	assert.Contains(t, s, "# Wiki Index")
+	assert.Contains(t, s, "2026-04-22")
+	assert.Contains(t, s, "[[tdd|TDD]]")
+	assert.Contains(t, s, "## Concepts")
+}
+
+func TestRebuildIndex_EmptyWiki(t *testing.T) {
+	dir := t.TempDir()
+	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "concepts"), 0o755))
+	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "entities"), 0o755))
+	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "sources"), 0o755))
+
+	require.NoError(t, RebuildIndex(dir, "2026-04-22"))
+	content, err := os.ReadFile(filepath.Join(dir, "wiki", "index.md"))
+	require.NoError(t, err)
+	assert.Contains(t, string(content), "# Wiki Index")
+}
+
+func TestAppendLog(t *testing.T) {
+	dir := t.TempDir()
+	require.NoError(t, AppendLog(dir, "shape-up-book",
+		[]string{"wiki/sources/shape-up.md", "wiki/concepts/betting-table.md"},
+		nil, "2026-04-22"))
+
+	content, err := os.ReadFile(filepath.Join(dir, "log.md"))
+	require.NoError(t, err)
+	s := string(content)
+	assert.Contains(t, s, "shape-up-book")
+	assert.Contains(t, s, "wiki/sources/shape-up.md")
+	assert.True(t, strings.HasPrefix(s, "## 2026-04-22"))
+}
+
+func TestAppendLog_AppendsOnSecondCall(t *testing.T) {
+	dir := t.TempDir()
+	require.NoError(t, AppendLog(dir, "source-a", []string{"wiki/sources/a.md"}, nil, "2026-04-22"))
+	require.NoError(t, AppendLog(dir, "source-b", []string{"wiki/sources/b.md"}, nil, "2026-04-22"))
+
+	content, err := os.ReadFile(filepath.Join(dir, "log.md"))
+	require.NoError(t, err)
+	assert.Contains(t, string(content), "source-a")
+	assert.Contains(t, string(content), "source-b")
+}
--- a/ingestion/internal/wiki/inventory.go
+++ b/ingestion/internal/wiki/inventory.go
@@ -0,0 +1,90 @@
+// ingestion/internal/wiki/inventory.go
+package wiki
+
+import (
+	"bufio"
+	"fmt"
+	"os"
+	"path/filepath"
+	"strings"
+)
+
+// LoadInventory walks brain/wiki/ and returns all pages grouped by type.
+// Missing subdirectories are silently skipped.
+func LoadInventory(brainDir string) (map[PageType][]Entry, error) {
+	result := map[PageType][]Entry{
+		PageTypeConcept: {},
+		PageTypeEntity:  {},
+		PageTypeSource:  {},
+	}
+	for pt := range result {
+		dir := filepath.Join(brainDir, "wiki", string(pt))
+		entries, err := os.ReadDir(dir)
+		if os.IsNotExist(err) {
+			continue
+		}
+		if err != nil {
+			return nil, fmt.Errorf("read dir %s: %w", dir, err)
+		}
+		for _, e := range entries {
+			if e.IsDir() || !strings.HasSuffix(e.Name(), ".md") {
+				continue
+			}
+			slug := strings.TrimSuffix(e.Name(), ".md")
+			path := filepath.Join(dir, e.Name())
+			title, aliases := readFrontmatter(path, slug)
+			result[pt] = append(result[pt], Entry{Slug: slug, Title: title, Aliases: aliases, Type: pt})
+		}
+	}
+	return result, nil
+}
+
+// readFrontmatter extracts title and aliases from YAML frontmatter.
+// Falls back to slug for title and empty aliases on any error.
+func readFrontmatter(path, fallbackSlug string) (title string, aliases []string) {
+	title = fallbackSlug
+	f, err := os.Open(path)
+	if err != nil {
+		return
+	}
+	defer f.Close() //nolint:errcheck
+
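+	scanner := bufio.NewScanner(f)
+
+	// Frontmatter is only recognized when "---" is the very first line.
+	// A "---" further down is a Markdown horizontal rule, not a delimiter,
+	// so a file without leading frontmatter keeps the slug as its title.
+	if !scanner.Scan() || strings.TrimSpace(scanner.Text()) != "---" {
+		return
+	}
+
+	inAliases := false
+	for scanner.Scan() {
+		line := scanner.Text()
+		if strings.TrimSpace(line) == "---" {
+			break // closing delimiter: end of frontmatter
+		}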
+
+		// Inside an aliases block, collect YAML list items ("- value", any indentation).
+		if inAliases {
+			trimmed := strings.TrimSpace(line)
+			if strings.HasPrefix(trimmed, "- ") {
+				aliases = append(aliases, strings.Trim(strings.TrimPrefix(trimmed, "- "), `"'`))
+				continue
+			}
+			inAliases = false // end of alias block
+		}
+
+		key, val, ok := strings.Cut(line, ":")
+		if !ok {
+			continue
+		}
+		switch strings.TrimSpace(key) {
+		case "title":
+			title = strings.Trim(strings.TrimSpace(val), `"'`)
+		case "aliases":
+			inAliases = true
+		}
+	}
+	return
+}
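The commit message lists link canonicalization among the pipeline's jobs, and `LoadInventory` supplies the titles and aliases that step would need. The pipeline code is not in this excerpt. The following is therefore a hypothetical sketch of a case-insensitive name-to-slug lookup. It copies the relevant `Entry` fields so it compiles on its own and assumes that the first page claiming a name wins.

```go
package main

import "strings"

// Entry mirrors the fields of wiki.Entry needed here (copied so the sketch
// compiles on its own).
type Entry struct {
	Slug    string
	Title   string
	Aliases []string
}

// buildLinkIndex maps the lower-cased title and every alias of each entry to
// its slug. When two pages claim the same name, the first one seen wins.
func buildLinkIndex(entries []Entry) map[string]string {
	idx := make(map[string]string)
	for _, e := range entries {
		names := append([]string{e.Title}, e.Aliases...)
		for _, n := range names {
			key := strings.ToLower(strings.TrimSpace(n))
			if key == "" {
				continue
			}
			if _, taken := idx[key]; !taken {
				idx[key] = e.Slug
			}
		}
	}
	return idx
}
```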
--- a/ingestion/internal/wiki/inventory_test.go
+++ b/ingestion/internal/wiki/inventory_test.go
@@ -0,0 +1,83 @@
+// ingestion/internal/wiki/inventory_test.go
+package wiki
+
+import (
+	"os"
+	"path/filepath"
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func TestLoadInventory(t *testing.T) {
+	dir := t.TempDir()
+	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "concepts"), 0o755))
+	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "entities"), 0o755))
+	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "sources"), 0o755))
+
+	require.NoError(t, os.WriteFile(
+		filepath.Join(dir, "wiki", "concepts", "domain-driven-design.md"),
+		[]byte("---\ntitle: Domain Driven Design\n---\n\n## Definition\n\nA thing.\n"),
+		0o644,
+	))
+	require.NoError(t, os.WriteFile(
+		filepath.Join(dir, "wiki", "entities", "ryan-singer.md"),
+		[]byte("---\ntitle: Ryan Singer\n---\n\n## Description\n\nDesigner.\n"),
+		0o644,
+	))
+
+	inv, err := LoadInventory(dir)
+	require.NoError(t, err)
+
+	assert.Len(t, inv[PageTypeConcept], 1)
+	assert.Equal(t, "domain-driven-design", inv[PageTypeConcept][0].Slug)
+	assert.Equal(t, "Domain Driven Design", inv[PageTypeConcept][0].Title)
+
+	assert.Len(t, inv[PageTypeEntity], 1)
+	assert.Equal(t, "ryan-singer", inv[PageTypeEntity][0].Slug)
+
+	assert.Empty(t, inv[PageTypeSource])
+}
+
+func TestLoadInventory_EmptyDirs(t *testing.T) {
+	dir := t.TempDir()
+	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "concepts"), 0o755))
+	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "entities"), 0o755))
+	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "sources"), 0o755))
+
+	inv, err := LoadInventory(dir)
+	require.NoError(t, err)
+	assert.Empty(t, inv[PageTypeConcept])
+	assert.Empty(t, inv[PageTypeEntity])
+	assert.Empty(t, inv[PageTypeSource])
+}
+
+func TestLoadInventory_MissingDirsOk(t *testing.T) {
+	dir := t.TempDir()
+	// No wiki/ subdirs at all
+	inv, err := LoadInventory(dir)
+	require.NoError(t, err)
+	assert.NotNil(t, inv)
+}
+
+func TestLoadInventory_ReadsAliases(t *testing.T) {
+	dir := t.TempDir()
+	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "entities"), 0o755))
+	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "concepts"), 0o755))
+	require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "sources"), 0o755))
+
+	require.NoError(t, os.WriteFile(
+		filepath.Join(dir, "wiki", "entities", "ryan-singer.md"),
+		[]byte("---\ntitle: Ryan Singer\naliases:\n  - Singer\n  - R. Singer\n---\n\n## Description\n\nDesigner.\n"),
+		0o644,
+	))
+
+	inv, err := LoadInventory(dir)
+	require.NoError(t, err)
+
+	require.Len(t, inv[PageTypeEntity], 1)
+	e := inv[PageTypeEntity][0]
+	assert.Equal(t, "Ryan Singer", e.Title)
+	assert.Equal(t, []string{"Singer", "R. Singer"}, e.Aliases)
+}
--- a/ingestion/internal/wiki/log.go
+++ b/ingestion/internal/wiki/log.go
@@ -0,0 +1,40 @@
+// ingestion/internal/wiki/log.go
+package wiki
+
+import (
+	"fmt"
+	"os"
+	"path/filepath"
+	"strings"
+)
+
+// AppendLog appends one ingestion record to brain/log.md.
+func AppendLog(brainDir, source string, pages, warnings []string, date string) error {
+	var sb strings.Builder
+	fmt.Fprintf(&sb, "## %s — ingest\n\n", date)
+	fmt.Fprintf(&sb, "- **Source:** %s\n", source)
+	if len(pages) > 0 {
+		sb.WriteString("- **Pages written:**\n")
+		for _, p := range pages {
+			fmt.Fprintf(&sb, "  - %s\n", p)
+		}
+	}
+	if len(warnings) > 0 {
+		sb.WriteString("- **Warnings:**\n")
+		for _, w := range warnings {
+			fmt.Fprintf(&sb, "  - %s\n", w)
+		}
+	}
+	sb.WriteString("\n")
+
+	logPath := filepath.Join(brainDir, "log.md")
+	f, err := os.OpenFile(logPath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
+	if err != nil {
+		return fmt.Errorf("open log: %w", err)
+	}
+	if _, err = f.WriteString(sb.String()); err != nil {
+		f.Close() //nolint:errcheck
+		return fmt.Errorf("write log: %w", err)
+	}
+	return f.Close()
+}
--- a/ingestion/internal/wiki/merge.go
+++ b/ingestion/internal/wiki/merge.go
@@ -0,0 +1,120 @@
+// ingestion/internal/wiki/merge.go
+package wiki
+
+import (
+	"fmt"
+	"strings"
+)
+
+var bulletSections = map[string]bool{
+	"Related Concepts":                  true,
+	"Related Entities":                  true,
+	"Sources":                           true,
+	"Key Claims":                        true,
+	"Entities Mentioned":                true,
+	"Concepts Introduced or Reinforced": true,
+	"Chapters":                          true,
+}
+
+var appendSections = map[string]bool{
+	"Evolving Notes":        true,
+	"Updates":               true,
+	"Open Questions Raised": true,
+	"Open Questions":        true,
+}
+
+type section struct {
+	heading string
+	content string
+}
+
+// Merge combines two Page values that share a path. Frontmatter comes from a;
+// body text before the first "## " heading is dropped from both pages.
+// Sections are merged per heading: bullet sections union unique lines, append
+// sections concatenate, all others keep a's text; sections only in b are appended.
+func Merge(a, b Page) Page {
+	fmA, secsA := parseSections(a.Content)
+	_, secsB := parseSections(b.Content)
+
+	idx := make(map[string]int, len(secsA))
+	for i, s := range secsA {
+		idx[s.heading] = i
+	}
+
+	for _, sB := range secsB {
+		i, exists := idx[sB.heading]
+		if !exists {
+			idx[sB.heading] = len(secsA)
+			secsA = append(secsA, sB)
+			continue
+		}
+		sA := secsA[i]
+		switch {
+		case bulletSections[sB.heading]:
+			secsA[i].content = mergeBullets(sA.content, sB.content)
+		case appendSections[sB.heading]:
+			secsA[i].content = strings.TrimRight(sA.content, "\n") + "\n\n" + strings.TrimLeft(sB.content, "\n")
+		}
+	}
+
+	return Page{Path: a.Path, Content: rebuildContent(fmA, secsA)}
+}
+
+func parseSections(markdown string) (frontmatter string, sections []section) {
+	lines := strings.Split(markdown, "\n")
+	i := 0
+
+	if i < len(lines) && strings.TrimSpace(lines[i]) == "---" {
+		i++
+		var fmLines []string
+		for i < len(lines) {
+			if strings.TrimSpace(lines[i]) == "---" {
+				i++
+				break
+			}
+			fmLines = append(fmLines, lines[i])
+			i++
+		}
+		frontmatter = fmt.Sprintf("---\n%s\n---\n", strings.Join(fmLines, "\n"))
+	}
+
+	var cur *section
+	for ; i < len(lines); i++ {
+		line := lines[i]
+		if strings.HasPrefix(line, "## ") {
+			if cur != nil {
+				sections = append(sections, *cur)
+			}
+			cur = &section{heading: strings.TrimPrefix(line, "## ")}
+		} else if cur != nil {
+			cur.content += line + "\n"
+		}
+	}
+	if cur != nil {
+		sections = append(sections, *cur)
+	}
+	return
+}
+
+func rebuildContent(frontmatter string, sections []section) string {
+	var sb strings.Builder
+	sb.WriteString(frontmatter)
+	for _, sec := range sections {
+		fmt.Fprintf(&sb, "\n## %s\n\n%s\n", sec.heading, strings.Trim(sec.content, "\n"))
+	}
+	return sb.String()
+}
+
+func mergeBullets(a, b string) string {
+	seen := make(map[string]bool)
+	var lines []string
+	for _, line := range strings.Split(a+b, "\n") {
+		trimmed := strings.TrimSpace(line)
+		if trimmed == "" || seen[trimmed] {
+			continue
+		}
+		seen[trimmed] = true
+		lines = append(lines, line)
+	}
+	return strings.Join(lines, "\n") + "\n"
+}
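For bullet sections such as "Related Concepts", `Merge` keeps a's lines in order and then adds b's unseen lines. Duplicates are detected on trimmed text, so an indentation difference does not produce a second copy, and blank lines are removed. The sketch below reproduces that rule as a standalone variant. The name `unionLines` is hypothetical. Unlike `mergeBullets`, it joins a and b with an explicit newline, which keeps the two inputs separate even if a lacks a trailing newline.

```go
package main

import "strings"

// unionLines keeps every non-blank line of a, then the lines of b not
// already seen. Lines are compared after trimming whitespace.
func unionLines(a, b string) string {
	seen := make(map[string]bool)
	var out []string
	for _, line := range strings.Split(a+"\n"+b, "\n") {
		t := strings.TrimSpace(line)
		if t == "" || seen[t] {
			continue
		}
		seen[t] = true
		out = append(out, line)
	}
	return strings.Join(out, "\n") + "\n"
}
```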
--- a/ingestion/internal/wiki/merge_test.go
+++ b/ingestion/internal/wiki/merge_test.go
@@ -0,0 +1,55 @@
+// ingestion/internal/wiki/merge_test.go
+package wiki
+
+import (
+	"strings"
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+)
+
+func TestMerge_BulletSectionsUnion(t *testing.T) {
+	a := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Related Concepts\n\n- [[bar|Bar]]\n"}
+	b := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Related Concepts\n\n- [[bar|Bar]]\n- [[baz|Baz]]\n"}
+
+	got := Merge(a, b)
+	assert.Contains(t, got.Content, "[[bar|Bar]]")
+	assert.Contains(t, got.Content, "[[baz|Baz]]")
+	assert.Equal(t, 1, strings.Count(got.Content, "[[bar|Bar]]"))
+}
+
+func TestMerge_AppendSections(t *testing.T) {
+	a := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Evolving Notes\n\nFirst note.\n"}
+	b := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Evolving Notes\n\nSecond note.\n"}
+
+	got := Merge(a, b)
+	assert.Contains(t, got.Content, "First note.")
+	assert.Contains(t, got.Content, "Second note.")
+}
+
+func TestMerge_KeepFirstForOtherSections(t *testing.T) {
+	a := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Definition\n\nFirst definition.\n"}
+	b := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Definition\n\nSecond definition.\n"}
+
+	got := Merge(a, b)
+	assert.Contains(t, got.Content, "First definition.")
+	assert.NotContains(t, got.Content, "Second definition.")
+}
+
+func TestMerge_NewSectionFromB(t *testing.T) {
+	a := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Definition\n\nA thing.\n"}
+	b := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Why It Matters\n\nBecause reasons.\n"}
+
+	got := Merge(a, b)
+	assert.Contains(t, got.Content, "A thing.")
+	assert.Contains(t, got.Content, "Because reasons.")
+}
+
+func TestMerge_KeepsFrontmatterFromA(t *testing.T) {
+	a := Page{Path: "p.md", Content: "---\ntitle: A\nlast_updated: 2026-01-01\n---\n\n## Definition\n\nA.\n"}
+	b := Page{Path: "p.md", Content: "---\ntitle: B\nlast_updated: 2026-06-01\n---\n\n## Definition\n\nB.\n"}
+
+	got := Merge(a, b)
+	assert.Contains(t, got.Content, "title: A")
+	assert.NotContains(t, got.Content, "title: B")
+}
--- a/ingestion/internal/wiki/slug.go
+++ b/ingestion/internal/wiki/slug.go
@@ -0,0 +1,28 @@
+// ingestion/internal/wiki/slug.go
+package wiki
+
+import (
+	"strings"
+	"unicode"
+)
+
+// Slug converts a title to a kebab-case slug suitable for wiki filenames.
+// Rules: lowercase; runs of spaces/hyphens/underscores collapse to a single hyphen (none leading or trailing); all other non-alphanumerics are dropped.
+func Slug(title string) string {
+	var b strings.Builder
+	prevHyphen := true // start true to trim leading hyphens
+	for _, r := range strings.ToLower(title) {
+		switch {
+		case r == ' ' || r == '-' || r == '_':
+			if !prevHyphen {
+				b.WriteRune('-')
+				prevHyphen = true
+			}
+		case unicode.IsLetter(r) || unicode.IsDigit(r):
+			b.WriteRune(r)
+			prevHyphen = false
+		default: // all other characters (apostrophes, colons, dots, etc.) are dropped
+		}
+	}
+	return strings.TrimRight(b.String(), "-")
+}
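`Slug` drops every character that is not a letter, digit, space, hyphen or underscore. As a result, distinct titles can share a filename: "C++" and "C#" both become "c", so their pages would land at the same path. Below is a hypothetical collision check. `slugCollisions` is not in the package, and `slug` is a local copy of `Slug`'s rules so the sketch is self-contained.

```go
package main

import (
	"strings"
	"unicode"
)

// slug reproduces the rules of wiki.Slug so the sketch compiles on its own.
func slug(title string) string {
	var b strings.Builder
	prevHyphen := true // suppresses leading hyphens
	for _, r := range strings.ToLower(title) {
		switch {
		case r == ' ' || r == '-' || r == '_':
			if !prevHyphen {
				b.WriteRune('-')
				prevHyphen = true
			}
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			b.WriteRune(r)
			prevHyphen = false
		}
	}
	return strings.TrimRight(b.String(), "-")
}

// slugCollisions groups titles by slug and keeps only slugs produced by
// more than one input title.
func slugCollisions(titles []string) map[string][]string {
	groups := make(map[string][]string)
	for _, t := range titles {
		s := slug(t)
		groups[s] = append(groups[s], t)
	}
	for s, ts := range groups {
		if len(ts) < 2 {
			delete(groups, s) // deleting during range is safe in Go
		}
	}
	return groups
}
```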
--- a/ingestion/internal/wiki/slug_test.go
+++ b/ingestion/internal/wiki/slug_test.go
@@ -0,0 +1,29 @@
+// ingestion/internal/wiki/slug_test.go
+package wiki
+
+import (
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+)
+
+func TestSlug(t *testing.T) {
+	tests := []struct {
+		input string
+		want  string
+	}{
+		{"Domain Driven Design", "domain-driven-design"},
+		{"It's Complicated", "its-complicated"},
+		{"gRPC", "grpc"},
+		{"GPT-4o", "gpt-4o"},
+		{"Property 1: It's Rough", "property-1-its-rough"},
+		{"  leading spaces  ", "leading-spaces"},
+		{"multiple   spaces", "multiple-spaces"},
+		{"already-kebab", "already-kebab"},
+	}
+	for _, tc := range tests {
+		t.Run(tc.input, func(t *testing.T) {
+			assert.Equal(t, tc.want, Slug(tc.input))
+		})
+	}
+}
--- a/ingestion/internal/wiki/types.go
+++ b/ingestion/internal/wiki/types.go
@@ -0,0 +1,25 @@
+// ingestion/internal/wiki/types.go
+package wiki
+
+// PageType identifies the wiki subdirectory for a page.
+type PageType string
+
+const (
+	PageTypeConcept PageType = "concepts"
+	PageTypeEntity  PageType = "entities"
+	PageTypeSource  PageType = "sources"
+)
+
+// Page is a wiki page to be written to disk.
+type Page struct {
+	Path    string // relative to brainDir, e.g. "wiki/sources/foo.md"
+	Content string // full markdown including YAML frontmatter
+}
+
+// Entry is a summary of an existing wiki page used to build the inventory.
+type Entry struct {
+	Slug    string
+	Title   string
+	Aliases []string
+	Type    PageType
+}
--- a/internal/config/config.go
+++ b/internal/config/config.go
@@ -9,6 +9,8 @@ type Config struct {
 	ConfigDir      string // SUPERVISOR_CONFIG_DIR, default ./config/supervisor
 	ModelsFile     string // SUPERVISOR_MODELS_FILE, default <ConfigDir>/../models.yaml
 	IngestBaseURL  string // INGEST_BASE_URL, default http://localhost:3300
+	IngestSvcURL   string // INGEST_SVC_URL, default "" (disabled); base URL for brain_ingest and brain_ingest_raw
+	KBRetrievalURL string // KB_RETRIEVAL_URL, default "" (disabled); base URL for brain_search
 	SessionsDir    string // SUPERVISOR_SESSIONS_DIR, default ./brain/sessions
 	BrainDir       string // SUPERVISOR_BRAIN_DIR, default ./brain
 }
@@ -22,6 +24,8 @@ func Load() (Config, error) {
 	}
 	cfg.ModelsFile = envOr("SUPERVISOR_MODELS_FILE", cfg.ConfigDir+"/../models.yaml")
 	cfg.IngestBaseURL = envOr("INGEST_BASE_URL", "http://localhost:3300")
+	cfg.IngestSvcURL = envOr("INGEST_SVC_URL", "")
+	cfg.KBRetrievalURL = envOr("KB_RETRIEVAL_URL", "")
 	cfg.SessionsDir = envOr("SUPERVISOR_SESSIONS_DIR", "./brain/sessions")
 	cfg.BrainDir = envOr("SUPERVISOR_BRAIN_DIR", "./brain")
 	return cfg, nil
--- a/internal/skills/brain/handlers.go
+++ b/internal/skills/brain/handlers.go
@@ -10,13 +10,19 @@ import (
 	"net/http"
 )

-// Handle dispatches brain_query and brain_write tool calls.
+// Handle dispatches brain tool calls.
 func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (json.RawMessage, error) {
 	switch tool {
 	case "brain_query":
 		return s.query(ctx, args)
 	case "brain_write":
 		return s.write(ctx, args)
+	case "brain_ingest_raw":
+		return s.ingestRaw(ctx, args)
+	case "brain_ingest":
+		return s.ingest(ctx, args)
+	case "brain_search":
+		return s.search(ctx, args)
 	default:
 		return nil, fmt.Errorf("unknown brain tool: %s", tool)
 	}
@@ -59,12 +65,101 @@ func (s *Skill) write(ctx context.Context, args json.RawMessage) (json.RawMessag
 	return s.post(ctx, "/write", a)
 }

+type ingestArgs struct {
+	Content string `json:"content,omitempty"`
+	Source  string `json:"source,omitempty"`
+	Path    string `json:"path,omitempty"`
+	DryRun  bool   `json:"dry_run,omitempty"`
+}
+
+func (s *Skill) ingest(ctx context.Context, args json.RawMessage) (json.RawMessage, error) {
+	var a ingestArgs
+	if err := json.Unmarshal(args, &a); err != nil {
+		return nil, fmt.Errorf("parse args: %w", err)
+	}
+	if s.cfg.IngestSvcURL == "" {
+		return nil, fmt.Errorf("brain_ingest: INGEST_SVC_URL not configured")
+	}
+	if a.Path != "" && a.Content != "" {
+		return nil, fmt.Errorf("path and content+source are mutually exclusive: provide one or the other")
+	}
+	if a.Path != "" {
+		return s.postTo(ctx, s.cfg.IngestSvcURL+"/ingest-path", map[string]any{
+			"path":    a.Path,
+			"source":  a.Source,
+			"dry_run": a.DryRun,
+		})
+	}
+	if a.Content != "" && a.Source != "" {
+		return s.postTo(ctx, s.cfg.IngestSvcURL+"/ingest", map[string]any{
+			"content": a.Content,
+			"source":  a.Source,
+			"dry_run": a.DryRun,
+		})
+	}
+	return nil, fmt.Errorf("either content+source or path is required")
+}
+
+type ingestRawArgs struct {
+	Source string `json:"source"`
+	Pages  []any  `json:"pages"`
+	DryRun bool   `json:"dry_run,omitempty"`
+}
+
+func (s *Skill) ingestRaw(ctx context.Context, args json.RawMessage) (json.RawMessage, error) {
+	var a ingestRawArgs
+	if err := json.Unmarshal(args, &a); err != nil {
+		return nil, fmt.Errorf("parse args: %w", err)
+	}
+	if s.cfg.IngestSvcURL == "" {
+		return nil, fmt.Errorf("brain_ingest_raw: INGEST_SVC_URL not configured")
+	}
+	if a.Source == "" {
+		return nil, fmt.Errorf("source is required")
+	}
+	if len(a.Pages) == 0 {
+		return nil, fmt.Errorf("pages is required and must be non-empty")
+	}
+	return s.postTo(ctx, s.cfg.IngestSvcURL+"/ingest-raw", map[string]any{
+		"source":  a.Source,
+		"pages":   a.Pages,
+		"dry_run": a.DryRun,
+	})
+}
+
+type searchArgs struct {
+	Query      string `json:"query"`
+	Collection string `json:"collection,omitempty"`
+	Limit      int    `json:"limit,omitempty"`
+}
+
+func (s *Skill) search(ctx context.Context, args json.RawMessage) (json.RawMessage, error) {
+	var a searchArgs
+	if err := json.Unmarshal(args, &a); err != nil {
+		return nil, fmt.Errorf("parse args: %w", err)
+	}
+	if a.Query == "" {
+		return nil, fmt.Errorf("query is required")
+	}
+	if a.Limit == 0 {
+		a.Limit = 5
+	}
+	if s.cfg.KBRetrievalURL == "" {
+		return nil, fmt.Errorf("brain_search: KB_RETRIEVAL_URL not configured")
+	}
+	return s.postTo(ctx, s.cfg.KBRetrievalURL+"/api/v1/search", a)
+}
+
 func (s *Skill) post(ctx context.Context, path string, body any) (json.RawMessage, error) {
+	return s.postTo(ctx, s.cfg.IngestBaseURL+path, body)
+}
+
+func (s *Skill) postTo(ctx context.Context, url string, body any) (json.RawMessage, error) {
 	b, err := json.Marshal(body)
 	if err != nil {
 		return nil, fmt.Errorf("marshal request: %w", err)
 	}
-	req, err := http.NewRequestWithContext(ctx, http.MethodPost, s.cfg.IngestBaseURL+path, bytes.NewReader(b))
+	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(b))
 	if err != nil {
 		return nil, fmt.Errorf("build request: %w", err)
 	}
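`brain_ingest` accepts either `path` or `content`+`source`, never both. It forwards the call to `/ingest-path` or `/ingest` on the ingestion service. The sketch below pulls that decision table out of `(*Skill).ingest` into a pure function, so the argument combinations can be table-tested without an HTTP server. `ingestRoute` is a hypothetical name, and the error texts are shortened.

```go
package main

import "errors"

// ingestRoute mirrors the branching in (*Skill).ingest: it returns the
// endpoint a brain_ingest call would be sent to, or an error when the
// argument combination is invalid.
func ingestRoute(path, content, source string) (string, error) {
	switch {
	case path != "" && content != "":
		return "", errors.New("path and content+source are mutually exclusive")
	case path != "":
		return "/ingest-path", nil
	case content != "" && source != "":
		return "/ingest", nil
	default:
		return "", errors.New("either content+source or path is required")
	}
}
```

Note that a call with `content` but no `source` falls through to the generic "either content+source or path is required" error.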
--- a/internal/skills/brain/handlers_test.go
+++ b/internal/skills/brain/handlers_test.go
@@ -63,3 +63,60 @@ func TestHandle_UnknownTool_ReturnsError(t *testing.T) {
 	_, err := s.Handle(context.Background(), "brain_unknown", nil)
 	assert.Error(t, err)
 }
+
+func TestIngest_RoutesToIngestPath(t *testing.T) {
+	var capturedPath string
+	var capturedBody map[string]any
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		capturedPath = r.URL.Path
+		assert.NoError(t, json.NewDecoder(r.Body).Decode(&capturedBody)) // assert, not require: FailNow is unsafe off the test goroutine
+		_ = json.NewEncoder(w).Encode(map[string]any{"pages": []string{"wiki/foo.md"}})
+	}))
+	defer srv.Close()
+
+	s := brain.New(brain.Config{IngestSvcURL: srv.URL})
+	args, _ := json.Marshal(map[string]any{"path": "/tmp/some-file.md"})
+	out, err := s.Handle(context.Background(), "brain_ingest", args)
+	require.NoError(t, err)
+
+	assert.Equal(t, "/ingest-path", capturedPath)
+	assert.Equal(t, "/tmp/some-file.md", capturedBody["path"])
+
+	var result map[string]any
+	require.NoError(t, json.Unmarshal(out, &result))
+	pages := result["pages"].([]any)
+	assert.Len(t, pages, 1)
+}
+
+func TestIngest_RoutesToIngest(t *testing.T) {
+	var capturedPath string
+	var capturedBody map[string]any
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		capturedPath = r.URL.Path
+		assert.NoError(t, json.NewDecoder(r.Body).Decode(&capturedBody))
+		_ = json.NewEncoder(w).Encode(map[string]any{"pages": []string{"wiki/bar.md"}})
+	}))
+	defer srv.Close()
+
+	s := brain.New(brain.Config{IngestSvcURL: srv.URL})
+	args, _ := json.Marshal(map[string]any{"content": "some content", "source": "my-source.md"})
+	out, err := s.Handle(context.Background(), "brain_ingest", args)
+	require.NoError(t, err)
+
+	assert.Equal(t, "/ingest", capturedPath)
+	assert.Equal(t, "some content", capturedBody["content"])
+	assert.Equal(t, "my-source.md", capturedBody["source"])
+
+	var result map[string]any
+	require.NoError(t, json.Unmarshal(out, &result))
+	pages := result["pages"].([]any)
+	assert.Len(t, pages, 1)
+}
+
+func TestIngest_MissingRequiredFields(t *testing.T) {
+	s := brain.New(brain.Config{IngestSvcURL: "http://localhost:3300"})
+	args, _ := json.Marshal(map[string]any{})
+	_, err := s.Handle(context.Background(), "brain_ingest", args)
+	require.Error(t, err)
+	assert.Contains(t, err.Error(), "either content+source or path is required")
+}
--- a/internal/skills/brain/skill.go
+++ b/internal/skills/brain/skill.go
@@ -9,7 +9,9 @@ import (

 // Config holds brain skill configuration.
 type Config struct {
-	IngestBaseURL string // base URL of the ingestion HTTP server, e.g. http://localhost:3300
+	IngestBaseURL  string // base URL of the ingestion HTTP server (brain_query, brain_write)
+	IngestSvcURL   string // base URL of the ingestion-svc HTTP server (brain_ingest)
+	KBRetrievalURL string // base URL of the kb-retrieval server (brain_search)
 }

 // Skill implements registry.Skill for brain_query and brain_write.
@@ -32,10 +34,10 @@ func (s *Skill) Tools() []registry.ToolDef {
 	str := map[string]any{"type": "string"}
 	num := map[string]any{"type": "integer"}

-	return []registry.ToolDef{
+	tools := []registry.ToolDef{
 		{
 			Name:        "brain_query",
-			Description: "Search the hyperguild brain wiki for relevant knowledge. Call this before starting any significant task.",
+			Description: "BM25 full-text search across brain/knowledge/ and brain/wiki/ markdown files. Fast, no embeddings needed. Call before any significant task.",
 			InputSchema: schema([]string{"query"}, map[string]any{
 				"query": str,
 				"limit": num,
@@ -43,7 +45,7 @@ func (s *Skill) Tools() []registry.ToolDef {
 		},
 		{
 			Name:        "brain_write",
-			Description: "Write a raw knowledge note to the brain for later ingestion into the wiki.",
+			Description: "Write a raw knowledge note to brain/knowledge/ for later ingestion.",
 			InputSchema: schema([]string{"content"}, map[string]any{
 				"content":  str,
 				"type":     str,
@@ -52,4 +54,58 @@ func (s *Skill) Tools() []registry.ToolDef {
 			}),
 		},
 	}
+	if s.cfg.IngestSvcURL != "" {
+		tools = append(tools, registry.ToolDef{
+			Name: "brain_ingest_raw",
+			Description: "Ingest pre-structured pages into the brain wiki, bypassing the LLM extraction step. " +
+				"Use when you (the calling agent) have already extracted entities, concepts, and content from a source. " +
+				"Provide source (human-readable name) and pages (array of {title, type, subtype, domain, content} objects). " +
+				"The pipeline computes slugs, paths, frontmatter, wikilink canonicalization, and source back-references. " +
+				"Returns the list of wiki pages written.",
+			InputSchema: schema([]string{"source", "pages"}, map[string]any{
+				"source": map[string]any{"type": "string", "description": "human-readable name for the source, e.g. 'shape-up-book'"},
+				"pages": map[string]any{
+					"type": "array",
+					"items": map[string]any{
+						"type":     "object",
+						"required": []string{"title", "type", "content"},
+						"properties": map[string]any{
+							"title":   map[string]any{"type": "string", "description": "page title, e.g. 'Hash Encoding'"},
+							"type":    map[string]any{"type": "string", "enum": []string{"source", "concept", "entity"}, "description": "page type"},
+							"subtype": map[string]any{"type": "string", "description": "entity: person|company|tool|model|framework|technology; source: article|pdf|book|video|note|project"},
+							"domain":  map[string]any{"type": "string", "description": "knowledge domain, e.g. 'Machine Learning'"},
+							"content": map[string]any{"type": "string", "description": "markdown body — no frontmatter, use [[Display Name]] for wikilinks"},
+						},
+					},
+				},
+				"dry_run": map[string]any{"type": "boolean"},
+			}),
+		})
+		tools = append(tools, registry.ToolDef{
+			Name: "brain_ingest",
+			Description: "Ingest content into the brain wiki (brain/wiki/). Calls an LLM to produce structured wiki pages. " +
+				"Use for substantial documents, articles, or knowledge worth structuring. " +
+				"Provide EITHER (a) path — absolute path to a file or directory, " +
+				"OR (b) content + source — raw text and a human-readable name. " +
+				"Providing both is an error. Returns the list of wiki pages written.",
+			InputSchema: schema([]string{}, map[string]any{
+				"content": map[string]any{"type": "string", "description": "raw text to ingest; required when path is not set"},
+				"source":  map[string]any{"type": "string", "description": "human-readable name for the content, e.g. 'shape-up-book'; required when path is not set"},
+				"path":    map[string]any{"type": "string", "description": "absolute path to a file or directory to ingest; mutually exclusive with content+source"},
+				"dry_run": map[string]any{"type": "boolean"},
+			}),
+		})
+	}
+	if s.cfg.KBRetrievalURL != "" {
+		tools = append(tools, registry.ToolDef{
+			Name:        "brain_search",
+			Description: "Semantic vector search across the brain wiki using embeddings. Use when brain_query returns no results or you need conceptually-related results rather than keyword matches.",
+			InputSchema: schema([]string{"query"}, map[string]any{
+				"query":      str,
+				"collection": str,
+				"limit":      num,
+			}),
+		})
+	}
+	return tools
 }
Author	SHA1	Message	Date
Mathias Bergqvist	0a70d9e972	feat(pipeline): add POST /ingest-raw for direct batch ingestion without LLM [checks: all successful; CI / Lint / Test / Vet (push) successful in 9s; CI / Mirror to GitHub (push) skipped] Allows callers to provide pre-structured RawPage data directly, bypassing the LLM extraction step. The pipeline still handles slug computation, frontmatter, link canonicalization, source back-references, and dedup — only the extraction is skipped. Useful when a more capable model or manual curation produces the structured data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 11:15:59 +02:00
Mathias Bergqvist	3e9a648115	fix(pipeline): repair invalid JSON escape sequences from LLM output before parsing [checks: all successful; CI / Lint / Test / Vet (push) successful in 11s; CI / Mirror to GitHub (push) skipped] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 22:04:27 +02:00
Mathias Bergqvist	923a665365	fix(pipeline): skip RawPages with empty title in BuildPages instead of producing broken paths [checks: all successful; CI / Lint / Test / Vet (push) successful in 9s; CI / Mirror to GitHub (push) skipped] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 19:55:37 +02:00
Mathias Bergqvist	537aebc302	feat(pipeline): update system prompt for new LLM JSON contract (no slugs) - Change prompt to reflect new output format: title, type, subtype, domain, content - Remove slug/path generation responsibility from LLM — pipeline now handles it - Wikilinks change from [[slug|Display Name]] to [[Display Name]] only - LLM no longer includes frontmatter or paths in output docs(schema): update LLM output format and wikilink convention for Level 3 - Specify JSON schema: title, type, subtype, domain, content fields - Remove frontmatter requirements from schema output (handled by pipeline) - Simplify wikilink format to [[Display Name]] — no slug or pipe - Pipeline now responsible for slug generation and frontmatter construction These changes shift slug/frontmatter generation from LLM to pipeline, reducing cognitive load on the model and improving control over output. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 19:45:21 +02:00
Mathias Bergqvist	de35d4dbb0	feat(pipeline): wire ParseRawPages+BuildPages+CanonicalizeLinks into Run Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 19:07:33 +02:00
Mathias Bergqvist	26855f69b0	feat(pipeline): add CanonicalizeLinks — convert [[Display Name]] to [[slug|Display Name]]	2026-04-23 18:59:10 +02:00
Mathias Bergqvist	a7b363d589	fix(pipeline): quote YAML scalar fields in buildFrontmatter to prevent injection	2026-04-23 18:56:39 +02:00
Mathias Bergqvist	7b57051af8	feat(pipeline): add BuildPages — compute slugs/paths/frontmatter from RawPage	2026-04-23 18:50:37 +02:00
Mathias Bergqvist	a620f6cb01	fix(pipeline): guard empty-title bridge + skip stale integration tests until task4 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 18:46:07 +02:00
Mathias Bergqvist	26b5636b43	feat(pipeline): replace ParsePages with ParseRawPages + RawPage type Strips slug authority from the LLM. The new RawPage type carries only {title, type, subtype, domain, content} — no paths or frontmatter. Pipeline will derive slugs deterministically (Task 4). pipeline.go gets a temporary bridge stub (TODO task4) to keep the package compiling between tasks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 18:41:33 +02:00
Mathias Bergqvist	989f375aec	docs: add Level 3 implementation plan	2026-04-23 17:37:45 +02:00
Mathias Bergqvist	6403d5e444	docs: add Level 3 slug authority design spec	2026-04-23 17:23:22 +02:00
Mathias Bergqvist	ab19968ae2	feat: POST /backfill-refs — retroactive source back-reference injection [checks: all successful; CI / Lint / Test / Vet (push) successful in 10s; CI / Mirror to GitHub (push) successful in 3s] Walks wiki/sources/, extracts wikilinks from each source page, and injects ## Sources back-refs into all linked concept and entity pages. All refs from all sources are accumulated in memory before writing, so multiple sources referencing the same concept are merged in a single write. Running the endpoint multiple times is safe — wiki.Merge deduplicates bullet items.	2026-04-23 16:50:11 +02:00
Mathias Bergqvist	1605624668	feat(pipeline): add POST /backfill-refs endpoint to retroactively inject source back-references	2026-04-23 16:50:00 +02:00
Mathias Bergqvist	55fa0b503a	feat: source back-references on concept and entity pages [checks: all successful; CI / Lint / Test / Vet (push) successful in 10s; CI / Mirror to GitHub (push) successful in 3s] After each ingestion, every concept and entity page linked from the source page gains a ## Sources entry pointing back to that source. Pages already on disk (from prior ingestions) are loaded and updated, so re-ingesting a new source accumulates references over time. Deduplication is handled by wiki.Merge's existing bullet-section logic.	2026-04-23 16:36:40 +02:00
Mathias Bergqvist	3c2bd9268c	feat(pipeline): wire source back-reference injection into Run	2026-04-23 16:36:22 +02:00
Mathias Bergqvist	29727ec2a5	feat(pipeline): inject source back-references into concept and entity pages	2026-04-23 16:35:47 +02:00
Mathias Bergqvist	0a075088b2	docs: add source back-references implementation plan	2026-04-23 16:33:41 +02:00
Mathias Bergqvist	1bfe501d09	fix(cd): only deploy when CI passes on main [checks: all successful; CI / Lint / Test / Vet (push) successful in 10s; CI / Mirror to GitHub (push) successful in 3s]	2026-04-23 16:24:59 +02:00
Mathias Bergqvist	3607920601	fix(lint): resolve all errcheck violations in ingestion module [checks: all successful; cd / Build and deploy (push) successful in 10s; CI / Lint / Test / Vet (push) successful in 10s; CI / Mirror to GitHub (push) successful in 3s]	2026-04-23 16:20:59 +02:00
Mathias Bergqvist	a6c39e8691	feat: PDF extraction and fuzzy entity resolution [checks: some failed; cd / Build and deploy (push) successful in 11s; CI / Lint / Test / Vet (push) failing after 5s; CI / Mirror to GitHub (push) skipped] - New extract package: Text() dispatcher for .md/.txt passthrough and PDF extraction via pdftotext subprocess - wiki.Entry gains Aliases []string, loaded from YAML frontmatter - Fuzzy entity resolution in pipeline: normalizes titles (lowercase, strip articles, collapse hyphens) and matches proposed pages against existing inventory slugs and aliases to prevent proliferation - Watcher and API handler now use extract.Text() instead of os.ReadFile - Dockerfile: apk add poppler-utils in Alpine runtime stage	2026-04-23 16:03:02 +02:00
Mathias Bergqvist	a37d18bf7a	chore(docker): add poppler-utils for PDF text extraction	2026-04-23 16:02:12 +02:00
Mathias Bergqvist	2975eadc87	feat(watcher,api): use extract.Text() for file reading — fixes PDF ingestion	2026-04-23 16:01:36 +02:00
Mathias Bergqvist	53e46781b1	feat(pipeline): resolve proposed pages against inventory before writing	2026-04-23 16:00:31 +02:00
Mathias Bergqvist	e9b5cc401c	feat(pipeline): add fuzzy entity resolution to prevent slug proliferation	2026-04-23 15:59:36 +02:00
Mathias Bergqvist	bf6f497d9d	feat(wiki): add Aliases to Entry and read from YAML frontmatter	2026-04-23 15:57:16 +02:00
Mathias Bergqvist	9cc6c2d053	feat(extract): implement PDF extraction via pdftotext	2026-04-23 15:53:46 +02:00
Mathias Bergqvist	43a46d07e5	feat(extract): add Text() dispatcher with md/txt passthrough	2026-04-23 15:45:20 +02:00
Mathias Bergqvist	820d1c93a7	docs: add implementation plan for PDF extraction and entity resolution	2026-04-23 15:44:13 +02:00
Mathias Bergqvist	6928907d79	fix(watcher): copy files instead of moving them, leave originals for Obsidian [checks: some failed; cd / Build and deploy (push) successful in 10s; CI / Lint / Test / Vet (push) failing after 5s; CI / Mirror to GitHub (push) skipped] Files dropped into brain/raw/ are now copied to processed/ or failed/ rather than moved. A .processed or .failed marker is written next to the original so the watcher skips it on subsequent polls without deleting it. This keeps Syncthing-synced Obsidian vaults intact after ingestion. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 14:47:50 +02:00
Mathias Bergqvist	e74320a8e8	feat(ingestion): wire watcher into server startup + fix Procfile env vars [checks: some failed; cd / Build and deploy (push) successful in 10s; CI / Lint / Test / Vet (push) failing after 5s; CI / Mirror to GitHub (push) skipped] - Start background watcher on startup when INGEST_WATCH_INTERVAL > 0 - Procfile: add INGEST_WATCH_INTERVAL=30 and INGEST_SVC_URL for supervisor Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 23:09:00 +02:00
Mathias Bergqvist	1b0706f270	chore(brain): rename CLAUDE.md to schema.md for clarity CLAUDE.md has a specific meaning in the Claude Code ecosystem (agent instructions). The wiki schema for the ingestion pipeline should live in schema.md to avoid confusion. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 23:06:32 +02:00
Mathias Bergqvist	2ae6bfe81e	fix(brain): enforce mutual exclusivity and clarify brain_ingest schema - Return error when both path and content are supplied simultaneously - Improve tool description to clearly state the two valid call forms - Add per-field descriptions so LLMs understand what each parameter requires Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 23:03:03 +02:00
Mathias Bergqvist	a6dce972d6	feat(brain): add path field to brain_ingest for /ingest-path routing Adds an optional path field to brain_ingest so Claude can ingest files or directories directly by path without embedding content in the call. Routing: path set → /ingest-path; content+source set → /ingest; neither → error. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 23:01:05 +02:00
Mathias Bergqvist	2f4b577131	fix(ingestion): address code review issues in api and watcher packages - Strip internal error detail from 500 responses (leak prevention) - Add path containment assertion in /write handler - Use Go 1.22 method-prefixed mux routes for automatic 405 responses - Clarify watch_interval log when watcher not yet wired - Consolidate validation tests into table-driven TestIngest_Validation - Watcher: return nil after successful quarantine to avoid double-logging - Watcher: append timestamp suffix to processed dest if file already exists Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 22:59:39 +02:00
Mathias Bergqvist	a25bb18c54	feat(ingestion): add /ingest and /ingest-path HTTP handlers Wires pipeline.Run into the HTTP layer so callers can ingest raw text or files/directories without touching the filesystem directly. Rewrites main.go to parse LLM and watcher env vars and build pipeline.Config. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 22:54:28 +02:00
Mathias Bergqvist	78531bb238	feat(ingestion): add background file watcher for brain/raw/ Polls brain/raw/ on a configurable ticker, derives human-readable source names from filenames, runs the pipeline, and moves files to processed/YYYY-MM-DD/ on success or failed/ on error with a log.md entry. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 22:54:03 +02:00
Mathias Bergqvist	04fefe8e9c	fix(ingestion): wrap naked error returns and harden mustJSON helper Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 22:51:19 +02:00
Mathias Bergqvist	103f4d90bf	feat(ingestion): add pipeline orchestrator with prompt builder Adds prompt.go (BuildPrompt + systemPrompt) and pipeline.go (Run, Config, Result, mergeAll) that wire chunking, LLM calls, parse, merge, index rebuild, and log append into a single ingestion pipeline. Includes integration tests covering write, dry-run, and duplicate-path merge scenarios. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 22:45:19 +02:00
Mathias Bergqvist	9b11719481	feat(ingestion): add content chunking and LLM JSON output parser	2026-04-22 22:37:14 +02:00
Mathias Bergqvist	d405346f07	feat(ingestion): add wiki index rebuilder and audit log	2026-04-22 22:36:55 +02:00
Mathias Bergqvist	bf8a3fc11c	feat(ingestion): add OpenAI-compatible LLM HTTP client with 429 retry	2026-04-22 22:29:24 +02:00
Mathias Bergqvist	ae5a4d04f0	feat(ingestion): add wiki page merge logic	2026-04-22 22:28:55 +02:00
Mathias Bergqvist	3a0424a6b4	feat(ingestion): add wiki inventory loader	2026-04-22 22:28:53 +02:00
Mathias Bergqvist	08dd7b9365	docs(brain): add wiki schema document for ingest prompt	2026-04-22 22:25:52 +02:00
Mathias Bergqvist	91e02b930c	feat(ingestion): add wiki package with Page types and slug generation	2026-04-22 22:25:45 +02:00
Mathias Bergqvist	c7341a2607	feat(config): add IngestSvcURL and KBRetrievalURL to supervisor config	2026-04-22 22:24:27 +02:00
Mathias Bergqvist	b5a0085c0a	feat(brain): add brain_ingest, brain_search tools and extend search to wiki/	2026-04-22 22:16:02 +02:00
Mathias Bergqvist	d6daa37c71	docs: add brain ingestion pipeline implementation plan	2026-04-22 22:14:59 +02:00
Mathias Bergqvist	62fc3989f2	docs: add brain ingestion pipeline design spec	2026-04-22 22:05:19 +02:00
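Commit 26855f69b0 adds `CanonicalizeLinks`, which rewrites the LLM's bare `[[Display Name]]` wikilinks as `[[slug|Display Name]]`. The real slug rule lives in the wiki package and is not shown here. This sketch therefore substitutes a hypothetical `slugify`, which lowercases the title and collapses each run of non-alphanumeric characters to one hyphen. It also uses a `known` map from titles to already-assigned slugs, which is an assumption about how existing pages are looked up. Links that already carry a pipe are left untouched.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
)

// bareLink matches [[Display Name]] only; [[slug|Display Name]] contains a pipe and is skipped.
var bareLink = regexp.MustCompile(`\[\[[^\[\]|]+\]\]`)

// slugify is a hypothetical stand-in for the wiki package's slug rule:
// lowercase, with each run of non-alphanumeric characters collapsed to one hyphen
// and no leading or trailing hyphen.
func slugify(title string) string {
	var b strings.Builder
	pendingHyphen := false
	for _, r := range strings.ToLower(title) {
		if !unicode.IsLetter(r) && !unicode.IsDigit(r) {
			pendingHyphen = true
			continue
		}
		if pendingHyphen && b.Len() > 0 {
			b.WriteByte('-')
		}
		pendingHyphen = false
		b.WriteRune(r)
	}
	return b.String()
}

// canonicalizeLinks rewrites bare wikilinks to [[slug|Display Name]].
// known (an assumption) maps display titles to slugs already assigned to pages;
// titles missing from it fall back to slugify.
func canonicalizeLinks(body string, known map[string]string) string {
	return bareLink.ReplaceAllStringFunc(body, func(m string) string {
		name := strings.TrimSpace(m[2 : len(m)-2])
		if name == "" {
			return m // nothing to link to
		}
		slug, ok := known[name]
		if !ok {
			slug = slugify(name)
		}
		return "[[" + slug + "|" + name + "]]"
	})
}

func main() {
	known := map[string]string{"Shape Up": "shape-up-book"}
	fmt.Println(canonicalizeLinks("See [[Shape Up]] and [[Hash Encoding]]; keep [[appetite|Appetite]].", known))
}
```

Because piped links are skipped, the rewrite is idempotent: a second pass over its own output changes nothing. This matters when a page is merged and re-canonicalized across ingestions.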