feat(pipeline): add POST /ingest-raw for direct batch ingestion without LLM

Allows callers to provide pre-structured RawPage data directly, bypassing the LLM extraction step. The pipeline still handles slug computation, frontmatter, link canonicalization, source back-references, and dedup — only the extraction is skipped. Useful when a more capable model or manual curation produces the structured data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix(pipeline): repair invalid JSON escape sequences from LLM output before parsing
2026-04-24 11:15:59 +02:00 · 2026-04-23 22:04:27 +02:00 · 2026-04-23 19:55:37 +02:00 · 2026-04-23 19:45:21 +02:00 · 2026-04-23 19:07:33 +02:00 · 2026-04-23 18:59:10 +02:00
18 changed files with 2370 additions and 138 deletions
--- a/brain/schema.md
+++ b/brain/schema.md
@@ -3,21 +3,34 @@
 This document defines the three page types in the brain wiki.
 The LLM must follow this schema exactly when generating wiki pages.
 ## Output Format
 Return a JSON array. Each element:
 ```json
 {
  "title":   "exact page title",
  "type":    "source | concept | entity",
  "subtype": "see below — omit for concept",
  "domain":  "see domains — omit if none fits",
  "content": "Markdown body only — no frontmatter, no path"
 }
 ```
 - `subtype` for **source**: `article | pdf | book | video | note | project`
 - `subtype` for **entity**: `person | company | tool | model | framework | technology`
 - The pipeline computes slugs and frontmatter — never include them in output.
 ## Wikilink Format
-All cross-references use `[[slug|Display Text]]`.
+All cross-references use `[[Display Name]]` — just the display name, no slug, no pipe.
 Rules:
- slug = lowercase filename without .md, spaces → hyphens, strip all non-alphanumeric except hyphens
+- Only link to pages in the inventory or pages you are creating in this response
- The `|` separator is REQUIRED — never use `[[Title]]` without a slug
+- The pipeline converts `[[Display Name]]` to `[[slug|Display Name]]` automatically
- Examples: `[[domain-driven-design|Domain Driven Design]]`, `[[ryan-singer|Ryan Singer]]`
+- Section links must match their section type (Related Concepts → concept pages only, etc.)
 - Slugs must resolve to an existing file in the inventory, or a file you are creating in this response
-Slug generation examples:
+Examples: `[[Domain Driven Design]]`, `[[Ryan Singer]]`, `[[Shape Up]]`
 - "Domain Driven Design" → `domain-driven-design`
 - "It's Complicated" → `its-complicated`
 - "gRPC" → `grpc`
 - "GPT-4o" → `gpt-4o`
 ## Domains
@@ -30,17 +43,6 @@ Use one of: `ai-llm`, `software-engineering`, `product-strategy`, `finance-marke
 One page per ingested source. Books are NEVER split across multiple source pages — update the existing one.
 Required frontmatter:
 ```yaml
 title: <exact title>
 type: article | pdf | book | video | note | project
 domain: <domain>
 date_ingested: YYYY-MM-DD
 last_updated: YYYY-MM-DD
 aliases:
  - <exact title>
 ```
 Body sections (in this order):
 ### Summary
@@ -50,10 +52,10 @@ Body sections (in this order):
 Bulleted list. Paraphrase — no verbatim quotes or code.
 ### Concepts Introduced or Reinforced
-Wikilinks to wiki/concepts/ ONLY. One per line.
+Wikilinks to concept pages ONLY. One per line.
 ### Entities Mentioned
-Wikilinks to wiki/entities/ ONLY. One per line.
+Wikilinks to entity pages ONLY. One per line.
 ### Open Questions Raised
 Gaps or follow-up questions from this source.
@@ -75,15 +77,6 @@ Dated entries appended on re-ingestion. NEVER rewrite — only append.
 One page per idea, framework, methodology, or pattern.
 Required frontmatter:
 ```yaml
 title: <concept name>
 domain: <domain>
 last_updated: YYYY-MM-DD
 aliases:
  - <exact title>
 ```
 Body sections (in this order):
 ### Definition
@@ -93,13 +86,13 @@ One-paragraph plain-language explanation.
 Practical significance. Why should anyone care?
 ### Related Concepts
-Wikilinks to wiki/concepts/ ONLY.
+Wikilinks to concept pages ONLY.
 ### Related Entities
-Wikilinks to wiki/entities/ ONLY.
+Wikilinks to entity pages ONLY.
 ### Sources
-Wikilinks to wiki/sources/ ONLY.
+Wikilinks to source pages ONLY.
 ### Evolving Notes
 Updated as new sources arrive. Append, do not rewrite.
@@ -110,16 +103,6 @@ Updated as new sources arrive. Append, do not rewrite.
 One page per person, tool, organisation, technology, or product.
 Required frontmatter:
 ```yaml
 title: <name>
 type: person | company | tool | model | framework | technology
 domain: <domain>
 last_updated: YYYY-MM-DD
 aliases:
  - <exact title>
 ```
 Body sections (in this order):
 ### Description
@@ -132,23 +115,23 @@ Why this entity matters to this knowledge base.
 With dates where known.
 ### Related Concepts
-Wikilinks to wiki/concepts/ ONLY.
+Wikilinks to concept pages ONLY.
 ### Related Entities
-Wikilinks to wiki/entities/ ONLY.
+Wikilinks to entity pages ONLY.
 ### Sources
-Wikilinks to wiki/sources/ ONLY.
+Wikilinks to source pages ONLY.
 ---
 ## Non-Negotiable Rules
 1. Output ONLY a valid JSON array — no markdown fences, no prose before or after
-2. Each element: `{"path": "wiki/<type>/<slug>.md", "content": "...full markdown..."}`
+2. Each element: `{"title": "...", "type": "...", "subtype": "...", "domain": "...", "content": "..."}`
-3. Slugs are kebab-case: lowercase, spaces→hyphens, strip special characters
+3. Never include slugs, paths, or frontmatter in output — the pipeline handles these
-4. Every wikilink must be `[[slug|Display Text]]` — the pipe separator is required
+4. Wikilinks: `[[Display Name]]` only — no pipe, no slug
-5. Dates always YYYY-MM-DD
+5. Dates always YYYY-MM-DD (used only in content body where contextually relevant)
 6. Never reproduce verbatim code — describe the pattern or technique
-7. Section links must match their section type (Related Concepts → concepts/ only, etc.)
+7. Section links must match their section type
 8. One source page per book — if inventory shows it exists, include it as an UPDATE
--- a/docs/superpowers/plans/2026-04-23-level3-slug-authority.md
+++ b/docs/superpowers/plans/2026-04-23-level3-slug-authority.md
--- a/docs/superpowers/specs/2026-04-23-level3-slug-authority-design.md
+++ b/docs/superpowers/specs/2026-04-23-level3-slug-authority-design.md
@@ -0,0 +1,148 @@
 # Level 3: Strip Slug Authority from LLM — Design Spec
 ## Problem
 The ingestion pipeline currently asks the LLM to produce full wiki pages including the file path (e.g. `wiki/sources/finbert-huggingface.md`). This causes two classes of bug:
 1. **Slug proliferation** — the LLM invents different slugs for the same concept across chunks or runs, producing duplicate pages that diverge in content.
 2. **Unstable paths** — the LLM may shorten, expand, or vary titles, making deduplication via `Resolve` unreliable because the slug mismatch is upstream of the normalizer.
 ## Solution
 Strip slug authority from the LLM entirely. The LLM returns a minimal structured object. The pipeline computes all slugs deterministically from titles using `wiki.Slug(title)`.
 ---
 ## LLM JSON Contract
 ### Output format (per page)
 ```json
 {
  "title": "FinBERT",
  "type": "concept",
  "subtype": "framework",
  "domain": "ai-llm",
  "content": "## Definition\n\nA BERT-based model fine-tuned for financial sentiment...\n\n## Related\n\n- [[Sentiment Analysis]]\n- [[Hugging Face]]\n"
 }
 ```
 **Fields:**
 | Field | Required | Values |
 |-------|----------|--------|
 | `title` | yes | Human-readable title, e.g. "FinBERT" |
 | `type` | yes | `"source"` \| `"concept"` \| `"entity"` |
 | `subtype` | for entity/source | entity: `person\|company\|tool\|model\|framework\|technology`; source: `article\|pdf\|book\|video\|note\|project` |
 | `domain` | no | tag string, e.g. `ai-llm`, `finance` |
 | `content` | yes | Markdown body sections only — no frontmatter, no path |
 **Wikilinks in content:** `[[Display Name]]` only. No slug. The pipeline canonicalizes to `[[slug|Display Name]]` in a post-processing step.
 **The LLM never writes slugs, paths, or frontmatter.**
 ---
 ## Pipeline Changes
 ### New type: `RawPage`
 ```go
 type RawPage struct {
    Title   string
    Type    string // "source" | "concept" | "entity"
    Subtype string
    Domain  string
    Content string
 }
 ```
 ### New step order
 ```
 ParseRawPages → BuildPages → Resolve → CanonicalizeLinks → injectSourceRefs → mergeAll → write
 ```
 ### Step descriptions
 **`ParseRawPages(output string) ([]RawPage, []string)`**
 Replaces `ParsePages`. Deserializes JSON objects with the new schema. Same truncation-recovery logic as today. Returns `(pages, warnings)`.
 **`BuildPages(rawPages []RawPage, sourceSlug, date string) []wiki.Page`**
 Converts `RawPage → wiki.Page`:
 - Computes slug: `wiki.Slug(page.Title)`
 - Computes path: `wiki/<type>/<slug>.md`
 - Assembles frontmatter:
  ```
  ---
  title: <Title>
  type: <type>
  subtype: <subtype>        # omitted if empty
  domain: <domain>          # omitted if empty
  created: <date>
  source: <sourceSlug>      # omitted for the source page itself
  ---
  ```
 - Concatenates frontmatter + content
 **`Resolve(pages []wiki.Page, inventory) []wiki.Page`**
 Unchanged. Normalizes near-duplicate titles to existing inventory slugs.
 **`CanonicalizeLinks(pages []wiki.Page, inventory) ([]wiki.Page, []string)`**
 New. Builds a title→slug map from inventory + current batch. Replaces `[[Display Name]]` with `[[slug|Display Name]]` in each page's content. Titles with no known slug are left as-is and returned as warnings.
 **`injectSourceRefs`**
 Unchanged. Reads `[[slug|...]]` links (post-canonicalization) to inject back-references.
 **`mergeAll → write`**
 Unchanged.
 ### `pipeline.Run` signature change
 ```go
 func Run(ctx context.Context, cfg Config, brainDir, content, source string, dryRun bool) (Result, error)
 ```
 `source` is already passed (it's the display name / filename). A new internal `sourceSlug` is derived from it via `wiki.Slug(source)` before calling `BuildPages`. No API change needed.
 ---
 ## Files Changed
 | File | Change |
 |------|--------|
 | `ingestion/internal/pipeline/parse.go` | Replace `ParsePages` with `ParseRawPages` + `RawPage` type |
 | `ingestion/internal/pipeline/build.go` | New file: `BuildPages` |
 | `ingestion/internal/pipeline/links.go` | New file: `CanonicalizeLinks` |
 | `ingestion/internal/pipeline/pipeline.go` | Wire new steps; derive `sourceSlug` from `source` |
 | `ingestion/internal/pipeline/prompt.go` | New system prompt + `BuildPrompt` for new JSON format |
 | `brain/schema.md` | Update wikilink format and JSON schema docs |
 `resolve.go`, `refs.go`, `backfill.go`, `merge.go` — no changes.
 ---
 ## Wikilink Format
 - **LLM output**: `[[Display Name]]`
 - **Stored on disk**: `[[slug|Display Name]]`
 - **`CanonicalizeLinks`** converts between the two using the inventory
 This matches Obsidian's display-alias syntax that the existing codebase already uses.
 ---
 ## Testing Strategy
 - `ParseRawPages`: table-driven, cover valid JSON, truncated output, unknown type, missing title
 - `BuildPages`: table-driven, cover slug computation, frontmatter assembly, source page (no `source:` field), entity with subtype
 - `CanonicalizeLinks`: cover known title → replaced, unknown title → left as-is + warning, multiple links in one page
 - Integration test: full `Run` call with mock LLM returning new JSON format, assert no slug duplication across two chunks of the same source
 ---
 ## Out of Scope
 - Re-ingesting existing pages (user will trigger manually after deploy)
 - Changing the `BackfillRefs` endpoint (already correct, slug-based)
 - Changing the `Resolve` fuzzy-match algorithm
--- a/ingestion/cmd/server/main.go
+++ b/ingestion/cmd/server/main.go
@@ -68,6 +68,7 @@ func main() {
 	mux.HandleFunc("POST /write", h.Write)
 	mux.HandleFunc("POST /ingest", h.Ingest)
 	mux.HandleFunc("POST /ingest-path", h.IngestPath)
 	mux.HandleFunc("POST /ingest-raw", h.IngestRaw)
 	mux.HandleFunc("POST /backfill-refs", h.BackfillRefs)
 	addr := ":" + port
--- a/ingestion/internal/api/handler.go
+++ b/ingestion/internal/api/handler.go
@@ -272,6 +272,48 @@ func (h *Handler) IngestPath(w http.ResponseWriter, r *http.Request) {
 	writeJSON(w, ingestResponse{Pages: allPages, Warnings: allWarnings})
 }
 type ingestRawRequest struct {
 	Source string             `json:"source"`
 	Pages  []pipeline.RawPage `json:"pages"`
 	DryRun bool               `json:"dry_run"`
 }
 // IngestRaw handles POST /ingest-raw — run the pipeline on pre-parsed RawPages,
 // skipping the LLM extraction step. Use when the caller has already produced
 // structured page data (e.g. from a more capable model or manual curation).
 func (h *Handler) IngestRaw(w http.ResponseWriter, r *http.Request) {
 	var req ingestRawRequest
 	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
 		writeError(w, http.StatusBadRequest, "invalid JSON")
 		return
 	}
 	if strings.TrimSpace(req.Source) == "" {
 		writeError(w, http.StatusBadRequest, "source is required")
 		return
 	}
 	if len(req.Pages) == 0 {
 		writeError(w, http.StatusBadRequest, "pages is required and must be non-empty")
 		return
 	}
 	result, err := pipeline.RunRaw(h.brainDir, req.Source, req.Pages, req.DryRun)
 	if err != nil {
 		h.logger.Error("ingest-raw failed", "source", req.Source, "err", err)
 		writeError(w, http.StatusInternalServerError, "ingest error")
 		return
 	}
 	pages := result.Pages
 	if pages == nil {
 		pages = []string{}
 	}
 	warnings := result.Warnings
 	if warnings == nil {
 		warnings = []string{}
 	}
 	writeJSON(w, ingestResponse{Pages: pages, Warnings: warnings})
 }
 // BackfillRefs handles POST /backfill-refs — injects source back-references
 // into all concept and entity pages based on existing wiki/sources/ pages.
 func (h *Handler) BackfillRefs(w http.ResponseWriter, r *http.Request) {
--- a/ingestion/internal/api/handler_test.go
+++ b/ingestion/internal/api/handler_test.go
@@ -20,9 +20,9 @@ import (
 	"github.com/mathiasbq/hyperguild/ingestion/internal/pipeline"
 )
-// stubComplete returns a fixed JSON page so tests never call a real LLM.
+// stubComplete returns a fixed JSON RawPage so tests never call a real LLM.
 func stubComplete(_ context.Context, _, _ string) (string, error) {
-	return `[{"path":"wiki/sources/test-source.md","content":"# Test Source\n\nSome content here.\n"}]`, nil
+	return `[{"title":"Test Source","type":"source","subtype":"article","content":"## Summary\n\nSome content here.\n"}]`, nil
 }
 func stubPipelineCfg() pipeline.Config {
@@ -226,6 +226,85 @@ func TestIngestPath_File(t *testing.T) {
 	assert.NotEmpty(t, pagesSlice)
 }
 // ---------------------------------------------------------------------------
 // POST /ingest-raw
 // ---------------------------------------------------------------------------
 func TestIngestRaw_Validation(t *testing.T) {
 	cases := []struct {
 		name string
 		body map[string]any
 	}{
 		{"missing source", map[string]any{"pages": []any{map[string]any{"title": "X", "type": "concept", "content": "x"}}}},
 		{"missing pages", map[string]any{"source": "test-source"}},
 		{"empty pages", map[string]any{"source": "test-source", "pages": []any{}}},
 	}
 	for _, tc := range cases {
 		t.Run(tc.name, func(t *testing.T) {
 			_, h := setup(t)
 			body, _ := json.Marshal(tc.body)
 			req := httptest.NewRequest(http.MethodPost, "/ingest-raw", bytes.NewReader(body))
 			rec := httptest.NewRecorder()
 			h.IngestRaw(rec, req)
 			assert.Equal(t, http.StatusBadRequest, rec.Code)
 		})
 	}
 }
 func TestIngestRaw_Success(t *testing.T) {
 	dir, h := setup(t)
 	body, _ := json.Marshal(map[string]any{
 		"source": "test-article",
 		"pages": []any{
 			map[string]any{"title": "Test Article", "type": "source", "subtype": "article", "domain": "Testing", "content": "## Summary\n\nThis is a test article about [[Test Concept]].\n"},
 			map[string]any{"title": "Test Concept", "type": "concept", "domain": "Testing", "content": "A concept for testing.\n"},
 		},
 	})
 	req := httptest.NewRequest(http.MethodPost, "/ingest-raw", bytes.NewReader(body))
 	rec := httptest.NewRecorder()
 	h.IngestRaw(rec, req)
 	require.Equal(t, http.StatusOK, rec.Code)
 	var resp map[string]any
 	require.NoError(t, json.Unmarshal(rec.Body.Bytes(), &resp))
 	pages := resp["pages"].([]any)
 	assert.Len(t, pages, 2)
 	// Verify files were written
 	sourcePath := filepath.Join(dir, "wiki", "sources", "test-article.md")
 	assert.FileExists(t, sourcePath)
 	conceptPath := filepath.Join(dir, "wiki", "concepts", "test-concept.md")
 	assert.FileExists(t, conceptPath)
 }
 func TestIngestRaw_DryRun(t *testing.T) {
 	dir, h := setup(t)
 	body, _ := json.Marshal(map[string]any{
 		"source": "dry-run-test",
 		"pages": []any{
 			map[string]any{"title": "Dry Run Source", "type": "source", "subtype": "article", "content": "Content."},
 		},
 		"dry_run": true,
 	})
 	req := httptest.NewRequest(http.MethodPost, "/ingest-raw", bytes.NewReader(body))
 	rec := httptest.NewRecorder()
 	h.IngestRaw(rec, req)
 	require.Equal(t, http.StatusOK, rec.Code)
 	var resp map[string]any
 	require.NoError(t, json.Unmarshal(rec.Body.Bytes(), &resp))
 	pages := resp["pages"].([]any)
 	assert.NotEmpty(t, pages)
 	// Verify no files were written
 	sourcePath := filepath.Join(dir, "wiki", "sources", "dry-run-test.md")
 	assert.NoFileExists(t, sourcePath)
 }
 func TestIngestPath_Directory(t *testing.T) {
 	_, h := setup(t)
--- a/ingestion/internal/pipeline/build.go
+++ b/ingestion/internal/pipeline/build.go
@@ -0,0 +1,106 @@
 // ingestion/internal/pipeline/build.go
 package pipeline
 import (
 	"fmt"
 	"strings"
 	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
 )
 // BuildPages converts RawPages from the LLM into wiki.Pages with computed slugs,
 // paths, and YAML frontmatter. sourceSlug is the slug of the source being ingested
 // (derived from the filename, not the LLM title). Pages whose title resolves to an
 // empty slug are skipped and returned as warnings instead.
 func BuildPages(rawPages []RawPage, sourceSlug, date string) ([]wiki.Page, []string) {
 	out := make([]wiki.Page, 0, len(rawPages))
 	var warnings []string
 	for _, rp := range rawPages {
 		slug := computeSlug(rp, sourceSlug)
 		if slug == "" {
 			warnings = append(warnings, fmt.Sprintf("skipped page with empty title (type: %s)", rp.Type))
 			continue
 		}
 		out = append(out, buildPage(rp, sourceSlug, date))
 	}
 	return out, warnings
 }
 func computeSlug(rp RawPage, sourceSlug string) string {
 	if rp.Type == "source" {
 		return sourceSlug
 	}
 	return wiki.Slug(rp.Title)
 }
 func buildPage(rp RawPage, sourceSlug, date string) wiki.Page {
 	var slug, dir string
 	switch rp.Type {
 	case "source":
 		slug = sourceSlug
 		dir = "wiki/sources"
 	case "concept":
 		slug = wiki.Slug(rp.Title)
 		dir = "wiki/concepts"
 	case "entity":
 		slug = wiki.Slug(rp.Title)
 		dir = "wiki/entities"
 	default:
 		slug = wiki.Slug(rp.Title)
 		dir = "wiki/" + rp.Type
 	}
 	path := dir + "/" + slug + ".md"
 	fm := buildFrontmatter(rp, date)
 	return wiki.Page{
 		Path:    path,
 		Content: fm + "\n" + rp.Content,
 	}
 }
 func buildFrontmatter(rp RawPage, date string) string {
 	var sb strings.Builder
 	sb.WriteString("---\n")
 	fmt.Fprintf(&sb, "title: %s\n", yamlScalar(rp.Title))
 	switch rp.Type {
 	case "source":
 		subtype := rp.Subtype
 		if subtype == "" {
 			subtype = "article"
 		}
 		fmt.Fprintf(&sb, "type: %s\n", yamlScalar(subtype))
 		if rp.Domain != "" {
 			fmt.Fprintf(&sb, "domain: %s\n", yamlScalar(rp.Domain))
 		}
 		fmt.Fprintf(&sb, "date_ingested: %s\n", date)
 		fmt.Fprintf(&sb, "last_updated: %s\n", date)
 	case "concept":
 		if rp.Domain != "" {
 			fmt.Fprintf(&sb, "domain: %s\n", yamlScalar(rp.Domain))
 		}
 		fmt.Fprintf(&sb, "last_updated: %s\n", date)
 	case "entity":
 		if rp.Subtype != "" {
 			fmt.Fprintf(&sb, "type: %s\n", yamlScalar(rp.Subtype))
 		}
 		if rp.Domain != "" {
 			fmt.Fprintf(&sb, "domain: %s\n", yamlScalar(rp.Domain))
 		}
 		fmt.Fprintf(&sb, "last_updated: %s\n", date)
 	default:
 		if rp.Domain != "" {
 			fmt.Fprintf(&sb, "domain: %s\n", yamlScalar(rp.Domain))
 		}
 		fmt.Fprintf(&sb, "last_updated: %s\n", date)
 	}
 	fmt.Fprintf(&sb, "aliases:\n  - %s\n", yamlScalar(rp.Title))
 	sb.WriteString("---\n")
 	return sb.String()
 }
 func yamlScalar(s string) string {
 	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
 }
--- a/ingestion/internal/pipeline/build_test.go
+++ b/ingestion/internal/pipeline/build_test.go
@@ -0,0 +1,167 @@
 // ingestion/internal/pipeline/build_test.go
 package pipeline
 import (
 	"strings"
 	"testing"
 	"github.com/stretchr/testify/assert"
 	"github.com/stretchr/testify/require"
 )
 func TestBuildPages_SourcePage(t *testing.T) {
 	raw := []RawPage{
 		{
 			Title:   "Shape Up",
 			Type:    "source",
 			Subtype: "book",
 			Domain:  "product-strategy",
 			Content: "## Summary\n\nA book about shaping product work.\n",
 		},
 	}
 	pages, warnings := BuildPages(raw, "shape-up", "2026-04-23")
 	require.Len(t, pages, 1)
 	assert.Empty(t, warnings)
 	p := pages[0]
 	assert.Equal(t, "wiki/sources/shape-up.md", p.Path)
 	assert.Contains(t, p.Content, "title: 'Shape Up'")
 	assert.Contains(t, p.Content, "type: 'book'")
 	assert.Contains(t, p.Content, "domain: 'product-strategy'")
 	assert.Contains(t, p.Content, "date_ingested: 2026-04-23")
 	assert.Contains(t, p.Content, "last_updated: 2026-04-23")
 	assert.Contains(t, p.Content, "aliases:\n  - 'Shape Up'")
 	assert.Contains(t, p.Content, "## Summary")
 	assert.True(t, strings.HasPrefix(p.Content, "---\n"), "content must start with frontmatter")
 }
 func TestBuildPages_ConceptPage(t *testing.T) {
 	raw := []RawPage{
 		{
 			Title:   "Betting",
 			Type:    "concept",
 			Domain:  "product-strategy",
 			Content: "## Definition\n\nA resource allocation technique.\n",
 		},
 	}
 	pages, warnings := BuildPages(raw, "shape-up", "2026-04-23")
 	require.Len(t, pages, 1)
 	assert.Empty(t, warnings)
 	p := pages[0]
 	assert.Equal(t, "wiki/concepts/betting.md", p.Path)
 	assert.Contains(t, p.Content, "title: 'Betting'")
 	assert.Contains(t, p.Content, "domain: 'product-strategy'")
 	assert.Contains(t, p.Content, "last_updated: 2026-04-23")
 	assert.Contains(t, p.Content, "aliases:\n  - 'Betting'")
 	assert.NotContains(t, p.Content, "date_ingested")
 	assert.Contains(t, p.Content, "## Definition")
 }
 func TestBuildPages_EntityPage(t *testing.T) {
 	raw := []RawPage{
 		{
 			Title:   "Ryan Singer",
 			Type:    "entity",
 			Subtype: "person",
 			Domain:  "product-strategy",
 			Content: "## Description\n\nA product designer.\n",
 		},
 	}
 	pages, warnings := BuildPages(raw, "shape-up", "2026-04-23")
 	require.Len(t, pages, 1)
 	assert.Empty(t, warnings)
 	p := pages[0]
 	assert.Equal(t, "wiki/entities/ryan-singer.md", p.Path)
 	assert.Contains(t, p.Content, "title: 'Ryan Singer'")
 	assert.Contains(t, p.Content, "type: 'person'")
 	assert.Contains(t, p.Content, "domain: 'product-strategy'")
 	assert.Contains(t, p.Content, "last_updated: 2026-04-23")
 	assert.Contains(t, p.Content, "aliases:\n  - 'Ryan Singer'")
 	assert.NotContains(t, p.Content, "date_ingested")
 }
 func TestBuildPages_SourceSlugUsedForSourcePage(t *testing.T) {
 	// LLM title differs from filename — pipeline uses sourceSlug for the source page path.
 	raw := []RawPage{
 		{Title: "FinBERT: A Pretrained Model", Type: "source", Subtype: "article", Content: "## Summary\n\nA model.\n"},
 	}
 	pages, _ := BuildPages(raw, "finbert-huggingface", "2026-04-23")
 	require.Len(t, pages, 1)
 	assert.Equal(t, "wiki/sources/finbert-huggingface.md", pages[0].Path)
 }
 func TestBuildPages_ConceptSlugDerivedFromTitle(t *testing.T) {
 	raw := []RawPage{
 		{Title: "Domain-Driven Design", Type: "concept", Content: "## Definition\n\nFoo.\n"},
 	}
 	pages, _ := BuildPages(raw, "some-source", "2026-04-23")
 	require.Len(t, pages, 1)
 	assert.Equal(t, "wiki/concepts/domain-driven-design.md", pages[0].Path)
 }
 func TestBuildPages_SourceDefaultSubtype(t *testing.T) {
 	// If subtype is omitted for a source, default to "article"
 	raw := []RawPage{
 		{Title: "Some Post", Type: "source", Content: "## Summary\n\nA post.\n"},
 	}
 	pages, _ := BuildPages(raw, "some-post", "2026-04-23")
 	require.Len(t, pages, 1)
 	assert.Contains(t, pages[0].Content, "type: 'article'")
 }
 func TestBuildPages_OmitsDomainWhenEmpty(t *testing.T) {
 	raw := []RawPage{
 		{Title: "Betting", Type: "concept", Content: "## Definition\n\nFoo.\n"},
 	}
 	pages, _ := BuildPages(raw, "src", "2026-04-23")
 	require.Len(t, pages, 1)
 	assert.NotContains(t, pages[0].Content, "domain:")
 }
 func TestBuildPages_MultiplePages(t *testing.T) {
 	raw := []RawPage{
 		{Title: "Shape Up", Type: "source", Subtype: "book", Content: "## Summary\n\nA book.\n"},
 		{Title: "Betting", Type: "concept", Content: "## Definition\n\nA technique.\n"},
 		{Title: "Ryan Singer", Type: "entity", Subtype: "person", Content: "## Description\n\nA designer.\n"},
 	}
 	pages, _ := BuildPages(raw, "shape-up", "2026-04-23")
 	require.Len(t, pages, 3)
 	assert.Equal(t, "wiki/sources/shape-up.md", pages[0].Path)
 	assert.Equal(t, "wiki/concepts/betting.md", pages[1].Path)
 	assert.Equal(t, "wiki/entities/ryan-singer.md", pages[2].Path)
 }
 func TestBuildPages_TitleWithColon(t *testing.T) {
 	raw := []RawPage{
 		{Title: "Shape Up: The Basecamp Method", Type: "source", Subtype: "book", Content: "## Summary\n\nA book.\n"},
 	}
 	pages, _ := BuildPages(raw, "shape-up", "2026-04-23")
 	require.Len(t, pages, 1)
 	// Title with colon must be quoted in YAML
 	assert.Contains(t, pages[0].Content, "title: 'Shape Up: The Basecamp Method'")
 	assert.Contains(t, pages[0].Content, "aliases:\n  - 'Shape Up: The Basecamp Method'")
 }
 func TestBuildPages_EntityNoSubtype(t *testing.T) {
 	raw := []RawPage{
 		{Title: "Basecamp", Type: "entity", Content: "## Description\n\nA company.\n"},
 	}
 	pages, _ := BuildPages(raw, "src", "2026-04-23")
 	require.Len(t, pages, 1)
 	assert.NotContains(t, pages[0].Content, "type:")
 	assert.Contains(t, pages[0].Content, "title: 'Basecamp'")
 }
 func TestBuildPages_EmptyTitleSkippedWithWarning(t *testing.T) {
 	raw := []RawPage{
 		{Title: "", Type: "concept", Content: "## Definition\n\nFoo.\n"},
 		{Title: "Betting", Type: "concept", Content: "## Definition\n\nA technique.\n"},
 	}
 	pages, warnings := BuildPages(raw, "src", "2026-04-23")
 	require.Len(t, pages, 1, "empty-title page should be skipped")
 	assert.Equal(t, "wiki/concepts/betting.md", pages[0].Path)
 	assert.Len(t, warnings, 1)
 	assert.Contains(t, warnings[0], "empty title")
 }
--- a/ingestion/internal/pipeline/links.go
+++ b/ingestion/internal/pipeline/links.go
@@ -0,0 +1,70 @@
 // ingestion/internal/pipeline/links.go
 package pipeline
 import (
 	"fmt"
 	"path/filepath"
 	"regexp"
 	"strings"
 	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
 )
 // plainLinkRE matches [[Display Name]] — wikilinks without a slug pipe.
 // It does NOT match [[slug|Display]] (those already have a pipe).
 var plainLinkRE = regexp.MustCompile(`\[\[([^\]|]+)\]\]`)
 // CanonicalizeLinks converts [[Display Name]] wikilinks to [[slug|Display Name]]
 // using a title→slug map built from the inventory and current batch.
 // Unknown titles are left as-is and returned as warnings.
 func CanonicalizeLinks(pages []wiki.Page, inventory map[wiki.PageType][]wiki.Entry) ([]wiki.Page, []string) {
 	titleToSlug := buildTitleMap(pages, inventory)
 	var allWarnings []string
 	out := make([]wiki.Page, len(pages))
 	for i, p := range pages {
 		newContent, warnings := canonicalizeContent(p.Content, titleToSlug)
 		p.Content = newContent
 		out[i] = p
 		allWarnings = append(allWarnings, warnings...)
 	}
 	return out, allWarnings
 }
 // buildTitleMap builds a lowercase-title → slug map from inventory and current batch.
 // Current batch entries take precedence over inventory (they may be updates).
 func buildTitleMap(pages []wiki.Page, inventory map[wiki.PageType][]wiki.Entry) map[string]string {
 	m := make(map[string]string)
 	for _, entries := range inventory {
 		for _, e := range entries {
 			m[strings.ToLower(e.Title)] = e.Slug
 		}
 	}
 	// Current batch overrides inventory
 	for _, p := range pages {
 		title := extractTitle(p.Content)
 		slug := strings.TrimSuffix(filepath.Base(p.Path), ".md")
 		if title != "" && slug != "" {
 			m[strings.ToLower(title)] = slug
 		}
 	}
 	return m
 }
 func canonicalizeContent(content string, titleToSlug map[string]string) (string, []string) {
 	var warnings []string
 	result := plainLinkRE.ReplaceAllStringFunc(content, func(match string) string {
 		sub := plainLinkRE.FindStringSubmatch(match)
 		if len(sub) < 2 {
 			return match
 		}
 		displayName := sub[1]
 		slug, ok := titleToSlug[strings.ToLower(displayName)]
 		if !ok {
 			warnings = append(warnings, fmt.Sprintf("unknown wikilink: [[%s]]", displayName))
 			return match
 		}
 		return "[[" + slug + "|" + displayName + "]]"
 	})
 	return result, warnings
 }
--- a/ingestion/internal/pipeline/links_test.go
+++ b/ingestion/internal/pipeline/links_test.go
@@ -0,0 +1,125 @@
 // ingestion/internal/pipeline/links_test.go
 package pipeline
 import (
 	"testing"
 	"github.com/stretchr/testify/assert"
 	"github.com/stretchr/testify/require"
 	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
 )
 func TestCanonicalizeLinks_KnownTitle(t *testing.T) {
 	pages := []wiki.Page{
 		{
 			Path:    "wiki/sources/shape-up.md",
 			Content: "---\ntitle: 'Shape Up'\n---\n\n## Summary\n\nSee [[Betting]].\n",
 		},
 	}
 	inventory := map[wiki.PageType][]wiki.Entry{
 		wiki.PageTypeConcept: {
 			{Slug: "betting", Title: "Betting"},
 		},
 	}
 	got, warnings := CanonicalizeLinks(pages, inventory)
 	require.Len(t, got, 1)
 	assert.Empty(t, warnings)
 	assert.Contains(t, got[0].Content, "[[betting|Betting]]")
 	assert.NotContains(t, got[0].Content, "[[Betting]]")
 }
 func TestCanonicalizeLinks_UnknownTitleLeftAsIs(t *testing.T) {
 	pages := []wiki.Page{
 		{
 			Path:    "wiki/sources/shape-up.md",
 			Content: "---\ntitle: 'Shape Up'\n---\n\n## Summary\n\nSee [[Ghost Concept]].\n",
 		},
 	}
 	inventory := map[wiki.PageType][]wiki.Entry{}
 	got, warnings := CanonicalizeLinks(pages, inventory)
 	require.Len(t, got, 1)
 	assert.NotEmpty(t, warnings)
 	assert.Contains(t, got[0].Content, "[[Ghost Concept]]")
 }
 func TestCanonicalizeLinks_AlreadyCanonicalLinkUntouched(t *testing.T) {
 	// Links already in [[slug|Display]] format must not be double-converted
 	pages := []wiki.Page{
 		{
 			Path:    "wiki/sources/shape-up.md",
 			Content: "---\ntitle: 'Shape Up'\n---\n\n## Summary\n\nSee [[betting|Betting]].\n",
 		},
 	}
 	inventory := map[wiki.PageType][]wiki.Entry{
 		wiki.PageTypeConcept: {
 			{Slug: "betting", Title: "Betting"},
 		},
 	}
 	got, warnings := CanonicalizeLinks(pages, inventory)
 	require.Len(t, got, 1)
 	assert.Empty(t, warnings)
 	// Should remain exactly as-is — not double-wrapped
 	assert.Contains(t, got[0].Content, "[[betting|Betting]]")
 	assert.NotContains(t, got[0].Content, "[[betting|[[betting|Betting]]]]")
 }
 func TestCanonicalizeLinks_CaseInsensitiveMatch(t *testing.T) {
 	pages := []wiki.Page{
 		{
 			Path:    "wiki/sources/foo.md",
 			Content: "---\ntitle: 'Foo'\n---\n\n## Summary\n\nSee [[domain driven design]].\n",
 		},
 	}
 	inventory := map[wiki.PageType][]wiki.Entry{
 		wiki.PageTypeConcept: {
 			{Slug: "domain-driven-design", Title: "Domain Driven Design"},
 		},
 	}
 	got, warnings := CanonicalizeLinks(pages, inventory)
 	require.Len(t, got, 1)
 	assert.Empty(t, warnings)
 	assert.Contains(t, got[0].Content, "[[domain-driven-design|domain driven design]]")
 }
 func TestCanonicalizeLinks_CurrentBatchPagesResolved(t *testing.T) {
 	// A concept created in the same batch should be canonicalizable
 	pages := []wiki.Page{
 		{
 			Path:    "wiki/sources/shape-up.md",
 			Content: "---\ntitle: 'Shape Up'\n---\n\n## Summary\n\nSee [[Betting]].\n",
 		},
 		{
 			Path:    "wiki/concepts/betting.md",
 			Content: "---\ntitle: 'Betting'\n---\n\n## Definition\n\nA technique.\n",
 		},
 	}
 	inventory := map[wiki.PageType][]wiki.Entry{} // empty — Betting is in the batch, not inventory
 	got, warnings := CanonicalizeLinks(pages, inventory)
 	require.Len(t, got, 2)
 	assert.Empty(t, warnings)
 	assert.Contains(t, got[0].Content, "[[betting|Betting]]")
 }
 func TestCanonicalizeLinks_MultipleLinksInOnePage(t *testing.T) {
 	pages := []wiki.Page{
 		{
 			Path:    "wiki/sources/foo.md",
 			Content: "---\ntitle: 'Foo'\n---\n\n## Summary\n\nSee [[Betting]] and [[Shape Up]].\n",
 		},
 	}
 	inventory := map[wiki.PageType][]wiki.Entry{
 		wiki.PageTypeConcept: {
 			{Slug: "betting", Title: "Betting"},
 		},
 		wiki.PageTypeSource: {
 			{Slug: "shape-up", Title: "Shape Up"},
 		},
 	}
 	got, warnings := CanonicalizeLinks(pages, inventory)
 	require.Len(t, got, 1)
 	assert.Empty(t, warnings)
 	assert.Contains(t, got[0].Content, "[[betting|Betting]]")
 	assert.Contains(t, got[0].Content, "[[shape-up|Shape Up]]")
 }
--- a/ingestion/internal/pipeline/parse.go
+++ b/ingestion/internal/pipeline/parse.go
@@ -5,13 +5,22 @@ import (
 	"encoding/json"
 	"fmt"
 	"strings"
 	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
 )
-// ParsePages parses LLM output as a JSON array of {path, content} objects.
+// RawPage is the LLM's output format — minimal structured data with no path or frontmatter.
-// If the array is truncated mid-object (token limit), it salvages all complete objects.
+// The pipeline derives slugs, paths, and frontmatter from these fields.
-func ParsePages(output string) ([]wiki.Page, []string) {
+type RawPage struct {
 	Title   string `json:"title"`
 	Type    string `json:"type"`    // "source" | "concept" | "entity"
 	Subtype string `json:"subtype"` // entity: person|company|tool|model|framework|technology; source: article|pdf|book|video|note|project
 	Domain  string `json:"domain"`
 	Content string `json:"content"` // Markdown body only — no frontmatter
 }
 // ParseRawPages parses LLM output as a JSON array of RawPage objects.
 // If the output contains invalid JSON escape sequences (e.g. \. from Markdown),
 // it attempts repair before falling back to truncation recovery.
 func ParseRawPages(output string) ([]RawPage, []string) {
 	output = strings.TrimSpace(output)
 	if output == "" {
 		return nil, []string{"LLM returned empty output"}
@@ -19,23 +28,30 @@ func ParsePages(output string) ([]wiki.Page, []string) {
 	output = stripFences(output)
-	var pages []wiki.Page
+	// Fast path: valid JSON.
 	var pages []RawPage
 	if err := json.Unmarshal([]byte(output), &pages); err == nil {
 		return pages, nil
 	}
 	// Repair pass: fix invalid escape sequences (e.g. \. \d from Markdown content).
 	repaired := repairJSON(output)
 	if err := json.Unmarshal([]byte(repaired), &pages); err == nil {
 		return pages, []string{"repaired invalid JSON escape sequences in LLM output"}
 	}
 	// Truncation recovery: find last `}` that closes a complete object.
-	idx := strings.LastIndex(output, "}")
+	idx := strings.LastIndex(repaired, "}")
 	if idx < 0 {
 		return nil, []string{"LLM output contained no complete JSON objects"}
 	}
-	start := strings.Index(output, "[")
+	start := strings.Index(repaired, "[")
 	if start < 0 {
 		return nil, []string{"LLM output contained no JSON array opening bracket"}
 	}
-	candidate := output[start:idx+1] + "]"
+	candidate := repaired[start:idx+1] + "]"
 	if err := json.Unmarshal([]byte(candidate), &pages); err != nil {
 		return nil, []string{fmt.Sprintf("truncation recovery failed: %v", err)}
 	}
@@ -43,6 +59,45 @@ func ParsePages(output string) ([]wiki.Page, []string) {
 	return pages, []string{fmt.Sprintf("LLM output was truncated; recovered %d page(s)", len(pages))}
 }
 // repairJSON replaces invalid JSON escape sequences (e.g. \. \d \p) with
 // a properly escaped backslash followed by the same character.
 // It iterates byte-by-byte to correctly skip already-valid escape sequences
 // (including \\) without requiring lookbehind support.
 func repairJSON(s string) string {
 	var b strings.Builder
 	b.Grow(len(s))
 	i := 0
 	for i < len(s) {
 		if s[i] != '\\' {
 			b.WriteByte(s[i])
 			i++
 			continue
 		}
 		// We have a backslash. Peek at the next character.
 		if i+1 >= len(s) {
 			// Trailing backslash — emit as-is.
 			b.WriteByte(s[i])
 			i++
 			continue
 		}
 		next := s[i+1]
 		switch next {
 		case '"', '\\', '/', 'b', 'f', 'n', 'r', 't', 'u':
 			// Valid JSON escape sequence — emit both characters as-is.
 			b.WriteByte(s[i])
 			b.WriteByte(next)
 			i += 2
 		default:
 			// Invalid escape — double the backslash.
 			b.WriteByte('\\')
 			b.WriteByte('\\')
 			b.WriteByte(next)
 			i += 2
 		}
 	}
 	return b.String()
 }
 func stripFences(s string) string {
 	for _, prefix := range []string{"```json\n", "```json\r\n", "```\n", "```\r\n"} {
 		if strings.HasPrefix(s, prefix) {
--- a/ingestion/internal/pipeline/parse_test.go
+++ b/ingestion/internal/pipeline/parse_test.go
@@ -8,39 +8,80 @@ import (
 	"github.com/stretchr/testify/require"
 )
-func TestParsePages_ValidJSON(t *testing.T) {
+func TestParseRawPages_ValidJSON(t *testing.T) {
-	input := `[{"path":"wiki/sources/foo.md","content":"# Foo"},{"path":"wiki/concepts/bar.md","content":"# Bar"}]`
+	input := `[{"title":"Shape Up","type":"source","subtype":"book","domain":"product-strategy","content":"## Summary\n\nFoo."},{"title":"Betting","type":"concept","content":"## Definition\n\nA technique."}]`
-	pages, warnings := ParsePages(input)
+	pages, warnings := ParseRawPages(input)
 	require.Len(t, pages, 2)
 	assert.Empty(t, warnings)
-	assert.Equal(t, "wiki/sources/foo.md", pages[0].Path)
+	assert.Equal(t, "Shape Up", pages[0].Title)
-	assert.Equal(t, "wiki/concepts/bar.md", pages[1].Path)
+	assert.Equal(t, "source", pages[0].Type)
 	assert.Equal(t, "book", pages[0].Subtype)
 	assert.Equal(t, "product-strategy", pages[0].Domain)
 	assert.Equal(t, "Betting", pages[1].Title)
 	assert.Equal(t, "concept", pages[1].Type)
 	assert.Empty(t, pages[1].Subtype)
 }
-func TestParsePages_StripsFences(t *testing.T) {
+func TestParseRawPages_StripsFences(t *testing.T) {
-	input := "```json\n[{\"path\":\"wiki/sources/foo.md\",\"content\":\"# Foo\"}]\n```"
+	input := "```json\n[{\"title\":\"Foo\",\"type\":\"concept\",\"content\":\"## Definition\\n\\nFoo.\"}]\n```"
-	pages, warnings := ParsePages(input)
+	pages, warnings := ParseRawPages(input)
 	assert.Len(t, pages, 1)
 	assert.Empty(t, warnings)
 }
 func TestParsePages_TruncationRecovery(t *testing.T) {
 	input := `[{"path":"wiki/sources/foo.md","content":"# Foo"},{"path":"wiki/concepts/bar.md","content":"trunc`
 	pages, warnings := ParsePages(input)
 	require.Len(t, pages, 1)
-	assert.Equal(t, "wiki/sources/foo.md", pages[0].Path)
+	assert.Empty(t, warnings)
 	assert.Equal(t, "Foo", pages[0].Title)
 }
 func TestParseRawPages_TruncationRecovery(t *testing.T) {
 	input := `[{"title":"Foo","type":"concept","content":"## Definition\n\nFoo."},{"title":"Bar","type":"concept","content":"trunc`
 	pages, warnings := ParseRawPages(input)
 	require.Len(t, pages, 1)
 	assert.Equal(t, "Foo", pages[0].Title)
 	assert.NotEmpty(t, warnings)
 }
-func TestParsePages_EmptyInput(t *testing.T) {
+func TestParseRawPages_EmptyInput(t *testing.T) {
-	pages, warnings := ParsePages("")
+	pages, warnings := ParseRawPages("")
 	assert.Empty(t, pages)
 	assert.NotEmpty(t, warnings)
 }
-func TestParsePages_PlainFence(t *testing.T) {
+func TestParseRawPages_PlainFence(t *testing.T) {
-	input := "```\n[{\"path\":\"wiki/sources/foo.md\",\"content\":\"ok\"}]\n```"
+	input := "```\n[{\"title\":\"Foo\",\"type\":\"concept\",\"content\":\"ok\"}]\n```"
-	pages, warnings := ParsePages(input)
+	pages, warnings := ParseRawPages(input)
-	assert.Len(t, pages, 1)
+	require.Len(t, pages, 1)
 	assert.Empty(t, warnings)
 }
 func TestParseRawPages_MissingTitle(t *testing.T) {
 	// Missing title — still parsed, Title is empty string
 	input := `[{"type":"concept","content":"## Definition\n\nFoo."}]`
 	pages, warnings := ParseRawPages(input)
 	require.Len(t, pages, 1)
 	assert.Empty(t, warnings)
 	assert.Empty(t, pages[0].Title)
 }
 func TestParseRawPages_InvalidEscapeRepaired(t *testing.T) {
 	// LLM copied markdown escaped list numbers (\.) into JSON — invalid escape
 	raw := "[{\"title\":\"Foo\",\"type\":\"concept\",\"content\":\"Step 4\\. Do it.\"}]"
 	pages, warnings := ParseRawPages(raw)
 	require.Len(t, pages, 1)
 	assert.Equal(t, "Foo", pages[0].Title)
 	assert.Contains(t, pages[0].Content, `4\.`)
 	assert.NotEmpty(t, warnings) // repair warning
 }
 func TestRepairJSON_FixesInvalidEscapes(t *testing.T) {
 	cases := []struct {
 		in   string
 		want string
 	}{
 		{`{"a":"foo\.bar"}`, `{"a":"foo\\.bar"}`},
 		{`{"a":"\\n is fine"}`, `{"a":"\\n is fine"}`}, // valid \n untouched
 		{`{"a":"\d+ items"}`, `{"a":"\\d+ items"}`},
 		{`{"a":"already \\ escaped"}`, `{"a":"already \\ escaped"}`}, // valid \\ untouched
 	}
 	for _, tc := range cases {
 		got := repairJSON(tc.in)
 		assert.Equal(t, tc.want, got, "input: %s", tc.in)
 	}
 }
--- a/ingestion/internal/pipeline/pipeline.go
+++ b/ingestion/internal/pipeline/pipeline.go
@@ -41,9 +41,11 @@ func Run(ctx context.Context, cfg Config, brainDir, content, source string, dryR
 		schema = loadSchema(brainDir)
 	}
 	sourceSlug := wiki.Slug(source)
 	date := time.Now().UTC().Format("2006-01-02")
 	chunks := Chunk(content, cfg.ChunkSize)
-	var allPages []wiki.Page
+	var allRaw []RawPage
 	var allWarnings []string
 	for _, chunk := range chunks {
@@ -52,18 +54,40 @@ func Run(ctx context.Context, cfg Config, brainDir, content, source string, dryR
 		if err != nil {
 			return Result{}, fmt.Errorf("LLM call: %w", err)
 		}
-		pages, warnings := ParsePages(output)
+		raw, warnings := ParseRawPages(output)
-		allPages = append(allPages, pages...)
+		allRaw = append(allRaw, raw...)
 		allWarnings = append(allWarnings, warnings...)
 	}
-	resolved := Resolve(allPages, inventory)
+	return buildAndWrite(allRaw, sourceSlug, date, brainDir, source, inventory, allWarnings, dryRun)
-	withRefs := injectSourceRefs(resolved, inventory, brainDir)
+}
 // RunRaw runs the pipeline on pre-parsed RawPages, skipping the LLM extraction
 // step. Use this when the caller has already produced the structured RawPage data
 // (e.g. from a more capable model or manual curation).
 func RunRaw(brainDir, source string, rawPages []RawPage, dryRun bool) (Result, error) {
 	inventory, err := wiki.LoadInventory(brainDir)
 	if err != nil {
 		return Result{}, fmt.Errorf("load inventory: %w", err)
 	}
 	sourceSlug := wiki.Slug(source)
 	date := time.Now().UTC().Format("2006-01-02")
 	return buildAndWrite(rawPages, sourceSlug, date, brainDir, source, inventory, nil, dryRun)
 }
 // buildAndWrite runs BuildPages through write for both Run and RunRaw.
 func buildAndWrite(rawPages []RawPage, sourceSlug, date, brainDir, source string, inventory map[wiki.PageType][]wiki.Entry, warnings []string, dryRun bool) (Result, error) {
 	pages, buildWarnings := BuildPages(rawPages, sourceSlug, date)
 	warnings = append(warnings, buildWarnings...)
 	resolved := Resolve(pages, inventory)
 	canonicalized, linkWarnings := CanonicalizeLinks(resolved, inventory)
 	warnings = append(warnings, linkWarnings...)
 	withRefs := injectSourceRefs(canonicalized, inventory, brainDir)
 	merged := mergeAll(withRefs)
 	date := time.Now().UTC().Format("2006-01-02")
 	var written []string
 	for _, page := range merged {
 		if !dryRun {
 			dest := filepath.Join(brainDir, filepath.FromSlash(page.Path))
@@ -79,14 +103,14 @@ func Run(ctx context.Context, cfg Config, brainDir, content, source string, dryR
 	if !dryRun {
 		if err := wiki.RebuildIndex(brainDir, date); err != nil {
-			allWarnings = append(allWarnings, fmt.Sprintf("rebuild index: %v", err))
+			warnings = append(warnings, fmt.Sprintf("rebuild index: %v", err))
 		}
-		if err := wiki.AppendLog(brainDir, source, written, allWarnings, date); err != nil {
+		if err := wiki.AppendLog(brainDir, source, written, warnings, date); err != nil {
-			allWarnings = append(allWarnings, fmt.Sprintf("append log: %v", err))
+			warnings = append(warnings, fmt.Sprintf("append log: %v", err))
 		}
 	}
-	return Result{Pages: written, Warnings: allWarnings}, nil
+	return Result{Pages: written, Warnings: warnings}, nil
 }
 // mergeAll deduplicates pages by path, merging content from later occurrences.
--- a/ingestion/internal/pipeline/pipeline_test.go
+++ b/ingestion/internal/pipeline/pipeline_test.go
@@ -15,7 +15,6 @@ import (
 	"github.com/stretchr/testify/require"
 	"github.com/mathiasbq/hyperguild/ingestion/internal/llm"
 	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
 )
 func TestRun_WritesPages(t *testing.T) {
@@ -24,14 +23,19 @@ func TestRun_WritesPages(t *testing.T) {
 		require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755))
 	}
-	llmResponse := mustJSON([]wiki.Page{
+	llmResponse := mustJSON([]RawPage{
 		{
-			Path:    "wiki/sources/test-article.md",
+			Title:   "Test Article",
-			Content: "---\ntitle: Test Article\ntype: article\ndomain: software-engineering\ndate_ingested: 2026-04-22\nlast_updated: 2026-04-22\naliases:\n  - Test Article\n---\n\n## Summary\n\nA test article.\n\n## Key Claims\n\n- It tests things.\n\n## Concepts Introduced or Reinforced\n\n## Entities Mentioned\n\n## Open Questions Raised\n",
+			Type:    "source",
 			Subtype: "article",
 			Domain:  "software-engineering",
 			Content: "## Summary\n\nA test article.\n\n## Key Claims\n\n- It tests things.\n\n## Concepts Introduced or Reinforced\n\n[[Testing]]\n\n## Entities Mentioned\n\n## Open Questions Raised\n",
 		},
 		{
-			Path:    "wiki/concepts/testing.md",
+			Title:   "Testing",
-			Content: "---\ntitle: Testing\ndomain: software-engineering\nlast_updated: 2026-04-22\naliases:\n  - Testing\n---\n\n## Definition\n\nThe practice of verifying software.\n\n## Why It Matters\n\nCatches bugs.\n\n## Related Concepts\n\n## Related Entities\n\n## Sources\n\n## Evolving Notes\n",
+			Type:    "concept",
 			Domain:  "software-engineering",
 			Content: "## Definition\n\nThe practice of verifying software.\n\n## Why It Matters\n\nCatches bugs.\n\n## Related Concepts\n\n## Related Entities\n\n## Sources\n\n## Evolving Notes\n",
 		},
 	})
@@ -53,7 +57,6 @@ func TestRun_WritesPages(t *testing.T) {
 	result, err := Run(context.Background(), cfg, brainDir, "An article about testing.", "test-article", false)
 	require.NoError(t, err)
 	assert.Len(t, result.Pages, 2)
 	assert.Empty(t, result.Warnings)
 	_, err = os.Stat(filepath.Join(brainDir, "wiki", "sources", "test-article.md"))
 	require.NoError(t, err)
@@ -71,9 +74,11 @@ func TestRun_DryRunDoesNotWrite(t *testing.T) {
 		require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755))
 	}
-	llmResponse := mustJSON([]wiki.Page{{
+	llmResponse := mustJSON([]RawPage{{
-		Path:    "wiki/sources/foo.md",
+		Title:   "Foo",
-		Content: "---\ntitle: Foo\n---\n\n## Summary\n\nFoo.\n",
+		Type:    "source",
 		Subtype: "article",
 		Content: "## Summary\n\nFoo.\n",
 	}})
 	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
@@ -98,10 +103,10 @@ func TestRun_MergesDuplicatePaths(t *testing.T) {
 		require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755))
 	}
-	// LLM returns same path twice (simulates multi-chunk merge)
+	// LLM returns same title twice (simulates multi-chunk duplicate)
-	llmResponse := mustJSON([]wiki.Page{
+	llmResponse := mustJSON([]RawPage{
-		{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Definition\n\nFirst.\n\n## Related Concepts\n\n- [[bar|Bar]]\n"},
+		{Title: "Foo", Type: "concept", Content: "## Definition\n\nFirst.\n\n## Related Concepts\n\n[[Bar]]\n"},
-		{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Definition\n\nSecond.\n\n## Related Concepts\n\n- [[baz|Baz]]\n"},
+		{Title: "Foo", Type: "concept", Content: "## Definition\n\nSecond.\n\n## Related Concepts\n\n[[Baz]]\n"},
 	})
 	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
@@ -120,8 +125,9 @@ func TestRun_MergesDuplicatePaths(t *testing.T) {
 	require.NoError(t, err)
 	// keep-first for Definition, union for Related Concepts
 	assert.Contains(t, string(content), "First.")
-	assert.Contains(t, string(content), "[[bar|Bar]]")
+	// Bar and Baz unknown in empty inventory → left as plain [[links]]
-	assert.Contains(t, string(content), "[[baz|Baz]]")
+	assert.Contains(t, string(content), "[[Bar]]")
 	assert.Contains(t, string(content), "[[Baz]]")
 }
 func mustJSON(v any) string {
--- a/ingestion/internal/pipeline/prompt.go
+++ b/ingestion/internal/pipeline/prompt.go
@@ -12,12 +12,15 @@ import (
 const systemPrompt = `You are a wiki agent. Read the source material and produce structured wiki pages following the schema provided.
 Output ONLY a valid JSON array — no markdown fences, no other text before or after.
-Each element must have:
+Each element must have exactly these fields:
-  "path"    — relative path within the wiki, e.g. "wiki/sources/foo.md"
+  "title"   — exact page title (e.g. "FinBERT", "Ryan Singer", "Shape Up")
-  "content" — full markdown content of the page including YAML frontmatter
+  "type"    — exactly one of: "source", "concept", "entity"
  "subtype" — for source: article|pdf|book|video|note|project; for entity: person|company|tool|model|framework|technology; omit for concept
  "domain"  — one of the domains in the schema (omit if none fits)
  "content" — Markdown body only — NO frontmatter, NO path, NO slug
-Follow the schema strictly: correct frontmatter fields, wikilinks as [[slug|Display Text]],
+Wikilinks in content: [[Display Name]] — just the display name, no slug, no pipe separator.
-dates in YYYY-MM-DD format, and paraphrase rather than quoting verbatim.`
+Only link to pages listed in the inventory or pages you are creating in this response.`
 // BuildPrompt constructs the user prompt for a single chunk.
 func BuildPrompt(schema, source, content string, inventory map[wiki.PageType][]wiki.Entry) string {
@@ -30,7 +33,7 @@ func BuildPrompt(schema, source, content string, inventory map[wiki.PageType][]w
 	sb.WriteString("\n\n")
 	sb.WriteString("## Existing wiki pages\n\n")
-	sb.WriteString("Link ONLY to pages in this inventory or pages you are creating in this response.\n\n")
+	sb.WriteString("Reference these pages by display name only — [[Display Name]] — in your content.\n\n")
 	for _, pt := range []wiki.PageType{wiki.PageTypeConcept, wiki.PageTypeEntity, wiki.PageTypeSource} {
 		entries := inventory[pt]
@@ -39,19 +42,19 @@ func BuildPrompt(schema, source, content string, inventory map[wiki.PageType][]w
 			fmt.Fprintf(&sb, "%s — (none yet)\n\n", label)
 			continue
 		}
-		fmt.Fprintf(&sb, "%s — link ONLY under the matching section:\n", label)
+		fmt.Fprintf(&sb, "%s:\n", label)
 		for _, e := range entries {
-			fmt.Fprintf(&sb, "  - [[%s|%s]]\n", e.Slug, e.Title)
+			fmt.Fprintf(&sb, "  - %s\n", e.Title)
 		}
 		sb.WriteString("\n")
 	}
 	sb.WriteString("## Non-negotiable rules\n\n")
 	sb.WriteString("1. Output ONLY a valid JSON array — no prose, no fences.\n")
-	sb.WriteString("2. Slugs are kebab-case: lowercase, spaces→hyphens, no special chars.\n")
+	sb.WriteString("2. Fields: title, type, subtype (if applicable), domain (if applicable), content.\n")
-	sb.WriteString("3. Wikilinks: [[slug|Display Text]] — the pipe is required.\n")
+	sb.WriteString("3. Wikilinks: [[Display Name]] — no slug, no pipe. The pipeline handles slugs.\n")
-	sb.WriteString("4. Section links must match their section type.\n")
+	sb.WriteString("4. Section links must match their section type (Related Concepts → concepts only, etc.).\n")
-	sb.WriteString("5. One source page per book — update it if inventory shows it exists.\n\n")
+	sb.WriteString("5. One source page per book — if inventory shows it exists, return it as an UPDATE.\n\n")
 	fmt.Fprintf(&sb, "## Source: %s\n\n", source)
 	sb.WriteString(content)
--- a/ingestion/internal/watcher/watcher_test.go
+++ b/ingestion/internal/watcher/watcher_test.go
@@ -14,13 +14,12 @@ import (
 	"github.com/stretchr/testify/require"
 	"github.com/mathiasbq/hyperguild/ingestion/internal/pipeline"
 	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
 )
-// successComplete returns a valid JSON-encoded page array for any call.
+// successComplete returns a valid JSON-encoded RawPage array for any call.
-func successComplete(page wiki.Page) pipeline.CompleteFunc {
+func successComplete(raw pipeline.RawPage) pipeline.CompleteFunc {
 	return func(ctx context.Context, system, user string) (string, error) {
-		b, err := json.Marshal([]wiki.Page{page})
+		b, err := json.Marshal([]pipeline.RawPage{raw})
 		if err != nil {
 			return "", err
 		}
@@ -50,16 +49,19 @@ func TestStart_ProcessesFile(t *testing.T) {
 	require.NoError(t, os.WriteFile(rawFile, []byte("Content about Shape Up."), 0o644))
 	date := time.Now().UTC().Format("2006-01-02")
-	wikiPage := wiki.Page{
+	rawPage := pipeline.RawPage{
-		Path:    "wiki/sources/shape-up-book.md",
+		Title:   "Shape Up Book",
-		Content: "---\ntitle: Shape Up Book\ntype: article\ndomain: product-management\ndate_ingested: " + date + "\nlast_updated: " + date + "\naliases:\n  - Shape Up Book\n---\n\n## Summary\n\nA book about Shape Up.\n",
+		Type:    "source",
 		Subtype: "article",
 		Domain:  "product-management",
 		Content: "## Summary\n\nA book about Shape Up.\n",
 	}
 	cfg := Config{
 		BrainDir: brainDir,
 		Interval: 50 * time.Millisecond,
 		Pipeline: pipeline.Config{
-			Complete:  successComplete(wikiPage),
+			Complete:  successComplete(rawPage),
 			ChunkSize: 0,
 			Schema:    "# Schema\nThree page types.",
 		},
@@ -193,12 +195,14 @@ func TestProcessDir_SkipsSubdirs(t *testing.T) {
 	// Track which sources were passed to Complete.
 	var processedSources []string
 	completeFn := func(ctx context.Context, system, user string) (string, error) {
-		// Record that this was called; return a minimal valid page.
+		// Record that this was called; return a minimal valid RawPage.
-		page := wiki.Page{
+		raw := pipeline.RawPage{
-			Path:    "wiki/sources/valid.md",
+			Title:   "Valid",
-			Content: "---\ntitle: Valid\n---\n\n## Summary\n\nValid.\n",
+			Type:    "source",
 			Subtype: "article",
 			Content: "## Summary\n\nValid.\n",
 		}
-		b, _ := json.Marshal([]wiki.Page{page})
+		b, _ := json.Marshal([]pipeline.RawPage{raw})
 		processedSources = append(processedSources, "called")
 		return string(b), nil
 	}
--- a/internal/skills/brain/handlers.go
+++ b/internal/skills/brain/handlers.go
@@ -17,6 +17,8 @@ func (s *Skill) Handle(ctx context.Context, tool string, args json.RawMessage) (
 		return s.query(ctx, args)
 	case "brain_write":
 		return s.write(ctx, args)
 	case "brain_ingest_raw":
 		return s.ingestRaw(ctx, args)
 	case "brain_ingest":
 		return s.ingest(ctx, args)
 	case "brain_search":
@@ -98,6 +100,33 @@ func (s *Skill) ingest(ctx context.Context, args json.RawMessage) (json.RawMessa
 	return nil, fmt.Errorf("either content+source or path is required")
 }
 type ingestRawArgs struct {
 	Source string `json:"source"`
 	Pages  []any  `json:"pages"`
 	DryRun bool   `json:"dry_run,omitempty"`
 }
 func (s *Skill) ingestRaw(ctx context.Context, args json.RawMessage) (json.RawMessage, error) {
 	var a ingestRawArgs
 	if err := json.Unmarshal(args, &a); err != nil {
 		return nil, fmt.Errorf("parse args: %w", err)
 	}
 	if s.cfg.IngestSvcURL == "" {
 		return nil, fmt.Errorf("brain_ingest_raw: INGEST_SVC_URL not configured")
 	}
 	if a.Source == "" {
 		return nil, fmt.Errorf("source is required")
 	}
 	if len(a.Pages) == 0 {
 		return nil, fmt.Errorf("pages is required and must be non-empty")
 	}
 	return s.postTo(ctx, s.cfg.IngestSvcURL+"/ingest-raw", map[string]any{
 		"source":  a.Source,
 		"pages":   a.Pages,
 		"dry_run": a.DryRun,
 	})
 }
 type searchArgs struct {
 	Query      string `json:"query"`
 	Collection string `json:"collection,omitempty"`
--- a/internal/skills/brain/skill.go
+++ b/internal/skills/brain/skill.go
@@ -55,6 +55,32 @@ func (s *Skill) Tools() []registry.ToolDef {
 		},
 	}
 	if s.cfg.IngestSvcURL != "" {
 		tools = append(tools, registry.ToolDef{
 			Name: "brain_ingest_raw",
 			Description: "Ingest pre-structured pages into the brain wiki, bypassing the LLM extraction step. " +
 				"Use when you (the calling agent) have already extracted entities, concepts, and content from a source. " +
 				"Provide source (human-readable name) and pages (array of {title, type, subtype, domain, content} objects). " +
 				"The pipeline computes slugs, paths, frontmatter, wikilink canonicalization, and source back-references. " +
 				"Returns the list of wiki pages written.",
 			InputSchema: schema([]string{"source", "pages"}, map[string]any{
 				"source": map[string]any{"type": "string", "description": "human-readable name for the source, e.g. 'shape-up-book'"},
 				"pages": map[string]any{
 					"type": "array",
 					"items": map[string]any{
 						"type":     "object",
 						"required": []string{"title", "type", "content"},
 						"properties": map[string]any{
 							"title":   map[string]any{"type": "string", "description": "page title, e.g. 'Hash Encoding'"},
 							"type":    map[string]any{"type": "string", "enum": []string{"source", "concept", "entity"}, "description": "page type"},
 							"subtype": map[string]any{"type": "string", "description": "entity: person|company|tool|model|framework|technology; source: article|pdf|book|video|note|project"},
 							"domain":  map[string]any{"type": "string", "description": "knowledge domain, e.g. 'Machine Learning'"},
 							"content": map[string]any{"type": "string", "description": "markdown body — no frontmatter, use [[Display Name]] for wikilinks"},
 						},
 					},
 				},
 				"dry_run": map[string]any{"type": "boolean"},
 			}),
 		})
 		tools = append(tools, registry.ToolDef{
 			Name: "brain_ingest",
 			Description: "Ingest content into the brain wiki (brain/wiki/). Calls an LLM to produce structured wiki pages. " +
Author	SHA1	Message	Date
Mathias Bergqvist	0a70d9e972	feat(pipeline): add POST /ingest-raw for direct batch ingestion without LLM All checks were successful CI / Lint / Test / Vet (push) Successful in 9s Details CI / Mirror to GitHub (push) Has been skipped Details Allows callers to provide pre-structured RawPage data directly, bypassing the LLM extraction step. The pipeline still handles slug computation, frontmatter, link canonicalization, source back-references, and dedup — only the extraction is skipped. Useful when a more capable model or manual curation produces the structured data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 11:15:59 +02:00
Mathias Bergqvist	3e9a648115	fix(pipeline): repair invalid JSON escape sequences from LLM output before parsing All checks were successful CI / Lint / Test / Vet (push) Successful in 11s Details CI / Mirror to GitHub (push) Has been skipped Details Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 22:04:27 +02:00
Mathias Bergqvist	923a665365	fix(pipeline): skip RawPages with empty title in BuildPages instead of producing broken paths All checks were successful CI / Lint / Test / Vet (push) Successful in 9s Details CI / Mirror to GitHub (push) Has been skipped Details Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 19:55:37 +02:00
Mathias Bergqvist	537aebc302	feat(pipeline): update system prompt for new LLM JSON contract (no slugs) - Change prompt to reflect new output format: title, type, subtype, domain, content - Remove slug/path generation responsibility from LLM — pipeline now handles it - Wikilinks change from [[slug\|Display Name]] to [[Display Name]] only - LLM no longer includes frontmatter or paths in output docs(schema): update LLM output format and wikilink convention for Level 3 - Specify JSON schema: title, type, subtype, domain, content fields - Remove frontmatter requirements from schema output (handled by pipeline) - Simplify wikilink format to [[Display Name]] — no slug or pipe - Pipeline now responsible for slug generation and frontmatter construction These changes shift slug/frontmatter generation from LLM to pipeline, reducing cognitive load on the model and improving control over output. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 19:45:21 +02:00
Mathias Bergqvist	de35d4dbb0	feat(pipeline): wire ParseRawPages+BuildPages+CanonicalizeLinks into Run Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 19:07:33 +02:00
Mathias Bergqvist	26855f69b0	feat(pipeline): add CanonicalizeLinks — convert [[Display Name]] to [[slug\|Display Name]]	2026-04-23 18:59:10 +02:00
Mathias Bergqvist	a7b363d589	fix(pipeline): quote YAML scalar fields in buildFrontmatter to prevent injection	2026-04-23 18:56:39 +02:00
Mathias Bergqvist	7b57051af8	feat(pipeline): add BuildPages — compute slugs/paths/frontmatter from RawPage	2026-04-23 18:50:37 +02:00
Mathias Bergqvist	a620f6cb01	fix(pipeline): guard empty-title bridge + skip stale integration tests until task4 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 18:46:07 +02:00
Mathias Bergqvist	26b5636b43	feat(pipeline): replace ParsePages with ParseRawPages + RawPage type Strips slug authority from the LLM. The new RawPage type carries only {title, type, subtype, domain, content} — no paths or frontmatter. Pipeline will derive slugs deterministically (Task 4). pipeline.go gets a temporary bridge stub (TODO task4) to keep the package compiling between tasks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 18:41:33 +02:00
Mathias Bergqvist	989f375aec	docs: add Level 3 implementation plan	2026-04-23 17:37:45 +02:00
Mathias Bergqvist	6403d5e444	docs: add Level 3 slug authority design spec	2026-04-23 17:23:22 +02:00