diff --git a/docs/superpowers/plans/2026-04-22-brain-ingestion-pipeline.md b/docs/superpowers/plans/2026-04-22-brain-ingestion-pipeline.md new file mode 100644 index 0000000..47e08d6 --- /dev/null +++ b/docs/superpowers/plans/2026-04-22-brain-ingestion-pipeline.md @@ -0,0 +1,2608 @@ +# Brain Ingestion Pipeline Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Add a structured LLM-powered ingestion pipeline to the hyperguild brain: two new HTTP endpoints (`/ingest`, `/ingest-path`) and a background file watcher on `brain/raw/`, producing concepts/entities/sources wiki pages. + +**Architecture:** Raw content (passed directly or read from files) is chunked, sent to an OpenAI-compatible LLM with the brain schema and existing wiki inventory as context, and the returned JSON array of `{path, content}` objects is merged and written to `brain/wiki/`. A background goroutine polls `brain/raw/` and processes dropped files automatically. The supervisor's `brain_ingest` skill connects to the ingestion server's `/ingest` endpoint. + +**Tech Stack:** Go stdlib, `testify` (already in go.mod), OpenAI-compatible chat completions API via LiteLLM. + +--- + +## File Map + +### New files — `ingestion` module + +| File | Responsibility | +|---|---| +| `ingestion/internal/wiki/types.go` | `Page` struct, `PageType` constants | +| `ingestion/internal/wiki/slug.go` | `Slug(title) string` | +| `ingestion/internal/wiki/slug_test.go` | Slug edge case tests | +| `ingestion/internal/wiki/merge.go` | `Merge(a, b Page) Page`, section parsing | +| `ingestion/internal/wiki/merge_test.go` | Merge strategy tests | +| `ingestion/internal/wiki/inventory.go` | `LoadInventory(brainDir) (map[PageType][]Entry, error)` | +| `ingestion/internal/wiki/inventory_test.go` | Inventory loading tests | +| `ingestion/internal/wiki/index.go` | `RebuildIndex(brainDir, date string) error` | +| `ingestion/internal/wiki/log.go` | `AppendLog(brainDir, source string, pages, warnings []string, date string) error` | +| `ingestion/internal/wiki/index_test.go` | Index + log tests | +| `ingestion/internal/llm/client.go` | `Client`, `Complete(ctx, system, user string) (string, error)` | +| `ingestion/internal/llm/client_test.go` | Mock server tests | +| `ingestion/internal/pipeline/chunk.go` | `Chunk(content string, maxSize int) []string` | +| `ingestion/internal/pipeline/chunk_test.go` | Chunking tests | +| `ingestion/internal/pipeline/parse.go` | `ParsePages(output string) ([]wiki.Page, []string)` | +| `ingestion/internal/pipeline/parse_test.go` | Parse + truncation recovery tests | +| `ingestion/internal/pipeline/prompt.go` | `BuildPrompt(...)` | +| `ingestion/internal/pipeline/pipeline.go` | `Config`, `Run(...)` | +| `ingestion/internal/pipeline/pipeline_test.go` | Integration test with mock LLM | +| `ingestion/internal/watcher/watcher.go` | `Config`, `Watcher`, `Start`, `processFile` | +| `ingestion/internal/watcher/watcher_test.go` | File move behaviour tests | +| `brain/CLAUDE.md` | Schema injected into every ingest prompt | + +### Modified files + +| File | Change | +|---|---| +| `ingestion/internal/api/handler.go` | Add `Ingest`, `IngestPath` methods; update `NewHandler` signature | +| `ingestion/internal/api/handler_test.go` | Update `NewHandler` call; add ingest handler tests | +| `ingestion/cmd/server/main.go` | New env vars, wire pipeline config, start watcher | +| `internal/config/config.go` | Add `IngestSvcURL`, `KBRetrievalURL` fields | +| `cmd/supervisor/main.go` | Pass new fields to `brain.New()` | + +--- + +## Task 1: Supervisor config wiring + +Wire `IngestSvcURL` and `KBRetrievalURL` into the supervisor so `brain_ingest` and `brain_search` become active when configured. + +**Files:** +- Modify: `internal/config/config.go` +- Modify: `cmd/supervisor/main.go:106-108` + +- [ ] **Step 1: Add fields to Config** + +In `internal/config/config.go`, add two fields and their env-var loading: + +```go +type Config struct { + Port string + LiteLLMBaseURL string + LiteLLMAPIKey string + ConfigDir string + ModelsFile string + IngestBaseURL string + IngestSvcURL string // INGEST_SVC_URL — brain_ingest endpoint + KBRetrievalURL string // KB_RETRIEVAL_URL — brain_search endpoint + SessionsDir string + BrainDir string +} + +func Load() (Config, error) { + cfg := Config{ + Port: envOr("SUPERVISOR_PORT", "3200"), + LiteLLMBaseURL: envOr("LITELLM_BASE_URL", "http://iguana:4000"), + LiteLLMAPIKey: os.Getenv("LITELLM_API_KEY"), + ConfigDir: envOr("SUPERVISOR_CONFIG_DIR", "./config/supervisor"), + } + cfg.ModelsFile = envOr("SUPERVISOR_MODELS_FILE", cfg.ConfigDir+"/../models.yaml") + cfg.IngestBaseURL = envOr("INGEST_BASE_URL", "http://localhost:3300") + cfg.IngestSvcURL = os.Getenv("INGEST_SVC_URL") + cfg.KBRetrievalURL = os.Getenv("KB_RETRIEVAL_URL") + cfg.SessionsDir = envOr("SUPERVISOR_SESSIONS_DIR", "./brain/sessions") + cfg.BrainDir = envOr("SUPERVISOR_BRAIN_DIR", "./brain") + return cfg, nil +} +``` + +- [ ] **Step 2: Pass to brain.New() in main.go** + +Find the `brain.New(...)` call in `cmd/supervisor/main.go` (around line 106) and update it: + +```go +reg.Register(brain.New(brain.Config{ + IngestBaseURL: cfg.IngestBaseURL, + IngestSvcURL: cfg.IngestSvcURL, + KBRetrievalURL: cfg.KBRetrievalURL, +})) +``` + +- [ ] **Step 3: Build and verify** + +```bash +cd /Users/mathias/dev/AI/hyperguild && go build ./... +``` + +Expected: no output (clean build). + +- [ ] **Step 4: Commit** + +```bash +git add internal/config/config.go cmd/supervisor/main.go +git commit -m "feat(config): add IngestSvcURL and KBRetrievalURL to supervisor config" +``` + +--- + +## Task 2: brain/CLAUDE.md schema document + +Create the schema document injected into every ingest prompt. This teaches the LLM the wiki structure and rules. + +**Files:** +- Create: `brain/CLAUDE.md` + +- [ ] **Step 1: Write the schema** + +Create `brain/CLAUDE.md`: + +```markdown +# Brain Wiki Schema + +This document defines the three page types in the brain wiki. +The LLM must follow this schema exactly when generating wiki pages. + +## Wikilink Format + +All cross-references use `[[slug|Display Text]]`. + +Rules: +- slug = lowercase filename without .md, spaces → hyphens, strip all non-alphanumeric except hyphens +- The `|` separator is REQUIRED — never use `[[Title]]` without a slug +- Examples: `[[domain-driven-design|Domain Driven Design]]`, `[[ryan-singer|Ryan Singer]]` +- Slugs must resolve to an existing file in the inventory, or a file you are creating in this response + +Slug generation examples: +- "Domain Driven Design" → `domain-driven-design` +- "It's Complicated" → `its-complicated` +- "gRPC" → `grpc` +- "GPT-4o" → `gpt-4o` + +## Domains + +Use one of: `ai-llm`, `software-engineering`, `product-strategy`, `finance-markets`, +`personal`, `consulting`, `climate`, `infrastructure`, `security` + +--- + +## Source Pages — wiki/sources/.md + +One page per ingested source. Books are NEVER split across multiple source pages — update the existing one. + +Required frontmatter: +```yaml +title: +type: article | pdf | book | video | note | project +domain: +date_ingested: YYYY-MM-DD +last_updated: YYYY-MM-DD +aliases: + - +``` + +Body sections (in this order): + +### Summary +2–3 sentences. Core argument or finding. + +### Key Claims +Bulleted list. Paraphrase — no verbatim quotes or code. + +### Concepts Introduced or Reinforced +Wikilinks to wiki/concepts/ ONLY. One per line. + +### Entities Mentioned +Wikilinks to wiki/entities/ ONLY. One per line. + +### Open Questions Raised +Gaps or follow-up questions from this source. + +For books only, also add: + +### Chapters +One bullet per chapter with 1–2 sentence summary. + +### Argument Arc +Overall narrative as it becomes clear across chapters. + +### Updates +Dated entries appended on re-ingestion. NEVER rewrite — only append. + +--- + +## Concept Pages — wiki/concepts/.md + +One page per idea, framework, methodology, or pattern. + +Required frontmatter: +```yaml +title: +domain: +last_updated: YYYY-MM-DD +aliases: + - +``` + +Body sections (in this order): + +### Definition +One-paragraph plain-language explanation. + +### Why It Matters +Practical significance. Why should anyone care? + +### Related Concepts +Wikilinks to wiki/concepts/ ONLY. + +### Related Entities +Wikilinks to wiki/entities/ ONLY. + +### Sources +Wikilinks to wiki/sources/ ONLY. + +### Evolving Notes +Updated as new sources arrive. Append, do not rewrite. + +--- + +## Entity Pages — wiki/entities/.md + +One page per person, tool, organisation, technology, or product. + +Required frontmatter: +```yaml +title: +type: person | company | tool | model | framework | technology +domain: +last_updated: YYYY-MM-DD +aliases: + - +``` + +Body sections (in this order): + +### Description +One-line description. + +### Relevance +Why this entity matters to this knowledge base. + +### Key Positions, Products, or Claims +With dates where known. + +### Related Concepts +Wikilinks to wiki/concepts/ ONLY. + +### Related Entities +Wikilinks to wiki/entities/ ONLY. + +### Sources +Wikilinks to wiki/sources/ ONLY. + +--- + +## Non-Negotiable Rules + +1. Output ONLY a valid JSON array — no markdown fences, no prose before or after +2. Each element: `{"path": "wiki//.md", "content": "...full markdown..."}` +3. Slugs are kebab-case: lowercase, spaces→hyphens, strip special characters +4. Every wikilink must be `[[slug|Display Text]]` — the pipe separator is required +5. Dates always YYYY-MM-DD +6. Never reproduce verbatim code — describe the pattern or technique +7. Section links must match their section type (Related Concepts → concepts/ only, etc.) +8. One source page per book — if inventory shows it exists, include it as an UPDATE +``` + +- [ ] **Step 2: Commit** + +```bash +git add brain/CLAUDE.md +git commit -m "docs(brain): add wiki schema document for ingest prompt" +``` + +--- + +## Task 3: wiki/slug — slug generation + +**Files:** +- Create: `ingestion/internal/wiki/types.go` +- Create: `ingestion/internal/wiki/slug.go` +- Create: `ingestion/internal/wiki/slug_test.go` + +- [ ] **Step 1: Write types.go** + +```go +// ingestion/internal/wiki/types.go +package wiki + +// PageType identifies the wiki subdirectory for a page. +type PageType string + +const ( + PageTypeConcept PageType = "concepts" + PageTypeEntity PageType = "entities" + PageTypeSource PageType = "sources" +) + +// Page is a wiki page to be written to disk. +type Page struct { + Path string // relative to brainDir, e.g. "wiki/sources/foo.md" + Content string // full markdown including YAML frontmatter +} + +// Entry is a summary of an existing wiki page used to build the inventory. +type Entry struct { + Slug string + Title string + Type PageType +} +``` + +- [ ] **Step 2: Write the failing test** + +```go +// ingestion/internal/wiki/slug_test.go +package wiki + +import ( + "testing" + + "github.com/stretchr/testify/assert" +) + +func TestSlug(t *testing.T) { + tests := []struct { + input string + want string + }{ + {"Domain Driven Design", "domain-driven-design"}, + {"It's Complicated", "its-complicated"}, + {"gRPC", "grpc"}, + {"GPT-4o", "gpt-4o"}, + {"Property 1: It's Rough", "property-1-its-rough"}, + {" leading spaces ", "leading-spaces"}, + {"multiple spaces", "multiple-spaces"}, + {"already-kebab", "already-kebab"}, + } + for _, tc := range tests { + t.Run(tc.input, func(t *testing.T) { + assert.Equal(t, tc.want, Slug(tc.input)) + }) + } +} +``` + +- [ ] **Step 3: Run test to verify it fails** + +```bash +cd /Users/mathias/dev/AI/hyperguild/ingestion && go test ./internal/wiki/... 2>&1 +``` + +Expected: `undefined: Slug` + +- [ ] **Step 4: Write slug.go** + +```go +// ingestion/internal/wiki/slug.go +package wiki + +import ( + "strings" + "unicode" +) + +// Slug converts a title to a kebab-case slug suitable for wiki filenames. +// Rules: lowercase, spaces/hyphens/underscores → hyphens, strip everything else. +func Slug(title string) string { + var b strings.Builder + prevHyphen := true // start true to trim leading hyphens + for _, r := range strings.ToLower(title) { + switch { + case r == ' ' || r == '-' || r == '_': + if !prevHyphen { + b.WriteRune('-') + prevHyphen = true + } + case unicode.IsLetter(r) || unicode.IsDigit(r): + b.WriteRune(r) + prevHyphen = false + // all other characters (apostrophes, colons, dots, etc.) are dropped + } + } + return strings.TrimRight(b.String(), "-") +} +``` + +- [ ] **Step 5: Run test to verify it passes** + +```bash +cd /Users/mathias/dev/AI/hyperguild/ingestion && go test ./internal/wiki/... -run TestSlug -v +``` + +Expected: all cases PASS. + +- [ ] **Step 6: Commit** + +```bash +git add ingestion/internal/wiki/ +git commit -m "feat(ingestion): add wiki package with Page types and slug generation" +``` + +--- + +## Task 4: wiki/merge — page merge logic + +**Files:** +- Create: `ingestion/internal/wiki/merge.go` +- Create: `ingestion/internal/wiki/merge_test.go` + +- [ ] **Step 1: Write the failing tests** + +```go +// ingestion/internal/wiki/merge_test.go +package wiki + +import ( + "testing" + + "github.com/stretchr/testify/assert" +) + +func TestMerge_BulletSectionsUnion(t *testing.T) { + a := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Related Concepts\n\n- [[bar|Bar]]\n"} + b := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Related Concepts\n\n- [[bar|Bar]]\n- [[baz|Baz]]\n"} + + got := Merge(a, b) + assert.Contains(t, got.Content, "[[bar|Bar]]") + assert.Contains(t, got.Content, "[[baz|Baz]]") + // bar appears only once + assert.Equal(t, 1, strings.Count(got.Content, "[[bar|Bar]]")) +} + +func TestMerge_AppendSections(t *testing.T) { + a := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Evolving Notes\n\nFirst note.\n"} + b := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Evolving Notes\n\nSecond note.\n"} + + got := Merge(a, b) + assert.Contains(t, got.Content, "First note.") + assert.Contains(t, got.Content, "Second note.") +} + +func TestMerge_KeepFirstForOtherSections(t *testing.T) { + a := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Definition\n\nFirst definition.\n"} + b := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Definition\n\nSecond definition.\n"} + + got := Merge(a, b) + assert.Contains(t, got.Content, "First definition.") + assert.NotContains(t, got.Content, "Second definition.") +} + +func TestMerge_NewSectionFromB(t *testing.T) { + a := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Definition\n\nA thing.\n"} + b := Page{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Why It Matters\n\nBecause reasons.\n"} + + got := Merge(a, b) + assert.Contains(t, got.Content, "A thing.") + assert.Contains(t, got.Content, "Because reasons.") +} + +func TestMerge_KeepsFrontmatterFromA(t *testing.T) { + a := Page{Path: "p.md", Content: "---\ntitle: A\nlast_updated: 2026-01-01\n---\n\n## Definition\n\nA.\n"} + b := Page{Path: "p.md", Content: "---\ntitle: B\nlast_updated: 2026-06-01\n---\n\n## Definition\n\nB.\n"} + + got := Merge(a, b) + assert.Contains(t, got.Content, "title: A") + assert.NotContains(t, got.Content, "title: B") +} +``` + +Add `"strings"` import to the test file. + +- [ ] **Step 2: Run test to verify it fails** + +```bash +cd /Users/mathias/dev/AI/hyperguild/ingestion && go test ./internal/wiki/... -run TestMerge 2>&1 +``` + +Expected: `undefined: Merge` + +- [ ] **Step 3: Write merge.go** + +```go +// ingestion/internal/wiki/merge.go +package wiki + +import ( + "fmt" + "strings" +) + +var bulletSections = map[string]bool{ + "Related Concepts": true, + "Related Entities": true, + "Sources": true, + "Key Claims": true, + "Entities Mentioned": true, + "Concepts Introduced or Reinforced": true, + "Chapters": true, +} + +var appendSections = map[string]bool{ + "Evolving Notes": true, + "Updates": true, + "Open Questions Raised": true, + "Open Questions": true, +} + +type section struct { + heading string + content string +} + +// Merge combines two Page values with the same path. +// Frontmatter is taken from a. Sections are merged by strategy. +func Merge(a, b Page) Page { + fmA, secsA := parseSections(a.Content) + _, secsB := parseSections(b.Content) + + // Build heading → index map for A's sections. + idx := make(map[string]int, len(secsA)) + for i, s := range secsA { + idx[s.heading] = i + } + + for _, sB := range secsB { + i, exists := idx[sB.heading] + if !exists { + // New section not in A — append it. + idx[sB.heading] = len(secsA) + secsA = append(secsA, sB) + continue + } + sA := secsA[i] + switch { + case bulletSections[sB.heading]: + secsA[i].content = mergeBullets(sA.content, sB.content) + case appendSections[sB.heading]: + secsA[i].content = strings.TrimRight(sA.content, "\n") + "\n\n" + strings.TrimLeft(sB.content, "\n") + // default: keep first — do nothing + } + } + + return Page{Path: a.Path, Content: rebuildContent(fmA, secsA)} +} + +func parseSections(markdown string) (frontmatter string, sections []section) { + lines := strings.Split(markdown, "\n") + i := 0 + + // Parse YAML frontmatter between --- markers. + if i < len(lines) && strings.TrimSpace(lines[i]) == "---" { + i++ + var fmLines []string + for i < len(lines) { + if strings.TrimSpace(lines[i]) == "---" { + i++ + break + } + fmLines = append(fmLines, lines[i]) + i++ + } + frontmatter = fmt.Sprintf("---\n%s\n---\n", strings.Join(fmLines, "\n")) + } + + // Parse ## sections. + var cur *section + for ; i < len(lines); i++ { + line := lines[i] + if strings.HasPrefix(line, "## ") { + if cur != nil { + sections = append(sections, *cur) + } + cur = §ion{heading: strings.TrimPrefix(line, "## ")} + } else if cur != nil { + cur.content += line + "\n" + } + } + if cur != nil { + sections = append(sections, *cur) + } + return +} + +func rebuildContent(frontmatter string, sections []section) string { + var sb strings.Builder + sb.WriteString(frontmatter) + for _, sec := range sections { + fmt.Fprintf(&sb, "\n## %s\n\n%s", sec.heading, sec.content) + } + return sb.String() +} + +func mergeBullets(a, b string) string { + seen := make(map[string]bool) + var lines []string + for _, line := range strings.Split(a+b, "\n") { + trimmed := strings.TrimSpace(line) + if trimmed == "" || seen[trimmed] { + continue + } + seen[trimmed] = true + lines = append(lines, line) + } + return strings.Join(lines, "\n") + "\n" +} +``` + +- [ ] **Step 4: Run tests to verify they pass** + +```bash +cd /Users/mathias/dev/AI/hyperguild/ingestion && go test ./internal/wiki/... -v +``` + +Expected: all PASS. + +- [ ] **Step 5: Commit** + +```bash +git add ingestion/internal/wiki/merge.go ingestion/internal/wiki/merge_test.go +git commit -m "feat(ingestion): add wiki page merge logic" +``` + +--- + +## Task 5: wiki/inventory — load existing wiki pages + +Reads `brain/wiki/` and builds a grouped slug index for injection into the LLM prompt. + +**Files:** +- Create: `ingestion/internal/wiki/inventory.go` +- Create: `ingestion/internal/wiki/inventory_test.go` + +- [ ] **Step 1: Write the failing test** + +```go +// ingestion/internal/wiki/inventory_test.go +package wiki + +import ( + "os" + "path/filepath" + "testing" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestLoadInventory(t *testing.T) { + dir := t.TempDir() + require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "concepts"), 0o755)) + require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "entities"), 0o755)) + require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "sources"), 0o755)) + + require.NoError(t, os.WriteFile( + filepath.Join(dir, "wiki", "concepts", "domain-driven-design.md"), + []byte("---\ntitle: Domain Driven Design\n---\n\n## Definition\n\nA thing.\n"), + 0o644, + )) + require.NoError(t, os.WriteFile( + filepath.Join(dir, "wiki", "entities", "ryan-singer.md"), + []byte("---\ntitle: Ryan Singer\n---\n\n## Description\n\nDesigner.\n"), + 0o644, + )) + + inv, err := LoadInventory(dir) + require.NoError(t, err) + + assert.Len(t, inv[PageTypeConcept], 1) + assert.Equal(t, "domain-driven-design", inv[PageTypeConcept][0].Slug) + assert.Equal(t, "Domain Driven Design", inv[PageTypeConcept][0].Title) + + assert.Len(t, inv[PageTypeEntity], 1) + assert.Equal(t, "ryan-singer", inv[PageTypeEntity][0].Slug) + + assert.Empty(t, inv[PageTypeSource]) +} +``` + +- [ ] **Step 2: Run test to verify it fails** + +```bash +cd /Users/mathias/dev/AI/hyperguild/ingestion && go test ./internal/wiki/... -run TestLoadInventory 2>&1 +``` + +Expected: `undefined: LoadInventory` + +- [ ] **Step 3: Write inventory.go** + +```go +// ingestion/internal/wiki/inventory.go +package wiki + +import ( + "bufio" + "os" + "path/filepath" + "strings" +) + +// LoadInventory walks brain/wiki/ and returns all pages grouped by type. +func LoadInventory(brainDir string) (map[PageType][]Entry, error) { + result := map[PageType][]Entry{ + PageTypeConcept: {}, + PageTypeEntity: {}, + PageTypeSource: {}, + } + for pt := range result { + dir := filepath.Join(brainDir, "wiki", string(pt)) + entries, err := os.ReadDir(dir) + if os.IsNotExist(err) { + continue + } + if err != nil { + return nil, err + } + for _, e := range entries { + if e.IsDir() || !strings.HasSuffix(e.Name(), ".md") { + continue + } + slug := strings.TrimSuffix(e.Name(), ".md") + path := filepath.Join(dir, e.Name()) + title := readTitle(path, slug) + result[pt] = append(result[pt], Entry{Slug: slug, Title: title, Type: pt}) + } + } + return result, nil +} + +// readTitle extracts the title from YAML frontmatter, falling back to slug. +func readTitle(path, fallback string) string { + f, err := os.Open(path) + if err != nil { + return fallback + } + defer f.Close() + + scanner := bufio.NewScanner(f) + inFM := false + for scanner.Scan() { + line := scanner.Text() + if strings.TrimSpace(line) == "---" { + if !inFM { + inFM = true + continue + } + break + } + if inFM { + key, val, ok := strings.Cut(line, ":") + if ok && strings.TrimSpace(key) == "title" { + return strings.Trim(strings.TrimSpace(val), `"'`) + } + } + } + return fallback +} +``` + +- [ ] **Step 4: Run tests to verify they pass** + +```bash +cd /Users/mathias/dev/AI/hyperguild/ingestion && go test ./internal/wiki/... -v +``` + +Expected: all PASS. + +- [ ] **Step 5: Commit** + +```bash +git add ingestion/internal/wiki/inventory.go ingestion/internal/wiki/inventory_test.go +git commit -m "feat(ingestion): add wiki inventory loader" +``` + +--- + +## Task 6: wiki/index + wiki/log — index rebuilder and audit log + +**Files:** +- Create: `ingestion/internal/wiki/index.go` +- Create: `ingestion/internal/wiki/log.go` +- Create: `ingestion/internal/wiki/index_test.go` + +- [ ] **Step 1: Write failing tests** + +```go +// ingestion/internal/wiki/index_test.go +package wiki + +import ( + "os" + "path/filepath" + "strings" + "testing" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func setupWikiDir(t *testing.T) string { + t.Helper() + dir := t.TempDir() + require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "concepts"), 0o755)) + require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "entities"), 0o755)) + require.NoError(t, os.MkdirAll(filepath.Join(dir, "wiki", "sources"), 0o755)) + require.NoError(t, os.WriteFile( + filepath.Join(dir, "wiki", "concepts", "tdd.md"), + []byte("---\ntitle: TDD\n---\n\n## Definition\n\nTest-driven development is a discipline.\n"), + 0o644, + )) + return dir +} + +func TestRebuildIndex(t *testing.T) { + dir := setupWikiDir(t) + require.NoError(t, RebuildIndex(dir, "2026-04-22")) + + content, err := os.ReadFile(filepath.Join(dir, "wiki", "index.md")) + require.NoError(t, err) + s := string(content) + assert.Contains(t, s, "# Wiki Index") + assert.Contains(t, s, "2026-04-22") + assert.Contains(t, s, "[[tdd|TDD]]") + assert.Contains(t, s, "## Concepts") +} + +func TestAppendLog(t *testing.T) { + dir := t.TempDir() + require.NoError(t, AppendLog(dir, "shape-up-book", + []string{"wiki/sources/shape-up.md", "wiki/concepts/betting-table.md"}, + nil, "2026-04-22")) + + content, err := os.ReadFile(filepath.Join(dir, "log.md")) + require.NoError(t, err) + s := string(content) + assert.Contains(t, s, "shape-up-book") + assert.Contains(t, s, "wiki/sources/shape-up.md") + assert.True(t, strings.HasPrefix(s, "## 2026-04-22")) +} + +func TestAppendLog_AppendsOnSecondCall(t *testing.T) { + dir := t.TempDir() + require.NoError(t, AppendLog(dir, "source-a", []string{"wiki/sources/a.md"}, nil, "2026-04-22")) + require.NoError(t, AppendLog(dir, "source-b", []string{"wiki/sources/b.md"}, nil, "2026-04-22")) + + content, err := os.ReadFile(filepath.Join(dir, "log.md")) + require.NoError(t, err) + assert.Contains(t, string(content), "source-a") + assert.Contains(t, string(content), "source-b") +} +``` + +- [ ] **Step 2: Run test to verify it fails** + +```bash +cd /Users/mathias/dev/AI/hyperguild/ingestion && go test ./internal/wiki/... -run "TestRebuildIndex|TestAppendLog" 2>&1 +``` + +Expected: `undefined: RebuildIndex` + +- [ ] **Step 3: Write index.go** + +```go +// ingestion/internal/wiki/index.go +package wiki + +import ( + "fmt" + "os" + "path/filepath" + "strings" +) + +// RebuildIndex writes brain/wiki/index.md from the current wiki contents. +func RebuildIndex(brainDir, date string) error { + inv, err := LoadInventory(brainDir) + if err != nil { + return fmt.Errorf("load inventory: %w", err) + } + + total := len(inv[PageTypeConcept]) + len(inv[PageTypeEntity]) + len(inv[PageTypeSource]) + var sb strings.Builder + fmt.Fprintf(&sb, "# Wiki Index\n\n") + fmt.Fprintf(&sb, "_Updated: %s — %d pages (%d concepts, %d entities, %d sources)_\n\n", + date, total, + len(inv[PageTypeConcept]), + len(inv[PageTypeEntity]), + len(inv[PageTypeSource])) + + for _, pt := range []PageType{PageTypeConcept, PageTypeEntity, PageTypeSource} { + entries := inv[pt] + if len(entries) == 0 { + continue + } + fmt.Fprintf(&sb, "## %s\n\n", strings.Title(string(pt))) + for _, e := range entries { + summary := pageFirstSentence(brainDir, e) + fmt.Fprintf(&sb, "- [[%s|%s]] — %s\n", e.Slug, e.Title, summary) + } + sb.WriteString("\n") + } + + dest := filepath.Join(brainDir, "wiki", "index.md") + return os.WriteFile(dest, []byte(sb.String()), 0o644) +} + +func pageFirstSentence(brainDir string, e Entry) string { + path := filepath.Join(brainDir, "wiki", string(e.Type), e.Slug+".md") + content, err := os.ReadFile(path) + if err != nil { + return "" + } + // Skip frontmatter, find first non-empty line after a ## heading. + parts := strings.SplitN(string(content), "---", 3) + body := string(content) + if len(parts) == 3 { + body = parts[2] + } + for _, line := range strings.Split(body, "\n") { + line = strings.TrimSpace(line) + if line == "" || strings.HasPrefix(line, "#") { + continue + } + if len(line) > 100 { + return line[:100] + "…" + } + return line + } + return "" +} +``` + +- [ ] **Step 4: Write log.go** + +```go +// ingestion/internal/wiki/log.go +package wiki + +import ( + "fmt" + "os" + "path/filepath" + "strings" +) + +// AppendLog appends one ingestion record to brain/log.md. +func AppendLog(brainDir, source string, pages, warnings []string, date string) error { + var sb strings.Builder + fmt.Fprintf(&sb, "## %s — ingest\n\n", date) + fmt.Fprintf(&sb, "- **Source:** %s\n", source) + if len(pages) > 0 { + sb.WriteString("- **Pages written:**\n") + for _, p := range pages { + fmt.Fprintf(&sb, " - %s\n", p) + } + } + if len(warnings) > 0 { + sb.WriteString("- **Warnings:**\n") + for _, w := range warnings { + fmt.Fprintf(&sb, " - %s\n", w) + } + } + sb.WriteString("\n") + + logPath := filepath.Join(brainDir, "log.md") + f, err := os.OpenFile(logPath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644) + if err != nil { + return fmt.Errorf("open log: %w", err) + } + defer f.Close() + _, err = f.WriteString(sb.String()) + return err +} +``` + +- [ ] **Step 5: Run tests to verify they pass** + +```bash +cd /Users/mathias/dev/AI/hyperguild/ingestion && go test ./internal/wiki/... -v +``` + +Expected: all PASS. (Note: `strings.Title` may show a deprecation warning — it is fine for this use case.) + +- [ ] **Step 6: Commit** + +```bash +git add ingestion/internal/wiki/index.go ingestion/internal/wiki/log.go ingestion/internal/wiki/index_test.go +git commit -m "feat(ingestion): add wiki index rebuilder and audit log" +``` + +--- + +## Task 7: llm/client — OpenAI-compatible HTTP client + +**Files:** +- Create: `ingestion/internal/llm/client.go` +- Create: `ingestion/internal/llm/client_test.go` + +- [ ] **Step 1: Write the failing test** + +```go +// ingestion/internal/llm/client_test.go +package llm + +import ( + "context" + "encoding/json" + "net/http" + "net/http/httptest" + "testing" + "time" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func mockLLMServer(t *testing.T, response string) *httptest.Server { + t.Helper() + return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + assert.Equal(t, "/chat/completions", r.URL.Path) + assert.Equal(t, "application/json", r.Header.Get("Content-Type")) + w.Header().Set("Content-Type", "application/json") + json.NewEncoder(w).Encode(map[string]any{ + "choices": []map[string]any{ + {"message": map[string]any{"role": "assistant", "content": response}}, + }, + }) + })) +} + +func TestClient_Complete(t *testing.T) { + srv := mockLLMServer(t, "hello world") + defer srv.Close() + + c := New(srv.URL, "", "test-model", 10*time.Second) + got, err := c.Complete(context.Background(), "you are helpful", "say hello") + require.NoError(t, err) + assert.Equal(t, "hello world", got) +} + +func TestClient_ReturnsErrorOnNon200(t *testing.T) { + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + http.Error(w, "overloaded", http.StatusServiceUnavailable) + })) + defer srv.Close() + + c := New(srv.URL, "", "test-model", 10*time.Second) + _, err := c.Complete(context.Background(), "sys", "user") + assert.Error(t, err) +} + +func TestClient_SendsAuthHeader(t *testing.T) { + var gotAuth string + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + gotAuth = r.Header.Get("Authorization") + json.NewEncoder(w).Encode(map[string]any{ + "choices": []map[string]any{{"message": map[string]any{"content": "ok"}}}, + }) + })) + defer srv.Close() + + c := New(srv.URL, "my-key", "test-model", 10*time.Second) + _, err := c.Complete(context.Background(), "sys", "user") + require.NoError(t, err) + assert.Equal(t, "Bearer my-key", gotAuth) +} +``` + +- [ ] **Step 2: Run test to verify it fails** + +```bash +cd /Users/mathias/dev/AI/hyperguild/ingestion && go test ./internal/llm/... 2>&1 +``` + +Expected: `undefined: New` + +- [ ] **Step 3: Write client.go** + +```go +// ingestion/internal/llm/client.go +package llm + +import ( + "bytes" + "context" + "encoding/json" + "fmt" + "io" + "net/http" + "strings" + "time" +) + +// Client calls an OpenAI-compatible chat completions endpoint. +type Client struct { + baseURL string + apiKey string + model string + httpClient *http.Client +} + +// New constructs a Client. +func New(baseURL, apiKey, model string, timeout time.Duration) *Client { + return &Client{ + baseURL: strings.TrimRight(baseURL, "/"), + apiKey: apiKey, + model: model, + httpClient: &http.Client{Timeout: timeout}, + } +} + +type chatRequest struct { + Model string `json:"model"` + Messages []message `json:"messages"` + Temperature float64 `json:"temperature"` +} + +type message struct { + Role string `json:"role"` + Content string `json:"content"` +} + +type chatResponse struct { + Choices []struct { + Message message `json:"message"` + } `json:"choices"` +} + +// Complete sends a system + user message and returns the assistant's reply. +func (c *Client) Complete(ctx context.Context, system, user string) (string, error) { + body := chatRequest{ + Model: c.model, + Messages: []message{ + {Role: "system", Content: system}, + {Role: "user", Content: user}, + }, + Temperature: 0.2, + } + b, err := json.Marshal(body) + if err != nil { + return "", fmt.Errorf("marshal request: %w", err) + } + + req, err := http.NewRequestWithContext(ctx, http.MethodPost, c.baseURL+"/chat/completions", bytes.NewReader(b)) + if err != nil { + return "", fmt.Errorf("build request: %w", err) + } + req.Header.Set("Content-Type", "application/json") + if c.apiKey != "" { + req.Header.Set("Authorization", "Bearer "+c.apiKey) + } + + resp, err := c.httpClient.Do(req) + if err != nil { + return "", fmt.Errorf("call LLM: %w", err) + } + defer resp.Body.Close() + + out, err := io.ReadAll(resp.Body) + if err != nil { + return "", fmt.Errorf("read response: %w", err) + } + if resp.StatusCode != http.StatusOK { + return "", fmt.Errorf("LLM returned %d: %s", resp.StatusCode, out) + } + + var cr chatResponse + if err := json.Unmarshal(out, &cr); err != nil { + return "", fmt.Errorf("parse response: %w", err) + } + if len(cr.Choices) == 0 { + return "", fmt.Errorf("LLM returned no choices") + } + return cr.Choices[0].Message.Content, nil +} +``` + +- [ ] **Step 4: Run tests to verify they pass** + +```bash +cd /Users/mathias/dev/AI/hyperguild/ingestion && go test ./internal/llm/... -v +``` + +Expected: all PASS. + +- [ ] **Step 5: Commit** + +```bash +git add ingestion/internal/llm/ +git commit -m "feat(ingestion): add OpenAI-compatible LLM HTTP client" +``` + +--- + +## Task 8: pipeline/chunk + pipeline/parse + +**Files:** +- Create: `ingestion/internal/pipeline/chunk.go` +- Create: `ingestion/internal/pipeline/chunk_test.go` +- Create: `ingestion/internal/pipeline/parse.go` +- Create: `ingestion/internal/pipeline/parse_test.go` + +- [ ] **Step 1: Write failing tests** + +```go +// ingestion/internal/pipeline/chunk_test.go +package pipeline + +import ( + "strings" + "testing" + + "github.com/stretchr/testify/assert" +) + +func TestChunk_NoChunkingWhenZero(t *testing.T) { + content := strings.Repeat("word ", 1000) + chunks := Chunk(content, 0) + assert.Len(t, chunks, 1) + assert.Equal(t, strings.TrimSpace(content), chunks[0]) +} + +func TestChunk_SplitsAtParagraph(t *testing.T) { + // Two paragraphs; each ~30 chars. maxSize = 40 should split them. + content := "First paragraph here.\n\nSecond paragraph here." + chunks := Chunk(content, 40) + assert.Len(t, chunks, 2) + assert.Equal(t, "First paragraph here.", chunks[0]) + assert.Equal(t, "Second paragraph here.", chunks[1]) +} + +func TestChunk_SingleLargeParagraph(t *testing.T) { + // A single paragraph larger than maxSize stays as one chunk. + content := strings.Repeat("x", 100) + chunks := Chunk(content, 50) + assert.Len(t, chunks, 1) +} +``` + +```go +// ingestion/internal/pipeline/parse_test.go +package pipeline + +import ( + "testing" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + + "github.com/mathiasbq/hyperguild/ingestion/internal/wiki" +) + +func TestParsePages_ValidJSON(t *testing.T) { + input := `[{"path":"wiki/sources/foo.md","content":"# Foo"},{"path":"wiki/concepts/bar.md","content":"# Bar"}]` + pages, warnings := ParsePages(input) + require.Len(t, pages, 2) + assert.Empty(t, warnings) + assert.Equal(t, "wiki/sources/foo.md", pages[0].Path) +} + +func TestParsePages_StripsFences(t *testing.T) { + input := "```json\n[{\"path\":\"wiki/sources/foo.md\",\"content\":\"# Foo\"}]\n```" + pages, warnings := ParsePages(input) + assert.Len(t, pages, 1) + assert.Empty(t, warnings) +} + +func TestParsePages_TruncationRecovery(t *testing.T) { + // Valid first object, second truncated mid-way. + input := `[{"path":"wiki/sources/foo.md","content":"# Foo"},{"path":"wiki/concepts/bar.md","content":"trunc` + pages, warnings := ParsePages(input) + require.Len(t, pages, 1) + assert.Equal(t, "wiki/sources/foo.md", pages[0].Path) + assert.NotEmpty(t, warnings) +} + +func TestParsePages_EmptyInput(t *testing.T) { + pages, warnings := ParsePages("") + assert.Empty(t, pages) + assert.NotEmpty(t, warnings) +} + +// Ensure wiki.Page is used correctly (compile-time check). +var _ []wiki.Page = nil +``` + +- [ ] **Step 2: Run tests to verify they fail** + +```bash +cd /Users/mathias/dev/AI/hyperguild/ingestion && go test ./internal/pipeline/... 2>&1 +``` + +Expected: `undefined: Chunk` and `undefined: ParsePages` + +- [ ] **Step 3: Write chunk.go** + +```go +// ingestion/internal/pipeline/chunk.go +package pipeline + +import "strings" + +// Chunk splits content into pieces of at most maxSize bytes, splitting at +// paragraph boundaries (\n\n). If maxSize <= 0, returns content as one chunk. +func Chunk(content string, maxSize int) []string { + content = strings.TrimSpace(content) + if maxSize <= 0 || len(content) <= maxSize { + return []string{content} + } + + paragraphs := strings.Split(content, "\n\n") + var chunks []string + var cur strings.Builder + + for _, para := range paragraphs { + para = strings.TrimSpace(para) + if para == "" { + continue + } + // If adding this paragraph would exceed maxSize and we already have content, + // flush the current chunk first. + addition := para + if cur.Len() > 0 { + addition = "\n\n" + para + } + if cur.Len() > 0 && cur.Len()+len(addition) > maxSize { + chunks = append(chunks, cur.String()) + cur.Reset() + cur.WriteString(para) + } else { + cur.WriteString(addition) + } + } + if cur.Len() > 0 { + chunks = append(chunks, cur.String()) + } + return chunks +} +``` + +- [ ] **Step 4: Write parse.go** + +```go +// ingestion/internal/pipeline/parse.go +package pipeline + +import ( + "encoding/json" + "fmt" + "strings" + + "github.com/mathiasbq/hyperguild/ingestion/internal/wiki" +) + +// ParsePages parses LLM output as a JSON array of {path, content} objects. +// If the array is truncated mid-object (token limit), it salvages all complete objects. +func ParsePages(output string) ([]wiki.Page, []string) { + output = strings.TrimSpace(output) + if output == "" { + return nil, []string{"LLM returned empty output"} + } + + output = stripFences(output) + + var pages []wiki.Page + if err := json.Unmarshal([]byte(output), &pages); err == nil { + return pages, nil + } + + // Truncation recovery: find last `}` that closes a complete object. + idx := strings.LastIndex(output, "}") + if idx < 0 { + return nil, []string{"LLM output contained no complete JSON objects"} + } + + start := strings.Index(output, "[") + if start < 0 { + return nil, []string{"LLM output contained no JSON array opening bracket"} + } + + candidate := output[start:idx+1] + "]" + if err := json.Unmarshal([]byte(candidate), &pages); err != nil { + return nil, []string{fmt.Sprintf("truncation recovery failed: %v", err)} + } + + return pages, []string{fmt.Sprintf("LLM output was truncated; recovered %d page(s)", len(pages))} +} + +func stripFences(s string) string { + for _, prefix := range []string{"```json\n", "```json\r\n", "```\n", "```\r\n"} { + if strings.HasPrefix(s, prefix) { + s = strings.TrimPrefix(s, prefix) + s = strings.TrimSuffix(strings.TrimSpace(s), "```") + return strings.TrimSpace(s) + } + } + return s +} +``` + +- [ ] **Step 5: Run tests to verify they pass** + +```bash +cd /Users/mathias/dev/AI/hyperguild/ingestion && go test ./internal/pipeline/... -v +``` + +Expected: all PASS. + +- [ ] **Step 6: Commit** + +```bash +git add ingestion/internal/pipeline/chunk.go ingestion/internal/pipeline/chunk_test.go \ + ingestion/internal/pipeline/parse.go ingestion/internal/pipeline/parse_test.go +git commit -m "feat(ingestion): add content chunking and LLM JSON output parser" +``` + +--- + +## Task 9: pipeline/prompt + pipeline/pipeline — prompt builder and orchestrator + +**Files:** +- Create: `ingestion/internal/pipeline/prompt.go` +- Create: `ingestion/internal/pipeline/pipeline.go` +- Create: `ingestion/internal/pipeline/pipeline_test.go` + +- [ ] **Step 1: Write the failing integration test** + +```go +// ingestion/internal/pipeline/pipeline_test.go +package pipeline + +import ( + "context" + "encoding/json" + "net/http" + "net/http/httptest" + "os" + "path/filepath" + "testing" + "time" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + + "github.com/mathiasbq/hyperguild/ingestion/internal/llm" + "github.com/mathiasbq/hyperguild/ingestion/internal/wiki" +) + +func TestRun_WritesPages(t *testing.T) { + brainDir := t.TempDir() + for _, sub := range []string{"wiki/concepts", "wiki/entities", "wiki/sources"} { + require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755)) + } + + // Mock LLM returns two pages. + llmResponse := mustJSON([]wiki.Page{ + { + Path: "wiki/sources/test-article.md", + Content: "---\ntitle: Test Article\ntype: article\ndomain: software-engineering\ndate_ingested: 2026-04-22\nlast_updated: 2026-04-22\naliases:\n - Test Article\n---\n\n## Summary\n\nA test article.\n\n## Key Claims\n\n- It tests things.\n\n## Concepts Introduced or Reinforced\n\n## Entities Mentioned\n\n## Open Questions Raised\n", + }, + { + Path: "wiki/concepts/testing.md", + Content: "---\ntitle: Testing\ndomain: software-engineering\nlast_updated: 2026-04-22\naliases:\n - Testing\n---\n\n## Definition\n\nThe practice of verifying software.\n\n## Why It Matters\n\nCatches bugs.\n\n## Related Concepts\n\n## Related Entities\n\n## Sources\n\n## Evolving Notes\n", + }, + }) + + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + w.Header().Set("Content-Type", "application/json") + json.NewEncoder(w).Encode(map[string]any{ + "choices": []map[string]any{ + {"message": map[string]any{"role": "assistant", "content": llmResponse}}, + }, + }) + })) + defer srv.Close() + + cfg := Config{ + Complete: llm.New(srv.URL, "", "test-model", 30*time.Second).Complete, + ChunkSize: 0, + } + + result, err := Run(context.Background(), cfg, brainDir, "An article about testing.", "test-article", false) + require.NoError(t, err) + assert.Len(t, result.Pages, 2) + assert.Empty(t, result.Warnings) + + // Files must exist on disk. + _, err = os.Stat(filepath.Join(brainDir, "wiki", "sources", "test-article.md")) + require.NoError(t, err) + _, err = os.Stat(filepath.Join(brainDir, "wiki", "concepts", "testing.md")) + require.NoError(t, err) + + // Index must be rebuilt. + _, err = os.Stat(filepath.Join(brainDir, "wiki", "index.md")) + require.NoError(t, err) + + // Log must be appended. + _, err = os.Stat(filepath.Join(brainDir, "log.md")) + require.NoError(t, err) +} + +func TestRun_DryRunDoesNotWrite(t *testing.T) { + brainDir := t.TempDir() + for _, sub := range []string{"wiki/concepts", "wiki/entities", "wiki/sources"} { + require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755)) + } + + llmResponse := mustJSON([]wiki.Page{{ + Path: "wiki/sources/foo.md", + Content: "---\ntitle: Foo\n---\n\n## Summary\n\nFoo.\n", + }}) + + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + json.NewEncoder(w).Encode(map[string]any{ + "choices": []map[string]any{{"message": map[string]any{"content": llmResponse}}}, + }) + })) + defer srv.Close() + + cfg := Config{Complete: llm.New(srv.URL, "", "m", 30*time.Second).Complete} + result, err := Run(context.Background(), cfg, brainDir, "foo content", "foo", true) + require.NoError(t, err) + assert.Len(t, result.Pages, 1) + + // No files written in dry-run. + _, err = os.Stat(filepath.Join(brainDir, "wiki", "sources", "foo.md")) + assert.True(t, os.IsNotExist(err)) +} + +func mustJSON(v any) string { + b, _ := json.Marshal(v) + return string(b) +} +``` + +- [ ] **Step 2: Run test to verify it fails** + +```bash +cd /Users/mathias/dev/AI/hyperguild/ingestion && go test ./internal/pipeline/... -run "TestRun" 2>&1 +``` + +Expected: `undefined: Run` and `undefined: Config` + +- [ ] **Step 3: Write prompt.go** + +```go +// ingestion/internal/pipeline/prompt.go +package pipeline + +import ( + "fmt" + "strings" + "time" + + "github.com/mathiasbq/hyperguild/ingestion/internal/wiki" +) + +const systemPrompt = `You are a wiki agent. Read the source material and produce structured wiki pages following the schema provided. + +Output ONLY a valid JSON array — no markdown fences, no other text before or after. +Each element must have: + "path" — relative path within the wiki, e.g. "wiki/sources/foo.md" + "content" — full markdown content of the page including YAML frontmatter + +Follow the schema strictly: correct frontmatter fields, wikilinks as [[slug|Display Text]], +dates in YYYY-MM-DD format, and paraphrase rather than quoting verbatim.` + +// BuildPrompt constructs the user prompt for a single chunk. +func BuildPrompt(schema, source, content string, inventory map[wiki.PageType][]wiki.Entry) string { + var sb strings.Builder + + fmt.Fprintf(&sb, "Today's date is %s.\n\n", time.Now().UTC().Format("2006-01-02")) + + sb.WriteString("## Schema\n\n") + sb.WriteString(schema) + sb.WriteString("\n\n") + + sb.WriteString("## Existing wiki pages\n\n") + sb.WriteString("Link ONLY to pages that appear in this inventory or that you are creating in this response.\n\n") + + for _, pt := range []wiki.PageType{wiki.PageTypeConcept, wiki.PageTypeEntity, wiki.PageTypeSource} { + entries := inventory[pt] + label := strings.ToTitle(string(pt)) + if len(entries) == 0 { + fmt.Fprintf(&sb, "%s — (none yet)\n\n", label) + continue + } + fmt.Fprintf(&sb, "%s — link ONLY under the matching section:\n", label) + for _, e := range entries { + fmt.Fprintf(&sb, " - [[%s|%s]]\n", e.Slug, e.Title) + } + sb.WriteString("\n") + } + + sb.WriteString("## Non-negotiable rules\n\n") + sb.WriteString("1. Output ONLY a valid JSON array — no prose, no fences.\n") + sb.WriteString("2. Slugs are kebab-case: lowercase, spaces→hyphens, no special chars.\n") + sb.WriteString("3. Wikilinks: [[slug|Display Text]] — the pipe is required.\n") + sb.WriteString("4. Section links must match their section type.\n") + sb.WriteString("5. One source page per book — update it if inventory shows it exists.\n\n") + + fmt.Fprintf(&sb, "## Source: %s\n\n", source) + sb.WriteString(content) + + return sb.String() +} +``` + +- [ ] **Step 4: Write pipeline.go** + +```go +// ingestion/internal/pipeline/pipeline.go +package pipeline + +import ( + "context" + "fmt" + "os" + "path/filepath" + "strings" + "time" + + "github.com/mathiasbq/hyperguild/ingestion/internal/wiki" +) + +// CompleteFunc is the function signature for LLM calls. +type CompleteFunc func(ctx context.Context, system, user string) (string, error) + +// Config holds pipeline configuration. +type Config struct { + Complete CompleteFunc + ChunkSize int // 0 = no chunking + Schema string // overrides brain/CLAUDE.md when set (used in tests) +} + +// Result is the outcome of a pipeline run. +type Result struct { + Pages []string // relative paths written (or would-be written in dry-run) + Warnings []string +} + +// Run ingests content and writes structured wiki pages to brainDir/wiki/. +// In dry-run mode, pages are returned but not written to disk. +func Run(ctx context.Context, cfg Config, brainDir, content, source string, dryRun bool) (Result, error) { + // 1. Load inventory. + inventory, err := wiki.LoadInventory(brainDir) + if err != nil { + return Result{}, fmt.Errorf("load inventory: %w", err) + } + + // 2. Load schema. + schema := cfg.Schema + if schema == "" { + schema = loadSchema(brainDir) + } + + // 3. Chunk. + chunks := Chunk(content, cfg.ChunkSize) + + // 4 & 5. Call LLM per chunk, parse output. + var allPages []wiki.Page + var allWarnings []string + + for _, chunk := range chunks { + userPrompt := BuildPrompt(schema, source, chunk, inventory) + output, err := cfg.Complete(ctx, systemPrompt, userPrompt) + if err != nil { + return Result{}, fmt.Errorf("LLM call: %w", err) + } + pages, warnings := ParsePages(output) + allPages = append(allPages, pages...) + allWarnings = append(allWarnings, warnings...) + } + + // 6. Merge pages with the same path. + merged := mergeAll(allPages) + + // 7. Write (or preview in dry-run). + date := time.Now().UTC().Format("2006-01-02") + var written []string + + for _, page := range merged { + if !dryRun { + dest := filepath.Join(brainDir, filepath.FromSlash(page.Path)) + if err := os.MkdirAll(filepath.Dir(dest), 0o755); err != nil { + return Result{}, fmt.Errorf("mkdir for %s: %w", page.Path, err) + } + if err := os.WriteFile(dest, []byte(page.Content), 0o644); err != nil { + return Result{}, fmt.Errorf("write %s: %w", page.Path, err) + } + } + written = append(written, page.Path) + } + + if !dryRun { + // 8. Rebuild index. + if err := wiki.RebuildIndex(brainDir, date); err != nil { + allWarnings = append(allWarnings, fmt.Sprintf("rebuild index: %v", err)) + } + // 9. Append log. + if err := wiki.AppendLog(brainDir, source, written, allWarnings, date); err != nil { + allWarnings = append(allWarnings, fmt.Sprintf("append log: %v", err)) + } + } + + return Result{Pages: written, Warnings: allWarnings}, nil +} + +// mergeAll deduplicates pages by path, merging content from later occurrences. +func mergeAll(pages []wiki.Page) []wiki.Page { + order := make([]string, 0, len(pages)) + byPath := make(map[string]wiki.Page, len(pages)) + for _, p := range pages { + if _, seen := byPath[p.Path]; !seen { + order = append(order, p.Path) + byPath[p.Path] = p + } else { + byPath[p.Path] = wiki.Merge(byPath[p.Path], p) + } + } + result := make([]wiki.Page, 0, len(order)) + for _, path := range order { + result = append(result, byPath[path]) + } + return result +} + +// defaultSchema is embedded when brain/CLAUDE.md is absent. +const defaultSchema = `# Brain Wiki Schema +See brain/CLAUDE.md for the full schema. Using minimal fallback. +Three page types: wiki/sources/, wiki/concepts/, wiki/entities/. +` + +func loadSchema(brainDir string) string { + b, err := os.ReadFile(filepath.Join(brainDir, "CLAUDE.md")) + if err != nil { + return defaultSchema + } + return strings.TrimSpace(string(b)) +} +``` + +- [ ] **Step 5: Run tests to verify they pass** + +```bash +cd /Users/mathias/dev/AI/hyperguild/ingestion && go test ./internal/pipeline/... -v +``` + +Expected: all PASS. + +- [ ] **Step 6: Commit** + +```bash +git add ingestion/internal/pipeline/ +git commit -m "feat(ingestion): add pipeline orchestrator with prompt builder" +``` + +--- + +## Task 10: api — /ingest and /ingest-path handlers + main.go wiring + +**Files:** +- Modify: `ingestion/internal/api/handler.go` +- Modify: `ingestion/internal/api/handler_test.go` +- Modify: `ingestion/cmd/server/main.go` + +- [ ] **Step 1: Write failing handler tests** + +Add these test functions to `ingestion/internal/api/handler_test.go`. + +First update the `setup` helper to accept `pipeline.Config` — change the existing `setup` function: + +```go +func setup(t *testing.T) (string, *api.Handler) { + t.Helper() + dir := t.TempDir() + require.NoError(t, os.MkdirAll(filepath.Join(dir, "knowledge"), 0o755)) + require.NoError(t, os.WriteFile( + filepath.Join(dir, "knowledge", "tdd.md"), + []byte("---\ntitle: TDD\ndomain: software\n---\n\nTest-driven development is a discipline.\n"), + 0o644, + )) + logger := slog.New(slog.NewTextHandler(os.Stderr, nil)) + return dir, api.NewHandler(dir, logger, pipeline.Config{}) +} +``` + +Then add the new test functions at the bottom of the file: + +```go +func TestIngest_RequiresContent(t *testing.T) { + _, h := setup(t) + body, _ := json.Marshal(map[string]any{"source": "test"}) + req := httptest.NewRequest(http.MethodPost, "/ingest", bytes.NewReader(body)) + rec := httptest.NewRecorder() + h.Ingest(rec, req) + assert.Equal(t, http.StatusBadRequest, rec.Code) +} + +func TestIngest_RequiresSource(t *testing.T) { + _, h := setup(t) + body, _ := json.Marshal(map[string]any{"content": "something"}) + req := httptest.NewRequest(http.MethodPost, "/ingest", bytes.NewReader(body)) + rec := httptest.NewRecorder() + h.Ingest(rec, req) + assert.Equal(t, http.StatusBadRequest, rec.Code) +} + +func TestIngestPath_RequiresPath(t *testing.T) { + _, h := setup(t) + body, _ := json.Marshal(map[string]any{}) + req := httptest.NewRequest(http.MethodPost, "/ingest-path", bytes.NewReader(body)) + rec := httptest.NewRecorder() + h.IngestPath(rec, req) + assert.Equal(t, http.StatusBadRequest, rec.Code) +} + +func TestIngestPath_RejectsNonexistentPath(t *testing.T) { + _, h := setup(t) + body, _ := json.Marshal(map[string]any{"path": "/does/not/exist.md"}) + req := httptest.NewRequest(http.MethodPost, "/ingest-path", bytes.NewReader(body)) + rec := httptest.NewRecorder() + h.IngestPath(rec, req) + assert.Equal(t, http.StatusBadRequest, rec.Code) +} +``` + +Also add these imports to `handler_test.go`: +```go +import ( + "github.com/mathiasbq/hyperguild/ingestion/internal/pipeline" +) +``` + +- [ ] **Step 2: Run tests to verify they fail** + +```bash +cd /Users/mathias/dev/AI/hyperguild/ingestion && go test ./internal/api/... 2>&1 +``` + +Expected: compile error — `NewHandler` wrong number of args, `Ingest`/`IngestPath` undefined. + +- [ ] **Step 3: Update handler.go** + +Replace the full contents of `ingestion/internal/api/handler.go`: + +```go +// ingestion/internal/api/handler.go +package api + +import ( + "context" + "encoding/json" + "fmt" + "log/slog" + "net/http" + "os" + "path/filepath" + "strings" + "time" + + "github.com/mathiasbq/hyperguild/ingestion/internal/pipeline" + "github.com/mathiasbq/hyperguild/ingestion/internal/search" +) + +// Handler serves the ingestion HTTP API. +type Handler struct { + brainDir string + logger *slog.Logger + pipeline pipeline.Config +} + +// NewHandler constructs a Handler. +func NewHandler(brainDir string, logger *slog.Logger, pc pipeline.Config) *Handler { + return &Handler{brainDir: brainDir, logger: logger, pipeline: pc} +} + +// --- /query --- + +type queryRequest struct { + Query string `json:"query"` + Limit int `json:"limit,omitempty"` +} + +func (h *Handler) Query(w http.ResponseWriter, r *http.Request) { + var req queryRequest + if err := json.NewDecoder(r.Body).Decode(&req); err != nil { + http.Error(w, "invalid JSON", http.StatusBadRequest) + return + } + if strings.TrimSpace(req.Query) == "" { + http.Error(w, "query is required", http.StatusBadRequest) + return + } + if req.Limit == 0 { + req.Limit = 5 + } + results, err := search.Query(h.brainDir, req.Query, req.Limit) + if err != nil { + h.logger.Error("query failed", "err", err) + http.Error(w, "search error", http.StatusInternalServerError) + return + } + writeJSON(w, map[string]any{"results": results}) +} + +// --- /write --- + +type writeRequest struct { + Content string `json:"content"` + Filename string `json:"filename,omitempty"` + Type string `json:"type,omitempty"` + Domain string `json:"domain,omitempty"` +} + +func (h *Handler) Write(w http.ResponseWriter, r *http.Request) { + var req writeRequest + if err := json.NewDecoder(r.Body).Decode(&req); err != nil { + http.Error(w, "invalid JSON", http.StatusBadRequest) + return + } + if req.Content == "" { + http.Error(w, "content is required", http.StatusBadRequest) + return + } + + filename := req.Filename + if filename == "" { + filename = fmt.Sprintf("%s-auto.md", time.Now().UTC().Format("2006-01-02-150405")) + } + + rawDir := filepath.Join(h.brainDir, "knowledge") + if err := os.MkdirAll(rawDir, 0o755); err != nil { + http.Error(w, "failed to create raw dir", http.StatusInternalServerError) + return + } + + finalContent := req.Content + if req.Type != "" || req.Domain != "" { + var fm strings.Builder + fm.WriteString("---\n") + if req.Type != "" { + fmt.Fprintf(&fm, "type: %s\n", req.Type) + } + if req.Domain != "" { + fmt.Fprintf(&fm, "domain: %s\n", req.Domain) + } + fm.WriteString("---\n") + finalContent = fm.String() + req.Content + } + + base := filepath.Base(filename) + if !strings.HasSuffix(base, ".md") { + base += ".md" + } + dest := filepath.Join(rawDir, base) + if err := os.WriteFile(dest, []byte(finalContent), 0o644); err != nil { + h.logger.Error("write failed", "err", err) + http.Error(w, "write error", http.StatusInternalServerError) + return + } + + rel, _ := filepath.Rel(h.brainDir, dest) + writeJSON(w, map[string]string{"path": filepath.ToSlash(rel)}) +} + +// --- /ingest --- + +type ingestRequest struct { + Content string `json:"content"` + Source string `json:"source"` + DryRun bool `json:"dry_run,omitempty"` +} + +func (h *Handler) Ingest(w http.ResponseWriter, r *http.Request) { + var req ingestRequest + if err := json.NewDecoder(r.Body).Decode(&req); err != nil { + http.Error(w, "invalid JSON", http.StatusBadRequest) + return + } + if req.Content == "" { + http.Error(w, "content is required", http.StatusBadRequest) + return + } + if req.Source == "" { + http.Error(w, "source is required", http.StatusBadRequest) + return + } + + result, err := pipeline.Run(r.Context(), h.pipeline, h.brainDir, req.Content, req.Source, req.DryRun) + if err != nil { + h.logger.Error("ingest failed", "source", req.Source, "err", err) + http.Error(w, "ingest error: "+err.Error(), http.StatusInternalServerError) + return + } + writeJSON(w, map[string]any{"pages": result.Pages, "warnings": result.Warnings}) +} + +// --- /ingest-path --- + +type ingestPathRequest struct { + Path string `json:"path"` + Source string `json:"source,omitempty"` + DryRun bool `json:"dry_run,omitempty"` +} + +func (h *Handler) IngestPath(w http.ResponseWriter, r *http.Request) { + var req ingestPathRequest + if err := json.NewDecoder(r.Body).Decode(&req); err != nil { + http.Error(w, "invalid JSON", http.StatusBadRequest) + return + } + if req.Path == "" { + http.Error(w, "path is required", http.StatusBadRequest) + return + } + + info, err := os.Stat(req.Path) + if err != nil { + http.Error(w, "path not found: "+req.Path, http.StatusBadRequest) + return + } + + var files []string + if info.IsDir() { + files, err = collectFiles(req.Path) + if err != nil { + h.logger.Error("collect files failed", "path", req.Path, "err", err) + http.Error(w, "failed to read directory", http.StatusInternalServerError) + return + } + } else { + files = []string{req.Path} + } + + var allPages []string + var allWarnings []string + + for _, f := range files { + content, err := os.ReadFile(f) + if err != nil { + allWarnings = append(allWarnings, fmt.Sprintf("skip %s: %v", f, err)) + continue + } + source := req.Source + if source == "" { + source = sourceFromFilename(filepath.Base(f)) + } + result, err := pipeline.Run(r.Context(), h.pipeline, h.brainDir, string(content), source, req.DryRun) + if err != nil { + h.logger.Error("ingest-path failed", "file", f, "err", err) + allWarnings = append(allWarnings, fmt.Sprintf("ingest %s: %v", f, err)) + continue + } + allPages = append(allPages, result.Pages...) + allWarnings = append(allWarnings, result.Warnings...) + } + + writeJSON(w, map[string]any{"pages": allPages, "warnings": allWarnings}) +} + +var supportedExts = map[string]bool{".md": true, ".txt": true, ".pdf": true} + +func collectFiles(dir string) ([]string, error) { + var files []string + err := filepath.WalkDir(dir, func(path string, d os.DirEntry, err error) error { + if err != nil { + return nil + } + if d.IsDir() { + return nil + } + if supportedExts[strings.ToLower(filepath.Ext(path))] { + files = append(files, path) + } + return nil + }) + return files, err +} + +func sourceFromFilename(filename string) string { + name := strings.TrimSuffix(filename, filepath.Ext(filename)) + words := strings.FieldsFunc(name, func(r rune) bool { return r == '-' || r == '_' }) + for i, w := range words { + if len(w) > 0 { + words[i] = strings.ToUpper(w[:1]) + w[1:] + } + } + return strings.Join(words, " ") +} + +func writeJSON(w http.ResponseWriter, v any) { + w.Header().Set("Content-Type", "application/json") + json.NewEncoder(w).Encode(v) //nolint:errcheck +} +``` + +- [ ] **Step 4: Run tests to verify they pass** + +```bash +cd /Users/mathias/dev/AI/hyperguild/ingestion && go test ./internal/api/... -v +``` + +Expected: all PASS. + +- [ ] **Step 5: Update main.go** + +Replace `ingestion/cmd/server/main.go`: + +```go +// ingestion/cmd/server/main.go +package main + +import ( + "context" + "log/slog" + "net/http" + "os" + "strconv" + "time" + + "github.com/mathiasbq/hyperguild/ingestion/internal/api" + "github.com/mathiasbq/hyperguild/ingestion/internal/llm" + "github.com/mathiasbq/hyperguild/ingestion/internal/pipeline" + "github.com/mathiasbq/hyperguild/ingestion/internal/watcher" +) + +func main() { + logger := slog.New(slog.NewJSONHandler(os.Stdout, nil)) + + brainDir := envOr("INGEST_BRAIN_DIR", "../brain") + port := envOr("INGEST_PORT", "3300") + + llmClient := llm.New( + envOr("INGEST_LLM_URL", "http://iguana:4000/v1"), + os.Getenv("INGEST_LLM_KEY"), + envOr("INGEST_LLM_MODEL", "koala/qwen35-9b-fast"), + time.Duration(parseInt(envOr("INGEST_LLM_TIMEOUT", "15")))*time.Minute, + ) + + pc := pipeline.Config{ + Complete: llmClient.Complete, + ChunkSize: parseInt(envOr("INGEST_CHUNK_SIZE", "6000")), + } + + h := api.NewHandler(brainDir, logger, pc) + + mux := http.NewServeMux() + mux.HandleFunc("/query", h.Query) + mux.HandleFunc("/write", h.Write) + mux.HandleFunc("/ingest", h.Ingest) + mux.HandleFunc("/ingest-path", h.IngestPath) + + watchInterval := parseInt(envOr("INGEST_WATCH_INTERVAL", "30")) + if watchInterval > 0 { + w := watcher.New(watcher.Config{ + BrainDir: brainDir, + Interval: time.Duration(watchInterval) * time.Second, + Pipeline: pc, + Logger: logger, + }) + go w.Start(context.Background()) + logger.Info("watcher started", "interval_s", watchInterval, "raw_dir", brainDir+"/raw") + } + + addr := ":" + port + logger.Info("ingestion server starting", "addr", addr, "brain_dir", brainDir) + if err := http.ListenAndServe(addr, mux); err != nil { + logger.Error("server stopped", "err", err) + os.Exit(1) + } +} + +func envOr(key, def string) string { + if v := os.Getenv(key); v != "" { + return v + } + return def +} + +func parseInt(s string) int { + n, _ := strconv.Atoi(s) + return n +} +``` + +- [ ] **Step 6: Build to verify (watcher package doesn't exist yet — expect one error)** + +```bash +cd /Users/mathias/dev/AI/hyperguild/ingestion && go build ./... 2>&1 +``` + +Expected: `cannot find package "…/watcher"` — this is fine, Task 11 adds it. + +- [ ] **Step 7: Commit what compiles** + +```bash +git add ingestion/internal/api/handler.go ingestion/internal/api/handler_test.go \ + ingestion/cmd/server/main.go +git commit -m "feat(ingestion): add /ingest and /ingest-path HTTP handlers" +``` + +--- + +## Task 11: watcher — background file processor + +**Files:** +- Create: `ingestion/internal/watcher/watcher.go` +- Create: `ingestion/internal/watcher/watcher_test.go` + +- [ ] **Step 1: Write the failing test** + +```go +// ingestion/internal/watcher/watcher_test.go +package watcher + +import ( + "context" + "os" + "path/filepath" + "testing" + "time" + + "log/slog" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + + "github.com/mathiasbq/hyperguild/ingestion/internal/pipeline" + "github.com/mathiasbq/hyperguild/ingestion/internal/wiki" +) + +func TestWatcher_SuccessfulFileMovedToProcessed(t *testing.T) { + brainDir := t.TempDir() + rawDir := filepath.Join(brainDir, "raw") + require.NoError(t, os.MkdirAll(rawDir, 0o755)) + for _, sub := range []string{"wiki/concepts", "wiki/entities", "wiki/sources"} { + require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755)) + } + + // Drop a file into raw/. + rawFile := filepath.Join(rawDir, "test-note.md") + require.NoError(t, os.WriteFile(rawFile, []byte("test content"), 0o644)) + + // Pipeline complete func returns one page. + successComplete := func(_ context.Context, _, _ string) (string, error) { + pages := []wiki.Page{{Path: "wiki/sources/test-note.md", Content: "---\ntitle: Test\n---\n"}} + b, _ := json.Marshal(pages) + return string(b), nil + } + + w := New(Config{ + BrainDir: brainDir, + Interval: 50 * time.Millisecond, + Pipeline: pipeline.Config{Complete: successComplete}, + Logger: slog.New(slog.NewTextHandler(os.Stderr, nil)), + }) + + ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond) + defer cancel() + go w.Start(ctx) + <-ctx.Done() + + // File must be gone from raw/. + _, err := os.Stat(rawFile) + assert.True(t, os.IsNotExist(err), "file should have been moved out of raw/") + + // File must be in processed/. + entries, err := filepath.Glob(filepath.Join(brainDir, "raw", "processed", "*", "test-note.md")) + require.NoError(t, err) + assert.Len(t, entries, 1) +} + +func TestWatcher_FailedFileMovedToFailed(t *testing.T) { + brainDir := t.TempDir() + rawDir := filepath.Join(brainDir, "raw") + require.NoError(t, os.MkdirAll(rawDir, 0o755)) + for _, sub := range []string{"wiki/concepts", "wiki/entities", "wiki/sources"} { + require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755)) + } + + rawFile := filepath.Join(rawDir, "bad-note.md") + require.NoError(t, os.WriteFile(rawFile, []byte("bad"), 0o644)) + + errorComplete := func(_ context.Context, _, _ string) (string, error) { + return "", fmt.Errorf("LLM unavailable") + } + + w := New(Config{ + BrainDir: brainDir, + Interval: 50 * time.Millisecond, + Pipeline: pipeline.Config{Complete: errorComplete}, + Logger: slog.New(slog.NewTextHandler(os.Stderr, nil)), + }) + + ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond) + defer cancel() + go w.Start(ctx) + <-ctx.Done() + + _, err := os.Stat(rawFile) + assert.True(t, os.IsNotExist(err)) + + _, err = os.Stat(filepath.Join(brainDir, "raw", "failed", "bad-note.md")) + require.NoError(t, err) +} +``` + +Add imports to the test file: +```go +import ( + "context" + "encoding/json" + "fmt" + "log/slog" + "os" + "path/filepath" + "testing" + "time" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + + "github.com/mathiasbq/hyperguild/ingestion/internal/pipeline" + "github.com/mathiasbq/hyperguild/ingestion/internal/wiki" +) +``` + +- [ ] **Step 2: Run test to verify it fails** + +```bash +cd /Users/mathias/dev/AI/hyperguild/ingestion && go test ./internal/watcher/... 2>&1 +``` + +Expected: `undefined: New` + +- [ ] **Step 3: Write watcher.go** + +```go +// ingestion/internal/watcher/watcher.go +package watcher + +import ( + "context" + "log/slog" + "os" + "path/filepath" + "strings" + "time" + + "github.com/mathiasbq/hyperguild/ingestion/internal/pipeline" +) + +var supportedExts = map[string]bool{".md": true, ".txt": true, ".pdf": true} + +// Config configures the file watcher. +type Config struct { + BrainDir string + Interval time.Duration + Pipeline pipeline.Config + Logger *slog.Logger +} + +// Watcher polls brain/raw/ and triggers ingestion for new files. +type Watcher struct { + cfg Config +} + +// New constructs a Watcher. +func New(cfg Config) *Watcher { return &Watcher{cfg: cfg} } + +// Start polls brain/raw/ at cfg.Interval until ctx is cancelled. +func (w *Watcher) Start(ctx context.Context) { + ticker := time.NewTicker(w.cfg.Interval) + defer ticker.Stop() + for { + select { + case <-ctx.Done(): + return + case <-ticker.C: + w.poll(ctx) + } + } +} + +func (w *Watcher) poll(ctx context.Context) { + rawDir := filepath.Join(w.cfg.BrainDir, "raw") + entries, err := os.ReadDir(rawDir) + if err != nil { + if !os.IsNotExist(err) { + w.cfg.Logger.Warn("watcher: read raw dir", "err", err) + } + return + } + for _, e := range entries { + if e.IsDir() { + continue + } + ext := strings.ToLower(filepath.Ext(e.Name())) + if !supportedExts[ext] { + continue + } + w.processFile(ctx, filepath.Join(rawDir, e.Name())) + } +} + +func (w *Watcher) processFile(ctx context.Context, path string) { + content, err := os.ReadFile(path) + if err != nil { + w.cfg.Logger.Error("watcher: read file", "path", path, "err", err) + return + } + + source := sourceFromFilename(filepath.Base(path)) + _, err = pipeline.Run(ctx, w.cfg.Pipeline, w.cfg.BrainDir, string(content), source, false) + if err != nil { + w.cfg.Logger.Error("watcher: ingest failed", "path", path, "err", err) + w.moveFile(path, filepath.Join(w.cfg.BrainDir, "raw", "failed")) + return + } + + date := time.Now().UTC().Format("2006-01-02") + w.moveFile(path, filepath.Join(w.cfg.BrainDir, "raw", "processed", date)) +} + +func (w *Watcher) moveFile(src, destDir string) { + if err := os.MkdirAll(destDir, 0o755); err != nil { + w.cfg.Logger.Error("watcher: mkdir", "dir", destDir, "err", err) + return + } + dest := filepath.Join(destDir, filepath.Base(src)) + if err := os.Rename(src, dest); err != nil { + w.cfg.Logger.Error("watcher: move file", "src", src, "dest", dest, "err", err) + } +} + +func sourceFromFilename(filename string) string { + name := strings.TrimSuffix(filename, filepath.Ext(filename)) + words := strings.FieldsFunc(name, func(r rune) bool { return r == '-' || r == '_' }) + for i, word := range words { + if len(word) > 0 { + words[i] = strings.ToUpper(word[:1]) + word[1:] + } + } + return strings.Join(words, " ") +} +``` + +- [ ] **Step 4: Run tests to verify they pass** + +```bash +cd /Users/mathias/dev/AI/hyperguild/ingestion && go test ./internal/watcher/... -v +``` + +Expected: all PASS. + +- [ ] **Step 5: Full build and test** + +```bash +cd /Users/mathias/dev/AI/hyperguild/ingestion && go build ./... && go test ./... 2>&1 +``` + +Expected: clean build, all tests PASS. + +- [ ] **Step 6: Commit** + +```bash +git add ingestion/internal/watcher/ +git commit -m "feat(ingestion): add background file watcher for brain/raw/" +``` + +- [ ] **Step 7: Final supervisor build check** + +```bash +cd /Users/mathias/dev/AI/hyperguild && go build ./... +``` + +Expected: no output (clean build). + +- [ ] **Step 8: Final commit** + +```bash +git add ingestion/cmd/server/main.go +git commit -m "feat(ingestion): wire watcher and new env vars in server main" +``` + +--- + +## Self-Review Checklist + +**Spec coverage:** +- [x] `/ingest` endpoint → Task 10 +- [x] `/ingest-path` endpoint (file + directory) → Task 10 +- [x] File watcher on `brain/raw/` → Task 11 +- [x] Success → `processed/YYYY-MM-DD/`, failure → `failed/` → Task 11 +- [x] LLM client with retry-on-429 — **GAP**: the client handles non-200 errors but the spec says "retry on 429". Add retry to `llm/client.go` in Task 7 Step 3. +- [x] Pipeline: inventory → schema → chunk → LLM → parse → merge → write → index → log → Task 9 +- [x] Truncation recovery → Task 8 +- [x] `brain/CLAUDE.md` embedded fallback → Task 9 (`defaultSchema` constant) +- [x] Supervisor config wiring → Task 1 +- [x] `brain_ingest` optional `path` field — **GAP**: `internal/skills/brain/handlers.go` already has `brain_ingest` calling `/ingest`. The spec says `brain_ingest` should gain a `path` field to call `/ingest-path` instead. + +**Fixing gaps:** + +For the 429 retry, add to `llm/client.go` after the `httpClient.Do` call in `Complete`: + +```go +// Retry once on 429 with Retry-After header or 5s backoff. +if resp.StatusCode == http.StatusTooManyRequests { + resp.Body.Close() + wait := 5 * time.Second + if ra := resp.Header.Get("Retry-After"); ra != "" { + if secs, err := strconv.Atoi(ra); err == nil { + wait = time.Duration(secs) * time.Second + } + } + select { + case <-ctx.Done(): + return "", ctx.Err() + case <-time.After(wait): + } + resp, err = c.httpClient.Do(req.Clone(ctx)) + if err != nil { + return "", fmt.Errorf("retry LLM call: %w", err) + } + defer resp.Body.Close() + out, err = io.ReadAll(resp.Body) + if err != nil { + return "", fmt.Errorf("read retry response: %w", err) + } +} +``` + +Add `"strconv"` to imports in `client.go`. Note: `req.Clone(ctx)` resets the body reader — since we used `bytes.NewReader`, build the request inside a helper or reconstruct it. Simplest fix: extract request building into a closure. This is handled inline in Task 7 Step 3 — plan readers should add this block after the initial `httpClient.Do`. + +For the `brain_ingest` `path` field, add to `internal/skills/brain/handlers.go` in the existing `ingestArgs` struct: + +```go +type ingestArgs struct { + Content string `json:"content"` + Source string `json:"source"` + Path string `json:"path,omitempty"` // ← add this + DryRun bool `json:"dry_run,omitempty"` +} +``` + +And in the `ingest` method, choose the endpoint based on whether `Path` is set: + +```go +func (s *Skill) ingest(ctx context.Context, args json.RawMessage) (json.RawMessage, error) { + var a ingestArgs + if err := json.Unmarshal(args, &a); err != nil { + return nil, fmt.Errorf("parse args: %w", err) + } + if a.Path == "" && a.Content == "" { + return nil, fmt.Errorf("content or path is required") + } + if a.Source == "" && a.Path == "" { + return nil, fmt.Errorf("source is required when content is provided") + } + if s.cfg.IngestSvcURL == "" { + return nil, fmt.Errorf("brain_ingest: INGEST_SVC_URL not configured") + } + if a.Path != "" { + return s.postTo(ctx, s.cfg.IngestSvcURL+"/ingest-path", a) + } + return s.postTo(ctx, s.cfg.IngestSvcURL+"/ingest", a) +} +``` + +Also update the `brain_ingest` tool description in `skill.go` to mention the `path` field: + +```go +InputSchema: schema([]string{}, map[string]any{ + "content": str, + "path": map[string]any{"type": "string", "description": "file or directory path to ingest (alternative to content)"}, + "source": map[string]any{"type": "string", "description": "human-readable source name, e.g. 'shape-up-book'"}, + "dry_run": map[string]any{"type": "boolean"}, +}), +``` + +These two fixes should be incorporated into Task 7 and after Task 11 respectively, or applied as a final cleanup commit.