1324 lines
41 KiB
Markdown
1324 lines
41 KiB
Markdown
# Level 3: Strip Slug Authority from LLM — Implementation Plan
|
||
|
||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||
|
||
**Goal:** Remove slug/path/frontmatter generation from the LLM; pipeline derives all slugs deterministically from titles via `wiki.Slug()`.
|
||
|
||
**Architecture:** Add `RawPage` type + `ParseRawPages` (LLM returns minimal JSON), `BuildPages` (computes slugs/paths/frontmatter), and `CanonicalizeLinks` (converts `[[Display Name]]` → `[[slug|Display Name]]`). Wire into `pipeline.Run`. Update system prompt and schema doc.
|
||
|
||
**Tech Stack:** Go 1.23, `encoding/json`, `regexp`, `strings`, `testify`
|
||
|
||
**Working directory for all commands:** `ingestion/` (the Go module root — always `cd ingestion` before running `go` commands)
|
||
|
||
---
|
||
|
||
## File Map
|
||
|
||
| File | Action | Responsibility |
|
||
|------|--------|---------------|
|
||
| `ingestion/internal/pipeline/parse.go` | Modify | Replace `ParsePages`+`wiki.Page` deserialization with `ParseRawPages`+`RawPage` type |
|
||
| `ingestion/internal/pipeline/parse_test.go` | Modify | Replace old `ParsePages` tests with `ParseRawPages` tests |
|
||
| `ingestion/internal/pipeline/build.go` | Create | `BuildPages` — computes slug, path, frontmatter from `RawPage` |
|
||
| `ingestion/internal/pipeline/build_test.go` | Create | Tests for `BuildPages` |
|
||
| `ingestion/internal/pipeline/links.go` | Create | `CanonicalizeLinks` — converts `[[Display Name]]` → `[[slug|Display Name]]` |
|
||
| `ingestion/internal/pipeline/links_test.go` | Create | Tests for `CanonicalizeLinks` |
|
||
| `ingestion/internal/pipeline/pipeline.go` | Modify | Wire `ParseRawPages → BuildPages → CanonicalizeLinks` into `Run`; update tests |
|
||
| `ingestion/internal/pipeline/pipeline_test.go` | Modify | Update mock LLM responses to new JSON format |
|
||
| `ingestion/internal/pipeline/prompt.go` | Modify | New system prompt + updated `BuildPrompt` for new JSON contract |
|
||
| `brain/schema.md` | Modify | Update wikilink format and JSON output format |
|
||
|
||
**Unchanged:** `resolve.go`, `refs.go`, `backfill.go`, `merge.go`, `chunk.go`, `resolve_test.go`, `refs_test.go`, `backfill_test.go`
|
||
|
||
---
|
||
|
||
## Task 1: `RawPage` type + `ParseRawPages`
|
||
|
||
**Files:**
|
||
- Modify: `ingestion/internal/pipeline/parse.go`
|
||
- Modify: `ingestion/internal/pipeline/parse_test.go`
|
||
|
||
- [ ] **Step 1: Write failing tests**
|
||
|
||
Replace the entire content of `ingestion/internal/pipeline/parse_test.go` with:
|
||
|
||
```go
|
||
// ingestion/internal/pipeline/parse_test.go
|
||
package pipeline
|
||
|
||
import (
|
||
"testing"
|
||
|
||
"github.com/stretchr/testify/assert"
|
||
"github.com/stretchr/testify/require"
|
||
)
|
||
|
||
func TestParseRawPages_ValidJSON(t *testing.T) {
|
||
input := `[{"title":"Shape Up","type":"source","subtype":"book","domain":"product-strategy","content":"## Summary\n\nFoo."},{"title":"Betting","type":"concept","content":"## Definition\n\nA technique."}]`
|
||
pages, warnings := ParseRawPages(input)
|
||
require.Len(t, pages, 2)
|
||
assert.Empty(t, warnings)
|
||
assert.Equal(t, "Shape Up", pages[0].Title)
|
||
assert.Equal(t, "source", pages[0].Type)
|
||
assert.Equal(t, "book", pages[0].Subtype)
|
||
assert.Equal(t, "product-strategy", pages[0].Domain)
|
||
assert.Equal(t, "Betting", pages[1].Title)
|
||
assert.Equal(t, "concept", pages[1].Type)
|
||
assert.Empty(t, pages[1].Subtype)
|
||
}
|
||
|
||
func TestParseRawPages_StripsFences(t *testing.T) {
|
||
input := "```json\n[{\"title\":\"Foo\",\"type\":\"concept\",\"content\":\"## Definition\\n\\nFoo.\"}]\n```"
|
||
pages, warnings := ParseRawPages(input)
|
||
require.Len(t, pages, 1)
|
||
assert.Empty(t, warnings)
|
||
assert.Equal(t, "Foo", pages[0].Title)
|
||
}
|
||
|
||
func TestParseRawPages_TruncationRecovery(t *testing.T) {
|
||
input := `[{"title":"Foo","type":"concept","content":"## Definition\n\nFoo."},{"title":"Bar","type":"concept","content":"trunc`
|
||
pages, warnings := ParseRawPages(input)
|
||
require.Len(t, pages, 1)
|
||
assert.Equal(t, "Foo", pages[0].Title)
|
||
assert.NotEmpty(t, warnings)
|
||
}
|
||
|
||
func TestParseRawPages_EmptyInput(t *testing.T) {
|
||
pages, warnings := ParseRawPages("")
|
||
assert.Empty(t, pages)
|
||
assert.NotEmpty(t, warnings)
|
||
}
|
||
|
||
func TestParseRawPages_PlainFence(t *testing.T) {
|
||
input := "```\n[{\"title\":\"Foo\",\"type\":\"concept\",\"content\":\"ok\"}]\n```"
|
||
pages, warnings := ParseRawPages(input)
|
||
require.Len(t, pages, 1)
|
||
assert.Empty(t, warnings)
|
||
}
|
||
|
||
func TestParseRawPages_MissingTitle(t *testing.T) {
|
||
// Missing title — still parsed, Title is empty string
|
||
input := `[{"type":"concept","content":"## Definition\n\nFoo."}]`
|
||
pages, warnings := ParseRawPages(input)
|
||
require.Len(t, pages, 1)
|
||
assert.Empty(t, warnings)
|
||
assert.Empty(t, pages[0].Title)
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 2: Run tests to verify they fail**
|
||
|
||
```bash
|
||
cd ingestion && go test ./internal/pipeline/ -run TestParseRawPages -v
|
||
```
|
||
|
||
Expected: FAIL — `ParseRawPages` undefined, `RawPage` undefined.
|
||
|
||
- [ ] **Step 3: Replace `parse.go` with new implementation**
|
||
|
||
Replace the entire content of `ingestion/internal/pipeline/parse.go` with:
|
||
|
||
```go
|
||
// ingestion/internal/pipeline/parse.go
|
||
package pipeline
|
||
|
||
import (
|
||
"encoding/json"
|
||
"fmt"
|
||
"strings"
|
||
)
|
||
|
||
// RawPage is the LLM's output format — minimal structured data with no path or frontmatter.
|
||
// The pipeline derives slugs, paths, and frontmatter from these fields.
|
||
type RawPage struct {
|
||
Title string `json:"title"`
|
||
Type string `json:"type"` // "source" | "concept" | "entity"
|
||
Subtype string `json:"subtype"` // entity: person|company|tool|model|framework|technology; source: article|pdf|book|video|note|project
|
||
Domain string `json:"domain"`
|
||
Content string `json:"content"` // Markdown body only — no frontmatter
|
||
}
|
||
|
||
// ParseRawPages parses LLM output as a JSON array of RawPage objects.
|
||
// If the array is truncated mid-object (token limit), it salvages all complete objects.
|
||
func ParseRawPages(output string) ([]RawPage, []string) {
|
||
output = strings.TrimSpace(output)
|
||
if output == "" {
|
||
return nil, []string{"LLM returned empty output"}
|
||
}
|
||
|
||
output = stripFences(output)
|
||
|
||
var pages []RawPage
|
||
if err := json.Unmarshal([]byte(output), &pages); err == nil {
|
||
return pages, nil
|
||
}
|
||
|
||
// Truncation recovery: find last `}` that closes a complete object.
|
||
idx := strings.LastIndex(output, "}")
|
||
if idx < 0 {
|
||
return nil, []string{"LLM output contained no complete JSON objects"}
|
||
}
|
||
|
||
start := strings.Index(output, "[")
|
||
if start < 0 {
|
||
return nil, []string{"LLM output contained no JSON array opening bracket"}
|
||
}
|
||
|
||
candidate := output[start:idx+1] + "]"
|
||
if err := json.Unmarshal([]byte(candidate), &pages); err != nil {
|
||
return nil, []string{fmt.Sprintf("truncation recovery failed: %v", err)}
|
||
}
|
||
|
||
return pages, []string{fmt.Sprintf("LLM output was truncated; recovered %d page(s)", len(pages))}
|
||
}
|
||
|
||
func stripFences(s string) string {
|
||
for _, prefix := range []string{"```json\n", "```json\r\n", "```\n", "```\r\n"} {
|
||
if strings.HasPrefix(s, prefix) {
|
||
s = strings.TrimPrefix(s, prefix)
|
||
s = strings.TrimSuffix(strings.TrimSpace(s), "```")
|
||
return strings.TrimSpace(s)
|
||
}
|
||
}
|
||
return s
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 4: Run tests to verify they pass**
|
||
|
||
```bash
|
||
cd ingestion && go test ./internal/pipeline/ -run TestParseRawPages -v
|
||
```
|
||
|
||
Expected: all 6 tests PASS.
|
||
|
||
- [ ] **Step 5: Verify package still compiles (pipeline.go still references old ParsePages — that's expected for now)**
|
||
|
||
```bash
|
||
cd ingestion && go build ./...
|
||
```
|
||
|
||
Expected: compile error mentioning `ParsePages` undefined — that's fine, we'll fix it in Task 4. If the error is something else, investigate.
|
||
|
||
Actually the old `ParsePages` no longer exists so `pipeline.go` will fail. That's OK — this task is intentionally breaking the pipeline step to set up for Task 4. If CI blocks on this, group Tasks 1–4 into a single commit.
|
||
|
||
- [ ] **Step 6: Commit**
|
||
|
||
```bash
|
||
cd ingestion && git add internal/pipeline/parse.go internal/pipeline/parse_test.go
|
||
git commit -m "feat(pipeline): replace ParsePages with ParseRawPages + RawPage type"
|
||
```
|
||
|
||
---
|
||
|
||
## Task 2: `BuildPages`
|
||
|
||
**Files:**
|
||
- Create: `ingestion/internal/pipeline/build.go`
|
||
- Create: `ingestion/internal/pipeline/build_test.go`
|
||
|
||
- [ ] **Step 1: Write failing tests**
|
||
|
||
Create `ingestion/internal/pipeline/build_test.go`:
|
||
|
||
```go
|
||
// ingestion/internal/pipeline/build_test.go
|
||
package pipeline
|
||
|
||
import (
|
||
"strings"
|
||
"testing"
|
||
|
||
"github.com/stretchr/testify/assert"
|
||
"github.com/stretchr/testify/require"
|
||
)
|
||
|
||
func TestBuildPages_SourcePage(t *testing.T) {
|
||
raw := []RawPage{
|
||
{
|
||
Title: "Shape Up",
|
||
Type: "source",
|
||
Subtype: "book",
|
||
Domain: "product-strategy",
|
||
Content: "## Summary\n\nA book about shaping product work.\n",
|
||
},
|
||
}
|
||
pages := BuildPages(raw, "shape-up", "2026-04-23")
|
||
require.Len(t, pages, 1)
|
||
|
||
p := pages[0]
|
||
assert.Equal(t, "wiki/sources/shape-up.md", p.Path)
|
||
assert.Contains(t, p.Content, "title: Shape Up")
|
||
assert.Contains(t, p.Content, "type: book")
|
||
assert.Contains(t, p.Content, "domain: product-strategy")
|
||
assert.Contains(t, p.Content, "date_ingested: 2026-04-23")
|
||
assert.Contains(t, p.Content, "last_updated: 2026-04-23")
|
||
assert.Contains(t, p.Content, "aliases:\n - Shape Up")
|
||
assert.Contains(t, p.Content, "## Summary")
|
||
assert.True(t, strings.HasPrefix(p.Content, "---\n"), "content must start with frontmatter")
|
||
}
|
||
|
||
func TestBuildPages_ConceptPage(t *testing.T) {
|
||
raw := []RawPage{
|
||
{
|
||
Title: "Betting",
|
||
Type: "concept",
|
||
Domain: "product-strategy",
|
||
Content: "## Definition\n\nA resource allocation technique.\n",
|
||
},
|
||
}
|
||
pages := BuildPages(raw, "shape-up", "2026-04-23")
|
||
require.Len(t, pages, 1)
|
||
|
||
p := pages[0]
|
||
assert.Equal(t, "wiki/concepts/betting.md", p.Path)
|
||
assert.Contains(t, p.Content, "title: Betting")
|
||
assert.Contains(t, p.Content, "domain: product-strategy")
|
||
assert.Contains(t, p.Content, "last_updated: 2026-04-23")
|
||
assert.Contains(t, p.Content, "aliases:\n - Betting")
|
||
assert.NotContains(t, p.Content, "date_ingested")
|
||
assert.Contains(t, p.Content, "## Definition")
|
||
}
|
||
|
||
func TestBuildPages_EntityPage(t *testing.T) {
|
||
raw := []RawPage{
|
||
{
|
||
Title: "Ryan Singer",
|
||
Type: "entity",
|
||
Subtype: "person",
|
||
Domain: "product-strategy",
|
||
Content: "## Description\n\nA product designer.\n",
|
||
},
|
||
}
|
||
pages := BuildPages(raw, "shape-up", "2026-04-23")
|
||
require.Len(t, pages, 1)
|
||
|
||
p := pages[0]
|
||
assert.Equal(t, "wiki/entities/ryan-singer.md", p.Path)
|
||
assert.Contains(t, p.Content, "title: Ryan Singer")
|
||
assert.Contains(t, p.Content, "type: person")
|
||
assert.Contains(t, p.Content, "domain: product-strategy")
|
||
assert.Contains(t, p.Content, "last_updated: 2026-04-23")
|
||
assert.Contains(t, p.Content, "aliases:\n - Ryan Singer")
|
||
assert.NotContains(t, p.Content, "date_ingested")
|
||
}
|
||
|
||
func TestBuildPages_SourceSlugUsedForSourcePage(t *testing.T) {
|
||
// LLM title differs from filename — pipeline uses sourceSlug for the source page path.
|
||
raw := []RawPage{
|
||
{Title: "FinBERT: A Pretrained Model", Type: "source", Subtype: "article", Content: "## Summary\n\nA model.\n"},
|
||
}
|
||
pages := BuildPages(raw, "finbert-huggingface", "2026-04-23")
|
||
require.Len(t, pages, 1)
|
||
assert.Equal(t, "wiki/sources/finbert-huggingface.md", pages[0].Path)
|
||
// Concept/entity slugs derived from title, not sourceSlug
|
||
}
|
||
|
||
func TestBuildPages_ConceptSlugDerivedFromTitle(t *testing.T) {
|
||
raw := []RawPage{
|
||
{Title: "Domain-Driven Design", Type: "concept", Content: "## Definition\n\nFoo.\n"},
|
||
}
|
||
pages := BuildPages(raw, "some-source", "2026-04-23")
|
||
require.Len(t, pages, 1)
|
||
assert.Equal(t, "wiki/concepts/domain-driven-design.md", pages[0].Path)
|
||
}
|
||
|
||
func TestBuildPages_SourceDefaultSubtype(t *testing.T) {
|
||
// If subtype is omitted for a source, default to "article"
|
||
raw := []RawPage{
|
||
{Title: "Some Post", Type: "source", Content: "## Summary\n\nA post.\n"},
|
||
}
|
||
pages := BuildPages(raw, "some-post", "2026-04-23")
|
||
require.Len(t, pages, 1)
|
||
assert.Contains(t, pages[0].Content, "type: article")
|
||
}
|
||
|
||
func TestBuildPages_OmitsDomainWhenEmpty(t *testing.T) {
|
||
raw := []RawPage{
|
||
{Title: "Betting", Type: "concept", Content: "## Definition\n\nFoo.\n"},
|
||
}
|
||
pages := BuildPages(raw, "src", "2026-04-23")
|
||
require.Len(t, pages, 1)
|
||
assert.NotContains(t, pages[0].Content, "domain:")
|
||
}
|
||
|
||
func TestBuildPages_MultiplePages(t *testing.T) {
|
||
raw := []RawPage{
|
||
{Title: "Shape Up", Type: "source", Subtype: "book", Content: "## Summary\n\nA book.\n"},
|
||
{Title: "Betting", Type: "concept", Content: "## Definition\n\nA technique.\n"},
|
||
{Title: "Ryan Singer", Type: "entity", Subtype: "person", Content: "## Description\n\nA designer.\n"},
|
||
}
|
||
pages := BuildPages(raw, "shape-up", "2026-04-23")
|
||
require.Len(t, pages, 3)
|
||
assert.Equal(t, "wiki/sources/shape-up.md", pages[0].Path)
|
||
assert.Equal(t, "wiki/concepts/betting.md", pages[1].Path)
|
||
assert.Equal(t, "wiki/entities/ryan-singer.md", pages[2].Path)
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 2: Run tests to verify they fail**
|
||
|
||
```bash
|
||
cd ingestion && go test ./internal/pipeline/ -run TestBuildPages -v
|
||
```
|
||
|
||
Expected: FAIL — `BuildPages` undefined.
|
||
|
||
- [ ] **Step 3: Create `build.go`**
|
||
|
||
Create `ingestion/internal/pipeline/build.go`:
|
||
|
||
```go
|
||
// ingestion/internal/pipeline/build.go
|
||
package pipeline
|
||
|
||
import (
|
||
"fmt"
|
||
"strings"
|
||
|
||
"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
|
||
)
|
||
|
||
// BuildPages converts RawPages from the LLM into wiki.Pages with computed slugs,
|
||
// paths, and YAML frontmatter. sourceSlug is the slug of the source being ingested
|
||
// (derived from the filename, not the LLM title).
|
||
func BuildPages(rawPages []RawPage, sourceSlug, date string) []wiki.Page {
|
||
out := make([]wiki.Page, 0, len(rawPages))
|
||
for _, rp := range rawPages {
|
||
out = append(out, buildPage(rp, sourceSlug, date))
|
||
}
|
||
return out
|
||
}
|
||
|
||
func buildPage(rp RawPage, sourceSlug, date string) wiki.Page {
|
||
var slug, dir string
|
||
switch rp.Type {
|
||
case "source":
|
||
slug = sourceSlug
|
||
dir = "wiki/sources"
|
||
case "concept":
|
||
slug = wiki.Slug(rp.Title)
|
||
dir = "wiki/concepts"
|
||
case "entity":
|
||
slug = wiki.Slug(rp.Title)
|
||
dir = "wiki/entities"
|
||
default:
|
||
slug = wiki.Slug(rp.Title)
|
||
dir = "wiki/" + rp.Type
|
||
}
|
||
|
||
path := dir + "/" + slug + ".md"
|
||
fm := buildFrontmatter(rp, date)
|
||
|
||
return wiki.Page{
|
||
Path: path,
|
||
Content: fm + "\n" + rp.Content,
|
||
}
|
||
}
|
||
|
||
func buildFrontmatter(rp RawPage, date string) string {
|
||
var sb strings.Builder
|
||
sb.WriteString("---\n")
|
||
fmt.Fprintf(&sb, "title: %s\n", rp.Title)
|
||
|
||
switch rp.Type {
|
||
case "source":
|
||
subtype := rp.Subtype
|
||
if subtype == "" {
|
||
subtype = "article"
|
||
}
|
||
fmt.Fprintf(&sb, "type: %s\n", subtype)
|
||
if rp.Domain != "" {
|
||
fmt.Fprintf(&sb, "domain: %s\n", rp.Domain)
|
||
}
|
||
fmt.Fprintf(&sb, "date_ingested: %s\n", date)
|
||
fmt.Fprintf(&sb, "last_updated: %s\n", date)
|
||
case "concept":
|
||
if rp.Domain != "" {
|
||
fmt.Fprintf(&sb, "domain: %s\n", rp.Domain)
|
||
}
|
||
fmt.Fprintf(&sb, "last_updated: %s\n", date)
|
||
case "entity":
|
||
if rp.Subtype != "" {
|
||
fmt.Fprintf(&sb, "type: %s\n", rp.Subtype)
|
||
}
|
||
if rp.Domain != "" {
|
||
fmt.Fprintf(&sb, "domain: %s\n", rp.Domain)
|
||
}
|
||
fmt.Fprintf(&sb, "last_updated: %s\n", date)
|
||
default:
|
||
if rp.Domain != "" {
|
||
fmt.Fprintf(&sb, "domain: %s\n", rp.Domain)
|
||
}
|
||
fmt.Fprintf(&sb, "last_updated: %s\n", date)
|
||
}
|
||
|
||
fmt.Fprintf(&sb, "aliases:\n - %s\n", rp.Title)
|
||
sb.WriteString("---\n")
|
||
return sb.String()
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 4: Run tests to verify they pass**
|
||
|
||
```bash
|
||
cd ingestion && go test ./internal/pipeline/ -run TestBuildPages -v
|
||
```
|
||
|
||
Expected: all 8 tests PASS.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git add ingestion/internal/pipeline/build.go ingestion/internal/pipeline/build_test.go
|
||
git commit -m "feat(pipeline): add BuildPages — compute slugs/paths/frontmatter from RawPage"
|
||
```
|
||
|
||
---
|
||
|
||
## Task 3: `CanonicalizeLinks`
|
||
|
||
**Files:**
|
||
- Create: `ingestion/internal/pipeline/links.go`
|
||
- Create: `ingestion/internal/pipeline/links_test.go`
|
||
|
||
- [ ] **Step 1: Write failing tests**
|
||
|
||
Create `ingestion/internal/pipeline/links_test.go`:
|
||
|
||
```go
|
||
// ingestion/internal/pipeline/links_test.go
|
||
package pipeline
|
||
|
||
import (
|
||
"testing"
|
||
|
||
"github.com/stretchr/testify/assert"
|
||
"github.com/stretchr/testify/require"
|
||
|
||
"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
|
||
)
|
||
|
||
func TestCanonicalizeLinks_KnownTitle(t *testing.T) {
|
||
pages := []wiki.Page{
|
||
{
|
||
Path: "wiki/sources/shape-up.md",
|
||
Content: "---\ntitle: Shape Up\n---\n\n## Summary\n\nSee [[Betting]].\n",
|
||
},
|
||
}
|
||
inventory := map[wiki.PageType][]wiki.Entry{
|
||
wiki.PageTypeConcept: {
|
||
{Slug: "betting", Title: "Betting"},
|
||
},
|
||
}
|
||
got, warnings := CanonicalizeLinks(pages, inventory)
|
||
require.Len(t, got, 1)
|
||
assert.Empty(t, warnings)
|
||
assert.Contains(t, got[0].Content, "[[betting|Betting]]")
|
||
assert.NotContains(t, got[0].Content, "[[Betting]]")
|
||
}
|
||
|
||
func TestCanonicalizeLinks_UnknownTitleLeftAsIs(t *testing.T) {
|
||
pages := []wiki.Page{
|
||
{
|
||
Path: "wiki/sources/shape-up.md",
|
||
Content: "---\ntitle: Shape Up\n---\n\n## Summary\n\nSee [[Ghost Concept]].\n",
|
||
},
|
||
}
|
||
inventory := map[wiki.PageType][]wiki.Entry{}
|
||
got, warnings := CanonicalizeLinks(pages, inventory)
|
||
require.Len(t, got, 1)
|
||
assert.NotEmpty(t, warnings)
|
||
assert.Contains(t, got[0].Content, "[[Ghost Concept]]")
|
||
}
|
||
|
||
func TestCanonicalizeLinks_AlreadyCanonicalLinkUntouched(t *testing.T) {
|
||
// Links already in [[slug|Display]] format must not be double-converted
|
||
pages := []wiki.Page{
|
||
{
|
||
Path: "wiki/sources/shape-up.md",
|
||
Content: "---\ntitle: Shape Up\n---\n\n## Summary\n\nSee [[betting|Betting]].\n",
|
||
},
|
||
}
|
||
inventory := map[wiki.PageType][]wiki.Entry{
|
||
wiki.PageTypeConcept: {
|
||
{Slug: "betting", Title: "Betting"},
|
||
},
|
||
}
|
||
got, warnings := CanonicalizeLinks(pages, inventory)
|
||
require.Len(t, got, 1)
|
||
assert.Empty(t, warnings)
|
||
// Should remain exactly as-is — not double-wrapped
|
||
assert.Contains(t, got[0].Content, "[[betting|Betting]]")
|
||
assert.NotContains(t, got[0].Content, "[[betting|[[betting|Betting]]]]")
|
||
}
|
||
|
||
func TestCanonicalizeLinks_CaseInsensitiveMatch(t *testing.T) {
|
||
pages := []wiki.Page{
|
||
{
|
||
Path: "wiki/sources/foo.md",
|
||
Content: "---\ntitle: Foo\n---\n\n## Summary\n\nSee [[domain driven design]].\n",
|
||
},
|
||
}
|
||
inventory := map[wiki.PageType][]wiki.Entry{
|
||
wiki.PageTypeConcept: {
|
||
{Slug: "domain-driven-design", Title: "Domain Driven Design"},
|
||
},
|
||
}
|
||
got, warnings := CanonicalizeLinks(pages, inventory)
|
||
require.Len(t, got, 1)
|
||
assert.Empty(t, warnings)
|
||
assert.Contains(t, got[0].Content, "[[domain-driven-design|domain driven design]]")
|
||
}
|
||
|
||
func TestCanonicalizeLinks_CurrentBatchPagesResolved(t *testing.T) {
|
||
// A concept created in the same batch should be canonicalizable
|
||
pages := []wiki.Page{
|
||
{
|
||
Path: "wiki/sources/shape-up.md",
|
||
Content: "---\ntitle: Shape Up\n---\n\n## Summary\n\nSee [[Betting]].\n",
|
||
},
|
||
{
|
||
Path: "wiki/concepts/betting.md",
|
||
Content: "---\ntitle: Betting\n---\n\n## Definition\n\nA technique.\n",
|
||
},
|
||
}
|
||
inventory := map[wiki.PageType][]wiki.Entry{} // empty — Betting is in the batch, not inventory
|
||
|
||
got, warnings := CanonicalizeLinks(pages, inventory)
|
||
require.Len(t, got, 2)
|
||
assert.Empty(t, warnings)
|
||
assert.Contains(t, got[0].Content, "[[betting|Betting]]")
|
||
}
|
||
|
||
func TestCanonicalizeLinks_MultipleLinksInOnePage(t *testing.T) {
|
||
pages := []wiki.Page{
|
||
{
|
||
Path: "wiki/sources/foo.md",
|
||
Content: "---\ntitle: Foo\n---\n\n## Summary\n\nSee [[Betting]] and [[Shape Up]].\n",
|
||
},
|
||
}
|
||
inventory := map[wiki.PageType][]wiki.Entry{
|
||
wiki.PageTypeConcept: {
|
||
{Slug: "betting", Title: "Betting"},
|
||
},
|
||
wiki.PageTypeSource: {
|
||
{Slug: "shape-up", Title: "Shape Up"},
|
||
},
|
||
}
|
||
got, warnings := CanonicalizeLinks(pages, inventory)
|
||
require.Len(t, got, 1)
|
||
assert.Empty(t, warnings)
|
||
assert.Contains(t, got[0].Content, "[[betting|Betting]]")
|
||
assert.Contains(t, got[0].Content, "[[shape-up|Shape Up]]")
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 2: Run tests to verify they fail**
|
||
|
||
```bash
|
||
cd ingestion && go test ./internal/pipeline/ -run TestCanonicalizeLinks -v
|
||
```
|
||
|
||
Expected: FAIL — `CanonicalizeLinks` undefined.
|
||
|
||
- [ ] **Step 3: Create `links.go`**
|
||
|
||
Create `ingestion/internal/pipeline/links.go`:
|
||
|
||
```go
|
||
// ingestion/internal/pipeline/links.go
|
||
package pipeline
|
||
|
||
import (
|
||
"fmt"
|
||
"path/filepath"
|
||
"regexp"
|
||
"strings"
|
||
|
||
"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
|
||
)
|
||
|
||
// plainLinkRE matches [[Display Name]] — wikilinks without a slug pipe.
|
||
// It does NOT match [[slug|Display]] (those already have a pipe).
|
||
var plainLinkRE = regexp.MustCompile(`\[\[([^\]|]+)\]\]`)
|
||
|
||
// CanonicalizeLinks converts [[Display Name]] wikilinks to [[slug|Display Name]]
|
||
// using a title→slug map built from the inventory and current batch.
|
||
// Unknown titles are left as-is and returned as warnings.
|
||
func CanonicalizeLinks(pages []wiki.Page, inventory map[wiki.PageType][]wiki.Entry) ([]wiki.Page, []string) {
|
||
titleToSlug := buildTitleMap(pages, inventory)
|
||
|
||
var allWarnings []string
|
||
out := make([]wiki.Page, len(pages))
|
||
for i, p := range pages {
|
||
newContent, warnings := canonicalizeContent(p.Content, titleToSlug)
|
||
p.Content = newContent
|
||
out[i] = p
|
||
allWarnings = append(allWarnings, warnings...)
|
||
}
|
||
return out, allWarnings
|
||
}
|
||
|
||
// buildTitleMap builds a lowercase-title → slug map from inventory and current batch.
|
||
// Current batch entries take precedence over inventory (they may be updates).
|
||
func buildTitleMap(pages []wiki.Page, inventory map[wiki.PageType][]wiki.Entry) map[string]string {
|
||
m := make(map[string]string)
|
||
for _, entries := range inventory {
|
||
for _, e := range entries {
|
||
m[strings.ToLower(e.Title)] = e.Slug
|
||
}
|
||
}
|
||
// Current batch overrides inventory
|
||
for _, p := range pages {
|
||
title := extractTitle(p.Content)
|
||
slug := strings.TrimSuffix(filepath.Base(p.Path), ".md")
|
||
if title != "" && slug != "" {
|
||
m[strings.ToLower(title)] = slug
|
||
}
|
||
}
|
||
return m
|
||
}
|
||
|
||
func canonicalizeContent(content string, titleToSlug map[string]string) (string, []string) {
|
||
var warnings []string
|
||
result := plainLinkRE.ReplaceAllStringFunc(content, func(match string) string {
|
||
sub := plainLinkRE.FindStringSubmatch(match)
|
||
if len(sub) < 2 {
|
||
return match
|
||
}
|
||
displayName := sub[1]
|
||
slug, ok := titleToSlug[strings.ToLower(displayName)]
|
||
if !ok {
|
||
warnings = append(warnings, fmt.Sprintf("unknown wikilink: [[%s]]", displayName))
|
||
return match
|
||
}
|
||
return "[[" + slug + "|" + displayName + "]]"
|
||
})
|
||
return result, warnings
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 4: Run tests to verify they pass**
|
||
|
||
```bash
|
||
cd ingestion && go test ./internal/pipeline/ -run TestCanonicalizeLinks -v
|
||
```
|
||
|
||
Expected: all 6 tests PASS.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git add ingestion/internal/pipeline/links.go ingestion/internal/pipeline/links_test.go
|
||
git commit -m "feat(pipeline): add CanonicalizeLinks — convert [[Display Name]] to [[slug|Display Name]]"
|
||
```
|
||
|
||
---
|
||
|
||
## Task 4: Wire `pipeline.go` + update `pipeline_test.go`
|
||
|
||
**Files:**
|
||
- Modify: `ingestion/internal/pipeline/pipeline.go`
|
||
- Modify: `ingestion/internal/pipeline/pipeline_test.go`
|
||
|
||
- [ ] **Step 1: Update `pipeline_test.go` to use new LLM response format**
|
||
|
||
Replace the entire content of `ingestion/internal/pipeline/pipeline_test.go` with:
|
||
|
||
```go
|
||
// ingestion/internal/pipeline/pipeline_test.go
|
||
package pipeline
|
||
|
||
import (
|
||
"context"
|
||
"encoding/json"
|
||
"net/http"
|
||
"net/http/httptest"
|
||
"os"
|
||
"path/filepath"
|
||
"testing"
|
||
"time"
|
||
|
||
"github.com/stretchr/testify/assert"
|
||
"github.com/stretchr/testify/require"
|
||
|
||
"github.com/mathiasbq/hyperguild/ingestion/internal/llm"
|
||
)
|
||
|
||
func TestRun_WritesPages(t *testing.T) {
|
||
brainDir := t.TempDir()
|
||
for _, sub := range []string{"wiki/concepts", "wiki/entities", "wiki/sources"} {
|
||
require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755))
|
||
}
|
||
|
||
llmResponse := mustJSON([]RawPage{
|
||
{
|
||
Title: "Test Article",
|
||
Type: "source",
|
||
Subtype: "article",
|
||
Domain: "software-engineering",
|
||
Content: "## Summary\n\nA test article.\n\n## Key Claims\n\n- It tests things.\n\n## Concepts Introduced or Reinforced\n\n[[Testing]]\n\n## Entities Mentioned\n\n## Open Questions Raised\n",
|
||
},
|
||
{
|
||
Title: "Testing",
|
||
Type: "concept",
|
||
Domain: "software-engineering",
|
||
Content: "## Definition\n\nThe practice of verifying software.\n\n## Why It Matters\n\nCatches bugs.\n\n## Related Concepts\n\n## Related Entities\n\n## Sources\n\n## Evolving Notes\n",
|
||
},
|
||
})
|
||
|
||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||
w.Header().Set("Content-Type", "application/json")
|
||
_ = json.NewEncoder(w).Encode(map[string]any{
|
||
"choices": []map[string]any{
|
||
{"message": map[string]any{"role": "assistant", "content": llmResponse}},
|
||
},
|
||
})
|
||
}))
|
||
defer srv.Close()
|
||
|
||
cfg := Config{
|
||
Complete: llm.New(srv.URL, "", "test-model", 30*time.Second).Complete,
|
||
ChunkSize: 0,
|
||
}
|
||
|
||
result, err := Run(context.Background(), cfg, brainDir, "An article about testing.", "test-article", false)
|
||
require.NoError(t, err)
|
||
assert.Len(t, result.Pages, 2)
|
||
|
||
_, err = os.Stat(filepath.Join(brainDir, "wiki", "sources", "test-article.md"))
|
||
require.NoError(t, err)
|
||
_, err = os.Stat(filepath.Join(brainDir, "wiki", "concepts", "testing.md"))
|
||
require.NoError(t, err)
|
||
_, err = os.Stat(filepath.Join(brainDir, "wiki", "index.md"))
|
||
require.NoError(t, err)
|
||
_, err = os.Stat(filepath.Join(brainDir, "log.md"))
|
||
require.NoError(t, err)
|
||
}
|
||
|
||
func TestRun_DryRunDoesNotWrite(t *testing.T) {
|
||
brainDir := t.TempDir()
|
||
for _, sub := range []string{"wiki/concepts", "wiki/entities", "wiki/sources"} {
|
||
require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755))
|
||
}
|
||
|
||
llmResponse := mustJSON([]RawPage{{
|
||
Title: "Foo",
|
||
Type: "source",
|
||
Subtype: "article",
|
||
Content: "## Summary\n\nFoo.\n",
|
||
}})
|
||
|
||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||
_ = json.NewEncoder(w).Encode(map[string]any{
|
||
"choices": []map[string]any{{"message": map[string]any{"content": llmResponse}}},
|
||
})
|
||
}))
|
||
defer srv.Close()
|
||
|
||
cfg := Config{Complete: llm.New(srv.URL, "", "m", 30*time.Second).Complete}
|
||
result, err := Run(context.Background(), cfg, brainDir, "foo content", "foo", true)
|
||
require.NoError(t, err)
|
||
assert.Len(t, result.Pages, 1)
|
||
|
||
_, err = os.Stat(filepath.Join(brainDir, "wiki", "sources", "foo.md"))
|
||
assert.True(t, os.IsNotExist(err))
|
||
}
|
||
|
||
func TestRun_MergesDuplicatePaths(t *testing.T) {
|
||
brainDir := t.TempDir()
|
||
for _, sub := range []string{"wiki/concepts", "wiki/entities", "wiki/sources"} {
|
||
require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755))
|
||
}
|
||
|
||
// LLM returns same title twice (simulates multi-chunk duplicate)
|
||
llmResponse := mustJSON([]RawPage{
|
||
{Title: "Foo", Type: "concept", Content: "## Definition\n\nFirst.\n\n## Related Concepts\n\n[[Bar]]\n"},
|
||
{Title: "Foo", Type: "concept", Content: "## Definition\n\nSecond.\n\n## Related Concepts\n\n[[Baz]]\n"},
|
||
})
|
||
|
||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||
_ = json.NewEncoder(w).Encode(map[string]any{
|
||
"choices": []map[string]any{{"message": map[string]any{"content": llmResponse}}},
|
||
})
|
||
}))
|
||
defer srv.Close()
|
||
|
||
cfg := Config{Complete: llm.New(srv.URL, "", "m", 30*time.Second).Complete}
|
||
result, err := Run(context.Background(), cfg, brainDir, "content", "foo", false)
|
||
require.NoError(t, err)
|
||
assert.Len(t, result.Pages, 1) // deduplicated
|
||
|
||
content, err := os.ReadFile(filepath.Join(brainDir, "wiki", "concepts", "foo.md"))
|
||
require.NoError(t, err)
|
||
// keep-first for Definition, union for Related Concepts
|
||
assert.Contains(t, string(content), "First.")
|
||
// Bar and Baz unknown → left as plain [[links]]
|
||
assert.Contains(t, string(content), "[[Bar]]")
|
||
assert.Contains(t, string(content), "[[Baz]]")
|
||
}
|
||
|
||
func mustJSON(v any) string {
|
||
b, err := json.Marshal(v)
|
||
if err != nil {
|
||
panic(err)
|
||
}
|
||
return string(b)
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 2: Run tests to verify they fail (expected — pipeline.go still uses old ParsePages)**
|
||
|
||
```bash
|
||
cd ingestion && go test ./internal/pipeline/ -run TestRun -v
|
||
```
|
||
|
||
Expected: compile error or test failures — that's fine.
|
||
|
||
- [ ] **Step 3: Update `pipeline.go` to wire new steps**
|
||
|
||
Replace the entire content of `ingestion/internal/pipeline/pipeline.go` with:
|
||
|
||
```go
|
||
// ingestion/internal/pipeline/pipeline.go
|
||
package pipeline
|
||
|
||
import (
|
||
"context"
|
||
"fmt"
|
||
"os"
|
||
"path/filepath"
|
||
"strings"
|
||
"time"
|
||
|
||
"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
|
||
)
|
||
|
||
// CompleteFunc is the function signature for LLM calls.
|
||
type CompleteFunc func(ctx context.Context, system, user string) (string, error)
|
||
|
||
// Config holds pipeline configuration.
|
||
type Config struct {
|
||
Complete CompleteFunc
|
||
ChunkSize int // 0 = no chunking
|
||
Schema string // overrides brain/schema.md when set (useful in tests)
|
||
}
|
||
|
||
// Result is the outcome of a pipeline run.
|
||
type Result struct {
|
||
Pages []string // relative paths written (or would-be written in dry-run)
|
||
Warnings []string
|
||
}
|
||
|
||
// Run ingests content and writes structured wiki pages to brainDir/wiki/.
|
||
// In dry-run mode, pages are returned but not written to disk.
|
||
func Run(ctx context.Context, cfg Config, brainDir, content, source string, dryRun bool) (Result, error) {
|
||
inventory, err := wiki.LoadInventory(brainDir)
|
||
if err != nil {
|
||
return Result{}, fmt.Errorf("load inventory: %w", err)
|
||
}
|
||
|
||
schema := cfg.Schema
|
||
if schema == "" {
|
||
schema = loadSchema(brainDir)
|
||
}
|
||
|
||
sourceSlug := wiki.Slug(source)
|
||
date := time.Now().UTC().Format("2006-01-02")
|
||
chunks := Chunk(content, cfg.ChunkSize)
|
||
|
||
var allRaw []RawPage
|
||
var allWarnings []string
|
||
|
||
for _, chunk := range chunks {
|
||
userPrompt := BuildPrompt(schema, source, chunk, inventory)
|
||
output, err := cfg.Complete(ctx, systemPrompt, userPrompt)
|
||
if err != nil {
|
||
return Result{}, fmt.Errorf("LLM call: %w", err)
|
||
}
|
||
raw, warnings := ParseRawPages(output)
|
||
allRaw = append(allRaw, raw...)
|
||
allWarnings = append(allWarnings, warnings...)
|
||
}
|
||
|
||
pages := BuildPages(allRaw, sourceSlug, date)
|
||
resolved := Resolve(pages, inventory)
|
||
canonicalized, linkWarnings := CanonicalizeLinks(resolved, inventory)
|
||
allWarnings = append(allWarnings, linkWarnings...)
|
||
withRefs := injectSourceRefs(canonicalized, inventory, brainDir)
|
||
merged := mergeAll(withRefs)
|
||
|
||
var written []string
|
||
for _, page := range merged {
|
||
if !dryRun {
|
||
dest := filepath.Join(brainDir, filepath.FromSlash(page.Path))
|
||
if err := os.MkdirAll(filepath.Dir(dest), 0o755); err != nil {
|
||
return Result{}, fmt.Errorf("mkdir for %s: %w", page.Path, err)
|
||
}
|
||
if err := os.WriteFile(dest, []byte(page.Content), 0o644); err != nil {
|
||
return Result{}, fmt.Errorf("write %s: %w", page.Path, err)
|
||
}
|
||
}
|
||
written = append(written, page.Path)
|
||
}
|
||
|
||
if !dryRun {
|
||
if err := wiki.RebuildIndex(brainDir, date); err != nil {
|
||
allWarnings = append(allWarnings, fmt.Sprintf("rebuild index: %v", err))
|
||
}
|
||
if err := wiki.AppendLog(brainDir, source, written, allWarnings, date); err != nil {
|
||
allWarnings = append(allWarnings, fmt.Sprintf("append log: %v", err))
|
||
}
|
||
}
|
||
|
||
return Result{Pages: written, Warnings: allWarnings}, nil
|
||
}
|
||
|
||
// mergeAll deduplicates pages by path, merging content from later occurrences.
|
||
func mergeAll(pages []wiki.Page) []wiki.Page {
|
||
order := make([]string, 0, len(pages))
|
||
byPath := make(map[string]wiki.Page, len(pages))
|
||
for _, p := range pages {
|
||
if _, seen := byPath[p.Path]; !seen {
|
||
order = append(order, p.Path)
|
||
byPath[p.Path] = p
|
||
} else {
|
||
byPath[p.Path] = wiki.Merge(byPath[p.Path], p)
|
||
}
|
||
}
|
||
result := make([]wiki.Page, 0, len(order))
|
||
for _, path := range order {
|
||
result = append(result, byPath[path])
|
||
}
|
||
return result
|
||
}
|
||
|
||
const defaultSchema = `# Brain Wiki Schema
|
||
Three page types: wiki/sources/, wiki/concepts/, wiki/entities/.
|
||
See brain/schema.md for the full schema.
|
||
`
|
||
|
||
func loadSchema(brainDir string) string {
|
||
b, err := os.ReadFile(filepath.Join(brainDir, "schema.md"))
|
||
if err != nil {
|
||
return defaultSchema
|
||
}
|
||
return strings.TrimSpace(string(b))
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 4: Run all pipeline tests**
|
||
|
||
```bash
|
||
cd ingestion && go test ./internal/pipeline/ -v
|
||
```
|
||
|
||
Expected: all tests PASS. If `TestRun_WritesPages` has a warning about unknown wikilink `[[Testing]]` in warnings (since the Testing concept is in the same batch and should be resolved via `CanonicalizeLinks`), check that `CanonicalizeLinks` builds the batch map correctly before checking `result.Warnings`.
|
||
|
||
Note: `TestRun_WritesPages` now calls `assert.Len(t, result.Pages, 2)` without `assert.Empty(t, result.Warnings)` — unknown-link warnings from `CanonicalizeLinks` are acceptable since "Testing" is in the batch and should canonicalize cleanly. If the test fails with unexpected warnings, investigate `buildTitleMap` — it should find "Testing" from the current batch.
|
||
|
||
- [ ] **Step 5: Run full test suite**
|
||
|
||
```bash
|
||
cd ingestion && go test ./... 2>&1
|
||
```
|
||
|
||
Expected: all packages pass.
|
||
|
||
- [ ] **Step 6: Commit**
|
||
|
||
```bash
|
||
git add ingestion/internal/pipeline/pipeline.go ingestion/internal/pipeline/pipeline_test.go
|
||
git commit -m "feat(pipeline): wire ParseRawPages+BuildPages+CanonicalizeLinks into Run"
|
||
```
|
||
|
||
---
|
||
|
||
## Task 5: Update `prompt.go`
|
||
|
||
**Files:**
|
||
- Modify: `ingestion/internal/pipeline/prompt.go`
|
||
|
||
- [ ] **Step 1: Replace `prompt.go`**
|
||
|
||
Replace the entire content of `ingestion/internal/pipeline/prompt.go` with:
|
||
|
||
```go
|
||
// ingestion/internal/pipeline/prompt.go
|
||
package pipeline
|
||
|
||
import (
|
||
"fmt"
|
||
"strings"
|
||
"time"
|
||
|
||
"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
|
||
)
|
||
|
||
const systemPrompt = `You are a wiki agent. Read the source material and produce structured wiki pages following the schema provided.
|
||
|
||
Output ONLY a valid JSON array — no markdown fences, no other text before or after.
|
||
Each element must have exactly these fields:
|
||
"title" — exact page title (e.g. "FinBERT", "Ryan Singer", "Shape Up")
|
||
"type" — exactly one of: "source", "concept", "entity"
|
||
"subtype" — for source: article|pdf|book|video|note|project; for entity: person|company|tool|model|framework|technology; omit for concept
|
||
"domain" — one of the domains in the schema (omit if none fits)
|
||
"content" — Markdown body only — NO frontmatter, NO path, NO slug
|
||
|
||
Wikilinks in content: [[Display Name]] — just the display name, no slug, no pipe separator.
|
||
Only link to pages listed in the inventory or pages you are creating in this response.`
|
||
|
||
// BuildPrompt constructs the user prompt for a single chunk.
|
||
func BuildPrompt(schema, source, content string, inventory map[wiki.PageType][]wiki.Entry) string {
|
||
var sb strings.Builder
|
||
|
||
fmt.Fprintf(&sb, "Today's date is %s.\n\n", time.Now().UTC().Format("2006-01-02"))
|
||
|
||
sb.WriteString("## Schema\n\n")
|
||
sb.WriteString(schema)
|
||
sb.WriteString("\n\n")
|
||
|
||
sb.WriteString("## Existing wiki pages\n\n")
|
||
sb.WriteString("Reference these pages by display name only — [[Display Name]] — in your content.\n\n")
|
||
|
||
for _, pt := range []wiki.PageType{wiki.PageTypeConcept, wiki.PageTypeEntity, wiki.PageTypeSource} {
|
||
entries := inventory[pt]
|
||
label := strings.ToUpper(string(pt)[:1]) + string(pt)[1:]
|
||
if len(entries) == 0 {
|
||
fmt.Fprintf(&sb, "%s — (none yet)\n\n", label)
|
||
continue
|
||
}
|
||
fmt.Fprintf(&sb, "%s:\n", label)
|
||
for _, e := range entries {
|
||
fmt.Fprintf(&sb, " - %s\n", e.Title)
|
||
}
|
||
sb.WriteString("\n")
|
||
}
|
||
|
||
sb.WriteString("## Non-negotiable rules\n\n")
|
||
sb.WriteString("1. Output ONLY a valid JSON array — no prose, no fences.\n")
|
||
sb.WriteString("2. Fields: title, type, subtype (if applicable), domain (if applicable), content.\n")
|
||
sb.WriteString("3. Wikilinks: [[Display Name]] — no slug, no pipe. The pipeline handles slugs.\n")
|
||
sb.WriteString("4. Section links must match their section type (Related Concepts → concepts only, etc.).\n")
|
||
sb.WriteString("5. One source page per book — if inventory shows it exists, return it as an UPDATE.\n\n")
|
||
|
||
fmt.Fprintf(&sb, "## Source: %s\n\n", source)
|
||
sb.WriteString(content)
|
||
|
||
return sb.String()
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 2: Verify compile and tests still pass**
|
||
|
||
```bash
|
||
cd ingestion && go test ./... 2>&1
|
||
```
|
||
|
||
Expected: all tests PASS (prompt.go has no direct unit tests — covered via integration tests in pipeline_test.go).
|
||
|
||
- [ ] **Step 3: Commit**
|
||
|
||
```bash
|
||
git add ingestion/internal/pipeline/prompt.go
|
||
git commit -m "feat(pipeline): update system prompt for new LLM JSON contract (no slugs)"
|
||
```
|
||
|
||
---
|
||
|
||
## Task 6: Update `brain/schema.md`
|
||
|
||
**Files:**
|
||
- Modify: `brain/schema.md`
|
||
|
||
- [ ] **Step 1: Update the schema doc**
|
||
|
||
Replace the entire content of `brain/schema.md` with:
|
||
|
||
```markdown
|
||
# Brain Wiki Schema
|
||
|
||
This document defines the three page types in the brain wiki.
|
||
The LLM must follow this schema exactly when generating wiki pages.
|
||
|
||
## Output Format
|
||
|
||
Return a JSON array. Each element:
|
||
|
||
```json
|
||
{
|
||
"title": "exact page title",
|
||
"type": "source | concept | entity",
|
||
"subtype": "see below — omit for concept",
|
||
"domain": "see domains — omit if none fits",
|
||
"content": "Markdown body only — no frontmatter, no path"
|
||
}
|
||
```
|
||
|
||
- `subtype` for **source**: `article | pdf | book | video | note | project`
|
||
- `subtype` for **entity**: `person | company | tool | model | framework | technology`
|
||
- The pipeline computes slugs and frontmatter — never include them in output.
|
||
|
||
## Wikilink Format
|
||
|
||
All cross-references use `[[Display Name]]` — just the display name, no slug, no pipe.
|
||
|
||
Rules:
|
||
- Only link to pages in the inventory or pages you are creating in this response
|
||
- The pipeline converts `[[Display Name]]` to `[[slug|Display Name]]` automatically
|
||
- Section links must match their section type (Related Concepts → concept pages only, etc.)
|
||
|
||
Examples: `[[Domain Driven Design]]`, `[[Ryan Singer]]`, `[[Shape Up]]`
|
||
|
||
## Domains
|
||
|
||
Use one of: `ai-llm`, `software-engineering`, `product-strategy`, `finance-markets`,
|
||
`personal`, `consulting`, `climate`, `infrastructure`, `security`
|
||
|
||
---
|
||
|
||
## Source Pages — wiki/sources/<slug>.md
|
||
|
||
One page per ingested source. Books are NEVER split across multiple source pages — update the existing one.
|
||
|
||
Body sections (in this order):
|
||
|
||
### Summary
|
||
2–3 sentences. Core argument or finding.
|
||
|
||
### Key Claims
|
||
Bulleted list. Paraphrase — no verbatim quotes or code.
|
||
|
||
### Concepts Introduced or Reinforced
|
||
Wikilinks to concept pages ONLY. One per line.
|
||
|
||
### Entities Mentioned
|
||
Wikilinks to entity pages ONLY. One per line.
|
||
|
||
### Open Questions Raised
|
||
Gaps or follow-up questions from this source.
|
||
|
||
For books only, also add:
|
||
|
||
### Chapters
|
||
One bullet per chapter with 1–2 sentence summary.
|
||
|
||
### Argument Arc
|
||
Overall narrative as it becomes clear across chapters.
|
||
|
||
### Updates
|
||
Dated entries appended on re-ingestion. NEVER rewrite — only append.
|
||
|
||
---
|
||
|
||
## Concept Pages — wiki/concepts/<slug>.md
|
||
|
||
One page per idea, framework, methodology, or pattern.
|
||
|
||
Body sections (in this order):
|
||
|
||
### Definition
|
||
One-paragraph plain-language explanation.
|
||
|
||
### Why It Matters
|
||
Practical significance. Why should anyone care?
|
||
|
||
### Related Concepts
|
||
Wikilinks to concept pages ONLY.
|
||
|
||
### Related Entities
|
||
Wikilinks to entity pages ONLY.
|
||
|
||
### Sources
|
||
Wikilinks to source pages ONLY.
|
||
|
||
### Evolving Notes
|
||
Updated as new sources arrive. Append, do not rewrite.
|
||
|
||
---
|
||
|
||
## Entity Pages — wiki/entities/<slug>.md
|
||
|
||
One page per person, tool, organisation, technology, or product.
|
||
|
||
Body sections (in this order):
|
||
|
||
### Description
|
||
One-line description.
|
||
|
||
### Relevance
|
||
Why this entity matters to this knowledge base.
|
||
|
||
### Key Positions, Products, or Claims
|
||
With dates where known.
|
||
|
||
### Related Concepts
|
||
Wikilinks to concept pages ONLY.
|
||
|
||
### Related Entities
|
||
Wikilinks to entity pages ONLY.
|
||
|
||
### Sources
|
||
Wikilinks to source pages ONLY.
|
||
|
||
---
|
||
|
||
## Non-Negotiable Rules
|
||
|
||
1. Output ONLY a valid JSON array — no markdown fences, no prose before or after
|
||
2. Each element: `{"title": "...", "type": "...", "subtype": "...", "domain": "...", "content": "..."}`
|
||
3. Never include slugs, paths, or frontmatter in output — the pipeline handles these
|
||
4. Wikilinks: `[[Display Name]]` only — no pipe, no slug
|
||
5. Dates always YYYY-MM-DD (used only in content body where contextually relevant)
|
||
6. Never reproduce verbatim code — describe the pattern or technique
|
||
7. Section links must match their section type
|
||
8. One source page per book — if inventory shows it exists, include it as an UPDATE
|
||
```
|
||
|
||
- [ ] **Step 2: Verify Go tests still pass (schema.md is loaded at runtime, not compile time)**
|
||
|
||
```bash
|
||
cd ingestion && go test ./... 2>&1
|
||
```
|
||
|
||
Expected: all tests PASS.
|
||
|
||
- [ ] **Step 3: Commit**
|
||
|
||
```bash
|
||
git add brain/schema.md
|
||
git commit -m "docs(schema): update LLM output format and wikilink convention for Level 3"
|
||
```
|
||
|
||
---
|
||
|
||
## Self-Review
|
||
|
||
**Spec coverage check:**
|
||
|
||
| Spec requirement | Task |
|
||
|-----------------|------|
|
||
| LLM returns `{title, type, subtype, domain, content}` | Task 1 (RawPage), Task 5 (prompt) |
|
||
| Pipeline computes all slugs via `wiki.Slug(title)` | Task 2 (BuildPages) |
|
||
| Source page uses `sourceSlug` from filename | Task 2 (buildPage: `case "source": slug = sourceSlug`) |
|
||
| Frontmatter assembled by pipeline | Task 2 (buildFrontmatter) |
|
||
| `[[Display Name]]` → `[[slug|Display Name]]` canonicalization | Task 3 (CanonicalizeLinks) |
|
||
| CanonicalizeLinks runs after BuildPages, before injectSourceRefs | Task 4 (pipeline.go step order) |
|
||
| Unknown titles left as-is + warnings | Task 3 (links_test: UnknownTitleLeftAsIs) |
|
||
| Current-batch pages resolvable | Task 3 (buildTitleMap includes batch) |
|
||
| `injectSourceRefs` unchanged | — (no task needed) |
|
||
| `Resolve` unchanged | — (no task needed) |
|
||
| `brain/schema.md` updated | Task 6 |
|
||
| Integration test for no slug duplication across chunks | Task 4 (TestRun_MergesDuplicatePaths uses same title twice) |
|
||
|
||
All spec requirements covered.
|
||
|
||
**Placeholder scan:** None found.
|
||
|
||
**Type consistency:**
|
||
- `RawPage` defined in Task 1, used in Tasks 2, 4 ✓
|
||
- `BuildPages([]RawPage, string, string) []wiki.Page` defined in Task 2, called in Task 4 ✓
|
||
- `CanonicalizeLinks([]wiki.Page, map[wiki.PageType][]wiki.Entry) ([]wiki.Page, []string)` defined in Task 3, called in Task 4 ✓
|
||
- `ParseRawPages(string) ([]RawPage, []string)` defined in Task 1, called in Task 4 ✓
|
||
- `extractTitle` used in Task 3 (`links.go`) — defined in `resolve.go` (same package) ✓
|