Compare commits
16 Commits
1bfe501d09...v0.4.0
| Author | SHA1 | Date |
|---|---|---|
| | 923a665365 | |
| | 537aebc302 | |
| | de35d4dbb0 | |
| | 26855f69b0 | |
| | a7b363d589 | |
| | 7b57051af8 | |
| | a620f6cb01 | |
| | 26b5636b43 | |
| | 989f375aec | |
| | 6403d5e444 | |
| | ab19968ae2 | |
| | 1605624668 | |
| | 55fa0b503a | |
| | 3c2bd9268c | |
| | 29727ec2a5 | |
| | 0a075088b2 | |
@@ -3,21 +3,34 @@
 This document defines the three page types in the brain wiki.
 The LLM must follow this schema exactly when generating wiki pages.
 
+## Output Format
+
+Return a JSON array. Each element:
+
+```json
+{
+  "title": "exact page title",
+  "type": "source | concept | entity",
+  "subtype": "see below — omit for concept",
+  "domain": "see domains — omit if none fits",
+  "content": "Markdown body only — no frontmatter, no path"
+}
+```
+
+- `subtype` for **source**: `article | pdf | book | video | note | project`
+- `subtype` for **entity**: `person | company | tool | model | framework | technology`
+- The pipeline computes slugs and frontmatter — never include them in output.
+
 ## Wikilink Format
 
-All cross-references use `[[slug|Display Text]]`.
+All cross-references use `[[Display Name]]` — just the display name, no slug, no pipe.
 
 Rules:
-- slug = lowercase filename without .md, spaces → hyphens, strip all non-alphanumeric except hyphens
-- The `|` separator is REQUIRED — never use `[[Title]]` without a slug
-- Examples: `[[domain-driven-design|Domain Driven Design]]`, `[[ryan-singer|Ryan Singer]]`
-- Slugs must resolve to an existing file in the inventory, or a file you are creating in this response
+- Only link to pages in the inventory or pages you are creating in this response
+- The pipeline converts `[[Display Name]]` to `[[slug|Display Name]]` automatically
+- Section links must match their section type (Related Concepts → concept pages only, etc.)
 
-Slug generation examples:
-- "Domain Driven Design" → `domain-driven-design`
-- "It's Complicated" → `its-complicated`
-- "gRPC" → `grpc`
-- "GPT-4o" → `gpt-4o`
+Examples: `[[Domain Driven Design]]`, `[[Ryan Singer]]`, `[[Shape Up]]`
 
 ## Domains
 
@@ -30,17 +43,6 @@ Use one of: `ai-llm`, `software-engineering`, `product-strategy`, `finance-marke
 
 One page per ingested source. Books are NEVER split across multiple source pages — update the existing one.
 
-Required frontmatter:
-```yaml
-title: <exact title>
-type: article | pdf | book | video | note | project
-domain: <domain>
-date_ingested: YYYY-MM-DD
-last_updated: YYYY-MM-DD
-aliases:
-  - <exact title>
-```
-
 Body sections (in this order):
 
 ### Summary
@@ -50,10 +52,10 @@ Body sections (in this order):
 Bulleted list. Paraphrase — no verbatim quotes or code.
 
 ### Concepts Introduced or Reinforced
-Wikilinks to wiki/concepts/ ONLY. One per line.
+Wikilinks to concept pages ONLY. One per line.
 
 ### Entities Mentioned
-Wikilinks to wiki/entities/ ONLY. One per line.
+Wikilinks to entity pages ONLY. One per line.
 
 ### Open Questions Raised
 Gaps or follow-up questions from this source.
@@ -75,15 +77,6 @@ Dated entries appended on re-ingestion. NEVER rewrite — only append.
 
 One page per idea, framework, methodology, or pattern.
 
-Required frontmatter:
-```yaml
-title: <concept name>
-domain: <domain>
-last_updated: YYYY-MM-DD
-aliases:
-  - <exact title>
-```
-
 Body sections (in this order):
 
 ### Definition
@@ -93,13 +86,13 @@ One-paragraph plain-language explanation.
 Practical significance. Why should anyone care?
 
 ### Related Concepts
-Wikilinks to wiki/concepts/ ONLY.
+Wikilinks to concept pages ONLY.
 
 ### Related Entities
-Wikilinks to wiki/entities/ ONLY.
+Wikilinks to entity pages ONLY.
 
 ### Sources
-Wikilinks to wiki/sources/ ONLY.
+Wikilinks to source pages ONLY.
 
 ### Evolving Notes
 Updated as new sources arrive. Append, do not rewrite.
@@ -110,16 +103,6 @@ Updated as new sources arrive. Append, do not rewrite.
 
 One page per person, tool, organisation, technology, or product.
 
-Required frontmatter:
-```yaml
-title: <name>
-type: person | company | tool | model | framework | technology
-domain: <domain>
-last_updated: YYYY-MM-DD
-aliases:
-  - <exact title>
-```
-
 Body sections (in this order):
 
 ### Description
@@ -132,23 +115,23 @@ Why this entity matters to this knowledge base.
 With dates where known.
 
 ### Related Concepts
-Wikilinks to wiki/concepts/ ONLY.
+Wikilinks to concept pages ONLY.
 
 ### Related Entities
-Wikilinks to wiki/entities/ ONLY.
+Wikilinks to entity pages ONLY.
 
 ### Sources
-Wikilinks to wiki/sources/ ONLY.
+Wikilinks to source pages ONLY.
 
 ---
 
 ## Non-Negotiable Rules
 
 1. Output ONLY a valid JSON array — no markdown fences, no prose before or after
-2. Each element: `{"path": "wiki/<type>/<slug>.md", "content": "...full markdown..."}`
-3. Slugs are kebab-case: lowercase, spaces→hyphens, strip special characters
-4. Every wikilink must be `[[slug|Display Text]]` — the pipe separator is required
-5. Dates always YYYY-MM-DD
+2. Each element: `{"title": "...", "type": "...", "subtype": "...", "domain": "...", "content": "..."}`
+3. Never include slugs, paths, or frontmatter in output — the pipeline handles these
+4. Wikilinks: `[[Display Name]]` only — no pipe, no slug
+5. Dates always YYYY-MM-DD (used only in content body where contextually relevant)
 6. Never reproduce verbatim code — describe the pattern or technique
-7. Section links must match their section type (Related Concepts → concepts/ only, etc.)
+7. Section links must match their section type
 8. One source page per book — if inventory shows it exists, include it as an UPDATE
1323
docs/superpowers/plans/2026-04-23-level3-slug-authority.md
Normal file
File diff suppressed because it is too large
433
docs/superpowers/plans/2026-04-23-source-backrefs.md
Normal file
@@ -0,0 +1,433 @@
# Source Back-References Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** After the LLM produces wiki pages for an ingestion, automatically inject a `## Sources` back-reference on every concept and entity page that the source page links to.

**Architecture:** A new `injectSourceRefs` post-processing step is inserted between `Resolve` and `mergeAll` in `pipeline.Run`. It finds the source page in the proposed batch, extracts all `[[slug|...]]` wikilinks, then calls `wiki.Merge` with a minimal patch page to add the back-reference. `wiki.Merge` already treats `## Sources` as a bullet section with deduplication — no custom section parsing is needed. For concepts/entities that exist on disk but weren't proposed in the current batch (the common case on re-ingestion), the function loads them from disk and adds them to the pages list so they are updated.

**Tech Stack:** Go stdlib (`regexp`, `os`, `path/filepath`, `strings`), existing `wiki.Merge` and `wiki.Page` types.

---

## File Structure

**New files:**
- `ingestion/internal/pipeline/refs.go` — `injectSourceRefs`, `addSourceRef`, `extractWikilinks`, `findSourcePage`, `findInInventory`
- `ingestion/internal/pipeline/refs_test.go` — table-driven tests

**Modified files:**
- `ingestion/internal/pipeline/pipeline.go` — insert `injectSourceRefs` call between `Resolve` and `mergeAll`

---

### Task 1: `refs.go` — source back-reference injection

**Files:**
- Create: `ingestion/internal/pipeline/refs_test.go`
- Create: `ingestion/internal/pipeline/refs.go`

- [ ] **Step 1: Write the failing tests**

```go
// ingestion/internal/pipeline/refs_test.go
package pipeline

import (
	"os"
	"path/filepath"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"

	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
)

// makeInventory builds a minimal inventory for test use.
func makeInventory(concepts, entities []string) map[wiki.PageType][]wiki.Entry {
	inv := map[wiki.PageType][]wiki.Entry{
		wiki.PageTypeConcept: {},
		wiki.PageTypeEntity:  {},
		wiki.PageTypeSource:  {},
	}
	for _, slug := range concepts {
		inv[wiki.PageTypeConcept] = append(inv[wiki.PageTypeConcept], wiki.Entry{Slug: slug, Title: slug})
	}
	for _, slug := range entities {
		inv[wiki.PageTypeEntity] = append(inv[wiki.PageTypeEntity], wiki.Entry{Slug: slug, Title: slug})
	}
	return inv
}

func TestInjectSourceRefs_NoSourcePage(t *testing.T) {
	pages := []wiki.Page{
		{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Definition\n\nFoo.\n"},
	}
	got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
	assert.Equal(t, pages, got)
}

func TestInjectSourceRefs_InjectsIntoProposedConcept(t *testing.T) {
	pages := []wiki.Page{
		{
			Path:    "wiki/sources/my-article.md",
			Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSee [[domain-driven-design|Domain Driven Design]].\n",
		},
		{
			Path:    "wiki/concepts/domain-driven-design.md",
			Content: "---\ntitle: Domain Driven Design\n---\n\n## Definition\n\nA methodology.\n",
		},
	}

	got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())

	require.Len(t, got, 2)
	assert.Contains(t, got[1].Content, "## Sources")
	assert.Contains(t, got[1].Content, "[[my-article|My Article]]")
}

func TestInjectSourceRefs_LoadsConceptFromDisk(t *testing.T) {
	brainDir := t.TempDir()
	conceptDir := filepath.Join(brainDir, "wiki", "concepts")
	require.NoError(t, os.MkdirAll(conceptDir, 0o755))
	require.NoError(t, os.WriteFile(
		filepath.Join(conceptDir, "shape-up.md"),
		[]byte("---\ntitle: Shape Up\n---\n\n## Definition\n\nA methodology.\n"),
		0o644,
	))

	pages := []wiki.Page{
		{
			Path:    "wiki/sources/my-article.md",
			Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSee [[shape-up|Shape Up]].\n",
		},
	}
	inv := makeInventory([]string{"shape-up"}, nil)

	got := injectSourceRefs(pages, inv, brainDir)

	// Should have loaded shape-up.md from disk and added it with source ref.
	require.Len(t, got, 2)
	var conceptPage wiki.Page
	for _, p := range got {
		if p.Path == "wiki/concepts/shape-up.md" {
			conceptPage = p
		}
	}
	assert.Contains(t, conceptPage.Content, "## Sources")
	assert.Contains(t, conceptPage.Content, "[[my-article|My Article]]")
	// Original content preserved.
	assert.Contains(t, conceptPage.Content, "## Definition")
}

func TestInjectSourceRefs_NoSelfReference(t *testing.T) {
	pages := []wiki.Page{
		{
			Path:    "wiki/sources/my-article.md",
			Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSelf-link [[my-article|My Article]].\n",
		},
	}

	got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())

	// Only one page — source should not reference itself.
	assert.Len(t, got, 1)
}

func TestInjectSourceRefs_DeduplicatesOnReingestion(t *testing.T) {
	// Concept already has source ref from a prior ingestion.
	pages := []wiki.Page{
		{
			Path:    "wiki/sources/my-article.md",
			Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSee [[ddd|DDD]].\n",
		},
		{
			Path:    "wiki/concepts/ddd.md",
			Content: "---\ntitle: DDD\n---\n\n## Definition\n\nA thing.\n\n## Sources\n\n- [[my-article|My Article]]\n",
		},
	}

	got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())

	require.Len(t, got, 2)
	// The source ref must appear exactly once.
	count := 0
	for _, line := range splitLines(got[1].Content) {
		if line == "- [[my-article|My Article]]" {
			count++
		}
	}
	assert.Equal(t, 1, count, "source ref should appear exactly once")
}

func TestInjectSourceRefs_InjectsIntoEntity(t *testing.T) {
	pages := []wiki.Page{
		{
			Path:    "wiki/sources/book.md",
			Content: "---\ntitle: Book\n---\n\n## Summary\n\nBy [[ryan-singer|Ryan Singer]].\n",
		},
		{
			Path:    "wiki/entities/ryan-singer.md",
			Content: "---\ntitle: Ryan Singer\n---\n\n## Description\n\nA designer.\n",
		},
	}

	got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())

	require.Len(t, got, 2)
	var entity wiki.Page
	for _, p := range got {
		if p.Path == "wiki/entities/ryan-singer.md" {
			entity = p
		}
	}
	assert.Contains(t, entity.Content, "[[book|Book]]")
}

func TestExtractWikilinks(t *testing.T) {
	content := "See [[foo|Foo]] and [[bar|Bar]] and [[foo|Foo again]]."
	got := extractWikilinks(content)
	assert.True(t, got["foo"])
	assert.True(t, got["bar"])
	assert.Len(t, got, 2, "duplicate slugs should be deduplicated")
}

// splitLines is a test helper.
func splitLines(s string) []string {
	var out []string
	for _, l := range splitNewlines(s) {
		if l != "" {
			out = append(out, l)
		}
	}
	return out
}

func splitNewlines(s string) []string {
	var lines []string
	start := 0
	for i, c := range s {
		if c == '\n' {
			lines = append(lines, s[start:i])
			start = i + 1
		}
	}
	lines = append(lines, s[start:])
	return lines
}
```

- [ ] **Step 2: Run to verify they fail**

```bash
cd /Users/mathias/Documents/local-dev/AI/hyperguild/.worktrees/feat-source-backrefs/ingestion && go test ./internal/pipeline/... -run "TestInjectSourceRefs|TestExtractWikilinks" -v
```
Expected: compile error — `injectSourceRefs` and `extractWikilinks` not defined.

- [ ] **Step 3: Implement refs.go**

```go
// ingestion/internal/pipeline/refs.go
package pipeline

import (
	"os"
	"path/filepath"
	"regexp"
	"strings"

	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
)

var wikilinkRE = regexp.MustCompile(`\[\[([^|\]]+)\|`)

// injectSourceRefs finds the source page in the proposed batch, extracts its wikilinks,
// and injects a back-reference into every linked concept or entity page.
// Pages that exist on disk but are not in the current batch are loaded and appended
// so they will be updated on write.
func injectSourceRefs(pages []wiki.Page, inventory map[wiki.PageType][]wiki.Entry, brainDir string) []wiki.Page {
	sourceSlug, sourceTitle, found := findSourcePage(pages)
	if !found {
		return pages
	}

	// Locate source page content for wikilink extraction.
	var sourceContent string
	for _, p := range pages {
		if strings.HasPrefix(p.Path, "wiki/sources/") &&
			strings.TrimSuffix(filepath.Base(p.Path), ".md") == sourceSlug {
			sourceContent = p.Content
			break
		}
	}

	linkedSlugs := extractWikilinks(sourceContent)
	sourceRef := "- [[" + sourceSlug + "|" + sourceTitle + "]]"

	// Build slug → index map for proposed pages (excluding wiki/sources/).
	bySlug := make(map[string]int, len(pages))
	for i, p := range pages {
		if !strings.HasPrefix(p.Path, "wiki/sources/") {
			bySlug[strings.TrimSuffix(filepath.Base(p.Path), ".md")] = i
		}
	}

	for slug := range linkedSlugs {
		if slug == sourceSlug {
			continue // no self-reference
		}

		if idx, ok := bySlug[slug]; ok {
			// Concept/entity is in the proposed batch — inject inline.
			pages[idx] = addSourceRef(pages[idx], sourceRef)
			continue
		}

		// Not in proposed batch — look for it in the inventory (exists on disk).
		pt, ok := findInInventory(slug, inventory)
		if !ok {
			continue
		}
		diskPath := filepath.Join(brainDir, "wiki", string(pt), slug+".md")
		b, err := os.ReadFile(diskPath)
		if err != nil {
			continue // page not found on disk; skip
		}
		page := wiki.Page{
			Path:    "wiki/" + string(pt) + "/" + slug + ".md",
			Content: string(b),
		}
		pages = append(pages, addSourceRef(page, sourceRef))
	}

	return pages
}

// addSourceRef injects sourceRef into the ## Sources bullet section of page.
// Uses wiki.Merge so that existing Sources entries are deduplicated and all
// other sections are preserved unchanged.
func addSourceRef(page wiki.Page, sourceRef string) wiki.Page {
	patch := wiki.Page{
		Path:    page.Path,
		Content: "\n## Sources\n\n" + sourceRef + "\n",
	}
	return wiki.Merge(page, patch)
}

// extractWikilinks returns the set of slugs referenced as [[slug|...]] in content.
func extractWikilinks(content string) map[string]bool {
	slugs := make(map[string]bool)
	for _, m := range wikilinkRE.FindAllStringSubmatch(content, -1) {
		slugs[m[1]] = true
	}
	return slugs
}

// findSourcePage returns the slug and title of the first wiki/sources/ page in pages.
func findSourcePage(pages []wiki.Page) (slug, title string, found bool) {
	for _, p := range pages {
		if strings.HasPrefix(p.Path, "wiki/sources/") {
			slug = strings.TrimSuffix(filepath.Base(p.Path), ".md")
			title = extractTitle(p.Content)
			if title == "" {
				title = slug
			}
			return slug, title, true
		}
	}
	return "", "", false
}

// findInInventory returns the PageType for a slug if it appears in the inventory.
func findInInventory(slug string, inventory map[wiki.PageType][]wiki.Entry) (wiki.PageType, bool) {
	for pt, entries := range inventory {
		for _, e := range entries {
			if e.Slug == slug {
				return pt, true
			}
		}
	}
	return "", false
}
```

- [ ] **Step 4: Run all pipeline tests**

```bash
cd /Users/mathias/Documents/local-dev/AI/hyperguild/.worktrees/feat-source-backrefs/ingestion && go test ./internal/pipeline/... -v
```
Expected: all existing tests PASS + 7 new refs tests PASS.

- [ ] **Step 5: Commit**

```bash
cd /Users/mathias/Documents/local-dev/AI/hyperguild/.worktrees/feat-source-backrefs && git add ingestion/internal/pipeline/refs.go ingestion/internal/pipeline/refs_test.go && git commit -m "feat(pipeline): inject source back-references into concept and entity pages"
```

---

### Task 2: Wire injectSourceRefs into pipeline.Run

**Files:**
- Modify: `ingestion/internal/pipeline/pipeline.go`

- [ ] **Step 1: Insert the call**

In `pipeline.go`, locate:

```go
resolved := Resolve(allPages, inventory)
merged := mergeAll(resolved)
```

Replace with:

```go
resolved := Resolve(allPages, inventory)
withRefs := injectSourceRefs(resolved, inventory, brainDir)
merged := mergeAll(withRefs)
```

No import changes needed — same package.

- [ ] **Step 2: Run all pipeline tests**

```bash
cd /Users/mathias/Documents/local-dev/AI/hyperguild/.worktrees/feat-source-backrefs/ingestion && go test ./internal/pipeline/... -v
```
Expected: all tests PASS. The existing `TestRun_WritesPages` and `TestRun_DryRunDoesNotWrite` use LLM mocks that return source pages with no wikilinks to concepts — `injectSourceRefs` is a no-op for them.

- [ ] **Step 3: Run full test suite + lint**

```bash
cd /Users/mathias/Documents/local-dev/AI/hyperguild/.worktrees/feat-source-backrefs/ingestion && go test ./... && golangci-lint run ./...
```
Expected: all packages PASS, 0 lint issues.

- [ ] **Step 4: Commit**

```bash
cd /Users/mathias/Documents/local-dev/AI/hyperguild/.worktrees/feat-source-backrefs && git add ingestion/internal/pipeline/pipeline.go && git commit -m "feat(pipeline): wire source back-reference injection into Run"
```

---

## Self-Review

**Spec coverage:**

| Requirement | Task |
|---|---|
| Concepts get `## Sources` back-link to ingested source | Task 1 |
| Entities get `## Sources` back-link | Task 1 (TestInjectSourceRefs_InjectsIntoEntity) |
| Existing pages on disk get updated with new source | Task 1 (TestInjectSourceRefs_LoadsConceptFromDisk) |
| Re-ingestion of same source does not duplicate the ref | Task 1 (TestInjectSourceRefs_DeduplicatesOnReingestion) |
| Source page does not reference itself | Task 1 (TestInjectSourceRefs_NoSelfReference) |
| No-op when batch has no source page | Task 1 (TestInjectSourceRefs_NoSourcePage) |
| Wired into Run between Resolve and mergeAll | Task 2 |
| Full test suite and lint pass | Task 2 Step 3 |

**Placeholder scan:** None.

**Type consistency:** `injectSourceRefs([]wiki.Page, map[wiki.PageType][]wiki.Entry, string) []wiki.Page` — used identically in refs.go (definition) and pipeline.go (call site).
@@ -0,0 +1,148 @@
# Level 3: Strip Slug Authority from LLM — Design Spec

## Problem

The ingestion pipeline currently asks the LLM to produce full wiki pages including the file path (e.g. `wiki/sources/finbert-huggingface.md`). This causes two classes of bug:

1. **Slug proliferation** — the LLM invents different slugs for the same concept across chunks or runs, producing duplicate pages that diverge in content.
2. **Unstable paths** — the LLM may shorten, expand, or vary titles, making deduplication via `Resolve` unreliable because the slug mismatch is upstream of the normalizer.

## Solution

Strip slug authority from the LLM entirely. The LLM returns a minimal structured object. The pipeline computes all slugs deterministically from titles using `wiki.Slug(title)`.
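
The slug rules that this change removes from the schema (lowercase, spaces to hyphens, strip everything that is not alphanumeric or a hyphen) hint at what a deterministic slug function could look like. The sketch below follows those rules only; the real `wiki.Slug` is not shown in this diff, so `slugify` is a hypothetical stand-in and its exact behavior (for example on non-ASCII titles) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// slugify is a hypothetical stand-in for wiki.Slug. It applies the rules the
// old schema text stated: lowercase, spaces become hyphens, and every
// character that is not a-z, 0-9 or a hyphen is dropped.
func slugify(title string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(title) {
		switch {
		case r >= 'a' && r <= 'z', r >= '0' && r <= '9', r == '-':
			b.WriteRune(r)
		case r == ' ':
			b.WriteRune('-')
		}
		// Any other rune (apostrophes, dots, slashes, ...) is skipped.
	}
	return b.String()
}

func main() {
	// The four titles are the slug examples from the old schema text.
	for _, title := range []string{"Domain Driven Design", "It's Complicated", "gRPC", "GPT-4o"} {
		fmt.Printf("%q -> %s\n", title, slugify(title))
	}
}
```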

---

## LLM JSON Contract

### Output format (per page)

```json
{
  "title": "FinBERT",
  "type": "concept",
  "subtype": "framework",
  "domain": "ai-llm",
  "content": "## Definition\n\nA BERT-based model fine-tuned for financial sentiment...\n\n## Related\n\n- [[Sentiment Analysis]]\n- [[Hugging Face]]\n"
}
```

**Fields:**

| Field | Required | Values |
|-------|----------|--------|
| `title` | yes | Human-readable title, e.g. "FinBERT" |
| `type` | yes | `"source"` \| `"concept"` \| `"entity"` |
| `subtype` | for entity/source | entity: `person\|company\|tool\|model\|framework\|technology`; source: `article\|pdf\|book\|video\|note\|project` |
| `domain` | no | tag string, e.g. `ai-llm`, `finance` |
| `content` | yes | Markdown body sections only — no frontmatter, no path |

**Wikilinks in content:** `[[Display Name]]` only. No slug. The pipeline canonicalizes to `[[slug|Display Name]]` in a post-processing step.

**The LLM never writes slugs, paths, or frontmatter.**

---

## Pipeline Changes

### New type: `RawPage`

```go
type RawPage struct {
	Title   string
	Type    string // "source" | "concept" | "entity"
	Subtype string
	Domain  string
	Content string
}
```

### New step order

```
ParseRawPages → BuildPages → Resolve → CanonicalizeLinks → injectSourceRefs → mergeAll → write
```

### Step descriptions

**`ParseRawPages(output string) ([]RawPage, []string)`**
Replaces `ParsePages`. Deserializes JSON objects with the new schema. Same truncation-recovery logic as today. Returns `(pages, warnings)`.
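
As a rough illustration of that contract, the sketch below decodes a JSON array into `RawPage` values and collects warnings for elements with a missing title or an unknown type. It is not the spec's implementation: `parseRawPages` is a hypothetical name, the JSON struct tags and the validation rules are assumptions, and the truncation-recovery logic is left out because it is not shown in this diff.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RawPage mirrors the struct in the spec; the JSON tags are an assumption.
type RawPage struct {
	Title   string `json:"title"`
	Type    string `json:"type"`
	Subtype string `json:"subtype"`
	Domain  string `json:"domain"`
	Content string `json:"content"`
}

// parseRawPages is a hypothetical, simplified ParseRawPages. It returns the
// valid pages plus one warning per rejected element. Truncation recovery is
// deliberately omitted.
func parseRawPages(output string) ([]RawPage, []string) {
	var decoded []RawPage
	if err := json.Unmarshal([]byte(output), &decoded); err != nil {
		return nil, []string{"invalid JSON: " + err.Error()}
	}
	validTypes := map[string]bool{"source": true, "concept": true, "entity": true}
	var pages []RawPage
	var warnings []string
	for i, p := range decoded {
		switch {
		case p.Title == "":
			warnings = append(warnings, fmt.Sprintf("element %d: missing title", i))
		case !validTypes[p.Type]:
			warnings = append(warnings, fmt.Sprintf("element %d (%q): unknown type %q", i, p.Title, p.Type))
		default:
			pages = append(pages, p)
		}
	}
	return pages, warnings
}

func main() {
	out := `[
		{"title": "Shape Up", "type": "concept", "content": "## Definition\n"},
		{"title": "", "type": "concept", "content": "no title"},
		{"title": "Foo", "type": "page", "content": "bad type"}
	]`
	pages, warnings := parseRawPages(out)
	fmt.Println("pages:", len(pages))
	for _, w := range warnings {
		fmt.Println("warning:", w)
	}
}
```

Returning warnings instead of failing the whole batch keeps one malformed element from discarding the pages that did parse.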

**`BuildPages(rawPages []RawPage, sourceSlug, date string) []wiki.Page`**
Converts `RawPage → wiki.Page`:
- Computes slug: `wiki.Slug(page.Title)`
- Computes path: `wiki/<type>/<slug>.md`
- Assembles frontmatter:
  ```
  ---
  title: <Title>
  type: <type>
  subtype: <subtype>   # omitted if empty
  domain: <domain>     # omitted if empty
  created: <date>
  source: <sourceSlug> # omitted for the source page itself
  ---
  ```
- Concatenates frontmatter + content
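
The conversion steps listed above can be sketched for a single page as follows. This is an illustration under stated assumptions, not the `BuildPages` code from the commit: `Page` stands in for `wiki.Page`, `slug` is a simplified placeholder for `wiki.Slug`, the `typeDirs` mapping from singular types to the plural directories seen elsewhere in this diff (`wiki/concepts/`, `wiki/entities/`, `wiki/sources/`) is a guess, and "the source page itself" is approximated as any page whose type is `source`.

```go
package main

import (
	"fmt"
	"strings"
)

// RawPage follows the spec; Page stands in for wiki.Page (Path and Content
// are the two fields used elsewhere in this diff).
type RawPage struct{ Title, Type, Subtype, Domain, Content string }
type Page struct{ Path, Content string }

// typeDirs is an assumption: it maps the singular type to a plural directory.
var typeDirs = map[string]string{"source": "sources", "concept": "concepts", "entity": "entities"}

// slug is a simplified placeholder for wiki.Slug (lowercase, spaces to hyphens).
func slug(title string) string {
	return strings.ReplaceAll(strings.ToLower(title), " ", "-")
}

// buildPage is a hypothetical single-page version of BuildPages.
func buildPage(raw RawPage, sourceSlug, date string) Page {
	var fm strings.Builder
	fm.WriteString("---\n")
	fmt.Fprintf(&fm, "title: %s\n", raw.Title)
	fmt.Fprintf(&fm, "type: %s\n", raw.Type)
	if raw.Subtype != "" {
		fmt.Fprintf(&fm, "subtype: %s\n", raw.Subtype)
	}
	if raw.Domain != "" {
		fmt.Fprintf(&fm, "domain: %s\n", raw.Domain)
	}
	fmt.Fprintf(&fm, "created: %s\n", date)
	if raw.Type != "source" { // assumed reading of "omitted for the source page itself"
		fmt.Fprintf(&fm, "source: %s\n", sourceSlug)
	}
	fm.WriteString("---\n\n")
	return Page{
		Path:    "wiki/" + typeDirs[raw.Type] + "/" + slug(raw.Title) + ".md",
		Content: fm.String() + raw.Content,
	}
}

func main() {
	p := buildPage(RawPage{
		Title:   "Shape Up",
		Type:    "concept",
		Domain:  "product-strategy",
		Content: "## Definition\n\nA methodology.\n",
	}, "my-article", "2026-04-23")
	fmt.Println(p.Path)
	fmt.Print(p.Content)
}
```

The sketch writes titles into the frontmatter unquoted; titles containing YAML-significant characters such as `:` would need quoting in a real implementation.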

**`Resolve(pages []wiki.Page, inventory) []wiki.Page`**
Unchanged. Normalizes near-duplicate titles to existing inventory slugs.

**`CanonicalizeLinks(pages []wiki.Page, inventory) ([]wiki.Page, []string)`**
New. Builds a title→slug map from inventory + current batch. Replaces `[[Display Name]]` with `[[slug|Display Name]]` in each page's content. Titles with no known slug are left as-is and returned as warnings.
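
A minimal sketch of that replacement for one page's content might look like the following. The function name `canonicalize`, the `bareLinkRE` pattern, the plain `map[string]string` title index, and exact-match title lookup are all assumptions made for illustration; the actual `CanonicalizeLinks` in `links.go` is not part of this diff.

```go
package main

import (
	"fmt"
	"regexp"
)

// bareLinkRE matches [[Display Name]] but not [[slug|Display Name]], because
// the captured text may contain neither '|' nor ']'.
var bareLinkRE = regexp.MustCompile(`\[\[([^|\]]+)\]\]`)

// canonicalize is a hypothetical per-page helper: it rewrites bare wikilinks
// using titleToSlug and returns a warning for every title with no known slug.
func canonicalize(content string, titleToSlug map[string]string) (string, []string) {
	var warnings []string
	out := bareLinkRE.ReplaceAllStringFunc(content, func(link string) string {
		title := link[2 : len(link)-2] // strip the surrounding [[ and ]]
		slug, ok := titleToSlug[title]
		if !ok {
			warnings = append(warnings, "no slug for "+title)
			return link // left as-is, as the spec describes
		}
		return "[[" + slug + "|" + title + "]]"
	})
	return out, warnings
}

func main() {
	index := map[string]string{"Shape Up": "shape-up"}
	out, warnings := canonicalize("See [[Shape Up]], [[Unknown Page]] and [[ddd|DDD]].", index)
	fmt.Println(out)
	fmt.Println(warnings)
}
```

Links that already carry a slug pass through untouched, so running the step twice on the same content does not change it further.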

**`injectSourceRefs`**
Unchanged. Reads `[[slug|...]]` links (post-canonicalization) to inject back-references.
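
The ordering matters because the `wikilinkRE` pattern in `refs.go` only matches links that contain a pipe. The small program below reuses that pattern to show the difference; `linkedSlugs` is a hypothetical helper written for this example, not a function from the codebase.

```go
package main

import (
	"fmt"
	"regexp"
)

// wikilinkRE is the pattern from refs.go in this diff: it captures the slug
// part of [[slug|Display Name]] and requires the pipe to be present.
var wikilinkRE = regexp.MustCompile(`\[\[([^|\]]+)\|`)

// linkedSlugs is a hypothetical helper that lists the captured slugs in order.
func linkedSlugs(content string) []string {
	var slugs []string
	for _, m := range wikilinkRE.FindAllStringSubmatch(content, -1) {
		slugs = append(slugs, m[1])
	}
	return slugs
}

func main() {
	// Raw LLM output: bare display-name link, so nothing is extracted.
	fmt.Println(linkedSlugs("See [[Shape Up]].")) // prints []
	// After canonicalization the slug is available for back-references.
	fmt.Println(linkedSlugs("See [[shape-up|Shape Up]].")) // prints [shape-up]
}
```

If `injectSourceRefs` ran before canonicalization, it would see no slugs in the source page and inject no back-references.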
|
||||||
|
|
||||||
|
**`mergeAll → write`**
|
||||||
|
Unchanged.
|
||||||
|
|
||||||
|
### `pipeline.Run` signature change
|
||||||
|
|
||||||
|
```go
|
||||||
|
func Run(ctx context.Context, cfg Config, brainDir, content, source string, dryRun bool) (Result, error)
|
||||||
|
```
|
||||||
|
|
||||||
|
`source` is already passed (it's the display name / filename). A new internal `sourceSlug` is derived from it via `wiki.Slug(source)` before calling `BuildPages`. No API change needed.

---

## Files Changed

| File | Change |
|------|--------|
| `ingestion/internal/pipeline/parse.go` | Replace `ParsePages` with `ParseRawPages` + `RawPage` type |
| `ingestion/internal/pipeline/build.go` | New file: `BuildPages` |
| `ingestion/internal/pipeline/links.go` | New file: `CanonicalizeLinks` |
| `ingestion/internal/pipeline/pipeline.go` | Wire new steps; derive `sourceSlug` from `source` |
| `ingestion/internal/pipeline/prompt.go` | New system prompt + `BuildPrompt` for new JSON format |
| `brain/schema.md` | Update wikilink format and JSON schema docs |

`resolve.go`, `refs.go`, `backfill.go`, `merge.go` — no changes.

---

## Wikilink Format

- **LLM output**: `[[Display Name]]`
- **Stored on disk**: `[[slug|Display Name]]`
- **`CanonicalizeLinks`** converts between the two using the inventory

This matches Obsidian's display-alias syntax that the existing codebase already uses.
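
For the stored form, a small sketch of how slugs could be read back out of `[[slug|Display Name]]` links, which is what back-reference injection relies on. `linkedSlugs` and `canonicalLink` are illustrative names; the pipeline's own link extraction helper is not shown in this diff and may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// canonicalLink matches only the stored form [[slug|Display Name]] and
// captures the slug. Plain [[Display Name]] links are ignored.
var canonicalLink = regexp.MustCompile(`\[\[([^\]|]+)\|[^\]]*\]\]`)

// linkedSlugs returns the distinct slugs referenced by a page, sorted.
func linkedSlugs(content string) []string {
	seen := make(map[string]bool)
	var slugs []string
	for _, m := range canonicalLink.FindAllStringSubmatch(content, -1) {
		if !seen[m[1]] {
			seen[m[1]] = true
			slugs = append(slugs, m[1])
		}
	}
	sort.Strings(slugs)
	return slugs
}

func main() {
	fmt.Println(linkedSlugs("See [[betting|Betting]], [[Ghost]] and [[betting|bets]]."))
}
```

A link that was never canonicalized carries no slug, so it simply does not show up here; that is why unknown titles are surfaced as warnings earlier in the pipeline.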

---

## Testing Strategy

- `ParseRawPages`: table-driven, cover valid JSON, truncated output, unknown type, missing title
- `BuildPages`: table-driven, cover slug computation, frontmatter assembly, source page (no `source:` field), entity with subtype
- `CanonicalizeLinks`: cover known title → replaced, unknown title → left as-is + warning, multiple links in one page
- Integration test: full `Run` call with mock LLM returning new JSON format, assert no slug duplication across two chunks of the same source
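
The "truncated output" case is the least obvious one, so here is one way such salvage could work, sketched with the standard `encoding/json` streaming decoder. This is an assumption about the approach, not the actual `ParseRawPages` body (which this diff only shows in part); `rawPage` and `salvagePages` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// rawPage is a trimmed-down, hypothetical version of the LLM output record.
type rawPage struct {
	Title   string `json:"title"`
	Type    string `json:"type"`
	Content string `json:"content"`
}

// salvagePages decodes a JSON array element by element. If the array is cut
// off mid-object, every fully decoded element is kept and a warning is returned.
func salvagePages(output string) ([]rawPage, []string) {
	dec := json.NewDecoder(strings.NewReader(output))
	if tok, err := dec.Token(); err != nil || tok != json.Delim('[') {
		return nil, []string{"output is not a JSON array"}
	}
	var pages []rawPage
	for dec.More() {
		var p rawPage
		if err := dec.Decode(&p); err != nil {
			msg := fmt.Sprintf("truncated after %d complete pages: %v", len(pages), err)
			return pages, []string{msg}
		}
		pages = append(pages, p)
	}
	return pages, nil
}

func main() {
	pages, warns := salvagePages(`[{"title":"A","type":"concept","content":"x"},{"title":"B","con`)
	fmt.Println(len(pages), warns)
}
```

A table-driven test can then feed the same function complete, truncated, and empty inputs and compare the page count and warning count per row.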

---

## Out of Scope

- Re-ingesting existing pages (user will trigger manually after deploy)
- Changing the `BackfillRefs` endpoint (already correct, slug-based)
- Changing the `Resolve` fuzzy-match algorithm
@@ -68,6 +68,7 @@ func main() {
 	mux.HandleFunc("POST /write", h.Write)
 	mux.HandleFunc("POST /ingest", h.Ingest)
 	mux.HandleFunc("POST /ingest-path", h.IngestPath)
+	mux.HandleFunc("POST /backfill-refs", h.BackfillRefs)

 	addr := ":" + port
 	watchIntervalLog := "disabled"
@@ -272,6 +272,18 @@ func (h *Handler) IngestPath(w http.ResponseWriter, r *http.Request) {
 	writeJSON(w, ingestResponse{Pages: allPages, Warnings: allWarnings})
 }

+// BackfillRefs handles POST /backfill-refs — injects source back-references
+// into all concept and entity pages based on existing wiki/sources/ pages.
+func (h *Handler) BackfillRefs(w http.ResponseWriter, r *http.Request) {
+	n, err := pipeline.BackfillRefs(r.Context(), h.brainDir)
+	if err != nil {
+		h.logger.Error("backfill-refs failed", "err", err)
+		writeError(w, http.StatusInternalServerError, "backfill error")
+		return
+	}
+	writeJSON(w, map[string]int{"updated": n})
+}
+
 func writeJSON(w http.ResponseWriter, v any) {
 	w.Header().Set("Content-Type", "application/json")
 	json.NewEncoder(w).Encode(v) //nolint:errcheck
@@ -20,9 +20,9 @@ import (
 	"github.com/mathiasbq/hyperguild/ingestion/internal/pipeline"
 )

-// stubComplete returns a fixed JSON page so tests never call a real LLM.
+// stubComplete returns a fixed JSON RawPage so tests never call a real LLM.
 func stubComplete(_ context.Context, _, _ string) (string, error) {
-	return `[{"path":"wiki/sources/test-source.md","content":"# Test Source\n\nSome content here.\n"}]`, nil
+	return `[{"title":"Test Source","type":"source","subtype":"article","content":"## Summary\n\nSome content here.\n"}]`, nil
 }

 func stubPipelineCfg() pipeline.Config {
91
ingestion/internal/pipeline/backfill.go
Normal file
@@ -0,0 +1,91 @@
// ingestion/internal/pipeline/backfill.go
package pipeline

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
	"strings"

	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
)

// BackfillRefs walks wiki/sources/ and injects source back-references into every
// concept and entity page that each source links to.
// Changes for all sources are accumulated in memory before writing, so multiple
// sources referencing the same concept are merged in one pass.
// Deduplication is handled by wiki.Merge — running this multiple times is safe.
// Returns the number of concept/entity pages written.
func BackfillRefs(ctx context.Context, brainDir string) (int, error) {
	inventory, err := wiki.LoadInventory(brainDir)
	if err != nil {
		return 0, fmt.Errorf("load inventory: %w", err)
	}

	sourcesDir := filepath.Join(brainDir, "wiki", "sources")
	entries, err := os.ReadDir(sourcesDir)
	if err != nil {
		if os.IsNotExist(err) {
			return 0, nil
		}
		return 0, fmt.Errorf("read sources dir: %w", err)
	}

	// Accumulate all changes before writing: relPath → updated Page.
	// Collecting first means two sources that both link the same concept
	// get both refs merged before a single write.
	pending := make(map[string]wiki.Page)

	for _, e := range entries {
		if ctx.Err() != nil {
			return 0, ctx.Err()
		}
		if e.IsDir() || !strings.HasSuffix(e.Name(), ".md") {
			continue
		}

		b, err := os.ReadFile(filepath.Join(sourcesDir, e.Name()))
		if err != nil {
			continue
		}
		sourceContent := string(b)
		sourceSlug := strings.TrimSuffix(e.Name(), ".md")
		sourceTitle := extractTitle(sourceContent)
		if sourceTitle == "" {
			sourceTitle = sourceSlug
		}
		sourceRef := "- [[" + sourceSlug + "|" + sourceTitle + "]]"

		for slug := range extractWikilinks(sourceContent) {
			if slug == sourceSlug {
				continue
			}
			pt, ok := findInInventory(slug, inventory)
			if !ok {
				continue
			}
			relPath := "wiki/" + string(pt) + "/" + slug + ".md"

			// Start from already-accumulated version if we've seen this page.
			page, seen := pending[relPath]
			if !seen {
				raw, err := os.ReadFile(filepath.Join(brainDir, filepath.FromSlash(relPath)))
				if err != nil {
					continue
				}
				page = wiki.Page{Path: relPath, Content: string(raw)}
			}
			pending[relPath] = addSourceRef(page, sourceRef)
		}
	}

	for relPath, page := range pending {
		dest := filepath.Join(brainDir, filepath.FromSlash(relPath))
		if err := os.WriteFile(dest, []byte(page.Content), 0o644); err != nil {
			return 0, fmt.Errorf("write %s: %w", relPath, err)
		}
	}

	return len(pending), nil
}
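
Reviewer note: the idempotency claim above rests on `addSourceRef` and `wiki.Merge`, neither of which appears in this diff. As a rough illustration of the behaviour the tests expect, the sketch below appends a ref line under a `## Sources` heading exactly once. `appendSourceRef` is a hypothetical helper working on plain strings, and it assumes `## Sources` is the last section of the page.

```go
package main

import (
	"fmt"
	"strings"
)

// appendSourceRef adds ref under a trailing "## Sources" section.
// It returns content unchanged when the exact ref line is already present.
// Assumption: "## Sources", if it exists, is the final section of the page.
func appendSourceRef(content, ref string) string {
	for _, line := range strings.Split(content, "\n") {
		if strings.TrimSpace(line) == ref {
			return content // already referenced, nothing to do
		}
	}
	body := strings.TrimRight(content, "\n")
	if !strings.Contains(content, "\n## Sources\n") {
		body += "\n\n## Sources\n"
	}
	return body + "\n" + ref + "\n"
}

func main() {
	page := "## Definition\n\nA technique.\n"
	page = appendSourceRef(page, "- [[shape-up|Shape Up]]")
	page = appendSourceRef(page, "- [[shape-up|Shape Up]]") // second call is a no-op
	fmt.Print(page)
}
```

Running the helper twice with the same ref leaves the page byte-for-byte identical, which is the property `TestBackfillRefs_Deduplication` checks end to end.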
107
ingestion/internal/pipeline/backfill_test.go
Normal file
@@ -0,0 +1,107 @@
// ingestion/internal/pipeline/backfill_test.go
package pipeline

import (
	"context"
	"os"
	"path/filepath"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func setupBrainDir(t *testing.T) string {
	t.Helper()
	dir := t.TempDir()
	for _, sub := range []string{"wiki/sources", "wiki/concepts", "wiki/entities"} {
		require.NoError(t, os.MkdirAll(filepath.Join(dir, sub), 0o755))
	}
	return dir
}

func writeFile(t *testing.T, path, content string) {
	t.Helper()
	require.NoError(t, os.MkdirAll(filepath.Dir(path), 0o755))
	require.NoError(t, os.WriteFile(path, []byte(content), 0o644))
}

func TestBackfillRefs_UpdatesConcept(t *testing.T) {
	dir := setupBrainDir(t)
	writeFile(t, filepath.Join(dir, "wiki/sources/shape-up.md"),
		"---\ntitle: Shape Up\n---\n\n## Summary\n\nSee [[betting|Betting]].\n")
	writeFile(t, filepath.Join(dir, "wiki/concepts/betting.md"),
		"---\ntitle: Betting\n---\n\n## Definition\n\nA resource allocation technique.\n")

	n, err := BackfillRefs(context.Background(), dir)
	require.NoError(t, err)
	assert.Equal(t, 1, n)

	got, err := os.ReadFile(filepath.Join(dir, "wiki/concepts/betting.md"))
	require.NoError(t, err)
	assert.Contains(t, string(got), "## Sources")
	assert.Contains(t, string(got), "[[shape-up|Shape Up]]")
	assert.Contains(t, string(got), "## Definition") // original content preserved
}

func TestBackfillRefs_Deduplication(t *testing.T) {
	dir := setupBrainDir(t)
	writeFile(t, filepath.Join(dir, "wiki/sources/shape-up.md"),
		"---\ntitle: Shape Up\n---\n\n## Summary\n\nSee [[betting|Betting]].\n")
	writeFile(t, filepath.Join(dir, "wiki/concepts/betting.md"),
		"---\ntitle: Betting\n---\n\n## Definition\n\nA technique.\n")

	// Run twice — should not duplicate the ref.
	_, err := BackfillRefs(context.Background(), dir)
	require.NoError(t, err)
	_, err = BackfillRefs(context.Background(), dir)
	require.NoError(t, err)

	got, err := os.ReadFile(filepath.Join(dir, "wiki/concepts/betting.md"))
	require.NoError(t, err)

	count := 0
	for _, line := range splitLines(string(got)) {
		if line == "- [[shape-up|Shape Up]]" {
			count++
		}
	}
	assert.Equal(t, 1, count, "ref should appear exactly once after two runs")
}

func TestBackfillRefs_MultipleSources(t *testing.T) {
	dir := setupBrainDir(t)
	writeFile(t, filepath.Join(dir, "wiki/sources/book-a.md"),
		"---\ntitle: Book A\n---\n\n## Summary\n\nSee [[shaping|Shaping]].\n")
	writeFile(t, filepath.Join(dir, "wiki/sources/book-b.md"),
		"---\ntitle: Book B\n---\n\n## Summary\n\nAlso [[shaping|Shaping]].\n")
	writeFile(t, filepath.Join(dir, "wiki/concepts/shaping.md"),
		"---\ntitle: Shaping\n---\n\n## Definition\n\nA design activity.\n")

	n, err := BackfillRefs(context.Background(), dir)
	require.NoError(t, err)
	assert.Equal(t, 1, n) // one concept page written

	got, err := os.ReadFile(filepath.Join(dir, "wiki/concepts/shaping.md"))
	require.NoError(t, err)
	assert.Contains(t, string(got), "[[book-a|Book A]]")
	assert.Contains(t, string(got), "[[book-b|Book B]]")
}

func TestBackfillRefs_NoSourcesDir(t *testing.T) {
	dir := t.TempDir() // no wiki/sources subdir
	n, err := BackfillRefs(context.Background(), dir)
	require.NoError(t, err)
	assert.Equal(t, 0, n)
}

func TestBackfillRefs_SkipsUnknownSlugs(t *testing.T) {
	dir := setupBrainDir(t)
	// Source links to a slug not in inventory and not on disk.
	writeFile(t, filepath.Join(dir, "wiki/sources/article.md"),
		"---\ntitle: Article\n---\n\n## Summary\n\nSee [[ghost-slug|Ghost]].\n")

	n, err := BackfillRefs(context.Background(), dir)
	require.NoError(t, err)
	assert.Equal(t, 0, n)
}
106
ingestion/internal/pipeline/build.go
Normal file
@@ -0,0 +1,106 @@
// ingestion/internal/pipeline/build.go
package pipeline

import (
	"fmt"
	"strings"

	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
)

// BuildPages converts RawPages from the LLM into wiki.Pages with computed slugs,
// paths, and YAML frontmatter. sourceSlug is the slug of the source being ingested
// (derived from the filename, not the LLM title). Pages whose title resolves to an
// empty slug are skipped and returned as warnings instead.
func BuildPages(rawPages []RawPage, sourceSlug, date string) ([]wiki.Page, []string) {
	out := make([]wiki.Page, 0, len(rawPages))
	var warnings []string
	for _, rp := range rawPages {
		slug := computeSlug(rp, sourceSlug)
		if slug == "" {
			warnings = append(warnings, fmt.Sprintf("skipped page with empty title (type: %s)", rp.Type))
			continue
		}
		out = append(out, buildPage(rp, sourceSlug, date))
	}
	return out, warnings
}

func computeSlug(rp RawPage, sourceSlug string) string {
	if rp.Type == "source" {
		return sourceSlug
	}
	return wiki.Slug(rp.Title)
}

func buildPage(rp RawPage, sourceSlug, date string) wiki.Page {
	var slug, dir string
	switch rp.Type {
	case "source":
		slug = sourceSlug
		dir = "wiki/sources"
	case "concept":
		slug = wiki.Slug(rp.Title)
		dir = "wiki/concepts"
	case "entity":
		slug = wiki.Slug(rp.Title)
		dir = "wiki/entities"
	default:
		slug = wiki.Slug(rp.Title)
		dir = "wiki/" + rp.Type
	}

	path := dir + "/" + slug + ".md"
	fm := buildFrontmatter(rp, date)

	return wiki.Page{
		Path:    path,
		Content: fm + "\n" + rp.Content,
	}
}

func buildFrontmatter(rp RawPage, date string) string {
	var sb strings.Builder
	sb.WriteString("---\n")
	fmt.Fprintf(&sb, "title: %s\n", yamlScalar(rp.Title))

	switch rp.Type {
	case "source":
		subtype := rp.Subtype
		if subtype == "" {
			subtype = "article"
		}
		fmt.Fprintf(&sb, "type: %s\n", yamlScalar(subtype))
		if rp.Domain != "" {
			fmt.Fprintf(&sb, "domain: %s\n", yamlScalar(rp.Domain))
		}
		fmt.Fprintf(&sb, "date_ingested: %s\n", date)
		fmt.Fprintf(&sb, "last_updated: %s\n", date)
	case "concept":
		if rp.Domain != "" {
			fmt.Fprintf(&sb, "domain: %s\n", yamlScalar(rp.Domain))
		}
		fmt.Fprintf(&sb, "last_updated: %s\n", date)
	case "entity":
		if rp.Subtype != "" {
			fmt.Fprintf(&sb, "type: %s\n", yamlScalar(rp.Subtype))
		}
		if rp.Domain != "" {
			fmt.Fprintf(&sb, "domain: %s\n", yamlScalar(rp.Domain))
		}
		fmt.Fprintf(&sb, "last_updated: %s\n", date)
	default:
		if rp.Domain != "" {
			fmt.Fprintf(&sb, "domain: %s\n", yamlScalar(rp.Domain))
		}
		fmt.Fprintf(&sb, "last_updated: %s\n", date)
	}

	fmt.Fprintf(&sb, "aliases:\n - %s\n", yamlScalar(rp.Title))
	sb.WriteString("---\n")
	return sb.String()
}

func yamlScalar(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}
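
Reviewer note: `yamlScalar` relies on YAML's single-quoted style, where the only escape is a doubled single quote. The standalone sketch below copies that one-liner under the illustrative name `quoteYAML` to show what it emits for titles that would otherwise confuse a YAML parser (colon, apostrophe, leading `#`). The sample titles are made up for the demo.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteYAML mirrors the yamlScalar helper shown above: wrap in single quotes
// and double any embedded single quote.
func quoteYAML(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

func main() {
	titles := []string{
		"Shape Up: The Basecamp Method", // colon would start a mapping if unquoted
		"Ryan's Notes",                  // apostrophe must be doubled
		"#hashtag",                      // leading # would read as a comment
	}
	for _, t := range titles {
		fmt.Printf("title: %s\n", quoteYAML(t))
	}
}
```

Quoting every scalar unconditionally is simpler than detecting which titles need it, at the cost of slightly noisier frontmatter.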
167
ingestion/internal/pipeline/build_test.go
Normal file
@@ -0,0 +1,167 @@
// ingestion/internal/pipeline/build_test.go
package pipeline

import (
	"strings"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestBuildPages_SourcePage(t *testing.T) {
	raw := []RawPage{
		{
			Title:   "Shape Up",
			Type:    "source",
			Subtype: "book",
			Domain:  "product-strategy",
			Content: "## Summary\n\nA book about shaping product work.\n",
		},
	}
	pages, warnings := BuildPages(raw, "shape-up", "2026-04-23")
	require.Len(t, pages, 1)
	assert.Empty(t, warnings)

	p := pages[0]
	assert.Equal(t, "wiki/sources/shape-up.md", p.Path)
	assert.Contains(t, p.Content, "title: 'Shape Up'")
	assert.Contains(t, p.Content, "type: 'book'")
	assert.Contains(t, p.Content, "domain: 'product-strategy'")
	assert.Contains(t, p.Content, "date_ingested: 2026-04-23")
	assert.Contains(t, p.Content, "last_updated: 2026-04-23")
	assert.Contains(t, p.Content, "aliases:\n - 'Shape Up'")
	assert.Contains(t, p.Content, "## Summary")
	assert.True(t, strings.HasPrefix(p.Content, "---\n"), "content must start with frontmatter")
}

func TestBuildPages_ConceptPage(t *testing.T) {
	raw := []RawPage{
		{
			Title:   "Betting",
			Type:    "concept",
			Domain:  "product-strategy",
			Content: "## Definition\n\nA resource allocation technique.\n",
		},
	}
	pages, warnings := BuildPages(raw, "shape-up", "2026-04-23")
	require.Len(t, pages, 1)
	assert.Empty(t, warnings)

	p := pages[0]
	assert.Equal(t, "wiki/concepts/betting.md", p.Path)
	assert.Contains(t, p.Content, "title: 'Betting'")
	assert.Contains(t, p.Content, "domain: 'product-strategy'")
	assert.Contains(t, p.Content, "last_updated: 2026-04-23")
	assert.Contains(t, p.Content, "aliases:\n - 'Betting'")
	assert.NotContains(t, p.Content, "date_ingested")
	assert.Contains(t, p.Content, "## Definition")
}

func TestBuildPages_EntityPage(t *testing.T) {
	raw := []RawPage{
		{
			Title:   "Ryan Singer",
			Type:    "entity",
			Subtype: "person",
			Domain:  "product-strategy",
			Content: "## Description\n\nA product designer.\n",
		},
	}
	pages, warnings := BuildPages(raw, "shape-up", "2026-04-23")
	require.Len(t, pages, 1)
	assert.Empty(t, warnings)

	p := pages[0]
	assert.Equal(t, "wiki/entities/ryan-singer.md", p.Path)
	assert.Contains(t, p.Content, "title: 'Ryan Singer'")
	assert.Contains(t, p.Content, "type: 'person'")
	assert.Contains(t, p.Content, "domain: 'product-strategy'")
	assert.Contains(t, p.Content, "last_updated: 2026-04-23")
	assert.Contains(t, p.Content, "aliases:\n - 'Ryan Singer'")
	assert.NotContains(t, p.Content, "date_ingested")
}

func TestBuildPages_SourceSlugUsedForSourcePage(t *testing.T) {
	// LLM title differs from filename — pipeline uses sourceSlug for the source page path.
	raw := []RawPage{
		{Title: "FinBERT: A Pretrained Model", Type: "source", Subtype: "article", Content: "## Summary\n\nA model.\n"},
	}
	pages, _ := BuildPages(raw, "finbert-huggingface", "2026-04-23")
	require.Len(t, pages, 1)
	assert.Equal(t, "wiki/sources/finbert-huggingface.md", pages[0].Path)
}

func TestBuildPages_ConceptSlugDerivedFromTitle(t *testing.T) {
	raw := []RawPage{
		{Title: "Domain-Driven Design", Type: "concept", Content: "## Definition\n\nFoo.\n"},
	}
	pages, _ := BuildPages(raw, "some-source", "2026-04-23")
	require.Len(t, pages, 1)
	assert.Equal(t, "wiki/concepts/domain-driven-design.md", pages[0].Path)
}

func TestBuildPages_SourceDefaultSubtype(t *testing.T) {
	// If subtype is omitted for a source, default to "article"
	raw := []RawPage{
		{Title: "Some Post", Type: "source", Content: "## Summary\n\nA post.\n"},
	}
	pages, _ := BuildPages(raw, "some-post", "2026-04-23")
	require.Len(t, pages, 1)
	assert.Contains(t, pages[0].Content, "type: 'article'")
}

func TestBuildPages_OmitsDomainWhenEmpty(t *testing.T) {
	raw := []RawPage{
		{Title: "Betting", Type: "concept", Content: "## Definition\n\nFoo.\n"},
	}
	pages, _ := BuildPages(raw, "src", "2026-04-23")
	require.Len(t, pages, 1)
	assert.NotContains(t, pages[0].Content, "domain:")
}

func TestBuildPages_MultiplePages(t *testing.T) {
	raw := []RawPage{
		{Title: "Shape Up", Type: "source", Subtype: "book", Content: "## Summary\n\nA book.\n"},
		{Title: "Betting", Type: "concept", Content: "## Definition\n\nA technique.\n"},
		{Title: "Ryan Singer", Type: "entity", Subtype: "person", Content: "## Description\n\nA designer.\n"},
	}
	pages, _ := BuildPages(raw, "shape-up", "2026-04-23")
	require.Len(t, pages, 3)
	assert.Equal(t, "wiki/sources/shape-up.md", pages[0].Path)
	assert.Equal(t, "wiki/concepts/betting.md", pages[1].Path)
	assert.Equal(t, "wiki/entities/ryan-singer.md", pages[2].Path)
}

func TestBuildPages_TitleWithColon(t *testing.T) {
	raw := []RawPage{
		{Title: "Shape Up: The Basecamp Method", Type: "source", Subtype: "book", Content: "## Summary\n\nA book.\n"},
	}
	pages, _ := BuildPages(raw, "shape-up", "2026-04-23")
	require.Len(t, pages, 1)
	// Title with colon must be quoted in YAML
	assert.Contains(t, pages[0].Content, "title: 'Shape Up: The Basecamp Method'")
	assert.Contains(t, pages[0].Content, "aliases:\n - 'Shape Up: The Basecamp Method'")
}

func TestBuildPages_EntityNoSubtype(t *testing.T) {
	raw := []RawPage{
		{Title: "Basecamp", Type: "entity", Content: "## Description\n\nA company.\n"},
	}
	pages, _ := BuildPages(raw, "src", "2026-04-23")
	require.Len(t, pages, 1)
	assert.NotContains(t, pages[0].Content, "type:")
	assert.Contains(t, pages[0].Content, "title: 'Basecamp'")
}

func TestBuildPages_EmptyTitleSkippedWithWarning(t *testing.T) {
	raw := []RawPage{
		{Title: "", Type: "concept", Content: "## Definition\n\nFoo.\n"},
		{Title: "Betting", Type: "concept", Content: "## Definition\n\nA technique.\n"},
	}
	pages, warnings := BuildPages(raw, "src", "2026-04-23")
	require.Len(t, pages, 1, "empty-title page should be skipped")
	assert.Equal(t, "wiki/concepts/betting.md", pages[0].Path)
	assert.Len(t, warnings, 1)
	assert.Contains(t, warnings[0], "empty title")
}
70
ingestion/internal/pipeline/links.go
Normal file
@@ -0,0 +1,70 @@
// ingestion/internal/pipeline/links.go
package pipeline

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"

	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
)

// plainLinkRE matches [[Display Name]] — wikilinks without a slug pipe.
// It does NOT match [[slug|Display]] (those already have a pipe).
var plainLinkRE = regexp.MustCompile(`\[\[([^\]|]+)\]\]`)

// CanonicalizeLinks converts [[Display Name]] wikilinks to [[slug|Display Name]]
// using a title→slug map built from the inventory and current batch.
// Unknown titles are left as-is and returned as warnings.
func CanonicalizeLinks(pages []wiki.Page, inventory map[wiki.PageType][]wiki.Entry) ([]wiki.Page, []string) {
	titleToSlug := buildTitleMap(pages, inventory)

	var allWarnings []string
	out := make([]wiki.Page, len(pages))
	for i, p := range pages {
		newContent, warnings := canonicalizeContent(p.Content, titleToSlug)
		p.Content = newContent
		out[i] = p
		allWarnings = append(allWarnings, warnings...)
	}
	return out, allWarnings
}

// buildTitleMap builds a lowercase-title → slug map from inventory and current batch.
// Current batch entries take precedence over inventory (they may be updates).
func buildTitleMap(pages []wiki.Page, inventory map[wiki.PageType][]wiki.Entry) map[string]string {
	m := make(map[string]string)
	for _, entries := range inventory {
		for _, e := range entries {
			m[strings.ToLower(e.Title)] = e.Slug
		}
	}
	// Current batch overrides inventory
	for _, p := range pages {
		title := extractTitle(p.Content)
		slug := strings.TrimSuffix(filepath.Base(p.Path), ".md")
		if title != "" && slug != "" {
			m[strings.ToLower(title)] = slug
		}
	}
	return m
}

func canonicalizeContent(content string, titleToSlug map[string]string) (string, []string) {
	var warnings []string
	result := plainLinkRE.ReplaceAllStringFunc(content, func(match string) string {
		sub := plainLinkRE.FindStringSubmatch(match)
		if len(sub) < 2 {
			return match
		}
		displayName := sub[1]
		slug, ok := titleToSlug[strings.ToLower(displayName)]
		if !ok {
			warnings = append(warnings, fmt.Sprintf("unknown wikilink: [[%s]]", displayName))
			return match
		}
		return "[[" + slug + "|" + displayName + "]]"
	})
	return result, warnings
}
125
ingestion/internal/pipeline/links_test.go
Normal file
125
ingestion/internal/pipeline/links_test.go
Normal file
@@ -0,0 +1,125 @@
|
|||||||
|
// ingestion/internal/pipeline/links_test.go
|
||||||
|
package pipeline
|
||||||
|
|
||||||
|
import (
|
||||||
|
"testing"
|
||||||
|
|
||||||
|
"github.com/stretchr/testify/assert"
|
||||||
|
"github.com/stretchr/testify/require"
|
||||||
|
|
||||||
|
"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
|
||||||
|
)
|
||||||
|
|
||||||
|
func TestCanonicalizeLinks_KnownTitle(t *testing.T) {
|
||||||
|
pages := []wiki.Page{
|
||||||
|
{
|
||||||
|
Path: "wiki/sources/shape-up.md",
|
||||||
|
Content: "---\ntitle: 'Shape Up'\n---\n\n## Summary\n\nSee [[Betting]].\n",
|
||||||
|
},
|
||||||
|
}
|
||||||
|
inventory := map[wiki.PageType][]wiki.Entry{
|
||||||
|
wiki.PageTypeConcept: {
|
||||||
|
{Slug: "betting", Title: "Betting"},
|
||||||
|
},
|
||||||
|
}
|
||||||
|
got, warnings := CanonicalizeLinks(pages, inventory)
|
||||||
|
require.Len(t, got, 1)
|
||||||
|
assert.Empty(t, warnings)
|
||||||
|
assert.Contains(t, got[0].Content, "[[betting|Betting]]")
|
||||||
|
assert.NotContains(t, got[0].Content, "[[Betting]]")
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestCanonicalizeLinks_UnknownTitleLeftAsIs(t *testing.T) {
|
||||||
|
pages := []wiki.Page{
|
||||||
|
{
|
||||||
|
Path: "wiki/sources/shape-up.md",
|
||||||
|
Content: "---\ntitle: 'Shape Up'\n---\n\n## Summary\n\nSee [[Ghost Concept]].\n",
|
||||||
|
},
|
||||||
|
}
|
||||||
|
inventory := map[wiki.PageType][]wiki.Entry{}
|
||||||
|
got, warnings := CanonicalizeLinks(pages, inventory)
|
||||||
|
require.Len(t, got, 1)
|
||||||
|
assert.NotEmpty(t, warnings)
|
||||||
|
assert.Contains(t, got[0].Content, "[[Ghost Concept]]")
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestCanonicalizeLinks_AlreadyCanonicalLinkUntouched(t *testing.T) {
|
||||||
|
// Links already in [[slug|Display]] format must not be double-converted
|
||||||
|
pages := []wiki.Page{
|
||||||
|
{
|
||||||
|
Path: "wiki/sources/shape-up.md",
|
||||||
|
Content: "---\ntitle: 'Shape Up'\n---\n\n## Summary\n\nSee [[betting|Betting]].\n",
|
||||||
|
},
|
||||||
|
}
|
||||||
|
inventory := map[wiki.PageType][]wiki.Entry{
|
||||||
|
wiki.PageTypeConcept: {
|
||||||
|
{Slug: "betting", Title: "Betting"},
|
||||||
|
},
|
||||||
|
}
|
||||||
|
got, warnings := CanonicalizeLinks(pages, inventory)
|
||||||
|
require.Len(t, got, 1)
|
||||||
|
assert.Empty(t, warnings)
|
||||||
|
// Should remain exactly as-is — not double-wrapped
|
||||||
|
assert.Contains(t, got[0].Content, "[[betting|Betting]]")
|
||||||
|
assert.NotContains(t, got[0].Content, "[[betting|[[betting|Betting]]]]")
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestCanonicalizeLinks_CaseInsensitiveMatch(t *testing.T) {
	pages := []wiki.Page{
		{
			Path:    "wiki/sources/foo.md",
			Content: "---\ntitle: 'Foo'\n---\n\n## Summary\n\nSee [[domain driven design]].\n",
		},
	}
	inventory := map[wiki.PageType][]wiki.Entry{
		wiki.PageTypeConcept: {
			{Slug: "domain-driven-design", Title: "Domain Driven Design"},
		},
	}
	got, warnings := CanonicalizeLinks(pages, inventory)
	require.Len(t, got, 1)
	assert.Empty(t, warnings)
	assert.Contains(t, got[0].Content, "[[domain-driven-design|domain driven design]]")
}

func TestCanonicalizeLinks_CurrentBatchPagesResolved(t *testing.T) {
	// A concept created in the same batch should be canonicalizable
	pages := []wiki.Page{
		{
			Path:    "wiki/sources/shape-up.md",
			Content: "---\ntitle: 'Shape Up'\n---\n\n## Summary\n\nSee [[Betting]].\n",
		},
		{
			Path:    "wiki/concepts/betting.md",
			Content: "---\ntitle: 'Betting'\n---\n\n## Definition\n\nA technique.\n",
		},
	}
	inventory := map[wiki.PageType][]wiki.Entry{} // empty: Betting is in the batch, not inventory

	got, warnings := CanonicalizeLinks(pages, inventory)
	require.Len(t, got, 2)
	assert.Empty(t, warnings)
	assert.Contains(t, got[0].Content, "[[betting|Betting]]")
}

func TestCanonicalizeLinks_MultipleLinksInOnePage(t *testing.T) {
	pages := []wiki.Page{
		{
			Path:    "wiki/sources/foo.md",
			Content: "---\ntitle: 'Foo'\n---\n\n## Summary\n\nSee [[Betting]] and [[Shape Up]].\n",
		},
	}
	inventory := map[wiki.PageType][]wiki.Entry{
		wiki.PageTypeConcept: {
			{Slug: "betting", Title: "Betting"},
		},
		wiki.PageTypeSource: {
			{Slug: "shape-up", Title: "Shape Up"},
		},
	}
	got, warnings := CanonicalizeLinks(pages, inventory)
	require.Len(t, got, 1)
	assert.Empty(t, warnings)
	assert.Contains(t, got[0].Content, "[[betting|Betting]]")
	assert.Contains(t, got[0].Content, "[[shape-up|Shape Up]]")
}
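The three tests above pin down the observable behaviour of `CanonicalizeLinks`: a bare `[[Display Name]]` is matched case-insensitively against known titles and rewritten to `[[slug|Display Name]]`, keeping the author's original casing as the display text. The implementation itself is not part of this section, so the following is only a minimal sketch of that rewrite under stated assumptions. The names `canonicalize`, `bareLinkRE`, and `slugByTitle` are hypothetical, and emitting a warning for an unresolved link is an assumption rather than confirmed behaviour.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// bareLinkRE matches [[Display Name]] links that do not contain a pipe yet.
// Links already in [[slug|Name]] form do not match and are left alone.
var bareLinkRE = regexp.MustCompile(`\[\[([^|\]]+)\]\]`)

// canonicalize rewrites [[Display Name]] to [[slug|Display Name]] using an
// index keyed by lower-cased title. Unknown names stay as plain links and are
// reported as warnings. Hypothetical helper, not the project's CanonicalizeLinks.
func canonicalize(content string, slugByTitle map[string]string) (string, []string) {
	var warnings []string
	out := bareLinkRE.ReplaceAllStringFunc(content, func(link string) string {
		name := strings.TrimSpace(link[2 : len(link)-2])
		slug, ok := slugByTitle[strings.ToLower(name)]
		if !ok {
			warnings = append(warnings, "unresolved wikilink: "+name)
			return link
		}
		return "[[" + slug + "|" + name + "]]"
	})
	return out, warnings
}

func main() {
	index := map[string]string{"domain driven design": "domain-driven-design"}
	out, warnings := canonicalize("See [[domain driven design]] and [[Unknown]].", index)
	fmt.Println(out)
	fmt.Println(warnings)
}
```

To cover the current-batch case from the second test, the index would be built from both the inventory entries and the titles of pages created in the same response.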
@@ -5,13 +5,21 @@ import (
|
|||||||
"encoding/json"
|
"encoding/json"
|
||||||
"fmt"
|
"fmt"
|
||||||
"strings"
|
"strings"
|
||||||
|
|
||||||
"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
|
|
||||||
)
|
)
|
||||||
|
|
||||||
// ParsePages parses LLM output as a JSON array of {path, content} objects.
|
// RawPage is the LLM's output format — minimal structured data with no path or frontmatter.
|
||||||
|
// The pipeline derives slugs, paths, and frontmatter from these fields.
|
||||||
|
type RawPage struct {
|
||||||
|
Title string `json:"title"`
|
||||||
|
Type string `json:"type"` // "source" | "concept" | "entity"
|
||||||
|
Subtype string `json:"subtype"` // entity: person|company|tool|model|framework|technology; source: article|pdf|book|video|note|project
|
||||||
|
Domain string `json:"domain"`
|
||||||
|
Content string `json:"content"` // Markdown body only — no frontmatter
|
||||||
|
}
|
||||||
|
|
||||||
|
// ParseRawPages parses LLM output as a JSON array of RawPage objects.
|
||||||
// If the array is truncated mid-object (token limit), it salvages all complete objects.
|
// If the array is truncated mid-object (token limit), it salvages all complete objects.
|
||||||
func ParsePages(output string) ([]wiki.Page, []string) {
|
func ParseRawPages(output string) ([]RawPage, []string) {
|
||||||
output = strings.TrimSpace(output)
|
output = strings.TrimSpace(output)
|
||||||
if output == "" {
|
if output == "" {
|
||||||
return nil, []string{"LLM returned empty output"}
|
return nil, []string{"LLM returned empty output"}
|
||||||
@@ -19,7 +27,7 @@ func ParsePages(output string) ([]wiki.Page, []string) {
|
|||||||
|
|
||||||
output = stripFences(output)
|
output = stripFences(output)
|
||||||
|
|
||||||
var pages []wiki.Page
|
var pages []RawPage
|
||||||
if err := json.Unmarshal([]byte(output), &pages); err == nil {
|
if err := json.Unmarshal([]byte(output), &pages); err == nil {
|
||||||
return pages, nil
|
return pages, nil
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -8,39 +8,54 @@ import (
|
|||||||
"github.com/stretchr/testify/require"
|
"github.com/stretchr/testify/require"
|
||||||
)
|
)
|
||||||
|
|
||||||
func TestParsePages_ValidJSON(t *testing.T) {
|
func TestParseRawPages_ValidJSON(t *testing.T) {
|
||||||
input := `[{"path":"wiki/sources/foo.md","content":"# Foo"},{"path":"wiki/concepts/bar.md","content":"# Bar"}]`
|
input := `[{"title":"Shape Up","type":"source","subtype":"book","domain":"product-strategy","content":"## Summary\n\nFoo."},{"title":"Betting","type":"concept","content":"## Definition\n\nA technique."}]`
|
||||||
pages, warnings := ParsePages(input)
|
pages, warnings := ParseRawPages(input)
|
||||||
require.Len(t, pages, 2)
|
require.Len(t, pages, 2)
|
||||||
assert.Empty(t, warnings)
|
assert.Empty(t, warnings)
|
||||||
assert.Equal(t, "wiki/sources/foo.md", pages[0].Path)
|
assert.Equal(t, "Shape Up", pages[0].Title)
|
||||||
assert.Equal(t, "wiki/concepts/bar.md", pages[1].Path)
|
assert.Equal(t, "source", pages[0].Type)
|
||||||
|
assert.Equal(t, "book", pages[0].Subtype)
|
||||||
|
assert.Equal(t, "product-strategy", pages[0].Domain)
|
||||||
|
assert.Equal(t, "Betting", pages[1].Title)
|
||||||
|
assert.Equal(t, "concept", pages[1].Type)
|
||||||
|
assert.Empty(t, pages[1].Subtype)
|
||||||
}
|
}
|
||||||
|
|
||||||
func TestParsePages_StripsFences(t *testing.T) {
|
func TestParseRawPages_StripsFences(t *testing.T) {
|
||||||
input := "```json\n[{\"path\":\"wiki/sources/foo.md\",\"content\":\"# Foo\"}]\n```"
|
input := "```json\n[{\"title\":\"Foo\",\"type\":\"concept\",\"content\":\"## Definition\\n\\nFoo.\"}]\n```"
|
||||||
pages, warnings := ParsePages(input)
|
pages, warnings := ParseRawPages(input)
|
||||||
assert.Len(t, pages, 1)
|
|
||||||
assert.Empty(t, warnings)
|
|
||||||
}
|
|
||||||
|
|
||||||
func TestParsePages_TruncationRecovery(t *testing.T) {
|
|
||||||
input := `[{"path":"wiki/sources/foo.md","content":"# Foo"},{"path":"wiki/concepts/bar.md","content":"trunc`
|
|
||||||
pages, warnings := ParsePages(input)
|
|
||||||
require.Len(t, pages, 1)
|
require.Len(t, pages, 1)
|
||||||
assert.Equal(t, "wiki/sources/foo.md", pages[0].Path)
|
assert.Empty(t, warnings)
|
||||||
|
assert.Equal(t, "Foo", pages[0].Title)
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestParseRawPages_TruncationRecovery(t *testing.T) {
|
||||||
|
input := `[{"title":"Foo","type":"concept","content":"## Definition\n\nFoo."},{"title":"Bar","type":"concept","content":"trunc`
|
||||||
|
pages, warnings := ParseRawPages(input)
|
||||||
|
require.Len(t, pages, 1)
|
||||||
|
assert.Equal(t, "Foo", pages[0].Title)
|
||||||
assert.NotEmpty(t, warnings)
|
assert.NotEmpty(t, warnings)
|
||||||
}
|
}
|
||||||
|
|
||||||
func TestParsePages_EmptyInput(t *testing.T) {
|
func TestParseRawPages_EmptyInput(t *testing.T) {
|
||||||
pages, warnings := ParsePages("")
|
pages, warnings := ParseRawPages("")
|
||||||
assert.Empty(t, pages)
|
assert.Empty(t, pages)
|
||||||
assert.NotEmpty(t, warnings)
|
assert.NotEmpty(t, warnings)
|
||||||
}
|
}
|
||||||
|
|
||||||
func TestParsePages_PlainFence(t *testing.T) {
|
func TestParseRawPages_PlainFence(t *testing.T) {
|
||||||
input := "```\n[{\"path\":\"wiki/sources/foo.md\",\"content\":\"ok\"}]\n```"
|
input := "```\n[{\"title\":\"Foo\",\"type\":\"concept\",\"content\":\"ok\"}]\n```"
|
||||||
pages, warnings := ParsePages(input)
|
pages, warnings := ParseRawPages(input)
|
||||||
assert.Len(t, pages, 1)
|
require.Len(t, pages, 1)
|
||||||
assert.Empty(t, warnings)
|
assert.Empty(t, warnings)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func TestParseRawPages_MissingTitle(t *testing.T) {
|
||||||
|
// Missing title — still parsed, Title is empty string
|
||||||
|
input := `[{"type":"concept","content":"## Definition\n\nFoo."}]`
|
||||||
|
pages, warnings := ParseRawPages(input)
|
||||||
|
require.Len(t, pages, 1)
|
||||||
|
assert.Empty(t, warnings)
|
||||||
|
assert.Empty(t, pages[0].Title)
|
||||||
|
}
|
||||||
|
|||||||
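The comment on `ParseRawPages` says that a JSON array truncated mid-object (for example at the token limit) is salvaged by keeping every complete object, and `TestParseRawPages_TruncationRecovery` asserts exactly that. The recovery code is outside this hunk, so the sketch below shows one way such salvage could work, using the streaming `json.Decoder` from the standard library. The names `salvagePages` and `rawPage` are hypothetical, and the real function may use a different strategy.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// rawPage mirrors a subset of the RawPage fields shown in the diff.
type rawPage struct {
	Title   string `json:"title"`
	Type    string `json:"type"`
	Content string `json:"content"`
}

// salvagePages decodes array elements one at a time and stops at the first
// element that fails to decode, returning everything decoded so far.
// Hypothetical helper, not the project's ParseRawPages.
func salvagePages(output string) ([]rawPage, []string) {
	dec := json.NewDecoder(strings.NewReader(output))
	if tok, err := dec.Token(); err != nil || tok != json.Delim('[') {
		return nil, []string{"output is not a JSON array"}
	}
	var pages []rawPage
	for dec.More() {
		var p rawPage
		if err := dec.Decode(&p); err != nil {
			msg := fmt.Sprintf("stopped after %d complete page(s): %v", len(pages), err)
			return pages, []string{msg}
		}
		pages = append(pages, p)
	}
	// A missing closing bracket means the output was cut between elements.
	if _, err := dec.Token(); err != nil {
		return pages, []string{"array was not closed"}
	}
	return pages, nil
}

func main() {
	input := `[{"title":"Foo","type":"concept","content":"ok"},{"title":"Bar","content":"trunc`
	pages, warnings := salvagePages(input)
	fmt.Println(len(pages), warnings)
}
```

Decoding element by element means a failure in the last object never discards the earlier ones, which matches the warning-plus-partial-result contract the test checks.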
@@ -41,9 +41,11 @@ func Run(ctx context.Context, cfg Config, brainDir, content, source string, dryR
|
|||||||
schema = loadSchema(brainDir)
|
schema = loadSchema(brainDir)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
sourceSlug := wiki.Slug(source)
|
||||||
|
date := time.Now().UTC().Format("2006-01-02")
|
||||||
chunks := Chunk(content, cfg.ChunkSize)
|
chunks := Chunk(content, cfg.ChunkSize)
|
||||||
|
|
||||||
var allPages []wiki.Page
|
var allRaw []RawPage
|
||||||
var allWarnings []string
|
var allWarnings []string
|
||||||
|
|
||||||
for _, chunk := range chunks {
|
for _, chunk := range chunks {
|
||||||
@@ -52,17 +54,20 @@ func Run(ctx context.Context, cfg Config, brainDir, content, source string, dryR
|
|||||||
if err != nil {
|
if err != nil {
|
||||||
return Result{}, fmt.Errorf("LLM call: %w", err)
|
return Result{}, fmt.Errorf("LLM call: %w", err)
|
||||||
}
|
}
|
||||||
pages, warnings := ParsePages(output)
|
raw, warnings := ParseRawPages(output)
|
||||||
allPages = append(allPages, pages...)
|
allRaw = append(allRaw, raw...)
|
||||||
allWarnings = append(allWarnings, warnings...)
|
allWarnings = append(allWarnings, warnings...)
|
||||||
}
|
}
|
||||||
|
|
||||||
resolved := Resolve(allPages, inventory)
|
pages, buildWarnings := BuildPages(allRaw, sourceSlug, date)
|
||||||
merged := mergeAll(resolved)
|
allWarnings = append(allWarnings, buildWarnings...)
|
||||||
|
resolved := Resolve(pages, inventory)
|
||||||
|
canonicalized, linkWarnings := CanonicalizeLinks(resolved, inventory)
|
||||||
|
allWarnings = append(allWarnings, linkWarnings...)
|
||||||
|
withRefs := injectSourceRefs(canonicalized, inventory, brainDir)
|
||||||
|
merged := mergeAll(withRefs)
|
||||||
|
|
||||||
date := time.Now().UTC().Format("2006-01-02")
|
|
||||||
var written []string
|
var written []string
|
||||||
|
|
||||||
for _, page := range merged {
|
for _, page := range merged {
|
||||||
if !dryRun {
|
if !dryRun {
|
||||||
dest := filepath.Join(brainDir, filepath.FromSlash(page.Path))
|
dest := filepath.Join(brainDir, filepath.FromSlash(page.Path))
|
||||||
|
|||||||
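`Run` now derives `sourceSlug` with `wiki.Slug(source)` and leaves slug computation to the pipeline rather than the LLM. The body of `wiki.Slug` is not shown in this diff. The sketch below follows the rule stated in the earlier schema text (lowercase, spaces to hyphens, strip everything that is not alphanumeric or a hyphen) and is an assumption about its behaviour. The names `slug` and `nonSlugRE` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nonSlugRE matches every run of characters that may not appear in a slug.
var nonSlugRE = regexp.MustCompile(`[^a-z0-9-]+`)

// slug lower-cases the title, turns spaces into hyphens, and strips all
// remaining characters other than a-z, 0-9 and '-'.
// Hypothetical stand-in for wiki.Slug.
func slug(title string) string {
	s := strings.ToLower(strings.TrimSpace(title))
	s = strings.ReplaceAll(s, " ", "-")
	return nonSlugRE.ReplaceAllString(s, "")
}

func main() {
	fmt.Println(slug("Shape Up"))
	fmt.Println(slug("Domain Driven Design"))
}
```

Keeping this logic in one place is what lets the prompt drop its kebab-case rule: the model supplies titles only, and every path and wikilink target comes from the same function.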
@@ -15,7 +15,6 @@ import (
|
|||||||
"github.com/stretchr/testify/require"
|
"github.com/stretchr/testify/require"
|
||||||
|
|
||||||
"github.com/mathiasbq/hyperguild/ingestion/internal/llm"
|
"github.com/mathiasbq/hyperguild/ingestion/internal/llm"
|
||||||
"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
|
|
||||||
)
|
)
|
||||||
|
|
||||||
func TestRun_WritesPages(t *testing.T) {
|
func TestRun_WritesPages(t *testing.T) {
|
||||||
@@ -24,14 +23,19 @@ func TestRun_WritesPages(t *testing.T) {
|
|||||||
require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755))
|
require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755))
|
||||||
}
|
}
|
||||||
|
|
||||||
llmResponse := mustJSON([]wiki.Page{
|
llmResponse := mustJSON([]RawPage{
|
||||||
{
|
{
|
||||||
Path: "wiki/sources/test-article.md",
|
Title: "Test Article",
|
||||||
Content: "---\ntitle: Test Article\ntype: article\ndomain: software-engineering\ndate_ingested: 2026-04-22\nlast_updated: 2026-04-22\naliases:\n - Test Article\n---\n\n## Summary\n\nA test article.\n\n## Key Claims\n\n- It tests things.\n\n## Concepts Introduced or Reinforced\n\n## Entities Mentioned\n\n## Open Questions Raised\n",
|
Type: "source",
|
||||||
|
Subtype: "article",
|
||||||
|
Domain: "software-engineering",
|
||||||
|
Content: "## Summary\n\nA test article.\n\n## Key Claims\n\n- It tests things.\n\n## Concepts Introduced or Reinforced\n\n[[Testing]]\n\n## Entities Mentioned\n\n## Open Questions Raised\n",
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
Path: "wiki/concepts/testing.md",
|
Title: "Testing",
|
||||||
Content: "---\ntitle: Testing\ndomain: software-engineering\nlast_updated: 2026-04-22\naliases:\n - Testing\n---\n\n## Definition\n\nThe practice of verifying software.\n\n## Why It Matters\n\nCatches bugs.\n\n## Related Concepts\n\n## Related Entities\n\n## Sources\n\n## Evolving Notes\n",
|
Type: "concept",
|
||||||
|
Domain: "software-engineering",
|
||||||
|
Content: "## Definition\n\nThe practice of verifying software.\n\n## Why It Matters\n\nCatches bugs.\n\n## Related Concepts\n\n## Related Entities\n\n## Sources\n\n## Evolving Notes\n",
|
||||||
},
|
},
|
||||||
})
|
})
|
||||||
|
|
||||||
@@ -53,7 +57,6 @@ func TestRun_WritesPages(t *testing.T) {
|
|||||||
result, err := Run(context.Background(), cfg, brainDir, "An article about testing.", "test-article", false)
|
result, err := Run(context.Background(), cfg, brainDir, "An article about testing.", "test-article", false)
|
||||||
require.NoError(t, err)
|
require.NoError(t, err)
|
||||||
assert.Len(t, result.Pages, 2)
|
assert.Len(t, result.Pages, 2)
|
||||||
assert.Empty(t, result.Warnings)
|
|
||||||
|
|
||||||
_, err = os.Stat(filepath.Join(brainDir, "wiki", "sources", "test-article.md"))
|
_, err = os.Stat(filepath.Join(brainDir, "wiki", "sources", "test-article.md"))
|
||||||
require.NoError(t, err)
|
require.NoError(t, err)
|
||||||
@@ -71,9 +74,11 @@ func TestRun_DryRunDoesNotWrite(t *testing.T) {
|
|||||||
require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755))
|
require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755))
|
||||||
}
|
}
|
||||||
|
|
||||||
llmResponse := mustJSON([]wiki.Page{{
|
llmResponse := mustJSON([]RawPage{{
|
||||||
Path: "wiki/sources/foo.md",
|
Title: "Foo",
|
||||||
Content: "---\ntitle: Foo\n---\n\n## Summary\n\nFoo.\n",
|
Type: "source",
|
||||||
|
Subtype: "article",
|
||||||
|
Content: "## Summary\n\nFoo.\n",
|
||||||
}})
|
}})
|
||||||
|
|
||||||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||||
@@ -98,10 +103,10 @@ func TestRun_MergesDuplicatePaths(t *testing.T) {
|
|||||||
require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755))
|
require.NoError(t, os.MkdirAll(filepath.Join(brainDir, sub), 0o755))
|
||||||
}
|
}
|
||||||
|
|
||||||
// LLM returns same path twice (simulates multi-chunk merge)
|
// LLM returns same title twice (simulates multi-chunk duplicate)
|
||||||
llmResponse := mustJSON([]wiki.Page{
|
llmResponse := mustJSON([]RawPage{
|
||||||
{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Definition\n\nFirst.\n\n## Related Concepts\n\n- [[bar|Bar]]\n"},
|
{Title: "Foo", Type: "concept", Content: "## Definition\n\nFirst.\n\n## Related Concepts\n\n[[Bar]]\n"},
|
||||||
{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Definition\n\nSecond.\n\n## Related Concepts\n\n- [[baz|Baz]]\n"},
|
{Title: "Foo", Type: "concept", Content: "## Definition\n\nSecond.\n\n## Related Concepts\n\n[[Baz]]\n"},
|
||||||
})
|
})
|
||||||
|
|
||||||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||||
@@ -120,8 +125,9 @@ func TestRun_MergesDuplicatePaths(t *testing.T) {
|
|||||||
require.NoError(t, err)
|
require.NoError(t, err)
|
||||||
// keep-first for Definition, union for Related Concepts
|
// keep-first for Definition, union for Related Concepts
|
||||||
assert.Contains(t, string(content), "First.")
|
assert.Contains(t, string(content), "First.")
|
||||||
assert.Contains(t, string(content), "[[bar|Bar]]")
|
// Bar and Baz unknown in empty inventory → left as plain [[links]]
|
||||||
assert.Contains(t, string(content), "[[baz|Baz]]")
|
assert.Contains(t, string(content), "[[Bar]]")
|
||||||
|
assert.Contains(t, string(content), "[[Baz]]")
|
||||||
}
|
}
|
||||||
|
|
||||||
func mustJSON(v any) string {
|
func mustJSON(v any) string {
|
||||||
|
|||||||
@@ -12,12 +12,15 @@ import (
|
|||||||
const systemPrompt = `You are a wiki agent. Read the source material and produce structured wiki pages following the schema provided.
|
const systemPrompt = `You are a wiki agent. Read the source material and produce structured wiki pages following the schema provided.
|
||||||
|
|
||||||
Output ONLY a valid JSON array — no markdown fences, no other text before or after.
|
Output ONLY a valid JSON array — no markdown fences, no other text before or after.
|
||||||
Each element must have:
|
Each element must have exactly these fields:
|
||||||
"path" — relative path within the wiki, e.g. "wiki/sources/foo.md"
|
"title" — exact page title (e.g. "FinBERT", "Ryan Singer", "Shape Up")
|
||||||
"content" — full markdown content of the page including YAML frontmatter
|
"type" — exactly one of: "source", "concept", "entity"
|
||||||
|
"subtype" — for source: article|pdf|book|video|note|project; for entity: person|company|tool|model|framework|technology; omit for concept
|
||||||
|
"domain" — one of the domains in the schema (omit if none fits)
|
||||||
|
"content" — Markdown body only — NO frontmatter, NO path, NO slug
|
||||||
|
|
||||||
Follow the schema strictly: correct frontmatter fields, wikilinks as [[slug|Display Text]],
|
Wikilinks in content: [[Display Name]] — just the display name, no slug, no pipe separator.
|
||||||
dates in YYYY-MM-DD format, and paraphrase rather than quoting verbatim.`
|
Only link to pages listed in the inventory or pages you are creating in this response.`
|
||||||
|
|
||||||
// BuildPrompt constructs the user prompt for a single chunk.
|
// BuildPrompt constructs the user prompt for a single chunk.
|
||||||
func BuildPrompt(schema, source, content string, inventory map[wiki.PageType][]wiki.Entry) string {
|
func BuildPrompt(schema, source, content string, inventory map[wiki.PageType][]wiki.Entry) string {
|
||||||
@@ -30,7 +33,7 @@ func BuildPrompt(schema, source, content string, inventory map[wiki.PageType][]w
|
|||||||
sb.WriteString("\n\n")
|
sb.WriteString("\n\n")
|
||||||
|
|
||||||
sb.WriteString("## Existing wiki pages\n\n")
|
sb.WriteString("## Existing wiki pages\n\n")
|
||||||
sb.WriteString("Link ONLY to pages in this inventory or pages you are creating in this response.\n\n")
|
sb.WriteString("Reference these pages by display name only — [[Display Name]] — in your content.\n\n")
|
||||||
|
|
||||||
for _, pt := range []wiki.PageType{wiki.PageTypeConcept, wiki.PageTypeEntity, wiki.PageTypeSource} {
|
for _, pt := range []wiki.PageType{wiki.PageTypeConcept, wiki.PageTypeEntity, wiki.PageTypeSource} {
|
||||||
entries := inventory[pt]
|
entries := inventory[pt]
|
||||||
@@ -39,19 +42,19 @@ func BuildPrompt(schema, source, content string, inventory map[wiki.PageType][]w
|
|||||||
fmt.Fprintf(&sb, "%s — (none yet)\n\n", label)
|
fmt.Fprintf(&sb, "%s — (none yet)\n\n", label)
|
||||||
continue
|
continue
|
||||||
}
|
}
|
||||||
fmt.Fprintf(&sb, "%s — link ONLY under the matching section:\n", label)
|
fmt.Fprintf(&sb, "%s:\n", label)
|
||||||
for _, e := range entries {
|
for _, e := range entries {
|
||||||
fmt.Fprintf(&sb, " - [[%s|%s]]\n", e.Slug, e.Title)
|
fmt.Fprintf(&sb, " - %s\n", e.Title)
|
||||||
}
|
}
|
||||||
sb.WriteString("\n")
|
sb.WriteString("\n")
|
||||||
}
|
}
|
||||||
|
|
||||||
sb.WriteString("## Non-negotiable rules\n\n")
|
sb.WriteString("## Non-negotiable rules\n\n")
|
||||||
sb.WriteString("1. Output ONLY a valid JSON array — no prose, no fences.\n")
|
sb.WriteString("1. Output ONLY a valid JSON array — no prose, no fences.\n")
|
||||||
sb.WriteString("2. Slugs are kebab-case: lowercase, spaces→hyphens, no special chars.\n")
|
sb.WriteString("2. Fields: title, type, subtype (if applicable), domain (if applicable), content.\n")
|
||||||
sb.WriteString("3. Wikilinks: [[slug|Display Text]] — the pipe is required.\n")
|
sb.WriteString("3. Wikilinks: [[Display Name]] — no slug, no pipe. The pipeline handles slugs.\n")
|
||||||
sb.WriteString("4. Section links must match their section type.\n")
|
sb.WriteString("4. Section links must match their section type (Related Concepts → concepts only, etc.).\n")
|
||||||
sb.WriteString("5. One source page per book — update it if inventory shows it exists.\n\n")
|
sb.WriteString("5. One source page per book — if inventory shows it exists, return it as an UPDATE.\n\n")
|
||||||
|
|
||||||
fmt.Fprintf(&sb, "## Source: %s\n\n", source)
|
fmt.Fprintf(&sb, "## Source: %s\n\n", source)
|
||||||
sb.WriteString(content)
|
sb.WriteString(content)
|
||||||
|
|||||||
115
ingestion/internal/pipeline/refs.go
Normal file
@@ -0,0 +1,115 @@
// ingestion/internal/pipeline/refs.go
package pipeline

import (
	"os"
	"path/filepath"
	"regexp"
	"strings"

	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
)

var wikilinkRE = regexp.MustCompile(`\[\[([^|\]]+)\|`)

// injectSourceRefs finds the source page in the proposed batch, extracts its
// wikilinks, and injects a back-reference into every linked concept or entity page.
// Pages that exist on disk but are not in the current batch are loaded and
// appended so they will be updated on write.
func injectSourceRefs(pages []wiki.Page, inventory map[wiki.PageType][]wiki.Entry, brainDir string) []wiki.Page {
	sourceSlug, sourceTitle, found := findSourcePage(pages)
	if !found {
		return pages
	}

	var sourceContent string
	for _, p := range pages {
		if strings.HasPrefix(p.Path, "wiki/sources/") &&
			strings.TrimSuffix(filepath.Base(p.Path), ".md") == sourceSlug {
			sourceContent = p.Content
			break
		}
	}

	linkedSlugs := extractWikilinks(sourceContent)
	sourceRef := "- [[" + sourceSlug + "|" + sourceTitle + "]]"

	bySlug := make(map[string]int, len(pages))
	for i, p := range pages {
		if !strings.HasPrefix(p.Path, "wiki/sources/") {
			bySlug[strings.TrimSuffix(filepath.Base(p.Path), ".md")] = i
		}
	}

	for slug := range linkedSlugs {
		if slug == sourceSlug {
			continue
		}
		if idx, ok := bySlug[slug]; ok {
			pages[idx] = addSourceRef(pages[idx], sourceRef)
			continue
		}
		pt, ok := findInInventory(slug, inventory)
		if !ok {
			continue
		}
		diskPath := filepath.Join(brainDir, "wiki", string(pt), slug+".md")
		b, err := os.ReadFile(diskPath)
		if err != nil {
			continue
		}
		page := wiki.Page{
			Path:    "wiki/" + string(pt) + "/" + slug + ".md",
			Content: string(b),
		}
		pages = append(pages, addSourceRef(page, sourceRef))
	}

	return pages
}

// addSourceRef injects sourceRef into the ## Sources bullet section of page
// using wiki.Merge, which deduplicates bullets automatically.
func addSourceRef(page wiki.Page, sourceRef string) wiki.Page {
	patch := wiki.Page{
		Path:    page.Path,
		Content: "\n## Sources\n\n" + sourceRef + "\n",
	}
	return wiki.Merge(page, patch)
}

// extractWikilinks returns the set of slugs referenced as [[slug|...]] in content.
func extractWikilinks(content string) map[string]bool {
	slugs := make(map[string]bool)
	for _, m := range wikilinkRE.FindAllStringSubmatch(content, -1) {
		slugs[m[1]] = true
	}
	return slugs
}

// findSourcePage returns the slug and title of the first wiki/sources/ page in pages.
func findSourcePage(pages []wiki.Page) (slug, title string, found bool) {
	for _, p := range pages {
		if strings.HasPrefix(p.Path, "wiki/sources/") {
			slug = strings.TrimSuffix(filepath.Base(p.Path), ".md")
			title = extractTitle(p.Content)
			if title == "" {
				title = slug
			}
			return slug, title, true
		}
	}
	return "", "", false
}

// findInInventory returns the PageType for a slug if it appears in the inventory.
func findInInventory(slug string, inventory map[wiki.PageType][]wiki.Entry) (wiki.PageType, bool) {
	for pt, entries := range inventory {
		for _, e := range entries {
			if e.Slug == slug {
				return pt, true
			}
		}
	}
	return "", false
}
172
ingestion/internal/pipeline/refs_test.go
Normal file
@@ -0,0 +1,172 @@
// ingestion/internal/pipeline/refs_test.go
package pipeline

import (
	"os"
	"path/filepath"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"

	"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
)

func makeInventory(concepts, entities []string) map[wiki.PageType][]wiki.Entry {
	inv := map[wiki.PageType][]wiki.Entry{
		wiki.PageTypeConcept: {},
		wiki.PageTypeEntity:  {},
		wiki.PageTypeSource:  {},
	}
	for _, slug := range concepts {
		inv[wiki.PageTypeConcept] = append(inv[wiki.PageTypeConcept], wiki.Entry{Slug: slug, Title: slug})
	}
	for _, slug := range entities {
		inv[wiki.PageTypeEntity] = append(inv[wiki.PageTypeEntity], wiki.Entry{Slug: slug, Title: slug})
	}
	return inv
}

func TestInjectSourceRefs_NoSourcePage(t *testing.T) {
	pages := []wiki.Page{
		{Path: "wiki/concepts/foo.md", Content: "---\ntitle: Foo\n---\n\n## Definition\n\nFoo.\n"},
	}
	got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
	assert.Equal(t, pages, got)
}

func TestInjectSourceRefs_InjectsIntoProposedConcept(t *testing.T) {
	pages := []wiki.Page{
		{
			Path:    "wiki/sources/my-article.md",
			Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSee [[domain-driven-design|Domain Driven Design]].\n",
		},
		{
			Path:    "wiki/concepts/domain-driven-design.md",
			Content: "---\ntitle: Domain Driven Design\n---\n\n## Definition\n\nA methodology.\n",
		},
	}

	got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())

	require.Len(t, got, 2)
	assert.Contains(t, got[1].Content, "## Sources")
	assert.Contains(t, got[1].Content, "[[my-article|My Article]]")
}

func TestInjectSourceRefs_LoadsConceptFromDisk(t *testing.T) {
	brainDir := t.TempDir()
	conceptDir := filepath.Join(brainDir, "wiki", "concepts")
	require.NoError(t, os.MkdirAll(conceptDir, 0o755))
	require.NoError(t, os.WriteFile(
		filepath.Join(conceptDir, "shape-up.md"),
		[]byte("---\ntitle: Shape Up\n---\n\n## Definition\n\nA methodology.\n"),
		0o644,
	))

	pages := []wiki.Page{
		{
			Path:    "wiki/sources/my-article.md",
			Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSee [[shape-up|Shape Up]].\n",
		},
	}
	inv := makeInventory([]string{"shape-up"}, nil)

	got := injectSourceRefs(pages, inv, brainDir)

	require.Len(t, got, 2)
	var conceptPage wiki.Page
	for _, p := range got {
		if p.Path == "wiki/concepts/shape-up.md" {
			conceptPage = p
		}
	}
	assert.Contains(t, conceptPage.Content, "## Sources")
	assert.Contains(t, conceptPage.Content, "[[my-article|My Article]]")
	assert.Contains(t, conceptPage.Content, "## Definition")
}

func TestInjectSourceRefs_NoSelfReference(t *testing.T) {
	pages := []wiki.Page{
		{
			Path:    "wiki/sources/my-article.md",
			Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSelf-link [[my-article|My Article]].\n",
		},
	}

	got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())
	assert.Len(t, got, 1)
}

func TestInjectSourceRefs_DeduplicatesOnReingestion(t *testing.T) {
	pages := []wiki.Page{
		{
			Path:    "wiki/sources/my-article.md",
			Content: "---\ntitle: My Article\n---\n\n## Summary\n\nSee [[ddd|DDD]].\n",
		},
		{
			Path:    "wiki/concepts/ddd.md",
			Content: "---\ntitle: DDD\n---\n\n## Definition\n\nA thing.\n\n## Sources\n\n- [[my-article|My Article]]\n",
		},
	}

	got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())

	require.Len(t, got, 2)
	count := 0
	for _, line := range splitLines(got[1].Content) {
		if line == "- [[my-article|My Article]]" {
			count++
		}
	}
	assert.Equal(t, 1, count, "source ref should appear exactly once")
}

func TestInjectSourceRefs_InjectsIntoEntity(t *testing.T) {
	pages := []wiki.Page{
		{
			Path:    "wiki/sources/book.md",
			Content: "---\ntitle: Book\n---\n\n## Summary\n\nBy [[ryan-singer|Ryan Singer]].\n",
		},
		{
			Path:    "wiki/entities/ryan-singer.md",
			Content: "---\ntitle: Ryan Singer\n---\n\n## Description\n\nA designer.\n",
		},
	}

	got := injectSourceRefs(pages, makeInventory(nil, nil), t.TempDir())

	require.Len(t, got, 2)
	var entity wiki.Page
	for _, p := range got {
		if p.Path == "wiki/entities/ryan-singer.md" {
			entity = p
		}
	}
	assert.Contains(t, entity.Content, "[[book|Book]]")
}

func TestExtractWikilinks(t *testing.T) {
	content := "See [[foo|Foo]] and [[bar|Bar]] and [[foo|Foo again]]."
	got := extractWikilinks(content)
	assert.True(t, got["foo"])
	assert.True(t, got["bar"])
	assert.Len(t, got, 2, "duplicate slugs should be deduplicated")
}

func splitLines(s string) []string {
	var out []string
	start := 0
	for i := 0; i < len(s); i++ {
		if s[i] == '\n' {
			if line := s[start:i]; line != "" {
				out = append(out, line)
			}
			start = i + 1
		}
	}
	if last := s[start:]; last != "" {
		out = append(out, last)
	}
	return out
}
@@ -14,13 +14,12 @@ import (
|
|||||||
"github.com/stretchr/testify/require"
|
"github.com/stretchr/testify/require"
|
||||||
|
|
||||||
"github.com/mathiasbq/hyperguild/ingestion/internal/pipeline"
|
"github.com/mathiasbq/hyperguild/ingestion/internal/pipeline"
|
||||||
"github.com/mathiasbq/hyperguild/ingestion/internal/wiki"
|
|
||||||
)
|
)
|
||||||
|
|
||||||
// successComplete returns a valid JSON-encoded page array for any call.
|
// successComplete returns a valid JSON-encoded RawPage array for any call.
|
||||||
func successComplete(page wiki.Page) pipeline.CompleteFunc {
|
func successComplete(raw pipeline.RawPage) pipeline.CompleteFunc {
|
||||||
return func(ctx context.Context, system, user string) (string, error) {
|
return func(ctx context.Context, system, user string) (string, error) {
|
||||||
b, err := json.Marshal([]wiki.Page{page})
|
b, err := json.Marshal([]pipeline.RawPage{raw})
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return "", err
|
return "", err
|
||||||
}
|
}
|
||||||
@@ -50,16 +49,19 @@ func TestStart_ProcessesFile(t *testing.T) {
|
|||||||
require.NoError(t, os.WriteFile(rawFile, []byte("Content about Shape Up."), 0o644))
|
require.NoError(t, os.WriteFile(rawFile, []byte("Content about Shape Up."), 0o644))
|
||||||
|
|
||||||
date := time.Now().UTC().Format("2006-01-02")
|
date := time.Now().UTC().Format("2006-01-02")
|
||||||
wikiPage := wiki.Page{
|
rawPage := pipeline.RawPage{
|
||||||
Path: "wiki/sources/shape-up-book.md",
|
Title: "Shape Up Book",
|
||||||
Content: "---\ntitle: Shape Up Book\ntype: article\ndomain: product-management\ndate_ingested: " + date + "\nlast_updated: " + date + "\naliases:\n - Shape Up Book\n---\n\n## Summary\n\nA book about Shape Up.\n",
|
Type: "source",
|
||||||
|
Subtype: "article",
|
||||||
|
Domain: "product-management",
|
||||||
|
Content: "## Summary\n\nA book about Shape Up.\n",
|
||||||
}
|
}
|
||||||
|
|
||||||
cfg := Config{
|
cfg := Config{
|
||||||
BrainDir: brainDir,
|
BrainDir: brainDir,
|
||||||
Interval: 50 * time.Millisecond,
|
Interval: 50 * time.Millisecond,
|
||||||
Pipeline: pipeline.Config{
|
Pipeline: pipeline.Config{
|
||||||
Complete: successComplete(wikiPage),
|
Complete: successComplete(rawPage),
|
||||||
ChunkSize: 0,
|
ChunkSize: 0,
|
||||||
Schema: "# Schema\nThree page types.",
|
Schema: "# Schema\nThree page types.",
|
||||||
},
|
},
|
||||||
@@ -193,12 +195,14 @@ func TestProcessDir_SkipsSubdirs(t *testing.T) {
|
|||||||
// Track which sources were passed to Complete.
|
// Track which sources were passed to Complete.
|
||||||
var processedSources []string
|
var processedSources []string
|
||||||
completeFn := func(ctx context.Context, system, user string) (string, error) {
|
completeFn := func(ctx context.Context, system, user string) (string, error) {
|
||||||
// Record that this was called; return a minimal valid page.
|
// Record that this was called; return a minimal valid RawPage.
|
||||||
page := wiki.Page{
|
raw := pipeline.RawPage{
|
||||||
Path: "wiki/sources/valid.md",
|
Title: "Valid",
|
||||||
Content: "---\ntitle: Valid\n---\n\n## Summary\n\nValid.\n",
|
Type: "source",
|
||||||
|
Subtype: "article",
|
||||||
|
Content: "## Summary\n\nValid.\n",
|
||||||
}
|
}
|
||||||
b, _ := json.Marshal([]wiki.Page{page})
|
b, _ := json.Marshal([]pipeline.RawPage{raw})
|
||||||
processedSources = append(processedSources, "called")
|
processedSources = append(processedSources, "called")
|
||||||
return string(b), nil
|
return string(b), nil
|
||||||
}
|
}
|
||||||
|
|||||||