chore: bootstrap skills library — 19 skills + installer + CI auto-tag

Phase 1 of mathias/skills extraction (infra#62 Track D — homelab next-step plan addendum). Imports ~/dev/.skills/ verbatim (19 skill dirs + SKILLS_INDEX.md) and adds the installation surface: - Taskfile.yml — install / update / list / release / check targets - install.sh — bootstrap installer for hosts without Task. Idempotent symlink wirer; default checkout at ~/.local/share/skills/ on every host; SKILLS_REF env var pins a tag (default: main). - .gitea/workflows/release.yml — auto-tag every push to main by Bump-Type footer (major/minor/patch, default patch). Skipped when commit contains [skip-release]. - README — usage, versioning, contribution flow, secret-hygiene rule. Phase 1 wires Claude Code only (~/.claude/skills/<name> global + <repo>/.claude/skills/<name> per-repo). Phase 2 adds Crush, opencode, antigravity, and gitea-resident agents (cobalt-dingo, agentsquad) once their skill conventions are researched. Public repo, markdown-only — no secrets, no client names. Verified via pre-push grep before initial push. [skip-release]
2026-05-24 14:59:54 +02:00
commit fb464d03d4
33 changed files with 8688 additions and 0 deletions
--- a/.gitea/workflows/release.yml
+++ b/.gitea/workflows/release.yml
@@ -0,0 +1,68 @@
 name: release
 # Auto-tag every push to main with a semver bump derived from the latest
 # commit footer:
 #   Bump-Type: major  → vX.0.0
 #   Bump-Type: minor  → v0.Y.0
 #   (default)         → v0.0.Z (patch)
 #
 # Skips when the latest commit is itself the auto-tagger (commit message
 # contains "[skip-release]") to prevent recursion. Tag is pushed back
 # to the repo using the GITEA_TOKEN provided by act_runner.
 on:
  push:
    branches:
      - main
 jobs:
  tag:
    runs-on: ubuntu-latest
    steps:
      - name: checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          token: ${{ secrets.GITEA_TOKEN }}
      - name: derive next version
        id: bump
        run: |
          set -euo pipefail
          last_commit=$(git log -1 --pretty=%B)
          if printf '%s' "$last_commit" | grep -qi '\[skip-release\]'; then
            echo "skipping — commit marked [skip-release]"
            echo "skip=true" >> "$GITHUB_OUTPUT"
            exit 0
          fi
          latest=$(git describe --tags --abbrev=0 2>/dev/null || echo "v0.0.0")
          major=$(echo "$latest" | sed -E 's/^v([0-9]+)\.([0-9]+)\.([0-9]+)$/\1/')
          minor=$(echo "$latest" | sed -E 's/^v([0-9]+)\.([0-9]+)\.([0-9]+)$/\2/')
          patch=$(echo "$latest" | sed -E 's/^v([0-9]+)\.([0-9]+)\.([0-9]+)$/\3/')
          bump=$(printf '%s' "$last_commit" | grep -iE '^Bump-Type:' | head -1 | awk '{print tolower($2)}')
          case "$bump" in
            major) major=$((major + 1)); minor=0; patch=0 ;;
            minor) minor=$((minor + 1)); patch=0 ;;
            *)     patch=$((patch + 1)) ;;
          esac
          next="v${major}.${minor}.${patch}"
          echo "previous=$latest"
          echo "bump=${bump:-patch}"
          echo "next=$next"
          echo "tag=$next" >> "$GITHUB_OUTPUT"
          echo "skip=false" >> "$GITHUB_OUTPUT"
      - name: tag + push
        if: steps.bump.outputs.skip != 'true'
        env:
          TAG: ${{ steps.bump.outputs.tag }}
          GITEA_ACTOR: ${{ gitea.actor }}
          GITEA_TOKEN: ${{ secrets.GITEA_TOKEN }}
        run: |
          set -euo pipefail
          git config user.name "${GITEA_ACTOR}"
          git config user.email "${GITEA_ACTOR}@noreply.gitea.d-ma.be"
          git tag -a "$TAG" -m "release $TAG"
          # gitea-actions checkout uses an x-access-token URL; reuse it.
          git push origin "$TAG"
          echo "pushed $TAG"
--- a/README.md
+++ b/README.md
@@ -0,0 +1,81 @@
 # skills
 Engineering skills library — markdown `SKILL.md` files that plug into any
 project or AI-coding harness. Installed via `task skills:install` in
 consumer projects, or directly by `install.sh` on hosts without Task.
 ## What is a skill?
 A skill is a single directory containing one or more markdown files. The
 canonical entrypoint is `<name>/SKILL.md`. Additional reference material
 lives alongside in the same directory (e.g.
 `clean-code/references/code-smells.md`). Harnesses load skills on
 demand — keep `SKILL.md` short and triggerable; push depth into
 references.
 See [`SKILLS_INDEX.md`](./SKILLS_INDEX.md) for the full catalogue with
 descriptions and "use when" triggers.
 ## Install (consumer project)
 The expected pattern: each consumer project's `Taskfile.yml` has a
 `skills:install` target that delegates here. From a project root:
 ```bash
 task skills:install
 ```
 Bare-shell fallback for hosts without Task installed:
 ```bash
 curl -fsSL https://gitea.d-ma.be/mathias/skills/raw/branch/main/install.sh | bash
 ```
 Both forms are idempotent and re-runnable. On every run they:
 1. `git clone` (or `git pull`) this repo into `~/.local/share/skills/` —
   the **canonical local checkout** on every host.
 2. Wire per-skill symlinks into the harnesses found on the host (see
   [Wired harnesses](#wired-harnesses)).
 Re-run after `git pull` upstream to pick up new skills or version
 changes. CI tags every change as a `vX.Y.Z` release (see
 [Versioning](#versioning)).
 ## Wired harnesses
 Phase 1 (current) wires **Claude Code** only:
 - Global: `~/.claude/skills/<name>` → `~/.local/share/skills/<name>`
 - Per-repo: `<repo>/.claude/skills/<name>` (when invoked from a git
  repo) → same
 Phase 2 will add: Crush, opencode, antigravity, gitea-resident agents
 (cobalt-dingo, agentsquad). See `mathias/infra` issue `infra#62`
 addendum for the roadmap.
 ## Versioning
 Every commit to `main` is auto-tagged as a semver release by the
 `.gitea/workflows/release.yml` action. Bump direction follows the commit
 footer:
 - `Bump-Type: major` → `X.0.0`
 - `Bump-Type: minor` → `0.Y.0`
 - *anything else* → `0.0.Z` (patch, the default)
 Consumer projects can pin a version by setting `SKILLS_REF=v0.1.0`
 before `task skills:install`; default is `main` for always-current.
 ## Contributing a skill
 1. Add a directory `<short-kebab-name>/`.
 2. Create `<short-kebab-name>/SKILL.md` — start with one-paragraph
   purpose, then triggers ("use when …"), then mechanics.
 3. Add a row to `SKILLS_INDEX.md` (description + use-when).
 4. Commit directly to `main`. Conventional-commit prefix
   (`feat:` / `fix:` / `docs:`). Add `Bump-Type: minor` footer when
   adding a new skill or expanding scope; default patch covers edits.
 This repo is markdown-only and public — never include secrets, client
 names, or anything under NDA in a skill.
--- a/SKILLS_INDEX.md
+++ b/SKILLS_INDEX.md
@@ -0,0 +1,91 @@
 # Engineering Skills Index
 This index lists all available engineering skills. Load the full SKILL.md on demand.
 | Skill | Description | Use when |
 |-------|-------------|----------|
 | `tdd` | Test-Driven Development enforcement | Starting any new feature or fixing a bug |
 | `clean-code` | Clean code principles with Go adaptation | Code review, refactoring, new code |
 | `solid` | SOLID principles with Go examples | Design review, architecture decisions |
 | `code-review` | Code smell detection + Go-specific checklist | Pre-merge review |
 | `refactoring` | Systematic refactoring methodology (Fowler) | When improving existing code |
 | `test-design` | Dave Farley test design scoring (Farley Index) | Reviewing or writing tests |
 | `cognitive-load` | Cognitive complexity analysis (CLI score) | Identifying hotspots, before refactoring |
 | `problem-analysis` | Deep problem understanding before coding | Starting any non-trivial task |
 | `user-stories` | Elephant Carpaccio story slicing | Breaking down features |
 | `atdd` | Acceptance Test-Driven Development workflow | Feature development with acceptance criteria |
 | `spec-driven-dev` | Spec-driven development for PMs | Writing specs and requirements |
 | `planning` | Planning and task breakdown | Sprint planning, task decomposition |
 | `gitea-ci` | Gitea CI/CD pipeline with act_runner, buildah, k3s, GitHub mirror | Setting up or debugging CI on this infra |
 | `debug` | Hypothesis-first debugging discipline | Any test failure or bug whose cause is not immediately obvious |
 | `feature-spec` | Implementation-level feature spec (tighter than spec-driven-dev) | Scoping one feature inside an already-specced project |
 | `playwright` | Playwright E2E testing — setup, HTMX patterns, SSE streams, CI wiring | Adding browser tests to any project with a web UI |
 | `session-retrospective` | Surface non-obvious learnings from a session log | After a coding session ends, before context is lost |
 | `trainer` | Two-phase brain curation (score candidates, write past quality gate) | Periodic brain pruning and curation |
 | `grill-me` | Structured plan interrogation — Quick Poke, Full Grill, Pre-mortem | Stress-testing a plan before committing; end of Diamond 1; before promoting to pre-prod |
 ## Wiring into tools
 The canonical home is `~/dev/.skills/`. Per-tool surfacing is handled by
 `~/dev/.context/wire-skills.sh`:
 - **Claude Code** — each git repo under `~/dev/` gets a `<repo>/.claude/skills`
  symlink pointing at `~/dev/.skills/`. The symlink is excluded by
  `**/.claude/skills` in `~/.config/git/ignore`, so it never lands in commits.
 - **Crush** — per-skill symlinks under `~/.config/crush/skills/<name>`, added
  alongside existing entries (e.g., `mstack/*`). Skills already present as
  real directories or non-canonical symlinks are preserved.
 Run `bash ~/dev/.context/wire-skills.sh --dry-run` to preview, then drop the
 flag to apply. The script is idempotent.
 ## Skill Relationships
 ```
 problem-analysis
    │
    └── user-stories
            │
            ├── spec-driven-dev (PM-kickoff lifecycle)
            │       │
            │       └── feature-spec (per-feature, mid-implementation)
            │               │
            │               └── planning
            │                       │
            │                       └── atdd ──→ tdd (foundation)
            │                                ├──→ debug (when tests fail)
            │                                ├──→ clean-code (REFACTOR phase)
            │                                └──→ test-design (review)
            │
            └── planning
 code-review ──→ refactoring ──→ clean-code
                                    └──→ solid
 (post-session)
 session-retrospective ──→ trainer ──→ brain (via brain_write)
 (planning & convergence)
 grill-me ──→ planning (after plan is sharpened)
         ──→ spec-driven-dev (after hypothesis is validated)
 ```
 ## Quick Triggers
 | Situation | Load |
 |-----------|------|
 | "Build this feature" | `problem-analysis` → `spec-driven-dev` → `planning` → `atdd` |
 | "Fix this bug" | `tdd` |
 | "Review this code" | `code-review` |
 | "Refactor this" | `refactoring` + `clean-code` |
 | "Design this system" | `solid` + `problem-analysis` |
 | "Break this down into stories" | `user-stories` |
 | "Write a spec" | `spec-driven-dev` |
 | "This code is hard to understand" | `cognitive-load` |
 | "Review these tests" | `test-design` |
 | "Set up CI for this project" | `gitea-ci` |
 | "CI pipeline is failing / not triggering" | `gitea-ci` |
 | "Commit directly to main? Or branch?" | TBD: commit to main unless parallel agents are active (see `gitea-ci` SKILL §Trunk-Based Development) |
 | "Grill me / stress-test this / is this ready?" | `grill-me` |
 | "End of Diamond 1 — should we build this?" | `grill-me` (Full Grill) |
 | "About to promote to pre-prod" | `grill-me` (Pre-mortem) |
--- a/Taskfile.yml
+++ b/Taskfile.yml
@@ -0,0 +1,130 @@
 version: '3'
 # Skills library Taskfile.
 #
 # Primary entry point for project + host installation. Consumer projects
 # do NOT depend on this file directly — they ship their own
 # `skills:install` target that delegates here via `install.sh`. This
 # Taskfile is invoked by maintainers of the skills repo itself, plus by
 # install.sh during the post-clone wire step.
 vars:
  REPO_URL: 'https://gitea.d-ma.be/mathias/skills.git'
  CHECKOUT_DIR: '{{.HOME}}/.local/share/skills'
  CLAUDE_GLOBAL_DIR: '{{.HOME}}/.claude/skills'
 tasks:
  default:
    cmds:
      - task: list
  list:
    desc: 'Print every skill in this repo'
    cmds:
      - 'ls -1 {{.TASKFILE_DIR}} | grep -Ev "^(Taskfile.yml|install.sh|README.md|SKILLS_INDEX.md|.gitea|.git$)$" || true'
    silent: true
  install:
    desc: 'Wire skills into every harness detected on this host (Claude Code today, more in phase 2).'
    cmds:
      - task: install:claude:global
      - task: install:claude:repo
  install:claude:global:
    desc: 'Per-skill symlinks under ~/.claude/skills/<name> — visible in every Claude Code project on this host.'
    cmds:
      - mkdir -p {{.CLAUDE_GLOBAL_DIR}}
      - |
        for skill in $(ls -1 "{{.TASKFILE_DIR}}" | grep -Ev "^(Taskfile.yml|install.sh|README.md|SKILLS_INDEX.md|.gitea|.git$)$"); do
          target="{{.TASKFILE_DIR}}/${skill}"
          link="{{.CLAUDE_GLOBAL_DIR}}/${skill}"
          if [ -L "$link" ] || [ -e "$link" ]; then
            current=$(readlink "$link" 2>/dev/null || true)
            if [ "$current" = "$target" ]; then
              continue
            fi
            rm -rf "$link"
          fi
          ln -s "$target" "$link"
          echo "linked $link → $target"
        done
  install:claude:repo:
    desc: 'Per-skill symlinks under $(pwd)/.claude/skills/<name> when invoked inside a git repo.'
    cmds:
      - |
        if ! git rev-parse --show-toplevel >/dev/null 2>&1; then
          echo "not in a git repo — skipping per-repo wire"; exit 0
        fi
        repo=$(git rev-parse --show-toplevel)
        if [ "$repo" = "{{.TASKFILE_DIR}}" ]; then
          echo "running from skills repo itself — skipping per-repo wire"; exit 0
        fi
        target_dir="$repo/.claude/skills"
        mkdir -p "$target_dir"
        for skill in $(ls -1 "{{.TASKFILE_DIR}}" | grep -Ev "^(Taskfile.yml|install.sh|README.md|SKILLS_INDEX.md|.gitea|.git$)$"); do
          target="{{.TASKFILE_DIR}}/${skill}"
          link="${target_dir}/${skill}"
          if [ -L "$link" ] || [ -e "$link" ]; then
            current=$(readlink "$link" 2>/dev/null || true)
            if [ "$current" = "$target" ]; then
              continue
            fi
            rm -rf "$link"
          fi
          ln -s "$target" "$link"
          echo "linked $link → $target"
        done
  update:
    desc: 'git pull the canonical checkout then re-run install.'
    cmds:
      - |
        if [ ! -d "{{.CHECKOUT_DIR}}/.git" ]; then
          echo "no canonical checkout at {{.CHECKOUT_DIR}} — run install.sh first"; exit 1
        fi
        git -C "{{.CHECKOUT_DIR}}" pull --ff-only
      - task: install
  release:
    desc: 'Tag HEAD per Bump-Type in the latest commit footer. Normally CI handles this — run locally only when CI is unavailable.'
    cmds:
      - |
        latest=$(git -C "{{.TASKFILE_DIR}}" describe --tags --abbrev=0 2>/dev/null || echo "v0.0.0")
        major=$(echo "$latest" | sed -E 's/^v([0-9]+)\.([0-9]+)\.([0-9]+)$/\1/')
        minor=$(echo "$latest" | sed -E 's/^v([0-9]+)\.([0-9]+)\.([0-9]+)$/\2/')
        patch=$(echo "$latest" | sed -E 's/^v([0-9]+)\.([0-9]+)\.([0-9]+)$/\3/')
        bump=$(git -C "{{.TASKFILE_DIR}}" log -1 --pretty=%B | grep -iE "^Bump-Type:" | head -1 | awk '{print tolower($2)}')
        case "$bump" in
          major) major=$((major + 1)); minor=0; patch=0 ;;
          minor) minor=$((minor + 1)); patch=0 ;;
          *)     patch=$((patch + 1)) ;;
        esac
        next="v${major}.${minor}.${patch}"
        echo "tagging $next (previous $latest, bump=${bump:-patch})"
        git -C "{{.TASKFILE_DIR}}" tag -a "$next" -m "release $next"
        echo "now: git push origin $next"
  check:
    desc: 'Sanity check — every skill dir has a SKILL.md, no stray files at root.'
    cmds:
      - |
        missing=0
        for skill in $(ls -1 "{{.TASKFILE_DIR}}" | grep -Ev "^(Taskfile.yml|install.sh|README.md|SKILLS_INDEX.md|.gitea|.git$)$"); do
          dir="{{.TASKFILE_DIR}}/${skill}"
          if [ ! -d "$dir" ]; then
            echo "ERROR: $skill is not a directory" >&2
            missing=$((missing + 1))
            continue
          fi
          if [ ! -f "$dir/SKILL.md" ]; then
            echo "ERROR: $dir missing SKILL.md" >&2
            missing=$((missing + 1))
          fi
        done
        if [ $missing -gt 0 ]; then
          echo "$missing skill(s) failed structure check" >&2
          exit 1
        fi
        echo "all skills have SKILL.md"
--- a/atdd/SKILL.md
+++ b/atdd/SKILL.md
@@ -0,0 +1,294 @@
 ---
 name: atdd
 description: Implement user stories using the ATDD workflow — Red (failing acceptance test) → Green (minimal implementation) → Refactor. One story at a time.
 ---
 # Acceptance Test-Driven Development (ATDD)
 ## Overview
 ATDD extends TDD to the acceptance level. You write tests that capture the behavior described in a user story's acceptance criteria — then implement just enough to make them pass.
 **Input required:** User stories with clear acceptance criteria. Load `user-stories` skill first.
 ## Core Methodology: Red-Green-Refactor at Acceptance Level
 ```
 RED    → Write failing acceptance test (one story at a time)
 GREEN  → Write minimal code to make it pass
 REFACTOR → Clean up code (load clean-code skill)
 COMMIT → Commit with reference to the story
 ```
 **Key principle:** Never mix phases. RED is only writing tests. GREEN is only making tests pass. REFACTOR is only cleaning up.
 **Stop and confirm** before advancing from RED to GREEN, and from GREEN to REFACTOR.
 ## Working on One Story at a Time
 Pick the highest-priority story from the backlog. Do not start a second story until the first is done (committed, clean, all tests green).
 ## Phase 1: RED — Write Failing Acceptance Test
 Translate the story's acceptance criteria into executable tests.
 **For each acceptance criterion:**
 ```
 Given [context]
 When [action]
 Then [observable outcome]
 ```
 Becomes a test case.
 ### Go ATDD Example
 **Story:** "Users can register with email and password"
 **Acceptance criteria:**
 - Given valid email and password, registration succeeds and returns the new user
 - Given an email already in use, registration returns ErrDuplicateEmail
 - Given an invalid email format, registration returns ErrInvalidEmail
 - Given a password shorter than 8 chars, registration returns ErrWeakPassword
 **Acceptance tests (write these BEFORE implementation):**
 ```go
 func TestUserRegistration_Acceptance(t *testing.T) {
    tests := []struct {
        name    string
        email   string
        pass    string
        wantErr error
    }{
        {
            name:    "valid registration creates user",
            email:   "alice@example.com",
            pass:    "securePass1!",
            wantErr: nil,
        },
        {
            name:    "duplicate email returns ErrDuplicateEmail",
            email:   "existing@example.com",
            pass:    "securePass1!",
            wantErr: ErrDuplicateEmail,
        },
        {
            name:    "invalid email format returns ErrInvalidEmail",
            email:   "not-an-email",
            pass:    "securePass1!",
            wantErr: ErrInvalidEmail,
        },
        {
            name:    "short password returns ErrWeakPassword",
            email:   "alice@example.com",
            pass:    "short",
            wantErr: ErrWeakPassword,
        },
    }
    store := NewInMemoryUserStore()
    // Pre-seed duplicate email
    store.Seed(User{Email: "existing@example.com"})
    svc := NewUserService(store)
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            _, err := svc.Register(context.Background(), tt.email, tt.pass)
            if tt.wantErr != nil {
                assert.ErrorIs(t, err, tt.wantErr)
            } else {
                assert.NoError(t, err)
            }
        })
    }
 }
 ```
 **Run and verify they fail:**
 ```bash
 go test -run TestUserRegistration_Acceptance ./...
 # FAIL: UserService.Register undefined
 ```
 The test must fail because the implementation doesn't exist yet. If it passes, you're testing existing behavior. Check your test.
 **STOP here. Confirm before moving to GREEN.**
 ## Phase 2: GREEN — Minimal Implementation
 Implement the absolute minimum code necessary to make the acceptance tests pass. Ugly code is acceptable at this stage. The goal is green, not clean.
 ```go
 // Minimal implementation — prioritize passing tests over elegance
 func (s *UserService) Register(ctx context.Context, email, password string) (User, error) {
    if !strings.Contains(email, "@") {
        return User{}, ErrInvalidEmail
    }
    if len(password) < 8 {
        return User{}, ErrWeakPassword
    }
    if s.store.Exists(ctx, email) {
        return User{}, ErrDuplicateEmail
    }
    u := User{ID: newID(), Email: email}
    if err := s.store.Save(ctx, u); err != nil {
        return User{}, fmt.Errorf("save user: %w", err)
    }
    return u, nil
 }
 ```
 **Verify GREEN:**
 ```bash
 go test -run TestUserRegistration_Acceptance ./...
 # ok
 go test ./... # All other tests still pass
 go test -race ./...
 ```
 **STOP here. Confirm before moving to REFACTOR.**
 ## Phase 3: REFACTOR — Clean Up
 With tests green, clean the code. Do not change behavior.
 **During REFACTOR, load and follow the `clean-code` skill.**
 Apply:
 - Extract functions for complex logic
 - Better naming
 - Remove duplication
 - Apply SOLID principles
 - For design decisions, load `solid` skill
 After every change: `go test ./...` must stay green.
 **Do NOT refactor the acceptance tests themselves** unless they have a bug. The tests are your specification — changing them changes what you've committed to.
 **STOP here. Confirm before committing.**
 ## Phase 4: COMMIT
 Commit the completed story. Reference the story in the commit message.
 ```bash
 git commit -m "feat: user registration with email validation
 Implements acceptance criteria for US-001 (User Registration).
 - Valid email/password creates a user
 - Duplicate email returns ErrDuplicateEmail
 - Invalid email format returns ErrInvalidEmail
 - Short password returns ErrWeakPassword"
 ```
 ## For Test Design Questions
 When writing acceptance tests, load the `test-design` skill to ensure:
 - Tests are understandable (Farley: Understandable)
 - Tests don't couple to implementation details (Farley: Maintainable)
 - Tests are repeatable across environments (Farley: Repeatable)
 ## For Code Review Before Commit
 Load the `code-review` skill before committing to check:
 - Error handling follows `fmt.Errorf("op: %w", err)` pattern
 - Context propagation is correct
 - No race conditions in concurrent code
 - Exported API is minimal
 ## For Refactoring Guidance
 Load the `refactoring` skill for systematic techniques to apply during REFACTOR phase.
 ## For Clean Code Principles
 Load the `clean-code` skill during REFACTOR phase to apply:
 - Naming conventions
 - Function size and responsibility
 - SOLID principles
 ## Go-Specific ATDD Notes
 ### Acceptance Test Location
 Place acceptance tests in `_test.go` files with the `_acceptance_test.go` suffix or `//go:build acceptance` build tag to separate from fast unit tests:
 ```go
 //go:build acceptance
 package service_test
 ```
 ```bash
 go test ./...              # Unit tests only (fast)
 go test -tags=acceptance ./...  # Include acceptance tests
 ```
 ### In-Memory Implementations for Acceptance Tests
 Use in-memory implementations of repositories for acceptance tests:
 ```go
 type InMemoryUserStore struct {
    mu    sync.Mutex
    users map[string]User
 }
 func (s *InMemoryUserStore) Save(ctx context.Context, u User) error {
    s.mu.Lock()
    defer s.mu.Unlock()
    if _, exists := s.users[u.Email]; exists {
        return ErrDuplicateEmail
    }
    s.users[u.Email] = u
    return nil
 }
 func (s *InMemoryUserStore) Exists(ctx context.Context, email string) bool {
    s.mu.Lock()
    defer s.mu.Unlock()
    _, exists := s.users[email]
    return exists
 }
 ```
 This keeps acceptance tests fast and deterministic.
 ### Given-When-Then in Go
 The `t.Run` structure maps naturally to Given-When-Then:
 ```go
 t.Run("given valid credentials, when logging in, then returns token", func(t *testing.T) {
    // Given: valid user exists
    store := InMemoryUserStore{...}
    // When: login is called
    token, err := svc.Login(ctx, email, pass)
    // Then: token is returned
    require.NoError(t, err)
    assert.NotEmpty(t, token)
 })
 ```
 ## Verification Checklist
 Before committing:
 - [ ] All acceptance tests pass
 - [ ] All existing tests still pass
 - [ ] Race detector clean: `go test -race ./...`
 - [ ] REFACTOR phase complete — code is clean
 - [ ] Commit message references the user story
 - [ ] No behavior added beyond the acceptance criteria
 ## Cross-References
 - Requires: `user-stories` skill output (stories with acceptance criteria)
 - During REFACTOR: load `clean-code` skill
 - For test quality: load `test-design` skill
 - For code review: load `code-review` skill
 - For refactoring guidance: load `refactoring` skill
 - Foundation: load `tdd` skill for core TDD methodology
--- a/clean-code/SKILL.md
+++ b/clean-code/SKILL.md
@@ -0,0 +1,282 @@
 ---
 name: clean-code
 description: Write code that humans can read, understand, and safely change. Apply SOLID, GRASP, and Clean Code principles during the REFACTOR phase of TDD.
 ---
 # Clean Code
 ## Core Philosophy
 Write code for humans first, computers second. Every piece of code you write will be read many more times than it is written. Optimize for clarity and simplicity above all else.
 Code has three consumers:
 1. **Users** — get their needs met
 2. **Customers** — make or save money
 3. **Developers** — must maintain it
 Developers read code 10x more than they write it. Design for them.
 ## When to Apply This Skill
 - During the **REFACTOR phase** of TDD (after the test is green)
 - When reviewing code (load `code-review` skill too)
 - When designing new types or packages
 **Do not apply during the GREEN phase.** Ugly code that passes is correct. Clean it after.
 ## Naming
 Naming is the most important thing in code. In priority order:
 1. **Consistency** — Same concept = same name everywhere. One name per concept.
 2. **Understandability** — Domain language, not technical jargon.
 3. **Specificity** — Precise, not vague. Avoid `data`, `info`, `manager`, `handler`, `processor`, `utils`.
 4. **Brevity** — Short but not cryptic.
 5. **Searchability** — Unique, greppable names.
 6. **Pronounceability** — You should be able to say it in conversation.
 ```go
 // Bad: inconsistent, vague, cryptic
 func getUsrById(id string) {}
 func fetchCustomerByID(id string) {}
 func retrieveClientById(id string) {}
 // Good: consistent, specific
 func GetUser(ctx context.Context, id UserID) (User, error) {}
 func GetOrder(ctx context.Context, id OrderID) (Order, error) {}
 ```
 ## Functions/Methods
 - Do one thing, do it well, do it only
 - Operate at a single level of abstraction
 - Prefer fewer parameters — zero is ideal, one or two common, three should be rare
 - Avoid side effects — if a function has a side effect, name it accordingly
 - Prefer pure functions where possible
 ```go
 // Bad: does too much, mixed abstraction levels
 func processOrder(o *Order, db *sql.DB, mailer Mailer) error {
    if o.Items == nil || len(o.Items) == 0 {
        return errors.New("empty order")
    }
    total := 0.0
    for _, item := range o.Items {
        total += item.Price * float64(item.Qty)
    }
    if _, err := db.Exec("INSERT INTO orders ...", o.ID, total); err != nil {
        return fmt.Errorf("save order: %w", err)
    }
    if err := mailer.Send(o.Customer.Email, "confirmed"); err != nil {
        return fmt.Errorf("send confirmation: %w", err)
    }
    return nil
 }
 // Good: single level of abstraction, composed from focused functions
 func ProcessOrder(ctx context.Context, o *Order, repo OrderRepository, mailer Mailer) error {
    if err := validateOrder(o); err != nil {
        return fmt.Errorf("validate: %w", err)
    }
    total := calculateTotal(o.Items)
    if err := repo.Save(ctx, o, total); err != nil {
        return fmt.Errorf("save: %w", err)
    }
    if err := mailer.SendConfirmation(ctx, o.Customer.Email); err != nil {
        return fmt.Errorf("notify: %w", err)
    }
    return nil
 }
 ```
 ## Error Handling (Go-Specific)
 In Go, error returns are a first-class feature, not a smell. Follow these rules:
 ```go
 // Always wrap with context
 if err != nil {
    return fmt.Errorf("parse config: %w", err)
 }
 // Never: naked return without context
 if err != nil {
    return err
 }
 // Never: log and return (pick one)
 if err != nil {
    log.Error("failed", "err", err)
    return err // caller gets logged error AND returns it — double-counted
 }
 // Never: swallow errors silently
 _ = doSomething()
 ```
 Error wrapping builds a call-chain narrative: `"http handler: parse request: validate email: invalid format"`. This is valuable, not verbose.
 ## Comments
 **The best comment is code that doesn't need one.** Before writing a comment, ask whether you could make the code itself clearer.
 When comments are necessary, explain **why**, not what. The code shows what happens; comments explain decisions and intent.
 ```go
 // Bad: explains what (redundant)
 // increment counter
 i++
 // Good: explains why
 // Offset by 1 because the upstream API uses 1-based page numbering
 page := requestedPage + 1
 ```
 **Go adaptation:** `godoc` comments on exported identifiers ARE documentation, not a failure. The "comments are failure" rule applies to inline comments explaining bad code. Exported types and functions must have godoc comments.
 ```go
 // UserService handles user lifecycle operations.
 // It is safe for concurrent use.
 type UserService struct { ... }
 // GetByEmail returns the user with the given email address.
 // Returns ErrNotFound if no user exists with that email.
 func (s *UserService) GetByEmail(ctx context.Context, email Email) (User, error) { ... }
 ```
 ## SOLID Principles (Brief)
 For full SOLID guidance with Go examples, load the `solid` skill.
 | Principle | Quick test |
 |-----------|------------|
 | SRP | "Does this type have ONE reason to change?" |
 | OCP | "Can I add behavior by implementing an interface, not editing this code?" |
 | LSP | "Can callers substitute any implementation without knowing the difference?" |
 | ISP | "Is this interface the smallest it can be?" (`io.Reader` is a model) |
 | DIP | "Do I pass interfaces as parameters, not concrete types?" |
 ## GRASP Principles
 **Information Expert:** Assign responsibility to the type that has the data needed to fulfil it.
 **Low Coupling:** Minimize dependencies between packages. When one changes, few others should too.
 **High Cohesion:** Keep related functionality together. Everything in a package should relate to its central purpose.
 **Controller:** One type handles system events for a use case.
 **Tell, Don't Ask:** Tell objects what to do rather than asking for their data and acting on it.
 ## Go-Specific Clean Code Rules
 ### Prefer interfaces over concrete types in function signatures
 ```go
 // Bad: tied to a concrete implementation
 func NewHandler(db *postgres.DB) *Handler { ... }
 // Good: depends on the behavior needed
 func NewHandler(store UserStore) *Handler { ... }
 ```
 ### Small, focused interfaces
 ```go
 // The io.Reader/io.Writer pattern is the Go ideal
 type Reader interface {
    Read(p []byte) (n int, err error)
 }
 // If you need both, compose at the call site
 type ReadWriter interface {
    Reader
    Writer
 }
 ```
 ### Embedding over inheritance
 ```go
 // Use embedding for composition, not to inherit behavior you'll override
 type LoggedHandler struct {
    Handler
    logger *slog.Logger
 }
 ```
 ### Constructor injection for dependencies
 ```go
 type Service struct {
    repo   Repository
    mailer Mailer
    logger *slog.Logger
 }
 func NewService(repo Repository, mailer Mailer, logger *slog.Logger) *Service {
    return &Service{repo: repo, mailer: mailer, logger: logger}
 }
 ```
 ### Avoid package-level state
 Package-level variables are global state. They make tests unreliable and code hard to reason about.
 ```go
 // Bad
 var db *sql.DB
 // Good: pass as dependency
 type Handler struct {
    db *sql.DB
 }
 ```
 ## Code Smell Detection (Quick Reference)
 During the REFACTOR phase, look for these smells and address them:
 | Smell | Go Signal | Fix |
 |-------|-----------|-----|
 | Long function | > ~30 lines (Go can go longer, but be honest) | Extract function |
 | Too many parameters | > 3-4 parameters | Use a config struct or option pattern |
 | Divergent change | One type changes for multiple unrelated reasons | Split into focused types |
 | Feature envy | Method uses another type's data more than its own | Move method to that type |
 | Magic numbers | Bare literals | Named constants or typed config |
 | Deep nesting | > 3 levels of indentation | Early returns, extract helpers |
 For comprehensive smell detection, load the `code-review` skill.
 ## Structural Principles
 **Separation of concerns:** Business logic should know nothing of HTTP, SQL, or filesystems.
 **Modularity:** Each package understandable in isolation.
 **Command-Query Separation:** Functions either do something (command) or return something (query), not both.
 **Principle of Least Surprise:** Code should behave the way readers expect.
 **Boy Scout Rule:** Leave the code better than you found it.
 ## Pre-Refactor Checklist
 Before cleaning up code:
 - [ ] All tests pass
 - [ ] Race detector clean: `go test -race ./...`
 - [ ] I understand what the code does
 - [ ] I have a clear target design in mind
 During refactoring:
 - [ ] Tests stay green after each change
 - [ ] Each change is small and focused
 - [ ] Commit after each meaningful improvement
 ## References
 - `references/clean-code.md` — detailed naming rules, object calisthenics
 - `references/code-smells.md` — comprehensive smell catalog with examples
 - Load `solid` skill for SOLID principles with Go examples
 - Load `refactoring` skill for systematic refactoring techniques
--- a/clean-code/references/clean-code.md
+++ b/clean-code/references/clean-code.md
@@ -0,0 +1,376 @@
 # Clean Code Practices
 ## What is Clean Code?
 Code that is:
 - **Easy to understand** - reveals intent clearly
 - **Easy to change** - modifications are localized
 - **Easy to test** - dependencies are injectable
 - **Simple** - no unnecessary complexity
 ## The Human-Centered Approach
 Code has THREE consumers:
 1. **Users** - get their needs met
 2. **Customers** - make or save money
 3. **Developers** - must maintain it
 Design for all three, but remember: **developers read code 10x more than they write it.**
 ## Naming Principles
 ### 1. Consistency & Uniqueness (HIGHEST PRIORITY)
 Same concept = same name everywhere. One name per concept.
 ```typescript
 // BAD: Inconsistent names for same concept
 getUserById(id)
 fetchCustomerById(id)
 retrieveClientById(id)
 // GOOD: Consistent
 getUser(id)
 getOrder(id)
 getProduct(id)
 ```
 ### 2. Understandability
 Use domain language, not technical jargon.
 ```typescript
 // BAD: Technical
 const arr = users.filter(u => u.isActive);
 // GOOD: Domain language
 const activeCustomers = users.filter(user => user.isActive);
 ```
 ### 3. Specificity
 Avoid vague names: `data`, `info`, `manager`, `handler`, `processor`, `utils`
 ```typescript
 // BAD: Vague
 class DataManager { }
 function processInfo(data) { }
 // GOOD: Specific
 class OrderRepository { }
 function validatePayment(payment) { }
 ```
 ### 4. Brevity (but not at cost of clarity)
 Short names are good only if meaning is preserved.
 ```typescript
 // BAD: Too cryptic
 const usrLst = getUsrs();
 // BAD: Unnecessarily long
 const listOfAllActiveUsersInTheSystem = getActiveUsers();
 // GOOD: Brief but clear
 const activeUsers = getActiveUsers();
 ```
 ### 5. Searchability
 Names should be unique enough to grep/search.
 ```typescript
 // BAD: Common word, hard to search
 const data = fetch();
 // GOOD: Unique, searchable
 const orderSummary = fetchOrderSummary();
 ```
 ### 6. Pronounceability
 You should be able to say it in conversation.
 ```typescript
 // BAD
 const genymdhms = generateYearMonthDayHourMinuteSecond();
 // GOOD
 const timestamp = generateTimestamp();
 ```
 ### 7. Austerity
 Avoid unnecessary filler words.
 ```typescript
 // BAD: Redundant
 const userData = user; // 'Data' adds nothing
 class UserClass { }    // 'Class' adds nothing
 // GOOD
 const user = user;
 class User { }
 ```
 ---
 ## Object Calisthenics (9 Rules)
 Exercises to improve OO design. Follow strictly during practice, relax slightly in production.
 ### 1. One Level of Indentation per Method
 ```typescript
 // BAD: Multiple levels
 function process(orders: Order[]) {
  for (const order of orders) {
    if (order.isValid()) {
      for (const item of order.items) {
        if (item.inStock) {
          // process...
        }
      }
    }
  }
 }
 // GOOD: Extract methods
 function process(orders: Order[]) {
  orders.filter(o => o.isValid()).forEach(processOrder);
 }
 function processOrder(order: Order) {
  order.items.filter(i => i.inStock).forEach(processItem);
 }
 ```
 ### 2. Don't Use the ELSE Keyword
 Use early returns, guard clauses, or polymorphism.
 ```typescript
 // BAD: else
 function getDiscount(user: User): number {
  if (user.isPremium) {
    return 20;
  } else {
    return 0;
  }
 }
 // GOOD: Early return
 function getDiscount(user: User): number {
  if (user.isPremium) return 20;
  return 0;
 }
 ```
 ### 3. Wrap All Primitives and Strings
 Primitives should be wrapped in domain objects when they have meaning.
 ```typescript
 // BAD: Primitive obsession
 function createUser(email: string, age: number) { }
 // GOOD: Value objects
 class Email {
  constructor(private value: string) {
    if (!this.isValid(value)) throw new InvalidEmail();
  }
  private isValid(email: string): boolean { ... }
 }
 class Age {
  constructor(private value: number) {
    if (value < 0 || value > 150) throw new InvalidAge();
  }
 }
 function createUser(email: Email, age: Age) { }
 ```
 ### 4. First-Class Collections
 Any class with a collection should have no other instance variables.
 ```typescript
 // BAD: Collection mixed with other state
 class Order {
  items: OrderItem[] = [];
  customerId: string;
  total: number;
 }
 // GOOD: Collection is its own class
 class OrderItems {
  constructor(private items: OrderItem[] = []) {}
  add(item: OrderItem): void { ... }
  total(): Money { ... }
  isEmpty(): boolean { ... }
 }
 class Order {
  constructor(
    private items: OrderItems,
    private customerId: CustomerId
  ) {}
 }
 ```
 ### 5. One Dot per Line (Law of Demeter)
 Don't chain through object graphs.
 ```typescript
 // BAD: Train wreck
 const city = order.customer.address.city;
 // GOOD: Tell, don't ask
 const city = order.getShippingCity();
 ```
 ### 6. Don't Abbreviate
 If a name is too long to type, the class is doing too much.
 ```typescript
 // BAD
 const custRepo = new CustRepo();
 const ord = new Ord();
 // GOOD
 const customerRepository = new CustomerRepository();
 const order = new Order();
 ```
 ### 7. Keep All Entities Small
 - Classes: < 50 lines
 - Methods: < 10 lines
 - Files: < 100 lines
 If larger, it's probably doing too much. Split it.
 ### 8. No Classes with More Than Two Instance Variables
 Forces small, focused classes.
 ```typescript
 // BAD: Too many variables
 class Order {
  id: string;
  customerId: string;
  items: Item[];
  total: number;
  status: string;
 }
 // GOOD: Composed of smaller objects
 class Order {
  constructor(
    private id: OrderId,
    private details: OrderDetails
  ) {}
 }
 class OrderDetails {
  constructor(
    private customer: Customer,
    private lineItems: LineItems
  ) {}
 }
 ```
 ### 9. No Getters/Setters/Properties
 Objects should have behavior, not just data. Tell objects what to do.
 ```typescript
 // BAD: Data bag with getters
 class Account {
  getBalance(): number { return this.balance; }
  setBalance(value: number) { this.balance = value; }
 }
 // Caller does the work
 if (account.getBalance() >= amount) {
  account.setBalance(account.getBalance() - amount);
 }
 // GOOD: Behavior-rich object
 class Account {
  withdraw(amount: Money): WithdrawResult {
    if (!this.canWithdraw(amount)) {
      return WithdrawResult.insufficientFunds();
    }
    this.balance = this.balance.subtract(amount);
    return WithdrawResult.success();
  }
 }
 // Caller tells, object decides
 const result = account.withdraw(amount);
 ```
 ---
 ## Comments
 ### When to Write Comments
 **Only write comments to explain WHY, not WHAT or HOW.**
 Code explains what and how. Comments explain business reasons, non-obvious decisions, or warnings.
 ```typescript
 // BAD: Explains what (redundant)
 // Add 1 to counter
 counter++;
 // GOOD: Explains why
 // Compensate for 0-based indexing in legacy API
 counter++;
 ```
 ### Prefer Self-Documenting Code
 Instead of commenting, rename to make intent clear.
 ```typescript
 // BAD: Comment needed
 // Check if user can access premium features
 if (user.subscriptionLevel >= 2 && !user.isBanned) { }
 // GOOD: Self-documenting
 if (user.canAccessPremiumFeatures()) { }
 ```
 ---
 ## Formatting
 ### Vertical Spacing
 - Related code together
 - Blank lines between concepts
 - Most important/public at top
 ### Horizontal Spacing
 - Consistent indentation
 - Space around operators
 - Max line length ~80-120 characters
 ### Storytelling
 Code should read top-to-bottom like a story. High-level at top, details below.
 ```typescript
 class OrderProcessor {
  // Public API first
  process(order: Order): ProcessResult {
    this.validate(order);
    this.calculateTotals(order);
    return this.save(order);
  }
  // Supporting methods below, in order of appearance
  private validate(order: Order): void { ... }
  private calculateTotals(order: Order): void { ... }
  private save(order: Order): ProcessResult { ... }
 }
 ```
--- a/clean-code/references/code-smells.md
+++ b/clean-code/references/code-smells.md
@@ -0,0 +1,334 @@
 # Code Smells & Anti-Patterns
 ## What Are Code Smells?
 Indicators that something MAY be wrong. Not bugs, but design problems that make code hard to understand, change, or test.
 ## The Five Categories
 ### 1. Bloaters
 Code that has grown too large.
 | Smell | Symptom | Refactoring |
 |-------|---------|-------------|
 | **Long Method** | > 10 lines | Extract Method |
 | **Large Class** | > 50 lines, multiple responsibilities | Extract Class |
 | **Long Parameter List** | > 3 parameters | Introduce Parameter Object |
 | **Data Clumps** | Same group of variables appear together | Extract Class |
 | **Primitive Obsession** | Primitives instead of small objects | Wrap in Value Object |
 ### 2. Object-Orientation Abusers
 Misuse of OO principles.
 | Smell | Symptom | Refactoring |
 |-------|---------|-------------|
 | **Switch Statements** | Type checking, large switch/if-else | Replace with Polymorphism |
 | **Parallel Inheritance** | Adding subclass requires adding another | Merge Hierarchies |
 | **Refused Bequest** | Subclass doesn't use parent methods | Replace Inheritance with Delegation |
 | **Alternative Classes** | Different interfaces, same concept | Rename, Extract Superclass |
 ### 3. Change Preventers
 Code that makes changes difficult.
 | Smell | Symptom | Refactoring |
 |-------|---------|-------------|
 | **Divergent Change** | One class changed for many reasons | Extract Class (SRP) |
 | **Shotgun Surgery** | One change touches many classes | Move Method/Field together |
 | **Parallel Inheritance** | (see above) | Merge Hierarchies |
 ### 4. Dispensables
 Code that can be removed.
 | Smell | Symptom | Refactoring |
 |-------|---------|-------------|
 | **Comments** | Explaining bad code | Rename, Extract Method |
 | **Duplicate Code** | Copy-paste | Extract Method, Pull Up Method |
 | **Dead Code** | Unreachable code | Delete |
 | **Speculative Generality** | "Just in case" code | Delete (YAGNI) |
 | **Lazy Class** | Class that does almost nothing | Inline Class |
 ### 5. Couplers
 Excessive coupling between classes.
 | Smell | Symptom | Refactoring |
 |-------|---------|-------------|
 | **Feature Envy** | Method uses another class's data extensively | Move Method |
 | **Inappropriate Intimacy** | Classes know too much about each other | Move Method, Extract Class |
 | **Message Chains** | `a.getB().getC().getD()` | Hide Delegate |
 | **Middle Man** | Class only delegates | Inline Class |
 ---
 ## The Seven Most Common Code Smells
 ### 1. Long Method
 **Symptom:** Method > 10 lines, doing multiple things.
 ```typescript
 // SMELL
 function processOrder(order: Order) {
  // Validate
  if (!order.items.length) throw new Error('Empty');
  if (!order.customer) throw new Error('No customer');
  // Calculate
  let total = 0;
  for (const item of order.items) {
    total += item.price * item.quantity;
    if (item.discount) {
      total -= item.discount;
    }
  }
  // Apply tax
  const taxRate = getTaxRate(order.customer.state);
  total = total * (1 + taxRate);
  // Save
  db.orders.insert({ ...order, total });
  // Notify
  emailService.send(order.customer.email, 'Order confirmed');
 }
 // REFACTORED
 function processOrder(order: Order) {
  validateOrder(order);
  const total = calculateTotal(order);
  saveOrder(order, total);
  notifyCustomer(order);
 }
 ```
 ### 2. Large Class
 **Symptom:** Class with many responsibilities, > 50 lines.
 ```typescript
 // SMELL: God class
 class User {
  // User data
  name: string;
  email: string;
  // Authentication
  login() { }
  logout() { }
  resetPassword() { }
  // Preferences
  setTheme() { }
  setLanguage() { }
  // Notifications
  sendEmail() { }
  sendSMS() { }
  // Billing
  charge() { }
  refund() { }
 }
 // REFACTORED: Separate classes
 class User { name: string; email: string; }
 class AuthService { login(); logout(); resetPassword(); }
 class UserPreferences { setTheme(); setLanguage(); }
 class NotificationService { sendEmail(); sendSMS(); }
 class BillingService { charge(); refund(); }
 ```
 ### 3. Feature Envy
 **Symptom:** Method uses another class's data more than its own.
 ```typescript
 // SMELL: Order envies Customer
 class Order {
  calculateShipping(customer: Customer): number {
    if (customer.country === 'US') {
      if (customer.state === 'CA') return 10;
      return 15;
    }
    return 25;
  }
 }
 // REFACTORED: Move to Customer
 class Customer {
  getShippingCost(): number {
    if (this.country === 'US') {
      if (this.state === 'CA') return 10;
      return 15;
    }
    return 25;
  }
 }
 class Order {
  calculateShipping(): number {
    return this.customer.getShippingCost();
  }
 }
 ```
 ### 4. Primitive Obsession
 **Symptom:** Using primitives for domain concepts.
 ```typescript
 // SMELL
 function createUser(email: string, age: number, zipCode: string) {
  // No validation, easy to pass wrong values
  if (!email.includes('@')) throw new Error();
  if (age < 0) throw new Error();
 }
 // REFACTORED: Value objects
 class Email {
  constructor(private value: string) {
    if (!value.includes('@')) throw new InvalidEmail();
  }
 }
 class Age {
  constructor(private value: number) {
    if (value < 0 || value > 150) throw new InvalidAge();
  }
 }
 function createUser(email: Email, age: Age, address: Address) {
  // Type system prevents invalid data
 }
 ```
 ### 5. Switch Statements
 **Symptom:** Switching on type, repeated across codebase.
 ```typescript
 // SMELL
 function getArea(shape: Shape): number {
  switch (shape.type) {
    case 'circle': return Math.PI * shape.radius ** 2;
    case 'rectangle': return shape.width * shape.height;
    case 'triangle': return 0.5 * shape.base * shape.height;
  }
 }
 function getPerimeter(shape: Shape): number {
  switch (shape.type) { // Same switch again!
    case 'circle': return 2 * Math.PI * shape.radius;
    // ...
  }
 }
 // REFACTORED: Polymorphism
 interface Shape {
  getArea(): number;
  getPerimeter(): number;
 }
 class Circle implements Shape {
  constructor(private radius: number) {}
  getArea(): number { return Math.PI * this.radius ** 2; }
  getPerimeter(): number { return 2 * Math.PI * this.radius; }
 }
 ```
 ### 6. Inappropriate Intimacy
 **Symptom:** Classes know too much about each other's internals.
 ```typescript
 // SMELL
 class Order {
  process() {
    const inventory = new Inventory();
    // Reaching into inventory's internals
    for (const item of this.items) {
      const stock = inventory.stockLevels[item.sku];
      if (stock.quantity < item.quantity) {
        throw new Error('Out of stock');
      }
      inventory.stockLevels[item.sku].quantity -= item.quantity;
    }
  }
 }
 // REFACTORED: Tell, don't ask
 class Inventory {
  reserve(items: OrderItem[]): ReserveResult {
    // Inventory manages its own state
    for (const item of items) {
      if (!this.canReserve(item)) {
        return ReserveResult.outOfStock(item);
      }
    }
    this.deductStock(items);
    return ReserveResult.success();
  }
 }
 class Order {
  process(inventory: Inventory) {
    const result = inventory.reserve(this.items);
    if (!result.isSuccess()) {
      throw new OutOfStockError(result.failedItem);
    }
  }
 }
 ```
 ### 7. Speculative Generality
 **Symptom:** "Just in case" abstractions that aren't used.
 ```typescript
 // SMELL: Over-engineered for hypothetical needs
 interface PaymentProcessor {
  process(): void;
  rollback(): void;
  audit(): void;
  generateReport(): void;
  scheduleRecurring(): void;
 }
 class StripeProcessor implements PaymentProcessor {
  process() { /* actual code */ }
  rollback() { throw new Error('Not implemented'); }
  audit() { throw new Error('Not implemented'); }
  generateReport() { throw new Error('Not implemented'); }
  scheduleRecurring() { throw new Error('Not implemented'); }
 }
 // REFACTORED: YAGNI
 interface PaymentProcessor {
  process(): void;
 }
 class StripeProcessor implements PaymentProcessor {
  process() { /* actual code */ }
 }
 // Add other methods when actually needed
 ```
 ---
 ## Prevention Strategies
 1. **Follow Object Calisthenics** - Rules prevent most smells
 2. **Practice TDD** - Tests reveal design problems early
 3. **Review in pairs** - Fresh eyes catch smells
 4. **Refactor continuously** - Don't let smells accumulate
 5. **Apply SOLID** - Prevents structural smells
 6. **Use static analysis** - Tools catch common issues
 ---
 ## When You Find a Smell
 1. **Confirm it's a problem** - Not all smells need fixing
 2. **Ensure test coverage** - Before refactoring
 3. **Refactor in small steps** - Keep tests passing
 4. **Commit frequently** - Easy to revert if needed
--- a/code-review/SKILL.md
+++ b/code-review/SKILL.md
@@ -0,0 +1,284 @@
 ---
 name: code-review
 description: Systematic code smell detection and Go-specific review checklist. Load before any pre-merge review.
 ---
 # Code Review
 ## Overview
 Code review is smell detection and principle verification, not style policing. The goal is to identify patterns that make code hard to understand, change, test, or safely operate.
 **This skill is read-only analysis.** When you find issues, document them with specific file:line references and prioritize by architectural impact. For fixes, load the `refactoring` or `clean-code` skill.
 ## When to Use
 - Pre-merge review of any pull request
 - Before committing a significant change
 - When asked to "review this code"
 - After the GREEN phase of TDD (use with `clean-code` skill)
 ## Review Process
 ### Phase 0: Iron Laws (block on any violation)
 Before reading the diff, scan for these. Any violation is a blocking finding regardless of how clean the rest of the change looks.
 1. **No security vulnerabilities** — command injection, SQL injection, credential exposure, path traversal, unchecked input at system boundaries
 2. **No silently swallowed errors** — `err != nil` without wrapping, logging, or surfacing is always wrong; `_ = f()` on a fallible call is wrong
 3. **No missing validation at system boundaries** — user input, external API responses, and file reads must be validated before use
 If any iron law is violated, flag it as **Blocking** before any other phase. The other phases still run, but the iron-law findings come first in the report. See Phase 3 for detailed patterns and code examples covering iron laws 2 and 3.
 ### Phase 1: Context
 Before examining code:
 1. Understand what the change is trying to accomplish
 2. Identify the affected packages and their purpose
 3. Check test coverage — is the behavior tested?
 4. Read the commit message and PR description
 ### Phase 2: Architectural Analysis (Highest Priority)
 Check for structural problems first. These are expensive to fix later.
 **Dependency violations:**
 - Does domain logic import infrastructure packages?
 - Are there circular imports between packages?
 - Does a type do things that multiple different stakeholders would request changes to?
 **Abstraction problems:**
 - Is there a large switch/if-else on a type string that should be an interface?
 - Are there God types that know too much about too many things?
 - Is there Shotgun Surgery — one logical change requires editing many files?
 ### Phase 3: Go-Specific Checklist
 #### Error Handling
 ```go
 // Required: wrap with context
 return fmt.Errorf("parse invoice: %w", err)
 // Forbidden: naked return
 return err
 // Forbidden: log and return (pick one)
 log.Error("failed", "err", err)
 return err
 // Forbidden: silently ignore
 _ = doSomething()
 // Suspicious: sentinel errors without documentation
 var ErrFoo = errors.New("foo") // Is this exported? Is it documented?
 ```
 #### Goroutine Safety
 - Is shared state protected by a mutex or accessed only via channels?
 - Are goroutines guaranteed to terminate (no goroutine leaks)?
 - Is `sync.WaitGroup` used correctly (Add before go, Done via defer)?
 - Is there a `go test -race` passing?
 ```go
 // Check: is this safe for concurrent use?
 type Cache struct {
    mu    sync.RWMutex // Good: explicit lock
    items map[string]Item
 }
 // Red flag: map access without lock in concurrent context
 type BadCache struct {
    items map[string]Item // Unsafe if accessed from multiple goroutines
 }
 ```
 #### Exported API Minimalism
 - Only export what callers outside the package actually need
 - Internal implementation details should be unexported
 - Does the exported API make sense as a standalone contract?
 ```go
 // Good: minimal exported surface
 type UserService struct { /* unexported fields */ }
 func NewUserService(...) *UserService { ... }
 func (s *UserService) GetUser(ctx context.Context, id UserID) (User, error) { ... }
 // Bad: over-exported internals
 type UserService struct {
    Store  UserStore  // Exported field — callers can replace it, breaks encapsulation
    Logger *slog.Logger // Exported — callers can mutate it
 }
 ```
 #### Dependency Safety
 Before adding any new dependency:
 - Has `govulncheck` been run? (`govulncheck ./...`)
 - Is there a justification in the commit message for non-stdlib additions?
 - Pre-approved without justification: `testify`, `slog`, `templ`, `sqlc`
 - Everything else needs a reason
 #### Generic/interface{} Usage
 ```go
 // Bad: any/interface{} when a concrete type or generic works
 func Process(data interface{}) error { ... }
 // Good: typed
 func Process[T Processor](data T) error { ... }
 // Good: concrete type when only one type is expected
 func ProcessOrder(o *Order) error { ... }
 ```
 #### Context Propagation
 - Every function that does I/O (DB, HTTP, file) should accept `ctx context.Context` as the first parameter
 - Context should be passed down, not stored in structs
 - `context.Background()` should only appear at the top-level (main, test setup)
 ```go
 // Good
 func (s *Store) GetUser(ctx context.Context, id UserID) (User, error) { ... }
 // Bad: no context — can't cancel, can't trace, can't timeout
 func (s *Store) GetUser(id UserID) (User, error) { ... }
 // Bad: context stored in struct
 type Handler struct {
    ctx context.Context // Don't do this
 }
 ```
 #### Logging with slog
 ```go
 // Good: structured, leveled
 slog.InfoContext(ctx, "user created", "user_id", u.ID, "email", u.Email)
 // Bad: unstructured
 log.Printf("user created: %s", u.Email)
 // Bad: logging errors that are also returned
 slog.Error("failed to save", "err", err)
 return fmt.Errorf("save user: %w", err) // caller is double-counting
 ```
 ### Phase 4: Code Smell Detection
 Apply the catalog from `clean-code/references/code-smells.md`. Key smells to prioritize:
 **Architectural impact (fix immediately):**
 - Global Data — package-level variables modified at runtime
 - Shotgun Surgery — one change touches many unrelated files
 - Feature Envy — a method uses another package's internals more than its own
 - God Object — a type that does everything
 **Design impact (fix before merge if possible):**
 - Long Method — functions > ~50 lines in Go (Go can be longer than OOP, but be honest)
 - Long Parameter List — > 4-5 parameters; consider a config struct
 - Duplicate Code — same logic copy-pasted (wait for Rule of Three)
 - Message Chain — `a.B().C().D()` — violates Law of Demeter
 - Primitive Obsession — using `string` for email, user ID, currency
 **Readability (note in review, fix if cheap):**
 - Magic Numbers — bare literals without named constants
 - Uncommunicative Names — `data`, `info`, `x`, `tmp`
 - Fallacious Comments — comments that don't match the code
 ### Phase 5: Test Quality
 - Are new behaviors covered by tests?
 - Are tests table-driven where appropriate?
 - Do tests test behavior, not implementation?
 - Are there mock-only tests that test nothing real?
 For comprehensive test review, load the `test-design` skill.
 ## Severity Classification
 | Severity | Description | Action |
 |----------|-------------|--------|
 | **Blocking** | Security issue, data corruption risk, broken behavior | Must fix before merge |
 | **High** | Architectural violation, goroutine safety, missing context | Fix before merge |
 | **Medium** | Design smell, missing tests for changed behavior | Fix in this PR or create issue |
 | **Low** | Naming, style, minor duplication | Note in review, optional |
 ## Output Format
 When documenting findings:
 ```markdown
 ## Code Review: [package/file]
 ### Blocking
 - `internal/store/user.go:47`: Error swallowed with `_ =` — callers can't know if Save failed
 ### High
 - `internal/handler/order.go:12`: Domain logic in HTTP handler — moves tax calculation out of service layer
 - `cmd/server/main.go:89`: Missing context propagation — `GetUser` call has no timeout
 ### Medium
 - `internal/service/invoice.go:34`: Long method (87 lines) — extract `calculateLineItems` and `applyTaxRules`
 ### Low
 - `internal/domain/user.go:5`: Field `Data string` — name reveals nothing about what data
 ```
 ## Brain MCP Integration
 The brain MCP holds prior review decisions across sessions. Use it to avoid re-litigating the same tradeoffs.
 **At review start:**
 - Run `brain_query` with the package or feature name + "review" to surface prior review findings, recurring smells, or architectural decisions for this area. This prevents flagging the same smell that was already accepted as a tradeoff.
 **After review completes:**
 - For findings the author accepts as a tradeoff (with rationale), run `brain_write` to record the decision. Future reviews will not re-flag it.
 **Never:**
 - Embed brain queries in the review output. They are context for you, not findings for the author.
 ### Logging
 Call `session_log` once at the end of every phase to record the outcome.
 Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which
 treats `pass` as success, `fail` as failure, `skip` as neither.
 **At end of each phase:**
 - `session_log` with `{skill: "code-review", phase: "<phase-name>", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
 **Phases for this skill:** phase-0-iron-laws, phase-1-context, phase-2-architectural-analysis, phase-3-go-checklist, phase-4-code-smell-detection, phase-5-test-quality
 **Status semantics:**
 - `pass` — the phase's intended outcome was reached.
 - `fail` — the phase's intended outcome was NOT reached.
 - `skip` — phase was skipped intentionally.
 **Why this matters:** the routing pod (Plan 6) reads pass-rate to decide whether to route a future call to a local model. If your skill never logs, the routing pod sees no data.
 ## Common Go Anti-Patterns
 | Anti-Pattern | What to Look For |
 |-------------|-----------------|
 | Goroutine leak | `go func()` without a done channel or WaitGroup |
 | Error ignored | `_ = f()` or missing `if err != nil` after a fallible call |
 | Nil map write | `m[k] = v` without `m = make(map[...]{})` initialization |
 | Race on map | Concurrent map read/write without mutex |
 | ctx in struct | `type T struct { ctx context.Context }` |
 | Naked panic | `panic("something went wrong")` in non-initialization code |
 | fmt.Println in library | Use `slog` or accept a logger; never print to stdout from lib code |
 | init() abuse | `init()` with side effects that make testing hard |
 ## References
 - `clean-code/references/code-smells.md` — comprehensive smell catalog
 - Load `refactoring` skill when recommending fixes
 - Load `test-design` skill for reviewing test quality
 - Load `solid` skill for SOLID principle violations
 ## Mode 2 Routing Note
 This skill is invoked identically in Mode 1 (cloud Claude) and Mode 2 (client-local with supervisor routing). The routing layer (Plan 6) does not exist yet; treat this skill as Mode 1 only for now. Findings format and severity classification are unchanged across modes.
--- a/cognitive-load/SKILL.md
+++ b/cognitive-load/SKILL.md
@@ -0,0 +1,346 @@
 ---
 name: cognitive-load
 description: Measure how much mental effort a codebase demands from developers. Use when identifying hotspots, before architectural decisions, or when code feels hard to understand.
 ---
 # Cognitive Load Analysis
 ## Overview
 Cognitive load is the mental effort required to understand, modify, and reason about code. High cognitive load leads to bugs, slow development, and burnout.
 This skill provides a framework for measuring and reducing cognitive load using the **Cognitive Load Index (CLI)**, scored 0–1000.
 ## CLI Score Rating Scale
 | CLI Score | Rating | Action |
 |-----------|--------|--------|
 | 0–100 | Excellent | Model codebase |
 | 101–250 | Good | Minor improvements |
 | 251–400 | Moderate | Address hotspots |
 | 401–600 | Concerning | Plan refactoring |
 | 601–800 | Poor | Significant refactoring needed |
 | 801–999 | Severe | Consider rewrite of affected areas |
 ## The 8 Dimensions
 ### D1: Structural Complexity
 Cyclomatic complexity — the number of independent paths through code. Each `if`, `for`, `switch`, `case`, `&&`, `||` adds one.
 **Thresholds for Go:**
 - Simple function: < 5 (no issue)
 - Moderate: 5–10 (watch closely)
 - Complex: 10–15 (consider refactoring)
 - Critical: > 15 (must refactor)
 **Detection:**
 ```bash
 # Install: go install github.com/fzipp/gocyclo/cmd/gocyclo@latest
 gocyclo -over 10 ./...
 ```
 **Go hotspots:** Request handlers that mix auth, validation, business logic, and response formatting in one function.
 ### D2: Nesting Depth
 Maximum nesting depth — how many levels deep a reader must track.
 **Thresholds for Go:**
 - Ideal: 1–2 levels
 - Acceptable: 3 levels
 - Warning: 4 levels
 - Critical: 5+ levels
 **Fix:** Early returns and guard clauses eliminate deep nesting:
 ```go
 // Bad: 4 levels deep
 func process(r *http.Request) error {
    if r != nil {
        if r.Body != nil {
            data, err := io.ReadAll(r.Body)
            if err == nil {
                // actual work at level 4
            }
        }
    }
    return nil
 }
 // Good: guard clauses, 1 level
 func process(r *http.Request) error {
    if r == nil {
        return errors.New("nil request")
    }
    if r.Body == nil {
        return errors.New("no request body")
    }
    data, err := io.ReadAll(r.Body)
    if err != nil {
        return fmt.Errorf("read body: %w", err)
    }
    // actual work at level 1
    return nil
 }
 ```
 ### D3: Volume/Size
 Lines of code per file and function. Size alone isn't the issue — hidden responsibilities are.
 **Thresholds for Go:**
 - Function: > 50 lines is a signal (Go can go longer, but be honest about why)
 - File: > 400 lines often means too many responsibilities
 - Package: > 2000 lines total may need splitting
 **Go note:** Table-driven tests inflate LOC legitimately. Distinguish test LOC from production LOC when assessing.
 ### D4: Naming Quality
 Identifiers that don't reveal intent force readers to build a mental model from context.
 **Poor naming signals:**
 - Single-letter variables outside loop indices: `x`, `d`, `tmp`
 - Generic suffixes: `Manager`, `Handler`, `Processor`, `Data`, `Info`, `Util`
 - Abbreviations: `usrLst`, `custRepo`, `cfg2`
 - Inconsistency: `getUserById`, `FetchCustomerByID`, `retrieveClientById` for the same concept
 **Good naming in Go:**
 ```go
 // Bad
 func (m *Manager) proc(d interface{}) error { ... }
 // Good
 func (s *OrderService) PlaceOrder(ctx context.Context, req PlaceOrderRequest) (Order, error) { ... }
 ```
 ### D5: Coupling
 How many packages/types a given unit depends on. High coupling means a change in one place ripples widely.
 **Detection:**
 ```bash
 # Count imports per file
 grep -c "^import" *.go
 # or examine import blocks manually
 ```
 **Go-specific coupling smells:**
 - A domain type that imports from `database/sql`, `net/http`, and `smtp` simultaneously
 - A package that imports more than 10 other internal packages
 - Circular imports (Go compiler rejects these, but near-circular is still a smell)
 ### D6: Cohesion
 Whether things that belong together are together. Low cohesion means related code is scattered.
 **Signs of low cohesion:**
 - A function that calls many unrelated packages
 - A package where functions don't share data or purpose
 - Utility packages that grow without bound (`util`, `common`, `helpers`)
 ### D7: Duplication
 Repeated logic that must be updated in multiple places when requirements change.
 **Detection:**
 ```bash
 # Simple: find identical error messages that suggest copy-paste
 grep -r "error:" --include="*.go" | sort | uniq -d
 ```
 **Go note:** Apply the Rule of Three — don't extract the first or second duplication. The third is the signal.
 ### D8: Navigability
 How easily can a reader find relevant code and understand entry points?
 **Signs of poor navigability:**
 - No clear entry point (no `main.go`, no `New*` constructors)
 - Package names that don't reflect content
 - Files that mix many unrelated types
 - Deep directory nesting with unclear boundaries
 ## Go-Specific Cognitive Load Hotspots
 ### Deeply Nested Error Handling
 Go's explicit error handling is a feature, but it can create visual noise:
 ```go
 // High cognitive load: multiple nested error paths
 func (s *Service) Process(ctx context.Context, id string) error {
    u, err := s.store.GetUser(ctx, id)
    if err != nil {
        if errors.Is(err, ErrNotFound) {
            o, err := s.store.GetOrgByUserID(ctx, id)
            if err != nil {
                if errors.Is(err, ErrNotFound) {
                    return ErrEntityNotFound
                }
                return fmt.Errorf("get org: %w", err)
            }
            // ... more nesting
        }
        return fmt.Errorf("get user: %w", err)
    }
    // ... happy path
 }
 // Lower cognitive load: extract to focused functions
 func (s *Service) Process(ctx context.Context, id string) error {
    entity, err := s.resolveEntity(ctx, id)
    if err != nil {
        return fmt.Errorf("resolve entity: %w", err)
    }
    return s.processEntity(ctx, entity)
 }
 ```
 ### Goroutine Soup
 Goroutines without clear lifecycle management are high cognitive load:
 ```go
 // High load: who owns these goroutines? When do they stop?
 func startWorkers() {
    go processQueue()
    go cleanupExpired()
    go sendNotifications()
 }
 // Lower load: explicit lifecycle with context and WaitGroup
 func startWorkers(ctx context.Context) *WorkerPool {
    pool := &WorkerPool{}
    pool.wg.Add(3)
    go pool.processQueue(ctx)
    go pool.cleanupExpired(ctx)
    go pool.sendNotifications(ctx)
    return pool
 }
 func (p *WorkerPool) Shutdown() {
    p.cancel()
    p.wg.Wait()
 }
 ```
 ### Over-Generalization with Generics
 Generics add expressive power but also cognitive load. Use them sparingly:
 ```go
 // High load: generic just because you can
 func Transform[T any, U any](items []T, fn func(T) U) []U { ... }
 // Lower load: concrete types when you have only one use case
 func transformOrders(orders []Order) []OrderSummary { ... }
 ```
 Extract generics only when you genuinely have multiple type applications.
 ### Init Functions
 `init()` functions run invisibly and create hidden state:
 ```go
 // High load: what order do these run? What state do they set?
 func init() {
    db = mustConnectDB()
    cache = newCache()
 }
 // Lower load: explicit initialization in main or constructor
 func main() {
    db := mustConnectDB(cfg.DatabaseURL)
    cache := newCache(cfg.CacheSize)
    svc := NewService(db, cache)
    // ...
 }
 ```
 ### Long Interfaces
 An interface with 10 methods forces implementors to understand all 10 methods before reasoning about any:
 ```go
 // High cognitive load: 10 methods before you can implement
 type UserRepository interface {
    Create(ctx context.Context, u User) error
    GetByID(ctx context.Context, id UserID) (User, error)
    GetByEmail(ctx context.Context, email string) (User, error)
    Update(ctx context.Context, u User) error
    Delete(ctx context.Context, id UserID) error
    List(ctx context.Context, filter Filter) ([]User, error)
    Count(ctx context.Context, filter Filter) (int64, error)
    Exists(ctx context.Context, id UserID) (bool, error)
    Lock(ctx context.Context, id UserID) error
    Unlock(ctx context.Context, id UserID) error
 }
 // Lower load: separate by use case
 type UserReader interface {
    GetByID(ctx context.Context, id UserID) (User, error)
    GetByEmail(ctx context.Context, email string) (User, error)
 }
 type UserWriter interface {
    Create(ctx context.Context, u User) error
    Update(ctx context.Context, u User) error
    Delete(ctx context.Context, id UserID) error
 }
 ```
 ## When to Measure
 - Before starting refactoring work: establish a baseline
 - When a package/file is frequently the source of bugs
 - When team members consistently report confusion about a subsystem
 - Before a major feature addition that will touch complex code
 ## Hotspot Identification Process
 1. **Size sweep:** Find files > 400 LOC and functions > 50 LOC
   ```bash
   find . -name "*.go" ! -name "*_test.go" | xargs wc -l | sort -rn | head -20
   ```
 2. **Complexity sweep:** Find high cyclomatic complexity
   ```bash
   gocyclo -over 10 ./...
   ```
 3. **Naming sweep:** Find packages with vague names
   ```bash
   ls internal/ # Are there `util`, `common`, `helpers`?
   ```
 4. **Coupling sweep:** Find files with many imports
   ```bash
   grep -l "^import" **/*.go | xargs grep -c '"' | sort -t: -k2 -rn | head -10
   ```
 5. **Duplication sweep:**
   ```bash
   # Look for copied error strings, identical function signatures
   grep -rn "TODO\|FIXME\|HACK" --include="*.go" # Technical debt markers
   ```
 ## Reduction Strategies
 | High CLI Dimension | Primary Fix |
 |-------------------|-------------|
 | D1: Structural Complexity | Extract function, replace conditional with polymorphism |
 | D2: Nesting Depth | Guard clauses, early returns |
 | D3: Volume/Size | Extract type, split package |
 | D4: Naming | Rename (most impactful, cheapest refactoring) |
 | D5: Coupling | Introduce interface, apply DIP |
 | D6: Cohesion | Extract type, reorganize package |
 | D7: Duplication | Extract shared function (after Rule of Three) |
 | D8: Navigability | Reorganize package structure, add package docs |
 ## Cross-References
 - Load `refactoring` skill for systematic techniques to reduce identified hotspots
 - Load `solid` skill for structural fixes (coupling, cohesion)
 - Load `clean-code` skill for naming improvements
--- a/debug/SKILL.md
+++ b/debug/SKILL.md
@@ -0,0 +1,146 @@
 ---
 name: debug
 description: Systematic hypothesis-first debugging. Generate 3-5 ranked hypotheses with verification steps before suggesting any fix. Use when encountering a bug, test failure, or unexpected behavior.
 ---
 # Debug
 ## Overview
 Debugging is hypothesis generation, not fix generation. The first instinct — "try this and see if it works" — wastes time and teaches nothing. A disciplined debug session produces a small set of ranked, falsifiable hypotheses, each with a concrete verification command. The fix comes after one hypothesis is confirmed, not before.
 **Core principle:** A hypothesis you cannot verify in one command is not a hypothesis — it is a guess.
 ## When to Use
 - Any test failure whose cause is not immediately obvious
 - Any production error or unexpected behavior
 - Any bug report from a user or stakeholder
 - Before reaching for a debugger or adding print statements: form hypotheses first
 **Do not use for:**
 - Compile errors with clear messages (just fix the typo)
 - Already-diagnosed bugs where you know the cause (go straight to TDD with a regression test)
 ## Iron Laws
 1. **Never suggest "try X and see what happens."** Every hypothesis must have a specific expected outcome if correct.
 2. **Generate 3-5 hypotheses, ordered by likelihood (most likely first).** Fewer than 3 means you stopped thinking; more than 5 means you are not prioritizing.
 3. **Diagnose only — do not fix in this skill.** The fix happens in a separate TDD cycle (load `tdd` skill) once a hypothesis is confirmed.
 ## Process
 ### Step 1: Read the failure
 Before forming hypotheses:
 - Read the full error message and stack trace, not just the headline
 - Read the file where the failure originated, around the failing line
 - If the failure is from a test, read the test and the code under test
 - Identify the **failure mode** — what actually went wrong (e.g. "nil pointer dereference in goroutine spawned by handler") not just what the error says ("runtime error")
 ### Step 2: Generate hypotheses
 For each hypothesis, capture three things:
 - **Mechanism:** what specific code path or state would produce this exact failure
 - **Verification:** the single command or file inspection that confirms or denies it
 - **Expected outcome if correct:** the specific output you would see
 Order by likelihood. The most likely cause is hypothesis 1.
 ### Step 3: Output
 Use this format:
 ```
 HYPOTHESIS 1 (likelihood: high): <mechanism in one sentence>
 VERIFY: <exact command or file:line to inspect>
 EXPECTED IF CORRECT: <specific output, value, or condition>
 HYPOTHESIS 2 (likelihood: medium): <mechanism>
 VERIFY: <exact command>
 EXPECTED IF CORRECT: <specific output>
 [... up to 5 ...]
 RECOMMENDED NEXT STEP: Run VERIFY for hypothesis 1 first.
 ```
 End with the recommendation, not a fix.
 ## Worked Example
 **Failure:**
 ```
 --- FAIL: TestInvoiceParser_HandlesEmptyPDF (0.00s)
    parser_test.go:47: panic: runtime error: index out of range [0] with length 0
 ```
 **Output:**
 ```
 HYPOTHESIS 1 (likelihood: high): parser indexes into pages[0] before checking len(pages) > 0; empty PDFs produce a zero-page document
 VERIFY: rg -n 'pages\[0\]' internal/parser/
 EXPECTED IF CORRECT: at least one site reads pages[0] without a preceding length check
 HYPOTHESIS 2 (likelihood: medium): the test fixture is a zero-byte file rather than a valid empty PDF; pdf library returns nil pages slice instead of empty
 VERIFY: ls -la testdata/empty.pdf && file testdata/empty.pdf
 EXPECTED IF CORRECT: file size 0 bytes or "data" rather than "PDF document"
 HYPOTHESIS 3 (likelihood: low): a recently changed dependency reordered the page-extraction API; pages[0] now refers to metadata, not content
 VERIFY: git log --oneline -10 -- go.sum | grep -i pdf
 EXPECTED IF CORRECT: a pdf-library bump in the last few commits
 RECOMMENDED NEXT STEP: Run VERIFY for hypothesis 1 first.
 ```
 ## Anti-Patterns
 | Anti-Pattern | Why It Fails |
 |---|---|
 | "Maybe try restarting the service" | Not a mechanism. Not falsifiable. Teaches nothing if it works. |
 | "Could be a race condition" | Mechanism without specifics. Which two operations race? On what state? |
 | "Let me add some print statements and see" | Skips hypothesis generation. Generates noise, not understanding. |
 | Single hypothesis presented as fact | If you are sure, write the regression test, do not run a debug skill. |
 | Mixing hypotheses with fix suggestions | The skill is diagnose-only. The fix is a TDD task on its hypothesis. |
 ## Brain MCP Integration
 The brain holds prior debug sessions across the project. Use it to skip rediscovering known failure modes.
 **At debug start:**
 - Run `brain_query` with the error message snippet + the package name. Past sessions may have logged identical or similar failures with their resolved hypotheses.
 **After a hypothesis is confirmed:**
 - Run `brain_write` with the failure signature → confirmed mechanism. Future debug sessions on the same area get the answer immediately.
 **Never:**
 - Run `brain_write` for an unconfirmed hypothesis. Speculation in the brain pollutes future queries.
 ### Logging
 Call `session_log` once at the end of every phase to record the outcome.
 Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which
 treats `pass` as success, `fail` as failure, `skip` as neither.
 **At end of each phase:**
 - `session_log` with `{skill: "debug", phase: "<phase-name>", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
 **Phases for this skill:** read-failure, generate-hypotheses, output
 **Status semantics:**
 - `pass` — the phase's intended outcome was reached.
 - `fail` — the phase's intended outcome was NOT reached.
 - `skip` — phase was skipped intentionally.
 **Why this matters:** the routing pod (Plan 6) reads pass-rate to decide whether to route a future call to a local model. If your skill never logs, the routing pod sees no data.
 ## Mode 2 Routing Note
 This skill produces high-volume mechanical output (hypothesis enumeration) and is a candidate for Mode 2 routing to a local model in the future. Until Plan 6 ships the routing pod, treat as Mode 1 only. The hypothesis format and discipline are identical regardless of which model generates them.
 ## Cross-References
 - After a hypothesis is confirmed, load `tdd` skill — write a failing regression test that proves the bug exists, then fix it.
 - For test failures specifically caused by mock-vs-real divergence, also load `tdd/references/testing-anti-patterns.md`.
 - Load `code-review` skill if the diagnosis surfaces a structural issue (god object, shotgun surgery) rather than a pointwise bug.
--- a/feature-spec/SKILL.md
+++ b/feature-spec/SKILL.md
@@ -0,0 +1,172 @@
 ---
 name: feature-spec
 description: Write a tight implementation-level spec for a single feature or component before coding it. Use after a project-level spec exists (see spec-driven-dev) and you need to nail down one feature inside it.
 ---
 # Feature Spec
 ## Overview
 A feature spec is the implementation-level contract for one feature or component, written by an engineer before code. It is shorter and more focused than a project-level spec (`spec-driven-dev`) — it assumes the project context, the stack, and the user are already known. Its job is to force one decision: **what does done look like, and what is explicitly not in this feature.**
 **Core principle:** If you cannot write three measurable success criteria, you do not understand the feature well enough to build it.
 ## When to Use
 - About to start implementing a feature inside an existing, already-specced project
 - A change that touches more than one file or one package
 - A change that introduces a new public API, schema, or external contract
 - Any time the requirements feel "obvious" — that feeling usually means assumptions are stacking silently
 **When NOT to use:**
 - Project kickoff or first-time architecture decisions — use `spec-driven-dev` instead, which covers the broader lifecycle (Objective, Tech Stack, Project Structure, Boundaries, etc.)
 - Single-line fixes, typos, dependency bumps
 - Spike or throwaway prototypes (note them as such; no spec needed)
 ## Iron Laws
 1. **Success criteria must be measurable.** "The system is fast" is banned. "p99 < 200ms under 100 RPS" is valid. If you cannot state a measurement, the criterion is not a criterion.
 2. **An "Out of scope" section is mandatory.** If you do not draw the boundary, the implementer (you, future-you, or another agent) will guess wrong.
 3. **Every technical decision in "Technical approach" must have a rationale.** A decision is any choice where a reasonable alternative existed and was not taken. "Use Postgres" is not a decision; "use Postgres because the rest of the stack uses it and we need transactions across user + invoice" is.
 ## Spec Template
 ```markdown
 # Spec: [Feature Name]
 ## Problem Statement
 What problem does this feature solve? For whom? Why now?
 (2-4 sentences. If you cannot answer "why now," ask before writing the rest.)
 ## Success Criteria
 - [ ] Criterion 1 — measurable and verifiable
 - [ ] Criterion 2 — measurable and verifiable
 - [ ] Criterion 3 — measurable and verifiable
 ## Constraints
 Non-negotiable requirements the solution must satisfy.
 Examples: "must not change the public API", "must complete in < 500ms",
 "must not require a database migration", "must work offline".
 ## Out of Scope
 What this feature explicitly does NOT do, even if related.
 Anything plausibly in scope that you decided to defer goes here.
 ## Technical Approach
 The architecture or design decisions, with rationale for each.
 Each bullet should answer "why this choice over the obvious alternative."
 ## Risks
 What could go wrong, and how it would be mitigated or detected.
 At least one risk must be listed; "no risks" usually means insufficient analysis.
 ```
 ## Worked Example
 ```markdown
 # Spec: IBAN extraction from Swedish PDF invoices
 ## Problem Statement
 Invoice processors at small accounting firms manually copy IBANs from PDF
 invoices into the banking system. The current parser handles ~60% of formats;
 40% require manual fallback, defeating the automation premise.
 Why now: the pilot customer is onboarding three new firms next month, all of
 which use formats currently in the 40%.
 ## Success Criteria
 - [ ] IBAN extracted correctly from ≥ 95% of a 200-invoice test corpus
 - [ ] Zero false positives (extracting a non-IBAN string as an IBAN) on the corpus
 - [ ] p99 extraction time < 500ms per invoice
 - [ ] Returns a structured error (not a panic) for unrecognized formats
 ## Constraints
 - Must not require a new dependency (PDF library is already chosen)
 - Must work without network access (no external IBAN validation API)
 - Must be callable from the existing `Parser.Extract(ctx, r)` signature
 ## Out of Scope
 - Non-Swedish IBAN formats (separate spec when needed)
 - BIC/SWIFT extraction (handled by a separate parser)
 - OCR — input is assumed to be text-extractable PDF, not scanned image
 - Validating that the IBAN refers to a real account
 ## Technical Approach
 - Layered regex-then-checksum approach: regex narrows candidates, MOD-97
  checksum validates. Rationale: pure regex is too permissive; pure
  checksum requires too many candidates to test.
 - Use the existing `pdfplumber` text layer rather than re-extracting.
  Rationale: extraction is already the slow path; reusing the layer
  avoids doubling extraction time.
 - Return `(iban string, ok bool, err error)` not `(*IBAN, error)`.
  Rationale: the caller almost always wants to know "did we find one"
  separately from "did extraction itself fail," and a pointer return
  invites nil-deref bugs.
 ## Risks
 - Regex tuned to current corpus may overfit; mitigation: corpus is held out
  per-firm, so we will see degradation at the next firm onboarding before
  it reaches production.
 - MOD-97 implementation bugs are subtle; mitigation: cross-check against
  the Wikipedia reference implementation in tests.
 ```
 ## Common Rationalizations
 | Rationalization | Reality |
 |---|---|
 | "It's a small feature, I don't need a spec" | Small features still need a measurable definition of done. The spec stays small too. |
 | "I'll write the spec after, when I know more" | Then it is documentation, not a spec. The point is forcing clarity *before* the code locks in your assumptions. |
 | "The success criteria are obvious" | If they are obvious, they are easy to write down. If they are hard to write down, they are not obvious. |
 | "I don't have time to enumerate out-of-scope items" | You will spend more time arguing over scope creep at review than writing this section now. |
 ## Brain MCP Integration
 The brain holds prior feature specs across the project. Use it to keep constraints and out-of-scope decisions coherent across related features.
 **At spec start:**
 - Run `brain_query` for prior specs in the same area or for the same problem domain. Reusing the same constraints and out-of-scope decisions across related features keeps the system coherent.
 **After spec is approved:**
 - Run `brain_write` with the spec's success criteria and out-of-scope list. Future related specs can cross-reference rather than re-derive.
 ### Logging
 Call `session_log` once at the end of every phase to record the outcome.
 Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which
 treats `pass` as success, `fail` as failure, `skip` as neither.
 **At end of each phase:**
 - `session_log` with `{skill: "feature-spec", phase: "<phase-name>", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
 **Phases for this skill:** draft
 **Status semantics:**
 - `pass` — the phase's intended outcome was reached (spec drafted and approved).
 - `fail` — the phase's intended outcome was NOT reached (spec rejected, blocked on unresolved questions).
 - `skip` — phase was skipped intentionally.
 **Why this matters:** the routing pod (Plan 6) reads pass-rate to decide whether to route a future call to a local model. If your skill never logs, the routing pod sees no data.
 ## Mode 2 Routing Note
 Spec writing requires interpretation of intent and tradeoff reasoning — work where Claude generally outperforms a 9-13B local model. This skill is intentionally **not** a Mode 2 routing candidate. In Mode 2, the routing layer should leave `feature-spec` calls on Claude.
 ## Cross-References
 - Load `spec-driven-dev` instead if this is the first spec for a project (broader template, includes tech stack, project structure, boundaries).
 - Load `problem-analysis` skill before this if the problem statement itself feels fuzzy.
 - Load `tdd` skill once the spec is approved — each success criterion becomes a test.
 - Load `planning` skill if the spec implies more than ~5 implementation tasks.
--- a/gitea-ci/SKILL.md
+++ b/gitea-ci/SKILL.md
@@ -0,0 +1,487 @@
 ---
 name: gitea-ci
 description: End-to-end Gitea CI/CD setup with self-hosted act_runner, quality gate pipeline, buildah image build, k3s deploy, and GitHub mirror. Covers act_runner-specific gotchas that are not documented anywhere.
 ---
 # Gitea CI/CD Pipeline Setup
 ## Overview
 This skill sets up a complete CI/CD pipeline on Gitea with a self-hosted runner. It covers the full journey from a bare repo to a working pipeline with: lint/test gate, OCI image build, k3s deploy, and GitHub mirror. It also captures non-obvious act_runner quirks that will burn you if you don't know them.
 Load this skill when starting a new project that needs CI, or when debugging a broken Gitea Actions pipeline.
 ## When to Use
 - Setting up CI for a new project on this infra (Gitea + koala/iguana/flamingo)
 - Debugging a Gitea Actions pipeline that silently fails or behaves unexpectedly
 - Migrating an existing project to Gitea Actions from another CI system
 ---
 ## Phase 0: Gather Context
 Before writing any YAML, collect the following. Use AskUserQuestion for anything not obvious from the project:
 | Item | Where to find it | Why it matters |
 |------|-----------------|----------------|
 | Gitea repo URL | git remote -v | Base URL for API calls |
 | Runner machine | CLAUDE.md / infra notes | Which machine runs jobs |
 | Primary language | go.mod, package.json, etc. | Drives the toolchain steps |
 | Container registry | Ask or check existing config | `localhost:5000` on koala, or external |
 | k3s namespace | k8s/ manifests or ask | Deploy target |
 | GitHub mirror repo | Ask | Target for the mirror job |
 | Image name | Ask or derive from repo name | Used in buildah + k3s steps |
 Check the project's `CLAUDE.md` for infra notes first — often this answers most questions without needing to ask.
 ---
 ## Phase 1: Verify Runner Is Live
 Before writing the workflow, confirm the runner is registered and healthy.
 ```bash
 # Check registered runners via Gitea API
 curl -s "https://<gitea-host>/api/v1/repos/<owner>/<repo>/actions/runners" \
  -H "Authorization: token $GITEA_TOKEN" | python3 -m json.tool
 ```
 If no runners appear, the runner is not registered. On this infra, `koala` runs `act_runner`. Check with:
 ```bash
 ssh koala "systemctl status act_runner"
 # or if running as a process:
 ssh koala "pgrep -a act_runner"
 ```
 act_runner config is typically at `~/.config/act_runner/` or wherever it was installed. Registration command:
 ```bash
 act_runner register \
  --instance https://<gitea-host> \
  --token <runner-registration-token> \
  --name koala \
  --labels self-hosted
 ```
 The registration token is in Gitea: Settings → Actions → Runners → Create Runner.
 ---
 ## Phase 2: Write the Workflow
 The standard pipeline has 4-5 jobs. Copy the template below and fill in the `env.IMAGE` value and any project-specific toolchain steps.
 ```yaml
 name: CI
 on:
  push:
    branches: [main]
    tags: ["v*"]
  pull_request:
    branches: [main]
 env:
  IMAGE: <project-name>   # ← change this
 jobs:
  # ── 1. Quality gate ─────────────────────────────────────────────────────────
  check:
    name: Lint / Test / Vet
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
          cache: false   # self-hosted: Go cache persists on disk between runs
      - name: Verify toolchain
        run: |
          go version
          task --version
          govulncheck -version 2>&1 || true
      - name: Install golangci-lint
        run: |
          curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/HEAD/install.sh \
            | sh -s -- -b "$(go env GOPATH)/bin" v2.11.4
          golangci-lint --version
      - name: Run checks
        run: task check
  # ── 2. Build image ──────────────────────────────────────────────────────────
  build:
    name: Build & Import
    needs: check
    runs-on: self-hosted
    if: github.event_name != 'pull_request'
    outputs:
      image-tag: ${{ steps.meta.outputs.sha-tag }}
    steps:
      - uses: actions/checkout@v4
      - name: Derive image tags
        id: meta
        run: |
          SHA=$(git rev-parse --short HEAD)
          echo "sha-tag=${SHA}" >> "$GITHUB_OUTPUT"
          REF="${{ github.ref }}"
          if [[ "$REF" == refs/tags/v* ]]; then
            echo "version-tag=${REF#refs/tags/}" >> "$GITHUB_OUTPUT"
          fi
      - name: Build and push to local registry
        run: |
          REGISTRY="localhost:5000"
          REF="${REGISTRY}/${{ env.IMAGE }}:${{ steps.meta.outputs.sha-tag }}"
          buildah build \
            --label "org.opencontainers.image.revision=${{ github.sha }}" \
            --label "org.opencontainers.image.source=${{ github.repositoryUrl }}" \
            -t ${REF} \
            -t ${REGISTRY}/${{ env.IMAGE }}:latest \
            .
          buildah push --tls-verify=false ${REF}
          buildah push --tls-verify=false ${REGISTRY}/${{ env.IMAGE }}:latest
          [[ -n "${{ steps.meta.outputs.version-tag }}" ]] && \
            buildah push --tls-verify=false \
              ${REF} \
              ${REGISTRY}/${{ env.IMAGE }}:${{ steps.meta.outputs.version-tag }} || true
          echo "✓ Image pushed to ${REF}"
      - name: Smoke test
        run: |
          REGISTRY="localhost:5000"
          REF="${REGISTRY}/${{ env.IMAGE }}:${{ steps.meta.outputs.sha-tag }}"
          CNAME="smoke-${{ steps.meta.outputs.sha-tag }}"
          sudo k3s ctr images pull --plain-http ${REF}
          OUTPUT=$(timeout 5 sudo k3s ctr run --rm ${REF} ${CNAME} /<binary-name> 2>&1 || true)
          sudo k3s ctr containers delete ${CNAME} 2>/dev/null || true
          echo "$OUTPUT" | grep -q "<project-name>" \
            && echo "✓ Smoke test passed" \
            || echo "⚠ Smoke test inconclusive: $OUTPUT"
  # ── 3. Deploy to k3s ────────────────────────────────────────────────────────
  deploy:
    name: Deploy to staging (k3s)
    needs: build
    runs-on: self-hosted
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - name: Apply manifests and update image
        env:
          IMAGE_TAG: ${{ needs.build.outputs.image-tag }}
        run: |
          kubectl apply -f k8s/namespace.yaml
          kubectl apply -f k8s/deployment.yaml
          kubectl set image deployment/<project-name> \
            <project-name>=localhost:5000/${{ env.IMAGE }}:${IMAGE_TAG} \
            --namespace <project-name>
      - name: Verify rollout
        run: |
          kubectl rollout status deployment/<project-name> \
            --namespace <project-name> \
            --timeout=120s \
          || {
            echo "── pod status ──"
            kubectl get pods -n <project-name> -o wide
            echo "── pod events ──"
            kubectl get events -n <project-name> --sort-by='.lastTimestamp' | tail -20
            exit 1
          }
  # ── 4. Mirror to GitHub ─────────────────────────────────────────────────────
  mirror:
    name: Mirror to GitHub
    needs: deploy
    runs-on: self-hosted
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Push to GitHub
        run: |
          mkdir -p ~/.ssh
          echo '${{ secrets.GH_DEPLOY_KEY }}' > ~/.ssh/id_rsa_gh_mirror
          chmod 600 ~/.ssh/id_rsa_gh_mirror
          ssh-keyscan github.com >> ~/.ssh/known_hosts 2>/dev/null
          GIT_SSH_COMMAND="ssh -i ~/.ssh/id_rsa_gh_mirror -o IdentitiesOnly=yes" \
            git push git@github.com:<gh-org>/<repo>.git HEAD:main
          rm ~/.ssh/id_rsa_gh_mirror
          echo "✓ Mirrored to GitHub"
 ```
 Replace all `<placeholders>` before committing.
 ---
 ## Trunk-Based Development and CI
 This infra uses TBD. CI on every push to main is the quality gate — not branch protection.
 ### What this means in practice
 - Commit directly to main for all solo and single-agent work
 - Each commit: one logical change + `task check` passing locally before push
 - CI runs `task check` again on push — if it was green locally it will be green in CI
 - No PRs, no feature branches, no squash-merges for normal work
 ### The pre-push ritual
 ```bash
 # In project root before every push:
 task check && git push origin main
 ```
 Wire this into your shell or a git pre-push hook so it's automatic.
 ### When CI catches something you missed locally
 Fix forward — commit the fix directly to main. Do not revert unless the broken
 commit introduced a data migration or irreversible infrastructure change.
 ### Feature flags instead of feature branches
 If a feature is too large for one session, hide it behind a build tag or a config
 flag until ready:
 ```go
 // cmd/server/main.go
 if os.Getenv("ENABLE_EXPERIMENTAL_FEATURE") == "true" {
    server.RegisterExperimentalRoutes(mux)
 }
 ```
 Commit the hidden feature incrementally. Activate it when complete.
 ### When to break the rule
 - **Parallel agents on the same repo:** short-lived `agent/<description>` branch,
  merge to main inside the same session.
 - **External contributor or client four-eyes requirement:** PR flow, document the
  reason in `PROJECT.md`.
 ---
 ## Phase 3: Set Up GitHub Mirror
 The mirror job needs an SSH deploy key with write access to the GitHub repo.
 ```bash
 # On your local machine:
 ssh-keygen -t ed25519 -C "gitea-mirror-<project>" -f /tmp/gh_mirror_key -N ""
 cat /tmp/gh_mirror_key.pub   # add this to GitHub repo → Settings → Deploy Keys (Allow write)
 cat /tmp/gh_mirror_key       # add this to Gitea repo → Settings → Secrets → GH_DEPLOY_KEY
 rm /tmp/gh_mirror_key /tmp/gh_mirror_key.pub
 ```
 The private key goes in Gitea as the `GH_DEPLOY_KEY` secret. The public key goes in GitHub as a deploy key with write access.
 ---
 ## Phase 4: Add Project Secrets
 Set secrets via the Gitea API — faster than the UI and scriptable:
 ```bash
 # Template:
 curl -s -X PUT \
  "https://<gitea-host>/api/v1/repos/<owner>/<repo>/actions/secrets/<SECRET_NAME>" \
  -H "Authorization: token $GITEA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"data":"<secret-value>"}' \
  -w "\nHTTP %{http_code}"
 # Expected: HTTP 204
 # Verify secrets are set:
 curl -s "https://<gitea-host>/api/v1/repos/<owner>/<repo>/actions/secrets" \
  -H "Authorization: token $GITEA_TOKEN" | python3 -m json.tool
 ```
 Required secrets for the baseline pipeline: `GH_DEPLOY_KEY`.
 Any project-specific secrets (API keys, tokens, credentials) are added the same way.
 ---
 ## Phase 5: Debug Failing Runs
 When a run fails, use the Gitea API to get logs without opening a browser.
 ```bash
 GITEA_HOST="https://<gitea-host>"
 REPO="<owner>/<repo>"
 TOKEN="$GITEA_TOKEN"
 # List recent runs
 curl -s "$GITEA_HOST/api/v1/repos/$REPO/actions/runs?limit=5" \
  -H "Authorization: token $TOKEN" | python3 -c "
 import json,sys
 for r in json.load(sys.stdin)['workflow_runs']:
    print(f'#{r[\"id\"]} {r[\"status\"]:12} {r.get(\"conclusion\",\"\"):10} {r[\"display_title\"][:50]}')
 "
 # List jobs for a run (replace RUN_ID)
 curl -s "$GITEA_HOST/api/v1/repos/$REPO/actions/runs/<RUN_ID>/jobs" \
  -H "Authorization: token $TOKEN" | python3 -c "
 import json,sys
 for j in json.load(sys.stdin)['jobs']:
    print(f'Job {j[\"id\"]}: {j[\"name\"]:30} | {j[\"status\"]:10} | {j.get(\"conclusion\",\"\")}')
 "
 # Get full log for a job (replace JOB_ID)
 curl -s "$GITEA_HOST/api/v1/repos/$REPO/actions/jobs/<JOB_ID>/logs" \
  -H "Authorization: token $TOKEN" | grep -E "ERROR|error|FAIL|undefined|cannot" | tail -30
 ```
 ---
 ## ⚠ act_runner Gotchas
 These are non-obvious bugs and sharp edges that will silently break your pipeline. Read all of them.
 ### 1. Steps with `env:` blocks that use secret expressions are silently skipped
 **Symptom**: A step produces no log output at all — not even its step header (`⭐ Run Main ...`). Subsequent steps fail as if the skipped step never ran.
 **Cause**: In act_runner v0.4.x, steps that have an `env:` block containing `${{ secrets.* }}` expressions are silently dropped before execution.
 **Wrong**:
 ```yaml
 - name: Write config
  env:
    API_KEY: ${{ secrets.API_KEY }}
  run: |
    echo "KEY=$API_KEY" > config.env
 ```
 **Right** — inline the expression directly in the `run:` script:
 ```yaml
 - name: Write config
  run: |
    echo 'KEY=${{ secrets.API_KEY }}' > config.env
 ```
 act_runner evaluates `${{ }}` expressions before passing the script to bash. The values are redacted in log output. No `env:` block needed.
 ### 2. Heredocs in `run: |` blocks — the indented EOF trap
 **Symptom**: A file is written with 10-100× more content than expected. JSON files fail to parse. The step appears to succeed.
 **Cause**: In a YAML literal block scalar (`run: |`), all content is at the block's indentation level. If you indent your heredoc content (to look neat), the `EOF` terminator also gets indented. Bash requires the terminator to be at column 0 — an indented `EOF` is NOT recognized, so the heredoc consumes everything that follows until end-of-script.
 **Wrong**:
 ```yaml
      - name: Write file
        run: |
          cat > output.json <<EOF
          {
            "key": "value"
          }
          EOF          ← indented, NOT recognized as terminator
          chmod 600 output.json
 ```
 The file ends up containing the JSON, the literal string `          EOF`, and `          chmod 600 output.json`.
 **Right** — avoid heredocs entirely. Use `echo` for simple lines, `printf` for single-line formatted output:
 ```yaml
      - name: Write file
        run: |
          echo '{"key":"${{ secrets.SOME_VALUE }}"}' > output.json
          chmod 600 output.json
 ```
 For JSON with dynamic values, use python3 as a one-liner (but see gotcha #3):
 ```yaml
      - name: Write token file
        run: |
          python3 -c "import json; open('token.json','w').write(json.dumps({'key':'${{ secrets.TOKEN }}','type':'bearer'}))"
 ```
 ### 3. Multiline `python3 -c` breaks YAML block scalar parsing
 **Symptom**: The workflow file is pushed but no CI run appears — Gitea silently rejects the workflow YAML.
 **Cause**: Unindented code in a multiline `python3 -c "..."` block (with line breaks inside the YAML `run: |`) is at column 0, which the YAML parser interprets as ending the block scalar and starting a new YAML key.
 **Wrong**:
 ```yaml
        run: |
          python3 -c "
 import json          ← column 0, YAML parser thinks block scalar ended
 d = {...}
 "
 ```
 **Right** — keep the python3 invocation on one line:
 ```yaml
        run: |
          python3 -c "import json; d = {...}; open('f','w').write(json.dumps(d))"
 ```
 If the one-liner is too long, write a small `.py` file in a previous step and execute it.
 ### 4. Validate YAML locally before pushing
 Gitea silently ignores workflows with YAML parse errors — no run appears, no error message.
 ```bash
 python3 -c "import yaml; yaml.safe_load(open('.gitea/workflows/ci.yml').read()); print('YAML valid')"
 ```
 Run this before every push that touches the workflow file.
 ### 5. Commit generated files instead of gitignoring them
 **Symptom**: Lint/build fails in CI with `undefined: SomeGeneratedType` even though it works locally.
 **Cause**: Code generation tools (templ, sqlc, wire, protoc, etc.) produce `_generated.go` or `_templ.go` files that are often in `.gitignore`. CI runners don't have the generator installed and can't regenerate them.
 **Fix**: Remove the generated files from `.gitignore` and commit them. They are deterministic, diffable Go source — committing them is the standard practice for projects that don't want to install generators in CI.
 If you need generators in CI anyway (e.g., for proto changes), add a generation step BEFORE lint in the `check` job and install the tool explicitly.
 ### 6. `errcheck` will flag `fmt.Sscanf` and similar "can't fail in practice" calls
 Parsers that convert strings to numbers (e.g., parsing API responses where you know the format) will trigger `errcheck`. Suppress with blank assignment:
 ```go
 // before
 fmt.Sscanf(s, "%d", &n)
 // after — communicates intent and satisfies errcheck
 _, _ = fmt.Sscanf(s, "%d", &n)
 ```
 ---
 ## Verification Checklist
 Before marking CI setup complete:
 - [ ] `python3 -c "import yaml; yaml.safe_load(...)"` passes on the workflow YAML
 - [ ] Runner appears as active in Gitea repo → Settings → Actions → Runners
 - [ ] A push to `main` triggers a run visible in Gitea repo → Actions
 - [ ] `check` job passes (lint + test + vet green)
 - [ ] `build` job produces an image in the local registry (`curl localhost:5000/v2/<image>/tags/list`)
 - [ ] `deploy` job shows rollout success in pod logs
 - [ ] `mirror` job shows "✓ Mirrored to GitHub" and the commit appears on GitHub
 - [ ] A PR (non-main push) runs `check` only, skips build/deploy/mirror
 - [ ] Any project-specific secrets are set via API and verified with a secrets list call
 ---
 ## Related Skills
 - `tdd` — for writing the tests that the `check` job runs
 - `planning` — for breaking down the work before writing pipeline YAML
 - `problem-analysis` — for debugging when a job fails in unexpected ways
--- a/grill-me/SKILL.md
+++ b/grill-me/SKILL.md
@@ -0,0 +1,147 @@
 ---
 name: grill-me
 description: Critically interrogates plans, ideas, and proposals to make them sharper, more robust, and better grounded — with the explicit goal of producing a meaningfully better plan by the end. Use this skill whenever the user wants to stress-test a plan, prepare for hard questions, check if an idea is ready to act on, validate a hypothesis, or explicitly asks to be challenged, questioned, or grilled. Also trigger when a conversation has been exploring an idea for a while and it's time to converge — use this skill to check if the idea is actually ready. Trigger phrases include "grill me", "challenge this", "poke holes", "am I missing something", "is this ready", "stress-test this", "devil's advocate", "what am I not seeing".
 ---
 # Grill-Me
 A structured interrogation skill that pressure-tests plans and ideas before they become commitments. The goal is not to kill ideas — it's to produce a **demonstrably better plan** than the one that walked in.
 ## When to use each mode
 Three interrogation modes. Pick based on context:
 **Mode 1 — Quick Poke** (idea is early, user wants light pressure)
 3–5 sharp questions, each with a proposal. Fast. No structure overhead. Good for early Diamond 1 conversations.
 **Mode 2 — Full Grill** (idea is approaching commitment, user wants thorough interrogation)
 Structured pass through all lenses below. Ends with a Plan Delta. Good for end of Diamond 1 or before starting a project.
 **Mode 3 — Pre-mortem** (a plan exists and is about to be executed)
 Assume it failed. Work backwards. What went wrong? Good for Diamond 2 → Deliver transitions.
 Read the context to pick the right mode. If unsure, ask: "Quick poke or full grill?"
 ---
 ## Always bring a proposal
 When a question requires input or a decision, don't just ask — bring a recommendation:
 > "What's the smallest version that validates the hypothesis? I'd suggest starting with just the ntfy URL-push channel (Phase 1), because it gives you real usage data before investing in the automated sync. Does that match your intuition?"
 The format: sharp question → "I'd suggest [X]" → one-sentence reasoning → quick confirmation ask.
 This respects the user's time, models good thinking, and makes it easier to say "yes, agreed" than to construct an answer from scratch. If you genuinely have no strong opinion, say so explicitly — but that should be rare.
 ---
 ## Full Grill — interrogation lenses
 Run through each lens. Skip lenses that clearly don't apply, but say so explicitly ("Skipping competitive lens — this is internal tooling"). For each question that requires a decision, include a proposal.
 ### 1. Hypothesis clarity
 - Can the core claim be stated as: "We believe [X] will produce [Y], measurable by [Z]"?
 - Is Y falsifiable? Could an experiment prove this wrong?
 - Is Z actually measurable with available tools, or is it hand-wavy?
 ### 2. Assumptions audit
 - List the top 3 assumptions the plan depends on.
 - Which assumption, if wrong, kills the whole thing?
 - Has any assumption been validated, or are they all untested?
 ### 3. Futures resilience (Double Diamond + Futures Thinking lens)
 - Name 2–3 plausible alternative futures (not just the one assumed).
 - Does the plan still make sense in each of them?
 - What's the earliest signal that one of those other futures is manifesting?
 ### 4. Scope and reversibility
 - What is explicitly out of scope? (Unstated scope is a risk.)
 - Which decisions in this plan are irreversible?
 - What's the smallest version of this that would still validate the hypothesis?
 ### 5. Cost and effort reality check
 - What's the most expensive part? Time, money, attention?
 - Is there a simpler path to the same learning?
 - What does "good enough" look like vs "perfect"?
 ### 6. Failure modes
 - What's the most likely way this fails? (Not catastrophically — just quietly doesn't work.)
 - What's the most catastrophic failure mode, even if unlikely?
 - Is there a canary — an early signal that things are going wrong?
 ### 7. Who and why
 - Who benefits if this succeeds?
 - Who is harmed, blocked, or ignored?
 - Is there someone who should be consulted who hasn't been?
 ### 8. Next action clarity
 - Is the next step concrete enough that it could be handed to an agent or a person with no further explanation?
 - Is there a single person (or agent) responsible for it?
 - What would stop it from happening this week?
 ---
 ## Pre-mortem mode
 Set the scene: "It's [6 months / 1 year] from now. This project has failed. Not dramatically — it just quietly didn't deliver. Walk me through what happened."
 Then probe:
 - Was it a wrong assumption, a scope problem, or an execution problem?
 - Was there a moment where a different decision would have changed the outcome?
 - What should we do differently before we start?
 Include a proposal for each finding: "To prevent this, I'd suggest [X]."
 ---
 ## Output format
 **Quick Poke:** Questions numbered, each followed by a proposal. No preamble. Uncomfortable is fine.
 **Full Grill:** One section per lens. Each section: 1–2 sentences of framing + 2–3 sharp questions + proposals where relevant. End with:
 - "The three things most worth resolving before moving forward are: [X, Y, Z]."
 - A **Plan Delta** section (see below).
 **Pre-mortem:** Narrative first (2–3 sentences setting the failure scene), then structured findings with proposals, then "three things to do differently".
 ---
 ## Plan Delta — mandatory at the end of Full Grill and Pre-mortem
 The grill is not done when the questions have been asked. It's done when the plan is **demonstrably stronger**. At minimum:
 - At least one weak assumption has been named and either validated or explicitly accepted as a known risk
 - The hypothesis is sharper and more falsifiable than when we started
 - Something has changed: a scope reduction, a clearer metric, a reordered phase, a dropped assumption
 If the plan hasn't changed at all after a Full Grill, something went wrong — the questions weren't sharp enough or the answers weren't pushed hard enough.
 End every Full Grill and Pre-mortem with:
 > **Plan delta**
 > Before: [one sentence on the original plan]
 > After: [one sentence on what concretely changed]
 > Still open: [1–2 unresolved questions and how/when they'll be resolved]
 ---
 ## Tone guidance
 The grill should feel like a smart, trusted colleague who genuinely wants the plan to succeed — not a critic looking for reasons to say no. Questions should be sharp but not hostile. If something is clearly solid, say so briefly before moving on.
 Don't soften questions with preambles. "Have you considered that..." is weaker than "What happens if...". Prefer the latter.
 If the plan is actually good, say so: "Lenses 1–4 look solid. The real open questions are in lenses 5 and 6."
 ---
 ## Integration with Double Diamond
 This skill maps to the **convergence moments** in the Double Diamond workflow:
 - **End of Diamond 1 (Define):** Use Full Grill to check if the hypothesis is ready to become a project. The three convergence criteria (falsifiable hypothesis, measurable success criterion, futures-resilient) come directly from lenses 1 and 3.
 - **End of Diamond 2 (Deliver):** Use Pre-mortem before promoting to pre-prod.
 - **Mid-Diamond 2 (Develop):** Use Quick Poke when an agent's experiment result is ambiguous and you're deciding whether to continue or pivot.
 The skill does NOT replace human judgment at convergence points — it sharpens the plan so that judgment is better informed.
--- a/install.sh
+++ b/install.sh
@@ -0,0 +1,98 @@
 #!/usr/bin/env bash
 # install.sh — bootstrap installer for the skills library.
 #
 # Usage (one-liner, hosts without Task):
 #   curl -fsSL https://gitea.d-ma.be/mathias/skills/raw/branch/main/install.sh | bash
 #
 # Pinned to a specific tag:
 #   SKILLS_REF=v0.1.0 bash install.sh
 #
 # Idempotent: re-running pulls the latest changes for the configured ref
 # (default: main) and re-wires symlinks. Existing correct links are
 # left alone; mismatched ones are replaced.
 set -euo pipefail
 REPO_URL="${SKILLS_REPO_URL:-https://gitea.d-ma.be/mathias/skills.git}"
 REF="${SKILLS_REF:-main}"
 CHECKOUT_DIR="${SKILLS_CHECKOUT_DIR:-$HOME/.local/share/skills}"
 CLAUDE_GLOBAL_DIR="${CLAUDE_SKILLS_DIR:-$HOME/.claude/skills}"
 log() { printf '[skills] %s\n' "$*"; }
 ensure_checkout() {
  if [ -d "$CHECKOUT_DIR/.git" ]; then
    log "updating $CHECKOUT_DIR"
    git -C "$CHECKOUT_DIR" fetch --tags --quiet
    git -C "$CHECKOUT_DIR" checkout --quiet "$REF"
    if git -C "$CHECKOUT_DIR" symbolic-ref -q HEAD >/dev/null; then
      # on a branch — fast-forward
      git -C "$CHECKOUT_DIR" pull --ff-only --quiet
    fi
  else
    log "cloning $REPO_URL into $CHECKOUT_DIR (ref=$REF)"
    mkdir -p "$(dirname "$CHECKOUT_DIR")"
    git clone --quiet "$REPO_URL" "$CHECKOUT_DIR"
    git -C "$CHECKOUT_DIR" checkout --quiet "$REF"
  fi
 }
 list_skills() {
  # Every top-level entry that isn't a known meta file.
  (cd "$CHECKOUT_DIR" && ls -1) | grep -Ev '^(Taskfile\.yml|install\.sh|README\.md|SKILLS_INDEX\.md|\.gitea|\.git)$' || true
 }
 link_skill() {
  # link_skill <target_dir> <skill_name>
  local target_dir="$1"
  local skill="$2"
  local target="$CHECKOUT_DIR/$skill"
  local link="$target_dir/$skill"
  if [ -L "$link" ] || [ -e "$link" ]; then
    local current
    current=$(readlink "$link" 2>/dev/null || true)
    if [ "$current" = "$target" ]; then
      return
    fi
    rm -rf "$link"
  fi
  ln -s "$target" "$link"
  log "linked $link → $target"
 }
 wire_claude_global() {
  mkdir -p "$CLAUDE_GLOBAL_DIR"
  while IFS= read -r skill; do
    [ -n "$skill" ] || continue
    link_skill "$CLAUDE_GLOBAL_DIR" "$skill"
  done < <(list_skills)
 }
 wire_claude_repo() {
  # Only wire per-repo when invoked from inside a git repo and it isn't
  # the skills repo itself.
  if ! git rev-parse --show-toplevel >/dev/null 2>&1; then
    return
  fi
  local repo
  repo=$(git rev-parse --show-toplevel)
  if [ "$repo" = "$CHECKOUT_DIR" ]; then
    return
  fi
  local target_dir="$repo/.claude/skills"
  mkdir -p "$target_dir"
  while IFS= read -r skill; do
    [ -n "$skill" ] || continue
    link_skill "$target_dir" "$skill"
  done < <(list_skills)
 }
 main() {
  ensure_checkout
  wire_claude_global
  wire_claude_repo
  log "done — $(list_skills | wc -l | tr -d ' ') skill(s) wired at ref=$REF"
 }
 main "$@"
--- a/planning/SKILL.md
+++ b/planning/SKILL.md
@@ -0,0 +1,276 @@
 ---
 name: planning
 description: Decompose work into small, verifiable, ordered tasks. Use when you have a spec or clear requirements and need to break work into implementable units.
 ---
 # Planning and Task Breakdown
 ## Overview
 Good task breakdown is the difference between reliable delivery and a tangled mess. Every task should be small enough to implement, test, and verify in a single focused session.
 **Input:** A spec (load `spec-driven-dev` skill) or clear requirements.
 **Output:** A prioritized, ordered list of tasks with acceptance criteria and verification steps.
 ## When to Use
 - You have a spec and need to break it into implementable units
 - A task feels too large or vague to start
 - You need to communicate scope to Mathias
 - The implementation order isn't obvious
 - You want to identify parallelization opportunities
 **When NOT to use:** Single-file changes with obvious scope.
 ## The Planning Process
 ### Step 1: Read-Only Discovery
 Before writing any tasks, operate in read-only mode:
 - Read the spec and relevant code
 - Identify existing patterns, conventions, and infrastructure
 - Map dependencies between components
 - Note risks and unknowns
 **Do NOT write code during planning.** The output is a plan document, not implementation.
 ### Step 2: Map the Dependency Graph
 Visualize what depends on what:
 ```
 Database schema / migrations
    │
    ├── Domain types (pure Go structs + interfaces)
    │       │
    │       ├── Service layer (business logic)
    │       │       │
    │       │       └── HTTP handlers
    │       │
    │       └── Store implementations (postgres, in-memory)
    │
    └── Seed data / test fixtures
 ```
 **Build foundations first.** You can't test a handler before the service exists. You can't write a service before the domain types are defined.
 ### Step 3: Slice Vertically
 Build one complete feature path at a time, not one layer at a time.
 **Bad (horizontal slicing):**
 ```
 Task 1: Write all database migrations
 Task 2: Write all store implementations
 Task 3: Write all service methods
 Task 4: Write all handlers
 ```
 No working software until Task 4. If Task 1 is wrong, you discover it late.
 **Good (vertical slicing):**
 ```
 Task 1: User can create an account (migration + store + service + handler for create)
 Task 2: User can log in (auth store + service + handler)
 Task 3: User can view their invoices (query + service + handler + response type)
 ```
 Each task delivers working, testable functionality. If Task 1 reveals a design issue, Tasks 2+ can adapt.
 ### Step 4: Write Tasks
 Each task follows this structure:
 ```markdown
 ## Task [N]: [Short descriptive title]
 **Description:** One sentence explaining what this task accomplishes.
 **Acceptance criteria:**
 - [ ] [Specific, testable condition]
 - [ ] [Specific, testable condition]
 **Verification:**
 - [ ] Tests pass: `go test -run TestFeatureName ./...`
 - [ ] Build: `go build ./...`
 - [ ] Manual check: [description if needed]
 **Dependencies:** Task [N-1], Task [N-2] (or "None")
 **Files likely touched:**
 - `internal/domain/user.go`
 - `internal/store/user_store.go`
 - `internal/service/user_service.go`
 **Estimated scope:** [XS/S/M/L]
 ```
 ### Step 5: Order and Add Checkpoints
 Arrange tasks so:
 1. Dependencies are satisfied (build foundation first)
 2. Each task leaves the system in a working state
 3. High-risk tasks are early (fail fast)
 4. Checkpoints occur after every 2–3 tasks
 ```markdown
 ## Checkpoint: After Tasks 1–3
 - [ ] `go test ./...` passes
 - [ ] `go build ./...` succeeds
 - [ ] Core user flow works end-to-end (can register and log in)
 - [ ] Mathias reviews before continuing
 ```
 ## Task Sizing Guidelines
 | Size | Files | Scope | Example |
 |------|-------|-------|---------|
 | **XS** | 1 | Single function or type | Add a validation rule to User |
 | **S** | 1–2 | One endpoint or store method | Add GetByEmail to UserStore |
 | **M** | 3–5 | One feature slice | User registration (domain + store + service + handler) |
 | **L** | 5–8 | Multi-component feature | Search with filtering, pagination, and sorting |
 | **XL** | 8+ | **Too large — break it down** | — |
 **Break any XL task into smaller tasks.** An agent performs best on XS, S, and M tasks.
 **Break a task down further when:**
 - It would take more than 2–3 hours of focused work
 - The acceptance criteria have more than 4 bullet points
 - It touches two independent subsystems (auth AND billing in the same task)
 - You find yourself writing "and" in the task title
 ## Plan Document Template
 ```markdown
 # Implementation Plan: [Feature/Project Name]
 ## Overview
 [One paragraph: what we're building and the key technical decisions]
 ## Architecture Decisions
 - [Decision 1 and rationale]
 - [Decision 2 and rationale]
 ## Dependency Graph
 [Diagram or list showing what depends on what]
 ## Task List
 ### Phase 1: Foundation
 - [ ] **Task 1:** [Title] — [Scope: XS/S/M]
  - AC: [criteria]
  - Verify: `go test -run TestX ./...`
  - Files: [list]
 - [ ] **Task 2:** [Title] — [Scope: XS/S/M]
  ...
 ### Checkpoint: Foundation Complete
 - [ ] `go test ./...` passes
 - [ ] Build clean
 - [ ] Foundation behavior works end-to-end
 ### Phase 2: Core Features
 - [ ] **Task 3:** [Title] — [Scope: S/M]
  ...
 ### Checkpoint: Core Features Complete
 - [ ] User-facing scenarios work
 - [ ] Mathias reviews before Phase 3
 ### Phase 3: Polish and Edge Cases
 - [ ] **Task 5:** Handle [edge case]
 - [ ] **Task 6:** [Performance/error handling]
 ### Checkpoint: Complete
 - [ ] All acceptance criteria from the spec are met
 - [ ] `go test -race ./...` passes
 - [ ] govulncheck passes
 - [ ] Ready for review
 ## Risks and Mitigations
 | Risk | Impact | Mitigation |
 |------|--------|-----------|
 | [External API behavior unknown] | High | Spike task first |
 | [Data volume may be larger than expected] | Medium | Add pagination before shipping |
 ## Open Questions
 - [Question requiring Mathias's input]
 ```
 ## Parallelization
 When working in parallel (multiple sessions or agents):
 - **Safe to parallelize:** Independent feature slices, tests for already-implemented features
 - **Must be sequential:** Database migrations, shared interface changes, dependency chains
 - **Needs coordination:** Features that share an API contract — define the contract first, then parallelize
 ## Go-Specific Planning Notes
 ### Database First, Then Domain
 If the feature requires a schema change:
 1. Write and apply migration first
 2. Generate sqlc code (`task generate` or `sqlc generate`)
 3. Then write domain types and service
 ### Interface Contracts Before Parallel Work
 If two tasks depend on a shared interface:
 ```go
 // Define this first, in its own task
 type InvoiceStore interface {
    Save(ctx context.Context, inv Invoice) error
    GetByID(ctx context.Context, id InvoiceID) (Invoice, error)
 }
 ```
 Then write the postgres implementation and the service in parallel.
 ### Spike Tasks for Unknowns
 If a risk involves an unknown (new API, unfamiliar library, performance unknown):
 ```markdown
 ## Task 0 (Spike): Validate Bankgirot API integration
 **Description:** Verify we can submit an ISO 20022 message to Bankgirot sandbox.
 **Timebox:** 2 hours
 **Output:** Working code or decision to change approach
 **Not:** Production-quality code — throw it away after learning
 ```
 Spikes are timeboxed explorations. The code is thrown away. The learning is kept.
 ## Verification Before Starting
 Before beginning implementation:
 - [ ] Every task has acceptance criteria
 - [ ] Every task has a verification step
 - [ ] Task dependencies are identified and ordered correctly
 - [ ] No task touches more than ~5 files
 - [ ] Checkpoints exist between major phases
 - [ ] Mathias has reviewed and approved the plan
 ## Common Rationalizations
 | Rationalization | Reality |
 |---|---|
 | "I'll figure it out as I go" | That's how you end up with rework. 10 minutes of planning saves hours. |
 | "The tasks are obvious" | Write them down. Explicit tasks surface hidden dependencies. |
 | "Planning is overhead" | Planning IS the task. Implementation without a plan is just typing. |
 | "I can hold it all in my head" | Context windows are finite. Written plans survive session boundaries. |
 ## Cross-References
 - Requires: `spec-driven-dev` skill output
 - During implementation: load `tdd` skill per task
 - For large refactoring tasks: load `refactoring` skill
 - For architecture decisions: load `solid` skill
--- a/playwright/SKILL.md
+++ b/playwright/SKILL.md
@@ -0,0 +1,273 @@
 ---
 name: playwright
 description: Add and run Playwright E2E tests for any project with a web UI — covers setup, HTMX-specific patterns, SSE/streaming, Taskfile integration, and CI wiring.
 ---
 # Playwright E2E Testing
 ## When to use
 - Project has a web UI (HTMX, React, plain HTML — any stack)
 - You want to verify the golden path in a real browser, not just unit-test handlers
 - Chat/SSE streams, HTMX swaps, or dynamic DOM need testing that Go tests cannot cover
 ## Setup (first time in a project)
 ### 1. Install
 ```bash
 npm init -y
 npm install --save-dev @playwright/test
 npx playwright install chromium   # chromium only — enough for CI
 ```
 ### 2. Directory layout
 ```
 <repo>/
  e2e/
    playwright.config.ts
    tests/
      smoke.spec.ts
      chat.spec.ts
  package.json
  Taskfile.yml          ← add e2e task here
 ```
 Keep `e2e/` at repo root. Add to `.gitignore`:
 ```
 node_modules/
 e2e/test-results/
 e2e/playwright-report/
 ```
 ### 3. Config (`e2e/playwright.config.ts`)
 ```typescript
 import { defineConfig, devices } from '@playwright/test';
 export default defineConfig({
  testDir: './tests',
  timeout: 30_000,
  retries: process.env.CI ? 2 : 0,
  reporter: process.env.CI ? 'github' : 'list',
  use: {
    baseURL: process.env.BASE_URL ?? 'http://localhost:8080',
    screenshot: 'only-on-failure',
    trace: 'retain-on-failure',
  },
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
  ],
 });
 ```
 ### 4. Taskfile task
 ```yaml
 tasks:
  e2e:
    desc: "Run Playwright E2E tests against running dev server"
    dir: e2e
    cmds:
      - npx playwright test {{.CLI_ARGS}}
    env:
      BASE_URL: "http://localhost:8080"
  e2e:ui:
    desc: "Open Playwright UI mode"
    dir: e2e
    cmds:
      - npx playwright test --ui
 ```
 Run with: `task e2e` or `task e2e -- --headed` or `task e2e -- tests/chat.spec.ts`
 ---
 ## HTMX-specific patterns
 HTMX updates the DOM asynchronously. Never use `waitForSelector` alone — wait for HTMX to finish the swap.
 ### Wait for HTMX swap to complete
 ```typescript
 // After triggering an action that causes an HTMX request:
 await page.locator('[hx-target]').waitFor({ state: 'visible' });
 // Better: wait for htmx:afterSwap event on the target element
 await page.evaluate(() =>
  new Promise<void>(resolve =>
    document.body.addEventListener('htmx:afterSwap', () => resolve(), { once: true })
  )
 );
 ```
 ### Wait for specific content to appear after swap
 ```typescript
 await page.locator('#result').waitFor({ state: 'visible' });
 await expect(page.locator('#result')).toContainText('expected text');
 ```
 ### Avoid timing races
 ```typescript
 // BAD — may grab stale content before swap completes
 await page.click('button');
 const text = await page.locator('#output').textContent();
 // GOOD — assert after waiting
 await page.click('button');
 await expect(page.locator('#output')).toContainText('expected text', { timeout: 5000 });
 ```
 ### Forms with HTMX submit
 ```typescript
 await page.fill('input[name="query"]', 'hello');
 await page.keyboard.press('Enter');           // or click submit button
 await expect(page.locator('#response')).not.toBeEmpty({ timeout: 10_000 });
 ```
 ---
 ## SSE / streaming patterns
 For chat UIs or any endpoint using `text/event-stream`:
 ```typescript
 test('chat stream delivers response', async ({ page }) => {
  await page.goto('/chat');
  // Type and submit
  await page.fill('#message-input', 'what is 2+2?');
  await page.click('#send-btn');
  // SSE response appears incrementally — wait for streaming to stop
  // Strategy: wait until the response element stops growing
  const responseEl = page.locator('#chat-response');
  await responseEl.waitFor({ state: 'visible', timeout: 15_000 });
  // Poll until content stabilises (streaming done)
  let prev = '';
  let stable = 0;
  while (stable < 3) {
    const cur = await responseEl.textContent() ?? '';
    if (cur === prev && cur.length > 0) stable++;
    else { stable = 0; prev = cur; }
    await page.waitForTimeout(300);
  }
  expect(prev.length).toBeGreaterThan(0);
 });
 ```
 For a done-signal in the DOM (e.g. a `[data-streaming="false"]` attribute set when stream ends):
 ```typescript
 await page.locator('[data-streaming="false"]').waitFor({ timeout: 20_000 });
 const text = await page.locator('#chat-response').textContent();
 expect(text).toBeTruthy();
 ```
 ---
 ## Golden path smoke test template
 ```typescript
 // e2e/tests/smoke.spec.ts
 import { test, expect } from '@playwright/test';
 test.describe('smoke', () => {
  test('home page loads', async ({ page }) => {
    await page.goto('/');
    await expect(page).toHaveTitle(/.+/);
    await expect(page.locator('body')).not.toBeEmpty();
  });
  test('main nav is present', async ({ page }) => {
    await page.goto('/');
    await expect(page.locator('nav')).toBeVisible();
  });
 });
 ```
 ---
 ## Screenshot on failure
 Already configured via `screenshot: 'only-on-failure'` in the config. Artifacts land in `e2e/test-results/`. View them with:
 ```bash
 npx playwright show-report e2e/playwright-report
 ```
 ---
 ## CI wiring (Gitea workflow)
 ```yaml
  e2e:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - uses: actions/checkout@v4
      - name: Install Node deps
        working-directory: e2e
        run: npm ci
      - name: Install Playwright browsers
        working-directory: e2e
        run: npx playwright install --with-deps chromium
      - name: Start server
        run: ./bin/server &
        env:
          PORT: 8080
      - name: Wait for server
        run: |
          for i in $(seq 1 20); do
            curl -sf http://localhost:8080/ && break
            sleep 1
          done
      - name: Run E2E tests
        working-directory: e2e
        run: npx playwright test
        env:
          BASE_URL: http://localhost:8080
      - name: Upload test artifacts
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: e2e/playwright-report/
 ```
 ---
 ## Checklist
 When adding Playwright to a project:
 - [ ] `npm init -y && npm install --save-dev @playwright/test` in `e2e/`
 - [ ] `npx playwright install chromium`
 - [ ] `e2e/playwright.config.ts` with baseURL and screenshot on failure
 - [ ] At least one smoke test covering the golden path
 - [ ] `.gitignore` entries: `node_modules/`, `test-results/`, `playwright-report/`
 - [ ] `task e2e` wired in Taskfile.yml
 - [ ] CI step added to Gitea workflow
 ## Failure triage
 | Symptom | Likely cause | Fix |
 |---------|-------------|-----|
 | `TimeoutError` on HTMX element | Swap not finished | Use `htmx:afterSwap` event or increase timeout |
 | Stale content asserted | Race between click and swap | Move assertion after explicit wait |
 | SSE test flaky | Stream not done yet | Poll for stable content or use done-signal attribute |
 | CI fails, local passes | Server not ready | Add wait-for-server loop in CI |
 | Screenshots empty | Screenshot taken before render | Add `await page.waitForLoadState('networkidle')` |
--- a/problem-analysis/SKILL.md
+++ b/problem-analysis/SKILL.md
@@ -0,0 +1,222 @@
 ---
 name: problem-analysis
 description: Deeply understand a problem before coding begins. Identify the real problem, stakeholders, constraints, and success criteria. Use before starting any non-trivial feature.
 ---
 # Problem Analysis
 ## Core Principle
 **Understand WHAT needs to be solved and WHY — before considering HOW.**
 Premature solution thinking is one of the most expensive mistakes in software. A feature built on a misunderstood problem is waste, no matter how well it's coded.
 > "A problem well-stated is a problem half-solved." — Charles Kettering
 ## When to Use
 **Always use before:**
 - Starting any non-trivial feature (more than a day's work)
 - Tackling a vague or ambiguous requirement
 - Making an architectural decision
 - Starting a new project or significant component
 **When NOT to use:** Bug fixes with clear reproduction steps, one-line changes, configuration updates.
 ## The Analysis Process
 ### Step 1: Understand the Real Problem
 Don't accept the first statement of a problem at face value. Dig to the real underlying issue.
 **Questions to ask:**
 - What is the user actually trying to accomplish?
 - What pain or friction does this create today?
 - Why is this a problem now? What changed?
 - What happens if we don't solve it?
 - Who else is affected?
 **The 5 Whys technique:** Ask "why" five times to reach the root cause.
 ```
 Problem: "The invoice export is too slow"
 Why? → "Users have to wait 30 seconds"
 Why? → "We're generating all 5000 invoices in a single query"
 Why? → "The feature was built for 50 invoices, now there are 5000"
 Why? → "We didn't anticipate growth"
 Root: "Batch processing architecture needed, not performance tuning"
 ```
 The solution to "export is slow" might be pagination, async export, or caching — not just query optimization.
 ### Step 2: Identify Stakeholders and Their Perspectives
 Every stakeholder has a different view of the problem.
 | Stakeholder | Typical concern |
 |-------------|----------------|
 | End user | "I need to accomplish X in Y time" |
 | Business | "This costs us money / blocks revenue" |
 | Ops/DevOps | "This creates operational burden" |
 | Compliance | "This creates regulatory risk" |
 | Developer | "This is hard to maintain or extend" |
 **Map the stakeholders:** Who is directly affected? Who is indirectly affected? Who has authority over the solution?
 ### Step 3: Define User Needs and Pain Points
 Move from "what users say they want" to "what users actually need."
 ```
 User says: "I need a report with 20 columns"
 User needs: "I need to identify which customers are at risk of churning"
 These may have completely different solutions.
 ```
 **Techniques:**
 - User journey mapping: walk through the current process step by step
 - Pain point identification: where is friction, delay, error, confusion?
 - Job-to-be-done framing: "When [situation], I want to [motivation], so I can [outcome]"
 ### Step 4: Clarify Requirements and Constraints
 **Functional requirements:** What must the system do? (In user terms, not technical terms)
 - "Users can export invoices as PDF"
 - "The system sends a confirmation email within 5 minutes"
 **Non-functional requirements:** How must the system behave?
 - Performance: "Export completes in under 5 seconds for 90% of users"
 - Security: "Only authenticated users can view their own invoices"
 - Availability: "Export feature available 99.9% of the time"
 **Constraints:** What is fixed and non-negotiable?
 - Budget/time
 - Existing systems that must be integrated
 - Regulatory requirements
 - Technology choices already made
 ### Step 5: Clarify Scope
 Explicitly define what is **in scope** and what is **out of scope**. Out-of-scope items are often as important as in-scope items for preventing scope creep.
 **Template:**
 ```
 In scope:
 - [Feature A]
 - [Feature B]
 Out of scope (for this version):
 - [Feature C] — will be addressed in v2
 - [Feature D] — not part of this problem
 Assumptions requiring validation:
 - [Assumption 1]
 - [Assumption 2]
 ```
 ### Step 6: Define Success Criteria
 Success criteria must be specific and testable. Vague criteria are not criteria.
 ```
 Vague: "Make the dashboard faster"
 Specific and testable:
 - Dashboard LCP < 2.5s on a 4G connection (measured with Lighthouse)
 - Initial data load completes in < 500ms (measured from API call to render)
 - Page is usable within 3s even if non-critical data hasn't loaded
 ```
 **Success criteria format:** `[Metric] [comparison] [threshold] [measurement method]`
 ### Step 7: Identify Risks and Unknowns
 | Category | Questions to ask |
 |----------|-----------------|
 | Technical | Do we know how to build this? What might surprise us? |
 | Business | Does the market actually want this? Is the problem real? |
 | Regulatory | Are there compliance implications? |
 | Operational | Will this create new operational burden? |
 | Integration | What external systems does this depend on? |
 ## Output: Problem Analysis Document
 The analysis should produce a `problem-analysis.md` document covering:
 ```markdown
 # Problem Analysis: [Feature/System Name]
 ## Problem Statement
 [One paragraph: what problem exists, for whom, and why it matters]
 ## Context
 [How this problem arose, what triggered the need to solve it now]
 ## Stakeholders
 | Stakeholder | Role | Primary concern |
 |------------|------|----------------|
 | [Name/role] | [Decision maker/user/affected] | [Their core need] |
 ## User Needs and Pain Points
 - [Pain point 1]: [Specific friction or failure today]
 - [Pain point 2]: ...
 ## Functional Requirements
 (In user terms, not technical terms)
 - [ ] [Requirement 1]
 - [ ] [Requirement 2]
 ## Non-Functional Requirements
 - Performance: [specific, measurable]
 - Security: [specific]
 - Availability: [specific]
 ## Scope
 **In scope:** [list]
 **Out of scope:** [list]
 ## Assumptions Requiring Validation
 - [Assumption 1]: [How to validate]
 - [Assumption 2]: ...
 ## Risks and Unknowns
 | Risk | Impact | Likelihood | Mitigation |
 |------|--------|-----------|-----------|
 | [Risk 1] | High/Med/Low | High/Med/Low | [Strategy] |
 ## Success Criteria
 - [Metric]: [threshold] ([measurement method])
 - [Metric]: [threshold] ([measurement method])
 ## Open Questions
 - [Question requiring answer before implementation begins]
 ```
 ## Mathias Context
 As a digital PM building production software:
 - **Capture the why:** The problem document should capture business context, not just technical requirements. Agents reading it should understand why this matters.
 - **Think about the user journey:** Walk through the entire experience, not just the feature in isolation.
 - **Challenge the requirement:** Is this the right problem to solve? Is there a simpler solution?
 - **Estimate impact:** How many users are affected? How often? What's the cost of not solving it?
 ## Transition to Next Steps
 After the problem is understood:
 1. Load `user-stories` skill to decompose into implementable stories
 2. Load `spec-driven-dev` skill to write a technical specification
 3. Load `planning` skill to break into tasks
 **Do NOT jump to implementation.** A complete problem analysis prevents the most expensive mistakes.
 ## Red Flags — When Analysis Is Incomplete
 - Requirements stated in technical terms ("add an index") instead of user terms ("make search faster")
 - No clear success criteria ("it should be better")
 - No stakeholders identified
 - Assumptions not written down
 - Scope not defined
 - "We know what we need to build" — this is usually premature
--- a/refactoring/SKILL.md
+++ b/refactoring/SKILL.md
@@ -0,0 +1,368 @@
 ---
 name: refactoring
 description: Systematic refactoring methodology based on Martin Fowler's catalog. Apply during the REFACTOR phase of TDD, never while making code changes that modify behavior.
 ---
 # Refactoring
 ## Core Principle
 > "Refactoring is a controlled technique for improving the design of existing code. Its essence is applying a series of small behavior-preserving transformations." — Martin Fowler
 **The First Rule:** Never refactor without tests. Tests are your safety net. If tests don't exist, write them first (load `tdd` skill).
 **The Second Rule:** Each refactoring step must keep tests green. If a step breaks tests, undo it.
 **Never mix refactoring and behavior change.** Separate commits for each.
 ## When to Refactor
 - After the GREEN phase of TDD (REFACTOR phase)
 - When code smell makes a feature hard to add
 - When the Boy Scout Rule demands it — you're in the area, leave it better
 - When technical debt is blocking velocity
 **When NOT to refactor:**
 - Code that works and won't change
 - Code being replaced soon
 - When you don't have tests
 ## The Process
 ### Step 1: Ensure Test Coverage
 ```bash
 go test ./...          # All tests pass first
 go test -race ./...    # Race detector clean
 ```
 If tests don't exist, write them before refactoring. See the `tdd` skill.
 ### Step 2: Identify the Target
 Load the `code-review` skill to detect smells. Prioritize:
 1. **Architectural smells** — Shotgun Surgery, Divergent Change, God Objects
 2. **Design smells** — Long Methods, Feature Envy, Primitive Obsession
 3. **Readability smells** — naming, comments, duplication
 ### Step 3: Apply Techniques in Small Steps
 Each technique below is one small step. Make the change. Run tests. Commit if green. Repeat.
 ### Step 4: Verify After Each Step
 ```bash
 go test ./...
 go test -race ./...
 ```
 If red: undo the step (git checkout), understand why, try a smaller step.
 ## Technique Catalog
 ### Composing Methods
 **Extract Function**
 When: a code block can be named meaningfully; a function does multiple things.
 ```go
 // Before
 func processOrder(o *Order) {
    // validate
    if o.Items == nil { panic("no items") }
    // calculate
    total := 0.0
    for _, item := range o.Items {
        total += item.Price
    }
    // save
    db.Save(o, total)
 }
 // After: each step extracted
 func processOrder(o *Order) {
    validateOrder(o)
    total := calculateTotal(o.Items)
    saveOrder(o, total)
 }
 ```
 Cross-reference: when naming extracted functions, load `clean-code` skill for naming conventions.
 **Inline Function**
 When: a function's body is as clear as its name; a function only does what its name says.
 **Extract Variable**
 When: a complex expression is hard to read inline.
 ```go
 // Before
 if user.Subscription.Level >= 2 && !user.IsBanned && user.VerifiedAt != nil { ... }
 // After
 canAccessPremium := user.Subscription.Level >= 2 && !user.IsBanned && user.VerifiedAt != nil
 if canAccessPremium { ... }
 ```
 **Replace Temp with Query**
 When: a temporary variable holds the result of an expression used once.
 ### Moving Features Between Types
 **Move Method**
 When: Feature Envy detected — a method uses another type's data more than its own.
 ```go
 // Before: Order envies Customer
 func (o *Order) getShippingCost() float64 {
    if o.Customer.Country == "SE" { return 0 }
    return 50
 }
 // After: moved to Customer
 func (c *Customer) ShippingCost() float64 {
    if c.Country == "SE" { return 0 }
    return 50
 }
 ```
 **Extract Type**
 When: a cluster of methods and fields have a different responsibility than the rest of the type.
 **Inline Type**
 When: a type isn't doing enough to justify its existence (Lazy Class smell).
 **Hide Delegate**
 When: callers access objects through chains (`a.B().C()`).
 **Remove Middle Man**
 When: a type only delegates without adding value.
 ### Organizing Data
 **Replace Data Value with Object**
 When: Primitive Obsession detected — a string is being used for email, user ID, currency.
 ```go
 // Before
 type User struct {
    Email string
 }
 // After: Email type carries validation and prevents misuse
 type Email struct {
    value string
 }
 func NewEmail(s string) (Email, error) {
    if !strings.Contains(s, "@") {
        return Email{}, fmt.Errorf("invalid email: %q", s)
    }
    return Email{value: s}, nil
 }
 type User struct {
    Email Email
 }
 ```
 **Replace Magic Number with Constant**
 ```go
 // Before
 if retries > 3 { ... }
 // After
 const maxRetries = 3
 if retries > maxRetries { ... }
 ```
 ### Simplifying Conditionals
 **Decompose Conditional**
 When: complex conditionals obscure intent.
 ```go
 // Before
 if plan.ExpiresAt.Before(time.Now()) && plan.GracePeriodDays > 0 { ... }
 // After
 if plan.IsExpiredWithGrace() { ... }
 ```
 **Replace Nested Conditional with Guard Clauses**
 When: deep nesting; use early returns instead.
 ```go
 // Before: arrow code
 func process(o *Order) error {
    if o != nil {
        if len(o.Items) > 0 {
            if o.Customer != nil {
                // actual logic
            }
        }
    }
    return nil
 }
 // After: guard clauses
 func process(o *Order) error {
    if o == nil {
        return errors.New("nil order")
    }
    if len(o.Items) == 0 {
        return errors.New("empty order")
    }
    if o.Customer == nil {
        return errors.New("no customer")
    }
    // actual logic
    return nil
 }
 ```
 **Replace Conditional with Polymorphism**
 When: repeated switch/if-else on a type string. Replace with an interface.
 See the `solid` skill, OCP section for full example.
 **Introduce Null Object**
 When: repeated nil checks for a dependency.
 ### Simplifying Function Calls
 **Rename Method/Variable**
 The most powerful refactoring. When you understand what something does, give it a name that reflects that understanding.
 Cross-reference: load `clean-code` skill for naming conventions.
 **Introduce Parameter Object**
 When: Long Parameter List smell — multiple parameters that always appear together.
 ```go
 // Before
 func Search(query string, page, pageSize int, sortBy string, ascending bool) Results { ... }
 // After
 type SearchOptions struct {
    Query     string
    Page      int
    PageSize  int
    SortBy    string
    Ascending bool
 }
 func Search(opts SearchOptions) Results { ... }
 ```
 **Separate Query from Modifier (Command-Query Separation)**
 When: a function both changes state and returns data.
 ```go
 // Bad: modifies AND returns
 func (s *Stack) PopAndReturn() (Item, bool) {
    // removes and returns
 }
 // Good: separate
 func (s *Stack) Peek() (Item, bool) { ... } // query only
 func (s *Stack) Pop() bool { ... }          // command only
 ```
 ### Dealing with Generalization
 **Extract Interface**
 When: you have multiple implementations or want to enable testing with a test double.
 ```go
 // Before: depends on concrete type
 type Service struct {
    db *postgres.DB
 }
 // After: depends on interface
 type Store interface {
    Save(ctx context.Context, u User) error
 }
 type Service struct {
    store Store
 }
 ```
 **Replace Inheritance with Delegation (Go: Embedding → Explicit Delegation)**
 When: embedding is used but only some methods of the embedded type are needed.
 ## Refactoring Sequence Dependencies
 Apply in this order to avoid breaking changes:
 1. **Extract Variable** → then Extract Function → then Move Function
 2. **Encapsulate Field** → before other data refactorings
 3. **Extract Function** → before Move Function
 4. **Rename** → before Extract Interface
 ## Go-Specific Refactoring Notes
 ### Extracting interfaces
 In Go, define the interface at the point of use, not at the implementation:
 ```go
 // The service package defines what it needs
 package service
 type UserStore interface {
    GetByID(ctx context.Context, id UserID) (User, error)
 }
 ```
 ### Error handling during refactoring
 When extracting functions, preserve the error wrapping chain:
 ```go
 // Extracted function must wrap errors with its context
 func validateOrder(o *Order) error {
    if len(o.Items) == 0 {
        return errors.New("order has no items")
    }
    return nil
 }
 // Caller wraps the context
 if err := validateOrder(o); err != nil {
    return fmt.Errorf("process order: %w", err)
 }
 ```
 ### Struct options pattern (for long parameter lists)
 Go idiom for optional parameters:
 ```go
 type ServerOption func(*Server)
 func WithTimeout(d time.Duration) ServerOption {
    return func(s *Server) { s.timeout = d }
 }
 func NewServer(addr string, opts ...ServerOption) *Server {
    s := &Server{addr: addr, timeout: 30 * time.Second}
    for _, opt := range opts {
        opt(s)
    }
    return s
 }
 ```
 ## Refactoring Commit Strategy
 Refactoring commits should be separate from feature commits:
 - `refactor: extract validateOrder from processOrder`
 - `refactor: replace string email with Email value type`
 - `refactor: rename UserData to UserProfile`
 Never combine: `feat: add discount + refactor: extract discount calculator`
 ## Verification Checklist
 After each refactoring session:
 - [ ] `go test ./...` passes
 - [ ] `go test -race ./...` passes
 - [ ] No behavior change (same inputs produce same outputs)
 - [ ] Committed with a `refactor:` prefix commit message
 - [ ] Smells addressed are documented (if doing a review)
 ## Cross-References
 - When applying Extract Function: load `clean-code` skill for naming conventions
 - When introducing interfaces: load `solid` skill for DIP and ISP guidance
 - When detecting smells: load `code-review` skill
 - Source: Martin Fowler, *Refactoring: Improving the Design of Existing Code* (2nd ed.)
 - Full catalog: https://refactoring.guru/refactoring
--- a/session-retrospective/SKILL.md
+++ b/session-retrospective/SKILL.md
@@ -0,0 +1,134 @@
 ---
 name: session-retrospective
 description: Review a completed coding session (from brain/sessions/*.jsonl) and surface non-obvious learnings worth preserving. Use after a session ends, before context is lost.
 ---
 # Session Retrospective
 ## Overview
 A session retrospective reads the structured log of a completed session — every skill invocation, what was attempted, what failed, what passed, how long it took — and identifies what is worth preserving in the brain for future sessions. The output is a small list of learnings, each with enough context that someone reading them cold understands why they matter.
 **Core principle:** A retrospective that surfaces only obvious patterns is a retrospective that wastes the brain. Filter aggressively for non-obvious.
 ## Iron Laws
 1. **Do not call `brain_write` from this skill.** Surface candidates only — the caller decides what to write.
 2. **Run `brain_query` first** to avoid surfacing learnings that are already in the brain.
 3. **Every Recommendation must generalize beyond this session.** Project-specific paths and identifiers belong in Context, not Recommendation.
 ## When to Use
 - After a coding session ends, before the operator switches context
 - After a multi-day push on the same area, summarizing the whole stretch
 - When the brain is feeling stale on a project area — re-running retrospective on older sessions can re-surface forgotten context
 **Do not use for:**
 - Sessions with no novel content (single-attempt passes, mechanical operations only)
 - Sessions less than ~30 minutes long, unless something dramatic happened in them
 ## Inputs
 The session log is in JSON Lines format. Each project that uses the brain stores logs in its own `brain/sessions/` directory (for hyperguild, that is `~/dev/AI/hyperguild/brain/sessions/<session-id>.jsonl`). Each line is one event:
 ```json
 {"ts": "...", "phase": "tdd_red", "skill": "tdd", "outcome": "fail", "notes": "..."}
 ```
 Either pass the session ID explicitly, or default to the most recent file.
 ## What is Worth Preserving
 - **Patterns that worked** — approaches clean enough to repeat next time, especially if non-obvious from the codebase alone
 - **Failures that revealed something** — bugs whose root cause taught you something about the codebase, the toolchain, or the approach
 - **Decisions made during the session** — architectural, structural, tooling choices that future-you would want to know the reasoning behind
 - **Anything that contradicts or extends established patterns** — these are the highest-value learnings
 ## What is NOT Worth Preserving
 - Routine red-green-refactor cycles with no surprises
 - Single-attempt passes with no interesting context
 - Mechanical operations (file moves, renames, formatting fixes)
 - Failures whose cause was a typo
 - Anything that is already in `CLAUDE.md` or an existing memory file
 **Decision procedure:** If the answer would be discoverable by reading CLAUDE.md, the codebase, or standard toolchain docs, drop it. If you are still unsure, ask: "would future-me, three months from now, want this to surface when querying the brain?" If no, drop it.
 ## Output Format
 Respond in markdown. For each learning worth preserving:
 ```
 **Learning:** One sentence describing what was learned.
 **Context:** Why this session surfaced it — what made it non-obvious.
 **Recommendation:** What should be done differently or repeated going forward.
 ```
 End with a summary line: `N learnings worth writing to brain` or `No novel learnings in this session.`
 The caller (operator or another agent) decides which learnings to actually write to the brain via `brain_write`. This skill does not write directly.
 ## Worked Example
 Input session log shows: a 3-hour session implementing a JSON ingestion pipeline. The session log shows three failed attempts at handling escape characters in LLM-generated JSON, then a passing fix that introduced a `repairJSON` helper.
 Output:
 ```
 **Learning:** LLM-generated JSON containing markdown can produce invalid escape sequences (e.g. `\_` and `\#`) that break the standard JSON parser even when the structure is correct.
 **Context:** The first three TDD cycles failed because the test corpus included markdown-flavored content from the LLM and the parser rejected the entire payload. The fourth cycle introduced a `repairJSON` pre-pass that normalizes escapes before parsing. Without this context, future sessions will rediscover the same failure mode.
 **Recommendation:** When ingesting LLM output as JSON, route it through `repairJSON` before `json.Unmarshal`. Add this to the ingestion-pipeline reference doc.
 **Learning:** Integration tests with `defer resp.Body.Close()` will fail CI's errcheck even when local `go test` is green.
 **Context:** Task 8 of the brain MCP migration shipped this pattern; CI caught it 30 minutes later. The codebase enforces `defer func() { _ = resp.Body.Close() }()` to satisfy errcheck.
 **Recommendation:** Implementer subagents must run `task check` (lint + test + vet) per task, not just `go test`. (Note: this learning has since been written to memory — a real run of this skill would now find it via `brain_query` and drop the candidate. Kept here as illustration.)
 2 learnings worth writing to brain
 ```
 ## Anti-Patterns
 | Anti-Pattern | Why It Fails |
 |---|---|
 | "Lots of tests passed today" | Outcome summary, not a learning. Brain queries on this return nothing useful. |
 | "Refactored the parser" | Activity log, not a learning. What was learned by doing it? |
 | Listing every failure as a learning | Most failures are typos or mechanical fixes. Filter to the ones with insight. |
 | Project-specific paths in the recommendation | Recommendations should generalize where possible — "ingestion pipelines" not "internal/ingest/v3/parser.go". |
 | Writing to brain directly from this skill | This skill surfaces candidates only. The caller decides what to write. |
 ## Brain MCP Integration
 The brain holds prior session learnings; checking it prevents this retrospective from re-surfacing what is already preserved.
 **At retrospective start:**
 - Run `brain_query` with the session's main topic to surface recent related learnings. Avoid emitting a "learning" that is already in the brain.
 **After retrospective output:**
 - This skill does NOT call `brain_write`. The caller does, selectively.
 ### Logging
 Call `session_log` once at the end of every phase to record the outcome.
 Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which
 treats `pass` as success, `fail` as failure, `skip` as neither.
 **At end of each phase:**
 - `session_log` with `{skill: "session-retrospective", phase: "<phase-name>", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
 **Phases for this skill:** surfaced
 **Status semantics:**
 - `pass` — the phase's intended outcome was reached (candidates were emitted).
 - `fail` — the phase's intended outcome was NOT reached (retrospective could not produce coherent candidates).
 - `skip` — phase was skipped intentionally (no novel learnings worth surfacing).
 **Why this matters:** the routing pod (Plan 6) reads pass-rate to decide whether to route a future call to a local model. If your skill never logs, the routing pod sees no data.
 ## Mode 2 Routing Note
 Reading long session logs and pulling themes is high-volume, mechanical pattern-recognition work — a strong Mode 2 routing candidate. Until Plan 6 ships the routing pod, treat as Mode 1 only.
 ## Cross-References
 - Load `trainer` skill when you have more than ~5 candidates, or when the retrospective output feels too generous and needs a stricter quality gate before any of it goes into the brain. Trainer is the stricter cousin of this skill.
 - Pair with `brain_query` calls for prior session learnings on the same topic to avoid duplication.
--- a/solid/SKILL.md
+++ b/solid/SKILL.md
@@ -0,0 +1,259 @@
 ---
 name: solid
 description: Apply SOLID design principles in Go. Use during architecture decisions, design reviews, and when adding new abstractions.
 ---
 # SOLID Principles
 ## Overview
 SOLID helps structure software to be flexible, maintainable, and testable. These principles reduce coupling and increase cohesion.
 In Go, several principles manifest differently than in classical OOP languages. Where Go idioms conflict with OOP-centric SOLID, **Go wins**. The tension is noted explicitly.
 ## Quick Reference
 | Principle | One-Liner | Question to Ask |
 |-----------|-----------|-----------------|
 | **S**RP | One reason to change | "Does this type have ONE reason to change?" |
 | **O**CP | Open for extension, closed for modification | "Can I extend without modifying?" |
 | **L**SP | Subtypes are substitutable | "Can any implementation replace another safely?" |
 | **I**SP | Small, focused interfaces | "Are clients forced to depend on unused methods?" |
 | **D**IP | Depend on abstractions | "Do I accept interfaces, not concrete types?" |
 ## S — Single Responsibility Principle
 > "A class should have one, and only one, reason to change."
 In Go: a **type** or **package** should have one reason to change. A package that handles both domain logic and database queries has two reasons to change.
 **Detection:**
 - Can you describe the type's responsibility without using "and"?
 - Would different stakeholders (product, ops, DBA) request changes to different parts?
 **Go example:**
 ```go
 // Bad: multiple responsibilities
 type UserHandler struct {}
 func (h *UserHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { ... }
 func (h *UserHandler) SaveToDatabase(u User) error { ... }
 func (h *UserHandler) SendWelcomeEmail(u User) error { ... }
 // Good: single responsibility each
 type UserHandler struct {
    store  UserStore
    mailer Mailer
 }
 type PostgresUserStore struct { db *sql.DB }
 func (s *PostgresUserStore) Save(ctx context.Context, u User) error { ... }
 type SMTPMailer struct { client *smtp.Client }
 func (m *SMTPMailer) SendWelcome(ctx context.Context, email string) error { ... }
 ```
 ## O — Open/Closed Principle
 > "Software entities should be open for extension but closed for modification."
 In Go: add new behavior by implementing an interface or adding a new type, not by modifying existing code. The canonical Go pattern is a `switch` on a string type that needs a new case every time — replace it with an interface.
 **Go example:**
 ```go
 // Bad: must modify to add new payment method
 func ProcessPayment(method string, amount Money) error {
    switch method {
    case "stripe":
        return stripeCharge(amount)
    case "paypal":
        return paypalCharge(amount)
    // Must add cases here for every new method!
    }
    return errors.New("unknown payment method")
 }
 // Good: open for extension via new types
 type PaymentProcessor interface {
    Charge(ctx context.Context, amount Money) error
 }
 type StripeProcessor struct { client *stripe.Client }
 func (p *StripeProcessor) Charge(ctx context.Context, amount Money) error { ... }
 type PayPalProcessor struct { client *paypal.Client }
 func (p *PayPalProcessor) Charge(ctx context.Context, amount Money) error { ... }
 // Add new payment method: implement PaymentProcessor. No existing code changes.
 ```
 ## L — Liskov Substitution Principle
 > "Subtypes must be substitutable for their base types without altering program correctness."
 In Go: **interfaces are structural**. Any type that implements the method set satisfies the interface. This means callers should be able to use any implementation interchangeably without knowing the difference.
 The key question: does your implementation honor the **contract** (documented behavior, not just method signatures)?
 **Go example:**
 ```go
 // UserStore contract: Save persists a user and returns ErrDuplicate if email exists
 type UserStore interface {
    Save(ctx context.Context, u User) error
    GetByEmail(ctx context.Context, email string) (User, error)
 }
 // Good: both implementations honor the contract
 type PostgresUserStore struct { ... }
 func (s *PostgresUserStore) Save(ctx context.Context, u User) error { ... }
 type InMemoryUserStore struct { users map[string]User }
 func (s *InMemoryUserStore) Save(ctx context.Context, u User) error {
    if _, exists := s.users[u.Email]; exists {
        return ErrDuplicate // Must return ErrDuplicate, not some other error
    }
    s.users[u.Email] = u
    return nil
 }
 // Bad: violates contract — callers of UserStore cannot substitute this
 type BrokenStore struct { ... }
 func (s *BrokenStore) Save(ctx context.Context, u User) error {
    panic("not implemented") // Violates contract
 }
 ```
 **Go tension:** Go has no inheritance, so "refused bequest" (subclass ignoring parent methods) doesn't apply. The LSP concern in Go is about interface implementations that partially satisfy the contract through panics or no-ops.
 ## I — Interface Segregation Principle
 > "Clients should not be forced to depend on methods they do not use."
 In Go: this is idiomatic. The `io.Reader`, `io.Writer`, and `io.Closer` pattern is the model — small, focused interfaces that can be composed when needed.
 **Go example:**
 ```go
 // Bad: fat interface — callers that only read must depend on write methods too
 type FileStore interface {
    Read(name string) ([]byte, error)
    Write(name string, data []byte) error
    Delete(name string) error
    List(dir string) ([]string, error)
    Stats(name string) (FileInfo, error)
 }
 // Good: segregated interfaces — callers depend only on what they need
 type FileReader interface {
    Read(name string) ([]byte, error)
 }
 type FileWriter interface {
    Write(name string, data []byte) error
 }
 type FileDeleter interface {
    Delete(name string) error
 }
 // Compose when needed
 type FileReadWriter interface {
    FileReader
    FileWriter
 }
 // Handler that only reads: depends on FileReader only
 func NewReportHandler(store FileReader) *ReportHandler { ... }
 ```
 **Go idiom:** Define interfaces at the point of use, not at the point of implementation. The implementation package should not define the interface — the package that consumes it should.
 ```go
 // In the consumer package (handler)
 type UserStore interface {
    GetByID(ctx context.Context, id UserID) (User, error)
 }
 // The postgres package doesn't need to know about this interface
 // It just implements the method, and Go's structural typing handles the rest
 ```
 ## D — Dependency Inversion Principle
 > "High-level modules should not depend on low-level modules. Both should depend on abstractions."
 In Go: pass dependencies as interface parameters. Never instantiate concrete dependencies inside a function or type that contains business logic.
 ```go
 // Bad: high-level order service depends on low-level email implementation
 type OrderService struct {
    emailClient *sendgrid.Client // Locked to SendGrid
    db          *sql.DB         // Locked to PostgreSQL
 }
 // Good: depends on abstractions
 type Mailer interface {
    Send(ctx context.Context, to, subject, body string) error
 }
 type OrderRepository interface {
    Save(ctx context.Context, o Order) error
    GetByID(ctx context.Context, id OrderID) (Order, error)
 }
 type OrderService struct {
    repo   OrderRepository
    mailer Mailer
 }
 func NewOrderService(repo OrderRepository, mailer Mailer) *OrderService {
    return &OrderService{repo: repo, mailer: mailer}
 }
 ```
 **The Dependency Rule:** Source code dependencies point **inward** toward domain logic, never outward toward infrastructure.
 ```
 HTTP handlers → Application services → Domain types
 Database layer → Application services → Domain types
 Domain types know nothing about HTTP, SQL, or external APIs.
 ```
 ## Applying SOLID at Architecture Level
 | Principle | Package/Module Application |
 |-----------|---------------------------|
 | SRP | Each package has one clear purpose |
 | OCP | New features = new packages/types, not edits to existing |
 | LSP | All implementations of an interface are interchangeable |
 | ISP | Interfaces defined at point of use, as narrow as possible |
 | DIP | Domain packages import nothing from infrastructure packages |
 ## Go Tensions with OOP-centric SOLID
 | OOP SOLID | Go reality |
 |-----------|-----------|
 | Inheritance hierarchies for OCP | Use interfaces + new types instead |
 | Abstract base classes for LSP | No inheritance; use interface contracts |
 | Explicit interface declarations | Interfaces are implicit; define where consumed |
 | "Program to an interface" as ritual | Only extract interface when you have 2+ implementations or need testability |
 **Don't over-abstract.** A function that takes a `*sql.DB` directly is fine if there's only one implementation and it's never tested in isolation. Extract an interface when you need it.
 ## Red Flags
 | Flag | Likely Violation |
 |------|-----------------|
 | Type that "handles X and Y and Z" | SRP |
 | Large `switch` on a type string | OCP |
 | Implementation that panics on some methods | LSP |
 | Interface with 10+ methods | ISP |
 | `new(ConcreteType)` inside business logic | DIP |
 | Package imports something from `infrastructure/` | DIP |
 ## References
 - `references/solid-principles.md` — canonical SOLID reference with TypeScript examples
 - `references/go-adaptation.md` — this workspace's Go-specific rewrite of each principle
 - Load `clean-code` skill for naming and structure
 - Load `code-review` skill for detecting violations during review
--- a/solid/references/architecture.md
+++ b/solid/references/architecture.md
@@ -0,0 +1,318 @@
 # Software Architecture
 ## The Goal of Architecture
 Enable the development team to:
 1. **Add** features with minimal friction
 2. **Change** existing features safely
 3. **Remove** features cleanly
 4. **Test** features in isolation
 5. **Deploy** independently when possible
 ## Architectural Principles
 ### 1. Vertical Boundaries (Features/Slices)
 Organize by **feature**, not by technical layer.
 ```
 BAD: Layer-first
 src/
  controllers/
    UserController.ts
    OrderController.ts
  services/
    UserService.ts
    OrderService.ts
  repositories/
    UserRepository.ts
    OrderRepository.ts
 GOOD: Feature-first
 src/
  users/
    UserController.ts
    UserService.ts
    UserRepository.ts
  orders/
    OrderController.ts
    OrderService.ts
    OrderRepository.ts
 ```
 **Why:** Changes to "users" feature stay in `users/`. High cohesion within features.
 ### 2. Horizontal Boundaries (Layers)
 Separate concerns into layers with clear dependencies.
 ```
 ┌──────────────────────────────────────┐
 │           Presentation               │  UI, Controllers, CLI
 ├──────────────────────────────────────┤
 │           Application                │  Use Cases, Orchestration
 ├──────────────────────────────────────┤
 │             Domain                   │  Business Logic, Entities
 ├──────────────────────────────────────┤
 │          Infrastructure              │  Database, APIs, External
 └──────────────────────────────────────┘
 ```
 ### 3. The Dependency Rule
 **Dependencies point INWARD.**
 ```
 Infrastructure → Application → Domain
      ↓               ↓            ↓
   (outer)        (middle)      (inner)
 ```
 - Inner layers know NOTHING about outer layers
 - Domain has zero dependencies on infrastructure
 - Use interfaces to invert dependencies
 ```typescript
 // Domain defines the interface (inner)
 interface UserRepository {
  save(user: User): Promise<void>;
  findById(id: UserId): Promise<User | null>;
 }
 // Infrastructure implements it (outer)
 class PostgresUserRepository implements UserRepository {
  save(user: User): Promise<void> {
    // SQL here
  }
 }
 // Domain service uses the interface
 class UserService {
  constructor(private repo: UserRepository) {} // Depends on abstraction
 }
 ```
 ### 4. Contracts
 Interfaces define boundaries between components.
 ```typescript
 // The contract
 interface PaymentGateway {
  charge(amount: Money, card: CardDetails): Promise<ChargeResult>;
  refund(chargeId: string): Promise<RefundResult>;
 }
 // Multiple implementations possible
 class StripeGateway implements PaymentGateway { }
 class PayPalGateway implements PaymentGateway { }
 class MockGateway implements PaymentGateway { }  // For tests
 ```
 ### 5. Cross-Cutting Concerns
 Concerns that span multiple features: logging, auth, validation, error handling.
 **Options:**
 - Middleware/interceptors
 - Decorators
 - Aspect-oriented approaches
 - Base classes (use sparingly)
 ```typescript
 // Middleware approach
 class LoggingMiddleware {
  handle(request: Request, next: Handler): Response {
    console.log(`Request: ${request.path}`);
    const response = next(request);
    console.log(`Response: ${response.status}`);
    return response;
  }
 }
 ```
 ### 6. Conway's Law
 > "Organizations design systems that mirror their communication structure."
 **Implication:** Team structure affects architecture. Align both intentionally.
 ---
 ## Common Architectural Styles
 ### Layered Architecture
 Traditional layers: Presentation → Business → Persistence
 **Pros:** Simple, well-understood
 **Cons:** Can become a "big ball of mud" without discipline
 ### Hexagonal Architecture (Ports & Adapters)
 Domain at center, adapters around the edges.
 ```
        ┌─────────────────────┐
        │     HTTP Adapter    │
        └─────────┬───────────┘
                  │
 ┌─────────────────▼─────────────────┐
 │              DOMAIN                │
 │   ┌─────────────────────────┐     │
 │   │      Business Logic      │     │
 │   │      Use Cases           │     │
 │   └─────────────────────────┘     │
 └─────────────────┬─────────────────┘
                  │
        ┌─────────▼───────────┐
        │   Database Adapter   │
        └─────────────────────┘
 ```
 **Ports:** Interfaces defined by the domain
 **Adapters:** Implementations that connect to the outside world
 ### Clean Architecture
 Similar to Hexagonal, with explicit layers:
 1. **Entities** - Enterprise business rules
 2. **Use Cases** - Application business rules
 3. **Interface Adapters** - Controllers, Presenters, Gateways
 4. **Frameworks & Drivers** - Web, DB, External interfaces
 ---
 ## Feature-Driven Structure (Frontend)
 ```
 src/
  features/
    auth/
      components/
        LoginForm.tsx
        SignupForm.tsx
      hooks/
        useAuth.ts
      services/
        authService.ts
      types/
        auth.types.ts
      index.ts  # Public API
    checkout/
      components/
      hooks/
      services/
      types/
      index.ts
  shared/
    components/  # Truly shared UI
    hooks/       # Truly shared hooks
    utils/       # Truly shared utilities
 ```
 ---
 ## Feature-Driven Structure (Backend)
 ```
 src/
  modules/
    users/
      domain/
        User.ts
        UserRepository.ts  # Interface
      application/
        CreateUser.ts      # Use case
        GetUser.ts         # Use case
      infrastructure/
        PostgresUserRepo.ts
      presentation/
        UserController.ts
        UserDTO.ts
    orders/
      domain/
      application/
      infrastructure/
      presentation/
  shared/
    domain/        # Shared value objects
    infrastructure/ # Shared infra utilities
 ```
 ---
 ## The Walking Skeleton
 Start with a minimal end-to-end slice:
 1. **Thinnest possible feature** that touches all layers
 2. **Deployable** from day one
 3. **Proves the architecture** works
 Example walking skeleton for e-commerce:
 - User can view ONE product (hardcoded)
 - User can add it to cart
 - User can "checkout" (just logs)
 From there, flesh out each feature fully.
 ---
 ## Testing Architecture
 ```
 ┌────────────────────────────────────────────┐
 │            E2E / Acceptance Tests          │  Few, slow, high confidence
 ├────────────────────────────────────────────┤
 │            Integration Tests               │  Some, medium speed
 ├────────────────────────────────────────────┤
 │              Unit Tests                    │  Many, fast, isolated
 └────────────────────────────────────────────┘
 ```
 **Test by layer:**
 - **Domain:** Unit tests (most tests here)
 - **Application:** Integration tests with mocked infra
 - **Infrastructure:** Integration tests with real dependencies
 - **E2E:** Critical paths only
 ---
 ## Architecture Decision Records (ADRs)
 Document significant decisions:
 ```markdown
 # ADR 001: Use PostgreSQL for persistence
 ## Status
 Accepted
 ## Context
 We need a database. Options: PostgreSQL, MongoDB, MySQL
 ## Decision
 PostgreSQL for:
 - ACID compliance
 - Team familiarity
 - JSON support for flexibility
 ## Consequences
 - Need PostgreSQL expertise
 - Schema migrations required
 - Excellent query capabilities
 ```
 ---
 ## Red Flags in Architecture
 - **Circular dependencies** between modules
 - **Domain depending on infrastructure**
 - **Framework code in business logic**
 - **No clear boundaries** between features
 - **Shared mutable state** across modules
 - **"Util" or "Common" packages** that grow forever
 - **Database schema driving domain model**
--- a/solid/references/complexity.md
+++ b/solid/references/complexity.md
@@ -0,0 +1,286 @@
 # Managing Complexity
 ## The Two Types of Complexity
 ### Essential Complexity
 Inherent to the problem domain. Cannot be removed, only managed.
 - Business rules
 - Domain logic
 - User requirements
 ### Accidental Complexity
 Introduced by our solutions. CAN and SHOULD be minimized.
 - Poor abstractions
 - Unnecessary indirection
 - Framework ceremony
 - Technical debt
 **Goal: Minimize accidental complexity while clearly expressing essential complexity.**
 ---
 ## Detecting Complexity
 ### 1. Change Amplification
 Small changes require touching many files.
 **Symptom:** "To add this field, I need to update 15 files."
 **Cause:** Scattered responsibilities, poor abstraction boundaries.
 ### 2. Cognitive Load
 Code is hard to understand, requires holding too much in memory.
 **Symptom:** "I need to understand 10 other classes to understand this one."
 **Cause:** Tight coupling, hidden dependencies, unclear naming.
 ### 3. Unknown Unknowns
 Behavior is surprising, side effects are hidden.
 **Symptom:** "I changed this, and something completely unrelated broke."
 **Cause:** Global state, hidden dependencies, implicit contracts.
 ---
 ## The XP Values for Fighting Complexity
 From Extreme Programming:
 ### 1. Communication
 Code should communicate clearly. Names, structure, tests all contribute.
 ### 2. Simplicity
 Do the simplest thing that could possibly work.
 ### 3. Feedback
 Fast feedback loops catch complexity early. TDD, CI, code review.
 ### 4. Courage
 Refactor aggressively. Don't let complexity accumulate.
 ### 5. Respect
 Respect future readers (including yourself). Write for humans first.
 ---
 ## KISS - Keep It Simple, Silly
 > "The simplest solution that works is usually the best."
 ### How to Apply:
 1. Start with the obvious solution
 2. Only add complexity when REQUIRED
 3. Prefer boring, well-understood approaches
 4. Question every abstraction
 ```typescript
 // Over-engineered
 class UserServiceFactoryProvider {
  private static instance: UserServiceFactoryProvider;
  static getInstance(): UserServiceFactoryProvider { ... }
  createFactory(): UserServiceFactory { ... }
 }
 // KISS
 class UserService {
  getUser(id: string): User { ... }
 }
 ```
 ---
 ## YAGNI - You Aren't Gonna Need It
 > "Don't build features until they're actually needed."
 ### Warning Signs:
 - "We might need this later"
 - "It would be nice to have"
 - "Just in case"
 - "For future extensibility"
 ### The Cost of YAGNI Violations:
 1. **Development time** - Building unused features
 2. **Maintenance burden** - Code that must be maintained
 3. **Cognitive load** - More to understand
 4. **Wrong abstraction** - Guessing future needs incorrectly
 ```typescript
 // YAGNI violation: Building for hypothetical needs
 class User {
  // "We might need these someday"
  middleName?: string;
  secondaryEmail?: string;
  faxNumber?: string;
  linkedinProfile?: string;
  twitterHandle?: string;
 }
 // YAGNI: Only what's needed NOW
 class User {
  name: string;
  email: Email;
 }
 ```
 ---
 ## DRY - Don't Repeat Yourself (with The Rule of Three)
 > "Every piece of knowledge should have a single, unambiguous representation."
 ### BUT: The Rule of Three
 **Don't extract duplication until you see it THREE times.**
 Why? The wrong abstraction is worse than duplication.
 ```
 Duplication #1 → Leave it
 Duplication #2 → Note it, leave it
 Duplication #3 → NOW extract it
 ```
 ### Example:
 ```typescript
 // First time - leave it
 function processUserOrder(order) {
  validate(order);
  calculateTax(order);
  save(order);
 }
 // Second time - note the similarity, but leave it
 function processGuestOrder(order) {
  validate(order);
  calculateTax(order);
  save(order);
  sendGuestEmail(order);
 }
 // Third time - NOW extract
 function processCorporateOrder(order) {
  validate(order);
  calculateTax(order);
  save(order);
  applyCorporateDiscount(order);
 }
 // After three, extract the common parts
 function processOrder(order: Order, postProcessing: (o: Order) => void) {
  validate(order);
  calculateTax(order);
  save(order);
  postProcessing(order);
 }
 ```
 ---
 ## Separation of Concerns
 > "Each module should address a single concern."
 ### Concerns to Separate:
 - **Business logic** vs **Infrastructure**
 - **What** (policy) vs **How** (mechanism)
 - **Input** vs **Processing** vs **Output**
 - **Data** vs **Behavior**
 ### Example:
 ```typescript
 // BAD: Mixed concerns
 class OrderProcessor {
  process(order: Order) {
    // Validation
    if (!order.items.length) throw new Error('Empty');
    // Business logic
    let total = 0;
    for (const item of order.items) {
      total += item.price * item.quantity;
    }
    // Persistence
    const db = new Database();
    db.query(`INSERT INTO orders...`);
    // Notification
    const email = new EmailClient();
    email.send(order.customer.email, 'Order confirmed');
  }
 }
 // GOOD: Separated concerns
 class OrderProcessor {
  constructor(
    private validator: OrderValidator,
    private calculator: OrderCalculator,
    private repository: OrderRepository,
    private notifier: OrderNotifier
  ) {}
  process(order: Order): ProcessResult {
    this.validator.validate(order);
    const total = this.calculator.calculateTotal(order);
    const savedOrder = this.repository.save(order);
    this.notifier.notifyConfirmation(savedOrder);
    return ProcessResult.success(savedOrder);
  }
 }
 ```
 ---
 ## Managing Technical Debt
 ### Types of Technical Debt:
 1. **Deliberate** - Conscious trade-off for speed
 2. **Accidental** - Mistakes, lack of knowledge
 3. **Bit rot** - Code degrades over time
 ### The Boy Scout Rule:
 > "Leave the code better than you found it."
 Every time you touch code:
 - Improve one small thing
 - Fix one naming issue
 - Extract one method
 - Add one missing test
 ### When to Pay Down Debt:
 - When it's in your path (you're already there)
 - When it's blocking new features
 - When it's causing bugs
 - During dedicated refactoring time
 ### When NOT to Refactor:
 - Code that works and won't change
 - Code being replaced soon
 - When you don't have tests
 ---
 ## The Four Elements of Simple Design
 In priority order (from XP):
 1. **Runs all the tests**
   - If it doesn't work, nothing else matters
 2. **Expresses intent**
   - Clear names, obvious structure
   - Code tells the story
 3. **No duplication**
   - DRY (but Rule of Three)
   - Single source of truth
 4. **Minimal**
   - Fewest classes and methods possible
   - Remove anything unnecessary
 If these four are true, the design is simple enough.
--- a/solid/references/design-patterns.md
+++ b/solid/references/design-patterns.md
@@ -0,0 +1,504 @@
 # Design Patterns
 ## What Are Design Patterns?
 Reusable solutions to common design problems. A shared vocabulary for discussing design.
 ## WARNING: Don't Force Patterns
 > "Let patterns emerge from refactoring, don't force them upfront."
 Patterns should solve problems you HAVE, not problems you MIGHT have.
 ## When to Use Patterns
 1. **You recognize the problem** - You've seen it before
 2. **The pattern fits** - Not forcing it
 3. **It simplifies** - Doesn't add unnecessary complexity
 4. **Team understands it** - Shared knowledge
 ---
 ## Creational Patterns
 ### Singleton
 **Purpose:** Ensure only one instance exists.
 **When to use:** Global configuration, connection pools, logging.
 **Warning:** Often overused. Consider dependency injection instead.
 ```typescript
 class Logger {
  private static instance: Logger;
  private constructor() {}
  static getInstance(): Logger {
    if (!Logger.instance) {
      Logger.instance = new Logger();
    }
    return Logger.instance;
  }
  log(message: string): void { ... }
 }
 ```
 ### Factory
 **Purpose:** Create objects without specifying exact class.
 **When to use:** Object creation logic is complex, or varies by type.
 ```typescript
 interface Notification {
  send(message: string): void;
 }
 class EmailNotification implements Notification { ... }
 class SMSNotification implements Notification { ... }
 class PushNotification implements Notification { ... }
 class NotificationFactory {
  create(type: 'email' | 'sms' | 'push'): Notification {
    switch (type) {
      case 'email': return new EmailNotification();
      case 'sms': return new SMSNotification();
      case 'push': return new PushNotification();
    }
  }
 }
 ```
 ### Builder
 **Purpose:** Construct complex objects step by step.
 **When to use:** Objects with many optional parameters, test data creation.
 ```typescript
 class UserBuilder {
  private user: Partial<User> = {};
  withName(name: string): UserBuilder {
    this.user.name = name;
    return this;
  }
  withEmail(email: string): UserBuilder {
    this.user.email = email;
    return this;
  }
  withAge(age: number): UserBuilder {
    this.user.age = age;
    return this;
  }
  build(): User {
    return new User(
      this.user.name!,
      this.user.email!,
      this.user.age
    );
  }
 }
 // Usage
 const user = new UserBuilder()
  .withName('Alice')
  .withEmail('alice@example.com')
  .build();
 ```
 ### Prototype
 **Purpose:** Create new objects by cloning existing ones.
 **When to use:** Object creation is expensive, or you need copies with slight variations.
 ```typescript
 interface Prototype {
  clone(): Prototype;
 }
 class Document implements Prototype {
  constructor(
    public title: string,
    public content: string,
    public metadata: Metadata
  ) {}
  clone(): Document {
    return new Document(
      this.title,
      this.content,
      { ...this.metadata }
    );
  }
 }
 ```
 ---
 ## Structural Patterns
 ### Adapter
 **Purpose:** Make incompatible interfaces work together.
 **When to use:** Integrating third-party libraries, legacy code.
 ```typescript
 // Third-party library with different interface
 class OldPaymentAPI {
  makePayment(cents: number): boolean { ... }
 }
 // Our interface
 interface PaymentGateway {
  charge(amount: Money): ChargeResult;
 }
 // Adapter
 class OldPaymentAdapter implements PaymentGateway {
  constructor(private oldAPI: OldPaymentAPI) {}
  charge(amount: Money): ChargeResult {
    const cents = amount.toCents();
    const success = this.oldAPI.makePayment(cents);
    return success ? ChargeResult.success() : ChargeResult.failed();
  }
 }
 ```
 ### Decorator
 **Purpose:** Add behavior to objects dynamically.
 **When to use:** Adding features without modifying existing code.
 ```typescript
 interface Notifier {
  send(message: string): void;
 }
 class EmailNotifier implements Notifier {
  send(message: string): void {
    console.log(`Email: ${message}`);
  }
 }
 // Decorators
 class SMSDecorator implements Notifier {
  constructor(private wrapped: Notifier) {}
  send(message: string): void {
    this.wrapped.send(message);
    console.log(`SMS: ${message}`);
  }
 }
 class SlackDecorator implements Notifier {
  constructor(private wrapped: Notifier) {}
  send(message: string): void {
    this.wrapped.send(message);
    console.log(`Slack: ${message}`);
  }
 }
 // Usage - compose behaviors
 const notifier = new SlackDecorator(
  new SMSDecorator(
    new EmailNotifier()
  )
 );
 notifier.send('Alert!'); // Sends to all three
 ```
 ### Proxy
 **Purpose:** Control access to an object.
 **When to use:** Lazy loading, access control, logging, caching.
 ```typescript
 interface Image {
  display(): void;
 }
 class RealImage implements Image {
  constructor(private filename: string) {
    this.loadFromDisk(); // Expensive
  }
  private loadFromDisk(): void { ... }
  display(): void { ... }
 }
 // Lazy loading proxy
 class ImageProxy implements Image {
  private realImage: RealImage | null = null;
  constructor(private filename: string) {}
  display(): void {
    if (!this.realImage) {
      this.realImage = new RealImage(this.filename);
    }
    this.realImage.display();
  }
 }
 ```
 ### Composite
 **Purpose:** Treat individual objects and compositions uniformly.
 **When to use:** Tree structures, hierarchies (files/folders, UI components).
 ```typescript
 interface Component {
  getPrice(): number;
 }
 class Product implements Component {
  constructor(private price: number) {}
  getPrice(): number {
    return this.price;
  }
 }
 class Box implements Component {
  private children: Component[] = [];
  add(component: Component): void {
    this.children.push(component);
  }
  getPrice(): number {
    return this.children.reduce(
      (sum, child) => sum + child.getPrice(),
      0
    );
  }
 }
 // Usage
 const smallBox = new Box();
 smallBox.add(new Product(10));
 smallBox.add(new Product(20));
 const bigBox = new Box();
 bigBox.add(smallBox);
 bigBox.add(new Product(50));
 console.log(bigBox.getPrice()); // 80
 ```
 ---
 ## Behavioral Patterns
 ### Strategy
 **Purpose:** Define a family of algorithms, make them interchangeable.
 **When to use:** Multiple ways to do something, switchable at runtime.
 ```typescript
 interface PricingStrategy {
  calculate(basePrice: number): number;
 }
 class RegularPricing implements PricingStrategy {
  calculate(basePrice: number): number {
    return basePrice;
  }
 }
 class PremiumDiscount implements PricingStrategy {
  calculate(basePrice: number): number {
    return basePrice * 0.8; // 20% off
  }
 }
 class BlackFriday implements PricingStrategy {
  calculate(basePrice: number): number {
    return basePrice * 0.5; // 50% off
  }
 }
 class ShoppingCart {
  constructor(private pricing: PricingStrategy) {}
  calculateTotal(items: Item[]): number {
    const base = items.reduce((sum, i) => sum + i.price, 0);
    return this.pricing.calculate(base);
  }
 }
 ```
 ### Observer
 **Purpose:** Notify multiple objects about state changes.
 **When to use:** Event systems, pub/sub, reactive updates.
 ```typescript
 interface Observer {
  update(event: Event): void;
 }
 class EventEmitter {
  private observers: Observer[] = [];
  subscribe(observer: Observer): void {
    this.observers.push(observer);
  }
  unsubscribe(observer: Observer): void {
    this.observers = this.observers.filter(o => o !== observer);
  }
  notify(event: Event): void {
    this.observers.forEach(o => o.update(event));
  }
 }
 // Usage
 class OrderService extends EventEmitter {
  placeOrder(order: Order): void {
    // Process order...
    this.notify({ type: 'ORDER_PLACED', order });
  }
 }
 class EmailService implements Observer {
  update(event: Event): void {
    if (event.type === 'ORDER_PLACED') {
      this.sendConfirmation(event.order);
    }
  }
 }
 ```
 ### Template Method
 **Purpose:** Define algorithm skeleton, let subclasses override steps.
 **When to use:** Common algorithm with varying steps.
 ```typescript
 abstract class DataExporter {
  // Template method - defines the algorithm
  export(data: Data[]): void {
    this.validate(data);
    const formatted = this.format(data);
    this.write(formatted);
    this.notify();
  }
  // Common steps
  private validate(data: Data[]): void { ... }
  private notify(): void { ... }
  // Steps to override
  protected abstract format(data: Data[]): string;
  protected abstract write(content: string): void;
 }
 class CSVExporter extends DataExporter {
  protected format(data: Data[]): string {
    return data.map(d => d.toCSV()).join('\n');
  }
  protected write(content: string): void {
    fs.writeFileSync('export.csv', content);
  }
 }
 class JSONExporter extends DataExporter {
  protected format(data: Data[]): string {
    return JSON.stringify(data);
  }
  protected write(content: string): void {
    fs.writeFileSync('export.json', content);
  }
 }
 ```
 ### Command
 **Purpose:** Encapsulate a request as an object.
 **When to use:** Undo/redo, queuing, logging actions.
 ```typescript
 interface Command {
  execute(): void;
  undo(): void;
 }
 class AddItemCommand implements Command {
  constructor(
    private cart: Cart,
    private item: Item
  ) {}
  execute(): void {
    this.cart.add(this.item);
  }
  undo(): void {
    this.cart.remove(this.item);
  }
 }
 class CommandHistory {
  private history: Command[] = [];
  execute(command: Command): void {
    command.execute();
    this.history.push(command);
  }
  undo(): void {
    const command = this.history.pop();
    command?.undo();
  }
 }
 ```
 ---
 ## Pattern Awareness
 ### The Four-Dimensional Lens
 When analyzing new code/libraries, ask:
 1. **What problem does it solve?** (Creational, Structural, Behavioral)
 2. **What scope?** (Object-level, Class-level, System-level)
 3. **When is it applied?** (Compile-time, Runtime)
 4. **How coupled?** (Tight, Loose)
 This helps recognize patterns even in unfamiliar code.
 ---
 ## Anti-Patterns to Avoid
 | Anti-Pattern | Problem | Solution |
 |--------------|---------|----------|
 | **God Object** | Class does everything | Split by responsibility |
 | **Spaghetti Code** | Tangled, no structure | Refactor to layers |
 | **Golden Hammer** | Using one pattern for everything | Match pattern to problem |
 | **Premature Optimization** | Optimizing before needed | YAGNI, profile first |
 | **Copy-Paste Programming** | Duplication | Extract, Rule of Three |
--- a/solid/references/go-adaptation.md
+++ b/solid/references/go-adaptation.md
@@ -0,0 +1,317 @@
 # SOLID Principles: Go Adaptation
 This document rewrites each SOLID principle with idiomatic Go examples. Where Go idioms conflict with OOP-centric formulations, Go wins. Tension is noted.
 ## S — Single Responsibility Principle in Go
 **OOP formulation:** A class should have one reason to change.
 **Go formulation:** A type or package should have one reason to change. Responsibility maps to the unit of deployment (package), not just types.
 ### Go example
 ```go
 // Bad: Order type mixes domain, persistence, and notification
 package order
 type Order struct {
    ID       string
    Items    []Item
    Customer Customer
 }
 func (o *Order) Save(db *sql.DB) error { ... }       // persistence concern
 func (o *Order) SendReceipt(smtp *mail.Client) error { ... } // notification concern
 func (o *Order) CalculateTotal() Money { ... }       // domain concern (correct)
 // Good: each type/package has one reason to change
 package order
 // Domain type: only reason to change = business rules change
 type Order struct {
    ID       OrderID
    Items    []Item
    Customer Customer
 }
 func (o Order) Total() Money {
    total := Money{}
    for _, item := range o.Items {
        total = total.Add(item.Price.Multiply(item.Quantity))
    }
    return total
 }
 package orderstore
 // Persistence: only reason to change = storage mechanism changes
 type Store struct { db *sql.DB }
 func (s *Store) Save(ctx context.Context, o order.Order) error { ... }
 package ordernotify
 // Notification: only reason to change = notification channel changes
 type Notifier struct { mailer Mailer }
 func (n *Notifier) SendReceipt(ctx context.Context, o order.Order) error { ... }
 ```
 ### Package-level SRP
 Package names should be nouns that describe one concept:
 - `store`, `cache`, `handler`, `validator` — single responsibility
 - `util`, `common`, `misc`, `helpers` — SRP violation waiting to happen
 ---
 ## O — Open/Closed Principle in Go
 **OOP formulation:** Open for extension via subclassing, closed for modification.
 **Go formulation:** Open for extension by implementing an interface or adding new types; closed for modification of existing types. No subclassing needed.
 ### Go example
 ```go
 // Bad: adding a new discount type requires modifying existing code
 func ApplyDiscount(order *Order, discountType string) Money {
    switch discountType {
    case "percentage":
        return order.Total().Multiply(0.9)
    case "fixed":
        return order.Total().Subtract(Money{Amount: 10})
    // Must add case here every time
    }
    return order.Total()
 }
 // Good: add new discount types by implementing the interface
 type Discounter interface {
    Apply(total Money) Money
 }
 type PercentageDiscount struct{ Percent float64 }
 func (d PercentageDiscount) Apply(total Money) Money {
    return total.Multiply(1 - d.Percent/100)
 }
 type FixedDiscount struct{ Amount Money }
 func (d FixedDiscount) Apply(total Money) Money {
    return total.Subtract(d.Amount)
 }
 // Adding SeniorDiscount requires zero changes to existing code
 type SeniorDiscount struct{}
 func (d SeniorDiscount) Apply(total Money) Money {
    return total.Multiply(0.85)
 }
 func ApplyDiscount(order *Order, d Discounter) Money {
    return d.Apply(order.Total())
 }
 ```
 ### Go tension
 Go has no inheritance, so "closed for modification" is natural — you can't subclass a concrete type to override behavior. The extension point is always an interface. If you're adding switch cases to handle new types, that's the signal to introduce an interface.
 ---
 ## L — Liskov Substitution Principle in Go
 **OOP formulation:** Subtypes must be substitutable for their base types.
 **Go formulation:** Any implementation of an interface must honor the interface's documented contract, not just its method signatures. Go's structural typing means LSP is enforced by convention and documentation, not the compiler.
 ### Contract documentation
 ```go
 // UserStore: all implementations must honor this contract:
 //   - Save: persists user, returns ErrDuplicateEmail if email already exists
 //   - GetByEmail: returns ErrNotFound if user does not exist
 //   - Both methods must be safe for concurrent use
 type UserStore interface {
    Save(ctx context.Context, u User) error
    GetByEmail(ctx context.Context, email string) (User, error)
 }
 var (
    ErrDuplicateEmail = errors.New("duplicate email")
    ErrNotFound       = errors.New("not found")
 )
 // PostgresUserStore: honors contract
 type PostgresUserStore struct { db *sql.DB }
 func (s *PostgresUserStore) Save(ctx context.Context, u User) error {
    _, err := s.db.ExecContext(ctx, "INSERT INTO users ...", u.Email)
    if isUniqueViolation(err) {
        return ErrDuplicateEmail // Returns documented error
    }
    return err
 }
 // InMemoryUserStore: honors contract (for tests)
 type InMemoryUserStore struct {
    mu    sync.Mutex
    users map[string]User
 }
 func (s *InMemoryUserStore) Save(ctx context.Context, u User) error {
    s.mu.Lock()
    defer s.mu.Unlock()
    if _, exists := s.users[u.Email]; exists {
        return ErrDuplicateEmail // Same documented error
    }
    s.users[u.Email] = u
    return nil
 }
 ```
 ### LSP violations to watch for
 ```go
 // Bad: violates contract — does not return ErrDuplicateEmail
 type CachedUserStore struct { ... }
 func (s *CachedUserStore) Save(ctx context.Context, u User) error {
    return errors.New("cache: duplicate key") // Different error type breaks callers
 }
 // Bad: panics on some inputs — violates contract
 type LazyUserStore struct { ... }
 func (s *LazyUserStore) GetByEmail(ctx context.Context, email string) (User, error) {
    panic("not implemented yet") // Violates LSP
 }
 ```
 ---
 ## I — Interface Segregation Principle in Go
 **OOP formulation:** Clients should not depend on methods they do not use.
 **Go formulation:** Define the narrowest interface possible at the point of use. Go's structural typing makes this natural — you don't need the implementation to declare what it implements.
 ### io.Reader as the canonical example
 ```go
 // io.Reader is a single-method interface
 type Reader interface {
    Read(p []byte) (n int, err error)
 }
 // Any type with Read() satisfies this — os.File, bytes.Buffer, net.Conn, etc.
 // Callers that only need to read accept Reader, not a fat interface
 func parseConfig(r io.Reader) (Config, error) { ... }
 ```
 ### Define interfaces where you consume them
 ```go
 // package report — only needs to read invoices
 package report
 // Define the interface here, at the point of use — not in the invoice package
 type InvoiceReader interface {
    GetByID(ctx context.Context, id InvoiceID) (Invoice, error)
    ListByCustomer(ctx context.Context, customerID CustomerID) ([]Invoice, error)
 }
 func NewReporter(invoices InvoiceReader) *Reporter { ... }
 // The invoice package's PostgresStore has 10+ methods
 // This interface only exposes what the reporter needs
 // Adding new methods to PostgresStore never forces changes to the reporter
 ```
 ### Composing interfaces
 ```go
 type Reader interface {
    Read(p []byte) (n int, err error)
 }
 type Writer interface {
    Write(p []byte) (n int, err error)
 }
 // Compose at the call site, not in the definition
 type ReadWriter interface {
    Reader
    Writer
 }
 ```
 ---
 ## D — Dependency Inversion Principle in Go
 **OOP formulation:** High-level modules should not depend on low-level modules. Both depend on abstractions.
 **Go formulation:** Accept interfaces, not concrete types. Domain packages import nothing from infrastructure packages. Infrastructure packages import from domain packages.
 ### The dependency direction rule
 ```
 cmd/ → internal/handler → internal/service → internal/domain
 internal/store (postgres) → internal/domain
 internal/mailer (smtp) → internal/domain
 domain imports nothing from store, mailer, handler
 ```
 ### Constructor injection pattern
 ```go
 // Good: domain service accepts interfaces
 package service
 type UserService struct {
    store  UserStore  // interface
    mailer Mailer     // interface
    logger *slog.Logger
 }
 // NewUserService constructs with injected dependencies.
 // All parameters are interfaces — callers provide implementations.
 func NewUserService(store UserStore, mailer Mailer, logger *slog.Logger) *UserService {
    return &UserService{store: store, mailer: mailer, logger: logger}
 }
 // Bad: domain service imports infrastructure
 package service
 import "github.com/example/myapp/internal/store/postgres"
 type UserService struct {
    store *postgres.Store // Locked to PostgreSQL — breaks DIP
 }
 ```
 ### Wire it up at the boundary
 ```go
 // cmd/server/main.go — the composition root
 func main() {
    db := mustOpenDB(cfg.DatabaseURL)
    store := postgres.NewUserStore(db)
    mailer := smtp.NewMailer(cfg.SMTP)
    logger := slog.Default()
    svc := service.NewUserService(store, mailer, logger)
    handler := handler.NewUserHandler(svc)
    // ...
 }
 ```
 The `main` function (or a dependency injection framework) is the only place that names concrete implementations. All other code depends on interfaces.
 ### When NOT to extract an interface
 Don't extract an interface prematurely. These are fine as concrete dependencies:
 - `*slog.Logger` — no interface needed; it already accepts a Handler interface internally
 - `*sql.DB` — acceptable in `store` packages; extract an interface at the service boundary
 - Standard library types that are stable and have no test double need
 Only extract an interface when you have:
 1. Multiple implementations (real + test double), or
 2. A package boundary you want to keep clean (domain knows nothing of postgres)
--- a/solid/references/object-design.md
+++ b/solid/references/object-design.md
@@ -0,0 +1,328 @@
 # Object-Oriented Design
 ## Responsibility-Driven Design (RDD)
 The key insight: **Objects are defined by their responsibilities, not their data.**
 ### Finding Objects
 Start with:
 1. **Nouns** in requirements → candidate objects
 2. **Verbs** → candidate methods/behaviors
 3. **Domain concepts** → value objects
 ### Finding Responsibilities
 Each object should answer:
 - What does this object **know**?
 - What does this object **do**?
 - What does this object **decide**?
 ### Object Stereotypes
 Every class fits one (or maybe two) stereotypes:
 | Stereotype | Purpose | Example |
 |------------|---------|---------|
 | **Information Holder** | Knows things, holds data | `User`, `Product`, `Address` |
 | **Structurer** | Maintains relationships | `OrderItems`, `UserGroup` |
 | **Service Provider** | Performs work | `PaymentProcessor`, `EmailSender` |
 | **Coordinator** | Orchestrates workflow | `OrderFulfillmentService` |
 | **Controller** | Makes decisions, delegates | `CheckoutController` |
 | **Interfacer** | Transforms between systems | `UserAPIAdapter`, `DatabaseMapper` |
 ### The Two Questions
 For every class, ask:
 1. **"What pattern is this?"** - Which stereotype? Which design pattern?
 2. **"Is it doing too much?"** - Check object calisthenics rules
 If you can't answer clearly, the class needs refactoring.
 ---
 ## Tell, Don't Ask
 **Command objects to do work. Don't interrogate them and do the work yourself.**
 ```typescript
 // BAD: Asking, then doing
 if (account.getBalance() >= amount) {
  account.setBalance(account.getBalance() - amount);
  // more logic here...
 }
 // GOOD: Telling
 const result = account.withdraw(amount);
 if (result.isSuccess()) {
  // ...
 }
 ```
 The object that has the data should have the behavior.
 ---
 ## Design by Contract (DbC)
 Every method has:
 - **Preconditions** - What must be true BEFORE calling
 - **Postconditions** - What will be true AFTER calling
 - **Invariants** - What is ALWAYS true about the object
 ```typescript
 class BankAccount {
  private balance: Money;
  // INVARIANT: balance is never negative
  // PRECONDITION: amount > 0
  // POSTCONDITION: balance decreased by amount OR error returned
  withdraw(amount: Money): WithdrawResult {
    if (amount.isNegativeOrZero()) {
      return WithdrawResult.invalidAmount();
    }
    if (this.balance.isLessThan(amount)) {
      return WithdrawResult.insufficientFunds();
    }
    this.balance = this.balance.minus(amount);
    return WithdrawResult.success(this.balance);
  }
 }
 ```
 ---
 ## Composition Over Inheritance
 **Prefer composing objects over extending classes.**
 ### Why Inheritance is Problematic:
 - Tight coupling between parent and child
 - Fragile base class problem
 - Difficult to change parent without breaking children
 - Forces "is-a" relationship that may not fit
 ### When to Use Inheritance:
 - True "is-a" relationship (rare)
 - Framework requirements
 - Template Method pattern (intentional)
 ### Prefer Composition:
 ```typescript
 // BAD: Inheritance
 class PremiumUser extends User {
  getDiscount(): number { return 20; }
 }
 // GOOD: Composition
 class User {
  constructor(private discountPolicy: DiscountPolicy) {}
  getDiscount(): number {
    return this.discountPolicy.calculate();
  }
 }
 // Now discount behavior is pluggable
 new User(new PremiumDiscount());
 new User(new StandardDiscount());
 new User(new NoDiscount());
 ```
 ---
 ## The Law of Demeter (Principle of Least Knowledge)
 **Only talk to your immediate friends.**
 A method should only call:
 1. Methods on `this`
 2. Methods on parameters
 3. Methods on objects it creates
 4. Methods on its direct components
 ```typescript
 // BAD: Reaching through objects
 order.getCustomer().getAddress().getCity();
 // GOOD: Ask the immediate friend
 order.getShippingCity();
 ```
 This reduces coupling - changes to `Address` don't ripple through all callers.
 ---
 ## Encapsulation
 **Hide internal details, expose behavior.**
 ### Levels of Encapsulation:
 1. **Data** - private fields, no direct access
 2. **Implementation** - how things work internally
 3. **Type** - concrete class hidden behind interface
 4. **Design** - architectural decisions hidden from clients
 ```typescript
 // BAD: Exposed internals
 class Order {
  public items: Item[] = [];
  public total: number = 0;
 }
 // Client can corrupt state
 order.items.push(item);
 order.total = -999; // Oops!
 // GOOD: Encapsulated
 class Order {
  private items: OrderItems;
  private total: Money;
  addItem(item: Item): void {
    this.items.add(item);
    this.recalculateTotal();
  }
  getTotal(): Money {
    return this.total; // Returns copy or immutable
  }
 }
 ```
 ---
 ## Polymorphism
 **Replace conditionals with types.**
 ```typescript
 // BAD: Type checking
 function calculateShipping(method: string, value: number): number {
  if (method === 'standard') return value < 50 ? 5 : 0;
  if (method === 'express') return 15;
  if (method === 'overnight') return 25;
  throw new Error('Unknown method');
 }
 // GOOD: Polymorphism
 interface ShippingMethod {
  calculateCost(orderValue: number): number;
 }
 class StandardShipping implements ShippingMethod {
  calculateCost(orderValue: number): number {
    return orderValue < 50 ? 5 : 0;
  }
 }
 class ExpressShipping implements ShippingMethod {
  calculateCost(orderValue: number): number {
    return 15;
  }
 }
 // Usage - no conditionals
 function calculateShipping(method: ShippingMethod, value: number): number {
  return method.calculateCost(value);
 }
 ```
 ---
 ## Value Objects vs Entities
 ### Value Objects
 - Defined by their attributes (no identity)
 - Immutable
 - Comparable by value
 - Examples: `Money`, `Email`, `Address`, `DateRange`
 ```typescript
 class Money {
  constructor(
    private readonly amount: number,
    private readonly currency: string
  ) {}
  equals(other: Money): boolean {
    return this.amount === other.amount &&
           this.currency === other.currency;
  }
  add(other: Money): Money {
    if (this.currency !== other.currency) {
      throw new CurrencyMismatch();
    }
    return new Money(this.amount + other.amount, this.currency);
  }
 }
 ```
 ### Entities
 - Have identity (survives attribute changes)
 - Usually mutable (via methods)
 - Comparable by identity
 - Examples: `User`, `Order`, `Product`
 ```typescript
 class User {
  constructor(
    private readonly id: UserId,
    private email: Email,
    private name: Name
  ) {}
  equals(other: User): boolean {
    return this.id.equals(other.id); // Identity comparison
  }
  changeEmail(newEmail: Email): void {
    this.email = newEmail; // Still same user
  }
 }
 ```
 ---
 ## Aggregates
 A cluster of objects treated as a single unit for data changes.
 - One object is the **aggregate root** (entry point)
 - External code only references the root
 - Root enforces invariants for the entire cluster
 ```typescript
 // Order is the aggregate root
 class Order {
  private items: OrderItem[] = [];
  // All access through the root
  addItem(product: Product, quantity: number): void {
    const item = new OrderItem(product, quantity);
    this.items.push(item);
    this.validateTotal();
  }
  removeItem(itemId: ItemId): void {
    this.items = this.items.filter(i => !i.id.equals(itemId));
  }
  // Root enforces invariants
  private validateTotal(): void {
    if (this.calculateTotal().exceeds(MAX_ORDER_VALUE)) {
      throw new OrderTotalExceeded();
    }
  }
 }
 // BAD: Accessing items directly
 order.items.push(new OrderItem(...)); // Bypasses validation!
 // GOOD: Through the root
 order.addItem(product, 2); // Validation happens
 ```
--- a/solid/references/solid-principles.md
+++ b/solid/references/solid-principles.md
@@ -0,0 +1,262 @@
 # SOLID Principles
 ## Overview
 SOLID helps structure software to be flexible, maintainable, and testable. These principles reduce coupling and increase cohesion.
 ## S - Single Responsibility Principle (SRP)
 > "A class should have one, and only one, reason to change."
 ### Problem It Solves
 God objects that do everything - hard to test, hard to change, hard to understand.
 ### How to Apply
 Each class handles ONE responsibility. If you find yourself saying "and" when describing what a class does, split it.
 ```typescript
 // BAD: Multiple responsibilities
 class Order {
  calculateTotal(): number { ... }
  saveToDatabase(): void { ... }    // Persistence
  generateInvoice(): string { ... } // Presentation
 }
 // GOOD: Single responsibility each
 class Order {
  private items: OrderItem[] = [];
  addItem(item: OrderItem): void { ... }
  calculateTotal(): number { ... }
 }
 class OrderRepository {
  save(order: Order): Promise<void> { ... }
 }
 class InvoiceGenerator {
  generate(order: Order): Invoice { ... }
 }
 ```
 ### Detection Questions
 - Does this class have multiple reasons to change?
 - Can I describe it without using "and"?
 - Would different stakeholders request changes to different parts?
 ---
 ## O - Open/Closed Principle (OCP)
 > "Software entities should be open for extension but closed for modification."
 ### Problem It Solves
 Having to modify existing, tested code every time requirements change. Risk of breaking working features.
 ### How to Apply
 Design abstractions that allow new behavior through new classes, not edits to existing ones.
 ```typescript
 // BAD: Must modify to add new shipping
 class ShippingCalculator {
  calculate(type: string, value: number): number {
    if (type === 'standard') return value < 50 ? 5 : 0;
    if (type === 'express') return 15;
    // Must add more ifs for new types!
  }
 }
 // GOOD: Open for extension
 interface ShippingMethod {
  calculateCost(orderValue: number): number;
 }
 class StandardShipping implements ShippingMethod {
  calculateCost(orderValue: number): number {
    return orderValue < 50 ? 5 : 0;
  }
 }
 class ExpressShipping implements ShippingMethod {
  calculateCost(orderValue: number): number {
    return 15;
  }
 }
 // Add new shipping by creating new class, not modifying existing
 class SameDayShipping implements ShippingMethod {
  calculateCost(orderValue: number): number {
    return 25;
  }
 }
 ```
 ### Architectural Insight
 OCP at architecture level means: **design your codebase so new features are added by adding code, not changing existing code.**
 ---
 ## L - Liskov Substitution Principle (LSP)
 > "Subtypes must be substitutable for their base types without altering program correctness."
 ### Problem It Solves
 Subclasses that break expectations, requiring type-checking and special cases.
 ### How to Apply
 Subclasses must honor the contract of the parent. If the parent returns positive numbers, subclasses cannot return negatives.
 ```typescript
 // BAD: Violates parent's contract
 class DiscountPolicy {
  getDiscount(value: number): number {
    return 0; // Non-negative expected
  }
 }
 class WeirdDiscount extends DiscountPolicy {
  getDiscount(value: number): number {
    return -5; // Increases cost! Breaks expectations
  }
 }
 // GOOD: Enforces contract
 class DiscountPolicy {
  constructor(private discount: number) {
    if (discount < 0) throw new Error("Discount must be non-negative");
  }
  getDiscount(): number {
    return this.discount;
  }
 }
 ```
 ### Key Insight
 This is why you can swap `InMemoryUserRepo` for `PostgresUserRepo` - they both honor the `UserRepo` interface contract.
 ---
 ## I - Interface Segregation Principle (ISP)
 > "Clients should not be forced to depend on methods they do not use."
 ### Problem It Solves
 Fat interfaces that force partial implementations, empty methods, or throws.
 ### How to Apply
 Split large interfaces into smaller, cohesive ones. Clients depend only on what they need.
 ```typescript
 // BAD: Fat interface
 interface WarehouseDevice {
  printLabel(orderId: string): void;
  scanBarcode(): string;
  packageItem(orderId: string): void;
 }
 class BasicPrinter implements WarehouseDevice {
  printLabel(orderId: string): void { /* works */ }
  scanBarcode(): string { throw new Error("Not supported"); } // Forced!
  packageItem(orderId: string): void { throw new Error("Not supported"); }
 }
 // GOOD: Segregated interfaces
 interface LabelPrinter {
  printLabel(orderId: string): void;
 }
 interface BarcodeScanner {
  scanBarcode(): string;
 }
 interface ItemPackager {
  packageItem(orderId: string): void;
 }
 class BasicPrinter implements LabelPrinter {
  printLabel(orderId: string): void { /* only what it does */ }
 }
 ```
 ### Detection
 If you see `throw new Error("Not implemented")` or empty method bodies, the interface is too fat.
 ---
 ## D - Dependency Inversion Principle (DIP)
 > "High-level modules should not depend on low-level modules. Both should depend on abstractions."
 ### Problem It Solves
 Tight coupling to specific implementations (databases, APIs, frameworks). Hard to test, hard to swap.
 ### How to Apply
 Depend on interfaces, inject implementations.
 ```typescript
 // BAD: Direct dependency on concrete class
 class OrderService {
  private emailService = new SendGridEmailService(); // Locked in!
  confirmOrder(email: string): void {
    this.emailService.send(email, "Order confirmed");
  }
 }
 // GOOD: Depend on abstraction
 interface EmailService {
  send(to: string, message: string): void;
 }
 class OrderService {
  constructor(private emailService: EmailService) {}
  confirmOrder(email: string): void {
    this.emailService.send(email, "Order confirmed");
  }
 }
 // Now can inject any implementation
 new OrderService(new SendGridEmailService());
 new OrderService(new SESEmailService());
 new OrderService(new MockEmailService()); // For tests!
 ```
 ### The Dependency Rule
 Source code dependencies should point **inward** toward high-level policies (domain logic), never toward low-level details (infrastructure).
 ```
 Infrastructure → Application → Domain
      ↑              ↑            ↑
    (outer)       (middle)     (inner)
 Dependencies flow: outer → inner
 Never: inner → outer
 ```
 ---
 ## Applying SOLID at Architecture Level
 These principles scale beyond classes:
 | Principle | Architecture Application |
 |-----------|--------------------------|
 | SRP | Each bounded context has one responsibility |
 | OCP | New features = new modules, not edits to existing |
 | LSP | Microservices with same contract are substitutable |
 | ISP | Thin interfaces between services |
 | DIP | High-level business logic doesn't know about databases/frameworks |
 ---
 ## Quick Reference
 | Principle | One-Liner | Red Flag |
 |-----------|-----------|----------|
 | SRP | One reason to change | "This class handles X and Y and Z" |
 | OCP | Add, don't modify | `if/else` chains for types |
 | LSP | Subtypes are substitutable | Type-checking in calling code |
 | ISP | Small, focused interfaces | Empty method implementations |
 | DIP | Depend on abstractions | `new ConcreteClass()` in business logic |
--- a/spec-driven-dev/SKILL.md
+++ b/spec-driven-dev/SKILL.md
@@ -0,0 +1,299 @@
 ---
 name: spec-driven-dev
 description: Write a structured specification before writing any code. Use when starting a new project, feature, or significant change. Adapted for a PM-first workflow where why comes before how.
 ---
 # Spec-Driven Development
 ## Overview
 Write a structured specification before writing any code. The spec is the shared source of truth — it defines what we're building, why it matters, and how we'll know it's done.
 **Code without a spec is guessing.**
 A spec doesn't need to be long. A two-paragraph spec beats no spec. The value is in forcing clarity before code is written, not in the length of the document.
 ## Mathias PM Context
 As a digital product manager building software:
 - **Why before how:** The spec must capture the business context and user need before technical decisions. Agents reading the spec should understand *why this matters*, not just *what to build*.
 - **Explicit success criteria:** Vague requirements produce vague results. Every spec must have testable success criteria.
 - **Surfaces assumptions:** The spec's primary job is to surface misunderstandings before they become expensive code.
 - **Living document:** Update the spec when decisions change. An outdated spec is still better than no spec.
 ## When to Use
 **Always create a spec when:**
 - Starting a new project or feature
 - Requirements are ambiguous or incomplete
 - The change touches multiple files or modules
 - You're about to make an architectural decision
 - The task would take more than a day to implement
 **When NOT to use:** Single-line fixes, typos, or changes where requirements are unambiguous and self-contained.
 ## The Gated Workflow
 Do not advance to the next phase without validation at each gate.
 ```
 SPECIFY → [review] → PLAN → [review] → TASKS → [review] → IMPLEMENT
 ```
 Each gate is a deliberate pause: does the next phase make sense given what we know?
 ## Phase 1: SPECIFY
 ### Surface Assumptions Immediately
 Before writing spec content, list what you're assuming:
 ```
 ASSUMPTIONS I'M MAKING:
 1. This is a Go backend service (no frontend changes)
 2. Authentication is handled by the existing middleware
 3. The database is PostgreSQL (matching the rest of the stack)
 4. The feature is used by authenticated users only, not public
 → Confirm or correct before I proceed.
 ```
 Don't silently fill in ambiguous requirements.
 ### Write the Spec
 A spec covers six areas:
 **1. Objective — WHY are we building this?**
 This is the most important section. It must answer:
 - What user problem does this solve?
 - Who is the user?
 - What does success look like from the user's perspective?
 - Why now?
 ```markdown
 ## Objective
 Invoice importers at small accounting firms manually copy payment details
 from PDF invoices into their banking system, taking 10–20 minutes per invoice.
 **User:** Invoice processor at an accounting firm (10–50 invoices/day)
 **Problem:** Manual data entry is slow, error-prone, and creates compliance risk
 **Goal:** Reduce per-invoice processing time from ~15 minutes to < 2 minutes
 Success: Invoice processor can extract and queue a payment from a PDF in under 2 minutes,
 with confidence the data is correct.
 ```
 **2. Commands — Full executable commands**
 ```
 Build:   task build
 Test:    task test (or: go test ./...)
 Lint:    task lint (or: golangci-lint run)
 Dev:     task dev
 Deploy:  task deploy:staging
 ```
 **3. Project Structure — Where things live**
 ```
 internal/
  domain/          → Core types and interfaces
  service/         → Business logic
  store/           → Database implementations
  handler/         → HTTP handlers
 cmd/
  server/          → Main entry point
 ```
 **4. Code Style — One real example beats three paragraphs**
 ```go
 // Error handling: always wrap with context
 if err != nil {
    return fmt.Errorf("parse invoice PDF: %w", err)
 }
 // Dependency injection: accept interfaces
 func NewInvoiceService(store InvoiceStore, parser PDFParser) *InvoiceService { ... }
 // Context: always first parameter for I/O operations
 func (s *InvoiceService) ProcessPDF(ctx context.Context, r io.Reader) (Invoice, error) { ... }
 ```
 **5. Testing Strategy**
 ```
 Framework:  testing package + testify
 Locations:  *_test.go files in same package
 Unit tests: table-driven, in-memory implementations for stores
 Fast path:  go test ./... (unit tests only)
 Full suite: go test -tags=integration ./... (includes DB tests)
 Coverage:   >80% for business logic packages
 ```
 **6. Boundaries**
 ```
 Always:
  - Run go test ./... before committing
  - Wrap errors with fmt.Errorf("context: %w", err)
  - Pass ctx as first parameter to any I/O function
  - Run govulncheck before adding new dependencies
 Ask first:
  - Schema changes
  - Adding new external dependencies
  - Changing public API contracts
  - Performance changes that affect existing behavior
 Never:
  - Commit secrets or API keys
  - Remove or skip failing tests
  - Send client data to external APIs
  - Use naked returns on errors
 ```
 ### Spec Template
 ```markdown
 # Spec: [Feature/Project Name]
 ## Objective
 [What we're building and why. User story or problem statement.]
 [Who is the user? What does success look like from their perspective?]
 ## Tech Stack
 [Language, key libraries, relevant existing infrastructure]
 ## Commands
 Build:   [full command]
 Test:    [full command]
 Lint:    [full command]
 Dev:     [full command]
 ## Project Structure
 [Directory layout with descriptions]
 ## Code Style
 [One real code example showing the patterns to follow]
 ## Testing Strategy
 [Framework, test locations, what to unit test vs integration test]
 ## Boundaries
 - Always: [...]
 - Ask first: [...]
 - Never: [...]
 ## Success Criteria
 [Specific, testable conditions that define "done"]
 - [ ] [Condition 1: metric/threshold/method]
 - [ ] [Condition 2: ...]
 ## Open Questions
 [Anything unresolved that needs input before implementation begins]
 ```
 ## Reframing Vague Requirements
 When you receive a vague requirement, translate it into specific success criteria before writing any spec content:
 ```
 Vague: "Make the invoice parser more reliable"
 Reframed success criteria:
 - Parser correctly extracts IBAN from 95% of Swedish invoice formats
 - Parser correctly extracts total amount from 98% of tested invoices
 - Parser returns a structured error (not a panic) for unrecognized formats
 - Processing time < 2 seconds for PDFs up to 10MB
 → Are these the right targets?
 ```
 ## Phase 2: PLAN
 With a validated spec, create a technical implementation plan:
 1. Identify major components and their dependencies
 2. Determine implementation order (foundations first)
 3. Note risks and unknowns
 4. Identify what can be built in parallel vs. what must be sequential
 5. Define verification checkpoints between phases
 The plan should be reviewable: anyone should be able to read it and say "yes, that's the right approach" or "no, change X."
 Load the `planning` skill for detailed task breakdown.
 ## Phase 3: TASKS
 Break the plan into discrete, implementable tasks. Load the `planning` skill for the full task breakdown methodology.
 **Each task must have:**
 - Acceptance criteria
 - Verification step (test command, build, manual check)
 - File count estimate (no task should touch more than ~5 files)
 ## Phase 4: IMPLEMENT
 Execute tasks one at a time. For each task:
 1. Load `tdd` skill — write failing tests first
 2. Implement minimal code to pass
 3. Load `clean-code` skill — refactor
 ## Keeping the Spec Alive
 - **Update when decisions change** — spec first, then code
 - **Update when scope changes**
 - **Commit the spec** — it belongs in version control
 - **Reference the spec in PRs** — link to the section each PR implements
 ## Common Rationalizations
 | Rationalization | Reality |
 |---|---|
 | "This is simple, I don't need a spec" | Simple tasks still need acceptance criteria. A two-line spec is fine. |
 | "I'll write the spec after" | That's documentation, not specification. The spec's value is forcing clarity *before* code. |
 | "The spec will slow us down" | A 15-minute spec prevents hours of rework. |
 | "Requirements will change anyway" | That's why the spec is a living document. |
 | "The user knows what they want" | Even clear requests have implicit assumptions. The spec surfaces those. |
 ## Verification
 Before starting implementation:
 - [ ] Spec covers all six core areas
 - [ ] Mathias has reviewed and approved the spec
 - [ ] Success criteria are specific and testable
 - [ ] Boundaries (Always/Ask First/Never) are defined
 - [ ] Open questions are resolved or accepted as unknowns
 - [ ] Spec is saved to a file in the repository
 ## Brain MCP Integration
 ### Logging
 Call `session_log` once at the end of every phase to record the outcome.
 Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which
 treats `pass` as success, `fail` as failure, `skip` as neither.
 **At end of each phase:**
 - `session_log` with `{skill: "spec-driven-dev", phase: "<phase-name>", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
 **Phases for this skill:** specify, plan, tasks, implement
 **Status semantics:**
 - `pass` — the phase's intended outcome was reached (gate passed).
 - `fail` — the phase's intended outcome was NOT reached (gate blocked, rework required).
 - `skip` — phase was skipped intentionally.
 **Why this matters:** the routing pod (Plan 6) reads pass-rate to decide whether to route a future call to a local model. If your skill never logs, the routing pod sees no data.
 ## Cross-References
 - Load `problem-analysis` skill for deep requirement understanding before speccing
 - Load `user-stories` skill to decompose the spec into stories
 - Load `planning` skill for task breakdown
 - Load `feature-spec` skill once implementation begins, to scope individual features inside the project
 - Load `tdd` skill during implementation
--- a/tdd/SKILL.md
+++ b/tdd/SKILL.md
@@ -0,0 +1,401 @@
 ---
 name: tdd
 description: Write failing tests first, then minimal code to pass. Non-negotiable for all new features and bug fixes.
 ---
 # Test-Driven Development (TDD)
 ## Overview
 Write the test first. Watch it fail. Write minimal code to pass.
 **Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing.
 **Violating the letter of the rules is violating the spirit of the rules.**
 ## When to Use
 **Always:**
 - New features
 - Bug fixes
 - Refactoring
 - Behavior changes
 **Exceptions (ask Mathias):**
 - Throwaway prototypes
 - Generated code (sqlc output, templ output)
 - Configuration files
 Thinking "skip TDD just this once"? Stop. That's rationalization.
 ## The Iron Law
 ```
 NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
 ```
 Write code before the test? Delete it. Start over.
 **No exceptions:**
 - Don't keep it as "reference"
 - Don't "adapt" it while writing tests
 - Don't look at it
 - Delete means delete
 Implement fresh from tests. Period.
 ## Red-Green-Refactor
 ```
 RED → verify fails correctly → GREEN → verify all pass → REFACTOR → stay green → RED (next)
 ```
 ### RED - Write Failing Test
 Write one minimal test showing what should happen.
 **Go example (good):**
 ```go
 func TestRetryOperation_RetriesThreeTimes(t *testing.T) {
    attempts := 0
    op := func() error {
        attempts++
        if attempts < 3 {
            return errors.New("fail")
        }
        return nil
    }
    err := RetryOperation(op, 3)
    require.NoError(t, err)
    assert.Equal(t, 3, attempts)
 }
 ```
 Clear name, tests real behavior, one thing.
 **Go example (bad):**
 ```go
 func TestRetry(t *testing.T) {
    // Vague name, tests nothing meaningful
    err := RetryOperation(nil, 0)
    assert.NoError(t, err)
 }
 ```
 **Requirements:**
 - One behavior per test
 - Clear name describing the behavior
 - Real code (no mocks unless crossing a system boundary)
 ### Verify RED - Watch It Fail
 **MANDATORY. Never skip.**
 ```bash
 go test -run TestRetryOperation_RetriesThreeTimes ./...
 ```
 Confirm:
 - Test fails (not compilation errors)
 - Failure message is expected
 - Fails because feature is missing (not typos)
 **Test passes?** You're testing existing behavior. Fix test.
 **Compilation errors?** Fix errors, re-run until it fails correctly.
 ### GREEN - Minimal Code
 Write simplest code to pass the test.
 **Good:**
 ```go
 func RetryOperation(op func() error, maxRetries int) error {
    for i := 0; i < maxRetries; i++ {
        if err := op(); err == nil {
            return nil
        }
    }
    return op()
 }
 ```
 Just enough to pass.
 **Bad:**
 ```go
 func RetryOperation(op func() error, maxRetries int, opts ...RetryOption) error {
    // YAGNI — don't add options, backoff, jitter before the test demands it
 }
 ```
 Over-engineered.
 Don't add features, refactor other code, or "improve" beyond what the test demands.
 ### Verify GREEN - Watch It Pass
 **MANDATORY.**
 ```bash
 go test ./...
 ```
 Confirm:
 - Test passes
 - All other tests still pass
 - No race conditions: `go test -race ./...`
 **Test fails?** Fix code, not test.
 **Other tests fail?** Fix now.
 ### REFACTOR - Clean Up
 After green only:
 - Remove duplication
 - Improve names
 - Extract helpers
 - Apply clean code principles (load `clean-code` skill)
 Keep tests green. Don't add behavior.
 ### Repeat
 Next failing test for next behavior.
 ## Go-Specific TDD Notes
 ### Table-Driven Tests (Preferred Pattern)
 ```go
 func TestValidateEmail(t *testing.T) {
    tests := []struct {
        name    string
        email   string
        wantErr bool
    }{
        {"valid email", "user@example.com", false},
        {"empty email", "", true},
        {"no at sign", "userexample.com", true},
        {"no domain", "user@", true},
    }
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            err := ValidateEmail(tt.email)
            if tt.wantErr {
                assert.Error(t, err)
            } else {
                assert.NoError(t, err)
            }
        })
    }
 }
 ```
 Table-driven tests are the Go idiom. Use them for behavior that varies across inputs.
 ### Subtests with t.Run
 Use `t.Run` to name subtests clearly. Failure messages include the subtest name.
 ```go
 t.Run("rejects empty input", func(t *testing.T) { ... })
 t.Run("accepts valid UUID", func(t *testing.T) { ... })
 ```
 ### Test File Conventions
 - File: `<package>_test.go` in the same directory
 - Package: `package foo_test` for black-box testing (preferred), `package foo` for white-box
 - Helpers: use `t.Helper()` so stack traces point to the caller, not the helper
 ```go
 func assertNoError(t *testing.T, err error) {
    t.Helper()
    if err != nil {
        t.Fatalf("unexpected error: %v", err)
    }
 }
 ```
 ### Running Tests
 ```bash
 go test ./...                        # all packages
 go test -run TestFoo ./pkg/...       # specific test
 go test -run TestFoo/subtest ./...   # specific subtest
 go test -race ./...                  # race detector (always run before commit)
 go test -cover ./...                 # coverage
 go test -v ./...                     # verbose
 ```
 ### testify
 Pre-approved. Use `assert` (continues on failure) and `require` (stops on failure):
 ```go
 require.NoError(t, err)          // fatal if error
 assert.Equal(t, expected, got)   // non-fatal comparison
 assert.ErrorIs(t, err, ErrFoo)   // error chain check
 ```
 ## Good Tests
 | Quality | Good | Bad |
 |---------|------|-----|
 | **Minimal** | One thing. "and" in name? Split it. | `TestValidatesEmailAndDomainAndWhitespace` |
 | **Clear** | Name describes behavior | `TestFoo`, `Test1` |
 | **Shows intent** | Demonstrates desired API | Obscures what code should do |
 | **Table-driven** | Multiple cases, one test function | Copy-pasted test functions |
 ## Common Rationalizations
 | Excuse | Reality |
 |--------|---------|
 | "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
 | "I'll test after" | Tests passing immediately prove nothing. |
 | "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
 | "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
 | "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
 | "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
 | "Need to explore first" | Fine. Throw away exploration, start with TDD. |
 | "Test hard = design unclear" | Listen to the test. Hard to test = hard to use. |
 | "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
 | "Existing code has no tests" | You're improving it. Add tests for existing code. |
 ## Red Flags - STOP and Start Over
 - Code before test
 - Test after implementation
 - Test passes immediately without seeing it fail
 - Can't explain why test failed
 - Tests added "later"
 - Rationalizing "just this once"
 - "Already manually tested it"
 - "Tests after achieve the same purpose"
 - "Keep as reference" or "adapt existing code"
 - "Already spent X hours, deleting is wasteful"
 - "TDD is dogmatic, I'm being pragmatic"
 - "This is different because..."
 **All of these mean: Delete code. Start over with TDD.**
 ## Example: Bug Fix
 **Bug:** Empty email accepted
 **RED**
 ```go
 func TestSubmitForm_RejectsEmptyEmail(t *testing.T) {
    result := submitForm(FormData{Email: ""})
    assert.Equal(t, "email required", result.Error)
 }
 ```
 **Verify RED**
 ```bash
 $ go test -run TestSubmitForm_RejectsEmptyEmail ./...
 FAIL: expected "email required", got ""
 ```
 **GREEN**
 ```go
 func submitForm(data FormData) FormResult {
    if strings.TrimSpace(data.Email) == "" {
        return FormResult{Error: "email required"}
    }
    // ...
 }
 ```
 **Verify GREEN**
 ```bash
 $ go test ./...
 ok  	example.com/myapp	0.003s
 ```
 **REFACTOR**
 Extract validation for multiple fields if needed.
 ## Verification Checklist
 Before marking work complete:
 - [ ] Every new function/method has a test
 - [ ] Watched each test fail before implementing
 - [ ] Each test failed for expected reason (feature missing, not typo)
 - [ ] Wrote minimal code to pass each test
 - [ ] All tests pass: `go test ./...`
 - [ ] Race detector clean: `go test -race ./...`
 - [ ] Tests use real code (mocks only if crossing a system boundary)
 - [ ] Edge cases and errors covered
 Can't check all boxes? You skipped TDD. Start over.
 ## When Stuck
 | Problem | Solution |
 |---------|----------|
 | Don't know how to test | Write wished-for API. Write assertion first. Ask Mathias. |
 | Test too complicated | Design too complicated. Simplify interface. |
 | Must mock everything | Code too coupled. Use dependency injection. |
 | Test setup huge | Extract helpers with `t.Helper()`. Still complex? Simplify design. |
 ## Debugging Integration
 Bug found? Write failing test reproducing it. Follow TDD cycle. Test proves fix and prevents regression.
 Never fix bugs without a test.
 ## Testing Anti-Patterns
 When adding test utilities or mocks, load the `references/testing-anti-patterns.md` to avoid:
 - Testing mock behavior instead of real behavior
 - Adding test-only methods to production types
 - Mocking without understanding what the dependency does
 ## Brain MCP Integration
 The brain MCP exposes session context across machines. Use it to make TDD cycles cumulative rather than one-shot.
 **At session start:**
 - Run `brain_query` with the feature name + "tdd" to surface prior cycles, anti-patterns, or testing decisions for this code area. Skip if the feature is brand new.
 **Never:**
 - Embed brain content in test code or assertions. The brain is context for you, not state for the system under test.
 ### Logging
 Call `session_log` once at the end of every phase to record the outcome.
 Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which
 treats `pass` as success, `fail` as failure, `skip` as neither.
 **At end of `red` phase:**
 - `session_log` with `{skill: "tdd", phase: "red", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
 **At end of `green` phase:**
 - `session_log` with `{skill: "tdd", phase: "green", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
 **At end of `refactor` phase:**
 - `session_log` with `{skill: "tdd", phase: "refactor", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
 **Status semantics:**
 - `pass` — the phase's intended outcome was reached (red: test fails as expected; green: test passes; refactor: tests still pass after refactor).
 - `fail` — the phase's intended outcome was NOT reached (test compiled when it shouldn't, test still fails after green attempt, refactor broke tests).
 - `skip` — phase was skipped intentionally (e.g. refactor not warranted).
 **Why this matters:** the routing pod (Plan 6) reads pass-rate to decide whether to route a future `tdd` call to a local model. If your skill never logs, the routing pod sees no data and may default-route or default-not-route in a way that doesn't reflect real performance.
 ## Final Rule
 ```
 Production code → test exists and failed first
 Otherwise → not TDD
 ```
 No exceptions without Mathias's permission.
 ## Mode 2 Routing Note
 This skill is invoked identically whether the agent is running in Mode 1 (cloud Claude, no routing) or Mode 2 (client-local, supervisor routing layer). The routing pod (Plan 6) does not exist yet; until it does, treat this skill as Mode 1 only. The discipline does not change between modes — only the model behind the call.
--- a/tdd/references/testing-anti-patterns.md
+++ b/tdd/references/testing-anti-patterns.md
@@ -0,0 +1,299 @@
 # Testing Anti-Patterns
 **Load this reference when:** writing or changing tests, adding mocks, or tempted to add test-only methods to production code.
 ## Overview
 Tests must verify real behavior, not mock behavior. Mocks are a means to isolate, not the thing being tested.
 **Core principle:** Test what the code does, not what the mocks do.
 **Following strict TDD prevents these anti-patterns.**
 ## The Iron Laws
 ```
 1. NEVER test mock behavior
 2. NEVER add test-only methods to production classes
 3. NEVER mock without understanding dependencies
 ```
 ## Anti-Pattern 1: Testing Mock Behavior
 **The violation:**
 ```typescript
 // ❌ BAD: Testing that the mock exists
 test('renders sidebar', () => {
  render(<Page />);
  expect(screen.getByTestId('sidebar-mock')).toBeInTheDocument();
 });
 ```
 **Why this is wrong:**
 - You're verifying the mock works, not that the component works
 - Test passes when mock is present, fails when it's not
 - Tells you nothing about real behavior
 **your human partner's correction:** "Are we testing the behavior of a mock?"
 **The fix:**
 ```typescript
 // ✅ GOOD: Test real component or don't mock it
 test('renders sidebar', () => {
  render(<Page />);  // Don't mock sidebar
  expect(screen.getByRole('navigation')).toBeInTheDocument();
 });
 // OR if sidebar must be mocked for isolation:
 // Don't assert on the mock - test Page's behavior with sidebar present
 ```
 ### Gate Function
 ```
 BEFORE asserting on any mock element:
  Ask: "Am I testing real component behavior or just mock existence?"
  IF testing mock existence:
    STOP - Delete the assertion or unmock the component
  Test real behavior instead
 ```
 ## Anti-Pattern 2: Test-Only Methods in Production
 **The violation:**
 ```typescript
 // ❌ BAD: destroy() only used in tests
 class Session {
  async destroy() {  // Looks like production API!
    await this._workspaceManager?.destroyWorkspace(this.id);
    // ... cleanup
  }
 }
 // In tests
 afterEach(() => session.destroy());
 ```
 **Why this is wrong:**
 - Production class polluted with test-only code
 - Dangerous if accidentally called in production
 - Violates YAGNI and separation of concerns
 - Confuses object lifecycle with entity lifecycle
 **The fix:**
 ```typescript
 // ✅ GOOD: Test utilities handle test cleanup
 // Session has no destroy() - it's stateless in production
 // In test-utils/
 export async function cleanupSession(session: Session) {
  const workspace = session.getWorkspaceInfo();
  if (workspace) {
    await workspaceManager.destroyWorkspace(workspace.id);
  }
 }
 // In tests
 afterEach(() => cleanupSession(session));
 ```
 ### Gate Function
 ```
 BEFORE adding any method to production class:
  Ask: "Is this only used by tests?"
  IF yes:
    STOP - Don't add it
    Put it in test utilities instead
  Ask: "Does this class own this resource's lifecycle?"
  IF no:
    STOP - Wrong class for this method
 ```
 ## Anti-Pattern 3: Mocking Without Understanding
 **The violation:**
 ```typescript
 // ❌ BAD: Mock breaks test logic
 test('detects duplicate server', () => {
  // Mock prevents config write that test depends on!
  vi.mock('ToolCatalog', () => ({
    discoverAndCacheTools: vi.fn().mockResolvedValue(undefined)
  }));
  await addServer(config);
  await addServer(config);  // Should throw - but won't!
 });
 ```
 **Why this is wrong:**
 - Mocked method had side effect test depended on (writing config)
 - Over-mocking to "be safe" breaks actual behavior
 - Test passes for wrong reason or fails mysteriously
 **The fix:**
 ```typescript
 // ✅ GOOD: Mock at correct level
 test('detects duplicate server', () => {
  // Mock the slow part, preserve behavior test needs
  vi.mock('MCPServerManager'); // Just mock slow server startup
  await addServer(config);  // Config written
  await addServer(config);  // Duplicate detected ✓
 });
 ```
 ### Gate Function
 ```
 BEFORE mocking any method:
  STOP - Don't mock yet
  1. Ask: "What side effects does the real method have?"
  2. Ask: "Does this test depend on any of those side effects?"
  3. Ask: "Do I fully understand what this test needs?"
  IF depends on side effects:
    Mock at lower level (the actual slow/external operation)
    OR use test doubles that preserve necessary behavior
    NOT the high-level method the test depends on
  IF unsure what test depends on:
    Run test with real implementation FIRST
    Observe what actually needs to happen
    THEN add minimal mocking at the right level
  Red flags:
    - "I'll mock this to be safe"
    - "This might be slow, better mock it"
    - Mocking without understanding the dependency chain
 ```
 ## Anti-Pattern 4: Incomplete Mocks
 **The violation:**
 ```typescript
 // ❌ BAD: Partial mock - only fields you think you need
 const mockResponse = {
  status: 'success',
  data: { userId: '123', name: 'Alice' }
  // Missing: metadata that downstream code uses
 };
 // Later: breaks when code accesses response.metadata.requestId
 ```
 **Why this is wrong:**
 - **Partial mocks hide structural assumptions** - You only mocked fields you know about
 - **Downstream code may depend on fields you didn't include** - Silent failures
 - **Tests pass but integration fails** - Mock incomplete, real API complete
 - **False confidence** - Test proves nothing about real behavior
 **The Iron Rule:** Mock the COMPLETE data structure as it exists in reality, not just fields your immediate test uses.
 **The fix:**
 ```typescript
 // ✅ GOOD: Mirror real API completeness
 const mockResponse = {
  status: 'success',
  data: { userId: '123', name: 'Alice' },
  metadata: { requestId: 'req-789', timestamp: 1234567890 }
  // All fields real API returns
 };
 ```
 ### Gate Function
 ```
 BEFORE creating mock responses:
  Check: "What fields does the real API response contain?"
  Actions:
    1. Examine actual API response from docs/examples
    2. Include ALL fields system might consume downstream
    3. Verify mock matches real response schema completely
  Critical:
    If you're creating a mock, you must understand the ENTIRE structure
    Partial mocks fail silently when code depends on omitted fields
  If uncertain: Include all documented fields
 ```
 ## Anti-Pattern 5: Integration Tests as Afterthought
 **The violation:**
 ```
 ✅ Implementation complete
 ❌ No tests written
 "Ready for testing"
 ```
 **Why this is wrong:**
 - Testing is part of implementation, not optional follow-up
 - TDD would have caught this
 - Can't claim complete without tests
 **The fix:**
 ```
 TDD cycle:
 1. Write failing test
 2. Implement to pass
 3. Refactor
 4. THEN claim complete
 ```
 ## When Mocks Become Too Complex
 **Warning signs:**
 - Mock setup longer than test logic
 - Mocking everything to make test pass
 - Mocks missing methods real components have
 - Test breaks when mock changes
 **your human partner's question:** "Do we need to be using a mock here?"
 **Consider:** Integration tests with real components often simpler than complex mocks
 ## TDD Prevents These Anti-Patterns
 **Why TDD helps:**
 1. **Write test first** → Forces you to think about what you're actually testing
 2. **Watch it fail** → Confirms test tests real behavior, not mocks
 3. **Minimal implementation** → No test-only methods creep in
 4. **Real dependencies** → You see what the test actually needs before mocking
 **If you're testing mock behavior, you violated TDD** - you added mocks without watching test fail against real code first.
 ## Quick Reference
 | Anti-Pattern | Fix |
 |--------------|-----|
 | Assert on mock elements | Test real component or unmock it |
 | Test-only methods in production | Move to test utilities |
 | Mock without understanding | Understand dependencies first, mock minimally |
 | Incomplete mocks | Mirror real API completely |
 | Tests as afterthought | TDD - tests first |
 | Over-complex mocks | Consider integration tests |
 ## Red Flags
 - Assertion checks for `*-mock` test IDs
 - Methods only called in test files
 - Mock setup is >50% of test
 - Test fails when you remove mock
 - Can't explain why mock is needed
 - Mocking "just to be safe"
 ## The Bottom Line
 **Mocks are tools to isolate, not things to test.**
 If TDD reveals you're testing mock behavior, you've gone wrong.
 Fix: Test real behavior or question why you're mocking at all.
--- a/test-design/SKILL.md
+++ b/test-design/SKILL.md
@@ -0,0 +1,308 @@
 ---
 name: test-design
 description: Evaluate test quality using Dave Farley's 8 Properties of Good Tests. Use when reviewing or writing tests to ensure they provide genuine verification.
 ---
 # Test Design
 ## Overview
 Good tests are investments. Bad tests are liabilities — they pass when they shouldn't, fail when code is correct, or verify nothing meaningful.
 This skill uses Dave Farley's 8 Properties of Good Tests to assess and improve test quality. The **Farley Index** (0–10) provides a scored summary.
 Reference: [Dave Farley's Properties of Good Tests](https://www.linkedin.com/pulse/tdd-properties-good-tests-dave-farley-iexge/)
 ## The 8 Properties
 | Property | Weight | What it measures |
 |----------|--------|-----------------|
 | **Understandable** | 1.5x | Can a reader understand what behavior is being tested? |
 | **Maintainable** | 1.5x | Will small code changes cause test failures unrelated to behavior? |
 | **Repeatable** | 1.25x | Same result every time, regardless of environment or order |
 | **Atomic** | 1.0x | One behavior per test; tests are independent |
 | **Necessary** | 1.0x | Tests real behavior, not mock internals or framework behavior |
 | **Granular** | 1.0x | Each test covers one specific case |
 | **Fast** | 0.75x | Tests run quickly enough to support rapid TDD cycles |
 | **First (TDD)** | 1.0x | Tests were written before implementation |
 **Farley Index formula:** `(U×1.5 + M×1.5 + R×1.25 + A×1.0 + N×1.0 + G×1.0 + F×0.75 + T×1.0) / 9.0`
 ## Rating Scale
 | Score | Rating | Interpretation |
 |-------|--------|----------------|
 | 9.0–10.0 | Exemplary | Model quality; tests serve as living documentation |
 | 7.5–8.9 | Excellent | High quality with minor improvement opportunities |
 | 6.0–7.4 | Good | Solid foundation with clear improvement areas |
 | 4.5–5.9 | Fair | Functional but needs significant attention |
 | 3.0–4.4 | Poor | Tests provide limited value; refactoring needed |
 | 0.0–2.9 | Critical | Tests may be harmful; consider rewriting |
 ## Property Deep Dives
 ### Understandable (U)
 A test should tell a story: what behavior, under what conditions, produces what result.
 **Go patterns that help:**
 - Subtest names in `t.Run`: `t.Run("returns error when email is empty", ...)`
 - Table-driven tests with descriptive `name` fields
 - Arrange-Act-Assert structure with blank lines separating sections
 ```go
 // Good: clear behavior name, clear structure
 func TestValidateUser_RejectsEmptyEmail(t *testing.T) {
    // Arrange
    user := User{Name: "Alice", Email: ""}
    // Act
    err := ValidateUser(user)
    // Assert
    require.Error(t, err)
    assert.ErrorIs(t, err, ErrInvalidEmail)
 }
 // Bad: cryptic name, no structure
 func TestUser1(t *testing.T) {
    u := User{}
    assert.NotNil(t, ValidateUser(u))
 }
 ```
 **Negative signals:** cryptic names (`test_1`, `TestFoo`), no AAA structure, multiple behaviors in one test.
 ### Maintainable (M)
 Tests that break when implementation changes (but behavior doesn't) create noise and slow down development.
 **Negative signals:**
 - Over-specified mock interactions (`assert.Called(mock, "MethodX", args...)` when behavior is all that matters)
 - ArgumentCaptor deep inspection
 - `verifyNoMoreInteractions` that breaks when you add a logging call
 - Tests coupled to internal field names
 **Go patterns that help:**
 - Test behavior via public API, not internal state
 - Avoid asserting on exact call counts unless the count IS the behavior
 ```go
 // Bad: breaks when you add an audit log call
 mock.AssertCalled(t, "Save", user)
 mock.AssertNumberOfCalls(t, "Save", 1)
 mock.AssertNotCalled(t, "Log") // Breaks if you add logging later
 // Good: test the outcome
 result, err := service.CreateUser(ctx, req)
 require.NoError(t, err)
 assert.Equal(t, user.Email, result.Email)
 ```
 ### Repeatable (R)
 Tests must produce the same result regardless of when, where, or in what order they run.
 **Negative signals (Go):**
 - `time.Now()` in test logic without injection
 - `os.ReadFile` for fixtures that aren't hermetic
 - Shared global state between tests
 - Tests that depend on network availability
 - `time.Sleep` for synchronization
 **Go fixes:**
 ```go
 // Bad: time-dependent
 func TestTokenExpiry(t *testing.T) {
    token := generateToken()
    time.Sleep(2 * time.Second)
    assert.True(t, token.IsExpired())
 }
 // Good: inject clock
 type Clock interface {
    Now() time.Time
 }
 type FixedClock struct{ t time.Time }
 func (c FixedClock) Now() time.Time { return c.t }
 func TestTokenExpiry(t *testing.T) {
    clock := FixedClock{t: time.Unix(0, 0)}
    token := generateTokenWithClock(clock)
    futureClk := FixedClock{t: time.Unix(3600, 0)}
    assert.True(t, token.IsExpiredAt(futureClk.Now()))
 }
 ```
 Use `t.TempDir()` for filesystem fixtures — cleaned up automatically.
 ### Atomic (A)
 One test = one behavior. Tests must be independent — running in any order must produce the same result.
 **Go patterns:**
 - `t.Parallel()` on subtests forces isolation
 - Fresh state in each `t.Run`
 - No `init()` or package-level setup that leaks between tests
 ```go
 func TestUserService(t *testing.T) {
    tests := []struct {
        name    string
        input   CreateUserReq
        wantErr bool
    }{
        {"valid user", validReq, false},
        {"duplicate email", dupEmailReq, true},
    }
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            t.Parallel() // Each subtest runs independently
            store := NewInMemoryStore() // Fresh state per test
            svc := NewUserService(store)
            _, err := svc.Create(context.Background(), tt.input)
            if tt.wantErr {
                assert.Error(t, err)
            } else {
                assert.NoError(t, err)
            }
        })
    }
 }
 ```
 ### Necessary (N)
 Tests must verify real behavior. Tautology Theatre — tests whose outcome is predetermined regardless of production code — provides false confidence.
 **Types of Tautology Theatre:**
 1. **Mock tautology:** Configure mock return, then assert that mock returns it.
   ```go
   // Bad: this passes even if production code is deleted
   mockStore.On("GetUser", id).Return(user, nil)
   result, _ := mockStore.GetUser(id)
   assert.Equal(t, user, result) // Testing the mock, not production code
   ```
 2. **Mock-only test:** Every object is a mock; no real class instantiated.
 3. **Trivial tautology:** `assert.True(t, true)` or `assert.NotNil(t, new(User))`
 4. **Framework test:** Verifying that Go's `make(map[string]int)` returns non-nil.
 **Fix:** Test real behavior through real implementations. Use mocks only to isolate from external systems (DB, HTTP, filesystem).
 ### Granular (G)
 Each test covers one specific case. Table-driven tests in Go are the natural expression of granularity.
 ```go
 // Good: each row is one case, each can fail independently
 func TestParseAmount(t *testing.T) {
    tests := []struct {
        name    string
        input   string
        want    Amount
        wantErr bool
    }{
        {"integer", "100", Amount{Value: 100}, false},
        {"decimal", "10.50", Amount{Value: 1050, Scale: 2}, false},
        {"negative", "-5", Amount{}, true},
        {"empty", "", Amount{}, true},
        {"non-numeric", "abc", Amount{}, true},
    }
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            got, err := ParseAmount(tt.input)
            if tt.wantErr {
                require.Error(t, err)
                return
            }
            require.NoError(t, err)
            assert.Equal(t, tt.want, got)
        })
    }
 }
 ```
 ### Fast (F)
 Tests must run fast enough to support TDD cycles. Target: the full test suite in < 30 seconds for most projects.
 **Go fixes:**
 - Mark slow integration tests with build tags: `//go:build integration`
 - Use `t.Parallel()` to parallelize safe tests
 - Use `InMemoryStore` implementations instead of real DB for unit tests
 - Use `httptest.NewServer` for HTTP tests instead of real servers
 ```bash
 # Unit tests only (fast, default)
 go test ./...
 # Integration tests (slower, explicit)
 go test -tags=integration ./...
 ```
 **Negative signals:** `time.Sleep`, network calls without build tags, database calls in unit tests.
 ### First / TDD (T)
 Evidence that tests were written before implementation. This is the hardest property to verify statically.
 **Positive signals:**
 - Commit history shows test commit before implementation commit
 - Tests test behavior, not implementation details (tests-first forces API design)
 - Tests are simpler than the implementation (tests-first keeps tests focused)
 **Negative signals:**
 - Tests that exactly mirror the implementation structure
 - Tests that only cover happy paths (implementation-first misses edge cases)
 - Tests added in the same commit as a large implementation
 ## Go-Specific Test Design Notes
 ### t.Helper()
 Use `t.Helper()` in helper functions so stack traces point to the call site, not the helper:
 ```go
 func assertValidUser(t *testing.T, u User) {
    t.Helper()
    assert.NotEmpty(t, u.ID)
    assert.NotEmpty(t, u.Email)
 }
 ```
 ### Table-Driven Tests Are Preferred
 Go convention is table-driven tests. They're granular, readable, and easy to extend:
 - Add a new case by adding a row to the table — no new test function
 - Each case can be run independently: `go test -run TestFoo/case_name`
 ### Subtests Enable Targeted Runs
 ```bash
 go test -run TestValidateUser/rejects_empty_email ./...
 ```
 ## When Writing Tests
 Apply this checklist to every new test:
 - [ ] Name describes the behavior being tested (not the function name)
 - [ ] Structure follows Arrange-Act-Assert
 - [ ] Tests one behavior (no "and" in the name)
 - [ ] Uses real implementations where feasible
 - [ ] Runs in < 100ms (or tagged for integration)
 - [ ] Uses `t.Helper()` in helper functions
 - [ ] Table-driven if testing multiple similar inputs
 ## Cross-References
 - Load `tdd` skill for the full TDD workflow
 - Load `code-review` skill for test quality review during pre-merge review
 - See `clean-code/references/code-smells.md` for testing-specific smells
--- a/trainer/SKILL.md
+++ b/trainer/SKILL.md
@@ -0,0 +1,252 @@
 ---
 name: trainer
 description: Two-phase brain curation — scan session logs to score candidate learnings (READ), then write the ones that pass the quality gate as generalized knowledge entries (WRITE). Use periodically to keep the brain useful, not noisy.
 ---
 # Trainer
 ## Overview
 The trainer is the brain's quality gate. It runs in two phases:
 - **READ phase:** scan session logs, score candidate learnings 1-5, output the candidate list.
 - **WRITE phase:** take candidates that score ≥3, apply a quality gate (must generalize, no project-specific identifiers, must be clear cold), emit knowledge entries.
 `session-retrospective` surfaces candidates loosely. The trainer is stricter — it produces only generalized, reusable knowledge, the kind that earns its place in long-term memory.
 **Core principle:** If a knowledge entry only makes sense to someone who was in the session, it does not belong in the brain.
 ## Iron Laws
 1. **Do not call `brain_write` from this skill.** Emit entries; the operator decides which to commit.
 2. **Apply the quality gate before emitting any entry.** A skipped gate produces brain pollution.
 3. **No project-specific paths, identifiers, or function names in entries.** If you cannot generalize the lesson, drop the entry.
 ## When to Use
 - Periodically (weekly, or after a stretch of intense work) — not after every session
 - When `session-retrospective` has produced too many candidates to keep — trainer filters them
 - When the brain is becoming noisy and needs curation pruning before adding more
 **Do not use for:**
 - Single sessions with one or two clear learnings (use `session-retrospective` directly)
 - Project-specific decisions (those belong in CLAUDE.md, not the brain)
 ## Phase READ — Scan and Score
 ### What to Look For
 - **Patterns that worked** — the approach was clean, correct, and worth reinforcing for future sessions
 - **Corrections** — something was first done wrong, then corrected. Both sides are valuable: the rationalization that led to the wrong path, and the correction that fixed it.
 ### Scoring (1-5)
 | Score | Criteria |
 |---|---|
 | 5 | Novel pattern, clearly correct, generalizes across projects and domains |
 | 4 | Good pattern, correct, somewhat project-specific but still broadly useful |
 | 3 | Correct but obvious — include only if especially clean or pedagogically useful |
 | 2 | Trivial or already widely known — skip |
 | 1 | Project-specific noise, mechanical fix, or single-use detail — skip |
 Anything below 3 is dropped. Anything at 3 is borderline — include only if the operator might benefit from the framing.
 ### READ Output Format
 Respond in markdown:
 ```
 **Candidate N (score: X/5, type: pattern|correction)**
 - **What happened:** Brief description of the learning moment
 - **Why it's valuable:** What makes this worth preserving
 - **Key insight:** The distilled lesson in one sentence
 ```
 End with: `N candidates found (M scoring ≥ 3)` — the WRITE phase will use the ≥3 candidates.
 ## Phase WRITE — Emit Knowledge Entries
 ### Quality Gate
 Apply BEFORE writing each entry. Drop the entry if any check fails:
 - **Lesson** is phrased so it could apply to any project, not just this one — no project-specific paths, variable names, function names, or repo identifiers
 - **When it applies** may name a stack or tool family (e.g. "Go projects with golangci-lint") to scope the relevance, but must not name specific files or functions
 - **Example** may use concrete tool names, language syntax, or commands to make the point land — concrete examples are pedagogically necessary; only the Lesson and When-it-applies sections need to generalize
 - The insight is clearly stated enough that someone reading it cold (no session context) understands it
 - The lesson is not already in `CLAUDE.md` or an existing memory file
 ### WRITE Output Format
 For each candidate that passes the quality gate:
 ```
 # [Topic]
 ## Lesson
 [The key insight in 1-3 sentences. Generalized — no project-specific names.]
 ## When it applies
 [Conditions under which this pattern is relevant.]
 ## Example
 [A brief, generic example illustrating the lesson. Pseudocode or
 language-agnostic prose preferred over a copy of the original code.]
 ```
 After all entries, end with: `N entries ready for brain_write` or `0 entries passed quality gate — <reason>`.
 The operator (or a calling agent) decides whether to actually run `brain_write` on each entry. This skill does not write directly.
 ## Worked Example
 ### READ phase output
 ```
 **Candidate 1 (score: 4/5, type: correction)**
 - **What happened:** Implementer subagent reported "tests pass" after running `go test ./...`, but CI failed on a `defer resp.Body.Close()` errcheck violation 30 minutes later.
 - **Why it's valuable:** Captures a specific gap between local verification and CI verification that is easy to repeat across Go projects.
 - **Key insight:** Per-task verification must include the linter, not just the test runner.
 **Candidate 2 (score: 5/5, type: pattern)**
 - **What happened:** Repaired malformed JSON from LLM output by stripping invalid escape sequences before parsing. Generalizes well — most LLM-as-JSON pipelines need this.
 - **Why it's valuable:** Pattern works across any LLM-as-JSON ingestion pipeline.
 - **Key insight:** Treat LLM JSON output as untrusted input that may need normalization before parsing.
 **Candidate 3 (score: 3/5, type: pattern)**
 - **What happened:** Implementer subagent saved its commit SHA in the structured report so the spec reviewer could verify against a stable revision.
 - **Why it's valuable:** Obvious in hindsight, but cleanly framed — passing identifiers between subagents avoids race conditions when the file changes mid-review.
 - **Key insight:** Subagent reports should include verifiable identifiers (commit SHAs, file hashes) the reviewer can pin against.
 **Candidate 4 (score: 2/5, type: pattern)**
 - **What happened:** Renamed a local variable from `data` to `invoiceLineItems` for clarity.
 - **Why it's valuable:** It's good practice but trivial.
 - **Key insight:** Use names that describe what the data is.
 4 candidates found (3 scoring ≥ 3)
 ```
 ### WRITE phase output
 ```
 # Per-task verification must include lint
 ## Lesson
 When dispatching tasks that include "tests pass" as a success criterion,
 the verification command must include the project's linter, not just the
 test runner. Tests verify behavior; linters catch policy violations
 (error handling, unused variables, formatting) that the test suite is
 blind to.
 ## When it applies
 Any project with a CI lint gate, particularly Go projects with
 golangci-lint configured for errcheck or similar policies.
 ## Example
 In a project with a Taskfile that chains lint + test, the per-task
 verification should be `task check`, not `go test ./...`. The latter
 will pass locally and fail in CI.
 ---
 # LLM JSON output is untrusted input
 ## Lesson
 Output from an LLM that is supposed to be JSON should be treated as
 untrusted input. It may contain invalid escape sequences (often from
 markdown-formatted content the LLM included), structurally valid but
 semantically wrong fields, or trailing prose. Normalize before parsing.
 ## When it applies
 Any pipeline that calls an LLM with a JSON output contract and parses
 the response with a strict JSON parser.
 ## Example
 Wrap the parser in a normalization step: strip invalid escape sequences,
 trim trailing prose after the closing brace, validate against the
 expected schema. Treat the LLM as a network boundary, not a function call.
 ---
 # Subagent reports should carry verifiable identifiers
 ## Lesson
 When a subagent reports completion of work that another agent will verify,
 the report must carry an identifier that pins the verifiable state — a
 commit SHA, file hash, or revision number. Without one, the verifier may
 read a different state than the subagent produced if the workspace
 changes between report and review.
 ## When it applies
 Any multi-agent workflow with a verify-after-implement step, particularly
 when the workspace is shared (a worktree or branch held by multiple
 processes).
 ## Example
 After committing changes, a subagent reports `Status: DONE, SHA: abc1234`.
 The verifier runs `git show abc1234` to read exactly the state the
 subagent produced, even if newer commits arrived in the interim.
 3 entries ready for brain_write
 ```
 ## Anti-Patterns
 | Anti-Pattern | Why It Fails |
 |---|---|
 | Score everything 4 or 5 | The gate exists for a reason. If everything is high-value, nothing is. |
 | Knowledge entries with project-specific paths | They will not match queries from other projects, defeating the brain's value. |
 | Writing entries directly without the quality gate | Polluted brain → useless brain. The gate is the whole point. |
 | Skipping READ and going straight to WRITE | The score is what determines whether to write. Without it, you are just transcribing. |
 ## Brain MCP Integration
 The brain holds prior curated knowledge entries. Querying it before READ prevents duplicate entries from cluttering the gate.
 **Before READ:**
 - Run `brain_query` for the topics that appear in the session log. This prevents scoring candidates that are already in the brain (skip them — same content twice does not help).
 **After WRITE:**
 - This skill does NOT call `brain_write`. The operator decides which entries to commit, then runs `brain_write` per accepted entry.
 ### Logging
 Call `session_log` once at the end of every phase to record the outcome.
 Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which
 treats `pass` as success, `fail` as failure, `skip` as neither.
 **At end of `read` phase:**
 - `session_log` with `{skill: "trainer", phase: "read", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
 **At end of `write` phase:**
 - `session_log` with `{skill: "trainer", phase: "write", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
 **Phases for this skill:** read, write
 **Status semantics:**
 - `pass` — the phase's intended outcome was reached (read: candidates scored; write: gate-passed entries emitted).
 - `fail` — the phase's intended outcome was NOT reached.
 - `skip` — phase was skipped intentionally (e.g. no candidates passed the gate, write skipped).
 **Why this matters:** the routing pod (Plan 6) reads pass-rate to decide whether to route a future call to a local model. If your skill never logs, the routing pod sees no data.
 ## Mode 2 Routing Note
 Trainer is the same shape as `session-retrospective` — long-context reading, mechanical filtering — and is a Mode 2 routing candidate. Until Plan 6, treat as Mode 1 only.
 ## Cross-References
 - Load `session-retrospective` first if you want a looser, faster surfacing pass before the strict gate.
 - The brain's `brain_query` and `brain_write` tools are the read/write side of the brain that this skill curates.
--- a/user-stories/SKILL.md
+++ b/user-stories/SKILL.md
@@ -0,0 +1,246 @@
 ---
 name: user-stories
 description: Decompose problems into thin, deliverable user stories using Elephant Carpaccio and INVEST criteria. Use after problem analysis, before planning and implementation.
 ---
 # User Stories
 ## Core Philosophy
 A user story is a placeholder for a conversation, not a specification. It captures who needs something, what they need, and why — in the smallest unit that still delivers end-to-end value.
 The goal is **thin slices** — the Elephant Carpaccio technique. An elephant is too big to eat at once; slice it as thin as possible (carpaccio) so each slice is still valuable, testable, and deliverable.
 ## When to Use
 - You have a problem analysis and need to break it into implementable units
 - A feature is too large to estimate or implement
 - You need to communicate scope and prioritize with a stakeholder
 - You need to discover the minimum viable version of a feature
 **Input required:** A problem analysis (load `problem-analysis` skill first) or clear requirements.
 ## INVEST Criteria
 Every user story must be:
 | Criterion | Meaning |
 |-----------|---------|
 | **Independent** | Can be developed without requiring another story to be complete first |
 | **Negotiable** | Details can be discussed and refined |
 | **Valuable** | Delivers clear value to an end user (not just infrastructure) |
 | **Estimable** | Clear enough scope to understand complexity |
 | **Small** | Can be completed in a single sprint (1–2 weeks) |
 | **Testable** | Has clear, specific acceptance criteria |
 ## Story Format
 ```
 **Story Title:** [Descriptive name]
 **As a** [user type / persona]
 **I want** [the capability or feature]
 **So that** [the business value or outcome]
 **Acceptance Criteria:**
 - Given [starting context]
  When [action taken]
  Then [observable outcome]
 - Given [another context]
  When [action]
  Then [outcome]
 **Definition of Done:**
 - [User-facing quality requirement]
 - [Business completion criterion]
 - [Performance/security requirement if applicable]
 ```
 ## Elephant Carpaccio: Slicing Technique
 The most common mistake is making stories too large. Use these slicing patterns to cut large stories thin.
 ### Slicing Pattern 1: By Data Variation
 If a feature works differently for different data, make each variation its own story.
 ```
 Large story: "Users can search invoices"
 Thin slices:
 1. "Users can search invoices by exact invoice number"
 2. "Users can search invoices by customer name (partial match)"
 3. "Users can search invoices by date range"
 4. "Users can combine multiple search criteria"
 ```
 ### Slicing Pattern 2: By User Scenario
 Different user journeys are separate stories.
 ```
 Large story: "Users can manage their account"
 Thin slices:
 1. "New users can register with email and password"
 2. "Registered users can log in"
 3. "Logged-in users can update their display name"
 4. "Users can reset their password via email"
 ```
 ### Slicing Pattern 3: By Error Condition
 Happy path first, then error handling as a separate story.
 ```
 Story 1: "Users can upload a PDF invoice (happy path)"
 Story 2: "Users see a clear error when uploading non-PDF files"
 Story 3: "Users see a clear error when uploading files > 10MB"
 ```
 ### Slicing Pattern 4: By Performance
 Basic version first, then performance improvement as a separate story.
 ```
 Story 1: "Users can export invoice reports (any performance)"
 Story 2: "Invoice export completes in under 5 seconds for reports up to 1000 items"
 ```
 ### Slicing Pattern 5: By Rules/Workflow
 Each business rule is a candidate for its own story.
 ```
 Large story: "The system applies discounts to orders"
 Thin slices:
 1. "Premium customers receive a 10% discount automatically"
 2. "Orders over 1000 SEK receive free shipping"
 3. "Discount codes can be applied at checkout"
 ```
 ## Prioritization Framework
 Order stories by:
 1. **Risk reduction first** — stories that test unknown territory early
 2. **User value** — which story delivers the most value if we stopped here?
 3. **Learning** — stories that reveal information needed for later stories
 4. **Dependencies** — stories that others depend on come first
 ```
 Risk reduction: "Integrate with Bankgirot API (spike)"
    → We don't know if this is feasible; find out first
 User value: "Users can view their invoice list"
    → Most basic valuable state; everything else builds on this
 Foundation: "Authentication: users can log in"
    → Many other stories depend on this
 ```
 ## Story Decomposition Example
 **Problem:** A financial automation tool needs to process invoices for payment.
 **Too large:** "Process invoices for payment"
 **Decomposed with Elephant Carpaccio:**
 1. **Parse PDF invoice, extract total amount**
   - Given a valid Swedish PDF invoice
   - When I upload it to the system
   - Then the system displays the extracted amount and recipient
   - AC: Displays IBAN, amount, and due date for 10 sample invoices
 2. **Queue a payment for approval**
   - Given a parsed invoice
   - When I click "Approve for payment"
   - Then the payment is queued with status "pending"
   - AC: Payment appears in pending list, can be viewed
 3. **Execute a queued payment via Bankgirot**
   - Given a payment in "pending" status
   - When the batch window runs
   - Then an ISO 20022 payment instruction is submitted
   - AC: Submission confirmed, status changes to "submitted"
 4. **View payment status after submission**
   - As a user I can see whether a submitted payment succeeded or failed
   - AC: Status reflects Bankgirot confirmation/rejection
 5. **Handle rejected payments**
   - Given a submitted payment that is rejected by Bankgirot
   - When the rejection is received
   - Then the user sees a clear error message with the rejection reason
   - AC: Rejection reason from Bankgirot is displayed, payment status is "rejected"
 Each slice is independently valuable and testable.
 ## Output Document
 The output of this skill is a `user-stories.md` file:
 ```markdown
 # User Stories: [Feature Name]
 ## Personas
 - **[Persona 1]**: [Who they are and what they care about]
 - **[Persona 2]**: ...
 ## Stories (Prioritized)
 ### Phase 1: Foundation
 #### Story 1: [Title]
 As a [persona]...
 #### Story 2: [Title]
 ...
 ### Phase 2: Core Features
 #### Story 3: [Title]
 ...
 ## Decomposition Rationale
 [Why stories were sliced this way; what dependencies exist]
 ## Stories Deferred to v2
 - [Story A]: [Reason]
 ```
 ## Common Mistakes
 | Mistake | Fix |
 |---------|-----|
 | Technical tasks disguised as stories ("Set up database") | Reframe as user value ("Users can create an account") |
 | "And" in the title | Split into two stories |
 | No acceptance criteria | Every story must have testable criteria |
 | Stories depending on each other at the same priority level | Reorder so foundational stories come first |
 | Implementation details in acceptance criteria ("use PostgreSQL") | Keep criteria in user/business terms |
 ## Mathias Context
 As a digital PM:
 - Stories should capture **business value**, not technical tasks
 - Stories are the bridge between problem analysis and technical implementation
 - When the business says "build feature X", decompose it into stories that can be delivered incrementally
 - A story with 10 acceptance criteria is not one story — it's many. Split it.
 - The goal is to find the thinnest slice that's still worth shipping
 ## Transition to Next Steps
 After stories are written:
 1. Load `spec-driven-dev` skill to write a technical specification
 2. Load `planning` skill to convert stories into an implementation plan
 3. Load `atdd` skill to implement stories with acceptance test-driven development
 ## Cross-References
 - Requires: `problem-analysis` skill output
 - Leads to: `planning` and `atdd` skills
 - INVEST criteria origin: Bill Wake
 - Elephant Carpaccio: Alistair Cockburn