chore: bootstrap skills library — 19 skills + installer + CI auto-tag

Phase 1 of mathias/skills extraction (infra#62 Track D — homelab
next-step plan addendum). Imports ~/dev/.skills/ verbatim (19 skill
dirs + SKILLS_INDEX.md) and adds the installation surface:

- Taskfile.yml — install / update / list / release / check targets
- install.sh — bootstrap installer for hosts without Task. Idempotent
  symlink wirer; default checkout at ~/.local/share/skills/ on every
  host; SKILLS_REF env var pins a tag (default: main).
- .gitea/workflows/release.yml — auto-tag every push to main by
  Bump-Type footer (major/minor/patch, default patch). Skipped when
  commit contains [skip-release].
- README — usage, versioning, contribution flow, secret-hygiene rule.

Phase 1 wires Claude Code only (~/.claude/skills/<name> global +
<repo>/.claude/skills/<name> per-repo). Phase 2 adds Crush, opencode,
antigravity, and gitea-resident agents (cobalt-dingo, agentsquad)
once their skill conventions are researched.

Public repo, markdown-only — no secrets, no client names. Verified
via pre-push grep before initial push.

[skip-release]
This commit is contained in:
Mathias
2026-05-24 14:59:54 +02:00
commit fb464d03d4
33 changed files with 8688 additions and 0 deletions

View File

@@ -0,0 +1,68 @@
name: release
# Auto-tag every push to main with a semver bump derived from the latest
# commit footer:
# Bump-Type: major → vX.0.0
# Bump-Type: minor → v0.Y.0
# (default) → v0.0.Z (patch)
#
# Skips when the latest commit is itself the auto-tagger (commit message
# contains "[skip-release]") to prevent recursion. Tag is pushed back
# to the repo using the GITEA_TOKEN provided by act_runner.
on:
push:
branches:
- main
jobs:
tag:
runs-on: ubuntu-latest
steps:
- name: checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
token: ${{ secrets.GITEA_TOKEN }}
- name: derive next version
id: bump
run: |
set -euo pipefail
last_commit=$(git log -1 --pretty=%B)
if printf '%s' "$last_commit" | grep -qi '\[skip-release\]'; then
echo "skipping — commit marked [skip-release]"
echo "skip=true" >> "$GITHUB_OUTPUT"
exit 0
fi
latest=$(git describe --tags --abbrev=0 2>/dev/null || echo "v0.0.0")
major=$(echo "$latest" | sed -E 's/^v([0-9]+)\.([0-9]+)\.([0-9]+)$/\1/')
minor=$(echo "$latest" | sed -E 's/^v([0-9]+)\.([0-9]+)\.([0-9]+)$/\2/')
patch=$(echo "$latest" | sed -E 's/^v([0-9]+)\.([0-9]+)\.([0-9]+)$/\3/')
bump=$(printf '%s' "$last_commit" | grep -iE '^Bump-Type:' | head -1 | awk '{print tolower($2)}')
case "$bump" in
major) major=$((major + 1)); minor=0; patch=0 ;;
minor) minor=$((minor + 1)); patch=0 ;;
*) patch=$((patch + 1)) ;;
esac
next="v${major}.${minor}.${patch}"
echo "previous=$latest"
echo "bump=${bump:-patch}"
echo "next=$next"
echo "tag=$next" >> "$GITHUB_OUTPUT"
echo "skip=false" >> "$GITHUB_OUTPUT"
- name: tag + push
if: steps.bump.outputs.skip != 'true'
env:
TAG: ${{ steps.bump.outputs.tag }}
GITEA_ACTOR: ${{ gitea.actor }}
GITEA_TOKEN: ${{ secrets.GITEA_TOKEN }}
run: |
set -euo pipefail
git config user.name "${GITEA_ACTOR}"
git config user.email "${GITEA_ACTOR}@noreply.gitea.d-ma.be"
git tag -a "$TAG" -m "release $TAG"
# gitea-actions checkout uses an x-access-token URL; reuse it.
git push origin "$TAG"
echo "pushed $TAG"

81
README.md Normal file
View File

@@ -0,0 +1,81 @@
# skills
Engineering skills library — markdown `SKILL.md` files that plug into any
project or AI-coding harness. Installed via `task skills:install` in
consumer projects, or directly by `install.sh` on hosts without Task.
## What is a skill?
A skill is a single directory containing one or more markdown files. The
canonical entrypoint is `<name>/SKILL.md`. Additional reference material
lives alongside in the same directory (e.g.
`clean-code/references/code-smells.md`). Harnesses load skills on
demand — keep `SKILL.md` short and triggerable; push depth into
references.
See [`SKILLS_INDEX.md`](./SKILLS_INDEX.md) for the full catalogue with
descriptions and "use when" triggers.
## Install (consumer project)
The expected pattern: each consumer project's `Taskfile.yml` has a
`skills:install` target that delegates here. From a project root:
```bash
task skills:install
```
Bare-shell fallback for hosts without Task installed:
```bash
curl -fsSL https://gitea.d-ma.be/mathias/skills/raw/branch/main/install.sh | bash
```
Both forms are idempotent and re-runnable. On every run they:
1. `git clone` (or `git pull`) this repo into `~/.local/share/skills/`
the **canonical local checkout** on every host.
2. Wire per-skill symlinks into the harnesses found on the host (see
[Wired harnesses](#wired-harnesses)).
Re-run after `git pull` upstream to pick up new skills or version
changes. CI tags every change as a `vX.Y.Z` release (see
[Versioning](#versioning)).
## Wired harnesses
Phase 1 (current) wires **Claude Code** only:
- Global: `~/.claude/skills/<name>``~/.local/share/skills/<name>`
- Per-repo: `<repo>/.claude/skills/<name>` (when invoked from a git
repo) → same
Phase 2 will add: Crush, opencode, antigravity, gitea-resident agents
(cobalt-dingo, agentsquad). See `mathias/infra` issue `infra#62`
addendum for the roadmap.
## Versioning
Every commit to `main` is auto-tagged as a semver release by the
`.gitea/workflows/release.yml` action. Bump direction follows the commit
footer:
- `Bump-Type: major``X.0.0`
- `Bump-Type: minor``0.Y.0`
- *anything else* → `0.0.Z` (patch, the default)
Consumer projects can pin a version by setting `SKILLS_REF=v0.1.0`
before `task skills:install`; default is `main` for always-current.
## Contributing a skill
1. Add a directory `<short-kebab-name>/`.
2. Create `<short-kebab-name>/SKILL.md` — start with one-paragraph
purpose, then triggers ("use when …"), then mechanics.
3. Add a row to `SKILLS_INDEX.md` (description + use-when).
4. Commit directly to `main`. Conventional-commit prefix
(`feat:` / `fix:` / `docs:`). Add `Bump-Type: minor` footer when
adding a new skill or expanding scope; default patch covers edits.
This repo is markdown-only and public — never include secrets, client
names, or anything under NDA in a skill.

91
SKILLS_INDEX.md Normal file
View File

@@ -0,0 +1,91 @@
# Engineering Skills Index
This index lists all available engineering skills. Load the full SKILL.md on demand.
| Skill | Description | Use when |
|-------|-------------|----------|
| `tdd` | Test-Driven Development enforcement | Starting any new feature or fixing a bug |
| `clean-code` | Clean code principles with Go adaptation | Code review, refactoring, new code |
| `solid` | SOLID principles with Go examples | Design review, architecture decisions |
| `code-review` | Code smell detection + Go-specific checklist | Pre-merge review |
| `refactoring` | Systematic refactoring methodology (Fowler) | When improving existing code |
| `test-design` | Dave Farley test design scoring (Farley Index) | Reviewing or writing tests |
| `cognitive-load` | Cognitive complexity analysis (CLI score) | Identifying hotspots, before refactoring |
| `problem-analysis` | Deep problem understanding before coding | Starting any non-trivial task |
| `user-stories` | Elephant Carpaccio story slicing | Breaking down features |
| `atdd` | Acceptance Test-Driven Development workflow | Feature development with acceptance criteria |
| `spec-driven-dev` | Spec-driven development for PMs | Writing specs and requirements |
| `planning` | Planning and task breakdown | Sprint planning, task decomposition |
| `gitea-ci` | Gitea CI/CD pipeline with act_runner, buildah, k3s, GitHub mirror | Setting up or debugging CI on this infra |
| `debug` | Hypothesis-first debugging discipline | Any test failure or bug whose cause is not immediately obvious |
| `feature-spec` | Implementation-level feature spec (tighter than spec-driven-dev) | Scoping one feature inside an already-specced project |
| `playwright` | Playwright E2E testing — setup, HTMX patterns, SSE streams, CI wiring | Adding browser tests to any project with a web UI |
| `session-retrospective` | Surface non-obvious learnings from a session log | After a coding session ends, before context is lost |
| `trainer` | Two-phase brain curation (score candidates, write past quality gate) | Periodic brain pruning and curation |
| `grill-me` | Structured plan interrogation — Quick Poke, Full Grill, Pre-mortem | Stress-testing a plan before committing; end of Diamond 1; before promoting to pre-prod |
## Wiring into tools
The canonical home is `~/dev/.skills/`. Per-tool surfacing is handled by
`~/dev/.context/wire-skills.sh`:
- **Claude Code** — each git repo under `~/dev/` gets a `<repo>/.claude/skills`
symlink pointing at `~/dev/.skills/`. The symlink is excluded by
`**/.claude/skills` in `~/.config/git/ignore`, so it never lands in commits.
- **Crush** — per-skill symlinks under `~/.config/crush/skills/<name>`, added
alongside existing entries (e.g., `mstack/*`). Skills already present as
real directories or non-canonical symlinks are preserved.
Run `bash ~/dev/.context/wire-skills.sh --dry-run` to preview, then drop the
flag to apply. The script is idempotent.
## Skill Relationships
```
problem-analysis
└── user-stories
├── spec-driven-dev (PM-kickoff lifecycle)
│ │
│ └── feature-spec (per-feature, mid-implementation)
│ │
│ └── planning
│ │
│ └── atdd ──→ tdd (foundation)
│ ├──→ debug (when tests fail)
│ ├──→ clean-code (REFACTOR phase)
│ └──→ test-design (review)
└── planning
code-review ──→ refactoring ──→ clean-code
└──→ solid
(post-session)
session-retrospective ──→ trainer ──→ brain (via brain_write)
(planning & convergence)
grill-me ──→ planning (after plan is sharpened)
──→ spec-driven-dev (after hypothesis is validated)
```
## Quick Triggers
| Situation | Load |
|-----------|------|
| "Build this feature" | `problem-analysis``spec-driven-dev``planning``atdd` |
| "Fix this bug" | `tdd` |
| "Review this code" | `code-review` |
| "Refactor this" | `refactoring` + `clean-code` |
| "Design this system" | `solid` + `problem-analysis` |
| "Break this down into stories" | `user-stories` |
| "Write a spec" | `spec-driven-dev` |
| "This code is hard to understand" | `cognitive-load` |
| "Review these tests" | `test-design` |
| "Set up CI for this project" | `gitea-ci` |
| "CI pipeline is failing / not triggering" | `gitea-ci` |
| "Commit directly to main? Or branch?" | TBD: commit to main unless parallel agents are active (see `gitea-ci` SKILL §Trunk-Based Development) |
| "Grill me / stress-test this / is this ready?" | `grill-me` |
| "End of Diamond 1 — should we build this?" | `grill-me` (Full Grill) |
| "About to promote to pre-prod" | `grill-me` (Pre-mortem) |

130
Taskfile.yml Normal file
View File

@@ -0,0 +1,130 @@
version: '3'
# Skills library Taskfile.
#
# Primary entry point for project + host installation. Consumer projects
# do NOT depend on this file directly — they ship their own
# `skills:install` target that delegates here via `install.sh`. This
# Taskfile is invoked by maintainers of the skills repo itself, plus by
# install.sh during the post-clone wire step.
vars:
REPO_URL: 'https://gitea.d-ma.be/mathias/skills.git'
CHECKOUT_DIR: '{{.HOME}}/.local/share/skills'
CLAUDE_GLOBAL_DIR: '{{.HOME}}/.claude/skills'
tasks:
default:
cmds:
- task: list
list:
desc: 'Print every skill in this repo'
cmds:
- 'ls -1 {{.TASKFILE_DIR}} | grep -Ev "^(Taskfile.yml|install.sh|README.md|SKILLS_INDEX.md|.gitea|.git$)$" || true'
silent: true
install:
desc: 'Wire skills into every harness detected on this host (Claude Code today, more in phase 2).'
cmds:
- task: install:claude:global
- task: install:claude:repo
install:claude:global:
desc: 'Per-skill symlinks under ~/.claude/skills/<name> — visible in every Claude Code project on this host.'
cmds:
- mkdir -p {{.CLAUDE_GLOBAL_DIR}}
- |
for skill in $(ls -1 "{{.TASKFILE_DIR}}" | grep -Ev "^(Taskfile.yml|install.sh|README.md|SKILLS_INDEX.md|.gitea|.git$)$"); do
target="{{.TASKFILE_DIR}}/${skill}"
link="{{.CLAUDE_GLOBAL_DIR}}/${skill}"
if [ -L "$link" ] || [ -e "$link" ]; then
current=$(readlink "$link" 2>/dev/null || true)
if [ "$current" = "$target" ]; then
continue
fi
rm -rf "$link"
fi
ln -s "$target" "$link"
echo "linked $link → $target"
done
install:claude:repo:
desc: 'Per-skill symlinks under $(pwd)/.claude/skills/<name> when invoked inside a git repo.'
cmds:
- |
if ! git rev-parse --show-toplevel >/dev/null 2>&1; then
echo "not in a git repo — skipping per-repo wire"; exit 0
fi
repo=$(git rev-parse --show-toplevel)
if [ "$repo" = "{{.TASKFILE_DIR}}" ]; then
echo "running from skills repo itself — skipping per-repo wire"; exit 0
fi
target_dir="$repo/.claude/skills"
mkdir -p "$target_dir"
for skill in $(ls -1 "{{.TASKFILE_DIR}}" | grep -Ev "^(Taskfile.yml|install.sh|README.md|SKILLS_INDEX.md|.gitea|.git$)$"); do
target="{{.TASKFILE_DIR}}/${skill}"
link="${target_dir}/${skill}"
if [ -L "$link" ] || [ -e "$link" ]; then
current=$(readlink "$link" 2>/dev/null || true)
if [ "$current" = "$target" ]; then
continue
fi
rm -rf "$link"
fi
ln -s "$target" "$link"
echo "linked $link → $target"
done
update:
desc: 'git pull the canonical checkout then re-run install.'
cmds:
- |
if [ ! -d "{{.CHECKOUT_DIR}}/.git" ]; then
echo "no canonical checkout at {{.CHECKOUT_DIR}} — run install.sh first"; exit 1
fi
git -C "{{.CHECKOUT_DIR}}" pull --ff-only
- task: install
release:
desc: 'Tag HEAD per Bump-Type in the latest commit footer. Normally CI handles this — run locally only when CI is unavailable.'
cmds:
- |
latest=$(git -C "{{.TASKFILE_DIR}}" describe --tags --abbrev=0 2>/dev/null || echo "v0.0.0")
major=$(echo "$latest" | sed -E 's/^v([0-9]+)\.([0-9]+)\.([0-9]+)$/\1/')
minor=$(echo "$latest" | sed -E 's/^v([0-9]+)\.([0-9]+)\.([0-9]+)$/\2/')
patch=$(echo "$latest" | sed -E 's/^v([0-9]+)\.([0-9]+)\.([0-9]+)$/\3/')
bump=$(git -C "{{.TASKFILE_DIR}}" log -1 --pretty=%B | grep -iE "^Bump-Type:" | head -1 | awk '{print tolower($2)}')
case "$bump" in
major) major=$((major + 1)); minor=0; patch=0 ;;
minor) minor=$((minor + 1)); patch=0 ;;
*) patch=$((patch + 1)) ;;
esac
next="v${major}.${minor}.${patch}"
echo "tagging $next (previous $latest, bump=${bump:-patch})"
git -C "{{.TASKFILE_DIR}}" tag -a "$next" -m "release $next"
echo "now: git push origin $next"
check:
desc: 'Sanity check — every skill dir has a SKILL.md, no stray files at root.'
cmds:
- |
missing=0
for skill in $(ls -1 "{{.TASKFILE_DIR}}" | grep -Ev "^(Taskfile.yml|install.sh|README.md|SKILLS_INDEX.md|.gitea|.git$)$"); do
dir="{{.TASKFILE_DIR}}/${skill}"
if [ ! -d "$dir" ]; then
echo "ERROR: $skill is not a directory" >&2
missing=$((missing + 1))
continue
fi
if [ ! -f "$dir/SKILL.md" ]; then
echo "ERROR: $dir missing SKILL.md" >&2
missing=$((missing + 1))
fi
done
if [ $missing -gt 0 ]; then
echo "$missing skill(s) failed structure check" >&2
exit 1
fi
echo "all skills have SKILL.md"

294
atdd/SKILL.md Normal file
View File

@@ -0,0 +1,294 @@
---
name: atdd
description: Implement user stories using the ATDD workflow — Red (failing acceptance test) → Green (minimal implementation) → Refactor. One story at a time.
---
# Acceptance Test-Driven Development (ATDD)
## Overview
ATDD extends TDD to the acceptance level. You write tests that capture the behavior described in a user story's acceptance criteria — then implement just enough to make them pass.
**Input required:** User stories with clear acceptance criteria. Load `user-stories` skill first.
## Core Methodology: Red-Green-Refactor at Acceptance Level
```
RED → Write failing acceptance test (one story at a time)
GREEN → Write minimal code to make it pass
REFACTOR → Clean up code (load clean-code skill)
COMMIT → Commit with reference to the story
```
**Key principle:** Never mix phases. RED is only writing tests. GREEN is only making tests pass. REFACTOR is only cleaning up.
**Stop and confirm** before advancing from RED to GREEN, and from GREEN to REFACTOR.
## Working on One Story at a Time
Pick the highest-priority story from the backlog. Do not start a second story until the first is done (committed, clean, all tests green).
## Phase 1: RED — Write Failing Acceptance Test
Translate the story's acceptance criteria into executable tests.
**For each acceptance criterion:**
```
Given [context]
When [action]
Then [observable outcome]
```
Becomes a test case.
### Go ATDD Example
**Story:** "Users can register with email and password"
**Acceptance criteria:**
- Given valid email and password, registration succeeds and returns the new user
- Given an email already in use, registration returns ErrDuplicateEmail
- Given an invalid email format, registration returns ErrInvalidEmail
- Given a password shorter than 8 chars, registration returns ErrWeakPassword
**Acceptance tests (write these BEFORE implementation):**
```go
func TestUserRegistration_Acceptance(t *testing.T) {
tests := []struct {
name string
email string
pass string
wantErr error
}{
{
name: "valid registration creates user",
email: "alice@example.com",
pass: "securePass1!",
wantErr: nil,
},
{
name: "duplicate email returns ErrDuplicateEmail",
email: "existing@example.com",
pass: "securePass1!",
wantErr: ErrDuplicateEmail,
},
{
name: "invalid email format returns ErrInvalidEmail",
email: "not-an-email",
pass: "securePass1!",
wantErr: ErrInvalidEmail,
},
{
name: "short password returns ErrWeakPassword",
email: "alice@example.com",
pass: "short",
wantErr: ErrWeakPassword,
},
}
store := NewInMemoryUserStore()
// Pre-seed duplicate email
store.Seed(User{Email: "existing@example.com"})
svc := NewUserService(store)
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
_, err := svc.Register(context.Background(), tt.email, tt.pass)
if tt.wantErr != nil {
assert.ErrorIs(t, err, tt.wantErr)
} else {
assert.NoError(t, err)
}
})
}
}
```
**Run and verify they fail:**
```bash
go test -run TestUserRegistration_Acceptance ./...
# FAIL: UserService.Register undefined
```
The test must fail because the implementation doesn't exist yet. If it passes, you're testing existing behavior. Check your test.
**STOP here. Confirm before moving to GREEN.**
## Phase 2: GREEN — Minimal Implementation
Implement the absolute minimum code necessary to make the acceptance tests pass. Ugly code is acceptable at this stage. The goal is green, not clean.
```go
// Minimal implementation — prioritize passing tests over elegance
func (s *UserService) Register(ctx context.Context, email, password string) (User, error) {
if !strings.Contains(email, "@") {
return User{}, ErrInvalidEmail
}
if len(password) < 8 {
return User{}, ErrWeakPassword
}
if s.store.Exists(ctx, email) {
return User{}, ErrDuplicateEmail
}
u := User{ID: newID(), Email: email}
if err := s.store.Save(ctx, u); err != nil {
return User{}, fmt.Errorf("save user: %w", err)
}
return u, nil
}
```
**Verify GREEN:**
```bash
go test -run TestUserRegistration_Acceptance ./...
# ok
go test ./... # All other tests still pass
go test -race ./...
```
**STOP here. Confirm before moving to REFACTOR.**
## Phase 3: REFACTOR — Clean Up
With tests green, clean the code. Do not change behavior.
**During REFACTOR, load and follow the `clean-code` skill.**
Apply:
- Extract functions for complex logic
- Better naming
- Remove duplication
- Apply SOLID principles
- For design decisions, load `solid` skill
After every change: `go test ./...` must stay green.
**Do NOT refactor the acceptance tests themselves** unless they have a bug. The tests are your specification — changing them changes what you've committed to.
**STOP here. Confirm before committing.**
## Phase 4: COMMIT
Commit the completed story. Reference the story in the commit message.
```bash
git commit -m "feat: user registration with email validation
Implements acceptance criteria for US-001 (User Registration).
- Valid email/password creates a user
- Duplicate email returns ErrDuplicateEmail
- Invalid email format returns ErrInvalidEmail
- Short password returns ErrWeakPassword"
```
## For Test Design Questions
When writing acceptance tests, load the `test-design` skill to ensure:
- Tests are understandable (Farley: Understandable)
- Tests don't couple to implementation details (Farley: Maintainable)
- Tests are repeatable across environments (Farley: Repeatable)
## For Code Review Before Commit
Load the `code-review` skill before committing to check:
- Error handling follows `fmt.Errorf("op: %w", err)` pattern
- Context propagation is correct
- No race conditions in concurrent code
- Exported API is minimal
## For Refactoring Guidance
Load the `refactoring` skill for systematic techniques to apply during REFACTOR phase.
## For Clean Code Principles
Load the `clean-code` skill during REFACTOR phase to apply:
- Naming conventions
- Function size and responsibility
- SOLID principles
## Go-Specific ATDD Notes
### Acceptance Test Location
Place acceptance tests in `_test.go` files with the `_acceptance_test.go` suffix or `//go:build acceptance` build tag to separate from fast unit tests:
```go
//go:build acceptance
package service_test
```
```bash
go test ./... # Unit tests only (fast)
go test -tags=acceptance ./... # Include acceptance tests
```
### In-Memory Implementations for Acceptance Tests
Use in-memory implementations of repositories for acceptance tests:
```go
type InMemoryUserStore struct {
mu sync.Mutex
users map[string]User
}
func (s *InMemoryUserStore) Save(ctx context.Context, u User) error {
s.mu.Lock()
defer s.mu.Unlock()
if _, exists := s.users[u.Email]; exists {
return ErrDuplicateEmail
}
s.users[u.Email] = u
return nil
}
func (s *InMemoryUserStore) Exists(ctx context.Context, email string) bool {
s.mu.Lock()
defer s.mu.Unlock()
_, exists := s.users[email]
return exists
}
```
This keeps acceptance tests fast and deterministic.
### Given-When-Then in Go
The `t.Run` structure maps naturally to Given-When-Then:
```go
t.Run("given valid credentials, when logging in, then returns token", func(t *testing.T) {
// Given: valid user exists
store := InMemoryUserStore{...}
// When: login is called
token, err := svc.Login(ctx, email, pass)
// Then: token is returned
require.NoError(t, err)
assert.NotEmpty(t, token)
})
```
## Verification Checklist
Before committing:
- [ ] All acceptance tests pass
- [ ] All existing tests still pass
- [ ] Race detector clean: `go test -race ./...`
- [ ] REFACTOR phase complete — code is clean
- [ ] Commit message references the user story
- [ ] No behavior added beyond the acceptance criteria
## Cross-References
- Requires: `user-stories` skill output (stories with acceptance criteria)
- During REFACTOR: load `clean-code` skill
- For test quality: load `test-design` skill
- For code review: load `code-review` skill
- For refactoring guidance: load `refactoring` skill
- Foundation: load `tdd` skill for core TDD methodology

282
clean-code/SKILL.md Normal file
View File

@@ -0,0 +1,282 @@
---
name: clean-code
description: Write code that humans can read, understand, and safely change. Apply SOLID, GRASP, and Clean Code principles during the REFACTOR phase of TDD.
---
# Clean Code
## Core Philosophy
Write code for humans first, computers second. Every piece of code you write will be read many more times than it is written. Optimize for clarity and simplicity above all else.
Code has three consumers:
1. **Users** — get their needs met
2. **Customers** — make or save money
3. **Developers** — must maintain it
Developers read code 10x more than they write it. Design for them.
## When to Apply This Skill
- During the **REFACTOR phase** of TDD (after the test is green)
- When reviewing code (load `code-review` skill too)
- When designing new types or packages
**Do not apply during the GREEN phase.** Ugly code that passes is correct. Clean it after.
## Naming
Naming is the most important thing in code. In priority order:
1. **Consistency** — Same concept = same name everywhere. One name per concept.
2. **Understandability** — Domain language, not technical jargon.
3. **Specificity** — Precise, not vague. Avoid `data`, `info`, `manager`, `handler`, `processor`, `utils`.
4. **Brevity** — Short but not cryptic.
5. **Searchability** — Unique, greppable names.
6. **Pronounceability** — You should be able to say it in conversation.
```go
// Bad: inconsistent, vague, cryptic
func getUsrById(id string) {}
func fetchCustomerByID(id string) {}
func retrieveClientById(id string) {}
// Good: consistent, specific
func GetUser(ctx context.Context, id UserID) (User, error) {}
func GetOrder(ctx context.Context, id OrderID) (Order, error) {}
```
## Functions/Methods
- Do one thing, do it well, do it only
- Operate at a single level of abstraction
- Prefer fewer parameters — zero is ideal, one or two common, three should be rare
- Avoid side effects — if a function has a side effect, name it accordingly
- Prefer pure functions where possible
```go
// Bad: does too much, mixed abstraction levels
func processOrder(o *Order, db *sql.DB, mailer Mailer) error {
if o.Items == nil || len(o.Items) == 0 {
return errors.New("empty order")
}
total := 0.0
for _, item := range o.Items {
total += item.Price * float64(item.Qty)
}
if _, err := db.Exec("INSERT INTO orders ...", o.ID, total); err != nil {
return fmt.Errorf("save order: %w", err)
}
if err := mailer.Send(o.Customer.Email, "confirmed"); err != nil {
return fmt.Errorf("send confirmation: %w", err)
}
return nil
}
// Good: single level of abstraction, composed from focused functions
func ProcessOrder(ctx context.Context, o *Order, repo OrderRepository, mailer Mailer) error {
if err := validateOrder(o); err != nil {
return fmt.Errorf("validate: %w", err)
}
total := calculateTotal(o.Items)
if err := repo.Save(ctx, o, total); err != nil {
return fmt.Errorf("save: %w", err)
}
if err := mailer.SendConfirmation(ctx, o.Customer.Email); err != nil {
return fmt.Errorf("notify: %w", err)
}
return nil
}
```
## Error Handling (Go-Specific)
In Go, error returns are a first-class feature, not a smell. Follow these rules:
```go
// Always wrap with context
if err != nil {
return fmt.Errorf("parse config: %w", err)
}
// Never: naked return without context
if err != nil {
return err
}
// Never: log and return (pick one)
if err != nil {
log.Error("failed", "err", err)
return err // caller gets logged error AND returns it — double-counted
}
// Never: swallow errors silently
_ = doSomething()
```
Error wrapping builds a call-chain narrative: `"http handler: parse request: validate email: invalid format"`. This is valuable, not verbose.
## Comments
**The best comment is code that doesn't need one.** Before writing a comment, ask whether you could make the code itself clearer.
When comments are necessary, explain **why**, not what. The code shows what happens; comments explain decisions and intent.
```go
// Bad: explains what (redundant)
// increment counter
i++
// Good: explains why
// Offset by 1 because the upstream API uses 1-based page numbering
page := requestedPage + 1
```
**Go adaptation:** `godoc` comments on exported identifiers ARE documentation, not a failure. The "comments are failure" rule applies to inline comments explaining bad code. Exported types and functions must have godoc comments.
```go
// UserService handles user lifecycle operations.
// It is safe for concurrent use.
type UserService struct { ... }
// GetByEmail returns the user with the given email address.
// Returns ErrNotFound if no user exists with that email.
func (s *UserService) GetByEmail(ctx context.Context, email Email) (User, error) { ... }
```
## SOLID Principles (Brief)
For full SOLID guidance with Go examples, load the `solid` skill.
| Principle | Quick test |
|-----------|------------|
| SRP | "Does this type have ONE reason to change?" |
| OCP | "Can I add behavior by implementing an interface, not editing this code?" |
| LSP | "Can callers substitute any implementation without knowing the difference?" |
| ISP | "Is this interface the smallest it can be?" (`io.Reader` is a model) |
| DIP | "Do I pass interfaces as parameters, not concrete types?" |
## GRASP Principles
**Information Expert:** Assign responsibility to the type that has the data needed to fulfil it.
**Low Coupling:** Minimize dependencies between packages. When one changes, few others should too.
**High Cohesion:** Keep related functionality together. Everything in a package should relate to its central purpose.
**Controller:** One type handles system events for a use case.
**Tell, Don't Ask:** Tell objects what to do rather than asking for their data and acting on it.
## Go-Specific Clean Code Rules
### Prefer interfaces over concrete types in function signatures
```go
// Bad: tied to a concrete implementation
func NewHandler(db *postgres.DB) *Handler { ... }
// Good: depends on the behavior needed
func NewHandler(store UserStore) *Handler { ... }
```
### Small, focused interfaces
```go
// The io.Reader/io.Writer pattern is the Go ideal
type Reader interface {
Read(p []byte) (n int, err error)
}
// If you need both, compose at the call site
type ReadWriter interface {
Reader
Writer
}
```
### Embedding over inheritance
```go
// Use embedding for composition, not to inherit behavior you'll override
type LoggedHandler struct {
Handler
logger *slog.Logger
}
```
### Constructor injection for dependencies
```go
type Service struct {
repo Repository
mailer Mailer
logger *slog.Logger
}
func NewService(repo Repository, mailer Mailer, logger *slog.Logger) *Service {
return &Service{repo: repo, mailer: mailer, logger: logger}
}
```
### Avoid package-level state
Package-level variables are global state. They make tests unreliable and code hard to reason about.
```go
// Bad
var db *sql.DB
// Good: pass as dependency
type Handler struct {
db *sql.DB
}
```
## Code Smell Detection (Quick Reference)
During the REFACTOR phase, look for these smells and address them:
| Smell | Go Signal | Fix |
|-------|-----------|-----|
| Long function | > ~30 lines (Go can go longer, but be honest) | Extract function |
| Too many parameters | > 3-4 parameters | Use a config struct or option pattern |
| Divergent change | One type changes for multiple unrelated reasons | Split into focused types |
| Feature envy | Method uses another type's data more than its own | Move method to that type |
| Magic numbers | Bare literals | Named constants or typed config |
| Deep nesting | > 3 levels of indentation | Early returns, extract helpers |
For comprehensive smell detection, load the `code-review` skill.
## Structural Principles
**Separation of concerns:** Business logic should know nothing of HTTP, SQL, or filesystems.
**Modularity:** Each package understandable in isolation.
**Command-Query Separation:** Functions either do something (command) or return something (query), not both.
**Principle of Least Surprise:** Code should behave the way readers expect.
**Boy Scout Rule:** Leave the code better than you found it.
## Pre-Refactor Checklist
Before cleaning up code:
- [ ] All tests pass
- [ ] Race detector clean: `go test -race ./...`
- [ ] I understand what the code does
- [ ] I have a clear target design in mind
During refactoring:
- [ ] Tests stay green after each change
- [ ] Each change is small and focused
- [ ] Commit after each meaningful improvement
## References
- `references/clean-code.md` — detailed naming rules, object calisthenics
- `references/code-smells.md` — comprehensive smell catalog with examples
- Load `solid` skill for SOLID principles with Go examples
- Load `refactoring` skill for systematic refactoring techniques

View File

@@ -0,0 +1,376 @@
# Clean Code Practices
## What is Clean Code?
Code that is:
- **Easy to understand** - reveals intent clearly
- **Easy to change** - modifications are localized
- **Easy to test** - dependencies are injectable
- **Simple** - no unnecessary complexity
## The Human-Centered Approach
Code has THREE consumers:
1. **Users** - get their needs met
2. **Customers** - make or save money
3. **Developers** - must maintain it
Design for all three, but remember: **developers read code 10x more than they write it.**
## Naming Principles
### 1. Consistency & Uniqueness (HIGHEST PRIORITY)
Same concept = same name everywhere. One name per concept.
```typescript
// BAD: Inconsistent names for same concept
getUserById(id)
fetchCustomerById(id)
retrieveClientById(id)
// GOOD: Consistent
getUser(id)
getOrder(id)
getProduct(id)
```
### 2. Understandability
Use domain language, not technical jargon.
```typescript
// BAD: Technical
const arr = users.filter(u => u.isActive);
// GOOD: Domain language
const activeCustomers = users.filter(user => user.isActive);
```
### 3. Specificity
Avoid vague names: `data`, `info`, `manager`, `handler`, `processor`, `utils`
```typescript
// BAD: Vague
class DataManager { }
function processInfo(data) { }
// GOOD: Specific
class OrderRepository { }
function validatePayment(payment) { }
```
### 4. Brevity (but not at cost of clarity)
Short names are good only if meaning is preserved.
```typescript
// BAD: Too cryptic
const usrLst = getUsrs();
// BAD: Unnecessarily long
const listOfAllActiveUsersInTheSystem = getActiveUsers();
// GOOD: Brief but clear
const activeUsers = getActiveUsers();
```
### 5. Searchability
Names should be unique enough to grep/search.
```typescript
// BAD: Common word, hard to search
const data = fetch();
// GOOD: Unique, searchable
const orderSummary = fetchOrderSummary();
```
### 6. Pronounceability
You should be able to say it in conversation.
```typescript
// BAD
const genymdhms = generateYearMonthDayHourMinuteSecond();
// GOOD
const timestamp = generateTimestamp();
```
### 7. Austerity
Avoid unnecessary filler words.
```typescript
// BAD: Redundant
const userData = user; // 'Data' adds nothing
class UserClass { } // 'Class' adds nothing
// GOOD
const user = user;
class User { }
```
---
## Object Calisthenics (9 Rules)
Exercises to improve OO design. Follow strictly during practice, relax slightly in production.
### 1. One Level of Indentation per Method
```typescript
// BAD: Multiple levels
function process(orders: Order[]) {
for (const order of orders) {
if (order.isValid()) {
for (const item of order.items) {
if (item.inStock) {
// process...
}
}
}
}
}
// GOOD: Extract methods
function process(orders: Order[]) {
orders.filter(o => o.isValid()).forEach(processOrder);
}
function processOrder(order: Order) {
order.items.filter(i => i.inStock).forEach(processItem);
}
```
### 2. Don't Use the ELSE Keyword
Use early returns, guard clauses, or polymorphism.
```typescript
// BAD: else
function getDiscount(user: User): number {
if (user.isPremium) {
return 20;
} else {
return 0;
}
}
// GOOD: Early return
function getDiscount(user: User): number {
if (user.isPremium) return 20;
return 0;
}
```
### 3. Wrap All Primitives and Strings
Primitives should be wrapped in domain objects when they have meaning.
```typescript
// BAD: Primitive obsession
function createUser(email: string, age: number) { }
// GOOD: Value objects
class Email {
constructor(private value: string) {
if (!this.isValid(value)) throw new InvalidEmail();
}
private isValid(email: string): boolean { ... }
}
class Age {
constructor(private value: number) {
if (value < 0 || value > 150) throw new InvalidAge();
}
}
function createUser(email: Email, age: Age) { }
```
### 4. First-Class Collections
Any class with a collection should have no other instance variables.
```typescript
// BAD: Collection mixed with other state
class Order {
items: OrderItem[] = [];
customerId: string;
total: number;
}
// GOOD: Collection is its own class
class OrderItems {
constructor(private items: OrderItem[] = []) {}
add(item: OrderItem): void { ... }
total(): Money { ... }
isEmpty(): boolean { ... }
}
class Order {
constructor(
private items: OrderItems,
private customerId: CustomerId
) {}
}
```
### 5. One Dot per Line (Law of Demeter)
Don't chain through object graphs.
```typescript
// BAD: Train wreck
const city = order.customer.address.city;
// GOOD: Tell, don't ask
const city = order.getShippingCity();
```
### 6. Don't Abbreviate
If a name is too long to type, the class is doing too much.
```typescript
// BAD
const custRepo = new CustRepo();
const ord = new Ord();
// GOOD
const customerRepository = new CustomerRepository();
const order = new Order();
```
### 7. Keep All Entities Small
- Classes: < 50 lines
- Methods: < 10 lines
- Files: < 100 lines
If larger, it's probably doing too much. Split it.
### 8. No Classes with More Than Two Instance Variables
Forces small, focused classes.
```typescript
// BAD: Too many variables
class Order {
id: string;
customerId: string;
items: Item[];
total: number;
status: string;
}
// GOOD: Composed of smaller objects
class Order {
constructor(
private id: OrderId,
private details: OrderDetails
) {}
}
class OrderDetails {
constructor(
private customer: Customer,
private lineItems: LineItems
) {}
}
```
### 9. No Getters/Setters/Properties
Objects should have behavior, not just data. Tell objects what to do.
```typescript
// BAD: Data bag with getters
class Account {
getBalance(): number { return this.balance; }
setBalance(value: number) { this.balance = value; }
}
// Caller does the work
if (account.getBalance() >= amount) {
account.setBalance(account.getBalance() - amount);
}
// GOOD: Behavior-rich object
class Account {
withdraw(amount: Money): WithdrawResult {
if (!this.canWithdraw(amount)) {
return WithdrawResult.insufficientFunds();
}
this.balance = this.balance.subtract(amount);
return WithdrawResult.success();
}
}
// Caller tells, object decides
const result = account.withdraw(amount);
```
---
## Comments
### When to Write Comments
**Only write comments to explain WHY, not WHAT or HOW.**
Code explains what and how. Comments explain business reasons, non-obvious decisions, or warnings.
```typescript
// BAD: Explains what (redundant)
// Add 1 to counter
counter++;
// GOOD: Explains why
// Compensate for 0-based indexing in legacy API
counter++;
```
### Prefer Self-Documenting Code
Instead of commenting, rename to make intent clear.
```typescript
// BAD: Comment needed
// Check if user can access premium features
if (user.subscriptionLevel >= 2 && !user.isBanned) { }
// GOOD: Self-documenting
if (user.canAccessPremiumFeatures()) { }
```
---
## Formatting
### Vertical Spacing
- Related code together
- Blank lines between concepts
- Most important/public at top
### Horizontal Spacing
- Consistent indentation
- Space around operators
- Max line length ~80-120 characters
### Storytelling
Code should read top-to-bottom like a story. High-level at top, details below.
```typescript
class OrderProcessor {
// Public API first
process(order: Order): ProcessResult {
this.validate(order);
this.calculateTotals(order);
return this.save(order);
}
// Supporting methods below, in order of appearance
private validate(order: Order): void { ... }
private calculateTotals(order: Order): void { ... }
private save(order: Order): ProcessResult { ... }
}
```

View File

@@ -0,0 +1,334 @@
# Code Smells & Anti-Patterns
## What Are Code Smells?
Indicators that something MAY be wrong. Not bugs, but design problems that make code hard to understand, change, or test.
## The Five Categories
### 1. Bloaters
Code that has grown too large.
| Smell | Symptom | Refactoring |
|-------|---------|-------------|
| **Long Method** | > 10 lines | Extract Method |
| **Large Class** | > 50 lines, multiple responsibilities | Extract Class |
| **Long Parameter List** | > 3 parameters | Introduce Parameter Object |
| **Data Clumps** | Same group of variables appear together | Extract Class |
| **Primitive Obsession** | Primitives instead of small objects | Wrap in Value Object |
### 2. Object-Orientation Abusers
Misuse of OO principles.
| Smell | Symptom | Refactoring |
|-------|---------|-------------|
| **Switch Statements** | Type checking, large switch/if-else | Replace with Polymorphism |
| **Parallel Inheritance** | Adding subclass requires adding another | Merge Hierarchies |
| **Refused Bequest** | Subclass doesn't use parent methods | Replace Inheritance with Delegation |
| **Alternative Classes** | Different interfaces, same concept | Rename, Extract Superclass |
### 3. Change Preventers
Code that makes changes difficult.
| Smell | Symptom | Refactoring |
|-------|---------|-------------|
| **Divergent Change** | One class changed for many reasons | Extract Class (SRP) |
| **Shotgun Surgery** | One change touches many classes | Move Method/Field together |
| **Parallel Inheritance** | (see above) | Merge Hierarchies |
### 4. Dispensables
Code that can be removed.
| Smell | Symptom | Refactoring |
|-------|---------|-------------|
| **Comments** | Explaining bad code | Rename, Extract Method |
| **Duplicate Code** | Copy-paste | Extract Method, Pull Up Method |
| **Dead Code** | Unreachable code | Delete |
| **Speculative Generality** | "Just in case" code | Delete (YAGNI) |
| **Lazy Class** | Class that does almost nothing | Inline Class |
### 5. Couplers
Excessive coupling between classes.
| Smell | Symptom | Refactoring |
|-------|---------|-------------|
| **Feature Envy** | Method uses another class's data extensively | Move Method |
| **Inappropriate Intimacy** | Classes know too much about each other | Move Method, Extract Class |
| **Message Chains** | `a.getB().getC().getD()` | Hide Delegate |
| **Middle Man** | Class only delegates | Inline Class |
---
## The Seven Most Common Code Smells
### 1. Long Method
**Symptom:** Method > 10 lines, doing multiple things.
```typescript
// SMELL
function processOrder(order: Order) {
// Validate
if (!order.items.length) throw new Error('Empty');
if (!order.customer) throw new Error('No customer');
// Calculate
let total = 0;
for (const item of order.items) {
total += item.price * item.quantity;
if (item.discount) {
total -= item.discount;
}
}
// Apply tax
const taxRate = getTaxRate(order.customer.state);
total = total * (1 + taxRate);
// Save
db.orders.insert({ ...order, total });
// Notify
emailService.send(order.customer.email, 'Order confirmed');
}
// REFACTORED
function processOrder(order: Order) {
validateOrder(order);
const total = calculateTotal(order);
saveOrder(order, total);
notifyCustomer(order);
}
```
### 2. Large Class
**Symptom:** Class with many responsibilities, > 50 lines.
```typescript
// SMELL: God class
class User {
// User data
name: string;
email: string;
// Authentication
login() { }
logout() { }
resetPassword() { }
// Preferences
setTheme() { }
setLanguage() { }
// Notifications
sendEmail() { }
sendSMS() { }
// Billing
charge() { }
refund() { }
}
// REFACTORED: Separate classes
class User { name: string; email: string; }
class AuthService { login(); logout(); resetPassword(); }
class UserPreferences { setTheme(); setLanguage(); }
class NotificationService { sendEmail(); sendSMS(); }
class BillingService { charge(); refund(); }
```
### 3. Feature Envy
**Symptom:** Method uses another class's data more than its own.
```typescript
// SMELL: Order envies Customer
class Order {
calculateShipping(customer: Customer): number {
if (customer.country === 'US') {
if (customer.state === 'CA') return 10;
return 15;
}
return 25;
}
}
// REFACTORED: Move to Customer
class Customer {
getShippingCost(): number {
if (this.country === 'US') {
if (this.state === 'CA') return 10;
return 15;
}
return 25;
}
}
class Order {
calculateShipping(): number {
return this.customer.getShippingCost();
}
}
```
### 4. Primitive Obsession
**Symptom:** Using primitives for domain concepts.
```typescript
// SMELL
function createUser(email: string, age: number, zipCode: string) {
// No validation, easy to pass wrong values
if (!email.includes('@')) throw new Error();
if (age < 0) throw new Error();
}
// REFACTORED: Value objects
class Email {
constructor(private value: string) {
if (!value.includes('@')) throw new InvalidEmail();
}
}
class Age {
constructor(private value: number) {
if (value < 0 || value > 150) throw new InvalidAge();
}
}
function createUser(email: Email, age: Age, address: Address) {
// Type system prevents invalid data
}
```
### 5. Switch Statements
**Symptom:** Switching on type, repeated across codebase.
```typescript
// SMELL
function getArea(shape: Shape): number {
switch (shape.type) {
case 'circle': return Math.PI * shape.radius ** 2;
case 'rectangle': return shape.width * shape.height;
case 'triangle': return 0.5 * shape.base * shape.height;
}
}
function getPerimeter(shape: Shape): number {
switch (shape.type) { // Same switch again!
case 'circle': return 2 * Math.PI * shape.radius;
// ...
}
}
// REFACTORED: Polymorphism
interface Shape {
getArea(): number;
getPerimeter(): number;
}
class Circle implements Shape {
constructor(private radius: number) {}
getArea(): number { return Math.PI * this.radius ** 2; }
getPerimeter(): number { return 2 * Math.PI * this.radius; }
}
```
### 6. Inappropriate Intimacy
**Symptom:** Classes know too much about each other's internals.
```typescript
// SMELL
class Order {
process() {
const inventory = new Inventory();
// Reaching into inventory's internals
for (const item of this.items) {
const stock = inventory.stockLevels[item.sku];
if (stock.quantity < item.quantity) {
throw new Error('Out of stock');
}
inventory.stockLevels[item.sku].quantity -= item.quantity;
}
}
}
// REFACTORED: Tell, don't ask
class Inventory {
reserve(items: OrderItem[]): ReserveResult {
// Inventory manages its own state
for (const item of items) {
if (!this.canReserve(item)) {
return ReserveResult.outOfStock(item);
}
}
this.deductStock(items);
return ReserveResult.success();
}
}
class Order {
process(inventory: Inventory) {
const result = inventory.reserve(this.items);
if (!result.isSuccess()) {
throw new OutOfStockError(result.failedItem);
}
}
}
```
### 7. Speculative Generality
**Symptom:** "Just in case" abstractions that aren't used.
```typescript
// SMELL: Over-engineered for hypothetical needs
interface PaymentProcessor {
process(): void;
rollback(): void;
audit(): void;
generateReport(): void;
scheduleRecurring(): void;
}
class StripeProcessor implements PaymentProcessor {
process() { /* actual code */ }
rollback() { throw new Error('Not implemented'); }
audit() { throw new Error('Not implemented'); }
generateReport() { throw new Error('Not implemented'); }
scheduleRecurring() { throw new Error('Not implemented'); }
}
// REFACTORED: YAGNI
interface PaymentProcessor {
process(): void;
}
class StripeProcessor implements PaymentProcessor {
process() { /* actual code */ }
}
// Add other methods when actually needed
```
---
## Prevention Strategies
1. **Follow Object Calisthenics** - Rules prevent most smells
2. **Practice TDD** - Tests reveal design problems early
3. **Review in pairs** - Fresh eyes catch smells
4. **Refactor continuously** - Don't let smells accumulate
5. **Apply SOLID** - Prevents structural smells
6. **Use static analysis** - Tools catch common issues
---
## When You Find a Smell
1. **Confirm it's a problem** - Not all smells need fixing
2. **Ensure test coverage** - Before refactoring
3. **Refactor in small steps** - Keep tests passing
4. **Commit frequently** - Easy to revert if needed

284
code-review/SKILL.md Normal file
View File

@@ -0,0 +1,284 @@
---
name: code-review
description: Systematic code smell detection and Go-specific review checklist. Load before any pre-merge review.
---
# Code Review
## Overview
Code review is smell detection and principle verification, not style policing. The goal is to identify patterns that make code hard to understand, change, test, or safely operate.
**This skill is read-only analysis.** When you find issues, document them with specific file:line references and prioritize by architectural impact. For fixes, load the `refactoring` or `clean-code` skill.
## When to Use
- Pre-merge review of any pull request
- Before committing a significant change
- When asked to "review this code"
- After the GREEN phase of TDD (use with `clean-code` skill)
## Review Process
### Phase 0: Iron Laws (block on any violation)
Before reading the diff, scan for these. Any violation is a blocking finding regardless of how clean the rest of the change looks.
1. **No security vulnerabilities** — command injection, SQL injection, credential exposure, path traversal, unchecked input at system boundaries
2. **No silently swallowed errors**`err != nil` without wrapping, logging, or surfacing is always wrong; `_ = f()` on a fallible call is wrong
3. **No missing validation at system boundaries** — user input, external API responses, and file reads must be validated before use
If any iron law is violated, flag it as **Blocking** before any other phase. The other phases still run, but the iron-law findings come first in the report. See Phase 3 for detailed patterns and code examples covering iron laws 2 and 3.
### Phase 1: Context
Before examining code:
1. Understand what the change is trying to accomplish
2. Identify the affected packages and their purpose
3. Check test coverage — is the behavior tested?
4. Read the commit message and PR description
### Phase 2: Architectural Analysis (Highest Priority)
Check for structural problems first. These are expensive to fix later.
**Dependency violations:**
- Does domain logic import infrastructure packages?
- Are there circular imports between packages?
- Does a type do things that multiple different stakeholders would request changes to?
**Abstraction problems:**
- Is there a large switch/if-else on a type string that should be an interface?
- Are there God types that know too much about too many things?
- Is there Shotgun Surgery — one logical change requires editing many files?
### Phase 3: Go-Specific Checklist
#### Error Handling
```go
// Required: wrap with context
return fmt.Errorf("parse invoice: %w", err)
// Forbidden: naked return
return err
// Forbidden: log and return (pick one)
log.Error("failed", "err", err)
return err
// Forbidden: silently ignore
_ = doSomething()
// Suspicious: sentinel errors without documentation
var ErrFoo = errors.New("foo") // Is this exported? Is it documented?
```
#### Goroutine Safety
- Is shared state protected by a mutex or accessed only via channels?
- Are goroutines guaranteed to terminate (no goroutine leaks)?
- Is `sync.WaitGroup` used correctly (Add before go, Done via defer)?
- Is there a `go test -race` passing?
```go
// Check: is this safe for concurrent use?
type Cache struct {
mu sync.RWMutex // Good: explicit lock
items map[string]Item
}
// Red flag: map access without lock in concurrent context
type BadCache struct {
items map[string]Item // Unsafe if accessed from multiple goroutines
}
```
#### Exported API Minimalism
- Only export what callers outside the package actually need
- Internal implementation details should be unexported
- Does the exported API make sense as a standalone contract?
```go
// Good: minimal exported surface
type UserService struct { /* unexported fields */ }
func NewUserService(...) *UserService { ... }
func (s *UserService) GetUser(ctx context.Context, id UserID) (User, error) { ... }
// Bad: over-exported internals
type UserService struct {
Store UserStore // Exported field — callers can replace it, breaks encapsulation
Logger *slog.Logger // Exported — callers can mutate it
}
```
#### Dependency Safety
Before adding any new dependency:
- Has `govulncheck` been run? (`govulncheck ./...`)
- Is there a justification in the commit message for non-stdlib additions?
- Pre-approved without justification: `testify`, `slog`, `templ`, `sqlc`
- Everything else needs a reason
#### Generic/interface{} Usage
```go
// Bad: any/interface{} when a concrete type or generic works
func Process(data interface{}) error { ... }
// Good: typed
func Process[T Processor](data T) error { ... }
// Good: concrete type when only one type is expected
func ProcessOrder(o *Order) error { ... }
```
#### Context Propagation
- Every function that does I/O (DB, HTTP, file) should accept `ctx context.Context` as the first parameter
- Context should be passed down, not stored in structs
- `context.Background()` should only appear at the top-level (main, test setup)
```go
// Good
func (s *Store) GetUser(ctx context.Context, id UserID) (User, error) { ... }
// Bad: no context — can't cancel, can't trace, can't timeout
func (s *Store) GetUser(id UserID) (User, error) { ... }
// Bad: context stored in struct
type Handler struct {
ctx context.Context // Don't do this
}
```
#### Logging with slog
```go
// Good: structured, leveled
slog.InfoContext(ctx, "user created", "user_id", u.ID, "email", u.Email)
// Bad: unstructured
log.Printf("user created: %s", u.Email)
// Bad: logging errors that are also returned
slog.Error("failed to save", "err", err)
return fmt.Errorf("save user: %w", err) // caller is double-counting
```
### Phase 4: Code Smell Detection
Apply the catalog from `clean-code/references/code-smells.md`. Key smells to prioritize:
**Architectural impact (fix immediately):**
- Global Data — package-level variables modified at runtime
- Shotgun Surgery — one change touches many unrelated files
- Feature Envy — a method uses another package's internals more than its own
- God Object — a type that does everything
**Design impact (fix before merge if possible):**
- Long Method — functions > ~50 lines in Go (Go can be longer than OOP, but be honest)
- Long Parameter List — > 4-5 parameters; consider a config struct
- Duplicate Code — same logic copy-pasted (wait for Rule of Three)
- Message Chain — `a.B().C().D()` — violates Law of Demeter
- Primitive Obsession — using `string` for email, user ID, currency
**Readability (note in review, fix if cheap):**
- Magic Numbers — bare literals without named constants
- Uncommunicative Names — `data`, `info`, `x`, `tmp`
- Fallacious Comments — comments that don't match the code
### Phase 5: Test Quality
- Are new behaviors covered by tests?
- Are tests table-driven where appropriate?
- Do tests test behavior, not implementation?
- Are there mock-only tests that test nothing real?
For comprehensive test review, load the `test-design` skill.
## Severity Classification
| Severity | Description | Action |
|----------|-------------|--------|
| **Blocking** | Security issue, data corruption risk, broken behavior | Must fix before merge |
| **High** | Architectural violation, goroutine safety, missing context | Fix before merge |
| **Medium** | Design smell, missing tests for changed behavior | Fix in this PR or create issue |
| **Low** | Naming, style, minor duplication | Note in review, optional |
## Output Format
When documenting findings:
```markdown
## Code Review: [package/file]
### Blocking
- `internal/store/user.go:47`: Error swallowed with `_ =` — callers can't know if Save failed
### High
- `internal/handler/order.go:12`: Domain logic in HTTP handler — moves tax calculation out of service layer
- `cmd/server/main.go:89`: Missing context propagation — `GetUser` call has no timeout
### Medium
- `internal/service/invoice.go:34`: Long method (87 lines) — extract `calculateLineItems` and `applyTaxRules`
### Low
- `internal/domain/user.go:5`: Field `Data string` — name reveals nothing about what data
```
## Brain MCP Integration
The brain MCP holds prior review decisions across sessions. Use it to avoid re-litigating the same tradeoffs.
**At review start:**
- Run `brain_query` with the package or feature name + "review" to surface prior review findings, recurring smells, or architectural decisions for this area. This prevents flagging the same smell that was already accepted as a tradeoff.
**After review completes:**
- For findings the author accepts as a tradeoff (with rationale), run `brain_write` to record the decision. Future reviews will not re-flag it.
**Never:**
- Embed brain queries in the review output. They are context for you, not findings for the author.
### Logging
Call `session_log` once at the end of every phase to record the outcome.
Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which
treats `pass` as success, `fail` as failure, `skip` as neither.
**At end of each phase:**
- `session_log` with `{skill: "code-review", phase: "<phase-name>", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
**Phases for this skill:** phase-0-iron-laws, phase-1-context, phase-2-architectural-analysis, phase-3-go-checklist, phase-4-code-smell-detection, phase-5-test-quality
**Status semantics:**
- `pass` — the phase's intended outcome was reached.
- `fail` — the phase's intended outcome was NOT reached.
- `skip` — phase was skipped intentionally.
**Why this matters:** the routing pod (Plan 6) reads pass-rate to decide whether to route a future call to a local model. If your skill never logs, the routing pod sees no data.
## Common Go Anti-Patterns
| Anti-Pattern | What to Look For |
|-------------|-----------------|
| Goroutine leak | `go func()` without a done channel or WaitGroup |
| Error ignored | `_ = f()` or missing `if err != nil` after a fallible call |
| Nil map write | `m[k] = v` without `m = make(map[...]{})` initialization |
| Race on map | Concurrent map read/write without mutex |
| ctx in struct | `type T struct { ctx context.Context }` |
| Naked panic | `panic("something went wrong")` in non-initialization code |
| fmt.Println in library | Use `slog` or accept a logger; never print to stdout from lib code |
| init() abuse | `init()` with side effects that make testing hard |
## References
- `clean-code/references/code-smells.md` — comprehensive smell catalog
- Load `refactoring` skill when recommending fixes
- Load `test-design` skill for reviewing test quality
- Load `solid` skill for SOLID principle violations
## Mode 2 Routing Note
This skill is invoked identically in Mode 1 (cloud Claude) and Mode 2 (client-local with supervisor routing). The routing layer (Plan 6) does not exist yet; treat this skill as Mode 1 only for now. Findings format and severity classification are unchanged across modes.

346
cognitive-load/SKILL.md Normal file
View File

@@ -0,0 +1,346 @@
---
name: cognitive-load
description: Measure how much mental effort a codebase demands from developers. Use when identifying hotspots, before architectural decisions, or when code feels hard to understand.
---
# Cognitive Load Analysis
## Overview
Cognitive load is the mental effort required to understand, modify, and reason about code. High cognitive load leads to bugs, slow development, and burnout.
This skill provides a framework for measuring and reducing cognitive load using the **Cognitive Load Index (CLI)**, scored 01000.
## CLI Score Rating Scale
| CLI Score | Rating | Action |
|-----------|--------|--------|
| 0100 | Excellent | Model codebase |
| 101250 | Good | Minor improvements |
| 251400 | Moderate | Address hotspots |
| 401600 | Concerning | Plan refactoring |
| 601800 | Poor | Significant refactoring needed |
| 801999 | Severe | Consider rewrite of affected areas |
## The 8 Dimensions
### D1: Structural Complexity
Cyclomatic complexity — the number of independent paths through code. Each `if`, `for`, `switch`, `case`, `&&`, `||` adds one.
**Thresholds for Go:**
- Simple function: < 5 (no issue)
- Moderate: 510 (watch closely)
- Complex: 1015 (consider refactoring)
- Critical: > 15 (must refactor)
**Detection:**
```bash
# Install: go install github.com/fzipp/gocyclo/cmd/gocyclo@latest
gocyclo -over 10 ./...
```
**Go hotspots:** Request handlers that mix auth, validation, business logic, and response formatting in one function.
### D2: Nesting Depth
Maximum nesting depth — how many levels deep a reader must track.
**Thresholds for Go:**
- Ideal: 12 levels
- Acceptable: 3 levels
- Warning: 4 levels
- Critical: 5+ levels
**Fix:** Early returns and guard clauses eliminate deep nesting:
```go
// Bad: 4 levels deep
func process(r *http.Request) error {
if r != nil {
if r.Body != nil {
data, err := io.ReadAll(r.Body)
if err == nil {
// actual work at level 4
}
}
}
return nil
}
// Good: guard clauses, 1 level
func process(r *http.Request) error {
if r == nil {
return errors.New("nil request")
}
if r.Body == nil {
return errors.New("no request body")
}
data, err := io.ReadAll(r.Body)
if err != nil {
return fmt.Errorf("read body: %w", err)
}
// actual work at level 1
return nil
}
```
### D3: Volume/Size
Lines of code per file and function. Size alone isn't the issue — hidden responsibilities are.
**Thresholds for Go:**
- Function: > 50 lines is a signal (Go can go longer, but be honest about why)
- File: > 400 lines often means too many responsibilities
- Package: > 2000 lines total may need splitting
**Go note:** Table-driven tests inflate LOC legitimately. Distinguish test LOC from production LOC when assessing.
### D4: Naming Quality
Identifiers that don't reveal intent force readers to build a mental model from context.
**Poor naming signals:**
- Single-letter variables outside loop indices: `x`, `d`, `tmp`
- Generic suffixes: `Manager`, `Handler`, `Processor`, `Data`, `Info`, `Util`
- Abbreviations: `usrLst`, `custRepo`, `cfg2`
- Inconsistency: `getUserById`, `FetchCustomerByID`, `retrieveClientById` for the same concept
**Good naming in Go:**
```go
// Bad
func (m *Manager) proc(d interface{}) error { ... }
// Good
func (s *OrderService) PlaceOrder(ctx context.Context, req PlaceOrderRequest) (Order, error) { ... }
```
### D5: Coupling
How many packages/types a given unit depends on. High coupling means a change in one place ripples widely.
**Detection:**
```bash
# Count imports per file
grep -c "^import" *.go
# or examine import blocks manually
```
**Go-specific coupling smells:**
- A domain type that imports from `database/sql`, `net/http`, and `smtp` simultaneously
- A package that imports more than 10 other internal packages
- Circular imports (Go compiler rejects these, but near-circular is still a smell)
### D6: Cohesion
Whether things that belong together are together. Low cohesion means related code is scattered.
**Signs of low cohesion:**
- A function that calls many unrelated packages
- A package where functions don't share data or purpose
- Utility packages that grow without bound (`util`, `common`, `helpers`)
### D7: Duplication
Repeated logic that must be updated in multiple places when requirements change.
**Detection:**
```bash
# Simple: find identical error messages that suggest copy-paste
grep -r "error:" --include="*.go" | sort | uniq -d
```
**Go note:** Apply the Rule of Three — don't extract the first or second duplication. The third is the signal.
### D8: Navigability
How easily can a reader find relevant code and understand entry points?
**Signs of poor navigability:**
- No clear entry point (no `main.go`, no `New*` constructors)
- Package names that don't reflect content
- Files that mix many unrelated types
- Deep directory nesting with unclear boundaries
## Go-Specific Cognitive Load Hotspots
### Deeply Nested Error Handling
Go's explicit error handling is a feature, but it can create visual noise:
```go
// High cognitive load: multiple nested error paths
func (s *Service) Process(ctx context.Context, id string) error {
u, err := s.store.GetUser(ctx, id)
if err != nil {
if errors.Is(err, ErrNotFound) {
o, err := s.store.GetOrgByUserID(ctx, id)
if err != nil {
if errors.Is(err, ErrNotFound) {
return ErrEntityNotFound
}
return fmt.Errorf("get org: %w", err)
}
// ... more nesting
}
return fmt.Errorf("get user: %w", err)
}
// ... happy path
}
// Lower cognitive load: extract to focused functions
func (s *Service) Process(ctx context.Context, id string) error {
entity, err := s.resolveEntity(ctx, id)
if err != nil {
return fmt.Errorf("resolve entity: %w", err)
}
return s.processEntity(ctx, entity)
}
```
### Goroutine Soup
Goroutines without clear lifecycle management are high cognitive load:
```go
// High load: who owns these goroutines? When do they stop?
func startWorkers() {
go processQueue()
go cleanupExpired()
go sendNotifications()
}
// Lower load: explicit lifecycle with context and WaitGroup
func startWorkers(ctx context.Context) *WorkerPool {
pool := &WorkerPool{}
pool.wg.Add(3)
go pool.processQueue(ctx)
go pool.cleanupExpired(ctx)
go pool.sendNotifications(ctx)
return pool
}
func (p *WorkerPool) Shutdown() {
p.cancel()
p.wg.Wait()
}
```
### Over-Generalization with Generics
Generics add expressive power but also cognitive load. Use them sparingly:
```go
// High load: generic just because you can
func Transform[T any, U any](items []T, fn func(T) U) []U { ... }
// Lower load: concrete types when you have only one use case
func transformOrders(orders []Order) []OrderSummary { ... }
```
Extract generics only when you genuinely have multiple type applications.
### Init Functions
`init()` functions run invisibly and create hidden state:
```go
// High load: what order do these run? What state do they set?
func init() {
db = mustConnectDB()
cache = newCache()
}
// Lower load: explicit initialization in main or constructor
func main() {
db := mustConnectDB(cfg.DatabaseURL)
cache := newCache(cfg.CacheSize)
svc := NewService(db, cache)
// ...
}
```
### Long Interfaces
An interface with 10 methods forces implementors to understand all 10 methods before reasoning about any:
```go
// High cognitive load: 10 methods before you can implement
type UserRepository interface {
Create(ctx context.Context, u User) error
GetByID(ctx context.Context, id UserID) (User, error)
GetByEmail(ctx context.Context, email string) (User, error)
Update(ctx context.Context, u User) error
Delete(ctx context.Context, id UserID) error
List(ctx context.Context, filter Filter) ([]User, error)
Count(ctx context.Context, filter Filter) (int64, error)
Exists(ctx context.Context, id UserID) (bool, error)
Lock(ctx context.Context, id UserID) error
Unlock(ctx context.Context, id UserID) error
}
// Lower load: separate by use case
type UserReader interface {
GetByID(ctx context.Context, id UserID) (User, error)
GetByEmail(ctx context.Context, email string) (User, error)
}
type UserWriter interface {
Create(ctx context.Context, u User) error
Update(ctx context.Context, u User) error
Delete(ctx context.Context, id UserID) error
}
```
## When to Measure
- Before starting refactoring work: establish a baseline
- When a package/file is frequently the source of bugs
- When team members consistently report confusion about a subsystem
- Before a major feature addition that will touch complex code
## Hotspot Identification Process
1. **Size sweep:** Find files > 400 LOC and functions > 50 LOC
```bash
find . -name "*.go" ! -name "*_test.go" | xargs wc -l | sort -rn | head -20
```
2. **Complexity sweep:** Find high cyclomatic complexity
```bash
gocyclo -over 10 ./...
```
3. **Naming sweep:** Find packages with vague names
```bash
ls internal/ # Are there `util`, `common`, `helpers`?
```
4. **Coupling sweep:** Find files with many imports
```bash
grep -l "^import" **/*.go | xargs grep -c '"' | sort -t: -k2 -rn | head -10
```
5. **Duplication sweep:**
```bash
# Look for copied error strings, identical function signatures
grep -rn "TODO\|FIXME\|HACK" --include="*.go" # Technical debt markers
```
## Reduction Strategies
| High CLI Dimension | Primary Fix |
|-------------------|-------------|
| D1: Structural Complexity | Extract function, replace conditional with polymorphism |
| D2: Nesting Depth | Guard clauses, early returns |
| D3: Volume/Size | Extract type, split package |
| D4: Naming | Rename (most impactful, cheapest refactoring) |
| D5: Coupling | Introduce interface, apply DIP |
| D6: Cohesion | Extract type, reorganize package |
| D7: Duplication | Extract shared function (after Rule of Three) |
| D8: Navigability | Reorganize package structure, add package docs |
## Cross-References
- Load `refactoring` skill for systematic techniques to reduce identified hotspots
- Load `solid` skill for structural fixes (coupling, cohesion)
- Load `clean-code` skill for naming improvements

146
debug/SKILL.md Normal file
View File

@@ -0,0 +1,146 @@
---
name: debug
description: Systematic hypothesis-first debugging. Generate 3-5 ranked hypotheses with verification steps before suggesting any fix. Use when encountering a bug, test failure, or unexpected behavior.
---
# Debug
## Overview
Debugging is hypothesis generation, not fix generation. The first instinct — "try this and see if it works" — wastes time and teaches nothing. A disciplined debug session produces a small set of ranked, falsifiable hypotheses, each with a concrete verification command. The fix comes after one hypothesis is confirmed, not before.
**Core principle:** A hypothesis you cannot verify in one command is not a hypothesis — it is a guess.
## When to Use
- Any test failure whose cause is not immediately obvious
- Any production error or unexpected behavior
- Any bug report from a user or stakeholder
- Before reaching for a debugger or adding print statements: form hypotheses first
**Do not use for:**
- Compile errors with clear messages (just fix the typo)
- Already-diagnosed bugs where you know the cause (go straight to TDD with a regression test)
## Iron Laws
1. **Never suggest "try X and see what happens."** Every hypothesis must have a specific expected outcome if correct.
2. **Generate 3-5 hypotheses, ordered by likelihood (most likely first).** Fewer than 3 means you stopped thinking; more than 5 means you are not prioritizing.
3. **Diagnose only — do not fix in this skill.** The fix happens in a separate TDD cycle (load `tdd` skill) once a hypothesis is confirmed.
## Process
### Step 1: Read the failure
Before forming hypotheses:
- Read the full error message and stack trace, not just the headline
- Read the file where the failure originated, around the failing line
- If the failure is from a test, read the test and the code under test
- Identify the **failure mode** — what actually went wrong (e.g. "nil pointer dereference in goroutine spawned by handler") not just what the error says ("runtime error")
### Step 2: Generate hypotheses
For each hypothesis, capture three things:
- **Mechanism:** what specific code path or state would produce this exact failure
- **Verification:** the single command or file inspection that confirms or denies it
- **Expected outcome if correct:** the specific output you would see
Order by likelihood. The most likely cause is hypothesis 1.
### Step 3: Output
Use this format:
```
HYPOTHESIS 1 (likelihood: high): <mechanism in one sentence>
VERIFY: <exact command or file:line to inspect>
EXPECTED IF CORRECT: <specific output, value, or condition>
HYPOTHESIS 2 (likelihood: medium): <mechanism>
VERIFY: <exact command>
EXPECTED IF CORRECT: <specific output>
[... up to 5 ...]
RECOMMENDED NEXT STEP: Run VERIFY for hypothesis 1 first.
```
End with the recommendation, not a fix.
## Worked Example
**Failure:**
```
--- FAIL: TestInvoiceParser_HandlesEmptyPDF (0.00s)
parser_test.go:47: panic: runtime error: index out of range [0] with length 0
```
**Output:**
```
HYPOTHESIS 1 (likelihood: high): parser indexes into pages[0] before checking len(pages) > 0; empty PDFs produce a zero-page document
VERIFY: rg -n 'pages\[0\]' internal/parser/
EXPECTED IF CORRECT: at least one site reads pages[0] without a preceding length check
HYPOTHESIS 2 (likelihood: medium): the test fixture is a zero-byte file rather than a valid empty PDF; pdf library returns nil pages slice instead of empty
VERIFY: ls -la testdata/empty.pdf && file testdata/empty.pdf
EXPECTED IF CORRECT: file size 0 bytes or "data" rather than "PDF document"
HYPOTHESIS 3 (likelihood: low): a recently changed dependency reordered the page-extraction API; pages[0] now refers to metadata, not content
VERIFY: git log --oneline -10 -- go.sum | grep -i pdf
EXPECTED IF CORRECT: a pdf-library bump in the last few commits
RECOMMENDED NEXT STEP: Run VERIFY for hypothesis 1 first.
```
## Anti-Patterns
| Anti-Pattern | Why It Fails |
|---|---|
| "Maybe try restarting the service" | Not a mechanism. Not falsifiable. Teaches nothing if it works. |
| "Could be a race condition" | Mechanism without specifics. Which two operations race? On what state? |
| "Let me add some print statements and see" | Skips hypothesis generation. Generates noise, not understanding. |
| Single hypothesis presented as fact | If you are sure, write the regression test, do not run a debug skill. |
| Mixing hypotheses with fix suggestions | The skill is diagnose-only. The fix is a TDD task on its hypothesis. |
## Brain MCP Integration
The brain holds prior debug sessions across the project. Use it to skip rediscovering known failure modes.
**At debug start:**
- Run `brain_query` with the error message snippet + the package name. Past sessions may have logged identical or similar failures with their resolved hypotheses.
**After a hypothesis is confirmed:**
- Run `brain_write` with the failure signature → confirmed mechanism. Future debug sessions on the same area get the answer immediately.
**Never:**
- Run `brain_write` for an unconfirmed hypothesis. Speculation in the brain pollutes future queries.
### Logging
Call `session_log` once at the end of every phase to record the outcome.
Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which
treats `pass` as success, `fail` as failure, `skip` as neither.
**At end of each phase:**
- `session_log` with `{skill: "debug", phase: "<phase-name>", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
**Phases for this skill:** read-failure, generate-hypotheses, output
**Status semantics:**
- `pass` — the phase's intended outcome was reached.
- `fail` — the phase's intended outcome was NOT reached.
- `skip` — phase was skipped intentionally.
**Why this matters:** the routing pod (Plan 6) reads pass-rate to decide whether to route a future call to a local model. If your skill never logs, the routing pod sees no data.
## Mode 2 Routing Note
This skill produces high-volume mechanical output (hypothesis enumeration) and is a candidate for Mode 2 routing to a local model in the future. Until Plan 6 ships the routing pod, treat as Mode 1 only. The hypothesis format and discipline are identical regardless of which model generates them.
## Cross-References
- After a hypothesis is confirmed, load `tdd` skill — write a failing regression test that proves the bug exists, then fix it.
- For test failures specifically caused by mock-vs-real divergence, also load `tdd/references/testing-anti-patterns.md`.
- Load `code-review` skill if the diagnosis surfaces a structural issue (god object, shotgun surgery) rather than a pointwise bug.

172
feature-spec/SKILL.md Normal file
View File

@@ -0,0 +1,172 @@
---
name: feature-spec
description: Write a tight implementation-level spec for a single feature or component before coding it. Use after a project-level spec exists (see spec-driven-dev) and you need to nail down one feature inside it.
---
# Feature Spec
## Overview
A feature spec is the implementation-level contract for one feature or component, written by an engineer before code. It is shorter and more focused than a project-level spec (`spec-driven-dev`) — it assumes the project context, the stack, and the user are already known. Its job is to force one decision: **what does done look like, and what is explicitly not in this feature.**
**Core principle:** If you cannot write three measurable success criteria, you do not understand the feature well enough to build it.
## When to Use
- About to start implementing a feature inside an existing, already-specced project
- A change that touches more than one file or one package
- A change that introduces a new public API, schema, or external contract
- Any time the requirements feel "obvious" — that feeling usually means assumptions are stacking silently
**When NOT to use:**
- Project kickoff or first-time architecture decisions — use `spec-driven-dev` instead, which covers the broader lifecycle (Objective, Tech Stack, Project Structure, Boundaries, etc.)
- Single-line fixes, typos, dependency bumps
- Spike or throwaway prototypes (note them as such; no spec needed)
## Iron Laws
1. **Success criteria must be measurable.** "The system is fast" is banned. "p99 < 200ms under 100 RPS" is valid. If you cannot state a measurement, the criterion is not a criterion.
2. **An "Out of scope" section is mandatory.** If you do not draw the boundary, the implementer (you, future-you, or another agent) will guess wrong.
3. **Every technical decision in "Technical approach" must have a rationale.** A decision is any choice where a reasonable alternative existed and was not taken. "Use Postgres" is not a decision; "use Postgres because the rest of the stack uses it and we need transactions across user + invoice" is.
## Spec Template
```markdown
# Spec: [Feature Name]
## Problem Statement
What problem does this feature solve? For whom? Why now?
(2-4 sentences. If you cannot answer "why now," ask before writing the rest.)
## Success Criteria
- [ ] Criterion 1 — measurable and verifiable
- [ ] Criterion 2 — measurable and verifiable
- [ ] Criterion 3 — measurable and verifiable
## Constraints
Non-negotiable requirements the solution must satisfy.
Examples: "must not change the public API", "must complete in < 500ms",
"must not require a database migration", "must work offline".
## Out of Scope
What this feature explicitly does NOT do, even if related.
Anything plausibly in scope that you decided to defer goes here.
## Technical Approach
The architecture or design decisions, with rationale for each.
Each bullet should answer "why this choice over the obvious alternative."
## Risks
What could go wrong, and how it would be mitigated or detected.
At least one risk must be listed; "no risks" usually means insufficient analysis.
```
## Worked Example
```markdown
# Spec: IBAN extraction from Swedish PDF invoices
## Problem Statement
Invoice processors at small accounting firms manually copy IBANs from PDF
invoices into the banking system. The current parser handles ~60% of formats;
40% require manual fallback, defeating the automation premise.
Why now: the pilot customer is onboarding three new firms next month, all of
which use formats currently in the 40%.
## Success Criteria
- [ ] IBAN extracted correctly from ≥ 95% of a 200-invoice test corpus
- [ ] Zero false positives (extracting a non-IBAN string as an IBAN) on the corpus
- [ ] p99 extraction time < 500ms per invoice
- [ ] Returns a structured error (not a panic) for unrecognized formats
## Constraints
- Must not require a new dependency (PDF library is already chosen)
- Must work without network access (no external IBAN validation API)
- Must be callable from the existing `Parser.Extract(ctx, r)` signature
## Out of Scope
- Non-Swedish IBAN formats (separate spec when needed)
- BIC/SWIFT extraction (handled by a separate parser)
- OCR — input is assumed to be text-extractable PDF, not scanned image
- Validating that the IBAN refers to a real account
## Technical Approach
- Layered regex-then-checksum approach: regex narrows candidates, MOD-97
checksum validates. Rationale: pure regex is too permissive; pure
checksum requires too many candidates to test.
- Use the existing `pdfplumber` text layer rather than re-extracting.
Rationale: extraction is already the slow path; reusing the layer
avoids doubling extraction time.
- Return `(iban string, ok bool, err error)` not `(*IBAN, error)`.
Rationale: the caller almost always wants to know "did we find one"
separately from "did extraction itself fail," and a pointer return
invites nil-deref bugs.
## Risks
- Regex tuned to current corpus may overfit; mitigation: corpus is held out
per-firm, so we will see degradation at the next firm onboarding before
it reaches production.
- MOD-97 implementation bugs are subtle; mitigation: cross-check against
the Wikipedia reference implementation in tests.
```
## Common Rationalizations
| Rationalization | Reality |
|---|---|
| "It's a small feature, I don't need a spec" | Small features still need a measurable definition of done. The spec stays small too. |
| "I'll write the spec after, when I know more" | Then it is documentation, not a spec. The point is forcing clarity *before* the code locks in your assumptions. |
| "The success criteria are obvious" | If they are obvious, they are easy to write down. If they are hard to write down, they are not obvious. |
| "I don't have time to enumerate out-of-scope items" | You will spend more time arguing over scope creep at review than writing this section now. |
## Brain MCP Integration
The brain holds prior feature specs across the project. Use it to keep constraints and out-of-scope decisions coherent across related features.
**At spec start:**
- Run `brain_query` for prior specs in the same area or for the same problem domain. Reusing the same constraints and out-of-scope decisions across related features keeps the system coherent.
**After spec is approved:**
- Run `brain_write` with the spec's success criteria and out-of-scope list. Future related specs can cross-reference rather than re-derive.
### Logging
Call `session_log` once at the end of every phase to record the outcome.
Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which
treats `pass` as success, `fail` as failure, `skip` as neither.
**At end of each phase:**
- `session_log` with `{skill: "feature-spec", phase: "<phase-name>", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
**Phases for this skill:** draft
**Status semantics:**
- `pass` — the phase's intended outcome was reached (spec drafted and approved).
- `fail` — the phase's intended outcome was NOT reached (spec rejected, blocked on unresolved questions).
- `skip` — phase was skipped intentionally.
**Why this matters:** the routing pod (Plan 6) reads pass-rate to decide whether to route a future call to a local model. If your skill never logs, the routing pod sees no data.
## Mode 2 Routing Note
Spec writing requires interpretation of intent and tradeoff reasoning — work where Claude generally outperforms a 9-13B local model. This skill is intentionally **not** a Mode 2 routing candidate. In Mode 2, the routing layer should leave `feature-spec` calls on Claude.
## Cross-References
- Load `spec-driven-dev` instead if this is the first spec for a project (broader template, includes tech stack, project structure, boundaries).
- Load `problem-analysis` skill before this if the problem statement itself feels fuzzy.
- Load `tdd` skill once the spec is approved — each success criterion becomes a test.
- Load `planning` skill if the spec implies more than ~5 implementation tasks.

487
gitea-ci/SKILL.md Normal file
View File

@@ -0,0 +1,487 @@
---
name: gitea-ci
description: End-to-end Gitea CI/CD setup with self-hosted act_runner, quality gate pipeline, buildah image build, k3s deploy, and GitHub mirror. Covers act_runner-specific gotchas that are not documented anywhere.
---
# Gitea CI/CD Pipeline Setup
## Overview
This skill sets up a complete CI/CD pipeline on Gitea with a self-hosted runner. It covers the full journey from a bare repo to a working pipeline with: lint/test gate, OCI image build, k3s deploy, and GitHub mirror. It also captures non-obvious act_runner quirks that will burn you if you don't know them.
Load this skill when starting a new project that needs CI, or when debugging a broken Gitea Actions pipeline.
## When to Use
- Setting up CI for a new project on this infra (Gitea + koala/iguana/flamingo)
- Debugging a Gitea Actions pipeline that silently fails or behaves unexpectedly
- Migrating an existing project to Gitea Actions from another CI system
---
## Phase 0: Gather Context
Before writing any YAML, collect the following. Use AskUserQuestion for anything not obvious from the project:
| Item | Where to find it | Why it matters |
|------|-----------------|----------------|
| Gitea repo URL | git remote -v | Base URL for API calls |
| Runner machine | CLAUDE.md / infra notes | Which machine runs jobs |
| Primary language | go.mod, package.json, etc. | Drives the toolchain steps |
| Container registry | Ask or check existing config | `localhost:5000` on koala, or external |
| k3s namespace | k8s/ manifests or ask | Deploy target |
| GitHub mirror repo | Ask | Target for the mirror job |
| Image name | Ask or derive from repo name | Used in buildah + k3s steps |
Check the project's `CLAUDE.md` for infra notes first — often this answers most questions without needing to ask.
---
## Phase 1: Verify Runner Is Live
Before writing the workflow, confirm the runner is registered and healthy.
```bash
# Check registered runners via Gitea API
curl -s "https://<gitea-host>/api/v1/repos/<owner>/<repo>/actions/runners" \
-H "Authorization: token $GITEA_TOKEN" | python3 -m json.tool
```
If no runners appear, the runner is not registered. On this infra, `koala` runs `act_runner`. Check with:
```bash
ssh koala "systemctl status act_runner"
# or if running as a process:
ssh koala "pgrep -a act_runner"
```
act_runner config is typically at `~/.config/act_runner/` or wherever it was installed. Registration command:
```bash
act_runner register \
--instance https://<gitea-host> \
--token <runner-registration-token> \
--name koala \
--labels self-hosted
```
The registration token is in Gitea: Settings → Actions → Runners → Create Runner.
---
## Phase 2: Write the Workflow
The standard pipeline has 4-5 jobs. Copy the template below and fill in the `env.IMAGE` value and any project-specific toolchain steps.
```yaml
name: CI
on:
push:
branches: [main]
tags: ["v*"]
pull_request:
branches: [main]
env:
IMAGE: <project-name> # ← change this
jobs:
# ── 1. Quality gate ─────────────────────────────────────────────────────────
check:
name: Lint / Test / Vet
runs-on: self-hosted
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version-file: go.mod
cache: false # self-hosted: Go cache persists on disk between runs
- name: Verify toolchain
run: |
go version
task --version
govulncheck -version 2>&1 || true
- name: Install golangci-lint
run: |
curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/HEAD/install.sh \
| sh -s -- -b "$(go env GOPATH)/bin" v2.11.4
golangci-lint --version
- name: Run checks
run: task check
# ── 2. Build image ──────────────────────────────────────────────────────────
build:
name: Build & Import
needs: check
runs-on: self-hosted
if: github.event_name != 'pull_request'
outputs:
image-tag: ${{ steps.meta.outputs.sha-tag }}
steps:
- uses: actions/checkout@v4
- name: Derive image tags
id: meta
run: |
SHA=$(git rev-parse --short HEAD)
echo "sha-tag=${SHA}" >> "$GITHUB_OUTPUT"
REF="${{ github.ref }}"
if [[ "$REF" == refs/tags/v* ]]; then
echo "version-tag=${REF#refs/tags/}" >> "$GITHUB_OUTPUT"
fi
- name: Build and push to local registry
run: |
REGISTRY="localhost:5000"
REF="${REGISTRY}/${{ env.IMAGE }}:${{ steps.meta.outputs.sha-tag }}"
buildah build \
--label "org.opencontainers.image.revision=${{ github.sha }}" \
--label "org.opencontainers.image.source=${{ github.repositoryUrl }}" \
-t ${REF} \
-t ${REGISTRY}/${{ env.IMAGE }}:latest \
.
buildah push --tls-verify=false ${REF}
buildah push --tls-verify=false ${REGISTRY}/${{ env.IMAGE }}:latest
[[ -n "${{ steps.meta.outputs.version-tag }}" ]] && \
buildah push --tls-verify=false \
${REF} \
${REGISTRY}/${{ env.IMAGE }}:${{ steps.meta.outputs.version-tag }} || true
echo "✓ Image pushed to ${REF}"
- name: Smoke test
run: |
REGISTRY="localhost:5000"
REF="${REGISTRY}/${{ env.IMAGE }}:${{ steps.meta.outputs.sha-tag }}"
CNAME="smoke-${{ steps.meta.outputs.sha-tag }}"
sudo k3s ctr images pull --plain-http ${REF}
OUTPUT=$(timeout 5 sudo k3s ctr run --rm ${REF} ${CNAME} /<binary-name> 2>&1 || true)
sudo k3s ctr containers delete ${CNAME} 2>/dev/null || true
echo "$OUTPUT" | grep -q "<project-name>" \
&& echo "✓ Smoke test passed" \
|| echo "⚠ Smoke test inconclusive: $OUTPUT"
# ── 3. Deploy to k3s ────────────────────────────────────────────────────────
deploy:
name: Deploy to staging (k3s)
needs: build
runs-on: self-hosted
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
environment: staging
steps:
- uses: actions/checkout@v4
- name: Apply manifests and update image
env:
IMAGE_TAG: ${{ needs.build.outputs.image-tag }}
run: |
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/deployment.yaml
kubectl set image deployment/<project-name> \
<project-name>=localhost:5000/${{ env.IMAGE }}:${IMAGE_TAG} \
--namespace <project-name>
- name: Verify rollout
run: |
kubectl rollout status deployment/<project-name> \
--namespace <project-name> \
--timeout=120s \
|| {
echo "── pod status ──"
kubectl get pods -n <project-name> -o wide
echo "── pod events ──"
kubectl get events -n <project-name> --sort-by='.lastTimestamp' | tail -20
exit 1
}
# ── 4. Mirror to GitHub ─────────────────────────────────────────────────────
mirror:
name: Mirror to GitHub
needs: deploy
runs-on: self-hosted
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Push to GitHub
run: |
mkdir -p ~/.ssh
echo '${{ secrets.GH_DEPLOY_KEY }}' > ~/.ssh/id_rsa_gh_mirror
chmod 600 ~/.ssh/id_rsa_gh_mirror
ssh-keyscan github.com >> ~/.ssh/known_hosts 2>/dev/null
GIT_SSH_COMMAND="ssh -i ~/.ssh/id_rsa_gh_mirror -o IdentitiesOnly=yes" \
git push git@github.com:<gh-org>/<repo>.git HEAD:main
rm ~/.ssh/id_rsa_gh_mirror
echo "✓ Mirrored to GitHub"
```
Replace all `<placeholders>` before committing.
---
## Trunk-Based Development and CI
This infra uses TBD. CI on every push to main is the quality gate — not branch protection.
### What this means in practice
- Commit directly to main for all solo and single-agent work
- Each commit: one logical change + `task check` passing locally before push
- CI runs `task check` again on push — if it was green locally it will be green in CI
- No PRs, no feature branches, no squash-merges for normal work
### The pre-push ritual
```bash
# In project root before every push:
task check && git push origin main
```
Wire this into your shell or a git pre-push hook so it's automatic.
### When CI catches something you missed locally
Fix forward — commit the fix directly to main. Do not revert unless the broken
commit introduced a data migration or irreversible infrastructure change.
### Feature flags instead of feature branches
If a feature is too large for one session, hide it behind a build tag or a config
flag until ready:
```go
// cmd/server/main.go
if os.Getenv("ENABLE_EXPERIMENTAL_FEATURE") == "true" {
server.RegisterExperimentalRoutes(mux)
}
```
Commit the hidden feature incrementally. Activate it when complete.
### When to break the rule
- **Parallel agents on the same repo:** short-lived `agent/<description>` branch,
merge to main inside the same session.
- **External contributor or client four-eyes requirement:** PR flow, document the
reason in `PROJECT.md`.
---
## Phase 3: Set Up GitHub Mirror
The mirror job needs an SSH deploy key with write access to the GitHub repo.
```bash
# On your local machine:
ssh-keygen -t ed25519 -C "gitea-mirror-<project>" -f /tmp/gh_mirror_key -N ""
cat /tmp/gh_mirror_key.pub # add this to GitHub repo → Settings → Deploy Keys (Allow write)
cat /tmp/gh_mirror_key # add this to Gitea repo → Settings → Secrets → GH_DEPLOY_KEY
rm /tmp/gh_mirror_key /tmp/gh_mirror_key.pub
```
The private key goes in Gitea as the `GH_DEPLOY_KEY` secret. The public key goes in GitHub as a deploy key with write access.
---
## Phase 4: Add Project Secrets
Set secrets via the Gitea API — faster than the UI and scriptable:
```bash
# Template:
curl -s -X PUT \
"https://<gitea-host>/api/v1/repos/<owner>/<repo>/actions/secrets/<SECRET_NAME>" \
-H "Authorization: token $GITEA_TOKEN" \
-H "Content-Type: application/json" \
-d '{"data":"<secret-value>"}' \
-w "\nHTTP %{http_code}"
# Expected: HTTP 204
# Verify secrets are set:
curl -s "https://<gitea-host>/api/v1/repos/<owner>/<repo>/actions/secrets" \
-H "Authorization: token $GITEA_TOKEN" | python3 -m json.tool
```
Required secrets for the baseline pipeline: `GH_DEPLOY_KEY`.
Any project-specific secrets (API keys, tokens, credentials) are added the same way.
---
## Phase 5: Debug Failing Runs
When a run fails, use the Gitea API to get logs without opening a browser.
```bash
GITEA_HOST="https://<gitea-host>"
REPO="<owner>/<repo>"
TOKEN="$GITEA_TOKEN"
# List recent runs
curl -s "$GITEA_HOST/api/v1/repos/$REPO/actions/runs?limit=5" \
-H "Authorization: token $TOKEN" | python3 -c "
import json,sys
for r in json.load(sys.stdin)['workflow_runs']:
print(f'#{r[\"id\"]} {r[\"status\"]:12} {r.get(\"conclusion\",\"\"):10} {r[\"display_title\"][:50]}')
"
# List jobs for a run (replace RUN_ID)
curl -s "$GITEA_HOST/api/v1/repos/$REPO/actions/runs/<RUN_ID>/jobs" \
-H "Authorization: token $TOKEN" | python3 -c "
import json,sys
for j in json.load(sys.stdin)['jobs']:
print(f'Job {j[\"id\"]}: {j[\"name\"]:30} | {j[\"status\"]:10} | {j.get(\"conclusion\",\"\")}')
"
# Get full log for a job (replace JOB_ID)
curl -s "$GITEA_HOST/api/v1/repos/$REPO/actions/jobs/<JOB_ID>/logs" \
-H "Authorization: token $TOKEN" | grep -E "ERROR|error|FAIL|undefined|cannot" | tail -30
```
---
## ⚠ act_runner Gotchas
These are non-obvious bugs and sharp edges that will silently break your pipeline. Read all of them.
### 1. Steps with `env:` blocks that use secret expressions are silently skipped
**Symptom**: A step produces no log output at all — not even its step header (`⭐ Run Main ...`). Subsequent steps fail as if the skipped step never ran.
**Cause**: In act_runner v0.4.x, steps that have an `env:` block containing `${{ secrets.* }}` expressions are silently dropped before execution.
**Wrong**:
```yaml
- name: Write config
env:
API_KEY: ${{ secrets.API_KEY }}
run: |
echo "KEY=$API_KEY" > config.env
```
**Right** — inline the expression directly in the `run:` script:
```yaml
- name: Write config
run: |
echo 'KEY=${{ secrets.API_KEY }}' > config.env
```
act_runner evaluates `${{ }}` expressions before passing the script to bash. The values are redacted in log output. No `env:` block needed.
### 2. Heredocs in `run: |` blocks — the indented EOF trap
**Symptom**: A file is written with 10-100× more content than expected. JSON files fail to parse. The step appears to succeed.
**Cause**: In a YAML literal block scalar (`run: |`), all content is at the block's indentation level. If you indent your heredoc content (to look neat), the `EOF` terminator also gets indented. Bash requires the terminator to be at column 0 — an indented `EOF` is NOT recognized, so the heredoc consumes everything that follows until end-of-script.
**Wrong**:
```yaml
- name: Write file
run: |
cat > output.json <<EOF
{
"key": "value"
}
EOF ← indented, NOT recognized as terminator
chmod 600 output.json
```
The file ends up containing the JSON, the literal string ` EOF`, and ` chmod 600 output.json`.
**Right** — avoid heredocs entirely. Use `echo` for simple lines, `printf` for single-line formatted output:
```yaml
- name: Write file
run: |
echo '{"key":"${{ secrets.SOME_VALUE }}"}' > output.json
chmod 600 output.json
```
For JSON with dynamic values, use python3 as a one-liner (but see gotcha #3):
```yaml
- name: Write token file
run: |
python3 -c "import json; open('token.json','w').write(json.dumps({'key':'${{ secrets.TOKEN }}','type':'bearer'}))"
```
### 3. Multiline `python3 -c` breaks YAML block scalar parsing
**Symptom**: The workflow file is pushed but no CI run appears — Gitea silently rejects the workflow YAML.
**Cause**: Unindented code in a multiline `python3 -c "..."` block (with line breaks inside the YAML `run: |`) is at column 0, which the YAML parser interprets as ending the block scalar and starting a new YAML key.
**Wrong**:
```yaml
run: |
python3 -c "
import json ← column 0, YAML parser thinks block scalar ended
d = {...}
"
```
**Right** — keep the python3 invocation on one line:
```yaml
run: |
python3 -c "import json; d = {...}; open('f','w').write(json.dumps(d))"
```
If the one-liner is too long, write a small `.py` file in a previous step and execute it.
### 4. Validate YAML locally before pushing
Gitea silently ignores workflows with YAML parse errors — no run appears, no error message.
```bash
python3 -c "import yaml; yaml.safe_load(open('.gitea/workflows/ci.yml').read()); print('YAML valid')"
```
Run this before every push that touches the workflow file.
### 5. Commit generated files instead of gitignoring them
**Symptom**: Lint/build fails in CI with `undefined: SomeGeneratedType` even though it works locally.
**Cause**: Code generation tools (templ, sqlc, wire, protoc, etc.) produce `_generated.go` or `_templ.go` files that are often in `.gitignore`. CI runners don't have the generator installed and can't regenerate them.
**Fix**: Remove the generated files from `.gitignore` and commit them. They are deterministic, diffable Go source — committing them is the standard practice for projects that don't want to install generators in CI.
If you need generators in CI anyway (e.g., for proto changes), add a generation step BEFORE lint in the `check` job and install the tool explicitly.
### 6. `errcheck` will flag `fmt.Sscanf` and similar "can't fail in practice" calls
Parsers that convert strings to numbers (e.g., parsing API responses where you know the format) will trigger `errcheck`. Suppress with blank assignment:
```go
// before
fmt.Sscanf(s, "%d", &n)
// after — communicates intent and satisfies errcheck
_, _ = fmt.Sscanf(s, "%d", &n)
```
---
## Verification Checklist
Before marking CI setup complete:
- [ ] `python3 -c "import yaml; yaml.safe_load(...)"` passes on the workflow YAML
- [ ] Runner appears as active in Gitea repo → Settings → Actions → Runners
- [ ] A push to `main` triggers a run visible in Gitea repo → Actions
- [ ] `check` job passes (lint + test + vet green)
- [ ] `build` job produces an image in the local registry (`curl localhost:5000/v2/<image>/tags/list`)
- [ ] `deploy` job shows rollout success in pod logs
- [ ] `mirror` job shows "✓ Mirrored to GitHub" and the commit appears on GitHub
- [ ] A PR (non-main push) runs `check` only, skips build/deploy/mirror
- [ ] Any project-specific secrets are set via API and verified with a secrets list call
---
## Related Skills
- `tdd` — for writing the tests that the `check` job runs
- `planning` — for breaking down the work before writing pipeline YAML
- `problem-analysis` — for debugging when a job fails in unexpected ways

147
grill-me/SKILL.md Normal file
View File

@@ -0,0 +1,147 @@
---
name: grill-me
description: Critically interrogates plans, ideas, and proposals to make them sharper, more robust, and better grounded — with the explicit goal of producing a meaningfully better plan by the end. Use this skill whenever the user wants to stress-test a plan, prepare for hard questions, check if an idea is ready to act on, validate a hypothesis, or explicitly asks to be challenged, questioned, or grilled. Also trigger when a conversation has been exploring an idea for a while and it's time to converge — use this skill to check if the idea is actually ready. Trigger phrases include "grill me", "challenge this", "poke holes", "am I missing something", "is this ready", "stress-test this", "devil's advocate", "what am I not seeing".
---
# Grill-Me
A structured interrogation skill that pressure-tests plans and ideas before they become commitments. The goal is not to kill ideas — it's to produce a **demonstrably better plan** than the one that walked in.
## When to use each mode
Three interrogation modes. Pick based on context:
**Mode 1 — Quick Poke** (idea is early, user wants light pressure)
35 sharp questions, each with a proposal. Fast. No structure overhead. Good for early Diamond 1 conversations.
**Mode 2 — Full Grill** (idea is approaching commitment, user wants thorough interrogation)
Structured pass through all lenses below. Ends with a Plan Delta. Good for end of Diamond 1 or before starting a project.
**Mode 3 — Pre-mortem** (a plan exists and is about to be executed)
Assume it failed. Work backwards. What went wrong? Good for Diamond 2 → Deliver transitions.
Read the context to pick the right mode. If unsure, ask: "Quick poke or full grill?"
---
## Always bring a proposal
When a question requires input or a decision, don't just ask — bring a recommendation:
> "What's the smallest version that validates the hypothesis? I'd suggest starting with just the ntfy URL-push channel (Phase 1), because it gives you real usage data before investing in the automated sync. Does that match your intuition?"
The format: sharp question → "I'd suggest [X]" → one-sentence reasoning → quick confirmation ask.
This respects the user's time, models good thinking, and makes it easier to say "yes, agreed" than to construct an answer from scratch. If you genuinely have no strong opinion, say so explicitly — but that should be rare.
---
## Full Grill — interrogation lenses
Run through each lens. Skip lenses that clearly don't apply, but say so explicitly ("Skipping competitive lens — this is internal tooling"). For each question that requires a decision, include a proposal.
### 1. Hypothesis clarity
- Can the core claim be stated as: "We believe [X] will produce [Y], measurable by [Z]"?
- Is Y falsifiable? Could an experiment prove this wrong?
- Is Z actually measurable with available tools, or is it hand-wavy?
### 2. Assumptions audit
- List the top 3 assumptions the plan depends on.
- Which assumption, if wrong, kills the whole thing?
- Has any assumption been validated, or are they all untested?
### 3. Futures resilience (Double Diamond + Futures Thinking lens)
- Name 23 plausible alternative futures (not just the one assumed).
- Does the plan still make sense in each of them?
- What's the earliest signal that one of those other futures is manifesting?
### 4. Scope and reversibility
- What is explicitly out of scope? (Unstated scope is a risk.)
- Which decisions in this plan are irreversible?
- What's the smallest version of this that would still validate the hypothesis?
### 5. Cost and effort reality check
- What's the most expensive part? Time, money, attention?
- Is there a simpler path to the same learning?
- What does "good enough" look like vs "perfect"?
### 6. Failure modes
- What's the most likely way this fails? (Not catastrophically — just quietly doesn't work.)
- What's the most catastrophic failure mode, even if unlikely?
- Is there a canary — an early signal that things are going wrong?
### 7. Who and why
- Who benefits if this succeeds?
- Who is harmed, blocked, or ignored?
- Is there someone who should be consulted who hasn't been?
### 8. Next action clarity
- Is the next step concrete enough that it could be handed to an agent or a person with no further explanation?
- Is there a single person (or agent) responsible for it?
- What would stop it from happening this week?
---
## Pre-mortem mode
Set the scene: "It's [6 months / 1 year] from now. This project has failed. Not dramatically — it just quietly didn't deliver. Walk me through what happened."
Then probe:
- Was it a wrong assumption, a scope problem, or an execution problem?
- Was there a moment where a different decision would have changed the outcome?
- What should we do differently before we start?
Include a proposal for each finding: "To prevent this, I'd suggest [X]."
---
## Output format
**Quick Poke:** Questions numbered, each followed by a proposal. No preamble. Uncomfortable is fine.
**Full Grill:** One section per lens. Each section: 12 sentences of framing + 23 sharp questions + proposals where relevant. End with:
- "The three things most worth resolving before moving forward are: [X, Y, Z]."
- A **Plan Delta** section (see below).
**Pre-mortem:** Narrative first (23 sentences setting the failure scene), then structured findings with proposals, then "three things to do differently".
---
## Plan Delta — mandatory at the end of Full Grill and Pre-mortem
The grill is not done when the questions have been asked. It's done when the plan is **demonstrably stronger**. At minimum:
- At least one weak assumption has been named and either validated or explicitly accepted as a known risk
- The hypothesis is sharper and more falsifiable than when we started
- Something has changed: a scope reduction, a clearer metric, a reordered phase, a dropped assumption
If the plan hasn't changed at all after a Full Grill, something went wrong — the questions weren't sharp enough or the answers weren't pushed hard enough.
End every Full Grill and Pre-mortem with:
> **Plan delta**
> Before: [one sentence on the original plan]
> After: [one sentence on what concretely changed]
> Still open: [12 unresolved questions and how/when they'll be resolved]
---
## Tone guidance
The grill should feel like a smart, trusted colleague who genuinely wants the plan to succeed — not a critic looking for reasons to say no. Questions should be sharp but not hostile. If something is clearly solid, say so briefly before moving on.
Don't soften questions with preambles. "Have you considered that..." is weaker than "What happens if...". Prefer the latter.
If the plan is actually good, say so: "Lenses 14 look solid. The real open questions are in lenses 5 and 6."
---
## Integration with Double Diamond
This skill maps to the **convergence moments** in the Double Diamond workflow:
- **End of Diamond 1 (Define):** Use Full Grill to check if the hypothesis is ready to become a project. The three convergence criteria (falsifiable hypothesis, measurable success criterion, futures-resilient) come directly from lenses 1 and 3.
- **End of Diamond 2 (Deliver):** Use Pre-mortem before promoting to pre-prod.
- **Mid-Diamond 2 (Develop):** Use Quick Poke when an agent's experiment result is ambiguous and you're deciding whether to continue or pivot.
The skill does NOT replace human judgment at convergence points — it sharpens the plan so that judgment is better informed.

98
install.sh Executable file
View File

@@ -0,0 +1,98 @@
#!/usr/bin/env bash
# install.sh — bootstrap installer for the skills library.
#
# Usage (one-liner, hosts without Task):
# curl -fsSL https://gitea.d-ma.be/mathias/skills/raw/branch/main/install.sh | bash
#
# Pinned to a specific tag:
# SKILLS_REF=v0.1.0 bash install.sh
#
# Idempotent: re-running pulls the latest changes for the configured ref
# (default: main) and re-wires symlinks. Existing correct links are
# left alone; mismatched ones are replaced.
set -euo pipefail
REPO_URL="${SKILLS_REPO_URL:-https://gitea.d-ma.be/mathias/skills.git}"
REF="${SKILLS_REF:-main}"
CHECKOUT_DIR="${SKILLS_CHECKOUT_DIR:-$HOME/.local/share/skills}"
CLAUDE_GLOBAL_DIR="${CLAUDE_SKILLS_DIR:-$HOME/.claude/skills}"
log() { printf '[skills] %s\n' "$*"; }
ensure_checkout() {
if [ -d "$CHECKOUT_DIR/.git" ]; then
log "updating $CHECKOUT_DIR"
git -C "$CHECKOUT_DIR" fetch --tags --quiet
git -C "$CHECKOUT_DIR" checkout --quiet "$REF"
if git -C "$CHECKOUT_DIR" symbolic-ref -q HEAD >/dev/null; then
# on a branch — fast-forward
git -C "$CHECKOUT_DIR" pull --ff-only --quiet
fi
else
log "cloning $REPO_URL into $CHECKOUT_DIR (ref=$REF)"
mkdir -p "$(dirname "$CHECKOUT_DIR")"
git clone --quiet "$REPO_URL" "$CHECKOUT_DIR"
git -C "$CHECKOUT_DIR" checkout --quiet "$REF"
fi
}
list_skills() {
# Every top-level entry that isn't a known meta file.
(cd "$CHECKOUT_DIR" && ls -1) | grep -Ev '^(Taskfile\.yml|install\.sh|README\.md|SKILLS_INDEX\.md|\.gitea|\.git)$' || true
}
link_skill() {
# link_skill <target_dir> <skill_name>
local target_dir="$1"
local skill="$2"
local target="$CHECKOUT_DIR/$skill"
local link="$target_dir/$skill"
if [ -L "$link" ] || [ -e "$link" ]; then
local current
current=$(readlink "$link" 2>/dev/null || true)
if [ "$current" = "$target" ]; then
return
fi
rm -rf "$link"
fi
ln -s "$target" "$link"
log "linked $link$target"
}
wire_claude_global() {
mkdir -p "$CLAUDE_GLOBAL_DIR"
while IFS= read -r skill; do
[ -n "$skill" ] || continue
link_skill "$CLAUDE_GLOBAL_DIR" "$skill"
done < <(list_skills)
}
wire_claude_repo() {
# Only wire per-repo when invoked from inside a git repo and it isn't
# the skills repo itself.
if ! git rev-parse --show-toplevel >/dev/null 2>&1; then
return
fi
local repo
repo=$(git rev-parse --show-toplevel)
if [ "$repo" = "$CHECKOUT_DIR" ]; then
return
fi
local target_dir="$repo/.claude/skills"
mkdir -p "$target_dir"
while IFS= read -r skill; do
[ -n "$skill" ] || continue
link_skill "$target_dir" "$skill"
done < <(list_skills)
}
main() {
ensure_checkout
wire_claude_global
wire_claude_repo
log "done — $(list_skills | wc -l | tr -d ' ') skill(s) wired at ref=$REF"
}
main "$@"

276
planning/SKILL.md Normal file
View File

@@ -0,0 +1,276 @@
---
name: planning
description: Decompose work into small, verifiable, ordered tasks. Use when you have a spec or clear requirements and need to break work into implementable units.
---
# Planning and Task Breakdown
## Overview
Good task breakdown is the difference between reliable delivery and a tangled mess. Every task should be small enough to implement, test, and verify in a single focused session.
**Input:** A spec (load `spec-driven-dev` skill) or clear requirements.
**Output:** A prioritized, ordered list of tasks with acceptance criteria and verification steps.
## When to Use
- You have a spec and need to break it into implementable units
- A task feels too large or vague to start
- You need to communicate scope to Mathias
- The implementation order isn't obvious
- You want to identify parallelization opportunities
**When NOT to use:** Single-file changes with obvious scope.
## The Planning Process
### Step 1: Read-Only Discovery
Before writing any tasks, operate in read-only mode:
- Read the spec and relevant code
- Identify existing patterns, conventions, and infrastructure
- Map dependencies between components
- Note risks and unknowns
**Do NOT write code during planning.** The output is a plan document, not implementation.
### Step 2: Map the Dependency Graph
Visualize what depends on what:
```
Database schema / migrations
├── Domain types (pure Go structs + interfaces)
│ │
│ ├── Service layer (business logic)
│ │ │
│ │ └── HTTP handlers
│ │
│ └── Store implementations (postgres, in-memory)
└── Seed data / test fixtures
```
**Build foundations first.** You can't test a handler before the service exists. You can't write a service before the domain types are defined.
### Step 3: Slice Vertically
Build one complete feature path at a time, not one layer at a time.
**Bad (horizontal slicing):**
```
Task 1: Write all database migrations
Task 2: Write all store implementations
Task 3: Write all service methods
Task 4: Write all handlers
```
No working software until Task 4. If Task 1 is wrong, you discover it late.
**Good (vertical slicing):**
```
Task 1: User can create an account (migration + store + service + handler for create)
Task 2: User can log in (auth store + service + handler)
Task 3: User can view their invoices (query + service + handler + response type)
```
Each task delivers working, testable functionality. If Task 1 reveals a design issue, Tasks 2+ can adapt.
### Step 4: Write Tasks
Each task follows this structure:
```markdown
## Task [N]: [Short descriptive title]
**Description:** One sentence explaining what this task accomplishes.
**Acceptance criteria:**
- [ ] [Specific, testable condition]
- [ ] [Specific, testable condition]
**Verification:**
- [ ] Tests pass: `go test -run TestFeatureName ./...`
- [ ] Build: `go build ./...`
- [ ] Manual check: [description if needed]
**Dependencies:** Task [N-1], Task [N-2] (or "None")
**Files likely touched:**
- `internal/domain/user.go`
- `internal/store/user_store.go`
- `internal/service/user_service.go`
**Estimated scope:** [XS/S/M/L]
```
### Step 5: Order and Add Checkpoints
Arrange tasks so:
1. Dependencies are satisfied (build foundation first)
2. Each task leaves the system in a working state
3. High-risk tasks are early (fail fast)
4. Checkpoints occur after every 23 tasks
```markdown
## Checkpoint: After Tasks 13
- [ ] `go test ./...` passes
- [ ] `go build ./...` succeeds
- [ ] Core user flow works end-to-end (can register and log in)
- [ ] Mathias reviews before continuing
```
## Task Sizing Guidelines
| Size | Files | Scope | Example |
|------|-------|-------|---------|
| **XS** | 1 | Single function or type | Add a validation rule to User |
| **S** | 12 | One endpoint or store method | Add GetByEmail to UserStore |
| **M** | 35 | One feature slice | User registration (domain + store + service + handler) |
| **L** | 58 | Multi-component feature | Search with filtering, pagination, and sorting |
| **XL** | 8+ | **Too large — break it down** | — |
**Break any XL task into smaller tasks.** An agent performs best on XS, S, and M tasks.
**Break a task down further when:**
- It would take more than 23 hours of focused work
- The acceptance criteria have more than 4 bullet points
- It touches two independent subsystems (auth AND billing in the same task)
- You find yourself writing "and" in the task title
## Plan Document Template
```markdown
# Implementation Plan: [Feature/Project Name]
## Overview
[One paragraph: what we're building and the key technical decisions]
## Architecture Decisions
- [Decision 1 and rationale]
- [Decision 2 and rationale]
## Dependency Graph
[Diagram or list showing what depends on what]
## Task List
### Phase 1: Foundation
- [ ] **Task 1:** [Title] — [Scope: XS/S/M]
- AC: [criteria]
- Verify: `go test -run TestX ./...`
- Files: [list]
- [ ] **Task 2:** [Title] — [Scope: XS/S/M]
...
### Checkpoint: Foundation Complete
- [ ] `go test ./...` passes
- [ ] Build clean
- [ ] Foundation behavior works end-to-end
### Phase 2: Core Features
- [ ] **Task 3:** [Title] — [Scope: S/M]
...
### Checkpoint: Core Features Complete
- [ ] User-facing scenarios work
- [ ] Mathias reviews before Phase 3
### Phase 3: Polish and Edge Cases
- [ ] **Task 5:** Handle [edge case]
- [ ] **Task 6:** [Performance/error handling]
### Checkpoint: Complete
- [ ] All acceptance criteria from the spec are met
- [ ] `go test -race ./...` passes
- [ ] govulncheck passes
- [ ] Ready for review
## Risks and Mitigations
| Risk | Impact | Mitigation |
|------|--------|-----------|
| [External API behavior unknown] | High | Spike task first |
| [Data volume may be larger than expected] | Medium | Add pagination before shipping |
## Open Questions
- [Question requiring Mathias's input]
```
## Parallelization
When working in parallel (multiple sessions or agents):
- **Safe to parallelize:** Independent feature slices, tests for already-implemented features
- **Must be sequential:** Database migrations, shared interface changes, dependency chains
- **Needs coordination:** Features that share an API contract — define the contract first, then parallelize
## Go-Specific Planning Notes
### Database First, Then Domain
If the feature requires a schema change:
1. Write and apply migration first
2. Generate sqlc code (`task generate` or `sqlc generate`)
3. Then write domain types and service
### Interface Contracts Before Parallel Work
If two tasks depend on a shared interface:
```go
// Define this first, in its own task
type InvoiceStore interface {
Save(ctx context.Context, inv Invoice) error
GetByID(ctx context.Context, id InvoiceID) (Invoice, error)
}
```
Then write the postgres implementation and the service in parallel.
### Spike Tasks for Unknowns
If a risk involves an unknown (new API, unfamiliar library, performance unknown):
```markdown
## Task 0 (Spike): Validate Bankgirot API integration
**Description:** Verify we can submit an ISO 20022 message to Bankgirot sandbox.
**Timebox:** 2 hours
**Output:** Working code or decision to change approach
**Not:** Production-quality code — throw it away after learning
```
Spikes are timeboxed explorations. The code is thrown away. The learning is kept.
## Verification Before Starting
Before beginning implementation:
- [ ] Every task has acceptance criteria
- [ ] Every task has a verification step
- [ ] Task dependencies are identified and ordered correctly
- [ ] No task touches more than ~5 files
- [ ] Checkpoints exist between major phases
- [ ] Mathias has reviewed and approved the plan
## Common Rationalizations
| Rationalization | Reality |
|---|---|
| "I'll figure it out as I go" | That's how you end up with rework. 10 minutes of planning saves hours. |
| "The tasks are obvious" | Write them down. Explicit tasks surface hidden dependencies. |
| "Planning is overhead" | Planning IS the task. Implementation without a plan is just typing. |
| "I can hold it all in my head" | Context windows are finite. Written plans survive session boundaries. |
## Cross-References
- Requires: `spec-driven-dev` skill output
- During implementation: load `tdd` skill per task
- For large refactoring tasks: load `refactoring` skill
- For architecture decisions: load `solid` skill

273
playwright/SKILL.md Normal file
View File

@@ -0,0 +1,273 @@
---
name: playwright
description: Add and run Playwright E2E tests for any project with a web UI — covers setup, HTMX-specific patterns, SSE/streaming, Taskfile integration, and CI wiring.
---
# Playwright E2E Testing
## When to use
- Project has a web UI (HTMX, React, plain HTML — any stack)
- You want to verify the golden path in a real browser, not just unit-test handlers
- Chat/SSE streams, HTMX swaps, or dynamic DOM need testing that Go tests cannot cover
## Setup (first time in a project)
### 1. Install
```bash
npm init -y
npm install --save-dev @playwright/test
npx playwright install chromium # chromium only — enough for CI
```
### 2. Directory layout
```
<repo>/
e2e/
playwright.config.ts
tests/
smoke.spec.ts
chat.spec.ts
package.json
Taskfile.yml ← add e2e task here
```
Keep `e2e/` at repo root. Add to `.gitignore`:
```
node_modules/
e2e/test-results/
e2e/playwright-report/
```
### 3. Config (`e2e/playwright.config.ts`)
```typescript
import { defineConfig, devices } from '@playwright/test';
export default defineConfig({
testDir: './tests',
timeout: 30_000,
retries: process.env.CI ? 2 : 0,
reporter: process.env.CI ? 'github' : 'list',
use: {
baseURL: process.env.BASE_URL ?? 'http://localhost:8080',
screenshot: 'only-on-failure',
trace: 'retain-on-failure',
},
projects: [
{ name: 'chromium', use: { ...devices['Desktop Chrome'] } },
],
});
```
### 4. Taskfile task
```yaml
tasks:
e2e:
desc: "Run Playwright E2E tests against running dev server"
dir: e2e
cmds:
- npx playwright test {{.CLI_ARGS}}
env:
BASE_URL: "http://localhost:8080"
e2e:ui:
desc: "Open Playwright UI mode"
dir: e2e
cmds:
- npx playwright test --ui
```
Run with: `task e2e` or `task e2e -- --headed` or `task e2e -- tests/chat.spec.ts`
---
## HTMX-specific patterns
HTMX updates the DOM asynchronously. Never use `waitForSelector` alone — wait for HTMX to finish the swap.
### Wait for HTMX swap to complete
```typescript
// After triggering an action that causes an HTMX request:
await page.locator('[hx-target]').waitFor({ state: 'visible' });
// Better: wait for htmx:afterSwap event on the target element
await page.evaluate(() =>
new Promise<void>(resolve =>
document.body.addEventListener('htmx:afterSwap', () => resolve(), { once: true })
)
);
```
### Wait for specific content to appear after swap
```typescript
await page.locator('#result').waitFor({ state: 'visible' });
await expect(page.locator('#result')).toContainText('expected text');
```
### Avoid timing races
```typescript
// BAD — may grab stale content before swap completes
await page.click('button');
const text = await page.locator('#output').textContent();
// GOOD — assert after waiting
await page.click('button');
await expect(page.locator('#output')).toContainText('expected text', { timeout: 5000 });
```
### Forms with HTMX submit
```typescript
await page.fill('input[name="query"]', 'hello');
await page.keyboard.press('Enter'); // or click submit button
await expect(page.locator('#response')).not.toBeEmpty({ timeout: 10_000 });
```
---
## SSE / streaming patterns
For chat UIs or any endpoint using `text/event-stream`:
```typescript
test('chat stream delivers response', async ({ page }) => {
await page.goto('/chat');
// Type and submit
await page.fill('#message-input', 'what is 2+2?');
await page.click('#send-btn');
// SSE response appears incrementally — wait for streaming to stop
// Strategy: wait until the response element stops growing
const responseEl = page.locator('#chat-response');
await responseEl.waitFor({ state: 'visible', timeout: 15_000 });
// Poll until content stabilises (streaming done)
let prev = '';
let stable = 0;
while (stable < 3) {
const cur = await responseEl.textContent() ?? '';
if (cur === prev && cur.length > 0) stable++;
else { stable = 0; prev = cur; }
await page.waitForTimeout(300);
}
expect(prev.length).toBeGreaterThan(0);
});
```
For a done-signal in the DOM (e.g. a `[data-streaming="false"]` attribute set when stream ends):
```typescript
await page.locator('[data-streaming="false"]').waitFor({ timeout: 20_000 });
const text = await page.locator('#chat-response').textContent();
expect(text).toBeTruthy();
```
---
## Golden path smoke test template
```typescript
// e2e/tests/smoke.spec.ts
import { test, expect } from '@playwright/test';
test.describe('smoke', () => {
test('home page loads', async ({ page }) => {
await page.goto('/');
await expect(page).toHaveTitle(/.+/);
await expect(page.locator('body')).not.toBeEmpty();
});
test('main nav is present', async ({ page }) => {
await page.goto('/');
await expect(page.locator('nav')).toBeVisible();
});
});
```
---
## Screenshot on failure
Already configured via `screenshot: 'only-on-failure'` in the config. Artifacts land in `e2e/test-results/`. View them with:
```bash
npx playwright show-report e2e/playwright-report
```
---
## CI wiring (Gitea workflow)
```yaml
e2e:
runs-on: ubuntu-latest
needs: build
steps:
- uses: actions/checkout@v4
- name: Install Node deps
working-directory: e2e
run: npm ci
- name: Install Playwright browsers
working-directory: e2e
run: npx playwright install --with-deps chromium
- name: Start server
run: ./bin/server &
env:
PORT: 8080
- name: Wait for server
run: |
for i in $(seq 1 20); do
curl -sf http://localhost:8080/ && break
sleep 1
done
- name: Run E2E tests
working-directory: e2e
run: npx playwright test
env:
BASE_URL: http://localhost:8080
- name: Upload test artifacts
if: failure()
uses: actions/upload-artifact@v4
with:
name: playwright-report
path: e2e/playwright-report/
```
---
## Checklist
When adding Playwright to a project:
- [ ] `npm init -y && npm install --save-dev @playwright/test` in `e2e/`
- [ ] `npx playwright install chromium`
- [ ] `e2e/playwright.config.ts` with baseURL and screenshot on failure
- [ ] At least one smoke test covering the golden path
- [ ] `.gitignore` entries: `node_modules/`, `test-results/`, `playwright-report/`
- [ ] `task e2e` wired in Taskfile.yml
- [ ] CI step added to Gitea workflow
## Failure triage
| Symptom | Likely cause | Fix |
|---------|-------------|-----|
| `TimeoutError` on HTMX element | Swap not finished | Use `htmx:afterSwap` event or increase timeout |
| Stale content asserted | Race between click and swap | Move assertion after explicit wait |
| SSE test flaky | Stream not done yet | Poll for stable content or use done-signal attribute |
| CI fails, local passes | Server not ready | Add wait-for-server loop in CI |
| Screenshots empty | Screenshot taken before render | Add `await page.waitForLoadState('networkidle')` |

222
problem-analysis/SKILL.md Normal file
View File

@@ -0,0 +1,222 @@
---
name: problem-analysis
description: Deeply understand a problem before coding begins. Identify the real problem, stakeholders, constraints, and success criteria. Use before starting any non-trivial feature.
---
# Problem Analysis
## Core Principle
**Understand WHAT needs to be solved and WHY — before considering HOW.**
Premature solution thinking is one of the most expensive mistakes in software. A feature built on a misunderstood problem is waste, no matter how well it's coded.
> "A problem well-stated is a problem half-solved." — Charles Kettering
## When to Use
**Always use before:**
- Starting any non-trivial feature (more than a day's work)
- Tackling a vague or ambiguous requirement
- Making an architectural decision
- Starting a new project or significant component
**When NOT to use:** Bug fixes with clear reproduction steps, one-line changes, configuration updates.
## The Analysis Process
### Step 1: Understand the Real Problem
Don't accept the first statement of a problem at face value. Dig to the real underlying issue.
**Questions to ask:**
- What is the user actually trying to accomplish?
- What pain or friction does this create today?
- Why is this a problem now? What changed?
- What happens if we don't solve it?
- Who else is affected?
**The 5 Whys technique:** Ask "why" five times to reach the root cause.
```
Problem: "The invoice export is too slow"
Why? → "Users have to wait 30 seconds"
Why? → "We're generating all 5000 invoices in a single query"
Why? → "The feature was built for 50 invoices, now there are 5000"
Why? → "We didn't anticipate growth"
Root: "Batch processing architecture needed, not performance tuning"
```
The solution to "export is slow" might be pagination, async export, or caching — not just query optimization.
### Step 2: Identify Stakeholders and Their Perspectives
Every stakeholder has a different view of the problem.
| Stakeholder | Typical concern |
|-------------|----------------|
| End user | "I need to accomplish X in Y time" |
| Business | "This costs us money / blocks revenue" |
| Ops/DevOps | "This creates operational burden" |
| Compliance | "This creates regulatory risk" |
| Developer | "This is hard to maintain or extend" |
**Map the stakeholders:** Who is directly affected? Who is indirectly affected? Who has authority over the solution?
### Step 3: Define User Needs and Pain Points
Move from "what users say they want" to "what users actually need."
```
User says: "I need a report with 20 columns"
User needs: "I need to identify which customers are at risk of churning"
These may have completely different solutions.
```
**Techniques:**
- User journey mapping: walk through the current process step by step
- Pain point identification: where is friction, delay, error, confusion?
- Job-to-be-done framing: "When [situation], I want to [motivation], so I can [outcome]"
### Step 4: Clarify Requirements and Constraints
**Functional requirements:** What must the system do? (In user terms, not technical terms)
- "Users can export invoices as PDF"
- "The system sends a confirmation email within 5 minutes"
**Non-functional requirements:** How must the system behave?
- Performance: "Export completes in under 5 seconds for 90% of users"
- Security: "Only authenticated users can view their own invoices"
- Availability: "Export feature available 99.9% of the time"
**Constraints:** What is fixed and non-negotiable?
- Budget/time
- Existing systems that must be integrated
- Regulatory requirements
- Technology choices already made
### Step 5: Clarify Scope
Explicitly define what is **in scope** and what is **out of scope**. Out-of-scope items are often as important as in-scope items for preventing scope creep.
**Template:**
```
In scope:
- [Feature A]
- [Feature B]
Out of scope (for this version):
- [Feature C] — will be addressed in v2
- [Feature D] — not part of this problem
Assumptions requiring validation:
- [Assumption 1]
- [Assumption 2]
```
### Step 6: Define Success Criteria
Success criteria must be specific and testable. Vague criteria are not criteria.
```
Vague: "Make the dashboard faster"
Specific and testable:
- Dashboard LCP < 2.5s on a 4G connection (measured with Lighthouse)
- Initial data load completes in < 500ms (measured from API call to render)
- Page is usable within 3s even if non-critical data hasn't loaded
```
**Success criteria format:** `[Metric] [comparison] [threshold] [measurement method]`
### Step 7: Identify Risks and Unknowns
| Category | Questions to ask |
|----------|-----------------|
| Technical | Do we know how to build this? What might surprise us? |
| Business | Does the market actually want this? Is the problem real? |
| Regulatory | Are there compliance implications? |
| Operational | Will this create new operational burden? |
| Integration | What external systems does this depend on? |
## Output: Problem Analysis Document
The analysis should produce a `problem-analysis.md` document covering:
```markdown
# Problem Analysis: [Feature/System Name]
## Problem Statement
[One paragraph: what problem exists, for whom, and why it matters]
## Context
[How this problem arose, what triggered the need to solve it now]
## Stakeholders
| Stakeholder | Role | Primary concern |
|------------|------|----------------|
| [Name/role] | [Decision maker/user/affected] | [Their core need] |
## User Needs and Pain Points
- [Pain point 1]: [Specific friction or failure today]
- [Pain point 2]: ...
## Functional Requirements
(In user terms, not technical terms)
- [ ] [Requirement 1]
- [ ] [Requirement 2]
## Non-Functional Requirements
- Performance: [specific, measurable]
- Security: [specific]
- Availability: [specific]
## Scope
**In scope:** [list]
**Out of scope:** [list]
## Assumptions Requiring Validation
- [Assumption 1]: [How to validate]
- [Assumption 2]: ...
## Risks and Unknowns
| Risk | Impact | Likelihood | Mitigation |
|------|--------|-----------|-----------|
| [Risk 1] | High/Med/Low | High/Med/Low | [Strategy] |
## Success Criteria
- [Metric]: [threshold] ([measurement method])
- [Metric]: [threshold] ([measurement method])
## Open Questions
- [Question requiring answer before implementation begins]
```
## Mathias Context
As a digital PM building production software:
- **Capture the why:** The problem document should capture business context, not just technical requirements. Agents reading it should understand why this matters.
- **Think about the user journey:** Walk through the entire experience, not just the feature in isolation.
- **Challenge the requirement:** Is this the right problem to solve? Is there a simpler solution?
- **Estimate impact:** How many users are affected? How often? What's the cost of not solving it?
## Transition to Next Steps
After the problem is understood:
1. Load `user-stories` skill to decompose into implementable stories
2. Load `spec-driven-dev` skill to write a technical specification
3. Load `planning` skill to break into tasks
**Do NOT jump to implementation.** A complete problem analysis prevents the most expensive mistakes.
## Red Flags — When Analysis Is Incomplete
- Requirements stated in technical terms ("add an index") instead of user terms ("make search faster")
- No clear success criteria ("it should be better")
- No stakeholders identified
- Assumptions not written down
- Scope not defined
- "We know what we need to build" — this is usually premature

368
refactoring/SKILL.md Normal file
View File

@@ -0,0 +1,368 @@
---
name: refactoring
description: Systematic refactoring methodology based on Martin Fowler's catalog. Apply during the REFACTOR phase of TDD, never while making code changes that modify behavior.
---
# Refactoring
## Core Principle
> "Refactoring is a controlled technique for improving the design of existing code. Its essence is applying a series of small behavior-preserving transformations." — Martin Fowler
**The First Rule:** Never refactor without tests. Tests are your safety net. If tests don't exist, write them first (load `tdd` skill).
**The Second Rule:** Each refactoring step must keep tests green. If a step breaks tests, undo it.
**Never mix refactoring and behavior change.** Separate commits for each.
## When to Refactor
- After the GREEN phase of TDD (REFACTOR phase)
- When code smell makes a feature hard to add
- When the Boy Scout Rule demands it — you're in the area, leave it better
- When technical debt is blocking velocity
**When NOT to refactor:**
- Code that works and won't change
- Code being replaced soon
- When you don't have tests
## The Process
### Step 1: Ensure Test Coverage
```bash
go test ./... # All tests pass first
go test -race ./... # Race detector clean
```
If tests don't exist, write them before refactoring. See the `tdd` skill.
### Step 2: Identify the Target
Load the `code-review` skill to detect smells. Prioritize:
1. **Architectural smells** — Shotgun Surgery, Divergent Change, God Objects
2. **Design smells** — Long Methods, Feature Envy, Primitive Obsession
3. **Readability smells** — naming, comments, duplication
### Step 3: Apply Techniques in Small Steps
Each technique below is one small step. Make the change. Run tests. Commit if green. Repeat.
### Step 4: Verify After Each Step
```bash
go test ./...
go test -race ./...
```
If red: undo the step (git checkout), understand why, try a smaller step.
## Technique Catalog
### Composing Methods
**Extract Function**
When: a code block can be named meaningfully; a function does multiple things.
```go
// Before
func processOrder(o *Order) {
// validate
if o.Items == nil { panic("no items") }
// calculate
total := 0.0
for _, item := range o.Items {
total += item.Price
}
// save
db.Save(o, total)
}
// After: each step extracted
func processOrder(o *Order) {
validateOrder(o)
total := calculateTotal(o.Items)
saveOrder(o, total)
}
```
Cross-reference: when naming extracted functions, load `clean-code` skill for naming conventions.
**Inline Function**
When: a function's body is as clear as its name; a function only does what its name says.
**Extract Variable**
When: a complex expression is hard to read inline.
```go
// Before
if user.Subscription.Level >= 2 && !user.IsBanned && user.VerifiedAt != nil { ... }
// After
canAccessPremium := user.Subscription.Level >= 2 && !user.IsBanned && user.VerifiedAt != nil
if canAccessPremium { ... }
```
**Replace Temp with Query**
When: a temporary variable holds the result of an expression used once.
### Moving Features Between Types
**Move Method**
When: Feature Envy detected — a method uses another type's data more than its own.
```go
// Before: Order envies Customer
func (o *Order) getShippingCost() float64 {
if o.Customer.Country == "SE" { return 0 }
return 50
}
// After: moved to Customer
func (c *Customer) ShippingCost() float64 {
if c.Country == "SE" { return 0 }
return 50
}
```
**Extract Type**
When: a cluster of methods and fields have a different responsibility than the rest of the type.
**Inline Type**
When: a type isn't doing enough to justify its existence (Lazy Class smell).
**Hide Delegate**
When: callers access objects through chains (`a.B().C()`).
**Remove Middle Man**
When: a type only delegates without adding value.
### Organizing Data
**Replace Data Value with Object**
When: Primitive Obsession detected — a string is being used for email, user ID, currency.
```go
// Before
type User struct {
Email string
}
// After: Email type carries validation and prevents misuse
type Email struct {
value string
}
func NewEmail(s string) (Email, error) {
if !strings.Contains(s, "@") {
return Email{}, fmt.Errorf("invalid email: %q", s)
}
return Email{value: s}, nil
}
type User struct {
Email Email
}
```
**Replace Magic Number with Constant**
```go
// Before
if retries > 3 { ... }
// After
const maxRetries = 3
if retries > maxRetries { ... }
```
### Simplifying Conditionals
**Decompose Conditional**
When: complex conditionals obscure intent.
```go
// Before
if plan.ExpiresAt.Before(time.Now()) && plan.GracePeriodDays > 0 { ... }
// After
if plan.IsExpiredWithGrace() { ... }
```
**Replace Nested Conditional with Guard Clauses**
When: deep nesting; use early returns instead.
```go
// Before: arrow code
func process(o *Order) error {
if o != nil {
if len(o.Items) > 0 {
if o.Customer != nil {
// actual logic
}
}
}
return nil
}
// After: guard clauses
func process(o *Order) error {
if o == nil {
return errors.New("nil order")
}
if len(o.Items) == 0 {
return errors.New("empty order")
}
if o.Customer == nil {
return errors.New("no customer")
}
// actual logic
return nil
}
```
**Replace Conditional with Polymorphism**
When: repeated switch/if-else on a type string. Replace with an interface.
See the `solid` skill, OCP section for full example.
**Introduce Null Object**
When: repeated nil checks for a dependency.
### Simplifying Function Calls
**Rename Method/Variable**
The most powerful refactoring. When you understand what something does, give it a name that reflects that understanding.
Cross-reference: load `clean-code` skill for naming conventions.
**Introduce Parameter Object**
When: Long Parameter List smell — multiple parameters that always appear together.
```go
// Before
func Search(query string, page, pageSize int, sortBy string, ascending bool) Results { ... }
// After
type SearchOptions struct {
Query string
Page int
PageSize int
SortBy string
Ascending bool
}
func Search(opts SearchOptions) Results { ... }
```
**Separate Query from Modifier (Command-Query Separation)**
When: a function both changes state and returns data.
```go
// Bad: modifies AND returns
func (s *Stack) PopAndReturn() (Item, bool) {
// removes and returns
}
// Good: separate
func (s *Stack) Peek() (Item, bool) { ... } // query only
func (s *Stack) Pop() bool { ... } // command only
```
### Dealing with Generalization
**Extract Interface**
When: you have multiple implementations or want to enable testing with a test double.
```go
// Before: depends on concrete type
type Service struct {
db *postgres.DB
}
// After: depends on interface
type Store interface {
Save(ctx context.Context, u User) error
}
type Service struct {
store Store
}
```
**Replace Inheritance with Delegation (Go: Embedding → Explicit Delegation)**
When: embedding is used but only some methods of the embedded type are needed.
## Refactoring Sequence Dependencies
Apply in this order to avoid breaking changes:
1. **Extract Variable** → then Extract Function → then Move Function
2. **Encapsulate Field** → before other data refactorings
3. **Extract Function** → before Move Function
4. **Rename** → before Extract Interface
## Go-Specific Refactoring Notes
### Extracting interfaces
In Go, define the interface at the point of use, not at the implementation:
```go
// The service package defines what it needs
package service
type UserStore interface {
GetByID(ctx context.Context, id UserID) (User, error)
}
```
### Error handling during refactoring
When extracting functions, preserve the error wrapping chain:
```go
// Extracted function must wrap errors with its context
func validateOrder(o *Order) error {
if len(o.Items) == 0 {
return errors.New("order has no items")
}
return nil
}
// Caller wraps the context
if err := validateOrder(o); err != nil {
return fmt.Errorf("process order: %w", err)
}
```
### Struct options pattern (for long parameter lists)
Go idiom for optional parameters:
```go
type ServerOption func(*Server)
func WithTimeout(d time.Duration) ServerOption {
return func(s *Server) { s.timeout = d }
}
func NewServer(addr string, opts ...ServerOption) *Server {
s := &Server{addr: addr, timeout: 30 * time.Second}
for _, opt := range opts {
opt(s)
}
return s
}
```
## Refactoring Commit Strategy
Refactoring commits should be separate from feature commits:
- `refactor: extract validateOrder from processOrder`
- `refactor: replace string email with Email value type`
- `refactor: rename UserData to UserProfile`
Never combine: `feat: add discount + refactor: extract discount calculator`
## Verification Checklist
After each refactoring session:
- [ ] `go test ./...` passes
- [ ] `go test -race ./...` passes
- [ ] No behavior change (same inputs produce same outputs)
- [ ] Committed with a `refactor:` prefix commit message
- [ ] Smells addressed are documented (if doing a review)
## Cross-References
- When applying Extract Function: load `clean-code` skill for naming conventions
- When introducing interfaces: load `solid` skill for DIP and ISP guidance
- When detecting smells: load `code-review` skill
- Source: Martin Fowler, *Refactoring: Improving the Design of Existing Code* (2nd ed.)
- Full catalog: https://refactoring.guru/refactoring

View File

@@ -0,0 +1,134 @@
---
name: session-retrospective
description: Review a completed coding session (from brain/sessions/*.jsonl) and surface non-obvious learnings worth preserving. Use after a session ends, before context is lost.
---
# Session Retrospective
## Overview
A session retrospective reads the structured log of a completed session — every skill invocation, what was attempted, what failed, what passed, how long it took — and identifies what is worth preserving in the brain for future sessions. The output is a small list of learnings, each with enough context that someone reading them cold understands why they matter.
**Core principle:** A retrospective that surfaces only obvious patterns is a retrospective that wastes the brain. Filter aggressively for non-obvious.
## Iron Laws
1. **Do not call `brain_write` from this skill.** Surface candidates only — the caller decides what to write.
2. **Run `brain_query` first** to avoid surfacing learnings that are already in the brain.
3. **Every Recommendation must generalize beyond this session.** Project-specific paths and identifiers belong in Context, not Recommendation.
## When to Use
- After a coding session ends, before the operator switches context
- After a multi-day push on the same area, summarizing the whole stretch
- When the brain is feeling stale on a project area — re-running retrospective on older sessions can re-surface forgotten context
**Do not use for:**
- Sessions with no novel content (single-attempt passes, mechanical operations only)
- Sessions less than ~30 minutes long, unless something dramatic happened in them
## Inputs
The session log is in JSON Lines format. Each project that uses the brain stores logs in its own `brain/sessions/` directory (for hyperguild, that is `~/dev/AI/hyperguild/brain/sessions/<session-id>.jsonl`). Each line is one event:
```json
{"ts": "...", "phase": "tdd_red", "skill": "tdd", "outcome": "fail", "notes": "..."}
```
Either pass the session ID explicitly, or default to the most recent file.
## What is Worth Preserving
- **Patterns that worked** — approaches clean enough to repeat next time, especially if non-obvious from the codebase alone
- **Failures that revealed something** — bugs whose root cause taught you something about the codebase, the toolchain, or the approach
- **Decisions made during the session** — architectural, structural, tooling choices that future-you would want to know the reasoning behind
- **Anything that contradicts or extends established patterns** — these are the highest-value learnings
## What is NOT Worth Preserving
- Routine red-green-refactor cycles with no surprises
- Single-attempt passes with no interesting context
- Mechanical operations (file moves, renames, formatting fixes)
- Failures whose cause was a typo
- Anything that is already in `CLAUDE.md` or an existing memory file
**Decision procedure:** If the answer would be discoverable by reading CLAUDE.md, the codebase, or standard toolchain docs, drop it. If you are still unsure, ask: "would future-me, three months from now, want this to surface when querying the brain?" If no, drop it.
## Output Format
Respond in markdown. For each learning worth preserving:
```
**Learning:** One sentence describing what was learned.
**Context:** Why this session surfaced it — what made it non-obvious.
**Recommendation:** What should be done differently or repeated going forward.
```
End with a summary line: `N learnings worth writing to brain` or `No novel learnings in this session.`
The caller (operator or another agent) decides which learnings to actually write to the brain via `brain_write`. This skill does not write directly.
## Worked Example
Input session log shows: a 3-hour session implementing a JSON ingestion pipeline. The session log shows three failed attempts at handling escape characters in LLM-generated JSON, then a passing fix that introduced a `repairJSON` helper.
Output:
```
**Learning:** LLM-generated JSON containing markdown can produce invalid escape sequences (e.g. `\_` and `\#`) that break the standard JSON parser even when the structure is correct.
**Context:** The first three TDD cycles failed because the test corpus included markdown-flavored content from the LLM and the parser rejected the entire payload. The fourth cycle introduced a `repairJSON` pre-pass that normalizes escapes before parsing. Without this context, future sessions will rediscover the same failure mode.
**Recommendation:** When ingesting LLM output as JSON, route it through `repairJSON` before `json.Unmarshal`. Add this to the ingestion-pipeline reference doc.
**Learning:** Integration tests with `defer resp.Body.Close()` will fail CI's errcheck even when local `go test` is green.
**Context:** Task 8 of the brain MCP migration shipped this pattern; CI caught it 30 minutes later. The codebase enforces `defer func() { _ = resp.Body.Close() }()` to satisfy errcheck.
**Recommendation:** Implementer subagents must run `task check` (lint + test + vet) per task, not just `go test`. (Note: this learning has since been written to memory — a real run of this skill would now find it via `brain_query` and drop the candidate. Kept here as illustration.)
2 learnings worth writing to brain
```
## Anti-Patterns
| Anti-Pattern | Why It Fails |
|---|---|
| "Lots of tests passed today" | Outcome summary, not a learning. Brain queries on this return nothing useful. |
| "Refactored the parser" | Activity log, not a learning. What was learned by doing it? |
| Listing every failure as a learning | Most failures are typos or mechanical fixes. Filter to the ones with insight. |
| Project-specific paths in the recommendation | Recommendations should generalize where possible — "ingestion pipelines" not "internal/ingest/v3/parser.go". |
| Writing to brain directly from this skill | This skill surfaces candidates only. The caller decides what to write. |
## Brain MCP Integration
The brain holds prior session learnings; checking it prevents this retrospective from re-surfacing what is already preserved.
**At retrospective start:**
- Run `brain_query` with the session's main topic to surface recent related learnings. Avoid emitting a "learning" that is already in the brain.
**After retrospective output:**
- This skill does NOT call `brain_write`. The caller does, selectively.
### Logging
Call `session_log` once at the end of every phase to record the outcome.
Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which
treats `pass` as success, `fail` as failure, `skip` as neither.
**At end of each phase:**
- `session_log` with `{skill: "session-retrospective", phase: "<phase-name>", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
**Phases for this skill:** surfaced
**Status semantics:**
- `pass` — the phase's intended outcome was reached (candidates were emitted).
- `fail` — the phase's intended outcome was NOT reached (retrospective could not produce coherent candidates).
- `skip` — phase was skipped intentionally (no novel learnings worth surfacing).
**Why this matters:** the routing pod (Plan 6) reads pass-rate to decide whether to route a future call to a local model. If your skill never logs, the routing pod sees no data.
## Mode 2 Routing Note
Reading long session logs and pulling themes is high-volume, mechanical pattern-recognition work — a strong Mode 2 routing candidate. Until Plan 6 ships the routing pod, treat as Mode 1 only.
## Cross-References
- Load `trainer` skill when you have more than ~5 candidates, or when the retrospective output feels too generous and needs a stricter quality gate before any of it goes into the brain. Trainer is the stricter cousin of this skill.
- Pair with `brain_query` calls for prior session learnings on the same topic to avoid duplication.

259
solid/SKILL.md Normal file
View File

@@ -0,0 +1,259 @@
---
name: solid
description: Apply SOLID design principles in Go. Use during architecture decisions, design reviews, and when adding new abstractions.
---
# SOLID Principles
## Overview
SOLID helps structure software to be flexible, maintainable, and testable. These principles reduce coupling and increase cohesion.
In Go, several principles manifest differently than in classical OOP languages. Where Go idioms conflict with OOP-centric SOLID, **Go wins**. The tension is noted explicitly.
## Quick Reference
| Principle | One-Liner | Question to Ask |
|-----------|-----------|-----------------|
| **S**RP | One reason to change | "Does this type have ONE reason to change?" |
| **O**CP | Open for extension, closed for modification | "Can I extend without modifying?" |
| **L**SP | Subtypes are substitutable | "Can any implementation replace another safely?" |
| **I**SP | Small, focused interfaces | "Are clients forced to depend on unused methods?" |
| **D**IP | Depend on abstractions | "Do I accept interfaces, not concrete types?" |
## S — Single Responsibility Principle
> "A class should have one, and only one, reason to change."
In Go: a **type** or **package** should have one reason to change. A package that handles both domain logic and database queries has two reasons to change.
**Detection:**
- Can you describe the type's responsibility without using "and"?
- Would different stakeholders (product, ops, DBA) request changes to different parts?
**Go example:**
```go
// Bad: multiple responsibilities
type UserHandler struct {}
func (h *UserHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { ... }
func (h *UserHandler) SaveToDatabase(u User) error { ... }
func (h *UserHandler) SendWelcomeEmail(u User) error { ... }
// Good: single responsibility each
type UserHandler struct {
store UserStore
mailer Mailer
}
type PostgresUserStore struct { db *sql.DB }
func (s *PostgresUserStore) Save(ctx context.Context, u User) error { ... }
type SMTPMailer struct { client *smtp.Client }
func (m *SMTPMailer) SendWelcome(ctx context.Context, email string) error { ... }
```
## O — Open/Closed Principle
> "Software entities should be open for extension but closed for modification."
In Go: add new behavior by implementing an interface or adding a new type, not by modifying existing code. The canonical Go pattern is a `switch` on a string type that needs a new case every time — replace it with an interface.
**Go example:**
```go
// Bad: must modify to add new payment method
func ProcessPayment(method string, amount Money) error {
switch method {
case "stripe":
return stripeCharge(amount)
case "paypal":
return paypalCharge(amount)
// Must add cases here for every new method!
}
return errors.New("unknown payment method")
}
// Good: open for extension via new types
type PaymentProcessor interface {
Charge(ctx context.Context, amount Money) error
}
type StripeProcessor struct { client *stripe.Client }
func (p *StripeProcessor) Charge(ctx context.Context, amount Money) error { ... }
type PayPalProcessor struct { client *paypal.Client }
func (p *PayPalProcessor) Charge(ctx context.Context, amount Money) error { ... }
// Add new payment method: implement PaymentProcessor. No existing code changes.
```
## L — Liskov Substitution Principle
> "Subtypes must be substitutable for their base types without altering program correctness."
In Go: **interfaces are structural**. Any type that implements the method set satisfies the interface. This means callers should be able to use any implementation interchangeably without knowing the difference.
The key question: does your implementation honor the **contract** (documented behavior, not just method signatures)?
**Go example:**
```go
// UserStore contract: Save persists a user and returns ErrDuplicate if email exists
type UserStore interface {
Save(ctx context.Context, u User) error
GetByEmail(ctx context.Context, email string) (User, error)
}
// Good: both implementations honor the contract
type PostgresUserStore struct { ... }
func (s *PostgresUserStore) Save(ctx context.Context, u User) error { ... }
type InMemoryUserStore struct { users map[string]User }
func (s *InMemoryUserStore) Save(ctx context.Context, u User) error {
if _, exists := s.users[u.Email]; exists {
return ErrDuplicate // Must return ErrDuplicate, not some other error
}
s.users[u.Email] = u
return nil
}
// Bad: violates contract — callers of UserStore cannot substitute this
type BrokenStore struct { ... }
func (s *BrokenStore) Save(ctx context.Context, u User) error {
panic("not implemented") // Violates contract
}
```
**Go tension:** Go has no inheritance, so "refused bequest" (subclass ignoring parent methods) doesn't apply. The LSP concern in Go is about interface implementations that partially satisfy the contract through panics or no-ops.
## I — Interface Segregation Principle
> "Clients should not be forced to depend on methods they do not use."
In Go: this is idiomatic. The `io.Reader`, `io.Writer`, and `io.Closer` pattern is the model — small, focused interfaces that can be composed when needed.
**Go example:**
```go
// Bad: fat interface — callers that only read must depend on write methods too
type FileStore interface {
Read(name string) ([]byte, error)
Write(name string, data []byte) error
Delete(name string) error
List(dir string) ([]string, error)
Stats(name string) (FileInfo, error)
}
// Good: segregated interfaces — callers depend only on what they need
type FileReader interface {
Read(name string) ([]byte, error)
}
type FileWriter interface {
Write(name string, data []byte) error
}
type FileDeleter interface {
Delete(name string) error
}
// Compose when needed
type FileReadWriter interface {
FileReader
FileWriter
}
// Handler that only reads: depends on FileReader only
func NewReportHandler(store FileReader) *ReportHandler { ... }
```
**Go idiom:** Define interfaces at the point of use, not at the point of implementation. The implementation package should not define the interface — the package that consumes it should.
```go
// In the consumer package (handler)
type UserStore interface {
GetByID(ctx context.Context, id UserID) (User, error)
}
// The postgres package doesn't need to know about this interface
// It just implements the method, and Go's structural typing handles the rest
```
## D — Dependency Inversion Principle
> "High-level modules should not depend on low-level modules. Both should depend on abstractions."
In Go: pass dependencies as interface parameters. Never instantiate concrete dependencies inside a function or type that contains business logic.
```go
// Bad: high-level order service depends on low-level email implementation
type OrderService struct {
emailClient *sendgrid.Client // Locked to SendGrid
db *sql.DB // Locked to PostgreSQL
}
// Good: depends on abstractions
type Mailer interface {
Send(ctx context.Context, to, subject, body string) error
}
type OrderRepository interface {
Save(ctx context.Context, o Order) error
GetByID(ctx context.Context, id OrderID) (Order, error)
}
type OrderService struct {
repo OrderRepository
mailer Mailer
}
func NewOrderService(repo OrderRepository, mailer Mailer) *OrderService {
return &OrderService{repo: repo, mailer: mailer}
}
```
**The Dependency Rule:** Source code dependencies point **inward** toward domain logic, never outward toward infrastructure.
```
HTTP handlers → Application services → Domain types
Database layer → Application services → Domain types
Domain types know nothing about HTTP, SQL, or external APIs.
```
## Applying SOLID at Architecture Level
| Principle | Package/Module Application |
|-----------|---------------------------|
| SRP | Each package has one clear purpose |
| OCP | New features = new packages/types, not edits to existing |
| LSP | All implementations of an interface are interchangeable |
| ISP | Interfaces defined at point of use, as narrow as possible |
| DIP | Domain packages import nothing from infrastructure packages |
## Go Tensions with OOP-centric SOLID
| OOP SOLID | Go reality |
|-----------|-----------|
| Inheritance hierarchies for OCP | Use interfaces + new types instead |
| Abstract base classes for LSP | No inheritance; use interface contracts |
| Explicit interface declarations | Interfaces are implicit; define where consumed |
| "Program to an interface" as ritual | Only extract interface when you have 2+ implementations or need testability |
**Don't over-abstract.** A function that takes a `*sql.DB` directly is fine if there's only one implementation and it's never tested in isolation. Extract an interface when you need it.
## Red Flags
| Flag | Likely Violation |
|------|-----------------|
| Type that "handles X and Y and Z" | SRP |
| Large `switch` on a type string | OCP |
| Implementation that panics on some methods | LSP |
| Interface with 10+ methods | ISP |
| `new(ConcreteType)` inside business logic | DIP |
| Package imports something from `infrastructure/` | DIP |
## References
- `references/solid-principles.md` — canonical SOLID reference with TypeScript examples
- `references/go-adaptation.md` — this workspace's Go-specific rewrite of each principle
- Load `clean-code` skill for naming and structure
- Load `code-review` skill for detecting violations during review

View File

@@ -0,0 +1,318 @@
# Software Architecture
## The Goal of Architecture
Enable the development team to:
1. **Add** features with minimal friction
2. **Change** existing features safely
3. **Remove** features cleanly
4. **Test** features in isolation
5. **Deploy** independently when possible
## Architectural Principles
### 1. Vertical Boundaries (Features/Slices)
Organize by **feature**, not by technical layer.
```
BAD: Layer-first
src/
controllers/
UserController.ts
OrderController.ts
services/
UserService.ts
OrderService.ts
repositories/
UserRepository.ts
OrderRepository.ts
GOOD: Feature-first
src/
users/
UserController.ts
UserService.ts
UserRepository.ts
orders/
OrderController.ts
OrderService.ts
OrderRepository.ts
```
**Why:** Changes to "users" feature stay in `users/`. High cohesion within features.
### 2. Horizontal Boundaries (Layers)
Separate concerns into layers with clear dependencies.
```
┌──────────────────────────────────────┐
│ Presentation │ UI, Controllers, CLI
├──────────────────────────────────────┤
│ Application │ Use Cases, Orchestration
├──────────────────────────────────────┤
│ Domain │ Business Logic, Entities
├──────────────────────────────────────┤
│ Infrastructure │ Database, APIs, External
└──────────────────────────────────────┘
```
### 3. The Dependency Rule
**Dependencies point INWARD.**
```
Infrastructure → Application → Domain
↓ ↓ ↓
(outer) (middle) (inner)
```
- Inner layers know NOTHING about outer layers
- Domain has zero dependencies on infrastructure
- Use interfaces to invert dependencies
```typescript
// Domain defines the interface (inner)
interface UserRepository {
save(user: User): Promise<void>;
findById(id: UserId): Promise<User | null>;
}
// Infrastructure implements it (outer)
class PostgresUserRepository implements UserRepository {
save(user: User): Promise<void> {
// SQL here
}
}
// Domain service uses the interface
class UserService {
constructor(private repo: UserRepository) {} // Depends on abstraction
}
```
### 4. Contracts
Interfaces define boundaries between components.
```typescript
// The contract
interface PaymentGateway {
charge(amount: Money, card: CardDetails): Promise<ChargeResult>;
refund(chargeId: string): Promise<RefundResult>;
}
// Multiple implementations possible
class StripeGateway implements PaymentGateway { }
class PayPalGateway implements PaymentGateway { }
class MockGateway implements PaymentGateway { } // For tests
```
### 5. Cross-Cutting Concerns
Concerns that span multiple features: logging, auth, validation, error handling.
**Options:**
- Middleware/interceptors
- Decorators
- Aspect-oriented approaches
- Base classes (use sparingly)
```typescript
// Middleware approach
class LoggingMiddleware {
handle(request: Request, next: Handler): Response {
console.log(`Request: ${request.path}`);
const response = next(request);
console.log(`Response: ${response.status}`);
return response;
}
}
```
### 6. Conway's Law
> "Organizations design systems that mirror their communication structure."
**Implication:** Team structure affects architecture. Align both intentionally.
---
## Common Architectural Styles
### Layered Architecture
Traditional layers: Presentation → Business → Persistence
**Pros:** Simple, well-understood
**Cons:** Can become a "big ball of mud" without discipline
### Hexagonal Architecture (Ports & Adapters)
Domain at center, adapters around the edges.
```
┌─────────────────────┐
│ HTTP Adapter │
└─────────┬───────────┘
┌─────────────────▼─────────────────┐
│ DOMAIN │
│ ┌─────────────────────────┐ │
│ │ Business Logic │ │
│ │ Use Cases │ │
│ └─────────────────────────┘ │
└─────────────────┬─────────────────┘
┌─────────▼───────────┐
│ Database Adapter │
└─────────────────────┘
```
**Ports:** Interfaces defined by the domain
**Adapters:** Implementations that connect to the outside world
### Clean Architecture
Similar to Hexagonal, with explicit layers:
1. **Entities** - Enterprise business rules
2. **Use Cases** - Application business rules
3. **Interface Adapters** - Controllers, Presenters, Gateways
4. **Frameworks & Drivers** - Web, DB, External interfaces
---
## Feature-Driven Structure (Frontend)
```
src/
features/
auth/
components/
LoginForm.tsx
SignupForm.tsx
hooks/
useAuth.ts
services/
authService.ts
types/
auth.types.ts
index.ts # Public API
checkout/
components/
hooks/
services/
types/
index.ts
shared/
components/ # Truly shared UI
hooks/ # Truly shared hooks
utils/ # Truly shared utilities
```
---
## Feature-Driven Structure (Backend)
```
src/
modules/
users/
domain/
User.ts
UserRepository.ts # Interface
application/
CreateUser.ts # Use case
GetUser.ts # Use case
infrastructure/
PostgresUserRepo.ts
presentation/
UserController.ts
UserDTO.ts
orders/
domain/
application/
infrastructure/
presentation/
shared/
domain/ # Shared value objects
infrastructure/ # Shared infra utilities
```
---
## The Walking Skeleton
Start with a minimal end-to-end slice:
1. **Thinnest possible feature** that touches all layers
2. **Deployable** from day one
3. **Proves the architecture** works
Example walking skeleton for e-commerce:
- User can view ONE product (hardcoded)
- User can add it to cart
- User can "checkout" (just logs)
From there, flesh out each feature fully.
---
## Testing Architecture
```
┌────────────────────────────────────────────┐
│ E2E / Acceptance Tests │ Few, slow, high confidence
├────────────────────────────────────────────┤
│ Integration Tests │ Some, medium speed
├────────────────────────────────────────────┤
│ Unit Tests │ Many, fast, isolated
└────────────────────────────────────────────┘
```
**Test by layer:**
- **Domain:** Unit tests (most tests here)
- **Application:** Integration tests with mocked infra
- **Infrastructure:** Integration tests with real dependencies
- **E2E:** Critical paths only
---
## Architecture Decision Records (ADRs)
Document significant decisions:
```markdown
# ADR 001: Use PostgreSQL for persistence
## Status
Accepted
## Context
We need a database. Options: PostgreSQL, MongoDB, MySQL
## Decision
PostgreSQL for:
- ACID compliance
- Team familiarity
- JSON support for flexibility
## Consequences
- Need PostgreSQL expertise
- Schema migrations required
- Excellent query capabilities
```
---
## Red Flags in Architecture
- **Circular dependencies** between modules
- **Domain depending on infrastructure**
- **Framework code in business logic**
- **No clear boundaries** between features
- **Shared mutable state** across modules
- **"Util" or "Common" packages** that grow forever
- **Database schema driving domain model**

View File

@@ -0,0 +1,286 @@
# Managing Complexity
## The Two Types of Complexity
### Essential Complexity
Inherent to the problem domain. Cannot be removed, only managed.
- Business rules
- Domain logic
- User requirements
### Accidental Complexity
Introduced by our solutions. CAN and SHOULD be minimized.
- Poor abstractions
- Unnecessary indirection
- Framework ceremony
- Technical debt
**Goal: Minimize accidental complexity while clearly expressing essential complexity.**
---
## Detecting Complexity
### 1. Change Amplification
Small changes require touching many files.
**Symptom:** "To add this field, I need to update 15 files."
**Cause:** Scattered responsibilities, poor abstraction boundaries.
### 2. Cognitive Load
Code is hard to understand, requires holding too much in memory.
**Symptom:** "I need to understand 10 other classes to understand this one."
**Cause:** Tight coupling, hidden dependencies, unclear naming.
### 3. Unknown Unknowns
Behavior is surprising, side effects are hidden.
**Symptom:** "I changed this, and something completely unrelated broke."
**Cause:** Global state, hidden dependencies, implicit contracts.
---
## The XP Values for Fighting Complexity
From Extreme Programming:
### 1. Communication
Code should communicate clearly. Names, structure, tests all contribute.
### 2. Simplicity
Do the simplest thing that could possibly work.
### 3. Feedback
Fast feedback loops catch complexity early. TDD, CI, code review.
### 4. Courage
Refactor aggressively. Don't let complexity accumulate.
### 5. Respect
Respect future readers (including yourself). Write for humans first.
---
## KISS - Keep It Simple, Silly
> "The simplest solution that works is usually the best."
### How to Apply:
1. Start with the obvious solution
2. Only add complexity when REQUIRED
3. Prefer boring, well-understood approaches
4. Question every abstraction
```typescript
// Over-engineered
class UserServiceFactoryProvider {
private static instance: UserServiceFactoryProvider;
static getInstance(): UserServiceFactoryProvider { ... }
createFactory(): UserServiceFactory { ... }
}
// KISS
class UserService {
getUser(id: string): User { ... }
}
```
---
## YAGNI - You Aren't Gonna Need It
> "Don't build features until they're actually needed."
### Warning Signs:
- "We might need this later"
- "It would be nice to have"
- "Just in case"
- "For future extensibility"
### The Cost of YAGNI Violations:
1. **Development time** - Building unused features
2. **Maintenance burden** - Code that must be maintained
3. **Cognitive load** - More to understand
4. **Wrong abstraction** - Guessing future needs incorrectly
```typescript
// YAGNI violation: Building for hypothetical needs
class User {
// "We might need these someday"
middleName?: string;
secondaryEmail?: string;
faxNumber?: string;
linkedinProfile?: string;
twitterHandle?: string;
}
// YAGNI: Only what's needed NOW
class User {
name: string;
email: Email;
}
```
---
## DRY - Don't Repeat Yourself (with The Rule of Three)
> "Every piece of knowledge should have a single, unambiguous representation."
### BUT: The Rule of Three
**Don't extract duplication until you see it THREE times.**
Why? The wrong abstraction is worse than duplication.
```
Duplication #1 → Leave it
Duplication #2 → Note it, leave it
Duplication #3 → NOW extract it
```
### Example:
```typescript
// First time - leave it
function processUserOrder(order) {
validate(order);
calculateTax(order);
save(order);
}
// Second time - note the similarity, but leave it
function processGuestOrder(order) {
validate(order);
calculateTax(order);
save(order);
sendGuestEmail(order);
}
// Third time - NOW extract
function processCorporateOrder(order) {
validate(order);
calculateTax(order);
save(order);
applyCorporateDiscount(order);
}
// After three, extract the common parts
function processOrder(order: Order, postProcessing: (o: Order) => void) {
validate(order);
calculateTax(order);
save(order);
postProcessing(order);
}
```
---
## Separation of Concerns
> "Each module should address a single concern."
### Concerns to Separate:
- **Business logic** vs **Infrastructure**
- **What** (policy) vs **How** (mechanism)
- **Input** vs **Processing** vs **Output**
- **Data** vs **Behavior**
### Example:
```typescript
// BAD: Mixed concerns
class OrderProcessor {
process(order: Order) {
// Validation
if (!order.items.length) throw new Error('Empty');
// Business logic
let total = 0;
for (const item of order.items) {
total += item.price * item.quantity;
}
// Persistence
const db = new Database();
db.query(`INSERT INTO orders...`);
// Notification
const email = new EmailClient();
email.send(order.customer.email, 'Order confirmed');
}
}
// GOOD: Separated concerns
class OrderProcessor {
constructor(
private validator: OrderValidator,
private calculator: OrderCalculator,
private repository: OrderRepository,
private notifier: OrderNotifier
) {}
process(order: Order): ProcessResult {
this.validator.validate(order);
const total = this.calculator.calculateTotal(order);
const savedOrder = this.repository.save(order);
this.notifier.notifyConfirmation(savedOrder);
return ProcessResult.success(savedOrder);
}
}
```
---
## Managing Technical Debt
### Types of Technical Debt:
1. **Deliberate** - Conscious trade-off for speed
2. **Accidental** - Mistakes, lack of knowledge
3. **Bit rot** - Code degrades over time
### The Boy Scout Rule:
> "Leave the code better than you found it."
Every time you touch code:
- Improve one small thing
- Fix one naming issue
- Extract one method
- Add one missing test
### When to Pay Down Debt:
- When it's in your path (you're already there)
- When it's blocking new features
- When it's causing bugs
- During dedicated refactoring time
### When NOT to Refactor:
- Code that works and won't change
- Code being replaced soon
- When you don't have tests
---
## The Four Elements of Simple Design
In priority order (from XP):
1. **Runs all the tests**
- If it doesn't work, nothing else matters
2. **Expresses intent**
- Clear names, obvious structure
- Code tells the story
3. **No duplication**
- DRY (but Rule of Three)
- Single source of truth
4. **Minimal**
- Fewest classes and methods possible
- Remove anything unnecessary
If these four are true, the design is simple enough.

View File

@@ -0,0 +1,504 @@
# Design Patterns
## What Are Design Patterns?
Reusable solutions to common design problems. A shared vocabulary for discussing design.
## WARNING: Don't Force Patterns
> "Let patterns emerge from refactoring, don't force them upfront."
Patterns should solve problems you HAVE, not problems you MIGHT have.
## When to Use Patterns
1. **You recognize the problem** - You've seen it before
2. **The pattern fits** - Not forcing it
3. **It simplifies** - Doesn't add unnecessary complexity
4. **Team understands it** - Shared knowledge
---
## Creational Patterns
### Singleton
**Purpose:** Ensure only one instance exists.
**When to use:** Global configuration, connection pools, logging.
**Warning:** Often overused. Consider dependency injection instead.
```typescript
class Logger {
private static instance: Logger;
private constructor() {}
static getInstance(): Logger {
if (!Logger.instance) {
Logger.instance = new Logger();
}
return Logger.instance;
}
log(message: string): void { ... }
}
```
### Factory
**Purpose:** Create objects without specifying exact class.
**When to use:** Object creation logic is complex, or varies by type.
```typescript
interface Notification {
send(message: string): void;
}
class EmailNotification implements Notification { ... }
class SMSNotification implements Notification { ... }
class PushNotification implements Notification { ... }
class NotificationFactory {
create(type: 'email' | 'sms' | 'push'): Notification {
switch (type) {
case 'email': return new EmailNotification();
case 'sms': return new SMSNotification();
case 'push': return new PushNotification();
}
}
}
```
### Builder
**Purpose:** Construct complex objects step by step.
**When to use:** Objects with many optional parameters, test data creation.
```typescript
class UserBuilder {
private user: Partial<User> = {};
withName(name: string): UserBuilder {
this.user.name = name;
return this;
}
withEmail(email: string): UserBuilder {
this.user.email = email;
return this;
}
withAge(age: number): UserBuilder {
this.user.age = age;
return this;
}
build(): User {
return new User(
this.user.name!,
this.user.email!,
this.user.age
);
}
}
// Usage
const user = new UserBuilder()
.withName('Alice')
.withEmail('alice@example.com')
.build();
```
### Prototype
**Purpose:** Create new objects by cloning existing ones.
**When to use:** Object creation is expensive, or you need copies with slight variations.
```typescript
interface Prototype {
clone(): Prototype;
}
class Document implements Prototype {
constructor(
public title: string,
public content: string,
public metadata: Metadata
) {}
clone(): Document {
return new Document(
this.title,
this.content,
{ ...this.metadata }
);
}
}
```
---
## Structural Patterns
### Adapter
**Purpose:** Make incompatible interfaces work together.
**When to use:** Integrating third-party libraries, legacy code.
```typescript
// Third-party library with different interface
class OldPaymentAPI {
makePayment(cents: number): boolean { ... }
}
// Our interface
interface PaymentGateway {
charge(amount: Money): ChargeResult;
}
// Adapter
class OldPaymentAdapter implements PaymentGateway {
constructor(private oldAPI: OldPaymentAPI) {}
charge(amount: Money): ChargeResult {
const cents = amount.toCents();
const success = this.oldAPI.makePayment(cents);
return success ? ChargeResult.success() : ChargeResult.failed();
}
}
```
### Decorator
**Purpose:** Add behavior to objects dynamically.
**When to use:** Adding features without modifying existing code.
```typescript
interface Notifier {
send(message: string): void;
}
class EmailNotifier implements Notifier {
send(message: string): void {
console.log(`Email: ${message}`);
}
}
// Decorators
class SMSDecorator implements Notifier {
constructor(private wrapped: Notifier) {}
send(message: string): void {
this.wrapped.send(message);
console.log(`SMS: ${message}`);
}
}
class SlackDecorator implements Notifier {
constructor(private wrapped: Notifier) {}
send(message: string): void {
this.wrapped.send(message);
console.log(`Slack: ${message}`);
}
}
// Usage - compose behaviors
const notifier = new SlackDecorator(
new SMSDecorator(
new EmailNotifier()
)
);
notifier.send('Alert!'); // Sends to all three
```
### Proxy
**Purpose:** Control access to an object.
**When to use:** Lazy loading, access control, logging, caching.
```typescript
interface Image {
display(): void;
}
class RealImage implements Image {
constructor(private filename: string) {
this.loadFromDisk(); // Expensive
}
private loadFromDisk(): void { ... }
display(): void { ... }
}
// Lazy loading proxy
class ImageProxy implements Image {
private realImage: RealImage | null = null;
constructor(private filename: string) {}
display(): void {
if (!this.realImage) {
this.realImage = new RealImage(this.filename);
}
this.realImage.display();
}
}
```
### Composite
**Purpose:** Treat individual objects and compositions uniformly.
**When to use:** Tree structures, hierarchies (files/folders, UI components).
```typescript
interface Component {
getPrice(): number;
}
class Product implements Component {
constructor(private price: number) {}
getPrice(): number {
return this.price;
}
}
class Box implements Component {
private children: Component[] = [];
add(component: Component): void {
this.children.push(component);
}
getPrice(): number {
return this.children.reduce(
(sum, child) => sum + child.getPrice(),
0
);
}
}
// Usage
const smallBox = new Box();
smallBox.add(new Product(10));
smallBox.add(new Product(20));
const bigBox = new Box();
bigBox.add(smallBox);
bigBox.add(new Product(50));
console.log(bigBox.getPrice()); // 80
```
---
## Behavioral Patterns
### Strategy
**Purpose:** Define a family of algorithms, make them interchangeable.
**When to use:** Multiple ways to do something, switchable at runtime.
```typescript
interface PricingStrategy {
calculate(basePrice: number): number;
}
class RegularPricing implements PricingStrategy {
calculate(basePrice: number): number {
return basePrice;
}
}
class PremiumDiscount implements PricingStrategy {
calculate(basePrice: number): number {
return basePrice * 0.8; // 20% off
}
}
class BlackFriday implements PricingStrategy {
calculate(basePrice: number): number {
return basePrice * 0.5; // 50% off
}
}
class ShoppingCart {
constructor(private pricing: PricingStrategy) {}
calculateTotal(items: Item[]): number {
const base = items.reduce((sum, i) => sum + i.price, 0);
return this.pricing.calculate(base);
}
}
```
### Observer
**Purpose:** Notify multiple objects about state changes.
**When to use:** Event systems, pub/sub, reactive updates.
```typescript
interface Observer {
update(event: Event): void;
}
class EventEmitter {
private observers: Observer[] = [];
subscribe(observer: Observer): void {
this.observers.push(observer);
}
unsubscribe(observer: Observer): void {
this.observers = this.observers.filter(o => o !== observer);
}
notify(event: Event): void {
this.observers.forEach(o => o.update(event));
}
}
// Usage
class OrderService extends EventEmitter {
placeOrder(order: Order): void {
// Process order...
this.notify({ type: 'ORDER_PLACED', order });
}
}
class EmailService implements Observer {
update(event: Event): void {
if (event.type === 'ORDER_PLACED') {
this.sendConfirmation(event.order);
}
}
}
```
### Template Method
**Purpose:** Define algorithm skeleton, let subclasses override steps.
**When to use:** Common algorithm with varying steps.
```typescript
abstract class DataExporter {
// Template method - defines the algorithm
export(data: Data[]): void {
this.validate(data);
const formatted = this.format(data);
this.write(formatted);
this.notify();
}
// Common steps
private validate(data: Data[]): void { ... }
private notify(): void { ... }
// Steps to override
protected abstract format(data: Data[]): string;
protected abstract write(content: string): void;
}
class CSVExporter extends DataExporter {
protected format(data: Data[]): string {
return data.map(d => d.toCSV()).join('\n');
}
protected write(content: string): void {
fs.writeFileSync('export.csv', content);
}
}
class JSONExporter extends DataExporter {
protected format(data: Data[]): string {
return JSON.stringify(data);
}
protected write(content: string): void {
fs.writeFileSync('export.json', content);
}
}
```
### Command
**Purpose:** Encapsulate a request as an object.
**When to use:** Undo/redo, queuing, logging actions.
```typescript
interface Command {
execute(): void;
undo(): void;
}
class AddItemCommand implements Command {
constructor(
private cart: Cart,
private item: Item
) {}
execute(): void {
this.cart.add(this.item);
}
undo(): void {
this.cart.remove(this.item);
}
}
class CommandHistory {
private history: Command[] = [];
execute(command: Command): void {
command.execute();
this.history.push(command);
}
undo(): void {
const command = this.history.pop();
command?.undo();
}
}
```
---
## Pattern Awareness
### The Four-Dimensional Lens
When analyzing new code/libraries, ask:
1. **What problem does it solve?** (Creational, Structural, Behavioral)
2. **What scope?** (Object-level, Class-level, System-level)
3. **When is it applied?** (Compile-time, Runtime)
4. **How coupled?** (Tight, Loose)
This helps recognize patterns even in unfamiliar code.
---
## Anti-Patterns to Avoid
| Anti-Pattern | Problem | Solution |
|--------------|---------|----------|
| **God Object** | Class does everything | Split by responsibility |
| **Spaghetti Code** | Tangled, no structure | Refactor to layers |
| **Golden Hammer** | Using one pattern for everything | Match pattern to problem |
| **Premature Optimization** | Optimizing before needed | YAGNI, profile first |
| **Copy-Paste Programming** | Duplication | Extract, Rule of Three |

View File

@@ -0,0 +1,317 @@
# SOLID Principles: Go Adaptation
This document rewrites each SOLID principle with idiomatic Go examples. Where Go idioms conflict with OOP-centric formulations, Go wins. Tension is noted.
## S — Single Responsibility Principle in Go
**OOP formulation:** A class should have one reason to change.
**Go formulation:** A type or package should have one reason to change. Responsibility maps to the unit of deployment (package), not just types.
### Go example
```go
// Bad: Order type mixes domain, persistence, and notification
package order
type Order struct {
ID string
Items []Item
Customer Customer
}
func (o *Order) Save(db *sql.DB) error { ... } // persistence concern
func (o *Order) SendReceipt(smtp *mail.Client) error { ... } // notification concern
func (o *Order) CalculateTotal() Money { ... } // domain concern (correct)
// Good: each type/package has one reason to change
package order
// Domain type: only reason to change = business rules change
type Order struct {
ID OrderID
Items []Item
Customer Customer
}
func (o Order) Total() Money {
total := Money{}
for _, item := range o.Items {
total = total.Add(item.Price.Multiply(item.Quantity))
}
return total
}
package orderstore
// Persistence: only reason to change = storage mechanism changes
type Store struct { db *sql.DB }
func (s *Store) Save(ctx context.Context, o order.Order) error { ... }
package ordernotify
// Notification: only reason to change = notification channel changes
type Notifier struct { mailer Mailer }
func (n *Notifier) SendReceipt(ctx context.Context, o order.Order) error { ... }
```
### Package-level SRP
Package names should be nouns that describe one concept:
- `store`, `cache`, `handler`, `validator` — single responsibility
- `util`, `common`, `misc`, `helpers` — SRP violation waiting to happen
---
## O — Open/Closed Principle in Go
**OOP formulation:** Open for extension via subclassing, closed for modification.
**Go formulation:** Open for extension by implementing an interface or adding new types; closed for modification of existing types. No subclassing needed.
### Go example
```go
// Bad: adding a new discount type requires modifying existing code
func ApplyDiscount(order *Order, discountType string) Money {
switch discountType {
case "percentage":
return order.Total().Multiply(0.9)
case "fixed":
return order.Total().Subtract(Money{Amount: 10})
// Must add case here every time
}
return order.Total()
}
// Good: add new discount types by implementing the interface
type Discounter interface {
Apply(total Money) Money
}
type PercentageDiscount struct{ Percent float64 }
func (d PercentageDiscount) Apply(total Money) Money {
return total.Multiply(1 - d.Percent/100)
}
type FixedDiscount struct{ Amount Money }
func (d FixedDiscount) Apply(total Money) Money {
return total.Subtract(d.Amount)
}
// Adding SeniorDiscount requires zero changes to existing code
type SeniorDiscount struct{}
func (d SeniorDiscount) Apply(total Money) Money {
return total.Multiply(0.85)
}
func ApplyDiscount(order *Order, d Discounter) Money {
return d.Apply(order.Total())
}
```
### Go tension
Go has no inheritance, so "closed for modification" is natural — you can't subclass a concrete type to override behavior. The extension point is always an interface. If you're adding switch cases to handle new types, that's the signal to introduce an interface.
---
## L — Liskov Substitution Principle in Go
**OOP formulation:** Subtypes must be substitutable for their base types.
**Go formulation:** Any implementation of an interface must honor the interface's documented contract, not just its method signatures. Go's structural typing means LSP is enforced by convention and documentation, not the compiler.
### Contract documentation
```go
// UserStore: all implementations must honor this contract:
// - Save: persists user, returns ErrDuplicateEmail if email already exists
// - GetByEmail: returns ErrNotFound if user does not exist
// - Both methods must be safe for concurrent use
type UserStore interface {
Save(ctx context.Context, u User) error
GetByEmail(ctx context.Context, email string) (User, error)
}
var (
ErrDuplicateEmail = errors.New("duplicate email")
ErrNotFound = errors.New("not found")
)
// PostgresUserStore: honors contract
type PostgresUserStore struct { db *sql.DB }
func (s *PostgresUserStore) Save(ctx context.Context, u User) error {
_, err := s.db.ExecContext(ctx, "INSERT INTO users ...", u.Email)
if isUniqueViolation(err) {
return ErrDuplicateEmail // Returns documented error
}
return err
}
// InMemoryUserStore: honors contract (for tests)
type InMemoryUserStore struct {
mu sync.Mutex
users map[string]User
}
func (s *InMemoryUserStore) Save(ctx context.Context, u User) error {
s.mu.Lock()
defer s.mu.Unlock()
if _, exists := s.users[u.Email]; exists {
return ErrDuplicateEmail // Same documented error
}
s.users[u.Email] = u
return nil
}
```
### LSP violations to watch for
```go
// Bad: violates contract — does not return ErrDuplicateEmail
type CachedUserStore struct { ... }
func (s *CachedUserStore) Save(ctx context.Context, u User) error {
return errors.New("cache: duplicate key") // Different error type breaks callers
}
// Bad: panics on some inputs — violates contract
type LazyUserStore struct { ... }
func (s *LazyUserStore) GetByEmail(ctx context.Context, email string) (User, error) {
panic("not implemented yet") // Violates LSP
}
```
---
## I — Interface Segregation Principle in Go
**OOP formulation:** Clients should not depend on methods they do not use.
**Go formulation:** Define the narrowest interface possible at the point of use. Go's structural typing makes this natural — you don't need the implementation to declare what it implements.
### io.Reader as the canonical example
```go
// io.Reader is a single-method interface
type Reader interface {
Read(p []byte) (n int, err error)
}
// Any type with Read() satisfies this — os.File, bytes.Buffer, net.Conn, etc.
// Callers that only need to read accept Reader, not a fat interface
func parseConfig(r io.Reader) (Config, error) { ... }
```
### Define interfaces where you consume them
```go
// package report — only needs to read invoices
package report
// Define the interface here, at the point of use — not in the invoice package
type InvoiceReader interface {
GetByID(ctx context.Context, id InvoiceID) (Invoice, error)
ListByCustomer(ctx context.Context, customerID CustomerID) ([]Invoice, error)
}
func NewReporter(invoices InvoiceReader) *Reporter { ... }
// The invoice package's PostgresStore has 10+ methods
// This interface only exposes what the reporter needs
// Adding new methods to PostgresStore never forces changes to the reporter
```
### Composing interfaces
```go
type Reader interface {
Read(p []byte) (n int, err error)
}
type Writer interface {
Write(p []byte) (n int, err error)
}
// Compose at the call site, not in the definition
type ReadWriter interface {
Reader
Writer
}
```
---
## D — Dependency Inversion Principle in Go
**OOP formulation:** High-level modules should not depend on low-level modules. Both depend on abstractions.
**Go formulation:** Accept interfaces, not concrete types. Domain packages import nothing from infrastructure packages. Infrastructure packages import from domain packages.
### The dependency direction rule
```
cmd/ → internal/handler → internal/service → internal/domain
internal/store (postgres) → internal/domain
internal/mailer (smtp) → internal/domain
domain imports nothing from store, mailer, handler
```
### Constructor injection pattern
```go
// Good: domain service accepts interfaces
package service
type UserService struct {
store UserStore // interface
mailer Mailer // interface
logger *slog.Logger
}
// NewUserService constructs with injected dependencies.
// All parameters are interfaces — callers provide implementations.
func NewUserService(store UserStore, mailer Mailer, logger *slog.Logger) *UserService {
return &UserService{store: store, mailer: mailer, logger: logger}
}
// Bad: domain service imports infrastructure
package service
import "github.com/example/myapp/internal/store/postgres"
type UserService struct {
store *postgres.Store // Locked to PostgreSQL — breaks DIP
}
```
### Wire it up at the boundary
```go
// cmd/server/main.go — the composition root
func main() {
db := mustOpenDB(cfg.DatabaseURL)
store := postgres.NewUserStore(db)
mailer := smtp.NewMailer(cfg.SMTP)
logger := slog.Default()
svc := service.NewUserService(store, mailer, logger)
handler := handler.NewUserHandler(svc)
// ...
}
```
The `main` function (or a dependency injection framework) is the only place that names concrete implementations. All other code depends on interfaces.
### When NOT to extract an interface
Don't extract an interface prematurely. These are fine as concrete dependencies:
- `*slog.Logger` — no interface needed; it already accepts a Handler interface internally
- `*sql.DB` — acceptable in `store` packages; extract an interface at the service boundary
- Standard library types that are stable and have no test double need
Only extract an interface when you have:
1. Multiple implementations (real + test double), or
2. A package boundary you want to keep clean (domain knows nothing of postgres)

View File

@@ -0,0 +1,328 @@
# Object-Oriented Design
## Responsibility-Driven Design (RDD)
The key insight: **Objects are defined by their responsibilities, not their data.**
### Finding Objects
Start with:
1. **Nouns** in requirements → candidate objects
2. **Verbs** → candidate methods/behaviors
3. **Domain concepts** → value objects
### Finding Responsibilities
Each object should answer:
- What does this object **know**?
- What does this object **do**?
- What does this object **decide**?
### Object Stereotypes
Every class fits one (or maybe two) stereotypes:
| Stereotype | Purpose | Example |
|------------|---------|---------|
| **Information Holder** | Knows things, holds data | `User`, `Product`, `Address` |
| **Structurer** | Maintains relationships | `OrderItems`, `UserGroup` |
| **Service Provider** | Performs work | `PaymentProcessor`, `EmailSender` |
| **Coordinator** | Orchestrates workflow | `OrderFulfillmentService` |
| **Controller** | Makes decisions, delegates | `CheckoutController` |
| **Interfacer** | Transforms between systems | `UserAPIAdapter`, `DatabaseMapper` |
### The Two Questions
For every class, ask:
1. **"What pattern is this?"** - Which stereotype? Which design pattern?
2. **"Is it doing too much?"** - Check object calisthenics rules
If you can't answer clearly, the class needs refactoring.
---
## Tell, Don't Ask
**Command objects to do work. Don't interrogate them and do the work yourself.**
```typescript
// BAD: Asking, then doing
if (account.getBalance() >= amount) {
account.setBalance(account.getBalance() - amount);
// more logic here...
}
// GOOD: Telling
const result = account.withdraw(amount);
if (result.isSuccess()) {
// ...
}
```
The object that has the data should have the behavior.
---
## Design by Contract (DbC)
Every method has:
- **Preconditions** - What must be true BEFORE calling
- **Postconditions** - What will be true AFTER calling
- **Invariants** - What is ALWAYS true about the object
```typescript
class BankAccount {
private balance: Money;
// INVARIANT: balance is never negative
// PRECONDITION: amount > 0
// POSTCONDITION: balance decreased by amount OR error returned
withdraw(amount: Money): WithdrawResult {
if (amount.isNegativeOrZero()) {
return WithdrawResult.invalidAmount();
}
if (this.balance.isLessThan(amount)) {
return WithdrawResult.insufficientFunds();
}
this.balance = this.balance.minus(amount);
return WithdrawResult.success(this.balance);
}
}
```
---
## Composition Over Inheritance
**Prefer composing objects over extending classes.**
### Why Inheritance is Problematic:
- Tight coupling between parent and child
- Fragile base class problem
- Difficult to change parent without breaking children
- Forces "is-a" relationship that may not fit
### When to Use Inheritance:
- True "is-a" relationship (rare)
- Framework requirements
- Template Method pattern (intentional)
### Prefer Composition:
```typescript
// BAD: Inheritance
class PremiumUser extends User {
getDiscount(): number { return 20; }
}
// GOOD: Composition
class User {
constructor(private discountPolicy: DiscountPolicy) {}
getDiscount(): number {
return this.discountPolicy.calculate();
}
}
// Now discount behavior is pluggable
new User(new PremiumDiscount());
new User(new StandardDiscount());
new User(new NoDiscount());
```
---
## The Law of Demeter (Principle of Least Knowledge)
**Only talk to your immediate friends.**
A method should only call:
1. Methods on `this`
2. Methods on parameters
3. Methods on objects it creates
4. Methods on its direct components
```typescript
// BAD: Reaching through objects
order.getCustomer().getAddress().getCity();
// GOOD: Ask the immediate friend
order.getShippingCity();
```
This reduces coupling - changes to `Address` don't ripple through all callers.
---
## Encapsulation
**Hide internal details, expose behavior.**
### Levels of Encapsulation:
1. **Data** - private fields, no direct access
2. **Implementation** - how things work internally
3. **Type** - concrete class hidden behind interface
4. **Design** - architectural decisions hidden from clients
```typescript
// BAD: Exposed internals
class Order {
public items: Item[] = [];
public total: number = 0;
}
// Client can corrupt state
order.items.push(item);
order.total = -999; // Oops!
// GOOD: Encapsulated
class Order {
private items: OrderItems;
private total: Money;
addItem(item: Item): void {
this.items.add(item);
this.recalculateTotal();
}
getTotal(): Money {
return this.total; // Returns copy or immutable
}
}
```
---
## Polymorphism
**Replace conditionals with types.**
```typescript
// BAD: Type checking
function calculateShipping(method: string, value: number): number {
if (method === 'standard') return value < 50 ? 5 : 0;
if (method === 'express') return 15;
if (method === 'overnight') return 25;
throw new Error('Unknown method');
}
// GOOD: Polymorphism
interface ShippingMethod {
calculateCost(orderValue: number): number;
}
class StandardShipping implements ShippingMethod {
calculateCost(orderValue: number): number {
return orderValue < 50 ? 5 : 0;
}
}
class ExpressShipping implements ShippingMethod {
calculateCost(orderValue: number): number {
return 15;
}
}
// Usage - no conditionals
function calculateShipping(method: ShippingMethod, value: number): number {
return method.calculateCost(value);
}
```
---
## Value Objects vs Entities
### Value Objects
- Defined by their attributes (no identity)
- Immutable
- Comparable by value
- Examples: `Money`, `Email`, `Address`, `DateRange`
```typescript
class Money {
constructor(
private readonly amount: number,
private readonly currency: string
) {}
equals(other: Money): boolean {
return this.amount === other.amount &&
this.currency === other.currency;
}
add(other: Money): Money {
if (this.currency !== other.currency) {
throw new CurrencyMismatch();
}
return new Money(this.amount + other.amount, this.currency);
}
}
```
### Entities
- Have identity (survives attribute changes)
- Usually mutable (via methods)
- Comparable by identity
- Examples: `User`, `Order`, `Product`
```typescript
class User {
constructor(
private readonly id: UserId,
private email: Email,
private name: Name
) {}
equals(other: User): boolean {
return this.id.equals(other.id); // Identity comparison
}
changeEmail(newEmail: Email): void {
this.email = newEmail; // Still same user
}
}
```
---
## Aggregates
A cluster of objects treated as a single unit for data changes.
- One object is the **aggregate root** (entry point)
- External code only references the root
- Root enforces invariants for the entire cluster
```typescript
// Order is the aggregate root
class Order {
private items: OrderItem[] = [];
// All access through the root
addItem(product: Product, quantity: number): void {
const item = new OrderItem(product, quantity);
this.items.push(item);
this.validateTotal();
}
removeItem(itemId: ItemId): void {
this.items = this.items.filter(i => !i.id.equals(itemId));
}
// Root enforces invariants
private validateTotal(): void {
if (this.calculateTotal().exceeds(MAX_ORDER_VALUE)) {
throw new OrderTotalExceeded();
}
}
}
// BAD: Accessing items directly
order.items.push(new OrderItem(...)); // Bypasses validation!
// GOOD: Through the root
order.addItem(product, 2); // Validation happens
```

View File

@@ -0,0 +1,262 @@
# SOLID Principles
## Overview
SOLID helps structure software to be flexible, maintainable, and testable. These principles reduce coupling and increase cohesion.
## S - Single Responsibility Principle (SRP)
> "A class should have one, and only one, reason to change."
### Problem It Solves
God objects that do everything - hard to test, hard to change, hard to understand.
### How to Apply
Each class handles ONE responsibility. If you find yourself saying "and" when describing what a class does, split it.
```typescript
// BAD: Multiple responsibilities
class Order {
calculateTotal(): number { ... }
saveToDatabase(): void { ... } // Persistence
generateInvoice(): string { ... } // Presentation
}
// GOOD: Single responsibility each
class Order {
private items: OrderItem[] = [];
addItem(item: OrderItem): void { ... }
calculateTotal(): number { ... }
}
class OrderRepository {
save(order: Order): Promise<void> { ... }
}
class InvoiceGenerator {
generate(order: Order): Invoice { ... }
}
```
### Detection Questions
- Does this class have multiple reasons to change?
- Can I describe it without using "and"?
- Would different stakeholders request changes to different parts?
---
## O - Open/Closed Principle (OCP)
> "Software entities should be open for extension but closed for modification."
### Problem It Solves
Having to modify existing, tested code every time requirements change. Risk of breaking working features.
### How to Apply
Design abstractions that allow new behavior through new classes, not edits to existing ones.
```typescript
// BAD: Must modify to add new shipping
class ShippingCalculator {
calculate(type: string, value: number): number {
if (type === 'standard') return value < 50 ? 5 : 0;
if (type === 'express') return 15;
// Must add more ifs for new types!
}
}
// GOOD: Open for extension
interface ShippingMethod {
calculateCost(orderValue: number): number;
}
class StandardShipping implements ShippingMethod {
calculateCost(orderValue: number): number {
return orderValue < 50 ? 5 : 0;
}
}
class ExpressShipping implements ShippingMethod {
calculateCost(orderValue: number): number {
return 15;
}
}
// Add new shipping by creating new class, not modifying existing
class SameDayShipping implements ShippingMethod {
calculateCost(orderValue: number): number {
return 25;
}
}
```
### Architectural Insight
OCP at architecture level means: **design your codebase so new features are added by adding code, not changing existing code.**
---
## L - Liskov Substitution Principle (LSP)
> "Subtypes must be substitutable for their base types without altering program correctness."
### Problem It Solves
Subclasses that break expectations, requiring type-checking and special cases.
### How to Apply
Subclasses must honor the contract of the parent. If the parent returns positive numbers, subclasses cannot return negatives.
```typescript
// BAD: Violates parent's contract
class DiscountPolicy {
getDiscount(value: number): number {
return 0; // Non-negative expected
}
}
class WeirdDiscount extends DiscountPolicy {
getDiscount(value: number): number {
return -5; // Increases cost! Breaks expectations
}
}
// GOOD: Enforces contract
class DiscountPolicy {
constructor(private discount: number) {
if (discount < 0) throw new Error("Discount must be non-negative");
}
getDiscount(): number {
return this.discount;
}
}
```
### Key Insight
This is why you can swap `InMemoryUserRepo` for `PostgresUserRepo` - they both honor the `UserRepo` interface contract.
---
## I - Interface Segregation Principle (ISP)
> "Clients should not be forced to depend on methods they do not use."
### Problem It Solves
Fat interfaces that force partial implementations, empty methods, or throws.
### How to Apply
Split large interfaces into smaller, cohesive ones. Clients depend only on what they need.
```typescript
// BAD: Fat interface
interface WarehouseDevice {
printLabel(orderId: string): void;
scanBarcode(): string;
packageItem(orderId: string): void;
}
class BasicPrinter implements WarehouseDevice {
printLabel(orderId: string): void { /* works */ }
scanBarcode(): string { throw new Error("Not supported"); } // Forced!
packageItem(orderId: string): void { throw new Error("Not supported"); }
}
// GOOD: Segregated interfaces
interface LabelPrinter {
printLabel(orderId: string): void;
}
interface BarcodeScanner {
scanBarcode(): string;
}
interface ItemPackager {
packageItem(orderId: string): void;
}
class BasicPrinter implements LabelPrinter {
printLabel(orderId: string): void { /* only what it does */ }
}
```
### Detection
If you see `throw new Error("Not implemented")` or empty method bodies, the interface is too fat.
---
## D - Dependency Inversion Principle (DIP)
> "High-level modules should not depend on low-level modules. Both should depend on abstractions."
### Problem It Solves
Tight coupling to specific implementations (databases, APIs, frameworks). Hard to test, hard to swap.
### How to Apply
Depend on interfaces, inject implementations.
```typescript
// BAD: Direct dependency on concrete class
class OrderService {
private emailService = new SendGridEmailService(); // Locked in!
confirmOrder(email: string): void {
this.emailService.send(email, "Order confirmed");
}
}
// GOOD: Depend on abstraction
interface EmailService {
send(to: string, message: string): void;
}
class OrderService {
constructor(private emailService: EmailService) {}
confirmOrder(email: string): void {
this.emailService.send(email, "Order confirmed");
}
}
// Now can inject any implementation
new OrderService(new SendGridEmailService());
new OrderService(new SESEmailService());
new OrderService(new MockEmailService()); // For tests!
```
### The Dependency Rule
Source code dependencies should point **inward** toward high-level policies (domain logic), never toward low-level details (infrastructure).
```
Infrastructure → Application → Domain
↑ ↑ ↑
(outer) (middle) (inner)
Dependencies flow: outer → inner
Never: inner → outer
```
---
## Applying SOLID at Architecture Level
These principles scale beyond classes:
| Principle | Architecture Application |
|-----------|--------------------------|
| SRP | Each bounded context has one responsibility |
| OCP | New features = new modules, not edits to existing |
| LSP | Microservices with same contract are substitutable |
| ISP | Thin interfaces between services |
| DIP | High-level business logic doesn't know about databases/frameworks |
---
## Quick Reference
| Principle | One-Liner | Red Flag |
|-----------|-----------|----------|
| SRP | One reason to change | "This class handles X and Y and Z" |
| OCP | Add, don't modify | `if/else` chains for types |
| LSP | Subtypes are substitutable | Type-checking in calling code |
| ISP | Small, focused interfaces | Empty method implementations |
| DIP | Depend on abstractions | `new ConcreteClass()` in business logic |

299
spec-driven-dev/SKILL.md Normal file
View File

@@ -0,0 +1,299 @@
---
name: spec-driven-dev
description: Write a structured specification before writing any code. Use when starting a new project, feature, or significant change. Adapted for a PM-first workflow where why comes before how.
---
# Spec-Driven Development
## Overview
Write a structured specification before writing any code. The spec is the shared source of truth — it defines what we're building, why it matters, and how we'll know it's done.
**Code without a spec is guessing.**
A spec doesn't need to be long. A two-paragraph spec beats no spec. The value is in forcing clarity before code is written, not in the length of the document.
## Mathias PM Context
As a digital product manager building software:
- **Why before how:** The spec must capture the business context and user need before technical decisions. Agents reading the spec should understand *why this matters*, not just *what to build*.
- **Explicit success criteria:** Vague requirements produce vague results. Every spec must have testable success criteria.
- **Surfaces assumptions:** The spec's primary job is to surface misunderstandings before they become expensive code.
- **Living document:** Update the spec when decisions change. An outdated spec is still better than no spec.
## When to Use
**Always create a spec when:**
- Starting a new project or feature
- Requirements are ambiguous or incomplete
- The change touches multiple files or modules
- You're about to make an architectural decision
- The task would take more than a day to implement
**When NOT to use:** Single-line fixes, typos, or changes where requirements are unambiguous and self-contained.
## The Gated Workflow
Do not advance to the next phase without validation at each gate.
```
SPECIFY → [review] → PLAN → [review] → TASKS → [review] → IMPLEMENT
```
Each gate is a deliberate pause: does the next phase make sense given what we know?
## Phase 1: SPECIFY
### Surface Assumptions Immediately
Before writing spec content, list what you're assuming:
```
ASSUMPTIONS I'M MAKING:
1. This is a Go backend service (no frontend changes)
2. Authentication is handled by the existing middleware
3. The database is PostgreSQL (matching the rest of the stack)
4. The feature is used by authenticated users only, not public
→ Confirm or correct before I proceed.
```
Don't silently fill in ambiguous requirements.
### Write the Spec
A spec covers six areas:
**1. Objective — WHY are we building this?**
This is the most important section. It must answer:
- What user problem does this solve?
- Who is the user?
- What does success look like from the user's perspective?
- Why now?
```markdown
## Objective
Invoice importers at small accounting firms manually copy payment details
from PDF invoices into their banking system, taking 1020 minutes per invoice.
**User:** Invoice processor at an accounting firm (1050 invoices/day)
**Problem:** Manual data entry is slow, error-prone, and creates compliance risk
**Goal:** Reduce per-invoice processing time from ~15 minutes to < 2 minutes
Success: Invoice processor can extract and queue a payment from a PDF in under 2 minutes,
with confidence the data is correct.
```
**2. Commands — Full executable commands**
```
Build: task build
Test: task test (or: go test ./...)
Lint: task lint (or: golangci-lint run)
Dev: task dev
Deploy: task deploy:staging
```
**3. Project Structure — Where things live**
```
internal/
domain/ → Core types and interfaces
service/ → Business logic
store/ → Database implementations
handler/ → HTTP handlers
cmd/
server/ → Main entry point
```
**4. Code Style — One real example beats three paragraphs**
```go
// Error handling: always wrap with context
if err != nil {
return fmt.Errorf("parse invoice PDF: %w", err)
}
// Dependency injection: accept interfaces
func NewInvoiceService(store InvoiceStore, parser PDFParser) *InvoiceService { ... }
// Context: always first parameter for I/O operations
func (s *InvoiceService) ProcessPDF(ctx context.Context, r io.Reader) (Invoice, error) { ... }
```
**5. Testing Strategy**
```
Framework: testing package + testify
Locations: *_test.go files in same package
Unit tests: table-driven, in-memory implementations for stores
Fast path: go test ./... (unit tests only)
Full suite: go test -tags=integration ./... (includes DB tests)
Coverage: >80% for business logic packages
```
**6. Boundaries**
```
Always:
- Run go test ./... before committing
- Wrap errors with fmt.Errorf("context: %w", err)
- Pass ctx as first parameter to any I/O function
- Run govulncheck before adding new dependencies
Ask first:
- Schema changes
- Adding new external dependencies
- Changing public API contracts
- Performance changes that affect existing behavior
Never:
- Commit secrets or API keys
- Remove or skip failing tests
- Send client data to external APIs
- Use naked returns on errors
```
### Spec Template
```markdown
# Spec: [Feature/Project Name]
## Objective
[What we're building and why. User story or problem statement.]
[Who is the user? What does success look like from their perspective?]
## Tech Stack
[Language, key libraries, relevant existing infrastructure]
## Commands
Build: [full command]
Test: [full command]
Lint: [full command]
Dev: [full command]
## Project Structure
[Directory layout with descriptions]
## Code Style
[One real code example showing the patterns to follow]
## Testing Strategy
[Framework, test locations, what to unit test vs integration test]
## Boundaries
- Always: [...]
- Ask first: [...]
- Never: [...]
## Success Criteria
[Specific, testable conditions that define "done"]
- [ ] [Condition 1: metric/threshold/method]
- [ ] [Condition 2: ...]
## Open Questions
[Anything unresolved that needs input before implementation begins]
```
## Reframing Vague Requirements
When you receive a vague requirement, translate it into specific success criteria before writing any spec content:
```
Vague: "Make the invoice parser more reliable"
Reframed success criteria:
- Parser correctly extracts IBAN from 95% of Swedish invoice formats
- Parser correctly extracts total amount from 98% of tested invoices
- Parser returns a structured error (not a panic) for unrecognized formats
- Processing time < 2 seconds for PDFs up to 10MB
→ Are these the right targets?
```
## Phase 2: PLAN
With a validated spec, create a technical implementation plan:
1. Identify major components and their dependencies
2. Determine implementation order (foundations first)
3. Note risks and unknowns
4. Identify what can be built in parallel vs. what must be sequential
5. Define verification checkpoints between phases
The plan should be reviewable: anyone should be able to read it and say "yes, that's the right approach" or "no, change X."
Load the `planning` skill for detailed task breakdown.
## Phase 3: TASKS
Break the plan into discrete, implementable tasks. Load the `planning` skill for the full task breakdown methodology.
**Each task must have:**
- Acceptance criteria
- Verification step (test command, build, manual check)
- File count estimate (no task should touch more than ~5 files)
## Phase 4: IMPLEMENT
Execute tasks one at a time. For each task:
1. Load `tdd` skill — write failing tests first
2. Implement minimal code to pass
3. Load `clean-code` skill — refactor
## Keeping the Spec Alive
- **Update when decisions change** — spec first, then code
- **Update when scope changes**
- **Commit the spec** — it belongs in version control
- **Reference the spec in PRs** — link to the section each PR implements
## Common Rationalizations
| Rationalization | Reality |
|---|---|
| "This is simple, I don't need a spec" | Simple tasks still need acceptance criteria. A two-line spec is fine. |
| "I'll write the spec after" | That's documentation, not specification. The spec's value is forcing clarity *before* code. |
| "The spec will slow us down" | A 15-minute spec prevents hours of rework. |
| "Requirements will change anyway" | That's why the spec is a living document. |
| "The user knows what they want" | Even clear requests have implicit assumptions. The spec surfaces those. |
## Verification
Before starting implementation:
- [ ] Spec covers all six core areas
- [ ] Mathias has reviewed and approved the spec
- [ ] Success criteria are specific and testable
- [ ] Boundaries (Always/Ask First/Never) are defined
- [ ] Open questions are resolved or accepted as unknowns
- [ ] Spec is saved to a file in the repository
## Brain MCP Integration
### Logging
Call `session_log` once at the end of every phase to record the outcome.
Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which
treats `pass` as success, `fail` as failure, `skip` as neither.
**At end of each phase:**
- `session_log` with `{skill: "spec-driven-dev", phase: "<phase-name>", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
**Phases for this skill:** specify, plan, tasks, implement
**Status semantics:**
- `pass` — the phase's intended outcome was reached (gate passed).
- `fail` — the phase's intended outcome was NOT reached (gate blocked, rework required).
- `skip` — phase was skipped intentionally.
**Why this matters:** the routing pod (Plan 6) reads pass-rate to decide whether to route a future call to a local model. If your skill never logs, the routing pod sees no data.
## Cross-References
- Load `problem-analysis` skill for deep requirement understanding before speccing
- Load `user-stories` skill to decompose the spec into stories
- Load `planning` skill for task breakdown
- Load `feature-spec` skill once implementation begins, to scope individual features inside the project
- Load `tdd` skill during implementation

401
tdd/SKILL.md Normal file
View File

@@ -0,0 +1,401 @@
---
name: tdd
description: Write failing tests first, then minimal code to pass. Non-negotiable for all new features and bug fixes.
---
# Test-Driven Development (TDD)
## Overview
Write the test first. Watch it fail. Write minimal code to pass.
**Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing.
**Violating the letter of the rules is violating the spirit of the rules.**
## When to Use
**Always:**
- New features
- Bug fixes
- Refactoring
- Behavior changes
**Exceptions (ask Mathias):**
- Throwaway prototypes
- Generated code (sqlc output, templ output)
- Configuration files
Thinking "skip TDD just this once"? Stop. That's rationalization.
## The Iron Law
```
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
```
Write code before the test? Delete it. Start over.
**No exceptions:**
- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete
Implement fresh from tests. Period.
## Red-Green-Refactor
```
RED → verify fails correctly → GREEN → verify all pass → REFACTOR → stay green → RED (next)
```
### RED - Write Failing Test
Write one minimal test showing what should happen.
**Go example (good):**
```go
func TestRetryOperation_RetriesThreeTimes(t *testing.T) {
attempts := 0
op := func() error {
attempts++
if attempts < 3 {
return errors.New("fail")
}
return nil
}
err := RetryOperation(op, 3)
require.NoError(t, err)
assert.Equal(t, 3, attempts)
}
```
Clear name, tests real behavior, one thing.
**Go example (bad):**
```go
func TestRetry(t *testing.T) {
// Vague name, tests nothing meaningful
err := RetryOperation(nil, 0)
assert.NoError(t, err)
}
```
**Requirements:**
- One behavior per test
- Clear name describing the behavior
- Real code (no mocks unless crossing a system boundary)
### Verify RED - Watch It Fail
**MANDATORY. Never skip.**
```bash
go test -run TestRetryOperation_RetriesThreeTimes ./...
```
Confirm:
- Test fails (not compilation errors)
- Failure message is expected
- Fails because feature is missing (not typos)
**Test passes?** You're testing existing behavior. Fix test.
**Compilation errors?** Fix errors, re-run until it fails correctly.
### GREEN - Minimal Code
Write simplest code to pass the test.
**Good:**
```go
func RetryOperation(op func() error, maxRetries int) error {
for i := 0; i < maxRetries; i++ {
if err := op(); err == nil {
return nil
}
}
return op()
}
```
Just enough to pass.
**Bad:**
```go
func RetryOperation(op func() error, maxRetries int, opts ...RetryOption) error {
// YAGNI — don't add options, backoff, jitter before the test demands it
}
```
Over-engineered.
Don't add features, refactor other code, or "improve" beyond what the test demands.
### Verify GREEN - Watch It Pass
**MANDATORY.**
```bash
go test ./...
```
Confirm:
- Test passes
- All other tests still pass
- No race conditions: `go test -race ./...`
**Test fails?** Fix code, not test.
**Other tests fail?** Fix now.
### REFACTOR - Clean Up
After green only:
- Remove duplication
- Improve names
- Extract helpers
- Apply clean code principles (load `clean-code` skill)
Keep tests green. Don't add behavior.
### Repeat
Next failing test for next behavior.
## Go-Specific TDD Notes
### Table-Driven Tests (Preferred Pattern)
```go
func TestValidateEmail(t *testing.T) {
tests := []struct {
name string
email string
wantErr bool
}{
{"valid email", "user@example.com", false},
{"empty email", "", true},
{"no at sign", "userexample.com", true},
{"no domain", "user@", true},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
err := ValidateEmail(tt.email)
if tt.wantErr {
assert.Error(t, err)
} else {
assert.NoError(t, err)
}
})
}
}
```
Table-driven tests are the Go idiom. Use them for behavior that varies across inputs.
### Subtests with t.Run
Use `t.Run` to name subtests clearly. Failure messages include the subtest name.
```go
t.Run("rejects empty input", func(t *testing.T) { ... })
t.Run("accepts valid UUID", func(t *testing.T) { ... })
```
### Test File Conventions
- File: `<package>_test.go` in the same directory
- Package: `package foo_test` for black-box testing (preferred), `package foo` for white-box
- Helpers: use `t.Helper()` so stack traces point to the caller, not the helper
```go
func assertNoError(t *testing.T, err error) {
t.Helper()
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
}
```
### Running Tests
```bash
go test ./... # all packages
go test -run TestFoo ./pkg/... # specific test
go test -run TestFoo/subtest ./... # specific subtest
go test -race ./... # race detector (always run before commit)
go test -cover ./... # coverage
go test -v ./... # verbose
```
### testify
Pre-approved. Use `assert` (continues on failure) and `require` (stops on failure):
```go
require.NoError(t, err) // fatal if error
assert.Equal(t, expected, got) // non-fatal comparison
assert.ErrorIs(t, err, ErrFoo) // error chain check
```
## Good Tests
| Quality | Good | Bad |
|---------|------|-----|
| **Minimal** | One thing. "and" in name? Split it. | `TestValidatesEmailAndDomainAndWhitespace` |
| **Clear** | Name describes behavior | `TestFoo`, `Test1` |
| **Shows intent** | Demonstrates desired API | Obscures what code should do |
| **Table-driven** | Multiple cases, one test function | Copy-pasted test functions |
## Common Rationalizations
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
| "Test hard = design unclear" | Listen to the test. Hard to test = hard to use. |
| "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
| "Existing code has no tests" | You're improving it. Add tests for existing code. |
## Red Flags - STOP and Start Over
- Code before test
- Test after implementation
- Test passes immediately without seeing it fail
- Can't explain why test failed
- Tests added "later"
- Rationalizing "just this once"
- "Already manually tested it"
- "Tests after achieve the same purpose"
- "Keep as reference" or "adapt existing code"
- "Already spent X hours, deleting is wasteful"
- "TDD is dogmatic, I'm being pragmatic"
- "This is different because..."
**All of these mean: Delete code. Start over with TDD.**
## Example: Bug Fix
**Bug:** Empty email accepted
**RED**
```go
func TestSubmitForm_RejectsEmptyEmail(t *testing.T) {
result := submitForm(FormData{Email: ""})
assert.Equal(t, "email required", result.Error)
}
```
**Verify RED**
```bash
$ go test -run TestSubmitForm_RejectsEmptyEmail ./...
FAIL: expected "email required", got ""
```
**GREEN**
```go
func submitForm(data FormData) FormResult {
if strings.TrimSpace(data.Email) == "" {
return FormResult{Error: "email required"}
}
// ...
}
```
**Verify GREEN**
```bash
$ go test ./...
ok example.com/myapp 0.003s
```
**REFACTOR**
Extract validation for multiple fields if needed.
## Verification Checklist
Before marking work complete:
- [ ] Every new function/method has a test
- [ ] Watched each test fail before implementing
- [ ] Each test failed for expected reason (feature missing, not typo)
- [ ] Wrote minimal code to pass each test
- [ ] All tests pass: `go test ./...`
- [ ] Race detector clean: `go test -race ./...`
- [ ] Tests use real code (mocks only if crossing a system boundary)
- [ ] Edge cases and errors covered
Can't check all boxes? You skipped TDD. Start over.
## When Stuck
| Problem | Solution |
|---------|----------|
| Don't know how to test | Write wished-for API. Write assertion first. Ask Mathias. |
| Test too complicated | Design too complicated. Simplify interface. |
| Must mock everything | Code too coupled. Use dependency injection. |
| Test setup huge | Extract helpers with `t.Helper()`. Still complex? Simplify design. |
## Debugging Integration
Bug found? Write failing test reproducing it. Follow TDD cycle. Test proves fix and prevents regression.
Never fix bugs without a test.
## Testing Anti-Patterns
When adding test utilities or mocks, load the `references/testing-anti-patterns.md` to avoid:
- Testing mock behavior instead of real behavior
- Adding test-only methods to production types
- Mocking without understanding what the dependency does
## Brain MCP Integration
The brain MCP exposes session context across machines. Use it to make TDD cycles cumulative rather than one-shot.
**At session start:**
- Run `brain_query` with the feature name + "tdd" to surface prior cycles, anti-patterns, or testing decisions for this code area. Skip if the feature is brand new.
**Never:**
- Embed brain content in test code or assertions. The brain is context for you, not state for the system under test.
### Logging
Call `session_log` once at the end of every phase to record the outcome.
Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which
treats `pass` as success, `fail` as failure, `skip` as neither.
**At end of `red` phase:**
- `session_log` with `{skill: "tdd", phase: "red", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
**At end of `green` phase:**
- `session_log` with `{skill: "tdd", phase: "green", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
**At end of `refactor` phase:**
- `session_log` with `{skill: "tdd", phase: "refactor", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
**Status semantics:**
- `pass` — the phase's intended outcome was reached (red: test fails as expected; green: test passes; refactor: tests still pass after refactor).
- `fail` — the phase's intended outcome was NOT reached (test compiled when it shouldn't, test still fails after green attempt, refactor broke tests).
- `skip` — phase was skipped intentionally (e.g. refactor not warranted).
**Why this matters:** the routing pod (Plan 6) reads pass-rate to decide whether to route a future `tdd` call to a local model. If your skill never logs, the routing pod sees no data and may default-route or default-not-route in a way that doesn't reflect real performance.
## Final Rule
```
Production code → test exists and failed first
Otherwise → not TDD
```
No exceptions without Mathias's permission.
## Mode 2 Routing Note
This skill is invoked identically whether the agent is running in Mode 1 (cloud Claude, no routing) or Mode 2 (client-local, supervisor routing layer). The routing pod (Plan 6) does not exist yet; until it does, treat this skill as Mode 1 only. The discipline does not change between modes — only the model behind the call.

View File

@@ -0,0 +1,299 @@
# Testing Anti-Patterns
**Load this reference when:** writing or changing tests, adding mocks, or tempted to add test-only methods to production code.
## Overview
Tests must verify real behavior, not mock behavior. Mocks are a means to isolate, not the thing being tested.
**Core principle:** Test what the code does, not what the mocks do.
**Following strict TDD prevents these anti-patterns.**
## The Iron Laws
```
1. NEVER test mock behavior
2. NEVER add test-only methods to production classes
3. NEVER mock without understanding dependencies
```
## Anti-Pattern 1: Testing Mock Behavior
**The violation:**
```typescript
// ❌ BAD: Testing that the mock exists
test('renders sidebar', () => {
render(<Page />);
expect(screen.getByTestId('sidebar-mock')).toBeInTheDocument();
});
```
**Why this is wrong:**
- You're verifying the mock works, not that the component works
- Test passes when mock is present, fails when it's not
- Tells you nothing about real behavior
**your human partner's correction:** "Are we testing the behavior of a mock?"
**The fix:**
```typescript
// ✅ GOOD: Test real component or don't mock it
test('renders sidebar', () => {
render(<Page />); // Don't mock sidebar
expect(screen.getByRole('navigation')).toBeInTheDocument();
});
// OR if sidebar must be mocked for isolation:
// Don't assert on the mock - test Page's behavior with sidebar present
```
### Gate Function
```
BEFORE asserting on any mock element:
Ask: "Am I testing real component behavior or just mock existence?"
IF testing mock existence:
STOP - Delete the assertion or unmock the component
Test real behavior instead
```
## Anti-Pattern 2: Test-Only Methods in Production
**The violation:**
```typescript
// ❌ BAD: destroy() only used in tests
class Session {
async destroy() { // Looks like production API!
await this._workspaceManager?.destroyWorkspace(this.id);
// ... cleanup
}
}
// In tests
afterEach(() => session.destroy());
```
**Why this is wrong:**
- Production class polluted with test-only code
- Dangerous if accidentally called in production
- Violates YAGNI and separation of concerns
- Confuses object lifecycle with entity lifecycle
**The fix:**
```typescript
// ✅ GOOD: Test utilities handle test cleanup
// Session has no destroy() - it's stateless in production
// In test-utils/
export async function cleanupSession(session: Session) {
const workspace = session.getWorkspaceInfo();
if (workspace) {
await workspaceManager.destroyWorkspace(workspace.id);
}
}
// In tests
afterEach(() => cleanupSession(session));
```
### Gate Function
```
BEFORE adding any method to production class:
Ask: "Is this only used by tests?"
IF yes:
STOP - Don't add it
Put it in test utilities instead
Ask: "Does this class own this resource's lifecycle?"
IF no:
STOP - Wrong class for this method
```
## Anti-Pattern 3: Mocking Without Understanding
**The violation:**
```typescript
// ❌ BAD: Mock breaks test logic
test('detects duplicate server', () => {
// Mock prevents config write that test depends on!
vi.mock('ToolCatalog', () => ({
discoverAndCacheTools: vi.fn().mockResolvedValue(undefined)
}));
await addServer(config);
await addServer(config); // Should throw - but won't!
});
```
**Why this is wrong:**
- Mocked method had side effect test depended on (writing config)
- Over-mocking to "be safe" breaks actual behavior
- Test passes for wrong reason or fails mysteriously
**The fix:**
```typescript
// ✅ GOOD: Mock at correct level
test('detects duplicate server', () => {
// Mock the slow part, preserve behavior test needs
vi.mock('MCPServerManager'); // Just mock slow server startup
await addServer(config); // Config written
await addServer(config); // Duplicate detected ✓
});
```
### Gate Function
```
BEFORE mocking any method:
STOP - Don't mock yet
1. Ask: "What side effects does the real method have?"
2. Ask: "Does this test depend on any of those side effects?"
3. Ask: "Do I fully understand what this test needs?"
IF depends on side effects:
Mock at lower level (the actual slow/external operation)
OR use test doubles that preserve necessary behavior
NOT the high-level method the test depends on
IF unsure what test depends on:
Run test with real implementation FIRST
Observe what actually needs to happen
THEN add minimal mocking at the right level
Red flags:
- "I'll mock this to be safe"
- "This might be slow, better mock it"
- Mocking without understanding the dependency chain
```
## Anti-Pattern 4: Incomplete Mocks
**The violation:**
```typescript
// ❌ BAD: Partial mock - only fields you think you need
const mockResponse = {
status: 'success',
data: { userId: '123', name: 'Alice' }
// Missing: metadata that downstream code uses
};
// Later: breaks when code accesses response.metadata.requestId
```
**Why this is wrong:**
- **Partial mocks hide structural assumptions** - You only mocked fields you know about
- **Downstream code may depend on fields you didn't include** - Silent failures
- **Tests pass but integration fails** - Mock incomplete, real API complete
- **False confidence** - Test proves nothing about real behavior
**The Iron Rule:** Mock the COMPLETE data structure as it exists in reality, not just fields your immediate test uses.
**The fix:**
```typescript
// ✅ GOOD: Mirror real API completeness
const mockResponse = {
status: 'success',
data: { userId: '123', name: 'Alice' },
metadata: { requestId: 'req-789', timestamp: 1234567890 }
// All fields real API returns
};
```
### Gate Function
```
BEFORE creating mock responses:
Check: "What fields does the real API response contain?"
Actions:
1. Examine actual API response from docs/examples
2. Include ALL fields system might consume downstream
3. Verify mock matches real response schema completely
Critical:
If you're creating a mock, you must understand the ENTIRE structure
Partial mocks fail silently when code depends on omitted fields
If uncertain: Include all documented fields
```
## Anti-Pattern 5: Integration Tests as Afterthought
**The violation:**
```
✅ Implementation complete
❌ No tests written
"Ready for testing"
```
**Why this is wrong:**
- Testing is part of implementation, not optional follow-up
- TDD would have caught this
- Can't claim complete without tests
**The fix:**
```
TDD cycle:
1. Write failing test
2. Implement to pass
3. Refactor
4. THEN claim complete
```
## When Mocks Become Too Complex
**Warning signs:**
- Mock setup longer than test logic
- Mocking everything to make test pass
- Mocks missing methods real components have
- Test breaks when mock changes
**your human partner's question:** "Do we need to be using a mock here?"
**Consider:** Integration tests with real components often simpler than complex mocks
## TDD Prevents These Anti-Patterns
**Why TDD helps:**
1. **Write test first** → Forces you to think about what you're actually testing
2. **Watch it fail** → Confirms test tests real behavior, not mocks
3. **Minimal implementation** → No test-only methods creep in
4. **Real dependencies** → You see what the test actually needs before mocking
**If you're testing mock behavior, you violated TDD** - you added mocks without watching test fail against real code first.
## Quick Reference
| Anti-Pattern | Fix |
|--------------|-----|
| Assert on mock elements | Test real component or unmock it |
| Test-only methods in production | Move to test utilities |
| Mock without understanding | Understand dependencies first, mock minimally |
| Incomplete mocks | Mirror real API completely |
| Tests as afterthought | TDD - tests first |
| Over-complex mocks | Consider integration tests |
## Red Flags
- Assertion checks for `*-mock` test IDs
- Methods only called in test files
- Mock setup is >50% of test
- Test fails when you remove mock
- Can't explain why mock is needed
- Mocking "just to be safe"
## The Bottom Line
**Mocks are tools to isolate, not things to test.**
If TDD reveals you're testing mock behavior, you've gone wrong.
Fix: Test real behavior or question why you're mocking at all.

308
test-design/SKILL.md Normal file
View File

@@ -0,0 +1,308 @@
---
name: test-design
description: Evaluate test quality using Dave Farley's 8 Properties of Good Tests. Use when reviewing or writing tests to ensure they provide genuine verification.
---
# Test Design
## Overview
Good tests are investments. Bad tests are liabilities — they pass when they shouldn't, fail when code is correct, or verify nothing meaningful.
This skill uses Dave Farley's 8 Properties of Good Tests to assess and improve test quality. The **Farley Index** (010) provides a scored summary.
Reference: [Dave Farley's Properties of Good Tests](https://www.linkedin.com/pulse/tdd-properties-good-tests-dave-farley-iexge/)
## The 8 Properties
| Property | Weight | What it measures |
|----------|--------|-----------------|
| **Understandable** | 1.5x | Can a reader understand what behavior is being tested? |
| **Maintainable** | 1.5x | Will small code changes cause test failures unrelated to behavior? |
| **Repeatable** | 1.25x | Same result every time, regardless of environment or order |
| **Atomic** | 1.0x | One behavior per test; tests are independent |
| **Necessary** | 1.0x | Tests real behavior, not mock internals or framework behavior |
| **Granular** | 1.0x | Each test covers one specific case |
| **Fast** | 0.75x | Tests run quickly enough to support rapid TDD cycles |
| **First (TDD)** | 1.0x | Tests were written before implementation |
**Farley Index formula:** `(U×1.5 + M×1.5 + R×1.25 + A×1.0 + N×1.0 + G×1.0 + F×0.75 + T×1.0) / 9.0`
## Rating Scale
| Score | Rating | Interpretation |
|-------|--------|----------------|
| 9.010.0 | Exemplary | Model quality; tests serve as living documentation |
| 7.58.9 | Excellent | High quality with minor improvement opportunities |
| 6.07.4 | Good | Solid foundation with clear improvement areas |
| 4.55.9 | Fair | Functional but needs significant attention |
| 3.04.4 | Poor | Tests provide limited value; refactoring needed |
| 0.02.9 | Critical | Tests may be harmful; consider rewriting |
## Property Deep Dives
### Understandable (U)
A test should tell a story: what behavior, under what conditions, produces what result.
**Go patterns that help:**
- Subtest names in `t.Run`: `t.Run("returns error when email is empty", ...)`
- Table-driven tests with descriptive `name` fields
- Arrange-Act-Assert structure with blank lines separating sections
```go
// Good: clear behavior name, clear structure
func TestValidateUser_RejectsEmptyEmail(t *testing.T) {
// Arrange
user := User{Name: "Alice", Email: ""}
// Act
err := ValidateUser(user)
// Assert
require.Error(t, err)
assert.ErrorIs(t, err, ErrInvalidEmail)
}
// Bad: cryptic name, no structure
func TestUser1(t *testing.T) {
u := User{}
assert.NotNil(t, ValidateUser(u))
}
```
**Negative signals:** cryptic names (`test_1`, `TestFoo`), no AAA structure, multiple behaviors in one test.
### Maintainable (M)
Tests that break when implementation changes (but behavior doesn't) create noise and slow down development.
**Negative signals:**
- Over-specified mock interactions (`assert.Called(mock, "MethodX", args...)` when behavior is all that matters)
- ArgumentCaptor deep inspection
- `verifyNoMoreInteractions` that breaks when you add a logging call
- Tests coupled to internal field names
**Go patterns that help:**
- Test behavior via public API, not internal state
- Avoid asserting on exact call counts unless the count IS the behavior
```go
// Bad: breaks when you add an audit log call
mock.AssertCalled(t, "Save", user)
mock.AssertNumberOfCalls(t, "Save", 1)
mock.AssertNotCalled(t, "Log") // Breaks if you add logging later
// Good: test the outcome
result, err := service.CreateUser(ctx, req)
require.NoError(t, err)
assert.Equal(t, user.Email, result.Email)
```
### Repeatable (R)
Tests must produce the same result regardless of when, where, or in what order they run.
**Negative signals (Go):**
- `time.Now()` in test logic without injection
- `os.ReadFile` for fixtures that aren't hermetic
- Shared global state between tests
- Tests that depend on network availability
- `time.Sleep` for synchronization
**Go fixes:**
```go
// Bad: time-dependent
func TestTokenExpiry(t *testing.T) {
token := generateToken()
time.Sleep(2 * time.Second)
assert.True(t, token.IsExpired())
}
// Good: inject clock
type Clock interface {
Now() time.Time
}
type FixedClock struct{ t time.Time }
func (c FixedClock) Now() time.Time { return c.t }
func TestTokenExpiry(t *testing.T) {
clock := FixedClock{t: time.Unix(0, 0)}
token := generateTokenWithClock(clock)
futureClk := FixedClock{t: time.Unix(3600, 0)}
assert.True(t, token.IsExpiredAt(futureClk.Now()))
}
```
Use `t.TempDir()` for filesystem fixtures — cleaned up automatically.
### Atomic (A)
One test = one behavior. Tests must be independent — running in any order must produce the same result.
**Go patterns:**
- `t.Parallel()` on subtests forces isolation
- Fresh state in each `t.Run`
- No `init()` or package-level setup that leaks between tests
```go
func TestUserService(t *testing.T) {
tests := []struct {
name string
input CreateUserReq
wantErr bool
}{
{"valid user", validReq, false},
{"duplicate email", dupEmailReq, true},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
t.Parallel() // Each subtest runs independently
store := NewInMemoryStore() // Fresh state per test
svc := NewUserService(store)
_, err := svc.Create(context.Background(), tt.input)
if tt.wantErr {
assert.Error(t, err)
} else {
assert.NoError(t, err)
}
})
}
}
```
### Necessary (N)
Tests must verify real behavior. Tautology Theatre — tests whose outcome is predetermined regardless of production code — provides false confidence.
**Types of Tautology Theatre:**
1. **Mock tautology:** Configure mock return, then assert that mock returns it.
```go
// Bad: this passes even if production code is deleted
mockStore.On("GetUser", id).Return(user, nil)
result, _ := mockStore.GetUser(id)
assert.Equal(t, user, result) // Testing the mock, not production code
```
2. **Mock-only test:** Every object is a mock; no real class instantiated.
3. **Trivial tautology:** `assert.True(t, true)` or `assert.NotNil(t, new(User))`
4. **Framework test:** Verifying that Go's `make(map[string]int)` returns non-nil.
**Fix:** Test real behavior through real implementations. Use mocks only to isolate from external systems (DB, HTTP, filesystem).
### Granular (G)
Each test covers one specific case. Table-driven tests in Go are the natural expression of granularity.
```go
// Good: each row is one case, each can fail independently
func TestParseAmount(t *testing.T) {
tests := []struct {
name string
input string
want Amount
wantErr bool
}{
{"integer", "100", Amount{Value: 100}, false},
{"decimal", "10.50", Amount{Value: 1050, Scale: 2}, false},
{"negative", "-5", Amount{}, true},
{"empty", "", Amount{}, true},
{"non-numeric", "abc", Amount{}, true},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got, err := ParseAmount(tt.input)
if tt.wantErr {
require.Error(t, err)
return
}
require.NoError(t, err)
assert.Equal(t, tt.want, got)
})
}
}
```
### Fast (F)
Tests must run fast enough to support TDD cycles. Target: the full test suite in < 30 seconds for most projects.
**Go fixes:**
- Mark slow integration tests with build tags: `//go:build integration`
- Use `t.Parallel()` to parallelize safe tests
- Use `InMemoryStore` implementations instead of real DB for unit tests
- Use `httptest.NewServer` for HTTP tests instead of real servers
```bash
# Unit tests only (fast, default)
go test ./...
# Integration tests (slower, explicit)
go test -tags=integration ./...
```
**Negative signals:** `time.Sleep`, network calls without build tags, database calls in unit tests.
### First / TDD (T)
Evidence that tests were written before implementation. This is the hardest property to verify statically.
**Positive signals:**
- Commit history shows test commit before implementation commit
- Tests test behavior, not implementation details (tests-first forces API design)
- Tests are simpler than the implementation (tests-first keeps tests focused)
**Negative signals:**
- Tests that exactly mirror the implementation structure
- Tests that only cover happy paths (implementation-first misses edge cases)
- Tests added in the same commit as a large implementation
## Go-Specific Test Design Notes
### t.Helper()
Use `t.Helper()` in helper functions so stack traces point to the call site, not the helper:
```go
func assertValidUser(t *testing.T, u User) {
t.Helper()
assert.NotEmpty(t, u.ID)
assert.NotEmpty(t, u.Email)
}
```
### Table-Driven Tests Are Preferred
Go convention is table-driven tests. They're granular, readable, and easy to extend:
- Add a new case by adding a row to the table — no new test function
- Each case can be run independently: `go test -run TestFoo/case_name`
### Subtests Enable Targeted Runs
```bash
go test -run TestValidateUser/rejects_empty_email ./...
```
## When Writing Tests
Apply this checklist to every new test:
- [ ] Name describes the behavior being tested (not the function name)
- [ ] Structure follows Arrange-Act-Assert
- [ ] Tests one behavior (no "and" in the name)
- [ ] Uses real implementations where feasible
- [ ] Runs in < 100ms (or tagged for integration)
- [ ] Uses `t.Helper()` in helper functions
- [ ] Table-driven if testing multiple similar inputs
## Cross-References
- Load `tdd` skill for the full TDD workflow
- Load `code-review` skill for test quality review during pre-merge review
- See `clean-code/references/code-smells.md` for testing-specific smells

252
trainer/SKILL.md Normal file
View File

@@ -0,0 +1,252 @@
---
name: trainer
description: Two-phase brain curation — scan session logs to score candidate learnings (READ), then write the ones that pass the quality gate as generalized knowledge entries (WRITE). Use periodically to keep the brain useful, not noisy.
---
# Trainer
## Overview
The trainer is the brain's quality gate. It runs in two phases:
- **READ phase:** scan session logs, score candidate learnings 1-5, output the candidate list.
- **WRITE phase:** take candidates that score ≥3, apply a quality gate (must generalize, no project-specific identifiers, must be clear cold), emit knowledge entries.
`session-retrospective` surfaces candidates loosely. The trainer is stricter — it produces only generalized, reusable knowledge, the kind that earns its place in long-term memory.
**Core principle:** If a knowledge entry only makes sense to someone who was in the session, it does not belong in the brain.
## Iron Laws
1. **Do not call `brain_write` from this skill.** Emit entries; the operator decides which to commit.
2. **Apply the quality gate before emitting any entry.** A skipped gate produces brain pollution.
3. **No project-specific paths, identifiers, or function names in entries.** If you cannot generalize the lesson, drop the entry.
## When to Use
- Periodically (weekly, or after a stretch of intense work) — not after every session
- When `session-retrospective` has produced too many candidates to keep — trainer filters them
- When the brain is becoming noisy and needs curation pruning before adding more
**Do not use for:**
- Single sessions with one or two clear learnings (use `session-retrospective` directly)
- Project-specific decisions (those belong in CLAUDE.md, not the brain)
## Phase READ — Scan and Score
### What to Look For
- **Patterns that worked** — the approach was clean, correct, and worth reinforcing for future sessions
- **Corrections** — something was first done wrong, then corrected. Both sides are valuable: the rationalization that led to the wrong path, and the correction that fixed it.
### Scoring (1-5)
| Score | Criteria |
|---|---|
| 5 | Novel pattern, clearly correct, generalizes across projects and domains |
| 4 | Good pattern, correct, somewhat project-specific but still broadly useful |
| 3 | Correct but obvious — include only if especially clean or pedagogically useful |
| 2 | Trivial or already widely known — skip |
| 1 | Project-specific noise, mechanical fix, or single-use detail — skip |
Anything below 3 is dropped. Anything at 3 is borderline — include only if the operator might benefit from the framing.
### READ Output Format
Respond in markdown:
```
**Candidate N (score: X/5, type: pattern|correction)**
- **What happened:** Brief description of the learning moment
- **Why it's valuable:** What makes this worth preserving
- **Key insight:** The distilled lesson in one sentence
```
End with: `N candidates found (M scoring ≥ 3)` — the WRITE phase will use the ≥3 candidates.
## Phase WRITE — Emit Knowledge Entries
### Quality Gate
Apply BEFORE writing each entry. Drop the entry if any check fails:
- **Lesson** is phrased so it could apply to any project, not just this one — no project-specific paths, variable names, function names, or repo identifiers
- **When it applies** may name a stack or tool family (e.g. "Go projects with golangci-lint") to scope the relevance, but must not name specific files or functions
- **Example** may use concrete tool names, language syntax, or commands to make the point land — concrete examples are pedagogically necessary; only the Lesson and When-it-applies sections need to generalize
- The insight is clearly stated enough that someone reading it cold (no session context) understands it
- The lesson is not already in `CLAUDE.md` or an existing memory file
### WRITE Output Format
For each candidate that passes the quality gate:
```
# [Topic]
## Lesson
[The key insight in 1-3 sentences. Generalized — no project-specific names.]
## When it applies
[Conditions under which this pattern is relevant.]
## Example
[A brief, generic example illustrating the lesson. Pseudocode or
language-agnostic prose preferred over a copy of the original code.]
```
After all entries, end with: `N entries ready for brain_write` or `0 entries passed quality gate — <reason>`.
The operator (or a calling agent) decides whether to actually run `brain_write` on each entry. This skill does not write directly.
## Worked Example
### READ phase output
```
**Candidate 1 (score: 4/5, type: correction)**
- **What happened:** Implementer subagent reported "tests pass" after running `go test ./...`, but CI failed on a `defer resp.Body.Close()` errcheck violation 30 minutes later.
- **Why it's valuable:** Captures a specific gap between local verification and CI verification that is easy to repeat across Go projects.
- **Key insight:** Per-task verification must include the linter, not just the test runner.
**Candidate 2 (score: 5/5, type: pattern)**
- **What happened:** Repaired malformed JSON from LLM output by stripping invalid escape sequences before parsing. Generalizes well — most LLM-as-JSON pipelines need this.
- **Why it's valuable:** Pattern works across any LLM-as-JSON ingestion pipeline.
- **Key insight:** Treat LLM JSON output as untrusted input that may need normalization before parsing.
**Candidate 3 (score: 3/5, type: pattern)**
- **What happened:** Implementer subagent saved its commit SHA in the structured report so the spec reviewer could verify against a stable revision.
- **Why it's valuable:** Obvious in hindsight, but cleanly framed — passing identifiers between subagents avoids race conditions when the file changes mid-review.
- **Key insight:** Subagent reports should include verifiable identifiers (commit SHAs, file hashes) the reviewer can pin against.
**Candidate 4 (score: 2/5, type: pattern)**
- **What happened:** Renamed a local variable from `data` to `invoiceLineItems` for clarity.
- **Why it's valuable:** It's good practice but trivial.
- **Key insight:** Use names that describe what the data is.
4 candidates found (3 scoring ≥ 3)
```
### WRITE phase output
```
# Per-task verification must include lint
## Lesson
When dispatching tasks that include "tests pass" as a success criterion,
the verification command must include the project's linter, not just the
test runner. Tests verify behavior; linters catch policy violations
(error handling, unused variables, formatting) that the test suite is
blind to.
## When it applies
Any project with a CI lint gate, particularly Go projects with
golangci-lint configured for errcheck or similar policies.
## Example
In a project with a Taskfile that chains lint + test, the per-task
verification should be `task check`, not `go test ./...`. The latter
will pass locally and fail in CI.
---
# LLM JSON output is untrusted input
## Lesson
Output from an LLM that is supposed to be JSON should be treated as
untrusted input. It may contain invalid escape sequences (often from
markdown-formatted content the LLM included), structurally valid but
semantically wrong fields, or trailing prose. Normalize before parsing.
## When it applies
Any pipeline that calls an LLM with a JSON output contract and parses
the response with a strict JSON parser.
## Example
Wrap the parser in a normalization step: strip invalid escape sequences,
trim trailing prose after the closing brace, validate against the
expected schema. Treat the LLM as a network boundary, not a function call.
---
# Subagent reports should carry verifiable identifiers
## Lesson
When a subagent reports completion of work that another agent will verify,
the report must carry an identifier that pins the verifiable state — a
commit SHA, file hash, or revision number. Without one, the verifier may
read a different state than the subagent produced if the workspace
changes between report and review.
## When it applies
Any multi-agent workflow with a verify-after-implement step, particularly
when the workspace is shared (a worktree or branch held by multiple
processes).
## Example
After committing changes, a subagent reports `Status: DONE, SHA: abc1234`.
The verifier runs `git show abc1234` to read exactly the state the
subagent produced, even if newer commits arrived in the interim.
3 entries ready for brain_write
```
## Anti-Patterns
| Anti-Pattern | Why It Fails |
|---|---|
| Score everything 4 or 5 | The gate exists for a reason. If everything is high-value, nothing is. |
| Knowledge entries with project-specific paths | They will not match queries from other projects, defeating the brain's value. |
| Writing entries directly without the quality gate | Polluted brain → useless brain. The gate is the whole point. |
| Skipping READ and going straight to WRITE | The score is what determines whether to write. Without it, you are just transcribing. |
## Brain MCP Integration
The brain holds prior curated knowledge entries. Querying it before READ prevents duplicate entries from cluttering the gate.
**Before READ:**
- Run `brain_query` for the topics that appear in the session log. This prevents scoring candidates that are already in the brain (skip them — same content twice does not help).
**After WRITE:**
- This skill does NOT call `brain_write`. The operator decides which entries to commit, then runs `brain_write` per accepted entry.
### Logging
Call `session_log` once at the end of every phase to record the outcome.
Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which
treats `pass` as success, `fail` as failure, `skip` as neither.
**At end of `read` phase:**
- `session_log` with `{skill: "trainer", phase: "read", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
**At end of `write` phase:**
- `session_log` with `{skill: "trainer", phase: "write", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}`
**Phases for this skill:** read, write
**Status semantics:**
- `pass` — the phase's intended outcome was reached (read: candidates scored; write: gate-passed entries emitted).
- `fail` — the phase's intended outcome was NOT reached.
- `skip` — phase was skipped intentionally (e.g. no candidates passed the gate, write skipped).
**Why this matters:** the routing pod (Plan 6) reads pass-rate to decide whether to route a future call to a local model. If your skill never logs, the routing pod sees no data.
## Mode 2 Routing Note
Trainer is the same shape as `session-retrospective` — long-context reading, mechanical filtering — and is a Mode 2 routing candidate. Until Plan 6, treat as Mode 1 only.
## Cross-References
- Load `session-retrospective` first if you want a looser, faster surfacing pass before the strict gate.
- The brain's `brain_query` and `brain_write` tools are the read/write side of the brain that this skill curates.

246
user-stories/SKILL.md Normal file
View File

@@ -0,0 +1,246 @@
---
name: user-stories
description: Decompose problems into thin, deliverable user stories using Elephant Carpaccio and INVEST criteria. Use after problem analysis, before planning and implementation.
---
# User Stories
## Core Philosophy
A user story is a placeholder for a conversation, not a specification. It captures who needs something, what they need, and why — in the smallest unit that still delivers end-to-end value.
The goal is **thin slices** — the Elephant Carpaccio technique. An elephant is too big to eat at once; slice it as thin as possible (carpaccio) so each slice is still valuable, testable, and deliverable.
## When to Use
- You have a problem analysis and need to break it into implementable units
- A feature is too large to estimate or implement
- You need to communicate scope and prioritize with a stakeholder
- You need to discover the minimum viable version of a feature
**Input required:** A problem analysis (load `problem-analysis` skill first) or clear requirements.
## INVEST Criteria
Every user story must be:
| Criterion | Meaning |
|-----------|---------|
| **Independent** | Can be developed without requiring another story to be complete first |
| **Negotiable** | Details can be discussed and refined |
| **Valuable** | Delivers clear value to an end user (not just infrastructure) |
| **Estimable** | Clear enough scope to understand complexity |
| **Small** | Can be completed in a single sprint (12 weeks) |
| **Testable** | Has clear, specific acceptance criteria |
## Story Format
```
**Story Title:** [Descriptive name]
**As a** [user type / persona]
**I want** [the capability or feature]
**So that** [the business value or outcome]
**Acceptance Criteria:**
- Given [starting context]
When [action taken]
Then [observable outcome]
- Given [another context]
When [action]
Then [outcome]
**Definition of Done:**
- [User-facing quality requirement]
- [Business completion criterion]
- [Performance/security requirement if applicable]
```
## Elephant Carpaccio: Slicing Technique
The most common mistake is making stories too large. Use these slicing patterns to cut large stories thin.
### Slicing Pattern 1: By Data Variation
If a feature works differently for different data, make each variation its own story.
```
Large story: "Users can search invoices"
Thin slices:
1. "Users can search invoices by exact invoice number"
2. "Users can search invoices by customer name (partial match)"
3. "Users can search invoices by date range"
4. "Users can combine multiple search criteria"
```
### Slicing Pattern 2: By User Scenario
Different user journeys are separate stories.
```
Large story: "Users can manage their account"
Thin slices:
1. "New users can register with email and password"
2. "Registered users can log in"
3. "Logged-in users can update their display name"
4. "Users can reset their password via email"
```
### Slicing Pattern 3: By Error Condition
Happy path first, then error handling as a separate story.
```
Story 1: "Users can upload a PDF invoice (happy path)"
Story 2: "Users see a clear error when uploading non-PDF files"
Story 3: "Users see a clear error when uploading files > 10MB"
```
### Slicing Pattern 4: By Performance
Basic version first, then performance improvement as a separate story.
```
Story 1: "Users can export invoice reports (any performance)"
Story 2: "Invoice export completes in under 5 seconds for reports up to 1000 items"
```
### Slicing Pattern 5: By Rules/Workflow
Each business rule is a candidate for its own story.
```
Large story: "The system applies discounts to orders"
Thin slices:
1. "Premium customers receive a 10% discount automatically"
2. "Orders over 1000 SEK receive free shipping"
3. "Discount codes can be applied at checkout"
```
## Prioritization Framework
Order stories by:
1. **Risk reduction first** — stories that test unknown territory early
2. **User value** — which story delivers the most value if we stopped here?
3. **Learning** — stories that reveal information needed for later stories
4. **Dependencies** — stories that others depend on come first
```
Risk reduction: "Integrate with Bankgirot API (spike)"
→ We don't know if this is feasible; find out first
User value: "Users can view their invoice list"
→ Most basic valuable state; everything else builds on this
Foundation: "Authentication: users can log in"
→ Many other stories depend on this
```
## Story Decomposition Example
**Problem:** A financial automation tool needs to process invoices for payment.
**Too large:** "Process invoices for payment"
**Decomposed with Elephant Carpaccio:**
1. **Parse PDF invoice, extract total amount**
- Given a valid Swedish PDF invoice
- When I upload it to the system
- Then the system displays the extracted amount and recipient
- AC: Displays IBAN, amount, and due date for 10 sample invoices
2. **Queue a payment for approval**
- Given a parsed invoice
- When I click "Approve for payment"
- Then the payment is queued with status "pending"
- AC: Payment appears in pending list, can be viewed
3. **Execute a queued payment via Bankgirot**
- Given a payment in "pending" status
- When the batch window runs
- Then an ISO 20022 payment instruction is submitted
- AC: Submission confirmed, status changes to "submitted"
4. **View payment status after submission**
- As a user I can see whether a submitted payment succeeded or failed
- AC: Status reflects Bankgirot confirmation/rejection
5. **Handle rejected payments**
- Given a submitted payment that is rejected by Bankgirot
- When the rejection is received
- Then the user sees a clear error message with the rejection reason
- AC: Rejection reason from Bankgirot is displayed, payment status is "rejected"
Each slice is independently valuable and testable.
## Output Document
The output of this skill is a `user-stories.md` file:
```markdown
# User Stories: [Feature Name]
## Personas
- **[Persona 1]**: [Who they are and what they care about]
- **[Persona 2]**: ...
## Stories (Prioritized)
### Phase 1: Foundation
#### Story 1: [Title]
As a [persona]...
#### Story 2: [Title]
...
### Phase 2: Core Features
#### Story 3: [Title]
...
## Decomposition Rationale
[Why stories were sliced this way; what dependencies exist]
## Stories Deferred to v2
- [Story A]: [Reason]
```
## Common Mistakes
| Mistake | Fix |
|---------|-----|
| Technical tasks disguised as stories ("Set up database") | Reframe as user value ("Users can create an account") |
| "And" in the title | Split into two stories |
| No acceptance criteria | Every story must have testable criteria |
| Stories depending on each other at the same priority level | Reorder so foundational stories come first |
| Implementation details in acceptance criteria ("use PostgreSQL") | Keep criteria in user/business terms |
## Mathias Context
As a digital PM:
- Stories should capture **business value**, not technical tasks
- Stories are the bridge between problem analysis and technical implementation
- When the business says "build feature X", decompose it into stories that can be delivered incrementally
- A story with 10 acceptance criteria is not one story — it's many. Split it.
- The goal is to find the thinnest slice that's still worth shipping
## Transition to Next Steps
After stories are written:
1. Load `spec-driven-dev` skill to write a technical specification
2. Load `planning` skill to convert stories into an implementation plan
3. Load `atdd` skill to implement stories with acceptance test-driven development
## Cross-References
- Requires: `problem-analysis` skill output
- Leads to: `planning` and `atdd` skills
- INVEST criteria origin: Bill Wake
- Elephant Carpaccio: Alistair Cockburn