Files
skills/tdd/SKILL.md
Mathias d6a71e370e
Some checks failed
release / tag (push) Has been cancelled
chore: bootstrap skills library — 19 skills + installer + CI auto-tag
Phase 1 of mathias/skills extraction (infra#62 Track D — homelab
next-step plan addendum). Imports ~/dev/.skills/ verbatim (19 skill
dirs + SKILLS_INDEX.md) and adds the installation surface:

- Taskfile.yml — install / update / list / release / check targets
- install.sh — bootstrap installer for hosts without Task. Idempotent
  symlink wirer; default checkout at ~/.local/share/skills/ on every
  host; SKILLS_REF env var pins a tag (default: main).
- .gitea/workflows/release.yml — auto-tag every push to main by
  Bump-Type footer (major/minor/patch, default patch). Skipped when
  commit contains [skip-release].
- README — usage, versioning, contribution flow, secret-hygiene rule.

Phase 1 wires Claude Code only (~/.claude/skills/<name> global +
<repo>/.claude/skills/<name> per-repo). Phase 2 adds Crush, opencode,
antigravity, and gitea-resident agents (cobalt-dingo, agentsquad)
once their skill conventions are researched.

Public repo, markdown-only — no secrets, no client names. Verified
via pre-push grep before initial push.

[skip-release]
2026-05-24 14:59:54 +02:00

12 KiB

name, description
name description
tdd Write failing tests first, then minimal code to pass. Non-negotiable for all new features and bug fixes.

Test-Driven Development (TDD)

Overview

Write the test first. Watch it fail. Write minimal code to pass.

Core principle: If you didn't watch the test fail, you don't know if it tests the right thing.

Violating the letter of the rules is violating the spirit of the rules.

When to Use

Always:

  • New features
  • Bug fixes
  • Refactoring
  • Behavior changes

Exceptions (ask Mathias):

  • Throwaway prototypes
  • Generated code (sqlc output, templ output)
  • Configuration files

Thinking "skip TDD just this once"? Stop. That's rationalization.

The Iron Law

NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST

Write code before the test? Delete it. Start over.

No exceptions:

  • Don't keep it as "reference"
  • Don't "adapt" it while writing tests
  • Don't look at it
  • Delete means delete

Implement fresh from tests. Period.

Red-Green-Refactor

RED → verify fails correctly → GREEN → verify all pass → REFACTOR → stay green → RED (next)

RED - Write Failing Test

Write one minimal test showing what should happen.

Go example (good):

func TestRetryOperation_RetriesThreeTimes(t *testing.T) {
    attempts := 0
    op := func() error {
        attempts++
        if attempts < 3 {
            return errors.New("fail")
        }
        return nil
    }

    err := RetryOperation(op, 3)

    require.NoError(t, err)
    assert.Equal(t, 3, attempts)
}

Clear name, tests real behavior, one thing.

Go example (bad):

func TestRetry(t *testing.T) {
    // Vague name, tests nothing meaningful
    err := RetryOperation(nil, 0)
    assert.NoError(t, err)
}

Requirements:

  • One behavior per test
  • Clear name describing the behavior
  • Real code (no mocks unless crossing a system boundary)

Verify RED - Watch It Fail

MANDATORY. Never skip.

go test -run TestRetryOperation_RetriesThreeTimes ./...

Confirm:

  • Test fails (not compilation errors)
  • Failure message is expected
  • Fails because feature is missing (not typos)

Test passes? You're testing existing behavior. Fix test.

Compilation errors? Fix errors, re-run until it fails correctly.

GREEN - Minimal Code

Write simplest code to pass the test.

Good:

func RetryOperation(op func() error, maxRetries int) error {
    for i := 0; i < maxRetries; i++ {
        if err := op(); err == nil {
            return nil
        }
    }
    return op()
}

Just enough to pass.

Bad:

func RetryOperation(op func() error, maxRetries int, opts ...RetryOption) error {
    // YAGNI — don't add options, backoff, jitter before the test demands it
}

Over-engineered.

Don't add features, refactor other code, or "improve" beyond what the test demands.

Verify GREEN - Watch It Pass

MANDATORY.

go test ./...

Confirm:

  • Test passes
  • All other tests still pass
  • No race conditions: go test -race ./...

Test fails? Fix code, not test.

Other tests fail? Fix now.

REFACTOR - Clean Up

After green only:

  • Remove duplication
  • Improve names
  • Extract helpers
  • Apply clean code principles (load clean-code skill)

Keep tests green. Don't add behavior.

Repeat

Next failing test for next behavior.

Go-Specific TDD Notes

Table-Driven Tests (Preferred Pattern)

func TestValidateEmail(t *testing.T) {
    tests := []struct {
        name    string
        email   string
        wantErr bool
    }{
        {"valid email", "user@example.com", false},
        {"empty email", "", true},
        {"no at sign", "userexample.com", true},
        {"no domain", "user@", true},
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            err := ValidateEmail(tt.email)
            if tt.wantErr {
                assert.Error(t, err)
            } else {
                assert.NoError(t, err)
            }
        })
    }
}

Table-driven tests are the Go idiom. Use them for behavior that varies across inputs.

Subtests with t.Run

Use t.Run to name subtests clearly. Failure messages include the subtest name.

t.Run("rejects empty input", func(t *testing.T) { ... })
t.Run("accepts valid UUID", func(t *testing.T) { ... })

Test File Conventions

  • File: <package>_test.go in the same directory
  • Package: package foo_test for black-box testing (preferred), package foo for white-box
  • Helpers: use t.Helper() so stack traces point to the caller, not the helper
func assertNoError(t *testing.T, err error) {
    t.Helper()
    if err != nil {
        t.Fatalf("unexpected error: %v", err)
    }
}

Running Tests

go test ./...                        # all packages
go test -run TestFoo ./pkg/...       # specific test
go test -run TestFoo/subtest ./...   # specific subtest
go test -race ./...                  # race detector (always run before commit)
go test -cover ./...                 # coverage
go test -v ./...                     # verbose

testify

Pre-approved. Use assert (continues on failure) and require (stops on failure):

require.NoError(t, err)          // fatal if error
assert.Equal(t, expected, got)   // non-fatal comparison
assert.ErrorIs(t, err, ErrFoo)   // error chain check

Good Tests

Quality Good Bad
Minimal One thing. "and" in name? Split it. TestValidatesEmailAndDomainAndWhitespace
Clear Name describes behavior TestFoo, Test1
Shows intent Demonstrates desired API Obscures what code should do
Table-driven Multiple cases, one test function Copy-pasted test functions

Common Rationalizations

Excuse Reality
"Too simple to test" Simple code breaks. Test takes 30 seconds.
"I'll test after" Tests passing immediately prove nothing.
"Tests after achieve same goals" Tests-after = "what does this do?" Tests-first = "what should this do?"
"Already manually tested" Ad-hoc ≠ systematic. No record, can't re-run.
"Deleting X hours is wasteful" Sunk cost fallacy. Keeping unverified code is technical debt.
"Keep as reference, write tests first" You'll adapt it. That's testing after. Delete means delete.
"Need to explore first" Fine. Throw away exploration, start with TDD.
"Test hard = design unclear" Listen to the test. Hard to test = hard to use.
"TDD will slow me down" TDD faster than debugging. Pragmatic = test-first.
"Existing code has no tests" You're improving it. Add tests for existing code.

Red Flags - STOP and Start Over

  • Code before test
  • Test after implementation
  • Test passes immediately without seeing it fail
  • Can't explain why test failed
  • Tests added "later"
  • Rationalizing "just this once"
  • "Already manually tested it"
  • "Tests after achieve the same purpose"
  • "Keep as reference" or "adapt existing code"
  • "Already spent X hours, deleting is wasteful"
  • "TDD is dogmatic, I'm being pragmatic"
  • "This is different because..."

All of these mean: Delete code. Start over with TDD.

Example: Bug Fix

Bug: Empty email accepted

RED

func TestSubmitForm_RejectsEmptyEmail(t *testing.T) {
    result := submitForm(FormData{Email: ""})
    assert.Equal(t, "email required", result.Error)
}

Verify RED

$ go test -run TestSubmitForm_RejectsEmptyEmail ./...
FAIL: expected "email required", got ""

GREEN

func submitForm(data FormData) FormResult {
    if strings.TrimSpace(data.Email) == "" {
        return FormResult{Error: "email required"}
    }
    // ...
}

Verify GREEN

$ go test ./...
ok  	example.com/myapp	0.003s

REFACTOR Extract validation for multiple fields if needed.

Verification Checklist

Before marking work complete:

  • Every new function/method has a test
  • Watched each test fail before implementing
  • Each test failed for expected reason (feature missing, not typo)
  • Wrote minimal code to pass each test
  • All tests pass: go test ./...
  • Race detector clean: go test -race ./...
  • Tests use real code (mocks only if crossing a system boundary)
  • Edge cases and errors covered

Can't check all boxes? You skipped TDD. Start over.

When Stuck

Problem Solution
Don't know how to test Write wished-for API. Write assertion first. Ask Mathias.
Test too complicated Design too complicated. Simplify interface.
Must mock everything Code too coupled. Use dependency injection.
Test setup huge Extract helpers with t.Helper(). Still complex? Simplify design.

Debugging Integration

Bug found? Write failing test reproducing it. Follow TDD cycle. Test proves fix and prevents regression.

Never fix bugs without a test.

Testing Anti-Patterns

When adding test utilities or mocks, load the references/testing-anti-patterns.md to avoid:

  • Testing mock behavior instead of real behavior
  • Adding test-only methods to production types
  • Mocking without understanding what the dependency does

Brain MCP Integration

The brain MCP exposes session context across machines. Use it to make TDD cycles cumulative rather than one-shot.

At session start:

  • Run brain_query with the feature name + "tdd" to surface prior cycles, anti-patterns, or testing decisions for this code area. Skip if the feature is brand new.

Never:

  • Embed brain content in test code or assertions. The brain is context for you, not state for the system under test.

Logging

Call session_log once at the end of every phase to record the outcome. Pass-rate is computed downstream by the /pass-rate HTTP endpoint, which treats pass as success, fail as failure, skip as neither.

At end of red phase:

  • session_log with {skill: "tdd", phase: "red", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}

At end of green phase:

  • session_log with {skill: "tdd", phase: "green", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}

At end of refactor phase:

  • session_log with {skill: "tdd", phase: "refactor", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}

Status semantics:

  • pass — the phase's intended outcome was reached (red: test fails as expected; green: test passes; refactor: tests still pass after refactor).
  • fail — the phase's intended outcome was NOT reached (test compiled when it shouldn't, test still fails after green attempt, refactor broke tests).
  • skip — phase was skipped intentionally (e.g. refactor not warranted).

Why this matters: the routing pod (Plan 6) reads pass-rate to decide whether to route a future tdd call to a local model. If your skill never logs, the routing pod sees no data and may default-route or default-not-route in a way that doesn't reflect real performance.

Final Rule

Production code → test exists and failed first
Otherwise → not TDD

No exceptions without Mathias's permission.

Mode 2 Routing Note

This skill is invoked identically whether the agent is running in Mode 1 (cloud Claude, no routing) or Mode 2 (client-local, supervisor routing layer). The routing pod (Plan 6) does not exist yet; until it does, treat this skill as Mode 1 only. The discipline does not change between modes — only the model behind the call.