--- name: tdd description: Write failing tests first, then minimal code to pass. Non-negotiable for all new features and bug fixes. --- # Test-Driven Development (TDD) ## Overview Write the test first. Watch it fail. Write minimal code to pass. **Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing. **Violating the letter of the rules is violating the spirit of the rules.** ## When to Use **Always:** - New features - Bug fixes - Refactoring - Behavior changes **Exceptions (ask Mathias):** - Throwaway prototypes - Generated code (sqlc output, templ output) - Configuration files Thinking "skip TDD just this once"? Stop. That's rationalization. ## The Iron Law ``` NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST ``` Write code before the test? Delete it. Start over. **No exceptions:** - Don't keep it as "reference" - Don't "adapt" it while writing tests - Don't look at it - Delete means delete Implement fresh from tests. Period. ## Red-Green-Refactor ``` RED → verify fails correctly → GREEN → verify all pass → REFACTOR → stay green → RED (next) ``` ### RED - Write Failing Test Write one minimal test showing what should happen. **Go example (good):** ```go func TestRetryOperation_RetriesThreeTimes(t *testing.T) { attempts := 0 op := func() error { attempts++ if attempts < 3 { return errors.New("fail") } return nil } err := RetryOperation(op, 3) require.NoError(t, err) assert.Equal(t, 3, attempts) } ``` Clear name, tests real behavior, one thing. **Go example (bad):** ```go func TestRetry(t *testing.T) { // Vague name, tests nothing meaningful err := RetryOperation(nil, 0) assert.NoError(t, err) } ``` **Requirements:** - One behavior per test - Clear name describing the behavior - Real code (no mocks unless crossing a system boundary) ### Verify RED - Watch It Fail **MANDATORY. Never skip.** ```bash go test -run TestRetryOperation_RetriesThreeTimes ./... ``` Confirm: - Test fails (not compilation errors) - Failure message is expected - Fails because feature is missing (not typos) **Test passes?** You're testing existing behavior. Fix test. **Compilation errors?** Fix errors, re-run until it fails correctly. ### GREEN - Minimal Code Write simplest code to pass the test. **Good:** ```go func RetryOperation(op func() error, maxRetries int) error { for i := 0; i < maxRetries; i++ { if err := op(); err == nil { return nil } } return op() } ``` Just enough to pass. **Bad:** ```go func RetryOperation(op func() error, maxRetries int, opts ...RetryOption) error { // YAGNI — don't add options, backoff, jitter before the test demands it } ``` Over-engineered. Don't add features, refactor other code, or "improve" beyond what the test demands. ### Verify GREEN - Watch It Pass **MANDATORY.** ```bash go test ./... ``` Confirm: - Test passes - All other tests still pass - No race conditions: `go test -race ./...` **Test fails?** Fix code, not test. **Other tests fail?** Fix now. ### REFACTOR - Clean Up After green only: - Remove duplication - Improve names - Extract helpers - Apply clean code principles (load `clean-code` skill) Keep tests green. Don't add behavior. ### Repeat Next failing test for next behavior. ## Go-Specific TDD Notes ### Table-Driven Tests (Preferred Pattern) ```go func TestValidateEmail(t *testing.T) { tests := []struct { name string email string wantErr bool }{ {"valid email", "user@example.com", false}, {"empty email", "", true}, {"no at sign", "userexample.com", true}, {"no domain", "user@", true}, } for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { err := ValidateEmail(tt.email) if tt.wantErr { assert.Error(t, err) } else { assert.NoError(t, err) } }) } } ``` Table-driven tests are the Go idiom. Use them for behavior that varies across inputs. ### Subtests with t.Run Use `t.Run` to name subtests clearly. Failure messages include the subtest name. ```go t.Run("rejects empty input", func(t *testing.T) { ... }) t.Run("accepts valid UUID", func(t *testing.T) { ... }) ``` ### Test File Conventions - File: `_test.go` in the same directory - Package: `package foo_test` for black-box testing (preferred), `package foo` for white-box - Helpers: use `t.Helper()` so stack traces point to the caller, not the helper ```go func assertNoError(t *testing.T, err error) { t.Helper() if err != nil { t.Fatalf("unexpected error: %v", err) } } ``` ### Running Tests ```bash go test ./... # all packages go test -run TestFoo ./pkg/... # specific test go test -run TestFoo/subtest ./... # specific subtest go test -race ./... # race detector (always run before commit) go test -cover ./... # coverage go test -v ./... # verbose ``` ### testify Pre-approved. Use `assert` (continues on failure) and `require` (stops on failure): ```go require.NoError(t, err) // fatal if error assert.Equal(t, expected, got) // non-fatal comparison assert.ErrorIs(t, err, ErrFoo) // error chain check ``` ## Good Tests | Quality | Good | Bad | |---------|------|-----| | **Minimal** | One thing. "and" in name? Split it. | `TestValidatesEmailAndDomainAndWhitespace` | | **Clear** | Name describes behavior | `TestFoo`, `Test1` | | **Shows intent** | Demonstrates desired API | Obscures what code should do | | **Table-driven** | Multiple cases, one test function | Copy-pasted test functions | ## Common Rationalizations | Excuse | Reality | |--------|---------| | "Too simple to test" | Simple code breaks. Test takes 30 seconds. | | "I'll test after" | Tests passing immediately prove nothing. | | "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" | | "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. | | "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. | | "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. | | "Need to explore first" | Fine. Throw away exploration, start with TDD. | | "Test hard = design unclear" | Listen to the test. Hard to test = hard to use. | | "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. | | "Existing code has no tests" | You're improving it. Add tests for existing code. | ## Red Flags - STOP and Start Over - Code before test - Test after implementation - Test passes immediately without seeing it fail - Can't explain why test failed - Tests added "later" - Rationalizing "just this once" - "Already manually tested it" - "Tests after achieve the same purpose" - "Keep as reference" or "adapt existing code" - "Already spent X hours, deleting is wasteful" - "TDD is dogmatic, I'm being pragmatic" - "This is different because..." **All of these mean: Delete code. Start over with TDD.** ## Example: Bug Fix **Bug:** Empty email accepted **RED** ```go func TestSubmitForm_RejectsEmptyEmail(t *testing.T) { result := submitForm(FormData{Email: ""}) assert.Equal(t, "email required", result.Error) } ``` **Verify RED** ```bash $ go test -run TestSubmitForm_RejectsEmptyEmail ./... FAIL: expected "email required", got "" ``` **GREEN** ```go func submitForm(data FormData) FormResult { if strings.TrimSpace(data.Email) == "" { return FormResult{Error: "email required"} } // ... } ``` **Verify GREEN** ```bash $ go test ./... ok example.com/myapp 0.003s ``` **REFACTOR** Extract validation for multiple fields if needed. ## Verification Checklist Before marking work complete: - [ ] Every new function/method has a test - [ ] Watched each test fail before implementing - [ ] Each test failed for expected reason (feature missing, not typo) - [ ] Wrote minimal code to pass each test - [ ] All tests pass: `go test ./...` - [ ] Race detector clean: `go test -race ./...` - [ ] Tests use real code (mocks only if crossing a system boundary) - [ ] Edge cases and errors covered Can't check all boxes? You skipped TDD. Start over. ## When Stuck | Problem | Solution | |---------|----------| | Don't know how to test | Write wished-for API. Write assertion first. Ask Mathias. | | Test too complicated | Design too complicated. Simplify interface. | | Must mock everything | Code too coupled. Use dependency injection. | | Test setup huge | Extract helpers with `t.Helper()`. Still complex? Simplify design. | ## Debugging Integration Bug found? Write failing test reproducing it. Follow TDD cycle. Test proves fix and prevents regression. Never fix bugs without a test. ## Testing Anti-Patterns When adding test utilities or mocks, load the `references/testing-anti-patterns.md` to avoid: - Testing mock behavior instead of real behavior - Adding test-only methods to production types - Mocking without understanding what the dependency does ## Brain MCP Integration The brain MCP exposes session context across machines. Use it to make TDD cycles cumulative rather than one-shot. **At session start:** - Run `brain_query` with the feature name + "tdd" to surface prior cycles, anti-patterns, or testing decisions for this code area. Skip if the feature is brand new. **Never:** - Embed brain content in test code or assertions. The brain is context for you, not state for the system under test. ### Logging Call `session_log` once at the end of every phase to record the outcome. Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which treats `pass` as success, `fail` as failure, `skip` as neither. **At end of `red` phase:** - `session_log` with `{skill: "tdd", phase: "red", final_status: "pass" | "fail" | "skip", message: "", duration_ms: , project_root: ""}` **At end of `green` phase:** - `session_log` with `{skill: "tdd", phase: "green", final_status: "pass" | "fail" | "skip", message: "", duration_ms: , project_root: ""}` **At end of `refactor` phase:** - `session_log` with `{skill: "tdd", phase: "refactor", final_status: "pass" | "fail" | "skip", message: "", duration_ms: , project_root: ""}` **Status semantics:** - `pass` — the phase's intended outcome was reached (red: test fails as expected; green: test passes; refactor: tests still pass after refactor). - `fail` — the phase's intended outcome was NOT reached (test compiled when it shouldn't, test still fails after green attempt, refactor broke tests). - `skip` — phase was skipped intentionally (e.g. refactor not warranted). **Why this matters:** the routing pod (Plan 6) reads pass-rate to decide whether to route a future `tdd` call to a local model. If your skill never logs, the routing pod sees no data and may default-route or default-not-route in a way that doesn't reflect real performance. ## Final Rule ``` Production code → test exists and failed first Otherwise → not TDD ``` No exceptions without Mathias's permission. ## Mode 2 Routing Note This skill is invoked identically whether the agent is running in Mode 1 (cloud Claude, no routing) or Mode 2 (client-local, supervisor routing layer). The routing pod (Plan 6) does not exist yet; until it does, treat this skill as Mode 1 only. The discipline does not change between modes — only the model behind the call.