---
name: test-design
description: Evaluate test quality using Dave Farley's 8 Properties of Good Tests. Use when reviewing or writing tests to ensure they provide genuine verification.
---

# Test Design

## Overview

Good tests are investments. Bad tests are liabilities: they pass when they shouldn't, fail when code is correct, or verify nothing meaningful.

This skill uses Dave Farley's 8 Properties of Good Tests to assess and improve test quality. The **Farley Index** (0–10) provides a scored summary.

Reference: [Dave Farley's Properties of Good Tests](https://www.linkedin.com/pulse/tdd-properties-good-tests-dave-farley-iexge/)

## The 8 Properties

| Property | Weight | What it measures |
|----------|--------|------------------|
| **Understandable** | 1.5x | Can a reader understand what behavior is being tested? |
| **Maintainable** | 1.5x | Do tests survive implementation changes that leave behavior unchanged? |
| **Repeatable** | 1.25x | Same result every time, regardless of environment or order |
| **Atomic** | 1.0x | One behavior per test; tests are independent |
| **Necessary** | 1.0x | Tests real behavior, not mock internals or framework behavior |
| **Granular** | 1.0x | Each test covers one specific case |
| **Fast** | 0.75x | Tests run quickly enough to support rapid TDD cycles |
| **First (TDD)** | 1.0x | Tests were written before implementation |

**Farley Index formula:**

`(U×1.5 + M×1.5 + R×1.25 + A×1.0 + N×1.0 + G×1.0 + F×0.75 + T×1.0) / 9.0`

Each property is scored 0–10. The weights sum to 9.0, so the index stays on the same 0–10 scale.

## Rating Scale

| Score | Rating | Interpretation |
|-------|--------|----------------|
| 9.0–10.0 | Exemplary | Model quality; tests serve as living documentation |
| 7.5–8.9 | Excellent | High quality with minor improvement opportunities |
| 6.0–7.4 | Good | Solid foundation with clear improvement areas |
| 4.5–5.9 | Fair | Functional but needs significant attention |
| 3.0–4.4 | Poor | Tests provide limited value; refactoring needed |
| 0.0–2.9 | Critical | Tests may be harmful; consider rewriting |

## Property Deep Dives

### Understandable (U)

A test should tell a story: what behavior, under what conditions, produces what result.

**Go patterns that help:**
- Subtest names in `t.Run`: `t.Run("returns error when email is empty", ...)`
- Table-driven tests with descriptive `name` fields
- Arrange-Act-Assert structure with blank lines separating sections

```go
// Good: clear behavior name, clear structure
func TestValidateUser_RejectsEmptyEmail(t *testing.T) {
	// Arrange
	user := User{Name: "Alice", Email: ""}

	// Act
	err := ValidateUser(user)

	// Assert
	require.Error(t, err)
	assert.ErrorIs(t, err, ErrInvalidEmail)
}

// Bad: cryptic name, no structure
func TestUser1(t *testing.T) {
	u := User{}
	assert.NotNil(t, ValidateUser(u))
}
```

**Negative signals:** cryptic names (`test_1`, `TestFoo`), no AAA structure, multiple behaviors in one test.

### Maintainable (M)

Tests that break when implementation changes (but behavior doesn't) create noise and slow down development.

**Negative signals:**
- Over-specified mock interactions (`mock.AssertCalled(t, "MethodX", args...)` when behavior is all that matters)
- Deep inspection of captured arguments (Mockito-style `ArgumentCaptor`)
- `verifyNoMoreInteractions`-style checks that break when you add a logging call
- Tests coupled to internal field names

**Go patterns that help:**
- Test behavior via public API, not internal state
- Avoid asserting on exact call counts unless the count IS the behavior

```go
// Bad: breaks when you add an audit log call
mock.AssertCalled(t, "Save", user)
mock.AssertNumberOfCalls(t, "Save", 1)
mock.AssertNotCalled(t, "Log") // Breaks if you add logging later

// Good: test the outcome
result, err := service.CreateUser(ctx, req)
require.NoError(t, err)
assert.Equal(t, user.Email, result.Email)
```

### Repeatable (R)

Tests must produce the same result regardless of when, where, or in what order they run.

**Negative signals (Go):**
- `time.Now()` in test logic without injection
- `os.ReadFile` for fixtures that aren't hermetic
- Shared global state between tests
- Tests that depend on network availability
- `time.Sleep` for synchronization

**Go fixes:**

```go
// Bad: time-dependent
func TestTokenExpiry(t *testing.T) {
	token := generateToken()
	time.Sleep(2 * time.Second)
	assert.True(t, token.IsExpired())
}

// Good: inject clock
type Clock interface{ Now() time.Time }

type FixedClock struct{ t time.Time }

func (c FixedClock) Now() time.Time { return c.t }

func TestTokenExpiry(t *testing.T) {
	clock := FixedClock{t: time.Unix(0, 0)}
	token := generateTokenWithClock(clock)

	futureClk := FixedClock{t: time.Unix(3600, 0)}
	assert.True(t, token.IsExpiredAt(futureClk.Now()))
}
```

Use `t.TempDir()` for filesystem fixtures; the directory is removed automatically when the test completes.

### Atomic (A)

One test = one behavior. Tests must be independent: running them in any order must produce the same result.

**Go patterns:**
- `t.Parallel()` on subtests exposes hidden shared state, especially when run with `-race`
- Fresh state in each `t.Run`
- No `init()` or package-level setup that leaks between tests

```go
func TestUserService(t *testing.T) {
	tests := []struct {
		name    string
		input   CreateUserReq
		wantErr bool
	}{
		{"valid user", validReq, false},
		{"duplicate email", dupEmailReq, true},
	}
	for _, tt := range tests {
		tt := tt // Capture the loop variable; required before Go 1.22 when using t.Parallel()
		t.Run(tt.name, func(t *testing.T) {
			t.Parallel() // Each subtest runs independently

			// Fresh state per test (seed it here for cases like "duplicate email")
			store := NewInMemoryStore()
			svc := NewUserService(store)

			_, err := svc.Create(context.Background(), tt.input)
			if tt.wantErr {
				assert.Error(t, err)
			} else {
				assert.NoError(t, err)
			}
		})
	}
}
```

### Necessary (N)

Tests must verify real behavior. Tautology Theatre (tests whose outcome is predetermined regardless of production code) provides false confidence.

**Types of Tautology Theatre:**

1. **Mock tautology:** Configure a mock's return value, then assert that the mock returns it.

   ```go
   // Bad: this passes even if production code is deleted
   mockStore.On("GetUser", id).Return(user, nil)
   result, _ := mockStore.GetUser(id)
   assert.Equal(t, user, result) // Testing the mock, not production code
   ```

2. **Mock-only test:** Every object is a mock; no real type is instantiated.
3. **Trivial tautology:** `assert.True(t, true)` or `assert.NotNil(t, new(User))`
4. **Framework test:** Verifying that Go's `make(map[string]int)` returns non-nil.

**Fix:** Test real behavior through real implementations. Use mocks only to isolate from external systems (DB, HTTP, filesystem).

### Granular (G)

Each test covers one specific case. Table-driven tests in Go are the natural expression of granularity.

```go
// Good: each row is one case, each can fail independently
func TestParseAmount(t *testing.T) {
	tests := []struct {
		name    string
		input   string
		want    Amount
		wantErr bool
	}{
		{"integer", "100", Amount{Value: 100}, false},
		{"decimal", "10.50", Amount{Value: 1050, Scale: 2}, false},
		{"negative", "-5", Amount{}, true},
		{"empty", "", Amount{}, true},
		{"non-numeric", "abc", Amount{}, true},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			got, err := ParseAmount(tt.input)
			if tt.wantErr {
				require.Error(t, err)
				return
			}
			require.NoError(t, err)
			assert.Equal(t, tt.want, got)
		})
	}
}
```

### Fast (F)

Tests must run fast enough to support TDD cycles. Target: the full test suite runs in under 30 seconds for most projects.

**Go fixes:**
- Mark slow integration tests with build tags: `//go:build integration`
- Use `t.Parallel()` to parallelize safe tests
- Use `InMemoryStore` implementations instead of a real DB for unit tests
- Use `httptest.NewServer` for HTTP tests instead of real servers

```bash
# Unit tests only (fast, default)
go test ./...

# Include integration tests (slower, explicit)
go test -tags=integration ./...
```

**Negative signals:** `time.Sleep`, network calls without build tags, database calls in unit tests.

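To make the `httptest.NewServer` suggestion concrete, here is a minimal sketch that uses only the standard library (no testify). `FetchGreeting`, `newGreetingServer`, and the `/greeting` path are hypothetical names invented for this illustration; substitute your own client code and routes.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"testing"
)

// FetchGreeting is a hypothetical function under test: it GETs
// baseURL+"/greeting" and returns the response body as a string.
func FetchGreeting(client *http.Client, baseURL string) (string, error) {
	resp, err := client.Get(baseURL + "/greeting")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// newGreetingServer starts an in-process stub server on a loopback port.
// The caller is responsible for calling Close on the returned server.
func newGreetingServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/greeting" {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, "hello")
	}))
}

func TestFetchGreeting_ReturnsBody(t *testing.T) {
	t.Parallel() // Safe: each test owns its own server

	// Arrange
	srv := newGreetingServer()
	defer srv.Close()

	// Act
	got, err := FetchGreeting(srv.Client(), srv.URL)

	// Assert
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	if got != "hello" {
		t.Errorf("got %q, want %q", got, "hello")
	}
}
```

Because the stub server runs in-process and each test creates its own instance, the test needs no external network access and can run in parallel with others, which supports both the Fast and Repeatable properties.
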
### First / TDD (T)

Evidence that tests were written before implementation. This is the hardest property to verify statically.

**Positive signals:**
- Commit history shows the test commit before the implementation commit
- Tests exercise behavior, not implementation details (tests-first forces API design)
- Tests are simpler than the implementation (tests-first keeps tests focused)

**Negative signals:**
- Tests that exactly mirror the implementation structure
- Tests that only cover happy paths (implementation-first misses edge cases)
- Tests added in the same commit as a large implementation

## Go-Specific Test Design Notes

### t.Helper()

Use `t.Helper()` in helper functions so failure messages report the caller's file and line, not the helper's:

```go
func assertValidUser(t *testing.T, u User) {
	t.Helper()
	assert.NotEmpty(t, u.ID)
	assert.NotEmpty(t, u.Email)
}
```

### Table-Driven Tests Are Preferred

Go convention is table-driven tests. They're granular, readable, and easy to extend:
- Add a new case by adding a row to the table; no new test function is needed
- Each case can be run independently: `go test -run TestFoo/case_name`

### Subtests Enable Targeted Runs

Spaces in subtest names are replaced with underscores, so a subtest named `"rejects empty email"` is selected like this:

```bash
go test -run TestValidateUser/rejects_empty_email ./...
```

## When Writing Tests

Apply this checklist to every new test:

- [ ] Name describes the behavior being tested (not the function name)
- [ ] Structure follows Arrange-Act-Assert
- [ ] Tests one behavior (no "and" in the name)
- [ ] Uses real implementations where feasible
- [ ] Runs in < 100ms (or tagged for integration)
- [ ] Uses `t.Helper()` in helper functions
- [ ] Table-driven if testing multiple similar inputs

## Cross-References

- Load `tdd` skill for the full TDD workflow
- Load `code-review` skill for test quality review during pre-merge review
- See `clean-code/references/code-smells.md` for testing-specific smells
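
## Appendix: Computing the Farley Index

As a worked illustration of the Farley Index formula and the Rating Scale defined earlier in this skill, the arithmetic can be sketched as follows. The names `Scores`, `FarleyIndex`, and `Rating` are hypothetical, and the handling of values that fall between the listed bands (for example 8.95) is an assumption: each band is treated as starting at its lower bound.

```go
package main

// Scores holds one 0–10 score per property (hypothetical type for illustration).
type Scores struct {
	Understandable float64
	Maintainable   float64
	Repeatable     float64
	Atomic         float64
	Necessary      float64
	Granular       float64
	Fast           float64
	First          float64
}

// FarleyIndex applies the weighted formula from "The 8 Properties".
// The weights sum to 9.0, so the result stays on the 0–10 scale.
func FarleyIndex(s Scores) float64 {
	weighted := s.Understandable*1.5 +
		s.Maintainable*1.5 +
		s.Repeatable*1.25 +
		s.Atomic*1.0 +
		s.Necessary*1.0 +
		s.Granular*1.0 +
		s.Fast*0.75 +
		s.First*1.0
	return weighted / 9.0
}

// Rating maps an index to the Rating Scale labels.
// Assumption: each band begins at its lower bound, which closes the
// small gaps between the ranges listed in the table.
func Rating(index float64) string {
	switch {
	case index >= 9.0:
		return "Exemplary"
	case index >= 7.5:
		return "Excellent"
	case index >= 6.0:
		return "Good"
	case index >= 4.5:
		return "Fair"
	case index >= 3.0:
		return "Poor"
	default:
		return "Critical"
	}
}
```

With the scores U=9, M=7, R=9, A=8, N=6, G=8, F=9, T=5, the weighted sum is 69.0, so `FarleyIndex` returns 69.0/9.0 ≈ 7.67 and `Rating` reports "Excellent".
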