---
name: test-design
description: Evaluate test quality using Dave Farley's 8 Properties of Good Tests. Use when reviewing or writing tests to ensure they provide genuine verification.
---

# Test Design

## Overview

Good tests are investments. Bad tests are liabilities — they pass when they shouldn't, fail when code is correct, or verify nothing meaningful.

This skill uses Dave Farley's 8 Properties of Good Tests to assess and improve test quality. The **Farley Index** (0–10) provides a scored summary.

Reference: [Dave Farley's Properties of Good Tests](https://www.linkedin.com/pulse/tdd-properties-good-tests-dave-farley-iexge/)

## The 8 Properties

| Property | Weight | What it measures |
|----------|--------|-----------------|
| **Understandable** | 1.5x | Can a reader understand what behavior is being tested? |
| **Maintainable** | 1.5x | Do tests keep passing when the implementation changes but the behavior does not? |
| **Repeatable** | 1.25x | Same result every time, regardless of environment or order |
| **Atomic** | 1.0x | One behavior per test; tests are independent |
| **Necessary** | 1.0x | Tests real behavior, not mock internals or framework behavior |
| **Granular** | 1.0x | Each test covers one specific case |
| **Fast** | 0.75x | Tests run quickly enough to support rapid TDD cycles |
| **First (TDD)** | 1.0x | Tests were written before implementation |

**Farley Index formula:** `(U×1.5 + M×1.5 + R×1.25 + A×1.0 + N×1.0 + G×1.0 + F×0.75 + T×1.0) / 9.0`, where each letter is that property's score on a 0–10 scale and 9.0 is the sum of the weights.
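
The formula can be sketched in Go as follows. This is an illustration rather than part of the skill: the `Scores` type and `FarleyIndex` function are hypothetical names, and scoring each property from 0 to 10 is an assumption chosen so that the result lands on the 0–10 index scale.

```go
package main

import "fmt"

// Scores holds one score per property (hypothetical type).
// Assumption: each property is scored from 0 to 10.
type Scores struct {
    Understandable, Maintainable, Repeatable, Atomic float64
    Necessary, Granular, Fast, First                 float64
}

// FarleyIndex returns the weighted average using the weights from the table.
func FarleyIndex(s Scores) float64 {
    weighted := s.Understandable*1.5 +
        s.Maintainable*1.5 +
        s.Repeatable*1.25 +
        s.Atomic*1.0 +
        s.Necessary*1.0 +
        s.Granular*1.0 +
        s.Fast*0.75 +
        s.First*1.0
    return weighted / 9.0 // 9.0 is the sum of the eight weights
}

func main() {
    s := Scores{
        Understandable: 8, Maintainable: 6, Repeatable: 10, Atomic: 7,
        Necessary: 9, Granular: 8, Fast: 4, First: 5,
    }
    // Weighted sum is 65.5, so this prints "Farley Index: 7.3".
    fmt.Printf("Farley Index: %.1f\n", FarleyIndex(s))
}
```

Because Understandable and Maintainable carry the largest weights, a low score on either pulls the index down more than an equally low score on Fast.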

## Rating Scale

| Score | Rating | Interpretation |
|-------|--------|----------------|
| 9.0–10.0 | Exemplary | Model quality; tests serve as living documentation |
| 7.5–8.9 | Excellent | High quality with minor improvement opportunities |
| 6.0–7.4 | Good | Solid foundation with clear improvement areas |
| 4.5–5.9 | Fair | Functional but needs significant attention |
| 3.0–4.4 | Poor | Tests provide limited value; refactoring needed |
| 0.0–2.9 | Critical | Tests may be harmful; consider rewriting |
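
A small lookup can turn an index into one of the ratings above. This is a sketch with a hypothetical `Rating` function. Comparing against lower bounds only is an assumption: it places in-between values such as 8.95 in the lower band instead of leaving them unclassified.

```go
package main

import "fmt"

// Rating maps a Farley Index to a rating name from the table (hypothetical helper).
// Assumption: each band is defined by its lower bound, so 8.95 counts as Excellent.
func Rating(index float64) string {
    switch {
    case index >= 9.0:
        return "Exemplary"
    case index >= 7.5:
        return "Excellent"
    case index >= 6.0:
        return "Good"
    case index >= 4.5:
        return "Fair"
    case index >= 3.0:
        return "Poor"
    default:
        return "Critical"
    }
}

func main() {
    for _, idx := range []float64{9.2, 8.95, 7.3, 2.0} {
        fmt.Printf("%.2f -> %s\n", idx, Rating(idx))
    }
}
```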

## Property Deep Dives

### Understandable (U)

A test should tell a story: what behavior, under what conditions, produces what result.

**Go patterns that help:**
- Subtest names in `t.Run`: `t.Run("returns error when email is empty", ...)`
- Table-driven tests with descriptive `name` fields
- Arrange-Act-Assert structure with blank lines separating sections

```go
// Good: clear behavior name, clear structure
func TestValidateUser_RejectsEmptyEmail(t *testing.T) {
    // Arrange
    user := User{Name: "Alice", Email: ""}

    // Act
    err := ValidateUser(user)

    // Assert
    require.Error(t, err)
    assert.ErrorIs(t, err, ErrInvalidEmail)
}

// Bad: cryptic name, no structure
func TestUser1(t *testing.T) {
    u := User{}
    assert.NotNil(t, ValidateUser(u))
}
```

**Negative signals:** cryptic names (`test_1`, `TestFoo`), no AAA structure, multiple behaviors in one test.

### Maintainable (M)

Tests that break when implementation changes (but behavior doesn't) create noise and slow down development.

**Negative signals:**
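- Over-specified mock interactions (`mockStore.AssertCalled(t, "MethodX", args...)` when behavior is all that matters)
- Deep inspection of captured arguments (Mockito's `ArgumentCaptor`; in testify, elaborate `mock.MatchedBy` predicates)
- Mockito-style `verifyNoMoreInteractions` checks that break when you add a logging call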
- Tests coupled to internal field names

**Go patterns that help:**
- Test behavior via public API, not internal state
- Avoid asserting on exact call counts unless the count IS the behavior

```go
// Bad: over-specified interactions; breaks when you add an audit log call
mockStore.AssertCalled(t, "Save", user)
mockStore.AssertNumberOfCalls(t, "Save", 1)
mockStore.AssertNotCalled(t, "Log") // Breaks if you add logging later

// Good: test the outcome
result, err := service.CreateUser(ctx, req)
require.NoError(t, err)
assert.Equal(t, req.Email, result.Email)
```

### Repeatable (R)

Tests must produce the same result regardless of when, where, or in what order they run.

**Negative signals (Go):**
- `time.Now()` in test logic without injection
- `os.ReadFile` for fixtures that aren't hermetic
- Shared global state between tests
- Tests that depend on network availability
- `time.Sleep` for synchronization

**Go fixes:**
```go
// Bad: time-dependent
func TestTokenExpiry(t *testing.T) {
    token := generateToken()
    time.Sleep(2 * time.Second)
    assert.True(t, token.IsExpired())
}

// Good: inject clock
type Clock interface {
    Now() time.Time
}

type FixedClock struct{ t time.Time }
func (c FixedClock) Now() time.Time { return c.t }

func TestTokenExpiry(t *testing.T) {
    clock := FixedClock{t: time.Unix(0, 0)}
    token := generateTokenWithClock(clock)
    futureClk := FixedClock{t: time.Unix(3600, 0)}
    assert.True(t, token.IsExpiredAt(futureClk.Now()))
}
```

Use `t.TempDir()` for filesystem fixtures; the directory is removed automatically when the test and its subtests complete.

### Atomic (A)

One test = one behavior. Tests must be independent — running in any order must produce the same result.

**Go patterns:**
- `t.Parallel()` on subtests exposes hidden shared state (run with `go test -race` to catch data races)
- Fresh state in each `t.Run`
- No `init()` or package-level setup that leaks between tests

```go
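func TestUserService(t *testing.T) {
    tests := []struct {
        name    string
        seed    []CreateUserReq // Requests applied before the case runs
        input   CreateUserReq
        wantErr bool
    }{
        {"valid user", nil, validReq, false},
        // Assumes dupEmailReq reuses validReq's email, so validReq is seeded first
        {"duplicate email", []CreateUserReq{validReq}, dupEmailReq, true},
    }

    for _, tt := range tests {
        tt := tt // Capture the range variable (required before Go 1.22)
        t.Run(tt.name, func(t *testing.T) {
            t.Parallel() // Subtests run concurrently, so shared state shows up as flakiness
            store := NewInMemoryStore() // Fresh state per test
            svc := NewUserService(store)
            for _, req := range tt.seed {
                _, err := svc.Create(context.Background(), req)
                require.NoError(t, err)
            }

            _, err := svc.Create(context.Background(), tt.input)
            if tt.wantErr {
                assert.Error(t, err)
            } else {
                assert.NoError(t, err)
            }
        })
    }
}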
```

### Necessary (N)

Tests must verify real behavior. Tautology Theatre — tests whose outcome is predetermined regardless of production code — provides false confidence.

**Types of Tautology Theatre:**

1. **Mock tautology:** Configure mock return, then assert that mock returns it.
   ```go
   // Bad: this passes even if production code is deleted
   mockStore.On("GetUser", id).Return(user, nil)
   result, _ := mockStore.GetUser(id)
   assert.Equal(t, user, result) // Testing the mock, not production code
   ```

2. **Mock-only test:** Every collaborator is a mock; no real type is instantiated.

3. **Trivial tautology:** `assert.True(t, true)` or `assert.NotNil(t, new(User))`

4. **Framework test:** Verifying that Go's `make(map[string]int)` returns non-nil.

**Fix:** Test real behavior through real implementations. Use mocks only to isolate from external systems (DB, HTTP, filesystem).
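
A minimal sketch of that fix, under stated assumptions: `UserService`, `UserStore`, `InMemoryStore`, and `ErrDuplicateEmail` are hypothetical names, and the email normalization rule is invented for illustration. The store is a hand-written in-memory fake standing in for the database, while the service logic is real, so breaking or deleting that logic changes the outcome. The checks run from `main` to keep the example a single runnable file; in a suite they would be `require`/`assert` calls inside a test function.

```go
package main

import (
    "errors"
    "fmt"
    "strings"
)

// UserStore is the boundary to the external system (hypothetical).
type UserStore interface {
    Save(email string) error
    Exists(email string) bool
}

// InMemoryStore is a fake: a small working implementation kept in memory.
type InMemoryStore struct{ emails map[string]bool }

func NewInMemoryStore() *InMemoryStore {
    return &InMemoryStore{emails: map[string]bool{}}
}

func (s *InMemoryStore) Save(email string) error  { s.emails[email] = true; return nil }
func (s *InMemoryStore) Exists(email string) bool { return s.emails[email] }

var ErrDuplicateEmail = errors.New("duplicate email")

// UserService holds the production logic the test is meant to verify.
type UserService struct{ store UserStore }

// Create normalizes the email and rejects duplicates.
func (s *UserService) Create(email string) (string, error) {
    normalized := strings.ToLower(strings.TrimSpace(email))
    if s.store.Exists(normalized) {
        return "", ErrDuplicateEmail
    }
    if err := s.store.Save(normalized); err != nil {
        return "", err
    }
    return normalized, nil
}

func main() {
    svc := &UserService{store: NewInMemoryStore()}

    got, err := svc.Create("  Alice@Example.com ")
    fmt.Println(got, err) // alice@example.com <nil>

    _, err = svc.Create("alice@example.com")
    fmt.Println(errors.Is(err, ErrDuplicateEmail)) // true
}
```

Unlike the mock tautology above, nothing here is told what to return: the second call fails only because the first call really stored the normalized email.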

### Granular (G)

Each test covers one specific case. Table-driven tests in Go are the natural expression of granularity.

```go
// Good: each row is one case, each can fail independently
func TestParseAmount(t *testing.T) {
    tests := []struct {
        name    string
        input   string
        want    Amount
        wantErr bool
    }{
        {"integer", "100", Amount{Value: 100}, false},
        {"decimal", "10.50", Amount{Value: 1050, Scale: 2}, false},
        {"negative", "-5", Amount{}, true},
        {"empty", "", Amount{}, true},
        {"non-numeric", "abc", Amount{}, true},
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            got, err := ParseAmount(tt.input)
            if tt.wantErr {
                require.Error(t, err)
                return
            }
            require.NoError(t, err)
            assert.Equal(t, tt.want, got)
        })
    }
}
```

### Fast (F)

Tests must run fast enough to support TDD cycles. Target: the full unit test suite in under 30 seconds for most projects.

**Go fixes:**
- Mark slow integration tests with build tags: `//go:build integration`
- Use `t.Parallel()` to parallelize safe tests
- Use `InMemoryStore` implementations instead of real DB for unit tests
- Use `httptest.NewServer` for HTTP tests instead of real servers

```bash
# Unit tests only (fast, default)
go test ./...

# Unit and integration tests together (slower, explicit)
go test -tags=integration ./...
```

**Negative signals:** `time.Sleep`, network calls without build tags, database calls in unit tests.
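
The `httptest.NewServer` suggestion can be sketched as below. `fetchStatus` and `newStubServer` are hypothetical names, and the `/status` endpoint is invented for illustration. `httptest.NewServer` does not need a `*testing.T`, so the sketch runs as an ordinary program; in a suite the body of `main` would sit inside a test function.

```go
package main

import (
    "fmt"
    "io"
    "net/http"
    "net/http/httptest"
)

// fetchStatus is a hypothetical client function under test.
func fetchStatus(client *http.Client, baseURL string) (string, error) {
    resp, err := client.Get(baseURL + "/status")
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return "", fmt.Errorf("unexpected status %d", resp.StatusCode)
    }
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        return "", err
    }
    return string(body), nil
}

// newStubServer starts a loopback server that stands in for the real dependency.
func newStubServer() *httptest.Server {
    return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if r.URL.Path != "/status" {
            http.NotFound(w, r)
            return
        }
        fmt.Fprint(w, "ok")
    }))
}

func main() {
    srv := newStubServer()
    defer srv.Close() // Release the listener when done

    got, err := fetchStatus(srv.Client(), srv.URL)
    fmt.Println(got, err) // ok <nil>
}
```

The stub listens on a loopback address chosen at startup, so the check exercises the real HTTP client code without depending on an external network.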

### First / TDD (T)

Evidence that tests were written before implementation. This is the hardest property to verify statically.

**Positive signals:**
- Commit history shows test commit before implementation commit
- Tests test behavior, not implementation details (tests-first forces API design)
- Tests are simpler than the implementation (tests-first keeps tests focused)

**Negative signals:**
- Tests that exactly mirror the implementation structure
- Tests that only cover happy paths (implementation-first misses edge cases)
- Tests added in the same commit as a large implementation

## Go-Specific Test Design Notes

### t.Helper()

Call `t.Helper()` in helper functions so failure output reports the file and line of the call site, not the helper:

```go
func assertValidUser(t *testing.T, u User) {
    t.Helper()
    assert.NotEmpty(t, u.ID)
    assert.NotEmpty(t, u.Email)
}
```

### Table-Driven Tests Are Preferred

Go convention is table-driven tests. They're granular, readable, and easy to extend:
- Add a new case by adding a row to the table — no new test function
- Each case can be run independently: `go test -run TestFoo/case_name` (spaces in subtest names are replaced with underscores)

### Subtests Enable Targeted Runs

```bash
go test -run TestValidateUser/rejects_empty_email ./...
```

## When Writing Tests

Apply this checklist to every new test:

- [ ] Name describes the behavior being tested (not the function name)
- [ ] Structure follows Arrange-Act-Assert
- [ ] Tests one behavior (no "and" in the name)
- [ ] Uses real implementations where feasible
- [ ] Runs in < 100ms (or tagged for integration)
- [ ] Uses `t.Helper()` in helper functions
- [ ] Table-driven if testing multiple similar inputs

## Cross-References

- Load `tdd` skill for the full TDD workflow
- Load `code-review` skill for test quality review during pre-merge review
- See `clean-code/references/code-smells.md` for testing-specific smells