skills/test-design/SKILL.md
Mathias d6a71e370e
chore: bootstrap skills library — 19 skills + installer + CI auto-tag
Phase 1 of mathias/skills extraction (infra#62 Track D — homelab
next-step plan addendum). Imports ~/dev/.skills/ verbatim (19 skill
dirs + SKILLS_INDEX.md) and adds the installation surface:

- Taskfile.yml — install / update / list / release / check targets
- install.sh — bootstrap installer for hosts without Task. Idempotent
  symlink wirer; default checkout at ~/.local/share/skills/ on every
  host; SKILLS_REF env var pins a tag (default: main).
- .gitea/workflows/release.yml — auto-tag every push to main by
  Bump-Type footer (major/minor/patch, default patch). Skipped when
  commit contains [skip-release].
- README — usage, versioning, contribution flow, secret-hygiene rule.

Phase 1 wires Claude Code only (~/.claude/skills/<name> global +
<repo>/.claude/skills/<name> per-repo). Phase 2 adds Crush, opencode,
antigravity, and gitea-resident agents (cobalt-dingo, agentsquad)
once their skill conventions are researched.

Public repo, markdown-only — no secrets, no client names. Verified
via pre-push grep before initial push.

[skip-release]
2026-05-24 14:59:54 +02:00

---
name: test-design
description: Evaluate test quality using Dave Farley's 8 Properties of Good Tests. Use when reviewing or writing tests to ensure they provide genuine verification.
---
# Test Design
## Overview
Good tests are investments. Bad tests are liabilities — they pass when they shouldn't, fail when code is correct, or verify nothing meaningful.
This skill uses Dave Farley's 8 Properties of Good Tests to assess and improve test quality. The **Farley Index** (0–10) provides a scored summary.
Reference: [Dave Farley's Properties of Good Tests](https://www.linkedin.com/pulse/tdd-properties-good-tests-dave-farley-iexge/)
## The 8 Properties
| Property | Weight | What it measures |
|----------|--------|-----------------|
| **Understandable** | 1.5x | Can a reader understand what behavior is being tested? |
| **Maintainable** | 1.5x | Will small code changes cause test failures unrelated to behavior? |
| **Repeatable** | 1.25x | Same result every time, regardless of environment or order |
| **Atomic** | 1.0x | One behavior per test; tests are independent |
| **Necessary** | 1.0x | Tests real behavior, not mock internals or framework behavior |
| **Granular** | 1.0x | Each test covers one specific case |
| **Fast** | 0.75x | Tests run quickly enough to support rapid TDD cycles |
| **First (TDD)** | 1.0x | Tests were written before implementation |
**Farley Index formula:** `(U×1.5 + M×1.5 + R×1.25 + A×1.0 + N×1.0 + G×1.0 + F×0.75 + T×1.0) / 9.0`
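The formula can be sketched in Go as a plain weighted average. This is a minimal illustration, assuming each property is scored 0 to 10; the `Scores` struct and `FarleyIndex` function names are hypothetical and not part of the skill itself.

```go
package main

import "fmt"

// Scores holds one 0-10 rating per property. The struct and its field
// names are illustrative assumptions.
type Scores struct {
	Understandable float64
	Maintainable   float64
	Repeatable     float64
	Atomic         float64
	Necessary      float64
	Granular       float64
	Fast           float64
	First          float64
}

// FarleyIndex applies the weights from the table and divides by their
// sum (9.0), so the result stays on the same 0-10 scale as the inputs.
func FarleyIndex(s Scores) float64 {
	weighted := s.Understandable*1.5 +
		s.Maintainable*1.5 +
		s.Repeatable*1.25 +
		s.Atomic*1.0 +
		s.Necessary*1.0 +
		s.Granular*1.0 +
		s.Fast*0.75 +
		s.First*1.0
	return weighted / 9.0
}

func main() {
	s := Scores{
		Understandable: 8, Maintainable: 7, Repeatable: 9, Atomic: 8,
		Necessary: 6, Granular: 8, Fast: 9, First: 5,
	}
	// 12 + 10.5 + 11.25 + 8 + 6 + 8 + 6.75 + 5 = 67.5, and 67.5 / 9 = 7.5
	fmt.Printf("Farley Index: %.2f\n", FarleyIndex(s))
}
```

Because the divisor equals the sum of the weights, a suite that scores 10 on every property gets an index of exactly 10.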
## Rating Scale
| Score | Rating | Interpretation |
|-------|--------|----------------|
| 9.0–10.0 | Exemplary | Model quality; tests serve as living documentation |
| 7.5–8.9 | Excellent | High quality with minor improvement opportunities |
| 6.0–7.4 | Good | Solid foundation with clear improvement areas |
| 4.5–5.9 | Fair | Functional but needs significant attention |
| 3.0–4.4 | Poor | Tests provide limited value; refactoring needed |
| 0.0–2.9 | Critical | Tests may be harmful; consider rewriting |
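A score can be mapped to its rating with a simple threshold check. The sketch below is one possible reading of the table: it assumes each band starts at its lower bound, so a score that falls between two printed ranges (such as 8.95) lands in the lower band. The `Rating` function name is hypothetical.

```go
package main

import "fmt"

// Rating maps a Farley Index to the labels in the rating table. Bands are
// treated as "at or above the lower bound", which is an assumption made
// here to cover values between the printed ranges.
func Rating(index float64) string {
	switch {
	case index >= 9.0:
		return "Exemplary"
	case index >= 7.5:
		return "Excellent"
	case index >= 6.0:
		return "Good"
	case index >= 4.5:
		return "Fair"
	case index >= 3.0:
		return "Poor"
	default:
		return "Critical"
	}
}

func main() {
	for _, idx := range []float64{9.4, 7.5, 5.2, 2.0} {
		fmt.Printf("%.1f -> %s\n", idx, Rating(idx))
	}
}
```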
## Property Deep Dives
### Understandable (U)
A test should tell a story: what behavior, under what conditions, produces what result.
**Go patterns that help:**
- Subtest names in `t.Run`: `t.Run("returns error when email is empty", ...)`
- Table-driven tests with descriptive `name` fields
- Arrange-Act-Assert structure with blank lines separating sections
```go
// Good: clear behavior name, clear structure
func TestValidateUser_RejectsEmptyEmail(t *testing.T) {
	// Arrange
	user := User{Name: "Alice", Email: ""}

	// Act
	err := ValidateUser(user)

	// Assert
	require.Error(t, err)
	assert.ErrorIs(t, err, ErrInvalidEmail)
}

// Bad: cryptic name, no structure
func TestUser1(t *testing.T) {
	u := User{}
	assert.NotNil(t, ValidateUser(u))
}
```
**Negative signals:** cryptic names (`test_1`, `TestFoo`), no AAA structure, multiple behaviors in one test.
### Maintainable (M)
Tests that break when implementation changes (but behavior doesn't) create noise and slow down development.
**Negative signals:**
- Over-specified mock interactions (`mock.AssertCalled(t, "MethodX", args...)` when behavior is all that matters)
- Deep inspection of captured call arguments (Mockito's `ArgumentCaptor` style)
- Strict no-other-calls checks (Mockito's `verifyNoMoreInteractions` style) that break when you add a logging call
- Tests coupled to internal field names
**Go patterns that help:**
- Test behavior via public API, not internal state
- Avoid asserting on exact call counts unless the count IS the behavior
```go
// Bad: breaks when you add an audit log call
mock.AssertCalled(t, "Save", user)
mock.AssertNumberOfCalls(t, "Save", 1)
mock.AssertNotCalled(t, "Log") // Breaks if you add logging later
// Good: test the outcome
result, err := service.CreateUser(ctx, req)
require.NoError(t, err)
assert.Equal(t, user.Email, result.Email)
```
### Repeatable (R)
Tests must produce the same result regardless of when, where, or in what order they run.
**Negative signals (Go):**
- `time.Now()` in test logic without injection
- `os.ReadFile` for fixtures that aren't hermetic
- Shared global state between tests
- Tests that depend on network availability
- `time.Sleep` for synchronization
**Go fixes:**
```go
// Bad: time-dependent
func TestTokenExpiry(t *testing.T) {
	token := generateToken()
	time.Sleep(2 * time.Second)
	assert.True(t, token.IsExpired())
}

// Good: inject clock
type Clock interface {
	Now() time.Time
}

type FixedClock struct{ t time.Time }

func (c FixedClock) Now() time.Time { return c.t }

func TestTokenExpiry(t *testing.T) {
	clock := FixedClock{t: time.Unix(0, 0)}
	token := generateTokenWithClock(clock)

	futureClk := FixedClock{t: time.Unix(3600, 0)}
	assert.True(t, token.IsExpiredAt(futureClk.Now()))
}
```
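The test above relies on production code that accepts a clock. A minimal sketch of that side might look like the following; the `Token` shape, the 30-minute `tokenTTL`, and the bodies of `generateTokenWithClock` and `IsExpiredAt` are assumptions made for illustration, since the original does not define them.

```go
package main

import (
	"fmt"
	"time"
)

// Clock is the seam that lets tests control time.
type Clock interface {
	Now() time.Time
}

// FixedClock always reports the same instant, which keeps tests deterministic.
type FixedClock struct{ t time.Time }

func (c FixedClock) Now() time.Time { return c.t }

// tokenTTL is an assumed lifetime chosen for this sketch.
const tokenTTL = 30 * time.Minute

// Token records when it stops being valid. The shape is hypothetical.
type Token struct {
	ExpiresAt time.Time
}

// generateTokenWithClock stamps the expiry from the injected clock instead
// of calling time.Now() directly.
func generateTokenWithClock(c Clock) Token {
	return Token{ExpiresAt: c.Now().Add(tokenTTL)}
}

// IsExpiredAt reports whether the token is no longer valid at the given time.
func (t Token) IsExpiredAt(now time.Time) bool {
	return !now.Before(t.ExpiresAt)
}

func main() {
	issued := FixedClock{t: time.Unix(0, 0)}
	token := generateTokenWithClock(issued)

	fmt.Println(token.IsExpiredAt(time.Unix(60, 0)))   // one minute after issue: false
	fmt.Println(token.IsExpiredAt(time.Unix(3600, 0))) // one hour after issue: true
}
```

Production wiring would pass a clock whose `Now` delegates to `time.Now()`, so only tests ever see a fixed time.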
Use `t.TempDir()` for filesystem fixtures — cleaned up automatically.
### Atomic (A)
One test = one behavior. Tests must be independent — running in any order must produce the same result.
**Go patterns:**
- `t.Parallel()` on subtests surfaces hidden shared state, especially when run with `go test -race`
- Fresh state in each `t.Run`
- No `init()` or package-level setup that leaks between tests
```go
func TestUserService(t *testing.T) {
	tests := []struct {
		name    string
		input   CreateUserReq
		wantErr bool
	}{
		{"valid user", validReq, false},
		{"duplicate email", dupEmailReq, true},
	}
	for _, tt := range tests {
		tt := tt // Per-iteration copy; required before Go 1.22 when subtests run in parallel
		t.Run(tt.name, func(t *testing.T) {
			t.Parallel() // Subtests run concurrently, so they cannot share state

			store := NewInMemoryStore() // Fresh state per test
			svc := NewUserService(store)

			_, err := svc.Create(context.Background(), tt.input)

			if tt.wantErr {
				assert.Error(t, err)
			} else {
				assert.NoError(t, err)
			}
		})
	}
}
```
### Necessary (N)
Tests must verify real behavior. Tautology Theatre — tests whose outcome is predetermined regardless of production code — provides false confidence.
**Types of Tautology Theatre:**
1. **Mock tautology:** Configure mock return, then assert that mock returns it.
```go
// Bad: this passes even if production code is deleted
mockStore.On("GetUser", id).Return(user, nil)
result, _ := mockStore.GetUser(id)
assert.Equal(t, user, result) // Testing the mock, not production code
```
2. **Mock-only test:** Every object is a mock; no real class instantiated.
3. **Trivial tautology:** `assert.True(t, true)` or `assert.NotNil(t, new(User))`
4. **Framework test:** Verifying that Go's `make(map[string]int)` returns non-nil.
**Fix:** Test real behavior through real implementations. Use mocks only to isolate from external systems (DB, HTTP, filesystem).
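One way to follow this fix is to put an interface at the external boundary and give tests a small in-memory implementation, so the service logic under test is real code. The sketch below is illustrative only: `UserStore`, `ExistsByEmail`, `ErrDuplicateEmail`, the `CreateUserReq` fields, and the `createTwice` helper are hypothetical names, not an API defined by this skill.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

var ErrDuplicateEmail = errors.New("email already registered")

type User struct{ Email string }

type CreateUserReq struct{ Email string }

// UserStore is the boundary to the external system (for example a database).
type UserStore interface {
	Save(ctx context.Context, u User) error
	ExistsByEmail(ctx context.Context, email string) (bool, error)
}

// InMemoryStore is a real, working implementation backed by a map.
type InMemoryStore struct {
	mu    sync.Mutex
	users map[string]User
}

func NewInMemoryStore() *InMemoryStore {
	return &InMemoryStore{users: make(map[string]User)}
}

func (s *InMemoryStore) Save(_ context.Context, u User) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.users[u.Email] = u
	return nil
}

func (s *InMemoryStore) ExistsByEmail(_ context.Context, email string) (bool, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.users[email]
	return ok, nil
}

// UserService holds the production logic that a test should exercise.
type UserService struct{ store UserStore }

func NewUserService(store UserStore) *UserService { return &UserService{store: store} }

func (s *UserService) Create(ctx context.Context, req CreateUserReq) (User, error) {
	exists, err := s.store.ExistsByEmail(ctx, req.Email)
	if err != nil {
		return User{}, err
	}
	if exists {
		return User{}, ErrDuplicateEmail
	}
	u := User{Email: req.Email}
	if err := s.store.Save(ctx, u); err != nil {
		return User{}, err
	}
	return u, nil
}

// createTwice runs the real service against a fresh in-memory store and
// returns the errors from two creations with the same email.
func createTwice(email string) (first, second error) {
	ctx := context.Background()
	svc := NewUserService(NewInMemoryStore())
	_, first = svc.Create(ctx, CreateUserReq{Email: email})
	_, second = svc.Create(ctx, CreateUserReq{Email: email})
	return first, second
}

func main() {
	first, second := createTwice("alice@example.com")
	fmt.Println("first create:", first)
	fmt.Println("second create is duplicate:", errors.Is(second, ErrDuplicateEmail))
}
```

If `UserService.Create` were deleted or its duplicate check removed, a test built on this store would fail, which is the property the mock tautology lacks.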
### Granular (G)
Each test covers one specific case. Table-driven tests in Go are the natural expression of granularity.
```go
// Good: each row is one case, each can fail independently
func TestParseAmount(t *testing.T) {
	tests := []struct {
		name    string
		input   string
		want    Amount
		wantErr bool
	}{
		{"integer", "100", Amount{Value: 100}, false},
		{"decimal", "10.50", Amount{Value: 1050, Scale: 2}, false},
		{"negative", "-5", Amount{}, true},
		{"empty", "", Amount{}, true},
		{"non-numeric", "abc", Amount{}, true},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			got, err := ParseAmount(tt.input)
			if tt.wantErr {
				require.Error(t, err)
				return
			}
			require.NoError(t, err)
			assert.Equal(t, tt.want, got)
		})
	}
}
```
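For context, here is one hypothetical `ParseAmount` that would satisfy every row of the table above. The original does not show the implementation, so the `Amount` field types, the `ErrInvalidAmount` error, and the parsing rules (unsigned digits with at most one decimal point) are assumptions inferred from the test cases.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// Amount stores a decimal as an integer plus the number of fractional
// digits, so "10.50" becomes Value 1050 with Scale 2.
type Amount struct {
	Value int64
	Scale int
}

var ErrInvalidAmount = errors.New("invalid amount")

// ParseAmount accepts unsigned decimal strings such as "100" or "10.50".
func ParseAmount(s string) (Amount, error) {
	intPart, fracPart, hasDot := strings.Cut(s, ".")
	if intPart == "" || (hasDot && fracPart == "") {
		return Amount{}, ErrInvalidAmount
	}
	digits := intPart + fracPart
	for _, r := range digits {
		if r < '0' || r > '9' { // rejects signs, letters, and a second "."
			return Amount{}, ErrInvalidAmount
		}
	}
	v, err := strconv.ParseInt(digits, 10, 64)
	if err != nil { // for example, too many digits for int64
		return Amount{}, ErrInvalidAmount
	}
	return Amount{Value: v, Scale: len(fracPart)}, nil
}

func main() {
	for _, in := range []string{"100", "10.50", "-5", "", "abc"} {
		got, err := ParseAmount(in)
		fmt.Printf("%q -> %+v, err=%v\n", in, got, err)
	}
}
```

Each table row pins down one of these branches, so a failing row points directly at the rule that regressed.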
### Fast (F)
Tests must run fast enough to support TDD cycles. Target: the full test suite in < 30 seconds for most projects.
**Go fixes:**
- Mark slow integration tests with build tags: `//go:build integration`
- Use `t.Parallel()` to parallelize safe tests
- Use `InMemoryStore` implementations instead of real DB for unit tests
- Use `httptest.NewServer` for HTTP tests instead of real servers
```bash
# Unit tests only (fast, default)
go test ./...
# Integration tests (slower, explicit)
go test -tags=integration ./...
```
**Negative signals:** `time.Sleep`, network calls without build tags, database calls in unit tests.
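The `httptest.NewServer` suggestion above can be sketched as follows. The server runs in-process on a loopback port, so the HTTP client code is exercised end to end without any external network. `fetchBody` and `fetchFromLocalServer` are hypothetical names; in a real suite the body of `fetchFromLocalServer` would sit inside a `Test` function.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchBody stands in for production code that calls an HTTP API.
func fetchBody(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	b, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// fetchFromLocalServer starts an in-process server, calls it through
// fetchBody, and shuts it down again.
func fetchFromLocalServer() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "pong")
	}))
	defer srv.Close()

	return fetchBody(srv.Client(), srv.URL)
}

func main() {
	body, err := fetchFromLocalServer()
	fmt.Println(body, err)
}
```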
### First / TDD (T)
This property asks for evidence that tests were written before the implementation. It is the hardest property to verify statically.
**Positive signals:**
- Commit history shows test commit before implementation commit
- Tests test behavior, not implementation details (tests-first forces API design)
- Tests are simpler than the implementation (tests-first keeps tests focused)
**Negative signals:**
- Tests that exactly mirror the implementation structure
- Tests that only cover happy paths (implementation-first misses edge cases)
- Tests added in the same commit as a large implementation
## Go-Specific Test Design Notes
### t.Helper()
Call `t.Helper()` at the top of helper functions so failure messages report the file and line of the caller, not of the helper:
```go
func assertValidUser(t *testing.T, u User) {
	t.Helper()
	assert.NotEmpty(t, u.ID)
	assert.NotEmpty(t, u.Email)
}
```
### Table-Driven Tests Are Preferred
Go convention is table-driven tests. They're granular, readable, and easy to extend:
- Add a new case by adding a row to the table — no new test function
- Each case can be run independently: `go test -run TestFoo/case_name` (spaces in subtest names are replaced with underscores)
### Subtests Enable Targeted Runs
```bash
go test -run TestValidateUser/rejects_empty_email ./...
```
## When Writing Tests
Apply this checklist to every new test:
- [ ] Name describes the behavior being tested (not the function name)
- [ ] Structure follows Arrange-Act-Assert
- [ ] Tests one behavior (no "and" in the name)
- [ ] Uses real implementations where feasible
- [ ] Runs in < 100ms (or tagged for integration)
- [ ] Uses `t.Helper()` in helper functions
- [ ] Table-driven if testing multiple similar inputs
## Cross-References
- Load `tdd` skill for the full TDD workflow
- Load `code-review` skill for test quality review during pre-merge review
- See `clean-code/references/code-smells.md` for testing-specific smells