| name | description |
|---|---|
| test-design | Evaluate test quality using Dave Farley's 8 Properties of Good Tests. Use when reviewing or writing tests to ensure they provide genuine verification. |
Test Design
Overview
Good tests are investments. Bad tests are liabilities — they pass when they shouldn't, fail when code is correct, or verify nothing meaningful.
This skill uses Dave Farley's 8 Properties of Good Tests to assess and improve test quality. The Farley Index (0–10) provides a scored summary.
Reference: Dave Farley's Properties of Good Tests
The 8 Properties
| Property | Weight | What it measures |
|---|---|---|
| Understandable | 1.5x | Can a reader understand what behavior is being tested? |
| Maintainable | 1.5x | Do tests keep passing when the code changes but the behavior does not? |
| Repeatable | 1.25x | Same result every time, regardless of environment or order |
| Atomic | 1.0x | One behavior per test; tests are independent |
| Necessary | 1.0x | Tests real behavior, not mock internals or framework behavior |
| Granular | 1.0x | Each test covers one specific case |
| Fast | 0.75x | Tests run quickly enough to support rapid TDD cycles |
| First (TDD) | 1.0x | Tests were written before implementation |
Farley Index formula: (U×1.5 + M×1.5 + R×1.25 + A×1.0 + N×1.0 + G×1.0 + F×0.75 + T×1.0) / 9.0
Rating Scale
| Score | Rating | Interpretation |
|---|---|---|
| 9.0–10.0 | Exemplary | Model quality; tests serve as living documentation |
| 7.5–8.9 | Excellent | High quality with minor improvement opportunities |
| 6.0–7.4 | Good | Solid foundation with clear improvement areas |
| 4.5–5.9 | Fair | Functional but needs significant attention |
| 3.0–4.4 | Poor | Tests provide limited value; refactoring needed |
| 0.0–2.9 | Critical | Tests may be harmful; consider rewriting |
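The formula and rating scale above can be sketched in Go as follows. This is a minimal illustration, not part of the skill: the `Scores` type and the `FarleyIndex` and `Rating` function names are assumptions introduced here. Only the weights, the divisor, and the rating labels come from the tables above. The thresholds use each band's lower bound, so values that fall between the printed bands (for example 8.95) land in the lower band.

```go
package main

// Scores holds one 0-10 score per property. The type and field names are
// illustrative assumptions; the source defines only the weights and formula.
type Scores struct {
	Understandable, Maintainable, Repeatable, Atomic float64
	Necessary, Granular, Fast, First                 float64
}

// FarleyIndex applies the weighted formula. The weights sum to 9.0, so a
// score of 10 on every property yields an index of 10.
func FarleyIndex(s Scores) float64 {
	weighted := s.Understandable*1.5 + s.Maintainable*1.5 + s.Repeatable*1.25 +
		s.Atomic*1.0 + s.Necessary*1.0 + s.Granular*1.0 + s.Fast*0.75 + s.First*1.0
	return weighted / 9.0
}

// Rating maps an index to the labels from the rating scale table, using the
// lower bound of each band as the threshold.
func Rating(index float64) string {
	switch {
	case index >= 9.0:
		return "Exemplary"
	case index >= 7.5:
		return "Excellent"
	case index >= 6.0:
		return "Good"
	case index >= 4.5:
		return "Fair"
	case index >= 3.0:
		return "Poor"
	default:
		return "Critical"
	}
}
```

For example, a suite scoring 6 on every property has a weighted sum of 54, giving an index of 6.0 and a rating of Good.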
Property Deep Dives
Understandable (U)
A test should tell a story: what behavior, under what conditions, produces what result.
Go patterns that help:
- Subtest names in `t.Run`: `t.Run("returns error when email is empty", ...)`
- Table-driven tests with descriptive `name` fields
- Arrange-Act-Assert structure with blank lines separating sections
// Good: clear behavior name, clear structure
func TestValidateUser_RejectsEmptyEmail(t *testing.T) {
	// Arrange
	user := User{Name: "Alice", Email: ""}

	// Act
	err := ValidateUser(user)

	// Assert
	require.Error(t, err)
	assert.ErrorIs(t, err, ErrInvalidEmail)
}

// Bad: cryptic name, no structure
func TestUser1(t *testing.T) {
	u := User{}
	assert.NotNil(t, ValidateUser(u))
}
Negative signals: cryptic names (test_1, TestFoo), no AAA structure, multiple behaviors in one test.
Maintainable (M)
Tests that break when implementation changes (but behavior doesn't) create noise and slow down development.
Negative signals:
- Over-specified mock interactions (`mock.AssertCalled(t, "MethodX", args...)` when behavior is all that matters)
- `ArgumentCaptor` deep inspection (Mockito)
- `verifyNoMoreInteractions` (Mockito) that breaks when you add a logging call
- Tests coupled to internal field names
Go patterns that help:
- Test behavior via public API, not internal state
- Avoid asserting on exact call counts unless the count IS the behavior
// Bad: breaks when you add an audit log call
mock.AssertCalled(t, "Save", user)
mock.AssertNumberOfCalls(t, "Save", 1)
mock.AssertNotCalled(t, "Log") // Breaks if you add logging later
// Good: test the outcome
result, err := service.CreateUser(ctx, req)
require.NoError(t, err)
assert.Equal(t, user.Email, result.Email)
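As a self-contained illustration of testing through the public API, here is a minimal sketch that uses only the standard `testing` package rather than testify. The `Cart` type and its methods are hypothetical names introduced for this example. The test asserts on the observable result only, so it should keep passing if the internal representation changes.

```go
package main

import "testing"

// Cart is a hypothetical type used only for illustration. Its field is
// unexported on purpose: tests should not depend on it.
type Cart struct {
	prices []int // cents; could become a map or a running total later
}

// Add puts an item price (in cents) into the cart.
func (c *Cart) Add(priceCents int) { c.prices = append(c.prices, priceCents) }

// Total returns the sum of all item prices in cents.
func (c *Cart) Total() int {
	sum := 0
	for _, p := range c.prices {
		sum += p
	}
	return sum
}

// TestCart_TotalSumsAddedItems checks behavior via Add and Total only and
// never inspects c.prices.
func TestCart_TotalSumsAddedItems(t *testing.T) {
	// Arrange
	var cart Cart

	// Act
	cart.Add(250)
	cart.Add(199)

	// Assert
	if got, want := cart.Total(), 449; got != want {
		t.Errorf("Total() = %d, want %d", got, want)
	}
}
```

Replacing the `prices` slice with a running total would leave this test untouched, whereas a test that asserted on `len(cart.prices)` would fail without any change in behavior.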
Repeatable (R)
Tests must produce the same result regardless of when, where, or in what order they run.
Negative signals (Go):
- `time.Now()` in test logic without injection
- `os.ReadFile` for fixtures that aren't hermetic
- Shared global state between tests
- Tests that depend on network availability
- `time.Sleep` for synchronization
Go fixes:
// Bad: time-dependent
func TestTokenExpiry(t *testing.T) {
	token := generateToken()
	time.Sleep(2 * time.Second)
	assert.True(t, token.IsExpired())
}

// Good: inject clock
type Clock interface {
	Now() time.Time
}

type FixedClock struct{ t time.Time }

func (c FixedClock) Now() time.Time { return c.t }

func TestTokenExpiry(t *testing.T) {
	clock := FixedClock{t: time.Unix(0, 0)}
	token := generateTokenWithClock(clock)

	futureClk := FixedClock{t: time.Unix(3600, 0)}
	assert.True(t, token.IsExpiredAt(futureClk.Now()))
}
Use t.TempDir() for filesystem fixtures — cleaned up automatically.
Atomic (A)
One test = one behavior. Tests must be independent — running in any order must produce the same result.
Go patterns:
- `t.Parallel()` on subtests forces isolation
- Fresh state in each `t.Run`
- No `init()` or package-level setup that leaks between tests
func TestUserService(t *testing.T) {
	tests := []struct {
		name    string
		input   CreateUserReq
		wantErr bool
	}{
		{"valid user", validReq, false},
		{"duplicate email", dupEmailReq, true},
	}
	for _, tt := range tests {
		tt := tt // Capture the range variable; needed before Go 1.22 for parallel subtests
		t.Run(tt.name, func(t *testing.T) {
			t.Parallel()                // Each subtest runs independently
			store := NewInMemoryStore() // Fresh state per test
			svc := NewUserService(store)

			_, err := svc.Create(context.Background(), tt.input)

			if tt.wantErr {
				assert.Error(t, err)
			} else {
				assert.NoError(t, err)
			}
		})
	}
}
Necessary (N)
Tests must verify real behavior. Tautology Theatre — tests whose outcome is predetermined regardless of production code — provides false confidence.
Types of Tautology Theatre:
- Mock tautology: Configure a mock return value, then assert that the mock returns it.

      // Bad: this passes even if production code is deleted
      mockStore.On("GetUser", id).Return(user, nil)
      result, _ := mockStore.GetUser(id)
      assert.Equal(t, user, result) // Testing the mock, not production code

- Mock-only test: Every object is a mock; no real type is instantiated.
- Trivial tautology: `assert.True(t, true)` or `assert.NotNil(t, new(User))`
- Framework test: Verifying that Go's `make(map[string]int)` returns non-nil.
Fix: Test real behavior through real implementations. Use mocks only to isolate from external systems (DB, HTTP, filesystem).
Granular (G)
Each test covers one specific case. Table-driven tests in Go are the natural expression of granularity.
// Good: each row is one case, each can fail independently
func TestParseAmount(t *testing.T) {
	tests := []struct {
		name    string
		input   string
		want    Amount
		wantErr bool
	}{
		{"integer", "100", Amount{Value: 100}, false},
		{"decimal", "10.50", Amount{Value: 1050, Scale: 2}, false},
		{"negative", "-5", Amount{}, true},
		{"empty", "", Amount{}, true},
		{"non-numeric", "abc", Amount{}, true},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			got, err := ParseAmount(tt.input)
			if tt.wantErr {
				require.Error(t, err)
				return
			}
			require.NoError(t, err)
			assert.Equal(t, tt.want, got)
		})
	}
}
Fast (F)
Tests must run fast enough to support TDD cycles. Target: the full test suite in < 30 seconds for most projects.
Go fixes:
- Mark slow integration tests with build tags: `//go:build integration`
- Use `t.Parallel()` to parallelize safe tests
- Use `InMemoryStore` implementations instead of real DB for unit tests
- Use `httptest.NewServer` for HTTP tests instead of real servers
# Unit tests only (fast, default)
go test ./...
# Integration tests (slower, explicit)
go test -tags=integration ./...
Negative signals: time.Sleep, network calls without build tags, database calls in unit tests.
First / TDD (T)
Evidence that tests were written before implementation. This is the hardest property to verify statically.
Positive signals:
- Commit history shows test commit before implementation commit
- Tests test behavior, not implementation details (tests-first forces API design)
- Tests are simpler than the implementation (tests-first keeps tests focused)
Negative signals:
- Tests that exactly mirror the implementation structure
- Tests that only cover happy paths (implementation-first misses edge cases)
- Tests added in the same commit as a large implementation
Go-Specific Test Design Notes
t.Helper()
Use t.Helper() in helper functions so that failure messages report the file and line of the call site, not the helper:
func assertValidUser(t *testing.T, u User) {
	t.Helper()
	assert.NotEmpty(t, u.ID)
	assert.NotEmpty(t, u.Email)
}
Table-Driven Tests Are Preferred
Go convention is table-driven tests. They're granular, readable, and easy to extend:
- Add a new case by adding a row to the table — no new test function
- Each case can be run independently: `go test -run TestFoo/case_name`
Subtests Enable Targeted Runs
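go test -run TestValidateUser/rejects_empty_email ./...
go test rewrites spaces in subtest names to underscores, so a subtest declared as t.Run("rejects empty email", ...) is selected as rejects_empty_email.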
When Writing Tests
Apply this checklist to every new test:
- Name describes the behavior being tested (not the function name)
- Structure follows Arrange-Act-Assert
- Tests one behavior (no "and" in the name)
- Uses real implementations where feasible
- Runs in < 100ms (or tagged for integration)
- Uses `t.Helper()` in helper functions
- Table-driven if testing multiple similar inputs
Cross-References
- Load `tdd` skill for the full TDD workflow
- Load `code-review` skill for test quality review during pre-merge review
- See `clean-code/references/code-smells.md` for testing-specific smells