skills/debug/SKILL.md
Mathias d6a71e370e
chore: bootstrap skills library — 19 skills + installer + CI auto-tag
Phase 1 of mathias/skills extraction (infra#62 Track D — homelab
next-step plan addendum). Imports ~/dev/.skills/ verbatim (19 skill
dirs + SKILLS_INDEX.md) and adds the installation surface:

- Taskfile.yml — install / update / list / release / check targets
- install.sh — bootstrap installer for hosts without Task. Idempotent
  symlink wirer; default checkout at ~/.local/share/skills/ on every
  host; SKILLS_REF env var pins a tag (default: main).
- .gitea/workflows/release.yml — auto-tag every push to main by
  Bump-Type footer (major/minor/patch, default patch). Skipped when
  commit contains [skip-release].
- README — usage, versioning, contribution flow, secret-hygiene rule.

Phase 1 wires Claude Code only (~/.claude/skills/<name> global +
<repo>/.claude/skills/<name> per-repo). Phase 2 adds Crush, opencode,
antigravity, and gitea-resident agents (cobalt-dingo, agentsquad)
once their skill conventions are researched.

Public repo, markdown-only — no secrets, no client names. Verified
via pre-push grep before initial push.

[skip-release]
2026-05-24 14:59:54 +02:00


name: debug
description: Systematic hypothesis-first debugging. Generate 3-5 ranked hypotheses with verification steps before suggesting any fix. Use when encountering a bug, test failure, or unexpected behavior.

Debug

Overview

Debugging is hypothesis generation, not fix generation. The first instinct — "try this and see if it works" — wastes time and teaches nothing. A disciplined debug session produces a small set of ranked, falsifiable hypotheses, each with a concrete verification command. The fix comes after one hypothesis is confirmed, not before.

Core principle: A hypothesis you cannot verify with one command or one file inspection is not a hypothesis; it is a guess.

When to Use

  • Any test failure whose cause is not immediately obvious
  • Any production error or unexpected behavior
  • Any bug report from a user or stakeholder
  • Before reaching for a debugger or adding print statements: form hypotheses first

Do not use for:

  • Compile errors with clear messages (just fix the typo)
  • Already-diagnosed bugs where you know the cause (go straight to TDD with a regression test)

Iron Laws

  1. Never suggest "try X and see what happens." Every hypothesis must have a specific expected outcome if correct.
  2. Generate 3-5 hypotheses, ordered by likelihood (most likely first). Fewer than 3 means you stopped thinking; more than 5 means you are not prioritizing.
  3. Diagnose only; do not fix in this skill. The fix happens in a separate TDD cycle (load the tdd skill) once a hypothesis is confirmed.

Process

Step 1: Read the failure

Before forming hypotheses:

  • Read the full error message and stack trace, not just the headline
  • Read the file where the failure originated, around the failing line
  • If the failure is from a test, read the test and the code under test
  • Identify the failure mode — what actually went wrong (e.g. "nil pointer dereference in goroutine spawned by handler") not just what the error says ("runtime error")

Step 2: Generate hypotheses

For each hypothesis, capture three things:

  • Mechanism: what specific code path or state would produce this exact failure
  • Verification: the single command or file inspection that confirms or denies it
  • Expected outcome if correct: the specific output you would see

Order by likelihood. The most likely cause is hypothesis 1.
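The three captured fields and the ordering rule can be sketched as a small data structure. This is a minimal illustration, not part of the skill itself: the `Hypothesis` class and `rank_hypotheses` function are hypothetical names, and the sketch assumes the likelihood labels are limited to the three that appear in the output format (high, medium, low).

```python
from dataclasses import dataclass

# Assumed label set, taken from the output format; lower rank sorts first.
LIKELIHOOD_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass(frozen=True)
class Hypothesis:
    mechanism: str   # code path or state that would produce the failure
    verify: str      # single command or file:line to inspect
    expected: str    # specific output if the hypothesis is correct
    likelihood: str  # "high", "medium", or "low"


def rank_hypotheses(hypotheses):
    """Validate a hypothesis set and return it ordered most likely first."""
    if not 3 <= len(hypotheses) <= 5:
        raise ValueError(f"need 3-5 hypotheses, got {len(hypotheses)}")
    for h in hypotheses:
        if h.likelihood not in LIKELIHOOD_RANK:
            raise ValueError(f"unknown likelihood: {h.likelihood!r}")
        if not (h.mechanism.strip() and h.verify.strip() and h.expected.strip()):
            raise ValueError("mechanism, verify, and expected are all required")
    # sorted() is stable, so equally likely hypotheses keep their input order.
    return sorted(hypotheses, key=lambda h: LIKELIHOOD_RANK[h.likelihood])
```

Rejecting a set with an empty `verify` or `expected` field mirrors the first Iron Law: a hypothesis without a concrete check and a predicted outcome does not count.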

Step 3: Output

Use this format:

HYPOTHESIS 1 (likelihood: high): <mechanism in one sentence>
VERIFY: <exact command or file:line to inspect>
EXPECTED IF CORRECT: <specific output, value, or condition>

HYPOTHESIS 2 (likelihood: medium): <mechanism>
VERIFY: <exact command>
EXPECTED IF CORRECT: <specific output>

[... up to 5 ...]

RECOMMENDED NEXT STEP: Run VERIFY for hypothesis 1 first.

End with the recommendation, not a fix.
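As an illustration of the format above, the following sketch renders a ranked list into the report text. The function name `render_report` and the tuple layout are assumptions made for this example; only the output lines themselves come from the format.

```python
def render_report(hypotheses):
    """Render already-ranked (likelihood, mechanism, verify, expected) tuples."""
    if not 3 <= len(hypotheses) <= 5:
        raise ValueError(f"need 3-5 hypotheses, got {len(hypotheses)}")
    blocks = []
    for number, (likelihood, mechanism, verify, expected) in enumerate(hypotheses, start=1):
        blocks.append(
            f"HYPOTHESIS {number} (likelihood: {likelihood}): {mechanism}\n"
            f"VERIFY: {verify}\n"
            f"EXPECTED IF CORRECT: {expected}"
        )
    # The report ends with the recommendation, never with a fix.
    blocks.append("RECOMMENDED NEXT STEP: Run VERIFY for hypothesis 1 first.")
    return "\n\n".join(blocks)
```

Because the closing line is appended unconditionally, a caller cannot end the report with a fix suggestion by accident.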

Worked Example

Failure:

--- FAIL: TestInvoiceParser_HandlesEmptyPDF (0.00s)
    parser_test.go:47: panic: runtime error: index out of range [0] with length 0

Output:

HYPOTHESIS 1 (likelihood: high): parser indexes into pages[0] before checking len(pages) > 0; empty PDFs produce a zero-page document
VERIFY: rg -n 'pages\[0\]' internal/parser/
EXPECTED IF CORRECT: at least one site reads pages[0] without a preceding length check

HYPOTHESIS 2 (likelihood: medium): the test fixture is a zero-byte file rather than a valid empty PDF; pdf library returns nil pages slice instead of empty
VERIFY: ls -la testdata/empty.pdf && file testdata/empty.pdf
EXPECTED IF CORRECT: file size 0 bytes or "data" rather than "PDF document"

HYPOTHESIS 3 (likelihood: low): a recently changed dependency reordered the page-extraction API; pages[0] now refers to metadata, not content
VERIFY: git log -p -10 -- go.mod | grep -iE '^[+-].*pdf'
EXPECTED IF CORRECT: added and removed go.mod lines showing a pdf-library version bump in the last few commits

RECOMMENDED NEXT STEP: Run VERIFY for hypothesis 1 first.
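The check in hypothesis 2 can also be scripted when the `file` utility is unavailable. The sketch below is a rough equivalent, not the skill's prescribed command: `classify_fixture` is a hypothetical helper, and it relies only on the fact that a PDF file begins with a `%PDF-` header.

```python
from pathlib import Path


def classify_fixture(path):
    """Return 'zero-byte', 'pdf', or 'other' based on size and leading bytes."""
    data = Path(path).read_bytes()
    if len(data) == 0:
        return "zero-byte"
    # A PDF file begins with a "%PDF-" header, for example "%PDF-1.7".
    if data.startswith(b"%PDF-"):
        return "pdf"
    return "other"
```

A result of `zero-byte` or `other` for `testdata/empty.pdf` would match the expected outcome of hypothesis 2; a result of `pdf` would count against it.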

Anti-Patterns

| Anti-Pattern | Why It Fails |
|---|---|
| "Maybe try restarting the service" | Not a mechanism. Not falsifiable. Teaches nothing if it works. |
| "Could be a race condition" | Mechanism without specifics. Which two operations race? On what state? |
| "Let me add some print statements and see" | Skips hypothesis generation. Generates noise, not understanding. |
| Single hypothesis presented as fact | If you are sure, write the regression test instead of running a debug skill. |
| Mixing hypotheses with fix suggestions | The skill is diagnose-only. The fix is a TDD task on the confirmed hypothesis. |

Brain MCP Integration

The brain holds prior debug sessions across the project. Use it to skip rediscovering known failure modes.

At debug start:

  • Run brain_query with the error message snippet + the package name. Past sessions may have logged identical or similar failures with their resolved hypotheses.

After a hypothesis is confirmed:

  • Run brain_write with the failure signature → confirmed mechanism. Future debug sessions on the same area get the answer immediately.

Never:

  • Run brain_write for an unconfirmed hypothesis. Speculation in the brain pollutes future queries.
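The query-first, write-only-when-confirmed rule can be sketched with a plain dictionary standing in for the brain. This document does not show the real signatures of `brain_query` and `brain_write`, so the sketch does not call them; `lookup_known_mechanism` and `record_confirmed` are hypothetical helpers that only illustrate the control flow.

```python
def lookup_known_mechanism(store, signature):
    """Stand-in for the query at debug start; returns None when nothing is known."""
    return store.get(signature)


def record_confirmed(store, signature, mechanism, confirmed):
    """Store signature -> mechanism only for a confirmed hypothesis.

    Returns True when something was written, False when the write was refused.
    """
    if not confirmed:
        # Unconfirmed hypotheses are speculation and stay out of the store.
        return False
    store[signature] = mechanism
    return True
```

Making the `confirmed` flag an explicit argument forces the caller to state whether verification actually happened before anything is persisted.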

Logging

Call session_log once at the end of every phase to record the outcome. Pass-rate is computed downstream by the /pass-rate HTTP endpoint, which treats pass as success, fail as failure, skip as neither.

At end of each phase:

  • session_log with {skill: "debug", phase: "<phase-name>", final_status: "pass" | "fail" | "skip", message: "<one-line summary>", duration_ms: <wall-clock>, project_root: "<absolute path>"}

Phases for this skill: read-failure, generate-hypotheses, output

Status semantics:

  • pass — the phase's intended outcome was reached.
  • fail — the phase's intended outcome was NOT reached.
  • skip — phase was skipped intentionally.
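A minimal sketch of assembling the `session_log` payload for one phase follows. The field names, phase names, and status values come from this section; the helper name `build_log_entry`, the use of a monotonic clock, and the integer `duration_ms` are assumptions. How `session_log` is actually invoked is not shown here, so the sketch only builds its argument.

```python
import os
import time

PHASES = ("read-failure", "generate-hypotheses", "output")
STATUSES = ("pass", "fail", "skip")


def build_log_entry(phase, final_status, message, started_at, project_root):
    """Build the session_log payload for one finished phase.

    started_at is a time.monotonic() reading taken when the phase began.
    """
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    if final_status not in STATUSES:
        raise ValueError(f"unknown final_status: {final_status!r}")
    if not os.path.isabs(project_root):
        raise ValueError("project_root must be an absolute path")
    # Wall-clock duration of the phase in whole milliseconds (assumed integer).
    duration_ms = int((time.monotonic() - started_at) * 1000)
    return {
        "skill": "debug",
        "phase": phase,
        "final_status": final_status,
        "message": message,
        "duration_ms": duration_ms,
        "project_root": project_root,
    }
```

Validating the phase and status before logging keeps misspelled values out of the downstream pass-rate computation.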

Why this matters: the routing pod (Plan 6) reads pass-rate to decide whether to route a future call to a local model. If your skill never logs, the routing pod sees no data.
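One plausible reading of "pass as success, fail as failure, skip as neither" is passes divided by passes plus fails. The endpoint's actual formula is not given in this document, so the sketch below is an assumption, and `pass_rate` is a hypothetical function name.

```python
def pass_rate(statuses):
    """Return pass / (pass + fail), or None when there is nothing to score."""
    passes = sum(1 for s in statuses if s == "pass")
    fails = sum(1 for s in statuses if s == "fail")
    scored = passes + fails
    if scored == 0:
        # Only skips, or no log entries at all: there is no data to report.
        return None
    return passes / scored
```

Under this reading, a skill that never logs and a skill that only logs `skip` both yield no pass-rate, which is the "no data" case described above.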

Mode 2 Routing Note

This skill produces high-volume mechanical output (hypothesis enumeration) and is a candidate for Mode 2 routing to a local model in the future. Until Plan 6 ships the routing pod, treat this skill as Mode 1 only. The hypothesis format and discipline are identical regardless of which model generates them.

Cross-References

  • After a hypothesis is confirmed, load the tdd skill: write a failing regression test that proves the bug exists, then fix it.
  • For test failures specifically caused by mock-vs-real divergence, also load tdd/references/testing-anti-patterns.md.
  • Load the code-review skill if the diagnosis surfaces a structural issue (god object, shotgun surgery) rather than a pointwise bug.