---
name: debug
description: Systematic hypothesis-first debugging. Generate 3-5 ranked hypotheses with verification steps before suggesting any fix. Use when encountering a bug, test failure, or unexpected behavior.
---

# Debug

## Overview

Debugging is hypothesis generation, not fix generation. The first instinct ("try this and see if it works") wastes time and teaches nothing. A disciplined debug session produces a small set of ranked, falsifiable hypotheses, each with a concrete verification command. The fix comes after one hypothesis is confirmed, not before.

**Core principle:** A hypothesis you cannot verify in one command is not a hypothesis; it is a guess.

## When to Use

- Any test failure whose cause is not immediately obvious
- Any production error or unexpected behavior
- Any bug report from a user or stakeholder
- Before reaching for a debugger or adding print statements: form hypotheses first

**Do not use for:**

- Compile errors with clear messages (just fix the typo)
- Already-diagnosed bugs where you know the cause (go straight to TDD with a regression test)

## Iron Laws

1. **Never suggest "try X and see what happens."** Every hypothesis must state a specific expected outcome if it is correct.
2. **Generate 3-5 hypotheses, ordered by likelihood (most likely first).** Fewer than 3 means you stopped thinking; more than 5 means you are not prioritizing.
3. **Diagnose only; do not fix in this skill.** The fix happens in a separate TDD cycle (load the `tdd` skill) once a hypothesis is confirmed.

## Process

### Step 1: Read the failure

Before forming hypotheses:

- Read the full error message and stack trace, not just the headline
- Read the file where the failure originated, around the failing line
- If the failure is from a test, read the test and the code under test
- Identify the **failure mode**: what actually went wrong (e.g. "nil pointer dereference in goroutine spawned by handler"), not just what the error says ("runtime error")

### Step 2: Generate hypotheses

For each hypothesis, capture three things:

- **Mechanism:** what specific code path or state would produce this exact failure
- **Verification:** the single command or file inspection that confirms or denies it
- **Expected outcome if correct:** the specific output you would see

Order by likelihood. The most likely cause is hypothesis 1.

### Step 3: Output

Use this format:

```
HYPOTHESIS 1 (likelihood: high): <mechanism>
VERIFY: <single command or file inspection>
EXPECTED IF CORRECT: <specific output>

HYPOTHESIS 2 (likelihood: medium): <mechanism>
VERIFY: <single command or file inspection>
EXPECTED IF CORRECT: <specific output>

[... up to 5 ...]

RECOMMENDED NEXT STEP: Run VERIFY for hypothesis 1 first.
```

End with the recommendation, not a fix.

## Worked Example

**Failure:**

```
--- FAIL: TestInvoiceParser_HandlesEmptyPDF (0.00s)
    parser_test.go:47: panic: runtime error: index out of range [0] with length 0
```

**Output:**

```
HYPOTHESIS 1 (likelihood: high): parser indexes into pages[0] before checking len(pages) > 0; empty PDFs produce a zero-page document
VERIFY: rg -n 'pages\[0\]' internal/parser/
EXPECTED IF CORRECT: at least one site reads pages[0] without a preceding length check

HYPOTHESIS 2 (likelihood: medium): the test fixture is a zero-byte file rather than a valid empty PDF; the pdf library returns a nil pages slice instead of an empty document
VERIFY: ls -la testdata/empty.pdf && file testdata/empty.pdf
EXPECTED IF CORRECT: file size 0 bytes (file reports "empty") or "data", rather than "PDF document"

HYPOTHESIS 3 (likelihood: low): a recently changed dependency reordered the page-extraction API; pages[0] now refers to metadata, not content
VERIFY: git log --oneline -10 -- go.sum | grep -i pdf
EXPECTED IF CORRECT: a pdf-library bump in the last few commits

RECOMMENDED NEXT STEP: Run VERIFY for hypothesis 1 first.
```

## Anti-Patterns

| Anti-Pattern | Why It Fails |
|---|---|
| "Maybe try restarting the service" | Not a mechanism. Not falsifiable. Teaches nothing if it works. |
| "Could be a race condition" | Mechanism without specifics. Which two operations race? On what state? |
| "Let me add some print statements and see" | Skips hypothesis generation. Generates noise, not understanding. |
| Single hypothesis presented as fact | If you are sure, write the regression test instead of running a debug skill. |
| Mixing hypotheses with fix suggestions | The skill is diagnose-only. The fix is a separate TDD task on the confirmed hypothesis. |

## Brain MCP Integration

The brain holds prior debug sessions across the project. Use it to avoid rediscovering known failure modes.

**At debug start:**

- Run `brain_query` with a snippet of the error message plus the package name. Past sessions may have logged identical or similar failures along with their resolved hypotheses.

**After a hypothesis is confirmed:**

- Run `brain_write` mapping the failure signature to the confirmed mechanism. Future debug sessions on the same area get the answer immediately.

**Never:**

- Run `brain_write` for an unconfirmed hypothesis. Speculation in the brain pollutes future queries.

### Logging

Call `session_log` once at the end of every phase to record the outcome. Pass-rate is computed downstream by the `/pass-rate` HTTP endpoint, which treats `pass` as success, `fail` as failure, and `skip` as neither.

**At end of each phase:**

- `session_log` with `{skill: "debug", phase: "<phase>", final_status: "pass" | "fail" | "skip", message: "<short summary>", duration_ms: <elapsed ms>, project_root: "<project root>"}`

**Phases for this skill:** `read-failure`, `generate-hypotheses`, `output`

**Status semantics:**

- `pass`: the phase's intended outcome was reached.
- `fail`: the phase's intended outcome was NOT reached.
- `skip`: the phase was skipped intentionally.

**Why this matters:** the routing pod (Plan 6) reads pass-rate to decide whether to route a future call to a local model. If your skill never logs, the routing pod sees no data.
## Mode 2 Routing Note

This skill produces high-volume mechanical output (hypothesis enumeration) and is a candidate for Mode 2 routing to a local model in the future. Until Plan 6 ships the routing pod, treat it as Mode 1 only. The hypothesis format and discipline are identical regardless of which model generates them.

## Cross-References

- After a hypothesis is confirmed, load the `tdd` skill: write a failing regression test that proves the bug exists, then fix it.
- For test failures caused specifically by mock-vs-real divergence, also load `tdd/references/testing-anti-patterns.md`.
- Load the `code-review` skill if the diagnosis surfaces a structural issue (god object, shotgun surgery) rather than a pointwise bug.