hyperguild/config/supervisor/trainer-reader.md
Mathias Bergqvist 38fcac4cba feat(trainer): add trainer MCP skill with reader→writer sub-agent chain
Reader agent scans session logs for SFT/DPO candidates; writer receives
reader output and formats+writes training pairs to brain/training-data/.
Adds trainer-reader.md and trainer-writer.md discipline prompts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 14:06:00 +02:00

Trainer Reader Discipline

You scan session logs and identify candidate learning moments worth converting to training data.

What to look for

  • SFT candidates: the worker did exactly the right thing — a clean pattern worth reinforcing
  • DPO candidates: the worker first produced a wrong or suboptimal response, then corrected it, so the log contains both a rejected and a chosen response

Scoring (1-5)

  • 5: novel pattern, clearly correct, generalises across projects
  • 4: good pattern, correct, somewhat project-specific but still useful
  • 3: correct but obvious — include only if especially clean
  • 2 or below: skip — too ambiguous or too context-specific
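
The rubric above can be read as a three-way triage. The following is a minimal sketch, not part of the prompt itself; the function name `triage` and the returned labels are hypothetical and chosen only for illustration.

```python
def triage(score: int) -> str:
    """Map a 1-5 rubric score to a decision label.

    The labels are illustrative assumptions, not part of the output contract.
    """
    if not 1 <= score <= 5:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    if score >= 4:
        # 4 and 5: good or novel pattern, worth keeping
        return "include"
    if score == 3:
        # 3: correct but obvious, keep only if especially clean
        return "include_if_clean"
    # 2 or below: too ambiguous or too context-specific
    return "skip"
```

The middle band is kept separate because a score of 3 still needs a judgement call about how clean the example is.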

Output contract

Return a JSON result with:

  • status: "pass" or "error"
  • phase: "trainer"
  • skill: "trainer"
  • file_path: ""
  • runner_output: JSON array of candidates (valid JSON, not markdown): [{"type":"sft","moment":"","prompt":"","completion":"","score":4}, {"type":"dpo","moment":"","prompt":"","chosen":"","rejected":"","score":3}]
  • verified: true
  • message: "N sft candidates, M dpo candidates found"
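
As a rough illustration of the contract above, the result could be assembled like this. It is a sketch under assumptions: `build_result` is a hypothetical helper, the sample candidate texts are invented, and `runner_output` is embedded as a nested array (the contract could also be read as asking for a JSON-encoded string).

```python
import json
from typing import Any


def build_result(candidates: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap scored candidates in the result envelope listed in the contract."""
    n_sft = sum(1 for c in candidates if c.get("type") == "sft")
    n_dpo = sum(1 for c in candidates if c.get("type") == "dpo")
    return {
        "status": "pass",
        "phase": "trainer",
        "skill": "trainer",
        "file_path": "",
        # Assumption: candidates are nested directly rather than string-encoded
        "runner_output": candidates,
        "verified": True,
        "message": f"{n_sft} sft candidates, {n_dpo} dpo candidates found",
    }


# Invented sample data, for illustration only
example = [
    {"type": "sft", "moment": "wrote test first", "prompt": "Add a parser",
     "completion": "Start with a failing test", "score": 4},
    {"type": "dpo", "moment": "fixed retry loop", "prompt": "Handle timeouts",
     "chosen": "Retry with backoff", "rejected": "Retry forever", "score": 3},
]

# json.dumps emits plain JSON, with no markdown fencing around it
payload = json.dumps(build_result(example))
```

An empty candidate list yields an empty `runner_output` array and a message with zero counts.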

Rules

  1. Read all session entries in the task prompt
  2. Score each entry — only include entries scoring >= 3
  3. Prompt/completion fields must be phrased to generalise: no project-specific paths or names
  4. If no candidates score >= 3, return an empty array [] — never force low-quality candidates
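
Rule 3 could be partly checked mechanically. The sketch below is a heuristic only: the regular expression and the helper `project_specific_fields` are assumptions made for illustration. It flags path-like strings and cannot detect project names. It also checks `chosen` and `rejected`, on the assumption that the rule is meant to cover DPO text fields as well.

```python
import re

# Heuristic: a slash-separated path with at least two segments after a slash,
# or a Windows drive path. URLs are flagged too, which is acceptable here.
PATH_LIKE = re.compile(
    r"(?:~|\.{1,2})?/[\w.-]+(?:/[\w.-]+)+|[A-Za-z]:\\[\w.-]+"
)

TEXT_FIELDS = ("prompt", "completion", "chosen", "rejected")


def project_specific_fields(candidate: dict) -> list[str]:
    """Return the names of text fields that appear to contain a file path."""
    return [
        field
        for field in TEXT_FIELDS
        if isinstance(candidate.get(field), str)
        and PATH_LIKE.search(candidate[field])
    ]
```

A non-empty return value suggests the candidate should be rephrased before it is included.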