Reader agent scans session logs for SFT/DPO candidates; writer receives reader output, then formats and writes training pairs to brain/training-data/. Adds trainer-reader.md and trainer-writer.md discipline prompts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
32 lines
1.5 KiB
Markdown
# Trainer Reader Discipline

You scan session logs and identify candidate learning moments worth converting to training data.

## What to look for

- **SFT candidates**: the worker did exactly the right thing, giving a clean pattern worth reinforcing
- **DPO candidates**: the worker first produced a wrong or suboptimal response, then corrected it, so you have both a rejected and a chosen response

## Scoring (1–5)

- 5: novel pattern, clearly correct, generalises across projects
- 4: good pattern, correct, somewhat project-specific but still useful
- 3: correct but obvious; include only if especially clean
- 2 or below: skip; too ambiguous or too context-specific

## Output contract

Return a JSON result with:

- `status`: "pass" or "error"
- `phase`: "trainer"
- `skill`: "trainer"
- `file_path`: ""
- `runner_output`: JSON array of candidates (valid JSON, not markdown):
  [{"type":"sft","moment":"<what happened>","prompt":"<what was asked>","completion":"<what was done right>","score":4},
  {"type":"dpo","moment":"<what happened>","prompt":"<what was asked>","chosen":"<correct>","rejected":"<incorrect>","score":3}]
- `verified`: true
- `message`: "N sft candidates, M dpo candidates found"

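As an illustration of the contract above, here is a minimal Python sketch that checks a candidate array against the two shapes shown and wraps it in the result object. The function names `validate_candidates` and `build_result` are hypothetical, and the sketch assumes `runner_output` is embedded as a JSON array rather than as a string; the contract does not describe the shape of an "error" result, so the sketch only builds the "pass" case.

```python
import json

# Field sets taken from the two example candidates in the contract.
REQUIRED = {
    "sft": {"type", "moment", "prompt", "completion", "score"},
    "dpo": {"type", "moment", "prompt", "chosen", "rejected", "score"},
}


def validate_candidates(candidates):
    """Return a list of problems; an empty list means every candidate fits."""
    problems = []
    for i, cand in enumerate(candidates):
        expected = REQUIRED.get(cand.get("type"))
        if expected is None:
            problems.append(f"candidate {i}: unknown type {cand.get('type')!r}")
            continue
        missing = expected - cand.keys()
        if missing:
            problems.append(f"candidate {i}: missing {sorted(missing)}")
        score = cand.get("score")
        if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
            problems.append(f"candidate {i}: score must be an integer from 1 to 5")
    return problems


def build_result(candidates):
    """Build the 'pass' result object (hypothetical helper, not part of the prompt)."""
    problems = validate_candidates(candidates)
    if problems:
        raise ValueError("; ".join(problems))
    n_sft = sum(1 for c in candidates if c["type"] == "sft")
    n_dpo = sum(1 for c in candidates if c["type"] == "dpo")
    return {
        "status": "pass",
        "phase": "trainer",
        "skill": "trainer",
        "file_path": "",
        "runner_output": candidates,
        "verified": True,
        "message": f"{n_sft} sft candidates, {n_dpo} dpo candidates found",
    }


if __name__ == "__main__":
    example = [
        {"type": "sft", "moment": "m", "prompt": "p", "completion": "c", "score": 4},
    ]
    print(json.dumps(build_result(example), indent=2))
```

Serialising with `json.dumps` keeps the output plain JSON, in line with the "valid JSON, not markdown" requirement.
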
## Rules

1. Read all session entries in the task prompt
2. Score each entry and include only entries scoring >= 3
3. Prompt and completion fields must be phrased to generalise: no project-specific paths or names
4. If no candidates score >= 3, return an empty array `[]`; never force low-quality candidates
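
The rules above could be sketched in Python roughly as follows. This is only an illustration: `select_candidates`, `MIN_SCORE`, and the `PATH_LIKE` pattern are hypothetical names, and the path check is a crude heuristic that catches path-like strings but not project names. Entries that trip it are flagged for rephrasing rather than silently dropped, which is an assumption about how rule 3 would be handled.

```python
import re

MIN_SCORE = 3  # rule 2: keep only entries scoring >= 3

# Heuristic (assumption): a token with at least two "/"-separated segments,
# such as "src/utils/io.py", is treated as a project-specific path.
PATH_LIKE = re.compile(r"\S*[\w.-]+/[\w.-]+/\S*")

TEXT_FIELDS = ("prompt", "completion", "chosen", "rejected")


def looks_project_specific(text):
    """Return True if the text contains something that looks like a path."""
    return bool(PATH_LIKE.search(text))


def select_candidates(scored):
    """Split scored entries into (kept, flagged).

    kept:    score >= MIN_SCORE and no path-like text in any text field
    flagged: score >= MIN_SCORE but needs rephrasing to generalise (rule 3)
    Entries below MIN_SCORE are skipped, so the result may be two empty lists.
    """
    kept, flagged = [], []
    for cand in scored:
        if cand["score"] < MIN_SCORE:
            continue
        if any(looks_project_specific(cand.get(f, "")) for f in TEXT_FIELDS):
            flagged.append(cand)
        else:
            kept.append(cand)
    return kept, flagged
```

When nothing clears the threshold, `kept` is simply `[]`, which matches rule 4: an empty array is a valid result.
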