claude-review: improve review quality for large PRs
Several issues were identified from analyzing logs of a large (52-commit) PR
review:
- Claude was batching multiple commits into a single review agent instead of
  spawning one agent per worktree. Strengthen the prompt to explicitly
  prohibit grouping.
- Claude was reading pr-context.json and commit messages before spawning
agents despite instructions not to, wasting time. Tighten the pre-spawn
rules to only allow listing worktrees/ and reading review-schema.json.
- Subagents were spawned with model "sonnet" instead of "opus". Add explicit
instruction to use opus.
- After agents returned, Claude spent 9 minutes re-verifying findings with
bash/grep/sed commands, duplicating the agents' work. Add instruction to
trust subagent findings and only read pr-context.json in phase 2.
- Subagents returned markdown-wrapped JSON instead of raw JSON arrays. Add
instruction requiring raw JSON output only.
- Each subagent was independently reading review-schema.json. Instead have
the main agent read it once and paste it into each subagent prompt.
- The "drop low-confidence findings" instruction was being used to justify
dropping findings that Claude itself acknowledged as valid ("solid cleanup
suggestions", "reasonable consistency improvement"). Remove the instruction.
- Simplify the deduplication instructions.
- Stop adding the severity to the body in the post-processing job; Claude
  already includes it, so the label ended up duplicated.