From a65ebc3ff9a868bd447faa59789ee8e9ad8c534a Mon Sep 17 00:00:00 2001
From: Daan De Meyer
Date: Fri, 10 Apr 2026 08:04:08 +0000
Subject: [PATCH] claude-review: improve review quality for large PRs

Several issues were identified from analyzing logs of a large (52-commit)
PR review:

- Claude was batching multiple commits into a single review agent instead
  of one per worktree. Strengthen the prompt to explicitly prohibit
  grouping.
- Claude was reading pr-context.json and commit messages before spawning
  agents despite instructions not to, wasting time. Tighten the pre-spawn
  rules to only allow listing worktrees/ and reading review-schema.json.
- Subagents were spawned with model "sonnet" instead of "opus". Add an
  explicit instruction to use opus.
- After agents returned, Claude spent 9 minutes re-verifying findings with
  bash/grep/sed commands, duplicating the agents' work. Add an instruction
  to trust subagent findings and only read pr-context.json in phase 2.
- Subagents returned markdown-wrapped JSON instead of raw JSON arrays. Add
  an instruction requiring raw JSON output only.
- Each subagent was independently reading review-schema.json. Instead, have
  the main agent read it once and paste it into each subagent prompt.
- The "drop low-confidence findings" instruction was being used to justify
  dropping findings that Claude itself acknowledged as valid ("solid
  cleanup suggestions", "reasonable consistency improvement"). Remove the
  instruction.
- Simplify the deduplication instructions.
- Stop adding the severity to the body in the post-processing job, as
  Claude is also adding it, so they end up duplicated.
---
 .github/workflows/claude-review.yml | 59 +++++++++++++++--------------
 1 file changed, 30 insertions(+), 29 deletions(-)

diff --git a/.github/workflows/claude-review.yml b/.github/workflows/claude-review.yml
index a079cf11642..bf20e7d51e9 100644
--- a/.github/workflows/claude-review.yml
+++ b/.github/workflows/claude-review.yml
@@ -256,27 +256,34 @@ jobs:
 
           ## Phase 1: Review commits
 
-          List the directories in `worktrees/` — there is one per commit. Each
-          worktree at `worktrees//` contains the full source tree checked out at
-          that commit, plus `commit.patch` (the diff) and `commit-message.txt`
-          (the commit message). Spawn one
-          review subagent per worktree, all in a single message so they run concurrently.
-          Do NOT pre-compute diffs or read any other files before spawning — the subagents
-          will do that themselves.
+          First, list the directories in `worktrees/` and read `review-schema.json`.
+          Then, spawn exactly one review subagent per worktree directory, all in a
+          single message so they run concurrently. Do NOT batch or group multiple
+          commits into a single agent. Do NOT read any other files before spawning —
+          the subagents will do that themselves.
+
+          Each worktree at `worktrees//` contains the full source tree checked
+          out at that commit, plus `commit.patch` (the diff) and `commit-message.txt`
+          (the commit message).
 
           Each reviewer reviews design, code quality, style, potential bugs, and
           security implications.
 
+          Each subagent must be spawned with `model: "opus"`.
+
           Each subagent prompt must include:
           - Instructions to read `pr-context.json` in the repository root for
             additional context.
-          - Instructions to read `review-schema.json` in the repository root and
-            return a JSON array matching the `comments` items schema from that file.
+          - The contents of `review-schema.json` (paste it into each prompt so the
+            agent doesn't have to read it separately).
           - The worktree path.
           - Instructions to read `commit-message.txt` and `commit.patch` in the
             worktree for the commit message and diff.
           - Instructions to verify every `line` and `start_line` value against the
             hunk ranges in `commit.patch` before returning.
+          - Instructions to return ONLY a raw JSON array of findings. No markdown,
+            no explanation, no code fences — just the JSON array. If there are no
+            findings, return `[]`.
 
           ## Phase 2: Collect, deduplicate, and summarize
@@ -290,26 +297,20 @@ jobs:
             populate the `resolve` array.
           - If `tracking_comment` is non-null, use it as the basis for your summary.
 
+          Trust the subagent findings — do NOT re-verify them by running your own
+          bash, grep, sed, or awk commands against the source code. Phase 2 should
+          only read `pr-context.json` and then produce the structured output.
+
           Then:
 
-          1. Collect all issues. Merge duplicates (same file, lines within 3 of each other, same problem).
-          2. Drop low-confidence findings.
-          3. Check the existing inline review comments from `pr-context.json`. Do NOT
-             include a comment if one already exists on the same file about the same
-             problem, even if the exact line numbers differ (lines shift between
-             revisions). Also check for author replies that dismiss or reject a previous
-             comment — do NOT re-raise an issue the PR author has already responded to
-             disagreeing with.
-          Populate the `resolve` array with the REST API `id` (integer) of your own
-          review comment threads that should be resolved (user.login == "github-actions[bot]"
-          and body starts with "Claude: "). Do not resolve threads from human reviewers.
-          A thread should be resolved if:
-          - The issue it raised has been addressed in the current PR (i.e. your review
-            no longer flags it), or
-          - The PR author (or another reviewer) left a reply disagreeing with or
-            dismissing the comment.
-          Only include the `id` of the **first** comment in each thread (the one that
-          started the conversation). Do not resolve threads for issues that are still
-          present and unaddressed.
+          1. Collect all issues. Merge duplicates across agents (same file, same
+             problem, lines within 3 of each other).
+          2. Drop issues that already have a review comment on the same file about
+             the same problem, or where the PR author replied disagreeing.
+          3. Populate the `resolve` array with the `id` of your own review comment
+             threads (user.login == "github-actions[bot]", body starts with
+             "Claude: ") that should be resolved — either because the issue was
+             fixed or because the author dismissed it. Use the first comment `id`
+             in each thread. Do not resolve threads from human reviewers.
           4. Write a `summary` field in markdown for a top-level tracking comment.
 
           **If no existing tracking comment was found (first run):**
@@ -486,7 +487,7 @@ jobs:
                   ...(c.side != null && { side: c.side }),
                   ...(c.start_line != null && { start_line: c.start_line }),
                   ...(c.start_side != null && { start_side: c.start_side }),
-                  body: `Claude: **${c.severity}**: ${c.body}`,
+                  body: c.body,
                 });
                 posted++;
               } catch (e) {
-- 
2.47.3
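
Note (not part of the patch): the phase-2 merge rule above, "same file, same problem, lines within 3 of each other", can be sketched roughly as follows. This is an illustrative sketch only, not code from the workflow; `dedupeFindings` and the sample data are hypothetical, and "same problem" is approximated here by identical body text. The field names (`path`, `line`, `severity`, `body`) mirror the fields the posting step reads from each comment.

```javascript
// Merge duplicate findings returned by concurrent review subagents.
// Two findings are considered duplicates when they are in the same file,
// describe the same problem, and their line numbers are within 3.
function dedupeFindings(findings) {
  const merged = [];
  for (const f of findings) {
    const dup = merged.find(
      (m) =>
        m.path === f.path &&
        m.body === f.body && // "same problem" approximated by identical body text
        Math.abs(m.line - f.line) <= 3
    );
    if (!dup) merged.push(f); // keep the first occurrence, drop the rest
  }
  return merged;
}

// Two agents flagging the same issue two lines apart collapse into one finding.
const findings = [
  { path: "src/core/unit.c", line: 120, severity: "medium", body: "Possible memory leak" },
  { path: "src/core/unit.c", line: 122, severity: "medium", body: "Possible memory leak" },
  { path: "src/core/unit.c", line: 300, severity: "low", body: "Typo in comment" },
];
console.log(dedupeFindings(findings).length); // 2
```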