From a65ebc3ff9a868bd447faa59789ee8e9ad8c534a Mon Sep 17 00:00:00 2001
From: Daan De Meyer
Date: Fri, 10 Apr 2026 08:04:08 +0000
Subject: [PATCH] claude-review: improve review quality for large PRs

Several issues were identified from analyzing logs of a large (52-commit)
PR review:

- Claude was batching multiple commits into a single review agent instead
  of one per worktree. Strengthen the prompt to explicitly prohibit
  grouping.
- Claude was reading pr-context.json and commit messages before spawning
  agents despite instructions not to, wasting time. Tighten the pre-spawn
  rules to only allow listing worktrees/ and reading review-schema.json.
- Subagents were spawned with model "sonnet" instead of "opus". Add an
  explicit instruction to use opus.
- After agents returned, Claude spent 9 minutes re-verifying findings with
  bash/grep/sed commands, duplicating the agents' work. Add an instruction
  to trust subagent findings and only read pr-context.json in phase 2.
- Subagents returned markdown-wrapped JSON instead of raw JSON arrays. Add
  an instruction requiring raw JSON output only.
- Each subagent was independently reading review-schema.json. Instead, have
  the main agent read it once and paste it into each subagent prompt.
- The "drop low-confidence findings" instruction was being used to justify
  dropping findings that Claude itself acknowledged as valid ("solid
  cleanup suggestions", "reasonable consistency improvement"). Remove the
  instruction.
- Simplify the deduplication instructions.
- Stop adding the severity to the body in the post-processing job, as
  Claude is also adding it, so they end up duplicated.
---
 .github/workflows/claude-review.yml | 59 +++++++++++++++--------------
 1 file changed, 30 insertions(+), 29 deletions(-)

diff --git a/.github/workflows/claude-review.yml b/.github/workflows/claude-review.yml
index a079cf11642..bf20e7d51e9 100644
--- a/.github/workflows/claude-review.yml
+++ b/.github/workflows/claude-review.yml
@@ -256,27 +256,34 @@ jobs:
 
           ## Phase 1: Review commits
 
-          List the directories in `worktrees/` — there is one per commit. Each
-          worktree at `worktrees//` contains the full source tree checked out at
-          that commit, plus `commit.patch` (the diff) and `commit-message.txt`
-          (the commit message). Spawn one
-          review subagent per worktree, all in a single message so they run concurrently.
-          Do NOT pre-compute diffs or read any other files before spawning — the subagents
-          will do that themselves.
+          First, list the directories in `worktrees/` and read `review-schema.json`.
+          Then, spawn exactly one review subagent per worktree directory, all in a
+          single message so they run concurrently. Do NOT batch or group multiple
+          commits into a single agent. Do NOT read any other files before spawning —
+          the subagents will do that themselves.
+
+          Each worktree at `worktrees//` contains the full source tree checked
+          out at that commit, plus `commit.patch` (the diff) and `commit-message.txt`
+          (the commit message).
 
           Each reviewer reviews design, code quality, style, potential bugs, and
           security implications.
 
+          Each subagent must be spawned with `model: "opus"`.
+
           Each subagent prompt must include:
           - Instructions to read `pr-context.json` in the repository root for
             additional context.
-          - Instructions to read `review-schema.json` in the repository root and
-            return a JSON array matching the `comments` items schema from that file.
+          - The contents of `review-schema.json` (paste it into each prompt so the
+            agent doesn't have to read it separately).
           - The worktree path.
           - Instructions to read `commit-message.txt` and `commit.patch` in the
             worktree for the commit message and diff.
           - Instructions to verify every `line` and `start_line` value against the
             hunk ranges in `commit.patch` before returning.
+          - Instructions to return ONLY a raw JSON array of findings. No markdown,
+            no explanation, no code fences — just the JSON array. If there are no
+            findings, return `[]`.
 
           ## Phase 2: Collect, deduplicate, and summarize
@@ -290,26 +297,20 @@ jobs:
             populate the `resolve` array.
           - If `tracking_comment` is non-null, use it as the basis for your summary.
 
+          Trust the subagent findings — do NOT re-verify them by running your own
+          bash, grep, sed, or awk commands against the source code. Phase 2 should
+          only read `pr-context.json` and then produce the structured output.
+
           Then:
 
-          1. Collect all issues. Merge duplicates (same file, lines within 3 of each other, same problem).
-          2. Drop low-confidence findings.
-          3. Check the existing inline review comments from `pr-context.json`. Do NOT
-             include a comment if one already exists on the same file about the same
-             problem, even if the exact line numbers differ (lines shift between
-             revisions). Also check for author replies that dismiss or reject a previous
-             comment — do NOT re-raise an issue the PR author has already responded to
-             disagreeing with.
-          Populate the `resolve` array with the REST API `id` (integer) of your own
-          review comment threads that should be resolved (user.login == "github-actions[bot]"
-          and body starts with "Claude: "). Do not resolve threads from human reviewers.
-          A thread should be resolved if:
-          - The issue it raised has been addressed in the current PR (i.e. your review
-            no longer flags it), or
-          - The PR author (or another reviewer) left a reply disagreeing with or
-            dismissing the comment.
-          Only include the `id` of the **first** comment in each thread (the one that
-          started the conversation). Do not resolve threads for issues that are still
-          present and unaddressed.
+          1. Collect all issues. Merge duplicates across agents (same file, same
+             problem, lines within 3 of each other).
+          2. Drop issues that already have a review comment on the same file about
+             the same problem, or where the PR author replied disagreeing.
+          3. Populate the `resolve` array with the `id` of your own review comment
+             threads (user.login == "github-actions[bot]", body starts with
+             "Claude: ") that should be resolved — either because the issue was
+             fixed or because the author dismissed it. Use the first comment `id`
+             in each thread. Do not resolve threads from human reviewers.
           4. Write a `summary` field in markdown for a top-level tracking comment.
 
           **If no existing tracking comment was found (first run):**
@@ -486,7 +487,7 @@ jobs:
                   ...(c.side != null && { side: c.side }),
                   ...(c.start_line != null && { start_line: c.start_line }),
                   ...(c.start_side != null && { start_side: c.start_side }),
-                  body: `Claude: **${c.severity}**: ${c.body}`,
+                  body: c.body,
                 });
                 posted++;
               } catch (e) {
-- 
2.47.3
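
Note (not part of the patch): the phase-2 merge rule above, "same file, same problem, lines within 3 of each other", can be sketched roughly as follows. This is an illustrative sketch only, not code from the workflow; `dedupeFindings` and the sample data are hypothetical, and "same problem" is approximated here by identical body text. The field names (`path`, `line`, `severity`, `body`) mirror the fields the posting step reads from each comment.

```javascript
// Merge duplicate findings returned by concurrent review subagents.
// Two findings are considered duplicates when they are in the same file,
// describe the same problem, and their line numbers are within 3.
function dedupeFindings(findings) {
  const merged = [];
  for (const f of findings) {
    const dup = merged.find(
      (m) =>
        m.path === f.path &&
        m.body === f.body && // "same problem" approximated by identical body text
        Math.abs(m.line - f.line) <= 3
    );
    if (!dup) merged.push(f); // keep the first occurrence, drop the rest
  }
  return merged;
}

// Two agents flagging the same issue two lines apart collapse into one finding.
const findings = [
  { path: "src/core/unit.c", line: 120, severity: "medium", body: "Possible memory leak" },
  { path: "src/core/unit.c", line: 122, severity: "medium", body: "Possible memory leak" },
  { path: "src/core/unit.c", line: 300, severity: "low", body: "Typo in comment" },
];
console.log(dedupeFindings(findings).length); // 2
```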