md/raid5: skip 2-failure compute when other disk is R5_LOCKED
author FengWei Shih <dannyshih@synology.com>
Thu, 19 Mar 2026 05:33:51 +0000 (13:33 +0800)
committer Yu Kuai <yukuai@fnnas.com>
Sun, 22 Mar 2026 01:57:33 +0000 (09:57 +0800)
When skip_copy is enabled on a doubly-degraded RAID6, a device that is
being written to will be in R5_LOCKED state with R5_UPTODATE cleared.
If a new read triggers fetch_block() while the write is still in
flight, the 2-failure compute path may select this locked device as a
compute target because it is not R5_UPTODATE.

Because skip_copy makes the device page point directly to the bio page,
reconstructing data into it would write into the buffer of the in-flight
write. In addition, the compute marks the device R5_UPTODATE, which
triggers the WARN_ON in ops_run_io() that checks that R5_SkipCopy and
R5_UPTODATE are not both set.

This can be reproduced by running small-range concurrent read/write on
a doubly-degraded RAID6 with skip_copy enabled, for example:

  mdadm -C /dev/md0 -l6 -n6 -R -f /dev/loop[0-3] missing missing
  echo 1 > /sys/block/md0/md/skip_copy
  fio --filename=/dev/md0 --rw=randrw --bs=4k --numjobs=8 \
      --iodepth=32 --size=4M --runtime=30 --time_based --direct=1

Fix this by checking R5_LOCKED on the other compute target in
fetch_block() and returning early if it is set. The compute will be
retried once the lock is cleared on IO completion.
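
The following is a minimal userspace sketch of the check, not kernel code.
It assumes a simplified model in which the compute marks both targets
R5_UPTODATE immediately and write completion clears R5_LOCKED and
R5_SkipCopy. The flag names mirror the kernel's, but the bit values and
every model_* name are hypothetical and exist only for illustration.

```c
/* Userspace model only: flag names mirror raid5.h, bit values are made up. */
#include <assert.h>

enum {
	R5_UPTODATE = 1u << 0,	/* device page holds valid data */
	R5_LOCKED   = 1u << 1,	/* IO in flight on this device */
	R5_SkipCopy = 1u << 2,	/* device page aliases the bio page */
};

#define MODEL_DISKS 6

struct model_stripe {
	unsigned int flags[MODEL_DISKS];
};

/* The invariant behind the WARN_ON: SkipCopy and UPTODATE never both set. */
static int model_flags_valid(unsigned int flags)
{
	return !((flags & R5_SkipCopy) && (flags & R5_UPTODATE));
}

/*
 * Simplified 2-failure compute. Returns 1 if the compute was scheduled,
 * 0 if it was deferred because the second target is still locked.
 */
static int model_fetch_block(struct model_stripe *sh, int disk_idx)
{
	int other;

	/* Pick the second device that is not up to date. */
	for (other = MODEL_DISKS; other--; ) {
		if (other == disk_idx)
			continue;
		if (!(sh->flags[other] & R5_UPTODATE))
			break;
	}
	assert(other >= 0);	/* stands in for BUG_ON(other < 0) */

	/* The added check: do not compute into a device with IO in flight. */
	if (sh->flags[other] & R5_LOCKED)
		return 0;

	sh->flags[disk_idx] |= R5_UPTODATE;
	sh->flags[other] |= R5_UPTODATE;
	return 1;
}

/* Assumed completion behaviour: the write finishes and releases the page. */
static void model_write_done(struct model_stripe *sh, int idx)
{
	sh->flags[idx] &= ~(unsigned int)(R5_LOCKED | R5_SkipCopy);
}

/* Device 5 is missing; device 1 has a skip_copy write in flight. */
static struct model_stripe model_example(void)
{
	struct model_stripe sh;

	for (int i = 0; i < MODEL_DISKS; i++)
		sh.flags[i] = R5_UPTODATE;
	sh.flags[5] = 0;
	sh.flags[1] = R5_LOCKED | R5_SkipCopy;
	return sh;
}
```

In this model, a read of device 5 is deferred while device 1 is locked and
succeeds after model_write_done() runs. Without the R5_LOCKED check, the
first call would set R5_UPTODATE on device 1 while R5_SkipCopy is still
set, which is the combination model_flags_valid() rejects.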

Signed-off-by: FengWei Shih <dannyshih@synology.com>
Reviewed-by: Yu Kuai <yukuai@fnnas.com>
Link: https://lore.kernel.org/linux-raid/20260319053351.3676794-1-dannyshih@synology.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 2ec6dd6ddd93f5441be6fda3a3c9d52b79ab0b20..ddac1be2648f08e60c822702459b9f1426071fa8 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3916,6 +3916,8 @@ static int fetch_block(struct stripe_head *sh, struct stripe_head_state *s,
                                        break;
                        }
                        BUG_ON(other < 0);
+                       if (test_bit(R5_LOCKED, &sh->dev[other].flags))
+                               return 0;
                        pr_debug("Computing stripe %llu blocks %d,%d\n",
                               (unsigned long long)sh->sector,
                               disk_idx, other);