6.6-stable patches
author Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tue, 18 Feb 2025 15:09:36 +0000 (16:09 +0100)
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tue, 18 Feb 2025 15:09:36 +0000 (16:09 +0100)
added patches:
arm64-filter-out-sve-hwcaps-when-feat_sve-isn-t-implemented.patch
md-add-a-new-callback-pers-bitmap_sector.patch
md-md-bitmap-factor-behind-write-counters-out-from-bitmap_-start-end-write.patch
md-md-bitmap-move-bitmap_-start-end-write-to-md-upper-layer.patch
md-md-bitmap-remove-the-last-parameter-for-bimtap_ops-endwrite.patch
md-raid5-implement-pers-bitmap_sector.patch
md-raid5-recheck-if-reshape-has-finished-with-device_lock-held.patch
mm-gup-fix-infinite-loop-within-__get_longterm_locked.patch
netdevsim-print-human-readable-ip-address.patch
selftests-rtnetlink-update-netdevsim-ipsec-output-format.patch

queue-6.6/arm64-filter-out-sve-hwcaps-when-feat_sve-isn-t-implemented.patch [new file with mode: 0644]
queue-6.6/md-add-a-new-callback-pers-bitmap_sector.patch [new file with mode: 0644]
queue-6.6/md-md-bitmap-factor-behind-write-counters-out-from-bitmap_-start-end-write.patch [new file with mode: 0644]
queue-6.6/md-md-bitmap-move-bitmap_-start-end-write-to-md-upper-layer.patch [new file with mode: 0644]
queue-6.6/md-md-bitmap-remove-the-last-parameter-for-bimtap_ops-endwrite.patch [new file with mode: 0644]
queue-6.6/md-raid5-implement-pers-bitmap_sector.patch [new file with mode: 0644]
queue-6.6/md-raid5-recheck-if-reshape-has-finished-with-device_lock-held.patch [new file with mode: 0644]
queue-6.6/mm-gup-fix-infinite-loop-within-__get_longterm_locked.patch [new file with mode: 0644]
queue-6.6/netdevsim-print-human-readable-ip-address.patch [new file with mode: 0644]
queue-6.6/selftests-rtnetlink-update-netdevsim-ipsec-output-format.patch [new file with mode: 0644]
queue-6.6/series

diff --git a/queue-6.6/arm64-filter-out-sve-hwcaps-when-feat_sve-isn-t-implemented.patch b/queue-6.6/arm64-filter-out-sve-hwcaps-when-feat_sve-isn-t-implemented.patch
new file mode 100644 (file)
index 0000000..de12833
--- /dev/null
@@ -0,0 +1,191 @@
+From 064737920bdbca86df91b96aed256e88018fef3a Mon Sep 17 00:00:00 2001
+From: Marc Zyngier <maz@kernel.org>
+Date: Tue, 7 Jan 2025 22:59:41 +0000
+Subject: arm64: Filter out SVE hwcaps when FEAT_SVE isn't implemented
+
+From: Marc Zyngier <maz@kernel.org>
+
+commit 064737920bdbca86df91b96aed256e88018fef3a upstream.
+
+The hwcaps code that exposes SVE features to userspace only
+considers ID_AA64ZFR0_EL1, while this is only valid when
+ID_AA64PFR0_EL1.SVE advertises that SVE is actually supported.
+
+The expectations are that when ID_AA64PFR0_EL1.SVE is 0, the
+ID_AA64ZFR0_EL1 register is also 0. So far, so good.
+
+Things become a bit more interesting if the HW implements SME.
+In this case, a few ID_AA64ZFR0_EL1 fields indicate *SME*
+features. And these fields overlap with their SVE interpretations.
+But the architecture says that the SME and SVE feature sets must
+match, so we're still hunky-dory.
+
+This goes wrong if the HW implements SME, but not SVE. In this
+case, we end up advertising some SVE features to userspace, even
+if the HW has none. That's because we never consider whether SVE
+is actually implemented. Oh well.
+
+Fix it by restricting all SVE capabilities to ID_AA64PFR0_EL1.SVE
+being non-zero. The HWCAPS documentation is amended to reflect the
+actual checks performed by the kernel.
+
+Fixes: 06a916feca2b ("arm64: Expose SVE2 features for userspace")
+Reported-by: Catalin Marinas <catalin.marinas@arm.com>
+Signed-off-by: Marc Zyngier <maz@kernel.org>
+Signed-off-by: Mark Brown <broonie@kernel.org>
+Cc: Will Deacon <will@kernel.org>
+Cc: Mark Rutland <mark.rutland@arm.com>
+Cc: stable@vger.kernel.org
+Reviewed-by: Mark Brown <broonie@kernel.org>
+Link: https://lore.kernel.org/r/20250107-arm64-2024-dpisa-v5-1-7578da51fc3d@kernel.org
+Signed-off-by: Will Deacon <will@kernel.org>
+Signed-off-by: Marc Zyngier <maz@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ Documentation/arch/arm64/elf_hwcaps.rst |   36 ++++++++++++++++++++----------
+ arch/arm64/kernel/cpufeature.c          |   38 +++++++++++++++++++++-----------
+ 2 files changed, 50 insertions(+), 24 deletions(-)
+
+--- a/Documentation/arch/arm64/elf_hwcaps.rst
++++ b/Documentation/arch/arm64/elf_hwcaps.rst
+@@ -174,22 +174,28 @@ HWCAP2_DCPODP
+     Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0010.
+ HWCAP2_SVE2
+-    Functionality implied by ID_AA64ZFR0_EL1.SVEVer == 0b0001.
++    Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and
++    ID_AA64ZFR0_EL1.SVEver == 0b0001.
+ HWCAP2_SVEAES
+-    Functionality implied by ID_AA64ZFR0_EL1.AES == 0b0001.
++    Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and
++    ID_AA64ZFR0_EL1.AES == 0b0001.
+ HWCAP2_SVEPMULL
+-    Functionality implied by ID_AA64ZFR0_EL1.AES == 0b0010.
++    Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and
++    ID_AA64ZFR0_EL1.AES == 0b0010.
+ HWCAP2_SVEBITPERM
+-    Functionality implied by ID_AA64ZFR0_EL1.BitPerm == 0b0001.
++    Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and
++    ID_AA64ZFR0_EL1.BitPerm == 0b0001.
+ HWCAP2_SVESHA3
+-    Functionality implied by ID_AA64ZFR0_EL1.SHA3 == 0b0001.
++    Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and
++    ID_AA64ZFR0_EL1.SHA3 == 0b0001.
+ HWCAP2_SVESM4
+-    Functionality implied by ID_AA64ZFR0_EL1.SM4 == 0b0001.
++    Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and
++    ID_AA64ZFR0_EL1.SM4 == 0b0001.
+ HWCAP2_FLAGM2
+     Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0010.
+@@ -198,16 +204,20 @@ HWCAP2_FRINT
+     Functionality implied by ID_AA64ISAR1_EL1.FRINTTS == 0b0001.
+ HWCAP2_SVEI8MM
+-    Functionality implied by ID_AA64ZFR0_EL1.I8MM == 0b0001.
++    Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and
++    ID_AA64ZFR0_EL1.I8MM == 0b0001.
+ HWCAP2_SVEF32MM
+-    Functionality implied by ID_AA64ZFR0_EL1.F32MM == 0b0001.
++    Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and
++    ID_AA64ZFR0_EL1.F32MM == 0b0001.
+ HWCAP2_SVEF64MM
+-    Functionality implied by ID_AA64ZFR0_EL1.F64MM == 0b0001.
++    Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and
++    ID_AA64ZFR0_EL1.F64MM == 0b0001.
+ HWCAP2_SVEBF16
+-    Functionality implied by ID_AA64ZFR0_EL1.BF16 == 0b0001.
++    Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and
++    ID_AA64ZFR0_EL1.BF16 == 0b0001.
+ HWCAP2_I8MM
+     Functionality implied by ID_AA64ISAR1_EL1.I8MM == 0b0001.
+@@ -273,7 +283,8 @@ HWCAP2_EBF16
+     Functionality implied by ID_AA64ISAR1_EL1.BF16 == 0b0010.
+ HWCAP2_SVE_EBF16
+-    Functionality implied by ID_AA64ZFR0_EL1.BF16 == 0b0010.
++    Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and
++    ID_AA64ZFR0_EL1.BF16 == 0b0010.
+ HWCAP2_CSSC
+     Functionality implied by ID_AA64ISAR2_EL1.CSSC == 0b0001.
+@@ -282,7 +293,8 @@ HWCAP2_RPRFM
+     Functionality implied by ID_AA64ISAR2_EL1.RPRFM == 0b0001.
+ HWCAP2_SVE2P1
+-    Functionality implied by ID_AA64ZFR0_EL1.SVEver == 0b0010.
++    Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and
++    ID_AA64ZFR0_EL1.SVEver == 0b0010.
+ HWCAP2_SME2
+     Functionality implied by ID_AA64SMFR0_EL1.SMEver == 0b0001.
+--- a/arch/arm64/kernel/cpufeature.c
++++ b/arch/arm64/kernel/cpufeature.c
+@@ -2762,6 +2762,13 @@ static const struct arm64_cpu_capabiliti
+               .matches = match,                                               \
+       }
++#define HWCAP_CAP_MATCH_ID(match, reg, field, min_value, cap_type, cap)               \
++      {                                                                       \
++              __HWCAP_CAP(#cap, cap_type, cap)                                \
++              HWCAP_CPUID_MATCH(reg, field, min_value)                        \
++              .matches = match,                                               \
++      }
++
+ #ifdef CONFIG_ARM64_PTR_AUTH
+ static const struct arm64_cpu_capabilities ptr_auth_hwcap_addr_matches[] = {
+       {
+@@ -2790,6 +2797,13 @@ static const struct arm64_cpu_capabiliti
+ };
+ #endif
++#ifdef CONFIG_ARM64_SVE
++static bool has_sve_feature(const struct arm64_cpu_capabilities *cap, int scope)
++{
++      return system_supports_sve() && has_user_cpuid_feature(cap, scope);
++}
++#endif
++
+ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = {
+       HWCAP_CAP(ID_AA64ISAR0_EL1, AES, PMULL, CAP_HWCAP, KERNEL_HWCAP_PMULL),
+       HWCAP_CAP(ID_AA64ISAR0_EL1, AES, AES, CAP_HWCAP, KERNEL_HWCAP_AES),
+@@ -2827,18 +2841,18 @@ static const struct arm64_cpu_capabiliti
+       HWCAP_CAP(ID_AA64MMFR2_EL1, AT, IMP, CAP_HWCAP, KERNEL_HWCAP_USCAT),
+ #ifdef CONFIG_ARM64_SVE
+       HWCAP_CAP(ID_AA64PFR0_EL1, SVE, IMP, CAP_HWCAP, KERNEL_HWCAP_SVE),
+-      HWCAP_CAP(ID_AA64ZFR0_EL1, SVEver, SVE2p1, CAP_HWCAP, KERNEL_HWCAP_SVE2P1),
+-      HWCAP_CAP(ID_AA64ZFR0_EL1, SVEver, SVE2, CAP_HWCAP, KERNEL_HWCAP_SVE2),
+-      HWCAP_CAP(ID_AA64ZFR0_EL1, AES, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEAES),
+-      HWCAP_CAP(ID_AA64ZFR0_EL1, AES, PMULL128, CAP_HWCAP, KERNEL_HWCAP_SVEPMULL),
+-      HWCAP_CAP(ID_AA64ZFR0_EL1, BitPerm, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEBITPERM),
+-      HWCAP_CAP(ID_AA64ZFR0_EL1, BF16, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEBF16),
+-      HWCAP_CAP(ID_AA64ZFR0_EL1, BF16, EBF16, CAP_HWCAP, KERNEL_HWCAP_SVE_EBF16),
+-      HWCAP_CAP(ID_AA64ZFR0_EL1, SHA3, IMP, CAP_HWCAP, KERNEL_HWCAP_SVESHA3),
+-      HWCAP_CAP(ID_AA64ZFR0_EL1, SM4, IMP, CAP_HWCAP, KERNEL_HWCAP_SVESM4),
+-      HWCAP_CAP(ID_AA64ZFR0_EL1, I8MM, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEI8MM),
+-      HWCAP_CAP(ID_AA64ZFR0_EL1, F32MM, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEF32MM),
+-      HWCAP_CAP(ID_AA64ZFR0_EL1, F64MM, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEF64MM),
++      HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, SVEver, SVE2p1, CAP_HWCAP, KERNEL_HWCAP_SVE2P1),
++      HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, SVEver, SVE2, CAP_HWCAP, KERNEL_HWCAP_SVE2),
++      HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, AES, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEAES),
++      HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, AES, PMULL128, CAP_HWCAP, KERNEL_HWCAP_SVEPMULL),
++      HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, BitPerm, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEBITPERM),
++      HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, BF16, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEBF16),
++      HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, BF16, EBF16, CAP_HWCAP, KERNEL_HWCAP_SVE_EBF16),
++      HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, SHA3, IMP, CAP_HWCAP, KERNEL_HWCAP_SVESHA3),
++      HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, SM4, IMP, CAP_HWCAP, KERNEL_HWCAP_SVESM4),
++      HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, I8MM, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEI8MM),
++      HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, F32MM, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEF32MM),
++      HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, F64MM, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEF64MM),
+ #endif
+       HWCAP_CAP(ID_AA64PFR1_EL1, SSBS, SSBS2, CAP_HWCAP, KERNEL_HWCAP_SSBS),
+ #ifdef CONFIG_ARM64_BTI
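For context, the effect of this change is visible to userspace through the
auxiliary vector. A minimal standalone demo (not part of the patch; it
assumes an arm64 build whose uapi <asm/hwcap.h> provides these constants):

/*
 * Standalone userspace demo, not kernel code: query the hwcap words
 * the kernel fills in.  After this fix, a CPU implementing SME but
 * not SVE should report HWCAP2_SVE2 (and the other SVE hwcaps) as absent.
 */
#include <stdio.h>
#include <sys/auxv.h>
#include <asm/hwcap.h>

int main(void)
{
	unsigned long hwcap = getauxval(AT_HWCAP);
	unsigned long hwcap2 = getauxval(AT_HWCAP2);

	printf("SVE:  %s\n", (hwcap & HWCAP_SVE) ? "yes" : "no");
	printf("SVE2: %s\n", (hwcap2 & HWCAP2_SVE2) ? "yes" : "no");
	printf("SME:  %s\n", (hwcap2 & HWCAP2_SME) ? "yes" : "no");
	return 0;
}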
diff --git a/queue-6.6/md-add-a-new-callback-pers-bitmap_sector.patch b/queue-6.6/md-add-a-new-callback-pers-bitmap_sector.patch
new file mode 100644 (file)
index 0000000..ec768b4
--- /dev/null
@@ -0,0 +1,36 @@
+From stable+bounces-114480-greg=kroah.com@vger.kernel.org Mon Feb 10 08:40:48 2025
+From: Yu Kuai <yukuai1@huaweicloud.com>
+Date: Mon, 10 Feb 2025 15:33:20 +0800
+Subject: md: add a new callback pers->bitmap_sector()
+To: stable@vger.kernel.org, gregkh@linuxfoundation.org, song@kernel.org, yukuai3@huawei.com
+Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
+Message-ID: <20250210073322.3315094-5-yukuai1@huaweicloud.com>
+
+From: Yu Kuai <yukuai3@huawei.com>
+
+commit 0c984a283a3ea3f10bebecd6c57c1d41b2e4f518 upstream.
+
+This callback will be used in raid5 to convert IO ranges from the
+array to the bitmap.
+
+Signed-off-by: Yu Kuai <yukuai3@huawei.com>
+Reviewed-by: Xiao Ni <xni@redhat.com>
+Link: https://lore.kernel.org/r/20250109015145.158868-4-yukuai1@huaweicloud.com
+Signed-off-by: Song Liu <song@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/md/md.h |    3 +++
+ 1 file changed, 3 insertions(+)
+
+--- a/drivers/md/md.h
++++ b/drivers/md/md.h
+@@ -661,6 +661,9 @@ struct md_personality
+       void *(*takeover) (struct mddev *mddev);
+       /* Changes the consistency policy of an active array. */
+       int (*change_consistency_policy)(struct mddev *mddev, const char *buf);
++      /* convert io ranges from array to bitmap */
++      void (*bitmap_sector)(struct mddev *mddev, sector_t *offset,
++                            unsigned long *sectors);
+ };
+ struct md_sysfs_entry {
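To make the contract of the new callback concrete, here is a standalone
sketch with hypothetical stub types (not kernel code): before accounting a
write in the bitmap, md lets the personality translate and widen the
array-relative range in place. The md-core caller and the raid5
implementation follow later in this series.

/* Standalone sketch with hypothetical stub types, not kernel code. */
#include <stdio.h>

typedef unsigned long long sector_t;

struct mddev;

struct md_personality {
	/* may rewrite *offset / *sectors from array to bitmap terms */
	void (*bitmap_sector)(struct mddev *mddev, sector_t *offset,
			      unsigned long *sectors);
};

struct mddev {
	struct md_personality *pers;
};

/* stand-in for raid5's chunk-aligned translation: widen the range
 * to a 16-sector boundary */
static void demo_bitmap_sector(struct mddev *mddev, sector_t *offset,
			       unsigned long *sectors)
{
	sector_t end = *offset + *sectors;

	(void)mddev;
	*offset &= ~15ULL;
	*sectors = ((end + 15) & ~15ULL) - *offset;
}

int main(void)
{
	struct md_personality pers = { .bitmap_sector = demo_bitmap_sector };
	struct mddev mddev = { .pers = &pers };
	sector_t offset = 5;
	unsigned long sectors = 3;

	if (mddev.pers->bitmap_sector)
		mddev.pers->bitmap_sector(&mddev, &offset, &sectors);

	printf("bitmap range: %llu + %lu sectors\n", offset, sectors);
	return 0;
}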
diff --git a/queue-6.6/md-md-bitmap-factor-behind-write-counters-out-from-bitmap_-start-end-write.patch b/queue-6.6/md-md-bitmap-factor-behind-write-counters-out-from-bitmap_-start-end-write.patch
new file mode 100644 (file)
index 0000000..7a165bb
--- /dev/null
@@ -0,0 +1,261 @@
+From stable+bounces-114483-greg=kroah.com@vger.kernel.org Mon Feb 10 08:41:21 2025
+From: Yu Kuai <yukuai1@huaweicloud.com>
+Date: Mon, 10 Feb 2025 15:33:18 +0800
+Subject: md/md-bitmap: factor behind write counters out from bitmap_{start/end}write()
+To: stable@vger.kernel.org, gregkh@linuxfoundation.org, song@kernel.org, yukuai3@huawei.com
+Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
+Message-ID: <20250210073322.3315094-3-yukuai1@huaweicloud.com>
+
+From: Yu Kuai <yukuai3@huawei.com>
+
+commit 08c50142a128dcb2d7060aa3b4c5db8837f7a46a upstream.
+
+behind_write is only used in raid1. Prepare to refactor
+bitmap_{start/end}write(); there are no functional changes.
+
+Signed-off-by: Yu Kuai <yukuai3@huawei.com>
+Reviewed-by: Xiao Ni <xni@redhat.com>
+Link: https://lore.kernel.org/r/20250109015145.158868-2-yukuai1@huaweicloud.com
+Signed-off-by: Song Liu <song@kernel.org>
+[There is no bitmap_operations, resolve conflicts by exporting new api
+md_bitmap_{start,end}_behind_write]
+Signed-off-by: Yu Kuai <yukuai3@huawei.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/md/md-bitmap.c   |   60 +++++++++++++++++++++++++++++------------------
+ drivers/md/md-bitmap.h   |    6 +++-
+ drivers/md/raid1.c       |   11 +++++---
+ drivers/md/raid10.c      |    5 +--
+ drivers/md/raid5-cache.c |    4 +--
+ drivers/md/raid5.c       |   13 ++++------
+ 6 files changed, 59 insertions(+), 40 deletions(-)
+
+--- a/drivers/md/md-bitmap.c
++++ b/drivers/md/md-bitmap.c
+@@ -1465,22 +1465,12 @@ __acquires(bitmap->lock)
+                       &(bitmap->bp[page].map[pageoff]);
+ }
+-int md_bitmap_startwrite(struct bitmap *bitmap, sector_t offset, unsigned long sectors, int behind)
++int md_bitmap_startwrite(struct bitmap *bitmap, sector_t offset,
++                       unsigned long sectors)
+ {
+       if (!bitmap)
+               return 0;
+-      if (behind) {
+-              int bw;
+-              atomic_inc(&bitmap->behind_writes);
+-              bw = atomic_read(&bitmap->behind_writes);
+-              if (bw > bitmap->behind_writes_used)
+-                      bitmap->behind_writes_used = bw;
+-
+-              pr_debug("inc write-behind count %d/%lu\n",
+-                       bw, bitmap->mddev->bitmap_info.max_write_behind);
+-      }
+-
+       while (sectors) {
+               sector_t blocks;
+               bitmap_counter_t *bmc;
+@@ -1527,20 +1517,13 @@ int md_bitmap_startwrite(struct bitmap *
+       }
+       return 0;
+ }
+-EXPORT_SYMBOL(md_bitmap_startwrite);
++EXPORT_SYMBOL_GPL(md_bitmap_startwrite);
+ void md_bitmap_endwrite(struct bitmap *bitmap, sector_t offset,
+-                      unsigned long sectors, int success, int behind)
++                      unsigned long sectors, int success)
+ {
+       if (!bitmap)
+               return;
+-      if (behind) {
+-              if (atomic_dec_and_test(&bitmap->behind_writes))
+-                      wake_up(&bitmap->behind_wait);
+-              pr_debug("dec write-behind count %d/%lu\n",
+-                       atomic_read(&bitmap->behind_writes),
+-                       bitmap->mddev->bitmap_info.max_write_behind);
+-      }
+       while (sectors) {
+               sector_t blocks;
+@@ -1580,7 +1563,7 @@ void md_bitmap_endwrite(struct bitmap *b
+                       sectors = 0;
+       }
+ }
+-EXPORT_SYMBOL(md_bitmap_endwrite);
++EXPORT_SYMBOL_GPL(md_bitmap_endwrite);
+ static int __bitmap_start_sync(struct bitmap *bitmap, sector_t offset, sector_t *blocks,
+                              int degraded)
+@@ -1842,6 +1825,39 @@ void md_bitmap_free(struct bitmap *bitma
+ }
+ EXPORT_SYMBOL(md_bitmap_free);
++void md_bitmap_start_behind_write(struct mddev *mddev)
++{
++      struct bitmap *bitmap = mddev->bitmap;
++      int bw;
++
++      if (!bitmap)
++              return;
++
++      atomic_inc(&bitmap->behind_writes);
++      bw = atomic_read(&bitmap->behind_writes);
++      if (bw > bitmap->behind_writes_used)
++              bitmap->behind_writes_used = bw;
++
++      pr_debug("inc write-behind count %d/%lu\n",
++               bw, bitmap->mddev->bitmap_info.max_write_behind);
++}
++EXPORT_SYMBOL_GPL(md_bitmap_start_behind_write);
++
++void md_bitmap_end_behind_write(struct mddev *mddev)
++{
++      struct bitmap *bitmap = mddev->bitmap;
++
++      if (!bitmap)
++              return;
++
++      if (atomic_dec_and_test(&bitmap->behind_writes))
++              wake_up(&bitmap->behind_wait);
++      pr_debug("dec write-behind count %d/%lu\n",
++               atomic_read(&bitmap->behind_writes),
++               bitmap->mddev->bitmap_info.max_write_behind);
++}
++EXPORT_SYMBOL_GPL(md_bitmap_end_behind_write);
++
+ void md_bitmap_wait_behind_writes(struct mddev *mddev)
+ {
+       struct bitmap *bitmap = mddev->bitmap;
+--- a/drivers/md/md-bitmap.h
++++ b/drivers/md/md-bitmap.h
+@@ -253,9 +253,11 @@ void md_bitmap_dirty_bits(struct bitmap
+ /* these are exported */
+ int md_bitmap_startwrite(struct bitmap *bitmap, sector_t offset,
+-                       unsigned long sectors, int behind);
++                       unsigned long sectors);
+ void md_bitmap_endwrite(struct bitmap *bitmap, sector_t offset,
+-                      unsigned long sectors, int success, int behind);
++                      unsigned long sectors, int success);
++void md_bitmap_start_behind_write(struct mddev *mddev);
++void md_bitmap_end_behind_write(struct mddev *mddev);
+ int md_bitmap_start_sync(struct bitmap *bitmap, sector_t offset, sector_t *blocks, int degraded);
+ void md_bitmap_end_sync(struct bitmap *bitmap, sector_t offset, sector_t *blocks, int aborted);
+ void md_bitmap_close_sync(struct bitmap *bitmap);
+--- a/drivers/md/raid1.c
++++ b/drivers/md/raid1.c
+@@ -419,11 +419,12 @@ static void close_write(struct r1bio *r1
+               bio_put(r1_bio->behind_master_bio);
+               r1_bio->behind_master_bio = NULL;
+       }
++      if (test_bit(R1BIO_BehindIO, &r1_bio->state))
++              md_bitmap_end_behind_write(r1_bio->mddev);
+       /* clear the bitmap if all writes complete successfully */
+       md_bitmap_endwrite(r1_bio->mddev->bitmap, r1_bio->sector,
+                          r1_bio->sectors,
+-                         !test_bit(R1BIO_Degraded, &r1_bio->state),
+-                         test_bit(R1BIO_BehindIO, &r1_bio->state));
++                         !test_bit(R1BIO_Degraded, &r1_bio->state));
+       md_write_end(r1_bio->mddev);
+ }
+@@ -1530,8 +1531,10 @@ static void raid1_write_request(struct m
+                               alloc_behind_master_bio(r1_bio, bio);
+                       }
+-                      md_bitmap_startwrite(bitmap, r1_bio->sector, r1_bio->sectors,
+-                                           test_bit(R1BIO_BehindIO, &r1_bio->state));
++                      if (test_bit(R1BIO_BehindIO, &r1_bio->state))
++                              md_bitmap_start_behind_write(mddev);
++                      md_bitmap_startwrite(bitmap, r1_bio->sector,
++                                           r1_bio->sectors);
+                       first_clone = 0;
+               }
+--- a/drivers/md/raid10.c
++++ b/drivers/md/raid10.c
+@@ -430,8 +430,7 @@ static void close_write(struct r10bio *r
+       /* clear the bitmap if all writes complete successfully */
+       md_bitmap_endwrite(r10_bio->mddev->bitmap, r10_bio->sector,
+                          r10_bio->sectors,
+-                         !test_bit(R10BIO_Degraded, &r10_bio->state),
+-                         0);
++                         !test_bit(R10BIO_Degraded, &r10_bio->state));
+       md_write_end(r10_bio->mddev);
+ }
+@@ -1554,7 +1553,7 @@ static void raid10_write_request(struct
+       md_account_bio(mddev, &bio);
+       r10_bio->master_bio = bio;
+       atomic_set(&r10_bio->remaining, 1);
+-      md_bitmap_startwrite(mddev->bitmap, r10_bio->sector, r10_bio->sectors, 0);
++      md_bitmap_startwrite(mddev->bitmap, r10_bio->sector, r10_bio->sectors);
+       for (i = 0; i < conf->copies; i++) {
+               if (r10_bio->devs[i].bio)
+--- a/drivers/md/raid5-cache.c
++++ b/drivers/md/raid5-cache.c
+@@ -315,8 +315,8 @@ void r5c_handle_cached_data_endio(struct
+                       r5c_return_dev_pending_writes(conf, &sh->dev[i]);
+                       md_bitmap_endwrite(conf->mddev->bitmap, sh->sector,
+                                          RAID5_STRIPE_SECTORS(conf),
+-                                         !test_bit(STRIPE_DEGRADED, &sh->state),
+-                                         0);
++                                         !test_bit(STRIPE_DEGRADED,
++                                                   &sh->state));
+               }
+       }
+ }
+--- a/drivers/md/raid5.c
++++ b/drivers/md/raid5.c
+@@ -3606,7 +3606,7 @@ static void __add_stripe_bio(struct stri
+               set_bit(STRIPE_BITMAP_PENDING, &sh->state);
+               spin_unlock_irq(&sh->stripe_lock);
+               md_bitmap_startwrite(conf->mddev->bitmap, sh->sector,
+-                                   RAID5_STRIPE_SECTORS(conf), 0);
++                                   RAID5_STRIPE_SECTORS(conf));
+               spin_lock_irq(&sh->stripe_lock);
+               clear_bit(STRIPE_BITMAP_PENDING, &sh->state);
+               if (!sh->batch_head) {
+@@ -3708,7 +3708,7 @@ handle_failed_stripe(struct r5conf *conf
+               }
+               if (bitmap_end)
+                       md_bitmap_endwrite(conf->mddev->bitmap, sh->sector,
+-                                         RAID5_STRIPE_SECTORS(conf), 0, 0);
++                                         RAID5_STRIPE_SECTORS(conf), 0);
+               bitmap_end = 0;
+               /* and fail all 'written' */
+               bi = sh->dev[i].written;
+@@ -3754,7 +3754,7 @@ handle_failed_stripe(struct r5conf *conf
+               }
+               if (bitmap_end)
+                       md_bitmap_endwrite(conf->mddev->bitmap, sh->sector,
+-                                         RAID5_STRIPE_SECTORS(conf), 0, 0);
++                                         RAID5_STRIPE_SECTORS(conf), 0);
+               /* If we were in the middle of a write the parity block might
+                * still be locked - so just clear all R5_LOCKED flags
+                */
+@@ -4107,8 +4107,8 @@ returnbi:
+                               }
+                               md_bitmap_endwrite(conf->mddev->bitmap, sh->sector,
+                                                  RAID5_STRIPE_SECTORS(conf),
+-                                                 !test_bit(STRIPE_DEGRADED, &sh->state),
+-                                                 0);
++                                                 !test_bit(STRIPE_DEGRADED,
++                                                           &sh->state));
+                               if (head_sh->batch_head) {
+                                       sh = list_first_entry(&sh->batch_list,
+                                                             struct stripe_head,
+@@ -5853,8 +5853,7 @@ static void make_discard_request(struct
+                            d++)
+                               md_bitmap_startwrite(mddev->bitmap,
+                                                    sh->sector,
+-                                                   RAID5_STRIPE_SECTORS(conf),
+-                                                   0);
++                                                   RAID5_STRIPE_SECTORS(conf));
+                       sh->bm_seq = conf->seq_flush + 1;
+                       set_bit(STRIPE_BIT_DELAY, &sh->state);
+               }
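The two helpers factored out here implement a counter-plus-high-watermark
pattern. A standalone model in C11 atomics (illustration only; the kernel
uses atomic_inc()/atomic_dec_and_test() with a waitqueue, and tolerates the
same benign race on the watermark):

/* Standalone C11 model of the behind-write accounting, not kernel code. */
#include <stdatomic.h>
#include <stdio.h>

static atomic_int behind_writes;
static int behind_writes_used;		/* high watermark */

static void start_behind_write(void)
{
	int bw = atomic_fetch_add(&behind_writes, 1) + 1;

	if (bw > behind_writes_used)	/* racy update, like the kernel's */
		behind_writes_used = bw;
}

static void end_behind_write(void)
{
	/* models atomic_dec_and_test(): last writer wakes the waiters */
	if (atomic_fetch_sub(&behind_writes, 1) - 1 == 0)
		printf("write-behind drained, wake waiters\n");
}

int main(void)
{
	start_behind_write();
	start_behind_write();
	end_behind_write();
	end_behind_write();
	printf("peak write-behind IOs in flight: %d\n", behind_writes_used);
	return 0;
}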
diff --git a/queue-6.6/md-md-bitmap-move-bitmap_-start-end-write-to-md-upper-layer.patch b/queue-6.6/md-md-bitmap-move-bitmap_-start-end-write-to-md-upper-layer.patch
new file mode 100644 (file)
index 0000000..ab9c6fe
--- /dev/null
@@ -0,0 +1,342 @@
+From stable+bounces-114484-greg=kroah.com@vger.kernel.org Mon Feb 10 08:41:27 2025
+From: Yu Kuai <yukuai1@huaweicloud.com>
+Date: Mon, 10 Feb 2025 15:33:22 +0800
+Subject: md/md-bitmap: move bitmap_{start, end}write to md upper layer
+To: stable@vger.kernel.org, gregkh@linuxfoundation.org, song@kernel.org, yukuai3@huawei.com
+Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
+Message-ID: <20250210073322.3315094-7-yukuai1@huaweicloud.com>
+
+From: Yu Kuai <yukuai3@huawei.com>
+
+commit cd5fc653381811f1e0ba65f5d169918cab61476f upstream.
+
+There are two BUG reports that raid5 will hang at
+bitmap_startwrite() ([1],[2]). The root cause is that bitmap start write
+and end write are unbalanced; it's not quite clear where. While reviewing
+the raid5 code, it was found that the bitmap operations can be optimized.
+For example, for a 4-disk raid5 with chunksize=8k, if the user issues an
+IO (0 + 48k) to the array:
+
+┌────────────────────────────────────────────────────────────┐
+│chunk 0                                                     │
+│      ┌────────────┬─────────────┬─────────────┬────────────┼
+│  sh0 │A0: 0 + 4k  │A1: 8k + 4k  │A2: 16k + 4k │A3: P       │
+│      ┼────────────┼─────────────┼─────────────┼────────────┼
+│  sh1 │B0: 4k + 4k │B1: 12k + 4k │B2: 20k + 4k │B3: P       │
+┼──────┴────────────┴─────────────┴─────────────┴────────────┼
+│chunk 1                                                     │
+│      ┌────────────┬─────────────┬─────────────┬────────────┤
+│  sh2 │C0: 24k + 4k│C1: 32k + 4k │C2: P        │C3: 40k + 4k│
+│      ┼────────────┼─────────────┼─────────────┼────────────┼
+│  sh3 │D0: 28k + 4k│D1: 36k + 4k │D2: P        │D3: 44k + 4k│
+└──────┴────────────┴─────────────┴─────────────┴────────────┘
+
+Before this patch, 4 stripe heads will be used, and each sh will attach
+a bio for 3 disks, and each attached bio will trigger
+bitmap_startwrite() once, which means 12 times in total:
+ - 3 times (0 + 4k), for (A0, A1 and A2)
+ - 3 times (4k + 4k), for (B0, B1 and B2)
+ - 3 times (8k + 4k), for (C0, C1 and C3)
+ - 3 times (12k + 4k), for (D0, D1 and D3)
+
+After this patch, the md upper layer will calculate that the IO range
+(0 + 48k) corresponds to the bitmap range (0 + 16k), and call
+bitmap_startwrite() just once.
+
+Note that this patch will align bitmap ranges to the chunks. For example,
+if the user issues an IO (0 + 4k) to the array:
+
+- Before this patch, 1 time (0 + 4k), for A0;
+- After this patch, 1 time (0 + 8k) for chunk 0;
+
+Usually, one bitmap bit will represent more than one disk chunk, and this
+doesn't make any difference. Even if the user really created an array
+where one chunk contains multiple bits, the overhead is just that more
+data will be recovered after a power failure.
+
+Also remove STRIPE_BITMAP_PENDING since it's not used anymore.
+
+[1] https://lore.kernel.org/all/CAJpMwyjmHQLvm6zg1cmQErttNNQPDAAXPKM3xgTjMhbfts986Q@mail.gmail.com/
+[2] https://lore.kernel.org/all/ADF7D720-5764-4AF3-B68E-1845988737AA@flyingcircus.io/
+
+Signed-off-by: Yu Kuai <yukuai3@huawei.com>
+Link: https://lore.kernel.org/r/20250109015145.158868-6-yukuai1@huaweicloud.com
+Signed-off-by: Song Liu <song@kernel.org>
+[There is no bitmap_operations, resolve conflicts by replacing
+bitmap_ops->{startwrite, endwrite} with md_bitmap_{startwrite, endwrite}]
+Signed-off-by: Yu Kuai <yukuai3@huawei.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/md/md-bitmap.c   |    2 -
+ drivers/md/md.c          |   26 ++++++++++++++++++++++++
+ drivers/md/md.h          |    2 +
+ drivers/md/raid1.c       |    5 ----
+ drivers/md/raid10.c      |    4 ---
+ drivers/md/raid5-cache.c |    2 -
+ drivers/md/raid5.c       |   50 ++++-------------------------------------------
+ drivers/md/raid5.h       |    3 --
+ 8 files changed, 33 insertions(+), 61 deletions(-)
+
+--- a/drivers/md/md-bitmap.c
++++ b/drivers/md/md-bitmap.c
+@@ -1517,7 +1517,6 @@ int md_bitmap_startwrite(struct bitmap *
+       }
+       return 0;
+ }
+-EXPORT_SYMBOL_GPL(md_bitmap_startwrite);
+ void md_bitmap_endwrite(struct bitmap *bitmap, sector_t offset,
+                       unsigned long sectors)
+@@ -1564,7 +1563,6 @@ void md_bitmap_endwrite(struct bitmap *b
+                       sectors = 0;
+       }
+ }
+-EXPORT_SYMBOL_GPL(md_bitmap_endwrite);
+ static int __bitmap_start_sync(struct bitmap *bitmap, sector_t offset, sector_t *blocks,
+                              int degraded)
+--- a/drivers/md/md.c
++++ b/drivers/md/md.c
+@@ -8713,12 +8713,32 @@ void md_submit_discard_bio(struct mddev
+ }
+ EXPORT_SYMBOL_GPL(md_submit_discard_bio);
++static void md_bitmap_start(struct mddev *mddev,
++                          struct md_io_clone *md_io_clone)
++{
++      if (mddev->pers->bitmap_sector)
++              mddev->pers->bitmap_sector(mddev, &md_io_clone->offset,
++                                         &md_io_clone->sectors);
++
++      md_bitmap_startwrite(mddev->bitmap, md_io_clone->offset,
++                           md_io_clone->sectors);
++}
++
++static void md_bitmap_end(struct mddev *mddev, struct md_io_clone *md_io_clone)
++{
++      md_bitmap_endwrite(mddev->bitmap, md_io_clone->offset,
++                         md_io_clone->sectors);
++}
++
+ static void md_end_clone_io(struct bio *bio)
+ {
+       struct md_io_clone *md_io_clone = bio->bi_private;
+       struct bio *orig_bio = md_io_clone->orig_bio;
+       struct mddev *mddev = md_io_clone->mddev;
++      if (bio_data_dir(orig_bio) == WRITE && mddev->bitmap)
++              md_bitmap_end(mddev, md_io_clone);
++
+       if (bio->bi_status && !orig_bio->bi_status)
+               orig_bio->bi_status = bio->bi_status;
+@@ -8743,6 +8763,12 @@ static void md_clone_bio(struct mddev *m
+       if (blk_queue_io_stat(bdev->bd_disk->queue))
+               md_io_clone->start_time = bio_start_io_acct(*bio);
++      if (bio_data_dir(*bio) == WRITE && mddev->bitmap) {
++              md_io_clone->offset = (*bio)->bi_iter.bi_sector;
++              md_io_clone->sectors = bio_sectors(*bio);
++              md_bitmap_start(mddev, md_io_clone);
++      }
++
+       clone->bi_end_io = md_end_clone_io;
+       clone->bi_private = md_io_clone;
+       *bio = clone;
+--- a/drivers/md/md.h
++++ b/drivers/md/md.h
+@@ -746,6 +746,8 @@ struct md_io_clone {
+       struct mddev    *mddev;
+       struct bio      *orig_bio;
+       unsigned long   start_time;
++      sector_t        offset;
++      unsigned long   sectors;
+       struct bio      bio_clone;
+ };
+--- a/drivers/md/raid1.c
++++ b/drivers/md/raid1.c
+@@ -421,9 +421,6 @@ static void close_write(struct r1bio *r1
+       }
+       if (test_bit(R1BIO_BehindIO, &r1_bio->state))
+               md_bitmap_end_behind_write(r1_bio->mddev);
+-      /* clear the bitmap if all writes complete successfully */
+-      md_bitmap_endwrite(r1_bio->mddev->bitmap, r1_bio->sector,
+-                         r1_bio->sectors);
+       md_write_end(r1_bio->mddev);
+ }
+@@ -1517,8 +1514,6 @@ static void raid1_write_request(struct m
+                       if (test_bit(R1BIO_BehindIO, &r1_bio->state))
+                               md_bitmap_start_behind_write(mddev);
+-                      md_bitmap_startwrite(bitmap, r1_bio->sector,
+-                                           r1_bio->sectors);
+                       first_clone = 0;
+               }
+--- a/drivers/md/raid10.c
++++ b/drivers/md/raid10.c
+@@ -427,9 +427,6 @@ static void raid10_end_read_request(stru
+ static void close_write(struct r10bio *r10_bio)
+ {
+-      /* clear the bitmap if all writes complete successfully */
+-      md_bitmap_endwrite(r10_bio->mddev->bitmap, r10_bio->sector,
+-                         r10_bio->sectors);
+       md_write_end(r10_bio->mddev);
+ }
+@@ -1541,7 +1538,6 @@ static void raid10_write_request(struct
+       md_account_bio(mddev, &bio);
+       r10_bio->master_bio = bio;
+       atomic_set(&r10_bio->remaining, 1);
+-      md_bitmap_startwrite(mddev->bitmap, r10_bio->sector, r10_bio->sectors);
+       for (i = 0; i < conf->copies; i++) {
+               if (r10_bio->devs[i].bio)
+--- a/drivers/md/raid5-cache.c
++++ b/drivers/md/raid5-cache.c
+@@ -313,8 +313,6 @@ void r5c_handle_cached_data_endio(struct
+               if (sh->dev[i].written) {
+                       set_bit(R5_UPTODATE, &sh->dev[i].flags);
+                       r5c_return_dev_pending_writes(conf, &sh->dev[i]);
+-                      md_bitmap_endwrite(conf->mddev->bitmap, sh->sector,
+-                                         RAID5_STRIPE_SECTORS(conf));
+               }
+       }
+ }
+--- a/drivers/md/raid5.c
++++ b/drivers/md/raid5.c
+@@ -905,7 +905,6 @@ static bool stripe_can_batch(struct stri
+       if (raid5_has_log(conf) || raid5_has_ppl(conf))
+               return false;
+       return test_bit(STRIPE_BATCH_READY, &sh->state) &&
+-              !test_bit(STRIPE_BITMAP_PENDING, &sh->state) &&
+               is_full_stripe_write(sh);
+ }
+@@ -3587,29 +3586,9 @@ static void __add_stripe_bio(struct stri
+                (*bip)->bi_iter.bi_sector, sh->sector, dd_idx,
+                sh->dev[dd_idx].sector);
+-      if (conf->mddev->bitmap && firstwrite) {
+-              /* Cannot hold spinlock over bitmap_startwrite,
+-               * but must ensure this isn't added to a batch until
+-               * we have added to the bitmap and set bm_seq.
+-               * So set STRIPE_BITMAP_PENDING to prevent
+-               * batching.
+-               * If multiple __add_stripe_bio() calls race here they
+-               * much all set STRIPE_BITMAP_PENDING.  So only the first one
+-               * to complete "bitmap_startwrite" gets to set
+-               * STRIPE_BIT_DELAY.  This is important as once a stripe
+-               * is added to a batch, STRIPE_BIT_DELAY cannot be changed
+-               * any more.
+-               */
+-              set_bit(STRIPE_BITMAP_PENDING, &sh->state);
+-              spin_unlock_irq(&sh->stripe_lock);
+-              md_bitmap_startwrite(conf->mddev->bitmap, sh->sector,
+-                                   RAID5_STRIPE_SECTORS(conf));
+-              spin_lock_irq(&sh->stripe_lock);
+-              clear_bit(STRIPE_BITMAP_PENDING, &sh->state);
+-              if (!sh->batch_head) {
+-                      sh->bm_seq = conf->seq_flush+1;
+-                      set_bit(STRIPE_BIT_DELAY, &sh->state);
+-              }
++      if (conf->mddev->bitmap && firstwrite && !sh->batch_head) {
++              sh->bm_seq = conf->seq_flush+1;
++              set_bit(STRIPE_BIT_DELAY, &sh->state);
+       }
+ }
+@@ -3660,7 +3639,6 @@ handle_failed_stripe(struct r5conf *conf
+       BUG_ON(sh->batch_head);
+       for (i = disks; i--; ) {
+               struct bio *bi;
+-              int bitmap_end = 0;
+               if (test_bit(R5_ReadError, &sh->dev[i].flags)) {
+                       struct md_rdev *rdev;
+@@ -3687,8 +3665,6 @@ handle_failed_stripe(struct r5conf *conf
+               sh->dev[i].towrite = NULL;
+               sh->overwrite_disks = 0;
+               spin_unlock_irq(&sh->stripe_lock);
+-              if (bi)
+-                      bitmap_end = 1;
+               log_stripe_write_finished(sh);
+@@ -3703,10 +3679,6 @@ handle_failed_stripe(struct r5conf *conf
+                       bio_io_error(bi);
+                       bi = nextbi;
+               }
+-              if (bitmap_end)
+-                      md_bitmap_endwrite(conf->mddev->bitmap, sh->sector,
+-                                         RAID5_STRIPE_SECTORS(conf));
+-              bitmap_end = 0;
+               /* and fail all 'written' */
+               bi = sh->dev[i].written;
+               sh->dev[i].written = NULL;
+@@ -3715,7 +3687,6 @@ handle_failed_stripe(struct r5conf *conf
+                       sh->dev[i].page = sh->dev[i].orig_page;
+               }
+-              if (bi) bitmap_end = 1;
+               while (bi && bi->bi_iter.bi_sector <
+                      sh->dev[i].sector + RAID5_STRIPE_SECTORS(conf)) {
+                       struct bio *bi2 = r5_next_bio(conf, bi, sh->dev[i].sector);
+@@ -3749,9 +3720,6 @@ handle_failed_stripe(struct r5conf *conf
+                               bi = nextbi;
+                       }
+               }
+-              if (bitmap_end)
+-                      md_bitmap_endwrite(conf->mddev->bitmap, sh->sector,
+-                                         RAID5_STRIPE_SECTORS(conf));
+               /* If we were in the middle of a write the parity block might
+                * still be locked - so just clear all R5_LOCKED flags
+                */
+@@ -4102,8 +4070,7 @@ returnbi:
+                                       bio_endio(wbi);
+                                       wbi = wbi2;
+                               }
+-                              md_bitmap_endwrite(conf->mddev->bitmap, sh->sector,
+-                                                 RAID5_STRIPE_SECTORS(conf));
++
+                               if (head_sh->batch_head) {
+                                       sh = list_first_entry(&sh->batch_list,
+                                                             struct stripe_head,
+@@ -4935,8 +4902,7 @@ static void break_stripe_batch_list(stru
+                                         (1 << STRIPE_COMPUTE_RUN)  |
+                                         (1 << STRIPE_DISCARD) |
+                                         (1 << STRIPE_BATCH_READY) |
+-                                        (1 << STRIPE_BATCH_ERR) |
+-                                        (1 << STRIPE_BITMAP_PENDING)),
++                                        (1 << STRIPE_BATCH_ERR)),
+                       "stripe state: %lx\n", sh->state);
+               WARN_ONCE(head_sh->state & ((1 << STRIPE_DISCARD) |
+                                             (1 << STRIPE_REPLACED)),
+@@ -5840,12 +5806,6 @@ static void make_discard_request(struct
+               }
+               spin_unlock_irq(&sh->stripe_lock);
+               if (conf->mddev->bitmap) {
+-                      for (d = 0;
+-                           d < conf->raid_disks - conf->max_degraded;
+-                           d++)
+-                              md_bitmap_startwrite(mddev->bitmap,
+-                                                   sh->sector,
+-                                                   RAID5_STRIPE_SECTORS(conf));
+                       sh->bm_seq = conf->seq_flush + 1;
+                       set_bit(STRIPE_BIT_DELAY, &sh->state);
+               }
+--- a/drivers/md/raid5.h
++++ b/drivers/md/raid5.h
+@@ -371,9 +371,6 @@ enum {
+       STRIPE_ON_RELEASE_LIST,
+       STRIPE_BATCH_READY,
+       STRIPE_BATCH_ERR,
+-      STRIPE_BITMAP_PENDING,  /* Being added to bitmap, don't add
+-                               * to batch yet.
+-                               */
+       STRIPE_LOG_TRAPPED,     /* trapped into log (see raid5-cache.c)
+                                * this bit is used in two scenarios:
+                                *
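The chunk-rounding arithmetic in the message above can be checked with a
small standalone model (made-up helper math, not the kernel's
raid5_compute_sector() path). It reproduces the example: a 48k write at
offset 0 on a 4-disk, 8k-chunk raid5 corresponds to bitmap range (0 + 16k):

/* Standalone model of the array-to-bitmap range conversion, not
 * kernel code: widen the write to whole stripes, then divide the
 * data span across the data members. */
#include <stdio.h>

int main(void)
{
	unsigned long chunk = 8 * 1024;		/* chunk size, bytes */
	unsigned long data_disks = 3;		/* 4 disks minus 1 parity */
	unsigned long start = 0, len = 48 * 1024;

	unsigned long stripe = chunk * data_disks;	/* 24k of data */
	unsigned long lo = start / stripe * stripe;	/* round down */
	unsigned long hi = (start + len + stripe - 1) / stripe * stripe;

	/* per-disk (bitmap) range covered by the widened write */
	printf("bitmap range: %lu + %lu bytes\n",
	       lo / data_disks, (hi - lo) / data_disks);
	return 0;	/* prints: bitmap range: 0 + 16384 bytes */
}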
diff --git a/queue-6.6/md-md-bitmap-remove-the-last-parameter-for-bimtap_ops-endwrite.patch b/queue-6.6/md-md-bitmap-remove-the-last-parameter-for-bimtap_ops-endwrite.patch
new file mode 100644 (file)
index 0000000..9134c97
--- /dev/null
@@ -0,0 +1,347 @@
+From stable+bounces-114482-greg=kroah.com@vger.kernel.org Mon Feb 10 08:41:07 2025
+From: Yu Kuai <yukuai1@huaweicloud.com>
+Date: Mon, 10 Feb 2025 15:33:19 +0800
+Subject: md/md-bitmap: remove the last parameter for bimtap_ops->endwrite()
+To: stable@vger.kernel.org, gregkh@linuxfoundation.org, song@kernel.org, yukuai3@huawei.com
+Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
+Message-ID: <20250210073322.3315094-4-yukuai1@huaweicloud.com>
+
+From: Yu Kuai <yukuai3@huawei.com>
+
+commit 4f0e7d0e03b7b80af84759a9e7cfb0f81ac4adae upstream.
+
+For the case that IO failed for one rdev, the bit will be marked as
+NEEDED in the following cases:
+
+1) If badblocks is set and rdev is not faulty;
+2) If rdev is faulty;
+
+Case 1) is useless because synchronizing data to badblocks makes no sense.
+Case 2) can be replaced with mddev->degraded.
+
+Also remove R1BIO_Degraded, R10BIO_Degraded and STRIPE_DEGRADED since
+case 2) no longer use them.
+
+Signed-off-by: Yu Kuai <yukuai3@huawei.com>
+Link: https://lore.kernel.org/r/20250109015145.158868-3-yukuai1@huaweicloud.com
+Signed-off-by: Song Liu <song@kernel.org>
+[ Resolve minor conflicts ]
+Signed-off-by: Yu Kuai <yukuai3@huawei.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/md/md-bitmap.c   |   19 ++++++++++---------
+ drivers/md/md-bitmap.h   |    2 +-
+ drivers/md/raid1.c       |   27 +++------------------------
+ drivers/md/raid1.h       |    1 -
+ drivers/md/raid10.c      |   23 +++--------------------
+ drivers/md/raid10.h      |    1 -
+ drivers/md/raid5-cache.c |    4 +---
+ drivers/md/raid5.c       |   14 +++-----------
+ drivers/md/raid5.h       |    1 -
+ 9 files changed, 21 insertions(+), 71 deletions(-)
+
+--- a/drivers/md/md-bitmap.c
++++ b/drivers/md/md-bitmap.c
+@@ -1520,7 +1520,7 @@ int md_bitmap_startwrite(struct bitmap *
+ EXPORT_SYMBOL_GPL(md_bitmap_startwrite);
+ void md_bitmap_endwrite(struct bitmap *bitmap, sector_t offset,
+-                      unsigned long sectors, int success)
++                      unsigned long sectors)
+ {
+       if (!bitmap)
+               return;
+@@ -1537,15 +1537,16 @@ void md_bitmap_endwrite(struct bitmap *b
+                       return;
+               }
+-              if (success && !bitmap->mddev->degraded &&
+-                  bitmap->events_cleared < bitmap->mddev->events) {
+-                      bitmap->events_cleared = bitmap->mddev->events;
+-                      bitmap->need_sync = 1;
+-                      sysfs_notify_dirent_safe(bitmap->sysfs_can_clear);
+-              }
+-
+-              if (!success && !NEEDED(*bmc))
++              if (!bitmap->mddev->degraded) {
++                      if (bitmap->events_cleared < bitmap->mddev->events) {
++                              bitmap->events_cleared = bitmap->mddev->events;
++                              bitmap->need_sync = 1;
++                              sysfs_notify_dirent_safe(
++                                              bitmap->sysfs_can_clear);
++                      }
++              } else if (!NEEDED(*bmc)) {
+                       *bmc |= NEEDED_MASK;
++              }
+               if (COUNTER(*bmc) == COUNTER_MAX)
+                       wake_up(&bitmap->overflow_wait);
+--- a/drivers/md/md-bitmap.h
++++ b/drivers/md/md-bitmap.h
+@@ -255,7 +255,7 @@ void md_bitmap_dirty_bits(struct bitmap
+ int md_bitmap_startwrite(struct bitmap *bitmap, sector_t offset,
+                        unsigned long sectors);
+ void md_bitmap_endwrite(struct bitmap *bitmap, sector_t offset,
+-                      unsigned long sectors, int success);
++                      unsigned long sectors);
+ void md_bitmap_start_behind_write(struct mddev *mddev);
+ void md_bitmap_end_behind_write(struct mddev *mddev);
+ int md_bitmap_start_sync(struct bitmap *bitmap, sector_t offset, sector_t *blocks, int degraded);
+--- a/drivers/md/raid1.c
++++ b/drivers/md/raid1.c
+@@ -423,8 +423,7 @@ static void close_write(struct r1bio *r1
+               md_bitmap_end_behind_write(r1_bio->mddev);
+       /* clear the bitmap if all writes complete successfully */
+       md_bitmap_endwrite(r1_bio->mddev->bitmap, r1_bio->sector,
+-                         r1_bio->sectors,
+-                         !test_bit(R1BIO_Degraded, &r1_bio->state));
++                         r1_bio->sectors);
+       md_write_end(r1_bio->mddev);
+ }
+@@ -481,8 +480,6 @@ static void raid1_end_write_request(stru
+               if (!test_bit(Faulty, &rdev->flags))
+                       set_bit(R1BIO_WriteError, &r1_bio->state);
+               else {
+-                      /* Fail the request */
+-                      set_bit(R1BIO_Degraded, &r1_bio->state);
+                       /* Finished with this branch */
+                       r1_bio->bios[mirror] = NULL;
+                       to_put = bio;
+@@ -1415,11 +1412,8 @@ static void raid1_write_request(struct m
+                       break;
+               }
+               r1_bio->bios[i] = NULL;
+-              if (!rdev || test_bit(Faulty, &rdev->flags)) {
+-                      if (i < conf->raid_disks)
+-                              set_bit(R1BIO_Degraded, &r1_bio->state);
++              if (!rdev || test_bit(Faulty, &rdev->flags))
+                       continue;
+-              }
+               atomic_inc(&rdev->nr_pending);
+               if (test_bit(WriteErrorSeen, &rdev->flags)) {
+@@ -1445,16 +1439,6 @@ static void raid1_write_request(struct m
+                                        */
+                                       max_sectors = bad_sectors;
+                               rdev_dec_pending(rdev, mddev);
+-                              /* We don't set R1BIO_Degraded as that
+-                               * only applies if the disk is
+-                               * missing, so it might be re-added,
+-                               * and we want to know to recover this
+-                               * chunk.
+-                               * In this case the device is here,
+-                               * and the fact that this chunk is not
+-                               * in-sync is recorded in the bad
+-                               * block log
+-                               */
+                               continue;
+                       }
+                       if (is_bad) {
+@@ -2479,12 +2463,9 @@ static void handle_write_finished(struct
+                        * errors.
+                        */
+                       fail = true;
+-                      if (!narrow_write_error(r1_bio, m)) {
++                      if (!narrow_write_error(r1_bio, m))
+                               md_error(conf->mddev,
+                                        conf->mirrors[m].rdev);
+-                              /* an I/O failed, we can't clear the bitmap */
+-                              set_bit(R1BIO_Degraded, &r1_bio->state);
+-                      }
+                       rdev_dec_pending(conf->mirrors[m].rdev,
+                                        conf->mddev);
+               }
+@@ -2576,8 +2557,6 @@ static void raid1d(struct md_thread *thr
+                       list_del(&r1_bio->retry_list);
+                       idx = sector_to_idx(r1_bio->sector);
+                       atomic_dec(&conf->nr_queued[idx]);
+-                      if (mddev->degraded)
+-                              set_bit(R1BIO_Degraded, &r1_bio->state);
+                       if (test_bit(R1BIO_WriteError, &r1_bio->state))
+                               close_write(r1_bio);
+                       raid_end_bio_io(r1_bio);
+--- a/drivers/md/raid1.h
++++ b/drivers/md/raid1.h
+@@ -187,7 +187,6 @@ struct r1bio {
+ enum r1bio_state {
+       R1BIO_Uptodate,
+       R1BIO_IsSync,
+-      R1BIO_Degraded,
+       R1BIO_BehindIO,
+ /* Set ReadError on bios that experience a readerror so that
+  * raid1d knows what to do with them.
+--- a/drivers/md/raid10.c
++++ b/drivers/md/raid10.c
+@@ -429,8 +429,7 @@ static void close_write(struct r10bio *r
+ {
+       /* clear the bitmap if all writes complete successfully */
+       md_bitmap_endwrite(r10_bio->mddev->bitmap, r10_bio->sector,
+-                         r10_bio->sectors,
+-                         !test_bit(R10BIO_Degraded, &r10_bio->state));
++                         r10_bio->sectors);
+       md_write_end(r10_bio->mddev);
+ }
+@@ -500,7 +499,6 @@ static void raid10_end_write_request(str
+                               set_bit(R10BIO_WriteError, &r10_bio->state);
+                       else {
+                               /* Fail the request */
+-                              set_bit(R10BIO_Degraded, &r10_bio->state);
+                               r10_bio->devs[slot].bio = NULL;
+                               to_put = bio;
+                               dec_rdev = 1;
+@@ -1489,10 +1487,8 @@ static void raid10_write_request(struct
+               r10_bio->devs[i].bio = NULL;
+               r10_bio->devs[i].repl_bio = NULL;
+-              if (!rdev && !rrdev) {
+-                      set_bit(R10BIO_Degraded, &r10_bio->state);
++              if (!rdev && !rrdev)
+                       continue;
+-              }
+               if (rdev && test_bit(WriteErrorSeen, &rdev->flags)) {
+                       sector_t first_bad;
+                       sector_t dev_sector = r10_bio->devs[i].addr;
+@@ -1509,14 +1505,6 @@ static void raid10_write_request(struct
+                                        * to other devices yet
+                                        */
+                                       max_sectors = bad_sectors;
+-                              /* We don't set R10BIO_Degraded as that
+-                               * only applies if the disk is missing,
+-                               * so it might be re-added, and we want to
+-                               * know to recover this chunk.
+-                               * In this case the device is here, and the
+-                               * fact that this chunk is not in-sync is
+-                               * recorded in the bad block log.
+-                               */
+                               continue;
+                       }
+                       if (is_bad) {
+@@ -3062,11 +3050,8 @@ static void handle_write_completed(struc
+                               rdev_dec_pending(rdev, conf->mddev);
+                       } else if (bio != NULL && bio->bi_status) {
+                               fail = true;
+-                              if (!narrow_write_error(r10_bio, m)) {
++                              if (!narrow_write_error(r10_bio, m))
+                                       md_error(conf->mddev, rdev);
+-                                      set_bit(R10BIO_Degraded,
+-                                              &r10_bio->state);
+-                              }
+                               rdev_dec_pending(rdev, conf->mddev);
+                       }
+                       bio = r10_bio->devs[m].repl_bio;
+@@ -3125,8 +3110,6 @@ static void raid10d(struct md_thread *th
+                       r10_bio = list_first_entry(&tmp, struct r10bio,
+                                                  retry_list);
+                       list_del(&r10_bio->retry_list);
+-                      if (mddev->degraded)
+-                              set_bit(R10BIO_Degraded, &r10_bio->state);
+                       if (test_bit(R10BIO_WriteError,
+                                    &r10_bio->state))
+--- a/drivers/md/raid10.h
++++ b/drivers/md/raid10.h
+@@ -161,7 +161,6 @@ enum r10bio_state {
+       R10BIO_IsSync,
+       R10BIO_IsRecover,
+       R10BIO_IsReshape,
+-      R10BIO_Degraded,
+ /* Set ReadError on bios that experience a read error
+  * so that raid10d knows what to do with them.
+  */
+--- a/drivers/md/raid5-cache.c
++++ b/drivers/md/raid5-cache.c
+@@ -314,9 +314,7 @@ void r5c_handle_cached_data_endio(struct
+                       set_bit(R5_UPTODATE, &sh->dev[i].flags);
+                       r5c_return_dev_pending_writes(conf, &sh->dev[i]);
+                       md_bitmap_endwrite(conf->mddev->bitmap, sh->sector,
+-                                         RAID5_STRIPE_SECTORS(conf),
+-                                         !test_bit(STRIPE_DEGRADED,
+-                                                   &sh->state));
++                                         RAID5_STRIPE_SECTORS(conf));
+               }
+       }
+ }
+--- a/drivers/md/raid5.c
++++ b/drivers/md/raid5.c
+@@ -1359,8 +1359,6 @@ again:
+                               submit_bio_noacct(rbi);
+               }
+               if (!rdev && !rrdev) {
+-                      if (op_is_write(op))
+-                              set_bit(STRIPE_DEGRADED, &sh->state);
+                       pr_debug("skip op %d on disc %d for sector %llu\n",
+                               bi->bi_opf, i, (unsigned long long)sh->sector);
+                       clear_bit(R5_LOCKED, &sh->dev[i].flags);
+@@ -2925,7 +2923,6 @@ static void raid5_end_write_request(stru
+                       set_bit(R5_MadeGoodRepl, &sh->dev[i].flags);
+       } else {
+               if (bi->bi_status) {
+-                      set_bit(STRIPE_DEGRADED, &sh->state);
+                       set_bit(WriteErrorSeen, &rdev->flags);
+                       set_bit(R5_WriteError, &sh->dev[i].flags);
+                       if (!test_and_set_bit(WantReplacement, &rdev->flags))
+@@ -3708,7 +3705,7 @@ handle_failed_stripe(struct r5conf *conf
+               }
+               if (bitmap_end)
+                       md_bitmap_endwrite(conf->mddev->bitmap, sh->sector,
+-                                         RAID5_STRIPE_SECTORS(conf), 0);
++                                         RAID5_STRIPE_SECTORS(conf));
+               bitmap_end = 0;
+               /* and fail all 'written' */
+               bi = sh->dev[i].written;
+@@ -3754,7 +3751,7 @@ handle_failed_stripe(struct r5conf *conf
+               }
+               if (bitmap_end)
+                       md_bitmap_endwrite(conf->mddev->bitmap, sh->sector,
+-                                         RAID5_STRIPE_SECTORS(conf), 0);
++                                         RAID5_STRIPE_SECTORS(conf));
+               /* If we were in the middle of a write the parity block might
+                * still be locked - so just clear all R5_LOCKED flags
+                */
+@@ -4106,9 +4103,7 @@ returnbi:
+                                       wbi = wbi2;
+                               }
+                               md_bitmap_endwrite(conf->mddev->bitmap, sh->sector,
+-                                                 RAID5_STRIPE_SECTORS(conf),
+-                                                 !test_bit(STRIPE_DEGRADED,
+-                                                           &sh->state));
++                                                 RAID5_STRIPE_SECTORS(conf));
+                               if (head_sh->batch_head) {
+                                       sh = list_first_entry(&sh->batch_list,
+                                                             struct stripe_head,
+@@ -4385,7 +4380,6 @@ static void handle_parity_checks5(struct
+               s->locked++;
+               set_bit(R5_Wantwrite, &dev->flags);
+-              clear_bit(STRIPE_DEGRADED, &sh->state);
+               set_bit(STRIPE_INSYNC, &sh->state);
+               break;
+       case check_state_run:
+@@ -4542,7 +4536,6 @@ static void handle_parity_checks6(struct
+                       clear_bit(R5_Wantwrite, &dev->flags);
+                       s->locked--;
+               }
+-              clear_bit(STRIPE_DEGRADED, &sh->state);
+               set_bit(STRIPE_INSYNC, &sh->state);
+               break;
+@@ -4951,7 +4944,6 @@ static void break_stripe_batch_list(stru
+               set_mask_bits(&sh->state, ~(STRIPE_EXPAND_SYNC_FLAGS |
+                                           (1 << STRIPE_PREREAD_ACTIVE) |
+-                                          (1 << STRIPE_DEGRADED) |
+                                           (1 << STRIPE_ON_UNPLUG_LIST)),
+                             head_sh->state & (1 << STRIPE_INSYNC));
+--- a/drivers/md/raid5.h
++++ b/drivers/md/raid5.h
+@@ -358,7 +358,6 @@ enum {
+       STRIPE_REPLACED,
+       STRIPE_PREREAD_ACTIVE,
+       STRIPE_DELAYED,
+-      STRIPE_DEGRADED,
+       STRIPE_BIT_DELAY,
+       STRIPE_EXPANDING,
+       STRIPE_EXPAND_SOURCE,
diff --git a/queue-6.6/md-raid5-implement-pers-bitmap_sector.patch b/queue-6.6/md-raid5-implement-pers-bitmap_sector.patch
new file mode 100644 (file)
index 0000000..a5cd336
--- /dev/null
@@ -0,0 +1,111 @@
+From stable+bounces-114481-greg=kroah.com@vger.kernel.org Mon Feb 10 08:40:56 2025
+From: Yu Kuai <yukuai1@huaweicloud.com>
+Date: Mon, 10 Feb 2025 15:33:21 +0800
+Subject: md/raid5: implement pers->bitmap_sector()
+To: stable@vger.kernel.org, gregkh@linuxfoundation.org, song@kernel.org, yukuai3@huawei.com
+Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
+Message-ID: <20250210073322.3315094-6-yukuai1@huaweicloud.com>
+
+From: Yu Kuai <yukuai3@huawei.com>
+
+commit 9c89f604476cf15c31fbbdb043cff7fbf1dbe0cb upstream.
+
+For raid1/raid10 the bitmap covers the whole array, hence IO for the
+array can be used directly for the bitmap. For raid5, however, the
+bitmap covers the underlying disks, hence IO for the array can't be
+used directly for the bitmap.
+
+Implement pers->bitmap_sector() for raid5 to convert IO ranges from the
+array to the underlying disks.
+
+Signed-off-by: Yu Kuai <yukuai3@huawei.com>
+Link: https://lore.kernel.org/r/20250109015145.158868-5-yukuai1@huaweicloud.com
+Signed-off-by: Song Liu <song@kernel.org>
+[ Resolve minor conflicts ]
+Signed-off-by: Yu Kuai <yukuai3@huawei.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/md/raid5.c |   51 +++++++++++++++++++++++++++++++++++++++++++++++++++
+ 1 file changed, 51 insertions(+)
+
+--- a/drivers/md/raid5.c
++++ b/drivers/md/raid5.c
+@@ -5996,6 +5996,54 @@ static enum reshape_loc get_reshape_loc(
+       return LOC_BEHIND_RESHAPE;
+ }
++static void raid5_bitmap_sector(struct mddev *mddev, sector_t *offset,
++                              unsigned long *sectors)
++{
++      struct r5conf *conf = mddev->private;
++      sector_t start = *offset;
++      sector_t end = start + *sectors;
++      sector_t prev_start = start;
++      sector_t prev_end = end;
++      int sectors_per_chunk;
++      enum reshape_loc loc;
++      int dd_idx;
++
++      sectors_per_chunk = conf->chunk_sectors *
++              (conf->raid_disks - conf->max_degraded);
++      start = round_down(start, sectors_per_chunk);
++      end = round_up(end, sectors_per_chunk);
++
++      start = raid5_compute_sector(conf, start, 0, &dd_idx, NULL);
++      end = raid5_compute_sector(conf, end, 0, &dd_idx, NULL);
++
++      /*
++       * For LOC_INSIDE_RESHAPE, this IO will wait for reshape to make
++       * progress, hence it's the same as LOC_BEHIND_RESHAPE.
++       */
++      loc = get_reshape_loc(mddev, conf, prev_start);
++      if (likely(loc != LOC_AHEAD_OF_RESHAPE)) {
++              *offset = start;
++              *sectors = end - start;
++              return;
++      }
++
++      sectors_per_chunk = conf->prev_chunk_sectors *
++              (conf->previous_raid_disks - conf->max_degraded);
++      prev_start = round_down(prev_start, sectors_per_chunk);
++      prev_end = round_down(prev_end, sectors_per_chunk);
++
++      prev_start = raid5_compute_sector(conf, prev_start, 1, &dd_idx, NULL);
++      prev_end = raid5_compute_sector(conf, prev_end, 1, &dd_idx, NULL);
++
++      /*
++       * for LOC_AHEAD_OF_RESHAPE, reshape can make progress before this IO
++       * is handled in make_stripe_request(), we can't know this here hence
++       * we set bits for both.
++       */
++      *offset = min(start, prev_start);
++      *sectors = max(end, prev_end) - *offset;
++}
++
+ static enum stripe_result make_stripe_request(struct mddev *mddev,
+               struct r5conf *conf, struct stripe_request_ctx *ctx,
+               sector_t logical_sector, struct bio *bi)
+@@ -9099,6 +9147,7 @@ static struct md_personality raid6_perso
+       .quiesce        = raid5_quiesce,
+       .takeover       = raid6_takeover,
+       .change_consistency_policy = raid5_change_consistency_policy,
++      .bitmap_sector  = raid5_bitmap_sector,
+ };
+ static struct md_personality raid5_personality =
+ {
+@@ -9124,6 +9173,7 @@ static struct md_personality raid5_perso
+       .quiesce        = raid5_quiesce,
+       .takeover       = raid5_takeover,
+       .change_consistency_policy = raid5_change_consistency_policy,
++      .bitmap_sector  = raid5_bitmap_sector,
+ };
+ static struct md_personality raid4_personality =
+@@ -9150,6 +9200,7 @@ static struct md_personality raid4_perso
+       .quiesce        = raid5_quiesce,
+       .takeover       = raid4_takeover,
+       .change_consistency_policy = raid5_change_consistency_policy,
++      .bitmap_sector  = raid5_bitmap_sector,
+ };
+ static int __init raid5_init(void)
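The conversion in raid5_bitmap_sector() above first widens the array-relative
IO range to whole-stripe boundaries, then maps the endpoints to per-disk
sectors with raid5_compute_sector(). A minimal userspace sketch of the
widening step, under assumed geometry and with generic division-based
rounding standing in for the kernel's mask-based round_down()/round_up()
helpers:

#include <stdint.h>
#include <stdio.h>

/* generic rounding; the kernel macros require power-of-two units */
static uint64_t round_down_to(uint64_t x, uint64_t unit)
{
	return x - (x % unit);
}

static uint64_t round_up_to(uint64_t x, uint64_t unit)
{
	return round_down_to(x + unit - 1, unit);
}

int main(void)
{
	/* hypothetical geometry: 64-sector chunks, 4 data disks */
	uint64_t chunk_sectors = 64;
	uint64_t data_disks = 4;
	uint64_t sectors_per_chunk = chunk_sectors * data_disks;	/* 256 */

	uint64_t offset = 300, sectors = 100;	/* incoming IO range */
	uint64_t start = round_down_to(offset, sectors_per_chunk);
	uint64_t end = round_up_to(offset + sectors, sectors_per_chunk);

	/* prints [256, 512): every stripe the IO touches, in full */
	printf("widened range: [%llu, %llu)\n",
	       (unsigned long long)start, (unsigned long long)end);
	return 0;
}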
diff --git a/queue-6.6/md-raid5-recheck-if-reshape-has-finished-with-device_lock-held.patch b/queue-6.6/md-raid5-recheck-if-reshape-has-finished-with-device_lock-held.patch
new file mode 100644 (file)
index 0000000..a5651e7
--- /dev/null
@@ -0,0 +1,134 @@
+From stable+bounces-114478-greg=kroah.com@vger.kernel.org Mon Feb 10 08:40:39 2025
+From: Yu Kuai <yukuai1@huaweicloud.com>
+Date: Mon, 10 Feb 2025 15:33:17 +0800
+Subject: md/raid5: recheck if reshape has finished with device_lock held
+To: stable@vger.kernel.org, gregkh@linuxfoundation.org, song@kernel.org, yukuai3@huawei.com
+Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
+Message-ID: <20250210073322.3315094-2-yukuai1@huaweicloud.com>
+
+From: Benjamin Marzinski <bmarzins@redhat.com>
+
+commit 25b3a8237a03ec0b67b965b52d74862e77ef7115 upstream.
+
+When handling an IO request, MD checks if a reshape is currently
+happening, and if so, where the IO sector is in relation to the reshape
+progress. MD uses conf->reshape_progress for both of these tasks.  When
+the reshape finishes, conf->reshape_progress is set to MaxSector.  If
+this occurs after MD checks if the reshape is currently happening but
+before it calls ahead_of_reshape(), then ahead_of_reshape() will end up
+comparing the IO sector against MaxSector. During a backwards reshape,
+this will make MD think the IO sector is in the area not yet reshaped,
+causing it to use the previous configuration, and map the IO to the
+sector where that data was before the reshape.
+
+This bug can be triggered by running the lvm2
+lvconvert-raid-reshape-linear_to_raid6-single-type.sh test in a loop,
+although it's very hard to reproduce.
+
+Fix this by factoring the code that checks where the IO sector sits in
+relation to the reshape out into a helper called get_reshape_loc(),
+which reads reshape_progress and reshape_safe while holding the
+device_lock, and then rechecks whether the reshape has finished before
+calling ahead_of_reshape() with the saved values.
+
+Also use the helper during the REQ_NOWAIT check to see if the location
+is inside of the reshape region.
+
+Fixes: fef9c61fdfabf ("md/raid5: change reshape-progress measurement to cope with reshaping backwards.")
+Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
+Signed-off-by: Song Liu <song@kernel.org>
+Link: https://lore.kernel.org/r/20240702151802.1632010-1-bmarzins@redhat.com
+Signed-off-by: Yu Kuai <yukuai3@huawei.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/md/raid5.c |   64 +++++++++++++++++++++++++++++++++--------------------
+ 1 file changed, 41 insertions(+), 23 deletions(-)
+
+--- a/drivers/md/raid5.c
++++ b/drivers/md/raid5.c
+@@ -5972,6 +5972,39 @@ static bool reshape_disabled(struct mdde
+       return is_md_suspended(mddev) || !md_is_rdwr(mddev);
+ }
++enum reshape_loc {
++      LOC_NO_RESHAPE,
++      LOC_AHEAD_OF_RESHAPE,
++      LOC_INSIDE_RESHAPE,
++      LOC_BEHIND_RESHAPE,
++};
++
++static enum reshape_loc get_reshape_loc(struct mddev *mddev,
++              struct r5conf *conf, sector_t logical_sector)
++{
++      sector_t reshape_progress, reshape_safe;
++      /*
++       * Spinlock is needed as reshape_progress may be
++       * 64bit on a 32bit platform, and so it might be
++       * possible to see a half-updated value
++       * Of course reshape_progress could change after
++       * the lock is dropped, so once we get a reference
++       * to the stripe that we think it is, we will have
++       * to check again.
++       */
++      spin_lock_irq(&conf->device_lock);
++      reshape_progress = conf->reshape_progress;
++      reshape_safe = conf->reshape_safe;
++      spin_unlock_irq(&conf->device_lock);
++      if (reshape_progress == MaxSector)
++              return LOC_NO_RESHAPE;
++      if (ahead_of_reshape(mddev, logical_sector, reshape_progress))
++              return LOC_AHEAD_OF_RESHAPE;
++      if (ahead_of_reshape(mddev, logical_sector, reshape_safe))
++              return LOC_INSIDE_RESHAPE;
++      return LOC_BEHIND_RESHAPE;
++}
++
+ static enum stripe_result make_stripe_request(struct mddev *mddev,
+               struct r5conf *conf, struct stripe_request_ctx *ctx,
+               sector_t logical_sector, struct bio *bi)
+@@ -5986,28 +6019,14 @@ static enum stripe_result make_stripe_re
+       seq = read_seqcount_begin(&conf->gen_lock);
+       if (unlikely(conf->reshape_progress != MaxSector)) {
+-              /*
+-               * Spinlock is needed as reshape_progress may be
+-               * 64bit on a 32bit platform, and so it might be
+-               * possible to see a half-updated value
+-               * Of course reshape_progress could change after
+-               * the lock is dropped, so once we get a reference
+-               * to the stripe that we think it is, we will have
+-               * to check again.
+-               */
+-              spin_lock_irq(&conf->device_lock);
+-              if (ahead_of_reshape(mddev, logical_sector,
+-                                   conf->reshape_progress)) {
+-                      previous = 1;
+-              } else {
+-                      if (ahead_of_reshape(mddev, logical_sector,
+-                                           conf->reshape_safe)) {
+-                              spin_unlock_irq(&conf->device_lock);
+-                              ret = STRIPE_SCHEDULE_AND_RETRY;
+-                              goto out;
+-                      }
++              enum reshape_loc loc = get_reshape_loc(mddev, conf,
++                                                     logical_sector);
++              if (loc == LOC_INSIDE_RESHAPE) {
++                      ret = STRIPE_SCHEDULE_AND_RETRY;
++                      goto out;
+               }
+-              spin_unlock_irq(&conf->device_lock);
++              if (loc == LOC_AHEAD_OF_RESHAPE)
++                      previous = 1;
+       }
+       new_sector = raid5_compute_sector(conf, logical_sector, previous,
+@@ -6189,8 +6208,7 @@ static bool raid5_make_request(struct md
+       /* Bail out if conflicts with reshape and REQ_NOWAIT is set */
+       if ((bi->bi_opf & REQ_NOWAIT) &&
+           (conf->reshape_progress != MaxSector) &&
+-          !ahead_of_reshape(mddev, logical_sector, conf->reshape_progress) &&
+-          ahead_of_reshape(mddev, logical_sector, conf->reshape_safe)) {
++          get_reshape_loc(mddev, conf, logical_sector) == LOC_INSIDE_RESHAPE) {
+               bio_wouldblock_error(bi);
+               if (rw == WRITE)
+                       md_write_end(mddev);
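The essential pattern in get_reshape_loc() above is one locked snapshot of
reshape_progress and reshape_safe, so the two values are mutually consistent
and a half-updated 64-bit read on a 32-bit machine is impossible, followed by
a recheck and classification against the snapshot. A standalone sketch of
that pattern, with illustrative names and a forward-only reshape assumed (the
real ahead_of_reshape() also handles backwards reshapes via
mddev->reshape_backwards):

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_SECTOR UINT64_MAX	/* stands in for the kernel's MaxSector */

enum loc { LOC_NO_RESHAPE, LOC_AHEAD, LOC_INSIDE, LOC_BEHIND };

struct conf {
	pthread_mutex_t lock;
	uint64_t progress;	/* MAX_SECTOR once the reshape finishes */
	uint64_t safe;
};

static enum loc get_loc(struct conf *c, uint64_t sector)
{
	uint64_t progress, safe;

	/* one snapshot under the lock: no torn reads, no skew */
	pthread_mutex_lock(&c->lock);
	progress = c->progress;
	safe = c->safe;
	pthread_mutex_unlock(&c->lock);

	/* recheck: the reshape may have finished since the caller looked */
	if (progress == MAX_SECTOR)
		return LOC_NO_RESHAPE;
	if (sector >= progress)		/* not yet reshaped */
		return LOC_AHEAD;
	if (sector >= safe)		/* being reshaped right now */
		return LOC_INSIDE;
	return LOC_BEHIND;		/* already reshaped */
}

int main(void)
{
	struct conf c = { PTHREAD_MUTEX_INITIALIZER, 1000, 900 };

	/* ahead, inside, behind */
	printf("%d %d %d\n", get_loc(&c, 1500),
	       get_loc(&c, 950), get_loc(&c, 100));
	return 0;
}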
diff --git a/queue-6.6/mm-gup-fix-infinite-loop-within-__get_longterm_locked.patch b/queue-6.6/mm-gup-fix-infinite-loop-within-__get_longterm_locked.patch
new file mode 100644 (file)
index 0000000..27e7ad1
--- /dev/null
@@ -0,0 +1,90 @@
+From 1aaf8c122918aa8897605a9aa1e8ed6600d6f930 Mon Sep 17 00:00:00 2001
+From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
+Date: Tue, 21 Jan 2025 10:01:59 +0800
+Subject: mm: gup: fix infinite loop within __get_longterm_locked
+
+From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
+
+commit 1aaf8c122918aa8897605a9aa1e8ed6600d6f930 upstream.
+
+We can run into an infinite loop in __get_longterm_locked() when
+collect_longterm_unpinnable_folios() finds only folios that are isolated
+from the LRU or were never added to the LRU.  This can happen when all
+folios to be pinned are never added to the LRU, for example when
+vm_ops->fault allocated pages using cma_alloc() and never added them to
+the LRU.
+
+Fix it by simply checking the list in the single caller, to see if
+anything was actually added.
+
+[zhaoyang.huang@unisoc.com: move definition of local]
+  Link: https://lkml.kernel.org/r/20250122012604.3654667-1-zhaoyang.huang@unisoc.com
+Link: https://lkml.kernel.org/r/20250121020159.3636477-1-zhaoyang.huang@unisoc.com
+Fixes: 67e139b02d99 ("mm/gup.c: refactor check_and_migrate_movable_pages()")
+Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
+Reviewed-by: John Hubbard <jhubbard@nvidia.com>
+Reviewed-by: David Hildenbrand <david@redhat.com>
+Suggested-by: David Hildenbrand <david@redhat.com>
+Acked-by: David Hildenbrand <david@redhat.com>
+Cc: Aijun Sun <aijun.sun@unisoc.com>
+Cc: Alistair Popple <apopple@nvidia.com>
+Cc: <stable@vger.kernel.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ mm/gup.c |   14 ++++----------
+ 1 file changed, 4 insertions(+), 10 deletions(-)
+
+--- a/mm/gup.c
++++ b/mm/gup.c
+@@ -1946,14 +1946,14 @@ struct page *get_dump_page(unsigned long
+ /*
+  * Returns the number of collected pages. Return value is always >= 0.
+  */
+-static unsigned long collect_longterm_unpinnable_pages(
++static void collect_longterm_unpinnable_pages(
+                                       struct list_head *movable_page_list,
+                                       unsigned long nr_pages,
+                                       struct page **pages)
+ {
+-      unsigned long i, collected = 0;
+       struct folio *prev_folio = NULL;
+       bool drain_allow = true;
++      unsigned long i;
+       for (i = 0; i < nr_pages; i++) {
+               struct folio *folio = page_folio(pages[i]);
+@@ -1965,8 +1965,6 @@ static unsigned long collect_longterm_un
+               if (folio_is_longterm_pinnable(folio))
+                       continue;
+-              collected++;
+-
+               if (folio_is_device_coherent(folio))
+                       continue;
+@@ -1988,8 +1986,6 @@ static unsigned long collect_longterm_un
+                                   NR_ISOLATED_ANON + folio_is_file_lru(folio),
+                                   folio_nr_pages(folio));
+       }
+-
+-      return collected;
+ }
+ /*
+@@ -2082,12 +2078,10 @@ err:
+ static long check_and_migrate_movable_pages(unsigned long nr_pages,
+                                           struct page **pages)
+ {
+-      unsigned long collected;
+       LIST_HEAD(movable_page_list);
+-      collected = collect_longterm_unpinnable_pages(&movable_page_list,
+-                                              nr_pages, pages);
+-      if (!collected)
++      collect_longterm_unpinnable_pages(&movable_page_list, nr_pages, pages);
++      if (list_empty(&movable_page_list))
+               return 0;
+       return migrate_longterm_unpinnable_pages(&movable_page_list, nr_pages,
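The loop arose because collect_longterm_unpinnable_pages() counted every
folio it decided ought to move, yet isolation can find the folio already
isolated (or never on the LRU), in which case nothing joins the list; a
nonzero count with an empty list made the caller retry forever. A toy model
of why checking the list itself is the robust fix (all names illustrative):

#include <stdbool.h>
#include <stdio.h>

struct item {
	bool unpinnable;	/* should be migrated before long-term pin */
	bool on_lru;		/* can actually be isolated and queued */
};

/* new scheme: no return value; the caller inspects the queue */
static void collect(int *queued, const struct item *items, int n)
{
	for (int i = 0; i < n; i++) {
		if (!items[i].unpinnable)
			continue;
		/* the old code counted here, before isolation... */
		if (!items[i].on_lru)
			continue;	/* ...but this skip queues nothing */
		(*queued)++;
	}
}

int main(void)
{
	/* unpinnable but never on the LRU: the old counter said 1,
	 * the queue stayed empty, and the caller spun forever */
	struct item it = { .unpinnable = true, .on_lru = false };
	int queued = 0;

	collect(&queued, &it, 1);
	if (queued == 0)	/* the fix: trust the queue, not a counter */
		printf("nothing to migrate, done\n");
	return 0;
}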
diff --git a/queue-6.6/netdevsim-print-human-readable-ip-address.patch b/queue-6.6/netdevsim-print-human-readable-ip-address.patch
new file mode 100644 (file)
index 0000000..77b5a33
--- /dev/null
@@ -0,0 +1,70 @@
+From c71bc6da6198a6d88df86094f1052bb581951d65 Mon Sep 17 00:00:00 2001
+From: Hangbin Liu <liuhangbin@gmail.com>
+Date: Thu, 10 Oct 2024 04:00:25 +0000
+Subject: netdevsim: print human readable IP address
+
+From: Hangbin Liu <liuhangbin@gmail.com>
+
+commit c71bc6da6198a6d88df86094f1052bb581951d65 upstream.
+
+Currently, IPsec addresses are printed in hexadecimal format, which is
+not user-friendly, e.g.:
+
+  # cat /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
+  SA count=2 tx=20
+  sa[0] rx ipaddr=0x00000000 00000000 00000000 0100a8c0
+  sa[0]    spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
+  sa[0]    key=0x3167608a ca4f1397 43565909 941fa627
+  sa[1] tx ipaddr=0x00000000 00000000 00000000 00000000
+  sa[1]    spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
+  sa[1]    key=0x3167608a ca4f1397 43565909 941fa627
+
+This patch updates the code to print the IPsec address in a human-readable
+format for easier debugging, e.g.:
+
+ # cat /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
+ SA count=4 tx=40
+ sa[0] tx ipaddr=0.0.0.0
+ sa[0]    spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
+ sa[0]    key=0x3167608a ca4f1397 43565909 941fa627
+ sa[1] rx ipaddr=192.168.0.1
+ sa[1]    spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
+ sa[1]    key=0x3167608a ca4f1397 43565909 941fa627
+ sa[2] tx ipaddr=::
+ sa[2]    spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
+ sa[2]    key=0x3167608a ca4f1397 43565909 941fa627
+ sa[3] rx ipaddr=2000::1
+ sa[3]    spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
+ sa[3]    key=0x3167608a ca4f1397 43565909 941fa627
+
+Reviewed-by: Simon Horman <horms@kernel.org>
+Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
+Link: https://patch.msgid.link/20241010040027.21440-2-liuhangbin@gmail.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/net/netdevsim/ipsec.c |   12 ++++++++----
+ 1 file changed, 8 insertions(+), 4 deletions(-)
+
+--- a/drivers/net/netdevsim/ipsec.c
++++ b/drivers/net/netdevsim/ipsec.c
+@@ -39,10 +39,14 @@ static ssize_t nsim_dbg_netdev_ops_read(
+               if (!sap->used)
+                       continue;
+-              p += scnprintf(p, bufsize - (p - buf),
+-                             "sa[%i] %cx ipaddr=0x%08x %08x %08x %08x\n",
+-                             i, (sap->rx ? 'r' : 't'), sap->ipaddr[0],
+-                             sap->ipaddr[1], sap->ipaddr[2], sap->ipaddr[3]);
++              if (sap->xs->props.family == AF_INET6)
++                      p += scnprintf(p, bufsize - (p - buf),
++                                     "sa[%i] %cx ipaddr=%pI6c\n",
++                                     i, (sap->rx ? 'r' : 't'), &sap->ipaddr);
++              else
++                      p += scnprintf(p, bufsize - (p - buf),
++                                     "sa[%i] %cx ipaddr=%pI4\n",
++                                     i, (sap->rx ? 'r' : 't'), &sap->ipaddr[3]);
+               p += scnprintf(p, bufsize - (p - buf),
+                              "sa[%i]    spi=0x%08x proto=0x%x salt=0x%08x crypt=%d\n",
+                              i, be32_to_cpu(sap->xs->id.spi),
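The %pI4 and %pI6c specifiers used above are the kernel's printk formatters
for binary IP addresses; the IPv4 branch passes &sap->ipaddr[3] because the
address sits in the last 32-bit word of the four-word buffer (hence the
0x0100a8c0 in the old hex dump). A userspace equivalent using inet_ntop(),
fed the sample addresses from the commit message:

#include <arpa/inet.h>
#include <stdio.h>

int main(void)
{
	/* bytes c0.a8.00.01, shown as 0x0100a8c0 by the old %08x dump */
	unsigned char v4[4] = { 192, 168, 0, 1 };
	/* 2000::1 */
	unsigned char v6[16] = { 0x20, 0x00, [15] = 0x01 };
	char buf[INET6_ADDRSTRLEN];

	printf("v4: %s\n", inet_ntop(AF_INET, v4, buf, sizeof(buf)));
	printf("v6: %s\n", inet_ntop(AF_INET6, v6, buf, sizeof(buf)));
	return 0;
}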
diff --git a/queue-6.6/selftests-rtnetlink-update-netdevsim-ipsec-output-format.patch b/queue-6.6/selftests-rtnetlink-update-netdevsim-ipsec-output-format.patch
new file mode 100644 (file)
index 0000000..e521ef2
--- /dev/null
@@ -0,0 +1,40 @@
+From 3ec920bb978ccdc68a7dfb304d303d598d038cb1 Mon Sep 17 00:00:00 2001
+From: Hangbin Liu <liuhangbin@gmail.com>
+Date: Thu, 10 Oct 2024 04:00:27 +0000
+Subject: selftests: rtnetlink: update netdevsim ipsec output format
+
+From: Hangbin Liu <liuhangbin@gmail.com>
+
+commit 3ec920bb978ccdc68a7dfb304d303d598d038cb1 upstream.
+
+After the netdevsim update to use human-readable IP address formats for
+IPsec, we can now use the source and destination IPs directly in testing.
+Here is the result:
+  # ./rtnetlink.sh -t kci_test_ipsec_offload
+  PASS: ipsec_offload
+
+Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
+Acked-by: Stanislav Fomichev <sdf@fomichev.me>
+Link: https://patch.msgid.link/20241010040027.21440-4-liuhangbin@gmail.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ tools/testing/selftests/net/rtnetlink.sh |    4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/tools/testing/selftests/net/rtnetlink.sh
++++ b/tools/testing/selftests/net/rtnetlink.sh
+@@ -921,10 +921,10 @@ kci_test_ipsec_offload()
+       # does driver have correct offload info
+       diff $sysfsf - << EOF
+ SA count=2 tx=3
+-sa[0] tx ipaddr=0x00000000 00000000 00000000 00000000
++sa[0] tx ipaddr=$dstip
+ sa[0]    spi=0x00000009 proto=0x32 salt=0x61626364 crypt=1
+ sa[0]    key=0x34333231 38373635 32313039 36353433
+-sa[1] rx ipaddr=0x00000000 00000000 00000000 037ba8c0
++sa[1] rx ipaddr=$srcip
+ sa[1]    spi=0x00000009 proto=0x32 salt=0x61626364 crypt=1
+ sa[1]    key=0x34333231 38373635 32313039 36353433
+ EOF
diff --git a/queue-6.6/series b/queue-6.6/series
index dced9d11613bbb8e28539448a4e8f70243ddfad7..a7562fba0c4e2e1dfe91dbd34cb16de62a1d3e53 100644 (file)
@@ -133,3 +133,13 @@ drm-v3d-stop-active-perfmon-if-it-is-being-destroyed.patch
 x86-static-call-remove-early_boot_irqs_disabled-check-to-fix-xen-pvh-dom0.patch
 drm-amd-display-add-null-check-for-head_pipe-in-dcn201_acquire_free_pipe_for_layer.patch
 drm-amd-display-pass-non-null-to-dcn20_validate_apply_pipe_split_flags.patch
+netdevsim-print-human-readable-ip-address.patch
+selftests-rtnetlink-update-netdevsim-ipsec-output-format.patch
+md-raid5-recheck-if-reshape-has-finished-with-device_lock-held.patch
+md-md-bitmap-factor-behind-write-counters-out-from-bitmap_-start-end-write.patch
+md-md-bitmap-remove-the-last-parameter-for-bimtap_ops-endwrite.patch
+md-add-a-new-callback-pers-bitmap_sector.patch
+md-raid5-implement-pers-bitmap_sector.patch
+md-md-bitmap-move-bitmap_-start-end-write-to-md-upper-layer.patch
+arm64-filter-out-sve-hwcaps-when-feat_sve-isn-t-implemented.patch
+mm-gup-fix-infinite-loop-within-__get_longterm_locked.patch