From: Greg Kroah-Hartman Date: Tue, 18 Feb 2025 15:09:36 +0000 (+0100) Subject: 6.6-stable patches X-Git-Tag: v6.1.129~33 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=6b2504286f14d8ebf6bcf646032e7bd428fabf6e;p=thirdparty%2Fkernel%2Fstable-queue.git 6.6-stable patches added patches: arm64-filter-out-sve-hwcaps-when-feat_sve-isn-t-implemented.patch md-add-a-new-callback-pers-bitmap_sector.patch md-md-bitmap-factor-behind-write-counters-out-from-bitmap_-start-end-write.patch md-md-bitmap-move-bitmap_-start-end-write-to-md-upper-layer.patch md-md-bitmap-remove-the-last-parameter-for-bimtap_ops-endwrite.patch md-raid5-implement-pers-bitmap_sector.patch md-raid5-recheck-if-reshape-has-finished-with-device_lock-held.patch mm-gup-fix-infinite-loop-within-__get_longterm_locked.patch netdevsim-print-human-readable-ip-address.patch selftests-rtnetlink-update-netdevsim-ipsec-output-format.patch --- diff --git a/queue-6.6/arm64-filter-out-sve-hwcaps-when-feat_sve-isn-t-implemented.patch b/queue-6.6/arm64-filter-out-sve-hwcaps-when-feat_sve-isn-t-implemented.patch new file mode 100644 index 0000000000..de128337c9 --- /dev/null +++ b/queue-6.6/arm64-filter-out-sve-hwcaps-when-feat_sve-isn-t-implemented.patch @@ -0,0 +1,191 @@ +From 064737920bdbca86df91b96aed256e88018fef3a Mon Sep 17 00:00:00 2001 +From: Marc Zyngier +Date: Tue, 7 Jan 2025 22:59:41 +0000 +Subject: arm64: Filter out SVE hwcaps when FEAT_SVE isn't implemented + +From: Marc Zyngier + +commit 064737920bdbca86df91b96aed256e88018fef3a upstream. + +The hwcaps code that exposes SVE features to userspace only +considers ID_AA64ZFR0_EL1, while this is only valid when +ID_AA64PFR0_EL1.SVE advertises that SVE is actually supported. + +The expectations are that when ID_AA64PFR0_EL1.SVE is 0, the +ID_AA64ZFR0_EL1 register is also 0. So far, so good. + +Things become a bit more interesting if the HW implements SME. +In this case, a few ID_AA64ZFR0_EL1 fields indicate *SME* +features. And these fields overlap with their SVE interpretations. +But the architecture says that the SME and SVE feature sets must +match, so we're still hunky-dory. + +This goes wrong if the HW implements SME, but not SVE. In this +case, we end-up advertising some SVE features to userspace, even +if the HW has none. That's because we never consider whether SVE +is actually implemented. Oh well. + +Fix it by restricting all SVE capabilities to ID_AA64PFR0_EL1.SVE +being non-zero. The HWCAPS documentation is amended to reflect the +actually checks performed by the kernel. + +Fixes: 06a916feca2b ("arm64: Expose SVE2 features for userspace") +Reported-by: Catalin Marinas +Signed-off-by: Marc Zyngier +Signed-off-by: Mark Brown +Cc: Will Deacon +Cc: Mark Rutland +Cc: stable@vger.kernel.org +Reviewed-by: Mark Brown +Link: https://lore.kernel.org/r/20250107-arm64-2024-dpisa-v5-1-7578da51fc3d@kernel.org +Signed-off-by: Will Deacon +Signed-off-by: Marc Zyngier +Signed-off-by: Greg Kroah-Hartman +--- + Documentation/arch/arm64/elf_hwcaps.rst | 36 ++++++++++++++++++++---------- + arch/arm64/kernel/cpufeature.c | 38 +++++++++++++++++++++----------- + 2 files changed, 50 insertions(+), 24 deletions(-) + +--- a/Documentation/arch/arm64/elf_hwcaps.rst ++++ b/Documentation/arch/arm64/elf_hwcaps.rst +@@ -174,22 +174,28 @@ HWCAP2_DCPODP + Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0010. + + HWCAP2_SVE2 +- Functionality implied by ID_AA64ZFR0_EL1.SVEVer == 0b0001. ++ Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and ++ ID_AA64ZFR0_EL1.SVEver == 0b0001. 
+ + HWCAP2_SVEAES +- Functionality implied by ID_AA64ZFR0_EL1.AES == 0b0001. ++ Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and ++ ID_AA64ZFR0_EL1.AES == 0b0001. + + HWCAP2_SVEPMULL +- Functionality implied by ID_AA64ZFR0_EL1.AES == 0b0010. ++ Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and ++ ID_AA64ZFR0_EL1.AES == 0b0010. + + HWCAP2_SVEBITPERM +- Functionality implied by ID_AA64ZFR0_EL1.BitPerm == 0b0001. ++ Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and ++ ID_AA64ZFR0_EL1.BitPerm == 0b0001. + + HWCAP2_SVESHA3 +- Functionality implied by ID_AA64ZFR0_EL1.SHA3 == 0b0001. ++ Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and ++ ID_AA64ZFR0_EL1.SHA3 == 0b0001. + + HWCAP2_SVESM4 +- Functionality implied by ID_AA64ZFR0_EL1.SM4 == 0b0001. ++ Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and ++ ID_AA64ZFR0_EL1.SM4 == 0b0001. + + HWCAP2_FLAGM2 + Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0010. +@@ -198,16 +204,20 @@ HWCAP2_FRINT + Functionality implied by ID_AA64ISAR1_EL1.FRINTTS == 0b0001. + + HWCAP2_SVEI8MM +- Functionality implied by ID_AA64ZFR0_EL1.I8MM == 0b0001. ++ Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and ++ ID_AA64ZFR0_EL1.I8MM == 0b0001. + + HWCAP2_SVEF32MM +- Functionality implied by ID_AA64ZFR0_EL1.F32MM == 0b0001. ++ Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and ++ ID_AA64ZFR0_EL1.F32MM == 0b0001. + + HWCAP2_SVEF64MM +- Functionality implied by ID_AA64ZFR0_EL1.F64MM == 0b0001. ++ Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and ++ ID_AA64ZFR0_EL1.F64MM == 0b0001. + + HWCAP2_SVEBF16 +- Functionality implied by ID_AA64ZFR0_EL1.BF16 == 0b0001. ++ Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and ++ ID_AA64ZFR0_EL1.BF16 == 0b0001. + + HWCAP2_I8MM + Functionality implied by ID_AA64ISAR1_EL1.I8MM == 0b0001. +@@ -273,7 +283,8 @@ HWCAP2_EBF16 + Functionality implied by ID_AA64ISAR1_EL1.BF16 == 0b0010. + + HWCAP2_SVE_EBF16 +- Functionality implied by ID_AA64ZFR0_EL1.BF16 == 0b0010. ++ Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and ++ ID_AA64ZFR0_EL1.BF16 == 0b0010. + + HWCAP2_CSSC + Functionality implied by ID_AA64ISAR2_EL1.CSSC == 0b0001. +@@ -282,7 +293,8 @@ HWCAP2_RPRFM + Functionality implied by ID_AA64ISAR2_EL1.RPRFM == 0b0001. + + HWCAP2_SVE2P1 +- Functionality implied by ID_AA64ZFR0_EL1.SVEver == 0b0010. ++ Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and ++ ID_AA64ZFR0_EL1.SVEver == 0b0010. + + HWCAP2_SME2 + Functionality implied by ID_AA64SMFR0_EL1.SMEver == 0b0001. 
+--- a/arch/arm64/kernel/cpufeature.c ++++ b/arch/arm64/kernel/cpufeature.c +@@ -2762,6 +2762,13 @@ static const struct arm64_cpu_capabiliti + .matches = match, \ + } + ++#define HWCAP_CAP_MATCH_ID(match, reg, field, min_value, cap_type, cap) \ ++ { \ ++ __HWCAP_CAP(#cap, cap_type, cap) \ ++ HWCAP_CPUID_MATCH(reg, field, min_value) \ ++ .matches = match, \ ++ } ++ + #ifdef CONFIG_ARM64_PTR_AUTH + static const struct arm64_cpu_capabilities ptr_auth_hwcap_addr_matches[] = { + { +@@ -2790,6 +2797,13 @@ static const struct arm64_cpu_capabiliti + }; + #endif + ++#ifdef CONFIG_ARM64_SVE ++static bool has_sve_feature(const struct arm64_cpu_capabilities *cap, int scope) ++{ ++ return system_supports_sve() && has_user_cpuid_feature(cap, scope); ++} ++#endif ++ + static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = { + HWCAP_CAP(ID_AA64ISAR0_EL1, AES, PMULL, CAP_HWCAP, KERNEL_HWCAP_PMULL), + HWCAP_CAP(ID_AA64ISAR0_EL1, AES, AES, CAP_HWCAP, KERNEL_HWCAP_AES), +@@ -2827,18 +2841,18 @@ static const struct arm64_cpu_capabiliti + HWCAP_CAP(ID_AA64MMFR2_EL1, AT, IMP, CAP_HWCAP, KERNEL_HWCAP_USCAT), + #ifdef CONFIG_ARM64_SVE + HWCAP_CAP(ID_AA64PFR0_EL1, SVE, IMP, CAP_HWCAP, KERNEL_HWCAP_SVE), +- HWCAP_CAP(ID_AA64ZFR0_EL1, SVEver, SVE2p1, CAP_HWCAP, KERNEL_HWCAP_SVE2P1), +- HWCAP_CAP(ID_AA64ZFR0_EL1, SVEver, SVE2, CAP_HWCAP, KERNEL_HWCAP_SVE2), +- HWCAP_CAP(ID_AA64ZFR0_EL1, AES, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEAES), +- HWCAP_CAP(ID_AA64ZFR0_EL1, AES, PMULL128, CAP_HWCAP, KERNEL_HWCAP_SVEPMULL), +- HWCAP_CAP(ID_AA64ZFR0_EL1, BitPerm, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEBITPERM), +- HWCAP_CAP(ID_AA64ZFR0_EL1, BF16, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEBF16), +- HWCAP_CAP(ID_AA64ZFR0_EL1, BF16, EBF16, CAP_HWCAP, KERNEL_HWCAP_SVE_EBF16), +- HWCAP_CAP(ID_AA64ZFR0_EL1, SHA3, IMP, CAP_HWCAP, KERNEL_HWCAP_SVESHA3), +- HWCAP_CAP(ID_AA64ZFR0_EL1, SM4, IMP, CAP_HWCAP, KERNEL_HWCAP_SVESM4), +- HWCAP_CAP(ID_AA64ZFR0_EL1, I8MM, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEI8MM), +- HWCAP_CAP(ID_AA64ZFR0_EL1, F32MM, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEF32MM), +- HWCAP_CAP(ID_AA64ZFR0_EL1, F64MM, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEF64MM), ++ HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, SVEver, SVE2p1, CAP_HWCAP, KERNEL_HWCAP_SVE2P1), ++ HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, SVEver, SVE2, CAP_HWCAP, KERNEL_HWCAP_SVE2), ++ HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, AES, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEAES), ++ HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, AES, PMULL128, CAP_HWCAP, KERNEL_HWCAP_SVEPMULL), ++ HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, BitPerm, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEBITPERM), ++ HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, BF16, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEBF16), ++ HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, BF16, EBF16, CAP_HWCAP, KERNEL_HWCAP_SVE_EBF16), ++ HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, SHA3, IMP, CAP_HWCAP, KERNEL_HWCAP_SVESHA3), ++ HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, SM4, IMP, CAP_HWCAP, KERNEL_HWCAP_SVESM4), ++ HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, I8MM, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEI8MM), ++ HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, F32MM, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEF32MM), ++ HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, F64MM, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEF64MM), + #endif + HWCAP_CAP(ID_AA64PFR1_EL1, SSBS, SSBS2, CAP_HWCAP, KERNEL_HWCAP_SSBS), + #ifdef CONFIG_ARM64_BTI diff --git a/queue-6.6/md-add-a-new-callback-pers-bitmap_sector.patch 
b/queue-6.6/md-add-a-new-callback-pers-bitmap_sector.patch new file mode 100644 index 0000000000..ec768b442e --- /dev/null +++ b/queue-6.6/md-add-a-new-callback-pers-bitmap_sector.patch @@ -0,0 +1,36 @@ +From stable+bounces-114480-greg=kroah.com@vger.kernel.org Mon Feb 10 08:40:48 2025 +From: Yu Kuai +Date: Mon, 10 Feb 2025 15:33:20 +0800 +Subject: md: add a new callback pers->bitmap_sector() +To: stable@vger.kernel.org, gregkh@linuxfoundation.org, song@kernel.org, yukuai3@huawei.com +Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com +Message-ID: <20250210073322.3315094-5-yukuai1@huaweicloud.com> + +From: Yu Kuai + +commit 0c984a283a3ea3f10bebecd6c57c1d41b2e4f518 upstream. + +This callback will be used in raid5 to convert io ranges from array to +bitmap. + +Signed-off-by: Yu Kuai +Reviewed-by: Xiao Ni +Link: https://lore.kernel.org/r/20250109015145.158868-4-yukuai1@huaweicloud.com +Signed-off-by: Song Liu +Signed-off-by: Greg Kroah-Hartman +--- + drivers/md/md.h | 3 +++ + 1 file changed, 3 insertions(+) + +--- a/drivers/md/md.h ++++ b/drivers/md/md.h +@@ -661,6 +661,9 @@ struct md_personality + void *(*takeover) (struct mddev *mddev); + /* Changes the consistency policy of an active array. */ + int (*change_consistency_policy)(struct mddev *mddev, const char *buf); ++ /* convert io ranges from array to bitmap */ ++ void (*bitmap_sector)(struct mddev *mddev, sector_t *offset, ++ unsigned long *sectors); + }; + + struct md_sysfs_entry { diff --git a/queue-6.6/md-md-bitmap-factor-behind-write-counters-out-from-bitmap_-start-end-write.patch b/queue-6.6/md-md-bitmap-factor-behind-write-counters-out-from-bitmap_-start-end-write.patch new file mode 100644 index 0000000000..7a165bbc38 --- /dev/null +++ b/queue-6.6/md-md-bitmap-factor-behind-write-counters-out-from-bitmap_-start-end-write.patch @@ -0,0 +1,261 @@ +From stable+bounces-114483-greg=kroah.com@vger.kernel.org Mon Feb 10 08:41:21 2025 +From: Yu Kuai +Date: Mon, 10 Feb 2025 15:33:18 +0800 +Subject: md/md-bitmap: factor behind write counters out from bitmap_{start/end}write() +To: stable@vger.kernel.org, gregkh@linuxfoundation.org, song@kernel.org, yukuai3@huawei.com +Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com +Message-ID: <20250210073322.3315094-3-yukuai1@huaweicloud.com> + +From: Yu Kuai + +commit 08c50142a128dcb2d7060aa3b4c5db8837f7a46a upstream. + +behind_write is only used in raid1, prepare to refactor +bitmap_{start/end}write(), there are no functional changes. 
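+
+For example, a condensed sketch of the caller-side change in raid1 (taken
+from the raid1.c hunk below):
+
+  /* before: behind-write accounting buried in the bitmap call */
+  md_bitmap_startwrite(bitmap, r1_bio->sector, r1_bio->sectors,
+                       test_bit(R1BIO_BehindIO, &r1_bio->state));
+
+  /* after: counters factored out, and the 'behind' parameter is gone */
+  if (test_bit(R1BIO_BehindIO, &r1_bio->state))
+          md_bitmap_start_behind_write(mddev);
+  md_bitmap_startwrite(bitmap, r1_bio->sector, r1_bio->sectors);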
+ +Signed-off-by: Yu Kuai +Reviewed-by: Xiao Ni +Link: https://lore.kernel.org/r/20250109015145.158868-2-yukuai1@huaweicloud.com +Signed-off-by: Song Liu +[There is no bitmap_operations, resolve conflicts by exporting new api +md_bitmap_{start,end}_behind_write] +Signed-off-by: Yu Kuai +Signed-off-by: Greg Kroah-Hartman +--- + drivers/md/md-bitmap.c | 60 +++++++++++++++++++++++++++++------------------ + drivers/md/md-bitmap.h | 6 +++- + drivers/md/raid1.c | 11 +++++--- + drivers/md/raid10.c | 5 +-- + drivers/md/raid5-cache.c | 4 +-- + drivers/md/raid5.c | 13 ++++------ + 6 files changed, 59 insertions(+), 40 deletions(-) + +--- a/drivers/md/md-bitmap.c ++++ b/drivers/md/md-bitmap.c +@@ -1465,22 +1465,12 @@ __acquires(bitmap->lock) + &(bitmap->bp[page].map[pageoff]); + } + +-int md_bitmap_startwrite(struct bitmap *bitmap, sector_t offset, unsigned long sectors, int behind) ++int md_bitmap_startwrite(struct bitmap *bitmap, sector_t offset, ++ unsigned long sectors) + { + if (!bitmap) + return 0; + +- if (behind) { +- int bw; +- atomic_inc(&bitmap->behind_writes); +- bw = atomic_read(&bitmap->behind_writes); +- if (bw > bitmap->behind_writes_used) +- bitmap->behind_writes_used = bw; +- +- pr_debug("inc write-behind count %d/%lu\n", +- bw, bitmap->mddev->bitmap_info.max_write_behind); +- } +- + while (sectors) { + sector_t blocks; + bitmap_counter_t *bmc; +@@ -1527,20 +1517,13 @@ int md_bitmap_startwrite(struct bitmap * + } + return 0; + } +-EXPORT_SYMBOL(md_bitmap_startwrite); ++EXPORT_SYMBOL_GPL(md_bitmap_startwrite); + + void md_bitmap_endwrite(struct bitmap *bitmap, sector_t offset, +- unsigned long sectors, int success, int behind) ++ unsigned long sectors, int success) + { + if (!bitmap) + return; +- if (behind) { +- if (atomic_dec_and_test(&bitmap->behind_writes)) +- wake_up(&bitmap->behind_wait); +- pr_debug("dec write-behind count %d/%lu\n", +- atomic_read(&bitmap->behind_writes), +- bitmap->mddev->bitmap_info.max_write_behind); +- } + + while (sectors) { + sector_t blocks; +@@ -1580,7 +1563,7 @@ void md_bitmap_endwrite(struct bitmap *b + sectors = 0; + } + } +-EXPORT_SYMBOL(md_bitmap_endwrite); ++EXPORT_SYMBOL_GPL(md_bitmap_endwrite); + + static int __bitmap_start_sync(struct bitmap *bitmap, sector_t offset, sector_t *blocks, + int degraded) +@@ -1842,6 +1825,39 @@ void md_bitmap_free(struct bitmap *bitma + } + EXPORT_SYMBOL(md_bitmap_free); + ++void md_bitmap_start_behind_write(struct mddev *mddev) ++{ ++ struct bitmap *bitmap = mddev->bitmap; ++ int bw; ++ ++ if (!bitmap) ++ return; ++ ++ atomic_inc(&bitmap->behind_writes); ++ bw = atomic_read(&bitmap->behind_writes); ++ if (bw > bitmap->behind_writes_used) ++ bitmap->behind_writes_used = bw; ++ ++ pr_debug("inc write-behind count %d/%lu\n", ++ bw, bitmap->mddev->bitmap_info.max_write_behind); ++} ++EXPORT_SYMBOL_GPL(md_bitmap_start_behind_write); ++ ++void md_bitmap_end_behind_write(struct mddev *mddev) ++{ ++ struct bitmap *bitmap = mddev->bitmap; ++ ++ if (!bitmap) ++ return; ++ ++ if (atomic_dec_and_test(&bitmap->behind_writes)) ++ wake_up(&bitmap->behind_wait); ++ pr_debug("dec write-behind count %d/%lu\n", ++ atomic_read(&bitmap->behind_writes), ++ bitmap->mddev->bitmap_info.max_write_behind); ++} ++EXPORT_SYMBOL_GPL(md_bitmap_end_behind_write); ++ + void md_bitmap_wait_behind_writes(struct mddev *mddev) + { + struct bitmap *bitmap = mddev->bitmap; +--- a/drivers/md/md-bitmap.h ++++ b/drivers/md/md-bitmap.h +@@ -253,9 +253,11 @@ void md_bitmap_dirty_bits(struct bitmap + + /* these are exported */ + int 
md_bitmap_startwrite(struct bitmap *bitmap, sector_t offset, +- unsigned long sectors, int behind); ++ unsigned long sectors); + void md_bitmap_endwrite(struct bitmap *bitmap, sector_t offset, +- unsigned long sectors, int success, int behind); ++ unsigned long sectors, int success); ++void md_bitmap_start_behind_write(struct mddev *mddev); ++void md_bitmap_end_behind_write(struct mddev *mddev); + int md_bitmap_start_sync(struct bitmap *bitmap, sector_t offset, sector_t *blocks, int degraded); + void md_bitmap_end_sync(struct bitmap *bitmap, sector_t offset, sector_t *blocks, int aborted); + void md_bitmap_close_sync(struct bitmap *bitmap); +--- a/drivers/md/raid1.c ++++ b/drivers/md/raid1.c +@@ -419,11 +419,12 @@ static void close_write(struct r1bio *r1 + bio_put(r1_bio->behind_master_bio); + r1_bio->behind_master_bio = NULL; + } ++ if (test_bit(R1BIO_BehindIO, &r1_bio->state)) ++ md_bitmap_end_behind_write(r1_bio->mddev); + /* clear the bitmap if all writes complete successfully */ + md_bitmap_endwrite(r1_bio->mddev->bitmap, r1_bio->sector, + r1_bio->sectors, +- !test_bit(R1BIO_Degraded, &r1_bio->state), +- test_bit(R1BIO_BehindIO, &r1_bio->state)); ++ !test_bit(R1BIO_Degraded, &r1_bio->state)); + md_write_end(r1_bio->mddev); + } + +@@ -1530,8 +1531,10 @@ static void raid1_write_request(struct m + alloc_behind_master_bio(r1_bio, bio); + } + +- md_bitmap_startwrite(bitmap, r1_bio->sector, r1_bio->sectors, +- test_bit(R1BIO_BehindIO, &r1_bio->state)); ++ if (test_bit(R1BIO_BehindIO, &r1_bio->state)) ++ md_bitmap_start_behind_write(mddev); ++ md_bitmap_startwrite(bitmap, r1_bio->sector, ++ r1_bio->sectors); + first_clone = 0; + } + +--- a/drivers/md/raid10.c ++++ b/drivers/md/raid10.c +@@ -430,8 +430,7 @@ static void close_write(struct r10bio *r + /* clear the bitmap if all writes complete successfully */ + md_bitmap_endwrite(r10_bio->mddev->bitmap, r10_bio->sector, + r10_bio->sectors, +- !test_bit(R10BIO_Degraded, &r10_bio->state), +- 0); ++ !test_bit(R10BIO_Degraded, &r10_bio->state)); + md_write_end(r10_bio->mddev); + } + +@@ -1554,7 +1553,7 @@ static void raid10_write_request(struct + md_account_bio(mddev, &bio); + r10_bio->master_bio = bio; + atomic_set(&r10_bio->remaining, 1); +- md_bitmap_startwrite(mddev->bitmap, r10_bio->sector, r10_bio->sectors, 0); ++ md_bitmap_startwrite(mddev->bitmap, r10_bio->sector, r10_bio->sectors); + + for (i = 0; i < conf->copies; i++) { + if (r10_bio->devs[i].bio) +--- a/drivers/md/raid5-cache.c ++++ b/drivers/md/raid5-cache.c +@@ -315,8 +315,8 @@ void r5c_handle_cached_data_endio(struct + r5c_return_dev_pending_writes(conf, &sh->dev[i]); + md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, + RAID5_STRIPE_SECTORS(conf), +- !test_bit(STRIPE_DEGRADED, &sh->state), +- 0); ++ !test_bit(STRIPE_DEGRADED, ++ &sh->state)); + } + } + } +--- a/drivers/md/raid5.c ++++ b/drivers/md/raid5.c +@@ -3606,7 +3606,7 @@ static void __add_stripe_bio(struct stri + set_bit(STRIPE_BITMAP_PENDING, &sh->state); + spin_unlock_irq(&sh->stripe_lock); + md_bitmap_startwrite(conf->mddev->bitmap, sh->sector, +- RAID5_STRIPE_SECTORS(conf), 0); ++ RAID5_STRIPE_SECTORS(conf)); + spin_lock_irq(&sh->stripe_lock); + clear_bit(STRIPE_BITMAP_PENDING, &sh->state); + if (!sh->batch_head) { +@@ -3708,7 +3708,7 @@ handle_failed_stripe(struct r5conf *conf + } + if (bitmap_end) + md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, +- RAID5_STRIPE_SECTORS(conf), 0, 0); ++ RAID5_STRIPE_SECTORS(conf), 0); + bitmap_end = 0; + /* and fail all 'written' */ + bi = sh->dev[i].written; +@@ -3754,7 +3754,7 
@@ handle_failed_stripe(struct r5conf *conf + } + if (bitmap_end) + md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, +- RAID5_STRIPE_SECTORS(conf), 0, 0); ++ RAID5_STRIPE_SECTORS(conf), 0); + /* If we were in the middle of a write the parity block might + * still be locked - so just clear all R5_LOCKED flags + */ +@@ -4107,8 +4107,8 @@ returnbi: + } + md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, + RAID5_STRIPE_SECTORS(conf), +- !test_bit(STRIPE_DEGRADED, &sh->state), +- 0); ++ !test_bit(STRIPE_DEGRADED, ++ &sh->state)); + if (head_sh->batch_head) { + sh = list_first_entry(&sh->batch_list, + struct stripe_head, +@@ -5853,8 +5853,7 @@ static void make_discard_request(struct + d++) + md_bitmap_startwrite(mddev->bitmap, + sh->sector, +- RAID5_STRIPE_SECTORS(conf), +- 0); ++ RAID5_STRIPE_SECTORS(conf)); + sh->bm_seq = conf->seq_flush + 1; + set_bit(STRIPE_BIT_DELAY, &sh->state); + } diff --git a/queue-6.6/md-md-bitmap-move-bitmap_-start-end-write-to-md-upper-layer.patch b/queue-6.6/md-md-bitmap-move-bitmap_-start-end-write-to-md-upper-layer.patch new file mode 100644 index 0000000000..ab9c6fe444 --- /dev/null +++ b/queue-6.6/md-md-bitmap-move-bitmap_-start-end-write-to-md-upper-layer.patch @@ -0,0 +1,342 @@ +From stable+bounces-114484-greg=kroah.com@vger.kernel.org Mon Feb 10 08:41:27 2025 +From: Yu Kuai +Date: Mon, 10 Feb 2025 15:33:22 +0800 +Subject: md/md-bitmap: move bitmap_{start, end}write to md upper layer +To: stable@vger.kernel.org, gregkh@linuxfoundation.org, song@kernel.org, yukuai3@huawei.com +Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com +Message-ID: <20250210073322.3315094-7-yukuai1@huaweicloud.com> + +From: Yu Kuai + +commit cd5fc653381811f1e0ba65f5d169918cab61476f upstream. + +There are two BUG reports that raid5 will hang at +bitmap_startwrite([1],[2]), root cause is that bitmap start write and end +write is unbalanced, it's not quite clear where, and while reviewing raid5 +code, it's found that bitmap operations can be optimized. For example, +for a 4 disks raid5, with chunksize=8k, if user issue a IO (0 + 48k) to +the array: + +┌────────────────────────────────────────────────────────────┐ +│chunk 0 │ +│ ┌────────────┬─────────────┬─────────────┬────────────┼ +│ sh0 │A0: 0 + 4k │A1: 8k + 4k │A2: 16k + 4k │A3: P │ +│ ┼────────────┼─────────────┼─────────────┼────────────┼ +│ sh1 │B0: 4k + 4k │B1: 12k + 4k │B2: 20k + 4k │B3: P │ +┼──────┴────────────┴─────────────┴─────────────┴────────────┼ +│chunk 1 │ +│ ┌────────────┬─────────────┬─────────────┬────────────┤ +│ sh2 │C0: 24k + 4k│C1: 32k + 4k │C2: P │C3: 40k + 4k│ +│ ┼────────────┼─────────────┼─────────────┼────────────┼ +│ sh3 │D0: 28k + 4k│D1: 36k + 4k │D2: P │D3: 44k + 4k│ +└──────┴────────────┴─────────────┴─────────────┴────────────┘ + +Before this patch, 4 stripe head will be used, and each sh will attach +bio for 3 disks, and each attached bio will trigger +bitmap_startwrite() once, which means total 12 times. + - 3 times (0 + 4k), for (A0, A1 and A2) + - 3 times (4 + 4k), for (B0, B1 and B2) + - 3 times (8 + 4k), for (C0, C1 and C3) + - 3 times (12 + 4k), for (D0, D1 and D3) + +After this patch, md upper layer will calculate that IO range (0 + 48k) +is corresponding to the bitmap (0 + 16k), and call bitmap_startwrite() +just once. 
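+
+A condensed view of how the md upper layer does this (taken from the md.c
+hunk below): the write range is recorded in md_io_clone when the bio is
+cloned, and md_bitmap_start() is called once for the whole array-level bio:
+
+  if (bio_data_dir(*bio) == WRITE && mddev->bitmap) {
+          md_io_clone->offset = (*bio)->bi_iter.bi_sector;
+          md_io_clone->sectors = bio_sectors(*bio);
+          md_bitmap_start(mddev, md_io_clone);
+  }
+
+md_end_clone_io() then calls md_bitmap_end() once when the cloned write
+completes.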
+ +Noted that this patch will align bitmap ranges to the chunks, for example, +if user issue a IO (0 + 4k) to array: + +- Before this patch, 1 time (0 + 4k), for A0; +- After this patch, 1 time (0 + 8k) for chunk 0; + +Usually, one bitmap bit will represent more than one disk chunk, and this +doesn't have any difference. And even if user really created a array +that one chunk contain multiple bits, the overhead is that more data +will be recovered after power failure. + +Also remove STRIPE_BITMAP_PENDING since it's not used anymore. + +[1] https://lore.kernel.org/all/CAJpMwyjmHQLvm6zg1cmQErttNNQPDAAXPKM3xgTjMhbfts986Q@mail.gmail.com/ +[2] https://lore.kernel.org/all/ADF7D720-5764-4AF3-B68E-1845988737AA@flyingcircus.io/ + +Signed-off-by: Yu Kuai +Link: https://lore.kernel.org/r/20250109015145.158868-6-yukuai1@huaweicloud.com +Signed-off-by: Song Liu +[There is no bitmap_operations, resolve conflicts by replacing +bitmap_ops->{startwrite, endwrite} with md_bitmap_{startwrite, endwrite}] +Signed-off-by: Yu Kuai +Signed-off-by: Greg Kroah-Hartman +--- + drivers/md/md-bitmap.c | 2 - + drivers/md/md.c | 26 ++++++++++++++++++++++++ + drivers/md/md.h | 2 + + drivers/md/raid1.c | 5 ---- + drivers/md/raid10.c | 4 --- + drivers/md/raid5-cache.c | 2 - + drivers/md/raid5.c | 50 ++++------------------------------------------- + drivers/md/raid5.h | 3 -- + 8 files changed, 33 insertions(+), 61 deletions(-) + +--- a/drivers/md/md-bitmap.c ++++ b/drivers/md/md-bitmap.c +@@ -1517,7 +1517,6 @@ int md_bitmap_startwrite(struct bitmap * + } + return 0; + } +-EXPORT_SYMBOL_GPL(md_bitmap_startwrite); + + void md_bitmap_endwrite(struct bitmap *bitmap, sector_t offset, + unsigned long sectors) +@@ -1564,7 +1563,6 @@ void md_bitmap_endwrite(struct bitmap *b + sectors = 0; + } + } +-EXPORT_SYMBOL_GPL(md_bitmap_endwrite); + + static int __bitmap_start_sync(struct bitmap *bitmap, sector_t offset, sector_t *blocks, + int degraded) +--- a/drivers/md/md.c ++++ b/drivers/md/md.c +@@ -8713,12 +8713,32 @@ void md_submit_discard_bio(struct mddev + } + EXPORT_SYMBOL_GPL(md_submit_discard_bio); + ++static void md_bitmap_start(struct mddev *mddev, ++ struct md_io_clone *md_io_clone) ++{ ++ if (mddev->pers->bitmap_sector) ++ mddev->pers->bitmap_sector(mddev, &md_io_clone->offset, ++ &md_io_clone->sectors); ++ ++ md_bitmap_startwrite(mddev->bitmap, md_io_clone->offset, ++ md_io_clone->sectors); ++} ++ ++static void md_bitmap_end(struct mddev *mddev, struct md_io_clone *md_io_clone) ++{ ++ md_bitmap_endwrite(mddev->bitmap, md_io_clone->offset, ++ md_io_clone->sectors); ++} ++ + static void md_end_clone_io(struct bio *bio) + { + struct md_io_clone *md_io_clone = bio->bi_private; + struct bio *orig_bio = md_io_clone->orig_bio; + struct mddev *mddev = md_io_clone->mddev; + ++ if (bio_data_dir(orig_bio) == WRITE && mddev->bitmap) ++ md_bitmap_end(mddev, md_io_clone); ++ + if (bio->bi_status && !orig_bio->bi_status) + orig_bio->bi_status = bio->bi_status; + +@@ -8743,6 +8763,12 @@ static void md_clone_bio(struct mddev *m + if (blk_queue_io_stat(bdev->bd_disk->queue)) + md_io_clone->start_time = bio_start_io_acct(*bio); + ++ if (bio_data_dir(*bio) == WRITE && mddev->bitmap) { ++ md_io_clone->offset = (*bio)->bi_iter.bi_sector; ++ md_io_clone->sectors = bio_sectors(*bio); ++ md_bitmap_start(mddev, md_io_clone); ++ } ++ + clone->bi_end_io = md_end_clone_io; + clone->bi_private = md_io_clone; + *bio = clone; +--- a/drivers/md/md.h ++++ b/drivers/md/md.h +@@ -746,6 +746,8 @@ struct md_io_clone { + struct mddev *mddev; + struct bio *orig_bio; 
+ unsigned long start_time; ++ sector_t offset; ++ unsigned long sectors; + struct bio bio_clone; + }; + +--- a/drivers/md/raid1.c ++++ b/drivers/md/raid1.c +@@ -421,9 +421,6 @@ static void close_write(struct r1bio *r1 + } + if (test_bit(R1BIO_BehindIO, &r1_bio->state)) + md_bitmap_end_behind_write(r1_bio->mddev); +- /* clear the bitmap if all writes complete successfully */ +- md_bitmap_endwrite(r1_bio->mddev->bitmap, r1_bio->sector, +- r1_bio->sectors); + md_write_end(r1_bio->mddev); + } + +@@ -1517,8 +1514,6 @@ static void raid1_write_request(struct m + + if (test_bit(R1BIO_BehindIO, &r1_bio->state)) + md_bitmap_start_behind_write(mddev); +- md_bitmap_startwrite(bitmap, r1_bio->sector, +- r1_bio->sectors); + first_clone = 0; + } + +--- a/drivers/md/raid10.c ++++ b/drivers/md/raid10.c +@@ -427,9 +427,6 @@ static void raid10_end_read_request(stru + + static void close_write(struct r10bio *r10_bio) + { +- /* clear the bitmap if all writes complete successfully */ +- md_bitmap_endwrite(r10_bio->mddev->bitmap, r10_bio->sector, +- r10_bio->sectors); + md_write_end(r10_bio->mddev); + } + +@@ -1541,7 +1538,6 @@ static void raid10_write_request(struct + md_account_bio(mddev, &bio); + r10_bio->master_bio = bio; + atomic_set(&r10_bio->remaining, 1); +- md_bitmap_startwrite(mddev->bitmap, r10_bio->sector, r10_bio->sectors); + + for (i = 0; i < conf->copies; i++) { + if (r10_bio->devs[i].bio) +--- a/drivers/md/raid5-cache.c ++++ b/drivers/md/raid5-cache.c +@@ -313,8 +313,6 @@ void r5c_handle_cached_data_endio(struct + if (sh->dev[i].written) { + set_bit(R5_UPTODATE, &sh->dev[i].flags); + r5c_return_dev_pending_writes(conf, &sh->dev[i]); +- md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, +- RAID5_STRIPE_SECTORS(conf)); + } + } + } +--- a/drivers/md/raid5.c ++++ b/drivers/md/raid5.c +@@ -905,7 +905,6 @@ static bool stripe_can_batch(struct stri + if (raid5_has_log(conf) || raid5_has_ppl(conf)) + return false; + return test_bit(STRIPE_BATCH_READY, &sh->state) && +- !test_bit(STRIPE_BITMAP_PENDING, &sh->state) && + is_full_stripe_write(sh); + } + +@@ -3587,29 +3586,9 @@ static void __add_stripe_bio(struct stri + (*bip)->bi_iter.bi_sector, sh->sector, dd_idx, + sh->dev[dd_idx].sector); + +- if (conf->mddev->bitmap && firstwrite) { +- /* Cannot hold spinlock over bitmap_startwrite, +- * but must ensure this isn't added to a batch until +- * we have added to the bitmap and set bm_seq. +- * So set STRIPE_BITMAP_PENDING to prevent +- * batching. +- * If multiple __add_stripe_bio() calls race here they +- * much all set STRIPE_BITMAP_PENDING. So only the first one +- * to complete "bitmap_startwrite" gets to set +- * STRIPE_BIT_DELAY. This is important as once a stripe +- * is added to a batch, STRIPE_BIT_DELAY cannot be changed +- * any more. 
+- */ +- set_bit(STRIPE_BITMAP_PENDING, &sh->state); +- spin_unlock_irq(&sh->stripe_lock); +- md_bitmap_startwrite(conf->mddev->bitmap, sh->sector, +- RAID5_STRIPE_SECTORS(conf)); +- spin_lock_irq(&sh->stripe_lock); +- clear_bit(STRIPE_BITMAP_PENDING, &sh->state); +- if (!sh->batch_head) { +- sh->bm_seq = conf->seq_flush+1; +- set_bit(STRIPE_BIT_DELAY, &sh->state); +- } ++ if (conf->mddev->bitmap && firstwrite && !sh->batch_head) { ++ sh->bm_seq = conf->seq_flush+1; ++ set_bit(STRIPE_BIT_DELAY, &sh->state); + } + } + +@@ -3660,7 +3639,6 @@ handle_failed_stripe(struct r5conf *conf + BUG_ON(sh->batch_head); + for (i = disks; i--; ) { + struct bio *bi; +- int bitmap_end = 0; + + if (test_bit(R5_ReadError, &sh->dev[i].flags)) { + struct md_rdev *rdev; +@@ -3687,8 +3665,6 @@ handle_failed_stripe(struct r5conf *conf + sh->dev[i].towrite = NULL; + sh->overwrite_disks = 0; + spin_unlock_irq(&sh->stripe_lock); +- if (bi) +- bitmap_end = 1; + + log_stripe_write_finished(sh); + +@@ -3703,10 +3679,6 @@ handle_failed_stripe(struct r5conf *conf + bio_io_error(bi); + bi = nextbi; + } +- if (bitmap_end) +- md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, +- RAID5_STRIPE_SECTORS(conf)); +- bitmap_end = 0; + /* and fail all 'written' */ + bi = sh->dev[i].written; + sh->dev[i].written = NULL; +@@ -3715,7 +3687,6 @@ handle_failed_stripe(struct r5conf *conf + sh->dev[i].page = sh->dev[i].orig_page; + } + +- if (bi) bitmap_end = 1; + while (bi && bi->bi_iter.bi_sector < + sh->dev[i].sector + RAID5_STRIPE_SECTORS(conf)) { + struct bio *bi2 = r5_next_bio(conf, bi, sh->dev[i].sector); +@@ -3749,9 +3720,6 @@ handle_failed_stripe(struct r5conf *conf + bi = nextbi; + } + } +- if (bitmap_end) +- md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, +- RAID5_STRIPE_SECTORS(conf)); + /* If we were in the middle of a write the parity block might + * still be locked - so just clear all R5_LOCKED flags + */ +@@ -4102,8 +4070,7 @@ returnbi: + bio_endio(wbi); + wbi = wbi2; + } +- md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, +- RAID5_STRIPE_SECTORS(conf)); ++ + if (head_sh->batch_head) { + sh = list_first_entry(&sh->batch_list, + struct stripe_head, +@@ -4935,8 +4902,7 @@ static void break_stripe_batch_list(stru + (1 << STRIPE_COMPUTE_RUN) | + (1 << STRIPE_DISCARD) | + (1 << STRIPE_BATCH_READY) | +- (1 << STRIPE_BATCH_ERR) | +- (1 << STRIPE_BITMAP_PENDING)), ++ (1 << STRIPE_BATCH_ERR)), + "stripe state: %lx\n", sh->state); + WARN_ONCE(head_sh->state & ((1 << STRIPE_DISCARD) | + (1 << STRIPE_REPLACED)), +@@ -5840,12 +5806,6 @@ static void make_discard_request(struct + } + spin_unlock_irq(&sh->stripe_lock); + if (conf->mddev->bitmap) { +- for (d = 0; +- d < conf->raid_disks - conf->max_degraded; +- d++) +- md_bitmap_startwrite(mddev->bitmap, +- sh->sector, +- RAID5_STRIPE_SECTORS(conf)); + sh->bm_seq = conf->seq_flush + 1; + set_bit(STRIPE_BIT_DELAY, &sh->state); + } +--- a/drivers/md/raid5.h ++++ b/drivers/md/raid5.h +@@ -371,9 +371,6 @@ enum { + STRIPE_ON_RELEASE_LIST, + STRIPE_BATCH_READY, + STRIPE_BATCH_ERR, +- STRIPE_BITMAP_PENDING, /* Being added to bitmap, don't add +- * to batch yet. 
+- */ + STRIPE_LOG_TRAPPED, /* trapped into log (see raid5-cache.c) + * this bit is used in two scenarios: + * diff --git a/queue-6.6/md-md-bitmap-remove-the-last-parameter-for-bimtap_ops-endwrite.patch b/queue-6.6/md-md-bitmap-remove-the-last-parameter-for-bimtap_ops-endwrite.patch new file mode 100644 index 0000000000..9134c974fd --- /dev/null +++ b/queue-6.6/md-md-bitmap-remove-the-last-parameter-for-bimtap_ops-endwrite.patch @@ -0,0 +1,347 @@ +From stable+bounces-114482-greg=kroah.com@vger.kernel.org Mon Feb 10 08:41:07 2025 +From: Yu Kuai +Date: Mon, 10 Feb 2025 15:33:19 +0800 +Subject: md/md-bitmap: remove the last parameter for bimtap_ops->endwrite() +To: stable@vger.kernel.org, gregkh@linuxfoundation.org, song@kernel.org, yukuai3@huawei.com +Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com +Message-ID: <20250210073322.3315094-4-yukuai1@huaweicloud.com> + +From: Yu Kuai + +commit 4f0e7d0e03b7b80af84759a9e7cfb0f81ac4adae upstream. + +For the case that IO failed for one rdev, the bit will be mark as NEEDED +in following cases: + +1) If badblocks is set and rdev is not faulty; +2) If rdev is faulty; + +Case 1) is useless because synchronize data to badblocks make no sense. +Case 2) can be replaced with mddev->degraded. + +Also remove R1BIO_Degraded, R10BIO_Degraded and STRIPE_DEGRADED since +case 2) no longer use them. + +Signed-off-by: Yu Kuai +Link: https://lore.kernel.org/r/20250109015145.158868-3-yukuai1@huaweicloud.com +Signed-off-by: Song Liu +[ Resolve minor conflicts ] +Signed-off-by: Yu Kuai +Signed-off-by: Greg Kroah-Hartman +--- + drivers/md/md-bitmap.c | 19 ++++++++++--------- + drivers/md/md-bitmap.h | 2 +- + drivers/md/raid1.c | 27 +++------------------------ + drivers/md/raid1.h | 1 - + drivers/md/raid10.c | 23 +++-------------------- + drivers/md/raid10.h | 1 - + drivers/md/raid5-cache.c | 4 +--- + drivers/md/raid5.c | 14 +++----------- + drivers/md/raid5.h | 1 - + 9 files changed, 21 insertions(+), 71 deletions(-) + +--- a/drivers/md/md-bitmap.c ++++ b/drivers/md/md-bitmap.c +@@ -1520,7 +1520,7 @@ int md_bitmap_startwrite(struct bitmap * + EXPORT_SYMBOL_GPL(md_bitmap_startwrite); + + void md_bitmap_endwrite(struct bitmap *bitmap, sector_t offset, +- unsigned long sectors, int success) ++ unsigned long sectors) + { + if (!bitmap) + return; +@@ -1537,15 +1537,16 @@ void md_bitmap_endwrite(struct bitmap *b + return; + } + +- if (success && !bitmap->mddev->degraded && +- bitmap->events_cleared < bitmap->mddev->events) { +- bitmap->events_cleared = bitmap->mddev->events; +- bitmap->need_sync = 1; +- sysfs_notify_dirent_safe(bitmap->sysfs_can_clear); +- } +- +- if (!success && !NEEDED(*bmc)) ++ if (!bitmap->mddev->degraded) { ++ if (bitmap->events_cleared < bitmap->mddev->events) { ++ bitmap->events_cleared = bitmap->mddev->events; ++ bitmap->need_sync = 1; ++ sysfs_notify_dirent_safe( ++ bitmap->sysfs_can_clear); ++ } ++ } else if (!NEEDED(*bmc)) { + *bmc |= NEEDED_MASK; ++ } + + if (COUNTER(*bmc) == COUNTER_MAX) + wake_up(&bitmap->overflow_wait); +--- a/drivers/md/md-bitmap.h ++++ b/drivers/md/md-bitmap.h +@@ -255,7 +255,7 @@ void md_bitmap_dirty_bits(struct bitmap + int md_bitmap_startwrite(struct bitmap *bitmap, sector_t offset, + unsigned long sectors); + void md_bitmap_endwrite(struct bitmap *bitmap, sector_t offset, +- unsigned long sectors, int success); ++ unsigned long sectors); + void md_bitmap_start_behind_write(struct mddev *mddev); + void md_bitmap_end_behind_write(struct mddev 
*mddev); + int md_bitmap_start_sync(struct bitmap *bitmap, sector_t offset, sector_t *blocks, int degraded); +--- a/drivers/md/raid1.c ++++ b/drivers/md/raid1.c +@@ -423,8 +423,7 @@ static void close_write(struct r1bio *r1 + md_bitmap_end_behind_write(r1_bio->mddev); + /* clear the bitmap if all writes complete successfully */ + md_bitmap_endwrite(r1_bio->mddev->bitmap, r1_bio->sector, +- r1_bio->sectors, +- !test_bit(R1BIO_Degraded, &r1_bio->state)); ++ r1_bio->sectors); + md_write_end(r1_bio->mddev); + } + +@@ -481,8 +480,6 @@ static void raid1_end_write_request(stru + if (!test_bit(Faulty, &rdev->flags)) + set_bit(R1BIO_WriteError, &r1_bio->state); + else { +- /* Fail the request */ +- set_bit(R1BIO_Degraded, &r1_bio->state); + /* Finished with this branch */ + r1_bio->bios[mirror] = NULL; + to_put = bio; +@@ -1415,11 +1412,8 @@ static void raid1_write_request(struct m + break; + } + r1_bio->bios[i] = NULL; +- if (!rdev || test_bit(Faulty, &rdev->flags)) { +- if (i < conf->raid_disks) +- set_bit(R1BIO_Degraded, &r1_bio->state); ++ if (!rdev || test_bit(Faulty, &rdev->flags)) + continue; +- } + + atomic_inc(&rdev->nr_pending); + if (test_bit(WriteErrorSeen, &rdev->flags)) { +@@ -1445,16 +1439,6 @@ static void raid1_write_request(struct m + */ + max_sectors = bad_sectors; + rdev_dec_pending(rdev, mddev); +- /* We don't set R1BIO_Degraded as that +- * only applies if the disk is +- * missing, so it might be re-added, +- * and we want to know to recover this +- * chunk. +- * In this case the device is here, +- * and the fact that this chunk is not +- * in-sync is recorded in the bad +- * block log +- */ + continue; + } + if (is_bad) { +@@ -2479,12 +2463,9 @@ static void handle_write_finished(struct + * errors. + */ + fail = true; +- if (!narrow_write_error(r1_bio, m)) { ++ if (!narrow_write_error(r1_bio, m)) + md_error(conf->mddev, + conf->mirrors[m].rdev); +- /* an I/O failed, we can't clear the bitmap */ +- set_bit(R1BIO_Degraded, &r1_bio->state); +- } + rdev_dec_pending(conf->mirrors[m].rdev, + conf->mddev); + } +@@ -2576,8 +2557,6 @@ static void raid1d(struct md_thread *thr + list_del(&r1_bio->retry_list); + idx = sector_to_idx(r1_bio->sector); + atomic_dec(&conf->nr_queued[idx]); +- if (mddev->degraded) +- set_bit(R1BIO_Degraded, &r1_bio->state); + if (test_bit(R1BIO_WriteError, &r1_bio->state)) + close_write(r1_bio); + raid_end_bio_io(r1_bio); +--- a/drivers/md/raid1.h ++++ b/drivers/md/raid1.h +@@ -187,7 +187,6 @@ struct r1bio { + enum r1bio_state { + R1BIO_Uptodate, + R1BIO_IsSync, +- R1BIO_Degraded, + R1BIO_BehindIO, + /* Set ReadError on bios that experience a readerror so that + * raid1d knows what to do with them. 
+--- a/drivers/md/raid10.c ++++ b/drivers/md/raid10.c +@@ -429,8 +429,7 @@ static void close_write(struct r10bio *r + { + /* clear the bitmap if all writes complete successfully */ + md_bitmap_endwrite(r10_bio->mddev->bitmap, r10_bio->sector, +- r10_bio->sectors, +- !test_bit(R10BIO_Degraded, &r10_bio->state)); ++ r10_bio->sectors); + md_write_end(r10_bio->mddev); + } + +@@ -500,7 +499,6 @@ static void raid10_end_write_request(str + set_bit(R10BIO_WriteError, &r10_bio->state); + else { + /* Fail the request */ +- set_bit(R10BIO_Degraded, &r10_bio->state); + r10_bio->devs[slot].bio = NULL; + to_put = bio; + dec_rdev = 1; +@@ -1489,10 +1487,8 @@ static void raid10_write_request(struct + r10_bio->devs[i].bio = NULL; + r10_bio->devs[i].repl_bio = NULL; + +- if (!rdev && !rrdev) { +- set_bit(R10BIO_Degraded, &r10_bio->state); ++ if (!rdev && !rrdev) + continue; +- } + if (rdev && test_bit(WriteErrorSeen, &rdev->flags)) { + sector_t first_bad; + sector_t dev_sector = r10_bio->devs[i].addr; +@@ -1509,14 +1505,6 @@ static void raid10_write_request(struct + * to other devices yet + */ + max_sectors = bad_sectors; +- /* We don't set R10BIO_Degraded as that +- * only applies if the disk is missing, +- * so it might be re-added, and we want to +- * know to recover this chunk. +- * In this case the device is here, and the +- * fact that this chunk is not in-sync is +- * recorded in the bad block log. +- */ + continue; + } + if (is_bad) { +@@ -3062,11 +3050,8 @@ static void handle_write_completed(struc + rdev_dec_pending(rdev, conf->mddev); + } else if (bio != NULL && bio->bi_status) { + fail = true; +- if (!narrow_write_error(r10_bio, m)) { ++ if (!narrow_write_error(r10_bio, m)) + md_error(conf->mddev, rdev); +- set_bit(R10BIO_Degraded, +- &r10_bio->state); +- } + rdev_dec_pending(rdev, conf->mddev); + } + bio = r10_bio->devs[m].repl_bio; +@@ -3125,8 +3110,6 @@ static void raid10d(struct md_thread *th + r10_bio = list_first_entry(&tmp, struct r10bio, + retry_list); + list_del(&r10_bio->retry_list); +- if (mddev->degraded) +- set_bit(R10BIO_Degraded, &r10_bio->state); + + if (test_bit(R10BIO_WriteError, + &r10_bio->state)) +--- a/drivers/md/raid10.h ++++ b/drivers/md/raid10.h +@@ -161,7 +161,6 @@ enum r10bio_state { + R10BIO_IsSync, + R10BIO_IsRecover, + R10BIO_IsReshape, +- R10BIO_Degraded, + /* Set ReadError on bios that experience a read error + * so that raid10d knows what to do with them. 
+ */ +--- a/drivers/md/raid5-cache.c ++++ b/drivers/md/raid5-cache.c +@@ -314,9 +314,7 @@ void r5c_handle_cached_data_endio(struct + set_bit(R5_UPTODATE, &sh->dev[i].flags); + r5c_return_dev_pending_writes(conf, &sh->dev[i]); + md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, +- RAID5_STRIPE_SECTORS(conf), +- !test_bit(STRIPE_DEGRADED, +- &sh->state)); ++ RAID5_STRIPE_SECTORS(conf)); + } + } + } +--- a/drivers/md/raid5.c ++++ b/drivers/md/raid5.c +@@ -1359,8 +1359,6 @@ again: + submit_bio_noacct(rbi); + } + if (!rdev && !rrdev) { +- if (op_is_write(op)) +- set_bit(STRIPE_DEGRADED, &sh->state); + pr_debug("skip op %d on disc %d for sector %llu\n", + bi->bi_opf, i, (unsigned long long)sh->sector); + clear_bit(R5_LOCKED, &sh->dev[i].flags); +@@ -2925,7 +2923,6 @@ static void raid5_end_write_request(stru + set_bit(R5_MadeGoodRepl, &sh->dev[i].flags); + } else { + if (bi->bi_status) { +- set_bit(STRIPE_DEGRADED, &sh->state); + set_bit(WriteErrorSeen, &rdev->flags); + set_bit(R5_WriteError, &sh->dev[i].flags); + if (!test_and_set_bit(WantReplacement, &rdev->flags)) +@@ -3708,7 +3705,7 @@ handle_failed_stripe(struct r5conf *conf + } + if (bitmap_end) + md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, +- RAID5_STRIPE_SECTORS(conf), 0); ++ RAID5_STRIPE_SECTORS(conf)); + bitmap_end = 0; + /* and fail all 'written' */ + bi = sh->dev[i].written; +@@ -3754,7 +3751,7 @@ handle_failed_stripe(struct r5conf *conf + } + if (bitmap_end) + md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, +- RAID5_STRIPE_SECTORS(conf), 0); ++ RAID5_STRIPE_SECTORS(conf)); + /* If we were in the middle of a write the parity block might + * still be locked - so just clear all R5_LOCKED flags + */ +@@ -4106,9 +4103,7 @@ returnbi: + wbi = wbi2; + } + md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, +- RAID5_STRIPE_SECTORS(conf), +- !test_bit(STRIPE_DEGRADED, +- &sh->state)); ++ RAID5_STRIPE_SECTORS(conf)); + if (head_sh->batch_head) { + sh = list_first_entry(&sh->batch_list, + struct stripe_head, +@@ -4385,7 +4380,6 @@ static void handle_parity_checks5(struct + s->locked++; + set_bit(R5_Wantwrite, &dev->flags); + +- clear_bit(STRIPE_DEGRADED, &sh->state); + set_bit(STRIPE_INSYNC, &sh->state); + break; + case check_state_run: +@@ -4542,7 +4536,6 @@ static void handle_parity_checks6(struct + clear_bit(R5_Wantwrite, &dev->flags); + s->locked--; + } +- clear_bit(STRIPE_DEGRADED, &sh->state); + + set_bit(STRIPE_INSYNC, &sh->state); + break; +@@ -4951,7 +4944,6 @@ static void break_stripe_batch_list(stru + + set_mask_bits(&sh->state, ~(STRIPE_EXPAND_SYNC_FLAGS | + (1 << STRIPE_PREREAD_ACTIVE) | +- (1 << STRIPE_DEGRADED) | + (1 << STRIPE_ON_UNPLUG_LIST)), + head_sh->state & (1 << STRIPE_INSYNC)); + +--- a/drivers/md/raid5.h ++++ b/drivers/md/raid5.h +@@ -358,7 +358,6 @@ enum { + STRIPE_REPLACED, + STRIPE_PREREAD_ACTIVE, + STRIPE_DELAYED, +- STRIPE_DEGRADED, + STRIPE_BIT_DELAY, + STRIPE_EXPANDING, + STRIPE_EXPAND_SOURCE, diff --git a/queue-6.6/md-raid5-implement-pers-bitmap_sector.patch b/queue-6.6/md-raid5-implement-pers-bitmap_sector.patch new file mode 100644 index 0000000000..a5cd3366f2 --- /dev/null +++ b/queue-6.6/md-raid5-implement-pers-bitmap_sector.patch @@ -0,0 +1,111 @@ +From stable+bounces-114481-greg=kroah.com@vger.kernel.org Mon Feb 10 08:40:56 2025 +From: Yu Kuai +Date: Mon, 10 Feb 2025 15:33:21 +0800 +Subject: md/raid5: implement pers->bitmap_sector() +To: stable@vger.kernel.org, gregkh@linuxfoundation.org, song@kernel.org, yukuai3@huawei.com +Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, 
yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com +Message-ID: <20250210073322.3315094-6-yukuai1@huaweicloud.com> + +From: Yu Kuai + +commit 9c89f604476cf15c31fbbdb043cff7fbf1dbe0cb upstream. + +Bitmap is used for the whole array for raid1/raid10, hence IO for the +array can be used directly for bitmap. However, bitmap is used for +underlying disks for raid5, hence IO for the array can't be used +directly for bitmap. + +Implement pers->bitmap_sector() for raid5 to convert IO ranges from the +array to the underlying disks. + +Signed-off-by: Yu Kuai +Link: https://lore.kernel.org/r/20250109015145.158868-5-yukuai1@huaweicloud.com +Signed-off-by: Song Liu +[ Resolve minor conflicts ] +Signed-off-by: Yu Kuai +Signed-off-by: Greg Kroah-Hartman +--- + drivers/md/raid5.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++ + 1 file changed, 51 insertions(+) + +--- a/drivers/md/raid5.c ++++ b/drivers/md/raid5.c +@@ -5996,6 +5996,54 @@ static enum reshape_loc get_reshape_loc( + return LOC_BEHIND_RESHAPE; + } + ++static void raid5_bitmap_sector(struct mddev *mddev, sector_t *offset, ++ unsigned long *sectors) ++{ ++ struct r5conf *conf = mddev->private; ++ sector_t start = *offset; ++ sector_t end = start + *sectors; ++ sector_t prev_start = start; ++ sector_t prev_end = end; ++ int sectors_per_chunk; ++ enum reshape_loc loc; ++ int dd_idx; ++ ++ sectors_per_chunk = conf->chunk_sectors * ++ (conf->raid_disks - conf->max_degraded); ++ start = round_down(start, sectors_per_chunk); ++ end = round_up(end, sectors_per_chunk); ++ ++ start = raid5_compute_sector(conf, start, 0, &dd_idx, NULL); ++ end = raid5_compute_sector(conf, end, 0, &dd_idx, NULL); ++ ++ /* ++ * For LOC_INSIDE_RESHAPE, this IO will wait for reshape to make ++ * progress, hence it's the same as LOC_BEHIND_RESHAPE. ++ */ ++ loc = get_reshape_loc(mddev, conf, prev_start); ++ if (likely(loc != LOC_AHEAD_OF_RESHAPE)) { ++ *offset = start; ++ *sectors = end - start; ++ return; ++ } ++ ++ sectors_per_chunk = conf->prev_chunk_sectors * ++ (conf->previous_raid_disks - conf->max_degraded); ++ prev_start = round_down(prev_start, sectors_per_chunk); ++ prev_end = round_down(prev_end, sectors_per_chunk); ++ ++ prev_start = raid5_compute_sector(conf, prev_start, 1, &dd_idx, NULL); ++ prev_end = raid5_compute_sector(conf, prev_end, 1, &dd_idx, NULL); ++ ++ /* ++ * for LOC_AHEAD_OF_RESHAPE, reshape can make progress before this IO ++ * is handled in make_stripe_request(), we can't know this here hence ++ * we set bits for both. 
++ */ ++ *offset = min(start, prev_start); ++ *sectors = max(end, prev_end) - *offset; ++} ++ + static enum stripe_result make_stripe_request(struct mddev *mddev, + struct r5conf *conf, struct stripe_request_ctx *ctx, + sector_t logical_sector, struct bio *bi) +@@ -9099,6 +9147,7 @@ static struct md_personality raid6_perso + .quiesce = raid5_quiesce, + .takeover = raid6_takeover, + .change_consistency_policy = raid5_change_consistency_policy, ++ .bitmap_sector = raid5_bitmap_sector, + }; + static struct md_personality raid5_personality = + { +@@ -9124,6 +9173,7 @@ static struct md_personality raid5_perso + .quiesce = raid5_quiesce, + .takeover = raid5_takeover, + .change_consistency_policy = raid5_change_consistency_policy, ++ .bitmap_sector = raid5_bitmap_sector, + }; + + static struct md_personality raid4_personality = +@@ -9150,6 +9200,7 @@ static struct md_personality raid4_perso + .quiesce = raid5_quiesce, + .takeover = raid4_takeover, + .change_consistency_policy = raid5_change_consistency_policy, ++ .bitmap_sector = raid5_bitmap_sector, + }; + + static int __init raid5_init(void) diff --git a/queue-6.6/md-raid5-recheck-if-reshape-has-finished-with-device_lock-held.patch b/queue-6.6/md-raid5-recheck-if-reshape-has-finished-with-device_lock-held.patch new file mode 100644 index 0000000000..a5651e7c13 --- /dev/null +++ b/queue-6.6/md-raid5-recheck-if-reshape-has-finished-with-device_lock-held.patch @@ -0,0 +1,134 @@ +From stable+bounces-114478-greg=kroah.com@vger.kernel.org Mon Feb 10 08:40:39 2025 +From: Yu Kuai +Date: Mon, 10 Feb 2025 15:33:17 +0800 +Subject: md/raid5: recheck if reshape has finished with device_lock held +To: stable@vger.kernel.org, gregkh@linuxfoundation.org, song@kernel.org, yukuai3@huawei.com +Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com +Message-ID: <20250210073322.3315094-2-yukuai1@huaweicloud.com> + +From: Benjamin Marzinski + +commit 25b3a8237a03ec0b67b965b52d74862e77ef7115 upstream. + +When handling an IO request, MD checks if a reshape is currently +happening, and if so, where the IO sector is in relation to the reshape +progress. MD uses conf->reshape_progress for both of these tasks. When +the reshape finishes, conf->reshape_progress is set to MaxSector. If +this occurs after MD checks if the reshape is currently happening but +before it calls ahead_of_reshape(), then ahead_of_reshape() will end up +comparing the IO sector against MaxSector. During a backwards reshape, +this will make MD think the IO sector is in the area not yet reshaped, +causing it to use the previous configuration, and map the IO to the +sector where that data was before the reshape. + +This bug can be triggered by running the lvm2 +lvconvert-raid-reshape-linear_to_raid6-single-type.sh test in a loop, +although it's very hard to reproduce. + +Fix this by factoring the code that checks where the IO sector is in +relation to the reshape out to a helper called get_reshape_loc(), +which reads reshape_progress and reshape_safe while holding the +device_lock, and then rechecks if the reshape has finished before +calling ahead_of_reshape with the saved values. + +Also use the helper during the REQ_NOWAIT check to see if the location +is inside of the reshape region. 
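+
+A condensed view of the helper (taken from the raid5.c hunk below): both
+values are sampled under device_lock, and the finished case is rechecked
+first, so an IO sector can no longer be compared against MaxSector:
+
+  spin_lock_irq(&conf->device_lock);
+  reshape_progress = conf->reshape_progress;
+  reshape_safe = conf->reshape_safe;
+  spin_unlock_irq(&conf->device_lock);
+  if (reshape_progress == MaxSector)
+          return LOC_NO_RESHAPE;
+  if (ahead_of_reshape(mddev, logical_sector, reshape_progress))
+          return LOC_AHEAD_OF_RESHAPE;
+  if (ahead_of_reshape(mddev, logical_sector, reshape_safe))
+          return LOC_INSIDE_RESHAPE;
+  return LOC_BEHIND_RESHAPE;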
+ +Fixes: fef9c61fdfabf ("md/raid5: change reshape-progress measurement to cope with reshaping backwards.") +Signed-off-by: Benjamin Marzinski +Signed-off-by: Song Liu +Link: https://lore.kernel.org/r/20240702151802.1632010-1-bmarzins@redhat.com +Signed-off-by: Yu Kuai +Signed-off-by: Greg Kroah-Hartman +--- + drivers/md/raid5.c | 64 +++++++++++++++++++++++++++++++++-------------------- + 1 file changed, 41 insertions(+), 23 deletions(-) + +--- a/drivers/md/raid5.c ++++ b/drivers/md/raid5.c +@@ -5972,6 +5972,39 @@ static bool reshape_disabled(struct mdde + return is_md_suspended(mddev) || !md_is_rdwr(mddev); + } + ++enum reshape_loc { ++ LOC_NO_RESHAPE, ++ LOC_AHEAD_OF_RESHAPE, ++ LOC_INSIDE_RESHAPE, ++ LOC_BEHIND_RESHAPE, ++}; ++ ++static enum reshape_loc get_reshape_loc(struct mddev *mddev, ++ struct r5conf *conf, sector_t logical_sector) ++{ ++ sector_t reshape_progress, reshape_safe; ++ /* ++ * Spinlock is needed as reshape_progress may be ++ * 64bit on a 32bit platform, and so it might be ++ * possible to see a half-updated value ++ * Of course reshape_progress could change after ++ * the lock is dropped, so once we get a reference ++ * to the stripe that we think it is, we will have ++ * to check again. ++ */ ++ spin_lock_irq(&conf->device_lock); ++ reshape_progress = conf->reshape_progress; ++ reshape_safe = conf->reshape_safe; ++ spin_unlock_irq(&conf->device_lock); ++ if (reshape_progress == MaxSector) ++ return LOC_NO_RESHAPE; ++ if (ahead_of_reshape(mddev, logical_sector, reshape_progress)) ++ return LOC_AHEAD_OF_RESHAPE; ++ if (ahead_of_reshape(mddev, logical_sector, reshape_safe)) ++ return LOC_INSIDE_RESHAPE; ++ return LOC_BEHIND_RESHAPE; ++} ++ + static enum stripe_result make_stripe_request(struct mddev *mddev, + struct r5conf *conf, struct stripe_request_ctx *ctx, + sector_t logical_sector, struct bio *bi) +@@ -5986,28 +6019,14 @@ static enum stripe_result make_stripe_re + seq = read_seqcount_begin(&conf->gen_lock); + + if (unlikely(conf->reshape_progress != MaxSector)) { +- /* +- * Spinlock is needed as reshape_progress may be +- * 64bit on a 32bit platform, and so it might be +- * possible to see a half-updated value +- * Of course reshape_progress could change after +- * the lock is dropped, so once we get a reference +- * to the stripe that we think it is, we will have +- * to check again. 
+- */ +- spin_lock_irq(&conf->device_lock); +- if (ahead_of_reshape(mddev, logical_sector, +- conf->reshape_progress)) { +- previous = 1; +- } else { +- if (ahead_of_reshape(mddev, logical_sector, +- conf->reshape_safe)) { +- spin_unlock_irq(&conf->device_lock); +- ret = STRIPE_SCHEDULE_AND_RETRY; +- goto out; +- } ++ enum reshape_loc loc = get_reshape_loc(mddev, conf, ++ logical_sector); ++ if (loc == LOC_INSIDE_RESHAPE) { ++ ret = STRIPE_SCHEDULE_AND_RETRY; ++ goto out; + } +- spin_unlock_irq(&conf->device_lock); ++ if (loc == LOC_AHEAD_OF_RESHAPE) ++ previous = 1; + } + + new_sector = raid5_compute_sector(conf, logical_sector, previous, +@@ -6189,8 +6208,7 @@ static bool raid5_make_request(struct md + /* Bail out if conflicts with reshape and REQ_NOWAIT is set */ + if ((bi->bi_opf & REQ_NOWAIT) && + (conf->reshape_progress != MaxSector) && +- !ahead_of_reshape(mddev, logical_sector, conf->reshape_progress) && +- ahead_of_reshape(mddev, logical_sector, conf->reshape_safe)) { ++ get_reshape_loc(mddev, conf, logical_sector) == LOC_INSIDE_RESHAPE) { + bio_wouldblock_error(bi); + if (rw == WRITE) + md_write_end(mddev); diff --git a/queue-6.6/mm-gup-fix-infinite-loop-within-__get_longterm_locked.patch b/queue-6.6/mm-gup-fix-infinite-loop-within-__get_longterm_locked.patch new file mode 100644 index 0000000000..27e7ad1a65 --- /dev/null +++ b/queue-6.6/mm-gup-fix-infinite-loop-within-__get_longterm_locked.patch @@ -0,0 +1,90 @@ +From 1aaf8c122918aa8897605a9aa1e8ed6600d6f930 Mon Sep 17 00:00:00 2001 +From: Zhaoyang Huang +Date: Tue, 21 Jan 2025 10:01:59 +0800 +Subject: mm: gup: fix infinite loop within __get_longterm_locked + +From: Zhaoyang Huang + +commit 1aaf8c122918aa8897605a9aa1e8ed6600d6f930 upstream. + +We can run into an infinite loop in __get_longterm_locked() when +collect_longterm_unpinnable_folios() finds only folios that are isolated +from the LRU or were never added to the LRU. This can happen when all +folios to be pinned are never added to the LRU, for example when +vm_ops->fault allocated pages using cma_alloc() and never added them to +the LRU. + +Fix it by simply taking a look at the list in the single caller, to see if +anything was added. + +[zhaoyang.huang@unisoc.com: move definition of local] + Link: https://lkml.kernel.org/r/20250122012604.3654667-1-zhaoyang.huang@unisoc.com +Link: https://lkml.kernel.org/r/20250121020159.3636477-1-zhaoyang.huang@unisoc.com +Fixes: 67e139b02d99 ("mm/gup.c: refactor check_and_migrate_movable_pages()") +Signed-off-by: Zhaoyang Huang +Reviewed-by: John Hubbard +Reviewed-by: David Hildenbrand +Suggested-by: David Hildenbrand +Acked-by: David Hildenbrand +Cc: Aijun Sun +Cc: Alistair Popple +Cc: +Signed-off-by: Andrew Morton +Signed-off-by: Wentao Guan +Signed-off-by: Greg Kroah-Hartman +--- + mm/gup.c | 14 ++++---------- + 1 file changed, 4 insertions(+), 10 deletions(-) + +--- a/mm/gup.c ++++ b/mm/gup.c +@@ -1946,14 +1946,14 @@ struct page *get_dump_page(unsigned long + /* + * Returns the number of collected pages. Return value is always >= 0. 
+ */ +-static unsigned long collect_longterm_unpinnable_pages( ++static void collect_longterm_unpinnable_pages( + struct list_head *movable_page_list, + unsigned long nr_pages, + struct page **pages) + { +- unsigned long i, collected = 0; + struct folio *prev_folio = NULL; + bool drain_allow = true; ++ unsigned long i; + + for (i = 0; i < nr_pages; i++) { + struct folio *folio = page_folio(pages[i]); +@@ -1965,8 +1965,6 @@ static unsigned long collect_longterm_un + if (folio_is_longterm_pinnable(folio)) + continue; + +- collected++; +- + if (folio_is_device_coherent(folio)) + continue; + +@@ -1988,8 +1986,6 @@ static unsigned long collect_longterm_un + NR_ISOLATED_ANON + folio_is_file_lru(folio), + folio_nr_pages(folio)); + } +- +- return collected; + } + + /* +@@ -2082,12 +2078,10 @@ err: + static long check_and_migrate_movable_pages(unsigned long nr_pages, + struct page **pages) + { +- unsigned long collected; + LIST_HEAD(movable_page_list); + +- collected = collect_longterm_unpinnable_pages(&movable_page_list, +- nr_pages, pages); +- if (!collected) ++ collect_longterm_unpinnable_pages(&movable_page_list, nr_pages, pages); ++ if (list_empty(&movable_page_list)) + return 0; + + return migrate_longterm_unpinnable_pages(&movable_page_list, nr_pages, diff --git a/queue-6.6/netdevsim-print-human-readable-ip-address.patch b/queue-6.6/netdevsim-print-human-readable-ip-address.patch new file mode 100644 index 0000000000..77b5a33074 --- /dev/null +++ b/queue-6.6/netdevsim-print-human-readable-ip-address.patch @@ -0,0 +1,70 @@ +From c71bc6da6198a6d88df86094f1052bb581951d65 Mon Sep 17 00:00:00 2001 +From: Hangbin Liu +Date: Thu, 10 Oct 2024 04:00:25 +0000 +Subject: netdevsim: print human readable IP address + +From: Hangbin Liu + +commit c71bc6da6198a6d88df86094f1052bb581951d65 upstream. + +Currently, IPSec addresses are printed in hexadecimal format, which is +not user-friendly. e.g. + + # cat /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec + SA count=2 tx=20 + sa[0] rx ipaddr=0x00000000 00000000 00000000 0100a8c0 + sa[0] spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1 + sa[0] key=0x3167608a ca4f1397 43565909 941fa627 + sa[1] tx ipaddr=0x00000000 00000000 00000000 00000000 + sa[1] spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1 + sa[1] key=0x3167608a ca4f1397 43565909 941fa627 + +This patch updates the code to print the IPSec address in a human-readable +format for easier debug. e.g. 
+ + # cat /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec + SA count=4 tx=40 + sa[0] tx ipaddr=0.0.0.0 + sa[0] spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1 + sa[0] key=0x3167608a ca4f1397 43565909 941fa627 + sa[1] rx ipaddr=192.168.0.1 + sa[1] spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1 + sa[1] key=0x3167608a ca4f1397 43565909 941fa627 + sa[2] tx ipaddr=:: + sa[2] spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1 + sa[2] key=0x3167608a ca4f1397 43565909 941fa627 + sa[3] rx ipaddr=2000::1 + sa[3] spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1 + sa[3] key=0x3167608a ca4f1397 43565909 941fa627 + +Reviewed-by: Simon Horman +Signed-off-by: Hangbin Liu +Link: https://patch.msgid.link/20241010040027.21440-2-liuhangbin@gmail.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Harshit Mogalapalli +Signed-off-by: Greg Kroah-Hartman +--- + drivers/net/netdevsim/ipsec.c | 12 ++++++++---- + 1 file changed, 8 insertions(+), 4 deletions(-) + +--- a/drivers/net/netdevsim/ipsec.c ++++ b/drivers/net/netdevsim/ipsec.c +@@ -39,10 +39,14 @@ static ssize_t nsim_dbg_netdev_ops_read( + if (!sap->used) + continue; + +- p += scnprintf(p, bufsize - (p - buf), +- "sa[%i] %cx ipaddr=0x%08x %08x %08x %08x\n", +- i, (sap->rx ? 'r' : 't'), sap->ipaddr[0], +- sap->ipaddr[1], sap->ipaddr[2], sap->ipaddr[3]); ++ if (sap->xs->props.family == AF_INET6) ++ p += scnprintf(p, bufsize - (p - buf), ++ "sa[%i] %cx ipaddr=%pI6c\n", ++ i, (sap->rx ? 'r' : 't'), &sap->ipaddr); ++ else ++ p += scnprintf(p, bufsize - (p - buf), ++ "sa[%i] %cx ipaddr=%pI4\n", ++ i, (sap->rx ? 'r' : 't'), &sap->ipaddr[3]); + p += scnprintf(p, bufsize - (p - buf), + "sa[%i] spi=0x%08x proto=0x%x salt=0x%08x crypt=%d\n", + i, be32_to_cpu(sap->xs->id.spi), diff --git a/queue-6.6/selftests-rtnetlink-update-netdevsim-ipsec-output-format.patch b/queue-6.6/selftests-rtnetlink-update-netdevsim-ipsec-output-format.patch new file mode 100644 index 0000000000..e521ef286b --- /dev/null +++ b/queue-6.6/selftests-rtnetlink-update-netdevsim-ipsec-output-format.patch @@ -0,0 +1,40 @@ +From 3ec920bb978ccdc68a7dfb304d303d598d038cb1 Mon Sep 17 00:00:00 2001 +From: Hangbin Liu +Date: Thu, 10 Oct 2024 04:00:27 +0000 +Subject: selftests: rtnetlink: update netdevsim ipsec output format + +From: Hangbin Liu + +commit 3ec920bb978ccdc68a7dfb304d303d598d038cb1 upstream. + +After the netdevsim update to use human-readable IP address formats for +IPsec, we can now use the source and destination IPs directly in testing. 
+Here is the result: + # ./rtnetlink.sh -t kci_test_ipsec_offload + PASS: ipsec_offload + +Signed-off-by: Hangbin Liu +Acked-by: Stanislav Fomichev +Link: https://patch.msgid.link/20241010040027.21440-4-liuhangbin@gmail.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Harshit Mogalapalli +Signed-off-by: Greg Kroah-Hartman +--- + tools/testing/selftests/net/rtnetlink.sh | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +--- a/tools/testing/selftests/net/rtnetlink.sh ++++ b/tools/testing/selftests/net/rtnetlink.sh +@@ -921,10 +921,10 @@ kci_test_ipsec_offload() + # does driver have correct offload info + diff $sysfsf - << EOF + SA count=2 tx=3 +-sa[0] tx ipaddr=0x00000000 00000000 00000000 00000000 ++sa[0] tx ipaddr=$dstip + sa[0] spi=0x00000009 proto=0x32 salt=0x61626364 crypt=1 + sa[0] key=0x34333231 38373635 32313039 36353433 +-sa[1] rx ipaddr=0x00000000 00000000 00000000 037ba8c0 ++sa[1] rx ipaddr=$srcip + sa[1] spi=0x00000009 proto=0x32 salt=0x61626364 crypt=1 + sa[1] key=0x34333231 38373635 32313039 36353433 + EOF diff --git a/queue-6.6/series b/queue-6.6/series index dced9d1161..a7562fba0c 100644 --- a/queue-6.6/series +++ b/queue-6.6/series @@ -133,3 +133,13 @@ drm-v3d-stop-active-perfmon-if-it-is-being-destroyed.patch x86-static-call-remove-early_boot_irqs_disabled-check-to-fix-xen-pvh-dom0.patch drm-amd-display-add-null-check-for-head_pipe-in-dcn201_acquire_free_pipe_for_layer.patch drm-amd-display-pass-non-null-to-dcn20_validate_apply_pipe_split_flags.patch +netdevsim-print-human-readable-ip-address.patch +selftests-rtnetlink-update-netdevsim-ipsec-output-format.patch +md-raid5-recheck-if-reshape-has-finished-with-device_lock-held.patch +md-md-bitmap-factor-behind-write-counters-out-from-bitmap_-start-end-write.patch +md-md-bitmap-remove-the-last-parameter-for-bimtap_ops-endwrite.patch +md-add-a-new-callback-pers-bitmap_sector.patch +md-raid5-implement-pers-bitmap_sector.patch +md-md-bitmap-move-bitmap_-start-end-write-to-md-upper-layer.patch +arm64-filter-out-sve-hwcaps-when-feat_sve-isn-t-implemented.patch +mm-gup-fix-infinite-loop-within-__get_longterm_locked.patch