From: Greg Kroah-Hartman
Date: Thu, 17 Apr 2025 11:53:06 +0000 (+0200)
Subject: 5.15-stable patches
X-Git-Tag: v6.12.24~67
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=f6dd1b240445a6d9ad3edc6b52ca7aa6a4fab157;p=thirdparty%2Fkernel%2Fstable-queue.git

5.15-stable patches

added patches:
	mm-add-missing-release-barrier-on-pgdat_reclaim_locked-unlock.patch
	mm-hwpoison-do-not-send-sigbus-to-processes-with-recovered-clean-pages.patch
	sparc-mm-disable-preemption-in-lazy-mmu-mode.patch
---

diff --git a/queue-5.15/arm64-dts-exynos-gs101-disable-pinctrl_gsacore-node.patch b/queue-5.15/arm64-dts-exynos-gs101-disable-pinctrl_gsacore-node.patch
deleted file mode 100644
index 9d4a4f13f0..0000000000
--- a/queue-5.15/arm64-dts-exynos-gs101-disable-pinctrl_gsacore-node.patch
+++ /dev/null
@@ -1,52 +0,0 @@
-From 168e24966f10ff635b0ec9728aa71833bf850ee5 Mon Sep 17 00:00:00 2001
-From: Peter Griffin
-Date: Mon, 6 Jan 2025 14:57:46 +0000
-Subject: arm64: dts: exynos: gs101: disable pinctrl_gsacore node
-
-From: Peter Griffin
-
-commit 168e24966f10ff635b0ec9728aa71833bf850ee5 upstream.
-
-gsacore registers are not accessible from normal world.
-
-Disable this node, so that the suspend/resume callbacks
-in the pinctrl driver don't cause a Serror attempting to
-access the registers.
-
-Fixes: ea89fdf24fd9 ("arm64: dts: exynos: google: Add initial Google gs101 SoC support")
-Signed-off-by: Peter Griffin
-To: Rob Herring
-To: Krzysztof Kozlowski
-To: Conor Dooley
-To: Alim Akhtar
-Cc: linux-arm-kernel@lists.infradead.org
-Cc: linux-samsung-soc@vger.kernel.org
-Cc: devicetree@vger.kernel.org
-Cc: linux-kernel@vger.kernel.org
-Cc: tudor.ambarus@linaro.org
-Cc: andre.draszik@linaro.org
-Cc: kernel-team@android.com
-Cc: willmcvicker@google.com
-Cc: stable@vger.kernel.org
-Link: https://lore.kernel.org/r/20250106-contrib-pg-pinctrl_gsacore_disable-v1-1-d3fc88a48aed@linaro.org
-Signed-off-by: Krzysztof Kozlowski
-Signed-off-by: Greg Kroah-Hartman
----
- arch/arm64/boot/dts/exynos/google/gs101.dtsi | 1 +
- 1 file changed, 1 insertion(+)
-
-diff --git a/arch/arm64/boot/dts/exynos/google/gs101.dtsi b/arch/arm64/boot/dts/exynos/google/gs101.dtsi
-index c5335dd59dfe..813f96089578 100644
---- a/arch/arm64/boot/dts/exynos/google/gs101.dtsi
-+++ b/arch/arm64/boot/dts/exynos/google/gs101.dtsi
-@@ -1454,6 +1454,7 @@ pinctrl_gsacore: pinctrl@17a80000 {
- 		/* TODO: update once support for this CMU exists */
- 		clocks = <0>;
- 		clock-names = "pclk";
-+		status = "disabled";
- 	};
-
- 	cmu_top: clock-controller@1e080000 {
---
-2.49.0
-
diff --git a/queue-5.15/mm-add-missing-release-barrier-on-pgdat_reclaim_locked-unlock.patch b/queue-5.15/mm-add-missing-release-barrier-on-pgdat_reclaim_locked-unlock.patch
new file mode 100644
index 0000000000..a6e1db63bf
--- /dev/null
+++ b/queue-5.15/mm-add-missing-release-barrier-on-pgdat_reclaim_locked-unlock.patch
@@ -0,0 +1,62 @@
+From c0ebbb3841e07c4493e6fe351698806b09a87a37 Mon Sep 17 00:00:00 2001
+From: Mathieu Desnoyers
+Date: Wed, 12 Mar 2025 10:10:13 -0400
+Subject: mm: add missing release barrier on PGDAT_RECLAIM_LOCKED unlock
+
+From: Mathieu Desnoyers
+
+commit c0ebbb3841e07c4493e6fe351698806b09a87a37 upstream.
+
+The PGDAT_RECLAIM_LOCKED bit is used to provide mutual exclusion of node
+reclaim for struct pglist_data using a single bit.
+
+It is "locked" with a test_and_set_bit (similarly to a try lock) which
+provides full ordering with respect to loads and stores done within
+__node_reclaim().
+
+It is "unlocked" with clear_bit(), which does not provide any ordering
+with respect to loads and stores done before clearing the bit.
+
+The lack of clear_bit() memory ordering with respect to stores within
+__node_reclaim() can cause a subsequent CPU to fail to observe stores from
+a prior node reclaim.  This is not an issue in practice on TSO (e.g.
+x86), but it is an issue on weakly-ordered architectures (e.g. arm64).
+
+Fix this by using clear_bit_unlock rather than clear_bit to clear
+PGDAT_RECLAIM_LOCKED with a release memory ordering semantic.
+
+This provides stronger memory ordering (release rather than relaxed).
+
+Link: https://lkml.kernel.org/r/20250312141014.129725-1-mathieu.desnoyers@efficios.com
+Fixes: d773ed6b856a ("mm: test and set zone reclaim lock before starting reclaim")
+Signed-off-by: Mathieu Desnoyers
+Cc: Lorenzo Stoakes
+Cc: Matthew Wilcox
+Cc: Alan Stern
+Cc: Andrea Parri
+Cc: Will Deacon
+Cc: Peter Zijlstra
+Cc: Boqun Feng
+Cc: Nicholas Piggin
+Cc: David Howells
+Cc: Jade Alglave
+Cc: Luc Maranget
+Cc: "Paul E. McKenney"
+Cc:
+Signed-off-by: Andrew Morton
+Signed-off-by: Greg Kroah-Hartman
+---
+ mm/vmscan.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/mm/vmscan.c
++++ b/mm/vmscan.c
+@@ -4581,7 +4581,7 @@ int node_reclaim(struct pglist_data *pgd
+ 		return NODE_RECLAIM_NOSCAN;
+
+ 	ret = __node_reclaim(pgdat, gfp_mask, order);
+-	clear_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags);
++	clear_bit_unlock(PGDAT_RECLAIM_LOCKED, &pgdat->flags);
+
+ 	if (!ret)
+ 		count_vm_event(PGSCAN_ZONE_RECLAIM_FAILED);
diff --git a/queue-5.15/mm-hwpoison-do-not-send-sigbus-to-processes-with-recovered-clean-pages.patch b/queue-5.15/mm-hwpoison-do-not-send-sigbus-to-processes-with-recovered-clean-pages.patch
new file mode 100644
index 0000000000..49ba12d1a7
--- /dev/null
+++ b/queue-5.15/mm-hwpoison-do-not-send-sigbus-to-processes-with-recovered-clean-pages.patch
@@ -0,0 +1,137 @@
+From aaf99ac2ceb7c974f758a635723eeaf48596388e Mon Sep 17 00:00:00 2001
+From: Shuai Xue
+Date: Wed, 12 Mar 2025 19:28:51 +0800
+Subject: mm/hwpoison: do not send SIGBUS to processes with recovered clean pages
+
+From: Shuai Xue
+
+commit aaf99ac2ceb7c974f758a635723eeaf48596388e upstream.
+
+When an uncorrected memory error is consumed there is a race between the
+CMCI from the memory controller reporting an uncorrected error with a UCNA
+signature, and the core reporting an SRAR signature machine check when
+the data is about to be consumed.
+
+- Background: why *UN*corrected errors tied to *C*MCI in Intel platform [1]
+
+Prior to Icelake memory controllers reported patrol scrub events that
+detected a previously unseen uncorrected error in memory by signaling a
+broadcast machine check with an SRAO (Software Recoverable Action
+Optional) signature in the machine check bank.  This was overkill because
+it's not an urgent problem when no core is on the verge of consuming that
+bad data.  It's also been found that multiple SRAO UCEs may cause nested MCE
+interrupts and finally become an IERR.
+
+Hence, Intel downgrades the machine check bank signature of patrol scrub
+from SRAO to UCNA (Uncorrected, No Action required), and the signal changed
+to #CMCI.  Just to add to the confusion, Linux does take an action (in
+uc_decode_notifier()) to try to offline the page despite the UC*NA*
+signature name.
+
+- Background: why #CMCI and #MCE race when poison is being consumed in Intel platform [1]
+
+Having decided that CMCI/UCNA is the best action for patrol scrub errors,
+the memory controller uses it for reads too.  But the memory controller is
+executing asynchronously from the core, and can't tell the difference
+between a "real" read and a speculative read.  So it will do CMCI/UCNA if
+an error is found in any read.
+
+Thus:
+
+1) Core is clever and thinks address A is needed soon, issues a speculative read.
+2) Core finds it is going to use address A soon after sending the read request.
+3) The CMCI from the memory controller is in a race with MCE from the core
+   that will soon try to retire the load from address A.
+
+Quite often (because speculation has got better) the CMCI from the memory
+controller is delivered before the core is committed to the instruction
+reading address A, so the interrupt is taken, and Linux offlines the page
+(marking it as poison).
+
+- Why user process is killed for instr case
+
+Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported
+"not recovered"") tries to fix the noisy message "Memory error not recovered"
+and skips duplicate SIGBUSs due to the race.  But it also introduced a bug:
+kill_accessing_process() returns -EHWPOISON for the instruction case, and as
+a result kill_me_maybe() sends a SIGBUS to the user process.
+
+If the CMCI wins that race, the page is marked poisoned when
+uc_decode_notifier() calls memory_failure().  For dirty pages,
+memory_failure() invokes try_to_unmap() with the TTU_HWPOISON flag,
+converting the PTE to a hwpoison entry.  As a result,
+kill_accessing_process():
+
+- calls walk_page_range(), which returns 1 regardless of whether
+  try_to_unmap() succeeds or fails,
+- calls kill_proc() to make sure a SIGBUS is sent,
+- returns -EHWPOISON to indicate that a SIGBUS has already been sent to the
+  process and kill_me_maybe() doesn't have to send it again.
+
+However, for clean pages, the TTU_HWPOISON flag is cleared, leaving the
+PTE unchanged and not converted to a hwpoison entry.  Since those PTE
+entries are not marked as hwpoison, kill_accessing_process() returns
+-EFAULT, causing kill_me_maybe() to send a SIGBUS.
+
+Console log looks like this:
+
+ Memory failure: 0x827ca68: corrupted page was clean: dropped without side effects
+ Memory failure: 0x827ca68: recovery action for clean LRU page: Recovered
+ Memory failure: 0x827ca68: already hardware poisoned
+ mce: Memory error not recovered
+
+To fix it, return 0 for "corrupted page was clean", preventing an
+unnecessary SIGBUS to the user process.
+
+[1] https://lore.kernel.org/lkml/20250217063335.22257-1-xueshuai@linux.alibaba.com/T/#mba94f1305b3009dd340ce4114d3221fe810d1871
+Link: https://lkml.kernel.org/r/20250312112852.82415-3-xueshuai@linux.alibaba.com
+Fixes: 046545a661af ("mm/hwpoison: fix error page recovered but reported "not recovered"")
+Signed-off-by: Shuai Xue
+Tested-by: Tony Luck
+Acked-by: Miaohe Lin
+Cc: Baolin Wang
+Cc: Borislav Petkov
+Cc: Catalin Marinas
+Cc: Dave Hansen
+Cc: "H. Peter Anvin"
+Cc: Ingo Molnar
+Cc: Jane Chu
+Cc: Jarkko Sakkinen
+Cc: Jonathan Cameron
+Cc: Josh Poimboeuf
+Cc: Naoya Horiguchi
+Cc: Peter Zijlstra
+Cc: Ruidong Tian
+Cc: Thomas Gleixner
+Cc: Yazen Ghannam
+Cc:
+Signed-off-by: Andrew Morton
+Signed-off-by: Greg Kroah-Hartman
+---
+ mm/memory-failure.c | 11 ++++++++---
+ 1 file changed, 8 insertions(+), 3 deletions(-)
+
+--- a/mm/memory-failure.c
++++ b/mm/memory-failure.c
+@@ -707,12 +707,17 @@ static int kill_accessing_process(struct
+ 	mmap_read_lock(p->mm);
+ 	ret = walk_page_range(p->mm, 0, TASK_SIZE, &hwp_walk_ops,
+ 			      (void *)&priv);
++	/*
++	 * ret = 1 when CMCI wins, regardless of whether try_to_unmap()
++	 * succeeds or fails, then kill the process with SIGBUS.
++	 * ret = 0 when poison page is a clean page and it's dropped, no
++	 * SIGBUS is needed.
++	 */
+ 	if (ret == 1 && priv.tk.addr)
+ 		kill_proc(&priv.tk, pfn, flags);
+-	else
+-		ret = 0;
+ 	mmap_read_unlock(p->mm);
+-	return ret > 0 ? -EHWPOISON : -EFAULT;
++
++	return ret > 0 ? -EHWPOISON : 0;
+ }
+
+ static const char *action_name[] = {
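[ Editor sketch, not part of the patch: a small userspace-compilable model of
  the changed return contract.  The demo_* names are hypothetical, and it
  mirrors only the ret handling, not the real page-table walk. ]

/*
 * Model of the patched kill_accessing_process() return logic.
 */
#include <stdbool.h>

#define EHWPOISON	133	/* errno value used on most architectures */

static void demo_kill_proc(void)
{
	/* stand-in for kill_proc(): would deliver SIGBUS to the task */
}

static int demo_kill_accessing_process(int walk_ret, bool have_addr)
{
	/*
	 * walk_ret == 1: CMCI won the race and a hwpoison PTE was found.
	 * SIGBUS is sent here; -EHWPOISON tells the caller not to send
	 * a second one.
	 */
	if (walk_ret == 1 && have_addr)
		demo_kill_proc();

	/*
	 * walk_ret == 0: the poisoned page was clean and was dropped
	 * without side effects.  Returning 0 (instead of the old -EFAULT)
	 * keeps kill_me_maybe() from raising a needless SIGBUS.
	 */
	return walk_ret > 0 ? -EHWPOISON : 0;
}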
Peter Anvin" +Cc: Ingo Molnar +Cc: Jane Chu +Cc: Jarkko Sakkinen +Cc: Jonathan Cameron +Cc: Josh Poimboeuf +Cc: Naoya Horiguchi +Cc: Peter Zijlstra +Cc: Ruidong Tian +Cc: Thomas Gleinxer +Cc: Yazen Ghannam +Cc: +Signed-off-by: Andrew Morton +Signed-off-by: Greg Kroah-Hartman +--- + mm/memory-failure.c | 11 ++++++++--- + 1 file changed, 8 insertions(+), 3 deletions(-) + +--- a/mm/memory-failure.c ++++ b/mm/memory-failure.c +@@ -707,12 +707,17 @@ static int kill_accessing_process(struct + mmap_read_lock(p->mm); + ret = walk_page_range(p->mm, 0, TASK_SIZE, &hwp_walk_ops, + (void *)&priv); ++ /* ++ * ret = 1 when CMCI wins, regardless of whether try_to_unmap() ++ * succeeds or fails, then kill the process with SIGBUS. ++ * ret = 0 when poison page is a clean page and it's dropped, no ++ * SIGBUS is needed. ++ */ + if (ret == 1 && priv.tk.addr) + kill_proc(&priv.tk, pfn, flags); +- else +- ret = 0; + mmap_read_unlock(p->mm); +- return ret > 0 ? -EHWPOISON : -EFAULT; ++ ++ return ret > 0 ? -EHWPOISON : 0; + } + + static const char *action_name[] = { diff --git a/queue-5.15/series b/queue-5.15/series index 96ea3e6f6b..c696eae3b4 100644 --- a/queue-5.15/series +++ b/queue-5.15/series @@ -96,4 +96,6 @@ mptcp-only-inc-mpjoinackhmacfailure-for-hmac-failures.patch mtd-inftlcore-add-error-check-for-inftl_read_oob.patch mtd-rawnand-add-status-chack-in-r852_ready.patch arm64-dts-mediatek-mt8173-fix-disp-pwm-compatible-string.patch -arm64-dts-exynos-gs101-disable-pinctrl_gsacore-node.patch +sparc-mm-disable-preemption-in-lazy-mmu-mode.patch +mm-add-missing-release-barrier-on-pgdat_reclaim_locked-unlock.patch +mm-hwpoison-do-not-send-sigbus-to-processes-with-recovered-clean-pages.patch diff --git a/queue-5.15/sparc-mm-disable-preemption-in-lazy-mmu-mode.patch b/queue-5.15/sparc-mm-disable-preemption-in-lazy-mmu-mode.patch new file mode 100644 index 0000000000..ade1fb51da --- /dev/null +++ b/queue-5.15/sparc-mm-disable-preemption-in-lazy-mmu-mode.patch @@ -0,0 +1,70 @@ +From a1d416bf9faf4f4871cb5a943614a07f80a7d70f Mon Sep 17 00:00:00 2001 +From: Ryan Roberts +Date: Mon, 3 Mar 2025 14:15:37 +0000 +Subject: sparc/mm: disable preemption in lazy mmu mode + +From: Ryan Roberts + +commit a1d416bf9faf4f4871cb5a943614a07f80a7d70f upstream. + +Since commit 38e0edb15bd0 ("mm/apply_to_range: call pte function with lazy +updates") it's been possible for arch_[enter|leave]_lazy_mmu_mode() to be +called without holding a page table lock (for the kernel mappings case), +and therefore it is possible that preemption may occur while in the lazy +mmu mode. The Sparc lazy mmu implementation is not robust to preemption +since it stores the lazy mode state in a per-cpu structure and does not +attempt to manage that state on task switch. + +Powerpc had the same issue and fixed it by explicitly disabling preemption +in arch_enter_lazy_mmu_mode() and re-enabling in +arch_leave_lazy_mmu_mode(). See commit b9ef323ea168 ("powerpc/64s: +Disable preemption in hash lazy mmu mode"). + +Given Sparc's lazy mmu mode is based on powerpc's, let's fix it in the +same way here. + +Link: https://lkml.kernel.org/r/20250303141542.3371656-4-ryan.roberts@arm.com +Fixes: 38e0edb15bd0 ("mm/apply_to_range: call pte function with lazy updates") +Signed-off-by: Ryan Roberts +Acked-by: David Hildenbrand +Acked-by: Andreas Larsson +Acked-by: Juergen Gross +Cc: Borislav Betkov +Cc: Boris Ostrovsky +Cc: Catalin Marinas +Cc: Dave Hansen +Cc: David S. Miller +Cc: "H. 
Peter Anvin" +Cc: Ingo Molnar +Cc: Juegren Gross +Cc: Matthew Wilcow (Oracle) +Cc: Thomas Gleinxer +Cc: +Signed-off-by: Andrew Morton +Signed-off-by: Greg Kroah-Hartman +--- + arch/sparc/mm/tlb.c | 5 ++++- + 1 file changed, 4 insertions(+), 1 deletion(-) + +--- a/arch/sparc/mm/tlb.c ++++ b/arch/sparc/mm/tlb.c +@@ -52,8 +52,10 @@ out: + + void arch_enter_lazy_mmu_mode(void) + { +- struct tlb_batch *tb = this_cpu_ptr(&tlb_batch); ++ struct tlb_batch *tb; + ++ preempt_disable(); ++ tb = this_cpu_ptr(&tlb_batch); + tb->active = 1; + } + +@@ -64,6 +66,7 @@ void arch_leave_lazy_mmu_mode(void) + if (tb->tlb_nr) + flush_tlb_pending(); + tb->active = 0; ++ preempt_enable(); + } + + static void tlb_batch_add_one(struct mm_struct *mm, unsigned long vaddr,