From: Greg Kroah-Hartman
Date: Thu, 17 Apr 2025 11:53:08 +0000 (+0200)
Subject: 6.1-stable patches
X-Git-Tag: v6.12.24~66
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=fa50755032df1dd5ee23ed918e6e364c4abcb2e5;p=thirdparty%2Fkernel%2Fstable-queue.git

6.1-stable patches

added patches:
    mm-add-missing-release-barrier-on-pgdat_reclaim_locked-unlock.patch
    mm-hwpoison-do-not-send-sigbus-to-processes-with-recovered-clean-pages.patch
    mm-rmap-reject-hugetlb-folios-in-folio_make_device_exclusive.patch
    sparc-mm-disable-preemption-in-lazy-mmu-mode.patch
---

diff --git a/queue-6.1/mm-add-missing-release-barrier-on-pgdat_reclaim_locked-unlock.patch b/queue-6.1/mm-add-missing-release-barrier-on-pgdat_reclaim_locked-unlock.patch
new file mode 100644
index 0000000000..1ac15a60c1
--- /dev/null
+++ b/queue-6.1/mm-add-missing-release-barrier-on-pgdat_reclaim_locked-unlock.patch
@@ -0,0 +1,62 @@
+From c0ebbb3841e07c4493e6fe351698806b09a87a37 Mon Sep 17 00:00:00 2001
+From: Mathieu Desnoyers
+Date: Wed, 12 Mar 2025 10:10:13 -0400
+Subject: mm: add missing release barrier on PGDAT_RECLAIM_LOCKED unlock
+
+From: Mathieu Desnoyers
+
+commit c0ebbb3841e07c4493e6fe351698806b09a87a37 upstream.
+
+The PGDAT_RECLAIM_LOCKED bit is used to provide mutual exclusion of node
+reclaim for struct pglist_data using a single bit.
+
+It is "locked" with a test_and_set_bit (similarly to a try lock) which
+provides full ordering with respect to loads and stores done within
+__node_reclaim().
+
+It is "unlocked" with clear_bit(), which does not provide any ordering
+with respect to loads and stores done before clearing the bit.
+
+The lack of clear_bit() memory ordering with respect to stores within
+__node_reclaim() can cause a subsequent CPU to fail to observe stores from
+a prior node reclaim. This is not an issue in practice on TSO (e.g.
+x86), but it is an issue on weakly-ordered architectures (e.g. arm64).
+
+Fix this by using clear_bit_unlock rather than clear_bit to clear
+PGDAT_RECLAIM_LOCKED with a release memory ordering semantic.
+
+This provides stronger memory ordering (release rather than relaxed).
+
+Link: https://lkml.kernel.org/r/20250312141014.129725-1-mathieu.desnoyers@efficios.com
+Fixes: d773ed6b856a ("mm: test and set zone reclaim lock before starting reclaim")
+Signed-off-by: Mathieu Desnoyers
+Cc: Lorenzo Stoakes
+Cc: Matthew Wilcox
+Cc: Alan Stern
+Cc: Andrea Parri
+Cc: Will Deacon
+Cc: Peter Zijlstra
+Cc: Boqun Feng
+Cc: Nicholas Piggin
+Cc: David Howells
+Cc: Jade Alglave
+Cc: Luc Maranget
+Cc: "Paul E. McKenney"
McKenney" +Cc: +Signed-off-by: Andrew Morton +Signed-off-by: Greg Kroah-Hartman +--- + mm/vmscan.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/mm/vmscan.c ++++ b/mm/vmscan.c +@@ -7729,7 +7729,7 @@ int node_reclaim(struct pglist_data *pgd + return NODE_RECLAIM_NOSCAN; + + ret = __node_reclaim(pgdat, gfp_mask, order); +- clear_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags); ++ clear_bit_unlock(PGDAT_RECLAIM_LOCKED, &pgdat->flags); + + if (!ret) + count_vm_event(PGSCAN_ZONE_RECLAIM_FAILED); diff --git a/queue-6.1/mm-hwpoison-do-not-send-sigbus-to-processes-with-recovered-clean-pages.patch b/queue-6.1/mm-hwpoison-do-not-send-sigbus-to-processes-with-recovered-clean-pages.patch new file mode 100644 index 0000000000..2603eaec22 --- /dev/null +++ b/queue-6.1/mm-hwpoison-do-not-send-sigbus-to-processes-with-recovered-clean-pages.patch @@ -0,0 +1,137 @@ +From aaf99ac2ceb7c974f758a635723eeaf48596388e Mon Sep 17 00:00:00 2001 +From: Shuai Xue +Date: Wed, 12 Mar 2025 19:28:51 +0800 +Subject: mm/hwpoison: do not send SIGBUS to processes with recovered clean pages + +From: Shuai Xue + +commit aaf99ac2ceb7c974f758a635723eeaf48596388e upstream. + +When an uncorrected memory error is consumed there is a race between the +CMCI from the memory controller reporting an uncorrected error with a UCNA +signature, and the core reporting and SRAR signature machine check when +the data is about to be consumed. + +- Background: why *UN*corrected errors tied to *C*MCI in Intel platform [1] + +Prior to Icelake memory controllers reported patrol scrub events that +detected a previously unseen uncorrected error in memory by signaling a +broadcast machine check with an SRAO (Software Recoverable Action +Optional) signature in the machine check bank. This was overkill because +it's not an urgent problem that no core is on the verge of consuming that +bad data. It's also found that multi SRAO UCE may cause nested MCE +interrupts and finally become an IERR. + +Hence, Intel downgrades the machine check bank signature of patrol scrub +from SRAO to UCNA (Uncorrected, No Action required), and signal changed to +#CMCI. Just to add to the confusion, Linux does take an action (in +uc_decode_notifier()) to try to offline the page despite the UC*NA* +signature name. + +- Background: why #CMCI and #MCE race when poison is consuming in Intel platform [1] + +Having decided that CMCI/UCNA is the best action for patrol scrub errors, +the memory controller uses it for reads too. But the memory controller is +executing asynchronously from the core, and can't tell the difference +between a "real" read and a speculative read. So it will do CMCI/UCNA if +an error is found in any read. + +Thus: + +1) Core is clever and thinks address A is needed soon, issues a speculative read. +2) Core finds it is going to use address A soon after sending the read request +3) The CMCI from the memory controller is in a race with MCE from the core + that will soon try to retire the load from address A. + +Quite often (because speculation has got better) the CMCI from the memory +controller is delivered before the core is committed to the instruction +reading address A, so the interrupt is taken, and Linux offlines the page +(marking it as poison). + +- Why user process is killed for instr case + +Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported +"not recovered"") tries to fix noise message "Memory error not recovered" +and skips duplicate SIGBUSs due to the race. 
+kill_accessing_process() returns -EHWPOISON for the instruction case, and
+as a result kill_me_maybe() sends a SIGBUS to the user process.
+
+If the CMCI wins that race, the page is marked poisoned when
+uc_decode_notifier() calls memory_failure(). For dirty pages,
+memory_failure() invokes try_to_unmap() with the TTU_HWPOISON flag,
+converting the PTE to a hwpoison entry. As a result,
+kill_accessing_process():
+
+- calls walk_page_range() and returns 1 regardless of whether
+  try_to_unmap() succeeds or fails,
+- calls kill_proc() to make sure a SIGBUS is sent,
+- returns -EHWPOISON to indicate that a SIGBUS has already been sent to the
+  process and kill_me_maybe() doesn't have to send it again.
+
+However, for clean pages, the TTU_HWPOISON flag is cleared, leaving the
+PTE unchanged and not converted to a hwpoison entry. Because the PTE is
+not marked as hwpoison for these clean pages,
+kill_accessing_process() returns -EFAULT, causing kill_me_maybe() to send
+a SIGBUS anyway.
+
+The console log looks like this:
+
+ Memory failure: 0x827ca68: corrupted page was clean: dropped without side effects
+ Memory failure: 0x827ca68: recovery action for clean LRU page: Recovered
+ Memory failure: 0x827ca68: already hardware poisoned
+ mce: Memory error not recovered
+
+To fix it, return 0 for "corrupted page was clean", preventing an
+unnecessary SIGBUS to the user process.
+
+[1] https://lore.kernel.org/lkml/20250217063335.22257-1-xueshuai@linux.alibaba.com/T/#mba94f1305b3009dd340ce4114d3221fe810d1871
+Link: https://lkml.kernel.org/r/20250312112852.82415-3-xueshuai@linux.alibaba.com
+Fixes: 046545a661af ("mm/hwpoison: fix error page recovered but reported "not recovered"")
+Signed-off-by: Shuai Xue
+Tested-by: Tony Luck
+Acked-by: Miaohe Lin
+Cc: Baolin Wang
+Cc: Borislav Petkov
+Cc: Catalin Marinas
+Cc: Dave Hansen
+Cc: "H. Peter Anvin"
+Cc: Ingo Molnar
+Cc: Jane Chu
+Cc: Jarkko Sakkinen
+Cc: Jonathan Cameron
+Cc: Josh Poimboeuf
+Cc: Naoya Horiguchi
+Cc: Peter Zijlstra
+Cc: Ruidong Tian
+Cc: Thomas Gleixner
+Cc: Yazen Ghannam
+Cc:
+Signed-off-by: Andrew Morton
+Signed-off-by: Greg Kroah-Hartman
+---
+ mm/memory-failure.c | 11 ++++++++---
+ 1 file changed, 8 insertions(+), 3 deletions(-)
+
+--- a/mm/memory-failure.c
++++ b/mm/memory-failure.c
+@@ -764,12 +764,17 @@ static int kill_accessing_process(struct
+ mmap_read_lock(p->mm);
+ ret = walk_page_range(p->mm, 0, TASK_SIZE, &hwp_walk_ops,
+ (void *)&priv);
++ /*
++ * ret = 1 when CMCI wins, regardless of whether try_to_unmap()
++ * succeeds or fails, then kill the process with SIGBUS.
++ * ret = 0 when poison page is a clean page and it's dropped, no
++ * SIGBUS is needed.
++ */
+ if (ret == 1 && priv.tk.addr)
+ kill_proc(&priv.tk, pfn, flags);
+- else
+- ret = 0;
+ mmap_read_unlock(p->mm);
+- return ret > 0 ? -EHWPOISON : -EFAULT;
++
++ return ret > 0 ? -EHWPOISON : 0;
+ }
+
+ static const char *action_name[] = {
diff --git a/queue-6.1/mm-rmap-reject-hugetlb-folios-in-folio_make_device_exclusive.patch b/queue-6.1/mm-rmap-reject-hugetlb-folios-in-folio_make_device_exclusive.patch
new file mode 100644
index 0000000000..d43d423b2b
--- /dev/null
+++ b/queue-6.1/mm-rmap-reject-hugetlb-folios-in-folio_make_device_exclusive.patch
@@ -0,0 +1,66 @@
+From bc3fe6805cf09a25a086573a17d40e525208c5d8 Mon Sep 17 00:00:00 2001
+From: David Hildenbrand
+Date: Mon, 10 Feb 2025 20:37:44 +0100
+Subject: mm/rmap: reject hugetlb folios in folio_make_device_exclusive()
+
+From: David Hildenbrand
+
+commit bc3fe6805cf09a25a086573a17d40e525208c5d8 upstream.
+
+Even though FOLL_SPLIT_PMD on hugetlb now always fails with -EOPNOTSUPP,
+let's add a safety net in case FOLL_SPLIT_PMD usage would ever be
+reworked.
+
+In particular, before commit 9cb28da54643 ("mm/gup: handle hugetlb in the
+generic follow_page_mask code"), GUP(FOLL_SPLIT_PMD) would just have
+returned a page. In particular, hugetlb folios that are not PMD-sized
+would never have been prone to FOLL_SPLIT_PMD.
+
+hugetlb folios can be anonymous, and page_make_device_exclusive_one() is
+not really prepared for handling them at all. So let's spell that out.
+
+Link: https://lkml.kernel.org/r/20250210193801.781278-3-david@redhat.com
+Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
+Signed-off-by: David Hildenbrand
+Reviewed-by: Alistair Popple
+Tested-by: Alistair Popple
+Cc: Alex Shi
+Cc: Danilo Krummrich
+Cc: Dave Airlie
+Cc: Jann Horn
+Cc: Jason Gunthorpe
+Cc: Jerome Glisse
+Cc: John Hubbard
+Cc: Jonathan Corbet
+Cc: Karol Herbst
+Cc: Liam Howlett
+Cc: Lorenzo Stoakes
+Cc: Lyude
+Cc: "Masami Hiramatsu (Google)"
+Cc: Oleg Nesterov
+Cc: Pasha Tatashin
+Cc: Peter Xu
+Cc: Peter Zijlstra (Intel)
+Cc: SeongJae Park
+Cc: Simona Vetter
+Cc: Vlastimil Babka
+Cc: Yanteng Si
+Cc: Barry Song
+Cc:
+Signed-off-by: Andrew Morton
+Signed-off-by: Greg Kroah-Hartman
+---
+ mm/rmap.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/mm/rmap.c
++++ b/mm/rmap.c
+@@ -2306,7 +2306,7 @@ static bool folio_make_device_exclusive(
+ * Restrict to anonymous folios for now to avoid potential writeback
+ * issues.
+ */
+- if (!folio_test_anon(folio))
++ if (!folio_test_anon(folio) || folio_test_hugetlb(folio))
+ return false;
+
+ rmap_walk(folio, &rwc);
diff --git a/queue-6.1/series b/queue-6.1/series
index d7b4fb5a88..0cec40acb2 100644
--- a/queue-6.1/series
+++ b/queue-6.1/series
@@ -127,3 +127,7 @@ mtd-rawnand-add-status-chack-in-r852_ready.patch
 arm64-mm-correct-the-update-of-max_pfn.patch
 arm64-dts-mediatek-mt8173-fix-disp-pwm-compatible-string.patch
 btrfs-fix-non-empty-delayed-iputs-list-on-unmount-due-to-compressed-write-workers.patch
+sparc-mm-disable-preemption-in-lazy-mmu-mode.patch
+mm-rmap-reject-hugetlb-folios-in-folio_make_device_exclusive.patch
+mm-add-missing-release-barrier-on-pgdat_reclaim_locked-unlock.patch
+mm-hwpoison-do-not-send-sigbus-to-processes-with-recovered-clean-pages.patch
diff --git a/queue-6.1/sparc-mm-disable-preemption-in-lazy-mmu-mode.patch b/queue-6.1/sparc-mm-disable-preemption-in-lazy-mmu-mode.patch
new file mode 100644
index 0000000000..ade1fb51da
--- /dev/null
+++ b/queue-6.1/sparc-mm-disable-preemption-in-lazy-mmu-mode.patch
@@ -0,0 +1,70 @@
+From a1d416bf9faf4f4871cb5a943614a07f80a7d70f Mon Sep 17 00:00:00 2001
+From: Ryan Roberts
+Date: Mon, 3 Mar 2025 14:15:37 +0000
+Subject: sparc/mm: disable preemption in lazy mmu mode
+
+From: Ryan Roberts
+
+commit a1d416bf9faf4f4871cb5a943614a07f80a7d70f upstream.
+
+Since commit 38e0edb15bd0 ("mm/apply_to_range: call pte function with lazy
+updates") it's been possible for arch_[enter|leave]_lazy_mmu_mode() to be
+called without holding a page table lock (for the kernel mappings case),
+and therefore it is possible that preemption may occur while in the lazy
+mmu mode. The Sparc lazy mmu implementation is not robust to preemption
+since it stores the lazy mode state in a per-cpu structure and does not
+attempt to manage that state on task switch.
+
+Powerpc had the same issue and fixed it by explicitly disabling preemption
+in arch_enter_lazy_mmu_mode() and re-enabling in
+arch_leave_lazy_mmu_mode(). See commit b9ef323ea168 ("powerpc/64s:
+Disable preemption in hash lazy mmu mode").
+
+Given Sparc's lazy mmu mode is based on powerpc's, let's fix it in the
+same way here.
+
+Link: https://lkml.kernel.org/r/20250303141542.3371656-4-ryan.roberts@arm.com
+Fixes: 38e0edb15bd0 ("mm/apply_to_range: call pte function with lazy updates")
+Signed-off-by: Ryan Roberts
+Acked-by: David Hildenbrand
+Acked-by: Andreas Larsson
+Acked-by: Juergen Gross
+Cc: Borislav Petkov
+Cc: Boris Ostrovsky
+Cc: Catalin Marinas
+Cc: Dave Hansen
+Cc: David S. Miller
+Cc: "H. Peter Anvin"
+Cc: Ingo Molnar
+Cc: Juergen Gross
+Cc: Matthew Wilcox (Oracle)
+Cc: Thomas Gleixner
+Cc:
+Signed-off-by: Andrew Morton
+Signed-off-by: Greg Kroah-Hartman
+---
+ arch/sparc/mm/tlb.c | 5 ++++-
+ 1 file changed, 4 insertions(+), 1 deletion(-)
+
+--- a/arch/sparc/mm/tlb.c
++++ b/arch/sparc/mm/tlb.c
+@@ -52,8 +52,10 @@ out:
+
+ void arch_enter_lazy_mmu_mode(void)
+ {
+- struct tlb_batch *tb = this_cpu_ptr(&tlb_batch);
++ struct tlb_batch *tb;
+
++ preempt_disable();
++ tb = this_cpu_ptr(&tlb_batch);
+ tb->active = 1;
+ }
+
+@@ -64,6 +66,7 @@ void arch_leave_lazy_mmu_mode(void)
+ if (tb->tlb_nr)
+ flush_tlb_pending();
+ tb->active = 0;
++ preempt_enable();
+ }
+
+ static void tlb_batch_add_one(struct mm_struct *mm, unsigned long vaddr,
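
For readers skimming the queue, the ordering bug fixed by the first patch above (PGDAT_RECLAIM_LOCKED) is the easiest to see with a small standalone sketch. The snippet below is an illustrative userspace analogue using C11 atomics, not kernel code from the patch; the names try_lock_reclaim(), unlock_reclaim(), RECLAIM_LOCKED and reclaim_progress are made up for this example. It mirrors the pattern: a bit "lock" taken with a fully ordered test-and-set but released with a relaxed clear can let the next acquirer miss stores made inside the critical section on weakly-ordered CPUs, which is why the patch switches the unlock to the release-ordered clear_bit_unlock().

```c
/*
 * Illustrative userspace analogue of the PGDAT_RECLAIM_LOCKED bit-lock
 * pattern (assumed names, not kernel API).  Build with: cc -std=c11 bitlock.c
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_uint flags = 0;        /* stand-in for pgdat->flags */
#define RECLAIM_LOCKED (1u << 0)     /* stand-in for PGDAT_RECLAIM_LOCKED */
static int reclaim_progress;         /* data published by the critical section */

static bool try_lock_reclaim(void)
{
	/* test_and_set_bit() analogue: fully ordered, acts as the acquire. */
	return !(atomic_fetch_or(&flags, RECLAIM_LOCKED) & RECLAIM_LOCKED);
}

static void unlock_reclaim(void)
{
	/*
	 * clear_bit() analogue would be a relaxed clear, which provides no
	 * ordering against the stores done while the bit was held -- the bug:
	 *   atomic_fetch_and_explicit(&flags, ~RECLAIM_LOCKED,
	 *                             memory_order_relaxed);
	 *
	 * clear_bit_unlock() analogue: release ordering -- the fix.
	 */
	atomic_fetch_and_explicit(&flags, ~RECLAIM_LOCKED, memory_order_release);
}

int main(void)
{
	if (try_lock_reclaim()) {
		reclaim_progress++;   /* store inside the critical section */
		unlock_reclaim();     /* must not be reordered before the store */
	}
	printf("progress=%d\n", reclaim_progress);
	return 0;
}
```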