From: Greg Kroah-Hartman
Date: Tue, 8 Jul 2025 15:37:08 +0000 (+0200)
Subject: 6.12-stable patches
X-Git-Tag: v5.15.187~18
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=c6dc2cb8332ae0cc7b322b5d67ab49c811b94c79;p=thirdparty%2Fkernel%2Fstable-queue.git

6.12-stable patches

added patches:
	mm-userfaultfd-fix-race-of-userfaultfd_move-and-swap-cache.patch
	mm-vmalloc-fix-data-race-in-show_numa_info.patch
	powerpc-kernel-fix-ppc_save_regs-inclusion-in-build.patch
---

diff --git a/queue-6.12/crypto-powerpc-poly1305-add-depends-on-broken-for-no.patch b/queue-6.12/crypto-powerpc-poly1305-add-depends-on-broken-for-no.patch
deleted file mode 100644
index 312940dd6a..0000000000
--- a/queue-6.12/crypto-powerpc-poly1305-add-depends-on-broken-for-no.patch
+++ /dev/null
@@ -1,55 +0,0 @@
-From c5eef0e2402ded238d0a8469208de6080139fd9e Mon Sep 17 00:00:00 2001
-From: Sasha Levin
-Date: Tue, 20 May 2025 10:39:29 +0800
-Subject: crypto: powerpc/poly1305 - add depends on BROKEN for now
-
-From: Eric Biggers
-
-[ Upstream commit bc8169003b41e89fe7052e408cf9fdbecb4017fe ]
-
-As discussed in the thread containing
-https://lore.kernel.org/linux-crypto/20250510053308.GB505731@sol/, the
-Power10-optimized Poly1305 code is currently not safe to call in softirq
-context. Disable it for now. It can be re-enabled once it is fixed.
-
-Fixes: ba8f8624fde2 ("crypto: poly1305-p10 - Glue code for optmized Poly1305 implementation for ppc64le")
-Cc: stable@vger.kernel.org
-Signed-off-by: Eric Biggers
-Signed-off-by: Herbert Xu
-Signed-off-by: Sasha Levin
----
- arch/powerpc/lib/crypto/Kconfig | 22 ++++++++++++++++++++++
- 1 file changed, 22 insertions(+)
- create mode 100644 arch/powerpc/lib/crypto/Kconfig
-
-diff --git a/arch/powerpc/lib/crypto/Kconfig b/arch/powerpc/lib/crypto/Kconfig
-new file mode 100644
-index 0000000000000..3f9e1bbd9905b
---- /dev/null
-+++ b/arch/powerpc/lib/crypto/Kconfig
-@@ -0,0 +1,22 @@
-+# SPDX-License-Identifier: GPL-2.0-only
-+
-+config CRYPTO_CHACHA20_P10
-+	tristate
-+	depends on PPC64 && CPU_LITTLE_ENDIAN && VSX
-+	default CRYPTO_LIB_CHACHA
-+	select CRYPTO_LIB_CHACHA_GENERIC
-+	select CRYPTO_ARCH_HAVE_LIB_CHACHA
-+
-+config CRYPTO_POLY1305_P10
-+	tristate
-+	depends on PPC64 && CPU_LITTLE_ENDIAN && VSX
-+	depends on BROKEN # Needs to be fixed to work in softirq context
-+	default CRYPTO_LIB_POLY1305
-+	select CRYPTO_ARCH_HAVE_LIB_POLY1305
-+	select CRYPTO_LIB_POLY1305_GENERIC
-+
-+config CRYPTO_SHA256_PPC_SPE
-+	tristate
-+	depends on SPE
-+	default CRYPTO_LIB_SHA256
-+	select CRYPTO_ARCH_HAVE_LIB_SHA256
---
-2.39.5
-
diff --git a/queue-6.12/mm-userfaultfd-fix-race-of-userfaultfd_move-and-swap-cache.patch b/queue-6.12/mm-userfaultfd-fix-race-of-userfaultfd_move-and-swap-cache.patch
new file mode 100644
index 0000000000..2eca1ef77c
--- /dev/null
+++ b/queue-6.12/mm-userfaultfd-fix-race-of-userfaultfd_move-and-swap-cache.patch
@@ -0,0 +1,200 @@
+From 0ea148a799198518d8ebab63ddd0bb6114a103bc Mon Sep 17 00:00:00 2001
+From: Kairui Song
+Date: Wed, 4 Jun 2025 23:10:38 +0800
+Subject: mm: userfaultfd: fix race of userfaultfd_move and swap cache
+
+From: Kairui Song
+
+commit 0ea148a799198518d8ebab63ddd0bb6114a103bc upstream.
+
+This commit fixes two kinds of races, they may have different results:
+
+Barry reported a BUG_ON in commit c50f8e6053b0, we may see the same
+BUG_ON if the filemap lookup returned NULL and folio is added to swap
+cache after that.
+
+If another kind of race is triggered (folio changed after lookup) we
+may see RSS counter is corrupted:
+
+[ 406.893936] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0
+type:MM_ANONPAGES val:-1
+[ 406.894071] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0
+type:MM_SHMEMPAGES val:1
+
+Because the folio is being accounted to the wrong VMA.
+
+I'm not sure if there will be any data corruption though, seems no.
+The issues above are critical already.
+
+
+On seeing a swap entry PTE, userfaultfd_move does a lockless swap cache
+lookup, and tries to move the found folio to the faulting vma. Currently,
+it relies on checking the PTE value to ensure that the moved folio still
+belongs to the src swap entry and that no new folio has been added to the
+swap cache, which turns out to be unreliable.
+
+While working and reviewing the swap table series with Barry, following
+existing races are observed and reproduced [1]:
+
+In the example below, move_pages_pte is moving src_pte to dst_pte, where
+src_pte is a swap entry PTE holding swap entry S1, and S1 is not in the
+swap cache:
+
+CPU1                               CPU2
+userfaultfd_move
+  move_pages_pte()
+    entry = pte_to_swp_entry(orig_src_pte);
+    // Here it got entry = S1
+    ... < interrupted> ...
+                                   <swapin src_pte, alloc and use folio A>
+                                   // folio A is a new allocated folio
+                                   // and get installed into src_pte
+                                   <frees swap entry S1>
+                                   // src_pte now points to folio A, S1
+                                   // has swap count == 0, it can be freed
+                                   // by folio_swap_swap or swap
+                                   // allocator's reclaim.
+                                   <try to swap out another folio B>
+                                   // folio B is a folio in another VMA.
+                                   <put folio B to swap cache using S1 >
+                                   // S1 is freed, folio B can use it
+                                   // for swap out with no problem.
+                                   ...
+    folio = filemap_get_folio(S1)
+    // Got folio B here !!!
+    ... < interrupted again> ...
+                                   <swapin folio B and free S1>
+                                   // Now S1 is free to be used again.
+                                   <swapout src_pte & folio A using S1>
+                                   // Now src_pte is a swap entry PTE
+                                   // holding S1 again.
+    folio_trylock(folio)
+    move_swap_pte
+      double_pt_lock
+      is_pte_pages_stable
+      // Check passed because src_pte == S1
+      folio_move_anon_rmap(...)
+      // Moved invalid folio B here !!!
+
+The race window is very short and requires multiple collisions of multiple
+rare events, so it's very unlikely to happen, but with a deliberately
+constructed reproducer and increased time window, it can be reproduced
+easily.
+
+This can be fixed by checking if the folio returned by filemap is the
+valid swap cache folio after acquiring the folio lock.
+
+Another similar race is possible: filemap_get_folio may return NULL, but
+folio (A) could be swapped in and then swapped out again using the same
+swap entry after the lookup. In such a case, folio (A) may remain in the
+swap cache, so it must be moved too:
+
+CPU1                               CPU2
+userfaultfd_move
+  move_pages_pte()
+    entry = pte_to_swp_entry(orig_src_pte);
+    // Here it got entry = S1, and S1 is not in swap cache
+    folio = filemap_get_folio(S1)
+    // Got NULL
+    ... < interrupted again> ...
+                                   <swapin folio A>
+                                   <swapout folio A using S1>
+    move_swap_pte
+      double_pt_lock
+      is_pte_pages_stable
+      // Check passed because src_pte == S1
+      folio_move_anon_rmap(...)
+      // folio A is ignored !!!
+
+Fix this by checking the swap cache again after acquiring the src_pte
+lock. And to avoid the filemap overhead, we check swap_map directly [2].
+
+The SWP_SYNCHRONOUS_IO path does make the problem more complex, but so far
+we don't need to worry about that, since folios can only be exposed to the
+swap cache in the swap out path, and this is covered in this patch by
+checking the swap cache again after acquiring the src_pte lock.
+
+Testing with a simple C program that allocates and moves several GB of
+memory did not show any observable performance change.
+
+Link: https://lkml.kernel.org/r/20250604151038.21968-1-ryncsn@gmail.com
+Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
+Signed-off-by: Kairui Song
+Closes: https://lore.kernel.org/linux-mm/CAMgjq7B1K=6OOrK2OUZ0-tqCzi+EJt+2_K97TPGoSt=9+JwP7Q@mail.gmail.com/ [1]
+Link: https://lore.kernel.org/all/CAGsJ_4yJhJBo16XhiC-nUzSheyX-V3-nFE+tAi=8Y560K8eT=A@mail.gmail.com/ [2]
+Reviewed-by: Lokesh Gidra
+Acked-by: Peter Xu
+Reviewed-by: Suren Baghdasaryan
+Reviewed-by: Barry Song
+Reviewed-by: Chris Li
+Cc: Andrea Arcangeli
+Cc: David Hildenbrand
+Cc: Kairui Song
+Cc:
+Signed-off-by: Andrew Morton
+(cherry picked from commit 0ea148a799198518d8ebab63ddd0bb6114a103bc)
+[ lokeshgidra: resolved merged conflict caused by the difference in
+  move_swap_pte() arguments ]
+Signed-off-by: Lokesh Gidra
+Signed-off-by: Greg Kroah-Hartman
+---
+ mm/userfaultfd.c |   33 +++++++++++++++++++++++++++++++--
+ 1 file changed, 31 insertions(+), 2 deletions(-)
+
+--- a/mm/userfaultfd.c
++++ b/mm/userfaultfd.c
+@@ -1078,8 +1078,18 @@ static int move_swap_pte(struct mm_struc
+ 			 pte_t *dst_pte, pte_t *src_pte,
+ 			 pte_t orig_dst_pte, pte_t orig_src_pte,
+ 			 spinlock_t *dst_ptl, spinlock_t *src_ptl,
+-			 struct folio *src_folio)
++			 struct folio *src_folio,
++			 struct swap_info_struct *si, swp_entry_t entry)
+ {
++	/*
++	 * Check if the folio still belongs to the target swap entry after
++	 * acquiring the lock. Folio can be freed in the swap cache while
++	 * not locked.
++	 */
++	if (src_folio && unlikely(!folio_test_swapcache(src_folio) ||
++				  entry.val != src_folio->swap.val))
++		return -EAGAIN;
++
+ 	double_pt_lock(dst_ptl, src_ptl);
+
+ 	if (!pte_same(ptep_get(src_pte), orig_src_pte) ||
+@@ -1096,6 +1106,25 @@ static int move_swap_pte(struct mm_struc
+ 	if (src_folio) {
+ 		folio_move_anon_rmap(src_folio, dst_vma);
+ 		src_folio->index = linear_page_index(dst_vma, dst_addr);
++	} else {
++		/*
++		 * Check if the swap entry is cached after acquiring the src_pte
++		 * lock. Otherwise, we might miss a newly loaded swap cache folio.
++		 *
++		 * Check swap_map directly to minimize overhead, READ_ONCE is sufficient.
++		 * We are trying to catch newly added swap cache, the only possible case is
++		 * when a folio is swapped in and out again staying in swap cache, using the
++		 * same entry before the PTE check above. The PTL is acquired and released
++		 * twice, each time after updating the swap_map's flag. So holding
++		 * the PTL here ensures we see the updated value. False positive is possible,
++		 * e.g. SWP_SYNCHRONOUS_IO swapin may set the flag without touching the
++		 * cache, or during the tiny synchronization window between swap cache and
++		 * swap_map, but it will be gone very quickly, worst result is retry jitters.
++		 */
++		if (READ_ONCE(si->swap_map[swp_offset(entry)]) & SWAP_HAS_CACHE) {
++			double_pt_unlock(dst_ptl, src_ptl);
++			return -EAGAIN;
++		}
+ 	}
+
+ 	orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte);
+@@ -1391,7 +1420,7 @@ retry:
+ 		}
+ 		err = move_swap_pte(mm, dst_vma, dst_addr, src_addr, dst_pte, src_pte,
+ 				orig_dst_pte, orig_src_pte,
+-				dst_ptl, src_ptl, src_folio);
++				dst_ptl, src_ptl, src_folio, si, entry);
+ 	}
+
+ out:
diff --git a/queue-6.12/mm-vmalloc-fix-data-race-in-show_numa_info.patch b/queue-6.12/mm-vmalloc-fix-data-race-in-show_numa_info.patch
new file mode 100644
index 0000000000..b56a9dc04a
--- /dev/null
+++ b/queue-6.12/mm-vmalloc-fix-data-race-in-show_numa_info.patch
@@ -0,0 +1,164 @@
+From 5c5f0468d172ddec2e333d738d2a1f85402cf0bc Mon Sep 17 00:00:00 2001
+From: Jeongjun Park
+Date: Fri, 9 May 2025 01:56:20 +0900
+Subject: mm/vmalloc: fix data race in show_numa_info()
+
+From: Jeongjun Park
+
+commit 5c5f0468d172ddec2e333d738d2a1f85402cf0bc upstream.
+
+The following data-race was found in show_numa_info():
+
+==================================================================
+BUG: KCSAN: data-race in vmalloc_info_show / vmalloc_info_show
+
+read to 0xffff88800971fe30 of 4 bytes by task 8289 on cpu 0:
+ show_numa_info mm/vmalloc.c:4936 [inline]
+ vmalloc_info_show+0x5a8/0x7e0 mm/vmalloc.c:5016
+ seq_read_iter+0x373/0xb40 fs/seq_file.c:230
+ proc_reg_read_iter+0x11e/0x170 fs/proc/inode.c:299
+....
+
+write to 0xffff88800971fe30 of 4 bytes by task 8287 on cpu 1:
+ show_numa_info mm/vmalloc.c:4934 [inline]
+ vmalloc_info_show+0x38f/0x7e0 mm/vmalloc.c:5016
+ seq_read_iter+0x373/0xb40 fs/seq_file.c:230
+ proc_reg_read_iter+0x11e/0x170 fs/proc/inode.c:299
+....
+
+value changed: 0x0000008f -> 0x00000000
+==================================================================
+
+According to this report,there is a read/write data-race because
+m->private is accessible to multiple CPUs. To fix this, instead of
+allocating the heap in proc_vmalloc_init() and passing the heap address to
+m->private, vmalloc_info_show() should allocate the heap.
+
+Link: https://lkml.kernel.org/r/20250508165620.15321-1-aha310510@gmail.com
+Fixes: 8e1d743f2c26 ("mm: vmalloc: support multiple nodes in vmallocinfo")
+Signed-off-by: Jeongjun Park
+Suggested-by: Eric Dumazet
+Suggested-by: Andrew Morton
+Reviewed-by: "Uladzislau Rezki (Sony)"
+Signed-off-by: Andrew Morton
+Signed-off-by: Greg Kroah-Hartman
+---
+ mm/vmalloc.c |   63 ++++++++++++++++++++++++++++++++---------------------------
+ 1 file changed, 35 insertions(+), 28 deletions(-)
+
+--- a/mm/vmalloc.c
++++ b/mm/vmalloc.c
+@@ -3095,7 +3095,7 @@ static void clear_vm_uninitialized_flag(
+ 	/*
+ 	 * Before removing VM_UNINITIALIZED,
+ 	 * we should make sure that vm has proper values.
+-	 * Pair with smp_rmb() in show_numa_info().
++	 * Pair with smp_rmb() in vread_iter() and vmalloc_info_show().
+ 	 */
+ 	smp_wmb();
+ 	vm->flags &= ~VM_UNINITIALIZED;
+@@ -4938,28 +4938,29 @@ bool vmalloc_dump_obj(void *object)
+ #endif
+
+ #ifdef CONFIG_PROC_FS
+-static void show_numa_info(struct seq_file *m, struct vm_struct *v)
+-{
+-	if (IS_ENABLED(CONFIG_NUMA)) {
+-		unsigned int nr, *counters = m->private;
+-		unsigned int step = 1U << vm_area_page_order(v);
+
+-		if (!counters)
+-			return;
++/*
++ * Print number of pages allocated on each memory node.
++ *
++ * This function can only be called if CONFIG_NUMA is enabled
++ * and VM_UNINITIALIZED bit in v->flags is disabled.
++ */
++static void show_numa_info(struct seq_file *m, struct vm_struct *v,
++			   unsigned int *counters)
++{
++	unsigned int nr;
++	unsigned int step = 1U << vm_area_page_order(v);
+
+-		if (v->flags & VM_UNINITIALIZED)
+-			return;
+-		/* Pair with smp_wmb() in clear_vm_uninitialized_flag() */
+-		smp_rmb();
++	if (!counters)
++		return;
+
+-		memset(counters, 0, nr_node_ids * sizeof(unsigned int));
++	memset(counters, 0, nr_node_ids * sizeof(unsigned int));
+
+-		for (nr = 0; nr < v->nr_pages; nr += step)
+-			counters[page_to_nid(v->pages[nr])] += step;
+-		for_each_node_state(nr, N_HIGH_MEMORY)
+-			if (counters[nr])
+-				seq_printf(m, " N%u=%u", nr, counters[nr]);
+-	}
++	for (nr = 0; nr < v->nr_pages; nr += step)
++		counters[page_to_nid(v->pages[nr])] += step;
++	for_each_node_state(nr, N_HIGH_MEMORY)
++		if (counters[nr])
++			seq_printf(m, " N%u=%u", nr, counters[nr]);
+ }
+
+ static void show_purge_info(struct seq_file *m)
+@@ -4987,6 +4988,10 @@ static int vmalloc_info_show(struct seq_
+ 	struct vmap_area *va;
+ 	struct vm_struct *v;
+ 	int i;
++	unsigned int *counters;
++
++	if (IS_ENABLED(CONFIG_NUMA))
++		counters = kmalloc(nr_node_ids * sizeof(unsigned int), GFP_KERNEL);
+
+ 	for (i = 0; i < nr_vmap_nodes; i++) {
+ 		vn = &vmap_nodes[i];
+@@ -5003,6 +5008,11 @@ static int vmalloc_info_show(struct seq_
+ 			}
+
+ 			v = va->vm;
++			if (v->flags & VM_UNINITIALIZED)
++				continue;
++
++			/* Pair with smp_wmb() in clear_vm_uninitialized_flag() */
++			smp_rmb();
+
+ 			seq_printf(m, "0x%pK-0x%pK %7ld",
+ 				v->addr, v->addr + v->size, v->size);
+@@ -5037,7 +5047,9 @@ static int vmalloc_info_show(struct seq_
+ 			if (is_vmalloc_addr(v->pages))
+ 				seq_puts(m, " vpages");
+
+-			show_numa_info(m, v);
++			if (IS_ENABLED(CONFIG_NUMA))
++				show_numa_info(m, v, counters);
++
+ 			seq_putc(m, '\n');
+ 		}
+ 		spin_unlock(&vn->busy.lock);
+@@ -5047,19 +5059,14 @@ static int vmalloc_info_show(struct seq_
+ 	 * As a final step, dump "unpurged" areas.
+ 	 */
+ 	show_purge_info(m);
++	if (IS_ENABLED(CONFIG_NUMA))
++		kfree(counters);
+ 	return 0;
+ }
+
+ static int __init proc_vmalloc_init(void)
+ {
+-	void *priv_data = NULL;
+-
+-	if (IS_ENABLED(CONFIG_NUMA))
+-		priv_data = kmalloc(nr_node_ids * sizeof(unsigned int), GFP_KERNEL);
+-
+-	proc_create_single_data("vmallocinfo",
+-			0400, NULL, vmalloc_info_show, priv_data);
+-
++	proc_create_single("vmallocinfo", 0400, NULL, vmalloc_info_show);
+ 	return 0;
+ }
+ module_init(proc_vmalloc_init);
diff --git a/queue-6.12/powerpc-kernel-fix-ppc_save_regs-inclusion-in-build.patch b/queue-6.12/powerpc-kernel-fix-ppc_save_regs-inclusion-in-build.patch
new file mode 100644
index 0000000000..16c7439f29
--- /dev/null
+++ b/queue-6.12/powerpc-kernel-fix-ppc_save_regs-inclusion-in-build.patch
@@ -0,0 +1,48 @@
+From 93bd4a80efeb521314485a06d8c21157240497bb Mon Sep 17 00:00:00 2001
+From: Madhavan Srinivasan
+Date: Sun, 11 May 2025 09:41:11 +0530
+Subject: powerpc/kernel: Fix ppc_save_regs inclusion in build
+
+From: Madhavan Srinivasan
+
+commit 93bd4a80efeb521314485a06d8c21157240497bb upstream.
+
+Recent patch fixed an old commit
+'fc2a5a6161a2 ("powerpc/64s: ppc_save_regs is now needed for all 64s builds")'
+which is to include building of ppc_save_reg.c only when XMON
+and KEXEC_CORE and PPC_BOOK3S are enabled. This was valid, since
+ppc_save_regs was called only in replay_system_reset() of old
+irq.c which was under BOOK3S.
+
+But there has been multiple refactoring of irq.c and have
+added call to ppc_save_regs() from __replay_soft_interrupts
+-> replay_soft_interrupts which is part of irq_64.c included
+under CONFIG_PPC64. And since ppc_save_regs is called in
+CRASH_DUMP path as part of crash_setup_regs in kexec.h,
+CONFIG_PPC32 also needs it.
+
+So with this recent patch which enabled the building of
+ppc_save_regs.c caused a build break when none of these
+(XMON, KEXEC_CORE, BOOK3S) where enabled as part of config.
+Patch to enable building of ppc_save_regs.c by defaults.
+
+Signed-off-by: Madhavan Srinivasan
+Link: https://patch.msgid.link/20250511041111.841158-1-maddy@linux.ibm.com
+Cc: Guenter Roeck
+Signed-off-by: Greg Kroah-Hartman
+---
+ arch/powerpc/kernel/Makefile |    2 --
+ 1 file changed, 2 deletions(-)
+
+--- a/arch/powerpc/kernel/Makefile
++++ b/arch/powerpc/kernel/Makefile
+@@ -162,9 +162,7 @@ endif
+
+ obj64-$(CONFIG_PPC_TRANSACTIONAL_MEM)	+= tm.o
+
+-ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC_CORE)$(CONFIG_PPC_BOOK3S),)
+ obj-y				+= ppc_save_regs.o
+-endif
+
+ obj-$(CONFIG_EPAPR_PARAVIRT)	+= epapr_paravirt.o epapr_hcalls.o
+ obj-$(CONFIG_KVM_GUEST)		+= kvm.o kvm_emul.o
diff --git a/queue-6.12/series b/queue-6.12/series
index 3aa280218a..a09bf3e474 100644
--- a/queue-6.12/series
+++ b/queue-6.12/series
@@ -165,7 +165,6 @@ drm-i915-dp_mst-work-around-thunderbolt-sink-disconn.patch
 drm-amdgpu-add-kicker-fws-loading-for-gfx11-smu13-ps.patch
 drm-amd-display-add-more-checks-for-dsc-hubp-ono-gua.patch
 arm64-dts-qcom-x1e80100-crd-mark-l12b-and-l15b-alway.patch
-crypto-powerpc-poly1305-add-depends-on-broken-for-no.patch
 drm-amdgpu-mes-add-missing-locking-in-helper-functio.patch
 sched_ext-make-scx_group_set_weight-always-update-tg.patch
 scsi-lpfc-restore-clearing-of-nlp_unreg_inp-in-ndlp-.patch
@@ -223,3 +222,6 @@ platform-x86-think-lmi-create-ksets-consecutively.patch
 platform-x86-think-lmi-fix-kobject-cleanup.patch
 platform-x86-think-lmi-fix-sysfs-group-cleanup.patch
 usb-typec-displayport-fix-potential-deadlock.patch
+powerpc-kernel-fix-ppc_save_regs-inclusion-in-build.patch
+mm-vmalloc-fix-data-race-in-show_numa_info.patch
+mm-userfaultfd-fix-race-of-userfaultfd_move-and-swap-cache.patch