From: Greg Kroah-Hartman
Date: Sat, 21 Mar 2026 08:39:22 +0000 (+0100)
Subject: 5.15-stable patches
X-Git-Tag: v6.1.167~58
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=45f88f081cc36df9aba3a355fe8b3562773954a7;p=thirdparty%2Fkernel%2Fstable-queue.git

5.15-stable patches

added patches:
	mm-hugetlb-fix-copy_hugetlb_page_range-to-use-pt_share_count.patch
	mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather.patch
	mm-hugetlb-fix-hugetlb_pmd_shared.patch
	mm-hugetlb-fix-two-comments-related-to-huge_pmd_unshare.patch
	mm-hugetlb-make-detecting-shared-pte-more-reliable.patch
	mm-rmap-fix-two-comments-related-to-huge_pmd_unshare.patch
---

diff --git a/queue-5.15/mm-hugetlb-fix-copy_hugetlb_page_range-to-use-pt_share_count.patch b/queue-5.15/mm-hugetlb-fix-copy_hugetlb_page_range-to-use-pt_share_count.patch
new file mode 100644
index 0000000000..4cafa5640c
--- /dev/null
+++ b/queue-5.15/mm-hugetlb-fix-copy_hugetlb_page_range-to-use-pt_share_count.patch
@@ -0,0 +1,88 @@
+From 14967a9c7d247841b0312c48dcf8cd29e55a4cc8 Mon Sep 17 00:00:00 2001
+From: Jane Chu
+Date: Mon, 15 Sep 2025 18:45:20 -0600
+Subject: mm/hugetlb: fix copy_hugetlb_page_range() to use ->pt_share_count
+
+From: Jane Chu
+
+commit 14967a9c7d247841b0312c48dcf8cd29e55a4cc8 upstream.
+
+commit 59d9094df3d79 ("mm: hugetlb: independent PMD page table shared
+count") introduced ->pt_share_count dedicated to hugetlb PMD share count
+tracking, but omitted fixing copy_hugetlb_page_range(), leaving the
+function relying on page_count() for tracking, which no longer works.
+
+When lazy page table copying for hugetlb is disabled (that is, with commit
+bcd51a3c679d ("hugetlb: lazy page table copies in fork()") reverted),
+fork()'ing with hugetlb PMD sharing quickly locks up -
+
+[ 239.446559] watchdog: BUG: soft lockup - CPU#75 stuck for 27s!
+[ 239.446611] RIP: 0010:native_queued_spin_lock_slowpath+0x7e/0x2e0 +[ 239.446631] Call Trace: +[ 239.446633] +[ 239.446636] _raw_spin_lock+0x3f/0x60 +[ 239.446639] copy_hugetlb_page_range+0x258/0xb50 +[ 239.446645] copy_page_range+0x22b/0x2c0 +[ 239.446651] dup_mmap+0x3e2/0x770 +[ 239.446654] dup_mm.constprop.0+0x5e/0x230 +[ 239.446657] copy_process+0xd17/0x1760 +[ 239.446660] kernel_clone+0xc0/0x3e0 +[ 239.446661] __do_sys_clone+0x65/0xa0 +[ 239.446664] do_syscall_64+0x82/0x930 +[ 239.446668] ? count_memcg_events+0xd2/0x190 +[ 239.446671] ? syscall_trace_enter+0x14e/0x1f0 +[ 239.446676] ? syscall_exit_work+0x118/0x150 +[ 239.446677] ? arch_exit_to_user_mode_prepare.constprop.0+0x9/0xb0 +[ 239.446681] ? clear_bhb_loop+0x30/0x80 +[ 239.446684] ? clear_bhb_loop+0x30/0x80 +[ 239.446686] entry_SYSCALL_64_after_hwframe+0x76/0x7e + +There are two options to resolve the potential latent issue: + 1. warn against PMD sharing in copy_hugetlb_page_range(), + 2. fix it. +This patch opts for the second option. +While at it, simplify the comment, the details are not actually relevant +anymore. + +Link: https://lkml.kernel.org/r/20250916004520.1604530-1-jane.chu@oracle.com +Fixes: 59d9094df3d7 ("mm: hugetlb: independent PMD page table shared count") +Signed-off-by: Jane Chu +Reviewed-by: Harry Yoo +Acked-by: Oscar Salvador +Acked-by: David Hildenbrand +Cc: Jann Horn +Cc: Liu Shixin +Cc: Muchun Song +Signed-off-by: Andrew Morton +[ David: We don't have ptdesc and the wrappers, so work directly on the + page->pt_share_count. CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING is still + called CONFIG_ARCH_WANT_HUGE_PMD_SHARE. ] +Signed-off-by: David Hildenbrand (Arm) +Signed-off-by: Greg Kroah-Hartman +--- + mm/hugetlb.c | 13 ++++--------- + 1 file changed, 4 insertions(+), 9 deletions(-) + +--- a/mm/hugetlb.c ++++ b/mm/hugetlb.c +@@ -4341,16 +4341,11 @@ int copy_hugetlb_page_range(struct mm_st + break; + } + +- /* +- * If the pagetables are shared don't copy or take references. 
+- * +- * dst_pte == src_pte is the common case of src/dest sharing. +- * However, src could have 'unshared' and dst shares with +- * another vma. So page_count of ptep page is checked instead +- * to reliably determine whether pte is shared. +- */ +- if (page_count(virt_to_page(dst_pte)) > 1) ++#ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE ++ /* If the pagetables are shared, there is nothing to do */ ++ if (atomic_read(&virt_to_page(dst_pte)->pt_share_count)) + continue; ++#endif + + dst_ptl = huge_pte_lock(h, dst, dst_pte); + src_ptl = huge_pte_lockptr(h, src, src_pte); diff --git a/queue-5.15/mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather.patch b/queue-5.15/mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather.patch new file mode 100644 index 0000000000..affa282ed2 --- /dev/null +++ b/queue-5.15/mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather.patch @@ -0,0 +1,728 @@ +From 8ce720d5bd91e9dc16db3604aa4b1bf76770a9a1 Mon Sep 17 00:00:00 2001 +From: "David Hildenbrand (Red Hat)" +Date: Tue, 23 Dec 2025 22:40:37 +0100 +Subject: mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather + +From: David Hildenbrand (Red Hat) + +commit 8ce720d5bd91e9dc16db3604aa4b1bf76770a9a1 upstream. + +As reported, ever since commit 1013af4f585f ("mm/hugetlb: fix +huge_pmd_unshare() vs GUP-fast race") we can end up in some situations +where we perform so many IPI broadcasts when unsharing hugetlb PMD page +tables that it severely regresses some workloads. + +In particular, when we fork()+exit(), or when we munmap() a large +area backed by many shared PMD tables, we perform one IPI broadcast per +unshared PMD table. + +There are two optimizations to be had: + +(1) When we process (unshare) multiple such PMD tables, such as during + exit(), it is sufficient to send a single IPI broadcast (as long as + we respect locking rules) instead of one per PMD table. 
+
+    Locking prevents any of these PMD tables from getting reused before
+    we drop the lock.
+
+(2) When we are not the last sharer (> 2 users including us), there is
+    no need to send the IPI broadcast. The shared PMD tables cannot
+    become exclusive (fully unshared) before an IPI will be broadcasted
+    by the last sharer.
+
+    Concurrent GUP-fast could walk into a PMD table just before we
+    unshared it. It could then succeed in grabbing a page from the
+    shared page table even after munmap() etc succeeded (and suppressed
+    an IPI). But there is no difference compared to GUP-fast just
+    sleeping for a while after grabbing the page and re-enabling IRQs.
+
+    Most importantly, GUP-fast will never walk into page tables that are
+    no-longer shared, because the last sharer will issue an IPI
+    broadcast.
+
+    (if ever required, checking whether the PUD changed in GUP-fast
+    after grabbing the page like we do in the PTE case could handle
+    this)
+
+So let's rework PMD sharing TLB flushing + IPI sync to use the mmu_gather
+infrastructure so we can implement these optimizations and demystify the
+code at least a bit. Extend the mmu_gather infrastructure to be able to
+deal with our special hugetlb PMD table sharing implementation.
+
+To make initialization of the mmu_gather easier when working on a single
+VMA (in particular, when dealing with hugetlb), provide
+tlb_gather_mmu_vma().
+
+We'll consolidate the handling for (full) unsharing of PMD tables in
+tlb_unshare_pmd_ptdesc() and tlb_flush_unshared_tables(), and track
+in "struct mmu_gather" whether we had (full) unsharing of PMD tables.
+
+Because locking is very special (concurrent unsharing+reuse must be
+prevented), we disallow deferring flushing to tlb_finish_mmu() and instead
+require an explicit earlier call to tlb_flush_unshared_tables().
+
+From hugetlb code, we call huge_pmd_unshare_flush() where we make sure
+that the expected lock protecting us from concurrent unsharing+reuse is
+still held.
+
+Check with a VM_WARN_ON_ONCE() in tlb_finish_mmu() that
+tlb_flush_unshared_tables() was properly called earlier.
+
+Document it all properly.
+
+Notes about tlb_remove_table_sync_one() interaction with unsharing:
+
+There are two fairly tricky things:
+
+(1) tlb_remove_table_sync_one() is a NOP on architectures without
+    CONFIG_MMU_GATHER_RCU_TABLE_FREE.
+
+    Here, the assumption is that the previous TLB flush would send an
+    IPI to all relevant CPUs. Careful: some architectures like x86 only
+    send IPIs to all relevant CPUs when tlb->freed_tables is set.
+
+    The relevant architectures should be selecting
+    MMU_GATHER_RCU_TABLE_FREE, but x86 might not do that in stable
+    kernels and it might have been problematic before this patch.
+
+    Also, the arch flushing behavior (independent of IPIs) is different
+    when tlb->freed_tables is set. Do we have to enlighten them to also
+    take care of tlb->unshared_tables? So far we didn't care, so
+    hopefully we are fine. Of course, we could be setting
+    tlb->freed_tables as well, but that might then unnecessarily flush
+    too much, because the semantics of tlb->freed_tables are a bit
+    fuzzy.
+
+    This patch changes nothing in this regard.
+
+(2) tlb_remove_table_sync_one() is not a NOP on architectures with
+    CONFIG_MMU_GATHER_RCU_TABLE_FREE that actually don't need a sync.
+
+    Take x86 as an example: in the common case (!pv, !X86_FEATURE_INVLPGB)
+    we still issue IPIs during TLB flushes and don't actually need the
+    second tlb_remove_table_sync_one().
+
+    This optimization can be implemented on top of this, by checking e.g., in
+    tlb_remove_table_sync_one() whether we really need IPIs. But as
+    described in (1), it really must honor tlb->freed_tables then to
+    send IPIs to all relevant CPUs.
+
+Notes on TLB flushing changes:
+
+(1) Flushing for non-shared PMD tables
+
+    We're converting from flush_hugetlb_tlb_range() to
+    tlb_remove_huge_tlb_entry().
Given that we properly initialize the + MMU gather in tlb_gather_mmu_vma() to be hugetlb aware, similar to + __unmap_hugepage_range(), that should be fine. + +(2) Flushing for shared PMD tables + + We're converting from various things (flush_hugetlb_tlb_range(), + tlb_flush_pmd_range(), flush_tlb_range()) to tlb_flush_pmd_range(). + + tlb_flush_pmd_range() achieves the same that + tlb_remove_huge_tlb_entry() would achieve in these scenarios. + Note that tlb_remove_huge_tlb_entry() also calls + __tlb_remove_tlb_entry(), however that is only implemented on + powerpc, which does not support PMD table sharing. + + Similar to (1), tlb_gather_mmu_vma() should make sure that TLB + flushing keeps on working as expected. + +Further, note that the ptdesc_pmd_pts_dec() in huge_pmd_share() is not a +concern, as we are holding the i_mmap_lock the whole time, preventing +concurrent unsharing. That ptdesc_pmd_pts_dec() usage will be removed +separately as a cleanup later. + +There are plenty more cleanups to be had, but they have to wait until +this is fixed. + +[david@kernel.org: fix kerneldoc] + Link: https://lkml.kernel.org/r/f223dd74-331c-412d-93fc-69e360a5006c@kernel.org +Link: https://lkml.kernel.org/r/20251223214037.580860-5-david@kernel.org +Fixes: 1013af4f585f ("mm/hugetlb: fix huge_pmd_unshare() vs GUP-fast race") +Signed-off-by: David Hildenbrand (Red Hat) +Reported-by: Uschakow, Stanislav" +Closes: https://lore.kernel.org/all/4d3878531c76479d9f8ca9789dc6485d@amazon.de/ +Tested-by: Laurence Oberman +Acked-by: Harry Yoo +Reviewed-by: Lorenzo Stoakes +Cc: Lance Yang +Cc: Liu Shixin +Cc: Oscar Salvador +Cc: Rik van Riel +Cc: +Signed-off-by: Andrew Morton +[ David: We don't have ptdesc and the wrappers, so work directly on + page->pt_share_count and pass "struct page" instead of "struct ptdesc". + CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING is still called + CONFIG_ARCH_WANT_HUGE_PMD_SHARE and is set even without + CONFIG_HUGETLB_PAGE. 
We don't have 550a7d60bd5e ("mm, hugepages: add + mremap() support for hugepage backed vma"), so move_hugetlb_page_tables() + does not exist. We don't have 40549ba8f8e0 ("hugetlb: use new vma_lock + for pmd sharing synchronization") so changes in mm/rmap.c looks quite + different. We don't have 4ddb4d91b82f ("hugetlb: do not update address + in huge_pmd_unshare"), so huge_pmd_unshare() still gets a pointer to + an address. Some smaller contextual stuff. ] +Signed-off-by: David Hildenbrand (Arm) +Signed-off-by: Greg Kroah-Hartman +--- + include/asm-generic/tlb.h | 77 ++++++++++++++++++++++++++++++++- + include/linux/hugetlb.h | 15 ++++-- + include/linux/mm_types.h | 1 + mm/hugetlb.c | 107 +++++++++++++++++++++++++++------------------- + mm/mmu_gather.c | 33 ++++++++++++++ + mm/rmap.c | 20 ++++++-- + 6 files changed, 197 insertions(+), 56 deletions(-) + +--- a/include/asm-generic/tlb.h ++++ b/include/asm-generic/tlb.h +@@ -46,7 +46,8 @@ + * + * The mmu_gather API consists of: + * +- * - tlb_gather_mmu() / tlb_gather_mmu_fullmm() / tlb_finish_mmu() ++ * - tlb_gather_mmu() / tlb_gather_mmu_fullmm() / tlb_gather_mmu_vma() / ++ * tlb_finish_mmu() + * + * start and finish a mmu_gather + * +@@ -293,6 +294,20 @@ struct mmu_gather { + unsigned int vma_exec : 1; + unsigned int vma_huge : 1; + ++ /* ++ * Did we unshare (unmap) any shared page tables? For now only ++ * used for hugetlb PMD table sharing. ++ */ ++ unsigned int unshared_tables : 1; ++ ++ /* ++ * Did we unshare any page tables such that they are now exclusive ++ * and could get reused+modified by the new owner? When setting this ++ * flag, "unshared_tables" will be set as well. For now only used ++ * for hugetlb PMD table sharing. 
++ */ ++ unsigned int fully_unshared_tables : 1; ++ + unsigned int batch_count; + + #ifndef CONFIG_MMU_GATHER_NO_GATHER +@@ -329,6 +344,7 @@ static inline void __tlb_reset_range(str + tlb->cleared_pmds = 0; + tlb->cleared_puds = 0; + tlb->cleared_p4ds = 0; ++ tlb->unshared_tables = 0; + /* + * Do not reset mmu_gather::vma_* fields here, we do not + * call into tlb_start_vma() again to set them if there is an +@@ -424,7 +440,7 @@ static inline void tlb_flush_mmu_tlbonly + * these bits. + */ + if (!(tlb->freed_tables || tlb->cleared_ptes || tlb->cleared_pmds || +- tlb->cleared_puds || tlb->cleared_p4ds)) ++ tlb->cleared_puds || tlb->cleared_p4ds || tlb->unshared_tables)) + return; + + tlb_flush(tlb); +@@ -662,6 +678,63 @@ static inline void tlb_flush_p4d_range(s + } while (0) + #endif + ++#if defined(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) && defined(CONFIG_HUGETLB_PAGE) ++static inline void tlb_unshare_pmd_ptdesc(struct mmu_gather *tlb, struct page *pt, ++ unsigned long addr) ++{ ++ /* ++ * The caller must make sure that concurrent unsharing + exclusive ++ * reuse is impossible until tlb_flush_unshared_tables() was called. ++ */ ++ VM_WARN_ON_ONCE(!atomic_read(&pt->pt_share_count)); ++ atomic_dec(&pt->pt_share_count); ++ ++ /* Clearing a PUD pointing at a PMD table with PMD leaves. */ ++ tlb_flush_pmd_range(tlb, addr & PUD_MASK, PUD_SIZE); ++ ++ /* ++ * If the page table is now exclusively owned, we fully unshared ++ * a page table. ++ */ ++ if (!atomic_read(&pt->pt_share_count)) ++ tlb->fully_unshared_tables = true; ++ tlb->unshared_tables = true; ++} ++ ++static inline void tlb_flush_unshared_tables(struct mmu_gather *tlb) ++{ ++ /* ++ * As soon as the caller drops locks to allow for reuse of ++ * previously-shared tables, these tables could get modified and ++ * even reused outside of hugetlb context, so we have to make sure that ++ * any page table walkers (incl. TLB, GUP-fast) are aware of that ++ * change. 
++ * ++ * Even if we are not fully unsharing a PMD table, we must ++ * flush the TLB for the unsharer now. ++ */ ++ if (tlb->unshared_tables) ++ tlb_flush_mmu_tlbonly(tlb); ++ ++ /* ++ * Similarly, we must make sure that concurrent GUP-fast will not ++ * walk previously-shared page tables that are getting modified+reused ++ * elsewhere. So broadcast an IPI to wait for any concurrent GUP-fast. ++ * ++ * We only perform this when we are the last sharer of a page table, ++ * as the IPI will reach all CPUs: any GUP-fast. ++ * ++ * Note that on configs where tlb_remove_table_sync_one() is a NOP, ++ * the expectation is that the tlb_flush_mmu_tlbonly() would have issued ++ * required IPIs already for us. ++ */ ++ if (tlb->fully_unshared_tables) { ++ tlb_remove_table_sync_one(); ++ tlb->fully_unshared_tables = false; ++ } ++} ++#endif ++ + #endif /* CONFIG_MMU */ + + #endif /* _ASM_GENERIC__TLB_H */ +--- a/include/linux/hugetlb.h ++++ b/include/linux/hugetlb.h +@@ -190,8 +190,9 @@ pte_t *huge_pte_alloc(struct mm_struct * + unsigned long addr, unsigned long sz); + pte_t *huge_pte_offset(struct mm_struct *mm, + unsigned long addr, unsigned long sz); +-int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma, +- unsigned long *addr, pte_t *ptep); ++int huge_pmd_unshare(struct mmu_gather *tlb, struct vm_area_struct *vma, ++ unsigned long *addr, pte_t *ptep); ++void huge_pmd_unshare_flush(struct mmu_gather *tlb, struct vm_area_struct *vma); + void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma, + unsigned long *start, unsigned long *end); + struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address, +@@ -232,13 +233,17 @@ static inline struct address_space *huge + return NULL; + } + +-static inline int huge_pmd_unshare(struct mm_struct *mm, +- struct vm_area_struct *vma, +- unsigned long *addr, pte_t *ptep) ++static inline int huge_pmd_unshare(struct mmu_gather *tlb, ++ struct vm_area_struct *vma, unsigned long *addr, pte_t *ptep) + 
{ + return 0; + } + ++static inline void huge_pmd_unshare_flush(struct mmu_gather *tlb, ++ struct vm_area_struct *vma) ++{ ++} ++ + static inline void adjust_range_if_pmd_sharing_possible( + struct vm_area_struct *vma, + unsigned long *start, unsigned long *end) +--- a/include/linux/mm_types.h ++++ b/include/linux/mm_types.h +@@ -612,6 +612,7 @@ static inline cpumask_t *mm_cpumask(stru + struct mmu_gather; + extern void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm); + extern void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm); ++void tlb_gather_mmu_vma(struct mmu_gather *tlb, struct vm_area_struct *vma); + extern void tlb_finish_mmu(struct mmu_gather *tlb); + + static inline void init_tlb_flush_pending(struct mm_struct *mm) +--- a/mm/hugetlb.c ++++ b/mm/hugetlb.c +@@ -4463,7 +4463,6 @@ void __unmap_hugepage_range(struct mmu_g + struct hstate *h = hstate_vma(vma); + unsigned long sz = huge_page_size(h); + struct mmu_notifier_range range; +- bool force_flush = false; + + WARN_ON(!is_vm_hugetlb_page(vma)); + BUG_ON(start & ~huge_page_mask(h)); +@@ -4490,10 +4489,8 @@ void __unmap_hugepage_range(struct mmu_g + continue; + + ptl = huge_pte_lock(h, mm, ptep); +- if (huge_pmd_unshare(mm, vma, &address, ptep)) { ++ if (huge_pmd_unshare(tlb, vma, &address, ptep)) { + spin_unlock(ptl); +- tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE); +- force_flush = true; + continue; + } + +@@ -4551,14 +4548,7 @@ void __unmap_hugepage_range(struct mmu_g + mmu_notifier_invalidate_range_end(&range); + tlb_end_vma(tlb, vma); + +- /* +- * There is nothing protecting a previously-shared page table that we +- * unshared through huge_pmd_unshare() from getting freed after we +- * release i_mmap_rwsem, so flush the TLB now. If huge_pmd_unshare() +- * succeeded, flush the range corresponding to the pud. 
+- */ +- if (force_flush) +- tlb_flush_mmu_tlbonly(tlb); ++ huge_pmd_unshare_flush(tlb, vma); + } + + void __unmap_hugepage_range_final(struct mmu_gather *tlb, +@@ -5636,8 +5626,8 @@ unsigned long hugetlb_change_protection( + pte_t pte; + struct hstate *h = hstate_vma(vma); + unsigned long pages = 0; +- bool shared_pmd = false; + struct mmu_notifier_range range; ++ struct mmu_gather tlb; + + /* + * In the case of shared PMDs, the area to flush could be beyond +@@ -5650,6 +5640,7 @@ unsigned long hugetlb_change_protection( + + BUG_ON(address >= end); + flush_cache_range(vma, range.start, range.end); ++ tlb_gather_mmu_vma(&tlb, vma); + + mmu_notifier_invalidate_range_start(&range); + i_mmap_lock_write(vma->vm_file->f_mapping); +@@ -5659,10 +5650,9 @@ unsigned long hugetlb_change_protection( + if (!ptep) + continue; + ptl = huge_pte_lock(h, mm, ptep); +- if (huge_pmd_unshare(mm, vma, &address, ptep)) { ++ if (huge_pmd_unshare(&tlb, vma, &address, ptep)) { + pages++; + spin_unlock(ptl); +- shared_pmd = true; + continue; + } + pte = huge_ptep_get(ptep); +@@ -5695,21 +5685,15 @@ unsigned long hugetlb_change_protection( + pte = arch_make_huge_pte(pte, shift, vma->vm_flags); + huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte); + pages++; ++ tlb_remove_huge_tlb_entry(h, &tlb, ptep, address); + } + spin_unlock(ptl); + + cond_resched(); + } +- /* +- * There is nothing protecting a previously-shared page table that we +- * unshared through huge_pmd_unshare() from getting freed after we +- * release i_mmap_rwsem, so flush the TLB now. If huge_pmd_unshare() +- * succeeded, flush the range corresponding to the pud. +- */ +- if (shared_pmd) +- flush_hugetlb_tlb_range(vma, range.start, range.end); +- else +- flush_hugetlb_tlb_range(vma, start, end); ++ ++ tlb_flush_mmu_tlbonly(&tlb); ++ huge_pmd_unshare_flush(&tlb, vma); + /* + * No need to call mmu_notifier_invalidate_range() we are downgrading + * page table protection not changing it to point to a new page. 
+@@ -5718,6 +5702,7 @@ unsigned long hugetlb_change_protection( + */ + i_mmap_unlock_write(vma->vm_file->f_mapping); + mmu_notifier_invalidate_range_end(&range); ++ tlb_finish_mmu(&tlb); + + return pages << h->order; + } +@@ -6053,18 +6038,27 @@ out: + return pte; + } + +-/* +- * unmap huge page backed by shared pte. ++/** ++ * huge_pmd_unshare - Unmap a pmd table if it is shared by multiple users ++ * @tlb: the current mmu_gather. ++ * @vma: the vma covering the pmd table. ++ * @addr: pointer to the address we are trying to unshare. ++ * @ptep: pointer into the (pmd) page table. ++ * ++ * Called with the page table lock held, the i_mmap_rwsem held in write mode ++ * and the hugetlb vma lock held in write mode. + * +- * Called with page table lock held. ++ * Note: The caller must call huge_pmd_unshare_flush() before dropping the ++ * i_mmap_rwsem. + * +- * returns: 1 successfully unmapped a shared pte page +- * 0 the underlying pte page is not shared, or it is the last user ++ * Returns: 1 if it was a shared PMD table and it got unmapped, or 0 if it ++ * was not a shared PMD table. + */ +-int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma, +- unsigned long *addr, pte_t *ptep) ++int huge_pmd_unshare(struct mmu_gather *tlb, struct vm_area_struct *vma, ++ unsigned long *addr, pte_t *ptep) + { + unsigned long sz = huge_page_size(hstate_vma(vma)); ++ struct mm_struct *mm = vma->vm_mm; + pgd_t *pgd = pgd_offset(mm, *addr); + p4d_t *p4d = p4d_offset(pgd, *addr); + pud_t *pud = pud_offset(p4d, *addr); +@@ -6076,14 +6070,8 @@ int huge_pmd_unshare(struct mm_struct *m + return 0; + + pud_clear(pud); +- /* +- * Once our caller drops the rmap lock, some other process might be +- * using this page table as a normal, non-hugetlb page table. +- * Wait for pending gup_fast() in other threads to finish before letting +- * that happen. 
+- */ +- tlb_remove_table_sync_one(); +- atomic_dec(&virt_to_page(ptep)->pt_share_count); ++ tlb_unshare_pmd_ptdesc(tlb, virt_to_page(ptep), *addr); ++ + mm_dec_nr_pmds(mm); + /* + * This update of passed address optimizes loops sequentially +@@ -6096,6 +6084,29 @@ int huge_pmd_unshare(struct mm_struct *m + return 1; + } + ++/* ++ * huge_pmd_unshare_flush - Complete a sequence of huge_pmd_unshare() calls ++ * @tlb: the current mmu_gather. ++ * @vma: the vma covering the pmd table. ++ * ++ * Perform necessary TLB flushes or IPI broadcasts to synchronize PMD table ++ * unsharing with concurrent page table walkers. ++ * ++ * This function must be called after a sequence of huge_pmd_unshare() ++ * calls while still holding the i_mmap_rwsem. ++ */ ++void huge_pmd_unshare_flush(struct mmu_gather *tlb, struct vm_area_struct *vma) ++{ ++ /* ++ * We must synchronize page table unsharing such that nobody will ++ * try reusing a previously-shared page table while it might still ++ * be in use by previous sharers (TLB, GUP_fast). 
++ */ ++ i_mmap_assert_write_locked(vma->vm_file->f_mapping); ++ ++ tlb_flush_unshared_tables(tlb); ++} ++ + #else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */ + pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long addr, pud_t *pud) +@@ -6103,12 +6114,16 @@ pte_t *huge_pmd_share(struct mm_struct * + return NULL; + } + +-int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma, +- unsigned long *addr, pte_t *ptep) ++int huge_pmd_unshare(struct mmu_gather *tlb, struct vm_area_struct *vma, ++ unsigned long *addr, pte_t *ptep) + { + return 0; + } + ++void huge_pmd_unshare_flush(struct mmu_gather *tlb, struct vm_area_struct *vma) ++{ ++} ++ + void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma, + unsigned long *start, unsigned long *end) + { +@@ -6387,6 +6402,7 @@ static void hugetlb_unshare_pmds(struct + unsigned long sz = huge_page_size(h); + struct mm_struct *mm = vma->vm_mm; + struct mmu_notifier_range range; ++ struct mmu_gather tlb; + unsigned long address; + spinlock_t *ptl; + pte_t *ptep; +@@ -6397,6 +6413,8 @@ static void hugetlb_unshare_pmds(struct + if (start >= end) + return; + ++ tlb_gather_mmu_vma(&tlb, vma); ++ + /* + * No need to call adjust_range_if_pmd_sharing_possible(), because + * we have already done the PUD_SIZE alignment. +@@ -6417,10 +6435,10 @@ static void hugetlb_unshare_pmds(struct + continue; + ptl = huge_pte_lock(h, mm, ptep); + /* We don't want 'address' to be changed */ +- huge_pmd_unshare(mm, vma, &tmp, ptep); ++ huge_pmd_unshare(&tlb, vma, &tmp, ptep); + spin_unlock(ptl); + } +- flush_hugetlb_tlb_range(vma, start, end); ++ huge_pmd_unshare_flush(&tlb, vma); + if (take_locks) { + i_mmap_unlock_write(vma->vm_file->f_mapping); + } +@@ -6429,6 +6447,7 @@ static void hugetlb_unshare_pmds(struct + * Documentation/vm/mmu_notifier.rst. 
+ */ + mmu_notifier_invalidate_range_end(&range); ++ tlb_finish_mmu(&tlb); + } + + /* +--- a/mm/mmu_gather.c ++++ b/mm/mmu_gather.c +@@ -7,6 +7,7 @@ + #include + #include + #include ++#include + + #include + #include +@@ -267,6 +268,7 @@ static void __tlb_gather_mmu(struct mmu_ + tlb->page_size = 0; + #endif + ++ tlb->fully_unshared_tables = 0; + __tlb_reset_range(tlb); + inc_tlb_flush_pending(tlb->mm); + } +@@ -301,6 +303,31 @@ void tlb_gather_mmu_fullmm(struct mmu_ga + } + + /** ++ * tlb_gather_mmu_vma - initialize an mmu_gather structure for operating on a ++ * single VMA ++ * @tlb: the mmu_gather structure to initialize ++ * @vma: the vm_area_struct ++ * ++ * Called to initialize an (on-stack) mmu_gather structure for operating on ++ * a single VMA. In contrast to tlb_gather_mmu(), calling this function will ++ * not require another call to tlb_start_vma(). In contrast to tlb_start_vma(), ++ * this function will *not* call flush_cache_range(). ++ * ++ * For hugetlb VMAs, this function will also initialize the mmu_gather ++ * page_size accordingly, not requiring a separate call to ++ * tlb_change_page_size(). ++ * ++ */ ++void tlb_gather_mmu_vma(struct mmu_gather *tlb, struct vm_area_struct *vma) ++{ ++ tlb_gather_mmu(tlb, vma->vm_mm); ++ tlb_update_vma_flags(tlb, vma); ++ if (is_vm_hugetlb_page(vma)) ++ /* All entries have the same size. */ ++ tlb_change_page_size(tlb, huge_page_size(hstate_vma(vma))); ++} ++ ++/** + * tlb_finish_mmu - finish an mmu_gather structure + * @tlb: the mmu_gather structure to finish + * +@@ -310,6 +337,12 @@ void tlb_gather_mmu_fullmm(struct mmu_ga + void tlb_finish_mmu(struct mmu_gather *tlb) + { + /* ++ * We expect an earlier huge_pmd_unshare_flush() call to sort this out, ++ * due to complicated locking requirements with page table unsharing. 
++ */ ++ VM_WARN_ON_ONCE(tlb->fully_unshared_tables); ++ ++ /* + * If there are parallel threads are doing PTE changes on same range + * under non-exclusive lock (e.g., mmap_lock read-side) but defer TLB + * flush by batching, one thread may end up seeing inconsistent PTEs +--- a/mm/rmap.c ++++ b/mm/rmap.c +@@ -74,7 +74,7 @@ + #include + #include + +-#include ++#include + + #include + +@@ -1469,13 +1469,16 @@ static bool try_to_unmap_one(struct page + address = pvmw.address; + + if (PageHuge(page) && !PageAnon(page)) { ++ struct mmu_gather tlb; ++ + /* + * To call huge_pmd_unshare, i_mmap_rwsem must be + * held in write mode. Caller needs to explicitly + * do this outside rmap routines. + */ + VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); +- if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) { ++ tlb_gather_mmu_vma(&tlb, vma); ++ if (huge_pmd_unshare(&tlb, vma, &address, pvmw.pte)) { + /* + * huge_pmd_unshare unmapped an entire PMD + * page. There is no way of knowing exactly +@@ -1484,9 +1487,10 @@ static bool try_to_unmap_one(struct page + * already adjusted above to cover this range. + */ + flush_cache_range(vma, range.start, range.end); +- flush_tlb_range(vma, range.start, range.end); ++ huge_pmd_unshare_flush(&tlb, vma); + mmu_notifier_invalidate_range(mm, range.start, + range.end); ++ tlb_finish_mmu(&tlb); + + /* + * The PMD table was unmapped, +@@ -1495,6 +1499,7 @@ static bool try_to_unmap_one(struct page + page_vma_mapped_walk_done(&pvmw); + break; + } ++ tlb_finish_mmu(&tlb); + } + + /* Nuke the page table entry. */ +@@ -1783,13 +1788,16 @@ static bool try_to_migrate_one(struct pa + address = pvmw.address; + + if (PageHuge(page) && !PageAnon(page)) { ++ struct mmu_gather tlb; ++ + /* + * To call huge_pmd_unshare, i_mmap_rwsem must be + * held in write mode. Caller needs to explicitly + * do this outside rmap routines. 
+ */ + VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); +- if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) { ++ tlb_gather_mmu_vma(&tlb, vma); ++ if (huge_pmd_unshare(&tlb, vma, &address, pvmw.pte)) { + /* + * huge_pmd_unshare unmapped an entire PMD + * page. There is no way of knowing exactly +@@ -1798,9 +1806,10 @@ static bool try_to_migrate_one(struct pa + * already adjusted above to cover this range. + */ + flush_cache_range(vma, range.start, range.end); +- flush_tlb_range(vma, range.start, range.end); ++ huge_pmd_unshare_flush(&tlb, vma); + mmu_notifier_invalidate_range(mm, range.start, + range.end); ++ tlb_finish_mmu(&tlb); + + /* + * The PMD table was unmapped, +@@ -1809,6 +1818,7 @@ static bool try_to_migrate_one(struct pa + page_vma_mapped_walk_done(&pvmw); + break; + } ++ tlb_finish_mmu(&tlb); + } + + /* Nuke the page table entry. */ diff --git a/queue-5.15/mm-hugetlb-fix-hugetlb_pmd_shared.patch b/queue-5.15/mm-hugetlb-fix-hugetlb_pmd_shared.patch new file mode 100644 index 0000000000..14a254ad13 --- /dev/null +++ b/queue-5.15/mm-hugetlb-fix-hugetlb_pmd_shared.patch @@ -0,0 +1,92 @@ +From ca1a47cd3f5f4c46ca188b1c9a27af87d1ab2216 Mon Sep 17 00:00:00 2001 +From: "David Hildenbrand (Red Hat)" +Date: Tue, 23 Dec 2025 22:40:34 +0100 +Subject: mm/hugetlb: fix hugetlb_pmd_shared() + +From: David Hildenbrand (Red Hat) + +commit ca1a47cd3f5f4c46ca188b1c9a27af87d1ab2216 upstream. + +Patch series "mm/hugetlb: fixes for PMD table sharing (incl. using +mmu_gather)", v3. + +One functional fix, one performance regression fix, and two related +comment fixes. + +I cleaned up my prototype I recently shared [1] for the performance fix, +deferring most of the cleanups I had in the prototype to a later point. +While doing that I identified the other things. + +The goal of this patch set is to be backported to stable trees "fairly" +easily. At least patch #1 and #4. 
+ +Patch #1 fixes hugetlb_pmd_shared() not detecting any sharing +Patch #2 + #3 are simple comment fixes that patch #4 interacts with. +Patch #4 is a fix for the reported performance regression due to excessive +IPI broadcasts during fork()+exit(). + +The last patch is all about TLB flushes, IPIs and mmu_gather. +Read: complicated + +There are plenty of cleanups in the future to be had + one reasonable +optimization on x86. But that's all out of scope for this series. + +Runtime tested, with a focus on fixing the performance regression using +the original reproducer [2] on x86. + + +This patch (of 4): + +We switched from (wrongly) using the page count to an independent shared +count. Now, shared page tables have a refcount of 1 (excluding +speculative references) and instead use ptdesc->pt_share_count to identify +sharing. + +We didn't convert hugetlb_pmd_shared(), so right now, we would never +detect a shared PMD table as such, because sharing/unsharing no longer +touches the refcount of a PMD table. + +Page migration, like mbind() or migrate_pages() would allow for migrating +folios mapped into such shared PMD tables, even though the folios are not +exclusive. In smaps we would account them as "private" although they are +"shared", and we would be wrongly setting the PM_MMAP_EXCLUSIVE in the +pagemap interface. + +Fix it by properly using ptdesc_pmd_is_shared() in hugetlb_pmd_shared(). 
+ +Link: https://lkml.kernel.org/r/20251223214037.580860-1-david@kernel.org +Link: https://lkml.kernel.org/r/20251223214037.580860-2-david@kernel.org +Link: https://lore.kernel.org/all/8cab934d-4a56-44aa-b641-bfd7e23bd673@kernel.org/ [1] +Link: https://lore.kernel.org/all/8cab934d-4a56-44aa-b641-bfd7e23bd673@kernel.org/ [2] +Fixes: 59d9094df3d7 ("mm: hugetlb: independent PMD page table shared count") +Signed-off-by: David Hildenbrand (Red Hat) +Reviewed-by: Rik van Riel +Reviewed-by: Lance Yang +Tested-by: Lance Yang +Reviewed-by: Harry Yoo +Tested-by: Laurence Oberman +Reviewed-by: Lorenzo Stoakes +Acked-by: Oscar Salvador +Cc: Liu Shixin +Cc: "Uschakow, Stanislav" +Cc: +Signed-off-by: Andrew Morton +[ David: We don't have ptdesc and the wrappers, so work directly on + page->pt_share_count. ] +Signed-off-by: David Hildenbrand (Arm) +Signed-off-by: Greg Kroah-Hartman +--- + include/linux/hugetlb.h | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/include/linux/hugetlb.h ++++ b/include/linux/hugetlb.h +@@ -1110,7 +1110,7 @@ static inline __init void hugetlb_cma_ch + #ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE + static inline bool hugetlb_pmd_shared(pte_t *pte) + { +- return page_count(virt_to_page(pte)) > 1; ++ return atomic_read(&virt_to_page(pte)->pt_share_count); + } + #else + static inline bool hugetlb_pmd_shared(pte_t *pte) diff --git a/queue-5.15/mm-hugetlb-fix-two-comments-related-to-huge_pmd_unshare.patch b/queue-5.15/mm-hugetlb-fix-two-comments-related-to-huge_pmd_unshare.patch new file mode 100644 index 0000000000..70e72f46b2 --- /dev/null +++ b/queue-5.15/mm-hugetlb-fix-two-comments-related-to-huge_pmd_unshare.patch @@ -0,0 +1,87 @@ +From 3937027caecb4f8251e82dd857ba1d749bb5a428 Mon Sep 17 00:00:00 2001 +From: "David Hildenbrand (Red Hat)" +Date: Tue, 23 Dec 2025 22:40:35 +0100 +Subject: mm/hugetlb: fix two comments related to huge_pmd_unshare() + +From: "David Hildenbrand (Red Hat)" + +commit 3937027caecb4f8251e82dd857ba1d749bb5a428 upstream. 
+ +Ever since we stopped using the page count to detect shared PMD page +tables, these comments are outdated. + +The only reason we have to flush the TLB early is because once we drop the +i_mmap_rwsem, the previously shared page table could get freed (to then +get reallocated and used for other purpose). So we really have to flush +the TLB before that could happen. + +So let's simplify the comments a bit. + +The "If we unshared PMDs, the TLB flush was not recorded in mmu_gather." +part introduced as in commit a4a118f2eead ("hugetlbfs: flush TLBs +correctly after huge_pmd_unshare") was confusing: sure it is recorded in +the mmu_gather, otherwise tlb_flush_mmu_tlbonly() wouldn't do anything. +So let's drop that comment while at it as well. + +We'll centralize these comments in a single helper as we rework the code +next. + +Link: https://lkml.kernel.org/r/20251223214037.580860-3-david@kernel.org +Fixes: 59d9094df3d7 ("mm: hugetlb: independent PMD page table shared count") +Signed-off-by: David Hildenbrand (Red Hat) +Reviewed-by: Rik van Riel +Tested-by: Laurence Oberman +Reviewed-by: Lorenzo Stoakes +Acked-by: Oscar Salvador +Reviewed-by: Harry Yoo +Cc: Liu Shixin +Cc: Lance Yang +Cc: "Uschakow, Stanislav" +Cc: +Signed-off-by: Andrew Morton +Signed-off-by: David Hildenbrand (Arm) +Signed-off-by: Greg Kroah-Hartman +--- + mm/hugetlb.c | 24 ++++++++---------------- + 1 file changed, 8 insertions(+), 16 deletions(-) + +--- a/mm/hugetlb.c ++++ b/mm/hugetlb.c +@@ -4552,17 +4552,10 @@ void __unmap_hugepage_range(struct mmu_g + tlb_end_vma(tlb, vma); + + /* +- * If we unshared PMDs, the TLB flush was not recorded in mmu_gather. We +- * could defer the flush until now, since by holding i_mmap_rwsem we +- * guaranteed that the last refernece would not be dropped. But we must +- * do the flushing before we return, as otherwise i_mmap_rwsem will be +- * dropped and the last reference to the shared PMDs page might be +- * dropped as well. 
+- * +- * In theory we could defer the freeing of the PMD pages as well, but +- * huge_pmd_unshare() relies on the exact page_count for the PMD page to +- * detect sharing, so we cannot defer the release of the page either. +- * Instead, do flush now. ++ * There is nothing protecting a previously-shared page table that we ++ * unshared through huge_pmd_unshare() from getting freed after we ++ * release i_mmap_rwsem, so flush the TLB now. If huge_pmd_unshare() ++ * succeeded, flush the range corresponding to the pud. + */ + if (force_flush) + tlb_flush_mmu_tlbonly(tlb); +@@ -5708,11 +5701,10 @@ unsigned long hugetlb_change_protection( + cond_resched(); + } + /* +- * Must flush TLB before releasing i_mmap_rwsem: x86's huge_pmd_unshare +- * may have cleared our pud entry and done put_page on the page table: +- * once we release i_mmap_rwsem, another task can do the final put_page +- * and that page table be reused and filled with junk. If we actually +- * did unshare a page of pmds, flush the range corresponding to the pud. ++ * There is nothing protecting a previously-shared page table that we ++ * unshared through huge_pmd_unshare() from getting freed after we ++ * release i_mmap_rwsem, so flush the TLB now. If huge_pmd_unshare() ++ * succeeded, flush the range corresponding to the pud. + */ + if (shared_pmd) + flush_hugetlb_tlb_range(vma, range.start, range.end); diff --git a/queue-5.15/mm-hugetlb-make-detecting-shared-pte-more-reliable.patch b/queue-5.15/mm-hugetlb-make-detecting-shared-pte-more-reliable.patch new file mode 100644 index 0000000000..35073465b8 --- /dev/null +++ b/queue-5.15/mm-hugetlb-make-detecting-shared-pte-more-reliable.patch @@ -0,0 +1,85 @@ +From 3aa4ed8040e1535d95c03cef8b52cf11bf0d8546 Mon Sep 17 00:00:00 2001 +From: Miaohe Lin +Date: Tue, 16 Aug 2022 21:05:53 +0800 +Subject: mm/hugetlb: make detecting shared pte more reliable + +From: Miaohe Lin + +commit 3aa4ed8040e1535d95c03cef8b52cf11bf0d8546 upstream. 
+ +If the pagetables are shared, we shouldn't copy or take references. Since +src could have unshared and dst shares with another vma, huge_pte_none() +is thus used to determine whether dst_pte is shared. But this check isn't +reliable. A shared pte could have pte none in pagetable in fact. The +page count of ptep page should be checked here in order to reliably +determine whether pte is shared. + +[lukas.bulwahn@gmail.com: remove unused local variable dst_entry in copy_hugetlb_page_range()] + Link: https://lkml.kernel.org/r/20220822082525.26071-1-lukas.bulwahn@gmail.com +Link: https://lkml.kernel.org/r/20220816130553.31406-7-linmiaohe@huawei.com +Signed-off-by: Miaohe Lin +Signed-off-by: Lukas Bulwahn +Reviewed-by: Mike Kravetz +Cc: Muchun Song +Signed-off-by: Andrew Morton +Signed-off-by: David Hildenbrand (Arm) +Signed-off-by: Greg Kroah-Hartman +--- + mm/hugetlb.c | 21 ++++++++------------- + 1 file changed, 8 insertions(+), 13 deletions(-) + +--- a/mm/hugetlb.c ++++ b/mm/hugetlb.c +@@ -4304,7 +4304,7 @@ hugetlb_install_page(struct vm_area_stru + int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, + struct vm_area_struct *vma) + { +- pte_t *src_pte, *dst_pte, entry, dst_entry; ++ pte_t *src_pte, *dst_pte, entry; + struct page *ptepage; + unsigned long addr; + bool cow = is_cow_mapping(vma->vm_flags); +@@ -4343,28 +4343,23 @@ int copy_hugetlb_page_range(struct mm_st + + /* + * If the pagetables are shared don't copy or take references. +- * dst_pte == src_pte is the common case of src/dest sharing. + * ++ * dst_pte == src_pte is the common case of src/dest sharing. + * However, src could have 'unshared' and dst shares with +- * another vma. If dst_pte !none, this implies sharing. +- * Check here before taking page table lock, and once again +- * after taking the lock below. ++ * another vma. So page_count of ptep page is checked instead ++ * to reliably determine whether pte is shared. 
+ */ +- dst_entry = huge_ptep_get(dst_pte); +- if ((dst_pte == src_pte) || !huge_pte_none(dst_entry)) ++ if (page_count(virt_to_page(dst_pte)) > 1) + continue; + + dst_ptl = huge_pte_lock(h, dst, dst_pte); + src_ptl = huge_pte_lockptr(h, src, src_pte); + spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); + entry = huge_ptep_get(src_pte); +- dst_entry = huge_ptep_get(dst_pte); + again: +- if (huge_pte_none(entry) || !huge_pte_none(dst_entry)) { ++ if (huge_pte_none(entry)) { + /* +- * Skip if src entry none. Also, skip in the +- * unlikely case dst entry !none as this implies +- * sharing with another vma. ++ * Skip if src entry none. + */ + ; + } else if (unlikely(is_hugetlb_entry_migration(entry) || +@@ -4423,7 +4418,7 @@ again: + restore_reserve_on_error(h, vma, addr, + new); + put_page(new); +- /* dst_entry won't change as in child */ ++ /* huge_ptep of dst_pte won't change as in child */ + goto again; + } + hugetlb_install_page(vma, dst_pte, addr, new); diff --git a/queue-5.15/mm-rmap-fix-two-comments-related-to-huge_pmd_unshare.patch b/queue-5.15/mm-rmap-fix-two-comments-related-to-huge_pmd_unshare.patch new file mode 100644 index 0000000000..39c210c5a6 --- /dev/null +++ b/queue-5.15/mm-rmap-fix-two-comments-related-to-huge_pmd_unshare.patch @@ -0,0 +1,74 @@ +From a8682d500f691b6dfaa16ae1502d990aeb86e8be Mon Sep 17 00:00:00 2001 +From: "David Hildenbrand (Red Hat)" +Date: Tue, 23 Dec 2025 22:40:36 +0100 +Subject: mm/rmap: fix two comments related to huge_pmd_unshare() + +From: David Hildenbrand (Red Hat) + +commit a8682d500f691b6dfaa16ae1502d990aeb86e8be upstream. + +PMD page table unsharing no longer touches the refcount of a PMD page +table. Also, it is not about dropping the refcount of a "PMD page" but +the "PMD page table". + +Let's just simplify by saying that the PMD page table was unmapped, +consequently also unmapping the folio that was mapped into this page. + +This code should be deduplicated in the future. 
+ +Link: https://lkml.kernel.org/r/20251223214037.580860-4-david@kernel.org +Fixes: 59d9094df3d7 ("mm: hugetlb: independent PMD page table shared count") +Signed-off-by: David Hildenbrand (Red Hat) +Reviewed-by: Rik van Riel +Tested-by: Laurence Oberman +Reviewed-by: Lorenzo Stoakes +Acked-by: Oscar Salvador +Cc: Liu Shixin +Cc: Harry Yoo +Cc: Lance Yang +Cc: "Uschakow, Stanislav" +Cc: +Signed-off-by: Andrew Morton +[ David: We don't have 40549ba8f8e0 ("hugetlb: use new vma_lock + for pmd sharing synchronization") so there are some contextual + differences. ] +Signed-off-by: David Hildenbrand (Arm) +Signed-off-by: Greg Kroah-Hartman +--- + mm/rmap.c | 18 ++++-------------- + 1 file changed, 4 insertions(+), 14 deletions(-) + +--- a/mm/rmap.c ++++ b/mm/rmap.c +@@ -1489,13 +1489,8 @@ static bool try_to_unmap_one(struct page + range.end); + + /* +- * The ref count of the PMD page was dropped +- * which is part of the way map counting +- * is done for shared PMDs. Return 'true' +- * here. When there is no other sharing, +- * huge_pmd_unshare returns false and we will +- * unmap the actual page and drop map count +- * to zero. ++ * The PMD table was unmapped, ++ * consequently unmapping the folio. + */ + page_vma_mapped_walk_done(&pvmw); + break; +@@ -1808,13 +1803,8 @@ static bool try_to_migrate_one(struct pa + range.end); + + /* +- * The ref count of the PMD page was dropped +- * which is part of the way map counting +- * is done for shared PMDs. Return 'true' +- * here. When there is no other sharing, +- * huge_pmd_unshare returns false and we will +- * unmap the actual page and drop map count +- * to zero. ++ * The PMD table was unmapped, ++ * consequently unmapping the folio. 
+ */ + page_vma_mapped_walk_done(&pvmw); + break; diff --git a/queue-5.15/series b/queue-5.15/series index 96882381be..cd81a422a5 100644 --- a/queue-5.15/series +++ b/queue-5.15/series @@ -211,3 +211,9 @@ serial-8250-fix-tx-deadlock-when-using-dma.patch serial-8250-add-late-synchronize_irq-to-shutdown-to-handle-dw-uart-busy.patch serial-uartlite-fix-pm-runtime-usage-count-underflow-on-probe.patch drm-radeon-apply-state-adjust-rules-to-some-additional-hainan-vairants.patch +mm-hugetlb-make-detecting-shared-pte-more-reliable.patch +mm-hugetlb-fix-copy_hugetlb_page_range-to-use-pt_share_count.patch +mm-hugetlb-fix-hugetlb_pmd_shared.patch +mm-hugetlb-fix-two-comments-related-to-huge_pmd_unshare.patch +mm-rmap-fix-two-comments-related-to-huge_pmd_unshare.patch +mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather.patch