mm/huge_memory: fix early failure of try_to_migrate() when splitting huge pmd for shared THP
author Wei Yang <richard.weiyang@gmail.com>
Thu, 5 Mar 2026 01:50:06 +0000 (01:50 +0000)
committer Andrew Morton <akpm@linux-foundation.org>
Tue, 10 Mar 2026 23:01:49 +0000 (16:01 -0700)
Commit 60fbb14396d5 ("mm/huge_memory: adjust try_to_migrate_one() and
split_huge_pmd_locked()") made try_to_migrate_one() return false
unconditionally after split_huge_pmd_locked().  This may cause
try_to_migrate() to fail early when TTU_SPLIT_HUGE_PMD is specified.

The reason is that the above commit adjusted try_to_migrate_one() to
return false unconditionally when a PMD-mapped THP entry is found and
TTU_SPLIT_HUGE_PMD is specified (for example, via unmap_folio()).  This
breaks the rmap walk and fails try_to_migrate() early if the PMD-mapped
THP is mapped in multiple processes.
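
To see why a single "false" is fatal: the rmap walk stops as soon as
one ->rmap_one() callback returns false.  A paraphrased sketch (not
verbatim) of the VMA loop in rmap_walk_anon() in mm/rmap.c:

    anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
                                   pgoff_start, pgoff_end) {
            struct vm_area_struct *vma = avc->vma;
            unsigned long address = vma_address(vma, pgoff_start,
                            folio_nr_pages(folio));
            ...
            /* try_to_migrate_one() is the ->rmap_one() callback here.
             * Returning false breaks out and skips the remaining VMAs,
             * i.e. the other processes' mappings of the THP. */
            if (!rwc->rmap_one(folio, vma, address, rwc->arg))
                    break;
    }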

The user-visible impact of this bug includes:

  * Under memory pressure, shrink_folio_list() may split a partially
    mapped folio with split_folio_to_list() and then free the unmapped
    pages without doing IO.  If the split fails, the folio may not be
    reclaimed.
  * On memory failure, memory_failure() calls try_to_split_thp_page()
    to split the folio containing the bad page.  On success, the
    PG_has_hwpoisoned bit is set only in the after-split folio
    containing @split_at, which limits the amount of memory lost to the
    poisoned page.  If the split fails, the whole folio is unusable.

One way to reproduce (a minimal C sketch follows below):

    Create an anonymous THP range and fork 512 children, so the THP is
    share-mapped in 513 processes.  Then trigger a folio split via the
    /sys/kernel/debug/split_huge_pages debugfs interface to split the
    THP folio to order 0.

Without the above commit, the folio is successfully split to order 0.
With it, the folio remains a large folio.
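
A sketch of such a reproducer.  The debugfs write format
("<pid>,0x<start>,0x<end>,<new_order>") follows the upstream
split_huge_pages interface; the alignment trick and environment
requirements (THP enabled in "always" or "madvise" mode, debugfs
mounted, run as root) are assumptions and may need adapting:

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define THP_SIZE    (2UL << 20) /* assumes 2 MiB PMD-sized THP */
    #define NR_CHILDREN 512

    int main(void)
    {
            pid_t pids[NR_CHILDREN];
            char cmd[128], *map, *addr;
            FILE *f;
            int i;

            /* Over-allocate so a PMD-aligned THP_SIZE range fits inside. */
            map = mmap(NULL, 2 * THP_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (map == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }
            addr = (char *)(((unsigned long)map + THP_SIZE - 1) &
                            ~(THP_SIZE - 1));
            madvise(addr, THP_SIZE, MADV_HUGEPAGE);
            memset(addr, 1, THP_SIZE);      /* fault in a PMD-mapped THP */

            /* Fork 512 children: THP now share-mapped in 513 processes. */
            for (i = 0; i < NR_CHILDREN; i++) {
                    pids[i] = fork();
                    if (pids[i] == 0) {
                            pause();        /* keep the shared mapping alive */
                            _exit(0);
                    }
            }

            /* Ask debugfs to split the range down to order 0. */
            snprintf(cmd, sizeof(cmd), "%d,0x%lx,0x%lx,0", getpid(),
                     (unsigned long)addr, (unsigned long)addr + THP_SIZE);
            f = fopen("/sys/kernel/debug/split_huge_pages", "w");
            if (!f) {
                    perror("fopen");        /* needs root and debugfs */
                    return 1;
            }
            fputs(cmd, f);
            fclose(f);

            /* Check AnonHugePages in /proc/self/smaps (or the THP split
             * counters) to see whether the split actually happened. */
            for (i = 0; i < NR_CHILDREN; i++) {
                    kill(pids[i], SIGKILL);
                    waitpid(pids[i], NULL, 0);
            }
            return 0;
    }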

Currently there are two core users of TTU_SPLIT_HUGE_PMD:

  * try_to_unmap_one()
  * try_to_migrate_one()

try_to_unmap_one() already restarts the rmap walk after splitting the
PMD, so only try_to_migrate_one() is affected.

We can't simply revert commit 60fbb14396d5 ("mm/huge_memory: adjust
try_to_migrate_one() and split_huge_pmd_locked()"), since it removed
duplicated checks that are already covered by page_vma_mapped_walk().

This patch fixes the problem by restarting page_vma_mapped_walk() after
split_huge_pmd_locked().  We cannot simply return "true" instead, as
that would mishandle another case:

    split_huge_pmd_locked() can fail internally when
    folio_try_share_anon_rmap_pmd() refuses to share the folio, leaving
    the large folio mapped through PTEs rather than migration entries;
    in that case try_to_migrate_one() ought to detect this and properly
    abort the walk.  Restarting the walk might result in some
    unnecessary rmap walking but is relatively harmless.
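
For context, page_vma_mapped_walk_restart() drops the page table lock
and clears the cached pmd/pte so that the next page_vma_mapped_walk()
iteration re-inspects the page table at the same address.  A paraphrase
from memory of the helper in include/linux/rmap.h (the upstream body
may differ in detail):

    static inline void
    page_vma_mapped_walk_restart(struct page_vma_mapped_walk *pvmw)
    {
            WARN_ON_ONCE(!pvmw->pmd && !pvmw->pte);

            if (likely(pvmw->ptl))
                    spin_unlock(pvmw->ptl);
            else
                    WARN_ON_ONCE(1);

            /* Clearing pmd/pte makes the next page_vma_mapped_walk()
             * call re-map the (now split) page table, where it either
             * finds the new PTEs or reports the folio unmapped. */
            pvmw->ptl = NULL;
            pvmw->pmd = NULL;
            pvmw->pte = NULL;
    }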

Link: https://lkml.kernel.org/r/20260305015006.27343-1-richard.weiyang@gmail.com
Fixes: 60fbb14396d5 ("mm/huge_memory: adjust try_to_migrate_one() and split_huge_pmd_locked()")
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Tested-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Gavin Guo <gavinguo@igalia.com>
Acked-by: David Hildenbrand (arm) <david@kernel.org>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/rmap.c

index b1ba1236ffbaa406be26753b7552bfa42fc4e552..391337282e3f3c172aa2fce00a6d7e5df1ee4797 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2450,11 +2450,17 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
                        __maybe_unused pmd_t pmdval;
 
                        if (flags & TTU_SPLIT_HUGE_PMD) {
+                               /*
+                                * split_huge_pmd_locked() might leave the
+                                * folio mapped through PTEs. Retry the walk
+                                * so we can detect this scenario and properly
+                                * abort the walk.
+                                */
                                split_huge_pmd_locked(vma, pvmw.address,
                                                      pvmw.pmd, true);
-                               ret = false;
-                               page_vma_mapped_walk_done(&pvmw);
-                               break;
+                               flags &= ~TTU_SPLIT_HUGE_PMD;
+                               page_vma_mapped_walk_restart(&pvmw);
+                               continue;
                        }
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
                        pmdval = pmdp_get(pvmw.pmd);