mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list
author Jinjiang Tu <tujinjiang@huawei.com>
Fri, 27 Jun 2025 12:57:46 +0000 (20:57 +0800)
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fri, 1 Aug 2025 08:48:44 +0000 (09:48 +0100)
commit 9f1e8cd0b7c4c944e9921b52a6661b5eda2705ab upstream.

In shrink_folio_list(), the hwpoisoned folio may be a large folio, which
can't be handled by unmap_poisoned_folio().  For THP, try_to_unmap_one()
must be passed TTU_SPLIT_HUGE_PMD so that it splits the huge PMD first and
then retries.  Without TTU_SPLIT_HUGE_PMD, we trigger a null-ptr deref of
pvmw.pte.  Even if we pass TTU_SPLIT_HUGE_PMD, we trigger a
WARN_ON_ONCE because the page isn't in swapcache.

Since a UCE (uncorrectable error) is rare in the real world, and a race
with reclamation is even rarer, just skipping the hwpoisoned large folio
is enough.  memory_failure() will handle it if the UCE is triggered again.

This happens when memory reclaim for a large folio races with
memory_failure(), and can lead to a kernel panic.  The race is as
follows:

cpu0                              cpu1
 shrink_folio_list
                                  memory_failure
                                   TestSetPageHWPoison
  unmap_poisoned_folio
  --> trigger BUG_ON because
      unmap_poisoned_folio can't
      handle a large folio
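
The reclaim-side decision described above can be modelled in userspace.
The sketch below is not kernel code: struct model_folio, enum
reclaim_action and classify_folio() are hypothetical names introduced only
for illustration, and the two boolean fields merely stand in for the
results of folio_contain_hwpoisoned_page() and folio_test_large().  It
shows only the ordering of the checks after this fix, assuming nothing
else about shrink_folio_list().

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct folio; not the kernel type. */
struct model_folio {
	bool has_hwpoison;	/* models folio_contain_hwpoisoned_page() */
	bool is_large;		/* models folio_test_large() */
};

/* Hypothetical labels for the three outcomes discussed in this commit. */
enum reclaim_action {
	RECLAIM_CONTINUE,	/* not poisoned: normal reclaim continues */
	RECLAIM_KEEP_LOCKED,	/* poisoned large folio: skipped (goto keep_locked) */
	RECLAIM_UNMAP_POISONED,	/* poisoned small folio: unmap_poisoned_folio() */
};

/*
 * Mirror the order of the checks after the fix: the large-folio test
 * comes before any attempt to unmap, so a poisoned large folio never
 * reaches the unmap path.
 */
static enum reclaim_action classify_folio(const struct model_folio *f)
{
	if (!f->has_hwpoison)
		return RECLAIM_CONTINUE;
	if (f->is_large)
		return RECLAIM_KEEP_LOCKED;
	return RECLAIM_UNMAP_POISONED;
}
```

In this model, a folio with both fields set is classified as
RECLAIM_KEEP_LOCKED, which corresponds to leaving the folio for
memory_failure() to handle if the UCE is triggered again.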

[tujinjiang@huawei.com: add comment to unmap_poisoned_folio()]
Link: https://lkml.kernel.org/r/69fd4e00-1b13-d5f7-1c82-705c7d977ea4@huawei.com
Link: https://lkml.kernel.org/r/20250627125747.3094074-2-tujinjiang@huawei.com
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Fixes: 1b0449544c64 ("mm/vmscan: don't try to reclaim hwpoison folio")
Reported-by: syzbot+3b220254df55d8ca8a61@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68412d57.050a0220.2461cf.000e.GAE@google.com/
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mm/memory-failure.c
mm/vmscan.c

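diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ec1c71abe88dfd5cf7d06ffc332fb157869a12b7..70b2ccf0d51eedddd5124ec9be45bb93796b0257 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c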
@@ -1559,6 +1559,10 @@ static int get_hwpoison_page(struct page *p, unsigned long flags)
        return ret;
 }
 
+/*
+ * The caller must guarantee the folio isn't large folio, except hugetlb.
+ * try_to_unmap() can't handle it.
+ */
 int unmap_poisoned_folio(struct folio *folio, unsigned long pfn, bool must_kill)
 {
        enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_HWPOISON;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0eb5d510d4f6b6b98a953fc3ef30fba97a1b0d02..e3c1e2e1560d7514210fefa30210f452e6394761 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1080,6 +1080,14 @@ retry:
                        goto keep;
 
                if (folio_contain_hwpoisoned_page(folio)) {
+                       /*
+                        * unmap_poisoned_folio() can't handle large
+                        * folio, just skip it. memory_failure() will
+                        * handle it if the UCE is triggered again.
+                        */
+                       if (folio_test_large(folio))
+                               goto keep_locked;
+
                        unmap_poisoned_folio(folio, folio_pfn(folio), false);
                        folio_unlock(folio);
                        folio_put(folio);