git.ipfire.org Git - thirdparty/kernel/stable.git/commitdiff
mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
author Shakeel Butt <shakeel.butt@linux.dev>
Fri, 30 Jan 2026 04:29:25 +0000 (20:29 -0800)
committer Andrew Morton <akpm@linux-foundation.org>
Fri, 6 Feb 2026 23:47:15 +0000 (15:47 -0800)
In Meta's fleet, we observed high-level cgroups showing zero file memcg
stats while their descendants had non-zero values.  Investigation using
drgn revealed that these parent cgroups actually had negative file stats,
aggregated from their children.

This issue became more frequent after deploying thp-always more widely,
pointing to a correlation with THP file collapsing.  The root cause is
that collapse_file() assumes old folios and the new THP belong to the same
node and memcg.  When this assumption breaks, stats become skewed.  The
bug affects not just memcg stats but also per-numa stats, and not just
NR_FILE_PAGES but also NR_SHMEM.

The assumption breaks in scenarios such as:

1. Small folios allocated on one node while the THP gets allocated on a
   different node.

2. A package downloader running in one cgroup populates the page cache,
   while a job in a different cgroup executes the downloaded binary.

3. A file shared between processes in different cgroups, where one
   process faults in the pages and khugepaged (or madvise(MADV_COLLAPSE))
   collapses them on behalf of the other.

Fix the accounting by explicitly incrementing stats for the new THP and
decrementing stats for the old folios being replaced.

Link: https://lkml.kernel.org/r/20260130042925.2797946-1-shakeel.butt@linux.dev
Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Kiryl Shutsemau <kas@kernel.org>
Acked-by: David Hildenbrand (arm) <david@kernel.org>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/khugepaged.c

index 3ba6dcea5993f9b7d1e360ede78719829393b2d3..1b8faae5b44887dd02d9cb77f0fef8538d369841 100644 (file)
@@ -2195,16 +2195,13 @@ immap_locked:
                xas_lock_irq(&xas);
        }
 
-       if (is_shmem)
+       if (is_shmem) {
+               lruvec_stat_mod_folio(new_folio, NR_SHMEM, HPAGE_PMD_NR);
                lruvec_stat_mod_folio(new_folio, NR_SHMEM_THPS, HPAGE_PMD_NR);
-       else
+       } else {
                lruvec_stat_mod_folio(new_folio, NR_FILE_THPS, HPAGE_PMD_NR);
-
-       if (nr_none) {
-               lruvec_stat_mod_folio(new_folio, NR_FILE_PAGES, nr_none);
-               /* nr_none is always 0 for non-shmem. */
-               lruvec_stat_mod_folio(new_folio, NR_SHMEM, nr_none);
        }
+       lruvec_stat_mod_folio(new_folio, NR_FILE_PAGES, HPAGE_PMD_NR);
 
        /*
         * Mark new_folio as uptodate before inserting it into the
@@ -2238,6 +2235,11 @@ immap_locked:
         */
        list_for_each_entry_safe(folio, tmp, &pagelist, lru) {
                list_del(&folio->lru);
+               lruvec_stat_mod_folio(folio, NR_FILE_PAGES,
+                                     -folio_nr_pages(folio));
+               if (is_shmem)
+                       lruvec_stat_mod_folio(folio, NR_SHMEM,
+                                             -folio_nr_pages(folio));
                folio->mapping = NULL;
                folio_clear_active(folio);
                folio_clear_unevictable(folio);