Breno Leitao [Mon, 16 Mar 2026 11:54:32 +0000 (04:54 -0700)]
kho: rename fdt parameter to blob in kho_add/remove_subtree()
Since kho_add_subtree() now accepts arbitrary data blobs (not just FDTs),
rename the parameter from 'fdt' to 'blob' to better reflect its purpose.
Apply the same rename to kho_remove_subtree() for consistency.
Also rename kho_debugfs_fdt_add() and kho_debugfs_fdt_remove() to
kho_debugfs_blob_add() and kho_debugfs_blob_remove() respectively, with
the same parameter rename from 'fdt' to 'blob'.
Link: https://lore.kernel.org/20260316-kho-v9-2-ed6dcd951988@debian.org Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Breno Leitao [Mon, 16 Mar 2026 11:54:31 +0000 (04:54 -0700)]
kho: add size parameter to kho_add_subtree()
Patch series "kho: history: track previous kernel version and kexec boot
count", v9.
Use Kexec Handover (KHO) to pass the previous kernel's version string and
the number of kexec reboots since the last cold boot to the next kernel,
and print it at boot time.
Bugs that only reproduce when kexecing from specific kernel versions are
difficult to diagnose. These issues occur when a buggy kernel kexecs into
a new kernel, with the bug manifesting only in the second kernel.
Recent examples include:
* eb2266312507 ("x86/boot: Fix page table access in 5-level to 4-level paging transition")
* 77d48d39e991 ("efistub/tpm: Use ACPI reclaim memory for event log to avoid corruption")
* 64b45dd46e15 ("x86/efi: skip memattr table on kexec boot")
As kexec-based reboots become more common, these version-dependent bugs
are appearing more frequently. At scale, correlating crashes to the
previous kernel version is challenging, especially when issues only occur
in specific transition scenarios.
Some bugs manifest only after multiple consecutive kexec reboots.
Tracking the kexec count helps identify these cases (this metric is
already used by the live update subsystem).
KHO provides a reliable mechanism to pass information between kernels. By
carrying the previous kernel's release string and kexec count forward, we
can print this context at boot time to aid debugging.
The goal of this feature is to print this information in early boot so
that users can trace kernel releases back across kexec. Systemd does not
help here because we cannot assume that the previous kernel has systemd or
even write access to the disk (common when Linux is used as a bootloader).
This patch (of 6):
kho_add_subtree() assumes the fdt argument is always an FDT and calls
fdt_totalsize() on it in the debugfs code path. This assumption will
break if a caller passes arbitrary data instead of an FDT.
When CONFIG_KEXEC_HANDOVER_DEBUGFS is enabled, kho_debugfs_fdt_add() calls
__kho_debugfs_fdt_add(), which executes:
f->wrapper.size = fdt_totalsize(fdt);
Fix this by adding an explicit size parameter to kho_add_subtree() so
callers specify the blob size. This allows subtrees to contain arbitrary
data formats, not just FDTs. Update all callers:
- memblock.c: use fdt_totalsize(fdt)
- luo_core.c: use fdt_totalsize(fdt_out)
- test_kho.c: use fdt_totalsize()
- kexec_handover.c (root fdt): use fdt_totalsize(kho_out.fdt)
Also update __kho_debugfs_fdt_add() to receive the size explicitly instead
of computing it internally via fdt_totalsize(). In kho_in_debugfs_init(),
pass fdt_totalsize() for the root FDT and sub-blobs since all current
users are FDTs. A subsequent patch will persist the size in the KHO FDT
so the incoming side can handle non-FDT blobs correctly.
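The idea of letting the caller state the blob size can be sketched in a small userspace model. This is only an illustration under stated assumptions: add_subtree(), subtree_size(), struct subtree and MAX_SUBTREES are hypothetical names and do not reflect the real kho_add_subtree() signature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; none of these names are the kernel's. */
#define MAX_SUBTREES 8

struct subtree {
	const char *name;
	const void *blob;	/* opaque payload: an FDT or anything else */
	size_t size;		/* supplied by the caller, never inferred */
};

static struct subtree subtrees[MAX_SUBTREES];
static size_t nr_subtrees;

/* Register a blob together with its size. Returns 0 on success, -1 on error. */
static int add_subtree(const char *name, const void *blob, size_t size)
{
	assert(nr_subtrees <= MAX_SUBTREES);
	if (!name || !blob || size == 0 || nr_subtrees == MAX_SUBTREES)
		return -1;
	subtrees[nr_subtrees++] = (struct subtree){ name, blob, size };
	return 0;
}

/* Look up the recorded size; returns 0 when the name is unknown. */
static size_t subtree_size(const char *name)
{
	for (size_t i = 0; i < nr_subtrees; i++)
		if (strcmp(subtrees[i].name, name) == 0)
			return subtrees[i].size;
	return 0;
}
```

Because the size travels with the registration, a consumer such as a debug view never has to parse the payload to learn its length, which is what makes non-FDT payloads possible.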
Link: https://lore.kernel.org/20260323110747.193569-1-duanchenghao@kylinos.cn Link: https://lore.kernel.org/20260316-kho-v9-1-ed6dcd951988@debian.org Signed-off-by: Breno Leitao <leitao@debian.org> Suggested-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a Kconfig option to default kmemleak verbose mode on at build time.
This option depends on DEBUG_KMEMLEAK_AUTO_SCAN since verbose reporting is
only meaningful when the automatic scanning thread is running.
When enabled, kmemleak prints full details (backtrace, hex dump, address)
of unreferenced objects to dmesg as they are detected during scanning,
removing the need to manually read /sys/kernel/debug/kmemleak.
Making this a compile-time option rather than a boot parameter allows
debug kernel flavors to enable verbose kmemleak reporting by default
without requiring changes to boot arguments. A machine can simply swap to
a debug kernel and benefit from kmemleak reporting automatically.
By surfacing leak reports directly in dmesg, they are automatically
forwarded through any kernel logging infrastructure and can be easily
captured by log aggregation tooling, making it practical to monitor memory
leaks across large fleets.
The verbose setting can still be toggled at runtime via
/sys/module/kmemleak/parameters/verbose.
Link: https://lore.kernel.org/20260323-kmemleak_report-v1-1-ba2cdd9c11b9@debian.org Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: SeongJae Park <sj@kernel.org> Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
MAINTAINERS: update MGLRU entry to reflect current status
We are moving to a far more proactive model of maintainership within mm
and thus put a great deal of emphasis on sub-maintainers being active
within the community both in terms of code contributions and review.
The MGLRU has not had much activity since being added to the kernel and
the current maintainers who kindly stepped up have unfortunately not been
able to contribute a great deal to it for over a year, nor engage all that
heavily in review.
As a result, and with no negative connotations implied whatsoever, it
seems appropriate to downgrade the current maintainers to reviewers.
At this time nobody is quite exercising the maintainer role in this area
of the kernel, but there is encouraging activity from a number of people
who are trusted elsewhere in the kernel, and who have contributed relevant
work or review.
Therefore add further reviewers, and at this stage - to reflect the
reality on the ground - we will not have any sub-maintainers listed at
all.
Each of the files listed is shared with other sections in MAINTAINERS, so
this doesn't reduce sub-maintainer coverage.
Link: https://lore.kernel.org/20260326185629.355476-1-ljs@kernel.org Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: Axel Rasmussen <axelrasmussen@google.com> Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Barry Song <baohua@kernel.org> Acked-by: SeongJae Park <sj@kernel.org> Acked-by: Kairui Song <kasong@tencent.com> Acked-by: Qi Zheng <qi.zheng@linux.dev> Acked-by: Yuanchu Xie <yuanchu@google.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Wei Xu <weixugc@google.com> Cc: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Qi Zheng [Fri, 27 Mar 2026 10:16:30 +0000 (18:16 +0800)]
mm: memcontrol: correct the nr_pages parameter type of mem_cgroup_update_lru_size()
The nr_pages parameter of mem_cgroup_update_lru_size() represents a page
count. During the reparenting of LRU folios, the value passed to it can
potentially exceed the maximum value of a 32-bit integer. It should be
declared as long instead of int to match the types used in lruvec size
accounting and to prevent possible overflow.
Update the parameter type to long to ensure correctness.
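The effect of a too-narrow parameter can be illustrated in userspace. This is a sketch, not kernel code: update_lru_size_narrow() and update_lru_size_wide() are hypothetical stand-ins, and fixed-width types are used so the example does not depend on the platform's width of long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Narrow parameter: a count above INT32_MAX is converted at the call
 * site (implementation-defined result), so the counter gets a wrong delta.
 */
static void update_lru_size_narrow(int64_t *lru_size, int32_t nr_pages)
{
	assert(lru_size != NULL);
	*lru_size += nr_pages;
}

/* Wide parameter: the full page count reaches the counter unchanged. */
static void update_lru_size_wide(int64_t *lru_size, int64_t nr_pages)
{
	assert(lru_size != NULL);
	*lru_size += nr_pages;
}
```

Calling both helpers with (int64_t)INT32_MAX + 1 pages leaves the wide counter at that value, while the narrow one cannot hold it.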
Link: https://lore.kernel.org/fd4140de44fa0a3978e4e2426731187fe8625f0b.1774604356.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Qi Zheng [Fri, 27 Mar 2026 10:16:29 +0000 (18:16 +0800)]
mm: memcontrol: change val type to long in __mod_memcg_{lruvec_}state()
The __mod_memcg_state() and __mod_memcg_lruvec_state() functions are also
used to reparent non-hierarchical stats. In this scenario, the values
passed to them are accumulated statistics that might be extremely large
and exceed the upper limit of a 32-bit integer.
Change the val parameter type from int to long in these functions and
their corresponding tracepoints (memcg_rstat_stats) to prevent potential
overflow issues.
After that, in memcg_state_val_in_pages(), if the passed val is negative,
the expression val * unit / PAGE_SIZE could be implicitly converted to a
massive positive number when compared with 1UL in the max() macro. This
leads to returning an incorrect massive positive value.
Fix this by using abs(val) to calculate the magnitude first, and then
restoring the sign of the value before returning the result.
Additionally, use mult_frac() to prevent potential overflow during the
multiplication of val and unit.
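The sign problem and the magnitude-first fix can be reproduced in a userspace sketch. The names val_in_pages_buggy(), val_in_pages_fixed(), mult_frac_u64() and MODEL_PAGE_SIZE are hypothetical, a 4096-byte page is assumed, and the code is a model of the idea rather than the actual patch.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE ((uint64_t)4096)	/* assumed page size */

/* Mixed signed/unsigned arithmetic: a negative val becomes a huge value. */
static int64_t val_in_pages_buggy(int64_t val, int64_t unit)
{
	/* val * unit is converted to unsigned before the division. */
	uint64_t pages = val * unit / MODEL_PAGE_SIZE;

	return (int64_t)(pages > 1 ? pages : 1);
}

/* Same shape as the kernel's mult_frac(): avoids the intermediate x * n. */
static uint64_t mult_frac_u64(uint64_t x, uint64_t n, uint64_t d)
{
	return (x / d) * n + (x % d) * n / d;
}

/* Compute on the magnitude, clamp to at least one page, restore the sign. */
static int64_t val_in_pages_fixed(int64_t val, int64_t unit)
{
	uint64_t mag, pages;

	assert(unit > 0);
	if (val == 0)
		return 0;
	mag = val < 0 ? -(uint64_t)val : (uint64_t)val;
	pages = mult_frac_u64(mag, (uint64_t)unit, MODEL_PAGE_SIZE);
	if (pages < 1)
		pages = 1;
	return val < 0 ? -(int64_t)pages : (int64_t)pages;
}
```

With val = -8192 and unit = 1, the buggy variant returns a large positive number, while the fixed variant returns -2.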
Link: https://lore.kernel.org/70a9440e49c464b4dca88bcabc6b491bd335c9f0.1774604356.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reported-by: Harry Yoo (Oracle) <harry@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Qi Zheng [Fri, 27 Mar 2026 10:16:28 +0000 (18:16 +0800)]
mm: memcontrol: correct the type of stats_updates to unsigned long
Patch series "fix unexpected type conversions and potential overflows",
v3.
As Harry Yoo pointed out [1], in scenarios where massive state updates
occur (e.g., during the reparenting of LRU folios), the values passed to
memcg stat update functions can accumulate and exceed the upper limit of a
32-bit integer.
If the parameter types are not large enough (like 'int') or are handled
incorrectly, it can lead to severe truncation, potential overflow issues,
and unexpected type conversion bugs.
This series aims to address these issues by correcting the parameter types
in the relevant functions, and by fixing an implicit conversion bug in
memcg_state_val_in_pages().
This patch (of 3):
memcg_rstat_updated() tracks updates for vmstats_percpu->state and
lruvec_stats_percpu->state. Since these state values are of type long,
change the val parameter passed to memcg_rstat_updated() to long as well.
Correspondingly, change the type of stats_updates in struct
memcg_vmstats_percpu and struct memcg_vmstats from unsigned int and
atomic_t to unsigned long and atomic_long_t respectively to prevent
potential overflow when handling large state updates during the
reparenting of LRU folios.
Link: https://lore.kernel.org/cover.1774604356.git.zhengqi.arch@bytedance.com Link: https://lore.kernel.org/a5b0b468e7b4fe5f26c50e36d5d016f16d92f98f.1774604356.git.zhengqi.arch@bytedance.com Link: https://lore.kernel.org/all/acDxaEgnqPI-Z4be@hyeyoo/ Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 5 Mar 2026 11:52:51 +0000 (19:52 +0800)]
mm: lru: add VM_WARN_ON_ONCE_FOLIO to lru maintenance helpers
We must ensure the folio is deleted from or added to the correct lruvec
list. So, add VM_WARN_ON_ONCE_FOLIO() to catch invalid users. The
VM_BUG_ON_PAGE() in move_pages_to_lru() can be removed as
add_page_to_lru_list() will perform the necessary check.
Link: https://lore.kernel.org/2c90fc006d9d730331a3caeef96f7e5dabe2036d.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 5 Mar 2026 11:52:50 +0000 (19:52 +0800)]
mm: memcontrol: eliminate the problem of dying memory cgroup for LRU folios
Now that everything is set up, switch folio->memcg_data pointers to
objcgs, update the accessors, and execute reparenting on cgroup death.
With this change, folio->memcg_data of LRU folios and kmem folios always
points to an object cgroup. The folio->memcg_data of slab folios points to
a vector of object cgroups.
Link: https://lore.kernel.org/80cb7af198dc6f2173fe616d1207a4c315ece141.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Qi Zheng [Thu, 5 Mar 2026 11:52:49 +0000 (19:52 +0800)]
mm: memcontrol: convert objcg to be per-memcg per-node type
Convert objcg to be per-memcg per-node type, so that when LRU folios are
reparented later, we can hold the lru lock at the node level, thus avoiding
holding too many lru locks at once.
[zhengqi.arch@bytedance.com: reset pn->orig_objcg to NULL] Link: https://lore.kernel.org/20260309112939.31937-1-qi.zheng@linux.dev
[akpm@linux-foundation.org: fix comment typo, per Usama. Reflow comment to 80 cols]
[devnexen@gmail.com: fix obj_cgroup leak in mem_cgroup_css_online() error path] Link: https://lore.kernel.org/20260322193631.45457-1-devnexen@gmail.com
[devnexen@gmail.com: add newline, per Qi Zheng] Link: https://lore.kernel.org/20260323063007.7783-1-devnexen@gmail.com Link: https://lore.kernel.org/56c04b1c5d54f75ccdc12896df6c1ca35403ecc3.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: David Carlier <devnexen@gmail.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Usama Arif <usama.arif@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Qi Zheng [Thu, 5 Mar 2026 11:52:48 +0000 (19:52 +0800)]
mm: memcontrol: prepare for reparenting non-hierarchical stats
To resolve the dying memcg issue, we need to reparent LRU folios of child
memcg to its parent memcg. This could cause problems for non-hierarchical
stats.
As Yosry Ahmed pointed out:
In short, if memory is charged to a dying cgroup at the time of
reparenting, when the memory gets uncharged the stats updates will occur
at the parent. This will update both hierarchical and non-hierarchical
stats of the parent, which would corrupt the parent's non-hierarchical
stats (because those counters were never incremented when the memory was
charged).
Now we have the following two types of non-hierarchical stats, and they
are only used in CONFIG_MEMCG_V1:
a. memcg->vmstats->state_local[i]
b. pn->lruvec_stats->state_local[i]
To ensure that these non-hierarchical stats work properly, we need to
reparent these non-hierarchical stats after reparenting LRU folios. To
this end, this commit makes the following preparations:
1. implement reparent_state_local() to reparent non-hierarchical stats
2. make css_killed_work_fn() run from RCU work, and implement
get_non_dying_memcg_start() and get_non_dying_memcg_end() to avoid a race
between mod_memcg_state()/mod_memcg_lruvec_state()
and reparent_state_local()
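Why the local counters have to move with the folios can be seen in a simplified model. Everything here is hypothetical (struct memcg_model, mod_state(), reparent_state_local_model(), NR_STATS), and the eager walk over ancestors is a simplification; the kernel's real statistics machinery works differently.

```c
#include <assert.h>
#include <stddef.h>

#define NR_STATS 4	/* hypothetical number of counters */

/* Hypothetical model of a memcg with hierarchical and local counters. */
struct memcg_model {
	struct memcg_model *parent;
	long state[NR_STATS];		/* hierarchical: includes descendants */
	long state_local[NR_STATS];	/* non-hierarchical: this group only */
};

/* Charge or uncharge: local counter on the group, hierarchical on ancestors. */
static void mod_state(struct memcg_model *memcg, int idx, long val)
{
	memcg->state_local[idx] += val;
	for (; memcg; memcg = memcg->parent)
		memcg->state[idx] += val;
}

/* Fold the dying child's local counters into its parent. */
static void reparent_state_local_model(struct memcg_model *child)
{
	struct memcg_model *parent = child->parent;

	assert(parent != NULL);
	for (int i = 0; i < NR_STATS; i++) {
		parent->state_local[i] += child->state_local[i];
		child->state_local[i] = 0;
	}
}
```

If memory charged to the child is later uncharged at the parent, the parent's local counter returns to zero only because the child's local value was folded in first; without that step it would go negative.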
Link: https://lore.kernel.org/e862995c45a7101a541284b6ebee5e5c32c89066.1772711148.git.zhengqi.arch@bytedance.com Co-developed-by: Yosry Ahmed <yosry@kernel.org> Signed-off-by: Yosry Ahmed <yosry@kernel.org> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Qi Zheng [Thu, 5 Mar 2026 11:52:46 +0000 (19:52 +0800)]
mm: workingset: use lruvec_lru_size() to get the number of lru pages
For cgroup v2, count_shadow_nodes() is the only place to read
non-hierarchical stats (lruvec_stats->state_local). To avoid the need to
consider cgroup v2 during subsequent non-hierarchical stats reparenting,
use lruvec_lru_size() instead of lruvec_page_state_local() to get the
number of lru pages.
For NR_SLAB_RECLAIMABLE_B and NR_SLAB_UNRECLAIMABLE_B cases, it appears
that the statistics here have already been problematic for a while since
slab pages have been reparented. So just ignore it for now.
Link: https://lore.kernel.org/b1d448c667a8fb377c3390d9aba43bdb7e4d5739.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Muchun Song <muchun.song@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Qi Zheng [Thu, 5 Mar 2026 11:52:44 +0000 (19:52 +0800)]
mm: vmscan: prepare for reparenting MGLRU folios
Similar to traditional LRU folios, in order to solve the dying memcg
problem, we also need to reparent MGLRU folios to the parent memcg when
a memcg goes offline.
However, there are the following challenges:
1. Each lruvec has between MIN_NR_GENS and MAX_NR_GENS generations. The
number of generations of the parent and child memcg may be different,
so we cannot simply transfer MGLRU folios in the child memcg to the
parent memcg as we did for traditional LRU folios.
2. The generation information is stored in folio->flags, but we cannot
traverse these folios while holding the lru lock, otherwise it may
cause a soft lockup.
3. In walk_update_folio(), the gen of folio and corresponding lru size
may be updated, but the folio is not immediately moved to the
corresponding lru list. Therefore, there may be folios of different
generations on an LRU list.
4. In lru_gen_del_folio(), the generation to which the folio belongs is
found based on the generation information in folio->flags, and the
corresponding LRU size will be updated. Therefore, we need to update
the lru size correctly during reparenting, otherwise the lru size may
be updated incorrectly in lru_gen_del_folio().
In the end, this patch chooses a compromise: splice each lru list in the
child memcg onto the lru list of the same generation in the parent memcg
during reparenting. To ensure that the parent memcg has the same
generations, we need to increase the number of generations in the parent
memcg to MAX_NR_GENS before reparenting.
Of course, the same generation has different meanings in the parent and
child memcg, so this blurs the hot and cold information of the folios.
Other than that, this method is simple, the lru size stays correct, and
there is no need to consider some concurrency issues (such as
lru_gen_del_folio()).
To prepare for the above work, this commit implements the specific
functions, which will be used during reparenting.
[zhengqi.arch@bytedance.com: use list_splice_tail_init() to reparent child folios] Link: https://lore.kernel.org/20260324114937.28569-1-qi.zheng@linux.dev Link: https://lore.kernel.org/e75050354cdbc42221a04f7cf133292b61105548.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Suggested-by: Harry Yoo <harry.yoo@oracle.com> Suggested-by: Imran Khan <imran.f.khan@oracle.com> Acked-by: Harry Yoo <harry.yoo@oracle.com> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Qi Zheng [Thu, 5 Mar 2026 11:52:43 +0000 (19:52 +0800)]
mm: vmscan: prepare for reparenting traditional LRU folios
To resolve the dying memcg issue, we need to reparent LRU folios of child
memcg to its parent memcg. For the traditional LRU, each lruvec of every
memcg comprises four LRU lists. Because the parent and child have the same
set of lists, it is feasible to transfer each LRU list from a memcg to the
matching list of its parent memcg during the reparenting process.
This commit implements the specific function, which will be used during
the reparenting process.
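A list-by-list transfer of this kind can be sketched with a minimal circular doubly linked list. All names (struct node, struct lruvec_model, reparent_lru_lists() and the helpers) are hypothetical, and splicing at the tail is an assumption made for the sketch, not something this commit message states.

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *prev, *next; };

static void list_init(struct node *head) { head->prev = head->next = head; }

static void list_add_tail(struct node *item, struct node *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Move every entry of @from to the tail of @to and leave @from empty. */
static void splice_tail_init(struct node *from, struct node *to)
{
	if (from->next == from)
		return;
	from->next->prev = to->prev;
	to->prev->next = from->next;
	from->prev->next = to;
	to->prev = from->prev;
	list_init(from);
}

static size_t list_len(const struct node *head)
{
	size_t n = 0;

	for (const struct node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

enum { NR_LISTS = 4 };	/* the four lists mentioned in the text */

struct lruvec_model {
	struct node lists[NR_LISTS];
	long nr[NR_LISTS];	/* per-list size accounting */
};

static void lruvec_init(struct lruvec_model *lv)
{
	for (int i = 0; i < NR_LISTS; i++) {
		list_init(&lv->lists[i]);
		lv->nr[i] = 0;
	}
}

/* Transfer each child list and its size to the matching parent list. */
static void reparent_lru_lists(struct lruvec_model *child,
			       struct lruvec_model *parent)
{
	assert(child != parent);
	for (int i = 0; i < NR_LISTS; i++) {
		splice_tail_init(&child->lists[i], &parent->lists[i]);
		parent->nr[i] += child->nr[i];
		child->nr[i] = 0;
	}
}
```

Each splice is constant time regardless of list length, which is why moving whole lists is attractive compared with walking individual folios.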
Link: https://lore.kernel.org/a92d217a9fc82bd0c401210204a095caaf615b1c.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Muchun Song <muchun.song@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 5 Mar 2026 11:52:42 +0000 (19:52 +0800)]
mm: memcontrol: prepare for reparenting LRU pages for lruvec lock
The following pseudocode illustrates how to ensure the safety of the folio
lruvec lock when LRU folios undergo reparenting.
In the folio_lruvec_lock(folio) function:
    rcu_read_lock();
retry:
    lruvec = folio_lruvec(folio);
    /* There is a possibility of folio reparenting at this point. */
    spin_lock(&lruvec->lru_lock);
    if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
        /*
         * The wrong lruvec lock was acquired, and a retry is required.
         * This is because the folio resides on the parent memcg lruvec
         * list.
         */
        spin_unlock(&lruvec->lru_lock);
        goto retry;
    }
    /* Reaching here indicates that folio_memcg() is stable. */
In the memcg_reparent_objcgs(memcg) function:
spin_lock(&lruvec->lru_lock);
spin_lock(&lruvec_parent->lru_lock);
/* Transfer folios from the lruvec list to the parent's. */
spin_unlock(&lruvec_parent->lru_lock);
spin_unlock(&lruvec->lru_lock);
After acquiring the lruvec lock, it is necessary to verify whether the
folio has been reparented. If it has, the lock just taken belongs to the
wrong lruvec: it must be dropped and the lock of the folio's current
lruvec acquired instead. The LRU folio reparenting path also takes the
lruvec locks (this will be implemented in a subsequent patch). Therefore,
folio_memcg() cannot change while the correct lruvec lock is held.
Given that lruvec_memcg(lruvec) is always equal to folio_memcg(folio)
after the lruvec lock is acquired, the lruvec_memcg_debug() check is
redundant. Hence, it is removed.
This patch serves as a preparation for the reparenting of LRU folios.
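The lock-then-recheck loop above can be modelled in user space. The following is only a single-threaded sketch under stated assumptions: every toy_* name is hypothetical, a C11 atomic flag stands in for the lru_lock spinlock, RCU is left out entirely, and a one-shot hook simulates a reparent landing in the window between the lookup and the lock. It is not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are the kernel types. */
struct toy_memcg;

struct toy_lruvec {
	atomic_bool lru_lock;           /* toy spinlock standing in for lru_lock */
	struct toy_memcg *memcg;        /* owner, never changes */
};

struct toy_memcg {
	struct toy_lruvec lruvec;
	struct toy_memcg *parent;
};

struct toy_folio {
	struct toy_memcg *memcg;        /* rewritten only by toy_reparent() */
};

static void toy_lock(struct toy_lruvec *l)
{
	while (atomic_exchange(&l->lru_lock, true))
		;                       /* spin until the previous value was false */
}

static void toy_unlock(struct toy_lruvec *l)
{
	atomic_store(&l->lru_lock, false);
}

static void toy_memcg_init(struct toy_memcg *m, struct toy_memcg *parent)
{
	atomic_init(&m->lruvec.lru_lock, false);
	m->lruvec.memcg = m;
	m->parent = parent;
}

/* Reparenting side: hold the child's lock, then the parent's, and move. */
static void toy_reparent(struct toy_folio *folio)
{
	struct toy_memcg *child = folio->memcg;
	struct toy_memcg *parent = child->parent;

	assert(parent != NULL);
	toy_lock(&child->lruvec);
	toy_lock(&parent->lruvec);
	folio->memcg = parent;
	toy_unlock(&parent->lruvec);
	toy_unlock(&child->lruvec);
}

/* Test hook fired once between lookup and lock to model the race window. */
static void (*toy_race_hook)(struct toy_folio *);
static int toy_retries;

/* Lock side: look up, lock, then recheck that the folio did not move. */
static struct toy_lruvec *toy_folio_lruvec_lock(struct toy_folio *folio)
{
	for (;;) {
		struct toy_lruvec *lruvec = &folio->memcg->lruvec;

		if (toy_race_hook) {
			void (*hook)(struct toy_folio *) = toy_race_hook;

			toy_race_hook = NULL;
			hook(folio);
		}
		toy_lock(lruvec);
		if (lruvec->memcg == folio->memcg)
			return lruvec;  /* folio->memcg is stable from here on */
		toy_unlock(lruvec);     /* wrong lruvec: drop it and retry */
		toy_retries++;
	}
}
```

Because the reparenting side only rewrites the folio's owner while holding both locks, a caller that passes the recheck knows the owner cannot change until it unlocks. The unlocked read of folio->memcg is acceptable only because this model is single-threaded.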
Link: https://lore.kernel.org/23f22cbb1419f277a3483018b32158ae2b86c666.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Qi Zheng [Thu, 5 Mar 2026 11:52:41 +0000 (19:52 +0800)]
mm: do not open-code lruvec lock
We now have lruvec_unlock(), lruvec_unlock_irq() and
lruvec_unlock_irqrestore(), but not the matching lruvec_lock(),
lruvec_lock_irq() and lruvec_lock_irqsave().
There is currently no use case for lruvec_lock_irqsave(), so only
introduce lruvec_lock_irq(), and convert all open-coded call sites to this
helper. This looks cleaner and prepares for reparenting LRU pages,
preventing users from missing RCU lock calls when they open-code the
lruvec lock.
Link: https://lore.kernel.org/2d0bafe7564e17ece46dfd58197af22ce57017dc.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Muchun Song <muchun.song@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 5 Mar 2026 11:52:40 +0000 (19:52 +0800)]
mm: workingset: prevent lruvec release in workingset_activation()
In the near future, a folio will no longer pin its corresponding memory
cgroup. So an lruvec returned by folio_lruvec() could be released unless
the caller holds the rcu read lock or a reference to its memory cgroup.
In the current patch, the rcu read lock is employed to safeguard against
the release of the lruvec in workingset_activation().
This serves as a preparatory measure for the reparenting of the LRU pages.
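The guarantee relied on here can be illustrated with a toy user-space model. This is a sketch under loud assumptions: the toy_* names are hypothetical, the model is single-threaded, and its "grace period" waits for every active reader rather than only pre-existing ones as real RCU does. It only shows why an object whose release is deferred cannot disappear while a read-side section is open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an lruvec; not the kernel structure. */
struct toy_lruvec {
	int nr_pages;
	bool released;          /* set instead of freeing, so it can be inspected */
};

static int toy_readers;                 /* active read-side sections */
static struct toy_lruvec *toy_pending;  /* single deferred-release slot */

static void toy_rcu_read_lock(void)
{
	toy_readers++;
}

static void toy_rcu_read_unlock(void)
{
	assert(toy_readers > 0);
	toy_readers--;
}

/* Queue a release; it must not run before a grace period has elapsed. */
static void toy_call_rcu(struct toy_lruvec *lruvec)
{
	assert(toy_pending == NULL);
	toy_pending = lruvec;
}

/*
 * Try to end a grace period. It can only end once no reader is active.
 * Returns true if the pending release ran.
 */
static bool toy_grace_period(void)
{
	if (toy_readers > 0 || toy_pending == NULL)
		return false;
	toy_pending->released = true;
	toy_pending = NULL;
	return true;
}

/* Reader: between lock and unlock the lruvec may be touched safely. */
static int toy_read_nr_pages(struct toy_lruvec *lruvec, bool racing_release)
{
	int nr;

	toy_rcu_read_lock();
	if (racing_release) {
		toy_call_rcu(lruvec);   /* the owner goes away concurrently */
		toy_grace_period();     /* cannot complete: a reader is active */
	}
	assert(!lruvec->released);
	nr = lruvec->nr_pages;
	toy_rcu_read_unlock();
	return nr;
}
```

In the model, the release only takes effect on the first toy_grace_period() call made after the reader has left its section.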
Link: https://lore.kernel.org/c6130476affbba0a7d309a887c3df11e0167990b.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 5 Mar 2026 11:52:39 +0000 (19:52 +0800)]
mm: swap: prevent lruvec release in lru_gen_clear_refs()
In the near future, a folio will no longer pin its corresponding memory
cgroup. So an lruvec returned by folio_lruvec() could be released unless
the caller holds the rcu read lock or a reference to its memory cgroup.
In the current patch, the rcu read lock is employed to safeguard against
the release of the lruvec in lru_gen_clear_refs().
This serves as a preparatory measure for the reparenting of the LRU pages.
Link: https://lore.kernel.org/986cd26227191a48a7c34a2a15812d361f4ebd53.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 5 Mar 2026 11:52:38 +0000 (19:52 +0800)]
mm: zswap: prevent lruvec release in zswap_folio_swapin()
In the near future, a folio will no longer pin its corresponding memory
cgroup. So an lruvec returned by folio_lruvec() could be released unless
the caller holds the rcu read lock or a reference to its memory cgroup.
In the current patch, the rcu read lock is employed to safeguard against
the release of the lruvec in zswap_folio_swapin().
This serves as a preparatory measure for the reparenting of the LRU pages.
Link: https://lore.kernel.org/02b3f76ee8d1132f69ac5baaedce38fb82b09a48.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Nhat Pham <nphamcs@gmail.com> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 5 Mar 2026 11:52:37 +0000 (19:52 +0800)]
mm: workingset: prevent lruvec release in workingset_refault()
In the near future, a folio will no longer pin its corresponding memory
cgroup. So an lruvec returned by folio_lruvec() could be released unless
the caller holds the rcu read lock or a reference to its memory cgroup.
In the current patch, the rcu read lock is employed to safeguard against
the release of the lruvec in workingset_refault().
This serves as a preparatory measure for the reparenting of the LRU pages.
Link: https://lore.kernel.org/e3a8c19a9b18422b43213f6c89c451c5b6ca1577.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Yosry Ahmed <yosry@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Qi Zheng [Thu, 5 Mar 2026 11:52:36 +0000 (19:52 +0800)]
mm: zswap: prevent memory cgroup release in zswap_compress()
In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu
read lock or acquire a reference to the memory cgroup returned by
folio_memcg(), thereby preventing it from being released.
In the current patch, the rcu read lock is employed to safeguard against
the release of the memory cgroup in zswap_compress().
Link: https://lore.kernel.org/340f315050fb8a67caaf01b4836d4f38a41cf1a8.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Qi Zheng [Thu, 5 Mar 2026 11:52:35 +0000 (19:52 +0800)]
mm: thp: prevent memory cgroup release in folio_split_queue_lock{_irqsave}()
In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu
read lock or acquire a reference to the memory cgroup returned by
folio_memcg(), thereby preventing it from being released.
In the current patch, the rcu read lock is employed to safeguard against
the release of the memory cgroup in folio_split_queue_lock{_irqsave}().
Link: https://lore.kernel.org/ca2957c0df1126b2c71b40c738018fd5255525a6.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Muchun Song <muchun.song@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 5 Mar 2026 11:52:34 +0000 (19:52 +0800)]
mm: workingset: prevent memory cgroup release in lru_gen_eviction()
In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu
read lock or acquire a reference to the memory cgroup returned by
folio_memcg(), thereby preventing it from being released.
In the current patch, the rcu read lock is employed to safeguard against
the release of the memory cgroup in lru_gen_eviction().
This serves as a preparatory measure for the reparenting of the LRU pages.
Link: https://lore.kernel.org/f37e8ae2d84ddc690813d834cd75735d52d1bc78.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 5 Mar 2026 11:52:33 +0000 (19:52 +0800)]
mm: memcontrol: prevent memory cgroup release in mem_cgroup_swap_full()
In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu
read lock or acquire a reference to the memory cgroup returned by
folio_memcg(), thereby preventing it from being released.
In the current patch, the rcu read lock is employed to safeguard against
the release of the memory cgroup in mem_cgroup_swap_full().
This serves as a preparatory measure for the reparenting of the LRU pages.
Link: https://lore.kernel.org/21d1abab7342615745ea4c18a88237335ab44d13.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 5 Mar 2026 11:52:32 +0000 (19:52 +0800)]
mm: mglru: prevent memory cgroup release in mglru
In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu
read lock or acquire a reference to the memory cgroup returned by
folio_memcg(), thereby preventing it from being released.
In the current patch, the rcu read lock is employed to safeguard against
the release of the memory cgroup in mglru.
This serves as a preparatory measure for the reparenting of the LRU pages.
Link: https://lore.kernel.org/9d887662a9d39c425742dd8468e3123316bccfe3.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 5 Mar 2026 11:52:31 +0000 (19:52 +0800)]
mm: migrate: prevent memory cgroup release in folio_migrate_mapping()
In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu
read lock or acquire a reference to the memory cgroup returned by
folio_memcg(), thereby preventing it from being released.
In __folio_migrate_mapping(), the rcu read lock is employed to safeguard
against the release of the memory cgroup in folio_migrate_mapping().
This serves as a preparatory measure for the reparenting of the LRU pages.
Link: https://lore.kernel.org/0f156c2f1188f256855617953f8305f43e066065.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 5 Mar 2026 11:52:30 +0000 (19:52 +0800)]
mm: page_io: prevent memory cgroup release in page_io module
In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu
read lock or acquire a reference to the memory cgroup returned by
folio_memcg(), thereby preventing it from being released.
In the current patch, the rcu read lock is employed to safeguard against
the release of the memory cgroup in swap_writeout() and
bio_associate_blkg_from_page().
This serves as a preparatory measure for the reparenting of the LRU pages.
Link: https://lore.kernel.org/7c3708358412fb02c482d0985feb5e9513a863ef.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 5 Mar 2026 11:52:29 +0000 (19:52 +0800)]
mm: memcontrol: prevent memory cgroup release in count_memcg_folio_events()
In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu
read lock or acquire a reference to the memory cgroup returned by
folio_memcg(), thereby preventing it from being released.
In the current patch, the rcu read lock is employed to safeguard against
the release of the memory cgroup in count_memcg_folio_events().
This serves as a preparatory measure for the reparenting of the
LRU pages.
Link: https://lore.kernel.org/dea6aa0389367f7fd6b715c8837a2cf7506bd889.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 5 Mar 2026 11:52:28 +0000 (19:52 +0800)]
writeback: prevent memory cgroup release in writeback module
In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu
read lock or acquire a reference to the memory cgroup returned by
folio_memcg(), thereby preventing it from being released.
In the current patch, the function get_mem_cgroup_css_from_folio() and the
rcu read lock are employed to safeguard against the release of the memory
cgroup.
This serves as a preparatory measure for the reparenting of the
LRU pages.
Link: https://lore.kernel.org/645f99bc344575417f67def3744f975596df2793.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 5 Mar 2026 11:52:27 +0000 (19:52 +0800)]
buffer: prevent memory cgroup release in folio_alloc_buffers()
In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu
read lock or acquire a reference to the memory cgroup returned by
folio_memcg(), thereby preventing it from being released.
In the current patch, the function get_mem_cgroup_from_folio() is employed
to safeguard against the release of the memory cgroup. This serves as a
preparatory measure for the reparenting of the LRU pages.
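The "acquire a reference" alternative can be sketched as an increment-unless-zero operation. The model below is an assumption-laden illustration, not the body of get_mem_cgroup_from_folio(): the toy_* names are hypothetical, a C11 atomic counter stands in for the cgroup's reference count, the RCU read-side section the kernel would need around the pointer load is omitted, and returning NULL on failure is just one possible policy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's mem_cgroup or folio. */
struct toy_memcg {
	atomic_int refcnt;      /* 0 means the group is already being torn down */
};

struct toy_folio {
	struct toy_memcg *memcg;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool toy_memcg_tryget(struct toy_memcg *memcg)
{
	int old = atomic_load(&memcg->refcnt);

	while (old > 0) {
		/* On failure, old is refreshed with the current count. */
		if (atomic_compare_exchange_weak(&memcg->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

static void toy_memcg_put(struct toy_memcg *memcg)
{
	int prev = atomic_fetch_sub(&memcg->refcnt, 1);

	assert(prev > 0);
	(void)prev;             /* keep builds with NDEBUG warning-free */
}

/* Return a pinned memcg for the folio, or NULL if it could not be pinned. */
static struct toy_memcg *toy_get_memcg_from_folio(struct toy_folio *folio)
{
	struct toy_memcg *memcg = folio->memcg;

	if (memcg && toy_memcg_tryget(memcg))
		return memcg;
	return NULL;
}
```

A caller that gets a non-NULL result owns one reference and must drop it with toy_memcg_put() when done. Unlike an RCU read-side section, the reference may be held across operations that sleep.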
Link: https://lore.kernel.org/d6d48fdcf329c549373ac0a1c80fd9f38067e34e.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 5 Mar 2026 11:52:26 +0000 (19:52 +0800)]
mm: memcontrol: prevent memory cgroup release in get_mem_cgroup_from_folio()
In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu
read lock or acquire a reference to the memory cgroup returned by
folio_memcg(), thereby preventing it from being released.
In the current patch, the rcu read lock is employed to safeguard against
the release of the memory cgroup in get_mem_cgroup_from_folio().
This serves as a preparatory measure for the reparenting of the
LRU pages.
Link: https://lore.kernel.org/a5a64c6173a566bd21534606aeaaa9220cb1366d.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
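The pattern described in the get_mem_cgroup_from_folio() commit above (look up the memory cgroup, then take a reference only if it is still alive) can be sketched in userspace C11. This is a minimal model, not the kernel implementation: memcg_model, folio_model, memcg_tryget(), memcg_put() and get_memcg_from_folio_model() are hypothetical names invented for illustration, and the sketch assumes that whoever releases a group first rebinds its folios to the parent. In the kernel the lookup would additionally run under the RCU read lock so that the group's memory cannot be freed while it is being examined; the model has no equivalent of that.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct memcg_model {
	atomic_int refs;		/* 0 means the group is being released */
	struct memcg_model *parent;	/* NULL only for the root */
};

struct folio_model {
	_Atomic(struct memcg_model *) memcg;	/* may change on reparenting */
};

/* Take a reference only if the count is still non-zero. */
static bool memcg_tryget(struct memcg_model *m)
{
	int old = atomic_load(&m->refs);

	while (old > 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&m->refs, &old, old + 1))
			return true;
	}
	return false;
}

static void memcg_put(struct memcg_model *m)
{
	int prev = atomic_fetch_sub(&m->refs, 1);

	assert(prev > 0);
}

/*
 * Return the group the folio is currently bound to, with a reference held.
 * Assumption of this model: a group whose count reached zero has had (or
 * will shortly have) its folios rebound to the parent, so the retry ends.
 */
static struct memcg_model *get_memcg_from_folio_model(struct folio_model *f)
{
	for (;;) {
		struct memcg_model *m = atomic_load(&f->memcg);

		if (memcg_tryget(m))
			return m;
	}
}
```

A caller pairs get_memcg_from_folio_model() with memcg_put() once it is done with the returned group.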
Muchun Song [Thu, 5 Mar 2026 11:52:25 +0000 (19:52 +0800)]
mm: memcontrol: return root object cgroup for root memory cgroup
Memory cgroup functions such as get_mem_cgroup_from_folio() and
get_mem_cgroup_from_mm() return a valid memory cgroup pointer, even for
the root memory cgroup. In contrast, the situation for object cgroups has
been different.
Previously, the root object cgroup couldn't be returned because it didn't
exist. Now that a valid root object cgroup exists, for the sake of
consistency, it's necessary to align the behavior of object-cgroup-related
operations with that of memory cgroup APIs.
Link: https://lore.kernel.org/e9c3f40ba7681d9753372d4ee2ac7a0216848b95.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 5 Mar 2026 11:52:24 +0000 (19:52 +0800)]
mm: memcontrol: allocate object cgroup for non-kmem case
To allow LRU page reparenting, the objcg infrastructure is no longer
solely applicable to the kmem case. In this patch, we extend the scope of
the objcg infrastructure beyond the kmem case, enabling LRU folios to
reuse it for folio charging purposes.
It should be noted that LRU folios are not accounted for at the root
level, yet the folio->memcg_data points to the root_mem_cgroup. Hence,
the folio->memcg_data of LRU folios always points to a valid pointer.
However, the root_mem_cgroup does not possess an object cgroup.
Therefore, we also allocate an object cgroup for the root_mem_cgroup.
Link: https://lore.kernel.org/b77274aa8e3f37c419bedf4782943fd5885dda82.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Chen Ridong <chenridong@huawei.com> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 5 Mar 2026 11:52:23 +0000 (19:52 +0800)]
mm: vmscan: refactor move_folios_to_lru()
In a subsequent patch, we'll reparent the LRU folios. The folios that are
moved to the appropriate LRU list can undergo reparenting during the
move_folios_to_lru() process. Hence, it's incorrect for the caller to
hold a lruvec lock. Instead, we should utilize the more general interface
of folio_lruvec_relock_irq() to obtain the correct lruvec lock.
This patch involves only code refactoring and doesn't introduce any
functional changes.
Link: https://lore.kernel.org/6f1dac88b61e2e3cb7a3e90bacdf06b654acfc15.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chen Ridong <chenridong@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
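The relock idea behind the move_folios_to_lru() refactoring above can be illustrated with a single-threaded userspace model: instead of the caller holding one lruvec lock for a whole batch, the loop switches locks only when the next folio belongs to a different lruvec. All names here (lruvec_model, folio_model, relock_model(), move_batch_model()) are hypothetical; the real folio_lruvec_relock_irq() also deals with IRQ state and concurrency, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct lruvec_model {
	bool locked;
	int acquisitions;	/* how many times the lock was taken */
};

struct folio_model {
	struct lruvec_model *lruvec;	/* list this folio belongs on */
};

static void lruvec_lock_model(struct lruvec_model *l)
{
	assert(!l->locked);
	l->locked = true;
	l->acquisitions++;
}

static void lruvec_unlock_model(struct lruvec_model *l)
{
	assert(l->locked);
	l->locked = false;
}

/* Keep the held lock if it is already the right one, else switch. */
static struct lruvec_model *relock_model(struct folio_model *f,
					 struct lruvec_model *held)
{
	if (held == f->lruvec)
		return held;
	if (held)
		lruvec_unlock_model(held);
	lruvec_lock_model(f->lruvec);
	return f->lruvec;
}

static void move_batch_model(struct folio_model *batch, size_t n)
{
	struct lruvec_model *held = NULL;

	for (size_t i = 0; i < n; i++) {
		held = relock_model(&batch[i], held);
		/* ... the folio would be added to the list guarded by 'held' ... */
	}
	if (held)
		lruvec_unlock_model(held);
}
```

With a batch whose folios are grouped by lruvec, the number of lock acquisitions equals the number of runs of equal lruvecs rather than the number of folios.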
Qi Zheng [Thu, 5 Mar 2026 11:52:22 +0000 (19:52 +0800)]
mm: vmscan: prepare for the refactoring the move_folios_to_lru()
Once we refactor move_folios_to_lru(), its callers will no longer have to
hold the lruvec lock. For shrink_inactive_list(), shrink_active_list() and
evict_folios(), IRQ disabling is then only needed for __count_vm_events() and
__mod_node_page_state().
To avoid using local_irq_disable() on PREEMPT_RT kernels, make
all callers of move_folios_to_lru() use the IRQ-safe count_vm_events() and
mod_node_page_state().
Link: https://lore.kernel.org/b3a202f1787b0857bb6cbe059fffb8edefaf67b7.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Chen Ridong <chenridong@huawei.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Muchun Song <muchun.song@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 5 Mar 2026 11:52:21 +0000 (19:52 +0800)]
mm: rename unlock_page_lruvec_irq and its variants
It is inappropriate to use folio_lruvec_lock() variants in conjunction
with unlock_page_lruvec() variants, as this involves the inconsistent
operation of locking a folio while unlocking a page. To rectify this, the
functions unlock_page_lruvec{_irq,_irqrestore} are renamed to
lruvec_unlock{_irq,_irqrestore}.
Link: https://lore.kernel.org/4e5e05271a250df4d1812e1832be65636a78c957.1772711148.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Reviewed-by: Chen Ridong <chenridong@huawei.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 5 Mar 2026 11:52:19 +0000 (19:52 +0800)]
mm: memcontrol: remove dead code of checking parent memory cgroup
Patch series "Eliminate Dying Memory Cgroup", v6.
Introduction
============
This patchset is intended to transfer the LRU pages to the object cgroup
without holding a reference to the original memory cgroup in order to
address the issue of the dying memory cgroup. A consensus has already
been reached regarding this approach recently [1].
Background
==========
The issue of a dying memory cgroup refers to a situation where a memory
cgroup is no longer being used by users, but memory (the metadata
associated with memory cgroups) remains allocated to it. This situation
may potentially result in memory leaks or inefficiencies in memory
reclamation and has persisted as an issue for several years. Any memory
allocation that endures longer than the lifespan (from the users'
perspective) of a memory cgroup can lead to the issue of dying memory
cgroup. We have exerted greater efforts to tackle this problem by
introducing the infrastructure of object cgroup [2].
Presently, numerous types of objects (slab objects, non-slab kernel
allocations, per-CPU objects) are charged to the object cgroup without
holding a reference to the original memory cgroup. The remaining allocations,
those for LRU pages (anonymous pages and file pages), are charged at allocation
time and continue to hold a reference to the original memory cgroup until
reclaimed.
File pages are more complex than anonymous pages as they can be shared
among different memory cgroups and may persist beyond the lifespan of the
memory cgroup. The long-term pinning of file pages to memory cgroups is a
widespread issue that causes recurring problems in practical scenarios
[3]. File pages remain unreclaimed for extended periods. Additionally,
they are accessed by successive instances (second, third, fourth, etc.) of
the same job, which is restarted into a new cgroup each time. As a
result, unreclaimable dying memory cgroups accumulate, leading to memory
wastage and significantly reducing the efficiency of page reclamation.
Fundamentals
============
A folio will no longer pin its corresponding memory cgroup. It is
necessary to ensure that the memory cgroup or the lruvec associated with
the memory cgroup is not released when a user obtains a pointer to the
memory cgroup or lruvec returned by folio_memcg() or folio_lruvec().
Users are required to hold the RCU read lock or acquire a reference to the
memory cgroup associated with the folio to prevent its release if they are
not concerned about the binding stability between the folio and its
corresponding memory cgroup. However, some users of folio_lruvec() (i.e.,
the lruvec lock) desire a stable binding between the folio and its
corresponding memory cgroup. An approach is needed to ensure the
stability of the binding while the lruvec lock is held, and to detect the
situation of holding the incorrect lruvec lock when there is a race
condition during memory cgroup reparenting. The following three steps are
taken to achieve these goals.
1. The first step to be taken is to identify all users of both functions
(folio_memcg() and folio_lruvec()) who are not concerned about binding
stability and implement appropriate measures (such as holding an RCU read
lock or temporarily obtaining a reference to the memory cgroup for a
brief period) to prevent the release of the memory cgroup.
2. Secondly, the following refactoring of folio_lruvec_lock() demonstrates
how to ensure the binding stability from the user's perspective of
folio_lruvec().
From the perspective of memory cgroup removal, the entire reparenting
process (altering the binding relationship between folio and its memory
cgroup and moving the LRU lists to its parental memory cgroup) should be
carried out under both the lruvec lock of the memory cgroup being removed
and the lruvec lock of its parent.
3. Finally, transfer the LRU pages to the object cgroup without holding a
reference to the original memory cgroup.
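The lock-and-recheck idea behind step 2 can be sketched in userspace C11. This is a minimal model under stated assumptions, not the kernel's folio_lruvec_lock(): group_model, lruvec_model, folio_model and every *_model() function are hypothetical names, the lock is a toy spinlock, and the model assumes the group structures stay allocated (the kernel relies on RCU for that). The locker rechecks the folio's binding after taking the lock and retries if it changed. The reparenting side changes the binding only while holding both the child's and the parent's lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct lruvec_model {
	atomic_bool locked;	/* toy spinlock protecting the LRU lists */
};

struct group_model {
	struct lruvec_model lruvec;
};

struct folio_model {
	_Atomic(struct group_model *) group;	/* changes on reparenting */
};

static void group_init_model(struct group_model *g)
{
	atomic_init(&g->lruvec.locked, false);
}

static void lruvec_lock_model(struct lruvec_model *l)
{
	/* Spin until the previous value was "unlocked". */
	while (atomic_exchange(&l->locked, true))
		;
}

static void lruvec_unlock_model(struct lruvec_model *l)
{
	assert(atomic_load(&l->locked));
	atomic_store(&l->locked, false);
}

/* Lock the lruvec the folio belongs to; the binding is stable on return. */
static struct lruvec_model *folio_lruvec_lock_model(struct folio_model *f)
{
	for (;;) {
		struct group_model *g = atomic_load(&f->group);

		lruvec_lock_model(&g->lruvec);
		/* Recheck: the folio may have been reparented meanwhile. */
		if (atomic_load(&f->group) == g)
			return &g->lruvec;
		lruvec_unlock_model(&g->lruvec);
	}
}

/* Rebind the folio while holding both the child's and the parent's lock. */
static void reparent_model(struct folio_model *f, struct group_model *child,
			   struct group_model *parent)
{
	lruvec_lock_model(&child->lruvec);
	lruvec_lock_model(&parent->lruvec);
	atomic_store(&f->group, parent);
	lruvec_unlock_model(&parent->lruvec);
	lruvec_unlock_model(&child->lruvec);
}
```

Because the binding only changes while the old lruvec lock is held, a locker that passes the recheck knows the folio cannot move to another group until it unlocks.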
Effect
======
Finally, it can be observed that the quantity of dying memory cgroups will
not experience a significant increase if the following test script is
executed to reproduce the issue.
#!/bin/bash
# Create a temporary file 'temp' filled with zero bytes
dd if=/dev/zero of=temp bs=4096 count=1
# Display memory-cgroup info from /proc/cgroups
cat /proc/cgroups | grep memory
for i in {0..2000}
do
    mkdir /sys/fs/cgroup/memory/test$i
    echo $$ > /sys/fs/cgroup/memory/test$i/cgroup.procs
    # Append 'temp' file content to 'log' so page cache is charged to test$i
    cat temp >> log
    # Move the shell back to the root cgroup so that rmdir can succeed
    echo $$ > /sys/fs/cgroup/memory/cgroup.procs
    # Potentially create a dying memory cgroup
    rmdir /sys/fs/cgroup/memory/test$i
done
# Display memory-cgroup info after test
cat /proc/cgroups | grep memory
rm -f temp log
This patch (of 33):
The no-hierarchy mode was deprecated by commit bef8620cd8e0 ("mm: memcg:
deprecate the non-hierarchical mode"). As a result, parent_mem_cgroup()
will not return NULL except when passed the root memcg, and the root memcg
cannot be offline. Hence, it's safe to remove the check on the returned
value of parent_mem_cgroup(). Remove the corresponding dead code.
Link: https://lore.kernel.org/f4481291bf8c6561dd8949045b5a1ed4008a6b63.1772711148.git.zhengqi.arch@bytedance.com Link: https://lore.kernel.org/linux-mm/Z6OkXXYDorPrBvEQ@hm-sls2/ Link: https://lwn.net/Articles/895431/ Link: https://github.com/systemd/systemd/pull/36827 Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Reviewed-by: Chen Ridong <chenridong@huawei.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: Hugh Dickins <hughd@google.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yosry Ahmed <yosry@kernel.org> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lorenzo Stoakes [Mon, 13 Apr 2026 10:57:13 +0000 (11:57 +0100)]
mm/vma: remove __vma_check_mmap_hook()
Commit c50ca15dd496 ("mm: add vm_ops->mapped hook") introduced
__vma_check_mmap_hook() in order to assert that a driver doesn't
incorrectly implement both an f_op->mmap() and a vm_ops->mapped hook, the
latter of which would not ultimately get invoked.
However, this did not correctly account for stacked drivers (or drivers
that otherwise use the compatibility layer) which might recursively call
an mmap_prepare hook via the compatibility layer.
Thus the nested mmap_prepare() invocation might result in a VMA which has
vm_ops->mapped set with an overlaying mmap() hook, causing the
__vma_check_mmap_hook() to fail in vfs_mmap(), wrongly failing the
operation.
This patch resolves this by simply removing the check, as we can't be
certain that an mmap() hook doesn't at some point invoke the compatibility
layer, and it's not worth trying to track it.
Link: https://lore.kernel.org/20260413105713.92625-1-ljs@kernel.org Fixes: c50ca15dd496 ("mm: add vm_ops->mapped hook") Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Closes: https://lore.kernel.org/all/adx2ws5z0NMIe5Yj@shinmob/ Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Tested-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Merge tag 'mtd/for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull MTD updates from Miquel Raynal:
"MTD changes:
- mtdconcat finally makes it in, after several years of being merged
and reverted
- Baikal SoC support is being removed, so MTD bits are being removed
as well
- misc cleanups
NAND changes:
- SunXi driver support for new versions of the Allwinner NAND
controller.
- DT-binding improvements and cleanups.
- A few fixes (Realtek ECC and Winbond SPI NAND), along with the
usual load of misc changes.
SPI NOR fixes:
- Enable die erase on MT35XU02GCBA. We knew this flash needed this
fixup since 7f77c561e227 ("mtd: spi-nor: micron-st: add TODO for
fixing mt35xu02gcba") but did not add it due to lack of hardware to
test on.
- Fix locking on some Winbond w25q series flashes.
- Fix Auto Address Increment (AAI) writes on SST flashes that
start on an odd address. The write enable latch needs to be set again
after the single byte program"
* tag 'mtd/for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (44 commits)
mtd: spinand: winbond: Declare the QE bit on W25NxxJW
mtd: spi-nor: micron-st: Enable die erase support for MT35XU02GCBA
mtd: spi-nor: winbond: Fix locking support for w25q256jw
mtd: spi-nor: sst: Fix write enable before AAI sequence
mtd: spi-nor: winbond: Fix locking support for w25q64jvm
mtd: spi-nor: winbond: Fix locking support for w25q256jwm
dt-bindings: mtd: mxc-nand: add missing compatible string and ref to nand-controller-legacy.yaml
dt-bindings: mtd: gpmi-nand: ref to nand-controller-legacy.yaml
dt-bindings: mtd: refactor NAND bindings and add nand-controller-legacy.yaml
mtd: spinand: winbond: Clarify when to enable the HS bit
mtd: rawnand: sunxi: introduce maximize variable user data length
mtd: rawnand: sunxi: fix typos in comments
mtd: rawnand: sunxi: change error prone variable name
mtd: rawnand: sunxi: remove dead code
mtd: rawnand: sunxi: make the code more self-explanatory
mtd: rawnand: sunxi: replace hard coded value by a define - take2
mtd: rawnand: sunxi: do not count BBM bytes twice
mtd: rawnand: sunxi: fix sunxi_nfc_hw_ecc_read_extra_oob
mtd: rawnand: sunxi: sunxi_nand_ooblayout_free code clarification
mtd: cmdlinepart: use a flexible array member
...
Merge tag 'ext4_for_linux-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
- Refactor code paths involved with partial block zero-out in
preparation for converting ext4 to use iomap for buffered writes
- Remove use of d_alloc() from ext4 in preparation for the deprecation
of this interface
- Replace some J_ASSERTS with a journal abort so we can avoid a kernel
panic for a localized file system error
- Simplify various code paths in mballoc, move_extent, and fast commit
- Fix rare deadlock in jbd2_journal_cancel_revoke() that can be
triggered by generic/013 when blocksize < pagesize
- Fix memory leak when releasing an extended attribute when its value
is stored in an ea_inode
- Fix various potential kunit test bugs in fs/ext4/extents.c
- Fix potential out-of-bounds access in check_xattr() with a corrupted
file system
- Make the jbd2_inode dirty range tracking safe for lockless reads
- Avoid a WARN_ON when writeback fails due to a corrupted file system;
we already print an ext4 warning indicating that data will be lost,
so the WARN_ON is not necessary and doesn't add any new information
* tag 'ext4_for_linux-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (37 commits)
jbd2: fix deadlock in jbd2_journal_cancel_revoke()
ext4: fix missing brelse() in ext4_xattr_inode_dec_ref_all()
ext4: fix possible null-ptr-deref in mbt_kunit_exit()
ext4: fix possible null-ptr-deref in extents_kunit_exit()
ext4: fix the error handling process in extents_kunit_init).
ext4: call deactivate_super() in extents_kunit_exit()
ext4: fix miss unlock 'sb->s_umount' in extents_kunit_init()
ext4: fix bounds check in check_xattrs() to prevent out-of-bounds access
ext4: zero post-EOF partial block before appending write
ext4: move pagecache_isize_extended() out of active handle
ext4: remove ctime/mtime update from ext4_alloc_file_blocks()
ext4: unify SYNC mode checks in fallocate paths
ext4: ensure zeroed partial blocks are persisted in SYNC mode
ext4: move zero partial block range functions out of active handle
ext4: pass allocate range as loff_t to ext4_alloc_file_blocks()
ext4: remove handle parameters from zero partial block functions
ext4: move ordered data handling out of ext4_block_do_zero_range()
ext4: rename ext4_block_zero_page_range() to ext4_block_zero_range()
ext4: factor out journalled block zeroing range
ext4: rename and extend ext4_block_truncate_page()
...
Merge tag 'for-linus-7.1-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs updates from Mike Marshall:
"Fixes:
- validate getxattr response length
- don't overflow the bufmap slot on readahead
- fix parsing problem with kernel debug keywords
Cleanup:
- take better advantage of strscpy
New:
- manage bufmap as folios
- add usercopy whitelist to orangefs_op_cache"
* tag 'for-linus-7.1-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
bufmap: manage as folios, V2.
orangefs: validate getxattr response length
orangefs_readahead: don't overflow the bufmap slot.
debugfs: take better advantage of strscpy.
orangefs: add usercopy whitelist to orangefs_op_cache
orangefs-debugfs.c: fix parsing problem with kernel debug keywords.
Merge tag 'ntfs-for-7.1-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/ntfs
Pull ntfs resurrection from Namjae Jeon:
"Ever since Kari Argillander’s 2022 report [1] regarding the state of
the ntfs3 driver, I have spent the last 4 years working to bring NTFS in
Linux full write support, current kernel practices (iomap, no buffer
heads, folios), enhanced performance, stable maintenance, and utility
support including fsck.
This new implementation is built upon the clean foundation of the
original read-only NTFS driver, adding:
- Write support:
Implemented full write support based on the classic read-only NTFS
driver. Added delayed allocation to improve write performance
through multi-cluster allocation and reduced fragmentation of the
cluster bitmap.
- iomap conversion:
Switched buffered IO (reads/writes), direct IO, file extent
mapping, readpages, and writepages to use iomap.
- Remove buffer_head:
Completely removed buffer_head usage by converting to folios. As a
result, the dependency on CONFIG_BUFFER_HEAD has been removed from
Kconfig.
- Stability improvements:
The new ntfs driver passes 326 xfstests, compared to 273 for ntfs3.
All tests passed by ntfs3 are a complete subset of the tests passed
by this implementation. Added support for fallocate, idmapped
mounts, permissions, and more.
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
"Most of the diff stat comes from Xu Kuohai's fix to emit ENDBR/BTI,
since all JITs had to be touched to move constant blinding out and
pass bpf_verifier_env in.
- Fix use-after-free in arena_vm_close on fork (Alexei Starovoitov)
- Dissociate struct_ops program with map if map_update fails (Amery
Hung)
- Fix out-of-range and off-by-one bugs in arm64 JIT (Daniel Borkmann)
- Fix precedence bug in convert_bpf_ld_abs alignment check (Daniel
Borkmann)
- Fix arg tracking for imprecise/multi-offset in BPF_ST/STX insns
(Eduard Zingerman)
- Copy token from main to subprogs to fix missing kallsyms (Eduard
Zingerman)
- Prevent double close and leak of btf objects in libbpf (Jiri Olsa)
- Fix af_unix null-ptr-deref in sockmap (Michal Luczaj)
- Fix NULL deref in map_kptr_match_type for scalar regs (Mykyta
Yatsenko)
- Avoid unnecessary IPIs. Remove redundant bpf_flush_icache() in
arm64 and riscv JITs (Puranjay Mohan)
- Fix out of bounds access. Validate node_id in arena_alloc_pages()
(Puranjay Mohan)
- Reject BPF-to-BPF calls and callbacks in arm32 JIT (Puranjay Mohan)
- Refactor all JITs to pass bpf_verifier_env to emit ENDBR/BTI for
indirect jump targets on x86-64, arm64 JITs (Xu Kuohai)
- Allow UTF-8 literals in bpf_bprintf_prepare() (Yihan Ding)"
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (32 commits)
bpf, arm32: Reject BPF-to-BPF calls and callbacks in the JIT
bpf: Dissociate struct_ops program with map if map_update fails
bpf: Validate node_id in arena_alloc_pages()
libbpf: Prevent double close and leak of btf objects
selftests/bpf: cover UTF-8 trace_printk output
bpf: allow UTF-8 literals in bpf_bprintf_prepare()
selftests/bpf: Reject scalar store into kptr slot
bpf: Fix NULL deref in map_kptr_match_type for scalar regs
bpf: Fix precedence bug in convert_bpf_ld_abs alignment check
bpf, arm64: Emit BTI for indirect jump target
bpf, x86: Emit ENDBR for indirect jump targets
bpf: Add helper to detect indirect jump targets
bpf: Pass bpf_verifier_env to JIT
bpf: Move constants blinding out of arch-specific JITs
bpf, sockmap: Take state lock for af_unix iter
bpf, sockmap: Fix af_unix null-ptr-deref in proto update
selftests/bpf: Extend bpf_iter_unix to attempt deadlocking
bpf, sockmap: Fix af_unix iter deadlock
bpf, sockmap: Annotate af_unix sock::sk_state data-races
selftests/bpf: verify kallsyms entries for token-loaded subprograms
...
Merge tag 'cxl-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL (Compute Express Link) updates from Dave Jiang:
"The significant change of interest is the handling of soft reserved
memory conflict between CXL and HMEM. In essence CXL will be the first
to claim the soft reserved memory ranges that belong to CXL and
attempt to enumerate them with best effort. If CXL is not able to
enumerate the ranges it will punt them to HMEM.
There are also MAINTAINERS email changes from Dan Williams and
Jonathan Cameron"
* tag 'cxl-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (37 commits)
MAINTAINERS: Update Jonathan Cameron's email address
cxl/hdm: Add support for 32 switch decoders
MAINTAINERS: Update address for Dan Williams
tools/testing/cxl: Enable replay of user regions as auto regions
cxl/region: Add a region sysfs interface for region lock status
tools/testing/cxl: Test dax_hmem takeover of CXL regions
tools/testing/cxl: Simulate auto-assembly failure
dax/hmem: Parent dax_hmem devices
dax/hmem: Fix singleton confusion between dax_hmem_work and hmem devices
dax/hmem: Reduce visibility of dax_cxl coordination symbols
cxl/region: Constify cxl_region_resource_contains()
cxl/region: Limit visibility of cxl_region_contains_resource()
dax/cxl: Fix HMEM dependencies
cxl/region: Fix use-after-free from auto assembly failure
cxl/core: Check existence of cxl_memdev_state in poison test
cxl/core: use cleanup.h for devm_cxl_add_dax_region
cxl/core/region: move dax region device logic into region_dax.c
cxl/core/region: move pmem region driver logic into region_pmem.c
dax/hmem, cxl: Defer and resolve Soft Reserved ownership
cxl/region: Add helper to check Soft Reserved containment by CXL regions
...
Merge tag 'stop-machine.2026.04.16a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull stop-machine update from Paul McKenney:
- kernel-doc updates for stop_machine() and stop_machine_cpuslocked()
functions
* tag 'stop-machine.2026.04.16a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
stop_machine: Fix the documentation for a NULL cpus argument
Merge tag 'integrity-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
"There are two main changes, one feature removal, some code cleanup,
and a number of bug fixes.
Main changes:
- Detecting secure boot mode was limited to IMA. Make detecting
secure boot mode accessible to EVM and other LSMs
- IMA sigv3 support was limited to fsverity. Add IMA sigv3 support
for IMA regular file hashes and EVM portable signatures
Remove:
- Remove IMA support for asynchronous hash calculation originally
added for hardware acceleration
Cleanup:
- Remove unnecessary Kconfig CONFIG_MODULE_SIG and CONFIG_KEXEC_SIG
tests
- Add descriptions of the IMA atomic flags
Bug fixes:
- Like IMA, properly limit EVM "fix" mode
- Define and call evm_fix_hmac() to update security.evm
- Fallback to using i_version to detect file change for filesystems
that do not support STATX_CHANGE_COOKIE
- Address missing kernel support for configured (new) TPM hash
algorithms
- Add missing crypto_shash_final() return value"
* tag 'integrity-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
evm: Enforce signatures version 3 with new EVM policy 'bit 3'
integrity: Allow sigv3 verification on EVM_XATTR_PORTABLE_DIGSIG
ima: add support to require IMA sigv3 signatures
ima: add regular file data hash signature version 3 support
ima: Define asymmetric_verify_v3() to verify IMA sigv3 signatures
ima: remove buggy support for asynchronous hashes
integrity: Eliminate weak definition of arch_get_secureboot()
ima: Add code comments to explain IMA iint cache atomic_flags
ima_fs: Correctly create securityfs files for unsupported hash algos
ima: check return value of crypto_shash_final() in boot aggregate
ima: Define and use a digest_size field in the ima_algo_desc structure
powerpc/ima: Drop unnecessary check for CONFIG_MODULE_SIG
ima: efi: Drop unnecessary check for CONFIG_MODULE_SIG/CONFIG_KEXEC_SIG
ima: fallback to using i_version to detect file change
evm: fix security.evm for a file with IMA signature
s390: Drop unnecessary CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT
evm: Don't enable fix mode when secure boot is enabled
integrity: Make arch_ima_get_secureboot integrity-wide
Merge tag 'hwlock-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
Pull hwspinlock updates from Bjorn Andersson:
"Remove the unused u8500 hardware spinlock driver, and clean out the
hwspinlock_pdata struct as this was the last user of the struct"
* tag 'hwlock-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
hwspinlock: remove now unused pdata from header file
hwspinlock: u8500: delete driver
Merge tag 'rpmsg-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
Pull rpmsg updates from Bjorn Andersson:
"Mark 'data' argument in rpmsg_send() const, and percolate to related
drivers. Replace deprecated class_destroy() with class_unregister()"
* tag 'rpmsg-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
media: platform: mtk-mdp3: Constify buffer passed to mdp_vpu_sendmsg()
ASoC: qcom: Constify GPR packet being send over GPR interface
rpmsg: Constify buffer passed to send API
remoteproc: mtk_scp: Constify buffer passed to scp_send_ipi()
remoteproc: mtk_scp_ipi: Constify buffer passed to scp_ipi_send()
drivers: rpmsg: class_destroy() is deprecated
Merge tag 'rproc-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
Pull remoteproc updates from Bjorn Andersson:
- Move requesting of IRQs in TI Keystone driver to probe time instead
of remoteproc start, to allow better handling of errors.
- Introduce support for more than 10 entries in the Qualcomm minidump
implementation.
- Add audio DSP remoteproc support for the Qualcomm Eliza platform. Add
modem remoteproc support for the Qualcomm MDM9607, MSM8917, MSM8937,
and MSM8940 platforms.
- Add list of Qualcomm QMI service ids to the QMI header file, in order
to avoid sprinkling them across the various drivers using them.
Migrate sysmon to use this constant.
- Fix several issues related to DeviceTree parsing and mailbox handling
in the Xilinx R5F remote processor driver.
- Fix incorrect error checks in reserved memory handling and polish the
code across i.MX and TI drivers.
* tag 'rproc-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux: (35 commits)
remoteproc: qcom: pas: Add Eliza ADSP support
dt-bindings: remoteproc: qcom,milos-pas: Document Eliza ADSP
remoteproc: qcom: Add missing space before closing bracket
dt-bindings: remoteproc: qcom: Drop types for firmware-name
remoteproc: qcom: Fix minidump out-of-bounds access on subsystems array
dt-bindings: remoteproc: k3-r5f: Add memory-region-names
dt-bindings: remoteproc: k3-r5f: Split up memory regions
remoteproc: use SIZE_MAX in rproc_u64_fit_in_size_t()
dt-bindings: remoteproc: qcom,sm8550-pas: Add Glymur CDSP
dt-bindings: remoteproc: qcom,sm8550-pas: Add Glymur ADSP
remoteproc: xlnx: Release mailbox channels on shutdown
remoteproc: sysmon: Use the unified QMI service ID instead of defining it locally
remoteproc: xlnx: Only access buffer information if IPI is buffered
remoteproc: xlnx: Avoid mailbox setup
remoteproc: keystone: Request IRQs in probe()
remoteproc: pru: Remove empty remove callback
remoteproc: pru: Use rproc_of_parse_firmware() to get firmware name
remoteproc: da8xx: Reorder resource fetching in probe()
remoteproc: da8xx: Remove unused local struct data
remoteproc: da8xx: Use dev_err_probe()
...
Merge tag 'for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pateldipen1984/linux
Pull hte updates from Dipen Patel:
- Add tegra264 HTE driver and dt binding support
- Remove tegra194 SoC Kconfig dependency
- Replace use of system_unbound_wq with system_dfl_wq
* tag 'for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pateldipen1984/linux:
hte: tegra194: Add Tegra264 GTE support
dt-bindings: timestamp: Add Tegra264 support
hte: tegra194: remove Kconfig dependency on Tegra194 SoC
hte: replace use of system_unbound_wq with system_dfl_wq
There is only a collection of bugfixes this time around, with no notable
changes to the core. Some of the more noteworthy bugfixes listed below.
- Enable die erase on MT35XU02GCBA. We knew this flash needed this fixup
since 7f77c561e227 ("mtd: spi-nor: micron-st: add TODO for fixing
mt35xu02gcba") but did not add it due to lack of hardware to test on.
- Fix locking on some Winbond w25q series flashes.
- Fix Auto Address Increment (AAI) writes on SST flashes that start
on an odd address. The write enable latch needs to be set again after the
single byte program.
bpf, arm32: Reject BPF-to-BPF calls and callbacks in the JIT
The ARM32 BPF JIT does not support BPF-to-BPF function calls
(BPF_PSEUDO_CALL) or callbacks (BPF_PSEUDO_FUNC), but it does
not reject them either.
When a program with subprograms is loaded (e.g. libxdp's XDP
dispatcher uses __noinline__ subprograms, or any program using
callbacks like bpf_loop or bpf_for_each_map_elem), the verifier
invokes bpf_jit_subprogs() which calls bpf_int_jit_compile()
for each subprogram.
For BPF_PSEUDO_CALL, since ARM32 does not reject it, the JIT
silently emits code using the wrong address computation:
func = __bpf_call_base + imm
where imm is a pc-relative subprogram offset, producing a bogus
function pointer.
For BPF_PSEUDO_FUNC, the ldimm64 handler ignores src_reg and
loads the immediate as a normal 64-bit value without error.
In both cases, build_body() reports success and a JIT image is
allocated. ARM32 lacks the jit_data/extra_pass mechanism needed
for the second JIT pass in bpf_jit_subprogs(). On the second
pass, bpf_int_jit_compile() performs a full fresh compilation,
allocating a new JIT binary and overwriting prog->bpf_func. The
first allocation is never freed. bpf_jit_subprogs() then detects
the function pointer changed and aborts with -ENOTSUPP, but the
original JIT binary has already been leaked. Each program
load/unload cycle leaks one JIT binary allocation, as reported
by kmemleak:
Fix this by rejecting both BPF_PSEUDO_CALL in the BPF_CALL
handler and BPF_PSEUDO_FUNC in the BPF_LD_IMM64 handler, falling
through to the existing 'notyet' path. This causes build_body()
to fail before any JIT binary is allocated, so
bpf_int_jit_compile() returns the original program unjitted.
bpf_jit_subprogs() then sees !prog->jited and cleanly falls
back to the interpreter with no leak.
Acked-by: Daniel Borkmann <daniel@iogearbox.net> Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs") Reported-by: Jonas Rebmann <jre@pengutronix.de> Closes: https://lore.kernel.org/bpf/b63e9174-7a3d-4e22-8294-16df07a4af89@pengutronix.de Tested-by: Jonas Rebmann <jre@pengutronix.de> Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Link: https://lore.kernel.org/r/20260417143353.838911-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
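The rejection described in the commit above can be sketched in userspace C. This is only an illustration under stated assumptions: struct sketch_insn, sketch_check_insn() and sketch_build_body() are hypothetical names, not the ARM32 JIT's code, and the numeric constants are local copies of the corresponding uapi BPF values. The point is that the per-instruction check fails before anything is allocated, so the caller can fall back to the interpreter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of uapi values, redefined so the sketch builds in userspace. */
#define SKETCH_OP_CALL      0x85  /* BPF_JMP | BPF_CALL */
#define SKETCH_OP_LD_IMM64  0x18  /* BPF_LD | BPF_IMM | BPF_DW */
#define SKETCH_PSEUDO_CALL  1     /* BPF_PSEUDO_CALL, carried in src_reg */
#define SKETCH_PSEUDO_FUNC  4     /* BPF_PSEUDO_FUNC, carried in src_reg */
#define SKETCH_ENOTSUPP     524   /* kernel-internal ENOTSUPP */

/* Hypothetical, trimmed-down instruction; not the kernel's struct bpf_insn. */
struct sketch_insn {
	uint8_t code;
	uint8_t src_reg;
	int32_t imm;
};

/* Return 0 if the instruction may be JITed, -SKETCH_ENOTSUPP otherwise. */
int sketch_check_insn(const struct sketch_insn *insn)
{
	assert(insn != NULL);

	/* BPF-to-BPF call: imm is a pc-relative subprogram offset. */
	if (insn->code == SKETCH_OP_CALL && insn->src_reg == SKETCH_PSEUDO_CALL)
		return -SKETCH_ENOTSUPP;

	/* Callback reference: the immediate is not a plain 64-bit value. */
	if (insn->code == SKETCH_OP_LD_IMM64 && insn->src_reg == SKETCH_PSEUDO_FUNC)
		return -SKETCH_ENOTSUPP;

	return 0;
}

/*
 * Walk the program and stop at the first unsupported instruction.
 * Nothing is allocated here, so a failure leaves nothing to leak.
 * (The second slot of a real ld_imm64 is ignored in this sketch.)
 */
int sketch_build_body(const struct sketch_insn *prog, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		int err = sketch_check_insn(&prog[i]);

		if (err)
			return err;
	}
	return 0;
}
```

A caller would treat a negative return as "leave the program unjitted" rather than as a fatal load error.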
bpf: Dissociate struct_ops program with map if map_update fails
Currently, when bpf_struct_ops_map_update_elem() fails, the programs'
st_ops_assoc will remain set. They may become dangling pointers if the
map is freed later, but they will never be dereferenced since the
struct_ops attachment did not succeed. However, if one of the programs
is subsequently attached as part of another struct_ops map, its
st_ops_assoc will be poisoned even though its old st_ops_assoc was stale
from a failed attachment.
Fix the spurious poisoned st_ops_assoc by dissociating struct_ops
programs with a map if the attachment fails. Move
bpf_prog_assoc_struct_ops() to after *plink++ to make sure
bpf_prog_disassoc_struct_ops() will not miss a program when iterating
st_map->links.
Note that, dissociating a program from a map requires some attention as
it must not reset a poisoned st_ops_assoc or a st_ops_assoc pointing to
another map. The former is already guarded in
bpf_prog_disassoc_struct_ops(). The latter also will not happen since
st_ops_assoc of programs in st_map->links are set by
bpf_prog_assoc_struct_ops(), which can only be poisoned or pointing to
the current map.
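The association rules described above can be modelled with a small userspace sketch. All names here (sketch_map, sketch_prog, sketch_assoc(), sketch_disassoc(), sketch_map_update()) and the poison sentinel are assumptions made for illustration; they are not the kernel's data structures. The sketch associates a program only after its link slot is filled, and on failure dissociates exactly the programs recorded in the links.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_map { int id; };

/* Sentinel standing in for a "poisoned" association (hypothetical). */
struct sketch_map sketch_poison;

struct sketch_prog { struct sketch_map *assoc; };

int sketch_is_poisoned(const struct sketch_prog *p)
{
	return p->assoc == &sketch_poison;
}

/* First association wins; a second, different map poisons the program. */
void sketch_assoc(struct sketch_prog *p, struct sketch_map *m)
{
	assert(p != NULL && m != NULL);
	if (p->assoc == NULL)
		p->assoc = m;
	else if (p->assoc != m)
		p->assoc = &sketch_poison;
}

/* Only undo an association to this map; poison and other maps stay. */
void sketch_disassoc(struct sketch_prog *p, struct sketch_map *m)
{
	if (p->assoc == m)
		p->assoc = NULL;
}

/*
 * Attach n programs to map m. fail_at simulates an error before the
 * program at that index is linked (pass n or more for success).
 */
int sketch_map_update(struct sketch_map *m, struct sketch_prog **progs,
		      size_t n, size_t fail_at)
{
	size_t linked = 0;

	for (size_t i = 0; i < n; i++) {
		if (i == fail_at)
			goto rollback;
		linked++;                   /* stands in for *plink++ */
		sketch_assoc(progs[i], m);  /* associate after the link is recorded */
	}
	return 0;

rollback:
	for (size_t i = 0; i < linked; i++)
		sketch_disassoc(progs[i], m);
	return -1;
}
```

With the rollback in place, a program left over from a failed update can later be attached to another map without being poisoned by the stale pointer.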
Merge tag 'for-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull power supply and reset updates from Sebastian Reichel:
"Power-supply drivers:
- S2MU005: new battery fuel gauge driver
- macsmc-power: new driver for Apple Silicon
- qcom_battmgr: Add support for Glymur and Kaanapali
- max17042: add support for max77759
- qcom_smbx: allow disabling charging
- bd71828: add input current limit support
- multiple drivers: use new device managed workqueue allocation
function
- misc small cleanups and fixes
Reset core:
- Expose sysfs for registered reboot_modes
Reset drivers:
- misc small cleanups and fixes"
* tag 'for-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (36 commits)
power: supply: qcom_smbx: allow disabling charging
power: reset: drop unneeded dependencies on OF_GPIO
power: supply: bd71828: add input current limit property
dt-bindings: power: reset: cortina,gemini-power-controller: convert to DT schema
power: supply: add support for S2MU005 battery fuel gauge device
dt-bindings: power: supply: document Samsung S2MU005 battery fuel gauge
power: reset: reboot-mode: fix -Wformat-security warning
power: supply: ipaq_micro: Simplify with devm
power: supply: mt6370: Simplify with devm_alloc_ordered_workqueue()
power: supply: max77705: Free allocated workqueue and fix removal order
power: supply: max77705: Drop duplicated IRQ error message
power: supply: cw2015: Free allocated workqueue
power: reset: keystone: Use register_sys_off_handler(SYS_OFF_MODE_RESTART)
power: supply: twl4030_madc: Drop unused header includes
power: supply: bq24190: Avoid rescheduling after cancelling work
power: supply: axp288_charger: Simplify returns of dev_err_probe()
power: supply: axp288_charger: Do not cancel work before initializing it
power: supply: cpcap-battery: pass static battery cell data from device tree
dt-bindings: power: supply: cpcap-battery: document monitored-battery property
power: supply: qcom_battmgr: Add support for Glymur and Kaanapali
...
Merge tag 'hsi-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi
Pull HSI updates from Sebastian Reichel:
- use flexible array member for hsi_port in hsi_controller
- misc small fixes
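For readers unfamiliar with the flexible array member pattern mentioned in the first item, here is a minimal userspace sketch. It assumes hypothetical sketch_port and sketch_controller types and uses calloc() in place of the kernel allocator; the real hsi_controller layout and element type may differ.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_port {
	unsigned int num;
};

struct sketch_controller {
	unsigned int num_ports;
	struct sketch_port ports[];  /* flexible array member, must be last */
};

/* One allocation covers the header and all n trailing elements. */
struct sketch_controller *sketch_alloc_controller(unsigned int n)
{
	struct sketch_controller *c;

	c = calloc(1, sizeof(*c) + (size_t)n * sizeof(c->ports[0]));
	if (!c)
		return NULL;

	c->num_ports = n;
	for (unsigned int i = 0; i < n; i++)
		c->ports[i].num = i;
	return c;
}

/* Bounds-checked accessor for the trailing array. */
struct sketch_port *sketch_get_port(struct sketch_controller *c, unsigned int i)
{
	assert(c != NULL && i < c->num_ports);
	return &c->ports[i];
}
```

Because the array lives inside the same allocation, a single free() releases the controller and its ports, and the array itself can never be NULL.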
* tag 'hsi-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi:
HSI: omap_ssi_port: remove depends on ARM
HSI: omap_ssi_port: remove set but unused variables
HSI: cmt_speech: fix wrong printf format
HSI: omap_ssi_port: remove null check from FAM
hsi: hsi_core: use kzalloc_flex
Merge tag 'hid-for-linus-2026041601' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID updates from Jiri Kosina:
"Core:
- fixed handling of 0-sized reports (Dmitry Torokhov)
- convert core code to __free() (Dmitry Torokhov)
- support for multiple batteries per HID device (Lucas Zampieri)
Drivers:
- support for rumble effects in winwing driver (Ivan Gorinov)
- new support for a variety of Sony Rock Band and Sony DJ Hero
Turntable devices (Rosalie Wanders)
- new driver for Lenovo Legion Go / S devices (Derek J. Clark)
- power management improvements to intel-thc-hid driver (Even Xu)
... other assorted cleanups, fixes and device-specific quirks"
* tag 'hid-for-linus-2026041601' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (73 commits)
HID: core: clamp report_size in s32ton() to avoid undefined shift
HID: logitech-dj: fix wrong detection of bad DJ_SHORT output report
HID: logitech-hidpp: fix race condition when accessing stale stack pointer
HID: winwing: Enable rumble effects
HID: core: do not allow parsing 0-sized reports
HID: usbhid: refactor endpoint lookup
HID: huawei: fix CD30 keyboard report descriptor issue
HID: playstation: validate num_touch_reports in DualShock 4 reports
HID: drop 'default !EXPERT' from tristate symbols
HID: usbhid: fix deadlock in hid_post_reset()
HID: apple: ensure the keyboard backlight is off if suspending
HID: quirks: Set ALWAYS_POLL for LOGITECH_BOLT_RECEIVER
HID: alps: fix NULL pointer dereference in alps_raw_event()
HID: logitech-dj: Prevent REPORT_ID_DJ_SHORT related user initiated OOB write
HID: logitech-dj: Standardise hid_report_enum variable nomenclature
HID: sony: update module description
HID: logitech-hidpp: Check bounds when deleting force-feedback effects
HID: sony: add battery status support for Rock Band 4 PS5 guitars
HID: sony: fix style issues
HID: quirks: update hid-sony supported devices
...
Merge tag 'dma-mapping-7.1-2026-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping updates from Marek Szyprowski:
- added support for batched cache sync, which improves performance of
dma_map/unmap_sg() operations on ARM64 architecture (Barry Song)
- introduced DMA_ATTR_CC_SHARED attribute for explicitly shared memory
used in confidential computing (Jiri Pirko)
- refactored spaghetti-like code in drivers/of/of_reserved_mem.c and
its clients (Marek Szyprowski, shared branch with device-tree updates
to avoid merge conflicts)
- prepared Contiguous Memory Allocator related code for making dma-buf
drivers modularized (Maxime Ripard)
- added support for benchmarking dma_map_sg() calls to tools/dma
utility (Qinxin Xia)
* tag 'dma-mapping-7.1-2026-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: (24 commits)
dma-buf: heaps: system: document system_cc_shared heap
dma-buf: heaps: system: add system_cc_shared heap for explicitly shared memory
dma-mapping: introduce DMA_ATTR_CC_SHARED for shared memory
mm: cma: Export cma_alloc(), cma_release() and cma_get_name()
dma: contiguous: Export dev_get_cma_area()
dma: contiguous: Make dma_contiguous_default_area static
dma: contiguous: Make dev_get_cma_area() a proper function
dma: contiguous: Turn heap registration logic around
of: reserved_mem: rework fdt_init_reserved_mem_node()
of: reserved_mem: clarify fdt_scan_reserved_mem*() functions
of: reserved_mem: rearrange code a bit
of: reserved_mem: replace CMA quirks by generic methods
of: reserved_mem: switch to ops based OF_DECLARE()
of: reserved_mem: use -ENODEV instead of -ENOENT
of: reserved_mem: remove fdt node from the structure
dma-mapping: fix false kernel-doc comment marker
dma-mapping: Support batch mode for dma_direct_{map,unmap}_sg
dma-mapping: Separate DMA sync issuing and completion waiting
arm64: Provide dcache_inval_poc_nosync helper
arm64: Provide dcache_clean_poc_nosync helper
...
Merge tag 'phy-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Pull phy updates from Vinod Koul:
"New Support:
- Qualcomm Eliza QMP UFS PHY
- Canaan K230 USB 2.0 PHY driver
- Mediatek mt8167 dsi-phy
- Eswin EIC7700 SATA PHY driver
Updates:
- Sorted subsystem Makefile/Kconfig and some kernel-doc updates"
* tag 'phy-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
dt-bindings: phy: qcom,sc8280xp-qmp-ufs-phy: document the Eliza QMP UFS PHY
phy: qcom: m31-eusb2: clear PLL_EN during init
phy: eswin: Create eswin directory and add EIC7700 SATA PHY driver
dt-bindings: phy: eswin: Document the EIC7700 SoC SATA PHY
phy: apple: apple: Use local variable for ioremap return value
phy: qcom: qmp-usbc: Simplify check for non-NULL pointer
phy: marvell: mmp3-hsic: Avoid re-casting __iomem
phy: apple: atc: Make atcphy_dwc3_reset_ops variable static
dt-bindings: phy: mediatek,dsi-phy: Add support for mt8167
phy: usb: Add driver for Canaan K230 USB 2.0 PHY
dt-bindings: phy: Add Canaan K230 USB PHY
phy: phy-mtk-tphy: Update names and format of kernel-doc comments
phy: Sort the subsystem Kconfig
phy: Sort the subsystem Makefile
phy: move spacemit pcie driver to its subfolder
Merge tag 'soundwire-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire updates from Vinod Koul:
- Core: DP prepare polling for avoiding interrupt deadlock
- AMD clock init and bandwidth refactoring
- Intel: add more codecs to the wake list, clear message complete before
signaling waiting thread
* tag 'soundwire-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: intel_auxdevice: Add cs42l49 to wake_capable_list
soundwire: cadence: Clear message complete before signaling waiting thread
soundwire: Intel: test bus.bpt_stream before assigning it
soundwire: bus: demote UNATTACHED state warnings to dev_dbg()
soundwire: stream: Poll for DP prepare to avoid interrupt deadlock
soundwire: amd: refactor bandwidth calculation logic
soundwire: amd: add clock init control function
soundwire: intel_auxdevice: Add CS47L47 to wake_capable_list
soundwire: slave: Don't register devices that are disabled in ACPI
soundwire: sdw.h: repair names and format of kernel-doc comments
Merge tag 'trace-latency-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing latency update from Steven Rostedt:
- Add TIMERLAT_ALIGN osnoise option
Add a timer alignment option for timerlat that makes it work like the
cyclictest -A option. timerlat creates threads to test the latency of
the kernel. The alignment option will have these threads trigger at
the alignment offsets from each other. Instead of having each thread
wake up at the exact same time, if the alignment is set to "20" each
thread will wake up at 20 microseconds from the previous one.
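The staggering described above amounts to simple arithmetic, sketched below in userspace C. The function names and the nanosecond base time are assumptions for illustration only; this is not the osnoise/timerlat implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill out[i] with the offset (in microseconds) of thread i from thread 0. */
void sketch_fill_offsets(uint64_t *out, size_t n, uint64_t align_us)
{
	assert(out != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		out[i] = (uint64_t)i * align_us;
}

/* First wakeup of thread `index`, given a common base time in nanoseconds. */
uint64_t sketch_first_wakeup_ns(uint64_t base_ns, unsigned int index,
				uint64_t align_us)
{
	return base_ns + (uint64_t)index * align_us * 1000u;
}
```

With an alignment of 20, four threads get offsets of 0, 20, 40 and 60 microseconds, so no two of them wake at the same instant.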
* tag 'trace-latency-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/osnoise: Add option to align tlat threads
Merge tag 'trace-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
- Fix printf format warning for bprintf
sunrpc uses a trace_printk() that triggers a printf warning during
the compile. Move the __printf() attribute around so that the warning
goes away when debugging is not enabled
- Remove redundant check for EVENT_FILE_FL_FREED in
event_filter_write()
The FREED flag is checked in the call to event_file_file() and then
checked again right afterward, which is unneeded
- Clean up event_file_file() and event_file_data() helpers
These helper functions played a different role in the past, but now
with eventfs, the READ_ONCE() isn't needed. Simplify the code a bit
and also add a warning to event_file_data() if the file or its data
is not present
- Remove updating file->private_data in tracing open
All access to the file private data is handled by the helper
functions, which do not use file->private_data. Stop updating it on
open
- Show ENUM names in function arguments via BTF in function tracing
When showing the function arguments when func-args option is set for
function tracing, if one of the arguments is found to be an enum,
show the name of the enum instead of its number
- Add new trace_call__##name() API for tracepoints
Tracepoints are enabled via static_branch() blocks, where when not
enabled, there's only a nop that is in the code where the execution
will just skip over it. When tracing is enabled, the nop is converted
to a direct jump to the tracepoint code. Sometimes more calculations
are required to be performed to update the parameters of the
tracepoint. In this case, trace_##name##_enabled() is called which is
a static_branch() that gets enabled only when the tracepoint is
enabled. This allows the extra calculations to also be skipped by the
nop:
if (trace_foo_enabled()) {
x = bar();
trace_foo(x);
}
Where the x=bar() is only performed when foo is enabled. The problem
with this approach is that there are now two static_branch() calls. One
for checking if the tracepoint is enabled, and then again to know if
the tracepoint should be called. The second one is redundant
Introduce trace_call__foo() that will call the foo() tracepoint
directly without doing a static_branch():
if (trace_foo_enabled()) {
x = bar();
trace_call__foo(x);
}
- Update various locations to use the new trace_call__##name() API
- Move snapshot code out of trace.c
Cleaning up trace.c to not be a "dump all", move the snapshot code
out of it and into a new trace_snapshot.c file
- Clean up some "%*.s" to "%*s"
- Allow boot kernel command line options to be called multiple times
- Report ipi_raise target CPUs as cpumask
The ipi_raise target_cpus field is defined as a __bitmask(). There is
now a __cpumask() field definition. Update the field to use that
- Have hist_field_name() use a snprintf() and not a series of strcat()
It's safer to use snprintf() than a series of strcat()
- Fix tracepoint regfunc balancing
A tracepoint can define a "reg" and "unreg" function that gets called
before the tracepoint is enabled, and after it is disabled
respectively. But on error, after the "reg" func is called and the
tracepoint is not enabled, the "unreg" function is not called to tear
down what the "reg" function performed
- Fix output that shows what histograms are enabled
Event variables are displayed incorrectly in the histogram output
Instead of "sched.sched_wakeup.$var", it is showing
"$sched.sched_wakeup.var" where the '$' is in the incorrect location
- Some other simple cleanups
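The redundant check that trace_call__##name() removes can be demonstrated with a userspace mock. Everything below is a stand-in written for illustration: a plain bool replaces the static key, a counter records how often the "branch" is evaluated, and the function bodies are assumptions, not the tracepoint macros' expansion.

```c
#include <assert.h>
#include <stdbool.h>

bool foo_enabled;    /* stands in for the tracepoint's static key */
int branch_checks;   /* how many times the "static branch" was evaluated */
int foo_hits;        /* how many times the tracepoint body ran */
int foo_last;        /* last value passed to the tracepoint */

bool trace_foo_enabled(void)
{
	branch_checks++;
	return foo_enabled;
}

/* Direct call: only valid when the caller has already checked enablement. */
void trace_call__foo(int x)
{
	assert(foo_enabled);
	foo_hits++;
	foo_last = x;
}

/* Regular tracepoint: performs its own enablement check. */
void trace_foo(int x)
{
	if (trace_foo_enabled())
		trace_call__foo(x);
}

int bar(void)
{
	return 42;  /* stands in for an expensive argument computation */
}

/* Old pattern: the guard plus trace_foo() checks the branch twice. */
void old_site(void)
{
	if (trace_foo_enabled()) {
		int x = bar();

		trace_foo(x);
	}
}

/* New pattern: the guard is the only check. */
void new_site(void)
{
	if (trace_foo_enabled()) {
		int x = bar();

		trace_call__foo(x);
	}
}
```

In this mock, old_site() evaluates the branch twice per hit while new_site() evaluates it once, which is the saving the new API is after.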
* tag 'trace-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (24 commits)
selftests/ftrace: Add test case for fully-qualified variable references
tracing: Fix fully-qualified variable reference printing in histograms
tracepoint: balance regfunc() on func_add() failure in tracepoint_add_func()
tracing: Rebuild full_name on each hist_field_name() call
tracing: Report ipi_raise target CPUs as cpumask
tracing: Remove duplicate latency_fsnotify() stub
tracing: Preserve repeated trace_trigger boot parameters
tracing: Append repeated boot-time tracing parameters
tracing: Remove spurious default precision from show_event_trigger/filter formats
cpufreq: Use trace_call__##name() at guarded tracepoint call sites
tracing: Remove tracing_alloc_snapshot() when snapshot isn't defined
tracing: Move snapshot code out of trace.c and into trace_snapshot.c
mm: damon: Use trace_call__##name() at guarded tracepoint call sites
btrfs: Use trace_call__##name() at guarded tracepoint call sites
spi: Use trace_call__##name() at guarded tracepoint call sites
i2c: Use trace_call__##name() at guarded tracepoint call sites
kernel: Use trace_call__##name() at guarded tracepoint call sites
tracepoint: Add trace_call__##name() API
tracing: trace_mmap.h: fix a kernel-doc warning
tracing: Pretty-print enum parameters in function arguments
...
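Two of the fixes in this pull concern building histogram field names: using snprintf() instead of chained strcat(), and printing fully-qualified variable references as "sched.sched_wakeup.$var". A minimal userspace sketch of that formatting follows; sketch_var_ref_name() is a hypothetical helper, not the kernel's hist_field_name().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<system>.<event>.$<var>" into buf. Like snprintf(), returns the
 * length the full string needs, so callers can detect truncation by
 * comparing the result against size.
 */
int sketch_var_ref_name(char *buf, size_t size, const char *system,
			const char *event, const char *var)
{
	assert(buf != NULL && size > 0);
	return snprintf(buf, size, "%s.%s.$%s", system, event, var);
}
```

Unlike a series of strcat() calls, the single snprintf() call can never write past the end of buf, and the '$' is placed once, directly in front of the variable name.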
Merge tag 'bootconfig-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull bootconfig updates from Masami Hiramatsu:
"Minor fixes for handling errors:
- fix off-by-one in xbc_verify_tree() next node check
- increment xbc_node_num after node init succeeds
- validate child node index in xbc_verify_tree()
Code cleanups (mainly type/attribute changes):
- clean up comment typos and bracing
- drop redundant memset of xbc_nodes
- replace linux/kernel.h with specific includes
- narrow flag parameter type from uint32_t to uint16_t
- constify xbc_calc_checksum() data parameter
- fix signed comparison in xbc_node_get_data()
- use size_t for strlen result in xbc_node_match_prefix()
- use signed type for offset in xbc_init_node()
- use size_t for key length tracking in xbc_verify_tree()
- change xbc_node_index() return type to uint16_t"
* tag 'bootconfig-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
lib/bootconfig: change xbc_node_index() return type to uint16_t
lib/bootconfig: use size_t for key length tracking in xbc_verify_tree()
lib/bootconfig: use signed type for offset in xbc_init_node()
lib/bootconfig: use size_t for strlen result in xbc_node_match_prefix()
lib/bootconfig: fix signed comparison in xbc_node_get_data()
lib/bootconfig: validate child node index in xbc_verify_tree()
lib/bootconfig: replace linux/kernel.h with specific includes
bootconfig: constify xbc_calc_checksum() data parameter
lib/bootconfig: drop redundant memset of xbc_nodes
lib/bootconfig: increment xbc_node_num after node init succeeds
lib/bootconfig: fix off-by-one in xbc_verify_tree() next node check
lib/bootconfig: narrow flag parameter type from uint32_t to uint16_t
lib/bootconfig: clean up comment typos and bracing
Merge tag 'alpha-for-v7.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/lindholm/alpha
Pull alpha updates from Magnus Lindholm:
"One fix to silence pgprot_modify() compiler warnings, and one patch
adding SECCOMP/SECCOMP_FILTER support together with the syscall and
ptrace fixes needed for it"
* tag 'alpha-for-v7.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/lindholm/alpha:
alpha: Define pgprot_modify to silence tautological comparison warnings
alpha: add support for SECCOMP and SECCOMP_FILTER
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"Arm:
- Add support for tracing in the standalone EL2 hypervisor code,
which should help both debugging and performance analysis. This
uses the new infrastructure for 'remote' trace buffers that can be
exposed by non-kernel entities such as firmware, and which came
through the tracing tree
- Add support for GICv5 Per Processor Interrupts (PPIs), as the
starting point for supporting the new GIC architecture in KVM
- Finally add support for pKVM protected guests, where pages are
unmapped from the host as they are faulted into the guest and can
be shared back from the guest using pKVM hypercalls. Protected
guests are created using a new machine type identifier. As the
elusive guestmem has not yet delivered on its promises, anonymous
memory is also supported
This is only a first step towards full isolation from the host; for
example, the CPU register state and DMA accesses are not yet
isolated. Because this does not really yet bring fully what it
promises, it is hidden behind CONFIG_ARM_PKVM_GUEST +
'kvm-arm.mode=protected', and also triggers TAINT_USER when a VM is
created. Caveat emptor
- Rework the dreaded user_mem_abort() function to make it more
maintainable, reducing the amount of state being exposed to the
various helpers and rendering a substantial amount of state
immutable
- Expand the Stage-2 page table dumper to support NV shadow page
tables on a per-VM basis
- Tidy up the pKVM PSCI proxy code to be slightly less hard to
follow
- Fix both SPE and TRBE in non-VHE configurations so that they do not
generate spurious, out of context table walks that ultimately lead
to very bad HW lockups
- A small set of patches fixing the Stage-2 MMU freeing in error
cases
- Tighten-up accepted SMC immediate value to be only #0 for host
SMCCC calls
- The usual cleanups and other selftest churn
LoongArch:
- Use CSR_CRMD_PLV for kvm_arch_vcpu_in_kernel()
- Add DMSINTC irqchip in kernel support
RISC-V:
- Fix steal time shared memory alignment checks
- Fix vector context allocation leak
- Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi()
- Fix double-free of sdata in kvm_pmu_clear_snapshot_area()
- Fix integer overflow in kvm_pmu_validate_counter_mask()
- Fix shift-out-of-bounds in make_xfence_request()
- Fix lost write protection on huge pages during dirty logging
- Split huge pages during fault handling for dirty logging
- Skip CSR restore if VCPU is reloaded on the same core
- Implement kvm_arch_has_default_irqchip() for KVM selftests
- Factored-out ISA checks into separate sources
- Added hideleg to struct kvm_vcpu_config
- Factored-out VCPU config into separate sources
- Support configuration of per-VM HGATP mode from KVM user space
s390:
- Support for ESA (31-bit) guests inside nested hypervisors
- Remove restriction on memslot alignment, which is not needed
anymore with the new gmap code
- Fix LPSW/E to update the bear (which of course is the breaking
event address register)
x86:
- Shut up various UBSAN warnings on reading module parameters before
they were initialized
- Don't zero-allocate page tables that are used for splitting
hugepages in the TDP MMU, as KVM is guaranteed to set all SPTEs in
the page table and thus write all bytes
- As an optimization, bail early when trying to unsync 4KiB mappings
if the target gfn can just be mapped with a 2MiB hugepage
x86 generic:
- Copy single-chunk MMIO write values into struct kvm_vcpu (more
precisely struct kvm_mmio_fragment) to fix use-after-free stack
bugs where KVM would dereference stack pointer after an exit to
userspace
- Clean up and comment the emulated MMIO code to try to make it
easier to maintain (not necessarily "easy", but "easier")
- Move VMXON+VMXOFF and EFER.SVME toggling out of KVM (not *all* of
VMX and SVM enabling) as it is needed for trusted I/O
- Advertise support for AVX512 Bit Matrix Multiply (BMM) instructions
- Immediately fail the build if a required #define is missing in one
of KVM's headers that is included multiple times
- Reject SET_GUEST_DEBUG with -EBUSY if there's an already injected
exception, mostly to prevent syzkaller from abusing the uAPI to
trigger WARNs, but also because it can help prevent userspace from
unintentionally crashing the VM
- Exempt SMM from CPUID faulting on Intel, as per the spec
- Misc hardening and cleanup changes
x86 (AMD):
- Fix and optimize IRQ window inhibit handling for AVIC; make it
per-vCPU so that KVM doesn't prematurely re-enable AVIC if multiple
vCPUs have to-be-injected IRQs
- Clean up and optimize the OSVW handling, avoiding a bug in which
KVM would overwrite state when enabling virtualization on multiple
CPUs in parallel. This should not be a problem because OSVW should
usually be the same for all CPUs
- Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains
about a "too large" size based purely on user input
- Clean up and harden the pinning code for KVM_MEMORY_ENCRYPT_REG_REGION
- Disallow synchronizing a VMSA of an already-launched/encrypted
vCPU, as doing so for an SNP guest will crash the host due to an
RMP violation page fault
- Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped
queries are required to hold kvm->lock, and enforce it by lockdep.
Fix various bugs where sev_guest() was not ensured to be stable for
the whole duration of a function or ioctl
- Convert a pile of kvm->lock SEV code to guard()
- Play nicer with userspace that does not enable
KVM_CAP_EXCEPTION_PAYLOAD, for which KVM needs to set CR2 and DR6
as a response to ioctls such as KVM_GET_VCPU_EVENTS (even if the
payload would end up in EXITINFO2 rather than CR2, for example).
Only set CR2 and DR6 when consumption of the payload is imminent,
but on the other hand force delivery of the payload in all paths
where userspace retrieves CR2 or DR6
- Use vcpu->arch.cr2 when updating vmcb12's CR2 on nested #VMEXIT
instead of vmcb02->save.cr2. The value is out of sync after a
save/restore or after a #PF is injected into L2
- Fix a class of nSVM bugs where some fields written by the CPU are
not synchronized from vmcb02 to cached vmcb12 after VMRUN, and so
are not up-to-date when saved by KVM_GET_NESTED_STATE
- Fix a class of bugs where the ordering between KVM_SET_NESTED_STATE
and KVM_SET_{S}REGS could cause vmcb02 to be incorrectly
initialized after save+restore
- Add a variety of missing nSVM consistency checks
- Fix several bugs where KVM failed to correctly update VMCB fields
on nested #VMEXIT
- Fix several bugs where KVM failed to correctly synthesize #UD or
#GP for SVM-related instructions
- Add support for save+restore of virtualized LBRs (on SVM)
- Refactor various helpers and macros to improve clarity and
(hopefully) make the code easier to maintain
- Aggressively sanitize fields when copying from vmcb12, to guard
against unintentionally allowing L1 to utilize yet-to-be-defined
features
- Fix several bugs where KVM botched rAX legality checks when
emulating SVM instructions. There are remaining issues in that KVM
doesn't handle size prefix overrides for 64-bit guests
- Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails
instead of somewhat arbitrarily synthesizing #GP (i.e. don't double
down on AMD's architectural but sketchy behavior of generating #GP
for "unsupported" addresses)
- Cache all used vmcb12 fields to further harden against TOCTOU bugs
x86 (Intel):
- Drop obsolete branch hint prefixes from the VMX instruction macros
- Use ASM_INPUT_RM() in __vmcs_writel() to coerce clang into using a
register input when appropriate
- Code cleanups
guest_memfd:
- Don't mark guest_memfd folios as accessed, as guest_memfd doesn't
support reclaim, the memory is unevictable, and there is no storage
to write back to
LoongArch selftests:
- Add KVM PMU test cases
s390 selftests:
- Enable more memory selftests
x86 selftests:
- Add support for Hygon CPUs in KVM selftests
- Fix a bug in the MSR test where it would get false failures on
AMD/Hygon CPUs with exactly one of RDPID or RDTSCP
- Add an MADV_COLLAPSE testcase for guest_memfd as a regression test
for a bug where the kernel would attempt to collapse guest_memfd
folios against KVM's will"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (373 commits)
KVM: x86: use inlines instead of macros for is_sev_*guest
x86/virt: Treat SVM as unsupported when running as an SEV+ guest
KVM: SEV: Goto an existing error label if charging misc_cg for an ASID fails
KVM: SVM: Move lock-protected allocation of SEV ASID into a separate helper
KVM: SEV: use mutex guard in snp_handle_guest_req()
KVM: SEV: use mutex guard in sev_mem_enc_unregister_region()
KVM: SEV: use mutex guard in sev_mem_enc_ioctl()
KVM: SEV: use mutex guard in snp_launch_update()
KVM: SEV: Assert that kvm->lock is held when querying SEV+ support
KVM: SEV: Document that checking for SEV+ guests when reclaiming memory is "safe"
KVM: SEV: Hide "struct kvm_sev_info" behind CONFIG_KVM_AMD_SEV=y
KVM: SEV: WARN on unhandled VM type when initializing VM
KVM: LoongArch: selftests: Add PMU overflow interrupt test
KVM: LoongArch: selftests: Add basic PMU event counting test
KVM: LoongArch: selftests: Add cpucfg read/write helpers
LoongArch: KVM: Add DMSINTC inject msi to vCPU
LoongArch: KVM: Add DMSINTC device support
LoongArch: KVM: Make vcpu_is_preempted() as a macro rather than function
LoongArch: KVM: Move host CSR_GSTAT save and restore in context switch
LoongArch: KVM: Move host CSR_EENTRY save and restore in context switch
...
Guangshuo Li [Wed, 15 Apr 2026 17:05:15 +0000 (01:05 +0800)]
parisc: led: fix reference leak on failed device registration
When platform_device_register() fails in startup_leds(), the embedded
struct device in platform_leds has already been initialized by
device_initialize(), but the failure path only reports the error and
does not drop the device reference for the current platform device:
module.lds.S: Fix modules on 32-bit parisc architecture
On the 32-bit parisc architecture, we always used the
-ffunction-sections compiler option to tell the compiler to put the
functions into separate text sections. This is necessary, otherwise
"big" kernel modules like ext4 or ipv6 fail to load because some
branches won't be able to reach their stubs.
Commit 1ba9f8979426 ("vmlinux.lds: Unify TEXT_MAIN, DATA_MAIN, and related
macros") broke this for parisc because all text sections will get
unconditionally merged now.
Introduce the ARCH_WANTS_MODULES_TEXT_SECTIONS config option which
avoids the text section merge for modules, and fix this issue by
enabling this option by default for 32-bit parisc.
Fixes: 1ba9f8979426 ("vmlinux.lds: Unify TEXT_MAIN, DATA_MAIN, and related macros")
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: stable@vger.kernel.org # v6.19+
Suggested-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Helge Deller <deller@gmx.de>
parisc: Fix signal code to depend on CONFIG_COMPAT instead of CONFIG_64BIT
The signal handler code used CONFIG_64BIT to decide if compat handling
code should be compiled in. Fix it to use CONFIG_COMPAT instead.
This allows disabling CONFIG_COMPAT even when running a 64-bit kernel.
Zen1's hardware divider can leave, under certain circumstances, partial
results from previous operations. Those results can be leaked by
another, attacker thread.
parisc: Drop ip_fast_csum() inline assembly implementation
The assembly code of ip_fast_csum() triggers unaligned access warnings
if the IP header isn't correctly aligned:
Kernel: unaligned access to 0x173d22e76 in inet_gro_receive+0xbc/0x2e8 (iir 0x0e8810b6)
Kernel: unaligned access to 0x173d22e7e in inet_gro_receive+0xc4/0x2e8 (iir 0x0e88109a)
Kernel: unaligned access to 0x173d22e82 in inet_gro_receive+0xc8/0x2e8 (iir 0x0e90109d)
Kernel: unaligned access to 0x173d22e7a in inet_gro_receive+0xd0/0x2e8 (iir 0x0e9810b8)
Kernel: unaligned access to 0x173d22e86 in inet_gro_receive+0xdc/0x2e8 (iir 0x0e8810b8)
We have the option to a) ignore the warnings, b) work around it by
adding more code to check for alignment, or c) to switch to the generic
implementation and rely on the compiler to optimize the code.
Let's go with c), because a) isn't nice, and b) would effectively lead
to an implementation which is basically equal to c).
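As a rough illustration of option c), the sketch below computes the IPv4 header checksum in portable C. It is not the kernel's generic ip_fast_csum(); the function name is hypothetical, and unlike the kernel helper it returns the checksum as a host-order number rather than a __sum16. Because it combines individual bytes, no load depends on the alignment of the header.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative only (hypothetical name): Internet checksum over an IPv4
 * header of `ihl` 32-bit words. Bytes are combined explicitly, so no
 * load assumes 2- or 4-byte alignment of `hdr`.
 */
static uint16_t ip_header_csum_sketch(const unsigned char *hdr, unsigned int ihl)
{
	uint32_t sum = 0;
	size_t i;

	/* Add the header as big-endian 16-bit words. */
	for (i = 0; i < (size_t)ihl * 4; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

	/* Fold carries back into the low 16 bits (one's-complement add). */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

Called on a header whose checksum field is zeroed, the function yields the value to store in that field; called on a header with a valid checksum in place, it yields 0.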
Kexin Sun [Sat, 21 Mar 2026 10:58:31 +0000 (18:58 +0800)]
parisc: update outdated comments for renamed ccio_alloc_consistent()
The function ccio_alloc_consistent() was renamed to ccio_alloc() by commit 79387179e2e4 ("parisc: convert to dma_map_ops"). Update the three stale
references in ccio-dma.c.
Also replace the obsolete PCI_DMA_TODEVICE constant name with DMA_TO_DEVICE in
a nearby comment to match the code.
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd updates from Jason Gunthorpe:
"Several fixes:
- Add missing static const
- Correct type 1 emulation for VFIO_CHECK_EXTENSION when no-iommu is
turned on
- Fix selftest memory leak and syzkaller splat
- Fix missed -EFAULT in fault reporting write() fops
- Fix a race where map/unmap with the internal IOVA allocator can
unmap things it should not"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd: Fix a race with concurrent allocation and unmap
iommufd/selftest: Remove MOCK_IOMMUPT_AMDV1 format
iommufd: Fix return value of iommufd_fault_fops_write()
iommufd: update outdated comment for renamed iommufd_hw_pagetable_alloc()
iommufd/selftest: Fix page leaks in mock_viommu_{init,destroy}
iommufd: vfio compatibility extension check for noiommu mode
iommufd: Constify struct dma_buf_attach_ops
Merge tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/fwctl/fwctl
Pull fwctl updates from Jason Gunthorpe:
- New fwctl driver for Broadcom RDMA NICs
- Bug fix for non-modular builds
* tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/fwctl/fwctl:
fwctl: Fix class init ordering to avoid NULL pointer dereference on device removal
fwctl/bnxt_fwctl: Add documentation entries
fwctl/bnxt_fwctl: Add bnxt fwctl device
fwctl/bnxt_en: Create an aux device for fwctl
fwctl/bnxt_en: Refactor aux bus functions to be more generic
fwctl/bnxt_en: Move common definitions to include/linux/bnxt/
Merge tag 'soc-arm-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC ARM code updates from Arnd Bergmann:
"These are again very minimal updates:
- A workaround for firmware on Google Nexus 10
- A fix for early debugging on OMAP1
- A rework for Microchip SoC configuration
- Cleanups on OMAP2 and R-Car-Gen2"
* tag 'soc-arm-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: omap2: dead code cleanup in kconfig for ARCH_OMAP4
ARM: OMAP1: Fix DEBUG_LL and earlyprintk on OMAP16XX
arm64: Kconfig: provide a top-level switch for Microchip platforms
ARM: shmobile: rcar-gen2: Use of_phandle_args_equal() helper
ARM: omap: fix all kernel-doc warnings
ARM: omap2: Replace scnprintf with strscpy in omap3_cpuinfo
ARM: samsung: exynos5250: Allow CPU1 to boot
Merge tag 'soc-defconfig-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC defconfig updates from Arnd Bergmann:
"As usual, we enable a number of additional device drivers as loadable
modules, to support the added platforms. The largest change this time
is for OMAP2/3, which were not that well supported in the generic
arm32 defconfig.
The Tegra SoC platforms are now enabled by default in Kconfig when
ARCH_TEGRA is enabled, which means the defconfig change is done at the
same time as the Kconfig change here"
* tag 'soc-defconfig-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (25 commits)
arch/arm: Drop CONFIG_FIRMWARE_EDID from defconfig files
arm64: defconfig: Enable DP83TG720 PHY driver
arm64: tegra: defconfig: Drop redundant ARCH_TEGRA_foo_SOC
ARM: tegra: defconfig: Drop redundant ARCH_TEGRA_foo_SOC
arm64: defconfig: enable pci-pwrctrl-generic as module
arm64: defconfig: Enable Lontium LT8713sx driver
arm64: defconfig: Enable Qualcomm Eliza SoC display clock controller
arm64: defconfig: enable IPQ5210 RDP504 base configs
arm64: defconfig: Enable Milos LPASS LPI pinctrl driver
arm64: defconfig: Enable Kaanapali clock controllers
arm64: defconfig: Enable configs for Arduino VENTUNO Q
arm64: defconfig: Enable Qualcomm Eliza basic resource providers
arm64: defconfig: Enable S5KJN1 camera sensor
arm64: defconfig: Enable configurations for Toradex Aquila AM69
arm64: defconfig: remove SENSORS_SA67MCU
arm64: defconfig: Enable Qualcomm WCD937x headphone codec as module
arm64: defconfig: Enable QCOMTEE module for QTEE-enabled Qualcomm SoCs
ARM: shmobile: defconfig: Refresh for v7.0-rc1
arm: multi_v7_defconfig: Enable more OMAP 3/4 related configs
ARM: multi_v7_defconfig: omap2plus_defconfig: Enable ITE IT66121 driver
...
Merge tag 'soc-drivers-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC driver updates from Arnd Bergmann:
"The driver updates again are all over the place with many minor fixes
going into platform specific code. The most notable changes are:
- Support for Microchip pic64gx system controllers
- Work on cleaning up devicetree bindings for SoC drivers, and
converting them into the new format
- Lots of smaller changes for Qualcomm SoC drivers, including support
for a number of newly supported chips
- reset controller API cleanups and a new driver for Cix Sky1
- Reworks of the Tegra PMC and CBB drivers, along with a change to
how individual Tegra SoCs get selected in Kconfig and BPMP firmware
driver updates including a refresh of the ABI header to match the
version used by firmware
- STM32 updates to the firewall bus driver and support for the debug
bus through OP-TEE
- SCMI firmware driver improvements for reliability, in particular
for dealing with broken firmware interrupts
- Memory driver updates for Tegra, and a patch to remove the unused
Baikal T1 driver"
* tag 'soc-drivers-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (193 commits)
firmware: arm_ffa: Use the correct buffer size during RXTX_MAP
firmware: qcom: scm: Allow QSEECOM on Lenovo IdeaCentre Mini X
clk: spear: fix resource leak in clk_register_vco_pll()
reset: rzv2h-usb2phy: Add support for VBUS mux controller registration
reset: rzv2h-usb2phy: Convert to regmap API
dt-bindings: reset: renesas,rzv2h-usb2phy: Document RZ/G3E USB2PHY reset
dt-bindings: reset: renesas,rzv2h-usb2phy: Add '#mux-state-cells' property
soc: microchip: add mpfs gpio interrupt mux driver
dt-bindings: soc: microchip: document PolarFire SoC's gpio interrupt mux
gpio: mpfs: Add interrupt support
soc: qcom: ubwc: add helpers to get programmable values
soc: qcom: ubwc: add helper to get min_acc length
firmware: qcom: scm: Register gunyah watchdog device
soc: qcom: socinfo: Add SoC ID for SA8650P
dt-bindings: arm: qcom,ids: Add SoC ID for SA8650P
firmware: qcom: scm: Allow QSEECOM on Mahua CRD
soc: qcom: wcnss: simplify allocation of req
soc: qcom: pd-mapper: Add support for Eliza
soc: qcom: aoss: compare against normalized cooling state
soc: qcom: llcc: fix v1 SB syndrome register offset
...
Merge tag 'soc-dt-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC devicetree updates from Arnd Bergmann:
"A number of SoC platforms are adding modernized variants of their
already supported chips this time, with a total of 12 new SoCs, and two
older SoCs getting removed:
- Qualcomm Glymur is a compute SoC using 18 Oryon-2 CPU cores
- Qualcomm Mahua is a variant of Glymur with only 12 CPU cores, but
largely identical.
- Qualcomm Eliza is an embedded platform for mobile phone (SM7750) and
IOT (QC7790S/M) workloads
- Qualcomm IPQ5210 is a wireless networking SoC using Cortex-A53
cores
- Qualcomm apq8084 and ipq806x had only rudimentary support but no
actual products using them, so they are now gone.
- Axis ARTPEC-9 is a follow-up to the ARTPEC-8 embedded SoC, using
the Samsung SoC platform but now with Cortex-A55 cores
- ARM Zena is a virtual platform in FVP using Cortex-A720AE cores,
with additional versions planned to be merged in the future.
- ARM corstone-1000-a320 is a reference platform for IOT, using
low-end Cortex-A320 cores
- Microchip LAN9691 is an updated 64-bit variant of the arm32 lan966x
series of networking SoCs
- Microchip PIC64GX is an embedded RISC-V chip using SIFIVE U54 CPU
cores
- Rockchip RV1103B is the low-end 32-bit single-core vision processor
- Renesas RZ/G3L (r9a08g046) is an industrial embedded chip using
Cortex-A55 cores, similar to the G3E and G3S variants we already
supported.
- NXP S32N79 is an automotive SoC using Cortex-A78AE cores, a
significant upgrade from the older S32V and S32G series
These all come with at least one reference board or an initial product
using these, in total there are 67 newly added boards. The ones for
already supported SoCs are:
- Two more Aspeed BMC based boards
- Three older tablets based on 32-bit OMAP4 and Exynos5 SoCs
- One Set-top-box based on Allwinner H6
- 22 additional industrial/embedded boards using 64-bit NXP i.MX8M or
i.MX9 SoCs
- 20 Qualcomm SoC based machines across all possible markets:
workstation, gaming, laptop, phone, networking, reference, ...
- Three more Rockchips rk35xx based boards
- Four variants of the Toradex Verdin using TI AM62
Other notable bits are:
- A cleanup for the 32-bit Tegra paz00 board moved the last board
specific code on Tegra into equivalent dts syntax.
- There continues to be a significant number of fixes for static
checking of dtc syntax, but it feels like this is slowing down,
hopefully getting into a state where most known issues are
addressed
- Additional hardware support for many existing boards across SoC
families, notably Qualcomm, Broadcom, i.MX2, i.MX6, Rockchips,
STM32, Mediatek, Tegra, TI and Microchip"
* tag 'soc-dt-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (841 commits)
arm64: dts: ti: k3: Use memory-region-names for r5f
ARM: dts: imx: Add DT overlays for DH i.MX6 DHCOM SoM and boards
ARM: dts: imx6sx: remove fallback compatible string fsl,imx28-lcdif
ARM: dts: imx25: rename node name tcq to touchscreen
ARM: dts: imx: b850v3: Disable unused usdhc4
ARM: dts: imx: b850v3: Define GPIO line names
ARM: dts: imx: b850v3: Use alphabetical sorting
ARM: dts: imx: bx50v3: Configure phy-mode to eliminate a warning
ARM: dts: imx: bx50v3: Configure switch PHY max-speed to 100Mbps
ARM: dts: imx7ulp: Add CPU clock and OPP table support
ARM: dts: imx7-mba7: Deassert BOOT_EN after boot
ARM: dts: tqma7: add boot phase properties
ARM: dts: imx7s: add boot phase properties
ARM: dts: tqma6ul[l]: correct spelling of TQ-Systems
ARM: dts: mba6ulx: add boot phase properties
ARM: dts: imx6ul[l]-tqma6ul[l]: add boot phase properties
ARM: dts: imx6ul/imx6ull: add boot phase properties
ARM: dts: imx6qdl-mba6: add boot phase properties
ARM: dts: imx6qdl-tqma6: add boot phase properties
ARM: dts: imx6qdl: add boot phase properties
...
Merge tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "pid: make sub-init creation retryable" (Oleg Nesterov)
Make creation of init in a new namespace more robust by clearing away
some historical cruft which is no longer needed. Also some
documentation fixups
- "selftests/fchmodat2: Error handling and general" (Mark Brown)
Fix and a cleanup for the fchmodat2() syscall selftest
- "lib: polynomial: Move to math/ and clean up" (Andy Shevchenko)
- "hung_task: Provide runtime reset interface for hung task detector"
(Aaron Tomlin)
Give administrators the ability to zero out
/proc/sys/kernel/hung_task_detect_count
- "tools/getdelays: use the static UAPI headers from
tools/include/uapi" (Thomas Weißschuh)
Teach getdelays to use the in-kernel UAPI headers rather than the
system-provided ones
- "watchdog/hardlockup: Improvements to hardlockup" (Mayank Rungta)
Several cleanups and fixups to the hardlockup detector code and its
documentation
- "lib/bch: fix undefined behavior from signed left-shifts" (Josh Law)
A couple of small/theoretical fixes in the bch code
- "ocfs2/dlm: fix two bugs in dlm_match_regions()" (Junrui Luo)
- "cleanup the RAID5 XOR library" (Christoph Hellwig)
A quite far-reaching cleanup to this code. I can't do better than to
quote Christoph:
"The XOR library used for the RAID5 parity is a bit of a mess right
now. The main file sits in crypto/ despite not being cryptography
and not using the crypto API, with the generic implementations
sitting in include/asm-generic and the arch implementations
sitting in an asm/ header in theory. The latter doesn't work for
many cases, so architectures often build the code directly into
the core kernel, or create another module for the architecture
code.
Change this to a single module in lib/ that also contains the
architecture optimizations, similar to the library work Eric
Biggers has done for the CRC and crypto libraries later. After
that it changes to better calling conventions that allow for
smarter architecture implementations (although none is contained
here yet), and uses static_call to avoid indirect function call
overhead"
- "lib/list_sort: Clean up list_sort() scheduling workarounds"
(Kuan-Wei Chiu)
Clean up this library code by removing a hacky thing which was added
for UBIFS, which UBIFS doesn't actually need
- "Fix bugs in extract_iter_to_sg()" (Christian Ehrhardt)
Fix a few bugs in the scatterlist code, add in-kernel tests for the
now-fixed bugs and fix a leak in the test itself
- "kdump: Enable LUKS-encrypted dump target support in ARM64 and
PowerPC" (Coiby Xu)
Enable support of the LUKS-encrypted device dump target on arm64 and
powerpc
- "ocfs2: consolidate extent list validation into block read callbacks"
(Joseph Qi)
Clean up, simplify, and make more robust ocfs2's validation of extent
list fields (Kernel test robot loves mounting corrupted fs images!)
* tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (127 commits)
ocfs2: validate group add input before caching
ocfs2: validate bg_bits during freefrag scan
ocfs2: fix listxattr handling when the buffer is full
doc: watchdog: fix typos etc
update Sean's email address
ocfs2: use get_random_u32() where appropriate
ocfs2: split transactions in dio completion to avoid credit exhaustion
ocfs2: remove redundant l_next_free_rec check in __ocfs2_find_path()
ocfs2: validate extent block list fields during block read
ocfs2: remove empty extent list check in ocfs2_dx_dir_lookup_rec()
ocfs2: validate dx_root extent list fields during block read
ocfs2: fix use-after-free in ocfs2_fault() when VM_FAULT_RETRY
ocfs2: handle invalid dinode in ocfs2_group_extend
.get_maintainer.ignore: add Askar
ocfs2: validate bg_list extent bounds in discontig groups
checkpatch: exclude forward declarations of const structs
tools/accounting: handle truncated taskstats netlink messages
taskstats: set version in TGID exit notifications
ocfs2/heartbeat: fix slot mapping rollback leaks on error paths
arm64,ppc64le/kdump: pass dm-crypt keys to kdump kernel
...
Merge tag 'v7.1-rc1-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client updates from Steve French:
- Fix integer underflow in encrypted read
- Four debug patches, adding a few tracepoints
- Minor update to MAINTAINERS file (preferred server URL for cifs)
- Remove the BUG_ON() calls in d_mark_tmpfile_name
* tag 'v7.1-rc1-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
MAINTAINERS: change git.samba.org to https
smb: client: fix integer underflow in receive_encrypted_read()
smb: client: add tracepoints for deferred handle caching
smb: client: add oplock level to smb3_open_done tracepoint
smb: client: add tracepoint for local lock conflicts
smb: client: add tracepoints for lock operations
vfs: get rid of BUG_ON() in d_mark_tmpfile_name()
Jiri Olsa [Thu, 16 Apr 2026 10:00:34 +0000 (12:00 +0200)]
libbpf: Prevent double close and leak of btf objects
Sashiko found possible double close of btf object fd [1],
which happens when strdup in load_module_btfs fails at which
point the obj->btf_module_cnt is already incremented.
The error path closes the btf fd, and so does the later cleanup code in
the bpf_object_post_load_cleanup function.
Also, a libbpf_ensure_mem failure leaves the btf object unassigned,
so it is leaked.
Replace the err_out label with break to make the error path
less confusing, as suggested by Alan.
Increment obj->btf_module_cnt only if there's no failure,
and release the btf object in the error path.
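The ownership rule described above can be sketched in plain userspace C. Everything here is a mock with hypothetical names, not libbpf code: the counter that later cleanup iterates over is bumped only after every fallible step has succeeded, so a resource released on the error path is never seen a second time by cleanup.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_MODS_SKETCH 4

/* Mock types for illustration only; libbpf's real structures differ. */
struct mod_sketch {
	char *name;
	int fd;
};

struct obj_sketch {
	struct mod_sketch mods[MAX_MODS_SKETCH];
	int mod_cnt;
	int closes; /* counts releases, to make a double close visible */
};

/* Stand-in for closing a descriptor; it only counts the call. */
static void close_fd_sketch(struct obj_sketch *obj, int fd)
{
	(void)fd;
	obj->closes++;
}

/* A NULL name simulates a failing string copy. */
static int add_module_sketch(struct obj_sketch *obj, int fd, const char *name)
{
	char *copy = NULL;

	if (obj->mod_cnt < MAX_MODS_SKETCH && name)
		copy = malloc(strlen(name) + 1);
	if (!copy) {
		/* Release here; mod_cnt is untouched, so cleanup skips this fd. */
		close_fd_sketch(obj, fd);
		return -1;
	}
	strcpy(copy, name);

	obj->mods[obj->mod_cnt].name = copy;
	obj->mods[obj->mod_cnt].fd = fd;
	obj->mod_cnt++; /* publish only after every step succeeded */
	return 0;
}

static void cleanup_sketch(struct obj_sketch *obj)
{
	int i;

	for (i = 0; i < obj->mod_cnt; i++) {
		close_fd_sketch(obj, obj->mods[i].fd);
		free(obj->mods[i].name);
	}
	obj->mod_cnt = 0;
}
```

With this ordering, each descriptor is released exactly once, whether the add succeeded or failed.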
====================
bpf: allow UTF-8 literals in bpf_bprintf_prepare()
bpf_bprintf_prepare() currently rejects any non-ASCII byte in format
strings, so helpers such as bpf_trace_printk() fail to emit UTF-8
literal text even when those bytes are not part of a format specifier.
Keep plain text permissive while continuing to parse '%' sequences as
ASCII-only. Patch 1 updates snprintf_negative() at the same time so the
selftests stay consistent during bisection. Patch 2 then extends
trace_printk coverage for both the valid UTF-8 literal case and the
invalid non-ASCII-after-'%' case.
Changes in v3:
- drop Suggested-by trailers and move review credit into this changelog
- update test_snprintf_negative() in patch 1/2 so plain non-ASCII text is
accepted while non-ASCII after '%' is still rejected, keeping
./test_progs -t snprintf aligned with the new behavior.
- clarify the trace_printk negative case with an explicit invalid format
string and comment
- address Paul Chaignon's review feedback and keep the negative coverage
requested earlier by Alan Maguire
Changes in v2:
- split the core change and selftest updates into two patches
- drop unnecessary isspace()/ispunct() casts
- add comments to clarify plain-text vs format-specifier handling
- add a negative selftest for non-ASCII bytes inside '%' sequences
Testing:
- Reproduced on x86_64 without the core fix: ASCII trace output works,
while UTF-8 literal text in bpf_trace_printk() is rejected and
produces no trace output
- Verified with tools/testing/selftests/bpf: ./test_progs -t trace_printk
- Verified with tools/testing/selftests/bpf: ./test_progs -t snprintf
====================
Extend trace_printk coverage to verify that UTF-8 literal text is
emitted successfully and that '%' parsing still rejects non-ASCII
bytes once format parsing starts.
Use an explicitly invalid format string for the negative case so the
ASCII-only parser expectation is visible from the test code itself.
bpf: allow UTF-8 literals in bpf_bprintf_prepare()
bpf_bprintf_prepare() only needs ASCII parsing for conversion
specifiers. Plain text can safely carry bytes >= 0x80, so allow
UTF-8 literals outside '%' sequences while keeping ASCII control
bytes rejected and format specifiers ASCII-only.
This keeps existing parsing rules for format directives unchanged,
while allowing helpers such as bpf_trace_printk() to emit UTF-8
literal text.
Update test_snprintf_negative() in the same commit so selftests keep
matching the new plain-text vs format-specifier split during bisection.
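The plain-text vs. format-specifier split can be sketched as a small standalone checker. This is a simplified model with a hypothetical function name, not the code of bpf_bprintf_prepare(): it only inspects the single byte after each '%', and it assumes that a trailing '%' is invalid.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: models only which bytes are acceptable in plain
 * text and which are acceptable right after a '%'.
 */
static bool fmt_bytes_ok_sketch(const char *fmt, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)fmt[i];

		if (c != '%') {
			/* Plain text: bytes >= 0x80 (UTF-8) pass through. */
			if (c >= 0x80)
				continue;
			/* ASCII must be printable or whitespace. */
			if (!isprint(c) && !isspace(c))
				return false;
			continue;
		}

		/* A '%' starts a specifier: the next byte must be ASCII. */
		if (i + 1 >= len)
			return false;
		c = (unsigned char)fmt[++i];
		if (c >= 0x80 || !isprint(c))
			return false;
	}
	return true;
}
```

Under these assumptions, a string with UTF-8 literal text followed by "%d" is accepted, a non-ASCII byte directly after '%' is rejected, and an ASCII control byte such as 0x01 in plain text is rejected.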
====================
bpf: Fix NULL deref when storing scalar into kptr slot
map_kptr_match_type() accesses reg->btf before confirming the register
is PTR_TO_BTF_ID. A scalar store into a kptr slot has no btf, causing
a NULL pointer dereference. Guard base_type() first.
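The ordering of the checks can be illustrated with mock types. The names and fields below are hypothetical stand-ins, not the verifier's real structures: the point is only that the register kind is tested before any field that is valid solely for that kind is dereferenced.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock types for illustration; the verifier's real structures differ. */
enum reg_kind_sketch { SCALAR_SKETCH, PTR_TO_BTF_ID_SKETCH };

struct btf_sketch {
	bool is_kernel; /* hypothetical property read through reg->btf */
};

struct reg_sketch {
	enum reg_kind_sketch kind;
	const struct btf_sketch *btf; /* NULL for scalars */
};

static bool kptr_type_ok_sketch(const struct reg_sketch *reg)
{
	/* Check the register kind first, before touching reg->btf. */
	if (reg->kind != PTR_TO_BTF_ID_SKETCH)
		return false;

	/* Only reached for pointer registers, which carry a btf. */
	return reg->btf->is_kernel;
}
```

A scalar register with a NULL btf is rejected by the first check, so the dereference is never reached for it.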