]> git.ipfire.org Git - thirdparty/linux.git/log
thirdparty/linux.git
3 weeks agodrm/msm/dp: Fix the ISR_* enum values
Jessica Zhang [Sun, 24 May 2026 10:33:30 +0000 (13:33 +0300)] 
drm/msm/dp: Fix the ISR_* enum values

The ISR_HPD_* enum should represent values that can be read from the
REG_DP_DP_HPD_INT_STATUS register. Swap ISR_HPD_IO_GLITCH_COUNT and
ISR_HPD_REPLUG_COUNT to map them correctly to register values.

While we are at it, correct the spelling for ISR_HPD_REPLUG_COUNT.

Fixes: 8ede2ecc3e5e ("drm/msm/dp: Add DP compliance tests on Snapdragon Chipsets")
Signed-off-by: Jessica Zhang <jessica.zhang@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Tested-by: Val Packett <val@packett.cool> # x1e80100-dell-latitude-7455
Tested-by: Yongxing Mou <yongxing.mou@oss.qualcomm.com> # Hamoa IOT EVK, QCS8300 Ride
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/727602/
Link: https://lore.kernel.org/r/20260524-hpd-refactor-v6-2-cf3ab488dd7b@oss.qualcomm.com
3 weeks agodrm/msm/dp: fix HPD state status bit shift value
Jessica Zhang [Sun, 24 May 2026 10:33:29 +0000 (13:33 +0300)] 
drm/msm/dp: fix HPD state status bit shift value

The HPD state status is the 3 most significant bits, not 4 bits of the
HPD_INT_STATUS register.

Fix the bit shift macro so that the correct bits are returned in
msm_dp_aux_is_link_connected().

Fixes: 19e52bcb27c2 ("drm/msm/dp: return correct connection status after suspend")
Signed-off-by: Jessica Zhang <jessica.zhang@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Tested-by: Val Packett <val@packett.cool> # x1e80100-dell-latitude-7455
Tested-by: Yongxing Mou <yongxing.mou@oss.qualcomm.com> # Hamoa IOT EVK, QCS8300 Ride
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/727611/
Link: https://lore.kernel.org/r/20260524-hpd-refactor-v6-1-cf3ab488dd7b@oss.qualcomm.com
3 weeks agodt-bindings: mmc: sdhci-msm: Document the Shikra compatible
Monish Chunara [Fri, 8 May 2026 10:15:44 +0000 (15:45 +0530)] 
dt-bindings: mmc: sdhci-msm: Document the Shikra compatible

Document the Shikra-specific SDHCI compatible in the sdhci-msm binding.
Use "qcom,sdhci-msm-v5" as the fallback compatible for the MSM SDHCI v5
controller used on Shikra.

Signed-off-by: Monish Chunara <monish.chunara@oss.qualcomm.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Ulf Hansson <ulfh@kernel.org>
3 weeks agoarm64: dts: rockchip: add radxa camera 4k on rock 5b+ cam1
Michael Riesch [Fri, 22 May 2026 21:23:13 +0000 (23:23 +0200)] 
arm64: dts: rockchip: add radxa camera 4k on rock 5b+ cam1

Add device tree overlay for the Radxa Camera 4K (featuring the Sony IMX415
image sensor) to applied on the Radxa ROCK 5B+ CAM1 port.

Signed-off-by: Michael Riesch <michael.riesch@collabora.com>
Link: https://patch.msgid.link/20260522-rk3588-vicap-v5-7-d1d1f5265c56@collabora.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
3 weeks agoarm64: dts: rockchip: add radxa camera 4k on rock 5b+ cam0
Michael Riesch [Fri, 22 May 2026 21:23:12 +0000 (23:23 +0200)] 
arm64: dts: rockchip: add radxa camera 4k on rock 5b+ cam0

Add device tree overlay for the Radxa Camera 4K (featuring the Sony IMX415
image sensor) to applied on the Radxa ROCK 5B+ CAM0 port.

Signed-off-by: Michael Riesch <michael.riesch@collabora.com>
Link: https://patch.msgid.link/20260522-rk3588-vicap-v5-6-d1d1f5265c56@collabora.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
3 weeks agoarm64: dts: rockchip: add vicap node to rk3588
Michael Riesch [Fri, 22 May 2026 21:23:11 +0000 (23:23 +0200)] 
arm64: dts: rockchip: add vicap node to rk3588

Add the device tree node for the RK3588 Video Capture (VICAP) unit.

Signed-off-by: Michael Riesch <michael.riesch@collabora.com>
[converted reg values in vicap ports to hexadecimal, to have them align
 with the port@X values, and be less confusing]
Link: https://patch.msgid.link/20260522-rk3588-vicap-v5-5-d1d1f5265c56@collabora.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
3 weeks agoblock: Include bvec.h kernel-doc in the htmldocs
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:59:04 +0000 (18:59 +0100)] 
block: Include bvec.h kernel-doc in the htmldocs

People have gone to the trouble of writing this kernel-doc; the
least we can do is publish it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@linux.dev>
Link: https://patch.msgid.link/20260528175905.1102280-3-willy@infradead.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 weeks agoblock: Add bvec_folio()
Matthew Wilcox (Oracle) [Thu, 28 May 2026 17:59:03 +0000 (18:59 +0100)] 
block: Add bvec_folio()

This is a simple helper which replaces page_folio(bvec->bv_page).
Minor improvement in readability, but the real motivation is to reduce
the number of references to bvec->bv_page so that it can be changed
with less work.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@linux.dev>
Link: https://patch.msgid.link/20260528175905.1102280-2-willy@infradead.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 weeks agoRevert "media: renesas: vsp1: brx: Fix format propagation"
Laurent Pinchart [Wed, 6 May 2026 21:56:50 +0000 (00:56 +0300)] 
Revert "media: renesas: vsp1: brx: Fix format propagation"

This reverts commit 937f3e6b51f1cea079be9ba642665f2bf8bcc31f.

The change to format propagation in the BRx broke configuration of the
DRM pipeline. Revert it to fix the regression.

The original commit was meant to fix a v4l2-compliance failure, with no
known userspace applications being affected beside test tools. Reverting
is the simplest option, a more comprehensive fix can be developed (and
tested more thoroughly) later.

Reported-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Closes: https://lore.kernel.org/linux-media/CA+V-a8t481xuwava0nb7uY9CUPqFWZ_8EP0xrK3BgumP7HDcLg@mail.gmail.com
Fixes: 937f3e6b51f1 ("media: renesas: vsp1: brx: Fix format propagation")
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/T2H
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://patch.msgid.link/20260506215650.1897177-3-laurent.pinchart+renesas@ideasonboard.com
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
3 weeks agoRevert "media: renesas: vsp1: Initialize format on all pads"
Laurent Pinchart [Wed, 6 May 2026 21:56:49 +0000 (00:56 +0300)] 
Revert "media: renesas: vsp1: Initialize format on all pads"

This reverts commit 133ac42af0a1b389e8b7b3dc7c1cc8c30ff162b6.

The change to format initialization, along with the change to format
propagation in the BRx in commit 937f3e6b51f1 ("media: renesas: vsp1:
brx: Fix format propagation"), broke configuration of the DRM pipeline.
Revert it to fix the regression.

The original commit was meant to fix a v4l2-compliance failure, with no
known userspace applications being affected beside test tools. Reverting
is the simplest option, a more comprehensive fix can be developed (and
tested more thoroughly) later.

Fixes: 133ac42af0a1 ("media: renesas: vsp1: Initialize format on all pads")
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/T2H
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://patch.msgid.link/20260506215650.1897177-2-laurent.pinchart+renesas@ideasonboard.com
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
3 weeks agosched/fair: Use rq_clock() in update_tg_load_avg() rate-limit
Rik van Riel [Tue, 26 May 2026 19:43:29 +0000 (12:43 -0700)] 
sched/fair: Use rq_clock() in update_tg_load_avg() rate-limit

update_tg_load_avg() is called once per leaf cfs_rq from the
__update_blocked_fair() walk that runs inside the NOHZ idle-balance
softirq, and again from update_load_avg() with UPDATE_TG.  Its first
operation after the trivial early-outs is unconditionally:

now = sched_clock_cpu(cpu_of(rq_of(cfs_rq)));
if (now - cfs_rq->last_update_tg_load_avg < NSEC_PER_MSEC)
return;

Jakub ran into a system where nohz_idle_balance() was taking 75%
of a CPU (which is handling network traffic and doing many irq_exit_cpu
calls), with 35% of that CPU spent in update_load_avg, and 17% of the
CPU in sched_clock_cpu(), reading the TSC.

In a quick synthetic test, it looks like this patch reduces the
CPU use of sched_balance_update_blocked_averages by about 20%.

Switch the rate-limit to read rq_clock(rq_of(cfs_rq)) instead.
This eliminates the rdtsc, and uses a fairly fresh timestamp,
because all callers of update_tg_load_avg() and clear_tg_load_avg()
hold rq->lock and have called update_rq_clock(rq) within microseconds:

  caller                                   pre-state
  __update_blocked_fair                    encloser did update_rq_clock(rq)
  update_load_avg's three UPDATE_TG sites  under rq->lock after enqueue/dequeue/update_curr
  attach_/detach_entity_cfs_rq             preceded by update_load_avg(...)
  clear_tg_load_avg via offline path       rq_clock_start_loop_update(rq) upfront

so rq->clock is fresh at every call.  Since cfs_rqs are per-CPU
per-task_group, cfs_rq->last_update_tg_load_avg is always compared
against the same rq's clock; no cross-rq drift.

Signed-off-by: Rik van Riel <riel@surriel.com>
Assisted-by: Claude (Anthropic)
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://patch.msgid.link/20260527110250.6a91718d@fangorn
3 weeks agoselftests/sched_ext: Validate dl_server attach/detach in total_bw test
Andrea Righi [Tue, 26 May 2026 16:42:49 +0000 (18:42 +0200)] 
selftests/sched_ext: Validate dl_server attach/detach in total_bw test

Extend the total_bw selftest to validate the fair/ext dl_server
auto-attach/detach operations.

After the existing consistency checks, the test now doubles the
fair_server's runtime on every CPU via debugfs and verifies that:
 1. total_bw grew after the customization (proves fair_server was
    attached and apply_params() honored the dl_bw_attached flag),
 2. with the minimal BPF scheduler loaded, total_bw drops back to the
    baseline value (proves fair_server was detached and ext_server was
    attached at its own default runtime),
 3. after unload total_bw matches the doubled value from step 1 (proves
    fair_server was re-attached with the runtime customization preserved
    across the load/unload cycle).

Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://patch.msgid.link/20260526164420.638711-3-arighi@nvidia.com
3 weeks agosched_ext: Auto-register/unregister dl_server reservations
Andrea Righi [Tue, 26 May 2026 16:42:48 +0000 (18:42 +0200)] 
sched_ext: Auto-register/unregister dl_server reservations

Commit cd959a3562050d ("sched_ext: Add a DL server for sched_ext tasks")
introduced an ext_server deadline server to protect sched_ext tasks from
fair/RT starvation, mirroring the existing fair_server.

Currently, both servers reserve their 50ms/1000ms bandwidth at boot,
regardless of whether a BPF scheduler is loaded. Unused bandwidth is
still reclaimed at runtime by other classes, but the static reservation
prevents the RT class from implicitly using that headroom when one of
the two classes is guaranteed to be empty.

A sysadmin can work around this by writing
/sys/kernel/debug/sched/{fair,ext}_server/cpu*/runtime, but that
requires manual action and not all systems expose debugfs.

A better approach is to make server bandwidth reservations dynamic: only
the scheduling policy that is currently active should register its
reservation, while the inactive one should not artificially hold
capacity (keeping both reservations only when the BPF scheduler is
running in partial mode):

 +---------------------------------------------+-------------+------------+
 | BPF scheduler state                         | fair server | ext server |
 +---------------------------------------------+-------------+------------+
 | not loaded (default boot)                   | reserved    | none       |
 | loaded full mode (!SCX_OPS_SWITCH_PARTIAL)  | none        | reserved   |
 | loaded partial mode (SCX_OPS_SWITCH_PARTIAL)| reserved    | reserved   |
 +---------------------------------------------+-------------+------------+

To achieve this, introduce an "attached/detached" state for each
deadline server, so the kernel can decide whether a server's bandwidth
should be accounted in global bandwidth tracking.

At boot, the system starts with only the fair server contributing to
bandwidth accounting. When a BPF scheduler is enabled, the ext server is
attached and may replace or complement the fair server depending on
whether full or partial mode is used. When sched_ext is disabled, the
system restores the previous deadline bandwidth values and behavior.

The transition logic ensures that switching between scheduling modes is
consistent and reversible, without losing runtime configuration or
requiring manual intervention.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://patch.msgid.link/20260526164420.638711-2-arighi@nvidia.com
3 weeks agosched/deadline: Reject debugfs dl_server writes for offline CPUs
Andrea Righi [Tue, 26 May 2026 10:05:02 +0000 (12:05 +0200)] 
sched/deadline: Reject debugfs dl_server writes for offline CPUs

Writing runtime or period via the per-CPU dl_server debugfs files
(/sys/kernel/debug/sched/{fair,ext}_server/cpu*/{runtime,period}) on an
offline CPU can trigger two distinct kernel issues:

1) Divide-by-zero in dl_server_apply_params():

  Oops: divide error: 0000 [#1] SMP NOPTI
  RIP: 0010:dl_server_apply_params+0x239/0x3a0
  Call Trace:
   sched_server_write_common.isra.0+0x21a/0x3c0
   full_proxy_write+0x78/0xd0
   vfs_write+0xe7/0x6e0

  Both __dl_sub() and __dl_add() divide by cpus internally, which can be
  0 once the CPU has been removed from any active root-domain span (this
  has been latent since the debugfs interface was introduced).

2) WARN_ON_ONCE in dl_server_start():

  WARNING: kernel/sched/deadline.c:1805 at dl_server_start+0x232/0x270

  Commit ee6e44dfe6e5 ("sched/deadline: Stop dl_server before CPU goes
  offline") added this check to catch enqueueing the server on an
  offline rq.

There's no meaningful semantics for re-configuring the per-CPU dl_server
bandwidth while the CPU is offline, so simply reject the write with
-EBUSY so userspace gets a clear error.

Closes: https://lore.kernel.org/all/20260526092228.3B6891F00A3A@smtp.kernel.org/
Fixes: d741f297bcea ("sched/fair: Fair server interface")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Tested-by: abaci-kreproducer <abaci@linux.alibaba.com>
Link: https://patch.msgid.link/20260526100502.575774-1-arighi@nvidia.com
3 weeks agosched/topology: Provide arch_llc_mask for cache aware scheduling
Shrikanth Hegde [Fri, 29 May 2026 07:57:12 +0000 (13:27 +0530)] 
sched/topology: Provide arch_llc_mask for cache aware scheduling

Venkat Reported a boot kernel panic next-20260522. Git bisect pointed to
b5ea300a17e3 ("sched/cache: Make LLC id continuous")

Stacktrace points to llc_mask being null.

  NIP [c000000000e58504] _find_first_bit+0x44/0x130
  LR [c000000000e58500] _find_first_bit+0x40/0x130
  Call Trace:
  build_sched_domains+0xad8/0xe50
  sched_init_smp+0xa8/0x164
  kernel_init_freeable+0x250/0x370
  ret_from_kernel_user_thread+0x14/0x1c

On powerpc, cpu_coregroup_mask is available only when the underlying
hardware support coregroup. In shared LPAR, QEMU guest or power9 etc
coregroup isn't supported. In such cases llc_mask was being referenced
when it was null leading to panic.

On powerpc, LLC is at SMT core level. So assumption that coregroup(MC)
domain point to LLC is wrong. Provide a way for archs to say where its
LLC is if it not at MC domain.

Fixes: b5ea300a17e3 ("sched/cache: Make LLC id continuous")
Closes: https://lore.kernel.org/all/51154de7-3700-4cb4-82f2-1b3a8fa427f7@linux.ibm.com/
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Co-developed-by: Chen, Yu C <yu.c.chen@intel.com>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Tested-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://patch.msgid.link/20260529075712.1181039-1-sshegde@linux.ibm.com
3 weeks agotools/mm/slabinfo: remove redundant slab->partial assignment
Xuewen Wang [Mon, 18 May 2026 06:21:59 +0000 (14:21 +0800)] 
tools/mm/slabinfo: remove redundant slab->partial assignment

slab->partial is assigned by get_obj("partial") and then immediately
overwritten by get_obj_and_str("partial", &t). Remove the first
redundant assignment.

Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Xuewen Wang <wangxuewen@kylinos.cn>
Link: https://patch.msgid.link/20260518062159.80664-4-wangxuewen@kylinos.cn
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
3 weeks agotools/mm/slabinfo: remove dead assignment in get_obj_and_str()
Xuewen Wang [Mon, 18 May 2026 06:21:58 +0000 (14:21 +0800)] 
tools/mm/slabinfo: remove dead assignment in get_obj_and_str()

The assignment `x = NULL` sets the local parameter variable instead of
`*x`, which is a no-op since `*x` was already set to NULL on the line
above. Remove the dead assignment.

Signed-off-by: Xuewen Wang <wangxuewen@kylinos.cn>
Reviewed-by: SeongJae Park <sj@kernel.org>
Link: https://patch.msgid.link/20260518062159.80664-3-wangxuewen@kylinos.cn
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
3 weeks agotools/mm/slabinfo: Fix trace disable logic inversion
Xuewen Wang [Mon, 18 May 2026 06:21:57 +0000 (14:21 +0800)] 
tools/mm/slabinfo: Fix trace disable logic inversion

The disable trace path in slab_debug() had a logic error where it would
set trace=1 instead of trace=0. This made trace functionality permanently
enabled once turned on for any slab cache.

Fixes: a87615b8f9e2 ("SLUB: slabinfo upgrade")
Cc: stable@vger.kernel.org
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Xuewen Wang <wangxuewen@kylinos.cn>
WARNING: From:/Signed-off-by: email address mismatch: 'From: wangxuewen <18810879172@163.com>' != 'Signed-off-by: wangxuewen <wangxuewen@kylinos.cn>'
Link: https://patch.msgid.link/20260518062159.80664-2-wangxuewen@kylinos.cn
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
3 weeks agoMAINTAINERS: add slab-related scripts and tools to SLAB ALLOCATOR
Vlastimil Babka (SUSE) [Mon, 25 May 2026 07:26:39 +0000 (09:26 +0200)] 
MAINTAINERS: add slab-related scripts and tools to SLAB ALLOCATOR

Make sure the maintainers and reviewers are CC'd on changes to the
scripts and tools that depend on slab internals.

Link: https://patch.msgid.link/20260525-maint-slab-tools-v1-1-d66b69f1412a@kernel.org
Acked-by: Harry Yoo (Oracle) <harry@kernel.org>
Acked-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
3 weeks agodrm/sched: Fix clang build warning in kunit tests
Tvrtko Ursulin [Fri, 22 May 2026 09:01:29 +0000 (10:01 +0100)] 
drm/sched: Fix clang build warning in kunit tests

Initializing compile time constant struct or arrays from another such
variable is a gcc extension, while clang strictly requires a compile time
constant literal.

As reported by LKP:

>> drivers/gpu/drm/scheduler/tests/tests_scheduler.c:675:10: error: initializer element is not a compile-time constant
                                 drm_sched_scheduler_two_clients_attr),
                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/kunit/test.h:224:13: note: expanded from macro 'KUNIT_CASE_PARAM_ATTR'
                     .attr = attributes, .module_name = KBUILD_MODNAME}
                             ^~~~~~~~~~
   1 error generated.

vim +675 drivers/gpu/drm/scheduler/tests/tests_scheduler.c

   671
   672 static struct kunit_case drm_sched_scheduler_two_clients_tests[] = {
   673 KUNIT_CASE_PARAM_ATTR(drm_sched_scheduler_two_clients_test,
   674       drm_sched_scheduler_two_clients_gen_params,
 > 675       drm_sched_scheduler_two_clients_attr),
   676 {}
   677 };
   678

Fix it by using a compound literal as other tests do.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605220312.Pu7UO05u-lkp@intel.com/
Fixes: 97ef806a5314 ("drm/sched: Add some scheduling quality unit tests")
Cc: Philipp Stanner <phasta@kernel.org>
Acked-by: Philipp Stanner <phasta@kernel.org>
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
Link: https://lore.kernel.org/r/20260522090129.9385-1-tvrtko.ursulin@igalia.com
3 weeks agoKVM: arm64: Correctly cap ZCR_EL2 provided by a guest hypervisor
Mark Brown [Thu, 28 May 2026 23:01:44 +0000 (00:01 +0100)] 
KVM: arm64: Correctly cap ZCR_EL2 provided by a guest hypervisor

ZCR_EL2 can be updated by a VHE guest hypervisor either using ZCR_EL2
(which traps) or ZCR_EL1 (which does not trap). KVM handles both in
different way:

- on ZCR_EL2 trap, ZCR_EL2.LEN is immediately capped at the VM's own
  VL limit. This has the potential to break existing SW that relies
  on the full LEN field to be stateful.

- on ZCR_EL1 access, we do absolutely nothing.

On restoring the SVE context for an L2 guest, we directly restore the
guest hypervisor's view of ZCR_EL2 into the physical ZCR_EL2. If the
guest's view of the register was updated using the ZCR_EL2 accessor,
the value has already been sanitised (with the caveat mentioned above).

But if the guest used ZCR_EL1, the raw value is written into the HW,
and the L2 guest can now access VLs that it shouldn't.

Fix all the above by moving the VL capping to the restore points,
ensuring that:

- the HW is always programmed with a capped value, irrespective of
  the accessor being used,

- the ZCR_EL2.LEN field is always completely stateful, irrespective
  of the accessor being used.

Additionally, move ZCR_EL2 to be a sanitised register, ensuring that
only the LEN field is actually stateful. This requires some creative
construction of the RES0 mask, as the sysreg generation script does
not yet generate RAZ/WI fields.

Fixes: b3d29a823099 ("KVM: arm64: nv: Handle ZCR_EL2 traps")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260529-kvm-arm64-fix-zcr-len-nv-v2-1-86cad51992bd@kernel.org
[maz: rewrote commit message, tidy up access_zcr_el2()]
Signed-off-by: Marc Zyngier <maz@kernel.org>
3 weeks agoRevert "esp: fix page frag reference leak on skb_to_sgvec failure"
Steffen Klassert [Fri, 29 May 2026 08:23:25 +0000 (10:23 +0200)] 
Revert "esp: fix page frag reference leak on skb_to_sgvec failure"

This reverts commit 2982e599fff6faa21c8df147d96fc7af6c1a2f24.

The patch does not fully fix the issue and the Author does
not match the 'Signed-off-by:' tag, so revert it for now.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
3 weeks agovfs: replace ints with enum last_type for LAST_XXX
Jori Koolstra [Thu, 28 May 2026 17:58:47 +0000 (17:58 +0000)] 
vfs: replace ints with enum last_type for LAST_XXX

Several functions in namei.c take an "int *type" parameter, such as
filename_parentat(). To know what values this can take you have to find
the anonymous struct that defines the LAST_XXX values. Define an enum
last_type to make this type explicit.

Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
Link: https://patch.msgid.link/20260528175854.57626-2-jkoolstra@xs4all.nl
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agovfs: make LAST_XXX private to fs/namei.c
Jori Koolstra [Thu, 28 May 2026 17:58:46 +0000 (17:58 +0000)] 
vfs: make LAST_XXX private to fs/namei.c

The only user of LAST_XXX outside of fs/namei.c is fs/smb/server/vfs.c;
ksmbd_vfs_path_lookup() calls vfs_path_parent_lookup() and expects a
LAST_NORM last type (or it will be ENOENT). ksmbd_vfs_rename() also calls
vfs_path_parent_lookup() but forgets the LAST_NORM check.

It does not really make sense to have vfs_path_parent_lookup() expose
the last_type because it is only needed to ensure it is LAST_NORM. So
let's do this check in vfs_path_parent_lookup() instead and keep the
LAST_XXX internal to fs/namei.c. This changes the ENOENT errno in
ksmbd_vfs_path_lookup() to EINVAL, which matches better with how this is
handled by callers of filename_parentat().

Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
Link: https://patch.msgid.link/20260528175854.57626-1-jkoolstra@xs4all.nl
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks agortla: Document tests in README
Tomas Glozar [Thu, 14 May 2026 07:30:38 +0000 (09:30 +0200)] 
rtla: Document tests in README

RTLA tests are not documented anywhere. Mention both runtime and unit
tests in the README, with instructions on how to run them and a list of
dependencies and required system configuration.

Link: https://lore.kernel.org/r/20260514073038.204428-1-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
3 weeks agogpu: nova-core: gsp: shuffle boot code a bit to keep chipset-specific parts close
Alexandre Courbot [Sat, 25 Apr 2026 13:11:12 +0000 (22:11 +0900)] 
gpu: nova-core: gsp: shuffle boot code a bit to keep chipset-specific parts close

Some parts of the GSP boot process are chip-specific actions, whereas
others (like sending the initial post-boot messages) deal directly with
the working GSP.

Reorganize the boot code a bit so the chipset-specific parts are clumped
together, which will make their extraction into a HAL easier.

This has no effect on the GSP boot process.

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260521-nova-unload-v6-5-65f581c812c9@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
3 weeks agogpu: nova-core: refactor SEC2 booter loading into BooterFirmware::run()
John Hubbard [Sat, 11 Apr 2026 02:49:35 +0000 (19:49 -0700)] 
gpu: nova-core: refactor SEC2 booter loading into BooterFirmware::run()

Move the SEC2 reset/load/boot sequence into a BooterFirmware::run()
method. This is mostly refactoring, with no significant behavior change,
done in preparation for adding an alternative FSP boot path.

Suggested-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260521-nova-unload-v6-4-65f581c812c9@nvidia.com
[acourbot: fix typo in commit message.]
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
3 weeks agogpu: nova-core: do not import firmware commands into GSP command module
Alexandre Courbot [Mon, 20 Apr 2026 09:42:27 +0000 (18:42 +0900)] 
gpu: nova-core: do not import firmware commands into GSP command module

Importing all the firmware commands like we did is a bit confusing, as
the layer of a command type (fw or GSP) cannot be inferred from looking
at its name alone. Furthermore it makes it impossible to create commands
that have the same name as their firmware command.

Thus, stop importing all commands and refer to them from the `fw` module
instead.

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260521-nova-unload-v6-2-65f581c812c9@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
3 weeks agogpu: nova-core: remove unneeded get_gsp_info proxy function
Alexandre Courbot [Sun, 19 Apr 2026 02:48:14 +0000 (11:48 +0900)] 
gpu: nova-core: remove unneeded get_gsp_info proxy function

This function was useful before the generic command-queue send methods
got merged, but it is just boilerplate now. Replace it with the correct
sequence to queue the `GetGspStaticInfo` command directly.

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260521-nova-unload-v6-1-65f581c812c9@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
3 weeks agodrm: prevent integer overflows in dumb buffer creation helpers
Rajat Gupta [Thu, 21 May 2026 05:11:21 +0000 (22:11 -0700)] 
drm: prevent integer overflows in dumb buffer creation helpers

Fix integer overflow issues in the dumb buffer creation path:

1. drm_mode_create_dumb() does not bound width, height, or bpp
   before passing them to driver callbacks.  Downstream helpers
   (e.g. drm_gem_dma_dumb_create_internal) perform pitch/size
   alignment in u32 arithmetic that can overflow for extreme
   values.  Add hard limits: width and height < 8192, bpp <= 32.
   No legitimate software rendering use case exceeds these.

2. drm_mode_align_dumb() uses roundup(pitch, hw_pitch_align)
   without checking for overflow.  If pitch is near U32_MAX,
   roundup() wraps to a small value, making subsequent
   check_mul_overflow() pass with a much smaller pitch than
   intended.  Add an overflow check after roundup.

3. drm_mode_align_dumb() uses ALIGN(size, hw_size_align) which
   only works correctly for power-of-two alignment values.
   Replace with roundup() which works for any alignment.

Suggested-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
3 weeks agoriscv: dts: spacemit: k3: Initial support for CoM260-IFX board
Yixun Lan [Wed, 20 May 2026 23:45:28 +0000 (23:45 +0000)] 
riscv: dts: spacemit: k3: Initial support for CoM260-IFX board

The K3 CoM260-IFX board combine with one 260 pins "Gold Finger" computer
module with a carrier board. The module integrates the K3 SoC, LPDDR5,
UFS storage, Gigabit Ethernet, Micro SD card, PMIC Chip. The board offers
a comprehensive array of interfaces, including MIPI-DSI, MIPI-CSI,
DisplayPort, SDIO, SPI, I2S, I2C, CAN-FD, PWM, UART, USB, PCIe, and GMAC.

Add initial support for enabling Serial UART and ethernet.

Link: https://patch.msgid.link/20260520-02-k3-com260-ifx-v2-2-d55095457cf0@kernel.org
Signed-off-by: Yixun Lan <dlan@kernel.org>
3 weeks agodt-bindings: riscv: spacemit: Add K3 CoM260-IFX board
Yixun Lan [Wed, 20 May 2026 23:45:27 +0000 (23:45 +0000)] 
dt-bindings: riscv: spacemit: Add K3 CoM260-IFX board

The SpacemiT K3 CoM260-IFX board combines a 69.6 × 45 mm compute module
with a reference carrier board.

The module integrates up to 32GB LPDDR5 memory, UFS storage, Micro SD
card slot and includes interfaces such as dual MIPI CSI-2 connectors,
M.2 expansion, USB 3.0, Gigabit Ethernet, DisplayPort, and a 40-pin
expansion header.

The carrier board is intended as a general-purpose development platform
for CoM260 module and exposes interfaces for all of storage, display,
networking, and camera connectivity.

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260520-02-k3-com260-ifx-v2-1-d55095457cf0@kernel.org
Signed-off-by: Yixun Lan <dlan@kernel.org>
3 weeks agocrypto: af_alg - Document that it is *always* slower
Demi Marie Obenour [Sat, 23 May 2026 19:43:04 +0000 (15:43 -0400)] 
crypto: af_alg - Document that it is *always* slower

Without support for zero-copy or off-CPU offloads, AF_ALG is always
slower than software cryptography. Its only advantage is that it might
save code size. However, this is largely mitigated by lightweight
userspace cryptographic libraries.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: af_alg - Drop support for off-CPU cryptography
Demi Marie Obenour [Sat, 23 May 2026 19:43:03 +0000 (15:43 -0400)] 
crypto: af_alg - Drop support for off-CPU cryptography

AF_ALG is deprecated and exposed to unprivileged userspace.  Only
use the least buggy algorithm implementations: the pure software ones.

This removes one of the main advantages of AF_ALG, which is the
ability to use it with off-CPU accelerators.  However, using off-CPU
accelerators has huge overheads, both in performance and attack surface.
I have yet to see real-world, performance-critical workloads where using
an accelerator via AF_ALG is actually a win over doing cryptography in
userspace.

If using an off-CPU accelerator really does turn out to be a win, a new
API should be developed that is actually a good fit for it.

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agonet: Remove support for AIO on sockets
Demi Marie Obenour [Sat, 23 May 2026 19:43:02 +0000 (15:43 -0400)] 
net: Remove support for AIO on sockets

The only user of msg->msg_iocb was AF_ALG, but that's deprecated.
It can be removed entirely at the cost of only supporting synchronous
operations.  This doesn't break userspace, which will silently block
(for a bounded amount of time) in io_submit instead of operating
asynchronously.

This also makes struct msghdr smaller, helping every other caller of
sendmsg().

Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: loongson - Select CRYPTO_RNG
Eric Biggers [Fri, 22 May 2026 02:25:25 +0000 (21:25 -0500)] 
crypto: loongson - Select CRYPTO_RNG

This driver registers a rng_alg, so it requires CRYPTO_RNG.

Fixes: 766b2d724c8d ("crypto: loongson - add Loongson RNG driver support")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605201622.qWOiiZTV-lkp@intel.com/
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: ccp/tsm - Enable the root port after the endpoint
Alexey Kardashevskiy [Thu, 21 May 2026 07:43:01 +0000 (17:43 +1000)] 
crypto: ccp/tsm - Enable the root port after the endpoint

The PCIe r7.0, chapter "6.33.8 Other IDE Rules" mandates if selective IDE
is enabled for config requersts, a stream must be enabled on the endpoint
before enabling it on the rootport:

===
For Selective IDE, the Stream must not be used until it has been enabled in
both Partner Ports. For cases where one of the Partner Ports is a Root Port
and Selective IDE for Configuration Requests is enabled, the other
Partner Port must be enabled prior to the Root Port. For other scenarios,
the mechanisms to satisfy this requirement are implementation-specific.
===

Do what the spec says.

Fixes: 4be423572da1 ("crypto/ccp: Implement SEV-TIO PCIe IDE (phase1)")
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: qat - use pci logging variants for PCI-specific messages
Ahsan Atta [Wed, 20 May 2026 12:51:50 +0000 (13:51 +0100)] 
crypto: qat - use pci logging variants for PCI-specific messages

Replace dev_err(&pdev->dev, ...), dev_info(&pdev->dev, ...) and
dev_dbg(&pdev->dev, ...) with pci_err(), pci_info() and pci_dbg()
where the log message relates to a PCI subsystem operation such as
device enable, BAR mapping, PCI region requests, PCI state
save/restore, and SR-IOV management.

Messages about driver-level logic (NUMA topology, device matching,
accelerator units, capabilities, configuration, DMA) are intentionally
left as dev_err() even when a struct pci_dev pointer is in scope,
since those concern the device or driver rather than the PCI bus.

No functional change.

Suggested-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: qat - protect service table iterations with service_lock
Ahsan Atta [Wed, 20 May 2026 12:41:55 +0000 (13:41 +0100)] 
crypto: qat - protect service table iterations with service_lock

The service_table list is protected by service_lock when entries are
added or removed (in adf_service_add() and adf_service_remove()), but
several functions iterate over the list without holding this lock.

A concurrent adf_service_register() or adf_service_unregister() call
could modify the list during traversal, leading to list corruption or
a use-after-free.

Fix this by holding service_lock across all list_for_each_entry()
iterations of service_table in adf_dev_init(), adf_dev_start(),
adf_dev_stop(), adf_dev_shutdown(), adf_dev_restarting_notify(),
adf_dev_restarted_notify(), and adf_error_notifier().

The lock ordering is safe: callers of the static helpers (adf_dev_up()
and adf_dev_down()) acquire state_lock before service_lock, and no
event_hld callback or service_lock holder ever acquires state_lock in
the reverse order.

Cc: stable@vger.kernel.org
Fixes: d8cba25d2c68 ("crypto: qat - Intel(R) QAT driver framework")
Signed-off-by: Ahsan Atta <ahsan.atta@intel.com>
Co-developed-by: Maksim Lukoshkov <maksim.lukoshkov@intel.com>
Signed-off-by: Maksim Lukoshkov <maksim.lukoshkov@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: qat - fix restarting state leak on allocation failure
Ahsan Atta [Wed, 20 May 2026 12:33:00 +0000 (13:33 +0100)] 
crypto: qat - fix restarting state leak on allocation failure

In adf_dev_aer_schedule_reset(), ADF_STATUS_RESTARTING is set before
allocating reset_data. If the allocation fails, the function returns
-ENOMEM without queuing reset work, so nothing ever clears the bit.
This leaves the device permanently stuck in the restarting state,
causing all subsequent reset attempts to be silently skipped.

Fix this by using test_and_set_bit() to atomically claim the
RESTARTING state, preventing duplicate reset scheduling races under
concurrent fatal error reporting. If the subsequent allocation fails,
clear the bit to restore clean state so future reset attempts can
proceed.

Cc: stable@vger.kernel.org
Fixes: d8cba25d2c68 ("crypto: qat - Intel(R) QAT driver framework")
Signed-off-by: Ahsan Atta <ahsan.atta@intel.com>
Co-developed-by: Maksim Lukoshkov <maksim.lukoshkov@intel.com>
Signed-off-by: Maksim Lukoshkov <maksim.lukoshkov@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: octeontx - use strscpy_pad in ucode_load_store
Thorsten Blum [Wed, 20 May 2026 10:00:30 +0000 (12:00 +0200)] 
crypto: octeontx - use strscpy_pad in ucode_load_store

Instead of zero-initializing the temporary buffer and then copying into
it with strscpy(), use strscpy_pad() to copy the string and zero-pad any
trailing bytes. Drop the explicit size argument to further simplify the
code since strscpy_pad() can determine it automatically when the
destination buffer has a fixed length.

Also use strscpy_pad() to check for string truncation instead of the
hard-coded OTX_CPT_UCODE_NAME_LENGTH.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: s390 - add select CRYPTO_AEAD for aes
Arnd Bergmann [Wed, 20 May 2026 07:38:44 +0000 (09:38 +0200)] 
crypto: s390 - add select CRYPTO_AEAD for aes

The aes driver registers both skcipher and aead algorithms,
but when aead is not enabled this causes a link failure:

s390-linux-ld: arch/s390/crypto/aes_s390.o: in function `aes_s390_fini':
arch/s390/crypto/aes_s390.c:969:(.text+0x115e): undefined reference to `crypto_unregister_aead'
s390-linux-ld: arch/s390/crypto/aes_s390.o: in function `aes_s390_init':
arch/s390/crypto/aes_s390.c:1028:(.init.text+0x294): undefined reference to `crypto_register_aead'

Add the missing 'select' statement.

Fixes: bf7fa038707c ("s390/crypto: add s390 platform specific aes gcm support.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: atmel-ecc - Use named initializers for struct i2c_device_id
Uwe Kleine-König (The Capable Hub) [Wed, 20 May 2026 07:01:30 +0000 (09:01 +0200)] 
crypto: atmel-ecc - Use named initializers for struct i2c_device_id

While being less compact, using named initializers allows to more easily
see which members of the structs are assigned which value without having
to lookup the declaration of the struct. And it's also more robust
against changes to the struct definition.

This patch doesn't modify the compiled array, only its representation in
source form benefits. The former was confirmed with x86 and arm64
builds.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: atmel-sha204a - Use named initializers for struct i2c_device_id
Uwe Kleine-König (The Capable Hub) [Wed, 20 May 2026 07:01:29 +0000 (09:01 +0200)] 
crypto: atmel-sha204a - Use named initializers for struct i2c_device_id

While being less compact, using named initializers allows to more easily
see which members of the structs are assigned which value without having
to lookup the declaration of the struct. And it's also more robust
against changes to the struct definition.

This patch doesn't modify the compiled array, only its representation in
source form benefits. The former was confirmed with x86 and arm64
builds.

For consistency also assign .driver_data for the array item that the
driver relies on i2c_get_match_data() returning NULL for.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: atmel-sha204a - Drop of_device_id data
Uwe Kleine-König (The Capable Hub) [Wed, 20 May 2026 07:01:28 +0000 (09:01 +0200)] 
crypto: atmel-sha204a - Drop of_device_id data

The driver binds to i2c devices only and thus in the absence of an
assignment for .data in the of_device_id array i2c_get_match_data()
falls back to .driver_data from the i2c_device_id array. So only provide
&atsha204_quality once to reduce duplication.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: tegra - Return ENOMEM when input buffer allocation fails for ccm
Herbert Xu [Wed, 20 May 2026 02:51:14 +0000 (10:51 +0800)] 
crypto: tegra - Return ENOMEM when input buffer allocation fails for ccm

Ensure the ENOMEM error value is set when the input buffer allocation
fails in tegra_ccm_do_one_req.

Fixes: 1e245948ca0c ("crypto: tegra - finalize crypto req on error")
Reported-by: Vladislav Dronov <vdronov@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Vladislav Dronov <vdronov@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: ecrdsa - remove empty sig_alg exit callback
Thorsten Blum [Tue, 19 May 2026 08:36:32 +0000 (10:36 +0200)] 
crypto: ecrdsa - remove empty sig_alg exit callback

ecrdsa_exit_tfm() is empty, and sig_alg .exit is optional. The
corresponding .init callback is not set either, so there is nothing to
release in .exit.

Remove the empty function and leave .exit unset.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: tegra - Fix dma_free_coherent size error
Herbert Xu [Tue, 19 May 2026 04:22:18 +0000 (12:22 +0800)] 
crypto: tegra - Fix dma_free_coherent size error

When freeing a coherent DMA buffer, the size must match the value
that was used during the allocation.

Unfortunately the size field in the tegra driver gets overwritten
by this point so it no longer matches and creates a warning.

Fix this by saving a copy of the size on the stack.

Note that the ccm function actually mixes up the inbuf and outbuf
sizes, but it doesn't matter because the two sizes are actually
equal.

Fixes: 1cb328da4e8f ("crypto: tegra - Do not use fixed size buffers")
Reporeted-by: Patrick Talbert <ptalbert@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Vladislav Dronov <vdronov@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: inside-secure/eip93 - Add check for devm_request_threaded_irq
Aleksander Jan Bajkowski [Mon, 18 May 2026 21:24:59 +0000 (23:24 +0200)] 
crypto: inside-secure/eip93 - Add check for devm_request_threaded_irq

As the potential failure of the devm_request_threaded_irq(),
it should be better to check the return value and return
error if fails.

Fixes: 9739f5f93b78 ("crypto: eip93 - Add Inside Secure SafeXcel EIP-93 crypto engine support")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: inside-secure/eip93 - Drop superfluous blank line
Aleksander Jan Bajkowski [Mon, 18 May 2026 21:22:56 +0000 (23:22 +0200)] 
crypto: inside-secure/eip93 - Drop superfluous blank line

No need for a blank line.

Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: hisilicon/qm - support doorbell enable control
Zongyu Wu [Mon, 18 May 2026 14:29:56 +0000 (22:29 +0800)] 
crypto: hisilicon/qm - support doorbell enable control

The driver notifies the hardware to handle task through
doorbell. Currently, doorbell is enabled by default. To
prevent the process from sending doorbells during hardware
reset scenarios, which could cause the hardware to process
doorbells and trigger new errors:

For example, when the physical machine is resetting the device,
doorbells are still being sent from the virtual machine.

Therefore, the driver disables doorbell during hardware
unavailability. After hardware initialization is completed,
doorbell is enabled, and any task sent during the unavailability
period will return errors.

The hardware supports the PF to disable doorbells for all functions,
while the VF can only disable its own doorbell function. When the PF
is reset, it will disable doorbells for all functions. When VF is
reset, it only disables its own doorbell and does not affect tasks
on other functions.

Signed-off-by: Zongyu Wu <wuzongyu1@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: hisilicon - mask all error type when removing driver
Weili Qian [Mon, 18 May 2026 14:29:55 +0000 (22:29 +0800)] 
crypto: hisilicon - mask all error type when removing driver

Each bit in the error interrupt register corresponds to a specific
error type. A bit value of 0 enables the interrupt, and a bit value
of 1 disables the interrupt. Currently, when disabling interrupts,
it incorrectly enables the interrupt types that were not enabled.
Therefore, when disabling interrupts, all bits should be directly
written to 1.

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Zongyu Wu <wuzongyu1@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: hisilicon/qm - disable error report before flr
Weili Qian [Mon, 18 May 2026 14:29:54 +0000 (22:29 +0800)] 
crypto: hisilicon/qm - disable error report before flr

Before function level reset, driver first disable device error report
and then waits for the device reset to complete. However, when the
error is recovered, the error bits will be enabled again, resulting in
invalid disable. It is modified to detect that there is no error
before disable error report, and then do FLR.

Fixes: 7ce396fa12a9 ("crypto: hisilicon - add FLR support")
Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Zongyu Wu <wuzongyu1@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: hisilicon/qm - support function-level error reset
Zhushuai Yin [Mon, 18 May 2026 14:29:53 +0000 (22:29 +0800)] 
crypto: hisilicon/qm - support function-level error reset

When executing operations on crypto devices, hardware errors
are inevitable. For certain errors, a full device reset is
required to recover. However, in certain cases, only a
specific function may fail, while other functions can still
operate normally. A system-wide RAS reset in such cases would
unnecessarily impact functioning components.

This patch introduces function-level granularity handling,
enabling targeted resets of only the error-reporting
functions without affecting other operational functions.

Signed-off-by: Zhushuai Yin <yinzhushuai@huawei.com>
Signed-off-by: Zongyu Wu <wuzongyu1@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: hisilicon/qm - place the interrupt status interface after the PM usage counter
Zhushuai Yin [Mon, 18 May 2026 14:29:52 +0000 (22:29 +0800)] 
crypto: hisilicon/qm - place the interrupt status interface after the PM usage counter

To avoid accessing memory of a suspended device, and since the counter
interface used by PM involves sleep operations, the counter interface
cannot be placed in the interrupt top half. Therefore, the interface for
acquiring the interrupt status in the RAS reset flow that resides in the
interrupt context needs to be moved to the bottom half for processing.

Signed-off-by: Zhushuai Yin <yinzhushuai@huawei.com>
Signed-off-by: Zongyu Wu <wuzongyu1@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agocrypto: hisilicon/qm - allow VF devices to query hardware isolation status
Zhushuai Yin [Mon, 18 May 2026 14:29:51 +0000 (22:29 +0800)] 
crypto: hisilicon/qm - allow VF devices to query hardware isolation status

The problem that the VF device cannot obtain the isolation
status and isolation threshold of the device is resolved.

The accelerator driver can query the device isolation status
and threshold via the VF device using the fault query sysfs
interface under uacce. Note that only the PF device supports
isolation policy configuration, while the VF device is
limited to read-only query operations.

Signed-off-by: Zhushuai Yin <yinzhushuai@huawei.com>
Signed-off-by: Zongyu Wu <wuzongyu1@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks agoerofs: fix use-after-free on sbi->sync_decompress
Gao Xiang [Fri, 22 May 2026 08:27:16 +0000 (16:27 +0800)] 
erofs: fix use-after-free on sbi->sync_decompress

z_erofs_decompress_kickoff() can race with filesystem unmount, causing
a use-after-free on sbi->sync_decompress.

When I/O completes, z_erofs_endio() calls z_erofs_decompress_kickoff()
to queue z_erofs_decompressqueue_work() asynchronously. Then, after all
folios are unlocked, unmount workflow can proceed and sbi will be freed
before accessing to sbi->sync_decompress.

Thread (unmount)        I/O completion        kworker
                        queue_work
                                              z_erofs_decompressqueue_work
                                               (all folios are unlocked)
cleanup_mnt
 ..
 erofs_kill_sb
  erofs_sb_free
   kfree(sbi)
                        access sbi->sync_decompress  // UAF!!

Fixes: 40452ffca3c1 ("erofs: add sysfs node to control sync decompression strategy")
Reported-by: syzbot+52bae5c495dbe261a0bc@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=52bae5c495dbe261a0bc
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: Jianan Huang <jnhuang95@gmail.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
3 weeks agoselftests/mm: add kmemleak verbose dedup test
Breno Leitao [Wed, 6 May 2026 12:58:25 +0000 (05:58 -0700)] 
selftests/mm: add kmemleak verbose dedup test

Add a regression test for the per-scan verbose dedup added in the
preceding commit.  The test loads samples/kmemleak's helper module
(CONFIG_SAMPLE_KMEMLEAK=m) to generate orphan allocations, several of
which share an allocation backtrace, runs four kmemleak scans with verbose
printing enabled, then walks dmesg looking for two "unreferenced object"
reports within a single scan that share an identical backtrace - which
would mean dedup failed to collapse them.

The test is intentionally permissive on detection but strict on
regressions:

 - PASS when no duplicates are observed, regardless of whether the
   dedup summary line ("... and N more object(s) with the same
   backtrace") was actually emitted. Per-CPU chunk reuse, slab
   freelist pointers, kernel stack residue and CONFIG_DEBUG_KMEMLEAK_
   AUTO_SCAN can all keep most of the orphans "still referenced" or
   reported across many separate scans, so the dedup path may have
   nothing to fold within one scan. That is not a regression.

 - PASS reports whether dedup actually fired, so a passing run on a
   well-behaved environment is still informative.

 - FAIL when two same-backtrace reports land in a single scan (clear
   dedup regression).

 - FAIL when kmemleak's own per-scan tally counts leaks but the
   verbose path emits zero "unreferenced object" lines - that catches
   a regression in the verbose printer itself, which would otherwise
   pass the duplicate check trivially.

 - SKIP when kmemleak is absent, disabled at runtime, or the helper
   module is not built.

The dmesg parser anchors stack-frame matching to the indentation kmemleak
uses for them (4+ spaces under "kmemleak: ") so unrelated kmemleak
warnings landing between reports do not get lumped into the backtrace key
and mask a duplicate.

Link: https://lore.kernel.org/20260506-kmemleak_dedup-v3-2-2d36aafc34da@debian.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/kmemleak: dedupe verbose scan output by allocation backtrace
Breno Leitao [Wed, 6 May 2026 12:58:24 +0000 (05:58 -0700)] 
mm/kmemleak: dedupe verbose scan output by allocation backtrace

Patch series "mm/kmemleak: dedupe verbose scan output", v3.

I am starting to run with kmemleak in verbose enabled in some "probe
points" across the my employers fleet so that suspected leaks land in
dmesg without needing a separate read of /sys/kernel/debug/kmemleak.

The downside is that workloads which leak many objects from a single
allocation site flood the console with byte-for-byte identical backtraces.
Hundreds of duplicates per scan are common, drowning out distinct leaks
and unrelated kernel messages, while adding no signal beyond the first
occurrence.

This series collapses those duplicates inside kmemleak itself.  Each
unique stackdepot trace_handle prints once per scan, followed by a short
summary line when more than one object shares it:

  kmemleak: unreferenced object 0xff110001083beb00 (size 192):
  kmemleak:   comm "modprobe", pid 974, jiffies 4294754196
  kmemleak:   ...
  kmemleak:   backtrace (crc 6f361828):
  kmemleak:     __kmalloc_cache_noprof+0x1af/0x650
  kmemleak:     ...
  kmemleak:   ... and 71 more object(s) with the same backtrace

The "N new suspected memory leaks" tally and the contents of
/sys/kernel/debug/kmemleak are unchanged - the per-object detail is still
available on demand, only the verbose (dmesg) output is collapsed.

Patch 1 is the kmemleak change.

Patch 2 adds a selftest that loads samples/kmemleak's CONFIG_SAMPLE
kmemleak-test module to generate ten leaks sharing one call site and
checks that the printed count is strictly less than the reported leak
total.  Not sure if Patch 2 is useful or not, if not, it is easier to
discard.

This patch (of 2):

In kmemleak's verbose mode, every unreferenced object found during a scan
is logged with its full header, hex dump and 16-frame backtrace.
Workloads that leak many objects from a single allocation site flood dmesg
with byte-for-byte identical backtraces, drowning out distinct leaks and
other kernel messages.

Dedupe within each scan using stackdepot's trace_handle as the key: for
every leaked object with a recorded stack trace, look up the
representative kmemleak_object in a per-scan xarray keyed by trace_handle.
The first sighting stores the object pointer (with a get_object()
reference) and sets object->dup_count to 1; later sightings just bump
dup_count on the representative.  After the scan, walk the xarray once and
emit each unique backtrace, followed by a single summary line when more
than one object shares it.

Leaks whose trace_handle is 0 (early-boot allocations tracked before
kmemleak_init() set up object_cache, or stack_depot_save() failures under
memory pressure) cannot be deduped, so they are still printed inline via
the same locked OBJECT_ALLOCATED-checked helper.  The contents of
/sys/kernel/debug/kmemleak are unchanged - only the verbose console output
is collapsed.

Safety notes:

 - The xarray store happens outside object->lock: object->lock is a
   raw spinlock, while xa_store() may grab xa_node slab locks at a
   higher wait-context level which lockdep flags as invalid.
   trace_handle is captured under object->lock (which serialises with
   kmemleak_update_trace()'s writer), so it is safe to use after
   dropping the lock.

 - get_object() pins the kmemleak_object metadata across
   rcu_read_unlock(), but the underlying tracked allocation can still
   be freed concurrently. The deferred print path therefore re-acquires
   object->lock and re-checks OBJECT_ALLOCATED via print_leak_locked()
   before touching object->pointer; __delete_object() clears that flag
   under the same lock before the user memory goes away. The same
   helper is used by the trace_handle == 0 and xa_store() failure
   fallbacks, so every printer in the new path has identical safety
   guarantees.

 - If get_object() fails after we set OBJECT_REPORTED, the object is
   already being torn down (use_count hit zero); the leak count is
   still accurate but the verbose line is dropped, which is correct
   - the memory was freed concurrently and is no longer a leak.

 - If xa_store() fails to allocate an xa_node under memory pressure,
   we fall back to printing inline via print_leak_locked() instead of
   silently dropping the leak.

 - The hex dump is skipped for coalesced entries (dup_count > 1):
   bytes would differ across objects sharing a backtrace anyway, and
   skipping it removes the only remaining read of object->pointer's
   contents in the deferred path. The representative's reported size
   may also differ from the coalesced objects' sizes; the printed
   trace_handle reflects the representative's current value rather
   than the value used as the dedup key, which is normally - but not
   strictly - identical.

Link: https://lore.kernel.org/20260506-kmemleak_dedup-v3-0-2d36aafc34da@debian.org
Link: https://lore.kernel.org/20260506-kmemleak_dedup-v3-1-2d36aafc34da@debian.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/swap: add cond_resched() in swap_reclaim_full_clusters to prevent softlockup
Zijiang Huang [Wed, 6 May 2026 13:09:19 +0000 (21:09 +0800)] 
mm/swap: add cond_resched() in swap_reclaim_full_clusters to prevent softlockup

We hit a real softlockup in an internal stress test environment.  The
workload was LTP memory/swap stress on a large arm64 machine, with 320
CPUs, about 1TB memory and an 8.6GB swap device.  The system was under
heavy load and the swap device had a large number of full clusters.  The
softlockup was triggered during a stress test after about 3 days.

So, add periodic cond_resched() calls during large full_clusters
reclaim operations to prevent softlockup issues.

Detailed call trace as follow:

PID: 3817773  TASK: ffff0883bb28b780  CPU: 48   COMMAND: "kworker/48:7"
   #0 [ffff800080183d10] __crash_kexec at ffffa4c1361e5de4
   #1 [ffff800080183d90] panic at ffffa4c1360d5e9c
   #2 [ffff800080183e20] watchdog_timer_fn at ffffa4c136231fa8
   ...
  #16 [ffff8000c4ad3cb0] swap_cache_del_folio at ffffa4c1363e1614
  #17 [ffff8000c4ad3ce0] __try_to_reclaim_swap at ffffa4c1363e4bfc
  #18 [ffff8000c4ad3d40] swap_reclaim_full_clusters at ffffa4c1363e5474
  #19 [ffff8000c4ad3da0] swap_reclaim_work at ffffa4c1363e550c
  #20 [ffff8000c4ad3dc0] process_one_work at ffffa4c136102edc
  #21 [ffff8000c4ad3e10] worker_thread at ffffa4c136103398
  #22 [ffff8000c4ad3e70] kthread at ffffa4c13610d95c

Link: https://lore.kernel.org/20260506130919.2298807-1-kerayhuang@tencent.com
Fixes: 5168a68eb78f ("mm, swap: avoid over reclaim of full clusters")
Signed-off-by: Zijiang Huang <kerayhuang@tencent.com>
Reviewed-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Reviewed-by: albinwyang <albinwyang@tencent.com>
Reviewed-by: Baoquan He <baoquan.he@linux.dev>
Acked-by: Chris Li <chrisl@kernel.org>
Cc: Barry Song <baohua@kernel.org>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Youngjun Park <youngjun.park@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agohighmem-internal.h: fix typo in the comment for kunmap_atomic()
Zhouyi Zhou [Tue, 5 May 2026 02:11:25 +0000 (02:11 +0000)] 
highmem-internal.h: fix typo in the comment for kunmap_atomic()

Replace `PREEMP_RT` with `PREEMPT_RT` in the header comment to match the
correct kernel configuration name.

Link: https://lore.kernel.org/20260505021125.1941691-1-zhouzhouyi@gmail.com
Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoDocs/admin-guide/mm/damon/stat: document kdamond_pid parameter
SeongJae Park [Sat, 2 May 2026 02:05:04 +0000 (19:05 -0700)] 
Docs/admin-guide/mm/damon/stat: document kdamond_pid parameter

Update DAMON_STAT usage document for newly added kdamond_pid parameter.

Link: https://lore.kernel.org/20260502020505.80822-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/damon/stat: add a parameter for reading kdamond pid
SeongJae Park [Sat, 2 May 2026 02:05:03 +0000 (19:05 -0700)] 
mm/damon/stat: add a parameter for reading kdamond pid

Patch series "mm/damon/stat: add kdamond_pid parameter".

DAMON_STAT doesn't provide the pid of its kdamond, unlike DAMON_RECLAIM
and DAMON_LRU_SORT.  This makes user-space management of DAMON_STAT
unnecessarily complicated.  Provide the information via a new parameter,
namely kdamond_pid, and document it.

This patch (of 2):

Knowing the pid of the kdamonds can help user-space management including
monitoring of DAMON's system resource consumption.  To make it easier,
DAMON_SYSFS, DAMON_RECLAIM and DAMON_LRU_SORT provide the pid information.
DAMON_STAT is not providing it, though.  Expose the pid of DAMON_STAT
kdamond via a new read-only module parameter, namely kdamond_pid.  This
also makes DAMON modules usage more standardized, because DAMON_RECLAIM
and DAMON_LRU_SORT also provide the information via their read-only
parameters of the same name.

Link: https://lore.kernel.org/20260502020505.80822-1-sj@kernel.org
Link: https://lore.kernel.org/20260502020505.80822-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoDocs/admin-guide/mm/damon/reclaim: update for autotune_monitoring_intervals
SeongJae Park [Fri, 1 May 2026 01:17:39 +0000 (18:17 -0700)] 
Docs/admin-guide/mm/damon/reclaim: update for autotune_monitoring_intervals

Update DAMON_RECLAIM usage document for the newly added monitoring
intervals auto-tuning enablement parameter.

Link: https://lore.kernel.org/20260501011740.81988-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/damon/reclaim: add autotune_monitoring_intervals parameter
SeongJae Park [Fri, 1 May 2026 01:17:38 +0000 (18:17 -0700)] 
mm/damon/reclaim: add autotune_monitoring_intervals parameter

Patch series "mm/damon/reclaim: support monitoring intervals auto-tuning".

The monitoring intervals auto-tuning feature of DAMON has proven to be
useful in multiple environments.  Add a new DAMON_RECLAIM parameter for
supporting the feature, and update the document for the new parameter.

This patch (of 2):

DAMON's monitoring intervals auto-tuning feature has proven to be useful
in multiple environments.  DAMON_RECLAIM is still asking users to do the
manual tuning of the intervals.  Add a module parameter for utilizing the
auto-tuning feature with the suggested default setup.

Note that use of the auto-tuning overrides the manually entered monitoring
intervals.  Also, note that the 'min_age' will dynamically changed
proportional to auto-tuned intervals.  It is recommended to use 'min_age'
short enough and use 'quota_mem_pressure_us' like coldness threshold
auto-tuning features together.

Link: https://lore.kernel.org/20260501011740.81988-1-sj@kernel.org
Link: https://lore.kernel.org/20260501011740.81988-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoselftests/cgroup: include slab in test_percpu_basic memory check
Li Wang [Fri, 1 May 2026 02:20:58 +0000 (10:20 +0800)] 
selftests/cgroup: include slab in test_percpu_basic memory check

test_percpu_basic() currently compares memory.current against only
memory.stat:percpu after creating 1000 child cgroups.

Observed failure:
  #./test_kmem
  ok 1 test_kmem_basic
  ok 2 test_kmem_memcg_deletion
  ok 3 test_kmem_proc_kpagecgroup
  ok 4 test_kmem_kernel_stacks
  ok 5 test_kmem_dead_cgroups
  memory.current 11530240
  percpu 8440000
  not ok 6 test_percpu_basic

That assumption is too strict: child cgroup creation also allocates
slab-backed metadata, so memory.current is expected to be larger than
percpu alone. One visible path is:

  cgroup_mkdir()
    cgroup_create()
      cgroup_addrm_file()
        cgroup_add_file()
          __kernfs_create_file()
            __kernfs_new_node()
              kmem_cache_zalloc()

These kernfs allocations are charged as slab and show up in
memory.stat:slab.

Update the check to compare memory.current against (percpu + slab)
within MAX_VMSTAT_ERROR, and print slab/delta in the failure message to
improve diagnostics.

Link: https://lore.kernel.org/20260501022058.18024-3-li.wang@linux.dev
Signed-off-by: Li Wang <li.wang@linux.dev>
Reviewed-by: Waiman Long <longman@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Sayali Patil <sayalip@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoselftests/cgroup: fix hardcoded page size in test_percpu_basic
Li Wang [Fri, 1 May 2026 02:20:57 +0000 (10:20 +0800)] 
selftests/cgroup: fix hardcoded page size in test_percpu_basic

Patch series "selftests/cgroup: Fix false positive failures in
test_percpu_basic", v2.

This patch series addresses two separate issues that cause false
positive failures in the test_percpu_basic test within the cgroup
kmem selftests.

The first issue stems from a hardcoded assumption about the system
page size, which breaks the test on architectures with larger page
sizes.

The second issue is an overly strict memory check that fails to
account for the slab metadata allocated during cgroup creation.

This patch (of 2):

MAX_VMSTAT_ERROR uses a hardcoded page size of 4096, which assumes 4K
pages.  This causes test_percpu_basic to fail on systems where the kernel
is configured with a larger page size, such as aarch64 systems using 16K
or 64K pages, where the maximum permissible discrepancy between
memory.current and percpu charges is proportionally larger.

Replace the hardcoded 4096 with sysconf(_SC_PAGESIZE) to correctly derive
the page size at runtime regardless of the underlying architecture or
kernel configuration.

Link: https://lore.kernel.org/20260501022058.18024-1-li.wang@linux.dev
Link: https://lore.kernel.org/20260501022058.18024-2-li.wang@linux.dev
Signed-off-by: Li Wang <li.wang@linux.dev>
Acked-by: Waiman Long <longman@redhat.com>
Reviewed-by: Sayali Patil <sayalip@linux.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/filemap: do not count FAULT_FLAG_TRIED retries as mmap hits
fujunjie [Tue, 28 Apr 2026 01:59:44 +0000 (01:59 +0000)] 
mm/filemap: do not count FAULT_FLAG_TRIED retries as mmap hits

A fault that starts synchronous mmap readahead can return VM_FAULT_RETRY
after dropping mmap_lock.  The retry may then map the folio brought in by
that same miss.

Do not let this retry decrement mmap_miss.  The retry still maps the folio
from the page cache; it just does not count as a useful mmap readahead
hit.

Link: https://lore.kernel.org/tencent_22E6B8849EC1141FE7773C64467E6F1E2C09@qq.com
Signed-off-by: fujunjie <fujunjie1@qq.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Vishal Moola <vishal.moola@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/filemap: count only the faulting address as a mmap hit
fujunjie [Tue, 28 Apr 2026 01:59:43 +0000 (01:59 +0000)] 
mm/filemap: count only the faulting address as a mmap hit

Patch series "mm/filemap: tighten mmap_miss hit accounting", v3.

mmap_miss is increased when synchronous mmap readahead is needed, and
decreased when filemap_map_pages() maps folios that are already in the
page cache.  The decrease side can over-credit hits in two cases:

  - fault-around installs nearby PTEs even though the fault only proves
    that the faulting address was accessed;
  - after synchronous mmap readahead returns VM_FAULT_RETRY, the retry
    can find the folio brought in by the same miss and immediately
    cancel that miss.

Current evidence comes from a local KVM/data-disk microbenchmark using
mmap_miss_probe, with an 8 GiB guest, 2 vCPUs, 8192 KiB read_ahead_kb,
cold page cache before each run, 1% of the file accessed, and medians of 3
runs.

mmap_miss_probe mmap()s a prepared file with MADV_NORMAL and then touches
one byte at selected base-page offsets.  The access order is random,
sequential, or a fixed page stride.  The harness drops caches before each
run and samples /proc/vmstat around that access loop.

The 20 GiB case below is a larger-than-memory file case in an 8 GiB guest.
No separate memory hog was used.  The 4 GiB case uses the same 8 GiB
guest but keeps the file fit-in-memory.

Each case used a fresh temporary qcow2 data disk, seen by the guest as
/dev/vda, formatted as ext4 and mounted at /mnt/mmap-matrix.

Each result is "pgpgin GiB / elapsed seconds".  "pgpgin GiB" is the delta
of the guest /proc/vmstat pgpgin counter, converted from KiB to GiB; it is
used here as an approximate block input counter, not as resident memory or
exact application IO.  "Elapsed seconds" is the wall-clock runtime of the
whole mmap_miss_probe access pass, not per-access latency.

For the 20 GiB larger-than-memory case:

        workload       before                after
        random         223.377 GiB/101.293s  1.010 GiB/4.790s
        stride1021     204.214 GiB/97.557s   204.208 GiB/108.086s
        stride2053     409.584 GiB/193.700s  0.970 GiB/3.685s
        stride4099     406.452 GiB/134.241s  0.975 GiB/3.499s
        sequential       0.212 GiB/0.050s    0.212 GiB/0.057s

For the 4 GiB fit-in-memory case:

        workload       before              after
        random         3.987 GiB/1.960s    0.980 GiB/1.221s
        stride1021     4.002 GiB/1.838s    4.002 GiB/1.851s
        stride2053     3.991 GiB/1.835s    0.811 GiB/0.985s
        stride4099     4.001 GiB/1.836s    0.819 GiB/1.037s
        sequential     0.056 GiB/0.013s    0.056 GiB/0.018s

The 20 GiB setup also has an ablation.  P1 is only the faulting-address
hit accounting change.  P2-only is only the FAULT_FLAG_TRIED retry
filter.  P1+P2 is the combined accounting change:

        workload    variant   result
        random      baseline  223.377 GiB/101.293s
        random      P1        223.268 GiB/98.481s
        random      P2-only   223.257 GiB/100.091s
        random      P1+P2     1.010 GiB/4.790s
        stride2053  baseline  409.584 GiB/193.700s
        stride2053  P1        409.584 GiB/197.645s
        stride2053  P2-only   15.722 GiB/5.485s
        stride2053  P1+P2     0.970 GiB/3.685s
        sequential  baseline  0.212 GiB/0.050s
        sequential  P1        0.212 GiB/0.046s
        sequential  P2-only   0.212 GiB/0.050s
        sequential  P1+P2     0.212 GiB/0.057s

After the v2 implementation refactor, only the final P1+P2 shape was rerun
in the same setup.  The numbers stayed in line with the v1 P1+P2 rows
above:

        workload       larger-than-memory case    fit-in-memory case
                       20 GiB file, 1% access    4 GiB file, 1% access
        random           1.010 GiB/4.383s          0.980 GiB/1.088s
        stride1021     204.216 GiB/105.601s        4.001 GiB/1.783s
        stride2053       0.970 GiB/3.760s          0.810 GiB/0.908s
        stride4099       0.975 GiB/3.410s          0.818 GiB/0.870s
        sequential       0.212 GiB/0.060s          0.056 GiB/0.016s

This does not claim to solve every sparse pattern.  The stride1021 rows
are intentionally shown as a boundary: with 8192 KiB read_ahead_kb,
file->f_ra.ra_pages is 2048 base pages, and synchronous mmap read-around
uses a 2048-page window centered around the fault, roughly [index - 1024,
index + 1023].  stride1021 is 1021 * 4 KiB = 4084 KiB, so the next access
lands inside the previous read-around window.  About every other access
can be a real faulting-address page-cache hit, and the other half can each
read about 8 MiB.  For about 52k accesses in the 20 GiB/1% run, half of
them times 8 MiB is about 205 GiB, matching the observed 204 GiB.

This patch (of 2):

filemap_map_pages() reduces file->f_ra.mmap_miss when fault-around maps
folios that are already present in the page cache.  That hit accounting is
too generous because fault-around can install PTEs around the faulting
address even though the fault only proves that the faulting address was
accessed.

Move the mmap_miss update back into filemap_map_pages(), drop the
mmap_miss argument from the helper functions, and decrement mmap_miss only
when the helper return value shows that the faulting address was mapped.
Keep the existing workingset-folio behavior unchanged.

Link: https://lore.kernel.org/tencent_AA501E9A238337BD167E5C2ACF948A1AF308@qq.com
Link: https://lore.kernel.org/tencent_756F151FE66F3D80479A6F982C0AB8569F09@qq.com
Signed-off-by: fujunjie <fujunjie1@qq.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Vishal Moola <vishal.moola@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm: use zone lock guard in __offline_isolated_pages()
Dmitry Ilvokhin [Wed, 29 Apr 2026 12:02:13 +0000 (12:02 +0000)] 
mm: use zone lock guard in __offline_isolated_pages()

Use spinlock_irqsave zone lock guard in __offline_isolated_pages() to
replace the explicit lock/unlock pattern with automatic scope-based
cleanup.

Link: https://lore.kernel.org/13149be4f8151e18eb5f1eb4f3241ab3cffb373e.1777462630.git.d@ilvokhin.com
Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm: use zone lock guard in free_pcppages_bulk()
Dmitry Ilvokhin [Wed, 29 Apr 2026 12:02:12 +0000 (12:02 +0000)] 
mm: use zone lock guard in free_pcppages_bulk()

Use spinlock_irqsave zone lock guard in free_pcppages_bulk() to replace
the explicit lock/unlock pattern with automatic scope-based cleanup.

Link: https://lore.kernel.org/aafc2d660057a91eb40417f8ff4645b0a8c525e2.1777462630.git.d@ilvokhin.com
Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm: use zone lock guard in put_page_back_buddy()
Dmitry Ilvokhin [Wed, 29 Apr 2026 12:02:11 +0000 (12:02 +0000)] 
mm: use zone lock guard in put_page_back_buddy()

Use spinlock_irqsave zone lock guard in put_page_back_buddy() to replace
the explicit lock/unlock pattern with automatic scope-based cleanup.

Link: https://lore.kernel.org/b0fceedca37139da36aa626ac72eb9840b641021.1777462630.git.d@ilvokhin.com
Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm: use zone lock guard in take_page_off_buddy()
Dmitry Ilvokhin [Wed, 29 Apr 2026 12:02:10 +0000 (12:02 +0000)] 
mm: use zone lock guard in take_page_off_buddy()

Use spinlock_irqsave zone lock guard in take_page_off_buddy() to replace
the explicit lock/unlock pattern with automatic scope-based cleanup.

This also allows to return directly from the loop, removing the 'ret'
variable.

Link: https://lore.kernel.org/a981721632a981f148c63e3f7df3d1116a0c3f6d.1777462630.git.d@ilvokhin.com
Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm: use zone lock guard in set_migratetype_isolate()
Dmitry Ilvokhin [Wed, 29 Apr 2026 12:02:09 +0000 (12:02 +0000)] 
mm: use zone lock guard in set_migratetype_isolate()

Use spinlock_irqsave scoped lock guard in set_migratetype_isolate() to
replace the explicit lock/unlock pattern with automatic scope-based
cleanup.  The scoped variant is used to keep dump_page() outside the
locked section to avoid a lockdep splat.

Link: https://lore.kernel.org/6883351ad7f74d20875fff30e0e3214a089cea97.1777462630.git.d@ilvokhin.com
Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm: use zone lock guard in unreserve_highatomic_pageblock()
Dmitry Ilvokhin [Wed, 29 Apr 2026 12:02:08 +0000 (12:02 +0000)] 
mm: use zone lock guard in unreserve_highatomic_pageblock()

Use spinlock_irqsave zone lock guard in unreserve_highatomic_pageblock()
to replace the explicit lock/unlock pattern with automatic scope-based
cleanup.

Link: https://lore.kernel.org/69db814cd178915cb5615334a29304678f960963.1777462630.git.d@ilvokhin.com
Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm: use zone lock guard in unset_migratetype_isolate()
Dmitry Ilvokhin [Wed, 29 Apr 2026 12:02:07 +0000 (12:02 +0000)] 
mm: use zone lock guard in unset_migratetype_isolate()

Use spinlock_irqsave zone lock guard in unset_migratetype_isolate() to
replace the explicit lock/unlock and goto pattern with automatic
scope-based cleanup.

Link: https://lore.kernel.org/815c0905ea77828ed32bf56ff0a6d3c6548eb3a2.1777462630.git.d@ilvokhin.com
Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm: use zone lock guard in reserve_highatomic_pageblock()
Dmitry Ilvokhin [Wed, 29 Apr 2026 12:02:06 +0000 (12:02 +0000)] 
mm: use zone lock guard in reserve_highatomic_pageblock()

Patch series "mm: use spinlock guards for zone lock", v3.

This series uses spinlock guard for zone lock across several mm functions
to replace explicit lock/unlock patterns with automatic scope-based
cleanup.

This simplifies the control flow by removing 'flags' variables, goto
labels, and redundant unlock calls.

Patches are ordered by decreasing value.  The first six patches simplify
the control flow by removing gotos, multiple unlock paths, or 'ret'
variables.  The last two are simpler lock/unlock pair conversions that
only remove 'flags' and can be dropped if considered unnecessary churn.

Binary size increase is +39 bytes, with Peter Zijlstra's fix for guards
[1] applied.  This is due to the compiler not being able to deduplicate
epilogue and eliminate redundant NULL check.  See discussion [2] for more
details.  I proposed a patch [3] that fixes this, but until it is merged
we need to assume +39 bytes will stay (though it is compiler dependent).

This patch (of 8):

Use the spinlock_irqsave zone lock guard in reserve_highatomic_pageblock()
to replace the explicit lock/unlock and goto out_unlock pattern with
automatic scope-based cleanup.

Link: https://lore.kernel.org/cover.1777462630.git.d@ilvokhin.com
Link: https://lore.kernel.org/3657e1144e2ffc1ca0eb57d57d89bfec4073d8c6.1777462630.git.d@ilvokhin.com
Link: https://lore.kernel.org/all/20260309164516.GE606826@noisy.programming.kicks-ass.net/
Link: https://lore.kernel.org/all/afC5C6fylF4AsITV@shell.ilvokhin.com/
Link: https://lore.kernel.org/all/20260427165037.205337-1-d@ilvokhin.com/
Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoDocs/ABI/damon: mark schemes/<S>/filters/ deprecated
SeongJae Park [Wed, 29 Apr 2026 15:03:06 +0000 (08:03 -0700)] 
Docs/ABI/damon: mark schemes/<S>/filters/ deprecated

Now the 'filters/' directory is deprecated.  Update ABI document to also
announce the fact.  Also update the descriptions of the files to be based
on 'core_filter/' directory, to make the old descriptions ready to be
removed when the time arrives.

Link: https://lore.kernel.org/20260429150309.82282-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoDocs/admin-guide/mm/damon/usage: mark scheme filters sysfs dir as deprecated
SeongJae Park [Wed, 29 Apr 2026 15:03:05 +0000 (08:03 -0700)] 
Docs/admin-guide/mm/damon/usage: mark scheme filters sysfs dir as deprecated

Patch series "mm/damon/sysfs: document filters/ directory as deprecated".

Commit ab71d2d30121 ("mm/damon/sysfs-schemes: let
damon_sysfs_scheme_set_filters() be used for different named directories")
introduced alternatives of 'filters' directory, namely core_filters/ and
'ops_filters/ directories.  Now the alternatives are well stabilized and
ready for all users.  All filters/ directory use cases are expected to be
able to be migrated to the alternatives.  An LTS kernel having the
alternatives, namely 6.18.y, is also released.  Existence of filters/
directory is only confusing.

It would be better not immediately removing the directory, though.  There
could be users that need time before migrating to the alternatives.  There
might be unexpected use cases that the alternatives cannot support.  Doing
the deprecation step by step across multiple years like DAMON debugfs
deprecation would be safer.  Start the deprecation changes by announcing
the deprecation on the documents.

Every year, one more action for completely removing the directory will be
followed, like DAMON debugfs deprecation did.  Following yearly actions
are currently expected.  In 2027, deprecation warning kernel messages will
be printed once, for use of filters/ directory.  In 2028, filters/
directory will be renamed to filters_DEPRECATED/.  In 2029,
filters_DEPRECATED/ directory will be removed.

This patch (of 2):

The alternatives of 'filters/' directory, namely 'core_filters/' and
'ops_filters/', can fully support all the features 'filters/' directory
can do, and provide better user experience.  Having 'filters/' directory
is only confusing to users.  Announce it as deprecated on the usage
document.

Link: https://lore.kernel.org/20260429150309.82282-1-sj@kernel.org
Link: https://lore.kernel.org/20260429150309.82282-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/khugepaged: return -EAGAIN for SCAN_PAGE_HAS_PRIVATE in MADV_COLLAPSE
Vineet Agarwal [Wed, 29 Apr 2026 14:04:34 +0000 (19:34 +0530)] 
mm/khugepaged: return -EAGAIN for SCAN_PAGE_HAS_PRIVATE in MADV_COLLAPSE

MADV_COLLAPSE uses errno values to provide actionable feedback to
userspace.  Temporary resource constraints are mapped to -EAGAIN so the
caller may retry, while intrinsic failures of the specified range are
mapped to -EINVAL.

collapse_file() returns SCAN_PAGE_HAS_PRIVATE when filemap_release_folio()
fails while isolating file-backed folios for collapse.  This currently
falls through the default case in madvise_collapse_errno() and is reported
to userspace as -EINVAL.

However, filemap_release_folio() failure commonly reflects temporary folio
state rather than a permanently uncollapsible range.

For example, ext4 returns false when a folio still has dirty journalled
data, btrfs returns false for dirty or writeback folios before extent
state release, and NFS may return false while reclaiming
filesystem-private folio state.

In such cases, retrying MADV_COLLAPSE after writeback, reclaim or journal
progress may succeed.  This matches the existing -EAGAIN handling for
SCAN_PAGE_DIRTY_OR_WRITEBACK and other transient collapse failures more
closely than -EINVAL.

Therefore, map SCAN_PAGE_HAS_PRIVATE to -EAGAIN so userspace receives
retryable feedback for this temporary failure path.

Link: https://lore.kernel.org/20260429140434.439456-1-agarwal.vineet2006@gmail.com
Signed-off-by: Vineet Agarwal <agarwal.vineet2006@gmail.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoselftests/mm: khugepaged: initialize file contents via mmap
Vineet Agarwal [Wed, 29 Apr 2026 11:58:16 +0000 (17:28 +0530)] 
selftests/mm: khugepaged: initialize file contents via mmap

file_setup_area() currently allocates anonymous memory, fills it, and
writes it into the backing file used for collapse testing.

Instead of copying data through write(), resize the file with ftruncate(),
map it directly with MAP_SHARED, and initialize the mapped area in place.

This simplifies the setup path and avoids the need for explicit partial
write handling.

Link: https://lore.kernel.org/20260429115816.98824-1-agarwal.vineet2006@gmail.com
Signed-off-by: Vineet Agarwal <agarwal.vineet2006@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Tested-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoDocs/admin-guide/mm/damon/lru_sort: update for entire memory monitoring
SeongJae Park [Wed, 29 Apr 2026 04:12:29 +0000 (21:12 -0700)] 
Docs/admin-guide/mm/damon/lru_sort: update for entire memory monitoring

Update DAMON_LRU_SORT usage document for the changed default monitoring
target region selection.

Link: https://lore.kernel.org/20260429041232.90257-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoDocs/admin-guide/mm/damon/reclaim: update for entire memory monitoring
SeongJae Park [Wed, 29 Apr 2026 04:12:28 +0000 (21:12 -0700)] 
Docs/admin-guide/mm/damon/reclaim: update for entire memory monitoring

Update DAMON_RECLAIM usage document for the changed default monitoring
target region selection.

Link: https://lore.kernel.org/20260429041232.90257-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/damon/stat: use damon_set_region_system_rams_default()
SeongJae Park [Wed, 29 Apr 2026 04:12:27 +0000 (21:12 -0700)] 
mm/damon/stat: use damon_set_region_system_rams_default()

damon_stat_set_moniotirng_region() is nearly a duplicate of the core
function, damon_set_region_system_rams_default().  Use the core
implementation.

Link: https://lore.kernel.org/20260429041232.90257-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/damon/core: remove damon_set_region_biggest_system_ram_default()
SeongJae Park [Wed, 29 Apr 2026 04:12:26 +0000 (21:12 -0700)] 
mm/damon/core: remove damon_set_region_biggest_system_ram_default()

Now nobody is using damon_set_region_biggest_system_ram_default().  Remove
it.

Link: https://lore.kernel.org/20260429041232.90257-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/damon/lru_sort: cover all system rams
SeongJae Park [Wed, 29 Apr 2026 04:12:25 +0000 (21:12 -0700)] 
mm/damon/lru_sort: cover all system rams

DAMON_LRU_SORT allows users to set the physical address range to monitor
and do the work on.  When users don't explicitly set the range, the
biggest system ram resource of the system is selected as the monitoring
target address range.  The intention was to reduce the overhead from
monitoring non-System RAM areas because monitoring non-System RAM may be
meaningless.  However, because of the sampling based access check and
adaptive regions adjustment, the overhead should be negligible.  It makes
more sense to just cover all system rams of the system.  Do so.

Link: https://lore.kernel.org/20260429041232.90257-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/damon/reclaim: cover all system rams
SeongJae Park [Wed, 29 Apr 2026 04:12:24 +0000 (21:12 -0700)] 
mm/damon/reclaim: cover all system rams

DAMON_RECLAIM allows users to set the physical address range to monitor
and do the work on.  When users don't explicitly set the range, the
biggest System RAM resource of the system is selected as the monitoring
target address range.  The intention was to reduce the overhead from
monitoring non-System RAM areas because monitoring of non-System RAM may
be meaningless.  However, because of the sampling based access check and
adaptive regions adjustment, the overhead should be negligible.  It makes
more sense to just cover all system rams of the system.  Do so.

Link: https://lore.kernel.org/20260429041232.90257-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/damon: introduce damon_set_region_system_rams_default()
SeongJae Park [Wed, 29 Apr 2026 04:12:23 +0000 (21:12 -0700)] 
mm/damon: introduce damon_set_region_system_rams_default()

Patch series "mm/damon/reclaim,lru_sort: monitor all system rams by
default".

DAMON_RECLAIM and DAMON_LRU_SORT set the biggest 'System RAM' resource of
the system as the default monitoring target address range.  The main
intention behind the design is to minimize the overhead coming from
monitoring of non-System RAM areas.

This could result in an odd setup when there are multiple discrete System
RAMs of considerable sizes.  For example, there are System RAMs each
having 500 GiB size.  In this case, only the first 500 GiB will be set as
the monitoring region by default.  This is particularly common on NUMA
systems.  Hence the modules allow users to set the monitoring target
address range using the module parameters if the default setup doesn't
work for them.  In other words, the current design trades ease of setup
for lower overhead.

However, because DAMON utilizes the sampling based access check and the
adaptive regions adjustment mechanisms, the overhead from the monitoring
of non-System RAM areas should be negligible in most setups.  Meanwhile,
the setup complexity is causing real headaches for users who need to run
those modules on various types of systems.  That is, the current tradeoff
is not a good deal.

Set the physical address range that can cover all System RAM areas of the
system as the default monitoring regions for DAMON_RECLAIM and
DAMON_LRU_SORT.

Technically speaking, this is changing documented behavior.  However, it
makes no sense to believe there is a real use case that really depends on
the old weird default behavior.  If the old default behavior was working
for them in the reasonable way, this change will only add a negligible
amount of monitoring overhead.  If it didn't work, the users may already
be using manual monitoring regions setup, and they will not be affected by
this change.

Patches Sequence
================

Patch 1 introduces a new core function that will be used for the new
default monitoring target region setup.  Patch 2 and 3 update
DAMON_RECLAIM and DAMON_LRU_SORT to use the new function instead of the
old one, respectively.  Patch 4 removes the old core function that was
replaced by the new one, as there is no more user of it.  Patch 5 updates
DAMON_STAT to use the new one instead of its in-house nearly-duplicate
self implementation of the functionality.  Finally patches 6 and 7 update
the DAMON_RECLAIM and DAMON_LRU_SORT user documentation for the new
behaviors, respectively.

This patch (of 7):

damon_set_region_biggest_system_ram_default() sets the monitoring target
region as the caller requested.  If the caller didn't specify the region,
it finds the biggest System RAM of the system and sets it as the target
region.  When there are more than one considerable size of System RAM
resources in the system, the default target setup makes no sense.
Introduce a variant, namely damon_set_region_system_rams_default().  It
sets a physical address range that covers all System RAM resources as the
default target region.

Link: https://lore.kernel.org/20260429041232.90257-1-sj@kernel.org
Link: https://lore.kernel.org/20260429041232.90257-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm: skip KASAN tagging for page-allocated page tables
Muhammad Usama Anjum [Wed, 29 Apr 2026 10:27:04 +0000 (15:57 +0530)] 
mm: skip KASAN tagging for page-allocated page tables

Page tables are always accessed via the linear mapping with a match-all
tag, so HW-tag KASAN never checks them.  For page-allocated tables (PTEs
and PGDs etc), avoid the tag setup and poisoning overhead by using
__GFP_SKIP_KASAN.  SLUB-backed page tables are unchanged for now.  (They
aren't widely used and require more SLUB related skip logic.  Leave it
later.)

Link: https://lore.kernel.org/20260429102704.680174-4-dev.jain@arm.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ben Segall <bsegall@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Liam Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agokasan: skip HW tagging for all kernel thread stacks
Muhammad Usama Anjum [Wed, 29 Apr 2026 10:27:03 +0000 (15:57 +0530)] 
kasan: skip HW tagging for all kernel thread stacks

HW-tag KASAN never checks kernel stacks because stack pointers carry the
match-all tag, so setting/poisoning tags is pure overhead.

- Add __GFP_SKIP_KASAN to THREADINFO_GFP so every stack allocator that
  uses it skips tagging (fork path plus arch users)
- Add __GFP_SKIP_KASAN to GFP_VMAP_STACK for the fork-specific vmap
  stacks.
- When reusing cached vmap stacks, skip kasan_unpoison_range() if HW tags
  are enabled.

Software KASAN is unchanged; this only affects tag-based KASAN.

Link: https://lore.kernel.org/20260429102704.680174-3-dev.jain@arm.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ben Segall <bsegall@google.com>
Cc: David Hildenbrand (Arm) <david@kernel.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Liam Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agovmalloc: add __GFP_SKIP_KASAN support
Muhammad Usama Anjum [Wed, 29 Apr 2026 10:27:02 +0000 (15:57 +0530)] 
vmalloc: add __GFP_SKIP_KASAN support

Patch series "kasan: hw_tags: Disable tagging for stack and page-tables",
v4.

Stacks and page tables are always accessed with the match-all tag, so
assigning a new random tag every time at allocation and setting invalid
tag at deallocation time, just adds overhead without improving the
detection.

With __GFP_SKIP_KASAN the page keeps its poison tag and KASAN_TAG_KERNEL
(match-all tag) is stored in the page flags while keeping the poison tag
in the hardware.  The benefit of it is that 256 tag setting instruction
per 4 kB page aren't needed at allocation and deallocation time.

Thus match-all pointers still work, while non-match tags (other than
poison tag) still fault.

__GFP_SKIP_KASAN only skips for KASAN_HW_TAGS mode, so coverage is
unchanged.

Benchmark:
The benchmark has two modes. In thread mode, the child process forks
and creates N threads. In pgtable mode, the parent maps and faults a
specified memory size and then forks repeatedly with children exiting
immediately.

Thread benchmark:
2000 iterations, 2000 threads: 2.575 s → 2.229 s (~13.4% faster)

The pgtable samples:
- 2048 MB, 2000 iters 19.08 s → 17.62 s (~7.6% faster)

This patch (of 3):

For allocations that will be accessed only with match-all pointers (e.g.,
kernel stacks), setting tags is wasted work.  If the caller already set
__GFP_SKIP_KASAN, skip tag setting of vmalloc pages.

Before this patch, __GFP_SKIP_KASAN wasn't being used with vmalloc APIs.
So it wasn't being checked.  Now its being checked and acted upon.  Other
KASAN modes are unchanged because __GFP_SKIP_KASAN is ignored for them in
the page allocator, and in vmalloc too we ignore this flag for them.

This is a preparatory patch for optimizing kernel stack allocations.

Link: https://lore.kernel.org/20260429102704.680174-1-dev.jain@arm.com
Link: https://lore.kernel.org/20260429102704.680174-2-dev.jain@arm.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>
Co-developed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Co-developed-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ben Segall <bsegall@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Liam Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/memcontrol: hoist pstatc_pcpu assignment out of CPU loop
Hui Zhu [Wed, 29 Apr 2026 08:42:16 +0000 (16:42 +0800)] 
mm/memcontrol: hoist pstatc_pcpu assignment out of CPU loop

In mem_cgroup_alloc(), the assignment of pstatc_pcpu is invariant with
respect to the for_each_possible_cpu() loop: both the 'parent' pointer and
'parent->vmstats_percpu' remain constant throughout all iterations.

The original code redundantly re-evaluated the 'if (parent)' condition and
reassigned pstatc_pcpu on every CPU iteration, then repeated the same
ternary check 'parent ?  pstatc_pcpu : NULL' when storing into
statc->parent_pcpu.

Move the single conditional assignment of pstatc_pcpu to before the loop,
resolving both the loop-invariant placement issue and the duplicated null
check.  On systems with a large number of possible CPUs, this eliminates
repeated branch evaluation with no functional change.

No functional change intended.

Link: https://lore.kernel.org/20260429084216.186238-1-hui.zhu@linux.dev
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Reviewed-by: SeongJae Park <sj@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/migrate: rename PAGE_ migration flags to FOLIO_
Shivank Garg [Tue, 24 Mar 2026 19:07:09 +0000 (19:07 +0000)] 
mm/migrate: rename PAGE_ migration flags to FOLIO_

These flags only track folio-specific state during migration and are not
used for movable_ops pages.  Rename the enum values and the old_page_state
variable to match.

No functional change.

Link: https://lore.kernel.org/20260324190706.964555-4-shivankg@amd.com
Signed-off-by: Shivank Garg <shivankg@amd.com>
Suggested-by: David Hildenbrand <david@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Shivank Garg <shivankg@amd.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoselftests/damon/sysfs.py: pause DAMON before dumping status
SeongJae Park [Mon, 27 Apr 2026 15:12:29 +0000 (08:12 -0700)] 
selftests/damon/sysfs.py: pause DAMON before dumping status

The sysfs.py test commits DAMON parameters, dump the internal DAMON state,
and show if the parameters are committed as expected using the dumped
state.  While the dumping is ongoing, DAMON is alive.  It can make
internal changes including addition and removal of regions.  It can
therefore make a race that can result in false test results.  Pause DAMON
execution during the state dumping to avoid such races.

Link: https://lore.kernel.org/20260427151231.113429-11-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoselftests/damon/sysfs.py: check pause on assert_ctx_committed()
SeongJae Park [Mon, 27 Apr 2026 15:12:28 +0000 (08:12 -0700)] 
selftests/damon/sysfs.py: check pause on assert_ctx_committed()

Extend sysfs.py tests to confirm damon_ctx->pause can be set using the
pause sysfs file.

Link: https://lore.kernel.org/20260427151231.113429-10-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoselftests/damon/drgn_dump_damon_status: dump pause
SeongJae Park [Mon, 27 Apr 2026 15:12:27 +0000 (08:12 -0700)] 
selftests/damon/drgn_dump_damon_status: dump pause

drgn_dump_damon_status is not dumping the damon_ctx->pause parameter
value, so it cannot be tested.  Dump it for future tests.

Link: https://lore.kernel.org/20260427151231.113429-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoselftests/damon/_damon_sysfs: support pause file staging
SeongJae Park [Mon, 27 Apr 2026 15:12:26 +0000 (08:12 -0700)] 
selftests/damon/_damon_sysfs: support pause file staging

DAMON test-purpose sysfs interface control Python module, _damon_sysfs, is
not supporting the newly added pause file.  Add the support of the file,
for future test and use of the feature.

Link: https://lore.kernel.org/20260427151231.113429-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/damon/tests/core-kunit: test pause commitment
SeongJae Park [Mon, 27 Apr 2026 15:12:25 +0000 (08:12 -0700)] 
mm/damon/tests/core-kunit: test pause commitment

Add a kunit test for commitment of damon_ctx->pause parameter that can be
done using damon_commit_ctx().

Link: https://lore.kernel.org/20260427151231.113429-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoDocs/ABI/damon: update for pause sysfs file
SeongJae Park [Mon, 27 Apr 2026 15:12:24 +0000 (08:12 -0700)] 
Docs/ABI/damon: update for pause sysfs file

Update DAMON ABI document for the DAMON context execution pause/resume
feature.

Link: https://lore.kernel.org/20260427151231.113429-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoDocs/admin-guide/mm/damon/usage: update for pause file
SeongJae Park [Mon, 27 Apr 2026 15:12:23 +0000 (08:12 -0700)] 
Docs/admin-guide/mm/damon/usage: update for pause file

Update DAMON usage document for the DAMON context execution pause/resume
feature.

Link: https://lore.kernel.org/20260427151231.113429-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>