]> git.ipfire.org Git - thirdparty/kernel/linux.git/log
thirdparty/kernel/linux.git
3 weeks agoarm_mpam: Improve check for whether or not NRDY is hardware managed
Ben Horgan [Thu, 7 May 2026 15:28:14 +0000 (16:28 +0100)] 
arm_mpam: Improve check for whether or not NRDY is hardware managed

mpam_ris_hw_probe_csu_nrdy() sets and clears MSMON_CSU.NRDY and checks
whether it's configuration sticks. However, hardware isn't given a chance
to disagree. Based on rule LRTGP, in MPAM specification IHI0099 version
B.b, the hardware will set NRDY if it needs time to establish a count after
a configuration change.

Enable the monitor so that NRDY becomes relevant and change the
configuration after clearing NRDY to try and coax the hardware into setting
it.

Fixes: 8c90dc68a5de ("arm_mpam: Probe the hardware features resctrl supports")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 weeks agoarm_mpam: Pretend that NRDY is always hardware managed
Ben Horgan [Thu, 7 May 2026 15:28:13 +0000 (16:28 +0100)] 
arm_mpam: Pretend that NRDY is always hardware managed

Rule ZTXDS of the MPAM specification, IHI009 version B.b, states: "If a
monitor does not support automatic updates of NRDY, software can use that
bit for any purpose."

As software is not reliably informed whether or not the monitor supports
automatic updates of NRDY always assume that hardware may manage NRDY but
don't rely on it. When NRDY is truly untouched by hardware then, as it is
written to 0 on configuration, it will always read 0.

At probe it's checked if MSMON_CSU.NRDY and MSMON_MBWU.NRDY are hardware
managed but not MSMON_MBWU_L.NDRY. Specialize the checking for hardware
managed NRDY to CSU counters as this is the only case where hardware
management makes sense. Continue to inform the user if MSMON_CSU.NRDY
appears to be hardware managed but the firmware doesn't provide the
associated time limit for the automatic clearing of NRDY. Remove the NRDY
feature flags as they are now unused.

Fixes: 8c90dc68a5de ("arm_mpam: Probe the hardware features resctrl supports")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 weeks agoarm_mpam: Fix monitor instance selection when checking for hardware NRDY
Ben Horgan [Thu, 7 May 2026 15:28:12 +0000 (16:28 +0100)] 
arm_mpam: Fix monitor instance selection when checking for hardware NRDY

In _mpam_ris_hw_probe_hw_nrdy() a new register value to select the first
monitor and relevant RIS is prepared in mon_sel. However, it is written to
the monitor value register, e.g. MSMON_CSU, rather than MSMON_CFG_MON_SEL.

As MSMON_CFG_MON_SEL is a 32 bit register update the type of mon_sel to
u32.  Write mon_sel to the intended register, MSMON_CFG_MON_SEL.

Fixes: 8c90dc68a5de ("arm_mpam: Probe the hardware features resctrl supports")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 weeks agoxfrm: Check for underflow in xfrm_state_mtu
David Ahern [Wed, 13 May 2026 16:49:14 +0000 (10:49 -0600)] 
xfrm: Check for underflow in xfrm_state_mtu

Leo Lin reported OOB write issue in esp component:

  xfrm_state_mtu() returns u32 but performs its arithmetic in unsigned
  modulo-2^32 space using an attacker-influenced "header_len + authsize +
  net_adj" subtracted from a small "mtu" argument. A nobody user can
  install an IPv4 ESP tunnel SA with a large authentication key
  (XFRMA_ALG_AUTH_TRUNC, e.g. hmac(sha512), 64-byte key, 64-byte trunc),
  configure a small interface MTU (68 bytes), and set XFRMA_TFCPAD to a
  large value. When a single UDP datagram is then sent through the
  tunnel, xfrm_state_mtu() underflows to a near-2^32 value, and
  esp_output() consumes it as a signed int via:

        padto      = min(x->tfcpad, xfrm_state_mtu(x, mtu_cached))
        esp.tfclen = padto - skb->len   (assigned to int)

  esp.tfclen ends up negative (e.g. -207). It is sign-extended to size_t
  when passed to memset() inside esp_output_fill_trailer(), producing a
  ~16 EB write of zeroes at skb_tail_pointer(skb). KASAN logs it as
  "Write of size 18446744073709551537 at addr ffff888...".

Check for underflow and return 1. This causes the sendmsg attempt to
fail with ENETUNREACH.

Fixes: c5c252389374 ("[XFRM]: Optimize MTU calculation")
Reported-by: Leo Lin <leo@depthfirst.com>
Assisted-by: Codex:26.506.31004
Signed-off-by: David Ahern <dahern@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
3 weeks agopowerpc/time: Remove redundant preempt_disable|enable() calls from arch_irq_work_raise()
Sayali Patil [Wed, 13 May 2026 08:14:13 +0000 (13:44 +0530)] 
powerpc/time: Remove redundant preempt_disable|enable() calls from arch_irq_work_raise()

A kernel panic is observed when handling machine check exceptions from
real mode.

  BUG: Unable to handle kernel data access on read at 0xc00000006be21300
  Oops: Kernel access of bad area, sig: 11 [#1]
  MSR:  8000000000001003 <SF,ME,RI,LE>  CR: 88222248  XER: 00000005
  CFAR: c00000000003ffc4 DAR: c00000006be21300 DSISR: 40000000 IRQMASK: 0
  NIP [c000000000029e40] arch_irq_work_raise+0x10/0x70
  LR [c00000000003ffc8] machine_check_queue_event+0xa8/0x150
  Call Trace:
  [c0000000179d3c70] [c00000000003ff64] machine_check_queue_event+0x44/0x150
  [c0000000179d3d30] [c0000000000084e0] machine_check_early_common+0x1f0/0x2c0

The crash occurs because arch_irq_work_raise() calls preempt_disable()
from machine check exception (MCE) handlers running in real mode. In
this context, accessing the preempt_count can fault, leading to the panic.

The preempt_disable()/preempt_enable() pair in arch_irq_work_raise()
was originally added by commit 0fe1ac48bef0 ("powerpc/perf_event: Fix
oops due to perf_event_do_pending call") to avoid races while raising
irq work from exception context.

Later, commit 471ba0e686cb ("irq_work: Do not raise an IPI when
queueing work on the local CPU") added preemption protection in
irq_work_queue() path, while commit 20b876918c06 ("irq_work: Use per
cpu atomics instead of regular atomics") added equivalent
protection in irq_work_queue_on() before reaching arch_irq_work_raise():

  irq_work_queue() / irq_work_queue_on()
    -> preempt_disable()
      -> __irq_work_queue_local()
        -> irq_work_raise()
          -> arch_irq_work_raise()

As a result, callers other than mce_irq_work_raise() already execute
with preemption disabled, making the additional
preempt_disable()/preempt_enable() pair in arch_irq_work_raise()
redundant.

The arch_irq_work_raise() function executes in NMI context when called
from MCE handler. Hence we will not be preempted or scheduled out since
we are in NMI context with MSR[EE]=0. Therefore, it is safe to remove
the preempt_disable()/preempt_enable() calls from here.

Remove it to avoid accessing preempt_count from real mode context.

Fixes: cc15ff327569 ("powerpc/mce: Avoid using irq_work_queue() in realmode")
Suggested-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Acked-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
[Maddy: Fixed the commit title]
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260513081413.222490-1-sayalip@linux.ibm.com
3 weeks agoInput: synaptics - add LEN2058 to SMBus passlist for ThinkPad E490
Nicolás Bazaes [Thu, 14 May 2026 01:35:49 +0000 (21:35 -0400)] 
Input: synaptics - add LEN2058 to SMBus passlist for ThinkPad E490

The Lenovo ThinkPad E490 (PNP ID: LEN2058) has a Synaptics TM3471-020
touchpad that supports SMBus/RMI4 mode but is not listed in
smbus_pnp_ids[]. Without this entry, RMI4 over SMBus is not enabled
by default, and the touchpad falls back to PS/2 mode.

Adding LEN2058 to the passlist enables automatic RMI4 detection without
requiring the psmouse.synaptics_intertouch parameter, and matches
the behavior of similar ThinkPad models already in the list
(E480/LEN2054, E580/LEN2055).

Tested on ThinkPad E490 with kernel 7.0.5-zen1 and Arch Linux.
RMI4 over SMBus is confirmed working without any kernel parameters.

Signed-off-by: Nicolás Bazaes <contacto@bazaes.cl>
Assisted-by: Claude:claude-sonnet-4-6
Link: https://patch.msgid.link/20260514013552.14234-1-contacto@bazaes.cl
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
3 weeks agoriscv: misaligned: Make enabling delegation depend on NONPORTABLE
Vivian Wang [Wed, 1 Apr 2026 01:53:17 +0000 (09:53 +0800)] 
riscv: misaligned: Make enabling delegation depend on NONPORTABLE

The unaligned access emulation code in Linux has various deficiencies.
For example, it doesn't emulate vector instructions [1] [2], and doesn't
emulate KVM guest accesses. Therefore, requesting misaligned exception
delegation with SBI FWFT actually regresses vector instructions' and KVM
guests' behavior.

Until Linux can handle it properly, guard these sbi_fwft_set() calls
behind RISCV_SBI_FWFT_DELEGATE_MISALIGNED, which in turn depends on
NONPORTABLE. Those who are sure that this wouldn't be a problem can
enable this option, perhaps getting better performance.

The rest of the existing code proceeds as before, except as if
SBI_FWFT_MISALIGNED_EXC_DELEG is not available, to handle any remaining
address misaligned exceptions on a best-effort basis. The KVM SBI FWFT
implementation is also not touched, but it is disabled if the firmware
emulates unaligned accesses.

Cc: stable@vger.kernel.org
Fixes: cf5a8abc6560 ("riscv: misaligned: request misaligned exception from SBI")
Reported-by: Songsong Zhang <U2FsdGVkX1@gmail.com> # KVM
Link: https://lore.kernel.org/linux-riscv/38ce44c1-08cf-4e3f-8ade-20da224f529c@iscas.ac.cn/
Link: https://lore.kernel.org/linux-riscv/b3cfcdac-0337-4db0-a611-258f2868855f@iscas.ac.cn/
Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260401-riscv-misaligned-dont-delegate-v2-1-5014a288c097@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
3 weeks agoriscv: Docs: fix unmatched quote warning
Randy Dunlap [Mon, 6 Apr 2026 23:23:04 +0000 (16:23 -0700)] 
riscv: Docs: fix unmatched quote warning

'make htmldocs' complains about ``prctrl` -- so add a second '`' to
avoid the warning.

Documentation/arch/riscv/zicfilp.rst:79: WARNING: Inline literal start-string without end-string. [docutils]

Fixes: 08ee1559052b ("prctl: cfi: change the branch landing pad prctl()s to be more descriptive")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20260406232304.1892528-1-rdunlap@infradead.org
Signed-off-by: Paul Walmsley <pjw@kernel.org>
3 weeks agoio_uring: validate user-controlled cq.head in io_cqe_cache_refill()
Zizhi Wo [Thu, 14 May 2026 02:18:47 +0000 (10:18 +0800)] 
io_uring: validate user-controlled cq.head in io_cqe_cache_refill()

A fuzzing run reproduced an unkillable io_uring task stuck at ~100% CPU:

[root@fedora io_uring_stress]# ps -ef | grep io_uring
root  1240  1  99 13:36 ?  00:01:35 [io_uring_stress] <defunct>

The task loops inside io_cqring_wait() and never returns to userspace,
and SIGKILL has no effect.

This is caused by the CQ ring exposing rings->cq.head to userspace as
writable, while the authoritative tail lives in kernel-private
ctx->cached_cq_tail. io_cqe_cache_refill() computes free space as an
unsigned subtraction:

    free = ctx->cq_entries - min(tail - head, ctx->cq_entries);

If userspace keeps head within [0, tail], the subtraction is well
defined and min() just acts as a defensive clamp. But if userspace
advances head past tail, (tail - head) wraps to a huge value, free
becomes 0, and io_cqe_cache_refill() fails. The CQE is pushed onto the
overflow list and IO_CHECK_CQ_OVERFLOW_BIT is set.

The wait loop in io_cqring_wait() relies on an invariant: refill() only
fails when the CQ is *physically* full, in which case rings->cq.tail has
been advanced to iowq->cq_tail and io_should_wake() returns true. The
tampered head breaks this: refill() fails while the ring is not full, no
OCQE is copied in, rings->cq.tail never catches up, io_should_wake()
stays false, and io_cqring_wait_schedule() keeps returning early because
IO_CHECK_CQ_OVERFLOW_BIT is still set. The result is a tight retry loop
that never returns to userspace.

Introduce io_cqring_queued() as the single point that converts the
(tail, head) pair into a trustworthy queued count. Since the real
head/tail distance is bounded by cq_entries (far below 2^31), a signed
comparison reliably detects userspace moving head past tail; in that
case treat the queue as empty so callers see the full cache as free and
forward progress is preserved.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Link: https://patch.msgid.link/20260514021847.4062782-1-wozizhi@huaweicloud.com
[axboe: fixup commit message, kill 'queued' var, and keep it all in
io_uring.c]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 weeks agoMerge tag 'amd-drm-fixes-7.1-2026-05-13' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Thu, 14 May 2026 02:04:08 +0000 (12:04 +1000)] 
Merge tag 'amd-drm-fixes-7.1-2026-05-13' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-7.1-2026-05-13:

amdgpu:
- Userq fixes
- DCN 3.2 fix
- RAS fix
- GC 12 fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260513224053.40670-1-alexander.deucher@amd.com
3 weeks agoMerge branch 'macsec-use-rcu_work-to-fix-crypto-cleanup-in-softirq-context'
Jakub Kicinski [Thu, 14 May 2026 02:03:07 +0000 (19:03 -0700)] 
Merge branch 'macsec-use-rcu_work-to-fix-crypto-cleanup-in-softirq-context'

Jinliang Zheng says:

====================
macsec: use rcu_work to fix crypto cleanup in softirq context

From: Jinliang Zheng <alexjlzheng@tencent.com>

crypto_free_aead() can internally call vunmap() (e.g. via dma_free_attrs()
in hardware crypto drivers like hisi_sec2), which must not be invoked from
softirq context. Both free_rxsa() and free_txsa() are RCU callbacks that
run in softirq, causing a kernel crash on affected hardware.

This series fixes the issue by deferring the actual cleanup to a workqueue
using rcu_work, which combines the RCU grace period and workqueue dispatch
into a single primitive.

Two design decisions worth noting:

1. rcu_work instead of schedule_work() + synchronize_rcu()

   An alternative would be to call schedule_work() directly from
   macsec_rxsa_put()/macsec_txsa_put(), then call synchronize_rcu() at
   the start of the work handler to replace the grace period previously
   provided by call_rcu(). However, synchronize_rcu() blocks the worker
   thread for the duration of a full RCU grace period. Under high SA
   churn (e.g. tearing down an interface with many SAs), each SA would
   occupy a worker thread while waiting, and multiple concurrent calls
   cannot share the same grace period — leading to unnecessary latency
   and resource waste.

   rcu_work uses call_rcu_hurry() internally, which is fully asynchronous:
   the worker thread is only dispatched after the grace period has elapsed,
   and multiple concurrent queue_rcu_work() calls naturally batch under the
   same grace period via the RCU subsystem's existing coalescing mechanism.

2. Dedicated workqueue instead of system_wq

   Using a dedicated workqueue (macsec_wq) allows macsec_exit() to drain
   exactly the work items belonging to this module — by calling
   destroy_workqueue() after rcu_barrier(). If system_wq were used,
   flush_scheduled_work() would drain all pending work items across the
   entire system, creating unnecessary coupling with unrelated subsystems
   and potentially causing unexpected delays. The dedicated workqueue
   provides a clean, contained teardown path.
====================

Link: https://patch.msgid.link/20260511153102.2640368-1-alexjlzheng@tencent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agomacsec: use rcu_work to defer TX SA crypto cleanup out of softirq
Jinliang Zheng [Mon, 11 May 2026 15:31:00 +0000 (23:31 +0800)] 
macsec: use rcu_work to defer TX SA crypto cleanup out of softirq

free_txsa() is an RCU callback running in softirq context, but calls
crypto_free_aead() which can invoke vunmap() internally on hardware
crypto drivers (e.g. hisi_sec2), triggering a kernel crash.

Use rcu_work to defer the cleanup to a workqueue, for the same reasons
as the analogous fix to free_rxsa() in the previous patch.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20260511153102.2640368-4-alexjlzheng@tencent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agomacsec: use rcu_work to defer RX SA crypto cleanup out of softirq
Jinliang Zheng [Mon, 11 May 2026 15:30:59 +0000 (23:30 +0800)] 
macsec: use rcu_work to defer RX SA crypto cleanup out of softirq

crypto_free_aead() can internally invoke vunmap() (e.g. via
dma_free_attrs() in hardware crypto drivers such as hisi_sec2).
vunmap() must not be called from softirq context, but free_rxsa()
is an RCU callback that runs in softirq, leading to a kernel crash:

  vunmap+0x4c/0x70
  __iommu_dma_free+0xd0/0x138
  dma_free_attrs+0xf4/0x100
  sec_aead_exit+0x64/0xb8 [hisi_sec2]
  crypto_destroy_tfm+0x98/0x110
  free_rxsa+0x28/0x50 [macsec]
  rcu_do_batch+0x184/0x460
  rcu_core+0xf4/0x1f8
  handle_softirqs+0x118/0x330

Use rcu_work to defer the cleanup to a workqueue. rcu_work dispatches
the worker asynchronously after the RCU grace period, so no thread
blocks waiting, and concurrent releases of multiple SAs naturally
share the same grace period.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20260511153102.2640368-3-alexjlzheng@tencent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agomacsec: introduce dedicated workqueue for SA crypto cleanup
Jinliang Zheng [Mon, 11 May 2026 15:30:58 +0000 (23:30 +0800)] 
macsec: introduce dedicated workqueue for SA crypto cleanup

Introduce a dedicated ordered workqueue, macsec_wq, which will be used
by subsequent patches to defer SA crypto cleanup (crypto_free_aead and
related teardown) out of softirq context.

Using a dedicated workqueue instead of system_wq allows macsec_exit()
to drain exactly the work items belonging to this module via
destroy_workqueue(), without interfering with unrelated work items on
system_wq or causing unexpected delays elsewhere.

rcu_barrier() in macsec_exit() ensures all in-flight rcu_work callbacks
have enqueued their work items before destroy_workqueue() drains and
destroys the queue, making the two-step teardown correct and complete.
The same sequence is kept in the error path of macsec_init() as a
precaution, to mirror macsec_exit() and stay safe if work ever becomes
queueable before this point in the future.

While at it, rename the error labels in macsec_init() from the
resource-named style (rtnl:, notifier:, wq:) to the err_xxx: style
(err_rtnl:, err_notifier:, err_destroy_wq:) to align with the broader
kernel convention.

Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20260511153102.2640368-2-alexjlzheng@tencent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: net_failover: Fix the deadlock in slave register
Faicker Mo [Mon, 11 May 2026 14:05:51 +0000 (22:05 +0800)] 
net: net_failover: Fix the deadlock in slave register

There is netdev_lock_ops() before the NETDEV_REGISTER notifier
in register_netdevice(), so use the non-locking functions
in net_failover_slave_register().
failover_slave_register() in failover_existing_slave_register() adds lock
and unlock ops too.

Call Trace:
 <TASK>
 __schedule+0x30d/0x7a0
 schedule+0x27/0x90
 schedule_preempt_disabled+0x15/0x30
 __mutex_lock.constprop.0+0x538/0x9e0
 __mutex_lock_slowpath+0x13/0x20
 mutex_lock+0x3b/0x50
 dev_set_mtu+0x40/0xe0
 net_failover_slave_register+0x24/0x280
 failover_slave_register+0x103/0x1b0
 failover_event+0x15e/0x210
 ? dropmon_net_event+0xac/0xe0
 notifier_call_chain+0x5e/0xe0
 raw_notifier_call_chain+0x16/0x30
 call_netdevice_notifiers_info+0x52/0xa0
 register_netdevice+0x5f4/0x7c0
 register_netdev+0x1e/0x40
 _mlx5e_probe+0xe2/0x370 [mlx5_core]
 mlx5e_probe+0x59/0x70 [mlx5_core]
 ? __pfx_mlx5e_probe+0x10/0x10 [mlx5_core]

Fixes: 4c975fd70002 ("net: hold instance lock during NETDEV_REGISTER/UP")
Signed-off-by: Faicker Mo <faicker.mo@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoMerge tag 'drm-intel-fixes-2026-05-13' of https://gitlab.freedesktop.org/drm/i915...
Dave Airlie [Thu, 14 May 2026 01:53:09 +0000 (11:53 +1000)] 
Merge tag 'drm-intel-fixes-2026-05-13' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Skip __i915_request_skip() for already signaled requests (Sebastian Brzezinka)
- Fix VSC dynamic range signaling for RGB formats [dp] (Chaitanya Kumar Borah)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tursulin@igalia.com>
Link: https://patch.msgid.link/agSVZmNC_qV4G6jQ@linux
3 weeks agoMAINTAINERS: update atlantic driver maintainer
Sukhdeep Singh [Tue, 12 May 2026 12:57:11 +0000 (18:27 +0530)] 
MAINTAINERS: update atlantic driver maintainer

Igor Russkikh and Egor Pomozov have left Marvell.
Take over maintenance of the atlantic driver and its PTP subsystem.

Signed-off-by: Sukhdeep Singh <sukhdeeps@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoriscv: cfi: reduce shadow stack size limit from 4GB to 2GB
Zong Li [Tue, 28 Apr 2026 02:41:05 +0000 (19:41 -0700)] 
riscv: cfi: reduce shadow stack size limit from 4GB to 2GB

Follow the ARM64 GCS (Guarded Control Stack) implementation approach
by reducing the shadow stack size allocation from min(RLIMIT_STACK, 4GB)
to min(RLIMIT_STACK/2, 2GB). See commit 506496bcbb42 ("arm64/gcs: Ensure
that new threads have a GCS")

Rationale:

1. Shadow stacks only store return addresses (8 bytes per entry), not
   local variables, function parameters, or saved registers. A 2GB
   shadow stack is far more than sufficient for any practical
   application, even with extremely deep recursion. Using half the size
   maintains adequate margin while being more resource-efficient.

2. On memory-constrained systems (e.g., platforms with only 4GB of
   physical memory, which is a common configuration), allocating 4GB
   of virtual address space for shadow stack per process/thread can
   lead to virtual memory allocation failures when the overcommit mode
   is set to OVERCOMMIT_GUESS or OVERCOMMIT_NEVER:
   Error: "__vm_enough_memory: not enough memory for the allocation"

This reduces virtual address space consumption by 50% while maintaining
more than adequate space for return address storage.

Signed-off-by: Zong Li <zong.li@sifive.com>
Link: https://patch.msgid.link/20260428024105.645162-1-zong.li@sifive.com
[pjw@kernel.org: clean up patch description]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
3 weeks agoselftests/tc-testing: Add QFQ/CBS qlen underflow test
Victor Nogueira [Mon, 11 May 2026 18:30:58 +0000 (14:30 -0400)] 
selftests/tc-testing: Add QFQ/CBS qlen underflow test

Since CBS was not calling reset for its child qdisc, there are scenarios
where it could cause an underflow on its parent's qlen/backlog. When the
parent is QFQ, a null-ptr deref could occur.

Add a test case that reproduces the underflow followed by a null-ptr
deref scenario.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet/sched: sch_cbs: Call qdisc_reset for child qdisc
Jamal Hadi Salim [Mon, 11 May 2026 18:30:57 +0000 (14:30 -0400)] 
net/sched: sch_cbs: Call qdisc_reset for child qdisc

During a reset, CBS is not calling reset on its child qdisc, which
might cause qlen/backlog accounting issues. For example, if we have CBS
with a QFQ parent and a netem child with delay, we can create a scenario
where the parent's qlen underflows. QFQ, specifically, uses qlen to
check whether it should deference a pointer, so this scenario may cause
a null-ptr deref in QFQ:

[   43.875639][  T319] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] SMP KASAN NOPTI
[   43.876124][  T319] KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f]
[   43.876417][  T319] CPU: 10 UID: 0 PID: 319 Comm: ping Not tainted 7.0.0-13039-ge728258debd5 #773 PREEMPT(full)
[   43.876751][  T319] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   43.876949][  T319] RIP: 0010:qfq_dequeue+0x35c/0x1650
[   43.877123][  T319] Code: 00 fc ff df 80 3c 02 00 0f 85 17 0e 00 00 4c 8d 73 48 48 89 9d b8 02 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 76 0c 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b
[   43.877648][  T319] RSP: 0018:ffff8881017ef4f0 EFLAGS: 00010216
[   43.877845][  T319] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: dffffc0000000000
[   43.878073][  T319] RDX: 0000000000000009 RSI: 0000000c40000000 RDI: ffff88810eef02b0
[   43.878306][  T319] RBP: ffff88810eef0000 R08: ffff88810eef0280 R09: 1ffff1102120fd63
[   43.878523][  T319] R10: 1ffff1102120fd66 R11: 1ffff1102120fd67 R12: 0000000c40000000
[   43.878742][  T319] R13: ffff88810eef02b8 R14: 0000000000000048 R15: 0000000020000000
[   43.878959][  T319] FS:  00007f9c51c47c40(0000) GS:ffff88817a0be000(0000) knlGS:0000000000000000
[   43.879214][  T319] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   43.879403][  T319] CR2: 000055e69a2230a8 CR3: 000000010c07a000 CR4: 0000000000750ef0
[   43.879621][  T319] PKRU: 55555554
[   43.879735][  T319] Call Trace:
[   43.879844][  T319]  <TASK>
[   43.879924][  T319]  __qdisc_run+0x169/0x1900
[   43.880075][  T319]  ? dev_qdisc_enqueue+0x8b/0x210
[   43.880222][  T319]  __dev_queue_xmit+0x2346/0x37a0
[   43.880376][  T319]  ? register_lock_class+0x3f/0x800
[   43.880531][  T319]  ? srso_alias_return_thunk+0x5/0xfbef5
[   43.880684][  T319]  ? __pfx___dev_queue_xmit+0x10/0x10
[   43.880834][  T319]  ? srso_alias_return_thunk+0x5/0xfbef5
[   43.880977][  T319]  ? __lock_acquire+0x819/0x1df0
[   43.881124][  T319]  ? srso_alias_return_thunk+0x5/0xfbef5
[   43.881275][  T319]  ? srso_alias_return_thunk+0x5/0xfbef5
[   43.881418][  T319]  ? __asan_memcpy+0x3c/0x60
[   43.881563][  T319]  ? srso_alias_return_thunk+0x5/0xfbef5
[   43.881708][  T319]  ? eth_header+0x165/0x1a0
[   43.881853][  T319]  ? lockdep_hardirqs_on_prepare+0xdb/0x1a0
[   43.882031][  T319]  ? srso_alias_return_thunk+0x5/0xfbef5
[   43.882174][  T319]  ? neigh_resolve_output+0x3cc/0x7e0
[   43.882325][  T319]  ? srso_alias_return_thunk+0x5/0xfbef5
[   43.882471][  T319]  ip_finish_output2+0x6b6/0x1e10

Fix this by calling qdisc_reset for CBS' child qdisc.
Sashiko caught an issue which could result in a null ptr deref if
qdisc_create_dflt() is invoked on an unitialised cbs qdisc which is exposed
by this patch. We add an early return if the qdisc is null to address this.
This is a similar approach used by two other fixes[1][2].

The proper fix for this specific issue elucidated by sashiko is to remove
the call to qdisc_reset when qdisc_create_dflt fails. Since the dflt qdisc
isn't attached anywhere yet at that point, calling the reset callback doesn't
make much sense (and as stated has been a source of two other bugs).
We plan on  submitting this fix in a later patch.
[1] https://lore.kernel.org/netdev/20221018063201.306474-2-shaozhengchao@huawei.com/
[2] https://lore.kernel.org/netdev/20221018063201.306474-4-shaozhengchao@huawei.com/

Fixes: 585d763af09c ("net/sched: Introduce Credit Based Shaper (CBS) qdisc")
Reported-by: Junyoung Jang <graypanda.inzag@gmail.com>
Tested-by: Junyoung Jang <graypanda.inzag@gmail.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoASoC: sdw_utils: Check speaker component string allocation
Cássio Gabriel [Tue, 12 May 2026 14:03:53 +0000 (11:03 -0300)] 
ASoC: sdw_utils: Check speaker component string allocation

devm_kasprintf() can fail while building the temporary speaker
component string. If that happens, spk_components is set to NULL, but
the current code can still pass it to strlen() on a later loop iteration
or after the loop when appending the speaker component list to
card->components.

Use NULL to represent the initial "no speaker components" state, and
return -ENOMEM immediately if building spk_components fails.

Fixes: 0f60ecffbfe3 ("ASoC: sdw_utils: generate combined spk components string")
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260512-asoc-sdw-utils-spk-components-alloc-v1-1-c9bbd6d2e123@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks agomm/memory: fix spurious warning when unmapping device-private/exclusive pages
Alistair Popple [Fri, 1 May 2026 06:51:16 +0000 (16:51 +1000)] 
mm/memory: fix spurious warning when unmapping device-private/exclusive pages

Device private and exclusive entries are only supported for anonymous
folios.  This condition is tested in __migrate_device_pages() and
make_device_exclusive() using folio_test_anon().  However the unmap path
tests this assumption using vma_is_anonymous().

This is wrong because whilst anonymous VMAs can only contain folios where
folio_test_anon() is true the opposite relation does not hold.  A folio
for which folio_test_anon() is true does not imply vma_is_anonymous() is
true.  Such a condition can occur if for example a folio is part of a
private filebacked mapping.

In this case vma_is_anonymous() is false as the mapping is filebacked, but
folio_test_anon() may be true, thus permitting devices to migrate the
folio to device private memory.  This can lead to the following spurious
warnings during process teardown:

[  772.737706] ------------[ cut here ]------------
[  772.739201] WARNING: mm/memory.c:1754 at unmap_page_range.cold+0x26/0x18a, CPU#17: hmm-tests/2041
[  772.742050] Modules linked in: test_hmm nvidia_uvm(O) nvidia(O)
[  772.743959] CPU: 17 UID: 0 PID: 2041 Comm: hmm-tests Tainted: G        W  O        7.0.0+ #387 PREEMPT(full)
[  772.747104] Tainted: [W]=WARN, [O]=OOT_MODULE
[  772.748509] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[  772.752117] RIP: 0010:unmap_page_range.cold+0x26/0x18a
[  772.753780] Code: 7e fe ff ff 48 89 4c 24 78 4c 89 44 24 38 e8 f2 ff b1 00 48 8b 4c 24 78 4c 8b 44 24 38 48 8b 44 24 18 48 83 78 48 00 74 04 90 <0f> 0b 90 48 89 ca b8 ff ff 37 00 48 c1 ea 03 48 c1 e0 2a 80 3c 02
[  772.759602] RSP: 0018:ffff888112607550 EFLAGS: 00010286
[  772.761310] RAX: ffff88811bbf4dc0 RBX: dffffc0000000000 RCX: ffffea03e9bfffd8
[  772.763583] RDX: 1ffff1102377e9c1 RSI: 0000000000000008 RDI: ffff88811bbf4e08
[  772.765914] RBP: 0000000000000006 R08: ffff8881059f7448 R09: ffffed10224c0e68
[  772.768184] R10: ffff888112607347 R11: 0000000000000001 R12: 0000000000000001
[  772.770461] R13: ffffea03e9bfffc0 R14: ffff888112607908 R15: ffffea03e9bfffc0
[  772.772782] FS:  00007f327caa2780(0000) GS:ffff888427b7d000(0000) knlGS:0000000000000000
[  772.775328] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  772.777187] CR2: 00007f327ca89000 CR3: 00000001994d5000 CR4: 00000000000006f0
[  772.779135] Call Trace:
[  772.779792]  <TASK>
[  772.780317]  ? dmirror_interval_invalidate+0x1a3/0x290 [test_hmm]
[  772.781873]  ? vm_normal_page_pud+0x2b0/0x2b0
[  772.782992]  ? __rwlock_init+0x150/0x150
[  772.784006]  ? lock_release+0x216/0x2b0
[  772.785008]  ? __mmu_notifier_invalidate_range_start+0x505/0x6e0
[  772.786522]  ? lock_release+0x216/0x2b0
[  772.787498]  ? unmap_single_vma+0xb6/0x210
[  772.788573]  unmap_vmas+0x27d/0x520
[  772.789506]  ? unmap_single_vma+0x210/0x210
[  772.790607]  ? mas_update_gap.part.0+0x620/0x620
[  772.791834]  unmap_region+0x19e/0x350
[  772.792769]  ? remove_vma+0x130/0x130
[  772.793684]  ? mas_alloc_nodes+0x1f2/0x300
[  772.794730]  vms_complete_munmap_vmas+0x8c1/0xe20
[  772.795926]  ? unmap_region+0x350/0x350
[  772.796917]  do_vmi_align_munmap+0x36a/0x4e0
[  772.798018]  ? lock_release+0x216/0x2b0
[  772.799024]  ? vma_shrink+0x620/0x620
[  772.799983]  do_vmi_munmap+0x150/0x2c0
[  772.800939]  __vm_munmap+0x161/0x2c0
[  772.801872]  ? expand_downwards+0xd60/0xd60
[  772.802948]  ? clockevents_program_event+0x1ef/0x540
[  772.804217]  ? lock_release+0x216/0x2b0
[  772.805158]  __x64_sys_munmap+0x59/0x80
[  772.805776]  do_syscall_64+0xfc/0x670
[  772.806336]  ? irqentry_exit+0xda/0x580
[  772.806976]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[  772.807772] RIP: 0033:0x7f327cbb2717
[  772.808323] Code: 73 01 c3 48 8b 0d f9 76 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c9 76 0d 00 f7 d8 64 89 01 48
[  772.811337] RSP: 002b:00007ffde7f57d38 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
[  772.812564] RAX: ffffffffffffffda RBX: 00007f327cc9c000 RCX: 00007f327cbb2717
[  772.813733] RDX: 0000000000000000 RSI: 0000000000400000 RDI: 00007f327c289000
[  772.814867] RBP: 0000000000421360 R08: 000000000000001a R09: 0000000000000000
[  772.815991] R10: 0000000000000003 R11: 0000000000000202 R12: 00007ffde7f57d74
[  772.817121] R13: 00007f327c689010 R14: 0000000000100000 R15: 00007f327c289000
[  772.818272]  </TASK>
[  772.818614] irq event stamp: 0
[  772.819159] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[  772.820174] hardirqs last disabled at (0): [<ffffffff82a57ab3>] copy_process+0x19f3/0x6440
[  772.821511] softirqs last  enabled at (0): [<ffffffff82a57b00>] copy_process+0x1a40/0x6440
[  772.822869] softirqs last disabled at (0): [<0000000000000000>] 0x0
[  772.823871] ---[ end trace 0000000000000000 ]---

Fix this by using the same check for folio_test_anon() in
zap_nonpresent_ptes(). Also add a hmm-test case for this.

Link: https://lore.kernel.org/20260501065116.2057242-1-apopple@nvidia.com
Fixes: 999dad824c39 ("mm/shmem: persist uffd-wp bit across zapping for file-backed")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reported-by: Arsen Arsenović <aarsenovic@baylibre.com>
Reviewed-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm: fix __vm_normal_page() to handle missing support for pmd_special()/pud_special()
David Hildenbrand (Arm) [Thu, 30 Apr 2026 11:31:22 +0000 (13:31 +0200)] 
mm: fix __vm_normal_page() to handle missing support for pmd_special()/pud_special()

On x86 32-bit with THP enabled, zap_huge_pmd() is seen to generate a
"WARNING: mm/memory.c:735 at __vm_normal_page+0x6a/0x7d", from the
VM_WARN_ON_ONCE(is_zero_pfn(pfn) || is_huge_zero_pfn(pfn)); followed by
"BUG: Bad rss-counter state"s, then later "BUG: Bad page state"s when
reclaim gets to call shrink_huge_zero_folio_scan().

It's as if the _PAGE_SPECIAL bit never got set in the huge_zero pmd: and
indeed, whereas pte_special() and pte_mkspecial() are subject to a
dedicated CONFIG_ARCH_HAS_PTE_SPECIAL, pmd_special() and pmd_mkspecial()
are subject to CONFIG_ARCH_SUPPORTS_PMD_PFNMAP, which is never enabled on
any 32-bit architecture.

While the problem was exposed through commit d80a9cb1a64a
("mm/huge_memory: add and use normal_or_softleaf_folio_pmd()"), it was an
oversight in commit af38538801c6 ("mm/memory: factor out common code from
vm_normal_page_*()") and would result in other problems:
* huge zero folio accounted in smaps, pagemap (PAGE_IS_FILE) and
  numamaps as file-backed THP
* folio_walk_start() returning the folio even without FW_ZEROPAGE set.
  Callers seem to tolerate that, though.

... and triggering the VM_WARN_ON_ONE(), although never reported so far.

To fix it, teach vm_normal_page_pmd()/vm_normal_page_pud() to consider
whether pmd_special/pud_special is actually implemented.

Link: https://lore.kernel.org/20260430-pmd_special-v1-1-dbcbcfd72c20@kernel.org
Fixes: af38538801c6 ("mm/memory: factor out common code from vm_normal_page_*()")
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
Reported-by: Hugh Dickins <hughd@google.com>
Closes: https://lore.kernel.org/r/74a75b59-2e13-3985-ee99-d5521f39df2a@google.com
Reported-by: Bibo Mao <maobibo@loongson.cn>
Closes: https://lore.kernel.org/r/20260430041121.2839350-1-maobibo@loongson.cn
Debugged-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Tested-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Lorenzo Stoakes <ljs@kernel.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agodrivers/base/memory: fix memory block reference leak in poison accounting
Muchun Song [Tue, 28 Apr 2026 08:52:18 +0000 (16:52 +0800)] 
drivers/base/memory: fix memory block reference leak in poison accounting

memblk_nr_poison_inc() and memblk_nr_poison_sub() look up a memory block
via find_memory_block_by_id(), which acquires a reference to the memory
block device.

Both helpers use the returned memory block without dropping that
reference, leaking the device reference on each successful lookup.  Drop
the reference after updating nr_hwpoison.

Link: https://lore.kernel.org/20260428085219.1316047-3-songmuchun@bytedance.com
Fixes: 5033091de814 ("mm/hwpoison: introduce per-memory_block hwpoison counter")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Huang, Ying" <huang.ying.caritas@gmail.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/memory_hotplug: fix memory block reference leak on remove
Muchun Song [Tue, 28 Apr 2026 08:52:17 +0000 (16:52 +0800)] 
mm/memory_hotplug: fix memory block reference leak on remove

Patch series "mm: Fix memory block leaks and locking", v2.

This series fixes two memory block device reference leaks and one locking
issue around the per-memory_block hwpoison counter.

This patch (of 2):

remove_memory_blocks_and_altmaps() looks up each memory block with
find_memory_block(), which acquires a reference to the memory block
device.

That reference is never dropped on this path, resulting in a leaked device
reference when removing memory blocks and their altmaps.  Drop the
reference after retrieving mem->altmap and clearing mem->altmap, before
removing the memory block device.

Link: https://lore.kernel.org/20260428085219.1316047-1-songmuchun@bytedance.com
Link: https://lore.kernel.org/20260428085219.1316047-2-songmuchun@bytedance.com
Fixes: 6b8f0798b85a ("mm/memory_hotplug: split memmap_on_memory requests across memblocks")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Huang, Ying" <huang.ying.caritas@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agolib: kunit_iov_iter: fix test fail on powerpc
Christian A. Ehrhardt [Tue, 21 Apr 2026 07:07:07 +0000 (09:07 +0200)] 
lib: kunit_iov_iter: fix test fail on powerpc

Increase buffer size to accommodate machines with 64K PAGE_SIZE.

Link: https://lore.kernel.org/20260421070707.992873-1-lk@c--e.de
Fixes: 0913b7554726 ("lib: kunit_iov_iter: add tests for extract_iter_to_sg")
Signed-off-by: Christian A. Ehrhardt <lk@c--e.de>
Reported-by: David Gow <davidgow@google.com>
Closes: https://lore.kernel.org/34a81ec2-af84-465d-9b5e-7bb5bf01680f@davidgow.net
Tested-by: David Gow <davidgow@google.com>
Tested-by: Josh Law <joshlaw48@gmail.com>
Reviewed-by: Josh Law <joshlaw48@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/page_alloc: fix initialization of tags of the huge zero folio with init_on_free
David Hildenbrand (Arm) [Tue, 21 Apr 2026 15:39:07 +0000 (17:39 +0200)] 
mm/page_alloc: fix initialization of tags of the huge zero folio with init_on_free

__GFP_ZEROTAGS semantics are currently a bit weird, but effectively this
flag is only ever set alongside __GFP_ZERO and __GFP_SKIP_KASAN.

If we run with init_on_free, we will zero out pages during
__free_pages_prepare(), to skip zeroing on the allocation path.

However, when allocating with __GFP_ZEROTAG set, post_alloc_hook() will
consequently not only skip clearing page content, but also skip clearing
tag memory.

Not clearing tags through __GFP_ZEROTAGS is irrelevant for most pages that
will get mapped to user space through set_pte_at() later: set_pte_at() and
friends will detect that the tags have not been initialized yet
(PG_mte_tagged not set), and initialize them.

However, for the huge zero folio, which will be mapped through a PMD
marked as special, this initialization will not be performed, ending up
exposing whatever tags were still set for the pages.

The docs (Documentation/arch/arm64/memory-tagging-extension.rst) state
that allocation tags are set to 0 when a page is first mapped to user
space.  That no longer holds with the huge zero folio when init_on_free is
enabled.

Fix it by decoupling __GFP_ZEROTAGS from __GFP_ZERO, passing to
tag_clear_highpages() whether we want to also clear page content.

Invert the meaning of the tag_clear_highpages() return value to have
clearer semantics.

Reproduced with the huge zero folio by modifying the check_buffer_fill
arm64/mte selftest to use a 2 MiB area, after making sure that pages have
a non-0 tag set when freeing (note that, during boot, we will not actually
initialize tags, but only set KASAN_TAG_KERNEL in the page flags).

$ ./check_buffer_fill
1..20
...
not ok 17 Check initial tags with private mapping, sync error mode and mmap memory
not ok 18 Check initial tags with private mapping, sync error mode and mmap/mprotect memory
...

This code needs more cleanups; we'll tackle that next, like
decoupling __GFP_ZEROTAGS from __GFP_SKIP_KASAN.

[akpm@linux-foundation.org: s/__GPF_ZERO/__GFP_ZERO/, per David]
Link: https://lore.kernel.org/20260421-zerotags-v2-1-05cb1035482e@kernel.org
Fixes: adfb6609c680 ("mm/huge_memory: initialise the tags of the huge zero folio")
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Lance Yang <lance.yang@linux.dev>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Liam Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoMAINTAINERS: add kexec@ list to LIVE UPDATE ENTRY
Mike Rapoport (Microsoft) [Tue, 28 Apr 2026 12:48:33 +0000 (15:48 +0300)] 
MAINTAINERS: add kexec@ list to LIVE UPDATE ENTRY

Link: https://lore.kernel.org/20260428124833.1903302-3-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Baoquan He <baoquan.he@linux.dev>
Cc: Dave Young <ruirui.yang@linux.dev>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pratyush Yadav <pratyush@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoMAINTAINERS: add tree for KDUMP and KEXEC
Mike Rapoport (Microsoft) [Tue, 28 Apr 2026 12:48:32 +0000 (15:48 +0300)] 
MAINTAINERS: add tree for KDUMP and KEXEC

Patch series "MAINTAINERS: update KEXEC, KDUMP and LIVE UPDATE".

KHO and LiveUpdate team is going to pick kdump and kexec patches to
their tree at

https://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git

Update MAINTAINERS to reflect this change and add kexec@ list to LIVE
UPDATE entry.

This patch (of 2):

KHO and LiveUpdate team is going to pick kdump and kexec patches to their
tree at

https://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git

Update MAINTAINERS to reflect it.

Link: https://lore.kernel.org/20260428124833.1903302-1-rppt@kernel.org
Link: https://lore.kernel.org/20260428124833.1903302-2-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Baoquan He <baoquan.he@linux.dev>
Acked-by: Pratyush Yadav <pratyush@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Dave Young <ruirui.yang@linux.dev>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoselftests/mm: run_vmtests.sh: fix destructive tests invocation
Luiz Capitulino [Mon, 27 Apr 2026 16:03:51 +0000 (12:03 -0400)] 
selftests/mm: run_vmtests.sh: fix destructive tests invocation

Destructive tests should be invoked with -d command-line option, but this
won't work today since 'd' is missing in getopts command-line.  This
commit fixes it.

Link: https://lore.kernel.org/214fd9e4-5398-4c26-859e-c982c2e277c3@redhat.com
Fixes: f16ff3b692ad ("selftests/mm: run_vmtests.sh: add missing tests")
Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoscripts/gdb: slab: update field names of struct kmem_cache
Illia Ostapyshyn [Mon, 27 Apr 2026 14:24:48 +0000 (16:24 +0200)] 
scripts/gdb: slab: update field names of struct kmem_cache

The commit 5ba6bc27b1f9 ("slab: decouple pointer to barn from
kmem_cache_node") reorganized the struct kmem_cache to factor out the
per-node fields to the new struct kmem_cache_per_node_ptrs.  This causes
the gdb scripts for lx-slabinfo and lx-slabtrace fail as they still
reference the old structure.

Adjust the gdb scripts to match the current state of struct kmem_cache.

Link: https://lore.kernel.org/20260427142448.666117-3-illia@yshyn.com
Fixes: 5ba6bc27b1f9 ("slab: decouple pointer to barn from kmem_cache_node")
Signed-off-by: Illia Ostapyshyn <illia@yshyn.com>
Acked-by: Harry Yoo (Oracle) <harry@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Florian Fainelli <florian.fainelli@broadcom.com>
Cc: Hao Li <hao.li@linux.dev>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Seongjun Hong <hsj0512@snu.ac.kr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoscripts/gdb: mm: cast untyped symbols in x86_page_ops
Illia Ostapyshyn [Mon, 27 Apr 2026 14:24:47 +0000 (16:24 +0200)] 
scripts/gdb: mm: cast untyped symbols in x86_page_ops

The symbols phys_base, _text, and _end, used in x86_page_ops are either
defined in assembly or implicitly by the linker.  Thus, they lack type
information and cause a conversion error after gdb.parse_and_eval.
Explicitly cast these expressions to unsigned long.

Link: https://lore.kernel.org/20260427142448.666117-2-illia@yshyn.com
Fixes: 55f8b4518d14 ("scripts/gdb: implement x86_page_ops in mm.py")
Signed-off-by: Illia Ostapyshyn <illia@yshyn.com>
Cc: Florian Fainelli <florian.fainelli@broadcom.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.com>
Cc: Hao Li <hao.li@linux.dev>
Cc: Harry Yoo <harry@kernel.org>
Cc: Seongjun Hong <hsj0512@snu.ac.kr>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/damon: fix damos_stat tracepoint format for sz_applied
SeongJae Park [Sun, 26 Apr 2026 19:31:17 +0000 (12:31 -0700)] 
mm/damon: fix damos_stat tracepoint format for sz_applied

The print format is wrongly marking sz_applied as sz_tried.  Fix it.

Link: https://lore.kernel.org/20260426193119.88095-1-sj@kernel.org
Fixes: 804c26b961da ("mm/damon/core: add trace point for damos stat per apply interval")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org> # 7.0.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/damon/sysfs-schemes: call missing mem_cgroup_iter_break()
SeongJae Park [Sun, 26 Apr 2026 17:36:12 +0000 (10:36 -0700)] 
mm/damon/sysfs-schemes: call missing mem_cgroup_iter_break()

damon_sysfs_memcg_path_to_id() breaks mem_cgroup_iter() loop without
calling mem_cgroup_iter_break().  This leaks the cgroup reference.  Fix
the issue by calling mem_cgroup_iter_break() before the break.

The issue was discovered [1] by Sashiko.

Link: https://lore.kernel.org/20260426173625.86521-1-sj@kernel.org
Link: https://lore.kernel.org/20260423004148.74722-1-sj@kernel.org
Fixes: 29cbb9a13f05 ("mm/damon/sysfs-schemes: implement scheme filters")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 6.3.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/migrate_device: fix spinlock leak in migrate_vma_insert_huge_pmd_page
Sunny Patel [Sat, 25 Apr 2026 13:35:27 +0000 (19:05 +0530)] 
mm/migrate_device: fix spinlock leak in migrate_vma_insert_huge_pmd_page

When check_stable_address_space() fails after the PMD spinlock has
been acquired via pmd_lock(), the code jumps directly to the abort
label, bypassing the spin_unlock() call in unlock_abort. This causes
the PMD spinlock to be permanently held, leading to a deadlock.

Change the goto target from abort to unlock_abort to ensure the
spinlock is always released on this error path.

Link: https://lore.kernel.org/20260425133537.17463-1-nueralspacetech@gmail.com
Fixes: a30b48bf1b24 ("mm/migrate_device: implement THP migration of zone device pages")
Signed-off-by: Sunny Patel <nueralspacetech@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Balbir Singh <balbirs@nvidia.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoFDDI: defza: Sanitise the reset safety timer
Maciej W. Rozycki [Sat, 9 May 2026 21:04:50 +0000 (22:04 +0100)] 
FDDI: defza: Sanitise the reset safety timer

The reset actions of the DEFZA adapters are exceedingly slow, taking up
to 30 seconds to complete by the device spec and typically in the range
of 10 seconds in reality, as required for the device RTOS to boot, still
quite a lot.  Therefore a state machine is used that's interrupt driven,
however a safety mechanism is required in case of adapter malfunction,
so that if no state change interrupt has arrived in time, then the
situation is taken care of.

The safety mechanism depends on the origin of the reset.  For regular
adapter initialisation at the device probe time a sleep is requested.
However a reset is also required by the device spec when the adapter has
transitioned into the halted state, such as in response to a PC Trace
event in the course of ring fault recovery, possibly a common network
event.  In that case no sleep is possible as a device halt is reported
at the hardirq level.

A timer is therefore set up to ensure progress in case no adapter state
change interrupt has arrived in time, but as from commit 168f6b6ffbee
("timers: Use del_timer_sync() even on UP") a warning is issued as the
timer is deleted in the hardirq handler upon an expected state change:

  defza: v.1.1.4  Oct  6 2018  Maciej W. Rozycki
  tc2: DEC FDDIcontroller 700 or 700-C at 0x18000000, irq 4
  tc2: resetting the board...
  ------------[ cut here ]------------
  WARNING: kernel/time/timer.c:1611 at __timer_delete_sync+0x104/0x120, CPU#0: swapper/0/0
  Modules linked in:
  CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-dirty #2 VOLUNTARY
  Stack : 9800000002027d08 00000000140120e0 0000000000000000 ffffffff8089d468
          0000000000000000 0000000000000000 ffffffff807ed6b8 ffffffff80897458
          ffffffff80897400 9800000002027b88 0000000000000000 7070617773203a6d
          0000000000000000 9800000002027ba4 0000000000001000 6465746e69617420
          0000000000000000 ffffffff807ed6b8 00000000140120e0 0000000000000009
          000000000000064b ffffffff800dd14c 0000000000000036 9800000002184000
          0000000000000000 0000000000000020 0000000000000000 ffffffff80910000
          ffffffff8085c000 9800000002027c70 0000000000000001 ffffffff80045fa0
          0000000000000000 0000000000000000 0000000000000000 0000000000000009
          000000000000064b ffffffff800502b8 ffffffff807ed6b8 ffffffff80045fa0
          ...
  Call Trace:
  [<ffffffff800502b8>] show_stack+0x28/0xf0
  [<ffffffff80045fa0>] dump_stack_lvl+0x48/0x7c
  [<ffffffff80068c98>] __warn+0xa0/0x128
  [<ffffffff8004120c>] warn_slowpath_fmt+0x64/0xa4
  [<ffffffff800dd14c>] __timer_delete_sync+0x104/0x120
  [<ffffffff804934ac>] fza_interrupt+0xc74/0xeb8
  [<ffffffff800c6390>] __handle_irq_event_percpu+0x70/0x228
  [<ffffffff800c6560>] handle_irq_event_percpu+0x18/0x78
  [<ffffffff800cc320>] handle_percpu_irq+0x50/0x80
  [<ffffffff800c5970>] generic_handle_irq+0x90/0xd0
  [<ffffffff806e956c>] do_IRQ+0x1c/0x30
  [<ffffffff8004ad4c>] handle_int+0x148/0x154
  [<ffffffff800ab7c0>] do_idle+0x40/0x108
  [<ffffffff800abb0c>] cpu_startup_entry+0x2c/0x38
  [<ffffffff806dfec8>] kernel_init+0x0/0x108

  ---[ end trace 0000000000000000 ]---
  tc2: OK
  tc2: model 700 (DEFZA-AA), MMF PMD, address 08-00-2b-xx-xx-xx
  tc2: ROM rev. 1.0, firmware rev. 1.2, RMC rev. A, SMT ver. 1
  tc2: link unavailable
  ------------[ cut here ]------------
  WARNING: kernel/time/timer.c:1611 at __timer_delete_sync+0x104/0x120, CPU#0: swapper/0/0
  Modules linked in:
  CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G        W           7.0.0-dirty #2 VOLUNTARY
  Tainted: [W]=WARN
  Stack : 9800000002027d08 00000000140120e0 0000000000000000 ffffffff8089d468
          0000000000000000 0000000000000000 ffffffff807ed6b8 ffffffff80897458
          ffffffff80897400 9800000002027b88 0000000000000000 0000000000000000
          0000000000000000 9800000002027ba4 0000000000001000 0000000000000000
          0000000000000000 ffffffff807ed6b8 00000000140120e0 0000000000000009
          000000000000064b ffffffff800dd14c 0000000000000036 9800000002184000
          0000000000000000 0000000000000020 0000000000000000 ffffffff80910000
          ffffffff8085c000 9800000002027c70 0000000000000001 ffffffff80045fa0
          0000000000000000 0000000000000000 0000000000000000 0000000000000009
          000000000000064b ffffffff800502b8 ffffffff807ed6b8 ffffffff80045fa0
          ...
  Call Trace:
  [<ffffffff800502b8>] show_stack+0x28/0xf0
  [<ffffffff80045fa0>] dump_stack_lvl+0x48/0x7c
  [<ffffffff80068c98>] __warn+0xa0/0x128
  [<ffffffff8004120c>] warn_slowpath_fmt+0x64/0xa4
  [<ffffffff800dd14c>] __timer_delete_sync+0x104/0x120
  [<ffffffff804934ac>] fza_interrupt+0xc74/0xeb8
  [<ffffffff800c6390>] __handle_irq_event_percpu+0x70/0x228
  [<ffffffff800c6560>] handle_irq_event_percpu+0x18/0x78
  [<ffffffff800cc320>] handle_percpu_irq+0x50/0x80
  [<ffffffff800c5970>] generic_handle_irq+0x90/0xd0
  [<ffffffff806e956c>] do_IRQ+0x1c/0x30
  [<ffffffff8004ad4c>] handle_int+0x148/0x154
  [<ffffffff806de8a4>] arch_local_irq_disable+0x4/0x28
  [<ffffffff800ab7d0>] do_idle+0x50/0x108
  [<ffffffff800abb0c>] cpu_startup_entry+0x2c/0x38
  [<ffffffff806dfec8>] kernel_init+0x0/0x108

  ---[ end trace 0000000000000000 ]---
  tc2: registered as fddi0

The immediate origin of the new warning is the switch away from aliasing
del_timer_sync() to del_timer() (timer_delete_sync() to timer_delete()
in terms of current function names) for UP configurations, which however
is the only choice for this driver anyway as no SMP hardware supports
the TURBOchannel bus this device interfaces to.  Therefore there is a
very remote issue only this is a sign of.

Specifically if an adapter reset issued upon a transition to the halted
state times out and first triggers fza_reset_timer() for another reset
assertion, which then schedules fza_reset_timer() for reset deassertion
and then that second call is pre-empted after poking at the hardware,
but before the timer has been rearmed and owing to high system load
causing exceedingly high scheduling latency control is not handed back
before a transition to the uninitialised state has caused the timer to
be deleted even before it has been started, then fza_reset_timer() will
be called yet again and issue another reset even though by then the
adapter has already recovered.

Prevent this situation from happening by switching to timer_delete() for
the transition to the halted state and protect the code region affected
with a spinlock, also to make sure add_timer() has not been called twice
in a row due to an execution race between the interrupt handler and the
timer handler (though it could only happen on SMP, but let's keep the
driver clean).  It's a very unlikely sequence of events to happen and
therefore there's no point in trying to be overly clever about it, such
as by placing printk() calls outside the protection.  For the transition
to the uninitialised state switch to timer_delete_sync_try() instead, so
that a timer isn't deleted that's just been rearmed by the timer handler
and needs to watch for the device to come out of reset again (again, an
SMP scenario only).

Retain timer_delete_sync() invocations outside the hardirq context for a
stray timer not to fire once device structures have been released.

Fixes: 61414f5ec9834 ("FDDI: defza: Add support for DEC FDDIcontroller 700 TURBOchannel adapter")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agodrm/hyperv: During panic do VMBus unload after frame buffer is flushed
Michael Kelley [Tue, 17 Feb 2026 18:23:35 +0000 (10:23 -0800)] 
drm/hyperv: During panic do VMBus unload after frame buffer is flushed

In a VM, Linux panic information (reason for the panic, stack trace,
etc.) may be written to a serial console and/or a virtual frame buffer
for a graphics console. The latter may need to be flushed back to the
host hypervisor for display.

The current Hyper-V DRM driver for the frame buffer does the flushing
*after* the VMBus connection has been unloaded, such that panic messages
are not displayed on the graphics console. A user with a Hyper-V graphics
console is left with just a hung empty screen after a panic. The enhanced
control that DRM provides over the panic display in the graphics console
is similarly non-functional.

Commit 3671f3777758 ("drm/hyperv: Add support for drm_panic") added
the Hyper-V DRM driver support to flush the virtual frame buffer. It
provided necessary functionality but did not handle the sequencing
problem with VMBus unload.

Fix the full problem by using VMBus functions to suppress the VMBus
unload that is normally done by the VMBus driver in the panic path. Then
after the frame buffer has been flushed, do the VMBus unload so that a
kdump kernel can start cleanly. As expected, CONFIG_DRM_PANIC must be
selected for these changes to have effect. As a side benefit, the
enhanced features of the DRM panic path are also functional.

Fixes: 3671f3777758 ("drm/hyperv: Add support for drm_panic")
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
3 weeks agoDrivers: hv: vmbus: Provide option to skip VMBus unload on panic
Michael Kelley [Tue, 17 Feb 2026 18:23:34 +0000 (10:23 -0800)] 
Drivers: hv: vmbus: Provide option to skip VMBus unload on panic

Currently, VMBus code initiates a VMBus unload in the panic path so
that if a kdump kernel is loaded, it can start fresh in setting up its
own VMBus connection. However, a driver for the VMBus virtual frame
buffer may need to flush dirty portions of the frame buffer back to
the Hyper-V host so that panic information is visible in the graphics
console. To support such flushing, provide exported functions for the
frame buffer driver to specify that the VMBus unload should not be
done by the VMBus driver, and to initiate the VMBus unload itself.
Together these allow a frame buffer driver to delay the VMBus unload
until after it has completed the flush.

Ideally, the VMBus driver could use its own panic-path callback to do
the unload after all frame buffer drivers have finished. But DRM frame
buffer drivers use the kmsg dump callback, and there are no callbacks
after that in the panic path. Hence this somewhat messy approach to
properly sequencing the frame buffer flush and the VMBus unload.

Fixes: 3671f3777758 ("drm/hyperv: Add support for drm_panic")
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
3 weeks agoi2c: tegra: make tegra_i2c_mutex_unlock() return void
Saurav Sachidanand [Thu, 7 May 2026 22:11:45 +0000 (22:11 +0000)] 
i2c: tegra: make tegra_i2c_mutex_unlock() return void

tegra_i2c_mutex_unlock() returning an error that overwrites the transfer
result causes silent loss of I2C transfer errors. If the transfer failed
but the unlock succeeded, the error was lost and the function incorrectly
reported success.

Rather than propagating the unlock error (which is not actionable by the
caller - the I2C message may have been sent regardless), convert the
function to return void and WARN on the unexpected condition. If the
unlock fails, subsequent lock attempts will fail anyway, making the error
visible on the next transfer.

Fixes: 6077cfd716fb ("i2c: tegra: Add support for SW mutex register")
Signed-off-by: Saurav Sachidanand <sauravsc@amazon.com>
Cc: <stable@vger.kernel.org> # v7.0+
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260507221145.62183-3-sauravsc@amazon.com
3 weeks agoi2c: tegra: fix pm_runtime leak on mutex_lock failure
Saurav Sachidanand [Thu, 7 May 2026 22:11:44 +0000 (22:11 +0000)] 
i2c: tegra: fix pm_runtime leak on mutex_lock failure

If tegra_i2c_mutex_lock() fails, the function returns without calling
pm_runtime_put(), leaking the runtime PM reference acquired by the
preceding pm_runtime_get_sync(). This prevents the device from ever
entering runtime suspend.

Add the missing pm_runtime_put() before returning on lock failure.

Fixes: 6077cfd716fb ("i2c: tegra: Add support for SW mutex register")
Signed-off-by: Saurav Sachidanand <sauravsc@amazon.com>
Cc: <stable@vger.kernel.org> # v7.0+
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260507221145.62183-2-sauravsc@amazon.com
3 weeks agodriver core: platform: remove software node on release()
Bartosz Golaszewski [Wed, 13 May 2026 15:04:48 +0000 (17:04 +0200)] 
driver core: platform: remove software node on release()

If we pass a software node to a newly created device using struct
platform_device_info, it will not be removed when the device is
released. This may happen when a module creating the device is removed
or on failure in platform_device_add().

When we try to reuse that software node in a subsequent call to
platform_device_register_full(), it will fail with -EBUSY.

Provide a wrapper around the existing platform_device_release() that
additionally calls device_remove_software_node() and use it to replace
the former if we end up adding a software node.

While at it: check all three possible situations in which two software
nodes for a single platform device can be created/assigned in
platform_device_register_full() and bail-out early.

Fixes: 0fc434bc2c45 ("driver core: platform: allow attaching software nodes when creating devices")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260513-swnode-remove-on-dev-unreg-v6-1-f9c58939df27@oss.qualcomm.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
3 weeks agoMerge tag 'sched_ext-for-7.1-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 13 May 2026 22:00:40 +0000 (15:00 -0700)] 
Merge tag 'sched_ext-for-7.1-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:
 "The bulk of this is hardening of the new sub-scheduler infrastructure.

   - UAFs and lifecycle bugs on the sub-sched attach/detach paths:
     parent sub_kset freed under a racing child, list_del_rcu on an
     uninitialized list head, ops->priv stomped by concurrent
     attach/detach, and a UAF in the init-failure error path

   - Task state-machine reorg closing concurrent enable-vs-dead races: a
     task exiting during the unlocked init window could trip NULL ops
     derefs or skip exit_task() cleanup

   - A scx_link_sched() self-deadlock on scx_sched_lock

   - isolcpus: stop dereferencing the now-RCU-protected HK_TYPE_DOMAIN
     cpumask without RCU, and stop rejecting BPF schedulers when only
     cpuset isolated partitions are active

   - PREEMPT_RT: disable irq_work runs in hardirq context so dumps show
     the failing task rather than the irq_work kthread

   - Assorted !CONFIG_EXT_SUB_SCHED, randconfig, and selftest build
     fixes"

* tag 'sched_ext-for-7.1-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Use HK_TYPE_DOMAIN_BOOT to detect isolcpus= domain isolation
  sched_ext: Defer sub_kset base put to scx_sched_free_rcu_work
  sched_ext: INIT_LIST_HEAD() &sch->all in scx_alloc_and_add_sched()
  sched_ext: Drop NONE early return in scx_disable_and_exit_task()
  sched_ext: Avoid UAF in scx_root_enable_workfn() init failure path
  sched_ext: Clear ops->priv on scx_alloc_and_add_sched() error paths
  sched_ext: Fix ops->priv clobber on concurrent attach/detach
  selftests/sched_ext: Fix build error in dequeue selftest
  sched_ext: Handle SCX_TASK_NONE in disable/switched_from paths
  sched_ext: Close sub-sched init race with post-init DEAD recheck
  sched_ext: Close root-enable vs sched_ext_dead() race with SCX_TASK_INIT_BEGIN
  sched_ext: Replace SCX_TASK_OFF_TASKS flag with SCX_TASK_DEAD state
  sched_ext: Inline scx_init_task() and move RESET_RUNNABLE_AT into scx_set_task_state()
  sched_ext: Cleanups in preparation for the SCX_TASK_INIT_BEGIN/DEAD work
  sched_ext: Use IRQ_WORK_INIT_HARD() to initialize sch->disable_irq_work
  sched_ext: Fix !CONFIG_EXT_SUB_SCHED build warnings
  sched_ext: Drop unused scx_find_sub_sched() stub
  sched_ext: Move scx_error() out of scx_link_sched()'s lock region

3 weeks agoMerge tag 'cgroup-for-7.1-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 13 May 2026 21:56:31 +0000 (14:56 -0700)] 
Merge tag 'cgroup-for-7.1-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

 - cpuset fixes:
     - Partition invalidation could return CPUs still in use by sibling
       partitions, producing overlapping effective_cpus
     - cpuset_can_attach() over-reserved DL bandwidth on moves that
       stayed within the same root domain
     - Pending DL migration state leaked into later attaches when a
       later can_attach() check failed
     - Reorder PF_EXITING and __GFP_HARDWALL checks so dying tasks can
       allocate from any node and exit quickly

 - dmem: propagate -ENOMEM instead of spinning forever when the fallback
   pool allocation also fails

 - selftests/cgroup: percpu test error-path leak, bogus numeric
   comparison of cpuset strings, and a zero-length read() that silently
   passed OOM-kill tests

* tag 'cgroup-for-7.1-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/cpuset: Return only actually allocated CPUs during partition invalidation
  selftests/cgroup: Fix error path leaks in test_percpu_basic
  cgroup/cpuset: Reserve DL bandwidth only for root-domain moves
  cgroup/cpuset: Reset DL migration state on can_attach() failure
  selftests/cgroup: Fix string comparison in write_test
  selftests/cgroup: Fix cg_read_strcmp() empty string comparison
  cgroup/dmem: Return -ENOMEM on failed pool preallocation
  cgroup/cpuset: move PF_EXITING check before __GFP_HARDWALL in cpuset_current_node_allowed()

3 weeks agoMerge tag 'wq-for-7.1-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 13 May 2026 21:49:13 +0000 (14:49 -0700)] 
Merge tag 'wq-for-7.1-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue fixes from Tejun Heo:

 - Plug a wq->cpu_pwq leak on the WQ_UNBOUND allocation failure path

 - Fix a cancel_delayed_work_sync() livelock against drain_workqueue()
   caused by the drain/destroy reject path leaving WORK_STRUCT_PENDING
   set with no owner

* tag 'wq-for-7.1-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Fix wq->cpu_pwq leak in alloc_and_link_pwqs() WQ_UNBOUND path
  workqueue: Release PENDING in __queue_work() drain/destroy reject path

3 weeks agodrm/msm/a6xx: Check kzalloc return in a8xx_hfi_send_perf_table
Chen Ni [Tue, 28 Apr 2026 07:35:58 +0000 (15:35 +0800)] 
drm/msm/a6xx: Check kzalloc return in a8xx_hfi_send_perf_table

Check the return value of kzalloc() to prevent a NULL pointer
dereference on allocation failure.

Fixes: 06cfbca0e1c6 ("drm/msm/a6xx: Share dependency vote table with GMU")
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/721342/
Message-ID: <20260428073558.1234238-1-nichen@iscas.ac.cn>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
3 weeks agodrm/msm: Fix iommu_map_sgtable() return value check and avoid WARN
Mikko Perttunen [Tue, 21 Apr 2026 04:02:38 +0000 (13:02 +0900)] 
drm/msm: Fix iommu_map_sgtable() return value check and avoid WARN

Commit "iommu: return full error code from iommu_map_sg[_atomic]()"
changed iommu_map_sgtable() to return an ssize_t and negative values
in error cases, rather than a size_t and a zero.

Store the return value in the appropriate type and in case of error,
return it rather than WARNing.

Fixes: ad8f36e4b6b1 ("iommu: return full error code from iommu_map_sg[_atomic]()")
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Patchwork: https://patchwork.freedesktop.org/patch/719685/
Message-ID: <20260421-iommu_map_sgtable-return-v1-3-fb484c07d2a1@nvidia.com>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
3 weeks agodrm/msm: Correct modparam description
Rob Clark [Sat, 18 Apr 2026 15:08:47 +0000 (08:08 -0700)] 
drm/msm: Correct modparam description

Preemption is enabled for gen8 as well.

Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/719256/
Message-ID: <20260418150847.157246-1-robin.clark@oss.qualcomm.com>

3 weeks agodrm/msm/a6xx: Restore sysprof_active
Rob Clark [Sat, 11 Apr 2026 15:03:12 +0000 (08:03 -0700)] 
drm/msm/a6xx: Restore sysprof_active

This got lost in the shuffle somehow when moving the vfunc table to
catalogue.  Fixes inhibiting IFPC when userspace is collecting perfcntr
data.

Fixes: 491fadb2b818 ("drm/msm/adreno: Move adreno_gpu_func to catalogue")
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Reviewed-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/717780/
Message-ID: <20260411150312.257937-1-robin.clark@oss.qualcomm.com>

3 weeks agodrm/msm/adreno: fix userspace-triggered crash on a2xx-a4xx
Dmitry Baryshkov [Sat, 11 Apr 2026 14:59:15 +0000 (17:59 +0300)] 
drm/msm/adreno: fix userspace-triggered crash on a2xx-a4xx

Before a5xx Adreno driver will not try fetching UBWC params (because
those generations didn't support UBWC anyway), however it's still
possible to query UBWC-related params from the userspace, triggering
possible NULL pointer dereference. Check for UBWC config in
adreno_get_param() and return sane defaults if there is none.

Fixes: a452510aad53 ("drm/msm/adreno: Switch to the common UBWC config struct")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Rob Clark <rob.clark@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/717778/
Message-ID: <20260411-adreno-fix-ubwc-v3-1-4983156f3f80@oss.qualcomm.com>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
3 weeks agoksmbd: fix null pointer dereference in compare_guid_key()
Jeremy Laratro [Tue, 12 May 2026 23:26:16 +0000 (08:26 +0900)] 
ksmbd: fix null pointer dereference in compare_guid_key()

session_fd_check() walks the per-inode m_op_list during durable-handle
session teardown and sets op->conn = NULL for every opinfo whose conn
matched the closing session's connection. The matching opinfo, however,
stays linked in its per-ClientGuid lease_table_list entry's lb->lease_list
because destroy_lease_table() only runs on full TCP-connection teardown,
not on SESSION_LOGOFF.

If the same TCP connection then negotiates a fresh session with the
same ClientGuid (ClientGuid is bound to NEGOTIATE, not the session, and
is unchanged across LOGOFF + SETUP) and issues a SMB2 CREATE with a
lease context on a different inode, find_same_lease_key() walks
lb->lease_list, reaches the stale opinfo, and calls compare_guid_key(),
which unconditionally dereferences opinfo->conn->ClientGUID. The conn
pointer is NULL and the kernel panics.

Reproducer requires only a successful SMB2 SESSION_SETUP and a share
configured with 'durable handles = yes'. KASAN report on mainline
70390501d194:

  general protection fault, probably for non-canonical address
  0xdffffc0000000069: 0000 [#1] SMP KASAN PTI
  KASAN: null-ptr-deref in range [0x0000000000000348-0x000000000000034f]
  Workqueue: ksmbd-io handle_ksmbd_work
  RIP: 0010:bcmp+0x5b/0x230
  Call Trace:
   compare_guid_key+0x4b/0xd0
   find_same_lease_key+0x324/0x690
   smb2_open+0x6aea/0x8e60
   handle_ksmbd_work+0x796/0xee0
   ...

Faulting address 0x348 is the offset of ClientGUID within struct
ksmbd_conn, confirming opinfo->conn was NULL.

Read opinfo->conn once and bail out if it has been cleared by a
concurrent session_fd_check(). A half-detached opinfo cannot be the
owner of an active lease, so returning 0 is the correct match result.

Fixes: c8efcc786146 ("ksmbd: add support for durable handles v1/v2")
Cc: stable@vger.kernel.org
Signed-off-by: Jeremy Laratro <research@aradex.io>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 weeks agoksmbd: fix null pointer dereference in proc_show_files()
Jeremy Laratro [Tue, 12 May 2026 23:23:26 +0000 (08:23 +0900)] 
ksmbd: fix null pointer dereference in proc_show_files()

When a SMB2 client opens a file with a durable v2 handle and then issues
SMB2 SESSION_LOGOFF, session_fd_check() clears fp->tcon = NULL on the
reconnectable file pointer but leaves the fp registered in global_ft.idr
until the durable scavenger fires (up to fp->durable_timeout seconds
later).

During that window any read of /proc/fs/ksmbd/files (mode 0400) panics
the kernel because proc_show_files() walks global_ft.idr and
unconditionally dereferences fp->tcon->id with no NULL guard.

Reproducer requires only a successful SMB2 SESSION_SETUP and a share
configured with 'durable handles = yes'. KASAN report on mainline
70390501d194:

  general protection fault, probably for non-canonical address
  0xdffffc0000000000: 0000 [#1] SMP KASAN PTI
  KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
  RIP: 0010:proc_show_files+0x118/0x740
  Call Trace:
   proc_show_files+0x118/0x740
   seq_read_iter+0x4ef/0xe10
   proc_reg_read_iter+0x1b7/0x280
   ...

Guard the dereference. A durable-disconnected fp legitimately has no
tcon; report its tree id as 0 rather than oopsing.

Fixes: b38f99c1217a ("ksmbd: add procfs interface for runtime monitoring and statistics")
Cc: stable@vger.kernel.org
Signed-off-by: Jeremy Laratro <research@aradex.io>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 weeks agoksmbd: fix SID memory leak in set_posix_acl_entries_dacl() on overflow
Ferry Meng [Mon, 11 May 2026 13:18:16 +0000 (21:18 +0800)] 
ksmbd: fix SID memory leak in set_posix_acl_entries_dacl() on overflow

Commit 299f962c0b02 ("ksmbd: use check_add_overflow() to prevent u16
DACL size overflow") added check_add_overflow() guards that break out
of the ACE-building loops in set_posix_acl_entries_dacl() when the
accumulated DACL size would wrap past 65535.

However, each iteration allocates a struct smb_sid via kmalloc_obj()
at the top of the loop and relies on the kfree(sid) call at the end
of the loop body (the 'pass_same_sid' label in the first loop, and
the explicit kfree at the tail of the second loop) to release it.
The newly introduced 'break' statements bypass those kfree() calls,
leaking the sid buffer every time an overflow is detected.

A malicious or malformed file with enough POSIX ACL entries to trip
the overflow check will leak one or more struct smb_sid allocations
on every request that touches the file's DACL, providing a trivial
kernel memory exhaustion vector.

Free sid before breaking out of the loops to plug the leak.

Fixes: 299f962c0b02 ("ksmbd: use check_add_overflow() to prevent u16 DACL size overflow")
Cc: stable@vger.kernel.org
Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 weeks agodrm/msm/adreno: Fix a reference leak in a6xx_gpu_init()
Felix Gu [Fri, 23 Jan 2026 16:37:38 +0000 (00:37 +0800)] 
drm/msm/adreno: Fix a reference leak in a6xx_gpu_init()

In a6xx_gpu_init(), node is obtained via of_parse_phandle().
While there was a manual of_node_put() at the end of the
common path, several early error returns would bypass this call,
resulting in a reference leak.
Fix this by using the __free(device_node) cleanup handler to
release the reference when the variable goes out of scope.

Fixes: 5a903a44a984 ("drm/msm/a6xx: Introduce GMU wrapper support")
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/700661/
Message-ID: <20260124-a6xx_gpu-v1-1-fa0c8b2dcfb1@gmail.com>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
3 weeks agodrm/msm: Fix GMEM_BASE for A650
Alexander Koskovich [Sat, 14 Mar 2026 04:14:50 +0000 (04:14 +0000)] 
drm/msm: Fix GMEM_BASE for A650

Commit dc220915ddb2 ("drm/msm: Fix GMEM_BASE for gen8") changed the
GMEM_BASE check from adreno_is_a650_family() & adreno_is_a740_family()
to family >= ADRENO_6XX_GEN4.

This inadvertently excluded A650 (ADRENO_6XX_GEN3), causing it to report
an incorrect GMEM_BASE which results in severe rendering corruption.

Update check to also include ADRENO_6XX_GEN3 to fix A650.

Fixes: dc220915ddb2 ("drm/msm: Fix GMEM_BASE for gen8")
Signed-off-by: Alexander Koskovich <akoskovich@pm.me>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/711880/
Message-ID: <20260314-fix-gmem-base-a650-v1-1-3308f60cf74c@pm.me>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
3 weeks agovfio/pci: Make VFIO_PCI_OFFSET_TO_INDEX() return unsigned
Matt Evans [Mon, 11 May 2026 14:46:42 +0000 (07:46 -0700)] 
vfio/pci: Make VFIO_PCI_OFFSET_TO_INDEX() return unsigned

VFIO_PCI_OFFSET_TO_INDEX() is used in several places with a signed
parameter (e.g. loff_t).  Because it makes no sense for a BAR/resource
index to be negative, enforce this in the macro.

This fixes at least one current issue, where vfio_pci_ioeventfd() uses
this macro with an unvalidated signed loff_t returned into a signed
type, leading to a possible negative array access.  This instance does
test against an out-of-bounds positive value, so treating the index as
unsigned fixes this issue.

Fixes: 89e1f7d4c66d8 ("vfio: Add PCI device driver")
Signed-off-by: Matt Evans <mattev@meta.com>
Link: https://lore.kernel.org/r/20260511144642.2926799-1-mattev@meta.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
3 weeks agosched_ext: Use HK_TYPE_DOMAIN_BOOT to detect isolcpus= domain isolation
Andrea Righi [Wed, 13 May 2026 11:24:38 +0000 (13:24 +0200)] 
sched_ext: Use HK_TYPE_DOMAIN_BOOT to detect isolcpus= domain isolation

scx_enable() refuses to attach a BPF scheduler when isolcpus=domain is
in effect by comparing housekeeping_cpumask(HK_TYPE_DOMAIN) against
cpu_possible_mask.

Since commit 27c3a5967f05 ("sched/isolation: Convert housekeeping
cpumasks to rcu pointers"), HK_TYPE_DOMAIN's cpumask is RCU protected
and dereferencing it requires either RCU read lock, the cpu_hotplug
write lock, or the cpuset lock; scx_enable() holds none of these, so
booting with isolcpus=domain and attaching any BPF scheduler triggers
the following lockdep splat:

  =============================
  WARNING: suspicious RCU usage
  -----------------------------
  kernel/sched/isolation.c:60 suspicious rcu_dereference_check() usage!

  1 lock held by scx_flash/281:
   #0: ffffffff8379fce0 (update_mutex){+.+.}-{4:4}, at:
       bpf_struct_ops_link_create+0x134/0x1c0

  Call Trace:
   dump_stack_lvl+0x6f/0xb0
   lockdep_rcu_suspicious.cold+0x37/0x70
   housekeeping_cpumask+0xcd/0xe0
   scx_enable.isra.0+0x17/0x120
   bpf_scx_reg+0x5e/0x80
   bpf_struct_ops_link_create+0x151/0x1c0
   __sys_bpf+0x1e4b/0x33c0
   __x64_sys_bpf+0x21/0x30
   do_syscall_64+0x117/0xf80
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

In addition, commit 03ff73510169 ("cpuset: Update HK_TYPE_DOMAIN cpumask
from cpuset") made HK_TYPE_DOMAIN include cpuset isolated partitions as
well, which means the current check also rejects BPF schedulers when a
cpuset partition is active. That contradicts the original intent of
commit 9f391f94a173 ("sched_ext: Disallow loading BPF scheduler if
isolcpus= domain isolation is in effect"), which explicitly noted that
cpuset partitions are honored through per-task cpumasks and should not
be rejected.

Switch to housekeeping_enabled(HK_TYPE_DOMAIN_BOOT), which reads only
the housekeeping flag bit (no RCU dereference) and reflects exactly the
boot-time isolcpus= configuration that the error message refers to.

Fixes: 27c3a5967f05 ("sched/isolation: Convert housekeeping cpumasks to rcu pointers")
Cc: stable@vger.kernel.org # v7.0+
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
3 weeks agovfio/pci: fix dma-buf kref underflow after revoke
Alex Williamson [Thu, 7 May 2026 14:35:46 +0000 (08:35 -0600)] 
vfio/pci: fix dma-buf kref underflow after revoke

vfio_pci_dma_buf_move(revoked=true) and vfio_pci_dma_buf_cleanup()
ran the same drain sequence: set priv->revoked, invalidate mappings,
wait for fences, drop the registered kref, wait for completion.
When the VFIO device fd was closed after PCI_COMMAND_MEMORY had been
cleared, both ran in turn -- the second kref_put underflowed and the
subsequent wait_for_completion() blocked on a completion that the
first run had already consumed:

  refcount_t: underflow; use-after-free.
  WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x59/0x90
  Call Trace:
   vfio_pci_dma_buf_cleanup+0x163/0x168 [vfio_pci_core]
   vfio_pci_core_close_device+0x67/0xe0 [vfio_pci_core]
   vfio_df_close+0x4c/0x80 [vfio]
   vfio_df_group_close+0x36/0x80 [vfio]
   vfio_device_fops_release+0x21/0x40 [vfio]
   __fput+0xe6/0x2b0
   __x64_sys_close+0x3d/0x80

Collapse the duplication: vfio_pci_dma_buf_cleanup() now delegates
the drain to vfio_pci_dma_buf_move(true), which is idempotent for
already-revoked dma-bufs.  cleanup retains only list removal and
the device registration drop; the dma_resv_lock that bracketed
those is dropped along with the in-line drain that required it,
memory_lock continues to protect them.

Re-arm the kref and the completion at the end of move()'s revoke
branch so post-revoke state matches post-creation (kref == 1,
completion ready).  This keeps cleanup's call into move() a no-op
when revoke already ran, and replaces the explicit kref_init() that
the un-revoke branch used to perform for the un-revoke -> remap
path.

Fixes: 1a8a5227f229 ("vfio: Wait for dma-buf invalidation to complete")
Reported-by: Joonas Kylmälä <joonas.kylmala@netum.fi>
Closes: https://lore.kernel.org/all/GVXPR02MB12019AA6014F27EF5D773E89BFB372@GVXPR02MB12019.eurprd02.prod.outlook.com/
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Reviewed-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260507143548.1018405-1-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
3 weeks agoblock: align down bounces bios
Christoph Hellwig [Thu, 7 May 2026 05:01:48 +0000 (07:01 +0200)] 
block: align down bounces bios

Just like for the extract user pages path, we need to align down the
size to the supported boundary.

Fixes: 8dd5e7c75d7b ("block: add helpers to bounce buffer an iov_iter into bios")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Link: https://patch.msgid.link/20260507050153.1298375-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 weeks agoblock: pass a minsize argument to bio_iov_iter_bounce
Christoph Hellwig [Thu, 7 May 2026 05:01:47 +0000 (07:01 +0200)] 
block: pass a minsize argument to bio_iov_iter_bounce

When bouncing for block size > PAGE_SIZE file systems that require
file system block size alignment (e.g. zoned XFS), the bio needs to
be big enough to fit an entire block.

Fixes: 8dd5e7c75d7b ("block: add helpers to bounce buffer an iov_iter into bios")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Link: https://patch.msgid.link/20260507050153.1298375-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 weeks agocpufreq: intel_pstate: Use HYBRID_SCALING_FACTOR_ADL for Bartlett Lake
Henry Tseng [Wed, 13 May 2026 10:28:46 +0000 (18:28 +0800)] 
cpufreq: intel_pstate: Use HYBRID_SCALING_FACTOR_ADL for Bartlett Lake

Bartlett Lake P-core only SKUs (e.g. Intel Core 9 273PE)
do not report X86_FEATURE_HYBRID_CPU and are not in
intel_hybrid_scaling_factor[].  In hwp_get_cpu_scaling(), the
non-hybrid fallback then applies core_get_scaling() (100000),
producing cpuinfo_max_freq values that exceed the documented Max
Turbo Frequency:

  intel_pstate: CPU0: PERF_CTL turbo = 57
  intel_pstate: CPU0: HWP_CAP guaranteed = 30
  intel_pstate: CPU0: HWP_CAP highest = 70
  intel_pstate: CPU0: HWP-to-frequency scaling factor: 100000
  intel_pstate: set_policy cpuinfo.max 7000000 policy->max 7000000
  ...
  intel_pstate: CPU12: PERF_CTL turbo = 57
  intel_pstate: CPU12: HWP_CAP guaranteed = 30
  intel_pstate: CPU12: HWP_CAP highest = 73
  intel_pstate: CPU12: HWP-to-frequency scaling factor: 100000
  intel_pstate: set_policy cpuinfo.max 7300000 policy->max 7300000
  ...

Per the Intel datasheet [1], the Intel Core 9 273PE specifies:

  Performance-cores:                                 12
  Efficient-cores:                                   0
  Max Turbo Frequency:                               5.7 GHz
  Intel Thermal Velocity Boost Frequency:            5.7 GHz
  Intel Turbo Boost Max Technology 3.0 Frequency:    5.6 GHz
  Performance-core Max Turbo Frequency:              5.4 GHz
  Performance-core Base Frequency:                   2.3 GHz

Bartlett Lake P-cores are Raptor Cove cores, per
commit d466304c4322 ("x86/cpu: Add CPU model number for Bartlett
Lake CPUs with Raptor Cove cores").  The Alder Lake and Raptor Lake
P-core entries in intel_hybrid_scaling_factor[] use
HYBRID_SCALING_FACTOR_ADL (78741).  The same factor applies to
Bartlett Lake.

Add Bartlett Lake to intel_hybrid_scaling_factor[] with
HYBRID_SCALING_FACTOR_ADL so HWP performance levels map to the
correct CPU frequencies matching the datasheet's Max Turbo Frequency.

Link: https://www.intel.com/content/www/us/en/products/sku/245717/intel-core-9-processor-273pe-36m-cache-up-to-5-70-ghz/specifications.html
Signed-off-by: Henry Tseng <henrytseng@qnap.com>
[ rjw: Changelog tweak ]
Link: https://patch.msgid.link/20260513102847.75179-1-henrytseng@qnap.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 weeks agocpufreq: intel_pstate: Use correct scaling factor on Raptor Lake-E
Rafael J. Wysocki [Tue, 12 May 2026 19:20:30 +0000 (21:20 +0200)] 
cpufreq: intel_pstate: Use correct scaling factor on Raptor Lake-E

Raptor Lake-E has the same processor ID as Raptor Lake-S, so there is
an entry in intel_hybrid_scaling_factor[] for it.  It does not contain
E-cores though and hybrid_get_cpu_type() returns 0 for its P-cores, so
they get the default "core" scaling factor.  However, the original
Raptor Lake scaling factor for P-cores still needs to be used for
mapping the HWP performance levels of the P-cores in Raptor Lake-E to
frequency, as though they were part of a real hybrid system.

To address this, update hwp_get_cpu_scaling() to return
hybrid_scaling_factor, which is the P-core scaling factor
retrieved from intel_hybrid_scaling_factor[], for all CPUs
that are not enumerated as E-cores.

Fixes: 9b18d536b124 ("cpufreq: intel_pstate: Use CPPC to get scaling factors")
Link: https://lore.kernel.org/all/20260511235328.2018458-1-srinivas.pandruvada@linux.intel.com/
Reported-by: Henry Tseng <henrytseng@qnap.com>
Closes: https://lore.kernel.org/linux-pm/20260508063032.3248602-1-henrytseng@qnap.com/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: All applicable <stable@vger.kernel.org>
Link: https://patch.msgid.link/4523296.ejJDZkT8p0@rafael.j.wysocki
3 weeks agoDocumentation: intel_pstate: Fix description of asymmetric packing with SMT
Ricardo Neri [Fri, 24 Apr 2026 21:41:13 +0000 (14:41 -0700)] 
Documentation: intel_pstate: Fix description of asymmetric packing with SMT

Patchset [1], including commits

 046a5a95c3b0 ("x86/sched/itmt: Give all SMT siblings of a core the same priority")
 995998ebdebd ("x86/sched: Remove SD_ASYM_PACKING from the SMT domain flags")

overhauled asym_packing handling in the scheduler on x86 hybrid
processors with SMT. It removed SD_ASYM_PACKING from the x86 SMT
scheduling domain and made all SMT siblings of a core share the same
priority. As a result, asym_packing operates only across physical
cores, spreading tasks among them and only using idle SMT siblings
once all physical cores are busy.

Fix the documentation to reflect this behavior.

Fixes: f20af84c29b2 ("cpufreq: intel_pstate: Document hybrid processor support")
Link: https://lore.kernel.org/r/20230406203148.19182-1-ricardo.neri-calderon@linux.intel.com
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
[ rjw: Changelog edits ]
Link: https://patch.msgid.link/20260424-rneri-fix-intel-pstate-doc-smt-asym-packing-v1-1-317bf7d5c362@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 weeks agoio-wq: check that the predecessor is hashed in io_wq_remove_pending()
Nicholas Carlini [Mon, 11 May 2026 18:02:16 +0000 (18:02 +0000)] 
io-wq: check that the predecessor is hashed in io_wq_remove_pending()

io_wq_remove_pending() needs to fix up wq->hash_tail[] if the cancelled
work was the tail of its hash bucket. When doing this, it checks whether
the preceding entry in acct->work_list has the same hash value, but
never checks that the predecessor is hashed at all. io_get_work_hash()
is simply atomic_read(&work->flags) >> IO_WQ_HASH_SHIFT, and the hash
bits are never set for non-hashed work, so it returns 0. Thus, when a
hashed bucket-0 work is cancelled while a non-hashed work is its list
predecessor, the check spuriously passes and a pointer to the non-hashed
io_kiocb is stored in wq->hash_tail[0].

Because non-hashed work is dequeued via the fast path in
io_get_next_work(), which never touches hash_tail[], the stale pointer
is never cleared. Therefore, after the non-hashed io_kiocb completes and
is freed back to req_cachep, wq->hash_tail[0] is a dangling pointer. The
io_wq is per-task (tctx->io_wq) and survives ring open/close, so the
dangling pointer persists for the lifetime of the task; the next hashed
bucket-0 enqueue dereferences it in io_wq_insert_work() and
wq_list_add_after() writes through freed memory.

Add the missing io_wq_is_hashed() check so a non-hashed predecessor
never inherits a hash_tail[] slot.

Cc: stable@vger.kernel.org
Fixes: 204361a77f40 ("io-wq: fix hang after cancelling pending hashed work")
Signed-off-by: Nicholas Carlini <nicholas@carlini.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 weeks agocgroup/cpuset: Return only actually allocated CPUs during partition invalidation
sunshaojie [Wed, 13 May 2026 10:37:38 +0000 (18:37 +0800)] 
cgroup/cpuset: Return only actually allocated CPUs during partition invalidation

In update_parent_effective_cpumask() with partcmd_invalidate, the CPUs
to return to the parent are computed as:

    adding = cpumask_and(tmp->addmask, xcpus, parent->effective_xcpus);

where xcpus = user_xcpus(cs) which returns cs->exclusive_cpus (if set)
or cs->cpus_allowed. When exclusive_cpus is not set, user_xcpus(cs) can
contain CPUs that were never actually granted to the partition due to
sibling exclusion in compute_excpus(). Consequently, the invalidation
may return CPUs to the parent that remain in use by sibling partitions,
causing overlapping effective_cpus and triggering the
WARN_ON_ONCE(1) in generate_sched_domains().

Use cs->effective_xcpus instead, which reflects the CPUs actually
granted to this partition.

Reproducer (on a 4-CPU machine):

    cd /sys/fs/cgroup
    mkdir a1 b1

    # a1 becomes partition root with CPUs 0-1
    echo "0-1" > a1/cpuset.cpus
    echo "root" > a1/cpuset.cpus.partition

    # b1 becomes partition root with CPUs 1-2, but sibling exclusion
    # reduces its effective_xcpus to CPU 2 only
    echo "1-2" > b1/cpuset.cpus
    echo "root" > b1/cpuset.cpus.partition

    # b1 changes cpus_allowed to 0-1 -> partition invalidation
    echo "0-1" > b1/cpuset.cpus

    # Expected: CPUs 2-3  (only CPU 2 returned from b1)
    # Actual:   CPUs 1-3  (CPU 0-1 returned, overlapping with a1)
    cat cpuset.cpus.effective

dmesg will also show a WARNING from generate_sched_domains() reporting
overlapping partition root effective_cpus.

Fixes: 2a3602030d80 ("cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict")
Cc: stable@vger.kernel.org # v7.0+
Signed-off-by: sunshaojie <sunshaojie@kylinos.cn>
Tested-by: Chen Ridong <chenridong@huaweicloud.com>
Reviewed-by: Chen Ridong <chenridong@huaweicloud.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
3 weeks agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Wed, 13 May 2026 18:53:51 +0000 (11:53 -0700)] 
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "arm64:

   - Add the pKVM side of the workaround for ARM's erratum 4193714,
     provided that the EL3 firmware does its part of the job. KVM will
     refuse to initialise otherwise

   - Correctly handle 52bit VAs for guest EL2 stage-1 translations when
     running under NV with E2H==0

   - Correctly deal with permission faults in guest_memfd memslots

   - Fix the steal-time selftest after the infrastructure was reworked

   - Make sure the host cannot pass a non-sensical clock update to the
     EL2 tracing infrastructure

   - Appoint Steffen Eiden as a reviewer in anticipation of the KVM/s390
     ability to run arm64 guests, which will inevitably lead to arm64
     code being directly used on s390

   - Make sure that EL2 is configured with both exception entry and exit
     being Context Synchronization Events

   - Handle the current vcpu being NULL on EL2 panic

   - Fix the selftest_vcpu memcache being empty at the point of donation
     or sharing

   - Check that the memcache has enough capacity before engaging on the
     share/donate path

   - Fix __deactivate_fgt() to use its parameter rather than a variable
     in the macro context

  s390:

   - Fix array overrun with large amounts of PCI devices

  x86:

   - Never use L0's PAUSE loop exiting while L2 is running, since it's
     unlikely that a nested guest will help solving the hypervisor's
     spinlock contention

   - Fix emulation of MOVNTDQA

   - Fix typo in Xen hypercall tracepoint

   - Add back an optimization that was left behind when recently fixing
     a bug

   - Add module parameter to disable CET, whose implementation seems to
     have issues. For now it remains enabled by default

  Generic:

   - Reject offset causing an unsigned overflow in kvm_reset_dirty_gfn()

  Documentation:

   - Update stale links

  Selftests:

   - Fix guest_memfd_test with host page size > guest page size"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
  KVM: VMX: introduce module parameter to disable CET
  KVM: x86: Swap the dst and src operand for MOVNTDQA
  KVM: x86: use again the flush argument of __link_shadow_page()
  KVM: selftests: Ensure gmem file sizes are multiple of host page size
  Documentation: kvm: update links in the references section of AMD Memory Encryption
  KVM: nSVM: Never use L0's PAUSE loop exiting while L2 is running
  KVM: x86: Fix Xen hypercall tracepoint argument assignment
  KVM: Reject wrapped offset in kvm_reset_dirty_gfn()
  KVM: arm64: Pre-check vcpu memcache for host->guest donate
  KVM: arm64: Pre-check vcpu memcache for host->guest share
  KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache
  KVM: arm64: Fix __deactivate_fgt macro parameter typo
  KVM: arm64: Guard against NULL vcpu on VHE hyp panic path
  KVM: arm64: Make EL2 exception entry and exit context-synchronization events
  MAINTAINERS: Add Steffen as reviewer for KVM/arm64
  KVM: arm64: Remove potential UB on nvhe tracing clock update
  KVM: selftests: arm64: Fix steal_time test after UAPI refactoring
  KVM: arm64: Handle permission faults with guest_memfd
  KVM: arm64: nv: Consider the DS bit when translating TCR_EL2
  KVM: arm64: Work around C1-Pro erratum 4193714 for protected guests
  ...

3 weeks agoselftests/cgroup: Fix error path leaks in test_percpu_basic
Yu Miao [Wed, 13 May 2026 02:39:07 +0000 (10:39 +0800)] 
selftests/cgroup: Fix error path leaks in test_percpu_basic

When cg_name_indexed() returns NULL partway through the child creation
loop, the code returned -1 without running cleanup_children and cleanup.
That left the `parent` pathname allocation unreleased and did not remove
child cgroup directories already created under the parent. Fix by jumping
to cleanup_children instead of returning.

When cg_create() fails, `child` (the pathname from cg_name_indexed())
was not freed before cleanup_children. Fix by freeing `child` before
branching to cleanup_children.

Fixes: 90631e1dea55 ("kselftests: cgroup: add perpcu memory accounting test")
Signed-off-by: Yu Miao <yumiao@kylinos.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>
3 weeks agoRDMA/bnxt_re: zero shared page before exposing to userspace
Lord Ulf Henrik Holmberg [Sat, 9 May 2026 08:40:11 +0000 (10:40 +0200)] 
RDMA/bnxt_re: zero shared page before exposing to userspace

bnxt_re_alloc_ucontext() allocates uctx->shpg via
__get_free_page(GFP_KERNEL). The buddy allocator does not zero pages
without __GFP_ZERO, so the page contains stale kernel data from
whatever object most recently freed it.

The page is then mapped into userspace via vm_insert_page() under
BNXT_RE_MMAP_SH_PAGE in bnxt_re_mmap(). The driver only ever writes
4 bytes (a u32 AVID) at offset BNXT_RE_AVID_OFFT (0x10) inside
bnxt_re_create_ah(); the remaining 4092 bytes of the page are exposed
to userspace unsanitised, leaking kernel memory contents.

Any user with access to /dev/infiniband/uverbsX on a host with a
bnxt_re device (typically rdma group membership) can read this data
via a single mmap() at pgoff 0 after IB_USER_VERBS_CMD_GET_CONTEXT.

Other shared pages in the same file already use get_zeroed_page()
correctly:

  drivers/infiniband/hw/bnxt_re/ib_verbs.c
      srq->uctx_srq_page = (void *)get_zeroed_page(GFP_KERNEL);
      cq->uctx_cq_page  = (void *)get_zeroed_page(GFP_KERNEL);

uctx->shpg is the only outlier. Bring it in line with the existing
convention by switching to get_zeroed_page().

Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Lord Ulf Henrik Holmberg <henrik.holmberg@defensify.se>
Link: https://patch.msgid.link/20260509084011.11971-1-pomzm67@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
3 weeks agoselftests/rdma: explicitly skip tests when required modules are missing
Yi Lai [Thu, 7 May 2026 12:51:06 +0000 (20:51 +0800)] 
selftests/rdma: explicitly skip tests when required modules are missing

Currently, the rdma rxe selftests fail with an exit code of 1 when
required kernel modules are not present. This causes spurious failures
in environments where these modules might not be compiled or available.

Include the standard kselftest 'ktap_helpers.sh' and replace the
hardcoded error exits with '$KSFT_SKIP'. This ensures the tests are
properly marked as skipped rather than failed.

Fixes: e01027cab38a ("RDMA/rxe: Add testcase for net namespace rxe")
Signed-off-by: Yi Lai <yi1.lai@intel.com>
Link: https://patch.msgid.link/20260507125106.3114167-1-yi1.lai@intel.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
3 weeks agoRDMA/nldev: Add mutual exclusion in nldev_dellink()
Edward Adam Davis [Thu, 7 May 2026 12:50:10 +0000 (20:50 +0800)] 
RDMA/nldev: Add mutual exclusion in nldev_dellink()

We must serialize calls to nldev_dellink() or risk a crash as syzbot
reported:

KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
Call Trace:
 udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
 rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
 rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
 rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
 rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254

Fixes: a60e3f3d6fba ("RDMA/nldev: Add dellink function pointer")
Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
Tested-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Link: https://patch.msgid.link/tencent_611BEB4B141B1A2526BAA3BBB2335F9E9108@qq.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
3 weeks agodrm/gma500/oaktrail_lvds: fix i2c adapter leaks on init
Johan Hovold [Fri, 8 May 2026 14:44:46 +0000 (16:44 +0200)] 
drm/gma500/oaktrail_lvds: fix i2c adapter leaks on init

The LVDS init code looks up an I2C adapter using i2c_get_adapter() and
tries to read the EDID before falling back to allocating and registering
its own adapter.

Make sure to drop the references taken by i2c_get_adapter() when falling
back to allocating an adapter as well as on late errors to allow the
looked up adapter to be deregistered.

Fixes: 1b082ccf5901 ("gma500: Add Oaktrail support")
Cc: stable@vger.kernel.org # 3.3
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Link: https://patch.msgid.link/20260508144446.59722-4-johan@kernel.org
3 weeks agodrm/gma500/oaktrail_lvds: fix hang on init failure
Johan Hovold [Fri, 8 May 2026 14:44:45 +0000 (16:44 +0200)] 
drm/gma500/oaktrail_lvds: fix hang on init failure

The LVDS init code looks up an I2C adapter using i2c_get_adapter() and
tries to read the EDID before falling back to allocating and registering
its own adapter.

The error handling does not separate these cases so on a late init
failure it will try to deregister and free also an adapter that had
previously been registered. Since i2c_get_adapter() takes another
reference to the adapter, deregistration hangs indefinitely while
waiting for the reference to be released.

Fix this by only destroying adapters allocated during LVDS init on
errors.

Fixes: a57ebfc0b4da ("drm/gma500: Make oaktrail lvds use ddc adapter from drm_connector")
Cc: stable@vger.kernel.org # 6.0
Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Link: https://patch.msgid.link/20260508144446.59722-3-johan@kernel.org
3 weeks agodrm/gma500/oaktrail_hdmi: fix i2c adapter leak on setup
Johan Hovold [Fri, 8 May 2026 14:44:44 +0000 (16:44 +0200)] 
drm/gma500/oaktrail_hdmi: fix i2c adapter leak on setup

Make sure to drop the reference taken to the I2C adapter (and its
module) when setting up HDMI to allow the adapter to be deregistered.

Fixes: 1b082ccf5901 ("gma500: Add Oaktrail support")
Cc: stable@vger.kernel.org # 3.3
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Link: https://patch.msgid.link/20260508144446.59722-2-johan@kernel.org
3 weeks agoKVM: selftests: Guard execinfo.h inclusion for non-glibc builds
Hisam Mehboob [Thu, 9 Apr 2026 15:38:47 +0000 (20:38 +0500)] 
KVM: selftests: Guard execinfo.h inclusion for non-glibc builds

The backtrace() function and execinfo.h are GNU extensions available
in glibc but not in non-glibc C libraries such as musl. Building KVM
selftests with musl-gcc fails with:

  lib/assert.c:9:10: fatal error: execinfo.h: No such file or directory

Fix this by guarding the inclusion of execinfo.h and the stack dumping
logic under #ifdef __GLIBC__. For non-glibc builds, provide a local
stub for test_dump_stack().

Suggested-by: Aqib Faruqui <aqibaf@amazon.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Hisam Mehboob <hisamshar@gmail.com>
Link: https://patch.msgid.link/20260409153846.1502656-2-hisamshar@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agoKVM: x86: Rate-limit global clock updates on vCPU load
Lei Chen [Thu, 9 Apr 2026 14:22:26 +0000 (22:22 +0800)] 
KVM: x86: Rate-limit global clock updates on vCPU load

commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.

As a result, kvm_arch_vcpu_load() can queue global clock update requests
every time a vCPU is scheduled when the master clock is disabled or when
the vCPU is loaded for the first time.

Restore the throttling with a per-VM ratelimit state and gate
KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
scheduling does not generate a steady stream of redundant clock update
requests.

Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
Signed-off-by: Lei Chen <lei.chen@smartx.com>
Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
Link: https://patch.msgid.link/20260409142226.2581-1-lei.chen@smartx.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agox86/virt: Silence RCU lockdep splat in emergency virt callback path
Mikhail Gavrilov [Mon, 4 May 2026 23:54:35 +0000 (04:54 +0500)] 
x86/virt: Silence RCU lockdep splat in emergency virt callback path

x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference()
through machine_crash_shutdown() with IRQs disabled but with RCU not
necessarily watching the crashing CPU, which triggers a suspicious
RCU usage splat on debug kernels (CONFIG_PROVE_RCU=y) during
panic/kdump:

  WARNING: suspicious RCU usage
  arch/x86/virt/hw.c:52 suspicious rcu_dereference_check() usage!

  rcu_scheduler_active = 2, debug_locks = 1
  1 lock held by tee/11119:
   #0: ffff8881fa32c440 (sb_writers#3){.+.+}-{0:0}, at: ksys_write

  Call Trace:
   <TASK>
   dump_stack_lvl+0x84/0xd0
   lockdep_rcu_suspicious.cold+0x37/0x8f
   x86_virt_invoke_kvm_emergency_callback+0x5f/0x70
   x86_svm_emergency_disable_virtualization_cpu+0x2a/0x30
   x86_virt_emergency_disable_virtualization_cpu+0x6b/0x90
   native_machine_crash_shutdown+0x72/0x170
   __crash_kexec+0x137/0x280
   panic+0xce/0xd0
   sysrq_handle_crash+0x1f/0x20
   __handle_sysrq.cold+0x192/0x335
   write_sysrq_trigger+0x8c/0xc0
   proc_reg_write+0x1c3/0x3c0
   vfs_write+0x1d0/0xf80
   ksys_write+0x116/0x250
   do_syscall_64+0x11c/0x1480
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
   </TASK>

A truly correct fix is non-trivial: the RCU usage genuinely is wrong in
panic context (RCU may ignore the crashing CPU during synchronization),
and a concurrent KVM module unload could in principle race with the
callback read; see commit 2baa33a8ddd6 ("KVM: x86: Leave user-return
notifier registered on reboot/shutdown") which notes that nothing
prevents module unload during panic/reboot.

However, the alternatives are worse:

  - smp_store_release()/smp_load_acquire() handles ordering but not
    liveness; the kernel still needs to keep the module text alive
    while the callback is in flight.
  - Taking a lock in the panic path is risky — any lock could be held
    by a CPU that has already been NMI'd to a halt.

Use rcu_dereference_raw() to silence the splat and accept the
vanishingly small remaining race. Panic context inherently cannot
guarantee complete correctness; the goal here is to keep debug builds
quiet on the kdump path so the splat doesn't obscure the actual
kernel state being captured.

Reproducible on a debug kernel (CONFIG_PROVE_LOCKING=y, CONFIG_PROVE_RCU=y)
with kvm_amd or kvm_intel loaded by triggering kdump:

  echo c > /proc/sysrq-trigger

Suggested-by: Sean Christopherson <seanjc@google.com>
Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem")
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://patch.msgid.link/20260504235435.90957-1-mikhail.v.gavrilov@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agoKVM: selftests: Include sys/mman.h *and* linux/mman.h, via kvm_syscalls.h
Sean Christopherson [Tue, 28 Apr 2026 01:25:03 +0000 (18:25 -0700)] 
KVM: selftests: Include sys/mman.h *and* linux/mman.h, via kvm_syscalls.h

Include both linux/mman.h (the kernel provided version) and sys/mman.h (the
libc provided version) throughout KVM selftests, by way of kvm_syscalls.h
(which should have been including sys/mman.h anyways).  Pulling in the
kernel's version fixes compilation errors with the guest_memfd test on
older versions of libc due to a recent commit adding MADV_COLLAPSE testing.

  In file included from include/kvm_util.h:8,
                   from guest_memfd_test.c:21:
  guest_memfd_test.c: In function ‘test_collapse’:
  guest_memfd_test.c:219:47: error: ‘MADV_COLLAPSE’ undeclared (first use in this function); did you mean ‘MADV_COLD’?
      219 |         TEST_ASSERT_EQ(madvise(mem, pmd_size, MADV_COLLAPSE), -1);
          |                                               ^~~~~~~~~~~~~
    include/test_util.h:62:16: note: in definition of macro ‘TEST_ASSERT_EQ’
       62 |         typeof(a) __a = (a);                                            \
          |                ^
    guest_memfd_test.c:219:47: note: each undeclared identifier is reported only once for each function it appears in
      219 |         TEST_ASSERT_EQ(madvise(mem, pmd_size, MADV_COLLAPSE), -1);
          |                                               ^~~~~~~~~~~~~
    include/test_util.h:62:16: note: in definition of macro ‘TEST_ASSERT_EQ’
       62 |         typeof(a) __a = (a);                                            \
          |                ^

Route the includes through kvm_syscalls.h to try and avoid a future game
of whack-a-mole, i.e. so that future expansion of test coverage doesn't run
into the same problem.

To discourage use of sys/mman.h, opportunistically include the kernel's
version of mman.h in test_util.h as it only needs MAP_SHARED, i.e. only
needs the full set of kernel defs, not the libc syscall wrappers.

Fixes: 9830209b4ae8 ("KVM: selftests: Test MADV_COLLAPSE on guest_memfd")
Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Closes: https://lore.kernel.org/all/20260427204313.50741-1-rick.p.edgecombe@intel.com
Link: https://patch.msgid.link/20260428012503.1213654-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks agox86/mce: Restore MCA polling interval halving
Borislav Petkov (AMD) [Mon, 16 Mar 2026 15:12:00 +0000 (16:12 +0100)] 
x86/mce: Restore MCA polling interval halving

RongQing reported that the MCA polling interval doesn't halve when an
error gets logged. It was traced down to the commit in Fixes:, because:

  mce_timer_fn()
  |-> mce_poll_banks()
  |-> machine_check_poll()
  |-> mce_log()

which will queue the work and return.

Now, back in mce_timer_fn():

        /*
         * Alert userspace if needed. If we logged an MCE, reduce the polling
         * interval, otherwise increase the polling interval.
         */
        if (mce_notify_irq())

<--- here we haven't ran the notifier chain yet so mce_need_notify is
not set yet so this won't hit and we won't halve the interval iv.

Now the notifier chain runs. mce_early_notifier() sets the bit, does
mce_notify_irq(), that clears the bit and then the notifier chain
a little later logs the error.

So this is a silly timing issue.

But, that's all unnecessary.

All it needs to happen here is, the "should we notify of a logged MCE"
mce_notify_irq() asks, should be simply a question to the mce gen pool:
"Are you empty?"

And that then turns into a simple yes or no answer and it all
JustWorks(tm).

So do that and also distribute the functionality where it belongs:
 - Print that MCE events have been logged in mce_log()
 - Trigger the mcelog tool specific work in the first notifier

As a result, mce_notify_irq() can go now.

Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
Reported-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20260112082747.2842-1-lirongqing@baidu.com
3 weeks agoMerge tag 'fixes-2026-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/liveupd...
Linus Torvalds [Wed, 13 May 2026 15:24:50 +0000 (08:24 -0700)] 
Merge tag 'fixes-2026-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux

Pull liveupdate fixes from Mike Rapoport:
 "A few fixes for kexec handover and liveupdate:

   - make sure KHO is skipped for crash kernel

   - fix error reporting in memfd preservation if it fails mid-loop

   - don't allow preserving memfds whose page count exceeds UINT_MAX

   - fix documentation of memfd seals preservation to match the code"

* tag 'fixes-2026-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux:
  mm/memfd_luo: document preservation of file seals
  mm/memfd_luo: reject memfds whose page count exceeds UINT_MAX
  mm/memfd_luo: report error when restoring a folio fails mid-loop
  kho: skip KHO for crash kernel

3 weeks agodrm/xe: Drop unused ggtt_balloon field
Michal Wajdeczko [Sun, 10 May 2026 20:56:05 +0000 (22:56 +0200)] 
drm/xe: Drop unused ggtt_balloon field

During recent GGTT refactoring we missed to drop now unused field
from the xe_tile. Drop it now.

Fixes: e904c56ba6e0 ("drm/xe: Rewrite GGTT VF initialization")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Maarten Lankhorst <dev@lankhorst.se>
Link: https://patch.msgid.link/20260510205605.642-1-michal.wajdeczko@intel.com
(cherry picked from commit 21d5a871f57909dc4d8e4f5d3bf92f9ccf2597b2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
3 weeks agoselftests: ublk: cap nthreads to kernel's actual nr_hw_queues
Ming Lei [Wed, 13 May 2026 10:19:40 +0000 (18:19 +0800)] 
selftests: ublk: cap nthreads to kernel's actual nr_hw_queues

dev->nthreads is derived from the user-requested queue count before the
ADD command, but the kernel may reduce nr_hw_queues (capped to
nr_cpu_ids). When the VM has fewer CPUs than requested queues, the
daemon creates more handler threads than there are kernel queues.

In non-batch mode, the extra threads access uninitialized queues
(q_depth=0), submit zero io_uring SQEs, and block forever in
io_cqring_wait. In batch mode, the extra threads cause similar hangs
during device removal.

In both cases, the stuck threads prevent the daemon from closing the
char device, holding the last ublk_device reference and causing
ublk_ctrl_del_dev() to hang in wait_event_interruptible().

Fix by capping dev->nthreads to the kernel-returned nr_hw_queues after
the ADD command completes. per_io_tasks mode is excluded because threads
interleave across all queues, so nthreads > nr_hw_queues is valid.

Fixes: abe54c160346 ("selftests: ublk: kublk: decouple ublk_queues from ublk server threads")
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Link: https://patch.msgid.link/20260513101941.1373998-1-tom.leiming@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 weeks agoblock: fix handling of dead zone write plugs
Damien Le Moal [Wed, 13 May 2026 11:11:29 +0000 (20:11 +0900)] 
block: fix handling of dead zone write plugs

Shin'ichiro reported hard to reproduce unaligned write errors with zoned
block devices. Under normal operation conditions (e.g. running XFS on an
SMR disk), these errors are nearly impossible to trigger. But using a
"slow" kernel with many debug options enables and some specific use
cases (e.g. fio zbd test case 46), the errors can be reproduced fairly
easily.

The unaligned write errors come from mishandling a valid reference
counting pattern of zone write plugs. Such pattern triggers for instance
if a process A writes a zone (not necessarilly to the full state),
another process B immediately resets the zone and immediately following
the completion of the zone reset, starts issuing writes to the zone.
With such pattern, in some cases, the zone write plugs worker thread of
the device may still be holding a reference to the zone write plug of
the zone taken when process A was writing to the zone. The following
zone reset from process B marks the zone as dead but does not remove the
zone write plug from the device hash table as a reference to the plug
still exist. Once process B starts issuing new writes, the zone write
plug is seen as dead and the writes from process B are immediately
failed, despite this write pattern being perfectly legal.

Fix this by allowing restoring a dead zone write plug to a live state if
a write is issued to the zone when the zone is: marked as dead, empty
and the write sector corresponds to the first sector of the zone (that
is, the write is aligned to the zone write pointer). This is done with
the new helper function disk_check_zone_wplug_dead(), which restores a
dead zone write plug to a live state by clearing the BLK_ZONE_WPLUG_DEAD
flag and restoring the initial reference to the zone write plug taken
when the plug was added to the device hash table.

Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: b7d4ffb51037 ("block: fix zone write plug removal")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://patch.msgid.link/20260513111129.108809-1-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 weeks agoKVM: VMX: introduce module parameter to disable CET
Paolo Bonzini [Tue, 12 May 2026 14:58:48 +0000 (16:58 +0200)] 
KVM: VMX: introduce module parameter to disable CET

There have been reports of host hangs caused by CET virtualization.
Until these are analyzed further, introduce a module parameter that
makes it possible to easily disable it.

Link: https://lore.kernel.org/all/85548beb-1486-40f9-beb4-632c78e3360b@proxmox.com/
Cc: David Riley <d.riley@proxmox.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 weeks agodrm/msm/dpu: don't mix devm and drmm functions
Dmitry Baryshkov [Tue, 5 May 2026 00:24:58 +0000 (03:24 +0300)] 
drm/msm/dpu: don't mix devm and drmm functions

Mixing devm and drmm functions will result in a use-after-free on msm
driver teardown if userspace keeps a reference on the drm device:
The WB connector data will be destroyed because of the use of
devm_kzalloc()), while the usersoace still can try interacting with the
WB connector (which uses drmm_ functions).

Change dpu_writeback_init() to use drmm_.

Fixes: 0b37ac63fc9d ("drm/msm/dpu: use drmm_writeback_connector_init()")
Reported-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Closes: https://lore.kernel.org/r/78c764b8-44cf-4db5-88e7-807a85954518@wanadoo.fr
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: John.Harrison@Igalia.com
Patchwork: https://patchwork.freedesktop.org/patch/722656/
Link: https://lore.kernel.org/r/20260505-wb-drop-encoder-v5-1-42567b7c7af2@oss.qualcomm.com
3 weeks agodrm/msm/dsi: don't dump registers past the mapped region
Dmitry Baryshkov [Tue, 28 Apr 2026 17:21:38 +0000 (20:21 +0300)] 
drm/msm/dsi: don't dump registers past the mapped region

On DSI 6G platforms the IO address space is internally adjusted by
io_offset. Later this adjusted address might be used for memory dumping.
However the size that is used for memory dumping isn't adjusted to
account for the io_offset, leading to the potential access to the
unmapped region. Lower ctrl_size by the io_offset value to prevent
access past the mapped area.

 msm_disp_snapshot_add_block+0x1d4/0x3c8 [msm] (P)
 msm_dsi_host_snapshot+0x4c/0x78 [msm]
 msm_dsi_snapshot+0x28/0x50 [msm]
 msm_disp_snapshot_capture_state+0x74/0x140 [msm]
 msm_disp_snapshot_state_sync+0x60/0x90 [msm]
 _msm_disp_snapshot_work+0x30/0x90 [msm]
 kthread_worker_fn+0xdc/0x460
 kthread+0x120/0x140

Fixes: bac2c6a62ed9 ("drm/msm: get rid of msm_iomap_size")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/721747/
Link: https://lore.kernel.org/r/20260428-msm-fix-dsi-dump-v1-1-5d4cb5ccfac7@oss.qualcomm.com
3 weeks agodrm/msm/dpu: Fix Kaanapali CWB register configuration
Mahadevan P [Tue, 28 Apr 2026 11:44:25 +0000 (17:14 +0530)] 
drm/msm/dpu: Fix Kaanapali CWB register configuration

The Kaanapali DPU catalog defines kaanapali_cwb[] with the correct
CWB base addresses for this platform (0x169200, 0x169600, 0x16a200,
0x16a600), but the dpu_kaanapali_cfg struct was mistakenly pointing
to sm8650_cwb instead. The SM8650 CWB blocks sit at completely
different offsets (0x66200, 0x66600, 0x7E200, 0x7E600), so using
them on Kaanapali would program CWB registers at wrong addresses,
corrupting unrelated hardware blocks and breaking writeback capture.

Fix this by pointing .cwb to the correct kaanapali_cwb array.

Fixes: 83fe2cd56b1d ("drm/msm/dpu: Add support for Kaanapali DPU")
Signed-off-by: Mahadevan P <mahadevan.p@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/721444/
Link: https://lore.kernel.org/r/20260428-kaanapali_cwb-v1-1-51fdb2c65498@oss.qualcomm.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
3 weeks agodrm/msm/dpu: fix UV scanlines calculation for YUV UBWC formats
Neil Armstrong [Tue, 14 Apr 2026 15:14:30 +0000 (17:14 +0200)] 
drm/msm/dpu: fix UV scanlines calculation for YUV UBWC formats

The UV scanlines is calculated with (height + 1) / 2 unlike
the Y scanlines, add back the correct scanlines calculation
for UBWC YUV formats.

Fixes: 2f3ff6ab8f5c ("drm/msm/dpu: use standard functions in _dpu_format_populate_plane_sizes_ubwc()")
Fixes: ada4a19ed21c ("drm/msm/dpu: rewrite _dpu_format_populate_plane_sizes_ubwc()")
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/718309/
Link: https://lore.kernel.org/r/20260414-topic-sm8x50-msm-dpu1-formats-qc10c-v1-1-0b62325b9030@linaro.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
3 weeks agodt-bindings: display/msm: qcom,eliza-mdss: Correct DPU and DP ranges in example
Krzysztof Kozlowski [Sun, 5 Apr 2026 14:34:01 +0000 (16:34 +0200)] 
dt-bindings: display/msm: qcom,eliza-mdss: Correct DPU and DP ranges in example

VBIF register range is 0x3000 long.  DisplayPort block has few too short
ranges and misses four more address spaces.  Similarly first part of DSI
space should be 0x300 long.

No practical impact, except when existing code is being re-used in new
contributions.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Patchwork: https://patchwork.freedesktop.org/patch/716460/
Link: https://lore.kernel.org/r/20260405-dts-qcom-display-regs-v2-5-34f4024c65dc@oss.qualcomm.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
3 weeks agodt-bindings: display/msm: sm8750-mdss: Correct DPU and DP ranges in example
Krzysztof Kozlowski [Sun, 5 Apr 2026 14:34:00 +0000 (16:34 +0200)] 
dt-bindings: display/msm: sm8750-mdss: Correct DPU and DP ranges in example

VBIF register range is 0x3000 long. DisplayPort block has few too short
ranges and misses four more address spaces.

No practical impact, except when existing code is being re-used in new
contributions.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/716453/
Link: https://lore.kernel.org/r/20260405-dts-qcom-display-regs-v2-4-34f4024c65dc@oss.qualcomm.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
3 weeks agodt-bindings: display/msm: sm8650: Correct VBIF range in example
Krzysztof Kozlowski [Sun, 5 Apr 2026 14:33:59 +0000 (16:33 +0200)] 
dt-bindings: display/msm: sm8650: Correct VBIF range in example

VBIF register range is 0x3000 long, so correct the example.  No
practical impact, except when existing code is being re-used in new
contributions.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Patchwork: https://patchwork.freedesktop.org/patch/716454/
Link: https://lore.kernel.org/r/20260405-dts-qcom-display-regs-v2-3-34f4024c65dc@oss.qualcomm.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
3 weeks agodt-bindings: display/msm: dp-controller: Allow DAI on SM8650 and others
Krzysztof Kozlowski [Sun, 5 Apr 2026 14:33:58 +0000 (16:33 +0200)] 
dt-bindings: display/msm: dp-controller: Allow DAI on SM8650 and others

DisplayPort on Qualcomm SoCs like SM8650 and compatible SM8750 supports
audio and there is already DTS having cells and sound-name-prefix.  The
"else:" clause for non-EDP and non-aux-bus cases already requires
'#sound-dai-cells', so it should actually reference the dai-common.yaml
for other properties, as pointed out by dtbs_check warnings like:

  sm8650-hdk-display-card-rear-camera-card.dtb:
    displayport-controller@af54000 (qcom,sm8650-dp): Unevaluated properties are not allowed ('sound-name-prefix' was unexpected)

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/716452/
Link: https://lore.kernel.org/r/20260405-dts-qcom-display-regs-v2-2-34f4024c65dc@oss.qualcomm.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
3 weeks agodt-bindings: display/msm: dp-controller: Correct SM8650 IO range
Krzysztof Kozlowski [Sun, 5 Apr 2026 14:33:57 +0000 (16:33 +0200)] 
dt-bindings: display/msm: dp-controller: Correct SM8650 IO range

DP on Qualcomm SM8650 come with nine address ranges, so describe the
remaining ones as optional to keep ABI backwards compatible.  Driver
also does not need them to operate correctly.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Patchwork: https://patchwork.freedesktop.org/patch/716450/
Link: https://lore.kernel.org/r/20260405-dts-qcom-display-regs-v2-1-34f4024c65dc@oss.qualcomm.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
3 weeks agoio_uring/rw: drop unused attr_type_mask from io_prep_rw_pi()
Yang Xiuwei [Wed, 13 May 2026 09:43:03 +0000 (17:43 +0800)] 
io_uring/rw: drop unused attr_type_mask from io_prep_rw_pi()

io_prep_rw_pi() never used the attr_type_mask argument. Callers already
validate sqe->attr_type_mask before invoking the helper (only
IORING_RW_ATTR_FLAG_PI is supported today). Remove the dead parameter to
avoid implying further interpretation happens here.

Signed-off-by: Yang Xiuwei <yangxiuwei@kylinos.cn>
Link: https://patch.msgid.link/20260513094303.866533-1-yangxiuwei@kylinos.cn
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 weeks agopinctrl-amd: enable IRQ for WACF2200 touchscreen on Lenovo Yoga 7 14AGP11
Hardik Prakash [Tue, 12 May 2026 07:31:38 +0000 (13:01 +0530)] 
pinctrl-amd: enable IRQ for WACF2200 touchscreen on Lenovo Yoga 7 14AGP11

On Lenovo Yoga 7 14AGP11 (83TD), the WACF2200 touchscreen controller
is wired via I2C2 (AMDI0010:02) with its interrupt on GPIO pin 157
(confirmed via ACPI _CRS GpioInt decode). After amd_gpio_irq_init()
clears all GPIO interrupts at boot, pin 157 is never re-enabled,
preventing the touchscreen from signalling the driver.

Windows keeps GPIO 157 INTERRUPT_ENABLE (bit 11) and INTERRUPT_MASK
(bit 12) set after initialisation. Add a DMI quirk to restore these
bits after amd_gpio_irq_init() on this hardware.

Assisted-by: Claude:claude-sonnet-4-6
Assisted-by: GPT-Codex:gpt-5.2-codex
Signed-off-by: Hardik Prakash <hardikprakash.official@gmail.com>
Acked-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Linus Walleij <linusw@kernel.org>
3 weeks agonet: ethernet: ravb: Do not check URAM suspension when WoL is active
Niklas Söderlund [Sun, 10 May 2026 10:30:17 +0000 (12:30 +0200)] 
net: ethernet: ravb: Do not check URAM suspension when WoL is active

When updating the driver to match latest datasheet to suspend access to
URAM when suspending DMA transfers a corner-case was missed, URAM access
will not be suspended if WoL is enabled. This lead to the error message
(correctly) being triggered as URAM access is not suspended even tho
it's requested as part of stopping DMA.

Avoid checking if URAM access is suspended and printing the error
message if WoL is enabled when we suspend the system, as we know it will
not be.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Closes: https://lore.kernel.org/all/CAMuHMdWnjV%3DHGE1o08zLhUfTgOSene5fYx1J5GG10mB%2BToq8qg@mail.gmail.com/
Fixes: 353d8e7989b6 ("net: ethernet: ravb: Suspend and resume the transmission flow")
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Sai Krishna <saikrishnag@marvell.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoethtool: fix ethnl_bitmap32_not_zero() bit interval semantics
Chenguang Zhao [Mon, 11 May 2026 01:43:43 +0000 (09:43 +0800)] 
ethtool: fix ethnl_bitmap32_not_zero() bit interval semantics

ethnl_bitmap32_not_zero() should return true if some bit in [start, end)
is set:

- Fix inverted memchr_inv() sense: return true when the scan finds a
  non-zero byte, not when the middle words are all zero.
- Return false for an empty interval (end <= start).
- When end is 32-bit aligned, indices in [start, end) do not include any
  bits from map[end_word]; return false after earlier checks found no
  non-zero data.

Fixes: 10b518d4e6dd ("ethtool: netlink bitset handling")
Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet/smc: avoid NULL deref of conn->lnk in smc_msg_event tracepoint
Xiang Mei [Sun, 10 May 2026 22:26:40 +0000 (15:26 -0700)] 
net/smc: avoid NULL deref of conn->lnk in smc_msg_event tracepoint

The smc_msg_event tracepoint class, shared by smc_tx_sendmsg and
smc_rx_recvmsg, unconditionally dereferences smc->conn.lnk:

__string(name, smc->conn.lnk->ibname)

conn->lnk is only set for SMC-R; for SMC-D it is NULL. Other code on
these paths already handles this (e.g. !conn->lnk in
SMC_STAT_RMB_TX_SIZE_SMALL()). With the tracepoint enabled, the first
sendmsg()/recvmsg() on an SMC-D socket crashes:

  Oops: general protection fault, probably for non-canonical address
  KASAN: null-ptr-deref in range [...]
  RIP: 0010:strlen+0x1e/0xa0
  Call Trace:
   trace_event_raw_event_smc_msg_event (net/smc/smc_tracepoint.h:44)
   smc_rx_recvmsg (net/smc/smc_rx.c:515)
   smc_recvmsg (net/smc/af_smc.c:2859)
   __sys_recvfrom (net/socket.c:2315)
   __x64_sys_recvfrom (net/socket.c:2326)
   do_syscall_64

The faulting address 0x3e0 is offsetof(struct smc_link, ibname),
confirming the NULL ->lnk deref. Enabling the tracepoint requires
root, but the trigger itself is unprivileged: socket(AF_SMC, ...) has
no capability check, and SMC-D negotiation needs no admin step on
s390 or on x86 with the loopback ISM device loaded.

Log an empty device name for SMC-D instead of dereferencing NULL.

Fixes: aff3083f10bf ("net/smc: Introduce tracepoints for tx and rx msg")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Sidraya Jayagond <sidraya@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet/smc: fix sleep-inside-lock in __smc_setsockopt() causing local DoS
Nicolò Coccia [Sun, 10 May 2026 16:34:13 +0000 (12:34 -0400)] 
net/smc: fix sleep-inside-lock in __smc_setsockopt() causing local DoS

A logic flaw in __smc_setsockopt() allows a local unprivileged user to
cause a Denial of Service (DoS) by holding the socket lock indefinitely.

The function __smc_setsockopt() calls copy_from_sockptr() while holding
lock_sock(sk). By passing a userfaultfd-monitored memory page (or
FUSE-backed memory on systems where unprivileged userfaultfd is disabled)
as the optval, an attacker can halt execution during the copy operation,
keeping the lock held.

Combined with asynchronous tear-down operations like shutdown(), this
exhausts the kernel wq (kworkers) and triggers the hung task watchdog.

[  240.123456] INFO: task kworker/u8:2 blocked for more than 120 seconds.
[  240.123489] Call Trace:
[  240.123501]  smc_shutdown+...
[  240.123512]  lock_sock_nested+...

This patch moves the user-space copy outside the lock_sock() critical
section to prevent the issue.

Fixes: a6a6fe27bab4 ("net/smc: Dynamic control handshake limitation by socket options")
Signed-off-by: Nicolò Coccia <n.coccia96@gmail.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Tested-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: atm: fix skb leak in sigd_send() default branch
Wei Yang [Sat, 9 May 2026 12:23:58 +0000 (20:23 +0800)] 
net: atm: fix skb leak in sigd_send() default branch

The default branch in sigd_send() calls sock_put() and returns -EINVAL
without freeing the skb, while all other exit paths do so. Add the
missing dev_kfree_skb() before sock_put() to fix the leak.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Wei Yang <albinwyang@tencent.com>
Link: https://patch.msgid.link/20260509122358.1102997-1-albin_yang@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: ethtool: phy: avoid NULL deref when PHY driver is unbound
David Carlier [Sat, 9 May 2026 21:50:46 +0000 (22:50 +0100)] 
net: ethtool: phy: avoid NULL deref when PHY driver is unbound

phydev->drv can become NULL while the phy_device is still attached to
its net_device, namely after the PHY driver is unbound via sysfs:

echo <mdio_id> > /sys/bus/mdio_bus/drivers/<phy_drv>/unbind

phy_remove() clears phydev->drv but doesn't call phy_detach(), so the
phy_device stays in the link topology xarray and ethnl_req_get_phydev()
still hands it back. ETHTOOL_MSG_PHY_GET then oopses on:

rep_data->drvname = kstrdup(phydev->drv->name, GFP_KERNEL);

drvname is already treated as optional by phy_reply_size(),
phy_fill_reply() and phy_cleanup_data(), so just skip the allocation
when there is no driver bound.

Fixes: 9dd2ad5e92b9 ("net: ethtool: phy: Convert the PHY_GET command to generic phy dump")
Cc: stable@vger.kernel.org # 6.13.x
Signed-off-by: David Carlier <devnexen@gmail.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260509215046.107157-1-devnexen@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: atlantic: preserve PCI wake-from-D3 on shutdown when WOL enabled
Zoran Ilievski [Mon, 11 May 2026 06:40:02 +0000 (08:40 +0200)] 
net: atlantic: preserve PCI wake-from-D3 on shutdown when WOL enabled

The shutdown handler aq_pci_shutdown() unconditionally calls
pci_wake_from_d3(pdev, false), clearing the PCI PME_En bit even when
wake-on-LAN has been configured. While aq_nic_shutdown() correctly
programs the NIC firmware via aq_nic_set_power() to listen for magic
packets, the PCI subsystem will not propagate the resulting PME wake
event from D3, so the system never wakes after poweroff.

WOL from suspend (S3) is unaffected because aq_suspend_common() does
not touch pci_wake_from_d3() and relies on the PM core's wake
configuration via device_may_wakeup().

This affects all atlantic-supported NICs (AQC107/108/111/112/113);
users have reported that WOL works if the atlantic driver is never
loaded, but breaks once it has run its shutdown path.

Pass the configured WOL state to pci_wake_from_d3() instead of a
literal false, so the PCI PME_En bit is preserved when the user has
armed WOL via ethtool.

Fixes: 90869ddfefeb ("net: aquantia: Implement pci shutdown callback")
Cc: stable@vger.kernel.org
Signed-off-by: Zoran Ilievski <goodboy@rexbytes.com>
Reviewed-by: Sukhdeep Singh <sukhdeeps@marvell.com>
Link: https://patch.msgid.link/20260511064002.1857-1-goodboy@rexbytes.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>