]> git.ipfire.org Git - thirdparty/linux.git/log
thirdparty/linux.git
9 days agoMerge tag 'drm-misc-fixes-2026-06-05' of https://gitlab.freedesktop.org/drm/misc...
Dave Airlie [Fri, 5 Jun 2026 22:37:21 +0000 (08:37 +1000)] 
Merge tag 'drm-misc-fixes-2026-06-05' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

dumb-buffer:
- remove strict limits on buffer geometry

ethosu:
- reject unsupported NPU_OP_RESIZE
- fix index of IFM region
- fix weight index
- fix overflows in DMA-size calculations
- reject DMA commands with uninitialized length
- fix OOB write in ethosu_gem_cmdstream_copy_and_validate

imx:
- fix kernel-doc warnings

ivpu:
- add overflow checks in firmware handling and get_info_ioctl

v3d:
- wait for pending L2T flush before cleaning caches
- fix leak of vaddr
- skip CSD when it has zeroed workgroups
- fix ref counting in performance monitoring

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20260605072602.GA268798@linux.fritz.box
9 days agoMerge tag 'io_uring-7.1-20260605' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 5 Jun 2026 20:52:15 +0000 (13:52 -0700)] 
Merge tag 'io_uring-7.1-20260605' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull io_uring fix from Jens Axboe:
 "A single fix for a missing flag mask when multishot is used with
  an incrementally consumed buffer ring, potentially leading to
  application confusion because of lack of IORING_CQE_F_BUF_MORE
  consistency"

* tag 'io_uring-7.1-20260605' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  io_uring/net: inherit IORING_CQE_F_BUF_MORE across bundle recv retries

10 days agoMerge tag 'kbuild-fixes-7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuil...
Linus Torvalds [Fri, 5 Jun 2026 18:16:15 +0000 (11:16 -0700)] 
Merge tag 'kbuild-fixes-7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux

Pull Kbuild fix from Nicolas Schier:
 "A single simple commit that fixes the currently broken kconfig
  selftests"

* tag 'kbuild-fixes-7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
  kconfig: Fix repeated include selftest expectation

10 days agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Fri, 5 Jun 2026 17:38:45 +0000 (10:38 -0700)] 
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "arm64:
   - Correctly drop the ITS translation cache reference when it actually
     gets invalidated

   - Take the SRCU lock for SW page table walks

   - Restore POR_EL0 access to host EL0, avoiding POR_EL0 becoming
     inaccessible from EL0 after running a guest

   - Reassign nested_mmus array behind mmu_lock, ensuring that vcpu init
     and MMU notifiers are mutually exclusive

   - Correctly handle FEAT_XNX at stage-2

  s390:
   - More fixes for the new page table management and nested
     virtualization

  x86:
   - More fixes for GHCB issues:
      - Read start/end indices of page size change requests exactly once
        per vmexit
      - Unmap and unpin the GHCB as needed on vCPU free"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (23 commits)
  KVM: arm64: Correctly identify executable PTEs at stage-2
  KVM: arm64: nv: Fix handling of XN[0] when !FEAT_XNX
  KVM: arm64: Reassign nested_mmus array behind mmu_lock
  KVM: arm64: Restore POR_EL0 access to host EL0
  KVM: arm64: Take the SRCU lock for page table walks in fault injection and AT emulation
  KVM: arm64: vgic-its: Drop the translation cache reference only for the erased entry
  KVM: SEV: Unmap and unpin the GHCB as needed on vCPU free
  KVM: SEV: Decouple the need to sync the GHCB SA from the need to free the SA
  KVM: SEV: Move sev_free_vcpu() down below sev_es_unmap_ghcb()
  KVM: Don't WARN if memory is dirtied without a vCPU when the VM is dying
  KVM: SEV: Read start/end indices of PSC requests exactly once per #VMGEXIT
  KVM: SEV: Add an anonymous "psc" struct to track current PSC metadata
  KVM: SEV: Make it more obvious when KVM is writing back the current PSC index
  KVM: s390: Remove ptep_zap_softleaf_entry()
  KVM: s390: Fix possible reference leak in fault-in code
  KVM: s390: Prevent memslots outside the ASCE range
  KVM: s390: Lock pte when making page secure
  KVM: s390: Fix fault-in code
  KVM: s390: vsie: Fix rmap handling in _do_shadow_crste()
  KVM: s390: Fix guest / virtual address confusion in _essa_clear_cbrl()
  ...

10 days agoMerge tag 'probes-fixes-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 5 Jun 2026 17:33:32 +0000 (10:33 -0700)] 
Merge tag 'probes-fixes-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing/probes fix from Masami Hiramatsu:
 "Fix the eprobe event parser to point error position correctly"

* tag 'probes-fixes-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/probes: Point the error offset correctly for eprobe argument error

10 days agokconfig: Fix repeated include selftest expectation
Zhou Yuhang [Wed, 20 May 2026 07:08:00 +0000 (15:08 +0800)] 
kconfig: Fix repeated include selftest expectation

The err_repeated_inc test was added with an expected stderr fixture
that does not match the diagnostic printed by kconfig.

Running "make testconfig" currently fails in that test even though the
parser reports the duplicated include correctly:

  [stderr]
  Kconfig.inc1:4: error: repeated inclusion of Kconfig.inc3
  Kconfig.inc2:3: note: location of first inclusion of Kconfig.inc3

The fixture expects "Repeated" and "Location" with capital letters, but
the diagnostic emitted by scripts/kconfig/util.c uses lowercase words.
Update the fixture to match the real message.

Fixes: 102d712ded3e ("kconfig: Error out on duplicated kconfig inclusion")
Signed-off-by: Zhou Yuhang <zhouyuhang@kylinos.cn>
Tested-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260520070800.2265479-1-zhouyuhang1010@163.com
Signed-off-by: Nicolas Schier <nsc@kernel.org>
10 days agoMerge tag 'kvmarm-fixes-7.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmar...
Paolo Bonzini [Fri, 5 Jun 2026 16:54:37 +0000 (18:54 +0200)] 
Merge tag 'kvmarm-fixes-7.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 7.1, take #5

- Correctly drop the ITS translation cache reference when it actually
  gets invalidated

- Take the SRCU lock for SW page table walks

- Restore POR_EL0 access to host EL0, avoiding POR_EL0 becoming
  inaccessible from EL0 after running a guest

- Reassign nested_mmus array behind mmu_lock, ensuring that vcpu init
  and MMU notifiers are mutually exclusive

- Correctly handle FEAT_XNX at stage-2

10 days agoMerge tag 'nfs-for-7.1-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Linus Torvalds [Fri, 5 Jun 2026 16:34:14 +0000 (09:34 -0700)] 
Merge tag 'nfs-for-7.1-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client fix from Trond Myklebust:

 - Fix a use after free in nfs_write_completion

* tag 'nfs-for-7.1-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: write_completion: dereference loop-local req, not hdr->req

10 days agoMerge tag 'xfs-fixes-7.1-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Fri, 5 Jun 2026 15:34:32 +0000 (08:34 -0700)] 
Merge tag 'xfs-fixes-7.1-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Carlos Maiolino:
 "A collection of fixes mostly for the RT device, including a small
  refactor that has no functional change"

* tag 'xfs-fixes-7.1-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: Remove mention of PageWriteback
  xfs: abort mount if xfs_fs_reserve_ag_blocks fails
  xfs: factor rtgroup geom write pointer reporting into a helper
  xfs: drop the RTG reference later in xfs_ioc_rtgroup_geometry
  xfs: fix rtgroup cleanup in CoW fork repair
  xfs: fix error returns in CoW fork repair
  xfs: fix overlapping extents returned for pNFS LAYOUTGET
  xfs: fix use of uninitialized imap in xfs_fs_map_blocks error path
  xfs: handle racing deletions in xfs_zone_gc_iter_irec

10 days agoMerge tag 'erofs-for-7.1-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 5 Jun 2026 15:28:10 +0000 (08:28 -0700)] 
Merge tag 'erofs-for-7.1-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:

 - Fix a UAF of sbi->sync_decompress when compressed I/Os
   race with unmount

 - Fix a regression introduced this development cycle that
   incorrectly rejects multiple-algorithm images

* tag 'erofs-for-7.1-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix EFSCORRUPTED on multi-algorithm images in z_erofs_map_sanity_check()
  erofs: fix use-after-free on sbi->sync_decompress

10 days agoMerge tag 'v7.1-rc7-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Fri, 5 Jun 2026 15:23:02 +0000 (08:23 -0700)] 
Merge tag 'v7.1-rc7-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

 - Fix use after free in SMB2_CANCEL

 - Fix race in ksmbd_reopen_durable_fd

 - Fix oplock and lease break potential NULL-dref

* tag 'v7.1-rc7-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix use-after-free of a deferred file_lock on double SMB2_CANCEL
  ksmbd: fix durable reconnect double-bind race in ksmbd_reopen_durable_fd
  ksmbd: fix NULL-deref of opinfo->conn in oplock/lease break notifiers

10 days agomisc: fastrpc: fix use-after-free race in fastrpc_map_create
Zhenghang Xiao [Sat, 30 May 2026 20:45:28 +0000 (21:45 +0100)] 
misc: fastrpc: fix use-after-free race in fastrpc_map_create

fastrpc_map_lookup returns a raw pointer after releasing fl->lock. The
caller fastrpc_map_create then calls fastrpc_map_get (kref_get_unless_zero)
on this unprotected pointer. A concurrent MEM_UNMAP can free the map
between the lock release and the kref operation, resulting in a
use-after-free on the freed slab object.

Restore the take_ref parameter to fastrpc_map_lookup so the reference
is acquired atomically under fl->lock before the pointer is exposed to
the caller.

Fixes: 10df039834f8 ("misc: fastrpc: Skip reference for DMA handles")
Cc: stable@vger.kernel.org
Signed-off-by: Zhenghang Xiao <kipreyyy@gmail.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204528.116920-5-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 days agomisc: fastrpc: Fix NULL pointer dereference in rpmsg callback
Mukesh Ojha [Sat, 30 May 2026 20:45:27 +0000 (21:45 +0100)] 
misc: fastrpc: Fix NULL pointer dereference in rpmsg callback

A NULL pointer dereference was observed on Hawi at boot when the DSP
sends a glink message before fastrpc_rpmsg_probe() has completed
initialization:

  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000178
  pc : _raw_spin_lock_irqsave+0x34/0x8c
  lr : fastrpc_rpmsg_callback+0x3c/0xcc [fastrpc]
  ...
  Call trace:
   _raw_spin_lock_irqsave+0x34/0x8c (P)
   fastrpc_rpmsg_callback+0x3c/0xcc [fastrpc]
   qcom_glink_native_rx+0x538/0x6a4
   qcom_glink_smem_intr+0x14/0x24 [qcom_glink_smem]

The faulting address 0x178 corresponds to the lock variable inside
struct fastrpc_channel_ctx, confirming that cctx is NULL when
fastrpc_rpmsg_callback() attempts to take the spinlock.

There are two issues here. First, dev_set_drvdata() is called before
spin_lock_init() and idr_init(), leaving a window where the callback
can retrieve a valid cctx pointer but operate on an uninitialized
spinlock. Second, the rpmsg channel becomes live as soon as the driver
is bound, so fastrpc_rpmsg_callback() can fire before dev_set_drvdata()
is called at all, resulting in dev_get_drvdata() returning NULL.

Fix both issues by moving all cctx initialization ahead of
dev_set_drvdata() so the structure is fully initialized before it
becomes visible to the callback, and add a NULL check in
fastrpc_rpmsg_callback() as a guard against any remaining window.

Fixes: f6f9279f2bf0 ("misc: fastrpc: Add Qualcomm fastrpc basic driver model")
Cc: stable@vger.kernel.org
Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204528.116920-4-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 days agomisc: fastrpc: fix DMA address corruption due to find_vma misuse
Junrui Luo [Sat, 30 May 2026 20:45:26 +0000 (21:45 +0100)] 
misc: fastrpc: fix DMA address corruption due to find_vma misuse

fastrpc_get_args() uses find_vma() to look up the VMA for a user-provided
pointer and compute a DMA address offset. When the address falls in a gap
before the returned VMA, (ptr & PAGE_MASK) - vma->vm_start underflows,
corrupting the DMA address sent to the DSP.

Replace find_vma() with vma_lookup(), which returns NULL when the address
is not contained within any VMA.

Cc: stable@vger.kernel.org
Fixes: 80f3afd72bd4 ("misc: fastrpc: consider address offset before sending to DSP")
Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204528.116920-3-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 days agomisc: fastrpc: fix use-after-free of fastrpc_user in workqueue context
Anandu Krishnan E [Sat, 30 May 2026 20:45:25 +0000 (21:45 +0100)] 
misc: fastrpc: fix use-after-free of fastrpc_user in workqueue context

There is a race between fastrpc_device_release() and the workqueue
that processes DSP responses. When the user closes the file descriptor,
fastrpc_device_release() frees the fastrpc_user structure. Concurrently,
an in-flight DSP invocation can complete and fastrpc_rpmsg_callback()
schedules context cleanup via schedule_work(&ctx->put_work). If the
workqueue runs fastrpc_context_free() in parallel with or after
fastrpc_device_release() has freed the user structure, it dereferences
the freed fastrpc_user. Depending on the state of the context at the
time of the race, any one of the following accesses can be hit:

 1. fastrpc_buf_free() calls fastrpc_ipa_to_dma_addr(buf->fl->cctx, ...)
    to strip the SID bits from the stored IOVA before passing the
    physical address to dma_free_coherent().

 2. fastrpc_free_map() reads map->fl->cctx->vmperms[0].vmid to
    reconstruct the source permission bitmask needed for the
    qcom_scm_assign_mem() call that returns memory from the DSP VM
    back to HLOS.

 3. fastrpc_free_map() acquires map->fl->lock to safely remove the
    map node from the fl->maps list.

The resulting use-after-free manifests as:

  pc : fastrpc_buf_free+0x38/0x80 [fastrpc]
  lr : fastrpc_context_free+0xa8/0x1b0 [fastrpc]
  fastrpc_context_free+0xa8/0x1b0 [fastrpc]
  fastrpc_context_put_wq+0x78/0xa0 [fastrpc]
  process_one_work+0x180/0x450
  worker_thread+0x26c/0x388

Add kref-based reference counting to fastrpc_user. Have each invoke
context take a reference on the user at allocation time and release it
when the context is freed. Release the initial reference in
fastrpc_device_release() at file close. Move the teardown of the user
structure — freeing pending contexts, maps, mmaps, and the channel
context reference — into the kref release callback fastrpc_user_free(),
so that it runs only when the last reference is dropped, regardless of
whether that happens at device close or after the final in-flight
context completes.

Fixes: 6cffd79504ce ("misc: fastrpc: Add support for dmabuf exporter")
Cc: stable@kernel.org
Signed-off-by: Anandu Krishnan E <anandu.e@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204528.116920-2-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 days agoslimbus: qcom-ngd-ctrl: Avoid ABBA on tx_lock/ctrl->lock
Bjorn Andersson [Sat, 30 May 2026 20:44:21 +0000 (21:44 +0100)] 
slimbus: qcom-ngd-ctrl: Avoid ABBA on tx_lock/ctrl->lock

During the SSR/PDR down notification the tx_lock is taken with the
intent to provide synchronization with active DMA transfers.

But during this period qcom_slim_ngd_down() is invoked, which ends up in
slim_report_absent(), which takes the slim_controller lock. In multiple
other codepaths these two locks are taken in the opposite order (i.e.
slim_controller then tx_lock).

The result is a lockdep splat, and a possible deadlock:

  rprocctl/449 is trying to acquire lock:
  ffff00009793e620 (&ctrl->lock){+.+.}-{4:4}, at: slim_report_absent (drivers/slimbus/core.c:322) slimbus

  but task is already holding lock:
  ffff00009793fb50 (&ctrl->tx_lock){+.+.}-{4:4}, at: qcom_slim_ngd_ssr_pdr_notify (drivers/slimbus/qcom-ngd-ctrl.c:1475) slim_qcom_ngd_ctrl

  which lock already depends on the new lock.

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&ctrl->tx_lock);
                                lock(&ctrl->lock);
                                lock(&ctrl->tx_lock);
   lock(&ctrl->lock);

The assumption is that the comment refers to the desire to not call
qcom_slim_ngd_exit_dma() while we have an ongoing DMA TX transaction.
But any such transaction is initiated and completed within a single
qcom_slim_ngd_xfer_msg().

Prior to calling qcom_slim_ngd_exit_dma() the slim_controller is torn
down, all child devices are notified that the slimbus is gone and the
child devices are removed.

Stop taking the tx_lock in qcom_slim_ngd_ssr_pdr_notify() to avoid the
deadlock.

Fixes: a899d324863a ("slimbus: qcom-ngd-ctrl: add Sub System Restart support")
Cc: stable@vger.kernel.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204421.116824-9-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 days agoslimbus: qcom-ngd-ctrl: Balance pm_runtime enablement for NGD
Bjorn Andersson [Sat, 30 May 2026 20:44:20 +0000 (21:44 +0100)] 
slimbus: qcom-ngd-ctrl: Balance pm_runtime enablement for NGD

The pm_runtime_enable() and pm_runtime_use_autosuspend() calls are
supposed to be balanced on exit, add these calls.

Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver")
Cc: stable@vger.kernel.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204421.116824-8-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 days agoslimbus: qcom-ngd-ctrl: Initialize controller resources in controller
Bjorn Andersson [Sat, 30 May 2026 20:44:19 +0000 (21:44 +0100)] 
slimbus: qcom-ngd-ctrl: Initialize controller resources in controller

The work structs and work queue are controller resources, create and
destroy them in the controller context. Creating them as part of the
child device's probe path seems to be okay now that the controller's
probe has been updated, but if for some reason the child does not probe
successfully a SSR or PDR notification will schedule_work() on an
uninitialized "ngd_up_work".

Move the initialization of these controller resources to the controller
probe function to avoid any issues, and to clarify the ownership.

Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver")
Cc: stable@vger.kernel.org
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204421.116824-7-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 days agoslimbus: qcom-ngd-ctrl: Register callbacks after creating the ngd
Bjorn Andersson [Sat, 30 May 2026 20:44:18 +0000 (21:44 +0100)] 
slimbus: qcom-ngd-ctrl: Register callbacks after creating the ngd

When the remoteproc starts in parallel with the NGD driver being probed,
or the remoteproc is already up when the PDR lookup is being registered,
or in the theoretical event that we get an interrupt from the hardware,
these callbacks will operate on uninitialized data. This result in
issues to boot the affected boards.

One such example can be seen in the following fault, where
qcom_slim_ngd_ssr_pdr_notify() schedules work on the NULL ngd_up_work.

[   21.858578] ------------[ cut here ]------------
[   21.858745] WARNING: kernel/workqueue.c:2338 at __queue_work+0x5e0/0x790, CPU#2: kworker/2:2/116
...
[   21.859251] Call trace:
[   21.859255]  __queue_work+0x5e0/0x790 (P)
[   21.859265]  queue_work_on+0x6c/0xf0
[   21.859273]  qcom_slim_ngd_ssr_pdr_notify+0x110/0x150 [slim_qcom_ngd_ctrl]
[   21.859304]  qcom_slim_ngd_ssr_notify+0x24/0x40 [slim_qcom_ngd_ctrl]
[   21.859318]  notifier_call_chain+0xa4/0x230
[   21.859329]  srcu_notifier_call_chain+0x64/0xb8
[   21.859338]  ssr_notify_start+0x40/0x78 [qcom_common]
[   21.859355]  rproc_start+0x130/0x230
[   21.859367]  rproc_boot+0x3d4/0x518
...

Move the enablement of interrupts, and the registration of SSR and PDR
until after the NGD device has been registered.

This could be further refined by moving initialization to the control
driver probe and by removing the platform driver model from the picture.

Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver")
Cc: stable@vger.kernel.org
Reviewed-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204421.116824-6-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 days agoslimbus: qcom-ngd-ctrl: Correct PDR and SSR cleanup ownership
Bjorn Andersson [Sat, 30 May 2026 20:44:17 +0000 (21:44 +0100)] 
slimbus: qcom-ngd-ctrl: Correct PDR and SSR cleanup ownership

PDR and SSR callbacks are registred from the controller probe function,
but currently released from the child device's remove function.

The remove() function should only be unwinding what was done in the
same device's probe() function.

Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver")
Cc: stable@vger.kernel.org
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204421.116824-5-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 days agoslimbus: qcom-ngd-ctrl: Fix probe error path ordering
Bjorn Andersson [Sat, 30 May 2026 20:44:16 +0000 (21:44 +0100)] 
slimbus: qcom-ngd-ctrl: Fix probe error path ordering

qcom_slim_ngd_ctrl_probe() first registers the SSR callback then
allocates the PDR context, as such the error path needs to come in
opposite order to allow us to unroll each step.

Fixes: 16f14551d0df ("slimbus: qcom-ngd: cleanup in probe error path")
Cc: stable@vger.kernel.org
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204421.116824-4-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 days agoslimbus: qcom-ngd-ctrl: Fix up platform_driver registration
Bjorn Andersson [Sat, 30 May 2026 20:44:15 +0000 (21:44 +0100)] 
slimbus: qcom-ngd-ctrl: Fix up platform_driver registration

Device drivers should not invoke platform_driver_register()/unregister()
in their probe and remove paths. They should further not rely on
platform_driver_unregister() as their only means of "deleting" their
child devices.

Introduce a helper to unregister the child device and move the
platform_driver_register()/unregister() to module_init()/exit().

Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver")
Cc: stable@vger.kernel.org
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204421.116824-3-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 days agoslimbus: qcom-ngd-ctrl: fix OF node refcount
Bartosz Golaszewski [Sat, 30 May 2026 20:44:14 +0000 (21:44 +0100)] 
slimbus: qcom-ngd-ctrl: fix OF node refcount

Platform devices created with platform_device_alloc() call
platform_device_release() when the last reference to the device's
kobject is dropped. This function calls of_node_put() unconditionally.
This works fine for devices created with platform_device_register_full()
but users of the split approach (platform_device_alloc() +
platform_device_add()) must bump the reference of the of_node they
assign manually. Add the missing call to of_node_get().

Cc: stable@vger.kernel.org
Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204421.116824-2-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 days agonvmem: core: fix use-after-free bugs in error paths
Bartosz Golaszewski [Sat, 30 May 2026 20:43:40 +0000 (21:43 +0100)] 
nvmem: core: fix use-after-free bugs in error paths

Fix several instances of error paths in which we call
__nvmem_device_put() - which may end up freeing the underlying memory
and other resources - and then keep on using the nvmem structure. Always
put the reference to the nvmem device as the last step before returning
the error code.

Cc: stable@vger.kernel.org
Fixes: 7ae6478b304b ("nvmem: core: rework nvmem cell instance creation")
Fixes: e888d445ac33 ("nvmem: resolve cells from DT at registration time")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204340.116743-3-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 days agonvmem: layouts: onie-tlv: fix hang on unknown types
Andre Heider [Sat, 30 May 2026 20:43:39 +0000 (21:43 +0100)] 
nvmem: layouts: onie-tlv: fix hang on unknown types

The EEPROM on my board has a vendor specific entry of type 0x41. When
stumbling upon that, this driver hangs in an endless loop.

Fix it by keep incrementing the offset on unknown entries, so the loop
will eventually stop.

Fixes: d3c0d12f6474 ("nvmem: layouts: onie-tlv: Add new layout driver")
Cc: Stable@vger.kernel.org
Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Srinivas Kandagatla <srini@kernel.org>
Link: https://patch.msgid.link/20260530204340.116743-2-srini@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 days agoMerge tag 'svc_fixes_for_v7.1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel...
Greg Kroah-Hartman [Fri, 5 Jun 2026 15:17:30 +0000 (17:17 +0200)] 
Merge tag 'svc_fixes_for_v7.1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux into char-misc-linus

Dinh writes:

firmware: stratix10-svc and stratix10-rsu: fixes for v7.1
- Return -EOPNOTSUPP when ATF async is not supported
- Fix SVC driver from loading entirely when asynchronous ops is not
  supported in older ATF.
- Fix a NULL pointer dereference on a timeout in rsu_send_msg()

* tag 'svc_fixes_for_v7.1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
  firmware: stratix10-rsu: Fix NULL deref on rsu_send_msg() timeout in probe
  firmware: stratix10-svc: Don't fail probe when async ops unsupported
  firmware: stratix10-svc: Return -EOPNOTSUPP when ATF async unsupported

10 days agoMerge tag 'usb-serial-7.1-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel...
Greg Kroah-Hartman [Fri, 5 Jun 2026 15:16:27 +0000 (17:16 +0200)] 
Merge tag 'usb-serial-7.1-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus

Johan writes:

USB serial fixes for 7.1-rc7

Here are two fixes for buffer overflows in the io_ti driver and a new
modem device id.

All have been in linux-next with no reported issues.

* tag 'usb-serial-7.1-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
  USB: serial: option: add usb-id for Dell Wireless DW5826e-m
  USB: serial: io_ti: fix heap overflow in build_i2c_fw_hdr()
  USB: serial: io_ti: fix heap overflow in get_manuf_info()

10 days agoKVM: arm64: Correctly identify executable PTEs at stage-2
Oliver Upton [Tue, 2 Jun 2026 16:59:01 +0000 (09:59 -0700)] 
KVM: arm64: Correctly identify executable PTEs at stage-2

KVM invalidates the I-cache before installing an executable PTE on
implementations without DIC. Unfortunately, support for FEAT_XNX
broke this check as KVM_PTE_LEAF_ATTR_HI_S2_XN was expanded to a
bitfield.

Fix it by reusing kvm_pgtable_stage2_pte_prot() and testing the abstract
permission bits instead.

Fixes: 2608563b466b ("KVM: arm64: Add support for FEAT_XNX stage-2 permissions")
Reported-by: Sashiko (gemini/gemini-3.1-pro-preview)
Signed-off-by: Oliver Upton <oupton@kernel.org>
Reviewed-by: Wei-Lin Chang <weilin.chang@arm.com>
Link: https://patch.msgid.link/20260602165901.52800-3-oupton@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
10 days agoKVM: arm64: nv: Fix handling of XN[0] when !FEAT_XNX
Oliver Upton [Tue, 2 Jun 2026 16:59:00 +0000 (09:59 -0700)] 
KVM: arm64: nv: Fix handling of XN[0] when !FEAT_XNX

XN has already been extracted from its bitfield position so using
FIELD_PREP() on the mask that clears XN[0] is completely broken, having
the effect of unconditionally granting execute permissions...

Fix the obvious mistake by manipulating the right bit.

Cc: stable@vger.kernel.org
Fixes: d93febe2ed2e ("KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2")
Reviewed-by: Wei-Lin Chang <weilin.chang@arm.com>
Signed-off-by: Oliver Upton <oupton@kernel.org>
Link: https://patch.msgid.link/20260602165901.52800-2-oupton@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
10 days agocleanup: Specify nonnull argument index
Dmitry Ilvokhin [Fri, 5 Jun 2026 10:06:22 +0000 (03:06 -0700)] 
cleanup: Specify nonnull argument index

The guard constructors were annotated with an empty __nonnull_args(),
relying on __nonnull__() marking every pointer parameter as non-NULL.
Sparse cannot parse the empty argument list.

Both constructors take the lock pointer as their first parameter, so
specify the index explicitly: __nonnull_args(1).

Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/all/aiJi0WcYE8FZt-jO@stanley.mountain/
Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/aiKpH3cLBEj3TF2Q@shell.ilvokhin.com
10 days agofs/read_write: Do not export __kernel_write() to the entire world
Andy Shevchenko [Thu, 4 Jun 2026 09:52:02 +0000 (11:52 +0200)] 
fs/read_write: Do not export __kernel_write() to the entire world

Since we have EXPORT_SYMBOL_FOR_MODULES(), we may narrow
the __kernel_write() export to the only which really needs it.
With that being done, update the respective comment.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20260604095233.284067-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
10 days agoptp: vmclock: Use hw_cycles from snapshot for precise TSC pairing
David Woodhouse [Thu, 4 Jun 2026 09:35:18 +0000 (10:35 +0100)] 
ptp: vmclock: Use hw_cycles from snapshot for precise TSC pairing

When the system clocksource is kvmclock or Hyper-V (not the TSC directly),
vmclock_get_crosststamp() falls through to a separate get_cycles() call,
losing the atomic pairing between the system time snapshot and the TSC
reading.

Now that ktime_get_snapshot_id() populates hw_cycles with the underlying
TSC value for derived clocksources, use it when available.  This gives a
perfect (system_time, tsc) pairing for the device time calculation.

The SUPPORT_KVMCLOCK wrapper is still needed to convert the TSC into
kvmclock nanoseconds for system_counter->cycles, because otherwise
get_device_system_crosststamp() can't interpret the result against the
system clock.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Assisted-by: Kiro:claude-opus-4.6-1m
Link: https://patch.msgid.link/20260604095755.64849-4-dwmw2@infradead.org
10 days agox86/kvmclock: Implement read_snapshot() for kvmclock clocksource
David Woodhouse [Thu, 4 Jun 2026 09:35:17 +0000 (10:35 +0100)] 
x86/kvmclock: Implement read_snapshot() for kvmclock clocksource

Implement the read_snapshot() callback for the kvmclock clocksource.  This
returns the kvmclock nanosecond value (for timekeeping) while also
providing the raw TSC value that was used to compute it.

The TSC is read inside the pvclock seqlock-protected region, ensuring the
raw TSC and derived kvmclock value are atomically paired.

This enables ktime_get_snapshot_id() to provide the raw TSC to consumers
like the vmclock PTP driver, which currently has to do a separate call to
get_cycles() to obtain a value at *approximately* the same time, to feed
through the vmclock calculation.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Assisted-by: Kiro:claude-opus-4.6-1m
Link: https://patch.msgid.link/20260604095755.64849-3-dwmw2@infradead.org
10 days agoclocksource/hyperv: Implement read_snapshot() for TSC page clocksource
David Woodhouse [Thu, 4 Jun 2026 09:35:16 +0000 (10:35 +0100)] 
clocksource/hyperv: Implement read_snapshot() for TSC page clocksource

Implement the read_snapshot() callback for the Hyper-V TSC page clock-
source. This returns the derived 10MHz reference time (for timekeeping)
while also providing the raw TSC value that was used to compute it.

When the TSC page is valid, hv_read_tsc_page_tsc() atomically captures both
values from a single RDTSC inside the sequence-counter protected read. When
the TSC page is invalid (sequence == 0), the hw_csid and hw_cycles are set
to zero indicating no value is available.

This enables ktime_get_snapshot_id() to provide the raw TSC to consumers
like KVM's master clock when running nested guests under Hyper-V.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Assisted-by: Kiro:claude-opus-4.6-1m
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://patch.msgid.link/20260604095755.64849-2-dwmw2@infradead.org
10 days agoiomap: Add IOMAP_F_ZERO_TAIL flag to trace event strings
Namjae Jeon [Wed, 3 Jun 2026 14:40:31 +0000 (23:40 +0900)] 
iomap: Add IOMAP_F_ZERO_TAIL flag to trace event strings

Add IOMAP_F_ZERO_TAIL to the flag string mapping in iomap trace
events. This allows the new flag to be properly displayed in
ftrace output when iomap operations use it.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Link: https://patch.msgid.link/20260603144031.7370-1-linkinjeon@kernel.org
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
10 days agoMerge patch series "vfs infrastructure for fs-verity support for XFS with post EOF...
Christian Brauner [Thu, 4 Jun 2026 11:47:27 +0000 (13:47 +0200)] 
Merge patch series "vfs infrastructure for fs-verity support for XFS with post EOF merkle tree"

Christian Brauner <brauner@kernel.org> says:

This brings in the vfs infrastructure required to implement fs-verity
support for XFS.

* patches from https://patch.msgid.link/20260520123722.405752-1-aalbersh@kernel.org:
  iomap: introduce iomap_fsverity_write() for writing fsverity metadata
  iomap: teach iomap to read files with fsverity
  iomap: introduce IOMAP_F_FSVERITY and teach writeback to handle fsverity
  fsverity: generate and store zero-block hash

Link: https://patch.msgid.link/20260520123722.405752-1-aalbersh@kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
10 days agoiomap: introduce iomap_fsverity_write() for writing fsverity metadata
Andrey Albershteyn [Wed, 20 May 2026 12:37:07 +0000 (14:37 +0200)] 
iomap: introduce iomap_fsverity_write() for writing fsverity metadata

This is just a wrapper around iomap_file_buffered_write() to create
necessary iterator over metadata.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Link: https://patch.msgid.link/20260520123722.405752-10-aalbersh@kernel.org
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
10 days agoiomap: teach iomap to read files with fsverity
Andrey Albershteyn [Wed, 20 May 2026 12:37:06 +0000 (14:37 +0200)] 
iomap: teach iomap to read files with fsverity

Obtain fsverity info for folios with file data and fsverity metadata.
Filesystem can pass vi down to ioend and then to fsverity for
verification. This is different from other filesystems ext4, f2fs, btrfs
supporting fsverity, these filesystems don't need fsverity_info for
reading fsverity metadata. While reading merkle tree iomap requires
fsverity info to synthesize hashes for zeroed data block.

fsverity metadata has two kinds of holes - ones in merkle tree and one
after fsverity descriptor.

Merkle tree holes are blocks full of hashes of zeroed data blocks. These
are not stored on the disk but synthesized on the fly. This saves a bit
of space for sparse files. Due to this iomap also need to lookup
fsverity_info for folios with fsverity metadata. ->vi has a hash of the
zeroed data block which will be used to fill the merkle tree block.

The hole past descriptor is interpreted as end of metadata region. As we
don't have EOF here we use this hole as an indication that rest of the
folio is empty. This patch marks rest of the folio beyond fsverity
descriptor as uptodate.

For file data, fsverity needs to verify consistency of the whole file
against the root hash, hashes of holes are included in the merkle tree.
Verify them too.

Issue reading of fsverity merkle tree on the fsverity inodes. This way
metadata will be available at I/O completion time.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Link: https://patch.msgid.link/20260520123722.405752-9-aalbersh@kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
10 days agoiomap: introduce IOMAP_F_FSVERITY and teach writeback to handle fsverity
Andrey Albershteyn [Wed, 20 May 2026 12:37:05 +0000 (14:37 +0200)] 
iomap: introduce IOMAP_F_FSVERITY and teach writeback to handle fsverity

This flag indicates that I/O is for fsverity metadata.

In the write path skip i_size check and i_size updates as metadata is
past EOF. In writeback don't update i_size and continue writeback if
even folio is beyond EOF. In read path don't zero fsverity folios, again
they are past EOF.

The iomap_block_needs_zeroing() is also called from write path. For
folios of larger order we don't want to zero out pages in the folio as
these could contain other merkle tree blocks. For fsverity, filesystem
will request to read PAGE_SIZE memory regions. For data folios, iomap
will zero the rest of the folio for anything which is beyond EOF. We
don't want this for fsverity folios.

Christian Brauner <brauner@kernel.org> says:
Changed IOMAP_F_FSVERITY from (1U << 10) to (1U << 11) to avoid colliding
with IOMAP_F_ZERO_TAIL, which already uses (1U << 10).

Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Link: https://patch.msgid.link/20260520123722.405752-8-aalbersh@kernel.org
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
10 days agofsverity: generate and store zero-block hash
Andrey Albershteyn [Wed, 20 May 2026 12:37:02 +0000 (14:37 +0200)] 
fsverity: generate and store zero-block hash

Compute the hash of one filesystem block's worth of zeros. A filesystem
implementation can decide to elide merkle tree blocks containing only
this hash and synthesize the contents at read time.

Let's pretend that there's a file containing 131 data block and whose
merkle tree looks roughly like this:

root
 +--leaf0
 |   +--data0
 |   +--data1
 |   +--...
 |   `--data128
 `--leaf1
     +--data129
     +--data130
     `--data131

If data[0-128] are sparse holes, then leaf0 will contain a repeating
sequence of @zero_digest.  Therefore, leaf0 need not be written to disk
because its contents can be synthesized.

A subsequent xfs patch will use this to reduce the size of the merkle
tree when dealing with sparse gold master disk images and the like.

Note that this works only on the first-level (data holes). fsverity
doesn't store/generate zero_digest for any higher levels.

Add a helper to pre-fill folio with hashes of empty blocks. This will be
used by iomap to synthesize blocks full of zero hashes on the fly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Link: https://patch.msgid.link/20260520123722.405752-5-aalbersh@kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
10 days agoio_uring/net: inherit IORING_CQE_F_BUF_MORE across bundle recv retries
Clément Léger [Thu, 4 Jun 2026 16:07:13 +0000 (09:07 -0700)] 
io_uring/net: inherit IORING_CQE_F_BUF_MORE across bundle recv retries

When a bundle recv retries inside io_recv_finish(), the merge logic OR
the saved cflags from the previous iteration with the cflags returned by
the new iteration:
  cflags = req->cqe.flags | (cflags & CQE_F_MASK);

Bits listed in CQE_F_MASK are inherited from the new iteration, and all
other bits (notably IORING_CQE_F_BUFFER and the buffer ID) come from the
saved cflags. Before this change CQE_F_MASK covered only
IORING_CQE_F_SOCK_NONEMPTY and IORING_CQE_F_MORE.

When using provided buffer rings (IOU_PBUF_RING_INC) with incremental
mode, and bundle recv, io_kbuf_inc_commit() can leave the head ring
entry partially consumed, __io_put_kbufs() then sets
IORING_CQE_F_BUF_MORE on the returned cflags so userspace knows the
buffer ID will be reused for subsequent completions.

Because IORING_CQE_F_BUF_MORE was not in CQE_F_MASK, the merge above
silently dropped it whenever the final retry iteration partially
consumed the buffer, and the subsequent req->cqe.flags = cflags &
~CQE_F_MASK save would have left a stale IORING_CQE_F_BUF_MORE in the
carried-over cflags had one been present. Userspace would then
wrongfully advance it ring head past an entry the kernel still uses.

Add IORING_CQE_F_BUF_MORE to CQE_F_MASK so it is both inherited from the
new iteration into the user-visible CQE and stripped from the saved
cflags between iterations.

Cc: stable@vger.kernel.org
Signed-off-by: Clément Léger <cleger@meta.com>
Assisted-by: Claude:claude-opus-4.6
Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption")
Link: https://patch.msgid.link/20260604160715.2482972-1-cleger@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 days agoxfrm: espintcp: do not reuse an in-progress partial send
Wyatt Feng [Tue, 2 Jun 2026 16:46:27 +0000 (00:46 +0800)] 
xfrm: espintcp: do not reuse an in-progress partial send

espintcp keeps a single in-flight transmit in ctx->partial.
Before building a new sk_msg, espintcp_sendmsg() first tries to flush
that state through espintcp_push_msgs().

For blocking callers, espintcp_push_msgs() may return success even when
the previous partial send is still pending. espintcp_sendmsg() would
then reinitialize emsg->skmsg and reuse ctx->partial while the old
transfer still owns that state.

Do not rebuild the send message when ctx->partial is still in progress.
If espintcp_push_msgs() returns with emsg->len still set, fail the new
send instead of overwriting the live partial state.

This is a memory-safety fix: reusing the live partial-send state can
leave a stale offset attached to a new sk_msg and lead to an out-of-
bounds read in the send path.

tcp_sendmsg_locked() already handles waiting for send buffer memory, so
the fix here is just to preserve espintcp's one-message-at-a-time
transmit state.

Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Zhengchuan Liang <zcliangcn@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Assisted-by: Codex:GPT-5.4
Signed-off-by: Wyatt Feng <bronzed_45_vested@icloud.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
10 days agoxfrm: iptfs: fix ABBA deadlock in iptfs_destroy_state()
Tristan Madani [Tue, 2 Jun 2026 17:16:41 +0000 (17:16 +0000)] 
xfrm: iptfs: fix ABBA deadlock in iptfs_destroy_state()

iptfs_destroy_state() calls hrtimer_cancel() while holding a spinlock
that the timer callback also acquires, leading to an ABBA deadlock on
SMP systems.

For the output timer (iptfs_timer):
  - iptfs_destroy_state() holds x->lock, calls hrtimer_cancel()
  - iptfs_delay_timer() callback takes x->lock

For the drop timer (drop_timer):
  - iptfs_destroy_state() holds drop_lock, calls hrtimer_cancel()
  - iptfs_drop_timer() callback takes drop_lock

Both timers use HRTIMER_MODE_REL_SOFT, so their callbacks run in softirq
context.  When hrtimer_cancel() is called for a soft timer that is
currently executing on another CPU, hrtimer_cancel_wait_running() spins
on softirq_expiry_lock -- the same lock held by the softirq running the
callback.  If the callback is blocked waiting for the spinlock held by
the caller of hrtimer_cancel(), a circular dependency forms:

  CPU 0: holds lock_A -> waits for softirq_expiry_lock
  CPU 1: holds softirq_expiry_lock -> waits for lock_A

Fix by calling hrtimer_cancel() before acquiring the respective locks.
hrtimer_cancel() is safe to call without holding any lock and will wait
for any in-progress callback to complete.  For the output timer, the
lock is still acquired afterwards to drain the packet queue.  For the
drop timer, the lock/unlock pair is removed entirely since it only
existed to serialize with the timer callback, which hrtimer_cancel()
already guarantees.

Found by source code audit.

Fixes: 4b3faf610cc6 ("xfrm: iptfs: add new iptfs xfrm mode impl")
Cc: Christian Hopps <chopps@labn.net>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: stable@vger.kernel.org
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
10 days agoKVM: arm64: Reassign nested_mmus array behind mmu_lock
Hyunwoo Kim [Fri, 5 Jun 2026 08:27:01 +0000 (17:27 +0900)] 
KVM: arm64: Reassign nested_mmus array behind mmu_lock

kvm->arch.nested_mmus[] is walked under kvm->mmu_lock, including from the
MMU notifier path (kvm_unmap_gfn_range() -> kvm_nested_s2_unmap()), which
can run at any time. kvm_vcpu_init_nested() reallocates the array and frees
the old buffer while holding only kvm->arch.config_lock, so such a walker
can reference the freed array.

Allocate the new array outside of mmu_lock, as the allocation can sleep.
Under the lock, copy the existing entries, fix up the back pointers and
reassign the array. Free the old buffer after dropping the lock, as
kvfree() can sleep as well.

Fixes: 4f128f8e1aaac ("KVM: arm64: nv: Support multiple nested Stage-2 mmu structures")
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Reviewed-by: Oliver Upton <oupton@kernel.org>
Link: https://patch.msgid.link/aiKIVVeIr1aAB1yp@v4bel
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger,kernel.org
10 days agoKVM: arm64: Restore POR_EL0 access to host EL0
Joey Gouly [Thu, 4 Jun 2026 10:54:34 +0000 (11:54 +0100)] 
KVM: arm64: Restore POR_EL0 access to host EL0

CPTR_EL2.E0POE was being cleared in __deactivate_cptr_traps_vhe(), which meant
that any accesses to POR_EL0 from host EL0 would trap and be reported to
userspace as an Illegal instruction. This would happen after running any VM,
regardless if it used POE or not.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Link: https://sashiko.dev/#/patchset/20260602155430.2088142-1-maz@kernel.org?part=1
Link: https://patch.msgid.link/20260604105434.2297268-1-joey.gouly@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger,kernel.org
10 days agoRevert "drm/i915/backlight: Remove try_vesa_interface"
Suraj Kandpal [Sun, 17 May 2026 02:47:09 +0000 (08:17 +0530)] 
Revert "drm/i915/backlight: Remove try_vesa_interface"

This reverts commit 40d2f5820951dee818d05c14677277048bd85f9f.

Removing the try_vesa_interface gate caused a backlight regression on
panels whose VBT correctly reports INTEL_BACKLIGHT_DISPLAY_DDI and whose
PWM path is the actual backlight control, but whose DPCD optimistically
advertises DP_EDP_BACKLIGHT_AUX_ENABLE_CAP / _BRIGHTNESS_AUX_SET_CAP.
After the commit such panels silently bind to the VESA AUX backlight
funcs; AUX writes complete but the panel ignores them, leaving
brightness stuck (no-op backlight). Observed on at least KBL and TGL
eDP setups.

Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com>
Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Link: https://patch.msgid.link/20260517024709.1016121-1-suraj.kandpal@intel.com
(cherry picked from commit f30fddb4402313aa5301a74d721638d343395269)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
10 days agox86/process: Convert rdmsr() to rdmsrq() in arch_post_acpi_subsys_init() to address...
HyeongJun An [Thu, 4 Jun 2026 15:00:52 +0000 (00:00 +0900)] 
x86/process: Convert rdmsr() to rdmsrq() in arch_post_acpi_subsys_init() to address W=1 warning

arch_post_acpi_subsys_init() reads MSR_K8_INT_PENDING_MSG with rdmsr()
into a lo/hi pair but only uses the low 32 bits: K8_INTP_C1E_ACTIVE_MASK
(0x18000000) lies entirely within them. The 'hi' half is never consumed,
which triggers a -Wunused-but-set-variable warning under W=1:

  arch/x86/kernel/process.c: In function 'arch_post_acpi_subsys_init':
  arch/x86/kernel/process.c:972:17: warning: variable 'hi' set but not used

Read the full MSR into a single u64 with rdmsrq() and test the mask
against it, dropping the now-unnecessary lo/hi variables.

No functional change intended.

Signed-off-by: HyeongJun An <sammiee5311@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Jürgen Groß <jgross@suse.com>
Link: https://patch.msgid.link/20260604150052.3337246-1-sammiee5311@gmail.com
10 days agoKVM: arm64: Take the SRCU lock for page table walks in fault injection and AT emulation
Hyunwoo Kim [Wed, 3 Jun 2026 12:09:33 +0000 (21:09 +0900)] 
KVM: arm64: Take the SRCU lock for page table walks in fault injection and AT emulation

walk_s1() and kvm_walk_nested_s2() expect to be called while holding
kvm->srcu to guard against memslot changes. While this is generally
the case, __kvm_at_s12() and __kvm_find_s1_desc_level() call into the
respective walkers without taking kvm->srcu.

Fix by acquiring kvm->srcu prior to the table walk in both instances.

Cc: stable@vger.kernel.org
Fixes: 50f77dc87f13 ("KVM: arm64: Populate level on S1PTW SEA injection")
Fixes: be04cebf3e78 ("KVM: arm64: nv: Add emulation of AT S12E{0,1}{R,W}")
Suggested-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Reviewed-by: Oliver Upton <oupton@kernel.org>
Link: https://patch.msgid.link/aiAZfdeyanIvP8SD@v4bel
Signed-off-by: Marc Zyngier <maz@kernel.org>
10 days agoKVM: arm64: vgic-its: Drop the translation cache reference only for the erased entry
Hyunwoo Kim [Mon, 1 Jun 2026 14:53:26 +0000 (23:53 +0900)] 
KVM: arm64: vgic-its: Drop the translation cache reference only for the erased entry

vgic_its_invalidate_cache() walks the per-ITS translation cache with
xa_for_each() and drops the cache's reference on each entry with
vgic_put_irq(). It puts the iterated pointer, though, rather than the
value returned by xa_erase().

The function is called from contexts that do not exclude one another: the
ITS command handlers hold its_lock, the GITS_CTLR write path holds
cmd_lock, and the path that clears EnableLPIs in a redistributor's
GICR_CTLR holds neither. Two or more of them can drain the same cache
concurrently, and if each one observes the same entry, erases it and then
puts it, the single reference the cache holds on that entry is dropped
more than once. The entry can then be freed while an ITE still maps it.

xa_erase() is atomic and returns the previous entry, so put only the entry
that this context actually removed. The cache reference is then dropped
exactly once per entry even when the invalidations run concurrently, and
the behavior is unchanged when only one context runs.

Fixes: 8201d1028caa ("KVM: arm64: vgic-its: Maintain a translation cache per ITS")
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Reviewed-by: Oliver Upton <oupton@kernel.org>
Link: https://patch.msgid.link/ah2c5lu4JbUg7dj-@v4bel
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
10 days agoirqchip/irq-realtek-rtl: Add multicore support
Markus Stockhausen [Thu, 4 Jun 2026 18:25:06 +0000 (20:25 +0200)] 
irqchip/irq-realtek-rtl: Add multicore support

The Realtek interrupt driver currently supports only single core
systems. So the higher end devices like RTL839x and RTL930x with
dual VPEs must be driven with NR_CPU=1. Enhance the driver to
support multicore (dual VPE) systems. For this:

  - Extend the register map for multiple cores
  - Search for multiple CPU cores in the devicetree
  - Improve the register helpers to support multiple cores
  - Add an affinity setter
  - Enhance the IRQ handler for multiple cores

Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260604182506.1113440-3-markus.stockhausen@gmx.de
10 days agoirqchip/irq-realtek-rtl: Add/simplify register helpers
Markus Stockhausen [Thu, 4 Jun 2026 18:25:05 +0000 (20:25 +0200)] 
irqchip/irq-realtek-rtl: Add/simplify register helpers

The Realtek interrupt controller has two important registers that are used
by the driver in several places

 - GIMR: global interrupt mask register
 - IRR: Interrupt routing registers

The usage of these registers is very inconsistent. GIMR is addressed
directly while IRR has a helper that needs a macro as an input. Harmonize
this by providing consistent helpers that improve code readability.

The callers of these helpers use classic lock/unlock functions and
sometimes use the wrong locking helper. E.g. irqsave variants are used in
mask/unmask although not needed. Adapt and fix the surrounding call
locations.

Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260604182506.1113440-2-markus.stockhausen@gmx.de
10 days agox86/resctrl: Only check Intel systems for SNC
Tony Luck [Fri, 5 Jun 2026 04:46:49 +0000 (21:46 -0700)] 
x86/resctrl: Only check Intel systems for SNC

topology_num_nodes_per_package() reports values greater than one on certain
AMD systems resulting in resctrl's Intel model specific SNC detection
printing the confusing message:

   "CoD enabled system? Resctrl not supported"

Add a check for Intel systems before looking at the topology.

[ reinette: Add Closes tag, fix tag typos, rework changelog ]

Fixes: 59674fc9d0bf ("x86/resctrl: Fix SNC detection")
Reported-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://patch.msgid.link/9849330f45ac86344cc5ac54df2d313906d70bc4.1780634584.git.reinette.chatre@intel.com
Closes: https://lore.kernel.org/lkml/37ac0376-43a3-4283-a3d5-4d57b3bec578@amd.com/
10 days agoiomap: introduce IOMAP_F_ZERO_TAIL flag
Namjae Jeon [Mon, 18 May 2026 11:46:55 +0000 (20:46 +0900)] 
iomap: introduce IOMAP_F_ZERO_TAIL flag

In filesystems that maintain a separate Valid Data Length, such as exFAT
and NTFS, a partial write may start at or beyond the current valid_size and
extend it. In this case, the region after the previous valid_size but
within the same filesystem block is considered unwritten.

This patch introduces IOMAP_F_ZERO_TAIL. When this flag is set in iomap,
__iomap_write_begin() will zero only the tail portion while preserving any
valid data before it in the same block.

Without this tail zeroing, stale data in the unwritten portion of the block
can remain in the page cache. Subsequent reads can then return incorrect
contents from that region.

Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Link: https://patch.msgid.link/20260518114705.9601-2-linkinjeon@kernel.org
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
10 days agorust: ptr: remove implicit index projection syntax
Gary Guo [Tue, 2 Jun 2026 14:17:57 +0000 (15:17 +0100)] 
rust: ptr: remove implicit index projection syntax

All users have been converted to use keyworded index projection syntax to
explicitly state their intention when doing index projection.

Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Gary Guo <gary@garyguo.net>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260602-projection-syntax-rework-v2-6-6989470f5440@garyguo.net
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
10 days agogpu: nova-core: convert to keyworded projection syntax
Gary Guo [Tue, 2 Jun 2026 14:17:56 +0000 (15:17 +0100)] 
gpu: nova-core: convert to keyworded projection syntax

Use "build" to denote that the index bounds checking here is performed at
build time.

Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Gary Guo <gary@garyguo.net>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260602-projection-syntax-rework-v2-5-6989470f5440@garyguo.net
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
10 days agorust: dma: update to keyworded index projection syntax
Gary Guo [Tue, 2 Jun 2026 14:17:55 +0000 (15:17 +0100)] 
rust: dma: update to keyworded index projection syntax

Demonstrate the preferred syntax of index projection in DMA documentation
and examples. A few `[i]?` cases are converted to demonstrate the new
variant.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Gary Guo <gary@garyguo.net>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260602-projection-syntax-rework-v2-4-6989470f5440@garyguo.net
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
10 days agorust: ptr: add panicking index projection variant
Gary Guo [Tue, 2 Jun 2026 14:17:54 +0000 (15:17 +0100)] 
rust: ptr: add panicking index projection variant

There have been a few cases where the programmer knows that the indices are
in bounds but the compiler cannot deduce that. This is also
compiler-version-dependent, so using build indexing here can be
problematic. On the other hand, it is also not ideal to use the fallible
variant, as it adds an error handling path that is never hit.

Add a new panicking index projection for this scenario. Like all panicking
operations, this should be used carefully only in cases where the user
knows the index is going to be in bounds, and panicking would indicate
something is catastrophically wrong.

To signify this, require users to explicitly denote the type of index being
used. The existing two types of index projections also gain the keyworded
version, which will be the recommended way going forward.

The keyworded syntax also paves the way of perhaps adding more flavors in
the future, e.g. `unsafe` index projection. However, unless the code is
extremely performance sensitive and bounds checking cannot be tolerated,
the panicking variant is safer and should be preferred, so it will be left
to the future when demand arises.

Signed-off-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260602-projection-syntax-rework-v2-3-6989470f5440@garyguo.net
[ Fixed broken intra-doc link. Added a few extra intra-doc links. Reworded
  some docs slightly. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
10 days agorust: ptr: use `match` instead of `unwrap_or_else` for `build_index`
Gary Guo [Tue, 2 Jun 2026 14:17:53 +0000 (15:17 +0100)] 
rust: ptr: use `match` instead of `unwrap_or_else` for `build_index`

Use `match` to avoid potential inlining issues of the `unwrap_or_else`
function.

Suggested-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/rust-for-linux/aeCKlut-88SbNsyW@google.com/
Signed-off-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260602-projection-syntax-rework-v2-2-6989470f5440@garyguo.net
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
10 days agorust: ptr: rename `ProjectIndex::index` to `build_index`
Gary Guo [Tue, 2 Jun 2026 14:17:52 +0000 (15:17 +0100)] 
rust: ptr: rename `ProjectIndex::index` to `build_index`

The corresponding `SliceIndex` trait in Rust uses `index` to mean the
panicking variant, which is also being added to `ProjectIndex`. Hence
rename our custom `build_error!` index variant to `build_index`.

Suggested-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://lore.kernel.org/rust-for-linux/DI5LLN2V3XCS.34H4CG99N4MPA@nvidia.com
Signed-off-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260602-projection-syntax-rework-v2-1-6989470f5440@garyguo.net
[ Reworded docs slightly. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
10 days agoALSA: seq: dummy: fix UMP event stack overread
Kyle Zeng [Fri, 5 Jun 2026 08:02:04 +0000 (01:02 -0700)] 
ALSA: seq: dummy: fix UMP event stack overread

The dummy sequencer port forwards events by copying an incoming
struct snd_seq_event into a stack temporary, rewriting source and
destination, and dispatching the temporary to subscribers. That legacy
event storage is smaller than struct snd_seq_ump_event.

When a UMP event reaches the dummy client, the copy leaves the UMP flag
set but only provides legacy-sized stack storage. The subscriber
delivery path then uses snd_seq_event_packet_size() and copies a
UMP-sized packet from that stack object, reading past the end of the
temporary.

Use the existing union __snd_seq_event storage and copy the packet size
reported for the incoming event before rewriting the common routing
fields. This preserves the full UMP packet for UMP events while keeping
legacy event handling unchanged.

Fixes: 32cb23a0f911 ("ALSA: seq: dummy: Allow UMP conversion")
Signed-off-by: Kyle Zeng <kylebot@openai.com>
Link: https://patch.msgid.link/20260605080204.32045-1-kylebot@openai.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
10 days agoMerge patch series "proc: protect ptrace_may_access() with exec_update_lock"
Christian Brauner [Fri, 22 May 2026 11:49:13 +0000 (13:49 +0200)] 
Merge patch series "proc: protect ptrace_may_access() with exec_update_lock"

Jann Horn <jannh@google.com> says:

My understanding is that procfs is effectively maintained by the VFS
maintainers (though scripts/get_maintainer.pl claims that there are
no maintainers for procfs because the VFS entry only claims files
directly in fs/, and the procfs entry has no maintainers listed on
it).

In procfs, most uses of ptrace_may_access() should use
exec_update_lock to avoid TOCTOU issues with concurrent privileged
execve() (like setuid binary execution).

This series doesn't fix all the remaining issues in procfs, but it fixes
the easy cases for now; I will probably follow up with fixes for the
gnarlier cases later unless someone else wants to do that.

I have checked that procfs files still work with these changes and that
CONFIG_PROVE_LOCKING=y doesn't generate any warnings.

(checkpatch complains about missing argument names in
proc_op::proc_get_link, but that was already the case before my patch.)

* patches from https://patch.msgid.link/20260518-procfs-lockfix-part1-v1-0-5c3d20e0ac33@google.com:
  proc: protect ptrace_may_access() with exec_update_lock (FD links)
  proc: protect ptrace_may_access() with exec_update_lock (part 1)

Link: https://patch.msgid.link/20260518-procfs-lockfix-part1-v1-0-5c3d20e0ac33@google.com
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
10 days agoproc: protect ptrace_may_access() with exec_update_lock (FD links)
Jann Horn [Mon, 18 May 2026 16:35:16 +0000 (18:35 +0200)] 
proc: protect ptrace_may_access() with exec_update_lock (FD links)

proc_pid_get_link() and proc_pid_readlink() currently look up the task from
the pid once, then do the ptrace access check on that task, then look up
the task from the pid a second time to do the actual access.
That's racy in several ways.

To fix it, pass the task to the ->proc_get_link() handler, and instead of
proc_fd_access_allowed(), introduce a new helper call_proc_get_link() that
looks up and locks the task, does the access check, and calls
->proc_get_link().

Fixes: 778c1144771f ("[PATCH] proc: Use sane permission checks on the /proc/<pid>/fd/ symlinks")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://patch.msgid.link/20260518-procfs-lockfix-part1-v1-2-5c3d20e0ac33@google.com
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
10 days agoproc: protect ptrace_may_access() with exec_update_lock (part 1)
Jann Horn [Mon, 18 May 2026 16:35:15 +0000 (18:35 +0200)] 
proc: protect ptrace_may_access() with exec_update_lock (part 1)

Fix the easy cases where procfs currently calls ptrace_may_access() without
exec_update_lock protection, where the fix is to simply add the extra lock
or use mm_access():

 - do_task_stat(): grab exec_update_lock
 - proc_pid_wchan(): grab exec_update_lock
 - proc_map_files_lookup(): use mm_access() instead of get_task_mm()
 - proc_map_files_readdir(): use mm_access() instead of get_task_mm()
 - proc_ns_get_link(): grab exec_update_lock
 - proc_ns_readlink(): grab exec_update_lock

Fixes: f83ce3e6b02d ("proc: avoid information leaks to non-privileged processes")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://patch.msgid.link/20260518-procfs-lockfix-part1-v1-1-5c3d20e0ac33@google.com
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
10 days agosimple_lookup(): use d_splice_alias() for ->lookup() return value
Al Viro [Sat, 9 May 2026 16:29:41 +0000 (12:29 -0400)] 
simple_lookup(): use d_splice_alias() for ->lookup() return value

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agoecryptfs: use d_splice_alias() for ->lookup() return value
Al Viro [Sat, 9 May 2026 16:28:48 +0000 (12:28 -0400)] 
ecryptfs: use d_splice_alias() for ->lookup() return value

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agoconfigfs_lookup(): switch to d_splice_alias()
Al Viro [Fri, 8 May 2026 21:58:35 +0000 (17:58 -0400)] 
configfs_lookup(): switch to d_splice_alias()

more idiomatic

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agotracefs: use d_splice_alias() in ->lookup() instances
Al Viro [Tue, 27 Jan 2026 20:19:06 +0000 (15:19 -0500)] 
tracefs: use d_splice_alias() in ->lookup() instances

d_add() is not wrong there (inodes are freshly allocated), but
d_splice_alias() is more idiomatic.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agomake cursors NORCU
Al Viro [Tue, 5 May 2026 04:20:19 +0000 (00:20 -0400)] 
make cursors NORCU

All it requires is making sure that d_walk() will skip *all*
CURSOR dentries, even if somebody passes it one as an argument.

Cursors are negative and unhashed all along, never get added to
LRU or to shrink lists and no RCU references via ->d_sib are
possible for those - dentry_unlist() makes sure that no killed
dentry has ->d_sib.next left pointing to a cursor.

Seeing that a cursor is allocated every time we open a directory
on autofs, debugfs, devpts, etc., avoiding an RCU delay when such
opened files get closed is attractive...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agonfs: get rid of fake root dentries
Al Viro [Wed, 15 Apr 2026 23:29:53 +0000 (19:29 -0400)] 
nfs: get rid of fake root dentries

... just grab the reference to the (real) root we are about to return
for the first mount of this superblock and be done with that.

Once upon a time dentry tree eviction at fs shutdown used to break
if ->s_root had been spliced on top of something; that hadn't been
the case for years now, and these fake root dentries violate a bunch
of invariants.  Let's get rid of them...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agowind ->s_roots via ->d_sib instead of ->d_hash
Al Viro [Sat, 18 Apr 2026 22:39:03 +0000 (18:39 -0400)] 
wind ->s_roots via ->d_sib instead of ->d_hash

shrink_dcache_for_umount() is supposed to handle the possibility of
some of the dentries to be evicted being in other threads shrink
lists; it either kills them, leaving an empty husk to be freed by
the owner of shrink list whenever it gets around to that, or it
waits for the eviction in progress to get completed.

That relies upon dentry remaining attached to the tree until the
eviction reaches dentry_unlist() and its ->d_sib gets removed
from the list.  Unfortunately, the secondary roots are linked
via ->d_hash, rather than ->d_sib and they become removed from
that list before their inode references are dropped.

If shrink_dentry_list() from another thread ends up evicting
one of the secondary roots and gets to that point in dentry_kill()
when shrink_dcache_for_umount() is looking for secondary roots,
the latter will *not* notice anything, possibly leading to
warnings about busy inodes at umount time and all kinds of breakage
after that.

Moreover, shrink_dcache_for_umount() walks the list of secondary
roots with no protection whatsoever, so it might end up calling
dget() on a dentry that already passed through
lockref_mark_dead(&dentry->d_lockref);
ending up with corrupted refcount and possible UAF.

AFAICS, the most straightforward way to deal with that would be
to have secondary roots linked via ->d_sib rather than ->d_hash;
then they would remain on the list until killed, and we could
use d_add_waiter() machinery to wait for eviction in progress.

Changes:
* secondary roots look the same as ->s_root from d_unhashed()
and d_unlinked() POV now.
* secondary roots are represented as "no parent, but on ->d_sib"
instead of "no parent, but on ->d_hash".
* since ->d_sib is a plain hlist, we protect it with per-superblock
spinlock (sb->s_roots_lock) instead of the LSB of the head pointer (for
non-root dentries it would be protected by ->d_lock of parent).
* __d_obtain_alias() uses ->d_sib for linkage when allocating
a secondary root.
* d_splice_alias_ops() detects splicing of a secondary root and
removes it from the list before calling __d_move().
* dentry_unlist() detects eviction of a secondary root and
removes it from the list; no need to play the games for d_walk() sake,
since the latter is not going to look for the next sibling of those
anyway.
* ___d_drop() doesn't care about ->s_roots anymore.
* shrink_dcache_for_umount() uses proper locking for access to
the list of secondary roots and if it runs into one that is in the middle
of eviction waits for that to finish.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agoshrink_dentry_tree(): unify the calls of shrink_dentry_list()
Al Viro [Thu, 16 Apr 2026 15:50:41 +0000 (11:50 -0400)] 
shrink_dentry_tree(): unify the calls of shrink_dentry_list()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agoshrinking rcu_read_lock() scope in d_alloc_parallel()
Al Viro [Thu, 23 Apr 2026 18:29:18 +0000 (19:29 +0100)] 
shrinking rcu_read_lock() scope in d_alloc_parallel()

The current use of rcu_read_lock() uses in d_alloc_parallel()
is fairly opaque - the single large scope serves two purposes.

We start with lookup in normal hash, and there rcu_read_lock()
scope puts __d_lookup_rcu() and subsequent lockref_get_not_dead() into
the same RCU read-side critical area.

If no match is found, we proceed to lock the hash chain of
in-lookup hash and scan that for a match.  If we find a match, we want
to grab it and wait for lookup in progress to finish.  Since the bitlock
we use for these hash chains has to nest inside ->d_lock, we need to
unlock the chain first and use lockref_get_not_dead() on the match.
That has to be done without breaking the RCU read-side critical area,
and we use the same rcu_read_lock() scope to bridge over.

The thing is, after having grabbed the reference (and it is
very unlikely to fail) we proceed to grab ->d_lock - d_wait_lookup()
and __d_lookup_unhash()/__d_wake_in_lookup_waiters() are using that for
serialization. That makes lockref_get_not_dead() pointless - trying
to avoid grabbing ->d_lock for refcount increment, only to grab it
anyway immediately after that. If we grab ->d_lock first and replace
lockref_get_not_dead() with direct check for sign and increment if
non-negative we can move rcu_read_unlock() to immediately after grabbing
->d_lock.  Moreover, we don't need the RCU read-side critical area to
be contiguous since before earlier __d_lookup_rcu() - we can just as
well terminate the earlier one ASAP and call rcu_read_lock() again only
after having found a match (if any) in the in-lookup hash chain.

That makes the entire thing easier to follow and the purpose
of those rcu_read_lock() calls easier to describe - the first scope is
for __d_lookup_rcu() + lockref_get_not_dead(), the second one bridges
over from the bitlock scope to the ->d_lock scope on the match found in
in-lookup hash.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agod_walk(): shrink rcu_read_lock() scope
Al Viro [Tue, 21 Apr 2026 19:52:13 +0000 (15:52 -0400)] 
d_walk(): shrink rcu_read_lock() scope

we only need it to bridge over from ->d_lock scope of child to ->d_lock
scope of parent; dropping ->d_lock at rename_retry doesn't need to be
in rcu_read_lock() scope.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agodocument dentry_kill()
Al Viro [Sat, 11 Apr 2026 20:17:19 +0000 (16:17 -0400)] 
document dentry_kill()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agoadjust calling conventions of lock_for_kill(), fold __dentry_kill() into dentry_kill()
Al Viro [Sat, 11 Apr 2026 08:17:02 +0000 (04:17 -0400)] 
adjust calling conventions of lock_for_kill(), fold __dentry_kill() into dentry_kill()

Pull dropping ->d_lock on lock_for_kill() failure into lock_for_kill() itself.
That reduces dentry_kill() to
if (!lock_for_kill(dentry))
return NULL;
return __dentry_kill(dentry);
at which point it's easier to move that if (...) into the beginning of __dentry_kill()
itself and rename it into dentry_kill().

Document the new calling conventions of lock_for_kill().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agoDocument rcu_read_lock() use in select_collect2()
Al Viro [Sat, 11 Apr 2026 08:01:28 +0000 (04:01 -0400)] 
Document rcu_read_lock() use in select_collect2()

If select_collect2() finds something that is neither busy nor can
be moved to shrink list, it needs to return that to caller's caller
(shrink_dcache_tree()) ASAP and do so without grabbing references (among
other things, it might be already dying, in which case refcount can't be
incremented).  We are called inside a ->d_lock scope, but that scope is
going to be terminated as soon as we return to caller (d_walk()); ->d_lock
will be retaken by shrink_dcache_tree(), but we need to bridge between
these scopes, turning them into contiguous RCU read-side critical area.

We do that with rcu_read_lock() scope - it spans from unbalanced
rcu_read_lock() in select_collect2() to unbalanced rcu_read_unlock()
in shrink_dcache_tree().  That works, but it really needs to be documented;
it's rather unidiomatic and it had caused quite a bit of confusion - some
of it in form of patches "fixing" the damn thing.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agoShift rcu_read_{,un}lock() inside fast_dput()
Al Viro [Sat, 11 Apr 2026 07:56:42 +0000 (03:56 -0400)] 
Shift rcu_read_{,un}lock() inside fast_dput()

Shrink rcu_read_lock() scopes surrounding fast_dput() calls.
Both callers are immediately preceded and followed by
rcu_read_lock()/rcu_read_unlock() resp.  Shrink that down
into fast_dput() itself; in case when fast_dput() ends up
grabbing ->d_lock, we can pull rcu_read_unlock() up to
right after spin_lock().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agosimplify safety for lock_for_kill() slowpath
Al Viro [Sat, 11 Apr 2026 07:24:28 +0000 (03:24 -0400)] 
simplify safety for lock_for_kill() slowpath

rcu_read_lock() scopes in dentry eviction machinery are too wide
and badly structured; we end up with too many of those, quite
a few essentially identical.  Worse, quite a few of the function
involved are not neutral wrt that, making them harder to reason about.

rcu_read_lock() scope is not the only thing establishing an
RCU read-side critical area - spin_lock scope does the same and
they can be mixed - the sequence
rcu_read_lock()
...
spin_lock()
...
rcu_read_unlock()
...
rcu_read_lock()
...
spun_unlock()
...
rcu_read_unlock()
is an unbroken RCU read-side critical area.

Use of that observation allows to simplify things.  First of all,
lock_for_kill() relies upon being in an unbroken RCU read-side
critical area.  It's always called with ->d_lock held, and normally
returns without having ever dropped that spinlock.  We would not
need rcu_read_lock() at all, if not for the slow path - if trylock
of inode->i_lock fails, we need to drop and retake ->d_lock.

Having all calls of lock_for_kill() inside an rcu_read_lock() scope
takes care of that, but to show that lock_for_kill() slow path is safe,
we need to demonstrate such rcu_read_lock() scope for any call chain
leading to lock_for_kill().  Which is not fun, seeing that there are
10 such scopes, with 5 distinct beginnings between them.

Case 1: opens in dput() proceeds through fast_dput() grabbing ->d_lock,
returning false into dput() and there a call of finish_dput() which calls
dentry_kill(), which calls lock_for_kill(); ends in dentry_kill(), either
right after lock_for_kill() success or right after dropping ->d_lock
on lock_for_kill() failure.  ->d_lock is held continuously all the way
into lock_for_kill().

Case 2: opens in dentry_kill(), where we proceed to the same call of
dentry_kill() as in case 1.  ->d_lock is held since before the
beginning of the scope and all the way into lock_for_kill().

Case 3: opens in select_collect2(), proceeds through the return to
d_walk() and to shrink_dcache_tree() where we grab ->d_lock and
proceed to call shrink_kill(), which calls dentry_kill(), then as
in the previous scopes.

Case 4: opens in shrink_dentry_list(), followed by call of shrink_kill(),
then same as in case 3.  ->d_lock is held since before the beginning
of the scope and all the way into lock_for_kill().

Case 5: opens in shrink_kill(), where it's immediately followed by
call of dentry_kill(), then same as in the previous scopes.  ->d_lock
is held since before the beginning of the scope all the way into
lock_for_kill().

Note that in cases 2, 4 and 5 the slow path of lock_for_kill() is the
only part of rcu_read_lock() scope that is not covered by spinlock
scopes.  In case 1 we have the area in fast_dput() as well and in
case 3 - the return path from select_collect2() and chunk in shrink_dcache_tree()
up to grabbing ->d_lock.

Seeing that the reasons we need rcu_read_lock() in these additional
areas are completely unrelated to lock_for_kill() slow path, the things
get much more straightforward with
* explicit rcu_read_lock() scope surrounding the area in slow path
of lock_for_kill() where ->d_lock is not held
* shrink_dentry_list() dropping rcu_read_lock() as soon as it has
grabbed ->d_lock.
* dput() dropping rcu_read_lock() just before calling finish_dput().
* rcu_read_lock() calls in finish_dput(), shrink_kill() and
shrink_dentry_list() are removed, along with rcu_read_unlock() calls in
dentry_kill().

RCU read-side critical areas are unchanged by that, safety of lock_for_kill()
slow path is trivial to verify and a bunch of rcu_read_lock() scopes either
gone or become easier to describe.

Update the comments on locking conventions and memory safety considerations,
including the NORCU case.

Incidentally, all calls of fast_dput() are immediately preceded by rcu_read_lock()
and followed by rcu_read_unlock() now, which will allow to simplify those on
the next step...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agofold lock_for_kill() and __dentry_kill() into common helper
Al Viro [Sat, 11 Apr 2026 07:14:19 +0000 (03:14 -0400)] 
fold lock_for_kill() and __dentry_kill() into common helper

There are two callers of lock_for_kill() and both are followed
by the same sequence of actions:
* in case of failure, drop ->d_lock, do rcu_read_unlock() and
go away
* in case of success, do rcu_read_unlock() followed by
passing dentry to __dentry_kill(); if the latter returns NULL, go away.

All calls of __dentry_kill() are paired with lock_for_kill() now;
let's turn that sequence into a new helper (dentry_kill()) and switch
to using it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agofold lock_for_kill() into shrink_kill()
Al Viro [Sat, 11 Apr 2026 07:06:39 +0000 (03:06 -0400)] 
fold lock_for_kill() into shrink_kill()

Both callers have exact same shape.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agoshrink_dentry_list(): start with removing from shrink list
Al Viro [Sat, 11 Apr 2026 05:52:53 +0000 (01:52 -0400)] 
shrink_dentry_list(): start with removing from shrink list

Currently we leave dentry on the list until we are done with
lock_for_kill().  That guarantees that it won't have been
even scheduled for removal until we remove it from the list
and drop ->d_lock.  We grab ->d_lock and rcu_read_lock()
and call lock_for_kill().  There are four possible cases:
1) lock_for_kill() has succeeded; dentry and its inode
(if any) are locked, dentry refcount is zero and we can
remove it from shrink list and feed it to shrink_kill().
2) lock_for_kill() fails since dentry has become busy.
Nothing to do, rcu_read_unlock(), remove from shrink list,
drop ->d_lock and move on.
3) lock_for_kill() fails since dentry is currently
being killed - already entered __dentry_kill(), but hasn't
reached dentry_unlist() yet.  Nothing to do, we should just
do rcu_read_unlock(), remove from shrink list so that
whoever's executing __dentry_kill() would free it once they
are done, drop ->d_lock and move on - same actions as in
case (2).
4) lock_for_kill() fails since dentry has been killed
(reached dentry_unlist(), DCACHE_DENTRY_KILLED set in ->d_flags).
In that case whoever had been killing it had already seen it
on our shrink list and skipped freeing it.  At that point it's
just a passive chunk of memory; rcu_read_unlock(), remove from
the list, drop ->d_lock and use dentry_free() to schedule
freeing.

While that works, there's a simpler way to do it:
* grab ->d_lock
* remove dentry from our shrink list
* if DCACHE_DENTRY_KILLED is already set, drop ->d_lock,
call dentry_free() and move on.
* otherwise grab rcu_read_lock() and call lock_for_free()
* if lock_for_kill() succeeds, feed dentry
to shrink_kill(), otherwise drop the locks and move on.

The end result is equivalent to the old variant.  The only difference
arises if at the time we grab ->d_lock dentry had refcount 0 and
lock_for_kill() had failed spin_trylock() and had to drop and regain
->d_lock.  Otherwise nobody can observe at which point within the
unbroken ->d_lock scope dentry had been removed from the shrink list -
all accesses to ->d_lru are under ->d_lock.

If ->d_lock had been dropped and regained, it is possible for another
thread to feed that dentry to __dentry_kill(); if it doesn't get to
dentry_unlist() before we regain ->d_lock, behaviour is still identical -
it's case (3) and by the time __dentry_kill() would've gotten around
to checking if the victim is on shrink list, it would've been already
removed from ours.

If __dentry_kill() from another thread *does* get to dentry_unlist(),
in the old variant we would have __dentry_kill() leave calling
dentry_free() to us and in the new one __dentry_kill() would've called
dentry_free() itself.  Since we are under rcu_read_lock(), we are
guaranteed that actual freeing won't happen until we get around to
rcu_read_unlock().  IOW, the new variant is still safe wrt UAF, if
not for the same reason as the old one, and overall result is the same;
the only difference is which threads ends up scheduling the actual
freeing of dentry.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agod_prune_aliases(): make sure to skip NORCU aliases
Al Viro [Mon, 4 May 2026 06:49:20 +0000 (02:49 -0400)] 
d_prune_aliases(): make sure to skip NORCU aliases

Either they are busy (in which case they won't be moved to shrink
list anyway) or they have a zero refcount, in which case we really
shouldn't mess with them - whoever had dropped the refcount to
zero is on the way to evicting and freeing them.

That way we are guaranteed that only the thread that has dropped
refcount of NORCU dentry to zero might call lock_for_kill() and
__dentry_kill() for those.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agokill d_dispose_if_unused()
Al Viro [Mon, 13 Apr 2026 03:39:16 +0000 (23:39 -0400)] 
kill d_dispose_if_unused()

Rename to_shrink_list() into __move_to_shrink_list(), document and
export it.  Switch d_dispose_if_unused() users to that and kill
d_dispose_if_unused() itself.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agomake to_shrink_list() return whether it has moved dentry to list
Al Viro [Sun, 12 Apr 2026 18:35:38 +0000 (14:35 -0400)] 
make to_shrink_list() return whether it has moved dentry to list

... and make it check the refcount for being zero in addition to
dentry not being on a shrink list already.  Simplifies the callers...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agoselect_collect(): ignore dentries on shrink lists if they have positive refcounts
Al Viro [Sun, 12 Apr 2026 18:17:52 +0000 (14:17 -0400)] 
select_collect(): ignore dentries on shrink lists if they have positive refcounts

If all dentries we find have positive refcounts and some happen
to be on shrink lists, there's no point trying to steal them in the
select_collect2() phase - we won't be able to evict any of them.  Busy on
shrink lists is still busy...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agofind_acceptable_alias(): skip NORCU aliases with zero refcount
Al Viro [Mon, 4 May 2026 04:32:43 +0000 (00:32 -0400)] 
find_acceptable_alias(): skip NORCU aliases with zero refcount

similar to d_find_any_alias() situation

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agofix a race between d_find_any_alias() and final dput() of NORCU dentries
Al Viro [Mon, 4 May 2026 03:00:09 +0000 (23:00 -0400)] 
fix a race between d_find_any_alias() and final dput() of NORCU dentries

Refcount of a NORCU dentry must not be incremented after having dropped
to zero.  Otherwise we might end up with the following race:
CPU1: in fast_dput(d), rcu_read_lock();
CPU1: decrements refcount of d to 0
CPU1: notice that it's unhashed
CPU2: grab a reference to d
CPU2: dput(d), freeing d
CPU1: ... looks like we need to evict d, let's grab ->d_lock, recheck
      the refcount, etc.
and that spin_lock(&d->d_lock) ends up a UAF, despite still being in
an RCU read-side critical area started back when the refcount had been
positive.  If not for DCACHE_NORCU in d->d_flags freeing would've been
RCU-delayed, so we'd have grabbed ->d_lock, noticed the negative value
stored into refcount by __dentry_kill(), dropped the locks and that would
be it.  For NORCU dentries freeing is _not_ delayed, though.

Most of the non-counting references are excluded for NORCU dentries -
they are not allowed to be hashed, they never get placed on LRU, they
never get placed into anyone's list of children and while dput_to_list()
might put them into a shrink list, nobody bumps refcount of something
that had been reached that way.

However, inode's list of aliases can be a problem - it does not contribute
to dentry refcount (for obvious reasons) and we *do* have places that
grab references to something found on that list - that's precisely what
d_find_alias() is.  In case of d_find_alias() we are safe - it skips
unhashed aliases, so all NORCU ones are ignored there.  d_find_any_alias()
is *not* limited to hashed ones, though, and while it's usually called
for directories (which never get NORCU dentries), there are callers that
use it to get something for non-directories with no hashed aliases.

Having d_find_any_alias() hit a NORCU dentry is not impossible - it can
be easily arranged if you have CAP_DAC_READ_SEARCH (memfd_create() + mmap()
+ name_to_handle_at() for /proc/self/map_files/<...> + munmap() +
open_by_handle_at() will do that, and adding a second memfd_create() for
mount_fd makes it possible to do that without having memfd pinned).
The race window is narrow, and it's probably not feasible on bare hardware,
but...

It's not hard to fix, fortunately:
* separate __d_find_dir_alias() (== current __d_find_any_alias()) to
be used for directory inodes.
* provide dget_alias_ilocked() that would return false for NORCU
dentries with zero refcount and return true incrementing refcount otherwise
* make __d_find_any_alias() go over the list of aliases, using
dget_alias_ilocked() and returning the alias it succeeds on (normally the
first one).  Any NORCU alias with zero refcount is going to be evicted by
the thread that had dropped the final reference; this makes __d_find_any_alias()
pretend it had lost the race with eviction.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agoalloc_path_pseudo(): make sure we don't end up with NORCU dentries for directories
Al Viro [Mon, 27 Apr 2026 18:19:28 +0000 (14:19 -0400)] 
alloc_path_pseudo(): make sure we don't end up with NORCU dentries for directories

A lot of places relies upon directories never having NORCU dentries;
currently that property holds, but the proof is not straightforward
and rather brittle.

It's better to have that verified in the sole caller of d_alloc_pseudo(),
so that any future bugs in that direction were caught early.

That way we can be sure that
* current directory of any process is not NORCU
* root directory of any process is not NORCU
* starting point of any LOOKUP_RCU pathwalk is not NORCU
* dget_parent() can rely upon ->d_parent not being NORCU
* d_walk() and is_subdir() can rely upon the same
* alloc_file_pseudo() won't create multiple aliases for a directory
without having to go through a convoluted audit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agoVFS: use wait_var_event for waiting in d_alloc_parallel()
NeilBrown [Thu, 30 Apr 2026 19:42:43 +0000 (15:42 -0400)] 
VFS: use wait_var_event for waiting in d_alloc_parallel()

Parallel lookup starts with a call of d_alloc_parallel().  That primitive
either returns a matching hashed dentry or allocates a new one in the
in-lookup state and returns it to the caller.  Once the caller is done
with lookup, it indicates so either by call of d_{splice_alias,add}()
or by call of d_done_lookup(); at that point dentry leaves the in-lookup
state.

If d_alloc_parallel() finds a matching in-lookup dentry, it must wait for
that dentry to leave the in-lookup state, one way or another.  Currently
by supplying wait_queue_head to d_alloc_parallel().  If d_alloc_parallel()
creates a new in-lookup dentry, the address of that wait_queue_head is stored
in ->d_wait of new dentry and stays there while it's in the in-lookup;
subsequent d_alloc_parallel() will wait on the queue found in the matching
in-lookup dentry.  Transition out of in-lookup state wakes waiters on that
queue (if any).

That works, but the calling conventions are inconvenient - the caller must
supply wait_queue_head and make sure that it survives at least until the new
in-lookup dentry leaves the in-lookup state.  That amounts to boilerplate
in the d_alloc_parallel() callers that are followed by a call of d_lookup_done()
in the same function; in cases like nfs asynchronous unlink it gets worse than
that.

This patch changes d_alloc_parallel() to use wake_up_var_locked() to
wake up waiters, and wait_var_event_spinlock() to wait.  dentry->d_lock
is used for synchronisation as it is already held and the relevant
times.

That eliminates the need of caller-supplied wait_queue_head, simplifying
the calling conventions.  Better yet, we only need one bit of information
stored in dentry itself: whether there are any waiters to be woken up,
and that can be easily stored in ->d_flags; ->d_wait goes away.

The reason we need that bit (DCACHE_LOOKUP_WAITERS) is that with wait_var
machinery the queues are shared with all kinds of stuff and there's
no way tell if any of the waiters have anything to do with our dentry;
most of the time none of them will be relevant, so we need to avoid the
pointless wakeups.

Another benefit of the new scheme comes from the fact that wakeups
have to be done outside of write-side critical areas of ->i_dir_seq;
with the old scheme we need to carry the value picked from ->d_wait from
__d_lookup_unhash() to the place where we actually wake the waiters up.
Now we can just leave DCACHE_LOOKUP_WAITERS in ->d_flags until we get
to doing wakeups - that's done within the same ->d_lock scope, so we
are fine; new bit is accessed only under ->d_lock and it's seen only
on dentries with DCACHE_PAR_LOOKUP in ->d_flags.

__d_lookup_unhash() no longer needs to re-init ->d_lru.  That was
previously shared (in a union) with ->d_wait but ->d_wait is now gone
so it no longer corrupts ->d_lru.

Co-developed-by: Al Viro <viro@zeniv.linux.org.uk> # saner handling of flags
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 days agoaccel/ethosu: fix OOB write in ethosu_gem_cmdstream_copy_and_validate()
Muhammad Bilal [Sat, 23 May 2026 19:08:43 +0000 (19:08 +0000)] 
accel/ethosu: fix OOB write in ethosu_gem_cmdstream_copy_and_validate()

The command stream parsing loop increments the index variable a second
time when a 64-bit command word is encountered (bit 14 set), but does
not re-check the loop bound before writing the second word:

    for (i = 0; i < size / 4; i++) {
        bocmds[i] = cmds[0];
        if (cmd & 0x4000) {
            i++;
            bocmds[i] = cmds[1];   /* unchecked */
        }
    }

The buffer bocmds is backed by a DMA allocation of exactly size bytes
from drm_gem_dma_create(ddev, size), giving valid indices [0, size/4-1].

When i == size/4 - 1 on entry to an iteration and bit 14 of cmds[0] is
set, bocmds[size/4-1] is written in bounds, i is then incremented to
size/4, and bocmds[size/4] writes four bytes past the end of the
allocation.

Userspace controls both the buffer contents and the size argument via
the ioctl, making this a userspace-triggerable heap out-of-bounds write.

Fix by checking the incremented index against the buffer bound before
the second write and returning -EINVAL if the buffer is too small to
contain the extended command.

Fixes: 5a5e9c0228e6 ("accel: Add Arm Ethos-U NPU driver")
Cc: stable@vger.kernel.org
Signed-off-by: Muhammad Bilal <meatuni001@gmail.com>
Link: https://patch.msgid.link/20260523190843.33977-1-meatuni001@gmail.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
10 days agonet: mv643xx: fix OF node refcount
Bartosz Golaszewski [Tue, 2 Jun 2026 07:34:14 +0000 (09:34 +0200)] 
net: mv643xx: fix OF node refcount

Platform devices created with platform_device_alloc() call
platform_device_release() when the last reference to the device's
kobject is dropped. This function calls of_node_put() unconditionally.
This works fine for devices created with platform_device_register_full()
but users of the split approach (platform_device_alloc() +
platform_device_add()) must bump the reference of the of_node they
assign manually. Add the missing call to of_node_get().

Cc: stable@vger.kernel.org
Fixes: 76723bca2802 ("net: mv643xx_eth: add DT parsing support")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260602073414.22500-1-bartosz.golaszewski@oss.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoinet: frags: fix use-after-free caused by the fqdir_pre_exit() flush
Hyunwoo Kim [Tue, 2 Jun 2026 10:21:05 +0000 (19:21 +0900)] 
inet: frags: fix use-after-free caused by the fqdir_pre_exit() flush

On netns teardown, fqdir_pre_exit() walks the fqdir rhashtable and
flushes every fragment queue that is not yet complete using
inet_frag_queue_flush(). That helper frees all the skbs queued on the
fragment queue but does not set INET_FRAG_COMPLETE, and leaves
q->fragments_tail and q->last_run_head pointing at the freed skbs.
The queue itself stays in the rhashtable.

fqdir_pre_exit() first lowers high_thresh to 0 to stop new queue lookups,
but it cannot stop a fragment that already obtained the queue through
inet_frag_find() earlier and stalled just before taking the queue lock.
Once that fragment resumes after the flush and takes the queue lock,
it passes the INET_FRAG_COMPLETE check and then dereferences the freed
fragments_tail. inet_frag_queue_insert() reads FRAG_CB() and ->len of
that pointer and, on the append path, writes ->next_frag, causing a
slab use-after-free. IPv6, nf_conntrack_reasm6 and 6lowpan reassembly
share the same flush path and are affected as well.

Reset rb_fragments, fragments_tail and last_run_head in
inet_frag_queue_flush() so a flushed queue no longer points at the
freed skbs. A fragment that resumes after the flush and takes the
queue lock then finds an empty queue and starts a new run instead of
dereferencing the freed fragments_tail. ip_frag_reinit() already
performed this reset after its own flush, so drop the now duplicate
code there.

Cc: stable@vger.kernel.org
Fixes: 006a5035b495 ("inet: frags: flush pending skbs in fqdir_pre_exit()")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Link: https://patch.msgid.link/ah6ukYq5G98LshdA@v4bel
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoaccel/ethosu: reject DMA commands with uninitialized length
Muhammad Bilal [Sun, 24 May 2026 13:03:19 +0000 (13:03 +0000)] 
accel/ethosu: reject DMA commands with uninitialized length

cmd_state_init() initializes the command state with memset(0xff),
leaving dma->len at U64_MAX to signal missing setup. The only setter
is NPU_SET_DMA0_LEN; if userspace omits this command and issues
NPU_OP_DMA_START, dma->len remains U64_MAX.

In dma_length(), a positive stride added to U64_MAX wraps to a small
value. With size0 == 1, check_mul_overflow() does not trigger and
dma_length() returns 0 instead of U64_MAX. The caller's U64_MAX check
then passes, region_size[] stays 0, and the bounds check in
ethosu_job.c is bypassed, allowing hardware to execute DMA with stale
physical addresses.

Fix by checking for U64_MAX at the start of dma_length() before any
arithmetic, consistent with the sentinel value used throughout the
driver to detect uninitialized fields.

Fixes: 5a5e9c0228e6 ("accel: Add Arm Ethos-U NPU driver")
Cc: stable@vger.kernel.org
Signed-off-by: Muhammad Bilal <meatuni001@gmail.com>
Link: https://patch.msgid.link/20260524130319.12747-1-meatuni001@gmail.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
10 days agoaccel/ethosu: fix arithmetic issues in dma_length()
Muhammad Bilal [Sun, 24 May 2026 10:37:10 +0000 (10:37 +0000)] 
accel/ethosu: fix arithmetic issues in dma_length()

dma_length() derives DMA region usage from command stream values and
updates region_size[]:

    len = ((len + stride[0]) * size0 + stride[1]) * size1
    region_size[region] = max(..., len + dma->offset)

Several arithmetic issues can corrupt the derived region size:

- signed stride values may underflow when added to len
- intermediate multiplications may overflow
- len + dma->offset may overflow during region_size updates
- dma_length() error returns were not validated by the caller

region_size[] is later used by ethosu_job.c to validate command stream
accesses against GEM buffer sizes. Arithmetic wraparound can therefore
under-report region usage and bypass the bounds validation.

Fix by validating signed additions, using overflow helpers for
multiplications and offset updates, and propagating dma_length()
failures to the caller.

Fixes: 5a5e9c0228e6 ("accel: Add Arm Ethos-U NPU driver")
Cc: stable@vger.kernel.org
Signed-off-by: Muhammad Bilal <meatuni001@gmail.com>
Link: https://patch.msgid.link/20260524103710.47397-1-meatuni001@gmail.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
10 days agoaccel/ethosu: fix wrong weight index in NPU_SET_SCALE1_LENGTH on U85
Muhammad Bilal [Sat, 23 May 2026 21:07:53 +0000 (21:07 +0000)] 
accel/ethosu: fix wrong weight index in NPU_SET_SCALE1_LENGTH on U85

On non-U65 hardware (e.g. U85), opcode 0x4093 is NPU_SET_WEIGHT2_LENGTH.
The BASE handler for the same opcode correctly assigns to
st.weight[2].base, but the LENGTH handler mistakenly assigns cmds[1]
to st.weight[1].length instead of st.weight[2].length.

This leaves weight[2].length at its initialised sentinel value of
0xffffffff and corrupts weight[1].length with the user-supplied value,
breaking the software bounds-check state for both weight buffers on U85.

Fix the index to match the BASE handler.

Fixes: 5a5e9c0228e6 ("accel: Add Arm Ethos-U NPU driver")
Cc: stable@vger.kernel.org
Signed-off-by: Muhammad Bilal <meatuni001@gmail.com>
Link: https://patch.msgid.link/20260523210840.92039-3-meatuni001@gmail.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
10 days agoaccel/ethosu: reject NPU_OP_RESIZE commands from userspace
Muhammad Bilal [Sat, 23 May 2026 21:07:52 +0000 (21:07 +0000)] 
accel/ethosu: reject NPU_OP_RESIZE commands from userspace

NPU_OP_RESIZE is a U85-only command that the driver does not yet
implement. The existing WARN_ON(1) placeholder fires unconditionally
whenever userspace submits this command via DRM_IOCTL_ETHOSU_GEM_CREATE,
causing unbounded kernel log spam.

If panic_on_warn is set the kernel panics, giving any unprivileged user
with access to the DRM device a trivial denial-of-service primitive.

Replace the WARN_ON(1) with an explicit -EINVAL return so the ioctl
rejects the command before it reaches hardware.

Fixes: 5a5e9c0228e6 ("accel: Add Arm Ethos-U NPU driver")
Cc: stable@vger.kernel.org
Signed-off-by: Muhammad Bilal <meatuni001@gmail.com>
Link: https://patch.msgid.link/20260523210840.92039-2-meatuni001@gmail.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
10 days agoaccel/ethosu: fix IFM region index out-of-bounds in command stream parser
Muhammad Bilal [Sat, 23 May 2026 19:51:59 +0000 (19:51 +0000)] 
accel/ethosu: fix IFM region index out-of-bounds in command stream parser

NPU_SET_IFM_REGION extracts the region index with param & 0x7f, giving
a maximum value of 127. However region_size[] and output_region[] in
struct ethosu_validated_cmdstream_info are both sized to
NPU_BASEP_REGION_MAX (8), giving valid indices [0..7].

Every other region assignment in the same switch uses param & 0x7:
  NPU_SET_OFM_REGION:  st.ofm.region  = param & 0x7;
  NPU_SET_IFM2_REGION: st.ifm2.region = param & 0x7;
  NPU_SET_WEIGHT_REGION: st.weight[0].region = param & 0x7;
  NPU_SET_SCALE_REGION:  st.scale[0].region  = param & 0x7;

The 0x7f mask on IFM is inconsistent and appears to be a typo.

feat_matrix_length() and calc_sizes() use the region index directly
as an array subscript into the kzalloc'd info struct:
  info->region_size[fm->region] = max(...);

A userspace caller supplying NPU_SET_IFM_REGION with param > 7 causes
a write up to 127*8 = 1016 bytes past the start of region_size[],
corrupting adjacent kernel heap data.

Fix by applying the same & 0x7 mask used by all other region
assignments.

Fixes: 5a5e9c0228e6 ("accel: Add Arm Ethos-U NPU driver")
Cc: stable@vger.kernel.org
Signed-off-by: Muhammad Bilal <meatuni001@gmail.com>
Link: https://patch.msgid.link/20260523195159.55801-1-meatuni001@gmail.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
10 days agoMerge tag 'net-7.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 4 Jun 2026 21:35:55 +0000 (14:35 -0700)] 
Merge tag 'net-7.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from Netfilter, wireless and Bluetooth.

  Current release - fix to a fix:

   - Bluetooth: MGMT: fix backward compatibility with bluetoothd
     which adds stray bytes to MGMT_OP_ADD_EXT_ADV_DATA

  Previous releases - regressions:

   - af_unix: fix inq_len update inaccuracy on partial read

   - eth: fec: fix pinctrl default state restore order on resume

   - wifi: iwlwifi:
       - mvm: don't support the reset handshake for old firmwares
       - pcie: simplify the resume flow if fast resume is not used,
         work around NIC access failures

  Previous releases - always broken:

   - Bluetooth: L2CAP: reject BR/EDR signaling packets over MTUsig

   - sctp: fix a couple of bugs in COOKIE_ECHO processing

   - sched: fix pedit partial COW leading to page cache corruption

   - wifi: nl80211: reject oversized EMA RNR lists

   - netfilter:
       - conntrack_irc: fix possible out-of-bounds read
       - bridge: make ebt_snat ARP rewrite writable

   - appletalk: zero-initialize aarp_entry to prevent heap info leak

   - ipv4: restrict IPOPT_SSRR and IPOPT_LSRR options

   - mptcp: fix number of bugs reported by AI scans and discovered
     during NVMe over MPTCP testing"

* tag 'net-7.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
  Reapply "bnxt_en: bring back rtnl_lock() in the bnxt_open() path"
  udp: clear skb->dev before running a sockmap verdict
  sctp: purge outqueue on stale COOKIE-ECHO handling
  bonding: annotate data-races arcound churn variables
  net/802/mrp: fix vector attribute parsing in mrp_pdu_parse_vecattr
  rtase: Avoid sleeping in get_stats64()
  ieee802154: 6lowpan: only accept IPv6 packets in lowpan_xmit()
  ipv6: mcast: Fix use-after-free when processing MLD queries
  selftests: net: add vxlan vnifilter notification test
  vxlan: vnifilter: fix spurious notification on VNI update
  vxlan: vnifilter: send notification on VNI add
  rtase: Reset TX subqueue when clearing TX ring
  octeontx2-af: npc: Fix CPT channel mask in npc_install_flow
  dt-bindings: ethernet: eswin: fix hsp-sp-csr backward compatibility
  sctp: validate cached peer INIT chunk length in COOKIE_ECHO processing
  net/sched: fix pedit partial COW leading to page cache corruption
  vsock/vmci: fix sk_ack_backlog leak on failed handshake
  net: bonding: fix NULL pointer dereference in bond_do_ioctl()
  geneve: fix length used in GRO hint UDP checksum adjustment
  net: ethernet: mtk_eth_soc: Fix use-after-free in metadata dst teardown
  ...

10 days agoMerge tag 'drm-xe-fixes-2026-06-04' of https://gitlab.freedesktop.org/drm/xe/kernel...
Dave Airlie [Thu, 4 Jun 2026 21:18:09 +0000 (07:18 +1000)] 
Merge tag 'drm-xe-fixes-2026-06-04' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

- Revert removing support for unpublished NVL-S GuC (Daniele)
- Suspend fixes related to multi-queue (Niranjana)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/aiHPGiPrAyHgwBZl@intel.com
10 days agoMerge tag 'trace-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Thu, 4 Jun 2026 20:38:42 +0000 (13:38 -0700)] 
Merge tag 'trace-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:

 - Fix CFI violation in probestub function

   The probestub is a function to allow tprobes to hook to a tracepoint
   to gain access to its parameters.

   The function itself is only referenced by the tracepoint structure
   which lives in the __tracepoint section. objtool explicitly ignores
   that section and when processing functions in the kernel, if it
   detects one that has no references it will seal it to have its ENDBR
   stripped on boot up.

   This means the probstub function will have its ENDBR stripped and if
   a tprobe is attached to it with IBT enabled, it will go *boom*.

* tag 'trace-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix CFI violation in probestub being called by tprobes