]> git.ipfire.org Git - thirdparty/kernel/stable.git/log
thirdparty/kernel/stable.git
4 weeks agoMerge tag 'pull-coda' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Tue, 21 Apr 2026 21:03:10 +0000 (14:03 -0700)] 
Merge tag 'pull-coda' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull coda dcache updates from Al Viro:
 "Coda dcache-related cleanups and fixes"

* tag 'pull-coda' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  coda_flag_children(): fix a UAF
  sanitize coda_dentry_delete()
  coda: is_bad_inode() is always false there

4 weeks agodrm/amd: Adjust ASPM support quirk to cover more Intel hosts
Mario Limonciello [Sun, 19 Apr 2026 04:16:52 +0000 (23:16 -0500)] 
drm/amd: Adjust ASPM support quirk to cover more Intel hosts

Some of the same issues identified in commit c770ef19673fb
("drm/amd/amdgpu: disable ASPM in some situations") also affect
Tiger Lake systems with GFX11 connected over USB4. Widen the net
to also match these hosts.

Fixes: d9b3a066dfcd ("drm/amd: Exclude dGPUs in eGPU enclosures from DPM quirks")
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5145
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0a214d888485b9f35fe03882a92962e6d5697849)

4 weeks agodrm/amd/display: Undo accidental fix revert in amdgpu_dm_ism.c
Leo Li [Fri, 17 Apr 2026 17:54:30 +0000 (13:54 -0400)] 
drm/amd/display: Undo accidental fix revert in amdgpu_dm_ism.c

[Why]

Pausing DPM power profiles during static screen caused a bunch of
audio/performance/clock issues that were addressed in this fix:
'commit 1412482b7143 ("Revert "drm/amd/display: pause the workload setting in dm"")'

This logic in function amdgpu_dm_crtc_vblank_control_worker() was moved
to amdgpu_dm_ism.c, but the fix was lost in the process.

[How]

Reapply the fix to amdgpu_dm_ism.c

Fixes: 754003486c3c ("drm/amd/display: Add Idle state manager(ISM)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit bc621e91d6fc004cfae9148c5a91acad19ada3e4)

4 weeks agoMerge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 21 Apr 2026 18:46:22 +0000 (11:46 -0700)] 
Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull more crypto library updates from Eric Biggers:
 "Crypto library fix and documentation update:

   - Fix an integer underflow in the mpi library

   - Improve the crypto library documentation"

* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  lib/crypto: docs: Add rst documentation to Documentation/crypto/
  docs: kdoc: Expand 'at_least' when creating parameter list
  lib/crypto: mpi: Fix integer underflow in mpi_read_raw_from_sgl()

4 weeks agoio_uring/zcrx: warn on freelist violations
Pavel Begunkov [Tue, 21 Apr 2026 08:45:29 +0000 (09:45 +0100)] 
io_uring/zcrx: warn on freelist violations

The freelist is appropriately sized to always be able to take a free
niov, but let's be more defensive and check the invariant with a
warning. That should help to catch any double-free issues.

Suggested-by: Kai Aizen <kai@snailsploit.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/2f3cea363b04649755e3b6bb9ab66485a95936d5.1776760901.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/zcrx: clear RQ headers on init
Pavel Begunkov [Tue, 21 Apr 2026 08:46:44 +0000 (09:46 +0100)] 
io_uring/zcrx: clear RQ headers on init

It might be unexpected to users if the RQ head/tail after a ring
creation are not zeroed, fix that.

Cc: stable@vger.kernel.org
Fixes: 6f377873cb239 ("io_uring/zcrx: add interface queue and refill queue")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/331f94663c3e8f021ffa3cb770ca2844a07d4855.1776760911.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/zcrx: fix user_struct uaf
Pavel Begunkov [Tue, 21 Apr 2026 08:47:04 +0000 (09:47 +0100)] 
io_uring/zcrx: fix user_struct uaf

io_free_rbuf_ring() usees a struct user_struct, which
io_zcrx_ifq_free() puts it down before destroying the ring.

Cc: stable@vger.kernel.org
Fixes: 5c686456a4e83 ("io_uring/zcrx: add user_struct and mm_struct to io_zcrx_ifq")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/e560ae00960d27a810522a7efc0e201c82dff351.1776760917.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/register: fix ring resizing with mixed/large SQEs/CQEs
Jens Axboe [Mon, 20 Apr 2026 19:41:38 +0000 (13:41 -0600)] 
io_uring/register: fix ring resizing with mixed/large SQEs/CQEs

The ring resizing only properly handles "normal" sized SQEs or CQEs, if
there are pending entries around a resize. This normally should not be
the case, but the code is supposed to handle this regardless.

For the mixed SQE/CQE cases, the current copying works fine as they
are indexed in the same way. Each half is just copied separately. But
for fixed large SQEs and CQEs, the iteration and copy need to take that
into account.

Cc: stable@kernel.org
Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/futex: ensure partial wakes are appropriately dequeued
Jens Axboe [Mon, 20 Apr 2026 20:24:50 +0000 (14:24 -0600)] 
io_uring/futex: ensure partial wakes are appropriately dequeued

If a FUTEX_WAITV vectored operation is only partially woken, we
should call __futex_wake_mark() on the queue to account for that.
If not, then a later wakeup will wake the same entry, rather than
the next one in line.

Fixes: 8f350194d5cfd ("io_uring: add support for vectored futex waits")
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/rw: add defensive hardening for negative kbuf lengths
Jens Axboe [Mon, 20 Apr 2026 19:16:19 +0000 (13:16 -0600)] 
io_uring/rw: add defensive hardening for negative kbuf lengths

No real bug here, just being a bit defensive in ensuring that whatever
gets passed into io_put_kbuf() is always >= 0 and not some random error
value.

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/rsrc: use kvfree() for the imu cache
Jens Axboe [Mon, 20 Apr 2026 19:15:41 +0000 (13:15 -0600)] 
io_uring/rsrc: use kvfree() for the imu cache

Currently anything that requires kvmalloc_flex() for allocations will
not get re-cached, and hence the cache freeing path is correct in that
it always uses kfree() to free the allocated memory. But this seems a
bit fragile as it's something that could get mix should that situation
change, so switch io_free_imu() and io_alloc_cache_free() to use kvfree
as the desctructor.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/rsrc: unify nospec indexing for direct descriptors
Jens Axboe [Mon, 20 Apr 2026 19:14:54 +0000 (13:14 -0600)] 
io_uring/rsrc: unify nospec indexing for direct descriptors

For file updates, the node reset isn't capping the value via
array_index_nospec() like the other paths do. Ensure it's all sane and
have the update path do the proper capping as well.

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: fix spurious fput in registered ring path
Jens Axboe [Mon, 20 Apr 2026 14:06:00 +0000 (14:06 +0000)] 
io_uring: fix spurious fput in registered ring path

Fix an issue with io_uring_ctx_get_file() not gating fput() on whether
or not the file descriptor is a registered/direct one or not.

Fixes: c5e9f6a96bf7 ("io_uring: unify getting ctx from passed in file descriptor")
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoMerge tag 'erofs-for-7.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 21 Apr 2026 18:16:04 +0000 (11:16 -0700)] 
Merge tag 'erofs-for-7.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:

 - Fix dirent nameoff handling to avoid out-of-bound reads
   out of crafted images

 - Fix two type truncation issues on 32-bit platforms

* tag 'erofs-for-7.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: unify lcn as u64 for 32-bit platforms
  erofs: fix offset truncation when shifting pgoff on 32-bit platforms
  erofs: fix the out-of-bounds nameoff handling for trailing dirents

4 weeks agovfio/cdx: Consolidate MSI configured state onto cdx_irqs
Alex Williamson [Fri, 17 Apr 2026 20:27:58 +0000 (14:27 -0600)] 
vfio/cdx: Consolidate MSI configured state onto cdx_irqs

struct vfio_cdx_device carries three fields that track whether MSI has
been configured: vdev->cdx_irqs (the allocated vector array), vdev->
msi_count (the array length), and vdev->config_msi (a boolean flag).
The three are set together when vfio_cdx_msi_enable() succeeds and
cleared together by vfio_cdx_msi_disable().  However, the error paths
in vfio_cdx_msi_enable() free the cdx_irqs allocation on failure
without resetting the pointer, leaving it stale and skewed from the
other two fields until the next enable call overwrites it.

Clear vdev->cdx_irqs to NULL alongside the kfree() in both error paths
so the pointer consistently reflects the configured state.  With that
invariant restored and access to the MSI state serialized by
cdx_irqs_lock, vdev->config_msi is fully redundant with
(vdev->cdx_irqs != NULL).  Drop the config_msi field and switch all
readers to test cdx_irqs directly.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Acked-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
Link: https://lore.kernel.org/r/20260417202800.88287-4-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
4 weeks agovfio/cdx: Serialize VFIO_DEVICE_SET_IRQS with a per-device mutex
Alex Williamson [Fri, 17 Apr 2026 20:27:57 +0000 (14:27 -0600)] 
vfio/cdx: Serialize VFIO_DEVICE_SET_IRQS with a per-device mutex

vfio_cdx_set_msi_trigger() reads vdev->config_msi and operates on the
vdev->cdx_irqs array based on its value, but provides no serialization
against concurrent VFIO_DEVICE_SET_IRQS ioctls.  Two callers can race
such that one observes config_msi as set while another clears it and
frees cdx_irqs via vfio_cdx_msi_disable(), resulting in a use-after-free
of the cdx_irqs array.

Add a cdx_irqs_lock mutex to struct vfio_cdx_device and acquire it in
vfio_cdx_set_msi_trigger(), which is the single chokepoint through
which all updates to config_msi, cdx_irqs, and msi_count flow, covering
both the ioctl path and the close-device cleanup path.  This keeps the
test of config_msi atomic with the subsequent enable, disable, or
trigger operations.

Drop the pre-call !cdx_irqs test from vfio_cdx_irqs_cleanup() as part
of this change: the optimization it provided is redundant with the
!config_msi early-return inside vfio_cdx_msi_disable(), and leaving the
test in place would be an unsynchronized read of state the new lock is
meant to protect.

Fixes: 848e447e000c ("vfio/cdx: add interrupt support")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Acked-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
Link: https://lore.kernel.org/r/20260417202800.88287-3-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
4 weeks agovfio/cdx: Fix NULL pointer dereference in interrupt trigger path
Prasanna Kumar T S M [Fri, 17 Apr 2026 20:27:56 +0000 (14:27 -0600)] 
vfio/cdx: Fix NULL pointer dereference in interrupt trigger path

Add validation to ensure MSI is configured before accessing cdx_irqs
array in vfio_cdx_set_msi_trigger(). Without this check, userspace
can trigger a NULL pointer dereference by calling VFIO_DEVICE_SET_IRQS
with VFIO_IRQ_SET_DATA_BOOL or VFIO_IRQ_SET_DATA_NONE flags before
ever setting up interrupts via VFIO_IRQ_SET_DATA_EVENTFD.

The vfio_cdx_msi_enable() function allocates the cdx_irqs array and
sets config_msi to 1 only when called through the EVENTFD path. The
trigger loop (for DATA_BOOL/DATA_NONE) assumed this had already been
done, but there was no enforcement of this call ordering.

This matches the protection used in the PCI VFIO driver where
vfio_pci_set_msi_trigger() checks irq_is() before the trigger loop.

Fixes: 848e447e000c ("vfio/cdx: add interrupt support")
Cc: stable@vger.kernel.org
Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
Acked-by: Nipun Gupta <nipun.gupta@amd.com>
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Acked-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
Link: https://lore.kernel.org/r/20260417202800.88287-2-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
4 weeks agovfio: replace vfio->device_class with a const struct class
Alex Williamson [Fri, 17 Apr 2026 15:28:12 +0000 (09:28 -0600)] 
vfio: replace vfio->device_class with a const struct class

The class_create() call has been deprecated in favor of class_register()
as the driver core now allows for a struct class to be in read-only
memory. Replace vfio->device_class with a const struct class and drop
the class_create() call.

Compile tested with both CONFIG_VFIO_DEVICE_CDEV on and off (and
CONFIG_VFIO on); found no errors/warns in dmesg.

Link: https://lore.kernel.org/all/2023040244-duffel-pushpin-f738@gregkh/
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
[Remove unused vfio_cdev_init() args]
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Link: https://lore.kernel.org/r/20260417152814.18026-1-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
4 weeks agovfio/virtio: Use guard() for bar_mutex in legacy I/O
Alex Williamson [Tue, 14 Apr 2026 20:06:22 +0000 (14:06 -0600)] 
vfio/virtio: Use guard() for bar_mutex in legacy I/O

Convert the bar_mutex acquisition in virtiovf_issue_legacy_rw_cmd()
to use guard(), eliminating the out label and goto-based error paths
in favor of direct returns.

Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20260414200625.3601509-5-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
4 weeks agovfio/virtio: Use guard() for migf->lock where applicable
Alex Williamson [Tue, 14 Apr 2026 20:06:21 +0000 (14:06 -0600)] 
vfio/virtio: Use guard() for migf->lock where applicable

Convert migf->lock acquisitions in virtiovf_disable_fd() and
virtiovf_save_read() to use guard().  In virtiovf_save_read() this
eliminates the out_unlock label and multiple goto paths by allowing
direct returns, and removes the need for the done variable to double
as an error carrier.

Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20260414200625.3601509-4-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
4 weeks agovfio/virtio: Use guard() for list_lock where applicable
Alex Williamson [Tue, 14 Apr 2026 20:06:20 +0000 (14:06 -0600)] 
vfio/virtio: Use guard() for list_lock where applicable

Convert list_lock mutex acquisitions to use guard() and scoped_guard()
where the lock scope aligns with the function or block scope.  This
simplifies virtiovf_get_data_buff_from_pos() by replacing goto-based
unwinding with direct returns.

Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20260414200625.3601509-3-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
4 weeks agovfio/virtio: Convert list_lock from spinlock to mutex
Alex Williamson [Tue, 14 Apr 2026 20:06:19 +0000 (14:06 -0600)] 
vfio/virtio: Convert list_lock from spinlock to mutex

The list_lock spinlock with IRQ disabling was copied from the mlx5
vfio-pci variant driver, where it is justified by a hardirq async
command completion callback that accesses the protected lists.  The
virtio driver has no such interrupt context usage; all list_lock
acquisitions occur in process context via file read/write operations
or state transitions under state_mutex.

Convert list_lock to a mutex to be consistent with peer vfio-pci
variant drivers (hisilicon, pds, qat, xe) which all use mutexes for
equivalent migration data protection.  This also fixes a mismatched
spin_lock()/spin_unlock_irq() pair in virtiovf_read_device_context_chunk()
that could incorrectly enable interrupts.

Reported-by: Jinhui Guo <guojinhui.liam@bytedance.com>
Closes: https://lore.kernel.org/all/20260413073603.30538-1-guojinhui.liam@bytedance.com
Fixes: 0bbc82e4ec79 ("vfio/virtio: Add support for the basic live migration functionality")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20260414200625.3601509-2-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
4 weeks agovfio/pci: Clean up DMABUFs before disabling function
Matt Evans [Wed, 15 Apr 2026 18:17:52 +0000 (11:17 -0700)] 
vfio/pci: Clean up DMABUFs before disabling function

On device shutdown, make vfio_pci_core_close_device() call
vfio_pci_dma_buf_cleanup() before the function is disabled via
vfio_pci_core_disable().  This ensures that all access via DMABUFs is
revoked before the function's BARs become inaccessible.

This fixes an issue where, if the function is disabled first, a tiny
window exists in which the function's MSE is cleared and yet BARs
could still be accessed via the DMABUF.  The resources would also be
freed and up for grabs by a different driver.

Fixes: 5d74781ebc86c ("vfio/pci: Add dma-buf export support for MMIO regions")
Signed-off-by: Matt Evans <mattev@meta.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20260415181752.1027604-1-mattev@meta.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
4 weeks agoblock: only restrict bio allocation gfp mask asked to block
Christoph Hellwig [Wed, 15 Apr 2026 06:08:07 +0000 (08:08 +0200)] 
block: only restrict bio allocation gfp mask asked to block

If the caller is asking for a non-blocking allocation, we should not
further restrict the gfp mask, which just increases the likelihood
of failures.

Fixes: b520c4eef83d ("block: split bio_alloc_bioset more clearly into a fast and slowpath")
Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://patch.msgid.link/20260415060813.807659-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoALSA: hda/realtek - Add mute LED support for HP Victus 15-fa2xxx
Spencer Payton [Tue, 21 Apr 2026 08:49:18 +0000 (10:49 +0200)] 
ALSA: hda/realtek - Add mute LED support for HP Victus 15-fa2xxx

The mute LED on this laptop uses ALC245 but requires a quirk to work.
This patch enables the existing ALC245_FIXUP_HP_MUTE_LED_COEFBIT
quirk for the device.

Tested my Victus 15-fa2xxx (PCI SSID 103c:8dcd).
The LED behaviour works as intended.

Cc: stable@vger.kernel.org
Signed-off-by: Spencer Payton <spayton681@gmail.com>
Link: https://patch.msgid.link/20260421084918.14685-1-spayton681@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
4 weeks agotools/sched_ext: scx_qmap: Silence task_ctx lookup miss
Tejun Heo [Tue, 21 Apr 2026 07:17:11 +0000 (21:17 -1000)] 
tools/sched_ext: scx_qmap: Silence task_ctx lookup miss

scx_fork() dispatches ops.init_task to exactly one scheduler - the one
owning the forking task's cgroup. A task forked inside a sub-scheduler's
cgroup is init'd into the sub only; the root scheduler has no task_ctx
entry for it. When that task later appears as @prev in the root's
qmap_dispatch() (or flows through core-sched comparison via task_qdist),
the bpf_task_storage_get() legitimately misses.

qmap treated those misses as fatal via scx_bpf_error("task_ctx lookup
failed") and aborted the scheduler as soon as the first cross-sched
task hit the root. Drop the error in the sites where the miss is
legitimate: lookup_task_ctx() (helper; callers already check for NULL),
qmap_dispatch()'s @prev branch (bookkeeping-only), task_qdist()
(returns 0 which makes the comparison a no-op), and qmap_select_cpu()
(returns prev_cpu as a no-op fallback instead of -ESRCH). The existing
scx_error was a paranoid guard from the pre-sub-sched world where every
task was owned by the one and only scheduler.

v2: qmap_select_cpu() returns prev_cpu on NULL instead of -ESRCH, so
    the root scheduler doesn't error on cross-sched tasks that pass
    through it (Andrea Righi).

Fixes: 4f8b122848db ("sched_ext: Add basic building blocks for nested sub-scheduler dispatching")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Reviewed-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
4 weeks agoALSA: pcmtest: Fix resource leaks in module init error paths
Cássio Gabriel [Tue, 21 Apr 2026 13:03:06 +0000 (10:03 -0300)] 
ALSA: pcmtest: Fix resource leaks in module init error paths

pcmtest allocates its pattern buffers and creates its debugfs tree
before registering the platform device and driver, but mod_init()
does not release those resources when a later init step fails.

As a result, a debugfs directory creation failure leaks the pattern
buffers, while platform_device_register() and
platform_driver_register() failures leave both the pattern buffers
and the debugfs tree behind. The recent fix for failed device
registration only dropped the embedded device reference.

Add the missing cleanup for the debugfs tree and pattern buffers in
the remaining module init error paths.

Fixes: 315a3d57c64c ("ALSA: Implement the new Virtual PCM Test Driver")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260421-alsa-pcmtest-init-unwind-v1-1-03fe0c423dbb@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
4 weeks agotpm: tpm_tis: stop transmit if retries are exhausted
Jacqueline Wong [Wed, 15 Apr 2026 16:00:06 +0000 (16:00 +0000)] 
tpm: tpm_tis: stop transmit if retries are exhausted

tpm_tis_send_main() will attempt to retry sending data TPM_RETRY times.
Currently, if those retries are exhausted, the driver will attempt to
call execute. The TPM will be in the wrong state, leading to the
operation simply timing out.

Instead, if there is still an error after retries are exhausted, return
that error immediately.

Cc: stable@vger.kernel.org # v6.6+
Fixes: 280db21e153d8 ("tpm_tis: Resend command to recover from data transfer errors")
Signed-off-by: Jacqueline Wong <jacqwong@google.com>
Signed-off-by: Jordan Hand <jhand@google.com>
Link: https://lore.kernel.org/r/20260415160006.2275325-3-jacqwong@google.com
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
4 weeks agotpm: tpm_tis: add error logging for data transfer
Jacqueline Wong [Wed, 15 Apr 2026 16:00:05 +0000 (16:00 +0000)] 
tpm: tpm_tis: add error logging for data transfer

Add logging to more easily determine reason for transmit failure

Cc: stable@vger.kernel.org # v6.6+
Fixes: 280db21e153d8 ("tpm_tis: Resend command to recover from data transfer errors")
Signed-off-by: Jacqueline Wong <jacqwong@google.com>
Signed-off-by: Jordan Hand <jhand@google.com>
Link: https://lore.kernel.org/r/20260415160006.2275325-2-jacqwong@google.com
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
4 weeks agotpm: avoid -Wunused-but-set-variable
Arnd Bergmann [Fri, 22 Mar 2024 13:22:48 +0000 (14:22 +0100)] 
tpm: avoid -Wunused-but-set-variable

Outside of the EFI tpm code, the TPM_MEMREMAP()/TPM_MEMUNMAP functions are
defined as trivial macros, leading to the mapping_size variable ending
up unused:

In file included from drivers/char/tpm/tpm-sysfs.c:16:
In file included from drivers/char/tpm/tpm.h:28:
include/linux/tpm_eventlog.h:167:6: error: variable 'mapping_size' set but not used [-Werror,-Wunused-but-set-variable]
  167 |         int mapping_size;

Turn the stubs into inline functions to avoid this warning.

Cc: stable@vger.kernel.org # v5.3+
Fixes: c46f3405692d ("tpm: Reserve the TPM final events table")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
4 weeks agotpm: Use kfree_sensitive() to free auth session in tpm_dev_release()
Gunnar Kudrjavets [Thu, 9 Apr 2026 17:20:54 +0000 (17:20 +0000)] 
tpm: Use kfree_sensitive() to free auth session in tpm_dev_release()

tpm_dev_release() uses plain kfree() to free chip->auth, which contains
sensitive cryptographic material including HMAC session keys, nonces,
and passphrase data (struct tpm2_auth).

Every other code path that frees this structure uses kfree_sensitive()
to zero the memory before releasing it: both tpm2_end_auth_session()
and tpm_buf_check_hmac_response() do so. The tpm_dev_release() path
is the only one that does not, leaving key material in freed slab
memory until it is eventually overwritten.

Use kfree_sensitive() for consistency with the rest of the driver and
to ensure session keys are scrubbed during device teardown.

Cc: stable@vger.kernel.org # v6.10+
Fixes: 699e3efd6c64 ("tpm: Add HMAC session start and end functions")
Signed-off-by: Gunnar Kudrjavets <gunnarku@amazon.com>
Reviewed-by: Justinien Bouron <jbouron@amazon.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
4 weeks agotpm2-sessions: Fix missing tpm_buf_destroy() in tpm2_read_public()
Gunnar Kudrjavets [Wed, 15 Apr 2026 00:00:03 +0000 (03:00 +0300)] 
tpm2-sessions: Fix missing tpm_buf_destroy() in tpm2_read_public()

tpm2_read_public() calls tpm_buf_init() but fails to call
tpm_buf_destroy() on two exit paths, leaking a page allocation:

1. When name_size() returns an error (unrecognized hash algorithm),
   the function returns directly without destroying the buffer.

2. On the success path, the buffer is never destroyed before
   returning.

All other error paths in the function correctly call
tpm_buf_destroy() before returning.

Fix both by adding the missing tpm_buf_destroy() calls.

Cc: stable@vger.kernel.org # v6.19+
Fixes: bda1cbf73c6e ("tpm2-sessions: Fix tpm2_read_public range checks")
Signed-off-by: Gunnar Kudrjavets <gunnarku@amazon.com>
Reviewed-by: Justinien Bouron <jbouron@amazon.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
4 weeks agotpm: Fix auth session leak in tpm2_get_random() error path
Gunnar Kudrjavets [Wed, 8 Apr 2026 09:00:27 +0000 (12:00 +0300)] 
tpm: Fix auth session leak in tpm2_get_random() error path

When tpm_buf_fill_hmac_session() fails inside the do-while loop in
tpm2_get_random(), the function returns directly after destroying the
buffer, without ending the auth session via tpm2_end_auth_session().

This leaks the TPM auth session resource. All other error paths within
the loop correctly reach the 'out' label which calls both
tpm_buf_destroy() and tpm2_end_auth_session().

Fix this by replacing the early return with a goto to the existing 'out'
label, which already handles both cleanup operations. The redundant
tpm_buf_destroy() call is removed since 'out' takes care of it.

Cc: stable@vger.kernel.org # v6.19+
Fixes: 6e9722e9a7bf ("tpm2-sessions: Fix out of range indexing in name_size")
Signed-off-by: Gunnar Kudrjavets <gunnarku@amazon.com>
Reviewed-by: Justinien Bouron <jbouron@amazon.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
4 weeks agotpm: i2c: atmel: fix block comment formatting
Ethan Luna [Wed, 8 Apr 2026 08:37:47 +0000 (11:37 +0300)] 
tpm: i2c: atmel: fix block comment formatting

Multiple block comments in tpm_i2c_atmel.c placed the closing '*/' on the
same line as the comment text. This violates the kernel's preferred
comment style, which requires the closing delimiter to appear on its
line.

Fix the formatting to improve readability and resolve checkpatch
warnings.

Signed-off-by: Ethan Luna <trunixcodes@zohomail.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
4 weeks agotpm_crb: Convert ACPI driver to a platform one
Rafael J. Wysocki [Mon, 23 Feb 2026 15:55:21 +0000 (16:55 +0100)] 
tpm_crb: Convert ACPI driver to a platform one

In all cases in which a struct acpi_driver is used for binding a driver
to an ACPI device object, a corresponding platform device is created by
the ACPI core and that device is regarded as a proper representation of
underlying hardware.  Accordingly, a struct platform_driver should be
used by driver code to bind to that device.  There are multiple reasons
why drivers should not bind directly to ACPI device objects [1].

Overall, it is better to bind drivers to platform devices than to their
ACPI companions, so convert the tpm_crb ACPI driver to a platform one.

While this is not expected to alter functionality, it changes sysfs
layout and so it will be visible to user space.

Link: https://lore.kernel.org/all/2396510.ElGaqSPkdT@rafael.j.wysocki/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
4 weeks agotpm: Make tcpci_pm_ops variable static const
Krzysztof Kozlowski [Mon, 16 Feb 2026 11:04:59 +0000 (12:04 +0100)] 
tpm: Make tcpci_pm_ops variable static const

File-scope 'tcpci_pm_ops' is not used outside of this unit and is not
modified anywhere, so make it static const to silence sparse warning:

  tcpci.c:1002:1: warning: symbol 'tcpci_pm_ops' was not declared. Should it be static?

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
4 weeks agokgdb: update outdated references to kgdb_wait()
Kexin Sun [Tue, 24 Mar 2026 03:23:44 +0000 (11:23 +0800)] 
kgdb: update outdated references to kgdb_wait()

The function kgdb_wait() was folded into the static function
kgdb_cpu_enter() by commit 62fae312197a ("kgdb: eliminate
kgdb_wait(), all cpus enter the same way").  Update the four stale
references accordingly:

 - include/linux/kgdb.h and arch/x86/kernel/kgdb.c: the
   kgdb_roundup_cpus() kdoc describes what other CPUs are rounded up
   to call.  Because kgdb_cpu_enter() is static, the correct public
   entry point is kgdb_handle_exception(); also fix a pre-existing
   grammar error ("get them be" -> "get them into") and reflow the
   text.

 - kernel/debug/debug_core.c: replace with the generic description
   "the debug trap handler", since the actual entry path is
   architecture-specific.

 - kernel/debug/gdbstub.c: kgdb_cpu_enter() is correct here (it
   describes internal state, not a call target); add the missing
   parentheses.

Suggested-by: Daniel Thompson <daniel@riscstar.com>
Assisted-by: unnamed:deepseek-v3.2 coccinelle
Signed-off-by: Kexin Sun <kexinsun@smail.nju.edu.cn>
4 weeks agoMerge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Linus Torvalds [Tue, 21 Apr 2026 15:33:26 +0000 (08:33 -0700)] 
Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk updates from Stephen Boyd:
 "We've finally gotten rid of the struct clk_ops::round_rate() code
  after months of effort from Brian Masney. Now the only option is to
  use determine_rate(), which is good because that takes a struct
  argument instead of just a couple unsigned longs, allowing us to
  easily modify the way we determine and set rates in the clk tree.

  Beyond that core framework change we've got the typical pile of new
  SoC clk driver additions, fixes for clk data and/or adding missing
  clks because the consumer driver using those clks wasn't ready, etc.
  The usual suspects are all here: Qualcomm, Samsung, Mediatek, and
  Rockchip along with some newcomers making RISC-V SoCs like ESWIN's
  eic700 and Tenstorrent's Atlantis. The clk driver side of this looks
  pretty normal.

  Core:
   - Remove the round_rate() clk op (yay!)

  New Drivers:
   - ESWIN eic700 SoC clk support
   - Econet EN751221 SoC clock/reset support
   - Global TCSR, RPMh, and display clock controller support for the
     Qualcomm Eliza platform
   - TCSR, the multiple global, and the RPMh clock controller support
     for the Qualcomm Nord platform
   - GPU clock controller support for Qualcomm SM8750
   - Video and GPU clock controller support for Qualcomm Glymur
   - Global clock controller support for Qualcomm IPQ5210
   - Axis ARTPEC-9: Add new PLL clocks and new drivers for eight clock
     controllers on the SoC
   - ExynosAutov920: Add G3D (GPU) clock controller
   - Clock driver for the Rockchip RV1103B SoC
   - Initial support for the Renesas RZ/G3L (R9A08G046) SoC
   - Clock and reset controllers (e.g. PRCM) in the Tenstorrent Atlantis SoC"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (132 commits)
  clk: visconti: pll: initialize clk_init_data to zero
  clk: fsl-sai: Add MCLK generation support
  clk: fsl-sai: Extract clock setup into fsl_sai_clk_register()
  dt-bindings: clock: fsl-sai: Document clock-cells = <1> support
  clk: fsl-sai: Add i.MX8M support with 8 byte register offset
  clk: fsl-sai: Sort the headers
  dt-bindings: clock: fsl-sai: Document i.MX8M support
  clk: qcom: gcc: Add multiple global clock controller driver for Nord SoC
  clk: qcom: rpmh: Add support for Nord rpmh clocks
  clk: qcom: Add TCSR clock driver for Nord SoC
  dt-bindings: clock: qcom: Add Nord Global Clock Controller
  dt-bindings: clock: qcom-rpmhcc: Add support for Nord SoCs
  dt-bindings: clock: qcom: Document the Nord SoC TCSR Clock Controller
  clk: qcom: gcc-x1e80100: Keep GCC USB QTB clock always ON
  clk: qcom: Constify list of critical CBCR registers
  clk: qcom: Constify qcom_cc_driver_data
  clk: qcom: videocc-glymur: Constify qcom_cc_desc
  clk: qcom: Add a driver for SM8750 GPU clocks
  dt-bindings: clock: qcom: Add SM8750 GPU clocks
  clk: qcom: ipq-cmn-pll: Add IPQ8074 SoC support
  ...

4 weeks agoMerge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Tue, 21 Apr 2026 15:22:18 +0000 (08:22 -0700)] 
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "Usual driver updates (ufs, lpfc, fnic, target, mpi3mr).

  The substantive core changes are adding a 'serial' sysfs attribute and
  getting sd to support > PAGE_SIZE sectors"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (98 commits)
  scsi: target: Don't validate ignored fields in PROUT PREEMPT
  scsi: qla2xxx: Use nr_cpu_ids instead of NR_CPUS for qp_cpu_map allocation
  scsi: ufs: core: Disable timestamp for Kioxia THGJFJT0E25BAIP
  scsi: mpi3mr: Fix typo
  scsi: sd: fix missing put_disk() when device_add(&disk_dev) fails
  scsi: libsas: Delete unused to_dom_device() and to_dev_attr()
  scsi: storvsc: Handle PERSISTENT_RESERVE_IN truncation for Hyper-V vFC
  scsi: iscsi_tcp: Remove unneeded selections of CRYPTO and CRYPTO_MD5
  scsi: lpfc: Update lpfc version to 15.0.0.0
  scsi: lpfc: Add PCI ID support for LPe42100 series adapters
  scsi: lpfc: Introduce 128G link speed selection and support
  scsi: lpfc: Check ASIC_ID register to aid diagnostics during failed fw updates
  scsi: lpfc: Update construction of SGL when XPSGL is enabled
  scsi: lpfc: Remove deprecated PBDE feature
  scsi: lpfc: Add REG_VFI mailbox cmd error handling
  scsi: lpfc: Log MCQE contents for mbox commands with no context
  scsi: lpfc: Select mailbox rq_create cmd version based on SLI4 if_type
  scsi: lpfc: Break out of IRQ affinity assignment when mask reaches nr_cpu_ids
  scsi: ufs: core: Make the header files self-contained
  scsi: ufs: core: Remove an include directive from ufshcd-crypto.h
  ...

4 weeks agoMerge tag 'v7.1-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Tue, 21 Apr 2026 15:06:43 +0000 (08:06 -0700)] 
Merge tag 'v7.1-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:

 - Fix IPsec ESN regression in authencesn

 - Fix hmac setkey failure in eip93

 - Guard against IV changing in algif_aead

 - Fix async completion handling in krb5enc

 - Fix fallback async completion in acomp

 - Fix handling of MAY_BACKLOG requests in pcrypt

 - Fix issues with firmware-returned values in ccp

* tag 'v7.1-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: krb5enc - fix async decrypt skipping hash verification
  crypto: algif_aead - snapshot IV for async AEAD requests
  crypto: acomp - fix wrong pointer stored by acomp_save_req()
  crypto: ccp - copy IV using skcipher ivsize
  crypto: ccp: Don't attempt to copy ID to userspace if PSP command failed
  crypto: ccp: Don't attempt to copy PDH cert to userspace if PSP command failed
  crypto: ccp: Don't attempt to copy CSR to userspace if PSP command failed
  crypto: pcrypt - Fix handling of MAY_BACKLOG requests
  crypto: sa2ul - Fix AEAD fallback algorithm names
  crypto: authencesn - Fix src offset when decrypting in-place
  crypto: eip93 - fix hmac setkey algo selection

4 weeks agotracing/fprobe: Check the same type fprobe on table as the unregistered one
Masami Hiramatsu (Google) [Mon, 20 Apr 2026 14:01:20 +0000 (23:01 +0900)] 
tracing/fprobe: Check the same type fprobe on table as the unregistered one

Commit 2c67dc457bc6 ("tracing: fprobe: optimization for entry only case")
introduced a different ftrace_ops for entry-only fprobes.

However, when unregistering an fprobe, the kernel only checks if another
fprobe exists at the same address, without checking which type of fprobe
it is.
If different fprobes are registered at the same address, the same address
will be registered in both fgraph_ops and ftrace_ops, but only one of
them will be deleted when unregistering. (the one removed first will not
be deleted from the ops).

This results in junk entries remaining in either fgraph_ops or ftrace_ops.
For example:
 =======
 cd /sys/kernel/tracing

 # 'Add entry and exit events on the same place'
 echo 'f:event1 vfs_read' >> dynamic_events
 echo 'f:event2 vfs_read%return' >> dynamic_events

 # 'Enable both of them'
 echo 1 > events/fprobes/enable
 cat enabled_functions
vfs_read (2)            ->arch_ftrace_ops_list_func+0x0/0x210

 # 'Disable and remove exit event'
 echo 0 > events/fprobes/event2/enable
 echo -:event2 >> dynamic_events

 # 'Disable and remove all events'
 echo 0 > events/fprobes/enable
 echo > dynamic_events

 # 'Add another event'
 echo 'f:event3 vfs_open%return' > dynamic_events
 cat dynamic_events
f:fprobes/event3 vfs_open%return

 echo 1 > events/fprobes/enable
 cat enabled_functions
vfs_open (1)            tramp: 0xffffffffa0001000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60    subops: {ent:fprobe_fgraph_entry+0x0/0x620 ret:fprobe_return+0x0/0x150}
vfs_read (1)            tramp: 0xffffffffa0001000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60    subops: {ent:fprobe_fgraph_entry+0x0/0x620 ret:fprobe_return+0x0/0x150}
 =======

As you can see, an entry for the vfs_read remains.

To fix this issue, when unregistering, the kernel should also check if
there is the same type of fprobes still exist at the same address, and
if not, delete its entry from either fgraph_ops or ftrace_ops.

Link: https://lore.kernel.org/all/177669367993.132053.10553046138528674802.stgit@mhiramat.tok.corp.google.com/
Fixes: 2c67dc457bc6 ("tracing: fprobe: optimization for entry only case")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
4 weeks agotracing/fprobe: Avoid kcalloc() in rcu_read_lock section
Masami Hiramatsu (Google) [Mon, 20 Apr 2026 14:01:12 +0000 (23:01 +0900)] 
tracing/fprobe: Avoid kcalloc() in rcu_read_lock section

fprobe_remove_node_in_module() is called under RCU read locked, but
this invokes kcalloc() if there are more than 8 fprobes installed
on the module. Sashiko warns it because kcalloc() can sleep [1].

 [1] https://sashiko.dev/#/patchset/177552432201.853249.5125045538812833325.stgit%40mhiramat.tok.corp.google.com

To fix this issue, expand the batch size to 128 and do not expand
the fprobe_addr_list, but just cancel walking on fprobe_ip_table,
update fgraph/ftrace_ops and retry the loop again.

Link: https://lore.kernel.org/all/177669367206.132053.1493637946869032744.stgit@mhiramat.tok.corp.google.com/
Fixes: 0de4c70d04a4 ("tracing: fprobe: use rhltable for fprobe_ip_table")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
4 weeks agospi: Fix the error description in the `ptp_sts_word_post` comment
Dewei Meng [Tue, 21 Apr 2026 02:58:08 +0000 (10:58 +0800)] 
spi: Fix the error description in the `ptp_sts_word_post` comment

Based on the comment information, the description within the
`ptp_sts_word_post` section should be changed to "See @ptp_sts_word_pre".

Signed-off-by: Dewei Meng <mengdewei@cqsoftware.com.cn>
Link: https://patch.msgid.link/20260421025808.6572-1-mengdewei@cqsoftware.com.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
4 weeks agotracing/fprobe: Remove fprobe from hash in failure path
Masami Hiramatsu (Google) [Mon, 20 Apr 2026 14:01:04 +0000 (23:01 +0900)] 
tracing/fprobe: Remove fprobe from hash in failure path

When register_fprobe_ips() fails, it tries to remove a list of
fprobe_hash_node from fprobe_ip_table, but it missed to remove
fprobe itself from fprobe_table. Moreover, when removing
the fprobe_hash_node which is added to rhltable once, it must
use kfree_rcu() after removing from rhltable.

To fix these issues, this reuses unregister_fprobe() internal
code to rollback the half-way registered fprobe.

Link: https://lore.kernel.org/all/177669366417.132053.17874946321744910456.stgit@mhiramat.tok.corp.google.com/
Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
4 weeks agotracing/fprobe: Unregister fprobe even if memory allocation fails
Masami Hiramatsu (Google) [Mon, 20 Apr 2026 14:00:56 +0000 (23:00 +0900)] 
tracing/fprobe: Unregister fprobe even if memory allocation fails

unregister_fprobe() can fail under memory pressure because of memory
allocation failure, but this maybe called from module unloading, and
usually there is no way to retry it. Moreover. trace_fprobe does not
check the return value.

To fix this problem, unregister fprobe and fprobe_hash_node even if
working memory allocation fails.
Anyway, if the last fprobe is removed, the filter will be freed.

Link: https://lore.kernel.org/all/177669365629.132053.8433032896213721288.stgit@mhiramat.tok.corp.google.com/
Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
4 weeks agotracing/fprobe: Reject registration of a registered fprobe before init
Masami Hiramatsu (Google) [Mon, 20 Apr 2026 14:00:48 +0000 (23:00 +0900)] 
tracing/fprobe: Reject registration of a registered fprobe before init

Reject registration of a registered fprobe which is on the fprobe
hash table before initializing fprobe.
The add_fprobe_hash() checks this re-register fprobe, but since
fprobe_init() clears hlist_array field, it is too late to check it.
It has to check the re-registration before touncing fprobe.

Link: https://lore.kernel.org/all/177669364845.132053.18375367916162315835.stgit@mhiramat.tok.corp.google.com/
Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
4 weeks agoMerge tag 'pull-dcache-busy-wait' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 21 Apr 2026 14:30:44 +0000 (07:30 -0700)] 
Merge tag 'pull-dcache-busy-wait' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull dcache busy loop updates from Al Viro:
 "Fix livelocks in shrink_dcache_tree()

  If shrink_dcache_tree() finds a dentry in the middle of being killed
  by another thread, it has to wait until the victim finishes dying,
  gets detached from the tree and ceases to pin its parent.

  The way we used to deal with that amounted to busy-wait;
  unfortunately, it's not just inefficient but can lead to reliably
  reproducible hard livelocks.

  Solved by having shrink_dentry_tree() attach a completion to such
  dentry, with dentry_unlist() calling complete() on all objects
  attached to it. With a bit of care it can be done without growing
  struct dentry or adding overhead in normal case"

* tag 'pull-dcache-busy-wait' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  get rid of busy-waiting in shrink_dcache_tree()
  dcache.c: more idiomatic "positives are not allowed" sanity checks
  struct dentry: make ->d_u anonymous
  for_each_alias(): helper macro for iterating through dentries of given inode

4 weeks agoarm64: dts: meson-gxl-p230: fix ethernet PHY interrupt number
Jun Yan [Mon, 30 Mar 2026 14:51:11 +0000 (22:51 +0800)] 
arm64: dts: meson-gxl-p230: fix ethernet PHY interrupt number

Correct the interrupt number assigned to the Realtek PHY in the p230

following the same logic as commit 3106507e1004 ("ARM64: dts: meson-gxm:
fix q200 interrupt number"),as reported in [PATCH 0/2] Ethernet PHY
interrupt improvements [1].

[1] https://lore.kernel.org/all/20171202214037.17017-1-martin.blumenstingl@googlemail.com/

Fixes: b94d22d94ad2 ("ARM64: dts: meson-gx: add external PHY interrupt on some platforms")
Signed-off-by: Jun Yan <jerrysteve1101@gmail.com>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://patch.msgid.link/20260330145111.115318-1-jerrysteve1101@gmail.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
4 weeks agoarm64: dts: amlogic: meson-axg: Add missing cache information to cpu0
Anand Moon [Thu, 19 Feb 2026 10:35:46 +0000 (16:05 +0530)] 
arm64: dts: amlogic: meson-axg: Add missing cache information to cpu0

Add missing L1 data and instruction cache parameters to the CPU node 0
for the Cortex-A53 caches on the Meson AXG SoC.

Fixes: 3b6ad2a43367 ("arm64: dts: amlogic: Add cache information to the Amlogic AXG SoCS")
Signed-off-by: Anand Moon <linux.amoon@gmail.com>
Link: https://patch.msgid.link/20260219103548.18392-1-linux.amoon@gmail.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
4 weeks agoarm64: dts: amlogic: t7: khadas-vim4: fix board model name
Nick Xie [Fri, 6 Mar 2026 03:07:56 +0000 (11:07 +0800)] 
arm64: dts: amlogic: t7: khadas-vim4: fix board model name

Update the model property to "Khadas VIM4" to match the official
product branding and maintain consistency with other Khadas boards
(e.g., VIM1, VIM2, VIM3) in the kernel tree.

Signed-off-by: Nick Xie <nick@khadas.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20260306030756.2421841-1-nick@khadas.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
4 weeks agoarm64: dts: amlogic: Fix GIC register ranges for Amlogic T7
Ronald Claveau [Thu, 5 Mar 2026 22:11:25 +0000 (23:11 +0100)] 
arm64: dts: amlogic: Fix GIC register ranges for Amlogic T7

This patch aims to fix the GIC register ranges for Amlogic T7 SoC family.

- Context
Kernel log shows a warning about GIC
[    0.000000] GIC: GICv2 detected, but range too small and irqchip.gicv2_force_probe not set

Using cat /proc/interrupts command shows GIC as GIC-0

Adding some peripherals sometimes causes hangs on interrupts.

- According to the GIC-400 ARM doc, the memory map is like:
0x1000-0x1FFF Distributor
0x2000-0x3FFF CPU interfaces
0x4000-0x5FFF Virtual interface control block
0x6000-0x7FFF Virtual CPU interfaces

- Identify GIC model from distributor register

Offset | Name | Type | Reset
0x008 | GICD_IIDR | RO | 0x0200143B

kvim4# md.l 0xFFF01008 1
fff010080200143b

- Identify CPU interface from CPU interface register

Offset | Name | Type | Reset
0x00FC | GICC_IIDR | RO | 0x0202143B

kvim4# md.l 0xFFF020FC 1
fff020fc0202143b

- Virtual interface control register check

Offset | Name | Type | Reset
0x004 | GICH_VTR | RO | 0x90000003

kvim4# md.l 0xFFF04004 1
fff0400490000003

- Virtual CPU interfaces check

Offset | Name | Type | Reset
0x00FC | GICV_IIDR | RO | 0x0202143B

kvim4# md.l 0xFFF060FC 1
fff060fc0202143b

- After this patch there is no warning anymore.
GICv2 is correctly identified.

[    0.000000] GIC: Using split EOI/Deactivate mode

Using cat /proc/interrupts command shows GIC as GICv2

Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20260305-fix-amlt7-gic-dts-v1-1-5944415c74bf@aliel.fr
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
4 weeks agoarm64: dts: amlogic: t7: khadas-vim4: fix memory layout for 8GB RAM
Nick Xie [Thu, 19 Mar 2026 02:34:46 +0000 (10:34 +0800)] 
arm64: dts: amlogic: t7: khadas-vim4: fix memory layout for 8GB RAM

The Khadas VIM4 features 8GB of LPDDR4X RAM. The previous memory node
mapped a single incorrect region. This caused the kernel to map MMIO
and secure firmware (ATF/TrustZone) memory holes as standard RAM,
leading to an Asynchronous SError Interrupt during early boot
(paging_init) when the kernel attempted to clear those pages.

Fix this by splitting the 8GB memory layout into three separate
regions to properly avoid the memory holes (e.g., 0xe0000000 -
0xffffffff):
- 3.5GB @ 0x000000000
- 3.5GB @ 0x100000000
- 1.0GB @ 0x200000000

Signed-off-by: Nick Xie <nick@khadas.com>
Suggested-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
Link: https://patch.msgid.link/20260319023446.3422695-1-nick@khadas.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
4 weeks agoarm64: dts: amlogic: s6: Drop CPU masks from GICv3 PPI interrupts
Geert Uytterhoeven [Wed, 4 Mar 2026 17:10:58 +0000 (18:10 +0100)] 
arm64: dts: amlogic: s6: Drop CPU masks from GICv3 PPI interrupts

Unlike older GIC variants, the GICv3 DT bindings do not support
specifying a CPU mask in PPI interrupt specifiers.  Drop the masks.
While at it, replace the magic number for IRQ_TYPE_LEVEL_HIGH by its
symbolic definition.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/f9c6eddebebcd2e128edd2dbc51706e23589f9e8.1772643434.git.geert+renesas@glider.be
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
4 weeks agonet/sched: sch_dualpi2: drain both C-queue and L-queue in dualpi2_change()
Chia-Yu Chang [Fri, 17 Apr 2026 15:25:51 +0000 (17:25 +0200)] 
net/sched: sch_dualpi2: drain both C-queue and L-queue in dualpi2_change()

Fix dualpi2_change() to correctly enforce updated limit and memlimit
values after a configuration change of the dualpi2 qdisc.

Before this patch, dualpi2_change() always attempted to dequeue packets
via the root qdisc (C-queue) when reducing backlog or memory usage, and
unconditionally assumed that a valid skb will be returned. When traffic
classification results in packets being queued in the L-queue while the
C-queue is empty, this leads to a NULL skb dereference during limit or
memlimit enforcement.

This is fixed by first dequeuing from the C-queue path if it is
non-empty. Once the C-queue is empty, packets are dequeued directly from
the L-queue. Return values from qdisc_dequeue_internal() are checked for
both queues. When dequeuing from the L-queue, the parent qdisc qlen and
backlog counters are updated explicitly to keep overall qdisc statistics
consistent.

Fixes: 320d031ad6e4 ("sched: Struct definition and parsing of dualpi2 qdisc")
Reported-by: "Kito Xu (veritas501)" <hxzene@gmail.com>
Closes: https://lore.kernel.org/netdev/20260413075740.2234828-1-hxzene@gmail.com/
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Link: https://patch.msgid.link/20260417152551.71648-1-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 weeks agonet: airoha: Fix PPE cpu port configuration for GDM2 loopback path
Lorenzo Bianconi [Fri, 17 Apr 2026 15:24:41 +0000 (17:24 +0200)] 
net: airoha: Fix PPE cpu port configuration for GDM2 loopback path

When QoS loopback is enabled for GDM3 or GDM4, incoming packets are
forwarded to GDM2. However, the PPE cpu port for GDM2 is not configured
in this path, causing traffic originating from GDM3/GDM4, which may
be set up as WAN ports backed by QDMA1, to be incorrectly directed
to QDMA0 instead.
Configure the PPE cpu port for GDM2 when QoS loopback is active on
GDM3 or GDM4 to ensure traffic is routed to the correct QDMA instance.

Fixes: 9cd451d414f6 ("net: airoha: Add loopback support for GDM2")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260417-airoha-ppe-cpu-port-for-gdm2-loopback-v1-1-c7a9de0f6f57@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 weeks agoipmi: Check event message buffer response for bad data
Corey Minyard [Mon, 20 Apr 2026 17:50:09 +0000 (12:50 -0500)] 
ipmi: Check event message buffer response for bad data

The event message buffer response data size got checked later when
processing, but check it right after the response comes back.  It
appears some BMCs may return an empty message instead of an error
when fetching events.

There are apparently some new BMCs that make this error, so we need to
compensate.

Reported-by: Matt Fleming <mfleming@cloudflare.com>
Closes: https://lore.kernel.org/lkml/20260415115930.3428942-1-matt@readmodwrite.com/
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: <stable@vger.kernel.org>
Signed-off-by: Corey Minyard <corey@minyard.net>
4 weeks agoMerge branch 'net-sleepable-ndo_set_rx_mode'
Paolo Abeni [Tue, 21 Apr 2026 10:50:26 +0000 (12:50 +0200)] 
Merge branch 'net-sleepable-ndo_set_rx_mode'

Stanislav Fomichev says:

====================
net: sleepable ndo_set_rx_mode

This series adds a new ndo_set_rx_mode_async callback that enables
drivers to handle address list updates in a sleepable context. The
current ndo_set_rx_mode is called under the netif_addr_lock spinlock
with BHs disabled, which prevents drivers from sleeping. This is
problematic for ops-locked drivers that need to sleep.

The approach:
1. Add snapshot/reconcile infrastructure for address lists
2. Introduce dev_rx_mode_work that takes snapshots under the lock,
   drops the lock, calls the driver, then reconciles changes back
3. Move promiscuity handling into the scheduled work as well
4. Convert existing ops-locked drivers to ndo_set_rx_mode_async
5. Add a warning for ops-locked drivers still using ndo_set_rx_mode
6. Add a selftest exercising the team+bridge+macvlan topology that
   triggers the addr_lock -> ops_lock ordering issue
====================

Link: https://patch.msgid.link/20260416185712.2155425-1-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 weeks agoselftests: net: use ip commands instead of teamd in team rx_mode test
Stanislav Fomichev [Thu, 16 Apr 2026 18:57:12 +0000 (11:57 -0700)] 
selftests: net: use ip commands instead of teamd in team rx_mode test

Replace teamd daemon usage with ip link commands for team device
setup. teamd -d daemonizes and returns to the shell before port
addition completes, creating a race: the test may create the macvlan
(and check for its address on a slave) before teamd has finished
adding ports. This makes the test inherently dependent on scheduling
timing.

Using ip commands makes port addition synchronous, removing the race
and making the test deterministic.

Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Jay Vosburgh <jv@jvosburgh.net>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-16-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 weeks agoselftests: net: add team_bridge_macvlan rx_mode test
Stanislav Fomichev [Thu, 16 Apr 2026 18:57:11 +0000 (11:57 -0700)] 
selftests: net: add team_bridge_macvlan rx_mode test

Add a test that exercises the ndo_change_rx_flags path through a
macvlan -> bridge -> team -> dummy stack. This triggers dev_uc_add
under addr_list_lock which flips promiscuity on the lower device.
With the new work queue approach, this must not deadlock.

Link: https://lore.kernel.org/netdev/20260214033859.43857-1-jiayuan.chen@linux.dev/
Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-15-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 weeks agonet: warn ops-locked drivers still using ndo_set_rx_mode
Stanislav Fomichev [Thu, 16 Apr 2026 18:57:10 +0000 (11:57 -0700)] 
net: warn ops-locked drivers still using ndo_set_rx_mode

Now that all in-tree ops-locked drivers have been converted to
ndo_set_rx_mode_async, add a warning in register_netdevice to catch
any remaining or newly added drivers that use ndo_set_rx_mode with
ops locking. This ensures future driver authors are guided toward
the async path.

Also route ops-locked devices through netdev_rx_mode_work even if they
lack rx_mode NDOs, to ensure netdev_ops_assert_locked() does not fire
on the legacy path where only RTNL is held.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-14-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 weeks agonetkit: convert to ndo_set_rx_mode_async
Stanislav Fomichev [Thu, 16 Apr 2026 18:57:09 +0000 (11:57 -0700)] 
netkit: convert to ndo_set_rx_mode_async

Convert netkit driver from ndo_set_rx_mode to ndo_set_rx_mode_async.
The netkit driver's set_multicast_list is a no-op, presumably
for the same reason as the one in dummy? (fake multicast ability)

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-13-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 weeks agodummy: convert to ndo_set_rx_mode_async
Stanislav Fomichev [Thu, 16 Apr 2026 18:57:08 +0000 (11:57 -0700)] 
dummy: convert to ndo_set_rx_mode_async

Convert dummy driver from ndo_set_rx_mode to ndo_set_rx_mode_async.
The dummy driver's set_multicast_list is a no-op, so the conversion
is straightforward: update the signature and the ops assignment.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-12-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 weeks agonetdevsim: convert to ndo_set_rx_mode_async
Stanislav Fomichev [Thu, 16 Apr 2026 18:57:07 +0000 (11:57 -0700)] 
netdevsim: convert to ndo_set_rx_mode_async

Convert netdevsim from ndo_set_rx_mode to ndo_set_rx_mode_async.
The callback is a no-op stub so just update the signature and
ops struct wiring.

Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-11-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 weeks agoiavf: convert to ndo_set_rx_mode_async
Stanislav Fomichev [Thu, 16 Apr 2026 18:57:06 +0000 (11:57 -0700)] 
iavf: convert to ndo_set_rx_mode_async

Convert iavf from ndo_set_rx_mode to ndo_set_rx_mode_async.
iavf_set_rx_mode now takes explicit uc/mc list parameters and
uses __hw_addr_sync_dev on the snapshots instead of __dev_uc_sync
and __dev_mc_sync.

The iavf_configure internal caller passes the real lists directly.

Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-10-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 weeks agobnxt: use snapshot in bnxt_cfg_rx_mode
Stanislav Fomichev [Thu, 16 Apr 2026 18:57:05 +0000 (11:57 -0700)] 
bnxt: use snapshot in bnxt_cfg_rx_mode

With the introduction of ndo_set_rx_mode_async (as discussed in [1])
we can call bnxt_cfg_rx_mode directly. Convert bnxt_cfg_rx_mode to
use uc/mc snapshots and move its call in bnxt_sp_task to the
section that resets BNXT_STATE_IN_SP_TASK. Switch to direct call in
bnxt_set_rx_mode.

Link: https://lore.kernel.org/netdev/CACKFLi=5vj8hPqEUKDd8RTw3au5G+zRgQEqjF+6NZnyoNm90KA@mail.gmail.com/
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-9-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 weeks agobnxt: convert to ndo_set_rx_mode_async
Stanislav Fomichev [Thu, 16 Apr 2026 18:57:04 +0000 (11:57 -0700)] 
bnxt: convert to ndo_set_rx_mode_async

Convert bnxt from ndo_set_rx_mode to ndo_set_rx_mode_async.
bnxt_set_rx_mode, bnxt_mc_list_updated and bnxt_uc_list_updated
now take explicit uc/mc list parameters and iterate with
netdev_hw_addr_list_for_each instead of netdev_for_each_{uc,mc}_addr.

The bnxt_cfg_rx_mode internal caller passes the real lists under
netif_addr_lock_bh.

BNXT_RX_MASK_SP_EVENT is still used here, next patch converts to
the direct call.

Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-8-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 weeks agomlx5: convert to ndo_set_rx_mode_async
Stanislav Fomichev [Thu, 16 Apr 2026 18:57:03 +0000 (11:57 -0700)] 
mlx5: convert to ndo_set_rx_mode_async

Convert mlx5 from ndo_set_rx_mode to ndo_set_rx_mode_async. The
driver's mlx5e_set_rx_mode now receives uc/mc snapshots and calls
mlx5e_fs_set_rx_mode_work directly instead of queueing work.

mlx5e_sync_netdev_addr and mlx5e_handle_netdev_addr now take
explicit uc/mc list parameters and iterate with
netdev_hw_addr_list_for_each instead of netdev_for_each_{uc,mc}_addr.

Fallback to netdev's uc/mc in a few places and grab addr lock.

Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-7-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 weeks agofbnic: convert to ndo_set_rx_mode_async
Stanislav Fomichev [Thu, 16 Apr 2026 18:57:02 +0000 (11:57 -0700)] 
fbnic: convert to ndo_set_rx_mode_async

Convert fbnic from ndo_set_rx_mode to ndo_set_rx_mode_async. The
driver's __fbnic_set_rx_mode() now takes explicit uc/mc list
parameters and uses __hw_addr_sync_dev() on the snapshots instead
of __dev_uc_sync/__dev_mc_sync on the netdev directly.

Update callers in fbnic_up, fbnic_fw_config_after_crash,
fbnic_bmc_rpc_check and fbnic_set_mac to pass the real address
lists calling __fbnic_set_rx_mode outside the async work path.

Cc: Alexander Duyck <alexanderduyck@fb.com>
Cc: kernel-team@meta.com
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-6-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 weeks agonet: move promiscuity handling into netdev_rx_mode_work
Stanislav Fomichev [Thu, 16 Apr 2026 18:57:01 +0000 (11:57 -0700)] 
net: move promiscuity handling into netdev_rx_mode_work

Move unicast promiscuity tracking into netdev_rx_mode_work so it runs
under netdev_ops_lock instead of under the addr_lock spinlock. This
is required because __dev_set_promiscuity calls dev_change_rx_flags
and __dev_notify_flags, both of which may need to sleep.

Change ASSERT_RTNL() to netdev_ops_assert_locked() in
__dev_set_promiscuity, netif_set_allmulti and __dev_change_flags
since these are now called from the work queue under the ops lock.

Link: https://lore.kernel.org/netdev/20260214033859.43857-1-jiayuan.chen@linux.dev/
Fixes: 78cd408356fe ("net: add missing instance lock to dev_set_promiscuity")
Reported-by: syzbot+2b3391f44313b3983e91@syzkaller.appspotmail.com
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-5-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 weeks agonet: cache snapshot entries for ndo_set_rx_mode_async
Stanislav Fomichev [Thu, 16 Apr 2026 18:57:00 +0000 (11:57 -0700)] 
net: cache snapshot entries for ndo_set_rx_mode_async

Add a per-device netdev_hw_addr_list cache (rx_mode_addr_cache) that
allows __hw_addr_list_snapshot() and __hw_addr_list_reconcile() to
reuse previously allocated entries instead of hitting GFP_ATOMIC on
every snapshot cycle.

snapshot pops entries from the cache when available, falling back to
__hw_addr_create(). reconcile splices both snapshot lists back into
the cache via __hw_addr_splice(). The cache is flushed in
free_netdev().

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-4-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 weeks agonet: introduce ndo_set_rx_mode_async and netdev_rx_mode_work
Stanislav Fomichev [Thu, 16 Apr 2026 18:56:59 +0000 (11:56 -0700)] 
net: introduce ndo_set_rx_mode_async and netdev_rx_mode_work

Add ndo_set_rx_mode_async callback that drivers can implement instead
of the legacy ndo_set_rx_mode. The legacy callback runs under the
netif_addr_lock spinlock with BHs disabled, preventing drivers from
sleeping. The async variant runs from a work queue with rtnl_lock and
netdev_lock_ops held, in fully sleepable context.

When __dev_set_rx_mode() sees ndo_set_rx_mode_async, it schedules
netdev_rx_mode_work instead of calling the driver inline. The work
function takes two snapshots of each address list (uc/mc) under
the addr_lock, then drops the lock and calls the driver with the
work copies. After the driver returns, it reconciles the snapshots
back to the real lists under the lock.

Add netif_rx_mode_sync() to opportunistically execute the pending
workqueue update inline, so that rx mode changes are committed
before returning to userspace:
  - dev_change_flags (SIOCSIFFLAGS / RTM_NEWLINK)
  - dev_set_promiscuity
  - dev_set_allmulti
  - dev_ifsioc SIOCADDMULTI / SIOCDELMULTI
  - do_setlink (RTM_SETLINK)

Note that some deep hierarchies still do skip the lower updates via:
  - dev_uc_sync
  - dev_mc_sync

If we do end up hitting user-visible issues, we can add more calls to
netif_rx_mode_sync in specific places. But hopefully we should not,
the actual user-visible lists are still synced, it's that just HW state
that might be lagging.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-3-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 weeks agonet: add address list snapshot and reconciliation infrastructure
Stanislav Fomichev [Thu, 16 Apr 2026 18:56:58 +0000 (11:56 -0700)] 
net: add address list snapshot and reconciliation infrastructure

Introduce __hw_addr_list_snapshot() and __hw_addr_list_reconcile()
for use by the upcoming ndo_set_rx_mode_async callback.

The async rx_mode path needs to snapshot the device's unicast and
multicast address lists under the addr_lock, hand those snapshots
to the driver (which may sleep), and then propagate any sync_cnt
changes back to the real lists. Two identical snapshots are taken:
a work copy for the driver to pass to __hw_addr_sync_dev() and a
reference copy to compute deltas against.

__hw_addr_list_reconcile() walks the reference snapshot comparing
each entry against the work snapshot to determine what the driver
synced or unsynced. It then applies those deltas to the real list,
handling concurrent modifications:

  - If the real entry was concurrently removed but the driver synced
    it to hardware (delta > 0), re-insert a stale entry so the next
    work run properly unsyncs it from hardware.
  - If the entry still exists, apply the delta normally. An entry
    whose refcount drops to zero is removed.

  # dev_addr_test_snapshot_benchmark: 1024 addrs x 1000 snapshots: 89872802 ns total, 89872 ns/iter
  # dev_addr_test_snapshot_benchmark.speed: slow

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-2-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 weeks agonetfilter: nf_tables: add hook transactions for device deletions
Pablo Neira Ayuso [Wed, 15 Apr 2026 20:58:23 +0000 (22:58 +0200)] 
netfilter: nf_tables: add hook transactions for device deletions

Restore the flag that indicates that the hook is going away, ie.
NFT_HOOK_REMOVE, but add a new transaction object to track deletion
of hooks without altering the basechain/flowtable hook_list during
the preparation phase.

The existing approach that moves the hook from the basechain/flowtable
hook_list to transaction hook_list breaks netlink dump path readers
of this RCU-protected list.

It should be possible use an array for nft_trans_hook to store the
deleted hooks to compact the representation but I am not expecting
many hook object, specially now that wildcard support for devices
is in place.

Note that the nft_trans_chain_hooks() list contains a list of struct
nft_trans_hook objects for DELCHAIN and DELFLOWTABLE commands, while
this list stores struct nft_hook objects for NEWCHAIN and NEWFLOWTABLE.
Note that new commands can be updated to use nft_trans_hook for
consistency.

This patch also adapts the event notification path to deal with the list
of hook transactions.

Fixes: 7d937b107108 ("netfilter: nf_tables: support for deleting devices in an existing netdev chain")
Fixes: b6d9014a3335 ("netfilter: nf_tables: delete flowtable hooks via transaction list")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 weeks agonetfilter: nf_tables: join hook list via splice_list_rcu() in commit phase
Pablo Neira Ayuso [Wed, 15 Apr 2026 15:56:14 +0000 (17:56 +0200)] 
netfilter: nf_tables: join hook list via splice_list_rcu() in commit phase

Publish new hooks in the list into the basechain/flowtable using
splice_list_rcu() to ensure netlink dump list traversal via rcu is safe
while concurrent ruleset update is going on.

Fixes: 78d9f48f7f44 ("netfilter: nf_tables: add devices to existing flowtable")
Fixes: b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 weeks agorculist: add list_splice_rcu() for private lists
Pablo Neira Ayuso [Wed, 15 Apr 2026 15:56:02 +0000 (17:56 +0200)] 
rculist: add list_splice_rcu() for private lists

This patch adds a helper function, list_splice_rcu(), to safely splice
a private (non-RCU-protected) list into an RCU-protected list.

The function ensures that only the pointer visible to RCU readers
(prev->next) is updated using rcu_assign_pointer(), while the rest of
the list manipulations are performed with regular assignments, as the
source list is private and not visible to concurrent RCU readers.

This is useful for moving elements from a private list into a global
RCU-protected list, ensuring safe publication for RCU readers.
Subsystems with some sort of batching mechanism from userspace can
benefit from this new function.

The function __list_splice_rcu() has been added for clarity and to
follow the same pattern as in the existing list_splice*() interfaces,
where there is a check to ensure that the list to splice is not
empty. Note that __list_splice_rcu() has no documentation for this
reason.

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 weeks agonetfilter: nf_tables: use list_del_rcu for netlink hooks
Florian Westphal [Thu, 16 Apr 2026 13:14:51 +0000 (15:14 +0200)] 
netfilter: nf_tables: use list_del_rcu for netlink hooks

nft_netdev_unregister_hooks and __nft_unregister_flowtable_net_hooks need
to use list_del_rcu(), this list can be walked by concurrent dumpers.

Add a new helper and use it consistently.

Fixes: f9a43007d3f7 ("netfilter: nf_tables: double hook unregistration in netns path")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 weeks agonetfilter: arp_tables: fix IEEE1394 ARP payload parsing
Pablo Neira Ayuso [Mon, 20 Apr 2026 21:15:32 +0000 (23:15 +0200)] 
netfilter: arp_tables: fix IEEE1394 ARP payload parsing

Weiming Shi says:

"arp_packet_match() unconditionally parses the ARP payload assuming two
hardware addresses are present (source and target). However,
IPv4-over-IEEE1394 ARP (RFC 2734) omits the target hardware address
field, and arp_hdr_len() already accounts for this by returning a
shorter length for ARPHRD_IEEE1394 devices.

As a result, on IEEE1394 interfaces arp_packet_match() advances past a
nonexistent target hardware address and reads the wrong bytes for both
the target device address comparison and the target IP address. This
causes arptables rules to match against garbage data, leading to
incorrect filtering decisions: packets that should be accepted may be
dropped and vice versa.

The ARP stack in net/ipv4/arp.c (arp_create and arp_process) already
handles this correctly by skipping the target hardware address for
ARPHRD_IEEE1394. Apply the same pattern to arp_packet_match()."

Mangle the original patch to always return 0 (no match) in case user
matches on the target hardware address which is never present in
IEEE1394.

Note that this returns 0 (no match) for either normal and inverse match
because matching in the target hardware address in ARPHRD_IEEE1394 has
never been supported by arptables. This is intentional, matching on the
target hardware address should never evaluate true for ARPHRD_IEEE1394.

Moreover, adjust arpt_mangle to drop the packet too as AI suggests:

In arpt_mangle, the logic assumes a standard ARP layout. Because
IEEE1394 (FireWire) omits the target hardware address, the linear
pointer arithmetic miscalculates the offset for the target IP address.
This causes mangling operations to write to the wrong location, leading
to packet corruption. To ensure safety, this patch drops packets
(NF_DROP) when mangling is requested for these fields on IEEE1394
devices, as the current implementation cannot correctly map the FireWire
ARP payload.

This omits both mangling target hardware and IP address. Even if IP
address mangling should be possible in IEEE1394, this would require
to adjust arpt_mangle offset calculation, which has never been
supported.

Based on patch from Weiming Shi <bestswngs@gmail.com>.

Fixes: 6752c8db8e0c ("firewire net, ipv4 arp: Extend hardware address and remove driver-level packet inspection.")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 weeks agoerofs: unify lcn as u64 for 32-bit platforms
Gao Xiang [Mon, 20 Apr 2026 10:11:42 +0000 (18:11 +0800)] 
erofs: unify lcn as u64 for 32-bit platforms

As sashiko reported [1], `lcn` was typed as `unsigned long` (or
`unsigned int` sometimes), which is only 32 bits wide on 32-bit
platforms, which causes `(lcn << lclusterbits)` to be truncated
at 4 GiB.

In order to consolidate the logic, just use `u64` consistently
around the codebase.

[1] https://sashiko.dev/r/20260420034612.1899973-1-hsiangkao%40linux.alibaba.com

Fixes: 152a333a5895 ("staging: erofs: add compacted compression indexes support")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
4 weeks agoerofs: fix offset truncation when shifting pgoff on 32-bit platforms
Gao Xiang [Mon, 20 Apr 2026 03:46:12 +0000 (11:46 +0800)] 
erofs: fix offset truncation when shifting pgoff on 32-bit platforms

On 32-bit platforms, pgoff_t is 32 bits wide, so left-shifting
large arbitrary pgoff_t values by PAGE_SHIFT performs 32-bit arithmetic
and silently truncates the result for pages beyond the 4 GiB boundary.

Cast the page index to loff_t before shifting to produce a correct
64-bit byte offset.

Fixes: 386292919c25 ("erofs: introduce readmore decompression strategy")
Fixes: 307210c262a2 ("erofs: verify metadata accesses for file-backed mounts")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
4 weeks agoerofs: fix the out-of-bounds nameoff handling for trailing dirents
Gao Xiang [Tue, 21 Apr 2026 07:59:52 +0000 (15:59 +0800)] 
erofs: fix the out-of-bounds nameoff handling for trailing dirents

Currently we already have boundary-checks for nameoffs, but the trailing
dirents are special since the namelens are calculated with strnlen()
with unchecked nameoffs.

If a crafted EROFS has a trailing dirent with nameoff >= maxsize,
maxsize - nameoff can underflow, causing strnlen() to read past the
directory block.

nameoff0 should also be verified to be a multiple of
`sizeof(struct erofs_dirent)` as well [1].

[1] https://sashiko.dev/#/patchset/20260416063511.3173774-1-hsiangkao%40linux.alibaba.com

Fixes: 3aa8ec716e52 ("staging: erofs: add directory operations")
Fixes: 33bac912840f ("staging: erofs: keep corrupted fs from crashing kernel in erofs_readdir()")
Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Reported-by: Junrui Luo <moonafterrain@outlook.com>
Closes: https://lore.kernel.org/r/A0FD7E0F-7558-49B0-8BC8-EB1ECDB2479A@outlook.com
Cc: stable@vger.kernel.org
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
4 weeks agoslip: bound decode() reads against the compressed packet length
Weiming Shi [Thu, 16 Apr 2026 10:01:51 +0000 (18:01 +0800)] 
slip: bound decode() reads against the compressed packet length

slhc_uncompress() parses a VJ-compressed TCP header by advancing a
pointer through the packet via decode() and pull16(). Neither helper
bounds-checks against isize, and decode() masks its return with
& 0xffff so it can never return the -1 that callers test for -- those
error paths are dead code.

A short compressed frame whose change byte requests optional fields
lets decode() read past the end of the packet. The over-read bytes
are folded into the cached cstate and reflected into subsequent
reconstructed packets.

Make decode() and pull16() take the packet end pointer and return -1
when exhausted. Add a bounds check before the TCP-checksum read.
The existing == -1 tests now do what they were always meant to.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Simon Horman <horms@kernel.org>
Closes: https://lore.kernel.org/netdev/20260414134126.758795-2-horms@kernel.org/
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260416100147.531855-5-bestswngs@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 weeks agoALSA: usb-audio/line6: Add support for POD HD PRO
Phil Willoughby [Mon, 20 Apr 2026 15:23:49 +0000 (16:23 +0100)] 
ALSA: usb-audio/line6: Add support for POD HD PRO

The POD HD PRO is the rackmount version of the POD 500, with most of the
same behaviors. As with some of the other rackmount POD devices it will
not send captured audio to the host unless the host is sending playback
audio, so it has LINE6_CAP_IN_NEEDS_OUT in addition to the POD 500
flags.

Tested-By: Phil Willoughby <willerz@gmail.com>
Signed-off-by: Phil Willoughby <willerz@gmail.com>
Link: https://patch.msgid.link/20260420152405.7230-1-willerz@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
4 weeks agoALSA: hda/realtek: Add LED fixup for HP EliteBook 6 G2a Laptops
Chris Chiu [Tue, 21 Apr 2026 02:34:28 +0000 (02:34 +0000)] 
ALSA: hda/realtek: Add LED fixup for HP EliteBook 6 G2a Laptops

The HP EliteBook 6 G2a laptops requires specific LED control method
ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF to work.

Signed-off-by: Chris Chiu <chris.chiu@canonical.com>
Link: https://patch.msgid.link/20260421023429.3723154-1-chris.chiu@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
4 weeks agoslip: reject VJ receive packets on instances with no rstate array
Weiming Shi [Wed, 15 Apr 2026 20:41:31 +0000 (04:41 +0800)] 
slip: reject VJ receive packets on instances with no rstate array

slhc_init() accepts rslots == 0 as a valid configuration, with the
documented meaning of 'no receive compression'. In that case the
allocation loop in slhc_init() is skipped, so comp->rstate stays
NULL and comp->rslot_limit stays 0 (from the kzalloc of struct
slcompress).

The receive helpers do not defend against that configuration.
slhc_uncompress() dereferences comp->rstate[x] when the VJ header
carries an explicit connection ID, and slhc_remember() later assigns
cs = &comp->rstate[...] after only comparing the packet's slot number
to comp->rslot_limit. Because rslot_limit is 0, slot 0 passes the
range check, and the code dereferences a NULL rstate.

The configuration is reachable in-tree through PPP. PPPIOCSMAXCID
stores its argument in a signed int, and (val >> 16) uses arithmetic
shift. Passing 0xffff0000 therefore sign-extends to -1, so val2 + 1
is 0 and ppp_generic.c ends up calling slhc_init(0, 1). Because
/dev/ppp open is gated by ns_capable(CAP_NET_ADMIN), the whole path
is reachable from an unprivileged user namespace. Once the malformed
VJ state is installed, any inbound VJ-compressed or VJ-uncompressed
frame that selects slot 0 crashes the kernel in softirq context:

 Oops: general protection fault, probably for non-canonical
       address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI
 KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
 RIP: 0010:slhc_uncompress (drivers/net/slip/slhc.c:519)
 Call Trace:
  <TASK>
  ppp_receive_nonmp_frame (drivers/net/ppp/ppp_generic.c:2466)
  ppp_input (drivers/net/ppp/ppp_generic.c:2359)
  ppp_async_process (drivers/net/ppp/ppp_async.c:492)
  tasklet_action_common (kernel/softirq.c:926)
  handle_softirqs (kernel/softirq.c:623)
  run_ksoftirqd (kernel/softirq.c:1055)
  smpboot_thread_fn (kernel/smpboot.c:160)
  kthread (kernel/kthread.c:436)
  ret_from_fork (arch/x86/kernel/process.c:164)
  </TASK>

Reject the receive side on such instances instead of touching rstate.
slhc_uncompress() falls through to its existing 'bad' label, which
bumps sls_i_error and enters the toss state. slhc_remember() mirrors
that with an explicit sls_i_error increment followed by slhc_toss();
the sls_i_runt counter is not used here because a missing rstate is
an internal configuration state, not a runt packet.

The transmit path is unaffected: the only in-tree caller that picks
rslots from userspace (ppp_generic.c) still supplies tslots >= 1, and
slip.c always calls slhc_init(16, 16), so comp->tstate remains valid
and slhc_compress() continues to work.

Fixes: 4ab42d78e37a ("ppp, slip: Validate VJ compression slot parameters completely")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260415204130.258866-2-bestswngs@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 weeks agorhashtable: Bounce deferred worker kick through irq_work
Tejun Heo [Tue, 21 Apr 2026 06:03:26 +0000 (20:03 -1000)] 
rhashtable: Bounce deferred worker kick through irq_work

Inserts past 75% load call schedule_work(&ht->run_work) to kick an
async resize. If a caller holds a raw spinlock (e.g. an
insecure_elasticity user), schedule_work() under that lock records

  caller_lock -> pool->lock -> pi_lock -> rq->__lock

A cycle forms if any of these locks is acquired in the reverse
direction elsewhere. sched_ext, the only current insecure_elasticity
user, hits this: it holds scx_sched_lock across rhashtable inserts of
sub-schedulers, while scx_bypass() takes rq->__lock -> scx_sched_lock.
Exercising the resize path produces:

  Chain exists of:
    &pool->lock --> &rq->__lock --> scx_sched_lock

Bounce the kick from the insert paths through irq_work so
schedule_work() runs from hard IRQ context with the caller's lock no
longer held. rht_deferred_worker()'s self-rearm on error stays on
schedule_work(&ht->run_work) - the worker runs in process context with
no caller lock held, and keeping the self-requeue on @run_work lets
cancel_work_sync() in rhashtable_free_and_destroy() drain it.

v3: Keep rht_deferred_worker()'s self-rearm on schedule_work(&run_work).
    Routing it through irq_work in v2 broke cancel_work_sync()'s
    self-requeue handling - an irq_work queued after irq_work_sync()
    returned but while cancel_work_sync() was still waiting could fire
    post-teardown.

v2: Bounce unconditionally instead of gating on insecure_elasticity,
    as suggested by Herbert.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
4 weeks agoscsi: hisi_sas: Fix sparse warnings in prep_ata_v3_hw()
Yihang Li [Mon, 20 Apr 2026 02:10:44 +0000 (10:10 +0800)] 
scsi: hisi_sas: Fix sparse warnings in prep_ata_v3_hw()

In prep_ata_v3_hw(), add cpu_to_le32() to fix warning:

  drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:1448:26: sparse: sparse: invalid assignment: |=
  drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:1448:26: sparse:    left side has type restricted __le32
  drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:1448:26: sparse:    right side has type unsigned int

Fixes: 8aa580cd9284 ("scsi: hisi_sas: Enable force phy when SATA disk directly connected")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202604191850.IVYPTaML-lkp@intel.com/
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Link: https://patch.msgid.link/20260420021044.3339459-1-liyihang9@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
4 weeks agoscsi: pmcraid: Fix typo in comments
Hugo Villeneuve [Fri, 17 Apr 2026 20:07:31 +0000 (16:07 -0400)] 
scsi: pmcraid: Fix typo in comments

Fix typo in structure comment.

Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Link: https://patch.msgid.link/20260417200738.3920001-1-hugo@hugovil.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
4 weeks agoscsi: scsi_dh_alua: Increase default ALUA timeout to maximum spec value
Brian Bunker [Thu, 16 Apr 2026 16:55:12 +0000 (09:55 -0700)] 
scsi: scsi_dh_alua: Increase default ALUA timeout to maximum spec value

The ALUA handler maps a 0 value (no implicit transition timeout provided
by the target) to the ALUA_FAILOVER_TIMEOUT constant, currently 60
seconds. This means the kernel already does not accept an infinite
transition time.

However, 60 seconds is insufficient for some arrays that may take longer
to complete ALUA transitions. Since the highest value allowed by the
SCSI specification for the implicit transition timeout is a single byte
(255 seconds), change the default to 255. This way, when a target does
not provide an explicit transition timeout, we default to the maximum
value the spec allows rather than an arbitrary 60 second limit.

Co-developed-by: Krishna Kant <krishna.kant@purestorage.com>
Signed-off-by: Krishna Kant <krishna.kant@purestorage.com>
Co-developed-by: Riya Savla <rsavla@purestorage.com>
Signed-off-by: Riya Savla <rsavla@purestorage.com>
Signed-off-by: Brian Bunker <brian@purestorage.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://patch.msgid.link/20260416165512.26497-2-brian@purestorage.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
4 weeks agoscsi: smartpqi: Silence a recursive lock warning
Tomas Henzl [Tue, 14 Apr 2026 12:41:18 +0000 (14:41 +0200)] 
scsi: smartpqi: Silence a recursive lock warning

On systems with multiple controllers debug kernel shows

  WARNING: possible recursive locking detected

during shutdown.

Each controller does have its own ctrl_info (and mutex) and that isn't
correctly recognized by debug kernel.  Suppress the warning by releasing
the mutex at the end of pqi_shutdown().

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: Don Brace <don.brace@microchip.com>
Link: https://patch.msgid.link/20260414124118.23661-1-thenzl@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
4 weeks agoscsi: mpt3sas: Limit NVMe request size to 2 MiB
Ranjan Kumar [Tue, 14 Apr 2026 11:08:11 +0000 (16:38 +0530)] 
scsi: mpt3sas: Limit NVMe request size to 2 MiB

The HBA firmware reports NVMe MDTS values based on the underlying drive
capability. However, because the driver allocates a fixed 4K buffer for
the PRP list, accommodating at most 512 entries, the driver supports a
maximum I/O transfer size of 2 MiB.

Limit max_hw_sectors to the smaller of the reported MDTS and the 2 MiB
driver limit to prevent issuing oversized I/O that may lead to a kernel
oops.

Cc: stable@vger.kernel.org
Fixes: 9b8b84879d4a ("block: Increase BLK_DEF_MAX_SECTORS_CAP")
Reported-by: Mira Limbeck <m.limbeck@proxmox.com>
Closes: https://lore.kernel.org/r/291f78bf-4b4a-40dd-867d-053b36c564b3@proxmox.com
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9b8b84879d4a
Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Tested-by: Mira Limbeck <m.limbeck@proxmox.com>
Link: https://patch.msgid.link/20260414110811.85156-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
4 weeks agoscsi: sg: Don't use GFP_ATOMIC in sg_start_req()
Christoph Hellwig [Wed, 15 Apr 2026 06:08:06 +0000 (08:08 +0200)] 
scsi: sg: Don't use GFP_ATOMIC in sg_start_req()

sg_start_req() is called from normal user context and can sleep when
waiting for memory.  Switch it to use GFP_KERNEL, which fixes allocation
failures seen with the bio_alloc rework.

Fixes: b520c4eef83d ("block: split bio_alloc_bioset more clearly into a fast and slowpath")
Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20260415060813.807659-2-hch@lst.de
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
4 weeks agobtrfs: fix double-decrement of bytes_may_use in submit_one_async_extent()
Mark Harmstone [Thu, 16 Apr 2026 17:15:23 +0000 (18:15 +0100)] 
btrfs: fix double-decrement of bytes_may_use in submit_one_async_extent()

submit_one_async_extent() calls btrfs_reserve_extent(), which decrements
bytes_may_use. If the call btrfs_create_io_em() fails, we jump to
out_free_reserve, which calls extent_clear_unlock_delalloc().

Because we're specifying EXTENT_DO_ACCOUNTING, i.e.
EXTENT_CLEAR_META_RESV | EXTENT_CLEAR_DATA_RESV, this decreases
bytes_may_use again. This can lead to problems later on, as an initial
write can fail only for the writeback to silently ENOSPC.

Fix this by replacing EXTENT_DO_ACCOUNTING with EXTENT_CLEAR_META_RESV.
This parallels a4fe134fc1d8eb ("btrfs: fix a double release on reserved
extents in cow_one_range()"), which is the same fix in cow_one_range().

Fixes: 151a41bc46df ("Btrfs: fix what bits we clear when erroring out from delalloc")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 weeks agobtrfs: check return value of btrfs_partially_delete_raid_extent()
robbieko [Mon, 13 Apr 2026 06:52:37 +0000 (14:52 +0800)] 
btrfs: check return value of btrfs_partially_delete_raid_extent()

btrfs_partially_delete_raid_extent() returns an error code (e.g.
-ENOMEM from kzalloc(), or errors from btrfs_del_item/btrfs_insert_item()),
but all three call sites in btrfs_delete_raid_extent() discard the
return value, silently losing errors and potentially leaving the stripe
tree in an inconsistent state.

Fix by capturing the return value into ret at all three call sites and
breaking out of the loop on error where appropriate.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: robbieko <robbieko@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 weeks agobtrfs: handle -EAGAIN from btrfs_duplicate_item and refresh stale leaf pointer
robbieko [Mon, 13 Apr 2026 06:52:36 +0000 (14:52 +0800)] 
btrfs: handle -EAGAIN from btrfs_duplicate_item and refresh stale leaf pointer

In the 'punch a hole' case of btrfs_delete_raid_extent(),
btrfs_duplicate_item() can return -EAGAIN when the leaf needs to be
split and the path becomes invalid. The old code treats any error as
fatal and breaks out of the loop.

Additionally, btrfs_duplicate_item() may trigger setup_leaf_for_split()
which can reallocate the leaf node. The code continues using the old
leaf pointer, leading to use-after-free or stale data access.

Fix both issues by:

- Handling -EAGAIN specifically: release the path and retry the loop.
- Refreshing leaf = path->nodes[0] after successful duplication.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: robbieko <robbieko@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 weeks agobtrfs: replace ASSERT with proper error handling in stripe lookup fallback
robbieko [Mon, 13 Apr 2026 06:52:35 +0000 (14:52 +0800)] 
btrfs: replace ASSERT with proper error handling in stripe lookup fallback

After falling back to the previous item in btrfs_delete_raid_extent(),
the code uses ASSERT(found_start <= start) to verify the found extent
actually precedes our target range. If the B-tree state is unexpected
(e.g. no overlapping extent exists), this triggers a kernel BUG/panic
in debug builds, or silently continues with wrong data otherwise.

Replace the ASSERT with a proper bounds check that returns -ENOENT if
the found extent does not actually overlap with the start position.

Signed-off-by: robbieko <robbieko@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 weeks agobtrfs: fix wrong min_objectid in btrfs_previous_item() call
robbieko [Mon, 13 Apr 2026 06:52:34 +0000 (14:52 +0800)] 
btrfs: fix wrong min_objectid in btrfs_previous_item() call

When found_start > start and slot == 0, btrfs_previous_item() is called
with min_objectid=start to find the previous stripe extent. However, the
previous stripe extent we are looking for has objectid < start (it starts
before our deletion range), so passing start as min_objectid prevents
finding it.

Fix by passing 0 as min_objectid to allow finding any preceding stripe
extent regardless of its objectid.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: robbieko <robbieko@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 weeks agobtrfs: fix raid stripe search missing entries at leaf boundaries
robbieko [Mon, 13 Apr 2026 06:52:33 +0000 (14:52 +0800)] 
btrfs: fix raid stripe search missing entries at leaf boundaries

In btrfs_delete_raid_extent(), the search key uses offset=0. When the
target stripe entry is the first item on a leaf, btrfs_search_slot()
may land on the previous leaf and decrementing the slot from nritems
still points to the wrong entry, causing the stripe extent to be
silently missed.

Fix this by searching with offset=(u64)-1 instead. Since no real stripe
entry has this offset, btrfs_search_slot() always returns 1 with the
slot pointing past the last matching objectid entry. Then unconditionally
decrement the slot with a proper slots[0]==0 early-exit check to handle
the case where no matching entry exists.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: robbieko <robbieko@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 weeks agobtrfs: copy devid in btrfs_partially_delete_raid_extent()
robbieko [Mon, 13 Apr 2026 06:52:32 +0000 (14:52 +0800)] 
btrfs: copy devid in btrfs_partially_delete_raid_extent()

When btrfs_partially_delete_raid_extent() rebuilds a truncated/shifted
stripe extent into newitem, the loop copies the physical address for
each stride but forgets to copy the devid. The resulting item written
back to the stripe tree has zeroed-out devids, corrupting the stripe
mapping.

Fix this by reading the devid with btrfs_raid_stride_devid() and
writing it into the new item with btrfs_set_stack_raid_stride_devid()
before copying the physical address.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: robbieko <robbieko@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 weeks agobtrfs: handle unexpected free-space-tree key types
David Sterba [Tue, 14 Apr 2026 15:30:31 +0000 (17:30 +0200)] 
btrfs: handle unexpected free-space-tree key types

Replace the conditional assertions with proper error handling and
transaction abort if we find an unexpected key type in the free space
tree.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 weeks agobtrfs: fix missing last_unlink_trans update when removing a directory
Filipe Manana [Thu, 9 Apr 2026 14:46:51 +0000 (15:46 +0100)] 
btrfs: fix missing last_unlink_trans update when removing a directory

When removing a directory we are not updating its last_unlink_trans field,
which can result in incorrect fsync behaviour in case some one fsyncs the
directory after it was removed because it's holding a file descriptor on
it.

Example scenario:

   mkdir /mnt/dir1
   mkdir /mnt/dir1/dir2
   mkdir /mnt/dir3

   sync -f /mnt

   # Do some change to the directory and fsync it.
   chmod 700 /mnt/dir1
   xfs_io -c fsync /mnt/dir1

   # Move dir2 out of dir1 so that dir1 becomes empty.
   mv /mnt/dir1/dir2 /mnt/dir3/

   open fd on /mnt/dir1
   call rmdir(2) on path "/mnt/dir1"
   fsync fd

   <trigger power failure>

When attempting to mount the filesystem, the log replay will fail with
an -EIO error and dmesg/syslog has the following:

   [445771.626482] BTRFS info (device dm-0): first mount of filesystem 0368bbea-6c5e-44b5-b409-09abe496e650
   [445771.626486] BTRFS info (device dm-0): using crc32c checksum algorithm
   [445771.627912] BTRFS info (device dm-0): start tree-log replay
   [445771.628335] page: refcount:2 mapcount:0 mapping:0000000061443ddc index:0x1d00 pfn:0x7072a5
   [445771.629453] memcg:ffff89f400351b00
   [445771.629892] aops:btree_aops [btrfs] ino:1
   [445771.630737] flags: 0x17fffc00000402a(uptodate|lru|private|writeback|node=0|zone=2|lastcpupid=0x1ffff)
   [445771.632359] raw: 017fffc00000402a fffff47284d950c8 fffff472907b7c08 ffff89f458e412b8
   [445771.633713] raw: 0000000000001d00 ffff89f6c51d1a90 00000002ffffffff ffff89f400351b00
   [445771.635029] page dumped because: eb page dump
   [445771.635825] BTRFS critical (device dm-0): corrupt leaf: root=5 block=30408704 slot=10 ino=258, invalid nlink: has 2 expect no more than 1 for dir
   [445771.638088] BTRFS info (device dm-0): leaf 30408704 gen 10 total ptrs 17 free space 14878 owner 5
   [445771.638091] BTRFS info (device dm-0): refs 4 lock_owner 0 current 3581087
   [445771.638094]  item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
   [445771.638097]  inode generation 3 transid 9 size 16 nbytes 16384
   [445771.638098]  block group 0 mode 40755 links 1 uid 0 gid 0
   [445771.638100]  rdev 0 sequence 2 flags 0x0
   [445771.638102]  atime 1775744884.0
   [445771.660056]  ctime 1775744885.645502983
   [445771.660058]  mtime 1775744885.645502983
   [445771.660060]  otime 1775744884.0
   [445771.660062]  item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
   [445771.660064]  index 0 name_len 2
   [445771.660066]  item 2 key (256 DIR_ITEM 1843588421) itemoff 16077 itemsize 34
   [445771.660068]  location key (259 1 0) type 2
   [445771.660070]  transid 9 data_len 0 name_len 4
   [445771.660075]  item 3 key (256 DIR_ITEM 2363071922) itemoff 16043 itemsize 34
   [445771.660076]  location key (257 1 0) type 2
   [445771.660077]  transid 9 data_len 0 name_len 4
   [445771.660078]  item 4 key (256 DIR_INDEX 2) itemoff 16009 itemsize 34
   [445771.660079]  location key (257 1 0) type 2
   [445771.660080]  transid 9 data_len 0 name_len 4
   [445771.660081]  item 5 key (256 DIR_INDEX 3) itemoff 15975 itemsize 34
   [445771.660082]  location key (259 1 0) type 2
   [445771.660083]  transid 9 data_len 0 name_len 4
   [445771.660084]  item 6 key (257 INODE_ITEM 0) itemoff 15815 itemsize 160
   [445771.660086]  inode generation 9 transid 9 size 8 nbytes 0
   [445771.660087]  block group 0 mode 40777 links 1 uid 0 gid 0
   [445771.660088]  rdev 0 sequence 2 flags 0x0
   [445771.660089]  atime 1775744885.641174097
   [445771.660090]  ctime 1775744885.645502983
   [445771.660091]  mtime 1775744885.645502983
   [445771.660105]  otime 1775744885.641174097
   [445771.660106]  item 7 key (257 INODE_REF 256) itemoff 15801 itemsize 14
   [445771.660107]  index 2 name_len 4
   [445771.660108]  item 8 key (257 DIR_ITEM 2676584006) itemoff 15767 itemsize 34
   [445771.660109]  location key (258 1 0) type 2
   [445771.660110]  transid 9 data_len 0 name_len 4
   [445771.660111]  item 9 key (257 DIR_INDEX 2) itemoff 15733 itemsize 34
   [445771.660112]  location key (258 1 0) type 2
   [445771.660113]  transid 9 data_len 0 name_len 4
   [445771.660114]  item 10 key (258 INODE_ITEM 0) itemoff 15573 itemsize 160
   [445771.660115]  inode generation 9 transid 10 size 0 nbytes 0
   [445771.660116]  block group 0 mode 40755 links 2 uid 0 gid 0
   [445771.660117]  rdev 0 sequence 0 flags 0x0
   [445771.660118]  atime 1775744885.645502983
   [445771.660119]  ctime 1775744885.645502983
   [445771.660120]  mtime 1775744885.645502983
   [445771.660121]  otime 1775744885.645502983
   [445771.660122]  item 11 key (258 INODE_REF 257) itemoff 15559 itemsize 14
   [445771.660123]  index 2 name_len 4
   [445771.660124]  item 12 key (258 INODE_REF 259) itemoff 15545 itemsize 14
   [445771.660125]  index 2 name_len 4
   [445771.660126]  item 13 key (259 INODE_ITEM 0) itemoff 15385 itemsize 160
   [445771.660127]  inode generation 9 transid 10 size 8 nbytes 0
   [445771.660128]  block group 0 mode 40755 links 1 uid 0 gid 0
   [445771.660129]  rdev 0 sequence 1 flags 0x0
   [445771.660130]  atime 1775744885.645502983
   [445771.660130]  ctime 1775744885.645502983
   [445771.660131]  mtime 1775744885.645502983
   [445771.660132]  otime 1775744885.645502983
   [445771.660133]  item 14 key (259 INODE_REF 256) itemoff 15371 itemsize 14
   [445771.660134]  index 3 name_len 4
   [445771.660135]  item 15 key (259 DIR_ITEM 2676584006) itemoff 15337 itemsize 34
   [445771.660136]  location key (258 1 0) type 2
   [445771.660137]  transid 10 data_len 0 name_len 4
   [445771.660138]  item 16 key (259 DIR_INDEX 2) itemoff 15303 itemsize 34
   [445771.660139]  location key (258 1 0) type 2
   [445771.660140]  transid 10 data_len 0 name_len 4
   [445771.660144] BTRFS error (device dm-0): block=30408704 write time tree block corruption detected
   [445771.661650] ------------[ cut here ]------------
   [445771.662358] WARNING: fs/btrfs/disk-io.c:326 at btree_csum_one_bio+0x217/0x230 [btrfs], CPU#8: mount/3581087
   [445771.663588] Modules linked in: btrfs f2fs xfs (...)
   [445771.671229] CPU: 8 UID: 0 PID: 3581087 Comm: mount Tainted: G        W           7.0.0-rc6-btrfs-next-230+ #2 PREEMPT(full)
   [445771.672575] Tainted: [W]=WARN
   [445771.672987] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
   [445771.674460] RIP: 0010:btree_csum_one_bio+0x217/0x230 [btrfs]
   [445771.675222] Code: 89 44 24 (...)
   [445771.677364] RSP: 0018:ffffd23882247660 EFLAGS: 00010246
   [445771.678029] RAX: 0000000000000000 RBX: ffff89f6c51d1a90 RCX: 0000000000000000
   [445771.678975] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff89f406020000
   [445771.679983] RBP: ffff89f821204000 R08: 0000000000000000 R09: 00000000ffefffff
   [445771.680905] R10: ffffd23882247448 R11: 0000000000000003 R12: ffffd23882247668
   [445771.681978] R13: ffff89f458e40fc0 R14: ffff89f737f4f500 R15: ffff89f737f4f500
   [445771.682912] FS:  00007f0447a98840(0000) GS:ffff89fb9771d000(0000) knlGS:0000000000000000
   [445771.684393] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   [445771.685230] CR2: 00007f0447bf1330 CR3: 000000017cb02002 CR4: 0000000000370ef0
   [445771.686273] Call Trace:
   [445771.686646]  <TASK>
   [445771.686969]  btrfs_submit_bbio+0x83f/0x860 [btrfs]
   [445771.687750]  ? write_one_eb+0x28f/0x340 [btrfs]
   [445771.688428]  btree_writepages+0x2e3/0x550 [btrfs]
   [445771.689180]  ? kmem_cache_alloc_noprof+0x12a/0x490
   [445771.689963]  ? alloc_extent_state+0x19/0x120 [btrfs]
   [445771.690801]  ? kmem_cache_free+0x135/0x380
   [445771.691328]  ? preempt_count_add+0x69/0xa0
   [445771.691831]  ? set_extent_bit+0x252/0x8e0 [btrfs]
   [445771.692468]  ? xas_load+0x9/0xc0
   [445771.692873]  ? xas_find+0x14d/0x1a0
   [445771.693304]  do_writepages+0xc6/0x160
   [445771.693756]  filemap_writeback+0xb8/0xe0
   [445771.694274]  btrfs_write_marked_extents+0x61/0x170 [btrfs]
   [445771.694999]  btrfs_write_and_wait_transaction+0x4e/0xc0 [btrfs]
   [445771.695818]  btrfs_commit_transaction+0x5c8/0xd10 [btrfs]
   [445771.696530]  ? kmem_cache_free+0x135/0x380
   [445771.697120]  ? release_extent_buffer+0x34/0x160 [btrfs]
   [445771.697786]  btrfs_recover_log_trees+0x7be/0x7e0 [btrfs]
   [445771.698525]  ? __pfx_replay_one_buffer+0x10/0x10 [btrfs]
   [445771.699206]  open_ctree+0x11e5/0x1810 [btrfs]
   [445771.699776]  btrfs_get_tree.cold+0xb/0x162 [btrfs]
   [445771.700463]  ? fscontext_read+0x165/0x180
   [445771.701146]  ? rw_verify_area+0x50/0x180
   [445771.701866]  vfs_get_tree+0x25/0xd0
   [445771.702491]  vfs_cmd_create+0x59/0xe0
   [445771.703125]  __do_sys_fsconfig+0x303/0x610
   [445771.703603]  do_syscall_64+0xe9/0xf20
   [445771.703974]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
   [445771.704700] RIP: 0033:0x7f0447cbd4aa
   [445771.705108] Code: 73 01 c3 (...)
   [445771.707263] RSP: 002b:00007ffc4e528318 EFLAGS: 00000246 ORIG_RAX: 00000000000001af
   [445771.708107] RAX: ffffffffffffffda RBX: 00005561585d8c20 RCX: 00007f0447cbd4aa
   [445771.708931] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003
   [445771.709744] RBP: 00005561585d9120 R08: 0000000000000000 R09: 0000000000000000
   [445771.710674] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
   [445771.711477] R13: 00007f0447e4f580 R14: 00007f0447e5126c R15: 00007f0447e36a23
   [445771.712277]  </TASK>
   [445771.712541] ---[ end trace 0000000000000000 ]---
   [445771.713382] BTRFS error (device dm-0): error while writing out transaction: -5
   [445771.714679] BTRFS warning (device dm-0): Skipping commit of aborted transaction.
   [445771.715562] BTRFS error (device dm-0 state A): Transaction aborted (error -5)
   [445771.716459] BTRFS: error (device dm-0 state A) in cleanup_transaction:2068: errno=-5 IO failure
   [445771.717936] BTRFS error (device dm-0 state EA): failed to recover log trees with error: -5
   [445771.719681] BTRFS error (device dm-0 state EA): open_ctree failed: -5

The problem is that such a fsync should have result in a fallback to a
transaction commit, but that did not happen because through the
btrfs_rmdir() we never update the directory's last_unlink_trans field.
Any inode that had a link removed must have its last_unlink_trans updated
to the ID of transaction used for the operation, otherwise fsync and log
replay will not work correctly.

btrfs_rmdir() calls btrfs_unlink_inode() and through that call chain we
never call btrfs_record_unlink_dir() in order to update last_unlink_trans.
However btrfs_unlink(), which is used for unlinking regular files, calls
btrfs_record_unlink_dir() and then calls btrfs_unlink_inode(). So fix
this by moving the call to btrfs_record_unlink_dir() from btrfs_unlink()
to btrfs_unlink_inode().

A test case for fstests will follow soon.

Reported-by: Slava0135 <slava.kovalevskiy.2014@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAAJYhww5ov62Hm+n+tmhcL-e_4cBobg+OWogKjOJxVUXivC=MQ@mail.gmail.com/
CC: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>