git.ipfire.org Git - thirdparty/kernel/linux.git/log
ext4: move zero partial block range functions out of active handle
Zhang Yi [Fri, 27 Mar 2026 10:29:34 +0000 (18:29 +0800)]

Move ext4_block_zero_eof() and ext4_zero_partial_blocks() calls out of
the active handle context, making them independent operations, and also
add return value checks. This is safe because data is still updated
before metadata in both data=ordered and data=journal modes: the data
is zeroed and ordered before the metadata is modified.

This change is required for iomap infrastructure conversion because the
iomap buffered I/O path does not use the same journal infrastructure for
partial block zeroing. In the iomap path, the lock ordering is
"folio lock -> transaction start", which is the opposite of the
current path. Therefore, zeroing partial blocks cannot be performed
under the active handle.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260327102939.1095257-9-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
ext4: pass allocate range as loff_t to ext4_alloc_file_blocks()
Zhang Yi [Fri, 27 Mar 2026 10:29:33 +0000 (18:29 +0800)]

Change ext4_alloc_file_blocks() to accept offset and len in byte
granularity instead of block granularity. This allows callers to pass
byte offsets and lengths directly, and this prepares for moving the
ext4_zero_partial_blocks() call from the while(len) loop for unaligned
append writes, where it only needs to be invoked once before doing block
allocation.
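
To illustrate the byte-to-block conversion that such an interface has to perform internally, here is a minimal userspace sketch. The names byte_range_to_blocks() and struct blk_range are hypothetical and are not taken from the ext4 sources; the block size is assumed to be a power of two, expressed as blkbits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: first logical block and number of blocks. */
struct blk_range {
	uint64_t first;
	uint64_t count;
};

/*
 * Map the byte range [offset, offset + len) to the range of blocks that
 * covers it. blkbits is log2 of the block size (e.g. 12 for 4 KiB).
 * An empty byte range yields count == 0.
 */
static inline struct blk_range byte_range_to_blocks(uint64_t offset,
						    uint64_t len,
						    unsigned int blkbits)
{
	struct blk_range r;

	assert(blkbits < 64);

	r.first = offset >> blkbits;
	r.count = 0;
	if (len) {
		/* Last block touched by the final byte of the range. */
		uint64_t last = (offset + len - 1) >> blkbits;

		r.count = last - r.first + 1;
	}
	return r;
}
```

With 4 KiB blocks, an unaligned two-byte range starting at byte 4095 straddles a block boundary and therefore covers two blocks, which is the kind of case that byte-granular callers no longer need to work out themselves.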

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260327102939.1095257-8-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
ext4: remove handle parameters from zero partial block functions
Zhang Yi [Fri, 27 Mar 2026 10:29:32 +0000 (18:29 +0800)]

Only journal data mode requires an active journal handle when zeroing
partial blocks. Stop passing handle_t *handle to
ext4_zero_partial_blocks() and related functions, and make
ext4_block_journalled_zero_range() start a handle independently.

This change has no practical impact now because all callers invoke these
functions within the context of an active handle. It prepares for moving
ext4_block_zero_eof() out of an active handle in the next patch, which
is a prerequisite for converting block zero range operations to iomap
infrastructure.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260327102939.1095257-7-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
ext4: move ordered data handling out of ext4_block_do_zero_range()
Zhang Yi [Fri, 27 Mar 2026 10:29:31 +0000 (18:29 +0800)]

Remove the handle parameter from ext4_block_do_zero_range() and move the
ordered data handling to ext4_block_zero_eof().

This is necessary for truncate up and append writes across a range
extending beyond EOF. The ordered data must be committed before updating
i_disksize to prevent exposing stale on-disk data from concurrent
post-EOF mmap writes during previous folio writeback or in case of
system crash during append writes.

This is unnecessary for partial block hole punching because the entire
punch operation does not provide atomicity guarantees and can already
expose intermediate results in case of crash.

Hole punching can only ever expose data that was there before the punch,
but missed zeroing during append / truncate could expose data that was
not visible in the file before the operation.

Since ordered data handling is no longer performed inside
ext4_zero_partial_blocks(), ext4_punch_hole() no longer needs to attach
jinode.

This prepares for the conversion to the iomap infrastructure, which
does not use ordered data mode while zeroing post-EOF partial blocks.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260327102939.1095257-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
ext4: rename ext4_block_zero_page_range() to ext4_block_zero_range()
Zhang Yi [Fri, 27 Mar 2026 10:29:30 +0000 (18:29 +0800)]

Rename ext4_block_zero_page_range() to ext4_block_zero_range() since the
"page" naming is no longer appropriate in the current context. Also change
its signature to take an inode pointer instead of an address_space. This
aligns with the callers ext4_block_zero_eof() and
ext4_zero_partial_blocks().

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260327102939.1095257-5-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
ext4: factor out journalled block zeroing range
Zhang Yi [Fri, 27 Mar 2026 10:29:29 +0000 (18:29 +0800)]

Refactor __ext4_block_zero_page_range() by separating the block zeroing
operations for ordered data mode and journal data mode into two distinct
functions:

  - ext4_block_do_zero_range(): handles non-journal data mode with
    ordered data support
  - ext4_block_journalled_zero_range(): handles journal data mode

Also extract a common helper, ext4_load_tail_bh(), to handle buffer head
and folio retrieval, along with the associated error handling. This
prepares for converting the partial block zero range to the iomap
infrastructure.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260327102939.1095257-4-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
ext4: rename and extend ext4_block_truncate_page()
Zhang Yi [Fri, 27 Mar 2026 10:29:28 +0000 (18:29 +0800)]

Rename ext4_block_truncate_page() to ext4_block_zero_eof() and extend
its signature to accept an explicit 'end' offset instead of calculating
the block boundary. This helper function can now replace all cases
requiring zeroing of the partial EOF block, including the append
buffered write paths in ext4_*_write_end().
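
As a rough illustration of what zeroing a partial EOF block up to an explicit 'end' offset means, the userspace sketch below operates on an in-memory copy of the block that contains 'from'. The function name zero_partial_eof_block() and the buffer-based interface are assumptions made for this example only; they do not mirror the actual ext4 implementation, which works on folios and buffer heads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Zero the part of the block containing byte offset `from` that lies in
 * [from, end), clamped to the end of that block. `block` points to an
 * in-memory copy of that single block. Returns the number of bytes
 * zeroed and reports through *did_zero whether anything was zeroed.
 */
static inline size_t zero_partial_eof_block(unsigned char *block,
					    size_t blocksize,
					    uint64_t from, uint64_t end,
					    bool *did_zero)
{
	size_t offset;
	uint64_t length;

	/* The mask arithmetic below requires a power-of-two block size. */
	assert(blocksize && (blocksize & (blocksize - 1)) == 0);

	if (did_zero)
		*did_zero = false;

	offset = (size_t)(from & (blocksize - 1));
	/* Block-aligned start or empty range: nothing to zero. */
	if (!offset || from >= end)
		return 0;

	length = end - from;
	if (length > blocksize - offset)
		length = blocksize - offset;

	memset(block + offset, 0, (size_t)length);
	if (did_zero)
		*did_zero = true;
	return (size_t)length;
}
```

Passing a large 'end' reproduces the "zero up to the block boundary" behaviour, while a smaller 'end' limits the zeroing to the bytes in front of it.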

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260327102939.1095257-3-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
ext4: add did_zero output parameter to ext4_block_zero_page_range()
Zhang Yi [Fri, 27 Mar 2026 10:29:27 +0000 (18:29 +0800)]

Add a bool *did_zero output parameter to ext4_block_zero_page_range()
and __ext4_block_zero_page_range(). The parameter reports whether a
partial block was zeroed out, which is needed for the upcoming iomap
buffered I/O conversion.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260327102939.1095257-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
ext4: fix diagnostic printf formats
David Laight [Thu, 26 Mar 2026 20:18:04 +0000 (20:18 +0000)]

The formats for non-terminated names should be "%.*s" not "%*.s".
The kernel currently treats "%*.s" as equivalent to "%*s" whereas
userspace requires it be equivalent to "%*.0s".
Neither is correct here.
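
The difference can be reproduced in userspace with snprintf(). The sketch below is only an illustration: the helper names format_name() and format_name_wrong() are made up for this example, and the second helper shows the ISO C behaviour of "%*.s", not the kernel's.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format at most `len` bytes of a name that may lack a NUL terminator.
 * "%.*s" takes the precision (maximum number of bytes) from the int
 * argument that precedes the string.
 */
static inline int format_name(char *out, size_t outsz,
			      const char *name, int len)
{
	assert(len >= 0);	/* a negative precision would be ignored */
	return snprintf(out, outsz, "%.*s", len, name);
}

/*
 * The mistyped variant: "%*.s" takes the field width from the int
 * argument and has an empty precision, which ISO C treats as zero.
 * In userspace this emits only padding and none of the name.
 */
static inline int format_name_wrong(char *out, size_t outsz,
				    const char *name, int len)
{
	assert(len >= 0);	/* a negative width would left-justify */
	return snprintf(out, outsz, "%*.s", len, name);
}
```

Called with the name "lost+found" and a length of 4, the first helper produces "lost" while the second produces four spaces.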

Signed-off-by: David Laight <david.laight.linux@gmail.com>
Link: https://patch.msgid.link/20260326201804.3881-1-david.laight.linux@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
ext4: move dcache manipulation out of __ext4_link()
NeilBrown [Fri, 20 Mar 2026 00:03:18 +0000 (11:03 +1100)]

__ext4_link() has two callers.

- ext4_link() calls it during normal handling of the link() system
  call or similar
- ext4_fc_replay_link_internal() calls it when replaying the journal
  at mount time.

The former needs changes to dcache - instantiating the dentry to the
inode on success.  The latter doesn't need or want any dcache
manipulation.

So move the manipulation out of __ext4_link() and do it in ext4_link()
only.

This requires:
 - passing the qname from the dentry explicitly to __ext4_link.
   The parent dir is already passed.  The dentry is still passed
   in the ext4_link() case purely for use by ext4_fc_track_link().
 - passing the inode separately to ext4_fc_track_link() as the
   dentry will not be instantiated yet.
 - using __ext4_add_entry() in ext4_link, which doesn't need a dentry.
 - moving ihold(), d_instantiate(), drop_nlink() and iput() calls out
   of __ext4_link() into ext4_link().

Note that ext4_inc_count() and drop_nlink() remain in __ext4_link()
as both callers need them and they are not related to the dentry.

This substantially simplifies ext4_fc_replay_link_internal(), and
removes a use of d_alloc() which, it is planned, will be removed.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://patch.msgid.link/20260320000838.3797494-4-neilb@ownmail.net
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
ext4: add ext4_fc_eligible()
NeilBrown [Fri, 20 Mar 2026 00:03:17 +0000 (11:03 +1100)]

Testing EXT4_MF_FC_INELIGIBLE is almost always combined with testing
ext4_fc_disabled().  The code can be simplified by combining these two
in a new ext4_fc_eligible().

In ext4_fc_track_inode() this moves the ext4_fc_disabled() test after
ext4_fc_mark_ineligible(), but as that is a no-op when
ext4_fc_disabled() is true, this is of no consequence.

Note that it is important to still call ext4_fc_mark_ineligible() in
ext4_fc_track_inode() even when ext4_fc_eligible() would return true.
ext4_fc_mark_ineligible() does not ONLY set the "INELIGIBLE" flag but
also updates ->s_fc_ineligible_tid to make sure that the flag remains
set until all ineligible transactions have been committed.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://patch.msgid.link/20260320000838.3797494-3-neilb@ownmail.net
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
ext4: split __ext4_add_entry() out of ext4_add_entry()
NeilBrown [Fri, 20 Mar 2026 00:03:16 +0000 (11:03 +1100)]

__ext4_add_entry() is not given a dentry - just inodes and name.
This will help the next patch which simplifies __ext4_link().

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://patch.msgid.link/20260320000838.3797494-2-neilb@ownmail.net
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
ext4: prefer IS_ERR_OR_NULL over manual NULL check
Philipp Hahn [Tue, 10 Mar 2026 11:48:30 +0000 (12:48 +0100)]

Prefer using IS_ERR_OR_NULL() over using IS_ERR() and a manual NULL
check.
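
For readers unfamiliar with the error-pointer convention, the following is a userspace re-implementation sketch modelled on the kernel's ERR_PTR()/IS_ERR()/IS_ERR_OR_NULL() helpers. It is not the kernel's actual code, and lookup_failed_old()/lookup_failed_new() are hypothetical callers added only to show the before and after shape of the change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Error codes are encoded in the top 4095 values of the address space. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Before: a manual NULL check combined with IS_ERR(). */
static inline bool lookup_failed_old(const void *p)
{
	return !p || IS_ERR(p);
}

/* After: the same test expressed with the combined helper. */
static inline bool lookup_failed_new(const void *p)
{
	return IS_ERR_OR_NULL(p);
}
```

Both callers return the same result for every input, which is why such a conversion can be generated mechanically.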

Change generated with coccinelle.

To: "Theodore Ts'o" <tytso@mit.edu>
To: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: linux-ext4@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Philipp Hahn <phahn-oss@avm.de>
Link: https://patch.msgid.link/20260310-b4-is_err_or_null-v1-4-bd63b656022d@avm.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
net: remove the netif_get_rx_queue_lease_locked() helpers
Jakub Kicinski [Wed, 8 Apr 2026 22:12:51 +0000 (15:12 -0700)]

The netif_get_rx_queue_lease_locked() API hides the locking
and the descent onto the leased queue, making the code
harder to follow (at least to me). Remove the API and
open-code the descent a bit. Most of the code now looks like:

 if (!leased)
     return __helper(x);

 hw_rxq = ..
 netdev_lock(hw_rxq->dev);
 ret = __helper(x);
 netdev_unlock(hw_rxq->dev);

 return ret;

Of course if we have more code paths that need the wrapping
we may need to revisit. For now, IMHO, having to know what
netif_get_rx_queue_lease_locked() does is not worth the 20LoC
it saves.
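
The open-coded shape described above can be mimicked in userspace as follows. This is a toy model under stated assumptions: struct toy_dev, struct toy_rxq, toy_open() and toy_open_locked() are all hypothetical names, and the "lock" is a plain flag that only checks the locking discipline rather than providing real mutual exclusion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy device: a flag standing in for the instance lock, plus a counter. */
struct toy_dev {
	bool locked;
	int opens;
};

/* Toy rx queue: owning device and, if leased, the real queue behind it. */
struct toy_rxq {
	struct toy_dev *dev;
	struct toy_rxq *lease;
};

static inline void toy_dev_lock(struct toy_dev *dev)
{
	assert(!dev->locked);
	dev->locked = true;
}

static inline void toy_dev_unlock(struct toy_dev *dev)
{
	assert(dev->locked);
	dev->locked = false;
}

/* Lock-held helper: the caller must hold the lock of rxq->dev. */
static inline int toy_open_locked(struct toy_rxq *rxq)
{
	assert(rxq->dev->locked);
	return ++rxq->dev->opens;
}

/* Entry point: the caller already holds the lock of rxq->dev. */
static inline int toy_open(struct toy_rxq *rxq)
{
	struct toy_rxq *hw_rxq;
	int ret;

	if (!rxq->lease)
		return toy_open_locked(rxq);

	/* Descend onto the leased queue under its own device's lock. */
	hw_rxq = rxq->lease;
	toy_dev_lock(hw_rxq->dev);
	ret = toy_open_locked(hw_rxq);
	toy_dev_unlock(hw_rxq->dev);

	return ret;
}
```

The point of the open-coded form is that both the lock acquisition and the switch to the physical device's queue are visible at the call site instead of being hidden behind a helper.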

Link: https://patch.msgid.link/20260408151251.72bd2482@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge branch 'netkit-support-for-io_uring-zero-copy-and-af_xdp'
Jakub Kicinski [Fri, 10 Apr 2026 01:24:34 +0000 (18:24 -0700)]

Daniel Borkmann says:

====================
netkit: Support for io_uring zero-copy and AF_XDP

Containers use virtual netdevs to route traffic from a physical netdev
in the host namespace. They do not have access to the physical netdev
in the host and thus can't use memory providers or AF_XDP that require
reconfiguring/restarting queues in the physical netdev.

This patchset adds the concept of queue leasing to virtual netdevs that
allow containers to use memory providers and AF_XDP at native speed.
Leased queues are bound to a real queue in a physical netdev and act
as a proxy.

Memory providers and AF_XDP operations take an ifindex and queue id,
so containers would pass in an ifindex for a virtual netdev and a queue
id of a leased queue, which then gets proxied to the underlying real
queue.

We have implemented support for this concept in netkit and tested the
latter against Nvidia ConnectX-6 (mlx5) as well as Broadcom BCM957504
(bnxt_en) 100G NICs. For more details see the individual patches.
====================

Link: https://patch.msgid.link/20260402231031.447597-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
selftests/net: Add queue leasing tests with netkit
David Wei [Thu, 2 Apr 2026 23:10:31 +0000 (01:10 +0200)]

Add extensive selftests for netkit queue leasing, using io_uring zero
copy test binary inside of a netns with netkit. This checks that memory
providers can be bound against virtual queues in a netkit within a
netns that are leasing from a physical netdev in the default netns.
Also add various test cases around corner cases for the queue creation
itself as well as queue info dumping and teardown in case of netkit in
device pair and single mode.

Signed-off-by: David Wei <dw@davidwei.uk>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20260402231031.447597-15-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
netkit: Add xsk support for af_xdp applications
Daniel Borkmann [Thu, 2 Apr 2026 23:10:30 +0000 (01:10 +0200)]

Enable support for AF_XDP applications to operate on a netkit device.
The goal is that AF_XDP applications can natively consume AF_XDP
from network namespaces. The use-case from Cilium side is to support
Kubernetes KubeVirt VMs through QEMU's AF_XDP backend. KubeVirt is a
virtual machine management add-on for Kubernetes which aims to provide
a common ground for virtualization. KubeVirt spawns the VMs inside
Kubernetes Pods which reside in their own network namespace just like
regular Pods.

Raw QEMU AF_XDP backend example with eth0 being a physical device with
16 queues where netkit is bound to the last queue (for multi-queue, an
RSS context can be used if supported by the driver):

  # ethtool -X eth0 start 0 equal 15
  # ethtool -X eth0 start 15 equal 1 context new
  # ethtool --config-ntuple eth0 flow-type ether \
            src 00:00:00:00:00:00 \
            src-mask ff:ff:ff:ff:ff:ff \
            dst $mac dst-mask 00:00:00:00:00:00 \
            proto 0 proto-mask 0xffff action 15
  [ ... setup BPF/XDP prog on eth0 to steer into shared xsk map ... ]
  # ip netns add foo
  # ip link add numrxqueues 2 nk type netkit single
  # ynl --family netdev --output-json --do queue-create \
        --json "{\"ifindex\": $(ifindex nk), \"type\": \"rx\", \
                 \"lease\": { \"ifindex\": $(ifindex eth0), \
                            \"queue\": { \"type\": \"rx\", \"id\": 15 } } }"
  {'id': 1}
  # ip link set nk netns foo
  # ip netns exec foo ip link set lo up
  # ip netns exec foo ip link set nk up
  # ip netns exec foo qemu-system-x86_64 \
          -kernel $kernel \
          -drive file=${image_name},index=0,media=disk,format=raw \
          -append "root=/dev/sda rw console=ttyS0" \
          -cpu host \
          -m $memory \
          -enable-kvm \
          -device virtio-net-pci,netdev=net0,mac=$mac \
          -netdev af-xdp,ifname=nk,id=net0,mode=native,queues=1,start-queue=1,inhibit=on,map-path=$dir/xsks_map \
          -nographic

We have tested the above against a dual-port Nvidia ConnectX-6 (mlx5)
100G NIC with successful network connectivity out of QEMU. An earlier
iteration of this work was presented at LSF/MM/BPF [0] and more
recently at LPC [1].

As a first starting point for connecting everything with KubeVirt,
bind mounting the xsk map from Cilium into the VM launcher Pod, which
acts as a regular Kubernetes Pod, is not perfect but also not a big
problem: the map is out of reach of the application sitting inside the
VM (and some of the control plane aspects are baked into the launcher
Pod already), so the isolation barrier is still the VM.
Eventually the goal is to have an XDP/XSK redirect extension where
there is no need to have the xsk map, and the BPF program can just
derive the target xsk through the queue where traffic was received
on.

The exposure through netkit is because Cilium should not act as a
proxy handing out xsk sockets. Existing applications expect a netdev
from kernel side and should not need to rewrite just to implement
against a CNI's protocol. Also, all the memory should not be accounted
against Cilium but rather the application Pod itself which is consuming
AF_XDP. Further, on up/downgrades we expect the data plane to be
completely decoupled from the control plane; if Cilium owned the
sockets, that would be disruptive. Another use-case which opens up and
is regularly requested by users is running DPDK applications on
top of AF_XDP in regular Kubernetes Pods.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://bpfconf.ebpf.io/bpfconf2025/bpfconf2025_material/lsfmmbpf_2025_netkit_borkmann.pdf [0]
Link: https://lpc.events/event/19/contributions/2275/ [1]
Link: https://patch.msgid.link/20260402231031.447597-14-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
netkit: Add netkit notifier to check for unregistering devices
Daniel Borkmann [Thu, 2 Apr 2026 23:10:29 +0000 (01:10 +0200)]

Add a netdevice notifier in netkit to watch for NETDEV_UNREGISTER events.
If the target device is indeed NETREG_UNREGISTERING and previously leased
a queue to a netkit device, then collect the related netkit devices and
batch-unregister_netdevice_many() them.

If this were not done, then the netkit device would hold a reference on
the physical device preventing it from going away. However, in case of
both io_uring zero-copy as well as AF_XDP this situation is handled
gracefully and the allocated resources are torn down.

In the case where mentioned infra is used through netkit, the applications
have a reference on netkit, and netkit in turn holds a reference on the
physical device. In order to have netkit release the reference on the
physical device, we need such a watcher to then unregister the netkit ones.

This is generally quite similar to the dependency handling in case of
tunnels (e.g. vxlan bound to an underlying netdev) where the tunnel device
gets removed along with the physical device.

  # ip a
  [...]
  4: enp10s0f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000
      link/ether e8:eb:d3:a3:43:f6 brd ff:ff:ff:ff:ff:ff
      inet 10.0.0.2/24 scope global enp10s0f0np0
         valid_lft forever preferred_lft forever
  [...]
  8: nk@NONE: <BROADCAST,MULTICAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
      link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
  [...]

  # rmmod mlx5_ib
  # rmmod mlx5_core
  [...]
  [  309.261822] mlx5_core 0000:0a:00.0 mlx5_0: Port: 1 Link DOWN
  [  344.235236] mlx5_core 0000:0a:00.1: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
  [  344.246948] mlx5_core 0000:0a:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
  [  344.463754] mlx5_core 0000:0a:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
  [  344.770155] mlx5_core 0000:0a:00.1: E-Switch: cleanup
  [...]

  # ip a
  [...]
  [ both enp10s0f0np0 and nk gone ]
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260402231031.447597-13-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
netkit: Implement rtnl_link_ops->alloc and ndo_queue_create
David Wei [Thu, 2 Apr 2026 23:10:28 +0000 (01:10 +0200)]

Implement rtnl_link_ops->alloc that allows the number of rx queues to be
set when netkit is created. By default, netkit has only a single rxq (and
single txq). The number of queues is deliberately not allowed to be changed
via ethtool -L and is fixed for the lifetime of a netkit instance.

For netkit device creation, a numrxqueues value larger than one can be
specified. These rxqs can then lease real rxqs from physical netdevs:

  ip link add type netkit peer numrxqueues 64      # for device pair
  ip link add numrxqueues 64 type netkit single    # for single device

The limit of numrxqueues for netkit is currently set to 1024, which allows
leasing multiple real rxqs from physical netdevs.

The implementation of ndo_queue_create() adds a new rxq during the queue
lease operation. We allow creating queues either in single device mode
or, in dual device mode, for the netkit peer device which gets
placed into the target network namespace. For dual device mode the lease
against the primary device does not make sense for the targeted use cases,
and therefore gets rejected.

We also need to add a lockdep class for netkit, such that lockdep does
not trip over us, similar to what was done in commit 0bef512012b1 ("net: add
netdev_lockdep_set_classes() to virtual drivers").

This is also the last missing bit to netkit for supporting io_uring with
zero-copy mode [0]. Up until this point it was not possible to consume the
latter out of containers or Kubernetes Pods where applications are in their
own network namespace.

io_uring example with eth0 being a physical device with 16 queues where
netkit is bound to the last queue; iou-zcrx is the binary from selftests;
ethtool configuration (tcp-data-split, hds_thresh, RSS, flow steering)
is done on the physical device by the control plane; here, flow steering
to that queue is based on the service VIP:port of the server utilizing
io_uring:

  # ethtool -X eth0 start 0 equal 15
  # ethtool -X eth0 start 15 equal 1 context new
  # ethtool --config-ntuple eth0 flow-type tcp4 dst-ip 1.2.3.4 dst-port 5000 action 15
  # ip netns add foo
  # ip link add type netkit peer numrxqueues 2
  # ynl --family netdev --output-json --do queue-create \
        --json "{\"ifindex\": $(ifindex nk0), \"type\": \"rx\", \
                 \"lease\": { \"ifindex\": $(ifindex eth0), \
                            \"queue\": { \"type\": \"rx\", \"id\": 15 } } }"
  {'id': 1}
  # ip link set nk0 netns foo
  # ip link set nk1 up
  # ip netns exec foo ip link set lo up
  # ip netns exec foo ip link set nk0 up
  # ip netns exec foo ip addr add 1.2.3.4/32 dev nk0
  [ ... setup routing etc to get external traffic into the netns ... ]
  # ip netns exec foo ./iou-zcrx -s -p 5000 -i nk0 -q 1

Remote io_uring client:

  # ./iou-zcrx -c -h 1.2.3.4 -p 5000 -l 12840 -z 65536

We have tested the above against a Broadcom BCM957504 (bnxt_en) 100G NIC,
supporting TCP header/data split.

Similarly, this also works for devmem which we tested using ncdevmem:

  # ip netns exec foo ./ncdevmem -s 1.2.3.4 -l -p 5000 -f nk0 -t 1 -q 1

And on the remote client:

  # ./ncdevmem -s 1.2.3.4 -p 5000 -f eth0

For Cilium, the plan is to open up support for the various memory providers
for regular Kubernetes Pods when Cilium is configured with netkit datapath
mode.

Signed-off-by: David Wei <dw@davidwei.uk>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://kernel-recipes.org/en/2024/schedule/efficient-zero-copy-networking-using-io_uring [0]
Link: https://patch.msgid.link/20260402231031.447597-12-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
netkit: Add single device mode for netkit
Daniel Borkmann [Thu, 2 Apr 2026 23:10:27 +0000 (01:10 +0200)]

Add a single device mode for netkit instead of netkit pairs. The primary
target for the paired devices is to connect network namespaces, of course,
and support has been implemented in projects like Cilium [0]. For the rxq
leasing the plan is to support two main scenarios related to single device
mode:

* For the use-case of io_uring zero-copy, the control plane can either
  set up a netkit pair where the peer device can perform rxq leasing which
  is then tied to the lifetime of the peer device, or the control plane
  can use a regular netkit pair to connect the hostns to a Pod/container
  and dynamically add/remove rxq leasing through a single device without
  having to interrupt the device pair. In the case of io_uring, the memory
  pool is used as skb non-linear pages, and thus the skb will go its way
  through the regular stack into netkit. Things like the netkit policy when
  no BPF is attached or skb scrubbing etc apply as-is in case the paired
  devices are used, or if the backend memory is tied to the single device
  and traffic goes through a paired device.

* For the use-case of AF_XDP, the control plane needs to use netkit in the
  single device mode. The single device mode currently enforces only a
  pass policy when no BPF is attached, and does not yet support BPF link
  attachments for AF_XDP. skbs sent to that device get dropped at the
  moment. Given AF_XDP operates at a lower layer of the stack, tying this
  to the netkit pair did not make sense. In the future, the plan is to allow
  BPF at the XDP layer which can: i) process traffic coming from the AF_XDP
  application (e.g. QEMU with AF_XDP backend) to filter egress traffic or
  to push selected egress traffic up through the single netkit device to
  the local stack (e.g. DHCP requests), and ii) vice versa, push skbs sent
  to the single netkit into the AF_XDP application (e.g. DHCP replies). Also,
  the control-plane can dynamically manage rxq leasing for the single
  netkit device without having to interrupt (e.g. down/up cycle) the main
  netkit pair for the Pod which has traffic going in and out.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Jordan Rife <jordan@jrife.io>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://docs.cilium.io/en/stable/operations/performance/tuning/#netkit-device-mode [0]
Link: https://patch.msgid.link/20260402231031.447597-11-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
xsk: Proxy pool management for leased queues
Daniel Borkmann [Thu, 2 Apr 2026 23:10:26 +0000 (01:10 +0200)]

Similarly to the netif_mp_{open,close}_rxq handling for leased queues, proxy
the xsk_{reg,clear}_pool_at_qid via netif_get_rx_queue_lease_locked such
that in case a virtual netdev picked a leased rxq, the request gets through
to the real rxq in the physical netdev. The proxying is only relevant for
queue_id < dev->real_num_rx_queues since right now it's only supported for
rxqs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260402231031.447597-10-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
xsk: Extend xsk_rcv_check validation
Daniel Borkmann [Thu, 2 Apr 2026 23:10:25 +0000 (01:10 +0200)]

xsk_rcv_check tests for inbound packets to see whether they match
the bound AF_XDP socket. Refactor the test into a small helper
xsk_dev_queue_valid and move the validation against xs->dev and
xs->queue_id there.

The fast-path case stays in place and allows for quick return in
xsk_dev_queue_valid. If it fails, the validation is extended to
check whether the AF_XDP socket is bound against a leased queue,
and if so, the test is redone.
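
The two-step check can be pictured with the toy model below. It is a sketch under assumptions, not the xsk code: struct toy_netdev, struct toy_queue, struct toy_sock and both functions are hypothetical, and the per-queue lease table merely stands in for however the kernel records which real queue a virtual queue leases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_RXQ 4

struct toy_netdev;

/* A receive queue, identified by its device and queue index. */
struct toy_queue {
	const struct toy_netdev *dev;
	unsigned int index;
};

/* A device; lease[i] is the real queue behind rxq i, or NULL. */
struct toy_netdev {
	const struct toy_queue *lease[TOY_MAX_RXQ];
};

/* A socket bound to (dev, queue_id). */
struct toy_sock {
	const struct toy_netdev *dev;
	unsigned int queue_id;
};

static inline bool toy_queue_matches(const struct toy_netdev *dev,
				     unsigned int id,
				     const struct toy_queue *rxq)
{
	return dev == rxq->dev && id == rxq->index;
}

/* Does a packet received on pkt_rxq belong to socket xs? */
static inline bool toy_dev_queue_valid(const struct toy_sock *xs,
				       const struct toy_queue *pkt_rxq)
{
	const struct toy_queue *lease;

	assert(xs && xs->dev && pkt_rxq);

	/* Fast path: the socket is bound directly to the receiving queue. */
	if (toy_queue_matches(xs->dev, xs->queue_id, pkt_rxq))
		return true;

	/* Slow path: redo the test against the leased real queue, if any. */
	if (xs->queue_id >= TOY_MAX_RXQ)
		return false;
	lease = xs->dev->lease[xs->queue_id];
	if (!lease)
		return false;
	return toy_queue_matches(lease->dev, lease->index, pkt_rxq);
}
```

A socket bound to a virtual device's queue 1 that leases queue 15 of a physical device thus accepts packets arriving on that physical queue, while sockets bound directly to a queue never reach the slow path.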

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260402231031.447597-9-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: Proxy netdev_queue_get_dma_dev for leased queues
David Wei [Thu, 2 Apr 2026 23:10:24 +0000 (01:10 +0200)]

Extend netdev_queue_get_dma_dev to return the physical device of the
real rxq for DMA in case the queue was leased. This allows memory
providers like io_uring zero-copy or devmem to bind to the physically
leased rxq via virtual devices such as netkit.

Signed-off-by: David Wei <dw@davidwei.uk>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260402231031.447597-8-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: Proxy netif_mp_{open,close}_rxq for leased queues
David Wei [Thu, 2 Apr 2026 23:10:23 +0000 (01:10 +0200)]

When a process in a container wants to set up a memory provider, it will
use the virtual netdev and a leased rxq, and call netif_mp_{open,close}_rxq
to try and restart the queue. At this point, proxy the queue restart on
the real rxq in the physical netdev.

For memory providers (io_uring zero-copy rx and devmem), it causes the
real rxq in the physical netdev to be filled from a memory provider that
has DMA mapped memory from a process within a container.

Signed-off-by: David Wei <dw@davidwei.uk>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260402231031.447597-7-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: Slightly simplify net_mp_{open,close}_rxq
Daniel Borkmann [Thu, 2 Apr 2026 23:10:22 +0000 (01:10 +0200)]

net_mp_open_rxq is currently not used in the tree as all callers are
using __net_mp_open_rxq directly, and net_mp_close_rxq is only used
once while all other locations use __net_mp_close_rxq.

Consolidate into a single API, netif_mp_{open,close}_rxq, using the
netif_ prefix to indicate that the caller is responsible for locking.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260402231031.447597-6-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet, ethtool: Disallow leased real rxqs to be resized
Daniel Borkmann [Thu, 2 Apr 2026 23:10:21 +0000 (01:10 +0200)] 
net, ethtool: Disallow leased real rxqs to be resized

Similar to AF_XDP, do not allow queues in a physical netdev to be resized
by ethtool -L when they are leased. Cover channel resize paths (both
netlink and ioctl) to reject resizing when the queues would be affected.

Given we need to have different checks for RX vs TX, detangle the code into
a two-loop version rather than the range of new_combined + min(new_rx, new_tx)
to old_combined + max(old_rx, old_tx).
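
A two-loop check of this kind might look roughly like the sketch below. It is a simplified userspace model: `queue_state`, its fields and `resize_allowed()` are hypothetical, and the counts are assumed to be the total number of rx or tx queues before and after the resize.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-queue state used only for this illustration. */
struct queue_state {
	bool in_use_by_xsk;
	bool leased;
};

/* Reject a resize if any queue that would disappear is still in use.
 * RX and TX are walked separately because only RX queues carry the
 * lease check in this model. */
static bool resize_allowed(const struct queue_state *rx,
			   unsigned int old_rx, unsigned int new_rx,
			   const struct queue_state *tx,
			   unsigned int old_tx, unsigned int new_tx)
{
	unsigned int i;

	for (i = new_rx; i < old_rx; i++)
		if (rx[i].in_use_by_xsk || rx[i].leased)
			return false;

	for (i = new_tx; i < old_tx; i++)
		if (tx[i].in_use_by_xsk)
			return false;

	return true;
}
```

Growing the queue count leaves both loops empty, so only shrinking can be rejected here.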

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260402231031.447597-5-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: Add lease info to queue-get response
Daniel Borkmann [Thu, 2 Apr 2026 23:10:20 +0000 (01:10 +0200)] 
net: Add lease info to queue-get response

Add nested lease info to the queue-get response. The lease info contains
the ifindex, the queue id with its type, and optionally the netns id if
the device resides in a different netns.

Example with ynl client when using AF_XDP via queue leasing:

  # ip a
  [...]
  4: enp10s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp/id:24 qdisc mq state UP group default qlen 1000
    link/ether e8:eb:d3:a3:43:f6 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.2/24 scope global enp10s0f0np0
       valid_lft forever preferred_lft forever
    inet6 fe80::eaeb:d3ff:fea3:43f6/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever
  [...]

  # ethtool -i enp10s0f0np0
  driver: mlx5_core
  [...]

  # ynl --family netdev --output-json --do queue-get \
        --json '{"ifindex": 4, "id": 15, "type": "rx"}'
  {'id': 15,
   'ifindex': 4,
   'lease': {'ifindex': 8, 'netns-id': 0, 'queue': {'id': 1, 'type': 'rx'}},
   'napi-id': 8227,
   'type': 'rx',
   'xsk': {}}

  # ip netns list
  foo (id: 0)

  # ip netns exec foo ip a
  [...]
  8: nk@NONE: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
      link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
      inet6 fe80::200:ff:fe00:0/64 scope link proto kernel_ll
         valid_lft forever preferred_lft forever
  [...]

  # ip netns exec foo ethtool -i nk
  driver: netkit
  [...]

  # ip netns exec foo ls /sys/class/net/nk/queues/
  rx-0  rx-1  tx-0

  # ip netns exec foo ynl --family netdev --output-json --do queue-get \
        --json '{"ifindex": 8, "id": 1, "type": "rx"}'
  {"id": 1, "type": "rx", "ifindex": 8, "xsk": {}}

Note that the caller of netdev_nl_queue_fill_one() holds the netdevice
lock. For the queue-get we do not lock both devices. When queues get
{un,}leased, both devices are locked, thus if __netif_get_rx_queue_lease()
returns a lease pointer, it points to a valid device. The netns-id is
fetched via peernet2id_alloc() similarly as done in OVS.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260402231031.447597-4-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: Implement netdev_nl_queue_create_doit
Daniel Borkmann [Thu, 2 Apr 2026 23:10:19 +0000 (01:10 +0200)] 
net: Implement netdev_nl_queue_create_doit

Implement netdev_nl_queue_create_doit which creates a new rx queue in a
virtual netdev and then leases it to a rx queue in a physical netdev.

Example with ynl client:

  # ynl --family netdev --output-json --do queue-create \
        --json '{"ifindex": 8, "type": "rx", "lease": {"ifindex": 4, "queue": {"type": "rx", "id": 15}}}'
  {'id': 1}

Note that the netdevice locking order is always from the virtual to
the physical device.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260402231031.447597-3-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: Add queue-create operation
Daniel Borkmann [Thu, 2 Apr 2026 23:10:18 +0000 (01:10 +0200)] 
net: Add queue-create operation

Add a ynl netdev family operation called queue-create that creates a
new queue on a netdevice:

      name: queue-create
      attribute-set: queue
      flags: [admin-perm]
      do:
        request:
          attributes:
            - ifindex
            - type
            - lease
        reply: &queue-create-op
          attributes:
            - id

This is a generic operation such that it can be extended for various
use cases in future. Right now it is mandatory to specify ifindex,
the queue type which is enforced to rx and a lease. The newly created
queue id is returned to the caller.

A queue from a virtual device can have a lease which refers to another
queue from a physical device. This is useful for memory providers
and AF_XDP operations which take an ifindex and queue id to allow
applications to bind against virtual devices in containers. The lease
couples both queues together and allows proxying operations from
a virtual device in a container to the physical device.

In future, the nested lease attribute can be lifted and made optional
for other use-cases such as dynamic queue creation for physical
netdevs. The lack of lease and the specification of the physical
device as an ifindex will imply that we need a real queue to be
allocated. Similarly, the queue type enforcement to rx can then be
lifted as well to support tx.

An early implementation had only driver-specific integration [0], but
so that other virtual devices can reuse it, it makes sense to have
this as a generic API in core net.

For leasing queues, the virtual netdev must have real_num_rx_queues
less than num_rx_queues at the time of calling queue-create. The
queue-type must be rx as only rx queues are supported for leasing
for now. We also enforce that the queue-create ifindex must point
to a virtual device, and that the nested lease attribute's ifindex
must point to a physical device. The nested lease attribute set
contains a netns-id attribute which is optional and can specify a
netns-id relative to the caller's netns. It requires cap_net_admin
and if the netns-id attribute is not specified, the lease ifindex
will be retrieved from the current netns. Also, it is modeled as
an s32 type similarly as done elsewhere in the stack.
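
The preconditions listed above can be summarized in a short userspace sketch. All names here (`netdev_model`, `queue_create_check()`, the `check_result` values) are hypothetical and only mirror the rules described in the text. The actual error codes and the order of the checks in the kernel are not shown in this log.

```c
#include <stdbool.h>

enum q_type { Q_RX, Q_TX };

enum check_result {
	CHECK_OK,
	CHECK_BAD_TYPE,      /* only rx queues can be leased for now */
	CHECK_NOT_VIRTUAL,   /* queue-create ifindex must be a virtual device */
	CHECK_NOT_PHYSICAL,  /* lease ifindex must be a physical device */
	CHECK_NO_FREE_SLOT,  /* real_num_rx_queues must be below num_rx_queues */
};

/* Hypothetical model of the fields the checks rely on. */
struct netdev_model {
	bool is_virtual;
	unsigned int real_num_rx_queues;
	unsigned int num_rx_queues;
};

static enum check_result queue_create_check(const struct netdev_model *virt,
					    const struct netdev_model *phys,
					    enum q_type type)
{
	if (type != Q_RX)
		return CHECK_BAD_TYPE;
	if (!virt->is_virtual)
		return CHECK_NOT_VIRTUAL;
	if (phys->is_virtual)
		return CHECK_NOT_PHYSICAL;
	if (virt->real_num_rx_queues >= virt->num_rx_queues)
		return CHECK_NO_FREE_SLOT;
	return CHECK_OK;
}
```

The privilege check and the netns-id lookup are left out of the model since they depend on the caller's context.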

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://bpfconf.ebpf.io/bpfconf2025/bpfconf2025_material/lsfmmbpf_2025_netkit_borkmann.pdf
Link: https://patch.msgid.link/20260402231031.447597-2-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge tag 'drm-misc-next-fixes-2026-04-09' of https://gitlab.freedesktop.org/drm...
Dave Airlie [Fri, 10 Apr 2026 01:15:14 +0000 (11:15 +1000)] 
Merge tag 'drm-misc-next-fixes-2026-04-09' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

Short summary of fixes pull:

dma-buf:
- fence: fix docs for dma_fence_unlock_irqrestore()

fb-helper:
- unlock in error path

gem-shmem:
- fix PMD write update

gem-vram:
- remove obsolete documentation

ivpu:
- fix device-recovery handling

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20260409113921.GA181028@linux.fritz.box
2 weeks agoblock: refactor blkdev_zone_mgmt_ioctl
Christoph Hellwig [Fri, 27 Mar 2026 09:00:32 +0000 (10:00 +0100)] 
block: refactor blkdev_zone_mgmt_ioctl

Split the zone reset case into a separate helper so that the conditional
locking goes away.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://patch.msgid.link/20260327090032.3722065-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 weeks agoMAINTAINERS: update ublk driver maintainer email
Ming Lei [Thu, 9 Apr 2026 13:30:19 +0000 (21:30 +0800)] 
MAINTAINERS: update ublk driver maintainer email

Update the ublk userspace block driver maintainer email address
from ming.lei@redhat.com to tom.leiming@gmail.com as the original
email will become invalid.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Link: https://patch.msgid.link/20260409133020.3780098-8-tom.leiming@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 weeks agoDocumentation: ublk: address review comments for SHMEM_ZC docs
Ming Lei [Thu, 9 Apr 2026 13:30:18 +0000 (21:30 +0800)] 
Documentation: ublk: address review comments for SHMEM_ZC docs

- Use "physical pages" instead of "page frame numbers (PFNs)" for
  clarity
- Remove "without any per-I/O overhead" claim from zero-copy
  description
- Add scatter/gather limitation: each I/O's data must be contiguous
  within a single registered buffer

Suggested-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Link: https://patch.msgid.link/20260409133020.3780098-7-tom.leiming@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 weeks agoublk: allow buffer registration before device is started
Ming Lei [Thu, 9 Apr 2026 13:30:17 +0000 (21:30 +0800)] 
ublk: allow buffer registration before device is started

Before START_DEV, there is no disk, no queue, no I/O dispatch, so
the maple tree can be safely modified under ub->mutex alone without
freezing the queue.

Add ublk_lock_buf_tree()/ublk_unlock_buf_tree() helpers that take
ub->mutex first, then freeze the queue if device is started. This
ordering (mutex -> freeze) is safe because ublk_stop_dev_unlocked()
already holds ub->mutex when calling del_gendisk() which freezes
the queue.

Suggested-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Link: https://patch.msgid.link/20260409133020.3780098-6-tom.leiming@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 weeks agoublk: replace xarray with IDA for shmem buffer index allocation
Ming Lei [Thu, 9 Apr 2026 13:30:16 +0000 (21:30 +0800)] 
ublk: replace xarray with IDA for shmem buffer index allocation

Remove struct ublk_buf which only contained nr_pages that was never
read after registration. Use IDA for pure index allocation instead
of xarray. Make __ublk_ctrl_unreg_buf() return int so the caller
can detect invalid index without a separate lookup.

Simplify ublk_buf_cleanup() to walk the maple tree directly and
unpin all pages in one pass, instead of iterating the xarray by
buffer index.

Suggested-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Link: https://patch.msgid.link/20260409133020.3780098-5-tom.leiming@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 weeks agoublk: simplify PFN range loop in __ublk_ctrl_reg_buf
Ming Lei [Thu, 9 Apr 2026 13:30:15 +0000 (21:30 +0800)] 
ublk: simplify PFN range loop in __ublk_ctrl_reg_buf

Use the for-loop increment instead of a manual `i++` past the last
page, and fix the mtree_insert_range end key accordingly.
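
As an illustration of the pattern, here is a userspace sketch of a loop that coalesces physically contiguous pages into ranges with an inclusive end key. It is not the driver code: `pfn_run` and `coalesce_pfns()` are hypothetical, and the output array stands in for the range insertion.

```c
#include <stddef.h>
#include <stdint.h>

/* One contiguous run of page frame numbers; last is inclusive. */
struct pfn_run {
	uint64_t first;
	uint64_t last;
};

/* Split pfns[0..n) into runs of adjacent PFNs. out must have room
 * for n entries. Returns the number of runs written. */
static size_t coalesce_pfns(const uint64_t *pfns, size_t n,
			    struct pfn_run *out)
{
	size_t nr_runs = 0;
	size_t start = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Keep extending while the next page is adjacent. */
		if (i + 1 < n && pfns[i + 1] == pfns[i] + 1)
			continue;

		/* i is the last page of the run, so the end is inclusive
		 * and no manual increment past it is needed. */
		out[nr_runs].first = pfns[start];
		out[nr_runs].last = pfns[i];
		nr_runs++;
		start = i + 1;
	}
	return nr_runs;
}
```

Because the loop index always rests on the last page of a run when the run is emitted, the end key is simply `pfns[i]` with no off-by-one adjustment.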

Suggested-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Link: https://patch.msgid.link/20260409133020.3780098-4-tom.leiming@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 weeks agoublk: verify all pages in multi-page bvec fall within registered range
Ming Lei [Thu, 9 Apr 2026 13:30:14 +0000 (21:30 +0800)] 
ublk: verify all pages in multi-page bvec fall within registered range

rq_for_each_bvec() yields multi-page bvecs where bv_page is only the
first page. ublk_try_buf_match() only validated the start PFN against
the maple tree, but a bvec can span multiple pages past the end of a
registered range.

Use mas_walk() instead of mtree_load() to obtain the range boundaries
stored in the maple tree, and check that the bvec's end PFN does not
exceed the range. Also remove base_pfn from struct ublk_buf_range
since mas.index already provides the range start PFN.
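
The containment check can be sketched in userspace as follows. A linear search stands in for the maple tree walk, and `pfn_range`, `range_walk()` and `segment_in_range()` are hypothetical names used only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A registered range of page frame numbers; last is inclusive. */
struct pfn_range {
	uint64_t first;
	uint64_t last;
};

/* Linear stand-in for a tree walk: find the range containing pfn. */
static const struct pfn_range *range_walk(const struct pfn_range *r,
					  size_t n, uint64_t pfn)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (pfn >= r[i].first && pfn <= r[i].last)
			return &r[i];
	return NULL;
}

/* A segment of nr_pages starting at start_pfn matches only if every
 * page, not just the first one, lies inside the same range. */
static bool segment_in_range(const struct pfn_range *r, size_t n,
			     uint64_t start_pfn, uint64_t nr_pages)
{
	const struct pfn_range *hit;

	if (nr_pages == 0)
		return false;
	hit = range_walk(r, n, start_pfn);
	if (!hit)
		return false;
	/* start_pfn <= hit->last here, so this cannot underflow. */
	return nr_pages - 1 <= hit->last - start_pfn;
}
```

Checking only the start PFN would accept a three-page segment starting one page before the end of a range, which is the case the second half of the check rejects.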

Reported-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Link: https://patch.msgid.link/20260409133020.3780098-3-tom.leiming@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 weeks agoublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support
Ming Lei [Thu, 9 Apr 2026 13:30:13 +0000 (21:30 +0800)] 
ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support

The __u32 len field cannot represent a 4GB buffer (0x100000000
overflows to 0). Change it to __u64 so buffers up to 4GB can be
registered. Add a reserved field for alignment and validate it
is zero.

The kernel enforces a default max of 4GB (UBLK_SHMEM_BUF_SIZE_MAX)
which may be increased in future.
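
The truncation and the reserved-field validation can be demonstrated with a small userspace sketch. The struct below is a hypothetical model, not the actual uapi layout of ublk_shmem_buf_reg, and `BUF_SIZE_MAX_MODEL` is a stand-in for UBLK_SHMEM_BUF_SIZE_MAX.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a registration request. Only len and
 * reserved matter for this illustration. */
struct buf_reg_model {
	uint64_t addr;
	uint64_t len;       /* 64-bit so that 4GB is representable */
	uint32_t flags;
	uint32_t reserved;  /* padding for alignment, must be zero */
};

/* 4GB, modeled after the default maximum named in the text. */
#define BUF_SIZE_MAX_MODEL (1ULL << 32)

/* What a 32-bit length field would store for a given byte count. */
static uint32_t len_as_u32(uint64_t len)
{
	return (uint32_t)len;
}

/* Accept a request only if reserved is zero and len is within bounds. */
static bool buf_reg_valid(const struct buf_reg_model *reg)
{
	if (reg->reserved != 0)
		return false;
	if (reg->len == 0 || reg->len > BUF_SIZE_MAX_MODEL)
		return false;
	return true;
}
```

Rejecting a nonzero reserved field keeps the padding available for later extensions without ambiguity about what older callers passed.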

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Link: https://patch.msgid.link/20260409133020.3780098-2-tom.leiming@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 weeks agoaffs: bound hash_pos before table lookup in affs_readdir
Hyungjung Joo [Fri, 13 Mar 2026 13:29:43 +0000 (22:29 +0900)] 
affs: bound hash_pos before table lookup in affs_readdir

affs_readdir() decodes ctx->pos into hash_pos and chain_pos and then
dereferences AFFS_HEAD(dir_bh)->table[hash_pos] before validating
that hash_pos is within the runtime table bound. Treat out-of-range
positions as end-of-directory before the first table lookup.
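
The ordering of decode, bound check and lookup can be modeled in userspace like this. The position encoding (slot in the upper bits, chain position in the low 16 bits) and the names `dir_pos`, `decode_pos()` and `lookup_slot()` are assumptions made for the sketch, not the affs implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded directory position in this hypothetical encoding. */
struct dir_pos {
	uint64_t hash_pos;   /* index into the hash table */
	uint32_t chain_pos;  /* position within that slot's chain */
};

static struct dir_pos decode_pos(uint64_t pos)
{
	struct dir_pos p;

	p.hash_pos = pos >> 16;
	p.chain_pos = (uint32_t)(pos & 0xffff);
	return p;
}

/* Returns 0 and stores the slot value in *out, or -1 (treated as
 * end of directory) when the decoded slot is outside the table.
 * The bound check happens before the table is touched. */
static int lookup_slot(const uint32_t *table, size_t table_size,
		       uint64_t pos, uint32_t *out)
{
	struct dir_pos p = decode_pos(pos);

	if (p.hash_pos >= table_size)
		return -1;
	*out = table[p.hash_pos];
	return 0;
}
```

Keeping hash_pos at full width avoids a truncation that could wrap a huge position back into the valid range.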

Signed-off-by: Hyungjung Joo <jhj140711@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2 weeks agoMerge tag 'kbuild-fixes-7.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuil...
Linus Torvalds [Thu, 9 Apr 2026 23:48:44 +0000 (16:48 -0700)] 
Merge tag 'kbuild-fixes-7.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux

Pull Kbuild fixes from Nathan Chancellor:

 - Make modules-cpio-pkg respect INSTALL_MOD_PATH so that it can be
   used with distribution initramfs files that have a merged /usr,
   such as Fedora

 - Silence an instance of -Wunused-but-set-global, a strengthening
   of -Wunused-but-set-variable in tip of tree Clang, in modpost,
   as the variable for extra warnings is currently unused

* tag 'kbuild-fixes-7.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
  modpost: Declare extra_warn with unused attribute
  kbuild: modules-cpio-pkg: Respect INSTALL_MOD_PATH

2 weeks agoi2c: atr: use kzalloc_flex
Rosen Penev [Fri, 27 Mar 2026 03:03:10 +0000 (20:03 -0700)] 
i2c: atr: use kzalloc_flex

Convert kzalloc_obj + kcalloc to kzalloc_flex to save an allocation.

Add __counted_by to get extra runtime analysis.
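
The general idea of folding a separate array allocation into the struct allocation can be shown with plain C. This is a userspace sketch using calloc() and a flexible array member. It does not use the kernel helpers, and `chan_table` and `chan_table_alloc()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header plus trailing array in one allocation. In kernel code the
 * array could additionally be annotated with the element count
 * member so that bounds checkers know its length. */
struct chan_table {
	size_t nr;
	uint16_t alias[];  /* flexible array member, nr entries */
};

/* Allocate a zeroed table with room for nr entries, or return NULL
 * on overflow or allocation failure. */
static struct chan_table *chan_table_alloc(size_t nr)
{
	struct chan_table *t;

	if (nr > (SIZE_MAX - sizeof(*t)) / sizeof(t->alias[0]))
		return NULL;

	t = calloc(1, sizeof(*t) + nr * sizeof(t->alias[0]));
	if (t)
		t->nr = nr;
	return t;
}
```

A single free() releases both the header and the array, which removes one failure path compared to two separate allocations.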

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260327030310.8502-1-rosenp@gmail.com
2 weeks agoselftests/nolibc: use gcc 15
Thomas Weißschuh [Wed, 8 Apr 2026 21:03:58 +0000 (23:03 +0200)] 
selftests/nolibc: use gcc 15

Newer compilers tend to detect more problematic code.

Update the testsuite to use gcc 15.2.0 by default.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20260408-nolibc-gcc-15-v1-3-330d0c40f894@weissschuh.net
2 weeks agotools/nolibc: support UBSAN on gcc
Thomas Weißschuh [Wed, 8 Apr 2026 21:03:57 +0000 (23:03 +0200)] 
tools/nolibc: support UBSAN on gcc

The UBSAN implementation in gcc requires a slightly different function
attribute to skip instrumentation.

Extend __nolibc_no_sanitize_undefined to also handle gcc.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20260408-nolibc-gcc-15-v1-2-330d0c40f894@weissschuh.net
2 weeks agotools/nolibc: create __nolibc_no_sanitize_ubsan
Thomas Weißschuh [Wed, 8 Apr 2026 21:03:56 +0000 (23:03 +0200)] 
tools/nolibc: create __nolibc_no_sanitize_ubsan

The logic to disable UBSAN will become a bit more complicated.
Move it out into compiler.h, so crt.h stays readable.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20260408-nolibc-gcc-15-v1-1-330d0c40f894@weissschuh.net
2 weeks agodrm/ttm/tests: Remove checks from ttm_pool_free_no_dma_alloc
Maarten Lankhorst [Thu, 9 Apr 2026 14:26:59 +0000 (16:26 +0200)] 
drm/ttm/tests: Remove checks from ttm_pool_free_no_dma_alloc

On !x86, the pool type is never initialised, and the pages are freed
back to the system.

The test broke on the list_lru rewrite, but I'm not sure how it was
supposed to work previously. In the meantime CI is broken, so reverting
for now.

Fixes: 444e2a19d7fd ("ttm/pool: port to list_lru. (v2)")
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patch.msgid.link/20260409142658.1511941-2-dev@lankhorst.se
2 weeks agodrm/ttm/tests: fix lru_count ASSERT
Matthew Auld [Thu, 9 Apr 2026 12:15:09 +0000 (13:15 +0100)] 
drm/ttm/tests: fix lru_count ASSERT

On pool init we should expect the lru_count for each node to be zeroed
as per __list_lru_init -> init_one_lru, but here we are asserting the
opposite.

Currently our CI is blowing up with:

[10:23:33] # ttm_device_init_pools: ASSERTION FAILED at drivers/gpu/drm/ttm/tests/ttm_device_test.c:178
[10:23:33] Expected !list_lru_count(&pt.pages) to be false, but is true
[10:23:33] [FAILED] DMA allocations, DMA32 required
[10:23:33] [PASSED] No DMA allocations, DMA32 required
[10:23:33]     # ttm_device_init_pools: ASSERTION FAILED at drivers/gpu/drm/ttm/tests/ttm_device_test.c:178
[10:23:33]     Expected !list_lru_count(&pt.pages) to be false, but is true

Fixes: 444e2a19d7fd ("ttm/pool: port to list_lru. (v2)")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ryszard Knop <ryszard.knop@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patch.msgid.link/20260409121512.81298-3-matthew.auld@intel.com
2 weeks agobpf: Fix use-after-free in offloaded map/prog info fill
Jiayuan Chen [Thu, 9 Apr 2026 02:37:32 +0000 (10:37 +0800)] 
bpf: Fix use-after-free in offloaded map/prog info fill

When querying info for an offloaded BPF map or program,
bpf_map_offload_info_fill_ns() and bpf_prog_offload_info_fill_ns()
obtain the network namespace with get_net(dev_net(offmap->netdev)).
However, the associated netdev's netns may be racing with teardown
during netns destruction. If the netns refcount has already reached 0,
get_net() performs a refcount_t increment on 0, triggering:

  refcount_t: addition on 0; use-after-free.

Although rtnl_lock and bpf_devs_lock ensure the netdev pointer remains
valid, they cannot prevent the netns refcount from reaching zero.

Fix this by using maybe_get_net() instead of get_net(). maybe_get_net()
uses refcount_inc_not_zero() and returns NULL if the refcount is already
zero, which causes ns_get_path_cb() to fail and the caller to return
-ENOENT -- the correct behavior when the netns is being destroyed.
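
The difference between an unconditional increment and an increment-if-not-zero can be modeled with C11 atomics. This is a userspace sketch, not the kernel's refcount_t code. `ns_model`, `ref_inc_not_zero()` and `maybe_get()` are hypothetical names, and counter saturation is not handled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a namespace. */
struct ns_model {
	atomic_uint refs;
};

/* Take a reference only if the count is still nonzero. */
static bool ref_inc_not_zero(atomic_uint *refs)
{
	unsigned int old = atomic_load(refs);

	while (old != 0) {
		/* On failure, old is updated to the current value. */
		if (atomic_compare_exchange_weak(refs, &old, old + 1))
			return true;
	}
	return false;
}

/* Return the object with a new reference, or NULL if it is already
 * on its way to being destroyed. */
static struct ns_model *maybe_get(struct ns_model *ns)
{
	return ref_inc_not_zero(&ns->refs) ? ns : NULL;
}
```

A caller that receives NULL never touches the dying object again, which is what lets the lookup fail cleanly instead of resurrecting a zero refcount.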

Fixes: 675fc275a3a2d ("bpf: offload: report device information for offloaded programs")
Fixes: 52775b33bb507 ("bpf: offload: report device information about offloaded maps")
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Closes: https://lore.kernel.org/bpf/f0aa3678-79c9-47ae-9e8c-02a3d1df160a@hust.edu.cn/
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260409023733.168050-1-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 weeks agoMAINTAINERS: Remove Salil Mehta as HiSilicon HNS3/HNS Ethernet maintainer
Salil Mehta [Thu, 9 Apr 2026 00:04:30 +0000 (01:04 +0100)] 
MAINTAINERS: Remove Salil Mehta as HiSilicon HNS3/HNS Ethernet maintainer

Closing this chapter and a long wonderful journey with my team, I sign off one
last time with my Huawei email address. Remove my maintainer entry for the
HiSilicon HNS and HNS3 10G/100G Ethernet drivers, and add a CREDITS entry for
my co-authorship and maintenance contributions to these drivers.

Link: https://lore.kernel.org/netdev/259cd032-2ccb-452b-8524-75bc7162e138@huawei.com/
Cc: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Acked-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260409000430.7217-1-salil.mehta@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 2 Apr 2026 17:57:09 +0000 (10:57 -0700)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-7.0-rc8).

Conflicts:

net/ipv6/seg6_iptunnel.c
  c3812651b522f ("seg6: separate dst_cache for input and output paths in seg6 lwtunnel")
  78723a62b969a ("seg6: add per-route tunnel source address")
https://lore.kernel.org/adZhwtOYfo-0ImSa@sirena.org.uk

net/ipv4/icmp.c
  fde29fd934932 ("ipv4: icmp: fix null-ptr-deref in icmp_build_probe()")
  d98adfbdd5c01 ("ipv4: drop ipv6_stub usage and use direct function calls")
https://lore.kernel.org/adO3dccqnr6j-BL9@sirena.org.uk

Adjacent changes:

drivers/net/ethernet/stmicro/stmmac/chain_mode.c
  51f4e090b9f8 ("net: stmmac: fix integer underflow in chain mode")
  6b4286e05508 ("net: stmmac: rename STMMAC_GET_ENTRY() -> STMMAC_NEXT_ENTRY()")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoselftests/bpf: Add test for stale pkt range after scalar arithmetic
Daniel Borkmann [Thu, 9 Apr 2026 15:50:16 +0000 (17:50 +0200)] 
selftests/bpf: Add test for stale pkt range after scalar arithmetic

Extend the verifier_direct_packet_access BPF selftests to exercise the
verifier code paths which ensure that the pkt range is cleared after
add/sub alu with a known scalar. The tests reject the invalid access.

  # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_direct
  [...]
  #592/35  verifier_direct_packet_access/direct packet access: pkt_range cleared after sub with known scalar:OK
  #592/36  verifier_direct_packet_access/direct packet access: pkt_range cleared after add with known scalar:OK
  #592/37  verifier_direct_packet_access/direct packet access: test3:OK
  #592/38  verifier_direct_packet_access/direct packet access: test3 @unpriv:OK
  #592/39  verifier_direct_packet_access/direct packet access: test34 (non-linear, cgroup_skb/ingress, too short eth):OK
  #592/40  verifier_direct_packet_access/direct packet access: test35 (non-linear, cgroup_skb/ingress, too short 1):OK
  #592/41  verifier_direct_packet_access/direct packet access: test36 (non-linear, cgroup_skb/ingress, long enough):OK
  #592     verifier_direct_packet_access:OK
  [...]
  Summary: 2/47 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260409155016.536608-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 weeks agobpf: Drop pkt_end markers on arithmetic to prevent is_pkt_ptr_branch_taken
Daniel Borkmann [Thu, 9 Apr 2026 15:50:15 +0000 (17:50 +0200)] 
bpf: Drop pkt_end markers on arithmetic to prevent is_pkt_ptr_branch_taken

When a pkt pointer acquires AT_PKT_END or BEYOND_PKT_END range from
a comparison, and then, known-constant arithmetic is performed,
adjust_ptr_min_max_vals() copies the stale range via dst_reg->raw =
ptr_reg->raw without clearing the negative reg->range sentinel values.

This lets is_pkt_ptr_branch_taken() choose one branch direction and
skip going through the other. Fix this by clearing negative pkt range
values (that is, AT_PKT_END and BEYOND_PKT_END) after arithmetic on
pkt pointers. This ensures is_pkt_ptr_branch_taken() returns unknown
and both branches are properly verified.
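
The effect of the fix can be illustrated with a heavily simplified model of a register's packet range. The sentinel values and the names `reg_model`, `ptr_add_const()` and `outcome_predictable()` are assumptions for this sketch. The verifier's real state tracking is far richer than this.

```c
#include <stdbool.h>

/* Negative range values act as markers rather than lengths. */
enum {
	AT_PKT_END_MODEL = -1,
	BEYOND_PKT_END_MODEL = -2,
};

/* Hypothetical slice of a packet pointer register's state. */
struct reg_model {
	int off;    /* constant offset from the packet start */
	int range;  /* verified length, or a negative marker */
};

/* Add a known constant to a packet pointer. The state is copied
 * first, so a stale marker would be inherited unless cleared. */
static struct reg_model ptr_add_const(struct reg_model ptr, int delta)
{
	struct reg_model dst = ptr;

	dst.off += delta;
	if (dst.range < 0)
		dst.range = 0;  /* marker no longer describes dst */
	return dst;
}

/* In this model, a comparison against the packet end is predictable
 * only while a marker is present. */
static bool outcome_predictable(struct reg_model r)
{
	return r.range < 0;
}
```

Positive ranges are left untouched in this model because the offset is tracked separately. Only the markers lose their meaning once the pointer moves.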

Fixes: 6d94e741a8ff ("bpf: Support for pointers beyond pkt_end.")
Reported-by: STAR Labs SG <info@starlabs.sg>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260409155016.536608-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 weeks agoMerge branch 'acpi-apei'
Rafael J. Wysocki [Thu, 9 Apr 2026 20:01:19 +0000 (22:01 +0200)] 
Merge branch 'acpi-apei'

Merge ACPI APEI updates for 7.1-rc1:

 - Add devm_ghes_register_vendor_record_notifier(), use it in the PCI
   hisi driver, and add NVIDIA vendor CPER record handler (Kai-Heng
   Feng)

* acpi-apei:
  ACPI: APEI: GHES: Add NVIDIA vendor CPER record handler
  PCI: hisi: Use devm_ghes_register_vendor_record_notifier()
  ACPI: APEI: GHES: Add devm_ghes_register_vendor_record_notifier()

2 weeks agoMerge branch 'acpi-driver'
Rafael J. Wysocki [Thu, 9 Apr 2026 19:54:15 +0000 (21:54 +0200)] 
Merge branch 'acpi-driver'

Merge ACPI core driver updates and assorted driver updates
related to ACPI support for 7.1-rc1:

 - Clean up the ACPI AC and ACPI PAD (processor aggregator device)
   drivers (Rafael Wysocki)

 - Rework checking for duplicate video bus devices and consolidate
   pnp.bus_id workarounds handling in the ACPI video bus driver (Rafael
   Wysocki)

 - Update the ACPI core device drivers to stop setting acpi_device_name()
   unnecessarily (Rafael Wysocki)

 - Rearrange code using acpi_device_class() in the ACPI core device
   drivers and update them to stop setting acpi_device_class()
   unnecessarily (Rafael Wysocki)

 - Define ACPI_AC_CLASS in one place (Rafael Wysocki)

 - Convert the ni903x_wdt watchdog driver and the xen ACPI PAD driver to
   bind to platform devices instead of ACPI devices (Rafael Wysocki)

* acpi-driver:
  watchdog: ni903x_wdt: Convert to a platform driver
  ACPI: PAD: xen: Convert to a platform driver
  ACPI: AC: Define ACPI_AC_CLASS in one place
  ACPI: driver: Do not set acpi_device_class() unnecessarily
  ACPI: driver: Avoid using pnp.device_class for netlink handling
  ACPI: event: Redefine acpi_notifier_call_chain()
  ACPI: driver: Do not set acpi_device_name() unnecessarily
  ACPI: video: Consolidate pnp.bus_id workarounds handling
  ACPI: video: Rework checking for duplicate video bus devices
  driver core: auxiliary bus: Introduce dev_is_auxiliary()
  ACPI: PAD: Rearrange notify handler installation and removal
  ACPI: AC: Get rid of unnecessary declarations

2 weeks agoregmap: debugfs: fix race condition in dummy name allocation
Zxyan Zhu [Thu, 9 Apr 2026 03:50:15 +0000 (11:50 +0800)] 
regmap: debugfs: fix race condition in dummy name allocation

Use IDA instead of a simple counter for generating unique dummy names.
The previous implementation used dummy_index++ which is not atomic,
leading to potential duplicate names when multiple threads call
regmap_debugfs_init() concurrently with name="dummy".
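
Why an ID allocator avoids the duplicate can be shown with a small userspace model. This is not the kernel IDA: `id_pool`, `id_alloc()` and `id_free()` are hypothetical, the pool is limited to 64 IDs, and C11 atomics replace the kernel's locking.

```c
#include <stdatomic.h>
#include <stdint.h>

#define ID_NONE (-1)

/* One bit per ID. A set bit means the ID is in use. */
struct id_pool {
	_Atomic uint64_t used;
};

/* Claim the lowest free ID. The compare-and-swap makes the
 * find-and-claim step atomic, so two concurrent callers can never
 * receive the same ID, unlike a plain counter++. */
static int id_alloc(struct id_pool *p)
{
	uint64_t old = atomic_load(&p->used);

	for (;;) {
		int id = 0;

		if (old == UINT64_MAX)
			return ID_NONE;
		while (old & (1ULL << id))
			id++;
		/* On failure, old is reloaded and the search repeats. */
		if (atomic_compare_exchange_weak(&p->used, &old,
						 old | (1ULL << id)))
			return id;
	}
}

/* Release an ID so that it can be handed out again. */
static void id_free(struct id_pool *p, int id)
{
	atomic_fetch_and(&p->used, ~(1ULL << id));
}
```

Unlike a monotonically increasing counter, released IDs are reused, so names built from them stay short over the lifetime of the system.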

Signed-off-by: Zxyan Zhu <zxyan0222@gmail.com>
Link: https://patch.msgid.link/20260409035015.950764-1-zxyan0222@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoMerge branch 'acpi-tad'
Rafael J. Wysocki [Thu, 9 Apr 2026 19:50:37 +0000 (21:50 +0200)] 
Merge branch 'acpi-tad'

Merge ACPI Time and Alarm Device (TAD) driver updates for 7.1-rc1:

 - Clean up the ACPI TAD driver in various ways and add an RTC class
   device interface, including both the RTC setting/reading and alarm
   timer support, to it (Rafael Wysocki)

* acpi-tad:
  ACPI: TAD: Add alarm support to the RTC class device interface
  ACPI: TAD: Split acpi_tad_rtc_read_time()
  ACPI: TAD: Relocate two functions
  ACPI: TAD: Split three functions to untangle runtime PM handling
  ACPI: TAD: Use DC wakeup only if AC wakeup is supported
  ACPI: TAD: Use dev_groups in struct device_driver
  ACPI: TAD: Update the driver description comment
  ACPI: TAD: Add RTC class device interface
  ACPI: TAD: Clear unused RT data in acpi_tad_set_real_time()
  ACPI: TAD: Rearrange RT data validation checking
  ACPI: TAD: Use __free() for cleanup in time_store()
  ACPI: TAD: Support RTC without wakeup
  ACPI: TAD: Create one attribute group

2 weeks agothermal: renesas: rzg3e: Remove stale @trim_offset kernel-doc entry
John Madieu [Thu, 9 Apr 2026 12:59:16 +0000 (14:59 +0200)] 
thermal: renesas: rzg3e: Remove stale @trim_offset kernel-doc entry

The trim_offset field was removed from struct rzg3e_thermal_priv but
its kernel-doc entry was left behind. Remove it to fix the mismatch.

Signed-off-by: John Madieu <john.madieu.xa@bp.renesas.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Link: https://patch.msgid.link/20260409125916.2244241-1-john.madieu.xa@bp.renesas.com
2 weeks agoASoC: rt1320-sdw: kcontrol for brown-out feature update
Jack Yu [Thu, 9 Apr 2026 06:01:01 +0000 (14:01 +0800)] 
ASoC: rt1320-sdw: kcontrol for brown-out feature update

Create a kcontrol to enable or disable brown-out dynamically.

Signed-off-by: Jack Yu <jack.yu@realtek.com>
Link: https://patch.msgid.link/20260409060102.4177554-1-jack.yu@realtek.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoMerge branch 'acpi-cmos-rtc'
Rafael J. Wysocki [Thu, 9 Apr 2026 19:40:22 +0000 (21:40 +0200)] 
Merge branch 'acpi-cmos-rtc'

Merge updates related to the CMOS RTC driver and x86/ACPI CMOS RTC
support for 7.1-rc1:

 - Add ACPI support to the platform device interface in the CMOS RTC
   driver, make the ACPI core device enumeration code create a platform
   device for the CMOS RTC, and drop CMOS RTC PNP device support (Rafael
   Wysocki)

 - Consolidate the x86-specific CMOS RTC handling with the ACPI TAD
   driver and clean up the CMOS RTC ACPI address space handler (Rafael
   Wysocki)

 - Enable ACPI alarm in the CMOS RTC driver if advertised in ACPI FADT
   and allow that driver to work without a dedicated IRQ if the ACPI
   alarm is used (Rafael Wysocki)

* acpi-cmos-rtc:
  rtc: cmos: Do not require IRQ if ACPI alarm is used
  rtc: cmos: Enable ACPI alarm if advertised in ACPI FADT
  ACPI: TAD/x86: cmos_rtc: Consolidate address space handler setup
  rtc: cmos: Drop PNP device support
  x86: rtc: Drop PNP device check
  ACPI: PNP: Drop CMOS RTC PNP device support
  ACPI: x86/rtc-cmos: Use platform device for driver binding
  ACPI: x86: cmos_rtc: Create a CMOS RTC platform device
  ACPI: x86: cmos_rtc: Improve coordination with ACPI TAD driver
  ACPI: x86: cmos_rtc: Clean up address space handler driver

2 weeks agoMerge branches 'acpi-processor' and 'acpi-cppc'
Rafael J. Wysocki [Thu, 9 Apr 2026 19:26:06 +0000 (21:26 +0200)] 
Merge branches 'acpi-processor' and 'acpi-cppc'

Merge ACPI processor driver updates and ACPI CPPC library updates for
7.1-rc1:

 - Address multiple assorted issues and clean up the code in the ACPI
   processor idle driver (Huisong Li)

 - Replace strlcat() in the ACPI processor idle driver with a better
   alternative (Andy Shevchenko)

 - Rearrange and clean up acpi_processor_errata_piix4() (Rafael Wysocki)

 - Move reference performance to capabilities and fix an uninitialized
   variable in the ACPI CPPC library (Pengjie Zhang)

 - Add support for the Performance Limited Register to the ACPI CPPC
   library (Sumit Gupta)

 - Add cppc_get_perf() API to read performance controls, extend
   cppc_set_epp_perf() for FFH/SystemMemory, and make the ACPI CPPC
   library warn on missing mandatory DESIRED_PERF register (Sumit Gupta)

 - Modify the cpufreq CPPC driver to update MIN_PERF/MAX_PERF in target
   callbacks to allow it to control performance bounds via standard
   scaling_min_freq and scaling_max_freq sysfs attributes and add sysfs
   documentation for the Performance Limited Register to it (Sumit Gupta)

* acpi-processor:
  ACPI: processor: idle: Reset cpuidle on C-state list changes
  cpuidle: Extract and export no-lock variants of cpuidle_unregister_device()
  ACPI: processor: idle: Fix NULL pointer dereference in hotplug path
  ACPI: processor: idle: Reset power_setup_done flag on initialization failure
  ACPI: processor: Rearrange and clean up acpi_processor_errata_piix4()
  ACPI: processor: idle: Replace strlcat() with better alternative
  ACPI: processor: idle: Remove redundant static variable and rename cstate check function
  ACPI: processor: idle: Move max_cstate update out of the loop
  ACPI: processor: idle: Remove redundant cstate check in acpi_processor_power_init
  ACPI: processor: idle: Add missing bounds check in flatten_lpi_states()

* acpi-cppc:
  ACPI: CPPC: Check cpc_read() return values consistently
  ACPI: CPPC: Fix uninitialized ref variable in cppc_get_perf_caps()
  ACPI: CPPC: Move reference performance to capabilities
  cpufreq: CPPC: Add sysfs documentation for perf_limited
  ACPI: CPPC: add APIs and sysfs interface for perf_limited
  cpufreq: cppc: Update MIN_PERF/MAX_PERF in target callbacks
  cpufreq: CPPC: Update cached perf_ctrls on sysfs write
  ACPI: CPPC: Extend cppc_set_epp_perf() for FFH/SystemMemory
  ACPI: CPPC: Warn on missing mandatory DESIRED_PERF register
  ACPI: CPPC: Add cppc_get_perf() API to read performance controls

2 weeks agox86/virt: Treat SVM as unsupported when running as an SEV+ guest
Sean Christopherson [Thu, 9 Apr 2026 19:13:41 +0000 (12:13 -0700)] 
x86/virt: Treat SVM as unsupported when running as an SEV+ guest

When running as an SEV+ guest, treat SVM as unsupported even if CPUID (and
other reporting, e.g. MSRs) enumerate support for SVM, as KVM doesn't
support nested virtualization within an SEV VM (KVM would need to
explicitly share all VMCBs and other assets with the untrusted host), let
alone running nested VMs within SEV-ES+ guests (e.g. emulating VMLOAD,
VMSAVE, and VMRUN all require access to guest register state).  And outside
of KVM, there is no in-tree user of SVM enabling.

Arguably, the hypervisor/VMM (e.g. QEMU) should clear SVM from guest CPUID
for SEV VMs, especially for SEV-ES+, but super duper technically, it's
feasible to run nested VMs in SEV+ guests (with many caveats).  More
importantly, Linux-as-a-guest has played nice with SVM being advertised to
SEV+ guests for a long time.

Treating SVM as unsupported fixes a regression where a clean shutdown of
an SEV-ES+ guest degrades into an abrupt termination.  Due to a gnarly
virtualization hole in SEV-ES (the architecture), where EFER must NOT be
intercepted by the hypervisor (because the untrusted hypervisor can't set
e.g. EFER.LME on behalf of the guest), the _host's_ EFER.SVME is visible to
the guest.  Because EFER.SVME must be always '1' while in guest mode,
Linux-the-guest sees EFER.SVME=1 even when _its_ EFER.SVME is '0', thinks
it has enabled virtualization, and ultimately can cause
x86_svm_emergency_disable_virtualization_cpu() to execute STGI to ensure
GIF is enabled.  Executing STGI _should_ be fine, except Linux is also a
wee bit paranoid when running as an SEV-ES guest.

Because L0 sees EFER.SVME=0 for the guest, a well-behaved L0 hypervisor
will intercept STGI (to inject #UD), and thus generate a #VC on the STGI.
Which, again, should be fine.  Unfortunately, vc_check_opcode_bytes() fails
to account for STGI and other SVM instructions, throws a fatal error, and
triggers a termination request.  In a perfect world, the #VC handler would
be more forgiving of unknown intercepts, especially when the #VC happened
on an instruction with exception fixup.  For now, just fix the immediate
regression.

Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem")
Reported-by: Srikanth Aithal <sraithal@amd.com>
Closes: https://lore.kernel.org/all/c820e242-9f3a-4210-b414-19d11b022404@amd.com
Link: https://patch.msgid.link/20260409191341.1932853-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agoregulator: fix OF node imbalance on reuse
Mark Brown [Thu, 9 Apr 2026 19:19:36 +0000 (20:19 +0100)] 
regulator: fix OF node imbalance on reuse

Johan Hovold <johan@kernel.org> says:

These drivers reuse the OF node of their parent multi-function device
but fail to take another reference to balance the one dropped by the
platform bus code when unbinding the MFD and deregistering the child
devices.

Fix this by using the intended helper for reusing OF nodes.
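
The imbalance described above can be sketched in plain userspace C. This is
only a model under stated assumptions: struct node, struct dev, node_get(),
node_put(), child_borrow_node(), child_reuse_node() and bus_release_child()
are hypothetical stand-ins, not kernel APIs. The "intended helper" is
presumably device_set_of_node_from_dev(), but the text above does not name it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted OF node. */
struct node {
	int refcount;
};

/* Hypothetical stand-in for a device that may point at a node. */
struct dev {
	struct node *node;
};

static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void node_put(struct node *n)
{
	if (n) {
		assert(n->refcount > 0);
		n->refcount--;
	}
}

/* Buggy pattern: borrow the parent's node without taking a reference. */
static void child_borrow_node(struct dev *child, const struct dev *parent)
{
	child->node = parent->node;
}

/* Fixed pattern: take a reference of our own when reusing the node. */
static void child_reuse_node(struct dev *child, const struct dev *parent)
{
	node_put(child->node);
	child->node = node_get(parent->node);
}

/* Models the bus code dropping the child's node reference on release. */
static void bus_release_child(struct dev *child)
{
	node_put(child->node);
	child->node = NULL;
}

/* Returns the node's refcount after the child has been registered and released. */
int refcount_after_unbind(bool take_reference)
{
	struct node n = { .refcount = 1 };	/* reference held by the parent */
	struct dev parent = { .node = &n };
	struct dev child = { .node = NULL };

	if (take_reference)
		child_reuse_node(&child, &parent);
	else
		child_borrow_node(&child, &parent);

	bus_release_child(&child);
	return n.refcount;
}
```

In this model the parent's reference survives only when the child took its
own reference; in the borrowed case the count drops to zero while the parent
still points at the node.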

Note that the first two patches will cause a trivial conflict with Doug's
series adding accessor functions for struct device flags which has now been
merged to the driver-core tree:

https://lore.kernel.org/r/20260406232444.3117516-1-dianders@chromium.org

Link: https://patch.msgid.link/20260408073055.5183-1-johan@kernel.org
2 weeks agoregulator: bd9571mwv: fix OF node reference imbalance
Johan Hovold [Wed, 8 Apr 2026 07:30:55 +0000 (09:30 +0200)] 
regulator: bd9571mwv: fix OF node reference imbalance

The driver reuses the OF node of the parent multi-function device but
fails to take another reference to balance the one dropped by the
platform bus code when unbinding the MFD and deregistering the child
devices.

Fix this by using the intended helper for reusing OF nodes.

Fixes: e85c5a153fe2 ("regulator: Add ROHM BD9571MWV-M PMIC regulator driver")
Cc: stable@vger.kernel.org # 4.12
Cc: Marek Vasut <marek.vasut@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260408073055.5183-8-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoMerge branches 'acpica', 'acpi-osl' and 'acpi-tables'
Rafael J. Wysocki [Thu, 9 Apr 2026 19:19:34 +0000 (21:19 +0200)] 
Merge branches 'acpica', 'acpi-osl' and 'acpi-tables'

Merge ACPICA updates, an ACPI OS service layer (OSL) update and
assorted updates related to parsing ACPI tables for 7.1-rc1:

 - Update maintainers information regarding ACPICA (Rafael Wysocki)

 - Replace strncpy() with strscpy_pad() in acpi_ut_safe_strncpy() (Kees
   Cook)

 - Trigger an ordered system power off after encountering a fatal error
   operator in AML (Armin Wolf)

 - Enable ACPI FPDT parsing on LoongArch (Xi Ruoyao)

 - Remove the temporary stop-gap acpi_pptt_cache_v1_full structure from
   the ACPI PPTT parser (Ben Horgan)

 - Add support for exposing ACPI FPDT subtables FBPT and S3PT (Nate
   DeSimone)

* acpica:
  ACPICA: Update maintainers information
  ACPICA: Replace strncpy() with strscpy_pad() in acpi_ut_safe_strncpy()

* acpi-osl:
  ACPI: OSL: Poweroff when encountering a fatal ACPI error

* acpi-tables:
  ACPI: tables: Enable FPDT on LoongArch
  Documentation: ABI: add FBPT and S3PT entries to sysfs-firmware-acpi
  ACPI: FPDT: expose FBPT and S3PT subtables via sysfs
  ACPI: PPTT: Remove duplicate structure, acpi_pptt_cache_v1_full

2 weeks agoregulator: act8945a: fix OF node reference imbalance
Johan Hovold [Wed, 8 Apr 2026 07:30:54 +0000 (09:30 +0200)] 
regulator: act8945a: fix OF node reference imbalance

The driver reuses the OF node of the parent multi-function device but
fails to take another reference to balance the one dropped by the
platform bus code when unbinding the MFD and deregistering the child
devices.

Fix this by using the intended helper for reusing OF nodes.

Fixes: 38c09961048b ("regulator: act8945a: add regulator driver for ACT8945A")
Cc: stable@vger.kernel.org # 4.6
Cc: Wenyou Yang <wenyou.yang@atmel.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260408073055.5183-7-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoregulator: s2dos05: fix OF node reference imbalance
Johan Hovold [Wed, 8 Apr 2026 07:30:53 +0000 (09:30 +0200)] 
regulator: s2dos05: fix OF node reference imbalance

The driver reuses the OF node of the parent multi-function device but
fails to take another reference to balance the one dropped by the
platform bus code when unbinding the MFD and deregistering the child
devices.

Fix this by using the intended helper for reusing OF nodes.

Fixes: bb2441402392 ("regulator: add s2dos05 regulator support")
Cc: stable@vger.kernel.org # 6.18
Cc: Dzmitry Sankouski <dsankouski@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260408073055.5183-6-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoregulator: mt6357: fix OF node reference imbalance
Johan Hovold [Wed, 8 Apr 2026 07:30:52 +0000 (09:30 +0200)] 
regulator: mt6357: fix OF node reference imbalance

The driver reuses the OF node of the parent multi-function device but
fails to take another reference to balance the one dropped by the
platform bus code when unbinding the MFD and deregistering the child
devices.

Fix this by using the intended helper for reusing OF nodes.

Fixes: dafc7cde23dc ("regulator: add mt6357 regulator")
Cc: stable@vger.kernel.org # 6.2
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260408073055.5183-5-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoregulator: max77650: fix OF node reference imbalance
Johan Hovold [Wed, 8 Apr 2026 07:30:51 +0000 (09:30 +0200)] 
regulator: max77650: fix OF node reference imbalance

The driver reuses the OF node of the parent multi-function device but
fails to take another reference to balance the one dropped by the
platform bus code when unbinding the MFD and deregistering the child
devices.

Fix this by using the intended helper for reusing OF nodes.

Fixes: bcc61f1c44fd ("regulator: max77650: add regulator support")
Cc: stable@vger.kernel.org # 5.1
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260408073055.5183-4-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoregulator: rk808: fix OF node reference imbalance
Johan Hovold [Wed, 8 Apr 2026 07:30:50 +0000 (09:30 +0200)] 
regulator: rk808: fix OF node reference imbalance

The driver reuses the OF node of the parent multi-function device but
fails to take another reference to balance the one dropped by the
platform bus code when unbinding the MFD and deregistering the child
devices.

Fix this by using the intended helper for reusing OF nodes.

Fixes: 647e57351f8e ("regulator: rk808: reduce 'struct rk808' usage")
Cc: stable@vger.kernel.org # 6.2
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260408073055.5183-3-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoregulator: bq257xx: fix OF node reference imbalance
Johan Hovold [Wed, 8 Apr 2026 07:30:49 +0000 (09:30 +0200)] 
regulator: bq257xx: fix OF node reference imbalance

The driver reuses the OF node of the parent multi-function device but
fails to take another reference to balance the one dropped by the
platform bus code when unbinding the MFD and deregistering the child
devices.

Fix this by using the intended helper for reusing OF nodes.

Fixes: 981dd162b635 ("regulator: bq257xx: Add bq257xx boost regulator driver")
Cc: stable@vger.kernel.org # 6.18
Cc: Chris Morgan <macromorgan@hotmail.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260408073055.5183-2-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoACPICA: Update maintainers information
Rafael J. Wysocki [Thu, 9 Apr 2026 11:24:28 +0000 (13:24 +0200)] 
ACPICA: Update maintainers information

Update MAINTAINERS to reflect ACPICA maintainership changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/12876647.O9o76ZdvQC@rafael.j.wysocki
2 weeks agospi: mpfs: fix controller deregistration
Johan Hovold [Thu, 9 Apr 2026 12:04:19 +0000 (14:04 +0200)] 
spi: mpfs: fix controller deregistration

Make sure to deregister the controller before disabling underlying
resources like interrupts during driver unbind.
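
Why the ordering matters can be sketched with a small userspace model. This
is an illustration only: struct ctlr and the ctlr_*() functions are
hypothetical and do not correspond to the SPI core API. The assumption is
that unregistering the controller unbinds attached devices, whose remove
paths may still issue I/O.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a controller and one attached device. */
struct ctlr {
	bool registered;
	bool resources_on;	/* clocks, interrupts, DMA, ... */
	int failed_ios;
};

/* I/O only works while the underlying resources are still enabled. */
static int ctlr_do_io(struct ctlr *c)
{
	if (!c->resources_on) {
		c->failed_ios++;
		return -1;
	}
	return 0;
}

/* Unregistering unbinds the device, whose remove path may do I/O. */
static void ctlr_unregister(struct ctlr *c)
{
	assert(c->registered);
	(void)ctlr_do_io(c);	/* e.g. a device driver quiescing its chip */
	c->registered = false;
}

static void ctlr_disable_resources(struct ctlr *c)
{
	c->resources_on = false;
}

/* Returns the number of I/O failures seen during unbind. */
int unbind_failed_ios(bool unregister_first)
{
	struct ctlr c = { .registered = true, .resources_on = true };

	if (unregister_first) {
		ctlr_unregister(&c);
		ctlr_disable_resources(&c);
	} else {
		ctlr_disable_resources(&c);
		ctlr_unregister(&c);
	}
	return c.failed_ios;
}
```

With deregistration first, the device's final transfer still finds the
resources enabled; with the reversed order it fails.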

Fixes: 9ac8d17694b6 ("spi: add support for microchip fpga spi controllers")
Cc: stable@vger.kernel.org # 6.0
Cc: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260409120419.388546-21-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agospi: microchip-core-spi: fix controller deregistration
Johan Hovold [Thu, 9 Apr 2026 12:04:18 +0000 (14:04 +0200)] 
spi: microchip-core-spi: fix controller deregistration

Make sure to deregister the controller before disabling underlying
resources like interrupts during driver unbind.

Fixes: 059f545832be ("spi: add support for microchip "soft" spi controller")
Cc: stable@vger.kernel.org # 6.19
Cc: Prajna Rajendra Kumar <prajna.rajendrakumar@microchip.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260409120419.388546-20-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agospi: microchip-core-qspi: fix controller deregistration
Johan Hovold [Thu, 9 Apr 2026 12:04:17 +0000 (14:04 +0200)] 
spi: microchip-core-qspi: fix controller deregistration

Make sure to deregister the controller before disabling underlying
resources like interrupts during driver unbind.

Fixes: 8596124c4c1b ("spi: microchip-core-qspi: Add support for microchip fpga qspi controllers")
Cc: stable@vger.kernel.org # 6.1
Cc: Naga Sureshkumar Relli <nagasuresh.relli@microchip.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260409120419.388546-19-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agospi: meson-spicc: fix controller deregistration
Johan Hovold [Thu, 9 Apr 2026 12:04:16 +0000 (14:04 +0200)] 
spi: meson-spicc: fix controller deregistration

Make sure to deregister the controller before disabling it to allow SPI
device drivers to do I/O during deregistration.

Fixes: 454fa271bc4e ("spi: Add Meson SPICC driver")
Cc: stable@vger.kernel.org # 4.13
Cc: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260409120419.388546-18-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agospi: lantiq-ssc: fix controller deregistration
Johan Hovold [Thu, 9 Apr 2026 12:04:15 +0000 (14:04 +0200)] 
spi: lantiq-ssc: fix controller deregistration

Make sure to deregister the controller before releasing underlying
resources like clocks during driver unbind.

Fixes: 17f84b793c01 ("spi: lantiq-ssc: add support for Lantiq SSC SPI controller")
Cc: stable@vger.kernel.org # 4.11
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260409120419.388546-17-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agospi: img-spfi: fix controller deregistration
Johan Hovold [Thu, 9 Apr 2026 12:04:14 +0000 (14:04 +0200)] 
spi: img-spfi: fix controller deregistration

Make sure to deregister the controller before disabling and releasing
underlying resources like clocks and DMA during driver unbind.

Fixes: deba25800a12 ("spi: Add driver for IMG SPFI controller")
Cc: stable@vger.kernel.org # 3.19
Cc: Andrew Bresticker <abrestic@chromium.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260409120419.388546-16-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agospi: fsl-espi: fix controller deregistration
Johan Hovold [Thu, 9 Apr 2026 12:04:12 +0000 (14:04 +0200)] 
spi: fsl-espi: fix controller deregistration

Make sure to deregister the controller before disabling runtime PM
(which can leave the controller disabled) to allow SPI device drivers to
do I/O during deregistration.

Fixes: e9abb4db8d10 ("spi: fsl-espi: add runtime PM")
Cc: stable@vger.kernel.org # 4.3
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260409120419.388546-14-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agospi: ep93xx: fix controller deregistration
Johan Hovold [Thu, 9 Apr 2026 12:04:11 +0000 (14:04 +0200)] 
spi: ep93xx: fix controller deregistration

Make sure to deregister the controller before releasing underlying
resources like DMA during driver unbind.

Fixes: 011f23a3c2f2 ("spi/ep93xx: implemented driver for Cirrus EP93xx SPI controller")
Cc: stable@vger.kernel.org # 2.6.35
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260409120419.388546-13-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agospi: dln2: fix controller deregistration
Johan Hovold [Thu, 9 Apr 2026 12:04:10 +0000 (14:04 +0200)] 
spi: dln2: fix controller deregistration

Make sure to deregister the controller before disabling it to allow
SPI device drivers to do I/O during deregistration.

Fixes: 3d8c0d749da3 ("spi: add support for DLN-2 USB-SPI adapter")
Cc: stable@vger.kernel.org # 4.0
Cc: Laurentiu Palcu <laurentiu.palcu@intel.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260409120419.388546-12-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agospi: coldfire-qspi: fix controller deregistration
Johan Hovold [Thu, 9 Apr 2026 12:04:09 +0000 (14:04 +0200)] 
spi: coldfire-qspi: fix controller deregistration

Make sure to deregister the controller before disabling underlying
resources like clocks (via runtime pm) during driver unbind.

Fixes: 34b8c6617366 ("spi: Add Freescale/Motorola Coldfire QSPI driver")
Cc: stable@vger.kernel.org # 2.6.34
Cc: Steven King <sfking@fdwdc.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260409120419.388546-11-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agospi: cavium-thunderx: fix controller deregistration
Johan Hovold [Thu, 9 Apr 2026 12:04:08 +0000 (14:04 +0200)] 
spi: cavium-thunderx: fix controller deregistration

Make sure to deregister the controller before disabling it to avoid
hanging or leaking resources associated with the queue when the queue is
non-empty.

Fixes: 7347a6c7af8d ("spi: octeon: Add ThunderX driver")
Cc: stable@vger.kernel.org # 4.9
Cc: Jan Glauber <jan.glauber@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260409120419.388546-10-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agospi: octeon: fix controller deregistration
Johan Hovold [Thu, 9 Apr 2026 12:04:07 +0000 (14:04 +0200)] 
spi: octeon: fix controller deregistration

Make sure to deregister the controller before disabling it to avoid
hanging or leaking resources associated with the queue when the queue is
non-empty.

Fixes: 22ad2d8df77d ("spi: octeon: use devm_spi_register_master()")
Cc: stable@vger.kernel.org # 3.13
Cc: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260409120419.388546-9-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agospi: bcmbca-hsspi: fix controller deregistration
Johan Hovold [Thu, 9 Apr 2026 12:04:06 +0000 (14:04 +0200)] 
spi: bcmbca-hsspi: fix controller deregistration

Make sure to deregister the controller before disabling underlying
resources like interrupts during driver unbind to allow SPI drivers to
do I/O during deregistration.

Note that clocks were also disabled before the recent commit
e532e21a246d ("spi: bcm63xx-hsspi: Simplify clock handling with
devm_clk_get_enabled()").

Fixes: a38a2233f23b ("spi: bcmbca-hsspi: Add driver for newer HSSPI controller")
Cc: stable@vger.kernel.org # 6.3: deb269e0394f
Cc: stable@vger.kernel.org # 6.3
Cc: William Zhang <william.zhang@broadcom.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260409120419.388546-8-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agospi: bcm63xx-hsspi: fix controller deregistration
Johan Hovold [Thu, 9 Apr 2026 12:04:05 +0000 (14:04 +0200)] 
spi: bcm63xx-hsspi: fix controller deregistration

Make sure to deregister the controller before disabling underlying
resources like interrupts during driver unbind to allow SPI drivers to
do I/O during deregistration.

Note that clocks were also disabled before the recent commit
e532e21a246d ("spi: bcm63xx-hsspi: Simplify clock handling with
devm_clk_get_enabled()").

Fixes: 7d255695804f ("spi/bcm63xx-hsspi: use devm_register_master()")
Cc: stable@vger.kernel.org # 3.14
Cc: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260409120419.388546-7-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agospi: bcm63xx: fix controller deregistration
Johan Hovold [Thu, 9 Apr 2026 12:04:04 +0000 (14:04 +0200)] 
spi: bcm63xx: fix controller deregistration

Make sure to deregister the controller before disabling underlying
resources like clocks during driver unbind.

Fixes: b42dfed83d95 ("spi: add Broadcom BCM63xx SPI controller driver")
Cc: stable@vger.kernel.org # 3.4
Cc: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260409120419.388546-6-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agospi: atmel: fix controller deregistration
Johan Hovold [Thu, 9 Apr 2026 12:04:03 +0000 (14:04 +0200)] 
spi: atmel: fix controller deregistration

Make sure to deregister the controller before disabling underlying
resources like clocks during driver unbind.

Fixes: 754ce4f29937 ("[PATCH] SPI: atmel_spi driver")
Cc: stable@vger.kernel.org # 2.6.21
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260409120419.388546-5-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agospi: at91-usart: fix controller deregistration
Johan Hovold [Thu, 9 Apr 2026 12:04:02 +0000 (14:04 +0200)] 
spi: at91-usart: fix controller deregistration

Make sure to deregister the controller before disabling and releasing
underlying resources like clocks and DMA during driver unbind.

Fixes: e1892546ff66 ("spi: at91-usart: Add driver for at91-usart as SPI")
Cc: stable@vger.kernel.org # 4.20
Cc: Radu Pirea <radu.pirea@microchip.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260409120419.388546-4-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agospi: aspeed-smc: fix controller deregistration
Johan Hovold [Thu, 9 Apr 2026 12:04:01 +0000 (14:04 +0200)] 
spi: aspeed-smc: fix controller deregistration

Make sure to deregister the controller before disabling it to allow
SPI device drivers to do I/O during deregistration.

Fixes: e3228ed92893 ("spi: spi-mem: Convert Aspeed SMC driver to spi-mem")
Cc: stable@vger.kernel.org # 5.19
Cc: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260409120419.388546-3-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agospi: amlogic-spisg: fix controller deregistration
Johan Hovold [Thu, 9 Apr 2026 12:04:00 +0000 (14:04 +0200)] 
spi: amlogic-spisg: fix controller deregistration

Make sure to deregister the controller before disabling underlying
resources like clocks during driver unbind.

Fixes: cef9991e04ae ("spi: Add Amlogic SPISG driver")
Cc: stable@vger.kernel.org # 6.17: b8db95529979
Cc: stable@vger.kernel.org # 6.17
Cc: Sunny Luo <sunny.luo@amlogic.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260409120419.388546-2-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoKVM: SEV: Goto an existing error label if charging misc_cg for an ASID fails
Sean Christopherson [Tue, 10 Mar 2026 23:48:29 +0000 (16:48 -0700)] 
KVM: SEV: Goto an existing error label if charging misc_cg for an ASID fails

Dedup a small amount of cleanup code in SEV ASID allocation by reusing
an existing error label.

No functional change intended.

Link: https://patch.msgid.link/20260310234829.2608037-22-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agoKVM: SVM: Move lock-protected allocation of SEV ASID into a separate helper
Carlos López [Tue, 10 Mar 2026 23:48:28 +0000 (16:48 -0700)] 
KVM: SVM: Move lock-protected allocation of SEV ASID into a separate helper

Extract the lock-protected parts of SEV ASID allocation into a new helper
and opportunistically convert it to use guard() when acquiring the mutex.

Preserve the goto even though it's a little odd, as there's a fair
amount of subtlety that makes it surprisingly difficult to replicate the
functionality with a loop construct, and arguably using goto yields the
most readable code.

No functional change intended.

Signed-off-by: Carlos López <clopez@suse.de>
[sean: move code to separate helper, rework shortlog+changelog]
Link: https://patch.msgid.link/20260310234829.2608037-21-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agoKVM: SEV: use mutex guard in snp_handle_guest_req()
Carlos López [Tue, 10 Mar 2026 23:48:27 +0000 (16:48 -0700)] 
KVM: SEV: use mutex guard in snp_handle_guest_req()

Simplify the error paths in snp_handle_guest_req() by using a mutex
guard, allowing early return instead of using gotos.
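
The guard pattern can be approximated in userspace with the GCC/Clang
cleanup attribute. The sketch below is not the kernel's guard()
implementation: struct toy_mutex, TOY_MUTEX_GUARD() and handle_req() are
hypothetical names, and the lock is a single-threaded toy that only tracks
whether it is held.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy lock that only records whether it is currently held. */
struct toy_mutex {
	bool held;
};

static struct toy_mutex req_lock;

static struct toy_mutex *toy_mutex_lock(struct toy_mutex *m)
{
	assert(!m->held);
	m->held = true;
	return m;
}

/* Cleanup handler: runs when the guard variable goes out of scope. */
static void toy_mutex_guard_release(struct toy_mutex **m)
{
	assert((*m)->held);
	(*m)->held = false;
}

/* Scope-based guard: lock now, unlock automatically on every return path. */
#define TOY_MUTEX_GUARD(m) \
	struct toy_mutex *toy_guard_ \
		__attribute__((cleanup(toy_mutex_guard_release))) = toy_mutex_lock(m)

/* Hypothetical handler using early returns instead of goto-unlock labels. */
int handle_req(int len)
{
	TOY_MUTEX_GUARD(&req_lock);

	if (len < 0)
		return -EINVAL;	/* lock is dropped automatically */
	if (len > 4096)
		return -E2BIG;

	return 0;
}

/* Returns true if no path leaked the lock. */
bool req_lock_is_free(void)
{
	return !req_lock.held;
}
```

Every return path releases the lock without an explicit unlock call, which
is what makes the error labels unnecessary.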

Signed-off-by: Carlos López <clopez@suse.de>
Link: https://patch.msgid.link/20260120201013.3931334-8-clopez@suse.de
Link: https://patch.msgid.link/20260310234829.2608037-20-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agoKVM: SEV: use mutex guard in sev_mem_enc_unregister_region()
Carlos López [Tue, 10 Mar 2026 23:48:26 +0000 (16:48 -0700)] 
KVM: SEV: use mutex guard in sev_mem_enc_unregister_region()

Simplify the error paths in sev_mem_enc_unregister_region() by using a
mutex guard, allowing early return instead of using gotos.

Signed-off-by: Carlos López <clopez@suse.de>
Link: https://patch.msgid.link/20260120201013.3931334-7-clopez@suse.de
Link: https://patch.msgid.link/20260310234829.2608037-19-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agoKVM: SEV: use mutex guard in sev_mem_enc_ioctl()
Carlos López [Tue, 10 Mar 2026 23:48:25 +0000 (16:48 -0700)] 
KVM: SEV: use mutex guard in sev_mem_enc_ioctl()

Simplify the error paths in sev_mem_enc_ioctl() by using a mutex guard,
allowing early return instead of using gotos.

Signed-off-by: Carlos López <clopez@suse.de>
Link: https://patch.msgid.link/20260120201013.3931334-5-clopez@suse.de
Link: https://patch.msgid.link/20260310234829.2608037-18-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agoKVM: SEV: use mutex guard in snp_launch_update()
Carlos López [Tue, 10 Mar 2026 23:48:24 +0000 (16:48 -0700)] 
KVM: SEV: use mutex guard in snp_launch_update()

Simplify the error paths in snp_launch_update() by using a mutex guard,
allowing early return instead of using gotos.

Signed-off-by: Carlos López <clopez@suse.de>
Link: https://patch.msgid.link/20260120201013.3931334-4-clopez@suse.de
Link: https://patch.msgid.link/20260310234829.2608037-17-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agoKVM: SEV: Assert that kvm->lock is held when querying SEV+ support
Sean Christopherson [Tue, 10 Mar 2026 23:48:23 +0000 (16:48 -0700)] 
KVM: SEV: Assert that kvm->lock is held when querying SEV+ support

Assert that kvm->lock is held when checking if a VM is an SEV+ VM, as KVM
sets *and* resets the relevant flags when initializing SEV state, i.e.
it's extremely easy to end up with TOCTOU bugs if kvm->lock isn't held.

Add waivers for a VM being torn down (refcount is '0') and for there being
a loaded vCPU, with comments for both explaining why they're safe.
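
The shape of such an assertion, with its two waivers, might look like the
following userspace sketch. All names here are hypothetical; the real check
presumably relies on lockdep and KVM's own reference counting rather than
plain booleans.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: querying SEV+ state is considered safe if the VM
 * lock is held, or the VM is being torn down (no users left), or the
 * caller runs with a vCPU loaded.
 */
bool sev_query_is_safe(bool lock_held, int users, bool vcpu_loaded)
{
	return lock_held || users == 0 || vcpu_loaded;
}

/* Accessor that asserts the rule before returning the flag. */
bool is_sev_guest(bool sev_enabled, bool lock_held, int users,
		  bool vcpu_loaded)
{
	assert(sev_query_is_safe(lock_held, users, vcpu_loaded));
	return sev_enabled;
}
```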

Note, the "vCPU loaded" waiver is necessary to avoid splats on the SNP
checks in sev_gmem_prepare() and sev_gmem_max_mapping_level(), which are
currently called when handling nested page faults.  Alternatively, those
checks could key off KVM_X86_SNP_VM, as kvm_arch.vm_type is stable early
in VM creation.  Prioritize consistency, at least for now, and leave a
"reminder" that the max mapping level code in particular likely needs
special attention if/when KVM supports dirty logging for SNP guests.

Link: https://patch.msgid.link/20260310234829.2608037-16-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agoKVM: SEV: Document that checking for SEV+ guests when reclaiming memory is "safe"
Sean Christopherson [Tue, 10 Mar 2026 23:48:22 +0000 (16:48 -0700)] 
KVM: SEV: Document that checking for SEV+ guests when reclaiming memory is "safe"

Document that the check for an SEV+ guest when reclaiming guest memory is
safe even though kvm->lock isn't held.  This will allow asserting that
kvm->lock is held in the SEV accessors, without triggering false positives
on the "safe" cases.

No functional change intended.

Link: https://patch.msgid.link/20260310234829.2608037-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agoKVM: SEV: Hide "struct kvm_sev_info" behind CONFIG_KVM_AMD_SEV=y
Sean Christopherson [Tue, 10 Mar 2026 23:48:21 +0000 (16:48 -0700)] 
KVM: SEV: Hide "struct kvm_sev_info" behind CONFIG_KVM_AMD_SEV=y

Bury "struct kvm_sev_info" behind CONFIG_KVM_AMD_SEV=y to make it harder
for SEV specific code to sneak into common SVM code.

No functional change intended.

Link: https://patch.msgid.link/20260310234829.2608037-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agoKVM: SEV: WARN on unhandled VM type when initializing VM
Sean Christopherson [Tue, 10 Mar 2026 23:48:20 +0000 (16:48 -0700)] 
KVM: SEV: WARN on unhandled VM type when initializing VM

WARN if KVM encounters an unhandled VM type when setting up flags for SEV+
VMs, e.g. to guard against adding a new flavor of SEV without adding proper
recognition in sev_vm_init().

Practically speaking, no functional change intended (the new "default" case
should be unreachable).
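
A minimal sketch of the pattern, assuming a switch over the VM type with a
warning in the default case. The enum values, struct sev_flags and the
warning counter are hypothetical stand-ins (the kernel would use WARN-style
macros), and the flag assignments per type are an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical VM types; the names are illustrative only. */
enum vm_type { VM_PLAIN_SEV, VM_SEV_ES, VM_SEV_SNP };

struct sev_flags {
	bool es_active;
	bool snp_active;
};

/* Counts how often the "should be unreachable" path fired. */
static int unhandled_type_warnings;

int sev_unhandled_warnings(void)
{
	return unhandled_type_warnings;
}

struct sev_flags sev_flags_for_type(int type)
{
	struct sev_flags f = { false, false };

	switch (type) {
	case VM_PLAIN_SEV:
		break;
	case VM_SEV_ES:
		f.es_active = true;
		break;
	case VM_SEV_SNP:
		/* Assumption: SNP implies ES-style protection as well. */
		f.es_active = true;
		f.snp_active = true;
		break;
	default:
		/* A new flavor was added without teaching this function. */
		unhandled_type_warnings++;
		break;
	}

	/* Invariant of this model: SNP never comes without ES. */
	assert(f.es_active || !f.snp_active);
	return f;
}
```

An unknown type leaves both flags clear and bumps the warning counter, so a
missing case is noticed rather than silently treated as plain SEV.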

Link: https://patch.msgid.link/20260310234829.2608037-13-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agospi: npcm-fiu: drop unused remove callback
Johan Hovold [Thu, 9 Apr 2026 12:08:10 +0000 (14:08 +0200)] 
spi: npcm-fiu: drop unused remove callback

Drop the remove callback which is unused since commit 82c4fadb0b95
("spi: npcm-fiu: Use helper function devm_clk_get_enabled()").

The above mentioned commit also removed the last user of the platform
driver data which no longer needs to be set (twice).

Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260409120810.388909-1-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>