git.ipfire.org Git - thirdparty/kernel/linux.git/log
5 weeks ago sockptr: fix usize check in copy_struct_from_sockptr() for user pointers
Stefan Metzmacher [Tue, 7 Apr 2026 16:03:14 +0000 (18:03 +0200)] 
sockptr: fix usize check in copy_struct_from_sockptr() for user pointers

When reached through copy_struct_from_sockptr() with a user pointer,
copy_struct_from_user() never hits its check_zeroed_user() call and
never returns -E2BIG when new userspace passes new bits in a structure
larger than the current kernel structure.
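
A minimal userspace sketch of the behavior described above, assuming the
sockptr wrapper handed an already clamped size to the copy helper (the
commit message does not spell out the mechanism, so that is an assumption).
model_copy_struct(), clamping_wrapper() and fixed_wrapper() are hypothetical
names that only model the extensible-struct rules; they are not the kernel
functions.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace model of the extensible-struct copy rules (not kernel code). */
int model_copy_struct(void *dst, size_t ksize, const void *src, size_t usize)
{
	size_t size = ksize < usize ? ksize : usize;
	const unsigned char *s = src;
	size_t i;

	if (usize > ksize) {
		/* Newer caller: every byte the kernel does not know must be zero. */
		for (i = ksize; i < usize; i++)
			if (s[i] != 0)
				return -E2BIG;
	} else {
		/* Older caller: zero-fill the fields it did not supply. */
		memset((unsigned char *)dst + size, 0, ksize - size);
	}
	memcpy(dst, src, size);
	return 0;
}

/* Assumed buggy shape: usize is clamped, so the usize > ksize branch is dead. */
int clamping_wrapper(void *dst, size_t ksize, const void *src, size_t usize)
{
	size_t size = ksize < usize ? ksize : usize;

	return model_copy_struct(dst, ksize, src, size);
}

/* Fixed shape: the caller's real size reaches the trailing-bytes check. */
int fixed_wrapper(void *dst, size_t ksize, const void *src, size_t usize)
{
	return model_copy_struct(dst, ksize, src, usize);
}
```

With a 4-byte kernel struct and an 8-byte caller struct whose last byte is
non-zero, the clamping variant silently succeeds while the fixed variant
reports -E2BIG.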

As far as I can tell, there are no critical/related uapi changes in

- include/net/bluetooth/bluetooth.h and net/bluetooth/sco.c
  after the use of copy_struct_from_sockptr in v6.13-rc3
- include/uapi/linux/tcp.h and net/ipv4/tcp_ao.c
  after the use of copy_struct_from_sockptr in v6.6-rc1

Fix this now so that new callers get the correct behavior from the start.

Fixes: 4954f17ddefc ("net/tcp: Introduce TCP_AO setsockopt()s")
Fixes: ef84703a911f ("net/tcp: Add TCP-AO getsockopt()s")
Fixes: faadfaba5e01 ("net/tcp: Add TCP_AO_REPAIR")
Fixes: 3e643e4efa1e ("Bluetooth: Improve setsockopt() handling of malformed user input")
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Francesco Ruggeri <fruggeri@arista.com>
Cc: Salam Noureddine <noureddine@arista.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Michal Luczaj <mhal@rbox.co>
Cc: David Wei <dw@davidwei.uk>
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Kuniyuki Iwashima <kuniyu@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Christian Brauner <brauner@kernel.org>
CC: Kees Cook <keescook@chromium.org>
Cc: netdev@vger.kernel.org
Cc: linux-bluetooth@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Link: https://patch.msgid.link/cfaedbc33ae9d36adaabf04fa79424f30ff1efdd.1775576651.git.metze@samba.org
Reviewed-by: Aleksa Sarai <aleksa@amutable.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
5 weeks ago uaccess: fix ignored_trailing logic in copy_struct_to_user()
Stefan Metzmacher [Tue, 7 Apr 2026 16:03:13 +0000 (18:03 +0200)] 
uaccess: fix ignored_trailing logic in copy_struct_to_user()

Currently all callers pass ignored_trailing=NULL, but I have
code that will make use of it.

Now it actually behaves as documented:

* If @usize < @ksize, then the kernel is trying to pass userspace a newer
  struct than it supports. Thus we only copy the interoperable portions
  (@usize) and ignore the rest (but @ignored_trailing is set to %true if
  any of the trailing (@ksize - @usize) bytes are non-zero).
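
The documented rule can be sketched in userspace as follows. This is only
a model under stated assumptions: model_copy_struct_to_user() is a
hypothetical name, plain memcpy()/memset() stand in for the user-access
helpers, and the usize > ksize branch (zero-filling the extra user bytes)
is included for completeness.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace model of the documented copy-to-user rule (not kernel code). */
int model_copy_struct_to_user(void *dst, size_t usize,
			      const void *src, size_t ksize,
			      bool *ignored_trailing)
{
	size_t size = usize < ksize ? usize : ksize;
	const unsigned char *s = src;
	bool trailing = false;
	size_t i;

	if (usize < ksize) {
		/* Report whether any of the dropped (ksize - usize) bytes is set. */
		for (i = usize; i < ksize; i++) {
			if (s[i] != 0) {
				trailing = true;
				break;
			}
		}
	} else {
		/* Caller's struct is larger: zero the part we cannot fill. */
		memset((unsigned char *)dst + size, 0, usize - size);
	}
	memcpy(dst, src, size);

	/* The out-parameter is optional, matching callers that pass NULL. */
	if (ignored_trailing)
		*ignored_trailing = trailing;
	return 0;
}
```

A caller that cares about lost data passes a bool and checks it after the
copy; callers that pass NULL see no change in behavior.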

Fixes: 424a55a4a908 ("uaccess: add copy_struct_to_user helper")
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Francesco Ruggeri <fruggeri@arista.com>
Cc: Salam Noureddine <noureddine@arista.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Michal Luczaj <mhal@rbox.co>
Cc: David Wei <dw@davidwei.uk>
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Kuniyuki Iwashima <kuniyu@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Christian Brauner <brauner@kernel.org>
CC: Kees Cook <keescook@chromium.org>
Cc: netdev@vger.kernel.org
Cc: linux-bluetooth@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Link: https://patch.msgid.link/71f69442410c1186ed8ce6d5b4b9d4a5a70edbad.1775576651.git.metze@samba.org
Reviewed-by: Aleksa Sarai <aleksa@amutable.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
5 weeks ago gpio: zevio: allow COMPILE_TEST builds
Rosen Penev [Sat, 9 May 2026 00:34:38 +0000 (17:34 -0700)] 
gpio: zevio: allow COMPILE_TEST builds

The ZEVIO GPIO driver uses generic platform, MMIO, and gpiolib interfaces.
Allow it to build with COMPILE_TEST so it gets coverage on non-ARM
platforms.

Drop the ARM-specific IOMEM() casts around the register pointer.  The
pointer is already __iomem, so readl() and writel() can use it directly.

Tested with:
make LLVM=1 ARCH=loongarch drivers/gpio/gpio-zevio.o

Assisted-by: Codex:GPT-5.5
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20260509003438.956051-1-rosenp@gmail.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
5 weeks ago fprobe: Fix unregister_fprobe() to wait for RCU grace period
Masami Hiramatsu (Google) [Thu, 7 May 2026 07:46:29 +0000 (16:46 +0900)] 
fprobe: Fix unregister_fprobe() to wait for RCU grace period

Commit 4346ba1604093 ("fprobe: Rewrite fprobe on function-graph tracer")
changed fprobe to register struct fprobe on an RCU hlist, but it forgot
to wait for an RCU grace period. Thus a use-after-free can occur if the
fprobe is released right after unregistering. This can happen with fprobe
events and the sample module code.

To fix this issue, add synchronize_rcu() in unregister_fprobe().

Note that BPF is OK because fprobe is used as a part of
bpf_kprobe_multi_link. This unregisters its fprobe in
bpf_kprobe_multi_link_release() and it is deallocated via
bpf_kprobe_multi_link_dealloc(), which is invoked from
bpf_link_defer_dealloc_rcu_gp() RCU callback.

For BPF, this also introduces unregister_fprobe_async(), which does
NOT wait for an RCU grace period.

Link: https://lore.kernel.org/all/177813998919.256460.2809243930741138224.stgit@mhiramat.tok.corp.google.com/
Fixes: 4346ba1604093 ("fprobe: Rewrite fprobe on function-graph tracer")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
5 weeks ago thunderbolt: property: Cap recursion depth in __tb_property_parse_dir()
Michael Bommarito [Sun, 10 May 2026 23:16:58 +0000 (19:16 -0400)] 
thunderbolt: property: Cap recursion depth in __tb_property_parse_dir()

A DIRECTORY entry's value field is used as the dir_offset for a
recursive call into __tb_property_parse_dir() with no depth counter.
A crafted peer that chains DIRECTORY entries into a back-reference
loop drives the parser until the kernel stack is exhausted and the
guard page fires.  Any untrusted XDomain peer (cable, dock, in-line
inspector, adjacent host) that reaches the PROPERTIES_REQUEST
control-plane exchange can trigger this without authentication.

Thread a depth counter through tb_property_parse() and
__tb_property_parse_dir(), and reject blocks that exceed
TB_PROPERTY_MAX_DEPTH = 8.  That is comfortably larger than any
observed legitimate XDomain layout.
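
The depth-counter idea can be sketched on a toy block format. This is an
illustration only: parse_dir_model() and the "block[off] holds the child
offset, 0 means leaf" layout are hypothetical, and whether the real check
uses > or >= against the limit is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_DEPTH 8 /* mirrors TB_PROPERTY_MAX_DEPTH = 8 from the text */

/*
 * Toy block: block[off] is the offset of a child directory, 0 marks a leaf.
 * Returns 0 on success, -1 if the block is rejected.
 */
int parse_dir_model(const uint32_t *block, size_t block_len,
		    size_t off, unsigned int depth)
{
	if (depth > MODEL_MAX_DEPTH)
		return -1;	/* too deep: likely a back-reference loop */
	if (off >= block_len)
		return -1;	/* offset outside the block */
	if (block[off] == 0)
		return 0;	/* leaf directory, nothing more to walk */

	/* Recurse with the counter threaded through every call. */
	return parse_dir_model(block, block_len, block[off], depth + 1);
}
```

A block whose entries point at each other in a cycle is rejected after at
most nine calls instead of recursing until the stack runs out.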

Operators who do not need XDomain host-to-host discovery can disable
the path entirely with thunderbolt.xdomain=0 on the kernel command
line.

Fixes: cdae7c07e3e3 ("thunderbolt: Add support for XDomain properties")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-6
Assisted-by: Codex:gpt-5-4
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
5 weeks ago thunderbolt: property: Reject dir_len < 4 to prevent size_t underflow
Michael Bommarito [Sun, 10 May 2026 23:16:57 +0000 (19:16 -0400)] 
thunderbolt: property: Reject dir_len < 4 to prevent size_t underflow

On the non-root path, __tb_property_parse_dir() takes dir_len from
entry->length (u16 widened to size_t).  Two distinct OOB conditions
follow when entry->length < 4:

1. The non-root path begins with kmemdup(&block[dir_offset],
   sizeof(*dir->uuid), ...) which always reads 4 dwords from
   dir_offset.  tb_property_entry_valid() only enforces
   dir_offset + entry->length <= block_len, so a crafted entry
   with dir_offset close to the end of the property block and
   entry->length in 0..3 passes that gate but lets the UUID copy
   run off the block (e.g. dir_offset = 497, dir_len = 3 in a
   500-dword block reads block[497..501]).

2. After the kmemdup, content_len = dir_len - 4 underflows size_t
   to ~SIZE_MAX, nentries becomes SIZE_MAX / 4, and the entry
   walk runs OOB on each iteration until an entry fails
   validation or the kernel oopses on an unmapped page.

Reject dir_len < 4 on the non-root path *before* the UUID kmemdup,
which closes both holes.
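
The underflow in point 2 is plain unsigned arithmetic and can be reproduced
in userspace. The helper names below are hypothetical; the divide by 4
follows the "nentries becomes SIZE_MAX / 4" statement above.

```c
#include <stddef.h>
#include <stdint.h>

/* Unchecked form: dir_len - 4 wraps around for dir_len in 0..3. */
size_t content_len_unchecked(uint16_t entry_length)
{
	size_t dir_len = entry_length;	/* u16 widened to size_t */

	return dir_len - 4;
}

/* Checked form: reject short directories before doing any arithmetic. */
long nentries_checked(uint16_t entry_length)
{
	size_t dir_len = entry_length;

	if (dir_len < 4)
		return -1;	/* too short to even hold the 4-dword UUID */

	return (long)((dir_len - 4) / 4);
}
```

For entry_length = 3 the unchecked helper yields SIZE_MAX, while the checked
one returns -1 before the subtraction happens.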

Also move INIT_LIST_HEAD(&dir->properties) up to immediately after
the dir allocation so the new error-return path (and the existing
uuid-alloc failure path) calling tb_property_free_dir() sees a
walkable list rather than the zero-initialized NULL next/prev that
list_for_each_entry_safe() would oops on.

Fixes: cdae7c07e3e3 ("thunderbolt: Add support for XDomain properties")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-6
Assisted-by: Codex:gpt-5-4
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
5 weeks ago thunderbolt: property: Reject u32 wrap in tb_property_entry_valid()
Michael Bommarito [Sun, 10 May 2026 23:16:56 +0000 (19:16 -0400)] 
thunderbolt: property: Reject u32 wrap in tb_property_entry_valid()

entry->value is u32 and entry->length is u16; the sum is performed in
u32 and wraps.  A malicious XDomain peer can pick
value = 0xffffff00, length = 0x100 so the sum 0x100000000 wraps to 0
and passes the > block_len check.  tb_property_parse() then passes
entry->value to parse_dwdata() as a dword offset into the property
block, reading attacker-directed memory far past the allocation.

For TEXT-typed entries with the "deviceid" or "vendorid" keys this
lands in xd->device_name / xd->vendor_name and is readable back via
the per-XDomain device_name / vendor_name sysfs attributes; the leak
is NUL-bounded (kstrdup() stops at the first zero byte) and
untargeted (the attacker picks a delta, not an absolute address).
DATA-typed entries are parsed into property->value.data but not
generically surfaced to userspace.

Use check_add_overflow() so a wrapped sum is rejected.
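
A userspace sketch of the wrap and of an overflow-checked sum. It uses the
GCC/Clang builtin __builtin_add_overflow() as a stand-in for the kernel's
check_add_overflow(); the function names are hypothetical and the exact
comparison in the real validator is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrapping form: the u32 sum can wrap to a small value and pass the check. */
bool entry_valid_wrapping(uint32_t value, uint16_t length, uint32_t block_len)
{
	uint32_t end = value + length;	/* modulo 2^32 */

	return end <= block_len;
}

/* Checked form: a sum that does not fit in u32 is rejected outright. */
bool entry_valid_checked(uint32_t value, uint16_t length, uint32_t block_len)
{
	uint32_t end;

	if (__builtin_add_overflow(value, (uint32_t)length, &end))
		return false;

	return end <= block_len;
}
```

With value = 0xffffff00 and length = 0x100 the wrapping form computes 0 and
accepts the entry; the checked form rejects it.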

Fixes: cdae7c07e3e3 ("thunderbolt: Add support for XDomain properties")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-6
Assisted-by: Codex:gpt-5-4
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
5 weeks ago bpf: fix crash in bpf_[set|remove]_dentry_xattr for negative dentries
Matt Bobrowski [Thu, 30 Apr 2026 07:38:36 +0000 (07:38 +0000)] 
bpf: fix crash in bpf_[set|remove]_dentry_xattr for negative dentries

bpf_set_dentry_xattr and bpf_remove_dentry_xattr BPF kfuncs attempt to
lock the inode of the supplied dentry without checking if it is
NULL. If a negative dentry is passed (e.g. from
security_inode_create), d_inode(dentry) returns NULL, and
inode_lock(inode) will cause a NULL pointer dereference.

Trivially fix this by adding a NULL check for inode before attempting
to lock it, returning -EINVAL if it is NULL.

Additionally, drop WARN_ON(!inode) in bpf_xattr_read_permission() and
bpf_xattr_write_permission(). These warnings could be triggered by
passing a negative dentry to bpf_get_dentry_xattr() or the _locked
variants of the xattr kfuncs, potentially causing a Denial of Service
on systems with panic_on_warn enabled. Instead, simply return -EINVAL.

Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
Closes: https://lore.kernel.org/bpf/1587cbf4-1293-4e25-ad24-c970836a1686@std.uestc.edu.cn/
Fixes: 56467292794b ("bpf: fs/xattr: Add BPF kfuncs to set and remove xattrs")
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://patch.msgid.link/20260430073836.2894001-1-mattbobrowski@google.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
5 weeks ago Merge patch series "cleanup block-style layouts exports"
Christian Brauner [Mon, 11 May 2026 09:11:55 +0000 (11:11 +0200)] 
Merge patch series "cleanup block-style layouts exports"

Chuck Lever <cel@kernel.org> says:

This series cleans up the exportfs support for block-style layouts that
provide direct block device access.  This is preparation for supporting
exportfs of more than a single device per file system.

* patches from https://patch.msgid.link/20260423181854.743150-1-cel@kernel.org:
  exportfs,nfsd: rework checking for layout-based block device access support
  exportfs: don't pass struct iattr to ->commit_blocks
  exportfs: split out the ops for layout-based block device access
  nfsd/blocklayout: always ignore loca_time_modify

Link: https://patch.msgid.link/20260423181854.743150-1-cel@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
5 weeks ago exportfs,nfsd: rework checking for layout-based block device access support
Christoph Hellwig [Thu, 23 Apr 2026 18:18:54 +0000 (14:18 -0400)] 
exportfs,nfsd: rework checking for layout-based block device access support

Currently NFSD hard-codes the check for block-style layout support.
Lift the checks into a file system helper and provide an exportfs-level
helper to implement the typical checks.

This prepares for supporting block layout export of multiple devices
per file system.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://patch.msgid.link/20260423181854.743150-5-cel@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
5 weeks ago exportfs: don't pass struct iattr to ->commit_blocks
Christoph Hellwig [Thu, 23 Apr 2026 18:18:53 +0000 (14:18 -0400)] 
exportfs: don't pass struct iattr to ->commit_blocks

The only thing ->commit_blocks really needs is the new size, with a magic
-1 placeholder 0 for "do not change the size" because it only ever
extends the size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://patch.msgid.link/20260423181854.743150-4-cel@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
5 weeks ago exportfs: split out the ops for layout-based block device access
Christoph Hellwig [Thu, 23 Apr 2026 18:18:52 +0000 (14:18 -0400)] 
exportfs: split out the ops for layout-based block device access

The support to grant layouts for direct block device access works
at a very different layer than the rest of exports.  Split the methods
for it into a separate struct, and move that into a separate header
to better split things out.  The pointer to the new operation vector
is kept in export_operations to avoid bloating the super_block.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://patch.msgid.link/20260423181854.743150-3-cel@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
5 weeks ago nfsd/blocklayout: always ignore loca_time_modify
Christoph Hellwig [Thu, 23 Apr 2026 18:18:51 +0000 (14:18 -0400)] 
nfsd/blocklayout: always ignore loca_time_modify

RFC 8881 Section 18.42 makes it clear that the client provided timestamp
is a "may" condition, and clients that want to force a specific timestamp
should send a separate SETATTR in the compound.

Since commit b82f92d5dd1a ("fs: have setattr_copy handle multigrain
timestamps appropriately") the ia_mtime value is ignored by file
systems using multi-grain timestamps like XFS, which is the only
file system supporting blocklayout exports right now, so make that
explicit in NFSD as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://patch.msgid.link/20260423181854.743150-2-cel@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
5 weeks ago selftests/pid_namespace: compute pid_max test limits dynamically
Bjoern Doebel [Wed, 22 Apr 2026 20:11:51 +0000 (20:11 +0000)] 
selftests/pid_namespace: compute pid_max test limits dynamically

The pid_max kselftest hardcodes pid_max values of 400 and 500, but the
kernel enforces a minimum of PIDS_PER_CPU_MIN * num_possible_cpus().
On machines with many possible CPUs (e.g. nr_cpu_ids=128 yields a
minimum of 1024), writing 400 or 500 to /proc/sys/kernel/pid_max
returns EINVAL and all three tests fail.

Compute these limits the same way as the kernel does and set outer_limit
and inner_limit dynamically based on the result. Original test semantics
are preserved (outer < inner, nested namespace capped by parent).
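
The minimum can be sketched as below. The per-CPU factor of 8 follows from
the example above (128 CPUs yield 1024); the floor of 301 (reserved PIDs
plus one) is an assumption about the kernel's baseline and is not stated in
this message. pid_max_min_model() and both macros are hypothetical names.

```c
/* Assumed constants; see the note above about where they come from. */
#define MODEL_PIDS_PER_CPU_MIN	8
#define MODEL_RESERVED_PIDS	300

/* Smallest pid_max the kernel is expected to accept for a given CPU count. */
int pid_max_min_model(int possible_cpus)
{
	int per_cpu = MODEL_PIDS_PER_CPU_MIN * possible_cpus;
	int floor = MODEL_RESERVED_PIDS + 1;

	return per_cpu > floor ? per_cpu : floor;
}
```

On a 128-CPU machine this gives 1024, which is why the previously hardcoded
400 and 500 were rejected with EINVAL.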

Signed-off-by: Bjoern Doebel <doebel@amazon.com>
Link: https://patch.msgid.link/20260422201151.3830506-1-doebel@amazon.com
Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Assisted-by: Kiro:claude-opus-4.6
Signed-off-by: Christian Brauner <brauner@kernel.org>
5 weeks ago sched_ext: Clear ops->priv on scx_alloc_and_add_sched() error paths
Andrea Righi [Mon, 11 May 2026 08:31:30 +0000 (10:31 +0200)] 
sched_ext: Clear ops->priv on scx_alloc_and_add_sched() error paths

scx_alloc_and_add_sched() can fail after @sch has been assigned to
ops->priv. In those cases @sch is torn down (either via kfree() through
the err_free_* chain or via kobject_put() -> scx_kobj_release() -> RCU
work), but @ops->priv is left pointing at the about-to-be-freed pointer.

With the recent -EBUSY gate in scx_root_enable_workfn() and
scx_sub_enable_workfn() that rejects an attach when @ops->priv is still
non-NULL, see commit bbf30b383cf6 ("sched_ext: Fix ops->priv clobber on
concurrent attach/detach"), a dangling @ops->priv permanently locks the
kdata out: every future attach attempt sees a stale binding and returns
-EBUSY even though no scheduler is actually attached.

Clear @ops->priv on the post-assign failure paths so that the kdata
returns to its pre-attach state when the function returns ERR_PTR().

Fixes: bbf30b383cf6 ("sched_ext: Fix ops->priv clobber on concurrent attach/detach")
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
5 weeks ago ceph: put folios not suitable for writeback
Hristo Venev [Mon, 4 May 2026 15:54:45 +0000 (18:54 +0300)] 
ceph: put folios not suitable for writeback

The batch holds references to the folios (see `filemap_get_folios`,
`folio_batch_release`), so we need to `folio_put` the folios we remove.

Tested on v6.18.

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/74156
Signed-off-by: Hristo Venev <hristo@venev.name>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
5 weeks ago ceph: add ceph_has_realms_with_quotas() check to ceph_quota_update_statfs()
Viacheslav Dubeyko [Thu, 9 Apr 2026 18:33:23 +0000 (11:33 -0700)] 
ceph: add ceph_has_realms_with_quotas() check to ceph_quota_update_statfs()

When MDS rejects a session, remove_session_caps() ->
__ceph_remove_cap() -> ceph_change_snap_realm() clears
i_snap_realm for every inode that loses its last cap.
The realm is restored once caps are re-granted after
reconnect. This is not a real error, so this patch changes
pr_err_ratelimited_client() to doutc().

Each of the quota methods ceph_quota_is_max_files_exceeded(),
ceph_quota_is_max_bytes_exceeded(), and
ceph_quota_is_max_bytes_approaching() calls the
ceph_has_realms_with_quotas() check. This patch adds
the missing ceph_has_realms_with_quotas() call to
ceph_quota_update_statfs().

[ idryomov: add braces around both arms of multiline ifs ]

Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Reviewed-by: Alex Markuze <amarkuze@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
5 weeks ago libceph: Fix potential out-of-bounds access in __ceph_x_decrypt()
Raphael Zimmer [Tue, 28 Apr 2026 12:15:46 +0000 (14:15 +0200)] 
libceph: Fix potential out-of-bounds access in __ceph_x_decrypt()

In __ceph_x_decrypt(), a part of the buffer p is interpreted as a
ceph_x_encrypt_header, and the magic field of this struct is accessed.
This happens without any guarantee that the buffer is large enough to
hold this struct. The function parameter ciphertext_len represents the
length of the ciphertext to decrypt and is guaranteed to be at most the
remaining size of the allocated buffer p. However, this value is not
necessarily greater than sizeof(ceph_x_encrypt_header). E.g., a message
frame of type FRAME_TAG_AUTH_REPLY_MORE that is just long enough to hold
the ciphertext at its end, with a ciphertext_len of 8 or less, can
trigger an out-of-bounds memory access when accessing hdr->magic.

This patch fixes the issue by adding a check to ensure that the
decrypted plaintext in the buffer is large enough to represent at least
the ceph_x_encrypt_header.
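
A sketch of such a length check in userspace. The header layout (one
version byte followed by a 64-bit magic, packed to 9 bytes) is an
assumption chosen to match the "ciphertext_len of 8 or less" case above;
struct enc_header_model, check_header_model() and the -EINVAL return value
are hypothetical, and byte-order handling is omitted.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed 9-byte header layout; a model, not the libceph definition. */
struct enc_header_model {
	uint8_t struct_v;
	uint64_t magic;
} __attribute__((packed));

/* Returns the payload length after the header, or -EINVAL if rejected. */
int check_header_model(const unsigned char *buf, size_t plaintext_len,
		       uint64_t expected_magic)
{
	struct enc_header_model hdr;

	/* Bounds check first: never read the header from a short buffer. */
	if (plaintext_len < sizeof(hdr))
		return -EINVAL;

	memcpy(&hdr, buf, sizeof(hdr));
	if (hdr.magic != expected_magic)
		return -EINVAL;

	return (int)(plaintext_len - sizeof(hdr));
}
```

The order matters: the magic comparison is only reached once the buffer is
known to hold a full header.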

Cc: stable@vger.kernel.org
Signed-off-by: Raphael Zimmer <raphael.zimmer@tu-ilmenau.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
5 weeks ago ceph: fix BUG_ON in __ceph_build_xattrs_blob() due to stale blob size
Viacheslav Dubeyko [Thu, 9 Apr 2026 19:43:40 +0000 (12:43 -0700)] 
ceph: fix BUG_ON in __ceph_build_xattrs_blob() due to stale blob size

The generic/642 test-case can reproduce the kernel crash:

[40243.605254] ------------[ cut here ]------------
[40243.605956] kernel BUG at fs/ceph/xattr.c:918!
[40243.607142] Oops: invalid opcode: 0000 [#1] SMP PTI
[40243.608067] CPU: 7 UID: 0 PID: 498762 Comm: kworker/7:1 Not tainted 7.0.0-rc7+ #3 PREEMPT(full)
[40243.609700] Hardware name: QEMU Ubuntu 25.10 PC v2 (i440FX + PIIX, + 10.1 machine, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[40243.611820] Workqueue: ceph-msgr ceph_con_workfn
[40243.612715] RIP: 0010:__ceph_build_xattrs_blob+0x1b8/0x1e0
[40243.613731] Code: 0f 84 82 fe ff ff e9 cf 8e 56 ff 48 8d 65 e8 31 c0 5b 41 5c 41 5d 5d 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9 c3 cc cc cc cc <0f> 0b 4c 8b 62 08 41 8b 85 24 07 00 00 49 83 c4 04 41 89 44 24 fc
[40243.616888] RSP: 0018:ffffcc80c4d4b688 EFLAGS: 00010287
[40243.617773] RAX: 0000000000010026 RBX: 0000000000000001 RCX: 0000000000000000
[40243.618928] RDX: ffff8a773798dee0 RSI: 0000000000000000 RDI: 0000000000000000
[40243.620158] RBP: ffffcc80c4d4b6a0 R08: 0000000000000000 R09: 0000000000000000
[40243.621573] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a75f3b58000
[40243.622907] R13: ffff8a75f3b58000 R14: 0000000000000080 R15: 000000000000bffd
[40243.624054] FS:  0000000000000000(0000) GS:ffff8a787d1b4000(0000) knlGS:0000000000000000
[40243.625331] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[40243.626269] CR2: 000072f390b623c0 CR3: 000000011c02a003 CR4: 0000000000372ef0
[40243.627408] Call Trace:
[40243.627839]  <TASK>
[40243.628188]  __prep_cap+0x3fd/0x4a0
[40243.628789]  ? do_raw_spin_unlock+0x4e/0xe0
[40243.629474]  ceph_check_caps+0x46a/0xc80
[40243.630094]  ? __lock_acquire+0x4a2/0x2650
[40243.630773]  ? find_held_lock+0x31/0x90
[40243.631347]  ? handle_cap_grant+0x79f/0x1060
[40243.632068]  ? lock_release+0xd9/0x300
[40243.632696]  ? __mutex_unlock_slowpath+0x3e/0x340
[40243.633429]  ? lock_release+0xd9/0x300
[40243.634052]  handle_cap_grant+0xcf6/0x1060
[40243.634745]  ceph_handle_caps+0x122b/0x2110
[40243.635415]  mds_dispatch+0x5bd/0x2160
[40243.636034]  ? ceph_con_process_message+0x65/0x190
[40243.636828]  ? lock_release+0xd9/0x300
[40243.637431]  ceph_con_process_message+0x7a/0x190
[40243.638184]  ? kfree+0x311/0x4f0
[40243.638749]  ? kfree+0x311/0x4f0
[40243.639268]  process_message+0x16/0x1a0
[40243.639915]  ? sg_free_table+0x39/0x90
[40243.640572]  ceph_con_v2_try_read+0xf58/0x2120
[40243.641255]  ? lock_acquire+0xc8/0x300
[40243.641863]  ceph_con_workfn+0x151/0x820
[40243.642493]  process_one_work+0x22f/0x630
[40243.643093]  ? process_one_work+0x254/0x630
[40243.643770]  worker_thread+0x1e2/0x400
[40243.644332]  ? __pfx_worker_thread+0x10/0x10
[40243.645020]  kthread+0x109/0x140
[40243.645560]  ? __pfx_kthread+0x10/0x10
[40243.646125]  ret_from_fork+0x3f8/0x480
[40243.646752]  ? __pfx_kthread+0x10/0x10
[40243.647316]  ? __pfx_kthread+0x10/0x10
[40243.647919]  ret_from_fork_asm+0x1a/0x30
[40243.648556]  </TASK>
[40243.648902] Modules linked in: overlay hctr2 libpolyval chacha libchacha adiantum libnh libpoly1305 essiv intel_rapl_msr intel_rapl_common intel_uncore_frequency_common skx_edac_common nfit kvm_intel kvm irqbypass joydev ghash_clmulni_intel aesni_intel rapl input_leds mac_hid psmouse vga16fb serio_raw vgastate floppy i2c_piix4 pata_acpi bochs qemu_fw_cfg i2c_smbus sch_fq_codel rbd dm_crypt msr parport_pc ppdev lp parport efi_pstore
[40243.654766] ---[ end trace 0000000000000000 ]---

Commit d93231a6bc8a ("ceph: prevent a client from exceeding the MDS
maximum xattr size") moved the required_blob_size computation to before
the __build_xattrs() call, introducing a race.

__build_xattrs() releases and reacquires i_ceph_lock during execution.
In that window, handle_cap_grant() may update i_xattrs.blob with a
newer MDS-provided blob and bump i_xattrs.version.  When
__build_xattrs() detects that index_version < version, it destroys and
rebuilds the entire xattr rb-tree from the new blob, potentially
increasing count, names_size, and vals_size.

The prealloc_blob size check that follows still uses the stale
required_blob_size computed before the rebuild, so it passes even when
prealloc_blob is too small for the now-larger tree. After __set_xattr()
adds one more xattr on top, __ceph_build_xattrs_blob() is called from
the cap flush path and hits:

    BUG_ON(need > ci->i_xattrs.prealloc_blob->alloc_len);

Fix this by recomputing required_blob_size after __build_xattrs()
returns, using the current tree state. Also re-validate against
m_max_xattr_size to fall back to the sync path if the rebuilt tree now
exceeds the MDS limit.

Cc: stable@vger.kernel.org
Fixes: d93231a6bc8a ("ceph: prevent a client from exceeding the MDS maximum xattr size")
Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Reviewed-by: Alex Markuze <amarkuze@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
5 weeks ago ceph: fix a buffer leak in __ceph_setxattr()
Viacheslav Dubeyko [Thu, 9 Apr 2026 19:26:02 +0000 (12:26 -0700)] 
ceph: fix a buffer leak in __ceph_setxattr()

The old_blob in __ceph_setxattr() can hold the
ci->i_xattrs.prealloc_blob value during the retry.
However, ceph_buffer_put() is never called
for the old_blob object. This patch fixes the
buffer leak.

Cc: stable@vger.kernel.org
Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Reviewed-by: Alex Markuze <amarkuze@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
5 weeks ago libceph: Fix unnecessarily high ceph_decode_need() for uniform bucket
Raphael Zimmer [Fri, 24 Apr 2026 13:37:37 +0000 (15:37 +0200)] 
libceph: Fix unnecessarily high ceph_decode_need() for uniform bucket

In crush_decode_uniform_bucket(), the item_weight field of the bucket
is set. This is a single field of type u32 since the uniform bucket uses
the same weight for all items. The value in ceph_decode_need() is set to
(1+b->h.size) * sizeof(u32), which is higher than actually needed.

This patch removes the call to ceph_decode_need() with the unnecessarily
high value and switches the subsequent operation from ceph_decode_32()
to ceph_decode_32_safe(), which already includes the correct bounds
check.

Signed-off-by: Raphael Zimmer <raphael.zimmer@tu-ilmenau.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
5 weeks ago libceph: Fix potential out-of-bounds access in crush_decode()
Raphael Zimmer [Wed, 22 Apr 2026 08:47:13 +0000 (10:47 +0200)] 
libceph: Fix potential out-of-bounds access in crush_decode()

A message of type CEPH_MSG_OSD_MAP containing a crush map with at least
one bucket has two fields holding the bucket algorithm. If the values
in these two fields differ, an out-of-bounds access can occur. This is
the case because the first algorithm field (alg) is used to allocate
the correct amount of memory for a bucket of this type, while the second
algorithm field inside the bucket (b->alg) is used in the subsequent
processing.

This patch fixes the issue by adding a check that compares alg and
b->alg and aborts the processing in case they differ. Furthermore,
b->alg is set to 0 in this case, because the destruction of the crush
map also uses this field to determine the bucket type, which can again
result in an out-of-bounds access when trying to free the memory pointed
to by the fields of the bucket. To correctly free the memory allocated
for the bucket in such a case, the corresponding call to kfree is moved
from the algorithm-specific crush_destroy_bucket functions to the
generic crush_destroy_bucket().

Cc: stable@vger.kernel.org
Signed-off-by: Raphael Zimmer <raphael.zimmer@tu-ilmenau.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
5 weeks ago xfrm: ipcomp: Free destination pages on acomp errors
Herbert Xu [Wed, 6 May 2026 13:23:28 +0000 (21:23 +0800)] 
xfrm: ipcomp: Free destination pages on acomp errors

Move the out_free_req label up by a couple of lines so that the
allocated dst SG list gets freed on error as well as success.

Fixes: eb2953d26971 ("xfrm: ipcomp: Use crypto_acomp interface")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Reported-by: Yilin Zhu <zylzyl2333@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
5 weeks ago iommu/vt-d: Avoid NULL pointer dereference or refcount corruption
Zhenzhong Duan [Sat, 9 May 2026 02:43:46 +0000 (10:43 +0800)] 
iommu/vt-d: Avoid NULL pointer dereference or refcount corruption

Commit 60f030f7418d ("iommu/vt-d: Avoid use of NULL after WARN_ON_ONCE")
only partly fixed a NULL pointer dereference in an unlikely situation.

If dev_pasid is not found in the dev_pasids list, it remains NULL.
However, the teardown operations are executed unconditionally, which leads
to a NULL pointer dereference or refcount corruption.

If the domain was never attached to this IOMMU, info will be NULL, which
would cause an immediate dereference when checking --info->refcnt.

Even if info is not NULL, decrementing the refcount without having removed
a valid PASID might unbalance the count. This could lead to premature
dropping of the refcount to 0, potentially causing a use-after-free for the
remaining active devices sharing the domain.

Fix it by returning early if dev_pasid is NULL, before executing the
teardown operations.

Issue found by AI review and suggested by Kevin Tian.
https://sashiko.dev/#/patchset/20260421031347.1408890-1-zhenzhong.duan%40intel.com

Fixes: 60f030f7418d ("iommu/vt-d: Avoid use of NULL after WARN_ON_ONCE")
Cc: stable@vger.kernel.org
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260422033538.95000-1-zhenzhong.duan@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
5 weeks ago iommu/vt-d: Fix oops due to out of scope access
Zhenzhong Duan [Sat, 9 May 2026 02:43:45 +0000 (10:43 +0800)] 
iommu/vt-d: Fix oops due to out of scope access

The oops below triggers when the QEMU process is killed:

  Oops: general protection fault, probably for non-canonical address 0x7fffffff844eaaa7: 0000 [#1] SMP NOPTI
  Call Trace:
   <TASK>
   do_raw_spin_lock+0xaa/0xc0
   _raw_spin_lock_irqsave+0x21/0x40
   domain_remove_dev_pasid+0x52/0x160
   intel_nested_set_dev_pasid+0x1b9/0x1e0
   __iommu_set_group_pasid+0x56/0x120
   pci_dev_reset_iommu_done+0xe3/0x180
   pcie_flr+0x65/0x160
   __pci_reset_function_locked+0x5b/0x120
   vfio_pci_core_close_device+0x63/0xe0 [vfio_pci_core]
   vfio_df_close+0x4f/0xa0
   vfio_df_unbind_iommufd+0x2d/0x60
   vfio_device_fops_release+0x3e/0x40
   __fput+0xe5/0x2c0
   task_work_run+0x58/0xa0
   do_exit+0x2c8/0x600
   do_group_exit+0x2f/0xa0
   get_signal+0x863/0x8c0
   arch_do_signal_or_restart+0x24/0x100
   exit_to_user_mode_loop+0x87/0x380
   do_syscall_64+0x2ff/0x11e0
   entry_SYSCALL_64_after_hwframe+0x76/0x7e

The global static blocked domain is a dummy domain with no corresponding
dmar_domain structure, so accessing memory beyond its iommu_domain
structure easily triggers an oops. Fix it by returning early in
domain_remove_dev_pasid(), as is already done for the identity domain.

Fixes: 7d0c9da6c150 ("iommu/vt-d: Add set_dev_pasid callback for dma domain")
Cc: stable@vger.kernel.org
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260421031347.1408890-1-zhenzhong.duan@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
5 weeks agoiommu/vt-d: Disable DMAR for Intel Q35 IGFX
Naval Alcalá [Sat, 9 May 2026 02:43:44 +0000 (10:43 +0800)] 
iommu/vt-d: Disable DMAR for Intel Q35 IGFX

Intel Q35 integrated graphics (8086:29b2) exhibits broken DMAR
behaviour similar to other G4x/GM45 devices for which DMAR is
already disabled via quirks.

When DMAR is enabled, the system may hard lock up during boot or
early device initialization, requiring a reset.

Add the missing PCI ID to the existing quirk list to disable
DMAR for this device.

Fixes: 1f76249cc3be ("iommu/vt-d: Declare Broadwell igfx dmar support snafu")
Cc: stable@vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=201185
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=216064
Signed-off-by: Naval Alcalá <ari@naval.cat>
Link: https://lore.kernel.org/r/20260410161622.13549-1-ari@naval.cat
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
5 weeks agocgroup/cpuset: Reset DL migration state on can_attach() failure
Guopeng Zhang [Sat, 9 May 2026 10:20:30 +0000 (18:20 +0800)] 
cgroup/cpuset: Reset DL migration state on can_attach() failure

cpuset_can_attach() accumulates temporary SCHED_DEADLINE migration
state in the destination cpuset while walking the taskset.

If a later task_can_attach() or security_task_setscheduler() check
fails, cgroup_migrate_execute() treats cpuset as the failing subsystem
and does not call cpuset_cancel_attach() for it. The partially
accumulated state is then left behind and can be consumed by a later
attach, corrupting cpuset DL task accounting and pending DL bandwidth
accounting.

Reset the pending DL migration state from the common error exit when
ret is non-zero. Successful can_attach() keeps the state for
cpuset_attach() or cpuset_cancel_attach().
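
A minimal sketch of the pattern, assuming simplified stand-in types: the
struct layouts, the attach_allowed flag and the model_* helpers are
illustrative and do not reproduce the real cpuset code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the destination cpuset's pending DL state. */
struct cpuset_model {
	int nr_migrate_dl_tasks;	/* pending DL tasks for this attach */
	uint64_t sum_migrate_dl_bw;	/* pending DL bandwidth for this attach */
};

struct task_model {
	bool is_deadline;
	uint64_t dl_bw;
	bool attach_allowed;	/* stands in for the per-task checks */
};

static void model_reset_migrate_dl_data(struct cpuset_model *cs)
{
	cs->nr_migrate_dl_tasks = 0;
	cs->sum_migrate_dl_bw = 0;
}

static int model_can_attach(struct cpuset_model *cs,
			    const struct task_model *tasks, size_t n)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!tasks[i].attach_allowed) {
			ret = -EPERM;
			goto out;
		}
		if (tasks[i].is_deadline) {
			cs->nr_migrate_dl_tasks++;
			cs->sum_migrate_dl_bw += tasks[i].dl_bw;
		}
	}
out:
	/* On failure no cancel callback runs for us, so clean up here. */
	if (ret)
		model_reset_migrate_dl_data(cs);
	return ret;
}
```

A failing task late in the walk then leaves no partial state behind,
while a successful walk keeps the accumulated values for the next step.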

Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
Cc: stable@vger.kernel.org # v6.10+
Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Chen Ridong <chenridong@huaweicloud.com>
Reviewed-by: Waiman Long <longman@redhat.com>
5 weeks agoiommu: Warn on premature unblock during DMA aliased sibling reset
Nicolin Chen [Sat, 25 Apr 2026 01:15:27 +0000 (18:15 -0700)] 
iommu: Warn on premature unblock during DMA aliased sibling reset

When two aliased siblings are in the same iommu_group, they might share the
same RID. The reset functions don't support this case, though it is unclear
whether there is a real case of having an ATS capable device on a PCI/PCI-X
bus.

Theoretically, however, if two aliased devices are resetting concurrently,
one might be unblocked prematurely in the middle of the reset by the other
sibling who completes the reset first.

This isn't a regression from this series, but it's better to emit a
warning, so we learn whether such a use case is common enough to justify
follow-up patches that cover it.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
5 weeks agoiommu: Fix WARN_ON in __iommu_group_set_domain_nofail() due to reset
Nicolin Chen [Sat, 25 Apr 2026 01:15:26 +0000 (18:15 -0700)] 
iommu: Fix WARN_ON in __iommu_group_set_domain_nofail() due to reset

In __iommu_group_set_domain_internal(), concurrent domain attachments are
rejected when any device in the group is recovering. This is necessary to
fence concurrent attachments to a multi-device group where devices might
share the same RID due to PCI DMA alias quirks, but triggers the WARN_ON in
__iommu_group_set_domain_nofail().

Other IOMMU_SET_DOMAIN_MUST_SUCCEED callers in detach/teardown paths, such
as __iommu_group_set_core_domain and __iommu_release_dma_ownership, should
not be rejected, as the domain would be freed anyway in these nofail paths
while group->domain is still pointing to it. So pci_dev_reset_iommu_done()
could trigger a UAF when re-attaching group->domain.

Honor the IOMMU_SET_DOMAIN_MUST_SUCCEED flag, allowing the callers through
the group->recovery_cnt fence, so as to update the group->domain pointer.
Instead add a gdev->blocked check in the device iteration loop, to prevent
any concurrent per-device detachment.

Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()")
Cc: stable@vger.kernel.org
Closes: https://sashiko.dev/#/patchset/20260407194644.171304-1-nicolinc%40nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
5 weeks agoiommu: Fix ATS invalidation timeouts during __iommu_remove_group_pasid()
Nicolin Chen [Sat, 25 Apr 2026 01:15:25 +0000 (18:15 -0700)] 
iommu: Fix ATS invalidation timeouts during __iommu_remove_group_pasid()

If a device is blocked, its PASID domains are already detached. Repeating
iommu_remove_dev_pasid() is unnecessary and might trigger ATS invalidation
timeouts.

Skip the iommu_remove_dev_pasid() call upon gdev->blocked.

Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()")
Cc: stable@vger.kernel.org
Closes: https://sashiko.dev/#/patchset/20260407194644.171304-1-nicolinc%40nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
5 weeks agoiommu: Fix nested pci_dev_reset_iommu_prepare/done()
Nicolin Chen [Sat, 25 Apr 2026 01:15:24 +0000 (18:15 -0700)] 
iommu: Fix nested pci_dev_reset_iommu_prepare/done()

Shuai found that cxl_reset_bus_function() calls pci_reset_bus_function()
internally while both are calling pci_dev_reset_iommu_prepare/done().

As pci_dev_reset_iommu_prepare() doesn't support re-entry, the inner call
will trigger a WARN_ON and return -EBUSY, resulting in failing the entire
device reset.

On the other hand, removing the outer calls in the PCI callers is unsafe.
As pointed out by Kevin, device-specific quirks like reset_hinic_vf_dev()
execute custom firmware waits after their inner pcie_flr() completes. If
the IOMMU protection relies solely on the inner reset, the IOMMU will be
unblocked prematurely while the device is still resetting.

Instead, fix this by making pci_dev_reset_iommu_prepare/done() reentrant.

Introduce gdev->reset_depth to handle the re-entries on the same device.
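
The idea of a depth counter can be sketched as follows. This is a
simplified userspace model: struct gdev_model and the model_* functions are
hypothetical, and locking as well as the actual domain switching are left
out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct group_device; only what is needed. */
struct gdev_model {
	unsigned int reset_depth;	/* number of nested prepare() calls */
	bool blocked;			/* true while DMA is blocked */
};

/* Block the device on the outermost call only; nested calls just count. */
static int model_reset_prepare(struct gdev_model *gdev)
{
	if (gdev->reset_depth++ == 0)
		gdev->blocked = true;
	return 0;
}

/* Unblock only once the outermost caller is done. */
static void model_reset_done(struct gdev_model *gdev)
{
	assert(gdev->reset_depth > 0);
	if (--gdev->reset_depth == 0)
		gdev->blocked = false;
}
```

With this shape an inner prepare/done pair neither fails nor unblocks the
device while the outer reset is still in progress.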

Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()")
Cc: stable@vger.kernel.org
Reported-by: Shuai Xue <xueshuai@linux.alibaba.com>
Closes: https://lore.kernel.org/all/absKsk7qQOwzhpzv@Asurada-Nvidia/
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
5 weeks agoiommu: Fix pasid attach in pci_dev_reset_iommu_prepare/done()
Nicolin Chen [Sat, 25 Apr 2026 01:15:23 +0000 (18:15 -0700)] 
iommu: Fix pasid attach in pci_dev_reset_iommu_prepare/done()

Now that the helpers handle per-gdev resets, replace
__iommu_set_group_pasid() with set_dev_pasid() in
pci_dev_reset_iommu_done().

Also add a max_pasids check, as other callers do.

Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()")
Cc: stable@vger.kernel.org
Reported-by: Shuai Xue <xueshuai@linux.alibaba.com>
Closes: https://lore.kernel.org/all/ad858513-09fc-455e-bbc5-fe38a225cc78@linux.alibaba.com/
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
5 weeks agoiommu: Replace per-group resetting_domain with per-gdev blocked flag
Nicolin Chen [Sat, 25 Apr 2026 01:15:22 +0000 (18:15 -0700)] 
iommu: Replace per-group resetting_domain with per-gdev blocked flag

The core tracks device resetting states with a per-group resetting_domain,
while a reset is actually per group-device. Such a mismatch can cause
confusion and make per-gdev handling requirements hard to untangle.

Shuai found that cxl_reset_bus_function() calls pci_reset_bus_function()
internally while both are calling pci_dev_reset_iommu_prepare/done(). And
the solution requires the core to track at the group_device level as well.

Introduce a 'blocked' flag to struct group_device, to allow a multi-device
group to isolate concurrent device resets independently.

As the reset routine is per gdev, it cannot clear group->resetting_domain
without iterating over the device list to ensure no other device is being
reset. Simplify it by replacing the resetting_domain with a 'recovery_cnt'
in the struct iommu_group.

No functional change, but this is required to apply the following bug fixes.

Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()")
Cc: stable@vger.kernel.org
Reported-by: Shuai Xue <xueshuai@linux.alibaba.com>
Closes: https://lore.kernel.org/all/absKsk7qQOwzhpzv@Asurada-Nvidia/
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
5 weeks agoiommu: Fix kdocs of pci_dev_reset_iommu_done()
Nicolin Chen [Sat, 25 Apr 2026 01:15:21 +0000 (18:15 -0700)] 
iommu: Fix kdocs of pci_dev_reset_iommu_done()

Remove the duplicated word. No functional change.

Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()")
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
5 weeks agoiommu: Fix NULL group->domain dereference in pci_dev_reset_iommu_done()
Nicolin Chen [Sat, 25 Apr 2026 01:15:20 +0000 (18:15 -0700)] 
iommu: Fix NULL group->domain dereference in pci_dev_reset_iommu_done()

A local sashiko review pointed out that group->domain could be NULL when
a default domain fails to allocate during the first probe, which can crash
at domain->ops->attach_dev dereference in __iommu_attach_device() invoked
by pci_dev_reset_iommu_done().

pci_dev_reset_iommu_prepare() is fine as an old_domain pointer can be NULL.

Skip the re-attach in pci_dev_reset_iommu_done() to fix the bug.

Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()")
Cc: stable@vger.kernel.org
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
5 weeks agoiommu/amd: Bounds-check devid in __rlookup_amd_iommu()
Jose Fernandez (Anthropic) [Tue, 21 Apr 2026 19:26:13 +0000 (19:26 +0000)] 
iommu/amd: Bounds-check devid in __rlookup_amd_iommu()

iommu_device_register() walks every device on the PCI bus via
bus_for_each_dev() and calls amd_iommu_probe_device() for each. The
inlined check_device() path computes the device's sbdf, calls
rlookup_amd_iommu() to find the owning IOMMU, and only afterwards
verifies devid <= pci_seg->last_bdf. __rlookup_amd_iommu() indexes
rlookup_table[devid] with no bounds check of its own, so for a PCI
device whose BDF is not described by the IVRS, the lookup reads past
the end of the allocation before the caller's bounds check can run.

This was harmless before commit e874c666b15b ("iommu/amd: Change
rlookup, irq_lookup, and alias to use kvalloc()"): the table was a
zeroed page-order allocation, so the over-read returned NULL and the
caller's NULL check skipped the device. After that commit the table is
a tight kvcalloc() and the over-read returns adjacent slab contents,
which check_device() then dereferences as a struct amd_iommu *,
causing a boot-time GPF.

Seen on Google Compute Engine ct6e VMs, where the virtualized IVRS
describes only the four TPU endpoints 00:04.0-07.0; the gVNIC at
00:08.0 (devid 0x40) indexes 56 bytes past the 456-byte allocation,
into the adjacent kmalloc-512 slab object:

  pci 0000:00:04.0: Adding to iommu group 0
  pci 0000:00:05.0: Adding to iommu group 1
  pci 0000:00:06.0: Adding to iommu group 2
  pci 0000:00:07.0: Adding to iommu group 3
  Oops: general protection fault, probably for non-canonical address 0x3a64695f78746382: 0000 [#1] SMP NOPTI
  CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.22 #1
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/06/2025
  RIP: 0010:amd_iommu_probe_device+0x54/0x3a0
  Call Trace:
   __iommu_probe_device+0x107/0x520
   probe_iommu_group+0x29/0x50
   bus_for_each_dev+0x7e/0xe0
   iommu_device_register+0xc9/0x240
   iommu_go_to_state+0x9c0/0x1c60
   amd_iommu_init+0x14/0x40
   pci_iommu_init+0x16/0x60
   do_one_initcall+0x47/0x2f0

Guard the array access in __rlookup_amd_iommu(). With the fix applied
on 6.18.22, the gVNIC at 00:08.0 is skipped cleanly and the VM boots.
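
A minimal sketch of such a guard, using hypothetical model types rather
than the driver's real structures. The numbers in the usage note follow
from the report above: a 456-byte table of 8-byte pointers has 57 entries
(devid 0x00 to 0x38), and devid 0x40 would index byte offset 512.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the field names mirror the description only. */
struct iommu_model {
	int id;
};

struct pci_seg_model {
	uint16_t last_bdf;			/* highest devid in the table */
	struct iommu_model **rlookup_table;	/* last_bdf + 1 entries */
};

/* Reject out-of-range devids before touching the table. */
static struct iommu_model *model_rlookup(const struct pci_seg_model *seg,
					 uint16_t devid)
{
	if (devid > seg->last_bdf)
		return NULL;
	return seg->rlookup_table[devid];
}
```

With last_bdf set to 0x38, a lookup for devid 0x40 returns NULL in this
model instead of reading past the end of the table.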

Fixes: e874c666b15b ("iommu/amd: Change rlookup, irq_lookup, and alias to use kvalloc()")
Cc: stable@vger.kernel.org
Reported-by: Ziyuan Chen <zc@anthropic.com>
Tested-by: Ziyuan Chen <zc@anthropic.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Assisted-by: Claude:unspecified
Signed-off-by: Jose Fernandez (Anthropic) <jose.fernandez@linux.dev>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
5 weeks agogpio: add GPIO controller found on Waveshare DSI TOUCH panels
Dmitry Baryshkov [Thu, 7 May 2026 09:01:33 +0000 (12:01 +0300)] 
gpio: add GPIO controller found on Waveshare DSI TOUCH panels

The Waveshare DSI TOUCH family of panels has a separate on-board GPIO
controller, which controls the power supplies to the panel and the
touchscreen and provides reset pins for both. It also provides a simple
PWM controller for the panel backlight. Add support for this GPIO
controller.

Tested-by: Riccardo Mereu <r.mereu@arduino.cc>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://patch.msgid.link/20260507-waveshare-dsi-touch-v5-2-d2ac7ccc22d4@oss.qualcomm.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
5 weeks agodt-bindings: gpio: describe Waveshare GPIO controller
Dmitry Baryshkov [Thu, 7 May 2026 09:01:32 +0000 (12:01 +0300)] 
dt-bindings: gpio: describe Waveshare GPIO controller

The Waveshare DSI TOUCH family of panels has a separate on-board GPIO
controller, which controls the power supplies to the panel and the
touchscreen and provides reset pins for both. It also provides a simple
PWM controller for the panel backlight.

Add bindings for these GPIO controllers. As the overall integration might
not be obvious (and it differs significantly from the bindings used by
the original drivers), provide a complete example with the on-board
regulators and the DSI panel.

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://patch.msgid.link/20260507-waveshare-dsi-touch-v5-1-d2ac7ccc22d4@oss.qualcomm.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
5 weeks agopower: sequencing: print power sequencing device parent in debugfs
Chen-Yu Tsai [Thu, 7 May 2026 05:29:41 +0000 (13:29 +0800)] 
power: sequencing: print power sequencing device parent in debugfs

The debugfs summary currently shows the power sequencing device's name.
This is not really helpful since the device name is always "pwrseq.N".

Also print the parent device's name. This would likely be the device
node name from the device tree, something like "nvme-connector". This
would make it much easier for the developer to associate the summary
with a certain device.

Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Link: https://patch.msgid.link/20260507052943.3133349-1-wenst@chromium.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
5 weeks agoiommu/amd: Remove latent out-of-bounds access in IOMMU debugfs
Eder Zulian [Fri, 10 Apr 2026 12:55:50 +0000 (14:55 +0200)] 
iommu/amd: Remove latent out-of-bounds access in IOMMU debugfs

In iommu_mmio_write() and iommu_capability_write(), the variables
dbg_mmio_offset and dbg_cap_offset are declared as int. However, they
are populated using kstrtou32_from_user(). If a user provides a
sufficiently large value, it can become a negative integer.

Prior to this patch, the AMD IOMMU debugfs implementation was already
protected by different mechanisms.

1. #define OFS_IN_SZ 8 ensures the user string <= 8 bytes, so
   e.g. 0xffffffff isn't a valid input.

  if (cnt > OFS_IN_SZ)
     return -EINVAL;

2. Implicit type promotion in iommu_mmio_write(), dbg_mmio_offset is int
   and iommu->mmio_phys_end is u64

  if (dbg_mmio_offset > iommu->mmio_phys_end - sizeof(u64))
      return -EINVAL;

3. The show handlers would currently catch the negative number and
   refuse to perform the read.

Replace kstrtou32_from_user() with kstrtos32_from_user() to parse the
input, and check for negative values to explicitly prevent out-of-bounds
memory accesses directly in iommu_mmio_write() and
iommu_capability_write().
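
The signed-parse-then-range-check idea can be modelled in userspace. This
sketch uses strtoll() instead of the kernel helper, and
model_parse_offset() with its max_offset parameter is hypothetical, not
the debugfs code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a user-supplied offset as a signed 32-bit value, then reject
 * negative values and anything above 'max_offset'.
 * Returns 0 on success, -EINVAL or -ERANGE on failure.
 */
static int model_parse_offset(const char *s, int32_t max_offset,
			      int32_t *out)
{
	char *end;
	long long v;

	errno = 0;
	v = strtoll(s, &end, 0);
	if (errno || end == s || *end != '\0')
		return -EINVAL;
	if (v < INT32_MIN || v > INT32_MAX)
		return -ERANGE;		/* does not fit a signed 32-bit int */
	if (v < 0 || v > max_offset)
		return -EINVAL;		/* explicit sign and range check */
	*out = (int32_t)v;
	return 0;
}
```

Parsing as signed means a value such as 0xffffffff is rejected at parse
time rather than silently wrapping into a negative int.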

Signed-off-by: Eder Zulian <ezulian@redhat.com>
Fixes: 7a4ee419e8c1 ("iommu/amd: Add debugfs support to dump IOMMU MMIO registers")
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
5 weeks agosched_ext: Fix ops->priv clobber on concurrent attach/detach
Andrea Righi [Mon, 11 May 2026 06:18:12 +0000 (08:18 +0200)] 
sched_ext: Fix ops->priv clobber on concurrent attach/detach

Under heavy concurrent attach/detach operations, scx_claim_exit() can
trigger a NULL pointer dereference. This can be reproduced by running the
reload_loop kselftest inside a virtme-ng session:

 $ vng -v -- ./tools/testing/selftests/sched_ext/runner -t reload_loop
 ...
 BUG: kernel NULL pointer dereference, address: 0000000000000400
 RIP: 0010:scx_claim_exit+0x3b/0x120
 Call Trace:
  <TASK>
  bpf_scx_unreg+0x45/0xb0
  bpf_struct_ops_map_link_dealloc+0x39/0x50
  bpf_link_release+0x18/0x20
  __fput+0x10b/0x2e0
  __x64_sys_close+0x47/0xa0

The underlying race (diagnosed by Tejun Heo) is a stomp of @ops->priv,
not a missing NULL check:

  T2 unreg(K)                       T1 reg(K)
  -----------                       ---------
  sch = ops->priv = sch_b800
  scx_disable; flush_disable_work
    [scx_root_disable: scx_root=NULL,
     mutex_unlock, state=DISABLED]
                                    mutex_lock; state ok
                                    scx_alloc_and_add_sched:
                                      ops->priv = sch_a800
                                    scx_root = sch_a800; init=0
                                    state=ENABLED; mutex_unlock
    [flush returns]
  RCU_INIT_POINTER(ops->priv, NULL) <-- clobbers sch_a800
  kobject_put(sch_b800)

T1 acquires scx_enable_mutex inside scx_root_disable()'s mutex_unlock
window and starts a fresh attach on the same kdata, assigning sch_a800
to @ops->priv. T2 then continues out of scx_disable()/flush_disable_work
and clobbers @ops->priv to NULL, leaking sch_a800; the bpf_link is gone
but state stays SCX_ENABLED, so all future attaches fail with -EBUSY
permanently. The next bpf_scx_unreg() on that kdata then reads NULL
@ops->priv and dereferences it in scx_claim_exit().

Make @ops->priv the lifecycle binding: in scx_root_enable_workfn() and
scx_sub_enable_workfn(), after the existing state check and still under
scx_enable_mutex, refuse with -EBUSY if @ops->priv is non-NULL. This
rejects an attempt to reuse a kdata that is still bound to a previous
scheduler instance, closing the race without changing the unreg side.
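
The binding check can be sketched in a small userspace model. All names
here are hypothetical, and the caller is assumed to hold the equivalent of
scx_enable_mutex.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the scheduler instance and the ops kdata. */
struct sched_model {
	int id;
};

struct ops_model {
	struct sched_model *priv;	/* non-NULL while bound to an instance */
};

enum model_state { MODEL_DISABLED, MODEL_ENABLED };

/* Caller is assumed to hold the enable mutex. */
static int model_enable(enum model_state *state, struct ops_model *ops,
			struct sched_model *sch)
{
	if (*state != MODEL_DISABLED)
		return -EBUSY;
	/* Still bound to a previous instance: refuse to reuse the kdata. */
	if (ops->priv)
		return -EBUSY;

	ops->priv = sch;
	*state = MODEL_ENABLED;
	return 0;
}
```

A new attach is thus refused until the unregister side has cleared the
binding, so the late NULL store can no longer clobber a fresh instance.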

Fixes: 105dcd005be2 ("sched_ext: Introduce scx_prog_sched()")
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
5 weeks agoplatform/chrome: cros_kbd_led_backlight: Drop CONFIG_MFD_CROS_EC_DEV ifdeffery
Thomas Weißschuh [Sat, 4 Apr 2026 07:55:28 +0000 (09:55 +0200)] 
platform/chrome: cros_kbd_led_backlight: Drop CONFIG_MFD_CROS_EC_DEV ifdeffery

The ifdeffery is unnecessary, as the compiler can already optimize away
all of the mfd-specific code based on the IS_ENABLED() in
keyboard_led_is_mfd_device().

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20260404-cros_kbd_led-cleanup-v1-3-0dc1100d54e3@weissschuh.net
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
5 weeks agoplatform/chrome: cros_kbd_led_backlight: Pass keyboard_led as parameter
Thomas Weißschuh [Sat, 4 Apr 2026 07:55:27 +0000 (09:55 +0200)] 
platform/chrome: cros_kbd_led_backlight: Pass keyboard_led as parameter

Make the code simpler to read by passing the 'struct keyboard_led' as
a parameter to the 'init' callbacks instead of relying on the platform
device driver data.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20260404-cros_kbd_led-cleanup-v1-2-0dc1100d54e3@weissschuh.net
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
5 weeks agoplatform/chrome: cros_kbd_led_backlight: Drop max_brightness from driver data
Thomas Weißschuh [Sat, 4 Apr 2026 07:55:26 +0000 (09:55 +0200)] 
platform/chrome: cros_kbd_led_backlight: Drop max_brightness from driver data

The maximum brightness is always 100. There is no need to read that from
the driver data.

Remove the superfluous driver data.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20260404-cros_kbd_led-cleanup-v1-1-0dc1100d54e3@weissschuh.net
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
5 weeks agoplatform/chrome: Resolve kb_wake_angle visibility race
Tzung-Bi Shih [Tue, 7 Apr 2026 10:26:15 +0000 (10:26 +0000)] 
platform/chrome: Resolve kb_wake_angle visibility race

A race condition exists between the probe of cros-ec-sysfs and
cros-ec-sensorhub.

The `kb_wake_angle` attribute should only be visible if the sensor hub
detects two or more accelerometers.  If cros_ec_sysfs_probe() runs
before cros_ec_sensorhub_register() completes sensor enumeration, the
sysfs attributes are created while `has_kb_wake_angle` is still false,
hiding `kb_wake_angle` incorrectly.

Store the created attribute group pointer in `ec_dev->group`.  When
the sensor hub completes sensor enumeration, it checks for this group
and calls sysfs_update_group() to notify the sysfs core to re-evaluate
attribute visibility.  This ensures the `kb_wake_angle` attribute
visibility is correctly updated regardless of the driver probe order.

Co-developed-by: Gwendal Grignou <gwendal@chromium.org>
Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
Link: https://lore.kernel.org/r/20260407102615.1605317-1-tzungbi@kernel.org
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
5 weeks agoselftests/sched_ext: Fix build error in dequeue selftest
Andrea Righi [Sun, 10 May 2026 17:52:11 +0000 (19:52 +0200)] 
selftests/sched_ext: Fix build error in dequeue selftest

Building the dequeue selftest with newer compilers (e.g., gcc 16)
triggers the following error:

 dequeue.c:28:22: error: variable 'sum' set but not used

The 'volatile' qualifier prevents the writes from being optimized away,
but does not silence the warning: the variable 'sum' is indeed only
written and never read.

Consume 'sum' via an empty asm() with a register input constraint. This
forces the compiler to keep the accumulated value (preserving the CPU
stress loop) and avoids the build error.
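
A minimal sketch of the technique, assuming GCC or Clang extended asm
syntax. model_spin() is a hypothetical helper, not the selftest's code.

```c
#include <assert.h>

/* Burn CPU in a loop; 'sum' is written but not otherwise read. */
static unsigned long model_spin(unsigned long iters)
{
	volatile unsigned long sum = 0;
	unsigned long i;

	for (i = 0; i < iters; i++)
		sum += i;

	/* Empty asm with a register input: this counts as a use of 'sum'. */
	__asm__ __volatile__("" : : "r"(sum));

	return iters;
}
```

The asm statement emits no instructions of its own; it only makes the
compiler load the value into a register.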

Fixes: 658ad2259b3e ("selftests/sched_ext: Add test to validate ops.dequeue() semantics")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
5 weeks agoselftests/cgroup: Fix string comparison in write_test
Hongfu Li [Mon, 11 May 2026 01:39:57 +0000 (09:39 +0800)] 
selftests/cgroup: Fix string comparison in write_test

Use string comparison (!=) instead of numeric comparison (-ne) for
cpuset values like "0-1".
For example:
$ [[ "0-1" != "2-3" ]] && echo "true" || echo "false"
true
$ [[ "0-1" -ne "2-3" ]] && echo "true" || echo "false"
false

Signed-off-by: Hongfu Li <lihongfu@kylinos.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>
5 weeks agoselftests/cgroup: Fix cg_read_strcmp() empty string comparison
Hongfu Li [Sat, 9 May 2026 08:03:28 +0000 (16:03 +0800)] 
selftests/cgroup: Fix cg_read_strcmp() empty string comparison

cg_read_strcmp() allocated a buffer sized to strlen(expected) + 1,
then passed it to read_text() which calls read(fd, buf, size-1).

When comparing against an empty string (""), strlen("") = 0 gives a
1-byte buffer, and read() is asked to read 0 bytes.  The file content
is never actually read, so strcmp("", buf) always returns 0 regardless
of the real content.  This caused cg_test_proc_killed() to always
report the cgroup as empty immediately, making OOM tests pass without
verifying that processes were killed.
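
The failure mode can be reproduced with a simplified userspace model. The
model_* helpers below are hypothetical imitations of the described
behaviour, not the selftest sources, and they show the bug rather than the
fix.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Read up to size-1 bytes and NUL-terminate, as described above. */
static ssize_t model_read_text(int fd, char *buf, size_t size)
{
	ssize_t len = read(fd, buf, size - 1);

	if (len >= 0)
		buf[len] = '\0';
	return len;
}

/* Buggy pattern: the buffer is sized from the expected string. */
static int model_read_strcmp(int fd, const char *expected)
{
	size_t size = strlen(expected) + 1;
	char *buf = malloc(size);
	int ret;

	if (!buf)
		return -1;
	if (model_read_text(fd, buf, size) < 0) {
		free(buf);
		return -1;
	}
	/* With expected == "", size is 1 and nothing was read at all. */
	ret = strcmp(expected, buf);
	free(buf);
	return ret;
}
```

Fed a descriptor that still holds data, model_read_strcmp(fd, "") reports
a match in this model even though the content is not empty.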

Signed-off-by: Hongfu Li <lihongfu@kylinos.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>
5 weeks agocgroup/dmem: Return -ENOMEM on failed pool preallocation
Guopeng Zhang [Mon, 11 May 2026 01:31:50 +0000 (09:31 +0800)] 
cgroup/dmem: Return -ENOMEM on failed pool preallocation

get_cg_pool_unlocked() handles allocation failures under dmemcg_lock by
dropping the lock, preallocating a pool with GFP_KERNEL, and retrying the
locked lookup and creation path.

If the fallback allocation fails too, pool remains NULL. Since the loop
condition is while (!pool), the function can keep retrying instead of
propagating the allocation failure to the caller.

Set pool to ERR_PTR(-ENOMEM) when the fallback allocation fails so the
loop exits through the existing common return path. The callers already
handle ERR_PTR() from get_cg_pool_unlocked(), so this restores the
expected error path.
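
The loop shape can be sketched in userspace. The model_* helpers imitate
the kernel's ERR_PTR() convention, and the two boolean parameters are
hypothetical knobs for forcing allocation failures; locking is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct pool_model {
	int dummy;
};

/* Userspace imitations of ERR_PTR(), IS_ERR() and PTR_ERR(). */
static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static bool model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}

static long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static struct pool_model *model_get_pool(bool fail_locked, bool fail_fallback)
{
	struct pool_model *pool = NULL, *prealloc = NULL;

	while (!pool) {
		/* "Locked" attempt: use the preallocated pool if present. */
		if (prealloc) {
			pool = prealloc;
			prealloc = NULL;
		} else if (!fail_locked) {
			pool = calloc(1, sizeof(*pool));
		}
		if (pool)
			break;

		/* Fallback allocation outside the lock; may fail as well. */
		if (!fail_fallback)
			prealloc = calloc(1, sizeof(*prealloc));
		if (!prealloc)
			pool = model_err_ptr(-ENOMEM);	/* non-NULL, ends loop */
	}
	return pool;
}
```

Because the error pointer is non-NULL, the while (!pool) condition
terminates and the caller sees -ENOMEM instead of an endless retry.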

Fixes: b168ed458dde ("kernel/cgroup: Add "dmem" memory accounting cgroup")
Cc: stable@vger.kernel.org # v6.14+
Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>
5 weeks agoASoC: sdw_utils: make RT712/RT721 CODEC_MIC be optional
Mark Brown [Mon, 11 May 2026 01:04:02 +0000 (10:04 +0900)] 
ASoC: sdw_utils: make RT712/RT721 CODEC_MIC be optional

Bard Liao <yung-chuan.liao@linux.intel.com> says:

The RT712 and RT721 codec mics are optional and are not used on some
products. Add a quirk to make them optional and skip the codec mic DAI
when it is not present in the DisCo table.

Link: https://patch.msgid.link/20260508093224.1246282-1-yung-chuan.liao@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agoASoC: sdw_utils: Add quirk to ignore RT721 CODEC_MIC
Mac Chiang [Fri, 8 May 2026 09:32:24 +0000 (17:32 +0800)] 
ASoC: sdw_utils: Add quirk to ignore RT721 CODEC_MIC

Add a quirk to skip the CODEC_MIC DAI when it is not present.
This ensures PCH_DMIC is used as the fallback; otherwise,
CODEC_MIC remains the default.

Fixes: 846a8d3cf3ba ("ASoC: Intel: soc-acpi-intel-ptl-match: Add rt721 support")
Signed-off-by: Mac Chiang <mac.chiang@intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://patch.msgid.link/20260508093224.1246282-3-yung-chuan.liao@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agoASoC: sdw_utils: Add quirk to ignore RT712 CODEC_MIC
Mac Chiang [Fri, 8 May 2026 09:32:23 +0000 (17:32 +0800)] 
ASoC: sdw_utils: Add quirk to ignore RT712 CODEC_MIC

Some devices do not use CODEC_MIC but use the host PCH_DMIC
instead. Add a quirk to skip the CODEC_MIC DAI when it is not present
in the DisCo table, ensuring the correct capture device is used.

If CODEC_MIC is present, it continues to be used as default.

Fixes: 9489db97f6f0 ("ASoC: sdw_utils: add SmartMic DAI for RT712 VB")
Signed-off-by: Mac Chiang <mac.chiang@intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://patch.msgid.link/20260508093224.1246282-2-yung-chuan.liao@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agoASoC: Intel: soc-acpi: add LG Gram 16Z90U RT713 + single RT1320 quirk
Jang Pyohwan [Sat, 9 May 2026 08:53:10 +0000 (17:53 +0900)] 
ASoC: Intel: soc-acpi: add LG Gram 16Z90U RT713 + single RT1320 quirk

Add a SoundWire machine table entry for the LG Gram Pro 2026
(16Z90U-KU7BK), which has an unusual configuration:

  sdw:0:1:025d:1320:01   single stereo RT1320 SmartAmp on link 1
  sdw:0:3:025d:0713:01   RT713 jack/headset codec on link 3

Existing rt713-rt1320 boards have two RT1320 amps on different links
("link_mask = BIT(1) | BIT(2) | BIT(3)"). The LG Gram uses a single
stereo RT1320 chip, so the new entry uses "link_mask = BIT(1) |
BIT(3)" with the existing rt1320_1_group2_adr structure, leaving the
two-channel routing to the topology.

The RT713 on this board does not expose a SMART_MIC function in
ACPI, so the .machine_check callback used by the existing entries
(snd_soc_acpi_intel_sdca_is_device_rt712_vb) would reject this
board. Drop machine_check for the new entry; speaker output and
the headset jack do not depend on the SMART_MIC presence check.

The corresponding topology source has been submitted to the SOF
project at https://github.com/thesofproject/sof/pull/10760 . The
generated sof-ptl-rt713-l3-rt1320-l1-2ch.tplg and
nhlt-sof-ptl-rt713-l3-rt1320-l1.bin will follow in linux-firmware
once that lands.

Tested on Ubuntu 26.04 with kernel 7.0.0-15: speaker (RT1320
stereo), headphone jack with auto-routing, headset mic, and the
internal NHLT DMIC array all work via the UCM HiFi profile.

Signed-off-by: Jang Pyohwan <vhgksl@daum.net>
Link: https://patch.msgid.link/20260509175317.DnhjxHczQay7kkp5z6t4lg@vhgksl.daum.net
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agoASoC: soc-acpi-intel-arl-match: add rt712_l0_rt1320_l3 support
Mark Brown [Mon, 11 May 2026 01:02:29 +0000 (10:02 +0900)] 
ASoC: soc-acpi-intel-arl-match: add rt712_l0_rt1320_l3 support

Bard Liao <yung-chuan.liao@linux.intel.com> says:

Add rt712_l0_rt1320_l3 support for ARL.

5 weeks agoASoC: soc-acpi-intel-arl-match: add rt712_l0_rt1320_l3 support
Gary C Wang [Fri, 8 May 2026 10:42:38 +0000 (18:42 +0800)] 
ASoC: soc-acpi-intel-arl-match: add rt712_l0_rt1320_l3 support

Add support for using the rt712 multi-function codec on link 0 and the
rt1320 amplifier on link 3 on ARL platforms.

Signed-off-by: Gary C Wang <gary.c.wang@intel.com>
Co-developed-by: Mac Chiang <mac.chiang@intel.com>
Signed-off-by: Mac Chiang <mac.chiang@intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://patch.msgid.link/20260508104239.1247525-3-yung-chuan.liao@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agoASoC: Intel: soc-acpi-intel-arl-match: Reorder ACPI machine tables
Mac Chiang [Fri, 8 May 2026 10:42:37 +0000 (18:42 +0800)] 
ASoC: Intel: soc-acpi-intel-arl-match: Reorder ACPI machine tables

When the SOF device driver enumerates the machine tables,
it selects the entry with the highest number of matched links in
ascending order.

Align the ordering with commit 08095e20995ad6e3648af7416c90163627fe7e44
("ASoC: Intel: soc-acpi-intel-ptl-match: Sort ACPI link/machine tables").

Signed-off-by: Mac Chiang <mac.chiang@intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://patch.msgid.link/20260508104239.1247525-2-yung-chuan.liao@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: switch to managed controller allocation (part 2/3)
Mark Brown [Mon, 11 May 2026 00:55:56 +0000 (09:55 +0900)] 
spi: switch to managed controller allocation (part 2/3)

Johan Hovold <johan@kernel.org> says:

In preparation for fixing the SPI controller API so that it no longer
drops a reference when deregistering (non-managed) controllers (cf.
[1]), this series converts drivers using non-managed registration to use
managed allocation.

Included is also a related cleanup of a ti-qspi error path.

This second set will be followed by a third set of 12 patches for
drivers using managed registration.

That leaves us with 18 drivers using non-managed allocation, which is
few enough to be able to fix the API in a tree-wide change.

Johan

[1] https://lore.kernel.org/lkml/20260325145319.1132072-1-johan@kernel.org/

Link: https://patch.msgid.link/20260505072909.618363-1-johan@kernel.org
5 weeks agospi: zynq-qspi: switch to managed controller allocation
Johan Hovold [Tue, 5 May 2026 07:29:09 +0000 (09:29 +0200)] 
spi: zynq-qspi: switch to managed controller allocation

Switch to device managed controller allocation to simplify error
handling and to avoid having to take another reference during
deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260505072909.618363-21-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: uniphier: switch to managed controller allocation
Johan Hovold [Tue, 5 May 2026 07:29:08 +0000 (09:29 +0200)] 
spi: uniphier: switch to managed controller allocation

Switch to device managed controller allocation to simplify error
handling and to avoid having to take another reference during
deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260505072909.618363-20-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: ti-qspi: cleanup registration error path
Johan Hovold [Tue, 5 May 2026 07:29:07 +0000 (09:29 +0200)] 
spi: ti-qspi: cleanup registration error path

Add a proper error path for when registration fails so that the probe
tests for errors consistently.

Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260505072909.618363-19-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: ti-qspi: switch to managed controller allocation
Johan Hovold [Tue, 5 May 2026 07:29:06 +0000 (09:29 +0200)] 
spi: ti-qspi: switch to managed controller allocation

Switch to device managed controller allocation to simplify error
handling and to avoid having to take another reference during
deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260505072909.618363-18-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: tegra20-sflash: switch to managed controller allocation
Johan Hovold [Tue, 5 May 2026 07:29:05 +0000 (09:29 +0200)] 
spi: tegra20-sflash: switch to managed controller allocation

Switch to device managed controller allocation to simplify error
handling and to avoid having to take another reference during
deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260505072909.618363-17-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: tegra114: switch to managed controller allocation
Johan Hovold [Tue, 5 May 2026 07:29:04 +0000 (09:29 +0200)] 
spi: tegra114: switch to managed controller allocation

Switch to device managed controller allocation to simplify error
handling and to avoid having to take another reference during
deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260505072909.618363-16-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: synquacer: switch to managed controller allocation
Johan Hovold [Tue, 5 May 2026 07:29:03 +0000 (09:29 +0200)] 
spi: synquacer: switch to managed controller allocation

Switch to device managed controller allocation to simplify error
handling and to avoid having to take another reference during
deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260505072909.618363-15-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: sun6i: switch to managed controller allocation
Johan Hovold [Tue, 5 May 2026 07:29:02 +0000 (09:29 +0200)] 
spi: sun6i: switch to managed controller allocation

Switch to device managed controller allocation to simplify error
handling and to avoid having to take another reference during
deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260505072909.618363-14-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: sun4i: switch to managed controller allocation
Johan Hovold [Tue, 5 May 2026 07:29:01 +0000 (09:29 +0200)] 
spi: sun4i: switch to managed controller allocation

Switch to device managed controller allocation to simplify error
handling and to avoid having to take another reference during
deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260505072909.618363-13-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: st-ssc4: switch to managed controller allocation
Johan Hovold [Tue, 5 May 2026 07:29:00 +0000 (09:29 +0200)] 
spi: st-ssc4: switch to managed controller allocation

Switch to device managed controller allocation to simplify error
handling and to avoid having to take another reference during
deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260505072909.618363-12-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: sprd: switch to managed controller allocation
Johan Hovold [Tue, 5 May 2026 07:28:59 +0000 (09:28 +0200)] 
spi: sprd: switch to managed controller allocation

Switch to device managed controller allocation to simplify error
handling and to avoid having to take another reference during
deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260505072909.618363-11-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: slave-mt27xx: switch to managed controller allocation
Johan Hovold [Tue, 5 May 2026 07:28:58 +0000 (09:28 +0200)] 
spi: slave-mt27xx: switch to managed controller allocation

Switch to device managed controller allocation to simplify error
handling and to avoid having to take another reference during
deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260505072909.618363-10-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: sifive: switch to managed controller allocation
Johan Hovold [Tue, 5 May 2026 07:28:57 +0000 (09:28 +0200)] 
spi: sifive: switch to managed controller allocation

Switch to device managed controller allocation to simplify error
handling and to avoid having to take another reference during
deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260505072909.618363-9-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: sh-msiof: switch to managed controller allocation
Johan Hovold [Tue, 5 May 2026 07:28:56 +0000 (09:28 +0200)] 
spi: sh-msiof: switch to managed controller allocation

Switch to device managed controller allocation to simplify error
handling and to avoid having to take another reference during
deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260505072909.618363-8-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: sh-hspi: switch to managed controller allocation
Johan Hovold [Tue, 5 May 2026 07:28:55 +0000 (09:28 +0200)] 
spi: sh-hspi: switch to managed controller allocation

Switch to device managed controller allocation to simplify error
handling and to avoid having to take another reference during
deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260505072909.618363-7-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: rspi: switch to managed controller allocation
Johan Hovold [Tue, 5 May 2026 07:28:54 +0000 (09:28 +0200)] 
spi: rspi: switch to managed controller allocation

Switch to device managed controller allocation to simplify error
handling and to avoid having to take another reference during
deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260505072909.618363-6-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: qup: switch to managed controller allocation
Johan Hovold [Tue, 5 May 2026 07:28:53 +0000 (09:28 +0200)] 
spi: qup: switch to managed controller allocation

Switch to device managed controller allocation to simplify error
handling and to avoid having to take another reference during
deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260505072909.618363-5-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: pl022: switch to managed controller allocation
Johan Hovold [Tue, 5 May 2026 07:28:52 +0000 (09:28 +0200)] 
spi: pl022: switch to managed controller allocation

Switch to device managed controller allocation to simplify error
handling and to avoid having to take another reference during
deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20260505072909.618363-4-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: pic32-sqi: switch to managed controller allocation
Johan Hovold [Tue, 5 May 2026 07:28:51 +0000 (09:28 +0200)] 
spi: pic32-sqi: switch to managed controller allocation

Switch to device managed controller allocation to simplify error
handling and to avoid having to take another reference during
deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260505072909.618363-3-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: pic32: switch to managed controller allocation
Johan Hovold [Tue, 5 May 2026 07:28:50 +0000 (09:28 +0200)] 
spi: pic32: switch to managed controller allocation

Switch to device managed controller allocation to simplify error
handling and to avoid having to take another reference during
deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260505072909.618363-2-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agoregulator: Kconfig: fix a typo in help
Ihor Matushchak [Fri, 8 May 2026 08:49:33 +0000 (10:49 +0200)] 
regulator: Kconfig: fix a typo in help

Fixes a typo in Kconfig, 'protectorvia' -> 'protector via'.

Signed-off-by: Ihor Matushchak <ihor.matushchak@foobox.net>
Link: https://patch.msgid.link/20260508084933.4076-1-ihor.matushchak@foobox.net
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agospi: amd: Set correct bus number in ACPI probe path
Krishnamoorthi M [Thu, 7 May 2026 18:00:51 +0000 (23:30 +0530)] 
spi: amd: Set correct bus number in ACPI probe path

On platforms where the HID2 SPI controller (AMDI0063) is enumerated via
ACPI instead of PCI, amd_spi_probe() unconditionally sets bus_num to 0,
while the PCI probe path assigns bus_num 2 for the HID2 controller.

Align the ACPI probe path to use the same bus number so that userspace
and SPI client drivers see a consistent bus assignment regardless of the
enumeration method.

Fixes: b644c2776652 ("spi: spi_amd: Add PCI-based driver for AMD HID2 SPI controller")
Cc: stable@vger.kernel.org # v6.16+
Signed-off-by: Krishnamoorthi M <krishnamoorthi.m@amd.com>
Link: https://patch.msgid.link/20260507180051.4158674-1-krishnamoorthi.m@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agoASoC: dt-bindings: mediatek,mt8173-rt5650-rt5514: Fix mediatek,audio-codec constraints
Rob Herring (Arm) [Fri, 8 May 2026 18:24:37 +0000 (13:24 -0500)] 
ASoC: dt-bindings: mediatek,mt8173-rt5650-rt5514: Fix mediatek,audio-codec constraints

A phandle-array is really a matrix and needs constraints on the number
of elements for both the inner and outer dimensions. Add the missing
inner constraints.

Fixes: 472d77bdc511 ("ASoC: dt-bindings: mediatek,mt8173-rt5650-rt5514: convert to DT schema")
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260508182438.1757394-1-robh@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agoMAINTAINERS: ASoC/ti: Remove myself and add Sen Wang as maintainer
Peter Ujfalusi [Tue, 5 May 2026 16:47:44 +0000 (19:47 +0300)] 
MAINTAINERS: ASoC/ti: Remove myself and add Sen Wang as maintainer

As I cannot spend adequate time to fulfill my role as maintainer for the
TI ASoC drivers, it is for the better if I resign and hand over the role
to Sen Wang.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Acked-by: Nishanth Menon <nm@ti.com>
Link: https://patch.msgid.link/20260505164744.16134-1-peter.ujfalusi@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
5 weeks agorust: pin-init: examples: fix `useless_borrows_in_formatting` clippy warning
Gary Guo [Tue, 5 May 2026 11:51:37 +0000 (12:51 +0100)] 
rust: pin-init: examples: fix `useless_borrows_in_formatting` clippy warning

Clippy 1.97 introduces a new `useless_borrows_in_formatting` warning, which
fires on the examples as we have `&*expr` where the format macro already
takes a reference. Remove the extra borrow.
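
As a rough illustration (not taken from the patch), the two forms below
format the same value. The helper names and the `Mutex<u32>` are
hypothetical; the point is only that the format macros already borrow
their arguments, so the explicit `&*` adds nothing.

```rust
use std::sync::Mutex;

// Hypothetical helper: formats the guarded value through a redundant `&*` reborrow.
fn with_extra_borrow(counter: &Mutex<u32>) -> String {
    let guard = counter.lock().unwrap();
    format!("{:?}", &*guard)
}

// Same output without the extra borrow: `format!` takes a reference to `*guard` itself.
fn without_extra_borrow(counter: &Mutex<u32>) -> String {
    let guard = counter.lock().unwrap();
    format!("{:?}", *guard)
}

fn main() {
    let counter = Mutex::new(42);
    assert_eq!(with_extra_borrow(&counter), without_extra_borrow(&counter));
    println!("{}", without_extra_borrow(&counter));
}
```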

Link: https://patch.msgid.link/20260505115138.2466966-1-gary@kernel.org
Signed-off-by: Gary Guo <gary@garyguo.net>
5 weeks agorust: pin-init: internal: remove `collect_tuple` polyfill after MSRV bump
Gary Guo [Fri, 1 May 2026 13:44:45 +0000 (14:44 +0100)] 
rust: pin-init: internal: remove `collect_tuple` polyfill after MSRV bump

Tuples implement `FromIterator` since Rust 1.79. Remove the `collect_tuple`
polyfill now the MSRV is above 1.79.

To avoid over-indenting the closure, I move the `Field` destructure from the
closure parameter to a let binding. This keeps the diff small.
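
A minimal sketch of the standard-library behaviour this relies on,
assuming Rust 1.79 or later. `split_fields` and its `(&str, bool)` input
are hypothetical stand-ins, not the macro's real `Field` type; the
destructure sits in a let binding to mirror the shape described above.

```rust
// Hypothetical stand-in for per-field data: (field name, has `#[pin]`).
fn split_fields(fields: &[(&str, bool)]) -> (Vec<String>, Vec<bool>) {
    fields
        .iter()
        .map(|field| {
            // Destructure in a let binding rather than in the closure parameter.
            let (name, pinned) = *field;
            (name.to_string(), pinned)
        })
        // Tuples of collections implement `FromIterator` since Rust 1.79,
        // so no `collect_tuple`-style helper is needed.
        .collect()
}

fn main() {
    let (names, pinned) = split_fields(&[("a", true), ("b", false)]);
    println!("{names:?} {pinned:?}");
}
```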

Link: https://patch.msgid.link/20260501134445.3809731-1-gary@garyguo.net
Signed-off-by: Gary Guo <gary@garyguo.net>
5 weeks agorust: pin-init: internal: turn `PhantomPinned` error into warnings
Gary Guo [Tue, 28 Apr 2026 13:10:59 +0000 (14:10 +0100)] 
rust: pin-init: internal: turn `PhantomPinned` error into warnings

The `PhantomPinned` detection is just a lint, and is emitted as an error
because there is no `compile_warning!()` macro, and
`proc-macro-diagnostics` is not stable.

Use the `#[deprecated = ""]` attribute to approximate custom proc-macro
warnings. A new line is added before the message for visual clarity.

An example warning with this trick looks like this:

    warning: use of deprecated function `_::warn`:
             The field `pin` of type `PhantomPinned` only has an effect if it has the `#[pin]` attribute
     --> test.rs:9:5
      |
    9 |     pin: marker::PhantomPinned,
      |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
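
The trick can be sketched by hand as below. This is an approximation
written for illustration, not the code the macro generates: the `Example`
struct and `example_size` helper are hypothetical, and a real proc macro
would presumably give the call the span of the offending field so the
warning points there.

```rust
use std::marker::PhantomPinned;

// Hypothetical struct: `pin` lacks `#[pin]`, which is the situation the lint reports.
struct Example {
    value: u32,
    pin: PhantomPinned,
}

// Hand-written stand-in for macro output. Calling a deprecated function makes
// rustc emit a warning carrying the message; the leading "\n" moves the
// message onto its own line.
const _: () = {
    #[deprecated = "\nThe field `pin` of type `PhantomPinned` only has an effect if it has the `#[pin]` attribute"]
    const fn warn() {}
    warn();
};

// Hypothetical helper so the sketch has something observable at run time.
fn example_size() -> usize {
    std::mem::size_of::<Example>()
}

fn main() {
    let e = Example { value: 1, pin: PhantomPinned };
    let _ = &e.pin;
    println!("value = {}, size = {}", e.value, example_size());
}
```

Compilation still succeeds, which is the point: the diagnostic is a
warning rather than a hard error.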

Suggested-by: Benno Lossin <lossin@kernel.org>
Link: https://github.com/Rust-for-Linux/pin-init/issues/51
Link: https://patch.msgid.link/20260428-pin-init-sync-v1-10-07f9bd3859fb@garyguo.net
Signed-off-by: Gary Guo <gary@garyguo.net>
5 weeks agorust: pin-init: cleanup workaround for old Rust compiler
Gary Guo [Tue, 28 Apr 2026 13:10:58 +0000 (14:10 +0100)] 
rust: pin-init: cleanup workaround for old Rust compiler

The workaround mentions it's for Rust versions before 1.81. The minimum is
now 1.82, thus clean up.

Link: https://patch.msgid.link/20260428-pin-init-sync-v1-9-07f9bd3859fb@garyguo.net
Signed-off-by: Gary Guo <gary@garyguo.net>
5 weeks agorust: pin-init: fix badge URL in README
Gary Guo [Tue, 28 Apr 2026 13:10:57 +0000 (14:10 +0100)] 
rust: pin-init: fix badge URL in README

The old CI workflow was deleted ~1 year ago. Fix the URL to point to
the correct one.

Link: https://patch.msgid.link/20260428-pin-init-sync-v1-8-07f9bd3859fb@garyguo.net
Signed-off-by: Gary Guo <gary@garyguo.net>
5 weeks agorust: pin-init: internal: adjust license identifier of `zeroable.rs`
Benno Lossin [Tue, 28 Apr 2026 13:10:56 +0000 (14:10 +0100)] 
rust: pin-init: internal: adjust license identifier of `zeroable.rs`

The pin-init crate has been licensed under `Apache-2.0 OR MIT` since the
beginning. I introduced in commit 071cedc84e90 ("rust: add derive macro for
`Zeroable`") `zeroable.rs` with incompatible GPL-2.0 SPDX identifier. The
file has not been modified by other authors, so relicense it under the
above license.

Signed-off-by: Benno Lossin <lossin@kernel.org>
[ Reworded commit message - Gary ]
Link: https://patch.msgid.link/20260428-pin-init-sync-v1-7-07f9bd3859fb@garyguo.net
Signed-off-by: Gary Guo <gary@garyguo.net>
5 weeks agorust: pin-init: internal: remove redundant `#[pin]` filtering
Gary Guo [Tue, 28 Apr 2026 13:10:55 +0000 (14:10 +0100)] 
rust: pin-init: internal: remove redundant `#[pin]` filtering

The `generate_projections` and `generate_the_pin_data` functions already
receive filtered field lists, so they do not need to filter out `#[pin]`
again.

Reviewed-by: Benno Lossin <lossin@kernel.org>
Link: https://patch.msgid.link/20260428-pin-init-sync-v1-6-07f9bd3859fb@garyguo.net
Signed-off-by: Gary Guo <gary@garyguo.net>
5 weeks agorust: pin-init: internal: add missing where clause to projection types
Mohamad Alsadhan [Tue, 28 Apr 2026 13:10:54 +0000 (14:10 +0100)] 
rust: pin-init: internal: add missing where clause to projection types

`#[pin_data]` failed to propagate the struct's `where` clause to the
generated projection struct. As a result, bounds written in a `where`
clause could be dropped during expansion, causing type errors when
fields depended on those bounds.

Fix this by adding the missing `where` clause to the generated
projection struct.
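
A simplified, hand-written sketch of why the bound matters. The names
`Buffered`, `BufferedProjection` and `peek_first` are hypothetical, and
plain `&mut` references are used instead of the `Pin` projections that
`#[pin_data]` generates. The field type `I::Item` only resolves when
`I: Iterator` is in scope, so the projection struct has to repeat the
`where` clause.

```rust
// A struct whose field type depends on a bound written in a `where` clause.
struct Buffered<I>
where
    I: Iterator,
{
    iter: I,
    peeked: Option<I::Item>,
}

// Hand-written projection struct. Dropping the `where` clause here would make
// `I::Item` fail to resolve, which is the class of error described above.
struct BufferedProjection<'a, I>
where
    I: Iterator,
{
    iter: &'a mut I,
    peeked: &'a mut Option<I::Item>,
}

impl<I> Buffered<I>
where
    I: Iterator,
{
    fn project(&mut self) -> BufferedProjection<'_, I> {
        BufferedProjection {
            iter: &mut self.iter,
            peeked: &mut self.peeked,
        }
    }
}

// Hypothetical usage: pull the first item into `peeked` through the projection.
fn peek_first(values: Vec<u32>) -> Option<u32> {
    let mut buffered = Buffered { iter: values.into_iter(), peeked: None };
    let BufferedProjection { iter, peeked } = buffered.project();
    *peeked = iter.next();
    buffered.peeked
}

fn main() {
    println!("{:?}", peek_first(vec![7, 8]));
}
```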

Reported-by: Andreas Hindborg <a.hindborg@kernel.org>
Closes: https://rust-for-linux.zulipchat.com/#narrow/channel/561532-pin-init/topic/generic.20bounds.20and.20.60.23.5Bpin_data.5D.60/with/578381591
Signed-off-by: Mohamad Alsadhan <mo@sdhn.cc>
Reviewed-by: Gary Guo <gary@garyguo.net>
[ Reworded commit message - Gary ]
Link: https://patch.msgid.link/20260428-pin-init-sync-v1-5-07f9bd3859fb@garyguo.net
Signed-off-by: Gary Guo <gary@garyguo.net>
5 weeks agorust: pin-init: extend `impl_zeroable_option` macro to handle generics
Mohamad Alsadhan [Tue, 28 Apr 2026 13:10:53 +0000 (14:10 +0100)] 
rust: pin-init: extend `impl_zeroable_option` macro to handle generics

Improve the `impl_zeroable_option` macro to handle generic impls for types
like `&T`, `&mut T`, `NonNull<T>`, and others (for which `Option<T>`
is guaranteed to be zeroable), using an approach similar to
`impl_zeroable`.

Also, update old declarations to use generics, e.g. `NonZeroU8` to
`NonZero<u8>`.
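
For reference, the standard-library guarantee behind this can be checked
directly. The sketch below does not use pin-init's macro or traits; the
three helper functions are hypothetical, and it assumes Rust 1.79 or later
for the generic `NonZero<T>`.

```rust
use std::mem::{size_of, zeroed};
use std::num::NonZero;
use std::ptr::NonNull;

// For each of these types the standard library documents that `Option<T>` has
// the same size as `T` and that the all-zero bit pattern is `None`.

fn zeroed_non_zero() -> Option<NonZero<u8>> {
    // SAFETY: all-zero bytes are a valid `Option<NonZero<u8>>` (namely `None`).
    unsafe { zeroed() }
}

fn zeroed_ref() -> Option<&'static u32> {
    // SAFETY: all-zero bytes are a valid `Option<&u32>` (namely `None`).
    unsafe { zeroed() }
}

fn zeroed_non_null() -> Option<NonNull<u32>> {
    // SAFETY: all-zero bytes are a valid `Option<NonNull<u32>>` (namely `None`).
    unsafe { zeroed() }
}

fn main() {
    assert_eq!(size_of::<Option<NonZero<u8>>>(), size_of::<u8>());
    assert_eq!(size_of::<Option<&u32>>(), size_of::<&u32>());
    assert_eq!(size_of::<Option<NonNull<u32>>>(), size_of::<NonNull<u32>>());
    assert!(zeroed_non_zero().is_none());
    assert!(zeroed_ref().is_none());
    assert!(zeroed_non_null().is_none());
}
```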

Signed-off-by: Mohamad Alsadhan <mo@sdhn.cc>
Reviewed-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260428-pin-init-sync-v1-4-07f9bd3859fb@garyguo.net
Signed-off-by: Gary Guo <gary@garyguo.net>
5 weeks agorust: pin-init: cleanup `Zeroable` and `ZeroableOptions`
Mohamad Alsadhan [Tue, 28 Apr 2026 13:10:52 +0000 (14:10 +0100)] 
rust: pin-init: cleanup `Zeroable` and `ZeroableOptions`

Place definitions and implementations (incl. macro invocations) of
the `Zeroable` trait first in the relevant section of `src/lib.rs`,
followed by the `ZeroableOption` trait and its implementations.

Rename `impl_non_zero_int_zeroable_option` to `impl_zeroable_option`
for consistency.

This commit should not introduce any functional changes.

Signed-off-by: Mohamad Alsadhan <mo@sdhn.cc>
Reviewed-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260428-pin-init-sync-v1-3-07f9bd3859fb@garyguo.net
Signed-off-by: Gary Guo <gary@garyguo.net>
5 weeks agorust: pin-init: bump minimum Rust version to 1.82
Gary Guo [Tue, 28 Apr 2026 13:10:51 +0000 (14:10 +0100)] 
rust: pin-init: bump minimum Rust version to 1.82

Following the kernel minimum version bump in commit f32fb9c58a5b ("rust:
bump Rust minimum supported version to 1.85.0 (Debian Trixie)"), bump
pin-init's minimum Rust version to 1.82.

This removes the `lint_reasons` feature, which was stabilized in 1.81, and the
`raw_ref_ops` and `new_uninit` features, which were stabilized in 1.82.

Given we do not use any features that were stabilized in the 1.82..=1.85 range,
and the pin-init crate is useful for other projects which may have their own
MSRV requirements, the minimum version is not bumped straight to 1.85.

Reviewed-by: Benno Lossin <lossin@kernel.org>
Link: https://patch.msgid.link/20260428-pin-init-sync-v1-2-07f9bd3859fb@garyguo.net
Signed-off-by: Gary Guo <gary@garyguo.net>
5 weeks agorust: pin-init: examples: mark as `#[inline]` all `From::from()`s for `Error`
Alistair Francis [Tue, 28 Apr 2026 13:10:50 +0000 (14:10 +0100)] 
rust: pin-init: examples: mark as `#[inline]` all `From::from()`s for `Error`

There was a recent request in the kernel [1] to mark as `#[inline]` the
simple `From::from()` functions implemented for `Error`.

Thus mark all of the existing

    impl From<...> for Error {
        fn from(err: ...) -> Self {
            ...
        }
    }

functions as `#[inline]`.
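
A self-contained sketch of the pattern follows. The unit `Error` type, the
choice of source error types and the `narrow` helper are assumptions made
for illustration; they are not copied from the pin-init examples.

```rust
use std::convert::Infallible;
use std::num::TryFromIntError;

// Hypothetical unit error type for the sketch.
#[derive(Debug, PartialEq)]
struct Error;

impl From<Infallible> for Error {
    #[inline]
    fn from(e: Infallible) -> Self {
        // `Infallible` has no values, so this match is exhaustive.
        match e {}
    }
}

impl From<TryFromIntError> for Error {
    #[inline]
    fn from(_: TryFromIntError) -> Self {
        Error
    }
}

// The `?` operator goes through `From::from`, which is why these tiny
// conversions are good candidates for inlining.
fn narrow(value: u32) -> Result<u8, Error> {
    Ok(u8::try_from(value)?)
}

fn main() {
    println!("{:?} {:?}", narrow(7), narrow(300));
}
```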

While in the pin-init crate the relevant code is just examples, it
nevertheless does not hurt to use good practice for them.

Suggested-by: Gary Guo <gary@garyguo.net>
Link: https://lore.kernel.org/all/8403c8b7a832b5274743816eb77abfa4@garyguo.net/
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
[ Reworded commit message - Gary ]
Link: https://patch.msgid.link/20260428-pin-init-sync-v1-1-07f9bd3859fb@garyguo.net
Signed-off-by: Gary Guo <gary@garyguo.net>
5 weeks agoLinux 7.1-rc3 v7.1-rc3
Linus Torvalds [Sun, 10 May 2026 21:08:09 +0000 (14:08 -0700)] 
Linux 7.1-rc3

5 weeks agosched_ext: Handle SCX_TASK_NONE in disable/switched_from paths
Tejun Heo [Sun, 10 May 2026 20:08:16 +0000 (10:08 -1000)] 
sched_ext: Handle SCX_TASK_NONE in disable/switched_from paths

scx_fail_parent() leaves cgroup tasks at (state=NONE, sched=parent,
sched_class=ext) until the parent itself is torn down by the scx_error() it
raised. When the later root_disable iterates them, two paths trip on NONE.

scx_disable_and_exit_task() re-enters the wrapper at NONE: the inner switch
returns early but the trailing scx_set_task_sched(p, NULL) clobbers the
parent sched left by scx_fail_parent(), and scx_set_task_state(p, NONE)
wastes a write on an already-NONE task. switched_from_scx() then calls
scx_disable_task(), which WARNs on non-ENABLED state and writes state=READY,
producing a NONE -> READY transition the validation matrix rejects.

Treat NONE as "nothing to do" in both paths. Add a NONE early-return at the
top of scx_disable_and_exit_task() and a parallel NONE check in
switched_from_scx() next to task_dead_and_done().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
5 weeks agosched_ext: Close sub-sched init race with post-init DEAD recheck
Tejun Heo [Sun, 10 May 2026 20:08:16 +0000 (10:08 -1000)] 
sched_ext: Close sub-sched init race with post-init DEAD recheck

scx_sub_enable_workfn()'s init pass and scx_sub_disable() migration both
drop the rq lock to call __scx_init_task() against the other sched. A
TASK_DEAD @p can fall through sched_ext_dead() in that window.
sched_ext_dead() runs ops.exit_task() on the sched @p was attached to, not
on the sched whose init just completed, so the new allocation leaks.

Reuse the DEAD signal set by sched_ext_dead(). After __scx_init_task()
returns, take task_rq_lock(p) and check for DEAD; on hit, call
scx_sub_init_cancel_task() against the sub sched the init ran for and drop
@p; on miss, proceed as before.

Reported-by: zhidao su <suzhidao@xiaomi.com>
Link: https://lore.kernel.org/all/20260429133155.3825247-1-suzhidao@xiaomi.com/
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
5 weeks agosched_ext: Close root-enable vs sched_ext_dead() race with SCX_TASK_INIT_BEGIN
Tejun Heo [Sun, 10 May 2026 20:08:16 +0000 (10:08 -1000)] 
sched_ext: Close root-enable vs sched_ext_dead() race with SCX_TASK_INIT_BEGIN

scx_root_enable_workfn() drops the iter rq lock for ops.init_task() and a
TASK_DEAD @p can fall through sched_ext_dead() in that window. The race hits
when sched_ext_dead() observes SCX_TASK_INIT (the intermediate state before
@p->scx.sched is published) and dereferences NULL via SCX_HAS_OP(NULL,
exit_task), or observes SCX_TASK_NONE during the unlocked init window and
skips cleanup so exit_task() never runs.

Add SCX_TASK_INIT_BEGIN. The enable path writes NONE -> INIT_BEGIN under the
iter rq lock, then takes the rq lock again after init to walk INIT_BEGIN ->
INIT -> READY. sched_ext_dead() that wins the rq-lock race observes
INIT_BEGIN and sets DEAD without calling into ops; the post-init recheck
unwinds via scx_sub_init_cancel_task().

scx_fork() runs single-threaded against sched_ext_dead() (the task is not on
scx_tasks until scx_post_fork() adds it) so its INIT_BEGIN -> INIT walk
needs no rq-lock pairing; it rolls back to NONE on ops.init_task() failure.

The validation matrix grows the INIT_BEGIN row and the INIT_BEGIN -> DEAD
edge; INIT now requires INIT_BEGIN as the predecessor. scx_sub_disable()'s
migration writes INIT_BEGIN as a synthetic predecessor to satisfy the
tightened verification.
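
The resulting matrix can be modeled roughly as below. This is a
stand-alone Rust model written for illustration, not the kernel's C code.
The ENABLED <- READY rule and the unconstrained predecessors of INIT_BEGIN
are assumptions that this series' commit messages do not spell out; the
remaining edges follow the descriptions in this series.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TaskState {
    None,
    InitBegin,
    Init,
    Ready,
    Enabled,
    Dead,
}

use TaskState as S;

// Returns true if moving from `prev` to `next` passes the modeled checks.
fn transition_allowed(prev: TaskState, next: TaskState) -> bool {
    match next {
        // DEAD -> NONE warns; other predecessors (including rollback) are fine.
        S::None => prev != S::Dead,
        // Assumption: predecessors of INIT_BEGIN are left unchecked in this model.
        S::InitBegin => true,
        // INIT now requires INIT_BEGIN as the predecessor.
        S::Init => prev == S::InitBegin,
        // READY's predecessor is INIT or ENABLED.
        S::Ready => matches!(prev, S::Init | S::Enabled),
        // Assumption: ENABLED is only entered from READY.
        S::Enabled => prev == S::Ready,
        // NONE -> DEAD and INIT_BEGIN -> DEAD are the stated edges.
        S::Dead => matches!(prev, S::None | S::InitBegin),
    }
}

// Checks every consecutive pair along a path of states.
fn walk_allowed(path: &[TaskState]) -> bool {
    path.windows(2).all(|pair| transition_allowed(pair[0], pair[1]))
}

fn main() {
    // Enable path: NONE -> INIT_BEGIN -> INIT -> READY.
    assert!(walk_allowed(&[S::None, S::InitBegin, S::Init, S::Ready]));
    // A racing sched_ext_dead() observing INIT_BEGIN sets DEAD.
    assert!(walk_allowed(&[S::None, S::InitBegin, S::Dead]));
    // NONE -> READY is rejected.
    assert!(!transition_allowed(S::None, S::Ready));
}
```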

The sub-sched paths still race with sched_ext_dead() during the unlocked
init window. This will be fixed by the next patch.

Reported-by: zhidao su <suzhidao@xiaomi.com>
Link: https://lore.kernel.org/all/20260429133155.3825247-1-suzhidao@xiaomi.com/
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
5 weeks agosched_ext: Replace SCX_TASK_OFF_TASKS flag with SCX_TASK_DEAD state
Tejun Heo [Sun, 10 May 2026 20:08:16 +0000 (10:08 -1000)] 
sched_ext: Replace SCX_TASK_OFF_TASKS flag with SCX_TASK_DEAD state

SCX_TASK_OFF_TASKS marked tasks already through sched_ext_dead() so cgroup
task iteration would skip them. This can be expressed better with a task
state. Replace the flag with SCX_TASK_DEAD.

scx_disable_and_exit_task() resets state to NONE on its way out, so
sched_ext_dead() now sets DEAD after the wrapper returns. The validation
matrix grows NONE -> DEAD, warns on DEAD -> NONE, and tightens READY's
predecessor to INIT or ENABLED so the new DEAD value cannot silently
transition to READY.

Prepares for the following enable vs dead race fix.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
5 weeks agosched_ext: Inline scx_init_task() and move RESET_RUNNABLE_AT into scx_set_task_state()
Tejun Heo [Sun, 10 May 2026 20:08:16 +0000 (10:08 -1000)] 
sched_ext: Inline scx_init_task() and move RESET_RUNNABLE_AT into scx_set_task_state()

Prepare for the SCX_TASK_INIT_BEGIN/DEAD work that follows by collapsing the
scx_init_task() helper. Move the SCX_TASK_RESET_RUNNABLE_AT setting into
scx_set_task_state() on the INIT transition (it was set unconditionally at
every INIT site through the scx_init_task() helper), inline scx_init_task()
into scx_fork() and scx_root_enable_workfn(), and drop the helper.

As a side effect, scx_sub_disable() migration sequence now also sets
RESET_RUNNABLE_AT (it previously wrote INIT directly without going through
scx_init_task()). The flag triggers a runnable_at reset on the next
set_task_runnable(), which is harmless on a task that has just been moved
between scheds.

On root-enable, p->scx.flags is written without the task's rq lock. The task
isn't visible to scx yet, and a follow-up patch restores the lock-held
write.

v2: Note p->scx.flags rq-lock relaxation on root-enable path. (Andrea)

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
5 weeks agosched_ext: Cleanups in preparation for the SCX_TASK_INIT_BEGIN/DEAD work
Tejun Heo [Sun, 10 May 2026 20:08:16 +0000 (10:08 -1000)] 
sched_ext: Cleanups in preparation for the SCX_TASK_INIT_BEGIN/DEAD work

Cleanups in preparation for the state-machine work that follows:

- Convert three sub-sched call sites that open-code
  rcu_assign_pointer(p->scx.sched, ...) to scx_set_task_sched().

- Move scx_get_task_state()/scx_set_task_state() above the SCX task iter
  section so scx_task_iter_next_locked() can use them without a forward
  declaration.

No functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>