Gil Portnoy [Thu, 11 Jun 2026 13:59:19 +0000 (22:59 +0900)]
ksmbd: reject non-VALID session in compound request branch
smb2_check_user_session() takes a shortcut for any operation that is not
the first in a COMPOUND request: it reuses work->sess (the session bound by
the first operation) and validates only the SessionId, then returns
"valid". It never re-checks work->sess->state == SMB2_SESSION_VALID, and a
SessionId of 0xFFFFFFFFFFFFFFFF (ULLONG_MAX, the MS-SMB2 related-operation
value) skips even the id comparison. The standalone path
(ksmbd_session_lookup_all() plus the SESSION_SETUP state machine) does
enforce the VALID state; the compound branch bypasses all of it.
A SESSION_SETUP carrying only an NTLM Type-1 (NtLmNegotiate) blob publishes
a fresh SMB2_SESSION_IN_PROGRESS session whose sess->user is still NULL
(->user is assigned later, by ntlm_authenticate()). Used as operation 1 of
a COMPOUND with operation 2 = TREE_CONNECT (related, SessionId=ULLONG_MAX,
\\host\IPC$), the tree-connect then runs on that IN_PROGRESS session and
reaches ksmbd_ipc_tree_connect_request(), which dereferences
user_name(sess->user) with sess->user == NULL (transport_ipc.c:687/701/704)
-> remote NULL-pointer dereference and a kernel Oops that wedges the ksmbd
worker for all clients.
Reject any non-first compound operation that lands on a session which is
not SMB2_SESSION_VALID, mirroring the validity the standalone lookup path
enforces. SESSION_SETUP itself legitimately runs on an IN_PROGRESS session,
but it is never carried as a non-first compound operation, so multi-leg
authentication is unaffected by this check.
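The rule described above can be sketched as a small userspace model. This is only an illustration: the type and function names (struct session, check_compound_session) and the -EINVAL return value are assumptions and do not match the ksmbd sources.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the session states named in the commit message. */
enum sess_state { SESS_IN_PROGRESS, SESS_VALID, SESS_EXPIRED };

struct session {
	uint64_t id;
	enum sess_state state;
};

/* MS-SMB2 related-operation value: 0xFFFFFFFFFFFFFFFF. */
#define RELATED_SESSION_ID UINT64_MAX

/*
 * Check applied to a non-first operation of a compound request.
 * 'bound' is the session bound by the first operation.
 * Returns 0 if the operation may proceed, or a negative errno.
 */
int check_compound_session(const struct session *bound, uint64_t hdr_sess_id)
{
	if (!bound)
		return -EINVAL;
	/* The related-operation value skips only the id comparison ... */
	if (hdr_sess_id != RELATED_SESSION_ID && hdr_sess_id != bound->id)
		return -EINVAL;
	/* ... but never the state check. */
	if (bound->state != SESS_VALID)
		return -EINVAL;
	return 0;
}
```

With this shape, a TREE_CONNECT chained behind a Type-1 SESSION_SETUP is refused before it can touch a session whose user is still unset.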
Fixes: 5005bcb42191 ("ksmbd: validate session id and tree id in the compound request")
Cc: stable@vger.kernel.org
Signed-off-by: Gil Portnoy <dddhkts1@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Namjae Jeon [Wed, 10 Jun 2026 09:46:10 +0000 (18:46 +0900)]
ksmbd: compress SMB2 READ responses
Handle SMB2_READFLAG_REQUEST_COMPRESSED for non-RDMA reads.
Flatten the response iov, emit chained or unchained LZ77 transforms when
compression is beneficial, and retain the generated buffer until the work
item is released.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Advertise LZ77 and Pattern_V1 with chained transform support in the
SMB 3.1.1 compression negotiate context. Validate the server's returned
algorithm list and flags, then retain the negotiated capabilities for a
future compressed transform receive implementation.
This patch only negotiates capabilities. It does not request compressed
READ responses or add a compressed transform receive path.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Gil Portnoy [Wed, 10 Jun 2026 11:13:51 +0000 (20:13 +0900)]
ksmbd: add per-handle permission check to FILE_LINK_INFORMATION
The FILE_LINK_INFORMATION arm of smb2_set_info_file() calls
smb2_create_link() with no per-handle fp->daccess check. On the
ReplaceIfExists path smb2_create_link() unlinks an existing file at the
target name (ksmbd_vfs_remove_file) and creates a hardlink
(ksmbd_vfs_link); neither helper checks daccess. A handle opened with
FILE_READ_DATA only (no FILE_DELETE, no FILE_WRITE_DATA) can therefore
delete an arbitrary file in the share and plant a hardlink over its name.
The sibling delete/move arms in the same switch already gate:
FILE_RENAME_INFORMATION and FILE_DISPOSITION_INFORMATION both require
FILE_DELETE_LE; FILE_FULL_EA_INFORMATION requires FILE_WRITE_EA_LE. Gate
the link arm the same way as its closest analogue (rename), since it
mutates the namespace and, on replace, deletes an existing entry.
This is a sibling of commit cc57232cae23 ("ksmbd: fix FSCTL permission
bypass by adding a permission check for FSCTL_SET_SPARSE").
Cc: stable@vger.kernel.org
Signed-off-by: Gil Portnoy <dddhkts1@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Gil Portnoy [Wed, 10 Jun 2026 11:07:04 +0000 (20:07 +0900)]
ksmbd: add a permission check for FSCTL_SET_ZERO_DATA
FSCTL_SET_ZERO_DATA in smb2_ioctl() destroys file data via
ksmbd_vfs_zero_data() -> vfs_fallocate(PUNCH_HOLE/ZERO_RANGE) after
checking only the share-level KSMBD_TREE_CONN_FLAG_WRITABLE, with no
per-handle access check. A handle opened with only FILE_WRITE_ATTRIBUTES
still yields an FMODE_WRITE filp (FILE_WRITE_ATTRIBUTES is part of
FILE_WRITE_DESIRE_ACCESS_LE, so smb2_create_open_flags() opens it
O_WRONLY), so the vfs_fallocate FMODE_WRITE check does not stop it; only
the missing fp->daccess gate would. Reproduced on mainline 7.1-rc7 with
KASAN by an authenticated SMB client: a FILE_WRITE_ATTRIBUTES-only handle
zeroed 4096 bytes of file data it had no FILE_WRITE_DATA right to
(6/6; a FILE_READ_DATA-only handle was correctly denied).
This is the unfixed sibling of commit cc57232cae23 ("ksmbd: fix FSCTL
permission bypass by adding a permission check for FSCTL_SET_SPARSE").
Because SET_ZERO_DATA writes data (not an attribute), require
FILE_WRITE_DATA.
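The two layers of checking discussed above can be modelled as follows. This is a sketch under stated assumptions: struct handle and zero_data_allowed() are hypothetical names, and the access-mask bits are the MS-SMB2 values in host byte order (the kernel uses little-endian _LE constants).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Access-mask bits as defined by MS-SMB2, host byte order. */
#define FILE_READ_DATA        0x00000001u
#define FILE_WRITE_DATA       0x00000002u
#define FILE_WRITE_ATTRIBUTES 0x00000100u

/* Hypothetical handle: granted access plus the open mode of the backing file. */
struct handle {
	uint32_t daccess;     /* access granted when the handle was opened */
	bool     fmode_write; /* backing file was opened for writing */
};

/* Returns 0 if zeroing file data is allowed on this handle, else -errno. */
int zero_data_allowed(const struct handle *h)
{
	/* Per-handle gate: zeroing a range writes data. */
	if (!(h->daccess & FILE_WRITE_DATA))
		return -EACCES;
	/* VFS-level backstop, which alone does not stop a
	 * FILE_WRITE_ATTRIBUTES-only handle opened O_WRONLY. */
	if (!h->fmode_write)
		return -EBADF;
	return 0;
}
```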
Cc: stable@vger.kernel.org
Signed-off-by: Gil Portnoy <dddhkts1@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Gil Portnoy [Tue, 9 Jun 2026 00:00:00 +0000 (00:00 +0000)]
ksmbd: add a WRITE_DAC/WRITE_OWNER check to SMB2 SET_INFO SECURITY
commit cc57232cae23 ("ksmbd: fix FSCTL permission bypass by adding a
permission check for FSCTL_SET_SPARSE") added a fp->daccess gate to
fsctl_set_sparse and noted that "similar handle-level checks exist in other
functions but are missing here." The SMB2 SET_INFO SECURITY arm is one of
the missing ones, and the most security-relevant: smb2_set_info_sec() calls
set_info_sec() with no per-handle access check.
set_info_sec() (fs/smb/server/smbacl.c) re-permissions the file: it
rewrites owner/group/mode via notify_change(), rewrites the POSIX ACL via
set_posix_acl(), and on KSMBD_SHARE_FLAG_ACL_XATTR shares removes and
rewrites the Windows security descriptor via ksmbd_vfs_set_sd_xattr().
Every other persistent-mutation arm of the sibling handler
smb2_set_info_file() checks fp->daccess first (FILE_WRITE_DATA /
FILE_DELETE / FILE_WRITE_EA / FILE_WRITE_ATTRIBUTES); the SECURITY arm —
which mutates the access control itself — is the only one with no gate.
A client can therefore open a handle with FILE_WRITE_ATTRIBUTES only (no
FILE_WRITE_DAC / FILE_WRITE_OWNER) and use SMB2_SET_INFO with InfoType
SMB2_O_INFO_SECURITY to rewrite the file's DACL and owner, granting itself
access the handle's daccess never carried. Unlike the FSCTL data arms this
is a metadata/xattr operation, so there is no FMODE_WRITE VFS backstop —
the missing fp->daccess check is the entire gate.
Setting a security descriptor is the WRITE_DAC / WRITE_OWNER operation, so
require at least one of those on the handle before re-permissioning the
file. -EACCES is mapped to STATUS_ACCESS_DENIED by smb2_set_info().
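The "at least one of" requirement amounts to a single mask test. A minimal sketch, assuming host-byte-order MS-SMB2 access-mask values and a hypothetical helper name set_security_allowed():

```c
#include <errno.h>
#include <stdint.h>

/* Access-mask bits as defined by MS-SMB2, host byte order. */
#define FILE_WRITE_ATTRIBUTES 0x00000100u
#define WRITE_DAC             0x00040000u
#define WRITE_OWNER           0x00080000u

/*
 * Returns 0 if the handle's granted access permits setting a security
 * descriptor, or -EACCES otherwise. Either bit alone is sufficient.
 */
int set_security_allowed(uint32_t daccess)
{
	if (!(daccess & (WRITE_DAC | WRITE_OWNER)))
		return -EACCES;
	return 0;
}
```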
Cc: stable@vger.kernel.org
Signed-off-by: Gil Portnoy <dddhkts1@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Gil Portnoy [Wed, 10 Jun 2026 10:53:14 +0000 (19:53 +0900)]
ksmbd: fix use-after-free of a deferred file_lock on SMB2_CLOSE then SMB2_CANCEL
Commit f580d27e8928 ("ksmbd: fix use-after-free of a deferred file_lock on
double SMB2_CANCEL") made smb2_cancel() skip a work whose state is
KSMBD_WORK_CANCELLED, so its cancel_fn cannot be fired a second time. But
KSMBD_WORK has three states (ACTIVE, CANCELLED, CLOSED), and the same
freeing producer path is reached for CLOSED too:
SMB2_CLOSE on the locking handle -> set_close_state_blocked_works() sets
the deferred work's state to KSMBD_WORK_CLOSED and wakes the smb2_lock()
worker. The worker takes the non-ACTIVE early-exit, locks_free_lock()s
the file_lock and, because the state is not KSMBD_WORK_CANCELLED, takes
the STATUS_RANGE_NOT_LOCKED branch with "goto out2" -- which, like the
cancelled branch, skips release_async_work(). The work stays on
conn->async_requests with a live cancel_fn = smb2_remove_blocked_lock
pointing at the freed file_lock.
A subsequent SMB2_CANCEL for the same AsyncId then passes the
KSMBD_WORK_CANCELLED-only guard (its state is KSMBD_WORK_CLOSED), so
smb2_cancel() fires cancel_fn again over the freed file_lock -- the same
use-after-free that commit fixed, reached via SMB2_CLOSE instead of a first
SMB2_CANCEL:
BUG: KASAN: slab-use-after-free in __locks_delete_block
__locks_delete_block
locks_delete_block
ksmbd_vfs_posix_lock_unblock
smb2_remove_blocked_lock
smb2_cancel <- 2nd SMB2_CANCEL fires cancel_fn
handle_ksmbd_work
Allocated by ...: locks_alloc_lock <- smb2_lock
Freed by ...: locks_free_lock <- smb2_lock (non-ACTIVE early-exit)
... cache file_lock_cache of size 192
Reproduced on mainline 7.1-rc7 (which already contains f580d27e8928) with
KASAN by an authenticated SMB client; the double-SMB2_CANCEL control is
silent on that kernel, so the splat is attributable to the CLOSE trigger.
Only an ACTIVE deferred work may have its cancel_fn fired: both terminal
states (CANCELLED and CLOSED) reach the smb2_lock() early-exit that frees
the file_lock and skips release_async_work(). Guard on KSMBD_WORK_ACTIVE
so any non-active work is skipped.
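The difference between guarding on "not CANCELLED" and guarding on "ACTIVE" can be shown with a small model. All names here (struct async_work, try_cancel, count_call) are hypothetical and only mirror the three states named above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the three work states. */
enum work_state { WORK_ACTIVE, WORK_CANCELLED, WORK_CLOSED };

struct async_work {
	enum work_state state;
	void (*cancel_fn)(void *arg);
	void *arg;
};

/* Demo callback: counts how many times it was fired. */
void count_call(void *arg)
{
	++*(int *)arg;
}

/*
 * Fire the cancel callback only for a work that is still ACTIVE.
 * Both terminal states (CANCELLED and CLOSED) are skipped, so the
 * callback can never run over state that was already torn down.
 */
bool try_cancel(struct async_work *w)
{
	if (w->state != WORK_ACTIVE || !w->cancel_fn)
		return false;
	w->state = WORK_CANCELLED;
	w->cancel_fn(w->arg);
	return true;
}
```

A guard written as "state == CANCELLED: skip" would let a CLOSED work through; testing for ACTIVE covers any terminal state, including ones added later.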
Fixes: f580d27e8928 ("ksmbd: fix use-after-free of a deferred file_lock on double SMB2_CANCEL")
Cc: stable@vger.kernel.org
Signed-off-by: Gil Portnoy <dddhkts1@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
smb: server: remove code guarded by nonexistent config option
A small piece of code in fs/smb/server/smb_common.c depends on
CONFIG_SMB_INSECURE_SERVER, which has never been defined in the
mainline kernel, but was present in old out-of-tree versions of ksmbd.
Remove this dead code.
Discovered while searching for CONFIG_* symbols referenced in code but
not defined in any Kconfig file.
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Namjae Jeon [Sun, 7 Jun 2026 11:15:51 +0000 (20:15 +0900)]
ksmbd: prevent path traversal bypass by restricting caseless retry
ksmbd_vfs_path_lookup() enforces LOOKUP_BENEATH to restrict path
resolution within the share root. When a crafted path attempts to
escape the share boundary using parent-directory components ('..'),
vfs_path_parent_lookup() detects this and immediately fails,
returning -EXDEV.
However, a bug exists in __ksmbd_vfs_kern_path() under caseless mode.
The function fails to intercept the -EXDEV error and erroneously
falls through to the caseless retry logic, which is intended only
for genuinely missing files. During this retry process, the path
is reconstructed, leading to an unintended LOOKUP_BENEATH bypass
that allows write-capable users to create zero-length files or
directories outside the exported share.
Fix this by ensuring that the execution only proceeds to the caseless
lookup retry when the error is specifically -ENOENT. Any other errors,
such as -EXDEV from a path traversal attempt, must be returned immediately.
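The retry condition can be reduced to a single predicate. A minimal sketch; should_retry_caseless() is a hypothetical helper name, not a function in the ksmbd sources.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Decide whether the caseless retry may run after the exact lookup
 * returned 'err' (0 or a negative errno). Only a genuinely missing
 * file (-ENOENT) qualifies; -EXDEV from a LOOKUP_BENEATH violation,
 * like any other error, must be returned to the caller unchanged.
 */
bool should_retry_caseless(int err, bool caseless_mode)
{
	return caseless_mode && err == -ENOENT;
}
```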
Cc: stable@vger.kernel.org
Reported-by: Y s65 <yu4ys@outlook.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Davide Ornaghi [Sat, 6 Jun 2026 07:11:04 +0000 (16:11 +0900)]
ksmbd: fix UAF of struct file_lock in SMB2_LOCK deferred-lock cancellation
When a blocking byte-range lock request is deferred in the
FILE_LOCK_DEFERRED path, ksmbd registers the asynchronous work into
the connection's async_requests list via setup_async_work(). The cancel
callback smb2_remove_blocked_lock() holds a reference to the flock.
If the lock waiter is subsequently woken up but the work state is no
longer KSMBD_WORK_ACTIVE (e.g., due to a concurrent cancellation), the
cleanup path calls locks_free_lock(flock) without dequeuing the work from
the async_requests list. Concurrently, smb2_cancel() walks the list
under conn->request_lock and invokes the cancel callback, which then
dereferences the already freed 'flock'. This leads to a slab-use-after-free
inside __wake_up_common.
Fix this by restructuring the cleanup logic after the worker returns
from ksmbd_vfs_posix_lock_wait(). Move list_del(&smb_lock->llist) and
release_async_work(work) to the top of the cleanup block. This guarantees
that the async work is completely dequeued and serialized under
conn->request_lock before locks_free_lock(flock) is called, rendering
the flock unreachable for any concurrent smb2_cancel().
Cc: stable@vger.kernel.org
Signed-off-by: Davide Ornaghi <d.ornaghi97@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Guangshuo Li [Fri, 5 Jun 2026 04:30:16 +0000 (12:30 +0800)]
ksmbd: fix use-after-free in same_client_has_lease()
same_client_has_lease() returns an opinfo pointer from ci->m_op_list
after dropping ci->m_lock without taking a reference.
smb_grant_oplock() then dereferences that pointer in copy_lease() and
when checking breaking_cnt. A concurrent close can remove the old lease
from ci->m_op_list and drop the last reference before the caller uses
the returned pointer, leading to a use-after-free.
Take a reference when same_client_has_lease() selects an existing lease,
drop any previous match while scanning, and release the returned
reference in smb_grant_oplock() after copying the lease state.
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Hem Parekh [Tue, 2 Jun 2026 23:56:46 +0000 (16:56 -0700)]
ksmbd: fix out-of-bounds read in smb_check_perm_dacl()
The permission-check ACE walk in smb_check_perm_dacl() validates the ACE
header size and caps sid.num_subauth at SID_MAX_SUB_AUTHORITIES, but it
never checks that ace->size is actually large enough to contain
num_subauth sub-authorities before compare_sids() dereferences them.
CIFS_SID_BASE_SIZE covers the SID header up to but excluding the
sub_auth[] array, and offsetof(struct smb_ace, sid) is the ACE header,
so the existing guards only guarantee the 8-byte SID base, i.e. zero
sub-authorities. compare_sids() then reads ace->sid.sub_auth[i] for
i < min(local_sid->num_subauth, ace->sid.num_subauth). The local
comparison SIDs (sid_everyone, sid_unix_NFS_mode, and the id_to_sid()
result) always have at least one sub-authority, and an attacker controls
the ACE revision and authority bytes (which lie within the in-bounds SID
base), so they can match one of those SIDs and force the sub_auth read.
A crafted ACE with size == 16 and num_subauth >= 1 placed at the tail of
the security descriptor therefore causes a heap out-of-bounds read of up
to SID_MAX_SUB_AUTHORITIES * sizeof(__le32) bytes past the pntsd
allocation. The security descriptor is loaded by ksmbd_vfs_get_sd_xattr()
into a buffer sized exactly to the on-disk data (kzalloc(sd_size) in
ndr_decode_v4_ntacl()), so the read lands past the allocation. The
malformed descriptor can be stored verbatim via SMB2_SET_INFO (the DACL
is not normalised before being written to the security.NTACL xattr) and
the read fires on a subsequent SMB2_CREATE access check, making this
reachable by an authenticated client on a share that uses ACL xattrs.
Add the missing num_subauth-versus-ace_size check, mirroring the
identical guards already present in the sibling parsers parse_dacl() and
smb_inherit_dacl().
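The size arithmetic behind the missing check can be worked through in a standalone sketch. The struct definitions below are simplified stand-ins that reproduce the 8-byte ACE header and 8-byte SID base described above; they are assumptions for illustration, and ace_sid_fits() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SID_MAX_SUB_AUTHORITIES 15

/* Simplified stand-in for the on-wire SID layout. */
struct sid {
	uint8_t  revision;
	uint8_t  num_subauth;
	uint8_t  authority[6];
	uint32_t sub_auth[SID_MAX_SUB_AUTHORITIES];
};

/* Simplified stand-in for the on-wire ACE layout. */
struct ace {
	uint8_t    type;
	uint8_t    flags;
	uint16_t   size;
	uint32_t   access_req;
	struct sid sid;
};

#define SID_BASE_SIZE offsetof(struct sid, sub_auth) /* SID up to sub_auth[] */
#define ACE_HDR_SIZE  offsetof(struct ace, sid)      /* ACE up to the SID */

static_assert(SID_BASE_SIZE == 8, "SID base is 8 bytes");
static_assert(ACE_HDR_SIZE == 8, "ACE header is 8 bytes");

/* True if an ACE of 'ace_size' bytes can hold 'num_subauth' sub-authorities. */
bool ace_sid_fits(uint16_t ace_size, uint8_t num_subauth)
{
	if (num_subauth > SID_MAX_SUB_AUTHORITIES)
		return false;
	return ace_size >= ACE_HDR_SIZE + SID_BASE_SIZE +
			   sizeof(uint32_t) * num_subauth;
}
```

An ACE of size 16 therefore fits only a SID with zero sub-authorities; each sub-authority needs 4 more bytes, so num_subauth == 1 requires at least 20.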
Fixes: d07b26f39246 ("ksmbd: require minimum ACE size in smb_check_perm_dacl()")
Cc: stable@vger.kernel.org
Signed-off-by: Hem Parekh <hemparekh1596@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Both CoreI2C and the hardened versions of it on mpfs and pic64gx have a
reset pin. For the former, usually this is wired to a common fabric
reset not managed by software and for the latter two the platform
firmware takes them out of reset on first-party boards (or those using
modified versions of the vendor firmware), but not all boards may take
this approach. Permit providing a reset in devicetree for Linux, or
other devicetree-consuming software, to use.
Various names for Qualcomm as a company are used in user-visible config
options: QCOM, Qualcomm and Qualcomm Technologies. Switch to a unified
"Qualcomm" so that it is easier for users to identify the options, for
example when running menuconfig.
David Carlier [Wed, 6 May 2026 15:40:15 +0000 (16:40 +0100)]
i2c: ls2x-v2: return IRQ_HANDLED after servicing an error
The event ISR reads SR1 and, when an error flag (ARLO/AF/BERR) is set,
calls loongson2_i2c_isr_error() which clears the offending flag, issues
STOP for the AF case, records msg->result, masks every CR2 interrupt
enable and completes the waiter. The handler then returns IRQ_NONE,
declaring to the IRQ core that the device did not interrupt.
That report is wrong. The device did interrupt and the handler fully
serviced it. Because the IRQ is requested with IRQF_SHARED, the genirq
spurious-IRQ tracker counts each error as unhandled. A bus that emits
sporadic NACKs, arbitration losses or bus errors will therefore march
toward the spurious-IRQ threshold and the line can end up disabled,
wedging the controller.
Return IRQ_HANDLED on this path. The other IRQ_NONE site, taken when
neither an event nor an error bit is set, remains correct.
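The intended return-value contract can be sketched in a standalone model. The status-register bit masks and the struct below are hypothetical placeholders, not the ls2x-v2 register layout; only the IRQ_NONE/IRQ_HANDLED distinction is taken from the text.

```c
#include <stdint.h>

/* Stand-in for the kernel's irqreturn_t values. */
enum irq_result { IRQ_RESULT_NONE = 0, IRQ_RESULT_HANDLED = 1 };

/* Hypothetical bit positions, for illustration only. */
#define SR1_EVENT_MASK 0x00ffu
#define SR1_ERROR_MASK 0x0700u /* stands in for ARLO/AF/BERR */

struct ctrl_stats {
	unsigned int errors_serviced;
	unsigned int events_serviced;
};

/*
 * Report HANDLED whenever the device raised the interrupt and the
 * handler serviced it, including on the error path. NONE is reserved
 * for the case where neither an event nor an error bit is set, which
 * on a shared line means another device interrupted.
 */
enum irq_result event_isr(struct ctrl_stats *c, uint32_t sr1)
{
	if (sr1 & SR1_ERROR_MASK) {
		c->errors_serviced++;
		return IRQ_RESULT_HANDLED;
	}
	if (sr1 & SR1_EVENT_MASK) {
		c->events_serviced++;
		return IRQ_RESULT_HANDLED;
	}
	return IRQ_RESULT_NONE;
}
```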
ipv4: fib_rule: Move fib4_rules_exit() to ->exit().
syzbot reported use-after-free of net->ipv4.rules_ops. [0]
It can be reproduced with these commands:
  while true; do
      ip netns add ns1
      ip -n ns1 link set dev lo up
      ip -n ns1 address add 192.0.2.1/24 dev lo
      ip -n ns1 link add name dummy1 up type dummy
      ip -n ns1 address add 198.51.100.1/24 dev dummy1
      ip -n ns1 rule add ipproto tcp sport 12345 table 12345
      ip -n ns1 fou add port 5555 ipproto 47 local 192.0.2.1 peer 198.51.100.2 peer_port 54321
      ip netns del ns1
  done
The cited commit moved fib4_rules_exit() earlier to ->exit_rtnl(),
but the kernel socket destroyed in ->exit() could eventually reach
__fib_lookup().
I left fib4_rules_exit() in ->exit_rtnl() because fib4_rule_delete()
calls fib_unmerge(), which requires RTNL.
However, when ->delete() is called, ->configure() has already been
called, thus fib_unmerge() in ->delete() has no effect.
Let's remove fib_unmerge() in fib4_rule_delete() and move
fib4_rules_exit() to ->exit().
Many thanks to Ido Schimmel for providing the nice repro very quickly.
Note that we can make fib_rules_ops.delete() return void once
net-next opens.
[0]:
BUG: KASAN: slab-use-after-free in fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321
Read of size 8 at addr ffff88804ec4c680 by task kworker/u8:21/12641
Eric Dumazet [Tue, 16 Jun 2026 14:13:17 +0000 (14:13 +0000)]
net: serialize netif_running() check in enqueue_to_backlog()
Syzbot reported a KASAN slab-use-after-free in fib_rules_lookup().
The root cause is a race condition where packets can escape the backlog
flushing during device unregistration (e.g., during netns exit).
Commit e9e4dd3267d0 ("net: do not process device backlog during unregistration")
introduced a lockless netif_running() check in enqueue_to_backlog() to
prevent queuing packets to an unregistering device.
However, this creates a TOCTOU race window.
A lockless transmitter (like veth_xmit) can pass
the check before dev_close() clears IFF_UP. If the transmitter is then
delayed, flush_all_backlogs() can run and finish before the transmitter
grabs the backlog lock and queues the packet. The packet then escapes
the flush and triggers UAF later when processed.
Fix this by moving the netif_running() check inside the backlog lock.
This serializes the check with the flush work (which also grabs the lock).
We then either queue the packet before the flush runs (so it gets flushed),
or check netif_running() after the flush/close completes (so it gets dropped).
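The serialization argument can be illustrated with a userspace model that uses a pthread mutex in place of the backlog lock. The struct and function names are hypothetical, and the model collapses dev_close() and the flush work into one function.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one backlog queue and its device state. */
struct backlog {
	pthread_mutex_t lock;
	atomic_bool     running; /* stands in for netif_running() */
	int             queued;  /* number of packets in the backlog */
};

void backlog_init(struct backlog *b)
{
	pthread_mutex_init(&b->lock, NULL);
	atomic_init(&b->running, true);
	b->queued = 0;
}

/* Transmit side: the running check happens under the lock. */
bool backlog_enqueue(struct backlog *b)
{
	bool ok;

	pthread_mutex_lock(&b->lock);
	ok = atomic_load(&b->running);
	if (ok)
		b->queued++;
	pthread_mutex_unlock(&b->lock);
	return ok;
}

/* Teardown side: mark the device down first, then flush under the lock. */
void backlog_close_and_flush(struct backlog *b)
{
	atomic_store(&b->running, false);
	pthread_mutex_lock(&b->lock);
	b->queued = 0;
	pthread_mutex_unlock(&b->lock);
}
```

Because both sides take the same lock, an enqueue either completes before the flush (and its packet is flushed) or starts after it (and observes running == false). With the check outside the lock, a delayed transmitter could pass the check, lose the race to the flush, and still queue its packet.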
Fixes: e9e4dd3267d0 ("net: do not process device backlog during unregistration")
Reported-by: syzbot+965506b59a2de0b6905c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a315824.b0403584.28d0ff.0000.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260616141317.407791-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge in late fixes in preparation for the net-next PR.
Conflicts:
net/tls/tls_sw.c
  406e8a651a7b ("net: skmsg: preserve sg.copy across SG transforms")
  79511603a65b ("tls: remove dead sockmap (psock) handling from the SW path")
drivers/net/ethernet/microsoft/mana/mana_en.c
  f8fd56977eeea ("net: mana: guard TX wq object destroy with INVALID_MANA_HANDLE check")
  d07efe5a6e641 ("net: mana: Use per-queue allocation for tx_qp to reduce allocation size")
https://lore.kernel.org/ajAPXu-C_PuTgV-a@sirena.org.uk
Yiming Qian [Wed, 10 Jun 2026 06:21:36 +0000 (06:21 +0000)]
net: skmsg: preserve sg.copy across SG transforms
The sk_msg sg.copy bitmap is part of the scatterlist entry ownership
state. A set bit tells sk_msg_compute_data_pointers() not to expose the
entry through writable BPF ctx->data. This protects entries backed by
pages that are not private to the sk_msg, such as splice-backed file
page-cache pages.
Several sk_msg transform paths move, copy, split, or compact
msg->sg.data[] entries without moving the matching sg.copy bit. This can
make an externally backed entry arrive at a new slot with a clear copy
bit. A later SK_MSG verdict can then expose sg_virt(sge) as writable
ctx->data and BPF stores can modify the original page cache.
Keep sg.copy synchronized with sg.data[] whenever entries are
transferred, shifted, split, or copied into a new sk_msg. Clear the bit
when an entry is replaced by a newly allocated private page or freed.
This covers the BPF pull/push/pop helpers, sk_msg_shift_left/right(),
sk_msg_xfer(), and tls_split_open_record(), including the partial tail
entry created during TLS open-record splitting.
Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling")
Cc: stable@vger.kernel.org
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Reported-by: Keenan Dong <keenanat2000@gmail.com>
Signed-off-by: Yiming Qian <yimingqian591@gmail.com>
Link: https://patch.msgid.link/20260610062137.49075-1-yimingqian591@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
appletalk: move the protocol out of tree
This tiny series moves appletalk out of tree, to:
https://github.com/linux-netdev/mod-orphan
Core maintainers are unable to keep up with the rate of security
bug reports and fixes. Nobody seems to care about appletalk enough
to review the patches.
As Eric pointed out, Mac OS dropped AppleTalk over a decade ago.
====================
Jakub Kicinski [Mon, 15 Jun 2026 22:29:35 +0000 (15:29 -0700)]
appletalk: move the protocol out of tree
AppleTalk was removed in Mac OS X 10.6 (Snow Leopard), in 2009,
according to Wikipedia. We recently got a burst of AI-generated
fixes to this protocol which nobody is reviewing.
Let AppleTalk follow AX.25 and hamradio out of the Linux tree.
We will maintain the code at: github.com/linux-netdev/mod-orphan
for anyone interested in playing with it.
Retain the uAPI for now. No strong reason, simply because I suspect
keeping it will be less controversial.
Jakub Kicinski [Mon, 15 Jun 2026 22:29:34 +0000 (15:29 -0700)]
appletalk: stop storing per-interface state in struct net_device
AppleTalk keeps its per-interface control block (struct atalk_iface)
directly in struct net_device (dev->atalk_ptr). This is the only thing
tying the protocol into the core net_device layout and is the sole
blocker to moving AppleTalk out of tree.
Replace dev->atalk_ptr with a small ifindex-keyed hashtable internal
to ddp.c. The existing atalk_interfaces list stays the owner of the iface
objects; the hashtable is purely a fast dev->iface index and reuses
the same atalk_interfaces_lock.
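The idea of an ifindex-keyed index can be sketched as a small chained hashtable. This is a userspace illustration with hypothetical names and a hypothetical bucket count; it is not the ddp.c implementation, and locking is left to the caller, as the text says the existing lock is reused.

```c
#include <stddef.h>

#define IFACE_HASH_BITS 4
#define IFACE_HASH_SIZE (1u << IFACE_HASH_BITS)

/* Hypothetical per-interface control block; owned by some other list. */
struct iface {
	int           ifindex;
	struct iface *hash_next; /* chain within one bucket */
};

struct iface_table {
	struct iface *bucket[IFACE_HASH_SIZE];
};

static unsigned int iface_hash(int ifindex)
{
	return (unsigned int)ifindex & (IFACE_HASH_SIZE - 1);
}

/* Caller holds the lock that protects the table. */
void iface_table_add(struct iface_table *t, struct iface *i)
{
	unsigned int h = iface_hash(i->ifindex);

	i->hash_next = t->bucket[h];
	t->bucket[h] = i;
}

struct iface *iface_table_find(const struct iface_table *t, int ifindex)
{
	struct iface *i = t->bucket[iface_hash(ifindex)];

	while (i && i->ifindex != ifindex)
		i = i->hash_next;
	return i;
}

/* Unlink only; the object itself is freed by its owner. */
void iface_table_del(struct iface_table *t, struct iface *i)
{
	struct iface **pp = &t->bucket[iface_hash(i->ifindex)];

	while (*pp && *pp != i)
		pp = &(*pp)->hash_next;
	if (*pp)
		*pp = i->hash_next;
}
```

Keying on ifindex rather than on a pointer stored in the device is what removes the protocol's field from the core device structure.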
AFAICT this patch does not make this code any more racy than it already
is, though I'm sure Sashiko will point out some pre-existing bugs.
AFAICT atalk_interfaces_lock is the innermost lock already.
i3c: mipi-i3c-hci: Use named initializers for platform_device_id's .driver_data
The assignment in this driver uses a mixed way to initialize the
platform_device_id array. .name is assigned by name and .driver_data by
position. Unify that to use named assignment for both struct members.
This is needed for a planned change to struct platform_device_id
replacing .driver_data by an anonymous union.
Jacob Moroni [Tue, 16 Jun 2026 15:56:01 +0000 (15:56 +0000)]
RDMA/irdma: Replace waitqueue and flag with completion
The driver previously used a waitqueue along with an explicit
request_done flag, but without proper barriers around request_done.
An earlier patch by Gui-Dong Han <hanguidong02@gmail.com> attempted
to fix this by adding the missing memory barriers. Rather than
adding the barriers, this patch replaces the waitqueue+flag with
a completion, which is designed for this exact purpose.
Junxian Huang [Sat, 13 Jun 2026 10:20:45 +0000 (18:20 +0800)]
RDMA/hns: Fix memory leak of bonding resources
In a corner case of concurrent driver removal and driver reset, the
bonding resource is first released in hns_roce_hw_v2_exit() during
driver removal, and is then allocated again in hns_roce_register_device()
during driver reset. This leads to a memory leak because the release
point has already passed. It may also lead to a kernel panic
as below because of the leaked notifier callback:
Zhenhao Wan [Thu, 11 Jun 2026 17:15:54 +0000 (01:15 +0800)]
RDMA/rtrs-srv: Bound RDMA-Write length to chunk size in rdma_write_sg
When the server answers an RTRS READ, rdma_write_sg() builds the source
scatter/gather entry for the IB_WR_RDMA_WRITE that returns data to the
peer. Its length is taken directly from the wire descriptor:
rd_msg points into the chunk buffer that the remote peer filled via
RDMA-WRITE-WITH-IMM (rtrs_srv_rdma_done() -> process_io_req() ->
process_read()), so desc[0].len is attacker-controlled and, before this
change, was only rejected when zero. The source address is the fixed
chunk start (dma_addr[msg_id]) and the source lkey is the PD-wide
local_dma_lkey, which is not tied to the chunk's MR mapping, so the verbs
layer does not constrain the transfer length to max_chunk_size. msg_id
and off are bounded against queue_depth and max_chunk_size in
rtrs_srv_rdma_done(), but desc[0].len is a separate field that was not
checked against the chunk size.
A peer that advertises desc[0].len larger than max_chunk_size can make
the posted RDMA write read past the chunk's mapped region. The resulting
behaviour depends on the IOMMU configuration: with no IOMMU or in
passthrough mode the read may extend into memory adjacent to the chunk
and be returned to the peer, which can disclose host memory; with a
translating IOMMU the out-of-range access is expected to fault and abort
the connection. In either case the transfer exceeds what the protocol
permits and is driven by a remote peer.
Reject a descriptor length above max_chunk_size, mirroring the existing
off >= max_chunk_size bound in rtrs_srv_rdma_done(). Legitimate clients
do not exceed it: the client sets desc[0].len to its MR length, which is
capped at the negotiated max_io_size (max_chunk_size - MAX_HDR_SIZE).
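The bound itself is a one-line comparison. A minimal sketch, assuming a hypothetical helper name check_desc_len() and -EINVAL as the rejection code:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Validate a peer-supplied descriptor length (already converted to
 * host byte order) against the chunk size. A zero length was already
 * rejected before the change; the upper bound is the new part.
 */
int check_desc_len(uint32_t len, uint32_t max_chunk_size)
{
	if (len == 0 || len > max_chunk_size)
		return -EINVAL;
	return 0;
}
```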
Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality")
Link: https://patch.msgid.link/r/20260612-master-v1-1-70cde5c6fdc9@gmail.com
Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Zhenhao Wan <whi4ed0g@gmail.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
docs: infiniband: correct name of option to enable the ib_uverbs module
The InfiniBand documentation states that CONFIG_INFINIBAND_USER_VERBS
should be used to enable the ib_uverbs module. However, this option was
renamed to CONFIG_INFINIBAND_USER_ACCESS in commit 17781cd6186c
("[PATCH] IB: clean up user access config options"). Update the
documentation to reflect this.
Selvin Xavier [Mon, 15 Jun 2026 22:47:51 +0000 (15:47 -0700)]
RDMA/bnxt_re: Reject GET_TOGGLE_MEM when toggle page was not allocated
If a user calls BNXT_RE_METHOD_GET_TOGGLE_MEM on a device that does not
support the CQ/SRQ toggle feature, uctx_cq_page or uctx_srq_page will
be NULL.
Add an explicit -EOPNOTSUPP return after capturing the address from
uctx_cq_page / uctx_srq_page if the address is zero.
Fixes: e275919d9669 ("RDMA/bnxt_re: Share a page to expose per CQ info with userspace")
Fixes: 181028a0d84c ("RDMA/bnxt_re: Share a page to expose per SRQ info with userspace")
Link: https://patch.msgid.link/r/20260615224751.232802-16-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Selvin Xavier [Mon, 15 Jun 2026 22:47:47 +0000 (15:47 -0700)]
RDMA/bnxt_re: Avoid repeated requests to allocate WC pages
Applications can request multiple WC pages for the same ucontext.
As of now, only 1 WC page per ucontext is supported. Add a lock to
avoid concurrent access and a check to fail repeated requests.
Also, if the mmap entry insert fails for the WC, free the Doorbell
page index mapped for the WC page.
Fixes: eee6268421a2 ("RDMA/bnxt_re: Move the UAPI methods to a dedicated file")
Fixes: 360da60d6c6e ("RDMA/bnxt_re: Enable low latency push")
Link: https://patch.msgid.link/r/20260615224751.232802-12-selvin.xavier@broadcom.com
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
====================
tls: reject the combination of TLS and sockmap
There are no known TLS+sockmap users and the combination has some known
hard-to-solve bugs. Let's reject this configuration, as we have
discussed a number of times.
====================
Jakub Kicinski [Sun, 14 Jun 2026 01:41:00 +0000 (18:41 -0700)]
selftests/bpf: test that TLS crypto is rejected on a sockmap socket
TLS and sockmap are mutually exclusive. We already have a test
for the sockmap side rejecting kTLS; add the inverse test, matching
patch 1 of this series.
Jakub Kicinski [Sun, 14 Jun 2026 01:40:59 +0000 (18:40 -0700)]
selftests/bpf: drop the unused kTLS program from test_sockmap
With the sockmap + kTLS tests gone, the BPF-side support in test_sockmap
is dead: the tls_sock_map map and bpf_prog3 (which redirected skbs into
it) are no longer referenced. Remove them, along with the now-unused
bpf_write_pass() helper.
bpf_prog3 was progs[2], so renumber the progs[] users in test_sockmap.c:
the sockops program drops to progs[2] and the sk_msg tx programs to
progs[3..7]. Shrink the map/prog arrays from 9 to 8 and drop the
tls_sock_map entry (the last one) from map_names[] to match.
Jakub Kicinski [Sun, 14 Jun 2026 01:40:58 +0000 (18:40 -0700)]
selftests/bpf: remove sockmap + ktls tests
The combination of sockmap and TLS is no longer supported - installing
the TLS ULP on a sockmap socket (and vice versa) is now rejected. Remove
the tests that exercise the combination along with their BPF program;
the file covered nothing but sockmap sockets holding kTLS contexts.
Jakub Kicinski [Sun, 14 Jun 2026 01:40:57 +0000 (18:40 -0700)]
tls: remove dead sockmap (psock) handling from the SW path
TLS and sockmap are now mutually exclusive. Try to delete the code
from the sendmsg and recvmsg paths which is now obviously dead.
The main goal is to delete enough code for AI security scanners
to no longer bother us with sockmap related bugs. At the same
time retain the code in case someone has the cycles to fix
all of this and make the integration work, again.
If the integration does not get restored we can wipe the rest
of the skmsg code from TLS in two or three releases.
The changes on the Tx side are deeper since that's where most
of the bugs are; the Rx side simply takes the data from sockmap
and gives it to the user. On Tx, split record handling and
rolling back the iterator were the two problem areas.
Jakub Kicinski [Sun, 14 Jun 2026 01:40:56 +0000 (18:40 -0700)]
tls: reject the combination of TLS and sockmap
TLS and sockmap (BPF psock) integration hides a lot of latent bugs.
These bugs may be more or less relevant for real users, but they
are definitely exploitable.
We could not find anyone actively using this integration so let's
reject this config. Adding a TLS socket to a sockmap was already
rejected by sk_psock_init() through the inet_csk_has_ulp() check.
We need to reject the attempts to configure the TLS keys (rather
than adding the ULP itself) because checking prior to the ULP
installation is tricky without risking a race with sockmap getting
added in parallel (sockmap does not hold the socket lock).
This patch is a minimal rejection of the feature. A subsequent patch
in the series will do a light dead code removal. Full cleanup would
require a major rewrite of the Tx path, as we don't need skmsg any more.
Jakub Kicinski [Tue, 16 Jun 2026 15:53:56 +0000 (08:53 -0700)]
Merge branch 'atm-remove-more-dead-code'
Jakub Kicinski says:
====================
atm: remove more dead code
Commit 6deb53595092 ("net: remove unused ATM protocols and legacy
ATM device drivers") removed a good chunk of old ATM drivers.
Our goal going forward is to limit the ATM support to PPPoATM
used in ADSL deployments.
A recent burst of AI generated fixes for net/atm/signaling.c and
net/atm/svc.c made me look closer at the remaining code. PPPoATM runs
over permanent virtual circuits (PF_ATMPVC) with a statically
configured VPI/VCI. We can drop switched virtual circuits (SVCs)
and user-space signaling (atmsigd) support. While digging around
I noticed a few more obviously dead pieces of code.
Annoyingly, I have applied one "fix" to QoS config which will
now make net conflict with this series :/
====================
Jakub Kicinski [Mon, 15 Jun 2026 19:44:16 +0000 (12:44 -0700)]
atm: remove orphaned uAPI for deleted drivers, protocols and SVCs
ATM removals have left a number of uAPI headers and ioctl
definitions with no in-kernel implementation behind them:
- device headers for adapters deleted with the legacy PCI/SBUS drivers:
atm_eni.h, atm_he.h, atm_idt77105.h, atm_nicstar.h, atm_zatm.h and
the atmtcp pair atm_tcp.h / <linux/atm_tcp.h>
- protocol headers for the removed CLIP, LANE and MPOA stacks:
atmarp.h, atmclip.h, atmlec.h, atmmpc.h
- atmsvc.h and the SVC / p2mp / local-address ioctls in atmdev.h
(ATM_{GET,RST,ADD,DEL}ADDR, ATM_{ADD,DEL,GET}LECSADDR,
ATM_{ADD,DROP}PARTY) left behind by the SVC and address-registry
removals
None of these are referenced by any remaining in-tree code.
Let's try to delete all this. Chances are nobody cares about
these headers any more. I'm keeping this separate from the
kernel side code changes for ease of revert, in case I am
proven wrong...
Jakub Kicinski [Mon, 15 Jun 2026 19:44:15 +0000 (12:44 -0700)]
atm: remove unused ATM PHY operations
The PHY operations are vestiges of the SAR/framer split used by the
removed PCI/SBUS ATM adapters:
- atmdev_ops::phy_put / ::phy_get (register accessors) are never called
by the core and solos-pci only listed them as NULL
- struct atmphy_ops and atm_dev::phy have no users at all - nothing
assigns or dereferences them
Remove all of them. atm_dev::phy_data is kept: solos-pci repurposes it
to stash its per-port channel index.
Jakub Kicinski [Mon, 15 Jun 2026 19:44:14 +0000 (12:44 -0700)]
atm: remove the unused pre_send and send_bh device operations
atmdev_ops::pre_send (a TX pre-processing hook) and ::send_bh (a
bottom-half capable send variant) have no implementation behind them:
no remaining ATM driver sets either, so vcc_sendmsg() always skipped
pre_send and the raw AAL0/AAL5 paths always fell back to ->send().
The drivers that used these hooks were removed with the legacy ATM
adapters.
Drop both operations and the dead branches that tested for them.
Jakub Kicinski [Mon, 15 Jun 2026 19:44:13 +0000 (12:44 -0700)]
atm: remove the unused change_qos device operation
atmdev_ops::change_qos() was the hook for renegotiating the traffic
parameters of an already-connected VCC, driven from SO_ATMQOS on a
connected socket (and previously from the SVC as_modify path, now gone).
None of the ATM drivers left in tree implement it - solos-pci only listed
change_qos = NULL - so atm_change_qos() always returned -EOPNOTSUPP.
Drop the operation and return -EOPNOTSUPP directly.
Jakub Kicinski [Mon, 15 Jun 2026 19:44:12 +0000 (12:44 -0700)]
atm: remove SVC socket support and the signaling daemon interface
ATM switched virtual circuits (SVCs) are set up and torn down by a
user-space signaling daemon (atmsigd) which the kernel talks to over
a dedicated "sigd" socket: the kernel marshals Q.2931-style requests
(as_connect, as_listen, as_accept, as_close, ...) to the daemon and
applies the results to PF_ATMSVC sockets. This is the machinery behind
classical SVC use and was the foundation for LANE / MPOA, all of which
have been removed.
DSL deployments do not use any of this. PPPoATM and BR2684 run over
permanent virtual circuits (PF_ATMPVC) with a statically configured
VPI/VCI; no atmsigd, no Q.2931. Neither remaining ATM driver
(solos-pci, the USB DSL modems) is reachable through the SVC path.
Remove the SVC socket family and the signaling interface:
- delete net/atm/svc.c, net/atm/signaling.c and signaling.h
- drop atmsvc_init()/atmsvc_exit() and the PF_ATMSVC registration and
module alias
- drop the ATMSIGD_CTRL ioctl (sigd_attach) and the /proc/net/atm/svc
file
- fold the SVC branch out of atm_change_qos(); all sockets are PVCs now
The obsolete ATM_SETSC ioctl stub is left in place (it already just
warns and returns 0), as is the struct atm_vcc SVC bookkeeping shared
with the queueing layer.
Jakub Kicinski [Mon, 15 Jun 2026 19:44:11 +0000 (12:44 -0700)]
atm: remove the local ATM (NSAP) address registry
net/atm/addr.c maintained the per-device lists of local NSAP addresses
(dev->local) and ILMI-learned LECS addresses (dev->lecs). These exist
solely to serve SVC signaling: the lists are populated through the
ATM_{ADD,DEL,RST}ADDR / ATM_{ADD,DEL,GET}LECSADDR ioctls used by the
atmsigd / ILMI daemons, and consumed when registering addresses with the
signaling daemon. The LECS list belonged to LAN Emulation, which has
been removed.
With no SVC users in a DSL-only configuration these lists are always
empty, so drop the registry entirely:
- remove the ADDR/LECSADDR/RSTADDR ioctls
- drop the now-always-empty "atmaddress" sysfs attribute
- remove the dev->local / dev->lecs lists, structs and enums
- delete net/atm/addr.c and net/atm/addr.h
The device ESI ("MAC" address) and its ATM_{G,S}ETESI ioctls and
"address" sysfs attribute are retained - the USB DSL modems populate
the ESI.
Jakub Kicinski [Mon, 15 Jun 2026 19:44:10 +0000 (12:44 -0700)]
atm: remove dead SONET PHY ioctls
The SONET_* ioctls are SONET/SDH PHY controls that atm_dev_ioctl() and
the compat path only ever forwarded to the driver's ->ioctl() handler.
The PHY drivers that implemented them (the S/UNI library and the framers
on the removed PCI/SBUS adapters) are gone, and neither surviving driver
services them: solos-pci has no ->ioctl, and usbatm handles only
ATM_QUERYLOOP. They now uniformly return an error regardless.
Drop the SONET compat passthrough and the SONET cases in atm_dev_ioctl(),
along with the now-unused linux/sonet.h includes. The SONET_* uAPI
definitions are untouched.
Jakub Kicinski [Mon, 15 Jun 2026 19:44:09 +0000 (12:44 -0700)]
atm: remove the unused send_oam / push_oam callbacks
The atmdev_ops::send_oam device operation and the atm_vcc::push_oam
callback were the kernel's interface for raw F4/F5 OAM cell exchange.
Nothing assigns them a non-NULL value and nothing ever invokes them:
the core only ever initialises push_oam to NULL (in vcc_create() and the
AAL init helpers) and the Solos driver only lists send_oam = NULL for
documentation. The drivers that actually drove OAM through these hooks
were removed along with the legacy ATM adapters.
Jakub Kicinski [Mon, 15 Jun 2026 19:44:08 +0000 (12:44 -0700)]
atm: remove AAL3/4 transport support
AAL3/4 is an obsolete connection-oriented ATM adaptation layer that has
seen no real use since the SMDS-era hardware it was designed for (90s?).
We are only maintaining ATM support in-tree to keep PPPoATM running,
and PPPoATM runs over AAL5.
Drop the "raw" AAL3/4 transport (atm_init_aal34()) and the ATM_AAL34
cases in the connect and traffic-parameter paths. A vcc_connect() with
qos.aal == ATM_AAL34 now fails with -EPROTOTYPE.
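The effect of dropping the ATM_AAL34 cases can be sketched in userspace C. This is only an illustration under stated assumptions: the enum values, check_aal_sketch() and the use of a default branch are hypothetical and are not taken from net/atm; only the -EPROTOTYPE result comes from the text above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical AAL identifiers for this sketch; the real values live in the ATM uAPI headers. */
enum aal_sketch {
	AAL0_SKETCH,
	AAL34_SKETCH,
	AAL5_SKETCH,
};

/* Accept only the adaptation layers that still have a transport behind them. */
int check_aal_sketch(enum aal_sketch aal)
{
	switch (aal) {
	case AAL0_SKETCH:
	case AAL5_SKETCH:
		return 0;
	default:
		/* AAL3/4 has no case of its own in this sketch, so it is refused here. */
		return -EPROTOTYPE;
	}
}

/* Usage example: AAL5 (what PPPoATM runs over) is accepted, AAL3/4 is not. */
void check_aal_demo(void)
{
	assert(check_aal_sketch(AAL5_SKETCH) == 0);
	assert(check_aal_sketch(AAL34_SKETCH) == -EPROTOTYPE);
}
```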
Finalize commit c33c794828f2 ("mm: ptep_get() conversion") and
replace direct page table entry dereferencing with the proper
accessors (ptep_get(), pmdp_get(), etc.).
Override the default getter implementations even though they are
currently identical: pud_clear(), p4d_clear(), and pgd_clear()
require corresponding architecture-specific getters, but these
are not yet defined. This avoids a dependency loop.
Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Selvin Xavier [Mon, 15 Jun 2026 22:47:45 +0000 (15:47 -0700)]
RDMA/bnxt_re: Add a max slot check for SQ
The variable WQE mode must be validated against
the maximum slots supported by HW. The max supported
value is 64K. Add max and min checks and fail if the user
supplied value is more than the max supported or is zero.
Fixes: d8ea645d6984 ("RDMA/bnxt_re: Handle variable WQE support for user applications") Link: https://patch.msgid.link/r/20260615224751.232802-10-selvin.xavier@broadcom.com Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
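The range check described above can be sketched as follows. This is a minimal userspace illustration, not the driver's code: SQ_MAX_SLOTS_SKETCH, check_sq_slots_sketch() and the choice of -EINVAL as the error are assumptions; only the 64K maximum and the rejection of zero come from the commit text.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed limit for this sketch: the commit text gives 64K as the hardware maximum. */
#define SQ_MAX_SLOTS_SKETCH (64u * 1024u)

/* Return 0 if the user-supplied slot count is usable, -EINVAL if it is zero or too large. */
int check_sq_slots_sketch(uint32_t slots)
{
	if (slots == 0 || slots > SQ_MAX_SLOTS_SKETCH)
		return -EINVAL;
	return 0;
}

/* Usage example: exercise both boundaries of the accepted range. */
void check_sq_slots_demo(void)
{
	assert(check_sq_slots_sketch(0) == -EINVAL);
	assert(check_sq_slots_sketch(1) == 0);
	assert(check_sq_slots_sketch(SQ_MAX_SLOTS_SKETCH) == 0);
	assert(check_sq_slots_sketch(SQ_MAX_SLOTS_SKETCH + 1) == -EINVAL);
}
```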
Selvin Xavier [Mon, 15 Jun 2026 22:47:44 +0000 (15:47 -0700)]
RDMA/bnxt_re: Avoid displaying the kernel pointer
While dumping the info on MR using the rdma tool, we
dump the mr_hwq which is a kernel pointer. There is
no need to expose this value to the end user, so stop
dumping it.
Fixes: 7363eb76b7f3 ("RDMA/bnxt_re: Support driver specific data collection using rdma tool") Link: https://patch.msgid.link/r/20260615224751.232802-9-selvin.xavier@broadcom.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Selvin Xavier [Mon, 15 Jun 2026 22:47:39 +0000 (15:47 -0700)]
RDMA/bnxt_re: Free CQ toggle page after firmware teardown
Free the toggle page only after firmware teardown completes so that
an NQ interrupt arriving during bnxt_qplib_destroy_cq() won't write
the toggle value to an already-freed page. Move free_page() after
bnxt_qplib_destroy_cq().
Selvin Xavier [Mon, 15 Jun 2026 22:47:38 +0000 (15:47 -0700)]
RDMA/bnxt_re: Free SRQ toggle page after firmware teardown
Free the toggle page only after firmware teardown completes so that
an NQ interrupt arriving during bnxt_qplib_destroy_srq() won't write
the toggle values to an already-freed page. Move free_page() after
bnxt_qplib_destroy_srq().
===================
This is supposed to improve s390 idle time accounting, and brings it
back to the state it was in before arch_cpu_idle_time() was removed from
s390 [3].
As a result, all cpu time accounting is done by the s390 architecture backend
again, instead of having a mix of architecture specific and common code
accounting (common code: idle, s390 architecture: everything else).
===================
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
The YAML conversion added me as maintainer but I can't recall being
asked nor do I want to maintain it. Thierry has created the YAML file
and works for the company which contributed the driver.
dt-bindings: i2c: convert i2c-mux-reg to DT schema
Convert Documentation/devicetree/bindings/i2c/i2c-mux-reg.txt to
the YAML schema so the i2c-mux-reg binding is validated by
dt_binding_check. Faithful port of the existing properties; no
semantic change.
Haoxiang Li [Wed, 10 Jun 2026 03:05:13 +0000 (11:05 +0800)]
i2c: davinci: Unregister cpufreq notifier on probe failure
davinci_i2c_probe() registers a cpufreq transition notifier before adding
the I2C adapter. If i2c_add_numbered_adapter() fails, the probe error path
releases the device resources without unregistering the notifier.
Add a dedicated error path to unregister the cpufreq notifier after
i2c_add_numbered_adapter() fails.
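The shape of such a dedicated error path can be sketched in plain C. Everything named *_sketch below is a hypothetical stand-in (a counter replaces the real cpufreq notifier, a boolean forces the adapter failure); the sketch only illustrates the goto-based unwinding, not the actual davinci_i2c_probe() code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping: how many notifiers are currently registered. */
static int notifier_count;

static int register_notifier_sketch(void)
{
	notifier_count++;
	return 0;
}

static void unregister_notifier_sketch(void)
{
	notifier_count--;
}

/* Stand-in for adding the adapter; fails on request. */
static int add_adapter_sketch(bool fail)
{
	return fail ? -1 : 0;
}

int probe_sketch(bool adapter_fails)
{
	int ret;

	ret = register_notifier_sketch();
	if (ret)
		return ret;

	ret = add_adapter_sketch(adapter_fails);
	if (ret)
		goto err_unregister_notifier;

	return 0;

err_unregister_notifier:
	/* Undo the registration so a failed probe leaves nothing behind. */
	unregister_notifier_sketch();
	return ret;
}

/* Usage example: a failed probe must not leak the notifier. */
void probe_demo(void)
{
	assert(probe_sketch(true) == -1);
	assert(notifier_count == 0);
}
```

Without the err_unregister_notifier label the counter would stay at 1 after the failed probe, which is the leak the commit describes.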
In cci_probe() the controller's interrupt is requested using a devres
managed API, so in the cci_probe() error path and in cci_remove() it is
safe to rely on the devres mechanism to free and shut down the interrupt.
The explicit disable_irq() calls are therefore unnecessary and can be removed.
Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org> Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/20260515234121.1607425-5-vladimir.zapolskiy@linaro.org
i2c: qcom-cci: Move cci_init() under cci_reset() function
On probe or runtime errors cci_reset() is called and it should be coupled
with cci_init(). Instead of doing this on the caller's side, embed cci_init()
directly into the cci_reset() function.
This is a non-functional change, cci_reset() and cci_init() function
bodies are reordered.
Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org> Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/20260515234121.1607425-4-vladimir.zapolskiy@linaro.org
Linus Torvalds [Tue, 16 Jun 2026 12:20:34 +0000 (17:50 +0530)]
Merge tag 'trace-tools-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull RTLA tool updates from Steven Rostedt:
- Fix discrepancy in --dump-tasks option
Due to a mistake, rtla-timerlat-hist used the CLI syntax
"--dump-task" instead of the documented "--dump-tasks". Change the
option to match both documentation and the other timerlat tool,
rtla-timerlat-top.
- Extend coverage of runtime tests
Cover both top and hist tools in all applicable test cases, add tests
for a few uncovered options, and extend checks for some existing
tests.
- Add unit tests for actions
rtla's actions feature is implemented in its source file and contains
non-trivial parsing logic. Cover it with unit tests.
- Stop record trace on interrupt
Fix a bug where an interval exists after receiving a signal in which
the main instance is stopped but the record instance is not, leading
to discrepancies in reported results and sometimes rtla hanging.
- Restore continue flag in actions_perform()
Fix a bug where rtla always continues tracing after hitting a
threshold even if the continue action was triggered just once, and
add tests verifying that the flag is reset properly.
- Migrate command line interface to libsubcmd
Replace rtla's argument parsing using getopt_long() with libsubcmd,
used by perf and objtool, to reuse existing code and auto-generate
better help messages. Extensive unit tests are included to detect
regressions.
- Add -A/--aligned option to timerlat tools
Add an option to align timerlat threads, based on the recently
introduced TIMERLAT_ALIGN option of the timerlat tracer, together
with unit tests and documentation.
- Document tests in README
Document how to run unit and runtime tests in rtla's README.txt,
including the dependencies needed to run them.
* tag 'trace-tools-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (26 commits)
rtla: Document tests in README
Documentation/rtla: Add -A/--aligned option
rtla/tests: Add unit tests for -A/--aligned option
rtla/timerlat: Add -A/--aligned CLI option
rtla/tests: Add unit tests for CLI option callbacks
rtla/tests: Add unit tests for _parse_args() functions
rtla: Parse cmdline using libsubcmd
tools subcmd: allow parsing distinct --opt and --no-opt
tools subcmd: support optarg as separate argument
rtla: Add libsubcmd dependency
rtla/tests: Add runtime tests for restoring continue flag
rtla/tests: Run runtime tests in temporary directory
rtla/tests: Add unit test for restoring continue flag
rtla/actions: Restore continue flag in actions_perform()
rtla: Stop the record trace on interrupt
rtla/tests: Add unit tests for actions module
rtla/tests: Add runtime tests for -C/--cgroup
rtla/tests: Add runtime test for -k and -u options
rtla/tests: Add runtime test for -H/--house-keeping
rtla/tests: Cover all hist options in runtime tests
...
Linus Torvalds [Tue, 16 Jun 2026 12:08:19 +0000 (17:38 +0530)]
Merge tag 'trace-latency-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing latency updates from Steven Rostedt:
- Dump the stack to the buffer on timerlat uret threshold event
Record the stack trace in the buffer for THREAD_URET as well as
THREAD_CONTEXT when the threshold is hit. Otherwise, if the threshold
was not hit at task wakeup, but was at task return, it will not
produce a stack trace making it harder to debug.
- Have osnoise trace prints print to all buffers
The osnoise tracer is allowed to print to the main buffer. Add an
osnoise_print() helper function and use trace_array_vprintk() to
print osnoise output.
* tag 'trace-latency-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/osnoise: Array printk init and cleanup
tracing/osnoise: Dump stack on timerlat uret threshold event
Linus Torvalds [Tue, 16 Jun 2026 12:03:20 +0000 (17:33 +0530)]
Merge tag 'probes-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:
- BTF support for dereferencing pointers
Add syntax to the parsing of eprobes to typecast structure pointer
trace event fields, enabling BTF-based dereferencing instead of
relying on manual offsets.
- Improvements and robustness enhancements
- Use flexible array for entry fetch code.
Store probe entry fetch instructions in the probe_entry_arg
allocation via a flexible array member to simplify memory
allocation and lifetime management.
- Replace BUG_ON with lockdep_assert_held in uprobe_buffer functions
Replace BUG_ON() calls with lockdep_assert_held() in uprobe buffer
enable/disable paths to prevent kernel crashes and better verify
lock ownership.
- Ensure the uprobe buffer size is bigger than event size.
Add a BUILD_BUG_ON() assertion to guarantee that the per-CPU
uprobe working buffer size is always larger than the maximum probe
event size.
* tag 'probes-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/eprobes: Allow use of BTF names to dereference pointers
tracing: Replace BUG_ON with lockdep_assert_held in uprobe_buffer functions
tracing: Use flexible array for entry fetch code
tracing/probes: Ensure the uprobe buffer size is bigger than event size
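The flexible array item in the probes pull above can be illustrated with a small userspace sketch. The struct and function names are hypothetical and do not mirror the kernel's probe_entry_arg layout; malloc()/free() stand in for the kernel allocators. The point is only that header and instructions share one allocation and one lifetime.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a probe fetch instruction. */
struct fetch_insn_sketch {
	int op;
	long param;
};

/* Header and instruction storage share one allocation via the flexible array member. */
struct entry_arg_sketch {
	size_t nr_insns;
	struct fetch_insn_sketch code[];
};

/* Allocate the header plus n instructions in one block and copy the instructions in. */
struct entry_arg_sketch *entry_arg_alloc_sketch(const struct fetch_insn_sketch *src, size_t n)
{
	struct entry_arg_sketch *arg;

	/* Refuse counts whose byte size would overflow size_t. */
	if (n > (SIZE_MAX - sizeof(*arg)) / sizeof(arg->code[0]))
		return NULL;

	arg = malloc(sizeof(*arg) + n * sizeof(arg->code[0]));
	if (!arg)
		return NULL;

	arg->nr_insns = n;
	if (n)
		memcpy(arg->code, src, n * sizeof(arg->code[0]));
	return arg;
}

/* Usage example: build a two-instruction entry and release it. */
void entry_arg_demo(void)
{
	const struct fetch_insn_sketch insns[] = { { 1, 8 }, { 2, 16 } };
	struct entry_arg_sketch *arg = entry_arg_alloc_sketch(insns, 2);

	assert(arg != NULL);
	assert(arg->nr_insns == 2);
	assert(arg->code[1].param == 16);
	free(arg);	/* a single free() releases header and instructions together */
}
```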
Linus Torvalds [Tue, 16 Jun 2026 11:59:24 +0000 (17:29 +0530)]
Merge tag 'bootconfig-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull bootconfig updates from Masami Hiramatsu:
- bootconfig: move xbc_snprint_cmdline() to lib/bootconfig.c
Move the xbc_snprint_cmdline() function and its buffer from
main.c to the shared lib/bootconfig.c parser library so it
can be reused by userspace tools.
- render kernel.* subtree as cmdline string with -C
Add a new -C option to print the kernel.* subtree as a flat
command-line string at build time, allowing early parameter
injection without runtime parsing.
* tag 'bootconfig-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tools/bootconfig: render kernel.* subtree as cmdline string with -C
bootconfig: move xbc_snprint_cmdline() to lib/bootconfig.c
Linus Torvalds [Tue, 16 Jun 2026 11:19:07 +0000 (16:49 +0530)]
Merge tag 'linux_kselftest-next-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest updates from Shuah Khan:
"Several fixes and improvements to resctrl tests and a change to
kselftest document to clarify the use of FORCE_TARGETS build variable"
* tag 'linux_kselftest-next-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kselftest: fix doc for ksft_test_result_report()
selftests/resctrl: Reduce L2 impact on CAT test
selftests/resctrl: Simplify perf usage in CAT test
selftests/resctrl: Remove requirement on cache miss rate
selftests/resctrl: Raise threshold at which MBM and PMU values are compared
selftests/resctrl: Increase size of buffer used in MBM and MBA tests
selftests/resctrl: Support multiple events associated with iMC
selftests/resctrl: Prepare for parsing multiple events per iMC
selftests/resctrl: Do not store iMC counter value in counter config structure
selftests/resctrl: Reduce interference from L2 occupancy during cache occupancy test
selftests/resctrl: Improve accuracy of cache occupancy test
docs: kselftest: Document the FORCE_TARGETS build variable
Linus Torvalds [Tue, 16 Jun 2026 11:03:57 +0000 (16:33 +0530)]
Merge tag 'linux_kselftest-kunit-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit updates from Shuah Khan:
"Fixes to tool and kunit core and new features to both to support JUnit
XML (primitive) and backtrace suppression API:
- Core support for suppressing warning backtraces
- Parse and print the reason tests are skipped
- Add (primitive) support for outputting JUnit XML
- Don't write to stdout when it should be disabled
- Add backtrace suppression self-tests
- Suppress intentional warning backtraces in scaling unit tests
- Add documentation for warning backtrace suppression API
- Fix spelling mistakes in comments and messages
- gen_compile_commands: Ignore libgcc.a
- qemu_configs: Add or1k / openrisc configuration"
* tag 'linux_kselftest-kunit-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit:tool: Don't write to stdout when it should be disabled
kunit: tool: Add (primitive) support for outputting JUnit XML
kunit: tool: Parse and print the reason tests are skipped
kunit: Add documentation for warning backtrace suppression API
drm: Suppress intentional warning backtraces in scaling unit tests
kunit: Add backtrace suppression self-tests
bug/kunit: Core support for suppressing warning backtraces
kunit: Fix spelling mistakes in comments and messages
kunit: qemu_configs: Add or1k / openrisc configuration
gen_compile_commands: Ignore libgcc.a
Sayali Patil [Tue, 16 Jun 2026 06:14:35 +0000 (11:44 +0530)]
powerpc/fadump: define MIN_RMA in bytes rather than MB
The MIN_RMA size checks in fadump_setup_param_area() use
(MIN_RMA * 1024 * 1024), which is evaluated in int and can
overflow when MIN_RMA is increased to values such as SZ_2G,
triggering compiler warnings such as:
warning: integer overflow in expression of type 'int'
results in '0' [-Woverflow]
Define MIN_RMA directly in bytes using SZ_1M and update the
callers accordingly. This avoids repeated unit conversions and
prevents integer overflow.
Also convert MIN_RMA back to MB when populating the firmware
architecture vector, since firmware expects the value in MB.
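The unit change can be sketched in userspace C. The macro names and the 2048 MB figure are illustrative assumptions, not the kernel's definitions; the sketch only shows that a constant defined in bytes with a wide type avoids the int multiplication, and that consumers expecting MB divide it back.

```c
#include <assert.h>

/* One mebibyte, typed unsigned long long so the multiplication below is not done in int. */
#define SZ_1M_SKETCH (1024ULL * 1024ULL)

/* Illustrative size only: 2048 MB expressed directly in bytes. */
#define MIN_RMA_SKETCH (2048 * SZ_1M_SKETCH)

/* Convert the byte-based constant back to MB for a consumer that expects MB. */
unsigned long long min_rma_in_mb_sketch(void)
{
	return MIN_RMA_SKETCH / SZ_1M_SKETCH;
}

/* Usage example: the byte value no longer fits in a 32-bit int, yet nothing overflows. */
void min_rma_demo(void)
{
	assert(MIN_RMA_SKETCH == 2147483648ULL);
	assert(min_rma_in_mb_sketch() == 2048ULL);
}
```

By contrast, an expression of the form (2048 * 1024 * 1024) is evaluated entirely in int and exceeds INT_MAX, which is the situation the compiler warning quoted above reports.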
powerpc: Restore KUAP registers on syscall restart exit
During a syscall restart, KUAP is blocked so that pending interrupts can be
replayed. The original KUAP state is not restored before returning to
userspace, causing subsequent userspace accesses to fault and eventually
trigger bad_access_pkey(), crashing the kernel.
The original KUAP register values are already saved in
arch_enter_from_user_mode(). Restore them on the syscall restart exit
path before returning to userspace.
Linus Torvalds [Tue, 16 Jun 2026 07:50:54 +0000 (13:20 +0530)]
Merge tag 'for-7.2/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mikulas Patocka:
- small cleanups in dm-vdo, dm-raid, dm-cache, dm-zoned-metadata
- rework of dm-ima
- introduce dm-inlinecrypt
- fix wrong return value in dm-ioctl
- fix rcu stall when polling
* tag 'for-7.2/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-zoned-metadata: Use strscpy() to copy device name
dm cache: make smq background work limit configurable
dm-inlinecrypt: add support for hardware-wrapped keys
dm: limit target bio polling to one shot
dm-ioctl: report an error if a device has no table
dm: add documentation for dm-inlinecrypt target
dm-inlinecrypt: add target for inline block device encryption
block: export blk-crypto symbols required by dm-inlinecrypt
dm-ima: use active table's size if available
dm-ima: Fail more gracefully in dm_ima_measure_on_*
dm-ima: Handle race between rename and table swap
dm-ima: Fix issues with dm_ima_measure_on_device_rename
dm-ima: remove new_map from dm_ima_measure_on_device_clear
dm-ima: Fix UAF errors and measuring incorrect context
dm-ima: don't copy the active table to the inactive table
dm-ima: Remove status_flags from dm_ima_measure_on_table_load()
dm-ima: remove broken last_target_measured logic
dm-ima: remove dm_ima_reset_data()
dm-raid: only requeue bios when dm is suspending
dm vdo: use get_random_u32() where appropriate
Linus Torvalds [Tue, 16 Jun 2026 07:32:47 +0000 (13:02 +0530)]
Merge tag 'for-7.2/block-20260615' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block updates from Jens Axboe:
- NVMe pull request via Keith:
- Per-controller admin and IO timeout sysfs attributes, and
letting the block layer set request timeouts (Maurizio,
Maximilian)
- Multipath passthrough iostats, and PCI P2PDMA enablement for
multipath devices (Keith, Kiran)
- A new diag sysfs attribute group exporting per-controller
counters (retries, multipath failover, error counters, requeue
and failure counts, reset and reconnect events) (Nilay)
- FDP configuration validation and bounds check fixes (liuxixin)
- Various nvmet fixes, including a pre-auth out-of-bounds read in
the Discovery Get Log Page handler, auth payload bounds
validation, and tcp error-path leak fixes (Bryam, Tianchu,
Geliang)
- nvme-tcp lockdep and workqueue fixes (Shin'ichiro, Kuniyuki,
Eric)
- Assorted other fixes and cleanups (John, Yao, Chao, Mateusz,
Achkinazi, Wentao)
- MD pull request via Yu Kuai:
- raid1/raid10 fixes for a deadlock in the read error recovery
path, error-path detection and bio accounting with cloned bios,
and an nr_pending leak in the REQ_ATOMIC bad-block error path
(Abd-Alrhman)
- PCI P2PDMA propagation from member devices to the RAID device
(Kiran)
- dm-raid bio requeue fix, and various smaller fixes and cleanups
(Benjamin, Chen, Li, Thorsten)
- Enable Clang lock context analysis for the block layer, with the
accompanying annotations across queue limits, the blk_holder_ops
callbacks, crypto, cgroup, iocost, kyber and mq-deadline (Bart)
- Block status code infrastructure work: a tagged status table, a
str_to_blk_op() helper, a bio_endio_status() helper, and on top of
that a new configurable block-layer error injection facility
(Christoph)
- DRBD netlink rework, replacing the genl_magic machinery with explicit
netlink serialization and moving the DRBD UAPI headers to
include/uapi/linux/ (Christoph Böhmwalder)
- bvec improvements: a bvec_folio() helper and making the bvec_iter
helpers proper inline functions (Willy, Christoph)
- ublk cleanups and a canceling-flag fix for the disk-not-allocated
case (Caleb, Ming)
- Partition handling fixes: bound the AIX pp_count scan, fix an of_node
refcount leak, and replace __get_free_page() with kmalloc() (Bryam,
Wentao, Mike)
- Convert numa_node to int in blk_mq_hw_ctx and ->init_request, and add
WQ_PERCPU to the block workqueue users (Mateusz, Marco)
- Block statistics and tracing: propagate in-flight to the whole disk
on partition IO, export passthrough stats, and a new
block_rq_tag_wait tracepoint (Tang, Keith, Aaron)
- A round of removals, unexports and cleanups across bio, direct-io and
the bvec helpers (Christoph)
- Various driver fixes (mtip32xx use-after-free, rbd snap_count
validation and strscpy conversion, nbd socket lockdep reclassify,
virtio-blk zone report clamp, floppy) and a batch of MAINTAINERS
email/list updates (Coly, Li, Yu, Christoph Böhmwalder)
- Other little fixes and cleanups all over
* tag 'for-7.2/block-20260615' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (117 commits)
MAINTAINERS: Update Coly Li's email address
block: check bio split for unaligned bvec
nbd: Reclassify sockets to avoid lockdep circular dependency
block: add configurable error injection
block: add a str_to_blk_op helper
block: add a "tag" for block status codes
block: add a macro to initialize the status table
floppy: Drop unused pnp driver data
block: propagate in_flight to whole disk on partition I/O
virtio-blk: clamp zone report to the report buffer capacity
block: optimize I/O merge hot path with unlikely() hints
drivers/block/rbd: Use strscpy() to copy strings into arrays
partitions: aix: bound the pp_count scan to the ppe array
block: Enable lock context analysis
block/mq-deadline: Make the lock context annotations compatible with Clang
block/Kyber: Make the lock context annotations compatible with Clang
block/blk-mq-debugfs: Improve lock context annotations
block/blk-iocost: Inline iocg_lock() and iocg_unlock()
block/blk-iocost: Split ioc_rqos_throttle()
block/crypto: Annotate the crypto functions
...