git.ipfire.org Git - thirdparty/linux.git/log

rust: ptr: rename `ProjectIndex::index` to `build_index`

The corresponding `SliceIndex` trait in Rust uses `index` to mean the
panicking variant, which is also being added to `ProjectIndex`. Hence
rename our custom `build_error!` index variant to `build_index`.

Suggested-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://lore.kernel.org/rust-for-linux/DI5LLN2V3XCS.34H4CG99N4MPA@nvidia.com
Signed-off-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260602-projection-syntax-rework-v2-1-6989470f5440@garyguo.net
[ Reworded docs slightly. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

ALSA: seq: dummy: fix UMP event stack overread

The dummy sequencer port forwards events by copying an incoming
struct snd_seq_event into a stack temporary, rewriting source and
destination, and dispatching the temporary to subscribers. That legacy
event storage is smaller than struct snd_seq_ump_event.

When a UMP event reaches the dummy client, the copy leaves the UMP flag
set but only provides legacy-sized stack storage. The subscriber
delivery path then uses snd_seq_event_packet_size() and copies a
UMP-sized packet from that stack object, reading past the end of the
temporary.

Use the existing union __snd_seq_event storage and copy the packet size
reported for the incoming event before rewriting the common routing
fields. This preserves the full UMP packet for UMP events while keeping
legacy event handling unchanged.

Fixes: 32cb23a0f911 ("ALSA: seq: dummy: Allow UMP conversion")
Signed-off-by: Kyle Zeng <kylebot@openai.com>
Link: https://patch.msgid.link/20260605080204.32045-1-kylebot@openai.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge patch series "proc: protect ptrace_may_access() with exec_update_lock"

Jann Horn <jannh@google.com> says:

My understanding is that procfs is effectively maintained by the VFS
maintainers (though scripts/get_maintainer.pl claims that there are
no maintainers for procfs because the VFS entry only claims files
directly in fs/, and the procfs entry has no maintainers listed on
it).

In procfs, most uses of ptrace_may_access() should use
exec_update_lock to avoid TOCTOU issues with concurrent privileged
execve() (like setuid binary execution).

This series doesn't fix all the remaining issues in procfs, but it fixes
the easy cases for now; I will probably follow up with fixes for the
gnarlier cases later unless someone else wants to do that.

I have checked that procfs files still work with these changes and that
CONFIG_PROVE_LOCKING=y doesn't generate any warnings.

(checkpatch complains about missing argument names in
proc_op::proc_get_link, but that was already the case before my patch.)

* patches from https://patch.msgid.link/20260518-procfs-lockfix-part1-v1-0-5c3d20e0ac33@google.com:
proc: protect ptrace_may_access() with exec_update_lock (FD links)
proc: protect ptrace_may_access() with exec_update_lock (part 1)

Link: https://patch.msgid.link/20260518-procfs-lockfix-part1-v1-0-5c3d20e0ac33@google.com
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

proc: protect ptrace_may_access() with exec_update_lock (FD links)

proc_pid_get_link() and proc_pid_readlink() currently look up the task from
the pid once, then do the ptrace access check on that task, then look up
the task from the pid a second time to do the actual access.
That's racy in several ways.

To fix it, pass the task to the ->proc_get_link() handler, and instead of
proc_fd_access_allowed(), introduce a new helper call_proc_get_link() that
looks up and locks the task, does the access check, and calls
->proc_get_link().

Fixes: 778c1144771f ("[PATCH] proc: Use sane permission checks on the /proc/<pid>/fd/ symlinks")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://patch.msgid.link/20260518-procfs-lockfix-part1-v1-2-5c3d20e0ac33@google.com
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

proc: protect ptrace_may_access() with exec_update_lock (part 1)

Fix the easy cases where procfs currently calls ptrace_may_access() without
exec_update_lock protection, where the fix is to simply add the extra lock
or use mm_access():

- do_task_stat(): grab exec_update_lock
- proc_pid_wchan(): grab exec_update_lock
- proc_map_files_lookup(): use mm_access() instead of get_task_mm()
- proc_map_files_readdir(): use mm_access() instead of get_task_mm()
- proc_ns_get_link(): grab exec_update_lock
- proc_ns_readlink(): grab exec_update_lock

Fixes: f83ce3e6b02d ("proc: avoid information leaks to non-privileged processes")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://patch.msgid.link/20260518-procfs-lockfix-part1-v1-1-5c3d20e0ac33@google.com
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

wifi: nl80211: Increase ie_len size to prevent truncated IEs in new peer notifications

Currently, ie_len in cfg80211_notify_new_peer_candidate is defined as
1-byte field, capping the maximum IE list size at 255 bytes. When a
large beacon is received, the IE list is truncated, passing incomplete
data to wpa_supplicant. This causes supplicant to fail parsing the IEs.

Increasing the size of ie_len to allow the full length of the IE list to
be forwarded properly.

Signed-off-by: Thiyagarajan Pandiyan <thiyagarajan@aerlync.com>
Link: https://patch.msgid.link/20260605054307.427874-1-thiyagarajan@aerlync.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: fold tid_ampdu_rx allocations into a flexible array

Convert the separately-allocated reorder_buf pointer to a C99 flexible
array member at the end of struct tid_ampdu_rx, with both the
sk_buff_head and the jiffies timestamp in each array element. This
collapses three allocations into one and removes the corresponding
kfree() pairs from the error and free paths.

Assisted-by: opencode:big-pickle
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20260605005627.317194-1-rosenp@gmail.com
[fix kernel-doc]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: Fix -Wc23-extensions in hwmp_route_info_get()

When building with a version of clang that supports
'-fms-anonymous-structs' (which will be used by the kernel instead of
the wider '-fms-extensions'), there are a couple warnings after some
recent mesg_hwmp.c changes:

  net/mac80211/mesh_hwmp.c:373:3: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions]
    373 |                 struct ieee80211_mesh_hwmp_preq_top *preq_elem_top =
        |                 ^
  net/mac80211/mesh_hwmp.c:390:3: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions]
    390 |                 struct ieee80211_mesh_hwmp_prep_top *prep_elem_top =
        |                 ^
  2 errors generated.

Enclose the switch case blocks in braces to clear up the warning.

Fixes: a91c65cb99d1 ("wifi: mac80211: Use struct instead of macro for PREP frame")
Fixes: 4ac20bd40b7d ("wifi: mac80211: Use struct instead of macro for PREQ frame")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260604-mac80211-mesh_hwmp-fix-c23-extensions-v1-1-25a64d6ce541@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

batman-adv: fix kernel-doc typos and grammar errors

Various minor errors were gathered over the time in batman-adv's kernel-doc
comments. Get rid of many of them before they are copied (again) to new
functions.

Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: fix batadv_v_ogm_packet_recv error handling kernel-doc

All receive handlers in batman-adv are consuming the skbuff independent of
the result of the handler. The "(without freeing the skb) on failure" is
therefore not corrrect anymore for the current implementation.

Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: uapi: keep kernel-doc in struct member order

The order of the members of struct batadv_coded_packet and struct
batadv_unicast_tvlv_packet didn't match the kernel doc. This is the case
for all other structures and should also be done the same way for these
two.

Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: bla: update stale kernel-doc

The bridge-loop-avoidance code was changed recently to avoid inconsistent
state and race condition problems. The kernel-doc addded in these commits
(and related code) has various minor deficits which are now resolved.

Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: tp_meter: update stale kernel-doc after refactoring

The tp_meter codebase was recently refactored:

* throughput meter sender and receiver variables were split into
two different structures
* the congestion control variables were extracted in a separate structure

But the kernel-doc was not updated everywhere to reflect these changes.

Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: correct batadv_wifi_* kernel-doc

The original kernel documentation for the batadv_wifi_* functions contained
copy+paste errors. Correct them to make it easier understandable.

Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: document cleanup of batadv_wifi_net_devices entries

It doesn't seem to be obvious how the entries from the
batadv_wifi_net_devices rhashtable are getting removed before the actual
rhashtable is destroyed. Document the idea behind the process and which
steps are involved.

Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: use GFP_KERNEL allocations for the wifi detection cache

The batadv_wifi_net_device_insert() is called with ASSERT_RTNL() held, but
not inside a spinlock or another context which prevents "might_sleep"
functions. To relax the requirements for the allocator, use GFP_KERNEL.

Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: drop duplicated wifi_flags assignments

During the initialization of the batadv_wifi_net_device_state, it is enough
to write the wifi_flags once before the batadv_wifi_net_device_state is
added to the batadv_wifi_net_devices rhashtable.

Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: convert cancellation of work items to disable helper

With commit 86898fa6b8cd ("workqueue: Implement disable/enable for
(delayed) work items"), work queues gained the ability to permanently
disallow re-queuing of work items. This is particularly important during
object teardown, where a work item must not be re-armed after shutdown
begins.

Convert all cancel_work_sync() and cancel_delayed_work_sync() call sites to
their disable_* equivalents to clarify the intent to prevent re-arming
after teardown.

Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: tp_meter: initialize last_recv_time during init

The last_recv_time is the most important indicator for a receiver session
to figure out whether a session timed out or not. But this information was
only initialized after the session was added to the tp_receiver_list and
after the timer was started.

In the worst case, the timer (function) could have tried to access this
information before the actual initialization was reached. Like rest of the
variables of the tp_meter receiver session, this field has to be filled out
before any other (parallel running) context has the chance to access it.

Cc: stable@kernel.org
Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation")
Signed-off-by: Sven Eckelmann <sven@narfation.org>

simple_lookup(): use d_splice_alias() for ->lookup() return value

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ecryptfs: use d_splice_alias() for ->lookup() return value

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

configfs_lookup(): switch to d_splice_alias()

more idiomatic

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

tracefs: use d_splice_alias() in ->lookup() instances

d_add() is not wrong there (inodes are freshly allocated), but
d_splice_alias() is more idiomatic.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

make cursors NORCU

All it requires is making sure that d_walk() will skip *all*
CURSOR dentries, even if somebody passes it one as an argument.

Cursors are negative and unhashed all along, never get added to
LRU or to shrink lists and no RCU references via ->d_sib are
possible for those - dentry_unlist() makes sure that no killed
dentry has ->d_sib.next left pointing to a cursor.

Seeing that a cursor is allocated every time we open a directory
on autofs, debugfs, devpts, etc., avoiding an RCU delay when such
opened files get closed is attractive...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

nfs: get rid of fake root dentries

... just grab the reference to the (real) root we are about to return
for the first mount of this superblock and be done with that.

Once upon a time dentry tree eviction at fs shutdown used to break
if ->s_root had been spliced on top of something; that hadn't been
the case for years now, and these fake root dentries violate a bunch
of invariants. Let's get rid of them...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

wind ->s_roots via ->d_sib instead of ->d_hash

shrink_dcache_for_umount() is supposed to handle the possibility of
some of the dentries to be evicted being in other threads shrink
lists; it either kills them, leaving an empty husk to be freed by
the owner of shrink list whenever it gets around to that, or it
waits for the eviction in progress to get completed.

That relies upon dentry remaining attached to the tree until the
eviction reaches dentry_unlist() and its ->d_sib gets removed
from the list. Unfortunately, the secondary roots are linked
via ->d_hash, rather than ->d_sib and they become removed from
that list before their inode references are dropped.

If shrink_dentry_list() from another thread ends up evicting
one of the secondary roots and gets to that point in dentry_kill()
when shrink_dcache_for_umount() is looking for secondary roots,
the latter will *not* notice anything, possibly leading to
warnings about busy inodes at umount time and all kinds of breakage
after that.

Moreover, shrink_dcache_for_umount() walks the list of secondary
roots with no protection whatsoever, so it might end up calling
dget() on a dentry that already passed through
lockref_mark_dead(&dentry->d_lockref);
ending up with corrupted refcount and possible UAF.

AFAICS, the most straightforward way to deal with that would be
to have secondary roots linked via ->d_sib rather than ->d_hash;
then they would remain on the list until killed, and we could
use d_add_waiter() machinery to wait for eviction in progress.

Changes:
* secondary roots look the same as ->s_root from d_unhashed()
and d_unlinked() POV now.
* secondary roots are represented as "no parent, but on ->d_sib"
instead of "no parent, but on ->d_hash".
* since ->d_sib is a plain hlist, we protect it with per-superblock
spinlock (sb->s_roots_lock) instead of the LSB of the head pointer (for
non-root dentries it would be protected by ->d_lock of parent).
* __d_obtain_alias() uses ->d_sib for linkage when allocating
a secondary root.
* d_splice_alias_ops() detects splicing of a secondary root and
removes it from the list before calling __d_move().
* dentry_unlist() detects eviction of a secondary root and
removes it from the list; no need to play the games for d_walk() sake,
since the latter is not going to look for the next sibling of those
anyway.
* ___d_drop() doesn't care about ->s_roots anymore.
* shrink_dcache_for_umount() uses proper locking for access to
the list of secondary roots and if it runs into one that is in the middle
of eviction waits for that to finish.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

shrink_dentry_tree(): unify the calls of shrink_dentry_list()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

shrinking rcu_read_lock() scope in d_alloc_parallel()

The current use of rcu_read_lock() uses in d_alloc_parallel()
is fairly opaque - the single large scope serves two purposes.

We start with lookup in normal hash, and there rcu_read_lock()
scope puts __d_lookup_rcu() and subsequent lockref_get_not_dead() into
the same RCU read-side critical area.

If no match is found, we proceed to lock the hash chain of
in-lookup hash and scan that for a match.  If we find a match, we want
to grab it and wait for lookup in progress to finish.  Since the bitlock
we use for these hash chains has to nest inside ->d_lock, we need to
unlock the chain first and use lockref_get_not_dead() on the match.
That has to be done without breaking the RCU read-side critical area,
and we use the same rcu_read_lock() scope to bridge over.

The thing is, after having grabbed the reference (and it is
very unlikely to fail) we proceed to grab ->d_lock - d_wait_lookup()
and __d_lookup_unhash()/__d_wake_in_lookup_waiters() are using that for
serialization. That makes lockref_get_not_dead() pointless - trying
to avoid grabbing ->d_lock for refcount increment, only to grab it
anyway immediately after that. If we grab ->d_lock first and replace
lockref_get_not_dead() with direct check for sign and increment if
non-negative we can move rcu_read_unlock() to immediately after grabbing
->d_lock.  Moreover, we don't need the RCU read-side critical area to
be contiguous since before earlier __d_lookup_rcu() - we can just as
well terminate the earlier one ASAP and call rcu_read_lock() again only
after having found a match (if any) in the in-lookup hash chain.

That makes the entire thing easier to follow and the purpose
of those rcu_read_lock() calls easier to describe - the first scope is
for __d_lookup_rcu() + lockref_get_not_dead(), the second one bridges
over from the bitlock scope to the ->d_lock scope on the match found in
in-lookup hash.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d_walk(): shrink rcu_read_lock() scope

we only need it to bridge over from ->d_lock scope of child to ->d_lock
scope of parent; dropping ->d_lock at rename_retry doesn't need to be
in rcu_read_lock() scope.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

document dentry_kill()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

adjust calling conventions of lock_for_kill(), fold __dentry_kill() into dentry_kill()

Pull dropping ->d_lock on lock_for_kill() failure into lock_for_kill() itself.
That reduces dentry_kill() to
if (!lock_for_kill(dentry))
return NULL;
return __dentry_kill(dentry);
at which point it's easier to move that if (...) into the beginning of __dentry_kill()
itself and rename it into dentry_kill().

Document the new calling conventions of lock_for_kill().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Document rcu_read_lock() use in select_collect2()

If select_collect2() finds something that is neither busy nor can
be moved to shrink list, it needs to return that to caller's caller
(shrink_dcache_tree()) ASAP and do so without grabbing references (among
other things, it might be already dying, in which case refcount can't be
incremented). We are called inside a ->d_lock scope, but that scope is
going to be terminated as soon as we return to caller (d_walk()); ->d_lock
will be retaken by shrink_dcache_tree(), but we need to bridge between
these scopes, turning them into contiguous RCU read-side critical area.

We do that with rcu_read_lock() scope - it spans from unbalanced
rcu_read_lock() in select_collect2() to unbalanced rcu_read_unlock()
in shrink_dcache_tree(). That works, but it really needs to be documented;
it's rather unidiomatic and it had caused quite a bit of confusion - some
of it in form of patches "fixing" the damn thing.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Shift rcu_read_{,un}lock() inside fast_dput()

Shrink rcu_read_lock() scopes surrounding fast_dput() calls.
Both callers are immediately preceded and followed by
rcu_read_lock()/rcu_read_unlock() resp. Shrink that down
into fast_dput() itself; in case when fast_dput() ends up
grabbing ->d_lock, we can pull rcu_read_unlock() up to
right after spin_lock().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

simplify safety for lock_for_kill() slowpath

rcu_read_lock() scopes in dentry eviction machinery are too wide
and badly structured; we end up with too many of those, quite
a few essentially identical.  Worse, quite a few of the function
involved are not neutral wrt that, making them harder to reason about.

rcu_read_lock() scope is not the only thing establishing an
RCU read-side critical area - spin_lock scope does the same and
they can be mixed - the sequence
rcu_read_lock()
...
spin_lock()
...
rcu_read_unlock()
...
rcu_read_lock()
...
spun_unlock()
...
rcu_read_unlock()
is an unbroken RCU read-side critical area.

Use of that observation allows to simplify things.  First of all,
lock_for_kill() relies upon being in an unbroken RCU read-side
critical area.  It's always called with ->d_lock held, and normally
returns without having ever dropped that spinlock.  We would not
need rcu_read_lock() at all, if not for the slow path - if trylock
of inode->i_lock fails, we need to drop and retake ->d_lock.

Having all calls of lock_for_kill() inside an rcu_read_lock() scope
takes care of that, but to show that lock_for_kill() slow path is safe,
we need to demonstrate such rcu_read_lock() scope for any call chain
leading to lock_for_kill().  Which is not fun, seeing that there are
10 such scopes, with 5 distinct beginnings between them.

Case 1: opens in dput() proceeds through fast_dput() grabbing ->d_lock,
returning false into dput() and there a call of finish_dput() which calls
dentry_kill(), which calls lock_for_kill(); ends in dentry_kill(), either
right after lock_for_kill() success or right after dropping ->d_lock
on lock_for_kill() failure.  ->d_lock is held continuously all the way
into lock_for_kill().

Case 2: opens in dentry_kill(), where we proceed to the same call of
dentry_kill() as in case 1.  ->d_lock is held since before the
beginning of the scope and all the way into lock_for_kill().

Case 3: opens in select_collect2(), proceeds through the return to
d_walk() and to shrink_dcache_tree() where we grab ->d_lock and
proceed to call shrink_kill(), which calls dentry_kill(), then as
in the previous scopes.

Case 4: opens in shrink_dentry_list(), followed by call of shrink_kill(),
then same as in case 3.  ->d_lock is held since before the beginning
of the scope and all the way into lock_for_kill().

Case 5: opens in shrink_kill(), where it's immediately followed by
call of dentry_kill(), then same as in the previous scopes.  ->d_lock
is held since before the beginning of the scope all the way into
lock_for_kill().

Note that in cases 2, 4 and 5 the slow path of lock_for_kill() is the
only part of rcu_read_lock() scope that is not covered by spinlock
scopes.  In case 1 we have the area in fast_dput() as well and in
case 3 - the return path from select_collect2() and chunk in shrink_dcache_tree()
up to grabbing ->d_lock.

Seeing that the reasons we need rcu_read_lock() in these additional
areas are completely unrelated to lock_for_kill() slow path, the things
get much more straightforward with
* explicit rcu_read_lock() scope surrounding the area in slow path
of lock_for_kill() where ->d_lock is not held
* shrink_dentry_list() dropping rcu_read_lock() as soon as it has
grabbed ->d_lock.
* dput() dropping rcu_read_lock() just before calling finish_dput().
* rcu_read_lock() calls in finish_dput(), shrink_kill() and
shrink_dentry_list() are removed, along with rcu_read_unlock() calls in
dentry_kill().

RCU read-side critical areas are unchanged by that, safety of lock_for_kill()
slow path is trivial to verify and a bunch of rcu_read_lock() scopes either
gone or become easier to describe.

Update the comments on locking conventions and memory safety considerations,
including the NORCU case.

Incidentally, all calls of fast_dput() are immediately preceded by rcu_read_lock()
and followed by rcu_read_unlock() now, which will allow to simplify those on
the next step...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fold lock_for_kill() and __dentry_kill() into common helper

There are two callers of lock_for_kill() and both are followed
by the same sequence of actions:
* in case of failure, drop ->d_lock, do rcu_read_unlock() and
go away
* in case of success, do rcu_read_unlock() followed by
passing dentry to __dentry_kill(); if the latter returns NULL, go away.

All calls of __dentry_kill() are paired with lock_for_kill() now;
let's turn that sequence into a new helper (dentry_kill()) and switch
to using it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fold lock_for_kill() into shrink_kill()

Both callers have exact same shape.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

shrink_dentry_list(): start with removing from shrink list

Currently we leave dentry on the list until we are done with
lock_for_kill().  That guarantees that it won't have been
even scheduled for removal until we remove it from the list
and drop ->d_lock.  We grab ->d_lock and rcu_read_lock()
and call lock_for_kill().  There are four possible cases:
1) lock_for_kill() has succeeded; dentry and its inode
(if any) are locked, dentry refcount is zero and we can
remove it from shrink list and feed it to shrink_kill().
2) lock_for_kill() fails since dentry has become busy.
Nothing to do, rcu_read_unlock(), remove from shrink list,
drop ->d_lock and move on.
3) lock_for_kill() fails since dentry is currently
being killed - already entered __dentry_kill(), but hasn't
reached dentry_unlist() yet.  Nothing to do, we should just
do rcu_read_unlock(), remove from shrink list so that
whoever's executing __dentry_kill() would free it once they
are done, drop ->d_lock and move on - same actions as in
case (2).
4) lock_for_kill() fails since dentry has been killed
(reached dentry_unlist(), DCACHE_DENTRY_KILLED set in ->d_flags).
In that case whoever had been killing it had already seen it
on our shrink list and skipped freeing it.  At that point it's
just a passive chunk of memory; rcu_read_unlock(), remove from
the list, drop ->d_lock and use dentry_free() to schedule
freeing.

While that works, there's a simpler way to do it:
* grab ->d_lock
* remove dentry from our shrink list
* if DCACHE_DENTRY_KILLED is already set, drop ->d_lock,
call dentry_free() and move on.
* otherwise grab rcu_read_lock() and call lock_for_free()
* if lock_for_kill() succeeds, feed dentry
to shrink_kill(), otherwise drop the locks and move on.

The end result is equivalent to the old variant.  The only difference
arises if at the time we grab ->d_lock dentry had refcount 0 and
lock_for_kill() had failed spin_trylock() and had to drop and regain
->d_lock.  Otherwise nobody can observe at which point within the
unbroken ->d_lock scope dentry had been removed from the shrink list -
all accesses to ->d_lru are under ->d_lock.

If ->d_lock had been dropped and regained, it is possible for another
thread to feed that dentry to __dentry_kill(); if it doesn't get to
dentry_unlist() before we regain ->d_lock, behaviour is still identical -
it's case (3) and by the time __dentry_kill() would've gotten around
to checking if the victim is on shrink list, it would've been already
removed from ours.

If __dentry_kill() from another thread *does* get to dentry_unlist(),
in the old variant we would have __dentry_kill() leave calling
dentry_free() to us and in the new one __dentry_kill() would've called
dentry_free() itself.  Since we are under rcu_read_lock(), we are
guaranteed that actual freeing won't happen until we get around to
rcu_read_unlock().  IOW, the new variant is still safe wrt UAF, if
not for the same reason as the old one, and overall result is the same;
the only difference is which threads ends up scheduling the actual
freeing of dentry.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d_prune_aliases(): make sure to skip NORCU aliases

Either they are busy (in which case they won't be moved to shrink
list anyway) or they have a zero refcount, in which case we really
shouldn't mess with them - whoever had dropped the refcount to
zero is on the way to evicting and freeing them.

That way we are guaranteed that only the thread that has dropped
refcount of NORCU dentry to zero might call lock_for_kill() and
__dentry_kill() for those.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

kill d_dispose_if_unused()

Rename to_shrink_list() into __move_to_shrink_list(), document and
export it. Switch d_dispose_if_unused() users to that and kill
d_dispose_if_unused() itself.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

make to_shrink_list() return whether it has moved dentry to list

... and make it check the refcount for being zero in addition to
dentry not being on a shrink list already. Simplifies the callers...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

select_collect(): ignore dentries on shrink lists if they have positive refcounts

If all dentries we find have positive refcounts and some happen
to be on shrink lists, there's no point trying to steal them in the
select_collect2() phase - we won't be able to evict any of them. Busy on
shrink lists is still busy...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

find_acceptable_alias(): skip NORCU aliases with zero refcount

similar to d_find_any_alias() situation

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fix a race between d_find_any_alias() and final dput() of NORCU dentries

Refcount of a NORCU dentry must not be incremented after having dropped
to zero.  Otherwise we might end up with the following race:
CPU1: in fast_dput(d), rcu_read_lock();
CPU1: decrements refcount of d to 0
CPU1: notice that it's unhashed
CPU2: grab a reference to d
CPU2: dput(d), freeing d
CPU1: ... looks like we need to evict d, let's grab ->d_lock, recheck
      the refcount, etc.
and that spin_lock(&d->d_lock) ends up a UAF, despite still being in
an RCU read-side critical area started back when the refcount had been
positive.  If not for DCACHE_NORCU in d->d_flags freeing would've been
RCU-delayed, so we'd have grabbed ->d_lock, noticed the negative value
stored into refcount by __dentry_kill(), dropped the locks and that would
be it.  For NORCU dentries freeing is _not_ delayed, though.

Most of the non-counting references are excluded for NORCU dentries -
they are not allowed to be hashed, they never get placed on LRU, they
never get placed into anyone's list of children and while dput_to_list()
might put them into a shrink list, nobody bumps refcount of something
that had been reached that way.

However, inode's list of aliases can be a problem - it does not contribute
to dentry refcount (for obvious reasons) and we *do* have places that
grab references to something found on that list - that's precisely what
d_find_alias() is.  In case of d_find_alias() we are safe - it skips
unhashed aliases, so all NORCU ones are ignored there.  d_find_any_alias()
is *not* limited to hashed ones, though, and while it's usually called
for directories (which never get NORCU dentries), there are callers that
use it to get something for non-directories with no hashed aliases.

Having d_find_any_alias() hit a NORCU dentry is not impossible - it can
be easily arranged if you have CAP_DAC_READ_SEARCH (memfd_create() + mmap()
+ name_to_handle_at() for /proc/self/map_files/<...> + munmap() +
open_by_handle_at() will do that, and adding a second memfd_create() for
mount_fd makes it possible to do that without having memfd pinned).
The race window is narrow, and it's probably not feasible on bare hardware,
but...

It's not hard to fix, fortunately:
* separate __d_find_dir_alias() (== current __d_find_any_alias()) to
be used for directory inodes.
* provide dget_alias_ilocked() that would return false for NORCU
dentries with zero refcount and return true incrementing refcount otherwise
* make __d_find_any_alias() go over the list of aliases, using
dget_alias_ilocked() and returning the alias it succeeds on (normally the
first one).  Any NORCU alias with zero refcount is going to be evicted by
the thread that had dropped the final reference; this makes __d_find_any_alias()
pretend it had lost the race with eviction.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

alloc_path_pseudo(): make sure we don't end up with NORCU dentries for directories

A lot of places relies upon directories never having NORCU dentries;
currently that property holds, but the proof is not straightforward
and rather brittle.

It's better to have that verified in the sole caller of d_alloc_pseudo(),
so that any future bugs in that direction were caught early.

That way we can be sure that
* current directory of any process is not NORCU
* root directory of any process is not NORCU
* starting point of any LOOKUP_RCU pathwalk is not NORCU
* dget_parent() can rely upon ->d_parent not being NORCU
* d_walk() and is_subdir() can rely upon the same
* alloc_file_pseudo() won't create multiple aliases for a directory
without having to go through a convoluted audit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

VFS: use wait_var_event for waiting in d_alloc_parallel()

Parallel lookup starts with a call of d_alloc_parallel().  That primitive
either returns a matching hashed dentry or allocates a new one in the
in-lookup state and returns it to the caller.  Once the caller is done
with lookup, it indicates so either by call of d_{splice_alias,add}()
or by call of d_done_lookup(); at that point dentry leaves the in-lookup
state.

If d_alloc_parallel() finds a matching in-lookup dentry, it must wait for
that dentry to leave the in-lookup state, one way or another.  Currently
by supplying wait_queue_head to d_alloc_parallel().  If d_alloc_parallel()
creates a new in-lookup dentry, the address of that wait_queue_head is stored
in ->d_wait of new dentry and stays there while it's in the in-lookup;
subsequent d_alloc_parallel() will wait on the queue found in the matching
in-lookup dentry.  Transition out of in-lookup state wakes waiters on that
queue (if any).

That works, but the calling conventions are inconvenient - the caller must
supply wait_queue_head and make sure that it survives at least until the new
in-lookup dentry leaves the in-lookup state.  That amounts to boilerplate
in the d_alloc_parallel() callers that are followed by a call of d_lookup_done()
in the same function; in cases like nfs asynchronous unlink it gets worse than
that.

This patch changes d_alloc_parallel() to use wake_up_var_locked() to
wake up waiters, and wait_var_event_spinlock() to wait.  dentry->d_lock
is used for synchronisation as it is already held and the relevant
times.

That eliminates the need of caller-supplied wait_queue_head, simplifying
the calling conventions.  Better yet, we only need one bit of information
stored in dentry itself: whether there are any waiters to be woken up,
and that can be easily stored in ->d_flags; ->d_wait goes away.

The reason we need that bit (DCACHE_LOOKUP_WAITERS) is that with wait_var
machinery the queues are shared with all kinds of stuff and there's
no way tell if any of the waiters have anything to do with our dentry;
most of the time none of them will be relevant, so we need to avoid the
pointless wakeups.

Another benefit of the new scheme comes from the fact that wakeups
have to be done outside of write-side critical areas of ->i_dir_seq;
with the old scheme we need to carry the value picked from ->d_wait from
__d_lookup_unhash() to the place where we actually wake the waiters up.
Now we can just leave DCACHE_LOOKUP_WAITERS in ->d_flags until we get
to doing wakeups - that's done within the same ->d_lock scope, so we
are fine; new bit is accessed only under ->d_lock and it's seen only
on dentries with DCACHE_PAR_LOOKUP in ->d_flags.

__d_lookup_unhash() no longer needs to re-init ->d_lru.  That was
previously shared (in a union) with ->d_wait but ->d_wait is now gone
so it no longer corrupts ->d_lru.

Co-developed-by: Al Viro <viro@zeniv.linux.org.uk> # saner handling of flags
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

accel/ethosu: fix OOB write in ethosu_gem_cmdstream_copy_and_validate()

The command stream parsing loop increments the index variable a second
time when a 64-bit command word is encountered (bit 14 set), but does
not re-check the loop bound before writing the second word:

    for (i = 0; i < size / 4; i++) {
        bocmds[i] = cmds[0];
        if (cmd & 0x4000) {
            i++;
            bocmds[i] = cmds[1];   /* unchecked */
        }
    }

The buffer bocmds is backed by a DMA allocation of exactly size bytes
from drm_gem_dma_create(ddev, size), giving valid indices [0, size/4-1].

When i == size/4 - 1 on entry to an iteration and bit 14 of cmds[0] is
set, bocmds[size/4-1] is written in bounds, i is then incremented to
size/4, and bocmds[size/4] writes four bytes past the end of the
allocation.

Userspace controls both the buffer contents and the size argument via
the ioctl, making this a userspace-triggerable heap out-of-bounds write.

Fix by checking the incremented index against the buffer bound before
the second write and returning -EINVAL if the buffer is too small to
contain the extended command.

Fixes: 5a5e9c0228e6 ("accel: Add Arm Ethos-U NPU driver")
Cc: stable@vger.kernel.org
Signed-off-by: Muhammad Bilal <meatuni001@gmail.com>
Link: https://patch.msgid.link/20260523190843.33977-1-meatuni001@gmail.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>

Merge tag 'batadv-next-pullrequest-20260603' of https://git.open-mesh.org/batadv

Simon Wunderlich says:

====================
This cleanup patchset includes the following patches, all by
Sven Eckelmann:

- tp_meter: fix various minor issues (8 patches)

- tp_meter: split generic session type in sender and receiver type

- tp_meter: consolidate locking for congestion control (2 patches)

- bla: annotate lasttime access with READ/WRITE_ONCE

- elp: prevent transmission interval underflow

- tt: sync local and global tvlv preparation return values

- tt: directly retrieve wifi flags of net_device

* tag 'batadv-next-pullrequest-20260603' of https://git.open-mesh.org/batadv:
  batman-adv: tt: directly retrieve wifi flags of net_device
  batman-adv: tt: sync local and global tvlv preparation return values
  batman-adv: prevent ELP transmission interval underflow
  batman-adv: bla: annotate lasttime access with READ/WRITE_ONCE
  batman-adv: tp_meter: consolidate congestion control variables
  batman-adv: tp_meter: use locking for all congestion control variables
  batman-adv: tp_meter: split vars into sender and receiver types
  batman-adv: tp_meter: add only finished tp_vars to lists
  batman-adv: tp_meter: handle seqno wrap-around for fast recovery detection
  batman-adv: tp_meter: fix fast recovery precondition
  batman-adv: tp_meter: avoid divide-by-zero for dec_cwnd
  batman-adv: tp_meter: avoid window underflow
  batman-adv: tp_meter: initialize dec_cwnd explicitly
  batman-adv: tp_meter: initialize dup_acks explicitly
  batman-adv: tp_meter: keep unacked list in ascending ordered
====================

Link: https://patch.msgid.link/20260603072527.174487-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mv643xx: fix OF node refcount

Platform devices created with platform_device_alloc() call
platform_device_release() when the last reference to the device's
kobject is dropped. This function calls of_node_put() unconditionally.
This works fine for devices created with platform_device_register_full()
but users of the split approach (platform_device_alloc() +
platform_device_add()) must bump the reference of the of_node they
assign manually. Add the missing call to of_node_get().

Cc: stable@vger.kernel.org
Fixes: 76723bca2802 ("net: mv643xx_eth: add DT parsing support")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260602073414.22500-1-bartosz.golaszewski@oss.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/dns_resolver: use kasprintf + kmemdup_nul to simplify dns_query

Use kasprintf() for descriptions with a query type and kmemdup_nul()
otherwise to simplify dns_query().

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260602071343.962830-2-thorsten.blum@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2-af: kpu: Default profile updates

Add support for parsing the following:
1. fabric path header
2. tpids 0x88a8, 0x9100 and 0x9200 parsing for
first pass and second pass packets
3. parse stacked VLANs
4. RoCEv2 header with UDP destination port 4791
5. single SBTAG parsing

Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Nitin Shetty J <nshettyj@marvell.com>
Link: https://patch.msgid.link/20260602040535.3975769-1-nshettyj@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'selftests-rds-roce-support-follow-ups'

Allison Henderson says:

====================
selftests: rds: ROCE support follow ups

This is a follow up series to the "Add ROCE support to rds selftests"
series.  The first patch renames run.sh to rds_run.sh, which provides
a self-describing name that appears on the netdev CI dashboard.

The second patch addresses a sashiko complaint that I thought was
worth circling back for.  In the patch "pin RDS sockets to their
intended transport," sockets are pinned to the specific transport they
are meant to test.  By default, socket transports are implicitly
selected based on the network topology, but it is possible that they
can fail back to other transports if the underlying connection could
not be established.  So the patch pins them to the intended transport
to avoid false positives.

The third patch "support RDS built as loadable modules," lifts the
CONFIG_MODULES=n requirement, and updates the check_*conf_enabled()
to allow modules set to "=m" and further load the backing modules for
any component set as such.  config.sh is updated to match.

The fourth patch converts the rdma-prerequisite checks to return XFAIL
rather than SKIP, since the RDMA datapath is not run in netdev CI.

Questions, comments and feedback appreciated!
====================

Link: https://patch.msgid.link/20260602050657.26389-1-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rds: report missing RDMA prereqs as XFAIL

Make the RDMA test return XFAIL rather than skip when RXE is not
available, since the RDMA datapath is not run in netdev CI.

Change the three RDMA-prerequisite checks in check_rdma_conf() and
check_rdma_conf_enabled() to exit with KSFT_XFAIL (2) and tag their
messages [XFAIL] instead of [SKIP].

Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260602050657.26389-5-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rds: support RDS built as loadable modules

Commit 92cc6708f4a2 ("selftests: rds: config: disable modules") set
CONFIG_MODULES=n since run.sh required this kconfig. But disabling
modules also forces every =m option to =n rather than =y, which can
silently drop unrelated features.

This patch removes CONFIG_MODULES=n from the rds selftest config and
updates the check_*conf_enabled() routines to accept a config as
either built-in (=y) or modular (=m). A new probe_module() function
is added to load the backing module when a component is set to be
modular (=m). config.sh no longer forces CONFIG_MODULES=n, so a user
who follows the SKIP message to run config.sh does not silently end
up with modules disabled again.

rds.ko itself is auto-loaded on socket creation, and rds_rdma.ko is
auto-loaded when SO_RDS_TRANSPORT is set with RDS_TRANS_IB, but the
TCP transport (rds_tcp.ko) is not auto-loaded on the bind path, so
the backing modules are loaded explicitly here.

Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260602050657.26389-4-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rds: pin RDS sockets to their intended transport

The RDS selftests create AF_RDS sockets but never selects a transport,
so the transport is chosen implicitly based on network topology when
the socket is bound. If underlying connection establishment fails, RDS
can fall back to another transport (e.g. loopback) and the test still
passes, silently bypassing the intended datapath it is meant to
exercise.

Set SO_RDS_TRANSPORT to the proper RDS_TRANS_IB or RDS_TRANS_TCP before
they are bound, so the test fails loudly if the intended transport is
unavailable rather than passing on a different path.

Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260602050657.26389-3-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rds: Rename run.sh to rds_run.sh

This patch renames run.sh to rds_run.sh. This gives the test a
self-describing name that appears in the netdev CI dashboard.

Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260602050657.26389-2-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: use READ_ONCE() in ipv6_flowlabel_get()

ipv6_flowlabel_get() reads flowlabel_consistency and
flowlabel_state_ranges locklessly.

Use READ_ONCE() for these sysctl accesses.

Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260602002506.1519901-1-runyu.xiao@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: use READ_ONCE() for bindv6only default in inet6_create()

inet6_create() reads net->ipv6.sysctl.bindv6only locklessly.

Use READ_ONCE() for this sysctl access.

Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260602002414.1504106-1-runyu.xiao@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

inet: frags: remove redundant assignment in inet_frag_reasm_prepare()

The assignment is redundant because skb_clone() already copies skb->cb.

Remove the unnecessary code.

Signed-off-by: yuan.gao <yuan.gao@ucloud.cn>
Link: https://patch.msgid.link/20260603065323.2736839-1-yuan.gao@ucloud.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: airoha: Report extack error to the user if airoha_tc_htb_modify_queue() fails

Report an extack error message in airoha_tc_htb_modify_queue routine if
airoha_qdma_set_tx_rate_limit() fails.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260603-airoha_tc_htb_modify_queue-err-message-v1-1-33ec3ab997d9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: dsa: remove obsolete dsa.txt

dsa.txt has been a redirect to dsa.yaml since commit bce58590d1bd
("dt-bindings: net: dsa: Add DSA yaml binding") introduced the .yaml
schema. The .yaml has the same filename in the same directory, making
this redirect unnecessary for discoverability.

Two files still reference dsa.txt, forcing readers through an extra
hop to reach the .yaml. The stub has not been touched since August
2020. Update references in lan9303.txt and
Documentation/networking/dsa/dsa.rst to point directly to dsa.yaml
and remove the stub.

Signed-off-by: Akash Sukhavasi <akash.sukhavasi@gmail.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260603-b4-remove-redirect-stubs-v2-3-c8c19876ab64@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: remove obsolete mdio.txt

mdio.txt has been a single-line redirect to mdio.yaml since
commit 62d77ff7ecbf ("dt-bindings: net: Add a YAML schemas for the
generic MDIO options"), which introduced the .yaml schema and reduced
the .txt to a stub in the same change. The .yaml has the same filename
in the same directory, making this redirect unnecessary for
discoverability.

No files in the tree reference mdio.txt and it has not been touched
since June 2019. Remove the obsolete stub.

Signed-off-by: Akash Sukhavasi <akash.sukhavasi@gmail.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260603-b4-remove-redirect-stubs-v2-1-c8c19876ab64@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rtnetlink: use dev_isalive() in rtnl_getlink()

rtnl_getlink() uses an RCU lookup to get the netdevice pointer.

When/If rtnl_lock() is used, we should check if the netdevice is not
being dismantled before potentially perform illegal actions.

Move dev_isalive() out of net/core/net-sysfs.c and make it available
in net/core/dev.h.

Return -ENODEV if rtnl_getlink() finds a device which is currently
being dismantled and RTNL is requested.

Fixes: e896e5c0734b ("rtnetlink: do not acquire RTNL in rtnl_getlink() with RTEXT_FILTER_NAME_ONLY")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://patch.msgid.link/20260603180831.1024716-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: realtek: Use %pM format specifier for MAC addresses

Convert to %pM instead of using custom code.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20260603112011.230890-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bridge: mcast: Synchronously shutdown port multicast timers

Currently, while four timers are set up during port multicast context
initialization, only two are synchronously deleted when the context is
de-initialized, just before being deleted.

This is fine because the structure containing the multicast context
(either a bridge port or a VLAN) is only deleted after an RCU grace
period and it will not pass as long as the timers are executing. These
timers are also not supposed to do any work at this stage. They acquire
the bridge multicast lock, see that the multicast context was disabled
and exit.

Make the code more explicit and symmetric and synchronously shutdown all
four timers when the multicast context is de-initialized. Use
timer_shutdown_sync() to guarantee that the timers will not be re-armed
given that the containing structure is being deleted.

Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260603103522.622411-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'rndis_host-add-le310x1-id-and-enable-low-power-handling'

Shaoxu Liu says:

====================
rndis_host: add LE310X1 ID and enable low-power handling

This series adds RNDIS support for Telit Cinterion LE310X1 and then enables
USB power management for that specific ID.
====================

Link: https://patch.msgid.link/tencent_29CB862D5756CBCBAFD2EE436EBAC98A7E05@qq.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rndis_host: enable power management for Telit LE310X1

Enable autosuspend support for Telit Cinterion LE310X1 RNDIS interface
by selecting a driver_info variant with manage_power callback.

This keeps power management scoped to the new Telit ID only, and avoids
changing behavior for all existing RNDIS devices.

Signed-off-by: Shaoxu Liu <shaoxul@foxmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/tencent_B7686B84CD4B76D76BB912FA6367FAC2CA05@qq.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rndis_host: add Telit LE310X1 RNDIS USB ID

Add a device match entry for Telit Cinterion LE310X1 RNDIS interface
(VID:PID 1bc7:7030).

This is a functional no-op and keeps using the generic rndis_info for now.
Power-management behavior is handled in a follow-up patch.

Signed-off-by: Shaoxu Liu <shaoxul@foxmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/tencent_F1AF1F5AD39C56485BD16C6DB2415E5B9508@qq.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

inet: frags: fix use-after-free caused by the fqdir_pre_exit() flush

On netns teardown, fqdir_pre_exit() walks the fqdir rhashtable and
flushes every fragment queue that is not yet complete using
inet_frag_queue_flush(). That helper frees all the skbs queued on the
fragment queue but does not set INET_FRAG_COMPLETE, and leaves
q->fragments_tail and q->last_run_head pointing at the freed skbs.
The queue itself stays in the rhashtable.

fqdir_pre_exit() first lowers high_thresh to 0 to stop new queue lookups,
but it cannot stop a fragment that already obtained the queue through
inet_frag_find() earlier and stalled just before taking the queue lock.
Once that fragment resumes after the flush and takes the queue lock,
it passes the INET_FRAG_COMPLETE check and then dereferences the freed
fragments_tail. inet_frag_queue_insert() reads FRAG_CB() and ->len of
that pointer and, on the append path, writes ->next_frag, causing a
slab use-after-free. IPv6, nf_conntrack_reasm6 and 6lowpan reassembly
share the same flush path and are affected as well.

Reset rb_fragments, fragments_tail and last_run_head in
inet_frag_queue_flush() so a flushed queue no longer points at the
freed skbs. A fragment that resumes after the flush and takes the
queue lock then finds an empty queue and starts a new run instead of
dereferencing the freed fragments_tail. ip_frag_reinit() already
performed this reset after its own flush, so drop the now duplicate
code there.

Cc: stable@vger.kernel.org
Fixes: 006a5035b495 ("inet: frags: flush pending skbs in fqdir_pre_exit()")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Link: https://patch.msgid.link/ah6ukYq5G98LshdA@v4bel
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bonding: annotate data-races in sysfs and procfs

bonding sysfs and procfs read parameters locklessly,
while drivers/net/bonding/bond_options.c can write over them.

Add missing READ_ONCE()/WRITE_ONCE() annotations.

This came as a prereq to avoid RTNL in bond_fill_info().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260602152748.2564393-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

i2c: busses: make K1 driver default for SpacemiT platforms

Enable I2C_K1 by default when ARCH_SPACEMIT is configured to ensure SD
card functionality works out-of-the-box.

SpacemiT K1 boards use I2C-controlled PMICs (like the P1 chip) to
provide SD card power supplies. Without the I2C_K1 driver enabled,
regulators cannot be controlled and SD card detection/operation fails.

Suggested-by: Margherita Milani <margherita.milani@amarulasolutions.com>
Suggested-by: Yixun Lan <dlan@kernel.org>
Signed-off-by: Iker Pedrosa <ikerpedrosam@gmail.com>
Reviewed-by: Yixun Lan <dlan@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260526-orangepi-sd-card-i2c-v1-1-b92268bfd467@gmail.com

Merge tag 'drm-rust-next-2026-06-04' of https://gitlab.freedesktop.org/drm/rust/kernel into drm-next

DRM Rust changes for v7.2-rc1

- Driver Core (shared via signed tag dd-lifetimes-7.2-rc1):

  - Introduce Higher-Ranked Lifetime Types (HRT) for Rust device
    drivers, allowing driver structs to hold device resources like
    pci::Bar and IoMem directly with a lifetime tied to the binding
    scope, removing the need for Devres indirection and ARef<Device>.

  - Replace drvdata() with scoped registration data on the auxiliary
    bus, using the new ForLt trait to thread lifetimes through
    registrations. Remove drvdata() and driver_type.

- DRM:

  - Add GPUVM immediate mode abstraction for Rust GPU drivers:
    - In immediate mode, GPU virtual address space state is updated
      during job execution (in the DMA fence signalling critical path),
      keeping the GPUVM and the GPU's address space always in sync.

    - Provide GpuVm, GpuVa, and GpuVmBo types for managing address
      spaces, virtual mappings, and GEM object backing respectively.

    - Provide split-merge map/unmap operations that handle partial
      overlaps with existing mappings.

    - drm_exec integration for dma_resv locking and GEM object
      validation based on the external/evicted object lists are not
      yet covered and planned as follow-up work.

  - Introduce DeviceContext type state for drm::Device, allowing
    drivers to restrict operations to contexts where the device is
    guaranteed to be registered (or not yet registered) with userspace.

  - Add FEAT_RENDER flag to the Driver trait for render node support.

- Nova:

  - Hopper/Blackwell enablement:
    - Add GPU identification and architecture-based HAL selection for
      Hopper (GH100) and Blackwell (GB100, GB202).

    - Implement the FSP (Foundation Security Processor) boot path used by
      Hopper and Blackwell, including FSP falcon engine support, EMEM
      operations, MCTP/NVDM message infrastructure, and FSP Chain of
      Trust boot with GSP lockdown release.

    - Add support for 32-bit firmware images and auto-detection of
      firmware image format.

    - Add architecture-specific framebuffer, sysmem flush, PCI config
      mirror, DMA mask, and WPR/non-WPR heap sizing.

  - GSP boot and unload:
    - Refactor the GSP boot process into a chipset-specific HAL,
      keeping the SEC2 and FSP boot paths separated cleanly.

    - Implement proper driver unload: send UNLOADING_GUEST_DRIVER
      command, run Booter Unloader and FWSEC-SB upon unbinding, and run
      the unload bundle on Gsp::boot() failure. This removes the need
      for a manual GPU reset between driver unbind and re-probe.

  - GA100 support:
    - Add support for the GA100 GPU, including IFR header detection and
      skipping, correct fwsignature selection, conditional FRTS boot,
      and documentation of the IFR header layout.

  - VBIOS hardening and refactoring:
    - Harden VBIOS parsing with checked arithmetic, bounds-checked
      accesses, and FromBytes-based structure reads throughout the FWSEC
      and Falcon data paths. Simplify the overall VBIOS module
      structure.

  - HRT adoption:
    - Use lifetime-parameterized pci::Bar directly, replacing the
      Arc<Devres<Bar0>> indirection. Replace ARef<Device> with &'bound
      Device in SysmemFlush and the GSP sequencer. Separate the driver
      type from driver data.

  - Misc:
    - Rename module names to kebab-case (nova-drm, nova-core).

    - Require little-endian in Kconfig, making the existing assumption
      explicit.

- Tyr:

  - Define comprehensive typed register blocks for GPU_CONTROL,
    JOB_CONTROL, MMU_CONTROL (including per-address-space registers),
    and DOORBELL_BLOCK using the kernel register!() macro. This replaces
    manual bit manipulation with typed register and field accessors.

  - Add shmem-backed GEM objects and set DMA mask based on GPU physical
    address width.

  - Adopt HRT: separate driver type from driver data, and use IoMem
    directly instead of Devres for register access during probe.

  - Move clock cleanup into a Drop implementation.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: "Danilo Krummrich" <dakr@kernel.org>
Link: https://patch.msgid.link/DJ0IF39U9ETK.PCCUO7ZEQ4S0@kernel.org

i2c: Use named initializers for arrays of i2c_device_data

While being less compact, using named initializers allows to more easily
see which members of the structs are assigned which value without having
to lookup the declaration of the struct. And it's also more robust
against changes to the struct definition.

The mentioned robustness is relevant for a planned change to struct
i2c_device_id that replaces .driver_data by an anonymous union.

While touching all these arrays, unify usage of whitespace in the list
terminator.

This patch doesn't modify the compiled arrays, only their representation
in source form benefits. The former was confirmed with x86 and arm64
builds.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20260518164510.805502-2-u.kleine-koenig@baylibre.com
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>

accel/ethosu: reject DMA commands with uninitialized length

cmd_state_init() initializes the command state with memset(0xff),
leaving dma->len at U64_MAX to signal missing setup. The only setter
is NPU_SET_DMA0_LEN; if userspace omits this command and issues
NPU_OP_DMA_START, dma->len remains U64_MAX.

In dma_length(), a positive stride added to U64_MAX wraps to a small
value. With size0 == 1, check_mul_overflow() does not trigger and
dma_length() returns 0 instead of U64_MAX. The caller's U64_MAX check
then passes, region_size[] stays 0, and the bounds check in
ethosu_job.c is bypassed, allowing hardware to execute DMA with stale
physical addresses.

Fix by checking for U64_MAX at the start of dma_length() before any
arithmetic, consistent with the sentinel value used throughout the
driver to detect uninitialized fields.

Fixes: 5a5e9c0228e6 ("accel: Add Arm Ethos-U NPU driver")
Cc: stable@vger.kernel.org
Signed-off-by: Muhammad Bilal <meatuni001@gmail.com>
Link: https://patch.msgid.link/20260524130319.12747-1-meatuni001@gmail.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>

accel/ethosu: fix arithmetic issues in dma_length()

dma_length() derives DMA region usage from command stream values and
updates region_size[]:

len = ((len + stride[0]) * size0 + stride[1]) * size1
region_size[region] = max(..., len + dma->offset)

Several arithmetic issues can corrupt the derived region size:

- signed stride values may underflow when added to len
- intermediate multiplications may overflow
- len + dma->offset may overflow during region_size updates
- dma_length() error returns were not validated by the caller

region_size[] is later used by ethosu_job.c to validate command stream
accesses against GEM buffer sizes. Arithmetic wraparound can therefore
under-report region usage and bypass the bounds validation.

Fix by validating signed additions, using overflow helpers for
multiplications and offset updates, and propagating dma_length()
failures to the caller.

Fixes: 5a5e9c0228e6 ("accel: Add Arm Ethos-U NPU driver")
Cc: stable@vger.kernel.org
Signed-off-by: Muhammad Bilal <meatuni001@gmail.com>
Link: https://patch.msgid.link/20260524103710.47397-1-meatuni001@gmail.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>

accel/ethosu: fix wrong weight index in NPU_SET_SCALE1_LENGTH on U85

On non-U65 hardware (e.g. U85), opcode 0x4093 is NPU_SET_WEIGHT2_LENGTH.
The BASE handler for the same opcode correctly assigns to
st.weight[2].base, but the LENGTH handler mistakenly assigns cmds[1]
to st.weight[1].length instead of st.weight[2].length.

This leaves weight[2].length at its initialised sentinel value of
0xffffffff and corrupts weight[1].length with the user-supplied value,
breaking the software bounds-check state for both weight buffers on U85.

Fix the index to match the BASE handler.

Fixes: 5a5e9c0228e6 ("accel: Add Arm Ethos-U NPU driver")
Cc: stable@vger.kernel.org
Signed-off-by: Muhammad Bilal <meatuni001@gmail.com>
Link: https://patch.msgid.link/20260523210840.92039-3-meatuni001@gmail.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>

accel/ethosu: reject NPU_OP_RESIZE commands from userspace

NPU_OP_RESIZE is a U85-only command that the driver does not yet
implement. The existing WARN_ON(1) placeholder fires unconditionally
whenever userspace submits this command via DRM_IOCTL_ETHOSU_GEM_CREATE,
causing unbounded kernel log spam.

If panic_on_warn is set the kernel panics, giving any unprivileged user
with access to the DRM device a trivial denial-of-service primitive.

Replace the WARN_ON(1) with an explicit -EINVAL return so the ioctl
rejects the command before it reaches hardware.

Fixes: 5a5e9c0228e6 ("accel: Add Arm Ethos-U NPU driver")
Cc: stable@vger.kernel.org
Signed-off-by: Muhammad Bilal <meatuni001@gmail.com>
Link: https://patch.msgid.link/20260523210840.92039-2-meatuni001@gmail.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>

tools: ynl: try to avoid the very slow YAML loader

Turns out Python YAML defaults to a pure Python loader for YAML
files which is a lot slower than the C loader (using libyaml).
Try to use the C one whenever possible.

The avg time to run:
$ tools/net/ynl/pyynl/cli.py --family tc --no-schema
drops from 300+ ms to 115 ms with this change (40 samples).

We could drop the load time further to 85 ms if we "compiled"
the specs to JSON. Slightly tricky parts are that we don't
currently install the specs at all on make install, so it's
unclear where to put the conversion. Also JSON has questionable
support for comments and we need an SPDX line.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260603210810.2636193-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

accel/ethosu: fix IFM region index out-of-bounds in command stream parser

NPU_SET_IFM_REGION extracts the region index with param & 0x7f, giving
a maximum value of 127. However region_size[] and output_region[] in
struct ethosu_validated_cmdstream_info are both sized to
NPU_BASEP_REGION_MAX (8), giving valid indices [0..7].

Every other region assignment in the same switch uses param & 0x7:
  NPU_SET_OFM_REGION:  st.ofm.region  = param & 0x7;
  NPU_SET_IFM2_REGION: st.ifm2.region = param & 0x7;
  NPU_SET_WEIGHT_REGION: st.weight[0].region = param & 0x7;
  NPU_SET_SCALE_REGION:  st.scale[0].region  = param & 0x7;

The 0x7f mask on IFM is inconsistent and appears to be a typo.

feat_matrix_length() and calc_sizes() use the region index directly
as an array subscript into the kzalloc'd info struct:
  info->region_size[fm->region] = max(...);

A userspace caller supplying NPU_SET_IFM_REGION with param > 7 causes
a write up to 127*8 = 1016 bytes past the start of region_size[],
corrupting adjacent kernel heap data.

Fix by applying the same & 0x7 mask used by all other region
assignments.

Fixes: 5a5e9c0228e6 ("accel: Add Arm Ethos-U NPU driver")
Cc: stable@vger.kernel.org
Signed-off-by: Muhammad Bilal <meatuni001@gmail.com>
Link: https://patch.msgid.link/20260523195159.55801-1-meatuni001@gmail.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-7.1-rc7).

Silent conflicts:

net/wireless/nl80211.c
  cb9959ab5f99 ("wifi: cfg80211: enforce HE/EHT cap/oper consistency")
  a384ae969902 ("wifi: cfg80211: move AP HT/VHT/... operation to beacon info")
https://lore.kernel.org/aiGJDaHV4UlCexIQ@sirena.org.uk

Conflicts:

drivers/net/wireless/intel/iwlwifi/mld/ap.c
  a342c99cb70d ("wifi: iwlwifi: mld: honor BSS_CHANGED_BEACON_ENABLED")
  9bf1b409afc7 ("wifi: iwlwifi: mld: send tx power constraints before link activation")
https://lore.kernel.org/ah2bfedhV45ZxMO8@sirena.org.uk

drivers/net/wireless/intel/iwlwifi/pcie/drv.c
  093305d801fa ("wifi: iwlwifi: pcie: simplify the resume flow if fast resume is not used")
  e2323929a68a ("wifi: iwlwifi: pcie: add debug print for resume flow if powered off")
https://lore.kernel.org/ah2bfedhV45ZxMO8@sirena.org.uk

Adjacent changes:

drivers/net/ethernet/airoha/airoha_eth.c
  b38cae85d1c4 ("net: airoha: Fix use-after-free in metadata dst teardown")
  ec6c391bcca7 ("net: airoha: Introduce airoha_gdm_dev struct")

drivers/net/ethernet/microchip/lan743x_main.c
  8173d22b211f ("net: lan743x: permit VLAN-tagged packets up to configured MTU")
  e3c6508a46f5 ("net: lan743x: avoid netdev-based logging before netdev registration")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-7.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from Netfilter, wireless and Bluetooth.

  Current release - fix to a fix:

   - Bluetooth: MGMT: fix backward compatibility with bluetoothd
     which adds stray bytes to MGMT_OP_ADD_EXT_ADV_DATA

  Previous releases - regressions:

   - af_unix: fix inq_len update inaccuracy on partial read

   - eth: fec: fix pinctrl default state restore order on resume

   - wifi: iwlwifi:
       - mvm: don't support the reset handshake for old firmwares
       - pcie: simplify the resume flow if fast resume is not used,
         work around NIC access failures

  Previous releases - always broken:

   - Bluetooth: L2CAP: reject BR/EDR signaling packets over MTUsig

   - sctp: fix a couple of bugs in COOKIE_ECHO processing

   - sched: fix pedit partial COW leading to page cache corruption

   - wifi: nl80211: reject oversized EMA RNR lists

   - netfilter:
       - conntrack_irc: fix possible out-of-bounds read
       - bridge: make ebt_snat ARP rewrite writable

   - appletalk: zero-initialize aarp_entry to prevent heap info leak

   - ipv4: restrict IPOPT_SSRR and IPOPT_LSRR options

   - mptcp: fix number of bugs reported by AI scans and discovered
     during NVMe over MPTCP testing"

* tag 'net-7.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
  Reapply "bnxt_en: bring back rtnl_lock() in the bnxt_open() path"
  udp: clear skb->dev before running a sockmap verdict
  sctp: purge outqueue on stale COOKIE-ECHO handling
  bonding: annotate data-races arcound churn variables
  net/802/mrp: fix vector attribute parsing in mrp_pdu_parse_vecattr
  rtase: Avoid sleeping in get_stats64()
  ieee802154: 6lowpan: only accept IPv6 packets in lowpan_xmit()
  ipv6: mcast: Fix use-after-free when processing MLD queries
  selftests: net: add vxlan vnifilter notification test
  vxlan: vnifilter: fix spurious notification on VNI update
  vxlan: vnifilter: send notification on VNI add
  rtase: Reset TX subqueue when clearing TX ring
  octeontx2-af: npc: Fix CPT channel mask in npc_install_flow
  dt-bindings: ethernet: eswin: fix hsp-sp-csr backward compatibility
  sctp: validate cached peer INIT chunk length in COOKIE_ECHO processing
  net/sched: fix pedit partial COW leading to page cache corruption
  vsock/vmci: fix sk_ack_backlog leak on failed handshake
  net: bonding: fix NULL pointer dereference in bond_do_ioctl()
  geneve: fix length used in GRO hint UDP checksum adjustment
  net: ethernet: mtk_eth_soc: Fix use-after-free in metadata dst teardown
  ...

Merge tag 'drm-xe-fixes-2026-06-04' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

- Revert removing support for unpublished NVL-S GuC (Daniele)
- Suspend fixes related to multi-queue (Niranjana)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/aiHPGiPrAyHgwBZl@intel.com

Merge branch 'net-ethtool-make-sure-__ethtool_get_link_ksettings-is-ops-locked'

Jakub Kicinski says:

====================
net: ethtool: make sure __ethtool_get_link_ksettings() is ops-locked

This is prep for the series which will make most of the ethtool ops
run without rtnl_lock. The AI bots surfaced a number of callers of
__ethtool_get_link_ksettings() which need fixing, so I decided to
send that as a smaller prep-series. Each driver changed separately
for ease of review.

Full series unlocking ethtool ops AKA v1::
https://lore.kernel.org/20260528231637.251822-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20260603012840.2254293-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: make sure __ethtool_get_link_ksettings() is ops-locked

All drivers which may call *_get_link_ksettings() on ops-locked
devices from paths already holding the ops lock are ready now.
Make __ethtool_get_link_ksettings() take the ops lock, and assert
that it's held in netif_get_link_ksettings().

Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260603012840.2254293-12-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

scsi: fcoe: don't recurse on the netdev's ops lock

fcoe_link_speed_update() calls __ethtool_get_link_ksettings() on the
lport's netdev, which will soon take the dev's ops lock. Some notifier
callers already arrive with this lock held. Switch to
netif_get_link_ksettings() and adjust the explicit call sites to take
the netdev lock explicitly.

Within fcoe_device_notification() try to only query the link speed
from notifiers which announce link state change (UP / CHANGE),
DOWN / GOING_DOWN notifiers are slightly sketchy when it comes
to ops locking right now, and the code already special-cases
those by maintaining the local link_possible variable.

Also take the lock in bnx2fc_net_config(), even though I think
that bnx2fc call sites are largely irrelevant since it's not
an ops-locked driver.

Link: https://patch.msgid.link/20260603012840.2254293-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

leds: trigger: netdev: don't recurse on the netdev ops lock

get_device_state() calls __ethtool_get_link_ksettings() on the trigger's
netdev, which will soon take the dev's ops lock. Three of its callers
already hold that lock and one doesn't, so the function would either
deadlock or run unprotected depending on the path.

Make get_device_state() expect the dev's ops lock held and switch to
netif_get_link_ksettings():

  * netdev_trig_notify() NETDEV_UP / NETDEV_CHANGE / NETDEV_CHANGENAME
    arrive with the dev's ops lock held (per netdevices.rst).
  * set_device_name() does not hold the lock, take it explicitly.

Due to lock ordering we need to reshuffle the code in set_device_name()
a little bit. We need to find the device earlier on, so that we can
lock it before we take trigger_data->lock.

Link: https://patch.msgid.link/20260603012840.2254293-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sched: don't recurse on the netdev ops lock in qdiscs

cbs_set_port_rate() and taprio_set_picos_per_byte() are reached from
two paths and both already hold the device's ops lock:

*_change(), via tc_modify_qdisc() which calls netdev_lock_ops(dev)
before dispatching to the qdisc ops.

*_dev_notifier() on NETDEV_UP / NETDEV_CHANGE, where caller
holds the ops lock across the notifier chain.

Switch to netif_get_link_ksettings() to avoid deadlock once
__ethtool_get_link_ksettings() starts taking the netdev lock.

Link: https://patch.msgid.link/20260603012840.2254293-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bridge: don't recurse on the port's netdev ops lock

port_cost() calls __ethtool_get_link_ksettings() on the port device,
which will soon take the port's ops lock. br_port_carrier_check()
is reached via the NETDEV_CHANGE notifier from linkwatch, which
already holds the port's ops lock, so the call would deadlock.

Make port_cost() expect the port's ops lock held and switch to
netif_get_link_ksettings(). The only other caller is new_nbp(),
make sure it takes the lock explicitly.

Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260603012840.2254293-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: team: don't recurse on the port's netdev ops lock

__team_port_change_send() calls __ethtool_get_link_ksettings() on
the port, which will soon take the port's ops lock. The notifier
caller already holds it while the slave-add/del callers do not,
so the function would either deadlock or run unprotected depending
on the path.

Make __team_port_change_send() expect the port's ops lock held and
switch to netif_get_link_ksettings(). team_device_event()'s NETDEV_UP /
NETDEV_CHANGE already arrive with the port's ops lock held.
team_port_add() now take it explicitly.

Note that NETDEV_DOWN and team_port_del() will pass false as @linkup
so they will not execute netif_get_link_ksettings(). This is fortunate
as NETDEV_DOWN has somewhat mixed locking right now.

Link: https://patch.msgid.link/20260603012840.2254293-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bonding: don't recurse on the slave's netdev ops lock

bond_update_speed_duplex() calls __ethtool_get_link_ksettings() on
the slave, which will soon take the slave's ops lock. One of its
callers already holds it and the other three don't, so the function
would either deadlock or run unprotected depending on the path.

Make the helper expect the slave's ops lock held and switch to
netif_get_link_ksettings(). Wrap the three call sites that don't
already hold it:

  * bond_enslave() (rtnl held; core drops the lower's ops lock
    around ->ndo_add_slave).
  * bond_miimon_commit() (rtnl_trylock'd from the mii workqueue).
  * bond_ethtool_get_link_ksettings() (rtnl held via ethtool layer,
    bond device itself is not ops locked).

The call site which does already hold the ops lock is
bond_slave_netdev_event() via NETDEV_UP / NETDEV_CHANGE notifiers,
so it stays as-is.

Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Link: https://patch.msgid.link/20260603012840.2254293-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: add netif_get_link_ksettings() for correct ops-locked use

__ethtool_get_link_ksettings() is exported and called from sysfs
and many drivers. It invokes ethtool_ops->get_link_ksettings
so by our own docs it should be holding netdev lock for ops locked
devices. Looks like commit 2bcf4772e45a ("net: ethtool:
try to protect all callback with netdev instance lock")
missed adding the ops lock here.

There's a number of callers we need to fix up so let's add the
netif_get_link_ksettings() helper first, without any actual
locking changes (this commit is a nop).

Not treating this as a fix because I don't think any driver cares
at this point, but if we want to remove the rtnl_lock protection
this will become critical.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260603012840.2254293-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: document NETDEV_CHANGENAME as ops locked

NETDEV_CHANGENAME is only emitted from netif_change_name().
netif_change_name() has two callers both of which hold netdev_lock_ops()
around the call site:
- dev_change_name()
- do_setlink()

Document NETDEV_CHANGENAME as always ops locked.

Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260603012840.2254293-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: cmis_cdb: hold instance lock for ops locked devices

FW module flashing was written so that the flashing happens
without holding rtnl_lock. This allows flashing multiple modules
at once. Current drivers can handle that well, but we should
let drivers depend on the netdev instance lock. Instance lock
is per netdev, and so is the module so we won't break parallel
updates.

Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260603012840.2254293-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: rename netdev_ops_assert_locked()

Jakub suggests renaming the existing assert to match
the netdev_lock_ops_compat() semantics.

We want netdev_assert_locked_ops() to mean - if the driver
is ops locked - check that it's holding the device lock.

The existing helper check for either ops lock or rtnl_lock,
which is the locking behavior of netdev_lock_ops_compat().

The reason for naming divergence is likely that
netdev_ops_assert_locked() predated the _compat() helpers.

Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260603012840.2254293-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'trace-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:

- Fix CFI violation in probestub function

   The probestub is a function to allow tprobes to hook to a tracepoint
   to gain access to its parameters.

   The function itself is only referenced by the tracepoint structure
   which lives in the __tracepoint section. objtool explicitly ignores
   that section and when processing functions in the kernel, if it
   detects one that has no references it will seal it to have its ENDBR
   stripped on boot up.

   This means the probstub function will have its ENDBR stripped and if
   a tprobe is attached to it with IBT enabled, it will go *boom*.

* tag 'trace-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix CFI violation in probestub being called by tprobes

rust: sync: add #[must_use] to GlobalGuard and GlobalLock::try_lock

Guard is marked #[must_use] since dropping it releases the lock. GlobalGuard
wraps Guard with identical semantics but was missing the annotation, so
discarding it would silently compile without warning.

Similarly, GlobalLock::try_lock was missing #[must_use]. Option<T> does not
propagate #[must_use] from T, so the attribute needs to be on the function
directly - same reason Lock::try_lock has it.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Ashutosh Desai <ashutoshdesai993@gmail.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260502160057.3402896-1-ashutoshdesai993@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

drm/amd/display: Consult MCCS FreeSync cap only if requested & supported

When the do_mccs parameter is false, we don't call
dm_helpers_read_mccs_caps, so sink->mccs_caps.freesync_supported is
unlikely to be true.

Fixes: 6f71d5dd3206 ("drm/amd/display: Read sink freesync support via mccs")
Bug: https://gitlab.freedesktop.org/drm/amd/-/work_items/5286
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 115bf5ca318e18a3dc1888ec6271c7052774952a)

drm/amdkfd: Unwind debug trap enable on copy_to_user failure

If kfd_dbg_trap_enable() fails while copying runtime_info to userspace,
it had already activated the trap, set debug_trap_enabled, taken an extra
process reference, and opened the debug event file. Return -EFAULT without
unwinding that state, leaving inconsistent trap state and a refcount
imbalance that could break later DISABLE/ENABLE.

On copy_to_user failure, deactivate the trap and undo the rest of the
enable setup before returning.

Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 01112e241e37f9ac98b6f418d93ce2e0b87b7ee0)

drm/amdkfd: Add bounds check for AMDKFD_IOC_WAIT_EVENTS

The kfd_wait_on_events ioctl passes a user-supplied num_events parameter
directly to alloc_event_waiters() which calls kcalloc() without validation.
This allows unprivileged users with /dev/kfd access to trigger large kernel
memory allocations, potentially causing memory exhaustion and denial of
service via the OOM killer.

Add a check to reject num_events values exceeding KFD_SIGNAL_EVENT_LIMIT
(4096), which is the maximum number of events a single process can create.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 39eb6da7acee8d0cc12a8959235b590f295d7b4c)

drm/amdgpu: restart the CS if some parts of the VM are still invalidated

Make sure that we only submit work with full up to date VM page tables.

Backport to 7.1 and older.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 59720bfd8c6dbebeb8d5a7ab64241b007efd9213)
Cc: stable@vger.kernel.org

drm/amdgpu/userq: Fix reading timeline points in wait ioctl

Use correct u64 type.

Signed-off-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0ac98160dfb6ab3c6d7b38e0ff9687780beed9cb)