git.ipfire.org Git - thirdparty/kernel/stable.git/log

xfs: convert xchk_inode_xref_set_corrupt to xchk_ip_xref_set_corrupt

All xref corruption reports have the xfs_inode structure, so switch
the helper to work based on that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: add a xchk_ip_set_corrupt helper

Add a small wrapper to mark an inode as corrupt via its xfs_inode
pointer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: add a xfs_rmap_inode_owner helper

Add a small wrapper for initializing the rmap owner to i_ino.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: add a xfs_rmap_inode_bmbt_owner

Add a small wrapper for initializing the bmbt owner to i_ino.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: add a XFS_INODE_TO_FSB helper

Add a shortcut for the common XFS_INO_TO_FSB(mp, ip->i_ino) pattern.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: add a XFS_INODE_TO_AGINO helper

Add a shortcut for the common XFS_INO_TO_AGINO(mp, ip->i_ino) pattern.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: add a XFS_INODE_TO_AGNO helper

Add a shortcut for the common XFS_INO_TO_AGNO(mp, ip->i_ino) pattern.
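
A minimal sketch of this kind of shortcut, using stand-in types rather
than the real XFS definitions. All `demo_*` names and the shift-based
inode number layout are assumptions made for illustration only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real XFS structures are far richer. */
struct demo_mount {
    unsigned int agino_log;    /* assumed: bits used by the AG-relative part */
};

struct demo_inode {
    struct demo_mount *i_mount;
    uint64_t i_ino;
};

/* Existing style: the caller passes the mount and a raw inode number. */
static inline uint32_t demo_ino_to_agno(const struct demo_mount *mp, uint64_t ino)
{
    return (uint32_t)(ino >> mp->agino_log);
}

/* Shortcut style: the caller passes only the in-core inode. */
static inline uint32_t demo_inode_to_agno(const struct demo_inode *ip)
{
    assert(ip != NULL && ip->i_mount != NULL);
    return demo_ino_to_agno(ip->i_mount, ip->i_ino);
}
```

Call sites that previously spelled out both the mount and ip->i_ino
would then pass just the inode pointer.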

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: fix unreachable BIGTIME check in dquot flush validation

The dqp->q_id == 0 check inside the XFS_DQTYPE_BIGTIME block is
unreachable because root dquots return successfully earlier. Reject root
dquots with XFS_DQTYPE_BIGTIME before that early return, preserving the
intended validation and removing the unreachable condition.
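
The control flow can be sketched as follows. This is a simplified
model, not the XFS code: the `demo_*` names, the flag value and the
boolean return convention are assumptions for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_DQTYPE_BIGTIME 0x80u    /* assumed flag bit, for illustration */

struct demo_dquot {
    uint32_t q_id;      /* 0 means the root dquot */
    uint8_t q_type;
};

/* Old shape: the root dquot returns early, so the inner test never runs. */
static inline bool demo_flush_check_old(const struct demo_dquot *dqp)
{
    assert(dqp != NULL);
    if (dqp->q_id == 0)
        return true;
    if (dqp->q_type & DEMO_DQTYPE_BIGTIME) {
        if (dqp->q_id == 0)    /* unreachable: q_id is nonzero here */
            return false;
    }
    return true;
}

/* New shape: reject a BIGTIME root dquot before the early return. */
static inline bool demo_flush_check_new(const struct demo_dquot *dqp)
{
    assert(dqp != NULL);
    if (dqp->q_id == 0)
        return !(dqp->q_type & DEMO_DQTYPE_BIGTIME);
    return true;
}
```

In this model a root dquot carrying the BIGTIME flag passes the old
check but fails the new one.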

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 4ea1ff3b4968 ("xfs: widen ondisk quota expiration timestamps to handle y2038+")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Alexey Nepomnyashih <sdl@nppct.ru>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Allison Henderson <achender@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: fix exchmaps reservation limit check

xfs_exchmaps_estimate_overhead() adds the bmbt and rmapbt
overhead to a local resblks variable, but the final UINT_MAX
check still tests req->resblks. That is the reservation value
from before the overhead was added.

The computed value is stored back in req->resblks and later passed
to xfs_trans_alloc(), whose block reservation argument is unsigned
int. Check the computed reservation so the existing limit applies
to the value that will be used.
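
A rough sketch of the corrected check, with a stand-in request type.
The `demo_*` names and the -ENOSPC return value are assumptions, and
64-bit overflow of the additions is ignored here for brevity:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the exchange request. */
struct demo_req {
    uint64_t resblks;
};

static inline int demo_estimate_overhead(struct demo_req *req,
                                         uint64_t bmbt, uint64_t rmapbt)
{
    uint64_t resblks;

    assert(req != NULL);
    resblks = req->resblks;
    resblks += bmbt;
    resblks += rmapbt;

    /* Test the computed total, not the stale req->resblks. */
    if (resblks > UINT_MAX)
        return -ENOSPC;

    req->resblks = resblks;
    return 0;
}
```

With the stale field tested instead, a request just under UINT_MAX
would pass the check and only exceed the limit after the overhead was
stored back.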

Fixes: 966ceafc7a43 ("xfs: create deferred log items for file mapping exchanges")
Cc: stable@vger.kernel.org # v6.10
Signed-off-by: Yingjie Gao <gaoyingjie@uniontech.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: drop the experimental warning for the zoned allocator

The zoned allocator was released with 6.15 on May 25, 2025. It has
seen constant maintenance and improvements and no major issues, so
promote it out of the experimental category.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

drm/i915/edp: Check supported link rates DPCD read

intel_edp_set_sink_rates() reads DP_SUPPORTED_LINK_RATES into a local
stack array and then parses the array unconditionally. If the read
fails, the array contents are not valid and may result in bogus sink
link rates being used.

Use drm_dp_dpcd_read_data() and clear the sink rate array on failure,
so the existing parser falls back to the default sink rate handling.
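
The idea can be sketched in plain C as below. The reader callbacks,
the array size and the rate values are hypothetical and do not model
the DRM helpers:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_RATES 8

/* Stub reader that succeeds and reports two rate entries. */
static inline int demo_read_ok(uint16_t *buf, size_t n)
{
    memset(buf, 0, n * sizeof(buf[0]));
    buf[0] = 8100;
    buf[1] = 13500;
    return 0;
}

/* Stub reader that fails after scribbling on part of the buffer. */
static inline int demo_read_fail(uint16_t *buf, size_t n)
{
    (void)n;
    buf[0] = 0xdead;
    return -1;
}

/* Read into the array; on failure clear it so no stale entry is parsed. */
static inline int demo_get_sink_rates(int (*reader)(uint16_t *, size_t),
                                      uint16_t rates[DEMO_MAX_RATES])
{
    int i;

    assert(reader != NULL);
    if (reader(rates, DEMO_MAX_RATES) < 0)
        memset(rates, 0, DEMO_MAX_RATES * sizeof(rates[0]));

    /* Parser: count entries up to the first zero terminator. */
    for (i = 0; i < DEMO_MAX_RATES && rates[i] != 0; i++)
        ;
    return i;
}
```

A failed read then yields zero parsed entries, which is the condition
under which a caller would fall back to its defaults.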

Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.

Fixes: 68f357cb7347 ("drm/i915/dp: generate and cache sink rate array for all DP, not just eDP 1.4")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patch.msgid.link/20260529145759.1640646-1-n.zhandarovich@fintech.ru
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit bd61c7756b34157e093028225a69383b4b1203cc)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>

i2c: eg20t: Consistently define pci_device_ids using named initializers

The .driver_data members of the struct pci_device_id arrays were
initialized by list expressions. This isn't easily readable if you're
not into PCI. Using named initializers is more explicit and thus easier
to parse.

This change doesn't introduce changes to the compiled pci_device_id
arrays. Tested on x86 and arm64.
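
For readers less familiar with the two initializer styles, here is a
small sketch. The struct is a simplified analogue of struct
pci_device_id, and the macro, IDs and `demo_*` names are made up for
illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified analogue of struct pci_device_id; member order matters below. */
struct demo_pci_id {
    uint32_t vendor, device;
    uint32_t subvendor, subdevice;
    uint32_t class, class_mask;
    unsigned long driver_data;
};

#define DEMO_ANY_ID (~0u)

/* Hypothetical helper macro that names the first four members. */
#define DEMO_PCI_DEVICE(vend, dev) \
    .vendor = (vend), .device = (dev), \
    .subvendor = DEMO_ANY_ID, .subdevice = DEMO_ANY_ID

/* List form: the trailing 0, 0, 1 land on class, class_mask, driver_data. */
static const struct demo_pci_id demo_id_list = {
    DEMO_PCI_DEVICE(0x1234, 0x5678), 0, 0, 1
};

/* Named form: the reader sees directly which member receives the 1. */
static const struct demo_pci_id demo_id_named = {
    DEMO_PCI_DEVICE(0x1234, 0x5678), .driver_data = 1
};

static inline bool demo_ids_equal(const struct demo_pci_id *a,
                                  const struct demo_pci_id *b)
{
    assert(a != NULL && b != NULL);
    return a->vendor == b->vendor && a->device == b->device &&
           a->subvendor == b->subvendor && a->subdevice == b->subdevice &&
           a->class == b->class && a->class_mask == b->class_mask &&
           a->driver_data == b->driver_data;
}
```

Both objects end up with identical member values; only the readability
of the source differs.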

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/41316792102ff2860ec019373293cb07d545a0b0.1779481436.git.u.kleine-koenig@baylibre.com

i2c: designware-pcidrv: Consistently define pci_device_ids using named initializers

The .driver_data members of the struct pci_device_id array were
initialized by list expressions. This isn't easily readable if you're
not into PCI. Using named initializers is more explicit and thus easier
to parse.

This change doesn't introduce changes to the compiled pci_device_id
array. Tested on x86 and arm64.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/68667c4ab85716b190d8b705813b610e21a386f6.1779481436.git.u.kleine-koenig@baylibre.com

i2c: bcm-kona: fix spelling mistake in timeout-check comment

Fix a spelling mistake in the timeout-check comment.

No functional change.

Signed-off-by: Stepan Ionichev <sozdayvek@gmail.com>
Acked-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260503165925.1738-1-sozdayvek@gmail.com

i2c: cadence: Add shutdown handler

During system reboot or kexec, in-flight I2C transfers can cause
spurious interrupts or leave the bus in an undefined state. Add a
shutdown handler that marks the adapter suspended and resets the
controller, ensuring a clean handoff.

Signed-off-by: Ajay Neeli <ajay.neeli@amd.com>
Acked-by: Michal Simek <michal.simek@amd.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260430053050.3590173-1-ajay.neeli@amd.com

accel/ivpu: Fix signed integer truncation in IPC receive

Fix potential buffer overflow where firmware-supplied data_size is cast
to signed int before being used in min_t(). Large unsigned values
(>= 0x80000000) become negative, causing unsigned wraparound and
oversized memcpy operations that can overflow the stack buffer.

Change min_t(int, ...) to min() as both values are unsigned and can be
handled by min() without explicit cast.
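
The truncation can be reproduced in userspace with a simplified
stand-in for min_t(). The `demo_*` names are hypothetical, and the
out-of-range conversion to int is implementation-defined (it wraps to
a negative value on the usual two's complement compilers):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's min_t(): cast both sides, then compare. */
#define demo_min_t(type, a, b) \
    ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Buggy shape: a size >= 0x80000000 turns negative when cast to int. */
static inline uint32_t demo_copy_len_buggy(uint32_t data_size, uint32_t buf_size)
{
    return (uint32_t)demo_min_t(int, data_size, buf_size);
}

/* Fixed shape: compare the two unsigned values directly. */
static inline uint32_t demo_copy_len_fixed(uint32_t data_size, uint32_t buf_size)
{
    uint32_t len = data_size < buf_size ? data_size : buf_size;

    assert(len <= buf_size);
    return len;
}
```

With data_size = 0x80000000 and a 64-byte buffer, the buggy variant
selects the negative value as the "minimum" and hands the full
0x80000000 back as a length, while the fixed variant returns 64.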

Fixes: 3b434a3445ff ("accel/ivpu: Use threaded IRQ to handle JOB done messages")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://patch.msgid.link/20260601161643.229342-1-andrzej.kacprowski@linux.intel.com

Merge tag 'drm-misc-next-fixes-2026-06-05' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next-fixes for v7.2-rc1:
- Revert last minute IS_ERR_OR_NULL changes in nouveau/gsp.
- Fix build warning in drm scheduler.
- Flush caches and TLB before v3d runtime suspend.
- Fix a trace and debug command in amdxdna.
- Fix heap buffer address validation when PASID is disabled in amdxdna.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/a4a5bf50-3fc8-4faf-884b-08121687124a@linux.intel.com

net: openvswitch: fix possible kfree_skb of ERR_PTR

After the patch in the "Fixes" tag, the allocation of the "reply" skb
can happen either before or after locking the ovs_mutex.

However, error cleanups still follow the classical reversed order,
assuming "reply" is allocated before locking: it is freed after unlocking.

If "reply" allocation happens after locking the mutex and it fails,
"reply" is left with an ERR_PTR, and execution jumps to the corresponding
cleanup stage which will try to free an invalid pointer.

Fix this by setting the pointer to NULL after having saved its error
value.
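
A userspace sketch of the pattern follows. The ERR_PTR-style helpers
are imitations of the kernel macros, and the handler, counter and
other `demo_*` names are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_ERRNO 4095

/* Userspace imitations of the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
static inline void *demo_err_ptr(long err)
{
    assert(err < 0 && err >= -DEMO_MAX_ERRNO);
    return (void *)(intptr_t)err;
}

static inline long demo_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

static inline int demo_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

static int demo_frees;    /* counts frees of non-NULL pointers */

/* NULL is ignored, anything else is released. */
static inline void demo_free_reply(void *reply)
{
    if (!reply)
        return;
    demo_frees++;
    free(reply);
}

/* Hypothetical command handler whose reply is allocated under the lock. */
static inline int demo_flow_cmd(int fail_alloc)
{
    void *reply;
    int error = 0;

    /* lock would be taken here */
    reply = fail_alloc ? demo_err_ptr(-ENOMEM) : malloc(16);
    if (demo_is_err(reply)) {
        error = (int)demo_ptr_err(reply);
        reply = NULL;    /* save the error, then make the pointer safe to free */
        goto err_unlock;
    }
    /* success path: the real code would fill in and send the reply */
err_unlock:
    /* lock would be dropped here */
    demo_free_reply(reply);
    return error;
}
```

Without the `reply = NULL` line, the shared cleanup label would pass
the encoded error value to the free routine.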

Fixes: 893f139b9a6c ("openvswitch: Minimize ovs_flow_cmd_new|set critical sections.")
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://patch.msgid.link/20260604121946.942164-1-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'tls-receive-path-fixes-and-clean-ups'

Chuck Lever says:

====================
tls: receive-path fixes and clean-ups

I'd like to encourage in-kernel kTLS consumers (NFSD, NVMe/TCP) to
coalesce on the use of read_sock. While auditing read_sock for that
purpose, Hannes and Sabrina flagged a few rough edges in the receive
paths.

This series is a set of clean-ups, not a performance series. Async
batch decryption and its submit/deliver scaffolding were dropped
during previous review: async_capable is always false for TLS 1.3,
the version NFSD and NVMe/TCP both require, so async-related
improvements were unreachable for the in-kernel consumers this
work targets.

A subsequent series will introduce infrastructure to support
KeyUpdate for in-kernel kTLS consumers, which need to handle TLS
Alert messages that trigger a tlshd upcall.
====================

Link: https://patch.msgid.link/20260604-tls-read-sock-v12-0-b114efa6e3e2@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tls: Flush backlog before waiting for a new record

While lock_sock is held, incoming TCP segments land on
sk->sk_backlog rather than sk->sk_receive_queue.
tls_rx_rec_wait() inspects only sk_receive_queue, so backlog
data remains invisible. For non-blocking callers (read_sock,
and recvmsg or splice_read with MSG_DONTWAIT) this causes a
spurious -EAGAIN. For blocking callers it forces an
unnecessary sleep/wakeup cycle.

Flush the backlog inside tls_rx_rec_wait() before checking
sk_receive_queue so the strparser can parse newly-arrived
segments immediately. On the next loop iteration
tls_read_flush_backlog() may redundantly flush, but this
path is cold and the cost is negligible.

Backlog processing can run tcp_reset(), which calls
tcp_done_with_error() to set sk->sk_err = ECONNRESET and then
tcp_done() to set sk->sk_shutdown = SHUTDOWN_MASK. The pre-existing
top-of-loop sk_err check already ran before the flush, so the
freshly-set error would be masked by the next-line sk_shutdown test
returning 0 (EOF). Re-check sk_err immediately before the sk_shutdown
test so a connection abort surfaces as -ECONNRESET rather than a clean
EOF.

Commit f508262ae9f2 ("tls: Preserve sk_err across recvmsg() when
data has been copied") gave the top-of-loop sk_err check a
has_copied split. The recheck applies the same handling: when the
caller has already copied bytes, sk_err is reported but preserved
so the error surfaces on the next call; otherwise sock_error()
consumes it so the error is reported exactly once.

Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/netdev/ahgHgQ84RCc8uYrG@krikkit/
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20260604-tls-read-sock-v12-6-b114efa6e3e2@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tls: Suppress spurious saved_data_ready on all receive paths

Each record release via tls_strp_msg_done() triggered
tls_strp_check_rcv(), which called tls_rx_msg_ready() and
fired saved_data_ready(). During a multi-record receive, the
first N-1 wakeups are pure overhead: the caller is already
running and will pick up subsequent records on the next loop
iteration. The recvmsg and splice_read paths share this waste.

Suppress per-record notifications and emit a single one on
reader exit. tls_rx_rec_done() releases the current record
and parses the next without announcing; tls_strp_check_rcv()
gains a bool announce parameter so callers can request the
quiet form. tls_rx_reader_release() fires the deferred
announce on exit through tls_rx_msg_maybe_announce(), an
idempotent helper that calls saved_data_ready() only when a
record is parsed and has not yet been announced.

To keep the final notification idempotent against records that
the BH or the worker has already announced, tls_strparser gains
a msg_announced bit. tls_rx_msg_maybe_announce() sets the bit
when firing saved_data_ready(); the bit is cleared whenever
the parsed record is wiped, by tls_strp_msg_consume() on
consumption or by tls_strp_msg_load() when the lower socket
loses bytes from under the parse. A second call for the same
parsed record -- as when recvmsg() satisfies the request from
ctx->rx_list without touching the strparser -- becomes a
no-op.
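
The idempotent announce can be modeled roughly as below. The struct,
the wakeup counter standing in for saved_data_ready(), and the
`demo_*` names are assumptions, not the tls_strparser layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the parser state relevant to announcing. */
struct demo_strp {
    bool msg_ready;        /* a record is parsed and waiting */
    bool msg_announced;    /* that record was already announced */
    int wakeups;           /* stands in for saved_data_ready() calls */
};

/* Announce at most once per parsed record. */
static inline void demo_maybe_announce(struct demo_strp *strp)
{
    assert(strp != NULL);
    if (!strp->msg_ready || strp->msg_announced)
        return;
    strp->msg_announced = true;
    strp->wakeups++;
}

/* Wiping the parsed record also clears the announced bit. */
static inline void demo_msg_consume(struct demo_strp *strp)
{
    assert(strp != NULL);
    strp->msg_ready = false;
    strp->msg_announced = false;
}
```

Calling the announce helper twice for the same record produces one
wakeup; a new wakeup becomes possible only after the record is
consumed and another one is parsed.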

With no remaining callers, tls_strp_msg_done() is removed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20260604-tls-read-sock-v12-5-b114efa6e3e2@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tls: Factor tls_strp_msg_consume() from tls_strp_msg_done()

tls_strp_msg_done() conflates releasing the current record with
checking for the next one via tls_strp_check_rcv(). A subsequent
patch needs to release a record without immediately triggering
that check, so the release step is separated into
tls_strp_msg_consume(). tls_strp_msg_done() is preserved as a
wrapper for existing callers.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20260604-tls-read-sock-v12-4-b114efa6e3e2@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tls: Move decrypt-failure abort into tls_rx_one_record()

Three receive paths -- recvmsg, read_sock, and splice_read --
each follow tls_rx_one_record() with the same tls_err_abort()
call. Consolidate the abort into tls_rx_one_record() so the
decrypt-and-abort sequence lives in one place.

A tls_check_pending_rekey() failure after successful
decryption no longer triggers tls_err_abort(). That path
fires only when skb_copy_bits() fails on a valid skb,
which is not a realistic scenario.

Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://patch.msgid.link/20260604-tls-read-sock-v12-3-b114efa6e3e2@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tls: Re-present partially-consumed records in tls_sw_read_sock()

The tls_sw_read_sock() loop releases the current skb whether
read_actor() consumed the full record or only a prefix. When
the actor takes only part of the record and leaves desc->count
non-zero, the remainder is lost: skb is neither requeued nor
freed, and the next iteration overwrites it during dequeue or
tls_rx_rec_wait().

No mainline consumer reaches this path today. The only
in-tree TLS read_sock user is nvme/tcp, whose actor
nvme_tcp_recv_skb() loops until the input length is exhausted
and returns either the full length or a negative error.

The path becomes reachable with the upcoming NFSD svcsock
receive built on read_sock_cmsg. Its data actor,
svc_tcp_recv_actor(), parses an RPC fragment stream
incrementally and returns at fragment boundaries. When a TLS
record carries the tail of one RPC fragment plus the head of
the next, the actor returns fewer bytes than offered while
leaving desc->count non-zero, and without re-presentation the
trailing fragment header vanishes.

__tcp_read_sock() handles the equivalent case for plain TCP
by leaving the unread bytes available for the next iteration
to re-present, via sequence-number re-lookup. Adopt the same
loop-level behavior: when read_actor() consumes only part of
the record, update rxm->offset and rxm->full_len and requeue
the skb to the head of rx_list so the next iteration
re-presents the unread bytes. Switch the open-ended for-loop
to "while (desc->count)" so the partial- and full-consume
arms share a single exit check and read_actor() is not
re-invoked once desc->count is exhausted.
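
The loop-level behavior can be sketched with a single record and a toy
actor. Everything named `demo_*` is hypothetical, including the
actor's 4-byte limit, and record queueing is reduced to one struct:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record state, loosely modeled on rxm->offset/full_len. */
struct demo_rec {
    const char *data;
    size_t offset;      /* first unread byte */
    size_t full_len;    /* unread bytes remaining */
};

/* Toy actor: takes at most 4 bytes per call and decrements *count. */
static inline size_t demo_actor(const char *buf, size_t len, size_t *count)
{
    size_t used = len < 4 ? len : 4;

    (void)buf;
    if (used > *count)
        used = *count;
    *count -= used;
    return used;
}

/* Returns the total number of bytes the actor consumed. */
static inline size_t demo_read_sock(struct demo_rec *rec, size_t count, int *calls)
{
    size_t copied = 0;

    while (count) {
        size_t used;

        if (rec->full_len == 0)
            break;    /* real code would wait for the next record */
        used = demo_actor(rec->data + rec->offset, rec->full_len, &count);
        (*calls)++;
        assert(used <= rec->full_len);
        if (used == 0)
            break;
        copied += used;
        /* Partial consume: advance and re-present the rest next time. */
        rec->offset += used;
        rec->full_len -= used;
    }
    return copied;
}
```

The single `while (count)` test covers both arms: the actor is not
called again once the caller's budget reaches zero, and unread bytes
stay in the record for the next call.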

Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20260604-tls-read-sock-v12-2-b114efa6e3e2@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tls: Avoid evaluating freed skb in tls_sw_read_sock() loop

tls_sw_read_sock() ends its receive loop with while (skb), but
the else branch in the body calls consume_skb(skb) before the
predicate is re-evaluated. A pointer becomes indeterminate when
the object it points to reaches end-of-lifetime (C2011 6.2.4p2),
and using an indeterminate value is undefined behavior (Annex
J.2). The pointer is not dereferenced today -- the predicate
either exits the loop or skb is overwritten at the top of the
next iteration -- but any future change that adds a dereference
between consume_skb() and the predicate would silently introduce
a use-after-free.

Replace the do/while form with an explicit for(;;) loop so
termination happens through a break statement rather than
predicate evaluation of a freed pointer.
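
A sketch of the resulting loop shape, using a toy singly linked queue
and free() in place of consume_skb(). The `demo_*` names are
hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

struct demo_skb {
    struct demo_skb *next;
    int len;
};

/* Push a buffer onto the head of the queue (scaffolding only). */
static inline void demo_push(struct demo_skb **queue, int len)
{
    struct demo_skb *skb = malloc(sizeof(*skb));

    assert(skb != NULL);
    skb->len = len;
    skb->next = *queue;
    *queue = skb;
}

static inline struct demo_skb *demo_dequeue(struct demo_skb **queue)
{
    struct demo_skb *skb = *queue;

    if (skb)
        *queue = skb->next;
    return skb;
}

/* Drain the queue; the loop exits via break, never by testing a freed skb. */
static inline int demo_drain(struct demo_skb **queue)
{
    int total = 0;

    for (;;) {
        struct demo_skb *skb = demo_dequeue(queue);

        if (!skb)
            break;
        total += skb->len;
        free(skb);    /* skb is dead from here; nothing below reads it */
    }
    return total;
}
```

Scoping the pointer inside the loop body also means a later edit
cannot accidentally read it after the free without the compiler or a
reviewer noticing the new use.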

Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20260604-tls-read-sock-v12-1-b114efa6e3e2@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge patch series "`zerocopy` support"

Introduce support for `zerocopy` [1][2]:

    Fast, safe, compile error. Pick two.

    Zerocopy makes zero-cost memory manipulation effortless. We write
    `unsafe` so you don't have to.

It essentially provides derivable traits (e.g. `FromBytes`) and macros
(e.g. `transmute!`) for safely converting between byte sequences and
other types. Having such support allows us to remove some `unsafe` code.

It is among the most downloaded Rust crates (top #50 recent, top #100
all-time downloads; according to crates.io), and it is also used by the
Rust compiler itself.

The series starts with a few preparation commits, then the `zerocopy`
and `zerocopy-derive` crates are added. Finally, an example patch using
it is on top, removing one `unsafe impl`.

I had to adapt the crates slightly (just +2/-3 lines), but both patches
could potentially be provided upstream eventually. Please see the
commits for details.

In total, it is about 39k lines added, or about 32k without counting
`benches/`, which is just for documentation purposes.

See the cover letter for `syn` for some more details about depending on
third-party crates in commit 54e3eae85562 ("Merge patch series "`syn`
support"").

The codegen of an isolated example function similar to the patch on top
is essentially identical. It also turns out that (for that particular
case) `zerocopy`'s version, even under `debug-assertions` enabled, has
no remaining panics, unlike a few in the current code (because the
compiler can prove the remaining `ub_checks` statically).

So their "fast, safe" does indeed check out -- at least in that case.

P.S. This version of `zerocopy` already has the unstable `Ptr{,Inner}`
types -- to play with them, please use:

    make ... KRUSTFLAGS=--cfg=zerocopy_unstable_ptr

Link: https://github.com/google/zerocopy [1]
Link: https://docs.rs/zerocopy [2]
Link: https://patch.msgid.link/20260608141439.182634-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

if_ether.h: add 802.1AC, warn about GRE 0x00FE

Because LLC wasn't complicated/annoying enough, there are 2 more
"ethertypes" being used for it:

- 0x8870 is pretty "normal", it got standardized in
  802.1AC-2016/Cor1-2018 for transporting LLC frames > 1500 bytes.
  It simply replaces the length value (which is no longer encoded, and
  must now be derived from the packet.)  The actual value dates back to
  2001; https://datatracker.ietf.org/doc/html/draft-ietf-isis-ext-eth-01
  (it was used without "proper" standardization for a long time)

- 0x00fe is a doozy - actually "invalid" depending on how you look at
  it; it's used in GRE (and possibly GENEVE) tunnels to transport the
  IS-IS routing protocol.  https://seclists.org/tcpdump/2002/q4/61 is
  the best/oldest source I could find.  It's inspired by the 0xfe SAP
  value, a GRE packet with protocol 0x00fe is followed by a payload "as
  if" it was Ethernet with "<length> 0xfe 0xfe 0x03".  (Again the length
  isn't encoded explicitly anymore.)

The 0x00fe value is quite close to other values the kernel is using
internally for various things (after all they "won't clash for 1500
types").  Except this one does clash, and if someone unknowingly starts
using it for something internal... we end up in a world of pain in
getting IS-IS running on GRE tunnels.  Hence the "WARNING".

Signed-off-by: David 'equinox' Lamparter <equinox@diac24.net>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Link: https://patch.msgid.link/20260605164144.81184-1-equinox@diac24.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

gpu: nova-core: firmware: parse `FalconUCodeDescV2` via `zerocopy`

Now that we have `zerocopy` support, we can avoid some `unsafe` code.

For instance, for `FalconUCodeDescV2`, we can replace the `unsafe impl
FromBytes` by safely deriving `zerocopy`'s `FromBytes` and then calling
`read_from_prefix`.

Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260608141439.182634-20-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: prelude: add `zerocopy{,_derive}::FromBytes`

In order to easily use `FromBytes`, add it to the prelude.

This adds both the trait (`zerocopy::FromBytes`) as well as the derive
macro (`zerocopy_derive::FromBytes`).

We will be adding more as we need them.

Link: https://patch.msgid.link/20260608141439.182634-19-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: zerocopy-derive: enable support in kbuild

With all the new files in place and ready from the new crate, enable
the support for it in the build system.

In addition, skip formatting for this vendored crate.

Link: https://patch.msgid.link/20260608141439.182634-18-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: zerocopy-derive: add `README.md`

Originally, when the Rust upstream `alloc` standard library crate was
vendored in commit 057b8d257107 ("rust: adapt `alloc` crate to the
kernel"), a `README.md` file was added to explain the provenance and
licensing of the source files.

Thus do the same for the `zerocopy-derive` crate.

Cc: Joshua Liebow-Feeser <joshlf@google.com>
Cc: Jack Wrenn <jswrenn@google.com>
Link: https://patch.msgid.link/20260608141439.182634-17-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: zerocopy-derive: avoid generating non-ASCII identifiers

Linux is built with `-Dnon_ascii_idents`. However, `zerocopy-derive`
uses a non-ASCII character (`ẕ`) internally, which in turn triggers
the lint when attempting to use derives like `FromBytes`:

    error: identifier contains non-ASCII characters
       --> rust/kernel/lib.rs:153:9
        |
    153 |         a: u32,
        |         ^
        |
        = note: requested on the command line with `-D non-ascii-idents`

This was already noticed by another project using
`#![deny(non_ascii_idents)]` [1]. `zerocopy` added an
`#[allow(non_ascii_idents)]` [2], but it does not work since, at the
moment, the `non_ascii_idents` lint is a `crate_level_only` one, and thus
`allow`s only work at the crate root level.

Due to this, an issue about relaxing this restriction was created in
upstream Rust [3] some months ago.

Thus work around it here by using another prefix. The likelihood of a
collision is very small for us, since we control the callers, and this
will hopefully be fixed soon at either the `zerocopy` or the Rust level.

I filed an issue [4] about it with upstream `zerocopy` as requested
and we discussed this with upstream Rust and `zerocopy`: the Rust issue
got nominated and a PR [5] to relax the restriction was submitted by
Joshua. Upstream `zerocopy` prefers that approach, so if Rust merges it,
then it means we will be able to remove the workaround when we bump the
MSRV, thus likely late 2027, since we follow Debian Stable.

Cc: Joshua Liebow-Feeser <joshlf@google.com>
Cc: Jack Wrenn <jswrenn@google.com>
Link: https://github.com/google/zerocopy/issues/2880 [1]
Link: https://github.com/google/zerocopy/pull/2882 [2]
Link: https://github.com/rust-lang/rust/issues/151025 [3]
Link: https://github.com/google/zerocopy/issues/3427 [4]
Link: https://github.com/rust-lang/rust/pull/157497 [5]
Link: https://patch.msgid.link/20260608141439.182634-16-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: zerocopy-derive: add SPDX License Identifiers

Originally, when the Rust upstream `alloc` standard library crate was
vendored, the SPDX License Identifiers were added to every file so that
the license on those was clear. The same happened with the vendoring of
`proc_macro2`, `quote` and `syn`. Please see:

  commit 057b8d257107 ("rust: adapt `alloc` crate to the kernel")
  commit 69942c0a8965 ("rust: syn: add SPDX License Identifiers")
  commit ddfa1b279d08 ("rust: quote: add SPDX License Identifiers")
  commit a9acfceb9614 ("rust: proc-macro2: add SPDX License Identifiers")

Thus do the same for the `zerocopy-derive` crate.

This makes `scripts/spdxcheck.py` pass: use parentheses like commit
06e9bfc1e57d ("ionic: make spdxcheck.py happy") did since we have two
`OR` operators in the expression (three licenses).

Finally, as requested, I filed an issue [1] with upstream about it.

Cc: Joshua Liebow-Feeser <joshlf@google.com>
Cc: Jack Wrenn <jswrenn@google.com>
Link: https://github.com/google/zerocopy/issues/3428 [1]
Link: https://patch.msgid.link/20260608141439.182634-15-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: zerocopy-derive: import crate

This is a subset of the Rust `zerocopy-derive` crate, version v0.8.50
(released 2026-05-31), licensed under "BSD-2-Clause OR Apache-2.0 OR
MIT", from:

    https://github.com/google/zerocopy/tree/v0.8.50/zerocopy-derive/src

The files are copied as-is, with no modifications whatsoever (not even
adding the SPDX identifiers).

For copyright details, please see:

    https://github.com/google/zerocopy/blob/v0.8.50/README.md?plain=1
    https://github.com/google/zerocopy/blob/v0.8.50/LICENSE-BSD
    https://github.com/google/zerocopy/blob/v0.8.50/LICENSE-APACHE
    https://github.com/google/zerocopy/blob/v0.8.50/LICENSE-MIT

The next two patches modify these files as needed for use within the
kernel. This patch split allows reviewers to double-check the import
and to clearly see the differences introduced.

The following script may be used to verify the contents:

    for path in $(cd rust/zerocopy-derive/ && find . -type f); do
        curl --silent --show-error --location \
            https://github.com/google/zerocopy/raw/v0.8.50/zerocopy-derive/src/$path \
            | diff --unified rust/zerocopy-derive/$path - && echo $path: OK
    done

Cc: Joshua Liebow-Feeser <joshlf@google.com>
Cc: Jack Wrenn <jswrenn@google.com>
Link: https://patch.msgid.link/20260608141439.182634-14-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: zerocopy: enable support in kbuild

With all the new files in place and ready from the new crate, enable
the support for it in the build system.

In addition, skip formatting for this vendored crate.

Finally, there are no generated symbols expected from `zerocopy`, thus
skip adding the `exports` generation.

Link: https://patch.msgid.link/20260608141439.182634-13-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: zerocopy: add `README.md`

Originally, when the Rust upstream `alloc` standard library crate was
vendored in commit 057b8d257107 ("rust: adapt `alloc` crate to the
kernel"), a `README.md` file was added to explain the provenance and
licensing of the source files.

Thus do the same for the `zerocopy` crate.

Cc: Joshua Liebow-Feeser <joshlf@google.com>
Cc: Jack Wrenn <jswrenn@google.com>
Link: https://patch.msgid.link/20260608141439.182634-12-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: zerocopy: remove float `Display` support

The kernel builds `core` with the `no_fp_fmt_parse` `--cfg`, which means
we do not have support for formatting floating point primitives. However,
`zerocopy` expects those implementations to exist:

    error[E0277]: `f32` doesn't implement `core::fmt::Display`
       --> rust/zerocopy/src/byteorder.rs:172:29
        |
    172 |                   $trait::fmt(&self.get(), f)
        |                   ----------- ^^^^^^^^^^^ the trait `core::fmt::Display` is not implemented for `f32`
        |                   |
        |                   required by a bound introduced by this call
    ...
    907 | / define_type!(
    908 | |     An,
    909 | |     "A 32-bit floating point number",
    910 | |     F32,
    ...   |
    922 | |     []
    923 | | );
        | |_- in this macro invocation
        |
        = help: the following other types implement trait `core::fmt::Display`:
                  i128
                  i16
                  i32
                  i64
                  i8
                  isize
                  u128
                  u16
                and 4 others
        = note: this error originates in the macro `impl_fmt_trait` which comes from the expansion of the macro `define_type` (in Nightly builds, run with -Z macro-backtrace for more info)

Thus work around it by skipping those implementations in `zerocopy`.

Ideally, `zerocopy` would have the equivalent of `no_fp_fmt_parse`;
and, indeed, upstream just added it [1] after I filed an issue [2]
about it as requested. We can try it in a future update of our
vendored copy.

Cc: Joshua Liebow-Feeser <joshlf@google.com>
Cc: Jack Wrenn <jswrenn@google.com>
Link: https://github.com/google/zerocopy/pull/3429 [1]
Link: https://github.com/google/zerocopy/issues/3426 [2]
Link: https://patch.msgid.link/20260608141439.182634-11-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: zerocopy: add SPDX License Identifiers

Originally, when the Rust upstream `alloc` standard library crate was
vendored, the SPDX License Identifiers were added to every file so that
the license on those was clear. The same happened with the vendoring of
`proc_macro2`, `quote` and `syn`. Please see:

  commit 057b8d257107 ("rust: adapt `alloc` crate to the kernel")
  commit 69942c0a8965 ("rust: syn: add SPDX License Identifiers")
  commit ddfa1b279d08 ("rust: quote: add SPDX License Identifiers")
  commit a9acfceb9614 ("rust: proc-macro2: add SPDX License Identifiers")

Thus do the same for the `zerocopy` crate.

This makes `scripts/spdxcheck.py` pass: use parentheses like commit
06e9bfc1e57d ("ionic: make spdxcheck.py happy") did since we have two
`OR` operators in the expression (three licenses).

SPDX identifiers are not added to the `benches` files because they are
included in rendered documentation. Nevertheless, the `README.md` to be
added by a later commit mentions the license.

Finally, as requested, I filed an issue [1] with upstream about it.

Cc: Joshua Liebow-Feeser <joshlf@google.com>
Cc: Jack Wrenn <jswrenn@google.com>
Link: https://github.com/google/zerocopy/issues/3428 [1]
Link: https://patch.msgid.link/20260608141439.182634-10-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: zerocopy: import crate

This is a subset of the Rust `zerocopy` crate, version v0.8.50 (released
2026-05-31), licensed under "BSD-2-Clause OR Apache-2.0 OR MIT", from:

    https://github.com/google/zerocopy/tree/v0.8.50

The files are copied as-is, with no modifications whatsoever (not even
adding the SPDX identifiers).

The `benches` folder is added (i.e. not just `src` like in other cases)
since the files there are included in the rendered documentation,
as well as the `rustdoc` CSS style file that is needed to make those
visually more understandable.

For copyright details, please see:

    https://github.com/google/zerocopy/blob/v0.8.50/README.md?plain=1
    https://github.com/google/zerocopy/blob/v0.8.50/LICENSE-BSD
    https://github.com/google/zerocopy/blob/v0.8.50/LICENSE-APACHE
    https://github.com/google/zerocopy/blob/v0.8.50/LICENSE-MIT

The next two patches modify these files as needed for use within the
kernel. This patch split allows reviewers to double-check the import
and to clearly see the differences introduced.

The following script may be used to verify the contents:

    for path in $(cd rust/zerocopy/ && find . -type f); do
        curl --silent --show-error --location \
            https://github.com/google/zerocopy/raw/v0.8.50/$path \
            | diff --unified rust/zerocopy/$path - && echo $path: OK
    done

Cc: Joshua Liebow-Feeser <joshlf@google.com>
Cc: Jack Wrenn <jswrenn@google.com>
Link: https://patch.msgid.link/20260608141439.182634-9-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: kbuild: support `skip_clippy` for `rustc_procmacro`

Certain vendored crates, like the upcoming `zerocopy-derive`, do not
need to be built with Clippy since we `--cap-lints=allow` them anyway.

Thus add support to skip Clippy for proc macro crates.

Acked-by: Nicolas Schier <nsc@kernel.org>
Link: https://patch.msgid.link/20260608141439.182634-8-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: kbuild: support per-target environment variables

Certain vendored crates, like the upcoming `zerocopy`, use extra
environment variables (e.g. via `env!`).

Thus add support to easily specify those.

Acked-by: Nicolas Schier <nsc@kernel.org>
Link: https://patch.msgid.link/20260608141439.182634-7-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: kbuild: define `procmacro-extension` variable

Since we are adding one more proc macro crate (`zerocopy-derive`),
we are refactoring their handling.

Thus, instead of using `libmacros_extension` as the common variable to
hold the extension for all of them, use a dedicated variable with a more
generic name (including for its implementation).

Link: https://patch.msgid.link/20260608141439.182634-6-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: kbuild: define `procmacro-name` function

Since we are adding one more proc macro crate (`zerocopy-derive`),
we are refactoring their handling.

Thus define a `procmacro-name` function and use it to fill the existing
variables' values.

Reviewed-by: Nicolas Schier <nsc@kernel.org>
Link: https://patch.msgid.link/20260608141439.182634-5-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: sync: add `UniqueArc::as_ptr`

Add an associated function to `UniqueArc` for getting a raw pointer. The
implementation defers to the `Arc` implementation.

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20260605-unique-arc-as-ptr-v2-1-425476d2abdb@kernel.org
[ Relaxed bound moving it to new `T: ?Sized` impl block. Reworded since
it is not a method anymore. Added intra-doc link. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: kbuild: remove unused variable

Since we are adding one more proc macro crate (`zerocopy-derive`),
we are refactoring their handling.

`libpin_init_internal_extension` was added to mimic the setup for
`macros`, but it is not used, since the extension is expected to be
the same.

Thus remove it.

Reviewed-by: Nicolas Schier <nsc@kernel.org>
Link: https://patch.msgid.link/20260608141439.182634-4-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: inline some init methods

These methods should be inlined for optimization reasons. Failure to do
so can also produce symbol names larger than what `modpost` or `objtool`
can handle.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260605-nova-exports-v4-1-e948c287407c@nvidia.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: kbuild: show the right `quiet_cmd_rustc_procmacrolibrary`

When Clippy is skipped, `RUSTC` should be shown in `quiet` instead of
`CLIPPY` to be accurate and to avoid confusion.

Thus do so, matching what we do in `quiet_cmd_rustc_library`.

Fixes: 7dbe46c0b11d ("rust: kbuild: add proc macro library support")
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Link: https://patch.msgid.link/20260608141439.182634-3-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

scripts: generate_rust_analyzer: support passing env vars

A future commit adding `zerocopy` support will need to pass an environment
variable during its build.

Thus add support for an `--envs` parameter, similar to `--cfgs`, that
allows passing a map of variables to set for a given crate.

This allows us to keep a single source of truth for those values.

No change intended in the generated `rust-project.json`.

Acked-by: Tamir Duberstein <tamird@kernel.org>
Link: https://patch.msgid.link/20260608141439.182634-2-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: io: use the `bitfield!` macro in `register!`

Replace the local bitfield rules by the equivalent invocation of the
`bitfield!` macro.

No functional change should be introduced as the `bitfield!` macro has
been extracted from the rules of `register!`.

Acked-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Yury Norov <ynorov@nvidia.com>
Link: https://patch.msgid.link/20260606-bitfield-v5-3-b92188820914@nvidia.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: bitfield: Add KUnit tests for bitfield

Add KUnit tests to make sure the macro is working correctly. The unit
tests are put behind the new `RUST_BITFIELD_KUNIT_TEST` Kconfig option.

Acked-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
[acourbot:
- Use a consistent test axis where each test focuses on a single thing.
- Rename members to generic name including range for readability.
- Add test exercising `try_with`.
- Add test checking that unallocated bits are left untouched.
]
Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Yury Norov <ynorov@nvidia.com>
Link: https://patch.msgid.link/20260606-bitfield-v5-2-b92188820914@nvidia.com
[ Prefixed test suite name with `rust_` as mentioned. Markdown-formatted
a few comments. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: extract `bitfield!` macro from `register!`

Extract the bitfield-defining part of the `register!` macro into an
independent macro used to define bitfield types with bounds-checked
accessors.

Each field is represented as a `Bounded` of the appropriate bit width,
ensuring field values are never silently truncated.

Fields can optionally be converted to/from custom types, either fallibly
or infallibly.

Appropriate documentation is also added, and a MAINTAINERS entry created
for the new module.

Two minor fixups are also applied: the private accessors are inlined,
and a couple of missing fully qualified types in the macro are fixed.

Acked-by: Yury Norov <ynorov@nvidia.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Yury Norov <ynorov@nvidia.com>
Link: https://patch.msgid.link/20260606-bitfield-v5-1-b92188820914@nvidia.com
[ Added some more intra-doc links. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

net: garp: reload skb header pointers after pskb_may_pull()

garp_pdu_parse_attr() keeps a pointer into the skb linear area across
pskb_may_pull(skb, ga->len), and garp_pdu_parse_msg() dereferences gm
on every loop iteration even though the nested parse may pull again.
pskb_may_pull() can reallocate the skb head, which would leave those
pointers stale.

This is not reachable today: GARP PDUs arrive via the 802.2 LLC SAP
path, where llc_fixup_skb() already pulls and trims the whole payload
into the linear area, so the inner pulls never reallocate. Reload ga
after the pull and snapshot gm->attrtype into a local anyway, to harden
the parser and match the skb_header_pointer() discipline used by mrp.c.

No functional change.
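
The reload discipline described above can be sketched in userspace C.
This is a toy model only: `toy_buf`, `toy_may_pull()` and
`toy_parse_attr()` are hypothetical stand-ins, not the kernel skb API,
and the attribute layout (length byte, event byte, data) is an
assumption made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an skb: the linear area may be reallocated on pull. */
struct toy_buf {
	unsigned char *head;        /* linear area, may move */
	size_t linear;              /* bytes valid in head */
	const unsigned char *rest;  /* bytes not yet in the linear area */
	size_t rest_len;
};

/* Make sure at least len bytes are linear; may replace b->head. */
static int toy_may_pull(struct toy_buf *b, size_t len)
{
	unsigned char *nhead;
	size_t need;

	if (len <= b->linear)
		return 1;
	need = len - b->linear;
	if (need > b->rest_len)
		return 0;
	nhead = malloc(len);
	if (!nhead)
		return 0;
	memcpy(nhead, b->head, b->linear);
	memcpy(nhead + b->linear, b->rest, need);
	free(b->head);              /* old pointers into head are now stale */
	b->head = nhead;
	b->linear = len;
	b->rest += need;
	b->rest_len -= need;
	return 1;
}

/* Toy attribute: byte 0 = total length, byte 1 = event, then data. */
static int toy_parse_attr(struct toy_buf *b, unsigned char *event,
			  unsigned char *first_data)
{
	const unsigned char *ga;
	unsigned char len;

	if (!toy_may_pull(b, 2))
		return -1;
	ga = b->head;
	len = ga[0];                /* snapshot before the next pull */
	if (len < 3 || !toy_may_pull(b, len))
		return -1;
	ga = b->head;               /* reload: the pull may have moved head */
	*event = ga[1];
	*first_data = ga[2];
	return 0;
}
```

Without the reload after the second pull, `ga` would point into the
freed allocation in this model.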

Signed-off-by: David Carlier <devnexen@gmail.com>
Link: https://patch.msgid.link/20260604141925.237746-1-devnexen@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: sit: reload inner IPv6 header after GSO offloads

ipip6_tunnel_xmit() caches the inner IPv6 header pointer at function
entry and continues using it after iptunnel_handle_offloads().

For GSO skbs, iptunnel_handle_offloads() calls skb_header_unclone().
When the skb header is cloned, skb_header_unclone() can call
pskb_expand_head(), which may move the skb head. The pskb_expand_head()
contract requires pointers into the skb header to be reloaded after the
call.

If the later skb_realloc_headroom() branch is not taken, SIT uses the
stale iph6 pointer to read the inner hop limit and DS field. That can
read from a freed skb head after the old head's remaining clone is
released.

Reload iph6 after the offload helper succeeds and before subsequent
reads from the inner IPv6 header. Keep the existing reload after
skb_realloc_headroom(), since that branch can also replace the skb.

Fixes: 14909664e4e1 ("sit: Setup and TX path for sit/UDP foo-over-udp encapsulation")
Signed-off-by: Kyle Zeng <kylebot@openai.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+6eb9ca986d80f6f88cf9@syzkaller.appspotmail.com
Link: https://patch.msgid.link/20260605073448.6524-1-kylebot@openai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Use effective affinity mask for IRQ selection

When an SF is created after a CPU has been taken offline, the IRQ pool may
contain IRQs with affinity masks that include the offline CPU. Since only
online CPUs are considered for IRQ placement, the cpumask_subset() check
fails because iter_mask contains offline CPUs that are not present
in req_mask, causing SF creation to fail.

For example:
  1. When the mlx5 driver loads, it initializes the IRQ pools.
     For sf_ctrl_pool with ≤64 SFs:
     - xa_num_irqs = {N, N} (there is only one slot)
  2. When the first SF is created:
     - The ctrl IRQ is allocated with mask=cpu_online_mask={0-191}
  3. CPU 20 is taken offline.
  4. The existing ctrl IRQ still has mask={0-191}.
  5. A new SF is created:
     - req_mask={0-19,21-191}
     - iter_mask={0-191}
     - {0-191} is NOT a subset of {0-19,21-191}
     - least_loaded_irq=NULL
  6. The driver tries to allocate a new IRQ via irq_pool_request_irq().
  7. xa_alloc() fails because the pool is full (there is only one slot).
  8. SF creation fails with an error.

Use irq_get_effective_affinity_mask() instead, which returns the IRQ's
actual effective affinity that already excludes offline CPUs.
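
The subset check can be illustrated with a toy bitmask model. This is a
sketch under assumptions: the mask is limited to 64 CPUs instead of the
192 in the example, all `toy_*` names are hypothetical, and the
effective affinity is modelled simply as the configured mask restricted
to online CPUs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy cpumask: bit n set means CPU n is in the mask (CPUs 0-63 only). */
typedef uint64_t toy_cpumask;

/* True if every CPU in a is also in b. */
static bool toy_cpumask_subset(toy_cpumask a, toy_cpumask b)
{
	return (a & ~b) == 0;
}

/* Assumption: the effective mask never contains offline CPUs. */
static toy_cpumask toy_effective_affinity(toy_cpumask configured,
					  toy_cpumask online)
{
	return configured & online;
}

/* An existing IRQ is reusable only if its mask fits the request. */
static bool toy_irq_usable(toy_cpumask irq_mask, toy_cpumask req_mask)
{
	return toy_cpumask_subset(irq_mask, req_mask);
}
```

With CPU 20 offline, an IRQ whose configured mask still covers all CPUs
is rejected, while the same IRQ judged by its effective mask is
accepted.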

Fixes: 061f5b23588a ("net/mlx5: SF, Use all available cpu for setting cpu affinity")
Suggested-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260605102112.91772-1-fushuai.wang@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Simplify cpumask operations in comp_irq_request_sf()

Combine cpumask_copy() and cpumask_andnot() into a single
cpumask_andnot() since the function can take cpu_online_mask
directly as the source.

Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260605101756.91275-1-fushuai.wang@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: xsk: Fix DMA and xdp_frame leak on XDP_TX xmit failure

In the XSK branch of mlx5e_xmit_xdp_buff(), when sq->xmit_xdp_frame()
returns false (e.g. XDPSQ is full), the function returns without
unmapping the DMA address or freeing the xdp_frame allocated by
xdp_convert_zc_to_xdp_frame(). The xdpi_fifo push only happens on
success, so the completion path cannot recover these entries.

With CONFIG_DMA_API_DEBUG=y, the leak surfaces on driver unbind:

  DMA-API: pci 0000:08:00.0: device driver has pending DMA
  allocations while released from device [count=1116]
  One of leaked entries details: [device address=0x000000010ffd7028]
  [size=1534 bytes] [mapped with DMA_TO_DEVICE] [mapped as phy]
  WARNING: kernel/dma/debug.c:881 at dma_debug_device_change+0x127/0x180
  ...
  DMA-API: Mapped at:
   debug_dma_map_phys+0x4b/0xd0
   dma_map_phys+0xfd/0x2d0
   mlx5e_xdp_handle+0x5ae/0xac0 [mlx5_core]
   mlx5e_xsk_skb_from_cqe_mpwrq_linear+0xc4/0x170 [mlx5_core]
   mlx5e_handle_rx_cqe_mpwrq+0xc1/0x290 [mlx5_core]

Add the missing unmap + xdp_return_frame, matching the cleanup already
done in mlx5e_xdp_xmit(). has_frags is rejected earlier in this branch,
so no per-frag unmap is needed.
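
The shape of the fix can be sketched with simple counters standing in
for the DMA mapping, the xdp_frame and the xdpi_fifo. All names here
are hypothetical; the model only shows which side owns the cleanup on
each path.

```c
#include <stdbool.h>

/* Toy resource accounting for one transmit queue. */
struct toy_stats {
	int dma_mapped;    /* outstanding DMA mappings */
	int frames_live;   /* outstanding converted frames */
	int fifo_entries;  /* entries the completion path will clean up */
};

/* Stand-in for the xmit hook: fails when the queue is full. */
static bool toy_xmit(bool queue_full)
{
	return !queue_full;
}

static bool toy_xmit_xdp_buff(struct toy_stats *s, bool queue_full)
{
	s->frames_live++;   /* frame converted for transmit */
	s->dma_mapped++;    /* frame mapped for the device */

	if (!toy_xmit(queue_full)) {
		/* Nothing was pushed to the fifo, so clean up here. */
		s->dma_mapped--;
		s->frames_live--;
		return false;
	}

	s->fifo_entries++;  /* completion path now owns the cleanup */
	return true;
}
```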

Fixes: 84a0a2310d6d ("net/mlx5e: XDP_TX from UMEM support")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260604135446.456119-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Fix slab-out-of-bounds in mlx5_query_nic_vport_mac_list

mlx5_query_nic_vport_mac_list() sizes its firmware command buffer using
the PF's log_max_current_uc/mc_list capabilities. When querying a VF
vport with a larger configured max (via devlink), the firmware response
can overflow this buffer:

BUG: KASAN: slab-out-of-bounds in mlx5_query_nic_vport_mac_list+0x453/0x4c0 [mlx5_core]
Read of size 4 at addr ff1100013ffc8a12 by task kworker/u96:2/385

CPU: 12 UID: 0 PID: 385 Comm: kworker/u96:2 Not tainted 7.0.0-rc6+ #1 PREEMPT
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)
Workqueue: mlx5_esw_wq esw_vport_change_handler [mlx5_core]
Call Trace:
  <TASK>
  dump_stack_lvl+0x69/0xa0
  print_report+0x176/0x4e4
  kasan_report+0xc8/0x100
  mlx5_query_nic_vport_mac_list+0x453/0x4c0 [mlx5_core]
  esw_update_vport_addr_list+0x2e3/0xda0 [mlx5_core]
  esw_vport_change_handle_locked+0xa1f/0x1060 [mlx5_core]
  esw_vport_change_handler+0x6a/0x90 [mlx5_core]
  process_one_work+0x87f/0x15e0
  worker_thread+0x62b/0x1020
  kthread+0x375/0x490
  ret_from_fork+0x4dc/0x810
  ret_from_fork_asm+0x11/0x20
  </TASK>

Fix by querying the vport's own HCA caps to size the buffer correctly.
Refactor the function to allocate and return the MAC list internally,
removing the caller's dependency on knowing the correct max.
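
A minimal userspace sketch of the sizing rule follows. The names
(`toy_max_macs()`, `toy_query_mac_list()`) and the calling convention
are hypothetical and are not the mlx5 API; the point is only that the
list is allocated internally from the queried vport's own log2
capability, and that a response larger than that is rejected.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_ETH_ALEN 6

/* Capabilities are expressed as log2 of the maximum list length. */
static size_t toy_max_macs(unsigned int log_max)
{
	return (size_t)1 << log_max;
}

/*
 * Allocate the list from the vport's own capability and return it to
 * the caller, who frees it. Returns 0 on success, -1 on failure.
 */
static int toy_query_mac_list(unsigned int vport_log_max,
			      const unsigned char (*fw_macs)[TOY_ETH_ALEN],
			      size_t fw_count,
			      unsigned char (**out)[TOY_ETH_ALEN],
			      size_t *out_count)
{
	size_t max = toy_max_macs(vport_log_max);
	unsigned char (*list)[TOY_ETH_ALEN];

	if (fw_count > max)
		return -1;          /* response would not fit */
	list = calloc(max, sizeof(*list));
	if (!list)
		return -1;
	memcpy(list, fw_macs, fw_count * sizeof(*list));
	*out = list;
	*out_count = fw_count;
	return 0;
}
```

In this model, a response of five addresses does not fit a buffer sized
from a capability of 2 (four entries) but fits one sized from a
capability of 3 (eight entries).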

Fixes: e16aea2744ab ("net/mlx5: Introduce access functions to modify/query vport mac lists")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260604135849.458060-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: qrtr: fix refcount saturation and potential UAF in qrtr_port_remove

In qrtr_port_remove(), the socket reference count is decremented via
__sock_put() before the port is removed from the qrtr_ports XArray and
before the RCU grace period elapses.

This breaks the fundamental RCU update paradigm. It exposes a race
window where a concurrent RCU reader (such as qrtr_reset_ports() or
qrtr_port_lookup()) can obtain a pointer to the socket from the XArray,
and attempt to call sock_hold() on a socket whose reference count has
already dropped to zero.

This exact race condition was hit during syzkaller fuzzing, leading to
the following refcount saturation warning and a potential Use-After-Free:

  refcount_t: saturated; leaking memory.
  WARNING: CPU: 3 PID: 1273 at lib/refcount.c:22 refcount_warn_saturate+0xae/0x1d0
  Modules linked in: qrtr(+) bochs drm_shmem_helper ...
  Call Trace:
   <TASK>
   qrtr_reset_ports net/qrtr/af_qrtr.c:768 [inline] [qrtr]
   __qrtr_bind.isra.0+0x48b/0x570 net/qrtr/af_qrtr.c:805 [qrtr]
   qrtr_bind+0x17d/0x210 net/qrtr/af_qrtr.c:901 [qrtr]
   kernel_bind+0xe4/0x120 net/socket.c:3592
   qrtr_ns_init+0x1a6/0x380 net/qrtr/ns.c:715 [qrtr]
   qrtr_proto_init+0x3b/0xff0 net/qrtr/af_qrtr.c:169 [qrtr]
   do_one_initcall+0xf5/0x5e0 init/main.c:1283
   ...
   </TASK>

Fix this by deferring the reference count decrement until after the
xa_erase() and the synchronize_rcu() complete.

(Note: The v1 of this patch incorrectly replaced __sock_put() with
sock_put(). As Simon Horman pointed out, the callers of qrtr_port_remove()
still hold a reference to the socket, so freeing the socket memory here
would lead to a subsequent UAF in the caller. Thus, the __sock_put() is
kept, but only repositioned to close the RCU race.)

Fixes: bdabad3e363d ("net: Add Qualcomm IPC router")
Signed-off-by: Mingyu Wang <25181214217@stu.xidian.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260604064801.1180388-1-w15303746062@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-phy-some-cleanups-following-phy_port-sfp'

Maxime Chevallier says:

====================
net: phy: some cleanups following phy_port SFP

While posting the v11 of phy_port netlink, sashiko found some
pre-existing issues, and following the tentative fix, Nicolai found
some more :)

This is V3, with a re-ordering of the port/sfp cleanup, as well as a new
patch (patch 3) that also reorders the phy_remove() path.
====================

Link: https://patch.msgid.link/20260604092819.723505-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: don't try to setup PHY-driven SFP cages when using genphy

We don't have support for PHY-driven SFP cages with the genphy code.

On top of that, it was found by sashiko that running
sfp_bus_add_upstream() for genphy deadlocks, as for genphy the PHY
probing runs under RTNL, which isn't the case for non-genphy drivers.

This problem was reproduced, and does lead to a deadlock on RTNL.

Before the blamed commit, the phy_sfp_probe() call was made by
individual PHY drivers, so there was no way to get to the SFP probing
path when using genphy.

Let's therefore only run phy_sfp_probe when not using genphy.

Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Fixes: bad869b5e41a ("net: phy: Only rely on phy_port for PHY-driven SFP")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260604092819.723505-5-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: Clean the phy_ports after unregistering the downstream SFP bus

As reported by sashiko when looking at other patches, we need to ensure
that the downstream SFP bus gets unregistered prior to destroying the
phy_ports attached to a phy_device, as the SFP code may reference these
ports. Let's make sure we follow that ordering in phy_remove().

Fixes: 589e934d2735 ("net: phy: Introduce PHY ports representation")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Link: https://patch.msgid.link/20260604092819.723505-4-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: remove phy ports upon probe failure

When phy_probe fails, let's clean the phy_ports that were successfully
added already.

Suggested-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Fixes: 589e934d2735 ("net: phy: Introduce PHY ports representation")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260604092819.723505-3-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: clean the sfp upstream if phy probing fails

Sashiko reported that we don't call sfp_bus_del_upstream() in the probe
failure path, so let's add it, otherwise the sfp-bus is left with a
dangling 'upstream' field, that may be used later on during SFP events.

This issue existed before the generic phylib sfp support, back when
drivers were calling phy_sfp_probe themselves.

Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Fixes: 298e54fa810e ("net: phy: add core phylib sfp support")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260604092819.723505-2-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netdev: fix double-free in netdev_nl_bind_rx_doit()

Sashiko flags that genlmsg_reply() always consumes the skb.
The error path calls nlmsg_free(rsp) so we can't jump directly
to it. Let's not unbind, just propagate the error to the user.
This is the typical way of handling genlmsg_reply() failures.
They shouldn't happen unless the user does something silly like
calling the kernel with an already-full rcvbuf.
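
The ownership rule can be sketched as follows. The `toy_*` names are
hypothetical userspace stand-ins, not the netlink API; `toy_reply()`
models a reply helper that consumes the message whether or not
delivery succeeds.

```c
#include <stdlib.h>

struct toy_msg {
	int payload;
};

static int toy_frees;   /* counts frees so double-frees are visible */

static void toy_msg_free(struct toy_msg *m)
{
	free(m);
	toy_frees++;
}

/* Consumes the message on both success and failure. */
static int toy_reply(struct toy_msg *m, int fail)
{
	toy_msg_free(m);
	return fail ? -1 : 0;
}

static int toy_doit(int fail_reply)
{
	struct toy_msg *rsp = malloc(sizeof(*rsp));
	int err;

	if (!rsp)
		return -1;
	rsp->payload = 1;

	err = toy_reply(rsp, fail_reply);
	/*
	 * rsp is gone at this point. Jumping to an error label that
	 * frees rsp again would be a double free, so just propagate.
	 */
	return err;
}
```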

Reported-by: Sashiko <sashiko-bot@kernel.org>
Fixes: 170aafe35cb9 ("netdev: support binding dma-buf to netdevice")
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phonet: free phonet_device after RCU grace period

phonet_device_destroy() removes a phonet_device from the per-net device
list with list_del_rcu(), but frees it immediately. RCU readers walking
the same list can still hold a pointer to the object after it has been
removed, leading to a slab-use-after-free.

Use kfree_rcu(), matching the lifetime rule already used by
phonet_address_del() for the same object type.

Fixes: eeb74a9d45f7 ("Phonet: convert devices list to RCU")
Cc: stable@vger.kernel.org
Signed-off-by: Santosh Kalluri <santosh.kalluri129@gmail.com>
Acked-by: Rémi Denis-Courmont <remi@remlab.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ibm: emac: mal: fix potential system hang in mal_remove()

napi_disable() is not idempotent and calling it on an already-disabled
or unenabled NAPI context will cause the kernel to spin indefinitely
waiting for the NAPI_STATE_SCHED bit to clear.

In mal_remove(), napi_disable() is called unconditionally. If no MACs were
registered, NAPI was never enabled. Also, if they were registered but
subsequently unregistered, NAPI was already disabled in
mal_unregister_commac(). In either case, calling napi_disable() causes
the kernel to hang upon module removal.

Fix this by only calling napi_disable() in mal_remove() if the commac list
is not empty (which implies NAPI is enabled).
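
A toy model of the guard is shown below. It assumes, as the description
above implies, that NAPI is enabled when the first commac registers and
disabled when the last one unregisters. All names are hypothetical, and
a disable on a non-enabled context is counted rather than spinning.

```c
#include <stdbool.h>

struct toy_mal {
	int ncommacs;       /* registered commacs (list length) */
	bool napi_enabled;
	int bad_disables;   /* disables that would have hung */
};

static void toy_napi_enable(struct toy_mal *m)
{
	m->napi_enabled = true;
}

static void toy_napi_disable(struct toy_mal *m)
{
	if (!m->napi_enabled) {
		m->bad_disables++;  /* the real call would spin here */
		return;
	}
	m->napi_enabled = false;
}

static void toy_register_commac(struct toy_mal *m)
{
	if (m->ncommacs++ == 0)
		toy_napi_enable(m);
}

static void toy_unregister_commac(struct toy_mal *m)
{
	if (--m->ncommacs == 0)
		toy_napi_disable(m);
}

/* Only disable NAPI if a commac is still registered. */
static void toy_mal_remove(struct toy_mal *m)
{
	if (m->ncommacs > 0)
		toy_napi_disable(m);
}
```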

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20260603230821.5619-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ibm: emac: Clear MAL descriptors without memset

Clear MAL descriptor rings with explicit field stores instead of
memset(). The descriptor rings are carved from MAL coherent DMA memory,
which may be mapped uncached on 32-bit powerpc. The optimized memset()
path can use dcbz there and trigger an alignment warning.

Use WRITE_ONCE() for each field to prevent the compiler from merging
the stores back into a memset() call.

The skb tracking arrays remain ordinary CPU memory and still use memset().
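
The idea of per-field stores can be sketched in plain C. Here a
volatile-qualified pointer stands in for WRITE_ONCE(), and the
descriptor layout is a hypothetical one chosen for illustration; the
volatile accesses keep the compiler from merging the stores into a
memset() call.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy descriptor; the field names and sizes are assumptions. */
struct toy_mal_desc {
	uint16_t ctrl;
	uint16_t data_len;
	uint32_t data_ptr;
};

/* Clear a ring with one explicit store per field. */
static void toy_clear_descs(struct toy_mal_desc *ring, size_t n)
{
	volatile struct toy_mal_desc *d = ring;
	size_t i;

	for (i = 0; i < n; i++) {
		d[i].ctrl = 0;
		d[i].data_len = 0;
		d[i].data_ptr = 0;
	}
}
```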

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20260603230754.5535-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ibm: emac: Fix use-after-free during device removal

The driver was using devm_register_netdev() which causes unregister_netdev()
to be deferred until the devres cleanup phase, which runs after emac_remove()
returns. This creates a use-after-free window where:

1. emac_remove() is called, which tears down hardware (cancels work, detaches
   modules, unregisters from MAL)
2. emac_remove() returns
3. devres cleanup runs and finally calls unregister_netdev()

During step 3, the network stack might still process packets, triggering
emac_irq(), emac_poll(), or other handlers that access now-freed hardware
resources (dev->emacp, dev->mal, etc.).

Fix this by replacing devm_register_netdev() with manual register_netdev()
and calling unregister_netdev() at the beginning of emac_remove(), before
any hardware teardown. This ensures the network device is fully stopped and
unregistered before hardware resources are released.

The change is safe because:
- dev->ndev is assigned very early in probe (before any error paths that
  could bypass emac_remove)
- platform_set_drvdata() is only called after successful registration, so
  emac_remove() only runs for fully registered devices
- unregister_netdev() is idempotent and safe to call on any registered device

Fixes: a4dd8535a527 ("net: ibm: emac: use devm for register_netdev")
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ibm: emac: mal: fix unchecked platform_get_irq return values

platform_get_irq() returns a negative errno on failure.
Commit c4f5d0454cab5 moved the platform_get_irq() calls and explicitly
removed the error checks that were previously present, claiming
devm_request_irq() can handle it. However, a negative IRQ number
passed to devm_request_irq() fails with -EINVAL instead of
propagating the real error from platform_get_irq().

Restore the missing error checks with proper errno propagation.
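
The difference between the two flows can be sketched in userspace. The
`toy_*` functions are hypothetical; `toy_request_irq()` only models the
behaviour described above, where a negative IRQ number is rejected with
-EINVAL.

```c
#include <errno.h>

/* Models a request helper that rejects negative IRQ numbers. */
static int toy_request_irq(int irq)
{
	return irq < 0 ? -EINVAL : 0;
}

/* Unchecked: the original errno is replaced by -EINVAL. */
static int toy_probe_unchecked(int irq_or_err)
{
	return toy_request_irq(irq_or_err);
}

/* Checked: the lookup error is propagated unchanged. */
static int toy_probe_checked(int irq_or_err)
{
	if (irq_or_err < 0)
		return irq_or_err;
	return toy_request_irq(irq_or_err);
}
```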

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260603211734.30750-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx4: avoid GCC 10 __bad_copy_from() false positive

mlx4_init_user_cqes() fills a scratch buffer with the CQE
initialization pattern and then copies from that buffer to userspace.

In the single-copy path, the copy length is array_size(entries,
cqe_size), but the scratch buffer is allocated with PAGE_SIZE. GCC 10
does not carry the branch invariant strongly enough through the object
size checks and falsely triggers __bad_copy_from().

Size the scratch buffer to the actual copy length for the active path,
keep array_size() for the single-copy case, and retain a WARN_ON_ONCE()
guard for the PAGE_SIZE invariant before allocating the buffer.
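
A rough userspace sketch of the sizing logic follows.
`toy_array_size()` mimics the saturating behaviour of the kernel's
array_size() helper. `toy_scratch_len()`, the 4096-byte page size and
the chunk size used on the multi-copy path are assumptions made for
illustration, not the driver code.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* Multiply two sizes, saturating to SIZE_MAX on overflow. */
static size_t toy_array_size(size_t a, size_t b)
{
	if (b != 0 && a > SIZE_MAX / b)
		return SIZE_MAX;
	return a * b;
}

/*
 * Length of the scratch buffer for the active path.
 * cqe_size must be nonzero.
 */
static size_t toy_scratch_len(size_t entries, size_t cqe_size)
{
	size_t entries_per_copy = TOY_PAGE_SIZE / cqe_size;

	if (entries_per_copy < entries)
		return TOY_PAGE_SIZE;   /* multi-copy: one page per chunk */

	/* Single copy: the allocation matches the copy length exactly. */
	return toy_array_size(entries, cqe_size);
}
```

With the buffer length equal to the copy length on the single-copy
path, the object size check has nothing to disagree about.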

Fixes: f69bf5dee7ef ("net/mlx4: Use array_size() helper in copy_to_user()")
Signed-off-by: Yao Sang <sangyao@kylinos.cn>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

iommufd: Destroy the pages content after detaching from dmabuf

Sashiko points out this has gotten out of order: the mutex could still be
in use through the dmabuf invalidation callbacks. Don't destroy any of the
pages content until the dmabuf is fully detached.

Fixes: 71db84a092c3 ("iommufd: Add DMABUF to iopt_pages")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

net: add pskb_may_pull() to skb_gro_receive_list()

skb_gro_receive_list() calls skb_pull(skb, skb_gro_offset(skb)) without
first ensuring the data is in the linear area via pskb_may_pull(). When
the skb arrives via napi_gro_frags(), skb_headlen can be 0 (all data in
page fragments) while skb_gro_offset is non-zero (after IP+TCP header
parsing). The skb_pull() then decrements skb->len by skb_gro_offset
but skb->data_len stays unchanged, hitting BUG_ON(skb->len < skb->data_len)
in __skb_pull().

The UDP fraglist GRO path already contains this guard at
udp_offload.c:749. Adding it to skb_gro_receive_list() itself provides
centralized protection for all callers (TCP, UDP, and any future
protocols), and ensures the precondition of skb_pull() is satisfied
before it is called.

On pskb_may_pull() failure, set NAPI_GRO_CB(skb)->flush = 1 so the
skb is not held as a new GRO head and is instead delivered through the
normal receive path, matching the UDP handling.
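
The invariant can be shown with a toy model that tracks only the
lengths. `toy_skb` and the helpers are hypothetical stand-ins for the
skb API: `len` is the total length, `data_len` the part held in
fragments, and the linear headlen is their difference.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_skb {
	size_t len;       /* total bytes */
	size_t data_len;  /* bytes held in page fragments */
	bool flush;       /* set when the skb must not be aggregated */
};

static size_t toy_headlen(const struct toy_skb *s)
{
	return s->len - s->data_len;
}

/* Ensure at least need bytes are in the linear area. */
static bool toy_may_pull(struct toy_skb *s, size_t need)
{
	if (need <= toy_headlen(s))
		return true;
	if (need > s->len)
		return false;
	s->data_len = s->len - need;  /* move bytes from frags to linear */
	return true;
}

static bool toy_gro_receive_list(struct toy_skb *s, size_t gro_offset)
{
	if (!toy_may_pull(s, gro_offset)) {
		s->flush = true;      /* deliver via the normal path */
		return false;
	}
	/* Safe: headlen >= gro_offset, so len stays >= data_len. */
	s->len -= gro_offset;
	return true;
}
```

Skipping the pull for an skb with a headlen of 0 would leave `len`
smaller than `data_len` after the subtraction, which is the condition
the BUG_ON() catches.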

Fixes: 8d95dc474f85 ("net: add code for TCP fraglist GRO")
Reported-by: HanQuan <eilaimemedsnaimel@gmail.com>
Reported-by: MingXuan <bwnie0730@outlook.com>
Signed-off-by: HanQuan <eilaimemedsnaimel@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

pds_core: quiesce DMA before freeing resources

pdsc_teardown() frees DMA buffers but does not disable bus mastering,
leaving the device able to perform DMA after the buffers are freed.
This can lead to use-after-free if the device writes to freed memory.

Add pci_clear_master() to pdsc_teardown() to disable bus mastering
before freeing resources, ensuring all DMA is quiesced.

Add pci_set_master() to pdsc_setup() to re-enable bus mastering,
which is needed for the firmware recovery path since pdsc_teardown()
now disables it.

Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com>
Link: https://patch.msgid.link/20260604213637.3844317-1-nikhil.rao@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

iommufd: Take dma_resv lock before dma_buf_unpin() in release path

dma_buf_unpin() requires the caller to hold the exporter's dma_resv
lock:

  void dma_buf_unpin(struct dma_buf_attachment *attach)
  {
          ...
          dma_resv_assert_held(dmabuf->resv);
          ...
  }

iopt_release_pages() calls dma_buf_unpin() without taking that lock,
so every iommufd_ioas_destroy()/iommufd_ioas_unmap() that releases
the last reference on a DMABUF-backed iopt_pages triggers a WARN.
This was hit while running tools/testing/selftests/iommu/iommufd:

  WARNING: drivers/dma-buf/dma-buf.c:1137 at dma_buf_unpin+0x62/0x70
  RIP: 0010:dma_buf_unpin+0x62/0x70
  Call Trace:
   <TASK>
   dma_buf_unpin+0x62/0x70
   iopt_release_pages+0xe4/0x190
   iopt_unmap_iova_range+0x1c7/0x290
   iopt_unmap_all+0x1a/0x30
   iommufd_ioas_destroy+0x1d/0x50
   iommufd_fops_release+0x93/0x150
   __fput+0xfc/0x2c0
   __x64_sys_close+0x3d/0x80
   do_syscall_64+0x65/0x180
   </TASK>

Take the dma_resv lock around dma_buf_unpin() in iopt_release_pages(),
matching the iopt_map_dmabuf() convention. dma_buf_detach() acquires the
reservation lock internally, so it must remain outside the locked region.

Fixes: 8c5f9645c389 ("iommufd: Add dma_buf_pin()")
Link: https://patch.msgid.link/r/20260526111034.4079-1-Ankit.Soni@amd.com
Reported-by: Ankit Soni <Ankit.Soni@amd.com>
Signed-off-by: Ankit Soni <Ankit.Soni@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Merge branch 'ip6mr-no-rtnl-for-rtnl_family_ip6mr-rtnetlink'

Kuniyuki Iwashima says:

====================
ip6mr: No RTNL for RTNL_FAMILY_IP6MR rtnetlink.

This series is the IPv6 version of

https://lore.kernel.org/netdev/20260228221800.1082070-1-kuniyu@google.com/

and removes RTNL from ip6mr rtnetlink handlers.

After this series, there are a few RTNL left in net/ipv6/ip6mr.c
and such users will be converted to per-netns RTNL in another
series.

Patch 1 extends the ipmr selftest to exercise most of the RTNL
paths in net/ipv6/ip6mr.c.

Patches 2 - 6 convert RTM_GETROUTE handlers to RCU.

Patch 7 removes struct fib_dump_filter.rtnl_held.

Patch 8 uses RCU for mr_table for CONFIG_IPV6_MROUTE_MULTIPLE_TABLES=n
for ->exit_rtnl().

Patch 9 moves fib_rules_unregister() to ->exit().

Patches 10 - 12 convert ->exit_batch() to ->exit_rtnl() to
save one RTNL in cleanup_net().

Patch 13 removes unnecessary RTNL during setup_net() failure.

Patch 14 drops RTNL for MRT6_(ADD|DEL)_MFC(_PROXY)?.

Patch 15 is a misc clean up.

v2: https://lore.kernel.org/20260410211726.1668756-1-kuniyu@google.com
v1: https://lore.kernel.org/20260407212001.2368593-1-kuniyu@google.com
====================

Link: https://patch.msgid.link/20260604224712.3209821-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ip6mr: Define net->ipv6.{ip6mr_notifier_ops,ipmr_seq} under CONFIG_IPV6_MROUTE.

net->ipv6.ip6mr_notifier_ops and net->ipv6.ipmr_seq are used
only in net/ipv6/ip6mr.c.

Let's move these definitions under CONFIG_IPV6_MROUTE.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260604224712.3209821-16-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ip6mr: Replace RTNL with a dedicated mutex for MFC.

ip6mr does not have rtnetlink interface for MFC unlike ipmr,
which uses dev_get_by_index_rcu() to set struct mfcctl.mfcc_parent.

ip6mr_mfc_add() and ip6mr_mfc_delete() are called under RTNL
from ip6_mroute_setsockopt() only.

There is no RTNL dependency, but ip6_mroute_setsockopt() reuses
RTNL just for mrt->mfc_hash and mrt->mfc_cache_list.

Let's replace RTNL with a new per-netns mutex.

Later, ip6mr_notifier_ops and ipmr_seq will be moved under
CONFIG_IPV6_MROUTE.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260604224712.3209821-15-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ip6mr: Remove RTNL in ip6mr_rules_init() and ip6mr_net_init().

When ip6mr_free_table() is called from ip6mr_rules_init() or
ip6mr_net_init(), the netns is not yet published.

Thus, no device should have been registered, and
mroute_clean_tables() will not call mif6_delete(), so
unregister_netdevice_many() is unnecessary.

unregister_netdevice_many() does nothing if the list is empty,
but it requires RTNL due to the unconditional ASSERT_RTNL()
at the entry of unregister_netdevice_many_notify().

Let's remove unnecessary RTNL and ASSERT_RTNL() and instead
add WARN_ON_ONCE() in ip6mr_free_table().

Note that we use a local list for the new WARN_ON_ONCE() because
dev_kill_list passed from ip6mr_rules_exit_rtnl() may have some
devices when other ops->init() fails after ipmr during setup_net().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260604224712.3209821-14-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ip6mr: Convert ip6mr_net_exit_batch() to ->exit_rtnl().

ip6mr_net_ops uses ->exit_batch() to acquire RTNL only once
for dying network namespaces.

ip6mr does not depend on the ordering of ->exit_rtnl() and
->exit_batch() of other pernet_operations (unlike fib_net_ops).

Once ip6mr_free_table() is called and all devices are
queued for destruction in ->exit_rtnl(), later during
NETDEV_UNREGISTER, ip6mr_device_event() will not see anything
in vif table and just do nothing.

Let's convert ip6mr_net_exit_batch() to ->exit_rtnl().

We will remove RTNL and unregister_netdevice_many() in
ip6mr_rules_init().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260604224712.3209821-13-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ip6mr: Move unregister_netdevice_many() out of ip6mr_free_table().

This is a prep commit to convert ip6mr_net_exit_batch() to
->exit_rtnl().

Let's move unregister_netdevice_many() in ip6mr_free_table()
to its callers.

Now ip6mr_rules_exit() can batch all tables per netns.

Note that later we will remove RTNL and unregister_netdevice_many()
in ip6mr_rules_init().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260604224712.3209821-12-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ip6mr: Move unregister_netdevice_many() out of mroute_clean_tables().

This is a prep commit to convert ip6mr_net_exit_batch() to
->exit_rtnl().

Let's move unregister_netdevice_many() in mroute_clean_tables()
to its callers.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260604224712.3209821-11-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ip6mr: Call fib_rules_unregister() without RTNL.

fib_rules_unregister() removes ops from net->rules_ops under
spinlock, calls ops->delete() for each rule, and frees the ops.

ip6mr_rules_ops_template does not have ->delete(), and no
operation there requires RTNL.

Let's move fib_rules_unregister() from ip6mr_rules_exit() to
ip6mr_net_exit().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260604224712.3209821-10-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ip6mr: Free mr_table after RCU grace period.

Since default_device_exit_batch() is called after ->exit_rtnl(),
idev->mc_ifc_work could finally call mroute6_is_socket() under RCU
while ->exit_rtnl() is running. [0]

With CONFIG_IPV6_MROUTE_MULTIPLE_TABLES=n, ip6mr_fib_lookup() does
not check if net->ipv6.mrt6 is NULL. If ip6mr_net_exit_batch()
set net->ipv6.mrt6 to NULL and freed it, the mrt->mroute_sk access
could result in null-ptr-deref or use-after-free.

Let's prepare for that situation by applying RCU rule to ip6mr
table similarly.

!check_net(net) is added in ip6mr_cache_unresolved() and
mroute_clean_tables() to synchronise the two by mfc_unres_lock
so that ip6mr_cache_unresolved() will not queue skb after
mroute_clean_tables() purged &mrt->mfc_unres_queue.

rcu_read_lock() in reg_vif_xmit() is moved up to cover
ip6mr_fib_lookup() as with ipmr.

Link: https://lore.kernel.org/netdev/20260407184202.34cfe2d6@kernel.org/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260604224712.3209821-9-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: Remove rtnl_held of struct fib_dump_filter.

Commit 22e36ea9f5d7 ("inet: allow ip_valid_fib_dump_req() to
be called with RTNL or RCU") introduced the rtnl_held field in
struct fib_dump_filter to switch __dev_get_by_index() and
dev_get_by_index_rcu() depending on the caller's context.

This field served as an interim measure while we were incrementally
converting all callers of ip_valid_fib_dump_req() to RCU.

Now that all users (IPv4, IPv6, ipmr, ip6mr, and MPLS) have
been converted to RCU, the field is no longer necessary.

Let's remove it.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260604224712.3209821-8-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ip6mr: Convert ip6mr_rtm_dumproute() to RCU.

ip6mr_rtm_dumproute() calls mr_table_dump() or mr_rtm_dumproute(),
and mr_rtm_dumproute() finally calls mr_table_dump().

mr_table_dump() calls the passed function, _ip6mr_fill_mroute().

_ip6mr_fill_mroute() is a wrapper for ip6mr_fill_mroute() to cast
struct mr_mfc * to struct mfc6_cache *.

ip6mr_fill_mroute() can already be called safely under RCU.

Let's convert ip6mr_rtm_dumproute() to RCU.

Now there is no user of the rtnl_held field in struct
fib_dump_filter, and the next patch will remove it.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260604224712.3209821-7-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ip6mr: Convert ip6mr_rtm_getroute() to RCU.

ip6mr_rtm_getroute() calls __ip6mr_get_table(), ip6mr_cache_find(),
and ip6mr_fill_mroute().

Once created, struct mr_table is not freed until netns dismantle,
so it's safe under RCU.

ip6mr_cache_find() iterates mrt->mfc_hash with rhl_for_each_entry_rcu().
struct mr_mfc is freed with call_rcu(), so this is also safe under
RCU.

ip6mr_fill_mroute() calls mr_fill_mroute(), which properly uses
RCU helpers.

Let's call them under RCU and register ip6mr_rtm_getroute() with
RTNL_FLAG_DOIT_UNLOCKED.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260604224712.3209821-6-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ip6mr: Allocate skb earlier in ip6mr_rtm_getroute().

We will convert ip6mr_rtm_getroute() to RCU in the following patch,
where __ip6mr_get_table() will be called under RCU.

nlmsg_new() uses GFP_KERNEL and needs to be called before holding
rcu_read_lock().

As a prep, let's move nlmsg_new() before __ip6mr_get_table().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260604224712.3209821-5-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ip6mr: Use MAXMIFS in mr6_msgsize().

mr6_msgsize() calculates skb size needed for ip6mr_fill_mroute().

The size differs based on mrt->maxvif.

We will drop RTNL for ip6mr_rtm_getroute() and mrt->maxvif may
change under RCU.

To avoid -EMSGSIZE, let's calculate the size with the maximum
value of mrt->maxvif, MAXMIFS.

struct rtnexthop is 8 bytes and MAXMIFS is 32, so the maximum delta
is 256 bytes, which is small enough.
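
The arithmetic can be checked with a small userspace sketch. The
names rtnexthop_sketch and MAXMIFS_SKETCH are hypothetical stand-ins
that mirror the sizes quoted above; this is not the mr6_msgsize()
code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in assumed to mirror the 8-byte layout of struct rtnexthop. */
struct rtnexthop_sketch {
	uint16_t rtnh_len;
	uint8_t  rtnh_flags;
	uint8_t  rtnh_hops;
	int32_t  rtnh_ifindex;
};

_Static_assert(sizeof(struct rtnexthop_sketch) == 8,
	       "sketch assumes an 8-byte nexthop entry");

/* Stand-in for MAXMIFS, using the value 32 quoted above. */
#define MAXMIFS_SKETCH 32u

/* Bytes needed for one nexthop entry per interface. */
static size_t sketch_nexthop_bytes(unsigned int nr_vifs)
{
	assert(nr_vifs <= MAXMIFS_SKETCH);
	return (size_t)nr_vifs * sizeof(struct rtnexthop_sketch);
}

/* Extra bytes reserved when sizing for the maximum instead of maxvif. */
static size_t sketch_worst_case_delta(unsigned int maxvif)
{
	return sketch_nexthop_bytes(MAXMIFS_SKETCH) -
	       sketch_nexthop_bytes(maxvif);
}
```

The delta is largest when maxvif is 0 and shrinks to nothing when all
32 interfaces are in use.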

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260604224712.3209821-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ip6mr: Annotate access to mrt->mroute_do_{pim,assert,wrvifwhole}.

These fields in struct mr_table are updated in ip6_mroute_setsockopt()
under RTNL:

  * mroute_do_pim
  * mroute_do_assert (MRT6_PIM is under RTNL while MRT6_ASSERT is lockless)
  * mroute_do_wrvifwhole

However, ip6_mroute_getsockopt() does not hold RTNL and reads the first
two fields locklessly, and ip6_mr_forward() reads all three under
RCU.

Let's use WRITE_ONCE() and READ_ONCE() for them.
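
As a rough userspace illustration of the annotation pattern, the
sketch below uses volatile casts in place of the kernel macros.
READ_ONCE_SKETCH, WRITE_ONCE_SKETCH, mr_table_sketch and the two
helpers are hypothetical names, the struct is a simplified stand-in
with assumed bool fields, and __typeof__ is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for READ_ONCE()/WRITE_ONCE(): one volatile access. */
#define READ_ONCE_SKETCH(x) \
	(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, val) \
	(*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical, reduced version of the three annotated fields. */
struct mr_table_sketch {
	bool mroute_do_assert;
	bool mroute_do_pim;
	bool mroute_do_wrvifwhole;
};

/* Writer side, standing in for the setsockopt() path. */
static void sketch_set_pim(struct mr_table_sketch *mrt, bool on)
{
	assert(mrt);
	WRITE_ONCE_SKETCH(mrt->mroute_do_pim, on);
	WRITE_ONCE_SKETCH(mrt->mroute_do_assert, on);
}

/* Lockless reader side, standing in for the getsockopt() path. */
static bool sketch_get_pim(struct mr_table_sketch *mrt)
{
	assert(mrt);
	return READ_ONCE_SKETCH(mrt->mroute_do_pim);
}
```

The point of the pairing is that both sides mark the access, so the
compiler may not tear, fuse or re-read the field on either path.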

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260604224712.3209821-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftest: net: Extend ipmr.c for IP6MR.

This commit extends most test cases in ipmr.c for IP6MR.

Note that IP6MR does not provide rtnetlink interface for MFC,
so such tests are added to XFAIL_ADD().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260604224712.3209821-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'so_txtime-improvements'

Willem de Bruijn says:

====================
SO_TXTIME improvements

FQ targets monotonic timestamps as generated by the TCP stack.

But SO_TXTIME was later added, which can send skbs with timestamps
against other clocks. It is now possible to detect these through skb
tstamp_type.

Make FQ robust by converting these timestamps for use in FQ (patch 2).

This also requires testing against out-of-bounds values. Prefer to do
this at the source, when parsing SCM_TXTIME (patch 1). But, tests in
the hot path are still needed, to handle BPF sources.

Extend the so_txtime selftest to handle this new case (patch 3).

v1: https://lore.kernel.org/20260603190243.2789335-1-willemdebruijn.kernel@gmail.com
====================

Link: https://patch.msgid.link/20260604194221.3319080-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: extend so_txtime with FQ with other clocks

Add a variant of the existing FQ tests, but pass CLOCK_TAI rather than
the native CLOCK_MONOTONIC clock id.

FQ used to imply monotonic. This is no longer the case, and the
inverse need not hold either. Rename $PREFIX_mono to $PREFIX_fq.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260604194221.3319080-4-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net_sched: sch_fq: convert skb->tstamp if not monotonic

FQ currently assumes skb->tstamp holds monotonic time, as used by TCP.

Users with ns_capable CAP_NET_ADMIN can transmit skbs using SO_TXTIME
with CLOCK_MONOTONIC, CLOCK_REALTIME or CLOCK_TAI clockids as of
commit 80b14dee2bea ("net: Add a new socket option for a future
transmit time.")

More recently, skbs also gained tstamp_type to explicitly communicate
the clockid of skb->tstamp, with commit 4d25ca2d6801 ("net: Rename
mono_delivery_time to tstamp_type for scalabilty"), commit
1693c5db6ab8 ("net: Add additional bit to support clockid_t timestamp
type") and a few others.

Detect other clocks and convert to monotonic for use in FQ. That is,
convert fq_skb_cb(skb)->time_to_send. Do not convert skb->tstamp
itself. Network device clocks are more commonly synchronized to TAI.

Conversion may be imprecise due to clock adjustment (e.g., adjfreq)
between when SCM_TSTAMP is set and when it is converted in fq_enqueue.
The common codepath is short, so skew will be well below common pacing
operation. Even in edge cases, bursts (too soon) or beyond horizon
(too late) are indistinguishable from network conditions, to which
senders must be robust, as long as they are infrequent.

Avoid overflow due to negative offsets becoming huge when converting
from signed ktime_t to u64 time_to_send. Bound lower to mono 1 and
upper to now + q->horizon. This protects against bad input, e.g.,
from BPF programs.
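
The bounding step can be sketched in userspace C as follows. The
function name and parameters are hypothetical and this is not the
sch_fq code; it only illustrates why the signed value is bounded
before it is used as u64.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bound a converted monotonic timestamp before storing it as unsigned.
 * Lower bound: 1 (0 stays reserved for "no timestamp").
 * Upper bound: now + horizon.
 */
static uint64_t sketch_bound_time_to_send(int64_t mono_ns, uint64_t now_ns,
					  uint64_t horizon_ns)
{
	uint64_t upper = now_ns + horizon_ns;

	/* A negative offset cast to u64 would become a huge value. */
	if (mono_ns < 1)
		return 1;
	if ((uint64_t)mono_ns > upper)
		return upper;
	return (uint64_t)mono_ns;
}
```

Without the first check, an input such as -5 would be reinterpreted
as a value close to the top of the u64 range.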

Detect legacy BPF programs that program skb->tstamp without setting
skb->tstamp_type. Here tstamp_type is zero (SKB_CLOCK_REALTIME), but
the value will be unrealistic for realtime in the 21st century. Follow
existing TIME_UPTIME_SEC_MAX as bound between mono and realtime.
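
A minimal sketch of that heuristic, in userspace C. The names are
hypothetical, and the 30-year value is an assumption about what
TIME_UPTIME_SEC_MAX represents rather than something stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC	1000000000LL
/* Assumed bound: roughly 30 years of uptime, in seconds. */
#define SKETCH_UPTIME_SEC_MAX	(30LL * 365 * 24 * 3600)

/*
 * For a timestamp whose type was left at zero: values below the
 * uptime bound are treated as monotonic, larger ones as realtime.
 */
static bool sketch_untyped_is_mono(int64_t tstamp_ns)
{
	return tstamp_ns < SKETCH_UPTIME_SEC_MAX * SKETCH_NSEC_PER_SEC;
}
```

A value of one hour is classified as monotonic, while a realtime
value from the 21st century lies above the bound.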

Signed-off-by: Willem de Bruijn <willemb@google.com>
----

Changes
v1 -> v2
- replace Fixes tag with references inside the commit message

Link: https://patch.msgid.link/20260604194221.3319080-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ensure SCM_TXTIME delivery time is no older than system boot

Limit input to sane values to avoid having to add tests later in the
kernel hot path, e.g., in FQ.

SCM_TXTIME timestamps are converted to signed ktime_t when assigned to
skb->tstamp. Avoid having negative values overflow into large positive
ones when again used as u64, e.g., in FQ time_to_send.

For CLOCK_MONOTONIC, only allow positive values.

For CLOCK_REALTIME and CLOCK_TAI, allow equivalent values, i.e., no
older than the boot of the machine.

skb->tstamp zero is a special case signaling feature off. This is not
converted between clockids.

Handle the special case where the realtime clock is set so small that
real - mono is negative, however unlikely in practice.

Ideally we would also set a sane upper bound, but that would require
reading the clock, which is an expensive operation. Continue to defer
that validation to users of the data. FQ already does this.

Bound rather than return error on older timestamps. This is the
existing policy e.g., in FQ.

Signed-off-by: Willem de Bruijn <willemb@google.com>
----

Changes
  v1 -> v2
    - remove spurious semicolon at end of switch
    - remove Fixes tag

Link: https://patch.msgid.link/20260604194221.3319080-2-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'renesas-pinctrl-for-v7.2-tag3' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers into devel

pinctrl: renesas: Updates for v7.2 (take three)

- Fix locking on RZ/G3L.

* tag 'renesas-pinctrl-for-v7.2-tag3' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers:
pinctrl: renesas: rzg2l: Use raw_spinlock_irqsave() on power source update

Signed-off-by: Linus Walleij <linusw@kernel.org>

neighbour: remove obsolete EXPORT_SYMBOL()

IPv6 can't be a module anymore, so we no longer need to export:

- neigh_changeaddr
- neigh_carrier_down
- neigh_ifdown
- neigh_connected_output
- neigh_direct_output
- neigh_table_init
- neigh_table_clear

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260605073426.2922242-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

geneve: Move udp_conf.local_ip6 under CONFIG_IPV6 in geneve_create_sock().

Unlike struct ip_tunnel_key, struct udp_port_cfg does not always
define IPv6 address fields.

  >> drivers/net/geneve.c:778:12: error: no member named 'local_ip6' in 'struct udp_port_cfg'
       778 |                 udp_conf.local_ip6 = info->key.u.ipv6.src;
           |                 ~~~~~~~~ ^

Let's add CONFIG_IPV6 guard in geneve_create_sock().
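
The shape of such a guard can be shown with a standalone sketch.
SKETCH_IPV6, port_cfg_sketch and sketch_fill_local() are hypothetical
stand-ins for the config option, struct udp_port_cfg and the relevant
part of geneve_create_sock(); the family values 2 and 10 are the
usual Linux AF_INET and AF_INET6 numbers.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the config option; build with -DSKETCH_IPV6=0 to disable. */
#ifndef SKETCH_IPV6
#define SKETCH_IPV6 1
#endif

/* The IPv6 member only exists when the option is enabled. */
struct port_cfg_sketch {
	unsigned char family;
	unsigned int local_ip;
#if SKETCH_IPV6
	unsigned char local_ip6[16];
#endif
};

static void sketch_fill_local(struct port_cfg_sketch *cfg, int is_ipv6,
			      unsigned int src4, const unsigned char src6[16])
{
	memset(cfg, 0, sizeof(*cfg));

	if (is_ipv6) {
		cfg->family = 10;
#if SKETCH_IPV6
		/* Every use of the member needs the same guard as its definition. */
		memcpy(cfg->local_ip6, src6, sizeof(cfg->local_ip6));
#endif
	} else {
		cfg->family = 2;
		cfg->local_ip = src4;
	}
}
```

With the guard removed from the function body and SKETCH_IPV6 set to
0, the build fails with the same kind of "no member named" error as
in the report above.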

Fixes: afabbb56a726 ("geneve: Introduce IFLA_GENEVE_LOCAL and IFLA_GENEVE_LOCAL6.")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202606070019.yx2LhZPU-lkp@intel.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260606204848.1987046-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

pinctrl: PINCTRL_STMFX should depend on CONFIG_OF

Commit e785c990adcc ("pinctrl: Kconfig: drop unneeded dependencies
on OF_GPIO") removed a redundant dependency on CONFIG_OF_GPIO for
several pinctrl drivers, but this change also removed a dependency
on CONFIG_OF for some of those drivers.

Normally, this wouldn't be a problem, but PINCTRL_STMFX also selected
MFD_STMFX, which does depend on CONFIG_OF. This conflict allows
MFD_STMFX to be enabled even if CONFIG_OF is disabled.

Fix this by also having PINCTRL_STMFX depend on CONFIG_OF. This is
okay because the pinctrl-stmfx driver actually does depend on CONFIG_OF
functions.

Fixes: e785c990adcc ("pinctrl: Kconfig: drop unneeded dependencies on OF_GPIO")
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Linus Walleij <linusw@kernel.org>

dt-bindings: pinctrl: realtek,rtd1625: Fix input voltage property name

The property 'input-voltage-microvolt' is a typo. Rename it to
'input-threshold-voltage-microvolt' to align with the standard pin
configuration defined in pincfg-node.yaml and parsed by pinconf-generic.c.

Fixes: f6ea7004e926 ("dt-bindings: pinctrl: realtek: Add RTD1625 pinctrl binding")
Signed-off-by: Yu-Chun Lin <eleanor.lin@realtek.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Linus Walleij <linusw@kernel.org>

dt-bindings: pinctrl: mediatek: mt6795: document the slew-rate property

The driver for MT6795 pinctrl already supports the slew-rate property.
Add its description to the documentation.

Signed-off-by: Luca Leonardo Scorcia <l.scorcia@gmail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Linus Walleij <linusw@kernel.org>