git.ipfire.org Git - thirdparty/kernel/linux.git/log

net/mlx5: Fix multiport device check over light SFs

Driver is using num_vhca_ports capability to distinguish between
multiport master device and multiport slave device. num_vhca_ports is a
capability the driver sets according to the MAX num_vhca_ports
capability reported by FW. On the other hand, light SFs doesn't set the
above capbility.

This leads to wrong results whenever light SFs is checking whether he is
a multiport master or slave.

Therefore, use the MAX capability to distinguish between master and
slave devices.

Fixes: e71383fb9cd1 ("net/mlx5: Light probe local SFs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bonding: alb: fix UAF in rlb_arp_recv during bond up/down

The ALB RX path may access rx_hashtbl concurrently with bond
teardown. During rapid bond up/down cycles, rlb_deinitialize()
frees rx_hashtbl while RX handlers are still running, leading
to a null pointer dereference detected by KASAN.

However, the root cause is that rlb_arp_recv() can still be accessed
after setting recv_probe to NULL, which is actually a use-after-free
(UAF) issue. That is the reason for using the referenced commit in the
Fixes tag.

[  214.174138] Oops: general protection fault, probably for non-canonical address 0xdffffc000000001d: 0000 [#1] SMP KASAN PTI
[  214.186478] KASAN: null-ptr-deref in range [0x00000000000000e8-0x00000000000000ef]
[  214.194933] CPU: 30 UID: 0 PID: 2375 Comm: ping Kdump: loaded Not tainted 6.19.0-rc8+ #2 PREEMPT(voluntary)
[  214.205907] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.14.0 01/14/2022
[  214.214357] RIP: 0010:rlb_arp_recv+0x505/0xab0 [bonding]
[  214.220320] Code: 0f 85 2b 05 00 00 48 b8 00 00 00 00 00 fc ff df 40 0f b6 ed 48 c1 e5 06 49 03 ad 78 01 00 00 48 8d 7d 28 48 89 fa 48 c1 ea 03 <0f> b6
04 02 84 c0 74 06 0f 8e 12 05 00 00 80 7d 28 00 0f 84 8c 00
[  214.241280] RSP: 0018:ffffc900073d8870 EFLAGS: 00010206
[  214.247116] RAX: dffffc0000000000 RBX: ffff888168556822 RCX: ffff88816855681e
[  214.255082] RDX: 000000000000001d RSI: dffffc0000000000 RDI: 00000000000000e8
[  214.263048] RBP: 00000000000000c0 R08: 0000000000000002 R09: ffffed11192021c8
[  214.271013] R10: ffff8888c9010e43 R11: 0000000000000001 R12: 1ffff92000e7b119
[  214.278978] R13: ffff8888c9010e00 R14: ffff888168556822 R15: ffff888168556810
[  214.286943] FS:  00007f85d2d9cb80(0000) GS:ffff88886ccb3000(0000) knlGS:0000000000000000
[  214.295966] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  214.302380] CR2: 00007f0d047b5e34 CR3: 00000008a1c2e002 CR4: 00000000001726f0
[  214.310347] Call Trace:
[  214.313070]  <IRQ>
[  214.315318]  ? __pfx_rlb_arp_recv+0x10/0x10 [bonding]
[  214.320975]  bond_handle_frame+0x166/0xb60 [bonding]
[  214.326537]  ? __pfx_bond_handle_frame+0x10/0x10 [bonding]
[  214.332680]  __netif_receive_skb_core.constprop.0+0x576/0x2710
[  214.339199]  ? __pfx_arp_process+0x10/0x10
[  214.343775]  ? sched_balance_find_src_group+0x98/0x630
[  214.349513]  ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[  214.356513]  ? arp_rcv+0x307/0x690
[  214.360311]  ? __pfx_arp_rcv+0x10/0x10
[  214.364499]  ? __lock_acquire+0x58c/0xbd0
[  214.368975]  __netif_receive_skb_one_core+0xae/0x1b0
[  214.374518]  ? __pfx___netif_receive_skb_one_core+0x10/0x10
[  214.380743]  ? lock_acquire+0x10b/0x140
[  214.385026]  process_backlog+0x3f1/0x13a0
[  214.389502]  ? process_backlog+0x3aa/0x13a0
[  214.394174]  __napi_poll.constprop.0+0x9f/0x370
[  214.399233]  net_rx_action+0x8c1/0xe60
[  214.403423]  ? __pfx_net_rx_action+0x10/0x10
[  214.408193]  ? lock_acquire.part.0+0xbd/0x260
[  214.413058]  ? sched_clock_cpu+0x6c/0x540
[  214.417540]  ? mark_held_locks+0x40/0x70
[  214.421920]  handle_softirqs+0x1fd/0x860
[  214.426302]  ? __pfx_handle_softirqs+0x10/0x10
[  214.431264]  ? __neigh_event_send+0x2d6/0xf50
[  214.436131]  do_softirq+0xb1/0xf0
[  214.439830]  </IRQ>

The issue is reproducible by repeatedly running
ip link set bond0 up/down while receiving ARP messages, where
rlb_arp_recv() can race with rlb_deinitialize() and dereference
a freed rx_hashtbl entry.

Fix this by setting recv_probe to NULL and then calling
synchronize_net() to wait for any concurrent RX processing to finish.
This ensures that no RX handler can access rx_hashtbl after it is freed
in bond_alb_deinitialize().

Reported-by: Liang Li <liali@redhat.com>
Fixes: 3aba891dde38 ("bonding: move processing of recv handlers into handle_frame()")
Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260218060919.101574-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnge: fix reserving resources from FW

HWRM_FUNC_CFG is used to reserve resources, whereas HWRM_FUNC_QCFG is
intended for querying resource information from the firmware.
Since __bnge_hwrm_reserve_pf_rings() reserves resources for a specific
PF, the command type should be HWRM_FUNC_CFG.

Fixes: 627c67f038d2 ("bng_en: Add resource management support")
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260218052755.4097468-1-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: fbnic: Advertise supported XDP features.

Drivers are supposed to advertise the XDP features they support. This was
missed while adding XDP support.

Before:
$ ynl --family netdev --dump dev-get
...
{'ifindex': 3,
  'xdp-features': set(),
  'xdp-rx-metadata-features': set(),
  'xsk-features': set()},
...

After:
$ ynl --family netdev --dump dev-get
...
{'ifindex': 3,
  'xdp-features': {'basic', 'rx-sg'},
  'xdp-rx-metadata-features': set(),
  'xsk-features': set()},
...

Fixes: 168deb7b31b2 ("eth: fbnic: Add support for XDP_TX action")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260218030620.3329608-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

rds: tcp: fix uninit-value in __inet_bind

KMSAN reported an uninit-value access in __inet_bind() when binding
an RDS TCP socket.

The uninitialized memory originates from rds_tcp_conn_alloc(),
which uses kmem_cache_alloc() to allocate the rds_tcp_connection structure.

Specifically, the field 't_client_port_group' is incremented in
rds_tcp_conn_path_connect() without being initialized first:

if (++tc->t_client_port_group >= port_groups)

Since kmem_cache_alloc() does not zero the memory, this field contains
garbage, leading to the KMSAN report.

Fix this by using kmem_cache_zalloc() to ensure the structure is
zero-initialized upon allocation.

Reported-by: syzbot+aae646f09192f72a68dc@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=aae646f09192f72a68dc
Tested-by: syzbot+aae646f09192f72a68dc@syzkaller.appspotmail.com
Fixes: a20a6992558f ("net/rds: Encode cp_index in TCP source port")
Signed-off-by: Tabrez Ahmed <tabreztalks@gmail.com>
Reviewed-by: Charalampos Mitrodimas <charmitro@posteo.net>
Reviewed-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260217135350.33641-1-tabreztalks@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/rds: Fix NULL pointer dereference in rds_tcp_accept_one

Save a local pointer to new_sock->sk and hold a reference before
installing callbacks in rds_tcp_accept_one. After
rds_tcp_set_callbacks() or rds_tcp_reset_callbacks(), tc->t_sock is
set to new_sock which may race with the shutdown path. A concurrent
rds_tcp_conn_path_shutdown() may call sock_release(), which sets
new_sock->sk = NULL and may eventually free sk when the refcount
reaches zero.

Subsequent accesses to new_sock->sk->sk_state would dereference NULL,
causing the crash. The fix saves a local sk pointer before callbacks
are installed so that sk_state can be accessed safely even after
new_sock->sk is nulled, and uses sock_hold()/sock_put() to ensure
sk itself remains valid for the duration.

Fixes: 826c1004d4ae ("net/rds: rds_tcp_conn_path_shutdown must not discard messages")
Reported-by: syzbot+96046021045ffe6d7709@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=96046021045ffe6d7709
Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260216222643.2391390-1-achender@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

octeontx2-af: Fix default entries mcam entry action

As per design, AF should update the default MCAM action only when
mcam_index is -1. A bug in the previous patch caused default entries
to be changed even when the request was not for them.

Fixes: 570ba37898ec ("octeontx2-af: Update RSS algorithm index")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260216090338.1318976-1-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nf-26-02-17' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter: updates for net

The following patchset contains Netfilter fixes for *net*:

1) Add missing __rcu annotations to NAT helper hook pointers in Amanda,
   FTP, IRC, SNMP and TFTP helpers.  From Sun Jian.

2-4):
- Add global spinlock to serialize nft_counter fetch+reset operations.
- Use atomic64_xchg() for nft_quota reset instead of read+subtract pattern.
   Note AI review detects a race in this change but it isn't new. The
   'racing' bit only exists to prevent constant stream of 'quota expired'
   notifications.
- Revert commit_mutex usage in nf_tables reset path, it caused
   circular lock dependency.  All from Brian Witte.

5) Fix uninitialized l3num value in nf_conntrack_h323 helper.

6) Fix musl libc compatibility in netfilter_bridge.h UAPI header. This
   change isn't nice (UAPI headers should not include libc headers), but
   as-is musl builds may fail due to redefinition of struct ethhdr.

7) Fix protocol checksum validation in IPVS for IPv6 with extension headers,
   from Julian Anastasov.

8) Fix device reference leak in IPVS when netdev goes down. Also from
   Julian.

9) Remove WARN_ON_ONCE when accessing forward path array, this can
   trigger with sufficiently long forward paths.  From Pablo Neira Ayuso.

10) Fix use-after-free in nf_tables_addchain() error path, from Inseo An.

* tag 'nf-26-02-17' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: fix use-after-free in nf_tables_addchain()
  net: remove WARN_ON_ONCE when accessing forward path array
  ipvs: do not keep dest_dst if dev is going down
  ipvs: skip ipv6 extension headers for csum checks
  include: uapi: netfilter_bridge.h: Cover for musl libc
  netfilter: nf_conntrack_h323: don't pass uninitialised l3num value
  netfilter: nf_tables: revert commit_mutex usage in reset path
  netfilter: nft_quota: use atomic64_xchg for reset
  netfilter: nft_counter: serialize reset with spinlock
  netfilter: annotate NAT helper hook pointers with __rcu
====================

Link: https://patch.msgid.link/20260217163233.31455-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: XSK, Fix unintended ICOSQ change

XSK wakeup must use the async ICOSQ (with proper locking), as it is not
guaranteed to run on the same CPU as the channel.

The commit that converted the NAPI trigger path to use the sync ICOSQ
incorrectly applied the same change to XSK, causing XSK wakeups to use
the sync ICOSQ as well. Revert XSK flows to use the async ICOSQ.

XDP program attach/detach triggers channel reopen, while XSK pool
enable/disable can happen on-the-fly via NDOs without reopening
channels. As a result, xsk_pool state cannot be reliably used at
mlx5e_open_channel() time to decide whether an async ICOSQ is needed.

Update the async_icosq_needed logic to depend on the presence of an XDP
program rather than the xsk_pool, ensuring the async ICOSQ is available
when XSK wakeups are enabled.

This fixes multiple issues:

1. Illegal synchronize_rcu() in an RCU read- side critical section via
mlx5e_xsk_wakeup() -> mlx5e_trigger_napi_icosq() ->
synchronize_net(). The stack holds RCU read-lock in xsk_poll().

2. Hitting a NULL pointer dereference in mlx5e_xsk_wakeup():

[] BUG: kernel NULL pointer dereference, address: 0000000000000240
[] #PF: supervisor read access in kernel mode
[] #PF: error_code(0x0000) - not-present page
[] PGD 0 P4D 0
[] Oops: Oops: 0000 [#1] SMP
[] CPU: 0 UID: 0 PID: 2255 Comm: qemu-system-x86 Not tainted 6.19.0-rc5+ #229 PREEMPT(none)
[] Hardware name: [...]
[] RIP: 0010:mlx5e_xsk_wakeup+0x53/0x90 [mlx5_core]

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Closes: https://lore.kernel.org/all/20260123223916.361295-1-daniel@iogearbox.net/
Fixes: 56aca3e0f730 ("net/mlx5e: Use regular ICOSQ for triggering NAPI")
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Acked-by: Alice Mikityanska <alice.kernel@fastmail.im>
Link: https://patch.msgid.link/20260217074525.1761454-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'icmp-better-deal-with-ddos'

Eric Dumazet says:

====================
icmp: better deal with DDOS

When dealing with death of big UDP servers, admins might want to
increase net.ipv4.icmp_msgs_per_sec and net.ipv4.icmp_msgs_burst
to big values (2,000,000 or more).

They also might need to tune the per-host ratelimit to 1ms or 0ms
in favor of the global rate limit.

This series fixes bugs showing up in all these needs.
====================

Link: https://patch.msgid.link/20260216142832.3834174-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: icmp: icmpv6_xrlim_allow() optimization if net.ipv6.icmp.ratelimit is zero

If net.ipv6.icmp.ratelimit is zero we do not have to call
inet_getpeer_v6() and inet_peer_xrlim_allow().

Both can be very expensive under DDOS.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260216142832.3834174-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: icmp: icmpv4_xrlim_allow() optimization if net.ipv4.icmp_ratelimit is zero

If net.ipv4.icmp_ratelimit is zero, we do not have to call
inet_getpeer_v4() and inet_peer_xrlim_allow().

Both can be very expensive under DDOS.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260216142832.3834174-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: icmp: remove obsolete code in icmpv6_xrlim_allow()

Following part was needed before the blamed commit, because
inet_getpeer_v6() second argument was the prefix.

/* Give more bandwidth to wider prefixes. */
if (rt->rt6i_dst.plen < 128)
tmo >>= ((128 - rt->rt6i_dst.plen)>>5);

Now inet_getpeer_v6() retrieves hosts, we need to remove
@tmo adjustement or wider prefixes likes /24 allow 8x
more ICMP to be sent for a given ratelimit.

As we had this issue for a while, this patch changes net.ipv6.icmp.ratelimit
default value from 1000ms to 100ms to avoid potential regressions.

Also add a READ_ONCE() when reading net->ipv6.sysctl.icmpv6_time.

Fixes: fd0273d7939f ("ipv6: Remove external dependency on rt6i_dst and rt6i_src")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260216142832.3834174-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

inet: move icmp_global_{credit,stamp} to a separate cache line

icmp_global_credit was meant to be changed ~1000 times per second,
but if an admin sets net.ipv4.icmp_msgs_per_sec to a very high value,
icmp_global_credit changes can inflict false sharing to surrounding
fields that are read mostly.

Move icmp_global_credit and icmp_global_stamp to a separate
cacheline aligned group.

Fixes: b056b4cd9178 ("icmp: move icmp_global.credit and icmp_global.stamp to per netns storage")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260216142832.3834174-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

icmp: prevent possible overflow in icmp_global_allow()

Following expression can overflow
if sysctl_icmp_msgs_per_sec is big enough.

sysctl_icmp_msgs_per_sec * delta / HZ;

Fixes: 4cdf507d5452 ("icmp: add a global rate limitation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260216142832.3834174-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: packetdrill: add ipv4-mapped-ipv6 tests

Add ipv4-mapped-ipv6 case to ksft_runner.sh before
an upcoming TCP fix in this area.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260217142924.1853498-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: fbnic: Add validation for MTU changes

Increasing the MTU beyond the HDS threshold causes the hardware to
fragment packets across multiple buffers. If a single-buffer XDP program
is attached, the driver will drop all multi-frag frames. While we can't
prevent a remote sender from sending non-TCP packets larger than the MTU,
this will prevent users from inadvertently breaking new TCP streams.

Traditionally, drivers supported XDP with MTU less than 4Kb
(packet per page). Fbnic currently prevents attaching XDP when MTU is too high.
But it does not prevent increasing MTU after XDP is attached.

Fixes: 1b0a3950dbd4 ("eth: fbnic: Add XDP pass, drop, abort support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: act_skbedit: fix divide-by-zero in tcf_skbedit_hash()

Commit 38a6f0865796 ("net: sched: support hash selecting tx queue")
added SKBEDIT_F_TXQ_SKBHASH support. The inclusive range size is
computed as:

mapping_mod = queue_mapping_max - queue_mapping + 1;

The range size can be 65536 when the requested range covers all possible
u16 queue IDs (e.g. queue_mapping=0 and queue_mapping_max=U16_MAX).
That value cannot be represented in a u16 and previously wrapped to 0,
so tcf_skbedit_hash() could trigger a divide-by-zero:

queue_mapping += skb_get_hash(skb) % params->mapping_mod;

Compute mapping_mod in a wider type and reject ranges larger than U16_MAX
to prevent params->mapping_mod from becoming 0 and avoid the crash.

Fixes: 38a6f0865796 ("net: sched: support hash selecting tx queue")
Cc: stable@vger.kernel.org # 6.12+
Signed-off-by: Ruitong Liu <cnitlrt@gmail.com>
Link: https://patch.msgid.link/20260213175948.1505257-1-cnitlrt@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

vsock: document namespace mode sysctls

Add documentation for the vsock per-namespace sysctls (`ns_mode` and
`child_ns_mode`) to Documentation/admin-guide/sysctl/net.rst.
These sysctls were introduced by commit eafb64f40ca4 ("vsock: add
netns to vsock core").

Document the two namespace modes (`global` and `local`), the
inheritance behavior of `child_ns_mode`, and the restriction preventing
local namespaces from setting `child_ns_mode` to `global`.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20260216163147.236844-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: ec_bhf: Fix dma_free_coherent() dma handle

dma_free_coherent() in error path takes priv->rx_buf.alloc_len as
the dma handle. This would lead to improper unmapping of the buffer.

Change the dma handle to priv->rx_buf.alloc_phys.

Fixes: 6af55ff52b02 ("Driver for Beckhoff CX5020 EtherCAT master module.")
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Link: https://patch.msgid.link/20260213164340.77272-2-fourier.thomas@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

macvlan: observe an RCU grace period in macvlan_common_newlink() error path

valis reported that a race condition still happens after my prior patch.

macvlan_common_newlink() might have made @dev visible before
detecting an error, and its caller will directly call free_netdev(dev).

We must respect an RCU period, either in macvlan or the core networking
stack.

After adding a temporary mdelay(1000) in macvlan_forward_source_one()
to open the race window, valis repro was:

ip link add p1 type veth peer p2
ip link set address 00:00:00:00:00:20 dev p1
ip link set up dev p1
ip link set up dev p2
ip link add mv0 link p2 type macvlan mode source

(ip link add invalid% link p2 type macvlan mode source macaddr add
00:00:00:00:00:20 &) ; sleep 0.5 ; ping -c1 -I p1 1.2.3.4
PING 1.2.3.4 (1.2.3.4): 56 data bytes
RTNETLINK answers: Invalid argument

BUG: KASAN: slab-use-after-free in macvlan_forward_source
(drivers/net/macvlan.c:408 drivers/net/macvlan.c:444)
Read of size 8 at addr ffff888016bb89c0 by task e/175

CPU: 1 UID: 1000 PID: 175 Comm: e Not tainted 6.19.0-rc8+ #33 NONE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl (lib/dump_stack.c:123)
print_report (mm/kasan/report.c:379 mm/kasan/report.c:482)
? macvlan_forward_source (drivers/net/macvlan.c:408 drivers/net/macvlan.c:444)
kasan_report (mm/kasan/report.c:597)
? macvlan_forward_source (drivers/net/macvlan.c:408 drivers/net/macvlan.c:444)
macvlan_forward_source (drivers/net/macvlan.c:408 drivers/net/macvlan.c:444)
? tasklet_init (kernel/softirq.c:983)
macvlan_handle_frame (drivers/net/macvlan.c:501)

Allocated by task 169:
kasan_save_stack (mm/kasan/common.c:58)
kasan_save_track (./arch/x86/include/asm/current.h:25
mm/kasan/common.c:70 mm/kasan/common.c:79)
__kasan_kmalloc (mm/kasan/common.c:419)
__kvmalloc_node_noprof (./include/linux/kasan.h:263 mm/slub.c:5657
mm/slub.c:7140)
alloc_netdev_mqs (net/core/dev.c:12012)
rtnl_create_link (net/core/rtnetlink.c:3648)
rtnl_newlink (net/core/rtnetlink.c:3830 net/core/rtnetlink.c:3957
net/core/rtnetlink.c:4072)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6958)
netlink_rcv_skb (net/netlink/af_netlink.c:2550)
netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344)
netlink_sendmsg (net/netlink/af_netlink.c:1894)
__sys_sendto (net/socket.c:727 net/socket.c:742 net/socket.c:2206)
__x64_sys_sendto (net/socket.c:2209)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)

Freed by task 169:
kasan_save_stack (mm/kasan/common.c:58)
kasan_save_track (./arch/x86/include/asm/current.h:25
mm/kasan/common.c:70 mm/kasan/common.c:79)
kasan_save_free_info (mm/kasan/generic.c:587)
__kasan_slab_free (mm/kasan/common.c:287)
kfree (mm/slub.c:6674 mm/slub.c:6882)
rtnl_newlink (net/core/rtnetlink.c:3845 net/core/rtnetlink.c:3957
net/core/rtnetlink.c:4072)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6958)
netlink_rcv_skb (net/netlink/af_netlink.c:2550)
netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344)
netlink_sendmsg (net/netlink/af_netlink.c:1894)
__sys_sendto (net/socket.c:727 net/socket.c:742 net/socket.c:2206)
__x64_sys_sendto (net/socket.c:2209)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)

Fixes: f8db6475a836 ("macvlan: fix error recovery in macvlan_common_newlink()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: valis <sec@valis.email>
Link: https://patch.msgid.link/20260213142557.3059043-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: tc_actions: don't dump 2MB of \0 to stdout

Since we started running selftests in NIPA we have been seeing
tc_actions.sh generate a soft lockup warning on ~20% of the runs.
On the pre-netdev foundation setup it was actually a missed irq
splat from the console. Now it's either that or a lockup.

I initially suspected a socket locking issue since the test
is exercising local loopback with act_mirred.
After hours of staring at this I noticed in strace that ncat
when -o $file is specified _both_ saves the output to the file
and still prints it to stdout. Because the file being sent
is constructed with:

dd conv=sparse status=none if=/dev/zero bs=1M count=2 of=$mirred
^^^^^^^^^

the data printed is all \0. Most terminals don't display nul
characters (and neither does vng output capture save them).
But QEMU's serial console still has to poke them thru which
is very slow and causes the lockup (if the file is >600kB).

Replace the '-o $file' with '> $file'. This speeds the test up
from 2m20s to 18s on debug kernels, and prevents the warnings.

Fixes: ca22da2fbd69 ("act_mirred: use the backlog for nested calls to mirred ingress")
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260214035159.2119699-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: addrconf: reduce default temp_valid_lft to 2 days

This is a recommendation from RFC 8981 and it was intended to be changed
by commit 969c54646af0 ("ipv6: Implement draft-ietf-6man-rfc4941bis")
but it only changed the sysctl documentation.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260214172543.5783-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ping: annotate data-races in ping_lookup()

isk->inet_num, isk->inet_rcv_saddr and sk->sk_bound_dev_if
are read locklessly in ping_lookup().

Add READ_ONCE()/WRITE_ONCE() annotations.

The race on isk->inet_rcv_saddr is probably coming from IPv6 support,
but does not deserve a specific backport.

Fixes: dbca1596bbb0 ("ping: convert to RCU lookups, get rid of rwlock")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260216100149.3319315-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: MxL862xx: don't force-enable MAXLINEAR_GPHY

The newly added dsa driver attempts to enable the corresponding PHY driver,
but that one has additional dependencies that may not be available:

WARNING: unmet direct dependencies detected for MAXLINEAR_GPHY
  Depends on [m]: NETDEVICES [=y] && PHYLIB [=y] && (HWMON [=m] || HWMON [=m]=n [=n])
  Selected by [y]:
  - NET_DSA_MXL862 [=y] && NETDEVICES [=y] && NET_DSA [=y]
aarch64-linux-ld: drivers/net/phy/mxl-gpy.o: in function `gpy_probe':
mxl-gpy.c:(.text.gpy_probe+0x13c): undefined reference to `devm_hwmon_device_register_with_info'
aarch64-linux-ld: drivers/net/phy/mxl-gpy.o: in function `gpy_hwmon_read':
mxl-gpy.c:(.text.gpy_hwmon_read+0x48): undefined reference to `polynomial_calc'

There is actually no compile-time dependency, as DSA correctly uses the
PHY abstractions. Remove the 'select' statement to reduce the complexity.

Fixes: 23794bec1cb6 ("net: dsa: add basic initial driver for MxL862xx switches")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/20260216105522.2382373-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpll: zl3073x: Fix ref frequency setting

The frequency for an input reference is computed as:

frequency = freq_base * freq_mult * freq_ratio_m / freq_ratio_n

Before commit 5bc02b190a3fb ("dpll: zl3073x: Cache all reference
properties in zl3073x_ref"), zl3073x_dpll_input_pin_frequency_set()
explicitly wrote 1 to both the REF_RATIO_M and REF_RATIO_N hardware
registers whenever a new frequency was set. This ensured the FEC ratio
was always reset to 1:1 alongside the new base/multiplier values.

The refactoring in that commit introduced zl3073x_ref_freq_set() to
update the cached ref state, but this helper only sets freq_base and
freq_mult without resetting freq_ratio_m and freq_ratio_n to 1. Because
zl3073x_ref_state_set() uses a compare-and-write strategy, unchanged
ratio fields are never written to the hardware. If the device previously
had non-unity FEC ratio values, they remain in effect after a frequency
change, resulting in an incorrect computed frequency.

Explicitly set freq_ratio_m and freq_ratio_n to 1 in zl3073x_ref_freq_set()
to restore the original behavior.

Fixes: 5bc02b190a3fb ("dpll: zl3073x: Cache all reference properties in zl3073x_ref")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260216194007.680416-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: do not delay zero-copy skbs in skb_attempt_defer_free()

After the blamed commit, TCP tx zero copy notifications could be
arbitrarily delayed and cause regressions in applications waiting
for them.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: e20dfbad8aab ("net: fix napi_consume_skb() with alien skbs")
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260216193653.627617-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: psp: select CONFIG_SKB_EXTENSIONS

psp now uses skb extensions, failing to build when that is disabled:

In file included from include/net/psp.h:7,
                 from net/psp/psp_sock.c:9:
include/net/psp/functions.h: In function '__psp_skb_coalesce_diff':
include/net/psp/functions.h:60:13: error: implicit declaration of function 'skb_ext_find'; did you mean 'skb_ext_copy'? [-Wimplicit-function-declaration]
   60 |         a = skb_ext_find(one, SKB_EXT_PSP);
      |             ^~~~~~~~~~~~
      |             skb_ext_copy
include/net/psp/functions.h:60:31: error: 'SKB_EXT_PSP' undeclared (first use in this function)
   60 |         a = skb_ext_find(one, SKB_EXT_PSP);
      |                               ^~~~~~~~~~~
include/net/psp/functions.h:60:31: note: each undeclared identifier is reported only once for each function it appears in
include/net/psp/functions.h: In function '__psp_sk_rx_policy_check':
include/net/psp/functions.h:94:53: error: 'SKB_EXT_PSP' undeclared (first use in this function)
   94 |         struct psp_skb_ext *pse = skb_ext_find(skb, SKB_EXT_PSP);
      |                                                     ^~~~~~~~~~~
net/psp/psp_sock.c: In function 'psp_sock_recv_queue_check':
net/psp/psp_sock.c:164:41: error: 'SKB_EXT_PSP' undeclared (first use in this function)
  164 |                 pse = skb_ext_find(skb, SKB_EXT_PSP);
      |                                         ^~~~~~~~~~~

Select the Kconfig symbol as we do from its other users.

Fixes: 6b46ca260e22 ("net: psp: add socket security association code")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20260216105500.2382181-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: fix a race in ip6_sock_set_v6only()

It is unlikely that this function will be ever called
with isk->inet_num being not zero.

Perform the check on isk->inet_num inside the locked section
for complete safety.

Fixes: 9b115749acb24 ("ipv6: add ip6_sock_set_v6only")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260216102202.3343588-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netfilter: nf_tables: fix use-after-free in nf_tables_addchain()

nf_tables_addchain() publishes the chain to table->chains via
list_add_tail_rcu() (in nft_chain_add()) before registering hooks.
If nf_tables_register_hook() then fails, the error path calls
nft_chain_del() (list_del_rcu()) followed by nf_tables_chain_destroy()
with no RCU grace period in between.

This creates two use-after-free conditions:

1) Control-plane: nf_tables_dump_chains() traverses table->chains
    under rcu_read_lock(). A concurrent dump can still be walking
    the chain when the error path frees it.

2) Packet path: for NFPROTO_INET, nf_register_net_hook() briefly
    installs the IPv4 hook before IPv6 registration fails.  Packets
    entering nft_do_chain() via the transient IPv4 hook can still be
    dereferencing chain->blob_gen_X when the error path frees the
    chain.

Add synchronize_rcu() between nft_chain_del() and the chain destroy
so that all RCU readers -- both dump threads and in-flight packet
evaluation -- have finished before the chain is freed.

Fixes: 91c7b38dc9f0 ("netfilter: nf_tables: use new transaction infrastructure to handle chain")
Signed-off-by: Inseo An <y0un9sa@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

net: remove WARN_ON_ONCE when accessing forward path array

Although unlikely, recent support for IPIP tunnels increases chances of
reaching this WARN_ON_ONCE if userspace manages to build a sufficiently
long forward path.

Remove it.

Fixes: ddb94eafab8b ("net: resolve forwarding path from virtual netdevice and HW destination address")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>

ipvs: do not keep dest_dst if dev is going down

There is race between the netdev notifier ip_vs_dst_event()
and the code that caches dst with dev that is going down.
As the FIB can be notified for the closed device after our
handler finishes, it is possible valid route to be returned
and cached resuling in a leaked dev reference until the dest
is not removed.

To prevent new dest_dst to be attached to dest just after the
handler dropped the old one, add a netif_running() check
to make sure the notifier handler is not currently running
for device that is closing.

Fixes: 7a4f0761fce3 ("IPVS: init and cleanup restructuring")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>

ipvs: skip ipv6 extension headers for csum checks

Protocol checksum validation fails for IPv6 if there are extension
headers before the protocol header. iph->len already contains its
offset, so use it to fix the problem.

Fixes: 2906f66a5682 ("ipvs: SCTP Trasport Loadbalancing Support")
Fixes: 0bbdd42b7efa ("IPVS: Extend protocol DNAT/SNAT and state handlers")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>

include: uapi: netfilter_bridge.h: Cover for musl libc

Musl defines its own struct ethhdr and thus defines __UAPI_DEF_ETHHDR to
zero. To avoid struct redefinition errors, user space is therefore
supposed to include netinet/if_ether.h before (or instead of)
linux/if_ether.h. To relieve them from this burden, include the libc
header here if not building for kernel space.

Reported-by: Alyssa Ross <hi@alyssa.is>
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nf_conntrack_h323: don't pass uninitialised l3num value

Mihail Milev reports: Error: UNINIT (CWE-457):
net/netfilter/nf_conntrack_h323_main.c:1189:2: var_decl:
Declaring variable "tuple" without initializer.
net/netfilter/nf_conntrack_h323_main.c:1197:2:
uninit_use_in_call: Using uninitialized value "tuple.src.l3num" when calling "__nf_ct_expect_find".
net/netfilter/nf_conntrack_expect.c:142:2:
read_value: Reading value "tuple->src.l3num" when calling "nf_ct_expect_dst_hash".

  1195|    tuple.dst.protonum = IPPROTO_TCP;
  1196|
  1197|-> exp = __nf_ct_expect_find(net, nf_ct_zone(ct), &tuple);
  1198|    if (exp && exp->master == ct)
  1199|    return exp;

Switch this to a C99 initialiser and set the l3num value.

Fixes: f587de0e2feb ("[NETFILTER]: nf_conntrack/nf_nat: add H.323 helper port")
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nf_tables: revert commit_mutex usage in reset path

It causes circular lock dependency between commit_mutex, nfnl_subsys_ipset
and nlk_cb_mutex when nft reset, ipset list, and iptables-nft with '-m set'
rule run at the same time.

Previous patches made it safe to run individual reset handlers concurrently
so commit_mutex is no longer required to prevent this.

Fixes: bd662c4218f9 ("netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests")
Fixes: 3d483faa6663 ("netfilter: nf_tables: Add locking for NFT_MSG_GETSETELEM_RESET requests")
Fixes: 3cb03edb4de3 ("netfilter: nf_tables: Add locking for NFT_MSG_GETRULE_RESET requests")
Link: https://lore.kernel.org/all/aUh_3mVRV8OrGsVo@strlen.de/
Reported-by: <syzbot+ff16b505ec9152e5f448@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=ff16b505ec9152e5f448
Signed-off-by: Brian Witte <brianwitte@mailfence.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nft_quota: use atomic64_xchg for reset

Use atomic64_xchg() to atomically read and zero the consumed value
on reset, which is simpler than the previous read+sub pattern and
doesn't require lock serialization.

Fixes: bd662c4218f9 ("netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests")
Fixes: 3d483faa6663 ("netfilter: nf_tables: Add locking for NFT_MSG_GETSETELEM_RESET requests")
Fixes: 3cb03edb4de3 ("netfilter: nf_tables: Add locking for NFT_MSG_GETRULE_RESET requests")
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Brian Witte <brianwitte@mailfence.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nft_counter: serialize reset with spinlock

Add a global static spinlock to serialize counter fetch+reset
operations, preventing concurrent dump-and-reset from underrunning
values.

The lock is taken before fetching the total so that two parallel
resets cannot both read the same counter values and then both
subtract them.

A global lock is used for simplicity since resets are infrequent.
If this becomes a bottleneck, it can be replaced with a per-net
lock later.

Fixes: bd662c4218f9 ("netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests")
Fixes: 3d483faa6663 ("netfilter: nf_tables: Add locking for NFT_MSG_GETSETELEM_RESET requests")
Fixes: 3cb03edb4de3 ("netfilter: nf_tables: Add locking for NFT_MSG_GETRULE_RESET requests")
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Brian Witte <brianwitte@mailfence.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: annotate NAT helper hook pointers with __rcu

The NAT helper hook pointers are updated and dereferenced under RCU rules,
but lack the proper __rcu annotation.

This makes sparse report address space mismatches when the hooks are used
with rcu_dereference().

Add the missing __rcu annotations to the global hook pointer declarations
and definitions in Amanda, FTP, IRC, SNMP and TFTP.

No functional change intended.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

Merge branch 'selftests-forwarding-fix-br_netfilter-related-test-failures'

Aleksei Oladko says:

====================
selftests: forwarding: fix br_netfilter related test failures

This patch series fixes kselftests that fail when the br_nefilter
module is loaded. The failures occur because the tests generate
packets that are either modified or encapsulated, but their IP
headers are not fully correct for sanity checks performed by
be_netfilter.

Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com>
====================

Link: https://patch.msgid.link/20260213131907.43351-1-aleksey.oladko@virtuozzo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: forwarding: fix pedit tests failure with br_netfilter enabled

The tests use the tc pedit action to modify the IPv4 source address
("pedit ex munge ip src set"), but the IP header checksum is not
recalculated after the modification. As a result, the modified packet
fails sanity checks in br_netfilter after bridging and is dropped,
which causes the test to fail.

Fix this by ensuring net.bridge.bridge-nf-call-iptables is set to 0
during the test execution. This prevents the bridge from passing
L2 traffic to netfilter, bypassing the checksum validation that
causes the test failure.

Fixes: 92ad3828944e ("selftests: forwarding: Add a test for pedit munge SIP and DIP")
Fixes: 226657ba2389 ("selftests: forwarding: Add a forwarding test for pedit munge dsfield")
Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260213131907.43351-4-aleksey.oladko@virtuozzo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: forwarding: vxlan_bridge_1d_ipv6: fix test failure with br_netfilter enabled

The test generates VXLAN traffic using mausezahn, where the encapsulated
inner IPv6 packet has an incorrect payload length set in the IPv6 header.
After VXLAN decapsulation, such packets do not pass sanity checks in
br_netfilter and are dropped, which causes the test to fail.

Fix this by setting the correct IPv6 payload length for the encapsulated
packet generated by mausezahn, so that the packet is accepted
by br_netfilter.

tools/testing/selftests/net/forwarding/vxlan_bridge_1d_ipv6.sh
lines 698-706

              )"00:03:"$(           : Payload length
              )"3a:"$(              : Next header
              )"04:"$(              : Hop limit
              )"$saddr:"$(          : IP saddr
              )"$daddr:"$(          : IP daddr
              )"80:"$(              : ICMPv6.type
              )"00:"$(              : ICMPv6.code
              )"00:"$(              : ICMPv6.checksum
              )

Data after IPv6 header:
• 80: — 1 byte (ICMPv6 type)
• 00: — 1 byte (ICMPv6 code)
• 00: — 1 byte (ICMPv6 checksum, truncated)

Total: 3 bytes → 00:03 is correct. The old value 00:08 did not match
the actual payload size.

Fixes: b07e9957f220 ("selftests: forwarding: Add VxLAN tests with a VLAN-unaware bridge for IPv6")
Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260213131907.43351-3-aleksey.oladko@virtuozzo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: forwarding: vxlan_bridge_1d: fix test failure with br_netfilter enabled

The test generates VXLAN traffic using mausezahn, where the encapsulated
inner IPv4 packet contains a zero IP header checksum. After VXLAN
decapsulation, such packets do not pass sanity checks in br_netfilter
and are dropped, which causes the test to fail.

Fix this by calculating and setting a valid IPv4 header checksum for the
encapsulated packet generated by mausezahn, so that the packet is accepted
by br_netfilter. Fixed by using the payload_template_calc_checksum() /
payload_template_expand_checksum() helpers that are only available
in v6.3 and newer kernels.

Fixes: a0b61f3d8ebf ("selftests: forwarding: vxlan_bridge_1d: Add an ECN decap test")
Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260213131907.43351-2-aleksey.oladko@virtuozzo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT

CONFIG_PREEMPT_RT is special, make this clear in backlog_lock_irq_save()
and backlog_unlock_irq_restore().

The issue shows up with CONFIG_DEBUG_IRQFLAGS=y

raw_local_irq_restore() called with IRQs enabled
WARNING: kernel/locking/irqflag-debug.c:10 at warn_bogus_irq_restore+0xc/0x20 kernel/locking/irqflag-debug.c:10, CPU#1: aoe_tx0/1321
Modules linked in:
CPU: 1 UID: 0 PID: 1321 Comm: aoe_tx0 Not tainted syzkaller #0 PREEMPT_{RT,(full)}
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/24/2026
RIP: 0010:warn_bogus_irq_restore+0xc/0x20 kernel/locking/irqflag-debug.c:10
Call Trace:
<TASK>
  backlog_unlock_irq_restore net/core/dev.c:253 [inline]
  enqueue_to_backlog+0x525/0xcf0 net/core/dev.c:5347
  netif_rx_internal+0x120/0x550 net/core/dev.c:5659
  __netif_rx+0xa9/0x110 net/core/dev.c:5679
  loopback_xmit+0x43a/0x660 drivers/net/loopback.c:90
  __netdev_start_xmit include/linux/netdevice.h:5275 [inline]
  netdev_start_xmit include/linux/netdevice.h:5284 [inline]
  xmit_one net/core/dev.c:3864 [inline]
  dev_hard_start_xmit+0x2df/0x830 net/core/dev.c:3880
  __dev_queue_xmit+0x16f4/0x3990 net/core/dev.c:4829
  dev_queue_xmit include/linux/netdevice.h:3384 [inline]

Fixes: 27a01c1969a5 ("net: fully inline backlog_unlock_irq_restore()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260213120427.2914544-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

printk: add CONFIG_PRINTK dependency for netconsole

The 'select PRINTK_EXECUTION_CTX' line now causes a harmless warning
when NETCONSOLE_DYNAMIC is enabled but PRINTK is not:

WARNING: unmet direct dependencies detected for PRINTK_EXECUTION_CTX
  Depends on [n]: PRINTK [=n]
  Selected by [y]:
  - NETCONSOLE_DYNAMIC [=y] && NETDEVICES [=y] && NET_CORE [=y] && NETCONSOLE [=y] && SYSFS [=y] && CONFIGFS_FS [=y] && (NETCONSOLE [=y]!=y [=y] || CONFIGFS_FS [=y]!=m [=m])

In that configuration, the netconsole driver is useless anyway, so
avoid this with an added dependency that prevents CONFIG_NETCONSOLE
to be enabled without CONFIG_PRINTK.

Fixes: 60325c27d3cf ("printk: Add execution context (task name/CPU) to printk_info")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260213074431.1729627-1-arnd@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-bridge-mcast-fix-mdb_n_entries-counting-warning'

Nikolay Aleksandrov says:

====================
net: bridge: mcast: fix mdb_n_entries counting warning

The first patch fixes a warning in the mdb_n_entries code which was
reported by syzbot, the second patch adds tests for different ways
which used to trigger said warning. For more information please check
the individual commit messages.
====================

Link: https://patch.msgid.link/20260213070031.1400003-1-nikolay@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: forwarding: bridge_mdb_max: add tests for mdb_n_entries warning

Recently we were able to trigger a warning in the mdb_n_entries counting
code. Add tests that exercise different ways which used to trigger that
warning.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://patch.msgid.link/20260213070031.1400003-3-nikolay@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: bridge: mcast: always update mdb_n_entries for vlan contexts

syzbot triggered a warning[1] about the number of mdb entries in a context.
It turned out that there are multiple ways to trigger that warning today
(some got added during the years), the root cause of the problem is that
the increase is done conditionally, and over the years these different
conditions increased so there were new ways to trigger the warning, that is
to do a decrease which wasn't paired with a previous increase.

For example one way to trigger it is with flush:
$ ip l add br0 up type bridge vlan_filtering 1 mcast_snooping 1
$ ip l add dumdum up master br0 type dummy
$ bridge mdb add dev br0 port dumdum grp 239.0.0.1 permanent vid 1
$ ip link set dev br0 down
$ ip link set dev br0 type bridge mcast_vlan_snooping 1
   ^^^^ this will enable snooping, but will not update mdb_n_entries
        because in __br_multicast_enable_port_ctx() we check !netif_running
$ bridge mdb flush dev br0
   ^^^ this will trigger the warning because it will delete the pg which
       we added above, which will try to decrease mdb_n_entries

Fix the problem by removing the conditional increase and always keep the
count up-to-date while the vlan exists. In order to do that we have to
first initialize it on port-vlan context creation, and then always increase
or decrease the value regardless of mcast options. To keep the current
behaviour we have to enforce the mdb limit only if the context is port's or
if the port-vlan's mcast snooping is enabled.

[1]
------------[ cut here ]------------
n == 0
WARNING: net/bridge/br_multicast.c:718 at br_multicast_port_ngroups_dec_one net/bridge/br_multicast.c:718 [inline], CPU#0: syz.4.4607/22043
WARNING: net/bridge/br_multicast.c:718 at br_multicast_port_ngroups_dec net/bridge/br_multicast.c:771 [inline], CPU#0: syz.4.4607/22043
WARNING: net/bridge/br_multicast.c:718 at br_multicast_del_pg+0x1bbe/0x1e20 net/bridge/br_multicast.c:825, CPU#0: syz.4.4607/22043
Modules linked in:
CPU: 0 UID: 0 PID: 22043 Comm: syz.4.4607 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/24/2026
RIP: 0010:br_multicast_port_ngroups_dec_one net/bridge/br_multicast.c:718 [inline]
RIP: 0010:br_multicast_port_ngroups_dec net/bridge/br_multicast.c:771 [inline]
RIP: 0010:br_multicast_del_pg+0x1bbe/0x1e20 net/bridge/br_multicast.c:825
Code: 41 5f 5d e9 04 7a 48 f7 e8 3f 73 5c f7 90 0f 0b 90 e9 cf fd ff ff e8 31 73 5c f7 90 0f 0b 90 e9 16 fd ff ff e8 23 73 5c f7 90 <0f> 0b 90 e9 60 fd ff ff e8 15 73 5c f7 eb 05 e8 0e 73 5c f7 48 8b
RSP: 0018:ffffc9000c207220 EFLAGS: 00010293
RAX: ffffffff8a68042d RBX: ffff88807c6f1800 RCX: ffff888066e90000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffff888066e90000 R09: 000000000000000c
R10: 000000000000000c R11: 0000000000000000 R12: ffff8880303ef800
R13: dffffc0000000000 R14: ffff888050eb11c4 R15: 1ffff1100a1d6238
FS:  00007fa45921b6c0(0000) GS:ffff8881256f5000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa4591f9ff8 CR3: 0000000081df2000 CR4: 00000000003526f0
Call Trace:
  <TASK>
  br_mdb_flush_pgs net/bridge/br_mdb.c:1525 [inline]
  br_mdb_flush net/bridge/br_mdb.c:1544 [inline]
  br_mdb_del_bulk+0x5e2/0xb20 net/bridge/br_mdb.c:1561
  rtnl_mdb_del+0x48a/0x640 net/core/rtnetlink.c:-1
  rtnetlink_rcv_msg+0x77e/0xbe0 net/core/rtnetlink.c:6967
  netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
  netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
  netlink_unicast+0x80f/0x9b0 net/netlink/af_netlink.c:1344
  netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
  sock_sendmsg_nosec net/socket.c:727 [inline]
  __sock_sendmsg net/socket.c:742 [inline]
  ____sys_sendmsg+0xa68/0xad0 net/socket.c:2592
  ___sys_sendmsg+0x2a5/0x360 net/socket.c:2646
  __sys_sendmsg net/socket.c:2678 [inline]
  __do_sys_sendmsg net/socket.c:2683 [inline]
  __se_sys_sendmsg net/socket.c:2681 [inline]
  __x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2681
  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
  do_syscall_64+0xe2/0xf80 arch/x86/entry/syscall_64.c:94
  entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fa45839aeb9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fa45921b028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fa458615fa0 RCX: 00007fa45839aeb9
RDX: 0000000000000000 RSI: 00002000000000c0 RDI: 0000000000000004
RBP: 00007fa458408c1f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fa458616038 R14: 00007fa458615fa0 R15: 00007fff0b59fae8
</TASK>

Fixes: b57e8d870d52 ("net: bridge: Maintain number of MDB entries in net_bridge_mcast_port")
Reported-by: syzbot+d5d1b7343531d17bd3c5@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/aYrWbRp83MQR1ife@debil/T/#t
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://patch.msgid.link/20260213070031.1400003-2-nikolay@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: arcnet: com20020-pci: fix support for 2.5Mbit cards

Commit 8c14f9c70327 ("ARCNET: add com20020 PCI IDs with metadata")
converted the com20020-pci driver to use a card info structure instead
of a single flag mask in driver_data. However, it failed to take into
account that in the original code, driver_data of 0 indicates a card
with no special flags, not a card that should not have any card info
structure. This introduced a null pointer dereference when cards with
no flags were probed.

Commit bd6f1fd5d33d ("net: arcnet: com20020: Fix null-ptr-deref in
com20020pci_probe()") then papered over this issue by rejecting cards
with no driver_data instead of resolving the problem at its source.

Fix the original issue by introducing a new card info structure for
2.5Mbit cards that does not set any flags and using it if no
driver_data is present.

Fixes: 8c14f9c70327 ("ARCNET: add com20020 PCI IDs with metadata")
Fixes: bd6f1fd5d33d ("net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()")
Cc: stable@vger.kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Link: https://patch.msgid.link/20260213045510.32368-1-enelsonmoore@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/rds: rds_sendmsg should not discard payload_len

Commit 3db6e0d172c9 ("rds: use RCU to synchronize work-enqueue with
connection teardown") modifies rds_sendmsg to avoid enqueueing work
while a tear down is in progress. However, it also changed the return
value of rds_sendmsg to that of rds_send_xmit instead of the
payload_len. This means the user may incorrectly receive errno values
when it should have simply received a payload of 0 while the peer
attempts a reconnections. So this patch corrects the teardown handling
code to only use the out error path in that case, thus restoring the
original payload_len return value.

Fixes: 3db6e0d172c9 ("rds: use RCU to synchronize work-enqueue with connection teardown")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260213035409.1963391-1-achender@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

xen-netback: reject zero-queue configuration from guest

A malicious or buggy Xen guest can write "0" to the xenbus key
"multi-queue-num-queues". The connect() function in the backend only
validates the upper bound (requested_num_queues > xenvif_max_queues)
but not zero, allowing requested_num_queues=0 to reach
vzalloc(array_size(0, sizeof(struct xenvif_queue))), which triggers
WARN_ON_ONCE(!size) in __vmalloc_node_range().

On systems with panic_on_warn=1, this allows a guest-to-host denial
of service.

The Xen network interface specification requires
the queue count to be "greater than zero".

Add a zero check to match the validation already present
in xen-blkback, which has included this
guard since its multi-queue support was added.

Fixes: 8d3d53b3e433 ("xen-netback: Add support for multiple queues")
Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://patch.msgid.link/20260212224040.86674-1-n7l8m4@u.northwestern.edu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: usb: catc: enable basic endpoint checking

catc_probe() fills three URBs with hardcoded endpoint pipes without
verifying the endpoint descriptors:

- usb_sndbulkpipe(usbdev, 1) and usb_rcvbulkpipe(usbdev, 1) for TX/RX
- usb_rcvintpipe(usbdev, 2) for interrupt status

A malformed USB device can present these endpoints with transfer types
that differ from what the driver assumes.

Add a catc_usb_ep enum for endpoint numbers, replacing magic constants
throughout. Add usb_check_bulk_endpoints() and usb_check_int_endpoints()
calls after usb_set_interface() to verify endpoint types before use,
rejecting devices with mismatched descriptors at probe time.

Similar to
- commit 90b7f2961798 ("net: usb: rtl8150: enable basic endpoint checking")
which fixed the issue in rtl8150.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: Simon Horman <horms@kernel.org>
Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu>
Link: https://patch.msgid.link/20260212214154.3609844-1-n7l8m4@u.northwestern.edu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ovpn: tcp - don't deref NULL sk_socket member after tcp_close()

When deleting a peer in case of keepalive expiration, the peer is
removed from the OpenVPN hashtable and is temporary inserted in a
"release list" for further processing.

This happens in:
ovpn_peer_keepalive_work()
  unlock_ovpn(release_list)

This processing includes detaching from the socket being used to
talk to this peer, by restoring its original proto and socket
ops/callbacks.

In case of TCP it may happen that, while the peer is sitting in
the release list, userspace decides to close the socket.
This will result in a concurrent execution of:

tcp_close(sk)
  __tcp_close(sk)
    sock_orphan(sk)
      sk_set_socket(sk, NULL)

The last function call will set sk->sk_socket to NULL.

When the releasing routine is resumed, ovpn_tcp_socket_detach()
will attempt to dereference sk->sk_socket to restore its original
ops member. This operation will crash due to sk->sk_socket being NULL.

Fix this race condition by testing-and-accessing
sk->sk_socket atomically under sk->sk_callback_lock.

Link: https://lore.kernel.org/netdev/176996279620.3109699.15382994681575380467@eldamar.lan/
Link: https://github.com/OpenVPN/ovpn-net-next/issues/29
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Link: https://patch.msgid.link/20260212213130.11497-1-antonio@openvpn.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'ovpn-net-20260212' of https://github.com/OpenVPN/ovpn-net-next

Antonio Quartulli says:

====================
This batch includes the following fixes:
* set sk_user_data before installing callbacks, to avoid dropping early
  packets
* fix use-after-free in ovpn_net_xmit when accessing shared skbs that
  got released
* fix TX bytes stats by adding-up every positively processed GSO
  segment
====================

Link: https://patch.msgid.link/20260212210340.11260-1-antonio@openvpn.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'eth-fbnic-fix-and-improve-header-data-split-configuration'

Bobby Eshleman says:

====================
eth: fbnic: fix and improve header/data split configuration

This series fixes TCP HDS configuration in the fbnic driver and adds a
devmem selftest for the intended behavior.

The issues fixed include setting the correct CSR for EN_HDR_SPLIT,
adjusting the hds threshold clamp to avoid long headers overflowing into
the payload page, and configuring the device to use L4/L3/L2 header
boundaries when present by programming the DMA hint bit unconditionally
for all steering rule types.

Prior to these fixes, small payloads fail devmem.check_rx_hds().

Testing:

IFC0=enp1s0
IFC1=eth0

export REMOTE_TYPE=netns
export REMOTE_ARGS=ns-remote
export NETIF=$IFC0
export LOCAL_V4=192.0.3.1
export REMOTE_V4=192.0.3.2
sysctl -w net.ipv6.conf.$IFC0.keep_addr_on_down=1

./tools/testing/selftests/drivers/net/hw/devmem.py

TAP version 13
1..4
ok 1 devmem.check_rx
ok 2 devmem.check_tx
ok 3 devmem.check_tx_chunks
ok 4 devmem.check_rx_hds

To: Alexander Duyck <alexanderduyck@fb.com>
To: Jakub Kicinski <kuba@kernel.org>
To: kernel-team@meta.com
To: Andrew Lunn <andrew+netdev@lunn.ch>
To: David S. Miller <davem@davemloft.net>
To: Eric Dumazet <edumazet@google.com>
To: Paolo Abeni <pabeni@redhat.com>
To: Mohsin Bashir <mohsin.bashr@gmail.com>
To: Shuah Khan <shuah@kernel.org>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
====================

Link: https://patch.msgid.link/20260211-fbnic-tcp-hds-fixes-v1-0-55d050e6f606@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: drv-net: add HDS payload sweep test for devmem TCP

Add check_rx_hds test that verifies header/data split works across
payload sizes. The test sweeps payload sizes from 1 byte to 8KB, if any
data propagates up to userspace as SCM_DEVMEM_LINEAR, then the test
fails. This shows that regardless of payload size, ncdevmem's
configuration of hds-thresh to 0 is respected.

Add -L (--fail-on-linear) flag to ncdevmem that causes the receiver to
fail if any SCM_DEVMEM_LINEAR cmsg is received.

Use socat option for fixed block sizing and tcp nodelay to disable
nagle's algo to avoid buffering.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260211-fbnic-tcp-hds-fixes-v1-4-55d050e6f606@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

eth: fbnic: set DMA_HINT_L4 for all flows

fbnic always advertises ETHTOOL_TCP_DATA_SPLIT_ENABLED via ethtool
.get_ringparam. To enable proper splitting for all flow types, even for
IP/Ethernet flows, this patch sets DMA_HINT_L4 unconditionally for all
RSS and NFC flow steering rules. According to the spec, L4 falls back to
L3 if no valid L4 is found, and L3 falls back to L2 if no L3 is found.
This makes sure that the correct header boundary is used regardless of
traffic type. This is important for zero-copy use cases where we must
ensure that all ZC packets are split correctly.

Fixes: 2b30fc01a6c7 ("eth: fbnic: Add support for HDS configuration")
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260211-fbnic-tcp-hds-fixes-v1-3-55d050e6f606@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

eth: fbnic: increase FBNIC_HDR_BYTES_MIN from 128 to 256 bytes

Increase FBNIC_HDR_BYTES_MIN from 128 to 256 bytes. The previous minimum
was too small to guarantee that very long L2+L3+L4 headers always fit
within the header buffer. When EN_HDR_SPLIT is disabled and a packet
exceeds MAX_HEADER_BYTES, splitting occurs at that byte offset instead
of the header boundary, resulting in some of the header landing in the
payload page. The increased minimum ensures headers always fit with the
MAX_HEADER_BYTES cut off and land in the header page.

Fixes: 2b30fc01a6c7 ("eth: fbnic: Add support for HDS configuration")
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Acked-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20260211-fbnic-tcp-hds-fixes-v1-2-55d050e6f606@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

eth: fbnic: set FBNIC_QUEUE_RDE_CTL0_EN_HDR_SPLIT on RDE_CTL0

Fix EN_HDR_SPLIT configuration by writing the field to RDE_CTL0 instead
of RDE_CTL1.

Because drop mode configuration and header splitting enablement both use
RDE_CTL0, we consolidate these configurations into the single function
fbnic_config_drop_mode.

Fixes: 2b30fc01a6c7 ("eth: fbnic: Add support for HDS configuration")
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Acked-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20260211-fbnic-tcp-hds-fixes-v1-1-55d050e6f606@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

fbnic: close fw_log race between users and teardown

Fixes a theoretical race on fw_log between the teardown path and fw_log
write functions.

fw_log is written inside fbnic_fw_log_write() and can be reached from
the mailbox handler fbnic_fw_msix_intr(), but fw_log is freed before
IRQ/MBX teardown during cleanup, resulting in a potential data race of
dereferencing a freed/null variable.

Possible Interleaving Scenario:
  CPU0: fbnic_fw_msix_intr() // Entry
          fbnic_fw_log_write()
            if (fbnic_fw_log_ready())   // true
            ... preempt ...
  CPU1: fbnic_remove() // Entry
          fbnic_fw_log_free()
            vfree(log->data_start);
            log->data_start = NULL;
  CPU0: continues, walks log->entries or writes to log->data_start

The initialization also has an incorrect order problem, as the fw_log
is currently allocated after MBX setup during initialization.
Fix the problems by adjusting the synchronization order to put
initialization in place before the mailbox is enabled, and not cleared
until after the mailbox has been disabled.

Fixes: ecc53b1b46c89 ("eth: fbnic: Enable firmware logging")
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Link: https://patch.msgid.link/20260211191329.530886-1-dg573847474@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'vsock-fix-child-netns-mode-initialization-and-restriction'

Stefano Garzarella says:

====================
vsock: fix child netns mode initialization and restriction

This series fixes two issues in the vsock network namespace support
recently introduced by commit eafb64f40ca4 ("vsock: add netns to vsock
core").

Patch 1 fixes `child_ns_mode` being always hardcoded to "global" for new
namespaces, breaking propagation of the "local" mode through nested
namespaces.

Patch 2 prevents a "local" namespace from switching `child_ns_mode` to
"global", which would allow nested namespaces to escape vsock isolation
and access global CIDs.
====================

Link: https://patch.msgid.link/20260212205916.97533-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

vsock: prevent child netns mode switch from local to global

A "local" namespace can change its `child_ns_mode` sysctl to "global",
allowing nested namespaces to access global CIDs. This can be exploited
by an unprivileged user who gained CAP_NET_ADMIN through a user
namespace.

Prevent this by rejecting writes that attempt to set `child_ns_mode` to
"global" when the current namespace's mode is "local".

Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
Cc: bobbyeshleman@meta.com
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260212205916.97533-3-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

vsock: fix child netns mode initialization

When a new network namespace is created, vsock_net_init() correctly
initializes the namespace's mode by reading the parent's `child_ns_mode`
via vsock_net_child_mode(). However, the `child_ns_mode` of the new
namespace was always hardcoded to VSOCK_NET_MODE_GLOBAL, regardless of
its own mode.

This means that if a parent namespace has `child_ns_mode` set to "local",
the child namespace correctly gets mode "local", but its `child_ns_mode`
is reset to "global". As a result, further nested namespaces will
incorrectly get mode "global" instead of inheriting "local", breaking
the expected propagation of the mode through nested namespaces.

Fix this by initializing `child_ns_mode` to the namespace's own mode,
so the setting propagates correctly through all levels of nesting.

Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
Cc: bobbyeshleman@meta.com
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260212205916.97533-2-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sparx5/lan969x: fix PTP clock max_adj value

The max_adj field in ptp_clock_info tells userspace how much the PHC
clock frequency can be adjusted. ptp4l reads this and will never request
a correction larger than max_adj.

On both sparx5 and lan969x the clock offset may never converge because
the servo needs a frequency correction larger than the current max_adj
of 200000 (200 ppm) allows. The servo rails at the max and the offset
stays in the tens of microseconds.

The hardware has no inherent max adjustment limit; frequency correction
is done by writing a 64-bit clock period increment to CLK_PER_CFG, and
the register has plenty of range. The 200000 value was just an overly
conservative software limit. The max_adj is shared between sparx5 and
lan969x, and the increased value is safe for both.

Fix this by increasing max_adj to 10000000 (10000 ppm), giving the
servo sufficient headroom.

Fixes: 0933bd04047c ("net: sparx5: Add support for ptp clocks")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260212-sparx5-ptp-max-adj-v2-v1-1-06b200e50ce3@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: Fix out-of-bound access in fib6_add_rt2node().

syzbot reported out-of-bound read in fib6_add_rt2node(). [0]

When IPv6 route is created with RTA_NH_ID, struct fib6_info
does not have the trailing struct fib6_nh.

The cited commit started to check !iter->fib6_nh->fib_nh_gw_family
to ensure that rt6_qualify_for_ecmp() will return false for iter.

If iter->nh is not NULL, rt6_qualify_for_ecmp() returns false anyway.

Let's check iter->nh before reading iter->fib6_nh and avoid OOB read.

[0]:
BUG: KASAN: slab-out-of-bounds in fib6_add_rt2node+0x349c/0x3500 net/ipv6/ip6_fib.c:1142
Read of size 1 at addr ffff8880384ba6de by task syz.0.18/5500

CPU: 0 UID: 0 PID: 5500 Comm: syz.0.18 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xba/0x230 mm/kasan/report.c:482
kasan_report+0x117/0x150 mm/kasan/report.c:595
fib6_add_rt2node+0x349c/0x3500 net/ipv6/ip6_fib.c:1142
fib6_add_rt2node_nh net/ipv6/ip6_fib.c:1363 [inline]
fib6_add+0x910/0x18c0 net/ipv6/ip6_fib.c:1531
__ip6_ins_rt net/ipv6/route.c:1351 [inline]
ip6_route_add+0xde/0x1b0 net/ipv6/route.c:3957
inet6_rtm_newroute+0x268/0x19e0 net/ipv6/route.c:5660
rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6958
netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
netlink_unicast+0x80f/0x9b0 net/netlink/af_netlink.c:1344
netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg net/socket.c:742 [inline]
____sys_sendmsg+0xa68/0xad0 net/socket.c:2592
___sys_sendmsg+0x2a5/0x360 net/socket.c:2646
__sys_sendmsg net/socket.c:2678 [inline]
__do_sys_sendmsg net/socket.c:2683 [inline]
__se_sys_sendmsg net/socket.c:2681 [inline]
__x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2681
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xe2/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f9316b9aeb9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffd8809b678 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f9316e15fa0 RCX: 00007f9316b9aeb9
RDX: 0000000000000000 RSI: 0000200000004380 RDI: 0000000000000003
RBP: 00007f9316c08c1f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f9316e15fac R14: 00007f9316e15fa0 R15: 00007f9316e15fa0
</TASK>

Allocated by task 5499:
kasan_save_stack mm/kasan/common.c:57 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
__kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
kasan_kmalloc include/linux/kasan.h:263 [inline]
__do_kmalloc_node mm/slub.c:5657 [inline]
__kmalloc_noprof+0x40c/0x7e0 mm/slub.c:5669
kmalloc_noprof include/linux/slab.h:961 [inline]
kzalloc_noprof include/linux/slab.h:1094 [inline]
fib6_info_alloc+0x30/0xf0 net/ipv6/ip6_fib.c:155
ip6_route_info_create+0x142/0x860 net/ipv6/route.c:3820
ip6_route_add+0x49/0x1b0 net/ipv6/route.c:3949
inet6_rtm_newroute+0x268/0x19e0 net/ipv6/route.c:5660
rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6958
netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
netlink_unicast+0x80f/0x9b0 net/netlink/af_netlink.c:1344
netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg net/socket.c:742 [inline]
____sys_sendmsg+0xa68/0xad0 net/socket.c:2592
___sys_sendmsg+0x2a5/0x360 net/socket.c:2646
__sys_sendmsg net/socket.c:2678 [inline]
__do_sys_sendmsg net/socket.c:2683 [inline]
__se_sys_sendmsg net/socket.c:2681 [inline]
__x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2681
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xe2/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: bbf4a17ad9ff ("ipv6: Fix ECMP sibling count mismatch when clearing RTF_ADDRCONF")
Reported-by: syzbot+707d6a5da1ab9e0c6f9d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/698cbfba.050a0220.2eeac1.009d.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Shigeru Yoshida <syoshida@redhat.com>
Link: https://patch.msgid.link/20260211175133.3657034-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data()

On the receive path, __ioam6_fill_trace_data() uses trace->nodelen
to decide how much data to write for each node. It trusts this field
as-is from the incoming packet, with no consistency check against
trace->type (the 24-bit field that tells which data items are
present). A crafted packet can set nodelen=0 while setting type bits
0-21, causing the function to write ~100 bytes past the allocated
region (into skb_shared_info), which corrupts adjacent heap memory
and leads to a kernel panic.

Add a shared helper ioam6_trace_compute_nodelen() in ioam6.c to
derive the expected nodelen from the type field, and use it:

  - in ioam6_iptunnel.c (send path, existing validation) to replace
    the open-coded computation;
  - in exthdrs.c (receive path, ipv6_hop_ioam) to drop packets whose
    nodelen is inconsistent with the type field, before any data is
    written.

Per RFC 9197, bits 12-21 are each short (4-octet) fields, so they
are included in IOAM6_MASK_SHORT_FIELDS (changed from 0xff100000 to
0xff1ffc00).

Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace")
Cc: stable@vger.kernel.org
Signed-off-by: Junxi Qian <qjx1298677004@gmail.com>
Reviewed-by: Justin Iurman <justin.iurman@gmail.com>
Link: https://patch.msgid.link/20260211040412.86195-1-qjx1298677004@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: netconsole: Increase port listening timeout

wait_for_port() can wait up to 2 seconds with the sleep and the polling
in wait_local_port_listen() combined. So, in netcons_basic.sh, the socat
process could die before the test writes to the netconsole.

Increase the timeout to 3 seconds to make netcons_basic.sh pass
consistently.

Fixes: 3dc6c76391cb ("selftests: net: Add IPv6 support to netconsole basic tests")
Signed-off-by: Pin-yen Lin <treapking@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260210005939.3230550-1-treapking@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-mscc-ocelot-fix-missing-lock-in-ocelot_port_xmit'

Ziyi Guo says:

====================
net: mscc: ocelot: fix missing lock in ocelot_port_xmit()

ocelot_port_xmit() calls ocelot_can_inject() and
ocelot_port_inject_frame() without holding the injection group lock.
Both functions contain lockdep_assert_held() for the injection lock,
and the correct caller felix_port_deferred_xmit() properly acquires
the lock using ocelot_lock_inj_grp() before calling these functions.

this v3 splits the fix into a 3-patch series to separate
refactoring from the behavioral change:

  1/3: Extract the PTP timestamp handling into an ocelot_xmit_timestamp()
       helper so the logic isn't duplicated when the function is split.

  2/3: Split ocelot_port_xmit() into ocelot_port_xmit_fdma() and
       ocelot_port_xmit_inj(), keeping the FDMA and register injection
       code paths fully separate.

  3/3: Add ocelot_lock_inj_grp()/ocelot_unlock_inj_grp() in
       ocelot_port_xmit_inj() to fix the missing lock protection.

Patches 1-2 are pure refactors with no behavioral change.
Patch 3 is the actual bug fix.
====================

Link: https://patch.msgid.link/20260208225602.1339325-1-n7l8m4@u.northwestern.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: add missing lock protection in ocelot_port_xmit_inj()

ocelot_port_xmit_inj() calls ocelot_can_inject() and
ocelot_port_inject_frame() without holding the injection group lock.
Both functions contain lockdep_assert_held() for the injection lock,
and the correct caller felix_port_deferred_xmit() properly acquires
the lock using ocelot_lock_inj_grp() before calling these functions.

Add ocelot_lock_inj_grp()/ocelot_unlock_inj_grp() around the register
injection path to fix the missing lock protection. The FDMA path is not
affected as it uses its own locking mechanism.

Fixes: c5e12ac3beb0 ("net: mscc: ocelot: serialize access to the injection/extraction groups")
Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20260208225602.1339325-4-n7l8m4@u.northwestern.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: split xmit into FDMA and register injection paths

Split ocelot_port_xmit() into two separate functions:
- ocelot_port_xmit_fdma(): handles the FDMA injection path
- ocelot_port_xmit_inj(): handles the register-based injection path

The top-level ocelot_port_xmit() now dispatches to the appropriate
function based on the ocelot_fdma_enabled static key.

This is a pure refactor with no behavioral change. Separating the two
code paths makes each one simpler and prepares for adding proper locking
to the register injection path without affecting the FDMA path.

Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20260208225602.1339325-3-n7l8m4@u.northwestern.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: extract ocelot_xmit_timestamp() helper

Extract the PTP timestamp handling logic from ocelot_port_xmit() into a
separate ocelot_xmit_timestamp() helper function. This is a pure
refactor with no behavioral change.

The helper returns false if the skb was consumed (freed) due to a
timestamp request failure, and true if the caller should continue with
frame injection. The rew_op value is returned via pointer.

This prepares for splitting ocelot_port_xmit() into separate FDMA and
register injection paths in a subsequent patch.

Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20260208225602.1339325-2-n7l8m4@u.northwestern.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sparx5/lan969x: fix DWRR cost max to match hardware register width

DWRR (Deficit Weighted Round Robin) scheduling distributes bandwidth
across traffic classes based on per-queue cost values, where lower cost
means higher bandwidth share.

The SPX5_DWRR_COST_MAX constant is 63 (6 bits) but the hardware
register field HSCH_DWRR_ENTRY_DWRR_COST is GENMASK(24, 20), only
5 bits wide (max 31). This causes sparx5_weight_to_hw_cost() to
compute cost values that silently overflow via FIELD_PREP, resulting
in incorrect scheduling weights.

Set SPX5_DWRR_COST_MAX to 31 to match the hardware register width.

Fixes: 211225428d65 ("net: microchip: sparx5: add support for offloading ets qdisc")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260210-sparx5-fix-dwrr-cost-max-v1-1-58fbdbc25652@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

atm: fore200e: fix use-after-free in tasklets during device removal

When the PCA-200E or SBA-200E adapter is being detached, the fore200e
is deallocated. However, the tx_tasklet or rx_tasklet may still be running
or pending, leading to use-after-free bug when the already freed fore200e
is accessed again in fore200e_tx_tasklet() or fore200e_rx_tasklet().

One of the race conditions can occur as follows:

CPU 0 (cleanup)           | CPU 1 (tasklet)
fore200e_pca_remove_one() | fore200e_interrupt()
  fore200e_shutdown()     |   tasklet_schedule()
    kfree(fore200e)       | fore200e_tx_tasklet()
                          |   fore200e-> // UAF

Fix this by ensuring tx_tasklet or rx_tasklet is properly canceled before
the fore200e is released. Add tasklet_kill() in fore200e_shutdown() to
synchronize with any pending or running tasklets. Moreover, since
fore200e_reset() could prevent further interrupts or data transfers,
the tasklet_kill() should be placed after fore200e_reset() to prevent
the tasklet from being rescheduled in fore200e_interrupt(). Finally,
it only needs to do tasklet_kill() when the fore200e state is greater
than or equal to FORE200E_STATE_IRQ, since tasklets are uninitialized
in earlier states. In a word, the tasklet_kill() should be placed in
the FORE200E_STATE_IRQ branch within the switch...case structure.

This bug was identified through static analysis.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@kernel.org
Suggested-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260210094537.9767-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: fix oops when split header is enabled

For GMAC4, when split header is enabled, in some rare cases, the
hardware does not fill buf2 of the first descriptor with payload.
Thus we cannot assume buf2 is always fully filled if it is not
the last descriptor. Otherwise, the length of buf2 of the second
descriptor will be calculated wrong and cause an oops:

Unable to handle kernel paging request at virtual address ffff00019246bfc0
...
x2 : 0000000000000040 x1 : ffff00019246bfc0 x0 : ffff00009246c000
Call trace:
dcache_inval_poc+0x28/0x58 (P)
dma_direct_sync_single_for_cpu+0x38/0x6c
__dma_sync_single_for_cpu+0x34/0x6c
stmmac_napi_poll_rx+0x8f0/0xb60
__napi_poll.constprop.0+0x30/0x144
net_rx_action+0x160/0x274
handle_softirqs+0x1b8/0x1fc
...

To fix this, the PL bit-field in RDES3 register is used for all
descriptors, whether it is the last descriptor or not.

Fixes: ec222003bd94 ("net: stmmac: Prepare to add Split Header support")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jie Zhang <jie.zhang@analog.com>
Link: https://patch.msgid.link/20260209225037.589130-1-jie.zhang@analog.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: intel: fix PCI device ID conflict between i40e and ipw2200

The ID 8086:104f is matched by both i40e and ipw2200. The same device
ID should not be in more than one driver, because in that case, which
driver is used is unpredictable. Fix this by taking advantage of the
fact that i40e devices use PCI_CLASS_NETWORK_ETHERNET and ipw2200
devices use PCI_CLASS_NETWORK_OTHER to differentiate the devices.

Fixes: 2e45d3f4677a ("i40e: Add support for X710 B/P & SFP+ cards")
Cc: stable@vger.kernel.org
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Link: https://patch.msgid.link/20260210021235.16315-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: limit RPS test CPUs to supported range

The _get_unused_cpus() function can return CPU numbers >= 16, which
exceeds RPS_MAX_CPUS in toeplitz.c. When this happens, the test fails
with a cryptic message:

  # Exception| Traceback (most recent call last):
  # Exception|   File "/tmp/cur/linux/tools/testing/selftests/net/lib/py/ksft.py", line 319, in ksft_run
  # Exception|     func(*args)
  # Exception|   File "/tmp/cur/linux/tools/testing/selftests/drivers/net/hw/toeplitz.py", line 189, in test
  # Exception|     with bkg(" ".join(rx_cmd), ksft_ready=True, exit_wait=True) as rx_proc:
  # Exception|          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  # Exception|   File "/tmp/cur/linux/tools/testing/selftests/net/lib/py/utils.py", line 124, in __init__
  # Exception|     super().__init__(comm, background=True,
  # Exception|   File "/tmp/cur/linux/tools/testing/selftests/net/lib/py/utils.py", line 77, in __init__
  # Exception|     raise Exception("Did not receive ready message")
  # Exception| Exception: Did not receive ready message

Rename _get_unused_cpus() to _get_unused_rps_cpus() and cap the CPU
search range to RPS_MAX_CPUS.

Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260210093110.1935149-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: lib: Fix jq parsing error

The testcase failed as below:
$./vlan_bridge_binding.sh
...
+ adf_ip_link_set_up d1
+ local name=d1
+ shift
+ ip_link_is_up d1
+ ip_link_has_flag d1 UP
+ local name=d1
+ shift
+ local flag=UP
+ shift
++ ip -j link show d1
++ jq --arg flag UP 'any(.[].flags.[]; . == $flag)'
jq: error: syntax error, unexpected '[', expecting FORMAT or QQSTRING_START
(Unix shell quoting issues?) at <top-level>, line 1:
any(.[].flags.[]; . == $flag)
jq: 1 compile error

Remove the extra dot (.) after flags array to fix this.

Fixes: 4baa1d3a5080 ("selftests: net: lib: Add ip_link_has_flag()")
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260211022146.190948-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bng_en: Remove duplicate include

Remove duplicate inclusion of <net/netdev_queues.h>.

Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Link: https://patch.msgid.link/20260211032021.2719742-1-nichen@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mlxsw: tc_restrictions: Fix test failure with new iproute2

As explained in [1], iproute2 started rejecting tc-police burst sizes
that result in an overflow. This can happen when the burst size is high
enough and the rate is low enough.

A couple of test cases specify such configurations, resulting in
iproute2 errors and test failure.

Fix by reducing the burst size so that the test will pass with both new
and old iproute2 versions.

[1] https://lore.kernel.org/netdev/20250916215731.3431465-1-jay.vosburgh@canonical.com/

Fixes: cb12d1763267 ("selftests: mlxsw: tc_restrictions: Test tc-police restrictions")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/88b00c6e85188aa6a065dc240206119b328c46e1.1770643998.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mctp: ensure our nlmsg responses are initialised

Syed Faraz Abrar (@farazsth98) from Zellic, and Pumpkin (@u1f383) from
DEVCORE Research Team working with Trend Micro Zero Day Initiative
report that a RTM_GETNEIGH will return uninitalised data in the pad
bytes of the ndmsg data.

Ensure we're initialising the netlink data to zero, in the link, addr
and neigh response messages.

Fixes: 831119f88781 ("mctp: Add neighbour netlink interface")
Fixes: 06d2f4c583a7 ("mctp: Add netlink route management")
Fixes: 583be982d934 ("mctp: Add device handling and netlink interface")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260209-dev-mctp-nlmsg-v1-1-f1e30c346a43@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

myri10ge: replace comma with semicolons

Commit fd24173439c0 ("myri10ge: avoid uninitialized variable use")
added commas instead of semicolons

Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260212055028.3248491-1-nichen@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-phy_port-sfp-modules-representation-and-phy_port-listing'

Maxime Chevallier says:

====================
net: phy_port: SFP modules representation and phy_port listing (part)
====================

Applying just the initial cleanup + fixes from the series.

Link: https://patch.msgid.link/20260205092317.755906-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: phy_port: Correctly recompute the port's linkmodes

a PHY-driven phy_port contains a 'supported' field containing the
linkmodes available on this port. This is populated based on :
- The PHY's reported features
- The DT representation of the connector
- The PHY's attach_mdi() callback

As these different attrbution methods work in conjunction, the helper
phy_port_update_supported() recomputes the final 'supported' value based
on the populated mediums, linkmodes and pairs.

However this recompute wasn't correctly implemented, and added more
modes than necessary by or'ing the medium-specific modes to the existing
support. Let's fix this and properly filter the modes.

Fixes: 589e934d2735 ("net: phy: Introduce PHY ports representation")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Link: https://patch.msgid.link/20260205092317.755906-4-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: phy_port: Cleanup the of-parsing logic for phy_port

We don't need to maintain a mediums bitfield, let's drop it and drop a
bogus check for empty mediums, as we already check it above.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Link: https://patch.msgid.link/20260205092317.755906-3-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: initialize the port support based on the PHY's for OF ports

With the phy_port infrastructure came an ethernet-connector binding,
allowing to represent the MDI of a PHY in devicetree. This allows
specifying the mediums and pairs of a port.

Let's initialize the port's supported list based on what the PHY
reports, so that we can then filter it with what the connector allows
using.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Link: https://patch.msgid.link/20260205092317.755906-2-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ovpn: fix VPN TX bytes counting

In ovpn_net_xmit, after GSO segmentation and segment processing, the
first segment on the list is used to increment VPN TX statistics, which
fails to account for any subsequent segments in the chain.

Fix this by accumulating the length of every segment that successfully
passes skb_share_check into a tx_bytes variable. This ensures the peer
statistics accurately reflect the total data volume sent, regardless of
whether the original packet was segmented.

Fixes: 04ca14955f9a ("ovpn: store tunnel and transport statistics")
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>

ovpn: fix possible use-after-free in ovpn_net_xmit

When building the skb_list in ovpn_net_xmit, skb_share_check will free
the original skb if it is shared. The current implementation continues
to use the stale skb pointer for subsequent operations:
- peer lookup,
- skb_dst_drop (even though all segments produced by skb_gso_segment
will have a dst attached),
- ovpn_peer_stats_increment_tx.

Fix this by moving the peer lookup and skb_dst_drop before segmentation
so that the original skb is still valid when used. Return early if all
segments fail skb_share_check and the list ends up empty.
Also switch ovpn_peer_stats_increment_tx to use skb_list.next; the next
patch fixes the stats logic.

Fixes: 08857b5ec5d9 ("ovpn: implement basic TX path (UDP)")
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>

ovpn: set sk_user_data before overriding callbacks

During initialization, we override socket callbacks and set sk_user_data
to an ovpn_socket instance. Currently, these two operations are
decoupled: callbacks are overridden before sk_user_data is set. While
existing callbacks perform safety checks for NULL or non-ovpn
sk_user_data, this condition causes a "half-formed" state where valid
packets arriving during attachment trigger error logs (e.g., "invoked on
non ovpn socket").

Set sk_user_data before overriding the callbacks so that it can be
accessed safely from them. Since we already check that the socket has no
sk_user_data before setting it, this remains safe even if an interrupt
accesses the socket after sk_user_data is set but before the callbacks
are overridden.

This also requires initializing all protocol-specific fields (such as
tcp_tx_work and peer links) before calling ovpn_socket_attach, ensuring
the ovpn_socket is fully formed before it becomes visible to any
callback.

Fixes: f6226ae7a0cd ("ovpn: introduce the ovpn_socket object")
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>

Merge tag 'net-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
"Core & protocols:

   - A significant effort all around the stack to guide the compiler to
     make the right choice when inlining code, to avoid unneeded calls
     for small helper and stack canary overhead in the fast-path.

     This generates better and faster code with very small or no text
     size increases, as in many cases the call generated more code than
     the actual inlined helper.

   - Extend AccECN implementation so that is now functionally complete,
     also allow the user-space enabling it on a per network namespace
     basis.

   - Add support for memory providers with large (above 4K) rx buffer.
     Paired with hw-gro, larger rx buffer sizes reduce the number of
     buffers traversing the stack, dincreasing single stream CPU usage
     by up to ~30%.

   - Do not add HBH header to Big TCP GSO packets. This simplifies the
     RX path, the TX path and the NIC drivers, and is possible because
     user-space taps can now interpret correctly such packets without
     the HBH hint.

   - Allow IPv6 routes to be configured with a gateway address that is
     resolved out of a different interface than the one specified,
     aligning IPv6 to IPv4 behavior.

   - Multi-queue aware sch_cake. This makes it possible to scale the
     rate shaper of sch_cake across multiple CPUs, while still enforcing
     a single global rate on the interface.

   - Add support for the nbcon (new buffer console) infrastructure to
     netconsole, enabling lock-free, priority-based console operations
     that are safer in crash scenarios.

   - Improve the TCP ipv6 output path to cache the flow information,
     saving cpu cycles, reducing cache line misses and stack use.

   - Improve netfilter packet tracker to resolve clashes for most
     protocols, avoiding unneeded drops on rare occasions.

   - Add IP6IP6 tunneling acceleration to the flowtable infrastructure.

   - Reduce tcp socket size by one cache line.

   - Notify neighbour changes atomically, avoiding inconsistencies
     between the notification sequence and the actual states sequence.

   - Add vsock namespace support, allowing complete isolation of vsocks
     across different network namespaces.

   - Improve xsk generic performances with cache-alignment-oriented
     optimizations.

   - Support netconsole automatic target recovery, allowing netconsole
     to reestablish targets when underlying low-level interface comes
     back online.

  Driver API:

   - Support for switching the working mode (automatic vs manual) of a
     DPLL device via netlink.

   - Introduce PHY ports representation to expose multiple front-facing
     media ports over a single MAC.

   - Introduce "rx-polarity" and "tx-polarity" device tree properties,
     to generalize polarity inversion requirements for differential
     signaling.

   - Add helper to create, prepare and enable managed clocks.

  Device drivers:

   - Add Huawei hinic3 PF etherner driver.

   - Add DWMAC glue driver for Motorcomm YT6801 PCIe ethernet
     controller.

   - Add ethernet driver for MaxLinear MxL862xx switches

   - Remove parallel-port Ethernet driver.

   - Convert existing driver timestamp configuration reporting to
     hwtstamp_get and remove legacy ioctl().

   - Convert existing drivers to .get_rx_ring_count(), simplifing the RX
     ring count retrieval. Also remove the legacy fallback path.

   - Ethernet high-speed NICs:
      - Broadcom (bnxt, bng):
         - bnxt: add FW interface update to support FEC stats histogram
           and NVRAM defragmentation
         - bng: add TSO and H/W GRO support
      - nVidia/Mellanox (mlx5):
         - improve latency of channel restart operations, reducing the
           used H/W resources
         - add TSO support for UDP over GRE over VLAN
         - add flow counters support for hardware steering (HWS) rules
         - use a static memory area to store headers for H/W GRO,
           leading to 12% RX tput improvement
      - Intel (100G, ice, idpf):
         - ice: reorganizes layout of Tx and Rx rings for cacheline
           locality and utilizes __cacheline_group* macros on the new
           layouts
         - ice: introduces Synchronous Ethernet (SyncE) support
      - Meta (fbnic):
         - adds debugfs for firmware mailbox and tx/rx rings vectors

   - Ethernet virtual:
      - geneve: introduce GRO/GSO support for double UDP encapsulation

   - Ethernet NICs consumer, and embedded:
      - Synopsys (stmmac):
         - some code refactoring and cleanups
      - RealTek (r8169):
         - add support for RTL8127ATF (10G Fiber SFP)
         - add dash and LTR support
      - Airoha:
         - AN8811HB 2.5 Gbps phy support
      - Freescale (fec):
         - add XDP zero-copy support
      - Thunderbolt:
         - add get link setting support to allow bonding
      - Renesas:
         - add support for RZ/G3L GBETH SoC

   - Ethernet switches:
      - Maxlinear:
         - support R(G)MII slow rate configuration
         - add support for Intel GSW150
      - Motorcomm (yt921x):
         - add DCB/QoS support
      - TI:
         - icssm-prueth: support bridging (STP/RSTP) via the switchdev
           framework

   - Ethernet PHYs:
      - Realtek:
         - enable SGMII and 2500Base-X in-band auto-negotiation
         - simplify and reunify C22/C45 drivers
      - Micrel: convert bindings to DT schema

   - CAN:
      - move skb headroom content into skb extensions, making CAN
        metadata access more robust

   - CAN drivers:
      - rcar_canfd:
         - add support for FD-only mode
         - add support for the RZ/T2H SoC
      - sja1000: cleanup the CAN state handling

   - WiFi:
      - implement EPPKE/802.1X over auth frames support
      - split up drop reasons better, removing generic RX_DROP
      - additional FTM capabilities: 6 GHz support, supported number of
        spatial streams and supported number of LTF repetitions
      - better mac80211 iterators to enumerate resources
      - initial UHR (Wi-Fi 8) support for cfg80211/mac80211

   - WiFi drivers:
      - Qualcomm/Atheros:
         - ath11k: support for Channel Frequency Response measurement
         - ath12k: a significant driver refactor to support multi-wiphy
           devices and and pave the way for future device support in the
           same driver (rather than splitting to ath13k)
         - ath12k: support for the QCC2072 chipset
      - Intel:
         - iwlwifi: partial Neighbor Awareness Networking (NAN) support
         - iwlwifi: initial support for U-NII-9 and IEEE 802.11bn
      - RealTek (rtw89):
         - preparations for RTL8922DE support

   - Bluetooth:
      - implement setsockopt(BT_PHY) to set the connection packet type/PHY
      - set link_policy on incoming ACL connections

   - Bluetooth drivers:
      - btusb: add support for MediaTek7920, Realtek RTL8761BU and 8851BE
      - btqca: add WCN6855 firmware priority selection feature"

* tag 'net-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1254 commits)
  bnge/bng_re: Add a new HSI
  net: macb: Fix tx/rx malfunction after phy link down and up
  af_unix: Fix memleak of newsk in unix_stream_connect().
  net: ti: icssg-prueth: Add optional dependency on HSR
  net: dsa: add basic initial driver for MxL862xx switches
  net: mdio: add unlocked mdiodev C45 bus accessors
  net: dsa: add tag format for MxL862xx switches
  dt-bindings: net: dsa: add MaxLinear MxL862xx
  selftests: drivers: net: hw: Modify toeplitz.c to poll for packets
  octeontx2-pf: Unregister devlink on probe failure
  net: renesas: rswitch: fix forwarding offload statemachine
  ionic: Rate limit unknown xcvr type messages
  tcp: inet6_csk_xmit() optimization
  tcp: populate inet->cork.fl.u.ip6 in tcp_v6_syn_recv_sock()
  tcp: populate inet->cork.fl.u.ip6 in tcp_v6_connect()
  ipv6: inet6_csk_xmit() and inet6_csk_update_pmtu() use inet->cork.fl.u.ip6
  ipv6: use inet->cork.fl.u.ip6 and np->final in ip6_datagram_dst_update()
  ipv6: use np->final in inet6_sk_rebuild_header()
  ipv6: add daddr/final storage in struct ipv6_pinfo
  net: stmmac: qcom-ethqos: fix qcom_ethqos_serdes_powerup()
  ...

Merge tag 'devicetree-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree updates from Rob Herring:
"DT core:

   - Sync dtc/libfdt with upstream v1.7.2-62-ga26ef6400bd8

   - Add a for_each_compatible_node_scoped() loop and convert users in
     cpufreq, dmaengine, clk, cdx, powerpc and Arm

   - Simplify of/platform.c with scoped loop helpers

   - Add fw_devlink tracking for "mmc-pwrseq"

   - Optimize fw_devlink callback code size for pinctrl-N properties

   - Replace strcmp_suffix() with strends()

  DT bindings:

   - Support building single binding targets

   - Convert google,goldfish-fb, cznic,turris-mox-rwtm, ti,prm-inst

   - Add bindings for Freescale AVIC, Realtek RTD1xxx system
     controllers, Microchip 25AA010A EEPROM, OnSemi FIN3385, IEI
     WT61P803 PUZZLE, Delta Electronics DPS-800-AB power supply,
     Infineon IR35221 Digital Multi-phase Controller, Infineon PXE1610
     Digital Dual Output 6+1 VR12.5 & VR13 CPU Controller,
     socionext,uniphier-smpctrl, and xlnx,zynqmp-firmware

   - Lots of trivial binding fixes to address warnings in DTS files.
     These are mostly for arm64 platforms which is getting closer to be
     warning free. Some public shaming has helped.

   - Fix I2C bus node names in examples

   - Drop obsolete brcm,vulcan-soc binding

   - Drop unreferenced binding headers"

* tag 'devicetree-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (60 commits)
  dt-bindings: interrupt-controller: Add compatiblie string fsl,imx(1|25|27|31|35)-avic
  dt-bindings: soc: imx: add fsl,aips and fsl,emi compatible strings
  dt-bindings: display: bridge: lt8912b: Drop reset gpio requirement
  dt-bindings: firmware: fsl,scu: Mark multi-channel MU layouts as deprecated
  cpufreq: s5pv210: Simplify with scoped for each OF child loop
  dmaengine: fsl_raid: Simplify with scoped for each OF child loop
  clk: imx: imx31: Simplify with scoped for each OF child loop
  clk: imx: imx27: Simplify with scoped for each OF child loop
  cdx: Use mutex guard to simplify error handling
  cdx: Simplify with scoped for each OF child loop
  powerpc/wii: Simplify with scoped for each OF child loop
  powerpc/fsp2: Simplify with scoped for each OF child loop
  ARM: exynos: Simplify with scoped for each OF child loop
  ARM: at91: Simplify with scoped for each OF child loop
  of: Add for_each_compatible_node_scoped() helper
  dt-bindings: Fix emails with spaces or missing brackets
  scripts/dtc: Update to upstream version v1.7.2-62-ga26ef6400bd8
  dt-bindings: crypto: inside-secure,safexcel: Mandate only ring IRQs
  dt-bindings: crypto: inside-secure,safexcel: Add SoC compatibles
  of: reserved_mem: Fix placement of __free() annotation
  ...

Merge tag 'driver-core-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core

Pull driver core updates from Danilo Krummrich:
"Bus:

   - Ensure bus->match() is consistently called with the device lock
     held

   - Improve type safety of bus_find_device_by_acpi_dev()

  Devtmpfs:

   - Parse 'devtmpfs.mount=' boot parameter with kstrtoint() instead of
     simple_strtoul()

   - Avoid sparse warning by making devtmpfs_context_ops static

  IOMMU:

   - Do not register the qcom_smmu_tbu_driver in arm_smmu_device_probe()

  MAINTAINERS:

   - Add the new driver-core mailing list (driver-core@lists.linux.dev)
     to all relevant entries

   - Add missing tree location for "FIRMWARE LOADER (request_firmware)"

   - Add driver-model documentation to the "DRIVER CORE" entry

   - Add missing driver-core maintainers to the "AUXILIARY BUS" entry

  Misc:

   - Change return type of attribute_container_register() to void; it
     has always been infallible

   - Do not export sysfs_change_owner(), sysfs_file_change_owner() and
     device_change_owner()

   - Move devres_for_each_res() from the public devres header to
     drivers/base/base.h

   - Do not use a static struct device for the faux bus; allocate it
     dynamically

  Revocable:

   - Patches for the revocable synchronization primitive have been
     scheduled for v7.0-rc1, but have been reverted as they need some
     more refinement

  Rust:

   - Device:
      - Support dev_printk on all device types, not just the core Device
        struct; remove now-redundant .as_ref() calls in dev_* print
        calls

   - Devres:
      - Introduce an internal reference count in Devres<T> to avoid a
        deadlock condition in case of (indirect) nesting

   - DMA:
      - Allow drivers to tune the maximum DMA segment size via
        dma_set_max_seg_size()

   - I/O:
      - Introduce the concept of generic I/O backends to handle
        different kinds of device shared memory through a common
        interface.

        This enables higher-level concepts such as register
        abstractions, I/O slices, and field projections to be built
        generically on top.

        In a first step, introduce the Io, IoCapable<T>, and IoKnownSize
        trait hierarchy for sharing a common interface supporting offset
        validation and bound-checking logic between I/O backends.

      - Refactor MMIO to use the common I/O backend infrastructure

   - Misc:
      - Add __rust_helper annotations to C helpers for inlining into
        Rust code

      - Use "kernel vertical" style for imports

      - Replace kernel::c_str! with C string literals

      - Update ARef imports to use sync::aref

      - Use pin_init::zeroed() for struct auxiliary_device_id and
        debugfs file_operations initialization

      - Use LKMM atomic types in debugfs doc-tests

      - Various minor comment and documentation fixes

   - PCI:
      - Implement PCI configuration space accessors using the common I/O
        backend infrastructure

      - Document pci::Bar device endianness assumptions

   - SoC:
      - Abstractions for struct soc_device and struct soc_device_attribute

      - Sample driver for soc::Device"

* tag 'driver-core-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (79 commits)
  rust: devres: fix race condition due to nesting
  rust: dma: add missing __rust_helper annotations
  samples: rust: pci: Remove some additional `.as_ref()` for `dev_*` print
  Revert "revocable: Revocable resource management"
  Revert "revocable: Add Kunit test cases"
  Revert "selftests: revocable: Add kselftest cases"
  driver core: remove device_change_owner() export
  sysfs: remove exports of sysfs_*change_owner()
  driver core: disable revocable code from build
  revocable: Add KUnit test for concurrent access
  revocable: fix SRCU index corruption by requiring caller-provided storage
  revocable: Add KUnit test for provider lifetime races
  revocable: Fix races in revocable_alloc() using RCU
  driver core: fix inverted "locked" suffix of driver_match_device()
  rust: io: move MIN_SIZE and io_addr_assert to IoKnownSize
  rust: pci: re-export ConfigSpace
  rust: dma: allow drivers to tune max segment size
  gpu: tyr: remove redundant `.as_ref()` for `dev_*` print
  rust: auxiliary: use `pin_init::zeroed()` for device ID
  rust: debugfs: use pin_init::zeroed() for file_operations
  ...

Merge tag 'pci-v7.0-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull PCI updates from Bjorn Helgaas:
"Enumeration:

   - Don't try to enable Extended Tags on VFs since that bit is Reserved
     and causes misleading log messages (Håkon Bugge)

   - Initialize Endpoint Read Completion Boundary to match Root Port,
     regardless of ACPI _HPX (Håkon Bugge)

   - Apply _HPX PCIe Setting Record only to AER configuration, and only
     when OS owns PCIe hotplug but not AER, to avoid clobbering Extended
     Tag and Relaxed Ordering settings (Håkon Bugge)

  Resource management:

   - Move CardBus code to setup-cardbus.c and only build it when
     CONFIG_CARDBUS is set (Ilpo Järvinen)

   - Fix bridge window alignment with optional resources, where
     additional alignment requirement was previously lost (Ilpo
     Järvinen)

   - Stop over-estimating bridge window size since they are now assigned
     without any gaps between them (Ilpo Järvinen)

   - Increase resource MAX_IORES_LEVEL to avoid /proc/iomem flattening
     for nested bridges and endpoints (Ilpo Järvinen)

   - Add pbus_mem_size_optional() to handle sizes of optional resources
     (SR-IOV VF BARs, expansion ROMs, bridge windows) (Ilpo Järvinen)

   - Don't claim disabled bridge windows to avoid spurious claim
     failures (Ilpo Järvinen)

  Driver binding:

   - Fix device reference leak in pcie_port_remove_service() (Uwe
     Kleine-König)

   - Move pcie_port_bus_match() and pcie_port_bus_type to PCIe-specific
     portdrv.c (Uwe Kleine-König)

   - Convert portdrv to use pcie_port_bus_type.probe() and .remove()
     callbacks so .probe() and .remove() can eventually be removed from
     struct device_driver (Uwe Kleine-König)

  Error handling:

   - Clear stale errors on reporting agents upon probe so they don't
     look like recent errors (Lukas Wunner)

   - Add generic RAS tracepoint for hotplug events (Shuai Xue)

   - Add RAS tracepoint for link speed changes (Shuai Xue)

  Power management:

   - Avoid redundant delay on transition from D3hot to D3cold if the
     device was already in D3hot (Brian Norris)

   - Prevent runtime suspend until devices are fully initialized to
     avoid saving incompletely configured device state (Brian Norris)

  Power control:

   - Add power_on/off callbacks with generic signature to pwrseq,
     tc9563, and slot drivers so they can be used by pwrctrl core
     (Manivannan Sadhasivam)

   - Add PCIe M.2 connector support to the slot pwrctrl driver
     (Manivannan Sadhasivam)

   - Switch to pwrctrl interfaces to create, destroy, and power on/off
     devices, calling them from host controller drivers instead of the
     PCI core (Manivannan Sadhasivam)

   - Drop qcom .assert_perst() callbacks since this is now done by the
     controller driver instead of the pwrctrl driver (Manivannan
     Sadhasivam)

  Virtualization:

   - Remove an incorrect unlock in pci_slot_trylock() error handling
     (Jinhui Guo)

   - Lock the bridge device for slot reset (Keith Busch)

   - Enable ACS after IOMMU configuration on OF platforms so ACS is
     enabled an all devices; previously the first device enumerated
     (typically a Root Port) didn't have ACS enabled (Manivannan
     Sadhasivam)

   - Disable ACS Source Validation for IDT 0x80b5 and 0x8090 switches to
     work around hardware erratum; previously ACS SV was only
     temporarily disabled, which worked for enumeration but not after
     reset (Manivannan Sadhasivam)

  Peer-to-peer DMA:

   - Release per-CPU pgmap ref when vm_insert_page() fails to avoid hang
     when removing the PCI device (Hou Tao)

   - Remove incorrect p2pmem_alloc_mmap() warning about page refcount
     (Hou Tao)

  Endpoint framework:

   - Add configfs sub-groups synchronously to avoid NULL pointer
     dereference when racing with removal (Liu Song)

   - Fix swapped parameters in pci_{primary/secondary}_epc_epf_unlink()
     functions (Manikanta Maddireddy)

  ASPEED PCIe controller driver:

   - Add ASPEED Root Complex DT binding and driver (Jacky Chou)

  Freescale i.MX6 PCIe controller driver:

   - Add DT binding and driver support for an optional external refclock
     in addition to the refclock from the internal PLL (Richard Zhu)

   - Fix CLKREQ# control so host asserts it during enumeration and
     Endpoints can use it afterwards to exit the L1.2 link state
     (Richard Zhu)

  NVIDIA Tegra PCIe controller driver:

   - Export irq_domain_free_irqs() to allow PCI/MSI drivers that tear
     down MSI domains to be built as modules (Aaron Kling)

   - Allow pci-tegra to be built as a module (Aaron Kling)

  NVIDIA Tegra194 PCIe controller driver:

   - Relax Kconfig so tegra194 can be built for platforms beyond
     Tegra194 (Vidya Sagar)

  Qualcomm PCIe controller driver:

   - Merge SC8180x DT binding into SM8150 (Krzysztof Kozlowski)

   - Move SDX55, SDM845, QCS404, IPQ5018, IPQ6018, IPQ8074 Gen3,
     IPQ8074, IPQ4019, IPQ9574, APQ8064, MSM8996, APQ8084 to dedicated
     schema (Krzysztof Kozlowski)

   - Add DT binding and driver support for SA8255p Endpoint being
     configured by firmware (Mrinmay Sarkar)

   - Parse PERST# from all PCIe bridge nodes for future platforms that
     will have PERST# in Switch Downstream Ports as well as in Root
     Ports (Manivannan Sadhasivam)

  Renesas RZ/G3S PCIe controller driver:

   - Use pci_generic_config_write() since the writability provided by
     the custom wrapper is unnecessary (Claudiu Beznea)

  SOPHGO PCIe controller driver:

   - Disable ASPM L0s and L1 on Sophgo 2044 PCIe Root Ports (Inochi
     Amaoto)

  Synopsys DesignWare PCIe controller driver:

   - Extend PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() to return a
     pointer to the preceding Capability, to allow removal of
     Capabilities that are advertised but not fully implemented (Qiang
     Yu)

   - Remove MSI and MSI-X Capabilities in platforms that can't support
     them, so the PCI core automatically falls back to INTx (Qiang Yu)

   - Add ASPM L1.1 and L1.2 Substates context to debugfs ltssm_status
     for drivers that support this (Shawn Lin)

   - Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if
     link is not up to avoid an unnecessary timeout (Manivannan
     Sadhasivam)

   - Revert dw-rockchip, qcom, and DWC core changes that used link-up
     IRQs to trigger enumeration instead of waiting for link to be up
     because the PCI core doesn't allocate bus number space for
     hierarchies that might be attached (Niklas Cassel)

   - Make endpoint iATU entry for MSI permanent instead of programming
     it dynamically, which is slow and racy with respect to other
     concurrent traffic, e.g., eDMA (Koichiro Den)

   - Use iMSI-RX MSI target address when possible to fix endpoints using
     32-bit MSI (Shawn Lin)

   - Allow DWC host controller driver probe to continue if device is not
     found or found but inactive; only fail when there's an error with
     the link (Manivannan Sadhasivam)

   - For controllers like NXP i.MX6QP and i.MX7D, where LTSSM registers
     are not accessible after PME_Turn_Off, simply wait 10ms instead of
     polling for L2/L3 Ready (Richard Zhu)

   - Use multiple iATU entries to map large bridge windows and DMA
     ranges when necessary instead of failing (Samuel Holland)

   - Add EPC dynamic_inbound_mapping feature bit for Endpoint
     Controllers that can update BAR inbound address translation without
     requiring EPF driver to clear/reset the BAR first, and advertise it
     for DWC-based Endpoints (Koichiro Den)

   - Add EPC subrange_mapping feature bit for Endpoint Controllers that
     can map multiple independent inbound regions in a single BAR,
     implement subrange mapping, advertise it for DWC-based Endpoints,
     and add Endpoint selftests for it (Koichiro Den)

   - Make resizable BARs work for Endpoint multi-PF configurations;
     previously it only worked for PF 0 (Aksh Garg)

   - Fix Endpoint non-PF 0 support for BAR configuration, ATU mappings,
     and Address Match Mode (Aksh Garg)

   - Set up iATU when ECAM is enabled; previously IO and MEM outbound
     windows weren't programmed, and ECAM-related iATU entries weren't
     restored after suspend/resume, so config accesses failed (Krishna
     Chaitanya Chundru)

  Miscellaneous:

   - Use system_percpu_wq and WQ_PERCPU to explicitly request per-CPU
     work so WQ_UNBOUND can eventually be removed (Marco Crivellari)"

* tag 'pci-v7.0-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (176 commits)
  PCI/bwctrl: Disable BW controller on Intel P45 using a quirk
  PCI: Disable ACS SV for IDT 0x8090 switch
  PCI: Disable ACS SV for IDT 0x80b5 switch
  PCI: Cache ACS Capabilities register
  PCI: Enable ACS after configuring IOMMU for OF platforms
  PCI: Add ACS quirk for Pericom PI7C9X2G404 switches [12d8:b404]
  PCI: Add ACS quirk for Qualcomm Hamoa & Glymur
  PCI: Use device_lock_assert() to verify device lock is held
  PCI: Use lockdep_assert_held(pci_bus_sem) to verify lock is held
  PCI: Fix pci_slot_lock () device locking
  PCI: Fix pci_slot_trylock() error handling
  PCI: Mark Nvidia GB10 to avoid bus reset
  PCI: Mark ASM1164 SATA controller to avoid bus reset
  PCI: host-generic: Avoid reporting incorrect 'missing reg property' error
  PCI/PME: Replace RMW of Root Status register with direct write
  PCI/AER: Clear stale errors on reporting agents upon probe
  PCI: Don't claim disabled bridge windows
  PCI: rzg3s-host: Fix device node reference leak in rzg3s_pcie_host_parse_port()
  PCI: dwc: Fix missing iATU setup when ECAM is enabled
  PCI: dwc: Clean up iATU index usage in dw_pcie_iatu_setup()
  ...

Merge tag 'for-7.0/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mikulas Patocka:

- dm-verity:
     - various optimizations and fixes related to forward error correction
     - add a .dm-verity keyring

- dm-integrity: fix bugs with growing a device in bitmap mode

- dm-mpath:
     - fix leaking fake timeout requests
     - fix UAF bug caused by stale rq->bio
     - fix minor bugs in device creation

- dm-core:
     - fix a bug related to blkg association
     - avoid unnecessary blk-crypto work on invalid keys

- dm-bufio:
     - dm-bufio cleanup and optimization (reducing hash table lookups)

- various other minor fixes and cleanups

* tag 'for-7.0/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (35 commits)
  dm mpath: make pg_init_delay_msecs settable
  Revert "dm: fix a race condition in retrieve_deps"
  dm mpath: Add missing dm_put_device when failing to get scsi dh name
  dm vdo encodings: clean up header and version functions
  dm: use bio_clone_blkg_association
  dm: fix excessive blk-crypto operations for invalid keys
  dm-verity: fix section mismatch error
  dm-unstripe: fix mapping bug when there are multiple targets in a table
  dm-integrity: fix recalculation in bitmap mode
  dm-bufio: avoid redundant buffer_tree lookups
  dm-bufio: merge cache_put() into cache_put_and_wake()
  selftests: add dm-verity keyring selftests
  dm-verity: add dm-verity keyring
  dm: clear cloned request bio pointer when last clone bio completes
  dm-verity: fix up various workqueue-related comments
  dm-verity: switch to bio_advance_iter_single()
  dm-verity: consolidate the BH and normal work structs
  dm: add WQ_PERCPU to alloc_workqueue users
  dm-integrity: fix a typo in the code for write/discard race
  dm: use READ_ONCE in dm_blk_report_zones
  ...

Merge tag 'iommu-updates-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu updates from Joerg Roedel:
"Core changes:
   - Rust bindings for IO-pgtable code
   - IOMMU page allocation debugging support
   - Disable ATS during PCI resets

  Intel VT-d changes:
   - Skip dev-iotlb flush for inaccessible PCIe device
   - Flush cache for PASID table before using it
   - Use right invalidation method for SVA and NESTED domains
   - Ensure atomicity in context and PASID entry updates

  AMD-Vi changes:
   - Support for nested translations
   - Other minor improvements

  ARM-SMMU-v2 changes:
   - Configure SoC-specific prefetcher settings for Qualcomm's "MDSS"

  ARM-SMMU-v3 changes:
   - Improve CMDQ locking fairness for pathetically small queue sizes
   - Remove tracking of the IAS as this is only relevant for AArch32 and
     was causing C_BAD_STE errors
   - Add device-tree support for NVIDIA's CMDQV extension
   - Allow some hitless transitions for the 'MEV' and 'EATS' STE fields
   - Don't disable ATS for nested S1-bypass nested domains
   - Additions to the kunit selftests"

* tag 'iommu-updates-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (54 commits)
  iommupt: Always add IOVA range to iotlb_gather in gather_range_pages()
  iommu/amd: serialize sequence allocation under concurrent TLB invalidations
  iommu/amd: Fix type of type parameter to amd_iommufd_hw_info()
  iommu/arm-smmu-v3: Do not set disable_ats unless vSTE is Translate
  iommu/arm-smmu-v3-test: Add nested s1bypass/s1dssbypass coverage
  iommu/arm-smmu-v3: Mark EATS_TRANS safe when computing the update sequence
  iommu/arm-smmu-v3: Mark STE MEV safe when computing the update sequence
  iommu/arm-smmu-v3: Add update_safe bits to fix STE update sequence
  iommu/arm-smmu-v3: Add device-tree support for CMDQV driver
  iommu/tegra241-cmdqv: Decouple driver from ACPI
  iommu/arm-smmu-qcom: Restore ACTLR settings for MDSS on sa8775p
  iommu/vt-d: Fix race condition during PASID entry replacement
  iommu/vt-d: Clear Present bit before tearing down context entry
  iommu/vt-d: Clear Present bit before tearing down PASID entry
  iommu/vt-d: Flush piotlb for SVM and Nested domain
  iommu/vt-d: Flush cache for PASID table before using it
  iommu/vt-d: Flush dev-IOTLB only when PCIe device is accessible in scalable mode
  iommu/vt-d: Skip dev-iotlb flush for inaccessible PCIe device without scalable mode
  rust: iommu: fix `srctree` link warning
  rust: iommu: fix Rust formatting
  ...

Merge tag 'landlock-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull landlock updates from Mickaël Salaün:

- extend Landlock to enforce restrictions on a whole process, similarly
   to the seccomp's TSYNC flag

- refactor data structures to simplify code and improve performance

- add documentation to cover missing parts

* tag 'landlock-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  mailmap: Add entry for Mickaël Salaün
  landlock: Transpose the layer masks data structure
  landlock: Add access_mask_subset() helper
  selftests/landlock: Add filesystem access benchmark
  landlock: Document audit blocker field format
  landlock: Add errata documentation section
  landlock: Add backwards compatibility for restrict flags
  landlock: Refactor TCP socket type check
  landlock: Minor reword of docs for TCP access rights
  landlock: Document LANDLOCK_RESTRICT_SELF_TSYNC
  selftests/landlock: Add LANDLOCK_RESTRICT_SELF_TSYNC tests
  landlock: Multithreading support for landlock_restrict_self()

Merge tag 'integrity-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity updates from Mimi Zohar:
"Just two bug fixes: IMA's detecting scripts (bprm_creds_for_exec), and
  calculating the EVM HMAC"

* tag 'integrity-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  evm: Use ordered xattrs list to calculate HMAC in evm_init_hmac()
  ima: Fix stack-out-of-bounds in is_bprm_creds_for_exec()

Merge tag 'Smack-for-7.0' of https://github.com/cschaufler/smack-next

Pull smack updates from Casey Schaufler:
"Two improvements to the code for setting the CIPSO Domain Of
  Interpretation (DOI), a seldom used feature, and a formatting change"

* tag 'Smack-for-7.0' of https://github.com/cschaufler/smack-next:
  smack: /smack/doi: accept previously used values
  smack: /smack/doi must be > 0
  security: smack: fix indentation in smack_access.c

Merge tag 'printk-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:

- Check all mandatory callbacks when registering nbcon consoles

- Fix some compiler warnings

* tag 'printk-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  vsnprintf: drop __printf() attributes on binary printing functions
  printf: convert test_hashed into macro
  printk: nbcon: Check for device_{lock,unlock} callbacks

Merge tag 'slab-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab updates from Vlastimil Babka:

- The percpu sheaves caching layer was introduced as opt-in in 6.18 and
   now we enable it for all caches and remove the previous cpu (partial)
   slab caching mechanism.

   Besides the lower locking overhead and much more likely fastpath when
   freeing, this removes the rather complicated code related to the cpu
   slab lockless fastpaths (using this_cpu_try_cmpxchg128/64) and all
   its complications for PREEMPT_RT or kmalloc_nolock().

   The lockless slab freelist+counters update operation using
   try_cmpxchg128/64 remains and is crucial for freeing remote NUMA
   objects, and to allow flushing objects from sheaves to slabs mostly
   without the node list_lock (Vlastimil Babka)

- Eliminate slabobj_ext metadata overhead when possible. Instead of
   using kmalloc() to allocate the array for memcg and/or allocation
   profiling tag pointers, use leftover space in a slab or per-object
   padding due to alignment (Harry Yoo)

- Various followup improvements to the above (Hao Li)

* tag 'slab-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (39 commits)
  slub: let need_slab_obj_exts() return false if SLAB_NO_OBJ_EXT is set
  mm/slab: only allow SLAB_OBJ_EXT_IN_OBJ for unmergeable caches
  mm/slab: place slabobj_ext metadata in unused space within s->size
  mm/slab: move [__]ksize and slab_ksize() to mm/slub.c
  mm/slab: save memory by allocating slabobj_ext array from leftover
  mm/memcontrol,alloc_tag: handle slabobj_ext access under KASAN poison
  mm/slab: use stride to access slabobj_ext
  mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper
  ext4: specify the free pointer offset for ext4_inode_cache
  mm/slab: allow specifying free pointer offset when using constructor
  mm/slab: use unsigned long for orig_size to ensure proper metadata align
  slub: clarify object field layout comments
  mm/slab: avoid allocating slabobj_ext array from its own slab
  slub: avoid list_lock contention from __refill_objects_any()
  mm/slub: cleanup and repurpose some stat items
  mm/slub: remove DEACTIVATE_TO_* stat items
  slab: remove frozen slab checks from __slab_free()
  slab: update overview comments
  slab: refill sheaves from all nodes
  slab: remove unused PREEMPT_RT specific macros
  ...

Merge tag 'kbuild-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux

Pull Kbuild/Kconfig updates from Nathan Chancellor:
"Kbuild:

   - Drop '*_probe' pattern from modpost section check allowlist, which
     hid legitimate warnings (Johan Hovold)

   - Disable -Wtype-limits altogether, instead of enabling at W=2
     (Vincent Mailhol)

   - Improve UAPI testing to skip testing headers that require a libc
     when CONFIG_CC_CAN_LINK is not set, opening up testing of headers
     with no libc dependencies to more environments (Thomas Weißschuh)

   - Update gendwarfksyms documentation with required dependencies
     (Jihan LIN)

   - Reject invalid LLVM= values to avoid unintentionally falling back
     to system toolchain (Thomas Weißschuh)

   - Add a script to help run the kernel build process in a container
     for consistent environments and testing (Guillaume Tucker)

   - Simplify kallsyms by getting rid of the relative base (Ard
     Biesheuvel)

   - Performance and usability improvements to scripts/make_fit.py
     (Simon Glass)

   - Minor various clean ups and fixes

  Kconfig:

   - Move XPM icons to individual files, clearing up GTK deprecation
     warnings (Rostislav Krasny)

   - Support

        depends on FOO if BAR

     as syntactic sugar for

        depends on FOO || !BAR

     (Nicolas Pitre, Graham Roff)

   - Refactor merge_config.sh to use awk over shell/sed/grep,
     dramatically speeding up processing large number of config
     fragments (Anders Roxell, Mikko Rapeli)"

* tag 'kbuild-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: (39 commits)
  kbuild: remove dependency of run-command on config
  scripts/make_fit: Compress dtbs in parallel
  scripts/make_fit: Support a few more parallel compressors
  kbuild: Support a FIT_EXTRA_ARGS environment variable
  scripts/make_fit: Move dtb processing into a function
  scripts/make_fit: Support an initial ramdisk
  scripts/make_fit: Speed up operation
  rust: kconfig: Don't require RUST_IS_AVAILABLE for rustc-option
  MAINTAINERS: Add scripts/install.sh into Kbuild entry
  modpost: Amend ppc64 save/restfpr symnames for -Os build
  MIPS: tools: relocs: Ship a definition of R_MIPS_PC32
  streamline_config.pl: remove superfluous exclamation mark
  kbuild: dummy-tools: Add python3
  scripts: kconfig: merge_config.sh: warn on duplicate input files
  scripts: kconfig: merge_config.sh: use awk in checks too
  scripts: kconfig: merge_config.sh: refactor from shell/sed/grep to awk
  kallsyms: Get rid of kallsyms relative base
  mips: Add support for PC32 relocations in vmlinux
  Documentation: dev-tools: add container.rst page
  scripts: add tool to run containerized builds
  ...