git.ipfire.org Git - thirdparty/kernel/linux.git/log

eth: mlx5: Move pause storm errors to pause stats

Report device_stall_critical_watermark_cnt as tx_pause_storm_events in
the ethtool_pause_stats struct. This counter tracks pause storm error
events, which indicate that the NIC has been sending pause frames for
an extended period due to a stall.

The ethtool_pause_stats struct reports these stalls as a single value,
whereas the device supports tracking them per priority. Aggregate the
counter across all priority classes to capture stalls on all priorities.
Note that the stats are fetched from the device for each priority via
mlx5_core_access_reg().
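The aggregation step can be sketched as a plain sum over priorities. This is a userspace model, not the mlx5 code: read_prio_stall_cnt() is a hypothetical stand-in for the per-priority mlx5_core_access_reg() query, NUM_PRIOS = 8 assumes the eight IEEE 802.1p priorities, and skipping a priority whose query fails is an assumption about error handling.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: eight priority classes, as in IEEE 802.1p. */
#define NUM_PRIOS 8

/* Hypothetical stand-in for the per-priority register query the driver
 * issues via mlx5_core_access_reg(); here it just reads a table.
 * Returns 0 on success, -1 on failure. */
static int read_prio_stall_cnt(const uint64_t *dev_counters, int prio,
                               uint64_t *out)
{
    if (dev_counters == NULL || prio < 0 || prio >= NUM_PRIOS)
        return -1;
    *out = dev_counters[prio];
    return 0;
}

/* Sum the per-priority stall counters into the single value that
 * ethtool_pause_stats can carry (tx_pause_storm_events). */
static uint64_t aggregate_pause_storm_events(const uint64_t *dev_counters)
{
    uint64_t total = 0;

    for (int prio = 0; prio < NUM_PRIOS; prio++) {
        uint64_t cnt;

        if (read_prio_stall_cnt(dev_counters, prio, &cnt))
            continue; /* assumption: skip a priority whose query failed */
        total += cnt;
    }
    return total;
}
```

Summing loses the per-priority breakdown, which is the trade-off the message describes: a single value catches a stall on any priority.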

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20260302230149.1580195-6-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

eth: fbnic: Fetch TX pause storm stats

With pause storm protection in place, track the occurrence of pause
storm events. Since there is a one-to-one mapping between pause storm
interrupts and events, use the interrupt count to track this metric.

./ethtool -I -a eth0
Pause parameters for eth0:
Autonegotiate: off
RX: off
TX: on
Statistics:
  tx_pause_frames: 759657
  rx_pause_frames: 0
  tx_pause_storm_events: 219

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20260302230149.1580195-5-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

eth: fbnic: Add protection against pause storm

Add protection against TX pause storms. A pause storm occurs when a
device fails to send received packets up to the stack and keeps
sending pause frames to its link partner. When a pause storm is
detected (the pause state persists beyond the configured timeout),
the device stops sending pause frames and begins dropping packets
instead of back-pressuring.

The timeout is configurable via the ethtool tunable pfc-prevention-tout,
with a maximum value of 10485 ms and a default value of 500 ms.

Once the device transitions to the storm-detected state, the service
task periodically attempts recovery, returning the device to normal
operation to handle any subsequent pause storm episodes.
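The detect-and-recover cycle can be modelled as a two-state machine. This is a hypothetical sketch: the state names, struct storm_ctx, and all functions are illustrative and do not come from fbnic. Only the 500 ms default and 10485 ms maximum are taken from the message; rejecting (rather than clamping) an out-of-range timeout and recovering unconditionally in the service task are assumptions. Counting one event per detection mirrors the one-to-one interrupt/event mapping described in the follow-up stats patch.

```c
#include <stdint.h>

/* Values stated in the commit message. */
#define PAUSE_STORM_TOUT_MAX_MS     10485u
#define PAUSE_STORM_TOUT_DEFAULT_MS 500u

/* Hypothetical software model; names do not come from fbnic. */
enum storm_state { STORM_NORMAL, STORM_DETECTED };

struct storm_ctx {
    enum storm_state state;
    uint32_t tout_ms;      /* pause duration that triggers detection */
    uint64_t storm_events; /* one event per storm interrupt */
};

static void storm_init(struct storm_ctx *ctx)
{
    ctx->state = STORM_NORMAL;
    ctx->tout_ms = PAUSE_STORM_TOUT_DEFAULT_MS;
    ctx->storm_events = 0;
}

/* Tunable setter: rejects values above the documented maximum
 * (rejecting instead of clamping is an assumption). */
static int storm_set_tout(struct storm_ctx *ctx, uint32_t tout_ms)
{
    if (tout_ms > PAUSE_STORM_TOUT_MAX_MS)
        return -1;
    ctx->tout_ms = tout_ms;
    return 0;
}

/* Models the storm interrupt: once pause persists beyond the timeout,
 * the device stops pausing and drops packets; count one event. */
static void storm_check(struct storm_ctx *ctx, uint32_t paused_ms)
{
    if (ctx->state == STORM_NORMAL && paused_ms > ctx->tout_ms) {
        ctx->state = STORM_DETECTED;
        ctx->storm_events++;
    }
}

/* Models the periodic service task: return to normal back-pressure
 * so that a later storm episode is detected (and counted) again. */
static void storm_service_task(struct storm_ctx *ctx)
{
    if (ctx->state == STORM_DETECTED)
        ctx->state = STORM_NORMAL;
}
```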

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20260302230149.1580195-4-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ethtool: Update doc for tunable

ETHTOOL_PFC_PREVENTION_TOUT configures the timeout value for PFC storm
prevention. It can also be used to configure the storm detection
timeout for global pause settings. In fact, some existing drivers
already use it for this purpose.

Document that the knob can formally be used to configure the timeout
value for the pause storm prevention mechanism. An update to the
ethtool man page will follow.

Link: https://lore.kernel.org/aa5f189a-ac62-4633-97b5-ebf939e9c535@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20260302230149.1580195-3-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ethtool: Track pause storm events

With TX pause enabled, if a device is unable to pass packets up to the
stack (e.g., the CPU is hung), the device can cause a pause storm.
Since devices may have native support for protecting the neighbor from
such flooding, these events need to be tracked. Track TX pause storm
events for better observability.

Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20260302230149.1580195-2-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'gve-optimize-and-enable-hw-gro-for-dqo'

Ankit Garg says:

====================
gve: optimize and enable HW GRO for DQO

The DQO device has always performed HW GRO, not LRO. This series updates
the feature bit and modifies the RX path to enhance support. It sets
gso_segs correctly so the software stack can continue coalescing, and
pulls network headers into the skb linear space to avoid multiple small
memory copies when header-split is disabled.

We also enable HW GRO by default on supported devices.
====================

Link: https://patch.msgid.link/20260303195549.2679070-1-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

gve: Enable hw-gro by default if device supported

Change the driver's default behavior to enable hw-gro whenever the
device supports it.

Performance observations:
- We observed ~10% improvement in RX single stream throughput across
various MTU sizes.
- No change in TCP_RR/TCP_CRR latencies

Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260303195549.2679070-5-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

gve: pull network headers into skb linear part

Currently, in DQO mode with hw-gro enabled, the entire received packet
is placed into skb fragments when header-split is disabled. This leaves
the skb linear part empty, forcing the networking stack to do multiple
small memory copies to access the Ethernet, IP and TCP headers.

This patch adds a single memcpy to put all headers into the linear
portion before the packet reaches the SW GRO stack, eliminating the
multiple smaller memcpy calls.

Additionally, the criteria for calling napi_gro_frags() were updated.
Since skb->head is now populated, we instead check whether the SKB is
the cached NAPI scratchpad, to ensure we keep using the
zero-allocation path.

Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260303195549.2679070-4-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

gve: fix SW coalescing when hw-GRO is used

Leaving gso_segs unpopulated on hardware GRO packets prevents further
coalescing by the software stack: the kernel's GRO logic marks the SKB
for flush because the expected length of all segments doesn't match
the actual payload length.

Setting gso_segs correctly results in significantly more segments being
coalesced as measured by the result of dev_gro_receive().

gso_segs is derived from the payload length. When header-split is
enabled, the payload is in the non-linear portion of the skb. When
header-split is disabled, we have to parse the headers to determine
the payload length.
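The derivation amounts to a ceiling division of the payload by the segment size. A minimal sketch, assuming the MSS (gso_size) is already known; the helper names are hypothetical, the header lengths are passed in rather than parsed from a real packet, and the gso_size == 0 guard is an illustrative choice, not taken from gve.

```c
#include <stdint.h>

/* Number of MSS-sized segments a coalesced payload represents:
 * ceil(payload_len / gso_size), the same result as DIV_ROUND_UP. */
static uint16_t gso_segs_from_payload(uint32_t payload_len, uint16_t gso_size)
{
    if (gso_size == 0)
        return 1; /* hypothetical guard: treat as a single segment */
    return (uint16_t)((payload_len + gso_size - 1) / gso_size);
}

/* Header-split disabled: payload length is the total length minus the
 * Ethernet, IP and TCP header lengths. Here the caller supplies them;
 * a real driver would parse them from the packet. */
static uint32_t payload_len_no_split(uint32_t pkt_len, uint32_t eth_hlen,
                                     uint32_t ip_hlen, uint32_t tcp_hlen)
{
    uint32_t hdrs = eth_hlen + ip_hlen + tcp_hlen;

    return pkt_len > hdrs ? pkt_len - hdrs : 0;
}
```

For example, a 4410-byte frame with 14 + 20 + 32 bytes of headers carries 4344 bytes of payload, which is three 1448-byte segments.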

Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jordan Rhee <jordanrhee@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260303195549.2679070-3-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

gve: Advertise NETIF_F_GRO_HW instead of NETIF_F_LRO

The device behind the DQO format has always coalesced packets per the
stricter hardware GRO spec, even though it was advertised as LRO.

Update the advertised capability to match device behavior.

Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260303195549.2679070-2-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

r8169: add support for RTL8125cp

This patch adds support for the RTL8125cp chip, whose XID is 0x708.
RTL8125cp uses its own configuration and firmware.

Signed-off-by: Javen Xu <javen_xu@realsil.com.cn>
Link: https://patch.msgid.link/20260303094611.450-1-javen_xu@realsil.com.cn
[pabeni@redhat.com: changelog cleanup]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ppp: don't store tx skb in the fastpath

Currently, ppp->xmit_pending is used in ppp_send_frame() to pass an skb
to ppp_push(), and holds the skb when a PPP channel cannot immediately
transmit it. This state is redundant because the transmit queue
(ppp->file.xq) can already handle the backlog. Furthermore, during
normal operation, an skb is queued in file.xq only to be immediately
dequeued, causing unnecessary overhead.

Refactor the transmit path to avoid stashing the skb when possible:
- Remove ppp->xmit_pending.
- Rename ppp_send_frame() to ppp_prepare_tx_skb(), and don't call
  ppp_push() in it. It returns 1 if the skb is consumed
  (dropped/handled) or 0 if it can be passed to ppp_push().
- Update ppp_push() to accept the skb. It returns 1 if the skb is
  consumed, or 0 if the channel is busy.
- Optimize __ppp_xmit_process():
  - Fastpath: If the queue is empty, attempt to send the skb directly
    via ppp_push(). If busy, queue it.
  - Slowpath: If the queue is not empty, process the backlog in
    file.xq. Split the dequeuing loop into a separate function,
    ppp_xmit_flush(), so that ppp_channel_push() can use it directly
    instead of passing a NULL skb to __ppp_xmit_process().

This simplifies the states and reduces locking in the fastpath.
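The fastpath/slowpath split can be modelled with a small ring queue and a channel that accepts a fixed number of frames before reporting busy. This is a hypothetical sketch, not ppp_generic.c: packets are ints, struct txq and struct chan are illustrative, ppp_prepare_tx_skb() is omitted, and queuing the new skb behind the backlog before flushing (to preserve ordering) is an assumption about the slowpath.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; names are illustrative. Packets are ints. */
#define XQ_LEN 16

struct txq { int buf[XQ_LEN]; size_t head, len; };

struct chan {
    int budget; /* frames the channel accepts before it is busy */
    int sent;   /* frames actually transmitted */
};

static bool txq_push(struct txq *q, int skb)
{
    if (q->len == XQ_LEN)
        return false;
    q->buf[(q->head + q->len) % XQ_LEN] = skb;
    q->len++;
    return true;
}

/* Like ppp_push(): 1 if the skb was consumed, 0 if the channel is busy. */
static int chan_push(struct chan *ch, int skb)
{
    (void)skb;
    if (ch->budget == 0)
        return 0;
    ch->budget--;
    ch->sent++;
    return 1;
}

/* Like ppp_xmit_flush(): drain the backlog until empty or busy. */
static void xmit_flush(struct txq *q, struct chan *ch)
{
    while (q->len > 0 && chan_push(ch, q->buf[q->head])) {
        q->head = (q->head + 1) % XQ_LEN;
        q->len--;
    }
}

/* Like __ppp_xmit_process() for a new skb; false means it was dropped. */
static bool xmit_process(struct txq *q, struct chan *ch, int skb)
{
    if (q->len == 0) {
        /* Fastpath: nothing queued, try the channel directly. */
        if (chan_push(ch, skb))
            return true;
        return txq_push(q, skb); /* busy: stash it for later */
    }
    /* Slowpath: queue behind the backlog to keep ordering, then drain. */
    if (!txq_push(q, skb))
        return false;
    xmit_flush(q, ch);
    return true;
}
```

In the common case the queue is empty and the channel is free, so the skb never touches the queue, which is the overhead the patch removes.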

Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Link: https://patch.msgid.link/20260303093219.234403-1-dqfext@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mana: Add MAC address to vPort logs and clarify error messages

Add MAC address to vPort configuration success message and update error
message to be more specific about HWC message errors in
mana_send_request.

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260302174204.234837-1-ernis@linux.microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'nf-next-26-03-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter: updates for net-next

The following patchset contains Netfilter updates for *net-next*,
including changes to IPv6 stack and updates to IPVS from Julian Anastasov.

1) ipv6: export fib6_lookup for nft_fib_ipv6 module
2) factor out ipv6_anycast_destination logic so it's usable without
   dst_entry.  These are dependencies for patch 3.
3) switch nft_fib_ipv6 module to no longer need temporary dst_entry
   object allocations by using fib6_lookup() + RCU.
   This gets us ~13% higher packet rate in my tests.

Patches 4 to 8, from Eric Dumazet, zap sk_callback_lock usage in
netfilter.  Patch 9 removes another sk_callback_lock instance.

The remaining patches, from Julian Anastasov, improve IPVS. Quoting Julian:
* Add infrastructure for resizable hash tables based on hlist_bl.
* Change the 256-bucket service hash table to be resizable.
* Change the global connection table to be per-net and resizable.
* Make connection hashing more secure for setups with multiple services.

netfilter pull request nf-next-26-03-04

* tag 'nf-next-26-03-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  ipvs: use more keys for connection hashing
  ipvs: switch to per-net connection table
  ipvs: use resizable hash table for services
  ipvs: add resizable hash tables
  rculist_bl: add hlist_bl_for_each_entry_continue_rcu
  netfilter: nfnetlink_queue: remove locking in nfqnl_get_sk_secctx
  netfilter: nfnetlink_queue: no longer acquire sk_callback_lock
  netfilter: nfnetlink_log: no longer acquire sk_callback_lock
  netfilter: nft_meta: no longer acquire sk_callback_lock in nft_meta_get_eval_skugid()
  netfilter: xt_owner: no longer acquire sk_callback_lock in mt_owner()
  netfilter: nf_log_syslog: no longer acquire sk_callback_lock in nf_log_dump_sk_uid_gid()
  netfilter: nft_fib_ipv6: switch to fib6_lookup
  ipv6: make ipv6_anycast_destination logic usable without dst_entry
  ipv6: export fib6_lookup for nft_fib_ipv6
====================

Link: https://patch.msgid.link/20260304114921.31042-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'amd-xgbe-add-support-for-p100a-platform'

Raju Rangoju says:

====================
amd-xgbe: add support for P100a platform

This patch series adds support for the AMD P100a platform featuring
the ethernet controller PCI device ID 0x1122.

The P100a platform uses different register access patterns and speed
encoding compared to previous-generation hardware (Yellow Carp, etc.).
Key differences include:

1. Different XPCS window offset calculation due to changed memory mapping
2. 2.5G speed uses XGMII mode (ss=0x06) instead of GMII (ss=0x02)
3. Extended port speed bits (6-bit instead of 5-bit) for 5G support

The series is organized as follows:

Patch 1: Defines macros for MAC version numbers and speed select values
to replace hardcoded magic numbers

Patch 2: Adds the core P100a platform support with PCI ID,
register configuration, and version-specific behavior

Tested on AMD P100a platform verifying:
- 10G/2.5G/1G/100M link establishment
- PHY initialization and auto-negotiation
- No register access errors
====================

Link: https://patch.msgid.link/20260302044409.1388430-1-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

amd-xgbe: add support for P100a platform

Add hardware support for the AMD P100a platform featuring the ethernet
controller PCI device ID 0x1122.

Platform-specific changes include:

1. PCI device ID and register configuration:
   - Add XGBE_P100a_PCI_DEVICE_ID (0x1122) for recognition
   - Configure platform-specific XPCS window registers
   - Disable CDR workaround and RRC for this platform

2. XPCS window offset calculation fix:
   The P100a platform uses a different memory mapping scheme for XPCS
   register access. The offset calculation differs between platforms:

   - Older platforms (YC): offset = base + (addr & mask)
     The address is masked first, then added to the window base.

   - P100a: offset = (base + addr) & mask
     The full address is added to base first, then masked.

   This is critical because using the wrong calculation causes register
   reads/writes to access incorrect addresses, leading to incorrect
   behaviour.

3. 2.5G speed mode handling:
   P100a uses XGMII mode (ss=0x06) for 2.5G instead of GMII mode
   (ss=0x02) used by older platforms. The MAC version check determines
   which mode to use.

4. Port speed bits extended:
   Extend XP_PROP_0_PORT_SPEEDS from 5 bits to 6 bits to support the
   additional 5G speed capability.

5. Rx adaptation disabled:
   Rx adaptation is disabled for P100a (MAC version 0x33) as this
   feature requires further development for this platform.

6. Rate change command for 2.5G:
   Use XGBE_MB_SUBCMD_2_5G_KX subcommand for 2.5G mode on P100a
   instead of XGBE_MB_SUBCMD_NONE used on older platforms.
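The two XPCS offset formulas, the 2.5G speed-select choice, and the wider port-speed field can be compared side by side. A sketch only: the constants 0x33, 0x02 and 0x06 come from these messages, but the function names are hypothetical, the exact-match check on MAC version 0x33 is an assumption about how the driver compares, and the field shift used with get_field() is illustrative.

```c
#include <stdint.h>

/* Values quoted in the commit messages. */
#define XGBE_MAC_VER_33        0x33 /* P100a */
#define XGBE_MAC_SS_2_5G_GMII  0x02 /* 2.5G on older platforms */
#define XGBE_MAC_SS_2_5G_XGMII 0x06 /* 2.5G on P100a */

/* Older platforms (e.g. Yellow Carp): mask the address, then add base. */
static uint32_t xpcs_offset_legacy(uint32_t base, uint32_t addr, uint32_t mask)
{
    return base + (addr & mask);
}

/* P100a: add the full address to base first, then mask the sum. */
static uint32_t xpcs_offset_p100a(uint32_t base, uint32_t addr, uint32_t mask)
{
    return (base + addr) & mask;
}

/* 2.5G speed select by MAC version (exact match is an assumption). */
static uint32_t ss_for_2_5g(uint32_t mac_ver)
{
    return mac_ver == XGBE_MAC_VER_33 ? XGBE_MAC_SS_2_5G_XGMII
                                      : XGBE_MAC_SS_2_5G_GMII;
}

/* Extract a width-bit field (width < 32). PORT_SPEEDS grows from 5 to
 * 6 bits so the extra bit can report 5G; the shift is hypothetical. */
static uint32_t get_field(uint32_t reg, unsigned int shift, unsigned int width)
{
    return (reg >> shift) & ((1u << width) - 1);
}
```

The two offset formulas agree only when the base has no bits outside the mask and the addition does not carry past it; with base 0x18000, addr 0x1234 and mask 0xffff they yield 0x19234 and 0x9234 respectively.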

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20260302044634.1388661-2-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

amd-xgbe: define macros for MAC versions and speed select values

Define symbolic constants for MAC hardware version numbers and speed
select register values to improve code readability and maintainability.

This replaces magic numbers like 0x30, 0x33, 0x07, 0x06, etc. with
descriptive macro names that indicate their purpose:

MAC versions:
- XGBE_MAC_VER_30: Baseline version supporting Rx adaptation
- XGBE_MAC_VER_33: P100a platform version

Speed select values for MAC_TCR_SS register:
- XGBE_MAC_SS_10G: 10Gbps XGMII mode
- XGBE_MAC_SS_2_5G_GMII: 2.5Gbps GMII mode (older platforms)
- XGBE_MAC_SS_2_5G_XGMII: 2.5Gbps XGMII mode (P100a)
- XGBE_MAC_SS_1G: 1Gbps mode
- XGBE_MAC_SS_100M: 100Mbps mode
- XGBE_MAC_SS_10M: 10Mbps mode

No functional changes.

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20260302044634.1388661-1-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dibs: change dibs_class to a const struct

The class_create() call has been deprecated in favor of class_register()
as the driver core now allows for a struct class to be in read-only
memory. Change dibs_class to be a const struct class and drop the
class_create() call.

Link: https://lore.kernel.org/all/2023040244-duffel-pushpin-f738@gregkh/
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://patch.msgid.link/20260303163104.3749311-1-jkoolstra@xs4all.nl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: update the README

Last year I added some instructions for driver authors to the NIPA wiki:
https://github.com/linux-netdev/nipa/wiki/Guidance-for-test-authors
Given the increasingly common use of LLMs, let's add them in tree
as well. Hopefully this will decrease the number of review
comments we have to give to AI-assisted noobs.

While at it, also sync the overall instructions with what's on
GitHub.

Link: https://patch.msgid.link/20260303213626.2320308-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: rss: Fix error calculation in test_hitless_key_update

This test verifies there are no errors when a device's RSS key is
updated while traffic is flowing. The current check is a no-op because
the last sample is subtracted from itself.

Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Link: https://patch.msgid.link/20260303202258.1595661-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2026-03-02 (ice, i40e, ixgbe)

For ice:
Simon Horman adds const modifier to read only member of a struct.

For i40e:
Yury Norov removes an unneeded check of bitmap_weight().

Andy Shevchenko adds a missing include.

For ixgbe:
Aleksandr changes declaration of a bitmap to utilize DECLARE_BITMAP()
macro.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ixgbe: refactor: use DECLARE_BITMAP for ring state field
  i40e: Add missing wordpart.h header
  i40e: drop useless bitmap_weight() call in i40e_set_rxfh_fields()
  ice: Make name member of struct ice_cgu_pin_desc const
====================

Link: https://patch.msgid.link/20260304000800.3536872-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: use ktime_t in struct scm_timestamping_internal

Instead of using struct timespec64 in scm_timestamping_internal,
use ktime_t, saving 24 bytes of kernel stack.

This makes tcp_update_recv_tstamps() small enough to be inlined.

The ktime_t -> timespec64 conversions happen after socket lock
has been released in tcp_recvmsg(), and only if the application
requested them.
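The 24-byte saving follows from the three timestamps the struct holds: 3 x 16 bytes of timespec64 versus 3 x 8 bytes of ktime_t on a 64-bit build. The sketch below uses userspace stand-ins for the kernel types (an assumption: the layouts mirror a 64-bit kernel, with tv_nsec as a 64-bit field) and a hypothetical ktime_to_ts64() showing the deferred nanosecond-to-timespec conversion.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel types (64-bit layout assumed). */
typedef int64_t ktime_t; /* signed nanosecond count */

struct timespec64 {
    int64_t tv_sec;
    int64_t tv_nsec; /* 'long' in the kernel; 8 bytes on 64-bit */
};

/* Three timestamps, as in struct scm_timestamping_internal. */
struct ts_internal_old { struct timespec64 ts[3]; };
struct ts_internal_new { ktime_t ts[3]; };

#define NSEC_PER_SEC 1000000000LL

/* Deferred conversion, done only if the application asked for it. */
static struct timespec64 ktime_to_ts64(ktime_t kt)
{
    struct timespec64 ts;

    ts.tv_sec = kt / NSEC_PER_SEC;
    ts.tv_nsec = kt % NSEC_PER_SEC;
    if (ts.tv_nsec < 0) { /* normalize so 0 <= tv_nsec < 1e9 */
        ts.tv_sec--;
        ts.tv_nsec += NSEC_PER_SEC;
    }
    return ts;
}
```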

$ scripts/bloat-o-meter -t vmlinux.0 vmlinux
add/remove: 0/2 grow/shrink: 5/4 up/down: 146/-277 (-131)
Function                                     old     new   delta
tcp_zerocopy_receive                        2383    2425     +42
mptcp_recvmsg                               1565    1607     +42
tcp_recvmsg_locked                          3797    3823     +26
put_cmsg_scm_timestamping64                  131     149     +18
put_cmsg_scm_timestamping                    131     149     +18
__pfx_tcp_update_recv_tstamps                 16       -     -16
do_tcp_getsockopt                           4024    4006     -18
tcp_recv_timestamp                           474     430     -44
tcp_zc_handle_leftover                       417     371     -46
__sock_recv_timestamp                       1087    1031     -56
tcp_update_recv_tstamps                       97       -     -97
Total: Before=25223788, After=25223657, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20260304012747.881644-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: openvswitch: clean up some kernel-doc warnings

Fix some kernel-doc warnings in openvswitch.h:

Mark enum placeholders that are not used as "private" so that kernel-doc
comments are not needed for them.

Correct names for 2 enum values:
Warning: include/uapi/linux/openvswitch.h:300 Excess enum value
'@OVS_VPORT_UPCALL_SUCCESS' description in 'ovs_vport_upcall_attr'
Warning: include/uapi/linux/openvswitch.h:300 Excess enum value
'@OVS_VPORT_UPCALL_FAIL' description in 'ovs_vport_upcall_attr'

Convert one comment from "/**" kernel-doc to a plain C "/*" comment:
Warning: include/uapi/linux/openvswitch.h:638 This comment starts with
'/**', but isn't a kernel-doc comment.
* Omit attributes for notifications.

Add more kernel-doc:
- add kernel-doc for kernel-only enums;
- add missing kernel-doc for enum ovs_datapath_attr;
- add missing kernel-doc for enum ovs_flow_attr;
- add missing kernel-doc for enum ovs_sample_attr;
- add kernel-doc for enum ovs_check_pkt_len_attr;
- add kernel-doc for enum ovs_action_attr;
- add kernel-doc for enum ovs_action_push_eth;
- add kernel-doc for enum ovs_vport_attr;

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Link: https://patch.msgid.link/20260304012437.469151-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: move tcp_do_parse_auth_options() to net/ipv4/tcp.c

The fast-path user of tcp_do_parse_auth_options() is tcp_inbound_hash().

Move tcp_do_parse_auth_options() right before tcp_inbound_hash()
so that it can be (auto)inlined by the compiler.

As a bonus, the stack canary is removed from tcp_inbound_hash().

Also use EXPORT_IPV6_MOD(tcp_do_parse_auth_options).

$ scripts/bloat-o-meter -t vmlinux.0 vmlinux
add/remove: 0/0 grow/shrink: 1/0 up/down: 131/0 (131)
Function                                     old     new   delta
tcp_inbound_hash                             565     696    +131
Total: Before=25223788, After=25223919, chg +0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260303191243.557245-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'rfs-use-high-order-allocations-for-hash-tables'

Eric Dumazet says:

====================
rfs: use high-order allocations for hash tables

This series adds rps_tag_ptr which encodes both a pointer
and a size of a power-of-two hash table in a single long word.

RFS hash tables (global and per rx-queue) are converted to rps_tag_ptr.

This removes a cache line miss, and allows high-order allocations.

The global hash table can benefit from huge pages.
====================

Link: https://patch.msgid.link/20260302181432.1836150-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net-sysfs: use rps_tag_ptr and remove metadata from rps_dev_flow_table

Instead of storing the @log at the beginning of rps_dev_flow_table,
use the 5 low-order bits of the rps_tag_ptr to store the log of the size.

This removes a potential cache line miss (for light traffic).

This allows us to switch to one high-order allocation instead of vmalloc()
when CONFIG_RFS_ACCEL is not set.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260302181432.1836150-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net-sysfs: remove rcu field from 'struct rps_dev_flow_table'

Remove rps_dev_flow_table_release() in favor of kvfree_rcu_mightsleep().

In the following patch, we will remove the "u8 @log" field, and
the size of 'struct rps_dev_flow_table' will be a power of two.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260302181432.1836150-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net-sysfs: get rid of rps_dev_flow_lock

Use unrcu_pointer() and xchg() in store_rps_dev_flow_table_cnt()
instead of a dedicated spinlock.

Make a similar change in rx_queue_release(), so that both
functions use a similar construct and synchronization.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260302181432.1836150-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net-sysfs: use rps_tag_ptr and remove metadata from rps_sock_flow_table

Instead of storing the @mask at the beginning of rps_sock_flow_table,
use the 5 low-order bits of the rps_tag_ptr to store the log of the size.

This removes a potential cache line miss to fetch @mask.

More importantly, we can switch to vmalloc_huge() without wasting memory.

Tested with:

numactl --interleave=all bash -c "echo 4194304 >/proc/sys/net/core/rps_sock_flow_entries"

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260302181432.1836150-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net-sysfs: add rps_sock_flow_table_mask() helper

In preparation for the following patch, abstract access
to the @mask field in 'struct rps_sock_flow_table'.

Also clean up rps_sock_flow_sysctl() a bit:

- Rename orig_sock_table to o_sock_table.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260302181432.1836150-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net-sysfs: remove rcu field from 'struct rps_sock_flow_table'

Removing rcu_head (and @mask in a following patch)
will allow a power-of-two allocation and thus high-order
allocation for better performance.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260302181432.1836150-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: add rps_tag_ptr type and helpers

Add a new rps_tag_ptr type to encode both a pointer to a
power-of-two table and the size of that table.

Three helpers are added converting an rps_tag_ptr to:

1) A log of the size.

2) A mask: (size - 1).

3) A pointer to the array.
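The encoding can be sketched as pointer tagging: since the later patches keep the log of the size in the 5 low-order bits, the table must be aligned to at least 32 bytes so those bits are free. This is a hypothetical model; tag_ptr and the helper names are illustrative, not the kernel's rps_tag_ptr helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: table pointer with log2(entries) in 5 low bits. */
typedef uintptr_t tag_ptr;

#define TAG_LOG_BITS 5
#define TAG_LOG_MASK ((uintptr_t)((1u << TAG_LOG_BITS) - 1))

static tag_ptr tag_ptr_make(void *table, unsigned int log)
{
    assert(((uintptr_t)table & TAG_LOG_MASK) == 0); /* 32-byte aligned */
    assert(log <= TAG_LOG_MASK);                     /* fits in 5 bits */
    return (uintptr_t)table | log;
}

/* 1) log of the size */
static unsigned int tag_ptr_log(tag_ptr t)
{
    return (unsigned int)(t & TAG_LOG_MASK);
}

/* 2) mask: size - 1 */
static uint32_t tag_ptr_mask(tag_ptr t)
{
    return (1u << tag_ptr_log(t)) - 1;
}

/* 3) pointer to the array */
static void *tag_ptr_table(tag_ptr t)
{
    return (void *)(t & ~TAG_LOG_MASK);
}
```

Reading the size from the same word as the pointer is what lets the table drop its header metadata, so the allocation itself can be an exact power of two.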

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260302181432.1836150-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fix off-by-one in udp_flow_src_port() / psp_write_headers()

udp_flow_src_port() and psp_write_headers() use ip_local_port_range.

ip_local_port_range is inclusive: all ports between min and max
can be used.

Before this patch, if ip_local_port_range was set to 40000-40001,
port 40001 would never be used as a source port.

Use reciprocal_scale() to help code readability.
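The fix comes down to using an inclusive span of max - min + 1. The reciprocal_scale() below reproduces the kernel helper's arithmetic (multiply by the range, keep the high 32 bits); pick_src_port() is a hypothetical sketch of the port selection, not the code in udp_flow_src_port() or psp_write_headers().

```c
#include <stdint.h>

/* Same arithmetic as the kernel's reciprocal_scale(): maps a 32-bit
 * value into [0, ep_ro) with a multiply and shift, no division. */
static uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
    return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/* Pick a source port from the inclusive range [min, max] using a flow
 * hash. The span must be max - min + 1; a span of max - min never
 * reaches max, which is the off-by-one described above. */
static uint16_t pick_src_port(uint32_t hash, uint16_t min, uint16_t max)
{
    uint32_t span = (uint32_t)max - min + 1;

    return (uint16_t)(min + reciprocal_scale(hash, span));
}
```

With the range 40000-40001, hashes below 0x80000000 map to 40000 and the rest to 40001, so both ports are used.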

Not tagged for stable trees, as this change could break user
expectations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260302163933.1754393-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'tools-ynl-tests-adjust-makefile-to-mimic-ksft'

Jakub Kicinski says:

====================
tools: ynl: tests: adjust Makefile to mimic ksft

Make a few minor adjustments to tools/net/ynl/tests/Makefile
to align its behavior more with how real kselftests behave.
This series allows running the YNL tests in NIPA with little
extra integration effort.

If anyone has already integrated these tests into their CI, minor
adjustments to the integration may be needed (due to patch 2).
====================

Link: https://patch.msgid.link/20260303163504.2084981-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: produce kselftest-list.txt from tests

Executors will need kselftest-list.txt, so create it when
tests are installed.

Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260303163504.2084981-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: support INSTALL_PATH in the tests Makefile

We have modelled the YNL tests after ksft to be able to reuse
the NIPA wrappers. Make sure YNL honors INSTALL_PATH, not just
DESTDIR, since ksft uses INSTALL_PATH.

Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260303163504.2084981-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: don't install tests in /usr/bin/

Until commit 790792ebc960 ("tools: ynl: don't install tests")
YNL selftests were installed with all the other YNL outputs.
That's no longer the case, as tests are not really production
artifacts. Let's not install them in /usr/bin at all, and
mirror kselftest format more closely:

For: make -C tools/net/ynl/tests/ install DESTDIR=tmp

tmp/usr/share/kselftest
              ├── ktap_helpers.sh
              └── ynl
                  ├── test_ynl_cli.sh
                  └── test_ynl_ethtool.sh

Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260303163504.2084981-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: rename TESTS variable to TEST_PROGS

Use the standard kselftest variable naming for tests in the Makefile.
NIPA depends on being able to selectively target tests by setting
those variables on the CLI.

Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260303163504.2084981-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mxl862xx: rename MDIO op arguments

The use of the 'port' argument name for functions implementing the MDIO
bus operations is misleading as the port address isn't equal to the
PHY address.

Rename the MDIO operation argument name to match the prototypes of
mdiobus_write, mdiobus_read, mdiobus_c45_read and mdiobus_c45_write.

Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/e1f4cb3bcffc7df9af0f2c9b673b14c7e1201c9a.1772507674.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: dsa: maxlinear,mxl862xx: remove port label

The ports in the example device tree should not have a 'label'
property. Labels for all user ports have been removed from an earlier
submission, but this was overlooked in the case of the CPU port.

Remove 'cpu' port label from the example.

Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/61579de297eb636ec5f1e6c97d453e26abb0625d.1772507210.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'wireless-next-2026-03-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
Notable features this time:
- cfg80211/mac80211
   - finished assoc frame encryption/EPPKE/802.1X-over-auth
     (also hwsim)
   - radar detection improvements
   - 6 GHz incumbent signal detection APIs
   - multi-link support for FILS, probe response
     templates and client probing
- ath12k:
   - monitor mode support on IPQ5332
   - basic hwmon temperature reporting

* tag 'wireless-next-2026-03-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (38 commits)
  wifi: UHR: define DPS/DBE/P-EDCA elements and fix size parsing
  wifi: mac80211_hwsim: change hwsim_class to a const struct
  wifi: mac80211: give the AP more time for EPPKE as well
  wifi: ath12k: Remove the unused argument from the Rx data path
  wifi: ath12k: Enable monitor mode support on IPQ5332
  wifi: ath12k: Set up MLO after SSR
  wifi: ath11k: Silence remoteproc probe deferral prints
  wifi: cfg80211: support key installation on non-netdev wdevs
  wifi: cfg80211: make cluster id an array
  wifi: mac80211: update outdated comment
  wifi: mac80211: Advertise IEEE 802.1X authentication support
  wifi: mac80211: Add support for IEEE 802.1X authentication protocol in non-AP STA mode
  wifi: cfg80211: add support for IEEE 802.1X Authentication Protocol
  wifi: mac80211: Advertise EPPKE support based on driver capabilities
  wifi: mac80211_hwsim: Advertise support for (Re)Association frame encryption
  wifi: mac80211: Fix AAD/Nonce computation for management frames with MLO
  wifi: rt2x00: use generic nvmem_cell_get
  wifi: mac80211: fetch unsolicited probe response template by link ID
  wifi: mac80211: fetch FILS discovery template by link ID
  wifi: nl80211: don't allow DFS channels for NAN
  ...
====================

Link: https://patch.msgid.link/20260304113707.175181-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

wifi: UHR: define DPS/DBE/P-EDCA elements and fix size parsing

Add UHR Operation and Capability definitions and parsing helpers:

- Define ieee80211_uhr_dps_info, ieee80211_uhr_dbe_info,
  ieee80211_uhr_p_edca_info with masks.
- Update ieee80211_uhr_oper_size_ok() to account for optional
  DPS/DBE/P-EDCA blocks.
- Move NPCA pointer position after DPS Operation Parameter if it is
  present in ieee80211_uhr_oper_size_ok().
- Move NPCA pointer position after DPS info if it is present in
  ieee80211_uhr_npca_info().

Signed-off-by: Karthikeyan Kathirvel <karthikeyan.kathirvel@oss.qualcomm.com>
Link: https://patch.msgid.link/20260304085343.1093993-2-karthikeyan.kathirvel@oss.qualcomm.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

ipvs: use more keys for connection hashing

Simon Kirby reported a long time ago that IPVS connection hashing
based only on the client address/port (caddr, cport) as hash keys
is not suitable for setups that accept traffic on multiple virtual
IPs and ports. It can happen for multiple VIP:VPORT services, for
single or many fwmark service(s) that match multiple virtual IPs
and ports or even for passive FTP with persistence in DR/TUN mode
where we expect traffic on multiple ports for the virtual IP.

Fix it by adding virtual addresses and ports to the hash function.
This causes the traffic from NAT real servers to clients to use
a second hash for the in->out direction.

As a result:

- the IN direction from client will use hash node hn0 where
the source/dest addresses and ports used by client will be used
as hash keys

- the OUT direction from NAT real servers will use hash node hn1
for the traffic from real server to client

- the persistence templates are hashed only with parameters based on
the IN direction, so they now will also use the virtual address,
port and fwmark from the service.

OLD:
- all methods: c_list node: proto, caddr:cport
- persistence templates: c_list node: proto, caddr_net:0
- persistence engine templates: c_list node: per-PE, PE-SIP uses jhash

NEW:
- all methods: hn0 node (dir 0): proto, caddr:cport -> vaddr:vport
- MASQ method: hn1 node (dir 1): proto, daddr:dport -> caddr:cport
- persistence templates: hn0 node (dir 0):
  proto, caddr_net:0 -> vaddr:vport_or_0
  proto, caddr_net:0 -> fwmark:0
- persistence engine templates: hn0 node (dir 0): as before
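To make the OLD/NEW key sets concrete, here is a userspace sketch of which fields feed each hash node. It is IPv4-only and uses FNV-1a purely as a stand-in mixer (the patch series uses siphash); struct demo_conn and all demo_* names are hypothetical, not IPVS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IPv4-only view of a connection. */
struct demo_conn {
	uint8_t  proto;
	uint32_t caddr, vaddr, daddr;   /* client, virtual, real server */
	uint16_t cport, vport, dport;
};

/* FNV-1a over raw bytes: an illustrative stand-in, not siphash. */
static uint32_t demo_fnv1a(const void *buf, size_t len, uint32_t h)
{
	const uint8_t *p = buf;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Hash one directional tuple: proto, addr1:port1 -> addr2:port2. */
static uint32_t demo_hash_tuple(uint8_t proto, uint32_t a1, uint16_t p1,
				uint32_t a2, uint16_t p2)
{
	uint32_t h = 2166136261u;

	h = demo_fnv1a(&proto, sizeof(proto), h);
	h = demo_fnv1a(&a1, sizeof(a1), h);
	h = demo_fnv1a(&p1, sizeof(p1), h);
	h = demo_fnv1a(&a2, sizeof(a2), h);
	h = demo_fnv1a(&p2, sizeof(p2), h);
	return h;
}

/* hn0 (dir 0, IN): caddr:cport -> vaddr:vport */
static uint32_t demo_hash_in(const struct demo_conn *c)
{
	return demo_hash_tuple(c->proto, c->caddr, c->cport, c->vaddr, c->vport);
}

/* hn1 (dir 1, OUT, MASQ only): daddr:dport -> caddr:cport */
static uint32_t demo_hash_out(const struct demo_conn *c)
{
	return demo_hash_tuple(c->proto, c->daddr, c->dport, c->caddr, c->cport);
}
```

Because vaddr:vport now takes part in the IN hash, one client talking to several VIPs from the same caddr:cport lands in different buckets, and a reply from a NAT real server can be found by hashing its own source/destination without knowing the VIP.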

Also reorder the ip_vs_conn fields, so that the hash nodes are on the
same read-mostly cache line while the write-mostly fields are on a
separate cache line.

Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>

ipvs: switch to per-net connection table

Use per-net resizable hash table for connections. The global table
is slow to walk when using many namespaces.

The table can be resized in the range of [256 - ip_vs_conn_tab_size].
Table is attached only while services are present. Resizing is done
by delayed work based on load (the number of connections).

Add a hash_key field into the connection to store the table ID in
the highest bit and the entry's hash value in the lowest bits. The
lowest part of the hash value is used as bucket ID, the remaining
part is used to filter the entries in the bucket before matching
the keys and, as a result, helps the lookup operation access only
one cache line. By knowing the table ID and bucket ID for an entry,
we can unlink it without calculating the hash value and doing
lookup by keys. We need only to validate the saved hash_key under
lock.
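The hash_key encoding described above can be sketched in userspace as follows. The exact bit layout (table ID in bit 31, hash in bits 0..30) follows the "highest bit / lowest bits" description but is otherwise an assumption, and every demo_* and HK_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bit 31 = table ID (old/new table during resize),
 * bits 0..30 = the entry's hash value. */
#define HK_TABLE_BIT  31u
#define HK_HASH_MASK  ((1u << HK_TABLE_BIT) - 1)

static uint32_t hk_make(unsigned int table_id, uint32_t hash)
{
	return ((uint32_t)(table_id & 1u) << HK_TABLE_BIT) | (hash & HK_HASH_MASK);
}

static unsigned int hk_table(uint32_t hk)
{
	return hk >> HK_TABLE_BIT;
}

/* Bucket ID: the lowest 'bits' bits of the hash (bits < 31). */
static uint32_t hk_bucket(uint32_t hk, unsigned int bits)
{
	return hk & ((1u << bits) - 1);
}

/* Pre-filter inside a bucket: if the stored hash differs, skip the
 * full key comparison, which may touch another cache line. */
static bool hk_may_match(uint32_t stored_hk, uint32_t lookup_hk)
{
	return (stored_hk & HK_HASH_MASK) == (lookup_hk & HK_HASH_MASK);
}
```

Unlinking then only needs hk_table() and hk_bucket() on the saved key, followed by re-validating the key under the bucket lock.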

For better security, switch from jhash to siphash for the default
connection hashing; the persistence engines may still use their own
function. Keeping the number of entries below 12% of the table size
avoids collisions for 96+% of the connections.

ip_vs_conn_fill_cport() now will rehash the connection with proper
locking because unhash+hash is not safe for RCU readers.

To invalidate templates, setting just dport to 0xffff is enough;
there is no need to rehash them. As a result, ip_vs_conn_unhash()
is now unused and removed.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>

ipvs: use resizable hash table for services

Make the hash table for services resizable in the bit range of 4-20.
Table is attached only while services are present. Resizing is done
by delayed work based on load (the number of hashed services).
Table grows when load increases 2+ times (above 12.5% with lfactor=-3)
and shrinks 8+ times when load decreases 16+ times (below 0.78%).
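A hedged sketch of the resize trigger, assuming lfactor=-3 means "grow once the entry count exceeds size >> 3" and that the shrink threshold sits 16 times lower (size >> 7, about 0.78%). Only the trigger is modelled; how the new size is chosen is not, and the demo_* names are hypothetical.

```c
#include <assert.h>

#define DEMO_MIN_BITS 4    /* bit range from the text: 4-20 */
#define DEMO_MAX_BITS 20

enum demo_resize { DEMO_RESIZE_NONE, DEMO_RESIZE_GROW, DEMO_RESIZE_SHRINK };

/* Decide whether a table of 2^bits buckets holding 'count' entries
 * should be resized. */
static enum demo_resize demo_check_load(unsigned int bits, unsigned long count)
{
	unsigned long size = 1UL << bits;
	unsigned long grow_at = size >> 3;        /* 12.5% with lfactor=-3 */
	unsigned long shrink_at = grow_at >> 4;   /* 16x lower, ~0.78% */

	if (count > grow_at && bits < DEMO_MAX_BITS)
		return DEMO_RESIZE_GROW;
	if (count < shrink_at && bits > DEMO_MIN_BITS)
		return DEMO_RESIZE_SHRINK;
	return DEMO_RESIZE_NONE;
}
```

The wide gap between the two thresholds acts as hysteresis, so a load hovering near one threshold does not cause the table to flip between sizes.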

Switch to jhash hashing to reduce the collisions for multiple
services.

Add a hash_key field into the service to store the table ID in
the highest bit and the entry's hash value in the lowest bits. The
lowest part of the hash value is used as bucket ID, the remaining
part is used to filter the entries in the bucket before matching
the keys and, as a result, helps the lookup operation access only
one cache line. By knowing the table ID and bucket ID for an entry,
we can unlink it without calculating the hash value and doing
lookup by keys. We need only to validate the saved hash_key under
lock.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>

ipvs: add resizable hash tables

Add infrastructure for resizable hash tables based on hlist_bl
which we will use in followup patches.

The tables allow RCU lookups during resizing, bucket modifications
are protected with per-bucket bit lock and additional custom locking,
the tables are resized when load reaches thresholds determined based
on load factor parameter.

Compared to other implementations we rely on:
* fast entry removal by using node unlinking without pre-lookup
* entry rehashing when hash key changes
* entries can contain multiple hash nodes
* custom locking depending on different contexts
* adjustable load factor to customize the grow/shrink process

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>

rculist_bl: add hlist_bl_for_each_entry_continue_rcu

Change the old hlist_bl_first_rcu to hlist_bl_first_rcu_dereference
to indicate that it is an RCU dereference.

Add hlist_bl_next_rcu and hlist_bl_first_rcu to use RCU pointers
and use them to fix sparse warnings.

Add hlist_bl_for_each_entry_continue_rcu.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nfnetlink_queue: remove locking in nfqnl_get_sk_secctx

We don't need the cb lock here.
Also, if skb was NULL we'd have crashed already.

Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nfnetlink_queue: no longer acquire sk_callback_lock

After commit 983512f3a87f ("net: Drop the lock in skb_may_tx_timestamp()")
from Sebastian Andrzej Siewior, apply the same logic in
nfqnl_put_sk_uidgid() to avoid touching sk->sk_callback_lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nfnetlink_log: no longer acquire sk_callback_lock

After commit 983512f3a87f ("net: Drop the lock in skb_may_tx_timestamp()")
from Sebastian Andrzej Siewior, apply the same logic in
__build_packet_message() to avoid touching sk->sk_callback_lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nft_meta: no longer acquire sk_callback_lock in nft_meta_get_eval_skugid()

After commit 983512f3a87f ("net: Drop the lock in skb_may_tx_timestamp()")
from Sebastian Andrzej Siewior, apply the same logic in
nft_meta_get_eval_skugid() to avoid touching sk->sk_callback_lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: xt_owner: no longer acquire sk_callback_lock in mt_owner()

After commit 983512f3a87f ("net: Drop the lock in skb_may_tx_timestamp()")
from Sebastian Andrzej Siewior, apply the same logic in mt_owner()
to avoid touching sk_callback_lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nf_log_syslog: no longer acquire sk_callback_lock in nf_log_dump_sk_uid_gid()

After commit 983512f3a87f ("net: Drop the lock in skb_may_tx_timestamp()")
from Sebastian Andrzej Siewior, apply the same logic in nf_log_dump_sk_uid_gid()
to avoid touching sk_callback_lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nft_fib_ipv6: switch to fib6_lookup

Existing code works but it requires a temporary dst object that is
released again right away.

Switch to fib6_lookup + RT6_LOOKUP_F_DST_NOREF: no need for temporary dst
objects and refcount overhead anymore.

This provides a ~13% improvement in match performance.

Signed-off-by: Florian Westphal <fw@strlen.de>

ipv6: make ipv6_anycast_destination logic usable without dst_entry

nft_fib_ipv6 uses ipv6_anycast_destination(), but an upcoming patch removes
the dst_entry usage in favor of fib6_result.

Move the 'plen > 127' logic to a new helper and call it from the
existing one.

Signed-off-by: Florian Westphal <fw@strlen.de>

ipv6: export fib6_lookup for nft_fib_ipv6

An upcoming patch will call fib6_lookup from nft_fib_ipv6. The EXPORT_SYMBOL
is added twice because there are two implementations of the function (one
is a small stub for MULTIPLE_TABLES=n); only one is compiled into the
kernel, depending on .config settings.

An alternative to EXPORT_SYMBOL is to use an indirect call via the
ipv6_stub->fib6_lookup() indirection, but that's more expensive than the
direct call.

Also, nft_fib_ipv6 cannot be built-in if ipv6 is a module.

Signed-off-by: Florian Westphal <fw@strlen.de>

wifi: mac80211_hwsim: change hwsim_class to a const struct

The class_create() call has been deprecated in favor of class_register()
as the driver core now allows for a struct class to be in read-only
memory. Change hwsim_class to be a const struct class and drop the
class_create() call.

Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
Link: https://patch.msgid.link/20260303165938.3773998-1-jkoolstra@xs4all.nl
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Merge tag 'ath-next-20260303' of https://git.kernel.org/pub/scm/linux/kernel/git/ath/ath

Jeff Johnson says:
==================
ath.git patches for v7.1 (PR #1)

For ath12k:
Add basic hwmon temperature reporting.
Enable monitor mode on IPQ5332.

Also a few small cleanups and bug fixes across ath drivers.
==================

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

selftests: net: add macvlan multicast test for shared source MAC

Add a selftest that verifies multicast delivery to a macvlan bridge
port when the source MAC of the incoming frame matches the macvlan's
own MAC address.

This scenario occurs with protocols like VRRP where multiple hosts
share the same virtual MAC address. Without the corresponding kernel
change, macvlan bridge mode does not handle this case and the
multicast frame is not delivered.

Signed-off-by: Kibaek Yoo <psykibaek@gmail.com>
Link: https://patch.msgid.link/20260228071613.4360-2-psykibaek@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: macvlan: support multicast rx for bridge ports with shared source MAC

Macvlan bridge mode currently does not handle the case where an
external source shares its MAC address with a local macvlan interface.
When such a frame arrives, macvlan_hash_lookup() matches the source
MAC to the local macvlan, and macvlan_multicast_rx() assumes bridge
ports already received the frame during local transmission. Since the
frame actually originated externally, bridge ports never saw it.

This situation arises with protocols like VRRP, where multiple hosts
use the same virtual MAC address.

Support this by passing NULL as the source device and including
MACVLAN_MODE_BRIDGE in the mode mask for the else branch of
macvlan_multicast_rx(). This ensures all VEPA and bridge mode macvlan
interfaces receive incoming multicast regardless of source MAC
matching. The trade-off is that looped-back locally-originated
multicasts may be delivered to bridge ports a second time, but
multicast consumers already handle duplicate frames.

Signed-off-by: Kibaek Yoo <psykibaek@gmail.com>
Link: https://patch.msgid.link/20260228071613.4360-1-psykibaek@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: core: failover: enforce mandatory ops and clean up redundant checks

The failover framework requires 'ops' to be functional. Currently,
failover_register() allows an instance to be registered with NULL
ops, which leads to inconsistent NULL checks and potential NULL
pointer dereferences in the slave registration paths.

Harden the entry point by requiring non-NULL ops in
failover_register(). This ensures the 'fops' pointer is guaranteed
to be valid for any successfully registered failover instance.
Consequently, remove the now redundant NULL checks for 'fops'
throughout the module to simplify the logic.
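A userspace sketch of the "validate once at the entry point" pattern. The types and functions are hypothetical stand-ins (the real failover_register() returns a struct failover pointer rather than an int); the point is that once registration rejects NULL ops, later paths only need to check individual callbacks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_failover_ops {
	int (*slave_register)(void *slave);   /* optional callback */
};

struct demo_failover {
	const struct demo_failover_ops *ops;
};

/* Entry point: refuse to register without ops. */
static int demo_failover_register(struct demo_failover *fo,
				  const struct demo_failover_ops *ops)
{
	if (!ops)
		return -EINVAL;
	fo->ops = ops;
	return 0;
}

/* No NULL check on fo->ops here: registration guarantees it. */
static int demo_slave_register(struct demo_failover *fo, void *slave)
{
	if (fo->ops->slave_register)
		return fo->ops->slave_register(slave);
	return 0;
}

static int demo_cb(void *slave)
{
	(void)slave;
	return 42;
}

static const struct demo_failover_ops demo_ops = { .slave_register = demo_cb };
```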

Signed-off-by: Zeeshan Ahmad <zeeshanahmad022019@gmail.com>
Link: https://patch.msgid.link/20260302064317.9964-1-zeeshanahmad022019@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: netconsole: print diagnostic on busywait timeout in netcons_basic

The script uses set -euo pipefail, so when busywait times out waiting
for the netconsole message to arrive, it returns 1 and the script exits
immediately without printing any error message. As reported by Jakub,
this makes failures hard to diagnose since the test reports exit=1 with
no explanation.

Handle the busywait failure explicitly so that a FAIL message is printed
before exiting. This is how it looks now:

Running with target mode: basic (ipv6)
[ 167.452561] netconsole selftest: netcons_QdMay
FAIL: Timed out waiting (20000 ms) for netconsole message in /tmp/netcons_QdMay

The remaining silent failures under set -e can only happen during the
setup phase (netdevsim creation, interface configuration, configfs
writes), so no silent failures are expected once the test itself
starts.

Note that this issue might be less frequent now, since commit
a68a9bd086c28 ("selftests: netconsole: Increase port listening timeout")
increased the timeout that _might_ have been the root cause of these
random failures in NIPA.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260302-netconsole_test_verbose-v1-1-b1be5d30cd7d@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'grab-ipa-imem-slice-through-dt'

Konrad Dybcio says:

====================
Grab IPA IMEM slice through DT

This adds the necessary driver change to migrate over from
hardcoded-per-IPA-version-but-varying-per-implementation numbers, while
unfortunately keeping them in there for backwards compatibility.

The DT changes will be submitted in a separate series, this one is OK
to merge independently.
====================

Link: https://patch.msgid.link/20260302-topic-ipa_imem-v6-0-c0ebbf3eae9f@oss.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: Grab IMEM slice base/size from DTS

This is a detail that differs per chip, not per IPA version (and
there are cases of the same IPA version being implemented across
very different SoCs).

This region isn't actually used by the driver, but we most definitely
want to iommu-map it, so that IPA can poke at the data within.

Reviewed-by: Alex Elder <elder@riscstar.com>
Acked-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://patch.msgid.link/20260302-topic-ipa_imem-v6-3-c0ebbf3eae9f@oss.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: qcom,ipa: Add sram property for describing IMEM slice

The IPA driver currently grabs a slice of IMEM through hardcoded
addresses. Not only is that ugly and against the principles of DT,
but it also creates a situation where two distinct platforms
implementing the same version of IPA would need to be hardcoded
together and matched at runtime.

Instead, do the sane thing and accept a handle to said region directly.

The property is deliberately not required, as it's absent on ancient
implementations (currently unsupported) and we're not yet done
filling in the data across all DTs.

Reviewed-by: Alex Elder <elder@riscstar.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://patch.msgid.link/20260302-topic-ipa_imem-v6-2-c0ebbf3eae9f@oss.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: sram: qcom,imem: Allow modem-tables subnode

The IP Accelerator hardware/firmware owns a sizeable region within the
IMEM, named 'modem-tables', containing various packet processing
configuration data.

It's not actually accessed by the OS, although we have to IOMMU-map it
with the IPA device, so that presumably the firmware can act upon it.

Allow it as a subnode of IMEM.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Alex Elder <elder@riscstar.com>
Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://patch.msgid.link/20260302-topic-ipa_imem-v6-1-c0ebbf3eae9f@oss.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: macb: use ethtool_sprintf to fill ethtool stats strings

The RISC-V toolchain triggers a stringop-truncation warning when using
snprintf() with a fixed ETH_GSTRING_LEN (32 bytes) buffer.

Convert the driver to use the modern ethtool_sprintf() API from
linux/ethtool.h. This removes the need for manual snprintf() and
memcpy() calls, handles the 32-byte padding automatically, and
simplifies the logic by removing manual pointer arithmetic.
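The cursor-advancing pattern that ethtool_sprintf(u8 **data, const char *fmt, ...) provides can be illustrated with a userspace stand-in. demo_ethtool_sprintf() is hypothetical: it formats into one ETH_GSTRING_LEN slot, zero-fills the slot (an addition for deterministic output in this sketch) and advances the cursor; the per-queue stat names are also made up.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define ETH_GSTRING_LEN 32

/* Stand-in modelled on ethtool_sprintf(): one fixed-size slot per call. */
static void demo_ethtool_sprintf(unsigned char **data, const char *fmt, ...)
{
	va_list args;

	memset(*data, 0, ETH_GSTRING_LEN);
	va_start(args, fmt);
	vsnprintf((char *)*data, ETH_GSTRING_LEN, fmt, args);  /* truncates safely */
	va_end(args);
	*data += ETH_GSTRING_LEN;   /* caller never does pointer arithmetic */
}

/* Hypothetical driver helper: two stat names per queue. */
static void demo_fill_queue_strings(unsigned char *buf, unsigned int nqueues)
{
	unsigned char *p = buf;

	for (unsigned int q = 0; q < nqueues; q++) {
		demo_ethtool_sprintf(&p, "q%u_rx_packets", q);
		demo_ethtool_sprintf(&p, "q%u_rx_bytes", q);
	}
}
```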

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Sean Chang <seanwascoding@gmail.com>
Link: https://patch.msgid.link/20260302142931.49108-1-seanwascoding@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: core: allow netdev_upper_get_next_dev_rcu from bh context

Since XDP programs are called from a NAPI poll context, the RCU
reference liveness is ensured by local_bh_disable().

Commit aeea1b86f936 ("bpf, devmap: Exclude XDP broadcast to master
device") started to call netdev_upper_get_next_dev_rcu() from this
context, but missed adding rcu_read_lock_bh_held() as a condition to the
RCU checks.
While both bh_disabled and rcu_read_lock() provide RCU protection,
lockdep complains since the check condition is insufficient [1].

Add rcu_read_lock_bh_held() as a condition to help lockdep understand
that the dereference is safe, in the same way as commit 694cea395fde
("bpf: Allow RCU-protected lookups to happen from bh context").

[1]
WARNING: net/core/dev.c:8099 at netdev_upper_get_next_dev_rcu+0x96/0xd0, CPU#0: swapper/0/0
...
RIP: 0010:netdev_upper_get_next_dev_rcu+0x96/0xd0
...
  <IRQ>
  dev_map_enqueue_multi+0x411/0x970
  xdp_do_redirect+0xdf2/0x1030
  __igc_xdp_run_prog+0x6a0/0xc80
  igc_poll+0x34b0/0x70b0
  __napi_poll.constprop.0+0x98/0x490
  net_rx_action+0x8f2/0xfa0
  handle_softirqs+0x1c7/0x710
  __irq_exit_rcu+0xb1/0xf0
  irq_exit_rcu+0x9/0x20
  common_interrupt+0x7f/0x90
  </IRQ>

Signed-off-by: Kohei Enju <kohei@enjuk.jp>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260220110922.94781-1-kohei@enjuk.jp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

NFC: s3fwrn5: Replace strcpy() with strscpy()

Replace strcpy() with strscpy() which limits the copy to the size of
the destination buffer. Since fw_info->fw_name is an array with a
fixed, declared size, the two-argument variant of strscpy() is used -
the compiler deduces the buffer size automatically.
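A userspace sketch of the semantics relied on here: copy at most size - 1 characters, always NUL-terminate, and return the copied length or -E2BIG on truncation. demo_strscpy() and the demo_strscpy2() macro are hypothetical stand-ins; unlike the kernel's two-argument form, this macro does not enforce that dst is an array, so passing a pointer would silently use the pointer size.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

static ssize_t demo_strscpy(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -E2BIG;
	len = strnlen(src, size);
	if (len == size) {              /* does not fit: truncate */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -E2BIG;
	}
	memcpy(dst, src, len + 1);      /* copy including the NUL */
	return (ssize_t)len;
}

/* Two-argument form: size taken from the array type of dst. */
#define demo_strscpy2(dst, src) demo_strscpy((dst), (src), sizeof(dst))
```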

This is a defensive cleanup replacing the deprecated strcpy()
with the preferred strscpy().

Signed-off-by: Tomasz Unger <tomasz.unger@yahoo.pl>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260302100908.26399-1-tomasz.unger@yahoo.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

NFC: nfcmrvl: Replace strcpy() with strscpy()

Replace strcpy() with strscpy() which limits the copy to the size of
the destination buffer. Since fw_dnld->name is an array, the
two-argument variant of strscpy() is used - the compiler deduces
the buffer size automatically.

This is a defensive cleanup replacing the deprecated strcpy()
with the preferred strscpy().

Signed-off-by: Tomasz Unger <tomasz.unger@yahoo.pl>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260301144345.218628-1-tomasz.unger@yahoo.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

NFC: nxp-nci: Replace strcpy() with strscpy()

Replace strcpy() with strscpy() which limits the copy to the size of
the destination buffer. Since fw_info->name is an array, the
two-argument variant of strscpy() is used - the compiler deduces
the buffer size automatically.

This is a defensive cleanup replacing the deprecated strcpy()
with the preferred strscpy().

Signed-off-by: Tomasz Unger <tomasz.unger@yahoo.pl>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260301135633.214497-1-tomasz.unger@yahoo.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

NFC: pn544: i2c: Replace strcpy() with strscpy()

Replace strcpy() with strscpy() which limits the copy to the size of
the destination buffer. Since phy->firmware_name is an array, the
two-argument variant of strscpy() is used - the compiler deduces
the buffer size automatically.

This is a defensive cleanup. As pointed out by Jakub Kicinski
<kuba@kernel.org>, firmware_name is already bounded to
NFC_FIRMWARE_NAME_MAXSIZE via nla_strscpy() in net/nfc/netlink.c
before reaching this driver, so no actual buffer overflow is possible.

Signed-off-by: Tomasz Unger <tomasz.unger@yahoo.pl>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260301121254.174354-1-tomasz.unger@yahoo.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ixgbe: refactor: use DECLARE_BITMAP for ring state field

Convert the ring state field from 'unsigned long' to a proper bitmap
using DECLARE_BITMAP macro, aligning with the implementation pattern
already used in the i40e driver.

This change:
- Adds __IXGBE_RING_STATE_NBITS as the bitmap size sentinel to enum
  ixgbe_ring_state_t (consistent with i40e's __I40E_RING_STATE_NBITS)
- Changes 'unsigned long state' to 'DECLARE_BITMAP(state,
  __IXGBE_RING_STATE_NBITS)' in struct ixgbe_ring
- Removes the address-of operator (&) when passing ring->state to bit
  manipulation functions, as bitmap arrays naturally decay to pointers
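A userspace sketch of the pattern: DEMO_DECLARE_BITMAP mirrors the kernel's DECLARE_BITMAP macro, the enum ends with an _NBITS sentinel, and the bit helpers take ring.state without '&' because the array decays to unsigned long *. The enum values and demo_* helpers are hypothetical, not ixgbe's.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITS_TO_LONGS(n) (((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)
#define DEMO_DECLARE_BITMAP(name, bits) unsigned long name[DEMO_BITS_TO_LONGS(bits)]

enum demo_ring_state {
	__DEMO_RING_TX_FDIR_INIT_DONE,
	__DEMO_RING_XDP,
	__DEMO_RING_STATE_NBITS         /* sentinel: must stay last */
};

struct demo_ring {
	DEMO_DECLARE_BITMAP(state, __DEMO_RING_STATE_NBITS);
};

static void demo_set_bit(unsigned int nr, unsigned long *addr)
{
	addr[nr / DEMO_BITS_PER_LONG] |= 1UL << (nr % DEMO_BITS_PER_LONG);
}

static bool demo_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (addr[nr / DEMO_BITS_PER_LONG] >> (nr % DEMO_BITS_PER_LONG)) & 1UL;
}
```

Adding a new state bit before the sentinel automatically grows the bitmap if it ever exceeds one long, which a bare 'unsigned long state' could not do.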

The change maintains functional equivalence while using the
more appropriate kernel bitmap API, consistent with other Intel Ethernet
drivers.

Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

i40e: Add missing wordpart.h header

While cleaning up another header, I hit this build error:

drivers/net/ethernet/intel/i40e/i40e_hmc.h:105:22: error: implicit declaration of function 'upper_32_bits' [-Wimplicit-function-declaration]
105 | val1 = (u32)(upper_32_bits(pa)); \

This is due to a missing header; add it to prevent the issue.

Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

i40e: drop useless bitmap_weight() call in i40e_set_rxfh_fields()

bitmap_weight() is O(N) and useless here, because the following
for_each_set_bit() returns immediately in case of empty flow_pctypes.

Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Make name member of struct ice_cgu_pin_desc const

The name member of struct ice_cgu_pin_desc is never modified.
Make it const.

Found by inspection.
Compile tested only.

Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Merge branch 'net-phy-improve-stats-handling-in-mdio_bus-c'

Heiner Kallweit says:

====================
net: phy: improve stats handling in mdio_bus.c

Improve stats handling in mdio_bus.c.
====================

Link: https://patch.msgid.link/799114be-1456-442b-b479-142e7ee9d254@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: improve mdiobus_stats_acct

- Remove duplicated preempt disable. Disabling preemption has been added
to functions like u64_stats_update_begin() in the meantime.
- Simplify branch structure

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/2ceeb542-986a-404e-ad0f-62e0a938ce7c@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: inline helper mdio_bus_get_global_stat

mdio_bus_get_global_stat() has only one user. Inline it to simplify
the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/7876625a-bd6f-42b4-8eb3-420f39d2f59a@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mdio: use macro __ATTRIBUTE_GROUPS

Use macro __ATTRIBUTE_GROUPS() to simplify the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/260fb184-c662-415c-b288-e1423097f2b9@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mdio: constify attributes and attribute arrays

Constify attributes and attribute arrays, using new member attrs_const
of struct attribute_group.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/c20f17bb-3489-42b5-b8fe-457245ac6cb3@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: avoid extra casting in mdio_bus_get_stat

Using void * instead of char * allows removing one cast.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/054bbf60-d8ac-45ce-8b80-9c396469b7f9@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: consider that mdio_bus_device_stat_field_show doesn't use member address

mdio_bus_device_stat_field_show() doesn't use the address member,
so we don't have to initialize it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/03a812a7-6871-4cc0-b5bf-ee80c6d6b5fd@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mdio: use macro __ATTR to simplify the code

Use macro __ATTR to simplify the code. Note that __ATTR can't be used
in MDIO_BUS_STATS_ADDR_ATTR_DECL because the included stringification
would conflict with how the 'file' argument is passed.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/4877a4dc-247c-4453-b281-20a8d969b15b@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mdio: extend struct mdio_bus_stat_attr instead of using dev_ext_attribute

Currently the var member of struct dev_ext_attribute is used in a very
ugly way. Extend struct mdio_bus_stat_attr instead, which simplifies
the code and also slightly reduces the memory footprint.

Note: Member addr is renamed to avoid a conflict in macro
MDIO_BUS_STATS_ADDR_ATTR_DECL.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/ce9f85d2-4f72-4b15-b868-210a8ced662d@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ti: davinci_emac: stop using bus type mdio_bus_type

This driver is the only user of mdio_bus_type outside phylib.
Using mdio_bus_type isn't strictly needed here, so use an alternative
approach. This will allow making mdio_bus_type private to phylib
in a follow-up series.

Compile-tested only.

Note: Devices supported by this driver are OF-only, therefore the string
comparison in match_first_device() isn't needed any longer.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/cc8e83aa-48c3-4497-b6ad-760a7f9e25dc@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ti: icssg: Add HSR/PRP protocol frame filtering

Add support for HSR and PRP protocol frame filtering in the ICSSG
classifier by configuring filter table 3 (FT3) to detect PTP frames
(EtherType 0x88F7) in HSR/PRP tagged packets.

Also add rx_class_or_base to miig_rt_offsets structure to support
RX_CLASS_OR register access, and fix typos in FT1_N_REG and FT3_N_REG
macros (slize -> slice).

Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://patch.msgid.link/20260227174254.3821443-1-danishanwar@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'dpll-zl3073x-consolidate-chip-info-and-add-temperature-reporting'

Ivan Vecera says:

====================
dpll: zl3073x: consolidate chip info and add temperature reporting

This series refactors the ZL3073x chip variant handling and adds die
temperature reporting for chips that support it.

Patch 1 replaces the five per-variant chip_info structures and their
exported symbols with a single consolidated lookup table. The chip
variant is now detected at runtime from the chip ID register rather
than being selected at compile time via bus driver match data. This
simplifies the I2C/SPI drivers and makes adding new variants a
single-line table addition. A flags field replaces the hardcoded
chip_id switch in zl3073x_dev_is_ref_phase_comp_32bit().

Patch 2 uses the new flags infrastructure to add die temperature
reporting for chip variants that provide a temperature status register.
The temp_get callback is conditionally set during device registration
based on the ZL3073X_FLAG_DIE_TEMP chip flag.
====================

Link: https://patch.msgid.link/20260227105300.710272-1-ivecera@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dpll: zl3073x: add die temperature reporting for supported chips

Some zl3073x chip variants (0x1Exx, 0x2Exx and 0x3FC4) provide a die
temperature status register with 0.1 C resolution.

Add a ZL3073X_FLAG_DIE_TEMP chip flag to identify these variants and
implement zl3073x_dpll_temp_get() as the dpll_device_ops.temp_get
callback. The register value is converted from 0.1 C units to
millidegrees as expected by the DPLL subsystem.
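The unit conversion is a single multiplication: one 0.1 C step equals 100
millidegrees. A minimal sketch, assuming the raw register value is a signed
16-bit count of 0.1 C steps (the register width and signedness are
assumptions, and the helper name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* raw: die temperature in 0.1 C units (assumed signed 16-bit) */
static int32_t deci_celsius_to_millidegrees(int16_t raw)
{
	/* 1 unit = 0.1 C = 100 millidegrees C; widen before multiplying */
	return (int32_t)raw * 100;
}
```

For example, a raw reading of 453 (45.3 C) is reported as 45300.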

To support per-instance ops selection, copy the base dpll_device_ops
into struct zl3073x_dpll and conditionally set .temp_get during device
registration based on the chip flag.
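The per-instance ops pattern can be sketched with simplified stand-in types.
The struct, flag, and callback names below are hypothetical and are not the
real dpll_device_ops or zl3073x definitions; the sketch only shows why a copy
is needed: the shared base table stays const and untouched, while each
instance enables temp_get on its own copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the DPLL ops table and driver state */
struct dpll_ops_sketch {
	int (*mode_get)(void *priv, int *mode);
	int (*temp_get)(void *priv, int32_t *temp);
};

#define SKETCH_FLAG_DIE_TEMP	(1u << 1)	/* hypothetical bit */

struct dpll_sketch {
	unsigned int chip_flags;
	struct dpll_ops_sketch ops;	/* per-instance copy of the ops */
};

static int base_mode_get(void *priv, int *mode)
{
	(void)priv;
	*mode = 0;
	return 0;
}

static int die_temp_get(void *priv, int32_t *temp)
{
	(void)priv;
	*temp = 45300;	/* fixed value for the sketch, in millidegrees */
	return 0;
}

/* Shared template: no temp_get, so chips without the sensor never expose it */
static const struct dpll_ops_sketch base_ops = {
	.mode_get = base_mode_get,
};

static void dpll_sketch_init_ops(struct dpll_sketch *d)
{
	d->ops = base_ops;			/* copy the shared template */
	if (d->chip_flags & SKETCH_FLAG_DIE_TEMP)
		d->ops.temp_get = die_temp_get;	/* enable only where supported */
}
```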

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260227105300.710272-3-ivecera@redhat.com
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dpll: zl3073x: detect DPLL channel count from chip ID at runtime

Replace the five per-variant zl3073x_chip_info structures and their
exported symbol definitions with a single consolidated chip ID lookup
table. The chip variant is now detected at runtime by reading the chip
ID register from hardware and looking it up in the table, rather than
being selected at compile time via the bus driver match data.

Repurpose struct zl3073x_chip_info to hold a single chip ID, its
channel count, and a flags field. Introduce enum zl3073x_flags with
ZL3073X_FLAG_REF_PHASE_COMP_32 to replace the chip_id switch statement
in zl3073x_dev_is_ref_phase_comp_32bit(). Store a pointer to the
detected chip_info entry in struct zl3073x_dev for runtime access.
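A consolidated lookup table of this shape might look like the sketch below.
The chip IDs, channel counts, and flag assignments are placeholders, not the
real ZL3073x values, and all names are hypothetical; the point is that a new
variant becomes one table row and feature checks become flag tests instead of
a switch on chip_id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum chip_flags_sketch {
	CHIP_FLAG_REF_PHASE_COMP_32	= 1u << 0,
	CHIP_FLAG_DIE_TEMP		= 1u << 1,
};

struct chip_info_sketch {
	uint16_t id;		/* value read from the chip ID register */
	uint8_t num_channels;	/* DPLL channel count */
	unsigned int flags;
};

/* Placeholder entries: IDs, counts, and flags are illustrative only */
static const struct chip_info_sketch chip_table[] = {
	{ 0x0A01, 2, 0 },
	{ 0x0A02, 3, CHIP_FLAG_REF_PHASE_COMP_32 },
	{ 0x0A03, 5, CHIP_FLAG_REF_PHASE_COMP_32 | CHIP_FLAG_DIE_TEMP },
};

/* Runtime detection: match the ID read from hardware against the table */
static const struct chip_info_sketch *chip_lookup(uint16_t id)
{
	for (size_t i = 0; i < sizeof(chip_table) / sizeof(chip_table[0]); i++)
		if (chip_table[i].id == id)
			return &chip_table[i];
	return NULL;	/* unknown chip: probe would fail */
}

/* Flag test replaces the former switch on chip_id */
static int is_ref_phase_comp_32bit(const struct chip_info_sketch *info)
{
	return (info->flags & CHIP_FLAG_REF_PHASE_COMP_32) != 0;
}
```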

This simplifies the bus drivers by removing per-variant .data and
.driver_data references from the I2C/SPI match tables, and makes
adding support for new chip variants a single-line table addition.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260227105300.710272-2-ivecera@redhat.com
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

wifi: mac80211: give the AP more time for EPPKE as well

EPPKE authentication can use SAE (via PASN), so give the
AP more time to respond in the EPPKE case, just as for SAE.
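The change amounts to widening the condition that selects the longer
authentication timeout. A sketch with hypothetical algorithm identifiers and
millisecond values; the real mac80211 constants, units, and values may
differ:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers and timeouts, not mac80211's own */
enum auth_alg_sketch { AUTH_OPEN, AUTH_SAE, AUTH_EPPKE };

#define AUTH_TIMEOUT_MS		200
#define AUTH_TIMEOUT_LONG_MS	2000

static bool needs_long_auth_timeout(enum auth_alg_sketch alg)
{
	/* EPPKE may run SAE via PASN, so treat it like SAE */
	return alg == AUTH_SAE || alg == AUTH_EPPKE;
}

static unsigned int auth_timeout_ms(enum auth_alg_sketch alg)
{
	return needs_long_auth_timeout(alg) ? AUTH_TIMEOUT_LONG_MS
					    : AUTH_TIMEOUT_MS;
}
```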

Link: https://patch.msgid.link/20260128132414.881741-2-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

net: mana: Trigger VF reset/recovery on health check failure due to HWC timeout

The periodic GF stats query is used as a mechanism to monitor HWC
health. If this HWC command times out, it is a strong indication that
the device/SoC is in a faulty state and requires recovery.

Today, when a timeout is detected, the driver marks
hwc_timeout_occurred, clears cached stats, and stops rescheduling the
periodic work. However, the device itself is left in the same failing
state.

Extend the timeout handling path to trigger the existing MANA VF
recovery service by queueing a GDMA_EQE_HWC_RESET_REQUEST work item.
This is expected to initiate the appropriate recovery flow: a
suspend/resume first and, if that fails, a bus rescan.

This change is intentionally limited to HWC command timeouts and does
not trigger recovery for errors reported by the SoC as a normal command
response.
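The resulting decision can be modelled as a small state update. This is a
hypothetical sketch: the struct, enum, and function names are not the MANA
driver's, and the reset_requested field merely stands in for queueing a
GDMA_EQE_HWC_RESET_REQUEST work item. It shows that only a timeout triggers
recovery, while a normal command response carrying a SoC-reported error does
not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the health-check completion path */
enum hwc_result { HWC_OK, HWC_SOC_ERROR, HWC_TIMEOUT };

struct mana_sketch {
	bool hwc_timeout_occurred;
	bool stats_valid;
	bool periodic_work_scheduled;
	bool reset_requested;	/* stands in for queueing the reset request */
};

static void on_stats_query_done(struct mana_sketch *m, enum hwc_result r)
{
	if (r != HWC_TIMEOUT)
		return;	/* normal responses, including SoC errors: no recovery */

	m->hwc_timeout_occurred = true;
	m->stats_valid = false;			/* existing: clear cached stats */
	m->periodic_work_scheduled = false;	/* existing: stop rescheduling */
	m->reset_requested = true;		/* new: request VF recovery */
}
```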

Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/aaFShvKnwR5FY8dH@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

atm: atmdev: add function parameter names and description

kernel-doc warns that function parameters are not described when those
parameters are unnamed. Add parameter names to these function prototypes
and describe the parameters in kernel-doc format.

Fixes these warnings:
Warning: include/linux/atmdev.h:316 function parameter '' not described in 'register_atm_ioctl'
Warning: include/linux/atmdev.h:321 function parameter '' not described in 'deregister_atm_ioctl'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20260228220845.2978547-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dccp: Remove inet_hashinfo2_init_mod().

Commit c92c81df93df ("net: dccp: fix kernel crash on module load")
added inet_hashinfo2_init_mod() for DCCP.

Commit 22d6c9eebf2e ("net: Unexport shared functions for DCCP.")
removed its EXPORT_SYMBOL_GPL() but forgot to remove the function
itself.

Let's remove inet_hashinfo2_init_mod().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260301063756.1581685-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ipmr-no-rtnl-for-rtnl_family_ipmr-rtnetlink'

Kuniyuki Iwashima says:

====================
ipmr: No RTNL for RTNL_FAMILY_IPMR rtnetlink.

This series removes RTNL from ipmr rtnetlink handlers.

After this series, a few RTNL users remain in net/ipv4/ipmr.c;
they will be converted to per-netns RTNL in another series.

Patch 1 adds a selftest to exercise (hopefully) most of the RTNL
paths in net/ipv4/ipmr.c.

Patches 2-6 convert the RTM_GETLINK / RTM_GETROUTE handlers
to RCU.

Patches 7-9 convert ->exit_batch() to ->exit_rtnl() to
save one RTNL in cleanup_net().

Patches 10-11 remove unnecessary RTNL during setup_net()
failure.

Patch 12 is a random cleanup.

Patches 13-15 drop RTNL for RTM_NEWROUTE and RTM_DELROUTE.
====================

Link: https://patch.msgid.link/20260228221800.1082070-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipmr: Don't hold RTNL for ipmr_rtm_route().

ipmr_mfc_add() and ipmr_mfc_delete() are already protected
by a dedicated mutex.

rtm_to_ipmr_mfcc() calls __ipmr_get_table(), __dev_get_by_index(),
and ipmr_find_vif().

Once __dev_get_by_index() is converted to dev_get_by_index_rcu(),
we can move the other two functions under that same RCU section
and drop RTNL for ipmr_rtm_route().

Let's do that conversion and drop ASSERT_RTNL() in
mr_call_mfc_notifiers().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260228221800.1082070-16-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipmr: Add dedicated mutex for mrt->{mfc_hash,mfc_cache_list}.

We will no longer hold RTNL for ipmr_rtm_route() to modify the
MFC hash table.

Only __dev_get_by_index() in rtm_to_ipmr_mfcc() depends on RTNL;
otherwise, we just need protection for mrt->mfc_hash and
mrt->mfc_cache_list.

Let's add a new mutex for ipmr_mfc_add(), ipmr_mfc_delete(),
and mroute_clean_tables() (setsockopt(MRT_FLUSH or MRT_DONE)).

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260228221800.1082070-15-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipmr/ip6mr: Convert net->ipv[46].ipmr_seq to atomic_t.

We will no longer hold RTNL for ipmr_mfc_add() and ipmr_mfc_delete().

An MFC entry is only loosely connected to a VIF through its index
into mrt->vif_table[] (stored in mfc_parent), and the two tables are
not synchronised; e.g. even if VIF 1 is removed, the MFC entry for
VIF 1 is not automatically removed.

The only field that the MFC/VIF interfaces share is
net->ipv[46].ipmr_seq, which is protected by RTNL.

Adding a new mutex for both just to protect a single field is overkill.

Let's convert the field to atomic_t.
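As a userspace analogue, C11 atomic_int can stand in for the kernel's
atomic_t (where atomic_inc() and atomic_read() would be used). In this
hypothetical sketch the struct and function names are illustrative only; it
shows that the MFC and VIF update paths can both bump the shared sequence
counter without agreeing on a common lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-netns state; atomic_int models the kernel's atomic_t */
struct netns_ipmr_sketch {
	atomic_int ipmr_seq;
};

/* MFC add/delete path: no RTNL, only its own lock in the real code */
static void mfc_changed(struct netns_ipmr_sketch *n)
{
	atomic_fetch_add(&n->ipmr_seq, 1);
}

/* VIF add/delete path: may run under a different lock */
static void vif_changed(struct netns_ipmr_sketch *n)
{
	atomic_fetch_add(&n->ipmr_seq, 1);
}

/* Readers sample the counter without taking either lock */
static int seq_read(struct netns_ipmr_sketch *n)
{
	return atomic_load(&n->ipmr_seq);
}
```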

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260228221800.1082070-14-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipmr: Define net->ipv4.{ipmr_notifier_ops,ipmr_seq} under CONFIG_IP_MROUTE.

net->ipv4.ipmr_notifier_ops and net->ipv4.ipmr_seq are used
only in net/ipv4/ipmr.c.

Let's move these definitions under CONFIG_IP_MROUTE.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260228221800.1082070-13-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipmr: Call fib_rules_unregister() without RTNL.

fib_rules_unregister() removes ops from net->rules_ops under
spinlock, calls ops->delete() for each rule, and frees the ops.

ipmr_rules_ops_template does not have ->delete(), and no
operation there requires RTNL.

Let's move fib_rules_unregister() from ipmr_rules_exit_rtnl()
to ipmr_net_exit().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260228221800.1082070-12-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>