Mohsin Bashir [Mon, 2 Mar 2026 23:01:49 +0000 (15:01 -0800)]
eth: mlx5: Move pause storm errors to pause stats
Report device_stall_critical_watermark_cnt as tx_pause_storm_events in
the ethtool_pause_stats struct. This counter tracks pause storm error
events which indicate the NIC has been sending pause frames for an
extended period due to a stall.
The ethtool_pause_stats struct reports these stalls as a single value,
whereas the device supports tracking them per priority. Aggregate the
counter across all priority classes to capture stalls on all priorities.
Note that the stats are fetched from the device for each priority via
mlx5_core_access_reg().
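The aggregation described above can be sketched as follows. This is an illustrative model only, not the driver code: read_stall_counter() and the list of per-priority values are hypothetical stand-ins for the per-priority register reads, and the count of 8 priorities is an assumption.

```python
# Illustrative model only; names and the priority count are assumptions.
NUM_PRIORITIES = 8  # assumption: one stall counter per priority class


def read_stall_counter(per_prio_counters, prio):
    """Hypothetical stand-in for one per-priority device register read."""
    return per_prio_counters[prio]


def tx_pause_storm_events(per_prio_counters):
    """Sum the per-priority stall counters into the single reported value."""
    return sum(read_stall_counter(per_prio_counters, prio)
               for prio in range(NUM_PRIORITIES))
```

A stall on any priority therefore shows up in the single tx_pause_storm_events value, at the cost of losing the per-priority breakdown.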
Mohsin Bashir [Mon, 2 Mar 2026 23:01:48 +0000 (15:01 -0800)]
eth: fbnic: Fetch TX pause storm stats
With pause storm protection in place, track the occurrence of pause
storm events. Since there is a one-to-one mapping between pause storm
interrupts and events, use the interrupt count to track this metric.
./ethtool -I -a eth0
Pause parameters for eth0:
Autonegotiate: off
RX: off
TX: on
Statistics:
tx_pause_frames: 759657
rx_pause_frames: 0
tx_pause_storm_events: 219
Mohsin Bashir [Mon, 2 Mar 2026 23:01:47 +0000 (15:01 -0800)]
eth: fbnic: Add protection against pause storm
Add protection against TX pause storms. A pause storm occurs when a
device fails to send received packets up to the stack. When a pause
storm is detected (pause state persists beyond the configured timeout),
the device stops sending the pause frames and begins dropping packets
instead of back-pressuring.
The timeout is configurable via the ethtool tunable (pfc-prevention-tout),
with a maximum value of 10485ms and a default value of 500ms.
Once the device transitions to the storm-detected state, the service
task periodically attempts recovery, returning the device to normal
operation to handle any subsequent pause storm episodes.
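The detection rule can be sketched roughly as below. This is a model under stated assumptions, not the fbnic implementation: the function names are hypothetical, only the range check implied by the stated maximum is modelled, and any special meaning of particular timeout values is left out.

```python
# Illustrative model only; function names are hypothetical.
PAUSE_STORM_TOUT_DEFAULT_MS = 500   # default from the commit message
PAUSE_STORM_TOUT_MAX_MS = 10485     # maximum from the commit message


def validate_timeout_ms(timeout_ms):
    """Reject negative values and values above the stated maximum."""
    if timeout_ms < 0 or timeout_ms > PAUSE_STORM_TOUT_MAX_MS:
        raise ValueError("pause storm timeout out of range")
    return timeout_ms


def storm_detected(pause_duration_ms,
                   timeout_ms=PAUSE_STORM_TOUT_DEFAULT_MS):
    """A storm is flagged once the pause state outlasts the timeout."""
    return pause_duration_ms > validate_timeout_ms(timeout_ms)
```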
Mohsin Bashir [Mon, 2 Mar 2026 23:01:46 +0000 (15:01 -0800)]
net: ethtool: Update doc for tunable
ETHTOOL_PFC_PREVENTION_TOUT enables the configuration of a timeout value
for PFC storm prevention. It can also be used to configure the storm
detection timeout for global pause settings. In fact, some existing
drivers already use it for that purpose.
Document that the knob can formally be used to configure the timeout
value for the pause storm prevention mechanism. The update to the ethtool
man page will follow afterwards.
Mohsin Bashir [Mon, 2 Mar 2026 23:01:45 +0000 (15:01 -0800)]
net: ethtool: Track pause storm events
With TX pause enabled, if a device is unable to pass packets up to the
stack (e.g., the CPU is hung), the device can cause a pause storm. Given
that devices can have native support to protect the neighbor from such
flooding, such events need some tracking. Add support for tracking TX
pause storm events for better observability.
====================
gve: optimize and enable HW GRO for DQO
The DQO device has always performed HW GRO, not LRO. This series updates
the feature bit and modifies the RX path to enhance support. It sets
gso_segs correctly so the software stack can continue coalescing, and
pulls network headers into the skb linear space to avoid multiple small
memory copies when header-split is disabled.
We also enable HW GRO by default on supported devices.
====================
Ankit Garg [Tue, 3 Mar 2026 19:55:49 +0000 (11:55 -0800)]
gve: Enable hw-gro by default if device supported
Change the driver's default behavior to enable hw-gro whenever the
device supports it.
Performance observations:
- We observed ~10% improvement in RX single stream throughput across
various MTU sizes.
- No change in TCP_RR/TCP_CRR latencies
Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260303195549.2679070-5-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Ankit Garg [Tue, 3 Mar 2026 19:55:48 +0000 (11:55 -0800)]
gve: pull network headers into skb linear part
Currently, in DQO mode with hw-gro enabled, the entire received packet is
placed into skb fragments when header-split is disabled. This leaves
the skb linear part empty, forcing the networking stack to do multiple
small memory copies to access eth, IP and TCP headers.
This patch adds a single memcpy to put all headers into the linear portion
before the packet reaches the SW GRO stack, thus eliminating multiple
smaller memcpy calls.
Additionally, the criteria for calling napi_gro_frags() was updated.
Since skb->head is now populated, we instead check if the SKB is the
cached NAPI scratchpad to ensure we continue using the zero-allocation
path.
Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260303195549.2679070-4-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Ankit Garg [Tue, 3 Mar 2026 19:55:47 +0000 (11:55 -0800)]
gve: fix SW coalescing when hw-GRO is used
Leaving gso_segs unpopulated on hardware GRO packets prevents further
coalescing by the software stack: the kernel's GRO logic marks the
SKB for flush because the expected length of all segments doesn't match
the actual payload length.
Setting gso_segs correctly results in significantly more segments being
coalesced as measured by the result of dev_gro_receive().
gso_segs is derived from the payload length. When header-split is enabled,
the payload is in the non-linear portion of the skb. When header-split is
disabled, we have to parse the headers to determine the payload length.
Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jordan Rhee <jordanrhee@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260303195549.2679070-3-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
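A minimal sketch of how a segment count can be derived from the payload length. It assumes the usual round-up division of payload length by gso_size; the commit message only says that gso_segs is derived from the payload length, so the exact formula and the helper names here are assumptions.

```python
# Illustrative model only; helper names and the formula are assumptions.
def tcp_payload_len(total_len, eth_hlen, ip_hlen, tcp_hlen):
    """Payload length when headers and payload share one buffer
    (header-split disabled): subtract the parsed header lengths."""
    return total_len - (eth_hlen + ip_hlen + tcp_hlen)


def gso_segs(payload_len, gso_size):
    """Number of segments needed to carry payload_len bytes, rounded up."""
    if gso_size <= 0:
        raise ValueError("gso_size must be positive")
    return (payload_len + gso_size - 1) // gso_size
```

For example, a 5858-byte coalesced frame with 14 + 20 + 32 bytes of headers carries 5792 bytes of payload, which is four segments at a gso_size of 1448.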
Qingfang Deng [Tue, 3 Mar 2026 09:32:19 +0000 (17:32 +0800)]
ppp: don't store tx skb in the fastpath
Currently, ppp->xmit_pending is used in ppp_send_frame() to pass a skb
to ppp_push(), and holds the skb when a PPP channel cannot immediately
transmit it. This state is redundant because the transmit queue
(ppp->file.xq) can already handle the backlog. Furthermore, during
normal operation, an skb is queued in file.xq only to be immediately
dequeued, causing unnecessary overhead.
Refactor the transmit path to avoid stashing the skb when possible:
- Remove ppp->xmit_pending.
- Rename ppp_send_frame() to ppp_prepare_tx_skb(), and don't call
ppp_push() in it. It returns 1 if the skb is consumed
(dropped/handled) or 0 if it can be passed to ppp_push().
- Update ppp_push() to accept the skb. It returns 1 if the skb is
consumed, or 0 if the channel is busy.
- Optimize __ppp_xmit_process():
- Fastpath: If the queue is empty, attempt to send the skb directly
via ppp_push(). If busy, queue it.
- Slowpath: If the queue is not empty, process the backlog in
file.xq. Split dequeuing loop into a separate function
ppp_xmit_flush() so ppp_channel_push() uses that directly instead of
passing a NULL skb to __ppp_xmit_process().
This simplifies the states and reduces locking in the fastpath.
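The fastpath/slowpath split above can be modelled roughly as follows. This is a sketch, not the kernel code: Channel is a hypothetical stand-in for a PPP channel, the ppp_prepare_tx_skb() step and all locking are omitted, and appending the new skb to the backlog before flushing in the slowpath is an assumption.

```python
from collections import deque


class Channel:
    """Hypothetical channel that accepts up to `capacity` frames."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.sent = []

    def push(self, skb):
        """Return True if the skb was consumed, False if the channel is busy."""
        if len(self.sent) >= self.capacity:
            return False
        self.sent.append(skb)
        return True


def xmit_flush(xq, chan):
    """Drain the backlog in order until the channel reports busy."""
    while xq:
        if not chan.push(xq[0]):
            return
        xq.popleft()


def xmit_process(xq, chan, skb):
    """Fastpath: empty queue, try a direct send and queue only if busy.
    Slowpath: keep ordering by queueing behind the backlog, then flush."""
    if not xq:
        if not chan.push(skb):
            xq.append(skb)
        return
    xq.append(skb)
    xmit_flush(xq, chan)
```

In the common case the skb never touches the queue, which is the overhead the refactor is meant to remove.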
Paolo Abeni [Thu, 5 Mar 2026 10:32:49 +0000 (11:32 +0100)]
Merge tag 'nf-next-26-03-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:
====================
netfilter: updates for net-next
The following patchset contains Netfilter updates for *net-next*,
including changes to IPv6 stack and updates to IPVS from Julian Anastasov.
1) ipv6: export fib6_lookup for nft_fib_ipv6 module
2) factor out ipv6_anycast_destination logic so it's usable without
dst_entry. These are dependencies for patch 3.
3) switch nft_fib_ipv6 module to no longer need temporary dst_entry
object allocations by using fib6_lookup() + RCU.
This gets us ~13% higher packet rate in my tests.
Patches 4 to 8, from Eric Dumazet, zap sk_callback_lock usage in
netfilter. Patch 9 removes another sk_callback_lock instance.
Remaining patches, from Julian Anastasov, improve IPVS. Quoting Julian:
* Add infrastructure for resizable hash tables based on hlist_bl.
* Change the 256-bucket service hash table to be resizable.
* Change the global connection table to be per-net and resizable.
* Make connection hashing more secure for setups with multiple services.
netfilter pull request nf-next-26-03-04
* tag 'nf-next-26-03-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
ipvs: use more keys for connection hashing
ipvs: switch to per-net connection table
ipvs: use resizable hash table for services
ipvs: add resizable hash tables
rculist_bl: add hlist_bl_for_each_entry_continue_rcu
netfilter: nfnetlink_queue: remove locking in nfqnl_get_sk_secctx
netfilter: nfnetlink_queue: no longer acquire sk_callback_lock
netfilter: nfnetlink_log: no longer acquire sk_callback_lock
netfilter: nft_meta: no longer acquire sk_callback_lock in nft_meta_get_eval_skugid()
netfilter: xt_owner: no longer acquire sk_callback_lock in mt_owner()
netfilter: nf_log_syslog: no longer acquire sk_callback_lock in nf_log_dump_sk_uid_gid()
netfilter: nft_fib_ipv6: switch to fib6_lookup
ipv6: make ipv6_anycast_destination logic usable without dst_entry
ipv6: export fib6_lookup for nft_fib_ipv6
====================
====================
amd-xgbe: add support for P100a platform
This patch series adds support for the AMD P100a platform featuring
the ethernet controller PCI device ID 0x1122.
The P100a platform uses different register access patterns and speed
encoding compared to previous generation hardware (Yellow Carp, etc.).
Key differences include:
1. Different XPCS window offset calculation due to changed memory mapping
2. 2.5G speed uses XGMII mode (ss=0x06) instead of GMII (ss=0x02)
3. Extended port speed bits (6-bit instead of 5-bit) for 5G support
The series is organized as follows:
Patch 1: Defines macros for MAC version numbers and speed select values
to replace hardcoded magic numbers
Patch 2: Adds the core P100a platform support with PCI ID,
register configuration, and version-specific behavior
Tested on AMD P100a platform verifying:
- 10G/2.5G/1G/100M link establishment
- PHY initialization and auto-negotiation
- No register access errors
====================
Raju Rangoju [Mon, 2 Mar 2026 04:46:34 +0000 (10:16 +0530)]
amd-xgbe: add support for P100a platform
Add hardware support for the AMD P100a platform featuring the ethernet
controller PCI device ID 0x1122.
Platform-specific changes include:
1. PCI device ID and register configuration:
- Add XGBE_P100a_PCI_DEVICE_ID (0x1122) for recognition
- Configure platform-specific XPCS window registers
- Disable CDR workaround and RRC for this platform
2. XPCS window offset calculation fix:
The P100a platform uses a different memory mapping scheme for XPCS
register access. The offset calculation differs between platforms:
- Older platforms (YC): offset = base + (addr & mask)
The address is masked first, then added to the window base.
- P100a: offset = (base + addr) & mask
The full address is added to base first, then masked.
This is critical because using the wrong calculation causes register
reads/writes to access incorrect addresses, leading to incorrect
behaviour.
3. 2.5G speed mode handling:
P100a uses XGMII mode (ss=0x06) for 2.5G instead of GMII mode
(ss=0x02) used by older platforms. The MAC version check determines
which mode to use.
4. Port speed bits extended:
Extend XP_PROP_0_PORT_SPEEDS from 5 bits to 6 bits to support the
additional 5G speed capability.
5. Rx adaptation disabled:
Rx adaptation is disabled for P100a (MAC version 0x33) as this
feature requires further development for this platform.
6. Rate change command for 2.5G:
Use XGBE_MB_SUBCMD_2_5G_KX subcommand for 2.5G mode on P100a
instead of XGBE_MB_SUBCMD_NONE used on older platforms.
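The two XPCS offset calculations from item 2 can be compared side by side. This is a sketch using the formulas as written in the commit message; the function names and the base, address and mask values in the usage note are made up for illustration.

```python
# Illustrative model only; function names are hypothetical.
def xpcs_offset_legacy(base, addr, mask):
    """Older platforms: mask the address first, then add the window base."""
    return base + (addr & mask)


def xpcs_offset_p100a(base, addr, mask):
    """P100a: add the full address to the base first, then mask."""
    return (base + addr) & mask
```

With base=0x9000, addr=0x1F0010 and mask=0xFFF, the legacy formula yields 0x9010 while the P100a formula yields 0x010, so applying the wrong variant lands on a different register.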
Jori Koolstra [Tue, 3 Mar 2026 16:31:04 +0000 (17:31 +0100)]
dibs: change dibs_class to a const struct
The class_create() call has been deprecated in favor of class_register()
as the driver core now allows for a struct class to be in read-only
memory. Change dibs_class to be a const struct class and drop the
class_create() call.
Jakub Kicinski [Tue, 3 Mar 2026 21:36:25 +0000 (13:36 -0800)]
selftests: drv-net: update the README
I have added some instructions for driver authors on the NIPA wiki:
https://github.com/linux-netdev/nipa/wiki/Guidance-for-test-authors
last year. Given the increasingly common use of LLMs let's add those
in tree as well. Hopefully this will decrease the number of review
comments we have to give to AI-assisted noobs.
While at it sync the overall instructions with what's on the GitHub
as well.
selftests: drv-net: rss: Fix error calculation in test_hitless_key_update
This test verifies there are no errors when a device's RSS key is updated
while traffic is flowing. The current check is a no-op since the last
sample was subtracted from itself.
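The class of bug described above, reduced to its essence. This is an illustrative sketch, not the selftest's code; the function and parameter names are hypothetical.

```python
# Illustrative model only; names are hypothetical.
def error_delta_buggy(before, after):
    """Subtracts the last sample from itself, so the result is always 0."""
    return after - after


def error_delta_fixed(before, after):
    """Compares the last sample against the one taken before the update."""
    return after - before
```

With the buggy form the check passes no matter how many errors accumulated while the key was being updated.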
Jakub Kicinski [Thu, 5 Mar 2026 02:37:21 +0000 (18:37 -0800)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2026-03-02 (ice, i40e, ixgbe)
For ice:
Simon Horman adds const modifier to read only member of a struct.
For i40e:
Yury Norov removes an unneeded check of bitmap_weight().
Andy Shevchenko adds a missing include.
For ixgbe:
Aleksandr changes declaration of a bitmap to utilize DECLARE_BITMAP()
macro.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ixgbe: refactor: use DECLARE_BITMAP for ring state field
i40e: Add missing wordpart.h header
i40e: drop useless bitmap_weight() call in i40e_set_rxfh_fields()
ice: Make name member of struct ice_cgu_pin_desc const
====================
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20260304012747.881644-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Randy Dunlap [Wed, 4 Mar 2026 01:24:37 +0000 (17:24 -0800)]
net: openvswitch: clean up some kernel-doc warnings
Fix some kernel-doc warnings in openvswitch.h:
Mark enum placeholders that are not used as "private" so that kernel-doc
comments are not needed for them.
Correct names for 2 enum values:
Warning: include/uapi/linux/openvswitch.h:300 Excess enum value
'@OVS_VPORT_UPCALL_SUCCESS' description in 'ovs_vport_upcall_attr'
Warning: include/uapi/linux/openvswitch.h:300 Excess enum value
'@OVS_VPORT_UPCALL_FAIL' description in 'ovs_vport_upcall_attr'
Convert one comment from "/**" kernel-doc to a plain C "/*" comment:
Warning: include/uapi/linux/openvswitch.h:638 This comment starts with
'/**', but isn't a kernel-doc comment.
* Omit attributes for notifications.
Add more kernel-doc:
- add kernel-doc for kernel-only enums;
- add missing kernel-doc for enum ovs_datapath_attr;
- add missing kernel-doc for enum ovs_flow_attr;
- add missing kernel-doc for enum ovs_sample_attr;
- add kernel-doc for enum ovs_check_pkt_len_attr;
- add kernel-doc for enum ovs_action_attr;
- add kernel-doc for enum ovs_action_push_eth;
- add kernel-doc for enum ovs_vport_attr;
====================
tools: ynl: tests: adjust Makefile to mimic ksft
Make a few minor adjustments to tools/net/ynl/tests/Makefile
to align its behavior more with how real kselftests behave.
This series allows running the YNL tests in NIPA with little
extra integration effort.
If anyone already integrated these tests into their CI minor
adjustments to the integration may be needed (due to patch 2).
====================
Jakub Kicinski [Tue, 3 Mar 2026 16:35:03 +0000 (08:35 -0800)]
tools: ynl: support INSTALL_PATH in the tests Makefile
We have modelled the YNL tests after ksft to be able to reuse
the NIPA wrappers. Make sure YNL honors INSTALL_PATH not just
DESTDIR, ksft uses INSTALL_PATH.
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260303163504.2084981-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 3 Mar 2026 16:35:02 +0000 (08:35 -0800)]
tools: ynl: don't install tests in /usr/bin/
Until commit 790792ebc960 ("tools: ynl: don't install tests")
YNL selftests were installed with all the other YNL outputs.
That's no longer the case, as tests are not really production
artifacts. Let's not install them in /usr/bin at all, and
mirror kselftest format more closely:
For: make -C tools/net/ynl/tests/ install DESTDIR=tmp
Jakub Kicinski [Tue, 3 Mar 2026 16:35:01 +0000 (08:35 -0800)]
tools: ynl: rename TESTS variable to TEST_PROGS
Use the standard kselftest variable naming for tests in the Makefile.
NIPA depends on being able to selectively target tests by setting
those variables on the CLI.
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260303163504.2084981-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Golle [Tue, 3 Mar 2026 03:17:43 +0000 (03:17 +0000)]
net: dsa: mxl862xx: rename MDIO op arguments
The use of the 'port' argument name for functions implementing the MDIO
bus operations is misleading as the port address isn't equal to the
PHY address.
Rename the MDIO operation argument name to match the prototypes of
mdiobus_write, mdiobus_read, mdiobus_c45_read and mdiobus_c45_write.
Daniel Golle [Tue, 3 Mar 2026 03:16:23 +0000 (03:16 +0000)]
dt-bindings: net: dsa: maxlinear,mxl862xx: remove port label
The ports in the example device tree should not have a 'label'
property. Labels for all user ports have been removed from an earlier
submission, but this was overlooked in the case of the CPU port.
Jakub Kicinski [Wed, 4 Mar 2026 23:30:04 +0000 (15:30 -0800)]
Merge tag 'wireless-next-2026-03-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
Notable features this time:
- cfg80211/mac80211
- finished assoc frame encryption/EPPKE/802.1X-over-auth
(also hwsim)
- radar detection improvements
- 6 GHz incumbent signal detection APIs
- multi-link support for FILS, probe response
templates and client probing
- ath12k:
- monitor mode support on IPQ5332
- basic hwmon temperature reporting
* tag 'wireless-next-2026-03-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (38 commits)
wifi: UHR: define DPS/DBE/P-EDCA elements and fix size parsing
wifi: mac80211_hwsim: change hwsim_class to a const struct
wifi: mac80211: give the AP more time for EPPKE as well
wifi: ath12k: Remove the unused argument from the Rx data path
wifi: ath12k: Enable monitor mode support on IPQ5332
wifi: ath12k: Set up MLO after SSR
wifi: ath11k: Silence remoteproc probe deferral prints
wifi: cfg80211: support key installation on non-netdev wdevs
wifi: cfg80211: make cluster id an array
wifi: mac80211: update outdated comment
wifi: mac80211: Advertise IEEE 802.1X authentication support
wifi: mac80211: Add support for IEEE 802.1X authentication protocol in non-AP STA mode
wifi: cfg80211: add support for IEEE 802.1X Authentication Protocol
wifi: mac80211: Advertise EPPKE support based on driver capabilities
wifi: mac80211_hwsim: Advertise support for (Re)Association frame encryption
wifi: mac80211: Fix AAD/Nonce computation for management frames with MLO
wifi: rt2x00: use generic nvmem_cell_get
wifi: mac80211: fetch unsolicited probe response template by link ID
wifi: mac80211: fetch FILS discovery template by link ID
wifi: nl80211: don't allow DFS channels for NAN
...
====================
wifi: UHR: define DPS/DBE/P-EDCA elements and fix size parsing
Add UHR Operation and Capability definitions and parsing helpers:
- Define ieee80211_uhr_dps_info, ieee80211_uhr_dbe_info,
ieee80211_uhr_p_edca_info with masks.
- Update ieee80211_uhr_oper_size_ok() to account for optional
DPS/DBE/P-EDCA blocks.
- Move NPCA pointer position after DPS Operation Parameter if it is
present in ieee80211_uhr_oper_size_ok().
- Move NPCA pointer position after DPS info if it is present in
ieee80211_uhr_npca_info().
Simon Kirby reported a long time ago that IPVS connection hashing
based only on the client address/port (caddr, cport) as hash keys
is not suitable for setups that accept traffic on multiple virtual
IPs and ports. It can happen for multiple VIP:VPORT services, for
single or many fwmark service(s) that match multiple virtual IPs
and ports, or even for passive FTP with persistence in DR/TUN mode
where we expect traffic on multiple ports for the virtual IP.
Fix it by adding virtual addresses and ports to the hash function.
This causes the traffic from NAT real servers to clients to use a
second hash for the in->out direction.
As a result:
- the IN direction from client will use hash node hn0 where
the source/dest addresses and ports used by client will be used
as hash keys
- the OUT direction from NAT real servers will use hash node hn1
for the traffic from real server to client
- the persistence templates are hashed only with parameters based on
the IN direction, so they now will also use the virtual address,
port and fwmark from the service.
OLD:
- all methods: c_list node: proto, caddr:cport
- persistence templates: c_list node: proto, caddr_net:0
- persistence engine templates: c_list node: per-PE, PE-SIP uses jhash
NEW:
- all methods: hn0 node (dir 0): proto, caddr:cport -> vaddr:vport
- MASQ method: hn1 node (dir 1): proto, daddr:dport -> caddr:cport
- persistence templates: hn0 node (dir 0):
proto, caddr_net:0 -> vaddr:vport_or_0
proto, caddr_net:0 -> fwmark:0
- persistence engine templates: hn0 node (dir 0): as before
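The effect of the wider key set can be illustrated with plain tuples standing in for the hash input. This is only a model of which fields feed the hash, taken from the OLD/NEW lists above; the function names are hypothetical and no actual hash function is involved.

```python
# Illustrative model only; tuples stand in for the hash input.
def old_conn_key(proto, caddr, cport, vaddr, vport):
    """OLD: only protocol and client address/port are hash keys."""
    return (proto, caddr, cport)


def new_conn_key(proto, caddr, cport, vaddr, vport):
    """NEW (hn0, dir 0): the virtual address and port are hash keys too."""
    return (proto, caddr, cport, vaddr, vport)
```

Two connections from the same client address and port to two different virtual IPs produce identical OLD keys, so they always share a bucket, while their NEW keys differ.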
Also reorder the ip_vs_conn fields, so that hash nodes are on the same
read-mostly cache line while write-mostly fields are on a separate
cache line.
Use per-net resizable hash table for connections. The global table
is slow to walk when using many namespaces.
The table can be resized in the range of [256 - ip_vs_conn_tab_size].
Table is attached only while services are present. Resizing is done
by delayed work based on load (the number of connections).
Add a hash_key field into the connection to store the table ID in
the highest bit and the entry's hash value in the lowest bits. The
lowest part of the hash value is used as bucket ID, the remaining
part is used to filter the entries in the bucket before matching
the keys and as result, helps the lookup operation to access only
one cache line. By knowing the table ID and bucket ID for entry,
we can unlink it without calculating the hash value and doing
lookup by keys. We need only to validate the saved hash_key under
lock.
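The hash_key layout described above can be sketched as bit packing. This is a model under assumptions: a 32-bit hash_key and power-of-two table sizes are assumed, and the helper names are hypothetical.

```python
# Illustrative model only; the 32-bit width and names are assumptions.
HASH_KEY_BITS = 32
TABLE_ID_SHIFT = HASH_KEY_BITS - 1        # table ID lives in the top bit
HASH_MASK = (1 << TABLE_ID_SHIFT) - 1     # hash value lives in the rest


def make_hash_key(table_id, hash_value):
    """Pack the table ID and the entry's hash value into one word."""
    return ((table_id & 1) << TABLE_ID_SHIFT) | (hash_value & HASH_MASK)


def table_id_of(hash_key):
    """Recover the table ID from the highest bit."""
    return hash_key >> TABLE_ID_SHIFT


def bucket_id_of(hash_key, table_bits):
    """The lowest table_bits bits of the hash select the bucket."""
    return hash_key & ((1 << table_bits) - 1)
```

Since both the table ID and the bucket ID can be read back from the stored word, an entry can be unlinked without recomputing the hash or looking it up by keys.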
For better security, switch from jhash to siphash for the default
connection hashing, but the persistence engines may use their own
function. Keeping the table loaded with entries well below its
size (12%) avoids collisions for 96+% of the conns.
ip_vs_conn_fill_cport() now will rehash the connection with proper
locking because unhash+hash is not safe for RCU readers.
To invalidate the templates, setting just dport to 0xffff is enough;
there is no need to rehash them. As a result, ip_vs_conn_unhash() is now
unused and removed.
Make the hash table for services resizable in the bit range of 4-20.
Table is attached only while services are present. Resizing is done
by delayed work based on load (the number of hashed services).
Table grows when load increases 2+ times (above 12.5% with lfactor=-3)
and shrinks 8+ times when load decreases 16+ times (below 0.78%).
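The percentages quoted above follow from simple arithmetic. The sketch below assumes (inferred from "12.5% with lfactor=-3", not confirmed by the source) that the grow threshold corresponds to a load of 2^lfactor entries per bucket and that the shrink threshold is 16 times lower; the function names are hypothetical.

```python
# Illustrative arithmetic only; the interpretation of lfactor is an assumption.
def grow_load_percent(lfactor):
    """Load (entries per bucket, in percent) above which the table grows."""
    return 100.0 * 2.0 ** lfactor


def shrink_load_percent(lfactor):
    """Load below which the table shrinks: 16 times under the grow load."""
    return grow_load_percent(lfactor) / 16.0
```

With lfactor=-3 this gives 12.5% for growing and 0.78125% for shrinking, matching the figures in the message.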
Switch to jhash hashing to reduce the collisions for multiple
services.
Add a hash_key field into the service to store the table ID in
the highest bit and the entry's hash value in the lowest bits. The
lowest part of the hash value is used as bucket ID, the remaining
part is used to filter the entries in the bucket before matching
the keys and as result, helps the lookup operation to access only
one cache line. By knowing the table ID and bucket ID for entry,
we can unlink it without calculating the hash value and doing
lookup by keys. We need only to validate the saved hash_key under
lock.
Add infrastructure for resizable hash tables based on hlist_bl
which we will use in followup patches.
The tables allow RCU lookups during resizing, bucket modifications
are protected with per-bucket bit lock and additional custom locking,
the tables are resized when load reaches thresholds determined based
on load factor parameter.
Compared to other implementations we rely on:
* fast entry removal by using node unlinking without pre-lookup
* entry rehashing when hash key changes
* entries can contain multiple hash nodes
* custom locking depending on different contexts
* adjustable load factor to customize the grow/shrink process
Eric Dumazet [Thu, 26 Feb 2026 09:40:36 +0000 (09:40 +0000)]
netfilter: nfnetlink_queue: no longer acquire sk_callback_lock
After commit 983512f3a87f ("net: Drop the lock in skb_may_tx_timestamp()")
from Sebastian Andrzej Siewior, apply the same logic in
nfqnl_put_sk_uidgid() to avoid touching sk->sk_callback_lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Eric Dumazet [Thu, 26 Feb 2026 08:58:16 +0000 (08:58 +0000)]
netfilter: nfnetlink_log: no longer acquire sk_callback_lock
After commit 983512f3a87f ("net: Drop the lock in skb_may_tx_timestamp()")
from Sebastian Andrzej Siewior, apply the same logic in
__build_packet_message() to avoid touching sk->sk_callback_lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Eric Dumazet [Thu, 26 Feb 2026 08:29:22 +0000 (08:29 +0000)]
netfilter: nft_meta: no longer acquire sk_callback_lock in nft_meta_get_eval_skugid()
After commit 983512f3a87f ("net: Drop the lock in skb_may_tx_timestamp()")
from Sebastian Andrzej Siewior, apply the same logic in
nft_meta_get_eval_skugid() to avoid touching sk->sk_callback_lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Eric Dumazet [Wed, 25 Feb 2026 13:23:19 +0000 (13:23 +0000)]
netfilter: xt_owner: no longer acquire sk_callback_lock in mt_owner()
After commit 983512f3a87f ("net: Drop the lock in skb_may_tx_timestamp()")
from Sebastian Andrzej Siewior, apply the same logic in mt_owner()
to avoid touching sk_callback_lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Eric Dumazet [Wed, 25 Feb 2026 13:20:19 +0000 (13:20 +0000)]
netfilter: nf_log_syslog: no longer acquire sk_callback_lock in nf_log_dump_sk_uid_gid()
After commit 983512f3a87f ("net: Drop the lock in skb_may_tx_timestamp()")
from Sebastian Andrzej Siewior, apply the same logic in nf_log_dump_sk_uid_gid()
to avoid touching sk_callback_lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
An upcoming patch will call fib6_lookup from nft_fib_ipv6. The EXPORT_SYMBOL is
added twice because there are two implementations of the function; one
is a small stub for MULTIPLE_TABLES=n. Only one is compiled into the
kernel depending on .config settings.
The alternative to EXPORT_SYMBOL is to use an indirect call via the
ipv6_stub->fib6_lookup() indirection, but that's more expensive than the
direct call.
Also, nft_fib_ipv6 cannot be builtin if ipv6 is a module.
Jori Koolstra [Tue, 3 Mar 2026 16:59:37 +0000 (17:59 +0100)]
wifi: mac80211_hwsim: change hwsim_class to a const struct
The class_create() call has been deprecated in favor of class_register()
as the driver core now allows for a struct class to be in read-only
memory. Change hwsim_class to be a const struct class and drop the
class_create() call.
Kibaek Yoo [Sat, 28 Feb 2026 07:16:13 +0000 (16:16 +0900)]
selftests: net: add macvlan multicast test for shared source MAC
Add a selftest that verifies multicast delivery to a macvlan bridge
port when the source MAC of the incoming frame matches the macvlan's
own MAC address.
This scenario occurs with protocols like VRRP where multiple hosts
share the same virtual MAC address. Without the corresponding kernel
change, macvlan bridge mode does not handle this case and the
multicast frame is not delivered.
Kibaek Yoo [Sat, 28 Feb 2026 07:16:12 +0000 (16:16 +0900)]
net: macvlan: support multicast rx for bridge ports with shared source MAC
Macvlan bridge mode currently does not handle the case where an
external source shares its MAC address with a local macvlan interface.
When such a frame arrives, macvlan_hash_lookup() matches the source
MAC to the local macvlan, and macvlan_multicast_rx() assumes bridge
ports already received the frame during local transmission. Since the
frame actually originated externally, bridge ports never saw it.
This situation arises with protocols like VRRP, where multiple hosts
use the same virtual MAC address.
Support this by passing NULL as the source device and including
MACVLAN_MODE_BRIDGE in the mode mask for the else branch of
macvlan_multicast_rx(). This ensures all VEPA and bridge mode macvlan
interfaces receive incoming multicast regardless of source MAC
matching. The trade-off is that looped-back locally-originated
multicasts may be delivered to bridge ports a second time, but
multicast consumers already handle duplicate frames.
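The change to the else branch can be modelled as a mode-mask filter. This is a sketch, not the driver code: the helper names are hypothetical, ports are modelled as (name, mode) pairs, the mode values are taken from the uapi if_link.h definitions as I recall them, and the pre-patch behavior (VEPA-only mask, source device excluded) is inferred from the description above.

```python
# Illustrative model only; helper names are hypothetical.
MACVLAN_MODE_VEPA = 2     # assumed to match the uapi definition
MACVLAN_MODE_BRIDGE = 4   # assumed to match the uapi definition


def broadcast(ports, exclude, mode_mask):
    """Deliver to every port whose mode is in the mask, except `exclude`."""
    return [name for name, mode in ports
            if name != exclude and mode & mode_mask]


def multicast_rx_else_branch(ports, src, patched):
    """Model of the else branch taken when the source MAC matched a
    local macvlan that is not in VEPA mode."""
    if patched:
        # NULL source device, bridge ports included in the mask
        return broadcast(ports, None,
                         MACVLAN_MODE_VEPA | MACVLAN_MODE_BRIDGE)
    # inferred old behavior: bridge ports are assumed to have seen the frame
    return broadcast(ports, src, MACVLAN_MODE_VEPA)
```

With two bridge ports and one VEPA port, the old branch reaches only the VEPA port, while the patched branch reaches all three, including the port whose MAC matched the source.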
Zeeshan Ahmad [Mon, 2 Mar 2026 06:43:17 +0000 (11:43 +0500)]
net: core: failover: enforce mandatory ops and clean up redundant checks
The failover framework requires 'ops' to be functional. Currently,
failover_register() allows an instance to be registered with NULL
ops, which leads to inconsistent NULL checks and potential NULL
pointer dereferences in the slave registration paths.
Harden the entry point by requiring non-NULL ops in
failover_register(). This ensures the 'fops' pointer is guaranteed
to be valid for any successfully registered failover instance.
Consequently, remove the now redundant NULL checks for 'fops'
throughout the module to simplify the logic.
Breno Leitao [Mon, 2 Mar 2026 14:40:39 +0000 (06:40 -0800)]
selftests: netconsole: print diagnostic on busywait timeout in netcons_basic
The script uses set -euo pipefail, so when busywait times out waiting
for the netconsole message to arrive, it returns 1 and the script exits
immediately without printing any error message. As reported by Jakub,
this makes failures hard to diagnose since the test reports exit=1 with
no explanation.
Handle the busywait failure explicitly so that a FAIL message is printed
before exiting. This is what it looks like now:
Running with target mode: basic (ipv6)
[ 167.452561] netconsole selftest: netcons_QdMay
FAIL: Timed out waiting (20000 ms) for netconsole message in /tmp/netcons_QdMay
The remaining silent failures under set -e can only happen during the
setup phase (netdevsim creation, interface configuration, configfs
writes), so no silent failures are expected once the test
starts.
Note that this issue might be less frequent now, since commit a68a9bd086c28 ("selftests: netconsole: Increase port listening timeout")
increased the timeout that _might_ have been the root cause of these
random failures in NIPA.
Jakub Kicinski [Wed, 4 Mar 2026 01:22:17 +0000 (17:22 -0800)]
Merge branch 'grab-ipa-imem-slice-through-dt'
Konrad Dybcio says:
====================
Grab IPA IMEM slice through DT
This adds the necessary driver change to migrate over from
hardcoded-per-IPA-version-but-varying-per-implementation numbers, while
unfortunately keeping them in there for backwards compatibility.
The DT changes will be submitted in a separate series, this one is OK
to merge independently.
====================
Konrad Dybcio [Mon, 2 Mar 2026 15:58:45 +0000 (16:58 +0100)]
net: ipa: Grab IMEM slice base/size from DTS
This is a detail that differs per chip, not per IPA version (and there
are cases of the same IPA version being implemented across very
different SoCs).
This region isn't actually used by the driver, but we most definitely
want to iommu-map it, so that IPA can poke at the data within.
Reviewed-by: Alex Elder <elder@riscstar.com>
Acked-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://patch.msgid.link/20260302-topic-ipa_imem-v6-3-c0ebbf3eae9f@oss.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Konrad Dybcio [Mon, 2 Mar 2026 15:58:44 +0000 (16:58 +0100)]
dt-bindings: net: qcom,ipa: Add sram property for describing IMEM slice
The IPA driver currently grabs a slice of IMEM through hardcoded
addresses. Not only is that ugly and against the principles of DT,
but it also creates a situation where two distinct platforms
implementing the same version of IPA would need to be hardcoded
together and matched at runtime.
Instead, do the sane thing and accept a handle to said region directly.
Intentionally don't make it required, as it's not there on ancient
implementations (currently unsupported) and we're not yet done with
filling the data across all DTs.
The IP Accelerator hardware/firmware owns a sizeable region within the
IMEM, named 'modem-tables', containing various packet processing
configuration data.
It's not actually accessed by the OS, although we have to IOMMU-map it
with the IPA device, so that presumably the firmware can act upon it.
Sean Chang [Mon, 2 Mar 2026 14:29:31 +0000 (22:29 +0800)]
net: macb: use ethtool_sprintf to fill ethtool stats strings
The RISC-V toolchain triggers a stringop-truncation warning when using
snprintf() with a fixed ETH_GSTRING_LEN (32 bytes) buffer.
Convert the driver to use the modern ethtool_sprintf() API from
linux/ethtool.h. This removes the need for manual snprintf() and
memcpy() calls, handles the 32-byte padding automatically, and
simplifies the logic by removing manual pointer arithmetic.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Sean Chang <seanwascoding@gmail.com>
Link: https://patch.msgid.link/20260302142931.49108-1-seanwascoding@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
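To make the macb conversion above concrete, here is a rough userspace sketch of the pattern: a helper formats one name into the current 32-byte slot and advances the cursor, so the caller needs no pointer arithmetic. sketch_ethtool_sprintf() is only an approximation written for this example, not the kernel implementation, and the stat names are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_GSTRING_LEN 32	/* mirrors ETH_GSTRING_LEN (32 bytes) */

/*
 * Userspace stand-in for the helper: format one name into the current
 * 32-byte slot, then advance the cursor to the next slot.
 */
static void sketch_ethtool_sprintf(unsigned char **data, const char *fmt, ...)
{
	va_list args;

	memset(*data, 0, SKETCH_GSTRING_LEN);	/* zero-pad the whole slot */
	va_start(args, fmt);
	vsnprintf((char *)*data, SKETCH_GSTRING_LEN, fmt, args);
	va_end(args);

	*data += SKETCH_GSTRING_LEN;
}

/* Fill hypothetical per-queue stat names; returns the slots written. */
static int sketch_fill_strings(unsigned char *buf, int nqueues)
{
	unsigned char *p = buf;
	int q;

	for (q = 0; q < nqueues; q++) {
		sketch_ethtool_sprintf(&p, "q%d: rx_packets", q);
		sketch_ethtool_sprintf(&p, "q%d: tx_packets", q);
	}
	return (int)((p - buf) / SKETCH_GSTRING_LEN);
}
```

The caller must still size the buffer as (number of strings) * 32 bytes; the helper only removes the per-string bookkeeping.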
Kohei Enju [Fri, 20 Feb 2026 11:09:17 +0000 (11:09 +0000)]
net: core: allow netdev_upper_get_next_dev_rcu from bh context
Since XDP programs are called from a NAPI poll context, the RCU
reference liveness is ensured by local_bh_disable().
Commit aeea1b86f936 ("bpf, devmap: Exclude XDP broadcast to master
device") started to call netdev_upper_get_next_dev_rcu() from this
context, but missed adding rcu_read_lock_bh_held() as a condition to the
RCU checks.
While both bh_disabled and rcu_read_lock() provide RCU protection,
lockdep complains since the check condition is insufficient [1].
Add rcu_read_lock_bh_held() as a condition to help lockdep understand
that the dereference is safe, in the same way as commit 694cea395fde
("bpf: Allow RCU-protected lookups to happen from bh context").
Tomasz Unger [Mon, 2 Mar 2026 10:09:08 +0000 (11:09 +0100)]
NFC: s3fwrn5: Replace strcpy() with strscpy()
Replace strcpy() with strscpy() which limits the copy to the size of
the destination buffer. Since fw_info->fw_name is an array with a
fixed, declared size, the two-argument variant of strscpy() is used -
the compiler deduces the buffer size automatically.
This is a defensive cleanup replacing the deprecated strcpy()
with the preferred strscpy().
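For readers unfamiliar with the two-argument form, the following userspace sketch approximates the behavior described above: the copy is bounded by sizeof() of the destination array, always NUL-terminated, and reports truncation. sketch_strscpy() and SKETCH_STRSCPY() are hypothetical names written for this example; they are not the kernel implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace approximation of a bounded copy: always NUL-terminates and
 * returns the copied length, or -E2BIG if the source had to be truncated.
 */
static long sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -E2BIG;

	/* Measure the source, but never look past 'size' bytes. */
	for (len = 0; len < size && src[len] != '\0'; len++)
		;

	if (len == size) {		/* source does not fit: truncate */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -E2BIG;
	}
	memcpy(dst, src, len + 1);	/* includes the terminating NUL */
	return (long)len;
}

/* Two-argument form: valid only when dst is a true array, not a pointer. */
#define SKETCH_STRSCPY(dst, src) sketch_strscpy((dst), (src), sizeof(dst))
```

With `char name[8]`, copying "fw.bin" succeeds and returns 6, while copying "firmware.bin" stores "firmwar" and returns -E2BIG.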
Tomasz Unger [Sun, 1 Mar 2026 14:43:45 +0000 (15:43 +0100)]
NFC: nfcmrvl: Replace strcpy() with strscpy()
Replace strcpy() with strscpy() which limits the copy to the size of
the destination buffer. Since fw_dnld->name is an array, the
two-argument variant of strscpy() is used - the compiler deduces
the buffer size automatically.
This is a defensive cleanup replacing the deprecated strcpy()
with the preferred strscpy().
Tomasz Unger [Sun, 1 Mar 2026 13:56:33 +0000 (14:56 +0100)]
NFC: nxp-nci: Replace strcpy() with strscpy()
Replace strcpy() with strscpy() which limits the copy to the size of
the destination buffer. Since fw_info->name is an array, the
two-argument variant of strscpy() is used - the compiler deduces
the buffer size automatically.
This is a defensive cleanup replacing the deprecated strcpy()
with the preferred strscpy().
Tomasz Unger [Sun, 1 Mar 2026 12:12:54 +0000 (13:12 +0100)]
NFC: pn544: i2c: Replace strcpy() with strscpy()
Replace strcpy() with strscpy() which limits the copy to the size of
the destination buffer. Since phy->firmware_name is an array, the
two-argument variant of strscpy() is used - the compiler deduces
the buffer size automatically.
This is a defensive cleanup. As pointed out by Jakub Kicinski
<kuba@kernel.org>, firmware_name is already bounded to
NFC_FIRMWARE_NAME_MAXSIZE via nla_strscpy() in net/nfc/netlink.c
before reaching this driver, so no actual buffer overflow is possible.
ixgbe: refactor: use DECLARE_BITMAP for ring state field
Convert the ring state field from 'unsigned long' to a proper bitmap
using DECLARE_BITMAP macro, aligning with the implementation pattern
already used in the i40e driver.
This change:
- Adds __IXGBE_RING_STATE_NBITS as the bitmap size sentinel to enum
ixgbe_ring_state_t (consistent with i40e's __I40E_RING_STATE_NBITS)
- Changes 'unsigned long state' to 'DECLARE_BITMAP(state,
__IXGBE_RING_STATE_NBITS)' in struct ixgbe_ring
- Removes the address-of operator (&) when passing ring->state to bit
manipulation functions, as bitmap arrays naturally decay to pointers
The change maintains functional equivalence while using the
more appropriate kernel bitmap API, consistent with other Intel Ethernet
drivers.
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
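The bitmap conversion described above can be sketched in userspace as follows. The SKETCH_* macros and helpers are simplified, non-atomic stand-ins written for this example, and the ring state names are hypothetical; only the overall shape (sentinel enum entry, array-typed state field, no address-of operator at call sites) follows the commit text.

```c
#include <limits.h>
#include <stdbool.h>

#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) \
	(((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)
#define SKETCH_DECLARE_BITMAP(name, bits) \
	unsigned long name[SKETCH_BITS_TO_LONGS(bits)]

/* Hypothetical ring states; the last entry only sizes the bitmap. */
enum sketch_ring_state {
	SKETCH_RING_TX_FDIR,
	SKETCH_RING_XDP,
	SKETCH_RING_STATE_NBITS	/* must be last */
};

struct sketch_ring {
	SKETCH_DECLARE_BITMAP(state, SKETCH_RING_STATE_NBITS);
};

/* Non-atomic userspace stand-ins for set_bit()/test_bit(). */
static void sketch_set_bit(unsigned int nr, unsigned long *addr)
{
	addr[nr / SKETCH_BITS_PER_LONG] |= 1UL << (nr % SKETCH_BITS_PER_LONG);
}

static bool sketch_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (addr[nr / SKETCH_BITS_PER_LONG] >>
		(nr % SKETCH_BITS_PER_LONG)) & 1UL;
}

static bool sketch_ring_enable_xdp(struct sketch_ring *ring)
{
	/* ring->state is an array, so it is passed without '&'. */
	sketch_set_bit(SKETCH_RING_XDP, ring->state);
	return sketch_test_bit(SKETCH_RING_XDP, ring->state);
}
```

Because the array is sized from the sentinel, adding a state beyond the width of one long grows the field automatically instead of silently overflowing it.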
Andy Shevchenko [Mon, 2 Mar 2026 09:18:31 +0000 (10:18 +0100)]
i40e: Add missing wordpart.h header
While cleaning up another header I hit this build error:
drivers/net/ethernet/intel/i40e/i40e_hmc.h:105:22: error: implicit declaration of function 'upper_32_bits' [-Wimplicit-function-declaration]
105 | val1 = (u32)(upper_32_bits(pa)); \
This is due to a missing header; add it to fix the build.
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
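For context on what the failing macro relies on, here is a small userspace sketch of splitting a 64-bit value into 32-bit halves. The sketch_* helpers are illustrative equivalents written for this example, not the kernel's definitions.

```c
#include <stdint.h>

/* Upper half of a 64-bit value, e.g. of a physical address. */
static inline uint32_t sketch_upper_32_bits(uint64_t n)
{
	return (uint32_t)(n >> 32);
}

/* Lower half of the same value. */
static inline uint32_t sketch_lower_32_bits(uint64_t n)
{
	return (uint32_t)(n & 0xffffffffu);
}
```

A header that uses such helpers in a macro body, as i40e_hmc.h does, compiles only if every includer happens to pull in the declaration first, which is why including the header directly is the robust fix.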
i40e: drop useless bitmap_weight() call in i40e_set_rxfh_fields()
bitmap_weight() is O(N) and useless here, because the following
for_each_set_bit() returns immediately in case of empty flow_pctypes.
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
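The reasoning above can be seen in a simplified userspace loop. sketch_visit_set_bits() is a hypothetical helper that mimics iterating over the set bits of a single-word mask; it is not the kernel's for_each_set_bit(), and it assumes nbits does not exceed the width of unsigned long.

```c
/*
 * Visit each set bit below 'nbits'. An empty mask simply yields zero
 * iterations of the per-bit work, so a separate "is it empty?" weight
 * check in front of the loop adds cost without changing behavior.
 */
static unsigned int sketch_visit_set_bits(unsigned long mask,
					  unsigned int nbits)
{
	unsigned int bit, visited = 0;

	for (bit = 0; bit < nbits; bit++) {
		if (!(mask & (1UL << bit)))
			continue;
		visited++;	/* per-bit work would go here */
	}
	return visited;
}
```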
Simon Horman [Tue, 27 Jan 2026 16:16:55 +0000 (16:16 +0000)]
ice: Make name member of struct ice_cgu_pin_desc const
The name member of struct ice_cgu_pin_desc is never modified.
Make it const.
Found by inspection.
Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Heiner Kallweit [Fri, 27 Feb 2026 22:11:02 +0000 (23:11 +0100)]
net: phy: improve mdiobus_stats_acct
- Remove duplicated preempt disable. Disabling preemption has been added
to functions like u64_stats_update_begin() in the meantime.
- Simplify branch structure
Heiner Kallweit [Fri, 27 Feb 2026 22:06:28 +0000 (23:06 +0100)]
net: mdio: use macro __ATTR to simplify the code
Use macro __ATTR to simplify the code. Note that __ATTR can't be used
in MDIO_BUS_STATS_ADDR_ATTR_DECL because the included stringification
would conflict with how the argument 'file' is passed.
Heiner Kallweit [Fri, 27 Feb 2026 22:05:18 +0000 (23:05 +0100)]
net: mdio: extend struct mdio_bus_stat_attr instead of using dev_ext_attribute
Currently the var member of struct dev_ext_attribute is used in a very
ugly way. Extend struct mdio_bus_stat_attr instead, which simplifies
the code and also slightly reduces the memory footprint.
Note: Member addr is renamed to avoid a conflict in macro
MDIO_BUS_STATS_ADDR_ATTR_DECL.
Heiner Kallweit [Fri, 27 Feb 2026 20:52:16 +0000 (21:52 +0100)]
net: ti: davinci_emac: stop using bus type mdio_bus_type
This driver is the only user of mdio_bus_type outside phylib.
Using mdio_bus_type isn't strictly needed here, so use an alternative
approach. This will allow making mdio_bus_type private to phylib
in a follow-up series.
Compile-tested only.
Note: Devices supported by this driver are OF-only, therefore the string
comparison in match_first_device() isn't needed any longer.
Add support for HSR and PRP protocol frame filtering in the ICSSG
classifier by configuring filter table 3 (FT3) to detect PTP frames
(EtherType 0x88F7) in HSR/PRP tagged packets.
Also add rx_class_or_base to miig_rt_offsets structure to support
RX_CLASS_OR register access, and fix typos in FT1_N_REG and FT3_N_REG
macros (slize -> slice).
====================
dpll: zl3073x: consolidate chip info and add temperature reporting
This series refactors the ZL3073x chip variant handling and adds die
temperature reporting for chips that support it.
Patch 1 replaces the five per-variant chip_info structures and their
exported symbols with a single consolidated lookup table. The chip
variant is now detected at runtime from the chip ID register rather
than being selected at compile time via bus driver match data. This
simplifies the I2C/SPI drivers and makes adding new variants a
single-line table addition. A flags field replaces the hardcoded
chip_id switch in zl3073x_dev_is_ref_phase_comp_32bit().
Patch 2 uses the new flags infrastructure to add die temperature
reporting for chip variants that provide a temperature status register.
The temp_get callback is conditionally set during device registration
based on the ZL3073X_FLAG_DIE_TEMP chip flag.
====================
Ivan Vecera [Fri, 27 Feb 2026 10:53:00 +0000 (11:53 +0100)]
dpll: zl3073x: add die temperature reporting for supported chips
Some zl3073x chip variants (0x1Exx, 0x2Exx and 0x3FC4) provide a die
temperature status register with 0.1 C resolution.
Add a ZL3073X_FLAG_DIE_TEMP chip flag to identify these variants and
implement zl3073x_dpll_temp_get() as the dpll_device_ops.temp_get
callback. The register value is converted from 0.1 C units to
millidegrees as expected by the DPLL subsystem.
To support per-instance ops selection, copy the base dpll_device_ops
into struct zl3073x_dpll and conditionally set .temp_get during device
registration based on the chip flag.
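A minimal userspace sketch of the two ideas above: the unit conversion (0.1 C per LSB means one LSB equals 100 millidegrees) and the per-instance ops copy. All sketch_* names are hypothetical, and treating the raw register value as a signed 16-bit quantity is an assumption made for this example; the commit text does not state the register width or signedness.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FLAG_DIE_TEMP	(1u << 0)  /* stand-in for ZL3073X_FLAG_DIE_TEMP */

struct sketch_dpll_ops {
	int (*mode_get)(void);
	int (*temp_get)(int16_t raw, int32_t *millideg);
};

/* 0.1 C per LSB -> millidegrees: multiply the raw value by 100. */
static int sketch_temp_get(int16_t raw, int32_t *millideg)
{
	*millideg = (int32_t)raw * 100;
	return 0;
}

static int sketch_mode_get(void)
{
	return 0;
}

/* Shared base ops: temp_get is left unset by default. */
static const struct sketch_dpll_ops sketch_base_ops = {
	.mode_get = sketch_mode_get,
};

struct sketch_dpll {
	struct sketch_dpll_ops ops;	/* per-instance copy of the base ops */
};

static void sketch_dpll_register(struct sketch_dpll *dpll,
				 unsigned int chip_flags)
{
	dpll->ops = sketch_base_ops;
	if (chip_flags & SKETCH_FLAG_DIE_TEMP)
		dpll->ops.temp_get = sketch_temp_get;
}
```

For example, a raw reading of 423 corresponds to 42.3 C and converts to 42300 millidegrees. Copying the ops per instance lets a consumer detect an unsupported feature by a NULL callback rather than by a runtime error.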
Ivan Vecera [Fri, 27 Feb 2026 10:52:59 +0000 (11:52 +0100)]
dpll: zl3073x: detect DPLL channel count from chip ID at runtime
Replace the five per-variant zl3073x_chip_info structures and their
exported symbol definitions with a single consolidated chip ID lookup
table. The chip variant is now detected at runtime by reading the chip
ID register from hardware and looking it up in the table, rather than
being selected at compile time via the bus driver match data.
Repurpose struct zl3073x_chip_info to hold a single chip ID, its
channel count, and a flags field. Introduce enum zl3073x_flags with
ZL3073X_FLAG_REF_PHASE_COMP_32 to replace the chip_id switch statement
in zl3073x_dev_is_ref_phase_comp_32bit(). Store a pointer to the
detected chip_info entry in struct zl3073x_dev for runtime access.
This simplifies the bus drivers by removing per-variant .data and
.driver_data references from the I2C/SPI match tables, and makes
adding support for new chip variants a single-line table addition.
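The consolidated lookup described above might look roughly like the following userspace sketch. The chip IDs, channel counts, and all sketch_* names are hypothetical placeholders for illustration; the real table contents are not given in the commit text.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FLAG_REF_PHASE_COMP_32	(1u << 0)

struct sketch_chip_info {
	uint16_t id;		/* value read from the chip ID register */
	uint8_t num_channels;	/* DPLL channel count for this variant */
	unsigned int flags;
};

/* Hypothetical IDs and channel counts, for illustration only. */
static const struct sketch_chip_info sketch_chip_ids[] = {
	{ 0x0001, 1, 0 },
	{ 0x0002, 2, 0 },
	{ 0x0003, 5, SKETCH_FLAG_REF_PHASE_COMP_32 },
};

/* Runtime detection: map the ID read from hardware to its table entry. */
static const struct sketch_chip_info *sketch_chip_lookup(uint16_t id)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_chip_ids) / sizeof(sketch_chip_ids[0]); i++)
		if (sketch_chip_ids[i].id == id)
			return &sketch_chip_ids[i];
	return NULL;	/* unknown variant */
}

/* A flag test replaces a switch over chip IDs. */
static int sketch_is_ref_phase_comp_32bit(const struct sketch_chip_info *info)
{
	return !!(info->flags & SKETCH_FLAG_REF_PHASE_COMP_32);
}
```

With this layout, supporting another variant is one more row in the table, and feature checks stay independent of the list of known IDs.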
Dipayaan Roy [Fri, 27 Feb 2026 08:15:02 +0000 (00:15 -0800)]
net: mana: Trigger VF reset/recovery on health check failure due to HWC timeout
The GF stats periodic query is used as a mechanism to monitor HWC
health. If this HWC command times out, it is a strong indication that
the device/SoC is in a faulty state and requires recovery.
Today, when a timeout is detected, the driver marks
hwc_timeout_occurred, clears cached stats, and stops rescheduling the
periodic work. However, the device itself is left in the same failing
state.
Extend the timeout handling path to trigger the existing MANA VF
recovery service by queueing a GDMA_EQE_HWC_RESET_REQUEST work item.
This is expected to initiate the appropriate recovery flow: first
attempt a suspend/resume and, if that fails, trigger a bus rescan.
This change is intentionally limited to HWC command timeouts and does
not trigger recovery for errors reported by the SoC as a normal command
response.
Randy Dunlap [Sat, 28 Feb 2026 22:08:45 +0000 (14:08 -0800)]
atm: atmdev: add function parameter names and description
kernel-doc reports function parameters not described for parameters
that are not named. Add parameter names for these functions and then
describe the function parameters in kernel-doc format.
Fixes these warnings:
Warning: include/linux/atmdev.h:316 function parameter '' not described
in 'register_atm_ioctl'
Warning: include/linux/atmdev.h:321 function parameter '' not described
in 'deregister_atm_ioctl'
Once __dev_get_by_index() is converted to dev_get_by_index_rcu(),
we can move the other two functions under that same RCU section
and drop RTNL for ipmr_rtm_route().
Let's do that conversion and drop ASSERT_RTNL() in
mr_call_mfc_notifiers().
ipmr/ip6mr: Convert net->ipv[46].ipmr_seq to atomic_t.
We will no longer hold RTNL for ipmr_mfc_add() and ipmr_mfc_delete().
An MFC entry can be loosely connected with a VIF by its index in
mrt->vif_table[] (stored in mfc_parent), but the two tables are
not synchronised, i.e. even if VIF 1 is removed, the MFC for VIF 1
is not automatically removed.
The only field that the MFC/VIF interfaces share is
net->ipv[46].ipmr_seq, which is protected by RTNL.
Adding a new mutex for both just to protect a single field is overkill.