git.ipfire.org Git - thirdparty/kernel/linux.git/log
thirdparty/kernel/linux.git
6 weeks ago net/sched: add qstats_cpu_drop_inc() helper
Eric Dumazet [Fri, 1 May 2026 13:59:16 +0000 (13:59 +0000)] 
net/sched: add qstats_cpu_drop_inc() helper

1) Using this_cpu_inc() is better than going through this_cpu_ptr():

- Single instruction on x86.
- Store tearing prevention.

2) Change tcf_action_update_stats() to use this_cpu_add().

3) Add WRITE_ONCE() to __qdisc_qstats_drop() and qstats_drop_inc()
   in preparation for lockless "tc qdisc show".
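
Point 3 concerns a writer that updates a counter while a lockless reader may sample it. A minimal userspace sketch of that idea follows, assuming GCC or Clang (for `__typeof__`). The `SKETCH_*` macros, `struct sketch_qstats`, and both helpers are simplified stand-ins invented for illustration, not the kernel's definitions:

```c
#include <stdint.h>

/*
 * Simplified stand-ins for the kernel annotations (assumption: GCC/Clang).
 * The volatile access asks the compiler to emit exactly one load or store,
 * so it cannot tear, fuse, or repeat the access.
 */
#define SKETCH_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define SKETCH_READ_ONCE(x)       (*(volatile __typeof__(x) *)&(x))

/* Hypothetical counter block, not the kernel's qstats layout. */
struct sketch_qstats {
	uint32_t drops;
};

/*
 * Writer side. The annotation does not make the increment atomic: writers
 * still need their own serialization. It only keeps the store well-formed
 * for readers that run without the lock.
 */
static void sketch_drop_inc(struct sketch_qstats *q)
{
	SKETCH_WRITE_ONCE(q->drops, q->drops + 1);
}

/* Lockless reader side: a single, untorn load of the counter. */
static uint32_t sketch_drop_read(const struct sketch_qstats *q)
{
	return SKETCH_READ_ONCE(q->drops);
}
```

The reader may observe a slightly stale value, which is acceptable for statistics; what the annotations rule out is a partially written one.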

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 3/17 up/down: 72/-216 (-144)
Function                                     old     new   delta
dualpi2_enqueue_skb                          462     511     +49
tcf_ife_act                                 1061    1077     +16
taprio_enqueue                               613     620      +7
codel_qdisc_enqueue                          149     143      -6
tcf_vlan_act                                 684     676      -8
tcf_skbedit_act                              626     618      -8
tcf_police_act                               725     717      -8
tcf_mpls_act                                1297    1289      -8
tcf_gate_act                                 310     302      -8
tcf_gact_act                                 222     214      -8
tcf_csum_act                                2438    2430      -8
tcf_bpf_act                                  709     701      -8
tcf_action_update_stats                      124     115      -9
pie_qdisc_enqueue                            865     856      -9
pfifo_enqueue                                116     107      -9
choke_enqueue                               2069    2059     -10
plug_enqueue                                 139     128     -11
bfifo_enqueue                                121     110     -11
tcf_nat_act                                 1501    1489     -12
gred_enqueue                                1743    1668     -75
Total: Before=24388609, After=24388465, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260501135916.2566766-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago net: phy: realtek: Add support for PHY LEDs on RTL8221B
Chukun Pan [Fri, 1 May 2026 10:00:02 +0000 (18:00 +0800)] 
net: phy: realtek: Add support for PHY LEDs on RTL8221B

Realtek RTL8221B Ethernet PHY supports three LED pins which are used to
indicate link status and activity. Add netdev trigger support for them.

Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260501100002.755672-1-amadeus@jmu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago net/sched: taprio: prepare taprio_dump() for RTNL removal
Eric Dumazet [Fri, 1 May 2026 06:42:47 +0000 (06:42 +0000)] 
net/sched: taprio: prepare taprio_dump() for RTNL removal

We soon will no longer hold RTNL in qdisc dumps.

Add READ_ONCE()/WRITE_ONCE() annotations.

Note: taprio already uses RCU to protect most of its fields.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260501064247.2027688-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago Merge branch 'intel-wired-lan-updates-2024-04-30-ixgbe-i40e-ice'
Jakub Kicinski [Sun, 3 May 2026 02:12:47 +0000 (19:12 -0700)] 
Merge branch 'intel-wired-lan-updates-2024-04-30-ixgbe-i40e-ice'

Jacob Keller says:

====================
Intel Wired LAN Updates 2024-04-30 (ixgbe, i40e, ice)

This series includes updates to support Energy-Efficient Ethernet (EEE) on
E610 devices in the ixgbe driver, support for an unmanaged DPLL output on
E830, as well as some other minor cleanups and improvements across ixgbe,
i40e, and ice.

Jedrzej begins with the first six patches preparing the ixgbe driver to
support EEE, adding an EEE capability flag, updating the supported EEE
speeds, updating the ACI command structures with the fields related to
EEE, moving the EEE config validation out for re-use, and finally
implementing the EEE support for E610 hardware.

Aleksandr fixes the ixgbe_update_flash_X550() logic to prevent unaligned
access in ixgbe_host_interface_command(). Note: this has no functional
change on x86, and is being sent through net-next as it is considered a
minor cleanup.

Jacob (hi!) modifies the i40e driver to only timestamp PTP event packets,
instead of timestamping every V2 event frame. This avoids wasting the
limited number of timestamp slots for frames which the PTP protocol does
not care about.

Jacob also extends the devlink flash notification message reporting that
users can activate the new firmware via devlink reload to explicitly
indicate the required "fw_activate" action.

Byungchul Park fixes the ice_lbtest_receive_frames() function to use
netmem_desc instead of the page structure.

Przemyslaw Korba fixes a truncation warning in ice_dpll_init_fwnode_pins()
by increasing the allowed length of the pin_name string on the stack to 16.

Ivan Vecera adds some bounds checking to ice_dpll_rclk_state_on_pin_get/set()
and moves the CGU register macros to be under the header guard ifdef in
ice_dpll.h
====================

Link: https://patch.msgid.link/20260430-jk-iwl-net-next-2026-04-30-v1-0-6f27ae1cd073@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago ice: dpll: Fix compilation warning
Przemyslaw Korba [Fri, 1 May 2026 06:37:24 +0000 (23:37 -0700)] 
ice: dpll: Fix compilation warning

Introduced by commit ad1df4f2d591 ("ice: dpll: Support E825-C SyncE and
dynamic pin discovery"):

ice_dpll.c: In function ‘ice_dpll_init’:
ice_dpll.c:3588:59: error: ‘%u’ directive output may be truncated
writing between 1 and 10 bytes into a region of size 4
[-Werror=format-truncation=] snprintf(pin_name, sizeof(pin_name),
"rclk%u", i);
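
The warning arithmetic can be reproduced in userspace. The sketch below is an illustration, not the driver code: `rclk_name_fits()` is a hypothetical helper, and the 8-byte size is inferred from the warning text ("rclk" uses 4 bytes and 4 remain), while 16 is the size named in the series cover letter:

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if "rclk<idx>" plus its NUL terminator
 * fits in a buffer of bufsize bytes, 0 otherwise.
 */
static int rclk_name_fits(size_t bufsize, unsigned int idx)
{
	char buf[32];
	int n;

	/* Clamp so the scratch buffer is never overrun. */
	if (bufsize > sizeof(buf))
		bufsize = sizeof(buf);

	/* snprintf() returns the length it wanted to write, excluding the NUL. */
	n = snprintf(buf, bufsize, "rclk%u", idx);

	return n >= 0 && (size_t)n < bufsize;
}
```

With an 8-byte buffer only indexes up to 999 fit, which is why the compiler cannot prove the absence of truncation for an arbitrary unsigned value; with 16 bytes even a 10-digit index fits.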

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Przemyslaw Korba <przemyslaw.korba@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260430-jk-iwl-net-next-2026-04-30-v1-13-6f27ae1cd073@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago ice: access @pp through netmem_desc instead of page
Byungchul Park [Fri, 1 May 2026 06:37:23 +0000 (23:37 -0700)] 
ice: access @pp through netmem_desc instead of page

To eliminate the use of struct page in page pool, the page pool users
should use netmem descriptor and APIs instead.

Make ice driver access @pp through netmem_desc instead of page.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Tested-by: Alexander Nowlin <alexander.nowlin@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260430-jk-iwl-net-next-2026-04-30-v1-12-6f27ae1cd073@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago ice: mention fw_activate action along with devlink reload
Jacob Keller [Fri, 1 May 2026 06:37:22 +0000 (23:37 -0700)] 
ice: mention fw_activate action along with devlink reload

The ice driver reports a helpful status message when updating firmware
indicating what action is necessary to enable the new firmware. This is
done because some updates require power cycling or rebooting the machine
but some can be activated via devlink.

The ice driver only supports activating firmware with the specific action
of "fw_activate"; a bare "devlink dev reload" will *not* update the
firmware, and will only perform driver reinitialization.

Update the status message to explicitly reflect that the reload must use
the fw_activate action.

I considered modifying the text to spell out the full command, but felt
that was both overkill and something that would belong better as part of
the user space program and not hard coded into the kernel driver output.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260430-jk-iwl-net-next-2026-04-30-v1-11-6f27ae1cd073@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago i40e: only timestamp PTP event packets
Jacob Keller [Fri, 1 May 2026 06:37:21 +0000 (23:37 -0700)] 
i40e: only timestamp PTP event packets

The i40e_ptp_set_timestamp_mode() function is responsible for configuring
hardware timestamping. When programming receive timestamping, the logic
must determine how to configure the PRTTSYN_CTL1 register for receive
timestamping.

The i40e hardware does not support timestamping all frames. Instead,
timestamps are captured into one of the four PRTTSYN_RXTIME registers.

Currently, the driver configures hardware to timestamp all V2 packets on
ports 319 and 320, including all message types. This timestamps
significantly more packets than is actually requested by the
HWTSTAMP_FILTER_PTP_V2_EVENT filter type.

The documentation for HWTSTAMP_FILTER_PTP_V2_EVENT indicates that it should
timestamp PTP v2 messages on any layer, including any kind of event
packets.

Timestamping other packets is acceptable, but not required by the filter.
Doing so wastes valuable slots in the Rx timestamp registers. For most
applications this doesn't cause a problem. However, for extremely high
rates of messages, it becomes possible that one of the critical event
packets is not timestamped.

The PTP protocol only requires timestamps for event messages on port 319,
but hardware is timestamping on both 319 and 320, and timestamping message
types which do not need a timestamp value.

The i40e hardware actually has a more strict filtering option. First, only
timestamp layer 4 messages on port 319 instead of both 319 and 320. Second,
note that hardware has a specific mode to timestamp only event packets
(those with message type < 8).
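
The selection criterion described above can be written out as a software predicate. This is only an illustration of what the hardware filter is said to match, not driver code: `sketch_is_ptp_event()` and `SKETCH_PTP_EVENT_PORT` are hypothetical names, and the sketch relies on the PTPv2 header carrying the message type in the low nibble of its first octet:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PTP_EVENT_PORT 319 /* UDP port for PTP event messages */

/*
 * Hypothetical predicate: true for a layer 4 PTP frame that the strict
 * mode would timestamp, i.e. destination port 319 and message type < 8.
 * ptp_hdr points at the first octet of the PTP header.
 */
static bool sketch_is_ptp_event(uint16_t udp_dst_port, const uint8_t *ptp_hdr)
{
	uint8_t msg_type = ptp_hdr[0] & 0x0f; /* low nibble = messageType */

	return udp_dst_port == SKETCH_PTP_EVENT_PORT && msg_type < 8;
}
```

For example, a Sync message (type 0x0) on port 319 matches, while a Follow_Up message (type 0x8), which travels on port 320, does not.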

Update the configuration to use the strict mode that only timestamps event
messages, switching the TSYNTYPE field from 10b to 11b, which limits
timestamping to event packets with a Message Type of < 8. Note that the
X700 series datasheet seems to indicate that the V2MESSTYPE field is no
longer relevant. However, we only tested and validated with the
V2MESSTYPE field left set to 0xF for the "wildcard" behavior it documents.
Setting it might not be required, but in that case it appears harmless, so
leave it as is.

This avoids wasting the valuable Rx timestamp register slots on non-event
frames, and may reduce faults when operating under high event rates.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260430-jk-iwl-net-next-2026-04-30-v1-10-6f27ae1cd073@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago ixgbe: fix unaligned u32 access in ixgbe_update_flash_X550()
Aleksandr Loktionov [Fri, 1 May 2026 06:37:20 +0000 (23:37 -0700)] 
ixgbe: fix unaligned u32 access in ixgbe_update_flash_X550()

ixgbe_host_interface_command() treats its buffer as a u32 array. The
local buffer we pass in was a union of byte-sized fields, which gives
it 1-byte alignment on the stack. On strict-align architectures this
can cause unaligned 32-bit accesses.

Add a u32 member to union ixgbe_hic_hdr2 so the object is 4-byte
aligned, and pass the u32 member when calling
ixgbe_host_interface_command().

No functional change on x86; prevents unaligned accesses on
architectures that enforce natural alignment.
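
The alignment effect can be checked with a small userspace sketch. The types below are hypothetical stand-ins, not the real union ixgbe_hic_hdr2 layout; they only show how adding a 32-bit member raises the alignment of a union whose other members are all byte-sized:

```c
#include <stdint.h>

/* Hypothetical header made only of byte-sized fields. */
struct sketch_hdr {
	uint8_t cmd;
	uint8_t buf_len1;
	uint8_t buf_len2;
	uint8_t checksum;
};

/* Every member is byte-aligned, so on common ABIs the union is too. */
union sketch_bytes_only {
	struct sketch_hdr hdr;
	uint8_t raw[4];
};

/* Same members plus a u32: the union takes the alignment of uint32_t. */
union sketch_with_u32 {
	struct sketch_hdr hdr;
	uint8_t raw[4];
	uint32_t u32; /* also gives callers a correctly typed pointer to pass */
};
```

Both unions have the same size here; only the alignment requirement changes, which is why the fix has no functional effect on x86.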

Fixes: 49425dfc7451 ("ixgbe: Add support for x550em_a 10G MAC type")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Fixes: 6a14ee0cfb19 ("ixgbe: Add X550 support function pointers")
Tested-by: Rinitha S <sx.rinitha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260430-jk-iwl-net-next-2026-04-30-v1-9-6f27ae1cd073@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago ixgbe: E610: add EEE support
Jedrzej Jagielski [Fri, 1 May 2026 06:37:17 +0000 (23:37 -0700)] 
ixgbe: E610: add EEE support

Add E610 specific implementation of .get_eee() and .set_eee() ethtool
callbacks.

Introduce ixgbe_setup_eee_e610() which is used to set EEE config
on E610 device via ixgbe_aci_set_phy_cfg() (0x0601 ACI command).
Assign it to dedicated mac operation.

E610 devices support the EEE feature specifically for 2.5G, 5G and 10G link
speeds. When the user tries to set EEE for unsupported speeds, log it.

Setting timer and setting EEE advertised speeds are not yet supported.

EEE shall be enabled by default for E610 devices.

Add EEE status logging during link watchdog run.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260430-jk-iwl-net-next-2026-04-30-v1-6-6f27ae1cd073@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago ixgbe: move EEE config validation out of ixgbe_set_eee()
Jedrzej Jagielski [Fri, 1 May 2026 06:37:16 +0000 (23:37 -0700)] 
ixgbe: move EEE config validation out of ixgbe_set_eee()

To make this part of the code more reusable, move all
EEE input checks out of ixgbe_set_eee().

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260430-jk-iwl-net-next-2026-04-30-v1-5-6f27ae1cd073@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago ixgbe: E610: update ACI command structs with EEE fields
Jedrzej Jagielski [Fri, 1 May 2026 06:37:15 +0000 (23:37 -0700)] 
ixgbe: E610: update ACI command structs with EEE fields

There were recent changes in some of the ACI commands,
which have been extended with EEE related fields.
Set PHY Config, Get PHY Caps and Get Link Info have been
affected.

Align SW structs to the recent FW changes.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260430-jk-iwl-net-next-2026-04-30-v1-4-6f27ae1cd073@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago ixgbe: E610: use new version of 0x601 ACI command buffer
Jedrzej Jagielski [Fri, 1 May 2026 06:37:14 +0000 (23:37 -0700)] 
ixgbe: E610: use new version of 0x601 ACI command buffer

Since FW version 1.40, the buffer size of the 0x601 command has increased
by 2B, from 24B to 26B. The buffer has been extended with a new field
which can be used to configure EEE entry delay.

Pre-1.40 FW versions still expect a 24B buffer and throw an error when
they receive a 26B buffer. To keep compatibility, check whether the EEE
device capability flag is set and, based on it, use the appropriate
size of the command buffer.
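
The compatibility rule reduces to a size selection keyed on the capability flag. A minimal sketch follows; the macro and function names are hypothetical and do not come from the ixgbe sources, and only the 24 and 26 byte sizes are taken from the text above:

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_SET_PHY_CFG_LEN_LEGACY 24 /* expected by pre-1.40 firmware */
#define SKETCH_SET_PHY_CFG_LEN_EEE    26 /* adds 2 bytes for EEE entry delay */

/*
 * Hypothetical helper: pick the 0x601 command buffer length from the
 * EEE device capability flag, so that older firmware never sees the
 * longer buffer it would reject.
 */
static size_t sketch_set_phy_cfg_len(bool fw_has_eee_cap)
{
	return fw_has_eee_cap ? SKETCH_SET_PHY_CFG_LEN_EEE
			      : SKETCH_SET_PHY_CFG_LEN_LEGACY;
}
```

Keying the choice on a capability flag rather than on a firmware version comparison avoids hard-coding version numbers in the command path.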

Additionally, move the Set PHY Config capability defines out of the
struct definitions.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260430-jk-iwl-net-next-2026-04-30-v1-3-6f27ae1cd073@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago ixgbe: E610: update EEE supported speeds
Jedrzej Jagielski [Fri, 1 May 2026 06:37:13 +0000 (23:37 -0700)] 
ixgbe: E610: update EEE supported speeds

Although there was no EEE (Energy Efficient Ethernet) feature
support for E610 adapters, the eee_speeds_supported variable was
defined and even initialized with some EEE speeds.

As the E610 adapter supports EEE only for 10G, 5G and 2.5G speeds,
update hw.phy.eee_speeds_supported. Remove the unsupported speeds:
10M, 100M and 1G.

Also add an entry for the 5G speed in the EEE speeds mapping array used
by ethtool callbacks.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260430-jk-iwl-net-next-2026-04-30-v1-2-6f27ae1cd073@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago ixgbe: E610: add discovering EEE capability
Jedrzej Jagielski [Fri, 1 May 2026 06:37:12 +0000 (23:37 -0700)] 
ixgbe: E610: add discovering EEE capability

Add detecting and parsing EEE device capability.

EEE functionality support has recently been introduced to E610 FW.
Currently the ixgbe driver has no way to detect whether the NVM
loaded on a given adapter supports EEE.

There is a dedicated device capability element reflecting FW support
for a given EEE link speed.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260430-jk-iwl-net-next-2026-04-30-v1-1-6f27ae1cd073@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago Merge branch 'net-dsa-yt921x-add-port-police-support'
Jakub Kicinski [Sat, 2 May 2026 17:39:00 +0000 (10:39 -0700)] 
Merge branch 'net-dsa-yt921x-add-port-police-support'

David Yang says:

====================
net: dsa: yt921x: Add port police support
====================

Link: https://patch.msgid.link/20260430114529.3536911-1-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago net: dsa: yt921x: Add port police support
David Yang [Thu, 30 Apr 2026 11:45:26 +0000 (19:45 +0800)] 
net: dsa: yt921x: Add port police support

Enable rate meter ability and support limiting the rate of incoming
traffic.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260430114529.3536911-4-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago net: dsa: yt921x: Refactor long register helpers
David Yang [Thu, 30 Apr 2026 11:45:25 +0000 (19:45 +0800)] 
net: dsa: yt921x: Refactor long register helpers

Dealing with long registers as u64 values is good, until you realize
there are longer 96-bit registers.

Refactor reg64 helpers to use u32 arrays instead of u64 values, in
preparation for 96-bit registers. We do not keep the separate u64
version for reg64 to avoid duplicated wrappers, although it looks better
when dealing with reg64 *only*.
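
A rough sketch of why a u32 array scales where a u64 does not: the same helper can address bits in a 64-bit or a 96-bit register image. All names below are hypothetical and the word order (least significant word first) is an assumption; the actual yt921x helpers may differ:

```c
#include <stddef.h>
#include <stdint.h>

/* Split a 64-bit value into two 32-bit words, least significant first. */
static void sketch_u64_to_words(uint64_t val, uint32_t words[2])
{
	words[0] = (uint32_t)val;
	words[1] = (uint32_t)(val >> 32);
}

/* Reassemble a 64-bit value from two 32-bit words. */
static uint64_t sketch_words_to_u64(const uint32_t words[2])
{
	return ((uint64_t)words[1] << 32) | words[0];
}

/*
 * Set one bit in a register image of nwords 32-bit words. This works
 * unchanged for nwords == 2 (reg64) and nwords == 3 (reg96).
 */
static void sketch_words_set_bit(uint32_t *words, size_t nwords,
				 unsigned int bit)
{
	if (bit / 32 < nwords)
		words[bit / 32] |= UINT32_C(1) << (bit % 32);
}
```

The cost, as the message notes, is that code touching only 64-bit registers reads less naturally than it would with a plain u64.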

Helpers for reg96 should be added when they are actually used, to avoid
unused-function warnings.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260430114529.3536911-3-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago net: dsa: pass extack to dsa_switch_ops :: port_policer_add()
David Yang [Thu, 30 Apr 2026 11:45:24 +0000 (19:45 +0800)] 
net: dsa: pass extack to dsa_switch_ops :: port_policer_add()

Drivers might have error messages to propagate to user space. Propagate
the netlink extack so that they can inform user space in a verbal way of
their limitations.

Make the corresponding changes to the two users (sja1105 and felix).

Signed-off-by: David Yang <mmyangfl@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260430114529.3536911-2-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago net/mlx5: Add vhca_id_type support to IPsec alias creation
Patrisious Haddad [Thu, 30 Apr 2026 06:19:58 +0000 (09:19 +0300)] 
net/mlx5: Add vhca_id_type support to IPsec alias creation

When creating an alias FT for MPV IPsec, use sw_vhca_id instead of
hw_vhca_id if alias creation with sw_vhca_id is supported.

This allows IPsec to work properly after live migration. If a VF is
live migrated and its hw_vhca_id changes, which can happen when
migrating to a VF with a different index, IPsec would fail to start
post migration. This patch resolves the issue by using sw_vhca_id,
which doesn't change post migration.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260430061958.225245-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago Documentation/tcp_ao: Document the supported MAC algorithms and lengths
Eric Biggers [Wed, 29 Apr 2026 21:08:56 +0000 (21:08 +0000)] 
Documentation/tcp_ao: Document the supported MAC algorithms and lengths

Update the TCP-AO documentation to fix some incorrect terminology and
claims regarding the MAC algorithms, and document which MAC algorithms
and lengths the Linux implementation supports.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20260429210856.725667-1-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago Merge branch 'net-mlx5-enable-sub-page-allocations-for-mlx5_frag_buf'
Jakub Kicinski [Sat, 2 May 2026 02:02:09 +0000 (19:02 -0700)] 
Merge branch 'net-mlx5-enable-sub-page-allocations-for-mlx5_frag_buf'

Tariq Toukan says:

====================
net/mlx5: enable sub-page allocations for mlx5_frag_buf

This series aims to improve memory utilization for DMA-coherent
fragmented-buffer allocations on systems with large PAGE_SIZE.

Before this change, such allocations were page-granular, as they were
backed by full pages. On large-page systems this caused significant
internal waste for small objects. For example, a single 4K request
consumed an entire 64K page.

The common kernel solution for sub-page coherent DMA allocations is the
DMA pool API. However, those pools do not return pages to the system
until teardown. That behavior is not a good fit for mlx5_frag_buf
allocations, since they back interface resources (WQs and CQs).
Interfaces may be removed dynamically, so their memory footprint should
reflect live usage to avoid situations where large amounts of memory
remain tied up in pools.

This series introduces a lightweight mlx5-local pool implementation for
sub-page coherent DMA allocations, which immediately returns free
backing pages. It wires mlx5_frag_buf allocations to use these internal
pools, while keeping the mechanism reusable for other mlx5-internal
coherent DMA allocation users in follow-up work.
====================

Link: https://patch.msgid.link/20260429201429.223809-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago net/mlx5: use internal dma pools for frag buf alloc
Nimrod Oren [Wed, 29 Apr 2026 20:14:29 +0000 (23:14 +0300)] 
net/mlx5: use internal dma pools for frag buf alloc

Add mlx5_dma_pool alloc/free paths, and wire mlx5_frag_buf allocation
and free paths to use them.

mlx5_frag_buf_alloc_node() now selects an mlx5_dma_pool to allocate
fragments from, instead of directly allocating full coherent pages.

mlx5_frag_buf_free() frees from the respective pool.

mlx5_dma_pool_alloc() keeps allocation fast by maintaining pages with
available indexes at the head of the list, so the common allocation path
can take a free index immediately. New backing pages are allocated only
when no free index is available.

mlx5_dma_pool_free() returns released indexes to the pool and frees a
backing page once all of its indexes become free. This avoids keeping
fully free pages for the lifetime of the pool and reduces coherent DMA
memory footprint.

Signed-off-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260429201429.223809-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago net/mlx5: add frag buf pools create/destroy paths
Nimrod Oren [Wed, 29 Apr 2026 20:14:28 +0000 (23:14 +0300)] 
net/mlx5: add frag buf pools create/destroy paths

Introduce mlx5 DMA pool and pool-page data structures, and add the
creation and teardown paths.

Each NUMA node owns a set of mlx5_dma_pool instances, each one with a
different block size. The sizes are defined as all powers of two
starting from MLX5_ADAPTER_PAGE_SHIFT and up to PAGE_SHIFT. Since
mlx5_frag_bufs are used to back objects whose sizes are encoded relative
to MLX5_ADAPTER_PAGE_SHIFT, a smaller block_shift value cannot be used.
Requests larger than PAGE_SIZE continue to be handled as page-sized
fragments, as in the existing frag-buf allocation model.
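
The size classes described above amount to picking the smallest power-of-two block that covers a request, bounded below by the adapter page shift and above by PAGE_SHIFT. A minimal sketch, assuming a minimum shift of 12 as a stand-in for MLX5_ADAPTER_PAGE_SHIFT; `sketch_block_shift()` is a hypothetical name, not the function used in the patch:

```c
#include <stddef.h>

/* Assumed stand-in for MLX5_ADAPTER_PAGE_SHIFT (4K blocks). */
#define SKETCH_MIN_BLOCK_SHIFT 12

/*
 * Hypothetical helper: return the smallest block shift in
 * [SKETCH_MIN_BLOCK_SHIFT, page_shift] whose block covers size bytes.
 * Requests larger than a page are clamped to page_shift, modelling
 * the "page-sized fragments" fallback.
 */
static unsigned int sketch_block_shift(size_t size, unsigned int page_shift)
{
	unsigned int shift = SKETCH_MIN_BLOCK_SHIFT;

	while (shift < page_shift && ((size_t)1 << shift) < size)
		shift++;

	return shift;
}
```

On a 64K-page system (page_shift of 16) a 4K request maps to shift 12, so sixteen such blocks can share one backing page; on a 4K-page system every request maps to shift 12 and nothing changes.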

Signed-off-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260429201429.223809-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago net/mlx5: wire frag buf pools lifecycle hooks
Nimrod Oren [Wed, 29 Apr 2026 20:14:27 +0000 (23:14 +0300)] 
net/mlx5: wire frag buf pools lifecycle hooks

Wire mlx5_frag_buf pools init/cleanup hooks into
mlx5_mdev_init()/uninit() and the init unwind path.

Keep temporary no-op stubs in alloc.c so lifecycle ordering is in place
before the coherent DMA sub-page allocator implementation is added in
follow-up patches.

Signed-off-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260429201429.223809-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago pppoe: optimize hash with word access
Qingfang Deng [Wed, 29 Apr 2026 02:38:46 +0000 (10:38 +0800)] 
pppoe: optimize hash with word access

Currently, hash_item() processes the 6-byte Ethernet address and the
2-byte session ID byte-wise to compute a hash.

Optimize this by using 16-bit word operations: XOR three 16-bit words
from the Ethernet address and the 16-bit session ID, then fold the
result. This reduces the total number of loads and XORs. The Ethernet
addresses in a skb and struct pppoe_addr are both 2-byte aligned, so the
u16 pointer cast is safe.
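
The two approaches can be compared in a userspace sketch. This is an illustration under assumptions, not the patched hash_item(): the function names are hypothetical, the table is assumed to use 4 hash bits, and memcpy() replaces the u16 pointer cast so the sketch carries no alignment requirement:

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_HASH_BITS 4 /* assumed hash table size of 16 buckets */
#define SKETCH_HASH_MASK ((1u << SKETCH_HASH_BITS) - 1)

/* Byte-wise: XOR six address bytes and two session-ID bytes, then fold. */
static unsigned int sketch_hash_bytes(const uint8_t addr[6], uint16_t sid)
{
	uint8_t sid_bytes[2];
	unsigned int hash = 0;
	int i;

	memcpy(sid_bytes, &sid, sizeof(sid_bytes));
	for (i = 0; i < 6; i++)
		hash ^= addr[i];
	hash ^= sid_bytes[0] ^ sid_bytes[1];
	hash ^= hash >> 4; /* fold 8 bits down to 4 */

	return hash & SKETCH_HASH_MASK;
}

/* Word-wise: XOR three 16-bit address words and the session ID, then fold. */
static unsigned int sketch_hash_words(const uint8_t addr[6], uint16_t sid)
{
	uint16_t w[3];
	unsigned int hash;

	memcpy(w, addr, sizeof(w)); /* alignment-safe 16-bit loads */
	hash = w[0] ^ w[1] ^ w[2] ^ sid;
	hash ^= hash >> 8; /* fold 16 bits down to 8 */
	hash ^= hash >> 4; /* fold 8 bits down to 4 */

	return hash & SKETCH_HASH_MASK;
}
```

Because XOR is applied to both halves of every word before masking, the word-wise variant in this sketch returns the same bucket as the byte-wise one regardless of byte order, while using four loads instead of eight.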

Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
Link: https://patch.msgid.link/20260429023848.153425-1-qingfang.deng@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago net: airoha: configure QoS channel for HW accelerated flowtable traffic
Lorenzo Bianconi [Thu, 30 Apr 2026 08:47:38 +0000 (10:47 +0200)] 
net: airoha: configure QoS channel for HW accelerated flowtable traffic

As done for the SW path, configure the QoS channel for HW accelerated
traffic according to the user port index when forwarding to a DSA port,
or rely on the GDM port identifier otherwise. This allows HTB shaping
to be applied to HW accelerated traffic.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260430-airoha-ppe-qos-channel-v1-1-5ef9221e85c1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago Merge branch 'tcp-move-some-fastpath-fields-to-appropriate-groups'
Jakub Kicinski [Sat, 2 May 2026 00:22:47 +0000 (17:22 -0700)] 
Merge branch 'tcp-move-some-fastpath-fields-to-appropriate-groups'

Eric Dumazet says:

====================
tcp: move some fastpath fields to appropriate groups

Move the following fields to better groups to increase data locality.

- delivered
- delivered_ce
- segs_in
- segs_out
- first_tx_mstamp
- delivered_mstamp
- max_packets_out
- cwnd_usage_seq
- rate_delivered
- rate_interval_us

No change in overall tcp_sock size.
====================

Link: https://patch.msgid.link/20260430100021.211139-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago tcp: move max_packets_out, cwnd_usage_seq, rate_delivered and rate_interval_us to...
Eric Dumazet [Thu, 30 Apr 2026 10:00:21 +0000 (10:00 +0000)] 
tcp: move max_packets_out, cwnd_usage_seq, rate_delivered and rate_interval_us to tcp_sock_write_tx group

These fields are used in TX path.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260430100021.211139-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago tcp: move tp->bytes_acked to tcp_sock_write_tx group
Eric Dumazet [Thu, 30 Apr 2026 10:00:20 +0000 (10:00 +0000)] 
tcp: move tp->bytes_acked to tcp_sock_write_tx group

tp->bytes_acked is touched in TX path only.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260430100021.211139-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago tcp: move tp->first_tx_mstamp and tp->delivered_mstamp to tcp_sock_write_tx
Eric Dumazet [Thu, 30 Apr 2026 10:00:19 +0000 (10:00 +0000)] 
tcp: move tp->first_tx_mstamp and tp->delivered_mstamp to tcp_sock_write_tx

These fields are touched when payload is sent.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260430100021.211139-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago tcp: move tp->segs_in and tp->segs_out to tcp_sock_write_txrx group
Eric Dumazet [Thu, 30 Apr 2026 10:00:18 +0000 (10:00 +0000)] 
tcp: move tp->segs_in and tp->segs_out to tcp_sock_write_txrx group

segs_in is changed for each incoming packet, including ACK packets.
segs_out is changed for each outgoing packet, including ACK packets.

They belong to tcp_sock_write_txrx group.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260430100021.211139-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago tcp: move tp->delivered and tp->delivered_ce to tcp_sock_write_tx group
Eric Dumazet [Thu, 30 Apr 2026 10:00:17 +0000 (10:00 +0000)] 
tcp: move tp->delivered and tp->delivered_ce to tcp_sock_write_tx group

These counters are changed whenever sent data is acknowledged.

They do not belong to tcp_sock_write_txrx group, because TCP receivers
do not touch them.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260430100021.211139-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago selftests: drv-net: Enable ntuple-filters if supported
Dimitri Daskalakis [Thu, 30 Apr 2026 16:52:17 +0000 (09:52 -0700)] 
selftests: drv-net: Enable ntuple-filters if supported

Certain devices which support ntuple-filters do not enable the feature
by default. The existing tests will skip (if they check for the feature),
or fail if they blindly attempt to install rules. Therefore, attempt to turn
on ntuple-filters if the device supports them.

Signed-off-by: Dimitri Daskalakis <daskald@meta.com>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260430165217.3700469-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago ip6mr: plug drop_reason to ip6mr_cache_report()
Eric Dumazet [Thu, 30 Apr 2026 07:40:04 +0000 (07:40 +0000)] 
ip6mr: plug drop_reason to ip6mr_cache_report()

- Check mrt->mroute_sk earlier in the function.

- Use sock_queue_rcv_skb_reason() instead of sock_queue_rcv_skb().
- Use sk_skb_reason_drop() instead of kfree_skb().
  Note that we return -ENOMEM if sock_queue_rcv_skb_reason() failed,
  as the precise error is not really needed for callers.

- Remove one net_warn_ratelimited().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260430074004.4133602-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago ne2k: fold drivers/net/Space.c into ne.c
Arnd Bergmann [Wed, 29 Apr 2026 14:55:46 +0000 (16:55 +0200)] 
ne2k: fold drivers/net/Space.c into ne.c

drivers/net/Space.c is the last remnant of the linux-2.4.x driver model
that required each subsystem and device driver init function to be called
from init/main.c explicitly, before the introduction of initcall levels.

In linux-7.0, this was only used for a handful of ISA network drivers,
with the ne2000 driver being the last one.

Fold the code into ne.c directly, with minimal changes to preserve
the existing command line parsing.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260429145624.2948432-2-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: cs89x0: remove ISA bus probing
Arnd Bergmann [Wed, 29 Apr 2026 14:55:45 +0000 (16:55 +0200)] 
net: cs89x0: remove ISA bus probing

The cs89x0 driver is really two in one, and they are mutually exclusive:

 - the ISA driver was used on 486-era PCs. It likely has no remaining
   users, like the other ethernet drivers that got removed in
   linux-7.1. The DMA support in here is the last device driver use of
   the deprecated isa_bus_to_virt() interface, all other users are either
   x86 specific or got converted to the normal dma-mapping interface.
   The driver was maintained by Andrew Morton at the time, based on
   the linux-2.2 vendor driver from Cirrus Logic.

 - the platform_driver instance was used on some embedded Arm boards
   around the same time, such as the EP7211 Development Kit. This
   is the same chip, but uses modern devicetree based probing and no DMA.
   This was added by Alexander Shiyan.

Remove the ISA driver as a cleanup, including all of the outdated
documentation referring to its configuration.

Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260429145624.2948432-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: tls: reshuffle the device ops check
Jakub Kicinski [Wed, 29 Apr 2026 21:30:01 +0000 (14:30 -0700)] 
net: tls: reshuffle the device ops check

We try to validate during registration that the netdev
has ops if it has features. This is currently somewhat sillily
written because we have a dereference before a NULL check
on the ops struct. Straighten this out.

No functional change intended other than saving ourselves
the very theoretical crash with a bad driver.

Note that we check earlier in the function that either ops
or TLS features are set for the device in question.
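
The reordering described above can be sketched as follows. This is a
standalone model, not the kernel code: struct sketch_tls_ops,
struct sketch_netdev, their members and sketch_tls_validate() are all
hypothetical names, and the sketch assumes that only devices advertising
TLS features need to be validated.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the member names are not the kernel's. */
struct sketch_tls_ops {
	int (*add)(void);
	void (*del)(void);
};

struct sketch_netdev {
	bool has_tls_features;
	const struct sketch_tls_ops *ops;
};

/* Dummy callbacks so that a fully populated ops struct can be built. */
int sketch_noop_add(void)
{
	return 0;
}

void sketch_noop_del(void)
{
}

/* Return 0 if the device may register, -EINVAL otherwise. */
int sketch_tls_validate(const struct sketch_netdev *dev)
{
	assert(dev != NULL);

	/* Assumption: devices without TLS features need no ops here. */
	if (!dev->has_tls_features)
		return 0;

	/* The NULL check comes first, before any member of ops is read. */
	if (!dev->ops)
		return -EINVAL;

	if (!dev->ops->add || !dev->ops->del)
		return -EINVAL;

	return 0;
}
```

With this ordering, a device that sets the feature flag but leaves ops
NULL is rejected with -EINVAL instead of being dereferenced.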

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260429213001.1908235-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'net-sched-tc_dump_qdisc-optimizations'
Jakub Kicinski [Fri, 1 May 2026 03:54:58 +0000 (20:54 -0700)] 
Merge branch 'net-sched-tc_dump_qdisc-optimizations'

Eric Dumazet says:

====================
net/sched: tc_dump_qdisc() optimizations

Before converting tc_dump_qdisc() to RCU, we make the following changes:

- Use for_each_netdev_dump() instead of for_each_netdev()

- Only dump qdiscs of a single device at user space request.
====================

Link: https://patch.msgid.link/20260430023628.3216283-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet/sched: speedup tc_dump_qdisc() when tcm_ifindex is provided
Eric Dumazet [Thu, 30 Apr 2026 02:36:28 +0000 (02:36 +0000)] 
net/sched: speedup tc_dump_qdisc() when tcm_ifindex is provided

There is no point dumping qdiscs for all devices when user space
wants them for a single device:

tc -s -d qdisc show dev eth1

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260430023628.3216283-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet/sched: switch tc_dump_qdisc() to for_each_netdev_dump()
Eric Dumazet [Thu, 30 Apr 2026 02:36:27 +0000 (02:36 +0000)] 
net/sched: switch tc_dump_qdisc() to for_each_netdev_dump()

Use for_each_netdev_dump() instead of for_each_netdev().

This is more scalable, and will ease RCU conversion.

This also offers better behavior when other threads
are adding or deleting netdevices concurrently.

This enables dumping qdiscs for a single device
at user space request in the following patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260430023628.3216283-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet/sched: tc_dump_qdisc_root() refactor
Eric Dumazet [Thu, 30 Apr 2026 02:36:26 +0000 (02:36 +0000)] 
net/sched: tc_dump_qdisc_root() refactor

Change tc_fill_qdisc() to return -EMSGSIZE when skb is too small.

Change tc_dump_qdisc_root() to propagate tc_fill_qdisc() error to its callers.
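
A minimal userspace sketch of the error propagation pattern described
above. It assumes a fixed-size buffer in place of the skb; struct
sketch_buf, sketch_fill() and sketch_dump() are hypothetical names and
do not mirror the kernel functions' signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Fixed-size stand-in for the skb being filled. */
struct sketch_buf {
	char data[64];
	size_t len;
};

/* Append one record, or return -EMSGSIZE when it does not fit. */
int sketch_fill(struct sketch_buf *b, const char *rec)
{
	size_t n;

	assert(b != NULL && rec != NULL);
	n = strlen(rec);
	if (n > sizeof(b->data) - b->len)
		return -EMSGSIZE;
	memcpy(b->data + b->len, rec, n);
	b->len += n;
	return 0;
}

/*
 * Dump records starting at *done. The error from sketch_fill() is
 * returned unchanged, and *done records how far the dump got so that
 * the caller can resume with a fresh buffer.
 */
int sketch_dump(struct sketch_buf *b, const char *const *recs,
		size_t count, size_t *done)
{
	for (size_t i = *done; i < count; i++) {
		int err = sketch_fill(b, recs[i]);

		if (err)
			return err;
		*done = i + 1;
	}
	return 0;
}
```

Returning the original error lets the caller tell "buffer full, call
again" apart from other failures, which a generic error code would hide.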

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260430023628.3216283-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet/sched: propagate tc_fill_tclass() error
Eric Dumazet [Thu, 30 Apr 2026 02:36:25 +0000 (02:36 +0000)] 
net/sched: propagate tc_fill_tclass() error

Change tc_fill_tclass() to return -EMSGSIZE when skb is too small.

Change its caller to propagate this error (instead of -EINVAL).

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260430023628.3216283-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoselftests/net: packetdrill: add tcp_syncookies_ip[46]_9k
Eric Dumazet [Thu, 30 Apr 2026 02:14:44 +0000 (02:14 +0000)] 
selftests/net: packetdrill: add tcp_syncookies_ip[46]_9k

These tests check syncookie mode is able to reconstruct some
client options when TCP TS are used:

- wscale option.
- sackOK.
- MSS (in a limited way, especially for IPv4).
- ECN : not enabled.

Note that IPv4 and IPv6 have different msstab[] values:

IPv4 msstab[4] = { 536, 1300, 1440, 1460 }
IPv6 msstab[4] = { 1280 - 60, 1480 - 60, 1500 - 60, 9000 - 60 }

IPv4 is currently capping SND_MSS to 1460, even on a 9K MTU network.
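
To illustrate the cap, the sketch below models the table lookup as
"pick the largest msstab[] entry that does not exceed the client MSS".
The table values are copied from this message; pick_mss() and the
standalone form are assumptions made for illustration, not the kernel
implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Table values copied from the commit message above. */
const unsigned short msstab_v4[4] = { 536, 1300, 1440, 1460 };
const unsigned short msstab_v6[4] = {
	1280 - 60, 1480 - 60, 1500 - 60, 9000 - 60
};

/*
 * Hypothetical helper: return the largest table entry that does not
 * exceed the client's MSS, falling back to the smallest entry.
 */
unsigned short pick_mss(const unsigned short *tab, size_t n,
			unsigned short mss)
{
	size_t i;

	assert(n > 0);
	for (i = n - 1; i > 0; i--)
		if (mss >= tab[i])
			break;
	return tab[i];
}
```

On a 9000-byte MTU, an IPv4 client MSS of 8960 maps to 1460 under this
model, while an IPv6 client MSS of 8940 maps to the 9000 - 60 entry.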

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20260430021444.2929534-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox...
Jakub Kicinski [Fri, 1 May 2026 01:53:20 +0000 (18:53 -0700)] 
Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Tariq Toukan says:

====================
mlx5-next updates 2026-04-29

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Extend query_esw_functions output for multi-function support
  net/mlx5: Remove unused host_sf_enable field
  net/mlx5: Add function_id_type for enable/disable_hca cmds
  mlx5: Rename the vport number enums for host PF and VF
====================

Link: https://patch.msgid.link/20260429212747.224411-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'bridge-do-not-suppress-arp-probes-and-dad-ns-unconditionally'
Jakub Kicinski [Fri, 1 May 2026 00:35:20 +0000 (17:35 -0700)] 
Merge branch 'bridge-do-not-suppress-arp-probes-and-dad-ns-unconditionally'

Danielle Ratson says:

====================
bridge: Do not suppress ARP probes and DAD NS unconditionally

When using bridge neighbor suppression in EVPN deployments, Duplicate
Address Detection (DAD) is currently broken for both IPv4 (ARP probes)
and IPv6 (DAD Neighbor Solicitations). This prevents proper address
conflict detection across the VXLAN fabric.

The neighbor suppression feature allows the bridge to reply to ARP/NS
messages on behalf of remote hosts when FDB and neighbor entries exist,
suppressing unnecessary flooding over the VXLAN overlay. However, the
current implementation unconditionally suppresses ARP probes and DAD NS,
which breaks DAD.

For DAD to work correctly:
- When the bridge doesn't know the answer:
  flood the probe/DAD packet to allow remote VTEPs to respond.
- When the bridge knows the answer:
  reply to indicate the address is in use.

This series fixes the issue by adjusting the early suppression checks to
exclude ARP probes and DAD NS from unconditional suppression, allowing
them to reach the normal FDB lookup path. Gratuitous ARP and IPv6
unsolicited-NA messages are still suppressed unconditionally as before.

Patchset overview:
Patch #1: Fixes the unconditional suppression.
Patch #2: Adds selftests.
====================

Link: https://patch.msgid.link/20260429062405.1386417-1-danieller@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoselftests: net: Add tests for ARP probe and DAD NS handling
Danielle Ratson [Wed, 29 Apr 2026 06:24:05 +0000 (09:24 +0300)] 
selftests: net: Add tests for ARP probe and DAD NS handling

Add test cases to verify that ARP probes and DAD Neighbor Solicitations
are handled correctly by the bridge neighbor suppression feature.

When neighbor suppression is enabled on a bridge VXLAN port, the bridge
should reply to ARP/NS messages on behalf of remote hosts when both FDB
and neighbor entries exist, and the answer is known. However, when
either the FDB or the neighbor exists, ARP probes / DAD NS should be
treated like regular ARP requests / NS and flooded to VXLAN.

Add two new test functions:

neigh_suppress_arp_probe(): Tests ARP probe handling by triggering
duplicate address detection using arping -D. Verifies that probes are
flooded when the bridge doesn't know the answer, and suppressed when FDB
and neighbor entries exist.

neigh_suppress_dad_ns(): Tests DAD NS handling by constructing DAD NS
packets using mausezahn and verifies correct flooding/suppression
behavior.

Before the previous patch:

$ ./test_bridge_neigh_suppress.sh -t "neigh_suppress_arp_probe neigh_suppress_dad_ns"

Per-port ARP probe suppression
------------------------------
TEST: ARP probe suppression                                         [ OK ]
TEST: "neigh_suppress" is on                                        [ OK ]
TEST: ARP probe suppression                                         [FAIL]
TEST: FDB and neighbor entry installation                           [ OK ]
TEST: arping                                                        [FAIL]
TEST: ARP probe suppression                                         [FAIL]
TEST: neighbor removal                                              [ OK ]
TEST: ARP probe suppression                                         [FAIL]
TEST: "neigh_suppress" is off                                       [ OK ]
TEST: ARP probe suppression                                         [FAIL]

Per-port DAD NS suppression
---------------------------
TEST: DAD NS suppression                                            [ OK ]
TEST: "neigh_suppress" is on                                        [ OK ]
TEST: DAD NS suppression                                            [FAIL]
TEST: FDB and neighbor entry installation                           [ OK ]
TEST: DAD NS suppression                                            [FAIL]
TEST: neighbor removal                                              [ OK ]
TEST: DAD NS suppression                                            [FAIL]
TEST: DAD NS proxy NA reply                                         [FAIL]
TEST: "neigh_suppress" is off                                       [ OK ]
TEST: DAD NS suppression                                            [FAIL]

Tests passed:   10
Tests failed:   10

After the previous patch:

$ ./test_bridge_neigh_suppress.sh -t "neigh_suppress_arp_probe neigh_suppress_dad_ns"

Per-port ARP probe suppression
------------------------------
TEST: ARP probe suppression                                         [ OK ]
TEST: "neigh_suppress" is on                                        [ OK ]
TEST: ARP probe suppression                                         [ OK ]
TEST: FDB and neighbor entry installation                           [ OK ]
TEST: arping                                                        [ OK ]
TEST: ARP probe suppression                                         [ OK ]
TEST: neighbor removal                                              [ OK ]
TEST: ARP probe suppression                                         [ OK ]
TEST: "neigh_suppress" is off                                       [ OK ]
TEST: ARP probe suppression                                         [ OK ]

Per-port DAD NS suppression
---------------------------
TEST: DAD NS suppression                                            [ OK ]
TEST: "neigh_suppress" is on                                        [ OK ]
TEST: DAD NS suppression                                            [ OK ]
TEST: FDB and neighbor entry installation                           [ OK ]
TEST: DAD NS suppression                                            [ OK ]
TEST: neighbor removal                                              [ OK ]
TEST: DAD NS suppression                                            [ OK ]
TEST: DAD NS proxy NA reply                                         [ OK ]
TEST: "neigh_suppress" is off                                       [ OK ]
TEST: DAD NS suppression                                            [ OK ]

Tests passed:  20
Tests failed:   0

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260429062405.1386417-3-danieller@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agobridge: Do not suppress ARP probes and DAD NS unconditionally
Danielle Ratson [Wed, 29 Apr 2026 06:24:04 +0000 (09:24 +0300)] 
bridge: Do not suppress ARP probes and DAD NS unconditionally

When neighbor suppression is enabled on a VXLAN port, the bridge is
expected to reply to ARP/NS messages on behalf of remote hosts when both
FDB and neighbor entries exist. This allows the bridge to suppress
flooding of these messages to the VXLAN overlay.

According to RFC 9161 ("Operational Aspects of Proxy ARP/ND in Ethernet
Virtual Private Networks"):
"A PE SHOULD reply to broadcast/multicast address resolution messages,
i.e., ARP Requests, ARP probes, NS messages, as well as DAD NS messages.
An ARP probe is an ARP Request constructed with an all-zero sender IP
address that may be used by hosts for IPv4 Address Conflict Detection as
specified in [RFC5227]".

However, the current implementation unconditionally suppresses ARP probes
and DAD Neighbor Solicitations, which breaks Duplicate Address Detection
(DAD) over EVPN.

For DAD to work correctly over the VXLAN fabric:
- When the bridge does not know the answer:
  flood the probe/DAD packet to allow remote VTEPs to respond.
- When the bridge knows the answer:
  reply to indicate the address is in use.

Fix by adjusting the early suppression checks to exclude ARP probes and
DAD NS from unconditional suppression.

When replying to a DAD NS, br_nd_send() is adjusted to set the NA
destination to the all-nodes multicast address (ff02::1) and clear the
Solicited flag, in accordance with RFC 4861 section 7.2.4.
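
The intended decision can be sketched as below. This is a simplified
model rather than the bridge code: the sketch_* names are hypothetical,
an ARP probe is recognized by its all-zero sender IP (RFC 5227), and a
DAD NS is recognized by its unspecified (all-zero) IPv6 source address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum sketch_action { SKETCH_FLOOD, SKETCH_REPLY };

/* An ARP probe is an ARP request with an all-zero sender IP address. */
bool sketch_is_arp_probe(uint32_t sender_ip)
{
	return sender_ip == 0;
}

/* A DAD NS is sent from the unspecified address (::). */
bool sketch_is_dad_ns(const uint8_t src[16])
{
	static const uint8_t unspecified[16];

	assert(src != NULL);
	return memcmp(src, unspecified, sizeof(unspecified)) == 0;
}

/*
 * Probes and DAD NS are no longer dropped early: reply only when both
 * the FDB entry and the neighbor entry are known, otherwise flood so
 * that remote VTEPs get a chance to respond.
 */
enum sketch_action sketch_probe_action(bool have_fdb, bool have_neigh)
{
	return (have_fdb && have_neigh) ? SKETCH_REPLY : SKETCH_FLOOD;
}
```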

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260429062405.1386417-2-danieller@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet/smc: cap allocation order for SMC-R physically contiguous buffers
D. Wythe [Wed, 29 Apr 2026 02:16:37 +0000 (10:16 +0800)] 
net/smc: cap allocation order for SMC-R physically contiguous buffers

alloc_pages() cannot satisfy requests exceeding MAX_PAGE_ORDER,
and attempting such allocations will lead to guaranteed failures
and potential kernel warnings.

For SMCR_PHYS_CONT_BUFS, cap the allocation order to MAX_PAGE_ORDER.
This ensures the attempts to allocate the largest possible physically
contiguous chunk succeed, instead of failing with an invalid order.
This also avoids redundant "try-fail-degrade" cycles in
__smc_buf_create().

For SMCR_MIXED_BUFS, no cap is needed: if the order exceeds
MAX_PAGE_ORDER, alloc_pages() will silently fail (__GFP_NOWARN)
and automatically fall back to virtual memory.
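
A rough model of the capping rule, under stated assumptions:
SKETCH_MAX_PAGE_ORDER is set to 10 purely for illustration (the real
MAX_PAGE_ORDER depends on the kernel configuration), and the enum and
helper names are hypothetical.

```c
#include <assert.h>

/* Assumed value for illustration; the real limit is config dependent. */
#define SKETCH_MAX_PAGE_ORDER 10u

enum sketch_buf_type { SKETCH_PHYS_CONT, SKETCH_MIXED };

/* Smallest order such that (page_size << order) >= bytes. */
unsigned int sketch_order_for(unsigned long bytes, unsigned long page_size)
{
	unsigned int order = 0;

	assert(page_size > 0);
	while ((page_size << order) < bytes)
		order++;
	return order;
}

/*
 * Physically contiguous buffers are capped at the maximum order.
 * Mixed buffers keep the requested order, since a failed page
 * allocation falls back to virtual memory anyway.
 */
unsigned int sketch_alloc_order(enum sketch_buf_type type,
				unsigned int order)
{
	if (type == SKETCH_PHYS_CONT && order > SKETCH_MAX_PAGE_ORDER)
		return SKETCH_MAX_PAGE_ORDER;
	return order;
}
```

With 4 KiB pages, an 8 MiB request needs order 11, which the sketch
caps to order 10 for the physically contiguous case.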

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Sidraya Jayagond <sidraya@linux.ibm.com>
Link: https://patch.msgid.link/20260429021637.21815-1-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge tag 'wireless-next-2026-04-30' of https://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Fri, 1 May 2026 00:10:20 +0000 (17:10 -0700)] 
Merge tag 'wireless-next-2026-04-30' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
Some new content already, notably:
 - mac80211: major rework of station bandwidth handling,
             fixing issues with lower capability than AP
 - general: cleanups for EMLSR spec issues (drafts differed)
 - ath9k: GPIO interface improvements
 - ath12k: replace dynamic memory allocation in WMI RX path

* tag 'wireless-next-2026-04-30' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (39 commits)
  wifi: brcmsmac: phy_lcn: Remove dead code in wlc_lcnphy_radio_2064_channel_tune_4313()
  wifi: mac80211: always allow transmitting null-data on TXQs
  wifi: mac80211: use kstrtobool_from_user() in debugfs callbacks
  wifi: cfg80211: validate cipher suite for NAN Data keys
  wifi: nl80211: check link is beaconing for color change
  wifi: mac80211: clarify an 802.11 VHT spec reference
  wifi: mac80211: fix per-station PHY capability bandwidth
  wifi: mac80211: clarify per-STA bandwidth handling
  wifi: nl80211: always validate AP operation/PHY regulatory
  wifi: cfg80211: provide HT/VHT operation for AP beacon
  wifi: nl80211: reject too short HT/VHT/HE/EHT capability/operation
  wifi: cfg80211: move AP HT/VHT/... operation to beacon info
  wifi: nl80211: reject beacons with bad HE operation
  wifi: cfg80211: remove HE/SAE H2E required fields
  wifi: mac80211: remove ieee80211_sta_cur_vht_bw()
  wifi: mac80211: clean up ieee80211_sta_cap_rx_bw()
  wifi: mac80211: clean up initial STA NSS/bandwidth handling
  wifi: mac80211: clean up STA NSS handling
  wifi: mac80211: simplify ieee80211_sta_rx_bw_to_chan_width()
  wifi: nl80211: document channel opmode change channel width
  ...
====================

Link: https://patch.msgid.link/20260430120304.249081-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: mctp: test: remove skb dumps from test output
Jeremy Kerr [Wed, 29 Apr 2026 08:27:31 +0000 (16:27 +0800)] 
net: mctp: test: remove skb dumps from test output

We're currently dumping skb info in our fragment input test, which makes
interpreting the TAP test output a bit awkward.

Remove the skb dumps.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260429-dev-mctp-test-skb-dump-v1-1-13fd5789ef71@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 30 Apr 2026 19:49:56 +0000 (12:49 -0700)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-7.1-rc2).

No conflicts, or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge tag 'net-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 30 Apr 2026 15:45:43 +0000 (08:45 -0700)] 
Merge tag 'net-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  Current release - regressions:

   - ipmr: free mr_table after RCU grace period.

  Previous releases - regressions:

   - core: add net_iov_init() and use it to initialize ->page_type

   - sched: taprio: fix NULL pointer dereference in class dump

   - netfilter: nf_tables:
      - use list_del_rcu for netlink hooks
      - fix strict mode inbound policy matching

   - tcp: make probe0 timer handle expired user timeout

   - vrf: fix a potential NPD when removing a port from a VRF

   - eth: ice:
      - fix NULL pointer dereference in ice_reset_all_vfs()
      - fix infinite recursion in ice_cfg_tx_topo via ice_init_dev_hw

  Previous releases - always broken:

   - page_pool: fix memory-provider leak in error path

   - sched: sch_cake: annotate data-races in cake_dump_stats()

   - mptcp: fix scheduling with atomic in timestamp sockopt

   - psp: check for device unregister when creating assoc

   - tls: fix strparser anchor skb leak on offload RX setup failure

   - eth:
      - stmmac: prevent NULL deref when RX memory exhausted
      - airoha: do not read uninitialized fragment address
      - rtl8150: fix use-after-free in rtl8150_start_xmit()

  Misc:

   - add Ido Schimmel as IPv4/IPv6 maintainer

   - add David Heidelberg as NFC subsystem maintainer"

* tag 'net-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
  net/sched: cls_flower: revert unintended changes
  sfc: fix error code in efx_devlink_info_running_versions()
  net: tls: fix strparser anchor skb leak on offload RX setup failure
  ice: add dpll peer notification for paired SMA and U.FL pins
  ice: fix missing dpll notifications for SW pins
  dpll: export __dpll_pin_change_ntf() for use under dpll_lock
  ice: fix SMA and U.FL pin state changes affecting paired pin
  ice: fix missing SMA pin initialization in DPLL subsystem
  ice: fix infinite recursion in ice_cfg_tx_topo via ice_init_dev_hw
  ice: fix NULL pointer dereference in ice_reset_all_vfs()
  iavf: add VIRTCHNL_OP_ADD_VLAN to success completion handler
  iavf: wait for PF confirmation before removing VLAN filters
  iavf: stop removing VLAN filters from PF on interface down
  iavf: rename IAVF_VLAN_IS_NEW to IAVF_VLAN_ADDING
  page_pool: fix memory-provider leak in page_pool_create_percpu() error path
  bonding: 3ad: implement proper RCU rules for port->aggregator
  net: airoha: Do not return err in ndo_stop() callback
  hv_sock: fix ARM64 support
  MAINTAINERS: update the IPv4/IPv6 entry and add Ido Schimmel
  selftests: drv-net: clarify linters and frameworks in README
  ...

6 weeks agoMerge tag 'ata-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Linus Torvalds [Thu, 30 Apr 2026 15:35:36 +0000 (08:35 -0700)] 
Merge tag 'ata-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata fix from Niklas Cassel:

 - Fix a reference leak on device_register() failure in pata_parport

* tag 'ata-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: pata_parport: switch to dynamic root device

6 weeks agoMerge tag 'sound-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Thu, 30 Apr 2026 15:29:56 +0000 (08:29 -0700)] 
Merge tag 'sound-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A bunch of small fixes. One minor fix is found in the core side for
  data race in PCM OSS layer, while remaining changes are various
  device-specific fixes and quirks.

   - Core: PCM OSS data race fix

   - HD-audio: Fixes for TAS2781, CS35L56, and Realtek/Conexant quirks;
     avoidance of a WARN_ON for HDMI channel mapping

   - USB-audio: Improvements in UAC3 parsing robustness (leaks, size
     checks) and fixes for potential endless loops

   - ASoC: Driver-specific fixes for CS35L56, Intel bytcr_wm5102,
     Spacemit, AW88395, and others, plus a new quirk for Steam Deck
     OLED

   - Misc: A UAF fix in aloop driver, division by zero fix in ua101
     driver and leak fixes in caiaq driver"

* tag 'sound-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (32 commits)
  ALSA: hda/tas2781: Fix incorrect bit update for non-book-zero or book 0 pages >1
  ALSA: hda: cs35l56: Fix uninitialized value in cs35l56_hda_read_acpi()
  ALSA: hda/conexant: Fix missing error check for jack detection
  ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks
  ALSA: usb-audio: Fix quirk entry placement for PreSonus AudioBox USB
  ASoC: spacemit: adjust FIFO trigger threshold to half FIFO size
  ASoC: spacemit: move hw constraints from hw_params to startup
  ASoC: codecs: ab8500: Fix casting of private data
  ASoC: cs35l56: Fix illegal writes to OTP_MEM registers
  ASoC: Intel: bytcr_wm5102: Fix MCLK leak on platform_clock_control error
  ALSA: usb-audio: Avoid potential endless loop in convert_chmap_v3()
  ALSA: usb-audio: Fix potential leak of pd at parsing UAC3 streams
  ALSA: caiaq: Don't abort when no input device is available
  ALSA: caiaq: Fix potentially leftover ep1_in_urb at error path
  ASoC: aw88395: Fix kernel panic caused by invalid GPIO error pointer
  ALSA: caiaq: fix usb_dev refcount leak on probe failure
  sound: ua101: fix division by zero at probe
  ALSA: usb-audio: apply quirk for Playstation PDP Riffmaster
  ALSA: hda: Remove duplicate cmedia entries in codecs Makefile
  ALSA: hda/realtek: Add micmute LED quirk for Acer Aspire A315-44P
  ...

6 weeks agoMerge branch 'dpll-add-pin-operational-state'
Paolo Abeni [Thu, 30 Apr 2026 14:22:06 +0000 (16:22 +0200)] 
Merge branch 'dpll-add-pin-operational-state'

Ivan Vecera says:

====================
dpll: add pin operational state

Add pin operational state (operstate) to the DPLL subsystem to
separate administrative intent from actual hardware status.

Currently pin-state mixes what the user requested (connected,
selectable, disconnected) with what the hardware is actually doing.
This makes it difficult to diagnose situations where a user sets
a pin as selectable or connected but the hardware cannot use it
due to signal issues.

The new operstate attribute is reported inside the pin-parent-device
nest alongside the existing state and is read-only. Defined values:

  - active: pin is qualified and actively used by the DPLL
  - standby: pin is qualified but not actively used by the DPLL
  - no-signal: pin does not have a valid signal
  - qual-failed: pin signal failed qualification checks

Patch 1 adds the operstate enum, netlink attribute and the
operstate_on_dpll_get callback to the DPLL subsystem. It also
updates Documentation/driver-api/dpll.rst to describe the
separation between admin state and operational state.

Patch 2 implements the callback for ZL3073x input pins using the
reference monitor status register. It also refactors the existing
state_on_dpll_get to return purely administrative state and switches
periodic monitoring to track operstate changes.
====================

Link: https://patch.msgid.link/20260428154907.2820654-1-ivecera@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agodpll: zl3073x: implement pin operational state reporting
Ivan Vecera [Tue, 28 Apr 2026 15:49:07 +0000 (17:49 +0200)] 
dpll: zl3073x: implement pin operational state reporting

Implement operstate_on_dpll_get callback for input pins to report
the actual hardware status:

  - active: pin is the currently locked reference
  - standby: signal is valid but pin is not actively used
  - no-signal: reference monitor reports Loss of Signal (LOS)
  - qual-failed: reference monitor reports a qualification failure
    (SCM, CFM, GST, PFM, eSync or Split-XO)

Separate administrative state (state_on_dpll_get) from operational
state: admin state now reports purely the user-requested intent
(connected in reflock mode, selectable in auto mode).

Switch periodic monitoring to track operstate changes instead of
the mixed admin/oper state that was previously reported.

Add ref_mon_status bit definitions to regs.h.
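
The mapping from monitor status to operstate can be sketched as below.
The bit layout, the macro names and the check order (loss of signal
before qualification failures) are assumptions made for illustration and
are not taken from the ZL3073x register map or the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit layout for the reference monitor status. */
#define SKETCH_MON_LOS        0x01u
#define SKETCH_MON_QUAL_MASK  0x7eu

static_assert((SKETCH_MON_LOS & SKETCH_MON_QUAL_MASK) == 0,
	      "status bit groups must not overlap");

enum sketch_operstate {
	SKETCH_OPER_ACTIVE,
	SKETCH_OPER_STANDBY,
	SKETCH_OPER_NO_SIGNAL,
	SKETCH_OPER_QUAL_FAILED,
};

/*
 * Derive the operational state from the monitor status and from
 * whether the pin is the currently locked reference.
 */
enum sketch_operstate sketch_pin_operstate(unsigned int mon_status,
					   bool is_locked_ref)
{
	if (mon_status & SKETCH_MON_LOS)
		return SKETCH_OPER_NO_SIGNAL;
	if (mon_status & SKETCH_MON_QUAL_MASK)
		return SKETCH_OPER_QUAL_FAILED;
	return is_locked_ref ? SKETCH_OPER_ACTIVE : SKETCH_OPER_STANDBY;
}
```

The administrative state is deliberately absent from the inputs: in
this model the operstate depends only on what the hardware reports.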

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Petr Oros <poros@redhat.com>
Link: https://patch.msgid.link/20260428154907.2820654-3-ivecera@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agodpll: add pin operational state
Ivan Vecera [Tue, 28 Apr 2026 15:49:06 +0000 (17:49 +0200)] 
dpll: add pin operational state

Add pin-operstate enum and operstate_on_dpll_get callback to report
the actual hardware status of a pin with respect to its parent DPLL
device. Unlike pin-state (which reflects administrative intent set
by the user), operstate reflects what the hardware is actually doing.

Defined operational states:
  - active: pin is qualified and actively used by the DPLL
  - standby: pin is qualified but not actively used by the DPLL
  - no-signal: pin does not have a valid signal
  - qual-failed: pin signal failed qualification

The operstate is reported inside the pin-parent-device nested
attribute alongside the existing state and phase-offset attributes.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Petr Oros <poros@redhat.com>
Link: https://patch.msgid.link/20260428154907.2820654-2-ivecera@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agonet/sched: cls_flower: revert unintended changes
Paolo Abeni [Wed, 29 Apr 2026 07:39:11 +0000 (09:39 +0200)] 
net/sched: cls_flower: revert unintended changes

While applying the blamed commit 4ca07b9239bd ("net: mctp i2c: check
length before marking flow active"), I unintentionally included
unrelated and unacceptable changes.

Revert them.

Fixes: 4ca07b9239bd ("net: mctp i2c: check length before marking flow active")
Reported-by: Jeremy Kerr <jk@codeconstruct.com.au>
Closes: https://lore.kernel.org/netdev/bd8704fe0bd53e278add5cde4873256656623e2e.camel@codeconstruct.com.au/
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/043026a53ff84da88b17648c4b0d17f0331749cb.1777447863.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agosfc: fix error code in efx_devlink_info_running_versions()
Dan Carpenter [Wed, 29 Apr 2026 06:48:17 +0000 (09:48 +0300)] 
sfc: fix error code in efx_devlink_info_running_versions()

Return -EIO if efx_mcdi_rpc() doesn't return enough space.

Fixes: 14743ddd2495 ("sfc: add devlink info support for ef100")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/afGpsbLRHL4_H0KS@stanley.mountain
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agonet: tls: fix strparser anchor skb leak on offload RX setup failure
Jakub Kicinski [Tue, 28 Apr 2026 23:15:59 +0000 (16:15 -0700)] 
net: tls: fix strparser anchor skb leak on offload RX setup failure

When tls_set_device_offload_rx() fails at tls_dev_add(), the error path
calls tls_sw_free_resources_rx() to clean up the SW context that was
initialized by tls_set_sw_offload(). This function calls
tls_sw_release_resources_rx() (which stops the strparser via
tls_strp_stop()) and tls_sw_free_ctx_rx() (which kfrees the context),
but never frees the anchor skb that was allocated by alloc_skb(0) in
tls_strp_init().

Note that tls_sw_free_resources_rx() is exclusively used for this
"failed to start offload" code path, there's no other caller.

The leak did not exist before commit 84c61fe1a75b ("tls: rx: do not use
the standard strparser"), because the standard strparser doesn't try
to pre-allocate an skb.

The normal close path in tls_sk_proto_close() handles cleanup by calling
tls_sw_strparser_done() (which calls tls_strp_done()) after dropping
the socket lock, because tls_strp_done() does cancel_work_sync() and
the strparser work handler takes the socket lock.

Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20260428231559.1358502-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks ago Merge branch 'intel-wired-lan-update-2026-04-27-ice-iavf'
Paolo Abeni [Thu, 30 Apr 2026 09:37:42 +0000 (11:37 +0200)] 
Merge branch 'intel-wired-lan-update-2026-04-27-ice-iavf'

Jacob Keller says:

====================
Intel Wired LAN Update 2026-04-27 (ice, iavf)

Petr Oros from Red Hat has accumulated a number of fixes for the Intel ice
and iavf drivers, bundled together in this series.

First, a series of 4 fixes to resolve issues with the iavf driver logic for
handling VLAN filters. This includes keeping VLAN filters while the
interface is brought down, waiting for confirmation on filter deletion
before deleting filters from the driver tracking structures, and handling
the VIRTCHNL_OP_ADD_VLAN for the old v1 VLAN_ADD command.

A fix for a crash in ice_reset_all_vfs(), properly checking for errors when
ice_vf_rebuild_vsi() fails.

A fix for a possible infinite recursion in ice_cfg_tx_topo() that occurs
when trying to apply invalid Tx topology configuration.

A fix to initialize the SMA pins in the DPLL subsystem properly.

A fix to change the SMA and U.FL pin state for paired pins, ensuring that
all flows changing one pin will also update its shared pin appropriately.

A preparatory patch to export __dpll_pin_change_ntf() so that drivers can
notify pin changes while already holding the dpll_lock.

A fix to ensure DPLL notifications are sent for the software-controlled
pins which wrap the physical CGU input/output pins.

A fix to add DPLL notifications for peer pins when changing the SMA or U.FL
pins, ensuring DPLL subsystem is notified about the paired connected pins.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
====================

Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-0-cdcb48303fd8@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks ago ice: add dpll peer notification for paired SMA and U.FL pins
Petr Oros [Tue, 28 Apr 2026 05:22:23 +0000 (22:22 -0700)] 
ice: add dpll peer notification for paired SMA and U.FL pins

SMA and U.FL pins share physical signal paths in pairs (SMA1/U.FL1 and
SMA2/U.FL2).  When one pin's state changes via a PCA9575 GPIO write,
the paired pin's state also changes, but no notification is sent for
the peer pin.  Userspace consumers monitoring the peer via a dpll
netlink subscription never learn about the update.

Add ice_dpll_sw_pin_notify_peer() which sends a change notification for
the paired SW pin.  Call it from ice_dpll_pin_sma_direction_set(),
ice_dpll_sma_pin_state_set(), and ice_dpll_ufl_pin_state_set() after
pf->dplls.lock is released.  Use __dpll_pin_change_ntf() because
dpll_lock is still held by the dpll netlink layer (dpll_pin_pre_doit).

Fixes: 2dd5d03c77e2 ("ice: redesign dpll sma/u.fl pins control")
Signed-off-by: Petr Oros <poros@redhat.com>
Tested-by: Alexander Nowlin <alexander.nowlin@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-11-cdcb48303fd8@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks ago ice: fix missing dpll notifications for SW pins
Petr Oros [Tue, 28 Apr 2026 05:22:22 +0000 (22:22 -0700)] 
ice: fix missing dpll notifications for SW pins

The SMA/U.FL pin redesign (commit 2dd5d03c77e2 ("ice: redesign dpll
sma/u.fl pins control")) introduced software-controlled pins that wrap
backing CGU input/output pins, but never updated the notification and
data paths to propagate pin events to these SW wrappers.

The periodic work sends dpll_pin_change_ntf() only for direct CGU input
pins.  SW pins that wrap these inputs never receive change or phase
offset notifications, so userspace consumers such as synce4l monitoring
SMA pins via dpll netlink never learn about state transitions or phase
offset updates.  Similarly, ice_dpll_phase_offset_get() reads the SW
pin's own phase_offset field which is never updated; the PPS monitor
writes to the backing CGU input's field instead.

Fix by introducing ice_dpll_pin_ntf(), a wrapper around
dpll_pin_change_ntf() that also notifies any registered SMA/U.FL pin
whose backing CGU input matches.  Replace all direct
dpll_pin_change_ntf() calls in the periodic notification paths with
this wrapper.  Fix ice_dpll_phase_offset_get() to return the backing
CGU input's phase_offset for input-direction SW pins.
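
The wrapper idea described above can be modelled in a few lines of
Python.  This is only a sketch of the pattern, not the driver code:
Pin, pin_ntf(), the notify callback and the pin names are hypothetical
stand-ins, and the real ice_dpll_pin_ntf() may differ in detail.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pin:
    name: str
    backing_input: Optional["Pin"] = None  # set only for SW wrapper pins


def pin_ntf(pin: Pin, sw_pins: List[Pin], notify) -> None:
    """Notify `pin`, then every SW pin whose backing input is `pin`."""
    notify(pin)
    for sw in sw_pins:
        if sw.backing_input is pin:
            notify(sw)


# Hypothetical topology: two CGU inputs, each wrapped by one SW pin.
cgu_in1 = Pin("CGU-IN1")
cgu_in2 = Pin("CGU-IN2")
sw_pins = [Pin("SMA1", backing_input=cgu_in1),
           Pin("U.FL2", backing_input=cgu_in2)]

log = []
pin_ntf(cgu_in1, sw_pins, lambda p: log.append(p.name))
```

In this model a change on one CGU input reaches both the input itself
and the SW pin that wraps it, while the unrelated SW pin stays silent.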

Fixes: 2dd5d03c77e2 ("ice: redesign dpll sma/u.fl pins control")
Signed-off-by: Petr Oros <poros@redhat.com>
Tested-by: Alexander Nowlin <alexander.nowlin@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-10-cdcb48303fd8@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks ago dpll: export __dpll_pin_change_ntf() for use under dpll_lock
Ivan Vecera [Tue, 28 Apr 2026 05:22:21 +0000 (22:22 -0700)] 
dpll: export __dpll_pin_change_ntf() for use under dpll_lock

Export __dpll_pin_change_ntf() so that drivers can send pin change
notifications from within pin callbacks, which are already called
under dpll_lock. Using dpll_pin_change_ntf() in that context would
deadlock.

Add lockdep_assert_held() to catch misuse without the lock held.
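
The split between a locking and a lock-held variant can be illustrated
with a toy Python model.  Everything here is an assumption made for
illustration: DpllModel and its method names are hypothetical, and a
non-reentrant threading.Lock stands in for dpll_lock.

```python
import threading


class DpllModel:
    """Toy model: one non-reentrant lock guarding a notification log."""

    def __init__(self):
        self.lock = threading.Lock()   # stands in for dpll_lock
        self.sent = []                 # notifications "sent" so far

    def _pin_change_ntf_locked(self, pin):
        # Rough analogue of lockdep_assert_held().  Lock.locked() only
        # says that someone holds the lock, not that the caller does.
        if not self.lock.locked():
            raise RuntimeError("caller must hold the lock")
        self.sent.append(pin)

    def pin_change_ntf(self, pin):
        # Locking variant: takes the lock itself, so it must not be
        # called from a context that already holds it.
        with self.lock:
            self._pin_change_ntf_locked(pin)

    def pin_callback(self, pin, peer):
        # Pin callbacks run with the lock already held by the core.
        with self.lock:
            # self.pin_change_ntf(peer) would block forever here on the
            # non-reentrant lock; use the lock-held variant instead.
            self._pin_change_ntf_locked(peer)


model = DpllModel()
model.pin_change_ntf("SMA1")          # unlocked context: locking variant
model.pin_callback("SMA1", "U.FL1")   # locked context: lock-held variant
```

The design choice mirrors the commit: the lock-held variant is cheap
to misuse, so it checks its precondition instead of trusting callers.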

Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Tested-by: Alexander Nowlin <alexander.nowlin@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-9-cdcb48303fd8@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks ago ice: fix SMA and U.FL pin state changes affecting paired pin
Petr Oros [Tue, 28 Apr 2026 05:22:20 +0000 (22:22 -0700)] 
ice: fix SMA and U.FL pin state changes affecting paired pin

SMA and U.FL pins share physical signal paths in pairs (SMA1/U.FL1 and
SMA2/U.FL2) controlled by the PCA9575 GPIO expander.  Each pair can
only have one active pin at a time: SMA1 output and U.FL1 output share
the same CGU output, SMA2 input and U.FL2 input share the same CGU
input.  The PCA9575 register bits determine which connector in each
pair owns the signal path.

The driver does not account for this pairing in two places:

ice_dpll_ufl_pin_state_set() modifies PCA9575 bits and disables the
backing CGU pin without checking whether the U.FL pin is currently
active.  Disconnecting an already inactive U.FL pin flips bits that
the paired SMA pin relies on, breaking its connection.

ice_dpll_sma_direction_set() does not propagate direction changes to
the paired U.FL pin.  For SMA2/U.FL2 the ICE_SMA2_UFL2_RX_DIS bit is
never managed, so U.FL2 stays disconnected after SMA2 switches to
output.  For both pairs the backing CGU pin of the U.FL side is never
enabled when a direction change activates it, so userspace sees the
pin as disconnected even though the routing is correct.

Fix by guarding the U.FL disconnect path against inactive pins and by
updating the paired U.FL pin fully on SMA direction changes: manage
ICE_SMA2_UFL2_RX_DIS for the SMA2/U.FL2 pair and enable the backing
CGU pin whenever the peer becomes active.

Fixes: 2dd5d03c77e2 ("ice: redesign dpll sma/u.fl pins control")
Signed-off-by: Petr Oros <poros@redhat.com>
Tested-by: Alexander Nowlin <alexander.nowlin@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-8-cdcb48303fd8@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks ago ice: fix missing SMA pin initialization in DPLL subsystem
Petr Oros [Tue, 28 Apr 2026 05:22:19 +0000 (22:22 -0700)] 
ice: fix missing SMA pin initialization in DPLL subsystem

The DPLL SMA/U.FL pin redesign introduced ice_dpll_sw_pin_frequency_get()
which gates frequency reporting on the pin's active flag. This flag is
determined by ice_dpll_sw_pins_update() from the PCA9575 GPIO expander
state. Before the redesign, SMA pins were exposed as direct HW
input/output pins and ice_dpll_frequency_get() returned the CGU
frequency unconditionally — the PCA9575 state was never consulted.

The PCA9575 powers on with all outputs high, setting ICE_SMA1_DIR_EN,
ICE_SMA1_TX_EN, ICE_SMA2_DIR_EN and ICE_SMA2_TX_EN. Nothing in the
driver writes the register during initialization, so
ice_dpll_sw_pins_update() sees all pins as inactive and
ice_dpll_sw_pin_frequency_get() permanently returns 0 Hz for every
SW pin.

Fix this by writing a default SMA configuration in
ice_dpll_init_info_sw_pins(): clear all SMA bits, then set SMA1 and
SMA2 as active inputs (DIR_EN=0) with U.FL1 output and U.FL2 input
disabled. Each SMA/U.FL pair shares a physical signal path so only
one pin per pair can be active at a time. U.FL pins still report
frequency 0 after this fix: U.FL1 (output-only) is disabled by
ICE_SMA1_TX_EN which keeps the TX output buffer off, and U.FL2
(input-only) is disabled by ICE_SMA2_UFL2_RX_DIS. They can be
activated by changing the corresponding SMA pin direction via dpll
netlink.

Fixes: 2dd5d03c77e2 ("ice: redesign dpll sma/u.fl pins control")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Alexander Nowlin <alexander.nowlin@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-7-cdcb48303fd8@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks ago ice: fix infinite recursion in ice_cfg_tx_topo via ice_init_dev_hw
Petr Oros [Tue, 28 Apr 2026 05:22:18 +0000 (22:22 -0700)] 
ice: fix infinite recursion in ice_cfg_tx_topo via ice_init_dev_hw

On certain E810 configurations where firmware supports Tx scheduler
topology switching (tx_sched_topo_comp_mode_en), ice_cfg_tx_topo()
may need to apply a new 5-layer or 9-layer topology from the DDP
package. If the AQ command to set the topology fails (e.g. due to
invalid DDP data or firmware limitations), the global configuration
lock must still be cleared via a CORER reset.

Commit 86aae43f21cf ("ice: don't leave device non-functional if Tx
scheduler config fails") correctly fixed this by refactoring
ice_cfg_tx_topo() to always trigger CORER after acquiring the global
lock and re-initialize hardware via ice_init_hw() afterwards.

However, commit 8a37f9e2ff40 ("ice: move ice_deinit_dev() to the end
of deinit paths") later moved ice_init_dev_hw() into ice_init_hw(),
breaking the reinit path introduced by 86aae43f21cf. This creates an
infinite recursive call chain:

  ice_init_hw()
    ice_init_dev_hw()
      ice_cfg_tx_topo()         # topology change needed
        ice_deinit_hw()
        ice_init_hw()           # reinit after CORER
          ice_init_dev_hw()     # recurse
            ice_cfg_tx_topo()
              ...               # stack overflow

Fix by moving ice_init_dev_hw() back out of ice_init_hw() and calling
it explicitly from ice_probe() and ice_devlink_reinit_up(). The third
caller, ice_cfg_tx_topo(), intentionally does not need ice_init_dev_hw()
during its reinit; it only needs the core HW reinitialization. This
breaks the recursion cleanly without adding flags or guards.
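
The post-fix call structure can be sketched as a small Python model.
The function names mirror the call chain above with the ice_ prefix
dropped, but the bodies are placeholders that only record the call
order; the topology_change_needed parameter is an assumption.

```python
calls = []


def init_hw():
    # Core HW init only; in this model it no longer calls init_dev_hw().
    calls.append("init_hw")


def deinit_hw():
    calls.append("deinit_hw")


def cfg_tx_topo(topology_change_needed):
    calls.append("cfg_tx_topo")
    if topology_change_needed:
        deinit_hw()
        init_hw()  # reinit after the reset; cannot re-enter cfg_tx_topo()


def init_dev_hw(topology_change_needed):
    calls.append("init_dev_hw")
    cfg_tx_topo(topology_change_needed)


def probe(topology_change_needed):
    init_hw()
    init_dev_hw(topology_change_needed)  # called explicitly by the caller


probe(topology_change_needed=True)
```

Because init_hw() has no path back to cfg_tx_topo() in this layout,
the chain terminates after a single reinit.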

The deinit ordering changes from commit 8a37f9e2ff40 ("ice: move
ice_deinit_dev() to the end of deinit paths"), which fixed slow rmmod,
are preserved; only the init-side placement of ice_init_dev_hw() is
reverted.

Fixes: 8a37f9e2ff40 ("ice: move ice_deinit_dev() to the end of deinit paths")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Alexander Nowlin <alexander.nowlin@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-6-cdcb48303fd8@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks ago ice: fix NULL pointer dereference in ice_reset_all_vfs()
Petr Oros [Tue, 28 Apr 2026 05:22:17 +0000 (22:22 -0700)] 
ice: fix NULL pointer dereference in ice_reset_all_vfs()

ice_reset_all_vfs() ignores the return value of ice_vf_rebuild_vsi().
When the VSI rebuild fails (e.g. during NVM firmware update via
nvmupdate64e), ice_vsi_rebuild() tears down the VSI on its error path,
leaving txq_map and rxq_map as NULL. The subsequent unconditional call
to ice_vf_post_vsi_rebuild() leads to a NULL pointer dereference in
ice_ena_vf_q_mappings() when it accesses vsi->txq_map[0].

The single-VF reset path in ice_reset_vf() already handles this
correctly by checking the return value of ice_vf_reconfig_vsi() and
skipping ice_vf_post_vsi_rebuild() on failure.

Apply the same pattern to ice_reset_all_vfs(): check the return value
of ice_vf_rebuild_vsi() and skip ice_vf_post_vsi_rebuild() and
ice_eswitch_attach_vf() on failure. The VF is left safely disabled
(ICE_VF_STATE_INIT not set, VFGEN_RSTAT not set to VFACTIVE) and can
be recovered via a VFLR triggered by a PCI reset of the VF
(sysfs reset or driver rebind).
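
The check-and-skip pattern can be sketched in Python as follows.  This
is a toy model under stated assumptions: the VF class, the rebuild_ok
flag and the -5 return value are hypothetical, and a Python None
stands in for the NULL txq_map.

```python
class VF:
    def __init__(self, vf_id, rebuild_ok):
        self.vf_id = vf_id
        self.rebuild_ok = rebuild_ok   # hypothetical failure switch
        self.txq_map = None            # populated only by a good rebuild
        self.active = False


def rebuild_vsi(vf):
    """Return 0 on success, a negative errno-style value on failure."""
    if not vf.rebuild_ok:
        vf.txq_map = None              # error path tears the VSI down
        return -5
    vf.txq_map = [0] * 16
    return 0


def post_vsi_rebuild(vf):
    _ = vf.txq_map[0]                  # raises TypeError if txq_map is None
    vf.active = True


def reset_all_vfs(vfs):
    for vf in vfs:
        if rebuild_vsi(vf):
            continue                   # leave this VF disabled, move on
        post_vsi_rebuild(vf)


vfs = [VF(0, rebuild_ok=True), VF(1, rebuild_ok=False), VF(2, rebuild_ok=True)]
reset_all_vfs(vfs)
```

Without the early continue, the second VF would reach
post_vsi_rebuild() with an empty queue map, which is the model's
analogue of the crash shown below.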

Note that this patch does not prevent the VF VSI rebuild from failing
during NVM update — the underlying cause is firmware being in a
transitional state while the EMP reset is processed, which can cause
Admin Queue commands (ice_add_vsi, ice_cfg_vsi_lan) to fail. This
patch only prevents the subsequent NULL pointer dereference that
crashes the kernel when the rebuild does fail.

 crash> bt
     PID: 50795    TASK: ff34c9ee708dc680  CPU: 1    COMMAND: "kworker/u512:5"
      #0 [ff72159bcfe5bb50] machine_kexec at ffffffffaa8850ee
      #1 [ff72159bcfe5bba8] __crash_kexec at ffffffffaaa15fba
      #2 [ff72159bcfe5bc68] crash_kexec at ffffffffaaa16540
      #3 [ff72159bcfe5bc70] oops_end at ffffffffaa837eda
      #4 [ff72159bcfe5bc90] page_fault_oops at ffffffffaa893997
      #5 [ff72159bcfe5bce8] exc_page_fault at ffffffffab528595
      #6 [ff72159bcfe5bd10] asm_exc_page_fault at ffffffffab600bb2
         [exception RIP: ice_ena_vf_q_mappings+0x79]
         RIP: ffffffffc0a85b29  RSP: ff72159bcfe5bdc8  RFLAGS: 00010206
         RAX: 00000000000f0000  RBX: ff34c9efc9c00000  RCX: 0000000000000000
         RDX: 0000000000000000  RSI: 0000000000000010  RDI: ff34c9efc9c00000
         RBP: ff34c9efc27d4828   R8: 0000000000000093   R9: 0000000000000040
         R10: ff34c9efc27d4828  R11: 0000000000000040  R12: 0000000000100000
         R13: 0000000000000010  R14:   R15:
         ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
      #7 [ff72159bcfe5bdf8] ice_sriov_post_vsi_rebuild at ffffffffc0a85e2e [ice]
      #8 [ff72159bcfe5be08] ice_reset_all_vfs at ffffffffc0a920b4 [ice]
      #9 [ff72159bcfe5be48] ice_service_task at ffffffffc0a31519 [ice]
     #10 [ff72159bcfe5be88] process_one_work at ffffffffaa93dca4
     #11 [ff72159bcfe5bec8] worker_thread at ffffffffaa93e9de
     #12 [ff72159bcfe5bf18] kthread at ffffffffaa946663
     #13 [ff72159bcfe5bf50] ret_from_fork at ffffffffaa8086b9

 The panic occurs attempting to dereference the NULL pointer in RDX at
 ice_sriov.c:294, which loads vsi->txq_map (offset 0x4b8 in ice_vsi).

 The faulting VSI is an allocated slab object but not fully initialized
 after a failed ice_vsi_rebuild():

  crash> struct ice_vsi 0xff34c9efc27d4828
    netdev = 0x0,
    rx_rings = 0x0,
    tx_rings = 0x0,
    q_vectors = 0x0,
    txq_map = 0x0,
    rxq_map = 0x0,
    alloc_txq = 0x10,
    num_txq = 0x10,
    alloc_rxq = 0x10,
    num_rxq = 0x10,

 The nvmupdate64e process was performing NVM firmware update:

  crash> bt 0xff34c9edd1a30000
  PID: 49858    TASK: ff34c9edd1a30000  CPU: 1    COMMAND: "nvmupdate64e"
   #0 [ff72159bcd617618] __schedule at ffffffffab5333f8
   #4 [ff72159bcd617750] ice_sq_send_cmd at ffffffffc0a35347 [ice]
   #5 [ff72159bcd6177a8] ice_sq_send_cmd_retry at ffffffffc0a35b47 [ice]
   #6 [ff72159bcd617810] ice_aq_send_cmd at ffffffffc0a38018 [ice]
   #7 [ff72159bcd617848] ice_aq_read_nvm at ffffffffc0a40254 [ice]
   #8 [ff72159bcd6178b8] ice_read_flat_nvm at ffffffffc0a4034c [ice]
   #9 [ff72159bcd617918] ice_devlink_nvm_snapshot at ffffffffc0a6ffa5 [ice]

 dmesg:
  ice 0000:13:00.0: firmware recommends not updating fw.mgmt, as it
    may result in a downgrade. continuing anyways
  ice 0000:13:00.1: ice_init_nvm failed -5
  ice 0000:13:00.1: Rebuild failed, unload and reload driver

Fixes: 12bb018c538c ("ice: Refactor VF reset")
Signed-off-by: Petr Oros <poros@redhat.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-5-cdcb48303fd8@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks ago iavf: add VIRTCHNL_OP_ADD_VLAN to success completion handler
Petr Oros [Tue, 28 Apr 2026 05:22:16 +0000 (22:22 -0700)] 
iavf: add VIRTCHNL_OP_ADD_VLAN to success completion handler

The V1 ADD_VLAN opcode had no success handler; filters sent via V1
stayed in ADDING state permanently.  Add a fallthrough case so V1
filters also transition ADDING -> ACTIVE on PF confirmation.

Critically, add an `if (v_retval) break` guard: the error switch in
iavf_virtchnl_completion() does NOT return after handling errors,
it falls through to the success switch.  Without this guard, a
PF-rejected ADD would incorrectly mark ADDING filters as ACTIVE,
creating a driver/HW mismatch where the driver believes the filter
is installed but the PF never accepted it.

For V2, this is harmless: iavf_vlan_add_reject() in the error
block already kfree'd all ADDING filters, so the success handler
finds nothing to transition.
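
The fall-through hazard and the guard can be modelled in Python.  This
is a sketch, not the driver's code: the opcode strings, the filters
dictionary and the completion() signature are all hypothetical.

```python
ADDING, ACTIVE = "ADDING", "ACTIVE"


def completion(opcode, v_retval, filters):
    """filters: dict mapping VLAN id -> state; mutated in place."""
    # Error handling block: handles the error but does not return.
    if v_retval:
        if opcode == "ADD_VLAN_V2":
            # V2 rejection drops every filter still waiting for an answer.
            for vid in [v for v, s in filters.items() if s == ADDING]:
                del filters[vid]

    # Success handling block: reached for errors as well.
    if opcode in ("ADD_VLAN", "ADD_VLAN_V2"):
        if v_retval:
            return               # the guard: never promote on a PF error
        for vid, state in filters.items():
            if state == ADDING:
                filters[vid] = ACTIVE


rejected = {100: ADDING}
completion("ADD_VLAN", v_retval=-1, filters=rejected)

accepted = {200: ADDING}
completion("ADD_VLAN", v_retval=0, filters=accepted)
```

In this model a rejected V1 filter is left in the ADDING state; what
the driver does with it afterwards is outside the scope of the sketch.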

Fixes: 968996c070ef ("iavf: Fix VLAN_V2 addition/rejection")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-4-cdcb48303fd8@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks ago iavf: wait for PF confirmation before removing VLAN filters
Petr Oros [Tue, 28 Apr 2026 05:22:15 +0000 (22:22 -0700)] 
iavf: wait for PF confirmation before removing VLAN filters

The VLAN filter DELETE path was asymmetric with the ADD path: ADD
waits for PF confirmation (ADD -> ADDING -> ACTIVE), but DELETE
immediately frees the filter struct after sending the DEL message
without waiting for the PF response.

This is problematic because:
 - If the PF rejects the DEL, the filter remains in HW but the driver
   has already freed the tracking structure, losing sync.
 - Race conditions between DEL pending and other operations
   (add, reset) cannot be properly resolved if the filter struct
   is already gone.

Add IAVF_VLAN_REMOVING state to make the DELETE path symmetric:

  REMOVE -> REMOVING (send DEL) -> PF confirms -> kfree
                                -> PF rejects  -> ACTIVE

In iavf_del_vlans(), transition filters from REMOVE to REMOVING
instead of immediately freeing them. The new DEL completion handler
in iavf_virtchnl_completion() frees filters on success or reverts
them to ACTIVE on error.

Update iavf_add_vlan() to handle the REMOVING state: if a DEL is
pending and the user re-adds the same VLAN, queue it for ADD so
it gets re-programmed after the PF processes the DEL.

The !VLAN_FILTERING_ALLOWED early-exit path still frees filters
directly since no PF message is sent in that case.

Also update iavf_del_vlan() to skip filters already in REMOVING
state: DEL has been sent to PF and the completion handler will
free the filter when PF confirms. Without this guard, the sequence
DEL(pending) -> user-del -> second DEL could cause the PF to return
an error for the second DEL (filter already gone), causing the
completion handler to incorrectly revert a deleted filter back to
ACTIVE.
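
The symmetric DELETE path can be sketched as a small state machine in
Python.  The state names follow the message above, but the helper
functions and the filters dictionary are hypothetical, and the re-add
while REMOVING case is not modelled.

```python
from enum import Enum, auto


class VlanState(Enum):
    ADD = auto()
    ADDING = auto()
    ACTIVE = auto()
    REMOVE = auto()
    REMOVING = auto()


def send_del(filters):
    """REMOVE -> REMOVING: the DEL message goes out, nothing is freed."""
    for vid, state in filters.items():
        if state is VlanState.REMOVE:
            filters[vid] = VlanState.REMOVING


def del_completion(filters, pf_ok):
    """PF answered the DEL: free on success, revert to ACTIVE on error."""
    for vid in [v for v, s in filters.items() if s is VlanState.REMOVING]:
        if pf_ok:
            del filters[vid]
        else:
            filters[vid] = VlanState.ACTIVE


def user_del(filters, vid):
    """Skip filters whose DEL is already in flight."""
    if filters.get(vid) is VlanState.REMOVING:
        return
    if vid in filters:
        filters[vid] = VlanState.REMOVE


filters = {10: VlanState.REMOVE, 20: VlanState.REMOVE}
send_del(filters)
user_del(filters, 10)                 # ignored: DEL already pending
state_while_pending = dict(filters)
del_completion(filters, pf_ok=False)  # PF rejects: filters come back
after_reject = dict(filters)
user_del(filters, 10)
send_del(filters)
del_completion(filters, pf_ok=True)   # PF confirms: filter 10 is freed
```

Keeping the entry alive until the PF answers is what lets the reject
case restore ACTIVE instead of losing track of the filter.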

Fixes: 968996c070ef ("iavf: Fix VLAN_V2 addition/rejection")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-3-cdcb48303fd8@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks ago iavf: stop removing VLAN filters from PF on interface down
Petr Oros [Tue, 28 Apr 2026 05:22:14 +0000 (22:22 -0700)] 
iavf: stop removing VLAN filters from PF on interface down

When a VF goes down, the driver currently sends DEL_VLAN to the PF for
every VLAN filter (ACTIVE -> DISABLE -> send DEL -> INACTIVE), then
re-adds them all on UP (INACTIVE -> ADD -> send ADD -> ADDING ->
ACTIVE). This round-trip is unnecessary because:

 1. The PF disables the VF's queues via VIRTCHNL_OP_DISABLE_QUEUES,
    which already prevents all RX/TX traffic regardless of VLAN filter
    state.

 2. Leaving the VLAN filters in PF HW while the VF is down is harmless:
    packets matching those filters have nowhere to go with queues
    disabled.

 3. The DEL+ADD cycle during down/up creates race windows where the
    VLAN filter list is incomplete. With spoofcheck enabled, the PF
    enables TX VLAN filtering on the first non-zero VLAN add, blocking
    traffic for any VLANs not yet re-added.

Remove the entire DISABLE/INACTIVE state machinery:
 - Remove IAVF_VLAN_DISABLE and IAVF_VLAN_INACTIVE enum values
 - Remove iavf_restore_filters() and its call from iavf_open()
 - Remove VLAN filter handling from iavf_clear_mac_vlan_filters(),
   rename it to iavf_clear_mac_filters()
 - Remove DEL_VLAN_FILTER scheduling from iavf_down()
 - Remove all DISABLE/INACTIVE handling from iavf_del_vlans()

VLAN filters now stay ACTIVE across down/up cycles. Only explicit
user removal (ndo_vlan_rx_kill_vid) or PF/VF reset triggers VLAN
filter deletion/re-addition.

Fixes: ed1f5b58ea01 ("i40evf: remove VLAN filters on close")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-2-cdcb48303fd8@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks ago iavf: rename IAVF_VLAN_IS_NEW to IAVF_VLAN_ADDING
Petr Oros [Tue, 28 Apr 2026 05:22:13 +0000 (22:22 -0700)] 
iavf: rename IAVF_VLAN_IS_NEW to IAVF_VLAN_ADDING

Rename the IAVF_VLAN_IS_NEW state to IAVF_VLAN_ADDING to better
describe what the state represents: an ADD request has been sent to
the PF and is waiting for a response.

This is a pure rename with no behavioral change, preparing for a
cleanup of the VLAN filter state machine.

Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-1-cdcb48303fd8@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks ago Merge branch 'reimplement-tcp-ao-using-crypto-library'
Paolo Abeni [Thu, 30 Apr 2026 07:39:12 +0000 (09:39 +0200)] 
Merge branch 'reimplement-tcp-ao-using-crypto-library'

Eric Biggers says:

====================
Reimplement TCP-AO using crypto library

This series can also be retrieved from:

    git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git tcp-ao-v2

This series is targeting net-next for 7.2.  To make this series
self-contained in the networking code, I dropped the patches that remove
support for transformation cloning from the crypto API, which is a
further negative 275-line cleanup and optimization this series enables.
That will be done as a follow-up, either through the crypto tree for
7.3, or still through net-next for 7.2 at maintainer preference.

This series refactors the TCP-AO (TCP Authentication Option) code to do
MAC and KDF computations using lib/crypto/ instead of crypto_ahash.
This greatly simplifies the code and makes it much more efficient.  The
entire tcp_sigpool mechanism becomes unnecessary and is removed, as the
problems it was designed to solve don't exist with the library APIs.

The crypto API's support for crypto transformation cloning also becomes
unnecessary and will be removed in follow-up patches.  Note that as part
of that, we'll be able to roll back the addition of the reference count
to crypto_tfm, which had regressed performance for all crypto API users.

To make this simplification and optimization possible, this series also
updates the TCP-AO code to support a specific set of algorithms, rather
than arbitrary algorithms that don't make sense and are very likely not
being used, e.g. CRC-32 and HMAC-MD5.

Specifically, this series retains the support for AES-128-CMAC,
HMAC-SHA1, and HMAC-SHA256.  AES-128-CMAC and HMAC-SHA1 are the only
algorithms that are actually standardized for use in TCP-AO, while
HMAC-SHA256 makes sense to continue supporting as a Linux extension.  Of
course, other algorithms can still be (re-)added later if ever needed.
It's worth noting that TCP-AO MACs are limited to 20 bytes by the TCP
options space, which limits the benefit of further algorithm upgrades.

This series passes the tcp_ao selftests
(sudo make -C tools/testing/selftests/net/tcp_ao/ run_tests).

To get a sense for how much more efficient this makes the TCP-AO code,
here's a microbenchmark for tcp_ao_hash_skb() with skb->len == 128:

        Algorithm       Avg cycles (before)     Avg cycles (after)
        ---------       -------------------     ------------------
        HMAC-SHA1       3319                    1256
        HMAC-SHA256     3311                    1344
        AES-128-CMAC    2720                    1107
====================
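
For a quick read of the microbenchmark table above, the before/after
ratios can be computed directly.  The snippet below is only arithmetic
on the quoted cycle counts; the variable names are illustrative.

```python
# Average cycles from the table above: (before, after).
cycles = {
    "HMAC-SHA1": (3319, 1256),
    "HMAC-SHA256": (3311, 1344),
    "AES-128-CMAC": (2720, 1107),
}

# Ratio of cycles before to cycles after, per algorithm.
speedup = {alg: before / after for alg, (before, after) in cycles.items()}

for alg, ratio in speedup.items():
    print(f"{alg:<13} {ratio:.2f}x")
```

All three algorithms come out at roughly 2.5x to 2.6x fewer cycles
per tcp_ao_hash_skb() call for this 128-byte case.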

Link: https://patch.msgid.link/20260427172727.9310-1-ebiggers@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks ago net/tcp: Remove tcp_sigpool
Eric Biggers [Mon, 27 Apr 2026 17:27:27 +0000 (10:27 -0700)] 
net/tcp: Remove tcp_sigpool

tcp_sigpool is no longer used.  It existed only as a workaround for
issues in the design of the crypto_ahash API, which have been avoided by
switching to the much easier-to-use library APIs instead.  Remove it.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20260427172727.9310-6-ebiggers@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks ago net/tcp-ao: Return void from functions that can no longer fail
Eric Biggers [Mon, 27 Apr 2026 17:27:26 +0000 (10:27 -0700)] 
net/tcp-ao: Return void from functions that can no longer fail

Since tcp-ao now uses the crypto library API instead of crypto_ahash,
and MACs and keys now have a statically-known maximum size, many tcp-ao
functions can no longer fail.  Propagate this change up into the return
types of various functions.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20260427172727.9310-5-ebiggers@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks ago net/tcp-ao: Use stack-allocated MAC and traffic_key buffers
Eric Biggers [Mon, 27 Apr 2026 17:27:25 +0000 (10:27 -0700)] 
net/tcp-ao: Use stack-allocated MAC and traffic_key buffers

Now that the maximum MAC and traffic key lengths are statically-known
small values, allocate MACs and traffic keys on the stack instead of
with kmalloc.  This eliminates multiple failure-prone GFP_ATOMIC
allocations.

Note that some cases such as tcp_ao_prepare_reset() are left unchanged
for now since they would require slightly wider changes.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20260427172727.9310-4-ebiggers@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks ago net/tcp-ao: Use crypto library API instead of crypto_ahash
Eric Biggers [Mon, 27 Apr 2026 17:27:24 +0000 (10:27 -0700)] 
net/tcp-ao: Use crypto library API instead of crypto_ahash

Currently the kernel's TCP-AO implementation does the MAC and KDF
computations using the crypto_ahash API.  This API is inefficient and
difficult to use, and it has required extensive workarounds in the form
of per-CPU preallocated objects (tcp_sigpool) to work at all.

Let's use lib/crypto/ instead.  This means switching to straightforward
stack-allocated structures, virtually addressed buffers, and direct
function calls.  It also means removing quite a bit of error handling.
This makes TCP-AO quite a bit faster.

This also enables many additional cleanups, which later commits will
handle: removing tcp-sigpool, removing support for crypto_tfm cloning,
removing more error handling, and replacing more dynamically-allocated
buffers with stack buffers based on the now-statically-known limits.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20260427172727.9310-3-ebiggers@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks ago net/tcp-ao: Drop support for most non-RFC-specified algorithms
Eric Biggers [Mon, 27 Apr 2026 17:27:23 +0000 (10:27 -0700)] 
net/tcp-ao: Drop support for most non-RFC-specified algorithms

RFC 5926 (https://datatracker.ietf.org/doc/html/rfc5926) specifies the
use of AES-128-CMAC and HMAC-SHA1 with TCP-AO.  This includes a
specification for how traffic keys shall be derived for each algorithm.

Support for any other algorithms with TCP-AO isn't standardized, though
an expired Internet Draft (a work-in-progress document, not a standard)
from 2019 does propose adding HMAC-SHA256 support:
https://datatracker.ietf.org/doc/html/draft-nayak-tcp-sha2-03

Since both documents specify the KDF for each algorithm individually, it
isn't necessarily clear how any other algorithm should be integrated.

Nevertheless, the Linux implementation of TCP-AO allows userspace to
specify the MAC algorithm as a string tcp_ao_add::alg_name naming either
"cmac(aes128)" or an arbitrary algorithm in the crypto_ahash API.  The
set of valid strings is undocumented.  The implementation assumes that
"cmac(aes128)" is the only algorithm that requires an entropy extraction
step and that all algorithms accept keys with length equal to the
untruncated MAC; thus, arbitrary HMAC algorithms probably do work, but
some other MAC algorithms like AES-256-CMAC have never actually worked.

Unfortunately, this undocumented string allows many obsolete, insecure,
or redundant algorithms.  For example, "hmac(md5)" and the
non-cryptographic "crc32" are accepted.  It also ties the implementation
to crypto_ahash and requires that most memory be dynamically allocated,
making the implementation unnecessarily complex and inefficient.  Still
furthermore, this implementation requires the crypto API to support
"transformation cloning", whose only user is this feature.

Fortunately, it's very likely that only a few algorithms are actually
used in practice.  Let's restrict the set of allowed algorithms to
"cmac(aes128)" (or "cmac(aes)" with keylen=16), "hmac(sha1)", and
"hmac(sha256)".  The first two are the actually standard ones, while
HMAC-SHA256 seems like a reasonable algorithm to continue supporting as
a Linux extension, considering the Internet Draft for it and the fact
that SHA-256 is the usual choice of upgrade from the outdated SHA-1.
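
As an illustration only, the restriction described above could be
sketched in userspace C as follows. The function name
tcp_ao_alg_allowed() and its signature are hypothetical and are not
taken from the patch; the sketch only mirrors the list of names given
in this message and ignores every other check the kernel performs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical allowlist check (not the kernel's code): accept only the
 * algorithm names listed in the commit message.
 */
bool tcp_ao_alg_allowed(const char *alg_name, size_t keylen)
{
	if (strcmp(alg_name, "cmac(aes128)") == 0)
		return true;
	/* "cmac(aes)" is accepted only together with a 16-byte key. */
	if (strcmp(alg_name, "cmac(aes)") == 0)
		return keylen == 16;
	return strcmp(alg_name, "hmac(sha1)") == 0 ||
	       strcmp(alg_name, "hmac(sha256)") == 0;
}
```

With such a check, names like "hmac(md5)" or "crc32" would be rejected
up front instead of being passed to a generic lookup.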

If any other algorithm ever turns out to be needed, e.g. HMAC-SHA512, it
can of course be (re-)added in library form.  However, note that the TCP
options space limits TCP-AO MACs to 20 bytes (160 bits) anyway, which
limits the potential benefit of any further upgrade to the algorithm.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20260427172727.9310-2-ebiggers@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks ago Merge tag 'trace-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Thu, 30 Apr 2026 05:21:44 +0000 (22:21 -0700)] 
Merge tag 'trace-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix inverted check of registering the stats for branch tracing

   When calling register_stat_tracer() which returns zero on success and
   negative on error, the callers were checking the return of zero as an
   error and printing a warning message. Because this was just a normal
   printk() message and not a WARN(), it wasn't caught in any testing.

   Fix the check to print the warning message when an error actually
   happens.

 - Fix a typo in a comment in tracepoint.h

 - Limit the size of event probes to 3K

   It is possible to create a dynamic event probe via the tracefs system
   that is greater than the max size of an event that the ring buffer
   can hold. This basically causes the event to become useless.

   Limit the size of an event probe to be 3K as that should be large
   enough to handle any dynamic events being created, and fits within
   the PAGE_SIZE sub-buffers of the ring buffer.
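
The inverted check from the first item above can be illustrated with a
small userspace sketch. fake_register_stat_tracer() is a hypothetical
stand-in that follows the same convention (zero on success, negative on
error); none of this is the actual tracing code.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in: 0 on success, negative errno on failure. */
int fake_register_stat_tracer(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/* Buggy shape: warns when registration succeeded. Returns 1 if it warned. */
int init_buggy(int should_fail)
{
	int ret = fake_register_stat_tracer(should_fail);

	if (!ret) {	/* inverted: zero means success */
		fprintf(stderr, "Warning: could not register stats\n");
		return 1;
	}
	return 0;
}

/* Fixed shape: warns only when an error actually happened. */
int init_fixed(int should_fail)
{
	int ret = fake_register_stat_tracer(should_fail);

	if (ret) {
		fprintf(stderr, "Warning: could not register stats\n");
		return 1;
	}
	return 0;
}
```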

* tag 'trace-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/probes: Limit size of event probe to 3K
  tracepoint: Fix typo in tracepoint.h comment
  tracing: branch: Fix inverted check on stat tracer registration

6 weeks ago page_pool: fix memory-provider leak in page_pool_create_percpu() error path
Hasan Basbunar [Tue, 28 Apr 2026 17:07:39 +0000 (19:07 +0200)] 
page_pool: fix memory-provider leak in page_pool_create_percpu() error path

When page_pool_create_percpu() fails on page_pool_list(), it falls
through to its err_uninit: label, which calls page_pool_uninit().
At that point page_pool_init() has already taken two references
when the user requested PP_FLAG_ALLOW_UNREADABLE_NETMEM:

pool->mp_ops->init(pool)
static_branch_inc(&page_pool_mem_providers);

Neither is undone by page_pool_uninit(); both are only undone by
__page_pool_destroy() (success-side teardown). The error path
therefore leaks the per-provider reference taken by mp_ops->init
(io_zcrx_ifq->refs in the io_uring zcrx provider, the dmabuf
binding refcount in the devmem provider) plus one increment of
the page_pool_mem_providers static branch on every failure of
xa_alloc_cyclic() inside page_pool_list().

The leaked io_zcrx_ifq->refs in turn pins everything
io_zcrx_ifq_free() would release on cleanup: ifq->user (uid),
ifq->mm_account (mmdrop), ifq->dev (device refcount),
ifq->netdev_tracker (netdev refcount), and the rbuf region.
The leaked static branch increment forces all subsequent
page_pool_alloc_netmems() and page_pool_return_page() callers to
take the slow mp_ops branch for the lifetime of the kernel.

Reachable via the io_uring zcrx path:

io_uring_register(IORING_REGISTER_ZCRX_IFQ)  /* CAP_NET_ADMIN */
  -> __io_uring_register
  -> io_register_zcrx
  -> zcrx_register_netdev
  -> netif_mp_open_rxq
  -> driver ndo_queue_mem_alloc
  -> page_pool_create_percpu
    -> page_pool_init succeeds (mp_ops->init runs, branch++)
    -> page_pool_list fails (xa_alloc_cyclic -ENOMEM)
    -> goto err_uninit         <-- leak

The same shape applies to the devmem dmabuf provider via
mp_dmabuf_devmem_init()/mp_dmabuf_devmem_destroy().

Restore the cleanup symmetry by moving the mp_ops->destroy() and
static_branch_dec() calls out of __page_pool_destroy() and into
page_pool_uninit(), so page_pool_uninit() is again the strict
inverse of page_pool_init(). page_pool_uninit() has only two
callers (the err_uninit: path and __page_pool_destroy()), so this
preserves the single-call invariant on the success path while
fixing the err path. The error path of page_pool_init() itself
still skips the mp_ops cleanup correctly: mp_ops->init is the
last action that takes a reference before page_pool_init() returns
0, so when it returns an error neither the refcount nor the static
branch has been touched.
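
The symmetry argument above can be modelled in a few lines of userspace
C. All names here (model_pool, model_init() and so on) are hypothetical
and only mimic the shape of the init/uninit pairing; they are not the
page_pool structures or functions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a pool that may use a memory provider. */
struct model_pool {
	bool has_provider;	/* stands in for PP_FLAG_ALLOW_UNREADABLE_NETMEM */
	int provider_refs;	/* stands in for the reference taken by mp_ops->init */
};

int model_static_branch;	/* stands in for page_pool_mem_providers */

int model_init(struct model_pool *p)
{
	if (p->has_provider) {
		p->provider_refs++;
		model_static_branch++;
	}
	return 0;
}

/* Strict inverse of model_init(): undoes both increments. */
void model_uninit(struct model_pool *p)
{
	if (p->has_provider) {
		p->provider_refs--;
		model_static_branch--;
	}
}

/* list_fails simulates the listing step failing after a successful init. */
int model_create(struct model_pool *p, bool list_fails)
{
	int err = model_init(p);

	if (err)
		return err;
	if (list_fails) {
		model_uninit(p);	/* the err_uninit path */
		return -ENOMEM;
	}
	return 0;
}
```

Because the uninit step is the exact inverse of init, the error path
leaves both counters at their initial values in this model.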

Triggering the bug requires xa_alloc_cyclic() to fail with -ENOMEM,
which under normal GFP_KERNEL retry behaviour is rare. It is
deterministic under CONFIG_FAULT_INJECTION with fail_page_alloc /
xa fault injection, or under sustained memory pressure. The leak
is silent: there is no warning, and the released kernel build
continues running with a permanently-incremented static branch.

Fixes: 0f9214046893 ("memory-provider: dmabuf devmem memory provider")
Signed-off-by: Hasan Basbunar <basbunarhasan@gmail.com>
Link: https://patch.msgid.link/20260428170739.34881-1-basbunarhasan@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago tcp: add tcp_mstamp_refresh_inline()
Eric Dumazet [Wed, 29 Apr 2026 01:08:09 +0000 (01:08 +0000)] 
tcp: add tcp_mstamp_refresh_inline()

We want to inline tcp_mstamp_refresh() in fast path only:

- tcp_rcv_established()
- tcp_write_xmit()

Add tcp_mstamp_refresh_inline() for this purpose.

Add noinline qualifier on tcp_mstamp_refresh() for the other paths,
to reduce bloat.
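
The general pattern (an inline body for hot paths plus an out-of-line
wrapper for everything else) might look like the sketch below. The
names conn, fake_clock_us and the conn_*() functions are hypothetical,
and __attribute__((noinline)) is the GCC/Clang spelling used here in
place of the kernel's noinline macro.

```c
#include <stdint.h>

struct conn {
	uint64_t mstamp_us;
};

uint64_t fake_clock_us;	/* hypothetical clock source for the sketch */

/* Inlined body, intended for the fast paths only. */
static inline void conn_mstamp_refresh_inline(struct conn *c)
{
	c->mstamp_us = fake_clock_us;
}

/* Out-of-line copy shared by all other callers, to limit code growth. */
__attribute__((noinline)) void conn_mstamp_refresh(struct conn *c)
{
	conn_mstamp_refresh_inline(c);
}

/* Fast path: pays for the inlined copy. */
void conn_rcv_fast(struct conn *c)
{
	conn_mstamp_refresh_inline(c);
}

/* Slow path: calls the shared out-of-line function. */
void conn_send_probe(struct conn *c)
{
	conn_mstamp_refresh(c);
}
```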

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 1/4 up/down: 26/-123 (-97)
Function                                     old     new   delta
tcp_rcv_established                         2238    2264     +26
tcp_connect                                 4027    4003     -24
tcp_tsq_write                                152     120     -32
tcp_send_active_reset                        476     444     -32
tcp_send_window_probe                        235     200     -35
Total: Before=25316710, After=25316613, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20260429010809.784315-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago bonding: 3ad: implement proper RCU rules for port->aggregator
Eric Dumazet [Tue, 28 Apr 2026 12:32:07 +0000 (12:32 +0000)] 
bonding: 3ad: implement proper RCU rules for port->aggregator

syzbot found a data-race in bond_3ad_get_active_agg_info /
bond_3ad_state_machine_handler [1] which hints at lack of proper
RCU implementation.

Add __rcu qualifier to port->aggregator, and add proper RCU API.
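
For readers unfamiliar with the pattern, here is a rough userspace
analogue using C11 atomics. It only mirrors the publish/read pairing
that rcu_assign_pointer() and rcu_dereference() provide; it does not
model grace periods or reclamation, and the struct and function names
are hypothetical rather than the bonding driver's.

```c
#include <stdatomic.h>
#include <stddef.h>

struct aggregator {
	int id;
};

struct port {
	_Atomic(struct aggregator *) aggregator;
};

/* Writer side: publish the pointer with release ordering. */
void port_set_aggregator(struct port *p, struct aggregator *agg)
{
	atomic_store_explicit(&p->aggregator, agg, memory_order_release);
}

/* Reader side: load once with acquire ordering; returns -1 if unset. */
int port_get_aggregator_id(struct port *p)
{
	struct aggregator *agg =
		atomic_load_explicit(&p->aggregator, memory_order_acquire);

	return agg ? agg->id : -1;
}
```

The point of the annotated accessors is that the pointer is read exactly
once and with defined ordering, so a concurrent writer cannot be
observed half-way, which is the kind of plain access KCSAN flags below.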

[1]

BUG: KCSAN: data-race in bond_3ad_get_active_agg_info / bond_3ad_state_machine_handler

write to 0xffff88813cf5c4b0 of 8 bytes by task 36 on cpu 0:
  ad_port_selection_logic drivers/net/bonding/bond_3ad.c:1659 [inline]
  bond_3ad_state_machine_handler+0x9d5/0x2d60 drivers/net/bonding/bond_3ad.c:2569
  process_one_work kernel/workqueue.c:3302 [inline]
  process_scheduled_works+0x4f0/0x9c0 kernel/workqueue.c:3385
  worker_thread+0x58a/0x780 kernel/workqueue.c:3466
  kthread+0x22a/0x280 kernel/kthread.c:436
  ret_from_fork+0x146/0x330 arch/x86/kernel/process.c:158
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

read to 0xffff88813cf5c4b0 of 8 bytes by task 22063 on cpu 1:
  __bond_3ad_get_active_agg_info drivers/net/bonding/bond_3ad.c:2858 [inline]
  bond_3ad_get_active_agg_info+0x8c/0x230 drivers/net/bonding/bond_3ad.c:2881
  bond_fill_info+0xe0f/0x10f0 drivers/net/bonding/bond_netlink.c:853
  rtnl_link_info_fill net/core/rtnetlink.c:906 [inline]
  rtnl_link_fill+0x1d7/0x4e0 net/core/rtnetlink.c:927
  rtnl_fill_ifinfo+0xf8e/0x1380 net/core/rtnetlink.c:2168
  rtmsg_ifinfo_build_skb+0x11c/0x1b0 net/core/rtnetlink.c:4453
  rtmsg_ifinfo_event net/core/rtnetlink.c:4486 [inline]
  rtmsg_ifinfo+0x6d/0x110 net/core/rtnetlink.c:4495
  __dev_notify_flags+0x76/0x390 net/core/dev.c:9790
  netif_change_flags+0xac/0xd0 net/core/dev.c:9823
  do_setlink+0x905/0x2950 net/core/rtnetlink.c:3180
  rtnl_group_changelink net/core/rtnetlink.c:3813 [inline]
  __rtnl_newlink net/core/rtnetlink.c:3981 [inline]
  rtnl_newlink+0xf55/0x1400 net/core/rtnetlink.c:4109
  rtnetlink_rcv_msg+0x64b/0x720 net/core/rtnetlink.c:6995
  netlink_rcv_skb+0x123/0x220 net/netlink/af_netlink.c:2550
  rtnetlink_rcv+0x1c/0x30 net/core/rtnetlink.c:7022
  netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
  netlink_unicast+0x5a8/0x680 net/netlink/af_netlink.c:1344
  netlink_sendmsg+0x5c8/0x6f0 net/netlink/af_netlink.c:1894
  sock_sendmsg_nosec net/socket.c:787 [inline]
  __sock_sendmsg net/socket.c:802 [inline]
  ____sys_sendmsg+0x563/0x5b0 net/socket.c:2698
  ___sys_sendmsg+0x195/0x1e0 net/socket.c:2752
  __sys_sendmsg net/socket.c:2784 [inline]
  __do_sys_sendmsg net/socket.c:2789 [inline]
  __se_sys_sendmsg net/socket.c:2787 [inline]
  __x64_sys_sendmsg+0xd4/0x160 net/socket.c:2787
  x64_sys_call+0x194c/0x3020 arch/x86/include/generated/asm/syscalls_64.h:47
  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
  do_syscall_64+0x12c/0x3b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

value changed: 0x0000000000000000 -> 0xffff88813cf5c400

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 22063 Comm: syz.0.31122 Tainted: G        W           syzkaller #0 PREEMPT(full)
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026

Fixes: 47e91f56008b ("bonding: use RCU protection for 3ad xmit path")
Reported-by: syzbot+9bb2ff2a4ab9e17307e1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/69f0a82f.050a0220.3aadc4.0000.GAE@google.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jay Vosburgh <jv@jvosburgh.net>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Link: https://patch.msgid.link/20260428123207.3809211-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago net: airoha: Do not return err in ndo_stop() callback
Lorenzo Bianconi [Tue, 28 Apr 2026 06:53:16 +0000 (08:53 +0200)] 
net: airoha: Do not return err in ndo_stop() callback

Always complete the airoha_dev_stop() routine regardless of the
airoha_set_vip_for_gdm_port() return value, since errors from
ndo_stop() are ignored by the networking stack and the interface is
always considered down after the call.

Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260428-airoha-ndo-stop-not-err-v1-1-674506d29a91@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago net: mdio: drop unneeded dependency on OF_GPIO
Bartosz Golaszewski [Tue, 28 Apr 2026 09:33:38 +0000 (11:33 +0200)] 
net: mdio: drop unneeded dependency on OF_GPIO

OF_GPIO is selected automatically on all OF systems. Any symbols it
controls also provide stubs so there's really no reason to select it
explicitly.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260428093338.35043-1-bartosz.golaszewski@oss.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago net: airoha: Rename get_src_port_id callback in get_sport
Lorenzo Bianconi [Tue, 28 Apr 2026 05:23:38 +0000 (07:23 +0200)] 
net: airoha: Rename get_src_port_id callback in get_sport

For code consistency, rename the get_src_port_id callback to get_sport.
This patch does not introduce any logical change; it is purely
cosmetic.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260428-airoha-get_src_port_id-callback-v1-1-3f765c91c1e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago r8152: Use ocp/mdio test and clear functions in r8157_hw_phy_cfg()
Birger Koblitz [Tue, 28 Apr 2026 03:44:58 +0000 (05:44 +0200)] 
r8152: Use ocp/mdio test and clear functions in r8157_hw_phy_cfg()

Replace the explicit testing and clearing of bits with the existing
helpers ocp_word_test_and_clr_bits() and r8152_mdio_test_and_clr_bit(),
re-using that code.

This allows removing the "ocp_data" variable. Also remove the "ret" variable,
which was incorrectly used for the r8153_phy_status() return value (a u16),
so that the remaining "data" variable is sufficient.
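
To show what a test-and-clear helper buys over open-coded bit handling,
here is a minimal sketch on a plain in-memory word. The helper
word_test_and_clr_bits() is hypothetical; the driver's real helpers go
through OCP/MDIO register accessors and their exact semantics are not
reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: report whether any bit in mask was set, and
 * clear those bits if so.
 */
bool word_test_and_clr_bits(uint16_t *reg, uint16_t mask)
{
	bool was_set = (*reg & mask) != 0;

	if (was_set)
		*reg &= (uint16_t)~mask;
	return was_set;
}
```

A caller can then write `if (word_test_and_clr_bits(&reg, mask))` rather
than reading the register into a temporary, testing it, masking it, and
writing it back by hand.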

Signed-off-by: Birger Koblitz <mail@birger-koblitz.de>
Link: https://patch.msgid.link/20260428-use_bit_functions-v1-1-6eb5a3507610@birger-koblitz.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago r8152: Fix double consecutive clearing of PLA_MCU_SPDWN_EN bit
Birger Koblitz [Tue, 28 Apr 2026 03:43:38 +0000 (05:43 +0200)] 
r8152: Fix double consecutive clearing of PLA_MCU_SPDWN_EN bit

Due to a copy-and-paste error, the PLA_MCU_SPDWN_EN bit was cleared
twice consecutively using ocp_word_clr_bits(). Remove the duplicate.

Signed-off-by: Birger Koblitz <mail@birger-koblitz.de>
Link: https://patch.msgid.link/20260428-patch_double-v1-1-27c830a9eb2e@birger-koblitz.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago Merge branch 'net-mlx5-fix-e-switch-work-queue-deadlock-with-devlink-lock'
Jakub Kicinski [Thu, 30 Apr 2026 00:46:30 +0000 (17:46 -0700)] 
Merge branch 'net-mlx5-fix-e-switch-work-queue-deadlock-with-devlink-lock'

Tariq Toukan says:

====================
net/mlx5: Fix E-Switch work queue deadlock with devlink lock

mlx5_eswitch_cleanup() calls destroy_workqueue() while holding the
devlink lock through mlx5_uninit_one(). E-Switch workqueue workers also
need the devlink lock, but previously took it before checking whether
their work item was stale. Cleanup can therefore wait for a worker that
is blocked on the same devlink lock.

Mode changes have the same ordering hazard: the mode-change path holds
devlink lock while tearing down the current mode, and old work may still
be pending on the E-Switch workqueue.

Fix this by making esw_wq_handler() check the generation counter before
attempting to take devlink lock. The worker uses devl_trylock(); if the
lock is busy and the work is still current, it sleeps on an E-Switch wait
queue with a short timeout. Invalidation increments the generation
counter and wakes the wait queue, so stale workers exit without spinning
or blocking cleanup.

The generation counter already existed but was buried in
mlx5_esw_functions and only covered function-change events. The three
patches get from there to the fix in small steps.

Patch 1 moves the counter up to mlx5_eswitch. Pure refactor,
no behavior change.

Patch 2 cleans up the work queue plumbing: factors out the repeated
lock/check/dispatch boilerplate into a single esw_wq_handler() and
adds mlx5_esw_add_work() as the one place to enqueue work.

Patch 3 is the actual fix: check the generation before the lock, use
devl_trylock() instead of devl_lock(), add a wait queue so lock retries
do not spin, and invalidate pending work at the earliest safe operation
boundary. Cleanup invalidates before destroy_workqueue(), and mode
teardown unregisters the work-producing notifiers before invalidating so
new notifier work cannot capture the new generation.
====================

Link: https://patch.msgid.link/20260428051018.219093-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago net/mlx5: E-Switch, fix deadlock between devlink lock and esw->wq
Mark Bloch [Tue, 28 Apr 2026 05:10:17 +0000 (08:10 +0300)] 
net/mlx5: E-Switch, fix deadlock between devlink lock and esw->wq

mlx5_eswitch_cleanup() calls destroy_workqueue() while holding the
devlink lock through mlx5_uninit_one(). E-Switch workqueue workers also
need the devlink lock, but previously took it before checking whether
their work item was stale. This can deadlock when cleanup waits for a
worker that is blocked on the same devlink lock.

Mode changes have the same ordering hazard: the mode-change path holds
devlink lock while tearing down the current mode, and old work may still
be pending on the E-Switch workqueue.

Fix this by making esw_wq_handler() check the generation counter before
attempting to take devlink lock. The worker uses devl_trylock(); if the
lock is busy and the work is still current, it sleeps on an E-Switch wait
queue with a short timeout. Invalidation increments the generation
counter and wakes the wait queue, so stale workers exit without spinning
or blocking cleanup.
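
The ordering described above (staleness check first, then a
non-blocking lock attempt) can be sketched in userspace with C11
atomics. Everything here is a hypothetical model: an atomic_flag stands
in for the devlink lock, handle_once() performs a single attempt, and
the wait queue with timeout is reduced to a WORK_RETRY return value.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct esw_model {
	atomic_uint generation;		/* bumped on invalidation */
	atomic_flag devlink_lock;	/* stands in for the devlink lock */
};

struct work_model {
	unsigned int gen;	/* generation captured at enqueue time */
};

enum work_result { WORK_STALE, WORK_RETRY, WORK_DONE };

void esw_model_init(struct esw_model *esw)
{
	atomic_init(&esw->generation, 0);
	atomic_flag_clear(&esw->devlink_lock);
}

/* One attempt; a real worker would sleep briefly and retry on WORK_RETRY. */
enum work_result handle_once(struct esw_model *esw, const struct work_model *w)
{
	/* Check staleness before touching the lock, so stale work never blocks. */
	if (atomic_load(&esw->generation) != w->gen)
		return WORK_STALE;
	/* Trylock: never block on a lock the cleanup path may be holding. */
	if (atomic_flag_test_and_set(&esw->devlink_lock))
		return WORK_RETRY;
	/* ... the actual work would run here, under the lock ... */
	atomic_flag_clear(&esw->devlink_lock);
	return WORK_DONE;
}

/* Invalidation: make all previously queued work stale. */
void invalidate(struct esw_model *esw)
{
	atomic_fetch_add(&esw->generation, 1);
	/* the real code also wakes the wait queue here */
}
```

In this model a holder of the lock can call invalidate() and then wait
for workers to finish, since every pending worker either sees the new
generation and exits or fails the trylock without blocking.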

Invalidate work at the earliest safe operation boundary. Cleanup
invalidates before destroy_workqueue(), and QoS cleanup runs after the
workqueue is destroyed. Mode teardown unregisters the work-producing
notifiers first, then invalidates the queue before tearing down
FDB/QoS/rate-node state. This prevents new notifier work from capturing
the new generation while still making old work stale before expensive
teardown starts.

mlx5_devlink_eswitch_mode_set() now relies on
mlx5_eswitch_disable_locked() for the mode-change invalidation instead
of incrementing the generation after disable. mlx5_eswitch_disable()
gets the same coverage. SR-IOV enable/disable paths invalidate before VF
state changes so work against the old VF count or mode is discarded.

Remove the conditional generation increment in
mlx5_eswitch_event_handler_unregister(); mlx5_eswitch_disable_locked()
now handles it unconditionally after the relevant notifiers are
unregistered.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260428051018.219093-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago net/mlx5: E-Switch, introduce generic work queue dispatch helper
Mark Bloch [Tue, 28 Apr 2026 05:10:16 +0000 (08:10 +0300)] 
net/mlx5: E-Switch, introduce generic work queue dispatch helper

Each E-Switch work item requires the same boilerplate: acquire the
devlink lock, check whether the work is stale, dispatch to the
appropriate handler, and release the lock. Factor this out.

Add a func callback to mlx5_host_work so the generic handler
esw_wq_handler() can dispatch to the right function without
duplicating locking logic. Introduce mlx5_esw_add_work() as the
single enqueue point: it stamps the work item with the current
generation counter and queues it onto the E-Switch work queue.
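
A simplified model of this arrangement is sketched below. The types
esw_ctx and host_work_model and the functions add_work(), wq_handler()
and vfs_changed() are hypothetical; "queueing" is reduced to filling in
the work item, and the locking is only indicated by comments.

```c
#include <stddef.h>

struct esw_ctx {
	unsigned int generation;	/* current generation counter */
	int handled;			/* counts handler invocations */
};

struct host_work_model {
	struct esw_ctx *esw;
	unsigned int work_gen;			/* stamped at enqueue time */
	void (*func)(struct esw_ctx *esw);	/* per-work-type handler */
};

/* Single enqueue point: stamp the item with the current generation. */
void add_work(struct esw_ctx *esw, struct host_work_model *w,
	      void (*func)(struct esw_ctx *esw))
{
	w->esw = esw;
	w->func = func;
	w->work_gen = esw->generation;
}

/* Generic handler: shared lock/check/dispatch logic for all work types. */
void wq_handler(struct host_work_model *w)
{
	/* the lock would be acquired here */
	if (w->work_gen == w->esw->generation)
		w->func(w->esw);	/* handler runs with the lock held */
	/* the lock would be released here */
}

/* Example handler for one work type. */
void vfs_changed(struct esw_ctx *esw)
{
	esw->handled++;
}
```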

Refactor esw_vfs_changed_event_handler() to match the new contract:
it no longer receives work_gen or out as parameters. It queries
mlx5_esw_query_functions() itself and owns the kvfree() of the
result. The devlink lock is acquired and released by esw_wq_handler()
before dispatching, so the handler runs with the lock already held.

Update mlx5_esw_funcs_changed_handler() to use mlx5_esw_add_work().

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260428051018.219093-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago net/mlx5: E-Switch, move work queue generation counter
Mark Bloch [Tue, 28 Apr 2026 05:10:15 +0000 (08:10 +0300)] 
net/mlx5: E-Switch, move work queue generation counter

The generation counter in mlx5_esw_functions is used to detect stale
work items on the E-Switch work queue. Move it from mlx5_esw_functions
to the top-level mlx5_eswitch struct so it can guard all work types,
not just function-change events.

This is a mechanical refactor: no behavioral change.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260428051018.219093-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago hv_sock: fix ARM64 support
Hamza Mahfooz [Tue, 28 Apr 2026 12:53:39 +0000 (08:53 -0400)] 
hv_sock: fix ARM64 support

VMBUS ring buffers must be page aligned. Therefore, the current value of
24K presents a challenge on ARM64 kernels (with 64K pages). So, use
VMBUS_RING_SIZE() to ensure they are always aligned and large enough to
hold all of the relevant data.
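
The arithmetic behind the problem is simple rounding. The helper
align_up() below is a generic sketch, not the definition of
VMBUS_RING_SIZE(), which this example does not attempt to reproduce.

```c
#include <stddef.h>

/* Round size up to a multiple of page_size (page_size must be a power of two). */
size_t align_up(size_t size, size_t page_size)
{
	return (size + page_size - 1) & ~(page_size - 1);
}
```

With 4K pages, 24K is already a multiple of the page size, so
align_up(24 * 1024, 4096) leaves it unchanged. With 64K pages, 24K is
less than a single page and has to be rounded up to 65536 bytes.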

Cc: stable@vger.kernel.org
Fixes: 77ffe33363c0 ("hv_sock: use HV_HYP_PAGE_SIZE for Hyper-V communication")
Tested-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260428125339.13963-1-hamzamahfooz@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago net: phy: aquantia: use ADVERTISE_XNP for extended next page advertising
Maxime Chevallier [Tue, 28 Apr 2026 12:58:27 +0000 (14:58 +0200)] 
net: phy: aquantia: use ADVERTISE_XNP for extended next page advertising

When configuring the link parameters in forced mode for the AQR-105, the
Extended Next Page bit gets advertised for Multi-Gigabit modes.

This is done through bit 12 of MDIO_AN_ADVERTISE in MDIO_MMD_AN. This
contains a copy of the MII_ADVERTISE, for which 802.3 defines bit 12 as
the Extended Next Page advertising. This bit used to be marked as
reserved, but a proper define for it was added in:

commit e7a62edd34b1 ("net: phy: qcom: at803x: Use the correct bit to disable extended next page")

Let's use it instead of the ADVERTISE_RESV definition, making the code
more self-documenting.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260428125827.238469-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago Merge branch 'net-psp-add-more-validation'
Jakub Kicinski [Wed, 29 Apr 2026 23:55:57 +0000 (16:55 -0700)] 
Merge branch 'net-psp-add-more-validation'

Jakub Kicinski says:

====================
net: psp: add more validation

Address some AI code-scan issues with the PSP code.
I don't think any of these are real bugs, but they may
become bugs in the future. The two real bugs discovered
were posted separately for net. AI reports 3 more which
seem plain wrong (rx SPI "leak" on error etc.).
====================

Link: https://patch.msgid.link/20260428205352.1247325-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago psp: validate IPv4 header fields in psp_dev_rcv()
Jakub Kicinski [Tue, 28 Apr 2026 20:53:52 +0000 (13:53 -0700)] 
psp: validate IPv4 header fields in psp_dev_rcv()

psp_dev_rcv() is called from the NIC driver's RX completion path
before the frame reaches ip_rcv_core(), so the IP header has not
been validated in SW yet. We expect that the device has done
all this validation, but let's also add the SW checks, to avoid
surprises.
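
This message does not list which fields are checked. As an assumption,
the sketch below shows the kind of basic IPv4 header sanity checks a
receive path typically performs (version, header length, total
length); ipv4_hdr_sane() is a hypothetical name and this is not the
code added by the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check on a raw buffer that should start with an IPv4 header. */
bool ipv4_hdr_sane(const uint8_t *pkt, size_t len)
{
	if (len < 20)		/* minimum IPv4 header size */
		return false;

	unsigned int version = pkt[0] >> 4;
	unsigned int ihl = pkt[0] & 0x0f;	/* header length in 32-bit words */
	size_t hdr_len = (size_t)ihl * 4;
	size_t tot_len = ((size_t)pkt[2] << 8) | pkt[3];	/* big-endian field */

	if (version != 4 || ihl < 5)
		return false;
	if (hdr_len > len)
		return false;
	/* Total length must cover the header and fit in the buffer. */
	if (tot_len < hdr_len || tot_len > len)
		return false;
	return true;
}
```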

Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260428205352.1247325-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago psp: add a comment about a psp_dev add netlink notification
Jakub Kicinski [Tue, 28 Apr 2026 20:53:51 +0000 (13:53 -0700)] 
psp: add a comment about a psp_dev add netlink notification

In psp_dev_create(), the DEV_ADD_NTF netlink notification is sent
before the device is published to the netdev via rcu_assign_pointer().
IIRC this is intentional because a single PSP device is expected
to be shared with multiple netdevs. So we are trying to default to
not having the netdev info. We can change it if someone complains
but for now just add a comment that it's intentional.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260428205352.1247325-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago psp: validate protocol before mutating skb in psp_dev_encapsulate()
Jakub Kicinski [Tue, 28 Apr 2026 20:53:50 +0000 (13:53 -0700)] 
psp: validate protocol before mutating skb in psp_dev_encapsulate()

Code checkers / AI scans will complain that we have already modified
the packet by the time we realize that protocol is not IP.

Move the skb->protocol check to before skb_push()/memmove() so that
the skb is not left in a corrupted state when the function returns
false for an unsupported protocol. psp_dev_rcv() follows a similar
pattern.
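
The validate-before-mutate ordering can be shown on a toy buffer.
model_skb and model_encapsulate() are hypothetical; the protocol value
is kept in host byte order for simplicity, and the memmove() merely
stands in for the skb_push()/memmove() step mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ETH_P_IP   0x0800
#define MODEL_ETH_P_IPV6 0x86DD

struct model_skb {
	uint16_t protocol;
	uint8_t data[64];
	size_t len;
};

/* Open a gap of `extra` bytes at the front, but only for IP packets. */
bool model_encapsulate(struct model_skb *skb, size_t extra)
{
	/* All checks first: a false return leaves the buffer untouched. */
	if (skb->protocol != MODEL_ETH_P_IP &&
	    skb->protocol != MODEL_ETH_P_IPV6)
		return false;
	if (skb->len + extra > sizeof(skb->data))
		return false;

	/* Only now mutate the buffer. */
	memmove(skb->data + extra, skb->data, skb->len);
	memset(skb->data, 0, extra);
	skb->len += extra;
	return true;
}
```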

Today this path is unreachable because both in-tree callers (mlx5 and
netdevsim) only reach psp_dev_encapsulate() from TCP socket TX paths
where skb->protocol is always ETH_P_IP or ETH_P_IPV6, and both drop
the skb on a false return, anyway.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260428205352.1247325-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago selftests: drv-net: rss: add case for field config on RSS context
Jakub Kicinski [Tue, 28 Apr 2026 20:36:24 +0000 (13:36 -0700)] 
selftests: drv-net: rss: add case for field config on RSS context

We had some issues with a suspected traffic imbalance on an RSS
context. Make sure the tests cover the RXFH field selection
vs additional contexts.

Tested-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20260428203624.1224387-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago MAINTAINERS: update the IPv4/IPv6 entry and add Ido Schimmel
Jakub Kicinski [Tue, 28 Apr 2026 20:39:24 +0000 (13:39 -0700)] 
MAINTAINERS: update the IPv4/IPv6 entry and add Ido Schimmel

The IPv4/IPv6 and routing code is not very well separated from
the TCP/UDP code. Scope it down properly by providing a more
accurate file list, instead of net/ipv4/ and net/ipv6/.

Now that the entry more accurately represents layer 3 and routing,
merge the nexthop entry into it.

Add Ido Schimmel as a co-maintainer; Ido's git history speaks
for itself.

Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260428203924.1229169-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>