]> git.ipfire.org Git - thirdparty/linux.git/log
thirdparty/linux.git
6 weeks agonet: stmmac: dwmac-loongson: Add new GMAC's PCI device ID support
Huacai Chen [Thu, 24 Apr 2025 07:22:09 +0000 (15:22 +0800)] 
net: stmmac: dwmac-loongson: Add new GMAC's PCI device ID support

Add a new GMAC's PCI device ID (0x7a23) support which is used in
Loongson-2K3000/Loongson-3B6000M. The new GMAC device use external PHY,
so it reuses loongson_gmac_data() as the old GMAC device (0x7a03), and
the new GMAC device still doesn't support flow control now.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Tested-by: Henry Chen <chenx97@aosc.io>
Tested-by: Biao Dong <dongbiao@loongson.cn>
Signed-off-by: Baoqi Zhang <zhangbaoqi@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Link: https://patch.msgid.link/20250424072209.3134762-4-chenhuacai@loongson.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: dwmac-loongson: Add new multi-chan IP core support
Huacai Chen [Thu, 24 Apr 2025 07:22:08 +0000 (15:22 +0800)] 
net: stmmac: dwmac-loongson: Add new multi-chan IP core support

Add a new multi-chan IP core (0x12) support which is used in Loongson-
2K3000/Loongson-3B6000M. Compared with the 0x10 core, the new 0x12 core
reduces channel numbers from 8 to 4, but checksum is supported for all
channels.

Add a "multichan" flag to loongson_data, so that we can simply use a
"if (ld->multichan)" condition rather than the complicated condition
"if (ld->loongson_id == DWMAC_CORE_MULTICHAN_V1 || ld->loongson_id ==
DWMAC_CORE_MULTICHAN_V2)".

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Henry Chen <chenx97@aosc.io>
Tested-by: Biao Dong <dongbiao@loongson.cn>
Signed-off-by: Baoqi Zhang <zhangbaoqi@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Link: https://patch.msgid.link/20250424072209.3134762-3-chenhuacai@loongson.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'net-stmmac-socfpga-1000basex-support-and-cleanups'
Jakub Kicinski [Sat, 26 Apr 2025 02:22:03 +0000 (19:22 -0700)] 
Merge branch 'net-stmmac-socfpga-1000basex-support-and-cleanups'

Maxime Chevallier says:

====================
net: stmmac: socfpga: 1000BaseX support and cleanups

This small series sorts-out 1000BaseX support and does a bit of cleanup
for the Lynx conversion.

Patch 1 makes sure that we set the right phy_mode when working in
1000BaseX mode, so that the internal GMII is configured correctly.

Patch 2 removes a check for phy_device upon calling fix_mac_speed(). As
the SGMII adapter may be chained to a Lynx PCS, checking for a
phy_device to be attached to the netdev before enabling the SGMII
adapter doesn't make sense, as we won't have a downstream PHY when using
1000BaseX.

Patch 3 cleans an unused field from the PCS conversion.

v1: https://lore.kernel.org/20250422094701.49798-1-maxime.chevallier@bootlin.com
v2: https://lore.kernel.org/20250423104646.189648-1-maxime.chevallier@bootlin.com
====================

Link: https://patch.msgid.link/20250424071223.221239-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: socfpga: Remove unused pcs-mdiodev field
Maxime Chevallier [Thu, 24 Apr 2025 07:12:22 +0000 (09:12 +0200)] 
net: stmmac: socfpga: Remove unused pcs-mdiodev field

When dwmac-socfpga was converted to using the Lynx PCS (previously
referred to in the driver as the Altera TSE PCS), the
lynx_pcs_create_mdiodev() was used to create the pcs instance.

As this function didn't exist in the early versions of the series, a
local mdiodev object was stored for PCS creation. It was never used, but
still made it into the driver, so remove it.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250424071223.221239-4-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: socfpga: Don't check for phy to enable the SGMII adapter
Maxime Chevallier [Thu, 24 Apr 2025 07:12:21 +0000 (09:12 +0200)] 
net: stmmac: socfpga: Don't check for phy to enable the SGMII adapter

The SGMII adapter needs to be enabled for both Cisco SGMII and 1000BaseX
operations. It doesn't make sense to check for an attached phydev here,
as we simply might not have any, in particular if we're using the
1000BaseX interface mode.

Make so that we only re-enable the SGMII adapter when it's present, and
when we use a phy_mode that is handled by said adapter.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250424071223.221239-3-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: socfpga: Enable internal GMII when using 1000BaseX
Maxime Chevallier [Thu, 24 Apr 2025 07:12:20 +0000 (09:12 +0200)] 
net: stmmac: socfpga: Enable internal GMII when using 1000BaseX

Dwmac Socfpga may be used with an instance of a Lynx / Altera TSE PCS,
in which case it gains support for 1000BaseX.

It appears that the PCS is wired to the MAC through an internal GMII
bus. Make sure that we enable the GMII_MII mode for the internal MAC when
using 1000BaseX.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250424071223.221239-2-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: dwmac-loongson: Move queue number init to common function
Huacai Chen [Thu, 24 Apr 2025 07:22:07 +0000 (15:22 +0800)] 
net: stmmac: dwmac-loongson: Move queue number init to common function

Currently, the tx and rx queue number initialization is duplicated in
loongson_gmac_data() and loongson_gnet_data(), so move it to the common
function loongson_default_data().

This is a preparation for later patches.

Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Tested-by: Henry Chen <chenx97@aosc.io>
Tested-by: Biao Dong <dongbiao@loongson.cn>
Signed-off-by: Baoqi Zhang <zhangbaoqi@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 weeks agobonding: assign random address if device address is same as bond
Hangbin Liu [Thu, 24 Apr 2025 04:22:38 +0000 (04:22 +0000)] 
bonding: assign random address if device address is same as bond

This change addresses a MAC address conflict issue in failover scenarios,
similar to the problem described in commit a951bc1e6ba5 ("bonding: correct
the MAC address for 'follow' fail_over_mac policy").

In fail_over_mac=follow mode, the bonding driver expects the formerly active
slave to swap MAC addresses with the newly active slave during failover.
However, under certain conditions, two slaves may end up with the same MAC
address, which breaks this policy:

1) ip link set eth0 master bond0
   -> bond0 adopts eth0's MAC address (MAC0).

2) ip link set eth1 master bond0
   -> eth1 is added as a backup with its own MAC (MAC1).

3) ip link set eth0 nomaster
   -> eth0 is released and restores its MAC (MAC0).
   -> eth1 becomes the active slave, and bond0 assigns MAC0 to eth1.

4) ip link set eth0 master bond0
   -> eth0 is re-added to bond0, now both eth0 and eth1 have MAC0.

This results in a MAC address conflict and violates the expected behavior
of the failover policy.

To fix this, we assign a random MAC address to any newly added slave if
its current MAC address matches that of the bond. The original (permanent)
MAC address is saved and will be restored when the device is released
from the bond.

This ensures that each slave has a unique MAC address during failover
transitions, preserving the integrity of the fail_over_mac=follow policy.

Fixes: 3915c1e8634a ("bonding: Add "follow" option to fail_over_mac")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 weeks agoMerge branch 'io_uring-zcrx-fix-selftests-and-add-new-test-for-rss-ctx'
Jakub Kicinski [Sat, 26 Apr 2025 01:44:11 +0000 (18:44 -0700)] 
Merge branch 'io_uring-zcrx-fix-selftests-and-add-new-test-for-rss-ctx'

David Wei says:

====================
io_uring/zcrx: fix selftests and add new test for rss ctx

Update io_uring zero copy receive selftest. Patch 1 does a requested
cleanup to use defer() for undoing ethtool actions during the test and
restoring the NIC under test back to its original state.

Patch 2 adds a required call to set hds_thresh to 0. This is needed for
the queue API.

Patch 3 adds a new test case for steering into RSS contexts. A real
application using io_uring zero copy receive relies on this working to
shard work across multiple queues. There seems to be some
differences/bugs with steering into RSS contexts and individual queues.
====================

Link: https://patch.msgid.link/20250425022049.3474590-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoio_uring/zcrx: selftests: add test case for rss ctx
David Wei [Fri, 25 Apr 2025 02:20:49 +0000 (19:20 -0700)] 
io_uring/zcrx: selftests: add test case for rss ctx

RSS contexts are used to shard work across multiple queues for an
application using io_uring zero copy receive. Add a test case checking
that steering flows into an RSS context works.

Until I add multi-thread support to the selftest binary, this test case
only has 1 queue in the RSS context.

Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250425022049.3474590-4-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoio_uring/zcrx: selftests: set hds_thresh to 0
David Wei [Fri, 25 Apr 2025 02:20:48 +0000 (19:20 -0700)] 
io_uring/zcrx: selftests: set hds_thresh to 0

Setting hds_thresh to 0 is required for queue reset.

Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250425022049.3474590-3-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoio_uring/zcrx: selftests: switch to using defer() for cleanup
David Wei [Fri, 25 Apr 2025 02:20:47 +0000 (19:20 -0700)] 
io_uring/zcrx: selftests: switch to using defer() for cleanup

Switch to using defer() for putting the NIC back to the original state
prior to running the selftest.

Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250425022049.3474590-2-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'fix-netdevim-to-correctly-mark-napi-ids'
Jakub Kicinski [Fri, 25 Apr 2025 01:30:36 +0000 (18:30 -0700)] 
Merge branch 'fix-netdevim-to-correctly-mark-napi-ids'

Joe Damato says:

====================
Fix netdevim to correctly mark NAPI IDs

This series fixes netdevsim to correctly set the NAPI ID on the skb.
This is helpful for writing tests around features that use
SO_INCOMING_NAPI_ID.

In addition to the netdevsim fix in patch 1, patches 2 & 3 do some self
test refactoring and add a test for NAPI IDs. The test itself (patch 3)
introduces a C helper because apparently python doesn't have
socket.SO_INCOMING_NAPI_ID.

v3: https://lore.kernel.org/20250418013719.12094-1-jdamato@fastly.com
v2: https://lore.kernel.org/20250417013301.39228-1-jdamato@fastly.com
rfcv1: https://lore.kernel.org/20250329000030.39543-1-jdamato@fastly.com
====================

Link: https://patch.msgid.link/20250424002746.16891-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoselftests: drv-net: Test that NAPI ID is non-zero
Joe Damato [Thu, 24 Apr 2025 00:27:33 +0000 (00:27 +0000)] 
selftests: drv-net: Test that NAPI ID is non-zero

Test that the SO_INCOMING_NAPI_ID of a network file descriptor is
non-zero. This ensures that either the core networking stack or, in some
cases like netdevsim, the driver correctly sets the NAPI ID.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250424002746.16891-4-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoselftests: drv-net: Factor out ksft C helpers
Joe Damato [Thu, 24 Apr 2025 00:27:32 +0000 (00:27 +0000)] 
selftests: drv-net: Factor out ksft C helpers

Factor ksft C helpers to a header so they can be used by other C-based
tests.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250424002746.16891-3-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonetdevsim: Mark NAPI ID on skb in nsim_rcv
Joe Damato [Thu, 24 Apr 2025 00:27:31 +0000 (00:27 +0000)] 
netdevsim: Mark NAPI ID on skb in nsim_rcv

Previously, nsim_rcv was not marking the NAPI ID on the skb, leading to
applications seeing a napi ID of 0 when using SO_INCOMING_NAPI_ID.

To add to the userland confusion, netlink appears to correctly report
the NAPI IDs for netdevsim queues but the resulting file descriptor from
a call to accept() was reporting a NAPI ID of 0.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250424002746.16891-2-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agotools: ynl: fix the header guard name for OVPN
Jakub Kicinski [Wed, 23 Apr 2025 22:02:31 +0000 (15:02 -0700)] 
tools: ynl: fix the header guard name for OVPN

Thorsten reports that after upgrading system headers from linux-next
the YNL build breaks. I typo'ed the header guard, _H is missing.

Reported-by: Thorsten Leemhuis <linux@leemhuis.info>
Link: https://lore.kernel.org/59ba7a94-17b9-485f-aa6d-14e4f01a7a39@leemhuis.info
Fixes: 12b196568a3a ("tools: ynl: add missing header deps")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Thorsten Leemhuis <linux@leemhuis.info>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250423220231.1035931-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: ethernet: mtk_wed: annotate RCU release in attach()
Johannes Berg [Wed, 23 Apr 2025 15:08:08 +0000 (17:08 +0200)] 
net: ethernet: mtk_wed: annotate RCU release in attach()

There are some sparse warnings in wifi, and it seems that
it's actually possible to annotate a function pointer with
__releases(), making the sparse warnings go away. In a way
that also serves as documentation that rcu_read_unlock()
must be called in the attach method, so add that annotation.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250423150811.456205-2-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'tcp-fastopen-observability'
Jakub Kicinski [Fri, 25 Apr 2025 01:21:07 +0000 (18:21 -0700)] 
Merge branch 'tcp-fastopen-observability'

Jeremy Harris says:

====================
tcp: fastopen: observability

Whether TCP Fast Open was used for a connection is not reliably
observable by an accepting application when the SYN passed no data.

Fix this by noting during SYN receive processing that an acceptable Fast
Open option was used, and provide this to userland via getsockopt TCP_INFO.
====================

Link: https://patch.msgid.link/20250423124334.4916-1-jgh@exim.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agotcp: fastopen: pass TFO child indication through getsockopt
Jeremy Harris [Wed, 23 Apr 2025 12:43:34 +0000 (13:43 +0100)] 
tcp: fastopen: pass TFO child indication through getsockopt

tcp: fastopen: pass TFO child indication through getsockopt

Note that this uses up the last bit of a field in struct tcp_info

Signed-off-by: Jeremy Harris <jgh@exim.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250423124334.4916-3-jgh@exim.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agotcp: fastopen: note that a child socket was created
Jeremy Harris [Wed, 23 Apr 2025 12:43:33 +0000 (13:43 +0100)] 
tcp: fastopen: note that a child socket was created

tcp: fastopen: note that a child socket was created

This uses up the last bit in a field of tcp_sock.

Signed-off-by: Jeremy Harris <jgh@exim.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250423124334.4916-2-jgh@exim.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: ip_gre: Fix spelling mistake "demultiplexor" -> "demultiplexer"
Colin Ian King [Wed, 23 Apr 2025 11:37:19 +0000 (12:37 +0100)] 
net: ip_gre: Fix spelling mistake "demultiplexor" -> "demultiplexer"

There is a spelling mistake in a pr_info message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://patch.msgid.link/20250423113719.173539-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agorxrpc: rxgk: Fix some reference count leaks
Dan Carpenter [Wed, 23 Apr 2025 08:25:45 +0000 (11:25 +0300)] 
rxrpc: rxgk: Fix some reference count leaks

These paths should call rxgk_put(gk) but they don't.  In the
rxgk_construct_response() function the "goto error;" will free the
"response" skb as well calling rxgk_put() so that's a bonus.

Fixes: 9d1d2b59341f ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/aAikCbsnnzYtVmIA@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: ethernet: mtk_eth_soc: convert cap_bit in mtk_eth_muxc struct to u64
Bo-Cun Chen [Wed, 23 Apr 2025 00:48:02 +0000 (01:48 +0100)] 
net: ethernet: mtk_eth_soc: convert cap_bit in mtk_eth_muxc struct to u64

With commit 51a4df60db5c2 ("net: ethernet: mtk_eth_soc: convert caps in
mtk_soc_data struct to u64") the capabilities bitfield was converted to
a 64-bit value, but a cap_bit in struct mtk_eth_muxc which is used to
store a full bitfield (rather than the bit number, as the name would
suggest) still holds only a 32-bit value.

Change the type of cap_bit to u64 in order to avoid truncating the
bitfield which results in path selection to not work with capabilities
above the 32-bit limit.

The values currently stored in the cap_bit field are
MTK_ETH_MUX_GDM1_TO_GMAC1_ESW:
 BIT_ULL(18) | BIT_ULL(5)

MTK_ETH_MUX_GMAC2_GMAC0_TO_GEPHY:
 BIT_ULL(19) | BIT_ULL(5) | BIT_ULL(6)

MTK_ETH_MUX_U3_GMAC2_TO_QPHY:
 BIT_ULL(20) | BIT_ULL(5) | BIT_ULL(6)

MTK_ETH_MUX_GMAC1_GMAC2_TO_SGMII_RGMII:
 BIT_ULL(20) | BIT_ULL(5) | BIT_ULL(7)

MTK_ETH_MUX_GMAC12_TO_GEPHY_SGMII:
 BIT_ULL(21) | BIT_ULL(5)

While all those values are currently still within 32-bit boundaries,
the addition of new capabilities of MT7988 as well as future SoC's
like MT7987 will exceed them. Also, the use of a 32-bit 'int' type to
store the result of a BIT_ULL(...) is misleading.

Signed-off-by: Bo-Cun Chen <bc-bocun.chen@mediatek.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/ded98b0d716c3203017a7a92151516ec2bf1abee.1745369249.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agorxrpc: Remove deadcode
Dr. David Alan Gilbert [Tue, 22 Apr 2025 23:51:47 +0000 (00:51 +0100)] 
rxrpc: Remove deadcode

Remove three functions that are no longer used.

rxrpc_get_txbuf() last use was removed by 2020's
commit 5e6ef4f1017c ("rxrpc: Make the I/O thread take over the call and
local processor work")

rxrpc_kernel_get_epoch() last use was removed by 2020's
commit 44746355ccb1 ("afs: Don't get epoch from a server because it may be
ambiguous")

rxrpc_kernel_set_max_life() last use was removed by 2023's
commit db099c625b13 ("rxrpc: Fix timeout of a call that hasn't yet been
granted a channel")

Both of the rxrpc_kernel_* functions were documented.  Remove that
documentation as well as the code.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Acked-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/20250422235147.146460-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'net-bcmasp-add-v3-0-and-remove-v2-0'
Jakub Kicinski [Thu, 24 Apr 2025 23:59:55 +0000 (16:59 -0700)] 
Merge branch 'net-bcmasp-add-v3-0-and-remove-v2-0'

Justin Chen says:

====================
net: bcmasp: Add v3.0 and remove v2.0

asp-v2.0 had one supported SoC that never saw the light of day.
Given that it was the first iteration of the HW, it ended up with
some one off HW design decisions that were changed in futher iterations
of the HW. We remove support to simplify the code and make it easier to
add future revisions.

Add support for asp-v3.0. asp-v3.0 reduces the feature set for cost
savings. We reduce the number of channel/network filters. And also
remove some features and statistics.
====================

Link: https://patch.msgid.link/20250422233645.1931036-1-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: phy: mdio-bcm-unimac: Add asp-v3.0
Justin Chen [Tue, 22 Apr 2025 23:36:45 +0000 (16:36 -0700)] 
net: phy: mdio-bcm-unimac: Add asp-v3.0

Add mdio compat string for asp-v3.0 ethernet driver.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250422233645.1931036-9-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: bcmasp: Add support for asp-v3.0
Justin Chen [Tue, 22 Apr 2025 23:36:44 +0000 (16:36 -0700)] 
net: bcmasp: Add support for asp-v3.0

The asp-v3.0 is a major HW revision that reduced the number of
channels and filters. The goal was to save cost by reducing the
feature set.

Changes for asp-v3.0
- Number of network filters were reduced.
- Number of channels were reduced.
- EDPKT stats were removed.
- Fix a bug with csum offload.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250422233645.1931036-8-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agodt-bindings: net: brcm,unimac-mdio: Add asp-v3.0
Justin Chen [Tue, 22 Apr 2025 23:36:43 +0000 (16:36 -0700)] 
dt-bindings: net: brcm,unimac-mdio: Add asp-v3.0

The asp-v3.0 Ethernet controller uses a brcm unimac like its
predecessor.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250422233645.1931036-7-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agodt-bindings: net: brcm,asp-v2.0: Add asp-v3.0
Justin Chen [Tue, 22 Apr 2025 23:36:42 +0000 (16:36 -0700)] 
dt-bindings: net: brcm,asp-v2.0: Add asp-v3.0

Add asp-v3.0 support. v3.0 is a major revision that reduces
the feature set for cost savings. We have a reduced amount of
channels and network filters.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250422233645.1931036-6-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: phy: mdio-bcm-unimac: Remove asp-v2.0
Justin Chen [Tue, 22 Apr 2025 23:36:41 +0000 (16:36 -0700)] 
net: phy: mdio-bcm-unimac: Remove asp-v2.0

Remove asp-v2.0 which will no longer be supported.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250422233645.1931036-5-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: bcmasp: Remove support for asp-v2.0
Justin Chen [Tue, 22 Apr 2025 23:36:40 +0000 (16:36 -0700)] 
net: bcmasp: Remove support for asp-v2.0

The SoC that supported asp-v2.0 never saw the light of day. asp-v2.0 has
quirks that makes the logic overly complicated. For example, asp-v2.0 is
the only revision that has a different wake up IRQ hook up. Remove asp-v2.0
support to make supporting future HW revisions cleaner.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250422233645.1931036-4-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agodt-bindings: net: brcm,unimac-mdio: Remove asp-v2.0
Justin Chen [Tue, 22 Apr 2025 23:36:39 +0000 (16:36 -0700)] 
dt-bindings: net: brcm,unimac-mdio: Remove asp-v2.0

Remove asp-v2.0 which was only supported on one SoC that never
saw the light of day.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250422233645.1931036-3-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agodt-bindings: net: brcm,asp-v2.0: Remove asp-v2.0
Justin Chen [Tue, 22 Apr 2025 23:36:38 +0000 (16:36 -0700)] 
dt-bindings: net: brcm,asp-v2.0: Remove asp-v2.0

Remove asp-v2.0 which was only supported on one SoC that never
saw the light of day.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250422233645.1931036-2-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoselftests: iou-zcrx: Get the page size at runtime
Haiyue Wang [Sat, 19 Apr 2025 14:10:15 +0000 (22:10 +0800)] 
selftests: iou-zcrx: Get the page size at runtime

Use the API `sysconf()` to query page size at runtime, instead of using
hard code number 4096.

And use `posix_memalign` to allocate the page size aligned momory.

Signed-off-by: Haiyue Wang <haiyuewa@163.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250419141044.10304-1-haiyuewa@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 24 Apr 2025 18:14:16 +0000 (11:14 -0700)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.15-rc4).

This pull includes wireless and a fix to vxlan which isn't
in Linus's tree just yet. The latter creates with a silent conflict
/ build breakage, so merging it now to avoid causing problems.

drivers/net/vxlan/vxlan_vnifilter.c
  094adad91310 ("vxlan: Use a single lock to protect the FDB table")
  087a9eb9e597 ("vxlan: vnifilter: Fix unlocked deletion of default FDB entry")
https://lore.kernel.org/20250423145131.513029-1-idosch@nvidia.com

No "normal" conflicts, or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agovxlan: vnifilter: Fix unlocked deletion of default FDB entry
Ido Schimmel [Wed, 23 Apr 2025 14:51:31 +0000 (17:51 +0300)] 
vxlan: vnifilter: Fix unlocked deletion of default FDB entry

When a VNI is deleted from a VXLAN device in 'vnifilter' mode, the FDB
entry associated with the default remote (assuming one was configured)
is deleted without holding the hash lock. This is wrong and will result
in a warning [1] being generated by the lockdep annotation that was
added by commit ebe642067455 ("vxlan: Create wrappers for FDB lookup").

Reproducer:

 # ip link add vx0 up type vxlan dstport 4789 external vnifilter local 192.0.2.1
 # bridge vni add vni 10010 remote 198.51.100.1 dev vx0
 # bridge vni del vni 10010 dev vx0

Fix by acquiring the hash lock before the deletion and releasing it
afterwards. Blame the original commit that introduced the issue rather
than the one that exposed it.

[1]
WARNING: CPU: 3 PID: 392 at drivers/net/vxlan/vxlan_core.c:417 vxlan_find_mac+0x17f/0x1a0
[...]
RIP: 0010:vxlan_find_mac+0x17f/0x1a0
[...]
Call Trace:
 <TASK>
 __vxlan_fdb_delete+0xbe/0x560
 vxlan_vni_delete_group+0x2ba/0x940
 vxlan_vni_del.isra.0+0x15f/0x580
 vxlan_process_vni_filter+0x38b/0x7b0
 vxlan_vnifilter_process+0x3bb/0x510
 rtnetlink_rcv_msg+0x2f7/0xb70
 netlink_rcv_skb+0x131/0x360
 netlink_unicast+0x426/0x710
 netlink_sendmsg+0x75a/0xc20
 __sock_sendmsg+0xc1/0x150
 ____sys_sendmsg+0x5aa/0x7b0
 ___sys_sendmsg+0xfc/0x180
 __sys_sendmsg+0x121/0x1b0
 do_syscall_64+0xbb/0x1d0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

Fixes: f9c4bb0b245c ("vxlan: vni filtering support on collect metadata device")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250423145131.513029-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge tag 'wireless-2025-04-24' of https://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Thu, 24 Apr 2025 18:10:57 +0000 (11:10 -0700)] 
Merge tag 'wireless-2025-04-24' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Some more fixes, notably:
 * iwlwifi: various regression and iwlmld fixes
 * mac80211: fix TX frames in monitor mode
 * brcmfmac: error handling for firmware load

* tag 'wireless-2025-04-24' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: iwlwifi: restore missing initialization of async_handlers_list
  wifi: brcm80211: fmac: Add error handling for brcmf_usb_dl_writeimage()
  wifi: plfxlc: Remove erroneous assert in plfxlc_mac_release
  wifi: iwlwifi: fix the check for the SCRATCH register upon resume
  wifi: iwlwifi: don't warn if the NIC is gone in resume
  wifi: iwlwifi: mld: fix BAID validity check
  wifi: iwlwifi: back off on continuous errors
  wifi: iwlwifi: mld: only create debugfs symlink if it does not exist
  wifi: iwlwifi: mld: inform trans on init failure
  wifi: iwlwifi: mld: properly handle async notification in op mode start
  Revert "wifi: iwlwifi: make no_160 more generic"
  Revert "wifi: iwlwifi: add support for BE213"
  wifi: mac80211: restore monitor for outgoing frames
====================

Link: https://patch.msgid.link/20250424120535.56499-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge tag 'net-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 24 Apr 2025 16:14:50 +0000 (09:14 -0700)] 
Merge tag 'net-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "No fixes from any subtree.

  Current release - regressions:

   - net: fix the missing unlock for detached devices

  Previous releases - regressions:

   - sched: fix UAF vulnerability in HFSC qdisc

   - lwtunnel: disable BHs when required

   - mptcp: pm: defer freeing of MPTCP userspace path manager entries

   - tipc: fix NULL pointer dereference in tipc_mon_reinit_self()

   - eth: virtio-net: disable delayed refill when pausing rx

  Previous releases - always broken:

   - phylink: fix suspend/resume with WoL enabled and link down

   - eth:
       - mlx5: fix null-ptr-deref in mlx5_create_{inner_,}ttc_table()
       - xen-netfront: handle NULL returned by xdp_convert_buff_to_frame()
       - enetc: fix frame corruption on bpf_xdp_adjust_head/tail() and XDP_PASS
       - stmmac: fix dwmac1000 ptp timestamp status offset
       - pds_core: prevent possible adminq overflow/stuck condition

  Misc:

   - a bunch of MAINTAINERS updates"

* tag 'net-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (32 commits)
  net: stmmac: fix multiplication overflow when reading timestamp
  net: stmmac: fix dwmac1000 ptp timestamp status offset
  net: dp83822: Fix OF_MDIO config check
  pds_core: make wait_context part of q_info
  pds_core: Remove unnecessary check in pds_client_adminq_cmd()
  pds_core: handle unsupported PDS_CORE_CMD_FW_CONTROL result
  pds_core: Prevent possible adminq overflow/stuck condition
  net: dsa: mt7530: sync driver-specific behavior of MT7531 variants
  selftests/tc-testing: Add test for HFSC queue emptying during peek operation
  net_sched: hfsc: Fix a potential UAF in hfsc_dequeue() too
  net_sched: hfsc: Fix a UAF vulnerability in class handling
  selftests: mptcp: diag: use mptcp_lib_get_info_value
  mptcp: pm: Defer freeing of MPTCP userspace path manager entries
  net: ethernet: mtk_eth_soc: net: revise NETSYSv3 hardware configuration
  tipc: fix NULL pointer dereference in tipc_mon_reinit_self()
  virtio-net: disable delayed refill when pausing rx
  net: phy: leds: fix memory leak
  net: phylink: mac_link_(up|down)() clarifications
  net: phylink: fix suspend/resume with WoL enabled and link down
  net: lwtunnel: disable BHs when required
  ...

7 weeks agoMerge tag 'v6.15-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Thu, 24 Apr 2025 16:10:01 +0000 (09:10 -0700)] 
Merge tag 'v6.15-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:

 - Revert acomp multibuffer tests which were buggy

 - Fix off-by-one regression in new scomp code

 - Lower quality setting on atmel-sha204a as it may not be random

* tag 'v6.15-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: atmel-sha204a - Set hwrng quality to lowest possible
  crypto: scomp - Fix off-by-one bug when calculating last page
  Revert "crypto: testmgr - Add multibuffer acomp testing"

7 weeks agonet: phy: marvell-88q2xxx: Enable temperature sensor for mv88q211x
Niklas Söderlund [Fri, 18 Apr 2025 14:58:00 +0000 (16:58 +0200)] 
net: phy: marvell-88q2xxx: Enable temperature sensor for mv88q211x

The temperature sensor enabled for mv88q222x devices also functions for
mv88q211x based devices. Unify the two devices probe functions to enable
the sensors for all devices supported by this driver.

The same oddity as for mv88q222x devices exists, the PHY link must be up
for a correct temperature reading to be reported.

    # cat /sys/class/hwmon/hwmon9/temp1_input
    -75000

    # ifconfig end5 up

    # cat /sys/class/hwmon/hwmon9/temp1_input
    59000

Worth noting is that while the temperature register offsets and layout
are the same between mv88q211x and mv88q222x devices their names in the
datasheets are different. This change keeps the mv88q222x names for the
mv88q211x support.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250418145800.2420751-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoMerge branch 'net-stmmac-fix-timestamp-snapshots-on-dwmac1000'
Paolo Abeni [Thu, 24 Apr 2025 09:50:45 +0000 (11:50 +0200)] 
Merge branch 'net-stmmac-fix-timestamp-snapshots-on-dwmac1000'

Alexis Lothore says:

====================
net: stmmac: fix timestamp snapshots on dwmac1000

this is the v2 of a small series containing two small fixes for the
timestamp snapshot feature on stmmac, especially on dwmac1000 version.
Those issues have been detected on a socfpga (Cyclone V) platform. They
kind of follow the big rework sent by Maxime at the end of last year to
properly split this feature support between different versions of the
DWMAC IP.

v1: https://lore.kernel.org/r/20250422-stmmac_ts-v1-0-b59c9f406041@bootlin.com

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
====================

Link: https://patch.msgid.link/20250423-stmmac_ts-v2-0-e2cf2bbd61b1@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: stmmac: fix multiplication overflow when reading timestamp
Alexis Lothoré [Wed, 23 Apr 2025 07:12:10 +0000 (09:12 +0200)] 
net: stmmac: fix multiplication overflow when reading timestamp

The current way of reading a timestamp snapshot in stmmac can lead to
integer overflow, as the computation is done on 32 bits. The issue has
been observed on a dwmac-socfpga platform returning chaotic timestamp
values due to this overflow. The corresponding multiplication is done
with a MUL instruction, which returns 32 bit values. Explicitly casting
the value to 64 bits replaced the MUL with a UMLAL, which computes and
returns the result on 64 bits, and so returns correctly the timestamps.

Prevent this overflow by explicitly casting the intermediate value to
u64 to make sure that the whole computation is made on u64. While at it,
apply the same cast on the other dwmac variant (GMAC4) method for
snapshot retrieval.

Fixes: 477c3e1f6363 ("net: stmmac: Introduce dwmac1000 timestamping operations")
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250423-stmmac_ts-v2-2-e2cf2bbd61b1@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: stmmac: fix dwmac1000 ptp timestamp status offset
Alexis Lothore [Wed, 23 Apr 2025 07:12:09 +0000 (09:12 +0200)] 
net: stmmac: fix dwmac1000 ptp timestamp status offset

When a PTP interrupt occurs, the driver accesses the wrong offset to
learn about the number of available snapshots in the FIFO for dwmac1000:
it should be accessing bits 29..25, while it is currently reading bits
19..16 (those are bits about the auxiliary triggers which have generated
the timestamps). As a consequence, it does not compute correctly the
number of available snapshots, and so possibly do not generate the
corresponding clock events if the bogus value ends up being 0.

Fix clock events generation by reading the correct bits in the timestamp
register for dwmac1000.

Fixes: 477c3e1f6363 ("net: stmmac: Introduce dwmac1000 timestamping operations")
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250423-stmmac_ts-v2-1-e2cf2bbd61b1@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: dp83822: Fix OF_MDIO config check
Johannes Schneider [Wed, 23 Apr 2025 04:47:24 +0000 (06:47 +0200)] 
net: dp83822: Fix OF_MDIO config check

When CONFIG_OF_MDIO is set to be a module the code block is not
compiled. Use the IS_ENABLED macro that checks for both built in as
well as module.

Fixes: 5dc39fd5ef35 ("net: phy: DP83822: Add ability to advertise Fiber connection")
Signed-off-by: Johannes Schneider <johannes.schneider@leica-geosystems.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250423044724.1284492-1-johannes.schneider@leica-geosystems.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoMerge branch 'ipv6-no-rtnl-for-ipv6-routing-table'
Paolo Abeni [Thu, 24 Apr 2025 07:30:02 +0000 (09:30 +0200)] 
Merge branch 'ipv6-no-rtnl-for-ipv6-routing-table'

Kuniyuki Iwashima says:

====================
ipv6: No RTNL for IPv6 routing table.

IPv6 routing tables are protected by each table's lock and work in
the interrupt context, which means we basically don't need RTNL to
modify an IPv6 routing table itself.

Currently, the control paths require RTNL because we may need to
perform device and nexthop lookups; we must prevent dev/nexthop from
going away from the netns.

This, however, can be achieved by RCU as well.

If we are in the RCU critical section while adding an IPv6 route,
synchronize_net() in __dev_change_net_namespace() and
unregister_netdevice_many_notify() guarantee that the dev will not be
moved to another netns or removed.

Also, nexthop is guaranteed not to be freed during the RCU grace period.

If we care about a race between nexthop removal and IPv6 route addition,
we can get rid of RTNL from the control paths.

Patch 1 moves a validation for RTA_MULTIPATH earlier.
Patch 2 removes RTNL for SIOCDELRT and RTM_DELROUTE.
Patch 3 ~ 11 moves validation and memory allocation earlier.
Patch 12 prevents a race between two requests for the same table.
Patch 13 & 14 prevents the nexthop race mentioned above.
Patch 15 removes RTNL for SIOCADDRT and RTM_NEWROUTE.

Test:

The script [0] lets each CPU-X create 100000 routes on table-X in a
batch.

On c7a.metal-48xl EC2 instance with 192 CPUs,

without this series:

  $ sudo ./route_test.sh
  start adding routes
  added 19200000 routes (100000 routes * 192 tables).
  total routes: 19200006
  Time elapsed: 191577 milliseconds.

with this series:

  $ sudo ./route_test.sh
  start adding routes
  added 19200000 routes (100000 routes * 192 tables).
  total routes: 19200006
  Time elapsed: 62854 milliseconds.

I changed the number of routes (1000 ~ 100000 per CPU/table) and
consistently saw it finish 3x faster with this series.

[0]

mkdir tmp

NS="test"
ip netns add $NS
ip -n $NS link add veth0 type veth peer veth1
ip -n $NS link set veth0 up
ip -n $NS link set veth1 up

TABLES=()
for i in $(seq $(nproc)); do
    TABLES+=("$i")
done

ROUTES=()
for i in {1..100}; do
    for j in {1..1000}; do
ROUTES+=("2001:$i:$j::/64")
    done
done

for TABLE in "${TABLES[@]}"; do
    (
FILE="./tmp/batch-table-$TABLE.txt"
> $FILE
for ROUTE in "${ROUTES[@]}"; do
            echo "route add $ROUTE dev veth0 table $TABLE" >> $FILE
done
    ) &
done

wait

echo "start adding routes"

START_TIME=$(date +%s%3N)
for TABLE in "${TABLES[@]}"; do
    ip -n $NS -6 -batch "./tmp/batch-table-$TABLE.txt" &
done

wait
END_TIME=$(date +%s%3N)
ELAPSED_TIME=$((END_TIME - START_TIME))

echo "added $((${#ROUTES[@]} * ${#TABLES[@]})) routes (${#ROUTES[@]} routes * ${#TABLES[@]} tables)."
echo "total routes: $(ip -n $NS -6 route show table all | wc -l)"  # Just for debug
echo "Time elapsed: ${ELAPSED_TIME} milliseconds."

ip netns del $NS
rm -fr ./tmp/

v2: https://lore.kernel.org/netdev/20250409011243.26195-1-kuniyu@amazon.com/
v1: https://lore.kernel.org/netdev/20250321040131.21057-1-kuniyu@amazon.com/
====================

Link: https://patch.msgid.link/20250418000443.43734-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoipv6: Get rid of RTNL for SIOCADDRT and RTM_NEWROUTE.
Kuniyuki Iwashima [Fri, 18 Apr 2025 00:03:56 +0000 (17:03 -0700)] 
ipv6: Get rid of RTNL for SIOCADDRT and RTM_NEWROUTE.

Now we are ready to remove RTNL from SIOCADDRT and RTM_NEWROUTE.

The remaining things to do are

  1. pass false to lwtunnel_valid_encap_type_attr()
  2. use rcu_dereference_rtnl() in fib6_check_nexthop()
  3. place rcu_read_lock() before ip6_route_info_create_nh().

Let's complete the RTNL-free conversion.

When each CPU-X adds 100000 routes on table-X in a batch
concurrently on c7a.metal-48xl EC2 instance with 192 CPUs,

without this series:

  $ sudo ./route_test.sh
  ...
  added 19200000 routes (100000 routes * 192 tables).
  time elapsed: 191577 milliseconds.

with this series:

  $ sudo ./route_test.sh
  ...
  added 19200000 routes (100000 routes * 192 tables).
  time elapsed: 62854 milliseconds.

I changed the number of routes in each table (1000 ~ 100000)
and consistently saw it finish 3x faster with this series.

Note that now every caller of lwtunnel_valid_encap_type() passes
false as the last argument, and this can be removed later.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250418000443.43734-16-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoipv6: Protect nh->f6i_list with spinlock and flag.
Kuniyuki Iwashima [Fri, 18 Apr 2025 00:03:55 +0000 (17:03 -0700)] 
ipv6: Protect nh->f6i_list with spinlock and flag.

We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT.

Then, we may be going to add a route tied to a dying nexthop.

The nexthop itself is not freed during the RCU grace period, but
if we link a route after __remove_nexthop_fib() is called for the
nexthop, the route will be leaked.

To avoid the race between IPv6 route addition under RCU vs nexthop
deletion under RTNL, let's add a dead flag and protect it and
nh->f6i_list with a spinlock.

__remove_nexthop_fib() acquires the nexthop's spinlock and sets false
to nh->dead, then calls ip6_del_rt() for the linked route one by one
without the spinlock because fib6_purge_rt() acquires it later.

While adding an IPv6 route, fib6_add() acquires the nexthop lock and
checks the dead flag just before inserting the route.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250418000443.43734-15-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoipv6: Defer fib6_purge_rt() in fib6_add_rt2node() to fib6_add().
Kuniyuki Iwashima [Fri, 18 Apr 2025 00:03:54 +0000 (17:03 -0700)] 
ipv6: Defer fib6_purge_rt() in fib6_add_rt2node() to fib6_add().

The next patch adds per-nexthop spinlock which protects nh->f6i_list.

When rt->nh is not NULL, fib6_add_rt2node() will be called under the lock.
fib6_add_rt2node() could call fib6_purge_rt() for another route, which
could holds another nexthop lock.

Then, deadlock could happen between two nexthops.

Let's defer fib6_purge_rt() after fib6_add_rt2node().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/20250418000443.43734-14-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoipv6: Protect fib6_link_table() with spinlock.
Kuniyuki Iwashima [Fri, 18 Apr 2025 00:03:53 +0000 (17:03 -0700)] 
ipv6: Protect fib6_link_table() with spinlock.

We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT.

If the request specifies a new table ID, fib6_new_table() is
called to create a new routing table.

Two concurrent requests could specify the same table ID, so we
need a lock to protect net->ipv6.fib_table_hash[h].

Let's add a spinlock to protect the hash bucket linkage.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/20250418000443.43734-13-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoipv6: Factorise ip6_route_multipath_add().
Kuniyuki Iwashima [Fri, 18 Apr 2025 00:03:52 +0000 (17:03 -0700)] 
ipv6: Factorise ip6_route_multipath_add().

We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT and rely
on RCU to guarantee dev and nexthop lifetime.

Then, the RCU section will start before ip6_route_info_create_nh()
in ip6_route_multipath_add(), but ip6_route_info_create() is called
in the same loop and will sleep.

Let's split the loop into ip6_route_mpath_info_create() and
ip6_route_mpath_info_create_nh().

Note that ip6_route_info_append() is now integrated into
ip6_route_mpath_info_create_nh() because we need to call different
free functions for nexthops that passed ip6_route_info_create_nh().

In case of failure, the remaining nexthops that ip6_route_info_create_nh()
has not been called for will be freed by ip6_route_mpath_info_cleanup().

OTOH, if a nexthop passes ip6_route_info_create_nh(), it will be linked
to a local temporary list, which will be spliced back to rt6_nh_list.
In case of failure, these nexthops will be released by fib6_info_release()
in ip6_route_multipath_add().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250418000443.43734-12-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoipv6: Rename rt6_nh.next to rt6_nh.list.
Kuniyuki Iwashima [Fri, 18 Apr 2025 00:03:51 +0000 (17:03 -0700)] 
ipv6: Rename rt6_nh.next to rt6_nh.list.

ip6_route_multipath_add() allocates struct rt6_nh for each config
of multipath routes to link them to a local list rt6_nh_list.

struct rt6_nh.next is the list node of each config, so the name
is quite misleading.

Let's rename it to list.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/netdev/c9bee472-c94e-4878-8cc2-1512b2c54db5@redhat.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250418000443.43734-11-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoipv6: Don't pass net to ip6_route_info_append().
Kuniyuki Iwashima [Fri, 18 Apr 2025 00:03:50 +0000 (17:03 -0700)] 
ipv6: Don't pass net to ip6_route_info_append().

net is not used in ip6_route_info_append() after commit 36f19d5b4f99
("net/ipv6: Remove extra call to ip6_convert_metrics for multipath case").

Let's remove the argument.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/20250418000443.43734-10-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoipv6: Preallocate nhc_pcpu_rth_output in ip6_route_info_create().
Kuniyuki Iwashima [Fri, 18 Apr 2025 00:03:49 +0000 (17:03 -0700)] 
ipv6: Preallocate nhc_pcpu_rth_output in ip6_route_info_create().

ip6_route_info_create_nh() will be called under RCU.

It calls fib_nh_common_init() and allocates nhc->nhc_pcpu_rth_output.

As with the reason for rt->fib6_nh->rt6i_pcpu, we want to avoid
GFP_ATOMIC allocation for nhc->nhc_pcpu_rth_output under RCU.

Let's preallocate it in ip6_route_info_create().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250418000443.43734-9-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoipv6: Preallocate rt->fib6_nh->rt6i_pcpu in ip6_route_info_create().
Kuniyuki Iwashima [Fri, 18 Apr 2025 00:03:48 +0000 (17:03 -0700)] 
ipv6: Preallocate rt->fib6_nh->rt6i_pcpu in ip6_route_info_create().

ip6_route_info_create_nh() will be called under RCU.

Then, fib6_nh_init() is also under RCU, but per-cpu memory allocation
is very likely to fail with GFP_ATOMIC while bulk-adding IPv6 routes
and we would see a bunch of this message in dmesg.

  percpu: allocation failed, size=8 align=8 atomic=1, atomic alloc failed, no space left
  percpu: allocation failed, size=8 align=8 atomic=1, atomic alloc failed, no space left

Let's preallocate rt->fib6_nh->rt6i_pcpu in ip6_route_info_create().

If something fails before the original memory allocation in
fib6_nh_init(), ip6_route_info_create_nh() calls fib6_info_release(),
which releases the preallocated per-cpu memory.

Note that rt->fib6_nh->rt6i_pcpu is not preallocated when called via
ipv6_stub, so we still need alloc_percpu_gfp() in fib6_nh_init().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250418000443.43734-8-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoipv6: Split ip6_route_info_create().
Kuniyuki Iwashima [Fri, 18 Apr 2025 00:03:47 +0000 (17:03 -0700)] 
ipv6: Split ip6_route_info_create().

We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT and rely
on RCU to guarantee dev and nexthop lifetime.

Then, we want to allocate as much as possible before entering
the RCU section.

The RCU section will start in the middle of ip6_route_info_create(),
and this is problematic for ip6_route_multipath_add() that calls
ip6_route_info_create() multiple times.

Let's split ip6_route_info_create() into two parts; one for memory
allocation and another for nexthop setup.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/20250418000443.43734-7-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoipv6: Move nexthop_find_by_id() after fib6_info_alloc().
Kuniyuki Iwashima [Fri, 18 Apr 2025 00:03:46 +0000 (17:03 -0700)] 
ipv6: Move nexthop_find_by_id() after fib6_info_alloc().

We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT.

Then, we must perform two lookups for nexthop and dev under RCU
to guarantee their lifetime.

ip6_route_info_create() calls nexthop_find_by_id() first if
RTA_NH_ID is specified, and then allocates struct fib6_info.

nexthop_find_by_id() must be called under RCU, but we do not want
to use GFP_ATOMIC for memory allocation here, which will be likely
to fail in ip6_route_multipath_add().

Let's move nexthop_find_by_id() after the memory allocation so
that we can later split ip6_route_info_create() into two parts:
the sleepable part and the RCU part.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/20250418000443.43734-6-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoipv6: Check GATEWAY in rtm_to_fib6_multipath_config().
Kuniyuki Iwashima [Fri, 18 Apr 2025 00:03:45 +0000 (17:03 -0700)] 
ipv6: Check GATEWAY in rtm_to_fib6_multipath_config().

In ip6_route_multipath_add(), we call rt6_qualify_for_ecmp() for each
entry.  If it returns false, the request fails.

rt6_qualify_for_ecmp() returns false if either of the conditions below
is true:

  1. f6i->fib6_flags has RTF_ADDRCONF
  2. f6i->nh is not NULL
  3. f6i->fib6_nh->fib_nh_gw_family is AF_UNSPEC

1 is unnecessary because rtm_to_fib6_config() never sets RTF_ADDRCONF
to cfg->fc_flags.

2. is equivalent with cfg->fc_nh_id.

3. can be replaced by checking RTF_GATEWAY in the base and each multipath
entry because AF_INET6 is set to f6i->fib6_nh->fib_nh_gw_family only when
cfg.fc_is_fdb is true or RTF_GATEWAY is set, but the former is always
false.

These checks do not require RCU and can be done earlier.

Let's perform the equivalent checks in rtm_to_fib6_multipath_config().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250418000443.43734-5-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoipv6: Move some validation from ip6_route_info_create() to rtm_to_fib6_config().
Kuniyuki Iwashima [Fri, 18 Apr 2025 00:03:44 +0000 (17:03 -0700)] 
ipv6: Move some validation from ip6_route_info_create() to rtm_to_fib6_config().

ip6_route_info_create() is called from 3 functions:

  * ip6_route_add()
  * ip6_route_multipath_add()
  * addrconf_f6i_alloc()

addrconf_f6i_alloc() does not need validation for struct fib6_config in
ip6_route_info_create().

ip6_route_multipath_add() calls ip6_route_info_create() for multiple
routes with slightly different fib6_config instances, which is copied
from the base config passed from userspace.  So, we need not validate
the same config repeatedly.

Let's move such validation into rtm_to_fib6_config().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/20250418000443.43734-4-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoipv6: Get rid of RTNL for SIOCDELRT and RTM_DELROUTE.
Kuniyuki Iwashima [Fri, 18 Apr 2025 00:03:43 +0000 (17:03 -0700)] 
ipv6: Get rid of RTNL for SIOCDELRT and RTM_DELROUTE.

Basically, removing an IPv6 route does not require RTNL because
the IPv6 routing tables are protected by per table lock.

inet6_rtm_delroute() calls nexthop_find_by_id() to check if the
nexthop specified by RTA_NH_ID exists.  nexthop uses rbtree and
the top-down walk can be safely performed under RCU.

ip6_route_del() already relies on RCU and the table lock, but we
need to extend the RCU critical section a bit more to cover
__ip6_del_rt().  For example, nexthop_for_each_fib6_nh() and
inet6_rt_notify() needs RCU.

Let's call nexthop_find_by_id() and __ip6_del_rt() under RCU and
get rid of RTNL from inet6_rtm_delroute() and SIOCDELRT.

Even if the nexthop is removed after rcu_read_unlock() in
inet6_rtm_delroute(), __remove_nexthop_fib() cleans up the routes
tied to the nexthop, and ip6_route_del() returns -ESRCH.  So the
request was at least valid as of nexthop_find_by_id(), and it's just
a matter of timing.

Note that we need to pass false to lwtunnel_valid_encap_type_attr().
The following patches also use the newroute bool.

Note also that fib6_get_table() does not require RCU because once
allocated fib6_table is not freed until netns dismantle.  I will
post a follow-up series to convert such callers to RCU-lockless
version.  [0]

Link: https://lore.kernel.org/netdev/20250417174557.65721-1-kuniyu@amazon.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250418000443.43734-3-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoipv6: Validate RTA_GATEWAY of RTA_MULTIPATH in rtm_to_fib6_config().
Kuniyuki Iwashima [Fri, 18 Apr 2025 00:03:42 +0000 (17:03 -0700)] 
ipv6: Validate RTA_GATEWAY of RTA_MULTIPATH in rtm_to_fib6_config().

We will perform RTM_NEWROUTE and RTM_DELROUTE under RCU, and then
we want to perform some validation out of the RCU scope.

When creating / removing an IPv6 route with RTA_MULTIPATH,
inet6_rtm_newroute() / inet6_rtm_delroute() validates RTA_GATEWAY
in each multipath entry.

Let's do that in rtm_to_fib6_config().

Note that now RTM_DELROUTE returns an error for RTA_MULTIPATH with
0 entries, which was accepted but should result in -EINVAL as
RTM_NEWROUTE.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250418000443.43734-2-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoMerge branch 'pds_core-updates-and-fixes'
Jakub Kicinski [Thu, 24 Apr 2025 01:50:21 +0000 (18:50 -0700)] 
Merge branch 'pds_core-updates-and-fixes'

Shannon Nelson says:

====================
pds_core: updates and fixes

This patchset has fixes for issues seen in recent internal testing
of error conditions and stress handling.

Note that the first patch in this series is a leftover from an
earlier patchset that was abandoned:
Link: https://lore.kernel.org/netdev/20250129004337.36898-2-shannon.nelson@amd.com/
====================

Link: https://patch.msgid.link/20250421174606.3892-1-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agopds_core: make wait_context part of q_info
Shannon Nelson [Mon, 21 Apr 2025 17:46:06 +0000 (10:46 -0700)] 
pds_core: make wait_context part of q_info

Make the wait_context a full part of the q_info struct rather
than a stack variable that goes away after pdsc_adminq_post()
is done so that the context is still available after the wait
loop has given up.

There was a case where a slow development firmware caused
the adminq request to time out, but then later the FW finally
finished the request and sent the interrupt.  The handler tried
to complete_all() the completion context that had been created
on the stack in pdsc_adminq_post() but no longer existed.
This caused bad pointer usage, kernel crashes, and much wailing
and gnashing of teeth.

Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250421174606.3892-5-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agopds_core: Remove unnecessary check in pds_client_adminq_cmd()
Brett Creeley [Mon, 21 Apr 2025 17:46:05 +0000 (10:46 -0700)] 
pds_core: Remove unnecessary check in pds_client_adminq_cmd()

When the pds_core driver was first created there were some race
conditions around using the adminq, especially for client drivers.
To reduce the possibility of a race condition there's a check
against pf->state in pds_client_adminq_cmd(). This is problematic
for a couple of reasons:

1. The PDSC_S_INITING_DRIVER bit is set during probe, but not
   cleared until after everything in probe is complete, which
   includes creating the auxiliary devices. For pds_fwctl this
   means it can't make any adminq commands until after pds_core's
   probe is complete even though the adminq is fully up by the
   time pds_fwctl's auxiliary device is created.

2. The race conditions around using the adminq have been fixed
   and this path is already protected against client drivers
   calling pds_client_adminq_cmd() if the adminq isn't ready,
   i.e. see pdsc_adminq_post() -> pdsc_adminq_inc_if_up().

Fix this by removing the pf->state check in pds_client_adminq_cmd()
because invalid accesses to pds_core's adminq is already handled by
pdsc_adminq_post()->pdsc_adminq_inc_if_up().

Fixes: 10659034c622 ("pds_core: add the aux client API")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250421174606.3892-4-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agopds_core: handle unsupported PDS_CORE_CMD_FW_CONTROL result
Brett Creeley [Mon, 21 Apr 2025 17:46:04 +0000 (10:46 -0700)] 
pds_core: handle unsupported PDS_CORE_CMD_FW_CONTROL result

If the FW doesn't support the PDS_CORE_CMD_FW_CONTROL command
the driver might at the least print garbage and at the worst
crash when the user runs the "devlink dev info" devlink command.

This happens because the stack variable fw_list is not 0
initialized which results in fw_list.num_fw_slots being a
garbage value from the stack.  Then the driver tries to access
fw_list.fw_names[i] with i >= ARRAY_SIZE and runs off the end
of the array.

Fix this by initializing the fw_list and by not failing
completely if the devcmd fails because other useful information
is printed via devlink dev info even if the devcmd fails.

Fixes: 45d76f492938 ("pds_core: set up device and adminq")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250421174606.3892-3-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agopds_core: Prevent possible adminq overflow/stuck condition
Brett Creeley [Mon, 21 Apr 2025 17:46:03 +0000 (10:46 -0700)] 
pds_core: Prevent possible adminq overflow/stuck condition

The pds_core's adminq is protected by the adminq_lock, which prevents
more than 1 command to be posted onto it at any one time. This makes it
so the client drivers cannot simultaneously post adminq commands.
However, the completions happen in a different context, which means
multiple adminq commands can be posted sequentially and all waiting
on completion.

On the FW side, the backing adminq request queue is only 16 entries
long and the retry mechanism and/or overflow/stuck prevention is
lacking. This can cause the adminq to get stuck, so commands are no
longer processed and completions are no longer sent by the FW.

As an initial fix, prevent more than 16 outstanding adminq commands so
there's no way to cause the adminq from getting stuck. This works
because the backing adminq request queue will never have more than 16
pending adminq commands, so it will never overflow. This is done by
reducing the adminq depth to 16.

Fixes: 45d76f492938 ("pds_core: set up device and adminq")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250421174606.3892-2-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'net-mlx5-hws-improve-ip-version-handling'
Jakub Kicinski [Thu, 24 Apr 2025 01:48:13 +0000 (18:48 -0700)] 
Merge branch 'net-mlx5-hws-improve-ip-version-handling'

Mark Bloch says:

====================
net/mlx5: HWS, Improve IP version handling

This small series hardens our checks against a single matcher containing
rules that match on IPv4 and IPv6. This scenario is not supported by
hardware steering and the implementation now signals this instead of
failing silently.

Patches:
* Patch 1 forbids a single definer to match on mixed IP versions for
  source and destination address.
* Patch 2 reproduces a couple of firmware checks: it forbids creating
  a definer that matches on IP address without matching on IP version,
  and also disallows matching on IPv6 addresses and the IPv4 IHL fields
  in the same definer.
* Patch 3 forbids mixing rules that match on IPv4 and IPv6 addresses in
  the same matcher. The underlying definer mechanism does not support
  that.
====================

Link: https://patch.msgid.link/20250422092540.182091-1-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet/mlx5: HWS, Disallow matcher IP version mixing
Vlad Dogaru [Tue, 22 Apr 2025 09:25:40 +0000 (12:25 +0300)] 
net/mlx5: HWS, Disallow matcher IP version mixing

Signal clearly to the user, via an error, that mixing IPv4 and IPv6
rules in the same matcher is not supported. Previously such cases
silently failed by adding a rule that did not work correctly.

Rules can specify an IP version by one of two fields: IP version or
ethertype. At matcher creation, store whether the template matches on
any of these two fields. If yes, inspect each rule for its corresponding
match value and store the IP version inside the matcher to guard against
inconsistencies with subsequent rules.

Furthermore, also check rules for internal consistency, i.e. verify that
the ethertype and IP version match values do not contradict each other.

The logic applies to inner and outer headers independently, to account
for tunneling.

Rules that do not match on IP addresses are not affected.

Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250422092540.182091-4-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet/mlx5: HWS, Harden IP version definer checks
Vlad Dogaru [Tue, 22 Apr 2025 09:25:39 +0000 (12:25 +0300)] 
net/mlx5: HWS, Harden IP version definer checks

Replicate some sanity checks that firmware does, since hardware steering
does not go through firmware.

When creating a definer, disallow matching on IP addresses without also
matching on IP version. The latter can be satisfied by matching either
on the version field in the IP header, or on the ethertype field.

Also refuse to match IPv4 IHL alongside IPv6.

Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250422092540.182091-3-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet/mlx5: HWS, Fix IP version decision
Vlad Dogaru [Tue, 22 Apr 2025 09:25:38 +0000 (12:25 +0300)] 
net/mlx5: HWS, Fix IP version decision

Unify the check for IP version when creating a definer. A given matcher
is deemed to match on IPv6 if any of the higher order (>31) bits of
source or destination address mask are set.

A single packet cannot mix IP versions between source and destination
addresses, so it makes no sense that they would be decided on
independently.

Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250422092540.182091-2-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: dsa: mt7530: sync driver-specific behavior of MT7531 variants
Daniel Golle [Tue, 22 Apr 2025 03:10:20 +0000 (04:10 +0100)] 
net: dsa: mt7530: sync driver-specific behavior of MT7531 variants

MT7531 standalone and MMIO variants found in MT7988 and EN7581 share
most basic properties. Despite that, assisted_learning_on_cpu_port and
mtu_enforcement_ingress were only applied for MT7531 but not for MT7988
or EN7581, causing the expected issues on MMIO devices.

Apply both settings equally also for MT7988 and EN7581 by moving both
assignments form mt7531_setup() to mt7531_setup_common().

This fixes unwanted flooding of packets due to unknown unicast
during DA lookup, as well as issues with heterogenous MTU settings.

Fixes: 7f54cc9772ce ("net: dsa: mt7530: split-off common parts from mt7531_setup")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Chester A. Unal <chester.a.unal@arinc9.com>
Link: https://patch.msgid.link/89ed7ec6d4fa0395ac53ad2809742bb1ce61ed12.1745290867.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoocteontx2-pf: AF_XDP: code clean up
Hariprasad Kelam [Sun, 20 Apr 2025 03:23:50 +0000 (08:53 +0530)] 
octeontx2-pf: AF_XDP: code clean up

The current API, otx2_xdp_sq_append_pkt, verifies the number of available
descriptors before sending packets to the hardware.

However, for AF_XDP, this check is unnecessary because the batch value
is already determined based on the free descriptors.

This patch introduces a new API, "otx2_xsk_sq_append_pkt" to address this.

Remove the logic for releasing the TX buffers, as it is implicitly handled
by xsk_tx_peek_release_desc_batch

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20250420032350.4047706-1-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Jakub Kicinski [Thu, 24 Apr 2025 01:31:55 +0000 (18:31 -0700)] 
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
igc: Add support for Frame Preemption

Faizal Rahim says:

Introduce support for the FPE feature in the IGC driver.

The patches aligns with the upstream FPE API:
https://patchwork.kernel.org/project/netdevbpf/cover/20230220122343.1156614-1-vladimir.oltean@nxp.com/
https://patchwork.kernel.org/project/netdevbpf/cover/20230119122705.73054-1-vladimir.oltean@nxp.com/

It builds upon earlier work:
https://patchwork.kernel.org/project/netdevbpf/cover/20220520011538.1098888-1-vinicius.gomes@intel.com/

The patch series adds the following functionalities to the IGC driver:
a) Configure FPE using `ethtool --set-mm`.
b) Display FPE settings via `ethtool --show-mm`.
c) View FPE statistics using `ethtool --include-statistics --show-mm'.
e) Block setting preemptible tc in taprio since it is not supported yet.
   Existing code already blocks it in mqprio.

Tested:
Enabled CONFIG_PROVE_LOCKING, CONFIG_DEBUG_ATOMIC_SLEEP, CONFIG_DMA_API_DEBUG, and CONFIG_KASAN
1) selftests
2) netdev down/up cycles
3) suspend/resume cycles
4) fpe verification

No bugs or unusual dmesg logs were observed.
Ran 1), 2) and 3) with and without the patch series, compared dmesg and selftest logs - no differences found.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  igc: add support to get frame preemption statistics via ethtool
  igc: add support to get MAC Merge data via ethtool
  igc: block setting preemptible traffic class in taprio
  igc: add support to set tx-min-frag-size
  igc: add support for frame preemption verification
  igc: set the RX packet buffer size for TSN mode
  igc: use FIELD_PREP and GENMASK for existing RX packet buffer size
  igc: optimize TX packet buffer utilization for TSN mode
  igc: use FIELD_PREP and GENMASK for existing TX packet buffer size
  igc: rename I225_RXPBSIZE_DEFAULT and I225_TXPBSIZE_DEFAULT
  igc: rename xdp_get_tx_ring() for non-xdp usage
  net: ethtool: mm: reset verification status when link is down
  net: ethtool: mm: extract stmmac verification logic into common library
  net: stmmac: move frag_size handling out of spin_lock
====================

Link: https://patch.msgid.link/20250418163822.3519810-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'net_sched-fix-uaf-vulnerability-in-hfsc-qdisc'
Jakub Kicinski [Thu, 24 Apr 2025 00:16:52 +0000 (17:16 -0700)] 
Merge branch 'net_sched-fix-uaf-vulnerability-in-hfsc-qdisc'

Cong Wang says:

====================
net_sched: Fix UAF vulnerability in HFSC qdisc

This patchset contains two bug fixes and a selftest for the first one
which we have a reliable reproducer, please check each patch
description for details.
====================

Link: https://patch.msgid.link/20250417184732.943057-1-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoselftests/tc-testing: Add test for HFSC queue emptying during peek operation
Cong Wang [Thu, 17 Apr 2025 18:47:32 +0000 (11:47 -0700)] 
selftests/tc-testing: Add test for HFSC queue emptying during peek operation

Add a selftest to exercise the condition where qdisc implementations
like netem or codel might empty the queue during a peek operation.
This tests the defensive code path in HFSC that checks the queue length
again after peeking to handle this case.

Based on the reproducer from Gerrard, improved by Jamal.

Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250417184732.943057-4-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet_sched: hfsc: Fix a potential UAF in hfsc_dequeue() too
Cong Wang [Thu, 17 Apr 2025 18:47:31 +0000 (11:47 -0700)] 
net_sched: hfsc: Fix a potential UAF in hfsc_dequeue() too

Similarly to the previous patch, we need to safe guard hfsc_dequeue()
too. But for this one, we don't have a reliable reproducer.

Fixes: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 ("Linux-2.6.12-rc2")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250417184732.943057-3-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet_sched: hfsc: Fix a UAF vulnerability in class handling
Cong Wang [Thu, 17 Apr 2025 18:47:30 +0000 (11:47 -0700)] 
net_sched: hfsc: Fix a UAF vulnerability in class handling

This patch fixes a Use-After-Free vulnerability in the HFSC qdisc class
handling. The issue occurs due to a time-of-check/time-of-use condition
in hfsc_change_class() when working with certain child qdiscs like netem
or codel.

The vulnerability works as follows:
1. hfsc_change_class() checks if a class has packets (q.qlen != 0)
2. It then calls qdisc_peek_len(), which for certain qdiscs (e.g.,
   codel, netem) might drop packets and empty the queue
3. The code continues assuming the queue is still non-empty, adding
   the class to vttree
4. This breaks HFSC scheduler assumptions that only non-empty classes
   are in vttree
5. Later, when the class is destroyed, this can lead to a Use-After-Free

The fix adds a second queue length check after qdisc_peek_len() to verify
the queue wasn't emptied.

Fixes: 21f4d5cc25ec ("net_sched/hfsc: fix curve activation in hfsc_change_class()")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Reviewed-by: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250417184732.943057-2-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'enable-multiple-irq-lines-support-in-airoha_eth-driver'
Jakub Kicinski [Thu, 24 Apr 2025 00:03:54 +0000 (17:03 -0700)] 
Merge branch 'enable-multiple-irq-lines-support-in-airoha_eth-driver'

Lorenzo Bianconi says:

====================
Enable multiple IRQ lines support in airoha_eth driver

EN7581 ethernet SoC supports 4 programmable IRQ lines each one composed
by 4 IRQ configuration registers to map Tx/Rx queues. Enable multiple
IRQ lines support.
====================

Link: https://patch.msgid.link/20250418-airoha-eth-multi-irq-v1-0-1ab0083ca3c1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: airoha: Enable multiple IRQ lines support in airoha_eth driver.
Lorenzo Bianconi [Fri, 18 Apr 2025 10:40:50 +0000 (12:40 +0200)] 
net: airoha: Enable multiple IRQ lines support in airoha_eth driver.

EN7581 ethernet SoC supports 4 programmable IRQ lines for Tx and Rx
interrupts. Enable multiple IRQ lines support. Map Rx/Tx queues to the
available IRQ lines using the default scheme used in the vendor SDK:

- IRQ0: rx queues [0-4],[7-9],15
- IRQ1: rx queues [21-30]
- IRQ2: rx queues 5
- IRQ3: rx queues 6

Tx queues interrupts are managed by IRQ0.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250418-airoha-eth-multi-irq-v1-2-1ab0083ca3c1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: airoha: Introduce airoha_irq_bank struct
Lorenzo Bianconi [Fri, 18 Apr 2025 10:40:49 +0000 (12:40 +0200)] 
net: airoha: Introduce airoha_irq_bank struct

EN7581 ethernet SoC supports 4 programmable IRQ lines each one composed
by 4 IRQ configuration registers. Add airoha_irq_bank struct as a
container for independent IRQ lines info (e.g. IRQ number, enabled source
interrupts, ecc). This is a preliminary patch to support multiple IRQ lines
in airoha_eth driver.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250418-airoha-eth-multi-irq-v1-1-1ab0083ca3c1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'r8169-merge-chip-versions'
Jakub Kicinski [Wed, 23 Apr 2025 23:58:05 +0000 (16:58 -0700)] 
Merge branch 'r8169-merge-chip-versions'

Heiner Kallweit says:

====================
r8169: merge chip versions

After 2b065c098c37 ("r8169: refactor chip version detection") we can
merge handling of few chip versions.
====================

Link: https://patch.msgid.link/5e1e14ea-d60f-4608-88eb-3104b6bbace8@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agor8169: merge chip versions 52 and 53 (RTL8117)
Heiner Kallweit [Fri, 18 Apr 2025 09:25:17 +0000 (11:25 +0200)] 
r8169: merge chip versions 52 and 53 (RTL8117)

Handling of both chip versions is the same, only difference is
the firmware. So we can merge handling of both chip versions.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/ae866b71-c904-434e-befb-848c831e33ff@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agor8169: merge chip versions 64 and 65 (RTL8125D)
Heiner Kallweit [Fri, 18 Apr 2025 09:24:30 +0000 (11:24 +0200)] 
r8169: merge chip versions 64 and 65 (RTL8125D)

Handling of both chip versions is the same, only difference is
the firmware. So we can merge handling of both chip versions.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/0baad123-c679-4154-923f-fdc12783e900@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agor8169: merge chip versions 70 and 71 (RTL8126A)
Heiner Kallweit [Fri, 18 Apr 2025 09:23:45 +0000 (11:23 +0200)] 
r8169: merge chip versions 70 and 71 (RTL8126A)

Handling of both chip versions is the same, only difference is
the firmware. So we can merge handling of both chip versions.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/97d7ae79-d021-4b6b-b424-89e5e305b029@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: phy: remove function stubs
Heiner Kallweit [Fri, 18 Apr 2025 09:04:01 +0000 (11:04 +0200)] 
net: phy: remove function stubs

All callers of these functions depend on PHYLIB or select it directly
or indirectly by selecting PHYLINK. Stubs make sense for optional
functionality, but that's not the case here.

MDIO_XGENE usually is selected by NET_XGENE which also selects PHYLIB.
Add a dependency to PHYLIB nevertheless, in order not to break
randconfig builds.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/f7a69a1f-60e9-4ac0-8b7c-481e0cc850e7@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'mptcp-pm-defer-freeing-userspace-pm-entries'
Jakub Kicinski [Wed, 23 Apr 2025 23:28:06 +0000 (16:28 -0700)] 
Merge branch 'mptcp-pm-defer-freeing-userspace-pm-entries'

Matthieu Baerts says:

====================
mptcp: pm: Defer freeing userspace pm entries

Here are two unrelated fixes for MPTCP:

- Patch 1: free userspace PM entry with RCU helpers. A fix for v6.14.

- Patch 2: avoid a warning when running diag.sh selftest. A fix for
  v6.15-rc1.
====================

Link: https://patch.msgid.link/20250421-net-mptcp-pm-defer-freeing-v1-0-e731dc6e86b9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoselftests: mptcp: diag: use mptcp_lib_get_info_value
Geliang Tang [Mon, 21 Apr 2025 17:07:14 +0000 (19:07 +0200)] 
selftests: mptcp: diag: use mptcp_lib_get_info_value

When running diag.sh in a loop, chk_dump_one will report the following
"grep: write error":

 13 ....chk 2 cestab                                  [ OK ]
 grep: write error
 14 ....chk dump_one                                  [ OK ]
 15 ....chk 2->0 msk in use after flush               [ OK ]
 16 ....chk 2->0 cestab after flush                   [ OK ]

This error is caused by a broken pipe. When the output of 'ss' is processed
by grep, 'head -n 1' will exit immediately after getting the first line,
causing the subsequent pipe to close. At this time, if 'grep' is still
trying to write data to the closed pipe, it will trigger a SIGPIPE signal,
causing a write error.

One solution is not to use this problematic "head -n 1" command, but to use
mptcp_lib_get_info_value() helper defined in mptcp_lib.sh to get the value
of 'token'.

Fixes: ba2400166570 ("selftests: mptcp: add a test for mptcp_diag_dump_one")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Tested-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250421-net-mptcp-pm-defer-freeing-v1-2-e731dc6e86b9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agomptcp: pm: Defer freeing of MPTCP userspace path manager entries
Mat Martineau [Mon, 21 Apr 2025 17:07:13 +0000 (19:07 +0200)] 
mptcp: pm: Defer freeing of MPTCP userspace path manager entries

When path manager entries are deleted from the local address list, they
are first unlinked from the address list using list_del_rcu(). The
entries must not be freed until after the RCU grace period, but the
existing code immediately frees the entry.

Use kfree_rcu_mightsleep() and adjust sk_omem_alloc in open code instead
of using the sock_kfree_s() helper. This code path is only called in a
netlink handler, so the "might sleep" function is preferable to adding
a rarely-used rcu_head member to struct mptcp_pm_addr_entry.

Fixes: 88d097316371 ("mptcp: drop free_list for deleting entries")
Cc: stable@vger.kernel.org
Signed-off-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250421-net-mptcp-pm-defer-freeing-v1-1-e731dc6e86b9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: Fix wild-memory-access in __register_pernet_operations() when CONFIG_NET_NS=n.
Kuniyuki Iwashima [Fri, 18 Apr 2025 21:50:20 +0000 (14:50 -0700)] 
net: Fix wild-memory-access in __register_pernet_operations() when CONFIG_NET_NS=n.

kernel test robot reported the splat below. [0]

Before commit fed176bf3143 ("net: Add ops_undo_single for module
load/unload."), if CONFIG_NET_NS=n, ops was linked to pernet_list
only when init_net had not been initialised, and ops was unlinked
from pernet_list only under the same condition.

Let's say an ops is loaded before the init_net setup but unloaded
after that.  Then, the ops remains in pernet_list, which seems odd.

The cited commit added ops_undo_single(), which calls list_add() for
ops to link it to a temporary list, so a minor change was added to
__register_pernet_operations() and __unregister_pernet_operations()
under CONFIG_NET_NS=n to avoid the pernet_list corruption.

However, the corruption must have been left as is.

When CONFIG_NET_NS=n, pernet_list was used to keep ops registered
before the init_net setup, and after that, pernet_list was not used
at all.

This was because some ops annotated with __net_initdata are cleared
out of memory at some point during boot.

Then, such ops is initialised by POISON_FREE_INITMEM (0xcc), resulting
in that ops->list.{next,prev} suddenly switches from a valid pointer
to a weird value, 0xcccccccccccccccc.

To avoid such wild memory access, let's allow the pernet_list
corruption for CONFIG_NET_NS=n.

[0]:
Oops: general protection fault, probably for non-canonical address 0xf999959999999999: 0000 [#1] SMP KASAN NOPTI
KASAN: maybe wild-memory-access in range [0xccccccccccccccc8-0xcccccccccccccccf]
CPU: 2 UID: 0 PID: 346 Comm: modprobe Not tainted 6.15.0-rc1-00294-ga4cba7e98e35 #85 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:__list_add_valid_or_report (lib/list_debug.c:32)
Code: 48 c1 ea 03 80 3c 02 00 0f 85 5a 01 00 00 49 39 74 24 08 0f 85 83 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 1f 01 00 00 4c 39 26 0f 85 ab 00 00 00 4c 39 ee
RSP: 0018:ff11000135b87830 EFLAGS: 00010a07
RAX: dffffc0000000000 RBX: ffffffffc02223c0 RCX: ffffffff8406fcc2
RDX: 1999999999999999 RSI: cccccccccccccccc RDI: ffffffffc02223c0
RBP: ffffffff86064e40 R08: 0000000000000001 R09: fffffbfff0a9f5b5
R10: ffffffff854fadaf R11: 676552203a54454e R12: ffffffff86064e40
R13: ffffffffc02223c0 R14: ffffffff86064e48 R15: 0000000000000021
FS:  00007f6fb0d9e1c0(0000) GS:ff11000858ea0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6fb0eda580 CR3: 0000000122fec005 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 register_pernet_operations (./include/linux/list.h:150 (discriminator 5) ./include/linux/list.h:183 (discriminator 5) net/core/net_namespace.c:1315 (discriminator 5) net/core/net_namespace.c:1359 (discriminator 5))
 register_pernet_subsys (net/core/net_namespace.c:1401)
 inet6_init (net/ipv6/af_inet6.c:535) ipv6
 do_one_initcall (init/main.c:1257)
 do_init_module (kernel/module/main.c:2942)
 load_module (kernel/module/main.c:3409)
 init_module_from_file (kernel/module/main.c:3599)
 idempotent_init_module (kernel/module/main.c:3611)
 __x64_sys_finit_module (./include/linux/file.h:62 ./include/linux/file.h:83 kernel/module/main.c:3634 kernel/module/main.c:3621 kernel/module/main.c:3621)
 do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
 RIP: 0033:0x7f6fb0df7e5d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48
RSP: 002b:00007fffdc6a8968 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055b535721b70 RCX: 00007f6fb0df7e5d
RDX: 0000000000000000 RSI: 000055b51e44aa2a RDI: 0000000000000004
RBP: 0000000000040000 R08: 0000000000000000 R09: 000055b535721b30
R10: 0000000000000004 R11: 0000000000000246 R12: 000055b51e44aa2a
R13: 000055b535721bf0 R14: 000055b5357220b0 R15: 0000000000000000
 </TASK>
Modules linked in: ipv6(+) crc_ccitt

Fixes: fed176bf3143 ("net: Add ops_undo_single for module load/unload.")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202504181511.1c3f23e4-lkp@intel.com
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250418215025.87871-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'netlink-specs-rtnetlink-adjust-specs-for-c-codegen'
Jakub Kicinski [Wed, 23 Apr 2025 23:07:22 +0000 (16:07 -0700)] 
Merge branch 'netlink-specs-rtnetlink-adjust-specs-for-c-codegen'

Jakub Kicinski says:

====================
netlink: specs: rtnetlink: adjust specs for C codegen

The first patch brings a schema extension allowing specifying
"header" (as in .h file) properties in attribute sets.
This is used for rare cases where we carry attributes from
another family in a nest - we need to include the extra
headers. If we were to generate kernel code we'd also
need to skip it in the uAPI output.

The remaining 11 patches are pretty boring schema adjustments.
====================

Link: https://patch.msgid.link/20250418021706.1967583-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonetlink: specs: rt-rule: add C naming info
Jakub Kicinski [Fri, 18 Apr 2025 02:17:06 +0000 (19:17 -0700)] 
netlink: specs: rt-rule: add C naming info

Add properties needed for C codegen to match names with uAPI headers.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250418021706.1967583-13-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonetlink: specs: rtnetlink: correct notify properties
Jakub Kicinski [Fri, 18 Apr 2025 02:17:05 +0000 (19:17 -0700)] 
netlink: specs: rtnetlink: correct notify properties

The notify property should point at the object the notifications
carry, usually the get object, not the cmd which triggers
the notification:

  notify:
    description: Name of the command sharing the reply type with
                 this notification.

Not treating this as a fix, I think that only C codegen cares.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250418021706.1967583-12-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonetlink: specs: rt-neigh: make sure getneigh is consistent
Jakub Kicinski [Fri, 18 Apr 2025 02:17:04 +0000 (19:17 -0700)] 
netlink: specs: rt-neigh: make sure getneigh is consistent

The consistency check complains replies to do and dump don't match
because dump has no value. It doesn't have to by the schema... but
fixing this in code gen would be more code than adjusting the spec.
This is rare.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250418021706.1967583-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonetlink: specs: rt-neigh: add C naming info
Jakub Kicinski [Fri, 18 Apr 2025 02:17:03 +0000 (19:17 -0700)] 
netlink: specs: rt-neigh: add C naming info

Add properties needed for C codegen to match names with uAPI headers.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250418021706.1967583-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonetlink: specs: rt-link: add notification for newlink
Jakub Kicinski [Fri, 18 Apr 2025 02:17:02 +0000 (19:17 -0700)] 
netlink: specs: rt-link: add notification for newlink

Add a notification entry for netlink so that we can test ntf handling
in classic netlink and C.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250418021706.1967583-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonetlink: specs: rt-link: make bond's ipv6 address attribute fixed size
Jakub Kicinski [Fri, 18 Apr 2025 02:17:01 +0000 (19:17 -0700)] 
netlink: specs: rt-link: make bond's ipv6 address attribute fixed size

ns-ip6-target is an indexed-array. Codegen for variable size binary
array would be a bit tedious, tell C that we know the size of these
attributes, since they are IPv6 addrs.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250418021706.1967583-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonetlink: specs: rt-link: adjust AF_ nest for C codegen
Jakub Kicinski [Fri, 18 Apr 2025 02:17:00 +0000 (19:17 -0700)] 
netlink: specs: rt-link: adjust AF_ nest for C codegen

The AF nest is indexed by AF ID, so it's a bit strange,
but with minor adjustments C codegen deals with it just fine.
Entirely unclear why the names have been in quotes here.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250418021706.1967583-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonetlink: specs: rt-link: add C naming info
Jakub Kicinski [Fri, 18 Apr 2025 02:16:59 +0000 (19:16 -0700)] 
netlink: specs: rt-link: add C naming info

Add properties needed for C codegen to match names with uAPI headers.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250418021706.1967583-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonetlink: specs: rt-link: remove duplicated group in attr list
Jakub Kicinski [Fri, 18 Apr 2025 02:16:58 +0000 (19:16 -0700)] 
netlink: specs: rt-link: remove duplicated group in attr list

group is listed twice for newlink.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250418021706.1967583-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonetlink: specs: rt-link: remove if-netnsid from attr list
Jakub Kicinski [Fri, 18 Apr 2025 02:16:57 +0000 (19:16 -0700)] 
netlink: specs: rt-link: remove if-netnsid from attr list

if-netnsid an alias to target-netnsid:

  IFLA_TARGET_NETNSID = IFLA_IF_NETNSID, /* new alias */

We don't have a definition for this attr.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250418021706.1967583-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>