git.ipfire.org Git - thirdparty/kernel/stable.git/log
6 weeks ago netlink: specs: fou: change local-v6/peer-v6 check
Asbjørn Sloth Tønnesen [Tue, 2 Sep 2025 15:46:35 +0000 (15:46 +0000)] 
netlink: specs: fou: change local-v6/peer-v6 check

While updating the binary min-len implementation, I noticed that
the only user should, AFAICT, be using exact-len instead.

In net/ipv4/fou_core.c, FOU_ATTR_LOCAL_V6 and FOU_ATTR_PEER_V6
are only used for single IPv6 addresses, and there are AFAICT
no known implementations trying to send more. It therefore
appears safe to change it to an exact-len policy.

This patch therefore changes the local-v6/peer-v6 attributes to
use an exact-len check, instead of a min-len check.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250902154640.759815-2-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
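
The difference between the min-len and exact-len checks in the fou change above can be sketched in a few lines of Python. This is only an illustration, not the kernel's netlink policy code: the helper names check_min_len() and check_exact_len() are hypothetical, and the 16-byte length is the size of a single IPv6 address.

```python
import ipaddress

IPV6_ADDR_LEN = 16  # bytes in one IPv6 address


def check_min_len(payload: bytes, min_len: int = IPV6_ADDR_LEN) -> bool:
    """min-len style check (hypothetical helper): accept at least min_len bytes."""
    return len(payload) >= min_len


def check_exact_len(payload: bytes, exact_len: int = IPV6_ADDR_LEN) -> bool:
    """exact-len style check (hypothetical helper): accept exactly exact_len bytes."""
    return len(payload) == exact_len


# A single packed IPv6 address, and the same address with trailing bytes.
addr = ipaddress.IPv6Address("2001:db8::1").packed
oversized = addr + b"\x00" * 4

if __name__ == "__main__":
    for name, payload in (("addr", addr), ("oversized", oversized)):
        print(name, len(payload), check_min_len(payload), check_exact_len(payload))
```

Both checks accept a well-formed address; only the exact-len check rejects the oversized payload.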
6 weeks ago Merge branch 'mptcp-misc-features-for-v6-18'
Jakub Kicinski [Wed, 3 Sep 2025 22:08:22 +0000 (15:08 -0700)] 
Merge branch 'mptcp-misc-features-for-v6-18'

Matthieu Baerts says:

====================
mptcp: misc. features for v6.18

This series contains 4 independent new features:

- Patch 1: use HMAC-SHA256 library instead of open-coded HMAC.

- Patch 2: selftests: check for unexpected fallback counter increments.

- Patches 3-4: record subflows in RPS table, for aRFS support.

v1: https://lore.kernel.org/20250901-net-next-mptcp-misc-feat-6-18-v1-0-80ae80d2b903@kernel.org
====================

Link: https://patch.msgid.link/20250902-net-next-mptcp-misc-feat-6-18-v2-0-fa02bb3188b1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago mptcp: record subflows in RPS table
Christoph Paasch [Tue, 2 Sep 2025 21:11:36 +0000 (23:11 +0200)] 
mptcp: record subflows in RPS table

Accelerated Receive Flow Steering (aRFS) relies on sockets recording
their RX flow hash into the rps_sock_flow_table so that incoming packets
are steered to the CPU where the application runs.

With MPTCP, the application interacts with the parent MPTCP socket while
data is carried over per-subflow TCP sockets. Without recording these
subflows, aRFS cannot steer interrupts and RX processing for the flows
to the desired CPU.

Record all subflows in the RPS table by calling sock_rps_record_flow()
for each subflow at the start of mptcp_sendmsg(), mptcp_recvmsg() and
mptcp_stream_accept(), by using the new helper
mptcp_rps_record_subflows().

It does not by itself improve throughput, but ensures that IRQ and RX
processing are directed to the right CPU, which is a
prerequisite for effective aRFS.

Signed-off-by: Christoph Paasch <cpaasch@openai.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250902-net-next-mptcp-misc-feat-6-18-v2-4-fa02bb3188b1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago net: Add rfs_needed() helper
Christoph Paasch [Tue, 2 Sep 2025 21:11:35 +0000 (23:11 +0200)] 
net: Add rfs_needed() helper

Add a helper to check whether RFS is needed. This makes the code a
bit cleaner and lets the next patch have MPTCP use this helper to decide
whether or not to iterate over the subflows.

tun_flow_update() was calling sock_rps_record_flow_hash() regardless of
the state of rfs_needed. This was not really a bug as sock_flow_table
simply ends up being NULL and thus everything will be fine.
This commit here thus also implicitly makes tun_flow_update() respect
the state of rfs_needed.

Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Christoph Paasch <cpaasch@openai.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250902-net-next-mptcp-misc-feat-6-18-v2-3-fa02bb3188b1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago selftests: mptcp: add checks for fallback counters
Gang Yan [Tue, 2 Sep 2025 21:11:34 +0000 (23:11 +0200)] 
selftests: mptcp: add checks for fallback counters

Some MIB counters for fallback were added recently. This patch
provides a way to check the expected behavior of these MIB counters
during test execution.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/571
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250902-net-next-mptcp-misc-feat-6-18-v2-2-fa02bb3188b1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago mptcp: use HMAC-SHA256 library instead of open-coded HMAC
Eric Biggers [Tue, 2 Sep 2025 21:11:33 +0000 (23:11 +0200)] 
mptcp: use HMAC-SHA256 library instead of open-coded HMAC

Now that there are easy-to-use HMAC-SHA256 library functions, use these
in net/mptcp/crypto.c instead of open-coding the HMAC algorithm.

Remove the WARN_ON_ONCE() for messages longer than SHA256_DIGEST_SIZE.
The new implementation handles all message lengths correctly.

The mptcp-crypto KUnit test still passes after this change.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250902-net-next-mptcp-misc-feat-6-18-v2-1-fa02bb3188b1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
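
To illustrate what "open-coded HMAC" means in the commit above, here is a minimal Python sketch that writes out the HMAC-SHA256 construction by hand and compares it with a library call. It uses Python's standard hashlib and hmac modules, not the kernel's crypto library, and the function names are made up for this example.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 block size in bytes


def open_coded_hmac_sha256(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA256 spelled out by hand, following the standard construction."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()   # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")     # then zero-padded to a full block
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + msg).digest()
    return hashlib.sha256(opad + inner).digest()


def library_hmac_sha256(key: bytes, msg: bytes) -> bytes:
    """Same result via the library, with no message-length restriction."""
    return hmac.new(key, msg, hashlib.sha256).digest()


if __name__ == "__main__":
    key = bytes(range(16))
    for length in (0, 8, 32, 33, 200):
        msg = b"a" * length
        same = hmac.compare_digest(open_coded_hmac_sha256(key, msg),
                                   library_hmac_sha256(key, msg))
        print(length, same)
```

The library call replaces the padding and double-hash steps, which is the kind of simplification the commit describes.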
6 weeks ago Merge tag 'mlx5-psp-ifc' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox...
Jakub Kicinski [Wed, 3 Sep 2025 21:59:40 +0000 (14:59 -0700)] 
Merge tag 'mlx5-psp-ifc' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:

====================
mlx5 PSP IFC bits

This PR has a single patch to add mlx5_ifc PSP related capabilities structures
and HW definitions needed for PSP support in mlx5.

Link: https://lore.kernel.org/20250828162953.2707727-1-daniel.zahka@gmail.com
* tag 'mlx5-psp-ifc' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Add PSP capabilities structures and bits
====================

Link: https://patch.msgid.link/20250903063050.668442-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks ago net/mlx5: Add PSP capabilities structures and bits
Saeed Mahameed [Wed, 3 Sep 2025 05:45:24 +0000 (22:45 -0700)] 
net/mlx5: Add PSP capabilities structures and bits

Add mlx5_ifc PSP related capabilities structures and HW definitions
needed for PSP support in mlx5.

Link: https://lore.kernel.org/netdev/20250828162953.2707727-1-daniel.zahka@gmail.com/
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
7 weeks ago Merge branch 'net-dsa-lantiq_gswip-prepare-for-supporting-maxlinear-gsw1xx'
Jakub Kicinski [Wed, 3 Sep 2025 00:45:44 +0000 (17:45 -0700)] 
Merge branch 'net-dsa-lantiq_gswip-prepare-for-supporting-maxlinear-gsw1xx'

Daniel Golle says:

====================
net: dsa: lantiq_gswip: prepare for supporting MaxLinear GSW1xx

Continue to prepare for supporting the newer standalone MaxLinear GSW1xx
switch family by extending the existing lantiq_gswip driver to allow it
to support MII interfaces and MDIO bus of the GSW1xx.

This series has been preceded by an RFC series which covers everything
needed to support the MaxLinear GSW1xx family of switches. Andrew Lunn
had suggested to split it into a couple of smaller series and start
with the changes which don't yet make actual functional changes or
support new features.

Everything has been compile and runtime tested on AVM Fritz!Box 7490
(GSWIP version 2.1, VR9 v1.2)

Link: https://lore.kernel.org/netdev/aKDhFCNwjDDwRKsI@pidgin.makrotopia.org/
====================

Link: https://patch.msgid.link/cover.1756520811.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks ago net: dsa: lantiq_gswip: move MDIO bus registration to .setup()
Daniel Golle [Sat, 30 Aug 2025 02:34:48 +0000 (03:34 +0100)] 
net: dsa: lantiq_gswip: move MDIO bus registration to .setup()

Instead of registering the switch MDIO bus in the probe() function, move
the call to gswip_mdio() into the .setup() DSA switch op, so it can be
reused independently of the probe() function.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Hauke Mehrtens <hauke@hauke-m.de>
Link: https://patch.msgid.link/2650602042c0bfdc5664b88d59071ed4dca96c26.1756520811.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks ago net: dsa: lantiq_gswip: support standard MDIO node name
Daniel Golle [Sat, 30 Aug 2025 02:34:23 +0000 (03:34 +0100)] 
net: dsa: lantiq_gswip: support standard MDIO node name

Instead of only matching against the child node's compatible string, also
support locating the device tree node of the MDIO bus in the standard
way, by its node name ("mdio").

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Reviewed-by: Hauke Mehrtens <hauke@hauke-m.de>
Link: https://patch.msgid.link/5a9a3d659ef0d8b7eca37fb69ec87ff5a3192820.1756520811.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks ago net: dsa: lantiq_gswip: support offset of MII registers
Daniel Golle [Sat, 30 Aug 2025 02:34:13 +0000 (03:34 +0100)] 
net: dsa: lantiq_gswip: support offset of MII registers

The MaxLinear GSW1xx family got a single (R)(G)MII port at index 5 but
the registers MII_PCDU and MII_CFG are those of port 0.
Allow applying an offset for the port index to access those registers.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Hauke Mehrtens <hauke@hauke-m.de>
Link: https://patch.msgid.link/88145164c1f948e4ae9b04706f408359cf54223c.1756520811.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks ago net: dsa: lantiq_gswip: ignore SerDes modes in phylink_mac_config()
Daniel Golle [Sat, 30 Aug 2025 02:33:48 +0000 (03:33 +0100)] 
net: dsa: lantiq_gswip: ignore SerDes modes in phylink_mac_config()

We can safely ignore the SerDes interface modes 1000Base-X, 2500Base-X and
SGMII in phylink_mac_config() as they are taken care of by the PCS.
The SGMII port doesn't have MII_CFG and MII_PCDU registers anyway, so
gswip_phylink_mac_config() is already a no-op apart from outputting a
misleading error message.

Return early in case of SerDes interface modes to avoid printing that
error message.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Hauke Mehrtens <hauke@hauke-m.de>
Link: https://patch.msgid.link/dcb066d6a02e6340314b5ff4f73937757a4f8eb3.1756520811.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks ago net: dsa: lantiq_gswip: support model-specific mac_select_pcs()
Daniel Golle [Sat, 30 Aug 2025 02:33:00 +0000 (03:33 +0100)] 
net: dsa: lantiq_gswip: support model-specific mac_select_pcs()

Call mac_select_pcs() function if provided in struct gswip_hwinfo.
The MaxLinear GSW1xx series got one port wired to a SerDes PCS and
PHY which can do 1000Base-X, 2500Base-X and SGMII. Support for the
SerDes port will be provided using phylink_pcs, so provide a
convenient way for mac_select_pcs() to differ based on the hardware
model.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Hauke Mehrtens <hauke@hauke-m.de>
Link: https://patch.msgid.link/7668666aa51e43e7f2a6cbcf36eb5a0a3020998f.1756520811.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks ago net: dsa: lantiq_gswip: move to dedicated folder
Daniel Golle [Sat, 30 Aug 2025 02:32:42 +0000 (03:32 +0100)] 
net: dsa: lantiq_gswip: move to dedicated folder

Move the lantiq_gswip driver to its own folder and update
MAINTAINERS file accordingly.
This is done ahead of extending the driver to support the MaxLinear
GSW1xx series of standalone switch ICs, which includes adding a bunch
of files.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Hauke Mehrtens <hauke@hauke-m.de>
Link: https://patch.msgid.link/a5923dee9a174501b284dc473bdec9dd89c68de1.1756520811.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks ago ipv6: Add sanity checks on ipv6_devconf.rpl_seg_enabled
Yue Haibing [Mon, 1 Sep 2025 12:37:26 +0000 (20:37 +0800)] 
ipv6: Add sanity checks on ipv6_devconf.rpl_seg_enabled

In ipv6_rpl_srh_rcv(), min(net->ipv6.devconf_all->rpl_seg_enabled,
idev->cnf.rpl_seg_enabled) is intended to return 0 when either value is
zero, but if one of the values is negative it will in fact return non-zero.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://patch.msgid.link/20250901123726.1972881-3-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
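
The min() pitfall described in the rpl_seg_enabled commit above is easy to reproduce. The sketch below is a Python model, not the kernel code; the function names are hypothetical, and the accepted range 0..1 in the validation helper is an assumption made for illustration.

```python
def accept_rpl_seg(conf_all: int, conf_dev: int) -> bool:
    """Model of the min()-based check: a non-zero minimum means 'enabled'."""
    return min(conf_all, conf_dev) != 0


def validate_rpl_seg_enabled(value: int) -> int:
    """Hypothetical sanity check: only 0 and 1 are accepted (assumed range)."""
    if value not in (0, 1):
        raise ValueError(f"rpl_seg_enabled out of range: {value}")
    return value


if __name__ == "__main__":
    print(accept_rpl_seg(1, 0))    # one side disabled: min() is 0
    print(accept_rpl_seg(-1, 0))   # negative value: min() is -1, which is non-zero
```

With a negative value on one side, the minimum is negative rather than zero, so the check passes even though the other side is disabled. Rejecting out-of-range values when they are written avoids that state.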
7 weeks ago selftests: net: avoid memory leak
Zongmin Zhou [Mon, 1 Sep 2025 05:45:57 +0000 (13:45 +0800)] 
selftests: net: avoid memory leak

The buffer is used without being freed. Fix it to avoid a memory leak.

Signed-off-by: Zongmin Zhou <zhouzongmin@kylinos.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250901054557.32811-1-min_halo@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks ago net: macb: Validate the value of base_time properly
Chandra Mohan Sundar [Mon, 1 Sep 2025 16:29:19 +0000 (21:59 +0530)] 
net: macb: Validate the value of base_time properly

In macb_taprio_setup_replace(), the value of start_time is being
compared against zero which would never be true since start_time
is an unsigned value. Due to this there is a chance that an
incorrect config base time value can be used for computation.

Fix by checking the value of conf->base_time directly.
This issue was reported by the Coverity static analyzer.

Fixes: 89934dbf169e3 ("net: macb: Add TAPRIO traffic scheduling support")
Signed-off-by: Chandra Mohan Sundar <chandramohan.explore@gmail.com>
Reviewed-by: Vineeth Karumanchi <vineeth.karumanchi@amd.com>
Link: https://patch.msgid.link/20250901162923.627765-1-chandramohan.explore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
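
The macb fix above concerns a negativity check on an unsigned variable. The following Python sketch models that with ctypes; it is not the driver code. The function names are hypothetical, and it assumes the original comparison was of the form "start_time < 0" on a 64-bit unsigned value.

```python
import ctypes


def buggy_is_invalid(base_time: int) -> bool:
    """Model of the old check: the value is first stored as unsigned 64-bit."""
    start_time = ctypes.c_uint64(base_time).value  # negative input wraps around
    return start_time < 0                          # can never be true


def fixed_is_invalid(base_time: int) -> bool:
    """Model of the fix: test the signed configuration value directly."""
    return base_time < 0


if __name__ == "__main__":
    print(buggy_is_invalid(-5), fixed_is_invalid(-5))
```

A negative base time wraps to a very large unsigned number, so the old check lets it through while the direct check on the signed value catches it.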
7 weeks ago selftests: drv-net: rss_ctx: make the test pass with few queues
Jakub Kicinski [Mon, 1 Sep 2025 17:31:39 +0000 (10:31 -0700)] 
selftests: drv-net: rss_ctx: make the test pass with few queues

rss_ctx.test_rss_key_indir implicitly expects at least 5 queues,
as it checks that the traffic on the first 2 queues is lower than
the remaining queues when we use all queues. Special case fewer
queues.

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250901173139.881070-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks ago selftests: drv-net: rss_ctx: use Netlink for timed reconfig
Jakub Kicinski [Mon, 1 Sep 2025 17:31:38 +0000 (10:31 -0700)] 
selftests: drv-net: rss_ctx: use Netlink for timed reconfig

The rss_ctx test has gotten pretty flaky after I increased
the queue count in NIPA 2->3. Not 100% clear why. We get
a lot of failures in the rss_ctx.test_hitless_key_update case.

Looking closer it appears that the failures are mostly due
to startup costs. I measured the following timing for ethtool -X:
 - python cmd(shell=True)  : 150-250msec
 - python cmd(shell=False) :  50- 70msec
 - timed in bash           :  45- 55msec
 - YNL Netlink call        :   2-  4msec
 - .set_rxfh callback      :   1-  2msec

The target in the test was set to 200msec. We were mostly measuring
ethtool startup cost it seems. Switch to YNL since it's 100x faster.

Lower the pass criteria to 150msec, no real science behind this number
but we removed some overhead, drivers which previously passed 200msec
should easily pass 150msec now.

Separately we should probably follow up on defaulting to shell=False,
when script doesn't explicitly ask for True, because the overhead
is rather significant.

Switch from _rss_key_rand() to random.randbytes(), YNL takes a binary
array rather than array of ints.

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250901173139.881070-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks ago selftests: net: py: don't default to shell=True
Jakub Kicinski [Sat, 30 Aug 2025 18:43:17 +0000 (11:43 -0700)] 
selftests: net: py: don't default to shell=True

Overhead of using shell=True is quite significant.
Micro-benchmark of running ethtool --help shows that
non-shell run is 2x faster.

Runtime of the XDP tests also shows improvement:
this patch: 2m34s 2m21s 2m18s 2m18s
    before:     2m54s 2m36s 2m34s

Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250830184317.696121-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
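
A rough way to observe the shell=True overhead discussed above is to time the same command with and without a shell. This sketch uses subprocess.run() directly rather than the selftests' cmd() helper, runs the Python interpreter as a stand-in command, and assumes a POSIX system. mean_runtime() is a name made up for this example, and the numbers it prints will vary by machine.

```python
import shlex
import subprocess
import sys
import time


def mean_runtime(argv, shell, repeat=5):
    """Return the mean wall-clock seconds per run of argv."""
    # With shell=True, subprocess expects a single command string.
    command = shlex.join(argv) if shell else argv
    start = time.perf_counter()
    for _ in range(repeat):
        subprocess.run(command, shell=shell, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return (time.perf_counter() - start) / repeat


if __name__ == "__main__":
    argv = [sys.executable, "-c", "pass"]
    print("shell=False:", mean_runtime(argv, shell=False))
    print("shell=True: ", mean_runtime(argv, shell=True))
```

The difference is the cost of starting an extra shell process per command, which matters when a test issues many short commands.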
7 weeks ago selftests: drv-net: adjust tests before defaulting to shell=False
Jakub Kicinski [Sat, 30 Aug 2025 18:43:16 +0000 (11:43 -0700)] 
selftests: drv-net: adjust tests before defaulting to shell=False

Clean up tests which expect shell=True without explicitly passing
that param to cmd(). There seems to be only one such case, and
in fact it's better converted to a direct write.

Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250830184317.696121-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks ago net_sched: act: remove tcfa_qstats
Eric Dumazet [Mon, 1 Sep 2025 09:31:41 +0000 (09:31 +0000)] 
net_sched: act: remove tcfa_qstats

tcfa_qstats is currently only used to hold drops and overlimits counters.

tcf_action_inc_drop_qstats() and tcf_action_inc_overlimit_qstats()
currently acquire a->tcfa_lock to increment these counters.

Switch to two atomic_t to get lock-free accounting.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250901093141.2093176-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks ago net_sched: add back BH safety to tcf_lock
Eric Dumazet [Mon, 1 Sep 2025 09:26:07 +0000 (09:26 +0000)] 
net_sched: add back BH safety to tcf_lock

Jamal reported that we had to use BH safety after all,
because stats can be updated from BH handler.

Fixes: 3133d5c15cb5 ("net_sched: remove BH blocking in eight actions")
Fixes: 53df77e78590 ("net_sched: act_skbmod: use RCU in tcf_skbmod_dump()")
Fixes: e97ae742972f ("net_sched: act_tunnel_key: use RCU in tunnel_key_dump()")
Fixes: 48b5e5dbdb23 ("net_sched: act_vlan: use RCU in tcf_vlan_dump()")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Closes: https://lore.kernel.org/netdev/CAM0EoMmhq66EtVqDEuNik8MVFZqkgxFbMu=fJtbNoYD7YXg4bA@mail.gmail.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250901092608.2032473-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks ago net: selftests: clean up tools/testing/selftests/net/lib/py/utils.py
Breno Leitao [Mon, 1 Sep 2025 10:00:07 +0000 (03:00 -0700)] 
net: selftests: clean up tools/testing/selftests/net/lib/py/utils.py

This patch improves the utils.py module by removing unused imports
(errno, random), simplifying the fd_read_timeout() function by
eliminating an unnecessary else clause, and cleaning up code style in the
defer class constructor.

Additionally, it renames the parameter in rand_port() from 'type' to
'stype' to avoid shadowing the built-in Python name 'type', improving
code clarity and preventing potential issues.

These changes enhance code readability and maintainability without
affecting functionality.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250901-fix-v1-1-df0abb67481e@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
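
The reason for renaming a parameter called 'type', as in the utils.py cleanup above, can be shown with a small sketch. These functions are hypothetical and are not the selftests' rand_port(); they only demonstrate how a parameter named 'type' hides the built-in inside the function body.

```python
import socket


def kind_name_shadowed(type=socket.SOCK_STREAM):
    """Parameter named `type` hides the built-in type() in this scope."""
    try:
        # Here `type` is the socket kind, not the built-in, so this call fails.
        return type(type).__name__
    except TypeError:
        return "built-in type() is shadowed"


def kind_name(stype=socket.SOCK_STREAM):
    """With the parameter renamed, the built-in type() is usable again."""
    return type(stype).__name__


if __name__ == "__main__":
    print(kind_name_shadowed())
    print(kind_name())
```

The shadowed version only breaks if the body later needs the built-in, which is why the problem tends to surface long after the parameter was named.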
7 weeks ago net/smc: Replace use of strncpy on NUL-terminated string with strscpy
James Flowers [Mon, 1 Sep 2025 03:04:59 +0000 (20:04 -0700)] 
net/smc: Replace use of strncpy on NUL-terminated string with strscpy

strncpy is deprecated for use on NUL-terminated strings, as indicated in
Documentation/process/deprecated.rst. strncpy NUL-pads the destination
buffer and doesn't guarantee the destination buffer will be NUL
terminated.

Signed-off-by: James Flowers <bold.zone2373@fastmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Mahanta Jambigi <mjambigi@linux.ibm.com>
Link: https://patch.msgid.link/20250901030512.80099-1-bold.zone2373@fastmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
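
The behavioral difference behind the strncpy to strscpy change above can be modeled in Python on byte strings. This is a simplified model, not the C implementation: both function names are made up, src is the string content without its terminator, and the strscpy model returns only the bytes it writes.

```python
import errno


def strncpy_model(src: bytes, size: int) -> bytes:
    """Model of C strncpy(): copy at most `size` bytes and NUL-pad the rest.

    If src holds `size` or more bytes, the result has no NUL terminator.
    """
    return src[:size].ljust(size, b"\x00")


def strscpy_model(src: bytes, size: int):
    """Model of strscpy(): the result is NUL-terminated whenever size > 0.

    Returns (bytes written, return value); the return value is the number of
    characters copied, or -E2BIG if the source had to be truncated.
    """
    if size == 0:
        return b"", -errno.E2BIG
    copied = src[: size - 1]
    ret = len(copied) if len(src) < size else -errno.E2BIG
    return copied + b"\x00", ret


if __name__ == "__main__":
    print(strncpy_model(b"abcdefgh", 4))   # truncated, no terminator
    print(strncpy_model(b"ab", 4))         # padded with NUL bytes
    print(strscpy_model(b"abcdefgh", 4))   # truncated, terminated, error code
    print(strscpy_model(b"ab", 4))         # fits, terminated, length returned
```

The strscpy model always terminates the destination and reports truncation through its return value, which are the two properties the strncpy model lacks.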
7 weeks ago net: mvpp2: add xlg pcs inband capabilities
Russell King (Oracle) [Sun, 31 Aug 2025 17:01:51 +0000 (18:01 +0100)] 
net: mvpp2: add xlg pcs inband capabilities

Add PCS inband capabilities for XLG in the Marvell PP2 driver, so
phylink knows that 5G and 10G speeds have no inband capabilities.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/E1uslR9-00000001OxL-44CD@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks ago bonding: Remove support for use_carrier
Jay Vosburgh [Sat, 30 Aug 2025 00:08:37 +0000 (17:08 -0700)] 
bonding: Remove support for use_carrier

Remove the implementation of use_carrier, the link monitoring
method that utilizes ethtool or ioctl to determine the link state of an
interface in a bond.  Bonding will always behave as if use_carrier=1,
which relies on netif_carrier_ok() to determine the link state of
interfaces.

To avoid acquiring RTNL many times per second, bonding inspects
link state under RCU, but not under RTNL.  However, ethtool
implementations in drivers may sleep, and therefore this strategy is
unsuitable for use with calls into driver ethtool functions.

The use_carrier option was introduced in 2003, to provide
backwards compatibility for network device drivers that did not support
the then-new netif_carrier_ok/on/off system.  Device drivers are now
expected to support netif_carrier_*, and the use_carrier backwards
compatibility logic is no longer necessary.

The option itself remains, but when queried always returns 1,
and may only be set to 1.

Link: https://lore.kernel.org/000000000000eb54bf061cfd666a@google.com
Link: https://lore.kernel.org/20240718122017.d2e33aaac43a.I10ab9c9ded97163aef4e4de10985cd8f7de60d28@changeid
Signed-off-by: Jay Vosburgh <jv@jvosburgh.net>
Reported-by: syzbot+b8c48ea38ca27d150063@syzkaller.appspotmail.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/2029487.1756512517@famine
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks ago Merge branch 'e-switch-vport-sharing-delegation'
Paolo Abeni [Tue, 2 Sep 2025 13:18:18 +0000 (15:18 +0200)] 
Merge branch 'e-switch-vport-sharing-delegation'

Saeed Mahameed says:

====================
E-Switch vport sharing & delegation

An mlx5 E-Switch FDB table can manage vports belonging to other sibling
physical functions, such as ECPF (ARM embedded cores) and Host PF (x86).
This enables a single source of truth for SDN software to manage network
pipelines from one host. While such functionality already exists in mlx5,
it is currently limited by static vport allocation,
meaning the number of vports shared between multi-host functions
must be known pre-boot.

This patchset enables delegated/external vports to be discovered
dynamically when switchdev mode is enabled, leveraging new firmware
capabilities for dynamic vport creation.

Adjacent functions that delegate their SR-IOV VFs to sibling PFs can be
dynamically discovered when switchdev mode is enabled on the sibling PF,
after SR-IOV has been enabled on the originating PF. This allows for more
flexible and scalable management in multi-host and ECPF-to-host
scenarios.

The patchset consists of the following changes:

- Refactoring of ACL root namespace handling: The storage of vport ACL root
  namespaces is converted from a linear array to an xarray, allowing dynamic
  creation of ACLs per individual vport.
- Improvements for vhca_id to vport mapping.
- Dynamic querying and creation of delegated functions/vports.
====================

Link: https://patch.msgid.link/20250829223722.900629-1-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net/mlx5: {DR,HWS}, Use the cached vhca_id for this device
Saeed Mahameed [Fri, 29 Aug 2025 22:37:22 +0000 (15:37 -0700)] 
net/mlx5: {DR,HWS}, Use the cached vhca_id for this device

The mlx5 driver caches many capabilities to be used by mlx5 layers.

In SW and HW steering we can use the cached vhca_id instead of invoking
FW commands.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250829223722.900629-8-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net/mlx5: E-switch, Set representor attributes for adjacent VFs
Adithya Jayachandran [Fri, 29 Aug 2025 22:37:21 +0000 (15:37 -0700)] 
net/mlx5: E-switch, Set representor attributes for adjacent VFs

Adjacent VFs get their devlink port information from firmware.
Use the information (pfnum, function id) from FW when populating the
devlink port attributes.

Before:
$ devlink port show
pci/0000:00:03.0/180225: type eth netdev eth0 flavour pcivf controller 0 pfnum 0 vfnum 49152 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00

After:
$ devlink port show
pci/0000:00:03.0/180225: type eth netdev enp0s3npf0vf2 flavour pcivf controller 0 pfnum 0 vfnum 2 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00

Signed-off-by: Adithya Jayachandran <ajayachandra@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250829223722.900629-7-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net/mlx5: E-Switch, Register representors for adjacent vports
Saeed Mahameed [Fri, 29 Aug 2025 22:37:20 +0000 (15:37 -0700)] 
net/mlx5: E-Switch, Register representors for adjacent vports

Register representors for adjacent vports dynamically when they are
discovered. Dynamically added representors state will now be set to
'REGISTERED' when the representor type was already registered,
otherwise they won't be loaded.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250829223722.900629-6-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net/mlx5: E-Switch, Create acls root namespace for adjacent vports
Saeed Mahameed [Fri, 29 Aug 2025 22:37:19 +0000 (15:37 -0700)] 
net/mlx5: E-Switch, Create acls root namespace for adjacent vports

Use the new vport acl root namespace add/remove API to create the
missing acl root namespaces for each new adjacent function vport.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250829223722.900629-5-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net/mlx5: E-Switch, Add support for adjacent functions vports discovery
Adithya Jayachandran [Fri, 29 Aug 2025 22:37:18 +0000 (15:37 -0700)] 
net/mlx5: E-Switch, Add support for adjacent functions vports discovery

Add driver support to query adjacent functions vports, AKA
delegated vports.

Adjacent functions can delegate their SR-IOV VFs to another sibling PF in
the system, to be managed by the eswitch-capable sibling PF.
E.g., ECPF to Host PF, multi-host PFs between each other, etc.

Only supported in switchdev mode.

Signed-off-by: Adithya Jayachandran <ajayachandra@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250829223722.900629-4-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net/mlx5: E-Switch, Move vport acls root namespaces creation to eswitch
Saeed Mahameed [Fri, 29 Aug 2025 22:37:17 +0000 (15:37 -0700)] 
net/mlx5: E-Switch, Move vport acls root namespaces creation to eswitch

Move the loop that creates the vport ACL root namespaces to eswitch,
since it is the eswitch's responsibility to decide when and how many
vport ACL root namespaces to create. In the next patches we will use
the fs_core vport ACL root namespace APIs to create/remove root ns
ACLs dynamically for dynamically created vports.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250829223722.900629-3-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net/mlx5: FS, Convert vport acls root namespaces to xarray
Saeed Mahameed [Fri, 29 Aug 2025 22:37:16 +0000 (15:37 -0700)] 
net/mlx5: FS, Convert vport acls root namespaces to xarray

Before this patch the storage was a linear array and could only support a
fixed number of vports. In the next patches, vport numbers are no longer
bound to a well-known limit, so convert the acl root namespace storage to
an xarray.

In addition, create a fs_core public API to add/remove vport acl namespaces,
as it is the eswitch's responsibility to create the vports and their
root namespaces for acls. In the next patch we will move
mlx5_fs_ingress_acls_{init,cleanup} to eswitch and will use
the individual mlx5_fs_vport_{egress,ingress}_acl_ns_{add,remove}
APIs for dynamically created vports.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250829223722.900629-2-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
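
The motivation for moving from a linear array to an xarray in the commit above can be sketched with plain Python containers. This is only an analogy: a dict stands in for the kernel's xarray, and the class names, method names and vport numbers are all hypothetical.

```python
class LinearAclTable:
    """Fixed-size table: vport numbers must be below a limit known up front."""

    def __init__(self, max_vports: int):
        self._slots = [None] * max_vports

    def add(self, vport: int, namespace: str) -> None:
        if not 0 <= vport < len(self._slots):
            raise IndexError(f"vport {vport} exceeds the preallocated table")
        self._slots[vport] = namespace


class SparseAclTable:
    """Sparse index-to-entry map (dict used as a stand-in for an xarray)."""

    def __init__(self):
        self._slots = {}

    def add(self, vport: int, namespace: str) -> None:
        if vport in self._slots:
            raise KeyError(f"vport {vport} already has a namespace")
        self._slots[vport] = namespace

    def remove(self, vport: int):
        return self._slots.pop(vport, None)

    def get(self, vport: int):
        return self._slots.get(vport)


if __name__ == "__main__":
    sparse = SparseAclTable()
    sparse.add(0x8000, "egress-acl-ns")   # arbitrary large vport number
    print(sparse.get(0x8000))
```

The sparse table accepts any vport number without a preallocated upper bound and supports per-vport add and remove, which is what dynamically created vports need.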
7 weeks ago Merge branch 'add-netc-timer-ptp-driver-and-add-ptp-support-for-i-mx95'
Paolo Abeni [Tue, 2 Sep 2025 11:13:53 +0000 (13:13 +0200)] 
Merge branch 'add-netc-timer-ptp-driver-and-add-ptp-support-for-i-mx95'

Wei Fang says:

====================
Add NETC Timer PTP driver and add PTP support for i.MX95

This series adds NETC Timer PTP clock driver, which supports precise
periodic pulse, time capture on external pulse and PTP synchronization.
It also adds PTP support to the enetc v4 driver for i.MX95 and optimizes
the PTP-related code in the enetc driver.
====================

Link: https://patch.msgid.link/20250829050615.1247468-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net: enetc: don't update sync packet checksum if checksum offload is used
Wei Fang [Fri, 29 Aug 2025 05:06:15 +0000 (13:06 +0800)] 
net: enetc: don't update sync packet checksum if checksum offload is used

For ENETC v4, the hardware has the capability to support Tx checksum
offload, so the enetc driver does not need to update the UDP checksum
of PTP sync packets if Tx checksum offload is enabled.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250829050615.1247468-15-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks ago net: enetc: add PTP synchronization support for ENETC v4
Wei Fang [Fri, 29 Aug 2025 05:06:14 +0000 (13:06 +0800)] 
net: enetc: add PTP synchronization support for ENETC v4

Regarding PTP, ENETC v4 has some changes compared to ENETC v1 (LS1028A),
mainly as follows.

1. ENETC v4 uses a different PTP driver, so the way to get phc_index is
different from LS1028A. Therefore, enetc_get_ts_info() has been modified
appropriately to be compatible with ENETC v1 and v4.

2. The PMa_SINGLE_STEP register has changed in ENETC v4, not only the
register offset, but also some register fields. Therefore, two helper
functions are added, enetc_set_one_step_ts() for ENETC v1 and
enetc4_set_one_step_ts() for ENETC v4.

3. Since the generic helper functions from ptp_clock are used to get
the PHC index of the PTP clock, FSL_ENETC_CORE depends on the Kconfig
symbol "PTP_1588_CLOCK_OPTIONAL". But FSL_ENETC_CORE can only be
selected, so add the dependency to FSL_ENETC, FSL_ENETC_VF and
NXP_ENETC4. Perhaps the best approach would be to change FSL_ENETC_CORE
to a visible menu entry and then make FSL_ENETC, FSL_ENETC_VF, and
NXP_ENETC4 depend on it, but that is not the goal of this patch, so this
may be changed in the future.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250829050615.1247468-14-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: enetc: move sync packet modification before dma_map_single()
Wei Fang [Fri, 29 Aug 2025 05:06:13 +0000 (13:06 +0800)] 
net: enetc: move sync packet modification before dma_map_single()

Move sync packet content modification before dma_map_single() to follow
the correct DMA usage process. The previous sequence only worked because
the LS1028A hardware is DMA-coherent. The ENETC (v4) of the upcoming
i.MX95 does not support "dma-coherent", so this reordering is required
there. Otherwise, the originTimestamp and correction fields of the sent
packets will still hold the values from before the modification.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250829050615.1247468-13-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: enetc: remove unnecessary CONFIG_FSL_ENETC_PTP_CLOCK check
Wei Fang [Fri, 29 Aug 2025 05:06:12 +0000 (13:06 +0800)] 
net: enetc: remove unnecessary CONFIG_FSL_ENETC_PTP_CLOCK check

The ENETC_F_RX_TSTAMP flag of priv->active_offloads can only be set when
CONFIG_FSL_ENETC_PTP_CLOCK is enabled. Similarly, rx_ring->ext_en can
only be set when CONFIG_FSL_ENETC_PTP_CLOCK is enabled as well. So it is
safe to remove unnecessary CONFIG_FSL_ENETC_PTP_CLOCK check.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250829050615.1247468-12-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: enetc: extract enetc_update_ptp_sync_msg() to handle PTP Sync packets
Wei Fang [Fri, 29 Aug 2025 05:06:11 +0000 (13:06 +0800)] 
net: enetc: extract enetc_update_ptp_sync_msg() to handle PTP Sync packets

Move PTP Sync packet processing from enetc_map_tx_buffs() to a new helper
function enetc_update_ptp_sync_msg() to simplify the original function
and to prepare for upcoming ENETC v4 one-step support. There is no
functional change. Note that ENETC_TXBD_TSTAMP is added to replace the
magic number 0x3fffffff.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250829050615.1247468-11-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agonet: enetc: save the parsed information of PTP packet to skb->cb
Wei Fang [Fri, 29 Aug 2025 05:06:10 +0000 (13:06 +0800)] 
net: enetc: save the parsed information of PTP packet to skb->cb

Currently, the Tx PTP packets are parsed twice in the enetc driver, once
in enetc_xmit() and once in enetc_map_tx_buffs(). The second parse is
redundant, since the parsed information can be saved to skb->cb so that
enetc_map_tx_buffs() can get the previously parsed data from skb->cb.
Therefore, add struct enetc_skb_cb as the format of the data in the
skb->cb buffer to save the parsed information of the PTP packet. Use the
saved information in enetc_map_tx_buffs() to avoid parsing data again.

In addition, rename variables offset1 and offset2 in enetc_map_tx_buffs()
to corr_off and tstamp_off for better readability.
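
As a rough userspace illustration of the "parse once, stash in the
control buffer" idea described above, consider the sketch below. The
names struct fake_skb, struct ptp_parse_info, stash_ptp_info() and
load_ptp_info() are hypothetical and are not taken from the driver; the
real struct enetc_skb_cb may contain different fields. The only kernel
fact assumed is that skb->cb is a 48-byte scratch area.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKB_CB_SIZE 48	/* size of sk_buff::cb[] in the kernel */

/* Stand-in for struct sk_buff; only the scratch buffer is modelled. */
struct fake_skb {
	unsigned char cb[SKB_CB_SIZE];
};

/* Hypothetical layout; the real struct enetc_skb_cb may differ. */
struct ptp_parse_info {
	bool is_ptp_sync;
	uint16_t corr_off;	/* offset of the correction field */
	uint16_t tstamp_off;	/* offset of the originTimestamp field */
};

/* The stashed data must never outgrow the control buffer. */
_Static_assert(sizeof(struct ptp_parse_info) <= SKB_CB_SIZE,
	       "parsed info must fit in the control buffer");

/* Called once where the packet is first parsed (the xmit entry point). */
static void stash_ptp_info(struct fake_skb *skb,
			   const struct ptp_parse_info *info)
{
	memcpy(skb->cb, info, sizeof(*info));
}

/* Called later (the buffer mapping stage) instead of parsing again. */
static struct ptp_parse_info load_ptp_info(const struct fake_skb *skb)
{
	struct ptp_parse_info info;

	memcpy(&info, skb->cb, sizeof(info));
	return info;
}
```

The sketch copies with memcpy() to stay strictly conforming in userspace;
kernel code typically casts skb->cb to the private struct instead.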

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250829050615.1247468-10-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoMAINTAINERS: add NETC Timer PTP clock driver section
Wei Fang [Fri, 29 Aug 2025 05:06:09 +0000 (13:06 +0800)] 
MAINTAINERS: add NETC Timer PTP clock driver section

Add a section entry for NXP NETC Timer PTP clock driver.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250829050615.1247468-9-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoptp: netc: add external trigger stamp support
F.S. Peng [Fri, 29 Aug 2025 05:06:08 +0000 (13:06 +0800)] 
ptp: netc: add external trigger stamp support

The NETC Timer is capable of recording the timestamp on receipt of an
external pulse on a GPIO pin. It supports two such external triggers.
The recorded value is saved in a 16 entry FIFO accessed by
TMR_ETTSa_H/L. An interrupt can be generated when the trigger occurs,
when the FIFO reaches a threshold or overflows.

Signed-off-by: F.S. Peng <fushi.peng@nxp.com>
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250829050615.1247468-8-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoptp: netc: add periodic pulse output support
Wei Fang [Fri, 29 Aug 2025 05:06:07 +0000 (13:06 +0800)] 
ptp: netc: add periodic pulse output support

NETC Timer has three pulse channels, all of which support periodic pulse
output. A channel is bound to an ALARM register, and a future time is
then written to that ALARM register. When the current time becomes
greater than the ALARM value, the FIPER register starts counting down,
and when the count reaches 0, the pulse is triggered. The PPS signal is
also implemented in this way.

On i.MX95, only ALARM1 can be used to signal the FIPER to start counting
down, while on i.MX943 both ALARM1 and ALARM2 can be used. Therefore,
at most one channel can work on i.MX95 and at most two on i.MX943.

In addition, because PTP_CLK_REQ_PEROUT support is added, the PPS channel
is now selected dynamically instead of being fixed to channel 0.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250829050615.1247468-7-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoptp: netc: add PTP_CLK_REQ_PPS support
Wei Fang [Fri, 29 Aug 2025 05:06:06 +0000 (13:06 +0800)] 
ptp: netc: add PTP_CLK_REQ_PPS support

The NETC Timer is capable of generating a PPS interrupt to the host. To
support this feature, a 64-bit alarm time (which is an integral second
of PHC in the future) is set to TMR_ALARM, and the period is set to
TMR_FIPER. The alarm time is compared to the current time on each update,
and the alarm trigger then signals TMR_FIPER to start counting down.
After the period has passed, the PPS event is generated.

According to the NETC block guide, the Timer has three FIPERs, any of
which can be used to generate the PPS events, but in the current
implementation, we only need one of them to implement the PPS feature,
so FIPER 0 is used as the default PPS generator. Also, the Timer has
2 ALARMs, currently, ALARM 0 is used as the default time comparator.

However, if the time is adjusted or the integer of period is changed when
PPS is enabled, the PPS event will not be generated at an integral second
of PHC. The IP team suggests the following steps if time drift happens:

1. Disable FIPER before adjusting the hardware time
2. Rearm ALARM after the time adjustment to make the next PPS event be
generated at an integral second of PHC.
3. Re-enable FIPER.
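
The three steps above can be modelled in plain C as follows. This is
only a toy model under stated assumptions: struct fake_timer,
next_integral_second() and fake_timer_adjust() are hypothetical names,
no real registers are touched, and the choice of "the very next whole
second" as the new alarm value is an assumption (the driver may well add
a safety margin).

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Toy model of the timer state; not the real register layout. */
struct fake_timer {
	uint64_t now_ns;	/* current PHC time */
	uint64_t alarm_ns;	/* ALARM compare value */
	bool fiper_enabled;	/* whether the FIPER may count down */
};

/* First whole-second boundary strictly after now_ns. */
static uint64_t next_integral_second(uint64_t now_ns)
{
	return (now_ns / NSEC_PER_SEC + 1) * NSEC_PER_SEC;
}

/* Adjust the time while keeping the PPS aligned to whole seconds. */
static void fake_timer_adjust(struct fake_timer *t, int64_t delta_ns)
{
	/* Step 1: disable FIPER before touching the time. */
	t->fiper_enabled = false;

	/* Apply the time adjustment itself. */
	t->now_ns = (uint64_t)((int64_t)t->now_ns + delta_ns);

	/* Step 2: rearm ALARM at a whole second of the adjusted time. */
	t->alarm_ns = next_integral_second(t->now_ns);

	/* Step 3: re-enable FIPER. */
	t->fiper_enabled = true;
}
```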

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250829050615.1247468-6-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoptp: netc: add NETC V4 Timer PTP driver support
Wei Fang [Fri, 29 Aug 2025 05:06:05 +0000 (13:06 +0800)] 
ptp: netc: add NETC V4 Timer PTP driver support

NETC V4 Timer provides current time with nanosecond resolution, precise
periodic pulse, pulse on timeout (alarm), and time capture on external
pulse support. And it supports time synchronization as required for
IEEE 1588 and IEEE 802.1AS-2020.

Inside NETC, ENETC can capture the timestamp of the sent/received packet
through the PHC provided by the Timer and record it on the Tx/Rx BD. And
through the relevant PHC interfaces provided by the driver, the enetc V4
driver can support PTP time synchronization.

In addition, NETC V4 Timer is similar to the QorIQ 1588 timer, but it is
not exactly the same. The current ptp-qoriq driver is not compatible with
NETC V4 Timer and most of its code cannot be reused, for the following
reasons.

1. The architecture of ptp-qoriq driver makes the register offset fixed.
However, the offsets of all the high registers and low registers of V4
are swapped, and V4 also adds some new registers, so extending ptp-qoriq
to make it compatible with V4 Timer is tantamount to completely rewriting
the ptp-qoriq driver.

2. The usage of some functions is somewhat different from QorIQ timer,
such as the setting of TCLK_PERIOD and TMR_ADD, the logic of configuring
PPS, etc., so making the driver compatible with V4 Timer will undoubtedly
increase the complexity of the code and reduce readability.

3. QorIQ is an expired brand. It is difficult for us to verify whether
it works stably on the QorIQ platforms if we refactor the driver, and
this will make maintenance difficult, so refactoring the driver obviously
does not bring any benefits.

Therefore, add this new driver for NETC V4 Timer. Note that the missing
features like PEROUT, PPS and EXTTS will be added in subsequent patches.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250829050615.1247468-5-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoptp: add helpers to get the phc_index by of_node or dev
Wei Fang [Fri, 29 Aug 2025 05:06:04 +0000 (13:06 +0800)] 
ptp: add helpers to get the phc_index by of_node or dev

Some Ethernet controllers do not have an integrated PTP timer function.
Instead, the PTP timer is a separate device that provides a PTP hardware
clock to the Ethernet controller. Therefore, the Ethernet controller
driver needs to obtain the PTP clock's phc_index in its
ethtool_ops::get_ts_info(). Currently, most drivers implement this in
one of the following ways.

1. The PTP device driver adds a custom API and exports it to the Ethernet
controller driver.
2. The PTP device driver adds private data to its device structure. So
the private data structure needs to be exposed to the Ethernet controller
driver.

When registering the ptp clock, ptp_clock_register() always saves the
ptp_clock pointer to the private data of ptp_clock::dev. Therefore, as
long as ptp_clock::dev is obtained, the phc_index can be obtained. So
the following generic APIs can be added to the ptp driver to obtain the
phc_index.

1. ptp_clock_index_by_dev(): Obtain the phc_index by the device pointer
of the PTP device.
2. ptp_clock_index_by_of_node(): Obtain the phc_index by the of_node
pointer of the PTP device.

Also, we can add another API like ptp_clock_index_by_fwnode() to get the
phc_index by fwnode of PTP device. However, this API is not used in this
patch set, so it is better to add it when needed.

Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250829050615.1247468-4-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agodt-bindings: net: move ptp-timer property to ethernet-controller.yaml
Wei Fang [Fri, 29 Aug 2025 05:06:03 +0000 (13:06 +0800)] 
dt-bindings: net: move ptp-timer property to ethernet-controller.yaml

For some Ethernet controllers, the PTP timer function is not integrated.
Instead, the PTP timer is a separate device and provides PTP Hardware
Clock (PHC) to the Ethernet controller to use, such as NXP FMan MAC,
ENETC, etc. Therefore, a property is needed to indicate this hardware
relationship between the Ethernet controller and the PTP timer.

Since this use case is also very common, it is better to add a generic
property to ethernet-controller.yaml. According to the existing binding
docs, there are two good candidates, one is the "ptp-timer" defined in
fsl,fman-dtsec.yaml, and the other is the "ptimer-handle" defined in
fsl,fman.yaml. From the perspective of the name, the former is more
straightforward, so move the "ptp-timer" from fsl,fman-dtsec.yaml to
ethernet-controller.yaml.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250829050615.1247468-3-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agodt-bindings: ptp: add NETC Timer PTP clock
Wei Fang [Fri, 29 Aug 2025 05:06:02 +0000 (13:06 +0800)] 
dt-bindings: ptp: add NETC Timer PTP clock

NXP NETC (Ethernet Controller) is a multi-function PCIe Root Complex
Integrated Endpoint (RCiEP). The Timer is one of its functions; it
provides current time with nanosecond resolution, precise periodic
pulse, pulse on timeout (alarm), and time capture on external pulse
support. It also supports time synchronization as required for IEEE
1588 and IEEE 802.1AS-2020. So add device tree binding doc for the PTP
clock based on NETC Timer.

NETC Timer has three reference clock sources, but the clock mux is inside
the IP. Therefore, the driver will parse the clock name to select the
desired clock source. If the clocks property is not present, NETC Timer
will use the system clock of NETC IP as its reference clock. Because the
Timer is a PCIe function of NETC IP, the system clock of NETC is always
available to the Timer.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20250829050615.1247468-2-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoMerge branch 'inet-ping-misc-changes'
Jakub Kicinski [Mon, 1 Sep 2025 20:15:17 +0000 (13:15 -0700)] 
Merge branch 'inet-ping-misc-changes'

Eric Dumazet says:

====================
inet: ping: misc changes

First and third patches improve security a bit.

Second patch (ping_hash removal) is a cleanup.

Fourth patch uses EXPORT_IPV6_MOD[_GPL].
====================

Link: https://patch.msgid.link/20250829153054.474201-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoinet: ping: use EXPORT_IPV6_MOD[_GPL]()
Eric Dumazet [Fri, 29 Aug 2025 15:30:54 +0000 (15:30 +0000)] 
inet: ping: use EXPORT_IPV6_MOD[_GPL]()

There is no need to export ping symbols when CONFIG_IPV6=y.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250829153054.474201-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoinet: ping: make ping_port_rover per netns
Eric Dumazet [Fri, 29 Aug 2025 15:30:53 +0000 (15:30 +0000)] 
inet: ping: make ping_port_rover per netns

Provide isolation between netns for ping idents.

Randomize initial ping_port_rover value at netns creation.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250829153054.474201-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoinet: ping: remove ping_hash()
Eric Dumazet [Fri, 29 Aug 2025 15:30:52 +0000 (15:30 +0000)] 
inet: ping: remove ping_hash()

There is no point in keeping ping_hash().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://patch.msgid.link/20250829153054.474201-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoinet: ping: check sock_net() in ping_get_port() and ping_lookup()
Eric Dumazet [Fri, 29 Aug 2025 15:30:51 +0000 (15:30 +0000)] 
inet: ping: check sock_net() in ping_get_port() and ping_lookup()

We need to check socket netns before considering them in ping_get_port().
Otherwise, one malicious netns could 'consume' all ports.

Add corresponding check in ping_lookup().
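
The effect of the netns check can be illustrated with a small userspace
model. Everything here is hypothetical (struct fake_sock, the integer
netns_id and ident_in_use() are not kernel names); the point is only
that a socket from another namespace must be skipped before its ident
is compared, so one namespace cannot exhaust the idents of another.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy socket: the namespace is reduced to an integer id. */
struct fake_sock {
	int netns_id;
	uint16_t ident;	/* ICMP echo identifier, used like a port */
};

/* An ident counts as taken only if a socket in the SAME netns owns it. */
static bool ident_in_use(const struct fake_sock *socks, size_t n,
			 int netns_id, uint16_t ident)
{
	for (size_t i = 0; i < n; i++) {
		if (socks[i].netns_id != netns_id)
			continue;	/* the added check: skip foreign netns */
		if (socks[i].ident == ident)
			return true;
	}
	return false;
}
```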

Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://patch.msgid.link/20250829153054.474201-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: mdio: update runtime PM
Russell King (Oracle) [Fri, 29 Aug 2025 09:02:29 +0000 (10:02 +0100)] 
net: stmmac: mdio: update runtime PM

Commit 3c7826d0b106 ("net: stmmac: Separate C22 and C45 transactions
for xgmac") missed a change that happened in commit e2d0acd40c87
("net: stmmac: using pm_runtime_resume_and_get instead of
pm_runtime_get_sync").

Update the two clause 45 functions that didn't get switched to
pm_runtime_resume_and_get().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/E1urv09-00000000gJ1-3SxO@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoselftests: net: fix spelling and grammar mistakes
Praveen Balakrishnan [Thu, 28 Aug 2025 21:11:00 +0000 (22:11 +0100)] 
selftests: net: fix spelling and grammar mistakes

Fix several spelling and grammatical mistakes in output messages from
the net selftests to improve readability.

Only the message strings for the test output have been modified. No
changes to the functional logic of the tests have been made.

Signed-off-by: Praveen Balakrishnan <praveen.balakrishnan@magd.ox.ac.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250828211100.51019-1-praveen.balakrishnan@magd.ox.ac.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoptp: Limit time setting of PTP clocks
Miroslav Lichvar [Thu, 28 Aug 2025 10:32:53 +0000 (12:32 +0200)] 
ptp: Limit time setting of PTP clocks

Networking drivers implementing PTP clocks and kernel socket code
handling hardware timestamps use the 64-bit signed ktime_t type counting
nanoseconds. When a PTP clock reaches the maximum value in year 2262,
the timestamps returned to applications will overflow into year 1667.
The same thing happens when injecting a large offset with
clock_adjtime(ADJ_SETOFFSET).

Commit 7a8e61f84786 ("timekeeping: Force upper bound for setting
CLOCK_REALTIME") limited the maximum value accepted when setting the
system clock to 30 years before the maximum representable value (i.e.
year 2232) to avoid the overflow, assuming the system will not run for
more than 30 years.

Enforce the same limit for PTP clocks. Don't allow negative values and
values closer than 30 years to the maximum value. Drivers may implement
an even lower limit if the hardware registers cannot represent the whole
interval between years 1970 and 2262 in the required resolution.
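
The arithmetic behind the limit can be sketched as below. The macro and
function names are local to this example and only modelled on the
timekeeping limit referenced above; they are not claimed to be the exact
kernel definitions. A signed 64-bit nanosecond counter holds roughly 292
years, which is where year 2262 comes from, and subtracting 30 years of
365 days gives the upper bound near year 2232.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC	1000000000LL

/* Largest whole second representable in a signed 64-bit ns counter. */
#define KTIME_SEC_MAX	(INT64_MAX / NSEC_PER_SEC)

/* Assumed maximum uptime: 30 years of 365 days. */
#define UPTIME_SEC_MAX	(30LL * 365 * 24 * 3600)

/* Highest second value that still leaves 30 years of headroom. */
#define SETTIME_SEC_MAX	(KTIME_SEC_MAX - UPTIME_SEC_MAX)

/* Reject negative times, malformed nanoseconds and too-late times. */
static bool settime_valid(int64_t sec, long nsec)
{
	if (sec < 0 || nsec < 0 || nsec >= NSEC_PER_SEC)
		return false;
	return sec < SETTIME_SEC_MAX;
}
```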

Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <jstultz@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250828103300.1387025-1-mlichvar@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: ethernet: qualcomm: QCOM_PPE should depend on ARCH_QCOM
Geert Uytterhoeven [Fri, 29 Aug 2025 11:27:06 +0000 (13:27 +0200)] 
net: ethernet: qualcomm: QCOM_PPE should depend on ARCH_QCOM

The Qualcomm Technologies, Inc. Packet Process Engine (PPE) is only
present on Qualcomm IPQ SoCs.  Hence add a dependency on ARCH_QCOM, to
prevent asking the user about this driver when configuring a kernel
without Qualcomm platform support.

Fixes: 353a0f1d5b27606b ("net: ethernet: qualcomm: Add PPE driver for IPQ9574 SoC")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/eb7bd6e6ce27eb6d602a63184d9daa80127e32bd.1756466786.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agotcp: Remove sk->sk_prot->orphan_count.
Kuniyuki Iwashima [Fri, 29 Aug 2025 21:56:38 +0000 (21:56 +0000)] 
tcp: Remove sk->sk_prot->orphan_count.

TCP tracks the number of orphaned (SOCK_DEAD but not yet destructed)
sockets in tcp_orphan_count.

In some code that was shared with DCCP, tcp_orphan_count is referenced
via sk->sk_prot->orphan_count.

Let's reference tcp_orphan_count directly.

inet_csk_prepare_for_destroy_sock() is moved to inet_connection_sock.c
due to header dependency.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250829215641.711664-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'net-add-rcu-safety-to-dst-dev'
Jakub Kicinski [Sat, 30 Aug 2025 02:36:34 +0000 (19:36 -0700)] 
Merge branch 'net-add-rcu-safety-to-dst-dev'

Eric Dumazet says:

====================
net: add rcu safety to dst->dev

Followup of commit 88fe14253e18 ("net: dst: add four helpers
to annotate data-races around dst->dev").

Use lockdep enabled helpers to convert our unsafe dst->dev
uses one at a time.

More to come...
====================

Link: https://patch.msgid.link/20250828195823.3958522-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoipv4: start using dst_dev_rcu()
Eric Dumazet [Thu, 28 Aug 2025 19:58:23 +0000 (19:58 +0000)] 
ipv4: start using dst_dev_rcu()

Change icmpv4_xrlim_allow(), ip_defrag() to prevent possible UAF.

Change ipmr_prepare_xmit(), ipmr_queue_fwd_xmit(), ip_mr_output(),
ipv4_neigh_lookup() to use lockdep enabled dst_dev_rcu().

Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250828195823.3958522-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agotcp: use dst_dev_rcu() in tcp_fastopen_active_disable_ofo_check()
Eric Dumazet [Thu, 28 Aug 2025 19:58:22 +0000 (19:58 +0000)] 
tcp: use dst_dev_rcu() in tcp_fastopen_active_disable_ofo_check()

Use RCU to avoid a pair of atomic operations and a potential
UAF on dst_dev()->flags.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250828195823.3958522-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agotcp_metrics: use dst_dev_net_rcu()
Eric Dumazet [Thu, 28 Aug 2025 19:58:21 +0000 (19:58 +0000)] 
tcp_metrics: use dst_dev_net_rcu()

Replace three dst_dev() with a lockdep enabled helper.

Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250828195823.3958522-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: use dst_dev_rcu() in sk_setup_caps()
Eric Dumazet [Thu, 28 Aug 2025 19:58:20 +0000 (19:58 +0000)] 
net: use dst_dev_rcu() in sk_setup_caps()

Use RCU to protect accesses to dst->dev from sk_setup_caps()
and sk_dst_gso_max_size().

Also use dst_dev_rcu() in ip6_dst_mtu_maybe_forward(),
and ip_dst_mtu_maybe_forward().

ip4_dst_hoplimit() can use dst_dev_net_rcu().

Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250828195823.3958522-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoipv6: use RCU in ip6_output()
Eric Dumazet [Thu, 28 Aug 2025 19:58:19 +0000 (19:58 +0000)] 
ipv6: use RCU in ip6_output()

Use RCU in ip6_output() in order to use dst_dev_rcu() to prevent
possible UAF.

We can remove rcu_read_lock()/rcu_read_unlock() pairs
from ip6_finish_output2().

Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250828195823.3958522-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoipv6: use RCU in ip6_xmit()
Eric Dumazet [Thu, 28 Aug 2025 19:58:18 +0000 (19:58 +0000)] 
ipv6: use RCU in ip6_xmit()

Use RCU in ip6_xmit() in order to use dst_dev_rcu() to prevent
possible UAF.

Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250828195823.3958522-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoipv6: start using dst_dev_rcu()
Eric Dumazet [Thu, 28 Aug 2025 19:58:17 +0000 (19:58 +0000)] 
ipv6: start using dst_dev_rcu()

Refactor icmpv6_xrlim_allow() and ip6_dst_hoplimit()
so that we acquire rcu_read_lock() a bit longer
to be able to use dst_dev_rcu() instead of dst_dev().

__ip6_rt_update_pmtu() and rt6_do_redirect can directly
use dst_dev_rcu() in sections already holding rcu_read_lock().

Small changes to use dst_dev_net_rcu() in
ip6_default_advmss(), ipv6_sock_ac_join(),
ip6_mc_find_dev() and ndisc_send_skb().

Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250828195823.3958522-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: dst: introduce dst->dev_rcu
Eric Dumazet [Thu, 28 Aug 2025 19:58:16 +0000 (19:58 +0000)] 
net: dst: introduce dst->dev_rcu

Followup of commit 88fe14253e18 ("net: dst: add four helpers
to annotate data-races around dst->dev").

We want to gradually add explicit RCU protection to dst->dev,
including lockdep support.

Add a union to alias dst->dev_rcu and dst->dev.

Add dst_dev_net_rcu() helper.

Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250828195823.3958522-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'inet_diag-make-dumps-faster-with-simple-filters'
Jakub Kicinski [Sat, 30 Aug 2025 02:29:26 +0000 (19:29 -0700)] 
Merge branch 'inet_diag-make-dumps-faster-with-simple-filters'

Eric Dumazet says:

====================
inet_diag: make dumps faster with simple filters

inet_diag_bc_sk() pulls five cache lines per socket,
while most filters only need the first two.

We can change it to only pull needed cache lines,
to make things like "ss -temoi src :21456" much faster.

First patches (1-3) are annotating data-races as a first step.
====================

Link: https://patch.msgid.link/20250828102738.2065992-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoinet_diag: avoid cache line misses in inet_diag_bc_sk()
Eric Dumazet [Thu, 28 Aug 2025 10:27:38 +0000 (10:27 +0000)] 
inet_diag: avoid cache line misses in inet_diag_bc_sk()

inet_diag_bc_sk() pulls five cache lines per socket,
while most filters only need the first two.

Add three booleans to struct inet_diag_dump_data,
that are selectively set if a filter needs specific socket fields.

- mark_needed       /* INET_DIAG_BC_MARK_COND present. */
- cgroup_needed     /* INET_DIAG_BC_CGROUP_COND present. */
- userlocks_needed  /* INET_DIAG_BC_AUTO present. */

This removes millions of cache line misses per ss invocation
when simple filters are specified on busy servers.

offsetof(struct sock, sk_userlocks) = 0xf3
offsetof(struct sock, sk_mark) = 0x20c
offsetof(struct sock, sk_cgrp_data) = 0x298
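
To see why these three fields are costly, the offsets above can be
mapped to cache line indices. The sketch assumes 64-byte cache lines,
which the commit message does not state, and uses hypothetical names
(enum bc_cond, struct dump_flags, scan_filter()) to mimic how a filter
could be scanned once so that only the needed fields are read later.

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64	/* assumption: typical x86-64 line size */

/* Index of the cache line that holds a given byte offset. */
static size_t cache_line_of(size_t offset)
{
	return offset / CACHE_LINE_SIZE;
}

/* Hypothetical condition kinds standing in for the filter opcodes. */
enum bc_cond { COND_MARK, COND_CGROUP, COND_AUTO, COND_OTHER };

/* Mirrors the three booleans described in the commit message. */
struct dump_flags {
	bool mark_needed;
	bool cgroup_needed;
	bool userlocks_needed;
};

/* Scan the filter once; per-socket code then reads only what is set. */
static void scan_filter(const enum bc_cond *conds, size_t n,
			struct dump_flags *f)
{
	for (size_t i = 0; i < n; i++) {
		switch (conds[i]) {
		case COND_MARK:
			f->mark_needed = true;
			break;
		case COND_CGROUP:
			f->cgroup_needed = true;
			break;
		case COND_AUTO:
			f->userlocks_needed = true;
			break;
		default:
			break;
		}
	}
}
```

With 64-byte lines, offsets 0xf3, 0x20c and 0x298 fall on three different
cache lines, so skipping an unneeded field skips a whole line.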

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250828102738.2065992-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoinet_diag: change inet_diag_bc_sk() first argument
Eric Dumazet [Thu, 28 Aug 2025 10:27:37 +0000 (10:27 +0000)] 
inet_diag: change inet_diag_bc_sk() first argument

We want to have access to the inet_diag_dump_data structure
in the following patch.

This patch removes duplication in callers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250828102738.2065992-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoinet_diag: annotate data-races in inet_diag_bc_sk()
Eric Dumazet [Thu, 28 Aug 2025 10:27:36 +0000 (10:27 +0000)] 
inet_diag: annotate data-races in inet_diag_bc_sk()

inet_diag_bc_sk() runs with an unlocked socket,
annotate potential races with READ_ONCE().
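
For readers unfamiliar with the annotation, a simplified userspace
approximation of READ_ONCE() is shown below. It is not the kernel macro
(which has extra type checks), it relies on the GNU __typeof__
extension, and struct fake_sock_fields and snapshot_mark() are
hypothetical. The volatile access asks the compiler to perform exactly
one load, so a value changed concurrently is read once rather than
re-fetched or torn by the optimizer.

```c
/* Simplified stand-in for the kernel's READ_ONCE(); GNU C only. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical subset of socket fields read without the socket lock. */
struct fake_sock_fields {
	unsigned int mark;
	unsigned char userlocks;
};

/* Take a single snapshot of a field that a writer may be updating. */
static unsigned int snapshot_mark(struct fake_sock_fields *sk)
{
	return READ_ONCE_SKETCH(sk->mark);
}
```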

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250828102738.2065992-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agotcp: annotate data-races in tcp_req_diag_fill()
Eric Dumazet [Thu, 28 Aug 2025 10:27:35 +0000 (10:27 +0000)] 
tcp: annotate data-races in tcp_req_diag_fill()

req->num_retrans and rsk_timer.expires are read locklessly,
and can be changed from tcp_rtx_synack().

Add READ_ONCE()/WRITE_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250828102738.2065992-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoinet_diag: annotate data-races in inet_diag_msg_common_fill()
Eric Dumazet [Thu, 28 Aug 2025 10:27:34 +0000 (10:27 +0000)] 
inet_diag: annotate data-races in inet_diag_msg_common_fill()

inet_diag_msg_common_fill() can run without socket lock.
Add READ_ONCE() or data_race() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250828102738.2065992-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agomicrochip: lan865x: add ndo_eth_ioctl handler to enable PHY ioctl support
Parthiban Veerasooran [Thu, 28 Aug 2025 11:45:49 +0000 (17:15 +0530)] 
microchip: lan865x: add ndo_eth_ioctl handler to enable PHY ioctl support

Introduce support for standard MII ioctl operations in the LAN865x
Ethernet driver by implementing the .ndo_eth_ioctl callback. This allows
PHY-related ioctl commands to be handled via phy_do_ioctl_running() and
enables support for ethtool and other user-space tools that rely on the
ioctl interface to perform PHY register access using commands like
SIOCGMIIREG and SIOCSMIIREG.

This feature enables improved diagnostics and PHY configuration
capabilities from userspace.

Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250828114549.46116-1-parthiban.veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agovsock/test: Remove redundant semicolons
Liao Yuanhong [Thu, 28 Aug 2025 08:39:38 +0000 (16:39 +0800)] 
vsock/test: Remove redundant semicolons

Remove unnecessary semicolons.

Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250828083938.400872-1-liaoyuanhong@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agopppoe: drop sock reference counting on fast path
Qingfang Deng [Thu, 28 Aug 2025 01:20:17 +0000 (09:20 +0800)] 
pppoe: drop sock reference counting on fast path

Now that PPPoE sockets are freed via RCU (SOCK_RCU_FREE), it is no longer
necessary to take a reference count when looking up sockets on the receive
path. Readers are protected by RCU, so the socket memory remains valid
until after a grace period.

Convert fast-path lookups to avoid refcounting:
 - Replace get_item() and sk_receive_skb() in pppoe_rcv() with
   __get_item() and __sk_receive_skb().
 - Rework get_item_by_addr() into __get_item_by_addr() (no refcount and
   move RCU lock into pppoe_ioctl)
 - Remove unnecessary sock_put() calls.

This avoids cacheline bouncing from atomic reference counting and improves
performance on the receive fast path.

Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250828012018.15922-2-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agopppoe: remove rwlock usage
Qingfang Deng [Thu, 28 Aug 2025 01:20:16 +0000 (09:20 +0800)] 
pppoe: remove rwlock usage

Like ppp_generic.c, convert the PPPoE socket hash table to use RCU for
lookups and a spinlock for updates. This removes rwlock usage and allows
lockless readers on the fast path.

- Mark hash table and list pointers as __rcu.
- Use spin_lock() to protect writers.
- Readers use rcu_dereference() under rcu_read_lock(). All known callers
  of get_item() already hold the RCU read lock, so no additional locking
  is needed.
- get_item() now uses refcount_inc_not_zero() instead of sock_hold() to
  safely take a reference. This prevents crashes if a socket is already
  in the process of being freed (sk_refcnt == 0).
- Set SOCK_RCU_FREE to defer socket freeing until after an RCU grace
  period.
- Move skb_queue_purge() into sk_destruct callback to ensure purge
  happens after an RCU grace period.
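
As a rough userspace illustration of the "take a reference only if the
count is not already zero" idea behind refcount_inc_not_zero(), here is a
minimal sketch using C11 atomics. The struct and function names are
hypothetical and this is not the kernel's refcount_t implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an object with a reference count. */
struct obj {
	atomic_int refcnt;
};

/*
 * Take a reference only while the count is still non-zero.
 * Returns false if the object is already on its way to being freed.
 */
bool obj_get_unless_zero(struct obj *o)
{
	int old;

	assert(o);
	old = atomic_load(&o->refcnt);
	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}
```

An unconditional increment would instead bump a count of zero back to
one, handing the caller an object that is already being torn down.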

Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250828012018.15922-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 12 Jun 2025 17:08:24 +0000 (10:08 -0700)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.17-rc4).

No conflicts.

Adjacent changes:

drivers/net/ethernet/intel/idpf/idpf_txrx.c
  02614eee26fb ("idpf: do not linearize big TSO packets")
  6c4e68480238 ("idpf: remove obsolete stashing code")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge tag 'net-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Fri, 29 Aug 2025 00:35:51 +0000 (17:35 -0700)] 
Merge tag 'net-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from Bluetooth.

  Current release - regressions:

    - ipv4: fix regression in local-broadcast routes

    - vsock: fix error-handling regression introduced in v6.17-rc1

  Previous releases - regressions:

    - bluetooth:
        - mark connection as closed during suspend disconnect
        - fix set_local_name race condition

    - eth:
        - ice: fix NULL pointer dereference on reset
        - mlx5: fix memory leak in hws_pool_buddy_init error path
        - bnxt_en: fix stats context reservation logic
        - hv: fix loss of receive events from host during channel open

  Previous releases - always broken:

    - page_pool: fix incorrect mp_ops error handling

    - sctp: initialize more fields in sctp_v6_from_sk()

    - eth:
        - octeontx2-vf: fix max packet length errors
        - idpf: fix Tx flow scheduling to avoid Tx timeouts
        - bnxt_en: fix memory corruption during ifdown
        - ice: fix incorrect counter for buffer allocation failures
        - mlx5: fix lockdep assertion on sync reset unload event
        - fbnic: fixup rtnl_lock and devl_lock handling
        - xgmac: do not enable RX FIFO overflow interrupts

    - phy: mscc: fix when the PTP clock is registered and unregistered

  Misc:

    - add Telit Cinterion LE910C4-WWX new compositions"

* tag 'net-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (60 commits)
  net: ipv4: fix regression in local-broadcast routes
  net: macb: Disable clocks once
  fbnic: Move phylink resume out of service_task and into open/close
  fbnic: Fixup rtnl_lock and devl_lock handling related to mailbox code
  net: rose: fix a typo in rose_clear_routes()
  l2tp: do not use sock_hold() in pppol2tp_session_get_sock()
  sctp: initialize more fields in sctp_v6_from_sk()
  MAINTAINERS: rmnet: Update email addresses
  net: rose: include node references in rose_neigh refcount
  net: rose: convert 'use' field to refcount_t
  net: rose: split remove and free operations in rose_remove_neigh()
  net: hv_netvsc: fix loss of early receive events from host during channel open.
  net: stmmac: Set CIC bit only for TX queues with COE
  net: stmmac: xgmac: Correct supported speed modes
  net: stmmac: xgmac: Do not enable RX FIFO Overflow interrupts
  net/mlx5e: Set local Xoff after FW update
  net/mlx5e: Update and set Xon/Xoff upon port speed set
  net/mlx5e: Update and set Xon/Xoff upon MTU set
  net/mlx5: Prevent flow steering mode changes in switchdev mode
  net/mlx5: Nack sync reset when SFs are present
  ...

7 weeks agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
Jakub Kicinski [Thu, 28 Aug 2025 23:59:21 +0000 (16:59 -0700)] 
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
ice: split ice_virtchnl.c git-blame friendly way

Przemek Kitszel says:

Split ice_virtchnl.c into two more files (+headers), in a way
that git-blame works better.
Then move virtchnl files into a new subdir.
No logic changes.

I have developed (or discovered ;)) how to split a file in a way that
both old and new are nice in terms of git-blame.
There was not much discussion on [RFC], so I would like to propose
to go forward with this approach.

There are more commits needed to have it nice, so it forms a git-log vs
git-blame tradeoff, but (after the brief moment that this is on the top)
we spend orders of magnitude more time looking at the blame output (and
commit messages linked from that) - so I find it much better to see
actual logic changes instead of "move xx to yy" stuff (typical for
"squashed/single-commit splits").

Cherry-picks/rebases work the same with this method as with the simple
"squashed/single-commit" approach (literally all commits squashed into
one, to have better git-log but shitty git-blame output).

Rationale for the split itself is, as usual, "file is big and we want to
extend it".

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: finish virtchnl.c split into rss.c
  ice: extract virt/rss.c: cleanup - p2
  ice: extract virt/rss.c: cleanup - p1
  ice: split RSS stuff out of virtchnl.c - copy back
  ice: split RSS stuff out of virtchnl.c - tmp rename
  ice: finish virtchnl.c split into queues.c
  ice: extract virt/queues.c: cleanup - p3
  ice: extract virt/queues.c: cleanup - p2
  ice: extract virt/queues.c: cleanup - p1
  ice: split queue stuff out of virtchnl.c - copy back
  ice: split queue stuff out of virtchnl.c - tmp rename
  ice: add virt/ and move ice_virtchnl* files there
====================

Link: https://patch.msgid.link/20250827224641.415806-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoeth: mlx5: remove Kconfig co-dependency with VXLAN
Jakub Kicinski [Wed, 27 Aug 2025 23:43:19 +0000 (16:43 -0700)] 
eth: mlx5: remove Kconfig co-dependency with VXLAN

mlx5 has a Kconfig co-dependency on VXLAN, even though it doesn't
call any VXLAN function (unlike mlxsw). Perhaps this dates back
to very old days when tunnel ports were fetched directly from
VXLAN.

Remove the dependency to allow MLX5=y + VXLAN=m kernel configs.
But still avoid compiling in the lib/vxlan code if VXLAN=n.

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://patch.msgid.link/20250827234319.3504852-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: mdio: clean up c22/c45 accessor split
Russell King (Oracle) [Wed, 27 Aug 2025 13:27:47 +0000 (14:27 +0100)] 
net: stmmac: mdio: clean up c22/c45 accessor split

The C45 accessors were setting the GR (register number) field twice,
once with the 16-bit register address truncated to five bits, and
then overwritten with the C45 devad. This is harmless since the field
was being cleared prior to being updated with the C45 devad, except
for the extra work.

Remove the redundant code.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/E1urGBn-00000000DCH-3swS@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'net_sched-extend-rcu-use-in-dump-methods-ii'
Jakub Kicinski [Thu, 28 Aug 2025 23:46:25 +0000 (16:46 -0700)] 
Merge branch 'net_sched-extend-rcu-use-in-dump-methods-ii'

Eric Dumazet says:

====================
net_sched: extend RCU use in dump() methods (II)

Second series adding RCU dump() to three actions

First patch removes BH blocking on modules done in the first series.
====================

Link: https://patch.msgid.link/20250827125349.3505302-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet_sched: act_skbmod: use RCU in tcf_skbmod_dump()
Eric Dumazet [Wed, 27 Aug 2025 12:53:49 +0000 (12:53 +0000)] 
net_sched: act_skbmod: use RCU in tcf_skbmod_dump()

Also storing tcf_action into struct tcf_skbmod_params
makes sure there is no discrepancy in tcf_skbmod_act().

No longer block BH in tcf_skbmod_init() when acquiring tcf_lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250827125349.3505302-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet_sched: act_tunnel_key: use RCU in tunnel_key_dump()
Eric Dumazet [Wed, 27 Aug 2025 12:53:48 +0000 (12:53 +0000)] 
net_sched: act_tunnel_key: use RCU in tunnel_key_dump()

Also storing tcf_action into struct tcf_tunnel_key_params
makes sure there is no discrepancy in tunnel_key_act().

No longer block BH in tunnel_key_init() when acquiring tcf_lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250827125349.3505302-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet_sched: act_vlan: use RCU in tcf_vlan_dump()
Eric Dumazet [Wed, 27 Aug 2025 12:53:47 +0000 (12:53 +0000)] 
net_sched: act_vlan: use RCU in tcf_vlan_dump()

Also storing tcf_action into struct tcf_vlan_params
makes sure there is no discrepancy in tcf_vlan_act().

No longer block BH in tcf_vlan_init() when acquiring tcf_lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250827125349.3505302-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet_sched: remove BH blocking in eight actions
Eric Dumazet [Wed, 27 Aug 2025 12:53:46 +0000 (12:53 +0000)] 
net_sched: remove BH blocking in eight actions

Followup of f45b45cbfae3 ("Merge branch
'net_sched-act-extend-rcu-use-in-dump-methods'")

We never grab tcf_lock from BH context in these modules:

 act_connmark
 act_csum
 act_ct
 act_ctinfo
 act_mpls
 act_nat
 act_pedit
 act_skbedit

No longer block BH when acquiring tcf_lock from init functions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250827125349.3505302-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: minor cleanups to stmmac_bus_clks_config()
Russell King (Oracle) [Wed, 27 Aug 2025 08:54:51 +0000 (09:54 +0100)] 
net: stmmac: minor cleanups to stmmac_bus_clks_config()

stmmac_bus_clks_config() doesn't need to repeatedly dereference
priv->plat as this remains the same throughout this function. Not only
does this detract from the function's readability, but it could cause
the value to be reloaded each time. Use a local variable.

Also, the final return can simply return zero, and we can dispense
with the initialiser for 'ret'.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/E1urBvf-000000002ii-37Ce@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: mdio: use netdev_priv() directly
Russell King (Oracle) [Wed, 27 Aug 2025 08:41:48 +0000 (09:41 +0100)] 
net: stmmac: mdio: use netdev_priv() directly

netdev_priv() is an inline function, taking a struct net_device
pointer. When passing in the MII bus->priv, which is a void pointer,
there is no need to go via a local ndev variable to type it first.

Thus, instead of:

struct net_device *ndev = bus->priv;
struct stmmac_priv *priv;
...
priv = netdev_priv(ndev);

we can simply do:

struct stmmac_priv *priv = netdev_priv(bus->priv);

which simplifies the code.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/E1urBj2-000000002as-0pod@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: phy: mtk-2p5ge: Add LED support for MT7988
Sky Huang [Wed, 27 Aug 2025 04:47:55 +0000 (12:47 +0800)] 
net: phy: mtk-2p5ge: Add LED support for MT7988

Add LED support for MT7988's built-in 2.5Gphy. The LED hardware has almost
the same design as MT7981's/MT7988's built-in GbE, so hook the same
helper function here.

Before mtk_phy_leds_state_init(), set correct default values of LED0
and LED1.

Signed-off-by: Sky Huang <skylake.huang@mediatek.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250827044755.3256991-1-SkyLake.Huang@mediatek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge tag 'pm-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Thu, 28 Aug 2025 23:34:32 +0000 (16:34 -0700)] 
Merge tag 'pm-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
 "Add missing locking annotations to two recently introduced
  list_for_each_entry_rcu() loops in the core device suspend/resume
  code (Johannes Berg)"

* tag 'pm-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: sleep: annotate RCU list iterations

7 weeks agoselftests: drv-net: rss_ctx: fix the queue count check
Jakub Kicinski [Wed, 27 Aug 2025 17:35:58 +0000 (10:35 -0700)] 
selftests: drv-net: rss_ctx: fix the queue count check

Commit 0d6ccfe6b319 ("selftests: drv-net: rss_ctx: check for all-zero keys")
added a skip exception if the NIC has fewer than 3 queues enabled,
but it's just constructing the object, it's not actually raising
this exception.

Before:

  # Exception| net.lib.py.utils.CmdExitFailure: Command failed: ethtool -X enp1s0 equal 3 hkey d1:cc:77:47:9d:ea:15:f2:b9:6c:ef:68:62:c0:45:d5:b0:99:7d:cf:29:53:40:06:3d:8e:b9:bc:d4:70:89:b8:8d:59:04:ea:a9:c2:21:b3:55:b8:ab:6b:d9:48:b4:bd:4c:ff:a5:f0:a8:c2
  not ok 1 rss_ctx.test_rss_key_indir

After:

  ok 1 rss_ctx.test_rss_key_indir # SKIP Device has fewer than 3 queues (or doesn't support queue stats)

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250827173558.3259072-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'devmem-io_uring-allow-more-flexibility-for-zc-dma-devices'
Jakub Kicinski [Thu, 28 Aug 2025 23:05:34 +0000 (16:05 -0700)] 
Merge branch 'devmem-io_uring-allow-more-flexibility-for-zc-dma-devices'

Dragos Tatulea says:

====================
devmem/io_uring: allow more flexibility for ZC DMA devices

For TCP zerocopy rx (io_uring, devmem), there is an assumption that the
parent device can do DMA. However that is not always the case:
- Scalable Function netdevs [1] have the DMA device in the grandparent.
- For Multi-PF netdevs [2] queues can be associated to different DMA
  devices.

The series adds an API for getting the DMA device for a netdev queue.
Drivers that have special requirements can implement the newly added
queue management op. Otherwise the parent will still be used as before.
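
The "optional op with parent fallback" pattern described above could look
roughly like the following userspace sketch. All struct, field and
function names here are hypothetical stand-ins chosen for illustration;
they are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device with a parent pointer. */
struct device {
	struct device *parent;
};

struct netdev;

/* Hypothetical per-queue ops table with an optional DMA device getter. */
struct queue_ops {
	struct device *(*get_dma_dev)(struct netdev *nd, int idx);
};

struct netdev {
	struct device dev;
	const struct queue_ops *qops;
};

/* Use the driver's op when it is provided, else fall back to the parent. */
struct device *queue_dma_dev(struct netdev *nd, int idx)
{
	assert(nd);
	if (nd->qops && nd->qops->get_dma_dev)
		return nd->qops->get_dma_dev(nd, idx);
	return nd->dev.parent;
}

/* Example driver op: the DMA device lives in the grandparent. */
struct device *grandparent_dma_dev(struct netdev *nd, int idx)
{
	(void)idx;	/* same device for every queue in this example */
	return nd->dev.parent->parent;
}
```

A netdev with no ops table keeps resolving to its parent, so callers only
see a difference for drivers that install the getter.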

This series continues with switching to this API for io_uring zcrx and
devmem and adds a ndo_queue_dma_dev op for mlx5.

The last part of the series changes devmem rx bind to get the DMA device
per queue and blocks the case when multiple queues use different DMA
devices. The tx bind is left as is.

[1] Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst
[2] Documentation/networking/multi-pf-netdev.rst
====================

Link: https://patch.msgid.link/20250827144017.1529208-2-dtatulea@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: devmem: allow binding on rx queues with same DMA devices
Dragos Tatulea [Wed, 27 Aug 2025 14:40:01 +0000 (17:40 +0300)] 
net: devmem: allow binding on rx queues with same DMA devices

Multi-PF netdevs have queues belonging to different PFs which also means
different DMA devices. This means that the binding on the DMA buffer can
be done to the incorrect device.

This change allows devmem binding to multiple queues only when the
queues have the same DMA device. Otherwise an error is returned.
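
A minimal userspace sketch of the "all selected queues must share one DMA
device" check follows. The types, the 64-queue bitmap and the NULL return
for a mismatch are assumptions made for illustration, not the kernel
code or its actual error value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DMA-capable device. */
struct dma_dev {
	int id;
};

/*
 * Return the DMA device shared by every queue whose bit is set in
 * 'bitmap', or NULL if no queue is selected or the queues disagree.
 * Assumes every entry of queue_dev[] is non-NULL.
 */
struct dma_dev *common_dma_dev(struct dma_dev *const *queue_dev,
			       unsigned int num_rxq, uint64_t bitmap)
{
	struct dma_dev *dev = NULL;

	assert(num_rxq <= 64);
	for (unsigned int i = 0; i < num_rxq; i++) {
		if (!(bitmap & (UINT64_C(1) << i)))
			continue;
		if (dev && queue_dev[i] != dev)
			return NULL;	/* mixed DMA devices: refuse the bind */
		dev = queue_dev[i];
	}
	return dev;
}
```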

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://patch.msgid.link/20250827144017.1529208-9-dtatulea@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: devmem: pre-read requested rx queues during bind
Dragos Tatulea [Wed, 27 Aug 2025 14:40:00 +0000 (17:40 +0300)] 
net: devmem: pre-read requested rx queues during bind

Instead of reading the requested rx queues after binding the buffer,
read the rx queues in advance in a bitmap and iterate over them when
needed.

This is a preparation for fetching the DMA device for each queue.

This patch has no functional changes besides adding an extra
rq index bounds check.
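
Collecting the requested queue indices into a bitmap up front, with a
bounds check on each index, might be sketched in userspace as below. The
function name, the single 64-bit word used as the bitmap and the -ERANGE
return value are assumptions for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RXQ 64	/* hypothetical limit: one 64-bit word is the bitmap */

/*
 * Record each requested rx queue index in *bitmap.
 * Returns 0 on success, or -ERANGE if an index is out of bounds.
 */
int collect_rxq_bitmap(const unsigned int *req, size_t n,
		       unsigned int num_rxq, uint64_t *bitmap)
{
	assert(num_rxq <= MAX_RXQ);

	*bitmap = 0;
	for (size_t i = 0; i < n; i++) {
		if (req[i] >= num_rxq)
			return -ERANGE;
		*bitmap |= UINT64_C(1) << req[i];
	}
	return 0;
}
```

Later steps can then iterate over the set bits instead of re-parsing the
request, and duplicate indices collapse into a single bit.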

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250827144017.1529208-8-dtatulea@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: devmem: pull out dma_dev out of net_devmem_bind_dmabuf
Dragos Tatulea [Wed, 27 Aug 2025 14:39:59 +0000 (17:39 +0300)] 
net: devmem: pull out dma_dev out of net_devmem_bind_dmabuf

Fetch the DMA device before calling net_devmem_bind_dmabuf()
and pass it on as a parameter.

This is needed for an upcoming change which will read the
DMA device per queue.

This patch has no functional changes.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250827144017.1529208-7-dtatulea@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet/mlx5e: add op for getting netdev DMA device
Dragos Tatulea [Wed, 27 Aug 2025 14:39:58 +0000 (17:39 +0300)] 
net/mlx5e: add op for getting netdev DMA device

For zero-copy (devmem, io_uring), the netdev DMA device used
is the parent device of the net device. However that is not
always accurate for mlx5 devices:
- SFs: The parent device is an auxdev.
- Multi-PF netdevs: The DMA device should be determined by
  the queue.

This change implements the DMA device queue API that returns the DMA
device appropriately for all cases.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250827144017.1529208-6-dtatulea@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>