]> git.ipfire.org Git - thirdparty/kernel/linux.git/log
thirdparty/kernel/linux.git
3 months agoselftests/net: expand cmsg_ipv6.sh with ipv4
Willem de Bruijn [Tue, 25 Feb 2025 02:23:59 +0000 (21:23 -0500)] 
selftests/net: expand cmsg_ipv6.sh with ipv4

Expand IPV6_TCLASS to also cover IP_TOS.
Expand IPV6_HOPLIMIT to also cover IP_TTL.

Expand csmg_sender.c to allow setting IPv4 setsockopts.
Also rename struct v6 to cmsg to match its expanded scope.
Don't bother updating all occurrences of tclass and hoplimit.

Rename cmsg_ipv6.sh to cmsg_ip.sh to match the expanded scope.

Be careful around the subtle API difference between TCLASS and TOS.
IP_TOS includes ECN bits. Add a test to verify that these are masked
when making routing decisions.

Diff is more concise with --word-diff

Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250225022431.2083926-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoselftests/net: prepare cmsg_ipv6.sh for ipv4
Willem de Bruijn [Tue, 25 Feb 2025 02:23:58 +0000 (21:23 -0500)] 
selftests/net: prepare cmsg_ipv6.sh for ipv4

Move IPV6_TCLASS and IPV6_HOPLIMIT into loops, to be able to use them
for IP_TOS and IP_TTL in a follow-on patch.

Indentation in this file is a mix of four spaces and tabs for double
indents. To minimize code churn, maintain that pattern.

Very small diff if viewing with -w.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250225022431.2083926-2-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agotcp: be less liberal in TSEcr received while in SYN_RECV state
Eric Dumazet [Tue, 25 Feb 2025 17:10:48 +0000 (17:10 +0000)] 
tcp: be less liberal in TSEcr received while in SYN_RECV state

Yong-Hao Zou mentioned that linux was not strict as other OS in 3WHS,
for flows using TCP TS option (RFC 7323)

As hinted by an old comment in tcp_check_req(),
we can check the TSEcr value in the incoming packet corresponds
to one of the SYNACK TSval values we have sent.

In this patch, I record the oldest and most recent values
that SYNACK packets have used.

Send a challenge ACK if we receive a TSEcr outside
of this range, and increase a new SNMP counter.

nstat -az | grep TSEcrRejected
TcpExtTSEcrRejected            0                  0.0

Due to TCP fastopen implementation, do not apply yet these checks
for fastopen flows.

v2: No longer use req->num_timeout, but treq->snt_tsval_first
    to detect when first SYNACK is prepared. This means
    we make sure to not send an initial zero TSval.
    Make sure MPTCP and TCP selftests are passing.
    Change MIB name to TcpExtTSEcrRejected

v1: https://lore.kernel.org/netdev/CADVnQykD8i4ArpSZaPKaoNxLJ2if2ts9m4As+=Jvdkrgx1qMHw@mail.gmail.com/T/

Reported-by: Yong-Hao Zou <yonghaoz1994@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250225171048.3105061-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoenic: add dependency on Page Pool
John Daley [Mon, 24 Feb 2025 23:43:50 +0000 (15:43 -0800)] 
enic: add dependency on Page Pool

Driver was not configured to select page_pool, causing a compile error
if page pool module was not already selected.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202502211253.3XRosM9I-lkp@intel.com/
Fixes: d24cb52b2d8a ("enic: Use the Page Pool API for RX")
Reviewed-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250224234350.23157-2-johndale@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox...
Jakub Kicinski [Wed, 26 Feb 2025 02:40:04 +0000 (18:40 -0800)] 
Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Tariq Toukan says:

====================
mlx5-next updates 2025-02-24

The following pull-request contains common mlx5 updates
for your *net-next* tree.

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Change POOL_NEXT_SIZE define value and make it global
  net/mlx5: Add new health syndrome error and crr bit offset
====================

Link: https://patch.msgid.link/20250224212446.523259-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: wangxun: fix LIBWX dependencies
Arnd Bergmann [Mon, 24 Feb 2025 14:05:06 +0000 (15:05 +0100)] 
net: wangxun: fix LIBWX dependencies

Selecting LIBWX requires that its dependencies are met first:

WARNING: unmet direct dependencies detected for LIBWX
  Depends on [m]: NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_WANGXUN [=y] && PTP_1588_CLOCK_OPTIONAL [=m]
  Selected by [y]:
  - TXGBE [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_WANGXUN [=y] && PCI [=y] && COMMON_CLK [=y] && I2C_DESIGNWARE_PLATFORM [=y]
ld.lld-21: error: undefined symbol: ptp_schedule_worker
>>> referenced by wx_ptp.c:747 (/home/arnd/arm-soc/drivers/net/ethernet/wangxun/libwx/wx_ptp.c:747)
>>>               drivers/net/ethernet/wangxun/libwx/wx_ptp.o:(wx_ptp_reset) in archive vmlinux.a

Add the smae dependency on PTP_1588_CLOCK_OPTIONAL to the two driver
using this library module.

Fixes: 06e75161b9d4 ("net: wangxun: Add support for PTP clock")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250224140516.1168214-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'symmetric-or-xor-rss-hash'
Jakub Kicinski [Wed, 26 Feb 2025 02:31:07 +0000 (18:31 -0800)] 
Merge branch 'symmetric-or-xor-rss-hash'

Gal Pressman says:

====================
Symmetric OR-XOR RSS hash

Add support for a new type of input_xfrm: Symmetric OR-XOR.
Symmetric OR-XOR performs hash as follows:
(SRC_IP | DST_IP, SRC_IP ^ DST_IP, SRC_PORT | DST_PORT, SRC_PORT ^ DST_PORT)

Configuration is done through ethtool -x/X command.
For mlx5, the default is already symmetric hash, this patch now exposes
this to userspace and allows enabling/disabling of the feature.

v5: https://lore.kernel.org/20250220113435.417487-1-gal@nvidia.com
v4: https://lore.kernel.org/20250216182453.226325-1-gal@nvidia.com
v3: https://lore.kernel.org/20250205135341.542720-1-gal@nvidia.com
v2: https://lore.kernel.org/20250203150039.519301-1-gal@nvidia.com
====================

Link: https://patch.msgid.link/20250224174416.499070-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoselftests: drv-net-hw: Add a test for symmetric RSS hash
Gal Pressman [Mon, 24 Feb 2025 17:44:16 +0000 (19:44 +0200)] 
selftests: drv-net-hw: Add a test for symmetric RSS hash

Add a selftest that verifies symmetric RSS hash is working as intended.
The test runs iterations of traffic, swapping the src/dst UDP ports, and
verifies that the same RX queue is receiving the traffic in both cases.

Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250224174416.499070-5-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoselftests: drv-net: Make rand_port() get a port more reliably
Gal Pressman [Mon, 24 Feb 2025 17:44:15 +0000 (19:44 +0200)] 
selftests: drv-net: Make rand_port() get a port more reliably

Instead of guessing a port and checking whether it's available, get an
available port from the OS.

Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250224174416.499070-4-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet/mlx5e: Symmetric OR-XOR RSS hash control
Gal Pressman [Mon, 24 Feb 2025 17:44:14 +0000 (19:44 +0200)] 
net/mlx5e: Symmetric OR-XOR RSS hash control

Allow control over the symmetric RSS hash, which was previously set to
enabled by default by the driver.

Symmetric OR-XOR RSS can now be queried and controlled using the
'ethtool -x/X' command.

Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250224174416.499070-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoethtool: Symmetric OR-XOR RSS hash
Gal Pressman [Mon, 24 Feb 2025 17:44:13 +0000 (19:44 +0200)] 
ethtool: Symmetric OR-XOR RSS hash

Add an additional type of symmetric RSS hash type: OR-XOR.
The "Symmetric-OR-XOR" algorithm transforms the input as follows:

(SRC_IP | DST_IP, SRC_IP ^ DST_IP, SRC_PORT | DST_PORT, SRC_PORT ^ DST_PORT)

Change 'cap_rss_sym_xor_supported' to 'supported_input_xfrm', a bitmap
of supported RXH_XFRM_* types.

Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250224174416.499070-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodrivers: net: xgene: Don't use "proxy" headers
Andy Shevchenko [Mon, 24 Feb 2025 12:00:37 +0000 (14:00 +0200)] 
drivers: net: xgene: Don't use "proxy" headers

Update header inclusions to follow IWYU (Include What You Use)
principle.

In this case replace *gpio.h, which are subject to remove by the GPIOLIB
subsystem, with the respective headers that are being used by the driver.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20250224120037.3801609-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'net-stmmac-dwc-qos-clean-up-clock-initialisation'
Jakub Kicinski [Wed, 26 Feb 2025 02:17:56 +0000 (18:17 -0800)] 
Merge branch 'net-stmmac-dwc-qos-clean-up-clock-initialisation'

Russell King says:

====================
net: stmmac: dwc-qos: clean up clock initialisation

My single v1 patch has become two patches as a result of the build
error, as it appears this code uses "data" differently from others.
v2 still produced build warnings despite local builds being clean,
so v3 addresses those.

The first patch brings some consistency with other drivers, naming
local variables that refer to struct plat_stmmacenet_data as
"plat_dat", as is used elsewhere in this driver.
====================

Link: https://patch.msgid.link/Z7yj_BZa6yG02KcI@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: stmmac: dwc-qos: clean up clock initialisation
Russell King (Oracle) [Mon, 24 Feb 2025 16:53:15 +0000 (16:53 +0000)] 
net: stmmac: dwc-qos: clean up clock initialisation

Clean up the clock initialisation by providing a helper to find a
named clock in the bulk clocks, and provide the name of the stmmac
clock in match data so we can locate the stmmac clock in generic
code.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/E1tmbhj-004vSz-Pt@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: stmmac: dwc-qos: name struct plat_stmmacenet_data consistently
Russell King (Oracle) [Mon, 24 Feb 2025 16:53:10 +0000 (16:53 +0000)] 
net: stmmac: dwc-qos: name struct plat_stmmacenet_data consistently

Most of the stmmac driver uses "plat_dat" to name the platform data,
and "data" is used for the glue driver data. make dwc-qos follow
this pattern to avoid silly mistakes.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/E1tmbhe-004vSt-M3@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoAdd OVN to `rtnetlink.h`
Jonas Gottlieb [Mon, 24 Feb 2025 09:44:27 +0000 (10:44 +0100)] 
Add OVN to `rtnetlink.h`

- The Open Virtual Network (OVN) routing netlink handler uses ID 84
- Will also add to `/etc/iproute2/rt_protos` once this is accepted
- For more information: https://github.com/ovn-org/ovn

Signed-off-by: Jonas Gottlieb <jonas.gottlieb@stackit.cloud>
Link: https://patch.msgid.link/Z7w_e7cfA3xmHDa6@SIT-SDELAP4051.int.lidl.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoselftests/net: ensure mptcp is enabled in netns
Hangbin Liu [Mon, 24 Feb 2025 09:40:13 +0000 (09:40 +0000)] 
selftests/net: ensure mptcp is enabled in netns

Some distributions may not enable MPTCP by default. All other MPTCP tests
source mptcp_lib.sh to ensure MPTCP is enabled before testing. However,
the ip_local_port_range test is the only one that does not include this
step.

Let's also ensure MPTCP is enabled in netns for ip_local_port_range so
that it passes on all distributions.

Suggested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250224094013.13159-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoOcteontx2-af: RPM: Register driver with PCI subsys IDs
Hariprasad Kelam [Mon, 24 Feb 2025 03:56:03 +0000 (09:26 +0530)] 
Octeontx2-af: RPM: Register driver with PCI subsys IDs

Although the PCI device ID and Vendor ID for the RPM (MAC) block
have remained the same across Octeon CN10K and the next-generation
CN20K silicon, Hardware architecture has changed (NIX mapped RPMs
and RFOE Mapped RPMs).

Add PCI Subsystem IDs to the device table to ensure that this driver
can be probed from NIX mapped RPM devices only.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20250224035603.1220913-1-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agosctp: Remove unused payload from sctp_idatahdr
Thorsten Blum [Sun, 23 Feb 2025 20:45:07 +0000 (21:45 +0100)] 
sctp: Remove unused payload from sctp_idatahdr

Remove the unused payload array from the struct sctp_idatahdr.

Cc: Kees Cook <kees@kernel.org>
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20250223204505.2499-3-thorsten.blum@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agoMerge branch 'eth-fbnic-update-fbnic-driver'
Paolo Abeni [Tue, 25 Feb 2025 11:56:26 +0000 (12:56 +0100)] 
Merge branch 'eth-fbnic-update-fbnic-driver'

Mohsin Bashir says:

====================
eth: fbnic: Update fbnic driver

This patchset makes following trivial changes to the fbnic driver:
1) Add coverage for PCIe CSRs in the ethtool register dump.

2) Consolidate the PUL_USER CSR section, update the end boundary,
and remove redundant definition of the end boundary.

3) Update the return value in kdoc for fbnic_netdev_alloc().
====================

Link: https://patch.msgid.link/20250221201813.2688052-1-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agoeth: fbnic: Update return value in kdoc
Mohsin Bashir [Fri, 21 Feb 2025 20:18:13 +0000 (12:18 -0800)] 
eth: fbnic: Update return value in kdoc

Fix return value in kdoc for fbnic_netdev_alloc()

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agoeth: fbnic: Consolidate PUL_USER CSR section
Mohsin Bashir [Fri, 21 Feb 2025 20:18:12 +0000 (12:18 -0800)] 
eth: fbnic: Consolidate PUL_USER CSR section

Move PUL_USER CSRs in the relevant section, update the end boundary
address, and remove the redundant definition of end boundary.

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agoeth: fbnic: Add PCIe registers dump
Mohsin Bashir [Fri, 21 Feb 2025 20:18:11 +0000 (12:18 -0800)] 
eth: fbnic: Add PCIe registers dump

Provide coverage to PCIe registers in ethtool register dump

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agonet: phy: add phylib-internal.h
Heiner Kallweit [Sat, 22 Feb 2025 08:36:10 +0000 (09:36 +0100)] 
net: phy: add phylib-internal.h

This patch is a starting point for moving phylib-internal
declarations to a private header file.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/082eacd2-a888-4716-8797-b3491ce02820@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'mptcp-pm-misc-cleanups-part-3'
Jakub Kicinski [Tue, 25 Feb 2025 02:23:46 +0000 (18:23 -0800)] 
Merge branch 'mptcp-pm-misc-cleanups-part-3'

Matthieu Baerts says:

====================
mptcp: pm: misc cleanups, part 3

These cleanups lead the way to the unification of the path-manager
interfaces, and allow future extensions. The following patches are not
all linked to each others, but are all related to the path-managers,
except the last three.

- Patch 1: remove unused returned value in mptcp_nl_set_flags().
- Patch 2: new flag: avoid iterating over all connections if not needed.
- Patch 3: add a build check making sure there is enough space in cb-ctx.
- Patch 4: new mptcp_pm_genl_fill_addr helper to reduce duplicated code.
- Patch 5: simplify userspace_pm_append_new_local_addr helper.
- Patch 6: drop unneeded inet6_sk().
- Patch 7: use ipv6_addr_equal() instead of !ipv6_addr_cmp()
- Patch 8: scheduler: split an interface in two.
- Patch 9: scheduler: save 64 bytes of currently unused data.
- Patch 10: small optimisation to exit early in case of retransmissions.
====================

Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-0-2b70ab1cee79@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agomptcp: blackhole: avoid checking the state twice
Matthieu Baerts (NGI0) [Fri, 21 Feb 2025 15:44:03 +0000 (16:44 +0100)] 
mptcp: blackhole: avoid checking the state twice

A small cleanup, reordering the conditions to avoid checking things
twice.

The code here is called in case of timeout on a TCP connection, before
triggering a retransmission. But it only acts on SYN + MPC packets.

So the conditions can be re-order to exit early in case of non-MPTCP
SYN + MPC. This also reduce the indentation levels.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-10-2b70ab1cee79@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agomptcp: sched: reduce size for unused data
Matthieu Baerts (NGI0) [Fri, 21 Feb 2025 15:44:02 +0000 (16:44 +0100)] 
mptcp: sched: reduce size for unused data

Thanks for the previous commit ("mptcp: sched: split get_subflow
interface into two"), the mptcp_sched_data structure is now currently
unused.

This structure has been added to allow future extensions that are not
ready yet. At the end, this structure will not even be used at all when
mptcp_subflow bpf_iter will be supported [1].

Here is a first step to save 64 bytes on the stack for each scheduling
operation. The structure is not removed yet not to break the WIP work on
these extensions, but will be done when [1] will be ready and applied.

Link: https://lore.kernel.org/6645ad6e-8874-44c5-8730-854c30673218@linux.dev
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-9-2b70ab1cee79@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agomptcp: sched: split get_subflow interface into two
Geliang Tang [Fri, 21 Feb 2025 15:44:01 +0000 (16:44 +0100)] 
mptcp: sched: split get_subflow interface into two

get_retrans() interface of the burst packet scheduler invokes a sleeping
function mptcp_pm_subflow_chk_stale(), which calls __lock_sock_fast().
So get_retrans() interface should be set with BPF_F_SLEEPABLE flag in
BPF. But get_send() interface of this scheduler can't be set with
BPF_F_SLEEPABLE flag since it's invoked in ack_update_msk() under mptcp
data lock.

So this patch has to split get_subflow() interface of packet scheduer into
two interfaces: get_send() and get_retrans(). Then we can set get_retrans()
interface alone with BPF_F_SLEEPABLE flag.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-8-2b70ab1cee79@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agomptcp: pm: use ipv6_addr_equal in addresses_equal
Geliang Tang [Fri, 21 Feb 2025 15:44:00 +0000 (16:44 +0100)] 
mptcp: pm: use ipv6_addr_equal in addresses_equal

Use ipv6_addr_equal() to check whether two IPv6 addresses are equal in
mptcp_addresses_equal().

This is more appropriate than using !ipv6_addr_cmp().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-7-2b70ab1cee79@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agomptcp: pm: drop inet6_sk after inet_sk
Geliang Tang [Fri, 21 Feb 2025 15:43:59 +0000 (16:43 +0100)] 
mptcp: pm: drop inet6_sk after inet_sk

In mptcp_event_add_subflow(), mptcp_event_pm_listener() and
mptcp_nl_find_ssk(), 'issk' has already been got through inet_sk().

No need to use inet6_sk() to get 'ipv6_pinfo' again, just use
issk->pinet6 instead. This patch also drops these 'ipv6_pinfo'
variables.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-6-2b70ab1cee79@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agomptcp: pm: drop match in userspace_pm_append_new_local_addr
Geliang Tang [Fri, 21 Feb 2025 15:43:58 +0000 (16:43 +0100)] 
mptcp: pm: drop match in userspace_pm_append_new_local_addr

The variable 'match' in mptcp_userspace_pm_append_new_local_addr() is a
redundant one, and this patch drops it.

No need to define 'match' as 'struct mptcp_pm_addr_entry *' type. In this
function, it's only used to check whether it's NULL. It can be defined as
a Boolean one.

Also other variables 'addr_match' and 'id_match' make 'match' a redundant
one, which can be replaced by directly checking 'addr_match && id_match'.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-5-2b70ab1cee79@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agomptcp: pm: add mptcp_pm_genl_fill_addr helper
Geliang Tang [Fri, 21 Feb 2025 15:43:57 +0000 (16:43 +0100)] 
mptcp: pm: add mptcp_pm_genl_fill_addr helper

To save some redundant code in dump_addr() interfaces of both the
netlink PM and userspace PM, the code that calls netlink message
helpers (genlmsg_put/cancel/end) and mptcp_nl_fill_addr() is wrapped
into a new helper mptcp_pm_genl_fill_addr().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-4-2b70ab1cee79@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agomptcp: pm: add a build check for userspace_pm_dump_addr
Geliang Tang [Fri, 21 Feb 2025 15:43:56 +0000 (16:43 +0100)] 
mptcp: pm: add a build check for userspace_pm_dump_addr

This patch adds a build check for mptcp_userspace_pm_dump_addr() to make
sure there is enough space in 'cb->ctx' to store an address id bitmap.

Just in case info stored in 'cb->ctx' are increased later.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-3-2b70ab1cee79@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agomptcp: pm: change to fullmesh only for 'subflow'
Matthieu Baerts (NGI0) [Fri, 21 Feb 2025 15:43:55 +0000 (16:43 +0100)] 
mptcp: pm: change to fullmesh only for 'subflow'

If an endpoint doesn't have the 'subflow' flag -- in fact, has no type,
so not 'subflow', 'signal', nor 'implicit' -- there are then no subflows
created from this local endpoint to at least the initial destination
address. In this case, no need to call mptcp_pm_nl_fullmesh() which is
there to recreate the subflows to reflect the new value of the fullmesh
attribute.

Similarly, there is then no need to iterate over all connections to do
nothing, if only the 'fullmesh' flag has been changed, and the endpoint
doesn't have the 'subflow' one. So stop early when dealing with this
specific case.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-2-2b70ab1cee79@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agomptcp: pm: remove unused ret value to set flags
Matthieu Baerts (NGI0) [Fri, 21 Feb 2025 15:43:54 +0000 (16:43 +0100)] 
mptcp: pm: remove unused ret value to set flags

The returned value is not used, it can then be dropped.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-1-2b70ab1cee79@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet/mlx5: Use secs_to_jiffies() instead of msecs_to_jiffies()
Thorsten Blum [Fri, 21 Feb 2025 08:53:22 +0000 (09:53 +0100)] 
net/mlx5: Use secs_to_jiffies() instead of msecs_to_jiffies()

Use secs_to_jiffies() and simplify the code.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Link: https://patch.msgid.link/20250221085350.198024-3-thorsten.blum@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'net-mlx5e-move-ipsec-policy-check-after-decryption'
Jakub Kicinski [Tue, 25 Feb 2025 02:15:19 +0000 (18:15 -0800)] 
Merge branch 'net-mlx5e-move-ipsec-policy-check-after-decryption'

Tariq Toukan says:

====================
net/mlx5e: Move IPSec policy check after decryption

This series by Jianbo adds IPsec policy check after decryption.

In current mlx5 driver, the policy check is done before decryption for
IPSec crypto and packet offload. This series changes that order to
make it consistent with the processing in kernel xfrm. Besides, RX
state with UPSPEC selector is supported correctly after new steering
table is added after decryption and before the policy check.
====================

Link: https://patch.msgid.link/20250220213959.504304-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet/mlx5e: Support RX xfrm state selector's UPSPEC for packet offload
Jianbo Liu [Thu, 20 Feb 2025 21:39:58 +0000 (23:39 +0200)] 
net/mlx5e: Support RX xfrm state selector's UPSPEC for packet offload

Previously, the upper layer matches are added for the decryption rule
when xfrm selector's UPSPEC is specified in the command. However, it's
impossible as packets are not decrypted, and there is no way to do
match on the upper protocol (TCP/UDP) with specific source/destination
port. The result is that packets are not decrypted by hardware because
of this mismatch. Instead, they are forwarded to kernel, and
decryption is done by software.

To resolve this issue, this patch adds new table (sa_sel) after status
table and before policy table. When UPSPEC's proto is specified in
xfrm state's selector, a rule is added in status table to forward the
decrypted packets to sa_sel table, where the corresponding rule for
selector's UPSPEC is added, and packet's upper headers are checked
there. If matched, they will be forward to policy table to do policy
check. Otherwise, they are dropped immediately.

Besides, add a global count for this kind of packet drop.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250220213959.504304-9-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet/mlx5e: Add pass flow group for IPSec RX status table
Jianbo Liu [Thu, 20 Feb 2025 21:39:57 +0000 (23:39 +0200)] 
net/mlx5e: Add pass flow group for IPSec RX status table

This flow group is added for the pass rules for both crypto offload
and packet offload. It is placed at the end of the table, and right
before the miss group. There are two entries, and the default pass
rules for both offloads are added in this group.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250220213959.504304-8-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet/mlx5e: Add num_reserved_entries param for ipsec_ft_create()
Jianbo Liu [Thu, 20 Feb 2025 21:39:56 +0000 (23:39 +0200)] 
net/mlx5e: Add num_reserved_entries param for ipsec_ft_create()

Add parameter for ipsec_ft_create() to pass the number of the reserved
entries when creating auto-grouped flow table. It's used to create
table with pre-defined group(s) which may have more than one rule.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250220213959.504304-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet/mlx5e: Skip IPSec RX policy check for crypto offload
Jianbo Liu [Thu, 20 Feb 2025 21:39:55 +0000 (23:39 +0200)] 
net/mlx5e: Skip IPSec RX policy check for crypto offload

For crypto offload, there is no xfrm policy rule offloaded to
hardware, so no need to continue with policy check for it.

Previously, for crypto offload, the hardware metadata reg c4 is not
used and not changed, but set to ASO_OK(0) before decryption to avoid
garbage data. Then a default rule is added to check ipsec_syndrome and
this register. Packets are forwarded to policy table if succeed, or
drop if fails.

According to hardware document, this register value could be 0, 1.
So a special value (0xAA), which is not used by hardware, is chosen as
an indication for crypto offload. It is set to c4 before decryption.
Then a default rule, which matches on 0xAA (and ipsec_syndrome on 0),
is added, which means packets are done by crypto offload, and
sends them to kernel directly, thus skips the policy check.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250220213959.504304-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet/mlx5e: Move IPSec policy check after decryption
Jianbo Liu [Thu, 20 Feb 2025 21:39:54 +0000 (23:39 +0200)] 
net/mlx5e: Move IPSec policy check after decryption

Currently, xfrm policy check is done before decryption in mlx5 driver.
If matching any policy, packets are forwarded to xfrm state table for
decryption. But this is exact opposite to what software does. For
kernel implementation, xfrm decode is unconditionally activated
whenever an IPSec packet reaches the input flow if there’s a matching
state rule.

This patch changes the order, move policy check after decryption.
Besides, a miss flow table is added at the end for legacy mode, to
make it easier to update the default destination of the steering rules.

So ESP packets are firstly forwarded to SA table for decryption, then
the result is checked in status table. If the decryption succeeds,
packets are forwarded to another table to check xfrm policy rules.
When a policy with allow action is matched, if in legacy mode packets
are forwarded to miss flow table with one rule to forward them to RoCE
tables, if in switchdev mode they are forwarded directly to TC root
chain instead.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250220213959.504304-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet/mlx5e: Add correct match to check IPSec syndromes for switchdev mode
Jianbo Liu [Thu, 20 Feb 2025 21:39:53 +0000 (23:39 +0200)] 
net/mlx5e: Add correct match to check IPSec syndromes for switchdev mode

In commit dddb49b63d86 ("net/mlx5e: Add IPsec and ASO syndromes check
in HW"), IPSec and ASO syndromes checks after decryption for the
specified ASO object were added. But they are correct only for eswith
in legacy mode. For switchdev mode, metadata register c1 is used to
save the mapped id (not ASO object id). So, need to change the match
accordingly for the check rules in status table.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250220213959.504304-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet/mlx5e: Change the destination of IPSec RX SA miss rule
Jianbo Liu [Thu, 20 Feb 2025 21:39:52 +0000 (23:39 +0200)] 
net/mlx5e: Change the destination of IPSec RX SA miss rule

For eswitch in legacy mode, the packets decrypted in RX SA table will
continue to be processed for RoCE. But this is not necessary for the
un-decrypted packets, which don't match any decryption rules but hit
the miss rule at the end of the table. So, change the destination of
miss rule to TTC default one and skip RoCE.

For eswitch in switchdev mode, the destination is unchanged.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250220213959.504304-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet/mlx5e: Add helper function to update IPSec default destination
Jianbo Liu [Thu, 20 Feb 2025 21:39:51 +0000 (23:39 +0200)] 
net/mlx5e: Add helper function to update IPSec default destination

The default destination of IPSec steering rules for MPV mode will be
updated when the master device is brought up or down. Move the common
code into the helper function. It’s convenient to update destinations
in later patches.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250220213959.504304-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: wangxun: Replace the judgement of MAC type with flags
Jiawen Wu [Fri, 21 Feb 2025 06:57:18 +0000 (14:57 +0800)] 
net: wangxun: Replace the judgement of MAC type with flags

Since device MAC types are constantly being added, the judgments of
wx->mac.type are complex. Try to convert the types to flags depending
on functions.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20250221065718.197544-2-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: txgbe: Add basic support for new AML devices
Jiawen Wu [Fri, 21 Feb 2025 06:57:17 +0000 (14:57 +0800)] 
net: txgbe: Add basic support for new AML devices

There is a new 40/25/10 Gigabit Ethernet device.

To support basic functions, PHYLINK is temporarily skipped as it is
intended to implement these configurations in the firmware. And the
associated link IRQ is also skipped.

And Implement the new SW-FW interaction interface, which use 64 Byte
message buffer.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20250221065718.197544-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: ethernet: renesas: rcar_gen4_ptp: Remove bool conversion
Thorsten Blum [Sun, 23 Feb 2025 23:36:11 +0000 (00:36 +0100)] 
net: ethernet: renesas: rcar_gen4_ptp: Remove bool conversion

Remove the unnecessary bool conversion and simplify the code.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20250223233613.100518-2-thorsten.blum@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: Remove shadow variable in netdev_run_todo()
Breno Leitao [Fri, 21 Feb 2025 17:51:27 +0000 (09:51 -0800)] 
net: Remove shadow variable in netdev_run_todo()

Fix a shadow variable warning in net/core/dev.c when compiled with
CONFIG_LOCKDEP enabled. The warning occurs because 'dev' is redeclared
inside the while loop, shadowing the outer scope declaration.

net/core/dev.c:11211:22: warning: declaration shadows a local variable [-Wshadow]
struct net_device *dev = list_first_entry(&unlink_list,

net/core/dev.c:11202:21: note: previous declaration is here
struct net_device *dev, *tmp;

Remove the redundant declaration since the variable is already defined
in the outer scope and will be overwritten in the subsequent
list_for_each_entry_safe() loop anyway.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250221-netcons_fix_shadow-v1-1-dee20c8658dd@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'net-stmmac-thead-clean-up-clock-rate-setting'
Jakub Kicinski [Mon, 24 Feb 2025 22:29:59 +0000 (14:29 -0800)] 
Merge branch 'net-stmmac-thead-clean-up-clock-rate-setting'

Russell King says:

====================
net: stmmac: thead: clean up clock rate setting

This series cleans up the thead clock rate setting to use the
rgmii_clock() helper function added to phylib.

The first patch switches over to using the rgmii_clock() helper,
and the second patch cleans up the verification that the desired
clock rate is achievable, allowing the private clock rate
definitions to be removed.
====================

Tested-by: Drew Fustini <drew@pdp7.com>
Link: https://patch.msgid.link/Z7iKdaCp4hLWWgJ2@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: stmmac: thead: ensure divisor gives proper rate
Russell King (Oracle) [Fri, 21 Feb 2025 14:15:17 +0000 (14:15 +0000)] 
net: stmmac: thead: ensure divisor gives proper rate

thead was checking that the stmmac_clk rate was a multiple of the
RGMII rates for 1G and 100M, but didn't check for 10M. Rather than
use this with hard-coded speeds, check that the calculated divisor
gives the required rate by multplying the transmit clock rate back
up to the stmmac clock rate and checking that it agrees.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Drew Fustini <drew@pdp7.com>
Link: https://patch.msgid.link/E1tlToD-004W3g-HB@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: stmmac: thead: use rgmii_clock() for RGMII clock rate
Russell King (Oracle) [Fri, 21 Feb 2025 14:15:12 +0000 (14:15 +0000)] 
net: stmmac: thead: use rgmii_clock() for RGMII clock rate

Switch to using rgmii_clock() to get the RGMII TXC rate, and calculate
the divisor from the parent clock rate and the rate indicated by
rgmii_clock().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Drew Fustini <drew@pdp7.com>
Link: https://patch.msgid.link/E1tlTo8-004W3a-CO@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'net-remove-skb_flow_get_ports'
Jakub Kicinski [Mon, 24 Feb 2025 22:27:55 +0000 (14:27 -0800)] 
Merge branch 'net-remove-skb_flow_get_ports'

Nicolas Dichtel says:

====================
net: remove skb_flow_get_ports()

Remove skb_flow_get_ports() and rename __skb_flow_get_ports() to
skb_flow_get_ports().
====================

Link: https://patch.msgid.link/20250221110941.2041629-1-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: remove '__' from __skb_flow_get_ports()
Nicolas Dichtel [Fri, 21 Feb 2025 11:07:29 +0000 (12:07 +0100)] 
net: remove '__' from __skb_flow_get_ports()

Only one version of skb_flow_get_ports() exists after the previous commit,
so let's remove the useless '__'.

Suggested-by: Simon Horman <horms@kernel.org>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://patch.msgid.link/20250221110941.2041629-3-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoskbuff: kill skb_flow_get_ports()
Nicolas Dichtel [Fri, 21 Feb 2025 11:07:28 +0000 (12:07 +0100)] 
skbuff: kill skb_flow_get_ports()

Since commit a815bde56b15 ("net, bonding: Refactor bond_xmit_hash for use
with xdp_buff"), this function is not used anymore.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250221110941.2041629-2-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: stmmac: Correct usage of maximum queue number macros
Kunihiko Hayashi [Fri, 21 Feb 2025 05:18:18 +0000 (14:18 +0900)] 
net: stmmac: Correct usage of maximum queue number macros

The maximum numbers of each Rx and Tx queues are defined by
MTL_MAX_RX_QUEUES and MTL_MAX_TX_QUEUES respectively.

There are some places where Rx and Tx are used in reverse. There is no
issue when the Tx and Rx macros have the same value, but should correct
usage of macros for maximum queue number to keep consistency and prevent
unexpected mistakes.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Huacai Chen <chenhuacai@kernel.org>
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Link: https://patch.msgid.link/20250221051818.4163678-1-hayashi.kunihiko@socionext.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet-sysfs: restore behavior for not running devices
Eric Dumazet [Fri, 21 Feb 2025 05:12:23 +0000 (05:12 +0000)] 
net-sysfs: restore behavior for not running devices

modprobe dummy dumdummies=1

Old behavior :

$ cat /sys/class/net/dummy0/carrier
cat: /sys/class/net/dummy0/carrier: Invalid argument

After blamed commit, an empty string is reported.

$ cat /sys/class/net/dummy0/carrier
$

In this commit, I restore the old behavior for carrier,
speed and duplex attributes.

Fixes: 79c61899b5ee ("net-sysfs: remove rtnl_trylock from device attributes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Marco Leogrande <leogrande@google.com>
Reviewed-by: Antoine Tenart <atenart@kernel.org>
Link: https://patch.msgid.link/20250221051223.576726-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: stmmac: qcom-ethqos: use rgmii_clock() to set the link clock
Russell King (Oracle) [Fri, 21 Feb 2025 11:38:20 +0000 (11:38 +0000)] 
net: stmmac: qcom-ethqos: use rgmii_clock() to set the link clock

The link clock operates at twice the RGMII clock rate. Therefore, we
can use the rgmii_clock() helper to set this clock rate.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vinod Koul <vkoul@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1tlRMK-004Vsx-Ss@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agovirtio-net: tweak for better TX performance in NAPI mode
Jason Wang [Tue, 18 Feb 2025 02:39:08 +0000 (10:39 +0800)] 
virtio-net: tweak for better TX performance in NAPI mode

There are several issues existed in start_xmit():

- Transmitted packets need to be freed before sending a packet, this
  introduces delay and increases the average packets transmit
  time. This also increase the time that spent in holding the TX lock.
- Notification is enabled after free_old_xmit_skbs() which will
  introduce unnecessary interrupts if TX notification happens on the
  same CPU that is doing the transmission now (actually, virtio-net
  driver are optimized for this case).

So this patch tries to avoid those issues by not cleaning transmitted
packets in start_xmit() when TX NAPI is enabled and disable
notifications even more aggressively. Notification will be since the
beginning of the start_xmit(). But we can't enable delayed
notification after TX is stopped as we will lose the
notifications. Instead, the delayed notification needs is enabled
after the virtqueue is kicked for best performance.

Performance numbers:

1) single queue 2 vcpus guest with pktgen_sample03_burst_single_flow.sh
   (burst 256) + testpmd (rxonly) on the host:

- When pinning TX IRQ to pktgen VCPU: split virtqueue PPS were
  increased 55% from 6.89 Mpps to 10.7 Mpps and 32% TX interrupts were
  eliminated. Packed virtqueue PPS were increased 50% from 7.09 Mpps to
  10.7 Mpps, 99% TX interrupts were eliminated.

- When pinning TX IRQ to VCPU other than pktgen: split virtqueue PPS
  were increased 96% from 5.29 Mpps to 10.4 Mpps and 45% TX interrupts
  were eliminated; Packed virtqueue PPS were increased 78% from 6.12
  Mpps to 10.9 Mpps and 99% TX interrupts were eliminated.

2) single queue 1 vcpu guest + vhost-net/TAP on the host: single
   session netperf from guest to host shows 82% improvement from
   31Gb/s to 58Gb/s, %stddev were reduced from 34.5% to 1.9% and 88%
   of TX interrupts were eliminated.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 months agonet/mlx5: Change POOL_NEXT_SIZE define value and make it global
Patrisious Haddad [Wed, 19 Feb 2025 08:58:08 +0000 (10:58 +0200)] 
net/mlx5: Change POOL_NEXT_SIZE define value and make it global

Change POOL_NEXT_SIZE define value from 0 to BIT(30), since this define
is used to request the available maximum sized flow table, and zero doesn't
make sense for it, whereas some places in the driver use zero explicitly
expecting the smallest table size possible but instead due to this
define they end up allocating the biggest table size unawarely.

In addition move the definition to "include/linux/mlx5/fs.h" to expose the
define to IB driver as well, while appropriately renaming it.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250219085808.349923-3-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
3 months agonet/mlx5: Add new health syndrome error and crr bit offset
Shahar Shitrit [Wed, 19 Feb 2025 08:58:07 +0000 (10:58 +0200)] 
net/mlx5: Add new health syndrome error and crr bit offset

Add new error value for trust lockdown in health syndrome enum.
Also, include the offset for crr bit in the health buffer layout.

These changes prepare for downstream patches that update health
event handling.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250219085808.349923-2-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
3 months agoMerge branch 'mctp-add-mctp-over-usb-hardware-transport-binding'
Jakub Kicinski [Sat, 22 Feb 2025 00:45:26 +0000 (16:45 -0800)] 
Merge branch 'mctp-add-mctp-over-usb-hardware-transport-binding'

Jeremy Kerr says:

====================
mctp: Add MCTP-over-USB hardware transport binding

Add an implementation of the DMTF standard DSP0283, providing an MCTP
channel over high-speed USB.

This is a fairly trivial first implementation, in that we only submit
one tx and one rx URB at a time. We do accept multi-packet transfers,
but do not yet generate them on transmit.

Of course, questions and comments are most welcome, particularly on the
USB interfaces.

v2: https://lore.kernel.org/20250212-dev-mctp-usb-v2-0-76e67025d764@codeconstruct.com.au
v1: https://lore.kernel.org/20250206-dev-mctp-usb-v1-0-81453fe26a61@codeconstruct.com.au
====================

Link: https://patch.msgid.link/20250221-dev-mctp-usb-v3-0-3353030fe9cc@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: mctp: Add MCTP USB transport driver
Jeremy Kerr [Fri, 21 Feb 2025 00:56:58 +0000 (08:56 +0800)] 
net: mctp: Add MCTP USB transport driver

Add an implementation for DMTF DSP0283, which defines a MCTP-over-USB
transport. As per that spec, we're restricted to full speed mode,
requiring 512-byte transfers.

Each MCTP-over-USB interface is a peer-to-peer link to a single MCTP
endpoint, so no physical addressing is required (of course, that MCTP
endpoint may then bridge to further MCTP endpoints). Consequently,
interfaces will report with no lladdr data:

    # mctp link
    dev lo index 1 address 00:00:00:00:00:00 net 1 mtu 65536 up
    dev mctpusb0 index 6 address none net 1 mtu 68 up

This is a simple initial implementation, with single rx & tx urbs, and
no multi-packet tx transfers - although we do accept multi-packet rx
from the device.

Includes suggested fixes from Santosh Puranik <spuranik@nvidia.com>.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Cc: Santosh Puranik <spuranik@nvidia.com>
Link: https://patch.msgid.link/20250221-dev-mctp-usb-v3-2-3353030fe9cc@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agousb: Add base USB MCTP definitions
Jeremy Kerr [Fri, 21 Feb 2025 00:56:57 +0000 (08:56 +0800)] 
usb: Add base USB MCTP definitions

Upcoming changes will add a USB host (and later gadget) driver for the
MCTP-over-USB protocol. Add a header that provides common definitions
for protocol support: the packet header format and a few framing
definitions. Add a define for the MCTP class code, as per
https://usb.org/defined-class-codes.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/20250221-dev-mctp-usb-v3-1-3353030fe9cc@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: cadence: macb: Implement BQL
Sean Anderson [Thu, 20 Feb 2025 16:42:57 +0000 (11:42 -0500)] 
net: cadence: macb: Implement BQL

Implement byte queue limits to allow queuing disciplines to account for
packets enqueued in the ring buffer but not yet transmitted. There are a
separate set of transmit functions for AT91 that I haven't touched since
I don't have hardware to test on.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250220164257.96859-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: stmmac: print stmmac_init_dma_engine() errors using netdev_err()
Russell King (Oracle) [Thu, 20 Feb 2025 12:47:49 +0000 (12:47 +0000)] 
net: stmmac: print stmmac_init_dma_engine() errors using netdev_err()

stmmac_init_dma_engine() uses dev_err() which leads to errors being
reported as e.g:

dwc-eth-dwmac 2490000.ethernet: Failed to reset the dma
dwc-eth-dwmac 2490000.ethernet eth0: stmmac_hw_setup: DMA engine initialization failed

stmmac_init_dma_engine() is only called from stmmac_hw_setup() which
itself uses netdev_err(), and we will have a net_device setup. So,
change the dev_err() to netdev_err() to give consistent error
messages.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tl5y1-004UgG-8X@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoselftests: fib_nexthops: do not mark skipped tests as failed
Hangbin Liu [Thu, 20 Feb 2025 08:53:26 +0000 (08:53 +0000)] 
selftests: fib_nexthops: do not mark skipped tests as failed

The current test marks all unexpected return values as failed and sets ret
to 1. If a test is skipped, the entire test also returns 1, incorrectly
indicating failure.

To fix this, add a skipped variable and set ret to 4 if it was previously
0. Otherwise, keep ret set to 1.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250220085326.1512814-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'net-fib_rules-add-dscp-mask-support'
Jakub Kicinski [Sat, 22 Feb 2025 00:08:54 +0000 (16:08 -0800)] 
Merge branch 'net-fib_rules-add-dscp-mask-support'

Ido Schimmel says:

====================
net: fib_rules: Add DSCP mask support

In some deployments users would like to encode path information into
certain bits of the IPv6 flow label, the UDP source port and the DSCP
field and use this information to route packets accordingly.

Redirecting traffic to a routing table based on specific bits in the
DSCP field is not currently possible. Only exact match is currently
supported by FIB rules.

This patchset extends FIB rules to match on the DSCP field with an
optional mask.

Patches #1-#5 gradually extend FIB rules to match on the DSCP field with
an optional mask.

Patch #6 adds test cases for the new functionality.

iproute2 support can be found here [1].

[1] https://github.com/idosch/iproute2/tree/submit/fib_rule_mask_v1
====================

Link: https://patch.msgid.link/20250220080525.831924-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoselftests: fib_rule_tests: Add DSCP mask match tests
Ido Schimmel [Thu, 20 Feb 2025 08:05:25 +0000 (10:05 +0200)] 
selftests: fib_rule_tests: Add DSCP mask match tests

Add tests for FIB rules that match on DSCP with a mask. Test both good
and bad flows and both the input and output paths.

 # ./fib_rule_tests.sh
 IPv6 FIB rule tests
 [...]
    TEST: rule6 check: dscp redirect to table                           [ OK ]
    TEST: rule6 check: dscp no redirect to table                        [ OK ]
    TEST: rule6 del by pref: dscp redirect to table                     [ OK ]
    TEST: rule6 check: iif dscp redirect to table                       [ OK ]
    TEST: rule6 check: iif dscp no redirect to table                    [ OK ]
    TEST: rule6 del by pref: iif dscp redirect to table                 [ OK ]
    TEST: rule6 check: dscp masked redirect to table                    [ OK ]
    TEST: rule6 check: dscp masked no redirect to table                 [ OK ]
    TEST: rule6 del by pref: dscp masked redirect to table              [ OK ]
    TEST: rule6 check: iif dscp masked redirect to table                [ OK ]
    TEST: rule6 check: iif dscp masked no redirect to table             [ OK ]
    TEST: rule6 del by pref: iif dscp masked redirect to table          [ OK ]
 [...]

 Tests passed: 316
 Tests failed:   0

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-7-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonetlink: specs: Add FIB rule DSCP mask attribute
Ido Schimmel [Thu, 20 Feb 2025 08:05:24 +0000 (10:05 +0200)] 
netlink: specs: Add FIB rule DSCP mask attribute

Add new DSCP mask attribute to the spec. Example:

 # ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_rule.yaml \
 --do newrule \
 --json '{"family": 2, "dscp": 10, "dscp-mask": 63, "action": 1, "table": 1}'
 None
 $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_rule.yaml \
 --dump getrule --json '{"family": 2}' --output-json | jq '.[]'
 [...]
 {
   "table": 1,
   "suppress-prefixlen": "0xffffffff",
   "protocol": 0,
   "priority": 32765,
   "dscp": 10,
   "dscp-mask": "0x3f",
   "family": 2,
   "dst-len": 0,
   "src-len": 0,
   "tos": 0,
   "action": "to-tbl",
   "flags": 0
 }
 [...]

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-6-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: fib_rules: Enable DSCP mask usage
Ido Schimmel [Thu, 20 Feb 2025 08:05:23 +0000 (10:05 +0200)] 
net: fib_rules: Enable DSCP mask usage

Allow user space to configure FIB rules that match on DSCP with a mask,
now that support has been added to the IPv4 and IPv6 address families.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-5-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoipv6: fib_rules: Add DSCP mask matching
Ido Schimmel [Thu, 20 Feb 2025 08:05:22 +0000 (10:05 +0200)] 
ipv6: fib_rules: Add DSCP mask matching

Extend IPv6 FIB rules to match on DSCP using a mask. Unlike IPv4, also
initialize the DSCP mask when a non-zero 'tos' is specified as there is
no difference in matching between 'tos' and 'dscp'. As a side effect,
this makes it possible to match on 'dscp 0', like in IPv4.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoipv4: fib_rules: Add DSCP mask matching
Ido Schimmel [Thu, 20 Feb 2025 08:05:21 +0000 (10:05 +0200)] 
ipv4: fib_rules: Add DSCP mask matching

Extend IPv4 FIB rules to match on DSCP using a mask. The mask is only
set in rules that match on DSCP (not TOS) and initialized to cover the
entire DSCP field if the mask attribute is not specified.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: fib_rules: Add DSCP mask attribute
Ido Schimmel [Thu, 20 Feb 2025 08:05:20 +0000 (10:05 +0200)] 
net: fib_rules: Add DSCP mask attribute

Add an attribute that allows matching on DSCP with a mask. Matching on
DSCP with a mask is needed in deployments where users encode path
information into certain bits of the DSCP field.

Temporarily set the type of the attribute to 'NLA_REJECT' while support
is being added.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf...
Jakub Kicinski [Fri, 21 Feb 2025 23:59:47 +0000 (15:59 -0800)] 
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-02-20

We've added 19 non-merge commits during the last 8 day(s) which contain
a total of 35 files changed, 1126 insertions(+), 53 deletions(-).

The main changes are:

1) Add TCP_RTO_MAX_MS support to bpf_set/getsockopt, from Jason Xing

2) Add network TX timestamping support to BPF sock_ops, from Jason Xing

3) Add TX metadata Launch Time support, from Song Yoong Siang

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  igc: Add launch time support to XDP ZC
  igc: Refactor empty frame insertion for launch time support
  net: stmmac: Add launch time support to XDP ZC
  selftests/bpf: Add launch time request to xdp_hw_metadata
  xsk: Add launch time hardware offload support to XDP Tx metadata
  selftests/bpf: Add simple bpf tests in the tx path for timestamping feature
  bpf: Support selective sampling for bpf timestamping
  bpf: Add BPF_SOCK_OPS_TSTAMP_SENDMSG_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_ACK_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SND_HW_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SND_SW_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback
  net-timestamp: Prepare for isolating two modes of SO_TIMESTAMPING
  bpf: Disable unsafe helpers in TX timestamping callbacks
  bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback
  bpf: Prepare the sock_ops ctx and call bpf prog for TX timestamping
  bpf: Add networking timestamping support to bpf_get/setsockopt()
  selftests/bpf: Add rto max for bpf_setsockopt test
  bpf: Support TCP_RTO_MAX_MS for bpf_setsockopt
====================

Link: https://patch.msgid.link/20250221022104.386462-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agogve: Add RSS cache for non RSS device option scenario
Ziwei Xiao [Wed, 19 Feb 2025 20:04:51 +0000 (12:04 -0800)] 
gve: Add RSS cache for non RSS device option scenario

Not all the devices have the capability for the driver to query for the
registered RSS configuration. The driver can discover this by checking
the relevant device option during setup. If it cannot, the driver needs
to store the RSS config cache and directly return such cache when
queried by the ethtool. RSS config is inited when driver probes. Also the
default RSS config will be adjusted when there is RX queue count change.

At this point, only keys of GVE_RSS_KEY_SIZE and indirection tables of
GVE_RSS_INDIR_SIZE are supported.

Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Link: https://patch.msgid.link/20250219200451.3348166-1-jeroendb@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet/rds: Replace deprecated strncpy() with strscpy_pad()
Thorsten Blum [Wed, 19 Feb 2025 22:47:31 +0000 (23:47 +0100)] 
net/rds: Replace deprecated strncpy() with strscpy_pad()

strncpy() is deprecated for NUL-terminated destination buffers. Use
strscpy_pad() instead and remove the manual NUL-termination.

Compile-tested only.

Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Tested-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20250219224730.73093-2-thorsten.blum@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'net-improve-netns-handling-in-rtnetlink'
Jakub Kicinski [Fri, 21 Feb 2025 23:28:07 +0000 (15:28 -0800)] 
Merge branch 'net-improve-netns-handling-in-rtnetlink'

Xiao Liang says:

====================
net: Improve netns handling in rtnetlink

This patch series includes some netns-related improvements and fixes for
rtnetlink, to make link creation more intuitive:

 1) Creating link in another net namespace doesn't conflict with link
    names in current one.
 2) Refector rtnetlink link creation. Create link in target namespace
    directly.

So that

  # ip link add netns ns1 link-netns ns2 tun0 type gre ...

will create tun0 in ns1, rather than create it in ns2 and move to ns1.
And don't conflict with another interface named "tun0" in current netns.

Patch 01 avoids link name conflict in different netns.

To achieve 2), there're mainly 3 steps:

 - Patch 02 packs newlink() parameters into a struct, including
   the original "src_net" along with more netns context. No semantic
   changes are introduced.
 - Patch 03 ~ 09 converts device drivers to use the explicit netns
   extracted from params.
 - Patch 10 ~ 11 removes the old netns parameter, and converts
   rtnetlink to create device in target netns directly.

Patch 12 ~ 13 adds some tests for link name and link netns.

---

Please note there're some issues found in current code:

- In amt_newlink() drivers/net/amt.c:

    amt->net = net;
    ...
    amt->stream_dev = dev_get_by_index(net, ...

  Uses net, but amt_lookup_upper_dev() only searches in dev_net.
  So the AMT device may not be properly deleted if it's in a different
  netns from lower dev.

- In lowpan_newlink() in net/ieee802154/6lowpan/core.c:

    wdev = dev_get_by_index(dev_net(ldev), nla_get_u32(tb[IFLA_LINK]));

  Looks for IFLA_LINK in dev_net, but in theory the ifindex is defined
  in link netns.

And thanks to Kuniyuki for fixing related issues in gtp and pfcp:
https://lore.kernel.org/netdev/20250110014754.33847-1-kuniyu@amazon.com/

v9: https://lore.kernel.org/20250210133002.883422-1-shaw.leon@gmail.com
v8: https://lore.kernel.org/20250113143719.7948-1-shaw.leon@gmail.com
v7: https://lore.kernel.org/20250104125732.17335-1-shaw.leon@gmail.com
v6: https://lore.kernel.org/20241218130909.2173-1-shaw.leon@gmail.com
v5: https://lore.kernel.org/20241209140151.231257-1-shaw.leon@gmail.com
v4: https://lore.kernel.org/20241118143244.1773-1-shaw.leon@gmail.com
v3: https://lore.kernel.org/20241113125715.150201-1-shaw.leon@gmail.com
v2: https://lore.kernel.org/20241107133004.7469-1-shaw.leon@gmail.com
v1: https://lore.kernel.org/20241023023146.372653-1-shaw.leon@gmail.com
====================

Link: https://patch.msgid.link/20250219125039.18024-1-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoselftests: net: Add test cases for link and peer netns
Xiao Liang [Wed, 19 Feb 2025 12:50:39 +0000 (20:50 +0800)] 
selftests: net: Add test cases for link and peer netns

- Add test for creating link in another netns when a link of the same
   name and ifindex exists in current netns.
 - Add test to verify that link is created in target netns directly -
   no link new/del events should be generated in link netns or current
   netns.
 - Add test cases to verify that link-netns is set as expected for
   various drivers and combination of namespace-related parameters.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Link: https://patch.msgid.link/20250219125039.18024-14-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoselftests: net: Add python context manager for netns entering
Xiao Liang [Wed, 19 Feb 2025 12:50:38 +0000 (20:50 +0800)] 
selftests: net: Add python context manager for netns entering

Change netns of current thread and switch back on context exit.
For example:

    with NetNSEnter("ns1"):
        ip("link add dummy0 type dummy")

The command be executed in netns "ns1".

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Link: https://patch.msgid.link/20250219125039.18024-13-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agortnetlink: Create link directly in target net namespace
Xiao Liang [Wed, 19 Feb 2025 12:50:37 +0000 (20:50 +0800)] 
rtnetlink: Create link directly in target net namespace

Make rtnl_newlink_create() create device in target namespace directly.
Avoid extra netns change when link netns is provided.

Device drivers has been converted to be aware of link netns, that is not
assuming device netns is and link netns is the same when ops->newlink()
is called.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-12-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agortnetlink: Remove "net" from newlink params
Xiao Liang [Wed, 19 Feb 2025 12:50:36 +0000 (20:50 +0800)] 
rtnetlink: Remove "net" from newlink params

Now that devices have been converted to use the specific netns instead
of ambiguous "net", let's remove it from newlink parameters.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-11-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: xfrm: Use link netns in newlink() of rtnl_link_ops
Xiao Liang [Wed, 19 Feb 2025 12:50:35 +0000 (20:50 +0800)] 
net: xfrm: Use link netns in newlink() of rtnl_link_ops

When link_net is set, use it as link netns instead of dev_net(). This
prepares for rtnetlink core to create device in target netns directly,
in which case the two namespaces may be different.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-10-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: ipv6: Use link netns in newlink() of rtnl_link_ops
Xiao Liang [Wed, 19 Feb 2025 12:50:34 +0000 (20:50 +0800)] 
net: ipv6: Use link netns in newlink() of rtnl_link_ops

When link_net is set, use it as link netns instead of dev_net(). This
prepares for rtnetlink core to create device in target netns directly,
in which case the two namespaces may be different.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-9-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: ipv6: Init tunnel link-netns before registering dev
Xiao Liang [Wed, 19 Feb 2025 12:50:33 +0000 (20:50 +0800)] 
net: ipv6: Init tunnel link-netns before registering dev

Currently some IPv6 tunnel drivers set tnl->net to dev_net(dev) in
ndo_init(), which is called in register_netdevice(). However, it lacks
the context of link-netns when we enable cross-net tunnels at device
registration time.

Let's move the init of tunnel link-netns before register_netdevice().

ip6_gre has already initialized netns, so just remove the redundant
assignment.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-8-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: ip_tunnel: Use link netns in newlink() of rtnl_link_ops
Xiao Liang [Wed, 19 Feb 2025 12:50:32 +0000 (20:50 +0800)] 
net: ip_tunnel: Use link netns in newlink() of rtnl_link_ops

When link_net is set, use it as link netns instead of dev_net(). This
prepares for rtnetlink core to create device in target netns directly,
in which case the two namespaces may be different.

Convert common ip_tunnel_newlink() to accept an extra link netns
argument.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-7-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: ip_tunnel: Don't set tunnel->net in ip_tunnel_init()
Xiao Liang [Wed, 19 Feb 2025 12:50:31 +0000 (20:50 +0800)] 
net: ip_tunnel: Don't set tunnel->net in ip_tunnel_init()

ip_tunnel_init() is called from register_netdevice(). In all code paths
reaching here, tunnel->net should already have been set (either in
ip_tunnel_newlink() or __ip_tunnel_create()). So don't set it again.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-6-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoieee802154: 6lowpan: Validate link netns in newlink() of rtnl_link_ops
Xiao Liang [Wed, 19 Feb 2025 12:50:30 +0000 (20:50 +0800)] 
ieee802154: 6lowpan: Validate link netns in newlink() of rtnl_link_ops

Device denoted by IFLA_LINK is in link_net (IFLA_LINK_NETNSID) or
source netns by design, but 6lowpan uses dev_net.

Note dev->netns_local is set to true and currently link_net is
implemented via a netns change. These together effectively reject
IFLA_LINK_NETNSID.

This patch adds a validation to ensure link_net is either NULL or
identical to dev_net. Thus it would be fine to continue using dev_net
when rtnetlink core begins to create devices directly in target netns.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-5-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: Use link/peer netns in newlink() of rtnl_link_ops
Xiao Liang [Wed, 19 Feb 2025 12:50:29 +0000 (20:50 +0800)] 
net: Use link/peer netns in newlink() of rtnl_link_ops

Add two helper functions - rtnl_newlink_link_net() and
rtnl_newlink_peer_net() for netns fallback logic. Peer netns falls back
to link netns, and link netns falls back to source netns.

Convert the use of params->net in netdevice drivers to one of the helper
functions for clarity.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-4-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agortnetlink: Pack newlink() params into struct
Xiao Liang [Wed, 19 Feb 2025 12:50:28 +0000 (20:50 +0800)] 
rtnetlink: Pack newlink() params into struct

There are 4 net namespaces involved when creating links:

 - source netns - where the netlink socket resides,
 - target netns - where to put the device being created,
 - link netns - netns associated with the device (backend),
 - peer netns - netns of peer device.

Currently, two nets are passed to newlink() callback - "src_net"
parameter and "dev_net" (implicitly in net_device). They are set as
follows, depending on netlink attributes in the request.

 +------------+-------------------+---------+---------+
 | peer netns | IFLA_LINK_NETNSID | src_net | dev_net |
 +------------+-------------------+---------+---------+
 |            | absent            | source  | target  |
 | absent     +-------------------+---------+---------+
 |            | present           | link    | link    |
 +------------+-------------------+---------+---------+
 |            | absent            | peer    | target  |
 | present    +-------------------+---------+---------+
 |            | present           | peer    | link    |
 +------------+-------------------+---------+---------+

When IFLA_LINK_NETNSID is present, the device is created in link netns
first and then moved to target netns. This has some side effects,
including extra ifindex allocation, ifname validation and link events.
These could be avoided if we create it in target netns from
the beginning.

On the other hand, the meaning of src_net parameter is ambiguous. It
varies depending on how parameters are passed. It is the effective
link (or peer netns) by design, but some drivers ignore it and use
dev_net instead.

To provide more netns context for drivers, this patch packs existing
newlink() parameters, along with the source netns, link netns and peer
netns, into a struct. The old "src_net" is renamed to "net" to avoid
confusion with real source netns, and will be deprecated later. The use
of src_net are converted to params->net trivially.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-3-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agortnetlink: Lookup device in target netns when creating link
Xiao Liang [Wed, 19 Feb 2025 12:50:27 +0000 (20:50 +0800)] 
rtnetlink: Lookup device in target netns when creating link

When creating link, lookup for existing device in target net namespace
instead of current one.
For example, two links created by:

  # ip link add dummy1 type dummy
  # ip link add netns ns1 dummy1 type dummy

should have no conflict since they are in different namespaces.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-2-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agoMerge branch 'dt-bindings-net-realtek-rtl9301-switch'
Jakub Kicinski [Fri, 21 Feb 2025 23:08:05 +0000 (15:08 -0800)] 
Merge branch 'dt-bindings-net-realtek-rtl9301-switch'

Chris Packham says:

====================
dt-bindings: net: realtek,rtl9301-switch (schema part)

This is my attempt at trying to sort out the mess I've created with the RTL9300
switch dt-bindings. Some context is available on [1] and [2].

The first patch just moves the binding from mfd/ to net/ (with an adjustment of
the internal path name). The next two patches are successors to patches already
sent as part of the series [3].

[1] - https://lore.kernel.org/lkml/20250204-eccentric-deer-of-felicity-02b7ee@krzk-bin/
[2] - https://lore.kernel.org/lkml/4e3c5d83-d215-4eff-bf02-6d420592df8f@alliedtelesis.co.nz/
[3] - https://lore.kernel.org/lkml/20250204030249.1965444-1-chris.packham@alliedtelesis.co.nz/
====================

Link: https://patch.msgid.link/20250218195216.1034220-1-chris.packham@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodt-bindings: net: Add Realtek MDIO controller
Chris Packham [Tue, 18 Feb 2025 19:52:14 +0000 (08:52 +1300)] 
dt-bindings: net: Add Realtek MDIO controller

Add dtschema for the MDIO controller found in the RTL9300 Ethernet
switch. The controller is slightly unusual in that direct MDIO
communication is not possible. We model the MDIO controller with the
MDIO buses as child nodes and the PHYs as children of the buses. The
mapping of switch port number to MDIO bus/addr requires the
ethernet-ports sibling to provide the mapping via the phy-handle
property.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250218195216.1034220-4-chris.packham@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodt-bindings: net: Add switch ports and interrupts to RTL9300
Chris Packham [Tue, 18 Feb 2025 19:52:13 +0000 (08:52 +1300)] 
dt-bindings: net: Add switch ports and interrupts to RTL9300

Add bindings for the ethernet-switch and interrupt properties for the
RTL9300.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250218195216.1034220-3-chris.packham@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agodt-bindings: net: Move realtek,rtl9301-switch to net
Chris Packham [Tue, 18 Feb 2025 19:52:12 +0000 (08:52 +1300)] 
dt-bindings: net: Move realtek,rtl9301-switch to net

Initially realtek,rtl9301-switch was placed under mfd/ because it had
some non-switch related blocks (specifically i2c and reset) but with a
bit more review it has become apparent that this was wrong and the
binding should live under net/.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Lee Jones <lee@kernel.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250218195216.1034220-2-chris.packham@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 months agonet: sfp: add quirk for 2.5G OEM BX SFP
Birger Koblitz [Tue, 18 Feb 2025 17:59:40 +0000 (18:59 +0100)] 
net: sfp: add quirk for 2.5G OEM BX SFP

The OEM SFP-2.5G-BX10-D/U SFP module pair is meant to operate with
2500Base-X. However, in their EEPROM they incorrectly specify:
Transceiver codes   : 0x00 0x12 0x00 0x00 0x12 0x00 0x01 0x05 0x00
BR, Nominal         : 2500MBd

Use sfp_quirk_2500basex for this module to allow 2500Base-X mode anyway.
Tested on BananaPi R3.

Signed-off-by: Birger Koblitz <mail@birger-koblitz.de>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/20250218-b4-lkmsub-v1-1-1e51dcabed90@birger-koblitz.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: phy: remove unused feature array declarations
Heiner Kallweit [Wed, 19 Feb 2025 20:15:05 +0000 (21:15 +0100)] 
net: phy: remove unused feature array declarations

After 12d5151be010 ("net: phy: remove leftovers from switch to linkmode
bitmaps") the following declarations are unused and can be removed too.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/b2883c75-4108-48f2-ab73-e81647262bc2@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'selftests-drv-net-improve-the-queue-test-for-xsk'
Jakub Kicinski [Fri, 21 Feb 2025 01:57:11 +0000 (17:57 -0800)] 
Merge branch 'selftests-drv-net-improve-the-queue-test-for-xsk'

Jakub Kicinski says:

====================
selftests: drv-net: improve the queue test for XSK

We see some flakes in the the XSK test:

   Exception| Traceback (most recent call last):
   Exception|   File "/home/virtme/testing-18/tools/testing/selftests/net/lib/py/ksft.py", line 218, in ksft_run
   Exception|     case(*args)
   Exception|   File "/home/virtme/testing-18/tools/testing/selftests/drivers/net/./queues.py", line 53, in check_xdp
   Exception|     ksft_eq(q['xsk'], {})
   Exception| KeyError: 'xsk'

I think it's because the method of running the helper in the background
is racy. Add more solid infra for waiting for a background helper to be
initialized.

v1: https://lore.kernel.org/20250218195048.74692-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250219234956.520599-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoselftests: drv-net: rename queues check_xdp to check_xsk
Jakub Kicinski [Wed, 19 Feb 2025 23:49:56 +0000 (15:49 -0800)] 
selftests: drv-net: rename queues check_xdp to check_xsk

The test is for AF_XDP, we refer to AF_XDP as XSK.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoselftests: drv-net: improve the use of ksft helpers in XSK queue test
Jakub Kicinski [Wed, 19 Feb 2025 23:49:55 +0000 (15:49 -0800)] 
selftests: drv-net: improve the use of ksft helpers in XSK queue test

Avoid exceptions when xsk attr is not present, and add a proper ksft
helper for "not in" condition.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>