git.ipfire.org Git - thirdparty/linux.git/log
6 weeks ago  dpll: remove documentation of rclk_dev_name
Simon Horman [Mon, 16 Jun 2025 12:58:35 +0000 (13:58 +0100)] 
dpll: remove documentation of rclk_dev_name

Remove the documentation for the rclk_dev_name member of struct
dpll_device; no such member exists.

Flagged by ./scripts/kernel-doc -none

Introduced by commit 9431063ad323 ("dpll: core: Add DPLL framework base
functions")

Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250616-dpll-member-v1-1-8c9e6b8e1fd4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
Jakub Kicinski [Wed, 18 Jun 2025 01:50:57 +0000 (18:50 -0700)] 
Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
libeth: add libeth_xdp helper lib

Alexander Lobakin says:

Time to add XDP helpers infra to libeth to greatly simplify adding
XDP to idpf and iavf, as well as to improve and extend XDP in ice and
i40e. Any vendor is free to reuse the helpers. If that happens, I'm fine
with moving the folder out of intel/.

The helpers greatly simplify building an xdp_buff, running a prog,
handling the verdict, and implementing XDP_TX, .ndo_xdp_xmit and XDP
buffer completion. The same applies to XSk (with XSk xmit instead of
.ndo_xdp_xmit, plus stuff like XSk wakeup).
They are entirely generic, with no HW definitions or assumptions.
HW-specific stuff like parsing an Rx descriptor or filling a Tx
descriptor is passed from the driver as inline callbacks.

For now, the key assumptions that optimize performance / avoid code
bloat, but might not fit every driver in drivers/net/, are:
 * netmems holding the buffers are always order-0;
 * the driver has separate XDP Tx queues and doesn't use the stack's
   queues for that. For best efficiency, you may want to have nr_cpu_ids
   XDP queues, but fewer (queue sharing) is also supported;
 * XDP Tx queues are interrupt-less and use "lazy" cleaning only
   when fewer than 1/4 of the queue's Tx descriptors are free (see the
   sketch after this list);
 * the main target platforms are 64-bit; 32-bit is also fully
   supported, but the code might not be as optimized for it.
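
A rough illustration of the "lazy" cleaning point above, as a sketch with
made-up names (not the actual libeth_xdp code):

  /* Clean an XDP Tx queue only once its free descriptor count drops
   * below a quarter of the ring size; otherwise keep transmitting.
   */
  static bool example_xdpsq_needs_clean(u32 free_descs, u32 ring_size)
  {
          return free_descs < ring_size / 4;
  }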

The library code already supports multi-buffer for all kinds of Tx, and
both header split and no split for Rx and Tx. Frags can come from
devmem/io_uring etc.; a direct `struct page *` is used only for header
buffers, for which it always holds.
Drivers are free to pass their own Rx hints and XSk xmit hints ops.

XDP_TX and ndo_xdp_xmit use an on-stack bulk for the frames to be sent
and send them in batches of 16 buffers. This eats ~280 bytes on the
stack, but gives good boosts and allows the main sending function to be
greatly optimized, leaving it without any error/exception paths.

XSk xmit fills Tx descriptors in a loop unrolled by 8. This was
proven to improve perf on ice and i40e. XDP_TX and ndo_xdp_xmit
don't use unrolling, as I wasn't able to get any improvement in
those scenarios from it, and +1 Kb for their sending functions
for nothing doesn't sound reasonable.

XSk wakeup, instead of the traditionally used "SW interrupts" provided
by NICs, uses an IPI to schedule NAPI on the CPU corresponding to the
given queue pair. This gives better control over CPU distribution, in
general performs way better than "SW interrupts", and allows us
to not pass any HW-specific callbacks there.
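
Conceptually, the wakeup path is roughly the sketch below; the function
names are made up for illustration, only smp_call_function_single() and
napi_schedule() are real kernel APIs:

  #include <linux/smp.h>
  #include <linux/netdevice.h>

  static void example_xsk_trigger_napi(void *info)
  {
          napi_schedule(info);    /* runs on the target CPU in hardirq context */
  }

  /* Kick the NAPI of the given queue pair on its designated CPU via an
   * IPI, instead of requesting a "SW interrupt" from the NIC.
   */
  static void example_xsk_wakeup(struct napi_struct *napi, int cpu)
  {
          smp_call_function_single(cpu, example_xsk_trigger_napi, napi, 0);
  }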

The code is built such that all callbacks passed from drivers
get inlined; in general, most of the hotpath gets inlined. Everything
slow/exceptional lands in .c files in the libeth folder, doesn't
create copies in the drivers themselves and doesn't overload the
hotpath.
Sure, inlining means that the hotpath will be compiled into every driver
that uses the lib, but the core code is written in one place, so no
copying of bugs happens. Fixed once -- works everywhere.

The last commit might look like sort of a hack, but it gives really good
boosts and decreases object code size, plus there are checks that
all those wider accesses are fully safe, so I don't feel bad about it.

An example of using libeth_xdp can be found either on my GitHub or
on the mailing lists here ("XDP for idpf"). Thanks to the macros for
building driver XDP functions, some implementations (XDP_TX,
ndo_xdp_xmit etc.) consist of really only a few lines.

* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  libeth: xdp, xsk: access adjacent u32s as u64 where applicable
  libeth: xsk: add XSkFQ refill and XSk wakeup helpers
  libeth: xsk: add XSk Rx processing support
  libeth: xsk: add XSk xmit functions
  libeth: xsk: add XSk XDP_TX sending helpers
  libeth: xdp: add RSS hash hint and XDP features setup helpers
  libeth: xdp: add templates for building driver-side callbacks
  libeth: xdp: add XDP prog run and verdict result handling
  libeth: xdp: add helpers for preparing/processing &libeth_xdp_buff
  libeth: xdp: add XDPSQ cleanup timers
  libeth: xdp: add XDPSQ locking helpers
  libeth: xdp: add XDPSQE completion helpers
  libeth: xdp: add .ndo_xdp_xmit() helpers
  libeth: xdp: add XDP_TX buffers sending
  libeth: support native XDP and register memory model
  libeth: convert to netmem
  libeth, libie: clean symbol exports up a little
====================

Link: https://patch.msgid.link/20250616201639.710420-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  Merge branch 'net-mlx5e-add-support-for-devmem-and-io_uring-tcp-zero-copy'
Jakub Kicinski [Wed, 18 Jun 2025 01:34:15 +0000 (18:34 -0700)] 
Merge branch 'net-mlx5e-add-support-for-devmem-and-io_uring-tcp-zero-copy'

Mark Bloch says:

====================
net/mlx5e: Add support for devmem and io_uring TCP zero-copy

This series adds support for zero-copy TCP RX with devmem and io_uring
for ConnectX7 NICs and above. For performance reasons and simplicity,
HW-GRO will also be turned on when header-data split mode is on.

Performance
===========

Test setup:

* CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (single NUMA)
* NIC: ConnectX7
* Benchmarking tool: kperf [0]
* Single TCP flow
* Test duration: 60s

With application thread and interrupts pinned to the *same* core:

|------+-----------+----------|
| MTU  | epoll     | io_uring |
|------+-----------+----------|
| 1500 | 61.6 Gbps | 114 Gbps |
| 4096 | 69.3 Gbps | 151 Gbps |
| 9000 | 67.8 Gbps | 187 Gbps |
|------+-----------+----------|

The CPU usage for io_uring is 95%.

Reproduction steps for io_uring:

server --no-daemon -a 2001:db8::1 --no-memcmp --iou --iou_sendzc \
--iou_zcrx --iou_dev_name eth2 --iou_zcrx_queue_id 2

server --no-daemon -a 2001:db8::2 --no-memcmp --iou --iou_sendzc

client --src 2001:db8::2 --dst 2001:db8::1 \
--msg-zerocopy -t 60 --cpu-min=2 --cpu-max=2

Patch overview:
================

First, a netmem API for skb_can_coalesce is added to the core to be able
to do skb fragment coalescing on netmems.

The next patches introduce some cleanups in the internal SHAMPO code and
improvements to hw gro capability checks in FW.

A separate page_pool is introduced for headers, to be used only when
the rxq has a memory provider.

Then the driver is converted to use the netmem API and to allow support
for unreadable netmem page pool.

The queue management ops are implemented.

Finally, the tcp-data-split ring parameter is exposed.
References
==========
[0] kperf: git://git.kernel.dk/kperf.git
v1: https://lore.kernel.org/20250116215530.158886-1-saeed@kernel.org
v2: https://lore.kernel.org/1747950086-1246773-1-git-send-email-tariqt@nvidia.com
v3: https://lore.kernel.org/20250609145833.990793-1-mbloch@nvidia.com
v4: https://lore.kernel.org/20250610150950.1094376-1-mbloch@nvidia.com
v5: https://lore.kernel.org/20250612154648.1161201-1-mbloch@nvidia.com
====================

Link: https://patch.msgid.link/20250616141441.1243044-1-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net/mlx5e: Add TX support for netmems
Dragos Tatulea [Mon, 16 Jun 2025 14:14:41 +0000 (17:14 +0300)] 
net/mlx5e: Add TX support for netmems

Declare netmem TX support in netdev.

As required, use the netmem aware dma unmapping APIs
for unmapping netmems in tx completion path.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250616141441.1243044-13-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net/mlx5e: Support ethtool tcp-data-split settings
Saeed Mahameed [Mon, 16 Jun 2025 14:14:40 +0000 (17:14 +0300)] 
net/mlx5e: Support ethtool tcp-data-split settings

In mlx5, tcp header-data split requires HW GRO to be on.

Enabling it fails when HW GRO is off.
mlx5e_fix_features now keeps HW GRO on when tcp data split is enabled.
Finally, when tcp data split is disabled, features are updated to
possibly remove the forced HW GRO.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250616141441.1243044-12-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net/mlx5e: Implement queue mgmt ops and single channel swap
Saeed Mahameed [Mon, 16 Jun 2025 14:14:39 +0000 (17:14 +0300)] 
net/mlx5e: Implement queue mgmt ops and single channel swap

The bulk of the work is done in mlx5e_queue_mem_alloc, where we allocate
and create the new channel resources, similar to
mlx5e_safe_switch_params, but here we do it for a single channel using
the existing params, sort of a clone channel.
To swap the old channel with the new one, we deactivate and close the
old channel, then replace it with the new one. Since the swap procedure
doesn't fail in mlx5, we do it all in one place (mlx5e_queue_start).
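
The wiring presumably looks roughly like the sketch below; struct
netdev_queue_mgmt_ops and its fields are the existing core queue-management
API, the mlx5e_queue_mem_alloc/mlx5e_queue_start names are taken from the
text above, and everything else (the mem size, the omitted callbacks) is an
assumption:

  static const struct netdev_queue_mgmt_ops mlx5e_queue_mgmt_ops = {
          .ndo_queue_mem_size     = sizeof(void *),       /* size assumed */
          .ndo_queue_mem_alloc    = mlx5e_queue_mem_alloc,
          .ndo_queue_start        = mlx5e_queue_start,
          /* matching mem_free / stop callbacks omitted in this sketch */
  };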

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Acked-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250616141441.1243044-11-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net/mlx5e: Add support for UNREADABLE netmem page pools
Saeed Mahameed [Mon, 16 Jun 2025 14:14:38 +0000 (17:14 +0300)] 
net/mlx5e: Add support for UNREADABLE netmem page pools

On netdev_rx_queue_restart, a special type of page pool may be expected.

In this patch, declare support for UNREADABLE netmem iov pages in the
pool params only when header-data split (SHAMPO) RQ mode is enabled, and
also set the queue index in the page pool params struct.

SHAMPO mode requirement: without header split, RX needs to peek at the
data, so we can't do UNREADABLE_NETMEM.

The patch also enables the use of a separate page pool for headers when
a memory provider is installed for the queue; otherwise the same common
page pool continues to be used.
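
The gist of the page-pool setup described above, as a sketch
(PP_FLAG_ALLOW_UNREADABLE_NETMEM and the queue_idx field are part of the
page_pool API; the wrapper and its arguments are illustrative):

  static void example_fill_pp_params(struct page_pool_params *pp,
                                     unsigned int rq_index, bool hd_split)
  {
          pp->queue_idx = rq_index;       /* tie the pool to this RX queue */
          if (hd_split)                   /* SHAMPO / header-data split only */
                  pp->flags |= PP_FLAG_ALLOW_UNREADABLE_NETMEM;
  }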

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250616141441.1243044-10-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net/mlx5e: Convert over to netmem
Saeed Mahameed [Mon, 16 Jun 2025 14:14:37 +0000 (17:14 +0300)] 
net/mlx5e: Convert over to netmem

mlx5e_page_frag holds the physical page itself. To naturally support
zc page pools, remove the physical page reference from mlx5 and replace
it with netmem_ref, to avoid internal handling in mlx5 for net_iov-backed
pages.

SHAMPO can issue packets that are not split into header and data. These
packets will be dropped if the data part resides in a net_iov as the
driver can't read into this area.

No performance degradation observed.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250616141441.1243044-9-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net/mlx5e: SHAMPO: Separate pool for headers
Saeed Mahameed [Mon, 16 Jun 2025 14:14:36 +0000 (17:14 +0300)] 
net/mlx5e: SHAMPO: Separate pool for headers

Allow allocating a separate page pool for headers when SHAMPO is on.
This will be useful for adding support to zc page pool, which has to be
different from the headers page pool.
For now, the pools are the same.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250616141441.1243044-8-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net/mlx5e: SHAMPO: Improve hw gro capability checking
Saeed Mahameed [Mon, 16 Jun 2025 14:14:35 +0000 (17:14 +0300)] 
net/mlx5e: SHAMPO: Improve hw gro capability checking

Add missing HW capabilities, declare the feature in
netdev->vlan_features, similar to other features in mlx5e_build_nic_netdev.
No functional change here, as all features that are disabled by default
are explicitly disabled at the bottom of the function.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250616141441.1243044-7-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net/mlx5e: SHAMPO: Remove redundant params
Saeed Mahameed [Mon, 16 Jun 2025 14:14:34 +0000 (17:14 +0300)] 
net/mlx5e: SHAMPO: Remove redundant params

Two SHAMPO params are static and always the same, so remove them from
the global mlx5e_params struct.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250616141441.1243044-6-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net/mlx5e: SHAMPO: Reorganize mlx5_rq_shampo_alloc
Saeed Mahameed [Mon, 16 Jun 2025 14:14:33 +0000 (17:14 +0300)] 
net/mlx5e: SHAMPO: Reorganize mlx5_rq_shampo_alloc

Drop redundant SHAMPO structure alloc/free functions.

Gather together the function calls pertaining to header split info, and
pass headers per WQE (hd_per_wqe) as a parameter to those functions to
avoid future use-before-initialization mistakes.

Allocate HW GRO related info outside of the header related info scope.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250616141441.1243044-5-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  page_pool: Add page_pool_dev_alloc_netmems helper
Dragos Tatulea [Mon, 16 Jun 2025 14:14:32 +0000 (17:14 +0300)] 
page_pool: Add page_pool_dev_alloc_netmems helper

This is the netmem counterpart of page_pool_dev_alloc_pages() which
uses the default GFP flags for RX.
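
Mirroring page_pool_dev_alloc_pages(), the helper is presumably a thin
wrapper along these lines (a sketch, not necessarily the committed code):

  static inline netmem_ref page_pool_dev_alloc_netmems(struct page_pool *pool)
  {
          gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN;

          return page_pool_alloc_netmems(pool, gfp);
  }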

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250616141441.1243044-4-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: Add skb_can_coalesce for netmem
Dragos Tatulea [Mon, 16 Jun 2025 14:14:31 +0000 (17:14 +0300)] 
net: Add skb_can_coalesce for netmem

Allow drivers that have moved over to netmem to do fragment coalescing.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250616141441.1243044-3-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: Allow const args for of page_to_netmem()
Dragos Tatulea [Mon, 16 Jun 2025 14:14:30 +0000 (17:14 +0300)] 
net: Allow const args for of page_to_netmem()

This allows calling page_to_netmem() with a const page * argument.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250616141441.1243044-2-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: tcp: tsq: Convert from tasklet to BH workqueue
Tejun Heo [Mon, 16 Jun 2025 18:10:47 +0000 (08:10 -1000)] 
net: tcp: tsq: Convert from tasklet to BH workqueue

The only generic interface to execute asynchronously in the BH context is
tasklet; however, it's marked deprecated and has some design flaws. To
replace tasklets, BH workqueue support was recently added. A BH workqueue
behaves similarly to regular workqueues except that the queued work items
are executed in the BH context.

This patch converts the TCP Small Queues implementation from tasklet to BH
workqueue.

Semantically, this is an equivalent conversion and there shouldn't be any
user-visible behavior changes. While the workqueue's queueing and execution
paths are a bit heavier than the tasklet's, unless the work item is being
queued for every packet, the difference hopefully shouldn't matter.
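
For reference, the general tasklet-to-BH-workqueue pattern looks like the
sketch below (illustrative names, not the actual TSQ code); work queued on
a WQ_BH workqueue is executed in BH context:

  #include <linux/workqueue.h>

  static struct workqueue_struct *example_bh_wq;
  static struct work_struct example_work;

  static void example_work_fn(struct work_struct *work)
  {
          /* body of the former tasklet callback; still runs in BH context */
  }

  static int example_setup(void)
  {
          example_bh_wq = alloc_workqueue("example_bh", WQ_BH, 0);
          if (!example_bh_wq)
                  return -ENOMEM;

          INIT_WORK(&example_work, example_work_fn);
          return 0;
  }

  /* where tasklet_schedule() used to be called: */
  /* queue_work(example_bh_wq, &example_work); */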

My experience with the networking stack is very limited and this patch
definitely needs attention from someone who actually understands networking.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/aFBeJ38AS1ZF3Dq5@slm.duckdns.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  Merge branch 'ipmr-ip6mr-allow-mc-routing-locally-generated-mc-packets'
Jakub Kicinski [Wed, 18 Jun 2025 01:18:48 +0000 (18:18 -0700)] 
Merge branch 'ipmr-ip6mr-allow-mc-routing-locally-generated-mc-packets'

Petr Machata says:

====================
ipmr, ip6mr: Allow MC-routing locally-generated MC packets

Multicast routing is today handled in the input path. Locally generated MC
packets don't hit the IPMR code. Thus if a VXLAN remote address is
multicast, the driver needs to set an OIF during route lookup. In practice
that means that MC routing configuration needs to be kept in sync with the
VXLAN FDB and MDB. Ideally, the VXLAN packets would be routed by the MC
routing code instead.

To that end, this patchset adds support to route locally generated
multicast packets.

However, an installation that uses a VXLAN underlay netdevice for which it
also has matching MC routes would get different routing with this patchset.
Previously, the MC packets would be delivered directly to the underlay
port, whereas now they would be MC-routed. In order to avoid this change in
behavior, introduce an IPCB/IP6CB flag. Unless the flag is set, the new
MC-routing code is skipped.

All this is keyed to a new VXLAN attribute, IFLA_VXLAN_MC_ROUTE. Only when
it is set does any of the above engage.

In addition to that, and as is the case today with MC forwarding,
IPV4_DEVCONF_MC_FORWARDING must be enabled for the netdevice that acts as a
source of MC traffic (i.e. the VXLAN PHYS_DEV), so an MC daemon must be
attached to the netdevice.

When a VXLAN netdevice with a MC remote is brought up, the physical
netdevice joins the indicated MC group. This is important for local
delivery of MC packets, so it is still necessary to configure a physical
netdevice -- the parameter cannot go away. The netdevice would however
typically not be a front panel port, but a dummy. An MC daemon would then
sit on top of that netdevice as well as any front panel ports that it needs
to service, and have routes set up between the two.

A way to configure the VXLAN netdevice to take advantage of the new MC
routing would be:

 # ip link add name d up type dummy
 # ip link add name vx10 up type vxlan id 1000 dstport 4789 \
       local 192.0.2.1 group 225.0.0.1 ttl 16 dev d mcroute
 # ip link set dev vx10 master br # plus vlans etc.

With the following MC routes:

 (192.0.2.1, 225.0.0.1) iif=d oil=swp1,swp2 # TX route
 (*, 225.0.0.1) iif=swp1 oil=d,swp2         # RX route
 (*, 225.0.0.1) iif=swp2 oil=d,swp1         # RX route

The RX path has not changed, with the exception of an extra MC hop. Packets
are delivered to the front panel port and MC-forwarded to the VXLAN
physical port, here "d". Since the port has joined the multicast group, the
packets are locally delivered, and end up being processed by the VXLAN
netdevice.

This patchset is based on earlier patches from Nikolay Aleksandrov and
Roopa Prabhu, though it underwent significant changes. Roopa broadly
presented the topic on LPC 2019 [0].

Patchset progression:

- Patches #1 to #4 add ip_mr_output()
- Patches #5 to #10 add ip6_mr_output()
- Patch #11 adds the VXLAN bits to enable MR engagement
- Patches #12 to #14 prepare selftest libraries
- Patch #15 includes a new test suite

[0] https://www.youtube.com/watch?v=xlReECfi-uo
====================

Link: https://patch.msgid.link/cover.1750113335.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  selftests: forwarding: Add a test for verifying VXLAN MC underlay
Petr Machata [Mon, 16 Jun 2025 22:44:23 +0000 (00:44 +0200)] 
selftests: forwarding: Add a test for verifying VXLAN MC underlay

Add tests for MC-routing underlay VXLAN traffic.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/eecd2c0fefc754182e74be8e8e65751bf5749c21.1750113335.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  selftests: forwarding: adf_mcd_start(): Allow configuring custom interfaces
Petr Machata [Mon, 16 Jun 2025 22:44:22 +0000 (00:44 +0200)] 
selftests: forwarding: adf_mcd_start(): Allow configuring custom interfaces

Tests may wish to add other interfaces to listen on. Notably locally
generated traffic uses dummy interfaces. The multicast daemon needs to know
about these so that it allows forming rules that involve these interfaces,
and so that net.ipv4.conf.X.mc_forwarding is set for the interfaces.

To that end, allow passing in a list of interfaces to configure in addition
to all the physical ones.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/2e8d83297985933be4850f2b9f296b3c27110388.1750113335.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  selftests: net: lib: Add ip_link_has_flag()
Petr Machata [Mon, 16 Jun 2025 22:44:21 +0000 (00:44 +0200)] 
selftests: net: lib: Add ip_link_has_flag()

Add a helper to determine whether a given netdevice has a given flag.

Rewrite ip_link_is_up() in terms of the new helper.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/e1eb174a411f9d24735d095984c731d1d4a5a592.1750113335.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  selftests: forwarding: lib: Move smcrouted helpers here
Petr Machata [Mon, 16 Jun 2025 22:44:20 +0000 (00:44 +0200)] 
selftests: forwarding: lib: Move smcrouted helpers here

router_multicast.sh has several helpers for work with smcrouted. Extract
them to lib.sh so that other selftests can use them as well. Convert the
helpers to defer in the process, because that simplifies the interface
quite a bit. Therefore have router_multicast.sh invoke
defer_scopes_cleanup() in its cleanup() function.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/410411c1a81225ce6e44542289b9c3ec21e5786c.1750113335.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  vxlan: Support MC routing in the underlay
Petr Machata [Mon, 16 Jun 2025 22:44:19 +0000 (00:44 +0200)] 
vxlan: Support MC routing in the underlay

Locally-generated MC packets have so far not been subject to MC routing.
Instead an MC-enabled installation would maintain the MC routing tables,
and separately from that the list of interfaces to send packets to as part
of the VXLAN FDB and MDB.

In previous patches, ip_mr_output() and ip6_mr_output() routines were
added for IPv4 and IPv6. All locally generated MC traffic is now passed
through these functions. For reasons of backward compatibility, an SKB
(IPCB / IP6CB) flag guards the actual MC routing.

This patch adds logic to set the flag, and the UAPI to enable the behavior.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/d899655bb7e9b2521ee8c793e67056b9fd02ba12.1750113335.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: ipv6: Add ip6_mr_output()
Petr Machata [Mon, 16 Jun 2025 22:44:18 +0000 (00:44 +0200)] 
net: ipv6: Add ip6_mr_output()

Multicast routing is today handled in the input path. Locally generated MC
packets don't hit the IPMR code today. Thus if a VXLAN remote address is
multicast, the driver needs to set an OIF during route lookup. Thus MC
routing configuration needs to be kept in sync with the VXLAN FDB and MDB.
Ideally, the VXLAN packets would be routed by the MC routing code instead.

To that end, this patch adds support to route locally generated multicast
packets. The newly-added routines do largely what ip6_mr_input() and
ip6_mr_forward() do: make an MR cache lookup to find where to send the
packets, and use ip6_output() to send each of them. When no cache entry is
found, the packet is punted to the daemon for resolution.

Similarly to the IPv4 case in a previous patch, the new logic is contingent
on a newly-added IP6CB flag being set.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/3bcc034a3ab4d3c291072fff38f78d7fbbeef4e6.1750113335.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: ipv6: ip6mr: Split ip6mr_forward2() in two
Petr Machata [Mon, 16 Jun 2025 22:44:17 +0000 (00:44 +0200)] 
net: ipv6: ip6mr: Split ip6mr_forward2() in two

Some of the work of ip6mr_forward2() is specific to IPMR forwarding, and
should not take place on the output path. In order to allow reuse of the
common parts, extract out of the function a helper,
ip6mr_prepare_forward().

Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/8932bd5c0fbe3f662b158803b8509604fa7bc113.1750113335.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: ipv6: ip6mr: Make ip6mr_forward2() void
Petr Machata [Mon, 16 Jun 2025 22:44:16 +0000 (00:44 +0200)] 
net: ipv6: ip6mr: Make ip6mr_forward2() void

Nobody uses the return value, so convert the function to void.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/e0bee259da0da58da96647ea9e21e6360c8f7e11.1750113335.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: ipv6: ip6mr: Fix in/out netdev to pass to the FORWARD chain
Petr Machata [Mon, 16 Jun 2025 22:44:15 +0000 (00:44 +0200)] 
net: ipv6: ip6mr: Fix in/out netdev to pass to the FORWARD chain

The netfilter hook is invoked with skb->dev for input netdevice, and
vif_dev for output netdevice. However at the point of invocation, skb->dev
is already set to vif_dev, and MR-forwarded packets are reported with
in=out:

 # ip6tables -A FORWARD -j LOG --log-prefix '[forw]'
 # cd tools/testing/selftests/net/forwarding
 # ./router_multicast.sh
 # dmesg | fgrep '[forw]'
 [ 1670.248245] [forw]IN=v5 OUT=v5 [...]

For reference, IPv4 MR code shows in and out as appropriate.
Fix by caching skb->dev and using the updated value for output netdev.

Fixes: 7bc570c8b4f7 ("[IPV6] MROUTE: Support multicast forwarding.")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/3141ae8386fbe13fef4b793faa75e6bae58d798a.1750113335.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: ipv6: Add a flags argument to ip6tunnel_xmit(), udp_tunnel6_xmit_skb()
Petr Machata [Mon, 16 Jun 2025 22:44:14 +0000 (00:44 +0200)] 
net: ipv6: Add a flags argument to ip6tunnel_xmit(), udp_tunnel6_xmit_skb()

ip6tunnel_xmit() erases the contents of the SKB control block. In order to
be able to set particular IP6CB flags on the SKB, add a corresponding
parameter, and propagate it to udp_tunnel6_xmit_skb() as well.

In one of the following patches, VXLAN driver will use this facility to
mark packets as subject to IPv6 multicast routing.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/acb4f9f3e40c3a931236c3af08a720b017fbfbfb.1750113335.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: ipv6: Make udp_tunnel6_xmit_skb() void
Petr Machata [Mon, 16 Jun 2025 22:44:13 +0000 (00:44 +0200)] 
net: ipv6: Make udp_tunnel6_xmit_skb() void

The function always returns zero, thus the return value does not carry any
signal. Just make it void.

Most callers already ignore the return value. However:

- Refold arguments of the call from sctp_v6_xmit() so that they fit into
  the 80-column limit.

- tipc_udp_xmit() initializes err from the return value, but that should
  already be always zero at that point. So there's no practical change, but
  elision of the assignment prompts a couple more tweaks to clean up the
  function.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/7facacf9d8ca3ca9391a4aee88160913671b868d.1750113335.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: ipv4: Add ip_mr_output()
Petr Machata [Mon, 16 Jun 2025 22:44:12 +0000 (00:44 +0200)] 
net: ipv4: Add ip_mr_output()

Multicast routing is today handled in the input path. Locally generated MC
packets don't hit the IPMR code today. Thus if a VXLAN remote address is
multicast, the driver needs to set an OIF during route lookup. Thus MC
routing configuration needs to be kept in sync with the VXLAN FDB and MDB.
Ideally, the VXLAN packets would be routed by the MC routing code instead.

To that end, this patch adds support to route locally generated multicast
packets. The newly-added routines do largely what ip_mr_input() and
ip_mr_forward() do: make an MR cache lookup to find where to send the
packets, and use ip_mc_output() to send each of them. When no cache entry
is found, the packet is punted to the daemon for resolution.

However, an installation that uses a VXLAN underlay netdevice for which it
also has matching MC routes would get different routing with this patch.
Previously, the MC packets would be delivered directly to the underlay
port, whereas now they would be MC-routed. In order to avoid this change in
behavior, introduce an IPCB flag. Only if the flag is set will
ip_mr_output() actually engage; otherwise it reverts to ip_mc_output().
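
In rough terms, the gating on the output path amounts to something like
the sketch below; ip_mc_output() is the existing multicast output routine,
while the flag name and the wrapper are assumptions for illustration:

  static int example_mc_output(struct net *net, struct sock *sk,
                               struct sk_buff *skb)
  {
          /* Engage the new MR output path only when the sender opted in. */
          if (!(IPCB(skb)->flags & IPSKB_MCROUTE))        /* flag name assumed */
                  return ip_mc_output(net, sk, skb);

          return ip_mr_output(net, sk, skb);      /* added by this patch */
  }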

This code is based on work by Roopa Prabhu and Nikolay Aleksandrov.

Signed-off-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/0aadbd49330471c0f758d54afb05eb3b6e3a6b65.1750113335.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: ipv4: ipmr: Split ipmr_queue_xmit() in two
Petr Machata [Mon, 16 Jun 2025 22:44:11 +0000 (00:44 +0200)] 
net: ipv4: ipmr: Split ipmr_queue_xmit() in two

Some of the work of ipmr_queue_xmit() is specific to IPMR forwarding, and
should not take place on the output path. In order to allow reuse of the
common parts, split the function into two: the ipmr_prepare_xmit() helper
that takes care of the common bits, and the ipmr_queue_fwd_xmit(), which
invokes the former and encapsulates the whole forwarding algorithm.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/4e8db165572a4f8bd29a723a801e854e9d20df4d.1750113335.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: ipv4: ipmr: ipmr_queue_xmit(): Drop local variable `dev'
Petr Machata [Mon, 16 Jun 2025 22:44:10 +0000 (00:44 +0200)] 
net: ipv4: ipmr: ipmr_queue_xmit(): Drop local variable `dev'

The variable is used for caching rt->dst.dev. The netdevice referenced
therein does not change during the scope of validity of that local. At the
same time, the local is only used twice, and each of these uses will end up
in a different function in the following patches, further eliminating any
use the local could have had.

Drop the local altogether and inline the uses.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/c80600a4b51679fe78f429ccb6d60892c2f9e4de.1750113335.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: ipv4: Add a flags argument to iptunnel_xmit(), udp_tunnel_xmit_skb()
Petr Machata [Mon, 16 Jun 2025 22:44:09 +0000 (00:44 +0200)] 
net: ipv4: Add a flags argument to iptunnel_xmit(), udp_tunnel_xmit_skb()

iptunnel_xmit() erases the contents of the SKB control block. In order to
be able to set particular IPCB flags on the SKB, add a corresponding
parameter, and propagate it to udp_tunnel_xmit_skb() as well.

In one of the following patches, VXLAN driver will use this facility to
mark packets as subject to IP multicast routing.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Acked-by: Antonio Quartulli <antonio@openvpn.net>
Link: https://patch.msgid.link/89c9daf9f2dc088b6b92ccebcc929f51742de91f.1750113335.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  Merge branch 'misc-vlan-cleanups'
Jakub Kicinski [Wed, 18 Jun 2025 01:06:42 +0000 (18:06 -0700)] 
Merge branch 'misc-vlan-cleanups'

Gal Pressman says:

====================
Misc vlan cleanups

This patch series addresses compilation issues with objtool when VLAN
support is disabled (CONFIG_VLAN_8021Q=n) and makes related improvements
to the VLAN infrastructure.

When CONFIG_VLAN_8021Q=n, CONFIG_OBJTOOL=y, and CONFIG_OBJTOOL_WERROR=y,
the following compilation error occurs:

drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.o: error: objtool: parse_mirred.isra.0+0x370: mlx5e_tc_act_vlan_add_push_action() missing __noreturn in .c/.h or NORETURN() in noreturns.h

The error occurs because objtool cannot determine that unreachable BUG()
calls in VLAN code paths are actually dead code when VLAN support is
disabled.

First patch makes is_vlan_dev() a stub when VLAN is not configured,
allows compile-out of VLAN-dependent dead code paths and resolves the
objtool compilation error.

Second patch replaces BUG() calls with WARN_ON_ONCE(), as the usage of
BUG() should be avoided.

Third patch uses the "kernel" way of testing whether an option is
configured as builtin/module, instead of open-coding it.

v2: https://lore.kernel.org/20250610072611.1647593-1-gal@nvidia.com/
====================

Link: https://patch.msgid.link/20250616132626.1749331-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: vlan: Use IS_ENABLED() helper for CONFIG_VLAN_8021Q guard
Gal Pressman [Mon, 16 Jun 2025 13:26:26 +0000 (16:26 +0300)] 
net: vlan: Use IS_ENABLED() helper for CONFIG_VLAN_8021Q guard

The header currently tests the VLAN core with an explicit pair of 'if
defined' checks:
    #if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE)

Instead, use IS_ENABLED() which is the kernel way to test whether an
option is configured as builtin/module.

This is purely cosmetic – no functional changes.
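
In other words, the guard changes along these lines:

    /* before */
    #if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE)

    /* after */
    #if IS_ENABLED(CONFIG_VLAN_8021Q)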

Reviewed-by: Alex Lazar <alazar@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250616132626.1749331-4-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: vlan: Replace BUG() with WARN_ON_ONCE() in vlan_dev_* stubs
Gal Pressman [Mon, 16 Jun 2025 13:26:25 +0000 (16:26 +0300)] 
net: vlan: Replace BUG() with WARN_ON_ONCE() in vlan_dev_* stubs

When CONFIG_VLAN_8021Q=n, a set of stub helpers is used; three of these
helpers use BUG() unconditionally.

This code should not be reached, as callers of these functions should
always check is_vlan_dev() first, but since the usage of BUG() is not
recommended, replace it with WARN_ON_ONCE() instead.
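
One of the affected stubs then roughly ends up looking like this (a sketch
based on the description above):

  static inline u16 vlan_dev_vlan_id(const struct net_device *dev)
  {
          /* unreachable if callers check is_vlan_dev() first */
          WARN_ON_ONCE(1);
          return 0;
  }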

Reviewed-by: Alex Lazar <alazar@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250616132626.1749331-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: vlan: Make is_vlan_dev() a stub when VLAN is not configured
Gal Pressman [Mon, 16 Jun 2025 13:26:24 +0000 (16:26 +0300)] 
net: vlan: Make is_vlan_dev() a stub when VLAN is not configured

Add a stub implementation of is_vlan_dev() that returns false when
VLAN support is not compiled in (CONFIG_VLAN_8021Q=n).

This allows us to compile-out VLAN-dependent dead code when it is not
needed.

This also resolves the following compilation error when:
* CONFIG_VLAN_8021Q=n
* CONFIG_OBJTOOL=y
* CONFIG_OBJTOOL_WERROR=y

drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.o: error: objtool: parse_mirred.isra.0+0x370: mlx5e_tc_act_vlan_add_push_action() missing __noreturn in .c/.h or NORETURN() in noreturns.h

The error occurs because objtool cannot determine that unreachable calls
to BUG() (which does not return) in VLAN code paths are actually dead code
when VLAN support is disabled.
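
The likely shape of the change is sketched below; the enabled-side body is
the long-standing definition from include/linux/if_vlan.h, the stub is what
this patch adds:

  #if IS_ENABLED(CONFIG_VLAN_8021Q)
  static inline bool is_vlan_dev(const struct net_device *dev)
  {
          return dev->priv_flags & IFF_802_1Q_VLAN;
  }
  #else
  static inline bool is_vlan_dev(const struct net_device *dev)
  {
          return false;
  }
  #endif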

Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250616132626.1749331-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  Merge branch 'net-use-new-gpio-line-value-setter-callbacks'
Jakub Kicinski [Wed, 18 Jun 2025 01:04:13 +0000 (18:04 -0700)] 
Merge branch 'net-use-new-gpio-line-value-setter-callbacks'

Bartosz Golaszewski says:

====================
net: use new GPIO line value setter callbacks

Commit 98ce1eb1fd87e ("gpiolib: introduce gpio_chip setters that return
values") added new line setter callbacks to struct gpio_chip. They allow
to indicate failures to callers. We're in the process of converting all
GPIO controllers to using them before removing the old ones. This series
converts all GPIO chips implemented under drivers/net/.
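
The conversion pattern is roughly as follows; gpiochip_get_data() and the
set_rv callback member come from gpiolib, while the driver-side names and
register below are made up for this sketch:

  static int example_gpio_set(struct gpio_chip *chip, unsigned int offset,
                              int value)
  {
          struct example_priv *priv = gpiochip_get_data(chip);

          /* bus errors now reach the GPIO core and its users */
          return regmap_update_bits(priv->regmap, EXAMPLE_GPIO_OUT_REG,
                                    BIT(offset), value ? BIT(offset) : 0);
  }

  /* during probe, instead of assigning the void ->set() callback: */
  /* chip->set_rv = example_gpio_set; */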

v1: https://lore.kernel.org/20250610-gpiochip-set-rv-net-v1-0-35668dd1c76f@linaro.org
====================

Link: https://patch.msgid.link/20250616-gpiochip-set-rv-net-v2-0-cae0b182a552@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: phy: qca807x: use new GPIO line value setter callbacks
Bartosz Golaszewski [Mon, 16 Jun 2025 07:24:08 +0000 (09:24 +0200)] 
net: phy: qca807x: use new GPIO line value setter callbacks

struct gpio_chip now has callbacks for setting line values that return
an integer, allowing them to indicate failures. Convert the driver to
use them.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250616-gpiochip-set-rv-net-v2-5-cae0b182a552@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: can: mcp251x: use new GPIO line value setter callbacks
Bartosz Golaszewski [Mon, 16 Jun 2025 07:24:07 +0000 (09:24 +0200)] 
net: can: mcp251x: use new GPIO line value setter callbacks

struct gpio_chip now has callbacks for setting line values that return
an integer, allowing them to indicate failures. Convert the driver to
use them.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Marc Kleine-Budde <mkl@pengutronix.de>
Link: https://patch.msgid.link/20250616-gpiochip-set-rv-net-v2-4-cae0b182a552@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: can: mcp251x: propagate the return value of mcp251x_spi_write()
Bartosz Golaszewski [Mon, 16 Jun 2025 07:24:06 +0000 (09:24 +0200)] 
net: can: mcp251x: propagate the return value of mcp251x_spi_write()

Add an integer return value to mcp251x_write_bits() and use it to
propagate the one returned by mcp251x_spi_write(). Return that value on
error in the request() GPIO callback.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Marc Kleine-Budde <mkl@pengutronix.de>
Link: https://patch.msgid.link/20250616-gpiochip-set-rv-net-v2-3-cae0b182a552@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: dsa: mt7530: use new GPIO line value setter callbacks
Bartosz Golaszewski [Mon, 16 Jun 2025 07:24:05 +0000 (09:24 +0200)] 
net: dsa: mt7530: use new GPIO line value setter callbacks

struct gpio_chip now has callbacks for setting line values that return
an integer, allowing them to indicate failures. Convert the driver to
use them.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://patch.msgid.link/20250616-gpiochip-set-rv-net-v2-2-cae0b182a552@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: dsa: vsc73xx: use new GPIO line value setter callbacks
Bartosz Golaszewski [Mon, 16 Jun 2025 07:24:04 +0000 (09:24 +0200)] 
net: dsa: vsc73xx: use new GPIO line value setter callbacks

struct gpio_chip now has callbacks for setting line values that return
an integer, allowing them to indicate failures. Convert the driver to
use them.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://patch.msgid.link/20250616-gpiochip-set-rv-net-v2-1-cae0b182a552@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  gve: Return error for unknown admin queue command
Alok Tiwari [Mon, 16 Jun 2025 05:45:01 +0000 (22:45 -0700)] 
gve: Return error for unknown admin queue command

In gve_adminq_issue_cmd(), return -EINVAL instead of 0 when an unknown
admin queue command opcode is encountered.

This prevents the function from silently succeeding on invalid input
and prevents undefined behavior by ensuring the function fails gracefully
when an unrecognized opcode is provided.

These changes improve error handling.
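
The shape of the fix is roughly the following sketch (illustrative, with
the opcode handling elided):

  static int example_issue_cmd(struct device *dev, u32 opcode)
  {
          switch (opcode) {
          /* ... known opcodes handled here ... */
          default:
                  dev_err(dev, "unknown AQ command opcode %u\n", opcode);
                  return -EINVAL; /* previously fell through and returned 0 */
          }
  }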

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20250616054504.1644770-2-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  gve: Fix various typos and improve code comments
Alok Tiwari [Mon, 16 Jun 2025 05:45:00 +0000 (22:45 -0700)] 
gve: Fix various typos and improve code comments

- Correct spelling and improves the clarity of comments
   "confiugration" -> "configuration"
   "spilt" -> "split"
   "It if is 0" -> "If it is 0"
   "DQ" -> "DQO" (correct abbreviation)
- Clarify BIT(0) flag usage in gve_get_priv_flags()
- Replaced hardcoded array size with GVE_NUM_PTYPES
  for clarity and maintainability.

These changes are purely cosmetic and do not affect functionality.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250616054504.1644770-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  selftests: devmem: add ipv4 support to chunks test
Mina Almasry [Sun, 15 Jun 2025 20:35:11 +0000 (20:35 +0000)] 
selftests: devmem: add ipv4 support to chunks test

Add ipv4 support to the recently added chunks test, which was added as
ipv6 only.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250615203511.591438-3-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  selftests: devmem: remove unused variable
Mina Almasry [Sun, 15 Jun 2025 20:35:10 +0000 (20:35 +0000)] 
selftests: devmem: remove unused variable

Trivial fix to unused variable.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250615203511.591438-2-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  netmem: fix netmem comments
Mina Almasry [Sun, 15 Jun 2025 20:35:09 +0000 (20:35 +0000)] 
netmem: fix netmem comments

Trivial fix to a couple of outdated netmem comments. No code changes,
just more accurately describing current code.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250615203511.591438-1-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  selftest: Add selftest for multicast address notifications
Yuyang Huang [Sat, 14 Jun 2025 05:35:22 +0000 (14:35 +0900)] 
selftest: Add selftest for multicast address notifications

This commit adds a new kernel selftest to verify RTNLGRP_IPV4_MCADDR
and RTNLGRP_IPV6_MCADDR notifications. The test works by adding and
removing a dummy interface and then confirming that the system
correctly receives join and removal notifications for the 224.0.0.1
and ff02::1 multicast addresses.

The test requires iproute2 version 6.13 or newer.

Tested by the following command:
$ vng -v --user root --cpus 16 -- \
make -C tools/testing/selftests TARGETS=net
TEST_PROGS=rtnetlink_notification.sh \
TEST_GEN_PROGS="" run_tests

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Link: https://patch.msgid.link/20250614053522.623820-1-yuyanghuang@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  Merge branch 'net-dsa-b53-fix-bcm5325-support'
Jakub Kicinski [Wed, 18 Jun 2025 00:52:32 +0000 (17:52 -0700)] 
Merge branch 'net-dsa-b53-fix-bcm5325-support'

Álvaro Fernández Rojas says:

====================
net: dsa: b53: fix BCM5325 support

These patches get the BCM5325 switch working with b53.

The existing brcm legacy tag only works with BCM63xx switches.
We need to add a new legacy tag for BCM5325 and BCM5365 switches, which
require including the FCS and length.

I'm not really sure that everything here is correct since I don't work for
Broadcom and all this is based on the public datasheet available for the
BCM5325 and my own experiments with a Huawei HG556a (BCM6358).

Both sets of patches have been merged due to the change requested by Jonas
about BRCM_HDR register access depending on legacy tags.
====================

Link: https://patch.msgid.link/20250614080000.1884236-1-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: dsa: b53: ensure BCM5325 PHYs are enabled
Álvaro Fernández Rojas [Sat, 14 Jun 2025 08:00:00 +0000 (10:00 +0200)] 
net: dsa: b53: ensure BCM5325 PHYs are enabled

According to the datasheet, BCM5325 uses the B53_PD_MODE_CTRL_25 register
to disable clocking to individual PHYs.
Only ports 1-4 can be enabled or disabled, and the datasheet is explicit
about not toggling BIT(0), since that bit disables the PLL power and the
switch.

Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250614080000.1884236-15-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: dsa: b53: fix b53_imp_vlan_setup for BCM5325
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:59 +0000 (09:59 +0200)] 
net: dsa: b53: fix b53_imp_vlan_setup for BCM5325

CPU port should be B53_CPU_PORT instead of B53_CPU_PORT_25 for
B53_PVLAN_PORT_MASK register.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Link: https://patch.msgid.link/20250614080000.1884236-14-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: dsa: b53: fix unicast/multicast flooding on BCM5325
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:58 +0000 (09:59 +0200)] 
net: dsa: b53: fix unicast/multicast flooding on BCM5325

BCM5325 doesn't implement the UC_FLOOD_MASK, MC_FLOOD_MASK and
IPMC_FLOOD_MASK registers.
This has to be handled differently, through other pages and registers.

Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250614080000.1884236-13-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: dsa: b53: prevent GMII_PORT_OVERRIDE_CTRL access on BCM5325
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:57 +0000 (09:59 +0200)] 
net: dsa: b53: prevent GMII_PORT_OVERRIDE_CTRL access on BCM5325

BCM5325 doesn't implement GMII_PORT_OVERRIDE_CTRL register so we should
avoid reading or writing it.
PORT_OVERRIDE_RX_FLOW and PORT_OVERRIDE_TX_FLOW aren't defined on BCM5325
and we should use PORT_OVERRIDE_LP_FLOW_25 instead.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Link: https://patch.msgid.link/20250614080000.1884236-12-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: dsa: b53: prevent BRCM_HDR access on older devices
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:56 +0000 (09:59 +0200)] 
net: dsa: b53: prevent BRCM_HDR access on older devices

Older switches don't implement BRCM_HDR register so we should avoid
reading or writing it.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Link: https://patch.msgid.link/20250614080000.1884236-11-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: dsa: b53: prevent DIS_LEARNING access on BCM5325
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:55 +0000 (09:59 +0200)] 
net: dsa: b53: prevent DIS_LEARNING access on BCM5325

BCM5325 doesn't implement DIS_LEARNING register so we should avoid reading
or writing it.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Link: https://patch.msgid.link/20250614080000.1884236-10-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: dsa: b53: fix IP_MULTICAST_CTRL on BCM5325
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:54 +0000 (09:59 +0200)] 
net: dsa: b53: fix IP_MULTICAST_CTRL on BCM5325

BCM5325 doesn't implement B53_UC_FWD_EN, B53_MC_FWD_EN or B53_IPMC_FWD_EN.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Link: https://patch.msgid.link/20250614080000.1884236-9-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: dsa: b53: prevent SWITCH_CTRL access on BCM5325
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:53 +0000 (09:59 +0200)] 
net: dsa: b53: prevent SWITCH_CTRL access on BCM5325

BCM5325 doesn't implement SWITCH_CTRL register so we should avoid reading
or writing it.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Link: https://patch.msgid.link/20250614080000.1884236-8-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: dsa: b53: prevent FAST_AGE access on BCM5325
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:52 +0000 (09:59 +0200)] 
net: dsa: b53: prevent FAST_AGE access on BCM5325

BCM5325 doesn't implement FAST_AGE registers so we should avoid reading or
writing them.

Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250614080000.1884236-7-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: dsa: b53: add support for FDB operations on 5325/5365
Florian Fainelli [Sat, 14 Jun 2025 07:59:51 +0000 (09:59 +0200)] 
net: dsa: b53: add support for FDB operations on 5325/5365

BCM5325 and BCM5365 are part of a much older generation of switches which,
due to their limited number of ports and VLAN entries (up to 256), allowed
a single 64-bit register to hold a full ARL entry.
This requires a little bit of massaging when reading, writing and
converting ARL entries in both directions.
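
Purely as a conceptual illustration of why one u64 suffices (the real
register layout differs, so don't read field positions from this sketch):
a 48-bit MAC address, an up-to-8-bit VID and a couple of control bits fit
comfortably in 64 bits.

  static u64 example_pack_arl(const u8 *mac, u16 vid, bool valid)
  {
          u64 entry = 0;
          int i;

          for (i = 0; i < ETH_ALEN; i++)
                  entry |= (u64)mac[i] << (8 * (ETH_ALEN - 1 - i));
          entry |= (u64)(vid & 0xff) << 48;       /* up to 256 VLAN entries */
          if (valid)
                  entry |= BIT_ULL(63);           /* hypothetical valid bit */

          return entry;
  }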

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Link: https://patch.msgid.link/20250614080000.1884236-6-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: dsa: b53: detect BCM5325 variants
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:50 +0000 (09:59 +0200)] 
net: dsa: b53: detect BCM5325 variants

We need to be able to differentiate the BCM5325 variants because:
- BCM5325M switches lack the ARLIO_PAGE->VLAN_ID_IDX register.
- BCM5325E switches have fewer ARL buckets: 512 instead of 1024.

Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250614080000.1884236-5-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: dsa: b53: support legacy FCS tags
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:49 +0000 (09:59 +0200)] 
net: dsa: b53: support legacy FCS tags

Commit 46c5176c586c ("net: dsa: b53: support legacy tags") introduced
support for legacy tags, but it turns out that BCM5325 and BCM5365
switches require the original FCS value and length, so they have to be
treated differently.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Link: https://patch.msgid.link/20250614080000.1884236-4-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: dsa: tag_brcm: add support for legacy FCS tags
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:48 +0000 (09:59 +0200)] 
net: dsa: tag_brcm: add support for legacy FCS tags

Add support for legacy Broadcom FCS tags, which are similar to
DSA_TAG_PROTO_BRCM_LEGACY.
BCM5325 and BCM5365 switches require including the original FCS value and
length, as opposed to BCM63xx switches.
Adding the original FCS value and length to DSA_TAG_PROTO_BRCM_LEGACY would
impact performance of BCM63xx switches, so it's better to create a new tag.

Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250614080000.1884236-3-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: dsa: tag_brcm: legacy: reorganize functions
Álvaro Fernández Rojas [Sat, 14 Jun 2025 07:59:47 +0000 (09:59 +0200)] 
net: dsa: tag_brcm: legacy: reorganize functions

Move brcm_leg_tag_rcv() definition to top.
This function is going to be shared between two different tags.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Link: https://patch.msgid.link/20250614080000.1884236-2-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  Merge branch 'nte-stmmac-visconti-cleanups'
Jakub Kicinski [Tue, 17 Jun 2025 23:25:27 +0000 (16:25 -0700)] 
Merge branch 'nte-stmmac-visconti-cleanups'

Russell King says:

====================
net: stmmac: visconti: cleanups

A short series of cleanups to the visconti dwmac glue.
====================

Link: https://patch.msgid.link/aFCHJWXSLbUoogi6@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago  net: stmmac: visconti: make phy_intf_sel local
Russell King (Oracle) [Mon, 16 Jun 2025 21:06:32 +0000 (22:06 +0100)] 
net: stmmac: visconti: make phy_intf_sel local

There is little need to have phy_intf_sel as a member of struct
visconti_eth when we have the PHY interface mode available from
phylink in visconti_eth_set_clk_tx_rate(). Without multiple
interface support, phylink is fixed to supporting only
plat->phy_interface, so we can be sure that "interface" passed
into this function is the same as plat->phy_interface.

Make phy_intf_sel local to visconti_eth_init_hw() and clean up.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uRH2G-004UyY-GD@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: visconti: clean up code formatting
Russell King (Oracle) [Mon, 16 Jun 2025 21:06:27 +0000 (22:06 +0100)] 
net: stmmac: visconti: clean up code formatting

Ensure that code is wrapped prior to column 80, and shorten the
needlessly long "clk_sel_val" to just "clk_sel".

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uRH2B-004UyS-Ch@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: visconti: reorganise visconti_eth_set_clk_tx_rate()
Russell King (Oracle) [Mon, 16 Jun 2025 21:06:22 +0000 (22:06 +0100)] 
net: stmmac: visconti: reorganise visconti_eth_set_clk_tx_rate()

Rather than testing dwmac->phy_intf_sel several times for the same
values in this function, group the code together. The only part
which was common was stopping the internal clock before programming
the clock setting.

This further improves the readability of this function.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uRH26-004UyM-9G@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: visconti: re-arrange speed decode
Russell King (Oracle) [Mon, 16 Jun 2025 21:06:17 +0000 (22:06 +0100)] 
net: stmmac: visconti: re-arrange speed decode

Re-arrange the speed decode in visconti_eth_set_clk_tx_rate() to be
more readable by first checking to see if we're using RGMII or RMII
and then decoding the speed, rather than decoding the speed and then
testing the interface mode.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uRH21-004UyG-50@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'link-napi-instances-to-queues-and-irqs'
Jakub Kicinski [Tue, 17 Jun 2025 23:24:13 +0000 (16:24 -0700)] 
Merge branch 'link-napi-instances-to-queues-and-irqs'

Justin Lai says:

====================
Link NAPI instances to queues and IRQs

This patch series introduces netdev-genl support to rtase, enabling
user-space applications to query the relationships between IRQs,
queues, and NAPI instances.
====================

Link: https://patch.msgid.link/20250616032226.7318-1-justinlai0215@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agortase: Link queues to NAPI instances
Justin Lai [Mon, 16 Jun 2025 03:22:26 +0000 (11:22 +0800)] 
rtase: Link queues to NAPI instances

Link queues to NAPI instances with netif_queue_set_napi. This
information can be queried with the netdev-genl API.
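
As a rough illustration of the mechanism (not the rtase diff itself;
the helper and field names below are made up), a driver makes calls
along these lines when a queue's NAPI context is set up, passing NULL
on teardown to clear the association:

#include <linux/netdevice.h>

/* Hedged sketch: tie one ring's NAPI instance to its RX and TX queue
 * indices so netdev-genl can report the mapping.
 */
static void example_link_queue_napi(struct net_device *dev,
                                    unsigned int idx,
                                    struct napi_struct *napi)
{
        netif_queue_set_napi(dev, idx, NETDEV_QUEUE_TYPE_RX, napi);
        netif_queue_set_napi(dev, idx, NETDEV_QUEUE_TYPE_TX, napi);
}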

Signed-off-by: Justin Lai <justinlai0215@realtek.com>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250616032226.7318-3-justinlai0215@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agortase: Link IRQs to NAPI instances
Justin Lai [Mon, 16 Jun 2025 03:22:25 +0000 (11:22 +0800)] 
rtase: Link IRQs to NAPI instances

Link IRQs to NAPI instances with netif_napi_set_irq. This
information can be queried with the netdev-genl API.

Also add support for persistent NAPI configuration using
netif_napi_add_config().
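
An illustrative sketch (assumed names, not the driver's code) of how
the two calls typically sit together once the vector for a ring is
known:

#include <linux/netdevice.h>

static int example_poll(struct napi_struct *napi, int budget);

/* Hedged sketch: register NAPI under a persistent config index and
 * record which IRQ drives it, so netdev-genl can report both and the
 * per-NAPI settings survive queue reconfiguration.
 */
static void example_register_napi(struct net_device *dev,
                                  struct napi_struct *napi,
                                  int ring_idx, int irq)
{
        netif_napi_add_config(dev, napi, example_poll, ring_idx);
        netif_napi_set_irq(napi, irq);
}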

Signed-off-by: Justin Lai <justinlai0215@realtek.com>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250616032226.7318-2-justinlai0215@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoselftests: nettest: Fix typo in log and error messages for clarity
Alok Tiwari [Sun, 15 Jun 2025 08:48:12 +0000 (01:48 -0700)] 
selftests: nettest: Fix typo in log and error messages for clarity

This patch corrects several logging and error messages in nettest.c:
- Corrects the function name in log messages: "setsockopt" -> "getsockopt".
- Closes a missing parenthesis in "setsockopt(IPV6_FREEBIND)".
- Replaces misleading error text ("Invalid port") with the correct
  description ("Invalid prefix length").
- Removes redundant wording like "status from status" and clarifies
  the context in IPC error messages.

These changes improve readability and aid in debugging test output.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250615084822.1344759-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'tcp-remove-obsolete-rfc3517-rfc6675-code'
Jakub Kicinski [Tue, 17 Jun 2025 23:19:04 +0000 (16:19 -0700)] 
Merge branch 'tcp-remove-obsolete-rfc3517-rfc6675-code'

Neal Cardwell says:

====================
tcp: remove obsolete RFC3517/RFC6675 code

RACK-TLP loss detection has been enabled as the default loss detection
algorithm for Linux TCP since 2018, in:

 commit b38a51fec1c1 ("tcp: disable RFC6675 loss detection")

In case users ran into unexpected bugs or performance regressions,
that commit allowed Linux system administrators to revert to using
RFC3517/RFC6675 loss recovery by setting net.ipv4.tcp_recovery to 0.

In the seven years since 2018, our team has not heard reports of
anyone reverting Linux TCP to use RFC3517/RFC6675 loss recovery, and
we can't find any record in web searches of such a revert.

RACK-TLP was published as a standards-track RFC, RFC8985, in February
2021.

Several other major TCP implementations have default-enabled RACK-TLP
at this point as well.

RACK-TLP offers several significant performance advantages over
RFC3517/RFC6675 loss recovery, including much better performance in
the common cases of tail drops, lost retransmissions, and reordering.

It is now time to remove the obsolete and unused RFC3517/RFC6675 loss
recovery code. This will allow a substantial simplification of the
Linux TCP code base and remove 12 bytes of state in every tcp_sock
for 64-bit machines (8 bytes on 32-bit machines).

To arrange the commits in reasonable sizes, this patch series is split
into 3 commits:

(1) Removes the core RFC3517/RFC6675 logic.

(2) Removes the RFC3517/RFC6675 hint state and the first layer of logic that
    updates that state.

(3) Removes the emptied-out tcp_clear_retrans_hints_partial() helper function
    and all of its call sites.
====================

Link: https://patch.msgid.link/20250615001435.2390793-1-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agotcp: remove RFC3517/RFC6675 tcp_clear_retrans_hints_partial()
Neal Cardwell [Sun, 15 Jun 2025 00:14:35 +0000 (20:14 -0400)] 
tcp: remove RFC3517/RFC6675 tcp_clear_retrans_hints_partial()

Now that we have removed the RFC3517/RFC6675 hints,
tcp_clear_retrans_hints_partial() is empty, and can be removed.

Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250615001435.2390793-4-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agotcp: remove RFC3517/RFC6675 hint state: lost_skb_hint, lost_cnt_hint
Neal Cardwell [Sun, 15 Jun 2025 00:14:34 +0000 (20:14 -0400)] 
tcp: remove RFC3517/RFC6675 hint state: lost_skb_hint, lost_cnt_hint

Now that obsolete RFC3517/RFC6675 TCP loss detection has been removed,
we can remove the somewhat complex and intrusive code to maintain its
hint state: lost_skb_hint and lost_cnt_hint.

This commit makes tcp_clear_retrans_hints_partial() empty. We will
remove tcp_clear_retrans_hints_partial() and its call sites in the
next commit.

Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250615001435.2390793-3-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agotcp: remove obsolete and unused RFC3517/RFC6675 loss recovery code
Neal Cardwell [Sun, 15 Jun 2025 00:14:33 +0000 (20:14 -0400)] 
tcp: remove obsolete and unused RFC3517/RFC6675 loss recovery code

RACK-TLP loss detection has been enabled as the default loss detection
algorithm for Linux TCP since 2018, in:

 commit b38a51fec1c1 ("tcp: disable RFC6675 loss detection")

In case users ran into unexpected bugs or performance regressions,
that commit allowed Linux system administrators to revert to using
RFC3517/RFC6675 loss recovery by setting net.ipv4.tcp_recovery to 0.

In the seven years since 2018, our team has not heard reports of
anyone reverting Linux TCP to use RFC3517/RFC6675 loss recovery, and
we can't find any record in web searches of such a revert.

RACK-TLP was published as a standards-track RFC, RFC8985, in February
2021.

Several other major TCP implementations have default-enabled RACK-TLP
at this point as well.

RACK-TLP offers several significant performance advantages over
RFC3517/RFC6675 loss recovery, including much better performance in
the common cases of tail drops, lost retransmissions, and reordering.

It is now time to remove the obsolete and unused RFC3517/RFC6675 loss
recovery code. This will allow a substantial simplification of the
Linux TCP code base and remove 12 bytes of state in every tcp_sock
for 64-bit machines (8 bytes on 32-bit machines).

To arrange the commits in reasonable sizes, this patch series is split
into 3 commits. The following 2 commits remove bookkeeping state and
code that is no longer needed after this removal of RFC3517/RFC6675
loss recovery.

Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250615001435.2390793-2-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: bcmgenet: update PHY power down
Doug Berger [Sat, 14 Jun 2025 02:58:16 +0000 (19:58 -0700)] 
net: bcmgenet: update PHY power down

The disable sequence in bcmgenet_phy_power_set() is updated to
match the inverse sequence and timing (and spacing) of the
enable sequence. This ensures that LEDs driven by the GENET IP
are disabled when the GPHY is powered down.

Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250614025817.3808354-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agobnxt_en: Improve comment wording and error return code
Alok Tiwari [Sun, 15 Jun 2025 15:40:40 +0000 (08:40 -0700)] 
bnxt_en: Improve comment wording and error return code

Improved wording and grammar in several comments for clarity.
  "the must belongs" -> "it must belong"
  "mininum" -> "minimum"
  "fileds" -> "fields"

Replaced return -1 with -EINVAL in hwrm_ring_alloc_send_msg()
to return a proper error code.

These changes enhance code readability and ensure consistent error handling.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250615154051.1365631-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: liquidio: Remove unused validate_cn23xx_pf_config_info()
Dr. David Alan Gilbert [Sat, 14 Jun 2025 23:49:41 +0000 (00:49 +0100)] 
net: liquidio: Remove unused validate_cn23xx_pf_config_info()

[Note, I'm wondering if actually this is a case of a missing call;
the other similar function is called in __verify_octeon_config_info(),
but I don't have or know the hardware.]

validate_cn23xx_pf_config_info() was added in 2016 by
commit 72c0091293c0 ("liquidio: CN23XX device init and sriov config")

It is unused, so remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250614234941.61769-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'net-stmmac-rk-more-cleanups'
Jakub Kicinski [Tue, 17 Jun 2025 22:30:15 +0000 (15:30 -0700)] 
Merge branch 'net-stmmac-rk-more-cleanups'

Russell King says:

====================
net: stmmac: rk: more cleanups

Another couple of cleanups removing pointless code.
====================

Link: https://patch.msgid.link/aE_u8mCkUXEWTzJe@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: rk: remove unnecessary clk_mac
Russell King (Oracle) [Mon, 16 Jun 2025 10:16:01 +0000 (11:16 +0100)] 
net: stmmac: rk: remove unnecessary clk_mac

The stmmac platform code already gets the "stmmaceth" clock, so there
is no need for drivers to get it. Use the stored pointer in struct
plat_stmmacenet_data instead of getting and storing our own pointer.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uR6sj-004Ku5-HR@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: rk: use device rather than platform device in rk_priv_data
Russell King (Oracle) [Mon, 16 Jun 2025 10:15:56 +0000 (11:15 +0100)] 
net: stmmac: rk: use device rather than platform device in rk_priv_data

All the code in dwmac-rk uses &bsp_priv->pdev->dev; nothing uses
bsp_priv->pdev directly. Store the struct device rather than the
struct platform_device in struct rk_priv_data, simplifying the code.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uR6se-004Ktz-Dx@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: rk: fix code formatting issue
Russell King (Oracle) [Mon, 16 Jun 2025 10:15:51 +0000 (11:15 +0100)] 
net: stmmac: rk: fix code formatting issue

Fix a code formatting issue introduced in the previous series: a missing
space after ',' before "int".

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uR6sZ-004Ktt-9y@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'io_uring-cmd-for-tx-timestamps'
Jakub Kicinski [Tue, 17 Jun 2025 22:24:29 +0000 (15:24 -0700)] 
Merge branch 'io_uring-cmd-for-tx-timestamps'

Pavel Begunkov says:

====================
io_uring cmd for tx timestamps (part)

Apply the networking helpers for the io_uring timestamp API.
====================

Link: https://patch.msgid.link/cover.1750065793.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: timestamp: add helper returning skb's tx tstamp
Pavel Begunkov [Mon, 16 Jun 2025 09:46:25 +0000 (10:46 +0100)] 
net: timestamp: add helper returning skb's tx tstamp

Add a helper function skb_get_tx_timestamp() that returns a tx timestamp
associated with an error queue skb.
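
Not the new helper itself, but a sketch of the data it draws from
(the function name here is made up): an error-queue skb carries a
hardware timestamp in skb_hwtstamps() or a software one in skb->tstamp.

#include <linux/skbuff.h>

/* Hedged sketch: prefer the HW timestamp if the driver filled one in,
 * otherwise fall back to the SW transmit timestamp.
 */
static ktime_t example_tx_tstamp(struct sk_buff *skb)
{
        struct skb_shared_hwtstamps *hwts = skb_hwtstamps(skb);

        return hwts->hwtstamp ? hwts->hwtstamp : skb->tstamp;
}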

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/702357dd8936ef4c0d3864441e853bfe3224a677.1750065793.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'vsock-test-improve-transport_uaf-test'
Jakub Kicinski [Tue, 17 Jun 2025 21:50:37 +0000 (14:50 -0700)] 
Merge branch 'vsock-test-improve-transport_uaf-test'

Michal Luczaj says:

====================
vsock/test: Improve transport_uaf test

Increase the coverage of a test implemented in commit 301a62dfb0d0
("vsock/test: Add test for UAF due to socket unbinding"). Take this
opportunity to factor out some utility code, drop a redundant sync between
client and server, and introduce a /proc/kallsyms harvesting logic for
auto-detecting registered vsock transports.

v2: https://lore.kernel.org/20250528-vsock-test-inc-cov-v2-0-8f655b40d57c@rbox.co
v1: https://lore.kernel.org/20250523-vsock-test-inc-cov-v1-1-fa3507941bbd@rbox.co
====================

Link: https://patch.msgid.link/20250611-vsock-test-inc-cov-v3-0-5834060d9c20@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agovsock/test: Cover more CIDs in transport_uaf test
Michal Luczaj [Wed, 11 Jun 2025 19:56:52 +0000 (21:56 +0200)] 
vsock/test: Cover more CIDs in transport_uaf test

Increase the coverage of the test for UAF due to socket unbinding, and
for losing the transport in general. It is a follow-up to commit
301a62dfb0d0 ("vsock/test: Add test for UAF due to socket unbinding")
and the discussion in [1].

The idea remains the same: take an unconnected stream socket with a
transport assigned and then attempt to switch the transport by trying (and
failing) to connect to some other CID. Now do this iterating over all the
well known CIDs (plus one).
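
For illustration only (this is not the selftest source; the helper name
is made up), the connect attempt that drives the transport switch looks
roughly like:

#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Hedged sketch: connect() an existing AF_VSOCK socket to a given CID.
 * On an unreachable CID this fails, which is exactly what exercises the
 * transport reassignment path.
 */
static int example_try_cid(int fd, unsigned int cid, unsigned int port)
{
        struct sockaddr_vm addr = {
                .svm_family = AF_VSOCK,
                .svm_cid = cid,
                .svm_port = port,
        };

        return connect(fd, (struct sockaddr *)&addr, sizeof(addr));
}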

While at it, drop the redundant synchronization between client and server.

Some single-transport setups can't be tested effectively; a warning is
issued. Depending on the transports available, a variety of splats are possible
on unpatched machines. After reverting commit 78dafe1cf3af ("vsock: Orphan
socket after transport release") and commit fcdd2242c023 ("vsock: Keep the
binding until socket destruction"):

BUG: KASAN: slab-use-after-free in __vsock_bind+0x61f/0x720
Read of size 4 at addr ffff88811ff46b54 by task vsock_test/1475
Call Trace:
 dump_stack_lvl+0x68/0x90
 print_report+0x170/0x53d
 kasan_report+0xc2/0x180
 __vsock_bind+0x61f/0x720
 vsock_connect+0x727/0xc40
 __sys_connect+0xe8/0x100
 __x64_sys_connect+0x6e/0xc0
 do_syscall_64+0x92/0x1c0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

WARNING: CPU: 0 PID: 1475 at net/vmw_vsock/virtio_transport_common.c:37 virtio_transport_send_pkt_info+0xb2b/0x1160
Call Trace:
 virtio_transport_connect+0x90/0xb0
 vsock_connect+0x782/0xc40
 __sys_connect+0xe8/0x100
 __x64_sys_connect+0x6e/0xc0
 do_syscall_64+0x92/0x1c0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
RIP: 0010:sock_has_perm+0xa7/0x2a0
Call Trace:
 selinux_socket_connect_helper.isra.0+0xbc/0x450
 selinux_socket_connect+0x3b/0x70
 security_socket_connect+0x31/0xd0
 __sys_connect_file+0x79/0x1f0
 __sys_connect+0xe8/0x100
 __x64_sys_connect+0x6e/0xc0
 do_syscall_64+0x92/0x1c0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

refcount_t: addition on 0; use-after-free.
WARNING: CPU: 7 PID: 1518 at lib/refcount.c:25 refcount_warn_saturate+0xdd/0x140
RIP: 0010:refcount_warn_saturate+0xdd/0x140
Call Trace:
 __vsock_bind+0x65e/0x720
 vsock_connect+0x727/0xc40
 __sys_connect+0xe8/0x100
 __x64_sys_connect+0x6e/0xc0
 do_syscall_64+0x92/0x1c0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

refcount_t: underflow; use-after-free.
WARNING: CPU: 0 PID: 1475 at lib/refcount.c:28 refcount_warn_saturate+0x12b/0x140
RIP: 0010:refcount_warn_saturate+0x12b/0x140
Call Trace:
 vsock_remove_bound+0x18f/0x280
 __vsock_release+0x371/0x480
 vsock_release+0x88/0x120
 __sock_release+0xaa/0x260
 sock_close+0x14/0x20
 __fput+0x35a/0xaa0
 task_work_run+0xff/0x1c0
 do_exit+0x849/0x24c0
 make_task_dead+0xf3/0x110
 rewind_stack_and_make_dead+0x16/0x20

[1]: https://lore.kernel.org/netdev/CAGxU2F5zhfWymY8u0hrKksW8PumXAYz-9_qRmW==92oAx1BX3g@mail.gmail.com/

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250611-vsock-test-inc-cov-v3-3-5834060d9c20@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agovsock/test: Introduce get_transports()
Michal Luczaj [Wed, 11 Jun 2025 19:56:51 +0000 (21:56 +0200)] 
vsock/test: Introduce get_transports()

Return a bitmap of registered vsock transports, as guesstimated by
grepping /proc/kallsyms (CONFIG_KALLSYMS=y) for known symbols of type
`struct vsock_transport`, or `struct virtio_transport` in case the
vsock_transport is embedded within it.
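
A rough userspace sketch of that harvesting (helper name assumed; the
actual patch differs in details) is simply a substring scan of
/proc/kallsyms:

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

/* Hedged sketch: report whether a data symbol such as
 * "d loopback_transport" shows up in /proc/kallsyms.
 */
static bool example_ksym_exists(const char *needle)
{
        char line[256];
        bool found = false;
        FILE *f = fopen("/proc/kallsyms", "r");

        if (!f)
                return false;
        while (!found && fgets(line, sizeof(line), f))
                found = strstr(line, needle) != NULL;
        fclose(f);
        return found;
}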

Note that the way `enum transport` and `transport_ksyms[]` are defined
triggers checkpatch.pl:

util.h:11: ERROR: Macros with complex values should be enclosed in parentheses
util.h:20: ERROR: Macros with complex values should be enclosed in parentheses
util.h:20: WARNING: Argument 'symbol' is not used in function-like macro
util.h:28: WARNING: Argument 'name' is not used in function-like macro

While commit 15d4734c7a58 ("checkpatch: qualify do-while-0 advice")
suggests it is known that the ERROR heuristics are insufficient, I cannot
find many other places where the preprocessor is used in this
checkpatch-unhappy fashion. A notable exception is bcachefs, e.g.
fs/bcachefs/alloc_background_format.h. WARNINGs regarding unused macro
arguments seem more common, e.g. __ASM_SEL in arch/x86/include/asm/asm.h.

In other words, this might be unnecessarily complex. The same can be
achieved by just telling the human to keep the order:

enum transport {
        TRANSPORT_LOOPBACK = BIT(0),
        TRANSPORT_VIRTIO = BIT(1),
        TRANSPORT_VHOST = BIT(2),
        TRANSPORT_VMCI = BIT(3),
        TRANSPORT_HYPERV = BIT(4),
        TRANSPORT_NUM = 5,
};

 #define KSYM_ENTRY(sym) "d " sym "_transport"

/* Keep `enum transport` order */
static const char * const transport_ksyms[] = {
        KSYM_ENTRY("loopback"),
        KSYM_ENTRY("virtio"),
        KSYM_ENTRY("vhost"),
        KSYM_ENTRY("vmci"),
        KSYM_ENTRY("hvs"),
};

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Tested-by: Luigi Leonardi <leonardi@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250611-vsock-test-inc-cov-v3-2-5834060d9c20@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agovsock/test: Introduce vsock_bind_try() helper
Michal Luczaj [Wed, 11 Jun 2025 19:56:50 +0000 (21:56 +0200)] 
vsock/test: Introduce vsock_bind_try() helper

Create a socket and bind() it. If binding fails, gracefully return an
error code while preserving `errno`.

Base vsock_bind() on top of it.
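
A minimal sketch of the idea (assumed name and simplified arguments,
not the selftest code):

#include <errno.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Hedged sketch: create an AF_VSOCK socket and try to bind it; on
 * failure return -1 with errno preserved instead of exiting.
 */
static int example_vsock_bind_try(unsigned int cid, unsigned int port,
                                  int type)
{
        struct sockaddr_vm addr = {
                .svm_family = AF_VSOCK,
                .svm_cid = cid,
                .svm_port = port,
        };
        int fd, saved_errno;

        fd = socket(AF_VSOCK, type, 0);
        if (fd < 0)
                return -1;

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                saved_errno = errno;
                close(fd);      /* close() may clobber errno */
                errno = saved_errno;
                return -1;
        }

        return fd;
}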

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Link: https://patch.msgid.link/20250611-vsock-test-inc-cov-v3-1-5834060d9c20@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'shradha_v6.16-rc1' of https://github.com/shradhagupta6/linux
Jakub Kicinski [Tue, 17 Jun 2025 21:43:58 +0000 (14:43 -0700)] 
Merge branch 'shradha_v6.16-rc1' of https://github.com/shradhagupta6/linux

Shradha Gupta says:

====================
Allow dyn MSI-X vector allocation of MANA

In this patchset we want to enable the MANA driver to allocate PCI
MSI-X vectors dynamically.

The first patch exports pci_msix_prepare_desc() in PCI to be able to
correctly prepare descriptors for dynamically added MSI-X vectors.

The second patch adds the support of dynamic vector allocation in
pci-hyperv PCI controller by enabling the MSI_FLAG_PCI_MSIX_ALLOC_DYN
flag and using the pci_msix_prepare_desc() exported in first patch.

The third patch adds a detailed description of irq_setup(), to help
understand the function's design better.

The fourth patch is a preparation patch for the mana changes to support
dynamic IRQ allocation. It contains changes in irq_setup() to allow
skipping the first sibling CPU sets, in case certain IRQs are already
affinitized to them.

The fifth patch has the changes in MANA driver to be able to allocate
MSI-X vectors dynamically. If the support does not exist it defaults to
older behavior.

* 'shradha_v6.16-rc1' of https://github.com/shradhagupta6/linux:
  net: mana: Allocate MSI-X vectors dynamically
  net: mana: Allow irq_setup() to skip cpus for affinity
  net: mana: explain irq_setup() algorithm
  PCI: hv: Allow dynamic MSI-X vector allocation
  PCI/MSI: Export pci_msix_prepare_desc() for dynamic MSI-X allocations
====================

Link: https://patch.msgid.link/1749650984-9193-1-git-send-email-shradhagupta@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: phy: Add c45_phy_ids sysfs directory entry
Yajun Deng [Fri, 13 Jun 2025 13:19:03 +0000 (21:19 +0800)] 
net: phy: Add c45_phy_ids sysfs directory entry

The phy_id field only shows the PHY ID of a C22 device; a C45 device
does not store its PHY ID in this field.

Add a new phy_mmd_group and export the mmd<n>_device_id files for C45
devices. These files are invisible for C22 devices.
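
A generic sketch of the mechanism (not the phylib patch itself; the
attribute array below is a placeholder): a sysfs attribute group whose
is_visible callback hides the entries on C22 devices.

#include <linux/phy.h>
#include <linux/sysfs.h>

static struct attribute *example_mmd_attrs[] = {
        /* mmd<n>_device_id attribute entries would go here */
        NULL,
};

/* Hedged sketch: expose the group's files only for C45 PHY devices. */
static umode_t example_mmd_is_visible(struct kobject *kobj,
                                      struct attribute *attr, int n)
{
        struct phy_device *phydev = to_phy_device(kobj_to_dev(kobj));

        return phydev->is_c45 ? attr->mode : 0;
}

static const struct attribute_group example_mmd_group = {
        .attrs          = example_mmd_attrs,
        .is_visible     = example_mmd_is_visible,
};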

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250613131903.2961-1-yajun.deng@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agoMerge branch 'intel-next-queue-1GbE'
Paolo Abeni [Tue, 17 Jun 2025 12:58:47 +0000 (14:58 +0200)] 
Merge branch 'intel-next-queue-1GbE'

Tony Nguyen says:

====================
Faizal Rahim says:

MAC Merge support for frame preemption was previously added for igc:
https://lore.kernel.org/netdev/20250418163822.3519810-1-anthony.l.nguyen@intel.com/

This series builds on that work and adds support for:
- Harmonizing taprio and mqprio queue priority behavior, based on past
  discussions and suggestions:
  https://lore.kernel.org/all/20250214102206.25dqgut5tbak2rkz@skbuf/
- Enabling preemptible queue support for both taprio and mqprio, with
  priority harmonization as a prerequisite.

Patch organization:
- Patches 1-3: Preparation work for patches 6 and 7
- Patches 4-5: Queue priority harmonization
- Patches 6-7: Add preemptible queue support
====================

Link: https://patch.msgid.link/20250611180314.2059166-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agonet: enetc: replace PCVLANR1/2 with SICVLANR1/2 and remove dead branch
Wei Fang [Fri, 13 Jun 2025 09:36:05 +0000 (17:36 +0800)] 
net: enetc: replace PCVLANR1/2 with SICVLANR1/2 and remove dead branch

Both PF and VF have rx-vlan-offload enabled; however, the PCVLANR1/2
registers are resources controlled by the PF, so the VF cannot access
them. Fortunately, the hardware provides SICVLANR1/2 registers for each
SI that reflect the value of the PCVLANR1/2 registers. Therefore, use
SICVLANR1/2 instead of PCVLANR1/2. Note that this is not an issue in
actual use: the current driver does not support custom TPIDs, so it
never accesses these two registers; this change is just an optimization.

In addition, since ENETC_RXBD_FLAG_TPID is defined as GENMASK(1, 0),
the only possible values are 0, 1, 2 and 3, so the default branch can
never be taken; remove it.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20250613093605.39277-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agonet: mana: Allocate MSI-X vectors dynamically
Shradha Gupta [Wed, 11 Jun 2025 14:11:13 +0000 (07:11 -0700)] 
net: mana: Allocate MSI-X vectors dynamically

Currently, the MANA driver allocates MSI-X vectors statically based on
the MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases
ends up allocating more vectors than it needs. This is because, at that
point, we do not yet have a HW channel and do not know how many IRQs
should be allocated.

To avoid this, we allocate 1 MSI-X vector during the creation of the
HWC and, after getting the value supported by the hardware, dynamically
add the remaining MSI-X vectors.
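
As an illustrative sketch of the dynamic flow (names invented, not the
mana driver; assumes the device was brought up with a single vector
already):

#include <linux/pci.h>
#include <linux/msi.h>

/* Hedged sketch: after the HW channel reports how many queues the
 * device supports, grow the MSI-X allocation one vector at a time.
 */
static int example_grow_msix(struct pci_dev *pdev, unsigned int needed)
{
        unsigned int i;

        for (i = 1; i < needed; i++) {
                /* MSI_ANY_INDEX: let the core pick a free table entry */
                struct msi_map map =
                        pci_msix_alloc_irq_at(pdev, MSI_ANY_INDEX, NULL);

                if (map.index < 0)
                        return map.index;
                /* map.virq is the Linux IRQ to request_irq() on */
        }

        return 0;
}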

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
6 weeks agonet: mana: Allow irq_setup() to skip cpus for affinity
Shradha Gupta [Wed, 11 Jun 2025 14:10:42 +0000 (07:10 -0700)] 
net: mana: Allow irq_setup() to skip cpus for affinity

In order to prepare the MANA driver to allocate MSI-X IRQs dynamically,
we need to enhance irq_setup() to allow skipping the affinitization of
IRQs to the first CPU sibling group.

This would be for cases where the number of IRQs is less than or equal
to the number of online CPUs. In such cases, for dynamically added IRQs,
the first CPU sibling group would already be affinitized to the HWC IRQ.

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
6 weeks agonet: mana: explain irq_setup() algorithm
Yury Norov [Wed, 11 Jun 2025 14:10:29 +0000 (07:10 -0700)] 
net: mana: explain irq_setup() algorithm

Commit 91bfe210e196 ("net: mana: add a function to spread IRQs per CPUs")
added the irq_setup() function that distributes IRQs on CPUs according
to a tricky heuristic. The corresponding commit message explains the
heuristic.

Duplicate it in the source code to make it available to readers without
digging through git history. Also, add a more detailed explanation of
how the heuristic is implemented.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
6 weeks agoPCI: hv: Allow dynamic MSI-X vector allocation
Shradha Gupta [Wed, 11 Jun 2025 14:10:15 +0000 (07:10 -0700)] 
PCI: hv: Allow dynamic MSI-X vector allocation

Allow dynamic MSI-X vector allocation for pci_hyperv PCI controller
by adding support for the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN and using
pci_msix_prepare_desc() to prepare the MSI-X descriptors.

Feature support is added for both x86 and ARM64.
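
A hedged sketch of what enabling this looks like for an MSI domain
(illustrative only; the actual pci-hyperv flags and ops differ):

#include <linux/msi.h>

/* Hedged sketch: advertise dynamic MSI-X allocation and reuse the PCI
 * core's descriptor setup for vectors added after the initial enable.
 */
static struct msi_domain_ops example_msi_ops = {
        .prepare_desc   = pci_msix_prepare_desc,
};

static struct msi_domain_info example_msi_info = {
        .flags  = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
                  MSI_FLAG_PCI_MSIX | MSI_FLAG_PCI_MSIX_ALLOC_DYN,
        .ops    = &example_msi_ops,
};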

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
6 weeks agoPCI/MSI: Export pci_msix_prepare_desc() for dynamic MSI-X allocations
Shradha Gupta [Wed, 11 Jun 2025 14:10:01 +0000 (07:10 -0700)] 
PCI/MSI: Export pci_msix_prepare_desc() for dynamic MSI-X allocations

For supporting dynamic MSI-X vector allocation by PCI controllers,
enabling the MSI_FLAG_PCI_MSIX_ALLOC_DYN flag is not enough;
msix_prepare_msi_desc() is also needed to prepare the MSI descriptor.

Export pci_msix_prepare_desc() to allow PCI controllers to support dynamic
MSI-X vector allocation.

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
6 weeks agoeth: gianfar: migrate to new RXFH callbacks
Jakub Kicinski [Fri, 13 Jun 2025 17:27:51 +0000 (10:27 -0700)] 
eth: gianfar: migrate to new RXFH callbacks

Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").

Uniquely, this driver supports only the SET operation. It does not
support GET at all. The SET callback also always returns 0, even though
it checks a bunch of conditions and, if my quick reading is right,
expects the user to insert filtering rules for the given flow type
first. Long story short, it seems too convoluted to easily add the GET
as part of the conversion.

Link: https://patch.msgid.link/20250613172751.3754732-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'net-phy-remove-phy_driver_is_genphy-and-phy_driver_is_genphy_10g'
Jakub Kicinski [Tue, 17 Jun 2025 01:15:19 +0000 (18:15 -0700)] 
Merge branch 'net-phy-remove-phy_driver_is_genphy-and-phy_driver_is_genphy_10g'

Heiner Kallweit says:

====================
net: phy: remove phy_driver_is_genphy and phy_driver_is_genphy_10g

Replace phy_driver_is_genphy() and phy_driver_is_genphy_10g()
with a new flag in struct phy_device.
====================

Link: https://patch.msgid.link/5778e86e-dd54-4388-b824-6132729ad481@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>