]> git.ipfire.org Git - thirdparty/kernel/linux.git/log
thirdparty/kernel/linux.git
2 weeks agonet: factor-out _sk_charge() helper
Paolo Abeni [Fri, 21 Nov 2025 17:02:00 +0000 (18:02 +0100)] 
net: factor-out _sk_charge() helper

Move out of __inet_accept() the code dealing charging newly
accepted socket to memcg. MPTCP will soon use it to on a per
subflow basis, in different contexts.

No functional changes intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Geliang Tang <geliang@kernel.org>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-1-1f34b6c1e0b1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoipvlan: fix sparse warning about __be32 -> u32
Dmitry Skorodumov [Fri, 21 Nov 2025 15:51:08 +0000 (18:51 +0300)] 
ipvlan: fix sparse warning about __be32 -> u32

Fixed a sparse warning:

ipvlan_core.c:56: warning: incorrect type in argument 1
(different base types) expected unsigned int [usertype] a
got restricted __be32 const [usertype] s_addr

Force cast the s_addr to u32

Signed-off-by: Dmitry Skorodumov <skorodumov.dmitry@huawei.com>
Link: https://patch.msgid.link/20251121155112.4182007-1-skorodumov.dmitry@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: mvpp2: extract GRXRINGS from .get_rxnfc
Breno Leitao [Fri, 21 Nov 2025 17:02:36 +0000 (09:02 -0800)] 
net: mvpp2: extract GRXRINGS from .get_rxnfc

Commit 84eaf4359c36 ("net: ethtool: add get_rx_ring_count callback to
optimize RX ring queries") added specific support for GRXRINGS callback,
simplifying .get_rxnfc.

Remove the handling of GRXRINGS in .get_rxnfc() by moving it to the new
.get_rx_ring_count() for the mvpp2 driver.

This simplifies the RX ring count retrieval and aligns mvpp2 with the new
ethtool API for querying RX ring parameters, while keeping the other
rxnfc handlers (GRXCLSRLCNT, GRXCLSRULE, GRXCLSRLALL) intact.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251121-marvell-v1-2-8338f3e55a4c@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: mvneta: convert to use .get_rx_ring_count
Breno Leitao [Fri, 21 Nov 2025 17:02:35 +0000 (09:02 -0800)] 
net: mvneta: convert to use .get_rx_ring_count

Convert the mvneta driver to use the new .get_rx_ring_count ethtool
operation instead of implementing .get_rxnfc solely for handling
ETHTOOL_GRXRINGS command. This simplifies the code by removing the
switch statement and replacing it with a direct return of the queue
count.

The new callback provides the same functionality in a more direct way,
following the ongoing ethtool API modernization.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251121-marvell-v1-1-8338f3e55a4c@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: hyperv: convert to use .get_rx_ring_count
Breno Leitao [Fri, 21 Nov 2025 09:59:23 +0000 (01:59 -0800)] 
net: hyperv: convert to use .get_rx_ring_count

Convert the hyperv netvsc driver to use the new .get_rx_ring_count
ethtool operation instead of implementing .get_rxnfc solely for handling
ETHTOOL_GRXRINGS command. This simplifies the code by replacing the
switch statement with a direct return of the queue count.

The new callback provides the same functionality in a more direct way,
following the ongoing ethtool API modernization.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251121-hyperv_gxrings-v1-1-31293104953b@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: optimize eth_type_trans() vs CONFIG_STACKPROTECTOR_STRONG=y
Eric Dumazet [Fri, 21 Nov 2025 06:17:25 +0000 (06:17 +0000)] 
net: optimize eth_type_trans() vs CONFIG_STACKPROTECTOR_STRONG=y

Some platforms exhibit very high costs with CONFIG_STACKPROTECTOR_STRONG=y
when a function needs to pass the address of a local variable to external
functions.

eth_type_trans() (and its callers) is showing this anomaly on AMD EPYC 7B12
platforms (and maybe others).

We could :

1) inline eth_type_trans()

   This would help if its callers also has the same issue, and the canary cost
   would be paid by the callers already.

   This is a bit cumbersome because netdev_uses_dsa() is pulling
   whole <net/dsa.h> definitions.

2) Compile net/ethernet/eth.c with -fno-stack-protector

   This would weaken security.

3) Hack eth_type_trans() to temporarily use skb->dev as a place holder
   if skb_header_pointer() needs to pull 2 bytes not present in skb->head.

This patch implements 3), and brings a 5% improvement on TX/RX intensive
workload (tcp_rr 10,000 flows) on AMD EPYC 7B12.

Removing CONFIG_STACKPROTECTOR_STRONG on this platform can improve
performance by 25 %.
This means eth_type_trans() issue is not an isolated artifact.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251121061725.206675-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoselftests: af_unix: don't use SKIP for expected failures
Jakub Kicinski [Sun, 23 Nov 2025 02:16:01 +0000 (18:16 -0800)] 
selftests: af_unix: don't use SKIP for expected failures

netdev CI reserves SKIP in selftests for cases which can't be executed
due to setup issues, like missing or old commands. Tests which are
expected to fail must use XFAIL.

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251123021601.158709-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoselftests: netconsole: ensure required log level is set on netcons_basic
Andre Carvalho [Fri, 21 Nov 2025 15:00:22 +0000 (15:00 +0000)] 
selftests: netconsole: ensure required log level is set on netcons_basic

This commit ensures that the required log level is set at the start of
the test iteration.

Part of the cleanup performed at the end of each test iteration resets
the log level (do_cleanup in lib_netcons.sh) to the values defined at the
time test script started. This may cause further test iterations to fail
if the default values are not sufficient.

Signed-off-by: Andre Carvalho <asantostc@gmail.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251121-netcons-basic-loglevel-v1-1-577f8586159c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge branch 'selftests-hw-net-toeplitz-read-config-from-the-nic-directly'
Jakub Kicinski [Tue, 25 Nov 2025 02:51:44 +0000 (18:51 -0800)] 
Merge branch 'selftests-hw-net-toeplitz-read-config-from-the-nic-directly'

Jakub Kicinski says:

====================
selftests: hw-net: toeplitz: read config from the NIC directly

First patch here tries to auto-disable building the iouring sample.
Our CI will still run the iouring test(s), of course, but it looks
like the liburing updates aren't very quick in distroes and having
to hack around it when developing unrelated tests is a bit annoying.

Remaining 4 patches iron out running the Toeplitz hash test against
real NICs. I tested mlx5, bnxt and fbnic, they all pass now.
I switched to using YNL directly in the C code, can't see a reason
to get the info in Python and pass it to C via argv. The old code
likely did this because it predates YNL.
====================

Link: https://patch.msgid.link/20251121040259.3647749-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoselftests: hw-net: toeplitz: give the test up to 4 seconds
Jakub Kicinski [Fri, 21 Nov 2025 04:02:59 +0000 (20:02 -0800)] 
selftests: hw-net: toeplitz: give the test up to 4 seconds

Increase the receiver timeout. When running between machines
in different geographic regions the test needs more than
a second to SSH across and send the frames.

The bkg() command that runs the receiver defaults to 5 sec timeout,
so using 4 sec sounds like a reasonable value for the receiver itself.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251121040259.3647749-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoselftests: hw-net: toeplitz: read indirection table from the device
Jakub Kicinski [Fri, 21 Nov 2025 04:02:58 +0000 (20:02 -0800)] 
selftests: hw-net: toeplitz: read indirection table from the device

Replace the simple modulo math with the real indirection table
read from the device. This makes the tests pass for mlx5 and
bnxt NICs.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251121040259.3647749-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoselftests: hw-net: toeplitz: read the RSS key directly from C
Jakub Kicinski [Fri, 21 Nov 2025 04:02:57 +0000 (20:02 -0800)] 
selftests: hw-net: toeplitz: read the RSS key directly from C

Now that we have YNL support for RSS accessing the RSS info from
C is very easy. Instead of passing the RSS key from Python do it
directly in the C code.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251121040259.3647749-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoselftests: hw-net: toeplitz: make sure NICs have pure Toeplitz configured
Jakub Kicinski [Fri, 21 Nov 2025 04:02:56 +0000 (20:02 -0800)] 
selftests: hw-net: toeplitz: make sure NICs have pure Toeplitz configured

Make sure that the NIC under test is configured for pure Toeplitz
hashing, and no input key transform (no symmetric hashing).

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251121040259.3647749-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoselftests: hw-net: auto-disable building the iouring C code
Jakub Kicinski [Fri, 21 Nov 2025 04:02:55 +0000 (20:02 -0800)] 
selftests: hw-net: auto-disable building the iouring C code

Looks like the liburing is not updated by distros very aggressively.
Presumably because a lot of packages depend on it. I just updated
to Fedora 43 and it's still on liburing 2.9. The test is 9mo old,
at this stage I think this warrants handling the build failure
more gracefully.

Detect if iouring is recent enough and if not print a warning
and exclude the C prog from build. The Python test will just
fail since the binary won't exist. But it removes the major
annoyance of having to update liburing from sources when
developing other tests.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251121040259.3647749-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoi40e: delete a stray tab
Dan Carpenter [Fri, 21 Nov 2025 13:35:10 +0000 (16:35 +0300)] 
i40e: delete a stray tab

This return statement is indented one tab too far.  Delete a tab.

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Link: https://patch.msgid.link/aSBqjtA8oF25G1OG@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoMerge branch 'net-stmmac-qcon-ethqos-rgmii-accessor-cleanups'
Jakub Kicinski [Sat, 22 Nov 2025 02:13:46 +0000 (18:13 -0800)] 
Merge branch 'net-stmmac-qcon-ethqos-rgmii-accessor-cleanups'

Russell King says:

====================
net: stmmac: qcon-ethqos: "rgmii" accessor cleanups

This series cleans up the "rgmii" accessors in qcom-ethqos.

readl() and writel() return and take a u32 for the value. Rather than
implicitly casting this to an int, keep it as a u32.

Add set/clear functions to reduce the code and make it easier to read.

Finally, convert the open-coded poll loops to use the iopoll helpers.

Note that patch 1 has a checkpatch warning concerning "volatile" -
I'm changing the type here, and the "volatile" is removed in patch 3.
I do not feel it is appropriate to remove it in patch 1.
====================

Link: https://patch.msgid.link/aR76i0HjXitfl7xk@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: stmmac: qcom-ethqos: use read_poll_timeout_atomic()
Russell King (Oracle) [Thu, 20 Nov 2025 11:25:27 +0000 (11:25 +0000)] 
net: stmmac: qcom-ethqos: use read_poll_timeout_atomic()

Use read_poll_timeout_atomic() to poll the rgmii registers rather than
open-coding the polling.

Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vM2n1-0000000FRTu-0js9@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: stmmac: qcom-ethqos: add rgmii set/clear functions
Russell King (Oracle) [Thu, 20 Nov 2025 11:25:22 +0000 (11:25 +0000)] 
net: stmmac: qcom-ethqos: add rgmii set/clear functions

The driver has a lot of bit manipulation of the RGMII registers. Add
a pair of helpers to set bits and clear bits, converting the various
calls to rgmii_updatel() as appropriate.

Most of the change was done via this sed script:

/rgmii_updatel/ {
N
/,$/N
/mask, / ! {
s|rgmii_updatel\(([^,]*,\s+([^,]*),\s+)\2,\s+|rgmii_setmask(\1|
s|rgmii_updatel\(([^,]*,\s+([^,]*),\s+)0,\s+|rgmii_clrmask(\1|
s|^\s+$||
}
}

and then formatting tweaked where necessary.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vM2mw-0000000FRTo-0End@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: stmmac: qcom-ethqos: use u32 for rgmii read/write/update
Russell King (Oracle) [Thu, 20 Nov 2025 11:25:16 +0000 (11:25 +0000)] 
net: stmmac: qcom-ethqos: use u32 for rgmii read/write/update

readl() returns a u32, and writel() takes a "u32" for the value. These
are used in rgmii_readl()() and rgmii_writel(), but the value and
return are "int". As these are 32-bit register values which are not
signed, use "u32".

These changes do not cause generated code changes.

Update rgmii_updatel() to use u32 for mask and val. Changing "mask"
to "u32" also does not cause generated code changes. However, changing
"val" causes the generated assembly to be re-ordered for aarch64.

Update the temporary variables used with the rgmii functions to use
u32.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vM2mq-0000000FRTi-3y5F@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: wwan: t7xx: Make local function static
Slark Xiao [Thu, 20 Nov 2025 11:52:08 +0000 (19:52 +0800)] 
net: wwan: t7xx: Make local function static

This function was used in t7xx_hif_cldma.c only. Make it static
as it should be.

Signed-off-by: Slark Xiao <slark_xiao@163.com>
Reviewed-by: Loic Poulain <loic.poulain@qualcomm.com>
Link: https://patch.msgid.link/20251120115208.345578-1-slark_xiao@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoMerge branch 'devlink-net-mlx5-implement-swp_l4_csum_mode-via-devlink-params'
Jakub Kicinski [Fri, 21 Nov 2025 03:01:24 +0000 (19:01 -0800)] 
Merge branch 'devlink-net-mlx5-implement-swp_l4_csum_mode-via-devlink-params'

Daniel Zahka says:

====================
devlink: net/mlx5: implement swp_l4_csum_mode via devlink params

This series introduces a new devlink feature for querying param
default values, and resetting params to their default values. This
feature is then used to implement a new mlx5 driver param.

The series starts with two pure refactor patches: one that passes
through the extack to devlink_param::get() implementations. And a
second small refactor that prepares the netlink tlv handling code in
the devlink_param::get() path to better handle default parameter
values.

The third patch introduces the uapi and driver api for default
parameter values. The driver api is opt-in, and both the uapi and
driver api preserve existing behavior when not used by drivers or
userspace.

The fourth patch introduces a new mlx5 driver param, swp_l4_csum_mode,
for controlling tx csum behavior. The "l4_only" value of this param is
a dependency for PSP initialization on CX7 NICs.

Lastly, the series introduces a new driver param with cmode runtime to
netdevsim, and then uses this param in a new testcase for netdevsim
devlink params.

Here are some examples of using the default param uapi with the devlink
cli. Note the devlink cli binary I am using has changes which I am
posting in accompanying series targeting iproute2-next:

  # netdevsim
./devlink dev param show netdevsim/netdevsim0
netdevsim/netdevsim0:
  name max_macs type generic
    values:
      cmode driverinit value 32 default 32
  name test1 type driver-specific
    values:
      cmode driverinit value true default true

  # set to false
./devlink dev param set netdevsim/netdevsim0 name test1 value false cmode driverinit
./devlink dev param show netdevsim/netdevsim0
netdevsim/netdevsim0:
  name max_macs type generic
    values:
      cmode driverinit value 32 default 32
  name test1 type driver-specific
    values:
      cmode driverinit value false default true

  # set back to default
./devlink dev param set netdevsim/netdevsim0 name test1 default cmode driverinit
./devlink dev param show netdevsim/netdevsim0
netdevsim/netdevsim0:
  name max_macs type generic
    values:
      cmode driverinit value 32 default 32
  name test1 type driver-specific
    values:
      cmode driverinit value true default true

 # mlx5 params on cx7
./devlink dev param show pci/0000:01:00.0
pci/0000:01:00.0:
  name max_macs type generic
    values:
      cmode driverinit value 128 default 128
...
  name swp_l4_csum_mode type driver-specific
    values:
      cmode permanent value default default default

  # set to l4_only
./devlink dev param set pci/0000:01:00.0 name swp_l4_csum_mode value l4_only cmode permanent
./devlink dev param show pci/0000:01:00.0 name swp_l4_csum_mode
pci/0000:01:00.0:
  name swp_l4_csum_mode type driver-specific
    values:
      cmode permanent value l4_only default default

  # reset to default
./devlink dev param set pci/0000:01:00.0 name swp_l4_csum_mode default cmode permanent
./devlink dev param show pci/0000:01:00.0 name swp_l4_csum_mode
pci/0000:01:00.0:
  name swp_l4_csum_mode type driver-specific
    values:
      cmode permanent value default default default
====================

Link: https://patch.msgid.link/20251119025038.651131-1-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoselftest: netdevsim: test devlink default params
Daniel Zahka [Wed, 19 Nov 2025 02:50:36 +0000 (18:50 -0800)] 
selftest: netdevsim: test devlink default params

Test querying default values and resetting to default values for
netdevsim devlink params.

This should cover the basic paths of interest: driverinit and
non-driverinit cmodes, as well as bool and non-bool value
type. Default param values of type bool are encoded with u8 netlink
type as opposed to flag type, so that userspace can distinguish
"not-present" from false.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251119025038.651131-7-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonetdevsim: register a new devlink param with default value interface
Daniel Zahka [Wed, 19 Nov 2025 02:50:35 +0000 (18:50 -0800)] 
netdevsim: register a new devlink param with default value interface

Create a new devlink param, test2, that supports default param actions
via the devlink_param::get_default() and
devlink_param::reset_default() functions.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251119025038.651131-6-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet/mlx5: implement swp_l4_csum_mode via devlink params
Daniel Zahka [Wed, 19 Nov 2025 02:50:34 +0000 (18:50 -0800)] 
net/mlx5: implement swp_l4_csum_mode via devlink params

swp_l4_csum_mode controls how L4 transmit checksums are computed when
using Software Parser (SWP) hints for header locations.

Supported values:
  1. default: device will choose between full_csum or l4_only. Driver
     will discover the device's choice during initialization.
  2. full_csum: calculate L4 checksum with the pseudo-header.
  3. l4_only: calculate L4 checksum without the pseudo-header. Only
     available when swp_l4_csum_mode_l4_only is set in
     mlx5_ifc_nv_sw_offload_cap_bits.

Note that 'default' might be returned from the device and passed to
userspace, and it might also be set during a
devlink_param::reset_default() call, but attempts to set a value of
default directly with param-set will be rejected.

The l4_only setting is a dependency for PSP initialization in
mlx5e_psp_init().

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251119025038.651131-5-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agodevlink: support default values for param-get and param-set
Daniel Zahka [Wed, 19 Nov 2025 02:50:33 +0000 (18:50 -0800)] 
devlink: support default values for param-get and param-set

Support querying and resetting to default param values.

Introduce two new devlink netlink attrs:
DEVLINK_ATTR_PARAM_VALUE_DEFAULT and
DEVLINK_ATTR_PARAM_RESET_DEFAULT. The former is used to contain an
optional parameter value inside of the param_value nested
attribute. The latter is used in param-set requests from userspace to
indicate that the driver should reset the param to its default value.

To implement this, two new functions are added to the devlink driver
api: devlink_param::get_default() and
devlink_param::reset_default(). These callbacks allow drivers to
implement default param actions for runtime and permanent cmodes. For
driverinit params, the core latches the last value set by a driver via
devl_param_driverinit_value_set(), and uses that as the default value
for a param.

Because default parameter values are optional, it would be impossible
to discern whether or not a param of type bool has default value of
false or not provided if the default value is encoded using a netlink
flag type. For this reason, when a DEVLINK_PARAM_TYPE_BOOL has an
associated default value, the default value is encoded using a u8
type.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251119025038.651131-4-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agodevlink: refactor devlink_nl_param_value_fill_one()
Daniel Zahka [Wed, 19 Nov 2025 02:50:32 +0000 (18:50 -0800)] 
devlink: refactor devlink_nl_param_value_fill_one()

Lift the param type demux and value attr placement into a separate
function. This new function, devlink_nl_param_put(), can be used to
place additional types values in the value array, e.g., default,
current, next values. This commit has no functional change.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251119025038.651131-3-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agodevlink: pass extack through to devlink_param::get()
Daniel Zahka [Wed, 19 Nov 2025 02:50:31 +0000 (18:50 -0800)] 
devlink: pass extack through to devlink_param::get()

Allow devlink_param::get() handlers to report error messages via
extack. This function is called in a few different contexts, but not
all of them will have an valid extack to use.

When devlink_param::get() is called from param_get_doit or
param_get_dumpit contexts, pass the extack through so that drivers can
report errors when retrieving param values. devlink_param::get() is
called from the context of devlink_param_notify(), pass NULL in for
the extack.

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251119025038.651131-2-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoMerge branch 'netconsole-allow-userdata-buffer-to-grow-dynamically'
Jakub Kicinski [Fri, 21 Nov 2025 02:47:20 +0000 (18:47 -0800)] 
Merge branch 'netconsole-allow-userdata-buffer-to-grow-dynamically'

Gustavo Luiz Duarte says:

====================
netconsole: Allow userdata buffer to grow dynamically

The current netconsole implementation allocates a static buffer for
extradata (userdata + sysdata) with a fixed size of
MAX_EXTRADATA_ENTRY_LEN * MAX_EXTRADATA_ITEMS bytes for every target,
regardless of whether userspace actually uses this feature. This forces
us to keep MAX_EXTRADATA_ITEMS small (16), which is restrictive for
users who need to attach more metadata to their log messages.

This patch series enables dynamic allocation of the userdata buffer,
allowing it to grow on-demand based on actual usage. The series:

1. Refactors send_fragmented_body() to simplify handling of separated
   userdata and sysdata (patch 1/4)
2. Splits userdata and sysdata into separate buffers (patch 2/4)
3. Implements dynamic allocation for the userdata buffer (patch 3/4)
4. Increases MAX_USERDATA_ITEMS from 16 to 256 now that we can do so
   without memory waste (patch 4/4)

Benefits:
- No memory waste when userdata is not used
- Targets that use userdata only consume what they need
- Users can attach significantly more metadata without impacting systems
  that don't use this feature
====================

Link: https://patch.msgid.link/20251119-netconsole_dynamic_extradata-v3-0-497ac3191707@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonetconsole: Increase MAX_USERDATA_ITEMS
Gustavo Luiz Duarte [Thu, 20 Nov 2025 00:14:52 +0000 (16:14 -0800)] 
netconsole: Increase MAX_USERDATA_ITEMS

Increase MAX_USERDATA_ITEMS from 16 to 256 entries now that the userdata
buffer is allocated dynamically.

The previous limit of 16 was necessary because the buffer was statically
allocated for all targets. With dynamic allocation, we can support more
entries without wasting memory on targets that don't use userdata.

This allows users to attach more metadata to their netconsole messages,
which is useful for complex debugging and logging scenarios.

Also update the testcase accordingly.

Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251119-netconsole_dynamic_extradata-v3-4-497ac3191707@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonetconsole: Dynamic allocation of userdata buffer
Gustavo Luiz Duarte [Thu, 20 Nov 2025 00:14:51 +0000 (16:14 -0800)] 
netconsole: Dynamic allocation of userdata buffer

The userdata buffer in struct netconsole_target is currently statically
allocated with a size of MAX_USERDATA_ITEMS * MAX_EXTRADATA_ENTRY_LEN
(16 * 256 = 4096 bytes). This wastes memory when userdata entries are
not used or when only a few entries are configured, which is common in
typical usage scenarios. It also forces us to keep MAX_USERDATA_ITEMS
small to limit the memory wasted.

Change the userdata buffer from a static array to a dynamically
allocated pointer. The buffer is now allocated on-demand in
update_userdata() whenever userdata entries are added, modified, or
removed via configfs. The implementation calculates the exact size
needed for all current userdata entries, allocates a new buffer of that
size, formats the entries into it, and atomically swaps it with the old
buffer.

This approach provides several benefits:
- Memory efficiency: Targets with no userdata use zero bytes instead of
  4KB, and targets with userdata only allocate what they need;
- Scalability: Makes it practical to increase MAX_USERDATA_ITEMS to a
  much larger value without imposing a fixed memory cost on every
  target;
- No hot-path overhead: Allocation occurs during configuration (write to
  configfs), not during message transmission

If memory allocation fails during userdata update, -ENOMEM is returned
to userspace through the configfs attribute write operation.

The sysdata buffer remains statically allocated since it has a smaller
fixed size (MAX_SYSDATA_ITEMS * MAX_EXTRADATA_ENTRY_LEN = 4 * 256 = 1024
bytes) and its content length is less predictable.

Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251119-netconsole_dynamic_extradata-v3-3-497ac3191707@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonetconsole: Split userdata and sysdata
Gustavo Luiz Duarte [Thu, 20 Nov 2025 00:14:50 +0000 (16:14 -0800)] 
netconsole: Split userdata and sysdata

Separate userdata and sysdata into distinct buffers to enable independent
management. Previously, both were stored in a single extradata_complete
buffer with a fixed size that accommodated both types of data.

This separation allows:
- userdata to grow dynamically (in subsequent patch)
- sysdata to remain in a small static buffer
- removal of complex entry counting logic that tracked both types together

The split also simplifies the code by eliminating the need to check total
entry count across both userdata and sysdata when enabling features,
which allows to drop holding su_mutex on sysdata_*_enabled_store().

No functional change in this patch, just structural preparation for
dynamic userdata allocation.

Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251119-netconsole_dynamic_extradata-v3-2-497ac3191707@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonetconsole: Simplify send_fragmented_body()
Gustavo Luiz Duarte [Thu, 20 Nov 2025 00:14:49 +0000 (16:14 -0800)] 
netconsole: Simplify send_fragmented_body()

Refactor send_fragmented_body() to use separate offset tracking for
msgbody, and extradata instead of complex conditional logic.
The previous implementation used boolean flags and calculated offsets
which made the code harder to follow.

The new implementation maintains independent offset counters
(msgbody_offset, extradata_offset) and processes each section
sequentially, making the data flow more straightforward and the code
easier to maintain.

This is a preparatory refactoring with no functional changes, which will
allow easily splitting extradata_complete into separate userdata and
sysdata buffers in the next patch.

Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251119-netconsole_dynamic_extradata-v3-1-497ac3191707@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoeth: fbnic: access @pp through netmem_desc instead of page
Byungchul Park [Thu, 20 Nov 2025 01:11:18 +0000 (10:11 +0900)] 
eth: fbnic: access @pp through netmem_desc instead of page

To eliminate the use of struct page in page pool, the page pool users
should use netmem descriptor and APIs instead.

Make fbnic access @pp through netmem_desc instead of page.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Link: https://patch.msgid.link/20251120011118.73253-1-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoMerge branch 'net-fec-do-some-cleanup-for-the-driver'
Jakub Kicinski [Fri, 21 Nov 2025 02:40:15 +0000 (18:40 -0800)] 
Merge branch 'net-fec-do-some-cleanup-for-the-driver'

Wei Fang says:

====================
net: fec: do some cleanup for the driver

This patch set removes some unnecessary or invalid code from the FEC
driver. See each patch for details.
====================

Link: https://patch.msgid.link/20251119025148.2817602-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: fec: remove duplicate macros of the BD status
Wei Fang [Wed, 19 Nov 2025 02:51:48 +0000 (10:51 +0800)] 
net: fec: remove duplicate macros of the BD status

There are two sets of macros used to define the status bits of TX and RX
BDs, one is the BD_SC_xx macros, the other one is the BD_ENET_xx macros.
For the BD_SC_xx macros, only BD_SC_WRAP is used in the driver. But the
BD_ENET_xx macros are more widely used in the driver, and they define
more bits of the BD status. Therefore, remove the BD_SC_xx macros from
now on.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20251119025148.2817602-6-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: fec: remove rx_align from fec_enet_private
Wei Fang [Wed, 19 Nov 2025 02:51:47 +0000 (10:51 +0800)] 
net: fec: remove rx_align from fec_enet_private

The rx_align was introduced by the commit 41ef84ce4c72 ("net: fec: change
FEC alignment according to i.mx6 sx requirement"). Because the i.MX6 SX
requires RX buffer must be 64 bytes alignment.

Since the commit 95698ff6177b ("net: fec: using page pool to manage RX
buffers"), the address of the RX buffer is always the page address plus
FEC_ENET_XDP_HEADROOM which is 256 bytes, so the RX buffer is always
64-byte aligned. Therefore, rx_align has no effect since that commit,
and we can safely remove it.

In addition, to prevent future modifications to FEC_ENET_XDP_HEADROOM,
a BUILD_BUG_ON() test has been added to the driver, which ensures that
FEC_ENET_XDP_HEADROOM provides the required alignment.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20251119025148.2817602-5-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: fec: remove struct fec_enet_priv_txrx_info
Wei Fang [Wed, 19 Nov 2025 02:51:46 +0000 (10:51 +0800)] 
net: fec: remove struct fec_enet_priv_txrx_info

The struct fec_enet_priv_txrx_info has three members: offset, page and
skb. The offset is only initialized in the driver and is not used, the
skb is never initialized and used in the driver. The both will not be
used in the future. Therefore, replace struct fec_enet_priv_txrx_info
directly with struct page.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20251119025148.2817602-4-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: fec: simplify the conditional preprocessor directives
Wei Fang [Wed, 19 Nov 2025 02:51:45 +0000 (10:51 +0800)] 
net: fec: simplify the conditional preprocessor directives

From the Kconfig file, we can see CONFIG_FEC depends on the following
platform-related options.

ColdFire: M523x, M527x, M5272, M528x, M520x and M532x
S32: ARCH_S32 (ARM64)
i.MX: SOC_IMX28 and ARCH_MXC (ARM and ARM64)

Based on the code of fec driver, only some macro definitions on the
M5272 platform are different from those on other platforms. Therefore,
we can simplify the following complex preprocessor directives to
"if !defined(CONFIG_M5272)".

"#if defined(CONFIG_M523x) || defined(CONFIG_M527x) || \
     defined(CONFIG_M528x) || defined(CONFIG_M520x) || \
     defined(CONFIG_M532x) || defined(CONFIG_ARM) || \
     defined(CONFIG_ARM64)"

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20251119025148.2817602-3-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: fec: remove useless conditional preprocessor directives
Wei Fang [Wed, 19 Nov 2025 02:51:44 +0000 (10:51 +0800)] 
net: fec: remove useless conditional preprocessor directives

The conditional preprocessor directive was added to fix build errors on
the MCF5272 platform, see commit d13919301d9a ("net: fec: Fix build for
MCF5272"). The compilation errors were originally caused by some register
macros not being defined on that platform.

The driver now uses quirks to dynamically handle platform differences,
and for MCF5272, its quirks is 0, so it does not support RACC and GBIT
Ethernet. So these preprocessor directives are no longer required and
can be safely removed without causing build or functional issue.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20251119025148.2817602-2-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoMerge branch 'net-add-1600gbps-1-6t-link-mode-support'
Jakub Kicinski [Fri, 21 Nov 2025 02:21:32 +0000 (18:21 -0800)] 
Merge branch 'net-add-1600gbps-1-6t-link-mode-support'

Tariq Toukan says:

====================
net: Add 1600Gbps (1.6T) link mode support

This series by Yael adds 1600Gbps (1.6T) link mode support.
See detailed description by Yael below.
====================

Link: https://patch.msgid.link/1763585297-1243980-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agobonding: 3ad: Add support for 1600G speed
Yael Chemla [Wed, 19 Nov 2025 20:48:17 +0000 (22:48 +0200)] 
bonding: 3ad: Add support for 1600G speed

Add support for 1600Gbps speed to allow using 3ad mode with 1600G
devices.

Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763585297-1243980-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet/mlx5e: Add 1600Gbps link modes
Yael Chemla [Wed, 19 Nov 2025 20:48:16 +0000 (22:48 +0200)] 
net/mlx5e: Add 1600Gbps link modes

Introduce support for a 1600Gbps link mode, utilizing 8 lanes at 200Gbps
per lane.

Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763585297-1243980-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: ethtool: Add support for 1600Gbps speed
Yael Chemla [Wed, 19 Nov 2025 20:48:15 +0000 (22:48 +0200)] 
net: ethtool: Add support for 1600Gbps speed

Add support for 1600Gbps link modes based on 200Gbps per lane [1].
This includes the adopted IEEE 802.3dj copper and optical PMDs that use
200G/lane signaling [2].

Add the following PMD types:
- KR8 (backplane)
- CR8 (copper cable)
- DR8 (SMF 500m)
- DR8-2 (SMF 2km)

These modes are defined in the 802.3dj specifications.
References:
[1] https://www.ieee802.org/3/dj/public/23_03/opsasnick_3dj_01a_2303.pdf
[2] https://www.ieee802.org/3/dj/projdoc/objectives_P802d3dj_240314.pdf

Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/1763585297-1243980-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoynl: samples: add tc filter example
Zahari Doychev [Wed, 19 Nov 2025 20:36:18 +0000 (21:36 +0100)] 
ynl: samples: add tc filter example

Add a sample tool demonstrating how to add, dump, and delete a
flower filter with two VLAN push actions. The example can be
invoked as:

  # samples/tc-filter-add p2

    flower pref 1 proto: 0x8100
    flower:
      vlan_id: 100
      vlan_prio: 5
      num_of_vlans: 3
    action order: 1 vlan push id 200 protocol 0x8100 priority 0
    action order: 2 vlan push id 300 protocol 0x8100 priority 0

This verifies correct handling of tc action attributes for multiple
VLAN push actions. The tc action indexed arrays start from index 1,
and the index defines the action order. This behavior differs from
the YNL specification, which expects arrays to be zero-based. To
accommodate this, the example adds a dummy action at index 0, which
is ignored by the kernel.

Signed-off-by: Zahari Doychev <zahari.doychev@linux.com>
Link: https://patch.msgid.link/20251119203618.263780-2-zahari.doychev@linux.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoMerge branch 'selftests-drv-net-convert-gro-and-toeplitz-tests-to-work-for-drivers...
Jakub Kicinski [Fri, 21 Nov 2025 02:19:33 +0000 (18:19 -0800)] 
Merge branch 'selftests-drv-net-convert-gro-and-toeplitz-tests-to-work-for-drivers-in-nipa'

Jakub Kicinski says:

====================
selftests: drv-net: convert GRO and Toeplitz tests to work for drivers in NIPA

Main objective of this series is to convert the gro.sh and toeplitz.sh
tests to be "NIPA-compatible" - meaning make use of the Python env,
which lets us run the tests against either netdevsim or a real device.

The tests seem to have been written with a different flow in mind.
Namely they source different bash "setup" scripts depending on arguments
passed to the test. While I have nothing against the use of bash and
the overall architecture - the existing code needs quite a bit of work
(don't assume MAC/IP addresses, support remote endpoint over SSH).
If I'm the one fixing it, I'd rather convert them to our "simplistic"
Python.

This series rewrites the tests in Python while addressing their
shortcomings. The functionality of running the test over loopback
on a real device is retained but with a different method of invocation
(see the last patch).

Once again we are dealing with a script which run over a variety of
protocols (combination of [ipv4, ipv6, ipip] x [tcp, udp]). The first
4 patches add support for test variants to our scripts. We use the
term "variant" in the same sense as the C kselftest_harness.h -
variant is just a set of static input arguments.

Note that neither GRO nor the Toeplitz test fully passes for me on
any HW I have access to. But this is unrelated to the conversion.
This series is not making any real functional changes to the tests,
it is limited to improving the "test harness" scripts.
====================

Link: https://patch.msgid.link/20251120021024.2944527-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoselftests: net: remove old setup_* scripts
Jakub Kicinski [Thu, 20 Nov 2025 02:10:24 +0000 (18:10 -0800)] 
selftests: net: remove old setup_* scripts

gro.sh and toeplitz.sh used to source in one of two setup scripts
depending on whether the test was expected to be run against
veth or a real device. veth testing is replaced by netdevsim
and existing "remote endpoint" support in our Python tests.
Add a script which sets up loopback mode.

The usage is a little bit more complicated than running
the scripts used to be. Testing used to work like this:

  ./../gro.sh -i eth0 ...

now the "setup script" has to be run explicitly:

  NETIF=eth0 ./../ksft_setup_loopback.sh ./../gro.sh

But the functionality itself is retained.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-13-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonetdevsim: add loopback support
Jakub Kicinski [Thu, 20 Nov 2025 02:10:23 +0000 (18:10 -0800)] 
netdevsim: add loopback support

Support device loopback. Apparently this mode has been historically
supported by the toeplitz test and I don't have any HW which lets
me test the conversion..

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-12-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoselftests: drv-net: hw: convert the Toeplitz test to Python
Jakub Kicinski [Thu, 20 Nov 2025 02:10:22 +0000 (18:10 -0800)] 
selftests: drv-net: hw: convert the Toeplitz test to Python

Rewrite the existing toeplitz.sh test in Python. The conversion
is a lot less exact than the GRO one. We use Netlink APIs to
get the device RSS and IRQ information. We expect that the device
has neither RPS nor RFS configured, and set RPS up as part of
the test.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoselftests: drv-net: add a Python version of the GRO test
Jakub Kicinski [Thu, 20 Nov 2025 02:10:21 +0000 (18:10 -0800)] 
selftests: drv-net: add a Python version of the GRO test

Rewrite the existing gro.sh test in Python. The conversion
not exact, the changes are related to integrating the test
with our "remote endpoint" paradigm. The test now reads
the IP addresses from the user config. It resolves the MAC
address (including running over Layer 3 networks).

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonetdevsim: pass packets thru GRO on Rx
Jakub Kicinski [Thu, 20 Nov 2025 02:10:20 +0000 (18:10 -0800)] 
netdevsim: pass packets thru GRO on Rx

To replace veth in software GRO testing with netdevsim we need
GRO support in netdevsim. Luckily we already have NAPI support
so this change is trivial (compared to veth).

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoselftests: net: py: read ip link info about remote dev
Jakub Kicinski [Thu, 20 Nov 2025 02:10:19 +0000 (18:10 -0800)] 
selftests: net: py: read ip link info about remote dev

We're already saving the info about the local dev in env.dev
for the tests, save remote dev as well. This is more symmetric,
env generally provides the same info for local and remote end.

While at it make sure that we reliably get the detailed info
about the local dev. nsim used to read the dev info without -d.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoselftests: net: py: support ksft ready without wait
Jakub Kicinski [Thu, 20 Nov 2025 02:10:18 +0000 (18:10 -0800)] 
selftests: net: py: support ksft ready without wait

There's a common synchronization problem when a script (Python test)
uses a C program to set up some state (usually start a receiving
process for traffic). The script needs to know when the process
has fully initialized. The inverse of the problem exists for shutting
the process down - we need a reliable way to tell the process to exit.

We added helpers to do this safely in
commit 71477137994f ("selftests: drv-net: add a way to wait for a local process")
unfortunately the two operations (wait for init, and shutdown) are
controlled by a single parameter (ksft_wait). Add support for using
ksft_ready without using the second fd for exit.

This is useful for programs which wait for a specific number of packets
to rx so exit_wait is a good match, but we still need to wait for init.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251120021024.2944527-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoselftests: net: relocate gro and toeplitz tests to drivers/net
Jakub Kicinski [Thu, 20 Nov 2025 02:10:17 +0000 (18:10 -0800)] 
selftests: net: relocate gro and toeplitz tests to drivers/net

The GRO test can run on a real device or a veth.
The Toeplitz hash test can only run on a real device.
Move them from net/ to drivers/net/ and drivers/net/hw/ respectively.

There are two scripts which set up the environment for these tests
setup_loopback.sh and setup_veth.sh. Move those scripts to net/lib.
The paths to the setup files are a little ugly but they will be
deleted shortly.

toeplitz_client.sh is not a test in itself, but rather a helper
to send traffic, so add it to TEST_FILES rather than TEST_PROGS.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoselftests: drv-net: xdp: use variants for qstat tests
Jakub Kicinski [Thu, 20 Nov 2025 02:10:16 +0000 (18:10 -0800)] 
selftests: drv-net: xdp: use variants for qstat tests

Use just-added ksft variants for XDP qstat tests.

While at it correct the number of packets, we're sending
1000 packets now.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoselftests: net: py: add test variants
Jakub Kicinski [Thu, 20 Nov 2025 02:10:15 +0000 (18:10 -0800)] 
selftests: net: py: add test variants

There's a lot of cases where we try to re-run the same code with
different parameters. We currently need to either use a generator
method or create a "main" case implementation which then gets called
by trivial case functions:

  def _test(x, y, z):
     ...

  def case_int():
     _test(1, 2, 3)

  def case_str():
     _test('a', 'b', 'c')

Add support for variants, similar to kselftests_harness.h and
a lot of other frameworks. Variants can be added as decorator
to test functions:

  @ksft_variants([(1, 2, 3), ('a', 'b', 'c')])
  def case(x, y, z):
     ...

ksft_run() will auto-generate case names:
  case.1_2_3
  case.a_b_c

Because the names may not always be pretty (and to avoid forcing
classes to implement case-friendly __str__()) add a wrapper class
KsftNamedVariant which lets the user specify the name for the variant.

Note that ksft_run's args are still supported. ksft_run splices args
and variant params together.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20251120021024.2944527-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoselftests: net: py: extract the case generation logic
Jakub Kicinski [Thu, 20 Nov 2025 02:10:14 +0000 (18:10 -0800)] 
selftests: net: py: extract the case generation logic

In preparation for adding test variants move the test case
collection logic to a dedicated function. New helper returns

 (function, args, name, )

tuples. The main test loop can simply run them, not much
logic or discernment needed.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoselftests: net: py: coding style improvements
Jakub Kicinski [Thu, 20 Nov 2025 02:10:13 +0000 (18:10 -0800)] 
selftests: net: py: coding style improvements

We're about to add more features here and finding new issues with old
ones in place is hard. Address ruff checks:
 - bare exceptions
 - f-string with no params
 - unused import

We need to use BaseException when handling defer(), as Petr points out.
This retains the old behavior of ignoring SIGTERM while running cleanups.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20251120021024.2944527-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: phy: fixed_phy: remove not needed initialization of phy_device members
Heiner Kallweit [Wed, 19 Nov 2025 06:55:47 +0000 (07:55 +0100)] 
net: phy: fixed_phy: remove not needed initialization of phy_device members

All these members are populated by the phylib state machine once the
PHY has been started, based on the fixed autoneg results.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/bc666a53-5469-4e9c-85a1-dd285aadfe4f@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: phy: fixed_phy: fix missing initialization of fixed phy link
Heiner Kallweit [Wed, 19 Nov 2025 07:05:45 +0000 (08:05 +0100)] 
net: phy: fixed_phy: fix missing initialization of fixed phy link

Original change remove the link initialization from the passed struct
fixed_phy_status, but @status is also passed to __fixed_phy_add(),
where it is saved. Make sure that copy also has link set to 1.

Fixes: 9f07af1d2742 ("net: phy: fixed_phy: initialize the link status as up")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/dab6c10e-725e-4648-9662-39cc821723d0@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoMerge branch 'net-phy-adin1100-fix-powerdown-mode-setting'
Jakub Kicinski [Fri, 21 Nov 2025 02:04:00 +0000 (18:04 -0800)] 
Merge branch 'net-phy-adin1100-fix-powerdown-mode-setting'

Alexander Dahl says:

====================
net: phy: adin1100: Fix powerdown mode setting

while building a new device around the ADIN1100 I noticed some errors in
kernel log when calling `ifdown` on the ethernet device.  Series has a
straight forward fix and an obvious follow-up code simplification.
====================

Link: https://patch.msgid.link/20251119124737.280939-1-ada@thorsis.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: phy: adin1100: Simplify register value passing
Alexander Dahl [Wed, 19 Nov 2025 12:47:37 +0000 (13:47 +0100)] 
net: phy: adin1100: Simplify register value passing

The additional use case for that variable is gone,
the expression is simple enough to pass it inline now.

Signed-off-by: Alexander Dahl <ada@thorsis.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Nuno Sá <nuno.sa@analog.com>
Link: https://patch.msgid.link/20251119124737.280939-3-ada@thorsis.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: phy: adin1100: Fix software power-down ready condition
Alexander Dahl [Wed, 19 Nov 2025 12:47:36 +0000 (13:47 +0100)] 
net: phy: adin1100: Fix software power-down ready condition

Value CRSM_SFT_PD written to Software Power-Down Control Register
(CRSM_SFT_PD_CNTRL) is 0x01 and therefor different to value
CRSM_SFT_PD_RDY (0x02) read from System Status Register (CRSM_STAT) for
confirmation powerdown has been reached.

The condition could have only worked when disabling powerdown
(both 0x00), but never when enabling it (0x01 != 0x02).

Result is a timeout, like so:

    $ ifdown eth0
    macb f802c000.ethernet eth0: Link is Down
    ADIN1100 f802c000.ethernet-ffffffff:01: adin_set_powerdown_mode failed: -110
    ADIN1100 f802c000.ethernet-ffffffff:01: adin_set_powerdown_mode failed: -110

Fixes: 7eaf9132996a ("net: phy: adin1100: Add initial support for ADIN1100 industrial PHY")
Signed-off-by: Alexander Dahl <ada@thorsis.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Nuno Sá <nuno.sa@analog.com>
Link: https://patch.msgid.link/20251119124737.280939-2-ada@thorsis.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoMerge branch 'net-stmmac-simplify-axi_blen-handling'
Jakub Kicinski [Fri, 21 Nov 2025 01:57:42 +0000 (17:57 -0800)] 
Merge branch 'net-stmmac-simplify-axi_blen-handling'

Russell King says:

====================
net: stmmac: simplify axi_blen handling

stmmac's axi_blen (burst length) handling is very verbose and
unnecessary.

Firstly, the burst length register bitfield is the same across all
dwmac cores, so we can use common definitions for these bits which
platform glue can use.

We end up with platform glue:
- filling in the axi_blen[] array with the decimal burst lengths, e.g.
  dwmac-intel.c, etc
- decoding a bitmap into burst lengths for this array, e.g.
  dwmac-dwc-qos-eth.c

Other cases read the array from DT, placing it into the axi_blen
array, and converting later to the register bitfield.

This series removes all this complexity, ultimately ending up with
platform glue providing the register value containing the burst
length bitfield directly. Where necessary, platform glue calls
stmmac_axi_blen_to_mask() to convert a decimal array (e.g. from
DT) to the register value.

This also means that stmmac_axi_blen_to_mask() can issue a
diagnostic message at probe time if the burst length is incorrect.
====================

Link: https://patch.msgid.link/aR2aaDs6rqfu32B-@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: stmmac: remove axi_blen array
Russell King (Oracle) [Wed, 19 Nov 2025 10:23:40 +0000 (10:23 +0000)] 
net: stmmac: remove axi_blen array

Remove the axi_blen array from struct stmmac_axi as we set this array,
and then immediately convert it ot the register value, never looking at
the array again. Thus, the array can be function local rather than part
of a run-time allocated long-lived struct.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vLfLg-0000000FMbD-1vmh@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: stmmac: move stmmac_axi_blen_to_mask() to axi_blen init sites
Russell King (Oracle) [Wed, 19 Nov 2025 10:23:35 +0000 (10:23 +0000)] 
net: stmmac: move stmmac_axi_blen_to_mask() to axi_blen init sites

Move stmmac_axi_blen_to_mask() to the axi->axi_blen array init sites
to prepare for the removal of axi_blen. For sites which initialise
axi->axi_blen with constant data, initialise axi->axi_blen_regval
using the DMA_AXI_BLENx constants.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vLfLb-0000000FMb7-1SgG@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: stmmac: move stmmac_axi_blen_to_mask() to stmmac_main.c
Russell King (Oracle) [Wed, 19 Nov 2025 10:23:30 +0000 (10:23 +0000)] 
net: stmmac: move stmmac_axi_blen_to_mask() to stmmac_main.c

Move the call to stmmac_axi_blen_to_mask() out of the individual
MAC version drivers into the main code in stmmac_init_dma_engine(),
passing the resulting value through a new member, axi_blen_regval,
in the struct stmmac_axi structure.

There is now no need for stmmac_axi_blen_to_dma_mask() to use
u32p_replace_bits(), so use FIELD_PREP() instead.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vLfLW-0000000FMb1-0zKV@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: stmmac: provide common stmmac_axi_blen_to_mask()
Russell King (Oracle) [Wed, 19 Nov 2025 10:23:25 +0000 (10:23 +0000)] 
net: stmmac: provide common stmmac_axi_blen_to_mask()

Provide a common stmmac_axi_blen_to_mask() function to translate the
burst length array to the value for the AXI bus mode register, and use
it for dwmac, dwmac4 and dwxgmac2. Remove the now unnecessary
XGMAC_BLEN* definitions.

Note that stmmac_axi_blen_to_dma_mask() is coded to be more efficient
than the original three implementations, and verifies the contents of
the burst length array.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vLfLR-0000000FMav-0VL6@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: stmmac: move common DMA AXI register bits to common.h
Russell King (Oracle) [Wed, 19 Nov 2025 10:23:19 +0000 (10:23 +0000)] 
net: stmmac: move common DMA AXI register bits to common.h

Move the common DMA AXI register bits to common.h so they can be shared
and we can provide a common function to convert the axi->dma_blen[]
array to the format needed for this register.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vLfLL-0000000FMap-49gf@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: stmmac: dwc-qos-eth: simplify switch() in dwc_eth_dwmac_config_dt()
Russell King (Oracle) [Wed, 19 Nov 2025 10:23:14 +0000 (10:23 +0000)] 
net: stmmac: dwc-qos-eth: simplify switch() in dwc_eth_dwmac_config_dt()

Simplify the switch() statement in dwc_eth_dwmac_config_dt().
Although this is not speed-critical, simplifying it can make it more
readable. This also drastically improves the code emitted by the
compiler.

On aarch64, with the original code, the compiler loads registers with
every possible value, and then has a tree of test-and-branch statements
to work out which register to store. With the simplified code, the
compiler can load a register with '4' and shift it appropriately.

This shrinks the text size on aarch64 from 4289 bytes to 4153 bytes,
a reduction of 3%.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vLfLG-0000000FMai-3fKz@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: stmmac: rk: use phylink's interface mode for set_clk_tx_rate()
Russell King (Oracle) [Wed, 19 Nov 2025 11:29:16 +0000 (11:29 +0000)] 
net: stmmac: rk: use phylink's interface mode for set_clk_tx_rate()

rk_set_clk_tx_rate() is passed the interface mode from phylink which
will be the same as bsp_priv->phy_iface. Use the passed-in interface
mode rather than bsp_priv->phy_iface.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vLgNA-0000000FMjN-0DSS@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoMerge branch 'net-stmmac-pass-struct-device-to-init-exit'
Jakub Kicinski [Fri, 21 Nov 2025 01:54:10 +0000 (17:54 -0800)] 
Merge branch 'net-stmmac-pass-struct-device-to-init-exit'

Russell King says:

====================
net: stmmac: pass struct device to init/exit

Rather than passing the platform device to the ->init() and ->exit()
methods, make these methods useful for other devices by passing the
struct device instead. Update the implementations appropriately for
this change.

Move the calls for these methods into the core driver's probe and
remove methods from the stmmac_platform layer.

Convert dwmac-rk to use ->init() and ->exit().
====================

Link: https://patch.msgid.link/aR2V0Kib7j0L4FNN@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: stmmac: rk: convert to init()/exit() methods
Russell King (Oracle) [Wed, 19 Nov 2025 10:04:00 +0000 (10:04 +0000)] 
net: stmmac: rk: convert to init()/exit() methods

Convert rk to use the init() and exit() methods for powering up and
down the device. This allows us to use the pltfr versions of probe()
and remove().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vLf2e-0000000FMNN-1Xnh@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: stmmac: move probe/remove calling of init/exit
Russell King (Oracle) [Wed, 19 Nov 2025 10:03:55 +0000 (10:03 +0000)] 
net: stmmac: move probe/remove calling of init/exit

Move the probe/remove time calling of the init()/exit() methods in
the platform data to the main driver probe/remove functions. This
allows them to be used by non-platform_device based drivers.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1vLf2Z-0000000FMNH-0xPV@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: stmmac: pass struct device to init()/exit() methods
Russell King (Oracle) [Wed, 19 Nov 2025 10:03:50 +0000 (10:03 +0000)] 
net: stmmac: pass struct device to init()/exit() methods

As struct plat_stmmacenet_data is not platform_device specific, pass
a struct device into the init() and exit() methods to allow them to
become independent of the underlying device.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Chen-Yu Tsai <wens@kernel.org>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1vLf2U-0000000FMN2-0SLg@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoMerge branch 'tcp-tcp_rcvbuf_grow-changes'
Jakub Kicinski [Fri, 21 Nov 2025 01:44:26 +0000 (17:44 -0800)] 
Merge branch 'tcp-tcp_rcvbuf_grow-changes'

Eric Dumazet says:

====================
tcp: tcp_rcvbuf_grow() changes

First pach is minor and moves tcp_moderate_rcvbuf in appropriate group.

Second patch is another attempt to keep small sk->sk_rcvbuf for DC
(small RT) TCP flows for optimal performance.
====================

Link: https://patch.msgid.link/20251119084813.3684576-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agotcp: add net.ipv4.tcp_rcvbuf_low_rtt
Eric Dumazet [Wed, 19 Nov 2025 08:48:13 +0000 (08:48 +0000)] 
tcp: add net.ipv4.tcp_rcvbuf_low_rtt

This is a follow up of commit aa251c84636c ("tcp: fix too slow
tcp_rcvbuf_grow() action") which brought again the issue that I tried
to fix in commit 65c5287892e9 ("tcp: fix sk_rcvbuf overshoot")

We also recently increased tcp_rmem[2] to 32 MB in commit 572be9bf9d0d
("tcp: increase tcp_rmem[2] to 32 MB")

Idea of this patch is to not let tcp_rcvbuf_grow() grow sk->sk_rcvbuf
too fast for small RTT flows. If sk->sk_rcvbuf is too big, this can
force NIC driver to not recycle pages from their page pool, and also
can cause cache evictions for DDIO enabled cpus/NIC, as receivers
are usually slower than senders.

Add net.ipv4.tcp_rcvbuf_low_rtt sysctl, set by default to 1000 usec (1 ms)

If RTT if smaller than the sysctl value, use the RTT/tcp_rcvbuf_low_rtt
ratio to control sk_rcvbuf inflation.

Tested:

Pair of hosts with a 200Gbit IDPF NIC. Using netperf/netserver

Client initiates 8 TCP bulk flows, asking netserver to use CPU #10 only.

super_netperf 8 -H server -T,10 -l 30

On server, use perf -e tcp:tcp_rcvbuf_grow while test is running.

Before:

sysctl -w net.ipv4.tcp_rcvbuf_low_rtt=1
perf record -a -e tcp:tcp_rcvbuf_grow sleep 30 ; perf script|tail -20|cut -c30-230
 1153.051201: tcp:tcp_rcvbuf_grow: time=398 rtt_us=382 copied=6905856 inq=180224 space=6115328 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25878235 window_clamp=25937095 rcv_wnd=25600000 famil
 1153.138752: tcp:tcp_rcvbuf_grow: time=446 rtt_us=413 copied=5529600 inq=180224 space=4505600 ooo=0 scaling_ratio=240 rcvbuf=23068672 rcv_ssthresh=21571860 window_clamp=21626880 rcv_wnd=21286912 famil
 1153.361484: tcp:tcp_rcvbuf_grow: time=415 rtt_us=380 copied=7061504 inq=204800 space=6725632 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25878235 window_clamp=25937095 rcv_wnd=25600000 famil
 1153.457642: tcp:tcp_rcvbuf_grow: time=483 rtt_us=421 copied=5885952 inq=720896 space=4407296 ooo=0 scaling_ratio=240 rcvbuf=23763511 rcv_ssthresh=22223271 window_clamp=22278291 rcv_wnd=21430272 famil
 1153.466002: tcp:tcp_rcvbuf_grow: time=308 rtt_us=281 copied=3244032 inq=180224 space=2883584 ooo=0 scaling_ratio=240 rcvbuf=44854314 rcv_ssthresh=41992059 window_clamp=42050919 rcv_wnd=41713664 famil
 1153.747792: tcp:tcp_rcvbuf_grow: time=394 rtt_us=332 copied=4460544 inq=585728 space=3063808 ooo=0 scaling_ratio=240 rcvbuf=44854314 rcv_ssthresh=41992059 window_clamp=42050919 rcv_wnd=41373696 famil
 1154.260747: tcp:tcp_rcvbuf_grow: time=652 rtt_us=226 copied=10977280 inq=737280 space=9486336 ooo=0 scaling_ratio=240 rcvbuf=31165538 rcv_ssthresh=29197743 window_clamp=29217691 rcv_wnd=28368896 fami
 1154.375019: tcp:tcp_rcvbuf_grow: time=461 rtt_us=443 copied=7573504 inq=507904 space=6856704 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25878235 window_clamp=25937095 rcv_wnd=25288704 famil
 1154.463072: tcp:tcp_rcvbuf_grow: time=494 rtt_us=408 copied=7983104 inq=200704 space=7065600 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25878235 window_clamp=25937095 rcv_wnd=25579520 famil
 1154.474658: tcp:tcp_rcvbuf_grow: time=507 rtt_us=459 copied=5586944 inq=540672 space=4718592 ooo=0 scaling_ratio=240 rcvbuf=17852266 rcv_ssthresh=16692999 window_clamp=16736499 rcv_wnd=16056320 famil
 1154.584657: tcp:tcp_rcvbuf_grow: time=494 rtt_us=427 copied=8126464 inq=204800 space=7782400 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25878235 window_clamp=25937095 rcv_wnd=25600000 famil
 1154.702117: tcp:tcp_rcvbuf_grow: time=480 rtt_us=406 copied=5734400 inq=180224 space=5349376 ooo=0 scaling_ratio=240 rcvbuf=23068672 rcv_ssthresh=21571860 window_clamp=21626880 rcv_wnd=21286912 famil
 1155.941595: tcp:tcp_rcvbuf_grow: time=717 rtt_us=670 copied=11042816 inq=3784704 space=7159808 ooo=0 scaling_ratio=240 rcvbuf=19581357 rcv_ssthresh=18333222 window_clamp=18357522 rcv_wnd=14614528 fam
 1156.384735: tcp:tcp_rcvbuf_grow: time=529 rtt_us=473 copied=9011200 inq=180224 space=7258112 ooo=0 scaling_ratio=240 rcvbuf=19581357 rcv_ssthresh=18333222 window_clamp=18357522 rcv_wnd=18018304 famil
 1157.821676: tcp:tcp_rcvbuf_grow: time=529 rtt_us=272 copied=8224768 inq=602112 space=6545408 ooo=0 scaling_ratio=240 rcvbuf=67000000 rcv_ssthresh=62793576 window_clamp=62812500 rcv_wnd=62115840 famil
 1158.906379: tcp:tcp_rcvbuf_grow: time=710 rtt_us=445 copied=11845632 inq=540672 space=10240000 ooo=0 scaling_ratio=240 rcvbuf=31165538 rcv_ssthresh=29205935 window_clamp=29217691 rcv_wnd=28536832 fam
 1164.600160: tcp:tcp_rcvbuf_grow: time=841 rtt_us=430 copied=12976128 inq=1290240 space=11304960 ooo=0 scaling_ratio=240 rcvbuf=31165538 rcv_ssthresh=29212591 window_clamp=29217691 rcv_wnd=27856896 fa
 1165.163572: tcp:tcp_rcvbuf_grow: time=845 rtt_us=800 copied=12632064 inq=540672 space=7921664 ooo=0 scaling_ratio=240 rcvbuf=27666235 rcv_ssthresh=25912795 window_clamp=25937095 rcv_wnd=25260032 fami
 1165.653464: tcp:tcp_rcvbuf_grow: time=388 rtt_us=309 copied=4493312 inq=180224 space=3874816 ooo=0 scaling_ratio=240 rcvbuf=44854314 rcv_ssthresh=41995899 window_clamp=42050919 rcv_wnd=41713664 famil
 1166.651211: tcp:tcp_rcvbuf_grow: time=556 rtt_us=553 copied=6328320 inq=540672 space=5554176 ooo=0 scaling_ratio=240 rcvbuf=23068672 rcv_ssthresh=21571860 window_clamp=21626880 rcv_wnd=20946944 famil

After:

sysctl -w net.ipv4.tcp_rcvbuf_low_rtt=1000
perf record -a -e tcp:tcp_rcvbuf_grow sleep 30 ; perf script|tail -20|cut -c30-230
 1457.053149: tcp:tcp_rcvbuf_grow: time=128 rtt_us=24 copied=1441792 inq=40960 space=1269760 ooo=0 scaling_ratio=240 rcvbuf=2960741 rcv_ssthresh=2605474 window_clamp=2775694 rcv_wnd=2568192 family=AF_I
 1458.000778: tcp:tcp_rcvbuf_grow: time=128 rtt_us=31 copied=1441792 inq=24576 space=1400832 ooo=0 scaling_ratio=240 rcvbuf=3060163 rcv_ssthresh=2810042 window_clamp=2868902 rcv_wnd=2674688 family=AF_I
 1458.088059: tcp:tcp_rcvbuf_grow: time=190 rtt_us=110 copied=3227648 inq=385024 space=2781184 ooo=0 scaling_ratio=240 rcvbuf=6728240 rcv_ssthresh=6252705 window_clamp=6307725 rcv_wnd=5799936 family=AF
 1458.148549: tcp:tcp_rcvbuf_grow: time=232 rtt_us=129 copied=3956736 inq=237568 space=2842624 ooo=0 scaling_ratio=240 rcvbuf=6731333 rcv_ssthresh=6252705 window_clamp=6310624 rcv_wnd=5918720 family=AF
 1458.466861: tcp:tcp_rcvbuf_grow: time=193 rtt_us=83 copied=2949120 inq=180224 space=2457600 ooo=0 scaling_ratio=240 rcvbuf=5751438 rcv_ssthresh=5357689 window_clamp=5391973 rcv_wnd=5054464 family=AF_
 1458.775476: tcp:tcp_rcvbuf_grow: time=257 rtt_us=127 copied=4304896 inq=352256 space=3346432 ooo=0 scaling_ratio=240 rcvbuf=8067131 rcv_ssthresh=7523275 window_clamp=7562935 rcv_wnd=7061504 family=AF
 1458.776631: tcp:tcp_rcvbuf_grow: time=200 rtt_us=96 copied=3260416 inq=143360 space=2768896 ooo=0 scaling_ratio=240 rcvbuf=6397256 rcv_ssthresh=5938567 window_clamp=5997427 rcv_wnd=5828608 family=AF_
 1459.707973: tcp:tcp_rcvbuf_grow: time=215 rtt_us=96 copied=2506752 inq=163840 space=1388544 ooo=0 scaling_ratio=240 rcvbuf=3068867 rcv_ssthresh=2768282 window_clamp=2877062 rcv_wnd=2555904 family=AF_
 1460.246494: tcp:tcp_rcvbuf_grow: time=231 rtt_us=80 copied=3756032 inq=204800 space=3117056 ooo=0 scaling_ratio=240 rcvbuf=7288091 rcv_ssthresh=6773725 window_clamp=6832585 rcv_wnd=6471680 family=AF_
 1460.714596: tcp:tcp_rcvbuf_grow: time=270 rtt_us=110 copied=4714496 inq=311296 space=3719168 ooo=0 scaling_ratio=240 rcvbuf=8957739 rcv_ssthresh=8339020 window_clamp=8397880 rcv_wnd=7933952 family=AF
 1462.029977: tcp:tcp_rcvbuf_grow: time=101 rtt_us=19 copied=1105920 inq=40960 space=1036288 ooo=0 scaling_ratio=240 rcvbuf=2338970 rcv_ssthresh=2091684 window_clamp=2192784 rcv_wnd=1986560 family=AF_I
 1462.802385: tcp:tcp_rcvbuf_grow: time=89 rtt_us=45 copied=1069056 inq=0 space=1064960 ooo=0 scaling_ratio=240 rcvbuf=2338970 rcv_ssthresh=2091684 window_clamp=2192784 rcv_wnd=2035712 family=AF_INET6
 1462.918648: tcp:tcp_rcvbuf_grow: time=105 rtt_us=33 copied=1441792 inq=180224 space=1069056 ooo=0 scaling_ratio=240 rcvbuf=2383282 rcv_ssthresh=2091684 window_clamp=2234326 rcv_wnd=1896448 family=AF_
 1463.222533: tcp:tcp_rcvbuf_grow: time=273 rtt_us=144 copied=4603904 inq=385024 space=3469312 ooo=0 scaling_ratio=240 rcvbuf=8422564 rcv_ssthresh=7891053 window_clamp=7896153 rcv_wnd=7409664 family=AF
 1466.519312: tcp:tcp_rcvbuf_grow: time=130 rtt_us=23 copied=1343488 inq=0 space=1261568 ooo=0 scaling_ratio=240 rcvbuf=2780158 rcv_ssthresh=2493778 window_clamp=2606398 rcv_wnd=2494464 family=AF_INET6
 1466.681003: tcp:tcp_rcvbuf_grow: time=128 rtt_us=21 copied=1441792 inq=12288 space=1343488 ooo=0 scaling_ratio=240 rcvbuf=2932027 rcv_ssthresh=2578555 window_clamp=2748775 rcv_wnd=2568192 family=AF_I
 1470.689959: tcp:tcp_rcvbuf_grow: time=255 rtt_us=122 copied=3932160 inq=204800 space=3551232 ooo=0 scaling_ratio=240 rcvbuf=8182038 rcv_ssthresh=7647384 window_clamp=7670660 rcv_wnd=7442432 family=AF
 1471.754154: tcp:tcp_rcvbuf_grow: time=188 rtt_us=95 copied=2138112 inq=577536 space=1429504 ooo=0 scaling_ratio=240 rcvbuf=3113650 rcv_ssthresh=2806426 window_clamp=2919046 rcv_wnd=2248704 family=AF_
 1476.813542: tcp:tcp_rcvbuf_grow: time=269 rtt_us=99 copied=3088384 inq=180224 space=2564096 ooo=0 scaling_ratio=240 rcvbuf=6219470 rcv_ssthresh=5771893 window_clamp=5830753 rcv_wnd=5509120 family=AF_
 1477.738309: tcp:tcp_rcvbuf_grow: time=166 rtt_us=54 copied=1777664 inq=180224 space=1417216 ooo=0 scaling_ratio=240 rcvbuf=3117118 rcv_ssthresh=2874958 window_clamp=2922298 rcv_wnd=2613248 family=AF_

We can see sk_rcvbuf values are much smaller, and that rtt_us (estimation of rtt
from a receiver point of view) is kept small, instead of being bloated.

No difference in throughput.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Tested-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/20251119084813.3684576-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agotcp: tcp_moderate_rcvbuf is only used in rx path
Eric Dumazet [Wed, 19 Nov 2025 08:48:12 +0000 (08:48 +0000)] 
tcp: tcp_moderate_rcvbuf is only used in rx path

sysctl_tcp_moderate_rcvbuf is only used from tcp_rcvbuf_grow().

Move it to netns_ipv4_read_rx group.

Remove various CACHELINE_ASSERT_GROUP_SIZE() from netns_ipv4_struct_check(),
as they have no real benefit but cause pain for all changes.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251119084813.3684576-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoMerge branch 'net-mdio-improve-reset-handling-of-mdio-devices'
Jakub Kicinski [Fri, 21 Nov 2025 01:41:40 +0000 (17:41 -0800)] 
Merge branch 'net-mdio-improve-reset-handling-of-mdio-devices'

Buday Csaba says:

====================
net: mdio: improve reset handling of mdio devices

This patchset refactors and slightly improves the reset handling of
`mdio_device`.

The patches were split from a larger series, discussed previously in the
links below.

The difference between v2 and v3, is that the helper function declarations
have been moved to a new header file: drivers/net/phy/mdio-private.h
See links for the previous versions, and for the now separate leak fix.
====================

Link: https://patch.msgid.link/cover.1763473655.git.buday.csaba@prolan.hu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: mdio: improve reset handling in mdio_device.c
Buday Csaba [Tue, 18 Nov 2025 13:58:54 +0000 (14:58 +0100)] 
net: mdio: improve reset handling in mdio_device.c

Change fwnode_property_read_u32() in mdio_device_register_reset()
to device_property_read_u32(), which is more appropriate here.

Make mdio_device_unregister_reset() truly reverse
mdio_device_register_reset() by setting the internal fields to
their default values.

Signed-off-by: Buday Csaba <buday.csaba@prolan.hu>
Link: https://patch.msgid.link/641df1488517ae71ba10158ec1e38424211d8651.1763473655.git.buday.csaba@prolan.hu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: mdio: common handling of phy device reset properties
Buday Csaba [Tue, 18 Nov 2025 13:58:53 +0000 (14:58 +0100)] 
net: mdio: common handling of phy device reset properties

Unify the handling of the per device reset properties for
`mdio_device`.

Merge mdio_device_register_gpiod() and mdio_device_register_reset()
into mdio_device_register_reset(), that handles both
reset-controllers and reset-gpios.
Move reading of the reset firmware properties (reset-assert-us,
reset-deassert-us) from fwnode_mdio.c to mdio_device_register_reset(),
so all reset related initialization code is kept in one place.

Introduce mdio_device_unregister_reset() to release the associated
resources.

These changes make tracking the reset properties easier.
Added kernel-doc for mdio_device_register/unregister_reset().

Signed-off-by: Buday Csaba <buday.csaba@prolan.hu>
Link: https://patch.msgid.link/17c216efd7a47be17db104378b6aacfc8741d8b9.1763473655.git.buday.csaba@prolan.hu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: mdio: move device reset functions to mdio_device.c
Buday Csaba [Tue, 18 Nov 2025 13:58:52 +0000 (14:58 +0100)] 
net: mdio: move device reset functions to mdio_device.c

The functions mdiobus_register_gpiod() and mdiobus_register_reset()
handle the mdio device reset initialization, which belong to
mdio_device.c.
Move them from mdio_bus.c to mdio_device.c, and rename them to match
the corresponding source file: mdio_device_register_gpio() and
mdio_device_register_reset().
Remove 'static' qualifiers and declare them in
drivers/net/phy/mdio-private.h (new header file).

Signed-off-by: Buday Csaba <buday.csaba@prolan.hu>
Link: https://patch.msgid.link/5f684838ee897130f21b21beb07695eea4af8988.1763473655.git.buday.csaba@prolan.hu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 20 Nov 2025 17:12:41 +0000 (09:12 -0800)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.18-rc7).

No conflicts, adjacent changes:

tools/testing/selftests/net/af_unix/Makefile
  e1bb28bf13f4 ("selftest: af_unix: Add test for SO_PEEK_OFF.")
  45a1cd8346ca ("selftests: af_unix: Add tests for ECONNRESET and EOF semantics")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoMerge tag 'net-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 20 Nov 2025 16:52:07 +0000 (08:52 -0800)] 
Merge tag 'net-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from IPsec and wireless.

  Previous releases - regressions:

   - prevent NULL deref in generic_hwtstamp_ioctl_lower(),
     newer APIs don't populate all the pointers in the request

   - phylink: add missing supported link modes for the fixed-link

   - mptcp: fix false positive warning in mptcp_pm_nl_rm_addr

  Previous releases - always broken:

   - openvswitch: remove never-working support for setting NSH fields

   - xfrm: number of fixes for error paths of xfrm_state creation/
     modification/deletion

   - xfrm: fixes for offload
      - fix the determination of the protocol of the inner packet
      - don't push locally generated packets directly to L2 tunnel
        mode offloading, they still need processing from the standard
        xfrm path

   - mptcp: fix a couple of corner cases in fallback and fastclose
     handling

   - wifi: rtw89: hw_scan: prevent connections from getting stuck,
     work around apparent bug in FW by tweaking messages we send

   - af_unix: fix duplicate data if PEEK w/ peek_offset needs to wait

   - veth: more robust handing of race to avoid txq getting stuck

   - eth: ps3_gelic_net: handle skb allocation failures"

* tag 'net-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
  vsock: Ignore signal/timeout on connect() if already established
  be2net: pass wrb_params in case of OS2BMC
  l2tp: reset skb control buffer on xmit
  net: dsa: microchip: lan937x: Fix RGMII delay tuning
  selftests: mptcp: add a check for 'add_addr_accepted'
  mptcp: fix address removal logic in mptcp_pm_nl_rm_addr
  selftests: mptcp: join: userspace: longer timeout
  selftests: mptcp: join: endpoints: longer timeout
  selftests: mptcp: join: fastclose: remove flaky marks
  mptcp: fix duplicate reset on fastclose
  mptcp: decouple mptcp fastclose from tcp close
  mptcp: do not fallback when OoO is present
  mptcp: fix premature close in case of fallback
  mptcp: avoid unneeded subflow-level drops
  mptcp: fix ack generation for fallback msk
  wifi: rtw89: hw_scan: Don't let the operating channel be last
  net: phylink: add missing supported link modes for the fixed-link
  selftest: af_unix: Add test for SO_PEEK_OFF.
  af_unix: Read sk_peek_offset() again after sleeping in unix_stream_read_generic().
  net/mlx5: Clean up only new IRQ glue on request_irq() failure
  ...

3 weeks agovsock: Ignore signal/timeout on connect() if already established
Michal Luczaj [Wed, 19 Nov 2025 14:02:59 +0000 (15:02 +0100)] 
vsock: Ignore signal/timeout on connect() if already established

During connect(), acting on a signal/timeout by disconnecting an already
established socket leads to several issues:

1. connect() invoking vsock_transport_cancel_pkt() ->
   virtio_transport_purge_skbs() may race with sendmsg() invoking
   virtio_transport_get_credit(). This results in a permanently elevated
   `vvs->bytes_unsent`. Which, in turn, confuses the SOCK_LINGER handling.

2. connect() resetting a connected socket's state may race with socket
   being placed in a sockmap. A disconnected socket remaining in a sockmap
   breaks sockmap's assumptions. And gives rise to WARNs.

3. connect() transitioning SS_CONNECTED -> SS_UNCONNECTED allows for a
   transport change/drop after TCP_ESTABLISHED. Which poses a problem for
   any simultaneous sendmsg() or connect() and may result in a
   use-after-free/null-ptr-deref.

Do not disconnect socket on signal/timeout. Keep the logic for unconnected
sockets: they don't linger, can't be placed in a sockmap, are rejected by
sendmsg().

[1]: https://lore.kernel.org/netdev/e07fd95c-9a38-4eea-9638-133e38c2ec9b@rbox.co/
[2]: https://lore.kernel.org/netdev/20250317-vsock-trans-signal-race-v4-0-fc8837f3f1d4@rbox.co/
[3]: https://lore.kernel.org/netdev/60f1b7db-3099-4f6a-875e-af9f6ef194f6@rbox.co/

Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251119-vsock-interrupted-connect-v2-1-70734cf1233f@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agobe2net: pass wrb_params in case of OS2BMC
Andrey Vatoropin [Wed, 19 Nov 2025 10:51:12 +0000 (10:51 +0000)] 
be2net: pass wrb_params in case of OS2BMC

be_insert_vlan_in_pkt() is called with the wrb_params argument being NULL
at be_send_pkt_to_bmc() call site.  This may lead to dereferencing a NULL
pointer when processing a workaround for specific packet, as commit
bc0c3405abbb ("be2net: fix a Tx stall bug caused by a specific ipv6
packet") states.

The correct way would be to pass the wrb_params from be_xmit().

Fixes: 760c295e0e8d ("be2net: Support for OS2BMC.")
Cc: stable@vger.kernel.org
Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru>
Link: https://patch.msgid.link/20251119105015.194501-1-a.vatoropin@crpt.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoMerge branch 'ynl-cli-list-attrs-argument'
Paolo Abeni [Thu, 20 Nov 2025 14:43:05 +0000 (15:43 +0100)] 
Merge branch 'ynl-cli-list-attrs-argument'

Gal Pressman says:

====================
YNL CLI --list-attrs argument

While experimenting with the YNL CLI, I found the process of going back
and forth to examine the YAML spec files in order to figure out how to
use each command quite tiring.

The addition of --list-attrs helps by providing all information needed
directly in the tool. I figured others would likely find it useful as
well.

v1: https://lore.kernel.org/all/20251116192845.1693119-1-gal@nvidia.com/
====================

Link: https://patch.msgid.link/20251118143208.2380814-1-gal@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agotools: ynl: cli: Display enum values in --list-attrs output
Gal Pressman [Tue, 18 Nov 2025 14:32:08 +0000 (16:32 +0200)] 
tools: ynl: cli: Display enum values in --list-attrs output

When listing attributes with --list-attrs, display the actual enum
values for attributes that reference an enum type.

  # ./cli.py --family netdev --list-attrs dev-get
  [..]
    - xdp-features: u64 (enum: xdp-act)
      Flags: basic, redirect, ndo-xmit, xsk-zerocopy, hw-offload, rx-sg, ndo-xmit-sg
      Bitmask of enabled xdp-features.
  [..]

Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20251118143208.2380814-4-gal@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agotools: ynl: cli: Parse nested attributes in --list-attrs output
Gal Pressman [Tue, 18 Nov 2025 14:32:07 +0000 (16:32 +0200)] 
tools: ynl: cli: Parse nested attributes in --list-attrs output

Enhance the --list-attrs option to recursively display nested attributes
instead of just showing "nest" as the type.
Nested attributes now show their attribute set name and expand to
display their contents.

  # ./cli.py --family ethtool --list-attrs rss-get
  [..]
  Do request attributes:
    - header: nest -> header
        - dev-index: u32
        - dev-name: string
        - flags: u32 (enum: header-flags)
        - phy-index: u32
    - context: u32
  [..]

Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20251118143208.2380814-3-gal@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agotools: ynl: cli: Add --list-attrs option to show operation attributes
Gal Pressman [Tue, 18 Nov 2025 14:32:06 +0000 (16:32 +0200)] 
tools: ynl: cli: Add --list-attrs option to show operation attributes

Add a --list-attrs option to the YNL CLI that displays information about
netlink operations, including request and reply attributes.
This eliminates the need to manually inspect YAML spec files to
determine the JSON structure required for operations, or understand the
structure of the reply.

Example usage:
  # ./cli.py --family netdev --list-attrs dev-get
  Operation: dev-get
  Get / dump information about a netdev.

  Do request attributes:
    - ifindex: u32
      netdev ifindex

  Do reply attributes:
    - ifindex: u32
      netdev ifindex
    - xdp-features: u64 (enum: xdp-act)
      Bitmask of enabled xdp-features.
    - xdp-zc-max-segs: u32
      max fragment count supported by ZC driver
    - xdp-rx-metadata-features: u64 (enum: xdp-rx-metadata)
      Bitmask of supported XDP receive metadata features. See Documentation/networking/xdp-rx-metadata.rst for more details.
    - xsk-features: u64 (enum: xsk-flags)
      Bitmask of enabled AF_XDP features.

  Dump reply attributes:
    - ifindex: u32
      netdev ifindex
    - xdp-features: u64 (enum: xdp-act)
      Bitmask of enabled xdp-features.
    - xdp-zc-max-segs: u32
      max fragment count supported by ZC driver
    - xdp-rx-metadata-features: u64 (enum: xdp-rx-metadata)
      Bitmask of supported XDP receive metadata features. See Documentation/networking/xdp-rx-metadata.rst for more details.
    - xsk-features: u64 (enum: xsk-flags)
      Bitmask of enabled AF_XDP features.

Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20251118143208.2380814-2-gal@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agoMerge branch 'add-af_xdp-zero-copy-support'
Paolo Abeni [Thu, 20 Nov 2025 14:24:13 +0000 (15:24 +0100)] 
Merge branch 'add-af_xdp-zero-copy-support'

Meghana Malladi says:

====================
Add AF_XDP zero copy support

This series adds AF_XDP zero coppy support to icssg driver.

Tests were performed on AM64x-EVM with xdpsock application [1].

A clear improvement is seen Transmit (txonly) and receive (rxdrop)
for 64 byte packets. 1500 byte test seems to be limited by line
rate (1G link) so no improvement seen there in packet rate

Having some issue with l2fwd as the benchmarking numbers show 0
for 64 byte packets after forwading first batch packets and I am
currently looking into it.

AF_XDP performance using 64 byte packets in Kpps.
AF_XDP performance using 64 byte packets in Kpps.
Benchmark: XDP-SKB XDP-Native XDP-Native(ZeroCopy)
rxdrop 253 473 656
txonly 350 354 855
l2fwd  178 240 0

AF_XDP performance using 1500 byte packets in Kpps.
Benchmark: XDP-SKB XDP-Native XDP-Native(ZeroCopy)
rxdrop 82 82 82
txonly 81 82 82
l2fwd  81 82 82

[1]: https://github.com/xdp-project/bpf-examples/tree/master/AF_XDP-example
v5: https://lore.kernel.org/all/20251111101523.3160680-1-m-malladi@ti.com/
====================

Link: https://patch.msgid.link/20251118135542.380574-1-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agonet: ti: icssg-prueth: Enable zero copy in XDP features
Meghana Malladi [Tue, 18 Nov 2025 13:55:42 +0000 (19:25 +0530)] 
net: ti: icssg-prueth: Enable zero copy in XDP features

Enable the zero copy feature flag in xdp_set_features_flag()
for a given ndev to get the AF-XDP zero copy support running
for both Tx and Rx.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Link: https://patch.msgid.link/20251118135542.380574-7-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agonet: ti: icssg-prueth: Add AF_XDP zero copy for RX
Meghana Malladi [Tue, 18 Nov 2025 13:55:41 +0000 (19:25 +0530)] 
net: ti: icssg-prueth: Add AF_XDP zero copy for RX

Use xsk_pool inside rx_chn to check if a given Rx queue id
is registered for xsk zero copy, which gets populated during
xsk enable.

Update prueth_create_xdp_rxqs to register and support two different
memory models (xsk and page) for a given Rx queue, if registered for
zero copy.

If xsk_pool is registered, allocate buffers from UMEM and map them
to the hardware Rx descriptors. In NAPI context, run the XDP program
for each packet and process the xsk buffer according to the XDP
result codes. Also allocate new set of buffers from UMEM for the
next batch of NAPI Rx processing. Add XDK_WAKEUP_RX support to support
xsk wakeup for Rx.

Move prueth_create_page_pool to prueth_init_rx_chns to avoid freeing
and re-allocating the system memory every time there is a transition
from zero copy to copy and prevents any type of memory fragmentation
or leak.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Link: https://patch.msgid.link/20251118135542.380574-6-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agonet: ti: icssg-prueth: Make emac_run_xdp function independent of page
Meghana Malladi [Tue, 18 Nov 2025 13:55:40 +0000 (19:25 +0530)] 
net: ti: icssg-prueth: Make emac_run_xdp function independent of page

emac_run_xdp function runs xdp program, at a given hook point
in the Rx path of the driver in NAPI context and returns
XDP return codes. In zero copy mode the driver receives
packets using UMEM frames instead of pages (native XDP).
Decouple the usage of page in this function.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Link: https://patch.msgid.link/20251118135542.380574-5-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agonet: ti: icssg-prueth: Add AF_XDP zero copy for TX
Meghana Malladi [Tue, 18 Nov 2025 13:55:39 +0000 (19:25 +0530)] 
net: ti: icssg-prueth: Add AF_XDP zero copy for TX

Use xsk_pool inside tx_chn to check if a given Tx queue id
is registered for xsk zero copy, which gets populated during
xsk enable

If xsk_pool is set, get frames from the pool in NAPI
context and submit them to the Tx channel. Tx completion
is also handled in the NAPI context.

Use PRUETH_SWDATA_XSK to recycle xsk buffers back to the
umem pool. Add XDP_WAKEUP_TX support to enable xsk_wakeup
for Tx.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Link: https://patch.msgid.link/20251118135542.380574-4-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agonet: ti: icssg-prueth: Add XSK pool helpers
Meghana Malladi [Tue, 18 Nov 2025 13:55:38 +0000 (19:25 +0530)] 
net: ti: icssg-prueth: Add XSK pool helpers

Implement XSK NDOs (setup, wakeup) and create XSK
Rx and Tx queues. xsk_qid stores the queue id for
a given port which has been registered for zero copy
AF_XDP and used to acquire UMEM pointer if registered.

Based on the xsk_qid and the xsk_pool (umem) the driver
is either in copy or zero copy mode. In case of copy mode
the xsk_qid value will be invalid and will be set to valid
queue id when enabling zero copy. To enable zero copy, the
Rx queues are destroyed, i.e., descriptors pushed to fq
and cq are freed to remap them to xdp buffers from the umem.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Link: https://patch.msgid.link/20251118135542.380574-3-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agonet: ti: icssg-prueth: Add functions to create and destroy Rx/Tx queues
Meghana Malladi [Tue, 18 Nov 2025 13:55:37 +0000 (19:25 +0530)] 
net: ti: icssg-prueth: Add functions to create and destroy Rx/Tx queues

Each port for a given ICSSG instance has their own set of
Tx and Rx queues. Add functions to create and destroy these
queues, which will be further used while performing ndo_bpf
operations to set up XSK Tx/Rx queues for a given port.

In the destroy Rx queue sequence add teardown wait to ensure
that all the descriptors including the TDCM (teardown completion
marker) have been serviced and freed to avoid any sort of descriptor
leaks.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Link: https://patch.msgid.link/20251118135542.380574-2-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agoMerge tag 'wireless-2025-11-20' of https://git.kernel.org/pub/scm/linux/kernel/git...
Paolo Abeni [Thu, 20 Nov 2025 12:02:00 +0000 (13:02 +0100)] 
Merge tag 'wireless-2025-11-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
wireless-2025-11-20

A single fix for scanning on some rtw89 devices.

* tag 'wireless-2025-11-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: rtw89: hw_scan: Don't let the operating channel be last
====================

Link: https://patch.msgid.link/20251120085433.8601-3-johannes@sipsolutions.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agoMerge branch 'txgbe-support-more-modules'
Paolo Abeni [Thu, 20 Nov 2025 11:47:28 +0000 (12:47 +0100)] 
Merge branch 'txgbe-support-more-modules'

Jiawen Wu says:

====================
TXGBE support more modules

Support CR modules for 25G devices and QSFP modules for 40G devices. And
implement .get_module_eeprom_by_page() to get module info.

v1: https://lore.kernel.org/all/20251112055841.22984-1-jiawenwu@trustnetic.com/
====================

Link: https://patch.msgid.link/20251118080259.24676-1-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agonet: txgbe: support getting module EEPROM by page
Jiawen Wu [Tue, 18 Nov 2025 08:02:59 +0000 (16:02 +0800)] 
net: txgbe: support getting module EEPROM by page

Getting module EEPROM has been supported in TXGBE SP devices, since SFP
driver has already implemented it.

Now add support to read module EEPROM for AML devices. Towards this, add
a new firmware mailbox command to get the page data.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20251118080259.24676-6-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agonet: txgbe: delay to identify modules in .ndo_open
Jiawen Wu [Tue, 18 Nov 2025 08:02:58 +0000 (16:02 +0800)] 
net: txgbe: delay to identify modules in .ndo_open

For QSFP modules, there is a possibility that the module cannot be
identified when read I2C immediately in .ndo_open. So just set the flag
WX_FLAG_NEED_MODULE_RESET and do it in the subtask, which always wait
200 ms to identify the module. And this change has no impact on the
original adaptation.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20251118080259.24676-5-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>