git.ipfire.org Git - thirdparty/kernel/linux.git/log
5 weeks ago
Russell King (Oracle) [Tue, 11 Nov 2025 08:12:43 +0000 (08:12 +0000)] 
net: stmmac: visconti: use PHY_INTF_SEL_x to select PHY interface

Convert dwmac-visconti to use the PHY_INTF_SEL_x definitions. The
original definitions used constant 0, BIT(0) (==1) and BIT(2) (==4)
to define these, but these values correspond to the PHY_INTF_SEL_x
values, so it is highly likely that these are not individual bits,
but the PHY_INTF_SEL_x bitfield.

This removes this incorrect use of BIT().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vIjUZ-0000000Dqu5-2sDI@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Russell King (Oracle) [Tue, 11 Nov 2025 08:12:38 +0000 (08:12 +0000)] 
net: stmmac: stm32: use stmmac_get_phy_intf_sel()

Use stmmac_get_phy_intf_sel() to decode the PHY interface mode to the
phy_intf_sel value. As both configure functions would end up with the
same code, call this from stm32mp1_set_mode(), validate the result and
pass the resulting value into the stm32 configure function. Use this
value to set the operating mode for the DWMAC core.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vIjUU-0000000Dqtz-2PwT@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
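
The decode, validate and program flow that this commit describes can be illustrated with a minimal user-space sketch. Everything named sketch_* or SKETCH_* below is hypothetical and is not the driver's API: the real stmmac_get_phy_intf_sel() operates on the kernel's phy_interface_t, and the 6:4 field position is only an assumed register layout. The field values 0 (GMII/MII), 1 (RGMII) and 4 (RMII) are the ones quoted elsewhere in this log.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's phy_interface_t. */
enum sketch_phy_mode {
	SKETCH_MODE_MII,
	SKETCH_MODE_GMII,
	SKETCH_MODE_RGMII,
	SKETCH_MODE_RMII,
	SKETCH_MODE_OTHER,	/* any mode this sketch does not handle */
};

/* Field values quoted in the log: 0 (GMII/MII), 1 (RGMII), 4 (RMII). */
#define SKETCH_SEL_GMII_MII	0
#define SKETCH_SEL_RGMII	1
#define SKETCH_SEL_RMII		4

/* Assumed register layout: a 3-bit field at bits 6:4. */
#define SKETCH_SEL_SHIFT	4
#define SKETCH_SEL_MASK		(UINT32_C(7) << SKETCH_SEL_SHIFT)

/* Decode an interface mode to its field value, or -EINVAL if unsupported. */
int sketch_get_phy_intf_sel(enum sketch_phy_mode mode)
{
	switch (mode) {
	case SKETCH_MODE_MII:
	case SKETCH_MODE_GMII:
		return SKETCH_SEL_GMII_MII;
	case SKETCH_MODE_RGMII:
		return SKETCH_SEL_RGMII;
	case SKETCH_MODE_RMII:
		return SKETCH_SEL_RMII;
	default:
		return -EINVAL;
	}
}

/* Validate the decoded value, then merge it into the control register. */
int sketch_set_mode(uint32_t *ctrl, enum sketch_phy_mode mode)
{
	int sel = sketch_get_phy_intf_sel(mode);

	if (sel < 0)
		return sel;	/* leave the register untouched on error */

	*ctrl = (*ctrl & ~SKETCH_SEL_MASK) |
		((uint32_t)sel << SKETCH_SEL_SHIFT);
	return 0;
}
```

Because MII and GMII decode to the same field value, a caller built this way accepts both, which matches the note in the mediatek and loongson1 commits.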
5 weeks ago
Russell King (Oracle) [Tue, 11 Nov 2025 08:12:33 +0000 (08:12 +0000)] 
net: stmmac: stm32: use PHY_INTF_SEL_x directly

Rather than defining separate constants for each, use the
PHY_INTF_SEL_x definitions in the switch()es configuring the
control register, and use one FIELD_PREP() to convert phy_intf_sel
to the register value.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vIjUP-0000000Dqtt-1bYn@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Russell King (Oracle) [Tue, 11 Nov 2025 08:12:28 +0000 (08:12 +0000)] 
net: stmmac: stm32: use PHY_INTF_SEL_x to select PHY interface

Convert dwmac-stm32 to use the PHY_INTF_SEL_x definitions.

For stm32mp1, the original definitions used constant 0 (GMII, 0 << 21),
BIT(21) (RGMII, 1 << 21) and BIT(23) (RMII, 4 << 21) to define these,
but from the values it can be clearly seen that these are the
PHY_INTF_SEL_x inputs to the dwmac.

For stm32mp2, the original definitions cover a bitfield 6:4 in the
SYSCFG Ethernet1 control register (according to documentation) and use
the PHY_INTF_SEL_x values.

Use the common dwmac definitions for the PHY interface selection field
by adding the bitfield mask, and using FIELD_PREP() for the bitfield
values.

This removes this incorrect use of BIT().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vIjUK-0000000Dqtn-1AyK@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
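
The observation for stm32mp1 above can be checked with a small user-space sketch. The macros are simplified stand-ins for the kernel's BIT(), GENMASK() and FIELD_PREP(); the names SKETCH_ETH_SEL_MASK and sketch_eth_sel_reg(), the PHY_INTF_SEL_* spellings, and the bits 23:21 field position are assumptions inferred from the shifts quoted in the commit message rather than taken from the driver.

```c
#include <stdint.h>

/* Simplified 32-bit stand-ins for the kernel helpers of the same name. */
#define BIT(n)		(UINT32_C(1) << (n))
#define GENMASK(h, l)	((~UINT32_C(0) >> (31 - (h))) & (~UINT32_C(0) << (l)))
/* Multiply by the lowest set bit of the mask to shift val into the field. */
#define FIELD_PREP(mask, val) \
	(((uint32_t)(val) * ((mask) & (~(mask) + 1))) & (mask))

/* Field values from the commit message (names assumed). */
#define PHY_INTF_SEL_GMII_MII	0
#define PHY_INTF_SEL_RGMII	1
#define PHY_INTF_SEL_RMII	4

/* Assumed field position: 0 << 21, 1 << 21 and 4 << 21 imply bits 23:21. */
#define SKETCH_ETH_SEL_MASK	GENMASK(23, 21)

/* The old BIT()-style constants are the field values shifted into place. */
_Static_assert(FIELD_PREP(SKETCH_ETH_SEL_MASK, PHY_INTF_SEL_GMII_MII) == 0,
	       "GMII/MII is field value 0");
_Static_assert(FIELD_PREP(SKETCH_ETH_SEL_MASK, PHY_INTF_SEL_RGMII) == BIT(21),
	       "RGMII is field value 1, which happens to equal BIT(21)");
_Static_assert(FIELD_PREP(SKETCH_ETH_SEL_MASK, PHY_INTF_SEL_RMII) == BIT(23),
	       "RMII is field value 4, which happens to equal BIT(23)");

/* Convert a phy_intf_sel field value to its register representation. */
uint32_t sketch_eth_sel_reg(uint32_t phy_intf_sel)
{
	return FIELD_PREP(SKETCH_ETH_SEL_MASK, phy_intf_sel);
}
```

Writing the field as a mask plus FIELD_PREP() makes it explicit that the three bits form one multi-bit selector, which the BIT() spelling hides.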
5 weeks ago
Russell King (Oracle) [Tue, 11 Nov 2025 08:12:23 +0000 (08:12 +0000)] 
net: stmmac: starfive: use stmmac_get_phy_intf_sel()

Use stmmac_get_phy_intf_sel() to decode the PHY interface mode to the
phy_intf_sel value, validate the result and use that to set the
control register to select the operating mode for the DWMAC core.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Link: https://patch.msgid.link/E1vIjUF-0000000Dqth-0gwD@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Russell King (Oracle) [Tue, 11 Nov 2025 08:12:18 +0000 (08:12 +0000)] 
net: stmmac: starfive: use PHY_INTF_SEL_x to select PHY interface

Use the common dwmac definitions for the PHY interface selection field.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Link: https://patch.msgid.link/E1vIjUA-0000000Dqtb-0AfP@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Russell King (Oracle) [Tue, 11 Nov 2025 08:12:12 +0000 (08:12 +0000)] 
net: stmmac: mediatek: simplify set_interface() methods

Use the phy_intf_sel field value when deciding what other options to
apply for the configuration register.

Note that this will allow GMII as well as MII as the phy_intf_sel
value is the same for both.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1vIjU4-0000000DqtV-3qsX@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Russell King (Oracle) [Tue, 11 Nov 2025 08:12:07 +0000 (08:12 +0000)] 
net: stmmac: mediatek: use stmmac_get_phy_intf_sel()

Use stmmac_get_phy_intf_sel() to decode the PHY interface mode to the
phy_intf_sel value, validate the result, and pass that into the
implementation specific ->dwmac_set_phy_interface() method. Use this
to configure the PHY interface selection field.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1vIjTz-0000000DqtP-3N9v@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Russell King (Oracle) [Tue, 11 Nov 2025 08:12:02 +0000 (08:12 +0000)] 
net: stmmac: mediatek: use PHY_INTF_SEL_x

Use PHY_INTF_SEL_x definitions for the fields that correspond to the
phy_intf_sel inputs to the dwmac core.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1vIjTu-0000000DqtI-2sUB@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Russell King (Oracle) [Tue, 11 Nov 2025 08:11:57 +0000 (08:11 +0000)] 
net: stmmac: loongson1: use stmmac_get_phy_intf_sel()

Use stmmac_get_phy_intf_sel() to decode the PHY interface mode to the
phy_intf_sel value, validate the result and use that to set the
control register to select the operating mode for the DWMAC core.

Note that this will allow GMII as well as MII as the phy_intf_sel
value is the same for both.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1vIjTp-0000000DqtC-2DmI@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Russell King (Oracle) [Tue, 11 Nov 2025 08:11:52 +0000 (08:11 +0000)] 
net: stmmac: loongson1: use PHY_INTF_SEL_x directly

Use the PHY_INTF_SEL_x values directly in ls1c_dwmac_syscon_init(),
converting them to the PHY_INTF_SELI bitfield when calling
regmap_update_bits().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1vIjTk-0000000Dqt6-1gN9@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Russell King (Oracle) [Tue, 11 Nov 2025 08:11:47 +0000 (08:11 +0000)] 
net: stmmac: loongson1: use PHY_INTF_SEL_x

Use PHY_INTF_SEL_x definitions for phy_intf_sel bitfield.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Jakub Kicinski [Tue, 11 Nov 2025 15:52:14 +0000 (07:52 -0800)] 
tools: ynltool: correct install in Makefile

Use the variable in case user has a custom install binary.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20251111155214.2760711-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Dimitri Daskalakis [Tue, 11 Nov 2025 22:53:19 +0000 (14:53 -0800)] 
selftests: drv-net: Limit the max number of queues in procfs_downup_hammer

For NICs with a large (1024+) number of queues, this test can cause
excessive memory fragmentation. This results in OOM errors, and in the
worst case driver/kernel crashes. We don't need to test with the max number
of queues, just enough to create a high likelihood of races between
reconfiguration and stats getting read.

Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Link: https://patch.msgid.link/20251111225319.3019542-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Jakub Kicinski [Wed, 12 Nov 2025 17:33:23 +0000 (09:33 -0800)] 
Merge tag 'wireless-next-2025-11-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
More -next material, notably:
 - split ieee80211.h file, it's way too big
 - mac80211: initial chanctx work towards NAN
 - mac80211: MU-MIMO sniffer improvements
 - ath12k: statistics improvements

* tag 'wireless-next-2025-11-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (26 commits)
  wifi: cw1200: Fix potential memory leak in cw1200_bh_rx_helper()
  wifi: mac80211: make monitor link info check more specific
  wifi: mac80211: track MU-MIMO configuration on disabled interfaces
  wifi: cfg80211/mac80211: Add fallback mechanism for INDOOR_SP connection
  wifi: cfg80211/mac80211: clean up duplicate ap_power handling
  wifi: cfg80211: use a C99 initializer in wiphy_register
  wifi: cfg80211: fix doc of struct key_params
  wifi: mac80211: remove unnecessary vlan NULL check
  wifi: mac80211: pass frame type to element parsing
  wifi: mac80211: remove "disabling VHT" message
  wifi: mac80211: add and use chanctx usage iteration
  wifi: mac80211: simplify ieee80211_recalc_chanctx_min_def() API
  wifi: mac80211: remove chanctx to link back-references
  wifi: mac80211: make link iteration safe for 'break'
  wifi: mac80211: fix EHT typo
  wifi: cfg80211: fix EHT typo
  wifi: ieee80211: split NAN definitions out
  wifi: ieee80211: split P2P definitions out
  wifi: ieee80211: split S1G definitions out
  wifi: ieee80211: split EHT definitions out
  ...
====================

Link: https://patch.msgid.link/20251112115126.16223-4-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Russell King (Oracle) [Tue, 11 Nov 2025 11:26:34 +0000 (11:26 +0000)] 
net: stmmac: improve ndev->max_mtu setup readability

Improve the readability of the code setting ndev->max_mtu. This depends
on the hardware specific maximum defined by the MAC core, and also a
platform provided maximum.

The code originally checked that the platform specific maximum was
between ndev->min_mtu and the MAC core maximum before reducing
ndev->max_mtu, and otherwise issued a warning if the platform specific
maximum was less than ndev->min_mtu.

Re-order the code to handle the case where the platform specific max is
below ndev->min_mtu, which then means that the subsequent test is
simply reducing ndev->max_mtu.

Update the comment, and add a few blank lines to separate the blocks of
code.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vImWA-0000000DrIl-1HZY@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
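
The re-ordered logic can be sketched as a standalone function. The name sketch_max_mtu() and its parameters are hypothetical; the real code works on ndev and the platform data directly, and the warning is reduced here to a flag.

```c
#include <stdbool.h>

/*
 * Compute the effective maximum MTU from the MAC core maximum and a
 * platform provided maximum. *warn is set when the platform value is
 * unusable (below min_mtu), in which case the core maximum is kept.
 */
unsigned int sketch_max_mtu(unsigned int min_mtu, unsigned int core_max,
			    unsigned int plat_max, bool *warn)
{
	unsigned int max_mtu = core_max;

	*warn = false;

	if (plat_max < min_mtu) {
		/* Handle the invalid case first: warn, do not reduce. */
		*warn = true;
	} else if (plat_max < max_mtu) {
		/* What remains is a plain reduction to the platform limit. */
		max_mtu = plat_max;
	}

	return max_mtu;
}
```

With the invalid case handled first, the second test no longer needs a lower-bound check, which is the readability gain the commit message describes.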
5 weeks ago
javen [Tue, 11 Nov 2025 09:28:51 +0000 (17:28 +0800)] 
r8169: add support for RTL8125K

This adds support for chip RTL8125K. Its XID is 0x68a. It is basically
based on the one with XID 0x688, but with a different firmware file.

Signed-off-by: javen <javen_xu@realsil.com.cn>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20251111092851.3371-1-javen_xu@realsil.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Eric Dumazet [Tue, 11 Nov 2025 15:12:35 +0000 (15:12 +0000)] 
net: clear skb->sk in skb_release_head_state()

skb_release_head_state() inlines skb_orphan().

We need to clear skb->sk otherwise we can freeze TCP flows
on a mostly idle host, because skb_fclone_busy() would
return true as long as the packet is not yet processed by
skb_defer_free_flush().

Fixes: 1fcf572211da ("net: allow skb_release_head_state() to be called multiple times")
Fixes: e20dfbad8aab ("net: fix napi_consume_skb() with alien skbs")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Aditya Garg <gargaditya@linux.microsoft.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251111151235.1903659-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Jakub Kicinski [Wed, 12 Nov 2025 14:19:42 +0000 (06:19 -0800)] 
Merge branch 'selftests-vsock-refactor-and-improve-vmtest-infrastructure'

Bobby Eshleman says:

====================
selftests/vsock: refactor and improve vmtest infrastructure

This patch series refactors the vsock selftest VM infrastructure to
improve test run times, improve logging, and prepare for future tests
which make heavy usage of these refactored functions and have new
requirements such as simultaneous QEMU processes.
====================

Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-0-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Bobby Eshleman [Sat, 8 Nov 2025 16:01:03 +0000 (08:01 -0800)] 
selftests/vsock: disable shellcheck SC2317 and SC2119

Disable shellcheck rules SC2317 and SC2119. These rules are being
triggered due to false positives. For SC2317, many `return
"${KSFT_PASS}"` lines are reported as unreachable, even though they are
executed during normal runs. For SC2119, the fact that
log_guest/log_host accept either stdin or arguments triggers SC2119,
despite being valid.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-12-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Bobby Eshleman [Sat, 8 Nov 2025 16:01:02 +0000 (08:01 -0800)] 
selftests/vsock: add vsock_loopback module loading

Add vsock_loopback module loading to the loopback test so that vmtest.sh
can be used for kernels built with loopback as a module.

This is not technically a fix as kselftest expects loopback to be
built-in already (defined in selftests/vsock/config). This is useful
only for using vmtest.sh outside of kselftest.

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-11-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Bobby Eshleman [Sat, 8 Nov 2025 16:01:01 +0000 (08:01 -0800)] 
selftests/vsock: add 1.37 to tested virtme-ng versions

Testing with 1.37 shows all tests passing but emits the warning:

warning: vng version 'virtme-ng 1.37' has not been tested and may not function properly.
The following versions have been tested: 1.33 1.36

This patch adds 1.37 to the virtme-ng versions to get rid of the above
warning.

Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-10-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Bobby Eshleman [Sat, 8 Nov 2025 16:01:00 +0000 (08:01 -0800)] 
selftests/vsock: add BUILD=0 definition

Add the definition for BUILD and initialize it to zero. This prevents
`bash -u vmtest.sh` from throwing 'unbound variable' when BUILD is not
set to 1 and is later checked for its value.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-9-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Bobby Eshleman [Sat, 8 Nov 2025 16:00:59 +0000 (08:00 -0800)] 
selftests/vsock: identify and execute tests that can re-use VM

In preparation for future patches that introduce tests that cannot
re-use the same VM, add functions to identify those that *can* re-use a
VM.

By continuing to re-use the same VM for these tests we can save time by
avoiding the delay of booting a VM for every test.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-8-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Bobby Eshleman [Sat, 8 Nov 2025 16:00:58 +0000 (08:00 -0800)] 
selftests/vsock: add check_result() for pass/fail counting

Add check_result() function to reuse logic for incrementing the
pass/fail counters. This function will get used by different callers as
we add different types of tests in future patches (namely, namespace and
non-namespace tests will be called at different places, and re-use this
function).

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-7-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Bobby Eshleman [Sat, 8 Nov 2025 16:00:57 +0000 (08:00 -0800)] 
selftests/vsock: speed up tests by reducing the QEMU pidfile timeout

Reduce the time waiting for the QEMU pidfile from three minutes to five
seconds. The three minute time window was chosen to make sure QEMU had
enough time to fully boot up. This, however, is an unreasonably long
delay for QEMU to write the pidfile, which happens earlier when the QEMU
process starts (not after VM boot). The three minute delay becomes
noticeably wasteful in future tests that expect QEMU to fail and wait a
full three minutes for a pidfile that will never exist.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-6-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Bobby Eshleman [Sat, 8 Nov 2025 16:00:56 +0000 (08:00 -0800)] 
selftests/vsock: do not unconditionally die if qemu fails

If QEMU fails to boot, then set the return code (via timeout) instead of
unconditionally dying. This is in preparation for tests that expect QEMU
to fail to boot. In that case, we just want to know if the boot failed
or not so we can test the pass/fail criteria, and continue executing the
next test.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-5-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Bobby Eshleman [Sat, 8 Nov 2025 16:00:55 +0000 (08:00 -0800)] 
selftests/vsock: avoid multi-VM pidfile collisions with QEMU

Change QEMU to use generated pidfile names instead of just a single
globally-defined pidfile. This allows multiple QEMU instances to
co-exist with different pidfiles. This is required for future tests that
use multiple VMs to check for CID collisions.

Additionally, this also places the burden of killing the QEMU process
and cleaning up the pidfile on the caller of vm_start(). To help with
this, a function terminate_pidfiles() is introduced that callers use to
perform the cleanup. The terminate_pidfiles() function supports multiple
pidfile removals because future patches will need to process two
pidfiles at a time.

Change QEMU_OPTS to be initialized inside the vm_start(). This allows
the generated pidfile to be passed to the string assignment, and
prepares for future vm-specific options as well (e.g., cid).

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-4-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Bobby Eshleman [Sat, 8 Nov 2025 16:00:54 +0000 (08:00 -0800)] 
selftests/vsock: reuse logic for vsock_test through wrapper functions

Add wrapper functions vm_vsock_test() and host_vsock_test() to invoke
the vsock_test binary. This encapsulates several items of repeat logic,
such as waiting for the server to reach listening state and
enabling/disabling the bash option pipefail to avoid pipe-style logging
from hiding failures.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-3-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Bobby Eshleman [Sat, 8 Nov 2025 16:00:53 +0000 (08:00 -0800)] 
selftests/vsock: make wait_for_listener() work even if pipefail is on

Rewrite wait_for_listener()'s pattern matching to avoid tripping the
if-condition when pipefail is on.

awk doesn't handle SIGPIPE gracefully and exits with a non-zero code, so
grep exiting upon finding a match causes false positives when the
pipefail option is used (grep exits, awk receives SIGPIPE, and awk
complains with a non-zero exit code). Instead, move all of the pattern
matching into awk so that SIGPIPE cannot happen and the correct exit code
is returned.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-2-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Bobby Eshleman [Sat, 8 Nov 2025 16:00:52 +0000 (08:00 -0800)] 
selftests/vsock: improve logging in vmtest.sh

Improve usability of logging functions. Remove the test name prefix from
logging functions so that logging calls can be made deeper into the call
stack without passing down the test name or setting some global. Teach
log function to accept a LOG_PREFIX variable to avoid unnecessary
argument shifting.

Remove log_setup() and instead use log_host(). The host/guest prefixes
are useful to show whether a failure happened on the guest or host side,
but "setup" doesn't really give additional useful information. Since all
log_setup() calls happen on the host, let's just use log_host() instead.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-1-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Johannes Berg [Wed, 12 Nov 2025 08:56:03 +0000 (09:56 +0100)] 
Merge tag 'ath-next-20251111' of git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath

Jeff Johnson says:
==================
ath.git patches for v6.19 (#2)

Just one 2-patch series for this PR.

Once pulled into wireless-next, ath-next will fast-forward, and that
will provide the baseline for merging ath12k-ng into ath-next.
==================

Link: https://patch.msgid.link/15a98cae-0274-45f4-9b8e-be6fa9720884@oss.qualcomm.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
5 weeks ago
Robert Marko [Mon, 10 Nov 2025 12:42:53 +0000 (13:42 +0100)] 
net: sparx5/lan969x: populate netdev of_node

Populate of_node for the port netdevs, to make the individual ports
of_nodes available in sysfs.

Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Link: https://patch.msgid.link/20251110124342.199216-1-robert.marko@sartura.hr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Jakub Kicinski [Wed, 12 Nov 2025 01:53:30 +0000 (17:53 -0800)] 
Merge branch 'net-stmmac-convert-meson8b-to-use-stmmac_get_phy_intf_sel'

Russell King says:

====================
net: stmmac: convert meson8b to use stmmac_get_phy_intf_sel()

This series splits out meson8b from the previous 16 patch series
as that now has r-b tags.

This series converts meson8b to use stmmac_get_phy_intf_sel(). This
driver is not converted to the set_phy_intf_sel() method as it is
unclear whether there are ordering dependencies that would prevent
it. I would appreciate the driver author looking into whether this
conversion is possible.
====================

Link: https://patch.msgid.link/aRH50uVDX4_9O5ZU@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Russell King (Oracle) [Mon, 10 Nov 2025 14:42:53 +0000 (14:42 +0000)] 
net: stmmac: meson8b: use stmmac_get_phy_intf_sel()

Use stmmac_get_phy_intf_sel() to decode the PHY interface mode to the
phy_intf_sel value, validate the result and use that to set the
control register to select the operating mode for the DWMAC core.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vIT6b-0000000DpPX-1LQ0@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Russell King (Oracle) [Mon, 10 Nov 2025 14:42:48 +0000 (14:42 +0000)] 
net: stmmac: meson8b: use phy_intf_sel directly

Rearrange meson_axg_set_phy_mode() to use phy_intf_sel directly,
converting it to the register field for meson8b_dwmac_mask_bits().

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vIT6W-0000000DpPR-0tby@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Russell King (Oracle) [Mon, 10 Nov 2025 14:42:43 +0000 (14:42 +0000)] 
net: stmmac: meson8b: use PHY_INTF_SEL_x

Use PHY_INTF_SEL_x definitions for phy_intf_sel bitfield.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vIT6R-0000000DpPL-0Nli@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago
Nathan Chancellor [Mon, 10 Nov 2025 20:55:34 +0000 (13:55 -0700)] 
net: netcp: ethss: Fix type of first parameter in hwtstamp stubs

When building without CONFIG_TI_CPTS, there are a series of errors from
-Wincompatible-pointer-types:

  drivers/net/ethernet/ti/netcp_ethss.c:3831:27: error: initialization of 'int (*)(void *, struct kernel_hwtstamp_config *)' from incompatible pointer type 'int (*)(struct gbe_intf *, struct kernel_hwtstamp_config *)' [-Wincompatible-pointer-types]
   3831 |         .hwtstamp_get   = gbe_hwtstamp_get,
        |                           ^~~~~~~~~~~~~~~~
  drivers/net/ethernet/ti/netcp_ethss.c:3831:27: note: (near initialization for 'gbe_module.hwtstamp_get')
  drivers/net/ethernet/ti/netcp_ethss.c:2758:19: note: 'gbe_hwtstamp_get' declared here
   2758 | static inline int gbe_hwtstamp_get(struct gbe_intf *gbe_intf,
        |                   ^~~~~~~~~~~~~~~~
  drivers/net/ethernet/ti/netcp_ethss.c:3832:27: error: initialization of 'int (*)(void *, struct kernel_hwtstamp_config *, struct netlink_ext_ack *)' from incompatible pointer type 'int (*)(struct gbe_intf *, struct kernel_hwtstamp_config *, struct netlink_ext_ack *)' [-Wincompatible-pointer-types]
   3832 |         .hwtstamp_set   = gbe_hwtstamp_set,
        |                           ^~~~~~~~~~~~~~~~
  drivers/net/ethernet/ti/netcp_ethss.c:3832:27: note: (near initialization for 'gbe_module.hwtstamp_set')
  drivers/net/ethernet/ti/netcp_ethss.c:2764:19: note: 'gbe_hwtstamp_set' declared here
   2764 | static inline int gbe_hwtstamp_set(struct gbe_intf *gbe_intf,
        |                   ^~~~~~~~~~~~~~~~

In a recent conversion to ndo_hwtstamp, the type of the first parameter
was updated for the CONFIG_TI_CPTS=y implementations of
gbe_hwtstamp_get() and gbe_hwtstamp_set() but not the CONFIG_TI_CPTS=n
ones.

Update the type of the first parameter in the CONFIG_TI_CPTS=n stubs to
resolve the errors.

Fixes: 3f02b8272557 ("ti: netcp: convert to ndo_hwtstamp callbacks")
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20251110-netcp_ethss-fix-cpts-stubs-clang-wifpts-v2-1-aa6204ec1f43@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
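
The mismatch can be reproduced outside the kernel with a minimal sketch. The sketch_* names are hypothetical, the two kernel structures are left opaque, and the -EOPNOTSUPP return value of the stubs is an assumption; only the shape of the callback types (a void * first parameter) is taken from the compiler output above.

```c
#include <errno.h>
#include <stddef.h>

/* Opaque stand-ins for the kernel types named in the compiler output. */
struct kernel_hwtstamp_config;
struct netlink_ext_ack;

/* Hypothetical ops table: the callbacks take the interface as void *. */
struct sketch_module {
	int (*hwtstamp_get)(void *intf_priv,
			    struct kernel_hwtstamp_config *cfg);
	int (*hwtstamp_set)(void *intf_priv,
			    struct kernel_hwtstamp_config *cfg,
			    struct netlink_ext_ack *extack);
};

/*
 * Stubs for a build with the feature disabled. Their first parameter is
 * void *, so their types match the ops table exactly.
 */
static inline int sketch_hwtstamp_get(void *intf_priv,
				      struct kernel_hwtstamp_config *cfg)
{
	(void)intf_priv;
	(void)cfg;
	return -EOPNOTSUPP;	/* assumed return value */
}

static inline int sketch_hwtstamp_set(void *intf_priv,
				      struct kernel_hwtstamp_config *cfg,
				      struct netlink_ext_ack *extack)
{
	(void)intf_priv;
	(void)cfg;
	(void)extack;
	return -EOPNOTSUPP;	/* assumed return value */
}

const struct sketch_module sketch_ops = {
	.hwtstamp_get	= sketch_hwtstamp_get,
	.hwtstamp_set	= sketch_hwtstamp_set,
};
```

Changing the first parameter of either stub to a concrete struct pointer makes the initializer draw the same -Wincompatible-pointer-types diagnostic as quoted in the commit message.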
5 weeks ago
Paolo Abeni [Tue, 11 Nov 2025 12:17:56 +0000 (13:17 +0100)] 
Merge branch 'devlink-eswitch-inactive-mode'

Saeed Mahameed says:

====================
devlink eswitch inactive mode

Before having traffic flow through an eswitch, a user may want to have the
ability to block traffic towards the FDB until FDB is fully programmed and the
user is ready to send traffic to it. For example: when two eswitches are present
for vports in a multi-PF setup, one eswitch may take over the traffic from the
other when the user chooses. Before this takeover, a user may want to first
program the inactive eswitch and then once ready redirect traffic to this new
eswitch.

This series introduces a user-configurable mode for an eswitch that allows
dynamically switching between active and inactive modes. When inactive, traffic
does not flow through the eswitch. While inactive, steering pipeline
configuration can be done (e.g. adding TC rules, discovering representors,
enabling the desired SDN modes such as bridge/OVS/DPDK/etc). Once configuration
is completed, a user can set the eswitch mode to active and have traffic flow
through. This allows admins to upgrade forwarding pipeline rules with very
minimal downtime and packet drops.

A user can start the eswitch in switchdev or switchdev_inactive mode.

Active: Traffic is enabled on this eswitch FDB.
Inactive: Traffic is ignored/dropped on this eswitch FDB.

An example use case:
$ devlink dev eswitch set pci/0000:08:00.1 mode switchdev_inactive
Setup FDB pipeline and netdev representors
...
Once ready to start receiving traffic
$ devlink dev eswitch set pci/0000:08:00.1 mode switchdev

v2: https://lore.kernel.org/all/20251107000831.157375-1-saeed@kernel.org/
v1: https://lore.kernel.org/all/20251016013618.2030940-1-saeed@kernel.org/
====================

Link: https://patch.msgid.link/20251108070404.1551708-1-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet/mlx5: E-Switch, support eswitch inactive mode
Saeed Mahameed [Sat, 8 Nov 2025 07:04:04 +0000 (23:04 -0800)] 
net/mlx5: E-Switch, support eswitch inactive mode

Add support for eswitch switchdev inactive mode

Inactive mode: Drop all traffic going to the FDB, remove
MPFS L2 rules, and disconnect adjacent vports.

Active mode: Traffic flows through FDB, mpfs table populated, and
adjacent vports are connected.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Adithya Jayachandran <ajayachandra@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20251108070404.1551708-4-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet/mlx5: MPFS, add support for dynamic enable/disable
Saeed Mahameed [Sat, 8 Nov 2025 07:04:03 +0000 (23:04 -0800)] 
net/mlx5: MPFS, add support for dynamic enable/disable

MPFS (Multi PF Switch) is enabled by default in Multi-Host environments.
The driver keeps a list of the desired unicast MAC addresses of all vports
(VFs/SFs) and applies it to HW via the L2_table FW command.

Add an API to dynamically apply the list of MACs to HW when needed. The
next patches use this new API in the devlink eswitch active/inactive uAPI.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Adithya Jayachandran <ajayachandra@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20251108070404.1551708-3-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agodevlink: Introduce switchdev_inactive eswitch mode
Saeed Mahameed [Sat, 8 Nov 2025 07:04:02 +0000 (23:04 -0800)] 
devlink: Introduce switchdev_inactive eswitch mode

Adds DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE attribute to UAPI and
documentation.

Before having traffic flow through an eswitch, a user may want to have the
ability to block traffic towards the FDB until FDB is fully programmed and
the user is ready to send traffic to it. For example: when two eswitches
are present for vports in a multi-PF setup, one eswitch may take over the
traffic from the other when the user chooses.
Before this take over, a user may want to first program the inactive
eswitch and then once ready redirect traffic to this new eswitch.

switchdev modes transition semantics:

legacy->switchdev_inactive: Create switchdev mode normally, traffic not
  allowed to flow yet.

switchdev_inactive->switchdev: Enable traffic to flow.

switchdev->switchdev_inactive: Block traffic on the FDB; the state and
  content of the FDB and representors are preserved.

When eswitch is configured to this mode, traffic is ignored/dropped on
this eswitch FDB while the current configuration is kept, e.g. FDB rules
and netdev representors remain available and FDB programming is allowed.
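
The transition semantics above can be sketched as a toy state model. This
is only an illustration of the described behaviour, not the devlink or mlx5
implementation: the class ToyEswitch, its methods, and the choice to discard
rules when returning to legacy are all hypothetical.

```python
from enum import Enum


class EswitchMode(Enum):
    LEGACY = "legacy"
    SWITCHDEV = "switchdev"
    SWITCHDEV_INACTIVE = "switchdev_inactive"


class ToyEswitch:
    """Hypothetical model: tracks the mode and the programmed FDB rules."""

    def __init__(self):
        self.mode = EswitchMode.LEGACY
        self.fdb_rules = []

    @property
    def fdb_traffic_enabled(self):
        # Only the active switchdev mode lets traffic flow through the FDB.
        return self.mode is EswitchMode.SWITCHDEV

    def add_fdb_rule(self, rule):
        # FDB programming stays allowed while the eswitch is inactive.
        self.fdb_rules.append(rule)

    def set_mode(self, new_mode):
        # Flipping between switchdev and switchdev_inactive keeps the rules.
        # Dropping them on a return to legacy is an assumption of this toy.
        if new_mode is EswitchMode.LEGACY:
            self.fdb_rules.clear()
        self.mode = new_mode
```

In this model, a rule added while inactive is still present after activating
and deactivating again, which mirrors the "state and content is preserved"
wording.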

Example:
 # start inactive switchdev
devlink dev eswitch set pci/0000:08:00.1 mode switchdev_inactive
 # setup TC rules, representors etc ..
 # activate
devlink dev eswitch set pci/0000:08:00.1 mode switchdev

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20251108070404.1551708-2-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agoMerge branch 'tools-ynl-turn-the-page-pool-sample-into-a-real-tool'
Paolo Abeni [Tue, 11 Nov 2025 11:21:06 +0000 (12:21 +0100)] 
Merge branch 'tools-ynl-turn-the-page-pool-sample-into-a-real-tool'

Jakub Kicinski says:

====================
tools: ynl: turn the page-pool sample into a real tool

The page-pool YNL sample is quite useful. It helps calculate
recycling rate and memory consumption. Since we still haven't
figured out a way to integrate with iproute2 (not for the lack
of thinking how to solve it) - create a ynltool command in ynl.

Add page-pool and qstats support.

Most commands can use the Python YNL CLI directly but low level
stats often need aggregation or some math on top to be useful.
Specifically in this patch set:
 - page pool stats are aggregated and recycling rate computed
 - per-queue stats are used to compute traffic balance across queues

v1: https://lore.kernel.org/20251104232348.1954349-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20251107162227.980672-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agotools: ynltool: add traffic distribution balance
Jakub Kicinski [Fri, 7 Nov 2025 16:22:27 +0000 (08:22 -0800)] 
tools: ynltool: add traffic distribution balance

The main if not only use case for per-queue stats today is checking
for traffic imbalance. Add simple traffic balance analysis to qstats.

 $ ynltool qstat balance
 eth0 rx 44 queues:
  rx-packets  : cv=6.9% ns=24.2% stddev=512006493
                min=6278921110 max=8011570575 mean=7437054644
  rx-bytes    : cv=6.9% ns=24.1% stddev=759670503060
                min=9326315769440 max=11884393670786 mean=11035439201354
  ...

  $ ynltool -j qstat balance | jq
  [
   {
    "ifname": "eth0",
    "ifindex": 2,
    "queue-type": "rx",
    "rx-packets": {
      "queue-count": 44,
      "min": 6278301665,
      "max": 8010780185,
      "mean": 7.43635E+9,
      "stddev": 5.12012E+8,
      "coefficient-of-variation": 6.88525,
      "normalized-spread": 24.249
    },
   ...
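
The cv figure in the sample output is consistent with stddev / mean
expressed as a percentage. A minimal sketch of that calculation follows; the
function names are hypothetical, and the use of the population standard
deviation is an assumption (the tool may use the sample form). The
normalized-spread (ns) formula is not stated here, so it is not reproduced.

```python
import statistics


def cv_from_summary(stddev, mean):
    """Coefficient of variation in percent from precomputed stddev and mean."""
    return stddev / mean * 100.0


def coefficient_of_variation(per_queue_counts):
    """Coefficient of variation in percent for a list of per-queue counters."""
    mean = statistics.fmean(per_queue_counts)
    if mean == 0:
        return 0.0
    # Assumption: population standard deviation over all queues.
    return statistics.pstdev(per_queue_counts) / mean * 100.0
```

Feeding the rx-packets stddev and mean from the text output into
cv_from_summary() gives a value that rounds to the 6.9% shown.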

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20251107162227.980672-5-kuba@kernel.org
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agotools: ynltool: add qstats support
Jakub Kicinski [Fri, 7 Nov 2025 16:22:26 +0000 (08:22 -0800)] 
tools: ynltool: add qstats support

$ ynltool qstat
  eth0        rx-packets:       493192163        rx-bytes:   1442544543997
              tx-packets:       745999838        tx-bytes:   4574215826482
                 tx-stop:            7033         tx-wake:            7033

  $ ynltool qstat show group-by queue
  eth0  rx-0     packets:        70196880           bytes:    178633973750
  eth0  rx-1     packets:        63623419           bytes:    197274745250
  ...
  eth0  tx-1     packets:        98645810           bytes:    631247647938
                    stop:            1048            wake:            1048
  eth0  tx-2     packets:        86775824           bytes:    563930471952
                    stop:            1126            wake:            1126
  ...

  $ ynltool -j qstat  | jq
  [
   {
    "ifname": "eth0",
    "ifindex": 2,
    "rx": {
      "packets": 493396439,
      "bytes": 1443608198921
    },
    "tx": {
      "packets": 746239978,
      "bytes": 4574333772645,
      "stop": 7072,
      "wake": 7072
    }
   }
  ]

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20251107162227.980672-4-kuba@kernel.org
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agotools: ynltool: add page-pool stats
Jakub Kicinski [Fri, 7 Nov 2025 16:22:25 +0000 (08:22 -0800)] 
tools: ynltool: add page-pool stats

Replace the page-pool sample with page pool support in ynltool.

 # ynltool page-pool stats
    eth0[2] page pools: 18 (zombies: 0)
            refs: 171456 bytes: 702283776 (refs: 0 bytes: 0)
            recycling: 97.3% (alloc: 2679:6134966 recycle: 1250981:4719386)
 # ynltool -j page-pool stats | jq
 [
  {
    "ifname": "eth0",
    "ifindex": 2,
    "page_pools": 18,
    "zombies": 0,
    "live": {
      "refs": 171456,
      "bytes": 702283776
    },
    "zombie": {
      "refs": 0,
      "bytes": 0
    },
    "recycling_pct": 97.2746,
    "alloc": {
      "slow": 2679,
      "fast": 6135029
    },
    "recycle": {
      "ring": 1250997,
      "cache": 4719432
    }
  }
 ]

 # ynltool page-pool stats group-by pp
 pool id: 108  dev: eth0[2]  napi: 530
   inflight: 9472 pages 38797312 bytes
   recycling: 95.5% (alloc: 148:208379 recycle: 45386:153842)
 pool id: 107  dev: eth0[2]  napi: 529
   inflight: 9408 pages 38535168 bytes
   recycling: 94.9% (alloc: 147:180178 recycle: 42251:128808)
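
The recycling percentages in the sample output match the ratio of recycled
pages (ring + cache) to allocated pages (slow + fast). The sketch below is
inferred from those numbers, not taken from the ynltool source, and the
function name is hypothetical.

```python
def recycling_pct(alloc_slow, alloc_fast, recycle_ring, recycle_cache):
    """Share of page allocations that were served by recycling, in percent."""
    allocs = alloc_slow + alloc_fast
    if allocs == 0:
        # Avoid dividing by zero for a pool that has not allocated yet.
        return 0.0
    return (recycle_ring + recycle_cache) / allocs * 100.0
```

The argument order follows the text output, where "alloc: A:B" reads as
slow:fast and "recycle: C:D" reads as ring:cache according to the JSON keys.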

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20251107162227.980672-3-kuba@kernel.org
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agotools: ynltool: create skeleton for the C command
Jakub Kicinski [Fri, 7 Nov 2025 16:22:24 +0000 (08:22 -0800)] 
tools: ynltool: create skeleton for the C command

Based on past discussions it seems like integration of YNL into
iproute2 is unlikely. YNL itself is not great as a C library,
since it has no backward compat (we routinely change types).

Most of the operations can be performed with the generic Python
CLI directly. There is, however, a handful of operations where
summarization of kernel output is very useful (mostly related
to stats: page-pool, qstat).

Create a command (inspired by bpftool, I think it stood the test
of time reasonably well) to be able to plug the subcommands into.

Link: https://lore.kernel.org/1754895902-8790-1-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20251107162227.980672-2-kuba@kernel.org
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agowifi: cw1200: Fix potential memory leak in cw1200_bh_rx_helper()
Abdun Nihaal [Mon, 10 Nov 2025 17:53:15 +0000 (23:23 +0530)] 
wifi: cw1200: Fix potential memory leak in cw1200_bh_rx_helper()

In one of the error paths, the memory allocated for skb_rx is not freed.
Fix that by freeing it before returning.

Fixes: a910e4a94f69 ("cw1200: add driver for the ST-E CW1100 & CW1200 WLAN chipsets")
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Link: https://patch.msgid.link/20251110175316.106591-1-nihaal@cse.iitm.ac.in
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
5 weeks agowifi: mac80211: make monitor link info check more specific
Benjamin Berg [Mon, 10 Nov 2025 12:20:20 +0000 (14:20 +0200)] 
wifi: mac80211: make monitor link info check more specific

Verify that only one of the permitted change flags is set when changing
the link of a monitor interface. Previously, the WARN_ON_ONCE would accept
anything if mu_mimo_owner was set.

Also, split out the mu_mimo_owner flag and enable it for all interface
types. The option is set during association when VHT is available and it
is not expected that any configuration of the MU groups is done without
it being set.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20251110141948.6696dba8678d.Icafac3be4724825dd6140e4407bae3a2adb593a5@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
5 weeks agowifi: mac80211: track MU-MIMO configuration on disabled interfaces
Benjamin Berg [Mon, 10 Nov 2025 12:18:20 +0000 (14:18 +0200)] 
wifi: mac80211: track MU-MIMO configuration on disabled interfaces

For monitoring, userspace will try to configure the VIF sdata, while the
driver may see the monitor_sdata that is created when only monitor
interfaces are up. This causes the odd situation that it may not be
possible to store the MU-MIMO configuration on monitor_sdata.

Fix this by storing that information on the VIF sdata and updating the
monitor_sdata when available and the interface is up. Also, adjust the
code that adds monitor_sdata so that it will configure MU-MIMO based on
the newly added interface or one of the existing ones.

This should give a mostly consistent behaviour when configuring MU-MIMO
on sniffer interfaces. Should the user configure MU-MIMO on multiple
sniffer interfaces, then mac80211 will simply select one of the
configurations. This behaviour should be good enough and avoids breaking
user expectations in the common scenarios.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20251110141514.677915f8f6bb.If4e04a57052f9ca763562a67248b06fd80d0c2c1@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
5 weeks agowifi: cfg80211/mac80211: Add fallback mechanism for INDOOR_SP connection
Pagadala Yesu Anjaneyulu [Mon, 10 Nov 2025 12:10:30 +0000 (14:10 +0200)] 
wifi: cfg80211/mac80211: Add fallback mechanism for INDOOR_SP connection

Implement fallback to LPI mode when SP mode is not permitted
by regulatory constraints for INDOOR_SP connections.
Limit fallback mechanism to client mode.

Signed-off-by: Pagadala Yesu Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20251110140806.8b43201a34ae.I37fc7bb5892eb9d044d619802e8f2095fde6b296@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
5 weeks agowifi: cfg80211/mac80211: clean up duplicate ap_power handling
Pagadala Yesu Anjaneyulu [Mon, 10 Nov 2025 12:10:29 +0000 (14:10 +0200)] 
wifi: cfg80211/mac80211: clean up duplicate ap_power handling

Move duplicated ap_power type handling code to an inline
function in cfg80211.

Signed-off-by: Pagadala Yesu Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20251110140806.959948da1cb5.I893b5168329fb3232f249c182a35c99804112da6@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
5 weeks agowifi: cfg80211: use a C99 initializer in wiphy_register
Emmanuel Grumbach [Mon, 10 Nov 2025 12:03:15 +0000 (14:03 +0200)] 
wifi: cfg80211: use a C99 initializer in wiphy_register

struct regulatory_request was not fully initialized. While this is not
really a big deal because nl80211_send_reg_change_event doesn't look at
the other fields, it still makes sense to zero all the other fields as
Coverity suggests.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20251110140230.f8d4fcb1328b.I87170b1caef04356809838e684c9499f5806e624@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
5 weeks agoxsk: add indirect call for xsk_destruct_skb
Jason Xing [Fri, 31 Oct 2025 10:33:28 +0000 (18:33 +0800)] 
xsk: add indirect call for xsk_destruct_skb

Eric proposed adding indirect call wrappers for UDP and saw a huge
improvement [1]. The same approach can be applied to the xsk scenario.

This patch adds an indirect call for xsk and improves copy-mode
performance by a stable ~1%, observed with IXGBE loaded at 10Gb/sec.
The positive effect grows with throughput. Applied on top of the batch
xmit series [2], I saw a <5% improvement from our internal application,
although that measurement is a little unstable.

Use INDIRECT wrappers to keep xsk_destruct_skb static as it used to
be when the mitigation config is off.

Note that the freeing path can be very hot: it runs around 2,000,000
times per second with the xdpsock test.

[1]: https://lore.kernel.org/netdev/20251006193103.2684156-2-edumazet@google.com/
[2]: https://lore.kernel.org/all/20251021131209.41491-1-kerneljasonxing@gmail.com/

Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20251031103328.95468-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet: dsa: loop: use new helper fixed_phy_register_100fd to simplify the code
Heiner Kallweit [Sat, 8 Nov 2025 21:59:51 +0000 (22:59 +0100)] 
net: dsa: loop: use new helper fixed_phy_register_100fd to simplify the code

Use new helper fixed_phy_register_100fd to simplify the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/922f1b45-1748-4dd2-87eb-9d018df44731@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoisdn: kcapi: add WQ_PERCPU to alloc_workqueue users
Marco Crivellari [Fri, 7 Nov 2025 13:44:52 +0000 (14:44 +0100)] 
isdn: kcapi: add WQ_PERCPU to alloc_workqueue users

Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251107134452.198378-1-marco.crivellari@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'gve-improve-rx-buffer-length-management'
Jakub Kicinski [Tue, 11 Nov 2025 01:36:40 +0000 (17:36 -0800)] 
Merge branch 'gve-improve-rx-buffer-length-management'

Ankit Garg says:

====================
gve: Improve RX buffer length management

This patch series improves the management of the RX buffer length for
the DQO queue format in the gve driver. The goal is to make RX buffer
length config more explicit, easy to change, and performant by default.

We accomplish that in four patches:

1.  Currently, the buffer length is implicitly coupled with the header
    split setting, which is an unintuitive and restrictive design. The
    first patch decouples the RX buffer length from the header split
    configuration.

2.  The second patch is a preparatory step for the third. It converts the
    XDP config verification method to use extack for better error reporting.

3.  The third patch exposes the `rx_buf_len` parameter to userspace via
    ethtool, allowing the user to directly view or modify the RX buffer
    length if supported by the device.

4.  The final patch improves the out-of-the-box RX single stream throughput
    by >10% by changing the driver's default behavior to select the
    maximum supported RX buffer length advertised by the device during
    initialization.
====================

Link: https://patch.msgid.link/20251106192746.243525-1-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agogve: Default to max_rx_buffer_size for DQO if device supported
Ankit Garg [Thu, 6 Nov 2025 19:27:46 +0000 (11:27 -0800)] 
gve: Default to max_rx_buffer_size for DQO if device supported

Change the driver's default behavior to prefer the largest available RX
buffer length supported by the device for DQO format, rather than always
using the hardcoded 2K default.

Previously, the driver would initialize with
`GVE_DEFAULT_RX_BUFFER_SIZE` (2K), even if the device advertised support
for a larger length (e.g., 4K).

Performance observations:
- With LRO disabled, we observed >10% improvement in RX single stream
throughput when MTU >=2048.
- With LRO enabled, we observed >10% improvement in RX single stream
throughput when MTU >=1460.
- No regressions were observed.

Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Jordan Rhee <jordanrhee@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20251106192746.243525-5-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agogve: Allow ethtool to configure rx_buf_len
Ankit Garg [Thu, 6 Nov 2025 19:27:45 +0000 (11:27 -0800)] 
gve: Allow ethtool to configure rx_buf_len

Add support for getting and setting the RX buffer length via the
ethtool ring parameters (`ethtool -g`/`-G`). The driver restricts the
allowed buffer length to 2048 (SZ_2K) by default and allows 4096 (SZ_4K)
based on device options.

As XDP is only supported when the `rx_buf_len` is 2048, the driver now
enforces this in two places:
1.  In `gve_xdp_set`, rejecting XDP programs if the current buffer
    length is not 2048.
2.  In `gve_set_rx_buf_len_config`, rejecting buffer length changes if XDP
    is loaded and the new length is not 2048.
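
The two checks above can be sketched as follows. This is a simplified
illustration, not the gve driver code: the helper names are hypothetical and
"device_supports_4k" stands in for whatever device option gates the 4096
length.

```python
SZ_2K = 2048
SZ_4K = 4096


def allowed_rx_buf_lens(device_supports_4k):
    """2048 is always allowed; 4096 only when the device option permits it."""
    return {SZ_2K, SZ_4K} if device_supports_4k else {SZ_2K}


def can_attach_xdp(current_rx_buf_len):
    """Mirror of check 1: XDP requires the 2048-byte buffer length."""
    return current_rx_buf_len == SZ_2K


def can_set_rx_buf_len(new_len, xdp_loaded, device_supports_4k):
    """Mirror of check 2: reject unsupported lengths and non-2048 with XDP."""
    if new_len not in allowed_rx_buf_lens(device_supports_4k):
        return False
    if xdp_loaded and new_len != SZ_2K:
        return False
    return True
```

Checking in both directions keeps the invariant regardless of whether the
user loads XDP first or changes the buffer length first.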

Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Jordan Rhee <jordanrhee@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20251106192746.243525-4-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agogve: Use extack to log xdp config verification errors
Ankit Garg [Thu, 6 Nov 2025 19:27:44 +0000 (11:27 -0800)] 
gve: Use extack to log xdp config verification errors

Plumb extack, as it allows us to send more detailed error messages back,
and append 'gve' suffix to method name per convention.

NL_SET_ERR_MSG_FMT_MOD doesn't support format strings longer than 80
chars, so keep the netdev warning with the actual queue count details.

Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20251106192746.243525-3-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agogve: Decouple header split from RX buffer length
Ankit Garg [Thu, 6 Nov 2025 19:27:43 +0000 (11:27 -0800)] 
gve: Decouple header split from RX buffer length

Previously, enabling header split via `gve_set_hsplit_config` also
implicitly changed the RX buffer length to 4K (if supported by the
device). This coupled two settings that should be orthogonal; this patch
removes that side effect.

After this change, `gve_set_hsplit_config` only toggles the header
split configuration. The RX buffer length is no longer affected and
must be configured independently.

Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Jordan Rhee <jordanrhee@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20251106192746.243525-2-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'net-stmmac-ingenic-convert-to-set_phy_intf_sel'
Jakub Kicinski [Tue, 11 Nov 2025 01:30:43 +0000 (17:30 -0800)] 
Merge branch 'net-stmmac-ingenic-convert-to-set_phy_intf_sel'

Russell King says:

====================
net: stmmac: ingenic: convert to set_phy_intf_sel()

Convert ingenic to use the new ->set_phy_intf_sel() method that was
recently introduced in net-next.

This is the largest of the conversions, as there is scope for cleanups
along with the conversion.
====================

Link: https://patch.msgid.link/aQ2tgEu-dudzlZlg@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: ingenic: use ->set_phy_intf_sel()
Russell King (Oracle) [Fri, 7 Nov 2025 08:29:26 +0000 (08:29 +0000)] 
net: stmmac: ingenic: use ->set_phy_intf_sel()

Rather than placing the phy_intf_sel() setup in the ->init() method,
move it to the new ->set_phy_intf_sel() method.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vHHqY-0000000Djrn-1D6H@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: ingenic: pass ingenic_mac struct rather than plat_dat
Russell King (Oracle) [Fri, 7 Nov 2025 08:29:21 +0000 (08:29 +0000)] 
net: stmmac: ingenic: pass ingenic_mac struct rather than plat_dat

It no longer makes sense to pass a pointer to struct
plat_stmmacenet_data when calling the set_mode() methods to only use it
to get a pointer to the ingenic_mac structure that we already had in
the caller. Simplify this by passing the struct ingenic_mac pointer.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vHHqT-0000000Djrh-0ka3@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: ingenic: simplify x2000 mac_set_mode()
Russell King (Oracle) [Fri, 7 Nov 2025 08:29:16 +0000 (08:29 +0000)] 
net: stmmac: ingenic: simplify x2000 mac_set_mode()

As per the previous commit, we have validated that the phy_intf_sel
value is one that is permissible for this SoC, so there is no need to
handle invalid PHY interface modes. We can also apply the other
configuration based upon the phy_intf_sel value rather than the
PHY interface mode.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vHHqO-0000000Djrb-0DPN@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: ingenic: simplify mac_set_mode() methods
Russell King (Oracle) [Fri, 7 Nov 2025 08:29:10 +0000 (08:29 +0000)] 
net: stmmac: ingenic: simplify mac_set_mode() methods

x1000, x1600 and x1830 only accept RMII mode. PHY_INTF_SEL_RMII is only
selected with PHY_INTERFACE_MODE_RMII, and PHY_INTF_SEL_RMII has been
validated by the SoC's .valid_phy_intf_sel bitmask. Thus, checking the
interface mode in these functions becomes unnecessary. Remove these.

jz4775 is similar, except for a greater set of PHY_INTF_SEL_x values.
Also remove the switch statement here.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vHHqI-0000000DjrV-3ygL@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: ingenic: move "MAC PHY control register" debug
Russell King (Oracle) [Fri, 7 Nov 2025 08:29:05 +0000 (08:29 +0000)] 
net: stmmac: ingenic: move "MAC PHY control register" debug

Move the printing of the MAC PHY control register interface mode
setting into ingenic_set_phy_intf_sel(), and use phy_modes() to
print the string rather than using the enum name.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vHHqD-0000000DjrP-3aaU@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: ingenic: use stmmac_get_phy_intf_sel()
Russell King (Oracle) [Fri, 7 Nov 2025 08:29:00 +0000 (08:29 +0000)] 
net: stmmac: ingenic: use stmmac_get_phy_intf_sel()

Use stmmac_get_phy_intf_sel() to decode the PHY interface mode to the
phy_intf_sel value, validate the result against the SoC specific
supported phy_intf_sel values, and pass into the SoC specific
set_mode() methods, replacing the local phy_intf_sel variable. This
provides the value for the MACPHYC_PHY_INFT_MASK field.
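
The decode, validate and field-prepare flow described above might look
roughly like this. It is a simplified model, not the driver code: the mode
table covers only a few modes (assuming the dwmac encodings 0 for MII/GMII,
1 for RGMII and 4 for RMII), and the helper names, valid-mask and field-mask
arguments are hypothetical.

```python
# Assumed PHY_INTF_SEL_x encodings for a handful of interface modes.
PHY_INTF_SEL = {"mii": 0, "gmii": 0, "rgmii": 1, "rmii": 4}


def get_phy_intf_sel(mode):
    """Decode an interface mode name; None means it has no encoding here."""
    return PHY_INTF_SEL.get(mode)


def field_prep(mask, value):
    """Shift value into the bitfield described by mask, like FIELD_PREP()."""
    shift = (mask & -mask).bit_length() - 1
    return (value << shift) & mask


def set_phy_intf_sel(mode, valid_sel_mask, field_mask):
    """Return the register bits for mode, or raise if the SoC rejects it."""
    sel = get_phy_intf_sel(mode)
    # valid_sel_mask has bit N set when phy_intf_sel value N is supported.
    if sel is None or not (valid_sel_mask >> sel) & 1:
        raise ValueError(f"unsupported interface mode: {mode}")
    return field_prep(field_mask, sel)
```

Validating once against a per-SoC bitmask is what lets the individual
set_mode() methods drop their own interface-mode checks.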

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vHHq8-0000000DjrJ-2NRK@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: ingenic: prep PHY_INTF_SEL_x field after switch()
Russell King (Oracle) [Fri, 7 Nov 2025 08:28:55 +0000 (08:28 +0000)] 
net: stmmac: ingenic: prep PHY_INTF_SEL_x field after switch()

Move the preparation of the PHY_INTF_SEL_x bitfield out of the switch()
statement such that it only appears once.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vHHq3-0000000DjrD-1u8O@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: ingenic: use PHY_INTF_SEL_x directly
Russell King (Oracle) [Fri, 7 Nov 2025 08:28:50 +0000 (08:28 +0000)] 
net: stmmac: ingenic: use PHY_INTF_SEL_x directly

Use the PHY_INTF_SEL_x values directly in each of the mac_set_mode
methods rather than the driver private MACPHYC_PHY_INFT_x definitions.
Remove the MACPHYC_PHY_INFT_x definitions.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vHHpy-0000000Djr7-1R1m@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: ingenic: use PHY_INTF_SEL_x to select PHY interface
Russell King (Oracle) [Fri, 7 Nov 2025 08:28:45 +0000 (08:28 +0000)] 
net: stmmac: ingenic: use PHY_INTF_SEL_x to select PHY interface

Use the common dwmac definitions for the PHY interface selection field.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vHHpt-0000000Djr1-0wwr@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: ingenic: simplify jz4775 mac_set_mode()
Russell King (Oracle) [Fri, 7 Nov 2025 08:28:40 +0000 (08:28 +0000)] 
net: stmmac: ingenic: simplify jz4775 mac_set_mode()

All paths configure the transmit clock as an input. Move this out of
the switch() statement to simplify the code.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vHHpo-0000000Djqv-0RD4@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: ingenic: move ingenic_mac_init()
Russell King (Oracle) [Fri, 7 Nov 2025 08:28:34 +0000 (08:28 +0000)] 
net: stmmac: ingenic: move ingenic_mac_init()

Move ingenic_mac_init() to between variant specific set_mode()
implementations and ingenic_mac_probe(). No code changes.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vHHpi-0000000Djqp-4910@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agousbnet: Add support for Byte Queue Limits (BQL)
Simon Schippers [Thu, 6 Nov 2025 17:56:15 +0000 (18:56 +0100)] 
usbnet: Add support for Byte Queue Limits (BQL)

In the current implementation, usbnet uses a fixed tx_qlen of:

USB2: 60 * 1518 bytes = 91.08 KB
USB3: 60 * 5 * 1518 bytes = 455.40 KB

Such large transmit queues can be problematic, especially for cellular
modems. For example, with a typical cellular link speed of 10 Mbit/s, a
fully occupied USB3 transmit queue results in:

455.40 KB / (10 Mbit/s / 8 bit/byte) = 364.32 ms

of additional latency.
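
The queue size and drain time arithmetic above can be reproduced with a
small helper. This is a stand-alone sketch, not code from the usbnet
driver; the function names are made up for illustration, and the frame
count and frame size are passed in by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes held by a fixed-length transmit queue of `frames` frames. */
static uint64_t tx_queue_bytes(uint64_t frames, uint64_t frame_bytes)
{
	return frames * frame_bytes;
}

/*
 * Time in microseconds needed to drain `bytes` over a link running at
 * `link_bps` bits per second. This is the extra latency seen by a packet
 * that is enqueued behind a completely full queue.
 */
static uint64_t drain_time_us(uint64_t bytes, uint64_t link_bps)
{
	assert(link_bps > 0);
	return bytes * 8 * 1000000 / link_bps;
}
```

Calling drain_time_us(tx_queue_bytes(60 * 5, 1518), 10000000) gives the
USB3 case at 10 Mbit/s in microseconds.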

This patch adds support for Byte Queue Limits (BQL) [1] to dynamically
manage the transmit queue size and reduce latency without sacrificing
throughput.

Testing was performed on various devices using the usbnet driver for
packet transmission:

- DELOCK 66045: USB3 to 2.5 GbE adapter (ax88179_178a)
- DELOCK 61969: USB2 to 1 GbE adapter (asix)
- Quectel RM520: 5G modem (qmi_wwan)
- USB2 Android tethering (cdc_ncm)

No performance degradation was observed for iperf3 TCP or UDP traffic,
while latency for a prioritized ping application was significantly
reduced. For example, using the USB3 to 2.5 GbE adapter, which was fully
utilized by iperf3 UDP traffic, the prioritized ping was improved from
1.6 ms to 0.6 ms. With the same setup but with a 100 Mbit/s Ethernet
connection, the prioritized ping was improved from 35 ms to 5 ms.

[1] https://lwn.net/Articles/469652/

Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251106175615.26948-1-simon.schippers@tu-dortmund.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agotg3: Fix num of RX queues being reported by ethtool
Breno Leitao [Fri, 7 Nov 2025 10:36:59 +0000 (02:36 -0800)] 
tg3: Fix num of RX queues being reported by ethtool

Using num_online_cpus() to report number of queues is actually not
correct, as reported by Michael[1].

netif_get_num_default_rss_queues() was used to replace num_online_cpus()
in the past, but tg3 ethtool callbacks didn't get converted. Doing it
now.

Link: https://lore.kernel.org/all/CACKFLim7ruspmqvjr6bNRq5Z_XXVk3vVaLZOons7kMCzsEG23A@mail.gmail.com/#t
Signed-off-by: Breno Leitao <leitao@debian.org>
Suggested-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20251107-tg3_counts-v1-1-337fe5c8ccb7@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'net-dsa-b53-add-support-for-bcm5389-97-98-and-bcm63xx-arl-formats'
Jakub Kicinski [Tue, 11 Nov 2025 01:11:09 +0000 (17:11 -0800)] 
Merge branch 'net-dsa-b53-add-support-for-bcm5389-97-98-and-bcm63xx-arl-formats'

Jonas Gorski says:

====================
net: dsa: b53: add support for BCM5389/97/98 and BCM63XX ARL formats

Currently b53 assumes that all switches apart from BCM5325/5365 use the
same ARL formats, but there are actually multiple formats in use.

Older switches use a format apparently introduced with BCM5387/BCM5389,
while newer chips use a format apparently introduced with BCM5395.

Note that these numbers are not linear: BCM5397/BCM5398 use the older
format.

In addition to that the switches integrated into BCM63XX SoCs use their
own format. While the normal read/write ARL entries use the same format
as BCM5389, the search format is different.

So in order to support all these different formats, split all code
accessing these entries into chip-family specific functions, and collect
them in appropriate arl ops structs to keep the code cleaner.

Sent as net-next since the ARL accesses have never worked before, and
the extensive refactoring might be too much to warrant a fix.
====================

Link: https://patch.msgid.link/20251107080749.26936-1-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: dsa: b53: add support for bcm63xx ARL entry format
Jonas Gorski [Fri, 7 Nov 2025 08:07:49 +0000 (09:07 +0100)] 
net: dsa: b53: add support for bcm63xx ARL entry format

The ARL registers of BCM63XX embedded switches are somewhat unique. The
normal ARL table access registers have the same format as BCM5389, but
the ARL search registers differ:

* SRCH_CTL is at the same offset as on BCM5389, but 16 bits wide. It does
  not have more fields, it just needs to be accessed by a 16 bit read.
* SRCH_RSLT_MACVID and SRCH_RSLT are aligned to 32 bit, and have shifted
  offsets.
* SRCH_RSLT has a different format than the normal ARL data entry
  register.
* There is only one set of ENTRY_N registers, implying a 1 bin layout.

So add appropriate ops for bcm63xx and let it use them.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251107080749.26936-9-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: dsa: b53: add support for 5389/5397/5398 ARL entry format
Jonas Gorski [Fri, 7 Nov 2025 08:07:48 +0000 (09:07 +0100)] 
net: dsa: b53: add support for 5389/5397/5398 ARL entry format

BCM5389, BCM5397 and BCM5398 use a different ARL entry format with just
a 16 bit fwdentry register, as well as different search control and data
offsets.

So add appropriate ops for them and switch those chips to use them.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251107080749.26936-8-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: dsa: b53: move ARL entry functions into ops struct
Jonas Gorski [Fri, 7 Nov 2025 08:07:47 +0000 (09:07 +0100)] 
net: dsa: b53: move ARL entry functions into ops struct

Now that the differences in ARL entry formats are neatly contained in
functions per chip family, wrap them into an ops struct and add wrapper
functions to access them.
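
As a rough illustration of the ops-struct pattern, here is a
self-contained user-space sketch. The toy_* names and both bit layouts
are invented for this example and are not the b53 register formats; the
point is only that callers go through wrapper functions and never need
to know which chip family's format is in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_arl_entry {
	uint16_t vid;	/* 12-bit VLAN ID */
	uint8_t port;	/* 4-bit port number */
};

/* One set of format-specific callbacks per chip family. */
struct toy_arl_ops {
	uint32_t (*encode)(const struct toy_arl_entry *ent);
	void (*decode)(struct toy_arl_entry *ent, uint32_t raw);
};

/* Made-up layout A: port in bits 0-3, VID in bits 4-15. */
static uint32_t toy_a_encode(const struct toy_arl_entry *ent)
{
	return (uint32_t)(ent->port & 0xf) | ((uint32_t)(ent->vid & 0xfff) << 4);
}

static void toy_a_decode(struct toy_arl_entry *ent, uint32_t raw)
{
	ent->port = raw & 0xf;
	ent->vid = (raw >> 4) & 0xfff;
}

/* Made-up layout B: VID in bits 0-11, port in bits 12-15. */
static uint32_t toy_b_encode(const struct toy_arl_entry *ent)
{
	return (uint32_t)(ent->vid & 0xfff) | ((uint32_t)(ent->port & 0xf) << 12);
}

static void toy_b_decode(struct toy_arl_entry *ent, uint32_t raw)
{
	ent->vid = raw & 0xfff;
	ent->port = (raw >> 12) & 0xf;
}

static const struct toy_arl_ops toy_a_ops = {
	.encode = toy_a_encode,
	.decode = toy_a_decode,
};

static const struct toy_arl_ops toy_b_ops = {
	.encode = toy_b_encode,
	.decode = toy_b_decode,
};

struct toy_switch {
	const struct toy_arl_ops *arl_ops;	/* chosen once per chip */
};

/* Wrappers: the only entry points the rest of the code would use. */
static uint32_t toy_arl_encode(const struct toy_switch *sw,
			       const struct toy_arl_entry *ent)
{
	assert(sw != NULL && sw->arl_ops != NULL);
	return sw->arl_ops->encode(ent);
}

static void toy_arl_decode(const struct toy_switch *sw,
			   struct toy_arl_entry *ent, uint32_t raw)
{
	assert(sw != NULL && sw->arl_ops != NULL);
	sw->arl_ops->decode(ent, raw);
}
```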

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251107080749.26936-7-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: dsa: b53: split reading search entry into their own functions
Jonas Gorski [Fri, 7 Nov 2025 08:07:46 +0000 (09:07 +0100)] 
net: dsa: b53: split reading search entry into their own functions

Split reading search entries into a function for each format.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251107080749.26936-6-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: dsa: b53: provide accessors for accessing ARL_SRCH_CTL
Jonas Gorski [Fri, 7 Nov 2025 08:07:45 +0000 (09:07 +0100)] 
net: dsa: b53: provide accessors for accessing ARL_SRCH_CTL

In order to more easily support more formats, move accessing
ARL_SRCH_CTL into helper functions to contain the differences.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251107080749.26936-5-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: dsa: b53: move writing ARL entries into their own functions
Jonas Gorski [Fri, 7 Nov 2025 08:07:44 +0000 (09:07 +0100)] 
net: dsa: b53: move writing ARL entries into their own functions

Move writing ARL entries into individual functions for each format.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251107080749.26936-4-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: dsa: b53: move reading ARL entries into their own function
Jonas Gorski [Fri, 7 Nov 2025 08:07:43 +0000 (09:07 +0100)] 
net: dsa: b53: move reading ARL entries into their own function

Instead of duplicating the whole code iterating over all bins for
BCM5325, factor out reading and parsing the entry into its own
function, and name the modern one after the first chip with that ARL
format, (BCM53)95.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251107080749.26936-3-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: dsa: b53: b53_arl_read{,25}(): use the entry for comparision
Jonas Gorski [Fri, 7 Nov 2025 08:07:42 +0000 (09:07 +0100)] 
net: dsa: b53: b53_arl_read{,25}(): use the entry for comparision

Align the b53_arl_read{,25}() functions by consistently using the
parsed arl entry instead of parsing the raw registers again.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251107080749.26936-2-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf...
Jakub Kicinski [Tue, 11 Nov 2025 00:43:51 +0000 (16:43 -0800)] 
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-11-10

We've added 19 non-merge commits during the last 3 day(s) which contain
a total of 22 files changed, 1345 insertions(+), 197 deletions(-).

The main changes are:

1) Preserve skb metadata after a TC BPF program has changed the skb,
   from Jakub Sitnicki.
   This allows a TC program at the end of a TC filter chain to still see
   the skb metadata, even if another TC program at the front of the chain
   has changed the skb using BPF helpers.

2) Initial af_smc bpf_struct_ops support to control the smc specific
   syn/synack options, from D. Wythe.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  bpf/selftests: Add selftest for bpf_smc_hs_ctrl
  net/smc: bpf: Introduce generic hook for handshake flow
  bpf: Export necessary symbols for modules with struct_ops
  selftests/bpf: Cover skb metadata access after bpf_skb_change_proto
  selftests/bpf: Cover skb metadata access after change_head/tail helper
  selftests/bpf: Cover skb metadata access after bpf_skb_adjust_room
  selftests/bpf: Cover skb metadata access after vlan push/pop helper
  selftests/bpf: Expect unclone to preserve skb metadata
  selftests/bpf: Dump skb metadata on verification failure
  selftests/bpf: Verify skb metadata in BPF instead of userspace
  bpf: Make bpf_skb_change_head helper metadata-safe
  bpf: Make bpf_skb_change_proto helper metadata-safe
  bpf: Make bpf_skb_adjust_room metadata-safe
  bpf: Make bpf_skb_vlan_push helper metadata-safe
  bpf: Make bpf_skb_vlan_pop helper metadata-safe
  vlan: Make vlan_remove_tag return nothing
  bpf: Unclone skb head on bpf_dynptr_write to skb metadata
  net: Preserve metadata on pskb_expand_head
  net: Helper to move packet data and metadata after skb_push/pull
====================

Link: https://patch.msgid.link/20251110232427.3929291-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: ravb: Correct bad check of timestamp control flags
Niklas Söderlund [Fri, 7 Nov 2025 20:01:00 +0000 (21:01 +0100)] 
net: ravb: Correct bad check of timestamp control flags

When converting the Renesas network drivers to use flags from enum
hwtstamp_rx_filters to control when to timestamp packets, instead of a
driver-specific scheme with bit-wise flags, an error was made.

The correct logic to set get_ts with the bit-wise driver-specific flags was:

  q: RAVB_BE + tstamp_rx_ctrl: 0 => 0
  q: RAVB_NC + tstamp_rx_ctrl: 0 => 0
  q: RAVB_BE + tstamp_rx_ctrl: RAVB_RXTSTAMP_TYPE_V2_L2_EVENT => 0
  q: RAVB_NC + tstamp_rx_ctrl: RAVB_RXTSTAMP_TYPE_V2_L2_EVENT => 1
  q: RAVB_BE + tstamp_rx_ctrl: RAVB_RXTSTAMP_TYPE_ALL => 1
  q: RAVB_NC + tstamp_rx_ctrl: RAVB_RXTSTAMP_TYPE_ALL => 1

The converted logic to use enum flags mapped tstamp_rx_ctrl as

  0 to HWTSTAMP_FILTER_NONE
  RAVB_RXTSTAMP_TYPE_V2_L2_EVENT to HWTSTAMP_FILTER_PTP_V2_L2_EVENT
  RAVB_RXTSTAMP_TYPE_ALL to HWTSTAMP_FILTER_ALL

But the logic was incorrectly changed to:

  q: RAVB_BE + tstamp_rx_ctrl: HWTSTAMP_FILTER_NONE => 1 (error)
  q: RAVB_NC + tstamp_rx_ctrl: HWTSTAMP_FILTER_NONE => 0
  q: RAVB_BE + tstamp_rx_ctrl: HWTSTAMP_FILTER_PTP_V2_L2_EVENT => 0
  q: RAVB_NC + tstamp_rx_ctrl: HWTSTAMP_FILTER_PTP_V2_L2_EVENT => 1
  q: RAVB_BE + tstamp_rx_ctrl: HWTSTAMP_FILTER_ALL => 1
  q: RAVB_NC + tstamp_rx_ctrl: HWTSTAMP_FILTER_ALL => 0 (error)

This change restores the converted flag check to the correct logic of
the bit-wise driver specific flags.
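
The intended truth table can be written down as a small stand-alone
function. This is a sketch of the decision only, not the ravb driver
code: the enums below are local stand-ins for the queue indices and for
the three enum hwtstamp_rx_filters values named above, and
want_rx_timestamp() is a made-up name.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the driver's queue indices. */
enum toy_queue { TOY_RAVB_BE, TOY_RAVB_NC };

/* Stand-ins for the three RX filter values discussed above. */
enum toy_rx_filter {
	TOY_FILTER_NONE,
	TOY_FILTER_PTP_V2_L2_EVENT,
	TOY_FILTER_ALL,
};

/* Returns true when a received packet on queue `q` should be timestamped. */
static bool want_rx_timestamp(enum toy_queue q, enum toy_rx_filter filter)
{
	switch (filter) {
	case TOY_FILTER_ALL:
		/* Every packet, regardless of queue. */
		return true;
	case TOY_FILTER_PTP_V2_L2_EVENT:
		/* Only packets arriving on the NC queue. */
		return q == TOY_RAVB_NC;
	case TOY_FILTER_NONE:
		return false;
	default:
		assert(0 && "unexpected filter value");
		return false;
	}
}
```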

Reported-by: Simon Horman <horms@kernel.org>
Closes: https://lore.kernel.org/linux-renesas-soc/aQ4xSv9629XF-Bt3@horms.kernel.org/
Fixes: 16e2e6cf75e6 ("net: ravb: Use common defines for time stamping control")
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20251107200100.3637869-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoptp: ocp: Document sysfs output format for backward compatibility
Zhongqiu Han [Fri, 7 Nov 2025 07:45:33 +0000 (15:45 +0800)] 
ptp: ocp: Document sysfs output format for backward compatibility

Add a comment to ptp_ocp_tty_show() explaining that the sysfs output
intentionally does not include a trailing newline. This is required for
backward compatibility with existing userspace software that reads the
sysfs attribute and uses the value directly as a device path.

A previous attempt to add a newline to align with common kernel
conventions broke userspace applications that were opening device paths
like "/dev/ttyS4\n" instead of "/dev/ttyS4", resulting in ENOENT errors.

This comment prevents future attempts to "fix" this behavior, which would
break existing userspace applications.
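
For new userspace code, a reader that tolerates both forms is cheap to
write. The helper below is a hypothetical sketch (chomp_newline() is not
an existing library function, and it is not taken from any of the
affected applications): it strips a single trailing newline, if there is
one, before the value is used as a device path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Remove one trailing '\n' from a NUL-terminated sysfs value, if present.
 * Returns the length of the resulting string.
 */
static size_t chomp_newline(char *value)
{
	size_t len;

	assert(value != NULL);

	len = strlen(value);
	if (len > 0 && value[len - 1] == '\n')
		value[--len] = '\0';
	return len;
}
```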

Link: https://lore.kernel.org/netdev/20251030124519.1828058-1-zhongqiu.han@oss.qualcomm.com/
Link: https://lore.kernel.org/netdev/aef3b850-5f38-4c28-a018-3b0006dc2f08@linux.dev/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251107074533.416048-1-zhongqiu.han@oss.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agosctp: Don't inherit do_auto_asconf in sctp_clone_sock().
Kuniyuki Iwashima [Thu, 6 Nov 2025 22:34:06 +0000 (22:34 +0000)] 
sctp: Don't inherit do_auto_asconf in sctp_clone_sock().

syzbot reported list_del(&sp->auto_asconf_list) corruption
in sctp_destroy_sock().

The repro calls setsockopt(SCTP_AUTO_ASCONF, 1) on an SCTP
listener, calls accept(), and close()s the child socket.

setsockopt(SCTP_AUTO_ASCONF, 1) sets sp->do_auto_asconf
to 1 and links sp->auto_asconf_list to a per-netns list.

Both fields are placed after sp->pd_lobby in struct sctp_sock,
and sctp_copy_descendant() did not copy the fields before the
cited commit.

Also, sctp_clone_sock() did not set them explicitly.

In addition, sctp_auto_asconf_init() is called from
sctp_sock_migrate(), but it initialises the fields only
conditionally.

The two fields relied on __GFP_ZERO added in sk_alloc(),
but sk_clone() does not use it.

Let's clear newsp->do_auto_asconf in sctp_clone_sock().
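
The failure mode can be modelled in user space. The sketch below is not
SCTP code: struct toy_sock, toy_clone() and toy_destroy() are made-up
stand-ins, and the list helper reports an inconsistency instead of
hitting BUG(). It shows that a byte-wise copy of a linked socket carries
over both the flag and list pointers that refer to the parent's node, so
unlinking the copy fails the list sanity check, while clearing the flag
in the clone avoids the unlink altogether.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void list_add(struct list_node *node, struct list_node *head)
{
	assert(node != NULL && head != NULL);
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/* Returns false, without unlinking, when the neighbours do not point back. */
static bool list_del_checked(struct list_node *node)
{
	if (node->prev->next != node || node->next->prev != node)
		return false;
	node->prev->next = node->next;
	node->next->prev = node->prev;
	return true;
}

struct toy_sock {
	bool do_auto_asconf;
	struct list_node auto_asconf_list;
};

/* Byte-wise copy, optionally followed by clearing the flag in the child. */
static void toy_clone(struct toy_sock *child, const struct toy_sock *parent,
		      bool clear_flag)
{
	memcpy(child, parent, sizeof(*child));
	if (clear_flag)
		child->do_auto_asconf = false;
}

/* Returns false when the unlink would have tripped the list sanity check. */
static bool toy_destroy(struct toy_sock *sk)
{
	if (!sk->do_auto_asconf)
		return true;
	sk->do_auto_asconf = false;
	return list_del_checked(&sk->auto_asconf_list);
}
```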

[0]:
list_del corruption. prev->next should be ffff8880799e9148, but was ffff8880799e8808. (prev=ffff88803347d9f8)
kernel BUG at lib/list_debug.c:64!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 6008 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025
RIP: 0010:__list_del_entry_valid_or_report+0x15a/0x190 lib/list_debug.c:62
Code: e8 7b 26 71 fd 43 80 3c 2c 00 74 08 4c 89 ff e8 7c ee 92 fd 49 8b 17 48 c7 c7 80 0a bf 8b 48 89 de 4c 89 f9 e8 07 c6 94 fc 90 <0f> 0b 4c 89 f7 e8 4c 26 71 fd 43 80 3c 2c 00 74 08 4c 89 ff e8 4d
RSP: 0018:ffffc90003067ad8 EFLAGS: 00010246
RAX: 000000000000006d RBX: ffff8880799e9148 RCX: b056988859ee6e00
RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000
RBP: dffffc0000000000 R08: ffffc90003067807 R09: 1ffff9200060cf00
R10: dffffc0000000000 R11: fffff5200060cf01 R12: 1ffff1100668fb3f
R13: dffffc0000000000 R14: ffff88803347d9f8 R15: ffff88803347d9f8
FS:  00005555823e5500(0000) GS:ffff88812613e000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000200000000480 CR3: 00000000741ce000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 __list_del_entry_valid include/linux/list.h:132 [inline]
 __list_del_entry include/linux/list.h:223 [inline]
 list_del include/linux/list.h:237 [inline]
 sctp_destroy_sock+0xb4/0x370 net/sctp/socket.c:5163
 sk_common_release+0x75/0x310 net/core/sock.c:3961
 sctp_close+0x77e/0x900 net/sctp/socket.c:1550
 inet_release+0x144/0x190 net/ipv4/af_inet.c:437
 __sock_release net/socket.c:662 [inline]
 sock_close+0xc3/0x240 net/socket.c:1455
 __fput+0x44c/0xa70 fs/file_table.c:468
 task_work_run+0x1d4/0x260 kernel/task_work.c:227
 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
 exit_to_user_mode_loop+0xe9/0x130 kernel/entry/common.c:43
 exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
 syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
 do_syscall_64+0x2bd/0xfa0 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: 16942cf4d3e3 ("sctp: Use sk_clone() in sctp_accept().")
Reported-by: syzbot+ba535cb417f106327741@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/690d2185.a70a0220.22f260.000e.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251106223418.1455510-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'net-smc-introduce-smc_hs_ctrl'
Martin KaFai Lau [Mon, 10 Nov 2025 19:10:09 +0000 (11:10 -0800)] 
Merge branch 'net-smc-introduce-smc_hs_ctrl'

D. Wythe says:

====================
net/smc: Introduce smc_hs_ctrl

This patch aims to introduce BPF injection capabilities for SMC and
includes a self-test to ensure code stability.

Since the SMC protocol isn't ideal for every situation, especially
short-lived ones, most applications can't guarantee the absence of
such scenarios. Consequently, applications may need specific strategies
to decide whether to use SMC. For example, an application might limit SMC
usage to certain IP addresses or ports.

To maintain the principle of transparent replacement, we want applications
to remain unaffected even if they need specific SMC strategies. In other
words, they should not require recompilation of their code.

Additionally, we need to ensure the scalability of strategy implementation.
While using socket options or sysctl might be straightforward, it could
complicate future expansions.

Fortunately, BPF addresses these concerns effectively. Users can write
their own strategies in eBPF to determine whether to use SMC, and they can
easily modify those strategies in the future.

This is a rework of the series from [1]. Changes since [1] are limited to
the SMC parts:

1. Rename smc_ops to smc_hs_ctrl and change interface name.
2. Squash SMC patches, removing standalone non-BPF hook capability.
3. Fix typos

[1]: https://lore.kernel.org/bpf/20250123015942.94810-1-alibuda@linux.alibaba.com/#t

v2 -> v1:
  - Removed the fixes patch, which has already been merged on current branch.
  - Fixed compilation warning of smc_call_hsbpf() when CONFIG_SMC_HS_CTRL_BPF
    is not enabled.
  - Changed the default value of CONFIG_SMC_HS_CTRL_BPF to Y.
  - Fixed typos and renamed some variables

v3 -> v2:
  - Removed the libbpf patch, which has already been merged on current branch.
  - Fixed sparse warning of smc_call_hsbpf() and xchg().

v4 -> v3:
   - Rebased on latest bpf-next, updated SMC loopback config from SMC_LO to DIBS_LO
     per upstream changes.

v5 -> v4:
    - Removed the redundant sk parameter from smc_call_hsbpf
    - Reject registration when bpf_link is set, link support will be added in the
      future.
    - Updated selftests with new test helpers.
====================

Link: https://patch.msgid.link/20251107035632.115950-1-alibuda@linux.alibaba.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
5 weeks agobpf/selftests: Add selftest for bpf_smc_hs_ctrl
D. Wythe [Fri, 7 Nov 2025 03:56:32 +0000 (11:56 +0800)] 
bpf/selftests: Add selftest for bpf_smc_hs_ctrl

This test introduces a tiny smc_hs_ctrl for filtering SMC connections
based on IP pairs, and also adds a realistic topology model to verify it.

Also, we can only use SMC loopback under CI test, so an additional
configuration needs to be enabled.

Follow the steps below to run this test.

make -C tools/testing/selftests/bpf
cd tools/testing/selftests/bpf
sudo ./test_progs -t smc

Results show:
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Tested-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://patch.msgid.link/20251107035632.115950-4-alibuda@linux.alibaba.com
5 weeks agonet/smc: bpf: Introduce generic hook for handshake flow
D. Wythe [Fri, 7 Nov 2025 03:56:31 +0000 (11:56 +0800)] 
net/smc: bpf: Introduce generic hook for handshake flow

The introduction of IPPROTO_SMC enables eBPF programs to determine
whether to use SMC based on the context of socket creation, such as
network namespaces, PID and comm name, etc.

As a subsequent enhancement, introduce a new generic hook that
allows decisions on whether to use SMC or not at runtime, based on,
but not limited to, local/remote IP addresses or ports.

Users can now write their own implementation via bpf_struct_ops to choose
whether to use SMC or not before the third step of the TCP handshake is
completed.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Link: https://patch.msgid.link/20251107035632.115950-3-alibuda@linux.alibaba.com
5 weeks agobpf: Export necessary symbols for modules with struct_ops
D. Wythe [Fri, 7 Nov 2025 03:56:30 +0000 (11:56 +0800)] 
bpf: Export necessary symbols for modules with struct_ops

Export three symbols that are needed to implement struct_ops in a
tristate subsystem.

bpf_struct_ops_get() and bpf_struct_ops_put() are used conditionally by
the inline functions bpf_try_module_get() and bpf_module_put() to hold or
release a reference on a struct_ops.

bpf_obj_name_cpy() copies an object name from one buffer to another with
validity checks.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251107035632.115950-2-alibuda@linux.alibaba.com
5 weeks agoMerge branch 'make-tc-bpf-helpers-preserve-skb-metadata'
Martin KaFai Lau [Mon, 10 Nov 2025 18:52:33 +0000 (10:52 -0800)] 
Merge branch 'make-tc-bpf-helpers-preserve-skb-metadata'

Jakub Sitnicki says:

====================
Make TC BPF helpers preserve skb metadata

Changes in v4:
- Fix copy-paste bug in check_metadata() test helper (AI review)
- Add "out of scope" section (at the bottom)
- Link to v3: https://lore.kernel.org/r/20251026-skb-meta-rx-path-v3-0-37cceebb95d3@cloudflare.com

Changes in v3:
- Use the already existing BPF_STREAM_STDERR const in tests (Martin)
- Unclone skb head on bpf_dynptr_write to skb metadata (patch 3) (Martin)
- Swap order of patches 1 & 2 to refer to skb_postpush_data_move() in docs
- Mention in skb_data_move() docs how to move just the metadata
- Note in pskb_expand_head() docs to move metadata after skb_push() (Jakub)
- Link to v2: https://lore.kernel.org/r/20251019-skb-meta-rx-path-v2-0-f9a58f3eb6d6@cloudflare.com

Changes in v2:
- Tweak WARN_ON_ONCE check in skb_data_move() (patch 2)
- Convert all tests to verify skb metadata in BPF (patches 9-10)
- Add test coverage for modified BPF helpers (patches 12-15)
- Link to RFCv1: https://lore.kernel.org/r/20250929-skb-meta-rx-path-v1-0-de700a7ab1cb@cloudflare.com

This patch set continues our work [1] to allow BPF programs and user-space
applications to attach multiple bytes of metadata to packets via the
XDP/skb metadata area.

The focus of this patch set is to ensure that skb metadata remains intact
when packets pass through a chain of TC BPF programs that call helpers
which operate on skb head.

Currently, several helpers that either adjust the skb->data pointer or
reallocate skb->head do not preserve metadata at its expected location,
that is immediately in front of the MAC header. These are:

- bpf_skb_adjust_room
- bpf_skb_change_head
- bpf_skb_change_proto
- bpf_skb_change_tail
- bpf_skb_vlan_pop
- bpf_skb_vlan_push

In TC BPF context, metadata must be moved whenever skb->data changes to
keep the skb->data_meta pointer valid. I don't see any way around
it. Creative ideas how to avoid that would be very welcome.

With that in mind, we can patch the helpers in at least two different ways:

1. Integrate metadata move into header move

   Replace the existing memmove, which follows skb_push/pull, with a helper
   that moves both headers and metadata in a single call. This avoids an
   extra memmove but reduces transparency.

        skb_pull(skb, len);
-       memmove(skb->data, skb->data - len, n);
+       skb_postpull_data_move(skb, len, n);
        skb->mac_header += len;

        skb_push(skb, len);
-       memmove(skb->data, skb->data + len, n);
+       skb_postpush_data_move(skb, len, n);
        skb->mac_header -= len;

2. Move metadata separately

   Add a dedicated metadata move after the header move. This is more
   explicit but costs an additional memmove.

        skb_pull(skb, len);
        memmove(skb->data, skb->data - len, n);
+       skb_metadata_postpull_move(skb, len);
        skb->mac_header += len;

        skb_push(skb, len);
+       skb_metadata_postpush_move(skb, len);
        memmove(skb->data, skb->data + len, n);
        skb->mac_header -= len;

This patch set implements option (1), expecting that "you can have just one
memmove" will be the most obvious feedback, while readability is a,
somewhat subjective, matter of taste, which I don't claim to have ;-)
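
The single-memmove idea behind option (1) can be modelled on a plain
byte buffer. This is a user-space sketch under stated assumptions, not
the kernel helpers: the toy_* functions take an explicit meta_len and a
raw data pointer instead of an skb, and they assume the metadata sits
immediately in front of the headers before the move.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * After a pull: `data` is the packet start once it has advanced by `len`.
 * Old layout: [meta_len bytes metadata][n bytes headers][len bytes removed].
 * One memmove shifts metadata and headers forward by `len` together, so the
 * metadata again ends right where the packet data begins.
 */
static void toy_postpull_data_move(unsigned char *data, size_t len,
				   size_t meta_len, size_t n)
{
	assert(data != NULL);
	memmove(data - meta_len, data - len - meta_len, meta_len + n);
}

/*
 * After a push: `data` is the packet start once it has moved back by `len`.
 * One memmove shifts metadata and headers backward by `len` together,
 * opening a `len` byte gap behind the `n` header bytes.
 */
static void toy_postpush_data_move(unsigned char *data, size_t len,
				   size_t meta_len, size_t n)
{
	assert(data != NULL);
	memmove(data - meta_len, data + len - meta_len, meta_len + n);
}
```

With a buffer holding headroom "..", metadata "ab", headers "1234", a
two-byte tag "TT" and payload "xy", pulling two bytes and calling
toy_postpull_data_move() leaves "ab" directly in front of "1234xy".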

The structure of the patch set is as follows:

- patches 1-4 prepare ground for safe-proofing the BPF helpers
- patches 5-9 modify the BPF helpers to preserve skb metadata
- patches 10-11 prepare ground for metadata tests with BPF helper calls
- patches 12-16 adapt and expand tests to cover the made changes

Out of scope for this series:
- safe-proofing tunnel & tagging devices - VLAN, GRE, ...
  (next in line, in development preview at [2])
- metadata access after packet forward
  (to do after Rx path - once metadata reliably reaches sk_filter)

Thanks,
-jkbs

[1] https://lore.kernel.org/all/20250814-skb-metadata-thru-dynptr-v7-0-8a39e636e0fb@cloudflare.com/
[2] https://github.com/jsitnicki/linux/commits/skb-meta/safeproof-netdevs/
====================

Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-0-5ceb08a9b37b@cloudflare.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
5 weeks agoselftests/bpf: Cover skb metadata access after bpf_skb_change_proto
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:53 +0000 (21:19 +0100)] 
selftests/bpf: Cover skb metadata access after bpf_skb_change_proto

Add a test to verify that skb metadata remains accessible after calling
bpf_skb_change_proto(), which modifies packet headroom to accommodate
different IP header sizes.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-16-5ceb08a9b37b@cloudflare.com
5 weeks agoselftests/bpf: Cover skb metadata access after change_head/tail helper
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:52 +0000 (21:19 +0100)] 
selftests/bpf: Cover skb metadata access after change_head/tail helper

Add a test to verify that skb metadata remains accessible after calling
bpf_skb_change_head() and bpf_skb_change_tail(), which modify packet
headroom/tailroom and can trigger head reallocation.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-15-5ceb08a9b37b@cloudflare.com
5 weeks agoselftests/bpf: Cover skb metadata access after bpf_skb_adjust_room
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:51 +0000 (21:19 +0100)] 
selftests/bpf: Cover skb metadata access after bpf_skb_adjust_room

Add a test to verify that skb metadata remains accessible after calling
bpf_skb_adjust_room(), which modifies the packet headroom and can trigger
head reallocation.

The helper expects an Ethernet frame carrying an IP packet, so switch test
packet identification to the source MAC address, since we can no longer
rely on the Ethernet proto being set to zero.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-14-5ceb08a9b37b@cloudflare.com
5 weeks agoselftests/bpf: Cover skb metadata access after vlan push/pop helper
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:50 +0000 (21:19 +0100)] 
selftests/bpf: Cover skb metadata access after vlan push/pop helper

Add a test to verify that skb metadata remains accessible after calling
bpf_skb_vlan_push() and bpf_skb_vlan_pop(), which modify the packet
headroom.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-13-5ceb08a9b37b@cloudflare.com
5 weeks agoselftests/bpf: Expect unclone to preserve skb metadata
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:49 +0000 (21:19 +0100)] 
selftests/bpf: Expect unclone to preserve skb metadata

Since pskb_expand_head() no longer clears metadata on unclone, update tests
for cloned packets to expect metadata to remain intact.

Also simplify the clone_dynptr_kept_on_{data,meta}_slice_write tests.
Creating an r/w dynptr slice is sufficient to trigger an unclone in the
prologue, so remove the extraneous writes to the data/meta slice.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-12-5ceb08a9b37b@cloudflare.com
5 weeks agoselftests/bpf: Dump skb metadata on verification failure
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:48 +0000 (21:19 +0100)] 
selftests/bpf: Dump skb metadata on verification failure

Add diagnostic output when metadata verification fails to help with
troubleshooting test failures. Introduce a check_metadata() helper that
prints both expected and received metadata to the BPF program's stderr
stream on mismatch. The userspace test reads and dumps this stream on
failure.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-11-5ceb08a9b37b@cloudflare.com
5 weeks agoselftests/bpf: Verify skb metadata in BPF instead of userspace
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:47 +0000 (21:19 +0100)] 
selftests/bpf: Verify skb metadata in BPF instead of userspace

Move metadata verification into the BPF TC programs. Previously,
userspace read metadata from a map and verified it once at test end.

Now TC programs compare metadata directly using __builtin_memcmp() and
set a test_pass flag. This enables verification at multiple points during
test execution rather than a single final check.
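
The shape of such an in-program check can be sketched in plain C. This
is not the selftest code: the expected pattern, TOY_META_LEN and the
toy_* function names are assumptions made for illustration, and only
__builtin_memcmp() and the test_pass flag are taken from the description
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_META_LEN 8

/* Hypothetical pattern that the test would have stored as metadata. */
static const unsigned char toy_expected_meta[TOY_META_LEN] = {
	0xde, 0xad, 0xbe, 0xef, 0x01, 0x02, 0x03, 0x04,
};

/* Set by the program itself, read by the test harness afterwards. */
static bool test_pass;

static bool toy_check_metadata(const unsigned char *meta, size_t len)
{
	assert(meta != NULL);
	return len == TOY_META_LEN &&
	       __builtin_memcmp(meta, toy_expected_meta, TOY_META_LEN) == 0;
}

/* Stand-in for a TC program body: verify in place and record the result. */
static void toy_tc_prog(const unsigned char *meta, size_t len)
{
	if (toy_check_metadata(meta, len))
		test_pass = true;
}
```

Because the comparison runs inside the program, the same check can be
repeated after each helper call rather than once at the end of the test.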

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-10-5ceb08a9b37b@cloudflare.com