]> git.ipfire.org Git - thirdparty/kernel/linux.git/log
thirdparty/kernel/linux.git
8 days agoselftests: mptcp: join: wait for estab event instead of MPJ
Matthieu Baerts (NGI0) [Tue, 3 Feb 2026 18:41:25 +0000 (19:41 +0100)] 
selftests: mptcp: join: wait for estab event instead of MPJ

'wait_mpj' was used just after having created a background connection,
but before creating new subflows. So no MPJ were sent. The intention was
to wait for the connection to be established, which was the same as
doing a simple sleep with a "random" value.

Instead, wait for an "established" event. With this, the tests can
finish quicker.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-9-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 days agoselftests: mptcp: diag: sort all #include
Matthieu Baerts (NGI0) [Tue, 3 Feb 2026 18:41:24 +0000 (19:41 +0100)] 
selftests: mptcp: diag: sort all #include

This file is the only one from this directory not to have all these
header inclusions sorted by type and alphabetical order.

Adapt them, to ease the reading, prevent conflicts during potential
future backport modifying these lines, and also to avoid having UAPI
header inclusions before libc ones, see [1].

Link: https://lore.kernel.org/20260120-uapi-sockaddr-v2-1-63c319111cf6@linutronix.de
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-8-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 days agomptcp: Change some dubious min_t(int, ...) to min()
David Laight [Tue, 3 Feb 2026 18:41:22 +0000 (19:41 +0100)] 
mptcp: Change some dubious min_t(int, ...) to min()

There are two:

min_t(int, xxx, mptcp_wnd_end(msk) - msk->snd_nxt);

Both mptcp_wnd_end(msk) and msk->snd_nxt are u64, their difference
(aka the window size) might be limited to 32 bits - but that isn't
knowable from this code.
So checks being added to min_t() detect the potential discard of
significant bits.

Provided the 'avail_size' and return of mptcp_check_allowed_size()
are changed to an unsigned type (size_t matches the type the caller
uses) both min_t() can be changed to min().

Signed-off-by: David Laight <david.laight.linux@gmail.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
[ wrapped too long lines when declaring mptcp_check_allowed_size() ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-6-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 days agomptcp: pm: align endpoint flags size with the NL specs
Matthieu Baerts (NGI0) [Tue, 3 Feb 2026 18:41:21 +0000 (19:41 +0100)] 
mptcp: pm: align endpoint flags size with the NL specs

The MPTCP Netlink specs describe the 'flags' as a u32 type. Internally,
a u8 type was used.

Using a u8 is currently fine, because only the 5 first bits are used.
But there is also no reason not to be aligns with the specs, and
to stick to a u8. Especially because there is a whole of 3 bytes after
in both mptcp_pm_local and mptcp_pm_addr_entry structures.

Also, setting it to a u32 will allow future flags, just in case.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-5-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 days agotrace: mptcp: add mptcp_rcvbuf_grow tracepoint
Paolo Abeni [Tue, 3 Feb 2026 18:41:20 +0000 (19:41 +0100)] 
trace: mptcp: add mptcp_rcvbuf_grow tracepoint

Similar to tcp, provide a new tracepoint to better understand
mptcp_rcv_space_adjust() behavior, which presents many artifacts.

Note that the used format string is so long that I preferred
wrap it, contrary to guidance for quoted strings.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-4-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 days agomptcp: consolidate rcv space init
Paolo Abeni [Tue, 3 Feb 2026 18:41:19 +0000 (19:41 +0100)] 
mptcp: consolidate rcv space init

MPTCP uses several calls of the mptcp_rcv_space_init() helper to
initialize the receive space, with a catch-up call in
mptcp_rcv_space_adjust().

Drop all the other strictly not needed invocations and move constant
fields initialization at socket init/reset time.

This removes a bit of complexity from mptcp DRS code. No functional
changes intended.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-3-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 days agomptcp: fix receive space timestamp initialization
Paolo Abeni [Tue, 3 Feb 2026 18:41:18 +0000 (19:41 +0100)] 
mptcp: fix receive space timestamp initialization

MPTCP initialize the receive buffer stamp in mptcp_rcv_space_init(),
using the provided subflow stamp. Such helper is invoked in several
places; for passive sockets, space init happened at clone time.

In such scenario, MPTCP ends-up accesses the subflow stamp before
its initialization, leading to quite randomic timing for the first
receive buffer auto-tune event, as the timestamp for newly created
subflow is not refreshed there.

Fix the issue moving the stamp initialization out of the mentioned helper,
at the data transfer start, and always using a fresh timestamp.

Fixes: 013e3179dbd2 ("mptcp: fix rcv space initialization")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-2-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 days agomptcp: do not account for OoO in mptcp_rcvbuf_grow()
Paolo Abeni [Tue, 3 Feb 2026 18:41:17 +0000 (19:41 +0100)] 
mptcp: do not account for OoO in mptcp_rcvbuf_grow()

MPTCP-level OoOs are physiological when multiple subflows are active
concurrently and will not cause retransmissions nor are caused by
drops.

Accounting for them in mptcp_rcvbuf_grow() causes the rcvbuf slowly
drifting towards tcp_rmem[2].

Remove such accounting. Note that subflows will still account for TCP-level
OoO when the MPTCP-level rcvbuf is propagated.

This also closes a subtle and very unlikely race condition with rcvspace
init; active sockets with user-space holding the msk-level socket lock,
could complete such initialization in the receive callback, after that the
first OoO data reaches the rcvbuf and potentially triggering a divide by
zero Oops.

Fixes: e118cdc34dd1 ("mptcp: rcvbuf auto-tuning improvement")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-1-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 days agovmw_vsock: bypass false-positive Wnonnull warning with gcc-16
Arnd Bergmann [Tue, 3 Feb 2026 16:34:00 +0000 (17:34 +0100)] 
vmw_vsock: bypass false-positive Wnonnull warning with gcc-16

The gcc-16.0.1 snapshot produces a false-positive warning that turns
into a build failure with CONFIG_WERROR:

In file included from arch/x86/include/asm/string.h:6,
                 from net/vmw_vsock/vmci_transport.c:10:
In function 'vmci_transport_packet_init',
    inlined from '__vmci_transport_send_control_pkt.constprop' at net/vmw_vsock/vmci_transport.c:198:2:
arch/x86/include/asm/string_32.h:150:25: error: argument 2 null where non-null expected because argument 3 is nonzero [-Werror=nonnull]
  150 | #define memcpy(t, f, n) __builtin_memcpy(t, f, n)
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~
net/vmw_vsock/vmci_transport.c:164:17: note: in expansion of macro 'memcpy'
  164 |                 memcpy(&pkt->u.wait, wait, sizeof(pkt->u.wait));
      |                 ^~~~~~
arch/x86/include/asm/string_32.h:150:25: note: in a call to built-in function '__builtin_memcpy'
net/vmw_vsock/vmci_transport.c:164:17: note: in expansion of macro 'memcpy'
  164 |                 memcpy(&pkt->u.wait, wait, sizeof(pkt->u.wait));
      |                 ^~~~~~

This seems relatively harmless, and it so far the only instance of this
warning I have found. The __vmci_transport_send_control_pkt function
is called either with wait=NULL or with one of the type values that
pass 'wait' into memcpy() here, but not from the same caller.

Replacing the memcpy with a struct assignment is otherwise the same
but avoids the warning.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Bryan Tan <bryan-bt.tan@broadcom.com>
Link: https://patch.msgid.link/20260203163406.2636463-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 days agodt-bindings: net: renesas,rzv2h-gbeth: Document Renesas RZ/G3L RMII{tx,rx} clocks
Biju Das [Tue, 3 Feb 2026 10:45:38 +0000 (10:45 +0000)] 
dt-bindings: net: renesas,rzv2h-gbeth: Document Renesas RZ/G3L RMII{tx,rx} clocks

As per the RZ/G3L Hardware manual, CPG_CLKON_ETH register bits{12,13} are
to control the RMII{tx, rx} clocks. Document the rmii{tx.rx} clocks for
RZ/G3L SoC.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260203104541.264759-1-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 days agoMerge branch 's32g-use-a-syscon-for-gpr'
Jakub Kicinski [Thu, 5 Feb 2026 02:17:57 +0000 (18:17 -0800)] 
Merge branch 's32g-use-a-syscon-for-gpr'

Dan Carpenter says:

====================
s32g: Use a syscon for GPR

The s32g devices have a GPR register region which holds a number of
miscellaneous registers.  Currently only the stmmac/dwmac-s32.c uses
anything from there and we just add a line to the device tree to
access that GMAC_0_CTRL_STS register:

                        reg = <0x4033c000 0x2000>, /* gmac IP */
                              <0x4007c004 0x4>;    /* GMAC_0_CTRL_STS */

I have included the whole list of registers below.

We still have to maintain backwards compatibility to this format,
of course, but it would be better to access these registers through a
syscon.  Putting all the registers together is more organized and shows
how the hardware actually is implemented.

Secondly, in some versions of this chipset those registers can only be
accessed via SCMI.  It's relatively straight forward to handle this
by writing a syscon driver and registering it with of_syscon_register_regmap()
but it's complicated to deal with if the registers aren't grouped
together.

Here is the whole list of registers in the GPR region

Starting from 0x4007C000

0  Software-Triggered Faults (SW_NCF)
4  GMAC Control (GMAC_0_CTRL_STS)
28 CMU Status 1 (CMU_STATUS_REG1)
2C CMUs Status 2 (CMU_STATUS_REG2)
30 FCCU EOUT Override Clear (FCCU_EOUT_OVERRIDE_CLEAR_REG)
38 SRC POR Control (SRC_POR_CTRL_REG)
54 GPR21 (GPR21)
5C GPR23 (GPR23)
60 GPR24 Register (GPR24)
CC Debug Control (DEBUG_CONTROL)
F0 Timestamp Control (TIMESTAMP_CONTROL_REGISTER)
F4 FlexRay OS Tick Input Select (FLEXRAY_OS_TICK_INPUT_SELECT_REG)
FC GPR63 Register (GPR63)

Starting from 0x4007CA00

0  Coherency Enable for PFE Ports (PFE_COH_EN)
4  PFE EMAC Interface Mode (PFE_EMACX_INTF_SEL)
20 PFE EMACX Power Control (PFE_PWR_CTRL)
28 Error Injection on Cortex-M7 AHB and AXI Pipe (CM7_TCM_AHB_SLICE)
2C Error Injection AHBP Gasket Cortex-M7 (ERROR_INJECTION_AHBP_GASKET_CM7)
40 LLCE Subsystem Status (LLCE_STAT)
44 LLCE Power Control (LLCE_CTRL)
48 DDR Urgent Control (DDR_URGENT_CTRL)
4C FTM Global Load Control (FLXTIM_CTRL)
50 FTM LDOK Status (FLXTIM_STAT)
54 Top CMU Status (CMU_STAT)
58 Accelerator NoC No Pending Trans Status (NOC_NOPEND_TRANS)
90 SerDes RD/WD Toggle Control (PCIE_TOGGLE)
94 SerDes Toggle Done Status (PCIE_TOGGLEDONE_STAT)
E0 Generic Control 0 (GENCTRL0)
E4 Generic Control 1 (GENCTRL1)
F0 Generic Status 0 (GENSTAT0)
FC Cortex-M7 AXI Parity Error and AHBP Gasket Error Alarm (CM7_AXI_AHBP_GASKET_ERROR_ALARM)

Starting from 4007C800

4  GPR01 Register (GPR01)
30 GPR12 Register (GPR12)
58 GPR22 Register (GPR22)
70 GPR28 Register (GPR28)
74 GPR29 Register (GPR29)

Starting from 4007CB00

4 WKUP Pad Pullup/Pulldown Select (WKUP_PUS)
====================

Link: https://patch.msgid.link/cover.1769764941.git.dan.carpenter@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 days agodt-bindings: net: nxp,s32-dwmac: Use the GPR syscon
Dan Carpenter [Fri, 30 Jan 2026 13:19:47 +0000 (16:19 +0300)] 
dt-bindings: net: nxp,s32-dwmac: Use the GPR syscon

The S32 chipsets have a GPR region which has a miscellaneous registers
including the GMAC_0_CTRL_STS register.  Originally, this code accessed
that register in a sort of ad-hoc way, but it's cleaner to use a
syscon interface to access these registers.

We still need to maintain the old method of accessing the GMAC register
but using a syscon will let us access other registers more cleanly.

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/3b75e950b2f8faecd1a9fa757e7eb7b42ace838f.1769764941.git.dan.carpenter@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 days agonet: stmmac: s32: use a syscon for S32_PHY_INTF_SEL_RGMII
Dan Carpenter [Fri, 30 Jan 2026 13:19:41 +0000 (16:19 +0300)] 
net: stmmac: s32: use a syscon for S32_PHY_INTF_SEL_RGMII

On the s32 chipsets the GMAC_0_CTRL_STS register is in GPR region.
Originally, accessing this register was done in a sort of ad-hoc way,
but we want to use the syscon interface to do it.

This is a little bit ugly because we have to maintain backwards
compatibility to the old device trees so we have to support both ways
to access this register.

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Link: https://patch.msgid.link/b6b60d03344d070b2b4db7f0f00527f166e594e0.1769764941.git.dan.carpenter@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 days agoMerge branch 'stp-rstp-switch-support-for-pru-icssm-ethernet-driver'
Jakub Kicinski [Thu, 5 Feb 2026 02:12:03 +0000 (18:12 -0800)] 
Merge branch 'stp-rstp-switch-support-for-pru-icssm-ethernet-driver'

Parvathi Pudi says:

====================
STP/RSTP SWITCH support for PRU-ICSSM Ethernet driver

The DUAL-EMAC patch series for Megabit Industrial Communication Sub-system
(ICSSM), which provides the foundational support for Ethernet functionality
over PRU-ICSS on the TI SOCs (AM335x, AM437x, and AM57x), was merged into
net-next recently [1].

This patch series enhances the PRU-ICSSM Ethernet driver to support bridge
(STP/RSTP) SWITCH mode, which has been implemented using the "switchdev"
framework and interacts with the "mstp daemon" for STP and RSTP management
in userspace.

When the  SWITCH mode is enabled, forwarding of Ethernet packets using
either the traditional store-and-forward mechanism or via cut-through is
offloaded to the two PRU based Ethernet interfaces available within the
ICSSM. The firmware running on the PRU inspects the bridge port states and
performs necessary checks before forwarding a packet. This improves the
overall system performance and significantly reduces the packet forwarding
latency.

Protocol switching from Dual-EMAC to bridge (STP/RSTP) SWITCH mode can be
done as follows.

Assuming eth2 and eth3 are the two physical ports of the ICSS2 instance:

>> brctl addbr br0
>> ip maddr add 01:80:c2:00:00:00 dev br0
>> ip link set dev br0 address $(cat /sys/class/net/eth2/address)
>> brctl addif br0 eth2
>> brctl addif br0 eth3
>> mstpd
>> brctl stp br0 on
>> mstpctl setforcevers br0 rstp
>> ip link set dev br0 up

To revert back to the default dual EMAC mode, the steps are as follows:

>> ip link set dev br0 down
>> brctl delif br0 eth2
>> brctl delif br0 eth3
>> brctl delbr br0

The patches presented in this series have gone through the patch verification
tools and no warnings or errors are reported.
====================

Link: https://patch.msgid.link/20260130124559.1182780-1-parvathi@couthit.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 days agonet: ti: icssm-prueth: Add support for ICSSM RSTP switch
Roger Quadros [Fri, 30 Jan 2026 12:43:45 +0000 (18:13 +0530)] 
net: ti: icssm-prueth: Add support for ICSSM RSTP switch

Add support for RSTP switch mode by enhancing the existing ICSSM dual EMAC
driver with switchdev support.

Enable the PRU-ICSSM to operate in switch mode, with the 2 PRU ports acting
as external ports and the host acting as an internal port. Packets received
from the PRU ports will be forwarded to the host (store and forward mode)
and also to the other PRU port (either using store and forward mode or via
cut-through mode). Packets coming from the host will be transmitted either
from one or both of the PRU ports (depending on the FDB decision).

By default, the dual EMAC firmware will be loaded in the PRU-ICSS
subsystem. To configure the PRU-ICSS to operate as a switch, a different
firmware must to be loaded.

Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: Basharath Hussain Khaja <basharath@couthit.com>
Signed-off-by: Parvathi Pudi <parvathi@couthit.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260130124559.1182780-4-parvathi@couthit.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 days agonet: ti: icssm-prueth: Add switchdev support for icssm_prueth driver
Roger Quadros [Fri, 30 Jan 2026 12:43:44 +0000 (18:13 +0530)] 
net: ti: icssm-prueth: Add switchdev support for icssm_prueth driver

Add support for offloading the RSTP switch feature to the PRU-ICSS
subsystem by adding switchdev support. PRU-ICSS is capable of operating
in RSTP switch mode with two external ports and one host port.

PRUETH driver and firmware interface support will be added into
icssm_prueth in the subsequent commits.

Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: Basharath Hussain Khaja <basharath@couthit.com>
Signed-off-by: Parvathi Pudi <parvathi@couthit.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260130124559.1182780-3-parvathi@couthit.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 days agonet: ti: icssm-prueth: Add helper functions to configure and maintain FDB
Roger Quadros [Fri, 30 Jan 2026 12:43:43 +0000 (18:13 +0530)] 
net: ti: icssm-prueth: Add helper functions to configure and maintain FDB

Introduce helper functions to configure and maintain Forwarding
Database (FDB) tables to aid with the switch mode feature for PRU-ICSS
ports. The PRU-ICSS FDB is maintained such that it is always in sync with
the Linux bridge driver FDB.

The FDB is used by the driver to determine whether to flood a packet,
received from the user plane, to both ports or direct it to a specific port
using the flags in the FDB table entry.

The FDB is implemented in two main components: the Index table and the
MAC Address table. Adding, deleting, and maintaining entries are handled
by the PRUETH driver. There are two types of entries:

Dynamic: created from the received packets and are subject to aging.
Static: created by the user and these entries never age out.

8-bit hash value obtained using the source MAC address is used to identify
the index to the Index/Hash table. A bucket-based approach is used to
collate source MAC addresses with the same hash value. The Index/Hash table
holds the bucket index (16-bit value) and the number of entries in the
bucket with the same hash value (16-bit value). This table can hold up to
256 entries, with each entry consuming 4 bytes of memory. The bucket index
value points to the MAC address table indicating the start of MAC addresses
having the same hash values.

Each entry in the MAC Address table consists of:
1. 6 bytes of the MAC address,
2. 2-byte aging time, and
3. 1-byte each for port information and flags respectively.

When a new entry is added to the FDB, the hash value is calculated using an
XOR operation on the 6-byte MAC address. The result is used as an index
into the Hash/Index table to check if any entries exist. If no entries are
present, the first available empty slot in the MAC Address table is
allocated to insert this MAC address. If entries with the same hash value
are already present, the new MAC address entry is added to the MAC Address
table in such a way that it ensures all entries are grouped together and
sorted in ascending MAC address order. This approach helps efficiently
manage FDB entries.

Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: Basharath Hussain Khaja <basharath@couthit.com>
Signed-off-by: Parvathi Pudi <parvathi@couthit.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260130124559.1182780-2-parvathi@couthit.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet: usb: sr9700: remove code to drive nonexistent multicast filter
Ethan Nelson-Moore [Tue, 3 Feb 2026 01:39:09 +0000 (17:39 -0800)] 
net: usb: sr9700: remove code to drive nonexistent multicast filter

Several registers referenced in this driver's source code do not
actually exist (they are not writable and read as zero in my testing).
They exist in this driver because it originated as a copy of the dm9601
driver. Notably, these include the multicast filter registers - this
causes the driver to not support multicast packets correctly. Remove
the multicast filter code and register definitions. Instead, set the
chip to receive all multicast filter packets when any multicast
addresses are in the list.

Reviewed-by: Simon Horman <horms@kernel.org> (from v1)
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Link: https://patch.msgid.link/20260203013924.28582-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet: usb: introduce usbnet_mii_ioctl helper function
Ethan Nelson-Moore [Tue, 3 Feb 2026 01:34:55 +0000 (17:34 -0800)] 
net: usb: introduce usbnet_mii_ioctl helper function

Many USB network drivers use identical code to pass ioctl
requests on to the MII layer. Reduce code duplication by
refactoring this code into a helper function.

Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> (v1)
Reviewed-by: Andrew Lunn <andrew@lunn.ch> (v3)
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20260203013517.26170-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agoMerge branch 'net-ethernet-renesas-rcar_gen4_ptp-hide-private-data'
Jakub Kicinski [Wed, 4 Feb 2026 03:35:40 +0000 (19:35 -0800)] 
Merge branch 'net-ethernet-renesas-rcar_gen4_ptp-hide-private-data'

Niklas Söderlund says:

====================
net: ethernet: renesas: rcar_gen4_ptp: Hide private data

The R-Car Gen4 PTP module started out as an exclusive feature of a
single driver, but have since been extended to cover both R-Car Switch
and TSN driver implementations on Gen4.

The feature have already been extended to be built as its own module
with an interface exposed thru a local header file. The header file
however also exposes the modules private data structure. The two
existing users have already started to poke at members of the struct.

The exposed private data being manipulated by users makes refactoring
and future rework hard as the interface for the module becomes to
chaotic. This small series aims to create two helpers to hide the
private data.

This is done as a small preparation before a third, new, users of the
Gen4 PTP will be added in a follow up series.
====================

Link: https://patch.msgid.link/20260201183745.1075399-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet: ethernet: renesas: rcar_gen4_ptp: Hide private data from users
Niklas Söderlund [Sun, 1 Feb 2026 18:37:45 +0000 (19:37 +0100)] 
net: ethernet: renesas: rcar_gen4_ptp: Hide private data from users

The Gen4 PTP helper module is already used by RTSN and RSWITCH to
support PTP clocks and will be used by RAVB too. Hide the Gen4 PTP
private data structure to make sure none of the users poke at it.

This will be more important for RAVB use-cases as more then one RAVB
device will need to cooperate using one PTP clock source.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20260201183745.1075399-5-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet: ethernet: renesas: rcar_gen4_ptp: Add helper to read time
Niklas Söderlund [Sun, 1 Feb 2026 18:37:44 +0000 (19:37 +0100)] 
net: ethernet: renesas: rcar_gen4_ptp: Add helper to read time

Instead of accessing the Gen4 PTP specific structure directly in drivers
add a helper to read the time. This is done in preparation to
completely hide the Gen4 PTP specific structure from users.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20260201183745.1075399-4-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet: ethernet: renesas: rcar_gen4_ptp: Add helper to get clock index
Niklas Söderlund [Sun, 1 Feb 2026 18:37:43 +0000 (19:37 +0100)] 
net: ethernet: renesas: rcar_gen4_ptp: Add helper to get clock index

Instead of accessing the Gen4 PTP specific structure directly in drivers
add a helper to read the clock index. This is done in preparation to
completely hide the Gen4 PTP specific structure from users.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20260201183745.1075399-3-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet: ethernet: renesas: rcar_gen4_ptp: Move address assignment
Niklas Söderlund [Sun, 1 Feb 2026 18:37:42 +0000 (19:37 +0100)] 
net: ethernet: renesas: rcar_gen4_ptp: Move address assignment

Instead of accessing the Gen4 PTP specific structure directly in drivers
move the device address assignment into the preparation call. This is
done in preparation to completely hide the Gen4 PTP specific structure
from users.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20260201183745.1075399-2-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet: bridge: use sysfs_emit instead of sprintf
David Corvaglia [Mon, 2 Feb 2026 19:09:41 +0000 (19:09 +0000)] 
net: bridge: use sysfs_emit instead of sprintf

Replace sprintf with sysfs_emit in sysfs show() methods as outlined in
Documentation/filesystems/sysfs.rst.

sysfs_emit is preferred to sprintf in sysfs show() methods as it is safer
with buffer handling.

Signed-off-by: David Corvaglia <david@corvaglia.dev>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/0100019c1fc2bcc3-bc9ca2f1-22d7-4250-8441-91e4af57117b-000000@email.amazonses.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agobng_en: fix misleading error message for generic firmware version
Alok Tiwari [Mon, 2 Feb 2026 03:38:43 +0000 (19:38 -0800)] 
bng_en: fix misleading error message for generic firmware version

The devlink info_get handler incorrectly reports "roce firmware" when
populating the generic firmware version field.

Update the error message to correctly describe the failing operation.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Link: https://patch.msgid.link/20260202033848.22993-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agoMerge branch 'net-stmmac-rk-cleanups-v3-mode-and-speed-for-most'
Jakub Kicinski [Wed, 4 Feb 2026 02:01:47 +0000 (18:01 -0800)] 
Merge branch 'net-stmmac-rk-cleanups-v3-mode-and-speed-for-most'

Russell King says:

====================
net: stmmac: rk: cleanups v3: mode and speed for most

Third installment in the rk cleanups, this converts the interface mode
and speed configuration for most RK SoCs.
====================

Link: https://patch.msgid.link/aYB2cKRu3DQh6yXK@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet: stmmac: rk: convert px30
Russell King (Oracle) [Mon, 2 Feb 2026 10:04:41 +0000 (10:04 +0000)] 
net: stmmac: rk: convert px30

Use rk_set_clk_mac_speed() rather than px30 specific function for
configuring RMII clock.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vmqnR-00000007VDE-2fM1@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet: stmmac: rk: remove need for ->set_speed() method
Russell King (Oracle) [Mon, 2 Feb 2026 10:04:36 +0000 (10:04 +0000)] 
net: stmmac: rk: remove need for ->set_speed() method

As we can detect whether the SoC provides the parameters necessary for
rk_set_reg_speed(), we don't need to have explicit calls to this.
Instead, we can move the contents of this function to
rk_set_clk_tx_rate().

This remsoves all the .set_speed() implementations that merely go on to
invoke rk_set_reg_speed().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vmqnM-00000007VD8-1xWo@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet: stmmac: rk: use rk_encode_wm16() for RMII clock
Russell King (Oracle) [Mon, 2 Feb 2026 10:04:31 +0000 (10:04 +0000)] 
net: stmmac: rk: use rk_encode_wm16() for RMII clock

The RMII clock is a single bit, which is set for 100M and clear for
10M. Move this out of struct rk_reg_speed_data (which gets rid of
this structure) into the struct rk_clock_fields as the bitmask for
this bit.

This gets rid of the per-SoC variability in the calls to
rk_set_reg_speed().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vmqnH-00000007VCz-1WmP@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet: stmmac: rk: use rk_encode_wm16() for RMII speed
Russell King (Oracle) [Mon, 2 Feb 2026 10:04:26 +0000 (10:04 +0000)] 
net: stmmac: rk: use rk_encode_wm16() for RMII speed

The RMII speed configuration is encoded as a single bit, which is set
for 100M and clean for 10M. Provide the bitfield definition in
struct rk_clock_fields, moving it out of struct rk_reg_speed_data's
rmii_10 and rmii_100 initialisers. Update rk_set_reg_speed() to handle
the new definition location of this bit.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vmqnC-00000007VCt-0oRg@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet: stmmac: rk: use rk_encode_wm16() for RGMII clocks
Russell King (Oracle) [Mon, 2 Feb 2026 10:04:21 +0000 (10:04 +0000)] 
net: stmmac: rk: use rk_encode_wm16() for RGMII clocks

As all of the RGMII clock selection bitfields (gmii_clk_sel) use the
same encoding, parameterise this by providing the bitfield mask in
the BSP private data.

This is the last user of GRF_FIELD_CONST(), so remove that definition
as well.

One additional change is for RK3328 - as only gmac2io supports RGMII,
only initialise the mask for this instance.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vmqn7-00000007VCn-0OZA@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet: stmmac: rk: remove rk3528 RMII clock initialisation
Russell King (Oracle) [Mon, 2 Feb 2026 10:04:15 +0000 (10:04 +0000)] 
net: stmmac: rk: remove rk3528 RMII clock initialisation

There is no need to pre-initialise the rk3528 RMII clock when
selecting RMII mode on gmac0.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vmqn1-00000007VCh-47Sv@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet: stmmac: rk: convert rk3588 to rk_set_reg_speed()
Russell King (Oracle) [Mon, 2 Feb 2026 10:04:10 +0000 (10:04 +0000)] 
net: stmmac: rk: convert rk3588 to rk_set_reg_speed()

Update rk_set_reg_speed() to use either the grf or php_grf regmap
depending on the SoC's requirements and convert rk3588, removing
its custom code.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vmqmw-00000007VCb-3glG@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet: stmmac: rk: move speed GRF register offset to private data
Russell King (Oracle) [Mon, 2 Feb 2026 10:04:05 +0000 (10:04 +0000)] 
net: stmmac: rk: move speed GRF register offset to private data

Move the speed/clocking related GRF register offset into the driver
private data, convert rk_set_reg_speed() to use it and initialise this
member either from the corresponding member in struct rk_gmac_ops, or
the SoC specific initialisation function.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vmqmr-00000007VCV-3Cz8@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet: stmmac: rk: convert rk3588 to mask-based interface mode config
Russell King (Oracle) [Mon, 2 Feb 2026 10:04:00 +0000 (10:04 +0000)] 
net: stmmac: rk: convert rk3588 to mask-based interface mode config

rk3588 has a quirk compared to the other Rockchip implementations in
that the interface mode configuration register is in the php_grf
regmap rather than the grf regmap. Add a flag to indicate this, and
a separate function to write to the appropriate regmap. This allows
rk3588 to be converted.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vmqmm-00000007VCP-2XZc@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet: stmmac: rk: convert to mask-based interface mode configuration
Russell King (Oracle) [Mon, 2 Feb 2026 10:03:55 +0000 (10:03 +0000)] 
net: stmmac: rk: convert to mask-based interface mode configuration

The majority of Rockchip implementations require three common pieces
of information to configure the PHY interface mode:

- The grf register offset for configuring the GMAC phy_intf_sel field
  and the RMII mode bit.
- The bitfield in this register for the GMAC's phy_intf_sel.
- The bit position for RMII mode but clear for RGMII mode.

Introduce members for this information into struct rk_priv_data and
struct rk_gmac_ops, which will be used to pre-initialise the struct
rk_priv_data members. We describe the register contents using
bitfields, even for those that are a single bit for consistency.

As each register comprises of two halves, where the upper half enables
changing the bit state in the lower half, we can describe these
bitfields using a 16-bit data type, and provide rk_encode_wm16() to
generate the actual register values from the field mask and field
value. We are unable to use the FIELD_PREP_WM16() macros for this as
these require the field mask to be a constant.

Add code to rk_gmac_powerup() to get the phy_intf_sel value, validating
that the resulting mode is either RMII or RGMII. No other modes are
supported by any of the Rockchip SoCs supported by this driver.

If either of the bitfield mask values are populated in struct
rk_priv_data, use these to generate the register contents, and write
the resulting value to the specified GRF register.

Convert many Rockchip implementations to use this new infrastructure.
For those where there is a single GMAC instance, it is merely a case of
filling in the new members of struct rk_gmac_ops. For those with
multiple instances, one or more of these members depends on the GMAC
instance, so setup of the members in struct rk_gmac has to be done via
the .init method of struct rk_gmac_ops. The corresponding code is
removed from the set_to_rgmii() and set_to_rmii() implementations.

Since the member name documents the purpose of the field that is being
initialised, providing preprocessor macros to define the bitfields is
deemed to be less than useful given the massive size of this driver.

The existing mechanisms remain behind for those SoCs that can not be
converted to this scheme.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
v2: disable clocks on failure
Link: https://patch.msgid.link/E1vmqmh-00000007VCJ-1xns@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agoMerge branch 'accecn-protocol-case-handling-series'
Paolo Abeni [Tue, 3 Feb 2026 14:13:30 +0000 (15:13 +0100)] 
Merge branch 'accecn-protocol-case-handling-series'

Chia-Yu Chang says:

====================
AccECN protocol case handling series

Plesae find the v13 AccECN case handling patch series, which covers
several excpetional case handling of Accurate ECN spec (RFC9768),
adds new identifiers to be used by CC modules, adds ecn_delta into
rate_sample, and keeps the ACE counter for computation, etc.

This patch series is part of the full AccECN patch series, which is at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
---
Chia-Yu Chang (13):
  selftests/net: gro: add self-test for TCP CWR flag
  tcp: ECT_1_NEGOTIATION and NEEDS_ACCECN identifiers
  tcp: disable RFC3168 fallback identifier for CC modules
  tcp: accecn: handle unexpected AccECN negotiation feedback
  tcp: accecn: retransmit downgraded SYN in AccECN negotiation
  tcp: add TCP_SYNACK_RETRANS synack_type
  tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN
    SYN/ACK
  tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
  tcp: accecn: fallback outgoing half link to non-AccECN
  tcp: accecn: detect loss ACK w/ AccECN option and add
    TCP_ACCECN_OPTION_PERSIST
  tcp: accecn: add tcpi_ecn_mode and tcpi_option2 in tcp_info
  tcp: accecn: enable AccECN
  selftests/net: packetdrill: add TCP Accurate ECN cases

Ilpo Järvinen (2):
  tcp: try to avoid safer when ACKs are thinned
  gro: flushing when CWR is set negatively affects AccECN

 Documentation/networking/ip-sysctl.rst        |   4 +-
 .../networking/net_cachelines/tcp_sock.rst    |   1 +
 include/linux/tcp.h                           |   4 +-
 include/net/inet_ecn.h                        |  20 +++-
 include/net/tcp.h                             |  32 +++++-
 include/net/tcp_ecn.h                         | 103 ++++++++++++------
 include/uapi/linux/tcp.h                      |  26 ++++-
 net/ipv4/inet_connection_sock.c               |   3 +
 net/ipv4/sysctl_net_ipv4.c                    |   4 +-
 net/ipv4/tcp.c                                |  10 ++
 net/ipv4/tcp_cong.c                           |   5 +-
 net/ipv4/tcp_input.c                          |  40 ++++++-
 net/ipv4/tcp_minisocks.c                      |  43 +++++---
 net/ipv4/tcp_offload.c                        |   3 +-
 net/ipv4/tcp_output.c                         |  34 ++++--
 net/ipv4/tcp_timer.c                          |   3 +
 tools/testing/selftests/drivers/net/gro.c     |  81 ++++++++++----
 tools/testing/selftests/drivers/net/gro.py    |   3 +-
 .../tcp_accecn_2nd_data_as_first.pkt          |  24 ++++
 .../tcp_accecn_2nd_data_as_first_connect.pkt  |  30 +++++
 .../tcp_accecn_3rd_ack_after_synack_rxmt.pkt  |  19 ++++
 ..._accecn_3rd_ack_ce_updates_received_ce.pkt |  18 +++
 .../tcp_accecn_3rd_ack_lost_data_ce.pkt       |  22 ++++
 .../net/packetdrill/tcp_accecn_3rd_dups.pkt   |  26 +++++
 .../tcp_accecn_acc_ecn_disabled.pkt           |  13 +++
 .../tcp_accecn_accecn_then_notecn_syn.pkt     |  28 +++++
 .../tcp_accecn_accecn_to_rfc3168.pkt          |  18 +++
 .../tcp_accecn_client_accecn_options_drop.pkt |  34 ++++++
 .../tcp_accecn_client_accecn_options_lost.pkt |  38 +++++++
 .../tcp_accecn_clientside_disabled.pkt        |  12 ++
 ...cecn_close_local_close_then_remote_fin.pkt |  25 +++++
 .../tcp_accecn_delivered_2ndlargeack.pkt      |  25 +++++
 ..._accecn_delivered_falseoverflow_detect.pkt |  31 ++++++
 .../tcp_accecn_delivered_largeack.pkt         |  24 ++++
 .../tcp_accecn_delivered_largeack2.pkt        |  25 +++++
 .../tcp_accecn_delivered_maxack.pkt           |  25 +++++
 .../tcp_accecn_delivered_updates.pkt          |  70 ++++++++++++
 .../net/packetdrill/tcp_accecn_ecn3.pkt       |  12 ++
 .../tcp_accecn_ecn_field_updates_opt.pkt      |  35 ++++++
 .../packetdrill/tcp_accecn_ipflags_drop.pkt   |  14 +++
 .../tcp_accecn_listen_opt_drop.pkt            |  16 +++
 .../tcp_accecn_multiple_syn_ack_drop.pkt      |  28 +++++
 .../tcp_accecn_multiple_syn_drop.pkt          |  18 +++
 .../tcp_accecn_negotiation_bleach.pkt         |  23 ++++
 .../tcp_accecn_negotiation_connect.pkt        |  23 ++++
 .../tcp_accecn_negotiation_listen.pkt         |  26 +++++
 .../tcp_accecn_negotiation_noopt_connect.pkt  |  23 ++++
 .../tcp_accecn_negotiation_optenable.pkt      |  23 ++++
 .../tcp_accecn_no_ecn_after_accecn.pkt        |  20 ++++
 .../net/packetdrill/tcp_accecn_noopt.pkt      |  27 +++++
 .../net/packetdrill/tcp_accecn_noprogress.pkt |  27 +++++
 .../tcp_accecn_notecn_then_accecn_syn.pkt     |  28 +++++
 .../tcp_accecn_rfc3168_to_fallback.pkt        |  18 +++
 .../tcp_accecn_rfc3168_to_rfc3168.pkt         |  18 +++
 .../tcp_accecn_sack_space_grab.pkt            |  28 +++++
 .../tcp_accecn_sack_space_grab_with_ts.pkt    |  39 +++++++
 ...tcp_accecn_serverside_accecn_disabled1.pkt |  20 ++++
 ...tcp_accecn_serverside_accecn_disabled2.pkt |  20 ++++
 .../tcp_accecn_serverside_broken.pkt          |  19 ++++
 .../tcp_accecn_serverside_ecn_disabled.pkt    |  19 ++++
 .../tcp_accecn_serverside_only.pkt            |  18 +++
 ...n_syn_ace_flags_acked_after_retransmit.pkt |  18 +++
 .../tcp_accecn_syn_ace_flags_drop.pkt         |  16 +++
 ...n_ack_ace_flags_acked_after_retransmit.pkt |  27 +++++
 .../tcp_accecn_syn_ack_ace_flags_drop.pkt     |  26 +++++
 .../net/packetdrill/tcp_accecn_syn_ce.pkt     |  13 +++
 .../net/packetdrill/tcp_accecn_syn_ect0.pkt   |  13 +++
 .../net/packetdrill/tcp_accecn_syn_ect1.pkt   |  13 +++
 .../net/packetdrill/tcp_accecn_synack_ce.pkt  |  27 +++++
 ..._accecn_synack_ce_updates_delivered_ce.pkt |  22 ++++
 .../packetdrill/tcp_accecn_synack_ect0.pkt    |  24 ++++
 .../packetdrill/tcp_accecn_synack_ect1.pkt    |  24 ++++
 .../packetdrill/tcp_accecn_synack_rexmit.pkt  |  15 +++
 .../packetdrill/tcp_accecn_synack_rxmt.pkt    |  25 +++++
 .../packetdrill/tcp_accecn_tsnoprogress.pkt   |  26 +++++
 .../net/packetdrill/tcp_accecn_tsprogress.pkt |  25 +++++
 76 files changed, 1680 insertions(+), 102 deletions(-)
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_2nd_data_as_first.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_2nd_data_as_first_connect.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_3rd_ack_after_synack_rxmt.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_3rd_ack_ce_updates_received_ce.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_3rd_ack_lost_data_ce.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_3rd_dups.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_acc_ecn_disabled.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_accecn_then_notecn_syn.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_accecn_to_rfc3168.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_client_accecn_options_drop.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_client_accecn_options_lost.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_clientside_disabled.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_close_local_close_then_remote_fin.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_delivered_2ndlargeack.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_delivered_falseoverflow_detect.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_delivered_largeack.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_delivered_largeack2.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_delivered_maxack.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_delivered_updates.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_ecn3.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_ecn_field_updates_opt.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_ipflags_drop.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_listen_opt_drop.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_multiple_syn_ack_drop.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_multiple_syn_drop.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_negotiation_bleach.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_negotiation_connect.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_negotiation_listen.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_negotiation_noopt_connect.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_negotiation_optenable.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_no_ecn_after_accecn.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_noopt.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_noprogress.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_notecn_then_accecn_syn.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_rfc3168_to_fallback.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_rfc3168_to_rfc3168.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_sack_space_grab.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_sack_space_grab_with_ts.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_serverside_accecn_disabled1.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_serverside_accecn_disabled2.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_serverside_broken.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_serverside_ecn_disabled.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_serverside_only.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ace_flags_acked_after_retransmit.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ace_flags_drop.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ack_ace_flags_acked_after_retransmit.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ack_ace_flags_drop.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ce.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ect0.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ect1.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_synack_ce.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_synack_ce_updates_delivered_ce.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_synack_ect0.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_synack_ect1.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_synack_rexmit.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_synack_rxmt.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_tsnoprogress.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_tsprogress.pkt
====================

Link: https://patch.msgid.link/20260131222515.8485-1-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agoselftests/net: packetdrill: add TCP Accurate ECN cases
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:15 +0000 (23:25 +0100)] 
selftests/net: packetdrill: add TCP Accurate ECN cases

Linux Accurate ECN test sets using ACE counters and AccECN options to
cover several scenarios: Connection teardown, different ACK conditions,
counter wrapping, SACK space grabbing, fallback schemes, negotiation
retransmission/reorder/loss, AccECN option drop/loss, different
handshake reflectors, data with marking, and different sysctl values.

The packetdrill used is commit cbe405666c9c8698ac1e72f5e8ffc551216dfa56
of repo: https://github.com/minuscat/packetdrill/tree/upstream_accecn.
And corresponding patches are sent to google/packetdrill email list.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Co-developed-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Co-developed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-16-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agotcp: accecn: enable AccECN
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:14 +0000 (23:25 +0100)] 
tcp: accecn: enable AccECN

Enable Accurate ECN negotiation and request for incoming and
outgoing connection by setting sysctl_tcp_ecn:

+==============+===========================================+
|              |  Highest ECN variant (Accurate ECN, ECN,  |
|   tcp_ecn    |  or no ECN) to be negotiated & requested  |
|              +---------------------+---------------------+
|              | Incoming connection | Outgoing connection |
+==============+=====================+=====================+
|      0       |        No ECN       |        No ECN       |
|      1       |         ECN         |         ECN         |
|      2       |         ECN         |        No ECN       |
+--------------+---------------------+---------------------+
|      3       |     Accurate ECN    |     Accurate ECN    |
|      4       |     Accurate ECN    |         ECN         |
|      5       |     Accurate ECN    |        No ECN       |
+==============+=====================+=====================+

Refer Documentation/networking/ip-sysctl.rst for more details.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-15-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agotcp: accecn: add tcpi_ecn_mode and tcpi_option2 in tcp_info
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:13 +0000 (23:25 +0100)] 
tcp: accecn: add tcpi_ecn_mode and tcpi_option2 in tcp_info

Add 2-bit tcpi_ecn_mode feild within tcp_info to indicate which ECN
mode is negotiated: ECN_MODE_DISABLED, ECN_MODE_RFC3168, ECN_MODE_ACCECN,
or ECN_MODE_PENDING. This is done by utilizing available bits from
tcpi_accecn_opt_seen (reduced from 16 bits to 2 bits) and
tcpi_accecn_fail_mode (reduced from 16 bits to 4 bits).

Also, an extra 24-bit tcpi_options2 field is identified to represent
newer options and connection features, as all 8 bits of tcpi_options
field have been used.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Co-developed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-14-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agotcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:12 +0000 (23:25 +0100)] 
tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST

Detect spurious retransmission of a previously sent ACK carrying the
AccECN option after the second retransmission. Since this might be caused
by the middlebox dropping ACK with options it does not recognize, disable
the sending of the AccECN option in all subsequent ACKs. This patch
follows Section 3.2.3.2.2 of AccECN spec (RFC9768), and a new field
(accecn_opt_sent_w_dsack) is added to indicate that an AccECN option was
sent with duplicate SACK info.

Also, a new AccECN option sending mode is added to tcp_ecn_option sysctl:
(TCP_ECN_OPTION_PERSIST), which ignores the AccECN fallback policy and
persistently sends AccECN option once it fits into TCP option space.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-13-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agotcp: accecn: fallback outgoing half link to non-AccECN
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:11 +0000 (23:25 +0100)] 
tcp: accecn: fallback outgoing half link to non-AccECN

According to Section 3.2.2.1 of AccECN spec (RFC9768), if the Server
is in AccECN mode and in SYN-RCVD state, and if it receives a value of
zero on a pure ACK with SYN=0 and no SACK blocks, for the rest of the
connection the Server MUST NOT set ECT on outgoing packets and MUST
NOT respond to AccECN feedback. Nonetheless, as a Data Receiver it
MUST NOT disable AccECN feedback.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-12-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agotcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:10 +0000 (23:25 +0100)] 
tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion

Based on specification:
  https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt

Based on Section 3.1.5 of AccECN spec (RFC9768), a TCP Server in
AccECN mode MUST NOT set ECT on any packet for the rest of the connection,
if it has received or sent at least one valid SYN or Acceptable SYN/ACK
with (AE,CWR,ECE) = (0,0,0) during the handshake.

In addition, a host in AccECN mode that is feeding back the IP-ECN
field on a SYN or SYN/ACK MUST feed back the IP-ECN field on the
latest valid SYN or acceptable SYN/ACK to arrive.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-11-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agotcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:09 +0000 (23:25 +0100)] 
tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK

For Accurate ECN, the first SYN/ACK sent by the TCP server shall set
the ACE flag (Table 1 of RFC9768) and the AccECN option to complete the
capability negotiation. However, if the TCP server needs to retransmit
such a SYN/ACK (for example, because it did not receive an ACK
acknowledging its SYN/ACK, or received a second SYN requesting AccECN
support), the TCP server retransmits the SYN/ACK without the AccECN
option. This is because the SYN/ACK may be lost due to congestion, or a
middlebox may block the AccECN option. Furthermore, if this retransmission
also times out, to expedite connection establishment, the TCP server
should retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0) and without the
AccECN option, while maintaining AccECN feedback mode.

This complies with Section 3.2.3.2.2 of the AccECN spec RFC9768.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-10-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agotcp: add TCP_SYNACK_RETRANS synack_type
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:08 +0000 (23:25 +0100)] 
tcp: add TCP_SYNACK_RETRANS synack_type

Before this patch, retransmitted SYN/ACK did not have a specific
synack_type; however, the upcoming patch needs to distinguish between
retransmitted and non-retransmitted SYN/ACK for AccECN negotiation to
transmit the fallback SYN/ACK during AccECN negotiation. Therefore, this
patch introduces a new synack_type (TCP_SYNACK_RETRANS).

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-9-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agotcp: accecn: retransmit downgraded SYN in AccECN negotiation
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:07 +0000 (23:25 +0100)] 
tcp: accecn: retransmit downgraded SYN in AccECN negotiation

Based on AccECN spec (RFC9768) Section 3.1.4.1, if the sender of an
AccECN SYN (the TCP Client) times out before receiving the SYN/ACK, it
SHOULD attempt to negotiate the use of AccECN at least one more time
by continuing to set all three TCP ECN flags (AE,CWR,ECE) = (1,1,1) on
the first retransmitted SYN (using the usual retransmission time-outs).

If this first retransmission also fails to be acknowledged, in
deployment scenarios where AccECN path traversal might be problematic,
the TCP Client SHOULD send subsequent retransmissions of the SYN with
the three TCP-ECN flags cleared (AE,CWR,ECE) = (0,0,0).

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-8-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agotcp: accecn: handle unexpected AccECN negotiation feedback
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:06 +0000 (23:25 +0100)] 
tcp: accecn: handle unexpected AccECN negotiation feedback

According to Sections 3.1.2 and 3.1.3 of AccECN spec (RFC9768).

In Section 3.1.2, it says an AccECN implementation has no need to
recognize or support the Server response labelled 'Nonce' or ECN-nonce
feedback more generally, as RFC 3540 has been reclassified as Historic.
AccECN is compatible with alternative ECN feedback integrity approaches
to the nonce. The SYN/ACK labelled 'Nonce' with (AE,CWR,ECE) = (1,0,1)
is reserved for future use. A TCP Client (A) that receives such a SYN/ACK
follows the procedure for forward compatibility given in Section 3.1.3.

Then in Section 3.1.3, it says if a TCP Client has sent a SYN requesting
AccECN feedback with (AE,CWR,ECE) = (1,1,1) then receives a SYN/ACK with
the currently reserved combination (AE,CWR,ECE) = (1,0,1) but it does not
have logic specific to such a combination, the Client MUST enable AccECN
mode as if the SYN/ACK onfirmed that the Server supported AccECN and as
if it fed back that the IP-ECN field on the SYN had arrived unchanged.

Fixes: 3cae34274c79 ("tcp: accecn: AccECN negotiation").
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-7-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agotcp: disable RFC3168 fallback identifier for CC modules
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:05 +0000 (23:25 +0100)] 
tcp: disable RFC3168 fallback identifier for CC modules

When AccECN is not successfully negociated for a TCP flow, it defaults
fallback to classic ECN (RFC3168). However, L4S service will fallback
to non-ECN.

This patch enables congestion control module to control whether it
should not fallback to classic ECN after unsuccessful AccECN negotiation.
A new CA module flag (TCP_CONG_NO_FALLBACK_RFC3168) identifies this
behavior expected by the CA.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-6-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agotcp: ECT_1_NEGOTIATION and NEEDS_ACCECN identifiers
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:04 +0000 (23:25 +0100)] 
tcp: ECT_1_NEGOTIATION and NEEDS_ACCECN identifiers

Two flags for congestion control (CC) module are added in this patch
related to AccECN negotiation. First, a new flag (TCP_CONG_NEEDS_ACCECN)
defines that the CC expects to negotiate AccECN functionality using the
ECE, CWR and AE flags in the TCP header.

Second, during ECN negotiation, ECT(0) in the IP header is used. This
patch enables CC to control whether ECT(0) or ECT(1) should be used on
a per-segment basis. A new flag (TCP_CONG_ECT_1_NEGOTIATION) defines the
expected ECT value in the IP header by the CA when not-yet initialized
for the connection.

The detailed AccECN negotiaotn can be found in IETF RFC9768.

Co-developed-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-5-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agoselftests/net: gro: add self-test for TCP CWR flag
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:03 +0000 (23:25 +0100)] 
selftests/net: gro: add self-test for TCP CWR flag

Currently, GRO does not flush packets when the CWR bit is set.
A corresponding self-test is being added, in which the CWR flag
is set for two consecutive packets, but the first packet with the
CWR flag set will not be flushed immediately.

+===================+==========+===============+===========+
|     Packet id     | CWR flag |    Payload    | Flushing? |
+===================+==========+===============+===========+
|         0         |     0    |  PAYLOAD_LEN  |     0     |
|        ...        |     0    |  PAYLOAD_LEN  |     1     |
+-------------------+----------+---------------+-----------+
| NUM_PACKETS/2 - 1 |     1    |  payload_len  |     0     |
|   NUM_PACKETS/2   |     1    |  payload_len  |     1     |
+-------------------+----------+---------------+-----------+
|        ...        |     0    |  PAYLOAD_LEN  |     0     |
|   NUM_PACKETS     |     0    |  PAYLOAD_LEN  |     1     |
+===================+==========+===============+===========+

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-4-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agogro: flushing when CWR is set negatively affects AccECN
Ilpo Järvinen [Sat, 31 Jan 2026 22:25:02 +0000 (23:25 +0100)] 
gro: flushing when CWR is set negatively affects AccECN

As AccECN may keep CWR bit asserted due to different
interpretation of the bit, flushing with GRO because of
CWR may effectively disable GRO until AccECN counter
field changes such that CWR-bit becomes 0.

There is no harm done from not immediately forwarding the
CWR'ed segment with RFC3168 ECN.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-3-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agotcp: try to avoid safer when ACKs are thinned
Ilpo Järvinen [Sat, 31 Jan 2026 22:25:01 +0000 (23:25 +0100)] 
tcp: try to avoid safer when ACKs are thinned

Add newly acked pkts EWMA. When ACK thinning occurs, select
between safer and unsafe cep delta in AccECN processing based
on it. If the packets ACKed per ACK tends to be large, don't
conservatively assume ACE field overflow.

This patch uses the existing 2-byte holes in the rx group for new
u16 variables withtout creating more holes. Below are the pahole
outcomes before and after this patch:

[BEFORE THIS PATCH]
struct tcp_sock {
    [...]
    u32                        delivered_ecn_bytes[3]; /*  2744    12 */
    /* XXX 4 bytes hole, try to pack */

    [...]
    __cacheline_group_end__tcp_sock_write_rx[0];       /*  2816     0 */

    [...]
    /* size: 3264, cachelines: 51, members: 177 */
}

[AFTER THIS PATCH]
struct tcp_sock {
    [...]
    u32                        delivered_ecn_bytes[3]; /*  2744    12 */
    u16                        pkts_acked_ewma;        /*  2756     2 */
    /* XXX 2 bytes hole, try to pack */

    [...]
    __cacheline_group_end__tcp_sock_write_rx[0];       /*  2816     0 */

    [...]
    /* size: 3264, cachelines: 51, members: 178 */
}

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Co-developed-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-2-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agoMerge branch 'net-dsa-yt921x-add-dcb-qos-support'
Paolo Abeni [Tue, 3 Feb 2026 14:09:32 +0000 (15:09 +0100)] 
Merge branch 'net-dsa-yt921x-add-dcb-qos-support'

David Yang says:

====================
net: dsa: yt921x: Add DCB/QoS support

This series add DCB/QoS support to the driver.

v5: https://lore.kernel.org/r/20260128215202.2244266-1-mmyangfl@gmail.com
v4: https://lore.kernel.org/r/20260127020847.1482724-1-mmyangfl@gmail.com
v3: https://lore.kernel.org/r/20260125001328.3784006-1-mmyangfl@gmail.com
v2: https://lore.kernel.org/r/20260122194233.2777550-1-mmyangfl@gmail.com
v1: https://lore.kernel.org/r/20260119185935.2072685-1-mmyangfl@gmail.com
====================

Link: https://patch.msgid.link/20260131021854.3405036-1-mmyangfl@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agonet: dsa: yt921x: Add DCB/QoS support
David Yang [Sat, 31 Jan 2026 02:18:51 +0000 (10:18 +0800)] 
net: dsa: yt921x: Add DCB/QoS support

Set up global DSCP/PCP priority mappings and add related DCB methods.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260131021854.3405036-6-mmyangfl@gmail.com
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agonet: dsa: yt921x: Refactor yt921x_chip_setup()
David Yang [Sat, 31 Jan 2026 02:18:50 +0000 (10:18 +0800)] 
net: dsa: yt921x: Refactor yt921x_chip_setup()

yt921x_chip_setup() is already pretty long, and is going to become
longer. Split it into parts.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260131021854.3405036-5-mmyangfl@gmail.com
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agonet: dsa: yt921x: Refactor VLAN awareness setting
David Yang [Sat, 31 Jan 2026 02:18:49 +0000 (10:18 +0800)] 
net: dsa: yt921x: Refactor VLAN awareness setting

Create a helper function to centralize the logic for enabling and
disabling VLAN awareness on a port.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260131021854.3405036-4-mmyangfl@gmail.com
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agonet: dsa: tag_yt921x: add priority support
David Yang [Sat, 31 Jan 2026 02:18:48 +0000 (10:18 +0800)] 
net: dsa: tag_yt921x: add priority support

Required by DCB/QoS support of the switch driver, since the rx packets
will have non-zero priorities.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260131021854.3405036-3-mmyangfl@gmail.com
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agonet: dsa: tag_yt921x: clarify priority and code fields
David Yang [Sat, 31 Jan 2026 02:18:47 +0000 (10:18 +0800)] 
net: dsa: tag_yt921x: clarify priority and code fields

Packet priority is part of the tag, and the priority and code fields are
used by tx and rx. Make revisions to reflect the facts.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260131021854.3405036-2-mmyangfl@gmail.com
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agoMerge branch 'net-phy-remove-modalias-based-mdio-device-bus-matching'
Paolo Abeni [Tue, 3 Feb 2026 11:46:57 +0000 (12:46 +0100)] 
Merge branch 'net-phy-remove-modalias-based-mdio-device-bus-matching'

Heiner Kallweit says:

====================
net: phy: remove modalias-based MDIO device bus matching

modalias-based MDIO device bus matching has only one user (dsa-loop),
where we can replace modalias-based matching with a simple custom
match function. This, and first patch of the series, lay the foundation
for removing modalias-based matching.
====================

Link: https://patch.msgid.link/d9543e7d-23e1-4dba-a6b3-35dcd6a35dec@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agonet: phy: remove modalias-based mdio bus matching
Heiner Kallweit [Sat, 31 Jan 2026 17:40:00 +0000 (18:40 +0100)] 
net: phy: remove modalias-based mdio bus matching

Last user dsa_loop has been migrated away from modalias-based matching,
so we can remove this feature now. It was the only user of MDIO_NAME_SIZE,
so remove also this constant.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/ce1c6df0-4785-4b28-8322-32dc6bceea18@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agonet: dsa: loop: remove MDIO device modalias
Heiner Kallweit [Sat, 31 Jan 2026 17:38:56 +0000 (18:38 +0100)] 
net: dsa: loop: remove MDIO device modalias

This change is a prerequisite for removing the MDIO device modalias,
as dsa_loop is the only user. Switch from modalias to a custom
bus match function.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/15a4318f-50b5-4df5-874e-e387ee070a9d@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 days agonet: ethernet: adi: make name member of struct adin1110_cfg a pointer
Heiner Kallweit [Sat, 31 Jan 2026 17:37:15 +0000 (18:37 +0100)] 
net: ethernet: adi: make name member of struct adin1110_cfg a pointer

Primary reason for this change is to remove the misuse of MDIO_NAME_SIZE
here, so that this constant can be removed in a follow-up patch.
Use case here is simply a chip name w/o any relationship to a MDIO
device. Also there's no need to reserve a longer char array, so make
the name a pointer.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/61bc14fa-eed3-43b6-ae40-b98063e81578@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agoMerge branch 'devlink-and-mlx5-support-cross-function-rate-scheduling'
Jakub Kicinski [Tue, 3 Feb 2026 04:05:51 +0000 (20:05 -0800)] 
Merge branch 'devlink-and-mlx5-support-cross-function-rate-scheduling'

Tariq Toukan says:

====================
devlink and mlx5: Support cross-function rate scheduling [part]

Apply trivial cleanups from the series to make it smaller.
====================

Link: https://patch.msgid.link/20260128112544.1661250-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agodevlink: Refactor devlink_rate_nodes_check
Cosmin Ratiu [Wed, 28 Jan 2026 11:25:35 +0000 (13:25 +0200)] 
devlink: Refactor devlink_rate_nodes_check

devlink_rate_nodes_check() was used to verify there are no devlink rate
nodes created when switching the esw mode.

Rate management code is about to become more complex, so refactor this
function:
- remove unused param 'mode'.
- add a new 'rate_filter' param.
- rename to devlink_rates_check().
- expose devlink_rate_is_node() to be used as a rate filter.

This makes it more usable from multiple places, so use it from those
places as well.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260128112544.1661250-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agodevlink: Reverse locking order for nested instances
Cosmin Ratiu [Wed, 28 Jan 2026 11:25:33 +0000 (13:25 +0200)] 
devlink: Reverse locking order for nested instances

Commit [1] defined the locking expectations for nested devlink
instances: the nested-in devlink instance lock needs to be acquired
before the nested devlink instance lock. The code handling devlink rels
was architected with that assumption in mind.

There are no actual users of double locking yet but that is about to
change in the upcoming patches in the series.

Code operating on nested devlink instances will require also obtaining
the nested-in instance lock, but such code may already be called from a
variety of places with the nested devlink instance lock. Then, there's
no way to acquire the nested-in lock other than making sure that all
callers acquire it first.

Reversing the nested lock order allows incrementally acquiring the
nested-in instance lock when needed (perhaps even a chain of locks up to
the root) without affecting any caller.

The only affected use of nesting is devlink_nl_nested_fill(), which
iterates over nested devlink instances with the RCU lock, without
locking them, so there's no possibility of deadlock.

So this commit just updates a comment regarding the nested locks.

[1] commit c137743bce02b ("devlink: introduce object and nested devlink
relationship infra")

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260128112544.1661250-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoMerge branch 'net-stmmac-pcs-preparation'
Jakub Kicinski [Tue, 3 Feb 2026 03:16:05 +0000 (19:16 -0800)] 
Merge branch 'net-stmmac-pcs-preparation'

Russell King says:

====================
net: stmmac: pcs preparation

These three patches prepare for the PCS changes, which, subject
to Qualcomm testing, should be coming in the next cycle.
====================

Link: https://patch.msgid.link/aXyRlFw7ZuhRPiKo@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: stmmac: handle integrated PCS phy_intf_sel separately
Russell King (Oracle) [Fri, 30 Jan 2026 11:10:36 +0000 (11:10 +0000)] 
net: stmmac: handle integrated PCS phy_intf_sel separately

The dwmac core has no support for SGMII without using its integrated
PCS. Thus, PHY_INTF_SEL_SGMII is only supported when this block is
present, and it makes no sense for stmmac_get_phy_intf_sel() to decode
this.

None of the platform glue users that use stmmac_get_phy_intf_sel()
directly accept PHY_INTF_SEL_SGMII as a valid mode.

Check whether a PCS will be used by the driver for the interface mode,
and if it is the integrated PCS, query the integrated PCS for the
phy_intf_sel_i value to use.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vlmOa-00000006zvB-1fIe@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: stmmac: move most PCS register definitions to stmmac_pcs.c
Russell King (Oracle) [Fri, 30 Jan 2026 11:10:31 +0000 (11:10 +0000)] 
net: stmmac: move most PCS register definitions to stmmac_pcs.c

Move most of the PCS register offset definitions to stmmac_pcs.c.
Since stmmac_pcs.c only ever passes zero into the register offset
macros, remove that ability, making them simple constant integer
definitions.

Add appropriate descriptions of the registers, pointing out their
similarity with their IEEE 802.3 counterparts. Make use of the
BMSR definitions for the GMAC_AN_STATUS register and remove the
driver private versions.

Note that BMSR_LSTATUS is non-low-latching, unlike it's 802.3z
counterpart.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vlmOV-00000006zv5-1CwO@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: stmmac: clear half-duplex caps where unsupported
Russell King (Oracle) [Fri, 30 Jan 2026 11:10:26 +0000 (11:10 +0000)] 
net: stmmac: clear half-duplex caps where unsupported

Where a core supports hardware features, but does not indicate support
for half-duplex, clear phylink's half-duplex 1G, 100M and 10M
capability bits to disallow half-duplex operation and advertisement of
these link modes.

This will avoid the need for special code in the PCS driver to do this
based on the ESTATUS register bits, as the support in the PCS is
dependent on the same synthesis choice as the MAC core.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vlmOQ-00000006zuz-0ffN@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoMerge branch 'add-support-for-renesas-rz-g3l-gbeth'
Jakub Kicinski [Tue, 3 Feb 2026 03:12:19 +0000 (19:12 -0800)] 
Merge branch 'add-support-for-renesas-rz-g3l-gbeth'

Biju Das says:

====================
Add support for Renesas RZ/G3L GBETH

From: Biju Das <biju.das.jz@bp.renesas.com>

The Renesas RZ/G3L GBETH IP uses Synopsys DesignWare MAC version 5.30
compared to other Renesas SoC such as RZ/V2H that use MAC version 5.20.

The RZ/G3L GBETH requires an extra clock compared to RZ/G3E and has pps
interrupts. Document the Renesas RZ/G3L GBETH IP in bindings and add
support for the RZ/G3L GBETH in dwmac-renesas-gbeth glue driver.
====================

Link: https://patch.msgid.link/20260131161250.5047-1-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: stmmac: dwmac-renesas-gbeth: Add support for RZ/G3L SoC
Biju Das [Sat, 31 Jan 2026 16:12:43 +0000 (16:12 +0000)] 
net: stmmac: dwmac-renesas-gbeth: Add support for RZ/G3L SoC

Compared to other Renesas GBETH stmmac glue drivers, RZ/G3L GBETH IP use
the version Synopsys DesignWare MAC (version 5.30). It has an extra clock
compared to RZ/V2H and has ptp_pps_o interrupts. Add support for RZ/G3L
GBETH by reusing device data of RZ/V2H and can be extended to add other
functionalities later.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://patch.msgid.link/20260131161250.5047-3-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agodt-bindings: net: renesas,rzv2h-gbeth: Document Renesas RZ/G3L SoC
Biju Das [Sat, 31 Jan 2026 16:12:42 +0000 (16:12 +0000)] 
dt-bindings: net: renesas,rzv2h-gbeth: Document Renesas RZ/G3L SoC

Add device tree binding support for the Gigabit Ethernet (GBETH) IP on
Renesas RZ/G3L SoC. This SoC uses different Synopsys DesignWare MAC
version 5.30 compared to RZ/G3E.

RZ/G3L requires an extra clock compared to RZ/G3E and has pps interrupts.

Add a new compatible string "renesas,r9a08g046-gbeth" for RZ/G3L SoC and
update the schema to handle hardware differences between SoC variants.

Extend the base snps,dwmac.yaml schema to accommodate the PPS interrupts.

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://patch.msgid.link/20260131161250.5047-2-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoMerge branch 'mptcp-implement-read_sock-and-splice_read'
Jakub Kicinski [Tue, 3 Feb 2026 02:15:35 +0000 (18:15 -0800)] 
Merge branch 'mptcp-implement-read_sock-and-splice_read'

Matthieu Baerts says:

====================
mptcp: implement .read_sock and .splice_read

This series is a preparation work for future in-kernel MPTCP sockets
usage. Here, two interfaces are implemented: read_sock and splice_read.
As a result of this series, splice() with MPTCP sockets -- which was
already supported -- is now improved.

- Patches 1-2: .read_sock implementation

- Patches 3-4: .splice_read implementation

- Patches 5-6: validate splice() support with MPTCP sockets.
====================

Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-0-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoselftests: mptcp: connect: cover splice mode
Geliang Tang [Fri, 30 Jan 2026 19:24:29 +0000 (20:24 +0100)] 
selftests: mptcp: connect: cover splice mode

The "splice" alternate mode for mptcp_connect.sh/.c is available now,
this patch adds mptcp_connect_splice.sh to test it in the MPTCP CI by
default.

Note that this mode is also supported by stable kernel versions, but
optimised in this patch series.

Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-6-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoselftests: mptcp: add splice io mode
Geliang Tang [Fri, 30 Jan 2026 19:24:28 +0000 (20:24 +0100)] 
selftests: mptcp: add splice io mode

This patch adds a new 'splice' io mode for mptcp_connect to test
the newly added read_sock() and splice_read() functions of MPTCP.

do_splice() efficiently transfers data directly between two file
descriptors (infd and outfd) without copying to userspace, using
Linux's splice() system call.

Usage:
./mptcp_connect.sh -m splice

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-5-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agomptcp: implement .splice_read
Geliang Tang [Fri, 30 Jan 2026 19:24:27 +0000 (20:24 +0100)] 
mptcp: implement .splice_read

This patch implements .splice_read interface of mptcp struct proto_ops
as mptcp_splice_read() with reference to tcp_splice_read().

Corresponding to __tcp_splice_read(), __mptcp_splice_read() is defined,
invoking mptcp_read_sock() instead of tcp_read_sock().

mptcp_splice_read() is almost the same as tcp_splice_read(), except for
sock_rps_record_flow().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-4-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agotcp: export tcp_splice_state
Geliang Tang [Fri, 30 Jan 2026 19:24:26 +0000 (20:24 +0100)] 
tcp: export tcp_splice_state

Export struct tcp_splice_state and tcp_splice_data_recv() in net/tcp.h
so that they can be used by MPTCP in the next patch.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Acked-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-3-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agomptcp: implement .read_sock
Geliang Tang [Fri, 30 Jan 2026 19:24:25 +0000 (20:24 +0100)] 
mptcp: implement .read_sock

Current in-kernel TCP sockets -- i.e. from nvme_tcp_try_recv() -- need
to call .read_sock interface of struct proto_ops, but it's not
implemented in MPTCP.

This patch implements it with reference to __tcp_read_sock() and
__mptcp_recvmsg_mskq().

Corresponding to tcp_recv_skb(), a new helper for MPTCP named
mptcp_recv_skb() is added to peek a skb from sk->sk_receive_queue.

Compared with __mptcp_recvmsg_mskq(), mptcp_read_sock() uses
sk->sk_rcvbuf as the max read length. The LISTEN status is checked
before the while loop, and mptcp_recv_skb() and mptcp_cleanup_rbuf()
are invoked after the loop. In the loop, all flags checks for
__mptcp_recvmsg_mskq() are removed.

Reviewed-by: Hannes Reinecke <hare@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-2-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agomptcp: add eat_recv_skb helper
Geliang Tang [Fri, 30 Jan 2026 19:24:24 +0000 (20:24 +0100)] 
mptcp: add eat_recv_skb helper

This patch extracts the free skb related code in __mptcp_recvmsg_mskq()
into a new helper mptcp_eat_recv_skb().

This new helper will be used in the next patch.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-1-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoMerge branch 'ptp-vmclock-add-vm-generation-counter-and-acpi-notification'
Jakub Kicinski [Tue, 3 Feb 2026 02:06:02 +0000 (18:06 -0800)] 
Merge branch 'ptp-vmclock-add-vm-generation-counter-and-acpi-notification'

Takahiro Itazuri says:

====================
ptp: vmclock: Add VM generation counter and ACPI notification

Similarly to live migration, starting a VM from some serialized state
(aka snapshot) is an event which calls for adjusting guest clocks, hence
a hypervisor should increase the disruption_marker before resuming the
VM vCPUs, letting the guest know.

However, loading a snapshot, is slightly different than live migration,
especially since we can start multiple VMs from the same serialized
state. Apart from adjusting clocks, the guest needs to take additional
action during such events, e.g. recreate UUIDs, reset network
adapters/connections, reseed entropy pools, etc. These actions are not
necessary during live migration. This calls for a differentiation
between the two triggering events.

We differentiate between the two events via an extra field in the
vmclock_abi, called vm_generation_counter. Whereas hypervisors should
increase the disruption marker in both cases, they should only increase
vm_generation_counter when a snapshot is loaded in a VM (not during live
migration).

Additionally, we attach an ACPI notification to VMClock. Implementing
the notification is optional for the device. VMClock device will declare
that it implements the notification by setting
VMCLOCK_FLAG_NOTIFICATION_PRESENT bit in vmclock_abi flags. Hypervisors
that implement the notification must send an ACPI notification every
time seq_count changes to an even number. The driver will propagate
these notifications to userspace via the poll() interface.
====================

Link: https://patch.msgid.link/20260130173704.12575-1-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoptp: ptp_vmclock: return TAI not UTC
David Woodhouse [Fri, 30 Jan 2026 17:36:06 +0000 (17:36 +0000)] 
ptp: ptp_vmclock: return TAI not UTC

To output UTC would involve complex calculations about whether the time
elapsed since the reference time has crossed the end of the month when
a leap second takes effect. I've prototyped that, but it made me sad.

Much better to report TAI, which is what PHCs should do anyway.
And much much simpler.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Tested-by: Takahiro Itazuri <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-8-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoptp: ptp_vmclock: remove dependency on CONFIG_ACPI
David Woodhouse [Fri, 30 Jan 2026 17:36:05 +0000 (17:36 +0000)] 
ptp: ptp_vmclock: remove dependency on CONFIG_ACPI

Now that we added device tree support we can remove dependency on
CONFIG_ACPI.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Tested-by: Takahiro Itazuri <itazur@amazon.dom>
Link: https://patch.msgid.link/20260130173704.12575-7-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoptp: ptp_vmclock: add 'VMCLOCK' to ACPI device match
David Woodhouse [Fri, 30 Jan 2026 17:36:04 +0000 (17:36 +0000)] 
ptp: ptp_vmclock: add 'VMCLOCK' to ACPI device match

As we finalised the spec, we spotted that vmgenid actually says that the
_HID is supposed to be hypervisor-specific. Although in the 13 years
since the original vmgenid doc was published, nobody seems to have cared
about using _HID to distinguish between implementations on different
hypervisors, and we only ever use the _CID.

For consistency, match the _CID of "VMCLOCK" too.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Tested-by: Takahiro Itazuri <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-6-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoptp: ptp_vmclock: Add device tree support
David Woodhouse [Fri, 30 Jan 2026 17:36:03 +0000 (17:36 +0000)] 
ptp: ptp_vmclock: Add device tree support

Add device tree support to the ptp_vmclock driver, allowing it to probe
via device tree in addition to ACPI.

Handle optional interrupt for clock disruption notifications, mirroring
the ACPI notification behaviour.

Although the interrupt is marked as 'optional' in the DT bindings, if
the device *advertises* the VMCLOCK_FLAG_NOTIFICATION_ABSENT then it
*should* have an interrupt. The driver will refuse to initialize if not.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
Tested-by: Takahiro Itazuri <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-5-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agodt-bindings: ptp: Add amazon,vmclock
David Woodhouse [Fri, 30 Jan 2026 17:36:02 +0000 (17:36 +0000)] 
dt-bindings: ptp: Add amazon,vmclock

The vmclock device provides a PTP clock source and precise timekeeping
across live migration and snapshot/restore operations.

The binding has a required memory region containing the vmclock_abi
structure and an optional interrupt for clock disruption notifications.

The full spec is at https://uapi-group.org/specifications/specs/vmclock/

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Tested-by: Takahiro Itazuri <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-4-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoptp: vmclock: support device notifications
Babis Chalios [Fri, 30 Jan 2026 17:36:01 +0000 (17:36 +0000)] 
ptp: vmclock: support device notifications

Add optional support for device notifications in VMClock. When
supported, the hypervisor will send a device notification every time it
updates the seq_count to a new even value.

Moreover, add support for poll() in VMClock as a means to propagate this
notification to user space. poll() will return a POLLIN event to
listeners every time seq_count changes to a value different than the one
last seen (since open() or last read()/pread()). This means that when
poll() returns a POLLIN event, listeners need to use read() to observe
what has changed and update the reader's view of seq_count. In other
words, after a poll() returned, all subsequent calls to poll() will
immediately return with a POLLIN event until the listener calls read().

The device advertises support for the notification mechanism by setting
flag VMCLOCK_FLAG_NOTIFICATION_PRESENT in vmclock_abi flags field. If
the flag is not present the driver won't setup the ACPI notification
handler and poll() will always immediately return POLLHUP.

Signed-off-by: Babis Chalios <bchalios@amazon.es>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Takahiro Itazuri <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-3-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoptp: vmclock: add vm generation counter
Babis Chalios [Fri, 30 Jan 2026 17:36:00 +0000 (17:36 +0000)] 
ptp: vmclock: add vm generation counter

Similar to live migration, loading a VM from some saved state (aka
snapshot) is also an event that calls for clock adjustments in the
guest. However, guests might want to take more actions as a response to
such events, e.g. as discarding UUIDs, resetting network connections,
reseeding entropy pools, etc. These are actions that guests don't
typically take during live migration, so add a new field in the
vmclock_abi called vm_generation_counter which informs the guest about
such events.

Hypervisor advertises support for vm_generation_counter through the
VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT flag. Users need to check the
presence of this bit in vmclock_abi flags field before using this flag.

Signed-off-by: Babis Chalios <bchalios@amazon.es>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Takahiro Itazur <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-2-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoMerge branch 'ipv6-misc-changes-in-output-path'
Jakub Kicinski [Tue, 3 Feb 2026 01:49:31 +0000 (17:49 -0800)] 
Merge branch 'ipv6-misc-changes-in-output-path'

Eric Dumazet says:

====================
ipv6: misc changes in output path

Small optimizations mostly in ip6_xmit() path.

TX performance increases by about 3 %.

Patches 5-7: add dst4_mtu() and dst6_mtu() to save space.

Last patch colocates inet6_cork in inet_cork_full.

This series reduces kernel size by 494 bytes on x86_64:

scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 4/2 grow/shrink: 9/23 up/down: 665/-1159 (-494)
Function                                     old     new   delta
ip6_finish_output_gso_slowpath_drop            -     197    +197
ip6_xmit                                    1452    1595    +143
do_ipv6_getsockopt                          2855    2950     +95
kzalloc_noprof                                 -      55     +55
ip4ip6_err                                   918     955     +37
__icmp_send                                 1499    1532     +33
do_ip_getsockopt                            2573    2605     +32
__ip6_append_data                           4109    4137     +28
__pfx_kzalloc_noprof                           -      16     +16
__pfx_ip6_finish_output_gso_slowpath_drop       -      16     +16
ipmr_prepare_xmit                           1232    1238      +6
ip6_forward                                 1905    1909      +4
ip6_cork_release                             108     111      +3
ipv6_push_nfrag_opts                         489     486      -3
ipv6_push_frag_opts                           90      87      -3
ip6_finish_output2                          1446    1437      -9
ip6_tnl_xmit                                2639    2627     -12
ip6_default_advmss                           176     160     -16
__ip6_rt_update_pmtu                        1087    1071     -16
tcp_v6_syn_recv_sock                        1715    1696     -19
tcp_v4_syn_recv_sock                        1107    1088     -19
__ip_make_skb                               1339    1320     -19
ip_setup_cork                                406     385     -21
ip6_setup_cork                               732     710     -22
rawv6_push_pending_frames                    581     556     -25
ip6_push_pending_frames                      184     157     -27
udpv6_splice_eof                             203     170     -33
ip6_flush_pending_frames                     220     183     -37
ip6_append_data                              349     312     -37
udp_v6_push_pending_frames                   155     115     -40
sit_tunnel_xmit                             1957    1914     -43
__pfx_dst_mtu                                 64       -     -64
tcp_v4_mtu_reduced                           289     220     -69
tcp_v6_mtu_reduced                           209     139     -70
ip6_make_skb                                 574     484     -90
ip6_finish_output                            827     697    -130
dst_mtu                                      160       -    -160
fib6_nh_mtu_change                           511     336    -175
Total: Before=22584400, After=22583906, chg -0.00%
====================

Link: https://patch.msgid.link/20260130210303.3888261-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoipv6: colocate inet6_cork in inet_cork_full
Eric Dumazet [Fri, 30 Jan 2026 21:03:03 +0000 (21:03 +0000)] 
ipv6: colocate inet6_cork in inet_cork_full

All inet6_cork users also use one inet_cork_full.

Reduce number of parameters and increase data locality.

This saves ~275 bytes of code on x86_64.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoipv4: use dst4_mtu() instead of dst_mtu()
Eric Dumazet [Fri, 30 Jan 2026 21:03:02 +0000 (21:03 +0000)] 
ipv4: use dst4_mtu() instead of dst_mtu()

When we expect an IPv4 dst, use dst4_mtu() instead of dst_mtu()
to save some code space.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoipv6: use dst6_mtu() instead of dst_mtu()
Eric Dumazet [Fri, 30 Jan 2026 21:03:01 +0000 (21:03 +0000)] 
ipv6: use dst6_mtu() instead of dst_mtu()

When we expect an IPv6 dst, use dst6_mtu() instead of dst_mtu()
to save some code space.

Due to current dst6_mtu() implementation, only convert
users in IPv6 stack.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoinet: add dst4_mtu() and dst6_mtu() helpers
Eric Dumazet [Fri, 30 Jan 2026 21:03:00 +0000 (21:03 +0000)] 
inet: add dst4_mtu() and dst6_mtu() helpers

With CONFIG_MITIGATION_RETPOLINE=y dst_mtu() is a bit fat,
because it is generic.

Indeed, clang does not always inline it.

Add dst4_mtu() and dst6_mtu() helpers for callers that
expect either ipv4_mtu() or ip6_mtu() to be called.

These helpers are always inlined.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoipv6: use SKB_DROP_REASON_PKT_TOO_BIG in ip6_xmit()
Eric Dumazet [Fri, 30 Jan 2026 21:02:59 +0000 (21:02 +0000)] 
ipv6: use SKB_DROP_REASON_PKT_TOO_BIG in ip6_xmit()

When a too big packet is dropped, use SKB_DROP_REASON_PKT_TOO_BIG.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoipv6: use __skb_push() in ip6_xmit()
Eric Dumazet [Fri, 30 Jan 2026 21:02:58 +0000 (21:02 +0000)] 
ipv6: use __skb_push() in ip6_xmit()

ip6_xmit() makes sure there is enough headroom in the skb,
it can uses __skb_push() instead of the out-of-line skb_push().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoipv6: add some unlikely()/likely() clauses in ip6_output.c
Eric Dumazet [Fri, 30 Jan 2026 21:02:57 +0000 (21:02 +0000)] 
ipv6: add some unlikely()/likely() clauses in ip6_output.c

1) daddr is unlikely a multicast in ip6_finish_output2().

2) ip6_finish_output_gso_slowpath_drop() should not be called often.

3) ip6_fragment() should not be called often.

4) opt is unlikely to be set.

5) ip6_xmit() and ip6_forward() mostly sends not too big packets.

6) Most __ip6_make_skb() calls are for UDP packets,
   not ICMPV6 ones.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoipv6: pass proto by value to ipv6_push_nfrag_opts() and ipv6_push_frag_opts()
Eric Dumazet [Fri, 30 Jan 2026 21:02:56 +0000 (21:02 +0000)] 
ipv6: pass proto by value to ipv6_push_nfrag_opts() and ipv6_push_frag_opts()

With CONFIG_STACKPROTECTOR_STRONG=y, it is better to avoid passing
a pointer to an automatic variable.

Change these exported functions to return 'u8 proto'
instead of void.

- ipv6_push_nfrag_opts()
- ipv6_push_frag_opts()

For instance, replace
ipv6_push_frag_opts(skb, opt, &proto);
with:
proto = ipv6_push_frag_opts(skb, opt, proto);

Note that even after this change, ip6_xmit() has to use a stack canary
because of @first_hop variable.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: remove unnecessary module_init/exit functions
Ethan Nelson-Moore [Sat, 31 Jan 2026 00:42:56 +0000 (16:42 -0800)] 
net: remove unnecessary module_init/exit functions

Many network drivers have unnecessary empty module_init and module_exit
functions. Remove them (including some that just print a message). Note
that if a module_init function exists, a module_exit function must also
exist; otherwise, the module cannot be unloaded.

Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260131004327.18112-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: ethernet: use module_pci_driver; remove useless driver versions
Ethan Nelson-Moore [Sat, 31 Jan 2026 02:24:30 +0000 (18:24 -0800)] 
net: ethernet: use module_pci_driver; remove useless driver versions

The module version is useless, and the only thing these drivers' init
routines did besides pci_register_driver was to print the driver name
and/or version.

Acked-by: Francois Romieu <romieu@fr.zoreil.com> (epic100)
Reviewed-by: Simon Horman <horms@kernel.org> (epic100, sis900)
Reviewed-by: Sai Krishna <saikrishnag@marvell.com> (epic100)
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Link: https://patch.msgid.link/20260131022441.56274-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: add a debug check in __skb_push()
Eric Dumazet [Fri, 30 Jan 2026 16:02:53 +0000 (16:02 +0000)] 
net: add a debug check in __skb_push()

Add the following check, to detect bugs sooner for CONFIG_DEBUG_NET=y
builds.

DEBUG_NET_WARN_ON_ONCE(skb->data < skb->head);

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260130160253.2936789-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>