git.ipfire.org Git - thirdparty/kernel/linux.git/log
thirdparty/kernel/linux.git
9 days ago Merge branch 'net-ethernet-renesas-rcar_gen4_ptp-hide-private-data'
Jakub Kicinski [Wed, 4 Feb 2026 03:35:40 +0000 (19:35 -0800)] 
Merge branch 'net-ethernet-renesas-rcar_gen4_ptp-hide-private-data'

Niklas Söderlund says:

====================
net: ethernet: renesas: rcar_gen4_ptp: Hide private data

The R-Car Gen4 PTP module started out as an exclusive feature of a
single driver, but has since been extended to cover both R-Car Switch
and TSN driver implementations on Gen4.

The feature has already been extended to be built as its own module
with an interface exposed through a local header file. The header file,
however, also exposes the module's private data structure. The two
existing users have already started to poke at members of the struct.

The exposed private data being manipulated by users makes refactoring
and future rework hard, as the interface for the module becomes too
chaotic. This small series aims to create two helpers to hide the
private data.

This is done as a small preparation before a third, new user of the
Gen4 PTP will be added in a follow-up series.
====================

Link: https://patch.msgid.link/20260201183745.1075399-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago net: ethernet: renesas: rcar_gen4_ptp: Hide private data from users
Niklas Söderlund [Sun, 1 Feb 2026 18:37:45 +0000 (19:37 +0100)] 
net: ethernet: renesas: rcar_gen4_ptp: Hide private data from users

The Gen4 PTP helper module is already used by RTSN and RSWITCH to
support PTP clocks and will be used by RAVB too. Hide the Gen4 PTP
private data structure to make sure none of the users poke at it.

This will be more important for RAVB use-cases as more than one RAVB
device will need to cooperate using one PTP clock source.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20260201183745.1075399-5-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago net: ethernet: renesas: rcar_gen4_ptp: Add helper to read time
Niklas Söderlund [Sun, 1 Feb 2026 18:37:44 +0000 (19:37 +0100)] 
net: ethernet: renesas: rcar_gen4_ptp: Add helper to read time

Instead of accessing the Gen4 PTP specific structure directly in drivers,
add a helper to read the time. This is done in preparation to
completely hide the Gen4 PTP specific structure from users.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20260201183745.1075399-4-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago net: ethernet: renesas: rcar_gen4_ptp: Add helper to get clock index
Niklas Söderlund [Sun, 1 Feb 2026 18:37:43 +0000 (19:37 +0100)] 
net: ethernet: renesas: rcar_gen4_ptp: Add helper to get clock index

Instead of accessing the Gen4 PTP specific structure directly in drivers,
add a helper to read the clock index. This is done in preparation to
completely hide the Gen4 PTP specific structure from users.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20260201183745.1075399-3-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago net: ethernet: renesas: rcar_gen4_ptp: Move address assignment
Niklas Söderlund [Sun, 1 Feb 2026 18:37:42 +0000 (19:37 +0100)] 
net: ethernet: renesas: rcar_gen4_ptp: Move address assignment

Instead of accessing the Gen4 PTP specific structure directly in drivers,
move the device address assignment into the preparation call. This is
done in preparation to completely hide the Gen4 PTP specific structure
from users.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20260201183745.1075399-2-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago net: bridge: use sysfs_emit instead of sprintf
David Corvaglia [Mon, 2 Feb 2026 19:09:41 +0000 (19:09 +0000)] 
net: bridge: use sysfs_emit instead of sprintf

Replace sprintf with sysfs_emit in sysfs show() methods as outlined in
Documentation/filesystems/sysfs.rst.

sysfs_emit is preferred to sprintf in sysfs show() methods as it is safer
with buffer handling.

Signed-off-by: David Corvaglia <david@corvaglia.dev>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/0100019c1fc2bcc3-bc9ca2f1-22d7-4250-8441-91e4af57117b-000000@email.amazonses.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago bng_en: fix misleading error message for generic firmware version
Alok Tiwari [Mon, 2 Feb 2026 03:38:43 +0000 (19:38 -0800)] 
bng_en: fix misleading error message for generic firmware version

The devlink info_get handler incorrectly reports "roce firmware" when
populating the generic firmware version field.

Update the error message to correctly describe the failing operation.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Link: https://patch.msgid.link/20260202033848.22993-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago Merge branch 'net-stmmac-rk-cleanups-v3-mode-and-speed-for-most'
Jakub Kicinski [Wed, 4 Feb 2026 02:01:47 +0000 (18:01 -0800)] 
Merge branch 'net-stmmac-rk-cleanups-v3-mode-and-speed-for-most'

Russell King says:

====================
net: stmmac: rk: cleanups v3: mode and speed for most

Third installment in the rk cleanups, this converts the interface mode
and speed configuration for most RK SoCs.
====================

Link: https://patch.msgid.link/aYB2cKRu3DQh6yXK@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago net: stmmac: rk: convert px30
Russell King (Oracle) [Mon, 2 Feb 2026 10:04:41 +0000 (10:04 +0000)] 
net: stmmac: rk: convert px30

Use rk_set_clk_mac_speed() rather than the px30-specific function for
configuring the RMII clock.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vmqnR-00000007VDE-2fM1@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago net: stmmac: rk: remove need for ->set_speed() method
Russell King (Oracle) [Mon, 2 Feb 2026 10:04:36 +0000 (10:04 +0000)] 
net: stmmac: rk: remove need for ->set_speed() method

As we can detect whether the SoC provides the parameters necessary for
rk_set_reg_speed(), we don't need to have explicit calls to this.
Instead, we can move the contents of this function to
rk_set_clk_tx_rate().

This removes all the .set_speed() implementations that merely go on to
invoke rk_set_reg_speed().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vmqnM-00000007VD8-1xWo@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago net: stmmac: rk: use rk_encode_wm16() for RMII clock
Russell King (Oracle) [Mon, 2 Feb 2026 10:04:31 +0000 (10:04 +0000)] 
net: stmmac: rk: use rk_encode_wm16() for RMII clock

The RMII clock is a single bit, which is set for 100M and clear for
10M. Move this out of struct rk_reg_speed_data (which gets rid of
this structure) into the struct rk_clock_fields as the bitmask for
this bit.

This gets rid of the per-SoC variability in the calls to
rk_set_reg_speed().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vmqnH-00000007VCz-1WmP@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
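
A minimal sketch of the single-bit RMII clock encoding described in the
commit above, modelled in Python so it runs outside the kernel. The function
name rmii_clock_regval and the example mask are hypothetical, and the
write-enable layout (upper 16 bits enable changes to the lower 16 bits) is
assumed from the "convert to mask-based interface mode configuration" commit
in this series rather than read from the driver source.

```python
def rmii_clock_regval(clock_bit_mask: int, speed_mbps: int) -> int:
    """Return a 32-bit value with the RMII clock bit set for 100M, clear for 10M.

    Hypothetical model, not the driver's code: clock_bit_mask must be a
    single bit within the lower 16 bits of the register.
    """
    single_bit = clock_bit_mask > 0 and clock_bit_mask & (clock_bit_mask - 1) == 0
    if not single_bit or clock_bit_mask > 0xFFFF:
        raise ValueError("expected a single-bit mask in the lower 16 bits")
    if speed_mbps not in (10, 100):
        raise ValueError("RMII runs at 10M or 100M only")

    low_half = clock_bit_mask if speed_mbps == 100 else 0
    # Upper half: write-enable for the bit. Lower half: the new bit state.
    return (clock_bit_mask << 16) | low_half
```

With an illustrative mask of 0x0080, 100M yields 0x00800080 and 10M yields
0x00800000, so only the clock bit is touched in either case.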
9 days ago net: stmmac: rk: use rk_encode_wm16() for RMII speed
Russell King (Oracle) [Mon, 2 Feb 2026 10:04:26 +0000 (10:04 +0000)] 
net: stmmac: rk: use rk_encode_wm16() for RMII speed

The RMII speed configuration is encoded as a single bit, which is set
for 100M and clear for 10M. Provide the bitfield definition in
struct rk_clock_fields, moving it out of struct rk_reg_speed_data's
rmii_10 and rmii_100 initialisers. Update rk_set_reg_speed() to handle
the new definition location of this bit.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vmqnC-00000007VCt-0oRg@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago net: stmmac: rk: use rk_encode_wm16() for RGMII clocks
Russell King (Oracle) [Mon, 2 Feb 2026 10:04:21 +0000 (10:04 +0000)] 
net: stmmac: rk: use rk_encode_wm16() for RGMII clocks

As all of the RGMII clock selection bitfields (gmii_clk_sel) use the
same encoding, parameterise this by providing the bitfield mask in
the BSP private data.

This is the last user of GRF_FIELD_CONST(), so remove that definition
as well.

One additional change is for RK3328 - as only gmac2io supports RGMII,
only initialise the mask for this instance.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vmqn7-00000007VCn-0OZA@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago net: stmmac: rk: remove rk3528 RMII clock initialisation
Russell King (Oracle) [Mon, 2 Feb 2026 10:04:15 +0000 (10:04 +0000)] 
net: stmmac: rk: remove rk3528 RMII clock initialisation

There is no need to pre-initialise the rk3528 RMII clock when
selecting RMII mode on gmac0.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vmqn1-00000007VCh-47Sv@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago net: stmmac: rk: convert rk3588 to rk_set_reg_speed()
Russell King (Oracle) [Mon, 2 Feb 2026 10:04:10 +0000 (10:04 +0000)] 
net: stmmac: rk: convert rk3588 to rk_set_reg_speed()

Update rk_set_reg_speed() to use either the grf or php_grf regmap
depending on the SoC's requirements and convert rk3588, removing
its custom code.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vmqmw-00000007VCb-3glG@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago net: stmmac: rk: move speed GRF register offset to private data
Russell King (Oracle) [Mon, 2 Feb 2026 10:04:05 +0000 (10:04 +0000)] 
net: stmmac: rk: move speed GRF register offset to private data

Move the speed/clocking related GRF register offset into the driver
private data, convert rk_set_reg_speed() to use it and initialise this
member either from the corresponding member in struct rk_gmac_ops, or
the SoC specific initialisation function.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vmqmr-00000007VCV-3Cz8@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago net: stmmac: rk: convert rk3588 to mask-based interface mode config
Russell King (Oracle) [Mon, 2 Feb 2026 10:04:00 +0000 (10:04 +0000)] 
net: stmmac: rk: convert rk3588 to mask-based interface mode config

rk3588 has a quirk compared to the other Rockchip implementations in
that the interface mode configuration register is in the php_grf
regmap rather than the grf regmap. Add a flag to indicate this, and
a separate function to write to the appropriate regmap. This allows
rk3588 to be converted.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vmqmm-00000007VCP-2XZc@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago net: stmmac: rk: convert to mask-based interface mode configuration
Russell King (Oracle) [Mon, 2 Feb 2026 10:03:55 +0000 (10:03 +0000)] 
net: stmmac: rk: convert to mask-based interface mode configuration

The majority of Rockchip implementations require three common pieces
of information to configure the PHY interface mode:

- The grf register offset for configuring the GMAC phy_intf_sel field
  and the RMII mode bit.
- The bitfield in this register for the GMAC's phy_intf_sel.
- The bit position of the RMII mode bit, which is set for RMII mode but
  clear for RGMII mode.

Introduce members for this information into struct rk_priv_data and
struct rk_gmac_ops, which will be used to pre-initialise the struct
rk_priv_data members. We describe the register contents using
bitfields, even for those that are a single bit, for consistency.

As each register comprises two halves, where the upper half enables
changing the bit state in the lower half, we can describe these
bitfields using a 16-bit data type, and provide rk_encode_wm16() to
generate the actual register values from the field mask and field
value. We are unable to use the FIELD_PREP_WM16() macros for this as
these require the field mask to be a constant.

Add code to rk_gmac_powerup() to get the phy_intf_sel value, validating
that the resulting mode is either RMII or RGMII. No other modes are
supported by any of the Rockchip SoCs supported by this driver.

If either of the bitfield mask values is populated in struct
rk_priv_data, use these to generate the register contents, and write
the resulting value to the specified GRF register.

Convert many Rockchip implementations to use this new infrastructure.
For those where there is a single GMAC instance, it is merely a case of
filling in the new members of struct rk_gmac_ops. For those with
multiple instances, one or more of these members depends on the GMAC
instance, so setup of the members in struct rk_gmac has to be done via
the .init method of struct rk_gmac_ops. The corresponding code is
removed from the set_to_rgmii() and set_to_rmii() implementations.

Since the member name documents the purpose of the field that is being
initialised, providing preprocessor macros to define the bitfields is
deemed to be less than useful given the massive size of this driver.

The existing mechanisms remain behind for those SoCs that cannot be
converted to this scheme.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
v2: disable clocks on failure
Link: https://patch.msgid.link/E1vmqmh-00000007VCJ-1xns@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
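
The write-mask encoding that the commit above attributes to rk_encode_wm16()
can be sketched as follows. This is a Python model under stated assumptions,
not the kernel implementation: the name encode_wm16, the argument order, and
the range checks are hypothetical. Only the layout (upper 16 bits enable
changes to the lower 16 bits, and the mask is supplied at run time) comes
from the commit message.

```python
def encode_wm16(field_mask: int, field_value: int) -> int:
    """Build a 32-bit write-mask register value from a 16-bit field mask.

    Hypothetical model: the field value is shifted to the position of the
    lowest set bit in field_mask, and the mask itself is copied into the
    upper half so that only the bits of this field are updated.
    """
    if not 0 < field_mask <= 0xFFFF:
        raise ValueError("field mask must be non-zero and fit in 16 bits")

    # Index of the lowest set bit in the mask.
    shift = (field_mask & -field_mask).bit_length() - 1
    shifted = field_value << shift
    if field_value < 0 or shifted & ~field_mask:
        raise ValueError("field value does not fit in the field")

    return (field_mask << 16) | shifted
```

Because the mask is an ordinary run-time value here, the same helper serves
per-SoC and per-instance masks, which is the reason the commit gives for not
using the constant-mask FIELD_PREP_WM16() macros. For example, a two-bit
field at bits 5:4 (mask 0x0030) with value 2 encodes to 0x00300020.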
10 days ago Merge branch 'accecn-protocol-case-handling-series'
Paolo Abeni [Tue, 3 Feb 2026 14:13:30 +0000 (15:13 +0100)] 
Merge branch 'accecn-protocol-case-handling-series'

Chia-Yu Chang says:

====================
AccECN protocol case handling series

Please find the v13 AccECN case handling patch series, which covers
several exceptional case handling of Accurate ECN spec (RFC9768),
adds new identifiers to be used by CC modules, adds ecn_delta into
rate_sample, and keeps the ACE counter for computation, etc.

This patch series is part of the full AccECN patch series, which is at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
---
Chia-Yu Chang (13):
  selftests/net: gro: add self-test for TCP CWR flag
  tcp: ECT_1_NEGOTIATION and NEEDS_ACCECN identifiers
  tcp: disable RFC3168 fallback identifier for CC modules
  tcp: accecn: handle unexpected AccECN negotiation feedback
  tcp: accecn: retransmit downgraded SYN in AccECN negotiation
  tcp: add TCP_SYNACK_RETRANS synack_type
  tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN
    SYN/ACK
  tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiation
  tcp: accecn: fallback outgoing half link to non-AccECN
  tcp: accecn: detect loss ACK w/ AccECN option and add
    TCP_ACCECN_OPTION_PERSIST
  tcp: accecn: add tcpi_ecn_mode and tcpi_option2 in tcp_info
  tcp: accecn: enable AccECN
  selftests/net: packetdrill: add TCP Accurate ECN cases

Ilpo Järvinen (2):
  tcp: try to avoid safer when ACKs are thinned
  gro: flushing when CWR is set negatively affects AccECN

 Documentation/networking/ip-sysctl.rst        |   4 +-
 .../networking/net_cachelines/tcp_sock.rst    |   1 +
 include/linux/tcp.h                           |   4 +-
 include/net/inet_ecn.h                        |  20 +++-
 include/net/tcp.h                             |  32 +++++-
 include/net/tcp_ecn.h                         | 103 ++++++++++++------
 include/uapi/linux/tcp.h                      |  26 ++++-
 net/ipv4/inet_connection_sock.c               |   3 +
 net/ipv4/sysctl_net_ipv4.c                    |   4 +-
 net/ipv4/tcp.c                                |  10 ++
 net/ipv4/tcp_cong.c                           |   5 +-
 net/ipv4/tcp_input.c                          |  40 ++++++-
 net/ipv4/tcp_minisocks.c                      |  43 +++++---
 net/ipv4/tcp_offload.c                        |   3 +-
 net/ipv4/tcp_output.c                         |  34 ++++--
 net/ipv4/tcp_timer.c                          |   3 +
 tools/testing/selftests/drivers/net/gro.c     |  81 ++++++++++----
 tools/testing/selftests/drivers/net/gro.py    |   3 +-
 .../tcp_accecn_2nd_data_as_first.pkt          |  24 ++++
 .../tcp_accecn_2nd_data_as_first_connect.pkt  |  30 +++++
 .../tcp_accecn_3rd_ack_after_synack_rxmt.pkt  |  19 ++++
 ..._accecn_3rd_ack_ce_updates_received_ce.pkt |  18 +++
 .../tcp_accecn_3rd_ack_lost_data_ce.pkt       |  22 ++++
 .../net/packetdrill/tcp_accecn_3rd_dups.pkt   |  26 +++++
 .../tcp_accecn_acc_ecn_disabled.pkt           |  13 +++
 .../tcp_accecn_accecn_then_notecn_syn.pkt     |  28 +++++
 .../tcp_accecn_accecn_to_rfc3168.pkt          |  18 +++
 .../tcp_accecn_client_accecn_options_drop.pkt |  34 ++++++
 .../tcp_accecn_client_accecn_options_lost.pkt |  38 +++++++
 .../tcp_accecn_clientside_disabled.pkt        |  12 ++
 ...cecn_close_local_close_then_remote_fin.pkt |  25 +++++
 .../tcp_accecn_delivered_2ndlargeack.pkt      |  25 +++++
 ..._accecn_delivered_falseoverflow_detect.pkt |  31 ++++++
 .../tcp_accecn_delivered_largeack.pkt         |  24 ++++
 .../tcp_accecn_delivered_largeack2.pkt        |  25 +++++
 .../tcp_accecn_delivered_maxack.pkt           |  25 +++++
 .../tcp_accecn_delivered_updates.pkt          |  70 ++++++++++++
 .../net/packetdrill/tcp_accecn_ecn3.pkt       |  12 ++
 .../tcp_accecn_ecn_field_updates_opt.pkt      |  35 ++++++
 .../packetdrill/tcp_accecn_ipflags_drop.pkt   |  14 +++
 .../tcp_accecn_listen_opt_drop.pkt            |  16 +++
 .../tcp_accecn_multiple_syn_ack_drop.pkt      |  28 +++++
 .../tcp_accecn_multiple_syn_drop.pkt          |  18 +++
 .../tcp_accecn_negotiation_bleach.pkt         |  23 ++++
 .../tcp_accecn_negotiation_connect.pkt        |  23 ++++
 .../tcp_accecn_negotiation_listen.pkt         |  26 +++++
 .../tcp_accecn_negotiation_noopt_connect.pkt  |  23 ++++
 .../tcp_accecn_negotiation_optenable.pkt      |  23 ++++
 .../tcp_accecn_no_ecn_after_accecn.pkt        |  20 ++++
 .../net/packetdrill/tcp_accecn_noopt.pkt      |  27 +++++
 .../net/packetdrill/tcp_accecn_noprogress.pkt |  27 +++++
 .../tcp_accecn_notecn_then_accecn_syn.pkt     |  28 +++++
 .../tcp_accecn_rfc3168_to_fallback.pkt        |  18 +++
 .../tcp_accecn_rfc3168_to_rfc3168.pkt         |  18 +++
 .../tcp_accecn_sack_space_grab.pkt            |  28 +++++
 .../tcp_accecn_sack_space_grab_with_ts.pkt    |  39 +++++++
 ...tcp_accecn_serverside_accecn_disabled1.pkt |  20 ++++
 ...tcp_accecn_serverside_accecn_disabled2.pkt |  20 ++++
 .../tcp_accecn_serverside_broken.pkt          |  19 ++++
 .../tcp_accecn_serverside_ecn_disabled.pkt    |  19 ++++
 .../tcp_accecn_serverside_only.pkt            |  18 +++
 ...n_syn_ace_flags_acked_after_retransmit.pkt |  18 +++
 .../tcp_accecn_syn_ace_flags_drop.pkt         |  16 +++
 ...n_ack_ace_flags_acked_after_retransmit.pkt |  27 +++++
 .../tcp_accecn_syn_ack_ace_flags_drop.pkt     |  26 +++++
 .../net/packetdrill/tcp_accecn_syn_ce.pkt     |  13 +++
 .../net/packetdrill/tcp_accecn_syn_ect0.pkt   |  13 +++
 .../net/packetdrill/tcp_accecn_syn_ect1.pkt   |  13 +++
 .../net/packetdrill/tcp_accecn_synack_ce.pkt  |  27 +++++
 ..._accecn_synack_ce_updates_delivered_ce.pkt |  22 ++++
 .../packetdrill/tcp_accecn_synack_ect0.pkt    |  24 ++++
 .../packetdrill/tcp_accecn_synack_ect1.pkt    |  24 ++++
 .../packetdrill/tcp_accecn_synack_rexmit.pkt  |  15 +++
 .../packetdrill/tcp_accecn_synack_rxmt.pkt    |  25 +++++
 .../packetdrill/tcp_accecn_tsnoprogress.pkt   |  26 +++++
 .../net/packetdrill/tcp_accecn_tsprogress.pkt |  25 +++++
 76 files changed, 1680 insertions(+), 102 deletions(-)
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_2nd_data_as_first.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_2nd_data_as_first_connect.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_3rd_ack_after_synack_rxmt.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_3rd_ack_ce_updates_received_ce.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_3rd_ack_lost_data_ce.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_3rd_dups.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_acc_ecn_disabled.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_accecn_then_notecn_syn.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_accecn_to_rfc3168.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_client_accecn_options_drop.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_client_accecn_options_lost.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_clientside_disabled.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_close_local_close_then_remote_fin.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_delivered_2ndlargeack.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_delivered_falseoverflow_detect.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_delivered_largeack.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_delivered_largeack2.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_delivered_maxack.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_delivered_updates.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_ecn3.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_ecn_field_updates_opt.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_ipflags_drop.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_listen_opt_drop.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_multiple_syn_ack_drop.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_multiple_syn_drop.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_negotiation_bleach.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_negotiation_connect.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_negotiation_listen.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_negotiation_noopt_connect.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_negotiation_optenable.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_no_ecn_after_accecn.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_noopt.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_noprogress.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_notecn_then_accecn_syn.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_rfc3168_to_fallback.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_rfc3168_to_rfc3168.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_sack_space_grab.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_sack_space_grab_with_ts.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_serverside_accecn_disabled1.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_serverside_accecn_disabled2.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_serverside_broken.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_serverside_ecn_disabled.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_serverside_only.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ace_flags_acked_after_retransmit.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ace_flags_drop.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ack_ace_flags_acked_after_retransmit.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ack_ace_flags_drop.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ce.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ect0.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ect1.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_synack_ce.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_synack_ce_updates_delivered_ce.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_synack_ect0.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_synack_ect1.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_synack_rexmit.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_synack_rxmt.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_tsnoprogress.pkt
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_tsprogress.pkt
====================

Link: https://patch.msgid.link/20260131222515.8485-1-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days ago selftests/net: packetdrill: add TCP Accurate ECN cases
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:15 +0000 (23:25 +0100)] 
selftests/net: packetdrill: add TCP Accurate ECN cases

Linux Accurate ECN test sets using ACE counters and AccECN options to
cover several scenarios: Connection teardown, different ACK conditions,
counter wrapping, SACK space grabbing, fallback schemes, negotiation
retransmission/reorder/loss, AccECN option drop/loss, different
handshake reflectors, data with marking, and different sysctl values.

The packetdrill used is commit cbe405666c9c8698ac1e72f5e8ffc551216dfa56
of repo: https://github.com/minuscat/packetdrill/tree/upstream_accecn.
The corresponding patches have been sent to the google/packetdrill mailing list.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Co-developed-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Co-developed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-16-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days ago tcp: accecn: enable AccECN
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:14 +0000 (23:25 +0100)] 
tcp: accecn: enable AccECN

Enable Accurate ECN negotiation and request for incoming and
outgoing connections by setting sysctl_tcp_ecn:

+==============+===========================================+
|              |  Highest ECN variant (Accurate ECN, ECN,  |
|   tcp_ecn    |  or no ECN) to be negotiated & requested  |
|              +---------------------+---------------------+
|              | Incoming connection | Outgoing connection |
+==============+=====================+=====================+
|      0       |        No ECN       |        No ECN       |
|      1       |         ECN         |         ECN         |
|      2       |         ECN         |        No ECN       |
+--------------+---------------------+---------------------+
|      3       |     Accurate ECN    |     Accurate ECN    |
|      4       |     Accurate ECN    |         ECN         |
|      5       |     Accurate ECN    |        No ECN       |
+==============+=====================+=====================+

Refer to Documentation/networking/ip-sysctl.rst for more details.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-15-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
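
The tcp_ecn table in the commit above can be read as a simple lookup. The
sketch below transcribes it into Python. The names EcnPolicy, TCP_ECN_POLICY
and highest_ecn_variant are hypothetical and exist only for illustration;
the kernel does not expose such a table in this form.

```python
from typing import NamedTuple


class EcnPolicy(NamedTuple):
    """Highest ECN variant negotiated or requested per direction."""
    incoming: str
    outgoing: str


# Transcribed from the table in the commit message; keys are tcp_ecn values.
TCP_ECN_POLICY = {
    0: EcnPolicy("No ECN", "No ECN"),
    1: EcnPolicy("ECN", "ECN"),
    2: EcnPolicy("ECN", "No ECN"),
    3: EcnPolicy("Accurate ECN", "Accurate ECN"),
    4: EcnPolicy("Accurate ECN", "ECN"),
    5: EcnPolicy("Accurate ECN", "No ECN"),
}


def highest_ecn_variant(tcp_ecn: int) -> EcnPolicy:
    """Look up the policy for a tcp_ecn value listed in the table."""
    try:
        return TCP_ECN_POLICY[tcp_ecn]
    except KeyError:
        raise ValueError(f"tcp_ecn value {tcp_ecn} is not in the table") from None
```

Reading the table this way shows the pattern: values 3 to 5 mirror values
1, 2 and 0 for outgoing connections while raising the incoming side to
Accurate ECN.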
10 days ago tcp: accecn: add tcpi_ecn_mode and tcpi_option2 in tcp_info
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:13 +0000 (23:25 +0100)] 
tcp: accecn: add tcpi_ecn_mode and tcpi_option2 in tcp_info

Add 2-bit tcpi_ecn_mode field within tcp_info to indicate which ECN
mode is negotiated: ECN_MODE_DISABLED, ECN_MODE_RFC3168, ECN_MODE_ACCECN,
or ECN_MODE_PENDING. This is done by utilizing available bits from
tcpi_accecn_opt_seen (reduced from 16 bits to 2 bits) and
tcpi_accecn_fail_mode (reduced from 16 bits to 4 bits).

Also, an extra 24-bit tcpi_options2 field is identified to represent
newer options and connection features, as all 8 bits of tcpi_options
field have been used.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Co-developed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-14-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
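
The bit budget described in the commit above adds up to one 32-bit word:
2 + 2 + 4 + 24 bits, replacing the two former 16-bit fields. The ctypes
sketch below checks that arithmetic. The structure name AccEcnInfoBits and
the order of the fields are assumptions made for illustration; the real
layout is defined in include/uapi/linux/tcp.h and may differ.

```python
import ctypes


class AccEcnInfoBits(ctypes.Structure):
    """Hypothetical model of the repacked 32 bits; field order is assumed."""
    _fields_ = [
        # One of ECN_MODE_DISABLED, ECN_MODE_RFC3168, ECN_MODE_ACCECN or
        # ECN_MODE_PENDING (numeric values are not given in the commit).
        ("tcpi_ecn_mode", ctypes.c_uint32, 2),
        ("tcpi_accecn_opt_seen", ctypes.c_uint32, 2),   # was 16 bits
        ("tcpi_accecn_fail_mode", ctypes.c_uint32, 4),  # was 16 bits
        ("tcpi_options2", ctypes.c_uint32, 24),         # newer options/features
    ]


def total_bits() -> int:
    """Sum the declared bit widths of all fields."""
    return sum(width for _name, _ctype, width in AccEcnInfoBits._fields_)
```

Since the widths sum to 32, the structure occupies the same four bytes as
the two 16-bit fields it replaces in this model, so the overall size of
tcp_info would not need to grow for these additions.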
10 days ago tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:12 +0000 (23:25 +0100)] 
tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST

Detect spurious retransmission of a previously sent ACK carrying the
AccECN option after the second retransmission. Since this might be caused
by a middlebox dropping ACKs with options it does not recognize, disable
the sending of the AccECN option in all subsequent ACKs. This patch
follows Section 3.2.3.2.2 of AccECN spec (RFC9768), and a new field
(accecn_opt_sent_w_dsack) is added to indicate that an AccECN option was
sent with duplicate SACK info.

Also, a new AccECN option sending mode is added to tcp_ecn_option sysctl:
(TCP_ECN_OPTION_PERSIST), which ignores the AccECN fallback policy and
persistently sends the AccECN option once it fits into the TCP option space.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-13-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days ago tcp: accecn: fallback outgoing half link to non-AccECN
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:11 +0000 (23:25 +0100)] 
tcp: accecn: fallback outgoing half link to non-AccECN

According to Section 3.2.2.1 of AccECN spec (RFC9768), if the Server
is in AccECN mode and in SYN-RCVD state, and if it receives a value of
zero on a pure ACK with SYN=0 and no SACK blocks, for the rest of the
connection the Server MUST NOT set ECT on outgoing packets and MUST
NOT respond to AccECN feedback. Nonetheless, as a Data Receiver it
MUST NOT disable AccECN feedback.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-12-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
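
The rule quoted in the commit above affects only the sending half of the
connection. A minimal Python sketch of that asymmetry follows. The class
ServerEcnState, its field names, and the function on_ack_in_syn_rcvd are
hypothetical and do not correspond to kernel identifiers; the caller is
assumed to invoke the function only while the server is in SYN-RCVD state.

```python
from dataclasses import dataclass


@dataclass
class ServerEcnState:
    """Hypothetical per-connection flags for a server in SYN-RCVD state."""
    accecn_mode: bool
    may_send_ect: bool = True                # may set ECT on outgoing packets
    respond_to_accecn_feedback: bool = True  # acts on feedback as a sender
    send_accecn_feedback: bool = True        # provides feedback as a receiver


def on_ack_in_syn_rcvd(state: ServerEcnState, ace: int, syn: bool,
                       sack_blocks: int, payload_len: int) -> ServerEcnState:
    """Apply the fallback when a pure ACK with ACE=0 arrives."""
    pure_ack = not syn and payload_len == 0
    if state.accecn_mode and ace == 0 and pure_ack and sack_blocks == 0:
        # Outgoing half falls back for the rest of the connection.
        state.may_send_ect = False
        state.respond_to_accecn_feedback = False
        # send_accecn_feedback is deliberately left untouched: as a Data
        # Receiver the server must not disable AccECN feedback.
    return state
```

Keeping the receiver-side flag separate from the two sender-side flags makes
the "MUST NOT disable AccECN feedback" clause explicit in the model.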
10 days ago tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiation
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:10 +0000 (23:25 +0100)] 
tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiation

Based on specification:
  https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt

Based on Section 3.1.5 of AccECN spec (RFC9768), a TCP Server in
AccECN mode MUST NOT set ECT on any packet for the rest of the connection,
if it has received or sent at least one valid SYN or Acceptable SYN/ACK
with (AE,CWR,ECE) = (0,0,0) during the handshake.

In addition, a host in AccECN mode that is feeding back the IP-ECN
field on a SYN or SYN/ACK MUST feed back the IP-ECN field on the
latest valid SYN or acceptable SYN/ACK to arrive.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-11-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agotcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:09 +0000 (23:25 +0100)] 
tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK

For Accurate ECN, the first SYN/ACK sent by the TCP server shall set
the ACE flag (Table 1 of RFC9768) and the AccECN option to complete the
capability negotiation. However, if the TCP server needs to retransmit
such a SYN/ACK (for example, because it did not receive an ACK
acknowledging its SYN/ACK, or received a second SYN requesting AccECN
support), the TCP server retransmits the SYN/ACK without the AccECN
option. This is because the SYN/ACK may be lost due to congestion, or a
middlebox may block the AccECN option. Furthermore, if this retransmission
also times out, to expedite connection establishment, the TCP server
should retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0) and without the
AccECN option, while maintaining AccECN feedback mode.

This complies with Section 3.2.3.2.2 of the AccECN spec RFC9768.
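
The retransmission policy described above can be sketched as a small decision
function. This is only an illustrative model, not the kernel code: the enum,
the function name and the retrans counter are hypothetical names introduced
here.

```c
#include <assert.h>

/* Hypothetical labels for the three SYN/ACK shapes described above. */
enum synack_variant {
	SYNACK_ACCECN_WITH_OPTION,	/* ACE flags set, AccECN option included */
	SYNACK_ACCECN_NO_OPTION,	/* ACE flags set, AccECN option omitted */
	SYNACK_NON_ACCECN,		/* (AE,CWR,ECE) = (0,0,0), no AccECN option */
};

/*
 * retrans is the number of times the SYN/ACK has already been
 * retransmitted: 0 for the first SYN/ACK, 1 for the first
 * retransmission, and so on (an assumption of this sketch).
 */
enum synack_variant accecn_synack_variant(unsigned int retrans)
{
	if (retrans == 0)
		return SYNACK_ACCECN_WITH_OPTION;
	if (retrans == 1)
		return SYNACK_ACCECN_NO_OPTION;
	return SYNACK_NON_ACCECN;
}
```

The sketch covers only what is put on the wire. Per the text above, the
server stays in AccECN feedback mode even when it sends the last variant.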

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-10-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agotcp: add TCP_SYNACK_RETRANS synack_type
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:08 +0000 (23:25 +0100)] 
tcp: add TCP_SYNACK_RETRANS synack_type

Before this patch, a retransmitted SYN/ACK did not have a specific
synack_type; however, the upcoming patch needs to distinguish between
retransmitted and non-retransmitted SYN/ACKs in order to transmit the
fallback SYN/ACK during AccECN negotiation. Therefore, this patch
introduces a new synack_type (TCP_SYNACK_RETRANS).

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-9-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agotcp: accecn: retransmit downgraded SYN in AccECN negotiation
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:07 +0000 (23:25 +0100)] 
tcp: accecn: retransmit downgraded SYN in AccECN negotiation

Based on AccECN spec (RFC9768) Section 3.1.4.1, if the sender of an
AccECN SYN (the TCP Client) times out before receiving the SYN/ACK, it
SHOULD attempt to negotiate the use of AccECN at least one more time
by continuing to set all three TCP ECN flags (AE,CWR,ECE) = (1,1,1) on
the first retransmitted SYN (using the usual retransmission time-outs).

If this first retransmission also fails to be acknowledged, in
deployment scenarios where AccECN path traversal might be problematic,
the TCP Client SHOULD send subsequent retransmissions of the SYN with
the three TCP-ECN flags cleared (AE,CWR,ECE) = (0,0,0).
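
As a rough illustration of the two paragraphs above, the flags placed on each
SYN attempt could be chosen as follows. The struct, the function name and the
retrans counter are hypothetical, and the sketch assumes the downgrade is
always applied, whereas the spec limits it to deployments where AccECN path
traversal might be problematic.

```c
#include <assert.h>
#include <stdbool.h>

/* The three TCP ECN flags carried on a SYN, as (AE,CWR,ECE). */
struct ecn_syn_flags {
	bool ae;
	bool cwr;
	bool ece;
};

/*
 * retrans counts how many times the SYN has already been retransmitted:
 * 0 is the original SYN, 1 the first retransmission, and so on.
 */
struct ecn_syn_flags accecn_syn_flags(unsigned int retrans)
{
	struct ecn_syn_flags flags = { true, true, true };

	/* The original SYN and the first retransmission request AccECN. */
	if (retrans <= 1)
		return flags;

	/* Later retransmissions are downgraded to (0,0,0). */
	flags.ae = false;
	flags.cwr = false;
	flags.ece = false;
	return flags;
}
```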

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-8-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agotcp: accecn: handle unexpected AccECN negotiation feedback
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:06 +0000 (23:25 +0100)] 
tcp: accecn: handle unexpected AccECN negotiation feedback

This patch follows Sections 3.1.2 and 3.1.3 of the AccECN spec (RFC9768).

In Section 3.1.2, it says an AccECN implementation has no need to
recognize or support the Server response labelled 'Nonce' or ECN-nonce
feedback more generally, as RFC 3540 has been reclassified as Historic.
AccECN is compatible with alternative ECN feedback integrity approaches
to the nonce. The SYN/ACK labelled 'Nonce' with (AE,CWR,ECE) = (1,0,1)
is reserved for future use. A TCP Client (A) that receives such a SYN/ACK
follows the procedure for forward compatibility given in Section 3.1.3.

Then in Section 3.1.3, it says if a TCP Client has sent a SYN requesting
AccECN feedback with (AE,CWR,ECE) = (1,1,1) then receives a SYN/ACK with
the currently reserved combination (AE,CWR,ECE) = (1,0,1) but it does not
have logic specific to such a combination, the Client MUST enable AccECN
mode as if the SYN/ACK confirmed that the Server supported AccECN and as
if it fed back that the IP-ECN field on the SYN had arrived unchanged.
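
A minimal sketch of the forward-compatibility rule quoted above, covering only
the reserved (1,0,1) combination. The state struct and function names are
hypothetical, and every other SYN/ACK combination is deliberately left to the
caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical client-side state, reduced to what the rule touches. */
struct accecn_client_state {
	bool accecn_mode;	/* AccECN mode enabled for the connection */
	bool syn_ecn_unchanged;	/* treat the SYN's IP-ECN field as unchanged */
};

/* True for the currently reserved SYN/ACK combination (AE,CWR,ECE) = (1,0,1). */
bool synack_is_reserved(bool ae, bool cwr, bool ece)
{
	return ae && !cwr && ece;
}

/*
 * Apply the rule for a client that sent a SYN with (1,1,1).
 * Returns true if the SYN/ACK was the reserved combination and was
 * handled here, false if the caller must handle it.
 */
bool accecn_handle_reserved_synack(struct accecn_client_state *st,
				   bool ae, bool cwr, bool ece)
{
	if (!synack_is_reserved(ae, cwr, ece))
		return false;

	st->accecn_mode = true;
	st->syn_ecn_unchanged = true;
	return true;
}
```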

Fixes: 3cae34274c79 ("tcp: accecn: AccECN negotiation").
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-7-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agotcp: disable RFC3168 fallback identifier for CC modules
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:05 +0000 (23:25 +0100)] 
tcp: disable RFC3168 fallback identifier for CC modules

When AccECN is not successfully negotiated for a TCP flow, the flow falls
back to classic ECN (RFC3168) by default. However, an L4S service will
fall back to non-ECN.

This patch lets a congestion control (CA) module control whether a flow
should skip the fallback to classic ECN after an unsuccessful AccECN
negotiation. A new CA module flag (TCP_CONG_NO_FALLBACK_RFC3168)
identifies this behavior expected by the CA.
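
The fallback choice can be pictured with a small sketch. Everything here is
illustrative: the enum, the function, and the flag's name and bit value are
stand-ins for this example and are not taken from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

enum ecn_mode {
	ECN_MODE_NONE,		/* no ECN on this flow */
	ECN_MODE_RFC3168,	/* classic ECN */
	ECN_MODE_ACCECN,	/* Accurate ECN */
};

/* Stand-in for TCP_CONG_NO_FALLBACK_RFC3168; the bit value is made up. */
#define CA_FLAG_NO_FALLBACK_RFC3168	(1u << 0)

/*
 * Pick the ECN mode once negotiation has finished.
 * peer_rfc3168 says whether the peer at least agreed to classic ECN.
 */
enum ecn_mode ecn_mode_after_negotiation(bool accecn_ok, bool peer_rfc3168,
					 unsigned int ca_flags)
{
	if (accecn_ok)
		return ECN_MODE_ACCECN;

	/* The CA asked not to fall back to classic ECN: use non-ECN. */
	if (ca_flags & CA_FLAG_NO_FALLBACK_RFC3168)
		return ECN_MODE_NONE;

	return peer_rfc3168 ? ECN_MODE_RFC3168 : ECN_MODE_NONE;
}
```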

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-6-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agotcp: ECT_1_NEGOTIATION and NEEDS_ACCECN identifiers
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:04 +0000 (23:25 +0100)] 
tcp: ECT_1_NEGOTIATION and NEEDS_ACCECN identifiers

This patch adds two congestion control (CC) module flags related to
AccECN negotiation. First, a new flag (TCP_CONG_NEEDS_ACCECN) indicates
that the CC expects to negotiate AccECN functionality using the ECE, CWR
and AE flags in the TCP header.

Second, during ECN negotiation, ECT(0) in the IP header is used. This
patch enables CC to control whether ECT(0) or ECT(1) should be used on
a per-segment basis. A new flag (TCP_CONG_ECT_1_NEGOTIATION) defines the
expected ECT value in the IP header by the CA when not-yet initialized
for the connection.

Details of the AccECN negotiation can be found in IETF RFC9768.
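
As a small illustration of the second flag, the IP-ECN codepoint used during
negotiation could be selected like this. The codepoint values are the standard
two-bit IP-ECN encodings; the flag's bit value and the function name are
assumptions made for this sketch.

```c
#include <assert.h>

/* Two-bit IP-ECN field codepoints. */
enum ip_ecn {
	IP_ECN_NOT_ECT	= 0,
	IP_ECN_ECT_1	= 1,
	IP_ECN_ECT_0	= 2,
	IP_ECN_CE	= 3,
};

/* Stand-in for TCP_CONG_ECT_1_NEGOTIATION; the bit value is made up. */
#define CA_FLAG_ECT_1_NEGOTIATION	(1u << 1)

/* ECT(0) is used during negotiation unless the CA asks for ECT(1). */
enum ip_ecn negotiation_ect(unsigned int ca_flags)
{
	if (ca_flags & CA_FLAG_ECT_1_NEGOTIATION)
		return IP_ECN_ECT_1;
	return IP_ECN_ECT_0;
}
```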

Co-developed-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-5-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agoselftests/net: gro: add self-test for TCP CWR flag
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:03 +0000 (23:25 +0100)] 
selftests/net: gro: add self-test for TCP CWR flag

Currently, GRO does not flush packets when the CWR bit is set.
A corresponding self-test is being added, in which the CWR flag
is set for two consecutive packets, but the first packet with the
CWR flag set will not be flushed immediately.

+===================+==========+===============+===========+
|     Packet id     | CWR flag |    Payload    | Flushing? |
+===================+==========+===============+===========+
|         0         |     0    |  PAYLOAD_LEN  |     0     |
|        ...        |     0    |  PAYLOAD_LEN  |     1     |
+-------------------+----------+---------------+-----------+
| NUM_PACKETS/2 - 1 |     1    |  payload_len  |     0     |
|   NUM_PACKETS/2   |     1    |  payload_len  |     1     |
+-------------------+----------+---------------+-----------+
|        ...        |     0    |  PAYLOAD_LEN  |     0     |
|   NUM_PACKETS     |     0    |  PAYLOAD_LEN  |     1     |
+===================+==========+===============+===========+

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-4-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agogro: flushing when CWR is set negatively affects AccECN
Ilpo Järvinen [Sat, 31 Jan 2026 22:25:02 +0000 (23:25 +0100)] 
gro: flushing when CWR is set negatively affects AccECN

As AccECN may keep CWR bit asserted due to different
interpretation of the bit, flushing with GRO because of
CWR may effectively disable GRO until AccECN counter
field changes such that CWR-bit becomes 0.

There is no harm done from not immediately forwarding the
CWR'ed segment with RFC3168 ECN.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-3-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agotcp: try to avoid safer when ACKs are thinned
Ilpo Järvinen [Sat, 31 Jan 2026 22:25:01 +0000 (23:25 +0100)] 
tcp: try to avoid safer when ACKs are thinned

Add an EWMA of newly acked packets. When ACK thinning occurs, select
between the safer and the unsafe cep delta in AccECN processing based
on it. If the number of packets ACKed per ACK tends to be large, don't
conservatively assume ACE field overflow.
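
A generic fixed-point EWMA that fits in a u16 might look like the sketch
below. The fraction bits, the weight and the threshold parameter are
assumptions chosen for illustration and are not the values used by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EWMA_FRAC_BITS		6	/* fixed-point fraction bits (assumed) */
#define EWMA_WEIGHT_SHIFT	3	/* each new sample weighs 1/8 (assumed) */

/* Fold the number of packets newly ACKed by one ACK into the average. */
uint16_t pkts_acked_ewma_update(uint16_t ewma, uint32_t pkts)
{
	uint32_t max_pkts = UINT16_MAX >> EWMA_FRAC_BITS;
	uint32_t sample;

	/* Clamp so the scaled sample still fits in 16 bits. */
	if (pkts > max_pkts)
		pkts = max_pkts;
	sample = pkts << EWMA_FRAC_BITS;

	/* ewma = 7/8 * ewma + 1/8 * sample, in fixed point. */
	return (uint16_t)(ewma - (ewma >> EWMA_WEIGHT_SHIFT) +
			  (sample >> EWMA_WEIGHT_SHIFT));
}

/*
 * When the average number of packets per ACK is at least threshold_pkts,
 * prefer the less conservative (unsafe) delta.
 */
bool prefer_unsafe_delta(uint16_t ewma, uint32_t threshold_pkts)
{
	return (uint32_t)(ewma >> EWMA_FRAC_BITS) >= threshold_pkts;
}
```

With these constants, a steady stream of ACKs that each cover 8 packets drives
the stored value towards 8 << 6 = 512.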

This patch uses the existing 2-byte holes in the rx group for new
u16 variables without creating more holes. Below are the pahole
outcomes before and after this patch:

[BEFORE THIS PATCH]
struct tcp_sock {
    [...]
    u32                        delivered_ecn_bytes[3]; /*  2744    12 */
    /* XXX 4 bytes hole, try to pack */

    [...]
    __cacheline_group_end__tcp_sock_write_rx[0];       /*  2816     0 */

    [...]
    /* size: 3264, cachelines: 51, members: 177 */
}

[AFTER THIS PATCH]
struct tcp_sock {
    [...]
    u32                        delivered_ecn_bytes[3]; /*  2744    12 */
    u16                        pkts_acked_ewma;        /*  2756     2 */
    /* XXX 2 bytes hole, try to pack */

    [...]
    __cacheline_group_end__tcp_sock_write_rx[0];       /*  2816     0 */

    [...]
    /* size: 3264, cachelines: 51, members: 178 */
}

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Co-developed-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-2-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agoMerge branch 'net-dsa-yt921x-add-dcb-qos-support'
Paolo Abeni [Tue, 3 Feb 2026 14:09:32 +0000 (15:09 +0100)] 
Merge branch 'net-dsa-yt921x-add-dcb-qos-support'

David Yang says:

====================
net: dsa: yt921x: Add DCB/QoS support

This series adds DCB/QoS support to the driver.

v5: https://lore.kernel.org/r/20260128215202.2244266-1-mmyangfl@gmail.com
v4: https://lore.kernel.org/r/20260127020847.1482724-1-mmyangfl@gmail.com
v3: https://lore.kernel.org/r/20260125001328.3784006-1-mmyangfl@gmail.com
v2: https://lore.kernel.org/r/20260122194233.2777550-1-mmyangfl@gmail.com
v1: https://lore.kernel.org/r/20260119185935.2072685-1-mmyangfl@gmail.com
====================

Link: https://patch.msgid.link/20260131021854.3405036-1-mmyangfl@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agonet: dsa: yt921x: Add DCB/QoS support
David Yang [Sat, 31 Jan 2026 02:18:51 +0000 (10:18 +0800)] 
net: dsa: yt921x: Add DCB/QoS support

Set up global DSCP/PCP priority mappings and add related DCB methods.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260131021854.3405036-6-mmyangfl@gmail.com
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agonet: dsa: yt921x: Refactor yt921x_chip_setup()
David Yang [Sat, 31 Jan 2026 02:18:50 +0000 (10:18 +0800)] 
net: dsa: yt921x: Refactor yt921x_chip_setup()

yt921x_chip_setup() is already pretty long, and is going to become
longer. Split it into parts.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260131021854.3405036-5-mmyangfl@gmail.com
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agonet: dsa: yt921x: Refactor VLAN awareness setting
David Yang [Sat, 31 Jan 2026 02:18:49 +0000 (10:18 +0800)] 
net: dsa: yt921x: Refactor VLAN awareness setting

Create a helper function to centralize the logic for enabling and
disabling VLAN awareness on a port.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260131021854.3405036-4-mmyangfl@gmail.com
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agonet: dsa: tag_yt921x: add priority support
David Yang [Sat, 31 Jan 2026 02:18:48 +0000 (10:18 +0800)] 
net: dsa: tag_yt921x: add priority support

Required by DCB/QoS support of the switch driver, since the rx packets
will have non-zero priorities.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260131021854.3405036-3-mmyangfl@gmail.com
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agonet: dsa: tag_yt921x: clarify priority and code fields
David Yang [Sat, 31 Jan 2026 02:18:47 +0000 (10:18 +0800)] 
net: dsa: tag_yt921x: clarify priority and code fields

Packet priority is part of the tag, and the priority and code fields are
used by tx and rx. Make revisions to reflect the facts.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260131021854.3405036-2-mmyangfl@gmail.com
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agoMerge branch 'net-phy-remove-modalias-based-mdio-device-bus-matching'
Paolo Abeni [Tue, 3 Feb 2026 11:46:57 +0000 (12:46 +0100)] 
Merge branch 'net-phy-remove-modalias-based-mdio-device-bus-matching'

Heiner Kallweit says:

====================
net: phy: remove modalias-based MDIO device bus matching

modalias-based MDIO device bus matching has only one user (dsa-loop),
where we can replace modalias-based matching with a simple custom
match function. This, and the first patch of the series, lay the foundation
for removing modalias-based matching.
====================

Link: https://patch.msgid.link/d9543e7d-23e1-4dba-a6b3-35dcd6a35dec@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agonet: phy: remove modalias-based mdio bus matching
Heiner Kallweit [Sat, 31 Jan 2026 17:40:00 +0000 (18:40 +0100)] 
net: phy: remove modalias-based mdio bus matching

Last user dsa_loop has been migrated away from modalias-based matching,
so we can remove this feature now. It was the only user of MDIO_NAME_SIZE,
so remove this constant as well.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/ce1c6df0-4785-4b28-8322-32dc6bceea18@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agonet: dsa: loop: remove MDIO device modalias
Heiner Kallweit [Sat, 31 Jan 2026 17:38:56 +0000 (18:38 +0100)] 
net: dsa: loop: remove MDIO device modalias

This change is a prerequisite for removing the MDIO device modalias,
as dsa_loop is the only user. Switch from modalias to a custom
bus match function.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/15a4318f-50b5-4df5-874e-e387ee070a9d@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agonet: ethernet: adi: make name member of struct adin1110_cfg a pointer
Heiner Kallweit [Sat, 31 Jan 2026 17:37:15 +0000 (18:37 +0100)] 
net: ethernet: adi: make name member of struct adin1110_cfg a pointer

The primary reason for this change is to remove the misuse of
MDIO_NAME_SIZE here, so that this constant can be removed in a follow-up
patch. The use case here is simply a chip name without any relationship
to an MDIO device. Also, there's no need to reserve a longer char array,
so make the name a pointer.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/61bc14fa-eed3-43b6-ae40-b98063e81578@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 days agoMerge branch 'devlink-and-mlx5-support-cross-function-rate-scheduling'
Jakub Kicinski [Tue, 3 Feb 2026 04:05:51 +0000 (20:05 -0800)] 
Merge branch 'devlink-and-mlx5-support-cross-function-rate-scheduling'

Tariq Toukan says:

====================
devlink and mlx5: Support cross-function rate scheduling [part]

Apply trivial cleanups from the series to make it smaller.
====================

Link: https://patch.msgid.link/20260128112544.1661250-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agodevlink: Refactor devlink_rate_nodes_check
Cosmin Ratiu [Wed, 28 Jan 2026 11:25:35 +0000 (13:25 +0200)] 
devlink: Refactor devlink_rate_nodes_check

devlink_rate_nodes_check() was used to verify there are no devlink rate
nodes created when switching the esw mode.

Rate management code is about to become more complex, so refactor this
function:
- remove unused param 'mode'.
- add a new 'rate_filter' param.
- rename to devlink_rates_check().
- expose devlink_rate_is_node() to be used as a rate filter.

This makes it more usable from multiple places, so use it from those
places as well.
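
The filter-based check described above follows a common callback pattern,
sketched here on a toy data model. The struct, the function signatures and the
treatment of a NULL filter are hypothetical and do not mirror the devlink
code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy rate object: either a node or a leaf. */
struct rate {
	bool is_node;
};

typedef bool (*rate_filter_t)(const struct rate *rate);

/* Filter that matches rate nodes only. */
bool rate_is_node(const struct rate *rate)
{
	return rate->is_node;
}

/*
 * Return -EBUSY if any rate matches the filter, 0 otherwise.
 * In this sketch a NULL filter matches every rate.
 */
int rates_check(const struct rate *rates, size_t count, rate_filter_t filter)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (!filter || filter(&rates[i]))
			return -EBUSY;
	}
	return 0;
}
```

Passing the predicate as a parameter lets one checker serve callers that care
about different subsets of rate objects.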

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260128112544.1661250-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agodevlink: Reverse locking order for nested instances
Cosmin Ratiu [Wed, 28 Jan 2026 11:25:33 +0000 (13:25 +0200)] 
devlink: Reverse locking order for nested instances

Commit [1] defined the locking expectations for nested devlink
instances: the nested-in devlink instance lock needs to be acquired
before the nested devlink instance lock. The code handling devlink rels
was architected with that assumption in mind.

There are no actual users of double locking yet but that is about to
change in the upcoming patches in the series.

Code operating on nested devlink instances will require also obtaining
the nested-in instance lock, but such code may already be called from a
variety of places with the nested devlink instance lock. Then, there's
no way to acquire the nested-in lock other than making sure that all
callers acquire it first.

Reversing the nested lock order allows incrementally acquiring the
nested-in instance lock when needed (perhaps even a chain of locks up to
the root) without affecting any caller.

The only affected use of nesting is devlink_nl_nested_fill(), which
iterates over nested devlink instances with the RCU lock, without
locking them, so there's no possibility of deadlock.

So this commit just updates a comment regarding the nested locks.

[1] commit c137743bce02b ("devlink: introduce object and nested devlink
relationship infra")

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260128112544.1661250-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoMerge branch 'net-stmmac-pcs-preparation'
Jakub Kicinski [Tue, 3 Feb 2026 03:16:05 +0000 (19:16 -0800)] 
Merge branch 'net-stmmac-pcs-preparation'

Russell King says:

====================
net: stmmac: pcs preparation

These three patches prepare for the PCS changes, which, subject
to Qualcomm testing, should be coming in the next cycle.
====================

Link: https://patch.msgid.link/aXyRlFw7ZuhRPiKo@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: stmmac: handle integrated PCS phy_intf_sel separately
Russell King (Oracle) [Fri, 30 Jan 2026 11:10:36 +0000 (11:10 +0000)] 
net: stmmac: handle integrated PCS phy_intf_sel separately

The dwmac core has no support for SGMII without using its integrated
PCS. Thus, PHY_INTF_SEL_SGMII is only supported when this block is
present, and it makes no sense for stmmac_get_phy_intf_sel() to decode
this.

None of the platform glue users that use stmmac_get_phy_intf_sel()
directly accept PHY_INTF_SEL_SGMII as a valid mode.

Check whether a PCS will be used by the driver for the interface mode,
and if it is the integrated PCS, query the integrated PCS for the
phy_intf_sel_i value to use.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vlmOa-00000006zvB-1fIe@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: stmmac: move most PCS register definitions to stmmac_pcs.c
Russell King (Oracle) [Fri, 30 Jan 2026 11:10:31 +0000 (11:10 +0000)] 
net: stmmac: move most PCS register definitions to stmmac_pcs.c

Move most of the PCS register offset definitions to stmmac_pcs.c.
Since stmmac_pcs.c only ever passes zero into the register offset
macros, remove that ability, making them simple constant integer
definitions.

Add appropriate descriptions of the registers, pointing out their
similarity with their IEEE 802.3 counterparts. Make use of the
BMSR definitions for the GMAC_AN_STATUS register and remove the
driver private versions.

Note that BMSR_LSTATUS is non-low-latching, unlike its 802.3z
counterpart.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vlmOV-00000006zv5-1CwO@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: stmmac: clear half-duplex caps where unsupported
Russell King (Oracle) [Fri, 30 Jan 2026 11:10:26 +0000 (11:10 +0000)] 
net: stmmac: clear half-duplex caps where unsupported

Where a core supports hardware features, but does not indicate support
for half-duplex, clear phylink's half-duplex 1G, 100M and 10M
capability bits to disallow half-duplex operation and advertisement of
these link modes.

This will avoid the need for special code in the PCS driver to do this
based on the ESTATUS register bits, as the support in the PCS is
dependent on the same synthesis choice as the MAC core.
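
The capability masking described above amounts to clearing three bits. In this
sketch the CAP_* names and bit positions are made up for illustration, and the
two booleans stand in for the driver's hardware feature information.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative capability bits; names and positions are assumptions. */
#define CAP_10HD	(1ul << 0)
#define CAP_10FD	(1ul << 1)
#define CAP_100HD	(1ul << 2)
#define CAP_100FD	(1ul << 3)
#define CAP_1000HD	(1ul << 4)
#define CAP_1000FD	(1ul << 5)

/*
 * Drop the half-duplex capabilities when the core reports its hardware
 * features but does not indicate half-duplex support.
 */
unsigned long clear_unsupported_half_duplex(unsigned long caps,
					    bool has_hw_features,
					    bool half_duplex)
{
	if (has_hw_features && !half_duplex)
		caps &= ~(CAP_10HD | CAP_100HD | CAP_1000HD);
	return caps;
}
```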

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vlmOQ-00000006zuz-0ffN@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoMerge branch 'add-support-for-renesas-rz-g3l-gbeth'
Jakub Kicinski [Tue, 3 Feb 2026 03:12:19 +0000 (19:12 -0800)] 
Merge branch 'add-support-for-renesas-rz-g3l-gbeth'

Biju Das says:

====================
Add support for Renesas RZ/G3L GBETH

From: Biju Das <biju.das.jz@bp.renesas.com>

The Renesas RZ/G3L GBETH IP uses Synopsys DesignWare MAC version 5.30,
whereas other Renesas SoCs such as RZ/V2H use MAC version 5.20.

The RZ/G3L GBETH requires an extra clock compared to RZ/G3E and has pps
interrupts. Document the Renesas RZ/G3L GBETH IP in bindings and add
support for the RZ/G3L GBETH in dwmac-renesas-gbeth glue driver.
====================

Link: https://patch.msgid.link/20260131161250.5047-1-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: stmmac: dwmac-renesas-gbeth: Add support for RZ/G3L SoC
Biju Das [Sat, 31 Jan 2026 16:12:43 +0000 (16:12 +0000)] 
net: stmmac: dwmac-renesas-gbeth: Add support for RZ/G3L SoC

Unlike the other Renesas GBETH IPs handled by this stmmac glue driver,
the RZ/G3L GBETH IP uses Synopsys DesignWare MAC version 5.30. It has an
extra clock compared to RZ/V2H and has ptp_pps_o interrupts. Add support
for RZ/G3L GBETH by reusing the device data of RZ/V2H; this can be
extended to add other functionality later.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://patch.msgid.link/20260131161250.5047-3-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agodt-bindings: net: renesas,rzv2h-gbeth: Document Renesas RZ/G3L SoC
Biju Das [Sat, 31 Jan 2026 16:12:42 +0000 (16:12 +0000)] 
dt-bindings: net: renesas,rzv2h-gbeth: Document Renesas RZ/G3L SoC

Add device tree binding support for the Gigabit Ethernet (GBETH) IP on
the Renesas RZ/G3L SoC. This SoC uses a different Synopsys DesignWare MAC
version (5.30) than RZ/G3E.

RZ/G3L requires an extra clock compared to RZ/G3E and has pps interrupts.

Add a new compatible string "renesas,r9a08g046-gbeth" for RZ/G3L SoC and
update the schema to handle hardware differences between SoC variants.

Extend the base snps,dwmac.yaml schema to accommodate the PPS interrupts.

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://patch.msgid.link/20260131161250.5047-2-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoMerge branch 'mptcp-implement-read_sock-and-splice_read'
Jakub Kicinski [Tue, 3 Feb 2026 02:15:35 +0000 (18:15 -0800)] 
Merge branch 'mptcp-implement-read_sock-and-splice_read'

Matthieu Baerts says:

====================
mptcp: implement .read_sock and .splice_read

This series is preparation work for future in-kernel MPTCP sockets
usage. Here, two interfaces are implemented: read_sock and splice_read.
As a result of this series, splice() with MPTCP sockets -- which was
already supported -- is now improved.

- Patches 1-2: .read_sock implementation

- Patches 3-4: .splice_read implementation

- Patches 5-6: validate splice() support with MPTCP sockets.
====================

Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-0-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoselftests: mptcp: connect: cover splice mode
Geliang Tang [Fri, 30 Jan 2026 19:24:29 +0000 (20:24 +0100)] 
selftests: mptcp: connect: cover splice mode

The "splice" alternate mode for mptcp_connect.sh/.c is now available.
This patch adds mptcp_connect_splice.sh to test it in the MPTCP CI by
default.

Note that this mode is also supported by stable kernel versions, but
optimised in this patch series.

Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-6-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoselftests: mptcp: add splice io mode
Geliang Tang [Fri, 30 Jan 2026 19:24:28 +0000 (20:24 +0100)] 
selftests: mptcp: add splice io mode

This patch adds a new 'splice' io mode for mptcp_connect to test
the newly added read_sock() and splice_read() functions of MPTCP.

do_splice() efficiently transfers data directly between two file
descriptors (infd and outfd) without copying to userspace, using
Linux's splice() system call.

Usage:
./mptcp_connect.sh -m splice
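
For readers unfamiliar with splice(), the sketch below shows one way to move
data between two descriptors through an intermediate pipe on Linux. It is a
standalone illustration and not the selftest's do_splice(); the function name
and the 64 KiB chunk size are arbitrary choices.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Move data from infd to outfd until EOF without copying it through a
 * userspace buffer. Returns the number of bytes moved, or -1 on error.
 */
ssize_t splice_copy(int infd, int outfd)
{
	int pipefd[2];
	ssize_t total = 0;

	if (pipe(pipefd) < 0)
		return -1;

	for (;;) {
		/* Pull up to 64 KiB from the input into the pipe. */
		ssize_t in = splice(infd, NULL, pipefd[1], NULL, 65536,
				    SPLICE_F_MOVE);
		if (in < 0) {
			total = -1;
			break;
		}
		if (in == 0)	/* EOF on infd */
			break;

		/* Drain everything just queued in the pipe to the output. */
		while (in > 0) {
			ssize_t out = splice(pipefd[0], NULL, outfd, NULL,
					     (size_t)in, SPLICE_F_MOVE);
			if (out <= 0) {
				total = -1;
				goto done;
			}
			in -= out;
			total += out;
		}
	}
done:
	close(pipefd[0]);
	close(pipefd[1]);
	return total;
}
```

splice() needs a pipe on at least one side of each call, which is why the
helper bounces the data through its own pipe.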

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-5-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agomptcp: implement .splice_read
Geliang Tang [Fri, 30 Jan 2026 19:24:27 +0000 (20:24 +0100)] 
mptcp: implement .splice_read

This patch implements .splice_read interface of mptcp struct proto_ops
as mptcp_splice_read() with reference to tcp_splice_read().

Corresponding to __tcp_splice_read(), __mptcp_splice_read() is defined,
invoking mptcp_read_sock() instead of tcp_read_sock().

mptcp_splice_read() is almost the same as tcp_splice_read(), except for
sock_rps_record_flow().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-4-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agotcp: export tcp_splice_state
Geliang Tang [Fri, 30 Jan 2026 19:24:26 +0000 (20:24 +0100)] 
tcp: export tcp_splice_state

Export struct tcp_splice_state and tcp_splice_data_recv() in net/tcp.h
so that they can be used by MPTCP in the next patch.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Acked-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-3-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agomptcp: implement .read_sock
Geliang Tang [Fri, 30 Jan 2026 19:24:25 +0000 (20:24 +0100)] 
mptcp: implement .read_sock

Current in-kernel TCP sockets -- i.e. from nvme_tcp_try_recv() -- need
to call .read_sock interface of struct proto_ops, but it's not
implemented in MPTCP.

This patch implements it with reference to __tcp_read_sock() and
__mptcp_recvmsg_mskq().

Corresponding to tcp_recv_skb(), a new helper for MPTCP named
mptcp_recv_skb() is added to peek a skb from sk->sk_receive_queue.

Compared with __mptcp_recvmsg_mskq(), mptcp_read_sock() uses
sk->sk_rcvbuf as the max read length. The LISTEN status is checked
before the while loop, and mptcp_recv_skb() and mptcp_cleanup_rbuf()
are invoked after the loop. In the loop, all flags checks for
__mptcp_recvmsg_mskq() are removed.

Reviewed-by: Hannes Reinecke <hare@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-2-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agomptcp: add eat_recv_skb helper
Geliang Tang [Fri, 30 Jan 2026 19:24:24 +0000 (20:24 +0100)] 
mptcp: add eat_recv_skb helper

This patch extracts the free skb related code in __mptcp_recvmsg_mskq()
into a new helper mptcp_eat_recv_skb().

This new helper will be used in the next patch.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-1-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoMerge branch 'ptp-vmclock-add-vm-generation-counter-and-acpi-notification'
Jakub Kicinski [Tue, 3 Feb 2026 02:06:02 +0000 (18:06 -0800)] 
Merge branch 'ptp-vmclock-add-vm-generation-counter-and-acpi-notification'

Takahiro Itazuri says:

====================
ptp: vmclock: Add VM generation counter and ACPI notification

Similarly to live migration, starting a VM from some serialized state
(aka snapshot) is an event which calls for adjusting guest clocks, hence
a hypervisor should increase the disruption_marker before resuming the
VM vCPUs, letting the guest know.

However, loading a snapshot is slightly different from live migration,
especially since we can start multiple VMs from the same serialized
state. Apart from adjusting clocks, the guest needs to take additional
action during such events, e.g. recreate UUIDs, reset network
adapters/connections, reseed entropy pools, etc. These actions are not
necessary during live migration. This calls for a differentiation
between the two triggering events.

We differentiate between the two events via an extra field in the
vmclock_abi, called vm_generation_counter. Whereas hypervisors should
increase the disruption marker in both cases, they should only increase
vm_generation_counter when a snapshot is loaded in a VM (not during live
migration).

Additionally, we attach an ACPI notification to VMClock. Implementing
the notification is optional for the device. VMClock device will declare
that it implements the notification by setting
VMCLOCK_FLAG_NOTIFICATION_PRESENT bit in vmclock_abi flags. Hypervisors
that implement the notification must send an ACPI notification every
time seq_count changes to an even number. The driver will propagate
these notifications to userspace via the poll() interface.
====================

Link: https://patch.msgid.link/20260130173704.12575-1-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoptp: ptp_vmclock: return TAI not UTC
David Woodhouse [Fri, 30 Jan 2026 17:36:06 +0000 (17:36 +0000)] 
ptp: ptp_vmclock: return TAI not UTC

To output UTC would involve complex calculations about whether the time
elapsed since the reference time has crossed the end of the month when
a leap second takes effect. I've prototyped that, but it made me sad.

Much better to report TAI, which is what PHCs should do anyway.
And much much simpler.
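
To illustrate why UTC is the harder target, here is a minimal sketch in
Python of a TAI-to-UTC conversion. It is not the driver's code:
tai_to_utc_ns(), its parameters and the sample values are hypothetical,
and the sketch ignores how the leap second itself is represented.

```python
NSEC_PER_SEC = 1_000_000_000


def tai_to_utc_ns(tai_ns, offset_at_ref_s, leap_events):
    """Convert a TAI timestamp to UTC (both in nanoseconds).

    offset_at_ref_s: TAI - UTC in seconds at the reference time.
    leap_events: (tai_ns_when_effective, delta_s) pairs, hypothetical input.
    """
    offset_s = offset_at_ref_s
    for effective_tai_ns, delta_s in leap_events:
        # The offset only changes once the instant has crossed the boundary.
        if tai_ns >= effective_tai_ns:
            offset_s += delta_s
    return tai_ns - offset_s * NSEC_PER_SEC


# Made-up values: a leap second takes effect at TAI = 50 s.
events = [(50 * NSEC_PER_SEC, 1)]
print(tai_to_utc_ns(40 * NSEC_PER_SEC, 37, events))   # before the boundary
print(tai_to_utc_ns(100 * NSEC_PER_SEC, 37, events))  # after the boundary
```

Reporting TAI needs none of this bookkeeping, since TAI has no leap
seconds.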

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Tested-by: Takahiro Itazuri <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-8-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoptp: ptp_vmclock: remove dependency on CONFIG_ACPI
David Woodhouse [Fri, 30 Jan 2026 17:36:05 +0000 (17:36 +0000)] 
ptp: ptp_vmclock: remove dependency on CONFIG_ACPI

Now that we added device tree support we can remove dependency on
CONFIG_ACPI.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Tested-by: Takahiro Itazuri <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-7-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoptp: ptp_vmclock: add 'VMCLOCK' to ACPI device match
David Woodhouse [Fri, 30 Jan 2026 17:36:04 +0000 (17:36 +0000)] 
ptp: ptp_vmclock: add 'VMCLOCK' to ACPI device match

As we finalised the spec, we spotted that vmgenid actually says that the
_HID is supposed to be hypervisor-specific. Although in the 13 years
since the original vmgenid doc was published, nobody seems to have cared
about using _HID to distinguish between implementations on different
hypervisors, and we only ever use the _CID.

For consistency, match the _CID of "VMCLOCK" too.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Tested-by: Takahiro Itazuri <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-6-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoptp: ptp_vmclock: Add device tree support
David Woodhouse [Fri, 30 Jan 2026 17:36:03 +0000 (17:36 +0000)] 
ptp: ptp_vmclock: Add device tree support

Add device tree support to the ptp_vmclock driver, allowing it to probe
via device tree in addition to ACPI.

Handle optional interrupt for clock disruption notifications, mirroring
the ACPI notification behaviour.

Although the interrupt is marked as 'optional' in the DT bindings, if
the device *advertises* the VMCLOCK_FLAG_NOTIFICATION_PRESENT then it
*should* have an interrupt. The driver will refuse to initialize if not.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
Tested-by: Takahiro Itazuri <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-5-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agodt-bindings: ptp: Add amazon,vmclock
David Woodhouse [Fri, 30 Jan 2026 17:36:02 +0000 (17:36 +0000)] 
dt-bindings: ptp: Add amazon,vmclock

The vmclock device provides a PTP clock source and precise timekeeping
across live migration and snapshot/restore operations.

The binding has a required memory region containing the vmclock_abi
structure and an optional interrupt for clock disruption notifications.

The full spec is at https://uapi-group.org/specifications/specs/vmclock/

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Tested-by: Takahiro Itazuri <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-4-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoptp: vmclock: support device notifications
Babis Chalios [Fri, 30 Jan 2026 17:36:01 +0000 (17:36 +0000)] 
ptp: vmclock: support device notifications

Add optional support for device notifications in VMClock. When
supported, the hypervisor will send a device notification every time it
updates the seq_count to a new even value.

Moreover, add support for poll() in VMClock as a means to propagate this
notification to user space. poll() will return a POLLIN event to
listeners every time seq_count changes to a value different than the one
last seen (since open() or last read()/pread()). This means that when
poll() returns a POLLIN event, listeners need to use read() to observe
what has changed and update the reader's view of seq_count. In other
words, after a poll() returned, all subsequent calls to poll() will
immediately return with a POLLIN event until the listener calls read().

The device advertises support for the notification mechanism by setting
flag VMCLOCK_FLAG_NOTIFICATION_PRESENT in vmclock_abi flags field. If
the flag is not present the driver won't set up the ACPI notification
handler and poll() will always immediately return POLLHUP.
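
The poll()/read() contract described above can be modelled in a few
lines. This is a toy model in Python, not the driver code: the class
VmclockReaderModel and its attributes are hypothetical, and an empty
string stands in for "no event yet" (a real poll() would block).

```python
class VmclockReaderModel:
    """Toy model of one open file description on the vmclock device."""

    def __init__(self, device_seq, notification_present=True):
        self.device_seq = device_seq      # seq_count published by the hypervisor
        self.last_seen = device_seq       # value observed at open()
        self.notification_present = notification_present

    def poll(self):
        if not self.notification_present:
            return "POLLHUP"              # device does not advertise notifications
        # Ready as long as seq_count differs from the reader's last view.
        return "POLLIN" if self.device_seq != self.last_seen else ""

    def read(self):
        self.last_seen = self.device_seq  # read() refreshes the reader's view
        return self.last_seen


reader = VmclockReaderModel(device_seq=4)
reader.device_seq = 6                     # hypervisor moved to a new even value
print(reader.poll(), reader.poll())       # stays ready until read() is called
reader.read()
print(repr(reader.poll()))                # nothing new to report
```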

Signed-off-by: Babis Chalios <bchalios@amazon.es>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Takahiro Itazuri <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-3-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoptp: vmclock: add vm generation counter
Babis Chalios [Fri, 30 Jan 2026 17:36:00 +0000 (17:36 +0000)] 
ptp: vmclock: add vm generation counter

Similar to live migration, loading a VM from some saved state (aka
snapshot) is also an event that calls for clock adjustments in the
guest. However, guests might want to take more actions as a response to
such events, e.g. discarding UUIDs, resetting network connections,
reseeding entropy pools, etc. These are actions that guests don't
typically take during live migration, so add a new field in the
vmclock_abi called vm_generation_counter which informs the guest about
such events.

Hypervisor advertises support for vm_generation_counter through the
VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT flag. Users need to check the
presence of this bit in vmclock_abi flags field before using this field.
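
A guest-side consumer could separate the two events roughly as follows.
This is a sketch in Python under stated assumptions: guest_actions() is
a hypothetical helper, the dictionaries stand in for two samples of
vmclock_abi, and gen_counter_present stands for the result of testing
VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT in the flags field.

```python
def guest_actions(old, new, gen_counter_present):
    """Return the actions a guest might take between two vmclock samples."""
    actions = []
    # Both live migration and snapshot load bump the disruption marker.
    if new["disruption_marker"] != old["disruption_marker"]:
        actions.append("adjust clocks")
    # Only trust the counter when the device advertises it.
    if gen_counter_present and (
        new["vm_generation_counter"] != old["vm_generation_counter"]
    ):
        actions += ["discard UUIDs", "reset network connections",
                    "reseed entropy pools"]
    return actions


before = {"disruption_marker": 1, "vm_generation_counter": 0}
migrated = {"disruption_marker": 2, "vm_generation_counter": 0}
restored = {"disruption_marker": 2, "vm_generation_counter": 1}

print(guest_actions(before, migrated, True))   # clock adjustment only
print(guest_actions(before, restored, True))   # clock adjustment plus snapshot actions
```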

Signed-off-by: Babis Chalios <bchalios@amazon.es>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Takahiro Itazuri <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-2-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoMerge branch 'ipv6-misc-changes-in-output-path'
Jakub Kicinski [Tue, 3 Feb 2026 01:49:31 +0000 (17:49 -0800)] 
Merge branch 'ipv6-misc-changes-in-output-path'

Eric Dumazet says:

====================
ipv6: misc changes in output path

Small optimizations mostly in ip6_xmit() path.

TX performance increases by about 3 %.

Patches 5-7: add dst4_mtu() and dst6_mtu() to save space.

Last patch colocates inet6_cork in inet_cork_full.

This series reduces kernel size by 494 bytes on x86_64:

scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 4/2 grow/shrink: 9/23 up/down: 665/-1159 (-494)
Function                                     old     new   delta
ip6_finish_output_gso_slowpath_drop            -     197    +197
ip6_xmit                                    1452    1595    +143
do_ipv6_getsockopt                          2855    2950     +95
kzalloc_noprof                                 -      55     +55
ip4ip6_err                                   918     955     +37
__icmp_send                                 1499    1532     +33
do_ip_getsockopt                            2573    2605     +32
__ip6_append_data                           4109    4137     +28
__pfx_kzalloc_noprof                           -      16     +16
__pfx_ip6_finish_output_gso_slowpath_drop       -      16     +16
ipmr_prepare_xmit                           1232    1238      +6
ip6_forward                                 1905    1909      +4
ip6_cork_release                             108     111      +3
ipv6_push_nfrag_opts                         489     486      -3
ipv6_push_frag_opts                           90      87      -3
ip6_finish_output2                          1446    1437      -9
ip6_tnl_xmit                                2639    2627     -12
ip6_default_advmss                           176     160     -16
__ip6_rt_update_pmtu                        1087    1071     -16
tcp_v6_syn_recv_sock                        1715    1696     -19
tcp_v4_syn_recv_sock                        1107    1088     -19
__ip_make_skb                               1339    1320     -19
ip_setup_cork                                406     385     -21
ip6_setup_cork                               732     710     -22
rawv6_push_pending_frames                    581     556     -25
ip6_push_pending_frames                      184     157     -27
udpv6_splice_eof                             203     170     -33
ip6_flush_pending_frames                     220     183     -37
ip6_append_data                              349     312     -37
udp_v6_push_pending_frames                   155     115     -40
sit_tunnel_xmit                             1957    1914     -43
__pfx_dst_mtu                                 64       -     -64
tcp_v4_mtu_reduced                           289     220     -69
tcp_v6_mtu_reduced                           209     139     -70
ip6_make_skb                                 574     484     -90
ip6_finish_output                            827     697    -130
dst_mtu                                      160       -    -160
fib6_nh_mtu_change                           511     336    -175
Total: Before=22584400, After=22583906, chg -0.00%
====================

Link: https://patch.msgid.link/20260130210303.3888261-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoipv6: colocate inet6_cork in inet_cork_full
Eric Dumazet [Fri, 30 Jan 2026 21:03:03 +0000 (21:03 +0000)] 
ipv6: colocate inet6_cork in inet_cork_full

All inet6_cork users also use one inet_cork_full.

Reduce number of parameters and increase data locality.

This saves ~275 bytes of code on x86_64.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoipv4: use dst4_mtu() instead of dst_mtu()
Eric Dumazet [Fri, 30 Jan 2026 21:03:02 +0000 (21:03 +0000)] 
ipv4: use dst4_mtu() instead of dst_mtu()

When we expect an IPv4 dst, use dst4_mtu() instead of dst_mtu()
to save some code space.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoipv6: use dst6_mtu() instead of dst_mtu()
Eric Dumazet [Fri, 30 Jan 2026 21:03:01 +0000 (21:03 +0000)] 
ipv6: use dst6_mtu() instead of dst_mtu()

When we expect an IPv6 dst, use dst6_mtu() instead of dst_mtu()
to save some code space.

Due to current dst6_mtu() implementation, only convert
users in IPv6 stack.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoinet: add dst4_mtu() and dst6_mtu() helpers
Eric Dumazet [Fri, 30 Jan 2026 21:03:00 +0000 (21:03 +0000)] 
inet: add dst4_mtu() and dst6_mtu() helpers

With CONFIG_MITIGATION_RETPOLINE=y dst_mtu() is a bit fat,
because it is generic.

Indeed, clang does not always inline it.

Add dst4_mtu() and dst6_mtu() helpers for callers that
expect either ipv4_mtu() or ip6_mtu() to be called.

These helpers are always inlined.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoipv6: use SKB_DROP_REASON_PKT_TOO_BIG in ip6_xmit()
Eric Dumazet [Fri, 30 Jan 2026 21:02:59 +0000 (21:02 +0000)] 
ipv6: use SKB_DROP_REASON_PKT_TOO_BIG in ip6_xmit()

When a too big packet is dropped, use SKB_DROP_REASON_PKT_TOO_BIG.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoipv6: use __skb_push() in ip6_xmit()
Eric Dumazet [Fri, 30 Jan 2026 21:02:58 +0000 (21:02 +0000)] 
ipv6: use __skb_push() in ip6_xmit()

ip6_xmit() makes sure there is enough headroom in the skb,
so it can use __skb_push() instead of the out-of-line skb_push().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoipv6: add some unlikely()/likely() clauses in ip6_output.c
Eric Dumazet [Fri, 30 Jan 2026 21:02:57 +0000 (21:02 +0000)] 
ipv6: add some unlikely()/likely() clauses in ip6_output.c

1) daddr is unlikely a multicast in ip6_finish_output2().

2) ip6_finish_output_gso_slowpath_drop() should not be called often.

3) ip6_fragment() should not be called often.

4) opt is unlikely to be set.

5) ip6_xmit() and ip6_forward() mostly send packets that are not too big.

6) Most __ip6_make_skb() calls are for UDP packets,
   not ICMPV6 ones.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoipv6: pass proto by value to ipv6_push_nfrag_opts() and ipv6_push_frag_opts()
Eric Dumazet [Fri, 30 Jan 2026 21:02:56 +0000 (21:02 +0000)] 
ipv6: pass proto by value to ipv6_push_nfrag_opts() and ipv6_push_frag_opts()

With CONFIG_STACKPROTECTOR_STRONG=y, it is better to avoid passing
a pointer to an automatic variable.

Change these exported functions to return 'u8 proto'
instead of void.

- ipv6_push_nfrag_opts()
- ipv6_push_frag_opts()

For instance, replace
ipv6_push_frag_opts(skb, opt, &proto);
with:
proto = ipv6_push_frag_opts(skb, opt, proto);

Note that even after this change, ip6_xmit() has to use a stack canary
because of @first_hop variable.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: remove unnecessary module_init/exit functions
Ethan Nelson-Moore [Sat, 31 Jan 2026 00:42:56 +0000 (16:42 -0800)] 
net: remove unnecessary module_init/exit functions

Many network drivers have unnecessary empty module_init and module_exit
functions. Remove them (including some that just print a message). Note
that if a module_init function exists, a module_exit function must also
exist; otherwise, the module cannot be unloaded.

Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260131004327.18112-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: ethernet: use module_pci_driver; remove useless driver versions
Ethan Nelson-Moore [Sat, 31 Jan 2026 02:24:30 +0000 (18:24 -0800)] 
net: ethernet: use module_pci_driver; remove useless driver versions

The module version is useless, and the only thing these drivers' init
routines did besides pci_register_driver was to print the driver name
and/or version.

Acked-by: Francois Romieu <romieu@fr.zoreil.com> (epic100)
Reviewed-by: Simon Horman <horms@kernel.org> (epic100, sis900)
Reviewed-by: Sai Krishna <saikrishnag@marvell.com> (epic100)
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Link: https://patch.msgid.link/20260131022441.56274-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: add a debug check in __skb_push()
Eric Dumazet [Fri, 30 Jan 2026 16:02:53 +0000 (16:02 +0000)] 
net: add a debug check in __skb_push()

Add the following check, to detect bugs sooner for CONFIG_DEBUG_NET=y
builds.

DEBUG_NET_WARN_ON_ONCE(skb->data < skb->head);

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260130160253.2936789-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoMerge branch 'net-phy-dp83867-always-program-r-sgmii-enable-bits'
Jakub Kicinski [Tue, 3 Feb 2026 01:19:55 +0000 (17:19 -0800)] 
Merge branch 'net-phy-dp83867-always-program-r-sgmii-enable-bits'

Sean Anderson says:

====================
net: phy: dp83867: Always program R/SGMII enable bits

The hardware designers at my company neglected to read the datasheet for
this PHY and did not add appropriate resistors to configure it for
SGMII. Add support for configuring it based on phy-mode instead of
relying on the resistors for a suitable default.
====================

Link: https://patch.msgid.link/20260129171205.3868605-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: phy: dp83867: Always program R/SGMII enable bits
Sean Anderson [Thu, 29 Jan 2026 17:12:05 +0000 (12:12 -0500)] 
net: phy: dp83867: Always program R/SGMII enable bits

If the board designers have neglected to populate the appropriate
resistors on the strapping pins then the phy may default to the wrong
interface mode. Enable/disable the RGMII/SGMII enable bits as necessary
to select the correct interface.

The dp83867 strapping pins have four levels and typically configure two
features at once. LED_0 controls both port mirroring and whether SGMII
is enabled. If it is pulled to VDDIO, both port mirroring and SGMII
will be enabled. For variants of the dp83867 that do not support SGMII,
this will prevent data from being transferred. As we now explicitly set
the SGMII and RGMII enable bits, we do not need to detect whether SGMII
has been inadvertently enabled.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20260129171205.3868605-3-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: phy: dp83867: Program TX FIFO for all interfaces
Sean Anderson [Thu, 29 Jan 2026 17:12:04 +0000 (12:12 -0500)] 
net: phy: dp83867: Program TX FIFO for all interfaces

All supported interfaces use the TX FIFO register at least some of the
time, so there's no point in checking the interface. Retain the check
for the RX FIFO level since it is only used by SGMII.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20260129171205.3868605-2-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: l3mdev: use skb_dst_dev_rcu() in l3mdev_l3_out()
Eric Dumazet [Fri, 30 Jan 2026 19:19:06 +0000 (19:19 +0000)] 
net: l3mdev: use skb_dst_dev_rcu() in l3mdev_l3_out()

Extend the RCU section a bit so that we can use the safer
skb_dst_dev_rcu() helper.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130191906.3781856-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agobnxt_en: Allow ntuple filters for drops
Joe Damato [Sat, 31 Jan 2026 00:30:41 +0000 (16:30 -0800)] 
bnxt_en: Allow ntuple filters for drops

It appears that in commit 7efd79c0e689 ("bnxt_en: Add drop action
support for ntuple"), bnxt gained support for ntuple filters for packet
drops.

However, support for this does not seem to work in recent kernels or
against net-next:

  % sudo ethtool -U eth0 flow-type udp4 src-ip 1.1.1.1 action -1
    rmgr: Cannot insert RX class rule: Operation not supported
    Cannot insert classification rule

The issue is that the existing code uses ethtool_get_flow_spec_ring_vf,
which will return a non-zero value if the ring_cookie is set to
RX_CLS_FLOW_DISC, which then causes bnxt_add_ntuple_cls_rule to return
-EOPNOTSUPP because it thinks the user is trying to set an ntuple filter
for a vf.

Fix this by first checking that the ring_cookie is not RX_CLS_FLOW_DISC.
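
The bit arithmetic behind the bug can be checked with a small Python
model. The three constants mirror the values in the ethtool UAPI header;
the two add_ntuple_rule_*() functions are hypothetical simplifications
of the driver logic, written only to show the order of the checks.

```python
RX_CLS_FLOW_DISC = 0xFFFFFFFFFFFFFFFF
ETHTOOL_RX_FLOW_SPEC_RING = 0x00000000FFFFFFFF
ETHTOOL_RX_FLOW_SPEC_RING_VF = 0x000000FF00000000
ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF = 32


def get_flow_spec_ring_vf(ring_cookie):
    """Python equivalent of ethtool_get_flow_spec_ring_vf()."""
    return (ring_cookie & ETHTOOL_RX_FLOW_SPEC_RING_VF) >> ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF


def add_ntuple_rule_before(ring_cookie):
    # The VF test runs first, so the all-ones drop cookie looks like VF 0xff.
    if get_flow_spec_ring_vf(ring_cookie):
        return "-EOPNOTSUPP"
    if ring_cookie == RX_CLS_FLOW_DISC:
        return "drop"
    return f"ring {ring_cookie & ETHTOOL_RX_FLOW_SPEC_RING}"


def add_ntuple_rule_after(ring_cookie):
    # Recognise the drop cookie before interpreting any VF bits.
    if ring_cookie == RX_CLS_FLOW_DISC:
        return "drop"
    if get_flow_spec_ring_vf(ring_cookie):
        return "-EOPNOTSUPP"
    return f"ring {ring_cookie & ETHTOOL_RX_FLOW_SPEC_RING}"


print(hex(get_flow_spec_ring_vf(RX_CLS_FLOW_DISC)))
print(add_ntuple_rule_before(RX_CLS_FLOW_DISC))
print(add_ntuple_rule_after(RX_CLS_FLOW_DISC))
```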

After this patch, ntuple filters for drops can be added:

  % sudo ethtool -U eth0 flow-type udp4 src-ip 1.1.1.1 action -1
  Added rule with ID 0

  % ethtool -n eth0
  44 RX rings available
  Total 1 rules

  Filter: 0
      Rule Type: UDP over IPv4
      Src IP addr: 1.1.1.1 mask: 0.0.0.0
      Dest IP addr: 0.0.0.0 mask: 255.255.255.255
      TOS: 0x0 mask: 0xff
      Src port: 0 mask: 0xffff
      Dest port: 0 mask: 0xffff
      Action: Drop

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260131003042.2570434-1-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agotools: ynl: cli: make the output compact
Jakub Kicinski [Sat, 31 Jan 2026 20:30:29 +0000 (12:30 -0800)] 
tools: ynl: cli: make the output compact

Make the default (non-JSON) output more compact. Looking at RSS
context dumps is pretty much impossible without this, because
default print shows the indirection table with one line per entry:

  'indir': [0,
            1,
            2,
    ...

And indirection tables have 100-200 entries each.

The compact output is far more readable:

    'indir': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
              16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
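
The difference can be reproduced with Python's standard pprint module,
which has a compact option. This is a sketch only: the 128-entry table
is made up, and whether the CLI uses exactly this call is an assumption.

```python
import pprint

# Hypothetical RSS context dump with a 128-entry indirection table.
msg = {"indir": list(range(128))}

default_out = pprint.pformat(msg)                # one list element per line
compact_out = pprint.pformat(msg, compact=True)  # packs elements up to the width

print(len(default_out.splitlines()), "lines by default")
print(len(compact_out.splitlines()), "lines when compact")
```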

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260131203029.1173492-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agodocs: networking: mention that RSS table should be 4x the queue count
Jakub Kicinski [Sat, 31 Jan 2026 22:54:54 +0000 (14:54 -0800)] 
docs: networking: mention that RSS table should be 4x the queue count

Spell out the recommendation that the RSS table should be
4x the queue count to avoid traffic imbalance. Include minor
rephrasing and removal of the explicit 128 entry example
since a 128 entry table is inadequate on modern machines.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131225454.1225151-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoselftests: drv-net: rss: validate min RSS table size
Jakub Kicinski [Sat, 31 Jan 2026 22:54:53 +0000 (14:54 -0800)] 
selftests: drv-net: rss: validate min RSS table size

Add a test which checks that the RSS table is at least 4x the max
queue count supported by the device. The original RSS spec from
Microsoft stated that the RSS indirection table should be 2 to 8
times the CPU count, presumably assuming queue per CPU. If the
CPU count is not a power of two, however, a power-of-2 table
2x larger than queue count results in a 33% traffic imbalance.
Validate that the indirection table is at least 4x the queue
count. This lowers the imbalance to 16% which empirically
appears to be more acceptable to memcache-like workloads.
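
The 33% and 16% figures can be reproduced with a small model. A minimal
sketch in Python, assuming the table is filled round-robin (entry i maps
to queue i % queues) and that imbalance is measured as
(busiest - least busy) / busiest. Both are assumptions made for
illustration, and rss_imbalance() is a hypothetical helper, not part of
the selftest.

```python
def rss_imbalance(queues, table_size):
    """Relative gap in table entries between the busiest and idlest queue."""
    counts = [0] * queues
    for entry in range(table_size):
        counts[entry % queues] += 1   # assumed round-robin default fill
    return (max(counts) - min(counts)) / max(counts)


# 3 queues: the smallest power-of-2 table >= 2x is 8, >= 4x is 16.
for size in (8, 16):
    print(f"queues=3 table={size}: {rss_imbalance(3, size):.1%}")
```

With 3 queues an 8-entry table gives 3/3/2 entries per queue and a
16-entry table gives 6/5/5, which is where the two percentages come from.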

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260131225454.1225151-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: spacemit: display phy driver information
Chukun Pan [Sun, 1 Feb 2026 10:00:01 +0000 (18:00 +0800)] 
net: spacemit: display phy driver information

Print the PHY driver used and interrupt status after connection.

Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260201100001.33102-1-amadeus@jmu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 days agoMerge tag 'linux-can-next-for-6.20-20260131' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Sat, 31 Jan 2026 20:59:48 +0000 (12:59 -0800)] 
Merge tag 'linux-can-next-for-6.20-20260131' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2026-01-31

The first 2 patches are by Biju Das, target the rcar_canfd driver and
add support for FD-only mode.

Lad Prabhakar's patches, also for the rcar_canfd driver, add support
for the RZ/T2H SoC.

The last 2 patches are by Michael Tretter and me, target the sja1000
driver and clean up the CAN state handling.

* tag 'linux-can-next-for-6.20-20260131' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
  can: sja1000: sja1000_err(): use error counter for error state
  can: sja1000: sja1000_err(): make use of sja1000_get_berr_counter() to read error counters
  can: rcar_canfd: Add RZ/T2H support
  dt-bindings: can: renesas,rcar-canfd: Document RZ/T2H and RZ/N2H SoCs
  dt-bindings: can: renesas,rcar-canfd: Document RZ/V2H(P) and RZ/V2N SoCs
  dt-bindings: can: renesas,rcar-canfd: Specify reset-names
  can: rcar_canfd: Add support for FD-Only mode
  dt-bindings: can: renesas,rcar-canfd: Document renesas,fd-only property
====================

Link: https://patch.msgid.link/20260131101512.1958907-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agoRevert "net/smc: Introduce TCP ULP support"
D. Wythe [Wed, 28 Jan 2026 05:54:52 +0000 (13:54 +0800)] 
Revert "net/smc: Introduce TCP ULP support"

This reverts commit d7cd421da9da2cc7b4d25b8537f66db5c8331c40.

As reported by Al Viro, the TCP ULP support for SMC is fundamentally
broken. The implementation attempts to convert an active TCP socket
into an SMC socket by modifying the underlying `struct file`, dentry,
and inode in-place, which violates core VFS invariants that assume
these structures are immutable for an open file, creating a risk of
use-after-free errors and general system instability.

Given the severity of this design flaw and the fact that cleaner
alternatives (e.g., LD_PRELOAD, BPF) exist for legacy application
transparency, the correct course of action is to remove this feature
entirely.

Fixes: d7cd421da9da ("net/smc: Introduce TCP ULP support")
Link: https://lore.kernel.org/netdev/Yus1SycZxcd+wHwz@ZenIV/
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260128055452.98251-1-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agonet: ax25: remove plumbing for never-implemented DAMA Master support
Ethan Nelson-Moore [Thu, 29 Jan 2026 08:09:04 +0000 (00:09 -0800)] 
net: ax25: remove plumbing for never-implemented DAMA Master support

The AX25_DAMA_MASTER option has been unimplemented and marked broken
ever since it was introduced in 2007 in commit 954b2e7f4c37 ("[NET]
AX.25 Kconfig and docs updates and fixes"). At this point, it is very
unlikely it will be implemented. Remove it.

Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Link: https://patch.msgid.link/20260129080908.44710-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agoMerge branch 'net-wwan-add-nmea-port-type-support'
Jakub Kicinski [Sat, 31 Jan 2026 02:27:02 +0000 (18:27 -0800)] 
Merge branch 'net-wwan-add-nmea-port-type-support'

Slark Xiao says:

====================
net: wwan: add NMEA port type support

The series introduces the long-discussed NMEA port type support for the
WWAN subsystem. There are two goals. From the WWAN driver perspective,
NMEA is exported like any other port type (e.g. AT, MBIM, QMI, etc.).
From the user space software perspective, the exported chardev belongs
to the GNSS class, which makes it easy to distinguish the desired port,
and the WWAN device common to both NMEA and control (AT, MBIM, etc.)
ports makes it easy to locate a control port for GNSS receiver
activation.

This is done by exporting the NMEA port via the GNSS subsystem, with the
WWAN core acting as a proxy between the WWAN modem driver and the GNSS
subsystem.

The series starts with a cleanup patch. Three patches then prepare the
WWAN core for proxy-style operation, followed by a patch introducing a
new WWAN port type, integration with the GNSS subsystem, and demux. The
series ends with a couple of patches that introduce an emulated NMEA
port to the WWAN HW simulator.

The series is the product of the discussion with Loic about the pros and
cons of possible models and implementations. Muhammad and Slark also did
a great job defining the problem, sharing the code and pushing me to
finish the implementation. Daniele caught an issue on driver unloading
and suggested an investigation direction, which was concluded by Loic.
Many thanks.
====================

Link: https://patch.msgid.link/20260126062158.308598-1-slark_xiao@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agonet: wwan: mhi_wwan_ctrl: Add NMEA channel support
Slark Xiao [Mon, 26 Jan 2026 06:21:58 +0000 (14:21 +0800)] 
net: wwan: mhi_wwan_ctrl: Add NMEA channel support

For MHI WWAN devices, the NMEA channel must be matched to the
WWAN_PORT_NMEA type so that the GNSS subsystem can create the
gnss device successfully.

Signed-off-by: Slark Xiao <slark_xiao@163.com>
Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Link: https://patch.msgid.link/20260126062158.308598-9-slark_xiao@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agonet: wwan: hwsim: support NMEA port emulation
Sergey Ryazanov [Mon, 26 Jan 2026 06:21:57 +0000 (14:21 +0800)] 
net: wwan: hwsim: support NMEA port emulation

Support NMEA port emulation for testing the WWAN core GNSS port.
The emulator produces a pair of GGA + RMC sentences every second, which
should be enough to fool gpsd into believing it is working with an NMEA
GNSS receiver.

If the GNSS system is enabled then one NMEA port will be created
automatically for the simulated WWAN device. Manual NMEA port creation
is not supported at the moment.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Link: https://patch.msgid.link/20260126062158.308598-8-slark_xiao@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days ago net: wwan: hwsim: refactor to support more port types
Sergey Ryazanov [Mon, 26 Jan 2026 06:21:56 +0000 (14:21 +0800)] 
net: wwan: hwsim: refactor to support more port types

The just-introduced WWAN NMEA port type needs a testing option. The WWAN
HW simulator was developed with the AT port type in mind and cannot be
easily extended. Refactor it now to make it capable of supporting more
port types.

No big functional changes, mostly renaming with a little code
rearrangement.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Link: https://patch.msgid.link/20260126062158.308598-7-slark_xiao@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days ago net: wwan: add NMEA port support
Sergey Ryazanov [Mon, 26 Jan 2026 06:21:55 +0000 (14:21 +0800)] 
net: wwan: add NMEA port support

Many WWAN modems come with an embedded GNSS receiver and have a
dedicated port to output geopositioning data. On the one hand, the
GNSS receiver has little in common with the WWAN modem, merely shares a
host interface with it, and should be exported using the GNSS subsystem.
On the other hand, the GNSS receiver is not automatically activated and
needs a generic WWAN control port (AT, MBIM, etc.) to be turned on, and
user space software needs extra information to find that control port.

Introduce a new type of WWAN port - NMEA. When a driver asks to register
an NMEA port, the core allocates the common parent WWAN device as usual,
but exports the NMEA port via the GNSS subsystem and acts as a proxy
between the device driver and the GNSS subsystem.

From the WWAN device driver perspective, an NMEA port is registered as a
regular WWAN port without any difference, and the driver interacts only
with the WWAN core. From the user space perspective, the NMEA port is a
GNSS device whose parent can be used to enumerate and select the proper
control port for the GNSS receiver management.

CC: Slark Xiao <slark_xiao@163.com>
CC: Muhammad Nuzaihan <zaihan@unrealasia.net>
CC: Qiang Yu <quic_qianyu@quicinc.com>
CC: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
CC: Johan Hovold <johan@kernel.org>
Suggested-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Link: https://patch.msgid.link/20260126062158.308598-6-slark_xiao@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days ago net: wwan: core: split port unregister and stop
Sergey Ryazanov [Mon, 26 Jan 2026 06:21:54 +0000 (14:21 +0800)] 
net: wwan: core: split port unregister and stop

The upcoming GNSS (NMEA) port type support requires exporting the port
via the GNSS subsystem. On the other hand, we still need to do basic WWAN
core work: call the port stop operation, purge queues, release the parent
WWAN device, etc. To reuse as much code as possible, split the port
unregistering function into the deregistration of a regular WWAN port
device and the common port teardown code.

In order to keep more code generic, break the device_unregister() call
into device_del() and put_device(), which release the port memory
uniformly.
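
In the driver core, device_unregister() is device_del() followed by
put_device(), so splitting it lets the final put live in shared code.
The following is a userspace toy model of that idea under stated
assumptions, not kernel code: struct toy_dev and every toy_*() function
are hypothetical stand-ins for the refcounted device and its helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a refcounted device; not the kernel's struct device. */
struct toy_dev {
	int refcount;
	bool visible;	/* true while registered and discoverable */
	bool *released;	/* set once the memory has really been freed */
};

static struct toy_dev *toy_new(bool *released)
{
	struct toy_dev *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->refcount = 1;	/* reference owned by the creator */
	d->visible = true;
	d->released = released;
	*released = false;
	return d;
}

static void toy_get(struct toy_dev *d)
{
	d->refcount++;
}

/* Models put_device(): the last put frees the object. */
static void toy_put(struct toy_dev *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0) {
		*d->released = true;
		free(d);
	}
}

/* Models device_del(): hide the device without dropping a reference. */
static void toy_del(struct toy_dev *d)
{
	d->visible = false;
}

/* Teardown: type-specific removal first, then the common tail. */
static void toy_port_teardown(struct toy_dev *d)
{
	toy_del(d);
	/* type-independent work (stop, purge queues) would run here */
	toy_put(d);
}
```

If another holder (for example an open file) still has a reference, the
teardown only hides the device and the memory is released by that
holder's final put.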

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Link: https://patch.msgid.link/20260126062158.308598-5-slark_xiao@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days ago net: wwan: core: split port creation and registration
Sergey Ryazanov [Mon, 26 Jan 2026 06:21:53 +0000 (14:21 +0800)] 
net: wwan: core: split port creation and registration

The upcoming GNSS (NMEA) port type support requires exporting the port
via the GNSS subsystem. On the other hand, we still need to do basic WWAN
core work: find or allocate the WWAN device, make it the port parent,
etc. To reuse as much code as possible, split the port creation function
into the registration of a regular WWAN port device and the basic port
struct initialization.

To be able to use put_device() uniformly, break the device_register()
call into device_initialize() and device_add() and call device
initialization earlier.
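
In the driver core, device_register() is device_initialize() followed
by device_add(). Initializing early means the reference count is live
before anything can fail, so every error path can end in a single put.
The sketch below is a userspace toy model of that pattern, not kernel
code: struct toy_dev, the toy_*() helpers and the fail_* flags are
hypothetical and exist only to exercise the error paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a refcounted device; not the kernel's struct device. */
struct toy_dev {
	int refcount;
	bool added;
	bool *released;	/* set once the memory has really been freed */
};

/* Models device_initialize(): take the initial reference, nothing else. */
static void toy_initialize(struct toy_dev *d, bool *released)
{
	d->refcount = 1;
	d->added = false;
	d->released = released;
	*released = false;
}

/* Models put_device(): the last put frees the object. */
static void toy_put(struct toy_dev *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0) {
		*d->released = true;
		free(d);
	}
}

/* Models device_add(); 'fail' simulates a registration error. */
static int toy_add(struct toy_dev *d, bool fail)
{
	if (fail)
		return -1;
	d->added = true;
	return 0;
}

/* Initialize first, so both failure points share one cleanup label. */
static struct toy_dev *toy_port_create(bool fail_setup, bool fail_add,
				       bool *released)
{
	struct toy_dev *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	toy_initialize(d, released);
	if (fail_setup)
		goto err_put;
	if (toy_add(d, fail_add))
		goto err_put;
	return d;

err_put:
	toy_put(d);
	return NULL;
}
```

The point of the design is that no error path needs to know how far
setup got: dropping the reference is always sufficient.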

While at it, fix a minor number leak upon WWAN port registration
failure.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Link: https://patch.msgid.link/20260126062158.308598-4-slark_xiao@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>