====================
net: ethernet: renesas: rcar_gen4_ptp: Hide private data
The R-Car Gen4 PTP module started out as an exclusive feature of a
single driver, but have since been extended to cover both R-Car Switch
and TSN driver implementations on Gen4.
The feature have already been extended to be built as its own module
with an interface exposed thru a local header file. The header file
however also exposes the modules private data structure. The two
existing users have already started to poke at members of the struct.
The exposed private data being manipulated by users makes refactoring
and future rework hard as the interface for the module becomes to
chaotic. This small series aims to create two helpers to hide the
private data.
This is done as a small preparation before a third, new, users of the
Gen4 PTP will be added in a follow up series.
====================
net: ethernet: renesas: rcar_gen4_ptp: Hide private data from users
The Gen4 PTP helper module is already used by RTSN and RSWITCH to
support PTP clocks and will be used by RAVB too. Hide the Gen4 PTP
private data structure to make sure none of the users poke at it.
This will be more important for RAVB use-cases as more then one RAVB
device will need to cooperate using one PTP clock source.
net: ethernet: renesas: rcar_gen4_ptp: Add helper to read time
Instead of accessing the Gen4 PTP specific structure directly in drivers
add a helper to read the time. This is done in preparation to
completely hide the Gen4 PTP specific structure from users.
net: ethernet: renesas: rcar_gen4_ptp: Add helper to get clock index
Instead of accessing the Gen4 PTP specific structure directly in drivers
add a helper to read the clock index. This is done in preparation to
completely hide the Gen4 PTP specific structure from users.
Instead of accessing the Gen4 PTP specific structure directly in drivers
move the device address assignment into the preparation call. This is
done in preparation to completely hide the Gen4 PTP specific structure
from users.
net: stmmac: rk: remove need for ->set_speed() method
As we can detect whether the SoC provides the parameters necessary for
rk_set_reg_speed(), we don't need to have explicit calls to this.
Instead, we can move the contents of this function to
rk_set_clk_tx_rate().
This remsoves all the .set_speed() implementations that merely go on to
invoke rk_set_reg_speed().
net: stmmac: rk: use rk_encode_wm16() for RMII clock
The RMII clock is a single bit, which is set for 100M and clear for
10M. Move this out of struct rk_reg_speed_data (which gets rid of
this structure) into the struct rk_clock_fields as the bitmask for
this bit.
This gets rid of the per-SoC variability in the calls to
rk_set_reg_speed().
net: stmmac: rk: use rk_encode_wm16() for RMII speed
The RMII speed configuration is encoded as a single bit, which is set
for 100M and clean for 10M. Provide the bitfield definition in
struct rk_clock_fields, moving it out of struct rk_reg_speed_data's
rmii_10 and rmii_100 initialisers. Update rk_set_reg_speed() to handle
the new definition location of this bit.
net: stmmac: rk: use rk_encode_wm16() for RGMII clocks
As all of the RGMII clock selection bitfields (gmii_clk_sel) use the
same encoding, parameterise this by providing the bitfield mask in
the BSP private data.
This is the last user of GRF_FIELD_CONST(), so remove that definition
as well.
One additional change is for RK3328 - as only gmac2io supports RGMII,
only initialise the mask for this instance.
net: stmmac: rk: move speed GRF register offset to private data
Move the speed/clocking related GRF register offset into the driver
private data, convert rk_set_reg_speed() to use it and initialise this
member either from the corresponding member in struct rk_gmac_ops, or
the SoC specific initialisation function.
net: stmmac: rk: convert rk3588 to mask-based interface mode config
rk3588 has a quirk compared to the other Rockchip implementations in
that the interface mode configuration register is in the php_grf
regmap rather than the grf regmap. Add a flag to indicate this, and
a separate function to write to the appropriate regmap. This allows
rk3588 to be converted.
net: stmmac: rk: convert to mask-based interface mode configuration
The majority of Rockchip implementations require three common pieces
of information to configure the PHY interface mode:
- The grf register offset for configuring the GMAC phy_intf_sel field
and the RMII mode bit.
- The bitfield in this register for the GMAC's phy_intf_sel.
- The bit position for RMII mode but clear for RGMII mode.
Introduce members for this information into struct rk_priv_data and
struct rk_gmac_ops, which will be used to pre-initialise the struct
rk_priv_data members. We describe the register contents using
bitfields, even for those that are a single bit for consistency.
As each register comprises of two halves, where the upper half enables
changing the bit state in the lower half, we can describe these
bitfields using a 16-bit data type, and provide rk_encode_wm16() to
generate the actual register values from the field mask and field
value. We are unable to use the FIELD_PREP_WM16() macros for this as
these require the field mask to be a constant.
Add code to rk_gmac_powerup() to get the phy_intf_sel value, validating
that the resulting mode is either RMII or RGMII. No other modes are
supported by any of the Rockchip SoCs supported by this driver.
If either of the bitfield mask values are populated in struct
rk_priv_data, use these to generate the register contents, and write
the resulting value to the specified GRF register.
Convert many Rockchip implementations to use this new infrastructure.
For those where there is a single GMAC instance, it is merely a case of
filling in the new members of struct rk_gmac_ops. For those with
multiple instances, one or more of these members depends on the GMAC
instance, so setup of the members in struct rk_gmac has to be done via
the .init method of struct rk_gmac_ops. The corresponding code is
removed from the set_to_rgmii() and set_to_rmii() implementations.
Since the member name documents the purpose of the field that is being
initialised, providing preprocessor macros to define the bitfields is
deemed to be less than useful given the massive size of this driver.
The existing mechanisms remain behind for those SoCs that can not be
converted to this scheme.
====================
AccECN protocol case handling series
Plesae find the v13 AccECN case handling patch series, which covers
several excpetional case handling of Accurate ECN spec (RFC9768),
adds new identifiers to be used by CC modules, adds ecn_delta into
rate_sample, and keeps the ACE counter for computation, etc.
This patch series is part of the full AccECN patch series, which is at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
---
Chia-Yu Chang (13):
selftests/net: gro: add self-test for TCP CWR flag
tcp: ECT_1_NEGOTIATION and NEEDS_ACCECN identifiers
tcp: disable RFC3168 fallback identifier for CC modules
tcp: accecn: handle unexpected AccECN negotiation feedback
tcp: accecn: retransmit downgraded SYN in AccECN negotiation
tcp: add TCP_SYNACK_RETRANS synack_type
tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN
SYN/ACK
tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
tcp: accecn: fallback outgoing half link to non-AccECN
tcp: accecn: detect loss ACK w/ AccECN option and add
TCP_ACCECN_OPTION_PERSIST
tcp: accecn: add tcpi_ecn_mode and tcpi_option2 in tcp_info
tcp: accecn: enable AccECN
selftests/net: packetdrill: add TCP Accurate ECN cases
Ilpo Järvinen (2):
tcp: try to avoid safer when ACKs are thinned
gro: flushing when CWR is set negatively affects AccECN
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:15 +0000 (23:25 +0100)]
selftests/net: packetdrill: add TCP Accurate ECN cases
Linux Accurate ECN test sets using ACE counters and AccECN options to
cover several scenarios: Connection teardown, different ACK conditions,
counter wrapping, SACK space grabbing, fallback schemes, negotiation
retransmission/reorder/loss, AccECN option drop/loss, different
handshake reflectors, data with marking, and different sysctl values.
The packetdrill used is commit cbe405666c9c8698ac1e72f5e8ffc551216dfa56
of repo: https://github.com/minuscat/packetdrill/tree/upstream_accecn.
And corresponding patches are sent to google/packetdrill email list.
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Co-developed-by: Ilpo Järvinen <ij@kernel.org> Signed-off-by: Ilpo Järvinen <ij@kernel.org> Co-developed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-16-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:13 +0000 (23:25 +0100)]
tcp: accecn: add tcpi_ecn_mode and tcpi_option2 in tcp_info
Add 2-bit tcpi_ecn_mode feild within tcp_info to indicate which ECN
mode is negotiated: ECN_MODE_DISABLED, ECN_MODE_RFC3168, ECN_MODE_ACCECN,
or ECN_MODE_PENDING. This is done by utilizing available bits from
tcpi_accecn_opt_seen (reduced from 16 bits to 2 bits) and
tcpi_accecn_fail_mode (reduced from 16 bits to 4 bits).
Also, an extra 24-bit tcpi_options2 field is identified to represent
newer options and connection features, as all 8 bits of tcpi_options
field have been used.
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:12 +0000 (23:25 +0100)]
tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST
Detect spurious retransmission of a previously sent ACK carrying the
AccECN option after the second retransmission. Since this might be caused
by the middlebox dropping ACK with options it does not recognize, disable
the sending of the AccECN option in all subsequent ACKs. This patch
follows Section 3.2.3.2.2 of AccECN spec (RFC9768), and a new field
(accecn_opt_sent_w_dsack) is added to indicate that an AccECN option was
sent with duplicate SACK info.
Also, a new AccECN option sending mode is added to tcp_ecn_option sysctl:
(TCP_ECN_OPTION_PERSIST), which ignores the AccECN fallback policy and
persistently sends AccECN option once it fits into TCP option space.
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:11 +0000 (23:25 +0100)]
tcp: accecn: fallback outgoing half link to non-AccECN
According to Section 3.2.2.1 of AccECN spec (RFC9768), if the Server
is in AccECN mode and in SYN-RCVD state, and if it receives a value of
zero on a pure ACK with SYN=0 and no SACK blocks, for the rest of the
connection the Server MUST NOT set ECT on outgoing packets and MUST
NOT respond to AccECN feedback. Nonetheless, as a Data Receiver it
MUST NOT disable AccECN feedback.
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:10 +0000 (23:25 +0100)]
tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
Based on specification:
https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt
Based on Section 3.1.5 of AccECN spec (RFC9768), a TCP Server in
AccECN mode MUST NOT set ECT on any packet for the rest of the connection,
if it has received or sent at least one valid SYN or Acceptable SYN/ACK
with (AE,CWR,ECE) = (0,0,0) during the handshake.
In addition, a host in AccECN mode that is feeding back the IP-ECN
field on a SYN or SYN/ACK MUST feed back the IP-ECN field on the
latest valid SYN or acceptable SYN/ACK to arrive.
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:09 +0000 (23:25 +0100)]
tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK
For Accurate ECN, the first SYN/ACK sent by the TCP server shall set
the ACE flag (Table 1 of RFC9768) and the AccECN option to complete the
capability negotiation. However, if the TCP server needs to retransmit
such a SYN/ACK (for example, because it did not receive an ACK
acknowledging its SYN/ACK, or received a second SYN requesting AccECN
support), the TCP server retransmits the SYN/ACK without the AccECN
option. This is because the SYN/ACK may be lost due to congestion, or a
middlebox may block the AccECN option. Furthermore, if this retransmission
also times out, to expedite connection establishment, the TCP server
should retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0) and without the
AccECN option, while maintaining AccECN feedback mode.
This complies with Section 3.2.3.2.2 of the AccECN spec RFC9768.
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:08 +0000 (23:25 +0100)]
tcp: add TCP_SYNACK_RETRANS synack_type
Before this patch, retransmitted SYN/ACK did not have a specific
synack_type; however, the upcoming patch needs to distinguish between
retransmitted and non-retransmitted SYN/ACK for AccECN negotiation to
transmit the fallback SYN/ACK during AccECN negotiation. Therefore, this
patch introduces a new synack_type (TCP_SYNACK_RETRANS).
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:07 +0000 (23:25 +0100)]
tcp: accecn: retransmit downgraded SYN in AccECN negotiation
Based on AccECN spec (RFC9768) Section 3.1.4.1, if the sender of an
AccECN SYN (the TCP Client) times out before receiving the SYN/ACK, it
SHOULD attempt to negotiate the use of AccECN at least one more time
by continuing to set all three TCP ECN flags (AE,CWR,ECE) = (1,1,1) on
the first retransmitted SYN (using the usual retransmission time-outs).
If this first retransmission also fails to be acknowledged, in
deployment scenarios where AccECN path traversal might be problematic,
the TCP Client SHOULD send subsequent retransmissions of the SYN with
the three TCP-ECN flags cleared (AE,CWR,ECE) = (0,0,0).
According to Sections 3.1.2 and 3.1.3 of AccECN spec (RFC9768).
In Section 3.1.2, it says an AccECN implementation has no need to
recognize or support the Server response labelled 'Nonce' or ECN-nonce
feedback more generally, as RFC 3540 has been reclassified as Historic.
AccECN is compatible with alternative ECN feedback integrity approaches
to the nonce. The SYN/ACK labelled 'Nonce' with (AE,CWR,ECE) = (1,0,1)
is reserved for future use. A TCP Client (A) that receives such a SYN/ACK
follows the procedure for forward compatibility given in Section 3.1.3.
Then in Section 3.1.3, it says if a TCP Client has sent a SYN requesting
AccECN feedback with (AE,CWR,ECE) = (1,1,1) then receives a SYN/ACK with
the currently reserved combination (AE,CWR,ECE) = (1,0,1) but it does not
have logic specific to such a combination, the Client MUST enable AccECN
mode as if the SYN/ACK onfirmed that the Server supported AccECN and as
if it fed back that the IP-ECN field on the SYN had arrived unchanged.
Fixes: 3cae34274c79 ("tcp: accecn: AccECN negotiation"). Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-7-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:05 +0000 (23:25 +0100)]
tcp: disable RFC3168 fallback identifier for CC modules
When AccECN is not successfully negociated for a TCP flow, it defaults
fallback to classic ECN (RFC3168). However, L4S service will fallback
to non-ECN.
This patch enables congestion control module to control whether it
should not fallback to classic ECN after unsuccessful AccECN negotiation.
A new CA module flag (TCP_CONG_NO_FALLBACK_RFC3168) identifies this
behavior expected by the CA.
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:04 +0000 (23:25 +0100)]
tcp: ECT_1_NEGOTIATION and NEEDS_ACCECN identifiers
Two flags for congestion control (CC) module are added in this patch
related to AccECN negotiation. First, a new flag (TCP_CONG_NEEDS_ACCECN)
defines that the CC expects to negotiate AccECN functionality using the
ECE, CWR and AE flags in the TCP header.
Second, during ECN negotiation, ECT(0) in the IP header is used. This
patch enables CC to control whether ECT(0) or ECT(1) should be used on
a per-segment basis. A new flag (TCP_CONG_ECT_1_NEGOTIATION) defines the
expected ECT value in the IP header by the CA when not-yet initialized
for the connection.
The detailed AccECN negotiaotn can be found in IETF RFC9768.
Co-developed-by: Olivier Tilmans <olivier.tilmans@nokia.com> Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia.com> Signed-off-by: Ilpo Järvinen <ij@kernel.org> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-5-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Chia-Yu Chang [Sat, 31 Jan 2026 22:25:03 +0000 (23:25 +0100)]
selftests/net: gro: add self-test for TCP CWR flag
Currently, GRO does not flush packets when the CWR bit is set.
A corresponding self-test is being added, in which the CWR flag
is set for two consecutive packets, but the first packet with the
CWR flag set will not be flushed immediately.
Ilpo Järvinen [Sat, 31 Jan 2026 22:25:02 +0000 (23:25 +0100)]
gro: flushing when CWR is set negatively affects AccECN
As AccECN may keep CWR bit asserted due to different
interpretation of the bit, flushing with GRO because of
CWR may effectively disable GRO until AccECN counter
field changes such that CWR-bit becomes 0.
There is no harm done from not immediately forwarding the
CWR'ed segment with RFC3168 ECN.
Ilpo Järvinen [Sat, 31 Jan 2026 22:25:01 +0000 (23:25 +0100)]
tcp: try to avoid safer when ACKs are thinned
Add newly acked pkts EWMA. When ACK thinning occurs, select
between safer and unsafe cep delta in AccECN processing based
on it. If the packets ACKed per ACK tends to be large, don't
conservatively assume ACE field overflow.
This patch uses the existing 2-byte holes in the rx group for new
u16 variables withtout creating more holes. Below are the pahole
outcomes before and after this patch:
====================
net: phy: remove modalias-based MDIO device bus matching
modalias-based MDIO device bus matching has only one user (dsa-loop),
where we can replace modalias-based matching with a simple custom
match function. This, and first patch of the series, lay the foundation
for removing modalias-based matching.
====================
Heiner Kallweit [Sat, 31 Jan 2026 17:40:00 +0000 (18:40 +0100)]
net: phy: remove modalias-based mdio bus matching
Last user dsa_loop has been migrated away from modalias-based matching,
so we can remove this feature now. It was the only user of MDIO_NAME_SIZE,
so remove also this constant.
Heiner Kallweit [Sat, 31 Jan 2026 17:38:56 +0000 (18:38 +0100)]
net: dsa: loop: remove MDIO device modalias
This change is a prerequisite for removing the MDIO device modalias,
as dsa_loop is the only user. Switch from modalias to a custom
bus match function.
Heiner Kallweit [Sat, 31 Jan 2026 17:37:15 +0000 (18:37 +0100)]
net: ethernet: adi: make name member of struct adin1110_cfg a pointer
Primary reason for this change is to remove the misuse of MDIO_NAME_SIZE
here, so that this constant can be removed in a follow-up patch.
Use case here is simply a chip name w/o any relationship to a MDIO
device. Also there's no need to reserve a longer char array, so make
the name a pointer.
Cosmin Ratiu [Wed, 28 Jan 2026 11:25:35 +0000 (13:25 +0200)]
devlink: Refactor devlink_rate_nodes_check
devlink_rate_nodes_check() was used to verify there are no devlink rate
nodes created when switching the esw mode.
Rate management code is about to become more complex, so refactor this
function:
- remove unused param 'mode'.
- add a new 'rate_filter' param.
- rename to devlink_rates_check().
- expose devlink_rate_is_node() to be used as a rate filter.
This makes it more usable from multiple places, so use it from those
places as well.
Cosmin Ratiu [Wed, 28 Jan 2026 11:25:33 +0000 (13:25 +0200)]
devlink: Reverse locking order for nested instances
Commit [1] defined the locking expectations for nested devlink
instances: the nested-in devlink instance lock needs to be acquired
before the nested devlink instance lock. The code handling devlink rels
was architected with that assumption in mind.
There are no actual users of double locking yet but that is about to
change in the upcoming patches in the series.
Code operating on nested devlink instances will require also obtaining
the nested-in instance lock, but such code may already be called from a
variety of places with the nested devlink instance lock. Then, there's
no way to acquire the nested-in lock other than making sure that all
callers acquire it first.
Reversing the nested lock order allows incrementally acquiring the
nested-in instance lock when needed (perhaps even a chain of locks up to
the root) without affecting any caller.
The only affected use of nesting is devlink_nl_nested_fill(), which
iterates over nested devlink instances with the RCU lock, without
locking them, so there's no possibility of deadlock.
So this commit just updates a comment regarding the nested locks.
The dwmac core has no support for SGMII without using its integrated
PCS. Thus, PHY_INTF_SEL_SGMII is only supported when this block is
present, and it makes no sense for stmmac_get_phy_intf_sel() to decode
this.
None of the platform glue users that use stmmac_get_phy_intf_sel()
directly accept PHY_INTF_SEL_SGMII as a valid mode.
Check whether a PCS will be used by the driver for the interface mode,
and if it is the integrated PCS, query the integrated PCS for the
phy_intf_sel_i value to use.
net: stmmac: move most PCS register definitions to stmmac_pcs.c
Move most of the PCS register offset definitions to stmmac_pcs.c.
Since stmmac_pcs.c only ever passes zero into the register offset
macros, remove that ability, making them simple constant integer
definitions.
Add appropriate descriptions of the registers, pointing out their
similarity with their IEEE 802.3 counterparts. Make use of the
BMSR definitions for the GMAC_AN_STATUS register and remove the
driver private versions.
Note that BMSR_LSTATUS is non-low-latching, unlike it's 802.3z
counterpart.
net: stmmac: clear half-duplex caps where unsupported
Where a core supports hardware features, but does not indicate support
for half-duplex, clear phylink's half-duplex 1G, 100M and 10M
capability bits to disallow half-duplex operation and advertisement of
these link modes.
This will avoid the need for special code in the PCS driver to do this
based on the ESTATUS register bits, as the support in the PCS is
dependent on the same synthesis choice as the MAC core.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Link: https://patch.msgid.link/E1vlmOQ-00000006zuz-0ffN@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Add support for Renesas RZ/G3L GBETH
From: Biju Das <biju.das.jz@bp.renesas.com>
The Renesas RZ/G3L GBETH IP uses Synopsys DesignWare MAC version 5.30
compared to other Renesas SoC such as RZ/V2H that use MAC version 5.20.
The RZ/G3L GBETH requires an extra clock compared to RZ/G3E and has pps
interrupts. Document the Renesas RZ/G3L GBETH IP in bindings and add
support for the RZ/G3L GBETH in dwmac-renesas-gbeth glue driver.
====================
Biju Das [Sat, 31 Jan 2026 16:12:43 +0000 (16:12 +0000)]
net: stmmac: dwmac-renesas-gbeth: Add support for RZ/G3L SoC
Compared to other Renesas GBETH stmmac glue drivers, RZ/G3L GBETH IP use
the version Synopsys DesignWare MAC (version 5.30). It has an extra clock
compared to RZ/V2H and has ptp_pps_o interrupts. Add support for RZ/G3L
GBETH by reusing device data of RZ/V2H and can be extended to add other
functionalities later.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/20260131161250.5047-3-biju.das.jz@bp.renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add device tree binding support for the Gigabit Ethernet (GBETH) IP on
Renesas RZ/G3L SoC. This SoC uses different Synopsys DesignWare MAC
version 5.30 compared to RZ/G3E.
RZ/G3L requires an extra clock compared to RZ/G3E and has pps interrupts.
Add a new compatible string "renesas,r9a08g046-gbeth" for RZ/G3L SoC and
update the schema to handle hardware differences between SoC variants.
Extend the base snps,dwmac.yaml schema to accommodate the PPS interrupts.
Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/20260131161250.5047-2-biju.das.jz@bp.renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
mptcp: implement .read_sock and .splice_read
This series is a preparation work for future in-kernel MPTCP sockets
usage. Here, two interfaces are implemented: read_sock and splice_read.
As a result of this series, splice() with MPTCP sockets -- which was
already supported -- is now improved.
- Patches 1-2: .read_sock implementation
- Patches 3-4: .splice_read implementation
- Patches 5-6: validate splice() support with MPTCP sockets.
====================
Geliang Tang [Fri, 30 Jan 2026 19:24:28 +0000 (20:24 +0100)]
selftests: mptcp: add splice io mode
This patch adds a new 'splice' io mode for mptcp_connect to test
the newly added read_sock() and splice_read() functions of MPTCP.
do_splice() efficiently transfers data directly between two file
descriptors (infd and outfd) without copying to userspace, using
Linux's splice() system call.
Geliang Tang [Fri, 30 Jan 2026 19:24:25 +0000 (20:24 +0100)]
mptcp: implement .read_sock
Current in-kernel TCP sockets -- i.e. from nvme_tcp_try_recv() -- need
to call .read_sock interface of struct proto_ops, but it's not
implemented in MPTCP.
This patch implements it with reference to __tcp_read_sock() and
__mptcp_recvmsg_mskq().
Corresponding to tcp_recv_skb(), a new helper for MPTCP named
mptcp_recv_skb() is added to peek a skb from sk->sk_receive_queue.
Compared with __mptcp_recvmsg_mskq(), mptcp_read_sock() uses
sk->sk_rcvbuf as the max read length. The LISTEN status is checked
before the while loop, and mptcp_recv_skb() and mptcp_cleanup_rbuf()
are invoked after the loop. In the loop, all flags checks for
__mptcp_recvmsg_mskq() are removed.
Reviewed-by: Hannes Reinecke <hare@kernel.org> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-2-31332ba70d7f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
ptp: vmclock: Add VM generation counter and ACPI notification
Similarly to live migration, starting a VM from some serialized state
(aka snapshot) is an event which calls for adjusting guest clocks, hence
a hypervisor should increase the disruption_marker before resuming the
VM vCPUs, letting the guest know.
However, loading a snapshot, is slightly different than live migration,
especially since we can start multiple VMs from the same serialized
state. Apart from adjusting clocks, the guest needs to take additional
action during such events, e.g. recreate UUIDs, reset network
adapters/connections, reseed entropy pools, etc. These actions are not
necessary during live migration. This calls for a differentiation
between the two triggering events.
We differentiate between the two events via an extra field in the
vmclock_abi, called vm_generation_counter. Whereas hypervisors should
increase the disruption marker in both cases, they should only increase
vm_generation_counter when a snapshot is loaded in a VM (not during live
migration).
Additionally, we attach an ACPI notification to VMClock. Implementing
the notification is optional for the device. VMClock device will declare
that it implements the notification by setting
VMCLOCK_FLAG_NOTIFICATION_PRESENT bit in vmclock_abi flags. Hypervisors
that implement the notification must send an ACPI notification every
time seq_count changes to an even number. The driver will propagate
these notifications to userspace via the poll() interface.
====================
David Woodhouse [Fri, 30 Jan 2026 17:36:06 +0000 (17:36 +0000)]
ptp: ptp_vmclock: return TAI not UTC
To output UTC would involve complex calculations about whether the time
elapsed since the reference time has crossed the end of the month when
a leap second takes effect. I've prototyped that, but it made me sad.
Much better to report TAI, which is what PHCs should do anyway.
And much much simpler.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Babis Chalios <bchalios@amazon.es> Tested-by: Takahiro Itazuri <itazur@amazon.com> Link: https://patch.msgid.link/20260130173704.12575-8-itazur@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Woodhouse [Fri, 30 Jan 2026 17:36:04 +0000 (17:36 +0000)]
ptp: ptp_vmclock: add 'VMCLOCK' to ACPI device match
As we finalised the spec, we spotted that vmgenid actually says that the
_HID is supposed to be hypervisor-specific. Although in the 13 years
since the original vmgenid doc was published, nobody seems to have cared
about using _HID to distinguish between implementations on different
hypervisors, and we only ever use the _CID.
For consistency, match the _CID of "VMCLOCK" too.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Babis Chalios <bchalios@amazon.es> Tested-by: Takahiro Itazuri <itazur@amazon.com> Link: https://patch.msgid.link/20260130173704.12575-6-itazur@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Woodhouse [Fri, 30 Jan 2026 17:36:03 +0000 (17:36 +0000)]
ptp: ptp_vmclock: Add device tree support
Add device tree support to the ptp_vmclock driver, allowing it to probe
via device tree in addition to ACPI.
Handle optional interrupt for clock disruption notifications, mirroring
the ACPI notification behaviour.
Although the interrupt is marked as 'optional' in the DT bindings, if
the device *advertises* the VMCLOCK_FLAG_NOTIFICATION_ABSENT then it
*should* have an interrupt. The driver will refuse to initialize if not.
Babis Chalios [Fri, 30 Jan 2026 17:36:01 +0000 (17:36 +0000)]
ptp: vmclock: support device notifications
Add optional support for device notifications in VMClock. When
supported, the hypervisor will send a device notification every time it
updates the seq_count to a new even value.
Moreover, add support for poll() in VMClock as a means to propagate this
notification to user space. poll() will return a POLLIN event to
listeners every time seq_count changes to a value different than the one
last seen (since open() or last read()/pread()). This means that when
poll() returns a POLLIN event, listeners need to use read() to observe
what has changed and update the reader's view of seq_count. In other
words, after a poll() returned, all subsequent calls to poll() will
immediately return with a POLLIN event until the listener calls read().
The device advertises support for the notification mechanism by setting
flag VMCLOCK_FLAG_NOTIFICATION_PRESENT in vmclock_abi flags field. If
the flag is not present the driver won't setup the ACPI notification
handler and poll() will always immediately return POLLHUP.
Signed-off-by: Babis Chalios <bchalios@amazon.es> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Takahiro Itazuri <itazur@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Tested-by: Takahiro Itazuri <itazur@amazon.com> Link: https://patch.msgid.link/20260130173704.12575-3-itazur@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Babis Chalios [Fri, 30 Jan 2026 17:36:00 +0000 (17:36 +0000)]
ptp: vmclock: add vm generation counter
Similar to live migration, loading a VM from some saved state (aka
snapshot) is also an event that calls for clock adjustments in the
guest. However, guests might want to take more actions as a response to
such events, e.g. as discarding UUIDs, resetting network connections,
reseeding entropy pools, etc. These are actions that guests don't
typically take during live migration, so add a new field in the
vmclock_abi called vm_generation_counter which informs the guest about
such events.
Hypervisor advertises support for vm_generation_counter through the
VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT flag. Users need to check the
presence of this bit in vmclock_abi flags field before using this flag.
Signed-off-by: Babis Chalios <bchalios@amazon.es> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Tested-by: Takahiro Itazur <itazur@amazon.com> Link: https://patch.msgid.link/20260130173704.12575-2-itazur@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Many network drivers have unnecessary empty module_init and module_exit
functions. Remove them (including some that just print a message). Note
that if a module_init function exists, a module_exit function must also
exist; otherwise, the module cannot be unloaded.
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Ping-Ke Shih <pkshih@realtek.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://patch.msgid.link/20260131004327.18112-1-enelsonmoore@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: ethernet: use module_pci_driver; remove useless driver versions
The module version is useless, and the only thing these drivers' init
routines did besides pci_register_driver was to print the driver name
and/or version.
Acked-by: Francois Romieu <romieu@fr.zoreil.com> (epic100) Reviewed-by: Simon Horman <horms@kernel.org> (epic100, sis900) Reviewed-by: Sai Krishna <saikrishnag@marvell.com> (epic100) Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Link: https://patch.msgid.link/20260131022441.56274-1-enelsonmoore@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
net: phy: dp83867: Always program R/SGMII enable bits
The hardware designers at my company neglected to read the datasheet for
this PHY and did not add appropriate resistors to configure it for
SGMII. Add support for configuring the it based on phy-mode instead of
relying on the resistors for a suitable default.
====================
Sean Anderson [Thu, 29 Jan 2026 17:12:05 +0000 (12:12 -0500)]
net: phy: dp83867: Always program R/SGMII enable bits
If the board designers have neglected to populate the appropriate
resistors on the strapping pins then the phy may default to the wrong
interface mode. Enable/disable the RGMII/SGMII enable bits as necessary
to select the correct interface.
The dp83867 strapping pins have four levels and typically configure two
features at once. LED_0 controls both port mirroring and whether SGMII
is enabled. If it is pulled to VDDIO, both port mirroring and SGMII
will be enabled. For variants of the dp83867 that do not support SGMII,
this will prevent data from being transferred. As we now explicitly set
the SGMII and RGMII enable bits, we do not need to detect whether SGMII
has been inadvertently enabled.
Sean Anderson [Thu, 29 Jan 2026 17:12:04 +0000 (12:12 -0500)]
net: phy: dp83867: Program TX FIFO for all interfaces
All supported interfaces use the TX FIFO register at least some of the
time, so there's no point in checking the interface. Retain the check
for the RX FIFO level since it is only used by SGMII.
The issue is that the existing code uses ethtool_get_flow_spec_ring_vf,
which will return a non-zero value if the ring_cookie is set to
RX_CLS_FLOW_DISC, which then causes bnxt_add_ntuple_cls_rule to return
-EOPNOTSUPP because it thinks the user is trying to set an ntuple filter
for a vf.
Fix this by first checking that the ring_cookie is not RX_CLS_FLOW_DISC.
After this patch, ntuple filters for drops can be added:
% sudo ethtool -U eth0 flow-type udp4 src-ip 1.1.1.1 action -1
Added rule with ID 0
% ethtool -n eth0
44 RX rings available
Total 1 rules
Filter: 0
Rule Type: UDP over IPv4
Src IP addr: 1.1.1.1 mask: 0.0.0.0
Dest IP addr: 0.0.0.0 mask: 255.255.255.255
TOS: 0x0 mask: 0xff
Src port: 0 mask: 0xffff
Dest port: 0 mask: 0xffff
Action: Drop
Jakub Kicinski [Sat, 31 Jan 2026 20:30:29 +0000 (12:30 -0800)]
tools: ynl: cli: make the output compact
Make the default (non-JSON) output more compact. Looking at RSS
context dumps is pretty much impossible without this, because
default print shows the indirection table with line per entry:
Jakub Kicinski [Sat, 31 Jan 2026 22:54:54 +0000 (14:54 -0800)]
docs: networking: mention that RSS table should be 4x the queue count
Spell out the recommendation that the RSS table should be
4x the queue count to avoid traffic imbalance. Include minor
rephrasing and removal of the explicit 128 entry example
since a 128 entry table is inadequate on modern machines.
Jakub Kicinski [Sat, 31 Jan 2026 22:54:53 +0000 (14:54 -0800)]
selftests: drv-net: rss: validate min RSS table size
Add a test which checks that the RSS table is at least 4x the max
queue count supported by the device. The original RSS spec from
Microsoft stated that the RSS indirection table should be 2 to 8
times the CPU count, presumably assuming queue per CPU. If the
CPU count is not a power of two, however, a power-of-2 table
2x larger than queue count results in a 33% traffic imbalance.
Validate that the indirection table is at least 4x the queue
count. This lowers the imbalance to 16% which empirically
appears to be more acceptable to memcache-like workloads.
This first 2 patches are by Biju Das, target the rcar_canfd driver and
add support for FD-only mode.
Lad Prabhakar's patches, also for the rcar_canfd driver add support
for the RZ/T2H SoC.
The last 2 patches are by Michael Tretter and me, target the sja1000
driver and clean up the CAN state handling.
* tag 'linux-can-next-for-6.20-20260131' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
can: sja1000: sja1000_err(): use error counter for error state
can: sja1000: sja1000_err(): make use of sja1000_get_berr_counter() to read error counters
can: rcar_canfd: Add RZ/T2H support
dt-bindings: can: renesas,rcar-canfd: Document RZ/T2H and RZ/N2H SoCs
dt-bindings: can: renesas,rcar-canfd: Document RZ/V2H(P) and RZ/V2N SoCs
dt-bindings: can: renesas,rcar-canfd: Specify reset-names
can: rcar_canfd: Add support for FD-Only mode
dt-bindings: can: renesas,rcar-canfd: Document renesas,fd-only property
====================
As reported by Al Viro, the TCP ULP support for SMC is fundamentally
broken. The implementation attempts to convert an active TCP socket
into an SMC socket by modifying the underlying `struct file`, dentry,
and inode in-place, which violates core VFS invariants that assume
these structures are immutable for an open file, creating a risk of
use after free errors and general system instability.
Given the severity of this design flaw and the fact that cleaner
alternatives (e.g., LD_PRELOAD, BPF) exist for legacy application
transparency, the correct course of action is to remove this feature
entirely.
net: ax25: remove plumbing for never-implemented DAMA Master support
The AX25_DAMA_MASTER option has been unimplemented and marked broken
ever since it was introduced in 2007 in commit 954b2e7f4c37 ("[NET]
AX.25 Kconfig and docs updates and fixes"). At this point, it is very
unlikely it will be implemented. Remove it.
====================
net: wwan: add NMEA port type support
The series introduces a long discussed NMEA port type support for the
WWAN subsystem. There are two goals. From the WWAN driver perspective,
NMEA exported as any other port type (e.g. AT, MBIM, QMI, etc.). From
user space software perspective, the exported chardev belongs to the
GNSS class what makes it easy to distinguish desired port and the WWAN
device common to both NMEA and control (AT, MBIM, etc.) ports makes it
easy to locate a control port for the GNSS receiver activation.
Done by exporting the NMEA port via the GNSS subsystem with the WWAN
core acting as proxy between the WWAN modem driver and the GNSS
subsystem.
The series starts from a cleanup patch. Then three patches prepares the
WWAN core for the proxy style operation. Followed by a patch introding a
new WWNA port type, integration with the GNSS subsystem and demux. The
series ends with a couple of patches that introduce emulated EMEA port
to the WWAN HW simulator.
The series is the product of the discussion with Loic about the pros and
cons of possible models and implementation. Also Muhammad and Slark did
a great job defining the problem, sharing the code and pushing me to
finish the implementation. Daniele has caught an issue on driver
unloading and suggested an investigation direction. What was concluded
by Loic. Many thanks.
====================
Sergey Ryazanov [Mon, 26 Jan 2026 06:21:57 +0000 (14:21 +0800)]
net: wwan: hwsim: support NMEA port emulation
Support NMEA port emulation for the WWAN core GNSS port testing purpose.
Emulator produces pair of GGA + RMC sentences every second what should
be enough to fool gpsd into believing it is working with a NMEA GNSS
receiver.
If the GNSS system is enabled then one NMEA port will be created
automatically for the simulated WWAN device. Manual NMEA port creation
is not supported at the moment.
Sergey Ryazanov [Mon, 26 Jan 2026 06:21:56 +0000 (14:21 +0800)]
net: wwan: hwsim: refactor to support more port types
Just introduced WWAN NMEA port type needs a testing option. The WWAN HW
simulator was developed with the AT port type in mind and cannot be
easily extended. Refactor it now to make it capable to support more port
types.
No big functional changes, mostly renaming with a little code
rearrangement.
Sergey Ryazanov [Mon, 26 Jan 2026 06:21:55 +0000 (14:21 +0800)]
net: wwan: add NMEA port support
Many WWAN modems come with embedded GNSS receiver inside and have a
dedicated port to output geopositioning data. On the one hand, the
GNSS receiver has little in common with WWAN modem and just shares a
host interface and should be exported using the GNSS subsystem. On the
other hand, GNSS receiver is not automatically activated and needs a
generic WWAN control port (AT, MBIM, etc.) to be turned on. And a user
space software needs extra information to find the control port.
Introduce the new type of WWAN port - NMEA. When driver asks to register
a NMEA port, the core allocates common parent WWAN device as usual, but
exports the NMEA port via the GNSS subsystem and acts as a proxy between
the device driver and the GNSS subsystem.
From the WWAN device driver perspective, a NMEA port is registered as a
regular WWAN port without any difference. And the driver interacts only
with the WWAN core. From the user space perspective, the NMEA port is a
GNSS device which parent can be used to enumerate and select the proper
control port for the GNSS receiver management.
Sergey Ryazanov [Mon, 26 Jan 2026 06:21:54 +0000 (14:21 +0800)]
net: wwan: core: split port unregister and stop
Upcoming GNSS (NMEA) port type support requires exporting it via the
GNSS subsystem. On another hand, we still need to do basic WWAN core
work: call the port stop operation, purge queues, release the parent
WWAN device, etc. To reuse as much code as possible, split the port
unregistering function into the deregistration of a regular WWAN port
device, and the common port tearing down code.
In order to keep more code generic, break the device_unregister() call
into device_del() and put_device(), which release the port memory
uniformly.
Sergey Ryazanov [Mon, 26 Jan 2026 06:21:53 +0000 (14:21 +0800)]
net: wwan: core: split port creation and registration
Upcoming GNSS (NMEA) port type support requires exporting it via the
GNSS subsystem. On another hand, we still need to do basic WWAN core
work: find or allocate the WWAN device, make it the port parent, etc. To
reuse as much code as possible, split the port creation function into
the registration of a regular WWAN port device, and basic port struct
initialization.
To be able to use put_device() uniformly, break the device_register()
call into device_initialize() and device_add() and call device
initialization earlier.
While at it, fix a minor number leak upon WWAN port registration
failure.