Hangbin Liu [Tue, 2 Sep 2025 06:45:01 +0000 (06:45 +0000)]
selftests: bonding: add test for LACP actor port priority
Add comprehensive selftest to verify:
- Per-port actor priority setting via ad_actor_port_prio
- Aggregator selection behavior with port_priority ad_select policy
Also move cmd_jq helper from forwarding/lib.sh to net/lib.sh for
broader reusability across network selftests.
Here is the result output
# ./bond_lacp_prio.sh
TEST: bond 802.3ad (ad_actor_port_prio setting) [ OK ]
TEST: bond 802.3ad (ad_actor_port_prio select) [ OK ]
TEST: bond 802.3ad (ad_actor_port_prio switch) [ OK ]
Hangbin Liu [Tue, 2 Sep 2025 06:45:00 +0000 (06:45 +0000)]
bonding: support aggregator selection based on port priority
Add a new ad_select policy 'port_priority' that uses the per-port
actor priority values (set via ad_actor_port_prio) to determine
aggregator selection.
This allows administrators to influence which ports are preferred
for aggregation by assigning different priority values, providing
more flexible load balancing control in LACP configurations.
Hangbin Liu [Tue, 2 Sep 2025 06:44:59 +0000 (06:44 +0000)]
bonding: add support for per-port LACP actor priority
Introduce a new netlink attribute 'actor_port_prio' to allow setting
the LACP actor port priority on a per-slave basis. This extends the
existing bonding infrastructure to support more granular control over
LACP negotiations.
The priority value is embedded in LACPDU packets and will be used by
subsequent patches to influence aggregator selection policies.
====================
Support exposing raw cycle counters in PTP and mlx5
This series by Carolina adds support in ptp and usage in mlx5 for
exposing the raw free-running cycle counter of PTP hardware clocks.
This is V2. Find previous one here:
https://lore.kernel.org/all/1752556533-39218-1-git-send-email-tariqt@nvidia.com/
Find detailed description by Carolina below [1].
[1]
This patch series introduces support for exposing the raw free-running
cycle counter of PTP hardware clocks. When the device is in free-running
mode, it emits timestamps as raw cycle values instead of nanoseconds.
These values may be passed directly to user space through:
- fwctl: exposes internal device event records that include raw
cycle-based timestamps.
- DPDK: retrieves CQEs that contain raw cycle counters, which are passed
to user space unmodified.
To address this, the series introduces two new ioctl commands that allow
userspace to query the device's raw cycle counter together with host
time:
- PTP_SYS_OFFSET_PRECISE_CYCLES
- PTP_SYS_OFFSET_EXTENDED_CYCLES
These commands work like their existing counterparts but return the
device timestamp in cycle units instead of real-time nanoseconds. This
allows user space to collect (cycle, time) pairs and build a mapping
between the device’s free-running clock and host time.
This can also be useful in the XDP fast path: if a driver inserts the
raw cycle value into metadata instead of a real-time timestamp, it can
avoid the overhead of converting cycles to time in the kernel. Then
userspace can resolve the cycle-to-time mapping using this ioctl when
needed.
The ioctl enables user space to correlate those with host time, without
requiring the PHC to be synchronized, so long as the drift remains
stable during collection.
Adds the new PTP ioctls and integrates support in ptp_ioctl():
- ptp: Add ioctl commands to expose raw cycle counter values
Support for exposing raw cycles in mlx5:
- net/mlx5: Extract MTCTR register read logic into helper function
- net/mlx5: Support getcyclesx and getcrosscycles
====================
Carolina Jubran [Tue, 12 Aug 2025 14:17:06 +0000 (17:17 +0300)]
ptp: Add ioctl commands to expose raw cycle counter values
Introduce two new ioctl commands, PTP_SYS_OFFSET_PRECISE_CYCLES and
PTP_SYS_OFFSET_EXTENDED_CYCLES, to allow user space to access the
raw free-running cycle counter from PTP devices.
These ioctls are variants of the existing PRECISE and EXTENDED
offset queries, but instead of returning device time in realtime,
they return the raw cycle counter value.
In the old days, RDS used FMR (Fast Memory Registration) to register
IB MRs to be used by RDMA. A newer and better verbs based
registration/de-registration method called FRWR (Fast Registration
Work Request) was added to RDS by commit 1659185fb4d0 ("RDS: IB:
Support Fastreg MR (FRMR) memory registration mode") in 2016.
Detection and enablement of FRWR was done in commit 2cb2912d6563
("RDS: IB: add Fastreg MR (FRMR) detection support"). But said commit
added an extern bool prefer_frmr, which was not used by said commit -
nor used by later commits. Hence, remove it.
Jakub Kicinski [Tue, 9 Sep 2025 01:12:10 +0000 (18:12 -0700)]
Merge branch 'net-stmmac-mdio-cleanups'
Russell King says:
====================
net: stmmac: mdio cleanups
Clean up the stmmac MDIO code:
- provide an address register formatter to avoid repeated code
- provide a common function to wait for the busy bit to clear
- pre-compute the CR field (mdio clock divider)
- move address formatter into read/write functions
- combine the read/write functions into a common accessor function
- move runtime PM handling into common accessor function
- rename register constants to better reflect manufacturer names
- move stmmac_clk_csr_set() into stmmac_mdio
- make stmmac_clk_csr_set() return the CR field value and remove
priv->clk_csr
- clean up if() range tests in stmmac_clk_csr_set()
- use STMMAC_CSR_xxx definitions in initialisers
For Qualcomm QCS9100 Ride R3 board with the AQR115C PHY:
Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com>
====================
net: stmmac: use STMMAC_CSR_xxx definitions in platform glue
Use the STMMAC_CSR_xxx definitions to initialise plat->clk_csr in the
platform glue drivers to make the integer values meaningful.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com> Link: https://patch.msgid.link/E1uu8oh-00000001vpT-0vk2@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com> Link: https://patch.msgid.link/E1uu8oc-00000001vpN-0S1A@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: stmmac: mdio: return clk_csr value from stmmac_clk_csr_set()
Return the clk_csr value from stmmac_clk_csr_set() rather than
using priv->clk_csr, as this struct member now serves very little
purpose. This allows us to remove priv->clk_csr.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com> Link: https://patch.msgid.link/E1uu8oW-00000001vpH-46zf@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: stmmac: mdio: move initialisation of priv->clk_csr to stmmac_mdio
The only user of priv->clk_csr is the MDIO code, so move its
initialisation to stmmac_mdio.c.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com> Link: https://patch.msgid.link/E1uu8oR-00000001vpB-3fbY@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: stmmac: mdio: improve mdio register field definitions
Include the register name in the definitions, and use a name which
more closely resembles that used in documentation, while still being
descriptive.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com> Link: https://patch.msgid.link/E1uu8oM-00000001vp4-3DC5@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: stmmac: mdio: move runtime PM into stmmac_mdio_access()
Move the runtime PM handling into the common stmmac_mdio_access()
function, rather than having it in the four top-level bus access
functions.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com> Link: https://patch.msgid.link/E1uu8oH-00000001voy-2jfU@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: stmmac: mdio: merge stmmac_mdio_read() and stmmac_mdio_write()
stmmac_mdio_read() and stmmac_mdio_write() are virtually identical
except for the final read in the stmmac_mdio_read(). Handle this as
a flag.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com> Link: https://patch.msgid.link/E1uu8oC-00000001vos-2JnA@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: stmmac: mdio: move stmmac_mdio_format_addr() into read/write
Move stmmac_mdio_format_addr() into stmmac_mdio_read() and
stmmac_mdio_write().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com> Link: https://patch.msgid.link/E1uu8o7-00000001vom-1pN8@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: stmmac: mdio: provide priv->gmii_address_bus_config
Provide a pre-formatted value for the MDIO address register fields
which remain constant across the various different transactions
rather than recreating the register value from scratch every time.
Currently, we only do this for the CR (clock range) field.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com> Link: https://patch.msgid.link/E1uu8o2-00000001vog-1LyK@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All the readl_poll_timeout()s follow the same pattern - test a register
for a bit being clear every 100us, and timeout after 10ms returning
-EBUSY. Wrap this up into a function to avoid duplicating this.
This slightly changes the return value for stmmac_mdio_write() if the
second readl_poll_timeout() fails - rather than returning -ETIMEDOUT
we return -EBUSY matching the stmmac_mdio_read() behaviour.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com> Link: https://patch.msgid.link/E1uu8nx-00000001voa-0tJ0@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: stmmac: mdio: provide address register formatter
Rather than duplicating the logic for filling the PA (MDIO address),
GR (MDIO register/devad), CR (clock range) and GB (busy) fields of the
address register in four locations, provide a helper to do this.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com> Link: https://patch.msgid.link/E1uu8ns-00000001voU-0S7b@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
ipv6: snmp: avoid performance issue with RATELIMITHOST
Addition of ICMP6_MIB_RATELIMITHOST in commit d0941130c9351
("icmp: Add counters for rate limits") introduced a performance drop
in case of DOS (like receiving UDP packets to closed ports).
Per netns ICMP6_MIB_RATELIMITHOST tracking uses per-cpu storage and
is enough, we do not need per-device and slow tracking for this metric.
In v2 of this series, I completed the removal of SNMP_MIB_SENTINEL
in all the kernel for consistency.
Jakub Kicinski [Sat, 6 Sep 2025 21:13:51 +0000 (14:13 -0700)]
selftests: net: move netlink-dumps back to progs
Commit 9bb88c659673 ("selftests: net: test extacks in netlink dumps")
moved netlink-dumps from TEST_GEN_PROGS to YNL_GEN_FILES.
But _FILES are not for tests, rather for utilities / helpers.
Create YNL_GEN_PROGS and include netlink-dumps there.
This makes netlink-dumps part of executed tests, again.
Jakub Kicinski [Sat, 6 Sep 2025 21:13:50 +0000 (14:13 -0700)]
selftests: net: make the dump test less sensitive to mem accounting
Recent changes to make netlink socket memory accounting must
have broken the implicit assumption of the netlink-dump test
that we can fit exactly 64 dumps into the socket. Handle the
failure mode properly, and increase the dump count to 80
to make sure we still run into the error condition if
the default buffer size increases in the future.
====================
10G-QXGMII for AQR412C, Felix DSA and Lynx PCS driver
Introduce the first user of the "10g-qxgmii" phy-mode, since its
introduction from commit 5dfabcdd76b1 ("dt-bindings: net:
ethernet-controller: add 10g-qxgmii mode").
The arch/arm64/boot/dts/freescale/fsl-ls1028a-qds-13bb.dtso already
exists upstream, but has phy-mode = "usxgmii", which comes from the fact
that the AQR412(C) PHY does not distinguish between the two modes.
Yet, the distinction is crucial for the upcoming SerDes driver for the
LS1028A platform.
The series is comprised of:
- preliminary patches to the Lynx PCS and Felix DSA driver which accept
the phy-mode and treat it like "usxgmii"
- an ad-hoc whitelisting mechanism in the Aquantia PHY driver based on
firmware version, which was agreed upon with Marvell, and which serves
as "detection"
- in-band auto-negotiation capability reporting and configuration. This
makes sure this feature is enabled in the PHY, because the Lynx PCS
only works with USXGMII/10G-QXGMII in-band autoneg enabled.
Notably, it lacks a device tree update, which will come later, but
should not be strictly necessary. The expectation is for the Aquantia
PHY driver to pick up "10g-qxgmii" with existing device trees as well,
which it does, except for the slightly confusing "configuring for
inband/usxgmii link mode" initial message. This changes to "configuring
for inband/10g-qxgmii link mode" once phylink gets a chance to pick up
the phydev->interface in its pl->link_config.interface.
$ ip link set swp3 up
mscc_felix 0000:00:00.5 swp3: configuring for inband/usxgmii link mode
mscc_felix 0000:00:00.5 swp3: phylink_mac_config: mode=inband/usxgmii/none adv=0000000,00000000,00008000,0002606c pause=04
mscc_felix 0000:00:00.5 swp3: phylink_phy_change: phy interface 10g-qxgmii link 0
mscc_felix 0000:00:00.5 swp3: phylink_phy_change: phy interface 10g-qxgmii link 1
mscc_felix 0000:00:00.5 swp3: phylink_mac_config: mode=inband/10g-qxgmii/none adv=0000000,00000000,00008000,0002606c pause=00
mscc_felix 0000:00:00.5 swp3: Link is Up - 2.5Gbps/Full - flow control off
$ ip link set swp3 down
mscc_felix 0000:00:00.5 swp3: phylink_phy_change: phy interface 10g-qxgmii link 0
mscc_felix 0000:00:00.5 swp3: Link is Down
$ ip link set swp3 up
mscc_felix 0000:00:00.5 swp3: configuring for inband/10g-qxgmii link mode
mscc_felix 0000:00:00.5 swp3: phylink_mac_config: mode=inband/10g-qxgmii/none adv=0000000,00000000,00008000,0002606c pause=04
mscc_felix 0000:00:00.5 swp3: phylink_phy_change: phy interface 10g-qxgmii link 0
mscc_felix 0000:00:00.5 swp3: phylink_phy_change: phy interface 10g-qxgmii link 1
mscc_felix 0000:00:00.5 swp3: Link is Up - 2.5Gbps/Full - flow control off
====================
Vladimir Oltean [Wed, 3 Sep 2025 13:07:30 +0000 (16:07 +0300)]
net: phy: aquantia: support phy-mode = "10g-qxgmii" on NXP SPF-30841 (AQR412C)
The quad port PHYs (AQR4*) have 4 system interfaces, and some of them,
like AQR412C, can be used with a special firmware provisioning which
multiplexes all ports over a single host-side SerDes lane. The protocol
used over this lane is Cisco 10G-QXGMII feature, or "MUSX", as Aquantia
seems to call it.
One such example is the AQR412C PHY from the NXP SPF-30841 10G-QXGMII
add-in card, which uses this firmware file:
https://github.com/nxp-qoriq/qoriq-firmware-aquantia/blob/master/AQR-G3_v4.3.C-AQR_NXP_SPF-30841_MUSX_ID40019_VER1198.cld
There seems to be no disagreement, including from Marvell FAE, that
10G-QXGMII is reported to the host over MDIO as USXGMII and
indistinguishable from it. This includes the registers from the
provisioning based on which the firmware configures a single system
interface (lane C in the case of SPF-30841) to multiplex all ports -
they are also only accessible from the firmware, or over I2C (?!).
However, the Linux MAC and especially SerDes drivers may need to know if
it is using 1 port per lane (USXGMII) or 4 ports per lane (10G-QXGMII).
In the downstream Layerscape SDK we have previously implemented a
simpler scheme where for certain PHY interface modes, we trust the
device tree and never let the PHY driver overwrite phydev->interface:
https://github.com/nxp-qoriq/linux/commit/862694a4961db590c4d8a5590b84791361ca773d
but for upstream, a nicer detection method is implemented, where
although we can not distinguish USXGMII from 10G-QXGMII per se, we
create a whitelist of firmware fingerprints for which USXGMII is
translated into 10G-QXGMII. At the time of writing, it is expected that
this should only happen for the NXP SPF-30841 card, although extending
for more is trivial - just uncomment the phydev_dbg() in
aqr_build_fingerprint().
An advantage of this method is that it doesn't strictly require updates
to arch/arm64/boot/dts/freescale/fsl-ls1028a-qds-13bb.dtso, since the
PHY driver will transition from "usxgmii" to "10g-qxgmii".
All aqr_translate_interface() callers have also previously called
aqr107_probe(), so dereferencing phydev->priv is safe.
Vladimir Oltean [Wed, 3 Sep 2025 13:07:29 +0000 (16:07 +0300)]
net: phy: aquantia: create and store a 64-bit firmware image fingerprint
Some PHY features cannot be queried through MDIO registers and require
alternative driver detection methods.
One such feature is 10G-QXGMII (4 ports of up to 2.5G multiplexed over
a single SerDes lane), or "MUSX" as it is called by Aquantia/Marvell.
The firmware has provisioning to modify some registers which seem
inaccessible for read or write over MDIO, which configure an internal
mux for MUSX. To the host, over MDIO, the system interface appears
indistinguishable from single-port-per-lane USXGMII.
Marvell FAE Ziang You recommended a detection method for this feature
based on a tuple which should hopefully identify the firmware build
uniquely. Most of the tuple items are already printed by
aqr107_chip_info(), and an extra set is the misc ID (reg 1.c41d) and the
misc version (reg 1.c41e). These are auto-generated by the Marvell
firmware tool for formal builds, and should be unique (not my claim).
In addition, at least for the builds provided to NXP and redistributed
here:
https://github.com/nxp-qoriq/qoriq-firmware-aquantia/tree/master
these registers are part of the name, for example in
AQR-G3_v4.3.C-AQR_NXP_SPF-30841_MUSX_ID40019_VER1198.cld, reg 1.c41d
will contain 40019 and reg 1.c41e will contain 1198.
Note that according to commit 43429a0353af ("net: phy: aquantia: report
PHY details like firmware version"), the "chip may be functional even
w/o firmware image." In that case, we can't construct a fingerprint and
it will remain zero. That shouldn't imact the use case though.
Dereferencing phydev->priv should be ok in all cases: all
aqr_gen1_config_init() callers have also previously called
aqr107_probe().
Vladimir Oltean [Wed, 3 Sep 2025 13:07:28 +0000 (16:07 +0300)]
net: phy: aquantia: report and configure in-band autoneg capabilities
The Global System Configuration registers for each media side link speed
have bit 3 which controls auto-negotiation for the system interface.
Since bits 2:0 of the same register indicate the SerDes protocol for the
same system interface, it makes sense to filter these registers for the
SerDes protocol matching phydev->interface, and to read/write the
auto-negotiation bit.
However, experimentally, USXGMII in-band auto-negotiation is unaffected
by this bit, and instead reacts to bit 3 of register 4.C441 (PHY XS
Transmit Reserved Vendor Provisioning 2).
Both the Global System Configuration as well as the aforementioned
register 4.C441 are documented as PD (Provisioning Defaults), i.e. each
PHY firmware may provision its own values.
I was initially planning to only read these values and not support
changing them (instead just the MAC PCS reconfigures itself, if it can).
But there is one problem: Linux expects that the in-band capability is
configured the same for all speeds where a given SerDes protocol is used.
I was going to add logic that detects mismatched vendor provisioning
(in-band autoneg enabled for speed X, disabled for speed Y) and warn
about it and return 0 (unknown capabilities).
Funnily enough, there is already a known instance where speed 2500 has
"autoneg 1" and the lower speeds have "autoneg 0":
https://lore.kernel.org/netdev/aJH8n0zheqB8tWzb@FUE-ALEWI-WINX/
I don't think it's worth fighting the battle with inconsistent firmware
images built by Aquantia/Marvell, and reporting that to the user, when
we have the ability to modify these fields to values that make sense to
us. We see the same situation with all the aqr*_get_features() functions
which fix up nonsensical supported link modes.
Furthermore, altering the in-band auto-negotiation setting can be
considered a minor change, compared to changing the SerDes protocol in
its entirety, for which we are still not prepared.
Testing was done on:
- AQR107 (Gen2) in USXGMII mode, as found on the NXP LX2160A-RDB.
- AQR112 (Gen3) in USXGMII mode, as found on the NXP SCH-30842 riser
card, plugged into LS1028A-QDS.
- AQR412C (Gen3) in 10G-QXGMII mode, as found on the NXP SCH-30841 riser
card, plugged into the LS1028A-QDS.
- AQR115 (Gen4) in SGMII mode, as found on the NXP LS1046A-RDB rev E.
Vladimir Oltean [Wed, 3 Sep 2025 13:07:27 +0000 (16:07 +0300)]
net: phy: aquantia: print global syscfg registers
Sometimes people with unknown firmware provisioning post on the mailing
lists asking for support. The information collected by
aqr_gen2_read_global_syscfg() is sufficiently important to warrant a
phydev_dbg() that can easily be turned into a verbose print by the
system owner in case some debugging is needed.
Vladimir Oltean [Wed, 3 Sep 2025 13:07:26 +0000 (16:07 +0300)]
net: dsa: felix: support phy-mode = "10g-qxgmii"
The "usxgmii" phy-mode that the Felix switch ports support on LS1028A is
not quite USXGMII, it is defined by the USXGMII multiport specification
document as 10G-QXGMII. It uses the same signaling as USXGMII, but it
multiplexes 4 ports over the link, resulting in a maximum speed of 2.5G
per port.
This change is needed in preparation for the lynx-10g SerDes driver on
LS1028A, which will make a more clear distinction between usxgmii
(supported on lane 0) and 10g-qxgmii (supported on lane 1). These
protocols have their configuration in different PCCR registers (PCCRB vs
PCCR9).
Continue parsing and supporting single-port-per-lane USXGMII when found
in the device tree as usual (because it works), but add support for
10G-QXGMII too. Using phy-mode = "10g-qxgmii" will be required when
modifying the device trees to specify a "phys" phandle to the SerDes
lane. The result when the "phys" phandle is present but the phy-mode is
wrong is undefined.
The only PHY driver in known use with this phy-mode, AQR412C, will gain
logic to transition from "usxgmii" to "10g-qxgmii" in a future change.
Prepare the driver by also setting PHY_INTERFACE_MODE_10G_QXGMII in
supported_interfaces when PHY_INTERFACE_MODE_USXGMII is there, to
prevent breakage with existing device trees.
Vladimir Oltean [Wed, 3 Sep 2025 13:07:25 +0000 (16:07 +0300)]
net: pcs: lynx: support phy-mode = "10g-qxgmii"
This is a SerDes protocol with 4 ports multiplexed over a single SerDes
lane, each port capable of 10/100/1000/2500. It is used on LS1028A lane
1, connected to the 4 switch ports.
While reviewing code in the stmmac PTP driver, I noticed that the
getcrosststamp() method is always populated, irrespective of whether
it is implemented or not by the stmmac platform specific glue layer.
Where a platform specific glue layer does not implement it, the core
stmmac driver code returns -EOPNOTSUPP. However, the PTP clock core
code uses the presence of the method in ptp_clock_ops to determine
whether this facility should be advertised to userspace (see
ptp_clock_getcaps()).
Moreover, the only platform glue that implements this method is the
Intel glue, and for it not to return -EOPNOTSUPP, the CPU has to
support X86_FEATURE_ART.
This series updates the core stmmac code to only provide the
getcrosststamp() method in ptp_clock_ops when the platform glue code
provides an implementation, and then updates the Intel glue code to
only provide its implementation when the CPU has the necessary
X86_FEATURE_ART feature.
====================
net: stmmac: intel: only populate plat->crosststamp when supported
To allow the ptp_chardev code to correctly detect whether crosststamps
are supported, we need to conditionally populate the .getcrosststamp()
method. As the previous patch implements that functionality by
detecting whether the platform glue provides a crosststamp() method,
arrange for the dwmac-intel code to only populate this if the X86
ART feature is present, rather than testing for it at runtime in
intel_crosststamp().
This reflects what other x86 PTP clock drivers do, e.g.
ice_ptp_set_funcs_e830(), e1000e_ptp_init(), idpf_ptp_set_caps() etc.
drivers/char/ptp_chardev.c::ptp_clock_getcaps() uses the presence of
the getcrosststamp() method to indicate to userspace whether
crosststamping is supported or not. Therefore, we should not provide
this method unless it is functional. Only set this method pointer
in stmmac_ptp_register() if the platform glue provides the
necessary functionality.
This does not mean that it will be supported (see intel_crosststamp(),
which is the only implementation that may have support) but at least
we won't be suggesting that it is supported on many platforms where
there is no hope.
Jakub Kicinski [Sat, 6 Sep 2025 01:16:09 +0000 (18:16 -0700)]
Merge branch 'sh_eth-pm-related-cleanups'
Geert Uytterhoeven says:
====================
sh_eth: PM-related cleanups
This patch series contains various cleanups related to power management
for the Renesas SH Ethernet driver, as used on Renesas SH, ARM32, and
ARM64 platforms.
This has been tested on various SoCs (R-Mobile A1, RZ/A1H, RZ/A2M, R-Car
H1, R-Car M2-W).
====================
There is no stringent need to power down the device immediately after a
register read, or after a failed open. Relax power down handling by
replacing calls to synchronous pm_runtime_put_sync() by calls to
asynchronous pm_runtime_put().
Convert the Renesas SuperH Ethernet driver from an open-coded dev_pm_ops
structure to DEFINE_SIMPLE_DEV_PM_OPS() and pm_sleep_ptr(). This lets
us drop the checks for CONFIG_PM and CONFIG_PM_SLEEP without impacting
code size, while increasing build coverage.
Since commit 63d00be69348fda4 ("PM: runtime: Allow unassigned
->runtime_suspend|resume callbacks"), unassigned
.runtime_{suspend,resume}() callbacks are treated the same as dummy
callbacks that just return zero.
devmem test fails on NIPA. Most likely we get skb(s) with readable
frags (why?) but the failure manifests as an OOM. The OOM happens
because ncdevmem spams the following message:
recvmsg ret=-1
recvmsg: Bad address
As of today, ncdevmem can't deal with various reasons of EFAULT:
- falling back to regular recvmsg for non-devmem skbs
- increasing ctrl_data size (can't happen with ncdevmem's large buffer)
Exit (cleanly) with error when recvmsg returns EFAULT. This should at
least cause the test to cleanup its state.
The only user of fixed_phy gpio functionality was here:
arch/arm/boot/dts/nxp/vf/vf610-zii-dev-rev-b.dts
Support for the switch on this board was migrated to phylink
(DSA - mv88e6xxx) years ago, so the functionality is unused now.
Therefore remove it.
Note: There is a very small risk that there's out-of-tree users
who use link gpio with a switch chip not handled by DSA.
However we care about in-tree device trees only.
Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/75295a9a-e162-432c-ba9f-5d3125078788@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After analysis, it appears this is because of the cond_resched()
call from __release_sock().
When current thread is yielding, while still holding the TCP socket lock,
it might regain the cpu after a very long time.
Other peer TLP/RTO is firing (multiple times) and packets are retransmit,
while the initial copy is waiting in the socket backlog or receive queue.
In this patch, I call cond_resched() only once every 16 packets.
Modern TCP stack now spends less time per packet in the backlog,
especially because ACK are no longer sent (commit 133c4c0d3717
"tcp: defer regular ACK while processing socket backlog")
Eric Dumazet [Wed, 3 Sep 2025 08:47:20 +0000 (08:47 +0000)]
tcp: use tcp_eat_recv_skb in __tcp_close()
Small change to use tcp_eat_recv_skb() instead
of __kfree_skb(). This can help if an application
under attack has to close many sockets with unread data.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20250903084720.1168904-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Piotr allows for 2.5Gb and 5Gb autoneg for ixgbe E610 devices.
Jedrzej refactors reading of OROM data to be more efficient on ixgbe.
Kohei Enju adds reporting of loopback Tx packets and bytes on igbvf. He
also removes redundant reporting of Rx bytes.
Jacek Kowalski remove unnecessary u16 casts in e1000, e1000e, igb, igc,
and ixgbe drivers.
* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ixgbe: drop unnecessary casts to u16 / int
igc: drop unnecessary constant casts to u16
igb: drop unnecessary constant casts to u16
e1000e: drop unnecessary constant casts to u16
e1000: drop unnecessary constant casts to u16
igbvf: remove redundant counter rx_long_byte_count from ethtool statistics
igbvf: add lbtx_packets and lbtx_bytes to ethtool statistics
ixgbe: reduce number of reads when getting OROM data
ixgbe: add the 2.5G and 5G speeds in auto-negotiation for E610
====================
Merge tag 'net-6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, wireless and Bluetooth.
We're reverting the removal of a Sundance driver, a user has appeared.
This makes the PR rather large in terms of LoC.
There's a conspicuous absence of real, user-reported 6.17 issues.
Slightly worried that the summer distracted people from testing.
Previous releases - regressions:
- ax25: properly unshare skbs in ax25_kiss_rcv()
Previous releases - always broken:
- phylink: disable autoneg for interfaces that have no inband, fix
regression on pcs-lynx (NXP LS1088)
- vxlan: fix null-deref when using nexthop objects
- batman-adv: fix OOB read/write in network-coding decode
- icmp: icmp_ndo_send: fix reversing address translation for replies
- tcp: fix socket ref leak in TCP-AO failure handling for IPv6
- mctp:
- mctp_fraq_queue should take ownership of passed skb
- usb: initialise mac header in RX path, avoid WARN
- wifi: mac80211: do not permit 40 MHz EHT operation on 5/6 GHz,
respect device limitations
- wifi: wilc1000: avoid buffer overflow in WID string configuration
- wifi: mt76:
- fix regressions from mt7996 MLO support rework
- fix offchannel handling issues on mt7996
- fix multiple wcid linked list corruption issues
- mt7921: don't disconnect when AP requests switch to a channel
which requires radar detection
- mt7925u: use connac3 tx aggr check in tx complete
- wifi: intel:
- improve validation of ACPI DSM data
- cfg: restore some 1000 series configs
- wifi: ath:
- ath11k: a fix for GTK rekeying
- ath12k: a missed WiFi7 capability (multi-link EMLSR)
- eth: intel:
- ice: fix races in "low latency" firmware interface for Tx timestamps
- idpf: set mac type when adding and removing MAC filters
- i40e: remove racy read access to some debugfs files
Misc:
- Revert "eth: remove the DLink/Sundance (ST201) driver"
Merge tag 'slab-for-6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fixes from Vlastimil Babka:
- Stable fix to make slub_debug code not access invalid pointers in the
process of reporting issues (Li Qiong)
- Stable fix to make object tracking pass gfp flags to stackdepot to
avoid deadlock in contexts that can't even wake up kswapd due to e.g.
timers debugging enabled (yangshiguang)
* tag 'slab-for-6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm: slub: avoid wake up kswapd in set_track_prepare
mm/slub: avoid accessing metadata when pointer is invalid in object_err()
phy: mscc: Stop taking ts_lock for tx_queue and use its own lock
When transmitting a PTP frame which is timestamp using 2 step, the
following warning appears if CONFIG_PROVE_LOCKING is enabled:
=============================
[ BUG: Invalid wait context ] 6.17.0-rc1-00326-ge6160462704e #427 Not tainted
-----------------------------
ptp4l/119 is trying to lock: c2a44ed4 (&vsc8531->ts_lock){+.+.}-{3:3}, at: vsc85xx_txtstamp+0x50/0xac
other info that might help us debug this:
context-{4:4}
4 locks held by ptp4l/119:
#0: c145f068 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x58/0x1440
#1: c29df974 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x5c4/0x1440
#2: c2aaaad0 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x108/0x350
#3: c2aac170 (&lan966x->tx_lock){+.-.}-{2:2}, at: lan966x_port_xmit+0xd0/0x350
stack backtrace:
CPU: 0 UID: 0 PID: 119 Comm: ptp4l Not tainted 6.17.0-rc1-00326-ge6160462704e #427 NONE
Hardware name: Generic DT based system
Call trace:
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x7c/0xac
dump_stack_lvl from __lock_acquire+0x8e8/0x29dc
__lock_acquire from lock_acquire+0x108/0x38c
lock_acquire from __mutex_lock+0xb0/0xe78
__mutex_lock from mutex_lock_nested+0x1c/0x24
mutex_lock_nested from vsc85xx_txtstamp+0x50/0xac
vsc85xx_txtstamp from lan966x_fdma_xmit+0xd8/0x3a8
lan966x_fdma_xmit from lan966x_port_xmit+0x1bc/0x350
lan966x_port_xmit from dev_hard_start_xmit+0xc8/0x2c0
dev_hard_start_xmit from sch_direct_xmit+0x8c/0x350
sch_direct_xmit from __dev_queue_xmit+0x680/0x1440
__dev_queue_xmit from packet_sendmsg+0xfa4/0x1568
packet_sendmsg from __sys_sendto+0x110/0x19c
__sys_sendto from sys_send+0x18/0x20
sys_send from ret_fast_syscall+0x0/0x1c
Exception stack(0xf0b05fa8 to 0xf0b05ff0)
5fa0: 000000010000000e0000000e0004b47a0000003a00000000
5fc0: 000000010000000e00000000000001210004af58000448740000000000000000
5fe0: 00000001bee9d42000025a10b6e75c7c
So, instead of using the ts_lock for tx_queue, use the spinlock that
skb_buff_head has.
selftest: net: Fix weird setsockopt() in bind_bhash.c.
bind_bhash.c passes (SO_REUSEADDR | SO_REUSEPORT) to setsockopt().
In the asm-generic definition, the value happens to match with the
bare SO_REUSEPORT, (2 | 15) == 15, but not on some arch.
arch/alpha/include/uapi/asm/socket.h:18:#define SO_REUSEADDR 0x0004
arch/alpha/include/uapi/asm/socket.h:24:#define SO_REUSEPORT 0x0200
arch/mips/include/uapi/asm/socket.h:24:#define SO_REUSEADDR 0x0004 /* Allow reuse of local addresses. */
arch/mips/include/uapi/asm/socket.h:33:#define SO_REUSEPORT 0x0200 /* Allow local address and port reuse. */
arch/parisc/include/uapi/asm/socket.h:12:#define SO_REUSEADDR 0x0004
arch/parisc/include/uapi/asm/socket.h:18:#define SO_REUSEPORT 0x0200
arch/sparc/include/uapi/asm/socket.h:13:#define SO_REUSEADDR 0x0004
arch/sparc/include/uapi/asm/socket.h:20:#define SO_REUSEPORT 0x0200
include/uapi/asm-generic/socket.h:12:#define SO_REUSEADDR 2
include/uapi/asm-generic/socket.h:27:#define SO_REUSEPORT 15
Let's pass SO_REUSEPORT only.
Fixes: c35ecb95c448 ("selftests/net: Add test for timing a bind request to a port with a populated bhash entry") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250903222938.2601522-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 3 Sep 2025 21:20:54 +0000 (14:20 -0700)]
MAINTAINERS: add Sabrina to TLS maintainers
Sabrina has been very helpful reviewing TLS patches, fixing bugs,
and, I believe, the last one to implement any major feature in
the TLS code base (rekeying). Add her as a maintainer.
Qianfeng Rong [Wed, 3 Sep 2025 12:34:03 +0000 (20:34 +0800)]
net: dsa: dsa_loop: use int type to store negative error codes
Change the 'ret' variable in dsa_loop_init() from unsigned int to int, as
it needs to store either negative error codes or zero returned by
mdio_driver_register().
Storing the negative error codes in unsigned type, doesn't cause an issue
at runtime but can be confusing. Additionally, assigning negative error
codes to unsigned type may trigger a GCC warning when the -Wsign-conversion
flag is enabled.
Qingfang Deng [Wed, 3 Sep 2025 10:07:26 +0000 (18:07 +0800)]
ppp: fix memory leak in pad_compress_skb
If alloc_skb() fails in pad_compress_skb(), it returns NULL without
releasing the old skb. The caller does:
skb = pad_compress_skb(ppp, skb);
if (!skb)
goto drop;
drop:
kfree_skb(skb);
When pad_compress_skb() returns NULL, the reference to the old skb is
lost and kfree_skb(skb) ends up doing nothing, leading to a memory leak.
Align pad_compress_skb() semantics with realloc(): only free the old
skb if allocation and compression succeed. At the call site, use the
new_skb variable so the original skb is not lost when pad_compress_skb()
fails.
Fixes: b3f9b92a6ec1 ("[PPP]: add PPP MPPE encryption module") Signed-off-by: Qingfang Deng <dqfext@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Yue Haibing <yuehaibing@huawei.com> Link: https://patch.msgid.link/20250903100726.269839-1-dqfext@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add proper error checking for dmaengine_desc_get_metadata_ptr() which
can return an error pointer and lead to potential crashes or undefined
behaviour if the pointer retrieval fails.
Properly handle the error by unmapping DMA buffer, freeing the skb and
returning early to prevent further processing with invalid data.
Jakub Kicinski [Thu, 4 Sep 2025 13:59:27 +0000 (06:59 -0700)]
Merge tag 'nf-25-09-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter: updates for net
1) Fix a silly bug in conntrack selftest, busyloop may get optimized to
for (;;), reported by Yi Chen.
2) Introduce new NFTA_DEVICE_PREFIX attribute in nftables netlink api,
re-using old NFTA_DEVICE_NAME led to confusion with different
kernel/userspace versions. This refines the wildcard interface
support added in 6.16 release. From Phil Sutter.
* tag 'nf-25-09-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: Introduce NFTA_DEVICE_PREFIX
selftests: netfilter: fix udpclash tool hang
====================
====================
eth: fbnic: support queue API and zero-copy Rx
Add support for queue API to fbnic, enable zero-copy Rx.
Patch 10 is likely of most interest as it adds a new core helper
(and touches mlx5). The rest of the patches are fbnic-specific
(and relatively boring).
Patches 1-3 reshuffle the Rx init/allocation path to better
align structures and functions which operate on them. Notably
patch 1 moves the page pool pointer to the queue struct (from NAPI).
Patch 4 converts the driver to use netmem_ref. The driver has
separate and explicit buffer queue for scatter / payloads, so only
references to those are converted.
Next 5 patches are more boring code shifts.
Patch 11 adds unreadable memory support to page pool allocation.
Jakub Kicinski [Mon, 1 Sep 2025 21:12:14 +0000 (14:12 -0700)]
eth: fbnic: support queue ops / zero-copy Rx
Support queue ops. fbnic doesn't shut down the entire device
just to restart a single queue.
./tools/testing/selftests/drivers/net/hw/iou-zcrx.py
TAP version 13
1..3
ok 1 iou-zcrx.test_zcrx
ok 2 iou-zcrx.test_zcrx_oneshot
ok 3 iou-zcrx.test_zcrx_rss
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
Jakub Kicinski [Mon, 1 Sep 2025 21:12:13 +0000 (14:12 -0700)]
eth: fbnic: don't pass NAPI into pp alloc
Queue API may ask us to allocate page pools when the device
is down, to validate that we ingested a memory provider binding.
Don't require NAPI to be passed to fbnic_alloc_qt_page_pools(),
to make calling fbnic_alloc_qt_page_pools() without NAPI possible.
Jakub Kicinski [Mon, 1 Sep 2025 21:12:12 +0000 (14:12 -0700)]
eth: fbnic: defer page pool recycling activation to queue start
We need to be more careful about when direct page pool recycling
is enabled in preparation for queue ops support. Don't set the
NAPI pointer, call page_pool_enable_direct_recycling() from
the function that activates the queue (once the config can
no longer fail).
Jakub Kicinski [Mon, 1 Sep 2025 21:12:11 +0000 (14:12 -0700)]
eth: fbnic: allocate unreadable page pool for the payloads
Allow allocating a page pool with unreadable memory for the payload
ring (sub1). We need to provide the queue ID so that the memory provider
can match the PP. Use the appropriate page pool DMA sync helper.
For unreadable mem the direction has to be FROM_DEVICE. The default
is BIDIR for XDP, but obviously unreadable mem is not compatible
with XDP in the first place, so that's fine. While at it remove
the define for page pool flags.
The rxq_idx is passed to fbnic_alloc_rx_qt_resources() explicitly
to make it easy to allocate page pools without NAPI (see the patch
after the next).
Jakub Kicinski [Mon, 1 Sep 2025 21:12:10 +0000 (14:12 -0700)]
net: add helper to pre-check if PP for an Rx queue will be unreadable
mlx5 pokes into the rxq state to check if the queue has a memory
provider, and therefore whether it may produce unreadable mem.
Add a helper for doing this in the page pool API. fbnic will want
a similar thing (tho, for a slightly different reason).
Jakub Kicinski [Mon, 1 Sep 2025 21:12:08 +0000 (14:12 -0700)]
eth: fbnic: split fbnic_enable()
Factor out handling a single nv from fbnic_enable() to make
it reusable for queue ops. Use a __ prefix for the factored
out code. The real fbnic_nv_enable() which will include
fbnic_wrfl() will be added with the qops, to avoid unused
function warnings.
Jakub Kicinski [Mon, 1 Sep 2025 21:12:06 +0000 (14:12 -0700)]
eth: fbnic: split fbnic_disable()
Factor out handling a single nv from fbnic_disable() to make
it reusable for queue ops. Use a __ prefix for the factored
out code. The real fbnic_nv_disable() which will include
fbnic_wrfl() will be added with the qops, to avoid unused
function warnings.
Jakub Kicinski [Mon, 1 Sep 2025 21:12:05 +0000 (14:12 -0700)]
eth: fbnic: request ops lock
We'll add queue ops soon so. queue ops will opt the driver into
extra locking. Request this locking explicitly already to make
future patches smaller and easier to review.
Jakub Kicinski [Mon, 1 Sep 2025 21:12:04 +0000 (14:12 -0700)]
eth: fbnic: use netmem_ref where applicable
Use netmem_ref instead of struct page pointer in prep for
unreadable memory. fbnic has separate free buffer submission
queues for headers and for data. Refactor the helper which
returns page pointer for a submission buffer to take the
high level queue container, create a separate handler
for header and payload rings. This ties the "upcast" from
netmem to system page to use of sub0 which we know has
system pages.
Jakub Kicinski [Mon, 1 Sep 2025 21:12:02 +0000 (14:12 -0700)]
eth: fbnic: move xdp_rxq_info_reg() to resource alloc
Move rxq_info and mem model registration from fbnic_alloc_napi_vector()
and fbnic_alloc_nv_resources() to fbnic_alloc_rx_qt_resources().
The rxq_info is now registered later in the process, but that
should not cause any issues.
rxq_info lives in the fbnic_q_triad (qt) struct so qt init is a more
natural place. Encapsulating the logic in the qt functions will also
allow simplifying the cleanup in the NAPI related alloc functions
in the next commit.
Rx does not have a dedicated fbnic_free_rx_qt_resources(),
but we can use xdp_rxq_info_is_reg() to tell whether given
rxq_info was in use (effectively - if it's a qt for an Rx queue).
Having to pass nv into fbnic_alloc_rx_qt_resources() is not
great in terms of layering, but that's temporary, pp will
move soon..
Jakub Kicinski [Mon, 1 Sep 2025 21:12:01 +0000 (14:12 -0700)]
eth: fbnic: move page pool pointer from NAPI to the ring struct
In preparation for memory providers we need a closer association
between queues and page pools. We used to have a page pool at the
NAPI level to serve all associated queues but with MP the queues
under a NAPI may no longer be created equal.
The "ring" structure in fbnic is a descriptor ring. We have separate
"rings" for payload and header pages ("to device"), as well as a ring
for completions ("from device"). Technically we only need the page
pool pointers in the "to device" rings, so adding the pointer to
the ring struct is a bit wasteful. But it makes passing the structures
around much easier.
For now both "to device" rings store a pointer to the same
page pool. Using more than one queue per NAPI is extremely rare
so don't bother trying to share a single page pool between queues.
ipv6: sit: Add ipip6_tunnel_dst_find() for cleanup
Extract the dst lookup logic from ipip6_tunnel_xmit() into new helper
ipip6_tunnel_dst_find() to reduce code duplication and enhance readability.
No functional change intended.
On a x86_64, with allmodconfig object size is also reduced:
Wang Liang [Mon, 1 Sep 2025 06:35:37 +0000 (14:35 +0800)]
net: atm: fix memory leak in atm_register_sysfs when device_register fail
When device_register() return error in atm_register_sysfs(), which can be
triggered by kzalloc fail in device_private_init() or other reasons,
kmemleak reports the following memory leaks:
The current R-Car S4 rswitch driver only supports port based fowarding.
This patch set adds HW offloading for L2 switching/bridgeing. The driver
hooks into switchdev.
1. Rename the base driver file to keep the driver name (rswitch.ko)
2. Add setting of default MAC ageing time in hardware.
3. Add the L2 driver extension in a separate file. The HW offloading
is automatically configured when a port is added to the bridge device.
Usage example:
ip link add name br0 type bridge
ip link set dev tsn0 master br0
ip link set dev tsn1 master br0
ip link set dev br0 up
ip link set dev tsn0 up
ip link set dev tsn1 up
Layer 2 traffic is now fowarded by HW from port TSN0 to port TSN1.
4. Provides the functionality to set the MAC table ageing time in the
Rswitch.
Usage example:
ip link change dev br0 type bridge ageing 100
Michael Dege [Mon, 1 Sep 2025 04:58:07 +0000 (06:58 +0200)]
net: renesas: rswitch: add offloading for L2 switching
Add hardware offloading for L2 switching on R-Car S4.
On S4 brdev is limited to one per-device (not per port). Reasoning
is that hw L2 forwarding support lacks any sort of source port based
filtering, which makes it unusable to offload more than one bridge
device. Either you allow hardware to forward destination MAC to a
port, or you have to send it to CPU. You can't make it forward only
if src and dst ports are in the same brdev.
Michael Dege [Mon, 1 Sep 2025 04:58:05 +0000 (06:58 +0200)]
net: renesas: rswitch: rename rswitch.c to rswitch_main.c
Adding new functionality to the driver. Therefore splitting into multiple
c files to keep them manageable. New functionality will be added to
separate files.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Michael Dege <michael.dege@renesas.com> Link: https://patch.msgid.link/20250901-add_l2_switching-v5-1-5f13e46860d5@renesas.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This new attribute is supposed to be used instead of NFTA_DEVICE_NAME
for simple wildcard interface specs. It holds a NUL-terminated string
representing an interface name prefix to match on.
While kernel code to distinguish full names from prefixes in
NFTA_DEVICE_NAME is simpler than this solution, reusing the existing
attribute with different semantics leads to confusion between different
versions of kernel and user space though:
* With old kernels, wildcards submitted by user space are accepted yet
silently treated as regular names.
* With old user space, wildcards submitted by kernel may cause crashes
since libnftnl expects NUL-termination when there is none.
Using a distinct attribute type sanitizes these situations as the
receiving part detects and rejects the unexpected attribute nested in
*_HOOK_DEVS attributes.
Fixes: 6d07a289504a ("netfilter: nf_tables: Support wildcard netdev hook specs") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>
Florian Westphal [Wed, 27 Aug 2025 17:17:32 +0000 (19:17 +0200)]
selftests: netfilter: fix udpclash tool hang
Yi Chen reports that 'udpclash' loops forever depending on compiler
(and optimization level used); while (x == 1) gets optimized into
for (;;). Add volatile qualifier to avoid that.
While at it, also run it under timeout(1) and fix the resize script
to not ignore the timeout passed as second parameter to insert_flood.
Reported-by: Yi Chen <yiche@redhat.com> Suggested-by: Yi Chen <yiche@redhat.com> Fixes: 78a588363587 ("selftests: netfilter: add conntrack clash resolution test case") Signed-off-by: Florian Westphal <fw@strlen.de>
====================
net: phy: micrel: Add PTP support for lan8842
The PTP block in lan8842 is the same as lan8814 so reuse all these
functions. The first patch of the series just does cosmetic changes such
that lan8842 can reuse the function lan8814_ptp_probe. There should not be
any functional changes here. While the second patch adds the PTP support
to lan8842.
====================
It has the same PTP IP block as lan8814, only the number of GPIOs is
different, all the other functionality is the same. So reuse the same
functions as lan8814 for lan8842.
There is a revision of lan8842 called lan8832 which doesn't have the PTP
IP block. So make sure in that case the PTP is not initialized.
net: phy: micrel: Introduce function __lan8814_ptp_probe_once
Introduce the function __lan8814_ptp_probe_once as this function will be
used also by lan8842 driver which has a different number of GPIOs
compared to lan8814. This change doesn't have any functional
changes.