git.ipfire.org Git - thirdparty/kernel/linux.git/log

]> git.ipfire.org Git - thirdparty/kernel/linux.git/log

projects / thirdparty / kernel / linux.git / log

summary | shortlog | log | commit | commitdiff | tree
first ⋅ prev ⋅ next

commit | commitdiff | tree

Gregor Herburger [Wed, 1 Oct 2025 09:10:04 +0000 (14:40 +0530)]

can: mcp251xfd: only configure PIN1 when rx_int is set

When rx_int is used th mcp251xfd_chip_rx_int_enable and
mcp251xfd_chip_rx_int_disable function configure both PIN0 and PIN1. To
prepare the support of the GPIOS only configure PIN1 with
regmap_update_bits.

This way PIN0 can be used as GPIO while PIN1 is used as rx_int
interrupt.

Signed-off-by: Gregor Herburger <gregor.herburger@ew.tq-group.com>
Tested-by: Viken Dadhaniya <viken.dadhaniya@oss.qualcomm.com>
Signed-off-by: Viken Dadhaniya <viken.dadhaniya@oss.qualcomm.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20251001091006.4003841-5-viken.dadhaniya@oss.qualcomm.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

commit | commitdiff | tree

Gregor Herburger [Wed, 1 Oct 2025 09:10:03 +0000 (14:40 +0530)]

can: mcp251xfd: add workaround for errata 5

According to Errata DS80000789E 5 writing IOCON register using one SPI
write command clears LAT0/LAT1.

Errata Fix/Work Around suggests to write registers with single byte write
instructions. However, it seems that every write to the second byte
causes the overwrite of LAT0/LAT1.

Never write byte 2 of IOCON register to avoid clearing of LAT0/LAT1.

Signed-off-by: Gregor Herburger <gregor.herburger@ew.tq-group.com>
Tested-by: Viken Dadhaniya <viken.dadhaniya@oss.qualcomm.com>
Signed-off-by: Viken Dadhaniya <viken.dadhaniya@oss.qualcomm.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20251001091006.4003841-4-viken.dadhaniya@oss.qualcomm.com
[mkl: add missing MCP251XFD_REG_IOCON_GPIO_MASK]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

commit | commitdiff | tree

Gregor Herburger [Wed, 1 Oct 2025 09:10:02 +0000 (14:40 +0530)]

can: mcp251xfd: utilize gather_write function for all non-CRC writes

This is a preparation patch to add errata workaround for non crc writes.

Currently for non-crc writes to the chip can go through the
.gather_write, .write or the reg_update_bits callback.

To allow the addition of the errata fix at a single location use
mcp251xfd_regmap_nocrc_gather_write for all non-CRC write instructions,
similar to the crc regmap.

Signed-off-by: Gregor Herburger <gregor.herburger@ew.tq-group.com>
Tested-by: Viken Dadhaniya <viken.dadhaniya@oss.qualcomm.com>
Signed-off-by: Viken Dadhaniya <viken.dadhaniya@oss.qualcomm.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20251001091006.4003841-3-viken.dadhaniya@oss.qualcomm.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

commit | commitdiff | tree

Marc Kleine-Budde [Wed, 1 Oct 2025 09:10:01 +0000 (14:40 +0530)]

can: mcp251xfd: move chip sleep mode into runtime pm

This is a preparation patch to add GPIO support.

Up to now, the Vdd regulator and the clocks have been managed by
Runtime-PM (on systems without CONFIG_PM these remain permanently
switched on).

During the mcp251xfd_open() callback the mcp251xfd is powered,
soft-reset and configured. In mcp251xfd_stop() the chip is shut down
again. To support the on-chip GPIOs, the chip must be supplied with
power while GPIOs are being requested, even if the networking
interface is down.

To support this, move the functions mcp251xfd_chip_softreset() and
mcp251xfd_chip_clock_init() from mcp251xfd_chip_start() to
mcp251xfd_runtime_resume(). Instead of setting the controller to sleep
mode in mcp251xfd_chip_stop(), bring it into configuration mode. This
way it doesn't take part in bus activity and doesn't enter sleep mode.

Signed-off-by: Gregor Herburger <gregor.herburger@ew.tq-group.com>
Tested-by: Viken Dadhaniya <viken.dadhaniya@oss.qualcomm.com>
Signed-off-by: Viken Dadhaniya <viken.dadhaniya@oss.qualcomm.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20251001091006.4003841-2-viken.dadhaniya@oss.qualcomm.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

commit | commitdiff | tree

Maud Spierings [Thu, 30 Oct 2025 07:07:42 +0000 (08:07 +0100)]

can: mcp251x: mcp251x_can_probe(): use dev_err_probe()

The currently used combination of dev_err() plus return leaves a loud error
in dmesg even when the error is a deferred probe which gets resolved later.
For example a supply that has not been probed yet.

Use dev_err_probe() to improve the handling/display of errors.

Signed-off-by: Maud Spierings <maudspierings@gocontroll.com>
Link: https://patch.msgid.link/20251030-mcp_err-v1-1-eecf737823b7@gocontroll.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

commit | commitdiff | tree

Marc Kleine-Budde [Fri, 31 Oct 2025 12:12:20 +0000 (13:12 +0100)]

Merge patch series "convert can drivers to use ndo_hwtstamp callbacks"

Vadim Fedorenko <vadim.fedorenko@linux.dev> says:

The patchset converts generic ioctl implementation into a pair of
ndo_hwtstamp_get/ndo_hwtstamp_set generic callbacks and replaces
callbacks in drivers.

Link: https://patch.msgid.link/20251029231620.1135640-1-vadim.fedorenko@linux.dev
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

commit | commitdiff | tree

Vadim Fedorenko [Wed, 29 Oct 2025 23:16:20 +0000 (23:16 +0000)]

can: peak_usb: convert to use ndo_hwtstamp callbacks

Convert driver to use ndo_hwtstamp_set()/ndo_hwtstamp_get() callbacks.
ndo_eth_ioctl handler does nothing after conversion - remove it.

Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20251029231620.1135640-4-vadim.fedorenko@linux.dev
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

commit | commitdiff | tree

Vadim Fedorenko [Wed, 29 Oct 2025 23:16:19 +0000 (23:16 +0000)]

can: peak_canfd: convert to use ndo_hwtstamp callbacks

Convert driver to use ndo_hwtstamp_set()/ndo_hwtstamp_get() callbacks.
ndo_eth_ioctl handler does nothing after conversion - remove it.

Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20251029231620.1135640-3-vadim.fedorenko@linux.dev
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

commit | commitdiff | tree

Vadim Fedorenko [Wed, 29 Oct 2025 23:16:18 +0000 (23:16 +0000)]

can: convert generic HW timestamp ioctl to ndo_hwtstamp callbacks

Can has generic implementation of ndo_eth_ioctl which implements only HW
timestamping commands. Implement generic ndo_hwtstamp callbacks and use
it in drivers instead of generic ioctl interface.

Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20251029231620.1135640-2-vadim.fedorenko@linux.dev
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

commit | commitdiff | tree

Paolo Abeni [Thu, 30 Oct 2025 09:44:12 +0000 (10:44 +0100)]

Merge branch 'add-cn20k-nix-and-npa-contexts'

Subbaraya Sundeep says:

====================
Add CN20K NIX and NPA contexts

The hardware contexts of blocks NIX and NPA in CN20K silicon are
different than that of previous silicons CN10K and CN9XK. This
patchset adds the new contexts of CN20K in AF and PF drivers.
A new mailbox for enqueuing contexts to hardware is added.

Patch 1 simplifies context writing and reading by using max context
size supported by hardware instead of using each context size.
Patch 2 and 3 adds NIX block contexts in AF driver and extends
debugfs to display those new contexts
Patch 4 and 5 adds NPA block contexts in AF driver and extends
debugfs to display those new contexts
Patch 6 omits NDC configuration since CN20K NPA does not use NDC
for caching its contexts
Patch 7 and 8 uses the new NIX and NPA contexts in PF/VF driver.
Patch 9, 10 and 11 are to support more bandwidth profiles present in
CN20K for RX ratelimiting and to display new profiles in debugfs

v3: https://lore.kernel.org/all/1752772063-6160-1-git-send-email-sbhatta@marvell.com/
====================

Link: https://patch.msgid.link/1761388367-16579-1-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Subbaraya Sundeep [Sat, 25 Oct 2025 10:32:47 +0000 (16:02 +0530)]

octeontx2-pf: Use new bandwidth profiles in receive queue

Receive queue points to a bandwidth profile for rate limiting.
Since cn20k has additional bandwidth profiles use them
too while mapping receive queue to bandwidth profile.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1761388367-16579-12-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Subbaraya Sundeep [Sat, 25 Oct 2025 10:32:46 +0000 (16:02 +0530)]

octeontx2-af: Display new bandwidth profiles too in debugfs

Consider the new profiles of cn20k too while displaying
bandwidth profile contexts in debugfs.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1761388367-16579-11-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Subbaraya Sundeep [Sat, 25 Oct 2025 10:32:45 +0000 (16:02 +0530)]

octeontx2-af: Accommodate more bandwidth profiles for cn20k

CN20K has 16k of leaf profiles, 2k of middle profiles and
256 of top profiles. This patch modifies existing receive
queue and bandwidth profile context structures to accommodate
additional profiles of cn20k.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1761388367-16579-10-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Subbaraya Sundeep [Sat, 25 Oct 2025 10:32:44 +0000 (16:02 +0530)]

octeontx2-pf: Initialize new NIX SQ context for cn20k

cn20k has different NIX context for send queue hence use
the new cn20k mailbox to init SQ context.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1761388367-16579-9-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Linu Cherian [Sat, 25 Oct 2025 10:32:43 +0000 (16:02 +0530)]

octeontx2-pf: Initialize cn20k specific aura and pool contexts

With new CN20K NPA pool and aura contexts supported in AF
driver this patch modifies PF driver to use new NPA contexts.
Implement new hw_ops for intializing aura and pool contexts
for all the silicons.

Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1761388367-16579-8-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Linu Cherian [Sat, 25 Oct 2025 10:32:42 +0000 (16:02 +0530)]

octeontx2-af: Skip NDC operations for cn20k

For cn20k, NPA block doesn't use the general purpose
NDC (Near Coprocessor Bus Data cache Unit) for caching,
hence skip the NDC related operations.
Also refactor NDC configuration code to a helper function.

Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1761388367-16579-7-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Linu Cherian [Sat, 25 Oct 2025 10:32:41 +0000 (16:02 +0530)]

octeontx2-af: Extend debugfs support for cn20k NPA

Extend debugfs to display CN20K NPA aura and pool contexts.

Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1761388367-16579-6-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Linu Cherian [Sat, 25 Oct 2025 10:32:40 +0000 (16:02 +0530)]

octeontx2-af: Add cn20k NPA block contexts

New CN20K silicon has NPA hardware context structures different from
previous silicons. Add NPA aura and pool context definitions for cn20k.
Extend NPA context handling support to cn20k.

Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1761388367-16579-5-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Subbaraya Sundeep [Sat, 25 Oct 2025 10:32:39 +0000 (16:02 +0530)]

octeontx2-af: Extend debugfs support for cn20k NIX

Extend debugfs to display CN20K NIX send, receive and
completion queue contexts.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1761388367-16579-4-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Subbaraya Sundeep [Sat, 25 Oct 2025 10:32:38 +0000 (16:02 +0530)]

octeontx2-af: Add cn20k NIX block contexts

New CN20K silicon has NIX hardware context structures different from
previous silicons. Add NIX send and completion queue context
definitions for cn20k. Extend NIX context handling support to cn20k.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1761388367-16579-3-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Subbaraya Sundeep [Sat, 25 Oct 2025 10:32:37 +0000 (16:02 +0530)]

octeontx2-af: Simplify context writing and reading to hardware

Simplify NIX context reading and writing by using hardware
maximum context size instead of using individual sizes of
each context type.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1761388367-16579-2-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Jakub Kicinski [Thu, 30 Oct 2025 02:00:37 +0000 (19:00 -0700)]

Merge branch 'net-phy-add-iterator-mdiobus_for_each_phy'

Heiner Kallweit says:

====================
net: phy: add iterator mdiobus_for_each_phy

Add and use an iterator for all PHY's on a MII bus, and phy_find_next()
as a prerequisite.
====================

Link: https://patch.msgid.link/07fc63e8-53fd-46aa-853e-96187bba9d44@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Heiner Kallweit [Sat, 25 Oct 2025 18:52:56 +0000 (20:52 +0200)]

net: phy: use new iterator mdiobus_for_each_phy in mdiobus_prevent_c45_scan

Use new iterator mdiobus_for_each_phy() to simplify the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/6d792b1e-d23d-4b7e-a94f-89c6617b620f@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Heiner Kallweit [Sat, 25 Oct 2025 18:52:09 +0000 (20:52 +0200)]

net: davinci_mdio: use new iterator mdiobus_for_each_phy

Use new iterator mdiobus_for_each_phy() to simplify the code.

Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/326d1337-2c22-42e3-a152-046ac5c43095@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Heiner Kallweit [Sat, 25 Oct 2025 18:50:41 +0000 (20:50 +0200)]

net: fec: use new iterator mdiobus_for_each_phy

Use new iterator mdiobus_for_each_phy() to simplify the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/65eb9490-5666-4b4a-8d26-3fca738b1315@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Heiner Kallweit [Sat, 25 Oct 2025 18:49:50 +0000 (20:49 +0200)]

net: phy: add iterator mdiobus_for_each_phy

Add an iterator for all PHY's on a MII bus, and phy_find_next()
as a prerequisite.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/cd112f15-401a-43d9-8525-9ff0965a68cd@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Heiner Kallweit [Sat, 25 Oct 2025 18:35:47 +0000 (20:35 +0200)]

net: stmmac: mdio: fix incorrect phy address check

max_addr is the max number of addresses, not the highest possible address,
therefore check phydev->mdio.addr > max_addr isn't correct.
To fix this change the semantics of max_addr, so that it represents
the highest possible address. IMO this is also a little bit more intuitive
wrt name max_addr.

Fixes: 4a107a0e8361 ("net: stmmac: mdio: use phy_find_first to simplify stmmac_mdio_register")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Reported-by: Simon Horman <horms@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/e869999b-2d4b-4dc1-9890-c2d3d1e8d0f8@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Sakari Ailus [Mon, 27 Oct 2025 11:50:22 +0000 (13:50 +0200)]

net: wwan: Remove redundant pm_runtime_mark_last_busy() calls

pm_runtime_put_autosuspend(), pm_runtime_put_sync_autosuspend(),
pm_runtime_autosuspend() and pm_request_autosuspend() now include a call
to pm_runtime_mark_last_busy(). Remove the now-reduntant explicit call to
pm_runtime_mark_last_busy().

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Link: https://patch.msgid.link/20251027115022.390997-4-sakari.ailus@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Sakari Ailus [Mon, 27 Oct 2025 11:50:20 +0000 (13:50 +0200)]

net: ipa: Remove redundant pm_runtime_mark_last_busy() calls

pm_runtime_put_autosuspend(), pm_runtime_put_sync_autosuspend(),
pm_runtime_autosuspend() and pm_request_autosuspend() now include a call
to pm_runtime_mark_last_busy(). Remove the now-reduntant explicit call to
pm_runtime_mark_last_busy().

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://patch.msgid.link/20251027115022.390997-2-sakari.ailus@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Sakari Ailus [Mon, 27 Oct 2025 11:50:19 +0000 (13:50 +0200)]

net: ethernet: Remove redundant pm_runtime_mark_last_busy() calls

pm_runtime_put_autosuspend(), pm_runtime_put_sync_autosuspend(),
pm_runtime_autosuspend() and pm_request_autosuspend() now include a call
to pm_runtime_mark_last_busy(). Remove the now-reduntant explicit call to
pm_runtime_mark_last_busy().

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20251027115022.390997-1-sakari.ailus@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Thu, 30 Oct 2025 01:44:21 +0000 (18:44 -0700)]

Merge branch 'net-enetc-add-i-mx94-enetc-support'

Wei Fang says:

====================
net: enetc: Add i.MX94 ENETC support

i.MX94 NETC has two kinds of ENETCs, one is the same as i.MX95, which
can be used as a standalone network port. The other one is an internal
ENETC, it connects to the CPU port of NETC switch through the pseudo
MAC. Also, i.MX94 have multiple PTP Timers, which is different from
i.MX95. Any PTP Timer can be bound to a specified standalone ENETC by
the IERB ETBCR registers. Currently, this patch only add ENETC support
and Timer support for i.MX94. The switch will be added by a separate
patch set.

In addition, note that i.MX94 SoC is launched after i.MX95, its NETC
has a higher version, so the driver support is added after i.MX95.
====================

Link: https://patch.msgid.link/20251029013900.407583-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Wei Fang [Wed, 29 Oct 2025 01:39:00 +0000 (09:39 +0800)]

net: enetc: add standalone ENETC support for i.MX94

The revision of i.MX94 ENETC is changed to v4.3, so add this revision to
enetc_info to support i.MX94 ENETC. And add PTP suspport for i.MX94.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20251029013900.407583-7-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Wei Fang [Wed, 29 Oct 2025 01:38:59 +0000 (09:38 +0800)]

net: enetc: add basic support for the ENETC with pseudo MAC for i.MX94

The ENETC with pseudo MAC is an internal port which connects to the CPU
port of the switch. The switch CPU/host ENETC is fully integrated with
the switch and does not require a back-to-back MAC, instead a light
weight "pseudo MAC" provides the delineation between switch and ENETC.
This translates to lower power (less logic and memory) and lower delay
(as there is no serialization delay across this link).

Different from the standalone ENETC which is used as the external port,
the internal ENETC has a different PCIe device ID, and it does not have
Ethernet MAC port registers, instead, it has a small number of pseudo
MAC port registers, so some features are not supported by pseudo MAC,
such as loopback, half duplex, one-step timestamping and so on.

Therefore, the configuration of this internal ENETC is also somewhat
different from that of the standalone ENETC. So add the basic support
for ENETC with pseudo MAC. More supports will be added in the future.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://patch.msgid.link/20251029013900.407583-6-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Clark Wang [Wed, 29 Oct 2025 01:38:58 +0000 (09:38 +0800)]

net: enetc: add ptp timer binding support for i.MX94

The i.MX94 has three PTP timers, and all standalone ENETCs can select
one of them to bind to as their PHC. The 'ptp-timer' property is used
to represent the PTP device of the Ethernet controller. So users can
add 'ptp-timer' to the ENETC node to specify the PTP timer. The driver
parses this property to bind the two hardware devices.

If the "ptp-timer" property is not present, the first timer of the PCIe
bus where the ENETC is located is used as the default bound PTP timer.

Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20251029013900.407583-5-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Wei Fang [Wed, 29 Oct 2025 01:38:57 +0000 (09:38 +0800)]

net: enetc: add preliminary i.MX94 NETC blocks control support

NETC blocks control is used for warm reset and pre-boot initialization.
Different versions of NETC blocks control are not exactly the same. We
need to add corresponding netc_devinfo data for each version. i.MX94
series are launched after i.MX95, so its NETC version (v4.3) is higher
than i.MX95 NETC (v4.1). Currently, the patch adds the following
configurations for ENETCs.

1. Set the link's MII protocol.
2. ENETC 0 (MAC 3) and the switch port 2 (MAC 2) share the same parallel
interface, but due to SoC constraint, they cannot be used simultaneously.
Since the switch is not supported yet, so the interface is assigned to
ENETC 0 by default.

The switch configuration will be added separately in a subsequent patch.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20251029013900.407583-4-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Wei Fang [Wed, 29 Oct 2025 01:38:56 +0000 (09:38 +0800)]

dt-bindings: net: enetc: add compatible string for ENETC with pseduo MAC

The ENETC with pseudo MAC is used to connect to the CPU port of the NETC
switch. This ENETC has a different PCI device ID, so add a standard PCI
device compatible string to it.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20251029013900.407583-3-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Wei Fang [Wed, 29 Oct 2025 01:38:55 +0000 (09:38 +0800)]

dt-bindings: net: netc-blk-ctrl: add compatible string for i.MX94 platforms

Add the compatible string "nxp,imx94-netc-blk-ctrl" for i.MX94 platforms.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20251029013900.407583-2-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Thu, 30 Oct 2025 01:28:32 +0000 (18:28 -0700)]

Merge branch 'icmp-add-rfc-5837-support'

Ido Schimmel says:

====================
icmp: Add RFC 5837 support

tl;dr
=====

This patchset extends certain ICMP error messages (e.g., "Time
Exceeded") with incoming interface information in accordance with RFC
5837 [1]. This is required for more meaningful traceroute results in
unnumbered networks. Like other ICMP settings, the feature is controlled
via a per-{netns, address family} sysctl. The interface and the
implementation are designed to support more ICMP extensions.

Motivation
==========

Over the years, the kernel was extended with the ability to derive the
source IP of ICMP error messages from the interface that received the
datagram which elicited the ICMP error [2][3][4]. This is especially
important for "Time Exceeded" messages as it allows traceroute users to
trace the actual packet path along the network.

The above scheme does not work in unnumbered networks. In these
networks, only the loopback / VRF interface is assigned a global IP
address while router interfaces are assigned IPv6 link-local addresses.
As such, ICMP error messages are generated with a source IP derived from
the loopback / VRF interface, making it impossible to trace the actual
packet path when parallel links exist between routers.

The problem can be solved by implementing the solution proposed by RFC
4884 [5] and RFC 5837. The former defines an ICMP extension structure
that can be appended to selected ICMP messages and carry extension
objects. The latter defines an extension object called the "Interface
Information Object" (IIO) that can carry interface information (e.g.,
name, index, MTU) about interfaces with certain roles such as the
interface that received the datagram which elicited the ICMP error.

The payload of the datagram that elicited the error (potentially padded
/ trimmed) along with the ICMP extension structure will be queued to the
error queue of the originating socket, thereby allowing traceroute
applications to parse and display the information encoded in the ICMP
extension structure. Example:

# traceroute6 -e 2001:db8:1::3
traceroute to 2001:db8:1::3 (2001:db8:1::3), 30 hops max, 80 byte packets
  1  2001:db8:1::2 (2001:db8:1::2) <INC:11,"eth1",mtu=1500>  0.214 ms  0.171 ms  0.162 ms
  2  2001:db8:1::3 (2001:db8:1::3) <INC:12,"eth2",mtu=1500>  0.154 ms  0.135 ms  0.127 ms

# traceroute -e 192.0.2.3
traceroute to 192.0.2.3 (192.0.2.3), 30 hops max, 60 byte packets
  1  192.0.2.2 (192.0.2.2) <INC:11,"eth1",mtu=1500>  0.191 ms  0.148 ms  0.144 ms
  2  192.0.2.3 (192.0.2.3) <INC:12,"eth2",mtu=1500>  0.137 ms  0.122 ms  0.114 ms

Implementation
==============

As previously stated, the feature is controlled via a per-{netns,
address} sysctl. Specifically, a bit mask where each bit controls the
addition of a different ICMP extension to ICMP error messages.
Currently, only a single value is supported, to append the incoming
interface information.

Key points:

1. Global knob vs finer control. I am not aware of users who require
finer control, but it is possible that some users will want to avoid
appending ICMP extensions when the packet is sent out of a specific
interface (e.g., the management interface) or to a specific subnet. This
can be accomplished via a tc-bpf program that trims the ICMP extension
structure. An example program can be found here [6].

2. Split implementation between IPv4 / IPv6. While the implementation is
currently similar, there are some differences between both address
families. In addition, some extensions (e.g., RFC 8883 [7]) are
IPv6-specific. Given the above and given that the implementation is not
very complex, it makes sense to keep both implementations separate.

3. Compatibility with legacy applications. RFC 4884 from 2007 extended
certain ICMP messages with a length field that encodes the length of the
"original datagram" field, so that applications will be able to tell
where the "original datagram" ends and where the ICMP extension
structure starts.

Before the introduction of the IP{,6}_RECVERR_RFC4884 socket options
[8][9] in 2020 it was impossible for applications to know where the ICMP
extension structure starts and to this day some applications assume that
it starts at offset 128, which is the minimum length of the "original
datagram" field as specified by RFC 4884.

Therefore, in order to be compatible with both legacy and modern
applications, the datagram that elicited the ICMP error is trimmed /
padded to 128 bytes before appending the ICMP extension structure.

This behavior is specifically called out by RFC 4884: "Those wishing to
be backward compatible with non-compliant TRACEROUTE implementations
will include exactly 128 octets" [10].

Note that in 128 bytes we should be able to include enough headers for
the originating node to match the ICMP error message with the relevant
socket. For example, the following headers will be present in the
"original datagram" field when a VXLAN encapsulated IPv6 packet elicits
an ICMP error in an IPv6 underlay: IPv6 (40) | UDP (8) | VXLAN (8) | Eth
(14) | IPv6 (40) | UDP (8). Overall, 118 bytes.

If the 128 bytes limit proves to be insufficient for some use case, we
can consider dedicating a new bit in the previously mentioned sysctl to
allow for more bytes to be included in the "original datagram" field.

4. Extensibility. This patchset adds partial support for a single ICMP
extension. However, the interface and the implementation should be able
to support more extensions, if needed. Examples:

* More interface information objects as part of RFC 5837. We should be
  able to derive the outgoing interface information and nexthop IP from
  the dst entry attached to the packet that elicited the error.

* Node identification object (e.g., hostname / loopback IP) [11].

* Extended Information object which encodes aggregate header limits as
  part of RFC 8883.

A previous proposal from Ishaan Gandhi and Ron Bonica is available here
[12].

Testing
=======

The existing traceroute selftest is extended to test that ICMP
extensions are reported correctly when enabled. Both address families
are tested and with different packet sizes in order to make sure that
trimming / padding works correctly. Tested that packets are parsed
correctly by the IP{,6}_RECVERR_RFC4884 socket options using Willem's
selftest [13].

Changelog
=========

Changes since v1 [14]:

* Patches #1-#2: Added a comment about field ordering and review tags.

* Patch #3: Converted "sysctl" to "echo" when testing the return value.
  Added a check to skip the test if traceroute version is older
  than 2.1.5.

[1] https://datatracker.ietf.org/doc/html/rfc5837
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1c2fb7f93cb20621772bf304f3dba0849942e5db
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fac6fce9bdb59837bb89930c3a92f5e0d1482f0b
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4a8c416602d97a4e2073ed563d4d4c7627de19cf
[5] https://datatracker.ietf.org/doc/html/rfc4884
[6] https://gist.github.com/idosch/5013448cdb5e9e060e6bfdc8b433577c
[7] https://datatracker.ietf.org/doc/html/rfc8883
[8] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eba75c587e811d3249c8bd50d22bb2266ccd3c0f
[9] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=01370434df85eb76ecb1527a4466013c4aca2436
[10] https://datatracker.ietf.org/doc/html/rfc4884#section-5.3
[11] https://datatracker.ietf.org/doc/html/draft-ietf-intarea-extended-icmp-nodeid-04
[12] https://lore.kernel.org/netdev/20210317221959.4410-1-ishaangandhi@gmail.com/
[13] https://lore.kernel.org/netdev/aPpMItF35gwpgzZx@shredder/
[14] https://lore.kernel.org/netdev/20251022065349.434123-1-idosch@nvidia.com/
====================

Link: https://patch.msgid.link/20251027082232.232571-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Ido Schimmel [Mon, 27 Oct 2025 08:22:32 +0000 (10:22 +0200)]

selftests: traceroute: Add ICMP extensions tests

Test that ICMP extensions are reported correctly when enabled and not
reported when disabled. Test both IPv4 and IPv6 and using different
packet sizes, to make sure trimming / padding works correctly.

Disable ICMP rate limiting (defaults to 1 per-second per-target) so that
the kernel will always generate ICMP errors when needed.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20251027082232.232571-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Ido Schimmel [Mon, 27 Oct 2025 08:22:31 +0000 (10:22 +0200)]

ipv6: icmp: Add RFC 5837 support

Add the ability to append the incoming IP interface information to
ICMPv6 error messages in accordance with RFC 5837 and RFC 4884. This is
required for more meaningful traceroute results in unnumbered networks.

The feature is disabled by default and controlled via a new sysctl
("net.ipv6.icmp.errors_extension_mask") which accepts a bitmask of ICMP
extensions to append to ICMP error messages. Currently, only a single
value is supported, but the interface and the implementation should be
able to support more extensions, if needed.

Clone the skb and copy the relevant data portions before modifying the
skb as the caller of icmp6_send() still owns the skb after the function
returns. This should be fine since by default ICMP error messages are
rate limited to 1000 per second and no more than 1 per second per
specific host.

Trim or pad the packet to 128 bytes before appending the ICMP extension
structure in order to be compatible with legacy applications that assume
that the ICMP extension structure always starts at this offset (the
minimum length specified by RFC 4884).

Since commit 20e1954fe238 ("ipv6: RFC 4884 partial support for SIT/GRE
tunnels") it is possible for icmp6_send() to be called with an skb that
already contains ICMP extensions. This can happen when we receive an
ICMPv4 message with extensions from a tunnel and translate it to an
ICMPv6 message towards an IPv6 host in the overlay network. I could not
find an RFC that supports this behavior, but it makes sense to not
overwrite the original extensions that were appended to the packet.
Therefore, avoid appending extensions if the length field in the
provided ICMPv6 header is already filled.

Export netdev_copy_name() using EXPORT_IPV6_MOD_GPL() to make it
available to IPv6 when it is built as a module.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251027082232.232571-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Ido Schimmel [Mon, 27 Oct 2025 08:22:30 +0000 (10:22 +0200)]

ipv4: icmp: Add RFC 5837 support

Add the ability to append the incoming IP interface information to
ICMPv4 error messages in accordance with RFC 5837 and RFC 4884. This is
required for more meaningful traceroute results in unnumbered networks.

The feature is disabled by default and controlled via a new sysctl
("net.ipv4.icmp_errors_extension_mask") which accepts a bitmask of ICMP
extensions to append to ICMP error messages. Currently, only a single
value is supported, but the interface and the implementation should be
able to support more extensions, if needed.

Clone the skb and copy the relevant data portions before modifying the
skb as the caller of __icmp_send() still owns the skb after the function
returns. This should be fine since by default ICMP error messages are
rate limited to 1000 per second and no more than 1 per second per
specific host.

Trim or pad the packet to 128 bytes before appending the ICMP extension
structure in order to be compatible with legacy applications that assume
that the ICMP extension structure always starts at this offset (the
minimum length specified by RFC 4884).

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251027082232.232571-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Tue, 28 Oct 2025 03:58:23 +0000 (03:58 +0000)]

net: sched: Don't use WARN_ON_ONCE() for -ENOMEM in tcf_classify().

As demonstrated by syzbot, WARN_ON_ONCE() in tcf_classify() can
be easily triggered by fault injection. [0]

We should not use WARN_ON_ONCE() for the simple -ENOMEM case.

Also, we provide SKB_DROP_REASON_NOMEM for the same error.

Let's remove WARN_ON_ONCE() there.

[0]:
FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
CPU: 0 UID: 0 PID: 31392 Comm: syz.8.7081 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250
should_fail_ex+0x414/0x560
should_failslab+0xa8/0x100
kmem_cache_alloc_noprof+0x74/0x6e0
skb_ext_add+0x148/0x8f0
tcf_classify+0xeba/0x1140
multiq_enqueue+0xfd/0x4c0 net/sched/sch_multiq.c:66
...
WARNING: CPU: 0 PID: 31392 at net/sched/cls_api.c:1869 tcf_classify+0xfd7/0x1140
Modules linked in:
CPU: 0 UID: 0 PID: 31392 Comm: syz.8.7081 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025
RIP: 0010:tcf_classify+0xfd7/0x1140
Code: e8 03 42 0f b6 04 30 84 c0 0f 85 41 01 00 00 66 41 89 1f eb 05 e8 89 26 75 f8 bb ff ff ff ff e9 04 f9 ff ff e8 7a 26 75 f8 90 <0f> 0b 90 49 83 c5 44 4c 89 eb 49 c1 ed 03 43 0f b6 44 35 00 84 c0
RSP: 0018:ffffc9000b7671f0 EFLAGS: 00010293
RAX: ffffffff894addf6 RBX: 0000000000000002 RCX: ffff888025029e40
RDX: 0000000000000000 RSI: ffffffff8bbf05c0 RDI: ffffffff8bbf0580
RBP: 0000000000000000 R08: 00000000ffffffff R09: 1ffffffff1c0bfd6
R10: dffffc0000000000 R11: fffffbfff1c0bfd7 R12: ffff88805a90de5c
R13: ffff88805a90ddc0 R14: dffffc0000000000 R15: ffffc9000b7672c0
FS: 00007f20739f66c0(0000) GS:ffff88812613e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000110c2d2a80 CR3: 0000000024e36000 CR4: 00000000003526f0
Call Trace:
<TASK>
multiq_classify net/sched/sch_multiq.c:39 [inline]
multiq_enqueue+0xfd/0x4c0 net/sched/sch_multiq.c:66
dev_qdisc_enqueue+0x4e/0x260 net/core/dev.c:4118
__dev_xmit_skb net/core/dev.c:4214 [inline]
__dev_queue_xmit+0xe83/0x3b50 net/core/dev.c:4729
packet_snd net/packet/af_packet.c:3076 [inline]
packet_sendmsg+0x3e33/0x5080 net/packet/af_packet.c:3108
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg+0x21c/0x270 net/socket.c:742
____sys_sendmsg+0x505/0x830 net/socket.c:2630
___sys_sendmsg+0x21f/0x2a0 net/socket.c:2684
__sys_sendmsg net/socket.c:2716 [inline]
__do_sys_sendmsg net/socket.c:2721 [inline]
__se_sys_sendmsg net/socket.c:2719 [inline]
__x64_sys_sendmsg+0x19b/0x260 net/socket.c:2719
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f207578efc9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f20739f6038 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f20759e5fa0 RCX: 00007f207578efc9
RDX: 0000000000000004 RSI: 00002000000000c0 RDI: 0000000000000008
RBP: 00007f20739f6090 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007f20759e6038 R14: 00007f20759e5fa0 R15: 00007f2075b0fa28
</TASK>

Reported-by: syzbot+87e1289a044fcd0c5f62@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/69003e33.050a0220.32483.00e8.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20251028035859.2067690-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Ankit Khushwaha [Tue, 28 Oct 2025 17:29:47 +0000 (22:59 +0530)]

selftest: net: fix socklen_t type mismatch in sctp_collision test

Socket APIs like recvfrom(), accept(), and getsockname() expect socklen_t*
arg, but tests were using int variables. This causes -Wpointer-sign
warnings on platforms where socklen_t is unsigned.

Change the variable type from int to socklen_t to resolve the warning and
ensure type safety across platforms.

warning fixed:

sctp_collision.c:62:70: warning: passing 'int *' to parameter of
type 'socklen_t *' (aka 'unsigned int *') converts between pointers to
integer types with different sign [-Wpointer-sign]
   62 |                 ret = recvfrom(sd, buf, sizeof(buf),
0, (struct sockaddr *)&daddr, &len);
      |                                                           ^~~~
/usr/include/sys/socket.h:165:27: note: passing argument to
parameter '__addr_len' here
  165 |                          socklen_t *__restrict __addr_len);
      |                                                ^

Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251028172947.53153-1-ankitkhushwaha.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Thu, 30 Oct 2025 00:18:24 +0000 (17:18 -0700)]

Merge branch 'net-stmmac-hwif-c-cleanups'

Russell King says:

====================
net: stmmac: hwif.c cleanups

This series cleans up hwif.c:

- move the reading of the version information out of stmmac_hwif_init()
  into its own function, stmmac_get_version(), storing the result in a
  new struct.

- simplify stmmac_get_version().

- read the version register once, passing it to stmmac_get_id() and
  stmmac_get_dev_id().

- move stmmac_get_id() and stmmac_get_dev_id() into
  stmmac_get_version()

- define version register fields and use FIELD_GET() to decode

- start tackling the big loop in stmmac_hwif_init() - provide a
  function, stmmac_hwif_find(), which looks up the hwif entry, thus
  making a much smaller loop, which improves readability of this code.

- change the use of '^' to '!=' when comparing the dev_id, which is
  what is really meant here.

- reorganise the test after calling stmmac_hwif_init() so that we
  handle the error case in the indented code, and the success case
  with no indent, which is the classical arrangement.
====================

Link: https://patch.msgid.link/aQFZVSGJuv8-_DIo@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Russell King (Oracle) [Wed, 29 Oct 2025 00:03:46 +0000 (00:03 +0000)]

net: stmmac: reorganise stmmac_hwif_init()

Reorganise stmmac_hwif_init() to handle the error case of
stmmac_hwif_find() in the indented block, which follows normal
programming pattern.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vDtfG-0000000CCCX-2YwQ@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Russell King (Oracle) [Wed, 29 Oct 2025 00:03:41 +0000 (00:03 +0000)]

net: stmmac: use != rather than ^ for comparing dev_id

Use the more usual not-equals rather than exclusive-or operator when
comparing the dev_id in stmmac_hwif_find().

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vDtfB-0000000CCCR-25rr@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Russell King (Oracle) [Wed, 29 Oct 2025 00:03:36 +0000 (00:03 +0000)]

net: stmmac: provide function to lookup hwif

Provide a function to lookup the hwif entry given the core type,
Synopsys version, and device ID (used for XGMAC cores).

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vDtf6-0000000CCCL-1cQA@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Russell King (Oracle) [Wed, 29 Oct 2025 00:03:31 +0000 (00:03 +0000)]

net: stmmac: use FIELD_GET() for version register

Provide field definitions in common.h, and use these with FIELD_GET()
to extract the fields from the version register.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vDtf1-0000000CCCF-0uUV@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Russell King (Oracle) [Wed, 29 Oct 2025 00:03:26 +0000 (00:03 +0000)]

net: stmmac: move stmmac_get_*id() into stmmac_get_version()

Move the contents of both stmmac_get_id() and stmmac_get_dev_id() into
stmmac_get_version() as it no longer makes sense for these to be
separate functions.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vDtew-0000000CCC9-0KeM@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Russell King (Oracle) [Wed, 29 Oct 2025 00:03:20 +0000 (00:03 +0000)]

net: stmmac: consolidate version reading and validation

There is no need to read the version register twice, once in
stmmac_get_id() and then again in stmmac_get_dev_id(). Consolidate
this into stmmac_get_version() and pass each of these this value.

As both functions unnecessarily issue the same warning for a zero
register value, also move this into stmmac_get_version().

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vDteq-0000000CCC3-3zbJ@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Russell King (Oracle) [Wed, 29 Oct 2025 00:03:15 +0000 (00:03 +0000)]

net: stmmac: simplify stmmac_get_version()

We can simplify stmmac_get_version() by pre-initialising the version
members to zero, detecting the MAC100 core and returning, otherwise
determining the version register offset separately from calling
stmmac_get_id() and stmmac_get_dev_id(). Do this.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vDtel-0000000CCBx-3Lpf@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Russell King (Oracle) [Wed, 29 Oct 2025 00:03:10 +0000 (00:03 +0000)]

net: stmmac: move version handling into own function

Move the version handling out of stmmac_hwif_init() and into its own
function, returning the version information through a structure.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vDteg-0000000CCBr-2m7q@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Wang Liang [Sat, 25 Oct 2025 09:26:37 +0000 (17:26 +0800)]

net: ipv4: Remove extern udp_v4_early_demux()/tcp_v4_early_demux() in .c files

Function udp_v4_early_demux() was already declared in 'include/net/udp.h',
no need to keep the extern in 'ip_input.c', which may produce the
following checkpatch warning:

  WARNING: externs should be avoided in .c files
  #45: FILE: net/ipv4/ip_input.c:322:
  +enum skb_drop_reason udp_v4_early_demux(struct sk_buff *skb);

Replace it by including 'net/udp.h'. Do the same for tcp_v4_early_demux().

Signed-off-by: Wang Liang <wangliang74@huawei.com>
Link: https://patch.msgid.link/20251025092637.1020960-1-wangliang74@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Tianling Shen [Sun, 26 Oct 2025 13:36:52 +0000 (21:36 +0800)]

net: phy: motorcomm: Add support for PHY LEDs on YT8531

The LED registers on YT8531 are exactly same as YT8521, so simply
reuse yt8521_led_hw_* functions.

Tested on OrangePi R1 Plus LTS and Zero3.

Signed-off-by: Tianling Shen <cnsztl@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Jijie Shao<shaojijie@huawei.com>
Link: https://patch.msgid.link/20251026133652.1288732-1-cnsztl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Issam Hamdi [Fri, 24 Oct 2025 09:49:00 +0000 (11:49 +0200)]

net: phy: realtek: Add RTL8224 cable testing support

The RTL8224 can detect open pairs and short types (in same pair or some
other pair). The distance to this problem can be estimated. This is done
for each of the 4 pairs separately.

It is not meant to be run while there is an active link partner because
this interferes with the active test pulses.

Output with open 50 m cable:

  Pair A code Open Circuit, source: TDR
  Pair A, fault length: 51.79m, source: TDR
  Pair B code Open Circuit, source: TDR
  Pair B, fault length: 51.28m, source: TDR
  Pair C code Open Circuit, source: TDR
  Pair C, fault length: 50.46m, source: TDR
  Pair D code Open Circuit, source: TDR
  Pair D, fault length: 51.12m, source: TDR

Terminated cable:

  Pair A code OK, source: TDR
  Pair B code OK, source: TDR
  Pair C code OK, source: TDR
  Pair D code OK, source: TDR

Shorted cable (both short types are at roughly the same distance)

  Pair A code Short to another pair, source: TDR
  Pair A, fault length: 2.35m, source: TDR
  Pair B code Short to another pair, source: TDR
  Pair B, fault length: 2.15m, source: TDR
  Pair C code OK, source: TDR
  Pair D code Short within Pair, source: TDR
  Pair D, fault length: 1.94m, source: TDR

Signed-off-by: Issam Hamdi <ih@simonwunderlich.de>
Co-developed-by: Sven Eckelmann <se@simonwunderlich.de>
Signed-off-by: Sven Eckelmann <se@simonwunderlich.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251024-rtl8224-cable-test-v1-1-e3cda89ac98f@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Wed, 29 Oct 2025 01:12:07 +0000 (18:12 -0700)]

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
ice: postpone service task disabling

Przemek Kitszel says:

Move service task shutdown to the very end of driver teardown procedure.
This is needed (or at least beneficial) for all unwinding functions that
talk to FW/HW via Admin Queue (so, most of top-level functions, like
ice_deinit_hw()).

Most of the patches move stuff around (I believe it makes it much easier
to review/proof when kept separate) in preparation to defer stopping the
service task to the very end of ice_remove() (and other unwinding flows).
Then last patch fixes duplicate call to ice_init_hw() (actual, but
unlikely to encounter, so -next, given the size of the changes).

First patch is not much related, only by that it was developed together

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: remove duplicate call to ice_deinit_hw() on error paths
  ice: move ice_deinit_dev() to the end of deinit paths
  ice: extract ice_init_dev() from ice_init()
  ice: move ice_init_pf() out of ice_init_dev()
  ice: move udp_tunnel_nic and misc IRQ setup into ice_init_pf()
  ice: ice_init_pf: destroy mutexes and xarrays on memory alloc failure
  ice: move ice_init_interrupt_scheme() prior ice_init_pf()
  ice: move service task start out of ice_init_pf()
  ice: enforce RTNL assumption of queue NAPI manipulation
====================

Link: https://patch.msgid.link/20251024204746.3092277-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Rakuram Eswaran [Sat, 25 Oct 2025 12:05:18 +0000 (17:35 +0530)]

net: tcp_lp: fix kernel-doc warnings and update outdated reference links

Fix kernel-doc warnings in tcp_lp.c by adding missing parameter
descriptions for tcp_lp_cong_avoid() and tcp_lp_pkts_acked() when
building with W=1.

Also replace invalid URLs in the file header comment with the currently
valid links to the TCP-LP paper and implementation page.

No functional changes.

Signed-off-by: Rakuram Eswaran <rakuram.e96@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251025-net_ipv4_tcp_lp_c-v1-1-058cc221499e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Christophe JAILLET [Sat, 25 Oct 2025 07:40:59 +0000 (09:40 +0200)]

sctp: Constify struct sctp_sched_ops

'struct sctp_sched_ops' is not modified in these drivers.

Constifying this structure moves some data to a read-only section, so
increases overall security, especially when the structure holds some
function pointers.

On a x86_64, with allmodconfig, as an example:
Before:
======
   text    data     bss     dec     hex filename
   8019     568       0    8587    218b net/sctp/stream_sched_fc.o

After:
=====
   text    data     bss     dec     hex filename
   8275     312       0    8587    218b net/sctp/stream_sched_fc.o

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://patch.msgid.link/dce03527eb7b7cc8a3c26d5cdac12bafe3350135.1761377890.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Bobby Eshleman [Fri, 24 Oct 2025 18:02:56 +0000 (11:02 -0700)]

net: netmem: remove NET_IOV_MAX from net_iov_type enum

Remove the NET_IOV_MAX workaround from the net_iov_type enum. This entry
was previously added to force the enum size to unsigned long to satisfy
the NET_IOV_ASSERT_OFFSET static assertions.

After commit f3d85c9ee510 ("netmem: introduce struct netmem_desc
mirroring struct page") this approach became unnecessary by placing the
net_iov_type after the netmem_desc. Placing the net_iov_type after
netmem_desc results in the net_iov_type size having no effect on the
position or layout of the fields that mirror the struct page.

The layout before this patch:

struct net_iov {
union {
struct netmem_desc desc;                 /*     0    48 */
struct {
long unsigned int _flags;        /*     0     8 */
long unsigned int pp_magic;      /*     8     8 */
struct page_pool * pp;           /*    16     8 */
long unsigned int _pp_mapping_pad; /*    24     8 */
long unsigned int dma_addr;      /*    32     8 */
atomic_long_t pp_ref_count;      /*    40     8 */
};                                       /*     0    48 */
};                                               /*     0    48 */
struct net_iov_area *      owner;                /*    48     8 */
enum net_iov_type          type;                 /*    56     8 */

/* size: 64, cachelines: 1, members: 3 */
};

The layout after this patch:

struct net_iov {
union {
struct netmem_desc desc;                 /*     0    48 */
struct {
long unsigned int _flags;        /*     0     8 */
long unsigned int pp_magic;      /*     8     8 */
struct page_pool * pp;           /*    16     8 */
long unsigned int _pp_mapping_pad; /*    24     8 */
long unsigned int dma_addr;      /*    32     8 */
atomic_long_t pp_ref_count;      /*    40     8 */
};                                       /*     0    48 */
};                                               /*     0    48 */
struct net_iov_area *      owner;                /*    48     8 */
enum net_iov_type          type;                 /*    56     4 */

/* size: 64, cachelines: 1, members: 3 */
/* padding: 4 */
};

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20251024-b4-devmem-remove-niov-max-v1-1-ba72c68bc869@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Fri, 24 Oct 2025 09:12:40 +0000 (09:12 +0000)]

net: rps: softnet_data reorg to make enqueue_to_backlog() fast

enqueue_to_backlog() is showing up in kernel profiles on hosts
with many cores, when RFS/RPS is used.

The following softnet_data fields need to be updated:

- input_queue_tail
- input_pkt_queue (next, prev, qlen, lock)
- backlog.state (if input_pkt_queue was empty)

Unfortunately they are currenly using two cache lines:

/* --- cacheline 3 boundary (192 bytes) --- */
call_single_data_t         csd __attribute__((__aligned__(64))); /*  0xc0  0x20 */
struct softnet_data *      rps_ipi_next;         /*  0xe0   0x8 */
unsigned int               cpu;                  /*  0xe8   0x4 */
unsigned int               input_queue_tail;     /*  0xec   0x4 */
struct sk_buff_head        input_pkt_queue;      /*  0xf0  0x18 */

/* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */

struct napi_struct         backlog __attribute__((__aligned__(8))); /* 0x108 0x1f0 */

Add one ____cacheline_aligned_in_smp to make sure they now are using
a single cache line.

Also, because napi_struct has written fields, make @state its first field.

We want to make sure that cpus adding packets to sd->input_pkt_queue
are not slowing down cpus processing their backlog because of
false sharing.

After this patch new layout is:

/* --- cacheline 5 boundary (320 bytes) --- */
long int                   pad[3] __attribute__((__aligned__(64))); /* 0x140  0x18 */
unsigned int               input_queue_tail;     /* 0x158   0x4 */

/* XXX 4 bytes hole, try to pack */

struct sk_buff_head        input_pkt_queue;      /* 0x160  0x18 */
struct napi_struct         backlog __attribute__((__aligned__(8))); /* 0x178 0x1f0 */

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251024091240.3292546-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Fri, 24 Oct 2025 09:05:17 +0000 (09:05 +0000)]

net: optimize enqueue_to_backlog() for the fast path

Add likely() and unlikely() clauses for the common cases:

Device is running.
Queue is not full.
Queue is less than half capacity.

Add max_backlog parameter to skb_flow_limit() to avoid
a second READ_ONCE(net_hotdata.max_backlog).

skb_flow_limit() does not need the backlog_lock protection,
and can be called before we acquire the lock, for even better
resistance to attacks.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251024090517.3289181-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Mon, 27 Oct 2025 19:29:58 +0000 (12:29 -0700)]

tools: ynl: rework the string representation of NlError

In early days of YNL development dumping the NlMsg on errors
was quite useful, as the library itself could have been buggy.
These days increasingly the NlMsg is just taking up screen space
and means nothing to a typical user. Try to format the errors
more in line with how YNL C formats its errors strings.

Before:
  $ ynl --family ethtool  --do channels-set  --json '{}'
  Netlink error: Invalid argument
  nl_len = 44 (28) nl_flags = 0x300 nl_type = 2
error: -22
extack: {'miss-type': 'header'}

  $ ynl --family ethtool  --do channels-set  --json '{..., "tx-count": 999}'
  Netlink error: Invalid argument
  nl_len = 88 (72) nl_flags = 0x300 nl_type = 2
error: -22
extack: {'msg': 'requested channel count exceeds maximum', 'bad-attr': '.tx-count'}

After:
  $ ynl --family ethtool  --do channels-set  --json '{}'
  Netlink error: Invalid argument {'miss-type': 'header'}

  $ ynl --family ethtool  --do channels-set  --json '{..., "tx-count": 999}'
  Netlink error: requested channel count exceeds maximum: Invalid argument {'bad-attr': '.tx-count'}

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20251027192958.2058340-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Mon, 27 Oct 2025 19:29:57 +0000 (12:29 -0700)]

tools: ynl: fix indent issues in the main Python lib

Class NlError() and operation_do_attributes() are indented by 2 spaces
rather than 4 spaces used by the rest of the file.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20251027192958.2058340-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Paolo Abeni [Tue, 28 Oct 2025 14:34:36 +0000 (15:34 +0100)]

Merge branch 'net-stmmac-add-support-for-coarse-timestamping'

Maxime Chevallier says:

====================
net: stmmac: Add support for coarse timestamping

This is V2 for coarse timetamping support in stmmac. This version uses a
dedicated devlink param "ts_coarse" to control this mode.

This doesn't conflict with Russell's cleanup of hwif.

Maxime

[1] : https://lore.kernel.org/netdev/20200514102808.31163-1-olivier.dautricourt@orolia.com/

V1: https://lore.kernel.org/netdev/20251015102725.1297985-1-maxime.chevallier@bootlin.com/
====================

Link: https://patch.msgid.link/20251024070720.71174-1-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Maxime Chevallier [Fri, 24 Oct 2025 07:07:18 +0000 (09:07 +0200)]

net: stmmac: Add a devlink attribute to control timestamping mode

The DWMAC1000 supports 2 timestamping configurations to configure how
frequency adjustments are made to the ptp_clock, as well as the reported
timestamp values.

There was a previous attempt at upstreaming support for configuring this
mode by Olivier Dautricourt and Julien Beraud a few years back [1]

In a nutshell, the timestamping can be either set in fine mode or in
coarse mode.

In fine mode, which is the default, we use the overflow of an accumulator to
trigger frequency adjustments, but by doing so we lose precision on the
timetamps that are produced by the timestamping unit. The main drawback
is that the sub-second increment value, used to generate timestamps, can't be
set to lower than (2 / ptp_clock_freq).

The "fine" qualification comes from the frequent frequency adjustments we are
able to do, which is perfect for a PTP follower usecase.

In Coarse mode, we don't do frequency adjustments based on an
accumulator overflow. We can therefore have very fine subsecond
increment values, allowing for better timestamping precision. However
this mode works best when the ptp clock frequency is adjusted based on
an external signal, such as a PPS input produced by a GPS clock. This
mode is therefore perfect for a Grand-master usecase.

Introduce a driver-specific devlink parameter "ts_coarse" to enable or
disable coarse mode, keeping the "fine" mode as a default.

This can then be changed with:

devlink dev param set <dev> name ts_coarse value true cmode runtime

The associated documentation is also added.

[1] : https://lore.kernel.org/netdev/20200514102808.31163-1-olivier.dautricourt@orolia.com/

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20251024070720.71174-3-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Maxime Chevallier [Fri, 24 Oct 2025 07:07:17 +0000 (09:07 +0200)]

net: stmmac: Move subsecond increment configuration in dedicated helper

In preparation for fine/coarse support, let's move the subsecond increment
and addend configuration in a dedicated helper.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20251024070720.71174-2-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Paolo Abeni [Tue, 28 Oct 2025 14:17:55 +0000 (15:17 +0100)]

Merge branch 'net-macb-eyeq5-support'

says:

====================
net: macb: EyeQ5 support

This series' goal is adding support to the MACB driver for EyeQ5 GEM.
The specifics for this compatible are:

- HW cannot add dummy bytes at the start of IP packets for alignment
   purposes. The behavior can be detected using DCFG6 so it isn't
   attached to compatible data.

- The hardware LSO/TSO is known to be buggy: add a compatible
   capability flag to force disable it.

- At init, we have to wiggle two syscon registers that configure the
   PHY integration.

   In past attempts [0] we did it in macb_config->init() using a syscon
   regmap. That was far from ideal so now a generic PHY driver
   abstracts that away. We reuse the bp->sgmii_phy field used by some
   compatibles.

   We have to add a phy_set_mode() call as the PHY power on sequence
   depends on whether we do RGMII or SGMII.

[0]: https://lore.kernel.org/lkml/20250627-macb-v2-15-ff8207d0bb77@bootlin.com/

Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
---
Changes in v3:
- Drop Fixes: trailer on [2/5]. We don't fix any platform using the
  driver currently.
- Improve [5/5] commit message; add info about how an unconditional
  phy_set_mode_ext() won't break existing platforms.
- Hardbreak 82 characters line in [2/5]; warning by patchwork.
- Trailers:
  - 1x Acked-by: Conor Dooley on [1/5].
  - 2x Reviewed-by: Andrew Lunn on [1/5] and [4/5].
  - 2x Reviewed-by: Maxime Chevallier on [4/5] and [5/5].
- Link to v2: https://lore.kernel.org/r/20251022-macb-eyeq5-v2-0-7c140abb0581@bootlin.com

Changes in v2:
- Drop non net-next patches.
- Re-run get_maintainers.pl to shorten the To/Cc list.
- Rebase upon latest net-next; no changes. Tested on HW.
- Link to v1: https://lore.kernel.org/r/20251021-macb-eyeq5-v1-0-3b0b5a9d2f85@bootlin.com

Past versions of the MACB EyeQ5 patches:
- March 2025: [PATCH net-next 00/13] Support the Cadence MACB/GEM
   instances on Mobileye EyeQ5 SoCs
   https://lore.kernel.org/lkml/20250321-macb-v1-0-537b7e37971d@bootlin.com/
- June 2025: [PATCH net-next v2 00/18] Support the Cadence MACB/GEM
   instances on Mobileye EyeQ5 SoCs
   https://lore.kernel.org/lkml/20250627-macb-v2-0-ff8207d0bb77@bootlin.com/
- August 2025: [PATCH net v3 00/16] net: macb: various fixes & cleanup
   https://lore.kernel.org/lkml/20250808-macb-fixes-v3-0-08f1fcb5179f@bootlin.com/

---
Théo Lebrun (5):
      dt-bindings: net: cdns,macb: add Mobileye EyeQ5 ethernet interface
      net: macb: match skb_reserve(skb, NET_IP_ALIGN) with HW alignment
      net: macb: add no LSO capability (MACB_CAPS_NO_LSO)
      net: macb: rename bp->sgmii_phy field to bp->phy
      net: macb: Add "mobileye,eyeq5-gem" compatible

.../devicetree/bindings/net/cdns,macb.yaml         | 10 +++
drivers/net/ethernet/cadence/macb.h                |  6 +-
drivers/net/ethernet/cadence/macb_main.c           | 94 +++++++++++++++++-----
3 files changed, 91 insertions(+), 19 deletions(-)
---
base-commit: 61b7ade9ba8c3b16867e25411b5f7cf1abe35879
change-id: 20251020-macb-eyeq5-fe2c0d1edc75

Best regards,
====================

Link: https://patch.msgid.link/20251023-macb-eyeq5-v3-0-af509422c204@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Théo Lebrun [Thu, 23 Oct 2025 16:22:55 +0000 (18:22 +0200)]

net: macb: Add "mobileye,eyeq5-gem" compatible

Add support for the two GEM instances inside Mobileye EyeQ5 SoCs, using
compatible "mobileye,eyeq5-gem". With it, add a custom init sequence
that must grab a generic PHY and initialise it.

We use bp->phy in both RGMII and SGMII cases. Tell our mode by adding a
phy_set_mode_ext() during macb_open(), before phy_power_on(). We are
the first users of bp->phy that use it in non-SGMII cases.

The phy_set_mode_ext() call is made unconditionally. It cannot cause
issues on platforms where !bp->phy or !bp->phy->ops->set_mode as, in
those cases, the call is a no-op (returning zero). From reading
upstream DTS, we can figure out that no platform has a bp->phy and a
PHY driver that has a .set_mode() implementation:
- cdns,zynqmp-gem: no DTS upstream.
- microchip,mpfs-macb: microchip/mpfs.dtsi, &mac0..1, no PHY attached.
- xlnx,versal-gem: xilinx/versal-net.dtsi, &gem0..1, no PHY attached.
- xlnx,zynqmp-gem: xilinx/zynqmp.dtsi, &gem0..3, PHY attached to
drivers/phy/xilinx/phy-zynqmp.c which has no .set_mode().

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251023-macb-eyeq5-v3-5-af509422c204@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Théo Lebrun [Thu, 23 Oct 2025 16:22:54 +0000 (18:22 +0200)]

net: macb: rename bp->sgmii_phy field to bp->phy

The bp->sgmii_phy field is initialised at probe by init_reset_optional()
if bp->phy_interface == PHY_INTERFACE_MODE_SGMII. It gets used by:
- zynqmp_config: "cdns,zynqmp-gem" or "xlnx,zynqmp-gem" compatibles.
- mpfs_config: "microchip,mpfs-macb" compatible.
- versal_config: "xlnx,versal-gem" compatible.

Make name more generic as EyeQ5 requires the PHY in SGMII & RGMII cases.

Drop "for ZynqMP SGMII mode" comment that is already a lie, as it gets
used on Microchip platforms as well. And soon it won't be SGMII-only.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Link: https://patch.msgid.link/20251023-macb-eyeq5-v3-4-af509422c204@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Théo Lebrun [Thu, 23 Oct 2025 16:22:53 +0000 (18:22 +0200)]

net: macb: add no LSO capability (MACB_CAPS_NO_LSO)

LSO is runtime-detected using the PBUF_LSO field inside register DCFG6.
Allow disabling that feature if it is broken by using bp->caps coming
from match data.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Link: https://patch.msgid.link/20251023-macb-eyeq5-v3-3-af509422c204@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Théo Lebrun [Thu, 23 Oct 2025 16:22:52 +0000 (18:22 +0200)]

net: macb: match skb_reserve(skb, NET_IP_ALIGN) with HW alignment

If HW is RSC capable, it cannot add dummy bytes at the start of IP
packets. Alignment (ie number of dummy bytes) is configured using the
RBOF field inside the NCFGR register.

On the software side, the skb_reserve(skb, NET_IP_ALIGN) call must only
be done if those dummy bytes are added by the hardware; notice the
skb_reserve() is done AFTER writing the address to the device.

We cannot do the skb_reserve() call BEFORE writing the address because
the address field ignores the low 2/3 bits. Conclusion: in some cases,
we risk not being able to respect the NET_IP_ALIGN value (which is
picked based on unaligned CPU access performance).

Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251023-macb-eyeq5-v3-2-af509422c204@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Théo Lebrun [Thu, 23 Oct 2025 16:22:51 +0000 (18:22 +0200)]

dt-bindings: net: cdns,macb: add Mobileye EyeQ5 ethernet interface

Add "cdns,eyeq5-gem" as compatible for the integrated GEM block inside
Mobileye EyeQ5 SoCs. It is different from other compatibles in two main
ways: (1) it requires a generic PHY and (2) it is better to keep TCP
Segmentation Offload (TSO) disabled.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Link: https://patch.msgid.link/20251023-macb-eyeq5-v3-1-af509422c204@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Alexandra Winter [Thu, 23 Oct 2025 15:06:36 +0000 (17:06 +0200)]

dibs: Use subsys_initcall()

In the case of built-in modules, the order of module_init() calls are
derived from the Makefiles.

Use subsys_initcall() for the dibs module, to make sure dibs_init() is
executed before dibs clients like smc and dibs devices like ism are
initialized. So future dibs client or dibs device modules can use
module_init() without the risk of getting the order in the Makefiles wrong.

Reported-by: Mete Durlu <meted@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20251023150636.3995476-2-wintera@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Alexandra Winter [Thu, 23 Oct 2025 15:06:35 +0000 (17:06 +0200)]

dibs: Remove reset of static vars in dibs_init()

'clients' and 'max_client' are static variables and therefore don't need to
be initialized.

Reported-by: Mete Durlu <meted@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Link: https://patch.msgid.link/20251023150636.3995476-1-wintera@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Paolo Abeni [Tue, 28 Oct 2025 10:11:38 +0000 (11:11 +0100)]

Merge branch 'net-mlx5-add-balance-id-support-for-lag-multiplane-groups'

Tariq Toukan says:

====================
net/mlx5: Add balance ID support for LAG multiplane groups

This series adds balance ID support for MLX5 LAG in multiplane
configurations.

See detailed description by Mark below [1].

[1]
The problem: In complex multiplane LAG setups, we need finer control over LAG
groups. Currently, devices with the same system image GUID are treated
identically, but hardware now supports per-multiplane-group balance IDs that
let us differentiate between them. On such systems image system guid
isn't enough to decide which devices should be part of which LAG.

The solution: Extend the system image GUID with a balance ID byte when the
hardware supports it. This gives us the granularity we need without breaking
existing deployments.

What this series does:

1. Clean up some duplicate code while we're here
2. Rework the system image GUID infrastructure to handle variable lengths
3. Update PTP clock pairing to use the new approach
4. Restructure capability setting to make room for the new feature
5. Actually implement the balance ID support

The key insight is in patch 5: we only append the balance ID when both
capabilities are present, so older hardware and software continue to work
exactly as before. For newer setups, you get the extra byte that enables
per-multiplane-group load balancing.

This has been tested with both old and new hardware configurations.
====================

Link: https://patch.msgid.link/1761211020-925651-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Mark Bloch [Thu, 23 Oct 2025 09:17:00 +0000 (12:17 +0300)]

net/mlx5: Add balance ID support for LAG multiplane groups

Implement balance ID support for multiplane LAG configurations. This
feature enables per-multiplane group load balancing by extending the
software system image GUID with a balance ID component.

Key implementations:
- Enable lag_per_mp_group capability when supported by hardware.
- Append load_balance_id to software system image GUID when conditions
are met.
- Increase MLX5_SW_IMAGE_GUID_MAX_BYTES from 8 to 9 to accommodate the
extra byte.

The balance ID is appended to the system image GUID only when both
load_balance_id and lag_per_mp_group capabilities are available, ensuring
backward compatibility while enabling enhanced LAG functionality.

This enhancement allows for more granular load balancing control in complex
multi-plane LAG deployments, improving network performance and flexibility.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1761211020-925651-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Mark Bloch [Thu, 23 Oct 2025 09:16:59 +0000 (12:16 +0300)]

net/mlx5: Refactor HCA cap 2 setting

Refactor HCA capability 2 setting logic to be more structured and
conditional. Move the sw_vhca_id_valid setting inside proper conditional
checks and prepare the function for additional capability settings.

The refactoring:
- Always copy current capabilities to set_hca_cap buffer.
- Apply sw_vhca_id_valid setting only when conditions are met.
- Improve code readability and maintainability.

This cleanup prepares the handle_hca_cap_2() function for the upcoming
balance ID capability setting.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1761211020-925651-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Mark Bloch [Thu, 23 Oct 2025 09:16:58 +0000 (12:16 +0300)]

net/mlx5: Refactor PTP clock devcom pairing

Refactor PTP clock device component pairing to use the clock identity
buffer instead of casting it to a u64 key. This change leverages the new
software system image GUID infrastructure.

Changes include:
- Pass identity buffer to mlx5_shared_clock_register().
- Use memcpy for identity buffer in devcom matching attributes.
- Remove intermediate u64 key conversion.
- Add BUILD_BUG_ON to ensure identity size fits in match key.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1761211020-925651-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Mark Bloch [Thu, 23 Oct 2025 09:16:57 +0000 (12:16 +0300)]

net/mlx5: Add software system image GUID infrastructure

Replace direct hardware system image GUID usage with a new software
system image GUID function that supports variable-length identifiers.

Key changes:
- Add mlx5_query_nic_sw_system_image_guid() function with length parameter.
- Update all callsites to use the new function and buffer/length approach.
- Modify mapping contexts to use byte arrays instead of u64 keys.
- Update devcom matching to support variable-length keys.
- Change mlx5_same_hw_devs() to use buffer comparison instead of u64.

This refactoring prepares the infrastructure for balance ID support,
which requires extending the system image GUID with additional data.
The change maintains backward compatibility while enabling future
enhancements.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1761211020-925651-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Mark Bloch [Thu, 23 Oct 2025 09:16:56 +0000 (12:16 +0300)]

net/mlx5: Use common mlx5_same_hw_devs function

Refactor duplicate hardware device comparison code to use the common
mlx5_same_hw_devs() function instead of reimplementing system GUID
comparison logic in multiple places.

This cleanup eliminates code duplication in:
- Bridge representor device comparison.
- TC hardware device comparison.

Using the centralized function improves maintainability and ensures
consistent behavior across the driver.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1761211020-925651-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Paolo Abeni [Tue, 28 Oct 2025 09:44:19 +0000 (10:44 +0100)]

Merge branch 'implement-more-features-for-txgbe-devices'

Jiawen Wu says:

====================
Implement more features for txgbe devices

Based on the features of hardware support, implement RX desc merge and
TX head write-back for AML devices, support RSC offload for AML and SP
devices.
====================

Link: https://patch.msgid.link/20251023014538.12644-1-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Jiawen Wu [Thu, 23 Oct 2025 01:45:38 +0000 (09:45 +0800)]

net: txgbe: support RSC offload

Support to enable and disable RSC for txgbe devices.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251023014538.12644-4-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Jiawen Wu [Thu, 23 Oct 2025 01:45:37 +0000 (09:45 +0800)]

net: txgbe: support TX head write-back mode

TX head write-back mode is supported on AML devices. When it is enabled,
the hardware no longer writes the descriptors DD one by one, but write
back pointer of completion descriptor to the head_wb address.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251023014538.12644-3-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Jiawen Wu [Thu, 23 Oct 2025 01:45:36 +0000 (09:45 +0800)]

net: txgbe: support RX desc merge mode

RX descriptor merge mode is supported on AML devices. When it is
enabled, the hardware process the RX descriptors in batches.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251023014538.12644-2-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Lad Prabhakar [Sat, 25 Oct 2025 06:48:50 +0000 (07:48 +0100)]

dt-bindings: net: phy: vsc8531: Convert to DT schema

Convert VSC8531 Gigabit ethernet phy binding to DT schema format. While
at it add compatible string for VSC8541 PHY which is very much similar
to the VSC8531 PHY and is already supported in the kernel. VSC8541 PHY
is present on the Renesas RZ/T2H EVK.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20251025064850.393797-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Fri, 24 Oct 2025 12:07:07 +0000 (12:07 +0000)]

tcp: remove one ktime_get() from recvmsg() fast path

Each time some payload is consumed by user space (recvmsg() and friends),
TCP calls tcp_rcv_space_adjust() to run DRS algorithm to check
if an increase of sk->sk_rcvbuf is needed.

This function is based on time sampling, and currently calls
tcp_mstamp_refresh(tp), which is a wrapper around ktime_get_ns().

ktime_get_ns() has a high cost on some platforms.
100+ cycles for rdtscp on AMD EPYC Turin for instance.

We do not have to refresh tp->tcp_mpstamp, using the last cached value
is enough. We only need to refresh it from __tcp_cleanup_rbuf()
if an ACK must be sent (this is a rare event).

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251024120707.3516550-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Yue Haibing [Fri, 24 Oct 2025 02:51:45 +0000 (10:51 +0800)]

net/sched: Remove unused typedef psched_tdiff_t

Since commit 051d44209842 ("net/sched: Retire CBQ qdisc")
this is not used anymore.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://patch.msgid.link/20251024025145.4069583-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 28 Oct 2025 01:05:01 +0000 (18:05 -0700)]

Merge branch 'sctp-avoid-redundant-initialisation-in-sctp_accept-and-sctp_do_peeloff'

Kuniyuki Iwashima says:

====================
sctp: Avoid redundant initialisation in sctp_accept() and sctp_do_peeloff().

When sctp_accept() and sctp_do_peeloff() allocates a new socket,
somehow sk_alloc() is used, and the new socket goes through full
initialisation, but most of the fields are overwritten later.

  1)
  sctp_accept()
  |- sctp_v[46]_create_accept_sk()
  |  |- sk_alloc()
  |  |- sock_init_data()
  |  |- sctp_copy_sock()
  |  `- newsk->sk_prot->init() / sctp_init_sock()
  |
  `- sctp_sock_migrate()
     `- sctp_copy_descendant(newsk, oldsk)

  sock_init_data() initialises struct sock, but many fields are
  overwritten by sctp_copy_sock(), which inherits fields of struct
  sock and inet_sock from the parent socket.

  sctp_init_sock() fully initialises struct sctp_sock, but later
  sctp_copy_descendant() inherits most fields from the parent's
  struct sctp_sock by memcpy().

  2)
  sctp_do_peeloff()
  |- sock_create()
  |  |
  |  ...
  |      |- sk_alloc()
  |      |- sock_init_data()
  |  ...
  |    `- newsk->sk_prot->init() / sctp_init_sock()
  |
  |- sctp_copy_sock()
  `- sctp_sock_migrate()
     `- sctp_copy_descendant(newsk, oldsk)

  sock_create() creates a brand new socket, but sctp_copy_sock()
  and sctp_sock_migrate() overwrite most of the fields.

So, sk_alloc(), sock_init_data(), sctp_copy_sock(), and
sctp_copy_descendant() can be replaced with a single function
like sk_clone_lock().

This series does the conversion and removes TODO comment added
by commit 4a997d49d92ad ("tcp: Save lock_sock() for memcg in
inet_csk_accept().").

Tested accept() and SCTP_SOCKOPT_PEELOFF and both work properly.

  socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP) = 3
  bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
  listen(3, -1)                           = 0
  getsockname(3, {sa_family=AF_INET, sin_port=htons(49460), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
  socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP) = 4
  connect(4, {sa_family=AF_INET, sin_port=htons(49460), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
  accept(3, NULL, NULL)                   = 5
  ...
  socket(AF_INET, SOCK_SEQPACKET, IPPROTO_SCTP) = 3
  bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
  listen(3, -1)                           = 0
  getsockname(3, {sa_family=AF_INET, sin_port=htons(48240), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
  socket(AF_INET, SOCK_SEQPACKET, IPPROTO_SCTP) = 4
  connect(4, {sa_family=AF_INET, sin_port=htons(48240), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
  getsockopt(3, SOL_SCTP, SCTP_SOCKOPT_PEELOFF, "*\0\0\0\5\0\0\0", [8]) = 5

v1: https://lore.kernel.org/20251021214422.1941691-1-kuniyu@google.com
====================

Link: https://patch.msgid.link/20251023231751.4168390-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Thu, 23 Oct 2025 23:16:57 +0000 (23:16 +0000)]

sctp: Remove sctp_copy_sock() and sctp_copy_descendant().

Now, sctp_accept() and sctp_do_peeloff() use sk_clone(), and
we no longer need sctp_copy_sock() and sctp_copy_descendant().

Let's remove them.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251023231751.4168390-9-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Thu, 23 Oct 2025 23:16:56 +0000 (23:16 +0000)]

sctp: Use sctp_clone_sock() in sctp_do_peeloff().

sctp_do_peeloff() calls sock_create() to allocate and initialise
struct sock, inet_sock, and sctp_sock, but later sctp_copy_sock()
and sctp_sock_migrate() overwrite most fields.

What sctp_do_peeloff() does is more like accept().

Let's use sock_create_lite() and sctp_clone_sock().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251023231751.4168390-8-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Thu, 23 Oct 2025 23:16:55 +0000 (23:16 +0000)]

sctp: Remove sctp_pf.create_accept_sk().

sctp_v[46]_create_accept_sk() are no longer used.

Let's remove sctp_pf.create_accept_sk().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251023231751.4168390-7-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Thu, 23 Oct 2025 23:16:54 +0000 (23:16 +0000)]

sctp: Use sk_clone() in sctp_accept().

sctp_accept() calls sctp_v[46]_create_accept_sk() to allocate a new
socket and calls sctp_sock_migrate() to copy fields from the parent
socket to the new socket.

sctp_v4_create_accept_sk() allocates sk by sk_alloc(), initialises
it by sock_init_data(), and copy a bunch of fields from the parent
socekt by sctp_copy_sock().

sctp_sock_migrate() calls sctp_copy_descendant() to copy most fields
in sctp_sock from the parent socket by memcpy().

These can be simply replaced by sk_clone().

Let's consolidate sctp_v[46]_create_accept_sk() to sctp_clone_sock()
with sk_clone().

We will reuse sctp_clone_sock() for sctp_do_peeloff() and then remove
sctp_copy_descendant().

Note that sock_reset_flag(newsk, SOCK_ZAPPED) is not copied to
sctp_clone_sock() as sctp does not use SOCK_ZAPPED at all.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251023231751.4168390-6-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Thu, 23 Oct 2025 23:16:53 +0000 (23:16 +0000)]

net: Add sk_clone().

sctp_accept() will use sk_clone_lock(), but it will be called
with the parent socket locked, and sctp_migrate() acquires the
child lock later.

Let's add no lock version of sk_clone_lock().

Note that lockdep complains if we simply use bh_lock_sock_nested().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251023231751.4168390-5-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Thu, 23 Oct 2025 23:16:52 +0000 (23:16 +0000)]

sctp: Don't call sk->sk_prot->init() in sctp_v[46]_create_accept_sk().

sctp_accept() calls sctp_v[46]_create_accept_sk() to allocate a new
socket and calls sctp_sock_migrate() to copy fields from the parent
socket to the new socket.

sctp_v[46]_create_accept_sk() calls sctp_init_sock() to initialise
sctp_sock, but most fields are overwritten by sctp_copy_descendant()
called from sctp_sock_migrate().

Things done in sctp_init_sock() but not in sctp_sock_migrate() are
the following:

  1. Copy sk->sk_gso
  2. Copy sk->sk_destruct (sctp_v6_init_sock())
  3. Allocate sctp_sock.ep
  4. Initialise sctp_sock.pd_lobby
  5. Count sk_sockets_allocated_inc(), sock_prot_inuse_add(),
     and SCTP_DBG_OBJCNT_INC()

Let's do these in sctp_copy_sock() and sctp_sock_migrate() and avoid
calling sk->sk_prot->init() in sctp_v[46]_create_accept_sk().

Note that sk->sk_destruct is already copied in sctp_copy_sock().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251023231751.4168390-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Thu, 23 Oct 2025 23:16:51 +0000 (23:16 +0000)]

sctp: Don't copy sk_sndbuf and sk_rcvbuf in sctp_sock_migrate().

sctp_sock_migrate() is called from 2 places.

1) sctp_accept() calls sp->pf->create_accept_sk() before
   sctp_sock_migrate(), and sp->pf->create_accept_sk() calls
   sctp_copy_sock().

2) sctp_do_peeloff() also calls sctp_copy_sock() before
   sctp_sock_migrate().

sctp_copy_sock() copies sk_sndbuf and sk_rcvbuf from the
parent socket.

Let's not copy the two fields in sctp_sock_migrate().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251023231751.4168390-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Thu, 23 Oct 2025 23:16:50 +0000 (23:16 +0000)]

sctp: Defer SCTP_DBG_OBJCNT_DEC() to sctp_destroy_sock().

SCTP_DBG_OBJCNT_INC() is called only when sctp_init_sock()
returns 0 after successfully allocating sctp_sk(sk)->ep.

OTOH, SCTP_DBG_OBJCNT_DEC() is called in sctp_close().

The code seems to expect that the socket is always exposed
to userspace once SCTP_DBG_OBJCNT_INC() is incremented, but
there is a path where the assumption is not true.

In sctp_accept(), sctp_sock_migrate() could fail after
sctp_init_sock().

Then, sk_common_release() does not call inet_release() nor
sctp_close(). Instead, it calls sk->sk_prot->destroy().

Let's move SCTP_DBG_OBJCNT_DEC() from sctp_close() to
sctp_destroy_sock().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251023231751.4168390-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 28 Oct 2025 01:04:38 +0000 (18:04 -0700)]

Merge branch 'convert-net-drivers-to-ndo_hwtstamp-api-part-2'

Vadim Fedorenko says:

====================
convert net drivers to ndo_hwtstamp API part 2

This is part 2 of patchset to convert drivers which support HW
timestamping to use .ndo_hwtstamp_get()/.ndo_hwtstamp_set() callbacks.
The new API uses netlink to communicate with user-space and have some
test coverage.
====================

Link: https://patch.msgid.link/20251023220457.3201122-1-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Vadim Fedorenko [Thu, 23 Oct 2025 22:04:57 +0000 (22:04 +0000)]

net: hns3: add hwtstamp_get/hwtstamp_set ops

And .ndo_hwtstamp_get()/.ndo_hwtstamp_set() callbacks to HNS3 framework
to support HW timestamp configuration via netlink and adopt hns3pf to
use .ndo_hwtstamp_get()/.ndo_hwtstamp_set() callbacks.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251023220457.3201122-7-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Vadim Fedorenko [Thu, 23 Oct 2025 22:04:56 +0000 (22:04 +0000)]

net: renesas: rswitch: convert to ndo_hwtstamp API

Convert driver to use .ndo_hwtstamp_set()/.ndo_hwtstamp_get() callbacks.
rswitch_eth_ioctl() becomes phy_do_ioctl_running(), remove it and
replace .ndo_eth_ioctl callback with phy_do_ioctl_running().

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251023220457.3201122-6-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Vadim Fedorenko [Thu, 23 Oct 2025 22:04:55 +0000 (22:04 +0000)]

net: ravb: convert to ndo_hwtstamp API

Convert driver to use .ndo_hwtstamp_set()/.ndo_hwtstamp_get callbacks.
ravb_do_ioctl() becomes pure phy_do_ioctl_running(), remove it and
replace in callbacks.

Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251023220457.3201122-5-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

A mirror of Linus' kernel repository