]> git.ipfire.org Git - thirdparty/kernel/linux.git/log
thirdparty/kernel/linux.git
6 weeks agoMIPS: BCM47XX: remove creating a fixed phy
Heiner Kallweit [Thu, 30 Oct 2025 21:45:30 +0000 (22:45 +0100)] 
MIPS: BCM47XX: remove creating a fixed phy

Now that b44 ethernet driver creates a fixed phy if needed, we can
remove this here.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/8983b705-6bca-4728-9283-7aa60f49340f@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: b44: register a fixed phy using fixed_phy_register_100fd if needed
Heiner Kallweit [Thu, 30 Oct 2025 21:44:35 +0000 (22:44 +0100)] 
net: b44: register a fixed phy using fixed_phy_register_100fd if needed

In case of bcm47xx a fixed phy is used, which so far is created
by platform code, using fixed_phy_add(). This function has a number of
problems, therefore create a potentially needed fixed phy here, using
fixed_phy_register_100fd.

Due to lack of hardware, this is compile-tested only.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/53e4e74d-a49e-4f37-b970-5543a35041db@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agom68k: coldfire: remove creating a fixed phy
Heiner Kallweit [Thu, 30 Oct 2025 21:43:36 +0000 (22:43 +0100)] 
m68k: coldfire: remove creating a fixed phy

Now that the fec ethernet driver creates a fixed phy if needed,
we can remove this here.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/212e0cb5-a2f5-460f-8e03-3c3369d0acf1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: fec: register a fixed phy using fixed_phy_register_100fd if needed
Heiner Kallweit [Thu, 30 Oct 2025 21:42:30 +0000 (22:42 +0100)] 
net: fec: register a fixed phy using fixed_phy_register_100fd if needed

In case of coldfire/5272 a fixed phy is used, which so far is created
by platform code, using fixed_phy_add(). This function has a number of
problems, therefore create a potentially needed fixed phy here, using
fixed_phy_register_100fd.

Note 1: This includes a small functional change, as coldfire/5272
created a fixed phy in half-duplex mode. Likely this was by mistake,
because the fec MAC is 100FD-capable, and connection is to a switch.

Note 2: Usage of phy_find_next() makes use of the fact that dev_id can
only be 0 or 1.

Due to lack of hardware, this is compile-tested only.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/adf4dc5c-5fa3-4ae6-a75c-a73954dede73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: phy: fixed_phy: add helper fixed_phy_register_100fd
Heiner Kallweit [Thu, 30 Oct 2025 21:41:19 +0000 (22:41 +0100)] 
net: phy: fixed_phy: add helper fixed_phy_register_100fd

In few places a 100FD fixed PHY is used. Create a helper so that users
don't have to define the struct fixed_phy_status.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/bf564b19-e9bc-4896-aeae-9f721cc4fecd@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'net-altera-tse-cleanup-init-sequence'
Jakub Kicinski [Wed, 5 Nov 2025 02:15:17 +0000 (18:15 -0800)] 
Merge branch 'net-altera-tse-cleanup-init-sequence'

Maxime Chevallier says:

====================
net: altera-tse: Cleanup init sequence

Altera TSE cleanup to make sure everything is properly intialized
before registering the netdev.

When Altera TSE was converted to phylink, the PCS and phylink creation
were added after register_netdev(), which is wrong as this may race
with .ndo_open() once the netdev is registered.

This series makes so that we register the netdev once all resources are
cleanly initialised, that includes PCS and phylink creation as well as a
few other operations such as reading the IP version.

No errors were found in the wild, so this series doesn't target net, but
given that we fix some racy-ness, a point could be made to send that to
net.

This series doesn't introduce functional changes, however the internal
mii_bus for PCS configuration is renamed.

v1: https://lore.kernel.org/20251030102418.114518-1-maxime.chevallier@bootlin.com
====================

Link: https://patch.msgid.link/20251103104928.58461-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: altera-tse: Init PCS and phylink before registering netdev
Maxime Chevallier [Mon, 3 Nov 2025 10:49:27 +0000 (11:49 +0100)] 
net: altera-tse: Init PCS and phylink before registering netdev

register_netdev() must be done only once all resources are ready, as
they may be used in .ndo_open() immediately upon registration.

Move the lynx PCS and phylink initialisation before registerng the
netdevice. We also remove the call to netif_carrier_off(), as phylink
takes care of that.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20251103104928.58461-5-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: altera-tse: Don't use netdev name for the PCS mdio bus
Maxime Chevallier [Mon, 3 Nov 2025 10:49:26 +0000 (11:49 +0100)] 
net: altera-tse: Don't use netdev name for the PCS mdio bus

The PCS mdio bus must be created before registering the net_device. To
do that, we musn't depend on the netdev name to create the mdio bus
name. Let's use the device's name instead.

Note that this changes the bus name in /sys/bus/mdiobus

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20251103104928.58461-4-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: altera-tse: Warn on bad revision at probe time
Maxime Chevallier [Mon, 3 Nov 2025 10:49:25 +0000 (11:49 +0100)] 
net: altera-tse: Warn on bad revision at probe time

Instead of reading the core revision at probe time, and print a warning
for an unexecpected version at .ndo_open() time, let's print that
warning directly in .probe().

This allows getting rid of the "revision" private field, and also
prevent a potential race between reading the revision in .probe() after
netdev registration, and accessing that revision in .ndo_open().

By printing the warning after register_netdev(), we are sure that we
have a netdev name, and that we try to print the revision after having
read it from the internal registers.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251103104928.58461-3-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: altera-tse: Set platform drvdata before registering netdev
Maxime Chevallier [Mon, 3 Nov 2025 10:49:24 +0000 (11:49 +0100)] 
net: altera-tse: Set platform drvdata before registering netdev

We don't have to wait until netdev is registered before setting it as the
pdev's drvdata. Move it at netdev alloc time.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20251103104928.58461-2-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: phy: make phy_device members pause and asym_pause bitfield bits
Heiner Kallweit [Mon, 3 Nov 2025 22:26:49 +0000 (23:26 +0100)] 
net: phy: make phy_device members pause and asym_pause bitfield bits

We can reduce the size of struct phy_device a little by switching
the type of members pause and asym_pause from int to a single bit.
As C99 is supported now, we can use type bool for the bitfield members,
what provides us with the benefit of the usual implicit bool conversions.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/764e9a31-b40b-4dc9-b808-118192a16d87@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'add-driver-for-1gbe-network-chips-from-mucse'
Jakub Kicinski [Wed, 5 Nov 2025 02:11:38 +0000 (18:11 -0800)] 
Merge branch 'add-driver-for-1gbe-network-chips-from-mucse'

Dong Yibo says:

====================
Add driver for 1Gbe network chips from MUCSE

This patch series adds support for MUCSE RNPGBE 1Gbps PCIe Ethernet controllers
(N500/N210 series), including build infrastructure, hardware initialization,
mailbox (MBX) communication with firmware, and basic netdev registration
(Can show mac witch is got from firmware, and tx/rx will be added later).

Series breakdown (5 patches):
 01/05: net: ethernet/mucse: Add build support for rnpgbe
       - Kconfig/Makefile for MUCSE vendor, basic PCI probe (no netdev)
 02/05: net: ethernet/mucse: Add N500/N210 chip support
       - netdev allocation, BAR mapping
 03/05: net: ethernet/mucse: Add basic MBX ops for PF-FW communication
       - base read/write, write with poll ack, poll and read data
 04/05: net: ethernet/mucse: Add FW commands (sync, reset, MAC query)
       - FW sync retry logic, MAC address retrieval, reset hw with
         base mbx ops in patch4
 05/05: net: ethernet/mucse: Complete netdev registration
       - HW reset, MAC setup, netdev_ops registration
====================

Link: https://patch.msgid.link/20251101013849.120565-1-dong100@mucse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: rnpgbe: Add register_netdev
Dong Yibo [Sat, 1 Nov 2025 01:38:49 +0000 (09:38 +0800)] 
net: rnpgbe: Add register_netdev

Complete the network device (netdev) registration flow for Mucse Gbe
Ethernet chips, including:
1. Hardware state initialization:
   - Send powerup notification to firmware (via echo_fw_status)
   - Sync with firmware
   - Reset hardware
2. MAC address handling:
   - Retrieve permanent MAC from firmware (via mucse_mbx_get_macaddr)
   - Fallback to random valid MAC (eth_random_addr) if not valid mac
     from Fw

Signed-off-by: Dong Yibo <dong100@mucse.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251101013849.120565-6-dong100@mucse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: rnpgbe: Add basic mbx_fw support
Dong Yibo [Sat, 1 Nov 2025 01:38:48 +0000 (09:38 +0800)] 
net: rnpgbe: Add basic mbx_fw support

Add fundamental firmware (FW) communication operations via PF-FW
mailbox, including:
- FW sync (via HW info query with retries)
- HW reset (post FW command to reset hardware)
- MAC address retrieval (request FW for port-specific MAC)
- Power management (powerup/powerdown notification to FW)

Signed-off-by: Dong Yibo <dong100@mucse.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251101013849.120565-5-dong100@mucse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: rnpgbe: Add basic mbx ops support
Dong Yibo [Sat, 1 Nov 2025 01:38:47 +0000 (09:38 +0800)] 
net: rnpgbe: Add basic mbx ops support

Add fundamental mailbox (MBX) communication operations between PF
(Physical Function) and firmware for n500/n210 chips

Signed-off-by: Dong Yibo <dong100@mucse.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251101013849.120565-4-dong100@mucse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: rnpgbe: Add n500/n210 chip support with BAR2 mapping
Dong Yibo [Sat, 1 Nov 2025 01:38:46 +0000 (09:38 +0800)] 
net: rnpgbe: Add n500/n210 chip support with BAR2 mapping

Add hardware initialization foundation for MUCSE 1Gbe controller,
including:
1. Map PCI BAR2 as hardware register base;
2. Bind PCI device to driver private data (struct mucse) and
   initialize hardware context (struct mucse_hw);
3. Reserve board-specific init framework via rnpgbe_init_hw.

Signed-off-by: Dong Yibo <dong100@mucse.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://patch.msgid.link/20251101013849.120565-3-dong100@mucse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: rnpgbe: Add build support for rnpgbe
Dong Yibo [Sat, 1 Nov 2025 01:38:45 +0000 (09:38 +0800)] 
net: rnpgbe: Add build support for rnpgbe

Add build options and doc for mucse.
Initialize pci device access for MUCSE devices.

Signed-off-by: Dong Yibo <dong100@mucse.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://patch.msgid.link/20251101013849.120565-2-dong100@mucse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoti: netcp: convert to ndo_hwtstamp callbacks
Vadim Fedorenko [Mon, 3 Nov 2025 17:29:02 +0000 (17:29 +0000)] 
ti: netcp: convert to ndo_hwtstamp callbacks

Convert TI NetCP driver to use ndo_hwtstamp_get()/ndo_hwtstamp_set()
callbacks. The logic is slightly changed, because I believe the original
logic was not really correct. Config reading part is using the very
first module to get the configuration instead of iterating over all of
them and keep the last one as the configuration is supposed to be identical
for all modules. HW timestamp config set path is now trying to configure
all modules, but in case of error from one module it adds extack
message. This way the configuration will be as synchronized as possible.

There are only 2 modules using netcp core infrastructure, and both use
the very same function to configure HW timestamping, so no actual
difference in behavior is expected.

Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20251103172902.3538392-1-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'convert-drivers-to-use-ndo_hwtstamp-callbacks-part-3'
Jakub Kicinski [Wed, 5 Nov 2025 01:32:04 +0000 (17:32 -0800)] 
Merge branch 'convert-drivers-to-use-ndo_hwtstamp-callbacks-part-3'

Vadim Fedorenko says:

====================
convert drivers to use ndo_hwtstamp callbacks part 3 [part]

This patchset converts the rest of ethernet drivers to use ndo callbacks
instead ioctl to configure and report time stamping. The drivers in part
3 originally implemented only SIOCSHWTSTAMP command, but converted to
also provide configuration back to users.
====================

Link: https://patch.msgid.link/20251103150952.3538205-1-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: pch_gbe: convert to use ndo_hwtstamp callbacks
Vadim Fedorenko [Mon, 3 Nov 2025 15:09:51 +0000 (15:09 +0000)] 
net: pch_gbe: convert to use ndo_hwtstamp callbacks

The driver implemented SIOCSHWTSTAMP ioctl command only, but it stores
configuration in the private data, so it is possible to report it back
to users. Implement both ndo_hwtstamp_set and ndo_hwtstamp_get
callbacks. To properly report RX filter type, store it in hwts_rx_en
instead of using this field as a simple flag. The logic didn't change
because receive path used this field as boolean flag.

Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20251103150952.3538205-7-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: thunderx: convert to use ndo_hwtstamp callbacks
Vadim Fedorenko [Mon, 3 Nov 2025 15:09:50 +0000 (15:09 +0000)] 
net: thunderx: convert to use ndo_hwtstamp callbacks

The driver implemented SIOCSHWTSTAMP ioctl command only, but it also
stores configuration in private data, so it's possible to report it back
to users. Implement both ndo_hwtstamp_set and ndo_hwtstamp_get
callbacks.

Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251103150952.3538205-6-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: octeon: mgmt: convert to use ndo_hwtstamp callbacks
Vadim Fedorenko [Mon, 3 Nov 2025 15:09:49 +0000 (15:09 +0000)] 
net: octeon: mgmt: convert to use ndo_hwtstamp callbacks

The driver implemented SIOCSHWTSTAMP ioctl command only. But it stores
timestamping configuration, so it is possible to report it to users.
Implement both ndo_hwtstamp_set and ndo_hwtstamp_get callbacks. After
this the ndo_eth_ioctl effectively becomes phy_do_ioctl - adjust
callback accordingly.

Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251103150952.3538205-5-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: liquidio_vf: convert to use ndo_hwtstamp callbacks
Vadim Fedorenko [Mon, 3 Nov 2025 15:09:48 +0000 (15:09 +0000)] 
net: liquidio_vf: convert to use ndo_hwtstamp callbacks

The driver implemented SIOCSHWTSTAMP ioctl command only, but there is a
way to get configuration back. Implement both ndo_hwtstamp_set and
ndo_hwtstamp_set callbacks.

Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251103150952.3538205-4-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: liquidio: convert to use ndo_hwtstamp callbacks
Vadim Fedorenko [Mon, 3 Nov 2025 15:09:47 +0000 (15:09 +0000)] 
net: liquidio: convert to use ndo_hwtstamp callbacks

The driver implemented SIOCSHWTSTAMP ioctl command only, but there is a
way to get configured status. Implement both ndo_hwtstamp_set and
ndo_hwtstamp_get callbacks.

Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251103150952.3538205-3-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agodt-bindings: net: ethernet-phy: clarify when compatible must specify PHY ID
Buday Csaba [Mon, 3 Nov 2025 08:13:42 +0000 (09:13 +0100)] 
dt-bindings: net: ethernet-phy: clarify when compatible must specify PHY ID

Change PHY ID description in ethernet-phy.yaml to clarify that a
PHY ID is required (may -> must) when the PHY requires special
initialization sequence.

Link: https://lore.kernel.org/netdev/20251026212026.GA2959311-robh@kernel.org/
Link: https://lore.kernel.org/netdev/aQIZvDt5gooZSTcp@debianbuilder/
Signed-off-by: Buday Csaba <buday.csaba@prolan.hu>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/64c52d1a726944a68a308355433e8ef0f82c4240.1762157515.git.buday.csaba@prolan.hu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: devmem: Remove unused declaration net_devmem_bind_tx_release()
Yue Haibing [Mon, 3 Nov 2025 07:20:46 +0000 (15:20 +0800)] 
net: devmem: Remove unused declaration net_devmem_bind_tx_release()

Commit bd61848900bf ("net: devmem: Implement TX path") declared this
but never implemented it.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20251103072046.1670574-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'mptcp-pm-in-kernel-fullmesh-endp-nb-bind-cases'
Jakub Kicinski [Wed, 5 Nov 2025 01:16:06 +0000 (17:16 -0800)] 
Merge branch 'mptcp-pm-in-kernel-fullmesh-endp-nb-bind-cases'

Matthieu Baerts says:

====================
mptcp: pm: in-kernel: fullmesh endp nb + bind cases

Here is a small optimisation for the in-kernel PM, joined by a small
behavioural change to avoid confusions, and followed by a few more
tests.

- Patch 1: record fullmesh endpoints numbers, not to iterate over all
  endpoints to check if one is marked as fullmesh.

- Patch 2: when at least one endpoint is marked as fullmesh, only use
  these endpoints when reacting to an ADD_ADDR, even if there are no
  endpoints for this IP family: this is less confusing.

- Patch 3: reduce duplicated code to prepare the next patch.

- Patch 4: extra "bind" cases: the listen socket restrict the bind to
  one IP address, not allowing MP_JOIN to extra IP addresses, except if
  another listening socket accepts them.
====================

Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-0-b4166772d6bb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoselftests: mptcp: join: validate extra bind cases
Matthieu Baerts (NGI0) [Sat, 1 Nov 2025 17:56:54 +0000 (18:56 +0100)] 
selftests: mptcp: join: validate extra bind cases

By design, an MPTCP connection will not accept extra subflows where no
MPTCP listening sockets can accept such requests.

In other words, it means that if the 'server' listens on a specific
address / device, it cannot accept MP_JOIN sent to a different address /
device. Except if there is another MPTCP listening socket accepting
them.

This is what the new tests are validating:

 - Forcing a bind on the main v4/v6 address, and checking that MP_JOIN
   to announced addresses are not accepted.

 - Also forcing a bind on the main v4/v6 address, but before, another
   listening socket is created to accept additional subflows. Note that
   'mptcpize run nc -l' -- or something else only doing: socket(MPTCP),
   bind(<IP>), listen(0) -- would be enough, but here mptcp_connect is
   reused not to depend on another tool just for that.

 - Same as the previous one, but using v6 link-local addresses: this is
   a bit particular because it is required to specify the outgoing
   network interface when connecting to a link-local address announced
   by the other peer. When using the routing rules, this doesn't work
   (the outgoing interface is not known) ; but it does work with a
   'laminar' endpoint having a specified interface.

Note that extra small modifications are needed for these tests to work:

 - mptcp_connect's check_getpeername_connect() check should strip the
   specified interface when comparing addresses.

 - With IPv6 link-local addresses, it is required to wait for them to
   be ready (no longer in 'tentative' mode) before using them, otherwise
   the bind() will not be allowed.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/591
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-4-b4166772d6bb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoselftests: mptcp: join: do_transfer: reduce code dup
Matthieu Baerts (NGI0) [Sat, 1 Nov 2025 17:56:53 +0000 (18:56 +0100)] 
selftests: mptcp: join: do_transfer: reduce code dup

The same extra long commands are present twice, with small differences:
the variable for the stdin file is different.

Use new dedicated variables in one command to avoid this code
duplication.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-3-b4166772d6bb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agomptcp: pm: in kernel: only use fullmesh endp if any
Matthieu Baerts (NGI0) [Sat, 1 Nov 2025 17:56:52 +0000 (18:56 +0100)] 
mptcp: pm: in kernel: only use fullmesh endp if any

Our documentation is saying that the in-kernel PM is only using fullmesh
endpoints to establish subflows to announced addresses when at least one
endpoint has a fullmesh flag. But this was not totally correct: only
fullmesh endpoints were used if at least one endpoint *from the same
address family as the received ADD_ADDR* has the fullmesh flag.

This is confusing, and it seems clearer not to have differences
depending on the address family.

So, now, when at least one MPTCP endpoint has a fullmesh flag, the local
addresses are picked from all fullmesh endpoints, which might be 0 if
there are no endpoints for the correct address family.

One selftest needs to be adapted for this behaviour change.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-2-b4166772d6bb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agomptcp: pm: in-kernel: record fullmesh endp nb
Matthieu Baerts (NGI0) [Sat, 1 Nov 2025 17:56:51 +0000 (18:56 +0100)] 
mptcp: pm: in-kernel: record fullmesh endp nb

Instead of iterating over all endpoints, under RCU read lock, just to
check if one of them as the fullmesh flag, we can keep a counter of
fullmesh endpoint, similar to what is done with the other flags.

This counter is now checked, before iterating over all endpoints.

Similar to the other counters, this new one is also exposed. A userspace
app can then know when it is being used in a fullmesh mode, with
potentially (too) many subflows.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-1-b4166772d6bb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'net-mlx5e-reduce-interface-downtime-on-configuration-change'
Jakub Kicinski [Wed, 5 Nov 2025 01:04:53 +0000 (17:04 -0800)] 
Merge branch 'net-mlx5e-reduce-interface-downtime-on-configuration-change'

Tariq Toukan says:

====================
net/mlx5e: Reduce interface downtime on configuration change

This series significantly reduces the interface downtime while swapping
channels during a configuration change, on capable devices.

Here we remove an old requirement on operations ordering that became
obsolete on recent capable devices. This helps cutting the downtime by a
factor of magnitude, ~80% in our example.

Perf numbers:
Measured the number of dropped packets in a simple ping flood test,
during a configuration change operation, that switches the number of
channels from 247 to 248.

Before: 71 packets lost
After:  15 packets lost, ~80% saving.
====================

Link: https://patch.msgid.link/1761831159-1013140-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet/mlx5e: Defer channels closure to reduce interface down time
Tariq Toukan [Thu, 30 Oct 2025 13:32:39 +0000 (15:32 +0200)] 
net/mlx5e: Defer channels closure to reduce interface down time

Cap bit tis_tir_td_order=1 indicates that an old firmware requirement /
limitation no longer exists. When unset, the latency of several firmware
commands significantly increases with the presence of high number of
co-existing channels (both old and new sets). Hence, we used to close
unneeded old channels before invoking those firmware commands.

Today, on capable devices, this is no longer the case. Minimize the
interface down time by deferring the old channels closure, after the
activation of the new ones.

Perf numbers:
Measured the number of dropped packets in a simple ping flood test,
during a configuration change operation, that switches the number of
channels from 247 to 248.

Before: 71 packets lost
After:  15 packets lost, ~80% saving.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1761831159-1013140-8-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet/mlx5e: Pass old channels as argument to mlx5e_switch_priv_channels
Tariq Toukan [Thu, 30 Oct 2025 13:32:38 +0000 (15:32 +0200)] 
net/mlx5e: Pass old channels as argument to mlx5e_switch_priv_channels

Let the caller function mlx5e_safe_switch_params() maintain a copy
of the old channels, and pass it to mlx5e_switch_priv_channels().

This is in preparation for the next patch.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1761831159-1013140-7-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet/mlx5e: Do not re-apply TIR loopback configuration if not necessary
Tariq Toukan [Thu, 30 Oct 2025 13:32:37 +0000 (15:32 +0200)] 
net/mlx5e: Do not re-apply TIR loopback configuration if not necessary

On old firmware, (tis_tir_td_order=0), TIR of a transport domain should
either be created after all SQs of the same domain, or TIR.self_lb_en
should be reapplied using MODIFY_TIR, for self loopback filtering to
function correctly.

This is not necessary anymnore on new FW (tis_tir_td_order=1), thus
there's no need for calling modify_tir operations after creating a new
set of SQs to maintain the self loopback prevention functional.

Skip these operations.

This saves O(max_num_channels) MODIFY_TIR firmware commands in
operations like interface up or channels configuration change.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1761831159-1013140-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet/mlx5: IPoIB, set self loopback prevention in TIR init
Tariq Toukan [Thu, 30 Oct 2025 13:32:36 +0000 (15:32 +0200)] 
net/mlx5: IPoIB, set self loopback prevention in TIR init

In IPoIB, the self loopback prevention configuration apply in activation
stage has two roles: fulfill a firmware requirement for old firmware
(tis_tir_td_order=0), and update the proper configuration as it was not
set in init.

Here we set the proper configuration in init, to allow skipping the
modify_tirs commands on new firmware in a downstream patch.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1761831159-1013140-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet/mlx5e: Allow setting self loopback prevention bits on TIR init
Tariq Toukan [Thu, 30 Oct 2025 13:32:35 +0000 (15:32 +0200)] 
net/mlx5e: Allow setting self loopback prevention bits on TIR init

Until now, IPoIB was creating TIRs without setting self loopback
prevention, then modifying them in activation stage.

This is a preparation patch, that will be used by IPoIB to init TIRs
properly without the need for following calls of modify_tir.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1761831159-1013140-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet/mlx5e: Use TIR API in mlx5e_modify_tirs_lb()
Tariq Toukan [Thu, 30 Oct 2025 13:32:34 +0000 (15:32 +0200)] 
net/mlx5e: Use TIR API in mlx5e_modify_tirs_lb()

Extend the TIR API and use it in mlx5e_modify_tirs_lb() instead of the
explicit modify_tir code.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1761831159-1013140-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet/mlx5e: Enhance function structures for self loopback prevention application
Tariq Toukan [Thu, 30 Oct 2025 13:32:33 +0000 (15:32 +0200)] 
net/mlx5e: Enhance function structures for self loopback prevention application

The re-application of self loopback prevention attributes in TIRs is
necessary in old firmwares (where tis_tir_td_order cap is cleared) after
recreation of SQs.

However, this is not needed in new firmware with tis_tir_td_order=1.

As a preparation patch, enhance the function structures to differentiate
between an explicit loopback prevention configuration apply, and the
re-apply operation required by old firmware.

Loopback selftests should now call mlx5e_modify_tirs_lb() directly, as
their use case is not related to the firmware limitation.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1761831159-1013140-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoxen/netfront: Comment Correction: Fix Spelling Error and Description of Queue Quantit...
Chu Guangqing [Mon, 3 Nov 2025 03:22:12 +0000 (11:22 +0800)] 
xen/netfront: Comment Correction: Fix Spelling Error and Description of Queue Quantity Rules

The original comments contained spelling errors and incomplete logical
descriptions, which could easily lead to misunderstandings of the code
logic. The specific modifications are as follows:

Correct the spelling error by changing "inut max" to "but not exceed the
maximum limit";

Add the note "If the user has not specified a value, the default maximum
limit is 8" to clarify the default value logic;

Improve the coherence of the statement to make the queue quantity rules
clearer.

After the modification, the comments can accurately reflect the code
behavior of "taking the smaller value between the number of CPUs and the
default maximum limit of 8 for the number of queues", enhancing code
maintainability.

Signed-off-by: Chu Guangqing <chuguangqing@inspur.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://patch.msgid.link/20251103032212.2462-1-chuguangqing@inspur.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: sungem_phy: Fix a typo error in sungem_phy
Chu Guangqing [Mon, 3 Nov 2025 05:44:43 +0000 (13:44 +0800)] 
net: sungem_phy: Fix a typo error in sungem_phy

Fix a spelling mistakes for regularly

Signed-off-by: Chu Guangqing <chuguangqing@inspur.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251103054443.2878-1-chuguangqing@inspur.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoveth: Fix a typo error in veth
Chu Guangqing [Mon, 3 Nov 2025 05:53:51 +0000 (13:53 +0800)] 
veth: Fix a typo error in veth

Fix a spellling error for resources

Signed-off-by: Chu Guangqing <chuguangqing@inspur.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251103055351.3150-1-chuguangqing@inspur.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agogtp: Fix a typo error for size
Chu Guangqing [Mon, 3 Nov 2025 06:05:04 +0000 (14:05 +0800)] 
gtp: Fix a typo error for size

Fix the spelling error of "size".

Signed-off-by: Chu Guangqing <chuguangqing@inspur.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251103060504.3524-1-chuguangqing@inspur.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agovirtio_net: Fix a typo error in virtio_net
Chu Guangqing [Mon, 3 Nov 2025 07:43:05 +0000 (15:43 +0800)] 
virtio_net: Fix a typo error in virtio_net

Fix the spelling error of "separate".

Signed-off-by: Chu Guangqing <chuguangqing@inspur.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20251103074305.4727-1-chuguangqing@inspur.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'net-stmmac-multi-interface-stmmac'
Jakub Kicinski [Wed, 5 Nov 2025 00:15:56 +0000 (16:15 -0800)] 
Merge branch 'net-stmmac-multi-interface-stmmac'

Russell King says:

====================
net: stmmac: multi-interface stmmac

This series adds a callback for platform glue to configure the stmmac
core interface mode depending on the PHY interface mode that is being
used. This is currently only called just before the dwmac core is reset
since these signals are latched on reset.

Included in this series are changes to s32 to move its PHY_INTF_SEL_x
definitions out of the way of the dwmac core's signals which has more
entitlement to use this name. We convert dwmac-imx as an example.

Including other platform glue would make this series excessively large,
but once this core code is merged, the individual platform glue updates
can be posted one after another as they will be independent of each
other.

It is hoped that this callback can be used in future to reconfigure the
dwmac core when the interface mode changes to support PHYs that change
their interface mode, but we're nowhere near being able to do that yet.
====================

Link: https://patch.msgid.link/aQiWzyrXU_2hGJ4j@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: imx: use ->set_phy_intf_sel()
Russell King (Oracle) [Mon, 3 Nov 2025 11:50:46 +0000 (11:50 +0000)] 
net: stmmac: imx: use ->set_phy_intf_sel()

Rather than placing the phy_intf_sel() setup in the ->init() method,
move it to the new ->set_phy_intf_sel() method.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vFt5C-0000000ChpR-2kAB@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: imx: cleanup arguments for set_intf_mode() method
Russell King (Oracle) [Mon, 3 Nov 2025 11:50:41 +0000 (11:50 +0000)] 
net: stmmac: imx: cleanup arguments for set_intf_mode() method

Pass the imx_priv_data instead of the plat_stmmacenet_data into the
set_intf_mode() SoC specific methods.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vFt57-0000000ChpL-25kS@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: imx: simplify set_intf_mode() implementations
Russell King (Oracle) [Mon, 3 Nov 2025 11:50:36 +0000 (11:50 +0000)] 
net: stmmac: imx: simplify set_intf_mode() implementations

Simplify the set_intf_mode() implementations, testing the phy_intf_sel
value rather than the PHY interface mode.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vFt52-0000000ChpG-1bsd@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: imx: use stmmac_get_phy_intf_sel()
Russell King (Oracle) [Mon, 3 Nov 2025 11:50:31 +0000 (11:50 +0000)] 
net: stmmac: imx: use stmmac_get_phy_intf_sel()

i.MX implementations other than IMX8DXL involve setting the dwmac core
phy_intf_sel input. Use stmmac_get_phy_intf_sel() to decode the PHY
interface mode to the phy_intf_sel value, validating the result, and
passing it into the implementation specific .set_intf_mode() method
rather than each .set_intf_mode() method doing this.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vFt4x-0000000ChpA-1Edr@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: imx: use FIELD_PREP()/FIELD_GET() for PHY_INTF_SEL_x
Russell King (Oracle) [Mon, 3 Nov 2025 11:50:26 +0000 (11:50 +0000)] 
net: stmmac: imx: use FIELD_PREP()/FIELD_GET() for PHY_INTF_SEL_x

Use FIELD_PREP()/FIELD_GET() in the functions to construct the PHY
interface selection bitfield or to extract its value.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vFt4s-0000000Chp4-0kwf@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: imx: convert to PHY_INTF_SEL_xxx
Russell King (Oracle) [Mon, 3 Nov 2025 11:50:21 +0000 (11:50 +0000)] 
net: stmmac: imx: convert to PHY_INTF_SEL_xxx

Convert dwmac-imx to use the PHY_INTF_SEL_xxx definitions rather than
constants via:
- ensuring that the prefix for the MASK and value definitions is the
  same.
- using FIELD_PREP() to shift the PHY_INTF_SEL_xxx definition to the
  appropriate bitfield.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vFt4n-0000000Choy-0IeG@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: add support for configuring the phy_intf_sel inputs
Russell King (Oracle) [Mon, 3 Nov 2025 11:50:15 +0000 (11:50 +0000)] 
net: stmmac: add support for configuring the phy_intf_sel inputs

When dwmac is synthesised with support for multiple PHY interfaces, the
core provides phy_intf_sel inputs, sampled on reset, to configure the
PHY facing interface. Use stmmac_get_phy_intf_sel() in core code to
determine the dwmac phy_intf_sel input value, and provide a new
platform method called with this value just before we issue a soft
reset to the dwmac core.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vFt4h-0000000Chos-3wxX@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: add stmmac_get_phy_intf_sel()
Russell King (Oracle) [Mon, 3 Nov 2025 11:50:10 +0000 (11:50 +0000)] 
net: stmmac: add stmmac_get_phy_intf_sel()

Provide a function to translate the PHY interface mode to the
phy_intf_sel pin configuration for dwmac1000 and dwmac4 cores that
support multiple interfaces. We currently handle MII, GMII, RGMII,
SGMII, RMII and REVMII, but not TBI, RTBI nor SMII as drivers do not
appear to use these three and the driver doesn't currently support
these.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vFt4c-0000000Choe-3SII@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: add phy_intf_sel and ACTPHYIF definitions
Russell King (Oracle) [Mon, 3 Nov 2025 11:50:05 +0000 (11:50 +0000)] 
net: stmmac: add phy_intf_sel and ACTPHYIF definitions

Add definitions for the active PHY interface found in DMA hardware
feature register 0, and also used to configure the core in multi-
interface designs via phy_intf_sel.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1vFt4X-0000000ChoY-30p9@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: s32: move PHY_INTF_SEL_x definitions out of the way
Russell King (Oracle) [Mon, 3 Nov 2025 11:50:00 +0000 (11:50 +0000)] 
net: stmmac: s32: move PHY_INTF_SEL_x definitions out of the way

S32's PHY_INTF_SEL_x definitions conflict with those for the dwmac
cores as they use a different bitmapping. Add a S32 prefix so that
they are unique.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Link: https://patch.msgid.link/E1vFt4S-0000000ChoS-2Ahi@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: imx: use phylink's interface mode for set_clk_tx_rate()
Russell King (Oracle) [Mon, 3 Nov 2025 11:49:55 +0000 (11:49 +0000)] 
net: stmmac: imx: use phylink's interface mode for set_clk_tx_rate()

imx_dwmac_set_clk_tx_rate() is passed the interface mode from phylink
which will be the same as plat_dat->phy_interface. Use the passed-in
interface mode rather than plat_dat->phy_interface.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vFt4N-0000000ChoM-1llp@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: mark deliver_skb() as unlikely and not inlined
Eric Dumazet [Mon, 3 Nov 2025 16:52:56 +0000 (16:52 +0000)] 
net: mark deliver_skb() as unlikely and not inlined

deliver_skb() should not be inlined as is it not called
in the fast path.

Add unlikely() clauses giving hints to the compiler about this fact.

Before this patch:

size net/core/dev.o
   text    data     bss     dec     hex filename
 121794   13330     176  135300   21084 net/core/dev.o

__netif_receive_skb_core() size on x86_64 : 4080 bytes.

After:

size net/core/dev.o
  text    data     bss     dec     hex filenamee
 120330   13338     176  133844   20ad4 net/core/dev.o

__netif_receive_skb_core() size on x86_64 : 2781 bytes.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251103165256.1712169-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agortnetlink: honor RTEXT_FILTER_SKIP_STATS in IFLA_STATS
Adrian Moreno [Mon, 3 Nov 2025 15:40:04 +0000 (16:40 +0100)] 
rtnetlink: honor RTEXT_FILTER_SKIP_STATS in IFLA_STATS

Gathering interface statistics can be a relatively expensive operation
on certain systems as it requires iterating over all the cpus.

RTEXT_FILTER_SKIP_STATS was first introduced [1] to skip AF_INET6
statistics from interface dumps and it was then extended [2] to
also exclude IFLA_VF_INFO.

The semantics of the flag does not seem to be limited to AF_INET
or VF statistics and having a way to query the interface status
(e.g: carrier, address) without retrieving its statistics seems
reasonable. So this patch extends the use RTEXT_FILTER_SKIP_STATS
to also affect IFLA_STATS.

[1] https://lore.kernel.org/all/20150911204848.GC9687@oracle.com/
[2] https://lore.kernel.org/all/20230611105108.122586-1-gal@nvidia.com/

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://patch.msgid.link/20251103154006.1189707-1-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'xsk-minor-optimizations-around-locks'
Paolo Abeni [Tue, 4 Nov 2025 15:10:54 +0000 (16:10 +0100)] 
Merge branch 'xsk-minor-optimizations-around-locks'

Jason Xing says:

====================
xsk: minor optimizations around locks

Two optimizations regarding xsk_tx_list_lock and cq_lock can yield a
performance increase because of avoiding disabling and enabling
interrupts frequently.
====================

Link: https://patch.msgid.link/20251030000646.18859-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agoxsk: use a smaller new lock for shared pool case
Jason Xing [Thu, 30 Oct 2025 00:06:46 +0000 (08:06 +0800)] 
xsk: use a smaller new lock for shared pool case

- Split cq_lock into two smaller locks: cq_prod_lock and
  cq_cached_prod_lock
- Avoid disabling/enabling interrupts in the hot xmit path

In either xsk_cq_cancel_locked() or xsk_cq_reserve_locked() function,
the race condition is only between multiple xsks sharing the same
pool. They are all in the process context rather than interrupt context,
so now the small lock named cq_cached_prod_lock can be used without
handling interrupts.

While cq_cached_prod_lock ensures the exclusive modification of
@cached_prod, cq_prod_lock in xsk_cq_submit_addr_locked() only cares
about @producer and corresponding @desc. Both of them don't necessarily
be consistent with @cached_prod protected by cq_cached_prod_lock.
That's the reason why the previous big lock can be split into two
smaller ones. Please note that SPSC rule is all about the global state
of producer and consumer that can affect both layers instead of local
or cached ones.

Frequently disabling and enabling interrupt are very time consuming
in some cases, especially in a per-descriptor granularity, which now
can be avoided after this optimization, even when the pool is shared by
multiple xsks.

With this patch, the performance number[1] could go from 1,872,565 pps
to 1,961,009 pps. It's a minor rise of around 5%.

[1]: taskset -c 1 ./xdpsock -i enp2s0f1 -q 0 -t -S -s 64

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20251030000646.18859-3-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agoxsk: do not enable/disable irq when grabbing/releasing xsk_tx_list_lock
Jason Xing [Thu, 30 Oct 2025 00:06:45 +0000 (08:06 +0800)] 
xsk: do not enable/disable irq when grabbing/releasing xsk_tx_list_lock

The commit ac98d8aab61b ("xsk: wire upp Tx zero-copy functions")
originally introducing this lock put the deletion process in the
sk_destruct which can run in irq context obviously, so the
xxx_irqsave()/xxx_irqrestore() pair was used. But later another
commit 541d7fdd7694 ("xsk: proper AF_XDP socket teardown ordering")
moved the deletion into xsk_release() that only happens in process
context. It means that since this commit, it doesn't necessarily
need that pair.

Now, there are two places that use this xsk_tx_list_lock and only
run in the process context. So avoid manipulating the irq then.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20251030000646.18859-2-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agonet: add net cookie for net device trace events
Tonghao Zhang [Tue, 28 Oct 2025 04:32:44 +0000 (12:32 +0800)] 
net: add net cookie for net device trace events

In a multi-network card or container environment, this is needed in order
to differentiate between trace events relating to net devices that exist
in different network namespaces and share the same name.

for xmit_timeout trace events:
[002] ..s1.  1838.311662: net_dev_xmit_timeout: dev=eth0 driver=virtio_net queue=10 net_cookie=3
[007] ..s1.  1839.335650: net_dev_xmit_timeout: dev=eth0 driver=virtio_net queue=10 net_cookie=4100
[007] ..s1.  1844.455659: net_dev_xmit_timeout: dev=eth0 driver=virtio_net queue=10 net_cookie=3
[002] ..s1.  1850.087647: net_dev_xmit_timeout: dev=eth0 driver=virtio_net queue=10 net_cookie=3

Cc: Eran Ben Elisha <eranbe@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20251028043244.82288-1-tonghao@bamaicloud.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agoMerge branch 'ethtool-introduce-phy-mse-diagnostics-uapi-and-drivers'
Jakub Kicinski [Tue, 4 Nov 2025 02:32:33 +0000 (18:32 -0800)] 
Merge branch 'ethtool-introduce-phy-mse-diagnostics-uapi-and-drivers'

Oleksij Rempel says:

====================
ethtool: introduce PHY MSE diagnostics UAPI and drivers

This series introduces a generic kernel-userspace API for retrieving PHY
Mean Square Error (MSE) diagnostics, together with netlink integration,
a fast-path reporting hook in LINKSTATE_GET, and initial driver
implementations for the KSZ9477 and DP83TD510E PHYs.

MSE is defined by the OPEN Alliance "Advanced diagnostic features for
100BASE-T1 automotive Ethernet PHYs" specification [1] as a measure of
slicer error rate, typically used internally to derive the Signal
Quality Indicator (SQI). While SQI is useful as a normalized quality
index, it hides raw measurement data, varies in scaling and thresholds
between vendors, and may not indicate certain failure modes - for
example, cases where autonegotiation would fail even though SQI reports
a good link. In practice, such scenarios can only be investigated in
fixed-link mode; here, MSE can provide an empirically estimated value
indicating conditions under which autonegotiation would not succeed.

Example output with current implementation:
root@DistroKit:~ ethtool lan1
Settings for lan1:
...
        Speed: 1000Mb/s
        Duplex: Full
...
        Link detected: yes
        SQI: 5/7
        MSE: 3/127 (channel: worst)

root@DistroKit:~ ethtool --show-mse lan1
MSE diagnostics for lan1:
MSE Configuration:
        Max Average MSE: 127
        Refresh Rate: 2000000 ps
        Symbols per Sample: 250
        Supported capabilities: average channel-a channel-b channel-c
                                channel-d worst

MSE Snapshot (Channel: a):
        Average MSE: 4

MSE Snapshot (Channel: b):
        Average MSE: 3

MSE Snapshot (Channel: c):
        Average MSE: 2

MSE Snapshot (Channel: d):
        Average MSE: 3

[1] https://opensig.org/wp-content/uploads/2024/01/Advanced_PHY_features_for_automotive_Ethernet_V1.0.pdf
====================

Link: https://patch.msgid.link/20251027122801.982364-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: phy: dp83td510: add MSE interface support for 10BASE-T1L
Oleksij Rempel [Mon, 27 Oct 2025 12:28:01 +0000 (13:28 +0100)] 
net: phy: dp83td510: add MSE interface support for 10BASE-T1L

Implement get_mse_capability() and get_mse_snapshot() for the DP83TD510E
to expose its Mean Square Error (MSE) register via the new PHY MSE
UAPI.

The DP83TD510E does not document any peak MSE values; it only exposes
a single average MSE register used internally to derive SQI. This
implementation therefore advertises only PHY_MSE_CAP_AVG, along with
LINK and channel-A selectors. Scaling is fixed to 0xFFFF, and the
refresh interval/number of symbols are estimated from 10BASE-T1L
symbol rate (7.5 MBd) and typical diagnostic intervals (~1 ms).

For 10BASE-T1L deployments, SQI is a reliable indicator of link
modulation quality once the link is established, but it does not
indicate whether autonegotiation pulses will be correctly received
in marginal conditions. MSE provides a direct measurement of slicer
error rate that can be used to evaluate if autonegotiation is likely
to succeed under a given cable length and condition. In practice,
testing such scenarios often requires forcing a fixed-link setup to
isolate MSE behaviour from the autonegotiation process.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20251027122801.982364-5-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: phy: micrel: add MSE interface support for KSZ9477 family
Oleksij Rempel [Mon, 27 Oct 2025 12:28:00 +0000 (13:28 +0100)] 
net: phy: micrel: add MSE interface support for KSZ9477 family

Implement the get_mse_capability() and get_mse_snapshot() PHY driver ops
for KSZ9477-series integrated PHYs to demonstrate the new PHY MSE
UAPI.

These PHYs do not expose a documented direct MSE register, but the
Signal Quality Indicator (SQI) registers are derived from the
internal MSE computation. This hook maps SQI readings into the MSE
interface so that tooling can retrieve the raw value together with
metadata for correct interpretation in userspace.

Behaviour:
  - For 1000BASE-T, report per-channel (A–D) values and support a
    WORST channel selector.
  - For 100BASE-TX, only LINK-wide measurements are available.
  - Report average MSE only, with a max scale based on
    KSZ9477_MMD_SQI_MASK and a fixed refresh rate of 2 µs.

This mapping differs from the OPEN Alliance SQI definition, which
assigns thresholds such as pre-fail indices; the MSE interface
instead provides the raw measurement, leaving interpretation to
userspace.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20251027122801.982364-4-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoethtool: netlink: add ETHTOOL_MSG_MSE_GET and wire up PHY MSE access
Oleksij Rempel [Mon, 27 Oct 2025 12:27:59 +0000 (13:27 +0100)] 
ethtool: netlink: add ETHTOOL_MSG_MSE_GET and wire up PHY MSE access

Introduce the userspace entry point for PHY MSE diagnostics via
ethtool netlink. This exposes the core API added previously and
returns both capability information and one or more snapshots.

Userspace sends ETHTOOL_MSG_MSE_GET. The reply carries:
- ETHTOOL_A_MSE_CAPABILITIES: scale limits and timing information
- ETHTOOL_A_MSE_CHANNEL_* nests: one or more snapshots (per-channel
  if available, otherwise WORST, otherwise LINK)

Link down returns -ENETDOWN.

Changes:
  - YAML: add attribute sets (mse, mse-capabilities, mse-snapshot)
    and the mse-get operation
  - UAPI (generated): add ETHTOOL_A_MSE_* enums and message IDs,
    ETHTOOL_MSG_MSE_GET/REPLY
  - ethtool core: add net/ethtool/mse.c implementing the request,
    register genl op, and hook into ethnl dispatch
  - docs: document MSE_GET in ethtool-netlink.rst

The include/uapi/linux/ethtool_netlink_generated.h is generated
from Documentation/netlink/specs/ethtool.yaml.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20251027122801.982364-3-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: phy: introduce internal API for PHY MSE diagnostics
Oleksij Rempel [Mon, 27 Oct 2025 12:27:58 +0000 (13:27 +0100)] 
net: phy: introduce internal API for PHY MSE diagnostics

Add the base infrastructure for Mean Square Error (MSE) diagnostics,
as proposed by the OPEN Alliance "Advanced diagnostic features for
100BASE-T1 automotive Ethernet PHYs" [1] specification.

The OPEN Alliance spec defines only average MSE and average peak MSE
over a fixed number of symbols. However, other PHYs, such as the
KSZ9131, additionally expose a worst-peak MSE value latched since the
last channel capture. This API accounts for such vendor extensions by
adding a distinct capability bit and snapshot field.

Channel-to-pair mapping is normally straightforward, but in some cases
(e.g. 100BASE-TX with MDI-X resolution unknown) the mapping is ambiguous.
If hardware does not expose MDI-X status, the exact pair cannot be
determined. To avoid returning misleading per-channel data in this case,
a LINK selector is defined for aggregate MSE measurements.

All investigated devices differ in MSE capabilities, such
as sample rate, number of analyzed symbols, and scaling factors.
For example, the KSZ9131 uses different scaling for MSE and pMSE.
To make this visible to callers, scale limits and timing information
are returned via get_mse_capability().

Some PHYs sample very few symbols at high frequency (e.g. 2 us update
rate). To cover such cases and allow for future high-speed PHYs with
even shorter intervals, the refresh rate is reported as u64 in
picoseconds.

This patch introduces the internal PHY API for Mean Square Error
diagnostics. It defines new kernel-side data types and driver hooks:

  - struct phy_mse_capability: describes supported metrics, scale
    limits, update interval, and sampling length.
  - struct phy_mse_snapshot: holds one correlated measurement set.
  - New phy_driver ops: `get_mse_capability()` and `get_mse_snapshot()`.

These definitions form the core kernel API. No user-visible interfaces
are added in this commit.

Standardization notes:
OPEN Alliance defines presence and interpretation of some metrics but does
not fix numeric scales or sampling internals:

- SQI (3-bit, 0..7) is mandatory; correlation to SNR/BER is informative
  (OA 100BASE-T1 TC1 v1.0 6.1.2; OA 1000BASE-T1 TC12 v2.2 6.1.2).
- MSE is optional; OA recommends 2^16 symbols and scaling to 0..511,
  with a worst-case latch since last read (OA 100BASE-T1 TC1 v1.0 6.1.1; OA
  1000BASE-T1 TC12 v2.2 6.1.1). Refresh is recommended (~0.8-2.0 ms for
  100BASE-T1; ~80-200 us for 1000BASE-T1). Exact scaling/time windows
  are vendor-specific.
- Peak MSE (pMSE) is defined only for 100BASE-T1 as optional, e.g.
  128-symbol sliding window with 8-bit range and worst-case latch (OA
  100BASE-T1 TC1 v1.0 6.1.3).

Therefore this API exposes which measures and selectors a PHY supports,
and documents where behavior is standard-referenced vs vendor-specific.

[1] <https://opensig.org/wp-content/uploads/2024/01/
     Advanced_PHY_features_for_automotive_Ethernet_V1.0.pdf>

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20251027122801.982364-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'add-support-to-do-threaded-napi-busy-poll'
Jakub Kicinski [Tue, 4 Nov 2025 02:11:43 +0000 (18:11 -0800)] 
Merge branch 'add-support-to-do-threaded-napi-busy-poll'

Samiullah Khawaja says:

====================
Add support to do threaded napi busy poll

Extend the already existing support of threaded napi poll to do continuous
busy polling.

This is used for doing continuous polling of napi to fetch descriptors
from backing RX/TX queues for low latency applications. Allow enabling
of threaded busypoll using netlink so this can be enabled on a set of
dedicated napis for low latency applications.

Once enabled user can fetch the PID of the kthread doing NAPI polling
and set affinity, priority and scheduler for it depending on the
low-latency requirements.

Extend the netlink interface to allow enabling/disabling threaded
busypolling at individual napi level.

We use this for our AF_XDP based hard low-latency usecase with usecs
level latency requirement. For our usecase we want low jitter and stable
latency at P99.

Following is an analysis and comparison of available (and compatible)
busy poll interfaces for a low latency usecase with stable P99. This can
be suitable for applications that want very low latency at the expense
of cpu usage and efficiency.

Already existing APIs (SO_BUSYPOLL and epoll) allow busy polling a NAPI
backing a socket, but the missing piece is a mechanism to busy poll a
NAPI instance in a dedicated thread while ignoring available events or
packets, regardless of the userspace API. Most existing mechanisms are
designed to work in a pattern where you poll until new packets or events
are received, after which userspace is expected to handle them.

As a result, one has to hack together a solution using a mechanism
intended to receive packets or events, not to simply NAPI poll. NAPI
threaded busy polling, on the other hand, provides this capability
natively, independent of any userspace API. This makes it really easy to
setup and manage.

For analysis we use an AF_XDP based benchmarking tool `xsk_rr`. The
description of the tool and how it tries to simulate the real workload
is following,

- It sends UDP packets between 2 machines.
- The client machine sends packets at a fixed frequency. To maintain the
  frequency of the packet being sent, we use open-loop sampling. That is
  the packets are sent in a separate thread.
- The server replies to the packet inline by reading the pkt from the
  recv ring and replies using the tx ring.
- To simulate the application processing time, we use a configurable
  delay in usecs on the client side after a reply is received from the
  server.

The xsk_rr tool is posted separately as an RFC for tools/testing/selftest.

We use this tool with following napi polling configurations,

- Interrupts only
- SO_BUSYPOLL (inline in the same thread where the client receives the
  packet).
- SO_BUSYPOLL (separate thread and separate core)
- Threaded NAPI busypoll

System is configured using following script in all 4 cases,

```
echo 0 | sudo tee /sys/class/net/eth0/threaded
echo 0 | sudo tee /proc/sys/kernel/timer_migration
echo off | sudo tee  /sys/devices/system/cpu/smt/control

sudo ethtool -L eth0 rx 1 tx 1
sudo ethtool -G eth0 rx 1024

echo 0 | sudo tee /proc/sys/net/core/rps_sock_flow_entries
echo 0 | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus

 # pin IRQs on CPU 2
IRQS="$(gawk '/eth0-(TxRx-)?1/ {match($1, /([0-9]+)/, arr); \
print arr[0]}' < /proc/interrupts)"
for irq in "${IRQS}"; \
do echo 2 | sudo tee /proc/irq/$irq/smp_affinity_list; done

echo -1 | sudo tee /proc/sys/kernel/sched_rt_runtime_us

for i in /sys/devices/virtual/workqueue/*/cpumask; \
do echo $i; echo 1,2,3,4,5,6 > $i; done

if [[ -z "$1" ]]; then
  echo 400 | sudo tee /proc/sys/net/core/busy_read
  echo 100 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
  echo 15000   | sudo tee /sys/class/net/eth0/gro_flush_timeout
fi

sudo ethtool -C eth0 adaptive-rx off adaptive-tx off rx-usecs 0 tx-usecs 0

if [[ "$1" == "enable_threaded" ]]; then
  echo 0 | sudo tee /proc/sys/net/core/busy_poll
  echo 0 | sudo tee /proc/sys/net/core/busy_read
  echo 100 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
  echo 15000 | sudo tee /sys/class/net/eth0/gro_flush_timeout
  NAPI_ID=$(ynl --family netdev --output-json --do queue-get \
    --json '{"ifindex": '${IFINDEX}', "id": '0', "type": "rx"}' | jq '."napi-id"')

  ynl --family netdev --json '{"id": "'${NAPI_ID}'", "threaded": "busy-poll"}'

  NAPI_T=$(ynl --family netdev --output-json --do napi-get \
    --json '{"id": "'$NAPI_ID'"}' | jq '."pid"')

  sudo chrt -f  -p 50 $NAPI_T

  # pin threaded poll thread to CPU 2
  sudo taskset -pc 2 $NAPI_T
fi

if [[ "$1" == "enable_interrupt" ]]; then
  echo 0 | sudo tee /proc/sys/net/core/busy_read
  echo 0 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
  echo 15000 | sudo tee /sys/class/net/eth0/gro_flush_timeout
fi
```

To enable various configurations, script can be run as following,

- Interrupt Only
  ```
  <script> enable_interrupt
  ```

- SO_BUSYPOLL (no arguments to script)
  ```
  <script>
  ```

- NAPI threaded busypoll
  ```
  <script> enable_threaded
  ```

Once configured, the workload is run with various configurations using
following commands. Set period (1/frequency) and delay in usecs to
produce results for packet frequency and application processing delay.

 ## Interrupt Only and SO_BUSYPOLL (inline)

- Server
```
sudo chrt -f 50 taskset -c 3-5 ./xsk_rr -o 0 -B 400 -i eth0 -4 \
-D <IP-dest> -S <IP-src> -M <MAC-dst> -m <MAC-src> -p 54321 -h -v
```

- Client
```
sudo chrt -f 50 taskset -c 3-5 ./xsk_rr -o 0 -B 400 -i eth0 -4 \
-S <IP-src> -D <IP-dest> -m <MAC-src> -M <MAC-dst> -p 54321 \
-P <Period-usecs> -d <Delay-usecs>  -T -l 1 -v
```

 ## SO_BUSYPOLL(done in separate core using recvfrom)

Argument -t spawns a separate thread and continuously calls recvfrom.

- Server
```
sudo chrt -f 50 taskset -c 3-5 ./xsk_rr -o 0 -B 400 -i eth0 -4 \
-D <IP-dest> -S <IP-src> -M <MAC-dst> -m <MAC-src> -p 54321 \
-h -v -t
```

- Client
```
sudo chrt -f 50 taskset -c 3-5 ./xsk_rr -o 0 -B 400 -i eth0 -4 \
-S <IP-src> -D <IP-dest> -m <MAC-src> -M <MAC-dst> -p 54321 \
-P <Period-usecs> -d <Delay-usecs>  -T -l 1 -v -t
```

 ## NAPI Threaded Busy Poll

Argument -n skips the recvfrom call as there is no recv kick needed.

- Server
```
sudo chrt -f 50 taskset -c 3-5 ./xsk_rr -o 0 -B 400 -i eth0 -4 \
-D <IP-dest> -S <IP-src> -M <MAC-dst> -m <MAC-src> -p 54321 \
-h -v -n
```

- Client
```
sudo chrt -f 50 taskset -c 3-5 ./xsk_rr -o 0 -B 400 -i eth0 -4 \
-S <IP-src> -D <IP-dest> -m <MAC-src> -M <MAC-dst> -p 54321 \
-P <Period-usecs> -d <Delay-usecs>  -T -l 1 -v -n
```

| Experiment | interrupts | SO_BUSYPOLL | SO_BUSYPOLL(separate) | NAPI threaded |
|---|---|---|---|---|
| 12 Kpkt/s + 0us delay | | | | |
|  | p5: 12700 | p5: 12900 | p5: 13300 | p5: 12800 |
|  | p50: 13100 | p50: 13600 | p50: 14100 | p50: 13000 |
|  | p95: 13200 | p95: 13800 | p95: 14400 | p95: 13000 |
|  | p99: 13200 | p99: 13800 | p99: 14400 | p99: 13000 |
| 32 Kpkt/s + 30us delay | | | | |
|  | p5: 19900 | p5: 16600 | p5: 13100 | p5: 12800 |
|  | p50: 21100 | p50: 17000 | p50: 13700 | p50: 13000 |
|  | p95: 21200 | p95: 17100 | p95: 14000 | p95: 13000 |
|  | p99: 21200 | p99: 17100 | p99: 14000 | p99: 13000 |
| 125 Kpkt/s + 6us delay | | | | |
|  | p5: 14600 | p5: 17100 | p5: 13300 | p5: 12900 |
|  | p50: 15400 | p50: 17400 | p50: 13800 | p50: 13100 |
|  | p95: 15600 | p95: 17600 | p95: 14000 | p95: 13100 |
|  | p99: 15600 | p99: 17600 | p99: 14000 | p99: 13100 |
| 12 Kpkt/s + 78us delay | | | | |
|  | p5: 14100 | p5: 16700 | p5: 13200 | p5: 12600 |
|  | p50: 14300 | p50: 17100 | p50: 13900 | p50: 12800 |
|  | p95: 14300 | p95: 17200 | p95: 14200 | p95: 12800 |
|  | p99: 14300 | p99: 17200 | p99: 14200 | p99: 12800 |
| 25 Kpkt/s + 38us delay | | | | |
|  | p5: 19900 | p5: 16600 | p5: 13000 | p5: 12700 |
|  | p50: 21000 | p50: 17100 | p50: 13800 | p50: 12900 |
|  | p95: 21100 | p95: 17100 | p95: 14100 | p95: 12900 |
|  | p99: 21100 | p99: 17100 | p99: 14100 | p99: 12900 |

 ## Observations

- Here without application processing all the approaches give the same
  latency within 1usecs range and NAPI threaded gives minimum latency.
- With application processing the latency increases by 3-4usecs when
  doing inline polling.
- Using a dedicated core to drive napi polling keeps the latency same
  even with application processing. This is observed both in userspace
  and threaded napi (in kernel).
- Using napi threaded polling in kernel gives lower latency by
  1-2usecs as compared to userspace driven polling in separate core.
- Even on a dedicated core, SO_BUSYPOLL adds around 1-2usecs of latency.
  This is because it doesn't continuously busy poll until events are
  ready. Instead, it returns after polling only once, requiring the
  process to re-invoke the syscall for each poll, which requires a new
  enter/leave kernel cycle and the setup/teardown of the busy poll for
  every single poll attempt.
- With application processing userspace will get the packet from recv
  ring and spend some time doing application processing and then do napi
  polling. While application processing is happening a dedicated core
  doing napi polling can pull the packet of the NAPI RX queue and
  populate the AF_XDP recv ring. This means that when the application
  thread is done with application processing it has new packets ready to
  recv and process in recv ring.
- Napi threaded busy polling in the kernel with a dedicated core gives
  the consistent P5-P99 latency.

Note well that threaded napi busy-polling has not been shown to yield
efficiency or throughput benefits. In contrast, dedicating an entire
core to busy-polling one NAPI (NIC queue) is rather inefficient.
However, in certain specific use cases, this mechanism results in lower
packet processing latency. The experimental testing reported here only
covers those use cases and does not present a comprehensive evaluation
of threaded napi busy-polling.

Following histogram is generated to measure the time spent in recvfrom
while using inline thread with SO_BUSYPOLL. The histogram is generated
using the following bpftrace command. In this experiment there are 32K
packets per second and the application processing delay is 30usecs. This
is to measure whether there is significant time spent pulling packets
from the descriptor queue that it will affect the overall latency if
done inline.

```
bpftrace -e '
        kprobe:xsk_recvmsg {
                @start[tid] = nsecs;
        }
        kretprobe:xsk_recvmsg {
                if (@start[tid]) {
                        $sample = (nsecs - @start[tid]);
                        @xsk_recvfrom_hist = hist($sample);
                        delete(@start[tid]);
                }
        }
        END { clear(@start);}'
```

Here in case of inline busypolling around 35 percent of calls are taking
1-2usecs and around 50 percent are taking 0.5-2usecs.

@xsk_recvfrom_hist:
[128, 256)         24073 |@@@@@@@@@@@@@@@@@@@@@@                              |
[256, 512)         55633 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[512, 1K)          20974 |@@@@@@@@@@@@@@@@@@@                                 |
[1K, 2K)           34234 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                     |
[2K, 4K)            3266 |@@@                                                 |
[4K, 8K)              19 |                                                    |
====================

Link: https://patch.msgid.link/20251028203007.575686-1-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoselftests: Add napi threaded busy poll test in `busy_poller`
Samiullah Khawaja [Tue, 28 Oct 2025 20:30:06 +0000 (20:30 +0000)] 
selftests: Add napi threaded busy poll test in `busy_poller`

Add testcase to run busy poll test with threaded napi busy poll enabled.

Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Martin Karsten <mkarsten@uwaterloo.ca>
Tested-by: Martin Karsten <mkarsten@uwaterloo.ca>
Link: https://patch.msgid.link/20251028203007.575686-3-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: Extend NAPI threaded polling to allow kthread based busy polling
Samiullah Khawaja [Tue, 28 Oct 2025 20:30:05 +0000 (20:30 +0000)] 
net: Extend NAPI threaded polling to allow kthread based busy polling

Add a new state NAPI_STATE_THREADED_BUSY_POLL to the NAPI state enum to
enable and disable threaded busy polling.

When threaded busy polling is enabled for a NAPI, enable
NAPI_STATE_THREADED also.

When the threaded NAPI is scheduled, set NAPI_STATE_IN_BUSY_POLL to
signal napi_complete_done not to rearm interrupts.

Whenever NAPI_STATE_THREADED_BUSY_POLL is unset, the
NAPI_STATE_IN_BUSY_POLL will be unset, napi_complete_done unsets the
NAPI_STATE_SCHED_THREADED bit also, which in turn will make the kthread
go to sleep.

Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Martin Karsten <mkarsten@uwaterloo.ca>
Tested-by: Martin Karsten <mkarsten@uwaterloo.ca>
Link: https://patch.msgid.link/20251028203007.575686-2-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'mpls-remove-rtnl-dependency'
Jakub Kicinski [Tue, 4 Nov 2025 01:40:59 +0000 (17:40 -0800)] 
Merge branch 'mpls-remove-rtnl-dependency'

Kuniyuki Iwashima says:

====================
mpls: Remove RTNL dependency.

MPLS uses RTNL

  1) to guarantee the lifetime of struct mpls_nh.nh_dev
  2) to protect net->mpls.platform_label

, but neither actually requires RTNL.

If struct mpls_nh holds a refcnt for nh_dev, we do not need RTNL,
and it can be replaced with a dedicated mutex.

The series removes RTNL from net/mpls/.

Overview:

  Patch 1 is misc cleanup.

  Patch 2 - 9 are prep to drop RTNL for RTM_{NEW,DEL,GET}ROUTE
  handlers.

  Patch 10 & 11 converts mpls_dump_routes() and RTM_GETNETCONF to RCU.

  Patch 12 replaces RTNL with a new per-netns mutex.

  Patch 13 drops RTNL from RTM_{NEW,DEL,GET}ROUTE.
====================

Link: https://patch.msgid.link/20251029173344.2934622-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agompls: Drop RTNL for RTM_NEWROUTE, RTM_DELROUTE, and RTM_GETROUTE.
Kuniyuki Iwashima [Wed, 29 Oct 2025 17:33:05 +0000 (17:33 +0000)] 
mpls: Drop RTNL for RTM_NEWROUTE, RTM_DELROUTE, and RTM_GETROUTE.

RTM_NEWROUTE looks up dev under RCU (ip_route_output(),
ipv6_stub->ipv6_dst_lookup_flow(), netdev_get_by_index()),
and each neighbour holds the refcnt of its dev.

Also, net->mpls.platform_label is protected by a dedicated
per-netns mutex.

Now, no MPLS code depends on RTNL.

Let's drop RTNL for RTM_NEWROUTE, RTM_DELROUTE, and RTM_GETROUTE.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20251029173344.2934622-14-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agompls: Protect net->mpls.platform_label with a per-netns mutex.
Kuniyuki Iwashima [Wed, 29 Oct 2025 17:33:04 +0000 (17:33 +0000)] 
mpls: Protect net->mpls.platform_label with a per-netns mutex.

MPLS (re)uses RTNL to protect net->mpls.platform_label,
but the lock does not need to be RTNL at all.

Let's protect net->mpls.platform_label with a dedicated
per-netns mutex.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20251029173344.2934622-13-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agompls: Convert RTM_GETNETCONF to RCU.
Kuniyuki Iwashima [Wed, 29 Oct 2025 17:33:03 +0000 (17:33 +0000)] 
mpls: Convert RTM_GETNETCONF to RCU.

mpls_netconf_get_devconf() calls __dev_get_by_index(),
and this only depends on RTNL.

Let's convert mpls_netconf_get_devconf() to RCU and use
dev_get_by_index_rcu().

Note that nlmsg_new() is moved ahead to use GFP_KERNEL.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20251029173344.2934622-12-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agompls: Convert mpls_dump_routes() to RCU.
Kuniyuki Iwashima [Wed, 29 Oct 2025 17:33:02 +0000 (17:33 +0000)] 
mpls: Convert mpls_dump_routes() to RCU.

mpls_dump_routes() sets fib_dump_filter.rtnl_held to true and
calls __dev_get_by_index() in mpls_valid_fib_dump_req().

This is the only RTNL dependant in mpls_dump_routes().

Also, synchronize_rcu() in resize_platform_label_table()
guarantees that net->mpls.platform_label is alive under RCU.

Let's convert mpls_dump_routes() to RCU and use dev_get_by_index_rcu().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20251029173344.2934622-11-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agompls: Use mpls_route_input() where appropriate.
Kuniyuki Iwashima [Wed, 29 Oct 2025 17:33:01 +0000 (17:33 +0000)] 
mpls: Use mpls_route_input() where appropriate.

In many places, we uses rtnl_dereference() twice for
net->mpls.platform_label and net->mpls.platform_label[index].

Let's replace the code with mpls_route_input().

We do not use mpls_route_input() in mpls_dump_routes() since
we will rely on RCU there.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20251029173344.2934622-10-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agompls: Add mpls_route_input().
Kuniyuki Iwashima [Wed, 29 Oct 2025 17:33:00 +0000 (17:33 +0000)] 
mpls: Add mpls_route_input().

mpls_route_input_rcu() is called from mpls_forward() and
mpls_getroute().

The former is under RCU, and the latter is under RTNL, so
mpls_route_input_rcu() uses rcu_dereference_rtnl().

Let's use rcu_dereference() in mpls_route_input_rcu() and
add an RTNL variant for mpls_getroute().

Later, we will remove rtnl_dereference() there.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20251029173344.2934622-9-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agompls: Pass net to mpls_dev_get().
Kuniyuki Iwashima [Wed, 29 Oct 2025 17:32:59 +0000 (17:32 +0000)] 
mpls: Pass net to mpls_dev_get().

We will replace RTNL with a per-netns mutex to protect dev->mpls_ptr.

Then, we will use rcu_dereference_protected() with the lockdep_is_held()
annotation, which requires net to access the per-netns mutex.

However, dev_net(dev) is not safe without RTNL.

Let's pass net to mpls_dev_get().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20251029173344.2934622-8-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agompls: Add mpls_dev_rcu().
Kuniyuki Iwashima [Wed, 29 Oct 2025 17:32:58 +0000 (17:32 +0000)] 
mpls: Add mpls_dev_rcu().

mpls_dev_get() uses rcu_dereference_rtnl() to fetch dev->mpls_ptr.

We will replace RTNL with a dedicated mutex to protect the field.

Then, we will use rcu_dereference_protected() for clarity.

Let's add mpls_dev_rcu() for the RCU reader.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20251029173344.2934622-7-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agompls: Use in6_dev_rcu() and dev_net_rcu() in mpls_forward() and mpls_xmit().
Kuniyuki Iwashima [Wed, 29 Oct 2025 17:32:57 +0000 (17:32 +0000)] 
mpls: Use in6_dev_rcu() and dev_net_rcu() in mpls_forward() and mpls_xmit().

mpls_forward() and mpls_xmit() are called under RCU.

Let's use in6_dev_rcu() and dev_net_rcu() there to annotate
as such.

Now we pass net to mpls_stats_inc_outucastpkts() not to read
dev_net_rcu() twice.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20251029173344.2934622-6-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoipv6: Add in6_dev_rcu().
Kuniyuki Iwashima [Wed, 29 Oct 2025 17:32:56 +0000 (17:32 +0000)] 
ipv6: Add in6_dev_rcu().

rcu_dereference_rtnl() does not clearly tell whether the caller
is under RCU or RTNL.

Let's add in6_dev_rcu() to make it easy to remove __in6_dev_get()
in the future.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20251029173344.2934622-5-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agompls: Unify return paths in mpls_dev_notify().
Kuniyuki Iwashima [Wed, 29 Oct 2025 17:32:55 +0000 (17:32 +0000)] 
mpls: Unify return paths in mpls_dev_notify().

We will protect net->mpls.platform_label by a dedicated mutex.

Then, we need to wrap functions called from mpls_dev_notify()
with the mutex.

As a prep, let's unify the return paths.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20251029173344.2934622-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agompls: Hold dev refcnt for mpls_nh.
Kuniyuki Iwashima [Wed, 29 Oct 2025 17:32:54 +0000 (17:32 +0000)] 
mpls: Hold dev refcnt for mpls_nh.

MPLS uses RTNL

  1) to guarantee the lifetime of struct mpls_nh.nh_dev
  2) to protect net->mpls.platform_label

, but neither actually requires RTNL.

If we do not call dev_put() in find_outdev() and call it
just before freeing struct mpls_route, we can drop RTNL for 1).

Let's hold the refcnt of mpls_nh.nh_dev and track it with
netdevice_tracker.

Two notable changes:

If mpls_nh_build_multi() fails to set up a neighbour, we need
to call netdev_put() for successfully created neighbours in
mpls_rt_free_rcu(), so the number of neighbours (rt->rt_nhn)
is now updated in each iteration.

When a dev is unregistered, mpls_ifdown() clones mpls_route
and replaces it with the clone, so the clone requires extra
netdev_hold().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20251029173344.2934622-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agompls: Return early in mpls_label_ok().
Kuniyuki Iwashima [Wed, 29 Oct 2025 17:32:53 +0000 (17:32 +0000)] 
mpls: Return early in mpls_label_ok().

When mpls_label_ok() returns false, it does not need to update *index.

Let's remove is_ok and return early.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20251029173344.2934622-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: rename devlink parameter ts_coarse into phc_coarse_adj
Maxime Chevallier [Thu, 30 Oct 2025 18:24:53 +0000 (19:24 +0100)] 
net: stmmac: rename devlink parameter ts_coarse into phc_coarse_adj

The devlink param "ts_coarse" doesn't indicate that we get coarse
timestamps, but rather that the PHC clock adjusments are coarse as the
frequency won't be continuously adjusted. Adjust the devlink parameter
name to reflect that.

The Coarse terminlogy comes from the dwmac register naming, update the
documentation to better explain what the parameter is about.

With this change, the parameter can now be adjusted using:
  devlink dev param set <dev> name phc_coarse_adj value true cmode runtime

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20251030182454.182406-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: dsa: yt921x: Fix spelling mistake "stucked" -> "stuck"
Colin Ian King [Sat, 1 Nov 2025 18:34:46 +0000 (18:34 +0000)] 
net: dsa: yt921x: Fix spelling mistake "stucked" -> "stuck"

There is a spelling mistake in a dev_err message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251101183446.32134-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: phy: realtek: add interrupt support for RTL8221B
Jianhui Zhao [Sun, 2 Nov 2025 15:26:37 +0000 (16:26 +0100)] 
net: phy: realtek: add interrupt support for RTL8221B

This commit introduces interrupt support for RTL8221B (C45 mode).
Interrupts are mapped on the VEND2 page. VEND2 registers are only
accessible via C45 reads and cannot be accessed by C45 over C22.

Signed-off-by: Jianhui Zhao <zhaojh329@gmail.com>
[Enable only link state change interrupts]
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251102152644.1676482-1-olek2@wp.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agohinic3: fix misleading error message in hinic3_open_channel()
Alok Tiwari [Fri, 31 Oct 2025 11:26:44 +0000 (04:26 -0700)] 
hinic3: fix misleading error message in hinic3_open_channel()

The error message printed when hinic3_configure() fails incorrectly
reports "Failed to init txrxq irq", which does not match the actual
operation performed. The hinic3_configure() function sets up various
device resources such as MTU and RSS parameters , not IRQ initialization.

Update the log to "Failed to configure device resources" to make the
message accurate and clearer for debugging.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/20251031112654.46187-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoDocumentation: ARCnet: Update obsolete contact info
Bagas Sanjaya [Tue, 28 Oct 2025 01:44:52 +0000 (08:44 +0700)] 
Documentation: ARCnet: Update obsolete contact info

ARCnet docs states that inquiries on the subsystem should be emailed to
Avery Pennarun <apenwarr@worldvisions.ca>, for whom has been in CREDITS
since the beginning of kernel git history and her email address is
unreachable (bounce). The subsystem is now maintained by Michael
Grzeschik since c38f6ac74c9980 ("MAINTAINERS: add arcnet and take
maintainership").

In addition, there used to be a dedicated ARCnet mailing list but its
archive at epistolary.org has been shut down. ARCnet discussion nowadays
take place in netdev list. The arcnet.com domain mentioned has become
AIoT (Artificial Intelligence of Things) related Typeform page and
ARCnet info now resides on arcnet.cc (ARCnet Resource Center) instead.

Update contact information.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20251028014451.10521-2-bagasdotme@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'dpll-add-support-for-phase-adjustment-granularity'
Jakub Kicinski [Sat, 1 Nov 2025 00:59:21 +0000 (17:59 -0700)] 
Merge branch 'dpll-add-support-for-phase-adjustment-granularity'

Ivan Vecera says:

====================
dpll: Add support for phase adjustment granularity

Phase-adjust values are currently limited only by a min-max range. Some
hardware requires, for certain pin types, that values be multiples of
a specific granularity, as in the zl3073x driver.

Patch 1: Adds 'phase-adjust-gran' pin attribute and an appropriate
         handling
Patch 2: Adds a support for this attribute into zl3073x driver
====================

Link: https://patch.msgid.link/20251029153207.178448-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agodpll: zl3073x: Specify phase adjustment granularity for pins
Ivan Vecera [Wed, 29 Oct 2025 15:32:07 +0000 (16:32 +0100)] 
dpll: zl3073x: Specify phase adjustment granularity for pins

Output pins phase adjustment values in the device are expressed
in half synth clock cycles. Use this number of cycles as output
pins' phase adjust granularity and simplify both get/set callbacks.

Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Petr Oros <poros@redhat.com>
Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20251029153207.178448-3-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agodpll: add phase-adjust-gran pin attribute
Ivan Vecera [Wed, 29 Oct 2025 15:32:06 +0000 (16:32 +0100)] 
dpll: add phase-adjust-gran pin attribute

Phase-adjust values are currently limited by a min-max range. Some
hardware requires, for certain pin types, that values be multiples of
a specific granularity, as in the zl3073x driver.

Add a `phase-adjust-gran` pin attribute and an appropriate field in
dpll_pin_properties. If set by the driver, use its value to validate
user-provided phase-adjust values.

Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Petr Oros <poros@redhat.com>
Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20251029153207.178448-2-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'net-pse-pd-add-tps23881b-support'
Jakub Kicinski [Sat, 1 Nov 2025 00:56:33 +0000 (17:56 -0700)] 
Merge branch 'net-pse-pd-add-tps23881b-support'

Thomas Wismer says:

====================
net: pse-pd: Add TPS23881B support

This patch series aims at adding support for the TI TPS23881B PoE
PSE controller.
====================

Link: https://patch.msgid.link/20251029212312.108749-1-thomas@wismer.xyz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agodt-bindings: pse-pd: ti,tps23881: Add TPS23881B
Thomas Wismer [Wed, 29 Oct 2025 21:23:10 +0000 (22:23 +0100)] 
dt-bindings: pse-pd: ti,tps23881: Add TPS23881B

Add the TPS23881B I2C power sourcing equipment controller to the list of
supported devices.

Falling back to the TPS23881 predecessor device is not suitable as firmware
loading needs to handled differently by the driver. The TPS23881 and
TPS23881B devices require different firmware. Trying to load the TPS23881
firmware on a TPS23881B device fails and must therefore be omitted.

Signed-off-by: Thomas Wismer <thomas.wismer@scs.ch>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20251029212312.108749-3-thomas@wismer.xyz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: pse-pd: tps23881: Add support for TPS23881B
Thomas Wismer [Wed, 29 Oct 2025 21:23:09 +0000 (22:23 +0100)] 
net: pse-pd: tps23881: Add support for TPS23881B

The TPS23881B uses different firmware than the TPS23881. Trying to load the
TPS23881 firmware on a TPS23881B device fails and must be omitted.

The TPS23881B ships with a more recent ROM firmware. Moreover, no updated
firmware has been released yet and so the firmware loading step must be
skipped. As of today, the TPS23881B is intended to use its ROM firmware.

Signed-off-by: Thomas Wismer <thomas.wismer@scs.ch>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20251029212312.108749-2-thomas@wismer.xyz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoDocumentation: netconsole: Separate literal code blocks for full and short netcat...
Bagas Sanjaya [Thu, 30 Oct 2025 07:50:13 +0000 (14:50 +0700)] 
Documentation: netconsole: Separate literal code blocks for full and short netcat command name versions

Both full and short (abbreviated) command name versions of netcat
example are combined in single literal code block due to 'or::'
paragraph being indented one more space than the preceding paragraph
(before the short version example).

Unindent it to separate the versions.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20251030075013.40418-1-bagasdotme@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'net-phy-microchip_t1s-add-support-for-lan867x-rev-d0-phy'
Jakub Kicinski [Fri, 31 Oct 2025 23:52:08 +0000 (16:52 -0700)] 
Merge branch 'net-phy-microchip_t1s-add-support-for-lan867x-rev-d0-phy'

Parthiban Veerasooran says:

====================
net: phy: microchip_t1s: Add support for LAN867x Rev.D0 PHY

This patch series adds support for the latest Microchip LAN8670/1/2 Rev.D0
10BASE-T1S PHYs to the microchip_t1s driver.

The new Rev.D0 silicon introduces updated initialization requirements and
link status handling behavior compared to earlier revisions (Rev.C2 and
below). These updates are necessary for full compliance with the OPEN
Alliance 10BASE-T1S specification and are documented in Microchip
Application Note AN1699 Revision G (DS60001699G – October 2025).

Summary of changes:
- Implements Rev.D0-specific configuration sequence as described in AN1699
  Rev.G.
- Introduces link status control configuration for LAN867x Rev.D0.
====================

Link: https://patch.msgid.link/20251030102258.180061-1-parthiban.veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: phy: microchip_t1s: configure link status control for LAN867x Rev.D0
Parthiban Veerasooran [Thu, 30 Oct 2025 10:22:58 +0000 (15:52 +0530)] 
net: phy: microchip_t1s: configure link status control for LAN867x Rev.D0

Configure the link status in the Link Status Control register for
LAN8670/1/2 Rev.D0 PHYs, depending on whether PLCA or CSMA/CD mode
is enabled. When PLCA is enabled, the link status reflects the PLCA
status. When PLCA is disabled (CSMA/CD mode), the PHY does not support
autonegotiation, so the link status is forced active by setting
the LINK_STATUS_SEMAPHORE bit.

The link status control is configured:
- During PHY initialization, for default CSMA/CD mode.
- Whenever PLCA configuration is updated.

This ensures correct link reporting and consistent behavior for
LAN867x Rev.D0 devices.

Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251030102258.180061-3-parthiban.veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: phy: microchip_t1s: add support for Microchip LAN867X Rev.D0 PHY
Parthiban Veerasooran [Thu, 30 Oct 2025 10:22:57 +0000 (15:52 +0530)] 
net: phy: microchip_t1s: add support for Microchip LAN867X Rev.D0 PHY

Add support for the LAN8670/1/2 Rev.D0 10BASE-T1S PHYs from Microchip.
The new Rev.D0 silicon requires a specific set of initialization
settings to be configured for optimal performance and compliance with
OPEN Alliance specifications, as described in Microchip Application Note
AN1699 (Revision G, DS60001699G – October 2025).
https://www.microchip.com/en-us/application-notes/an1699

Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251030102258.180061-2-parthiban.veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: qcom-ethqos: remove MAC_CTRL_REG modification
Russell King (Oracle) [Thu, 30 Oct 2025 10:20:32 +0000 (10:20 +0000)] 
net: stmmac: qcom-ethqos: remove MAC_CTRL_REG modification

When operating in "SGMII" mode (Cisco SGMII or 2500BASE-X), qcom-ethqos
modifies the MAC control register in its ethqos_configure_sgmii()
function, which is only called from one path:

stmmac_mac_link_up()
+- reads MAC_CTRL_REG
+- masks out priv->hw->link.speed_mask
+- sets bits according to speed (2500, 1000, 100, 10) from priv->hw.link.speed*
+- ethqos_fix_mac_speed()
|  +- qcom_ethqos_set_sgmii_loopback(false)
|  +- ethqos_update_link_clk(speed)
|  `- ethqos_configure(speed)
|     `- ethqos_configure_sgmii(speed)
|        +- reads MAC_CTRL_REG,
|        +- configures PS/FES bits according to speed
|        `- writes MAC_CTRL_REG as the last operation
+- sets duplex bit(s)
+- stmmac_mac_flow_ctrl()
+- writes MAC_CTRL_REG if changed from original read
...

As can be seen, the modification of the control register that
stmmac_mac_link_up() overwrites the changes that ethqos_fix_mac_speed()
does to the register. This makes ethqos_configure_sgmii()'s
modification questionable at best.

Analysing the values written, GMAC4 sets the speed bits as:
speed_mask = GMAC_CONFIG_FES | GMAC_CONFIG_PS
speed2500 = GMAC_CONFIG_FES                     B14=1 B15=0
speed1000 = 0                                   B14=0 B15=0
speed100 = GMAC_CONFIG_FES | GMAC_CONFIG_PS     B14=1 B15=1
speed10 = GMAC_CONFIG_PS                        B14=0 B15=1

Whereas ethqos_configure_sgmii():
2500: clears ETHQOS_MAC_CTRL_PORT_SEL           B14=X B15=0
1000: clears ETHQOS_MAC_CTRL_PORT_SEL           B14=X B15=0
100: sets ETHQOS_MAC_CTRL_PORT_SEL |            B14=1 B15=1
          ETHQOS_MAC_CTRL_SPEED_MODE
10: sets ETHQOS_MAC_CTRL_PORT_SEL               B14=0 B15=1
    clears ETHQOS_MAC_CTRL_SPEED_MODE

Thus, they appear to be doing very similar, with the exception of the
FES bit (bit 14) for 1G and 2.5G speeds.

Given that stmmac_mac_link_up() will write the MAC_CTRL_REG after
ethqos_configure_sgmii(), remove the unnecessary update in the
glue driver's ethqos_configure_sgmii() method, simplifying the code.

Konrad states:

Without any additional knowledge, the register description says:

2500: B14=1 B15=0
1000: B14=0 B15=0
 100: B14=1 B15=1
  10: B14=0 B15=1

Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vEPlg-0000000CFHY-282A@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>