git.ipfire.org Git - thirdparty/kernel/linux.git/log

Merge branch 'net-convert-drivers-to-get_rx_ring_count-part-2'

Breno Leitao says:

====================
net: convert drivers to .get_rx_ring_count (part 2)

Commit 84eaf4359c36 ("net: ethtool: add get_rx_ring_count callback to
optimize RX ring queries") added specific support for GRXRINGS callback,
simplifying .get_rxnfc.

Remove the handling of GRXRINGS in .get_rxnfc() by moving it to the new
.get_rx_ring_count().

This simplifies the RX ring count retrieval and aligns the following
drivers with the new ethtool API for querying RX ring parameters.
  * engleder/tsnep
  * mediatek
  * amazon/ena
  * microchip/lan743x
  * amd/xgbe
  * chelsio/cxgb4
  * wangxun/txgbe
  * cadence/macb

All of these change were compile-tested only.
====================

Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-0-b3e1b58bced5@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: txgbe: convert to use .get_rx_ring_count

Use the newly introduced .get_rx_ring_count ethtool ops callback instead
of handling ETHTOOL_GRXRINGS directly in .get_rxnfc().

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-9-b3e1b58bced5@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: macb: convert to use .get_rx_ring_count

Use the newly introduced .get_rx_ring_count ethtool ops callback instead
of handling ETHTOOL_GRXRINGS directly in .get_rxnfc().

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-8-b3e1b58bced5@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: cxgb4: convert to use .get_rx_ring_count

Use the newly introduced .get_rx_ring_count ethtool ops callback instead
of handling ETHTOOL_GRXRINGS directly in .get_rxnfc().

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-7-b3e1b58bced5@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: xgbe: convert to use .get_rx_ring_count

Use the newly introduced .get_rx_ring_count ethtool ops callback instead
of handling ETHTOOL_GRXRINGS directly in .get_rxnfc().

Since ETHTOOL_GRXRINGS was the only command handled by xgbe_get_rxnfc(),
remove the function entirely.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-6-b3e1b58bced5@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan743x: convert to use .get_rx_ring_count

Use the newly introduced .get_rx_ring_count ethtool ops callback instead
of handling ETHTOOL_GRXRINGS directly in .get_rxnfc().

Since ETHTOOL_GRXRINGS was the only command handled by
lan743x_ethtool_get_rxnfc(), remove the function entirely.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-5-b3e1b58bced5@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ena: convert to use .get_rx_ring_count

Use the newly introduced .get_rx_ring_count ethtool ops callback instead
of handling ETHTOOL_GRXRINGS directly in .get_rxnfc().

Since ETHTOOL_GRXRINGS was the only useful command handled by
ena_get_rxnfc(), remove the function entirely.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com>
Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-4-b3e1b58bced5@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mediatek: convert to use .get_rx_ring_count

Use the newly introduced .get_rx_ring_count ethtool ops callback instead
of handling ETHTOOL_GRXRINGS directly in .get_rxnfc().

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-3-b3e1b58bced5@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: tsnep: convert to use .get_rx_ring_count

Use the newly introduced .get_rx_ring_count ethtool ops callback instead
of handling ETHTOOL_GRXRINGS directly in .get_rxnfc().

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-2-b3e1b58bced5@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'selftests-net-improve-error-handling-in-passive-tfo-test'

Yohei Kojima says:

====================
selftests: net: improve error handling in passive TFO test

This series improves error handling in the passive TFO test by (1)
fixing a broken behavior when the child processes failed (or timed out),
and (2) adding more error handlng code in the test program.

The first patch fixes the behavior that the test didn't report failure
even if the server or the client process exited with non-zero status.
The second patch adds error handling code in the test program to improve
reliability of the test.
====================

Link: https://patch.msgid.link/cover.1768312014.git.yk@y-koj.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: improve error handling in passive TFO test

Improve the error handling in passive TFO test to check the return value
from sendto(), and to fail if read() or fprintf() failed.

Signed-off-by: Yohei Kojima <yk@y-koj.net>
Link: https://patch.msgid.link/24707c8133f7095c0e5a94afa69e75c3a80bf6e7.1768312014.git.yk@y-koj.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: fix passive TFO test to fail if child processes failed

Improve the passive TFO test to report failure if the server or the
client timed out or exited with non-zero status.

Before this commit, TFO test didn't fail even if exit(EXIT_FAILURE) is
added to the first line of the run_server() and run_client() functions.

Signed-off-by: Yohei Kojima <yk@y-koj.net>
Link: https://patch.msgid.link/214d399caec2e5de7738ced5736829915d507e4e.1768312014.git.yk@y-koj.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-phy-realtek-simplify-and-reunify-c22-c45-drivers'

Daniel Golle says:

====================
net: phy: realtek: simplify and reunify C22/C45 drivers

The RTL8221B PHY variants (VB-CG and VM-CG) were previously split into
separate C22 and C45 driver instances to support copper SFP modules
using the RollBall MDIO-over-I2C protocol, which only supports Clause-45
access. However, this split created significant code duplication and
complexity.

Commit 8af2136e77989 ("net: phy: realtek: add helper
RTL822X_VND2_C22_REG") exposed that RealTek PHYs map all standard
Clause-22 registers into MDIO_MMD_VEND2 at offset 0xa400.

With commit 1850ec20d6e71 ("net: phy: realtek: use paged access for
MDIO_MMD_VEND2 in C22 mode") it is now possible to access all MMD
registers transparently, regardless of whether the PHY is accessed via
C22 or C45 MDIO.

Further improve the translation logic for this register mapping, so a
single unified driver works efficiently with both access methods,
reducing code duplication.

The series also includes cleanup to remove unnecessary paged operations
on registers that aren't actually affected by page selection.

Testing was done on RTL8211F and RTL8221B-VB-CG (the latter in both
C22 and C45 modes).
====================

Link: https://patch.msgid.link/cover.1768275364.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: simplify bogus paged operations

Only registers 0x10~0x17 are affected by the value in the page
selection register 0x1f. Hence there is no point in using paged
operations when accessing any other registers.
Simplify the driver by using the normal phy_read and phy_write
operations for registers which are anyway not affected by paging.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/0c5cbb66ce3e72a011d76f8c3d61ebcac44483bb.1768275364.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: demystify PHYSR register location

Turns out that register address RTL_VND2_PHYSR (0xa434) maps to
Clause-22 register MII_RESV2. Use that to get rid of yet another magic
number, and rename access macros accordingly.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/6ed246e0aa3ca8038d2fa432d51518959fb89b6b.1768275364.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: reunify C22 and C45 drivers

Reunify the split C22/C45 drivers for the RTL8221B-VB-CG 2.5Gbps and
RTL8221B-VM-CG 2.5Gbps PHYs back into a single driver.

This is possible now by using all the driver operations previously used
by the C45 driver, as transparent access to all MMDs including
MDIO_MMD_VEND2 is now possible also over Clause-22 MDIO.

The unified driver will still only use Clause-45 access on any Clause-45
capable busses while still working fine on Clause-22 busses.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/bffcb85fdc20e07056976962d3caaa1be5d0ddb0.1768275364.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: simplify C22 reg access via MDIO_MMD_VEND2

RealTek 2.5GE PHYs have all standard Clause-22 registers mapped also
inside MDIO_MMD_VEND2 at offset 0xa400. This is used mainly in case the
PHY is connected to a Clause-45-only bus. The RTL8221B is frequently
used in copper SFP module which uses the RollBall MDIO-over-I2C
method which *only* supports Clause-45, for example.

In order to support using the PHY on Clause-45-only busses, the PHY
driver has previously been split into a C22-only and C45-only instances,
creating quite a bit of redundancy and confusion.

In preparation of reunifying the two driver instances, add support for
translating MDIO_MMD_VEND2 registers 0xa400 to 0xa43c back to Clause-22
registers 0 to 30 in case the PHY is accessed on a Clause-22 bus.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/fd49d86bd0445b76269fd3ea456c709c2066683f.1768275364.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: support interrupt also for C22 variants

Now that access to MDIO_MMD_VEND2 works transparently also in Clause-22
mode, add interrupt support also for the C22 variants of the
RTL8221B-VB-CG and RTL8221B-VM-CG. This results in the C22 and C45
driver instances now having all the same features implemented.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/7620084b1de01580edc2d0e1b9548507fb4643a8.1768275364.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: fix dwmac4 transmit performance regression

dwmac4's transmit performance dropped by a factor of four due to an
incorrect assumption about which definitions are for what. This
highlights the need for sane register macros.

Commit 8409495bf6c9 ("net: stmmac: cores: remove many xxx_SHIFT
definitions") changed the way the txpbl value is merged into the
register:

        value = readl(ioaddr + DMA_CHAN_TX_CONTROL(dwmac4_addrs, chan));
-       value = value | (txpbl << DMA_BUS_MODE_PBL_SHIFT);
+       value = value | FIELD_PREP(DMA_BUS_MODE_PBL, txpbl);

With the following in the header file:

#define DMA_BUS_MODE_PBL               BIT(16)
-#define DMA_BUS_MODE_PBL_SHIFT         16

The assumption here was that DMA_BUS_MODE_PBL was the mask for
DMA_BUS_MODE_PBL_SHIFT, but this turns out not to be the case.

The field is actually six bits wide, buts 21:16, and is called
TXPBL.

What's even more confusing is, there turns out to be a PBLX8
single bit in the DMA_CHAN_CONTROL register (0x1100 for channel 0),
and DMA_BUS_MODE_PBL seems to be used for that. However, this bit
et.al. was listed under a comment "/* DMA SYS Bus Mode bitmap */"
which is for register 0x1004.

Fix this up by adding an appropriately named field definition under
the DMA_CHAN_TX_CONTROL() register address definition.

Move the RPBL mask definition under DMA_CHAN_RX_CONTROL(), correctly
renaming it as well.

Also move the PBL bit definition under DMA_CHAN_CONTROL(), correctly
renaming it.

This removes confusion over the PBL fields.

Fixes: 8409495bf6c9 ("net: stmmac: cores: remove many xxx_SHIFT definitions")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Bisected-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://lore.kernel.org/51859704-57fd-4913-b09d-9ac58a57f185@bootlin.com
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1vgY1k-00000003vOC-0Z1H@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: move tcp_rate_skb_sent() to tcp_output.c

It is only called from __tcp_transmit_skb() and __tcp_retransmit_skb().

Move it in tcp_output.c and make it static.

clang compiler is now able to inline it from __tcp_transmit_skb().

gcc compiler inlines it in the two callers, which is also fine.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20260114165109.1747722-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: ti: cpsw_ale: Remove obsolete macros

- ALE_VERSION_MAJOR/MINOR are no longer used following the transition to
  regmaps in commit bbfc7e2b9ebe ("net: ethernet: ti: cpsw_ale: use
  regfields for ALE registers")
- ALE_VERSION_IR3 is unused since entry mask bits are no longer
  hardcoded with commit b5d31f294027 ("net: ethernet: ti: ale: optimize
  ale entry mask bits configuartion")
- ALE_VERSION_IR4 has never been used since its introduction in commit
  ca47130a744b ("net: netcp: ale: update to support unknown vlan
  controls for NU switch")

Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com>
Link: https://patch.msgid.link/20260114144425.3973272-1-stefan.wiehler@nokia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: cake: avoid separate allocation of struct cake_sched_config

Paolo pointed out that we can avoid separately allocating struct
cake_sched_config even in the non-mq case, by embedding it into struct
cake_sched_data. This reduces the complexity of the logic that swaps the
pointers and frees the old value, at the cost of adding 56 bytes to the
latter. Since cake_sched_data is already almost 17k bytes, this seems
like a reasonable tradeoff.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Fixes: bc0ce2bad36c ("net/sched: sch_cake: Factor out config variables into separate struct")
Link: https://patch.msgid.link/20260113143157.2581680-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: tls: Enhance TLS resync async process documentation

Expand the tls-offload.rst documentation to provide a more detailed
explanation of the asynchronous resync process, including the role
of struct tls_offload_resync_async in managing resync requests on
the kernel side.

Also, add documentation for helper functions
tls_offload_rx_resync_async_request_start/ _end/ _cancel.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1768298883-1602599-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'uapi-use-uapi-definitions-of-int_max-and-int_min'

Thomas Weißschuh says:

====================
uapi: Use UAPI definitions of INT_MAX and INT_MIN

Using <limits.h> to gain access to INT_MAX and INT_MIN introduces a
dependency on a libc, which UAPI headers should not do.

Introduce and use equivalent UAPI constants.
====================

Link: https://patch.msgid.link/20260113-uapi-limits-v2-0-93c20f4b2c1a@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netfilter: uapi: Use UAPI definition of INT_MAX and INT_MIN

Using <limits.h> to gain access to INT_MAX and INT_MIN introduces a
dependency on a libc, which UAPI headers should not do.

Use the equivalent UAPI constants.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20260113-uapi-limits-v2-3-93c20f4b2c1a@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: uapi: Use UAPI definition of INT_MAX

Using <limits.h> to gain access to INT_MAX introduces a dependency on a
libc, which UAPI headers should not do.

Use the equivalent UAPI constant.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20260113-uapi-limits-v2-2-93c20f4b2c1a@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

uapi: add INT_MAX and INT_MIN constants

Some UAPI headers use INT_MAX and INT_MIN. Currently they include
<limits.h> for their definitions, which introduces a problematic
dependency on libc.

Add custom, namespaced definitions of INT_MAX and INT_MIN using the
same values as the regular kernel code.
These definitions are not added to uapi/linux/limits.h, as that header
will conflict with libc definitions on some platforms.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20260113-uapi-limits-v2-1-93c20f4b2c1a@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: usb: sr9700: remove code to drive nonexistent MII

This device does not have a MII, even though the driver
contains code to drive one (because it originated as a copy of the
dm9601 driver). It also only supports 10Mbps half-duplex
operation (the DM9601 registers to set the speed/duplex mode
are read-only). Remove all MII-related code and implement
sr9700_get_link_ksettings which returns hardcoded correct
information for the link speed and duplex mode. Also add
announcement of the link status like many other Ethernet
drivers have.

Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Link: https://patch.msgid.link/20260113040649.54248-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-pcs-rzn1-miic-support-configurable-phy_link-polarity'

Lad Prabhakar says:

====================
net: pcs: rzn1-miic: Support configurable PHY_LINK polarity

This series adds support for configuring the active level of MIIC
PHY_LINK status signals on Renesas RZ/N1 and RZ/T2H/N2H platforms.

The MIIC block provides dedicated hardware PHY_LINK signals that indicate
EtherPHY link-up and link-down status independently of whether the MAC
(GMAC) or Ethernet switch (ETHSW) is used. While GMAC-based systems
typically obtain link state via MDIO and handle it in software, the
ETHSW relies on these PHY_LINK pins for both CPU-assisted operation and
switch-only forwarding paths that do not involve the host processor.

These hardware PHY_LINK signals are particularly important for use cases
requiring fast reaction to link-down events, such as redundancy protocols
including Device Level Ring (DLR). In such scenarios, relying solely on
software-based link detection introduces latency that can negatively
impact recovery time. The ETHSW therefore exposes PHY_LINK signals to
enable immediate hardware-level detection of cable or port failures.

Some systems require the PHY_LINK signal polarity to be configured as
active low rather than the default active high. This series introduces a
new DT property to describe the required polarity and adds corresponding
driver support to program the MIIC PHY_LINK register accordingly. The
configuration is accumulated during DT parsing and applied once hardware
initialization is complete, taking into account SoC-specific differences
between RZ/N1 and RZ/T2H/N2H.
====================

Link: https://patch.msgid.link/20260112173555.1166714-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: pcs: rzn1-miic: Add PHY_LINK active-level configuration support

Add support to configure the active level of MIIC PHY_LINK status signals
on a per-converter basis using a DT property.

MIIC provides dedicated PHY_LINK signals that indicate EtherPHY link-up and
link-down status in hardware. These signals are required regardless of
whether GMAC or ETHSW is used. With GMAC, link state is retrieved via
MDC/MDIO and handled in software, while ETHSW relies on PHY_LINK pins for
both CPU-assisted operation and switch-only data paths that do not involve
the host.

Hardware PHY_LINK signals are also critical for fast reaction to link-down
events, for example when running redundancy protocols such as Device Level
Ring (DLR), where rapid detection of cable faults is required to switch to
an alternate path without software latency.

Parse the requested polarity from DT, accumulate the configuration during
probing, and apply it to the MIIC_PHY_LINK register once hardware
initialization is complete, when the registers can be safely modified.
Handle SoC-specific bit layout differences between RZ/N1 and RZ/T2H/N2H
within the driver.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://patch.msgid.link/20260112173555.1166714-3-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: pcs: renesas,rzn1-miic: Add phy_link property

Add the renesas,miic-phy-link-active-low property to allow configuring
the active level of phy_link status signals provided by the MIIC block.

EtherPHY link-up and link-down status is required as a hardware IP
feature independent of whether GMAC or ETHSW is used. With GMAC, link
state is retrieved via MDC/MDIO and handled in software. In contrast,
ETHSW exposes dedicated PHY_LINK pins that provide this information
directly in hardware.

These PHY_LINK signals are required not only for host-controlled traffic
but also for switch-only forwarding paths where frames are exchanged
between external nodes without CPU involvement. This is particularly
important for redundancy protocols such as DLR (Device Level Ring),
which depend on fast detection of link-down events caused by cable or
port failures. Handling such events purely in software introduces
latency, which is why ETHSW provides dedicated hardware PHY_LINK pins.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260112173555.1166714-2-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mctp i2c: initialise event handler read bytes

Set a 0xff value for i2c reads of an mctp-i2c device. Otherwise reads
will return "val" from the i2c bus driver. For i2c-aspeed and
i2c-npcm7xx that is a stack uninitialised u8.

Tested with "i2ctransfer -y 1 r10@0x34" where 0x34 is a mctp-i2c
instance, now it returns all 0xff.

Fixes: f5b8abf9fc3d ("mctp i2c: MCTP I2C binding driver")
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://patch.msgid.link/20260113-mctp-read-fix-v1-1-70c4b59c741c@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

xgbe: Use netlink extack to report errors to ethtool

Upgrade XGBE driver to report errors via netlink extack instead
of netdev_error so ethtool userspace can be aware of failures.

Signed-off-by: Vishal Badole <Vishal.Badole@amd.com>
Reviewed-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20260114080357.1778132-1-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Fix build break on non-x86 platforms

Commit c470195b989fe added .getcrosststamp() interface where
the code uses boot_cpu_has() function which is available only
in x86 platforms. This fails the build on any other platform.

Since the interface is going to be supported only on x86 anyway,
we can simply compile out the entire support on non-x86 platforms.

Cover the .getcrosststamp support under CONFIG_X86

Fixes: c470195b989f ("bnxt_en: Add PTP .getcrosststamp() interface to get device/host times")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202601111808.WnBJCuWI-lkp@intel.com
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20260113183422.508851-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

hinic3: add WQ_PERCPU to alloc_workqueue users

This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

The refactoring is going to alter the default behavior of
alloc_workqueue() to be unbound by default.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU. For more details see the Link tag below.

In order to keep alloc_workqueue() behavior identical, explicitly request
WQ_PERCPU.

Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/20260113151433.257320-1-marco.crivellari@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: fix in-band capabilities for 2.5G PHYs

It looks like the configuration of in-band AN only affects SGMII, and it
is always disabled for 2500Base-X. Adjust the reported capabilities
accordingly.

This is based on testing using OpenWrt on Zyxel XGS1010-12 rev A1 with
RTL8226-CG, and Zyxel XGS1210-12 rev B1 with RTL8221B-VB-CG. On these
devices, 2500Base-X in-band AN is known to work with some SFP modules
(containing an unknown PHY). However, with the built-in Realtek PHYs,
no auto-negotiation takes place, irrespective of the configuration of
the PHY.

Fixes: 10fbd71fc5f9b ("net: phy: realtek: implement configuring in-band an")
Signed-off-by: Jan Hoffmann <jan@3e8.eu>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/20260113205557.503409-1-jan@3e8.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: minor __alloc_skb() optimization

We can directly call __finalize_skb_around()
instead of __build_skb_around() because @size is not zero.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260113131017.2310584-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: remove unused fixup unregistering functions

No user of PHY fixups unregisters these. IOW: The fixup unregistering
functions are unused and can be removed. Remove also documentation
for these functions. Whilst at it, remove also mentioning of
phy_register_fixup() from the Documentation, as this function has been
static since ea47e70e476f ("net: phy: remove fixup-related definitions
from phy.h which are not used outside phylib").

Fixup unregistering functions were added with f38e7a32ee4f
("phy: add phy fixup unregister functions") in 2016, and last user
was removed with 6782d06a47ad ("net: usb: lan78xx: Remove KSZ9031 PHY
fixup") in 2024.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/ff8ac321-435c-48d0-b376-fbca80c0c22e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: usb: sr9700: fix byte numbering in comments

The comments describing the RX/TX headers and status response use
a combination of 0- and 1-based indexing, leading to confusion. Correct
the numbering and make it consistent. Also fix a typo "pm" for "pn".

This issue also existed in dm9601 and was fixed in commit 61189c78bda8
("dm9601: trivial comment fixes").

Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Peter Korsgaard <peter@korsgaard.com>
Link: https://patch.msgid.link/20260113075327.85435-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: dnet: remove driver

This legacy platform driver was used with some Qong board.
Support for this board was removed with
commit c93197b0041d ("ARM: imx: Remove i.MX31 board files")
in 2020. So remove this now orphaned driver.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/cef7c728-28ee-439f-b747-eb1c9394fe51@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-airoha-init-block-ack-memory-region-for-mt7996-npu-offloading'

Lorenzo Bianconi says:

====================
net: airoha: Init Block Ack memory region for MT7996 NPU offloading

This is a preliminary series in order to enable NPU offloading for
MT7996 (Eagle) chipset.
====================

Link: https://patch.msgid.link/20260108-airoha-ba-memory-region-v3-0-bf1814e5dcc4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: airoha: npu: Init BA memory region if provided via DTS

Initialize NPU Block Ack memory region if reserved via DTS.
Block Ack memory region is used by NPU MT7996 (Eagle) offloading.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260108-airoha-ba-memory-region-v3-2-bf1814e5dcc4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: airoha: npu: Add BA memory region

Introduce Block Ack memory region used by NPU MT7996 (Eagle) offloading.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260108-airoha-ba-memory-region-v3-1-bf1814e5dcc4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-phy-adin-enable-configuration-of-the-lp-termination-register'

Osose Itua says:

====================
net: phy: adin: enable configuration of the LP Termination Register
====================

Link: https://patch.msgid.link/20260107221913.1334157-1-osose.itua@savoirfairelinux.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: adin: enable configuration of the LP Termination Register

The ADIN1200/ADIN1300 provide a control bit that selects between normal
receive termination and the lowest common mode impedance for 100BASE-TX
operation. This behavior is controlled through the Low Power Termination
register (B_100_ZPTM_EN_DIMRX).

Bit 0 of this register enables normal termination when set (this is the
default), and selects the lowest common mode impedance when cleared.

Signed-off-by: Osose Itua <osose.itua@savoirfairelinux.com>
Acked-by: Nuno Sá <nuno.sa@analog.com>
Link: https://patch.msgid.link/20260107221913.1334157-3-osose.itua@savoirfairelinux.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: adi,adin: document LP Termination property

Add "adi,low-cmode-impedance" boolean property which, when present,
configures the PHY for the lowest common-mode impedance on the receive
pair for 100BASE-TX operation by clearing the B_100_ZPTM_EN_DIMRX bit.
This is suited for capacitive coupled applications and other
applications where there may be a path for high common-mode noise to
reach the PHY.

If this value is not present, the value of the bit by default is 1,
which is normal termination (zero-power termination) mode.

Signed-off-by: Osose Itua <osose.itua@savoirfairelinux.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Nuno Sá <nuno.sa@analog.com>
Link: https://patch.msgid.link/20260107221913.1334157-2-osose.itua@savoirfairelinux.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'phy_common_properties' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy

Vinod Koul says:

====================
phy common properties

Introduce "rx-polarity" and "tx-polarity" device tree properties
with Kunit tests (from Vladimir Oltean).

* tag 'phy_common_properties' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
  phy: add phy_get_rx_polarity() and phy_get_tx_polarity()
  dt-bindings: phy-common-props: RX and TX lane polarity inversion
  dt-bindings: phy-common-props: ensure protocol-names are unique
  dt-bindings: phy-common-props: create a reusable "protocol-names" definition
  dt-bindings: phy: rename transmit-amplitude.yaml to phy-common-props.yaml
====================

Link: https://patch.msgid.link/aWeXvFcGNK5T6As9@vaman
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: Clarify len/n_stats fields in/out semantics

Document that the 'len' field in ethtool_gstrings and 'n_stats' field in
ethtool_stats optionally serve dual purposes: on entry they specify the
number of items requested, and on return they indicate the number
actually returned (which is not necessarily the same).

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://patch.msgid.link/20260115060544.481550-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.19-rc6).

No conflicts, or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth, can and IPsec.

  Current release - regressions:

   - net: add net.core.qdisc_max_burst

   - can: propagate CAN device capabilities via ml_priv

  Previous releases - regressions:

   - dst: fix races in rt6_uncached_list_del() and
     rt_del_uncached_list()

   - ipv6: fix use-after-free in inet6_addr_del().

   - xfrm: fix inner mode lookup in tunnel mode GSO segmentation

   - ip_tunnel: spread netdev_lockdep_set_classes()

   - ip6_tunnel: use skb_vlan_inet_prepare() in __ip6_tnl_rcv()

   - bluetooth: hci_sync: enable PA sync lost event

   - eth: virtio-net:
      - fix the deadlock when disabling rx NAPI
      - fix misalignment bug in struct virtnet_info

  Previous releases - always broken:

   - ipv4: ip_gre: make ipgre_header() robust

   - can: fix SSP_SRC in cases when bit-rate is higher than 1 MBit.

   - eth:
      - mlx5e: profile change fix
      - octeon_ep_vf: fix free_irq dev_id mismatch in IRQ rollback
      - macvlan: fix possible UAF in macvlan_forward_source()"

* tag 'net-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits)
  virtio_net: Fix misalignment bug in struct virtnet_info
  net: can: j1939: j1939_xtp_rx_rts_session_active(): deactivate session upon receiving the second rts
  can: raw: instantly reject disabled CAN frames
  can: propagate CAN device capabilities via ml_priv
  Revert "can: raw: instantly reject unsupported CAN frames"
  net/sched: sch_qfq: do not free existing class in qfq_change_class()
  selftests: drv-net: fix RPS mask handling for high CPU numbers
  selftests: drv-net: fix RPS mask handling in toeplitz test
  ipv6: Fix use-after-free in inet6_addr_del().
  dst: fix races in rt6_uncached_list_del() and rt_del_uncached_list()
  net: hv_netvsc: reject RSS hash key programming without RX indirection table
  tools: ynl: render event op docs correctly
  net: add net.core.qdisc_max_burst
  net: airoha: Fix typo in airoha_ppe_setup_tc_block_cb definition
  net: phy: motorcomm: fix duplex setting error for phy leds
  net: octeon_ep_vf: fix free_irq dev_id mismatch in IRQ rollback
  net/mlx5e: Restore destroying state bit after profile cleanup
  net/mlx5e: Pass netdev to mlx5e_destroy_netdev instead of priv
  net/mlx5e: Don't store mlx5e_priv in mlx5e_dev devlink priv
  net/mlx5e: Fix crash on profile change rollback failure
  ...

Merge tag 'linux-can-fixes-for-6.19-20260115' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2026-01-15

this is a pull request of 4 patches for net/main, it super-seeds the
"can 2026-01-14" pull request. The dev refcount leak in patch #3 is
fixed.

The first 3 patches are by Oliver Hartkopp and revert the approach to
instantly reject unsupported CAN frames introduced in
net-next-for-v6.19 and replace it by placing the needed data into the
CAN specific ml_priv.

The last patch is by Tetsuo Handa and fixes a J1939 refcount leak for
j1939_session in session deactivation upon receiving the second RTS.

linux-can-fixes-for-6.19-20260115

* tag 'linux-can-fixes-for-6.19-20260115' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  net: can: j1939: j1939_xtp_rx_rts_session_active(): deactivate session upon receiving the second rts
  can: raw: instantly reject disabled CAN frames
  can: propagate CAN device capabilities via ml_priv
  Revert "can: raw: instantly reject unsupported CAN frames"
====================

Link: https://patch.msgid.link/20260115090603.1124860-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'ipsec-2026-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2026-01-14

1) Fix inner mode lookup in tunnel mode GSO segmentation.
   The protocol was taken from the wrong field.

2) Set ipv4 no_pmtu_disc flag only on output SAs. The
   insertation of input SAs can fail if no_pmtu_disc
   is set.

Please pull or let me know if there are problems.

ipsec-2026-01-14

* tag 'ipsec-2026-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: set ipv4 no_pmtu_disc flag only on output sa when direction is set
  xfrm: Fix inner mode lookup in tunnel mode GSO segmentation
====================

Link: https://patch.msgid.link/20260114121817.1106134-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: inline napi_skb_cache_get()

clang is inlining it already, gcc (14.2) does not.

Small space cost (215 bytes on x86_64) but faster sk_buff allocations.

$ scripts/bloat-o-meter -t net/core/skbuff.gcc.before.o net/core/skbuff.gcc.after.o
add/remove: 0/1 grow/shrink: 4/1 up/down: 359/-144 (215)
Function                                     old     new   delta
__alloc_skb                                  471     611    +140
napi_build_skb                               245     363    +118
napi_alloc_skb                               331     416     +85
skb_copy_ubufs                              1869    1885     +16
skb_shift                                   1445    1413     -32
napi_skb_cache_get                           112       -    -112
Total: Before=59941, After=60156, chg +0.36%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260112131515.4051589-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-mlx5-hws-single-flow-counter-support'

Tariq Toukan says:

====================
net/mlx5: HWS single flow counter support

This small series refactors the flow counter bulk initialization code
and extends it so that single flow counters are also usable by hardware
steering (HWS) rules.

Patches 1-2 refactor the bulk init path: first by factoring out common
flow counter bulk initialization into mlx5_fc_bulk_init(), then by
splitting the bitmap allocation into mlx5_fs_bulk_bitmap_alloc(), with
no functional changes.

Patch 3 initializes bulk data for counters allocated via
mlx5_fc_single_alloc(), so they can be safely used by HWS rules.
====================

Link: https://patch.msgid.link/1768210825-1598472-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: Initialize bulk for single flow counters

Ensure that flow counters allocated with mlx5_fc_single_alloc() have
bulk correctly initialized so they can safely be used in HWS rules.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768210825-1598472-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: fs, split bulk init

Refactor mlx5_fs_bulk_init() by moving bitmap allocation logic into a
new helper function mlx5_fs_bulk_bitmap_alloc(). This change does not
alter any logic.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768210825-1598472-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: fs, factor out flow counter bulk init

Add mlx5_fc_bulk_init() to handle bulk initialization of flow counters.
This change does not alter any logic, but refactors the code to remove
duplicate initialization logic by centralizing it in a single function.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768210825-1598472-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'introduce-and-use-netif_xmit_timeout_ms-helper'

Tariq Toukan says:

====================
Introduce and use netif_xmit_timeout_ms() helper

This is V2, find V1 here:
https://lore.kernel.org/all/1764054776-1308696-1-git-send-email-tariqt@nvidia.com/

This series by Shahar introduces a new helper function
netif_xmit_timeout_ms() to check if a TX queue has timed out and report
the timeout duration.
It also encapsulates the check for whether the TX queue is stopped.

Replace duplicated open-coded timeout check in hns3 driver with the new
helper.

For mlx5e, refine the TX timeout recovery flow to act only on SQs whose
transmit timestamp indicates an actual timeout, as determined by the
helper. This prevents unnecessary channel reopen events caused by
attempting recovery on queues that are merely stopped but not truly
timed out.
====================

Link: https://patch.msgid.link/1768209383-1546791-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: Refine TX timeout handling to skip non-timed-out SQ

mlx5e_tx_timeout_work() is invoked when the dev_watchdog reports a
timed-out TX queue. Currently, the recovery flow is triggered for all
stopped SQs, which is not always correct — some SQs may be temporarily
stopped without actually timing out. Attempting to recover such SQs
results in no EQE being polled (since no real timeout occurred), which
the driver misinterprets as a recovery failure, unnecessarily causing
channel reopening.

Improve the logic to initiate recovery only for SQs that are both
stopped and timed out. Utilize the helper introduced in the previous
patch to determine whether the netdevice watchdog timeout period has
elapsed since the SQ’s last transmit timestamp.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768209383-1546791-4-git-send-email-tariqt@nvidia.com
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: hns3: Use netif_xmit_timeout_ms() helper

Replace the open-coded TX queue timeout check
in hns3_get_timeout_queue() with a call to
netif_xmit_timeout_ms() helper.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/1768209383-1546791-3-git-send-email-tariqt@nvidia.com
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: Introduce netif_xmit_timeout_ms() helper

Introduce a new helper function netif_xmit_timeout_ms() to check
if a TX queue is stopped and has timed out and report the timeout
duration. This makes the timeout logic reusable, and will be used
in several places in subsequent patches.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768209383-1546791-2-git-send-email-tariqt@nvidia.com
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

virtio_net: Fix misalignment bug in struct virtnet_info

Use the new TRAILING_OVERLAP() helper to fix a misalignment bug
along with the following warning:

drivers/net/virtio_net.c:429:46: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

This helper creates a union between a flexible-array member (FAM)
and a set of members that would otherwise follow it (in this case
`u8 rss_hash_key_data[VIRTIO_NET_RSS_MAX_KEY_SIZE];`). This
overlays the trailing members (rss_hash_key_data) onto the FAM
(hash_key_data) while keeping the FAM and the start of MEMBERS aligned.
The static_assert() ensures this alignment remains.

Notice that due to tail padding in flexible `struct
virtio_net_rss_config_trailer`, `rss_trailer.hash_key_data`
(at offset 83 in struct virtnet_info) and `rss_hash_key_data` (at
offset 84 in struct virtnet_info) are misaligned by one byte. See
below:

struct virtio_net_rss_config_trailer {
        __le16                     max_tx_vq;            /*     0     2 */
        __u8                       hash_key_length;      /*     2     1 */
        __u8                       hash_key_data[];      /*     3     0 */

        /* size: 4, cachelines: 1, members: 3 */
        /* padding: 1 */
        /* last cacheline: 4 bytes */
};

struct virtnet_info {
...
        struct virtio_net_rss_config_trailer rss_trailer; /*    80     4 */

        /* XXX last struct has 1 byte of padding */

        u8                         rss_hash_key_data[40]; /*    84    40 */
...
        /* size: 832, cachelines: 13, members: 48 */
        /* sum members: 801, holes: 8, sum holes: 31 */
        /* paddings: 2, sum paddings: 5 */
};

After changes, those members are correctly aligned at offset 795:

struct virtnet_info {
...
        union {
                struct virtio_net_rss_config_trailer rss_trailer; /*   792     4 */
                struct {
                        unsigned char __offset_to_hash_key_data[3]; /*   792     3 */
                        u8         rss_hash_key_data[40]; /*   795    40 */
                };                                       /*   792    43 */
        };                                               /*   792    44 */
...
        /* size: 840, cachelines: 14, members: 47 */
        /* sum members: 801, holes: 8, sum holes: 35 */
        /* padding: 4 */
        /* paddings: 1, sum paddings: 4 */
        /* last cacheline: 8 bytes */
};

As a result, the RSS key passed to the device is shifted by 1
byte: the last byte is cut off, and instead a (possibly
uninitialized) byte is added at the beginning.

As a last note `struct virtio_net_rss_config_hdr *rss_hdr;` is also
moved to the end, since it seems those three members should stick
around together. :)

Cc: stable@vger.kernel.org
Fixes: ed3100e90d0d ("virtio_net: Use new RSS config structs")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/aWIItWq5dV9XTTCJ@kspp
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'xsk-move-cq_cached_prod_lock'

Jason Xing says:

====================
xsk: move cq_cached_prod_lock

From: Jason Xing <kernelxing@tencent.com>

Move cq_cached_prod_lock to avoid touching new cacheline.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
====================

Link: https://patch.msgid.link/20260104012125.44003-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

xsk: move cq_cached_prod_lock to avoid touching a cacheline in sending path

We (Paolo and I) noticed that in the sending path touching an extra
cacheline due to cq_cached_prod_lock will impact the performance. After
moving the lock from struct xsk_buff_pool to struct xsk_queue, the
performance is increased by ~5% which can be observed by xdpsock.

An alternative approach [1] can be using atomic_try_cmpxchg() to have the
same effect. But unfortunately I don't have evident performance numbers to
prove the atomic approach is better than the current patch. The advantage
is to save the contention time among multiple xsks sharing the same pool
while the disadvantage is losing good maintenance. The full discussion can
be found at the following link.

[1]: https://lore.kernel.org/all/20251128134601.54678-1-kerneljasonxing@gmail.com/

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://patch.msgid.link/20260104012125.44003-3-kerneljasonxing@gmail.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

xsk: advance cq/fq check when shared umem is used

In the shared umem mode with different queues or devices, either
uninitialized cq or fq is not allowed which was previously done in
xp_assign_dev_shared(). The patch advances the check at the beginning
so that 1) we can avoid a few memory allocation and stuff if cq or fq
is NULL, 2) it can be regarded as preparation for the next patch in
the series.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://patch.msgid.link/20260104012125.44003-2-kerneljasonxing@gmail.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: can: j1939: j1939_xtp_rx_rts_session_active(): deactivate session upon receiving the second rts

Since j1939_session_deactivate_activate_next() in j1939_tp_rxtimer() is
called only when the timer is enabled, we need to call
j1939_session_deactivate_activate_next() if we cancelled the timer.
Otherwise, refcount for j1939_session leaks, which will later appear as

| unregister_netdevice: waiting for vcan0 to become free. Usage count = 2.

problem.

Reported-by: syzbot <syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Link: https://patch.msgid.link/b1212653-8fa1-44e1-be9d-12f950fb3a07@I-love.SAKURA.ne.jp
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge patch series "can: raw: better approach to instantly reject unsupported CAN frames"

Oliver Hartkopp <socketcan@hartkopp.net> says:

This series reverts commit 1a620a723853 ("can: raw: instantly reject
unsupported CAN frames").

and its follow-up fixes for the introduced dependency issues.

commit 1a620a723853 ("can: raw: instantly reject unsupported CAN frames")
commit cb2dc6d2869a ("can: Kconfig: select CAN driver infrastructure by default")
commit 6abd4577bccc ("can: fix build dependency")
commit 5a5aff6338c0 ("can: fix build dependency")

The reverted patch was accessing CAN device internal data structures
from the network layer because it needs to know about the CAN protocol
capabilities of the CAN devices.

This data access caused build problems between the CAN network and the
CAN driver layer which introduced unwanted Kconfig dependencies and fixes.

The patches 2 & 3 implement a better approach which makes use of the
CAN specific ml_priv data which is accessible from both sides.

With this change the CAN network layer can check the required features
and the decoupling of the driver layer and network layer is restored.

Link: https://patch.msgid.link/20260109144135.8495-1-socketcan@hartkopp.net
[mkl: give series a more descriptive name]
[mkl: properly format reverted patch commitish]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: raw: instantly reject disabled CAN frames

For real CAN interfaces the CAN_CTRLMODE_FD and CAN_CTRLMODE_XL control
modes indicate whether an interface can handle those CAN FD/XL frames.

In the case a CAN XL interface is configured in CANXL-only mode with
disabled error-signalling neither CAN CC nor CAN FD frames can be sent.

The checks are now performed on CAN_RAW sockets to give an instant feedback
to the user when writing unsupported CAN frames to the interface or when
the CAN interface is in read-only mode.

Fixes: 1a620a723853 ("can: raw: instantly reject unsupported CAN frames")
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Vincent Mailhol <mailhol@kernel.org>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260109144135.8495-4-socketcan@hartkopp.net
[mkl: fix dev reference leak]
Link: https://lore.kernel.org/all/0636c732-2e71-4633-8005-dfa85e1da445@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: propagate CAN device capabilities via ml_priv

Commit 1a620a723853 ("can: raw: instantly reject unsupported CAN frames")
caused a sequence of dependency and linker fixes.

Instead of accessing CAN device internal data structures which caused the
dependency problems this patch introduces capability information into the
CAN specific ml_priv data which is accessible from both sides.

With this change the CAN network layer can check the required features and
the decoupling of the driver layer and network layer is restored.

Fixes: 1a620a723853 ("can: raw: instantly reject unsupported CAN frames")
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Vincent Mailhol <mailhol@kernel.org>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260109144135.8495-3-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Revert "can: raw: instantly reject unsupported CAN frames"

This reverts commit 1a620a723853a0f49703c317d52dc6b9602cbaa8

and its follow-up fixes for the introduced dependency issues.

commit 1a620a723853 ("can: raw: instantly reject unsupported CAN frames")
commit cb2dc6d2869a ("can: Kconfig: select CAN driver infrastructure by default")
commit 6abd4577bccc ("can: fix build dependency")
commit 5a5aff6338c0 ("can: fix build dependency")

The entire problem was caused by the requirement that a new network layer
feature needed to know about the protocol capabilities of the CAN devices.
Instead of accessing CAN device internal data structures which caused the
dependency problems a better approach has been developed which makes use of
CAN specific ml_priv data which is accessible from both sides.

Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Vincent Mailhol <mailhol@kernel.org>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260109144135.8495-2-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Only one core change (and one in doc only) the rest are drivers.

  The one core fix is for some inline encrypting drives that can't
  handle encryption requests on non-data commands (like error handling
  ones); it saves the request level encryption parameters in the eh_save
  structure so they can be cleared for error handling and restored after
  it is completed"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: host: mediatek: Make read-only array scale_us static const
  scsi: bfa: Update outdated comment
  scsi: mpt3sas: Update maintainer list
  scsi: ufs: core: Configure MCQ after link startup
  scsi: core: Fix error handler encryption support
  scsi: core: Correct documentation for scsi_test_unit_ready()
  scsi: ufs: dt-bindings: Fix several grammar errors

Merge tag 'bitmap-for-6.19-rc5' of https://github.com/norov/linux

Pull bitmap fix from Yury Norov:
"Fix Rust build for architectures implementing their own find_bit() ops
(arm and m68k)"

* tag 'bitmap-for-6.19-rc5' of https://github.com/norov/linux:
rust: bitops: fix missing _find_* functions on 32-bit ARM

Merge tag 'media/v6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

- ov02c10: some fixes related to preserving bayer pattern and
   horizontal control

- ipu-bridge: Add quirks for some Dell XPS laptops with inverted
   sensors

- mali-c55: Fix version identifier logic

- rzg2l-cru: csi-2: fix RZ/V2H input sizes on some variants

* tag 'media/v6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: ov02c10: Remove unnecessary hflip and vflip pointers
  media: ipu-bridge: Add DMI quirk for Dell XPS laptops with upside down sensors
  media: ov02c10: Fix the horizontal flip control
  media: ov02c10: Adjust x-win/y-win when changing flipping to preserve bayer-pattern
  media: ov02c10: Fix bayer-pattern change after default vflip change
  media: rzg2l-cru: csi-2: Support RZ/V2H input sizes
  media: uapi: mali-c55-config: Remove version identifier
  media: mali-c55: Remove duplicated version check
  media: Documentation: mali-c55: Use v4l2-isp version identifier

phy: add phy_get_rx_polarity() and phy_get_tx_polarity()

Add helpers in the generic PHY folder which can be used using 'select
PHY_COMMON_PROPS' from Kconfig, without otherwise needing to
enable GENERIC_PHY.

These helpers need to deal with the slight messiness of the fact that
the polarity properties are arrays per protocol, and with the fact that
there is no default value mandated by the standard properties, all
default values depend on driver and protocol (PHY_POL_NORMAL may be a
good default for SGMII, whereas PHY_POL_AUTO may be a good default for
PCIe).

Push the supported mask of polarities to these helpers, to simplify
drivers such that they don't need to validate what's in the device tree
(or other firmware description).

Add a KUnit test suite to make sure that the API produces the expected
results. The fact that we use fwnode structures means we can validate
with software nodes, and as opposed to the device_property API, we can
bypass the need to have a device structure.

Co-developed-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20260111093940.975359-6-vladimir.oltean@nxp.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>

dt-bindings: phy-common-props: RX and TX lane polarity inversion

Differential signaling is a technique for high-speed protocols to be
more resilient to noise. At the transmit side we have a positive and a
negative signal which are mirror images of each other. At the receiver,
if we subtract the negative signal (say of amplitude -A) from the
positive signal (say +A), we recover the original single-ended signal at
twice its original amplitude. But any noise, like one coming from EMI
from outside sources, is supposed to have an almost equal impact upon
the positive (A + E, E being for "error") and negative signal (-A + E).
So (A + E) - (-A + E) eliminates this noise, and this is what makes
differential signaling useful.

Except that in order to work, there must be strict requirements observed
during PCB design and layout, like the signal traces needing to have the
same length and be physically close to each other, and many others.

Sometimes it is not easy to fulfill all these requirements, a simple
case to understand is when on chip A's pins, the positive pin is on the
left and the negative is on the right, but on the chip B's pins (with
which A tries to communicate), positive is on the right and negative on
the left. The signals would need to cross, using vias and other ugly
stuff that affects signal integrity (introduces impedance
discontinuities which cause reflections, etc).

So sometimes, board designers intentionally connect differential lanes
the wrong way, and expect somebody else to invert that signal to recover
useful data. This is where RX and TX polarity inversion comes in as a
generic concept that applies to any high-speed serial protocol as long
as it uses differential signaling.

I've stopped two attempts to introduce more vendor-specific descriptions
of this only in the past month:
https://lore.kernel.org/linux-phy/20251110110536.2596490-1-horatiu.vultur@microchip.com/
https://lore.kernel.org/netdev/20251028000959.3kiac5kwo5pcl4ft@skbuf/

and in the kernel we already have merged:
- "st,px_rx_pol_inv"
- "st,pcie-tx-pol-inv"
- "st,sata-tx-pol-inv"
- "mediatek,pnswap"
- "airoha,pnswap-rx"
- "airoha,pnswap-tx"

and maybe more. So it is pretty general.

One additional element of complexity is introduced by the fact that for
some protocols, receivers can automatically detect and correct for an
inverted lane polarity (example: the PCIe LTSSM does this in the
Polling.Configuration state; the USB 3.1 Link Layer Test Specification
says that the detection and correction of the lane polarity inversion in
SuperSpeed operation shall be enabled in Polling.RxEQ.). Whereas for
other protocols (SGMII, SATA, 10GBase-R, etc etc), the polarity is all
manual and there is no detection mechanism mandated by their respective
standards.

So why would one even describe rx-polarity and tx-polarity for protocols
like PCIe, if it had to always be PHY_POL_AUTO?

Related question: why would we define the polarity as an array per
protocol? Isn't the physical PCB layout protocol-agnostic, and aren't we
describing the same physical reality from the lens of different protocols?

The answer to both questions is because multi-protocol PHYs exist
(supporting e.g. USB2 and USB3, or SATA and PCIe, or PCIe and Ethernet
over the same lane), one would need to manually set the polarity for
SATA/Ethernet, while leaving it at auto for PCIe/USB 3.0+.

I also investigated from another angle: what if polarity inversion in
the PHY is one layer, and then the PCIe/USB3 LTSSM polarity detection is
another layer on top? Then rx-polarity = <PHY_POL_AUTO> doesn't make
sense, it can still be rx-polarity = <PHY_POL_NORMAL> or <PHY_POL_INVERT>,
and the link training state machine figures things out on top of that.
This would radically simplify the design, as the elimination of
PHY_POL_AUTO inherently means that the need for a property array per
protocol also goes away.

I don't know how things are in the general case, but at least in the 10G
and 28G Lynx SerDes blocks from NXP Layerscape devices, this isn't the
case, and there's only a single level of RX polarity inversion: in the
SerDes lane. In the case of PCIe, the controller is in charge of driving
the RDAT_INV bit autonomously, and it is read-only to software.

So the existence of this kind of SerDes lane proves the need for
PHY_POL_AUTO to be a third state.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260111093940.975359-5-vladimir.oltean@nxp.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>

dt-bindings: phy-common-props: ensure protocol-names are unique

Rob Herring points out that "The default for .*-names is the entries
don't have to be unique.":
https://lore.kernel.org/linux-phy/20251204155219.GA1533839-robh@kernel.org/

Let's use uniqueItems: true to make sure the schema enforces this. It
doesn't make sense in this case to have duplicate properties for the
same SerDes protocol.

Note that this can only be done with the $defs + $ref pattern as
established by the previous commit. When the tx-p2p-microvolt-names
constraints were expressed directly under "properties", it would have
been validated by the string-array meta-schema, which does not support
the 'uniqueItems' keyword as can be seen below.

properties:tx-p2p-microvolt-names: Additional properties are not allowed ('uniqueItems' was unexpected)
from schema $id: http://devicetree.org/meta-schemas/string-array.yaml

Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260111093940.975359-4-vladimir.oltean@nxp.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>

dt-bindings: phy-common-props: create a reusable "protocol-names" definition

Other properties also need to be defined per protocol than just
tx-p2p-microvolt-names. Create a common definition to avoid copying a 55
line property.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260111093940.975359-3-vladimir.oltean@nxp.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>

dt-bindings: phy: rename transmit-amplitude.yaml to phy-common-props.yaml

I would like to add more properties similar to tx-p2p-microvolt, and I
don't think it makes sense to create one schema for each such property
(transmit-amplitude.yaml, lane-polarity.yaml, transmit-equalization.yaml
etc).

Instead, let's rename to phy-common-props.yaml, which makes it a more
adequate host schema for all the above properties.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260111093940.975359-2-vladimir.oltean@nxp.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

- Fix incorrect usage of BPF_TRAMP_F_ORIG_STACK in riscv JIT (Menglong
   Dong)

- Fix reference count leak in bpf_prog_test_run_xdp() (Tetsuo Handa)

- Fix metadata size check in bpf_test_run() (Toke Høiland-Jørgensen)

- Check that BPF insn array is not allowed as a map for const strings
   (Deepanshu Kartikey)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Fix reference count leak in bpf_prog_test_run_xdp()
  bpf: Reject BPF_MAP_TYPE_INSN_ARRAY in check_reg_const_str()
  selftests/bpf: Update xdp_context_test_run test to check maximum metadata size
  bpf, test_run: Subtract size of xdp_frame from allowed metadata size
  riscv, bpf: Fix incorrect usage of BPF_TRAMP_F_ORIG_STACK

net/sched: sch_qfq: do not free existing class in qfq_change_class()

Fixes qfq_change_class() error case.

cl->qdisc and cl should only be freed if a new class and qdisc
were allocated, or we risk various UAF.

Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Reported-by: syzbot+07f3f38f723c335f106d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6965351d.050a0220.eaf7.00c5.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260112175656.17605-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rust: bitops: fix missing _find_* functions on 32-bit ARM

On 32-bit ARM, you may encounter linker errors such as this one:

ld.lld: error: undefined symbol: _find_next_zero_bit
>>> referenced by rust_binder_main.43196037ba7bcee1-cgu.0
>>> drivers/android/binder/rust_binder_main.o:(<rust_binder_main::process::Process>::insert_or_update_handle) in archive vmlinux.a
>>> referenced by rust_binder_main.43196037ba7bcee1-cgu.0
>>> drivers/android/binder/rust_binder_main.o:(<rust_binder_main::process::Process>::insert_or_update_handle) in archive vmlinux.a

This error occurs because even though the functions are declared by
include/linux/find.h, the definition is #ifdef'd out on 32-bit ARM. This
is because arch/arm/include/asm/bitops.h contains:

#define find_first_zero_bit(p,sz) _find_first_zero_bit_le(p,sz)
#define find_next_zero_bit(p,sz,off) _find_next_zero_bit_le(p,sz,off)
#define find_first_bit(p,sz) _find_first_bit_le(p,sz)
#define find_next_bit(p,sz,off) _find_next_bit_le(p,sz,off)

And the underscore-prefixed function is conditional on #ifndef of the
non-underscore-prefixed name, but the declaration in find.h is *not*
conditional on that #ifndef.

To fix the linker error, we ensure that the symbols in question exist
when compiling Rust code. We do this by defining them in rust/helpers/
whenever the normal definition is #ifndef'd out.

Note that these helpers are somewhat unusual in that they do not have
the rust_helper_ prefix that most helpers have. Adding the rust_helper_
prefix does not compile, as 'bindings::_find_next_zero_bit()' will
result in a call to a symbol called _find_next_zero_bit as defined by
include/linux/find.h rather than a symbol with the rust_helper_ prefix.
This is because when a symbol is present in both include/ and
rust/helpers/, the one from include/ wins under the assumption that the
current configuration is one where that helper is unnecessary. This
heuristic fails for _find_next_zero_bit() because the header file always
declares it even if the symbol does not exist.

The functions still use the __rust_helper annotation. This lets the
wrapper function be inlined into Rust code even if full kernel LTO is
not used once the patch series for that feature lands.

Yury: arches are free to implement they own find_bit() functions. Most
rely on generic implementation, but arm32 and m86k - not; so they require
custom handling. Alice confirmed it fixes the build for both.

Cc: stable@vger.kernel.org
Fixes: 6cf93a9ed39e ("rust: add bindings for bitops.h")
Reported-by: Andreas Hindborg <a.hindborg@kernel.org>
Closes: https://rust-for-linux.zulipchat.com/#narrow/channel/x/topic/x/near/561677301
Tested-by: Andreas Hindborg <a.hindborg@kernel.org>
Reviewed-by: Dirk Behme <dirk.behme@de.bosch.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>

net: mana: Implement ndo_tx_timeout and serialize queue resets per port.

Implement .ndo_tx_timeout for MANA so any stalled TX queue can be detected
and a device-controlled port reset for all queues can be scheduled to a
ordered workqueue. The reset for all queues on stall detection is
recomended by hardware team.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Link: https://patch.msgid.link/20260112130552.GA11785@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'selftests-couple-of-fixes-in-toeplitz-rps-cases'

Gal Pressman says:

====================
selftests: Couple of fixes in Toeplitz RPS cases

Fix a couple of bugs in the RPS cases of the Toeplitz selftest.
====================

Link: https://patch.msgid.link/20260112173715.384843-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: fix RPS mask handling for high CPU numbers

The RPS bitmask bounds check uses ~(RPS_MAX_CPUS - 1) which equals ~15 =
0xfff0, only allowing CPUs 0-3.

Change the mask to ~((1UL << RPS_MAX_CPUS) - 1) = ~0xffff to allow CPUs
0-15.

Fixes: 5ebfb4cc3048 ("selftests/net: toeplitz test")
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260112173715.384843-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: fix RPS mask handling in toeplitz test

The toeplitz.py test passed the hex mask without "0x" prefix (e.g.,
"300" for CPUs 8,9). The toeplitz.c strtoul() call wrongly parsed this
as decimal 300 (0x12c) instead of hex 0x300.

Pass the prefixed mask to toeplitz.c, and the unprefixed one to sysfs.

Fixes: 9cf9aa77a1f6 ("selftests: drv-net: hw: convert the Toeplitz test to Python")
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260112173715.384843-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: Fix use-after-free in inet6_addr_del().

syzbot reported use-after-free of inet6_ifaddr in
inet6_addr_del(). [0]

The cited commit accidentally moved ipv6_del_addr() for
mngtmpaddr before reading its ifp->flags for temporary
addresses in inet6_addr_del().

Let's move ipv6_del_addr() down to fix the UAF.

[0]:
BUG: KASAN: slab-use-after-free in inet6_addr_del.constprop.0+0x67a/0x6b0 net/ipv6/addrconf.c:3117
Read of size 4 at addr ffff88807b89c86c by task syz.3.1618/9593

CPU: 0 UID: 0 PID: 9593 Comm: syz.3.1618 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xcd/0x630 mm/kasan/report.c:482
kasan_report+0xe0/0x110 mm/kasan/report.c:595
inet6_addr_del.constprop.0+0x67a/0x6b0 net/ipv6/addrconf.c:3117
addrconf_del_ifaddr+0x11e/0x190 net/ipv6/addrconf.c:3181
inet6_ioctl+0x1e5/0x2b0 net/ipv6/af_inet6.c:582
sock_do_ioctl+0x118/0x280 net/socket.c:1254
sock_ioctl+0x227/0x6b0 net/socket.c:1375
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:597 [inline]
__se_sys_ioctl fs/ioctl.c:583 [inline]
__x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f164cf8f749
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f164de64038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f164d1e5fa0 RCX: 00007f164cf8f749
RDX: 0000200000000000 RSI: 0000000000008936 RDI: 0000000000000003
RBP: 00007f164d013f91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f164d1e6038 R14: 00007f164d1e5fa0 R15: 00007ffde15c8288
</TASK>

Allocated by task 9593:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:56
kasan_save_track+0x14/0x30 mm/kasan/common.c:77
poison_kmalloc_redzone mm/kasan/common.c:397 [inline]
__kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:414
kmalloc_noprof include/linux/slab.h:957 [inline]
kzalloc_noprof include/linux/slab.h:1094 [inline]
ipv6_add_addr+0x4e3/0x2010 net/ipv6/addrconf.c:1120
inet6_addr_add+0x256/0x9b0 net/ipv6/addrconf.c:3050
addrconf_add_ifaddr+0x1fc/0x450 net/ipv6/addrconf.c:3160
inet6_ioctl+0x103/0x2b0 net/ipv6/af_inet6.c:580
sock_do_ioctl+0x118/0x280 net/socket.c:1254
sock_ioctl+0x227/0x6b0 net/socket.c:1375
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:597 [inline]
__se_sys_ioctl fs/ioctl.c:583 [inline]
__x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 6099:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:56
kasan_save_track+0x14/0x30 mm/kasan/common.c:77
kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:584
poison_slab_object mm/kasan/common.c:252 [inline]
__kasan_slab_free+0x5f/0x80 mm/kasan/common.c:284
kasan_slab_free include/linux/kasan.h:234 [inline]
slab_free_hook mm/slub.c:2540 [inline]
slab_free_freelist_hook mm/slub.c:2569 [inline]
slab_free_bulk mm/slub.c:6696 [inline]
kmem_cache_free_bulk mm/slub.c:7383 [inline]
kmem_cache_free_bulk+0x2bf/0x680 mm/slub.c:7362
kfree_bulk include/linux/slab.h:830 [inline]
kvfree_rcu_bulk+0x1b7/0x1e0 mm/slab_common.c:1523
kvfree_rcu_drain_ready mm/slab_common.c:1728 [inline]
kfree_rcu_monitor+0x1d0/0x2f0 mm/slab_common.c:1801
process_one_work+0x9ba/0x1b20 kernel/workqueue.c:3257
process_scheduled_works kernel/workqueue.c:3340 [inline]
worker_thread+0x6c8/0xf10 kernel/workqueue.c:3421
kthread+0x3c5/0x780 kernel/kthread.c:463
ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246

Fixes: 00b5b7aab9e42 ("net/ipv6: delete temporary address if mngtmpaddr is removed or unmanaged")
Reported-by: syzbot+72e610f4f1a930ca9d8a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/696598e9.050a0220.3be5c5.0009.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260113010538.2019411-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dst: fix races in rt6_uncached_list_del() and rt_del_uncached_list()

syzbot was able to crash the kernel in rt6_uncached_list_flush_dev()
in an interesting way [1]

Crash happens in list_del_init()/INIT_LIST_HEAD() while writing
list->prev, while the prior write on list->next went well.

static inline void INIT_LIST_HEAD(struct list_head *list)
{
WRITE_ONCE(list->next, list); // This went well
WRITE_ONCE(list->prev, list); // Crash, @list has been freed.
}

Issue here is that rt6_uncached_list_del() did not attempt to lock
ul->lock, as list_empty(&rt->dst.rt_uncached) returned
true because the WRITE_ONCE(list->next, list) happened on the other CPU.

We might use list_del_init_careful() and list_empty_careful(),
or make sure rt6_uncached_list_del() always grabs the spinlock
whenever rt->dst.rt_uncached_list has been set.

A similar fix is neeed for IPv4.

[1]

BUG: KASAN: slab-use-after-free in INIT_LIST_HEAD include/linux/list.h:46 [inline]
BUG: KASAN: slab-use-after-free in list_del_init include/linux/list.h:296 [inline]
BUG: KASAN: slab-use-after-free in rt6_uncached_list_flush_dev net/ipv6/route.c:191 [inline]
BUG: KASAN: slab-use-after-free in rt6_disable_ip+0x633/0x730 net/ipv6/route.c:5020
Write of size 8 at addr ffff8880294cfa78 by task kworker/u8:14/3450

CPU: 0 UID: 0 PID: 3450 Comm: kworker/u8:14 Tainted: G             L      syzkaller #0 PREEMPT_{RT,(full)}
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
Workqueue: netns cleanup_net
Call Trace:
<TASK>
  dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
  print_address_description mm/kasan/report.c:378 [inline]
  print_report+0xca/0x240 mm/kasan/report.c:482
  kasan_report+0x118/0x150 mm/kasan/report.c:595
  INIT_LIST_HEAD include/linux/list.h:46 [inline]
  list_del_init include/linux/list.h:296 [inline]
  rt6_uncached_list_flush_dev net/ipv6/route.c:191 [inline]
  rt6_disable_ip+0x633/0x730 net/ipv6/route.c:5020
  addrconf_ifdown+0x143/0x18a0 net/ipv6/addrconf.c:3853
addrconf_notify+0x1bc/0x1050 net/ipv6/addrconf.c:-1
  notifier_call_chain+0x19d/0x3a0 kernel/notifier.c:85
  call_netdevice_notifiers_extack net/core/dev.c:2268 [inline]
  call_netdevice_notifiers net/core/dev.c:2282 [inline]
  netif_close_many+0x29c/0x410 net/core/dev.c:1785
  unregister_netdevice_many_notify+0xb50/0x2330 net/core/dev.c:12353
  ops_exit_rtnl_list net/core/net_namespace.c:187 [inline]
  ops_undo_list+0x3dc/0x990 net/core/net_namespace.c:248
  cleanup_net+0x4de/0x7b0 net/core/net_namespace.c:696
  process_one_work kernel/workqueue.c:3257 [inline]
  process_scheduled_works+0xad1/0x1770 kernel/workqueue.c:3340
  worker_thread+0x8a0/0xda0 kernel/workqueue.c:3421
  kthread+0x711/0x8a0 kernel/kthread.c:463
  ret_from_fork+0x510/0xa50 arch/x86/kernel/process.c:158
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
</TASK>

Allocated by task 803:
  kasan_save_stack mm/kasan/common.c:57 [inline]
  kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
  unpoison_slab_object mm/kasan/common.c:340 [inline]
  __kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:366
  kasan_slab_alloc include/linux/kasan.h:253 [inline]
  slab_post_alloc_hook mm/slub.c:4953 [inline]
  slab_alloc_node mm/slub.c:5263 [inline]
  kmem_cache_alloc_noprof+0x18d/0x6c0 mm/slub.c:5270
  dst_alloc+0x105/0x170 net/core/dst.c:89
  ip6_dst_alloc net/ipv6/route.c:342 [inline]
  icmp6_dst_alloc+0x75/0x460 net/ipv6/route.c:3333
  mld_sendpack+0x683/0xe60 net/ipv6/mcast.c:1844
  mld_send_cr net/ipv6/mcast.c:2154 [inline]
  mld_ifc_work+0x83e/0xd60 net/ipv6/mcast.c:2693
  process_one_work kernel/workqueue.c:3257 [inline]
  process_scheduled_works+0xad1/0x1770 kernel/workqueue.c:3340
  worker_thread+0x8a0/0xda0 kernel/workqueue.c:3421
  kthread+0x711/0x8a0 kernel/kthread.c:463
  ret_from_fork+0x510/0xa50 arch/x86/kernel/process.c:158
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246

Freed by task 20:
  kasan_save_stack mm/kasan/common.c:57 [inline]
  kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
  kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584
  poison_slab_object mm/kasan/common.c:253 [inline]
  __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
  kasan_slab_free include/linux/kasan.h:235 [inline]
  slab_free_hook mm/slub.c:2540 [inline]
  slab_free mm/slub.c:6670 [inline]
  kmem_cache_free+0x18f/0x8d0 mm/slub.c:6781
  dst_destroy+0x235/0x350 net/core/dst.c:121
  rcu_do_batch kernel/rcu/tree.c:2605 [inline]
  rcu_core kernel/rcu/tree.c:2857 [inline]
  rcu_cpu_kthread+0xba5/0x1af0 kernel/rcu/tree.c:2945
  smpboot_thread_fn+0x542/0xa60 kernel/smpboot.c:160
  kthread+0x711/0x8a0 kernel/kthread.c:463
  ret_from_fork+0x510/0xa50 arch/x86/kernel/process.c:158
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246

Last potentially related work creation:
  kasan_save_stack+0x3e/0x60 mm/kasan/common.c:57
  kasan_record_aux_stack+0xbd/0xd0 mm/kasan/generic.c:556
  __call_rcu_common kernel/rcu/tree.c:3119 [inline]
  call_rcu+0xee/0x890 kernel/rcu/tree.c:3239
  refdst_drop include/net/dst.h:266 [inline]
  skb_dst_drop include/net/dst.h:278 [inline]
  skb_release_head_state+0x71/0x360 net/core/skbuff.c:1156
  skb_release_all net/core/skbuff.c:1180 [inline]
  __kfree_skb net/core/skbuff.c:1196 [inline]
  sk_skb_reason_drop+0xe9/0x170 net/core/skbuff.c:1234
  kfree_skb_reason include/linux/skbuff.h:1322 [inline]
  tcf_kfree_skb_list include/net/sch_generic.h:1127 [inline]
  __dev_xmit_skb net/core/dev.c:4260 [inline]
  __dev_queue_xmit+0x26aa/0x3210 net/core/dev.c:4785
  NF_HOOK_COND include/linux/netfilter.h:307 [inline]
  ip6_output+0x340/0x550 net/ipv6/ip6_output.c:247
  NF_HOOK+0x9e/0x380 include/linux/netfilter.h:318
  mld_sendpack+0x8d4/0xe60 net/ipv6/mcast.c:1855
  mld_send_cr net/ipv6/mcast.c:2154 [inline]
  mld_ifc_work+0x83e/0xd60 net/ipv6/mcast.c:2693
  process_one_work kernel/workqueue.c:3257 [inline]
  process_scheduled_works+0xad1/0x1770 kernel/workqueue.c:3340
  worker_thread+0x8a0/0xda0 kernel/workqueue.c:3421
  kthread+0x711/0x8a0 kernel/kthread.c:463
  ret_from_fork+0x510/0xa50 arch/x86/kernel/process.c:158
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246

The buggy address belongs to the object at ffff8880294cfa00
which belongs to the cache ip6_dst_cache of size 232
The buggy address is located 120 bytes inside of
freed 232-byte region [ffff8880294cfa00, ffff8880294cfae8)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x294cf
memcg:ffff88803536b781
flags: 0x80000000000000(node=0|zone=1)
page_type: f5(slab)
raw: 0080000000000000 ffff88802ff1c8c0 ffffea0000bf2bc0 dead000000000006
raw: 0000000000000000 00000000800c000c 00000000f5000000 ffff88803536b781
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52820(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 9, tgid 9 (kworker/0:0), ts 91119585830, free_ts 91088628818
  set_page_owner include/linux/page_owner.h:32 [inline]
  post_alloc_hook+0x234/0x290 mm/page_alloc.c:1857
  prep_new_page mm/page_alloc.c:1865 [inline]
  get_page_from_freelist+0x28c0/0x2960 mm/page_alloc.c:3915
  __alloc_frozen_pages_noprof+0x181/0x370 mm/page_alloc.c:5210
  alloc_pages_mpol+0xd1/0x380 mm/mempolicy.c:2486
  alloc_slab_page mm/slub.c:3075 [inline]
  allocate_slab+0x86/0x3b0 mm/slub.c:3248
  new_slab mm/slub.c:3302 [inline]
  ___slab_alloc+0xb10/0x13e0 mm/slub.c:4656
  __slab_alloc+0xc6/0x1f0 mm/slub.c:4779
  __slab_alloc_node mm/slub.c:4855 [inline]
  slab_alloc_node mm/slub.c:5251 [inline]
  kmem_cache_alloc_noprof+0x101/0x6c0 mm/slub.c:5270
  dst_alloc+0x105/0x170 net/core/dst.c:89
  ip6_dst_alloc net/ipv6/route.c:342 [inline]
  icmp6_dst_alloc+0x75/0x460 net/ipv6/route.c:3333
  mld_sendpack+0x683/0xe60 net/ipv6/mcast.c:1844
  mld_send_cr net/ipv6/mcast.c:2154 [inline]
  mld_ifc_work+0x83e/0xd60 net/ipv6/mcast.c:2693
  process_one_work kernel/workqueue.c:3257 [inline]
  process_scheduled_works+0xad1/0x1770 kernel/workqueue.c:3340
  worker_thread+0x8a0/0xda0 kernel/workqueue.c:3421
  kthread+0x711/0x8a0 kernel/kthread.c:463
  ret_from_fork+0x510/0xa50 arch/x86/kernel/process.c:158
page last free pid 5859 tgid 5859 stack trace:
  reset_page_owner include/linux/page_owner.h:25 [inline]
  free_pages_prepare mm/page_alloc.c:1406 [inline]
  __free_frozen_pages+0xfe1/0x1170 mm/page_alloc.c:2943
  discard_slab mm/slub.c:3346 [inline]
  __put_partials+0x149/0x170 mm/slub.c:3886
  __slab_free+0x2af/0x330 mm/slub.c:5952
  qlink_free mm/kasan/quarantine.c:163 [inline]
  qlist_free_all+0x97/0x100 mm/kasan/quarantine.c:179
  kasan_quarantine_reduce+0x148/0x160 mm/kasan/quarantine.c:286
  __kasan_slab_alloc+0x22/0x80 mm/kasan/common.c:350
  kasan_slab_alloc include/linux/kasan.h:253 [inline]
  slab_post_alloc_hook mm/slub.c:4953 [inline]
  slab_alloc_node mm/slub.c:5263 [inline]
  kmem_cache_alloc_noprof+0x18d/0x6c0 mm/slub.c:5270
  getname_flags+0xb8/0x540 fs/namei.c:146
  getname include/linux/fs.h:2498 [inline]
  do_sys_openat2+0xbc/0x200 fs/open.c:1426
  do_sys_open fs/open.c:1436 [inline]
  __do_sys_openat fs/open.c:1452 [inline]
  __se_sys_openat fs/open.c:1447 [inline]
  __x64_sys_openat+0x138/0x170 fs/open.c:1447
  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
  do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94

Fixes: 8d0b94afdca8 ("ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregister")
Fixes: 78df76a065ae ("ipv4: take rt_uncached_lock only if needed")
Reported-by: syzbot+179fc225724092b8b2b2@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6964cdf2.050a0220.eaf7.009d.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260112103825.3810713-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hv_netvsc: reject RSS hash key programming without RX indirection table

RSS configuration requires a valid RX indirection table. When the device
reports a single receive queue, rndis_filter_device_add() does not
allocate an indirection table, accepting RSS hash key updates in this
state leads to a hang.

Fix this by gating netvsc_set_rxfh() on ndc->rx_table_sz and return
-EOPNOTSUPP when the table is absent. This aligns set_rxfh with the device
capabilities and prevents incorrect behavior.

Fixes: 962f3fee83a4 ("netvsc: add ethtool ops to get/set RSS key")
Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com>
Reviewed-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1768212093-1594-1-git-send-email-gargaditya@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sxgbe: fix typo in comment

Fix a misspelling in the sxgbe_mtl_init() function comment.
"Algorith" should be spelled as "Algorithm".

Signed-off-by: Jinseok Kim <always.starving0@gmail.com>
Link: https://patch.msgid.link/20260112044147.2844-1-always.starving0@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-phy-fixed_phy-replace-list-of-fixed-phys-with-static-array'

Heiner Kallweit says:

====================
net: phy: fixed_phy: replace list of fixed PHYs with static array

Due to max 32 PHY addresses being available per mii bus, using a list
can't support more fixed PHY's. And there's no known use case for as
much as 32 fixed PHY's on a system. 8 should be plenty of fixed PHY's,
so use an array of that size instead of a list. This allows to
significantly reduce the code size and complexity.
In addition replace heavy-weight IDA with a simple bitmap.
====================

Link: https://patch.msgid.link/110f676d-727c-4575-abe4-e383f98fc38f@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: fixed_phy: replace IDA with a bitmap

Size of array fmb_fixed_phys is small, so we can use a simple bitmap
instead of an IDA to manage dynamic allocation of fixed PHY's.
find_first_zero_bit() isn't atomic, so we need the loop to rule out
double allocation of a PHY address.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/d4614463-d532-41fc-92e9-ef97107aceb5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: fixed_phy: replace list of fixed PHYs with static array

Due to max 32 PHY addresses being available per mii bus, using a list
can't support more fixed PHY's. And there's no known use case for as
much as 32 fixed PHY's on a system. 8 should be plenty of fixed PHY's,
so use an array of that size instead of a list. This allows to
significantly reduce the code size and complexity.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/8610d30c-eac7-4100-9008-d3b6cee6a5cd@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ipv6-allow-for-nexthop-device-mismatch-with-onlink'

Ido Schimmel says:

====================
ipv6: Allow for nexthop device mismatch with "onlink"

This patchset aligns IPv6 with IPv4 with respect to the "onlink" keyword
and allows IPv6 routes to be configured with a gateway address that is
resolved out of a different interface than the one specified.

Patches #1-#3 are small preparations in the existing "onlink" selftest.

Patch #4 is the actual change. See the commit message for detailed
description and motivation.

Patch #5 adds test cases for both address families, to make sure that
this use case does not regress.
====================

Link: https://patch.msgid.link/20260111120813.159799-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: fib-onlink: Add test cases for nexthop device mismatch

Add test cases that verify that when the "onlink" keyword is specified,
both address families (with and without VRF) accept routes with a
gateway address that is reachable via a different interface than the one
specified.

Output without "ipv6: Allow for nexthop device mismatch with "onlink"":

# ./fib-onlink-tests.sh | grep mismatch
TEST: nexthop device mismatch                             [ OK ]
TEST: nexthop device mismatch                             [ OK ]
TEST: nexthop device mismatch                             [FAIL]
TEST: nexthop device mismatch                             [FAIL]

Output with "ipv6: Allow for nexthop device mismatch with "onlink"":

# ./fib-onlink-tests.sh | grep mismatch
TEST: nexthop device mismatch                             [ OK ]
TEST: nexthop device mismatch                             [ OK ]
TEST: nexthop device mismatch                             [ OK ]
TEST: nexthop device mismatch                             [ OK ]

That is, the IPv4 tests were always passing, but the IPv6 ones only pass
after the specified patch.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260111120813.159799-6-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: Allow for nexthop device mismatch with "onlink"

IPv4 allows for a nexthop device mismatch when the "onlink" keyword is
specified:

# ip link add name dummy1 up type dummy
# ip address add 192.0.2.1/24 dev dummy1
# ip link add name dummy2 up type dummy
# ip route add 198.51.100.0/24 nexthop via 192.0.2.2 dev dummy2
Error: Nexthop has invalid gateway.
# ip route add 198.51.100.0/24 nexthop via 192.0.2.2 dev dummy2 onlink
# echo $?
0

This seems to be consistent with the description of "onlink" in the
ip-route man page: "Pretend that the nexthop is directly attached to
this link, even if it does not match any interface prefix".

On the other hand, IPv6 rejects a nexthop device mismatch, even when
"onlink" is specified:

# ip link add name dummy1 up type dummy
# ip address add 2001:db8:1::1/64 dev dummy1
# ip link add name dummy2 up type dummy
# ip route add 2001:db8:10::/64 nexthop via 2001:db8:1::2 dev dummy2
RTNETLINK answers: No route to host
# ip route add 2001:db8:10::/64 nexthop via 2001:db8:1::2 dev dummy2 onlink
Error: Nexthop has invalid gateway or device mismatch.

This is intentional according to commit fc1e64e1092f ("net/ipv6: Add
support for onlink flag") which added IPv6 "onlink" support and states
that "any unicast gateway is allowed as long as the gateway is not a
local address and if it resolves it must match the given device".

The condition was later relaxed in commit 4ed591c8ab44 ("net/ipv6: Allow
onlink routes to have a device mismatch if it is the default route") to
allow for a nexthop device mismatch if the gateway address is resolved
via the default route:

# ip link add name dummy1 up type dummy
# ip route add ::/0 dev dummy1
# ip link add name dummy2 up type dummy
# ip route add 2001:db8:10::/64 nexthop via 2001:db8:1::2 dev dummy2
RTNETLINK answers: No route to host
# ip route add 2001:db8:10::/64 nexthop via 2001:db8:1::2 dev dummy2 onlink
# echo $?
0

While the decision to forbid a nexthop device mismatch in IPv6 seems to
be intentional, it is unclear why it was made. Especially when it
differs from IPv4 and seems to go against the intended behavior of
"onlink".

Therefore, relax the condition further and allow for a nexthop device
mismatch when "onlink" is specified:

# ip link add name dummy1 up type dummy
# ip address add 2001:db8:1::1/64 dev dummy1
# ip link add name dummy2 up type dummy
# ip route add 2001:db8:10::/64 nexthop via 2001:db8:1::2 dev dummy2 onlink
# echo $?
0

The motivating use case is the fact that FRR would like to be able to
configure overlay routes of the following form:

# ip route add <host-Z> vrf <VRF> encap ip id <ID> src <VTEP-A> dst <VTEP-Z> via <VTEP-Z> dev vxlan0 onlink

Where vxlan0 is in the default VRF in which "VTEP-Z" is reachable via
one of the underlay routes (e.g., via swpX). Without this patch, the
above only works with IPv4, but not with IPv6.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260111120813.159799-5-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: fib-onlink: Add a test case for IPv4 multicast gateway

A multicast gateway address should be rejected when "onlink" is
specified, but it is only tested as part of the IPv6 tests. Add an
equivalent IPv4 test.

# ./fib-onlink-tests.sh -v
[...]
COMMAND: ip ro add table 254 169.254.101.12/32 via 233.252.0.1 dev veth1 onlink
Error: Nexthop has invalid gateway.

TEST: Invalid gw - multicast address                      [ OK ]
[...]
COMMAND: ip ro add table 1101 169.254.102.12/32 via 233.252.0.1 dev veth5 onlink
Error: Nexthop has invalid gateway.

TEST: Invalid gw - multicast address, VRF                 [ OK ]
[...]
Tests passed:  37
Tests failed:   0

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260111120813.159799-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: fib-onlink: Remove "wrong nexthop device" IPv6 tests

The command in the test fails as expected because IPv6 forbids a nexthop
device mismatch:

# ./fib-onlink-tests.sh -v
[...]
COMMAND: ip -6 ro add table 1101 2001:db8:102::103/128 via 2001:db8:701::64 dev veth5 onlink
Error: Nexthop has invalid gateway or device mismatch.

TEST: Gateway resolves to wrong nexthop device - VRF [ OK ]
[...]

Where:

# ip route get 2001:db8:701::64 vrf lisa
2001:db8:701::64 dev veth7 table 1101 proto kernel src 2001:db8:701::1 metric 256 pref medium

This is in contrast to IPv4 where a nexthop device mismatch is allowed
when "onlink" is specified:

# ip route get 169.254.7.2 vrf lisa
169.254.7.2 dev veth7 table 1101 src 169.254.7.1 uid 0
# ip ro add table 1101 169.254.102.103/32 via 169.254.7.2 dev veth5 onlink
# echo $?
0

Remove these tests in preparation for aligning IPv6 with IPv4 and
allowing nexthop device mismatch when "onlink" is specified.

A subsequent patch will add tests that verify that both address families
allow a nexthop device mismatch with "onlink".

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260111120813.159799-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: fib-onlink: Remove "wrong nexthop device" IPv4 tests

According to the test description, these tests fail because of a wrong
nexthop device:

# ./fib-onlink-tests.sh -v
[...]
COMMAND: ip ro add table 254 169.254.101.102/32 via 169.254.3.1 dev veth1 onlink
Error: Nexthop has invalid gateway.

TEST: Gateway resolves to wrong nexthop device            [ OK ]
COMMAND: ip ro add table 1101 169.254.102.103/32 via 169.254.7.1 dev veth5 onlink
Error: Nexthop has invalid gateway.

TEST: Gateway resolves to wrong nexthop device - VRF      [ OK ]
[...]

But this is incorrect. They fail because the gateway addresses are local
addresses:

# ip -4 address show
[...]
28: veth3@if27: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link-netns peer_ns-Urqh3o
     inet 169.254.3.1/24 scope global veth3
[...]
32: veth7@if31: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master lisa state UP group default qlen 1000 link-netns peer_ns-Urqh3o
     inet 169.254.7.1/24 scope global veth7

Therefore, using a local address that matches the nexthop device fails
as well:

# ip ro add table 254 169.254.101.102/32 via 169.254.3.1 dev veth3 onlink
Error: Nexthop has invalid gateway.

Using a gateway address with a "wrong" nexthop device is actually valid
and allowed:

# ip route get 169.254.1.2
169.254.1.2 dev veth1 src 169.254.1.1 uid 0
# ip ro add table 254 169.254.101.102/32 via 169.254.1.2 dev veth3 onlink
# echo $?
0

Remove these tests given that their output is confusing and that the
scenario that they are testing is already covered by other tests.

A subsequent patch will add tests for the nexthop device mismatch
scenario.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260111120813.159799-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-phy-introduce-phy-ports-representation'

Maxime Chevallier says:

====================
net: phy: Introduce PHY ports representation

A few important notes:

- This is only a first phase. It instantiates the port, and leverage
   that to make the MAC <-> PHY <-> SFP usecase simpler.

- Next phase will deal with controlling the port state, as well as the
   netlink uAPI for that.

- The end-goal is to enable support for complex port MUX. This
   preliminary work focuses on PHY-driven ports, but this will be
   extended to support muxing at the MII level (Multi-phy, or compo PHY
   + SFP as found on Turris Omnia for example).

- The naming is definitely not set in stone. I named that "phy_port",
   but this may convey the false sense that this is phylib-specific.
   Even the word "port" is not that great, as it already has several
   different meanings in the net world (switch port, devlink port,
   etc.). I used the term "connector" in the binding.

A bit of history on that work :

The end goal that I personnaly want to achieve is :

            + PHY - RJ45
            |
MAC - MUX -+ PHY - RJ45

After many discussions here on netdev@, but also at netdevconf[1] and
LPC[2], there appears to be several analoguous designs that exist out
there.

[1] : https://netdevconf.info/0x17/sessions/talk/improving-multi-phy-and-multi-port-interfaces.html
[2] : https://lpc.events/event/18/contributions/1964/ (video isn't the
right one)

Take the MAchiatobin, it has 2 interfaces that looks like this :

MAC - PHY -+ RJ45
            |
    + SFP - Whatever the module does

Now, looking at the Turris Omnia, we have :

MAC - MUX -+ PHY - RJ45
            |
    + SFP - Whatever the module does

We can find more example of this kind of designs, the common part is
that we expose multiple front-facing media ports. This is what this
current work aims at supporting. As of right now, it does'nt add any
support for muxing, but this will come later on.

This first phase focuses on phy-driven ports only, but there are already
quite some challenges already. For one, we can't really autodetect how
many ports are sitting behind a PHY. That's why this series introduces a
new binding. Describing ports in DT should however be a last-resort
thing when we need to clear some ambiguity about the PHY media-side.

The only use-cases that we have today for multi-port PHYs are combo PHYs
that drive both a Copper port and an SFP (the Macchiatobin case). This
in itself is challenging and this series only addresses part of this
support, by registering a phy_port for the PHY <-> SFP connection. The
SFP module should in the end be considered as a port as well, but that's
not yet the case.

However, because now PHYs can register phy_ports for every media-side
interface they have, they can register the capabilities of their ports,
which allows making the PHY-driver SFP case much more generic.
====================

Link: https://patch.msgid.link/20260108080041.553250-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Documentation: networking: Document the phy_port infrastructure

This documentation aims at describing the main goal of the phy_port
infrastructure.

Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260108080041.553250-15-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>