git.ipfire.org Git - thirdparty/kernel/linux.git/log

net: enetc: add unstructured pMAC counters for ENETC v1

The ENETC v1 has two MACs (eMAC and pMAC) to support preemption. The
existing unstructured counters include the eMAC counters, but not the
pMAC counters. So add pMAC counters to improve statistical coverage.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260408055849.1314033-5-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: enetc: remove standardized counters from enetc_pm_counters

The standardized counters are already exposed via the get_pause_stats(),
get_rmon_stats(), get_eth_ctrl_stats() and get_eth_mac_stats()
interfaces. Keeping the same counters in enetc_pm_counters results in
redundant output.

Remove these standardized counters from enetc_pm_counters and rely on
the existing statistics interfaces to report them.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260408055849.1314033-4-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: enetc: show RX drop counters only for assigned RX rings

For ENETC v1, each SI provides 16 RBDCR registers for RX ring drop
counters, but this does not imply that an SI actually owns 16 RX rings.
The ENETC hardware supports a total of 16 RX rings, which are assigned
to 3 SIs (1 PSI and 2 VSIs), so each SI is assigned fewer than 16 RX
rings.

The current implementation always reports 16 RX drop counters per SI,
leading to redundant output for SIs with fewer RX rings. Update the
logic to display drop counters only for the RX rings that are actually
assigned to the SI.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260408055849.1314033-3-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: enetc: add support for the standardized counters

ENETC v4 provides 64-bit counters for IEEE 802.3 basic and mandatory
managed objects, the IETF Management Information Database (MIB) package
(RFC2665), and Remote Network Monitoring (RMON) statistics. In addition,
some ENETCs support preemption, so these ENETCs have two MACs: MAC 0 is
the express MAC (eMAC), MAC 1 is the preemptible MAC (pMAC). Both MACs
support these statistics.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260408055849.1314033-2-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

gre: Count GRE packet drops

GRE is silently dropping packets without updating statistics.

In case of drop, increment rx_dropped counter to provide visibility into
packet loss. For the case where no GRE protocol handler is registered,
use rx_nohandler.

Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20260409090945.1542440-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-phy-add-support-for-disabling-autonomous-eee'

Nicolai Buchwitz says:

====================
net: phy: add support for disabling autonomous EEE

Some PHYs implement autonomous EEE where the PHY manages EEE
independently, preventing the MAC from controlling LPI signaling.
This conflicts with MACs that implement their own LPI control.

This series adds a .disable_autonomous_eee callback to struct phy_driver
and calls it from phy_support_eee(). When a MAC indicates it supports
EEE, the PHY's autonomous EEE is automatically disabled. The setting is
persisted across suspend/resume by re-applying it in phy_init_hw() after
soft reset, following the same pattern suggested by Russell King for PHY
tunables [1].

Patch 1 adds the phylib infrastructure.
Patch 2 implements it for Broadcom BCM54xx (AutogrEEEn).
Patch 3 converts the Realtek RTL8211F, which previously unconditionally
disabled PHY-mode EEE in config_init.

This came up while adding EEE support to the Cadence macb driver (used
on Raspberry Pi 5 with a BCM54210PE PHY). The PHY's AutogrEEEn mode
prevented the MAC from tracking LPI state. The Realtek RTL8211F has
the same pattern, unconditionally disabling PHY-mode EEE with the
comment "Disable PHY-mode EEE so LPI is passed to the MAC".

Other BCM54xx PHYs likely have the same AutogrEEEn register layout,
but I only have access to the BCM54210PE/BCM54213PE datasheets. It
would be appreciated if Florian or others could confirm which other
BCM54xx variants share this register so we can wire them up too.

Tested on Raspberry Pi CM4 (bcmgenet + BCM54210PE),
Raspberry Pi CM5 (Cadence GEM + BCM54210PE) and
Raspberry Pi 5 (Cadence GEM + BCM54213PE).

[1] https://lore.kernel.org/netdev/acuwvoydmJusuj9x@shell.armlinux.org.uk/
====================

Link: https://patch.msgid.link/20260406-devel-autonomous-eee-v1-0-b335e7143711@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: convert RTL8211F to .disable_autonomous_eee

The RTL8211F previously unconditionally disabled PHY-mode EEE in
config_init. Convert this to use the new .disable_autonomous_eee
callback so it is only disabled when the MAC indicates EEE support
via phy_support_eee().

This preserves PHY-autonomous EEE for MACs that do not support EEE,
while still disabling it when the MAC manages LPI.

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Link: https://patch.msgid.link/20260406-devel-autonomous-eee-v1-3-b335e7143711@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: broadcom: implement .disable_autonomous_eee for BCM54xx

Implement the .disable_autonomous_eee callback for the BCM54210E.

In AutogrEEEn mode the PHY manages EEE autonomously. Clearing the
AutogrEEEn enable bit in MII_BUF_CNTL_0 switches the PHY to Native
EEE mode.

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20260406-devel-autonomous-eee-v1-2-b335e7143711@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: add support for disabling PHY-autonomous EEE

Some PHYs (e.g. Broadcom BCM54xx, Realtek RTL8211F) implement
autonomous EEE where the PHY manages LPI signaling without forwarding
it to the MAC. This conflicts with MAC drivers that implement their own
LPI control.

Add a .disable_autonomous_eee callback to struct phy_driver and call it
from phy_support_eee(). When a MAC driver indicates it supports EEE via
phy_support_eee(), the PHY's autonomous EEE is automatically disabled so
the MAC can manage LPI entry/exit.

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Link: https://patch.msgid.link/20260406-devel-autonomous-eee-v1-1-b335e7143711@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ynl-ethtool-netlink-fix-nla_len-overflow-for-large-string-sets'

Hangbin Liu says:

====================
ynl/ethtool/netlink: fix nla_len overflow for large string sets

This series addresses a silent data corruption issue triggered when ynl
retrieves string sets from NICs with a large number of statistics entries
(e.g. mlx5_core with thousands of ETH_SS_STATS strings).

The root cause is that struct nlattr.nla_len is a __u16 (max 65535
bytes). When a NIC exports enough statistics strings, the
ETHTOOL_A_STRINGSET_STRINGS nest built by strset_fill_set() exceeds
this limit. nla_nest_end() silently truncates the length on assignment,
producing a corrupted netlink message.

Patch 1 moves ethtool.py to selftest.

Patch 2 improves the ethtool tool: rename the doit/dumpit helpers
to do_set/do_get and convert do_get to use ynl.do() with an
explicit device header instead of a full dump with client-side filtering.

Patch 3 adds a --dbg-small-recv option to the YNL ethtool tool,
matching the same option already present in cli.py, to help debug netlink
message size issues

Patch 4 adds a new helper nla_nest_end_safe() to check whether the nla_len
is overflow and return -EMSGSIZE early if so.

Patch 5 uses the new helper in ethtool to make sure the ethtool doesn't
reply a corrupted netlink message.
====================

Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-0-7623a5e8f70b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: strset: check nla_len overflow

The netlink attribute length field nla_len is a __u16, which can only
represent values up to 65535 bytes. NICs with a large number of
statistics strings (e.g. mlx5_core with thousands of ETH_SS_STATS
entries) can produce a ETHTOOL_A_STRINGSET_STRINGS nest that exceeds
this limit.

When nla_nest_end() writes the actual nest size back to nla_len, the
value is silently truncated. This results in a corrupted netlink message
being sent to userspace: the parser reads a wrong (truncated) attribute
length and misaligns all subsequent attribute boundaries, causing decode
errors.

Fix this by using the new helper nla_nest_end_safe and error out if
the size exceeds U16_MAX.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-5-7623a5e8f70b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: add a nla_nest_end_safe() helper

The nla_len field in struct nlattr is a __u16, which can only hold
values up to 65535. If a nested attribute grows beyond this limit,
nla_nest_end() silently truncates the length, producing a corrupted
netlink message with no indication of the problem.

Since nla_nest_end() is used everywhere and this issue rarely happens,
let's add a new helper to check the length.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-4-7623a5e8f70b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: ethtool: add --dbg-small-recv option

Add a --dbg-small-recv debug option to control the recv() buffer size
used by YNL, matching the same option already present in cli.py. This
is useful if user need to get large netlink message.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-3-7623a5e8f70b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: ethtool: use doit instead of dumpit for per-device GET

Rename the local helper doit() to do_set() and dumpit() to do_get() to
better reflect their purpose.

Convert do_get() to use ynl.do() with an explicit device header instead
of ynl.dump() followed by client-side filtering. This is more efficient
as the kernel only processes and returns data for the requested device,
rather than dumping all devices across the netns.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-2-7623a5e8f70b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: move ethtool.py to selftest

We have converted all the samples to selftests. This script is
the last piece of random "PoC" code we still have lying around.
Let's move it to tests.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-1-7623a5e8f70b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'bng_en-add-link-management-and-statistics-support'

Bhargava Marreddy says:

====================
bng_en: add link management and statistics support

This series enhances the bng_en driver by adding:
1. Link/PHY support
   a. Link query
   b. Async Link events
   c. Ethtool link set/get functionality
2. Hardware statistics reporting via ethtool -S

This version incorporates feedback received prior to splitting the
original series into two parts.
====================

Link: https://patch.msgid.link/20260406180420.279470-1-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bng_en: add support for ethtool -S stats display

Implement the legacy ethtool statistics interface (get_sset_count,
get_strings, get_ethtool_stats) to expose hardware counters not
available through standard kernel stats APIs.

Ex:
a) Per-queue ring stats
     rxq0_ucast_packets: 2
     rxq0_mcast_packets: 0
     rxq0_bcast_packets: 15
     rxq0_ucast_bytes: 120
     rxq0_mcast_bytes: 0
     rxq0_bcast_bytes: 900
     txq0_ucast_packets: 0
     txq0_mcast_packets: 0
     txq0_bcast_packets: 0
     txq0_ucast_bytes: 0
     txq0_mcast_bytes: 0
     txq0_bcast_bytes: 0

b) Per-queue TPA(LRO/GRO) stats
     rxq4_tpa_eligible_pkt: 0
     rxq4_tpa_eligible_bytes: 0
     rxq4_tpa_pkt: 0
     rxq4_tpa_bytes: 0
     rxq4_tpa_errors: 0
     rxq4_tpa_events: 0

c) Port level stats
     rxp_good_vlan_frames: 0
     rxp_mtu_err_frames: 0
     rxp_tagged_frames: 0
     rxp_double_tagged_frames: 0
     rxp_pfc_ena_frames_pri0: 0
     rxp_pfc_ena_frames_pri1: 0
     rxp_pfc_ena_frames_pri2: 0
     rxp_pfc_ena_frames_pri3: 0
     rxp_pfc_ena_frames_pri4: 0
     rxp_pfc_ena_frames_pri5: 0
     rxp_pfc_ena_frames_pri6: 0
     rxp_pfc_ena_frames_pri7: 0
     rxp_eee_lpi_events: 0
     rxp_eee_lpi_duration: 0
     rxp_runt_bytes: 0
     rxp_runt_frames: 0
     txp_good_vlan_frames: 0
     txp_jabber_frames: 0
     txp_fcs_err_frames: 0
     txp_pfc_ena_frames_pri0: 0
     txp_pfc_ena_frames_pri1: 0
     txp_pfc_ena_frames_pri2: 0
     txp_pfc_ena_frames_pri3: 0
     txp_pfc_ena_frames_pri4: 0
     txp_pfc_ena_frames_pri5: 0
     txp_pfc_ena_frames_pri6: 0
     txp_pfc_ena_frames_pri7: 0
     txp_eee_lpi_events: 0
     txp_eee_lpi_duration: 0
     txp_xthol_frames: 0

d) Per-priority stats
     rx_bytes_pri0: 4182650
     rx_bytes_pri1: 4182650
     rx_bytes_pri2: 4182650
     rx_bytes_pri3: 4182650
     rx_bytes_pri4: 4182650
     rx_bytes_pri5: 4182650
     rx_bytes_pri6: 4182650
     rx_bytes_pri7: 4182650

Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-11-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bng_en: implement netdev_stat_ops

Implement netdev_stat_ops to provide standardized per-queue
statistics via the Netlink API.

Below is the description of the hardware drop counters:

rx-hw-drop-overruns: Packets dropped by HW due to resource limitations
(e.g., no BDs available in the host ring).
rx-hw-drops: Total packets dropped by HW (sum of overruns and error
drops).
tx-hw-drop-errors: Packets dropped by HW because they were invalid or
malformed.
tx-hw-drops: Total packets dropped by HW (sum of resource limitations
and error drops).

The implementation was verified using the ynl tool:

./tools/net/ynl/pyynl/cli.py --spec \
Documentation/netlink/specs/netdev.yaml --dump qstats-get --json \
'{"ifindex":14, "scope":"queue"}'

[{'ifindex': 14, 'queue-id': 0, 'queue-type': 'rx', 'rx-bytes': 758,
'rx-hw-drop-overruns': 0, 'rx-hw-drops': 0, 'rx-packets': 11},
{'ifindex': 14, 'queue-id': 1, 'queue-type': 'rx', 'rx-bytes': 0,
'rx-hw-drop-overruns': 0, 'rx-hw-drops': 0, 'rx-packets': 0},
{'ifindex': 14, 'queue-id': 0, 'queue-type': 'tx', 'tx-bytes': 0,
'tx-hw-drop-errors': 0, 'tx-hw-drops': 0, 'tx-packets': 0},
{'ifindex': 14, 'queue-id': 1, 'queue-type': 'tx', 'tx-bytes': 0,
'tx-hw-drop-errors': 0, 'tx-hw-drops': 0, 'tx-packets': 0},
{'ifindex': 14, 'queue-id': 2, 'queue-type': 'tx', 'tx-bytes': 810,
'tx-hw-drop-errors': 0, 'tx-hw-drops': 0, 'tx-packets': 10},]

Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-10-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bng_en: implement ndo_get_stats64

Implement the ndo_get_stats64 callback to report aggregate network
statistics. The driver gathers these by accumulating the per-ring
counters into the provided rtnl_link_stats64 structure.

Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-9-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bng_en: periodically fetch and accumulate hardware statistics

Use the timer to schedule periodic stats collection via
the workqueue when the link is up. Fetch fresh counters from
hardware via DMA and accumulate them into 64-bit software
shadows, handling wrap-around for counters narrower than
64 bits.

Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Rahul Gupta <rahul-rg.gupta@broadcom.com>
Reviewed-by: Ajit Kumar Khaparde <ajit.khaparde@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-8-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bng_en: add HW stats infra and structured ethtool ops

Implement the hardware-level statistics foundation and modern structured
ethtool operations.

1. Infrastructure: Add HWRM firmware wrappers (FUNC_QSTATS_EXT,
PORT_QSTATS_EXT, and PORT_QSTATS) to query ring and port counters.
2. Structured ops: Implement .get_eth_phy_stats, .get_eth_mac_stats,
.get_eth_ctrl_stats, .get_pause_stats, and .get_rmon_stats.

Stats are initially reported as 0; accumulation logic is added
in a subsequent patch.

Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-7-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bng_en: add support for link async events

Register for firmware asynchronous events, including link-status,
link-speed, and PHY configuration changes. Upon event reception,
re-query the PHY and update ethtool settings accordingly.

Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Reviewed-by: Ajit Kumar Khaparde <ajit.khaparde@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-6-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bng_en: implement ethtool pauseparam operations

Implement .get_pauseparam and .set_pauseparam to support flow control
configuration. This allows reporting and setting of autoneg, RX pause,
and TX pause states.

Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-5-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bng_en: add ethtool link settings, get_link, and nway_reset

Add get/set_link_ksettings, get_link, and nway_reset support.
Report supported, advertised, and link-partner speeds across NRZ,
PAM4, and PAM4-112 signaling modes. Enable lane count reporting.

Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Reviewed-by: Ajit Kumar Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-4-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bng_en: query PHY capabilities and report link status

Query PHY capabilities and supported speeds from firmware,
retrieve current link state (speed, duplex, pause, FEC),
and log the information. Seed initial link state during probe.

Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Reviewed-by: Ajit Kumar Khaparde <ajit.khaparde@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-3-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bng_en: add per-PF workqueue, timer, and slow-path task

Add a dedicated single-thread workqueue and a timer for each PF
to drive deferred slow-path work such as link event handling and
stats collection. The timer is stopped via timer_delete_sync()
when interrupts are disabled and restarted on open.

While the close path stops the timer to prevent new tasks from
being scheduled, the sp_task and workqueue are preserved to
maintain state continuity. Final draining and destruction of
the workqueue are handled during PCI remove.

Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Ajit Kumar Khaparde <ajit.khaparde@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-2-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'add-tso-map-once-dma-helpers-and-bnxt-sw-uso-support'

Joe Damato says:

====================
Add TSO map-once DMA helpers and bnxt SW USO support

Greetings:

This series extends net/tso to add a data structure and some helpers allowing
drivers to DMA map headers and packet payloads a single time. The helpers can
then be used to reference slices of shared mapping for each segment. This
helps to avoid the cost of repeated DMA mappings, especially on systems which
use an IOMMU. N per-packet DMA maps are replaced with a single map for the
entire GSO skb. As of v3, the series uses the DMA IOVA API (as suggested by
Leon [1]) and provides a fallback path when an IOMMU is not in use. The DMA
IOVA API provides even better efficiency than the v2; see below.

The added helpers are then used in bnxt to add support for software UDP
Segmentation Offloading (SW USO) for older bnxt devices which do not have
support for USO in hardware. Since the helpers are generic, other drivers
can be extended similarly.

The v2 showed a ~4x reduction in DMA mapping calls at the same wire packet
rate on production traffic with a bnxt device. The v3, however, shows a larger
reduction of about ~6x at the same wire packet rate. This is thanks to Leon's
suggestion of using the DMA IOVA API [1].

Special care is taken to make bnxt ethtool operations work correctly: the ring
size cannot be reduced below a minimum threshold while USO is enabled and
growing the ring automatically re-enables USO if it was previously blocked.

This v10 contains some cosmetic changes (wrapping long lines), moves the test
to the correct directory, and attempts to fix the slot availability check
added in the v9.

I re-ran the python test and the test passed on my bnxt system. I also ran
this on a production system.
====================

Link: https://patch.msgid.link/20260408230607.2019402-1-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: Add USO test

Add a simple test for USO. Tests both ipv4 and ipv6 with several full
segments and a partial segment.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-11-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bnxt: Dispatch to SW USO

Wire in the SW USO path added in preceding commits when hardware USO is
not possible.

When a GSO skb with SKB_GSO_UDP_L4 arrives and the NIC lacks HW USO
capability, redirect to bnxt_sw_udp_gso_xmit() which handles software
segmentation into individual UDP frames submitted directly to the TX
ring.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-10-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bnxt: Add SW GSO completion and teardown support

Update __bnxt_tx_int and bnxt_free_one_tx_ring_skbs to handle SW GSO
segments:

- MID segments: adjust tx_pkts/tx_bytes accounting and skip skb free
  (the skb is shared across all segments and freed only once)

- LAST segments: call tso_dma_map_complete() to tear down the IOVA
  mapping if one was used. On the fallback path, payload DMA unmapping
  is handled by the existing per-BD dma_unmap_len walk.

Both MID and LAST completions advance tx_inline_cons to release the
segment's inline header slot back to the ring.

is_sw_gso is initialized to zero, so the new code paths are not run.

Add logic for feature advertisement and guardrails for ring sizing.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-9-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bnxt: Implement software USO

Implement bnxt_sw_udp_gso_xmit() using the core tso_dma_map API and
the pre-allocated TX inline buffer for per-segment headers.

The xmit path:
1. Calls tso_start() to initialize TSO state
2. Stack-allocates a tso_dma_map and calls tso_dma_map_init() to
   DMA-map the linear payload and all frags upfront.
3. For each segment:
   - Copies and patches headers via tso_build_hdr() into the
     pre-allocated tx_inline_buf (DMA-synced per segment)
   - Counts payload BDs via tso_dma_map_count()
   - Emits long BD (header) + ext BD + payload BDs
   - Payload BDs use tso_dma_map_next() which yields (dma_addr,
     chunk_len, mapping_len) tuples.

Header BDs set dma_unmap_len=0 since the inline buffer is pre-allocated
and unmapped only at ring teardown.

Completion state is updated by calling tso_dma_map_completion_save() for
the last segment.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-8-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bnxt: Add boilerplate GSO code

Add bnxt_gso.c and bnxt_gso.h with a stub bnxt_sw_udp_gso_xmit()
function, SW USO constants (BNXT_SW_USO_MAX_SEGS,
BNXT_SW_USO_MAX_DESCS), and the is_sw_gso field in bnxt_sw_tx_bd
with BNXT_SW_GSO_MID/LAST markers.

The full SW USO implementation will be added in a future commit.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-7-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bnxt: Add TX inline buffer infrastructure

Add per-ring pre-allocated inline buffer fields (tx_inline_buf,
tx_inline_dma, tx_inline_size) to bnxt_tx_ring_info and helpers to
allocate and free them. A producer and consumer (tx_inline_prod,
tx_inline_cons) are added to track which slot(s) of the inline buffer
are in-use.

The inline buffer will be used by the SW USO path for pre-allocated,
pre-DMA-mapped per-segment header copies. In the future, this
could be extended to support TX copybreak.

Allocation helper is marked __maybe_unused in this commit because it
will be wired in later.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-6-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bnxt: Use dma_unmap_len for TX completion unmapping

Store the DMA mapping length in each TX buffer descriptor via
dma_unmap_len_set at submit time, and use dma_unmap_len at completion
time.

This is a no-op for normal packets but prepares for software USO,
where header BDs set dma_unmap_len to 0 because the header buffer
is unmapped collectively rather than per-segment.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-5-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bnxt: Add a helper for tx_bd_ext

Factor out some code to setup tx_bd_exts into a helper function. This
helper will be used by SW USO implementation in the following commits.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-4-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bnxt: Export bnxt_xmit_get_cfa_action

Export bnxt_xmit_get_cfa_action so that it can be used in future commits
which add software USO support to bnxt.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-3-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: tso: Introduce tso_dma_map and helpers

Add struct tso_dma_map to tso.h for tracking DMA addresses of mapped
GSO payload data and tso_dma_map_completion_state.

The tso_dma_map combines DMA mapping storage with iterator state, allowing
drivers to walk pre-mapped DMA regions linearly. Includes fields for
the DMA IOVA path (iova_state, iova_offset, total_len) and a fallback
per-region path (linear_dma, frags[], frag_idx, offset).

The tso_dma_map_completion_state makes the IOVA completion state opaque
for drivers. Drivers are expected to allocate this and use the added
helpers to update the completion state.

Adds skb_frag_phys() to skbuff.h, returning the physical address
of a paged fragment's data, which is used by the tso_dma_map helpers
introduced in this commit described below.

The added TSO DMA map helpers are:

tso_dma_map_init(): DMA-maps the linear payload region and all frags
upfront. Prefers the DMA IOVA API for a single contiguous mapping with
one IOTLB sync; falls back to per-region dma_map_phys() otherwise.
Returns 0 on success, cleans up partial mappings on failure.

tso_dma_map_cleanup(): Handles both IOVA and fallback teardown paths.

tso_dma_map_count(): counts how many descriptors the next N bytes of
payload will need. Returns 1 if IOVA is used since the mapping is
contiguous.

tso_dma_map_next(): yields the next (dma_addr, chunk_len) pair.
On the IOVA path, each segment is a single contiguous chunk. On the
fallback path, indicates when a chunk starts a new DMA mapping so the
driver can set dma_unmap_len on that descriptor for completion-time
unmapping.

tso_dma_map_completion_save(): updates the completion state. Drivers
will call this at xmit time.

tso_dma_map_complete(): tears down the mapping at completion time and
returns true if the IOVA path was used. If it was not used, this is a
no-op and returns false.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-2-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

vsock/virtio: remove unnecessary call to `virtio_transport_get_ops`

`virtio_transport_send_pkt_info` gets all the transport information
from the parameter `t_ops`. There is no need to call
`virtio_transport_get_ops()`.

Remove it.

Acked-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260408-remove_parameter-v2-1-e00f31cf7a17@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: skb: clean up dead code after skb_kfree_head() simplification

Since commit 0f42e3f4fe2a ("net: skb: fix cross-cache free of
KFENCE-allocated skb head"), skb_kfree_head() always calls kfree()
and no longer uses end_offset to distinguish between skb_small_head_cache
and generic kmalloc caches.

Clean up the leftovers:

- Remove the unused end_offset parameter from skb_kfree_head() and
  update all callers.
- Remove the SKB_SMALL_HEAD_HEADROOM guard in __skb_unclone_keeptruesize()
  which was protecting the old skb_kfree_head() logic.
- Update the SKB_SMALL_HEAD_CACHE_SIZE comment to reflect that the
  non-power-of-2 sizing is no longer used for free-path disambiguation.

No functional change.

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260410034736.297900-1-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netkit: Don't emit scrub attribute for single device mode

When userspace reads a single mode netkit device via RTM_GETLINK,
it receives IFLA_NETKIT_SCRUB=NETKIT_SCRUB_DEFAULT attribute from
netkit_fill_info(). If that attribute is echoed back to recreate
the device, the seen_scrub presence check in netkit_new_link()
causes creation to fail with -EOPNOTSUPP. Since it has no meaning
for single devices at this point, just don't dump it.

Fixes: 481038960538 ("netkit: Add single device mode for netkit")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260410072334.548232-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan743x: rename chip_rev to fpga_rev

The variable chip_rev stores the value read from the FPGA_REV
register and represents the FPGA revision. Rename it to fpga_rev
to better reflect its meaning.

No functional change intended.

Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com>
Link: https://patch.msgid.link/20260410085710.9246-1-thangaraj.s@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: netfilter: nft_tproxy.sh: adjust to socat changes

Like e65d8b6f3092 ("selftests: drv-net: adjust to socat changes") we
need to add shut-none for this test too.

The extra 0-packet can trigger a second (unexpected) reply from the server.

Fixes: 7e37e0eacd22 ("selftests: netfilter: nft_tproxy.sh: add tcp tests")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20260408152432.24b8ad0d@kernel.org/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260409224506.27072-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nf-next-26-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter: updates for net-next

1-3) IPVS updates from Julian Anastasov to enhance visibility into
     IPVS internal state by exposing hash size, load factor etc and
     allows userspace to tune the load factor used for resizing hash
     tables.

4) reject empty/not nul terminated device names from xt_physdev.
   This isn't a bug fix; existing code doesn't require a c-string.
   But clean this up anyway because conceptually the interface name
   definitely should be a c-string.

5) Switch nfnetlink to skb_mac_header helpers that didn't exist back
   when this code was written.  This gives us additional debug checks
   but is not intended to change functionality.

6) Let the xt ttl/hoplimit match reject unknown operator modes.
   This is a cleanup, the evaluation function simply returns false when
   the mode is out of range.  From Marino Dzalto.

7) xt_socket match should enable defrag after all other checks. This
   bug is harmless, historically defrag could not be disabled either
   except by rmmod.

8) remove UDP-Lite conntrack support, from Fernando Fernandez Mancera.

9) Avoid a couple -Wflex-array-member-not-at-end warnings in the old
   xtables 32bit compat code, from Gustavo A. R. Silva.

10) nftables fwd expression should drop packets when their ttl/hl has
    expired.  This is a bug fix deferred, its not deemed important
    enough for -rc8.
11) Add additional checks before assuming the mac header is an ethernet
    header, from Zhengchuan Liang.

* tag 'nf-next-26-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: require Ethernet MAC header before using eth_hdr()
  netfilter: nft_fwd_netdev: check ttl/hl before forwarding
  netfilter: x_tables: Avoid a couple -Wflex-array-member-not-at-end warnings
  netfilter: conntrack: remove UDP-Lite conntrack support
  netfilter: xt_socket: enable defrag after all other checks
  netfilter: xt_HL: add pr_fmt and checkentry validation
  netfilter: nfnetlink: prefer skb_mac_header helpers
  netfilter: x_physdev: reject empty or not-nul terminated device names
  ipvs: add conn_lfactor and svc_lfactor sysctl vars
  ipvs: add ip_vs_status info
  ipvs: show the current conn_tab size to users
====================

Link: https://patch.msgid.link/20260410112352.23599-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'wireless-next-2026-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
Final updates, notably:
- crypto: move Michael MIC code into wireless (only)
- mac80211:
   - multi-link 4-addr support
   - NAN data support (but no drivers yet)
- ath10k: DT quirk to make it work on some devices
- ath12k: IPQ5424 support
- rtw89: USB improvements for performance

* tag 'wireless-next-2026-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (124 commits)
  wifi: cfg80211: Explicitly include <linux/export.h> in michael-mic.c
  wifi: ath10k: Add device-tree quirk to skip host cap QMI requests
  dt-bindings: wireless: ath10k: Add quirk to skip host cap QMI requests
  crypto: Remove michael_mic from crypto_shash API
  wifi: ipw2x00: Use michael_mic() from cfg80211
  wifi: ath12k: Use michael_mic() from cfg80211
  wifi: ath11k: Use michael_mic() from cfg80211
  wifi: mac80211, cfg80211: Export michael_mic() and move it to cfg80211
  wifi: ipw2x00: Rename michael_mic() to libipw_michael_mic()
  wifi: libertas_tf: refactor endpoint lookup
  wifi: libertas: refactor endpoint lookup
  wifi: at76c50x: refactor endpoint lookup
  wifi: ath12k: Enable IPQ5424 WiFi device support
  wifi: ath12k: Add CE remap hardware parameters for IPQ5424
  wifi: ath12k: add ath12k_hw_regs for IPQ5424
  wifi: ath12k: add ath12k_hw_version_map entry for IPQ5424
  wifi: ath12k: Add ath12k_hw_params for IPQ5424
  dt-bindings: net: wireless: add ath12k wifi device IPQ5424
  wifi: ath10k: fix station lookup failure during disconnect
  wifi: ath12k: Create symlink for each radio in a wiphy
  ...
====================

Link: https://patch.msgid.link/20260410064703.735099-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: add indirect call wrapper in tcp_conn_request()

Small improvement in SYN processing, to directly call
tcp_v6_init_seq_and_ts_off() or tcp_v4_init_seq_and_ts_off().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260410174950.745670-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: Rename ifq_idx to rxq_idx in netif_mp_* helpers

Rename the leftover ifq_idx parameter naming to rxq_idx to be
consistent with the rest of the file and the header declaration.
Back then this was taken out of the queue leasing series given
the cleanup is independent. No functional change.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/netdev/20260131160237.07789674@kernel.org
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260410130602.552600-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: py: add test case filtering and listing

When developing new test cases and reproducing failures in
existing ones we currently have to run the entire test which
can take minutes to finish.

Add command line options for test selection, modeled after
kselftest_harness.h:

  -l       list tests (filtered, if filters were specified)
  -t name  include test
  -T name  exclude test

Since we don't have as clean separation into fixture / variant /
test as kselftest_harness this is not really a 1 to 1 match.
We have to lean on glob patterns instead.

Like in kselftest_harness filters are evaluated in order, first
match wins. If only exclusions are specified everything else is
included and vice versa.

Glob patterns (*, ?, [) are supported in addition to exact
matching.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Tested-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260410013921.1710295-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fix reference tracker mismanagement in netdev_put_lock()

dev_put() releases a reference which didn't have a tracker.
References without a tracker are accounted in the tracking
code as "no_tracker". We can't free the tracker and then
call dev_put(). The references themselves will be fine
but the tracking code will think it's a double-release:

refcount_t: decrement hit 0; leaking memory.

IOW commit under fixes confused dev_put() (release never tracked
reference) with __dev_put() (just release the reference, skipping
the reference tracking infra).

Since __netdev_put_lock() uses dev_put() we can't feed a previously
tracked netdev ref into it. Let's flip things around.
netdev_put(dev, NULL) is the same as dev_put(dev) so make
netdev_put_lock() the real function and have __netdev_put_lock()
feed it a NULL tracker for all the cases that were untracked.

Fixes: d04686d9bc86 ("net: Implement netdev_nl_queue_create_doit")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20260410153600.1984522-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: return a drop_reason from tcp_add_backlog()

Part of a stack canary removal from tcp_v{4,6}_rcv().

Return a drop_reason instead of a boolean, so that we no longer
have to pass the address of a local variable.

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-37 (-37)
Function                                     old     new   delta
tcp_v6_rcv                                  3133    3129      -4
tcp_v4_rcv                                  3206    3202      -4
tcp_add_backlog                             1281    1252     -29
Total: Before=25567186, After=25567149, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260409101147.1642967-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ipvlan-multicast-delivery-changes'

Eric Dumazet says:

====================
ipvlan: multicast delivery changes

As we did recently for macvlan, this series adds some relief
when ipvlan is under multicast storms.
====================

Link: https://patch.msgid.link/20260409085238.1122947-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipvlan: avoid spinlock contention in ipvlan_multicast_enqueue()

Under high stress, we spend a lot of time cloning skbs,
then acquiring a spinlock, then freeing the clone because
the queue is full.

Add a shortcut to avoid these costs under pressure, as we did
in macvlan with commit 0d5dc1d7aad1 ("macvlan: avoid spinlock
contention in macvlan_broadcast_enqueue()")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260409085238.1122947-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipvlan: ipvlan_handle_mode_l2() refactoring

Reduce indentation level, and add a likely() clause
as we expect to process more unicast packets than multicast ones.

No functional change, this eases the following patch review.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260409085238.1122947-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: Add net_cookie to Dead loop messages

Network devices can have the same name within different network namespaces.
To help distinguish these devices, add the net_cookie value which can be
used to identify the netns.

Signed-off-by: Chris J Arges <carges@cloudflare.com>
Link: https://patch.msgid.link/20260408191056.1036330-1-carges@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-dsa-tag_rtl8_4-fixes-doc-and-set-keep'

Luiz Angelo Daros de Luca says:

====================
net: dsa: tag_rtl8_4: fixes doc and set keep

This small series addresses two points in the rtl8_4 tagger used by the
realtel rtl8365mb driver.

The first patch updates the documentation of the tag format while the
second patch sets the KEEP flag bit, ensuring that the switch
respects the frame's VLAN format as provided by the kernel.

These patches were previously part of a larger series but are being
submitted independently as they are self-contained and already
received review.

Link: https://patch.msgid.link/CAD++jLmX31KfhGXA6SMAPXb14dHSC1t4JQZ=PQvjh-3hUcnzJA@mail.gmail.com
====================

Link: https://patch.msgid.link/20260408-realtek_fixes-v1-0-915ff1404d56@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: tag_rtl8_4: set KEEP flag

KEEP=1 is needed because we should respect the format of the packet as
the kernel sends it to us. Unless tx forward offloading is used, the
kernel is giving us the packet exactly as it should leave the specified
port on the wire. Until now this was not needed because the ports were
always functioning in a standalone mode in a VLAN-unaware way, so the
switch would not tag or untag frames anyway. But arguably it should have
been KEEP=1 all along.

Co-developed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20260408-realtek_fixes-v1-2-915ff1404d56@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: tag_rtl8_4: update format description

Document the updated tag layout fields (EFID, VSEL/VIDX) and clarify
which bits are set/cleared when emitting tags.

Co-developed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20260408-realtek_fixes-v1-1-915ff1404d56@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'wangxun-improvement'

Jiawen Wu says:

====================
Wangxun improvement

This patch series cleans up the code and enhances the implementation.
====================

Link: https://patch.msgid.link/20260407025616.33652-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: libwx: improve flow control setting

Save the current mode of flow control, and enhance the statistics of
pause frames.

The received pause frames are divided into XON and XOFF to be counted.
And due to the hardware defect of SP devices, XON packets cannot be
trasmitted correctly, so Tx XON pause is disabled by default for those
devices.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20260407025616.33652-10-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: libwx: wrap-around and reset qmprc counter

The WX_PX_MPRC registers are not clear-on-read hardware counters. The
previous implementation directly read and accumulated these 32-bit values
into a 64-bit software counter. Now implement a rd32_wrap() helper
function to calculate the delta counter to correct the statistic.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20260407025616.33652-9-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: wangxun: schedule hardware stats update in watchdog

Hardware statistics should be updated periodically in the watchdog to
prevent 32-bit registers from overflowing. This is also required for the
upcoming pause frame accounting logic, which relies on regular statistics
sampling.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20260407025616.33652-8-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: wangxun: reorder timer and work sync cancellations

When removing the device, timer_delete_sync(&wx->service_timer) is
called in .ndo_stop() after cancel_work_sync(&wx->service_task). This
may cause new work to be queued after device down.

Move unregister_netdev() before cancel_work_sync(), and use
timer_shutdown_sync() to prevent the timer from being re-armed.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20260407025616.33652-7-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: wangxun: move ethtool_ops.set_channels into libwx

Since function ops wx->setup_tc() is set in txgbe and ngbe,
ethtool_ops.set_channels can be implemented in libwx to reduce
duplicated code.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20260407025616.33652-6-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: wangxun: replace busy-wait reset flag with kernel mutex

Replace the busy-wait loop using test_and_set_bit(WX_STATE_RESETTING)
with a proper per-device mutex to serialize reset operations.

The reset flag is reserved for other code paths (like watchdog), which
need tocheck if a reset is in process.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20260407025616.33652-5-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ngbe: remove redundant macros

NGBE_NCSI_SUP and NGBE_NCSI_MASK are duplicate-defined, they can be
replaced by the macros defined in libwx. Just remove them.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260407025616.33652-4-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ngbe: move the WOL functions to libwx

Remove duplicate-defined register macros, move the WOL implementation to
the library module. So that the WOL functions can be reused in txgbe
later.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20260407025616.33652-3-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ngbe: remove netdev->ethtool->wol_enabled setting

netdev->ethtool->wol_enabled is set in ethtool core code, so remove the
redundant setting in ngbe_set_wol().

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260407025616.33652-2-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: ti: am65-cpsw: add support for J722S SoC family

The J722S CPSW3G is mostly identical to the AM64's, but additionally
supports SGMII.

Signed-off-by: Nora Schiffer <nora.schiffer@ew.tq-group.com>
Link: https://patch.msgid.link/6118e81358d47f455e4c1dbddf3ece6f3329e184.1775558273.git.nora.schiffer@ew.tq-group.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: ti: k3-am654-cpsw-nuss: Add ti,j722s-cpsw-nuss compatible

The J722S CPSW3G is mostly identical to the AM64's, but additionally
supports SGMII. The AM64 compatible ti,am642-cpsw-nuss is used as a
fallback.

Signed-off-by: Nora Schiffer <nora.schiffer@ew.tq-group.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/191e9f7e3a6c14eabe891a98c5fb646766479c0a.1775558273.git.nora.schiffer@ew.tq-group.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'dpll-zl3073x-add-ref-sync-pair-support'

Ivan Vecera says:

====================
dpll: zl3073x: add ref-sync pair support

This series adds Reference-Sync pair support to the ZL3073x DPLL driver.
A Ref-Sync pair consists of a clock reference and a low-frequency sync
signal (e.g. 1 PPS) where the DPLL locks to the clock reference but
phase-aligns to the sync reference.

Patches 1-3 are preparatory cleanups and helper additions:
- Clean up esync get/set callbacks with early returns and use the
zl3073x_out_is_ndiv() helper
- Convert open-coded clear-and-set bitfield patterns to FIELD_MODIFY()
- Add ref sync control and output clock type accessor helpers

Patch 4 adds the 'ref-sync-sources' phandle-array property to the
dpll-pin device tree binding schema and updates the ZL3073x binding
examples.

Patch 5 implements the driver support:
- ref_sync_get/set callbacks with frequency validation
- Automatic sync source exclusion from reference selection
- Device tree based ref-sync pair registration

Tested and verified on Microchip EDS2 (pcb8385) development board.
====================

Link: https://patch.msgid.link/20260408102716.443099-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpll: zl3073x: add ref-sync pair support

Add support for ref-sync pair registration using the 'ref-sync-sources'
phandle property from device tree. A ref-sync pair consists of a clock
reference and a low-frequency sync signal where the DPLL locks to the
clock reference but phase-aligns to the sync reference.

The implementation:
- Stores fwnode handle in zl3073x_dpll_pin during pin registration
- Adds ref_sync_get/set callbacks to read and write the sync control
  mode and pair registers
- Validates ref-sync frequency constraints: sync signal must be 8 kHz
  or less, clock reference must be 1 kHz or more and higher than sync
- Excludes sync source from automatic reference selection by setting
  its priority to NONE on connect; on disconnect the priority is left
  as NONE and the user must explicitly make the pin selectable again
- Iterates ref-sync-sources phandles to register declared pairings
  via dpll_pin_ref_sync_pair_add()

Reviewed-by: Petr Oros <poros@redhat.com>
Reviewed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260408102716.443099-6-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: dpll: add ref-sync-sources property

Add ref-sync-sources phandle-array property to the dpll-pin schema
allowing board designers to declare which input pins can serve as
sync sources in a Reference-Sync pair. A Ref-Sync pair consists of
a clock reference and a low-frequency sync signal where the DPLL locks
to the clock but phase-aligns to the sync reference.

Update both examples in the Microchip ZL3073x binding to demonstrate
the new property with a 1 PPS sync source paired to a clock source.

Reviewed-by: Petr Oros <poros@redhat.com>
Reviewed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260408102716.443099-5-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpll: zl3073x: add ref sync and output clock type helpers

Add ZL_REF_SYNC_CTRL_MODE_REFSYNC_PAIR and ZL_REF_SYNC_CTRL_PAIR
register definitions.

Add inline helpers to get and set the sync control mode and sync pair
fields of the reference sync control register:

  zl3073x_ref_sync_mode_get/set() - ZL_REF_SYNC_CTRL_MODE field
  zl3073x_ref_sync_pair_get/set() - ZL_REF_SYNC_CTRL_PAIR field

Add inline helpers to get and set the clock type field of the output
mode register:

  zl3073x_out_clock_type_get/set() - ZL_OUTPUT_MODE_CLOCK_TYPE field

Convert existing esync callbacks to use the new helpers.

Reviewed-by: Petr Oros <poros@redhat.com>
Reviewed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260408102716.443099-4-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpll: zl3073x: use FIELD_MODIFY() for clear-and-set patterns

Replace open-coded clear-and-set bitfield operations with
FIELD_MODIFY().

Reviewed-by: Petr Oros <poros@redhat.com>
Reviewed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260408102716.443099-3-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpll: zl3073x: clean up esync get/set and use zl3073x_out_is_ndiv()

Return -EOPNOTSUPP early in esync_get callbacks when esync is not
supported instead of conditionally populating the range at the end.
This simplifies the control flow by removing the finish label/goto
in the output variant and the conditional range assignment in both
input and output variants.

Replace open-coded N-div signal format switch statements with
zl3073x_out_is_ndiv() helper in esync_get, esync_set and
frequency_set callbacks.

Reviewed-by: Petr Oros <poros@redhat.com>
Reviewed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260408102716.443099-2-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

iavf: fix kernel-doc comment style in iavf_ethtool.c

iavf_ethtool.c contains 31 kernel-doc comment blocks using the legacy
`**/` terminator instead of the correct single `*/`. Two function
headers also use a colon separator (`iavf_get_channels:`,
`iavf_set_channels:`) instead of the ` - ` dash required by kernel-doc.

Additionally several comments embed their return-value descriptions in
the body paragraph, producing `scripts/kernel-doc -Wreturn` warnings.
Void functions that incorrectly say "Returns ..." are also rephrased.

Fix all issues across the full file:
- Replace every `**/` terminator with `*/`.
- Change `function_name:` doc headers to `function_name -`.
- Move inline "Returns ..." sentences into dedicated `Return:` sections
   for non-void functions (iavf_get_msglevel, iavf_get_rxnfc,
   iavf_set_channels, iavf_get_rxfh_key_size, iavf_get_rxfh_indir_size,
   iavf_get_rxfh, iavf_set_rxfh).
- Rephrase body descriptions in void functions that incorrectly said
   "Returns ..." (iavf_get_drvinfo, iavf_get_ringparam, iavf_get_coalesce).
- Remove boilerplate body text for iavf_get_rxfh_key_size and
   iavf_get_rxfh_indir_size; the `Return:` line now conveys the same
   information without the vague "Returns the table size." sentence.

Suggested-by: Anthony L. Nguyen <anthony.l.nguyen@intel.com>
Suggested-by: Leszek Pepiak <leszek.pepiak@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260409093020.3808687-1-aleksandr.loktionov@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-dsa-mxl862xx-vlan-support-and-minor-improvements'

Daniel Golle says:

====================
net: dsa: mxl862xx: VLAN support and minor improvements

This series adds VLAN offloading to the mxl862xx DSA driver along
with two minor improvements to port setup and bridge configuration.
VLAN support uses a hybrid architecture combining the Extended VLAN
engine for PVID insertion and tag stripping with the VLAN Filter
engine for per-port VID membership, both drawing from shared
1024-entry hardware pools partitioned across user ports at probe time.
====================

Link: https://patch.msgid.link/cover.1775581804.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mxl862xx: implement VLAN functionality

Add VLAN support using both the Extended VLAN (EVLAN) engine and the
VLAN Filter (VF) engine in a hybrid architecture that allows a higher
number of VIDs than either engine could achieve alone.

The VLAN Filter engine handles per-port VID membership checks with
discard-unmatched semantics. The Extended VLAN engine handles PVID
insertion on ingress (via fixed catchall rules) and tag stripping on
egress (2 rules per untagged VID). Tagged-only VIDs need no EVLAN
egress rules at all, so they consume only a VF entry.

Both engines draw from shared 1024-entry hardware pools. The VF pool
is divided equally among user ports for VID membership, while the
EVLAN pool is partitioned into small fixed-size ingress blocks (7
entries of catchall rules per port) and fixed-size egress blocks for
tag stripping.

With 5 user ports this yields up to 204 VIDs per port (limited by VF),
of which up to 98 can be untagged (limited by EVLAN egress budget).
With 9 user ports the numbers are 113 total and 53 untagged.

Wire up .port_vlan_add, .port_vlan_del, and .port_vlan_filtering.
Reprogram all EVLAN rules when the PVID or filtering mode changes.
Detach blocks from the bridge port before freeing them on bridge leave
to satisfy the firmware's internal refcount.

Future optimizations could increase VID capacity by dynamically sizing
the egress EVLAN blocks based on actual per-port untagged VID counts
rather than worst-case pre-allocation, or by sharing EVLAN egress and
VLAN Filter blocks across ports with identical VID sets.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/9be29637675342b109a85fa08f5378800d9f7b78.1775581804.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mxl862xx: don't skip early bridge port configuration

mxl862xx_bridge_port_set() is currently guarded by the
mxl8622_port->setup_done flag, as the early call to
mxl862xx_bridge_port_set() from mxl862xx_port_stp_state_set() would
otherwise cause a NULL-pointer dereference on unused ports which don't
have dp->cpu_dp despite not being a CPU port.

Using the setup_done flag (which is never set for unused ports),
however, also prevents mxl862xx_bridge_port_set() from configuring
user ports' single-port bridges early, which was unintended.

Fix this by returning early from mxl862xx_bridge_port_set() in case
dsa_port_is_unused().

Fixes: 340bdf984613c ("net: dsa: mxl862xx: implement bridge offloading")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/15962aac29ebe0a6eb77565451acff880c41ef33.1775581804.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mxl862xx: reject DSA_PORT_TYPE_DSA

DSA links aren't supported by the mxl862xx driver.

Instead of returning early from .port_setup when called for
DSA_PORT_TYPE_DSA ports rather return -EOPNOTSUPP and show an error
message.

The desired side-effect is that the framework will switch the port to
DSA_PORT_TYPE_UNUSED, so we can stop caring about DSA_PORT_TYPE_DSA in
all other places.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/b686f3a22d8a6e7d470e7aa98da811a996a229b9.1775581804.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-bridge-add-stp_mode-attribute-for-stp-mode-selection'

Andy Roulin says:

====================
net: bridge: add stp_mode attribute for STP mode selection

The bridge-stp usermode helper is currently restricted to the initial
network namespace, preventing userspace STP daemons like mstpd from
operating on bridges in other namespaces. Since commit ff62198553e4
("bridge: Only call /sbin/bridge-stp for the initial network
namespace"), bridges in non-init namespaces silently fall back to
kernel STP with no way to request userspace STP.

This series adds a new IFLA_BR_STP_MODE bridge attribute that allows
explicit per-bridge control over STP mode selection. Three modes are
supported:

  - auto (default): existing behavior, try /sbin/bridge-stp in
    init_net, fall back to kernel STP otherwise
  - user: directly enable BR_USER_STP without invoking the helper,
    works in any network namespace
  - kernel: directly enable BR_KERNEL_STP without invoking the helper

The user and kernel modes bypass call_usermodehelper() entirely,
addressing the security concerns discussed at [1]. Userspace is
responsible for ensuring an STP daemon manages the bridge, rather
than relying on the kernel to invoke /sbin/bridge-stp.

Patch 1 adds the kernel support. The mode can only be changed while
STP is disabled and is processed before IFLA_BR_STP_STATE in
br_changelink() so both can be set atomically in a single netlink
message.

Patch 2 adds documentation for the new attribute in the bridge docs.

Patch 3 adds a selftest with 9 test cases. The test requires iproute2
with IFLA_BR_STP_MODE support and can be run with virtme-ng:

  vng --run arch/x86/boot/bzImage --skip-modules \
      --overlay-rwdir /sbin --overlay-rwdir /tmp --overlay-rwdir /bin \
      --exec 'cp /path/to/iproute2-next/ip/ip /bin/ip && \
              cd tools/testing/selftests/net && \
              bash bridge_stp_mode.sh'

iproute2 support can be found here [2].

[1] https://lore.kernel.org/netdev/565B7F7D.80208@nod.at/
[2] https://github.com/aroulin/iproute2-next/tree/bridge-stp-mode
====================

Link: https://patch.msgid.link/20260405205224.3163000-1-aroulin@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: add bridge STP mode selection test

Add a selftest for the IFLA_BR_STP_MODE bridge attribute that verifies:

1. stp_mode defaults to auto on new bridges
2. stp_mode can be toggled between user, kernel, and auto
3. Changing stp_mode while STP is active is rejected with -EBUSY
4. Re-setting the same stp_mode while STP is active succeeds
5. stp_mode user in a network namespace yields userspace STP (stp_state=2)
6. stp_mode kernel forces kernel STP (stp_state=1)
7. stp_mode auto in a netns preserves traditional fallback to kernel STP
8. stp_mode and stp_state can be set atomically in a single message
9. stp_mode persists across STP disable/enable cycles

Test 5 is the key use case: it demonstrates that userspace STP can now
be enabled in non-init network namespaces by setting stp_mode to user
before enabling STP.

Test 8 verifies the atomic usage pattern where both attributes are set
in a single netlink message, which is supported because br_changelink()
processes IFLA_BR_STP_MODE before IFLA_BR_STP_STATE.

The test gracefully skips if the installed iproute2 does not support
the stp_mode attribute.

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Andy Roulin <aroulin@nvidia.com>
Link: https://patch.msgid.link/20260405205224.3163000-4-aroulin@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: net: bridge: document stp_mode attribute

Add documentation for the IFLA_BR_STP_MODE bridge attribute in the
"User space STP helper" section of the bridge documentation. Reference
the BR_STP_MODE_* values via kernel-doc and describe the use case for
network namespace environments.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Andy Roulin <aroulin@nvidia.com>
Link: https://patch.msgid.link/20260405205224.3163000-3-aroulin@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bridge: add stp_mode attribute for STP mode selection

The bridge-stp usermode helper is currently restricted to the initial
network namespace, preventing userspace STP daemons (e.g. mstpd) from
operating on bridges in other network namespaces. Since commit
ff62198553e4 ("bridge: Only call /sbin/bridge-stp for the initial
network namespace"), bridges in non-init namespaces silently fall
back to kernel STP with no way to use userspace STP.

Add a new bridge attribute IFLA_BR_STP_MODE that allows explicit
per-bridge control over STP mode selection:

  BR_STP_MODE_AUTO (default) - Existing behavior: invoke the
    /sbin/bridge-stp helper in init_net only; fall back to kernel STP
    if it fails or in non-init namespaces.

  BR_STP_MODE_USER - Directly enable userspace STP (BR_USER_STP)
    without invoking the helper. Works in any network namespace.
    Userspace is responsible for ensuring an STP daemon manages the
    bridge.

  BR_STP_MODE_KERNEL - Directly enable kernel STP (BR_KERNEL_STP)
    without invoking the helper.

The mode can only be changed while STP is disabled, or set to the
same value (-EBUSY otherwise). IFLA_BR_STP_MODE is processed before
IFLA_BR_STP_STATE in br_changelink(), so both can be set atomically
in a single netlink message. The mode can also be changed in the
same message that disables STP.

The stp_mode struct field is u8 since all possible values fit, while
NLA_U32 is used for the netlink attribute since it occupies the same
space in the netlink message as NLA_U8.

A new stp_helper_active boolean tracks whether the /sbin/bridge-stp
helper was invoked during br_stp_start(), so that br_stp_stop() only
calls the helper for stop when it was called for start. This avoids
calling the helper asymmetrically when stp_mode changes between
start and stop.

Suggested-by: Ido Schimmel <idosch@nvidia.com>
Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Andy Roulin <aroulin@nvidia.com>
Link: https://patch.msgid.link/20260405205224.3163000-2-aroulin@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'add-selftests-for-ntuple-nfc-rules'

Dimitri Daskalakis says:

====================
Add selftests for ntuple (NFC) rules

Thoroughly testing a device's NFC implementation can be tedious. The more
features a device supports, the more combinations to validate.

This series aims to ease that burden, validating the most common NFC rule
combinations.
====================

Link: https://patch.msgid.link/20260407164954.2977820-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: ntuple: Add dst-ip, src-port, dst-port fields

Extend the ntuple flow steering test to cover dst-ip, src-port, and
dst-port fields. The test supports arbitrary combinations of the fields,
for now we test src_ip/dst_ip, and src_ip/dst_ip/src_port/dst_port.

The tests currently match full fields, but we can consider adding
support for masked fields in the future.

TAP version 13
1..24
ok 1 ntuple.queue.tcp4.src_ip
ok 2 ntuple.queue.tcp4.dst_ip
ok 3 ntuple.queue.tcp4.src_port
ok 4 ntuple.queue.tcp4.dst_port
ok 5 ntuple.queue.tcp4.src_ip.dst_ip
ok 6 ntuple.queue.tcp4.src_ip.dst_ip.src_port.dst_port
ok 7 ntuple.queue.udp4.src_ip
ok 8 ntuple.queue.udp4.dst_ip
ok 9 ntuple.queue.udp4.src_port
ok 10 ntuple.queue.udp4.dst_port
ok 11 ntuple.queue.udp4.src_ip.dst_ip
ok 12 ntuple.queue.udp4.src_ip.dst_ip.src_port.dst_port
ok 13 ntuple.queue.tcp6.src_ip
ok 14 ntuple.queue.tcp6.dst_ip
ok 15 ntuple.queue.tcp6.src_port
ok 16 ntuple.queue.tcp6.dst_port
ok 17 ntuple.queue.tcp6.src_ip.dst_ip
ok 18 ntuple.queue.tcp6.src_ip.dst_ip.src_port.dst_port
ok 19 ntuple.queue.udp6.src_ip
ok 20 ntuple.queue.udp6.dst_ip
ok 21 ntuple.queue.udp6.src_port
ok 22 ntuple.queue.udp6.dst_port
ok 23 ntuple.queue.udp6.src_ip.dst_ip
ok 24 ntuple.queue.udp6.src_ip.dst_ip.src_port.dst_port
# Totals: pass:24 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Dimitri Daskalakis <daskald@meta.com>
Link: https://patch.msgid.link/20260407164954.2977820-3-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: Add ntuple (NFC) flow steering test

Add a test for ethtool NFC (ntuple) flow steering rules. The test
creates an ntuple rule matching on various flow fields and verifies
that traffic is steered to the correct queue.

The test forces all traffic to queue 0 via the indirection table,
then installs an ntuple rule to steer select traffic to a specific
queue. The test then verifies the expected number of packets is received
on the queue.

This test has variants for TCP/UDP over IPv4/IPv6, with rules matching
the source IP. Additional match fields will be added in the next commit.

TAP version 13
1..4
ok 1 ntuple.queue.tcp4.src_ip
ok 2 ntuple.queue.udp4.src_ip
ok 3 ntuple.queue.tcp6.src_ip
ok 4 ntuple.queue.udp6.src_ip
# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Dimitri Daskalakis <daskald@meta.com>
Link: https://patch.msgid.link/20260407164954.2977820-2-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netfilter: require Ethernet MAC header before using eth_hdr()

`ip6t_eui64`, `xt_mac`, the `bitmap:ip,mac`, `hash:ip,mac`, and
`hash:mac` ipset types, and `nf_log_syslog` access `eth_hdr(skb)`
after either assuming that the skb is associated with an Ethernet
device or checking only that the `ETH_HLEN` bytes at
`skb_mac_header(skb)` lie between `skb->head` and `skb->data`.

Make these paths first verify that the skb is associated with an
Ethernet device, that the MAC header was set, and that it spans at
least a full Ethernet header before accessing `eth_hdr(skb)`.

Suggested-by: Florian Westphal <fw@strlen.de>
Tested-by: Ren Wei <enjou1224z@gmail.com>
Signed-off-by: Zhengchuan Liang <zcliangcn@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nft_fwd_netdev: check ttl/hl before forwarding

Drop packets if their ttl/hl is too small for forwarding.

Fixes: d32de98ea70f ("netfilter: nft_fwd_netdev: allow to forward packets via neighbour layer")
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: x_tables: Avoid a couple -Wflex-array-member-not-at-end warnings

-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

Use the TRAILING_OVERLAP() helper to fix the following warnings:

1 net/netfilter/x_tables.c:816:39: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
1 net/netfilter/x_tables.c:811:39: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

This helper creates a union between a flexible-array member (FAM)
and a set of members that would otherwise follow it. This overlays
the trailing members onto the FAM while preserving the original
memory layout.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: conntrack: remove UDP-Lite conntrack support

UDP-Lite (RFC 3828) socket support was recently retired from the core
networking stack. As a follow-up of that, drop the connection tracker
and NAT support for UDP-Lite in Netfilter.

This patch removes CONFIG_NF_CT_PROTO_UDPLITE and scrubs UDP-Lite
awareness from the conntrack core, NAT core, nft_ct, and ctnetlink.
Please note that stateless packet inspection, matching, ipsets or
logging support for IPPROTO_UDPLITE is preserved.

As conntrack no longer extracts UDP-Lite ports or tracks its L4 state,
when performing NAT the UDP-Lite checksum cannot be updated anymore.
That is an expected and acceptable consequence of removing UDP-Lite
conntrack module.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: xt_socket: enable defrag after all other checks

Originally this did not matter because defrag was enabled once per netns
and only disabled again on netns dismantle. When this got changed I should
have adjusted checkentry to not leave defrag enabled on error.

Fixes: de8c12110a13 ("netfilter: disable defrag once its no longer needed")
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: xt_HL: add pr_fmt and checkentry validation

Add pr_fmt to prefix log messages with the module name for
easier debugging in dmesg.

Add checkentry functions for IPv4 (ttl_mt_check) and IPv6
(hl_mt6_check) to validate the match mode at rule registration
time, rejecting invalid modes with -EINVAL.

The evaluation function returns false in case the mode is
unknown, so this is a cleanup, not a bug fix.

Signed-off-by: Marino Dzalto <marino.dzalto@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nfnetlink: prefer skb_mac_header helpers

This adds implicit DEBUG_WARN_ON_ONCE for debug configurations.
No other changes intended.

Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: x_physdev: reject empty or not-nul terminated device names

Reject names that lack a \0 character and reject the empty string as
well. iptables allows this but it fails to re-parse iptables-save output
that contain such rules.

Signed-off-by: Florian Westphal <fw@strlen.de>

ipvs: add conn_lfactor and svc_lfactor sysctl vars

Allow the default load factor for the connection and service tables
to be configured.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>

ipvs: add ip_vs_status info

Add /proc/net/ip_vs_status to show current state of IPVS.

The motivation for this new /proc interface is to provide the output
for the users to help them decide when to tune the load factor for
hash tables, which is possible with the new sysctl knobs coming in
followup patch.

The output also includes information for the kthreads used for stats.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>

ipvs: show the current conn_tab size to users

As conn_tab is per-net, better to show the current hash table size
to users instead of the ip_vs_conn_tab_size (max).

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>

tools: ynl: tests: fix leading space on Makefile target

The ../generated/protos.a rule had a spurious leading space before the
target name. In make, target rules must start at column 0; only recipe
lines are indented with a tab. The extra space caused make to misparse
the rule.

Remove the leading space to match the style of the adjacent
../lib/ynl.a rule.

Fixes: e0aa0c61758f ("tools: ynl: move samples to tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260408-ynl_makefile-v1-1-f9624acc2ad9@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: py: explicitly forbid multiple ksft_run() calls

People (do people still write code or is it all AI?) seem to not
get that ksft_run() can only be called once. If we call it
multiple times KTAP parsers will likely cut off after the first
batch has finished.

Link: https://patch.msgid.link/20260408221952.819822-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: sit: remove redundant ret = 0 assignment

The variable ret is assigned a value at all places where it is used;
There is no need to assign a value when it is initially defined.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://patch.msgid.link/20260408032051.3096449-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>