git.ipfire.org Git - thirdparty/linux.git/log

Merge branch 'mptcp-add-tcp_maxseg-sockopt-support'

Matthieu Baerts says:

====================
mptcp: add TCP_MAXSEG sockopt support

The TCP_MAXSEG socket option was not supported by MPTCP, mainly because
it has never been requested before. But there are still valid use-cases,
e.g. with HAProxy.

- Patch 1 is a small cleanup patch in the MPTCP sockopt file.

- Patch 2 expose some code from TCP, to avoid duplicating it in MPTCP.

- Patch 3 adds TCP_MAXSEG sockopt support in MPTCP.

- Patch 4 is not related to the others, it fixes a typo in a comment.

Note that the new TCP_MAXSEG sockopt support has been validated by a new
packetdrill script on the MPTCP CI:

https://github.com/multipath-tcp/packetdrill/pull/161

v1: https://lore.kernel.org/20250716-net-next-mptcp-tcp_maxseg-v1-0-548d3a5666f6@kernel.org
====================

Link: https://patch.msgid.link/20250719-net-next-mptcp-tcp_maxseg-v2-0-8c910fbc5307@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: fix typo in a comment

This patch fixes the follow spelling mistake in a comment:

greter -> greater

Signed-off-by: moyuanhao <moyuanhao3676@163.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250719-net-next-mptcp-tcp_maxseg-v2-4-8c910fbc5307@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: add TCP_MAXSEG sockopt support

The TCP_MAXSEG socket option is currently not supported by MPTCP, mainly
because it has never been requested before. But there are still valid
use-cases, e.g. with HAProxy.

This patch adds its support in MPTCP by propagating the value to all
subflows. The get part looks at the value on the first subflow, to be as
closed as possible to TCP. Only one value can be returned for the cached
MSS, so this can come only from one subflow.

Similar to mptcp_setsockopt_first_sf_only(), a generic helper
mptcp_setsockopt_all_subflows() is added to set sockopt for each
subflows of the mptcp socket.

Add a new member for struct mptcp_sock to store the TCP_MAXSEG value,
and return this value in getsockopt.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/515
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250719-net-next-mptcp-tcp_maxseg-v2-3-8c910fbc5307@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: add tcp_sock_set_maxseg

Add a helper tcp_sock_set_maxseg() to directly set the TCP_MAXSEG
sockopt from kernel space.

This new helper will be used in the following patch from MPTCP.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250719-net-next-mptcp-tcp_maxseg-v2-2-8c910fbc5307@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: sockopt: drop redundant tcp_getsockopt

tcp_getsockopt() is called twice in mptcp_getsockopt_first_sf_only() in
different conditions, which makes the code a bit redundant.

The first call to tcp_getsockopt() when the first subflow exists can be
replaced by going to a new label "get" before the second call.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250719-net-next-mptcp-tcp_maxseg-v2-1-8c910fbc5307@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

s390/qeth: Make hw_trap sysfs attribute idempotent

Update qeth driver to allow writing an existing value to the "hw_trap"
sysfs attribute. Attempting such a write earlier resulted in -EINVAL.
In other words, make the sysfs attribute idempotent.

After:
    $ cat hw_trap
    disarm
    $ echo disarm > hw_trap
    $

Suggested-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Aswin Karuvally <aswin@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250718141711.1141049-1-wintera@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: qcom: qca807x: Enable WoL support using shared library

The Wake-on-LAN (WoL) functionality for the QCA807x series is identical
to that of the AT8031. WoL support for QCA807x is enabled by utilizing
the at8031_set_wol() function provided in the shared library.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250718-qca807x_wol_support-v1-1-cfe323cbb4e8@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: usb: smsc95xx: add support for ethtool pause parameters

Implement ethtool .get_pauseparam and .set_pauseparam handlers for
configuring flow control on smsc95xx. The driver now supports enabling
or disabling transmit and receive pause frames, with or without
autonegotiation. Pause settings are applied during link-up based on
current PHY state and user configuration.

Previously, the driver used phy_get_pause() during link-up handling,
but lacked initialization and an ethtool interface to configure pause
modes. As a result, flow control support was effectively non-functional.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250718075157.297923-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-07-18 (idpf, ice, igc, igbvf, ixgbevf)

For idpf:
Ahmed and Sudheer add support for flow steering via ntuple filters.
Current support is for IPv4 and TCP/UDP only.

Milena adds support for cross timestamping.

Ahmed preserves coalesce settings across resets.

For ice:
Alex adds reporting of 40GbE speed in devlink port split.

Dawid adds support for E835 devices.

Jesse refactors profile ptype processing for cleaner, more readable,
code.

Dave adds a couple of helper functions for LAG to reduce code
duplication.

For igc:
Siang adds support to configure "Default Queue" during runtime using
ethtool's Network Flow Classification (NFC) wildcard rule approach.

For igbvf:
Yuto Ohnuki removes unused fields from igbvf_adapter.

For ixgbevf:
Yuto Ohnuki removes unused fields from ixgbevf_adapter.

* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ixgbevf: remove unused fields from struct ixgbevf_adapter
  igbvf: remove unused fields from struct igbvf_adapter
  igc: Add wildcard rule support to ethtool NFC using Default Queue
  igc: Relocate RSS field definitions to igc_defines.h
  ice: breakout common LAG code into helpers
  ice: convert ice_add_prof() to bitmap
  ice: add E835 device IDs
  ice: add 40G speed to Admin Command GET PORT OPTION
  idpf: preserve coalescing settings across resets
  idpf: add cross timestamping
  idpf: add flow steering support
  virtchnl2: add flow steering support
  virtchnl2: rename enum virtchnl2_cap_rss
====================

Link: https://patch.msgid.link/20250718185118.2042772-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: usb: cdc-ncm: check for filtering capability

If the decice does not support filtering, filtering
must not be used and all packets delivered for the
upper layers to sort.

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Link: https://patch.msgid.link/20250717120649.2090929-1-oneukum@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac-renesas-gbeth: Add PM suspend/resume callbacks

Add PM suspend/resume callbacks for RZ/G3E SMARC EVK.

The PM deep entry is executed by pressing the SLEEP button and exit from
entry is by pressing the power button.

Logs:
root@smarc-rzg3e:~# PM: suspend entry (deep)
Filesystems sync: 0.115 seconds
Freezing user space processes
Freezing user space processes completed (elapsed 0.002 seconds)
OOM killer disabled.
Freezing remaining freezable tasks
Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
printk: Suspending console(s) (use no_console_suspend to debug)
NOTICE:  BL2: v2.10.5(release):2.10.5/rz_soc_dev-162-g7148ba838
NOTICE:  BL2: Built : 14:23:58, Jul  5 2025
NOTICE:  BL2: SYS_LSI_MODE: 0x13e06
NOTICE:  BL2: SYS_LSI_DEVID: 0x8679447
NOTICE:  BL2: SYS_LSI_PRR: 0x0
NOTICE:  BL2: Booting BL31
renesas-gbeth 15c30000.ethernet end0: Link is Down
Disabling non-boot CPUs ...
psci: CPU3 killed (polled 0 ms)
psci: CPU2 killed (polled 0 ms)
psci: CPU1 killed (polled 0 ms)
Enabling non-boot CPUs ...
Detected VIPT I-cache on CPU1
GICv3: CPU1: found redistributor 100 region 0:0x0000000014960000
CPU1: Booted secondary processor 0x0000000100 [0x412fd050]
CPU1 is up
Detected VIPT I-cache on CPU2
GICv3: CPU2: found redistributor 200 region 0:0x0000000014980000
CPU2: Booted secondary processor 0x0000000200 [0x412fd050]
CPU2 is up
Detected VIPT I-cache on CPU3
GICv3: CPU3: found redistributor 300 region 0:0x00000000149a0000
CPU3: Booted secondary processor 0x0000000300 [0x412fd050]
CPU3 is up
dwmac4: Master AXI performs fixed burst length
15c30000.ethernet end0: No Safety Features support found
15c30000.ethernet end0: IEEE 1588-2008 Advanced Timestamp supported
15c30000.ethernet end0: configuring for phy/rgmii-id link mode
dwmac4: Master AXI performs fixed burst length
15c40000.ethernet end1: No Safety Features support found
15c40000.ethernet end1: IEEE 1588-2008 Advanced Timestamp supported
15c40000.ethernet end1: configuring for phy/rgmii-id link mode
OOM killer enabled.
Restarting tasks: Starting
Restarting tasks: Done
random: crng reseeded on system resumption
PM: suspend exit

15c30000.ethernet end0: Link is Up - 1Gbps/Full - flow control rx/tx
root@smarc-rzg3e:~# ifconfig end0 192.168.10.7 up
root@smarc-rzg3e:~# ping 192.168.10.1
PING 192.168.10.1 (192.168.10.1) 56(84) bytes of data.
64 bytes from 192.168.10.1: icmp_seq=1 ttl=64 time=2.05 ms
64 bytes from 192.168.10.1: icmp_seq=2 ttl=64 time=0.928 ms

Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://patch.msgid.link/20250717071109.8213-1-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'amd-xgbe-add-hardware-ptp-timestamping'

Raju Rangoju says:

====================
amd-xgbe: add hardware PTP timestamping

Remove the hwptp abstraction and associated callbacks from the
struct xgbe_hw_if {} and move them to separate file after cleanup.

Adds complete support for hardware-based PTP (IEEE 1588)
timestamping to the AMD XGBE driver.
====================

Link: https://patch.msgid.link/20250718185628.4038779-1-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

amd-xgbe: add hardware PTP timestamping support

Adds complete support for hardware-based PTP (IEEE 1588)
timestamping to the AMD XGBE driver.

- Initialize and configure the MAC PTP registers based on link
speed and reference clock.
- Support both 50MHz and 125MHz PTP reference clocks.
- Update the driver interface and version data to support PTP
clock frequency selection.

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20250718185628.4038779-3-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

and-xgbe: remove the abstraction for hwptp

Remove the hwptp abstraction and associated callbacks from
the struct xgbe_hw_if {}.

The callback structure was only ever assigned a single function, without
null checks. This cleanup inlines the logic and moves all the hwtstamp
realted code a separate file, improving readability and maintainance.

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20250718185628.4038779-2-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: tc: Add generic erspan_opts matching support for tc-flower

Add test cases to tc_flower.sh to validate generic matching on ERSPAN
options. Both ERSPAN Type II and Type III are covered.

Also add check_tc_erspan_support() to verify whether tc supports
erspan_opts.

Signed-off-by: Li Shuang <shuali@redhat.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/1f354a1afd60f29bbbf02bd60cb52ecfc0b6bd17.1752848172.git.shuali@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: usb: Remove duplicate assignments for net->pcpu_stat_type

This commit remove duplicate assignments for net->pcpu_stat_type
in usbnet_probe().

Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Use correct byte order and format string for TCP seq and ack_seq

The TCP header fields seq and ack_seq are 32-bit values in network
byte order as (__be32). these fields were earlier printed using
ntohs(), which converts only 16-bit values and produces incorrect
results for 32-bit fields. This patch is changeing the conversion
to ntohl(), ensuring correct interpretation of these sequence numbers.

Notably, the format specifier is updated from %d to %u to reflect the
unsigned nature of these fields.

improves the accuracy of debug log messages for TCP sequence and
acknowledgment numbers during TX timeouts.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250717193552.3648791-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bcmasp: Add support for re-starting auto-negotiation

Wire-up ethtool_ops::nway_reset to phy_ethtool_nway_reset in order to
support re-starting auto-negotiation.

Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250717180915.2611890-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-maintain-netif-vs-dev-prefix-semantics'

Stanislav Fomichev says:

====================
net: maintain netif vs dev prefix semantics

Commit cc34acd577f1 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate. We care only
about driver-visible APIs, don't touch stack-internal routines.

The rest seem to be ok:
  * dev_xdp_prog_count - mostly called by sw drivers (bonding), should not matter
  * dev_get_by_xxx - too many to reasonably cleanup, already have different flavors
  * dev_fetch_sw_netstats - don't need instance lock
  * dev_get_tstats64 - never called directly, only as an ndo callback
  * dev_pick_tx_zero - never called directly, only as an ndo callback
  * dev_add_pack / dev_remove_pack - called early enough (in module init) to not matter
  * dev_get_iflink - mostly called by sw drivers, should not matter
  * dev_fill_forward_path - ditto
  * dev_getbyhwaddr_rcu - ditto
  * dev_getbyhwaddr - ditto
  * dev_getfirstbyhwtype - ditto
  * dev_valid_name - ditto
  * __dev_forward_skb dev_forward_skb dev_queue_xmit_nit - established helpers, no netif vs dev distinction
====================

Link: https://patch.msgid.link/20250717172333.1288349-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: s/dev_close_many/netif_close_many/

Commit cc34acd577f1 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate.

netif_close_many is used only by vlan/dsa and one mtk driver, so move it into
NETDEV_INTERNAL namespace.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250717172333.1288349-8-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: s/dev_set_threaded/netif_set_threaded/

Commit cc34acd577f1 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate.

Note that one dev_set_threaded call still remains in mt76 for debugfs file.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250717172333.1288349-7-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: s/dev_get_flags/netif_get_flags/

Commit cc34acd577f1 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250717172333.1288349-6-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: s/__dev_set_mtu/__netif_set_mtu/

Commit cc34acd577f1 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate.

__netif_set_mtu is used only by bond, so move it into
NETDEV_INTERNAL namespace.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250717172333.1288349-5-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: s/dev_pre_changeaddr_notify/netif_pre_changeaddr_notify/

Commit cc34acd577f1 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate.

netif_pre_changeaddr_notify is used only by ipvlan/bond, so move it into
NETDEV_INTERNAL namespace.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250717172333.1288349-4-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: s/dev_get_mac_address/netif_get_mac_address/

Commit cc34acd577f1 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate.

netif_get_mac_address is used only by tun/tap, so move it into
NETDEV_INTERNAL namespace.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250717172333.1288349-3-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: s/dev_get_port_parent_id/netif_get_port_parent_id/

Commit cc34acd577f1 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250717172333.1288349-2-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rtnetlink: Add operational state test

Virtual devices (e.g., VXLAN) that do not have a notion of a carrier are
created with an "UNKNOWN" operational state which some users find
confusing [1].

It is possible to set the operational state from user space either
during device creation or afterwards and some applications will start
doing that in order to avoid the above problem.

Add a test for this functionality to ensure it does not regress.

[1] https://lore.kernel.org/netdev/20241119153703.71f97b76@hermes.local/

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250717125151.466882-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: selftests: add PHY-loopback test for bad TCP checksums

Detect NICs and drivers that either drop frames with a corrupted TCP
checksum or, worse, pass them up as valid. The test flips one bit in
the checksum, transmits the packet in internal loopback, and fails when
the driver reports CHECKSUM_UNNECESSARY.

Discussed at:
https://lore.kernel.org/all/20250625132117.1b3264e8@kernel.org/

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250717083524.1645069-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: track pfmemalloc drops via SKB_DROP_REASON_PFMEMALLOC

Add a new SKB drop reason (SKB_DROP_REASON_PFMEMALLOC) to track packets
dropped due to memory pressure. In production environments, we've observed
memory exhaustion reported by memory layer stack traces, but these drops
were not properly tracked in the SKB drop reason infrastructure.

While most network code paths now properly report pfmemalloc drops, some
protocol-specific socket implementations still use sk_filter() without
drop reason tracking:
- Bluetooth L2CAP sockets
- CAIF sockets
- IUCV sockets
- Netlink sockets
- SCTP sockets
- Unix domain sockets

These remaining cases represent less common paths and could be converted
in a follow-up patch if needed. The current implementation provides
significantly improved observability into memory pressure events in the
network stack, especially for key protocols like TCP and UDP, helping to
diagnose problems in production environments.

Reported-by: Matt Fleming <mfleming@cloudflare.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://patch.msgid.link/175268316579.2407873.11634752355644843509.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stream: add description for sk_stream_write_space()

Add a proper description for the sk_stream_write_space() function as
previously marked by a FIXME comment.
No functional changes.

Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
Link: https://patch.msgid.link/20250716153404.7385-1-suchitkarunakaran@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ixgbevf: remove unused fields from struct ixgbevf_adapter

Remove hw_rx_no_dma_resources and eitr_param fields from struct
ixgbevf_adapter since these fields are never referenced in the driver.

Note that the interrupt throttle rate is controlled by the
rx_itr_setting and tx_itr_setting variables.

This change simplifies the ixgbevf driver by removing unused fields,
which improves maintainability.

Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com>
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igbvf: remove unused fields from struct igbvf_adapter

Remove following unused fields from struct igbvf_adapter that are never
referenced in the driver.

- blink_timer
- eeprom_wol
- fc_autoneg
- int_mode
- led_status
- mng_vlan_id
- polling_interval
- rx_dma_failed
- test_icr
- test_rx_ring
- test_tx_ring
- tx_dma_failed
- tx_fifo_head
- tx_fifo_size
- tx_head_addr

Also removed the following fields from struct igbvf_adapter since they
are never read or used after initialization by igbvf_probe() and igbvf_sw_init().

- bd_number
- rx_abs_int_delay
- tx_abs_int_delay
- rx_int_delay
- tx_int_delay

This changes simplify the igbvf driver by removing unused fields, which
improves maintenability.

Tested-by: Kohei Enju <enjuk@amazon.com>
Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igc: Add wildcard rule support to ethtool NFC using Default Queue

Introduce support for a lowest priority wildcard (catch-all) rule in
ethtool's Network Flow Classification (NFC) for the igc driver. The
wildcard rule directs all unmatched network traffic, including traffic not
captured by Receive Side Scaling (RSS), to a specified queue. This
functionality utilizes the Default Queue feature available in I225/I226
hardware.

The implementation has been validated on Intel ADL-S systems with two
back-to-back connected I226 network interfaces.

Testing Procedure:
1. On the Device Under Test (DUT), verify the initial statistic:
   $ ethtool -S enp1s0 | grep rx_q.*packets
        rx_queue_0_packets: 0
        rx_queue_1_packets: 0
        rx_queue_2_packets: 0
        rx_queue_3_packets: 0

2. From the Link Partner, send 10 ARP packets:
   $ arping -c 10 -I enp170s0 169.254.1.2

3. On the DUT, verify the packet reception on Queue 0:
   $ ethtool -S enp1s0 | grep rx_q.*packets
        rx_queue_0_packets: 10
        rx_queue_1_packets: 0
        rx_queue_2_packets: 0
        rx_queue_3_packets: 0

4. On the DUT, add a wildcard rule to route all packets to Queue 3:
   $ sudo ethtool -N enp1s0 flow-type ether queue 3

5. From the Link Partner, send another 10 ARP packets:
   $ arping -c 10 -I enp170s0 169.254.1.2

6. Now, packets are routed to Queue 3 by the wildcard (Default Queue) rule:
   $ ethtool -S enp1s0 | grep rx_q.*packets
        rx_queue_0_packets: 10
        rx_queue_1_packets: 0
        rx_queue_2_packets: 0
        rx_queue_3_packets: 10

7. On the DUT, add a EtherType rule to route ARP packet to Queue 1:
   $ sudo ethtool -N enp1s0 flow-type ether proto 0x0806 queue 1

8. From the Link Partner, send another 10 ARP packets:
   $ arping -c 10 -I enp170s0 169.254.1.2

9. Now, packets are routed to Queue 1 by the EtherType rule because it is
   higher priority than the wildcard (Default Queue) rule:
   $ ethtool -S enp1s0 | grep rx_q.*packets
        rx_queue_0_packets: 10
        rx_queue_1_packets: 10
        rx_queue_2_packets: 0
        rx_queue_3_packets: 10

10. On the DUT, delete all the NFC rules:
    $ sudo ethtool -N enp1s0 delete 63
    $ sudo ethtool -N enp1s0 delete 64

11. From the Link Partner, send another 10 ARP packets:
    $ arping -c 10 -I enp170s0 169.254.1.2

12. Now, packets are routed to Queue 0 because the value of Default Queue
    is reset back to 0:
    $ ethtool -S enp1s0 | grep rx_q.*packets
         rx_queue_0_packets: 20
         rx_queue_1_packets: 10
         rx_queue_2_packets: 0
         rx_queue_3_packets: 10

Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Co-developed-by: Blanco Alcaine Hector <hector.blanco.alcaine@intel.com>
Signed-off-by: Blanco Alcaine Hector <hector.blanco.alcaine@intel.com>
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igc: Relocate RSS field definitions to igc_defines.h

Move the RSS field definitions related to IPv4 and IPv6 UDP from igc.h to
igc_defines.h to consolidate the RSS field definitions in a single header
file, improving code organization and maintainability.

This refactoring does not alter the functionality of the driver but
enhances the logical grouping of related constants

Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: breakout common LAG code into helpers

In the VF handling code, parts of the code for lag can be broken out into
helper functions to reduce code duplication. Break this code out into
helper functions

Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: convert ice_add_prof() to bitmap

Previously the ice_add_prof() took an array of u8 and looped over it with
for_each_set_bit(), examining each 8 bit value as a bitmap.
This was just hard to understand and unnecessary, and was triggering
undefined behavior sanitizers with unaligned accesses within bitmap
fields (on our internal tools/builds). Since the @ptype being passed in
was already declared as a bitmap, refactor this to use native types with
the advantage of simplifying the code to use a single loop.

Co-developed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
CC: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: add E835 device IDs

E835 is an enhanced version of the E830.
It continues to use the same set of commands, registers and interfaces
as other devices in the 800 Series.

Following device IDs are added:
- 0x1248: Intel(R) Ethernet Controller E835-CC for backplane
- 0x1249: Intel(R) Ethernet Controller E835-CC for QSFP
- 0x124A: Intel(R) Ethernet Controller E835-CC for SFP
- 0x1261: Intel(R) Ethernet Controller E835-C for backplane
- 0x1262: Intel(R) Ethernet Controller E835-C for QSFP
- 0x1263: Intel(R) Ethernet Controller E835-C for SFP
- 0x1265: Intel(R) Ethernet Controller E835-L for backplane
- 0x1266: Intel(R) Ethernet Controller E835-L for QSFP
- 0x1267: Intel(R) Ethernet Controller E835-L for SFP

Reviewed-by: Konrad Knitter <konrad.knitter@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: add 40G speed to Admin Command GET PORT OPTION

Introduce the ICE_AQC_PORT_OPT_MAX_LANE_40G constant and update the code
to process this new option in both the devlink and the Admin Queue Command
GET PORT OPTION (opcode 0x06EA) message, similar to existing constants like
ICE_AQC_PORT_OPT_MAX_LANE_50G, ICE_AQC_PORT_OPT_MAX_LANE_100G, and so on.

This feature allows the driver to correctly report configuration options
for 2x40G on E823 and other cards in the future via devlink.

Example command:
devlink port split pci/0000:01:00.0/0 count 2

Example dmesg:
ice 0000:01:00.0: Available port split options and max port speeds (Gbps):
ice 0000:01:00.0: Status  Split      Quad 0          Quad 1
ice 0000:01:00.0:         count  L0  L1  L2  L3  L4  L5  L6  L7
ice 0000:01:00.0:         2      40   -   -   -  40   -   -   -
ice 0000:01:00.0:         2      50   -  50   -   -   -   -   -
ice 0000:01:00.0:         4      25  25  25  25   -   -   -   -
ice 0000:01:00.0:         4      25  25   -   -  25  25   -   -
ice 0000:01:00.0: Active  8      10  10  10  10  10  10  10  10
ice 0000:01:00.0:         1     100   -   -   -   -   -   -   -

Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

idpf: preserve coalescing settings across resets

The IRQ coalescing config currently reside only inside struct
idpf_q_vector. However, all idpf_q_vector structs are de-allocated and
re-allocated during resets. This leads to user-set coalesce configuration
to be lost.

Add new fields to struct idpf_vport_user_config_data to save the user
settings and re-apply them after reset.

Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

idpf: add cross timestamping

Add cross timestamp support through virtchnl mailbox messages and directly,
through PCIe BAR registers. Cross timestamping assumes that both system
time and device clock time values are cached simultaneously, what is
triggered by HW. Feature is enabled for both ARM and x86 archs.

Signed-off-by: Milena Olech <milena.olech@intel.com>
Reviewed-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

idpf: add flow steering support

Use the new virtchnl2 OP codes to communicate with the Control Plane to
add flow steering filters. We add the basic functionality for add/delete
with TCP/UDP IPv4 only. Support for other OP codes and protocols will be
added later.

Standard 'ethtool -N|--config-ntuple' should be used, for example:

# ethtool -N ens801f0d1 flow-type tcp4 src-ip 10.0.0.1 action 6

to route all IPv4/TCP traffic from IP 10.0.0.1 to queue 6.

Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

virtchnl2: add flow steering support

Add opcodes and corresponding message structure to add and delete
flow steering rules. Flow steering enables configuration
of rules to take an action or subset of actions based on a match
criteria. Actions could be redirect to queue, redirect to queue
group, drop packet or mark.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Co-developed-by: Dinesh Kumar <dinesh.kumar@intel.com>
Signed-off-by: Dinesh Kumar <dinesh.kumar@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

virtchnl2: rename enum virtchnl2_cap_rss

The "enum virtchnl2_cap_rss" will be used for negotiating flow
steering capabilities. Instead of adding a new enum, rename
virtchnl2_cap_rss to virtchnl2_flow_types. Also rename the enum's
constants.

Flow steering will use this enum in the next patches.

Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

et131x: Add missing check after DMA map

The DMA map functions can fail and should be tested for errors.
If the mapping fails, unmap and return an error.

Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Acked-by: Mark Einon <mark.einon@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250716094733.28734-2-fourier.thomas@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ag71xx: Add missing check after DMA map

The DMA map functions can fail and should be tested for errors.

Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250716095733.37452-3-fourier.thomas@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/drivers/net: Support ipv6 for napi_id test

Add support for IPv6 environment for napi_id test.

Test Plan:

    ./run_kselftest.sh -t drivers/net:napi_id.py
    TAP version 13
    1..1
    # timeout set to 45
    # selftests: drivers/net: napi_id.py
    # TAP version 13
    # 1..1
    # ok 1 napi_id.test_napi_id
    # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
    ok 1 selftests: drivers/net: napi_id.py

Signed-off-by: Tianyi Cui <1997cui@gmail.com>
Link: https://patch.msgid.link/20250717011913.1248816-1-1997cui@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ibmvnic: Use ndo_get_stats64 to fix inaccurate SAR reporting

VNIC testing on multi-core Power systems showed SAR stats drift
and packet rate inconsistencies under load.

Implements ndo_get_stats64 to provide safe aggregation of queue-level
atomic64 counters into rtnl_link_stats64 for use by tools like 'ip -s',
'ifconfig', and 'sar'. Switch to ndo_get_stats64 to align SAR reporting
with the standard kernel interface for retrieving netdev stats.

This removes redundant per-adapter stat updates, reduces overhead,
eliminates cacheline bouncing from hot path updates, and improves
the accuracy of reported packet rates.

Signed-off-by: Mingming Cao <mmc@linux.ibm.com>
Reviewed-by: Brian King <bjking1@linux.ibm.com>
Reviewed-by: Dave Marquardt <davemarq@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
----
Changes since v3:
link to v3: https://www.spinics.net/lists/netdev/msg1107999.html
-- keep per queue counters as u64 (this patch) and drop off patch 1 in v3

Link: https://patch.msgid.link/20250716152115.61143-1-mmc@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-mlx5-misc-changes-2025-07-16'

Tariq Toukan says:

====================
net/mlx5: misc changes 2025-07-16

This series contains misc enhancements to the mlx5 driver.

v1: https://lore.kernel.org/1752471585-18053-1-git-send-email-tariqt@nvidia.com
====================

Link: https://patch.msgid.link/1752675472-201445-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Properly access RCU protected qdisc_sleeping variable

qdisc_sleeping variable is declared as "struct Qdisc __rcu" and
as such needs proper annotation while accessing it.

Without rtnl_dereference(), the following error is generated by sparse:

drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40: warning:
  incorrect type in initializer (different address spaces)
drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40:    expected
  struct Qdisc *qdisc
drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40:    got struct
  Qdisc [noderef] __rcu *qdisc_sleeping

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1752675472-201445-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: fix kdoc warning on eswitch.h

Fix the following kdoc warning:
git ls-files *.[ch] | egrep drivers/net/ethernet/mellanox/mlx5/core/ |\
xargs scripts/kernel-doc --none
drivers/net/ethernet/mellanox/mlx5/core/eswitch.h:824: warning: cannot
understand function prototype: 'struct mlx5_esw_event_info '

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1752675472-201445-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: HWS, Enable IPSec hardware offload in legacy mode

IPSec hardware offload in legacy mode should not be affected by the
steering mode, hence it should also work properly with hmfs mode.

Remove steering mode validation when calculating the cap for packet
offload, this will also enable the missing cap MLX5_IPSEC_CAP_PRIO
needed for crypto offload.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1752675472-201445-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: pcs: xpcs: mask readl() return value to 16 bits

readl() returns 32-bit value but Clause 22/45 registers are 16-bit wide.
Masking with 0xFFFF avoids using garbage upper bits.

Signed-off-by: Jack Ping CHNG <jchng@maxlinear.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250716030349.3796806-1-jchng@maxlinear.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Fix an IS_ERR() vs NULL bug in esw_qos_move_node()

The __esw_qos_alloc_node() function returns NULL on error. It doesn't
return error pointers. Update the error checking to match.

Fixes: 96619c485fa6 ("net/mlx5: Add support for setting tc-bw on nodes")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/0ce4ec2a-2b5d-4652-9638-e715a99902a7@sabinyo.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mtk_wed: Fix NULL vs IS_ERR() bug in mtk_wed_get_memory_region()

We recently changed this from using devm_ioremap() to using
devm_ioremap_resource() and unfortunately the former returns NULL while
the latter returns error pointers. The check for errors needs to be
updated as well.

Fixes: e27dba1951ce ("net: Use of_reserved_mem_region_to_resource{_byname}() for "memory-region"")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/87c10dbd-df86-4971-b4f5-40ba02c076fb@sabinyo.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: airoha: Fix a NULL vs IS_ERR() bug in airoha_npu_run_firmware()

The devm_ioremap_resource() function returns error pointers. It never
returns NULL. Update the check to match.

Fixes: e27dba1951ce ("net: Use of_reserved_mem_region_to_resource{_byname}() for "memory-region"")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/fc6d194e-6bf5-49ca-bc77-3fdfda62c434@sabinyo.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'add-shared-phy-counter-support-for-qca807x-and-qca808x'

Luo Jie says:

====================
Add shared PHY counter support for QCA807x and QCA808x

The implementation of the PHY counter is identical for both QCA808x and
QCA807x series devices. This includes counters for both good and bad CRC
frames in the RX and TX directions, which are active when CRC checking
is enabled.

This patch series introduces PHY counter functions into a shared library,
enabling counter support for the QCA808x and QCA807x families through this
common infrastructure. Additionally, enable CRC checking and configure
automatic clearing of counters after reading within config_init() to ensure
accurate counter recording.

v2: https://lore.kernel.org/20250714-qcom_phy_counter-v2-0-94dde9d9769f@quicinc.com
v1: https://lore.kernel.org/20250709-qcom_phy_counter-v1-0-93a54a029c46@quicinc.com
====================

Link: https://patch.msgid.link/20250715-qcom_phy_counter-v3-0-8b0e460a527b@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: qcom: qca807x: Support PHY counter

Within the QCA807X PHY operation's config_init() function, enable CRC
checking for received and transmitted frames and configure counter to
clear after being read to support counter recording. Additionally, add
support for PHY counter operations.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250715-qcom_phy_counter-v3-3-8b0e460a527b@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: qcom: qca808x: Support PHY counter

Enable CRC checking for received and transmitted frames, and configure
counters to clear after being read within config_init() for accurate
counter recording. Additionally, add PHY counter operations and integrate
shared functions.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250715-qcom_phy_counter-v3-2-8b0e460a527b@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: qcom: Add PHY counter support

Add PHY counter functionality to the shared library. The implementation
is identical for the current QCA807X and QCA808X PHYs.

The PHY counter can be configured to perform CRC checking for both received
and transmitted packets. Additionally, the packet counter can be set to
automatically clear after it is read.

The PHY counter includes 32-bit packet counters for both RX (received) and
TX (transmitted) packets, as well as 16-bit counters for recording CRC
error packets for both RX and TX.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250715-qcom_phy_counter-v3-1-8b0e460a527b@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netdevsim: remove redundant branch

bool notify is referenced nowhere else in the function except to check
whether or not to call rtnl_offload_xstats_notify(). Remove it and move
the call to the previous branch.

Signed-off-by: Dennis Chen <dechen@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20250716165750.561175-1-dechen@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-07-17

We've added 13 non-merge commits during the last 20 day(s) which contain
a total of 4 files changed, 712 insertions(+), 84 deletions(-).

The main changes are:

1) Avoid skipping or repeating a sk when using a TCP bpf_iter,
   from Jordan Rife.

2) Clarify the driver requirement on using the XDP metadata,
   from Song Yoong Siang

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  doc: xdp: Clarify driver implementation for XDP Rx metadata
  selftests/bpf: Add tests for bucket resume logic in established sockets
  selftests/bpf: Create iter_tcp_destroy test program
  selftests/bpf: Create established sockets in socket iterator tests
  selftests/bpf: Make ehash buckets configurable in socket iterator tests
  selftests/bpf: Allow for iteration over multiple states
  selftests/bpf: Allow for iteration over multiple ports
  selftests/bpf: Add tests for bucket resume logic in listening sockets
  bpf: tcp: Avoid socket skips and repeats during iteration
  bpf: tcp: Use bpf_tcp_iter_batch_item for bpf_tcp_iter_state batch items
  bpf: tcp: Get rid of st_bucket_done
  bpf: tcp: Make sure iter->batch always contains a full bucket snapshot
  bpf: tcp: Make mem flags configurable through bpf_iter_tcp_realloc_batch
====================

Link: https://patch.msgid.link/20250717191731.4142326-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: prevent Python from buffering the output

Make sure Python doesn't buffer the output, otherwise for some
tests we may see false positive timeouts in NIPA. NIPA thinks that
a machine has hung if the test doesn't print anything for 3min.
This is also nice to heave for running the tests manually,
especially in vng.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20250716205712.1787325-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'neighbour-convert-rtm_getneigh-to-rcu-and-make-pneigh-rtnl-free'

Kuniyuki Iwashima says:

====================
neighbour: Convert RTM_GETNEIGH to RCU and make pneigh RTNL-free.

This is kind of v3 of the series below [0] but without NEIGHTBL patches.

Patch 1 ~ 4 and 9 come from the series to convert RTM_GETNEIGH to RCU.

Other patches clean up pneigh_lookup() and convert the pneigh code to
RCU + private mutex so that we can easily remove RTNL from RTM_NEWNEIGH
in the later series.

[0]: https://lore.kernel.org/netdev/20250418012727.57033-1-kuniyu@amazon.com/

v2: https://lore.kernel.org/20250712203515.4099110-1-kuniyu@google.com
v1: https://lore.kernel.org/20250711191007.3591938-1-kuniyu@google.com
====================

Link: https://patch.msgid.link/20250716221221.442239-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

neighbour: Update pneigh_entry in pneigh_create().

neigh_add() updates pneigh_entry() found or created by pneigh_create().

This update is serialised by RTNL, but we will remove it.

Let's move the update part to pneigh_create() and make it return errno
instead of a pointer of pneigh_entry.

Now, the pneigh code is RTNL free.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-16-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

neighbour: Protect tbl->phash_buckets[] with a dedicated mutex.

tbl->phash_buckets[] is only modified in the slow path by pneigh_create()
and pneigh_delete() under the table lock.

Both of them are called under RTNL, so no extra lock is needed, but we
will remove RTNL from the paths.

pneigh_create() looks up a pneigh_entry, and this part can be lockless,
but it would complicate the logic like

  1. lookup
  2. allocate pengih_entry for GFP_KERNEL
  3. lookup again but under lock
  4. if found, return it after freeing the allocated memory
  5. else, return the new one

Instead, let's add a per-table mutex and run lookup and allocation
under it.

Note that updating pneigh_entry part in neigh_add() is still protected
by RTNL and will be moved to pneigh_create() in the next patch.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-15-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

neighbour: Drop read_lock_bh(&tbl->lock) in pneigh_lookup().

Now, all callers of pneigh_lookup() are under RCU, and the read
lock there is no longer needed.

Let's drop the lock, inline __pneigh_lookup_1() to pneigh_lookup(),
and call it from pneigh_create().

The next patch will remove tbl->lock from pneigh_create().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-14-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

neighbour: Remove __pneigh_lookup().

__pneigh_lookup() is the lockless version of pneigh_lookup(),
but its only caller pndisc_is_router() holds the table lock and
reads pneigh_netry.flags.

This is because accessing pneigh_entry after pneigh_lookup() was
illegal unless the caller holds RTNL or the table lock.

Now, pneigh_entry is guaranteed to be alive during the RCU critical
section.

Let's call pneigh_lookup() and use READ_ONCE() for n->flags in
pndisc_is_router() and remove __pneigh_lookup().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-13-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

neighbour: Use rcu_dereference() in pneigh_get_{first,next}().

Now pneigh_entry is guaranteed to be alive during the
RCU critical section even without holding tbl->lock.

Let's use rcu_dereference() in pneigh_get_{first,next}().

Note that neigh_seq_start() still holds tbl->lock for the
normal neighbour entry.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-12-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

neighbour: Drop read_lock_bh(&tbl->lock) in pneigh_dump_table().

Now pneigh_entry is guaranteed to be alive during the
RCU critical section even without holding tbl->lock.

Let's drop read_lock_bh(&tbl->lock) and use rcu_dereference()
to iterate tbl->phash_buckets[] in pneigh_dump_table()

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-11-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

neighbour: Convert RTM_GETNEIGH to RCU.

Only __dev_get_by_index() is the RTNL dependant in neigh_get().

Let's replace it with dev_get_by_index_rcu() and convert RTM_GETNEIGH
to RCU.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-10-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

neighbour: Annotate access to struct pneigh_entry.{flags,protocol}.

We will convert pneigh readers to RCU, and its flags and protocol
will be read locklessly.

Let's annotate the access to the two fields.

Note that all access to pn->permanent is under RTNL (neigh_add()
and pneigh_ifdown_and_unlock()), so WRITE_ONCE() and READ_ONCE()
are not needed.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-9-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

neighbour: Free pneigh_entry after RCU grace period.

We will convert RTM_GETNEIGH to RCU.

neigh_get() looks up pneigh_entry by pneigh_lookup() and passes
it to pneigh_fill_info().

Then, we must ensure that the entry is alive till pneigh_fill_info()
completes, but read_lock_bh(&tbl->lock) in pneigh_lookup() does not
guarantee that.

Also, we will convert all readers of tbl->phash_buckets[] to RCU.

Let's use call_rcu() to free pneigh_entry and update phash_buckets[]
and ->next by rcu_assign_pointer().

pneigh_ifdown_and_unlock() uses list_head to avoid overwriting
->next and moving RCU iterators to another list.

pndisc_destructor() (only IPv6 ndisc uses this) uses a mutex, so it
is not delayed to call_rcu(), where we cannot sleep. This is fine
because the mcast code works with RCU and ipv6_dev_mc_dec() frees
mcast objects after RCU grace period.

While at it, we change the return type of pneigh_ifdown_and_unlock()
to void.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-8-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

neighbour: Annotate neigh_table.phash_buckets and pneigh_entry.next with __rcu.

The next patch will free pneigh_entry with call_rcu().

Then, we need to annotate neigh_table.phash_buckets[] and
pneigh_entry.next with __rcu.

To make the next patch cleaner, let's annotate the fields in advance.

Currently, all accesses to the fields are under the neigh table lock,
so rcu_dereference_protected() is used with 1 for now, but most of them
(except in pneigh_delete() and pneigh_ifdown_and_unlock()) will be
replaced with rcu_dereference() and rcu_dereference_check().

Note that pneigh_ifdown_and_unlock() changes pneigh_entry.next to a
local list, which is illegal because the RCU iterator could be moved
to another list. This part will be fixed in the next patch.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-7-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

neighbour: Split pneigh_lookup().

pneigh_lookup() has ASSERT_RTNL() in the middle of the function, which
is confusing.

When called with the last argument, creat, 0, pneigh_lookup() literally
looks up a proxy neighbour entry. This is the case of the reader path
as the fast path and RTM_GETNEIGH.

pneigh_lookup(), however, creates a pneigh_entry when called with creat 1
from RTM_NEWNEIGH and SIOCSARP, which require RTNL.

Let's split pneigh_lookup() into two functions.

We will convert all the reader paths to RCU, and read_lock_bh(&tbl->lock)
in the new pneigh_lookup() will be dropped.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-6-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

neighbour: Move neigh_find_table() to neigh_get().

neigh_valid_get_req() calls neigh_find_table() to fetch neigh_tables[].

neigh_find_table() uses rcu_dereference_rtnl(), but RTNL actually does
not protect it at all; neigh_table_clear() can be called without RTNL
and only waits for RCU readers by synchronize_rcu().

Fortunately, there is no bug because IPv4 is built-in, IPv6 cannot be
unloaded, and DECNET was removed.

To fetch neigh_tables[] by rcu_dereference() later, let's move
neigh_find_table() from neigh_valid_get_req() to neigh_get().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-5-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

neighbour: Allocate skb in neigh_get().

We will remove RTNL for neigh_get() and run it under RCU instead.

neigh_get_reply() and pneigh_get_reply() allocate skb with GFP_KERNEL.

Let's move the allocation before __dev_get_by_index() in neigh_get().

Now, neigh_get_reply() and pneigh_get_reply() are inlined and
rtnl_unicast() is factorised.

We will convert pneigh_lookup() to __pneigh_lookup() later.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

neighbour: Move two validations from neigh_get() to neigh_valid_get_req().

We will remove RTNL for neigh_get() and run it under RCU instead.

neigh_get() returns -EINVAL in the following cases:

* NDA_DST is not specified
* Both ndm->ndm_ifindex and NTF_PROXY are not specified

These validations do not require RCU.

Let's move them to neigh_valid_get_req().

While at it, the extack string for the first case is replaced with
NL_SET_ERR_ATTR_MISS().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

neighbour: Make neigh_valid_get_req() return ndmsg.

neigh_get() passes 4 local variable pointers to neigh_valid_get_req().

If it returns a pointer of struct ndmsg, we do not need to pass two
of them.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250716221221.442239-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ethtool-rss-support-rss_set-via-netlink'

Jakub Kicinski says:

====================
ethtool: rss: support RSS_SET via Netlink

Support configuring RSS settings via Netlink.
Creating and removing contexts remains for the following series.

v2: https://lore.kernel.org/20250714222729.743282-1-kuba@kernel.org
v1: https://lore.kernel.org/20250711015303.3688717-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250716000331.1378807-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: rss_api: test input-xfrm and hash fields

Test configuring input-xfrm and hash fields with all the limitations.
Tested on mlx5 (CX6):

  # ./ksft-net-drv/drivers/net/hw/rss_api.py
  TAP version 13
  1..10
  ok 1 rss_api.test_rxfh_nl_set_fail
  ok 2 rss_api.test_rxfh_nl_set_indir
  ok 3 rss_api.test_rxfh_nl_set_indir_ctx
  ok 4 rss_api.test_rxfh_indir_ntf
  ok 5 rss_api.test_rxfh_indir_ctx_ntf
  ok 6 rss_api.test_rxfh_nl_set_key
  ok 7 rss_api.test_rxfh_fields
  ok 8 rss_api.test_rxfh_fields_set
  ok 9 rss_api.test_rxfh_fields_set_xfrm
  ok 10 rss_api.test_rxfh_fields_ntf
  # Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Link: https://patch.msgid.link/20250716000331.1378807-12-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: rss: support setting flow hashing fields

Add support for ETHTOOL_SRXFH (setting hashing fields) in RSS_SET.

The tricky part is dealing with symmetric hashing. In netlink user
can change the hashing fields and symmetric hash in one request,
in IOCTL the two used to be set via different uAPI requests.
Since fields and hash function config are still separate driver
callbacks - changes to the two are not atomic. Keep things simple
and validate the settings against both pre- and post- change ones.
Meaning that we will reject the config request if user tries
to correct the flow fields and set input_xfrm in one request,
or disables input_xfrm and makes flow fields non-symmetric.

We can adjust it later if there's a real need. Starting simple feels
right, and potentially partially applying the settings isn't nice,
either.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: rss: support setting input-xfrm via Netlink

Support configuring symmetric hashing via Netlink.
We have the flow field config prepared as part of SET handling,
so scan it for conflicts instead of querying the driver again.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: specs: define input-xfrm enum in the spec

Help YNL decode the values for input-xfrm by defining
the possible values in the spec. Don't define "no change"
as it's an IOCTL artifact with no use in Netlink.

With this change on mlx5 input-xfrm gets decoded:

# ynl --family ethtool --dump rss-get
[{'header': {'dev-index': 2, 'dev-name': 'eth0'},
   'hfunc': 1,
   'hkey': b'V\xa8\xf9\x9 ...',
   'indir': [0, 1, ... ],
   'input-xfrm': {'sym-or-xor'},                         <<<
   'flow-hash': {'ah4': {'ip-dst', 'ip-src'},
                 'ah6': {'ip-dst', 'ip-src'},
                 'esp4': {'ip-dst', 'ip-src'},
                 'esp6': {'ip-dst', 'ip-src'},
                 'ip4': {'ip-dst', 'ip-src'},
                 'ip6': {'ip-dst', 'ip-src'},
                 'tcp4': {'l4-b-0-1', 'ip-dst', 'l4-b-2-3', 'ip-src'},
                 'tcp6': {'l4-b-0-1', 'ip-dst', 'l4-b-2-3', 'ip-src'},
                 'udp4': {'l4-b-0-1', 'ip-dst', 'l4-b-2-3', 'ip-src'},
                 'udp6': {'l4-b-0-1', 'ip-dst', 'l4-b-2-3', 'ip-src'}}
}]

Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: rss_api: test setting hashing key via Netlink

Test setting hashing key via Netlink.

  # ./tools/testing/selftests/drivers/net/hw/rss_api.py
  TAP version 13
  1..7
  ok 1 rss_api.test_rxfh_nl_set_fail
  ok 2 rss_api.test_rxfh_nl_set_indir
  ok 3 rss_api.test_rxfh_nl_set_indir_ctx
  ok 4 rss_api.test_rxfh_indir_ntf
  ok 5 rss_api.test_rxfh_indir_ctx_ntf
  ok 6 rss_api.test_rxfh_nl_set_key
  ok 7 rss_api.test_rxfh_fields
  # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0

Link: https://patch.msgid.link/20250716000331.1378807-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: rss: support setting hkey via Netlink

Support setting RSS hashing key via ethtool Netlink.
Use the Netlink policy to make sure user doesn't pass
an empty key, "resetting" the key is not a thing.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: rss: support setting hfunc via Netlink

Support setting RSS hash function / algo via ethtool Netlink.
Like IOCTL we don't validate that the function is within the
range known to the kernel. The drivers do a pretty good job
validating the inputs, and the IDs are technically "dynamically
queried" rather than part of uAPI.

Only change should be that in Netlink we don't support user
explicitly passing ETH_RSS_HASH_NO_CHANGE (0), if no change
is requested the attribute should be absent.

The ETH_RSS_HASH_NO_CHANGE is retained in driver-facing
API for consistency (not that I see a strong reason for it).

Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: rss_api: test setting indirection table via Netlink

Test setting indirection table via Netlink.

  # ./tools/testing/selftests/drivers/net/hw/rss_api.py
  TAP version 13
  1..6
  ok 1 rss_api.test_rxfh_nl_set_fail
  ok 2 rss_api.test_rxfh_nl_set_indir
  ok 3 rss_api.test_rxfh_nl_set_indir_ctx
  ok 4 rss_api.test_rxfh_indir_ntf
  ok 5 rss_api.test_rxfh_indir_ctx_ntf
  ok 6 rss_api.test_rxfh_fields
  # Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0

Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250716000331.1378807-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: support packing binary arrays of scalars

We support decoding a binary type with a scalar subtype already,
add support for sending such arrays to the kernel. While at it
also support using "None" to indicate that the binary attribute
should be empty. I couldn't decide whether empty binary should
be [] or None, but there should be no harm in supporting both.

Link: https://patch.msgid.link/20250716000331.1378807-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: rss_api: factor out checking min queue count

Multiple tests check min queue count, create a helper.

Link: https://patch.msgid.link/20250716000331.1378807-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: rss: initial RSS_SET (indirection table handling)

Add initial support for RSS_SET, for now only operations on
the indirection table are supported.

Unlike the ioctl don't check if at least one parameter is
being changed. This is how other ethtool-nl ops behave,
so pick the ethtool-nl consistency vs copying ioctl behavior.

There are two special cases here:
1) resetting the table to defaults;
2) support for tables of different size.

For (1) I use an empty Netlink attribute (array of size 0).

(2) may require some background. AFAICT a lot of modern devices
allow allocating RSS tables of different sizes. mlx5 can upsize
its tables, bnxt has some "table size calculation", and Intel
folks asked about RSS table sizing in context of resource allocation
in the past. The ethtool IOCTL API has a concept of table size,
but right now the user is expected to provide a table exactly
the size the device requests. Some drivers may change the table
size at runtime (in response to queue count changes) but the
user is not in control of this. What's not great is that all
RSS contexts share the same table size. For example a device
with 128 queues enabled, 16 RSS contexts 8 queues in each will
likely have 256 entry tables for each of the 16 contexts,
while 32 would be more than enough given each context only has
8 queues. To address this the Netlink API should avoid enforcing
table size at the uAPI level, and should allow the user to express
the min table size they expect.

To fully solve (2) we will need more driver plumbing but
at the uAPI level this patch allows the user to specify
a table size smaller than what the device advertises. The device
table size must be a multiple of the user requested table size.
We then replicate the user-provided table to fill the full device
size table. This addresses the "allow the user to express the min
table size" objective, while not enforcing any fixed size.
From Netlink perspective .get_rxfh_indir_size() is now de facto
the "max" table size supported by the device.

We may choose to support table replication in ethtool, too,
when we actually plumb this thru the device APIs.

Initially I was considering moving full pattern generation
to the kernel (which queues to use, at which frequency and
what min sequence length). I don't think this complexity
would buy us much and most if not all devices have pow-2
table sizes, which simplifies the replication a lot.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: TX, Fix dma unmapping for devmem tx

net_iovs should have the dma address set to 0 so that
netmem_dma_unmap_page_attrs() correctly skips the unmap. This was
not done in mlx5 when support for devmem tx was added and resulted
in the splat below when the platform iommu was enabled.

This patch addresses the issue by using netmem_dma_unmap_addr_set()
which handles the net_iov case when setting the dma address. A small
refactoring of mlx5e_dma_push() was required to be able to use this API.
The function was split in two versions and each version called
accordingly. Note that netmem_dma_unmap_addr_set() introduces an
additional if case.

Splat:
  WARNING: CPU: 14 PID: 2587 at drivers/iommu/dma-iommu.c:1228 iommu_dma_unmap_page+0x7d/0x90
  Modules linked in: [...]
  Unloaded tainted modules: i10nm_edac(E):1 fjes(E):1
  CPU: 14 UID: 0 PID: 2587 Comm: ncdevmem Tainted: G S          E       6.15.0+ #3 PREEMPT(voluntary)
  Tainted: [S]=CPU_OUT_OF_SPEC, [E]=UNSIGNED_MODULE
  Hardware name: HPE ProLiant DL380 Gen10 Plus/ProLiant DL380 Gen10 Plus, BIOS U46 06/01/2022
  RIP: 0010:iommu_dma_unmap_page+0x7d/0x90
  Code: [...]
  RSP: 0000:ff6b1e3ea0b2fc58 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ff46ef2d0a2340c8 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
  RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffff8827a120
  R10: 0000000000000000 R11: 0000000000000000 R12: 00000000d8000000
  R13: 0000000000000008 R14: 0000000000000001 R15: 0000000000000000
  FS:  00007feb69adf740(0000) GS:ff46ef2c779f1000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007feb69cca000 CR3: 0000000154b97006 CR4: 0000000000773ef0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   <TASK>
   dma_unmap_page_attrs+0x227/0x250
   mlx5e_poll_tx_cq+0x163/0x510 [mlx5_core]
   mlx5e_napi_poll+0x94/0x720 [mlx5_core]
   __napi_poll+0x28/0x1f0
   net_rx_action+0x33a/0x420
   ? mlx5e_completion_event+0x3d/0x40 [mlx5_core]
   handle_softirqs+0xe8/0x2f0
   __irq_exit_rcu+0xcd/0xf0
   common_interrupt+0x47/0xa0
   asm_common_interrupt+0x26/0x40
  RIP: 0033:0x7feb69cd08ec
  Code: [...]
  RSP: 002b:00007ffc01b8c880 EFLAGS: 00000246
  RAX: 00000000c3a60cf7 RBX: 0000000000045e12 RCX: 000000000000000e
  RDX: 00000000000035b4 RSI: 0000000000000000 RDI: 00007ffc01b8c8c0
  RBP: 00007ffc01b8c8b0 R08: 0000000000000000 R09: 0000000000000064
  R10: 00007ffc01b8c8c0 R11: 0000000000000000 R12: 00007feb69cca000
  R13: 00007ffc01b90e48 R14: 0000000000427e18 R15: 00007feb69d07000
   </TASK>

Cc: Mina Almasry <almasrymina@google.com>
Reported-by: Stanislav Fomichev <stfomichev@gmail.com>
Closes: https://lore.kernel.org/all/aFM6r9kFHeTdj-25@mini-arch/
Fixes: 5a842c288cfa ("net/mlx5e: Add TX support for netmems")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/1752649242-147678-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.16-rc7).

Conflicts:

Documentation/netlink/specs/ovpn.yaml
  880d43ca9aa4 ("netlink: specs: clean up spaces in brackets")
  af52020fc599 ("ovpn: reject unexpected netlink attributes")

drivers/net/phy/phy_device.c
  a44312d58e78 ("net: phy: Don't register LEDs for genphy")
  f0f2b992d818 ("net: phy: Don't register LEDs for genphy")
https://lore.kernel.org/20250710114926.7ec3a64f@kernel.org

drivers/net/wireless/intel/iwlwifi/fw/regulatory.c
drivers/net/wireless/intel/iwlwifi/mld/regulatory.c
  5fde0fcbd760 ("wifi: iwlwifi: mask reserved bits in chan_state_active_bitmap")
  ea045a0de3b9 ("wifi: iwlwifi: add support for accepting raw DSM tables by firmware")

net/ipv6/mcast.c
  ae3264a25a46 ("ipv6: mcast: Delay put pmc->idev in mld_del_delrec()")
  a8594c956cc9 ("ipv6: mcast: Avoid a duplicate pointer check in mld_del_delrec()")
https://lore.kernel.org/8cc52891-3653-4b03-a45e-05464fe495cf@kernel.org

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from Bluetooth, CAN, WiFi and Netfilter.

  More code here than I would have liked. That said, better now than
  next week. Nothing particularly scary stands out. The improvement to
  the OpenVPN input validation is a bit large but better get them in
  before the code makes it to a final release. Some of the changes we
  got from sub-trees could have been split better between the fix and
  -next refactoring, IMHO, that has been communicated.

  We have one known regression in a TI AM65 board not getting link. The
  investigation is going a bit slow, a number of people are on vacation.
  We'll try to wrap it up, but don't think it should hold up the
  release.

  Current release - fix to a fix:

   - Bluetooth: L2CAP: fix attempting to adjust outgoing MTU, it broke
     some headphones and speakers

  Current release - regressions:

   - wifi: ath12k: fix packets received in WBM error ring with REO LUT
     enabled, fix Rx performance regression

   - wifi: iwlwifi:
       - fix crash due to a botched indexing conversion
       - mask reserved bits in chan_state_active_bitmap, avoid FW assert()

  Current release - new code bugs:

   - nf_conntrack: fix crash due to removal of uninitialised entry

   - eth: airoha: fix potential UaF in airoha_npu_get()

  Previous releases - regressions:

   - net: fix segmentation after TCP/UDP fraglist GRO

   - af_packet: fix the SO_SNDTIMEO constraint not taking effect and a
     potential soft lockup waiting for a completion

   - rpl: fix UaF in rpl_do_srh_inline() for sneaky skb geometry

   - virtio-net: fix recursive rtnl_lock() during probe()

   - eth: stmmac: populate entire system_counterval_t in get_time_fn()

   - eth: libwx: fix a number of crashes in the driver Rx path

   - hv_netvsc: prevent IPv6 addrconf after IFF_SLAVE lost that meaning

  Previous releases - always broken:

   - mptcp: fix races in handling connection fallback to pure TCP

   - rxrpc: assorted error handling and race fixes

   - sched: another batch of "security" fixes for qdiscs (QFQ, HTB)

   - tls: always refresh the queue when reading sock, avoid UaF

   - phy: don't register LEDs for genphy, avoid deadlock

   - Bluetooth: btintel: check if controller is ISO capable on
     btintel_classify_pkt_type(), work around FW returning incorrect
     capabilities

  Misc:

   - make OpenVPN Netlink input checking more strict before it makes it
     to a final release

   - wifi: cfg80211: remove scan request n_channels __counted_by, it's
     only yielding false positives"

* tag 'net-6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits)
  rxrpc: Fix to use conn aborts for conn-wide failures
  rxrpc: Fix transmission of an abort in response to an abort
  rxrpc: Fix notification vs call-release vs recvmsg
  rxrpc: Fix recv-recv race of completed call
  rxrpc: Fix irq-disabled in local_bh_enable()
  selftests/tc-testing: Test htb_dequeue_tree with deactivation and row emptying
  net/sched: Return NULL when htb_lookup_leaf encounters an empty rbtree
  net: bridge: Do not offload IGMP/MLD messages
  selftests: Add test cases for vlan_filter modification during runtime
  net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime
  tls: always refresh the queue when reading sock
  virtio-net: fix recursived rtnl_lock() during probe()
  net/mlx5: Update the list of the PCI supported devices
  hv_netvsc: Set VF priv_flags to IFF_NO_ADDRCONF before open to prevent IPv6 addrconf
  phonet/pep: Move call to pn_skb_get_dst_sockaddr() earlier in pep_sock_accept()
  Bluetooth: L2CAP: Fix attempting to adjust outgoing MTU
  netfilter: nf_conntrack: fix crash due to removal of uninitialised entry
  net: fix segmentation after TCP/UDP fraglist GRO
  ipv6: mcast: Delay put pmc->idev in mld_del_delrec()
  net: airoha: fix potential use-after-free in airoha_npu_get()
  ...

Merge tag 'pm-6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These address three issues introduced during the current development
  cycle and related to system suspend and hibernation, one triggering
  when asynchronous suspend of devices fails, one possibly affecting
  memory management in the core suspend code error path, and one due to
  duplicate filesystems freezing during system suspend:

   - Fix a deadlock that may occur on asynchronous device suspend
     failures due to missing completion updates in error paths (Rafael
     Wysocki)

   - Drop a misplaced pm_restore_gfp_mask() call, which may cause swap
     to be accessed too early if system suspend fails, from
     suspend_devices_and_enter() (Rafael Wysocki)

   - Remove duplicate filesystems_freeze/thaw() calls, which sometimes
     cause systems to be unable to resume, from enter_state() (Zihuan
     Zhang)"

* tag 'pm-6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: sleep: Update power.completion for all devices on errors
  PM: suspend: clean up redundant filesystems_freeze/thaw() handling
  PM: suspend: Drop a misplaced pm_restore_gfp_mask() call

Merge tag 'for-net-2025-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

- hci_sync: fix connectable extended advertising when using static random address
- hci_core: fix typos in macros
- hci_core: add missing braces when using macro parameters
- hci_core: replace 'quirks' integer by 'quirk_flags' bitmap
- SMP: If an unallowed command is received consider it a failure
- SMP: Fix using HCI_ERROR_REMOTE_USER_TERM on timeout
- L2CAP: Fix null-ptr-deref in l2cap_sock_resume_cb()
- L2CAP: Fix attempting to adjust outgoing MTU
- btintel: Check if controller is ISO capable on btintel_classify_pkt_type
- btusb: QCA: Fix downloading wrong NVM for WCN6855 GF variant without board ID

* tag 'for-net-2025-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: L2CAP: Fix attempting to adjust outgoing MTU
  Bluetooth: btusb: QCA: Fix downloading wrong NVM for WCN6855 GF variant without board ID
  Bluetooth: hci_dev: replace 'quirks' integer by 'quirk_flags' bitmap
  Bluetooth: hci_core: add missing braces when using macro parameters
  Bluetooth: hci_core: fix typos in macros
  Bluetooth: SMP: Fix using HCI_ERROR_REMOTE_USER_TERM on timeout
  Bluetooth: SMP: If an unallowed command is received consider it a failure
  Bluetooth: btintel: Check if controller is ISO capable on btintel_classify_pkt_type
  Bluetooth: hci_sync: fix connectable extended advertising when using static random address
  Bluetooth: Fix null-ptr-deref in l2cap_sock_resume_cb()
====================

Link: https://patch.msgid.link/20250717142849.537425-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'rxrpc-miscellaneous-fixes'

David Howells says:

====================
rxrpc: Miscellaneous fixes

Here are some fixes for rxrpc:

(1) Fix the calling of IP routing code with IRQs disabled.

(2) Fix a recvmsg/recvmsg race when the first completes a call.

(3) Fix a race between notification, recvmsg and sendmsg releasing a call.

(4) Fix abort of abort.

(5) Fix call-level aborts that should be connection-level aborts.
====================

Link: https://patch.msgid.link/20250717074350.3767366-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rxrpc: Fix to use conn aborts for conn-wide failures

Fix rxrpc to use connection-level aborts for things that affect the whole
connection, such as the service ID not matching a local service.

Fixes: 57af281e5389 ("rxrpc: Tidy up abort generation infrastructure")
Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250717074350.3767366-6-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rxrpc: Fix transmission of an abort in response to an abort

Under some circumstances, such as when a server socket is closing, ABORT
packets will be generated in response to incoming packets. Unfortunately,
this also may include generating aborts in response to incoming aborts -
which may cause a cycle. It appears this may be made possible by giving
the client a multicast address.

Fix this such that rxrpc_reject_packet() will refuse to generate aborts in
response to aborts.

Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com>
cc: LePremierHomme <kwqcheii@proton.me>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250717074350.3767366-5-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rxrpc: Fix notification vs call-release vs recvmsg

When a call is released, rxrpc takes the spinlock and removes it from
->recvmsg_q in an effort to prevent racing recvmsg() invocations from
seeing the same call.  Now, rxrpc_recvmsg() only takes the spinlock when
actually removing a call from the queue; it doesn't, however, take it in
the lead up to that when it checks to see if the queue is empty.  It *does*
hold the socket lock, which prevents a recvmsg/recvmsg race - but this
doesn't prevent sendmsg from ending the call because sendmsg() drops the
socket lock and relies on the call->user_mutex.

Fix this by firstly removing the bit in rxrpc_release_call() that dequeues
the released call and, instead, rely on recvmsg() to simply discard
released calls (done in a preceding fix).

Secondly, rxrpc_notify_socket() is abandoned if the call is already marked
as released rather than trying to be clever by setting both pointers in
call->recvmsg_link to NULL to trick list_empty().  This isn't perfect and
can still race, resulting in a released call on the queue, but recvmsg()
will now clean that up.

Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com>
cc: LePremierHomme <kwqcheii@proton.me>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250717074350.3767366-4-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rxrpc: Fix recv-recv race of completed call

If a call receives an event (such as incoming data), the call gets placed
on the socket's queue and a thread in recvmsg can be awakened to go and
process it.  Once the thread has picked up the call off of the queue,
further events will cause it to be requeued, and once the socket lock is
dropped (recvmsg uses call->user_mutex to allow the socket to be used in
parallel), a second thread can come in and its recvmsg can pop the call off
the socket queue again.

In such a case, the first thread will be receiving stuff from the call and
the second thread will be blocked on call->user_mutex.  The first thread
can, at this point, process both the event that it picked call for and the
event that the second thread picked the call for and may see the call
terminate - in which case the call will be "released", decoupling the call
from the user call ID assigned to it (RXRPC_USER_CALL_ID in the control
message).

The first thread will return okay, but then the second thread will wake up
holding the user_mutex and, if it sees that the call has been released by
the first thread, it will BUG thusly:

kernel BUG at net/rxrpc/recvmsg.c:474!

Fix this by just dequeuing the call and ignoring it if it is seen to be
already released.  We can't tell userspace about it anyway as the user call
ID has become stale.

Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Reported-by: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: LePremierHomme <kwqcheii@proton.me>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250717074350.3767366-3-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>