unix_tot_inflight is a poor metric; it only tells the number of
inflight AF_UNIX sockets, and we should use unix_graph_state instead.
Also, if the receiver is catching up with the passed fds, the
sender does not need to schedule GC.
GC only helps with unreferenced cyclic SCM_RIGHTS references, and in
such a situation, the malicious sendmsg() will continue to call
wait_for_unix_gc() and hit the UNIX_INFLIGHT_SANE_USER condition.
Let's make only malicious users schedule GC and wait for it to
finish if a cyclic reference exists during the previous GC run.
Then, sane users will pay almost no cost for wait_for_unix_gc().
af_unix: Don't call wait_for_unix_gc() on every sendmsg().
We have been calling wait_for_unix_gc() on every sendmsg() in case
there are too many inflight AF_UNIX sockets.
This is also because the old GC implementation had poor knowledge
of the inflight sockets and had to suspect every sendmsg().
This was improved by commit d9f21b361333 ("af_unix: Try to run GC
async."), but we do not even need to call wait_for_unix_gc() if the
process is not sending AF_UNIX sockets.
The wait_for_unix_gc() call only helps when a malicious process
continues to create cyclic references, and we can detect that
in a better place and slow it down.
Let's move wait_for_unix_gc() to unix_prepare_fpl(), which is called
only when an AF_UNIX socket fd is passed via SCM_RIGHTS.
af_unix: Don't trigger GC from close() if unnecessary.
We have been triggering GC on every close() if there is even one
inflight AF_UNIX socket.
This is because the old GC implementation had no idea of the graph
shape formed by SCM_RIGHTS references.
The new GC knows whether there could be a cyclic reference or not,
and we can do better.
Let's not trigger GC from close() if there is no cyclic reference
or GC is already in progress.
While at it, unix_gc() is renamed to unix_schedule_gc() as it does
not actually perform GC since commit 8b90a9f819dc ("af_unix: Run
GC on only one CPU.").
GC manages its state by two variables, unix_graph_maybe_cyclic
and unix_graph_grouped, both of which are set to false in the
initial state.
When an AF_UNIX socket is passed to an in-flight AF_UNIX socket,
unix_update_graph() sets unix_graph_maybe_cyclic to true and
unix_graph_grouped to false, making the next GC invocation call
unix_walk_scc() to group SCCs.
Once unix_walk_scc() finishes, sockets in the same SCC are linked
via vertex->scc_entry. Then, unix_graph_grouped is set to true
so that the following GC invocations can skip Tarjan's algorithm
and simply iterate through the list in unix_walk_scc_fast().
In addition, if we know there is at least one cyclic reference,
we set unix_graph_maybe_cyclic to true so that we do not skip GC.
__unix_walk_scc() and unix_walk_scc_fast() call unix_scc_cyclic()
for each SCC to check if it forms a cyclic reference, so that we
can skip GC at the following invocations in case all SCCs do not
have any cycles.
If we count the number of cyclic SCCs in __unix_walk_scc(), we can
simplify unix_walk_scc_fast() because the number of cyclic SCCs
only changes when it garbage-collects an SCC.
So, let's count cyclic SCCs in __unix_walk_scc() and decrement the
counter in unix_walk_scc_fast() when performing garbage collection.
Note that we will use this counter in a later patch to check if a
cycle existed in the previous GC run.
Saeed Mahameed [Mon, 17 Nov 2025 21:42:08 +0000 (23:42 +0200)]
net/mlx5: Abort new commands if all command slots are stalled
In case of a FW issue, the FW might stop responding to FW commands,
causing a kernel lockout for a long period of time, e.g. rtnl_lock held
while ethtool is trying to collect stats, waiting for the FW to respond
to multiple commands, all of which will time out.
While there's no immediate indication of the FW lockout, we can safely
assume that something is wrong when all command slots are busy and in
a timeout state and no FW completion was received on any of them.
In such a case, start failing new commands immediately.
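As an illustration of that policy (not the mlx5 command-interface code; the
slot structure below is made up), the check boils down to: every slot busy,
every slot past its timeout, no completion seen on any of them:

  #include <stdbool.h>

  struct cmd_slot_sketch {
          bool busy;       /* a command occupies this slot */
          bool timed_out;  /* its timeout already expired */
          bool completed;  /* a FW completion was received */
  };

  static bool all_slots_stalled(const struct cmd_slot_sketch *slot, int nslots)
  {
          for (int i = 0; i < nslots; i++) {
                  if (!slot[i].busy || !slot[i].timed_out || slot[i].completed)
                          return false;
          }
          /* Caller should fail new commands immediately, e.g. with -EIO. */
          return true;
  }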
Carolina Jubran [Mon, 17 Nov 2025 21:42:07 +0000 (23:42 +0200)]
net/mlx5: Remove redundant bw_share minimal value assignment
Remove unnecessary logic that sets bw_share to minimal value, when
parent has bw_share configured but nodes don't have min_rate.
This check is redundant because the parent bandwidth acts as the upper
bound regardless, and the firmware always enforces the topmost
bandwidth constraint.
Carolina Jubran [Mon, 17 Nov 2025 21:42:06 +0000 (23:42 +0200)]
net/mlx5e: Recover SQ on excessive PTP TX timestamp delta
Extend the TX timestamp handler to recover the SQ when the difference
between the port and CQE TX timestamps is abnormally large.
The current logic aborts timestamp delivery if the delta exceeds
1/128 seconds, which matches the maximum expected packet interval in
ptp4l. A larger delta makes the timestamps unreliable.
This change adds recovery if the delta exceeds 0.5 seconds. Such a
large gap should not occur in normal operation and indicates that
firmware is stuck or metadata tracking is out of sync, leading to stale
or mismatched timestamps. Recovering the SQ ensures forward progress
and avoids silently dropping invalid timestamps.
The timestamp handler now takes mlx5e_ptpsq directly to access both CQ
stats and the recovery state.
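A rough sketch of the two thresholds described above (values taken from the
text; names and the exact comparison are illustrative, not the mlx5e code):

  #include <stdint.h>

  #define NSEC_PER_SEC     1000000000LL
  #define DELTA_ABORT_NS   (NSEC_PER_SEC / 128) /* ~7.8 ms: drop the timestamp */
  #define DELTA_RECOVER_NS (NSEC_PER_SEC / 2)   /* 0.5 s: also recover the SQ */

  enum ptp_ts_action { TS_DELIVER, TS_ABORT, TS_ABORT_AND_RECOVER };

  static enum ptp_ts_action classify_ts_delta(int64_t port_ts_ns, int64_t cqe_ts_ns)
  {
          int64_t delta = port_ts_ns - cqe_ts_ns;

          if (delta < 0)
                  delta = -delta;
          if (delta > DELTA_RECOVER_NS)
                  return TS_ABORT_AND_RECOVER; /* firmware stuck or metadata out of sync */
          if (delta > DELTA_ABORT_NS)
                  return TS_ABORT;             /* unreliable timestamp, no recovery */
          return TS_DELIVER;
  }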
Gal Pressman [Mon, 17 Nov 2025 21:42:05 +0000 (23:42 +0200)]
net/mlx5: Refactor EEPROM query error handling to return status separately
Matthew and Jakub reported [1] issues where inventory automation tools
are calling EEPROM query repeatedly on a port that doesn't have an SFP
connected, resulting in millions of error prints.
Move MCIA register status extraction from the query functions to the
callers, allowing use of extack reporting instead of a dmesg print when
using the netlink API.
Cc: Matthew W Carlis <mattc@purestorage.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763415729-1238421-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hangbin Liu [Mon, 17 Nov 2025 02:44:56 +0000 (02:44 +0000)]
netlink: specs: support ipv4-or-v6 for dual-stack fields
Since commit 1b255e1beabf ("tools: ynl: add ipv4-or-v6 display hint"), we
can display either IPv4 or IPv6 addresses for a single field based on the
address family. However, most dual-stack fields still use the ipv4 display
hint. This update changes them to use the new ipv4-or-v6 display hint and
converts IPv4-only fields to use the u32 type.
Hangbin Liu [Mon, 17 Nov 2025 02:44:55 +0000 (02:44 +0000)]
tools: ynl: Add MAC address parsing support
Add missing support for parsing MAC addresses when display_hint is 'mac'
in the YNL library. This enables YNL CLI to accept MAC address strings
for attributes like lladdr in rt-neigh operations.
Jakub Kicinski [Wed, 19 Nov 2025 02:25:48 +0000 (18:25 -0800)]
Merge branch 'net-expand-napi_skb_cache-use'
Eric Dumazet says:
====================
net: expand napi_skb_cache use
This is a followup of commit e20dfbad8aab ("net: fix napi_consume_skb()
with alien skbs").
Now that the per-cpu napi_skb_cache is populated from the TX completion
path, we can make use of this cache, especially for CPUs not used
by a driver NAPI poll (the primary user of napi_cache).
With this series, I consistently reach 130 Mpps on my UDP tx stress test
and reduce SLUB spinlock contention to smaller values.
====================
Eric Dumazet [Sun, 16 Nov 2025 20:27:17 +0000 (20:27 +0000)]
net: use napi_skb_cache even in process context
This is a followup of commit e20dfbad8aab ("net: fix napi_consume_skb()
with alien skbs").
Now that the per-cpu napi_skb_cache is populated from the TX completion
path, we can make use of this cache, especially for CPUs not used
by a driver NAPI poll (the primary user of napi_cache).
We can use the napi_skb_cache only if current context is not from hard irq.
With this patch, I consistently reach 130 Mpps on my UDP tx stress test
and reduce SLUB spinlock contention to smaller values.
Note there is still some SLUB contention for skb->head allocations.
I had to tune /sys/kernel/slab/skbuff_small_head/cpu_partial
and /sys/kernel/slab/skbuff_small_head/min_partial depending
on the platform taxonomy.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Tested-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251116202717.1542829-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
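The context restriction mentioned above ("only if current context is not
from hard irq") amounts to a gate like this kernel-style sketch; the real
check sits inside the skb alloc/free fast paths in net/core/skbuff.c:

  #include <linux/preempt.h>
  #include <linux/types.h>

  static bool can_use_napi_skb_cache(void)
  {
          /* The per-cpu cache is usable from process and softirq context;
           * hard-IRQ callers must fall back to the regular slab path. */
          return !in_hardirq();
  }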
Alok Tiwari [Mon, 17 Nov 2025 09:53:50 +0000 (01:53 -0800)]
net: dsa: ks8995: Fix incorrect OF match table name
The driver declares an OF match table named ks8895_spi_of_match, even
though it describes compatible strings for the KS8995 and related Micrel
switches. This is a leftover typo; the correct name should match the
chip family handled by this driver (ks8995) and also match the variable
used in spi_driver.of_match_table.
1) Relax a lock contention bottleneck to improve IPsec crypto
offload performance. From Jianbo Liu.
2) Deprecate pfkey, the interface will be removed in 2027.
3) Update xfrm documentation and move it to ipsec maintenance.
From Bagas Sanjaya.
* tag 'ipsec-next-2025-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
MAINTAINERS: Add entry for XFRM documentation
net: Move XFRM documentation into its own subdirectory
Documentation: xfrm_sync: Number the fifth section
Documentation: xfrm_sysctl: Trim trailing colon in section heading
Documentation: xfrm_sync: Trim excess section heading characters
Documentation: xfrm_sync: Properly reindent list text
Documentation: xfrm_device: Separate hardware offload sublists
Documentation: xfrm_device: Use numbered list for offloading steps
Documentation: xfrm_device: Wrap iproute2 snippets in literal code block
pfkey: Deprecate pfkey
xfrm: Skip redundant replay recheck for the hardware offload path
xfrm: Refactor xfrm_input lock to reduce contention with RSS
====================
====================
gve: Implement XDP HW RX Timestamping support for DQ
From: Tim Hostetler <thostet@google.com>
This patch series adds support for bpf_xdp_metadata_rx_timestamp from an
XDP program loaded into the driver on its own or bound to an XSK. This
is only supported for DQ.
====================
Tim Hostetler [Fri, 14 Nov 2025 21:11:46 +0000 (13:11 -0800)]
gve: Add Rx HWTS metadata to AF_XDP ZC mode
By overlaying the struct gve_xdp_buff on top of the struct xdp_buff_xsk
that AF_XDP utilizes, the driver records the 32-bit timestamp via the
completion descriptor and the cached 64-bit NIC timestamp via gve_priv.
The driver's implementation of xmo_rx_timestamp extends the timestamp to
the full, up-to-date 64-bit timestamp and returns it to the user.
gve_rx_xsk_dqo is modified to accept a pointer to the completion
descriptor and no longer takes a buf_len explicitly as it can be pulled
out of the descriptor.
With this patch gve now supports bpf_xdp_metadata_rx_timestamp.
Signed-off-by: Tim Hostetler <thostet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251114211146.292068-5-joshwash@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
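Extending a 32-bit hardware timestamp with a cached 64-bit NIC time is a
standard trick; the helper below is a self-contained illustration of that
technique, not the gve code (the driver's wrap handling and units may differ):

  #include <stdint.h>

  static uint64_t extend_hw_ts(uint64_t cached_nic_time, uint32_t hw_ts_lo)
  {
          /* Combine the cached upper 32 bits with the fresh lower 32 bits. */
          uint64_t ts = (cached_nic_time & ~0xffffffffULL) | hw_ts_lo;

          /* If the 32-bit counter wrapped since the cached time was taken,
           * the naive combination is ~2^32 off; nudge it into the half-window
           * around the cached time. */
          if (ts > cached_nic_time + (1ULL << 31))
                  ts -= 1ULL << 32;
          else if (ts + (1ULL << 31) < cached_nic_time)
                  ts += 1ULL << 32;

          return ts;
  }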
Tim Hostetler [Fri, 14 Nov 2025 21:11:45 +0000 (13:11 -0800)]
gve: Prepare bpf_xdp_metadata_rx_timestamp support
Support populating XDP RX metadata with hardware RX timestamps. This
patch utilizes the same underlying logic to calculate hardware
timestamps as the regular RX path.
xdp_metadata_ops is registered with the net_device in a future patch.
gve_rx_calculate_hwtstamp was pulled out so as to not duplicate logic
between gve_xdp_rx_timestamp and gve_rx_hwtstamp.
Signed-off-by: Tim Hostetler <thostet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251114211146.292068-4-joshwash@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tim Hostetler [Fri, 14 Nov 2025 21:11:44 +0000 (13:11 -0800)]
gve: Wrap struct xdp_buff
RX timestamping will need to keep track of extra temporary information
per-packet. In preparation for this, introduce gve_xdp_buff to wrap the
xdp_buff. This is similar in function to stmmac_xdp_buff and
ice_xdp_buff.
Signed-off-by: Tim Hostetler <thostet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251114211146.292068-3-joshwash@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
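The wrapper pattern looks roughly like the sketch below; the field names are
illustrative, and only the requirement that struct xdp_buff stays the first
member is inherent to the pattern:

  #include <net/xdp.h>

  struct gve_xdp_buff_sketch {
          struct xdp_buff xdp;     /* must be first so the core's xdp_buff
                                    * pointer can be cast back to the wrapper */
          const void *compl_desc;  /* per-packet driver scratch, e.g. the RX
                                    * completion descriptor used later by
                                    * xmo_rx_timestamp() */
  };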
Tim Hostetler [Fri, 14 Nov 2025 21:11:43 +0000 (13:11 -0800)]
gve: Move ptp_schedule_worker to gve_init_clock
Previously, gve had only been initializing the ptp aux work when
hardware timestamping was initialized through ndo_hwtstamp_set. As this
patch series introduces XDP hardware timestamp metadata which will
require the ptp aux work, the work can't be gated on the
kernel_hwtstamp_config being set and must be initialized elsewhere.
For simplicity, ptp_schedule_worker is invoked right after the ptp_clock
is registered with the kernel (which happens during gve_probe or
following reset). The worker is scheduled in GVE_NIC_TS_SYNC_INTERVAL_MS
as the synchronous call to gve_clock_nic_ts_read makes the worker
redundant if scheduled immediately.
If gve cannot read the device clock immediately, it errors out of
gve_init_clock.
Signed-off-by: Tim Hostetler <thostet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251114211146.292068-2-joshwash@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The lan8814 supports two interfaces towards the host (QSGMII and QUSGMII).
Currently the lan8814 disables the auto-negotiation towards the host
side. So, extend this to allow configuring the use of in-band
auto-negotiation.
I have tested this only with the QSGMII interface.
Sunday Adelodun [Thu, 13 Nov 2025 11:28:02 +0000 (12:28 +0100)]
selftests: af_unix: Add tests for ECONNRESET and EOF semantics
Add selftests to verify and document Linux’s intended behaviour for
UNIX domain sockets (SOCK_STREAM, SOCK_SEQPACKET, and SOCK_DGRAM) when a peer closes.
The tests verify that:
1. SOCK_STREAM returns EOF when the peer closes normally.
2. SOCK_STREAM returns ECONNRESET if the peer closes with unread data.
3. SOCK_SEQPACKET returns EOF when the peer closes normally.
4. SOCK_SEQPACKET returns ECONNRESET if the peer closes with unread data.
5. SOCK_DGRAM does not return ECONNRESET when the peer closes.
This follows up on review feedback suggesting a selftest to clarify
Linux’s semantics.
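Case 2 is easy to reproduce outside the kselftest harness; the stand-alone
program below (not the selftest itself) writes to a peer that closes without
reading, and the subsequent read() is expected to fail with ECONNRESET:

  #include <errno.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <unistd.h>

  int main(void)
  {
          int sk[2];
          char buf[8];
          ssize_t n;

          if (socketpair(AF_UNIX, SOCK_STREAM, 0, sk))
                  return 1;

          write(sk[0], "unread", 6); /* queue data the peer never reads */
          close(sk[1]);              /* peer closes with unread data */

          n = read(sk[0], buf, sizeof(buf));
          printf("read() = %zd, errno = %s\n", n, n < 0 ? strerror(errno) : "-");
          close(sk[0]);
          return 0;
  }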
====================
net: stmmac: Disable EEE RX clock stop when VLAN is enabled
This series fixes a couple of VLAN issues observed on the Renesas RZ/V2H
EVK platform (stmmac + Microchip KSZ9131RNXI PHY):
- The first patch fixes a bug where VLAN ID 0 would not be properly removed
due to how vlan_del_hw_rx_fltr() matched entries in the VLAN filter table.
- The second patch addresses RX clock gating issues that occur during VLAN
creation and deletion when EEE is enabled with RX clock-stop active (the
default configuration). For example:
# ip link add link end1 name end1.5 type vlan id 5
15c40000.ethernet end1: Timeout accessing MAC_VLAN_Tag_Filter
RTNETLINK answers: Device or resource busy
The stmmac hardware requires the receive clock to be running when writing
certain registers, including VLAN registers. However, by default the driver
enables Energy Efficient Ethernet (EEE) and allows the PHY to stop the
receive clock when the link is idle. As a result, the RX clock might be
stopped when attempting to access these registers, leading to timeouts.
A more comprehensive overview of receive clock related issues in the
stmmac driver can be found here:
https://lore.kernel.org/all/Z9ySeo61VYTClIJJ@shell.armlinux.org.uk/
Most of the issues were resolved by commit dd557266cf5fb ("net: stmmac:
block PHY RXC clock-stop"), which wraps register accesses with
phylink_rx_clk_stop_block()/unblock() calls. However, VLAN add/delete
operations are invoked with bottom halves disabled, where sleeping is
not permitted, so those helpers cannot be used.
To avoid these VLAN timeouts, the second patch disables the EEE RX
clock-stop feature when VLAN support is enabled. This ensures the receive
clock remains active, allowing VLAN operations to complete successfully.
====================
Ovidiu Panait [Thu, 13 Nov 2025 11:27:21 +0000 (11:27 +0000)]
net: stmmac: Disable EEE RX clock stop when VLAN is enabled
On the Renesas RZ/V2H EVK platform, where the stmmac MAC is connected to a
Microchip KSZ9131RNXI PHY, creating or deleting VLAN interfaces may fail
with timeouts:
# ip link add link end1 name end1.5 type vlan id 5
15c40000.ethernet end1: Timeout accessing MAC_VLAN_Tag_Filter
RTNETLINK answers: Device or resource busy
Disabling EEE at runtime avoids the problem:
# ethtool --set-eee end1 eee off
# ip link add link end1 name end1.5 type vlan id 5
# ip link del end1.5
The stmmac hardware requires the receive clock to be running when writing
certain registers, such as those used for MAC address configuration or
VLAN filtering. However, by default the driver enables Energy Efficient
Ethernet (EEE) and allows the PHY to stop the receive clock when the link
is idle. As a result, the RX clock might be stopped when attempting to
access these registers, leading to timeouts and other issues.
Commit dd557266cf5fb ("net: stmmac: block PHY RXC clock-stop")
addressed this issue for most register accesses by wrapping them in
phylink_rx_clk_stop_block()/phylink_rx_clk_stop_unblock() calls.
However, VLAN add/delete operations may be invoked with bottom halves
disabled, where sleeping is not allowed, so using these helpers is not
possible.
Therefore, to fix this, disable the RX clock stop feature in the phylink
configuration if VLAN features are set. This ensures the RX clock remains
active and register accesses succeed during VLAN operations.
Ovidiu Panait [Thu, 13 Nov 2025 11:27:20 +0000 (11:27 +0000)]
net: stmmac: Fix VLAN 0 deletion in vlan_del_hw_rx_fltr()
When the "rx-vlan-filter" feature is enabled on a network device, the 8021q
module automatically adds a VLAN 0 hardware filter when the device is
brought administratively up.
For stmmac, this causes vlan_add_hw_rx_fltr() to create a new entry for
VID 0 in the mac_device_info->vlan_filter array, in the following format:
VLAN_TAG_DATA_ETV | VLAN_TAG_DATA_VEN | vid
Here, VLAN_TAG_DATA_VEN indicates that the hardware filter is enabled for
that VID.
However, on the delete path, vlan_del_hw_rx_fltr() searches the vlan_filter
array by VID only, without verifying whether a VLAN entry is enabled. As a
result, when the 8021q module attempts to remove VLAN 0, the function may
mistakenly match a zero-initialized slot rather than the actual VLAN 0
entry, causing incorrect deletions and leaving stale entries in the
hardware table.
Fix this by verifying that the VLAN entry's enable bit (VLAN_TAG_DATA_VEN)
is set before matching and deleting by VID. This ensures only active VLAN
entries are removed and avoids leaving stale entries in the VLAN filter
table, particularly for VLAN ID 0.
Fixes: ed64639bc1e08 ("net: stmmac: Add support for VLAN Rx filtering") Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20251113112721.70500-2-ovidiu.panait.rb@renesas.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
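The corrected lookup reduces to "skip any slot whose enable bit is clear
before comparing VIDs". The sketch below illustrates that; the bit positions
are made up, only VLAN_TAG_DATA_VEN is a name taken from the text:

  #include <stdint.h>

  #define VLAN_TAG_DATA_VEN (1u << 16)  /* illustrative bit position */
  #define VLAN_VID_MASK     0xfffu      /* illustrative VID field */

  static int find_active_vlan_slot(const uint32_t *filter, int nslots, uint16_t vid)
  {
          for (int i = 0; i < nslots; i++) {
                  if (!(filter[i] & VLAN_TAG_DATA_VEN))
                          continue; /* zero-initialised or disabled slot */
                  if ((filter[i] & VLAN_VID_MASK) == vid)
                          return i; /* only active entries may match */
          }
          return -1;
  }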
====================
dpll: zl3073x: Refactor state management
This patch set is a refactoring of the zl3073x driver to clean up
state management, improve modularity, and significantly reduce
on-demand I/O.
The driver's dpll.c implementation previously performed on-demand
register reads and writes (wrapped in mailbox operations) to get
or set properties like frequency, phase, and embedded-sync settings.
This cluttered the DPLL logic with low-level I/O, duplicated locking,
and led to inefficient bus traffic.
This series addresses this by:
1. Splitting the monolithic 'core.c' into logical units ('ref.c',
'out.c', 'synth.c').
2. Implementing a full read/write-back cache for 'zl3073x_ref' and
'zl3073x_out' structures.
All state is now read once during '_state_fetch()' (and status updated
periodically). DPLL get callbacks read from this cache. Set callbacks
modify a copy of the state, which is then committed via a new
'..._state_set()' function. These '_state_set' functions compare
the new state to the cached state and write *only* the modified
register values back to the hardware, all within a single mailbox
sequence.
The result is a much cleaner 'dpll.c' that is almost entirely
free of direct register I/O, and all state logic is properly
encapsulated in its respective file.
The series is broken down as follows:
* Patch 1: Changes the state structs to store raw register values
(e.g., 'config', 'ctrl') instead of parsed booleans, centralizing
parsing logic into the helpers.
* Patch 2: Splits the logic from 'core.c' into new 'ref.c', 'out.c'
and 'synth.c' files, creating a 'zl3073x_dev_...' abstraction layer.
* Patch 3: Introduces the caching concept by reading and caching
the reference monitor status periodically, removing scattered
reads from 'dpll.c'.
* Patch 4: Expands the 'zl3073x_ref' struct to cache *all* reference
properties and adds 'zl3073x_ref_state_set()' to write back changes.
* Patch 5: Does the same for the 'zl3073x_out' struct, caching all
output properties and adding 'zl3073x_out_state_set()'.
* Patch 6: A final cleanup that removes the 'zl3073x_dev_...' wrapper
functions that became redundant after the refactoring.
====================
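The compare-and-write-back idea at the heart of this series can be sketched
as follows; the state fields, register numbers and write callback are
placeholders, not the zl3073x API:

  #include <stdint.h>

  struct out_state_sketch {
          uint8_t  mode;
          uint8_t  esync_ctrl;
          uint32_t div;
          uint32_t phase_comp;
  };

  /* Write only the fields that differ from the cached state, then update
   * the cache; in the driver this happens inside one mailbox sequence. */
  static int out_state_set_sketch(struct out_state_sketch *cached,
                                  const struct out_state_sketch *new,
                                  int (*write_reg)(unsigned int reg, uint32_t val))
  {
          int rc = 0;

          if (!rc && new->mode != cached->mode)
                  rc = write_reg(0x00, new->mode);        /* placeholder registers */
          if (!rc && new->esync_ctrl != cached->esync_ctrl)
                  rc = write_reg(0x01, new->esync_ctrl);
          if (!rc && new->div != cached->div)
                  rc = write_reg(0x02, new->div);
          if (!rc && new->phase_comp != cached->phase_comp)
                  rc = write_reg(0x03, new->phase_comp);

          if (!rc)
                  *cached = *new;                         /* commit to the cache */
          return rc;
  }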
Ivan Vecera [Thu, 13 Nov 2025 07:41:04 +0000 (08:41 +0100)]
dpll: zl3073x: Cache all output properties in zl3073x_out
Expand the zl3073x_out structure to cache all output-related
hardware registers, including divisors, widths, embedded-sync
parameters and phase compensation.
Modify zl3073x_out_state_fetch() to read and populate all these
new fields at once, including zero-divisor checks. Refactor all
dpll "getter" functions in dpll.c to read from this new
cached state instead of performing direct register access.
Introduce a new function, zl3073x_out_state_set(), to handle
writing changes back to the hardware. This function compares the
provided state with the current cached state and writes *only* the
modified register values via a single mailbox sequence before
updating the local cache.
Refactor all dpll "setter" functions to modify a local copy of
the output state and then call zl3073x_out_state_set() to
commit the changes.
This change centralizes all output-related register I/O into
out.c, significantly reduces bus traffic, and simplifies the logic
in dpll.c.
Reviewed-by: Petr Oros <poros@redhat.com> Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20251113074105.141379-6-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Thu, 13 Nov 2025 07:41:03 +0000 (08:41 +0100)]
dpll: zl3073x: Cache all reference properties in zl3073x_ref
Expand the zl3073x_ref structure to cache all reference-related
hardware registers, including frequency components, embedded-sync
settings and phase compensation. Previously, these registers were
read on-demand from various functions in dpll.c leading to frequent
mailbox operations.
Modify zl3073x_ref_state_fetch() to read and populate all these new
fields at once. Refactor all "getter" functions in dpll.c to read
from this new cached state instead of performing direct register
access.
Remove the standalone zl3073x_dpll_input_ref_frequency_get() helper,
as its functionality is now replaced by zl3073x_ref_freq_get() which
operates on the cached state and add a corresponding zl3073x_dev_...
wrapper.
Introduce a new function, zl3073x_ref_state_set(), to handle
writing changes back to the hardware. This function compares the
provided state with the current cached state and writes *only* the
modified register values to the device via a single mailbox sequence
before updating the local cache.
Refactor all dpll "setter" functions to modify a local copy of the
ref state and then call zl3073x_ref_state_set() to commit the changes.
As a cleanup, update callers in dpll.c that already have
a struct zl3073x_ref * to use the direct helpers instead of the
zl3073x_dev_... wrappers.
This change centralizes all reference-related register I/O into ref.c,
significantly reduces bus traffic, and simplifies the logic in dpll.c.
Reviewed-by: Petr Oros <poros@redhat.com> Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20251113074105.141379-5-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Thu, 13 Nov 2025 07:41:02 +0000 (08:41 +0100)]
dpll: zl3073x: Cache reference monitor status
Instead of reading the ZL_REG_REF_MON_STATUS register every time
the reference status is needed, cache this value in the zl3073x_ref
struct.
This is achieved by:
* Adding a mon_status field to struct zl3073x_ref
* Introducing zl3073x_dev_ref_status_update() to read the status for
all references into this new cache field
* Calling this update function from the periodic work handler
* Adding zl3073x_ref_is_status_ok() and zl3073x_dev_ref_is_status_ok()
helpers to check the cached value
* Refactoring all callers in dpll.c to use the new
zl3073x_dev_ref_is_status_ok() helper, removing direct register reads
This change consolidates all status register reads into a single periodic
function and reduces I/O bus traffic in dpll callbacks.
Reviewed-by: Petr Oros <poros@redhat.com> Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20251113074105.141379-4-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Thu, 13 Nov 2025 07:41:01 +0000 (08:41 +0100)]
dpll: zl3073x: Split ref, out, and synth logic from core
Refactor the zl3073x driver by splitting the logic for input
references, outputs and synthesizers out of the monolithic
core.[ch] files.
Move the logic for each functional block into its own dedicated files:
ref.[ch], out.[ch] and synth.[ch].
Specifically:
- Move state structures (zl3073x_ref, zl3073x_out, zl3073x_synth)
from core.h into their respective new headers
- Move state-fetching functions (..._state_fetch) from core.c to their
new .c files
- Move the zl3073x_ref_freq_factorize helper from core.c to ref.c
- Introduce a new helper layer to decouple the core device logic from
the state-parsing logic:
1. Move the original inline helpers (e.g., zl3073x_ref_is_enabled)
to the new headers (ref.h, etc.) and make them operate on a
const struct ... * pointer.
2. Create new zl3073x_dev_... prefixed functions in core.h
(e.g., zl3073x_dev_ref_is_enabled) and implement these _dev_ functions
to fetch state using a new ..._state_get() helper and then call
the non-prefixed helper.
3. Update all driver-internal callers (in dpll.c, prop.c, etc.) to use
the new zl3073x_dev_... functions.
Reviewed-by: Petr Oros <poros@redhat.com> Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20251113074105.141379-3-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Thu, 13 Nov 2025 07:41:00 +0000 (08:41 +0100)]
dpll: zl3073x: Store raw register values instead of parsed state
The zl3073x_ref, zl3073x_out and zl3073x_synth structures
previously stored state that was parsed from register reads. This
included values like boolean 'enabled' flags, synthesizer selections,
and pre-calculated frequencies.
This commit refactors the state management to store the raw register
values directly in these structures. The various inline helper functions
are updated to parse these raw values on-demand using FIELD_GET.
Reviewed-by: Petr Oros <poros@redhat.com> Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20251113074105.141379-2-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
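In kernel terms the pattern looks like the snippet below; the struct,
register and field names are illustrative, only the FIELD_GET-on-a-raw-value
idea is the point:

  #include <linux/bitfield.h>
  #include <linux/bits.h>
  #include <linux/types.h>

  struct ref_state_sketch {
          u8 config;                      /* raw register value, as read */
  };

  #define REF_CONFIG_ENABLE BIT(0)        /* illustrative field definition */

  static inline bool ref_is_enabled(const struct ref_state_sketch *ref)
  {
          /* parse on demand instead of storing a pre-parsed boolean */
          return FIELD_GET(REF_CONFIG_ENABLE, ref->config);
  }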
====================
s390/qeth: Improve handling of OSA RCs
This two-patch series aims to improve how return codes from OSA Express
are handled in the qeth driver.
OSA defines a number of return codes whose meaning is determined by the
issuing command, i.e. they are ambiguous. The first patch moves
definitions of all return codes including the ambiguous ones to a single
enum block to aid readability and maintainability.
The second patch implements a mechanism to interpret return codes based
on the issuing command to ensure accurate debug messages. While at it,
remove extern keyword and fix indentation for function declarations to
be in line with Linux kernel coding style.
====================
Aswin Karuvally [Thu, 13 Nov 2025 14:42:09 +0000 (15:42 +0100)]
s390/qeth: Handle ambiguous OSA RCs in s390dbf
OSA Express defines a number of return codes whose meaning is determined
by the issuing command, making them ambiguous. The important ones are
reported as debug messages through the s390 debug feature.
The qeth driver currently does not take the issuing command into account
when interpreting the return code which sometimes leads to incorrect
debug messages.
Implement a mechanism to interpret and report these return codes
properly. While at it, remove extern keyword and fix indentation for
function declarations to be in line with Linux kernel coding style.
Suggested-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Aswin Karuvally <aswin@linux.ibm.com> Link: https://patch.msgid.link/20251113144209.2140061-3-aswin@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Aswin Karuvally [Thu, 13 Nov 2025 14:42:08 +0000 (15:42 +0100)]
s390/qeth: Move all OSA RCs to single enum
OSA Express defines a number of return codes whose meaning is
determined by the issuing command, making them ambiguous. Move
definitions of all return codes including the ambiguous ones to a single
enum block to aid readability and maintainability.
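The mechanism amounts to selecting a per-command lookup table when turning a
return code into a debug string; the sketch below shows the idea with
placeholder command IDs, codes and strings (none of them are real qeth values):

  struct rc_desc { unsigned int rc; const char *meaning; };

  static const struct rc_desc setip_rcs[] = {
          { 0xe002, "address already registered" },  /* placeholder */
  };
  static const struct rc_desc setvlan_rcs[] = {
          { 0xe002, "VLAN already configured" },     /* same code, other meaning */
  };

  static const char *rc_to_string(unsigned int cmd, unsigned int rc)
  {
          const struct rc_desc *tbl;
          unsigned int i, n;

          switch (cmd) {                 /* pick the table by issuing command */
          case 0x01: tbl = setip_rcs;   n = sizeof(setip_rcs) / sizeof(*setip_rcs);     break;
          case 0x02: tbl = setvlan_rcs; n = sizeof(setvlan_rcs) / sizeof(*setvlan_rcs); break;
          default:   return "unknown command";
          }
          for (i = 0; i < n; i++)
                  if (tbl[i].rc == rc)
                          return tbl[i].meaning;
          return "unrecognized return code";
  }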
Heiner Kallweit [Thu, 13 Nov 2025 21:09:08 +0000 (22:09 +0100)]
r8169: bail out from probe if fiber mode is detected on RTL8127AF
It was reported that on a card with RTL8127AF (SFP + DAC) link-up isn't
detected. Realtek hides the SFP behind the internal PHY, which isn't
behaving fully compliance with clause 22 any longer in fiber mode.
Due to not having access to chip documentation there isn't much I can
do for now. Instead of silently failing to detect link-up in fiber mode,
inform the user that fiber mode isn't supported and bail out.
The logic to detect fiber mode is borrowed from Realtek's r8127 driver.
Inochi Amaoto [Fri, 14 Nov 2025 00:38:04 +0000 (08:38 +0800)]
net: phy: Add helper for fixing RGMII PHY mode based on internal mac delay
The "phy-mode" property of devicetree indicates whether the PCB has
delay now, which means the mac needs to modify the PHY mode based
on whether there is an internal delay in the mac.
This modification is similar for many ethernet drivers. To simplify
code, define the helper phy_fix_phy_mode_for_mac_delays(speed, mac_txid,
mac_rxid) to fix PHY mode based on whether mac adds internal delay.
Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk> Signed-off-by: Inochi Amaoto <inochiama@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251114003805.494387-3-inochiama@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
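The gist of such a helper, under the assumption that it maps an RGMII
interface mode plus the MAC's internal TX/RX delay flags to the mode the PHY
should actually be told to use (the merged helper's exact name and signature
may differ):

  #include <linux/phy.h>
  #include <linux/types.h>

  static phy_interface_t rgmii_fixup_sketch(phy_interface_t mode,
                                            bool mac_txid, bool mac_rxid)
  {
          bool rxid = mode == PHY_INTERFACE_MODE_RGMII_ID ||
                      mode == PHY_INTERFACE_MODE_RGMII_RXID;
          bool txid = mode == PHY_INTERFACE_MODE_RGMII_ID ||
                      mode == PHY_INTERFACE_MODE_RGMII_TXID;

          if (mode != PHY_INTERFACE_MODE_RGMII && !rxid && !txid)
                  return mode;            /* not an RGMII mode: leave untouched */

          rxid &= !mac_rxid;              /* MAC already delays the RX clock */
          txid &= !mac_txid;              /* MAC already delays the TX clock */

          if (rxid && txid)
                  return PHY_INTERFACE_MODE_RGMII_ID;
          if (rxid)
                  return PHY_INTERFACE_MODE_RGMII_RXID;
          if (txid)
                  return PHY_INTERFACE_MODE_RGMII_TXID;
          return PHY_INTERFACE_MODE_RGMII;
  }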
Restructure mana_query_gf_stats() to operate on the per-VF mana_context,
instead of per-port statistics. Introduce mana_ethtool_hc_stats to
isolate hardware counter statistics and update the
"ethtool -S <interface>" output to expose all relevant counters while
preserving backward compatibility.
Add support for the standard rx_missed_errors counter by mapping it
to the hardware's hc_rx_discards_no_wqe metric. Refresh statistics
every 2 seconds.
====================
Report standard counter stats->rx_missed_errors
using hc_rx_discards_no_wqe from the hardware.
Add a global workqueue to periodically run
mana_query_gf_stats every 2 seconds to get the latest
info in eth_stats and define a driver capability flag
to notify hardware of the periodic queries.
To avoid repeated failures and log flooding, the workqueue
is not rescheduled if mana_query_gf_stats fails with an HWC timeout
error, and the stats are reset to 0. Other errors are transient
and will not need a VF reset for recovery.
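A kernel-context sketch of that polling pattern (names are illustrative, not
the mana driver's; the real work item and stop-on-HWC-timeout handling live
in the driver):

  #include <linux/errno.h>
  #include <linux/jiffies.h>
  #include <linux/workqueue.h>

  #define GF_STATS_PERIOD_MS 2000

  static struct delayed_work gf_stats_work; /* INIT_DELAYED_WORK() at probe time */

  static void gf_stats_work_handler(struct work_struct *work)
  {
          int err = 0; /* err = query the GF stats here (placeholder) */

          if (err == -ETIMEDOUT)
                  return; /* HWC timeout: stop polling, stats stay reset to 0 */

          /* success or transient error: poll again in 2 seconds */
          schedule_delayed_work(&gf_stats_work, msecs_to_jiffies(GF_STATS_PERIOD_MS));
  }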
net: mana: Move hardware counter stats from per-port to per-VF context
Move hardware counter (HC) statistics from mana_port_context to
mana_context to enable sharing stats across multiple network ports
on the same MANA VF. Previously, each network port queried
hardware counters independently using MANA_QUERY_GF_STAT command
(GF = Generic Function stats from GDMA hardware), resulting in
redundant queries when multiple ports existed on the same device.
Isolate hardware counter stats by introducing mana_ethtool_hc_stats
in mana_context and update the code to ensure all stats are properly
reported via ethtool -S <interface>, maintaining consistency with
previous behavior.
PCI drivers explicitly set .pkt_route to zero. However, as the struct
is allocated using devm_kzalloc(), all members default to zero unless
explicitly initialised. Thus, explicitly setting these to zero is
unnecessary. Remove these. This leaves only stmmac_platform.c where
this is explicitly initialised depending on DT properties.
stmmac_platform.c explicitly sets .prio to zero if the snps,priority
property is not present in DT for the queue. However, as the struct
is allocated using devm_kzalloc(), all members default to zero unless
explicitly initialised, and of_property_read_u32() will not write to
its argument if the property is not found. Thus, explicitly setting
these to zero is unnecessary. Remove these.
Several drivers (see below) explicitly set the queue .use_prio
configuration to false. However, as this structure is allocated using
devm_kzalloc(), all members default to zero unless otherwise explicitly
initialised. .use_prio isn't, so it defaults to false. Remove these
unnecessary initialisations, leaving stmmac_platform.c as the only
file where .use_prio is set to true.
net: stmmac: move initialisation of queues_to_use to stmmac_plat_dat_alloc()
Move the default initialisation of plat_dat->tx_queues_to_use and
plat_dat->rx_queues_to_use (to 1) into stmmac_plat_dat_alloc(). This means
platform glue only needs to override this if different.
net: stmmac: move initialisation of unicast_filter_entries to stmmac_plat_dat_alloc()
Move the default initialisation of plat_dat->unicast_filter_entries
(to 1) into stmmac_plat_dat_alloc(). This means platform glue only needs to
override this if different.
net: stmmac: move initialisation of multicast_filter_bins to stmmac_plat_dat_alloc()
Move the default initialisation of plat_dat->multicast_filter_bins
(to HASH_TABLE_SIZE) into stmmac_plat_dat_alloc(). This means platform glue
only needs to override this if different.
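After the three moves above, the defaults end up in one place, roughly like
this hedged sketch of stmmac_plat_dat_alloc() (allocation details trimmed;
HASH_TABLE_SIZE comes from the driver's common.h):

  #include <linux/device.h>
  #include <linux/slab.h>
  #include <linux/stmmac.h>

  static struct plat_stmmacenet_data *plat_dat_alloc_sketch(struct device *dev)
  {
          struct plat_stmmacenet_data *plat;

          plat = devm_kzalloc(dev, sizeof(*plat), GFP_KERNEL);
          if (!plat)
                  return NULL;

          /* generic defaults; platform glue overrides only when different */
          plat->tx_queues_to_use = 1;
          plat->rx_queues_to_use = 1;
          plat->unicast_filter_entries = 1;
          plat->multicast_filter_bins = HASH_TABLE_SIZE; /* driver-internal constant */

          return plat;
  }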
====================
convert drivers to use ndo_hwtstamp callbacks part 4
This patchset is a subset of the part 3 patchset, converting the bnx2x and
qede drivers to use ndo callbacks instead of ioctl to configure and report
time stamping. These drivers implemented only the SIOCSHWTSTAMP command,
but are converted to also provide the configuration back to users. Some
logic is changed to avoid reporting a configuration which is not in sync
with the HW in case an error happened.
====================
Vadim Fedorenko [Sun, 16 Nov 2025 09:46:10 +0000 (09:46 +0000)]
qede: convert to use ndo_hwtstamp callbacks
The driver implemented the SIOCSHWTSTAMP ioctl cmd only, but it stores the
configuration in a private structure, so it can be reported back to users.
Implement both ndo_hwtstamp_set and ndo_hwtstamp_get callbacks.
ndo_hwtstamp_set implements a check for unsupported 1-step timestamping,
and qede_ptp_cfg_filters() becomes void as it cannot fail anymore.
Vadim Fedorenko [Sun, 16 Nov 2025 09:46:09 +0000 (09:46 +0000)]
bnx2x: convert to use ndo_hwtstamp callbacks
The driver implemented SIOCSHWTSTAMP ioctl command only, but at the same
time it has the configuration stored in a private structure. Implement both
ndo_hwtstamp_set and ndo_hwtstamp_get callbacks using the stored info.
ndo_hwtstamp_set callback implements a check for unsupported 1-step
timestamping. The same check is removed from bnx2x_configure_ptp_filters
function as it's not needed anymore. Another call site of
bnx2x_configure_ptp_filters has hwtstamp_ioctl_called guard.
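For reference, the "get" side of such a conversion is typically just
returning the stored configuration; a hedged sketch with an invented private
struct (not the bnx2x or qede code):

  #include <linux/net_tstamp.h>
  #include <linux/netdevice.h>

  struct foo_priv {
          struct kernel_hwtstamp_config tstamp_config; /* saved by the set callback */
  };

  static int foo_hwtstamp_get(struct net_device *dev,
                              struct kernel_hwtstamp_config *config)
  {
          struct foo_priv *priv = netdev_priv(dev);

          *config = priv->tstamp_config; /* report what was last applied */
          return 0;
  }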
- Patch 3: remove unused last column from nstat output.
- Patch 4: improve stats dump in mptcp_join.sh.
- Patch 5: get counters from nstat history and simplify mptcp_connect.sh.
- Patch 6: avoid taking the same packet trace twice.
- Patch 7: wait for an event instead of a fixed time.
- Patch 8: instead of using 'timeout' and printing the stats afterwards,
another internal timeout is used: if it fires, it will print the stats,
then stop everything. This avoids confusion around stats in case of a
timeout.
====================
selftests: mptcp: get stats just before timing out
Recently, some debugging happened around a test that was timing out. The
stats were showing connections being closed which was confusing because
the closing state was caused by the timeout stopping the transfer.
To avoid such confusion, the timeout is no longer done per mptcp_connect
process, but separately. In case of timeout, the stats are now printed,
then the apps are killed.
The stats will still be printed after the kill, but that's fine, and
this might even be useful, just in case. Timeout should be exceptional.
After having started mptcp_connect in listening mode,
'mptcp_lib_wait_local_port_listen' can be used to wait for the listening
socket to be ready.
This is better than using the 'sleep' command, not to pause for a fixed
amount of time, but waiting for an event. This helper is used in all
other MPTCP selftests, but not in these two.
selftests: mptcp: lib: get counters from nstat history
Before, 'nstat' was used to retrieve each individual counter: this means
querying 4 different sources from /proc/net and iterating over 100+
counters each time. Instead, the stats could be retrieved once, and the
output file could be parsed for each counter. Even better, such file is
already present: the nstat history file.
To be able to get this working, the nstat history file also needs to
contain zero counters too, so it is still possible to know if a counter
is missing or set to 0.
This also simplifies mptcp_connect.sh: instead of checking multiple
counters before and after a test to compute the difference, the stats
history files can be reset before each test, and nstat can display only
the difference.
mptcp_lib_get_counter() continues to work when no history file is
available: by fetching nstat directly, like before. This is the case in
diag.sh and userspace_pm.sh where there is no need to save the history
file. This is also the case in mptcp_join.sh, when 'run_tests' is
executed in the background: easier to continue fetching counters than
updating the history each time it is needed.
Note: 'nstat' is called with '-s' in mptcp_lib_nstat_get(), so this
helper can be called multiple times during the test if needed.
In case of errors, dump the stats from history instead of using nstat.
There are multiple advantages to that:
- The same filters from pr_err_stats are used, e.g. the unused 'rate'
column is not displayed.
- The counters are closer to the ones from when the test stopped.
- While at it, the errors can be better presented: error colours, a
small indentation to distinguish the different parts, extra new lines.
Even if it should only happen in rare cases -- internal errors, or netns
issues -- if no history is available, 'nstat' is used like before, just
in case.
With the MPTCP selftests, the nstat daemon is not used. It means that
the last column (the rate) is always 0.0, and that's not something
interesting to display.
These new helpers are easier to read than the long, multi-line
commands. Plus, it will ease the addition of new features related to that
in the next commits.
Eric Dumazet [Fri, 14 Nov 2025 13:51:41 +0000 (13:51 +0000)]
tcp: reduce tcp_comp_sack_slack_ns default value to 10 usec
The current default value of net.ipv4.tcp_comp_sack_slack_ns is too high.
When a flow has many drops (1% or more) and a small RTT, adding 100 usec
before sending a SACK stalls a sender that relies on getting SACKs
fast enough to keep the pipe busy.
Decrease the default to 10 usec.
This is orthogonal to Congestion Control heuristics to determine
if drops are caused by congestion or not.
Heiner Kallweit [Wed, 12 Nov 2025 21:06:13 +0000 (22:06 +0100)]
net: phy: fixed_phy: remove setting supported/advertised modes from fixed_phy_register
This code was added with 34b31da486a5 ("phy: fixed_phy: Set supported
speed in phydev") 10 yrs ago. The commit message of this change
mentions a use case involving callback adjust_link of struct
dsa_switch_driver. This struct doesn't exist any longer, and in general
usage of the legacy fixed PHY has been removed from DSA with the switch
to phylink.
Note: Supported and advertised modes are now set by phy_probe() when
the fixed PHY is attached to the netdev and bound to the genphy driver.
Jakub Kicinski [Sat, 15 Nov 2025 22:55:08 +0000 (14:55 -0800)]
tools: ynltool: remove -lmnl from link flags
The libmnl dependency has been removed from libynl back in
commit 73395b43819b ("tools: ynl: remove the libmnl dependency")
Remove it from the ynltool Makefile.
Mohsin Bashir [Thu, 13 Nov 2025 23:26:10 +0000 (15:26 -0800)]
eth: fbnic: Configure RDE settings for pause frame
fbnic supports pause frames. When pause frames are enabled, the user
presumably expects lossless operation from the NIC. Make sure we configure
RDE (Rx DMA Engine) to DROP_NEVER mode to avoid discards due to delays
in fetching Rx descriptors from the host.
While at it, enable DROP_NEVER when the NIC only has a single queue
configured. In this case the NIC acts as a FIFO, so there's no risk
of head-of-line blocking other queues by making RDE wait. If pause
is disabled, this just moves the packet loss from the DMA engine to
the Rx buffer.
Remove redundant call to fbnic_config_drop_mode_rcq(), introduced by
commit 0cb4c0a13723 ("eth: fbnic: Implement Rx queue
alloc/start/stop/free"). This call does not add value as
fbnic_enable_rcq(), which is called immediately afterward, already
handles this.
Although we do not support autoneg at this time, preserve tx_pause in
.mac_link_up instead of fbnic_phylink_get_pauseparam().
====================
net: mlx: migrate to new get_rx_ring_count ethtool API
This series migrates the mlx4 and mlx5 drivers to use the new
.get_rx_ring_count() callback introduced in commit 84eaf4359c36 ("net:
ethtool: add get_rx_ring_count callback to optimize RX ring queries").
Previously, these drivers handled ETHTOOL_GRXRINGS within the
.get_rxnfc() callback. With the dedicated .get_rx_ring_count() API, this
handling can be extracted and simplified.
For mlx5, this affects both the ethernet and IPoIB drivers. The
ETHTOOL_GRXRINGS handling was previously kept in .get_rxnfc() to support
"ethtool -x" when CONFIG_MLX5_EN_RXNFC=n, but this is no longer
necessary with the new dedicated callback.
Note: The mlx4 changes are compile-tested only, while mlx5 changes were
properly tested.
====================
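Assuming the callback shape introduced by the referenced commit (net_device
in, ring count out), a conversion boils down to something like this sketch
with an invented driver private struct:

  #include <linux/netdevice.h>

  struct foo_priv {
          u32 rx_ring_num; /* however the driver tracks its RX queue count */
  };

  static u32 foo_get_rx_ring_count(struct net_device *dev)
  {
          struct foo_priv *priv = netdev_priv(dev);

          return priv->rx_ring_num; /* replaces the ETHTOOL_GRXRINGS branch */
  }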
Breno Leitao [Thu, 13 Nov 2025 16:46:04 +0000 (08:46 -0800)]
mlx5: extract GRXRINGS from .get_rxnfc
Commit 84eaf4359c36 ("net: ethtool: add get_rx_ring_count callback to
optimize RX ring queries") added specific support for GRXRINGS callback,
simplifying .get_rxnfc.
Remove the handling of GRXRINGS in .get_rxnfc() by moving it to the new
.get_rx_ring_count() for both the mlx5 ethernet and IPoIB drivers.
The ETHTOOL_GRXRINGS handling was previously kept in .get_rxnfc() to
support "ethtool -x" when CONFIG_MLX5_EN_RXNFC=n. With the new
dedicated .get_rx_ring_count() callback, this is no longer necessary.
This simplifies the RX ring count retrieval and aligns mlx5 with the new
ethtool API for querying RX ring parameters.
Breno Leitao [Thu, 13 Nov 2025 16:46:03 +0000 (08:46 -0800)]
mlx4: extract GRXRINGS from .get_rxnfc
Commit 84eaf4359c36 ("net: ethtool: add get_rx_ring_count callback to
optimize RX ring queries") added specific support for GRXRINGS callback,
simplifying .get_rxnfc.
Remove the handling of GRXRINGS in .get_rxnfc() by moving it to the new
.get_rx_ring_count().
This simplifies the RX ring count retrieval and aligns mlx4 with the new
ethtool API for querying RX ring parameters. This is compile-tested
only.
Jakub Kicinski [Sat, 15 Nov 2025 02:55:38 +0000 (18:55 -0800)]
Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Tariq Toukan says:
====================
mlx5-next updates 2025-11-13
The following pull-request contains common mlx5 updates
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Expose definition for 1600Gbps link mode
net/mlx5: fs, set non default device per namespace
net/mlx5: fs, Add other_eswitch support for steering tables
net/mlx5: Add OTHER_ESWITCH HW capabilities
net/mlx5: Add direct ST mode support for RDMA
PCI/TPH: Expose pcie_tph_get_st_table_loc()
{rdma,net}/mlx5: Query vports mac address from device
====================
Rather than defining one xxx_GMAC_PHY_INTF_SEL_xxx() for each mode,
define xxx_GMAC_PHY_INTF_SEL() which takes the phy_intf_sel value.
Pass the appropriate value into these new macros in the set_to_xxx()
methods.
net: stmmac: rk: convert all bitfields to GRF_FIELD*()
Convert all bitfields to GRF_FIELD() or GRF_FIELD_CONST(), which makes
the bitfield values more readable, and also allows the aarch64 compiler
to produce better code.
net: stmmac: rk: replace HIWORD_UPDATE() with GRF_FIELD()
Provide GRF_FIELD() which takes the high/low bit numbers of the field
and field value, generates the mask and passes it to FIELD_PREP_WM16.
Replace all HIWORD_UPDATE() instances with this.
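Based on that description, the macro is presumably little more than a
GENMASK()/FIELD_PREP_WM16() wrapper; a hedged sketch (the real definition
lives in the Rockchip glue and may differ):

  #include <linux/bitfield.h>
  #include <linux/bits.h>

  /* Hiword-mask GRF write: the value goes in the low 16 bits and the
   * write-enable mask in the high 16 bits, both derived from the field's
   * high/low bit numbers. GRF_FIELD_CONST() would be the constant-expression
   * variant of the same idea. */
  #define GRF_FIELD(hi, lo, val) FIELD_PREP_WM16(GENMASK((hi), (lo)), (val))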
Breno Leitao [Wed, 12 Nov 2025 09:50:23 +0000 (01:50 -0800)]
net: bnx2x: convert to use get_rx_ring_count
Convert the bnx2x driver to use the new .get_rx_ring_count ethtool
operation instead of implementing .get_rxnfc solely for handling
ETHTOOL_GRXRINGS command. This simplifies the code by replacing the
switch statement with a direct return of the queue count.
The new callback provides the same functionality in a more direct way,
following the ongoing ethtool API modernization.
Breno Leitao [Thu, 13 Nov 2025 14:23:29 +0000 (06:23 -0800)]
net: ixgbe: convert to use .get_rx_ring_count
Convert the ixgbe driver to use the new .get_rx_ring_count ethtool
operation for handling ETHTOOL_GRXRINGS command. This simplifies the
code by extracting the ring count logic into a dedicated callback.
The new callback provides the same functionality in a more direct way,
following the ongoing ethtool API modernization.
Jakub Kicinski [Thu, 13 Nov 2025 15:27:03 +0000 (07:27 -0800)]
selftests: drv-net: xdp: make the XDP qstats tests less flaky
The XDP qstats tests send 2k packets over a single socket.
Looks like when netdev CI is busy running those tests in QEMU
occasionally flakes. The target doesn't get to run at all
before all 2000 packets are sent.
Lower the number of packets to 1000 and reopen the socket
every 50 packets, to give RSS a chance to spread the packets
to multiple queues.
For the netdev CI testing either lowering the count or using
multiple sockets is enough, but let's do both for extra resiliency.
selftests: drv-net: xdp: Fix register spill error with clang 20
On clang 20.1.8 the XDP program fails to load with a register spill error.
Since hdr_len is a __u32, the compiler decided it only needed the lower
32-bits of ctx->data, which later triggers the register spill verifier
error.
Jakub Kicinski [Thu, 13 Nov 2025 03:17:00 +0000 (19:17 -0800)]
ipv6: clean up routes when manually removing address with a lifetime
When an IPv6 address with a finite lifetime (configured with valid_lft
and preferred_lft) is manually deleted, the kernel does not clean up the
associated prefix route. This results in orphaned routes (marked "proto
kernel") remaining in the routing table even after their corresponding
address has been deleted.
This is particularly problematic on networks using a combination of SLAAC
and bridges.
1. Machine comes up and performs RA on eth0.
2. User creates a bridge
- does an ip -6 addr flush dev eth0;
- adds the eth0 under the bridge.
3. SLAAC happens on br0.
Even though the address has "moved" to br0, there will still be a route
pointing to eth0, but eth0 is not usable for IP any more.
====================
net: phy: mscc: Add support for PHY LED control
This patch series adds support for controlling the PHY LEDs on the
VSC85xx family of PHYs from Microsemi (now part of Renesas).
The first two patches simplify and consolidate existing probe code;
the third patch introduces the LED control functionality.
The LED control feature allows users to configure the LED behavior
based on link activity, speed, and other criteria.
====================
Lad Prabhakar [Wed, 12 Nov 2025 13:57:15 +0000 (13:57 +0000)]
net: phy: mscc: Handle devm_phy_package_join() failure in vsc85xx_probe_common()
devm_phy_package_join() may fail and return a negative error code.
Update vsc85xx_probe_common() to properly handle this failure by
checking the return value and propagating the error to the caller.
Lad Prabhakar [Wed, 12 Nov 2025 13:57:14 +0000 (13:57 +0000)]
net: phy: mscc: Add support for PHY LED control
Add support for the PHY LED controller in the MSCC VSC85xx driver. The
implementation provides LED brightness and hardware control through the
LED subsystem and integrates with the standard 'netdev' trigger.
Introduce new register definitions for the LED behavior register
(MSCC_PHY_LED_BEHAVIOR = 30) and the LED combine disable bits, which
control whether LEDs indicate link-only or combined link and activity
status. Implement a helper, vsc8541_led_combine_disable_set(), to update
these bits safely using phy_modify().
Add support for LED brightness control and hardware mode configuration.
The new callbacks implement the standard LED class operations, allowing
user control through sysfs. The brightness control maps to PHY LED force
on/off modes. The hardware control get and set functions translate
between the PHY-specific LED mode encodings and the LED subsystem
TRIGGER_NETDEV_* rules.
The combine feature is managed automatically based on the selected
rules. When both RX and TX activity are disabled, the combine feature is
turned off, causing LEDs to indicate link-only status. When either RX or
TX activity is enabled, the combine feature remains active and LEDs
indicate combined link and activity.
Register the LED callbacks for all VSC85xx PHY variants so that the LED
subsystem can manage their indicators consistently. Existing device tree
LED configuration and default behavior are preserved.