unix_tot_inflight is a poor metric; it only tells the number of
inflight AF_UNIX sockets, and we should use unix_graph_state instead.
Also, if the receiver is catching up with the passed fds, the
sender does not need to schedule GC.
GC only helps with unreferenced cyclic SCM_RIGHTS references, and in
such a situation, the malicious sendmsg() will continue to call
wait_for_unix_gc() and hit the UNIX_INFLIGHT_SANE_USER condition.
Let's make only malicious users schedule GC and wait for it to
finish if a cyclic reference exists during the previous GC run.
Then, sane users will pay almost no cost for wait_for_unix_gc().
af_unix: Don't call wait_for_unix_gc() on every sendmsg().
We have been calling wait_for_unix_gc() on every sendmsg() in case
there are too many inflight AF_UNIX sockets.
This is also because the old GC implementation had poor knowledge
of the inflight sockets and had to suspect every sendmsg().
This was improved by commit d9f21b361333 ("af_unix: Try to run GC
async."), but we do not even need to call wait_for_unix_gc() if the
process is not sending AF_UNIX sockets.
The wait_for_unix_gc() call only helps when a malicious process
continues to create cyclic references, and we can detect that
in a better place and slow it down.
Let's move wait_for_unix_gc() to unix_prepare_fpl(), which is called
only when an AF_UNIX socket fd is passed via SCM_RIGHTS.
af_unix: Don't trigger GC from close() if unnecessary.
We have been triggering GC on every close() if there is even one
inflight AF_UNIX socket.
This is because the old GC implementation had no idea of the graph
shape formed by SCM_RIGHTS references.
The new GC knows whether there could be a cyclic reference or not,
and we can do better.
Let's not trigger GC from close() if there is no cyclic reference
or GC is already in progress.
While at it, unix_gc() is renamed to unix_schedule_gc() as it does
not actually perform GC since commit 8b90a9f819dc ("af_unix: Run
GC on only one CPU.").
GC manages its state by two variables, unix_graph_maybe_cyclic
and unix_graph_grouped, both of which are set to false in the
initial state.
When an AF_UNIX socket is passed to an in-flight AF_UNIX socket,
unix_update_graph() sets unix_graph_maybe_cyclic to true and
unix_graph_grouped to false, making the next GC invocation call
unix_walk_scc() to group SCCs.
Once unix_walk_scc() finishes, sockets in the same SCC are linked
via vertex->scc_entry. Then, unix_graph_grouped is set to true
so that the following GC invocations can skip Tarjan's algorithm
and simply iterate through the list in unix_walk_scc_fast().
In addition, if we know there is at least one cyclic reference,
we set unix_graph_maybe_cyclic to true so that we do not skip GC.
__unix_walk_scc() and unix_walk_scc_fast() call unix_scc_cyclic()
for each SCC to check if it forms a cyclic reference, so that we
can skip GC at the following invocations in case all SCCs do not
have any cycles.
If we count the number of cyclic SCCs in __unix_walk_scc(), we can
simplify unix_walk_scc_fast() because the number of cyclic SCCs
only changes when it garbage-collects an SCC.
So, let's count cyclic SCCs in __unix_walk_scc() and decrement the
counter in unix_walk_scc_fast() when performing garbage collection.
Note that we will use this counter in a later patch to check if a
cycle existed in the previous GC run.
Saeed Mahameed [Mon, 17 Nov 2025 21:42:08 +0000 (23:42 +0200)]
net/mlx5: Abort new commands if all command slots are stalled
In case of a FW issue, the FW might stop responding to FW commands,
causing a kernel lockout for a long period of time, e.g. rtnl_lock held
while ethtool is trying to collect stats, waiting for the FW to respond
to multiple commands, all of which will time out.
While there's no immediate indication of the FW lockout, we can safely
assume that something is wrong when all command slots are busy and in
a timeout state and no FW completion was received on any of them.
In such a case, start failing new commands immediately.
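As an illustration of that policy (not the mlx5 command-interface code; the
slot structure below is made up), the check boils down to: every slot busy,
every slot past its timeout, no completion seen on any of them:

  #include <stdbool.h>

  struct cmd_slot_sketch {
          bool busy;       /* a command occupies this slot */
          bool timed_out;  /* its timeout already expired */
          bool completed;  /* a FW completion was received */
  };

  static bool all_slots_stalled(const struct cmd_slot_sketch *slot, int nslots)
  {
          for (int i = 0; i < nslots; i++) {
                  if (!slot[i].busy || !slot[i].timed_out || slot[i].completed)
                          return false;
          }
          /* Caller should fail new commands immediately, e.g. with -EIO. */
          return true;
  }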
Carolina Jubran [Mon, 17 Nov 2025 21:42:07 +0000 (23:42 +0200)]
net/mlx5: Remove redundant bw_share minimal value assignment
Remove unnecessary logic that sets bw_share to minimal value, when
parent has bw_share configured but nodes don't have min_rate.
This check is redundant because the parent bandwidth acts as the upper
bound regardless, and the firmware always enforces the topmost
bandwidth constraint.
Carolina Jubran [Mon, 17 Nov 2025 21:42:06 +0000 (23:42 +0200)]
net/mlx5e: Recover SQ on excessive PTP TX timestamp delta
Extend the TX timestamp handler to recover the SQ when the difference
between the port and CQE TX timestamps is abnormally large.
The current logic aborts timestamp delivery if the delta exceeds
1/128 seconds, which matches the maximum expected packet interval in
ptp4l. A larger delta makes the timestamps unreliable.
This change adds recovery if the delta exceeds 0.5 seconds. Such a
large gap should not occur in normal operation and indicates that
firmware is stuck or metadata tracking is out of sync, leading to stale
or mismatched timestamps. Recovering the SQ ensures forward progress
and avoids silently dropping invalid timestamps.
The timestamp handler now takes mlx5e_ptpsq directly to access both CQ
stats and the recovery state.
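A rough sketch of the two thresholds described above (values taken from the
text; names and the exact comparison are illustrative, not the mlx5e code):

  #include <stdint.h>

  #define NSEC_PER_SEC     1000000000LL
  #define DELTA_ABORT_NS   (NSEC_PER_SEC / 128) /* ~7.8 ms: drop the timestamp */
  #define DELTA_RECOVER_NS (NSEC_PER_SEC / 2)   /* 0.5 s: also recover the SQ */

  enum ptp_ts_action { TS_DELIVER, TS_ABORT, TS_ABORT_AND_RECOVER };

  static enum ptp_ts_action classify_ts_delta(int64_t port_ts_ns, int64_t cqe_ts_ns)
  {
          int64_t delta = port_ts_ns - cqe_ts_ns;

          if (delta < 0)
                  delta = -delta;
          if (delta > DELTA_RECOVER_NS)
                  return TS_ABORT_AND_RECOVER; /* firmware stuck or metadata out of sync */
          if (delta > DELTA_ABORT_NS)
                  return TS_ABORT;             /* unreliable timestamp, no recovery */
          return TS_DELIVER;
  }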
Gal Pressman [Mon, 17 Nov 2025 21:42:05 +0000 (23:42 +0200)]
net/mlx5: Refactor EEPROM query error handling to return status separately
Matthew and Jakub reported [1] issues where inventory automation tools
are calling EEPROM query repeatedly on a port that doesn't have an SFP
connected, resulting in millions of error prints.
Move MCIA register status extraction from the query functions to the
callers, allowing use of extack reporting instead of a dmesg print when
using the netlink API.
Cc: Matthew W Carlis <mattc@purestorage.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763415729-1238421-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hangbin Liu [Mon, 17 Nov 2025 02:44:56 +0000 (02:44 +0000)]
netlink: specs: support ipv4-or-v6 for dual-stack fields
Since commit 1b255e1beabf ("tools: ynl: add ipv4-or-v6 display hint"), we
can display either IPv4 or IPv6 addresses for a single field based on the
address family. However, most dual-stack fields still use the ipv4 display
hint. This update changes them to use the new ipv4-or-v6 display hint and
converts IPv4-only fields to use the u32 type.
Hangbin Liu [Mon, 17 Nov 2025 02:44:55 +0000 (02:44 +0000)]
tools: ynl: Add MAC address parsing support
Add missing support for parsing MAC addresses when display_hint is 'mac'
in the YNL library. This enables YNL CLI to accept MAC address strings
for attributes like lladdr in rt-neigh operations.
Jakub Kicinski [Wed, 19 Nov 2025 02:25:48 +0000 (18:25 -0800)]
Merge branch 'net-expand-napi_skb_cache-use'
Eric Dumazet says:
====================
net: expand napi_skb_cache use
This is a followup of commit e20dfbad8aab ("net: fix napi_consume_skb()
with alien skbs").
Now that the per-cpu napi_skb_cache is populated from the TX completion
path, we can make use of this cache, especially for CPUs not used
by a driver NAPI poll (the primary user of napi_cache).
With this series, I consistently reach 130 Mpps on my UDP tx stress test
and reduce SLUB spinlock contention to smaller values.
====================
Eric Dumazet [Sun, 16 Nov 2025 20:27:17 +0000 (20:27 +0000)]
net: use napi_skb_cache even in process context
This is a followup of commit e20dfbad8aab ("net: fix napi_consume_skb()
with alien skbs").
Now that the per-cpu napi_skb_cache is populated from the TX completion
path, we can make use of this cache, especially for CPUs not used
by a driver NAPI poll (the primary user of napi_cache).
We can use the napi_skb_cache only if current context is not from hard irq.
With this patch, I consistently reach 130 Mpps on my UDP tx stress test
and reduce SLUB spinlock contention to smaller values.
Note there is still some SLUB contention for skb->head allocations.
I had to tune /sys/kernel/slab/skbuff_small_head/cpu_partial
and /sys/kernel/slab/skbuff_small_head/min_partial depending
on the platform taxonomy.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Tested-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251116202717.1542829-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
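The context restriction mentioned above ("only if current context is not
from hard irq") amounts to a gate like this kernel-style sketch; the real
check sits inside the skb alloc/free fast paths in net/core/skbuff.c:

  #include <linux/preempt.h>
  #include <linux/types.h>

  static bool can_use_napi_skb_cache(void)
  {
          /* The per-cpu cache is usable from process and softirq context;
           * hard-IRQ callers must fall back to the regular slab path. */
          return !in_hardirq();
  }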
Alok Tiwari [Mon, 17 Nov 2025 09:53:50 +0000 (01:53 -0800)]
net: dsa: ks8995: Fix incorrect OF match table name
The driver declares an OF match table named ks8895_spi_of_match, even
though it describes compatible strings for the KS8995 and related Micrel
switches. This is a leftover typo; the correct name should match the
chip family handled by this driver (ks8995) and also match the variable
used in spi_driver.of_match_table.
1) Relax a lock contention bottleneck to improve IPsec crypto
offload performance. From Jianbo Liu.
2) Deprecate pfkey, the interface will be removed in 2027.
3) Update xfrm documentation and move it to ipsec maintenance.
From Bagas Sanjaya.
* tag 'ipsec-next-2025-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
MAINTAINERS: Add entry for XFRM documentation
net: Move XFRM documentation into its own subdirectory
Documentation: xfrm_sync: Number the fifth section
Documentation: xfrm_sysctl: Trim trailing colon in section heading
Documentation: xfrm_sync: Trim excess section heading characters
Documentation: xfrm_sync: Properly reindent list text
Documentation: xfrm_device: Separate hardware offload sublists
Documentation: xfrm_device: Use numbered list for offloading steps
Documentation: xfrm_device: Wrap iproute2 snippets in literal code block
pfkey: Deprecate pfkey
xfrm: Skip redundant replay recheck for the hardware offload path
xfrm: Refactor xfrm_input lock to reduce contention with RSS
====================
====================
gve: Implement XDP HW RX Timestamping support for DQ
From: Tim Hostetler <thostet@google.com>
This patch series adds support for bpf_xdp_metadata_rx_timestamp from an
XDP program loaded into the driver on its own or bound to an XSK. This
is only supported for DQ.
====================
Tim Hostetler [Fri, 14 Nov 2025 21:11:46 +0000 (13:11 -0800)]
gve: Add Rx HWTS metadata to AF_XDP ZC mode
By overlaying the struct gve_xdp_buff on top of the struct xdp_buff_xsk
that AF_XDP utilizes, the driver records the 32-bit timestamp via the
completion descriptor and the cached 64-bit NIC timestamp via gve_priv.
The driver's implementation of xmo_rx_timestamp extends the timestamp to
the full, up-to-date 64-bit timestamp and returns it to the user.
gve_rx_xsk_dqo is modified to accept a pointer to the completion
descriptor and no longer takes a buf_len explicitly as it can be pulled
out of the descriptor.
With this patch gve now supports bpf_xdp_metadata_rx_timestamp.
Signed-off-by: Tim Hostetler <thostet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251114211146.292068-5-joshwash@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
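Extending a 32-bit hardware timestamp with a cached 64-bit NIC time is a
standard trick; the helper below is a self-contained illustration of that
technique, not the gve code (the driver's wrap handling and units may differ):

  #include <stdint.h>

  static uint64_t extend_hw_ts(uint64_t cached_nic_time, uint32_t hw_ts_lo)
  {
          /* Combine the cached upper 32 bits with the fresh lower 32 bits. */
          uint64_t ts = (cached_nic_time & ~0xffffffffULL) | hw_ts_lo;

          /* If the 32-bit counter wrapped since the cached time was taken,
           * the naive combination is ~2^32 off; nudge it into the half-window
           * around the cached time. */
          if (ts > cached_nic_time + (1ULL << 31))
                  ts -= 1ULL << 32;
          else if (ts + (1ULL << 31) < cached_nic_time)
                  ts += 1ULL << 32;

          return ts;
  }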
Tim Hostetler [Fri, 14 Nov 2025 21:11:45 +0000 (13:11 -0800)]
gve: Prepare bpf_xdp_metadata_rx_timestamp support
Support populating XDP RX metadata with hardware RX timestamps. This
patch utilizes the same underlying logic to calculate hardware
timestamps as the regular RX path.
xdp_metadata_ops is registered with the net_device in a future patch.
gve_rx_calculate_hwtstamp was pulled out so as to not duplicate logic
between gve_xdp_rx_timestamp and gve_rx_hwtstamp.
Signed-off-by: Tim Hostetler <thostet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251114211146.292068-4-joshwash@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tim Hostetler [Fri, 14 Nov 2025 21:11:44 +0000 (13:11 -0800)]
gve: Wrap struct xdp_buff
RX timestamping will need to keep track of extra temporary information
per-packet. In preparation for this, introduce gve_xdp_buff to wrap the
xdp_buff. This is similar in function to stmmac_xdp_buff and
ice_xdp_buff.
Signed-off-by: Tim Hostetler <thostet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251114211146.292068-3-joshwash@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
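The wrapper pattern looks roughly like the sketch below; the field names are
illustrative, and only the requirement that struct xdp_buff stays the first
member is inherent to the pattern:

  #include <net/xdp.h>

  struct gve_xdp_buff_sketch {
          struct xdp_buff xdp;     /* must be first so the core's xdp_buff
                                    * pointer can be cast back to the wrapper */
          const void *compl_desc;  /* per-packet driver scratch, e.g. the RX
                                    * completion descriptor used later by
                                    * xmo_rx_timestamp() */
  };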
Tim Hostetler [Fri, 14 Nov 2025 21:11:43 +0000 (13:11 -0800)]
gve: Move ptp_schedule_worker to gve_init_clock
Previously, gve had only been initializing the ptp aux work when
hardware timestamping was initialized through ndo_hwtstamp_set. As this
patch series introduces XDP hardware timestamp metadata which will
require the ptp aux work, the work can't be gated on the
kernel_hwtstamp_config being set and must be initialized elsewhere.
For simplicity, ptp_schedule_worker is invoked right after the ptp_clock
is registered with the kernel (which happens during gve_probe or
following reset). The worker is scheduled in GVE_NIC_TS_SYNC_INTERVAL_MS
as the synchronous call to gve_clock_nic_ts_read makes the worker
redundant if scheduled immediately.
If gve cannot read the device clock immediately, it errors out of
gve_init_clock.
Signed-off-by: Tim Hostetler <thostet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251114211146.292068-2-joshwash@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The lan8814 supports two interfaces towards the host (QSGMII and QUSGMII).
Currently the lan8814 disables the auto-negotiation towards the host
side. So, extend this to allow configuring the use of in-band
auto-negotiation.
I have tested this only with the QSGMII interface.
Sunday Adelodun [Thu, 13 Nov 2025 11:28:02 +0000 (12:28 +0100)]
selftests: af_unix: Add tests for ECONNRESET and EOF semantics
Add selftests to verify and document Linux’s intended behaviour for
UNIX domain sockets (SOCK_STREAM, SOCK_SEQPACKET, and SOCK_DGRAM) when a peer closes.
The tests verify that:
1. SOCK_STREAM returns EOF when the peer closes normally.
2. SOCK_STREAM returns ECONNRESET if the peer closes with unread data.
3. SOCK_SEQPACKET returns EOF when the peer closes normally.
4. SOCK_SEQPACKET returns ECONNRESET if the peer closes with unread data.
5. SOCK_DGRAM does not return ECONNRESET when the peer closes.
This follows up on review feedback suggesting a selftest to clarify
Linux’s semantics.
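Case 2 is easy to reproduce outside the kselftest harness; the stand-alone
program below (not the selftest itself) writes to a peer that closes without
reading, and the subsequent read() is expected to fail with ECONNRESET:

  #include <errno.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <unistd.h>

  int main(void)
  {
          int sk[2];
          char buf[8];
          ssize_t n;

          if (socketpair(AF_UNIX, SOCK_STREAM, 0, sk))
                  return 1;

          write(sk[0], "unread", 6); /* queue data the peer never reads */
          close(sk[1]);              /* peer closes with unread data */

          n = read(sk[0], buf, sizeof(buf));
          printf("read() = %zd, errno = %s\n", n, n < 0 ? strerror(errno) : "-");
          close(sk[0]);
          return 0;
  }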
====================
net: stmmac: Disable EEE RX clock stop when VLAN is enabled
This series fixes a couple of VLAN issues observed on the Renesas RZ/V2H
EVK platform (stmmac + Microchip KSZ9131RNXI PHY):
- The first patch fixes a bug where VLAN ID 0 would not be properly removed
due to how vlan_del_hw_rx_fltr() matched entries in the VLAN filter table.
- The second patch addresses RX clock gating issues that occur during VLAN
creation and deletion when EEE is enabled with RX clock-stop active (the
default configuration). For example:
# ip link add link end1 name end1.5 type vlan id 5
15c40000.ethernet end1: Timeout accessing MAC_VLAN_Tag_Filter
RTNETLINK answers: Device or resource busy
The stmmac hardware requires the receive clock to be running when writing
certain registers, including VLAN registers. However, by default the driver
enables Energy Efficient Ethernet (EEE) and allows the PHY to stop the
receive clock when the link is idle. As a result, the RX clock might be
stopped when attempting to access these registers, leading to timeouts.
A more comprehensive overview of receive clock related issues in the
stmmac driver can be found here:
https://lore.kernel.org/all/Z9ySeo61VYTClIJJ@shell.armlinux.org.uk/
Most of the issues were resolved by commit dd557266cf5fb ("net: stmmac:
block PHY RXC clock-stop"), which wraps register accesses with
phylink_rx_clk_stop_block()/unblock() calls. However, VLAN add/delete
operations are invoked with bottom halves disabled, where sleeping is
not permitted, so those helpers cannot be used.
To avoid these VLAN timeouts, the second patch disables the EEE RX
clock-stop feature when VLAN support is enabled. This ensures the receive
clock remains active, allowing VLAN operations to complete successfully.
====================
Ovidiu Panait [Thu, 13 Nov 2025 11:27:21 +0000 (11:27 +0000)]
net: stmmac: Disable EEE RX clock stop when VLAN is enabled
On the Renesas RZ/V2H EVK platform, where the stmmac MAC is connected to a
Microchip KSZ9131RNXI PHY, creating or deleting VLAN interfaces may fail
with timeouts:
# ip link add link end1 name end1.5 type vlan id 5
15c40000.ethernet end1: Timeout accessing MAC_VLAN_Tag_Filter
RTNETLINK answers: Device or resource busy
Disabling EEE at runtime avoids the problem:
# ethtool --set-eee end1 eee off
# ip link add link end1 name end1.5 type vlan id 5
# ip link del end1.5
The stmmac hardware requires the receive clock to be running when writing
certain registers, such as those used for MAC address configuration or
VLAN filtering. However, by default the driver enables Energy Efficient
Ethernet (EEE) and allows the PHY to stop the receive clock when the link
is idle. As a result, the RX clock might be stopped when attempting to
access these registers, leading to timeouts and other issues.
Commit dd557266cf5fb ("net: stmmac: block PHY RXC clock-stop")
addressed this issue for most register accesses by wrapping them in
phylink_rx_clk_stop_block()/phylink_rx_clk_stop_unblock() calls.
However, VLAN add/delete operations may be invoked with bottom halves
disabled, where sleeping is not allowed, so using these helpers is not
possible.
Therefore, to fix this, disable the RX clock stop feature in the phylink
configuration if VLAN features are set. This ensures the RX clock remains
active and register accesses succeed during VLAN operations.
Ovidiu Panait [Thu, 13 Nov 2025 11:27:20 +0000 (11:27 +0000)]
net: stmmac: Fix VLAN 0 deletion in vlan_del_hw_rx_fltr()
When the "rx-vlan-filter" feature is enabled on a network device, the 8021q
module automatically adds a VLAN 0 hardware filter when the device is
brought administratively up.
For stmmac, this causes vlan_add_hw_rx_fltr() to create a new entry for
VID 0 in the mac_device_info->vlan_filter array, in the following format:
VLAN_TAG_DATA_ETV | VLAN_TAG_DATA_VEN | vid
Here, VLAN_TAG_DATA_VEN indicates that the hardware filter is enabled for
that VID.
However, on the delete path, vlan_del_hw_rx_fltr() searches the vlan_filter
array by VID only, without verifying whether a VLAN entry is enabled. As a
result, when the 8021q module attempts to remove VLAN 0, the function may
mistakenly match a zero-initialized slot rather than the actual VLAN 0
entry, causing incorrect deletions and leaving stale entries in the
hardware table.
Fix this by verifying that the VLAN entry's enable bit (VLAN_TAG_DATA_VEN)
is set before matching and deleting by VID. This ensures only active VLAN
entries are removed and avoids leaving stale entries in the VLAN filter
table, particularly for VLAN ID 0.
Fixes: ed64639bc1e08 ("net: stmmac: Add support for VLAN Rx filtering") Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20251113112721.70500-2-ovidiu.panait.rb@renesas.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
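The corrected lookup reduces to "skip any slot whose enable bit is clear
before comparing VIDs". The sketch below illustrates that; the bit positions
are made up, only VLAN_TAG_DATA_VEN is a name taken from the text:

  #include <stdint.h>

  #define VLAN_TAG_DATA_VEN (1u << 16)  /* illustrative bit position */
  #define VLAN_VID_MASK     0xfffu      /* illustrative VID field */

  static int find_active_vlan_slot(const uint32_t *filter, int nslots, uint16_t vid)
  {
          for (int i = 0; i < nslots; i++) {
                  if (!(filter[i] & VLAN_TAG_DATA_VEN))
                          continue; /* zero-initialised or disabled slot */
                  if ((filter[i] & VLAN_VID_MASK) == vid)
                          return i; /* only active entries may match */
          }
          return -1;
  }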
====================
dpll: zl3073x: Refactor state management
This patch set is a refactoring of the zl3073x driver to clean up
state management, improve modularity, and significantly reduce
on-demand I/O.
The driver's dpll.c implementation previously performed on-demand
register reads and writes (wrapped in mailbox operations) to get
or set properties like frequency, phase, and embedded-sync settings.
This cluttered the DPLL logic with low-level I/O, duplicated locking,
and led to inefficient bus traffic.
This series addresses this by:
1. Splitting the monolithic 'core.c' into logical units ('ref.c',
'out.c', 'synth.c').
2. Implementing a full read/write-back cache for 'zl3073x_ref' and
'zl3073x_out' structures.
All state is now read once during '_state_fetch()' (and status updated
periodically). DPLL get callbacks read from this cache. Set callbacks
modify a copy of the state, which is then committed via a new
'..._state_set()' function. These '_state_set' functions compare
the new state to the cached state and write *only* the modified
register values back to the hardware, all within a single mailbox
sequence.
The result is a much cleaner 'dpll.c' that is almost entirely
free of direct register I/O, and all state logic is properly
encapsulated in its respective file.
The series is broken down as follows:
* Patch 1: Changes the state structs to store raw register values
(e.g., 'config', 'ctrl') instead of parsed booleans, centralizing
parsing logic into the helpers.
* Patch 2: Splits the logic from 'core.c' into new 'ref.c', 'out.c'
and 'synth.c' files, creating a 'zl3073x_dev_...' abstraction layer.
* Patch 3: Introduces the caching concept by reading and caching
the reference monitor status periodically, removing scattered
reads from 'dpll.c'.
* Patch 4: Expands the 'zl3073x_ref' struct to cache *all* reference
properties and adds 'zl3073x_ref_state_set()' to write back changes.
* Patch 5: Does the same for the 'zl3073x_out' struct, caching all
output properties and adding 'zl3073x_out_state_set()'.
* Patch 6: A final cleanup that removes the 'zl3073x_dev_...' wrapper
functions that became redundant after the refactoring.
====================
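The compare-and-write-back idea at the heart of this series can be sketched
as follows; the state fields, register numbers and write callback are
placeholders, not the zl3073x API:

  #include <stdint.h>

  struct out_state_sketch {
          uint8_t  mode;
          uint8_t  esync_ctrl;
          uint32_t div;
          uint32_t phase_comp;
  };

  /* Write only the fields that differ from the cached state, then update
   * the cache; in the driver this happens inside one mailbox sequence. */
  static int out_state_set_sketch(struct out_state_sketch *cached,
                                  const struct out_state_sketch *new,
                                  int (*write_reg)(unsigned int reg, uint32_t val))
  {
          int rc = 0;

          if (!rc && new->mode != cached->mode)
                  rc = write_reg(0x00, new->mode);        /* placeholder registers */
          if (!rc && new->esync_ctrl != cached->esync_ctrl)
                  rc = write_reg(0x01, new->esync_ctrl);
          if (!rc && new->div != cached->div)
                  rc = write_reg(0x02, new->div);
          if (!rc && new->phase_comp != cached->phase_comp)
                  rc = write_reg(0x03, new->phase_comp);

          if (!rc)
                  *cached = *new;                         /* commit to the cache */
          return rc;
  }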
Ivan Vecera [Thu, 13 Nov 2025 07:41:04 +0000 (08:41 +0100)]
dpll: zl3073x: Cache all output properties in zl3073x_out
Expand the zl3073x_out structure to cache all output-related
hardware registers, including divisors, widths, embedded-sync
parameters and phase compensation.
Modify zl3073x_out_state_fetch() to read and populate all these
new fields at once, including zero-divisor checks. Refactor all
dpll "getter" functions in dpll.c to read from this new
cached state instead of performing direct register access.
Introduce a new function, zl3073x_out_state_set(), to handle
writing changes back to the hardware. This function compares the
provided state with the current cached state and writes *only* the
modified register values via a single mailbox sequence before
updating the local cache.
Refactor all dpll "setter" functions to modify a local copy of
the output state and then call zl3073x_out_state_set() to
commit the changes.
This change centralizes all output-related register I/O into
out.c, significantly reduces bus traffic, and simplifies the logic
in dpll.c.
Reviewed-by: Petr Oros <poros@redhat.com> Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20251113074105.141379-6-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Thu, 13 Nov 2025 07:41:03 +0000 (08:41 +0100)]
dpll: zl3073x: Cache all reference properties in zl3073x_ref
Expand the zl3073x_ref structure to cache all reference-related
hardware registers, including frequency components, embedded-sync
settings and phase compensation. Previously, these registers were
read on-demand from various functions in dpll.c leading to frequent
mailbox operations.
Modify zl3073x_ref_state_fetch() to read and populate all these new
fields at once. Refactor all "getter" functions in dpll.c to read
from this new cached state instead of performing direct register
access.
Remove the standalone zl3073x_dpll_input_ref_frequency_get() helper,
as its functionality is now replaced by zl3073x_ref_freq_get() which
operates on the cached state and add a corresponding zl3073x_dev_...
wrapper.
Introduce a new function, zl3073x_ref_state_set(), to handle
writing changes back to the hardware. This function compares the
provided state with the current cached state and writes *only* the
modified register values to the device via a single mailbox sequence
before updating the local cache.
Refactor all dpll "setter" functions to modify a local copy of the
ref state and then call zl3073x_ref_state_set() to commit the changes.
As a cleanup, update callers in dpll.c that already have
a struct zl3073x_ref * to use the direct helpers instead of the
zl3073x_dev_... wrappers.
This change centralizes all reference-related register I/O into ref.c,
significantly reduces bus traffic, and simplifies the logic in dpll.c.
Reviewed-by: Petr Oros <poros@redhat.com> Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20251113074105.141379-5-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Thu, 13 Nov 2025 07:41:02 +0000 (08:41 +0100)]
dpll: zl3073x: Cache reference monitor status
Instead of reading the ZL_REG_REF_MON_STATUS register every time
the reference status is needed, cache this value in the zl3073x_ref
struct.
This is achieved by:
* Adding a mon_status field to struct zl3073x_ref
* Introducing zl3073x_dev_ref_status_update() to read the status for
all references into this new cache field
* Calling this update function from the periodic work handler
* Adding zl3073x_ref_is_status_ok() and zl3073x_dev_ref_is_status_ok()
helpers to check the cached value
* Refactoring all callers in dpll.c to use the new
zl3073x_dev_ref_is_status_ok() helper, removing direct register reads
This change consolidates all status register reads into a single periodic
function and reduces I/O bus traffic in dpll callbacks.
Reviewed-by: Petr Oros <poros@redhat.com> Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20251113074105.141379-4-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Thu, 13 Nov 2025 07:41:01 +0000 (08:41 +0100)]
dpll: zl3073x: Split ref, out, and synth logic from core
Refactor the zl3073x driver by splitting the logic for input
references, outputs and synthesizers out of the monolithic
core.[ch] files.
Move the logic for each functional block into its own dedicated files:
ref.[ch], out.[ch] and synth.[ch].
Specifically:
- Move state structures (zl3073x_ref, zl3073x_out, zl3073x_synth)
from core.h into their respective new headers
- Move state-fetching functions (..._state_fetch) from core.c to their
new .c files
- Move the zl3073x_ref_freq_factorize helper from core.c to ref.c
- Introduce a new helper layer to decouple the core device logic from
the state-parsing logic:
1. Move the original inline helpers (e.g., zl3073x_ref_is_enabled)
to the new headers (ref.h, etc.) and make them operate on a
const struct ... * pointer.
2. Create new zl3073x_dev_... prefixed functions in core.h
(e.g., zl3073x_dev_ref_is_enabled) and implement these _dev_ functions
to fetch state using a new ..._state_get() helper and then call
the non-prefixed helper.
3. Update all driver-internal callers (in dpll.c, prop.c, etc.) to use
the new zl3073x_dev_... functions.
Reviewed-by: Petr Oros <poros@redhat.com> Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20251113074105.141379-3-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Thu, 13 Nov 2025 07:41:00 +0000 (08:41 +0100)]
dpll: zl3073x: Store raw register values instead of parsed state
The zl3073x_ref, zl3073x_out and zl3073x_synth structures
previously stored state that was parsed from register reads. This
included values like boolean 'enabled' flags, synthesizer selections,
and pre-calculated frequencies.
This commit refactors the state management to store the raw register
values directly in these structures. The various inline helper functions
are updated to parse these raw values on-demand using FIELD_GET.
Reviewed-by: Petr Oros <poros@redhat.com> Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20251113074105.141379-2-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
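In kernel terms the pattern looks like the snippet below; the struct,
register and field names are illustrative, only the FIELD_GET-on-a-raw-value
idea is the point:

  #include <linux/bitfield.h>
  #include <linux/bits.h>
  #include <linux/types.h>

  struct ref_state_sketch {
          u8 config;                      /* raw register value, as read */
  };

  #define REF_CONFIG_ENABLE BIT(0)        /* illustrative field definition */

  static inline bool ref_is_enabled(const struct ref_state_sketch *ref)
  {
          /* parse on demand instead of storing a pre-parsed boolean */
          return FIELD_GET(REF_CONFIG_ENABLE, ref->config);
  }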
====================
s390/qeth: Improve handling of OSA RCs
This two-patch series aims to improve how return codes from OSA Express
are handled in the qeth driver.
OSA defines a number of return codes whose meaning is determined by the
issuing command, i.e. they are ambiguous. The first patch moves
definitions of all return codes including the ambiguous ones to a single
enum block to aid readability and maintainability.
The second patch implements a mechanism to interpret return codes based
on the issuing command to ensure accurate debug messages. While at it,
remove extern keyword and fix indentation for function declarations to
be in line with Linux kernel coding style.
====================
Aswin Karuvally [Thu, 13 Nov 2025 14:42:09 +0000 (15:42 +0100)]
s390/qeth: Handle ambiguous OSA RCs in s390dbf
OSA Express defines a number of return codes whose meaning is determined
by the issuing command, making them ambiguous. The important ones are
reported as debug messages through the s390 debug feature.
The qeth driver currently does not take the issuing command into account
when interpreting the return code which sometimes leads to incorrect
debug messages.
Implement a mechanism to interpret and report these return codes
properly. While at it, remove extern keyword and fix indentation for
function declarations to be in line with Linux kernel coding style.
Suggested-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Aswin Karuvally <aswin@linux.ibm.com> Link: https://patch.msgid.link/20251113144209.2140061-3-aswin@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Aswin Karuvally [Thu, 13 Nov 2025 14:42:08 +0000 (15:42 +0100)]
s390/qeth: Move all OSA RCs to single enum
OSA Express defines a number of return codes whose meaning is
determined by the issuing command, making them ambiguous. Move
definitions of all return codes including the ambiguous ones to a single
enum block to aid readability and maintainability.
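The mechanism amounts to selecting a per-command lookup table when turning a
return code into a debug string; the sketch below shows the idea with
placeholder command IDs, codes and strings (none of them are real qeth values):

  struct rc_desc { unsigned int rc; const char *meaning; };

  static const struct rc_desc setip_rcs[] = {
          { 0xe002, "address already registered" },  /* placeholder */
  };
  static const struct rc_desc setvlan_rcs[] = {
          { 0xe002, "VLAN already configured" },     /* same code, other meaning */
  };

  static const char *rc_to_string(unsigned int cmd, unsigned int rc)
  {
          const struct rc_desc *tbl;
          unsigned int i, n;

          switch (cmd) {                 /* pick the table by issuing command */
          case 0x01: tbl = setip_rcs;   n = sizeof(setip_rcs) / sizeof(*setip_rcs);     break;
          case 0x02: tbl = setvlan_rcs; n = sizeof(setvlan_rcs) / sizeof(*setvlan_rcs); break;
          default:   return "unknown command";
          }
          for (i = 0; i < n; i++)
                  if (tbl[i].rc == rc)
                          return tbl[i].meaning;
          return "unrecognized return code";
  }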
Heiner Kallweit [Thu, 13 Nov 2025 21:09:08 +0000 (22:09 +0100)]
r8169: bail out from probe if fiber mode is detected on RTL8127AF
It was reported that on a card with RTL8127AF (SFP + DAC) link-up isn't
detected. Realtek hides the SFP behind the internal PHY, which isn't
behaving fully compliance with clause 22 any longer in fiber mode.
Due to not having access to chip documentation there isn't much I can
do for now. Instead of silently failing to detect link-up in fiber mode,
inform the user that fiber mode isn't supported and bail out.
The logic to detect fiber mode is borrowed from Realtek's r8127 driver.
Inochi Amaoto [Fri, 14 Nov 2025 00:38:04 +0000 (08:38 +0800)]
net: phy: Add helper for fixing RGMII PHY mode based on internal mac delay
The "phy-mode" property of devicetree indicates whether the PCB has
delay now, which means the mac needs to modify the PHY mode based
on whether there is an internal delay in the mac.
This modification is similar for many ethernet drivers. To simplify
code, define the helper phy_fix_phy_mode_for_mac_delays(speed, mac_txid,
mac_rxid) to fix PHY mode based on whether mac adds internal delay.
Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk> Signed-off-by: Inochi Amaoto <inochiama@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251114003805.494387-3-inochiama@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
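The gist of such a helper, under the assumption that it maps an RGMII
interface mode plus the MAC's internal TX/RX delay flags to the mode the PHY
should actually be told to use (the merged helper's exact name and signature
may differ):

  #include <linux/phy.h>
  #include <linux/types.h>

  static phy_interface_t rgmii_fixup_sketch(phy_interface_t mode,
                                            bool mac_txid, bool mac_rxid)
  {
          bool rxid = mode == PHY_INTERFACE_MODE_RGMII_ID ||
                      mode == PHY_INTERFACE_MODE_RGMII_RXID;
          bool txid = mode == PHY_INTERFACE_MODE_RGMII_ID ||
                      mode == PHY_INTERFACE_MODE_RGMII_TXID;

          if (mode != PHY_INTERFACE_MODE_RGMII && !rxid && !txid)
                  return mode;            /* not an RGMII mode: leave untouched */

          rxid &= !mac_rxid;              /* MAC already delays the RX clock */
          txid &= !mac_txid;              /* MAC already delays the TX clock */

          if (rxid && txid)
                  return PHY_INTERFACE_MODE_RGMII_ID;
          if (rxid)
                  return PHY_INTERFACE_MODE_RGMII_RXID;
          if (txid)
                  return PHY_INTERFACE_MODE_RGMII_TXID;
          return PHY_INTERFACE_MODE_RGMII;
  }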
Restructure mana_query_gf_stats() to operate on the per-VF mana_context,
instead of per-port statistics. Introduce mana_ethtool_hc_stats to
isolate hardware counter statistics and update the
"ethtool -S <interface>" output to expose all relevant counters while
preserving backward compatibility.
Add support for the standard rx_missed_errors counter by mapping it
to the hardware's hc_rx_discards_no_wqe metric. Refresh statistics
every 2 seconds.
====================
Report standard counter stats->rx_missed_errors
using hc_rx_discards_no_wqe from the hardware.
Add a global workqueue to periodically run
mana_query_gf_stats every 2 seconds to get the latest
info in eth_stats and define a driver capability flag
to notify hardware of the periodic queries.
To avoid repeated failures and log flooding, the workqueue
is not rescheduled if mana_query_gf_stats fails with an HWC timeout
error, and the stats are reset to 0. Other errors are transient
and will not need a VF reset for recovery.
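A kernel-context sketch of that polling pattern (names are illustrative, not
the mana driver's; the real work item and stop-on-HWC-timeout handling live
in the driver):

  #include <linux/errno.h>
  #include <linux/jiffies.h>
  #include <linux/workqueue.h>

  #define GF_STATS_PERIOD_MS 2000

  static struct delayed_work gf_stats_work; /* INIT_DELAYED_WORK() at probe time */

  static void gf_stats_work_handler(struct work_struct *work)
  {
          int err = 0; /* err = query the GF stats here (placeholder) */

          if (err == -ETIMEDOUT)
                  return; /* HWC timeout: stop polling, stats stay reset to 0 */

          /* success or transient error: poll again in 2 seconds */
          schedule_delayed_work(&gf_stats_work, msecs_to_jiffies(GF_STATS_PERIOD_MS));
  }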
net: mana: Move hardware counter stats from per-port to per-VF context
Move hardware counter (HC) statistics from mana_port_context to
mana_context to enable sharing stats across multiple network ports
on the same MANA VF. Previously, each network port queried
hardware counters independently using MANA_QUERY_GF_STAT command
(GF = Generic Function stats from GDMA hardware), resulting in
redundant queries when multiple ports existed on the same device.
Isolate hardware counter stats by introducing mana_ethtool_hc_stats
in mana_context and update the code to ensure all stats are properly
reported via ethtool -S <interface>, maintaining consistency with
previous behavior.
PCI drivers explicitly set .pkt_route to zero. However, as the struct
is allocated using devm_kzalloc(), all members default to zero unless
explicitly initialised. Thus, explicitly setting these to zero is
unnecessary. Remove these. This leaves only stmmac_platform.c where
this is explicitly initialised depending on DT properties.
stmmac_platform.c explicitly sets .prio to zero if the snps,priority
property is not present in DT for the queue. However, as the struct
is allocated using devm_kzalloc(), all members default to zero unless
explicitly initialised, and of_property_read_u32() will not write to
its argument if the property is not found. Thus, explicitly setting
these to zero is unnecessary. Remove these.
Several drivers (see below) explicitly set the queue .use_prio
configuration to false. However, as this structure is allocated using
devm_kzalloc(), all members default to zero unless otherwise explicitly
initialised. .use_prio isn't, so it defaults to false. Remove these
unnecessary initialisations, leaving stmmac_platform.c as the only
file where .use_prio is set to true.
net: stmmac: move initialisation of queues_to_use to stmmac_plat_dat_alloc()
Move the default initialisation of plat_dat->tx_queues_to_use and
plat_dat->rx_queues_to_use (to 1) into stmmac_plat_dat_alloc(). This means
platform glue only needs to override this if different.
net: stmmac: move initialisation of unicast_filter_entries to stmmac_plat_dat_alloc()
Move the default initialisation of plat_dat->unicast_filter_entries
(to 1) into stmmac_plat_dat_alloc(). This means platform glue only needs to
override this if different.
net: stmmac: move initialisation of multicast_filter_bins to stmmac_plat_dat_alloc()
Move the default initialisation of plat_dat->multicast_filter_bins
(to HASH_TABLE_SIZE) into stmmac_plat_dat_alloc(). This means platform glue
only needs to override this if different.
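After the three moves above, the defaults end up in one place, roughly like
this hedged sketch of stmmac_plat_dat_alloc() (allocation details trimmed;
HASH_TABLE_SIZE comes from the driver's common.h):

  #include <linux/device.h>
  #include <linux/slab.h>
  #include <linux/stmmac.h>

  static struct plat_stmmacenet_data *plat_dat_alloc_sketch(struct device *dev)
  {
          struct plat_stmmacenet_data *plat;

          plat = devm_kzalloc(dev, sizeof(*plat), GFP_KERNEL);
          if (!plat)
                  return NULL;

          /* generic defaults; platform glue overrides only when different */
          plat->tx_queues_to_use = 1;
          plat->rx_queues_to_use = 1;
          plat->unicast_filter_entries = 1;
          plat->multicast_filter_bins = HASH_TABLE_SIZE; /* driver-internal constant */

          return plat;
  }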
====================
convert drivers to use ndo_hwtstamp callbacks part 4
This patchset is a subset of the part 3 patchset, converting the bnx2x and
qede drivers to use ndo callbacks instead of ioctl to configure and report
time stamping. These drivers implemented only the SIOCSHWTSTAMP command,
but are converted to also provide the configuration back to users. Some
logic is changed to avoid reporting a configuration which is not in sync
with the HW in case an error happened.
====================
Vadim Fedorenko [Sun, 16 Nov 2025 09:46:10 +0000 (09:46 +0000)]
qede: convert to use ndo_hwtstamp callbacks
The driver implemented the SIOCSHWTSTAMP ioctl cmd only, but it stores the
configuration in a private structure, so it can be reported back to users.
Implement both ndo_hwtstamp_set and ndo_hwtstamp_get callbacks.
ndo_hwtstamp_set implements a check for unsupported 1-step timestamping,
and qede_ptp_cfg_filters() becomes void as it cannot fail anymore.
Vadim Fedorenko [Sun, 16 Nov 2025 09:46:09 +0000 (09:46 +0000)]
bnx2x: convert to use ndo_hwtstamp callbacks
The driver implemented SIOCSHWTSTAMP ioctl command only, but at the same
time it has the configuration stored in a private structure. Implement both
ndo_hwtstamp_set and ndo_hwtstamp_get callbacks using the stored info.
ndo_hwtstamp_set callback implements a check for unsupported 1-step
timestamping. The same check is removed from bnx2x_configure_ptp_filters
function as it's not needed anymore. Another call site of
bnx2x_configure_ptp_filters has hwtstamp_ioctl_called guard.
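For reference, the "get" side of such a conversion is typically just
returning the stored configuration; a hedged sketch with an invented private
struct (not the bnx2x or qede code):

  #include <linux/net_tstamp.h>
  #include <linux/netdevice.h>

  struct foo_priv {
          struct kernel_hwtstamp_config tstamp_config; /* saved by the set callback */
  };

  static int foo_hwtstamp_get(struct net_device *dev,
                              struct kernel_hwtstamp_config *config)
  {
          struct foo_priv *priv = netdev_priv(dev);

          *config = priv->tstamp_config; /* report what was last applied */
          return 0;
  }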
- Patch 3: remove unused last column from nstat output.
- Patch 4: improve stats dump in mptcp_join.sh.
- Patch 5: get counters from nstat history and simplify mptcp_connect.sh.
- Patch 6: avoid taking the same packet trace twice.
- Patch 7: wait for an event instead of a fixed time.
- Patch 8: instead of using 'timeout' and printing the stats afterwards,
another internal timeout is used: if it fires, it will print the stats,
then stop everything. This avoids confusion around stats in case of a
timeout.
====================
selftests: mptcp: get stats just before timing out
Recently, some debugging happened around a test that was timing out. The
stats were showing connections being closed which was confusing because
the closing state was caused by the timeout stopping the transfer.
To avoid such confusion, the timeout is no longer done per mptcp_connect
process, but separately. In case of timeout, the stats are now printed,
then the apps are killed.
The stats will still be printed after the kill, but that's fine, and
this might even be useful, just in case. Timeout should be exceptional.
After having started mptcp_connect in listening mode,
'mptcp_lib_wait_local_port_listen' can be used to wait for the listening
socket to be ready.
This is better than using the 'sleep' command, not to pause for a fixed
amount of time, but waiting for an event. This helper is used in all
other MPTCP selftests, but not in these two.
selftests: mptcp: lib: get counters from nstat history
Before, 'nstat' was used to retrieve each individual counter: this means
querying 4 different sources from /proc/net and iterating over 100+
counters each time. Instead, the stats could be retrieved once, and the
output file could be parsed for each counter. Even better, such file is
already present: the nstat history file.
To be able to get this working, the nstat history file also needs to
contain zero counters too, so it is still possible to know if a counter
is missing or set to 0.
This also simplifies mptcp_connect.sh: instead of checking multiple
counters before and after a test to compute the difference, the stats
history files can be reset before each test, and nstat can display only
the difference.
mptcp_lib_get_counter() continues to work when no history file is
available: by fetching nstat directly, like before. This is the case in
diag.sh and userspace_pm.sh where there is no need to save the history
file. This is also the case in mptcp_join.sh, when 'run_tests' is
executed in the background: easier to continue fetching counters than
updating the history each time it is needed.
Note: 'nstat' is called with '-s' in mptcp_lib_nstat_get(), so this
helper can be called multiple times during the test if needed.
In case of errors, dump the stats from history instead of using nstat.
There are multiple advantages to that:
- The same filters from pr_err_stats are used, e.g. the unused 'rate'
column is not displayed.
- The counters are closer to the ones from when the test stopped.
- While at it, the errors can be better presented: error colours, a
small indentation to distinguish the different parts, extra new lines.
Even if it should only happen in rare cases -- internal errors, or netns
issues -- if no history is available, 'nstat' is used like before, just
in case.
With the MPTCP selftests, the nstat daemon is not used. It means that
the last column (the rate) is always 0.0, and that's not something
interesting to display.
These new helpers are easier to read than the long, multi-line
commands. Plus, it will ease the addition of new features related to that
in the next commits.
Eric Dumazet [Fri, 14 Nov 2025 13:51:41 +0000 (13:51 +0000)]
tcp: reduce tcp_comp_sack_slack_ns default value to 10 usec
The current default value of net.ipv4.tcp_comp_sack_slack_ns is too high.
When a flow has many drops (1% or more) and a small RTT, adding 100 usec
before sending a SACK stalls a sender that relies on getting SACKs
fast enough to keep the pipe busy.
Decrease the default to 10 usec.
This is orthogonal to Congestion Control heuristics to determine
if drops are caused by congestion or not.
Heiner Kallweit [Wed, 12 Nov 2025 21:06:13 +0000 (22:06 +0100)]
net: phy: fixed_phy: remove setting supported/advertised modes from fixed_phy_register
This code was added with 34b31da486a5 ("phy: fixed_phy: Set supported
speed in phydev") 10 yrs ago. The commit message of this change
mentions a use case involving callback adjust_link of struct
dsa_switch_driver. This struct doesn't exist any longer, and in general
usage of the legacy fixed PHY has been removed from DSA with the switch
to phylink.
Note: Supported and advertised modes are now set by phy_probe() when
the fixed PHY is attached to the netdev and bound to the genphy driver.
Jakub Kicinski [Sat, 15 Nov 2025 22:55:08 +0000 (14:55 -0800)]
tools: ynltool: remove -lmnl from link flags
The libmnl dependency has been removed from libynl back in
commit 73395b43819b ("tools: ynl: remove the libmnl dependency")
Remove it from the ynltool Makefile.
Mohsin Bashir [Thu, 13 Nov 2025 23:26:10 +0000 (15:26 -0800)]
eth: fbnic: Configure RDE settings for pause frame
fbnic supports pause frames. When pause frames are enabled, the user
presumably expects lossless operation from the NIC. Make sure we configure
RDE (Rx DMA Engine) to DROP_NEVER mode to avoid discards due to delays
in fetching Rx descriptors from the host.
While at it, enable DROP_NEVER when the NIC only has a single queue
configured. In this case the NIC acts as a FIFO, so there's no risk
of head-of-line blocking other queues by making RDE wait. If pause
is disabled, this just moves the packet loss from the DMA engine to
the Rx buffer.
Remove redundant call to fbnic_config_drop_mode_rcq(), introduced by
commit 0cb4c0a13723 ("eth: fbnic: Implement Rx queue
alloc/start/stop/free"). This call does not add value as
fbnic_enable_rcq(), which is called immediately afterward, already
handles this.
Although we do not support autoneg at this time, preserve tx_pause in
.mac_link_up instead of fbnic_phylink_get_pauseparam().
====================
net: mlx: migrate to new get_rx_ring_count ethtool API
This series migrates the mlx4 and mlx5 drivers to use the new
.get_rx_ring_count() callback introduced in commit 84eaf4359c36 ("net:
ethtool: add get_rx_ring_count callback to optimize RX ring queries").
Previously, these drivers handled ETHTOOL_GRXRINGS within the
.get_rxnfc() callback. With the dedicated .get_rx_ring_count() API, this
handling can be extracted and simplified.
For mlx5, this affects both the ethernet and IPoIB drivers. The
ETHTOOL_GRXRINGS handling was previously kept in .get_rxnfc() to support
"ethtool -x" when CONFIG_MLX5_EN_RXNFC=n, but this is no longer
necessary with the new dedicated callback.
Note: The mlx4 changes are compile-tested only, while mlx5 changes were
properly tested.
====================
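Assuming the callback shape introduced by the referenced commit (net_device
in, ring count out), a conversion boils down to something like this sketch
with an invented driver private struct:

  #include <linux/netdevice.h>

  struct foo_priv {
          u32 rx_ring_num; /* however the driver tracks its RX queue count */
  };

  static u32 foo_get_rx_ring_count(struct net_device *dev)
  {
          struct foo_priv *priv = netdev_priv(dev);

          return priv->rx_ring_num; /* replaces the ETHTOOL_GRXRINGS branch */
  }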
Breno Leitao [Thu, 13 Nov 2025 16:46:04 +0000 (08:46 -0800)]
mlx5: extract GRXRINGS from .get_rxnfc
Commit 84eaf4359c36 ("net: ethtool: add get_rx_ring_count callback to
optimize RX ring queries") added specific support for GRXRINGS callback,
simplifying .get_rxnfc.
Remove the handling of GRXRINGS in .get_rxnfc() by moving it to the new
.get_rx_ring_count() for both the mlx5 ethernet and IPoIB drivers.
The ETHTOOL_GRXRINGS handling was previously kept in .get_rxnfc() to
support "ethtool -x" when CONFIG_MLX5_EN_RXNFC=n. With the new
dedicated .get_rx_ring_count() callback, this is no longer necessary.
This simplifies the RX ring count retrieval and aligns mlx5 with the new
ethtool API for querying RX ring parameters.
Breno Leitao [Thu, 13 Nov 2025 16:46:03 +0000 (08:46 -0800)]
mlx4: extract GRXRINGS from .get_rxnfc
Commit 84eaf4359c36 ("net: ethtool: add get_rx_ring_count callback to
optimize RX ring queries") added specific support for GRXRINGS callback,
simplifying .get_rxnfc.
Remove the handling of GRXRINGS in .get_rxnfc() by moving it to the new
.get_rx_ring_count().
This simplifies the RX ring count retrieval and aligns mlx4 with the new
ethtool API for querying RX ring parameters. This is compile-tested
only.
Jakub Kicinski [Sat, 15 Nov 2025 02:55:38 +0000 (18:55 -0800)]
Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Tariq Toukan says:
====================
mlx5-next updates 2025-11-13
The following pull-request contains common mlx5 updates
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Expose definition for 1600Gbps link mode
net/mlx5: fs, set non default device per namespace
net/mlx5: fs, Add other_eswitch support for steering tables
net/mlx5: Add OTHER_ESWITCH HW capabilities
net/mlx5: Add direct ST mode support for RDMA
PCI/TPH: Expose pcie_tph_get_st_table_loc()
{rdma,net}/mlx5: Query vports mac address from device
====================
Rather than defining one xxx_GMAC_PHY_INTF_SEL_xxx() for each mode,
define xxx_GMAC_PHY_INTF_SEL() which takes the phy_intf_sel value.
Pass the appropriate value into these new macros in the set_to_xxx()
methods.
net: stmmac: rk: convert all bitfields to GRF_FIELD*()
Convert all bitfields to GRF_FIELD() or GRF_FIELD_CONST(), which makes
the bitfield values more readable, and also allows the aarch64 compiler
to produce better code.
net: stmmac: rk: replace HIWORD_UPDATE() with GRF_FIELD()
Provide GRF_FIELD() which takes the high/low bit numbers of the field
and field value, generates the mask and passes it to FIELD_PREP_WM16.
Replace all HIWORD_UPDATE() instances with this.
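Based on that description, the macro is presumably little more than a
GENMASK()/FIELD_PREP_WM16() wrapper; a hedged sketch (the real definition
lives in the Rockchip glue and may differ):

  #include <linux/bitfield.h>
  #include <linux/bits.h>

  /* Hiword-mask GRF write: the value goes in the low 16 bits and the
   * write-enable mask in the high 16 bits, both derived from the field's
   * high/low bit numbers. GRF_FIELD_CONST() would be the constant-expression
   * variant of the same idea. */
  #define GRF_FIELD(hi, lo, val) FIELD_PREP_WM16(GENMASK((hi), (lo)), (val))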
Breno Leitao [Wed, 12 Nov 2025 09:50:23 +0000 (01:50 -0800)]
net: bnx2x: convert to use get_rx_ring_count
Convert the bnx2x driver to use the new .get_rx_ring_count ethtool
operation instead of implementing .get_rxnfc solely for handling
ETHTOOL_GRXRINGS command. This simplifies the code by replacing the
switch statement with a direct return of the queue count.
The new callback provides the same functionality in a more direct way,
following the ongoing ethtool API modernization.
Breno Leitao [Thu, 13 Nov 2025 14:23:29 +0000 (06:23 -0800)]
net: ixgbe: convert to use .get_rx_ring_count
Convert the ixgbe driver to use the new .get_rx_ring_count ethtool
operation for handling ETHTOOL_GRXRINGS command. This simplifies the
code by extracting the ring count logic into a dedicated callback.
The new callback provides the same functionality in a more direct way,
following the ongoing ethtool API modernization.
Jakub Kicinski [Thu, 13 Nov 2025 15:27:03 +0000 (07:27 -0800)]
selftests: drv-net: xdp: make the XDP qstats tests less flaky
The XDP qstats tests send 2k packets over a single socket.
Looks like when netdev CI is busy running those tests in QEMU
occasionally flakes. The target doesn't get to run at all
before all 2000 packets are sent.
Lower the number of packets to 1000 and reopen the socket
every 50 packets, to give RSS a chance to spread the packets
to multiple queues.
For the netdev CI testing either lowering the count or using
multiple sockets is enough, but let's do both for extra resiliency.
selftests: drv-net: xdp: Fix register spill error with clang 20
On clang 20.1.8 the XDP program fails to load with a register spill error.
Since hdr_len is a __u32, the compiler decided it only needed the lower
32-bits of ctx->data, which later triggers the register spill verifier
error.
Jakub Kicinski [Thu, 13 Nov 2025 03:17:00 +0000 (19:17 -0800)]
ipv6: clean up routes when manually removing address with a lifetime
When an IPv6 address with a finite lifetime (configured with valid_lft
and preferred_lft) is manually deleted, the kernel does not clean up the
associated prefix route. This results in orphaned routes (marked "proto
kernel") remaining in the routing table even after their corresponding
address has been deleted.
This is particularly problematic on networks using a combination of SLAAC
and bridges.
1. Machine comes up and performs RA on eth0.
2. User creates a bridge
- does an ip -6 addr flush dev eth0;
- adds the eth0 under the bridge.
3. SLAAC happens on br0.
Even though the address has "moved" to br0, there will still be a route
pointing to eth0, but eth0 is not usable for IP any more.
====================
net: phy: mscc: Add support for PHY LED control
This patch series adds support for controlling the PHY LEDs on the
VSC85xx family of PHYs from Microsemi (now part of Renesas).
The first two patches simplify and consolidate existing probe code;
the third patch introduces the LED control functionality.
The LED control feature allows users to configure the LED behavior
based on link activity, speed, and other criteria.
====================
Lad Prabhakar [Wed, 12 Nov 2025 13:57:15 +0000 (13:57 +0000)]
net: phy: mscc: Handle devm_phy_package_join() failure in vsc85xx_probe_common()
devm_phy_package_join() may fail and return a negative error code.
Update vsc85xx_probe_common() to properly handle this failure by
checking the return value and propagating the error to the caller.
Lad Prabhakar [Wed, 12 Nov 2025 13:57:14 +0000 (13:57 +0000)]
net: phy: mscc: Add support for PHY LED control
Add support for the PHY LED controller in the MSCC VSC85xx driver. The
implementation provides LED brightness and hardware control through the
LED subsystem and integrates with the standard 'netdev' trigger.
Introduce new register definitions for the LED behavior register
(MSCC_PHY_LED_BEHAVIOR = 30) and the LED combine disable bits, which
control whether LEDs indicate link-only or combined link and activity
status. Implement a helper, vsc8541_led_combine_disable_set(), to update
these bits safely using phy_modify().
Add support for LED brightness control and hardware mode configuration.
The new callbacks implement the standard LED class operations, allowing
user control through sysfs. The brightness control maps to PHY LED force
on/off modes. The hardware control get and set functions translate
between the PHY-specific LED mode encodings and the LED subsystem
TRIGGER_NETDEV_* rules.
The combine feature is managed automatically based on the selected
rules. When both RX and TX activity are disabled, the combine feature is
turned off, causing LEDs to indicate link-only status. When either RX or
TX activity is enabled, the combine feature remains active and LEDs
indicate combined link and activity.
Register the LED callbacks for all VSC85xx PHY variants so that the LED
subsystem can manage their indicators consistently. Existing device tree
LED configuration and default behavior are preserved.