Meghana Malladi [Tue, 18 Nov 2025 13:55:41 +0000 (19:25 +0530)]
net: ti: icssg-prueth: Add AF_XDP zero copy for RX
Use xsk_pool inside rx_chn to check if a given Rx queue id
is registered for xsk zero copy, which gets populated during
xsk enable.
Update prueth_create_xdp_rxqs to register and support two different
memory models (xsk and page) for a given Rx queue, if registered for
zero copy.
If xsk_pool is registered, allocate buffers from UMEM and map them
to the hardware Rx descriptors. In NAPI context, run the XDP program
for each packet and process the xsk buffer according to the XDP
result codes. Also allocate new set of buffers from UMEM for the
next batch of NAPI Rx processing. Add XDK_WAKEUP_RX support to support
xsk wakeup for Rx.
Move prueth_create_page_pool to prueth_init_rx_chns to avoid freeing
and re-allocating the system memory every time there is a transition
from zero copy to copy and prevents any type of memory fragmentation
or leak.
Meghana Malladi [Tue, 18 Nov 2025 13:55:40 +0000 (19:25 +0530)]
net: ti: icssg-prueth: Make emac_run_xdp function independent of page
emac_run_xdp function runs xdp program, at a given hook point
in the Rx path of the driver in NAPI context and returns
XDP return codes. In zero copy mode the driver receives
packets using UMEM frames instead of pages (native XDP).
Decouple the usage of page in this function.
Meghana Malladi [Tue, 18 Nov 2025 13:55:38 +0000 (19:25 +0530)]
net: ti: icssg-prueth: Add XSK pool helpers
Implement XSK NDOs (setup, wakeup) and create XSK
Rx and Tx queues. xsk_qid stores the queue id for
a given port which has been registered for zero copy
AF_XDP and used to acquire UMEM pointer if registered.
Based on the xsk_qid and the xsk_pool (umem) the driver
is either in copy or zero copy mode. In case of copy mode
the xsk_qid value will be invalid and will be set to valid
queue id when enabling zero copy. To enable zero copy, the
Rx queues are destroyed, i.e., descriptors pushed to fq
and cq are freed to remap them to xdp buffers from the umem.
Meghana Malladi [Tue, 18 Nov 2025 13:55:37 +0000 (19:25 +0530)]
net: ti: icssg-prueth: Add functions to create and destroy Rx/Tx queues
Each port for a given ICSSG instance has their own set of
Tx and Rx queues. Add functions to create and destroy these
queues, which will be further used while performing ndo_bpf
operations to set up XSK Tx/Rx queues for a given port.
In the destroy Rx queue sequence add teardown wait to ensure
that all the descriptors including the TDCM (teardown completion
marker) have been serviced and freed to avoid any sort of descriptor
leaks.
Jiawen Wu [Tue, 18 Nov 2025 08:02:58 +0000 (16:02 +0800)]
net: txgbe: delay to identify modules in .ndo_open
For QSFP modules, there is a possibility that the module cannot be
identified when read I2C immediately in .ndo_open. So just set the flag
WX_FLAG_NEED_MODULE_RESET and do it in the subtask, which always wait
200 ms to identify the module. And this change has no impact on the
original adaptation.
Jiawen Wu [Tue, 18 Nov 2025 08:02:57 +0000 (16:02 +0800)]
net: txgbe: improve functions of AML 40G devices
Support to identify QSFP modules for AML 40G devices. The definition of
GPIO pins follows the design of the QSFP modules, and TXGBE_GPIOBIT_4 is
used for module present.
Meanwhile, implement phylink in XLGMII mode by default, and get the link
state from MAC link.
Jiawen Wu [Tue, 18 Nov 2025 08:02:56 +0000 (16:02 +0800)]
net: txgbe: rename the SFP related
QSFP supported will be introduced for AML 40G devices, the code related
to identify various modules should be renamed to more appropriate names.
And struct txgbe_hic_i2c_read used to get module information is renamed
as struct txgbe_hic_get_module_info, because another SW-FW command to
read I2C will be added later.
====================
net/mlx5: Move notifiers outside the devlink lock
This series by Cosmin moves blocking notifier registration in the mlx5
driver outside the devlink lock during probe.
This is mostly a no-op refactoring that consists of multiple pieces.
It is necessary because upcoming code will introduce a potential locking
cycle between the devlink lock and the blocking notifier head mutexes,
so these notifiers must move out of the devlink-locked critical section.
====================
Cosmin Ratiu [Sun, 16 Nov 2025 20:45:40 +0000 (22:45 +0200)]
net/mlx5: Move SF dev table notifier registration outside the PF devlink lock
This completes the previous patches by moving notifier registration for
SF dev tables outside the devlink locked critical section in
mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() /
mlx5_mdev_uninit() functions.
This is only done for non-SFs, since SFs do not have a SF HW table
themselves.
After this patch, notifiers can grab the PF devlink lock (soon to be
necessary) without creating a locking cycle.
Cosmin Ratiu [Sun, 16 Nov 2025 20:45:39 +0000 (22:45 +0200)]
net/mlx5: Move the SF table notifiers outside the devlink lock
Move the SF table notifiers registration/unregistration outside of
mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() /
mlx5_mdev_uninit() functions.
This is only done for non-SFs, since SFs do not have a SF table
themselves and thus don't need notifiers.
Cosmin Ratiu [Sun, 16 Nov 2025 20:45:38 +0000 (22:45 +0200)]
net/mlx5: Move the SF HW table notifier outside the devlink lock
Move the SF HW table notifier registration/unregistration outside of
mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() /
mlx5_mdev_uninit() functions.
This is only done for non-SFs, since SFs do not have a SF HW table
themselves.
Cosmin Ratiu [Sun, 16 Nov 2025 20:45:37 +0000 (22:45 +0200)]
net/mlx5: Move the vhca event notifier outside of the devlink lock
The vhca event notifier consists of an atomic notifier for vhca state
changes (used for SF events), multiple workqueues and a blocking
notifier chain for delivering the vhca state change events for further
processing.
This patch moves the vhca notifier head outside of mlx5_init_one() /
mlx5_uninit_one() and into the mlx5_mdev_init() / mlx5_mdev_uninit()
functions.
This allows called notifiers to grab the PF devlink lock which was
previously impossible because it would create a circular lock
dependency.
mlx5_vhca_event_stop() is now called earlier in the cleanup phase and
flushes the workqueues to ensure that after the call, there are no
pending events. This simplifies the cleanup flow for vhca event
consumers.
Cosmin Ratiu [Sun, 16 Nov 2025 20:45:36 +0000 (22:45 +0200)]
net/mlx5: Move the esw mode notifier chain outside the devlink lock
The esw mode change notifier chain is initialized/cleaned up in
mlx5_init_one() / mlx5_uninit_one() with the devlink lock held.
Move the notifier head from the eswitch struct into mlx5_priv directly,
and initialize it outside the critical section. This will allow notifier
registration to happen earlier in the init procedure in subsequent
patches.
Cosmin Ratiu [Sun, 16 Nov 2025 20:45:35 +0000 (22:45 +0200)]
net/mlx5: Initialize events outside devlink lock
Move event init/cleanup outside of mlx5_init_one() / mlx5_uninit_one()
and into the mlx5_mdev_init() / mlx5_mdev_uninit() functions.
By doing this, we avoid the events being reinitialized on devlink reload
and, more importantly, the events->sw_nh notifier chain becomes
available earlier in the init procedure, which will be used in
subsequent patches. This makes sense because the events struct is pure
software, independent of any HW details.
====================
net: adjust conservative values around napi
This series keeps at least 96 skbs per cpu and frees 32 skbs at one
time in conclusion. More initial discussions with Eric can be seen at
the link [1].
Jason Xing [Tue, 18 Nov 2025 07:06:46 +0000 (15:06 +0800)]
net: prefetch the next skb in napi_skb_cache_get()
After getting the current skb in napi_skb_cache_get(), the next skb in
cache is highly likely to be used soon, so prefetch would be helpful.
Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20251118070646.61344-5-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason Xing [Tue, 18 Nov 2025 07:06:45 +0000 (15:06 +0800)]
net: use NAPI_SKB_CACHE_FREE to keep 32 as default to do bulk free
- Replace NAPI_SKB_CACHE_HALF with NAPI_SKB_CACHE_FREE
- Only free 32 skbs in napi_skb_cache_put()
Since the first patch adjusting NAPI_SKB_CACHE_SIZE to 128, the number
of packets to be freed in the softirq was increased from 32 to 64.
Considering a subsequent net_rx_action() calling napi_poll() a few
times can easily consume the 64 available slots and we can afford
keeping a higher value of sk_buffs in per-cpu storage, decrease
NAPI_SKB_CACHE_FREE to 32 like before. So now the logic is 1) keeping
96 skbs, 2) freeing 32 skbs at one time.
Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20251118070646.61344-4-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason Xing [Tue, 18 Nov 2025 07:06:44 +0000 (15:06 +0800)]
net: increase default NAPI_SKB_CACHE_BULK to 32
The previous value 16 is a bit conservative, so adjust it along with
NAPI_SKB_CACHE_SIZE, which can minimize triggering memory allocation
in napi_skb_cache_get*().
Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20251118070646.61344-3-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason Xing [Tue, 18 Nov 2025 07:06:43 +0000 (15:06 +0800)]
net: increase default NAPI_SKB_CACHE_SIZE to 128
After commit b61785852ed0 ("net: increase skb_defer_max default to 128")
changed the value sysctl_skb_defer_max to avoid many calls to
kick_defer_list_purge(), the same situation can be applied to
NAPI_SKB_CACHE_SIZE that was proposed in 2016. It's a trade-off between
using pre-allocated memory in skb_cache and saving more a bit heavy
function calls in the softirq context.
With this patch applied, we can have more skbs per-cpu to accelerate the
sending path that needs to acquire new skbs.
Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20251118070646.61344-2-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Disable CLKOUT on RTL8211F(D)(I)-VD-CG
The Realtek RTL8211F(D)(I)-VD-CG is similar to other RTL8211F models in
that the CLKOUT signal can be turned off - a feature requested to reduce
EMI, and implemented via "realtek,clkout-disable" as documented in
Documentation/devicetree/bindings/net/realtek,rtl82xx.yaml.
It is also dissimilar to said PHY models because it has no PHYCR2
register, and disabling CLKOUT is done through some other register.
The strategy adopted in this 6-patch series is to make the PHY driver
not think in terms of "priv->has_phycr2" and "priv->phycr2", but of more
high-level features ("priv->disable_clk_out") while maintaining behaviour.
Then, the logic is extended for the new PHY.
Very loosely based on previous work from Clark Wang, who took a
different approach, to pretend that the RTL8211FVD_CLKOUT_REG is
actually this PHY's PHYCR2.
====================
To simplify the rtl8211f_config_init() control flow and get rid of
"early" returns for PHYs where the PHYCR2 register is absent, move the
entire logic sub-block that deals with disabling PHY-mode EEE to a
separate function. There, it is much more obvious what the early
"return 0" skips, and it becomes more difficult to accidentally skip
unintended stuff.
Previous changes have replaced the machine-level priv->phycr2 with a
high-level priv->disable_clk_out. This created a discrepancy with
priv->phycr1 which is resolved here, for uniformity.
One advantage of this new implementation is that we don't read
priv->phycr1 in rtl821x_probe() if we're never going to modify it.
We never test the positive return code from phy_modify_mmd_changed(), so
we could just as well use phy_modify_mmd().
I took the ALDPS feature description from commit d90db36a9e74 ("net:
phy: realtek: add dt property to enable ALDPS mode") and transformed it
into a function comment - the feature is sufficiently non-obvious to
deserve that.
Vladimir Oltean [Mon, 17 Nov 2025 23:40:31 +0000 (01:40 +0200)]
net: phy: realtek: allow CLKOUT to be disabled on RTL8211F(D)(I)-VD-CG
Add CLKOUT disable support for RTL8211F(D)(I)-VD-CG. Like with other PHY
variants, this feature might be requested by customers when the clock
output is not used, in order to reduce electromagnetic interference (EMI).
In the common driver, the CLKOUT configuration is done through PHYCR2.
The RTL_8211FVD_PHYID is singled out as not having that register, and
execution in rtl8211f_config_init() returns early after commit 2c67301584f2 ("net: phy: realtek: Avoid PHYCR2 access if PHYCR2 not
present").
But actually CLKOUT is configured through a different register for this
PHY. Instead of pretending this is PHYCR2 (which it is not), just add
some code for modifying this register inside the rtl8211f_disable_clk_out()
function, and move that outside the code portion that runs only if
PHYCR2 exists.
In practice this reorders the PHYCR2 writes to disable PHY-mode EEE and
to disable the CLKOUT for the normal RTL8211F variants, but this should
be perfectly fine.
It was not noted that RTL8211F(D)(I)-VD-CG would need a genphy_soft_reset()
call after disabling the CLKOUT. Despite that, we do it out of caution
and for symmetry with the other RTL8211F models.
Co-developed-by: Clark Wang <xiaoning.wang@nxp.com> Signed-off-by: Clark Wang <xiaoning.wang@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251117234033.345679-5-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Mon, 17 Nov 2025 23:40:30 +0000 (01:40 +0200)]
net: phy: realtek: eliminate has_phycr2 variable
This variable is assigned in rtl821x_probe() and used in
rtl8211f_config_init(), which is more complex than it needs to be.
Simply testing the same condition from rtl821x_probe() in
rtl8211f_config_init() yields the same result (the PHY driver ID is a
runtime invariant), but with one temporary variable less.
The RTL8211F(D)(I)-VD-CG PHY also has support for disabling the CLKOUT,
and we'd like to introduce the "realtek,clkout-disable" property for
that.
But it isn't done through the PHYCR2 register, and it becomes awkward to
have the driver pretend that it is. So just replace the machine-level
"u16 phycr2" variable with a logical "bool disable_clk_out", which
scales better to the other PHY as well.
The change is a complete functional equivalent. Before, if the device
tree property was absent, priv->phycr2 would contain the RTL8211F_CLKOUT_EN
bit as read from hardware. Now, we don't save priv->phycr2, but we just
don't call phy_modify_paged() on it. Also, we can simply call
phy_modify_paged() with the "set" argument to 0.
The control flow in rtl8211f_config_init() has some pitfalls which were
probably unintended. Specifically it has an early return:
switch (phydev->interface) {
...
default: /* the rest of the modes imply leaving delay as is. */
return 0;
}
which exits the entire config_init() function. This means it also skips
doing things such as disabling CLKOUT or disabling PHY-mode EEE.
For the RTL8211FS, which uses PHY_INTERFACE_MODE_SGMII, this might be a
problem. However, I don't know that it is, so there is no Fixes: tag.
The issue was observed through code inspection.
Breno Leitao [Tue, 18 Nov 2025 09:44:56 +0000 (01:44 -0800)]
net: vmxnet3: convert to use .get_rx_ring_count
Convert the vmxnet3 driver to use the new .get_rx_ring_count ethtool
operation instead of implementing .get_rxnfc solely for handling
ETHTOOL_GRXRINGS command. This simplifies the code by removing the
switch statement and replacing it with a direct return of the queue
count.
The new callback provides the same functionality in a more direct way,
following the ongoing ethtool API modernization.
Add pre-transmission checks to block SKBs that exceed the hardware's SGE
limit. Force software segmentation for GSO traffic and linearize non-GSO
packets as needed.
Update TX error handling to drop failed SKBs and unmap resources
immediately.
====================
Aditya Garg [Tue, 18 Nov 2025 11:11:08 +0000 (03:11 -0800)]
net: mana: Handle SKB if TX SGEs exceed hardware limit
The MANA hardware supports a maximum of 30 scatter-gather entries (SGEs)
per TX WQE. Exceeding this limit can cause TX failures.
Add ndo_features_check() callback to validate SKB layout before
transmission. For GSO SKBs that would exceed the hardware SGE limit, clear
NETIF_F_GSO_MASK to enforce software segmentation in the stack.
Add a fallback in mana_start_xmit() to linearize non-GSO SKBs that still
exceed the SGE limit.
Also, Add ethtool counter for SKBs linearized
Co-developed-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/1763464269-10431-2-git-send-email-gargaditya@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Anshumali Gaur [Tue, 18 Nov 2025 05:42:34 +0000 (11:12 +0530)]
octeontx2-af: Skip TM tree print for disabled SQs
Currently, the TM tree is printing all SQ topology including those
which are not enabled, this results in redundant output for SQs
which are not active. This patch adds a check in print_tm_tree()
to skip printing the TM tree hierarchy if the SQ is not enabled.
Sjoerd Simons [Sat, 15 Nov 2025 20:58:09 +0000 (21:58 +0100)]
dt-bindings: net: mediatek,net: Correct bindings for MT7981
Different SoCs have different numbers of Wireless Ethernet
Dispatch (WED) units:
- MT7981: Has 1 WED unit
- MT7986: Has 2 WED units
- MT7988: Has 2 WED units
Update the binding to reflect these hardware differences. The MT7981
also uses infracfg for PHY switching, so allow that property.
stmmac_is_jumbo_frm() takes skb->len, which is unsigned int, but the
parameter is passed as an "int" and then tested using signed
comparisons. This can cause bugs. Change the parameter to be unsigned.
Also arrange for it to return a bool.
====================
stmmac_is_jumbo_frm() returns whether the driver considers the frame
size to be a jumbo frame, and thus returns 0/1 values. This is boolean,
so convert it to return a boolean and use false/true instead. Also
convert stmmac_xmit()'s is_jumbo to be bool, which causes several
variables to be repositioned to keep it in reverse Christmas-tree
order.
net: stmmac: stmmac_is_jumbo_frm() len should be unsigned
stmmac_is_jumbo_frm() and the is_jumbo_frm() methods take skb->len
which is an unsigned int. Avoid an implicit cast to "int" via the
method parameter and then incorrectly doing signed comparisons on
this unsigned value.
selftests: fib_tests: add fib6 from ra to static test
The new test checks that a route that has been promoted from RA-learned
to static does not switch back when a new RA message arrives. In
addition, it checks that the route is owned by RA again when the static
address is removed.
When an IPv6 Router Advertisement (RA) is received for a prefix, the
kernel creates the corresponding on-link route with flags RTF_ADDRCONF
and RTF_PREFIX_RT configured and RTF_EXPIRES if lifetime is set.
If later a user configures a static IPv6 address on the same prefix the
kernel clears the RTF_EXPIRES flag but it doesn't clear the RTF_ADDRCONF
and RTF_PREFIX_RT. When the next RA for that prefix is received, the
kernel sees the route as RA-learned and wrongly configures back the
lifetime. This is problematic because if the route expires, the static
address won't have the corresponding on-link route.
This fix clears the RTF_ADDRCONF and RTF_PREFIX_RT flags preventing that
the lifetime is configured when the next RA arrives. If the static
address is deleted, the route becomes RA-learned again.
Fixes: 14ef37b6d00e ("ipv6: fix route lookup in addrconf_prefix_rcv()") Reported-by: Garri Djavadyan <g.djavadyan@gmail.com> Closes: https://lore.kernel.org/netdev/ba807d39aca5b4dcf395cc11dca61a130a52cfd3.camel@gmail.com/ Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20251115095939.6967-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
unix_tot_inflight is a poor metric, only telling the number of
inflight AF_UNXI sockets, and we should use unix_graph_state instead.
Also, if the receiver is catching up with the passed fds, the
sender does not need to schedule GC.
GC only helps unreferenced cyclic SCM_RIGHTS references, and in
such a situation, the malicious sendmsg() will continue to call
wait_for_unix_gc() and hit the UNIX_INFLIGHT_SANE_USER condition.
Let's make only malicious users schedule GC and wait for it to
finish if a cyclic reference exists during the previous GC run.
Then, sane users will pay almost no cost for wait_for_unix_gc().
af_unix: Don't call wait_for_unix_gc() on every sendmsg().
We have been calling wait_for_unix_gc() on every sendmsg() in case
there are too many inflight AF_UNIX sockets.
This is also because the old GC implementation had poor knowledge
of the inflight sockets and had to suspect every sendmsg().
This was improved by commit d9f21b361333 ("af_unix: Try to run GC
async."), but we do not even need to call wait_for_unix_gc() if the
process is not sending AF_UNIX sockets.
The wait_for_unix_gc() call only helps when a malicious process
continues to create cyclic references, and we can detect that
in a better place and slow it down.
Let's move wait_for_unix_gc() to unix_prepare_fpl() that is called
only when AF_UNIX socket fd is passed via SCM_RIGHTS.
af_unix: Don't trigger GC from close() if unnecessary.
We have been triggering GC on every close() if there is even one
inflight AF_UNIX socket.
This is because the old GC implementation had no idea of the graph
shape formed by SCM_RIGHTS references.
The new GC knows whether there could be a cyclic reference or not,
and we can do better.
Let's not trigger GC from close() if there is no cyclic reference
or GC is already in progress.
While at it, unix_gc() is renamed to unix_schedule_gc() as it does
not actually perform GC since commit 8b90a9f819dc ("af_unix: Run
GC on only one CPU.").
GC manages its state by two variables, unix_graph_maybe_cyclic
and unix_graph_grouped, both of which are set to false in the
initial state.
When an AF_UNIX socket is passed to an in-flight AF_UNIX socket,
unix_update_graph() sets unix_graph_maybe_cyclic to true and
unix_graph_grouped to false, making the next GC invocation call
unix_walk_scc() to group SCCs.
Once unix_walk_scc() finishes, sockets in the same SCC are linked
via vertex->scc_entry. Then, unix_graph_grouped is set to true
so that the following GC invocations can skip Tarjan's algorithm
and simply iterate through the list in unix_walk_scc_fast().
In addition, if we know there is at least one cyclic reference,
we set unix_graph_maybe_cyclic to true so that we do not skip GC.
__unix_walk_scc() and unix_walk_scc_fast() call unix_scc_cyclic()
for each SCC to check if it forms a cyclic reference, so that we
can skip GC at the following invocations in case all SCCs do not
have any cycles.
If we count the number of cyclic SCCs in __unix_walk_scc(), we can
simplify unix_walk_scc_fast() because the number of cyclic SCCs
only changes when it garbage-collects a SCC.
So, let's count cyclic SCC in __unix_walk_scc() and decrement it
in unix_walk_scc_fast() when performing garbage collection.
Note that we will use this counter in a later patch to check if a
cycle existed in the previous GC run.
Saeed Mahameed [Mon, 17 Nov 2025 21:42:08 +0000 (23:42 +0200)]
net/mlx5: Abort new commands if all command slots are stalled
In case of a FW issue, FW might be not responding to FW commands,
causing kernel lockout for a long period of time, e.g. rtnl_lock held
while ethtool is trying to collect stats waiting for FW to respond to
multiple commands, when all of them will timeout.
While there's no immediate indication of the FW lockout, we can safely
assume that something is wrong when all command slots are busy and in
a timeout state and no FW completion was received on any of them.
In such case, start immediately failing new commands.
Carolina Jubran [Mon, 17 Nov 2025 21:42:07 +0000 (23:42 +0200)]
net/mlx5: Remove redundant bw_share minimal value assignment
Remove unnecessary logic that sets bw_share to minimal value, when
parent has bw_share configured but nodes don't have min_rate.
This check is redundant because the parent bandwidth acts as the upper
bound regardless, and the firmware always enforces the topmost
bandwidth constraint.
Carolina Jubran [Mon, 17 Nov 2025 21:42:06 +0000 (23:42 +0200)]
net/mlx5e: Recover SQ on excessive PTP TX timestamp delta
Extend the TX timestamp handler to recover the SQ when the difference
between the port and CQE TX timestamps is abnormally large.
The current logic aborts timestamp delivery if the delta exceeds
1/128 seconds, which matches the maximum expected packet interval in
ptp4l. A larger delta makes the timestamps unreliable.
This change adds recovery if the delta exceeds 0.5 seconds. Such a
large gap should not occur in normal operation and indicates that
firmware is stuck or metadata tracking is out of sync, leading to stale
or mismatched timestamps. Recovering the SQ ensures forward progress
and avoids silently dropping invalid timestamps.
The timestamp handler now takes mlx5e_ptpsq directly to access both CQ
stats and the recovery state.
Gal Pressman [Mon, 17 Nov 2025 21:42:05 +0000 (23:42 +0200)]
net/mlx5: Refactor EEPROM query error handling to return status separately
Matthew and Jakub reported [1] issues where inventory automation tools
are calling EEPROM query repeatedly on a port that doesn't have an SFP
connected, resulting in millions of error prints.
Move MCIA register status extraction from the query functions to the
callers, allowing use of extack reporting instead of a dmesg print when
using the netlink API.
Cc: Matthew W Carlis <mattc@purestorage.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763415729-1238421-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hangbin Liu [Mon, 17 Nov 2025 02:44:56 +0000 (02:44 +0000)]
netlink: specs: support ipv4-or-v6 for dual-stack fields
Since commit 1b255e1beabf ("tools: ynl: add ipv4-or-v6 display hint"), we
can display either IPv4 or IPv6 addresses for a single field based on the
address family. However, most dual-stack fields still use the ipv4 display
hint. This update changes them to use the new ipv4-or-v6 display hint and
converts IPv4-only fields to use the u32 type.
Hangbin Liu [Mon, 17 Nov 2025 02:44:55 +0000 (02:44 +0000)]
tools: ynl: Add MAC address parsing support
Add missing support for parsing MAC addresses when display_hint is 'mac'
in the YNL library. This enables YNL CLI to accept MAC address strings
for attributes like lladdr in rt-neigh operations.
Jakub Kicinski [Wed, 19 Nov 2025 02:25:48 +0000 (18:25 -0800)]
Merge branch 'net-expand-napi_skb_cache-use'
Eric Dumazet says:
====================
net: expand napi_skb_cache use
This is a followup of commit e20dfbad8aab ("net: fix napi_consume_skb()
with alien skbs").
Now the per-cpu napi_skb_cache is populated from TX completion path,
we can make use of this cache, especially for cpus not used
from a driver NAPI poll (primary user of napi_cache).
With this series, I consistently reach 130 Mpps on my UDP tx stress test
and reduce SLUB spinlock contention to smaller values.
====================
Eric Dumazet [Sun, 16 Nov 2025 20:27:17 +0000 (20:27 +0000)]
net: use napi_skb_cache even in process context
This is a followup of commit e20dfbad8aab ("net: fix napi_consume_skb()
with alien skbs").
Now the per-cpu napi_skb_cache is populated from TX completion path,
we can make use of this cache, especially for cpus not used
from a driver NAPI poll (primary user of napi_cache).
We can use the napi_skb_cache only if current context is not from hard irq.
With this patch, I consistently reach 130 Mpps on my UDP tx stress test
and reduce SLUB spinlock contention to smaller values.
Note there is still some SLUB contention for skb->head allocations.
I had to tune /sys/kernel/slab/skbuff_small_head/cpu_partial
and /sys/kernel/slab/skbuff_small_head/min_partial depending
on the platform taxonomy.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Tested-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251116202717.1542829-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alok Tiwari [Mon, 17 Nov 2025 09:53:50 +0000 (01:53 -0800)]
net: dsa: ks8995: Fix incorrect OF match table name
The driver declares an OF match table named ks8895_spi_of_match, even
though it describes compatible strings for the KS8995 and related Micrel
switches. This is a leftover typo, the correct name should match the
chip family handled by this driver ks8995, and also match the variable
used in spi_driver.of_match_table.
1) Relax a lock contention bottleneck to improve IPsec crypto
offload performance. From Jianbo Liu.
2) Deprecate pfkey, the interface will be removed in 2027.
3) Update xfrm documentation and move it to ipsec maintainance.
From Bagas Sanjaya.
* tag 'ipsec-next-2025-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
MAINTAINERS: Add entry for XFRM documentation
net: Move XFRM documentation into its own subdirectory
Documentation: xfrm_sync: Number the fifth section
Documentation: xfrm_sysctl: Trim trailing colon in section heading
Documentation: xfrm_sync: Trim excess section heading characters
Documentation: xfrm_sync: Properly reindent list text
Documentation: xfrm_device: Separate hardware offload sublists
Documentation: xfrm_device: Use numbered list for offloading steps
Documentation: xfrm_device: Wrap iproute2 snippets in literal code block
pfkey: Deprecate pfkey
xfrm: Skip redundant replay recheck for the hardware offload path
xfrm: Refactor xfrm_input lock to reduce contention with RSS
====================
====================
gve: Implement XDP HW RX Timestamping support for DQ
From: Tim Hostetler <thostet@google.com>
This patch series adds support for bpf_xdp_metadata_rx_timestamp from an
XDP program loaded into the driver on its own or bound to an XSK. This
is only supported for DQ.
====================
Tim Hostetler [Fri, 14 Nov 2025 21:11:46 +0000 (13:11 -0800)]
gve: Add Rx HWTS metadata to AF_XDP ZC mode
By overlaying the struct gve_xdp_buff on top of the struct xdp_buff_xsk
that AF_XDP utilizes, the driver records the 32 bit timestamp via the
completion descriptor and the cached 64 bit NIC timestamp via gve_priv.
The driver's implementation of xmo_rx_timestamp extends the timestamp to
the full and up to date 64 bit timestamp and returns it to the user.
gve_rx_xsk_dqo is modified to accept a pointer to the completion
descriptor and no longer takes a buf_len explicitly as it can be pulled
out of the descriptor.
With this patch gve now supports bpf_xdp_metadata_rx_timestamp.
Signed-off-by: Tim Hostetler <thostet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251114211146.292068-5-joshwash@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tim Hostetler [Fri, 14 Nov 2025 21:11:45 +0000 (13:11 -0800)]
gve: Prepare bpf_xdp_metadata_rx_timestamp support
Support populating XDP RX metadata with hardware RX timestamps. This
patch utilizes the same underlying logic to calculate hardware
timestamps as the regular RX path.
xdp_metadata_ops is registered with the net_device in a future patch.
gve_rx_calculate_hwtstamp was pulled out so as to not duplicate logic
between gve_xdp_rx_timestamp and gve_rx_hwtstamp.
Signed-off-by: Tim Hostetler <thostet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251114211146.292068-4-joshwash@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tim Hostetler [Fri, 14 Nov 2025 21:11:44 +0000 (13:11 -0800)]
gve: Wrap struct xdp_buff
RX timestamping will need to keep track of extra temporary information
per-packet. In preparation for this, introduce gve_xdp_buff to wrap the
xdp_buff. This is similar in function to stmmac_xdp_buff and
ice_xdp_buff.
Signed-off-by: Tim Hostetler <thostet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251114211146.292068-3-joshwash@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tim Hostetler [Fri, 14 Nov 2025 21:11:43 +0000 (13:11 -0800)]
gve: Move ptp_schedule_worker to gve_init_clock
Previously, gve had been only initializing ptp aux work when
hardware timestamping was initialized through ndo_hwtsatmp_set. As this
patch series introduces XDP hardware timestamp metadata which will
require the ptp aux work, the work can't be gated on the
kernel_hwtstamp_config being set and must be initialized elsewhere.
For simplicity, ptp_schedule_worker is invoked right after the ptp_clock
is registered with the kernel (which happens during gve_probe or
following reset). The worker is scheduled in GVE_NIC_TS_SYNC_INTERVAL_MS
as the synchronous call to gve_clock_nic_ts_read makes the worker
redundant if scheduled immediately.
If gve cannot read the device clock immediately, it errors out of
gve_init_clock.
Signed-off-by: Tim Hostetler <thostet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251114211146.292068-2-joshwash@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The lan8814 supports two interfaces towards the host (QSGMII and QUSGMII).
Currently the lan8814 disables the auto-negotiation towards the host
side. So, extend this to allow to configure to use in-band
auto-negotiation.
I have tested this only with the QSGMII interface.
Sunday Adelodun [Thu, 13 Nov 2025 11:28:02 +0000 (12:28 +0100)]
selftests: af_unix: Add tests for ECONNRESET and EOF semantics
Add selftests to verify and document Linux’s intended behaviour for
UNIX domain sockets (SOCK_STREAM and SOCK_DGRAM) when a peer closes.
The tests verify that:
1. SOCK_STREAM returns EOF when the peer closes normally.
2. SOCK_STREAM returns ECONNRESET if the peer closes with unread data.
3. SOCK_SEQPACKET returns EOF when the peer closes normally.
4. SOCK_SEQPACKET returns ECONNRESET if the peer closes with unread data.
5. SOCK_DGRAM does not return ECONNRESET when the peer closes.
This follows up on review feedback suggesting a selftest to clarify
Linux’s semantics.
====================
net: stmmac: Disable EEE RX clock stop when VLAN is enabled
This series fixes a couple of VLAN issues observed on the Renesas RZ/V2H
EVK platform (stmmac + Microchip KSZ9131RNXI PHY):
- The first patch fixes a bug where VLAN ID 0 would not be properly removed
due to how vlan_del_hw_rx_fltr() matched entries in the VLAN filter table.
- The second patch addresses RX clock gating issues that occur during VLAN
creation and deletion when EEE is enabled with RX clock-stop active (the
default configuration). For example:
# ip link add link end1 name end1.5 type vlan id 5 15c40000.ethernet end1: Timeout accessing MAC_VLAN_Tag_Filter
RTNETLINK answers: Device or resource busy
The stmmac hardware requires the receive clock to be running when writing
certain registers, including VLAN registers. However, by default the driver
enables Energy Efficient Ethernet (EEE) and allows the PHY to stop the
receive clock when the link is idle. As a result, the RX clock might be
stopped when attempting to access these registers, leading to timeouts.
A more comprehensive overview of receive clock related issues in the
stmmac driver can be found here:
https://lore.kernel.org/all/Z9ySeo61VYTClIJJ@shell.armlinux.org.uk/
Most of the issues were resolved by commit dd557266cf5fb ("net: stmmac:
block PHY RXC clock-stop"), which wraps register accesses with
phylink_rx_clk_stop_block()/unblock() calls. However, VLAN add/delete
operations are invoked with bottom halves disabled, where sleeping is
not permitted, so those helpers cannot be used.
To avoid these VLAN timeouts, the second patch disables the EEE RX
clock-stop feature when VLAN support is enabled. This ensures the receive
clock remains active, allowing VLAN operations to complete successfully.
====================
Ovidiu Panait [Thu, 13 Nov 2025 11:27:21 +0000 (11:27 +0000)]
net: stmmac: Disable EEE RX clock stop when VLAN is enabled
On the Renesas RZ/V2H EVK platform, where the stmmac MAC is connected to a
Microchip KSZ9131RNXI PHY, creating or deleting VLAN interfaces may fail
with timeouts:
# ip link add link end1 name end1.5 type vlan id 5 15c40000.ethernet end1: Timeout accessing MAC_VLAN_Tag_Filter
RTNETLINK answers: Device or resource busy
Disabling EEE at runtime avoids the problem:
# ethtool --set-eee end1 eee off
# ip link add link end1 name end1.5 type vlan id 5
# ip link del end1.5
The stmmac hardware requires the receive clock to be running when writing
certain registers, such as those used for MAC address configuration or
VLAN filtering. However, by default the driver enables Energy Efficient
Ethernet (EEE) and allows the PHY to stop the receive clock when the link
is idle. As a result, the RX clock might be stopped when attempting to
access these registers, leading to timeouts and other issues.
Commit dd557266cf5fb ("net: stmmac: block PHY RXC clock-stop")
addressed this issue for most register accesses by wrapping them in
phylink_rx_clk_stop_block()/phylink_rx_clk_stop_unblock() calls.
However, VLAN add/delete operations may be invoked with bottom halves
disabled, where sleeping is not allowed, so using these helpers is not
possible.
Therefore, to fix this, disable the RX clock stop feature in the phylink
configuration if VLAN features are set. This ensures the RX clock remains
active and register accesses succeed during VLAN operations.
Ovidiu Panait [Thu, 13 Nov 2025 11:27:20 +0000 (11:27 +0000)]
net: stmmac: Fix VLAN 0 deletion in vlan_del_hw_rx_fltr()
When the "rx-vlan-filter" feature is enabled on a network device, the 8021q
module automatically adds a VLAN 0 hardware filter when the device is
brought administratively up.
For stmmac, this causes vlan_add_hw_rx_fltr() to create a new entry for
VID 0 in the mac_device_info->vlan_filter array, in the following format:
VLAN_TAG_DATA_ETV | VLAN_TAG_DATA_VEN | vid
Here, VLAN_TAG_DATA_VEN indicates that the hardware filter is enabled for
that VID.
However, on the delete path, vlan_del_hw_rx_fltr() searches the vlan_filter
array by VID only, without verifying whether a VLAN entry is enabled. As a
result, when the 8021q module attempts to remove VLAN 0, the function may
mistakenly match a zero-initialized slot rather than the actual VLAN 0
entry, causing incorrect deletions and leaving stale entries in the
hardware table.
Fix this by verifying that the VLAN entry's enable bit (VLAN_TAG_DATA_VEN)
is set before matching and deleting by VID. This ensures only active VLAN
entries are removed and avoids leaving stale entries in the VLAN filter
table, particularly for VLAN ID 0.
Fixes: ed64639bc1e08 ("net: stmmac: Add support for VLAN Rx filtering") Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20251113112721.70500-2-ovidiu.panait.rb@renesas.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
dpll: zl3073x: Refactor state management
This patch set is a refactoring of the zl3073x driver to clean up
state management, improve modularity, and significantly reduce
on-demand I/O.
The driver's dpll.c implementation previously performed on-demand
register reads and writes (wrapped in mailbox operations) to get
or set properties like frequency, phase, and embedded-sync settings.
This cluttered the DPLL logic with low-level I/O, duplicated locking,
and led to inefficient bus traffic.
This series addresses this by:
1. Splitting the monolithic 'core.c' into logical units ('ref.c',
'out.c', 'synth.c').
2. Implementing a full read/write-back cache for 'zl3073x_ref' and
'zl3073x_out' structures.
All state is now read once during '_state_fetch()' (and status updated
periodically). DPLL get callbacks read from this cache. Set callbacks
modify a copy of the state, which is then committed via a new
'..._state_set()' function. These '_state_set' functions compare
the new state to the cached state and write *only* the modified
register values back to the hardware, all within a single mailbox
sequence.
The result is a much cleaner 'dpll.c' that is almost entirely
free of direct register I/O, and all state logic is properly
encapsulated in its respective file.
The series is broken down as follows:
* Patch 1: Changes the state structs to store raw register values
(e.g., 'config', 'ctrl') instead of parsed booleans, centralizing
parsing logic into the helpers.
* Patch 2: Splits the logic from 'core.c' into new 'ref.c', 'out.c'
and 'synth.c' files, creating a 'zl3073x_dev_...' abstraction layer.
* Patch 3: Introduces the caching concept by reading and caching
the reference monitor status periodically, removing scattered
reads from 'dpll.c'.
* Patch 4: Expands the 'zl3073x_ref' struct to cache *all* reference
properties and adds 'zl3073x_ref_state_set()' to write back changes.
* Patch 5: Does the same for the 'zl3073x_out' struct, caching all
output properties and adding 'zl3073x_out_state_set()'.
* Patch 6: A final cleanup that removes the 'zl3073x_dev_...' wrapper
functions that became redundant after the refactoring.
====================
Ivan Vecera [Thu, 13 Nov 2025 07:41:04 +0000 (08:41 +0100)]
dpll: zl3073x: Cache all output properties in zl3073x_out
Expand the zl3073x_out structure to cache all output-related
hardware registers, including divisors, widths, embedded-sync
parameters and phase compensation.
Modify zl3073x_out_state_fetch() to read and populate all these
new fields at once, including zero-divisor checks. Refactor all
dpll "getter" functions in dpll.c to read from this new
cached state instead of performing direct register access.
Introduce a new function, zl3073x_out_state_set(), to handle
writing changes back to the hardware. This function compares the
provided state with the current cached state and writes *only* the
modified register values via a single mailbox sequence before
updating the local cache.
Refactor all dpll "setter" functions to modify a local copy of
the output state and then call zl3073x_out_state_set() to
commit the changes.
This change centralizes all output-related register I/O into
out.c, significantly reduces bus traffic, and simplifies the logic
in dpll.c.
Reviewed-by: Petr Oros <poros@redhat.com> Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20251113074105.141379-6-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Thu, 13 Nov 2025 07:41:03 +0000 (08:41 +0100)]
dpll: zl3073x: Cache all reference properties in zl3073x_ref
Expand the zl3073x_ref structure to cache all reference-related
hardware registers, including frequency components, embedded-sync
settings and phase compensation. Previously, these registers were
read on-demand from various functions in dpll.c leading to frequent
mailbox operations.
Modify zl3073x_ref_state_fetch() to read and populate all these new
fields at once. Refactor all "getter" functions in dpll.c to read
from this new cached state instead of performing direct register
access.
Remove the standalone zl3073x_dpll_input_ref_frequency_get() helper,
as its functionality is now replaced by zl3073x_ref_freq_get() which
operates on the cached state and add a corresponding zl3073x_dev_...
wrapper.
Introduce a new function, zl3073x_ref_state_set(), to handle
writing changes back to the hardware. This function compares the
provided state with the current cached state and writes *only* the
modified register values to the device via a single mailbox sequence
before updating the local cache.
Refactor all dpll "setter" functions to modify a local copy of the
ref state and then call zl3073x_ref_state_set() to commit the changes.
As a cleanup, update callers in dpll.c that already have
a struct zl3073x_ref * to use the direct helpers instead of the
zl3073x_dev_... wrappers.
This change centralizes all reference-related register I/O into ref.c,
significantly reduces bus traffic, and simplifies the logic in dpll.c.
Reviewed-by: Petr Oros <poros@redhat.com> Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20251113074105.141379-5-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Thu, 13 Nov 2025 07:41:02 +0000 (08:41 +0100)]
dpll: zl3073x: Cache reference monitor status
Instead of reading the ZL_REG_REF_MON_STATUS register every time
the reference status is needed, cache this value in the zl3073x_ref
struct.
This is achieved by:
* Adding a mon_status field to struct zl3073x_ref
* Introducing zl3073x_dev_ref_status_update() to read the status for
all references into this new cache field
* Calling this update function from the periodic work handler
* Adding zl3073x_ref_is_status_ok() and zl3073x_dev_ref_is_status_ok()
helpers to check the cached value
* Refactoring all callers in dpll.c to use the new
zl3073x_dev_ref_is_status_ok() helper, removing direct register reads
This change consolidates all status register reads into a single periodic
function and reduces I/O bus traffic in dpll callbacks.
Reviewed-by: Petr Oros <poros@redhat.com> Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20251113074105.141379-4-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Thu, 13 Nov 2025 07:41:01 +0000 (08:41 +0100)]
dpll: zl3073x: Split ref, out, and synth logic from core
Refactor the zl3073x driver by splitting the logic for input
references, outputs and synthesizers out of the monolithic
core.[ch] files.
Move the logic for each functional block into its own dedicated files:
ref.[ch], out.[ch] and synth.[ch].
Specifically:
- Move state structures (zl3073x_ref, zl3073x_out, zl3073x_synth)
from core.h into their respective new headers
- Move state-fetching functions (..._state_fetch) from core.c to their
new .c files
- Move the zl3073x_ref_freq_factorize helper from core.c to ref.c
- Introduce a new helper layer to decouple the core device logic from
the state-parsing logic:
1. Move the original inline helpers (e.g., zl3073x_ref_is_enabled)
to the new headers (ref.h, etc.) and make them operate on a
const struct ... * pointer.
2. Create new zl3073x_dev_... prefixed functions in core.h
(e.g., zl3073x_dev_ref_is_enabled) and Implement these _dev_ functions
to fetch state using a new ..._state_get() helper and then call
the non-prefixed helper.
3. Update all driver-internal callers (in dpll.c, prop.c, etc.) to use
the new zl3073x_dev_... functions.
Reviewed-by: Petr Oros <poros@redhat.com> Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20251113074105.141379-3-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Thu, 13 Nov 2025 07:41:00 +0000 (08:41 +0100)]
dpll: zl3073x: Store raw register values instead of parsed state
The zl3073x_ref, zl3073x_out and zl3073x_synth structures
previously stored state that was parsed from register reads. This
included values like boolean 'enabled' flags, synthesizer selections,
and pre-calculated frequencies.
This commit refactors the state management to store the raw register
values directly in these structures. The various inline helper functions
are updated to parse these raw values on-demand using FIELD_GET.
Reviewed-by: Petr Oros <poros@redhat.com> Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20251113074105.141379-2-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
s390/qeth: Improve handling of OSA RCs
This two patch series aims to improve how return codes from OSA Express
are handled in the qeth driver.
OSA defines a number of return codes whose meaning is determined by the
issuing command, ie. they are ambiguous. The first patch moves
definitions of all return codes including the ambiguous ones to a single
enum block to aid readability and maintainability.
The second patch implements a mechanism to interpret return codes based
on the issuing command to ensure accurate debug messages. While at it,
remove extern keyword and fix indentation for function declarations to
be in line with Linux kernel coding style.
====================
Aswin Karuvally [Thu, 13 Nov 2025 14:42:09 +0000 (15:42 +0100)]
s390/qeth: Handle ambiguous OSA RCs in s390dbf
OSA Express defines a number of return codes whose meaning is determined
by the issuing command, making them ambiguous. The important ones are
reported as debug messages through the s390 debug feature.
The qeth driver currently does not take the issuing command into account
when interpreting the return code which sometimes leads to incorrect
debug messages.
Implement a mechanism to interpret and report these return codes
properly. While at it, remove extern keyword and fix indentation for
function declarations to be in line with Linux kernel coding style.
Suggested-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Aswin Karuvally <aswin@linux.ibm.com> Link: https://patch.msgid.link/20251113144209.2140061-3-aswin@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Aswin Karuvally [Thu, 13 Nov 2025 14:42:08 +0000 (15:42 +0100)]
s390/qeth: Move all OSA RCs to single enum
OSA Express defines a number of return codes whose meaning is
determined by the issuing command, making them ambiguous. Move
definitions of all return codes including the ambiguous ones to a single
enum block to aid readability and maintainability.
Heiner Kallweit [Thu, 13 Nov 2025 21:09:08 +0000 (22:09 +0100)]
r8169: bail out from probe if fiber mode is detected on RTL8127AF
It was reported that on a card with RTL8127AF (SFP + DAC) link-up isn't
detected. Realtek hides the SFP behind the internal PHY, which isn't
behaving fully compliance with clause 22 any longer in fiber mode.
Due to not having access to chip documentation there isn't much I can
do for now. Instead of silently failing to detect link-up in fiber mode,
inform the user that fiber mode isn't support and bail out.
The logic to detect fiber mode is borrowed from Realtek's r8127 driver.
Inochi Amaoto [Fri, 14 Nov 2025 00:38:04 +0000 (08:38 +0800)]
net: phy: Add helper for fixing RGMII PHY mode based on internal mac delay
The "phy-mode" property of devicetree indicates whether the PCB has
delay now, which means the mac needs to modify the PHY mode based
on whether there is an internal delay in the mac.
This modification is similar for many ethernet drivers. To simplify
code, define the helper phy_fix_phy_mode_for_mac_delays(speed, mac_txid,
mac_rxid) to fix PHY mode based on whether mac adds internal delay.
Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk> Signed-off-by: Inochi Amaoto <inochiama@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251114003805.494387-3-inochiama@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Restructure mana_query_gf_stats() to operate on the per-VF mana_context,
instead of per-port statistics. Introduce mana_ethtool_hc_stats to
isolate hardware counter statistics and update the
"ethtool -S <interface>" output to expose all relevant counters while
preserving backward compatibility.
Add support for the standard rx_missed_errors counter by mapping it
to the hardware's hc_rx_discards_no_wqe metric. Refresh statistics
every 2 seconds.
====================
Report standard counter stats->rx_missed_errors
using hc_rx_discards_no_wqe from the hardware.
Add a global workqueue to periodically run
mana_query_gf_stats every 2 seconds to get the latest
info in eth_stats and define a driver capability flag
to notify hardware of the periodic queries.
To avoid repeated failures and log flooding, the workqueue
is not rescheduled if mana_query_gf_stats fails on HWC timeout
error and the stats are reset to 0. Other errors are transient
which will not need a VF reset for recovery.
net: mana: Move hardware counter stats from per-port to per-VF context
Move hardware counter (HC) statistics from mana_port_context to
mana_context to enable sharing stats across multiple network ports
on the same MANA VF. Previously, each network port queried
hardware counters independently using MANA_QUERY_GF_STAT command
(GF = Generic Function stats from GDMA hardware), resulting in
redundant queries when multiple ports existed on the same device.
Isolate hardware counter stats by introducing mana_ethtool_hc_stats
in mana_context and update the code to ensure all stats are properly
reported via ethtool -S <interface>, maintaining consistency with
previous behavior.
PCI drivers explicitly set .pkt_route to zero. However, as the struct
is allocated using devm_kzalloc(), all members default to zero unless
explicitly initialised. Thus, explicitly setting these to zero is
unnecessary. Remove these. This leaves only stmmac_platform.c where
this is explicitly initialised depending on DT properties.
stmmac_platform.c explicitly sets .prio to zero if the snps,priority
property is not present in DT for the queue. However, as the struct
is allocated using devm_kzalloc(), all members default to zero unless
explicitly initialised, and of_property_read_u32() will not write to
its argument if the property is not found. Thus, explicitly setting
these to zero is unnecessary. Remove these.