git.ipfire.org Git - thirdparty/kernel/linux.git/log

net: ti: icssg-prueth: Add AF_XDP zero copy for RX

Use xsk_pool inside rx_chn to check if a given Rx queue id
is registered for xsk zero copy, which gets populated during
xsk enable.

Update prueth_create_xdp_rxqs to register and support two different
memory models (xsk and page) for a given Rx queue, if registered for
zero copy.

If xsk_pool is registered, allocate buffers from UMEM and map them
to the hardware Rx descriptors. In NAPI context, run the XDP program
for each packet and process the xsk buffer according to the XDP
result codes. Also allocate new set of buffers from UMEM for the
next batch of NAPI Rx processing. Add XDK_WAKEUP_RX support to support
xsk wakeup for Rx.

Move prueth_create_page_pool to prueth_init_rx_chns to avoid freeing
and re-allocating the system memory every time there is a transition
from zero copy to copy and prevents any type of memory fragmentation
or leak.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Link: https://patch.msgid.link/20251118135542.380574-6-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ti: icssg-prueth: Make emac_run_xdp function independent of page

emac_run_xdp function runs xdp program, at a given hook point
in the Rx path of the driver in NAPI context and returns
XDP return codes. In zero copy mode the driver receives
packets using UMEM frames instead of pages (native XDP).
Decouple the usage of page in this function.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Link: https://patch.msgid.link/20251118135542.380574-5-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ti: icssg-prueth: Add AF_XDP zero copy for TX

Use xsk_pool inside tx_chn to check if a given Tx queue id
is registered for xsk zero copy, which gets populated during
xsk enable

If xsk_pool is set, get frames from the pool in NAPI
context and submit them to the Tx channel. Tx completion
is also handled in the NAPI context.

Use PRUETH_SWDATA_XSK to recycle xsk buffers back to the
umem pool. Add XDP_WAKEUP_TX support to enable xsk_wakeup
for Tx.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Link: https://patch.msgid.link/20251118135542.380574-4-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ti: icssg-prueth: Add XSK pool helpers

Implement XSK NDOs (setup, wakeup) and create XSK
Rx and Tx queues. xsk_qid stores the queue id for
a given port which has been registered for zero copy
AF_XDP and used to acquire UMEM pointer if registered.

Based on the xsk_qid and the xsk_pool (umem) the driver
is either in copy or zero copy mode. In case of copy mode
the xsk_qid value will be invalid and will be set to valid
queue id when enabling zero copy. To enable zero copy, the
Rx queues are destroyed, i.e., descriptors pushed to fq
and cq are freed to remap them to xdp buffers from the umem.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Link: https://patch.msgid.link/20251118135542.380574-3-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ti: icssg-prueth: Add functions to create and destroy Rx/Tx queues

Each port for a given ICSSG instance has their own set of
Tx and Rx queues. Add functions to create and destroy these
queues, which will be further used while performing ndo_bpf
operations to set up XSK Tx/Rx queues for a given port.

In the destroy Rx queue sequence add teardown wait to ensure
that all the descriptors including the TDCM (teardown completion
marker) have been serviced and freed to avoid any sort of descriptor
leaks.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Link: https://patch.msgid.link/20251118135542.380574-2-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'txgbe-support-more-modules'

Jiawen Wu says:

====================
TXGBE support more modules

Support CR modules for 25G devices and QSFP modules for 40G devices. And
implement .get_module_eeprom_by_page() to get module info.

v1: https://lore.kernel.org/all/20251112055841.22984-1-jiawenwu@trustnetic.com/
====================

Link: https://patch.msgid.link/20251118080259.24676-1-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: txgbe: support getting module EEPROM by page

Getting module EEPROM has been supported in TXGBE SP devices, since SFP
driver has already implemented it.

Now add support to read module EEPROM for AML devices. Towards this, add
a new firmware mailbox command to get the page data.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20251118080259.24676-6-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: txgbe: delay to identify modules in .ndo_open

For QSFP modules, there is a possibility that the module cannot be
identified when read I2C immediately in .ndo_open. So just set the flag
WX_FLAG_NEED_MODULE_RESET and do it in the subtask, which always wait
200 ms to identify the module. And this change has no impact on the
original adaptation.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20251118080259.24676-5-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: txgbe: improve functions of AML 40G devices

Support to identify QSFP modules for AML 40G devices. The definition of
GPIO pins follows the design of the QSFP modules, and TXGBE_GPIOBIT_4 is
used for module present.

Meanwhile, implement phylink in XLGMII mode by default, and get the link
state from MAC link.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20251118080259.24676-4-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: txgbe: rename the SFP related

QSFP supported will be introduced for AML 40G devices, the code related
to identify various modules should be renamed to more appropriate names.

And struct txgbe_hic_i2c_read used to get module information is renamed
as struct txgbe_hic_get_module_info, because another SW-FW command to
read I2C will be added later.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20251118080259.24676-3-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: txgbe: support CR modules for AML devices

Support to identify 25G/10G CR modules for AML devices. Autoneg is
enbaled by default in CR mode.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20251118080259.24676-2-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-mlx5-move-notifiers-outside-the-devlink-lock'

Tariq Toukan says:

====================
net/mlx5: Move notifiers outside the devlink lock

This series by Cosmin moves blocking notifier registration in the mlx5
driver outside the devlink lock during probe.

This is mostly a no-op refactoring that consists of multiple pieces.
It is necessary because upcoming code will introduce a potential locking
cycle between the devlink lock and the blocking notifier head mutexes,
so these notifiers must move out of the devlink-locked critical section.
====================

Link: https://patch.msgid.link/1763325940-1231508-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Move SF dev table notifier registration outside the PF devlink lock

This completes the previous patches by moving notifier registration for
SF dev tables outside the devlink locked critical section in
mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() /
mlx5_mdev_uninit() functions.

This is only done for non-SFs, since SFs do not have a SF HW table
themselves.

After this patch, notifiers can grab the PF devlink lock (soon to be
necessary) without creating a locking cycle.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763325940-1231508-7-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Move the SF table notifiers outside the devlink lock

Move the SF table notifiers registration/unregistration outside of
mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() /
mlx5_mdev_uninit() functions.

This is only done for non-SFs, since SFs do not have a SF table
themselves and thus don't need notifiers.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763325940-1231508-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Move the SF HW table notifier outside the devlink lock

Move the SF HW table notifier registration/unregistration outside of
mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() /
mlx5_mdev_uninit() functions.

This is only done for non-SFs, since SFs do not have a SF HW table
themselves.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763325940-1231508-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Move the vhca event notifier outside of the devlink lock

The vhca event notifier consists of an atomic notifier for vhca state
changes (used for SF events), multiple workqueues and a blocking
notifier chain for delivering the vhca state change events for further
processing.

This patch moves the vhca notifier head outside of mlx5_init_one() /
mlx5_uninit_one() and into the mlx5_mdev_init() / mlx5_mdev_uninit()
functions.

This allows called notifiers to grab the PF devlink lock which was
previously impossible because it would create a circular lock
dependency.

mlx5_vhca_event_stop() is now called earlier in the cleanup phase and
flushes the workqueues to ensure that after the call, there are no
pending events. This simplifies the cleanup flow for vhca event
consumers.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763325940-1231508-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Move the esw mode notifier chain outside the devlink lock

The esw mode change notifier chain is initialized/cleaned up in
mlx5_init_one() / mlx5_uninit_one() with the devlink lock held.

Move the notifier head from the eswitch struct into mlx5_priv directly,
and initialize it outside the critical section. This will allow notifier
registration to happen earlier in the init procedure in subsequent
patches.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763325940-1231508-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Initialize events outside devlink lock

Move event init/cleanup outside of mlx5_init_one() / mlx5_uninit_one()
and into the mlx5_mdev_init() / mlx5_mdev_uninit() functions.

By doing this, we avoid the events being reinitialized on devlink reload
and, more importantly, the events->sw_nh notifier chain becomes
available earlier in the init procedure, which will be used in
subsequent patches. This makes sense because the events struct is pure
software, independent of any HW details.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763325940-1231508-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-adjust-conservative-values-around-napi'

Jason Xing says:

====================
net: adjust conservative values around napi

This series keeps at least 96 skbs per cpu and frees 32 skbs at one
time in conclusion. More initial discussions with Eric can be seen at
the link [1].

[1]: https://lore.kernel.org/all/CAL+tcoBEEjO=-yvE7ZJ4sB2smVBzUht1gJN85CenJhOKV====================

Link: https://patch.msgid.link/20251118070646.61344-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: prefetch the next skb in napi_skb_cache_get()

After getting the current skb in napi_skb_cache_get(), the next skb in
cache is highly likely to be used soon, so prefetch would be helpful.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20251118070646.61344-5-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: use NAPI_SKB_CACHE_FREE to keep 32 as default to do bulk free

- Replace NAPI_SKB_CACHE_HALF with NAPI_SKB_CACHE_FREE
- Only free 32 skbs in napi_skb_cache_put()

Since the first patch adjusting NAPI_SKB_CACHE_SIZE to 128, the number
of packets to be freed in the softirq was increased from 32 to 64.
Considering a subsequent net_rx_action() calling napi_poll() a few
times can easily consume the 64 available slots and we can afford
keeping a higher value of sk_buffs in per-cpu storage, decrease
NAPI_SKB_CACHE_FREE to 32 like before. So now the logic is 1) keeping
96 skbs, 2) freeing 32 skbs at one time.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20251118070646.61344-4-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: increase default NAPI_SKB_CACHE_BULK to 32

The previous value 16 is a bit conservative, so adjust it along with
NAPI_SKB_CACHE_SIZE, which can minimize triggering memory allocation
in napi_skb_cache_get*().

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20251118070646.61344-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: increase default NAPI_SKB_CACHE_SIZE to 128

After commit b61785852ed0 ("net: increase skb_defer_max default to 128")
changed the value sysctl_skb_defer_max to avoid many calls to
kick_defer_list_purge(), the same situation can be applied to
NAPI_SKB_CACHE_SIZE that was proposed in 2016. It's a trade-off between
using pre-allocated memory in skb_cache and saving more a bit heavy
function calls in the softirq context.

With this patch applied, we can have more skbs per-cpu to accelerate the
sending path that needs to acquire new skbs.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20251118070646.61344-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'disable-clkout-on-rtl8211f-d-i-vd-cg'

Vladimir Oltean says:

====================
Disable CLKOUT on RTL8211F(D)(I)-VD-CG

The Realtek RTL8211F(D)(I)-VD-CG is similar to other RTL8211F models in
that the CLKOUT signal can be turned off - a feature requested to reduce
EMI, and implemented via "realtek,clkout-disable" as documented in
Documentation/devicetree/bindings/net/realtek,rtl82xx.yaml.

It is also dissimilar to said PHY models because it has no PHYCR2
register, and disabling CLKOUT is done through some other register.

The strategy adopted in this 6-patch series is to make the PHY driver
not think in terms of "priv->has_phycr2" and "priv->phycr2", but of more
high-level features ("priv->disable_clk_out") while maintaining behaviour.
Then, the logic is extended for the new PHY.

Very loosely based on previous work from Clark Wang, who took a
different approach, to pretend that the RTL8211FVD_CLKOUT_REG is
actually this PHY's PHYCR2.
====================

Link: https://patch.msgid.link/20251117234033.345679-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: create rtl8211f_config_phy_eee() helper

To simplify the rtl8211f_config_init() control flow and get rid of
"early" returns for PHYs where the PHYCR2 register is absent, move the
entire logic sub-block that deals with disabling PHY-mode EEE to a
separate function. There, it is much more obvious what the early
"return 0" skips, and it becomes more difficult to accidentally skip
unintended stuff.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251117234033.345679-7-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: eliminate priv->phycr1 variable

Previous changes have replaced the machine-level priv->phycr2 with a
high-level priv->disable_clk_out. This created a discrepancy with
priv->phycr1 which is resolved here, for uniformity.

One advantage of this new implementation is that we don't read
priv->phycr1 in rtl821x_probe() if we're never going to modify it.

We never test the positive return code from phy_modify_mmd_changed(), so
we could just as well use phy_modify_mmd().

I took the ALDPS feature description from commit d90db36a9e74 ("net:
phy: realtek: add dt property to enable ALDPS mode") and transformed it
into a function comment - the feature is sufficiently non-obvious to
deserve that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251117234033.345679-6-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: allow CLKOUT to be disabled on RTL8211F(D)(I)-VD-CG

Add CLKOUT disable support for RTL8211F(D)(I)-VD-CG. Like with other PHY
variants, this feature might be requested by customers when the clock
output is not used, in order to reduce electromagnetic interference (EMI).

In the common driver, the CLKOUT configuration is done through PHYCR2.
The RTL_8211FVD_PHYID is singled out as not having that register, and
execution in rtl8211f_config_init() returns early after commit
2c67301584f2 ("net: phy: realtek: Avoid PHYCR2 access if PHYCR2 not
present").

But actually CLKOUT is configured through a different register for this
PHY. Instead of pretending this is PHYCR2 (which it is not), just add
some code for modifying this register inside the rtl8211f_disable_clk_out()
function, and move that outside the code portion that runs only if
PHYCR2 exists.

In practice this reorders the PHYCR2 writes to disable PHY-mode EEE and
to disable the CLKOUT for the normal RTL8211F variants, but this should
be perfectly fine.

It was not noted that RTL8211F(D)(I)-VD-CG would need a genphy_soft_reset()
call after disabling the CLKOUT. Despite that, we do it out of caution
and for symmetry with the other RTL8211F models.

Co-developed-by: Clark Wang <xiaoning.wang@nxp.com>
Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251117234033.345679-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: eliminate has_phycr2 variable

This variable is assigned in rtl821x_probe() and used in
rtl8211f_config_init(), which is more complex than it needs to be.
Simply testing the same condition from rtl821x_probe() in
rtl8211f_config_init() yields the same result (the PHY driver ID is a
runtime invariant), but with one temporary variable less.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251117234033.345679-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: eliminate priv->phycr2 variable

The RTL8211F(D)(I)-VD-CG PHY also has support for disabling the CLKOUT,
and we'd like to introduce the "realtek,clkout-disable" property for
that.

But it isn't done through the PHYCR2 register, and it becomes awkward to
have the driver pretend that it is. So just replace the machine-level
"u16 phycr2" variable with a logical "bool disable_clk_out", which
scales better to the other PHY as well.

The change is a complete functional equivalent. Before, if the device
tree property was absent, priv->phycr2 would contain the RTL8211F_CLKOUT_EN
bit as read from hardware. Now, we don't save priv->phycr2, but we just
don't call phy_modify_paged() on it. Also, we can simply call
phy_modify_paged() with the "set" argument to 0.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251117234033.345679-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: create rtl8211f_config_rgmii_delay()

The control flow in rtl8211f_config_init() has some pitfalls which were
probably unintended. Specifically it has an early return:

switch (phydev->interface) {
...
default: /* the rest of the modes imply leaving delay as is. */
return 0;
}

which exits the entire config_init() function. This means it also skips
doing things such as disabling CLKOUT or disabling PHY-mode EEE.

For the RTL8211FS, which uses PHY_INTERFACE_MODE_SGMII, this might be a
problem. However, I don't know that it is, so there is no Fixes: tag.
The issue was observed through code inspection.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251117234033.345679-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: vmxnet3: convert to use .get_rx_ring_count

Convert the vmxnet3 driver to use the new .get_rx_ring_count ethtool
operation instead of implementing .get_rxnfc solely for handling
ETHTOOL_GRXRINGS command. This simplifies the code by removing the
switch statement and replacing it with a direct return of the queue
count.

The new callback provides the same functionality in a more direct way,
following the ongoing ethtool API modernization.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251118-vmxnet3_grxrings-v1-1-ed8abddd2d52@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-mana-enforce-tx-sge-limit-and-fix-error-cleanup'

Aditya Garg says:

====================
net: mana: Enforce TX SGE limit and fix error cleanup

Add pre-transmission checks to block SKBs that exceed the hardware's SGE
limit. Force software segmentation for GSO traffic and linearize non-GSO
packets as needed.

Update TX error handling to drop failed SKBs and unmap resources
immediately.
====================

Link: https://patch.msgid.link/1763464269-10431-1-git-send-email-gargaditya@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: Drop TX skb on post_work_request failure and unmap resources

Drop TX packets when posting the work request fails and ensure DMA
mappings are always cleaned up.

Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1763464269-10431-3-git-send-email-gargaditya@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: Handle SKB if TX SGEs exceed hardware limit

The MANA hardware supports a maximum of 30 scatter-gather entries (SGEs)
per TX WQE. Exceeding this limit can cause TX failures.
Add ndo_features_check() callback to validate SKB layout before
transmission. For GSO SKBs that would exceed the hardware SGE limit, clear
NETIF_F_GSO_MASK to enforce software segmentation in the stack.
Add a fallback in mana_start_xmit() to linearize non-GSO SKBs that still
exceed the SGE limit.

Also, Add ethtool counter for SKBs linearized

Co-developed-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1763464269-10431-2-git-send-email-gargaditya@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2-af: Skip TM tree print for disabled SQs

Currently, the TM tree is printing all SQ topology including those
which are not enabled, this results in redundant output for SQs
which are not active. This patch adds a check in print_tm_tree()
to skip printing the TM tree hierarchy if the SQ is not enabled.

Signed-off-by: Anshumali Gaur <agaur@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251118054235.1599714-1-agaur@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: mediatek,net: Correct bindings for MT7981

Different SoCs have different numbers of Wireless Ethernet
Dispatch (WED) units:
- MT7981: Has 1 WED unit
- MT7986: Has 2 WED units
- MT7988: Has 2 WED units

Update the binding to reflect these hardware differences. The MT7981
also uses infracfg for PHY switching, so allow that property.

Signed-off-by: Sjoerd Simons <sjoerd@collabora.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20251115-openwrt-one-network-v4-6-48cbda2969ac@collabora.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-stmmac-sanitise-stmmac_is_jumbo_frm'

Russell King says:

====================
net: stmmac: sanitise stmmac_is_jumbo_frm()

stmmac_is_jumbo_frm() takes skb->len, which is unsigned int, but the
parameter is passed as an "int" and then tested using signed
comparisons. This can cause bugs. Change the parameter to be unsigned.

Also arrange for it to return a bool.
====================

Link: https://patch.msgid.link/aRxDqJSWxOdOaRt4@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: stmmac_is_jumbo_frm() returns boolean

stmmac_is_jumbo_frm() returns whether the driver considers the frame
size to be a jumbo frame, and thus returns 0/1 values. This is boolean,
so convert it to return a boolean and use false/true instead. Also
convert stmmac_xmit()'s is_jumbo to be bool, which causes several
variables to be repositioned to keep it in reverse Christmas-tree
order.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1vLIWW-0000000Ewkl-21Ia@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: stmmac_is_jumbo_frm() len should be unsigned

stmmac_is_jumbo_frm() and the is_jumbo_frm() methods take skb->len
which is an unsigned int. Avoid an implicit cast to "int" via the
method parameter and then incorrectly doing signed comparisons on
this unsigned value.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1vLIWR-0000000Ewkf-1Tdx@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: convert priv->sph* to boolean and rename

priv->sph* only have 'true' and 'false' used with them, yet they are an
int. Change their type to a bool, and rename to make their usage more
clear.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1vLIDN-0000000Evur-2NLU@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: fib_tests: add fib6 from ra to static test

The new test checks that a route that has been promoted from RA-learned
to static does not switch back when a new RA message arrives. In
addition, it checks that the route is owned by RA again when the static
address is removed.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20251115095939.6967-2-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: clear RA flags when adding a static route

When an IPv6 Router Advertisement (RA) is received for a prefix, the
kernel creates the corresponding on-link route with flags RTF_ADDRCONF
and RTF_PREFIX_RT configured and RTF_EXPIRES if lifetime is set.

If later a user configures a static IPv6 address on the same prefix the
kernel clears the RTF_EXPIRES flag but it doesn't clear the RTF_ADDRCONF
and RTF_PREFIX_RT. When the next RA for that prefix is received, the
kernel sees the route as RA-learned and wrongly configures back the
lifetime. This is problematic because if the route expires, the static
address won't have the corresponding on-link route.

This fix clears the RTF_ADDRCONF and RTF_PREFIX_RT flags preventing that
the lifetime is configured when the next RA arrives. If the static
address is deleted, the route becomes RA-learned again.

Fixes: 14ef37b6d00e ("ipv6: fix route lookup in addrconf_prefix_rcv()")
Reported-by: Garri Djavadyan <g.djavadyan@gmail.com>
Closes: https://lore.kernel.org/netdev/ba807d39aca5b4dcf395cc11dca61a130a52cfd3.camel@gmail.com/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20251115095939.6967-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'af_unix-gc-cleanup-and-optimisation'

Kuniyuki Iwashima says:

====================
af_unix: GC cleanup and optimisation.

Currently, AF_UNIX GC is triggered from close() and sendmsg()
based on the number of inflight AF_UNIX sockets.

This is because the old GC implementation had no idea of the
shape of the graph formed by SCM_RIGHTS references.

The new GC knows whether cyclic references (could) exist.

This series refines such conditions not to trigger GC unless
really needed.
====================

Link: https://patch.msgid.link/20251115020935.2643121-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Consolidate unix_schedule_gc() and wait_for_unix_gc().

unix_schedule_gc() and wait_for_unix_gc() share some code.

Let's consolidate the two.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251115020935.2643121-8-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Remove unix_tot_inflight.

unix_tot_inflight is no longer used.

Let's remove it.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251115020935.2643121-7-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Refine wait_for_unix_gc().

unix_tot_inflight is a poor metric, only telling the number of
inflight AF_UNXI sockets, and we should use unix_graph_state instead.

Also, if the receiver is catching up with the passed fds, the
sender does not need to schedule GC.

GC only helps unreferenced cyclic SCM_RIGHTS references, and in
such a situation, the malicious sendmsg() will continue to call
wait_for_unix_gc() and hit the UNIX_INFLIGHT_SANE_USER condition.

Let's make only malicious users schedule GC and wait for it to
finish if a cyclic reference exists during the previous GC run.

Then, sane users will pay almost no cost for wait_for_unix_gc().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251115020935.2643121-6-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Don't call wait_for_unix_gc() on every sendmsg().

We have been calling wait_for_unix_gc() on every sendmsg() in case
there are too many inflight AF_UNIX sockets.

This is also because the old GC implementation had poor knowledge
of the inflight sockets and had to suspect every sendmsg().

This was improved by commit d9f21b361333 ("af_unix: Try to run GC
async."), but we do not even need to call wait_for_unix_gc() if the
process is not sending AF_UNIX sockets.

The wait_for_unix_gc() call only helps when a malicious process
continues to create cyclic references, and we can detect that
in a better place and slow it down.

Let's move wait_for_unix_gc() to unix_prepare_fpl() that is called
only when AF_UNIX socket fd is passed via SCM_RIGHTS.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251115020935.2643121-5-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Don't trigger GC from close() if unnecessary.

We have been triggering GC on every close() if there is even one
inflight AF_UNIX socket.

This is because the old GC implementation had no idea of the graph
shape formed by SCM_RIGHTS references.

The new GC knows whether there could be a cyclic reference or not,
and we can do better.

Let's not trigger GC from close() if there is no cyclic reference
or GC is already in progress.

While at it, unix_gc() is renamed to unix_schedule_gc() as it does
not actually perform GC since commit 8b90a9f819dc ("af_unix: Run
GC on only one CPU.").

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251115020935.2643121-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Simplify GC state.

GC manages its state by two variables, unix_graph_maybe_cyclic
and unix_graph_grouped, both of which are set to false in the
initial state.

When an AF_UNIX socket is passed to an in-flight AF_UNIX socket,
unix_update_graph() sets unix_graph_maybe_cyclic to true and
unix_graph_grouped to false, making the next GC invocation call
unix_walk_scc() to group SCCs.

Once unix_walk_scc() finishes, sockets in the same SCC are linked
via vertex->scc_entry.  Then, unix_graph_grouped is set to true
so that the following GC invocations can skip Tarjan's algorithm
and simply iterate through the list in unix_walk_scc_fast().

In addition, if we know there is at least one cyclic reference,
we set unix_graph_maybe_cyclic to true so that we do not skip GC.

So the state transitions as follows:

  (unix_graph_maybe_cyclic, unix_graph_grouped)
  =
  (false, false) -> (true, false) -> (true, true) or (false, true)
                         ^.______________/________________/

There is no transition to the initial state where both variables
are false.

If we consider the initial state as grouped, we can see that the
GC actually has a tristate.

Let's consolidate two variables into one enum.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251115020935.2643121-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Count cyclic SCC.

__unix_walk_scc() and unix_walk_scc_fast() call unix_scc_cyclic()
for each SCC to check if it forms a cyclic reference, so that we
can skip GC at the following invocations in case all SCCs do not
have any cycles.

If we count the number of cyclic SCCs in __unix_walk_scc(), we can
simplify unix_walk_scc_fast() because the number of cyclic SCCs
only changes when it garbage-collects a SCC.

So, let's count cyclic SCC in __unix_walk_scc() and decrement it
in unix_walk_scc_fast() when performing garbage collection.

Note that we will use this counter in a later patch to check if a
cycle existed in the previous GC run.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251115020935.2643121-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-mlx5-misc-changes-2025-11-17'

Tariq Toukan says:

====================
net/mlx5: misc changes 2025-11-17

This series contains misc enhancements to the mlx5 driver.
====================

Link: https://patch.msgid.link/1763415729-1238421-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Use EOPNOTSUPP instead of ENOTSUPP

Per Documentation/dev-tools/checkpatch.rst, ENOTSUPP is not a standard
error code and should be avoided. EOPNOTSUPP should be used instead.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/1763415729-1238421-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Abort new commands if all command slots are stalled

In case of a FW issue, FW might be not responding to FW commands,
causing kernel lockout for a long period of time, e.g. rtnl_lock held
while ethtool is trying to collect stats waiting for FW to respond to
multiple commands, when all of them will timeout.

While there's no immediate indication of the FW lockout, we can safely
assume that something is wrong when all command slots are busy and in
a timeout state and no FW completion was received on any of them.

In such case, start immediately failing new commands.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763415729-1238421-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Remove redundant bw_share minimal value assignment

Remove unnecessary logic that sets bw_share to minimal value, when
parent has bw_share configured but nodes don't have min_rate.

This check is redundant because the parent bandwidth acts as the upper
bound regardless, and the firmware always enforces the topmost
bandwidth constraint.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763415729-1238421-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Recover SQ on excessive PTP TX timestamp delta

Extend the TX timestamp handler to recover the SQ when the difference
between the port and CQE TX timestamps is abnormally large.

The current logic aborts timestamp delivery if the delta exceeds
1/128 seconds, which matches the maximum expected packet interval in
ptp4l. A larger delta makes the timestamps unreliable.

This change adds recovery if the delta exceeds 0.5 seconds. Such a
large gap should not occur in normal operation and indicates that
firmware is stuck or metadata tracking is out of sync, leading to stale
or mismatched timestamps. Recovering the SQ ensures forward progress
and avoids silently dropping invalid timestamps.

The timestamp handler now takes mlx5e_ptpsq directly to access both CQ
stats and the recovery state.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763415729-1238421-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Refactor EEPROM query error handling to return status separately

Matthew and Jakub reported [1] issues where inventory automation tools
are calling EEPROM query repeatedly on a port that doesn't have an SFP
connected, resulting in millions of error prints.

Move MCIA register status extraction from the query functions to the
callers, allowing use of extack reporting instead of a dmesg print when
using the netlink API.

[1] https://lore.kernel.org/netdev/20251028194011.39877-1-mattc@purestorage.com/

Cc: Matthew W Carlis <mattc@purestorage.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763415729-1238421-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: specs: support ipv4-or-v6 for dual-stack fields

Since commit 1b255e1beabf ("tools: ynl: add ipv4-or-v6 display hint"), we
can display either IPv4 or IPv6 addresses for a single field based on the
address family. However, most dual-stack fields still use the ipv4 display
hint. This update changes them to use the new ipv4-or-v6 display hint and
converts IPv4-only fields to use the u32 type.

Field changes:
  - v4-or-v6
    - IFA_ADDRESS, IFA_LOCAL
    - IFLA_GRE_LOCAL, IFLA_GRE_REMOTE
    - IFLA_VTI_LOCAL, IFLA_VTI_REMOTE
    - IFLA_IPTUN_LOCAL, IFLA_IPTUN_REMOTE
    - NDA_DST
    - RTA_DST, RTA_SRC, RTA_GATEWAY, RTA_PREFSRC
    - FRA_SRC, FRA_DST
  - ipv4
    - IFA_BROADCAST
    - IFLA_GENEVE_REMOTE
    - IFLA_IPTUN_6RD_RELAY_PREFIX

Reviewed-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20251117024457.3034-3-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: Add MAC address parsing support

Add missing support for parsing MAC addresses when display_hint is 'mac'
in the YNL library. This enables YNL CLI to accept MAC address strings
for attributes like lladdr in rt-neigh operations.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20251117024457.3034-2-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-expand-napi_skb_cache-use'

Eric Dumazet says:

====================
net: expand napi_skb_cache use

This is a followup of commit e20dfbad8aab ("net: fix napi_consume_skb()
with alien skbs").

Now the per-cpu napi_skb_cache is populated from TX completion path,
we can make use of this cache, especially for cpus not used
from a driver NAPI poll (primary user of napi_cache).

With this series, I consistently reach 130 Mpps on my UDP tx stress test
and reduce SLUB spinlock contention to smaller values.
====================

Link: https://patch.msgid.link/20251116202717.1542829-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: use napi_skb_cache even in process context

This is a followup of commit e20dfbad8aab ("net: fix napi_consume_skb()
with alien skbs").

Now the per-cpu napi_skb_cache is populated from TX completion path,
we can make use of this cache, especially for cpus not used
from a driver NAPI poll (primary user of napi_cache).

We can use the napi_skb_cache only if current context is not from hard irq.

With this patch, I consistently reach 130 Mpps on my UDP tx stress test
and reduce SLUB spinlock contention to smaller values.

Note there is still some SLUB contention for skb->head allocations.

I had to tune /sys/kernel/slab/skbuff_small_head/cpu_partial
and /sys/kernel/slab/skbuff_small_head/min_partial depending
on the platform taxonomy.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251116202717.1542829-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: __alloc_skb() cleanup

This patch refactors __alloc_skb() to prepare the following one,
and does not change functionality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251116202717.1542829-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: add a new @alloc parameter to napi_skb_cache_get()

We want to be able in the series last patch to get an skb from
napi_skb_cache from process context, if there is one available.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251116202717.1542829-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: ks8995: Fix incorrect OF match table name

The driver declares an OF match table named ks8895_spi_of_match, even
though it describes compatible strings for the KS8995 and related Micrel
switches. This is a leftover typo, the correct name should match the
chip family handled by this driver ks8995, and also match the variable
used in spi_driver.of_match_table.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patch.msgid.link/20251117095356.2099772-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

kcm: Fix typo and add hyphen in Kconfig help text

s/connectons/connections/ and s/message based/message-based/

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20251116135616.106079-2-thorsten.blum@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Don't reinitialise tw->tw_transparent in tcp_time_wait().

tw->tw_transparent is initialised twice in inet_twsk_alloc()
and tcp_time_wait().

Let's remove the latter.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20251118000445.4091280-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'ipsec-next-2025-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2025-11-18

1) Relax a lock contention bottleneck to improve IPsec crypto
   offload performance. From Jianbo Liu.

2) Deprecate pfkey, the interface will be removed in 2027.

3) Update xfrm documentation and move it to ipsec maintainance.
   From Bagas Sanjaya.

* tag 'ipsec-next-2025-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  MAINTAINERS: Add entry for XFRM documentation
  net: Move XFRM documentation into its own subdirectory
  Documentation: xfrm_sync: Number the fifth section
  Documentation: xfrm_sysctl: Trim trailing colon in section heading
  Documentation: xfrm_sync: Trim excess section heading characters
  Documentation: xfrm_sync: Properly reindent list text
  Documentation: xfrm_device: Separate hardware offload sublists
  Documentation: xfrm_device: Use numbered list for offloading steps
  Documentation: xfrm_device: Wrap iproute2 snippets in literal code block
  pfkey: Deprecate pfkey
  xfrm: Skip redundant replay recheck for the hardware offload path
  xfrm: Refactor xfrm_input lock to reduce contention with RSS
====================

Link: https://patch.msgid.link/20251118092610.2223552-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: cdns,macb: Add pic64gx compatibility

The pic64gx uses an identical integration of the macb IP to mpfs.

Signed-off-by: Pierre-Henry Moussay <pierre-henry.moussay@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20251117-easter-machine-37851f20aaf3@spud
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynltool: ignore *.d deps files

Add *.d to gitignore for ynltool

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20251117143155.44806-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'gve-implement-xdp-hw-rx-timestamping-support-for-dq'

Tim Hostetler says:

====================
gve: Implement XDP HW RX Timestamping support for DQ

From: Tim Hostetler <thostet@google.com>

This patch series adds support for bpf_xdp_metadata_rx_timestamp from an
XDP program loaded into the driver on its own or bound to an XSK. This
is only supported for DQ.
====================

Link: https://patch.msgid.link/20251114211146.292068-1-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

gve: Add Rx HWTS metadata to AF_XDP ZC mode

By overlaying the struct gve_xdp_buff on top of the struct xdp_buff_xsk
that AF_XDP utilizes, the driver records the 32 bit timestamp via the
completion descriptor and the cached 64 bit NIC timestamp via gve_priv.

The driver's implementation of xmo_rx_timestamp extends the timestamp to
the full and up to date 64 bit timestamp and returns it to the user.

gve_rx_xsk_dqo is modified to accept a pointer to the completion
descriptor and no longer takes a buf_len explicitly as it can be pulled
out of the descriptor.

With this patch gve now supports bpf_xdp_metadata_rx_timestamp.

Signed-off-by: Tim Hostetler <thostet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20251114211146.292068-5-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

gve: Prepare bpf_xdp_metadata_rx_timestamp support

Support populating XDP RX metadata with hardware RX timestamps. This
patch utilizes the same underlying logic to calculate hardware
timestamps as the regular RX path.

xdp_metadata_ops is registered with the net_device in a future patch.

gve_rx_calculate_hwtstamp was pulled out so as to not duplicate logic
between gve_xdp_rx_timestamp and gve_rx_hwtstamp.

Signed-off-by: Tim Hostetler <thostet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20251114211146.292068-4-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

gve: Wrap struct xdp_buff

RX timestamping will need to keep track of extra temporary information
per-packet. In preparation for this, introduce gve_xdp_buff to wrap the
xdp_buff. This is similar in function to stmmac_xdp_buff and
ice_xdp_buff.

Signed-off-by: Tim Hostetler <thostet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20251114211146.292068-3-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

gve: Move ptp_schedule_worker to gve_init_clock

Previously, gve had been only initializing ptp aux work when
hardware timestamping was initialized through ndo_hwtsatmp_set. As this
patch series introduces XDP hardware timestamp metadata which will
require the ptp aux work, the work can't be gated on the
kernel_hwtstamp_config being set and must be initialized elsewhere.

For simplicity, ptp_schedule_worker is invoked right after the ptp_clock
is registered with the kernel (which happens during gve_probe or
following reset). The worker is scheduled in GVE_NIC_TS_SYNC_INTERVAL_MS
as the synchronous call to gve_clock_nic_ts_read makes the worker
redundant if scheduled immediately.

If gve cannot read the device clock immediately, it errors out of
gve_init_clock.

Signed-off-by: Tim Hostetler <thostet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20251114211146.292068-2-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: micrel: lan8814: Enable in-band auto-negotiation

The lan8814 supports two interfaces towards the host (QSGMII and QUSGMII).
Currently the lan8814 disables the auto-negotiation towards the host
side. So, extend this to allow to configure to use in-band
auto-negotiation.
I have tested this only with the QSGMII interface.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20251114084224.3268928-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: af_unix: Add tests for ECONNRESET and EOF semantics

Add selftests to verify and document Linux’s intended behaviour for
UNIX domain sockets (SOCK_STREAM and SOCK_DGRAM) when a peer closes.
The tests verify that:

1. SOCK_STREAM returns EOF when the peer closes normally.
2. SOCK_STREAM returns ECONNRESET if the peer closes with unread data.
3. SOCK_SEQPACKET returns EOF when the peer closes normally.
4. SOCK_SEQPACKET returns ECONNRESET if the peer closes with unread data.
5. SOCK_DGRAM does not return ECONNRESET when the peer closes.

This follows up on review feedback suggesting a selftest to clarify
Linux’s semantics.

Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Sunday Adelodun <adelodunolaoluwa@yahoo.com>
Link: https://patch.msgid.link/20251113112802.44657-1-adelodunolaoluwa@yahoo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-stmmac-disable-eee-rx-clock-stop-when-vlan-is-enabled'

Ovidiu Panait says:

====================
net: stmmac: Disable EEE RX clock stop when VLAN is enabled

This series fixes a couple of VLAN issues observed on the Renesas RZ/V2H
EVK platform (stmmac + Microchip KSZ9131RNXI PHY):

- The first patch fixes a bug where VLAN ID 0 would not be properly removed
due to how vlan_del_hw_rx_fltr() matched entries in the VLAN filter table.

- The second patch addresses RX clock gating issues that occur during VLAN
creation and deletion when EEE is enabled with RX clock-stop active (the
default configuration). For example:

    # ip link add link end1 name end1.5 type vlan id 5
    15c40000.ethernet end1: Timeout accessing MAC_VLAN_Tag_Filter
    RTNETLINK answers: Device or resource busy

The stmmac hardware requires the receive clock to be running when writing
certain registers, including VLAN registers. However, by default the driver
enables Energy Efficient Ethernet (EEE) and allows the PHY to stop the
receive clock when the link is idle. As a result, the RX clock might be
stopped when attempting to access these registers, leading to timeouts.

A more comprehensive overview of receive clock related issues in the
stmmac driver can be found here:
https://lore.kernel.org/all/Z9ySeo61VYTClIJJ@shell.armlinux.org.uk/

Most of the issues were resolved by commit dd557266cf5fb ("net: stmmac:
block PHY RXC clock-stop"), which wraps register accesses with
phylink_rx_clk_stop_block()/unblock() calls. However, VLAN add/delete
operations are invoked with bottom halves disabled, where sleeping is
not permitted, so those helpers cannot be used.

To avoid these VLAN timeouts, the second patch disables the EEE RX
clock-stop feature when VLAN support is enabled. This ensures the receive
clock remains active, allowing VLAN operations to complete successfully.
====================

Link: https://patch.msgid.link/20251113112721.70500-1-ovidiu.panait.rb@renesas.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: stmmac: Disable EEE RX clock stop when VLAN is enabled

On the Renesas RZ/V2H EVK platform, where the stmmac MAC is connected to a
Microchip KSZ9131RNXI PHY, creating or deleting VLAN interfaces may fail
with timeouts:

    # ip link add link end1 name end1.5 type vlan id 5
    15c40000.ethernet end1: Timeout accessing MAC_VLAN_Tag_Filter
    RTNETLINK answers: Device or resource busy

Disabling EEE at runtime avoids the problem:

    # ethtool --set-eee end1 eee off
    # ip link add link end1 name end1.5 type vlan id 5
    # ip link del end1.5

The stmmac hardware requires the receive clock to be running when writing
certain registers, such as those used for MAC address configuration or
VLAN filtering. However, by default the driver enables Energy Efficient
Ethernet (EEE) and allows the PHY to stop the receive clock when the link
is idle. As a result, the RX clock might be stopped when attempting to
access these registers, leading to timeouts and other issues.

Commit dd557266cf5fb ("net: stmmac: block PHY RXC clock-stop")
addressed this issue for most register accesses by wrapping them in
phylink_rx_clk_stop_block()/phylink_rx_clk_stop_unblock() calls.
However, VLAN add/delete operations may be invoked with bottom halves
disabled, where sleeping is not allowed, so using these helpers is not
possible.

Therefore, to fix this, disable the RX clock stop feature in the phylink
configuration if VLAN features are set. This ensures the RX clock remains
active and register accesses succeed during VLAN operations.

Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20251113112721.70500-3-ovidiu.panait.rb@renesas.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: stmmac: Fix VLAN 0 deletion in vlan_del_hw_rx_fltr()

When the "rx-vlan-filter" feature is enabled on a network device, the 8021q
module automatically adds a VLAN 0 hardware filter when the device is
brought administratively up.

For stmmac, this causes vlan_add_hw_rx_fltr() to create a new entry for
VID 0 in the mac_device_info->vlan_filter array, in the following format:

VLAN_TAG_DATA_ETV | VLAN_TAG_DATA_VEN | vid

Here, VLAN_TAG_DATA_VEN indicates that the hardware filter is enabled for
that VID.

However, on the delete path, vlan_del_hw_rx_fltr() searches the vlan_filter
array by VID only, without verifying whether a VLAN entry is enabled. As a
result, when the 8021q module attempts to remove VLAN 0, the function may
mistakenly match a zero-initialized slot rather than the actual VLAN 0
entry, causing incorrect deletions and leaving stale entries in the
hardware table.

Fix this by verifying that the VLAN entry's enable bit (VLAN_TAG_DATA_VEN)
is set before matching and deleting by VID. This ensures only active VLAN
entries are removed and avoids leaving stale entries in the VLAN filter
table, particularly for VLAN ID 0.

Fixes: ed64639bc1e08 ("net: stmmac: Add support for VLAN Rx filtering")
Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20251113112721.70500-2-ovidiu.panait.rb@renesas.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'dpll-zl3073x-refactor-state-management'

Ivan Vecera says:

====================
dpll: zl3073x: Refactor state management

This patch set is a refactoring of the zl3073x driver to clean up
state management, improve modularity, and significantly reduce
on-demand I/O.

The driver's dpll.c implementation previously performed on-demand
register reads and writes (wrapped in mailbox operations) to get
or set properties like frequency, phase, and embedded-sync settings.
This cluttered the DPLL logic with low-level I/O, duplicated locking,
and led to inefficient bus traffic.

This series addresses this by:
1. Splitting the monolithic 'core.c' into logical units ('ref.c',
   'out.c', 'synth.c').
2. Implementing a full read/write-back cache for 'zl3073x_ref' and
   'zl3073x_out' structures.

All state is now read once during '_state_fetch()' (and status updated
periodically). DPLL get callbacks read from this cache. Set callbacks
modify a copy of the state, which is then committed via a new
'..._state_set()' function. These '_state_set' functions compare
the new state to the cached state and write *only* the modified
register values back to the hardware, all within a single mailbox
sequence.

The result is a much cleaner 'dpll.c' that is almost entirely
free of direct register I/O, and all state logic is properly
encapsulated in its respective file.

The series is broken down as follows:

* Patch 1: Changes the state structs to store raw register values
  (e.g., 'config', 'ctrl') instead of parsed booleans, centralizing
  parsing logic into the helpers.
* Patch 2: Splits the logic from 'core.c' into new 'ref.c', 'out.c'
  and 'synth.c' files, creating a 'zl3073x_dev_...' abstraction layer.
* Patch 3: Introduces the caching concept by reading and caching
  the reference monitor status periodically, removing scattered
  reads from 'dpll.c'.
* Patch 4: Expands the 'zl3073x_ref' struct to cache *all* reference
  properties and adds 'zl3073x_ref_state_set()' to write back changes.
* Patch 5: Does the same for the 'zl3073x_out' struct, caching all
  output properties and adding 'zl3073x_out_state_set()'.
* Patch 6: A final cleanup that removes the 'zl3073x_dev_...' wrapper
  functions that became redundant after the refactoring.
====================

Link: https://patch.msgid.link/20251113074105.141379-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpll: zl3073x: Remove unused dev wrappers

Remove several zl3073x_dev_... inline wrapper functions from core.h
as they are no longer used by any callers.

Removed functions:
* zl3073x_dev_ref_ffo_get
* zl3073x_dev_ref_is_enabled
* zl3073x_dev_synth_dpll_get
* zl3073x_dev_synth_is_enabled
* zl3073x_dev_out_signal_format_get

This is a cleanup after recent refactoring, as the remaining callers
now fetch the state object and use the base helpers directly.

Reviewed-by: Petr Oros <poros@redhat.com>
Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20251113074105.141379-7-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpll: zl3073x: Cache all output properties in zl3073x_out

Expand the zl3073x_out structure to cache all output-related
hardware registers, including divisors, widths, embedded-sync
parameters and phase compensation.

Modify zl3073x_out_state_fetch() to read and populate all these
new fields at once, including zero-divisor checks. Refactor all
dpll "getter" functions in dpll.c to read from this new
cached state instead of performing direct register access.

Introduce a new function, zl3073x_out_state_set(), to handle
writing changes back to the hardware. This function compares the
provided state with the current cached state and writes *only* the
modified register values via a single mailbox sequence before
updating the local cache.

Refactor all dpll "setter" functions to modify a local copy of
the output state and then call zl3073x_out_state_set() to
commit the changes.

This change centralizes all output-related register I/O into
out.c, significantly reduces bus traffic, and simplifies the logic
in dpll.c.

Reviewed-by: Petr Oros <poros@redhat.com>
Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20251113074105.141379-6-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpll: zl3073x: Cache all reference properties in zl3073x_ref

Expand the zl3073x_ref structure to cache all reference-related
hardware registers, including frequency components, embedded-sync
settings and phase compensation. Previously, these registers were
read on-demand from various functions in dpll.c leading to frequent
mailbox operations.

Modify zl3073x_ref_state_fetch() to read and populate all these new
fields at once. Refactor all "getter" functions in dpll.c to read
from this new cached state instead of performing direct register
access.

Remove the standalone zl3073x_dpll_input_ref_frequency_get() helper,
as its functionality is now replaced by zl3073x_ref_freq_get() which
operates on the cached state and add a corresponding zl3073x_dev_...
wrapper.

Introduce a new function, zl3073x_ref_state_set(), to handle
writing changes back to the hardware. This function compares the
provided state with the current cached state and writes *only* the
modified register values to the device via a single mailbox sequence
before updating the local cache.

Refactor all dpll "setter" functions to modify a local copy of the
ref state and then call zl3073x_ref_state_set() to commit the changes.

As a cleanup, update callers in dpll.c that already have
a struct zl3073x_ref * to use the direct helpers instead of the
zl3073x_dev_... wrappers.

This change centralizes all reference-related register I/O into ref.c,
significantly reduces bus traffic, and simplifies the logic in dpll.c.

Reviewed-by: Petr Oros <poros@redhat.com>
Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20251113074105.141379-5-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpll: zl3073x: Cache reference monitor status

Instead of reading the ZL_REG_REF_MON_STATUS register every time
the reference status is needed, cache this value in the zl3073x_ref
struct.

This is achieved by:
* Adding a mon_status field to struct zl3073x_ref
* Introducing zl3073x_dev_ref_status_update() to read the status for
  all references into this new cache field
* Calling this update function from the periodic work handler
* Adding zl3073x_ref_is_status_ok() and zl3073x_dev_ref_is_status_ok()
  helpers to check the cached value
* Refactoring all callers in dpll.c to use the new
  zl3073x_dev_ref_is_status_ok() helper, removing direct register reads

This change consolidates all status register reads into a single periodic
function and reduces I/O bus traffic in dpll callbacks.

Reviewed-by: Petr Oros <poros@redhat.com>
Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20251113074105.141379-4-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpll: zl3073x: Split ref, out, and synth logic from core

Refactor the zl3073x driver by splitting the logic for input
references, outputs and synthesizers out of the monolithic
core.[ch] files.

Move the logic for each functional block into its own dedicated files:
ref.[ch], out.[ch] and synth.[ch].

Specifically:
- Move state structures (zl3073x_ref, zl3073x_out, zl3073x_synth)
  from core.h into their respective new headers
- Move state-fetching functions (..._state_fetch) from core.c to their
  new .c files
- Move the zl3073x_ref_freq_factorize helper from core.c to ref.c
- Introduce a new helper layer to decouple the core device logic from
  the state-parsing logic:
  1. Move the original inline helpers (e.g., zl3073x_ref_is_enabled)
     to the new headers (ref.h, etc.) and make them operate on a
     const struct ... * pointer.
  2. Create new zl3073x_dev_... prefixed functions in core.h
     (e.g., zl3073x_dev_ref_is_enabled) and Implement these _dev_ functions
     to fetch state using a new ..._state_get() helper and then call
     the non-prefixed helper.
  3. Update all driver-internal callers (in dpll.c, prop.c, etc.) to use
     the new zl3073x_dev_... functions.

Reviewed-by: Petr Oros <poros@redhat.com>
Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20251113074105.141379-3-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpll: zl3073x: Store raw register values instead of parsed state

The zl3073x_ref, zl3073x_out and zl3073x_synth structures
previously stored state that was parsed from register reads. This
included values like boolean 'enabled' flags, synthesizer selections,
and pre-calculated frequencies.

This commit refactors the state management to store the raw register
values directly in these structures. The various inline helper functions
are updated to parse these raw values on-demand using FIELD_GET.

Reviewed-by: Petr Oros <poros@redhat.com>
Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20251113074105.141379-2-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 's390-qeth-improve-handling-of-osa-rcs'

Aswin Karuvally says:

====================
s390/qeth: Improve handling of OSA RCs

This two patch series aims to improve how return codes from OSA Express
are handled in the qeth driver.

OSA defines a number of return codes whose meaning is determined by the
issuing command, ie. they are ambiguous. The first patch moves
definitions of all return codes including the ambiguous ones to a single
enum block to aid readability and maintainability.

The second patch implements a mechanism to interpret return codes based
on the issuing command to ensure accurate debug messages. While at it,
remove extern keyword and fix indentation for function declarations to
be in line with Linux kernel coding style.
====================

Link: https://patch.msgid.link/20251113144209.2140061-1-aswin@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

s390/qeth: Handle ambiguous OSA RCs in s390dbf

OSA Express defines a number of return codes whose meaning is determined
by the issuing command, making them ambiguous. The important ones are
reported as debug messages through the s390 debug feature.

The qeth driver currently does not take the issuing command into account
when interpreting the return code which sometimes leads to incorrect
debug messages.

Implement a mechanism to interpret and report these return codes
properly. While at it, remove extern keyword and fix indentation for
function declarations to be in line with Linux kernel coding style.

Suggested-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Aswin Karuvally <aswin@linux.ibm.com>
Link: https://patch.msgid.link/20251113144209.2140061-3-aswin@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

s390/qeth: Move all OSA RCs to single enum

OSA Express defines a number of return codes whose meaning is
determined by the issuing command, making them ambiguous. Move
definitions of all return codes including the ambiguous ones to a single
enum block to aid readability and maintainability.

Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Aswin Karuvally <aswin@linux.ibm.com>
Link: https://patch.msgid.link/20251113144209.2140061-2-aswin@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8169: bail out from probe if fiber mode is detected on RTL8127AF

It was reported that on a card with RTL8127AF (SFP + DAC) link-up isn't
detected. Realtek hides the SFP behind the internal PHY, which isn't
behaving fully compliance with clause 22 any longer in fiber mode.
Due to not having access to chip documentation there isn't much I can
do for now. Instead of silently failing to detect link-up in fiber mode,
inform the user that fiber mode isn't support and bail out.

The logic to detect fiber mode is borrowed from Realtek's r8127 driver.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/fab6605a-54e2-4f54-b194-11c2b9caaaa9@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-stmmac-dwmac-sophgo-add-phy-interface-filter'

Inochi Amaoto says:

====================
net: stmmac: dwmac-sophgo: Add phy interface filter

As the SG2042 has an internal rx delay, the delay should be remove
when init the mac, otherwise the phy will be misconfigurated.

Since this delay fix is common for other MACs, add a common helper
for it. And use it to fix SG2042.
====================

Link: https://patch.msgid.link/20251114003805.494387-1-inochiama@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac-sophgo: Add phy interface filter

As the SG2042 has an internal rx delay, the delay should be removed
when initializing the mac, otherwise the phy will be misconfigurated.

Fixes: 543009e2d4cd ("net: stmmac: dwmac-sophgo: Add support for Sophgo SG2042 SoC")
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Tested-by: Han Gao <rabenda.cn@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251114003805.494387-4-inochiama@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: Add helper for fixing RGMII PHY mode based on internal mac delay

The "phy-mode" property of devicetree indicates whether the PCB has
delay now, which means the mac needs to modify the PHY mode based
on whether there is an internal delay in the mac.

This modification is similar for many ethernet drivers. To simplify
code, define the helper phy_fix_phy_mode_for_mac_delays(speed, mac_txid,
mac_rxid) to fix PHY mode based on whether mac adds internal delay.

Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk>
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251114003805.494387-3-inochiama@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: sophgo,sg2044-dwmac: add phy mode restriction

As the ethernet controller of SG2044 and SG2042 only supports
RGMII phy. Add phy-mode property to restrict the value.

Also, since SG2042 has internal rx delay in its mac, make
only "rgmii-txid" and "rgmii-id" valid for phy-mode.

Fixes: e281c48a7336 ("dt-bindings: net: sophgo,sg2044-dwmac: Add support for Sophgo SG2042 dwmac")
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20251114003805.494387-2-inochiama@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-mana-refactor-gf-stats-handling-and-add-rx_missed_errors-counter'

Erni Sri Satya Vennela says:

====================
net: mana: Refactor GF stats handling and add rx_missed_errors counter

Restructure mana_query_gf_stats() to operate on the per-VF mana_context,
instead of per-port statistics. Introduce mana_ethtool_hc_stats to
isolate hardware counter statistics and update the
"ethtool -S <interface>" output to expose all relevant counters while
preserving backward compatibility.

Add support for the standard rx_missed_errors counter by mapping it
to the hardware's hc_rx_discards_no_wqe metric. Refresh statistics
every 2 seconds.
====================

Link: https://patch.msgid.link/1763120599-6331-1-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: Add standard counter rx_missed_errors

Report standard counter stats->rx_missed_errors
using hc_rx_discards_no_wqe from the hardware.

Add a global workqueue to periodically run
mana_query_gf_stats every 2 seconds to get the latest
info in eth_stats and define a driver capability flag
to notify hardware of the periodic queries.

To avoid repeated failures and log flooding, the workqueue
is not rescheduled if mana_query_gf_stats fails on HWC timeout
error and the stats are reset to 0. Other errors are transient
which will not need a VF reset for recovery.

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1763120599-6331-3-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: Move hardware counter stats from per-port to per-VF context

Move hardware counter (HC) statistics from mana_port_context to
mana_context to enable sharing stats across multiple network ports
on the same MANA VF. Previously, each network port queried
hardware counters independently using MANA_QUERY_GF_STAT command
(GF = Generic Function stats from GDMA hardware), resulting in
redundant queries when multiple ports existed on the same device.

Isolate hardware counter stats by introducing mana_ethtool_hc_stats
in mana_context and update the code to ensure all stats are properly
reported via ethtool -S <interface>, maintaining consistency with
previous behavior.

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1763120599-6331-2-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-stmmac-clean-up-plat_dat-allocation-initialisation'

Russell King says:

====================
net: stmmac: clean up plat_dat allocation/initialisation

This series cleans up the plat_dat allocation and initialisation,
moving common themes into the allocator.

This results in a nice saving:
7 files changed, 53 insertions(+), 148 deletions(-)
====================

Link: https://patch.msgid.link/aRdKVMPHXlIn457m@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: remove unnecessary .pkt_route queue initialisation

PCI drivers explicitly set .pkt_route to zero. However, as the struct
is allocated using devm_kzalloc(), all members default to zero unless
explicitly initialised. Thus, explicitly setting these to zero is
unnecessary. Remove these. This leaves only stmmac_platform.c where
this is explicitly initialised depending on DT properties.

$ grep '\.pkt_route =' *.c
dwmac-intel.c: plat->rx_queues_cfg[0].pkt_route = 0x0;
dwmac-intel.c: plat->rx_queues_cfg[i].pkt_route = 0x0;
dwmac-loongson.c: plat->rx_queues_cfg[0].pkt_route = 0x0;
stmmac_main.c: if (priv->plat->rx_queues_cfg[queue].pkt_route == 0x0)
stmmac_pci.c: plat->rx_queues_cfg[0].pkt_route = 0x0;
stmmac_pci.c: plat->rx_queues_cfg[i].pkt_route = 0x0;
stmmac_platform.c: plat->rx_queues_cfg[queue].pkt_route = PACKET_AVCPQ;
stmmac_platform.c: plat->rx_queues_cfg[queue].pkt_route = PACKET_PTPQ;
stmmac_platform.c: plat->rx_queues_cfg[queue].pkt_route = PACKET_DCBCPQ;
stmmac_platform.c: plat->rx_queues_cfg[queue].pkt_route = PACKET_UPQ;
stmmac_platform.c: plat->rx_queues_cfg[queue].pkt_route = PACKET_MCBCQ;
stmmac_platform.c: plat->rx_queues_cfg[queue].pkt_route = 0x0;

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vJvjf-0000000EVkO-1ZaO@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: remove unnecessary .prio queue initialisation

stmmac_platform.c explicitly sets .prio to zero if the snps,priority
property is not present in DT for the queue. However, as the struct
is allocated using devm_kzalloc(), all members default to zero unless
explicitly initialised, and of_property_read_u32() will not write to
its argument if the property is not found. Thus, explicitly setting
these to zero is unnecessary. Remove these.

$ grep '\.prio =' *.c
stmmac_platform.c: plat->rx_queues_cfg[queue].prio = 0;
stmmac_platform.c: plat->tx_queues_cfg[queue].prio = 0;

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vJvja-0000000EVkI-0zUH@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>