git.ipfire.org Git - thirdparty/kernel/linux.git/log

Merge branch 'net-mlx5-hws-single-flow-counter-support'

Tariq Toukan says:

====================
net/mlx5: HWS single flow counter support

This small series refactors the flow counter bulk initialization code
and extends it so that single flow counters are also usable by hardware
steering (HWS) rules.

Patches 1-2 refactor the bulk init path: first by factoring out common
flow counter bulk initialization into mlx5_fc_bulk_init(), then by
splitting the bitmap allocation into mlx5_fs_bulk_bitmap_alloc(), with
no functional changes.

Patch 3 initializes bulk data for counters allocated via
mlx5_fc_single_alloc(), so they can be safely used by HWS rules.
====================

Link: https://patch.msgid.link/1768210825-1598472-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: Initialize bulk for single flow counters

Ensure that flow counters allocated with mlx5_fc_single_alloc() have
bulk correctly initialized so they can safely be used in HWS rules.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768210825-1598472-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: fs, split bulk init

Refactor mlx5_fs_bulk_init() by moving bitmap allocation logic into a
new helper function mlx5_fs_bulk_bitmap_alloc(). This change does not
alter any logic.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768210825-1598472-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: fs, factor out flow counter bulk init

Add mlx5_fc_bulk_init() to handle bulk initialization of flow counters.
This change does not alter any logic, but refactors the code to remove
duplicate initialization logic by centralizing it in a single function.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768210825-1598472-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'introduce-and-use-netif_xmit_timeout_ms-helper'

Tariq Toukan says:

====================
Introduce and use netif_xmit_timeout_ms() helper

This is V2, find V1 here:
https://lore.kernel.org/all/1764054776-1308696-1-git-send-email-tariqt@nvidia.com/

This series by Shahar introduces a new helper function
netif_xmit_timeout_ms() to check if a TX queue has timed out and report
the timeout duration.
It also encapsulates the check for whether the TX queue is stopped.

Replace duplicated open-coded timeout check in hns3 driver with the new
helper.

For mlx5e, refine the TX timeout recovery flow to act only on SQs whose
transmit timestamp indicates an actual timeout, as determined by the
helper. This prevents unnecessary channel reopen events caused by
attempting recovery on queues that are merely stopped but not truly
timed out.
====================

Link: https://patch.msgid.link/1768209383-1546791-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: Refine TX timeout handling to skip non-timed-out SQ

mlx5e_tx_timeout_work() is invoked when the dev_watchdog reports a
timed-out TX queue. Currently, the recovery flow is triggered for all
stopped SQs, which is not always correct — some SQs may be temporarily
stopped without actually timing out. Attempting to recover such SQs
results in no EQE being polled (since no real timeout occurred), which
the driver misinterprets as a recovery failure, unnecessarily causing
channel reopening.

Improve the logic to initiate recovery only for SQs that are both
stopped and timed out. Utilize the helper introduced in the previous
patch to determine whether the netdevice watchdog timeout period has
elapsed since the SQ’s last transmit timestamp.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768209383-1546791-4-git-send-email-tariqt@nvidia.com
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: hns3: Use netif_xmit_timeout_ms() helper

Replace the open-coded TX queue timeout check
in hns3_get_timeout_queue() with a call to
netif_xmit_timeout_ms() helper.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/1768209383-1546791-3-git-send-email-tariqt@nvidia.com
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: Introduce netif_xmit_timeout_ms() helper

Introduce a new helper function netif_xmit_timeout_ms() to check
if a TX queue is stopped and has timed out and report the timeout
duration. This makes the timeout logic reusable, and will be used
in several places in subsequent patches.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768209383-1546791-2-git-send-email-tariqt@nvidia.com
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'xsk-move-cq_cached_prod_lock'

Jason Xing says:

====================
xsk: move cq_cached_prod_lock

From: Jason Xing <kernelxing@tencent.com>

Move cq_cached_prod_lock to avoid touching new cacheline.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
====================

Link: https://patch.msgid.link/20260104012125.44003-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

xsk: move cq_cached_prod_lock to avoid touching a cacheline in sending path

We (Paolo and I) noticed that in the sending path touching an extra
cacheline due to cq_cached_prod_lock will impact the performance. After
moving the lock from struct xsk_buff_pool to struct xsk_queue, the
performance is increased by ~5% which can be observed by xdpsock.

An alternative approach [1] can be using atomic_try_cmpxchg() to have the
same effect. But unfortunately I don't have evident performance numbers to
prove the atomic approach is better than the current patch. The advantage
is to save the contention time among multiple xsks sharing the same pool
while the disadvantage is losing good maintenance. The full discussion can
be found at the following link.

[1]: https://lore.kernel.org/all/20251128134601.54678-1-kerneljasonxing@gmail.com/

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://patch.msgid.link/20260104012125.44003-3-kerneljasonxing@gmail.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

xsk: advance cq/fq check when shared umem is used

In the shared umem mode with different queues or devices, either
uninitialized cq or fq is not allowed which was previously done in
xp_assign_dev_shared(). The patch advances the check at the beginning
so that 1) we can avoid a few memory allocation and stuff if cq or fq
is NULL, 2) it can be regarded as preparation for the next patch in
the series.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://patch.msgid.link/20260104012125.44003-2-kerneljasonxing@gmail.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mana: Implement ndo_tx_timeout and serialize queue resets per port.

Implement .ndo_tx_timeout for MANA so any stalled TX queue can be detected
and a device-controlled port reset for all queues can be scheduled to a
ordered workqueue. The reset for all queues on stall detection is
recomended by hardware team.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Link: https://patch.msgid.link/20260112130552.GA11785@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sxgbe: fix typo in comment

Fix a misspelling in the sxgbe_mtl_init() function comment.
"Algorith" should be spelled as "Algorithm".

Signed-off-by: Jinseok Kim <always.starving0@gmail.com>
Link: https://patch.msgid.link/20260112044147.2844-1-always.starving0@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-phy-fixed_phy-replace-list-of-fixed-phys-with-static-array'

Heiner Kallweit says:

====================
net: phy: fixed_phy: replace list of fixed PHYs with static array

Due to max 32 PHY addresses being available per mii bus, using a list
can't support more fixed PHY's. And there's no known use case for as
much as 32 fixed PHY's on a system. 8 should be plenty of fixed PHY's,
so use an array of that size instead of a list. This allows to
significantly reduce the code size and complexity.
In addition replace heavy-weight IDA with a simple bitmap.
====================

Link: https://patch.msgid.link/110f676d-727c-4575-abe4-e383f98fc38f@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: fixed_phy: replace IDA with a bitmap

Size of array fmb_fixed_phys is small, so we can use a simple bitmap
instead of an IDA to manage dynamic allocation of fixed PHY's.
find_first_zero_bit() isn't atomic, so we need the loop to rule out
double allocation of a PHY address.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/d4614463-d532-41fc-92e9-ef97107aceb5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: fixed_phy: replace list of fixed PHYs with static array

Due to max 32 PHY addresses being available per mii bus, using a list
can't support more fixed PHY's. And there's no known use case for as
much as 32 fixed PHY's on a system. 8 should be plenty of fixed PHY's,
so use an array of that size instead of a list. This allows to
significantly reduce the code size and complexity.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/8610d30c-eac7-4100-9008-d3b6cee6a5cd@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ipv6-allow-for-nexthop-device-mismatch-with-onlink'

Ido Schimmel says:

====================
ipv6: Allow for nexthop device mismatch with "onlink"

This patchset aligns IPv6 with IPv4 with respect to the "onlink" keyword
and allows IPv6 routes to be configured with a gateway address that is
resolved out of a different interface than the one specified.

Patches #1-#3 are small preparations in the existing "onlink" selftest.

Patch #4 is the actual change. See the commit message for detailed
description and motivation.

Patch #5 adds test cases for both address families, to make sure that
this use case does not regress.
====================

Link: https://patch.msgid.link/20260111120813.159799-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: fib-onlink: Add test cases for nexthop device mismatch

Add test cases that verify that when the "onlink" keyword is specified,
both address families (with and without VRF) accept routes with a
gateway address that is reachable via a different interface than the one
specified.

Output without "ipv6: Allow for nexthop device mismatch with "onlink"":

# ./fib-onlink-tests.sh | grep mismatch
TEST: nexthop device mismatch                             [ OK ]
TEST: nexthop device mismatch                             [ OK ]
TEST: nexthop device mismatch                             [FAIL]
TEST: nexthop device mismatch                             [FAIL]

Output with "ipv6: Allow for nexthop device mismatch with "onlink"":

# ./fib-onlink-tests.sh | grep mismatch
TEST: nexthop device mismatch                             [ OK ]
TEST: nexthop device mismatch                             [ OK ]
TEST: nexthop device mismatch                             [ OK ]
TEST: nexthop device mismatch                             [ OK ]

That is, the IPv4 tests were always passing, but the IPv6 ones only pass
after the specified patch.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260111120813.159799-6-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: Allow for nexthop device mismatch with "onlink"

IPv4 allows for a nexthop device mismatch when the "onlink" keyword is
specified:

# ip link add name dummy1 up type dummy
# ip address add 192.0.2.1/24 dev dummy1
# ip link add name dummy2 up type dummy
# ip route add 198.51.100.0/24 nexthop via 192.0.2.2 dev dummy2
Error: Nexthop has invalid gateway.
# ip route add 198.51.100.0/24 nexthop via 192.0.2.2 dev dummy2 onlink
# echo $?
0

This seems to be consistent with the description of "onlink" in the
ip-route man page: "Pretend that the nexthop is directly attached to
this link, even if it does not match any interface prefix".

On the other hand, IPv6 rejects a nexthop device mismatch, even when
"onlink" is specified:

# ip link add name dummy1 up type dummy
# ip address add 2001:db8:1::1/64 dev dummy1
# ip link add name dummy2 up type dummy
# ip route add 2001:db8:10::/64 nexthop via 2001:db8:1::2 dev dummy2
RTNETLINK answers: No route to host
# ip route add 2001:db8:10::/64 nexthop via 2001:db8:1::2 dev dummy2 onlink
Error: Nexthop has invalid gateway or device mismatch.

This is intentional according to commit fc1e64e1092f ("net/ipv6: Add
support for onlink flag") which added IPv6 "onlink" support and states
that "any unicast gateway is allowed as long as the gateway is not a
local address and if it resolves it must match the given device".

The condition was later relaxed in commit 4ed591c8ab44 ("net/ipv6: Allow
onlink routes to have a device mismatch if it is the default route") to
allow for a nexthop device mismatch if the gateway address is resolved
via the default route:

# ip link add name dummy1 up type dummy
# ip route add ::/0 dev dummy1
# ip link add name dummy2 up type dummy
# ip route add 2001:db8:10::/64 nexthop via 2001:db8:1::2 dev dummy2
RTNETLINK answers: No route to host
# ip route add 2001:db8:10::/64 nexthop via 2001:db8:1::2 dev dummy2 onlink
# echo $?
0

While the decision to forbid a nexthop device mismatch in IPv6 seems to
be intentional, it is unclear why it was made. Especially when it
differs from IPv4 and seems to go against the intended behavior of
"onlink".

Therefore, relax the condition further and allow for a nexthop device
mismatch when "onlink" is specified:

# ip link add name dummy1 up type dummy
# ip address add 2001:db8:1::1/64 dev dummy1
# ip link add name dummy2 up type dummy
# ip route add 2001:db8:10::/64 nexthop via 2001:db8:1::2 dev dummy2 onlink
# echo $?
0

The motivating use case is the fact that FRR would like to be able to
configure overlay routes of the following form:

# ip route add <host-Z> vrf <VRF> encap ip id <ID> src <VTEP-A> dst <VTEP-Z> via <VTEP-Z> dev vxlan0 onlink

Where vxlan0 is in the default VRF in which "VTEP-Z" is reachable via
one of the underlay routes (e.g., via swpX). Without this patch, the
above only works with IPv4, but not with IPv6.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260111120813.159799-5-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: fib-onlink: Add a test case for IPv4 multicast gateway

A multicast gateway address should be rejected when "onlink" is
specified, but it is only tested as part of the IPv6 tests. Add an
equivalent IPv4 test.

# ./fib-onlink-tests.sh -v
[...]
COMMAND: ip ro add table 254 169.254.101.12/32 via 233.252.0.1 dev veth1 onlink
Error: Nexthop has invalid gateway.

TEST: Invalid gw - multicast address                      [ OK ]
[...]
COMMAND: ip ro add table 1101 169.254.102.12/32 via 233.252.0.1 dev veth5 onlink
Error: Nexthop has invalid gateway.

TEST: Invalid gw - multicast address, VRF                 [ OK ]
[...]
Tests passed:  37
Tests failed:   0

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260111120813.159799-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: fib-onlink: Remove "wrong nexthop device" IPv6 tests

The command in the test fails as expected because IPv6 forbids a nexthop
device mismatch:

# ./fib-onlink-tests.sh -v
[...]
COMMAND: ip -6 ro add table 1101 2001:db8:102::103/128 via 2001:db8:701::64 dev veth5 onlink
Error: Nexthop has invalid gateway or device mismatch.

TEST: Gateway resolves to wrong nexthop device - VRF [ OK ]
[...]

Where:

# ip route get 2001:db8:701::64 vrf lisa
2001:db8:701::64 dev veth7 table 1101 proto kernel src 2001:db8:701::1 metric 256 pref medium

This is in contrast to IPv4 where a nexthop device mismatch is allowed
when "onlink" is specified:

# ip route get 169.254.7.2 vrf lisa
169.254.7.2 dev veth7 table 1101 src 169.254.7.1 uid 0
# ip ro add table 1101 169.254.102.103/32 via 169.254.7.2 dev veth5 onlink
# echo $?
0

Remove these tests in preparation for aligning IPv6 with IPv4 and
allowing nexthop device mismatch when "onlink" is specified.

A subsequent patch will add tests that verify that both address families
allow a nexthop device mismatch with "onlink".

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260111120813.159799-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: fib-onlink: Remove "wrong nexthop device" IPv4 tests

According to the test description, these tests fail because of a wrong
nexthop device:

# ./fib-onlink-tests.sh -v
[...]
COMMAND: ip ro add table 254 169.254.101.102/32 via 169.254.3.1 dev veth1 onlink
Error: Nexthop has invalid gateway.

TEST: Gateway resolves to wrong nexthop device            [ OK ]
COMMAND: ip ro add table 1101 169.254.102.103/32 via 169.254.7.1 dev veth5 onlink
Error: Nexthop has invalid gateway.

TEST: Gateway resolves to wrong nexthop device - VRF      [ OK ]
[...]

But this is incorrect. They fail because the gateway addresses are local
addresses:

# ip -4 address show
[...]
28: veth3@if27: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link-netns peer_ns-Urqh3o
     inet 169.254.3.1/24 scope global veth3
[...]
32: veth7@if31: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master lisa state UP group default qlen 1000 link-netns peer_ns-Urqh3o
     inet 169.254.7.1/24 scope global veth7

Therefore, using a local address that matches the nexthop device fails
as well:

# ip ro add table 254 169.254.101.102/32 via 169.254.3.1 dev veth3 onlink
Error: Nexthop has invalid gateway.

Using a gateway address with a "wrong" nexthop device is actually valid
and allowed:

# ip route get 169.254.1.2
169.254.1.2 dev veth1 src 169.254.1.1 uid 0
# ip ro add table 254 169.254.101.102/32 via 169.254.1.2 dev veth3 onlink
# echo $?
0

Remove these tests given that their output is confusing and that the
scenario that they are testing is already covered by other tests.

A subsequent patch will add tests for the nexthop device mismatch
scenario.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260111120813.159799-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-phy-introduce-phy-ports-representation'

Maxime Chevallier says:

====================
net: phy: Introduce PHY ports representation

A few important notes:

- This is only a first phase. It instantiates the port, and leverage
   that to make the MAC <-> PHY <-> SFP usecase simpler.

- Next phase will deal with controlling the port state, as well as the
   netlink uAPI for that.

- The end-goal is to enable support for complex port MUX. This
   preliminary work focuses on PHY-driven ports, but this will be
   extended to support muxing at the MII level (Multi-phy, or compo PHY
   + SFP as found on Turris Omnia for example).

- The naming is definitely not set in stone. I named that "phy_port",
   but this may convey the false sense that this is phylib-specific.
   Even the word "port" is not that great, as it already has several
   different meanings in the net world (switch port, devlink port,
   etc.). I used the term "connector" in the binding.

A bit of history on that work :

The end goal that I personnaly want to achieve is :

            + PHY - RJ45
            |
MAC - MUX -+ PHY - RJ45

After many discussions here on netdev@, but also at netdevconf[1] and
LPC[2], there appears to be several analoguous designs that exist out
there.

[1] : https://netdevconf.info/0x17/sessions/talk/improving-multi-phy-and-multi-port-interfaces.html
[2] : https://lpc.events/event/18/contributions/1964/ (video isn't the
right one)

Take the MAchiatobin, it has 2 interfaces that looks like this :

MAC - PHY -+ RJ45
            |
    + SFP - Whatever the module does

Now, looking at the Turris Omnia, we have :

MAC - MUX -+ PHY - RJ45
            |
    + SFP - Whatever the module does

We can find more example of this kind of designs, the common part is
that we expose multiple front-facing media ports. This is what this
current work aims at supporting. As of right now, it does'nt add any
support for muxing, but this will come later on.

This first phase focuses on phy-driven ports only, but there are already
quite some challenges already. For one, we can't really autodetect how
many ports are sitting behind a PHY. That's why this series introduces a
new binding. Describing ports in DT should however be a last-resort
thing when we need to clear some ambiguity about the PHY media-side.

The only use-cases that we have today for multi-port PHYs are combo PHYs
that drive both a Copper port and an SFP (the Macchiatobin case). This
in itself is challenging and this series only addresses part of this
support, by registering a phy_port for the PHY <-> SFP connection. The
SFP module should in the end be considered as a port as well, but that's
not yet the case.

However, because now PHYs can register phy_ports for every media-side
interface they have, they can register the capabilities of their ports,
which allows making the PHY-driver SFP case much more generic.
====================

Link: https://patch.msgid.link/20260108080041.553250-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Documentation: networking: Document the phy_port infrastructure

This documentation aims at describing the main goal of the phy_port
infrastructure.

Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260108080041.553250-15-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: Only rely on phy_port for PHY-driven SFP

Now that all PHY drivers that support downstream SFP have been converted
to phy_port serdes handling, we can make the generic PHY SFP handling
mandatory, thus making all phylib sfp helpers static.

Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260108080041.553250-14-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: qca807x: Support SFP through phy_port interface

QCA8072/8075 may be used as combo-port PHYs, with Serdes (100/1000BaseX)
and Copper interfaces. The PHY has the ability to read the configuration
it's in. If the configuration indicates the PHY is in combo mode, allow
registering up to 2 ports.

Register a dedicated set of port ops to handle the serdes port, and rely
on generic phylib SFP support for the SFP handling.

Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260108080041.553250-13-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: at803x: Support SFP through phy_port interface

Convert the at803x driver to use the generic phylib SFP handling, via a
dedicated .attach_port() callback, populating the supported interfaces.

As these devices are limited to 1000BaseX, a workaround is used to also
support, in a very limited way, copper modules. This is done by
supporting SGMII but limiting it to 1G full duplex (in which case it's
somewhat compatible with 1000BaseX).

Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260108080041.553250-12-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: marvell10g: Support SFP through phy_port

Convert the Marvell10G driver to use the generic SFP handling, through a
dedicated .attach_port() handler to populate the port's supported
interfaces.

As the 88x3310 supports multiple MDI, the .attach_port() logic handles
both SFP attach with 10GBaseR support, and support for the "regular"
port that usually is a BaseT port.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260108080041.553250-11-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: marvell: Support SFP through phy_port interface

Convert the Marvell driver (especially the 88e1512 driver) to use the
phy_port interface to handle SFPs. This means registering a
.attach_port() handler to detect when a serdes line interface is used
(most likely, and SFP module).

Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260108080041.553250-10-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: marvell-88x2222: Support SFP through phy_port interface

The 88x2222 PHY from Marvell only supports serialised modes as its
line-facing interfaces. Convert that driver to the generic phylib SFP
handling.

Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260108080041.553250-9-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: Introduce generic SFP handling for PHY drivers

There are currently 4 PHY drivers that can drive downstream SFPs:
marvell.c, marvell10g.c, at803x.c and marvell-88x2222.c. Most of the
logic is boilerplate, either calling into generic phylib helpers (for
SFP PHY attach, bus attach, etc.) or performing the same tasks with a
bit of validation :
- Getting the module's expected interface mode
- Making sure the PHY supports it
- Optionaly perform some configuration to make sure the PHY outputs
   the right mode

This can be made more generic by leveraging the phy_port, and its
configure_mii() callback which allows setting a port's interfaces when
the port is a serdes.

Introduce a generic PHY SFP support. If a driver doesn't probe the SFP
bus itself, but an SFP phandle is found in devicetree/firmware, then the
generic PHY SFP support will be used, relying on port ops.

PHY driver need to :
- Register a .attach_port() callback
- When a serdes port is registered to the PHY, drivers must set
   port->interfaces to the set of PHY_INTERFACE_MODE the port can output
- If the port has limitations regarding speed, duplex and aneg, the
   port can also fine-tune the final linkmodes that can be supported
- The port may register a set of ops, including .configure_mii(), that
   will be called at module_insert time to adjust the interface based on
   the module detected.

Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260108080041.553250-8-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: Create a phy_port for PHY-driven SFPs

Some PHY devices may be used as media-converters to drive SFP ports (for
example, to allow using SFP when the SoC can only output RGMII). This is
already supported to some extend by allowing PHY drivers to registers
themselves as being SFP upstream.

However, the logic to drive the SFP can actually be split to a per-port
control logic, allowing support for multi-port PHYs, or PHYs that can
either drive SFPs or Copper.

To that extent, create a phy_port when registering an SFP bus onto a
PHY. This port is considered a "serdes" port, in that it can feed data
to another entity on the link. The PHY driver needs to specify the
various PHY_INTERFACE_MODE_XXX that this port supports.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260108080041.553250-7-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: dp83822: Deprecate ti,fiber-mode

The newly added ethernet-connector binding allows describing an Ethernet
connector with greater precision, and in a more generic manner, than
ti,fiber-mode. Deprecate this property.

Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260108080041.553250-6-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: dp83822: Add support for phy_port representation

With the phy_port representation introduced, we can use .attach_port to
populate the port information based on either the straps or the
ti,fiber-mode property. This allows simplifying the probe function and
allow users to override the strapping configuration.

Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260108080041.553250-5-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: Introduce PHY ports representation

Ethernet provides a wide variety of layer 1 protocols and standards for
data transmission. The front-facing ports of an interface have their own
complexity and configurability.

Introduce a representation of these front-facing ports. The current code
is minimalistic and only support ports controlled by PHY devices, but
the plan is to extend that to SFP as well as raw Ethernet MACs that
don't use PHY devices.

This minimal port representation allows describing the media and number
of pairs of a BaseT port. From that information, we can derive the
linkmodes usable on the port, which can be used to limit the
capabilities of an interface.

For now, the port pairs and medium is derived from devicetree, defined
by the PHY driver, or populated with default values (as we assume that
all PHYs expose at least one port).

The typical example is 100M ethernet. 100BaseTX works using only 2
pairs on a Cat 5 cables. However, in the situation where a 10/100/1000
capable PHY is wired to its RJ45 port through 2 pairs only, we have no
way of detecting that. The "max-speed" DT property can be used, but a
more accurate representation can be used :

mdi {
connector-0 {
media = "BaseT";
pairs = <2>;
};
};

From that information, we can derive the max speed reachable on the
port.

Another benefit of having that is to avoid vendor-specific DT properties
(micrel,fiber-mode or ti,fiber-mode).

This basic representation is meant to be expanded, by the introduction
of port ops, userspace listing of ports, and support for multi-port
devices.

Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260108080041.553250-4-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: Introduce ETHTOOL_LINK_MEDIUM_* values

In an effort to have a better representation of Ethernet ports,
introduce enumeration values representing the various ethernet Mediums.

This is part of the 802.3 naming convention, for example :

1000 Base T 4
|    |   | |
|    |   | \_ pairs (4)
|    |   \___ Medium (T == Twisted Copper Pairs)
|    \_______ Baseband transmission
\____________ Speed

Other example :

10000 Base K X 4
           | | \_ lanes (4)
           | \___ encoding (BaseX is 8b/10b while BaseR is 66b/64b)
           \_____ Medium (K is backplane ethernet)

In the case of representing a physical port, only the medium and number
of pairs should be relevant. One exception would be 1000BaseX, which is
currently also used as a medium in what appears to be any of 1000BaseSX,
1000BaseCX, 1000BaseLX, 1000BaseEX, 1000BaseBX10 and some other.

This was reflected in the mediums associated with the 1000BaseX linkmode.

These mediums are set in the net/ethtool/common.c lookup table that
maintains a list of all linkmodes with their number of pairs, medium,
encoding, speed and duplex.

One notable exception to this is 100BaseT Ethernet. It emcompasses 100BaseTX,
which is a 2-pairs protocol but also 100BaseT4, that will also work on 4-pairs
cables. As we don't make a disctinction between these,  the lookup table
contains 2 sets of pair numbers, indicating the min number of pairs for a
protocol to work and the "nominal" number of pairs as well.

Another set of exceptions are linkmodes such 100000baseLR4_ER4, where
the same link mode seems to represent 100GBaseLR4 and 100GBaseER4. The
macro __DEFINE_LINK_MODE_PARAMS_MEDIUMS is here used to populate the
.mediums bitfield with all appropriate mediums.

Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260108080041.553250-3-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: Introduce the ethernet-connector description

The ability to describe the physical ports of Ethernet devices is useful
to describe multi-port devices, as well as to remove any ambiguity with
regard to the nature of the port.

Moreover, describing ports allows for a better description of features
that are tied to connectors, such as PoE through the PSE-PD devices.

Introduce a binding to allow describing the ports, for now with 2
attributes :

- The number of pairs, which is a quite generic property that allows
   differentating between multiple similar technologies such as BaseT1
   and "regular" BaseT (which usually means BaseT4).

- The media that can be used on that port, such as BaseT for Twisted
   Copper, BaseC for coax copper, BaseS/L for Fiber, BaseK for backplane
   ethernet, etc. This allows defining the nature of the port, and
   therefore avoids the need for vendor-specific properties such as
   "micrel,fiber-mode" or "ti,fiber-mode".

The port description lives in its own file, as it is intended in the
future to allow describing the ports for phy-less devices.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260108080041.553250-2-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'selftests-drv-net-gro-enable-hw-gro-and-lro-testing'

Jakub Kicinski says:

====================
selftests: drv-net: gro: enable HW GRO and LRO testing

Add support for running our existing GRO test against HW GRO
and LRO implementation. The first 3 patches are just ksft lib
nice-to-haves, and patch 4 cleans up the existing gro Python.

Patches 5 and 6 are of most practical interest. The support
reconfiguring the NIC to disable SW GRO and enable HW GRO and LRO.
Additionally last patch breaks up the existing GRO cases to
track HW compliance at finer granularity.
====================

Link: https://patch.msgid.link/20260113000740.255360-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: gro: break out all individual test cases

GRO test groups the cases into categories, e.g. "tcp" case
checks coalescing in presence of:
- packets with bad csum,
- sequence number mismatch,
- timestamp option value mismatch,
- different TCP options.

Since we now have TAP support grouping the cases like that
lowers our reporting granularity. This matters even more for
NICs performing HW GRO and LRO since it appears that most
implementation have _some_ bugs. Flagging the whole group
of tests as failed prevents us from catching regressions
in the things that work today.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260113000740.255360-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: gro: run the test against HW GRO and LRO

Run the test against HW GRO and LRO. NICs I have pass the base cases.
Interestingly all are happy to build GROs larger than 64k.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260113000740.255360-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: gro: improve feature config

We'll need to do a lot more feature handling to test HW-GRO and LRO.
Clean up the feature handling for SW GRO a bit to let the next commit
focus on the new test cases, only.

Make sure HW GRO-like features are not enabled for the SW tests.
Be more careful about changing features as "nothing changed"
situations may result in non-zero error code from ethtool.

Don't disable TSO on the local interface (receiver) when running over
netdevsim, we just want GSO to break up the segments on the sender.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260113000740.255360-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: gro: use cmd print

Now that cmd() can be printed directly remove the old formatting.

Before:

  # fragmented ip6 doesn't coalesce:
  # Expected {200 100 100 }, Total 3 packets
  # Received {200 100 }, Total 2 packets.
  # /root/ksft-net-drv/drivers/net/gro: incorrect number of packets

Now:

  # CMD: drivers/net/gro --ipv6 --dmac 9e:[...]
  #   EXIT: 1
  #   STDOUT: fragmented ip6 doesn't coalesce:
  #   STDERR: Expected {200 100 100 }, Total 3 packets
  #           Received {200 100 }, Total 2 packets.
  #           /root/ksft-net-drv/drivers/net/gro: incorrect number of packets

Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260113000740.255360-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: py: teach cmd() how to print itself

Teach cmd() how to print itself, to make debug prints easier.
Example output (leading # due to ksft_pr()):

  # CMD: /root/ksft-net-drv/drivers/net/gro
  #   EXIT: 1
  #   STDOUT: ipv6 with ext header does coalesce:
  #   STDERR: Expected {200 }, Total 1 packets
  #           Received {100 [!=200]100 [!=0]}, Total 2 packets.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260113000740.255360-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: py: teach ksft_pr() multi-line safety

Make printing multi-line logs easier by automatically prefixing
each line in ksft_pr(). Make use of this when formatting exceptions.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260113000740.255360-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools/net/ynl: suppress jobserver warning in ynltool version detection

When building ynltool with parallel make (-jN), a warning is emitted:

  make[1]: warning: jobserver unavailable: using -j1.
  Add '+' to parent make rule.

The warning trips up local runs of NIPA's ingest_mdir.py, which
correctly fails on make warnings.

This occurs because SRC_VERSION uses $(shell make ...) to make
kernelversion. The $(shell) function inherits make's MAKEFLAGS env var
which specifies "--jobserver-auth=R,W" pointing to file descriptors that
the invoked make sub-shell does not have access to.

Observed with:

$ make --version | head -1
GNU Make 4.3

Instead of suppressing MAKEFLAGS and foregoing all future MAKEFLAGS
(some of which may be desirable, such as variable overrides) or
introducing a new make target, we instead just ignore the warning by
piping stderr to /dev/null. If 'make kernelversion' fails, the ' || echo
"unknown"' phrase will catch the failure.

Before:
NIPA ingest_mdir.py:

ynl
Full series FAIL   (1)
   Generated files up to date; build has 1 warnings/errors; no diff in
   generated;

After:
NIPA ingest_mdir.py:

Series level tests:
ynl                             OKAY

Validated output:
$ ./ynltool/ynltool --version
ynltool 6.19.0-rc4

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260112-ynl-make-fix-v1-1-c399e76925ad@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Tariq Toukan says:

====================
mlx5-next updates 2026-01-13

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Add IFC bits for extended ETS rate limit bandwidth value
  net/mlx5: Add support for querying bond speed
  net/mlx5: Handle port and vport speed change events in MPESW
  net/mlx5: Propagate LAG effective max_tx_speed to vports
  net/mlx5: Add max_tx_speed and its CAP bit to IFC
====================

Link: https://patch.msgid.link/1768299471-1603093-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: reduce txtimestamp deschedule flakes

This test occasionally fails due to exceeding timing bounds, as
run in continuous testing on netdev.bots:

  https://netdev.bots.linux.dev/contest.html?test=txtimestamp-sh

A common pattern is a single elevated delay between USR and SND.

    # 8.36 [+0.00] test SND
    # 8.36 [+0.00]     USR: 1767864384 s 240994 us (seq=0, len=0)
    # 8.44 [+0.08] ERROR: 18461 us expected between 10000 and 18000
    # 8.44 [+0.00]     SND: 1767864384 s 259455 us (seq=42, len=10)  (USR +18460 us)
    # 8.52 [+0.07]     SND: 1767864384 s 339523 us (seq=42, len=10)  (USR +10005 us)
    # 8.52 [+0.00]     USR: 1767864384 s 409580 us (seq=0, len=0)
    # 8.60 [+0.08]     SND: 1767864384 s 419586 us (seq=42, len=10)  (USR +10005 us)
    # 8.60 [+0.00]     USR: 1767864384 s 489645 us (seq=0, len=0)
    # 8.68 [+0.08]     SND: 1767864384 s 499651 us (seq=42, len=10)  (USR +10005 us)
    # 8.68 [+0.00]     USR-SND: count=4, avg=12119 us, min=10005 us, max=18460 us

(Note that other delays are nowhere near the large 8ms tolerance.)

One hypothesis is that the task is descheduled between taking the USR
timestamp and sending the packet. Possibly in printing.

Delay taking the timestamp closer to sendmsg, and delay printing until
after sendmsg.

With this change, failure rate is significantly lower in current runs.

Link: https://lore.kernel.org/netdev/20260107110521.1aab55e9@kernel.org/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260112163355.3510150-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: airoha: implement get_link_ksettings

Implement the .get_link_ksettings to get the rate, duplex, and
auto-negotiation status.

Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Tested-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260110170212.570793-1-olek2@wp.pl
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-rds-rds-tcp-bug-fix-collection-subset-1-work-queue-scalability'

Allison Henderson says:

====================
net/rds: RDS-TCP bug fix collection, subset 1: Work queue scalability

This is subset 1 of the RDS-TCP bug fix collection series I posted last
Oct.  The greater series aims to correct multiple rds-tcp bugs that
can cause dropped or out of sequence messages.  The set was starting to
get a bit large, so I've broken it down into smaller sets to make
reviews more manageable.

In this subset, we focus on work queue scalability.  Messages queues
are refactored to operate in parallel across multiple connections,
which improves response times and avoids timeouts.

The entire set can be viewed in the rfc here:
https://lore.kernel.org/netdev/20251022191715.157755-1-achender@kernel.org/

Questions, comments, flames appreciated!
====================

Link: https://patch.msgid.link/20260109224843.128076-1-achender@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/rds: Give each connection path its own workqueue

RDS was written to require ordered workqueues for "cp->cp_wq":
Work is executed in the order scheduled, one item at a time.

If these workqueues are shared across connections,
then work executed on behalf of one connection blocks work
scheduled for a different and unrelated connection.

Luckily we don't need to share these workqueues.
While it obviously makes sense to limit the number of
workers (processes) that ought to be allocated on a system,
a workqueue that doesn't have a rescue worker attached,
has a tiny footprint compared to the connection as a whole:
A workqueue costs ~900 bytes, including the workqueue_struct,
pool_workqueue, workqueue_attrs, wq_node_nr_active and the
node_nr_active flex array. Each connection can have up to 8
(RDS_MPATH_WORKERS) paths for a worst case of ~7 KBytes per
connection. While an RDS/IB connection totals only ~5 MBytes.

So we're getting a signficant performance gain
(90% of connections fail over under 3 seconds vs. 40%)
for a less than 0.02% overhead.

RDS doesn't even benefit from the additional rescue workers:
of all the reasons that RDS blocks workers, allocation under
memory pressue is the least of our concerns. And even if RDS
was stalling due to the memory-reclaim process, the work
executed by the rescue workers are highly unlikely to free up
any memory. If anything, they might try to allocate even more.

By giving each connection path its own workqueues, we allow
RDS to better utilize the unbound workers that the system
has available.

Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20260109224843.128076-3-achender@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/rds: Add per cp work queue

This patch adds a per connection workqueue which can be initialized
and used independently of the globally shared rds_wq.

This patch is the first in a series that aims to address tcp ack
timeouts during the tcp socket shutdown sequence.

This initial refactoring lays the ground work needed to alleviate
queue congestion during heavy reads and writes. The independently
managed queues will allow shutdowns and reconnects respond more quickly
before the peer(s) timeout waiting for the proper acks.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20260109224843.128076-2-achender@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'multi-queue-aware-sch_cake'

says:

====================
Multi-queue aware sch_cake

This series adds a multi-queue aware variant of the sch_cake scheduler,
called 'cake_mq'. Using this makes it possible to scale the rate shaper
of sch_cake across multiple CPUs, while still enforcing a single global
rate on the interface.

The approach taken in this patch series is to implement a separate qdisc
called 'cake_mq', which is based on the existing 'mq' qdisc, but differs
in a couple of aspects:

- It will always install a cake instance on each hardware queue (instead
  of using the default qdisc for each queue like 'mq' does).

- The cake instances on the queues will share their configuration, which
  can only be modified through the parent cake_mq instance.

Doing things this way simplifies user configuration by centralising
all configuration through the cake_mq qdisc (which also serves as an
obvious way of opting into the multi-queue aware behaviour). The cake_mq
qdisc takes all the same configuration parameters as the cake qdisc.

An earlier version of this work was presented at this year's Netdevconf:
https://netdevconf.info/0x19/sessions/talk/mq-cake-scaling-software-rate-limiting-across-cpu-cores.html

The patch series is structured as follows:

- Patch 1 exports the mq qdisc functions for reuse.

- Patch 2 factors out the sch_cake configuration variables into a
  separate struct that can be shared between instances.

- Patch 3 adds the basic cake_mq qdisc, reusing the exported mq code

- Patch 4 adds configuration sharing across the cake instances installed
  under cake_mq

- Patch 5 adds the shared shaper state that enables the multi-core rate
  shaping

- Patch 6 adds selftests for cake_mq

A patch to iproute2 to make it aware of the cake_mq qdisc were submitted
separately with a previous patch version:

https://lore.kernel.org/r/20260105162902.1432940-1-toke@redhat.com
====================

Link: https://patch.msgid.link/20260109-mq-cake-sub-qdisc-v8-0-8d613fece5d8@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests/tc-testing: add selftests for cake_mq qdisc

Test 684b: Create CAKE_MQ with default setting (4 queues)
Test 7ee8: Create CAKE_MQ with bandwidth limit (4 queues)
Test 1f87: Create CAKE_MQ with rtt time (4 queues)
Test e9cf: Create CAKE_MQ with besteffort flag (4 queues)
Test 7c05: Create CAKE_MQ with diffserv8 flag (4 queues)
Test 5a77: Create CAKE_MQ with diffserv4 flag (4 queues)
Test 8f7a: Create CAKE_MQ with flowblind flag (4 queues)
Test 7ef7: Create CAKE_MQ with dsthost and nat flag (4 queues)
Test 2e4d: Create CAKE_MQ with wash flag (4 queues)
Test b3e6: Create CAKE_MQ with flowblind and no-split-gso flag (4 queues)
Test 62cd: Create CAKE_MQ with dual-srchost and ack-filter flag (4 queues)
Test 0df3: Create CAKE_MQ with dual-dsthost and ack-filter-aggressive flag (4 queues)
Test 9a75: Create CAKE_MQ with memlimit and ptm flag (4 queues)
Test cdef: Create CAKE_MQ with fwmark and atm flag (4 queues)
Test 93dd: Create CAKE_MQ with overhead 0 and mpu (4 queues)
Test 1475: Create CAKE_MQ with conservative and ingress flag (4 queues)
Test 7bf1: Delete CAKE_MQ with conservative and ingress flag (4 queues)
Test ee55: Replace CAKE_MQ with mpu (4 queues)
Test 6df9: Change CAKE_MQ with mpu (4 queues)
Test 67e2: Show CAKE_MQ class (4 queues)
Test 2de4: Change bandwidth of CAKE_MQ (4 queues)
Test 5f62: Fail to create CAKE_MQ with autorate-ingress flag (4 queues)
Test 038e: Fail to change setting of sub-qdisc under CAKE_MQ
Test 7bdc: Fail to replace sub-qdisc under CAKE_MQ
Test 18e0: Fail to install CAKE_MQ on single queue device

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jonas Köppeler <j.koeppeler@tu-berlin.de>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20260109-mq-cake-sub-qdisc-v8-6-8d613fece5d8@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/sched: sch_cake: share shaper state across sub-instances of cake_mq

This commit adds shared shaper state across the cake instances beneath a
cake_mq qdisc. It works by periodically tracking the number of active
instances, and scaling the configured rate by the number of active
queues.

The scan is lockless and simply reads the qlen and the last_active state
variable of each of the instances configured beneath the parent cake_mq
instance. Locking is not required since the values are only updated by
the owning instance, and eventual consistency is sufficient for the
purpose of estimating the number of active queues.

The interval for scanning the number of active queues is set to 200 us.
We found this to be a good tradeoff between overhead and response time.
For a detailed analysis of this aspect see the Netdevconf talk:

https://netdevconf.info/0x19/docs/netdev-0x19-paper16-talk-paper.pdf

Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jonas Köppeler <j.koeppeler@tu-berlin.de>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20260109-mq-cake-sub-qdisc-v8-5-8d613fece5d8@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/sched: sch_cake: Share config across cake_mq sub-qdiscs

This adds support for configuring the cake_mq instance directly, sharing
the config across the cake sub-qdiscs.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20260109-mq-cake-sub-qdisc-v8-4-8d613fece5d8@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/sched: sch_cake: Add cake_mq qdisc for using cake on mq devices

Add a cake_mq qdisc which installs cake instances on each hardware
queue on a multi-queue device.

This is just a copy of sch_mq that installs cake instead of the default
qdisc on each queue. Subsequent commits will add sharing of the config
between cake instances, as well as a multi-queue aware shaper algorithm.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20260109-mq-cake-sub-qdisc-v8-3-8d613fece5d8@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/sched: sch_cake: Factor out config variables into separate struct

Factor out all the user-configurable variables into a separate struct
and embed it into struct cake_sched_data. This is done in preparation
for sharing the configuration across multiple instances of cake in an mq
setup.

No functional change is intended with this patch.

Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20260109-mq-cake-sub-qdisc-v8-2-8d613fece5d8@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/sched: Export mq functions for reuse

To enable the cake_mq qdisc to reuse code from the mq qdisc, export a
bunch of functions from sch_mq. Split common functionality out from some
functions so it can be composed with other code, and export other
functions wholesale. To discourage wanton reuse, put the symbols into a
new NET_SCHED_INTERNAL namespace, and a sch_priv.h header file.

No functional change intended.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20260109-mq-cake-sub-qdisc-v8-1-8d613fece5d8@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'r8169-add-dash-and-ltr-support'

Javen Xu says:

====================
r8169: add dash and LTR support

From: Javen Xu <javen_xu@realsil.com.cn>

This series patch adds dash support for RTL8127AP and LTR support for
RTL8168FP/RTL8168EP/RTL8168H/RTL8125/RTL8126/RTL8127.
====================

Link: https://patch.msgid.link/20260109070415.1115-1-javen_xu@realsil.com.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

r8169: enable LTR support

This patch will enable
RTL8168FP/RTL8168EP/RTL8168H/RTL8125/RTL8126/RTL8127 LTR support.

Signed-off-by: Javen Xu <javen_xu@realsil.com.cn>
Link: https://patch.msgid.link/20260109070415.1115-3-javen_xu@realsil.com.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

r8169: add DASH support for RTL8127AP

This adds DASH support for chip RTL8127AP. Its mac version is
RTL_GIGA_MAC_VER_80 and revision id is 0x04. DASH is a standard for
remote management of network device, allowing out-of-band control.

Signed-off-by: Javen Xu <javen_xu@realsil.com.cn>
Link: https://patch.msgid.link/20260109070415.1115-2-javen_xu@realsil.com.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: Add IFC bits for extended ETS rate limit bandwidth value

Add hardware interface definitions to support extended bandwidth rate
limiting in the QoS Enhanced Transmission Selection (ETS) configuration.

The new fields include:
- max_bw_value: extended from 8-bit to 16-bit in ets_tcn_config_reg,
  simplifying the implementation by using a single field instead of
  separate MSB/LSB fields.
- qetcr_qshr_max_bw_val_msb: capability bit in qcam_qos_feature_cap_mask
  indicating device support for the extended 16-bit max_bw_value field.

These interface additions are prerequisites for increasing the per-TC
rate limit beyond 255 Gbps to support higher-bandwidth NICs.

Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768200608-1543180-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

Merge branch 'net-stmmac-pcs-clean-up-pcs-interrupt-handling'

Russell King says:

====================
net: stmmac: pcs: clean up pcs interrupt handling

Clean up the stmmac PCS interrupt handling:

- Avoid promotion to unsigned long from unsigned int by defining PCS
  register bits/fields using u32 macros.
- Pass struct stmmac_priv into the host_irq_status MAC core method.
- Move the existing PCS interrupt handler (dwmac_pcs_isr) into
  stmmac_pcs.c, change it's arguments, use dev_info() rather than
  pr_info()
- arrange to call phylink_pcs_change() on link state changes.
====================

Link: https://patch.msgid.link/aWOiOfDQkMXDwtPp@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: report PCS link changes to phylink

Report PCS link changes to phylink, which will allow phylink's inband
support to respoind to link events once the PCS is appropriately
configured.

An expected behavioural change is that should the PCS report that its
link has failed, but phylink is operating in outband mode and the PHY
reports that link is up, this event will cause the netdev's link to
momentarily drop, making the event more noticable, rather than just
producing a "stmmac_pcs: Link Down" message.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vevI1-00000002Yp8-3cM3@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: change arguments to PCS handler and use dev_info()

Change the arguments to the PCS handler so that it can access the
struct device pointer and integrated PCS pointers.

This allows us to use the PCS register offset stored in struct
stmmac_pcs rather than passing it into the function, and also allows
the messages to be printed using dev_info() rather than pr_info(),
thereby allowing the stmmac instance to be identified.

Finally, as dev_info() identifies the driver/device, prefixing with
"stmmac_pcs: " is now redundant, so replace this with just "PCS ".

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vevHw-00000002Yoz-35A7@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: pass struct stmmac_priv to host_irq_status() method

Rather than passing struct mac_device_info to the host_irq_status()
method, pass struct stmmac_priv so that we can pass the integrated
PCS to the PCS interrupt handler.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vevHr-00000002YoY-2X2i@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: move and rename dwmac_pcs_isr()

dwmac_pcs_isr() doesn't need to be inlined into the MAC's
host_irq_status method, as handling PCS interrupts isn't performance
critical. However, there is little point calling this function unless
an interrupt is pending for the PCS.

Rename it to stmmac_integrated_pcs_irq() while moving it.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vevHm-00000002YoS-23RX@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: use BIT_U32() and GENMASK_U32() for PCS registers

stmmac registers a 32-bit. u32 is unsigned int. The use of BIT() and
GENMASK() leads to integer promotion to unsigned long in expressions
such as:

u32 old = foo;

dev_info(dev, "%08x %08x\n", old, old & BIT(1));

resulting in arg2 being accepted as compatible with the format string
and arg3 warning that the argument does not match (because the former
is unsigned int, and the latter is unsigned long.)

Fix this by defining 32-bit register bits using BIT_U32() and
GENMASK_U32() macros.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vevHh-00000002YoM-1TYL@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: add skbuff_clear() helper

clang is unable to inline the memset() calls in net/core/skbuff.c
when initializing allocated sk_buff.

memset(skb, 0, offsetof(struct sk_buff, tail));

This is unfortunate, because:

1) calling external memset_orig() helper adds a call/ret and
   typical setup cost.

2) offsetof(struct sk_buff, tail) == 0xb8 = 0x80 + 0x38

   On x86_64, memset_orig() performs two 64 bytes clear,
   then has to loop 7 times to clear the final 56 bytes.

skbuff_clear() makes sure the minimal and optimal code
is generated.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260109203836.1667441-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'r8169-add-support-for-rtl8127atf-10g-fiber-sfp'

Heiner Kallweit says:

====================
r8169: add support for RTL8127ATF (10G Fiber SFP)

RTL8127ATF supports a SFP+ port for fiber modules (10GBASE-SR/LR/ER/ZR and
DAC). The list of supported modes was provided by Realtek. According to the
r8127 vendor driver also 1G modules are supported, but this needs some more
complexity in the driver, and only 10G mode has been tested so far.
Therefore mainline support will be limited to 10G for now.
The SFP port signals are hidden in the chip IP and driven by firmware.
Therefore mainline SFP support can't be used here.
The PHY driver is used by the RTL8127ATF support in r8169.
RTL8127ATF reports the same PHY ID as the TP version. Therefore use a dummy
PHY ID.
====================

Link: https://patch.msgid.link/c2ad7819-85f5-4df8-8ecf-571dbee8931b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8169: add support for RTL8127ATF (Fiber SFP)

RTL8127ATF supports a SFP+ port for fiber modules (10GBASE-SR/LR/ER/ZR and
DAC). The list of supported modes was provided by Realtek. According to the
r8127 vendor driver also 1G modules are supported, but this needs some more
complexity in the driver, and only 10G mode has been tested so far.
Therefore mainline support will be limited to 10G for now.
The SFP port signals are hidden in the chip IP and driven by firmware.
Therefore mainline SFP support can't be used here.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Fabio Baltieri <fabio.baltieri@gmail.com>
Link: https://patch.msgid.link/5c390273-458f-4d92-896b-3d85f2998d7d@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: add dummy PHY driver for RTL8127ATF

RTL8127ATF supports a SFP+ port for fiber modules (10GBASE-SR/LR/ER/ZR and
DAC). The list of supported modes was provided by Realtek. According to the
r8127 vendor driver also 1G modules are supported, but this needs some more
complexity in the driver, and only 10G mode has been tested so far.
Therefore mainline support will be limited to 10G for now.
The SFP port signals are hidden in the chip IP and driven by firmware.
Therefore mainline SFP support can't be used here.
This PHY driver is used by the RTL8127ATF support in r8169.
RTL8127ATF reports the same PHY ID as the TP version. Therefore use a dummy
PHY ID. This PHY driver is used by the RTL8127ATF support in r8169.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/e3d55162-210a-4fab-9abf-99c6954eee10@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mctp-i2c: fix duplicate reception of old data

The MCTP I2C slave callback did not handle I2C_SLAVE_READ_REQUESTED
events. As a result, i2c read event will trigger repeated reception of
old data, reset rx_pos when a read request is received.

Signed-off-by: Jian Zhang <zhangjian.3032@bytedance.com>
Link: https://patch.msgid.link/20260108101829.1140448-1-zhangjian.3032@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'add-dwmac-glue-driver-for-motorcomm-yt6801'

Yao Zi says:

====================
Add DWMAC glue driver for Motorcomm YT6801

This series adds glue driver for Motorcomm YT6801 PCIe ethernet
controller, which is considered mostly compatible with DWMAC-4 IP by
inspecting the register layout[1]. It integrates a Motorcomm YT8531S PHY
(confirmed by reading PHY ID) and GMII is used to connect the PHY to
MAC[2].

The initialization logic of the MAC is mostly based on previous upstream
effort for the controller[3] and the Deepin-maintained downstream Linux
driver[4] licensed under GPL-2.0 according to its SPDX headers. However,
this series is a completely re-write of the previous patch series,
utilizing the existing DWMAC4 driver and introducing a glue driver only.

This series only aims to add basic networking functions for the
controller, features like WoL, RSS and LED control are omitted for now.
Testing is done on i3-4170, it reaches 939Mbps (TX)/933Mbps (RX) on
average,

YT6801 TX

Connecting to host 192.168.114.51, port 5201
[  5] local 192.168.114.50 port 52986 connected to 192.168.114.51 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   112 MBytes   938 Mbits/sec    0    950 KBytes
[  5]   1.00-2.00   sec   113 MBytes   949 Mbits/sec    0   1.08 MBytes
[  5]   2.00-3.00   sec   112 MBytes   938 Mbits/sec    0   1.08 MBytes
[  5]   3.00-4.00   sec   111 MBytes   932 Mbits/sec    0   1.13 MBytes
[  5]   4.00-5.00   sec   113 MBytes   945 Mbits/sec    0   1.13 MBytes
[  5]   5.00-6.00   sec   112 MBytes   936 Mbits/sec    0   1.13 MBytes
[  5]   6.00-7.00   sec   112 MBytes   942 Mbits/sec    0   1.19 MBytes
[  5]   7.00-8.00   sec   112 MBytes   935 Mbits/sec    0   1.19 MBytes
[  5]   8.00-9.00   sec   113 MBytes   948 Mbits/sec    0   1.19 MBytes
[  5]   9.00-10.00  sec   111 MBytes   931 Mbits/sec    0   1.19 MBytes

YT6801 RX

Connecting to host 192.168.114.50, port 5201
[  5] local 192.168.114.51 port 41578 connected to 192.168.114.50 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   113 MBytes   944 Mbits/sec    0    542 KBytes
[  5]   1.00-2.00   sec   111 MBytes   934 Mbits/sec    0    850 KBytes
[  5]   2.00-3.00   sec   111 MBytes   933 Mbits/sec    0   1.01 MBytes
[  5]   3.00-4.00   sec   112 MBytes   943 Mbits/sec    0   1.01 MBytes
[  5]   4.00-5.00   sec   111 MBytes   932 Mbits/sec    0   1.01 MBytes
[  5]   5.00-6.00   sec   111 MBytes   929 Mbits/sec    0   1.01 MBytes
[  5]   6.00-7.00   sec   112 MBytes   937 Mbits/sec    0   1.01 MBytes
[  5]   7.00-8.00   sec   112 MBytes   941 Mbits/sec    0   1.01 MBytes
[  5]   8.00-9.00   sec   111 MBytes   929 Mbits/sec    0   1.01 MBytes
[  5]   9.00-10.00  sec   111 MBytes   932 Mbits/sec    0   1.01 MBytes
====================

Link: https://patch.msgid.link/20260109093445.46791-2-me@ziyao.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: Assign myself as maintainer of Motorcomm DWMAC glue driver

I volunteer to maintain the DWMAC glue driver for Motorcomm ethernet
controllers.

Signed-off-by: Yao Zi <me@ziyao.cc>
Link: https://patch.msgid.link/20260109093445.46791-5-me@ziyao.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: Add glue driver for Motorcomm YT6801 ethernet controller

Motorcomm YT6801 is a PCIe ethernet controller based on DWMAC4 IP. It
integrates an GbE phy, supporting WOL, VLAN tagging and various types
of offloading. It ships an on-chip eFuse for storing various vendor
configuration, including MAC address.

This patch adds basic glue code for the controller, allowing it to be
set up and transmit data at a reasonable speed. Features like WOL could
be implemented in the future.

Signed-off-by: Yao Zi <me@ziyao.cc>
Tested-by: Mingcong Bai <jeffbai@aosc.io>
Tested-by: Runhua He <hua@aosc.io>
Tested-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Sai Krishna <saikrishnag@marvell.com>
Link: https://patch.msgid.link/20260109093445.46791-4-me@ziyao.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: motorcomm: Support YT8531S PHY in YT6801 Ethernet controller

YT6801's internal PHY is confirmed as a GMII-capable variant of YT8531S
by a previous series[1] and reading PHY ID. Add support for
PHY_INTERFACE_MODE_GMII for YT8531S to allow the Ethernet driver to
reuse the PHY code for its internal PHY.

Link: https://lore.kernel.org/all/a48d76ac-db08-46d5-9528-f046a7b541dc@motor-comm.com/
Co-developed-by: Frank Sae <Frank.Sae@motor-comm.com>
Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com>
Signed-off-by: Yao Zi <me@ziyao.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260109093445.46791-3-me@ziyao.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net/ipsec: Fix variable size type not at the end of struct

The "struct alg" object contains a union of 3 xfrm structures:

union {
struct xfrm_algo;
struct xfrm_algo_aead;
struct xfrm_algo_auth;
}

All of them end with a flexible array member used to store key material,
but the flexible array appears at *different offsets* in each struct.
bcz of this, union itself is of variable-sized & Placing it above
char buf[...] triggers:

ipsec.c:835:5: warning: field 'u' with variable sized type 'union
(unnamed union at ipsec.c:831:3)' not at the end of a struct or class
is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
835 | } u;
| ^

one fix is to use "TRAILING_OVERLAP()" which works with one flexible
array member only.

But In "struct alg" flexible array member exists in all union members,
but not at the same offset, so TRAILING_OVERLAP cannot be applied.

so the fix is to explicitly overlay the key buffer at the correct offset
for the largest union member (xfrm_algo_auth). This ensures that the
flexible-array region and the fixed buffer line up.

No functional change.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Link: https://patch.msgid.link/20260109152201.15668-1-ankitkhushwaha.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipconfig: Remove outdated comment and indent code block

The comment has been around ever since commit 1da177e4c3f4
("Linux-2.6.12-rc2") and can be removed. Remove it and indent the code
block accordingly.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20260109121128.170020-2-thorsten.blum@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-stmmac-cleanups-and-low-priority-fixes'

Russell King says:

====================
net: stmmac: cleanups and low priority fixes

Further cleanups and a few low priority fixes:

- Remove duplicated register definitions from header files
- Fix harmless wrong definition used for PTP message type in
  descriptors
- Fix norm_set_tx_desc_len_on_ring() off-by-one error (and make
  enh_set_tx_desc_len_on_ring() follow a similar pattern.)
  Document the buffer size limits. I believe we never call
  norm_set_tx_desc_len_on_ring() with 2KiB lengths.
- use u32 rather than unsigned int for 32-bit quantities in
  descriptors
- modernise: convert to use FIELD_PREP() rather than separate mask
  and shift definitions.
- Reorganise register and register field definitions: registers
  defined in address offset order followed by their register field
  definitions.
- Remove lots of unused register definitions.
====================

Link: https://patch.msgid.link/aV_q2Kneinrk3Z-W@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: remove unused definitions

Potentially unused definitions were discovered using:

$ for m in $(grep '#define ' $header | sed -e 's,#define[ ]*$[^ ]*$[ ].*,\1,;s,(.*,,'); do if ! grep -q $m *.c; then echo $m; fi; done

Each was verified, and then removed where truly unused.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vdtwI-00000002Gu6-1HYu@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: arrange register fields after register offsets

Arrange the register fields to be after their corresponding register
offset definitions, which groups all the definitions for a register
together.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vdtwD-00000002Gu0-0nTN@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: cores: remove many xxx_SHIFT definitions

We have many xxx_SHIFT definitions along side their corresponding
xxx_MASK definitions for the various cores. Manually using the
shift and mask can be error prone, as shown with the dwmac4 RXFSTS
fix patch.

Convert sites that use xxx_SHIFT and xxx_MASK directly to use
FIELD_GET(), FIELD_PREP(), and u32_replace_bits() as appropriate.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vdtw8-00000002Gtu-0Hyu@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: descs: remove many xxx_SHIFT definitions

Remove many xxx_SHIFT definitions for descriptors, isntead using
FIELD_PREP(), FIELD_GET(), and u32_replace_bits() as appropriate to
manipulate the bitfields. This avoids potential errors where an
incorrect shift is used with a mask.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vdtw2-00000002Gto-3ZPt@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: descs: use u32 for descriptors

Use u32 rather than unsigned int for 32-bit descriptor variables.
This will allow the u32 bitfield helpers to be used. Note, we use
__le32 for the in-memory descriptor structures.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vdtvx-00000002Gth-32RU@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: descs: fix buffer 1 off-by-one error

norm_set_tx_desc_len_on_ring() incorrectly tests the buffer length,
leading to a length of 2048 being squeezed into a bitfield covering
bits 10:0 - which results in the buffer 1 size being zero.

If this field is zero, buffer 1 is ignored, and thus is equivalent to
transmitting a zero length buffer.

The path to norm_set_tx_desc_len_on_ring() is only possible when the
hardware does not support enhanced descriptors (plat->enh_desc clear)
which is dependent on the hardware.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vdtvs-00000002Gtb-2U9G@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac4: fix PTP message type field extraction

In dwmac4_wrback_get_rx_status(), the code extracts the PTP message
type from receive descriptor 1 using the dwmac enhanced descriptor
definitions:

message_type = (rdes1 & ERDES4_MSG_TYPE_MASK) >> 8;

This is defined as:

#define ERDES4_MSG_TYPE_MASK GENMASK(11, 8)

The correct definition is RDES1_PTP_MSG_TYPE_MASK, which is also
defined as:

#define RDES1_PTP_MSG_TYPE_MASK GENMASK(11, 8)

Use the correct definition, converting to use FIELD_GET() to extract
it without needing an open-coded shift right that is dependent on the
mask definition.

As this change has no effect on the generated code, there is no need
to treat this as a bug fix.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vdtvn-00000002GtV-1wCS@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac4: fix RX FIFO fill statistics

In dwmac4_debug(), the wrong shift is used with the RXFSTS mask:

#define MTL_DEBUG_RXFSTS_MASK          GENMASK(5, 4)
#define MTL_DEBUG_RXFSTS_SHIFT         4
#define MTL_DEBUG_RRCSTS_SHIFT         1

                       u32 rxfsts = (value & MTL_DEBUG_RXFSTS_MASK)
                                    >> MTL_DEBUG_RRCSTS_SHIFT;

where rxfsts is tested against small integers 1 .. 3. This results in
the tests always failing, causing the "mtl_rx_fifo__fill_level_empty"
statistic counter to always be incremented no matter what the fill
level actually is.

Fix this by using FIELD_GET() and remove the unnecessary
MTL_DEBUG_RXFSTS_SHIFT definition as FIELD_GET() will shift according
to the least siginificant set bit in the supplied field mask.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vdtvi-00000002GtP-1Os1@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac4: remove duplicated definitions

dwmac4.h duplicates some of the debug register definitions. Remove
the second copy.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vdtvd-00000002GtJ-0qFI@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devmem: convert binding refcount to percpu_ref

Convert net_devmem_dmabuf_binding refcount from refcount_t to percpu_ref
to optimize common-case reference counting on the hot path.

The typical devmem workflow involves binding a dmabuf to a queue
(acquiring the initial reference on binding->ref), followed by
high-volume traffic where every skb fragment acquires a reference.
Eventually traffic stops and the unbind operation releases the initial
reference. Additionally, the high traffic hot path is often multi-core.
This access pattern is ideal for percpu_ref as the first and last
reference during bind/unbind normally book-ends activity in the hot
path.

__net_devmem_dmabuf_binding_free becomes the percpu_ref callback invoked
when the last reference is dropped.

kperf test:
- 4MB message sizes
- 60s of workload each run
- 5 runs
- 4 flows

Throughput:
Before: 45.31 GB/s (+/- 3.17 GB/s)
After: 48.67 GB/s (+/- 0.01 GB/s)

Picking throughput-matched kperf runs (both before and after matched at
~48 GB/s) for apples-to-apples comparison:

Summary (averaged across 4 workers):

  TX worker CPU idle %:
    Before: 34.44%
    After: 87.13%

  RX worker CPU idle %:
    Before: 5.38%
    After: 9.73%

kperf before:

client: == Source
client:   Tx 98.100 Gbps (735764807680 bytes in 60001149 usec)
client:   Tx102.798 Gbps (770996961280 bytes in 60001149 usec)
client:   Tx101.534 Gbps (761517834240 bytes in 60001149 usec)
client:   Tx 82.794 Gbps (620966707200 bytes in 60001149 usec)
client:   net CPU 56: usr: 0.01% sys: 0.12% idle:17.06% iow: 0.00% irq: 9.89% sirq:72.91%
client:   app CPU 60: usr: 0.08% sys:63.30% idle:36.24% iow: 0.00% irq: 0.30% sirq: 0.06%
client:   net CPU 57: usr: 0.03% sys: 0.08% idle:75.68% iow: 0.00% irq: 2.96% sirq:21.23%
client:   app CPU 61: usr: 0.06% sys:67.67% idle:31.94% iow: 0.00% irq: 0.28% sirq: 0.03%
client:   net CPU 58: usr: 0.01% sys: 0.06% idle:76.87% iow: 0.00% irq: 2.84% sirq:20.19%
client:   app CPU 62: usr: 0.06% sys:69.78% idle:29.79% iow: 0.00% irq: 0.30% sirq: 0.05%
client:   net CPU 59: usr: 0.06% sys: 0.16% idle:74.97% iow: 0.00% irq: 3.76% sirq:21.03%
client:   app CPU 63: usr: 0.06% sys:59.82% idle:39.80% iow: 0.00% irq: 0.25% sirq: 0.05%
client: == Target
client:   Rx 98.092 Gbps (735764807680 bytes in 60006084 usec)
client:   Rx102.785 Gbps (770962161664 bytes in 60006084 usec)
client:   Rx101.523 Gbps (761499566080 bytes in 60006084 usec)
client:   Rx 82.783 Gbps (620933136384 bytes in 60006084 usec)
client:   net CPU  2: usr: 0.00% sys: 0.01% idle:24.51% iow: 0.00% irq: 1.67% sirq:73.79%
client:   app CPU  6: usr: 1.51% sys:96.43% idle: 1.13% iow: 0.00% irq: 0.36% sirq: 0.55%
client:   net CPU  1: usr: 0.00% sys: 0.01% idle:25.18% iow: 0.00% irq: 1.99% sirq:72.80%
client:   app CPU  5: usr: 2.21% sys:94.54% idle: 2.54% iow: 0.00% irq: 0.38% sirq: 0.30%
client:   net CPU  3: usr: 0.00% sys: 0.01% idle:26.34% iow: 0.00% irq: 2.12% sirq:71.51%
client:   app CPU  7: usr: 2.22% sys:94.28% idle: 2.52% iow: 0.00% irq: 0.59% sirq: 0.37%
client:   net CPU  0: usr: 0.00% sys: 0.03% idle: 0.00% iow: 0.00% irq:10.44% sirq:89.51%
client:   app CPU  4: usr: 2.39% sys:81.46% idle:15.33% iow: 0.00% irq: 0.50% sirq: 0.30%

kperf after:

client: == Source
client:   Tx 99.257 Gbps (744447016960 bytes in 60001303 usec)
client:   Tx101.013 Gbps (757617131520 bytes in 60001303 usec)
client:   Tx 88.179 Gbps (661357854720 bytes in 60001303 usec)
client:   Tx101.002 Gbps (757533245440 bytes in 60001303 usec)
client:   net CPU 56: usr: 0.00% sys: 0.01% idle: 6.22% iow: 0.00% irq: 8.68% sirq:85.06%
client:   app CPU 60: usr: 0.08% sys:12.56% idle:87.21% iow: 0.00% irq: 0.08% sirq: 0.05%
client:   net CPU 57: usr: 0.00% sys: 0.05% idle:69.53% iow: 0.00% irq: 2.02% sirq:28.38%
client:   app CPU 61: usr: 0.11% sys:13.40% idle:86.36% iow: 0.00% irq: 0.08% sirq: 0.03%
client:   net CPU 58: usr: 0.00% sys: 0.03% idle:70.04% iow: 0.00% irq: 3.38% sirq:26.53%
client:   app CPU 62: usr: 0.10% sys:11.46% idle:88.31% iow: 0.00% irq: 0.08% sirq: 0.03%
client:   net CPU 59: usr: 0.01% sys: 0.06% idle:71.18% iow: 0.00% irq: 1.97% sirq:26.75%
client:   app CPU 63: usr: 0.10% sys:13.10% idle:86.64% iow: 0.00% irq: 0.10% sirq: 0.05%
client: == Target
client:   Rx 99.250 Gbps (744415182848 bytes in 60003297 usec)
client:   Rx101.006 Gbps (757589737472 bytes in 60003297 usec)
client:   Rx 88.171 Gbps (661319475200 bytes in 60003297 usec)
client:   Rx100.996 Gbps (757514792960 bytes in 60003297 usec)
client:   net CPU  2: usr: 0.00% sys: 0.01% idle:28.02% iow: 0.00% irq: 1.95% sirq:70.00%
client:   app CPU  6: usr: 2.03% sys:87.20% idle:10.04% iow: 0.00% irq: 0.37% sirq: 0.33%
client:   net CPU  3: usr: 0.00% sys: 0.00% idle:27.63% iow: 0.00% irq: 1.90% sirq:70.45%
client:   app CPU  7: usr: 1.78% sys:89.70% idle: 7.79% iow: 0.00% irq: 0.37% sirq: 0.34%
client:   net CPU  0: usr: 0.00% sys: 0.01% idle: 0.00% iow: 0.00% irq: 9.96% sirq:90.01%
client:   app CPU  4: usr: 2.33% sys:83.51% idle:13.24% iow: 0.00% irq: 0.64% sirq: 0.26%
client:   net CPU  1: usr: 0.00% sys: 0.01% idle:27.60% iow: 0.00% irq: 1.94% sirq:70.43%
client:   app CPU  5: usr: 1.88% sys:89.61% idle: 7.86% iow: 0.00% irq: 0.35% sirq: 0.27%

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260107-upstream-precpu-ref-v2-v2-1-a709f098b3dc@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2026-01-09 (ice, ixgbe, idpf)

For ice:
Grzegorz commonizes firmware loading process across all ice devices.

Michal adjusts default queue allocation to be based on
netif_get_num_default_rss_queues() rather than num_online_cpus().

For ixgbe:
Birger Koblitz adds support for 10G-BX modules.

For idpf:
Sreedevi converts always successful function to return void.

Andy Shevchenko fixes kdocs for missing 'Return:' in idpf_txrx.c file.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  idpf: Fix kernel-doc descriptions to avoid warnings
  idpf: update idpf_up_complete() return type to void
  ice: use netif_get_num_default_rss_queues()
  ixgbe: Add 10G-BX support
  ice: unify PHY FW loading status handler for E800 devices
====================

Link: https://patch.msgid.link/20260109210647.3849008-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'wireless-next-2026-01-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
First set of changes for the current -next cycle, of note:

- ath12k gets an overhaul to support multi-wiphy device
   wiphy and pave the way for future device support in
   the same driver (rather than splitting to ath13k)

- mac80211 gets some better iteration macros

* tag 'wireless-next-2026-01-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (120 commits)
  wifi: mac80211: remove width argument from ieee80211_parse_bitrates
  wifi: mac80211_hwsim: remove NAN by default
  wifi: mac80211: improve station iteration ergonomics
  wifi: mac80211: improve interface iteration ergonomics
  wifi: cfg80211: include S1G_NO_PRIMARY flag when sending channel
  wifi: mac80211: unexport ieee80211_get_bssid()
  wl1251: Replace strncpy with strscpy in wl1251_acx_fw_version
  wifi: iwlegacy: 3945-rs: remove redundant pointer check in il3945_rs_tx_status() and il3945_rs_get_rate()
  wifi: mac80211: don't send an unused argument to ieee80211_check_combinations
  wifi: libertas: fix WARNING in usb_tx_block
  wifi: mwifiex: Allocate dev name earlier for interface workqueue name
  wifi: wlcore: sdio: Use pm_ptr instead of #ifdef CONFIG_PM
  wifi: cfg80211: Fix use_for flag update on BSS refresh
  wifi: brcmfmac: rename function that frees vif
  wifi: brcmfmac: fix/add kernel-doc comments
  wifi: mac80211: Update csa_finalize to use link_id
  wifi: cfg80211: add cfg80211_stop_link() for per-link teardown
  wifi: ath12k: Skip DP peer creation for scan vdev
  wifi: ath12k: move firmware stats request outside of atomic context
  wifi: ath12k: add the missing RCU lock in ath12k_dp_tx_free_txbuf()
  ...
====================

Link: https://patch.msgid.link/20260112185836.378736-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'tools-ynl-cli-improve-the-help-and-doc'

Jakub Kicinski says:

====================
tools: ynl: cli: improve the help and doc

I had some time on the plane to LPC, so here are improvements
to the --help and --list-attrs handling of YNL CLI which seem
in order given growing use of YNL as a real CLI tool.
====================

Link: https://patch.msgid.link/20260110233142.3921386-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: cli: print reply in combined format if possible

As pointed out during review of the --list-attrs support the GET
ops very often return the same attrs from do and dump. Make the
output more readable by combining the reply information, from:

  Do request attributes:
    - ifindex: u32
      netdev ifindex

  Do reply attributes:
    - ifindex: u32
      netdev ifindex
    [ .. other attrs .. ]

  Dump reply attributes:
    - ifindex: u32
      netdev ifindex
    [ .. other attrs .. ]

To, after:

  Do request attributes:
    - ifindex: u32
      netdev ifindex

  Do and Dump reply attributes:
    - ifindex: u32
      netdev ifindex
    [ .. other attrs .. ]

Tested-by: Gal Pressman <gal@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260110233142.3921386-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: cli: extract the event/notify handling in --list-attrs

Event and notify handling is quite different from do / dump
handling. Forcing it into print_mode_attrs() doesn't really
buy us anything as events and notifications do not have requests.
Call print_attr_list() directly. Apart form subjective code
clarity this also removes the word "reply" from the output:

Before:

Event reply attributes:

Now:

Event attributes:

Tested-by: Gal Pressman <gal@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260110233142.3921386-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: cli: factor out --list-attrs / --doc handling

We'll soon add more code to the --doc handling. Factor it out
to avoid making main() too long.

Tested-by: Gal Pressman <gal@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260110233142.3921386-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: cli: add --doc as alias to --list-attrs

--list-attrs also provides information about the operation itself.
So --doc seems more appropriate. Add an alias.

Tested-by: Gal Pressman <gal@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260110233142.3921386-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: cli: improve --help

Improve the clarity of --help. Reorder, provide some grouping and
add help messages to most of the options.

No functional changes intended.

Tested-by: Gal Pressman <gal@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260110233142.3921386-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: cli: wrap the doc text if it's long

We already use textwrap when printing "doc" section about an attribute,
but only to indent the text. Switch to using fill() to split and indent
all the lines. While at it indent the text by 2 more spaces, so that it
doesn't align with the name of the attribute.

Before (I'm drawing a "box" at ~60 cols here, in an attempt for clarity):

|  - irq-suspend-timeout: uint                              |
|    The timeout, in nanoseconds, of how long to suspend irq|
|processing, if event polling finds events                  |

After:

|  - irq-suspend-timeout: uint                              |
|      The timeout, in nanoseconds, of how long to suspend  |
|      irq processing, if event polling finds events        |

Tested-by: Gal Pressman <gal@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260110233142.3921386-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: cli: introduce formatting for attr names in --list-attrs

It's a little hard to make sense of the output of --list-attrs,
it looks like a wall of text. Sprinkle a little bit of formatting -
make op and attr names bold, and Enum: / Flags: keywords italics.

Tested-by: Gal Pressman <gal@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260110233142.3921386-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>