git.ipfire.org Git - thirdparty/linux.git/log

dpll: zl3073x: report FFO as DPLL vs input reference offset

Replace the per-reference frequency offset measurement (which was
redundant with measured-frequency) with a direct read of the DPLL's
delta frequency offset vs its tracked input reference.

The new implementation uses the dpll_df_offset_x register with
ref_ofst=1 via the dpll_df_read_x semaphore mechanism. This
provides 2^-48 resolution (~3.5 fE) and reports the actual
frequency difference between the DPLL and its active input.

Switch supported_ffo from DPLL_FFO_PORT_RXTX_RATE to
DPLL_FFO_PIN_DEVICE so FFO is reported only in the per-parent
context for the active input pin.

Use atomic64_t for freq_offset to prevent torn reads on 32-bit
architectures between the periodic worker and netlink callbacks.

Rewrite ffo_check to compare the cached df_offset converted to PPT
instead of using the old per-reference measurement. Remove the
ref_ffo_update periodic measurement and the ref ffo field since
they are no longer needed.

Changes v3 -> v4:
- Switch to DPLL_FFO_PIN_DEVICE, remove dpll=NULL guard
- Use atomic64_t for freq_offset (torn read on 32-bit)

Reviewed-by: Petr Oros <poros@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260511155816.99936-3-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpll: add fractional frequency offset to pin-parent-device

Add both fractional-frequency-offset (PPM) and
fractional-frequency-offset-ppt (PPT) attributes to the
pin-parent-device nested attribute set, alongside the existing
top-level pin attributes. Both carry the same measurement at
different precisions.

Introduce enum dpll_ffo_type and struct dpll_ffo_param to
distinguish FFO contexts: DPLL_FFO_PORT_RXTX_RATE for the RX vs
TX symbol rate offset reported at the top level, and
DPLL_FFO_PIN_DEVICE for the pin vs parent DPLL offset reported
in the pin-parent-device nest.

Add a supported_ffo bitmask to struct dpll_pin_ops so drivers
declare which FFO types they support. The core only calls ffo_get
for types the driver has opted into, eliminating the need for
per-driver NULL pointer guards. Validate at pin registration time
that supported_ffo is not set without an ffo_get callback.

Update mlx5 (DPLL_FFO_PORT_RXTX_RATE) and zl3073x
(DPLL_FFO_PORT_RXTX_RATE) drivers to use the new API.

Add documentation for both FFO types to dpll.rst.

Changes v3 -> v4:
- Replace dpll=NULL overloading with enum dpll_ffo_type and
  struct dpll_ffo_param (Jakub Kicinski)
- Add supported_ffo opt-in bitmask in dpll_pin_ops for fail-close
  driver validation (Jakub Kicinski)
- Add WARN_ON in dpll_pin_register for supported_ffo without
  ffo_get callback

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260511155816.99936-2-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: realtek: rtl8365mb: add support for RTL8367SB

Add chip info entry for the Realtek RTL8367SB switch. This device has
chip ID 0x6367 and version 0x0010. It exposes two external interfaces:
port 6 supports MII, TMII, RMII, RGMII, SGMII and HSGMII, while port 7
supports MII, TMII, RMII and RGMII. Use the existing 8365MB-VC jam table
for initialization.

Reviewed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Signed-off-by: Mieczyslaw Nalewaj <namiltd@yahoo.com>
Link: https://patch.msgid.link/3c6d822b-0e85-4173-86ba-2badb140bbf1@yahoo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-use-ip_outnoroutes-drop-reason'

Eric Dumazet says:

====================
net: use IP_OUTNOROUTES drop reason

First patch changes sk_skb_reason_drop() sock to be const.

Second and last patch add SKB_DROP_REASON_IP_OUTNOROUTES
to both tcp_v6_send_response() and inet6_csk_xmit().
====================

Link: https://patch.msgid.link/20260511072310.1094859-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: use SKB_DROP_REASON_IP_OUTNOROUTES in inet6_csk_xmit()

Replace a bare kfree_skb() with a modern sk_skb_reason_drop() call,
and provide IP_OUTNOROUTES drop reason.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260511072310.1094859-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: use SKB_DROP_REASON_IP_OUTNOROUTES in tcp_v6_send_response()

Replace a bare kfree_skb() with a modern sk_skb_reason_drop() call,
and provide IP_OUTNOROUTES drop reason.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260511072310.1094859-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: constify sk_skb_reason_drop() sock parameter

sk_skb_reason_drop() does not change sock parameter, make it
const so that we can call it from TCP stack without a cast
on a (const) listener socket.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260511072310.1094859-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rtnetlink: add RTEXT_FILTER_NAME_ONLY support

iproute2 can spend considerable amount of time in ll_init_map()
or ll_link_get() to dump verbose netdev attributes, contributing
to RTNL pressure.

Add RTEXT_FILTER_NAME_ONLY new flag so that rtnl_fill_ifinfo()
limits its output to:

- struct nlmsghdr
- IFLA_IFNAME
- IFLA_PROP_LIST (alternate names)

We can later avoid using RTNL when RTEXT_FILTER_NAME_ONLY
is requested, as none of these attributes need RTNL.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260511070244.971028-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'rework-pci_device_id-initialisation'

Uwe Kleine-König says:

====================
Rework pci_device_id initialisation
====================

Link: https://patch.msgid.link/20260511090023.1634387-4-u.kleine-koenig@baylibre.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: Consistently define pci_device_ids using named initializers

... and PCI device helpers.

The various struct pci_device_id arrays were initialized mostly by one
the PCI_DEVICE macros and then list expressions. The latter isn't easily
readable if you're not into PCI. Using named initializers is more
explicit and thus easier to parse.

Also use PCI_DEVICE* helper macros to assign .vendor, .device,
.subvendor and .subdevice where appropriate and skip explicit
assignments of 0 (which the compiler takes care of).

The secret plan is to make struct pci_device_id::driver_data an
anonymous union (similar to
https://lore.kernel.org/all/cover.1776579304.git.u.kleine-koenig@baylibre.com/)
and that requires named initializers. But it's also a nice cleanup on
its own.

This change doesn't introduce changes to the compiled pci_device_id
arrays. Tested on x86 and arm64.

Reviewed-by: Jijie Shao <shaojijie@huawei.com>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Petr Machata <petrm@nvidia.com> # for mlxsw
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Forwarded: id:76da4f44d48bdde84580963862bf9616bee5c9e9.1778149923.git.u.kleine-koenig@baylibre.com (v2)
Reviewed-by: Michael Grzeschik <mgr@kernel.org>
Link: https://patch.msgid.link/20260511090023.1634387-6-u.kleine-koenig@baylibre.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: nfp: Drop PCI class entries with .class_mask = 0

With .class_mask being zero the value of .class doesn't matter because
to check if a pci_device_id entry matches a given device the expression

(id->class ^ dev->class) & id->class_mask

is checked for being zero (see pci_match_one_device()). So drop the
useless and irritating assignment for .class to match what (I think) all
other drivers are doing that don't need to match on .class, i.e. set
both members to zero.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Link: https://patch.msgid.link/20260511090023.1634387-5-u.kleine-koenig@baylibre.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: intel-xway: add PHY-level statistics via ethtool

Report PCS receive error counts for all supported PEF 7061, 7071, 7072 and
xRX200 PHYs.

Accumulate the vendor-specific PHY_ERRCNT read-clear counter
(SEL=RXERR) in .update_stats() and expose it as both IEEE 802.3
SymbolErrorDuringCarrier and generic rx_errors via
.get_phy_stats().

Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Link: https://patch.msgid.link/20260509205933.3965832-1-olek2@wp.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: intel-xway: fix typo in Kconfig description

Replace "22E" with "22F" in the description.

Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Link: https://patch.msgid.link/20260509210900.3968447-1-olek2@wp.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: cope with slow env in so_txtime.py test

This test was converted from shell script to drv-net test.

The new version is flaky in dbg builds on the netdev.bots dashboard.
The previous shell script had more protections to avoid these. Added
in commit a7ee79b9c455 ("selftests: net: cope with slow env in
so_txtime.sh test").

Add the same overall protection:

- Suppress so_txtime process failure if KSFT_MACHINE_SLOW

Also relax two timeouts to reduce the number of process failures
themselves

- Increase SO_RCVTIMEO to 2 seconds
- Increase process start-up stabilization to 2 seconds

Delays were experimentally arrived at while running with vng
built with kernel/configs/debug.config

Fixes: 5c6baef3885c ("selftests: drv-net: convert so_txtime to drv-net")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20260510174219.74aeee6d@kernel.org/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260511222138.2045551-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: fix missing closing paren in rings_reply_size()

sizeof(u32) on the _RINGS_CQE_SIZE line is missing its closing
parenthesis, causing nla_total_size() to absorb the subsequent
_TX_PUSH and _RX_PUSH entries.

The resulting size estimate happens to be numerically identical
due to NLA alignment, so not treating this as a real fix.
But the nesting is wrong and misleading.

Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260508125412.189804-1-cuitao@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-sched-prepare-lockless-qdisc-dumps'

Eric Dumazet says:

====================
net/sched: prepare lockless qdisc dumps

Goal is to no longer acquire RTNL in qdisc dumps.

This series annotate data-races, and change mq and mq_prio to
no longer acquire children qdisc spinlocks.
====================

Link: https://patch.msgid.link/20260510091455.4039245-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: mq_prio: no longer acquire qdisc spinlocks in mqprio_dump_class_stats()

Prepare mqprio_dump_class_stats() for RTNL avoidance.

Use RCU instead of RTNL, and no longer acquire each children spinlock.

As a bonus we no longer have to release/acquire d->lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260510091455.4039245-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: mq_prio: no longer acquire qdisc spinlocks in mqprio_dump()

Prepare mqprio_dump() for RTNL avoidance.

Use RCU instead of RTNL, and no longer acquire each children spinlock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260510091455.4039245-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: mq: no longer acquire qdisc spinlocks in dump operations

Prepare mq_dump_common() for RTNL avoidance.

Use RCU instead of RTNL, and no longer acquire each children spinlock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260510091455.4039245-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: add const qualifiers to gnet_stats helpers

In preparation of lockless qdisc dumps, add const qualifiers to:

- gnet_stats_add_basic()
- gnet_stats_copy_basic()
- gnet_stats_copy_basic_hw()
- gnet_stats_copy_queue()
- gnet_stats_read_basic()
- ___gnet_stats_copy_basic()
- qdisc_qstats_copy()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260510091455.4039245-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: add qdisc_qlen_lockless() helper

Used in contexts were qdisc spinlock is not held.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260510091455.4039245-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: annotate data-races around sch->qstats.backlog

Add qstats_backlog_sub() and qstats_backlog_add() helpers
and use them instead of open-coding them.

These helpers use WRITE_ONCE() to prevent store-tearing.

Also use WRITE_ONCE() in fq_reset() and qdisc_reset()
when sch->qstats.backlog is cleared.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260510091455.4039245-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: add qdisc_qlen_inc() and qdisc_qlen_dec()

Helpers to increment or decrement sch->q.qlen, with appropriate
WRITE_ONCE() to prevent store tearing.

Add other WRITE_ONCE() when sch->q.qlen is changed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260510091455.4039245-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: add READ_ONCE() in gnet_stats_add_queue[_cpu]

Stats are read locklessly, add READ_ONCE() to prevent load-tearing.

Write side will be handled in separate patches.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260510091455.4039245-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-phy-motorcomm-add-acpi-_dsd-property-support'

chunzhi.lin says:

====================
net: phy: motorcomm: add ACPI _DSD property support

This series makes the Motorcomm PHY driver parse firmware properties via
device_property_*() so the same property set can be provided by either
Devicetree or ACPI _DSD.

Patch 1 switches drivers/net/phy/motorcomm.c from of_property_*() to
device_property_*() on &phydev->mdio.dev.

Patch 2 documents Motorcomm yt8xxx PHY ACPI _DSD properties under
Documentation/firmware-guide/acpi/dsd and links the new document from
the ACPI index.
====================

Link: https://patch.msgid.link/20260507040221.3679454-1-linchunzhi0@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: acpi: dsd: add Motorcomm yt8xxx PHY properties

Document Motorcomm yt8xxx PHY ACPI _DSD properties consumed by
the motorcomm PHY driver.

Describe property placement, UUID usage, and reference the DT binding
for value constraints and defaults.

Signed-off-by: chunzhi.lin <linchunzhi0@gmail.com>
Changes in v2:
- Keep dsd/ entries sorted in Documentation/firmware-guide/acpi/index.rst

Link: https://patch.msgid.link/20260507040221.3679454-3-linchunzhi0@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: motorcomm: use device properties for firmware tuning

The Motorcomm PHY driver reads optional firmware properties via
of_property_read_*() from phydev->mdio.dev.of_node. This works for
Device Tree based systems, but causes ACPI platforms to ignore the same
properties when they are supplied through _DSD.

As a result, ACPI-described Motorcomm PHY devices fall back to default
settings instead of applying firmware-provided tuning such as
rx/tx internal delay, drive strength, clock output frequency, and
optional boolean controls like auto-sleep-disabled,
keep-pll-enabled, and tx clock inversion.

Switch these lookups to device_property_read_*() so the driver uses the
generic firmware node interface and can consume the same property names
from either Device Tree or ACPI.

This keeps the existing DT behavior unchanged while allowing ACPI
platforms to honor PHY configuration from firmware.

We have completed testing on Sophgo RISC-V architecture server SD3-10.
This server has a 64-core Thead C920 CPU whose DWMAC is connected to
Motorcomm's PHY YT8531. This server supports UEFI boot and it would like
to use the ACPI table.

Signed-off-by: chunzhi.lin <linchunzhi0@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260507040221.3679454-2-linchunzhi0@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mptcp-pm-in-kernel-increase-limits'

Matthieu Baerts says:

====================
mptcp: pm: in-kernel: increase limits

Allow switching from 8 to 64 for the maximum number of subflows and
accepted ADD_ADDR, and from 8 to 255 for the number of MPTCP endpoints.

The previous limit of 8 subflows makes sense in most cases. Using more
subflows will very likely *not* improve the situation, and could even
decrease the performances. But there are no technical limitations nor
performance impact to raise this limit, so let's do it: this will allow
people with very specific use-cases, and researchers to easily create
more subflows, and measure the performance impact by themselves.

- Patches 1-2: increase subflows and accepted ADD_ADDR limits.

- Patches 3-4: increase endpoints limit.

- Patches 5-7: validate the new limits: 64 subflows, 255 endpoints.

- Patch 8: selftests: use send()/recv() instead of sendto()/recvfrom().
====================

Link: https://patch.msgid.link/20260508-net-next-mptcp-pm-inc-limits-v1-0-c84e3fdf9b6a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mptcp: pm: use simpler send/recv forms

Instead of sendto() and recvfrom() which the NL address that was already
provided before.

Just simpler and easier to read without the to/from variants.

While at it, fix a checkpatch warning by removing multiple assignments.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260508-net-next-mptcp-pm-inc-limits-v1-8-c84e3fdf9b6a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mptcp: pm: validate new limits

These limits have been recently updated, from 8 to:

- 64 for the subflows and accepted add_addr

- 255 for the MPTCP endpoints

These modifications validate the new limits, but are also compatible
with the previous ones, to be able to continue to validate stable kernel
using the last version of the selftests. That's why new variables are
now used instead of hard-coded values.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260508-net-next-mptcp-pm-inc-limits-v1-7-c84e3fdf9b6a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mptcp: join: validate 8x8 subflows

The limits have been recently increased, it is required to validate that
having 64 subflows is allowed.

Here, both the client and the server have 8 network interfaces. The
server has 8 endpoints marked as 'signal' to announce all its v4
addresses. The client also has 8 endpoints, but marked as 'subflow' and
'fullmesh' in order to create 8 subflows to each address announced by
the server. This means 63 additional subflows will be created after the
initial one.

If it is not possible to increase the limits to 64, it means an older
kernel version is being used, and the test is skipped.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260508-net-next-mptcp-pm-inc-limits-v1-6-c84e3fdf9b6a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mptcp: join: allow changing ifaces nr per test

By default, 4 network interfaces are created per subtest in a dedicated
net namespace. Each netns has a dedicated pair of v4 and v6 addresses.
Future tests will need more.

Simply always creating more network interfaces per test will increase
the execution time for all other tests, for no other benefits. So now it
is possible to change this number only when needed, by setting ifaces_nr
when calling 'reset' and 'init_shapers', e.g.

ifaces_nr=8 reset "Subtest title"
ifaces_nr=8 init_shapers

Note that it might also be interesting to decrease the default value to
2 to reduce the setup time, especially when a debug kernel config is
being used.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260508-net-next-mptcp-pm-inc-limits-v1-5-c84e3fdf9b6a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: in-kernel: increase endpoints limit

The endpoints are managed in a list which was limited to 8 entries.

This limit can be too small in some cases: by having the same limit as
the number of subflows, it might not allow creating all expected
subflows when having a mix of v4 and v6 addresses that can all use MPTCP
on v4/v6 only networks.

While increasing the limit above the new subflows one, why not using the
technical limit: 255. Indeed, the endpoint will each have an ID that
will be used on the wire, limited to u8, and the ID 0 is reserved to the
initial subflow.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260508-net-next-mptcp-pm-inc-limits-v1-4-c84e3fdf9b6a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: kernel: allow flushing more than 8 endpoints

The mptcp_rm_list structure contains an array of IDs of 8 entries: to be
able to send a RM_ADDR with 8 IDs. This limitation was OK so far because
there could maximum 8 endpoints.

But this is going to change in the next commit. To cope with that, if
one of the arrays is full, the iteration stops, the lists are processed,
then the iteration continues where it previously stopped.

Note that if there are many endpoints to remove, and multiple RM_ADDR to
send, it might be more likely that some of these RM_ADDRs are dropped or
lost. This is a known limitation: RM_ADDR are not retransmitted in
MPTCPv1.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260508-net-next-mptcp-pm-inc-limits-v1-3-c84e3fdf9b6a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: in-kernel: increase all limits to 64

This means switching the maximum from 8 to 64 for the number of subflows
and accepted ADD_ADDR.

The previous limit of 8 subflows makes sense in most cases. Using more
subflows will very likely *not* improve the situation, and could even
decrease the performances. But there are no technical limitations nor
performance impact to raise this limit, so let's do it: this will allow
people with very specific use-cases, and researchers to easily create
more subflows, and measure the performance impact by themselves.

The theoretical limit is 255 -- the ID is written in a u8 on the wire --
but 64 is more than enough. With so many subflows, it will be costly to
iterate over all of them when operations are done in bottom half.

Note that the in-kernel PM will continue to create subflows in reply to
ADD_ADDR with a single batch of maximum 8 subflows. Same when adding new
"subflow" endpoints with the fullmesh flag. Increasing those batch
limits would have a memory impact, and it looks fine not to cover these
cases with larger batches for the moment. If more is needed later, the
position of the last subflow from the list could be remembered, and the
list iteration could continue later.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/434
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260508-net-next-mptcp-pm-inc-limits-v1-2-c84e3fdf9b6a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: in-kernel: explicitly limit batches to array size

The in-kernel PM can create subflows in reply to ADD_ADDR by batch of
maximum 8 subflows for the moment. Same when adding new "subflow"
endpoints with the fullmesh flag. This limit is linked to the arrays
used during these steps.

There was no explicit limit to the arrays size (8), because the limit of
extra subflows is the same (8). It seems safer to use an explicit limit,
but also these two sizes are going to be different in the next commit.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260508-net-next-mptcp-pm-inc-limits-v1-1-c84e3fdf9b6a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mention the convention for .ndo_setup_tc()

qdisc_offload_dump_helper(), originated from commit 602f3baf2218
("net_sch: red: Add offload ability to RED qdisc"), is designed to that

    Whether RED is being offloaded is being determined every time dump
    action is being called because parent change of this qdisc could
    change its offload state but doesn't require any RED function to be
    called.

and returning -EOPNOTSUPP (for dump queries) does not mean "I don't have
any statistics", but "I don't offload this qdisc anymore". At least two
existing drivers did it wrong, so it is worth mentioning.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260507214054.2539790-1-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-mlx5-steering-misc-enhancements'

Tariq Toukan says:

====================
net/mlx5: Steering misc enhancements

This small series by Yevgeny contains a few steering enhancements /
cleanups.
====================

Link: https://patch.msgid.link/20260507173443.320465-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: DR, Remove unused field of struct mlx5dr_matcher_rx_tx

Remove a field that was never used.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260507173443.320465-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: HWS, Handle destroying table that has a miss table

If a table has a miss table that was created by
'mlx5hws_table_set_default_miss' API function, its miss_tbl
keeps the table that points to it in a list.
If such table is deleted, we need to also remove it from the
miss_tbl list, otherwise the node in miss_tbl list will contain
garbage.

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260507173443.320465-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: HWS, Check if device is down while polling for completion

In case the device is down for any reason (e.g. FLR),
the HW will no longer generate completions - no point
polling and waiting for timeout.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260507173443.320465-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'log-clean-up-and-tap-follow-ups'

Allison Henderson says:

====================
Log clean up and TAP follow ups

This is a follow up series to the  "Log collection, TAP compliance and
cleanups" set.  The sashiko report had made some points that I thought
was worth addressing.  This patch set fixes a few more TAP compliance
prints in the check_gcov* routines.  Also since the user must now pass
in the log folder to collect logs, log clean up is tightened to only
remove rds* prefixed artifacts instead of the entire folder.  Lastly a
the signal handler alarm should be disarmed after the completes to
avoid multiple calls to the stop_pcaps routine.
====================

Link: https://patch.msgid.link/20260507233213.556182-1-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rds: Disarm signal alarm on test completion

A race in stop_pcaps is possible if the test completes and then
times out while waiting for the tcpdump process to exit.  The
signal handler may fire again and needlessly call stop_pcap a
second time.  Fix this by disabling the alarm after normal
test completion.

Also if there are no tcpdump processes to wait on, stop_pcaps can
just exit.  This avoids misleading prints when there are no procs
to collect dumps from.

Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260507233213.556182-4-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rds: Fix TAP-prefixed prints in check_gcov*

This patch adds the # prefix to info and warning prints in
the check_gcov* routines. Since these routines do not exit,
as the other check_* routines do, the output here should be
kept TAP compliant.

Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260507233213.556182-3-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rds: Fix stale log clean up

Since rds self tests no longer has a default folder, users must
specify a log collection folder if they want to collect logs.
Currently the log folder is deleted and recreated, but this can
be dangerous if the user exports RDS_LOG_DIR=/tmp or /var/log.
This patch corrects the clean up to delete only rds log artifacts
from the log folder, and further prefixes rds specific logs as rds*

Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260507233213.556182-2-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ipv4-flush-the-fib-once-on-multiple-nexthop-removal'

Cosmin Ratiu says:

====================
ipv4: Flush the FIB once on multiple nexthop removal

This series optimizes multiple nexthop removal performance from having
to do a FIB flush for each nexthop being removed to only doing a single
FIB flush after all nexthops are removed.

This dramatically improves performance in scenarios where there are
many nexthops and many ipv4 routes. Please see individual patches for
more details and for a test scenario.
====================

Link: https://patch.msgid.link/20260507075606.322405-1-cratiu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: Add __must_check to nexthop removal functions

These functions return a signal whether FIB flushing is required which
must not be ignored. Use the compiler to help with enforcing this
requirement in the future.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260507075606.322405-4-cratiu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: Flush the FIB once on multiple nexthop removal

When a device is going down or when a net namespace is deleted, all
nexthops on it are removed, and for each nexthop being removed the FIB
table is flushed, which does a full trie traversal looking for entries
marked RTNH_F_DEAD and removing them. This is O(N x R), with N being
number of dev nexthops and R being number of IPv4 routes.

The RTNL is held the entire time.

When there are many nexthops to be removed and many routing entries,
this can result in the RTNL being held for multiple minutes, which
causes unhappiness in other processes trying to acquire the RTNL (e.g.
systemd-networkd for DHCP renewals).

In a complicated deployment with multiple vxlan devices, each having
16K nexthops and a total of 128K ipv4 routes, this is exactly what
happens:

nexthop_flush_dev()                # loops over 16K nexthops
  -> remove_nexthop()
    -> __remove_nexthop()
      -> __remove_nexthop_fib()    # marks fi->fib_flags |= RTNH_F_DEAD
        -> fib_flush()             # for EACH nexthop!
  -> fib_table_flush()     # walks the ENTIRE FIB, 128K entries

This patch makes use of the previously added FIB flushing signal to only
do a single FIB flush after all nexthops to be removed are marked as
RTNH_F_DEAD:
- __remove_nexthop_fib() no longer flushes the FIB.
- nexthop_flush_dev() and flush_all_nexthops() now keep track whether
  any nexthop was removed and trigger a FIB flush at the end.
- a new wrapper is defined, remove_one_nexthop() which calls
  remove_nexthop() and flushes if necessary. This is intended for places
  which must remove a single nexthop and shouldn't worry about the need
  to trigger a FIB flush. For now, the only caller is rtm_del_nexthop().
- The two direct callers of __remove_nexthop() get a WARN_ON_ONCE, since
  the nh about to be removed should not have any FIB entries referencing
  it when replacing or inserting a new one.

This dramatically improves performance from O(N x R) to O(N + R).

Releasing a nexthop reference in remove_nexthop() now no longer frees
it. Instead, it is deleted when the last fib_info pointing to it gets
freed via free_fib_info_rcu(). All routing code is already careful not
to take into consideration routes marked with RTNH_F_DEAD.

Tested with:
DEV=eth2
ip link set up dev $DEV
ip link add testnh0 link $DEV type macvlan mode bridge
ip addr add 198.51.100.1/24 dev testnh0
ip link set testnh0 up

seq 1 65536 | \
sed 's/.*/nexthop add id & via 198.51.100.2 dev testnh0/' | \
ip -batch -

i=1
for a in $(seq 0 255); do
  for b in $(seq 0 255); do
    echo "route add 10.${a}.${b}.0/32 nhid $i"
    i=$((i + 1))
  done
done | ip -batch -

time ip link set testnh0 down
ip link del testnh0

Without this patch:
real 0m32.601s
user 0m0.000s
sys 0m32.511s

With this patch:
real 0m0.209s
user 0m0.000s
sys 0m0.153s

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260507075606.322405-3-cratiu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: Provide a FIB flushing signal from nexthop removal functions

Plumb a bool value throughout the various nexthop removal functions,
determined in the innermost __remove_nexthop_fib() (which still does the
FIB flushing) and propagated up all callers.

The next patch will make use of this signal to optimize the removal of
multiple nexthops by moving the FIB flushing up the call hierarchy.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260507075606.322405-2-cratiu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-convert-four-more-protocols-to-getsockopt_iter'

Breno Leitao says:

====================
net: convert four more protocols to getsockopt_iter

Continue the work to convert protocols to the new getsockopt_iter API.

Convert four additional getsockopt implementations to the new
sockopt_t/getsockopt_iter callback:

  - MCTP
  - LLC
  - X.25
  - KCM

These are mechanical, ABI-preserving conversions following the same
pattern as the previously converted protocols (af_packet, can/raw,
af_netlink, af_vsock): the (char __user *optval, int __user *optlen)
pair is replaced with a single sockopt_t *opt that carries the buffer
length on input and the returned size on output, and exposes an iov_iter
for the copy-out path. put_user()/copy_to_user() pairs are replaced with
a single copy_to_iter() per option, and the wrapper in
do_sock_getsockopt() handles writing optlen back to userspace.

I picked these four because each is small and self-contained.
====================

Link: https://patch.msgid.link/20260507-getsock_two-v2-0-5873111d9c12@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: getsockopt_iter: cleanup

Apply two cleanups suggested by Stanislav and bobby on the original
selftest series:

- Reorder local variable declarations into reverse christmas-tree
  order (longest line first). Because that ordering puts socklen_t
  optlen before the variable whose size it stores, the
  "optlen = sizeof(...)" initializer is moved out of the declaration
  to a plain assignment in the test body, as Stanislav suggested.

- Add ASSERT_EQ(optlen, ...) on every error path so the value the
  kernel writes back to the userspace optlen is pinned down even
  when the syscall returns -1. With do_sock_getsockopt() now writing
  opt->optlen back to userspace unconditionally, asserting that the
  netlink/vsock error paths leave the original input length untouched
  guards against future regressions.

Bobby Eshleman pointed out that
SO_VM_SOCKETS_CONNECT_TIMEOUT_NEW/OLD return a sock_timeval-shaped
payload (16 bytes on 64-bit), which is wider than the u64 case
already covered. Add four tests that exercise this path:

- connect_timeout_new_exact         exact-size buffer
- connect_timeout_new_oversize_clamped  oversize buffer, clamped
- connect_timeout_new_undersize     undersize -> -EINVAL, optlen
                                    untouched
- connect_timeout_old_exact         exact-size buffer for OLD optname

Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
Suggested-by: Bobby Eshleman <bobbyeshleman@meta.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260507-getsock_two-v2-5-5873111d9c12@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

kcm: convert to getsockopt_iter

Convert KCM socket's getsockopt implementation to use the new
getsockopt_iter callback with sockopt_t.

Key changes:
- Replace (char __user *optval, int __user *optlen) with sockopt_t *opt
- Use opt->optlen for buffer length (input) and returned size (output)
- Use copy_to_iter() instead of put_user()/copy_to_user()
- Add linux/uio.h for copy_to_iter()

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260507-getsock_two-v2-4-5873111d9c12@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

x25: convert to getsockopt_iter

Convert X.25 socket's getsockopt implementation to use the new
getsockopt_iter callback with sockopt_t.

Key changes:
- Replace (char __user *optval, int __user *optlen) with sockopt_t *opt
- Use opt->optlen for buffer length (input) and returned size (output)
- Use copy_to_iter() instead of put_user()/copy_to_user()
- Add linux/uio.h for copy_to_iter()

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260507-getsock_two-v2-3-5873111d9c12@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

llc: convert to getsockopt_iter

Convert LLC socket's getsockopt implementation to use the new
getsockopt_iter callback with sockopt_t.

Key changes:
- Replace (char __user *optval, int __user *optlen) with sockopt_t *opt
- Use opt->optlen for buffer length (input) and returned size (output)
- Use copy_to_iter() instead of put_user()/copy_to_user()
- Add linux/uio.h for copy_to_iter()

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260507-getsock_two-v2-2-5873111d9c12@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mctp: convert to getsockopt_iter

Convert MCTP socket's getsockopt implementation to use the new
getsockopt_iter callback with sockopt_t.

Key changes:
- Replace (char __user *optval, int __user *optlen) with sockopt_t *opt
- Use opt->optlen for buffer length (input)
- Use copy_to_iter() instead of copy_to_user()
- Add linux/uio.h for copy_to_iter()

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260507-getsock_two-v2-1-5873111d9c12@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: eth: fbnic: Fix addr validation in pcs write

The DW IP has two distinct PCS address ranges cooresponding
to the C45 PCS registers.

The shim translates the PCS addr/regno into specific CSR writes
into one of those two zero-relative ranges.

This patch fixes a one off in the test that could allow an invalid
CSR write if an addr == 2 was called.

There are is of yet, no real impact for the bug as no PCS writes are
present.

Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com>
Link: https://patch.msgid.link/20260507154203.3667-1-mike.marciniszyn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-dsa-microchip-remove-one-indirection-layer'

Bastien Curutchet says:

====================
net: dsa: microchip: Remove one indirection layer

This series follows the discussions we had on a previous series that
aimed to add PTP support for the KSZ8463 (cf [1]).

The KSZ driver got way too convoluted over time because it uses a common
framework to handle more than 20 switches split in 5 families (see below
table)

+----------+---------+---------+---------+---------+---------+
| Family   | KSZ8463 | KSZ87xx | KSZ88xx | KSZ9477 | LAN937X |
+----------+---------+---------+---------+---------+---------+
| Switches | KSZ8463 | KSZ8795 | KSZ88X3 | KSZ8563 | LAN9370 |
|          |         | KSZ8794 | KSZ8864 | KSZ9477 | LAN9371 |
|          |         | KSZ8765 | KSZ8895 | KSZ9896 | LAN9372 |
|          |         |         |         | KSZ9897 | LAN9373 |
|          |         |         |         | KSZ9893 | LAN9374 |
|          |         |         |         | KSZ9563 |         |
|          |         |         |         | KSZ8567 |         |
|          |         |         |         | KSZ9567 |         |
|          |         |         |         | LAN9646 |         |
+----------+---------+---------+---------+---------+---------+

A unique struct dsa_switch_ops is used by all the switches. Next to it,
each switch family has its own struct ksz_dev_ops with family-specific
callbacks. So the dsa_switch_ops operations handle the specificities of
each family through these ksz_dev_ops callbacks and/or conditional
branches based on the chip ID.

Vladimir initiated a rework of the driver ([2]) which I carried on. On
top of the rework I added PTP and periodic output support for the
KSZ8463 (which was my first goal). There are more than 60 patches for
all this so this series will be followed by several others and if you
want to see the full picture we can check my github ([3]).

This first series aims to split the unique struct dsa_switch_ops into
5 so each switch family will be able to implement its own set of DSA
operations.

I haven't finished yet to group all the patches into meaningful series
but here is more or less what I plan to do next:

- A series will remove from the struct ksz_dev_ops the callbacks
  that have an equivalent in dsa_switch_ops to remove one level of
  indirection.
- A series will split again some operations to get rid of the
  if (is_kszXYZ) branches.
- Maybe a fourth one will be needed to completely move out of
  ksz_common.c everything that isn't truly common to all the switches
- A series will add PTP support for the KSZ8463
- A final series will add periodic output support for the KSZ8463

[1]: https://lore.kernel.org/r/20260304-ksz8463-ptp-v6-0-3f4c47954c71@bootlin.com)
[2]: https://github.com/vladimiroltean/linux/tree/ksz_separate_dsa_switch_ops
[3]: https://github.com/bastien-curutchet/linux/tree/ksz_rework
====================

Link: https://patch.msgid.link/20260505-clean-ksz-driver-v1-0-05d70fa42461@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: microchip: split ksz_connect_tag_protocol()

All the KSZ switches use the same ksz_connect_tag_protocol while they
don't support all the KSZ tag protocols. So if, for some reason, a given
switch tries to connect another KSZ tag protocol, it won't fail.

Split the common ksz_connect_tag_protocol() into switch-specific
operations. This way, each switch will only accept to connect the tag
protocol it supports.
Remove the no longer used common operation.

Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com>
Link: https://patch.msgid.link/20260505-clean-ksz-driver-v1-9-05d70fa42461@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: microchip: split ksz_get_tag_protocol()

All the switch families use a common function to implement
.get_tag_protocol(). This function then returns the relevant protocol
depending on the chip ID.

Make the protocol to dsa_switch_ops association a little bit more
obvious by having separate implementations.

Change made by manually checking which chip id has which dsa_switch_ops
assigned to it, then filtering the common ksz_get_tag_protocol() for
just those chip IDs pertaining to it.

As an important benefit, we no longer have that weird-looking
DSA_TAG_PROTO_NONE fallback which was never actually returned.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com>
Link: https://patch.msgid.link/20260505-clean-ksz-driver-v1-8-05d70fa42461@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: microchip: hook up ksz_switch_alloc() to chip-specific dsa_switch_ops

Now that each switch driver has its own dsa_switch_ops (currently a copy
of ksz_switch_ops), we no longer need ksz_switch_ops and can remove it.

Get to the driver-specific dsa_switch_ops through the ksz_chip_data
structure.
Reorder the alloc()/get_match_data() calls such as to have that
pointer available.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com>
Link: https://patch.msgid.link/20260505-clean-ksz-driver-v1-7-05d70fa42461@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: microchip: ensure each ksz_dev_ops has its own dsa_switch_ops

Currently we have a single dsa_switch_ops for 4 very distinct families
of switches, and many dsa_switch_ops methods are simply a dispatches
through ksz_dev_ops. That creates an avoidable level of indirection.

As a preparation for removing that indirection layer, create a separate
dsa_switch_ops structure wherever we have a ksz_dev_ops. These
structures are not yet used - ksz_switch_ops from ksz_common.c still is.
However, this reduces the noise from subsequent changes.

All new dsa_switch_ops are exact copies of ksz_switch_ops. But we need
to export function prototypes from ksz_common.c so that they are
callable from individual drivers.

Note that "individual drivers" are not actual separate kernel modules.
All of ksz8.c, ksz9477.c and lan937x_main.c are part of the same
ksz_switch.ko. Only the "register interface" drivers are different
modules (ksz9477_i2c.o for I2C, ksz_spi.o for SPI, ksz8863_smi.o for
MDIO). So we don't need to export any symbol.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com>
Link: https://patch.msgid.link/20260505-clean-ksz-driver-v1-6-05d70fa42461@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: microchip: move phylink_mac_ops to individual drivers

Similar to ksz_dev_ops, struct phylink_mac_ops shouldn't be part of
the common code. Instead, the common code should provide callable
functionality.

Invert the paradigm and export the common aspects from ksz_common.c, and
move the chip-specific stuff in individual drivers.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com>
Link: https://patch.msgid.link/20260505-clean-ksz-driver-v1-5-05d70fa42461@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: microchip: move KSZ9477 and LAN937 ksz_dev_ops to individual drivers

The ksz_dev_ops() are specific to each switch family so they should
belong to the individual drivers instead of the common section.

Move the ksz_dev_ops() definitions of the KSZ9477 and the LAN937 to
their individual drivers.
Set static the functions that aren't exported anymore.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com>
Link: https://patch.msgid.link/20260505-clean-ksz-driver-v1-4-05d70fa42461@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: microchip: move KSZ8 ksz_dev_ops to ksz8.c

The ksz_dev_ops() are specific to each switch family so they should
belong to the individual drivers instead of the common section.

Move the ksz_dev_ops() definitions of the KSZ8xxx to ksz8.c
Set static the functions that aren't exported anymore.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com>
Link: https://patch.msgid.link/20260505-clean-ksz-driver-v1-3-05d70fa42461@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: microchip: remove unused port_cleanup() callback

ksz_dev_ops :: port_cleanup() isn't used anywhere.

Remove it.

Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com>
Link: https://patch.msgid.link/20260505-clean-ksz-driver-v1-2-05d70fa42461@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: microchip: Remove unused ksz8_all_queues_split()

ksz8_all_queues_split() isn't used anywhere.

Remove it.

Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com>
Link: https://patch.msgid.link/20260505-clean-ksz-driver-v1-1-05d70fa42461@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-mlx5-icm-page-management-in-vhca_id-mode'

Tariq Toukan says:

====================
net/mlx5: ICM page management in VHCA_ID mode

This series adds driver support for the VHCA_ID page management mode.
When firmware and driver support this mode, ICM (Interconnect Context
Memory) page management uses the device vhca_id as the function
identifier in MANAGE_PAGES, QUERY_PAGES, and page request events instead
of the legacy function_id + ec_function pair.

Background
Firmware can operate page management in two modes:
FUNC_ID mode (current): Function identity is (function_id, ec_function).
This remains the default and is used for boot pages and when the new
mode capability is not set.
VHCA_ID mode (new): Function identity is vhca_id only; ec_function is
ignored. This aligns page management with the vhca_id-based model used
by other firmware commands and simplifies identification on SmartNIC and
multi-function setups.
====================

Link: https://patch.msgid.link/20260506133239.276237-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Add VHCA_ID page management mode support

Add support for VHCA_ID-based page management mode. When the device
firmware advertises the icm_mng_function_id_mode capability with
MLX5_ID_MODE_FUNCTION_VHCA_ID, page management operations between the
driver and firmware may use vhca_id instead of function_id as the
effective function identifier, and the ec_function field is ignored.

Update page management commands to conditionally set ec_function field
only in FUNC_ID mode. Boot page allocation always uses FUNC_ID mode
semantics for backward compatibility, as the capability bit is only
available after set_hca_cap(). If after set_hca_cap() VHCA_ID mode was
set, modify the tracking of the boot pages in page_root_xa to use
vhca_id too.

Add mlx5_esw_vhca_id_to_func_type() to resolve the function type in
VHCA_ID mode, enabling per-type debugfs counters. Use a dedicated
vhca_type_map xarray, to provide lockless lookup. Store the resolved
type on each fw_page at allocation time so reclaim and release paths
read it directly without any lookup.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260506133239.276237-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Make debugfs page counters by function type dynamic

Make the per function type debugfs page counters dynamically added after
mlx5_eswitch_init(). When page management operates in vhca_id mode, only
the function acting as either eSwitch or vport manager can initialize
the eSwitch structure and translate the vhca_id to function type for the
functions to which it supplies pages. The next patch will add support
for page management in vhca_id mode.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260506133239.276237-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Relax capability check for eswitch query paths

Several eswitch functions that only query other functions' HCA
capabilities or read cached vport state are guarded by the
vhca_resource_manager capability. This capability is required for
set_hca_cap operations but query_hca_cap of other functions only
requires the vport_group_manager capability.

Relax the capability check from vhca_resource_manager to
vport_group_manager in the following query-only paths:
- mlx5_esw_vport_caps_get() - queries other function general caps
- esw_ipsec_vf_query_generic() - queries other function ipsec cap
- mlx5_devlink_port_fn_migratable_get() - reads cached vport state
- mlx5_devlink_port_fn_roce_get() - reads cached vport state
- mlx5_devlink_port_fn_max_io_eqs_get() - queries other function caps
- mlx5_esw_vport_enable/disable() - vhca_id map/unmap

Functions that perform also set_hca_cap (migratable_set, roce_set,
max_io_eqs_set, esw_ipsec_vf_set_generic, esw_ipsec_vf_set_bytype)
retain the vhca_resource_manager requirement.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260506133239.276237-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ixgbe: E610: do not fill EEE lp_advertised from local PHY caps

ixgbe_get_eee_e610() fills kedata->lp_advertised from pcaps.eee_cap
returned by ixgbe_aci_get_phy_caps() with IXGBE_ACI_REPORT_ACTIVE_CFG.
That report mode (and the other IXGBE_ACI_REPORT_* modes) describe the
local PHY only, not the link partner. The X550 path uses a separate
FW_PHY_ACT_UD_2 activity for partner data; the E610 ACI has no
equivalent.

Leave lp_advertised zeroed via the existing linkmode_zero() and drop
the now-unused ixgbe_eee_cap_map[]. eee_active/eee_enabled are
unaffected (sourced from link.eee_status).

Fixes: b61dbdeff3a9 ("ixgbe: E610: add EEE support")
Signed-off-by: David Carlier <devnexen@gmail.com>
Reviewed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260507-jk-iwl-next-fix-eee-ixgbe-v1-1-62bc1d197d1d@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sctp: Fix typo in comment

Fix a typo in a comment in sctp_endpoint_destroy(): "releated" should
be "related".

Signed-off-by: Md Shofiqul Islam <shofiqtest@gmail.com>
Link: https://patch.msgid.link/20260507105758.25728-1-shofiqtest@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-fix-protodown-with-macvlan'

Ido Schimmel says:

====================
net: Fix protodown with macvlan

When protodown is enabled on a macvlan, two bugs cause the macvlan to
incorrectly gain carrier:

1. Toggling the lower device's carrier while protodown is enabled on the
macvlan causes the macvlan to gain carrier, effectively bypassing the
protodown mechanism.

2. Toggling protodown on and then off on the macvlan while the lower
device has no carrier causes the macvlan to gain carrier, since
netif_change_proto_down() unconditionally turns the carrier on.

Patch #1 is a preparation.

Patch #2 solves the first problem by making netif_carrier_on() return
early when protodown is on.

Patch #3 solves the second problem by only calling netif_carrier_on()
when protodown is turned off if there is no linked net device or if the
linked net device has a carrier.

Patch #4 adds a selftest covering both bugs and the basic protodown
functionality.

Targeting at net-next since these are not regressions (i.e., never
worked).

Note that while these changes are in the core, they should only affect
macvlan as protodown is only supported by macvlan and vxlan and only
the former has a linked net device.
====================

Link: https://patch.msgid.link/20260507105906.891817-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: Add protodown tests

Add a selftest for the protodown mechanism.

Five test cases are included:

1. Basic protodown toggling: Verify that setting protodown on macvlan
   results in DOWN operational state and clearing it restores UP.

2. Same as the previous test case, but with vxlan.

3. Protodown reasons: Verify that protodown cannot be cleared while
   there are active protodown reasons, but can be cleared once all
   reasons are removed.

4. Protodown with lower device being toggled: Verify that toggling the
   lower device's carrier while protodown is on does not cause the
   macvlan to gain carrier.

5. Protodown with lower device down: Verify that toggling protodown
   while the lower device has no carrier does not cause the macvlan to
   gain carrier.

Note that the last two test cases fail without "net: Do not turn on
carrier when protodown is on" and "net: Do not unconditionally turn on
carrier when turning off protodown":

# ./protodown.sh
TEST: Basic protodown on/off with macvlan                           [ OK ]
TEST: Basic protodown on/off with vxlan                             [ OK ]
TEST: Protodown reasons                                             [ OK ]
TEST: Protodown with lower device toggled                           [FAIL]
         Macvlan operational state is not DOWN despite protodown
TEST: Protodown with lower device down                              [FAIL]
         Macvlan is not LOWERLAYERDOWN after clearing protodown

Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260507105906.891817-5-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: Do not unconditionally turn on carrier when turning off protodown

The protodown functionality allows user space to turn off the carrier of
a net device:

# ip link add name dummy1 up type dummy
# ip link add name macvlan1 up link dummy1 type macvlan mode bridge
# ip link set dev macvlan1 protodown on
$ ip -br link show dev macvlan1
macvlan1@dummy1  DOWN           0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>

When protodown is turned off, the core unconditionally turns on the
carrier of the net device:

# ip link set dev macvlan1 protodown off
$ ip -br link show dev macvlan1
macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>

This is wrong as it means that a macvlan can end up with a carrier when
its lower device does not have a carrier:

# ip link set dev dummy1 carrier off
$ ip -br link show dev macvlan1
macvlan1@dummy1  LOWERLAYERDOWN 0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>
# ip link set dev macvlan1 protodown on
# ip link set dev macvlan1 protodown off
$ ip -br link show dev macvlan1
macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>

Solve this by resolving the linked net device and if one exists, inherit
its carrier state when protodown is turned off. Otherwise, if no linked
net device exists, as before, simply turn on the carrier.

Resolve the linked net device using a new helper and have it return the
device itself (in a similar fashion to dev_get_iflink()) if the device
does not implement both ndo_get_iflink() and get_link_net(). If the
latter is not implemented, it is unclear in which network namespace we
should look up the linked net device. Currently, this helper is only
used for net devices that support protodown (macvlan and vxlan) and for
both it returns the correct result.

Output with the patch:

# ip link add name dummy1 up type dummy
# ip link add name macvlan1 up link dummy1 type macvlan mode bridge
# ip link set dev dummy1 carrier off
$ ip -br link show dev macvlan1
macvlan1@dummy1  LOWERLAYERDOWN 0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>
# ip link set dev macvlan1 protodown on
# ip link set dev macvlan1 protodown off
$ ip -br link show dev macvlan1
macvlan1@dummy1  LOWERLAYERDOWN 0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>
# ip link set dev dummy1 carrier on
$ ip -br link show dev macvlan1
macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>
# ip link set dev macvlan1 protodown on
# ip link set dev macvlan1 protodown off
$ ip -br link show dev macvlan1
macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260507105906.891817-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: Do not turn on carrier when protodown is on

The protodown functionality allows user space to turn off the carrier of
a net device:

# ip link add name dummy1 up type dummy
# ip link add name macvlan1 up link dummy1 type macvlan mode bridge
# ip link set dev macvlan1 protodown on
$ ip -br link show dev macvlan1
macvlan1@dummy1  DOWN           0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>

Different applications can set different protodown reasons, which
prevents an application from turning on the carrier of a net device as
long as others want it down:

# ip link set dev macvlan1 protodown_reason 1 on
# ip link set dev macvlan1 protodown_reason 2 on
# ip link set dev macvlan1 protodown off
Error: Cannot clear protodown, active reasons.
# ip link set dev macvlan1 protodown_reason 2 off
# ip link set dev macvlan1 protodown off
Error: Cannot clear protodown, active reasons.
# ip link set dev macvlan1 protodown_reason 1 off
# ip link set dev macvlan1 protodown off
$ ip -br link show dev macvlan1
macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>

Unfortunately, this mechanism is not very useful when the carrier of a
net device can be toggled by toggling the carrier of its lower device:

# ip link set dev macvlan1 protodown on
$ ip -br link show dev macvlan1
macvlan1@dummy1  DOWN           0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>
# ip link set dev dummy1 carrier off
# ip link set dev dummy1 carrier on
$ ip -br link show dev macvlan1
macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>

Obviously, this is not the intended behavior and it is unlikely to be
relied on by anyone. In fact, it is a problem for applications like FRR
that use protodown with macvlan on top of a bridge as part of Virtual
Router Redundancy Protocol (VRRP).

Solve this by preventing a net device configured with protodown on from
gaining carrier by making netif_carrier_on() a NOP when protodown is
turned on.

Output with the patch:

# ip link add name dummy1 up type dummy
# ip link add name macvlan1 up link dummy1 type macvlan mode bridge
# ip link set dev macvlan1 protodown on
$ ip -br link show dev macvlan1
macvlan1@dummy1  DOWN           0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>
# ip link set dev dummy1 carrier off
# ip link set dev dummy1 carrier on
$ ip -br link show dev macvlan1
macvlan1@dummy1  DOWN           0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>
# ip link set dev macvlan1 protodown off
$ ip -br link show dev macvlan1
macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260507105906.891817-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: Set dev->proto_down before changing carrier state

A subsequent patch will make netif_carrier_on() a NOP for net devices
that have protodown turned on so that they will not accidentally gain
carrier. As a preparation, set dev->proto_down before calling
netif_carrier_{off,on}().

Note that the only driver that supports protodown and has a notion of a
carrier is macvlan and it is calling netif_carrier_{off,on}() with RTNL
held.

No functional changes intended.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260507105906.891817-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'keep-phy-link-during-wol-sleep-cycle'

Justin Chen says:

====================
Keep PHY link during WoL sleep cycle

First we divide the init/deinit path to allow for a partial init/deinit
during a sleep cycle. We also remove some unnecessary small functions at
the same time.

Then we modify the suspend and resume path to allow for a partial bring
down and bring up. This allow us to keep the PHY link up and to resume
network traffic much quicker. Note we only do this when WoL is enabled
since the PHY is already powered. In the non-WoL case we want to follow
the same flow.
====================

Link: https://patch.msgid.link/20260506213114.2002886-1-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bcmasp: Keep phy link during WoL sleep cycle

We currently more or less restart all the HW on resume. Since we also
stop the PHY, it takes a while for the PHY link to be re-negotiated on
resume. Instead of doing a full restart, we keep the HW state and the
PHY link, that way we can resume network traffic with a much smaller
delay.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20260506213114.2002886-3-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bcmasp: Divide init to allow partial bring up

To prepare for a partial bring up of the interface during resume,
we break apart the bcmasp_netif_init() function into smaller chunks
that can be called as necessary. Also consolidate some functions that
do not need to be standalone.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20260506213114.2002886-2-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: microchip: Add LAN7500 and LAN7505 devices

Add bindings for LAN7500 and LAN7505 USB Ethernet Devices which are similar
to LAN9500.

Signed-off-by: Thomas Richard <thomas.richard@bootlin.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260506-b4-var-som-om44-lan7500-v2-1-b8af59ab877c@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: dp83867: add MDI-X management

ethtool on this phy device always reports "MDI-X: Unknown" and doesn't
support forcing it to on or off.
This patch adds support for reading/forcing MDI-X mode from ethtool
properly.

Signed-off-by: Luca Ellero <l.ellero@asem.it>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260506141918.13136-1-l.ellero@asem.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

gve: Use generic power management

Switch to the generic power management and remove the usage of legacy
(pci_driver) hooks.

Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260506165015.641738-1-vaibhavgupta40@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Drop pci_save_state() after pci_restore_state()

Commit 383d89699c50 ("treewide: Drop pci_save_state() after
pci_restore_state()") sought to purge all superfluous invocations of
pci_save_state() from the tree.

Unfortunately the commit missed one invocation in the Broadcom
NetXtreme-C/E driver. Drop it.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/39de1b025928d9a457976010b2324e7e99baa92a.1778158755.git.lukas@wunner.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: lan966x: Accept standard ethernet prefixes

The dsa.yaml and ethernet-switch.yaml bindings recommend
prefixing ethernet switches and ports with "ethernet-" so
make the LAN966x do the same.

Reported-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260507-lan966-binding-v1-1-e99293d2a4ec@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: atheros: atl2: remove kernel backward-compatibility code

The atl2 driver contains code for compatibility with old kernels that
do not support module_param_array. Backward compatibility is
irrelevant because this driver is in-tree. Remove this unreachable
code to simplify the driver's handling of module parameters.

Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260506054035.23710-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: add tests for filtered dumps of page pool

Add tests for page pool dumps of a specific ifindex.

Link: https://patch.msgid.link/20260506034821.1710113-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: page_pool: support dumping pps of a specific ifindex via Netlink

NIPA tries to make sure that HW tests don't modify system state.
It saves the state of page pools, too. Now that I write this commit
message I realize that this is impractical since page pool IDs and
state will get legitimately changed by the tests. But I already
spent a couple of hours implementing the filtering, so..

Link: https://patch.msgid.link/20260506034821.1710113-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-7.1-rc3).

Conflicts:

net/ipv4/igmp.c
  726fa7da2d8c ("ipv4: igmp: get rid of IGMPV3_{QQIC,MRC} and simplify calculation")
  c6bebaa744f7 ("ipv4: igmp: annotate data-races in igmp_heard_query()")
https://lore.kernel.org/a7365e4873340f7a5e30411207de3bf9@kernel.org

Adjacent changes:

net/psp/psp_main.c
  30cb24f97d44 ("psp: strip variable-length PSP header in psp_dev_rcv()")
  c2b22277ad89 ("psp: validate IPv4 header fields in psp_dev_rcv()")

net/sched/sch_fq_codel.c
  f83e07b29246 ("net/sched: sch_fq_codel: annotate data-races from fq_codel_dump_class_stats()")
  3f3aa77ff1c8 ("net/sched: add qstats_cpu_drop_inc() helper")

net/wireless/pmsr.c
  0f3c0a197309 ("wifi: nl80211: fix NL80211_PMSR_FTM_REQ_ATTR_FTMS_PER_BURST usage")
  410aa47fd9d3 ("wifi: cfg80211: allow suppressing FTM result reporting for PD requests")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from Netfilter, IPsec, Bluetooth and WiFi.

  Current release - fix to a fix:

   - ipmr: add __rcu to netns_ipv4.mrt, make sure we hold the RCU lock
     in all relevant places

  Current release - new code bugs:

   - fixes for the recently added resizable hash tables

   - ipv6: make sure we default IPv6 tunnel drivers to =m now that IPv6
     itself is built in

   - drv: octeontx2-af: fixes for parser/CAM fixes

  Previous releases - regressions:

   - phy: micrel: fix LAN8814 QSGMII soft reset

   - wifi:
       - cw1200: revert "Fix locking in error paths"
       - ath12k: fix crash on WCN7850, due to adding the same queue
         buffer to a list multiple times

  Previous releases - always broken:

   - number of info leak fixes

   - ipv6: implement limits on extension header parsing

   - wifi: number of fixes for missing bound checks in the drivers

   - Bluetooth: fixes for races and locking issues

   - af_unix:
       - fix an issue between garbage collection and PEEK
       - fix yet another issue with OOB data

   - xfrm: esp: avoid in-place decrypt on shared skb frags

   - netfilter: replace skb_try_make_writable() by skb_ensure_writable()

   - openvswitch: vport: fix race between tunnel creation and linking
     leading to invalid memory accesses (type confusion)

   - drv: amd-xgbe: fix PTP addend overflow causing frozen clock

  Misc:

   - sched/isolation: make HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN
     (for relevant IPVS change)"

* tag 'net-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (190 commits)
  net: sparx5: configure serdes for 1000BASE-X in sparx5_port_init()
  net: sparx5: fix wrong chip ids for TSN SKUs
  net: stmmac: dwmac-nuvoton: fix NULL pointer dereference in nvt_set_phy_intf_sel()
  tcp: Fix dst leak in tcp_v6_connect().
  ipmr: Call ipmr_fib_lookup() under RCU.
  net: phy: broadcom: Save PHY counters during suspend
  net/smc: fix missing sk_err when TCP handshake fails
  af_unix: Reject SIOCATMARK on non-stream sockets
  veth: fix OOB txq access in veth_poll() with asymmetric queue counts
  eth: fbnic: fix double-free of PCS on phylink creation failure
  net: ethernet: cortina: Drop half-assembled SKB
  selftests: mptcp: pm: restrict 'unknown' check to pm_nl_ctl
  selftests: mptcp: check output: catch cmd errors
  mptcp: pm: prio: skip closed subflows
  mptcp: pm: ADD_ADDR rtx: return early if no retrans
  mptcp: pm: ADD_ADDR rtx: skip inactive subflows
  mptcp: pm: ADD_ADDR rtx: resched blocked ADD_ADDR quicker
  mptcp: pm: ADD_ADDR rtx: free sk if last
  mptcp: pm: ADD_ADDR rtx: always decrease sk refcount
  mptcp: pm: ADD_ADDR rtx: fix potential data-race
  ...

net: sparx5: configure serdes for 1000BASE-X in sparx5_port_init()

sparx5_port_init() only invokes sparx5_serdes_set() and the associated
shadow-device enable and low-speed device switch for SGMII and QSGMII.
On any port with a high-speed primary device (DEV5G/DEV10G/DEV25G)
configured for 1000BASE-X the serdes is therefore left uninitialized,
the DEV2G5 shadow is never enabled, and the port stays pointed at its
high-speed device rather than the DEV2G5. The PCS1G block looks
healthy in isolation, but no frames reach the link partner.

Add 1000BASE-X to the check so the same three steps run.

Note: the same issue might apply to 2500BASE-X, but that will,
eventually, be addressed in a separate commit.

Reported-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 946e7fd5053a ("net: sparx5: add port module support")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20260506-misc-fixes-sparx5-lan969x-v2-4-fb236aa96908@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sparx5: fix wrong chip ids for TSN SKUs

The TSN SKUs in enum spx5_target_chiptype have incorrect IDs:

  SPX5_TARGET_CT_7546TSN    = 0x47546,
  SPX5_TARGET_CT_7549TSN    = 0x47549,
  SPX5_TARGET_CT_7552TSN    = 0x47552,
  SPX5_TARGET_CT_7556TSN    = 0x47556,
  SPX5_TARGET_CT_7558TSN    = 0x47558,

The value read back from the chip is GCB_CHIP_ID_PART_ID, which is a
GENMASK(27, 12) field, i.e. at most 16 bits wide. It can never match
these IDs, so probing a TSN part fails with a "Target not supported"
error.

Fix the enum to use the actual 16-bit part IDs returned by the
hardware: 0x0546, 0x0549, 0x0552, 0x0556 and 0x0558.

Reported-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 3cfa11bac9bb ("net: sparx5: add the basic sparx5 driver")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20260506-misc-fixes-sparx5-lan969x-v2-3-fb236aa96908@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'sound-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Again a collection of small fixes, mostly for device-specific ones.

  The only big LOC is about the removal of pretty old dead code in
  ab8500 codec driver, while the rest all nice small changes.

  Core / API:
   - Fix race in deferred fasync state checks
   - Fix UMP group filtering in sequencer

  ASoC:
   - cs35l56: fixes for driver cleanup and error paths
   - tas2764/2770: workaround for bogus temperature readings
   - wm_adsp: fixes for firmware unit tests
   - amd-yc: more DMI quirks for laptops
   - Minor fixes for fsl_xcvr and spacemit

  HD-Audio:
   - Mute LED and speaker quirks for HP, Lenovo, and Xiaomi laptops

  USB-audio:
   - New device-specific quirks (Motu, JBL, AlphaTheta, Razer)
   - Fix of MIDI2 playback on resume

  Others:
   - Firewire-tascam control event fix
   - Minor cleanups and fixes for sparc/dbri and pcmtest"

* tag 'sound-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (28 commits)
  ASoC: cs35l56: Destroy workqueue in probe error path
  ASoC: cs35l56: Don't use devres to unregister component
  ALSA: sparc/dbri: add missing fallthrough
  ALSA: core: Serialize deferred fasync state checks
  ALSA: hda/realtek: Add mute LED fixup for HP Pavilion 15-cs1xxx
  ALSA: seq: Fix UMP group 16 filtering
  ASoC: wm_adsp_fw_find_test: Clear searched_fw_files in find-by-index test
  ASoC: wm_adsp_fw_find_test: Redirect wm_adsp_release_firmware_files()
  ASoC: tas2770: Deal with bogus initial temperature value
  ASoC: tas2764: Deal with bogus initial temperature register value
  ALSA: usb-audio: add clock quirk for Motu 1248
  ALSA: usb-audio: midi2: Restart output URBs on resume
  ALSA: hda/realtek: Fix mute and mic-mute LEDs for HP Envy X360 15-fh0xxx
  ALSA: usb-audio: Add quirk flags for JBL Pebbles
  ALSA: firewire-tascam: Do not drop unread control events
  ALSA: usb-audio: Add quirk flags for AlphaTheta EUPHONIA
  ASoC: fsl_xcvr: Fix event generation for cached controls
  ASoC: sdw_utils: avoid the SDCA companion function not supported failure
  ASoC: amd: yc: Add HP OMEN Gaming Laptop 16-ap0xxx product line in quirk table
  ASoC: cs35l56: Fix out-of-bounds in dev_err() in cs35l56_read_onchip_spkid()
  ...

Merge tag 'platform-drivers-x86-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:

- Silence unknown board warning for 8D41 (hp-wmi)

- Fix uninitialized variable in fan RPM handling (lenovo/wmi-other)

- Check min_size also when ACPI does not return an out object (wmi)

* tag 'platform-drivers-x86-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: lenovo: wmi-other: Fix uninitialized variable in lwmi_om_hwmon_write()
  platform/x86: hp-wmi: silence unknown board warning for 8D41
  platform/wmi: Fix unchecked min_size in wmidev_invoke_method()

Merge tag 'pmdomain-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm

Pull pmdomain fixes from Ulf Hansson:

- Fix detach procedure for virtual devices in genpd

- mediatek: Fix use-after-free in scpsys_get_bus_protection_legacy()

* tag 'pmdomain-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: mediatek: fix use-after-free in scpsys_get_bus_protection_legacy()
pmdomain: core: Fix detach procedure for virtual devices in genpd

net: stmmac: dwmac-nuvoton: fix NULL pointer dereference in nvt_set_phy_intf_sel()

priv->dev was never initialized after devm_kzalloc() allocates the
private data structure. When nvt_set_phy_intf_sel() is later invoked
via the phylink interface_select callback, it calls
nvt_gmac_get_delay(priv->dev, ...) which dereferences the NULL pointer.

Fix this by assigning priv->dev = dev immediately after allocation.

Fixes: 4d7c557f58ef ("net: stmmac: dwmac-nuvoton: Add dwmac glue for Nuvoton MA35 family")
Signed-off-by: Joey Lu <a0987203069@gmail.com>
Link: https://patch.msgid.link/20260506084614.192894-2-a0987203069@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Fix dst leak in tcp_v6_connect().

If a socket is bound to a wildcard address, tcp_v[46]_connect()
updates it with a non-wildcard address based on the route lookup.

After bhash2 was introduced in the cited commit, we must call
inet_bhash2_update_saddr() to update the bhash2 entry as well.

If inet_bhash2_update_saddr() fails, we must release the refcount
for dst by ip_route_connect() or ip6_dst_lookup_flow().

While tcp_v4_connect() calls ip_rt_put() in the error path,
tcp_v6_connect() does not call dst_release().

Let's call dst_release() when inet_bhash2_update_saddr() fails
in tcp_v6_connect().

Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260506070443.1699879-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipmr: Call ipmr_fib_lookup() under RCU.

Yi Lai reported RCU splat in reg_vif_xmit() below. [0]

When CONFIG_IP_MROUTE_MULTIPLE_TABLES=n, ipmr_fib_lookup()
uses rcu_dereference() without explicit rcu_read_lock().

Although rcu_read_lock_bh() is already held by the caller
__dev_queue_xmit(), lockdep requires explicit rcu_read_lock()
for rcu_dereference().

Let's move up rcu_read_lock() in reg_vif_xmit() to
cover ipmr_fib_lookup().

[0]:
WARNING: suspicious RCU usage
7.1.0-rc2-next-20260504-9d0d467c3572 #1 Not tainted
-----------------------------
net/ipv4/ipmr.c:329 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
2 locks held by syz.2.17/1779:
#0: ffffffff87896440 (rcu_read_lock_bh){....}-{1:3}, at: local_bh_disable include/linux/bottom_half.h:20 [inline]
#0: ffffffff87896440 (rcu_read_lock_bh){....}-{1:3}, at: rcu_read_lock_bh include/linux/rcupdate.h:891 [inline]
#0: ffffffff87896440 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x239/0x4140 net/core/dev.c:4792
#1: ffff88801a199d18 (_xmit_PIMREG#2){+...}-{3:3}, at: spin_lock include/linux/spinlock.h:342 [inline]
#1: ffff88801a199d18 (_xmit_PIMREG#2){+...}-{3:3}, at: __netif_tx_lock include/linux/netdevice.h:4795 [inline]
#1: ffff88801a199d18 (_xmit_PIMREG#2){+...}-{3:3}, at: __dev_queue_xmit+0x1d5d/0x4140 net/core/dev.c:4865

stack backtrace:
CPU: 1 UID: 0 PID: 1779 Comm: syz.2.17 Not tainted 7.1.0-rc2-next-20260504-9d0d467c3572 #1 PREEMPT(lazy)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x121/0x150 lib/dump_stack.c:120
dump_stack+0x19/0x20 lib/dump_stack.c:129
lockdep_rcu_suspicious+0x15b/0x1f0 kernel/locking/lockdep.c:6878
ipmr_fib_lookup net/ipv4/ipmr.c:329 [inline]
reg_vif_xmit+0x2ee/0x3c0 net/ipv4/ipmr.c:540
__netdev_start_xmit include/linux/netdevice.h:5382 [inline]
netdev_start_xmit include/linux/netdevice.h:5391 [inline]
xmit_one net/core/dev.c:3889 [inline]
dev_hard_start_xmit+0x170/0x700 net/core/dev.c:3905
__dev_queue_xmit+0x1df1/0x4140 net/core/dev.c:4871
dev_queue_xmit include/linux/netdevice.h:3423 [inline]
packet_xmit+0x252/0x370 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3082 [inline]
packet_sendmsg+0x39ad/0x5650 net/packet/af_packet.c:3114
sock_sendmsg_nosec net/socket.c:797 [inline]
__sock_sendmsg net/socket.c:812 [inline]
____sys_sendmsg+0xa21/0xba0 net/socket.c:2716
___sys_sendmsg+0x121/0x1c0 net/socket.c:2770
__sys_sendmsg+0x177/0x220 net/socket.c:2802
__do_sys_sendmsg net/socket.c:2807 [inline]
__se_sys_sendmsg net/socket.c:2805 [inline]
__x64_sys_sendmsg+0x80/0xc0 net/socket.c:2805
x64_sys_call+0x1d9c/0x21c0 arch/x86/include/generated/asm/syscalls_64.h:47
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xc1/0x1020 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f37e563ee5d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 af 1b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe5caa7fa8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000005c5fa0 RCX: 00007f37e563ee5d
RDX: 0000000000000000 RSI: 00002000000012c0 RDI: 0000000000000004
RBP: 00000000005c5fa0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00000000005c5fac R15: 00000000005c5fa0
</TASK>

Fixes: b3b6babf4751 ("ipmr: Free mr_table after RCU grace period.")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Reported-by: Yi Lai <yi1.lai@intel.com>
Closes: https://lore.kernel.org/netdev/afrY34dLXNUboevf@ly-workstation/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260506065955.1695753-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: broadcom: Save PHY counters during suspend

The PHY counters can be lost if the PHY is reset during suspend. We
need to save the values into the shadow counters or the accounting
will be incorrect over multiple suspend and resume cycles.

Fixes: 820ee17b8d3b ("net: phy: broadcom: Add support code for reading PHY counters")
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20260505173926.2870069-1-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/smc: fix missing sk_err when TCP handshake fails

In smc_connect_work(), when the underlying TCP handshake fails, the error
code (rc) must be propagated to sk_err to ensure userspace can correctly
retrieve the error status via SO_ERROR. Currently, the code only handles
a restricted set of error codes (e.g., EPIPE, ECONNREFUSED). If other
errors occurs, such as EHOSTUNREACH, sk_err remains unset (zero).

This affects applications that rely on SO_ERROR to determine connect
outcome. For example, higher versions of Go's netpoller treats
SO_ERROR == 0 combined with a failed getpeername() as a spurious wakeup
and re-enters epoll_wait(). Under ET mode, no further edge will be
generated since the socket is already in a terminal state, causing the
connect to hang indefinitely or until a user-specified timeout, if one
is set.

Fixes: 50717a37db03 ("net/smc: nonblocking connect rework")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Link: https://patch.msgid.link/20260506014105.27093-1-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>