git.ipfire.org Git - thirdparty/linux.git/log

net: txgbe: rework service event handling

Convert to use test_and_clear_bit() for link event subtasks. Only re-arm
the WX_FLAG_NEED_MODULE_RESET flag when module is absent. Unsupported or
invalid modules no longer cause the service task to continuously retry
module identification.

Additionally, explicitly cancel service_task during device teardown to
ensure no pending asynchronous service work survives after the device
has entered the DOWN state.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20260525100543.27140-4-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: wangxun: avoid statistics updates during device teardown

After introducing WX_STATE_DOWN, wx_update_stats() now explicitly skips
statistics collection while the device is in teardown or reset state.
Calling wx_update_stats() from the device disable path therefore becomes
redundant.

Remove wx_update_stats() calls from ngbe_disable_device() and
txgbe_disable_device().

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20260525100543.27140-3-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: wangxun: introduce WX_STATE_DOWN to serialize device shutdown state

Replace various netif_running() checks with an explicit WX_STATE_DOWN
state bit to track whether the device datapath and interrupt handling
are operational.

The previous logic relied on netif_running() to gate interrupt
reenablement, queue wakeups, statistics updates, and service task
execution. However, netif_running() only reflects the administrative
state of the netdevice and does not fully serialize against teardown
and reset paths. During device shutdown and reset flows, asynchronous
contexts such as interrupt handlers, NAPI poll, and service work could
still observe netif_running() as true while device resources were
already being disabled or freed.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20260525100543.27140-2-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: sch_fq: update flow delivery time on earlier EDT packet

When inserting an EDT packet with time before flow->time_next_packet,
update the flow and possibly queue next delivery time.

Reinsert the flow into the q->delayed rb-tree to position correctly
and to have fq_check_throttled set wake-up at the right next time.

Factor RB tree insertion out fq_flow_set_throttled to avoid open
coding twice.

EDT packets do not take precedence over queue rate limit. Skip this
new step if a queue limit is set. EDT packets do take precedence over
per-socket rate limits, as can be seen from fq_dequeue reading
sk_pacing_rate if !skb->tstamp.

With this change the so_txtime selftest sends packets in the expected
order.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260526134109.2624493-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'introduce-airoha-an8801r-series-gigabit-ethernet-phy-driver'

Louis-Alexis Eyraud says:

====================
Introduce Airoha AN8801R series Gigabit Ethernet PHY driver

This series introduces the Airoha AN8801R Gigabit Ethernet PHY initial
support.

The Airoha AN8801R is a low power single-port Ethernet PHY Transceiver
with Single-port serdes interface for 1000Base-X/RGMII.
This chip is compliant with 10Base-T, 100Base-TX and 1000Base-T IEEE
802.3(u,ab) and supports:
  - Energy Efficient Ethernet (802.3az)
  - Full Duplex Control Flow (802.3x)
  - auto-negotiation
  - crossover detect and autocorrection,
  - Wake-on-LAN with Magic Packet
  - Jumbo Frame up to 9 Kilobytes.
This PHY also supports up to three user-configurable LEDs, which are
usually used for LAN Activity, 100M, 1000M indication.

The series provides the devicetree binding and the driver that have been
written by AngeloGioacchino Del Regno, based on downstream
implementation ([1]). The driver allows setting up PHY LEDs, 10/100M,
1000M speeds, and Wake on LAN and PHY interrupts.

Since v2, the series also adds the air_phy_lib library, which goal is to
share common code between air_en8811h and air_an8801 drivers, and its use
in them. The first shared functions are the existing BuckPbus register
accessors and air_phy_read/write_page functions coming from air_en8811h
driver.

The series is based on net-next kernel tree (sha1: 90d03ee2c5dc) and
I have tested it on Mediatek Genio 720-EVK board (that integrates an
Airoha AN8801RIN/A Ethernet PHY) with early board hardware enablement
patches.

[1]: https://gitlab.com/mediatek/aiot/bsp/linux/-/blob/mtk-v6.6/drivers/net/phy/an8801.c
====================

Link: https://patch.msgid.link/20260526-add-airoha-an8801-support-v5-0-01aea8dee69b@collabora.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: air_an8801: ensure maximum available speed link use

To ensure that the Airoha AN8801R PHY uses the maximum available link
speed, an additional register write is needed to configure the function
mode for either 1G or 100M/10M operation after link detection.

So, in air_an8801 driver, implement a custom read_status callback, that
after genphy_read_status determines the link speed, sets the bit 0 of
the link mode register (REG_LINK_MODE) if the detected speed is 1Gbps,
or unsets it otherwise.

Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260526-add-airoha-an8801-support-v5-6-01aea8dee69b@collabora.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: Introduce Airoha AN8801R Gigabit Ethernet PHY driver

Introduce a driver for the Airoha AN8801R Series Gigabit Ethernet
PHY; this currently supports setting up PHY LEDs, 10/100M, 1000M
speeds, and Wake on LAN and PHY interrupts.

Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260526-add-airoha-an8801-support-v5-5-01aea8dee69b@collabora.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: Rename Airoha common BuckPBus register accessors

Rename the BuckPBus register accessors functions present in air_phy_lib
and their calls in air_en8811h driver, so all exported functions start
with the same prefix.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Link: https://patch.msgid.link/20260526-add-airoha-an8801-support-v5-4-01aea8dee69b@collabora.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: air_phy_lib: Factorize BuckPBus register accessors

In preparation of Airoha AN8801R PHY support, move the BuckPBus
register accessors and definitions, present in air_en8811h driver,
into the Airoha PHY shared code (air_phy_lib), so they will be usable
by the new driver without duplicating them.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Link: https://patch.msgid.link/20260526-add-airoha-an8801-support-v5-3-01aea8dee69b@collabora.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: Add Airoha phy library for shared code

In preparation of Airoha AN8801R PHY support, split out the interface
functions that will be common between the already present air_en8811h
driver and the new one, and put them into a new library named
air_phy_lib.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Link: https://patch.msgid.link/20260526-add-airoha-an8801-support-v5-2-01aea8dee69b@collabora.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: Add support for Airoha AN8801R GbE PHY

Add a new binding to support the Airoha AN8801R Series Gigabit
Ethernet PHY.

Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Link: https://patch.msgid.link/20260526-add-airoha-an8801-support-v5-1-01aea8dee69b@collabora.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: cls_bpf: prevent unbounded recursion in offload rollback

Quan Sun reported [1] a stack overflow in cls_bpf_offload_cmd().

Reproducer on netdevsim: add a skip_sw cls_bpf filter, set the
bpf_tc_accept debugfs knob to 0, then `tc filter replace`. The replace
calls tc_setup_cb_replace() which fails. cls_bpf_offload_cmd() then
swaps prog/oldprog and recursively calls itself to roll back. But
bpf_tc_accept=0 makes the rollback fail too, which triggers yet another
rollback frame with the same arguments, and so on until the stack is
exhausted.

bpf_tc_accept is just a convenient knob for the reproducer. Any driver
whose tc_setup_cb_replace() fails twice in a row can hit the same loop,
so this is not a netdevsim-only issue.

Two ways to fix it:

  1) Have the rollback call tc_setup_cb_add() on oldprog instead of
     re-entering cls_bpf_offload_cmd().
  2) Mark the rollback frame with a flag and skip a second-level
     rollback from inside it.

Go with (2). It is the smaller change and keeps the original behaviour:
the rollback still goes through tc_setup_cb_replace(), so the driver
gets one real chance to restore its state. If that attempt also fails,
we just return the original error instead of recursing.

[1]: https://lore.kernel.org/bpf/ce5a6005-3c5e-4696-9e05-eba9461dc860@std.uestc.edu.cn/T/#u

Fixes: 102740bd9436 ("cls_bpf: fix offload assumptions after callback conversion")
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://patch.msgid.link/20260526025529.24382-1-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: page_pool: silence static analysis warnings in page_pool_nl_stats_fill()

nla_nest_start() can return NULL if the skb runs out of space.

Jakub:
There is no bug here, if nla_nest_start() failed there's not space
left in the message. Next nla_put_uint() will also fail and we will
exit via nla_nest_cancel() which handles NULL just fine.
Various people keep sending us this patch so let's commit this.

Signed-off-by: Zhao Dongdong <zhaodongdong@kylinos.cn>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/tencent_A82EBAB365A8B888B66FDCF115A3DCB8880A@qq.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ipv6-frags-adopt-__in6_dev_stats_get-a-bit-more'

Eric Dumazet says:

====================
ipv6: frags: adopt __in6_dev_stats_get() a bit more

First patch addresses Sashiko's feedback about a potential
NULL dereference in __in6_dev_stats_get().

Second patch adopts __in6_dev_stats_get() in net/ipv6/reassembly.c.
====================

Link: https://patch.msgid.link/20260526145529.3587126-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: frags: cleanup __IP6_INC_STATS() confusion

After commits e1ae5c2ea478 ("vrf: Increment Icmp6InMsgs on the original
netdev") and bdb7cc643fc9 ("ipv6: Count interface receive statistics
on the ingress netdev") net/ipv6/reassembly.c uses three different
ways to reach idev in various __IP6_INC_STATS() calls.

- ip6_dst_idev(skb_dst(skb))
- __in6_dev_get_safely(skb->dev)
- __in6_dev_stats_get(skb->dev)

Lets centralize this from ipv6_frag_rcv() and use __in6_dev_stats_get().

Note that ipv6_frag_rcv() tests if skb->dev could be NULL already, so
I chose to also guard against NULL, but we probably can remove the
tests in a followup patch, because I do not think skb->dev could be NULL.

iif = skb->dev ? skb->dev->ifindex : 0;

idev can be NULL, __IP6_INC_STATS() deals with this possibility.

Small code size reduction as a bonus.

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-145 (-145)
Function                                     old     new   delta
ipv6_frag_rcv                               2399    2362     -37
ip6_frag_reasm                               705     597    -108
Total: Before=31455552, After=31455407, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260526145529.3587126-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: guard against possible NULL deref in __in6_dev_stats_get()

dev_get_by_index_rcu() could return NULL if the original physical
device is unregistered.

Found by Sashiko.

Fixes: e1ae5c2ea478 ("vrf: Increment Icmp6InMsgs on the original netdev")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Suryaputra <ssuryaextr@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260526145529.3587126-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sfp: add quirk for OEM 2.5G optical modules

Some OEM-branded SFP modules are incorrectly detected as
1000Base-X and fail to establish link on 2.5G-capable ports.

These modules do not properly advertise 2500Base-X capability
in their EEPROM and require forcing the correct SerDes mode.

Add sfp_quirk_2500basex for:

  - OEM SFP-2.5G-LH03-B
  - OEM SFP-2.5G-LH20-A

Both modules report:

  Vendor name: OEM
  Vendor PN: SFP-2.5G-LH03-B / SFP-2.5G-LH20-A

Tested on OpenWrt with successful 2.5G link establishment.

Signed-off-by: Wei Qisen <weixiansen574@163.com>
Link: https://patch.msgid.link/20260526055206.1750-1-weixiansen574@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: mcast: annotate data-races around mca_users

/proc/net/igmp6 walks IPv6 multicast memberships under RCU and prints
mca_users without holding idev->mc_lock, while multicast join and leave
paths update the field while holding idev->mc_lock. Annotate this
intentional lockless snapshot with READ_ONCE() and the matching writers
with WRITE_ONCE().

Signed-off-by: Yuyang Huang <sigefriedhyy@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260524022456.20689-1-sigefriedhyy@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/dns_resolver: consolidate namelen checks in dns_query

Consolidate the namelen checks and return -EINVAL early if needed. Drop
the namelen == 0 check since it is covered by namelen < 3.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260525095400.821912-3-thorsten.blum@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'rtnetlink-rtnl-avoidance-in-rtnl_getlink-and-rtnl_dump_ifinfo'

Eric Dumazet says:

====================
rtnetlink: RTNL avoidance in rtnl_getlink() and rtnl_dump_ifinfo()

Many shell scripts invoke iproute2 commands specifying a device by
its name.

This series improves their performance avoiding RTNL acquisition
for their (repeated) name->index conversion.
====================

Link: https://patch.msgid.link/20260525083542.1565964-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rtnetlink: add RTEXT_FILTER_NAME_ONLY support to rtnl_dump_ifinfo()

When user requests RTEXT_FILTER_NAME_ONLY flag, we limit the dump
parts to:

- struct nlmsghdr
- IFLA_IFNAME
- IFLA_PROP_LIST (alternate names)

- This saves space in the dump, pushing more devices per system call.
- This can be done without acquiring RTNL.

I still have a medium term goal to avoid RTNL in rtnl_dump_ifinfo()
regardless of RTEXT_FILTER_NAME_ONLY being used.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260525083542.1565964-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rtnetlink: do not assume RTNL is held in link_master_filtered()

RTNL might be no longer held by the caller in the following patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260525083542.1565964-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rtnetlink: do not acquire RTNL in rtnl_getlink() with RTEXT_FILTER_NAME_ONLY

When RTEXT_FILTER_NAME_ONLY is requested, rtnl_fill_ifinfo()
is dumping device attributes which do not need RTNL protection.

Many shell scripts invoke iproute2 commands specifying a device by
its name. After this patch, they will no longer add RTNL pressure.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260525083542.1565964-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: defer netdev_name_node_alt_flush() call to netdev_run_todo()

In the following patch, we want to call rtnl_fill_prop_list() without
RTNL being held, but after a device reference was taken.

We need to free altnames in netdev_run_todo() instead of
unregister_netdevice_many_notify().

Freeing will only happen once all device references
have been released.

Note that dev->name_node serves as the anchor for altnames,
thus must be also freed in netdev_run_todo().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260525083542.1565964-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rtnetlink: use nla_nest_end_safe() in rtnl_fill_prop_list()

Avoid corrupting a netlink message and confuse user space in the
very unlikely case rtnl_fill_prop_list was able to produce a very big
nested element.

This is extremely unlikely, because rtnl_prop_list_size()
provisions nla_total_size(ALTIFNAMSIZ) per altname.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260525083542.1565964-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: thunderx: fix PTP device ref leak in nicvf_probe()

cavium_ptp_get() acquires a reference to the PTP PCI device
through pci_get_device(). If any initialization step fails
after cavium_ptp_get(), the PTP PCI device reference is leaked.
Add a common error path to release the PTP reference before
returning from probe failures.

Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn>
Link: https://patch.msgid.link/20260525082611.61817-1-lihaoxiang@isrc.iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hsr: require valid EOT supervision TLV

Supervision frames are only valid if terminated with a zero-length EOT
TLV. The current check fails to reject non-EOT entries as the terminal
TLV, potentially allowing malformed supervision traffic.

Fix this by strictly requiring the terminal TLV to be HSR_TLV_EOT with
a length of zero.

Signed-off-by: Luka Gejak <luka.gejak@linux.dev>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260523130420.62144-1-luka.gejak@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nf-next-26-05-25' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter: updates for net-next

The following patchset contains Netfilter fixes and small enhancements:

1) Disable 32-bit x_tables compatibility (32bit binaries on 64bit
   kernel) interface in user namespaces.  This is 'last warning'
   before this is removed for good.

2) Add a configuration toggle for netfilter GCOV profiling. Provide
   dedicated toggles for ipset and ipvs.

3) Remove modular support for nfnetlink and restrict it to built-in only.
   From Pablo Neira Ayuso.

4) Use per-rule hash initval in nf_conncount. This avoids unecessary
   lock contention with short keys (e.g. conntrack zones) in different
   namespaces.

5) Use nf_ct_exp_net() in ctnetlink expectation dumps.
   From Pratham Gupta.

6) Remove a dead conditional in nft_set_rbtree.

7) Fix conntrack helper policy updates to apply per-class values correctly.
   From David Carlier.

8) Fix an off-by-one OOB read in nf_conntrack_irc:parse_dcc(). Use strict
   less-than comparison in the newline search loop to respect the
   exclusive-end pointer convention.  From Muhammad Bilal.

9) Fix typos in nf_conntrack_proto_tcp comments.  From Avinash Duduskar.

10) Restore performance optimization in nft_set_pipapo_avx2 by passing
    the next map index. Refactor lookup logic for clarity and add a
    DEBUG_NET check to document this.

11) Avoid (harmless) u16 overflow in nf_conntrack_ftp when parsing FTP PORT
    and EPRT commands.  Ignore commands where single octet exceeds 255.
    From Giuseppe Caruso.

Patch 12, which removes incorrect (and obviously unused) code from
nft_byteorder was kept back to avoid a net -> net-next merge conflict.

* tag 'nf-next-26-05-25' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_conntrack_ftp: avoid u16 overflows
  netfilter: nft_set_pipapo_avx2: restore performance optimization
  netfilter: nf_conntrack_proto_tcp: fix typos in comments
  netfilter: nf_conntrack_irc: fix parse_dcc() off-by-one OOB read
  netfilter: nfnl_cthelper: apply per-class values when updating policies
  netfilter: nft_set_rbtree: remove dead conditional
  netfilter: ctnetlink: use nf_ct_exp_net() in expectation dump
  netfilter: nf_conncount: use per-rule hash initval
  netfilter: allow nfnetlink built-in only
  netfilter: add option for GCOV profiling
  netfilter: x_tables: disable 32bit compat interface in user namespaces
====================

Link: https://patch.msgid.link/20260525182924.28456-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netconsole: Constify struct configfs_item_operations and configfs_group_operations

'struct configfs_item_operations' and 'configfs_group_operations' are not
modified in this driver.

Constifying these structures moves some data to a read-only section, so
increases overall security, especially when the structure holds some
function pointers.

On a x86_64, with allmodconfig, as an example:
Before:
======
   text    data     bss     dec     hex filename
  64259   24272     608   89139   15c33 drivers/net/netconsole.o

After:
=====
   text    data     bss     dec     hex filename
  64579   23952     608   89139   15c33 drivers/net/netconsole.o

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/7ff56bdb0cee826a56365f930dcdf457b44931df.1779711734.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_fid: use a dedicated list head pointer for sorted insert

mlxsw_sp_fid_port_vid_list_add() inserts into a list sorted by
local_port. It walks the list to find the first entry with a
larger local_port, then inserts the new entry before it:

list_add_tail(&port_vid->list, &tmp_port_vid->list);

If the loop falls through (the new local_port is the largest),
tmp_port_vid runs off the end of the list. &tmp_port_vid->list
then ends up at the list head itself (container_of() offsets
cancel), and list_add_tail() inserts at the tail. So the code
works today.

It is fragile though. Anyone who later adds a read of another
field of tmp_port_vid will hit memory outside the list head.

Track the insertion point with a dedicated list_head pointer.
Initialise insert_before to &fid->port_vid_list, set it to
&tmp_port_vid->list only on early break, and pass insert_before
to list_add_tail(). The cursor is no longer touched after the
loop. Behaviour is unchanged.

Same shape as the Koschel cleanups from 2022 (e.g. 99d8ae4ec8a
tracing, 2966a9918df clockevents, dc1acd5c946 dlm).

Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260525071759.1517576-1-maoyixie.tju@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: netc: fix unmet Kconfig dependencies for NET_DSA_NETC_SWITCH

NET_DSA_NETC_SWITCH selects NXP_NTMP, NXP_NETC_LIB and FSL_ENETC_MDIO,
but these symbols depend on NET_VENDOR_FREESCALE which may not be
enabled. This results in Kconfig warnings and linker errors like:

  undefined reference to `ntmp_bpt_update_entry'
  undefined reference to `ntmp_fdbt_search_port_entry'
  undefined reference to `ntmp_free_cbdr'
  undefined reference to `enetc_hw_alloc'
  ...

Therefore, add "depends on NET_VENDOR_FREESCALE" to NET_DSA_NETC_SWITCH,
ensuring that the selected symbols NXP_NTMP, NXP_NETC_LIB and
FSL_ENETC_MDIO, which all depend on NET_VENDOR_FREESCALE, can only be
selected when that dependency is already satisfied.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605240046.8MvKuOMg-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202605240706.EuGmnrz5-lkp@intel.com/
Fixes: 187fbae024c8 ("net: dsa: netc: introduce NXP NETC switch driver for i.MX94")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260524070310.2429819-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: fib_tests: add temporary IPv6 address renewal test

Add a test to check that temporary IPv6 address is regenerated properly
after the base prefix is deprecated and restored.

Fib6 temporary address renewal test
TEST: IPv6 temporary address cleanly deprecated and regenerated [ OK ]

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260523103811.3790-2-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: addrconf: fix temp address generation after prefix deprecation

When a router temporarily deprecates an IPv6 prefix (either by sending a
Router Advertisement with Preferred Lifetime = 0 or by letting the
lifetime expire) and later restores it, the kernel permanently loses its
ability to generate temporary privacy addresses (RFC 8981) for that
prefix.

This happens because the address worker attempts to generate a
replacement temporary address when the current one nears expiration. As
the base prefix is deprecated already, the generation fails after
marking the temporary address as already having spawned a replacement
(ifp->regen_count++).

When the router eventually restores the prefix, the temporary address
becomes active again. However, once it naturally expires, the address
worker sees this temporary address already tried to generate one and
skips the regeneration.

Fix the issue by resetting the regen_count check of the latest temp
address generated for the prefix updated by the incoming RA.

Reported-by: Łukasz Stelmach <steelman@post.pl>
Closes: https://lore.kernel.org/netdev/87340td30q.fsf%25steelman@post.pl/
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260523103811.3790-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: napi: Skip last poll when arming gro timer in busy poll

Skip the extra call to napi->poll(), if the gro timer is armed at the
end of busy polling. This removes the need for having a separate
__busy_poll_stop() routine and its code is moved directly into the
relevant places in busy_poll_stop(). Remove obsolete comment about
ndo_busy_poll_stop().

This is a follow-up to commit 58e2330bd455 ("net: napi: Avoid gro timer
misfiring at end of busypoll"), which has deferred arming the gro timer
to the end of __busy_poll_stop() to eliminate a race condition between
a short timer and long poll that could leave the queue stuck with
interrupts disabled and no timer armed.

Co-developed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
Link: https://patch.msgid.link/20260523012247.1574691-1-mkarsten@uwaterloo.ca
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

llc: Add SPDX id lines to llc header files

Add appropriate SPDX-License-Identifier lines to llc header (.h)
files, and remove other license text from the files.

Signed-off-by: Tim Bird <tim.bird@sony.com>
Link: https://patch.msgid.link/20260523002354.28831-1-tim.bird@sony.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

llc: Add SPDX id lines to some llc source files

Most of the lls source files are missing SPDX-License-Identifier
lines. Add appropriate IDs to these files, and remove other license
info from the header. In once case, leave the existing id line
and just remove the license reference text.

Signed-off-by: Tim Bird <tim.bird@sony.com>
Link: https://patch.msgid.link/20260522225508.24006-1-tim.bird@sony.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'add-ovs-packet-family-ynl-spec-and-unicast-notification-support'

Minxi Hou says:

====================
Add OVS packet family YNL spec and unicast notification support

This series adds a YAML netlink spec for the OVS_PACKET_FAMILY genetlink
family and a bind-only ntf_bind() helper for receiving unicast
notifications.
====================

Link: https://patch.msgid.link/20260522174154.720293-1-houminxi@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: add unicast notification receive support

Add ntf_bind() method to YnlFamily for binding the netlink
socket without joining a multicast group. This enables receiving
unicast notifications through the existing poll_ntf/check_ntf
path.

The OVS packet family sends MISS and ACTION upcalls via
genlmsg_unicast() to a per-vport PID rather than through a
multicast group. The existing ntf_subscribe() couples bind()
with setsockopt(ADD_MEMBERSHIP), which does not fit the unicast
case. ntf_bind() provides the bind-only alternative, with the
address defaulting to (0, 0) but exposed as an explicit argument.

Signed-off-by: Minxi Hou <houminxi@gmail.com>
Link: https://patch.msgid.link/20260522174154.720293-3-houminxi@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: specs: add OVS packet family specification

Add YAML netlink spec for the OVS_PACKET_FAMILY (ovs_packet).
This completes the set of OVS genetlink family specs (ovs_datapath,
ovs_flow, ovs_vport already exist).

The spec defines three operations: MISS (event), ACTION (event),
and EXECUTE (do). MISS and ACTION are kernel-to-userspace upcalls
sent via genlmsg_unicast(); EXECUTE is the only registered genl
operation.

Key, actions, and egress-tun-key attributes are typed as binary
rather than nest because the nested attribute definitions belong
to the ovs_flow spec and cross-spec references are not supported
by the YNL framework.

Signed-off-by: Minxi Hou <houminxi@gmail.com>
Link: https://patch.msgid.link/20260522174154.720293-2-houminxi@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

s390/ism: Drop superfluous zeros in pci_device_id array

The .driver_data member of the struct pci_device_id array were
initialized by a list expressions to zero without making use of that
value. In this case it's better to not specify a value at all and let
the compiler fill in the zeros. Same for the list terminator that can
better be completely empty.

This patch doesn't introduce changes to the compiled array.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://patch.msgid.link/20260522153010.777081-2-u.kleine-koenig@baylibre.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-enetc-prepare-for-enetc-v4-vf-support'

Wei Fang says:

====================
net: enetc: Prepare for ENETC v4 VF support

This patch series refactors and extends the ENETC driver infrastructure
to prepare for upcoming ENETC v4 Virtual Function (VF) support. The main
focus is on code commonization, improved VF-PF communication, and dynamic
resource allocation.

The ENETC IP has evolved across different revisions, and the existing
driver architecture was primarily designed around v1 hardware. To support
v4 VFs efficiently, we need to share common code between PF drivers of
different IP versions while maintaining compatibility.

Key changes in this series:

1. VF-PF Messaging Infrastructure:
   - Convert mailbox messages to new formats
   - Use read_poll_timeout() for simplifying VF mailbox polling
   - Add support for IP minor revision query via messaging

2. Code Commonization:
   - Relocate SR-IOV configuration helpers to common PF code
   - Move VF message handlers to dedicated enetc_msg.c
   - Integrate enetc_msg.c into enetc-pf-common driver

3. CBDR (Control Buffer Descriptor Ring) Improvements:
   - Align v1 CBDR API with v4 for VF driver sharing
   - Add CBDR setup/teardown hooks to enetc_si_ops

4. Dynamic Resource Management:
   - Dynamically allocate rxmsg based on actual VF count
   - Use MADDR_TYPE constant for MAC filter array sizing

5. Generic SR-IOV Initialization:
   - Add generic helper to initialize SR-IOV resources

This refactoring lays the groundwork for cleanly integrating ENETC v4 VF
support in subsequent patch series, allowing code reuse between v1 and v4
PF drivers while maintaining a clean separation of version-specific
logic.

v1: https://lore.kernel.org/imx/20260511080805.2052495-1-wei.fang@nxp.com/
====================

Link: https://patch.msgid.link/20260522092438.1264020-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: enetc: dynamically allocate rxmsg based on VF count

The constant ENETC_MAX_NUM_VFS is defined as 2 when enabling support for
LS1028A. This works for LS1028A because its ENETC hardware supports up
to 2 VFs. However, ENETC v4 has varying VF capabilities depending on the
SoC:

i.MX94 standalone ENETC: 0 VFs
i.MX94 internal ENETC: 3 VFs
i.MX952: 1 VF

Using a fixed ENETC_MAX_NUM_VFS for memory allocation leads to
over-allocation on SoCs with fewer or no VF support. To better match
hardware capabilities and avoid unnecessary memory usage, change rxmsg
memory allocation from a fixed-size array to dynamic allocation based
on the actual VF count retrieved via pci_sriov_get_totalvfs().

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260522092438.1264020-13-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: enetc: use MADDR_TYPE for MAC filter array size

The mac_filter array in struct enetc_pf is sized as
ENETC_MAX_NUM_MAC_FLT, defined as (ENETC_MAX_NUM_VFS + 1) * MADDR_TYPE.
This resulted in an array of 6 elements (for 2 VFs), but only the first
2 entries are actually used.

The PF driver maintains MAC filters for unicast (UC) and multicast (MC)
addresses, indexed by the enum enetc_mac_addr_type (UC=0, MC=1). The
code only iterates over MADDR_TYPE (2) entries and directly accesses
mac_filter[UC] and mac_filter[MC]. The extra space allocated for
(ENETC_MAX_NUM_VFS * MADDR_TYPE) entries is never used because VF MAC
filtering is not implemented yet.

Remove the ENETC_MAX_NUM_MAC_FLT macro and size the array as
MADDR_TYPE, reducing the allocation from 6 to 2 entries. This saves 48
bytes per PF and better reflects the actual usage.

This change has no functional impact. Future VF MAC filtering support
will move mac_filter into struct enetc_si, allowing each SI (PF or VF)
to maintain its own independent filter table.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260522092438.1264020-12-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: enetc: add generic helper to initialize SR-IOV resources

The upcoming ENETC v4 PF driver will support SR-IOV, and its logic for
initializing VF resources is identical to the existing ENETC v1 PF
implementation. To avoid code duplication across PF drivers, factor out
the common SR-IOV initialization logic into the enetc-pf-common driver.

Add enetc_init_sriov_resources() to handle:
- Querying the total number of VFs supported by the device via
pci_sriov_get_totalvfs()
- Allocating memory for the VF state array (struct enetc_vf_state)

The implementation uses devm_kcalloc() instead of kzalloc() to simplify
memory management. This automatically frees VF state memory when the PF
device is removed, eliminating the need for explicit cleanup in error
and remove paths.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260522092438.1264020-11-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: enetc: add CBDR setup/teardown hooks to enetc_si_ops for VF support

The upcoming ENETC v4 VF will share the enetc-vf driver with the
existing v1 VF. However, ENETC v4 uses a revised CBDR (command BD ring)
setup/teardown API that differs from v1.

To support both versions in the same driver, add setup_cbdr() and
teardown_cbdr() function pointers to struct enetc_si_ops. This allows
each hardware version to register its own CBDR implementation:

- ENETC v1 VF registers enetc_setup_cbdr/enetc_teardown_cbdr (existing)
- ENETC v4 VF will register enetc4_setup_cbdr/enetc4_teardown_cbdr

Update the enetc-vf driver to call CBDR operations through si->ops
instead of directly invoking the v1 functions. This enables runtime
selection of the correct CBDR backend based on hardware version.

Changes:
- Add setup_cbdr() and teardown_cbdr() hooks to struct enetc_si_ops
- Register v1 CBDR functions in enetc_vsi_ops
- Replace direct calls with si->ops->setup_cbdr() and
si->ops->teardown_cbdr() in enetc_vf.c

No functional changes to existing v1 VF behavior.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260522092438.1264020-10-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: enetc: align v1 CBDR API with v4 for VF driver sharing

The upcoming ENETC v4 VF will share the enetc-vf driver with the v1 VF.
However, ENETC v4 introduces different CBDR (command BD ring) setup and
teardown semantics that are incompatible with v1.

To support both versions in the same driver, the .setup_cbdr() and
.teardown_cbdr() hooks will be added to struct enetc_si_ops, allowing
the driver to register version-specific implementations. So refactor the
v1 CBDR functions to match the v4-style interface (taking struct enetc_si*
instead of individual parameters), enabling them to be registered via
si_ops in the subsequent patch.

Changes:
- Update enetc_setup_cbdr() and enetc_teardown_cbdr() prototypes to
  take 'struct enetc_si *' as the sole parameter
- Extract parameters (dev, hw) from the enetc_si structure within the
  function implementations
- ENETC_CBDR_DEFAULT_SIZE has always been used as the number of command
  BDs, and there is no need to adjust the size of the command BD ring.
  Therefore, ENETC_CBDR_DEFAULT_SIZE is moved into the enetc_setup_cbdr()
- Update all call sites in enetc_pf.c and enetc_vf.c

No functional changes. This prepares for adding v4-specific CBDR handling
in subsequent patches.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260522092438.1264020-9-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: enetc: add VF-PF messaging support for IP minor revision query

For ENETC v4, different SoCs use different minor revisions, such as
i.MX95 v4.1, i.MX94 v4.3, and i.MX952 v4.6. Unlike the PF, the VF does
not have access to a global register that exposes the IP minor revision.
In the current driver model, the VF must select the appropriate driver
data based on this revision information.

To support this requirement, the VF now sends a minor revision query
message to the PF through the VSI-to-PSI mailbox mechanism. The PF
responds with the IP minor revision so that the VF can match the correct
driver data.

This patch adds PF-side support for replying to the minor revision
message and VF-side support for sending the query.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260522092438.1264020-8-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: enetc: convert mailbox messages to new formats

On the LS1028A platform, the PF-VF mailbox was only used to update the
VF's MAC address. The original message format is minimal, lacks a clear
structure, and provides no means for the receiver to validate message
integrity, making it difficult to extend for new features.

With the introduction of i.MX ENETC v4, the interaction between PF and
VF has become significantly more complex. Typical deployments now include
scenarios where the PF is controlled by an M core while the VF is driven
by either the Linux kernel or DPDK, or where the PF is controlled by the
Linux kernel while the VF is controlled by DPDK. These heterogeneous
driver combinations require a unified and extensible message format to
ensure compatibility across different operating environments.

This patch introduces a newly defined PF-VF message structure and
converts the existing MAC-update mechanism to use the new format. The
redesigned message layout provides:

  - extensibility to support future PF-VF features on ENETC v4,
  - consistent framing for all message types,
  - improved data integrity checking,
  - a common protocol usable across Linux, M core firmware, and DPDK.

Additional PF-VF message types will be added in subsequent patches.

Note that switch to the new message format will not affect ENETC v1
(LS1028A). Due to a hardware limitation of ENETC v1, the ENETC PF and
VFs can only be controlled by the same OS. If the PF is controlled by
the Linux kernel driver, then the VFs must also be controlled by the
Linux kernel driver.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260522092438.1264020-7-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: enetc: use read_poll_timeout() for VF mailbox polling

Replace the manual do-while polling loop in enetc_msg_vsi_send() with
the standard read_poll_timeout() helper to simplify the code.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260522092438.1264020-6-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: enetc: integrate enetc_msg.c into enetc-pf-common driver

Move enetc_msg.c from the fsl-enetc driver to the nxp-enetc-pf-common
driver so that SR-IOV mailbox handling can be shared between ENETC v1
and v4 PF drivers.

Changes:
- Move enetc_msg.o compilation from fsl-enetc to nxp-enetc-pf-common
- Export enetc_sriov_configure() with EXPORT_SYMBOL_GPL for use by
both PF drivers

The fsl-enetc driver now depends on nxp-enetc-pf-common for SR-IOV
functionality.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260522092438.1264020-5-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: enetc: relocate SR-IOV configuration helper for common PF support

Move enetc_sriov_configure() from enetc_pf.c to enetc_msg.c to prepare
for integrating enetc_msg.c into the enetc-pf-common driver, where it
will be shared between ENETC v1 and v4 PF drivers.

Since enetc_msg_psi_init() and enetc_msg_psi_free() are now only called
from enetc_sriov_configure() within the same file, make them static.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260522092438.1264020-4-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: enetc: move VF message handlers to enetc_msg.c

Move enetc_msg_pf_set_vf_primary_mac_addr() and enetc_msg_handle_rxmsg()
to enetc_msg.c to consolidate VF mailbox message handling logic.

Make enetc_msg_handle_rxmsg() static since it's only called from
enetc_msg_task() within the same file.

This prepares for integrating enetc_msg.c into the enetc-pf-common
driver to be shared between ENETC v1 and v4 PF drivers.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260522092438.1264020-3-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: enetc: use enetc_set_si_hw_addr() for setting MAC address

Replace enetc_pf_set_primary_mac_addr() with the generic
enetc_set_si_hw_addr() function. This prepares for moving
enetc_msg_pf_set_vf_primary_mac_addr() to the enetc-pf-common driver,
where it can be shared between ENETC v1 and v4 PF drivers.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260522092438.1264020-2-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'mv88e6xxx-cache-scratch-config3-of-6352'

Fidan Aliyeva says:

====================
mv88e6xxx: Cache scratch config3 of 6352

In mv88e6352 scratch register in Global Control 2 set of registers
returns which port is attached to SERDES. This value is a pin
strapping value and is set after the switch is released from reset.
Thus, it can be cached during chip setup instead of reading the
register everytime when SERDES check is needed.

The series consist of 4 parts:
1. Add new mv88e6352_reset function as ops->reset
2. Cache the register value in this reset function
3. Refactor mv88e6352_g2_scratch_port_has_serdes to use the cached
value.
4. Remove the locks surrounding mv88e6352_g2_scratch_port_has_serdes.

v1: https://lore.kernel.org/netdev/20260515223707.1026325-1-fidan.aliyeva.ext@ericsson.com/
RFC: https://lore.kernel.org/netdev/20260510213429.2044612-1-fidan.aliyeva.ext@ericsson.com/
====================

Link: https://patch.msgid.link/20260521202924.727929-1-fidan.aliyeva.ext@ericsson.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mv88e6xxx: Remove locks for 6352's has_serdes

There is no register access anymore in
mv88e6352_g2_scratch_port_has_serdes. So, remove the locks
surrounding the function.

Co-developed-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com>
Signed-off-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com>
Signed-off-by: Fidan Aliyeva <fidan.aliyeva.ext@ericsson.com>
Link: https://patch.msgid.link/20260521202924.727929-5-fidan.aliyeva.ext@ericsson.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mv88e6xxx: Use cached config3 in 6352 has_serdes

1. Refactor mv88e6352_g2_scratch_port_has_serdes to use the cached
scratch config3 value instead of reading it everytime.
2. Remove err<0 check from mv88e6352_phylink_get_caps as it is never
true anymore

Co-developed-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com>
Signed-off-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com>
Signed-off-by: Fidan Aliyeva <fidan.aliyeva.ext@ericsson.com>
Link: https://patch.msgid.link/20260521202924.727929-4-fidan.aliyeva.ext@ericsson.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mv88e6xxx: Cache scratch config3 of 6352

Changes:

1. Add g2_scratch_config3 member to mv88e6xxx_chip.

2. Add mv88e6352_g2_cache_global_scratch_config3 which reads the
CONFIG3 value from the scratch register and caches it.

3. Call this function in mv88e6352_reset.

Co-developed-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com>
Signed-off-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com>
Signed-off-by: Fidan Aliyeva <fidan.aliyeva.ext@ericsson.com>
Link: https://patch.msgid.link/20260521202924.727929-3-fidan.aliyeva.ext@ericsson.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mv88e6xxx: Add mv88e6352_reset for 6352 family

1. Add mv88e6352_reset which calls the previous ops->reset function
- mv88e6352_g1_reset.
2. Make all 6352 family use this new function as ops->reset

Co-developed-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com>
Signed-off-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com>
Signed-off-by: Fidan Aliyeva <fidan.aliyeva.ext@ericsson.com>
Link: https://patch.msgid.link/20260521202924.727929-2-fidan.aliyeva.ext@ericsson.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-mdio-realtek-rtl9300-groundwork-for-multi-soc-support'

Markus Stockhausen says:

====================
net: mdio: realtek-rtl9300: Groundwork for multi SOC support

The Realtek Otto switch platform consist of four different series

- RTL838x aka maple   : 28 port 1G Switches
- RTL839x aka cypress : 52 port 1G Switches
- RTL930x aka longan  : 28 port 1G/2.5G/10G Switches
- RTL931x aka mango   : 56 port 1G/2.5G/10G Switches

The existing realtek-rtl9300 MDIO driver was only designed for RTL930x
devices. The three other SOCs are not supported although they basically
incorporate a very similar MDIO controller.

This series is the first step in a multi-stage approach to also support
the missing SOCs. Device specific properties and registers are converted
into designated initializers. Based on this new devices can be added
much easier in future commits.

Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
====================

Link: https://patch.msgid.link/20260521175918.1494797-1-markus.stockhausen@gmx.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mdio: realtek-rtl9300: Link I/O functions in info structure

The MDIO controller registers of the different devices of the
Realtek Otto switch series are very similar. Nevertheless each
device will need to feed the whole command data distributed over
the controller registers slightly different.

E.g. the combined C22/command register has different field layouts.
On RTL930x bits 24-20 define the to-be-accessed C22 register number
while on RTL839x this is stored in bits 9-5.

Thus there need to be device specific read/write functions that
are called dynamically. Add them into the info structure and make
use of them where needed.

Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Link: https://patch.msgid.link/20260521175918.1494797-10-markus.stockhausen@gmx.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mdio: realtek-rtl9300: Add port mask register

MDIO controller commands work on ports. These are converted by
the driver and hardware forth and back to bus/address. For write
commands a port mask register needs to be filled. Each bit tells the
controller to which PHY the write will be issued. Setting multiple
bits allows to program multiple PHYs in one step. The driver will
not make use of this parallel write feature. But it must at least
fill the bit of the target port that it wants to write to.

Depending on the SOC type and the number of supported PHYs this is
either one or two 32 bit port mask registers. The driver currently only
supports the 28 port RTL930x SOCs. So provide only the mask register
for the lower 32 ports. Add it to the register structure and make use
of it where needed.

Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Link: https://patch.msgid.link/20260521175918.1494797-9-markus.stockhausen@gmx.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mdio: realtek-rtl9300: Add I/O register

The MDIO data that needs to be written or read to registers of the
controller is handled by an I/O register. Add that to the register
structure and make use of it where needed.

Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Link: https://patch.msgid.link/20260521175918.1494797-8-markus.stockhausen@gmx.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mdio: realtek-rtl9300: Add command/C22 register

Command issuing/status bits and C22 data share the same register. In the
future the number of places where this register is used will be:

- One generic command helper/runner for all devices that will access the
command bits of the register
- 8 device specific C22 read/write functions that will access the C22
data fields.

Thus name the register c22_data to align with the existing c45_data
register. This way all device specific helpers will have a common
view on the to-be-fed data. Add the register to the existing structure
and make use of it where needed.

Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Link: https://patch.msgid.link/20260521175918.1494797-7-markus.stockhausen@gmx.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mdio: realtek-rtl9300: Add register structure

The MDIO controller of the Realtek Otto switches has either 4 or 7 command
registers. This depends on the number of supported ports. These registers
are "scattered" around the MMIO block and their addresses depend on the
specific model.

Nevertheless all command registers share a common pattern:

- A mask register with one bit per addressed port
  (remark: the driver internally works on ports instead of bus/address)
- A I/O data register that transfers the to be read/written data
- A C45 registers that takes devnum and regnum
- A C22 register that also includes run and status bits
  (remark: this also takes the Realtek proprietary C22 PHY page)

Provide an additional structure for these command registers so it can be
reused in two places.

1. For defining the register addresses in the regmap.
2. For defining the to be read/written register data

This will finally result in access patterns like

static int otto_emdio_run_cmd(u32 cmd,
                              struct rtl_mdio_cmd_regs *cmd_regs, ...)
{
  regmap_write(regmap, priv->info->reg->cmd_regs.c45_data,
                       cmd_regs->c45_data);
  ...
}

static int otto_emdio_9300_write_c45(...)
{
  struct otto_emdio_cmd_regs cmd_regs = {
    .c45_data  = ...
    .io_data   = ...,
    .port_mask = ...,
  };

  return otto_emdio_run_cmd(RTL9300_CMD_WRITE_C45, &cmd_regs, ...);
}

As a first step start with the C45 register. This one takes the
devnum/regnum data that is stored in the high/low 16 bits.

Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Link: https://patch.msgid.link/20260521175918.1494797-6-markus.stockhausen@gmx.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mdio: realtek-rtl9300: Add pages to info structure

The Realtek ethernet MDIO controller has a proprietary paging feature
that is closely aligned with Realtek based PHYs. These PHY know "pages"
for C22 access. Those can be switched via reads/writes to register 31.
Usually the paged access must be programmed in four steps.

1. read/save page register
2. change "page" register 31
3. read/write data register (on the given page)
4. restore page register

The controller can run all this in hardware with one single request
from the driver. It is given the page, the register and the data
and takes care of all the rest. This reduces CPU load. The number
of supported pages depend on the model. This is either 4096 for low
port count SOCs (up to 28 ports) or 8192 for high port count SOCs
(up to 56 ports).

There is however one special page that allows to pass through all C22
commands directly to the PHY - without any caching. This so called raw
page is dependent of the hardware. It is the highest supported page
number minus 1.

Provide the number of supported pages as a device specific property.
This new "num_pages" aligns with the existing properties and gives
an better insight into the hardware layout than just defining the
number of the raw page. The later directly derives from that and
can be accessed with the new RAW_PAGE() macro. Make use of it where
needed.

Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Link: https://patch.msgid.link/20260521175918.1494797-5-markus.stockhausen@gmx.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mdio: realtek-rtl9300: Add ports to info structure

The ethernet MDIO controller in the Realtek Otto series has a very special
command register style. Instead of working with bus/address it works on
ethernet port numbers. For this the controller is initialized via mapping
registers that tell which port is mapped to which bus/address. Every
request to the driver is then converted as follows

1. Kernel calls driver with bus/address
2. Driver converts bus/address to port and issues command
3. Hardware maps port back to bus/address

The number of ports is different for each device. Make this configurable
by adding a property to the info structure. Switch the existing usage of
MAX_PORTS to this new property where needed.

Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Link: https://patch.msgid.link/20260521175918.1494797-4-markus.stockhausen@gmx.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mdio: realtek-rtl9300: Add device specific info structure

Device properties of the RTL930x SOCs are hardcoded into the MDIO driver.
This must be relaxed to support additional devices like the RTL838x or
RTL839x. These do not have 4 SMI buses but 1 or 2 instead.

To support multiple devices establish an info structure that contains
individual variations of each series. As a first use case add the number
of buses into this structure and use it where needed.

Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Link: https://patch.msgid.link/20260521175918.1494797-3-markus.stockhausen@gmx.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mdio: realtek-rtl9300: enhance documentation & naming

The Realtek ethernet MDIO driver currently only serves SOCs from the
Realtek RTL930x series. This is only one lineup of the Realtek Otto
switch series that also knows RTL838x, RTL839x, RTL931x devices.
All of these share similar hardware with comparable MMIO access logic
but have individual variations. Important to note

- Controller works on switch ports instead of buses and addresses.
- Devices incorporate additional MDIO hardware. E.g.
- an auxiliary MDIO controller for GPIO expanders [1]
- a MDIO style SerDes controller [2]

To avoid future confusion enhance the driver documentation and
function naming. Make clear what this driver is about and what
parts are generic and what parts are device specific. For this
rename the function and structure prefix as follows:

- for generic functions use otto_emdio_
- for device specific helpers use e.g. otto_emdio_9300_

This prefix naming tries to align with the watchdog timer [3].
It paves the way so that drivers for the other Realtek Otto MDIO
controllers can be added in future commits using the same naming
convention.

Remark 1: The read/write functions are kept device specific for now
because they will only fit the RTL930x SOCs. Renaming will take place
as soon as the I/O handling will be generalized.

Remark 2: The driver name "mdio-rtl9300" is kept for now.

[1] https://git.openwrt.org/openwrt/openwrt/tree/target/linux/realtek/patches-6.18/723-net-mdio-Add-Realtek-Otto-auxiliary-controller.patch
[2] https://git.openwrt.org/openwrt/openwrt/tree/target/linux/realtek/files-6.18/drivers/net/mdio/mdio-realtek-otto-serdes.c
[3] https://elixir.bootlin.com/linux/v7.0/source/drivers/watchdog/realtek_otto_wdt.c

Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Link: https://patch.msgid.link/20260521175918.1494797-2-markus.stockhausen@gmx.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

eth: dpaa2: constify dpaa2_ethtool_stats and dpaa2_ethtool_extras

The 'dpaa2_ethtool_stats' and 'dpaa2_ethtool_extras' structures are
initialized in their declarations and never changed. So, constify them
to reduce the attack surface.

Before the patch (size dpaa2-ethtool.o):

   text    data     bss     dec     hex
  33433    5992       0   39425    9a01

After the patch (size dpaa2-ethtool.o):

   text    data     bss     dec     hex
  34937    4488       0   39425    9a01

Signed-off-by: Len Bao <len.bao@gmx.us>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20260523150737.36988-1-len.bao@gmx.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ibm: emac: Use napi_gro_receive() for Rx packets

emac_poll_rx() already runs in NAPI context and TAH-equipped EMACs set
CHECKSUM_UNNECESSARY on verified frames, which lets GRO coalesce TCP
segments without a software checksum on the merge path. Replace the
per-poll rx_list batched with netif_receive_skb_list() with direct
napi_gro_receive() calls so the stack can merge segments into super-skbs
and skip a full traversal per packet -- a meaningful win on the slow
4xx-class CPUs this driver targets.

Small routing speed improvement tested on a Cisco Meraki MX60W:

Tested with iperf3

Before:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   494 MBytes   414 Mbits/sec  839             sender
[  5]   0.00-10.04  sec   492 MBytes   411 Mbits/sec                  receiver

After:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   510 MBytes   428 Mbits/sec  580             sender
[  5]   0.00-10.04  sec   508 MBytes   424 Mbits/sec                  receiver

Traffic to and from the router seems to be slow no matter what:

Tested with iperf3 --bidir

Before:

[ ID][Role] Interval           Transfer     Bitrate         Retr
[  8][TX-C]   0.00-10.00  sec   297 MBytes   249 Mbits/sec   35            sender
[  8][TX-C]   0.00-10.00  sec   293 MBytes   245 Mbits/sec                  receiver
[ 10][RX-C]   0.00-10.00  sec   184 MBytes   154 Mbits/sec    0            sender
[ 10][RX-C]   0.00-10.00  sec   184 MBytes   154 Mbits/sec                  receiver

After:

[ ID][Role] Interval           Transfer     Bitrate         Retr
[  8][TX-C]   0.00-10.00  sec   295 MBytes   248 Mbits/sec   31            sender
[  8][TX-C]   0.00-10.00  sec   294 MBytes   246 Mbits/sec                  receiver
[ 10][RX-C]   0.00-10.00  sec   181 MBytes   152 Mbits/sec    0            sender
[ 10][RX-C]   0.00-10.00  sec   181 MBytes   152 Mbits/sec                  receiver

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20260521215908.257118-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'octeontx2-af-npc-enhancements'

Ratheesh Kannoth says:

====================
octeontx2-af: npc: Enhancements. [part]

Patch 1 reduces stack usage in mlx5e_pcie_cong_get_thresh_config()
by reusing a single union devlink_param_value across four
devl_param_driverinit_value_get() calls (instead of
union devlink_param_value val[4] on the stack) and assigning each
vu16 into mlx5e_pcie_cong_thresh, so the helper stays under the
frame-size warning limit as the union grows.

Patch 2 changes devlink_nl_param_value_put() and
devlink_nl_param_value_fill_one() to pass union devlink_param_value
by pointer instead of by value. Passing two copies of the union
by value in the param netlink path consumes over 500 bytes of argument
stack and risks CONFIG_FRAME_WARN as the union grows beyond its
historical size.
====================

Picking a couple of uncontroversial changes from the series
since it's making very slow progress.

Link: https://patch.msgid.link/20260521095303.2395584-1-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: pass param values by pointer

union devlink_param_value grows substantially once U64 array
parameters are added to devlink (from 32 bytes to over 264 bytes).
devlink_nl_param_value_fill_one() and devlink_nl_param_value_put()
copy the union by value in several places. Passing two instances as
value arguments alone consumes over 528 bytes of stack; combined with
deeper call chains the parameter stack can approach 800 bytes and trip
CONFIG_FRAME_WARN more easily.

Switch internal helpers and exported driver APIs to pass pointers to
union devlink_param_value rather than passing the union by value.

Reviewed-by: Petr Machata <petrm@nvidia.com> # for mlxsw
Acked-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com> #for ena
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260521095303.2395584-4-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Reduce stack use reading PCIe congestion thresholds

union devlink_param_value grew when U64 array parameters were added.
Keeping union devlink_param_value val[4] in
mlx5e_pcie_cong_get_thresh_config() exceeded the compiler's
-Wframe-larger-than limit.

Reuse one union: call devl_param_driverinit_value_get() once per
MLX5 PCIe congestion threshold and assign each vu16 to the
corresponding mlx5e_pcie_cong_thresh member.

Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260521095303.2395584-3-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-mlx5-add-satellite-pf-support'

Tariq Toukan says:

====================
net/mlx5: Add satellite PF support

A satellite PF is a new SmartNIC configuration that adds another
physical function on the DPU that is not an eswitch manager and not a
page manager. The satellite PF can have its own SFs and can be passed
through to a VM on the DPU, providing an isolated function for users who
should not have access to the privileged ECPF. The ECPF handles the
satellite PF and the host PF in a similar way, using the same management
framework.

This series adds support for satellite PFs (SPFs) in the mlx5 eswitch.
SPFs are discovered through the v1 response layout of the
query_esw_functions command, introduced in the previous infrastructure
preparation series.

The first four patches discover satellite PFs, allocate eswitch vports
for them and their SFs, and extend the SF hardware table to manage SPF
SF entries.

The next five patches expose PF numbers from firmware, map SF
controllers to their pfnum, register devlink ports with proper
attributes, and register SF resource on satellite PF ports.

The final four patches add devlink port state management, FDB peer miss
rules, dedicated page accounting, and SF resource registration for
satellite PF vports.

This series builds on the eswitch infrastructure preparation series
previously submitted.
====================

Link: https://patch.msgid.link/20260521110843.367329-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Add SPF function type for page management

Add MLX5_SPF to enum mlx5_func_type so SPFs get their own page counter,
and add the corresponding WARN check at page cleanup. Wait for SPF pages
to be reclaimed during ECPF teardown, alongside the existing host PF and
VF page waits.

SPF page requests are always identified by vhca_id, so the legacy
func_id_to_type() path is not reached for satellite PFs.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260521110843.367329-13-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Add FDB peer miss rules for satellite PFs

Add satellite PF (SPF) vports to the FDB peer miss rules flow.
Introduce mlx5_esw_for_each_spf_vport() macro to iterate SPF vports.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260521110843.367329-12-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Support state get/set for satellite PF ports

Extend mlx5_devlink_pf_port_fn_state_get() to support satellite PF
vports by querying their vhca_state from the query_esw_functions output
using the vport's vhca_id.

Extend mlx5_devlink_pf_port_fn_state_set() to support satellite PFs by
using the generic mlx5_esw_pf_enable/disable_hca() functions.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260521110843.367329-11-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Register SF resource on satellite PF ports

Extend port-level resource registration to satellite PF vports.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260521110843.367329-10-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Register devlink ports for satellite PFs

Include satellite PFs in mlx5_eswitch_is_pf_vf_vport() so they receive
the standard PF/VF devlink port operations. Update
mlx5_esw_devlink_port_supported() and devlink port attribute setup to
register SPF devlink ports with controller number and PF number.

Add mlx5_esw_spf_vport_to_idx() to look up the SPF array index by vport
number, and mlx5_esw_is_spf_vport() boolean wrapper to identify
satellite PF vports.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260521110843.367329-9-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Map SF controller to pfnum for satellite PFs

SF devlink port creation and registration used the ECPF's PCI function
as pfnum. Extend this to support satellite PF controllers by introducing
mlx5_esw_sf_controller_to_pfnum() that maps a controller number to the
corresponding PF number, and use it in SF port attribute setup and SF
creation validation.

Reorder the checks in mlx5_devlink_sf_port_new() so that
mlx5_sf_table_supported() runs before attribute validation, since the
new helper requires the eswitch to be initialized.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260521110843.367329-8-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Expose PF number from query_esw_functions

Extract pci_device_function from the query_esw_functions output for both
the host PF and satellite PFs, storing it alongside the existing
host_number field.

Add mlx5_esw_get_hpf_pf_num() helper that returns the host PF's actual
PCI device function when the new query format is supported, falling back
to PCI_FUNC(dev->pdev->devfn) for older firmware. Use it in devlink port
attribute setup so that host PF and VF devlink ports report the correct
PF number rather than the ECPF's own PCI function number.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260521110843.367329-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Support SPF SFs in SF hardware table

Convert the SF hardware table from a fixed-size hwc array to a
dynamically allocated one, supporting satellite PF (SPF) SFs alongside
local and external host SFs. Initialize hwc entries for each SPF using
its host_number as controller. Rename MLX5_SF_HWC_EXTERNAL to
MLX5_SF_HWC_EXT_HOST and add MLX5_SF_HWC_FIRST_SPF for clarity.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260521110843.367329-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Initialize satellite PF SF vports

Extend satellite PF (SPF) initialization to allocate SF vports for each
SPF. For each discovered SPF, query its SF capabilities, allocate SF
vports, and store the host_number for controller identification.

Add accessor APIs mlx5_esw_get_num_spfs(),
mlx5_esw_spf_get_host_number(), mlx5_esw_sf_max_spf_functions(), and
mlx5_esw_has_spf_sfs() for use by the SF hardware table in a subsequent
patch. Also extend mlx5_esw_offloads_controller_valid() to accept SPF
controllers in addition to the host PF controller.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260521110843.367329-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Initialize host PF host number earlier

Move host_number from esw->offloads to esw->esw_funcs as hpf_host_number
and initialize it during vports_init instead of offloads_enable. This
makes the host PF host number available earlier in the initialization
sequence, which is required for upcoming SF hardware table support for
satellite PFs.

Add a mlx5_esw_get_hpf_host_number() accessor to retrieve the stored
host number.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260521110843.367329-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Introduce generic helper for PF SFs info

Introduce mlx5_esw_sf_max_pf_functions() that queries a PF's max_num_sf
and sf_base_id using mlx5_vport_get_other_func_general_cap(), which
supports both function_id and vhca_id based addressing.

Refactor mlx5_esw_sf_max_hpf_functions() into a thin wrapper that adds
the host PF precondition checks and calls the new generic helper. Remove
mlx5_query_hca_cap_host_pf() as it is not used anymore.

This prepares for querying SFs info of Satellite PFs.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260521110843.367329-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Add satellite PF vport support

Discover satellite PFs from query_esw_functions output and allocate
eswitch vports for them. For each satellite PF, create a vport via the
CREATE_ESW_VPORT command using its vhca_id and allocate it in the
eswitch vport table.

When enabling switchdev mode, the ECPF acting as the eswitch manager
activates each satellite PF with enable_hca, loads its vport and adds
a representor. Since satellite PF devlink ports are registered in a
later patch, guard mlx5_esw_offloads_devlink_port() against vports
with no devlink port to avoid NULL dereference during representor
attach.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260521110843.367329-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan966x: cleanup error handling in lan966x_fdma_rx_alloc_page_pool()

This code works, but there are a few things to tidy up:
1. No need to an unlikely() because IS_ERR() already has an unlikely()
built in.
2. No need to use PTR_ERR_OR_ZERO() because it's not an error pointer.
3. Use the returned error code directly instead of using groveling in
rx->page_pool to find it.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Link: https://patch.msgid.link/ag7_YBWRpRmY9MGT@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rds: filter RDS_INFO_* getsockopt by caller's netns

The RDS_INFO_* family of getsockopt(2) options reads several
file-scope global lists that are not per-netns:

  rds_sock_info / rds6_sock_info,
  rds_sock_inc_info / rds6_sock_inc_info        -> rds_sock_list
  rds_tcp_tc_info / rds6_tcp_tc_info            -> rds_tcp_tc_list
  rds_conn_info / rds6_conn_info,
  rds_conn_message_info_cmn (for the *_SEND_MESSAGES and
  *_RETRANS_MESSAGES variants),
  rds_for_each_conn_info (for RDS_INFO_IB_CONNECTIONS)
                                                -> rds_conn_hash[]

The handlers do not filter by the caller's network namespace.
rds_info_getsockopt() has no netns or capable() check, and
rds_create() has no capable() check, so AF_RDS is reachable from
an unprivileged user namespace. As a result, an unprivileged
caller in a fresh user_ns plus netns can read the bound address
and sock inode of every RDS socket on the host, the peer address
of incoming messages on every RDS socket on the host, the peer
address and TCP sequence numbers of every rds-tcp connection on
the host, and the peer address and RDS sequence numbers of every
RDS connection on the host.

The rds-tcp transport is reachable from a non-initial netns (see
rds_set_transport()), so a one-shot init_net gate at
rds_info_getsockopt() would deny legitimate per-netns visibility
to rds-tcp callers. Instead, filter at each handler by comparing
the netns of the caller's socket to the netns of the list entry,
or to rds_conn_net(conn) for connection paths. Only copy entries
whose netns matches the caller. Counters (RDS_INFO_COUNTERS) are
aggregate statistics and remain global.

Reproducer (KASAN VM, rds and rds_tcp loaded): an AF_RDS socket
binds 127.0.0.1:4242 in init_net as root. A child process enters
a fresh user_ns plus netns and opens AF_RDS there, then calls
getsockopt(SOL_RDS, RDS_INFO_SOCKETS). Before this change, the
child sees the init_net socket. After this change, the child
sees zero entries.

Drop the rds_sock_count, rds_tcp_tc_count, and rds6_tcp_tc_count
globals. v2 used them for the size precheck and lens->nr; v3
replaced the precheck with a per-ns count from a first pass over
the list, so the globals have no remaining readers. The matching
increments and decrements in rds_create()/rds_destroy_sock() and
rds_tcp_set_callbacks()/rds_tcp_restore_callbacks() go away with
them. Reported by the kernel test robot under clang W=1.

Suggested-by: Allison Henderson <achender@kernel.org>
Suggested-by: Simon Horman <horms@kernel.org>
Reviewed-by: Allison Henderson <achender@kernel.org>
Co-developed-by: Praveen Kakkolangara <praveen.kakkolangara@aumovio.com>
Signed-off-by: Praveen Kakkolangara <praveen.kakkolangara@aumovio.com>
Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
Link: https://patch.msgid.link/20260520084236.2724349-1-maoyixie.tju@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlabel: fix IPv6 unlabeled address add error handling

netlbl_unlhsh_add_addr6() always returned zero after
netlbl_af6list_add(), masking failures such as duplicate
IPv6 static label entries.

Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://patch.msgid.link/20260522022910.398416-1-zhaochenguang@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rds: annotate data-race around rs_seen_congestion

rs_seen_congestion is read in rds_poll() and written in rds_sendmsg()
and rds_poll() without any lock. Use READ_ONCE()/WRITE_ONCE() to
annotate these lockless accesses and silence KCSAN.

Reported-by: syzbot+fbf3648ae7f5bdb05c59@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a0f8d94.050a0220.6b33c.0000.GAE@google.com/
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Allison Henderson <achender@kernel.org>
Tested-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260522011621.304470-1-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: sch_ets: make cl->quantum lockless

cl->quantum does not need to be protected by RTNL or qdisc spinlock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260522110356.1403343-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: igmp: annotate data-races around im->users

/proc/net/igmp walks IPv4 multicast memberships under RCU and
prints im->users without holding RTNL, while multicast join and leave
paths update the field while holding RTNL. Annotate this intentional
lockless snapshot with READ_ONCE() and the matching writers with
WRITE_ONCE().

Signed-off-by: Yuyang Huang <sigefriedhyy@gmail.com>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260522093906.39764-1-sigefriedhyy@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netfilter: nf_conntrack_ftp: avoid u16 overflows

get_port and try_number() parse comma-separated decimal values from FTP PORT
and EPRT commands into a u_int32_t array, but does not validate that each
value fits in a single octet. RFC 959 specifies that PORT parameters
are decimal integers in the range 0-255, representing the four octets
of an IP address followed by two octets encoding the port number.

Values exceeding 255 are silently accepted. In try_rfc959(), the raw
u32 values are combined via shift-and-OR to form the IP and port:

  cmd->u3.ip = htonl((array[0] << 24) | (array[1] << 16) |
                     (array[2] << 8) | array[3]);
  cmd->u.tcp.port = htons((array[4] << 8) | array[5]);

When array elements exceed 255, bits from one field bleed into adjacent
fields after shifting, producing IP addresses and port numbers that
differ from what the text representation suggests. For example,
"PORT 10,0,1,2,256,22" yields port (256<<8)|22 = 65558, truncated to
u16 = 22. This mismatch between the textual and computed values can
confuse network monitoring tools that parse FTP commands independently.

Ignore the command by returning 0 (no match) when any accumulated
value exceeds 255 so that no expectation is created.

Signed-off-by: Giuseppe Caruso <giuseppecaruso0990@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nft_set_pipapo_avx2: restore performance optimization

The avx2 lookup routines get the next map index to process passes as a
function argument, but this isn't obvious because it's hidden in the
lookup macro.

Additionally, a recent LLM review pointed out following "bug":
-------------------------------------------------------------
>               b = nft_pipapo_avx2_refill(i_ul, &map[i_ul], fill, f->mt, last);
>               if (last)
> -                     return b;
> +                     ret = b;
>
>               if (unlikely(ret == -1))
>                       ret = b / XSAVE_YMM_SIZE;

Does this change introduce a logic error when last=true and no match is
found? [..]

Should this be changed to an else-if structure instead?
-------------------------------------------------------------

LLM sees a control-flow change, but there is none:

All call sites invoke nft_pipapo_avx2_refill() only when at least one
bit in the map is set, i.e. nft_pipapo_avx2_refill() never returns -1.

Add a runtime debug check that fires if we'd return -1 as additional
documentation and also make the suggested change, code might be easier
to understand this way.

In commit 17a20e09f086 ("netfilter: nft_set: remove one argument from
lookup and update functions") I incorrectly moved the "ret" scope into
the loop.

This has no effect on the correctness, but it can (depending on map sizes)
cause a redundant repeat of an earlier processing step.

Restore the intended 'pass map index' instead of always-0.  Note that I
did not see any change in performance numbers, but Stefano correctly
points out that the existing perf test likely lack a sparse intermediate
bitmap (between fields) with a lot of leading zeroes.

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nf_conntrack_proto_tcp: fix typos in comments

Fix three typos in comments:

- "migth"/"Migth" -> "might" (two adjacent occurrences in the
tcp_conntracks[] state-transition table comment block).
- "agaist" -> "against" (tcp_error() header comment).
- "intrepretated" -> "interpreted" (RFC 5961 challenge-ACK
marker comment in nf_conntrack_tcp_packet()).

Signed-off-by: Avinash Duduskar <avinash.duduskar@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nf_conntrack_irc: fix parse_dcc() off-by-one OOB read

parse_dcc() treats data_end as an inclusive end pointer, but its only
caller passes data_limit = ib_ptr + datalen, which points one past the
last valid byte.

The newline search loop iterates while tmp <= data_end, so when no
newline is present, *tmp is read at tmp == data_end, one byte beyond
the region filled by skb_header_pointer().

irc_buffer is kmalloc'd as MAX_SEARCH_SIZE + 1 bytes and datalen is
capped at MAX_SEARCH_SIZE, so the stray read does not fault.  The byte
is uninitialized or stale; if it contains an ASCII digit, simple_strtoul
will consume it and produce a wrong DCC IP or port in the conntrack
expectation.  The extra allocation byte is also a fragile guard: if the
cap or allocation size changes, this becomes a real out-of-bounds read.

Change the loop and its post-loop check to use strict less-than,
consistent with the caller's exclusive-end convention.  Update the
function comment accordingly.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Muhammad Bilal <meatuni001@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nfnl_cthelper: apply per-class values when updating policies

When a userspace conntrack helper with multiple expectation classes is
updated via nfnetlink, every class ends up with the first class's
max_expected and timeout values.

nfnl_cthelper_update_policy_all() validates each new policy into the
corresponding slot of the temporary new_policy array, but the second
loop that commits the values into the live helper dereferences
new_policy as a pointer instead of indexing it, so every iteration
reads new_policy[0] regardless of i. An update that changes per-class
values is silently collapsed onto class 0's values with no error
returned to userspace.

Index the temporary array by i in the commit loop so each class gets
its own validated values.

Fixes: 2c422257550f ("netfilter: nfnl_cthelper: fix runtime expectation policy updates")
Cc: stable@vger.kernel.org
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nft_set_rbtree: remove dead conditional

net/netfilter/nft_set_rbtree.c:399 __nft_rbtree_insert()
warn: 'removed_end' is not an error pointer

Since commit : 087388278e0f ("netfilter: nf_tables: nft_set_rbtree: fix
spurious insertion failure") __nft_rbtree_insert() can no longer fail
and this condition is always false. Remove it.

Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/netfilter-devel/adjSaolTji0mPgqx@stanley.mountain/
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: ctnetlink: use nf_ct_exp_net() in expectation dump

Commit 02a3231b6d82 ("netfilter: nf_conntrack_expect: store netns and zone in expectation")
introduced exp->net so RCU-only expectation paths no longer need to
dereference exp->master for netns lookups.

Commit 3db5647984de ("netfilter: nf_conntrack_expect: skip expectations in other netns via proc")
updated the proc path accordingly, but ctnetlink_exp_dump_table() still
compares against nf_ct_net(exp->master).

Use nf_ct_exp_net(exp) here as well so the netlink dump path matches
the rest of the March 2026 expectation netns/RCU cleanup.

Fixes: 02a3231b6d82 ("netfilter: nf_conntrack_expect: store netns and zone in expectation")
Cc: stable@vger.kernel.org
Signed-off-by: Pratham Gupta <pratham36gupta@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nf_conncount: use per-rule hash initval

As-is, different netns will use same slots if the key is the same.
OVS uses this infrastructure to limit conntrack counts per zones.
Those can easily overlap. Make them hash to different slots internally.

Signed-off-by: Florian Westphal <fw@strlen.de>