]> git.ipfire.org Git - thirdparty/kernel/stable.git/log
thirdparty/kernel/stable.git
7 weeks agodt-bindings: net: move ptp-timer property to ethernet-controller.yaml
Wei Fang [Fri, 29 Aug 2025 05:06:03 +0000 (13:06 +0800)] 
dt-bindings: net: move ptp-timer property to ethernet-controller.yaml

For some Ethernet controllers, the PTP timer function is not integrated.
Instead, the PTP timer is a separate device and provides PTP Hardware
Clock (PHC) to the Ethernet controller to use, such as NXP FMan MAC,
ENETC, etc. Therefore, a property is needed to indicate this hardware
relationship between the Ethernet controller and the PTP timer.

Since this use case is also very common, it is better to add a generic
property to ethernet-controller.yaml. According to the existing binding
docs, there are two good candidates, one is the "ptp-timer" defined in
fsl,fman-dtsec.yaml, and the other is the "ptimer-handle" defined in
fsl,fman.yaml. From the perspective of the name, the former is more
straightforward, so move the "ptp-timer" from fsl,fman-dtsec.yaml to
ethernet-controller.yaml.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250829050615.1247468-3-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agodt-bindings: ptp: add NETC Timer PTP clock
Wei Fang [Fri, 29 Aug 2025 05:06:02 +0000 (13:06 +0800)] 
dt-bindings: ptp: add NETC Timer PTP clock

NXP NETC (Ethernet Controller) is a multi-function PCIe Root Complex
Integrated Endpoint (RCiEP), the Timer is one of its functions which
provides current time with nanosecond resolution, precise periodic
pulse, pulse on timeout (alarm), and time capture on external pulse
support. And also supports time synchronization as required for IEEE
1588 and IEEE 802.1AS-2020. So add device tree binding doc for the PTP
clock based on NETC Timer.

NETC Timer has three reference clock sources, but the clock mux is inside
the IP. Therefore, the driver will parse the clock name to select the
desired clock source. If the clocks property is not present, NETC Timer
will use the system clock of NETC IP as its reference clock. Because the
Timer is a PCIe function of NETC IP, the system clock of NETC is always
available to the Timer.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20250829050615.1247468-2-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 weeks agoMerge branch 'inet-ping-misc-changes'
Jakub Kicinski [Mon, 1 Sep 2025 20:15:17 +0000 (13:15 -0700)] 
Merge branch 'inet-ping-misc-changes'

Eric Dumazet says:

====================
inet: ping: misc changes

First and third patches improve security a bit.

Second patch (ping_hash removal) is a cleanup.

Fourth patch uses EXPORT_IPV6_MOD[_GPL].
====================

Link: https://patch.msgid.link/20250829153054.474201-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoinet: ping: use EXPORT_IPV6_MOD[_GPL]()
Eric Dumazet [Fri, 29 Aug 2025 15:30:54 +0000 (15:30 +0000)] 
inet: ping: use EXPORT_IPV6_MOD[_GPL]()

There is no neeed to export ping symbols when CONFIG_IPV6=y

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250829153054.474201-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoinet: ping: make ping_port_rover per netns
Eric Dumazet [Fri, 29 Aug 2025 15:30:53 +0000 (15:30 +0000)] 
inet: ping: make ping_port_rover per netns

Provide isolation between netns for ping idents.

Randomize initial ping_port_rover value at netns creation.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250829153054.474201-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoinet: ping: remove ping_hash()
Eric Dumazet [Fri, 29 Aug 2025 15:30:52 +0000 (15:30 +0000)] 
inet: ping: remove ping_hash()

There is no point in keeping ping_hash().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://patch.msgid.link/20250829153054.474201-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoinet: ping: check sock_net() in ping_get_port() and ping_lookup()
Eric Dumazet [Fri, 29 Aug 2025 15:30:51 +0000 (15:30 +0000)] 
inet: ping: check sock_net() in ping_get_port() and ping_lookup()

We need to check socket netns before considering them in ping_get_port().
Otherwise, one malicious netns could 'consume' all ports.

Add corresponding check in ping_lookup().

Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://patch.msgid.link/20250829153054.474201-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: mdio: update runtime PM
Russell King (Oracle) [Fri, 29 Aug 2025 09:02:29 +0000 (10:02 +0100)] 
net: stmmac: mdio: update runtime PM

Commit 3c7826d0b106 ("net: stmmac: Separate C22 and C45 transactions
for xgmac") missed a change that happened in commit e2d0acd40c87
("net: stmmac: using pm_runtime_resume_and_get instead of
pm_runtime_get_sync").

Update the two clause 45 functions that didn't get switched to
pm_runtime_resume_and_get().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/E1urv09-00000000gJ1-3SxO@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoselftests: net: fix spelling and grammar mistakes
Praveen Balakrishnan [Thu, 28 Aug 2025 21:11:00 +0000 (22:11 +0100)] 
selftests: net: fix spelling and grammar mistakes

Fix several spelling and grammatical mistakes in output messages from
the net selftests to improve readability.

Only the message strings for the test output have been modified. No
changes to the functional logic of the tests have been made.

Signed-off-by: Praveen Balakrishnan <praveen.balakrishnan@magd.ox.ac.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250828211100.51019-1-praveen.balakrishnan@magd.ox.ac.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoptp: Limit time setting of PTP clocks
Miroslav Lichvar [Thu, 28 Aug 2025 10:32:53 +0000 (12:32 +0200)] 
ptp: Limit time setting of PTP clocks

Networking drivers implementing PTP clocks and kernel socket code
handling hardware timestamps use the 64-bit signed ktime_t type counting
nanoseconds. When a PTP clock reaches the maximum value in year 2262,
the timestamps returned to applications will overflow into year 1667.
The same thing happens when injecting a large offset with
clock_adjtime(ADJ_SETOFFSET).

The commit 7a8e61f84786 ("timekeeping: Force upper bound for setting
CLOCK_REALTIME") limited the maximum accepted value setting the system
clock to 30 years before the maximum representable value (i.e. year
2232) to avoid the overflow, assuming the system will not run for more
than 30 years.

Enforce the same limit for PTP clocks. Don't allow negative values and
values closer than 30 years to the maximum value. Drivers may implement
an even lower limit if the hardware registers cannot represent the whole
interval between years 1970 and 2262 in the required resolution.

Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <jstultz@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250828103300.1387025-1-mlichvar@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: ethernet: qualcomm: QCOM_PPE should depend on ARCH_QCOM
Geert Uytterhoeven [Fri, 29 Aug 2025 11:27:06 +0000 (13:27 +0200)] 
net: ethernet: qualcomm: QCOM_PPE should depend on ARCH_QCOM

The Qualcomm Technologies, Inc. Packet Process Engine (PPE) is only
present on Qualcomm IPQ SoCs.  Hence add a dependency on ARCH_QCOM, to
prevent asking the user about this driver when configuring a kernel
without Qualcomm platform support,

Fixes: 353a0f1d5b27606b ("net: ethernet: qualcomm: Add PPE driver for IPQ9574 SoC")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/eb7bd6e6ce27eb6d602a63184d9daa80127e32bd.1756466786.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agotcp: Remove sk->sk_prot->orphan_count.
Kuniyuki Iwashima [Fri, 29 Aug 2025 21:56:38 +0000 (21:56 +0000)] 
tcp: Remove sk->sk_prot->orphan_count.

TCP tracks the number of orphaned (SOCK_DEAD but not yet destructed)
sockets in tcp_orphan_count.

In some code that was shared with DCCP, tcp_orphan_count is referenced
via sk->sk_prot->orphan_count.

Let's reference tcp_orphan_count directly.

inet_csk_prepare_for_destroy_sock() is moved to inet_connection_sock.c
due to header dependency.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250829215641.711664-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'net-add-rcu-safety-to-dst-dev'
Jakub Kicinski [Sat, 30 Aug 2025 02:36:34 +0000 (19:36 -0700)] 
Merge branch 'net-add-rcu-safety-to-dst-dev'

Eric Dumazet says:

====================
net: add rcu safety to dst->dev

Followup of commit 88fe14253e18 ("net: dst: add four helpers
to annotate data-races around dst->dev").

Use lockdep enabled helpers to convert our unsafe dst->dev
uses one at a time.

More to come...
====================

Link: https://patch.msgid.link/20250828195823.3958522-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoipv4: start using dst_dev_rcu()
Eric Dumazet [Thu, 28 Aug 2025 19:58:23 +0000 (19:58 +0000)] 
ipv4: start using dst_dev_rcu()

Change icmpv4_xrlim_allow(), ip_defrag() to prevent possible UAF.

Change ipmr_prepare_xmit(), ipmr_queue_fwd_xmit(), ip_mr_output(),
ipv4_neigh_lookup() to use lockdep enabled dst_dev_rcu().

Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250828195823.3958522-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agotcp: use dst_dev_rcu() in tcp_fastopen_active_disable_ofo_check()
Eric Dumazet [Thu, 28 Aug 2025 19:58:22 +0000 (19:58 +0000)] 
tcp: use dst_dev_rcu() in tcp_fastopen_active_disable_ofo_check()

Use RCU to avoid a pair of atomic operations and a potential
UAF on dst_dev()->flags.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250828195823.3958522-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agotcp_metrics: use dst_dev_net_rcu()
Eric Dumazet [Thu, 28 Aug 2025 19:58:21 +0000 (19:58 +0000)] 
tcp_metrics: use dst_dev_net_rcu()

Replace three dst_dev() with a lockdep enabled helper.

Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250828195823.3958522-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: use dst_dev_rcu() in sk_setup_caps()
Eric Dumazet [Thu, 28 Aug 2025 19:58:20 +0000 (19:58 +0000)] 
net: use dst_dev_rcu() in sk_setup_caps()

Use RCU to protect accesses to dst->dev from sk_setup_caps()
and sk_dst_gso_max_size().

Also use dst_dev_rcu() in ip6_dst_mtu_maybe_forward(),
and ip_dst_mtu_maybe_forward().

ip4_dst_hoplimit() can use dst_dev_net_rcu().

Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250828195823.3958522-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoipv6: use RCU in ip6_output()
Eric Dumazet [Thu, 28 Aug 2025 19:58:19 +0000 (19:58 +0000)] 
ipv6: use RCU in ip6_output()

Use RCU in ip6_output() in order to use dst_dev_rcu() to prevent
possible UAF.

We can remove rcu_read_lock()/rcu_read_unlock() pairs
from ip6_finish_output2().

Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250828195823.3958522-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoipv6: use RCU in ip6_xmit()
Eric Dumazet [Thu, 28 Aug 2025 19:58:18 +0000 (19:58 +0000)] 
ipv6: use RCU in ip6_xmit()

Use RCU in ip6_xmit() in order to use dst_dev_rcu() to prevent
possible UAF.

Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250828195823.3958522-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoipv6: start using dst_dev_rcu()
Eric Dumazet [Thu, 28 Aug 2025 19:58:17 +0000 (19:58 +0000)] 
ipv6: start using dst_dev_rcu()

Refactor icmpv6_xrlim_allow() and ip6_dst_hoplimit()
so that we acquire rcu_read_lock() a bit longer
to be able to use dst_dev_rcu() instead of dst_dev().

__ip6_rt_update_pmtu() and rt6_do_redirect can directly
use dst_dev_rcu() in sections already holding rcu_read_lock().

Small changes to use dst_dev_net_rcu() in
ip6_default_advmss(), ipv6_sock_ac_join(),
ip6_mc_find_dev() and ndisc_send_skb().

Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250828195823.3958522-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: dst: introduce dst->dev_rcu
Eric Dumazet [Thu, 28 Aug 2025 19:58:16 +0000 (19:58 +0000)] 
net: dst: introduce dst->dev_rcu

Followup of commit 88fe14253e18 ("net: dst: add four helpers
to annotate data-races around dst->dev").

We want to gradually add explicit RCU protection to dst->dev,
including lockdep support.

Add an union to alias dst->dev_rcu and dst->dev.

Add dst_dev_net_rcu() helper.

Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250828195823.3958522-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'inet_diag-make-dumps-faster-with-simple-filters'
Jakub Kicinski [Sat, 30 Aug 2025 02:29:26 +0000 (19:29 -0700)] 
Merge branch 'inet_diag-make-dumps-faster-with-simple-filters'

Eric Dumazet says:

====================
inet_diag: make dumps faster with simple filters

inet_diag_bc_sk() pulls five cache lines per socket,
while most filters only need the two first ones.

We can change it to only pull needed cache lines,
to make things like "ss -temoi src :21456" much faster.

First patches (1-3) are annotating data-races as a first step.
====================

Link: https://patch.msgid.link/20250828102738.2065992-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoinet_diag: avoid cache line misses in inet_diag_bc_sk()
Eric Dumazet [Thu, 28 Aug 2025 10:27:38 +0000 (10:27 +0000)] 
inet_diag: avoid cache line misses in inet_diag_bc_sk()

inet_diag_bc_sk() pulls five cache lines per socket,
while most filters only need the two first ones.

Add three booleans to struct inet_diag_dump_data,
that are selectively set if a filter needs specific socket fields.

- mark_needed       /* INET_DIAG_BC_MARK_COND present. */
- cgroup_needed     /* INET_DIAG_BC_CGROUP_COND present. */
- userlocks_needed  /* INET_DIAG_BC_AUTO present. */

This removes millions of cache lines misses per ss invocation
when simple filters are specified on busy servers.

offsetof(struct sock, sk_userlocks) = 0xf3
offsetof(struct sock, sk_mark) = 0x20c
offsetof(struct sock, sk_cgrp_data) = 0x298

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250828102738.2065992-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoinet_diag: change inet_diag_bc_sk() first argument
Eric Dumazet [Thu, 28 Aug 2025 10:27:37 +0000 (10:27 +0000)] 
inet_diag: change inet_diag_bc_sk() first argument

We want to have access to the inet_diag_dump_data structure
in the following patch.

This patch removes duplication in callers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250828102738.2065992-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoinet_diag: annotate data-races in inet_diag_bc_sk()
Eric Dumazet [Thu, 28 Aug 2025 10:27:36 +0000 (10:27 +0000)] 
inet_diag: annotate data-races in inet_diag_bc_sk()

inet_diag_bc_sk() runs with an unlocked socket,
annotate potential races with READ_ONCE().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250828102738.2065992-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agotcp: annotate data-races in tcp_req_diag_fill()
Eric Dumazet [Thu, 28 Aug 2025 10:27:35 +0000 (10:27 +0000)] 
tcp: annotate data-races in tcp_req_diag_fill()

req->num_retrans and rsk_timer.expires are read locklessly,
and can be changed from tcp_rtx_synack().

Add READ_ONCE()/WRITE_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250828102738.2065992-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoinet_diag: annotate data-races in inet_diag_msg_common_fill()
Eric Dumazet [Thu, 28 Aug 2025 10:27:34 +0000 (10:27 +0000)] 
inet_diag: annotate data-races in inet_diag_msg_common_fill()

inet_diag_msg_common_fill() can run without socket lock.
Add READ_ONCE() or data_race() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250828102738.2065992-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agomicrochip: lan865x: add ndo_eth_ioctl handler to enable PHY ioctl support
Parthiban Veerasooran [Thu, 28 Aug 2025 11:45:49 +0000 (17:15 +0530)] 
microchip: lan865x: add ndo_eth_ioctl handler to enable PHY ioctl support

Introduce support for standard MII ioctl operations in the LAN865x
Ethernet driver by implementing the .ndo_eth_ioctl callback. This allows
PHY-related ioctl commands to be handled via phy_do_ioctl_running() and
enables support for ethtool and other user-space tools that rely on ioctl
interface to perform PHY register access using commands like SIOCGMIIREG
and SIOCSMIIREG.

This feature enables improved diagnostics and PHY configuration
capabilities from userspace.

Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250828114549.46116-1-parthiban.veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agovsock/test: Remove redundant semicolons
Liao Yuanhong [Thu, 28 Aug 2025 08:39:38 +0000 (16:39 +0800)] 
vsock/test: Remove redundant semicolons

Remove unnecessary semicolons.

Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250828083938.400872-1-liaoyuanhong@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agopppoe: drop sock reference counting on fast path
Qingfang Deng [Thu, 28 Aug 2025 01:20:17 +0000 (09:20 +0800)] 
pppoe: drop sock reference counting on fast path

Now that PPPoE sockets are freed via RCU (SOCK_RCU_FREE), it is no longer
necessary to take a reference count when looking up sockets on the receive
path. Readers are protected by RCU, so the socket memory remains valid
until after a grace period.

Convert fast-path lookups to avoid refcounting:
 - Replace get_item() and sk_receive_skb() in pppoe_rcv() with
   __get_item() and __sk_receive_skb().
 - Rework get_item_by_addr() into __get_item_by_addr() (no refcount and
   move RCU lock into pppoe_ioctl)
 - Remove unnecessary sock_put() calls.

This avoids cacheline bouncing from atomic reference counting and improves
performance on the receive fast path.

Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250828012018.15922-2-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agopppoe: remove rwlock usage
Qingfang Deng [Thu, 28 Aug 2025 01:20:16 +0000 (09:20 +0800)] 
pppoe: remove rwlock usage

Like ppp_generic.c, convert the PPPoE socket hash table to use RCU for
lookups and a spinlock for updates. This removes rwlock usage and allows
lockless readers on the fast path.

- Mark hash table and list pointers as __rcu.
- Use spin_lock() to protect writers.
- Readers use rcu_dereference() under rcu_read_lock(). All known callers
  of get_item() already hold the RCU read lock, so no additional locking
  is needed.
- get_item() now uses refcount_inc_not_zero() instead of sock_hold() to
  safely take a reference. This prevents crashes if a socket is already
  in the process of being freed (sk_refcnt == 0).
- Set SOCK_RCU_FREE to defer socket freeing until after an RCU grace
  period.
- Move skb_queue_purge() into sk_destruct callback to ensure purge
  happens after an RCU grace period.

Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250828012018.15922-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 12 Jun 2025 17:08:24 +0000 (10:08 -0700)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.17-rc4).

No conflicts.

Adjacent changes:

drivers/net/ethernet/intel/idpf/idpf_txrx.c
  02614eee26fb ("idpf: do not linearize big TSO packets")
  6c4e68480238 ("idpf: remove obsolete stashing code")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge tag 'net-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Fri, 29 Aug 2025 00:35:51 +0000 (17:35 -0700)] 
Merge tag 'net-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from Bluetooth.

  Current release - regressions:

    - ipv4: fix regression in local-broadcast routes

    - vsock: fix error-handling regression introduced in v6.17-rc1

  Previous releases - regressions:

    - bluetooth:
        - mark connection as closed during suspend disconnect
        - fix set_local_name race condition

    - eth:
        - ice: fix NULL pointer dereference on reset
        - mlx5: fix memory leak in hws_pool_buddy_init error path
        - bnxt_en: fix stats context reservation logic
        - hv: fix loss of receive events from host during channel open

  Previous releases - always broken:

    - page_pool: fix incorrect mp_ops error handling

    - sctp: initialize more fields in sctp_v6_from_sk()

    - eth:
        - octeontx2-vf: fix max packet length errors
        - idpf: fix Tx flow scheduling to avoid Tx timeouts
        - bnxt_en: fix memory corruption during ifdown
        - ice: fix incorrect counter for buffer allocation failures
        - mlx5: fix lockdep assertion on sync reset unload event
        - fbnic: fixup rtnl_lock and devl_lock handling
        - xgmac: do not enable RX FIFO overflow interrupts

    - phy: mscc: fix when PTP clock is register and unregister

  Misc:

    - add Telit Cinterion LE910C4-WWX new compositions"

* tag 'net-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (60 commits)
  net: ipv4: fix regression in local-broadcast routes
  net: macb: Disable clocks once
  fbnic: Move phylink resume out of service_task and into open/close
  fbnic: Fixup rtnl_lock and devl_lock handling related to mailbox code
  net: rose: fix a typo in rose_clear_routes()
  l2tp: do not use sock_hold() in pppol2tp_session_get_sock()
  sctp: initialize more fields in sctp_v6_from_sk()
  MAINTAINERS: rmnet: Update email addresses
  net: rose: include node references in rose_neigh refcount
  net: rose: convert 'use' field to refcount_t
  net: rose: split remove and free operations in rose_remove_neigh()
  net: hv_netvsc: fix loss of early receive events from host during channel open.
  net: stmmac: Set CIC bit only for TX queues with COE
  net: stmmac: xgmac: Correct supported speed modes
  net: stmmac: xgmac: Do not enable RX FIFO Overflow interrupts
  net/mlx5e: Set local Xoff after FW update
  net/mlx5e: Update and set Xon/Xoff upon port speed set
  net/mlx5e: Update and set Xon/Xoff upon MTU set
  net/mlx5: Prevent flow steering mode changes in switchdev mode
  net/mlx5: Nack sync reset when SFs are present
  ...

7 weeks agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
Jakub Kicinski [Thu, 28 Aug 2025 23:59:21 +0000 (16:59 -0700)] 
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
ice: split ice_virtchnl.c git-blame friendly way

Przemek Kitszel says:

Split ice_virtchnl.c into two more files (+headers), in a way
that git-blame works better.
Then move virtchnl files into a new subdir.
No logic changes.

I have developed (or discovered ;)) how to split a file in a way that
both old and new are nice in terms of git-blame
There was not much discussion on [RFC], so I would like to propose
to go forward with this approach.

There are more commits needed to have it nice, so it forms a git-log vs
git-blame tradeoff, but (after the brief moment that this is on the top)
we spend orders of magnitude more time looking at the blame output (and
commit messages linked from that) - so I find it much better to see
actual logic changes instead of "move xx to yy" stuff (typical for
"squashed/single-commit splits").

Cherry-picks/rebases work the same with this method as with simple
"squashed/single-commit" approach (literally all commits squashed into
one (to have better git-log, but shitty git-blame output).

Rationale for the split itself is, as usual, "file is big and we want to
extend it".

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: finish virtchnl.c split into rss.c
  ice: extract virt/rss.c: cleanup - p2
  ice: extract virt/rss.c: cleanup - p1
  ice: split RSS stuff out of virtchnl.c - copy back
  ice: split RSS stuff out of virtchnl.c - tmp rename
  ice: finish virtchnl.c split into queues.c
  ice: extract virt/queues.c: cleanup - p3
  ice: extract virt/queues.c: cleanup - p2
  ice: extract virt/queues.c: cleanup - p1
  ice: split queue stuff out of virtchnl.c - copy back
  ice: split queue stuff out of virtchnl.c - tmp rename
  ice: add virt/ and move ice_virtchnl* files there
====================

Link: https://patch.msgid.link/20250827224641.415806-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoeth: mlx5: remove Kconfig co-dependency with VXLAN
Jakub Kicinski [Wed, 27 Aug 2025 23:43:19 +0000 (16:43 -0700)] 
eth: mlx5: remove Kconfig co-dependency with VXLAN

mlx5 has a Kconfig co-dependency on VXLAN, even tho it doesn't
call any VXLAN function (unlike mlxsw). Perhaps this dates back
to very old days when tunnel ports were fetched directly from
VXLAN.

Remove the dependency to allow MLX5=y + VXLAN=m kernel configs.
But still avoid compiling in the lib/vxlan code if VXLAN=n.

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://patch.msgid.link/20250827234319.3504852-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: mdio: clean up c22/c45 accessor split
Russell King (Oracle) [Wed, 27 Aug 2025 13:27:47 +0000 (14:27 +0100)] 
net: stmmac: mdio: clean up c22/c45 accessor split

The C45 accessors were setting the GR (register number) field twice,
once with the 16-bit register address truncated to five bits, and
then overwritten with the C45 devad. This is harmless since the field
was being cleared prior to being updated with the C45 devad, except
for the extra work.

Remove the redundant code.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/E1urGBn-00000000DCH-3swS@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'net_sched-extend-rcu-use-in-dump-methods-ii'
Jakub Kicinski [Thu, 28 Aug 2025 23:46:25 +0000 (16:46 -0700)] 
Merge branch 'net_sched-extend-rcu-use-in-dump-methods-ii'

Eric Dumazet says:

====================
net_sched: extend RCU use in dump() methods (II)

Second series adding RCU dump() to three actions

First patch removes BH blocking on modules done in the first series.
====================

Link: https://patch.msgid.link/20250827125349.3505302-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet_sched: act_skbmod: use RCU in tcf_skbmod_dump()
Eric Dumazet [Wed, 27 Aug 2025 12:53:49 +0000 (12:53 +0000)] 
net_sched: act_skbmod: use RCU in tcf_skbmod_dump()

Also storing tcf_action into struct tcf_skbmod_params
makes sure there is no discrepancy in tcf_skbmod_act().

No longer block BH in tcf_skbmod_init() when acquiring tcf_lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250827125349.3505302-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet_sched: act_tunnel_key: use RCU in tunnel_key_dump()
Eric Dumazet [Wed, 27 Aug 2025 12:53:48 +0000 (12:53 +0000)] 
net_sched: act_tunnel_key: use RCU in tunnel_key_dump()

Also storing tcf_action into struct tcf_tunnel_key_params
makes sure there is no discrepancy in tunnel_key_act().

No longer block BH in tunnel_key_init() when acquiring tcf_lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250827125349.3505302-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet_sched: act_vlan: use RCU in tcf_vlan_dump()
Eric Dumazet [Wed, 27 Aug 2025 12:53:47 +0000 (12:53 +0000)] 
net_sched: act_vlan: use RCU in tcf_vlan_dump()

Also storing tcf_action into struct tcf_vlan_params
makes sure there is no discrepancy in tcf_vlan_act().

No longer block BH in tcf_vlan_init() when acquiring tcf_lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250827125349.3505302-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet_sched: remove BH blocking in eight actions
Eric Dumazet [Wed, 27 Aug 2025 12:53:46 +0000 (12:53 +0000)] 
net_sched: remove BH blocking in eight actions

Followup of f45b45cbfae3 ("Merge branch
'net_sched-act-extend-rcu-use-in-dump-methods'")

We never grab tcf_lock from BH context in these modules:

 act_connmark
 act_csum
 act_ct
 act_ctinfo
 act_mpls
 act_nat
 act_pedit
 act_skbedit

No longer block BH when acquiring tcf_lock from init functions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250827125349.3505302-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: minor cleanups to stmmac_bus_clks_config()
Russell King (Oracle) [Wed, 27 Aug 2025 08:54:51 +0000 (09:54 +0100)] 
net: stmmac: minor cleanups to stmmac_bus_clks_config()

stmmac_bus_clks_config() doesn't need to repeatedly on dereference
priv->plat as this remains the same throughout this function. Not only
does this detract from the function's readability, but it could cause
the value to be reloaded each time. Use a local variable.

Also, the final return can simply return zero, and we can dispense
with the initialiser for 'ret'.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/E1urBvf-000000002ii-37Ce@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: stmmac: mdio: use netdev_priv() directly
Russell King (Oracle) [Wed, 27 Aug 2025 08:41:48 +0000 (09:41 +0100)] 
net: stmmac: mdio: use netdev_priv() directly

netdev_priv() is an inline function, taking a struct net_device
pointer. When passing in the MII bus->priv, which is a void pointer,
there is no need to go via a local ndev variable to type it first.

Thus, instead of:

struct net_device *ndev = bus->priv;
struct stmmac_priv *priv;
...
priv = netdev_priv(ndev);

we can simply do:

struct stmmac_priv *priv = netdev_priv(bus->priv);

which simplifies the code.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/E1urBj2-000000002as-0pod@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: phy: mtk-2p5ge: Add LED support for MT7988
Sky Huang [Wed, 27 Aug 2025 04:47:55 +0000 (12:47 +0800)] 
net: phy: mtk-2p5ge: Add LED support for MT7988

Add LED support for MT7988's built-in 2.5Gphy. LED hardware has almost
the same design with MT7981's/MT7988's built-in GbE. So hook the same
helper function here.

Before mtk_phy_leds_state_init(), set correct default values of LED0
and LED1.

Signed-off-by: Sky Huang <skylake.huang@mediatek.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250827044755.3256991-1-SkyLake.Huang@mediatek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge tag 'pm-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Thu, 28 Aug 2025 23:34:32 +0000 (16:34 -0700)] 
Merge tag 'pm-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
 "Add missing locking annotations to two recently introduced
  list_for_each_entry_rcu() loops in the core device suspend/resume
  code (Johannes Berg)"

* tag 'pm-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: sleep: annotate RCU list iterations

7 weeks agoselftests: drv-net: rss_ctx: fix the queue count check
Jakub Kicinski [Wed, 27 Aug 2025 17:35:58 +0000 (10:35 -0700)] 
selftests: drv-net: rss_ctx: fix the queue count check

Commit 0d6ccfe6b319 ("selftests: drv-net: rss_ctx: check for all-zero keys")
added a skip exception if NIC has fewer than 3 queues enabled,
but it's just constructing the object, it's not actually rising
this exception.

Before:

  # Exception| net.lib.py.utils.CmdExitFailure: Command failed: ethtool -X enp1s0 equal 3 hkey d1:cc:77:47:9d:ea:15:f2:b9:6c:ef:68:62:c0:45:d5:b0:99:7d:cf:29:53:40:06:3d:8e:b9:bc:d4:70:89:b8:8d:59:04:ea:a9:c2:21:b3:55:b8:ab:6b:d9:48:b4:bd:4c:ff:a5:f0:a8:c2
  not ok 1 rss_ctx.test_rss_key_indir

After:

  ok 1 rss_ctx.test_rss_key_indir # SKIP Device has fewer than 3 queues (or doesn't support queue stats)

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250827173558.3259072-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge branch 'devmem-io_uring-allow-more-flexibility-for-zc-dma-devices'
Jakub Kicinski [Thu, 28 Aug 2025 23:05:34 +0000 (16:05 -0700)] 
Merge branch 'devmem-io_uring-allow-more-flexibility-for-zc-dma-devices'

Dragos Tatulea says:

====================
devmem/io_uring: allow more flexibility for ZC DMA devices

For TCP zerocopy rx (io_uring, devmem), there is an assumption that the
parent device can do DMA. However that is not always the case:
- Scalable Function netdevs [1] have the DMA device in the grandparent.
- For Multi-PF netdevs [2] queues can be associated to different DMA
  devices.

The series adds an API for getting the DMA device for a netdev queue.
Drivers that have special requirements can implement the newly added
queue management op. Otherwise the parent will still be used as before.

This series continues with switching to this API for io_uring zcrx and
devmem and adds a ndo_queue_dma_dev op for mlx5.

The last part of the series changes devmem rx bind to get the DMA device
per queue and blocks the case when multiple queues use different DMA
devices. The tx bind is left as is.

[1] Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst
[2] Documentation/networking/multi-pf-netdev.rst
====================

Link: https://patch.msgid.link/20250827144017.1529208-2-dtatulea@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: devmem: allow binding on rx queues with same DMA devices
Dragos Tatulea [Wed, 27 Aug 2025 14:40:01 +0000 (17:40 +0300)] 
net: devmem: allow binding on rx queues with same DMA devices

Multi-PF netdevs have queues belonging to different PFs which also means
different DMA devices. This means that the binding on the DMA buffer can
be done to the incorrect device.

This change allows devmem binding to multiple queues only when the
queues have the same DMA device. Otherwise an error is returned.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://patch.msgid.link/20250827144017.1529208-9-dtatulea@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: devmem: pre-read requested rx queues during bind
Dragos Tatulea [Wed, 27 Aug 2025 14:40:00 +0000 (17:40 +0300)] 
net: devmem: pre-read requested rx queues during bind

Instead of reading the requested rx queues after binding the buffer,
read the rx queues in advance in a bitmap and iterate over them when
needed.

This is a preparation for fetching the DMA device for each queue.

This patch has no functional changes besides adding an extra
rq index bounds check.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250827144017.1529208-8-dtatulea@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: devmem: pull out dma_dev out of net_devmem_bind_dmabuf
Dragos Tatulea [Wed, 27 Aug 2025 14:39:59 +0000 (17:39 +0300)] 
net: devmem: pull out dma_dev out of net_devmem_bind_dmabuf

Fetch the DMA device before calling net_devmem_bind_dmabuf()
and pass it on as a parameter.

This is needed for an upcoming change which will read the
DMA device per queue.

This patch has no functional changes.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250827144017.1529208-7-dtatulea@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet/mlx5e: add op for getting netdev DMA device
Dragos Tatulea [Wed, 27 Aug 2025 14:39:58 +0000 (17:39 +0300)] 
net/mlx5e: add op for getting netdev DMA device

For zero-copy (devmem, io_uring), the netdev DMA device used
is the parent device of the net device. However that is not
always accurate for mlx5 devices:
- SFs: The parent device is an auxdev.
- Multi-PF netdevs: The DMA device should be determined by
  the queue.

This change implements the DMA device queue API that returns the DMA
device appropriately for all cases.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250827144017.1529208-6-dtatulea@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agonet: devmem: get netdev DMA device via new API
Dragos Tatulea [Wed, 27 Aug 2025 14:39:57 +0000 (17:39 +0300)] 
net: devmem: get netdev DMA device via new API

Switch to the new API for fetching DMA devices for a netdev. The API is
called with queue index 0 for now which is equivalent with the previous
behavior.

This patch will allow devmem to work with devices where the DMA device
is not stored in the parent device. mlx5 SFs are an example of such a
device.

Multi-PF netdevs are still problematic (as they were before this
change). Upcoming patches will address this for the rx binding.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250827144017.1529208-5-dtatulea@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoio_uring/zcrx: add support for custom DMA devices
Dragos Tatulea [Wed, 27 Aug 2025 14:39:56 +0000 (17:39 +0300)] 
io_uring/zcrx: add support for custom DMA devices

Use the new API for getting a DMA device for a specific netdev queue.

This patch will allow io_uring zero-copy rx to work with devices
where the DMA device is not stored in the parent device. mlx5 SFs
are an example of such a device.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/20250827144017.1529208-4-dtatulea@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoqueue_api: add support for fetching per queue DMA dev
Dragos Tatulea [Wed, 27 Aug 2025 14:39:55 +0000 (17:39 +0300)] 
queue_api: add support for fetching per queue DMA dev

For zerocopy (io_uring, devmem), there is an assumption that the
parent device can do DMA. However that is not always the case:
- Scalable Function netdevs [1] have the DMA device in the grandparent.
- For Multi-PF netdevs [2] queues can be associated to different DMA
devices.

This patch introduces the a queue based interface for allowing drivers
to expose a different DMA device for zerocopy.

[1] Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst
[2] Documentation/networking/multi-pf-netdev.rst

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250827144017.1529208-3-dtatulea@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 weeks agoMerge tag 'dma-mapping-6.17-2025-08-28' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 28 Aug 2025 23:04:14 +0000 (16:04 -0700)] 
Merge tag 'dma-mapping-6.17-2025-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux

Pull dma-mapping fixes from Marek Szyprowski:

 - another small fix for arm64 systems with memory encryption (Shanker
   Donthineni)

 - fix for arm32 systems with non-standard CMA configuration (Oreoluwa
   Babatunde)

* tag 'dma-mapping-6.17-2025-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma/pool: Ensure DMA_DIRECT_REMAP allocations are decrypted
  of: reserved_mem: Restructure call site for dma_contiguous_early_fixup()

7 weeks agoMerge tag 'fixes-2025-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt...
Linus Torvalds [Thu, 28 Aug 2025 22:46:06 +0000 (15:46 -0700)] 
Merge tag 'fixes-2025-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Pull memblock fixes from Mike Rapoport:

 - printk cleanups in memblock and numa_memblks

 - update kernel-doc for MEMBLOCK_RSRV_NOINIT to be more accurate and
   detailed

* tag 'fixes-2025-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  memblock: fix kernel-doc for MEMBLOCK_RSRV_NOINIT
  mm: numa,memblock: Use SZ_1M macro to denote bytes to MB conversion
  mm/numa_memblks: Use pr_debug instead of printk(KERN_DEBUG)

7 weeks agoMerge tag 'powerpc-6.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Thu, 28 Aug 2025 22:39:06 +0000 (15:39 -0700)] 
Merge tag 'powerpc-6.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Madhavan Srinivasan:

 - Merge two CONFIG_POWERPC64_CPU entries in Kconfig.cputype

 - Replace extra-y to always-y in Makefile

 - Cleanup to use dev_fwnode helper

 - Fix misleading comment in kvmppc_prepare_to_enter()

 - misc cleanup and fixes

Thanks to Amit Machhiwal, Andrew Donnellan, Christophe Leroy, Gautam
Menghani, Jiri Slaby (SUSE), Masahiro Yamada, Shrikanth Hegde, Stephen
Rothwell, Venkat Rao Bagalkote, and Xichao Zhao

* tag 'powerpc-6.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/boot/install.sh: Fix shellcheck warnings
  powerpc/prom_init: Fix shellcheck warnings
  powerpc/kvm: Fix ifdef to remove build warning
  powerpc: unify two CONFIG_POWERPC64_CPU entries in the same choice block
  powerpc: use always-y instead of extra-y in Makefiles
  powerpc/64: Drop unnecessary 'rc' variable
  powerpc: Use dev_fwnode()
  KVM: PPC: Fix misleading interrupts comment in kvmppc_prepare_to_enter()

7 weeks agoMAINTAINERS: mark bcachefs externally maintained
Linus Torvalds [Thu, 28 Aug 2025 22:16:16 +0000 (15:16 -0700)] 
MAINTAINERS: mark bcachefs externally maintained

As per many long discussion threads, public and private.

Signed-off-by: Linus Torvalds <torbalds@linux-foundation.org>
8 weeks agoMerge branch 'fbnic-synchronize-address-handling-with-bmc'
Paolo Abeni [Thu, 28 Aug 2025 12:51:09 +0000 (14:51 +0200)] 
Merge branch 'fbnic-synchronize-address-handling-with-bmc'

Alexander Duyck says:

====================
fbnic: Synchronize address handling with BMC

The fbnic driver needs to communicate with the BMC if it is operating on
the RMII-based transport (RBT) of the same port the host is on. To enable
this we need to add rules that will route BMC traffic to the RBT/BMC and
the BMC and firmware need to configure rules on the RBT side of the
interface to route traffic from the BMC to the host instead of the MAC.

To enable that this patch set addresses two issues. First it will cause the
TCAM to be reconfigured in the event that the BMC was not previously
present when the driver was loaded, but the FW sends a notification that
the FW capabilities have changed and a BMC w/ various MAC addresses is now
present. Second it adds support for sending a message to the firmware so
that if the host adds additional MAC addresses the FW can be made aware and
route traffic for those addresses from the RBT to the host instead of the
MAC.
====================

Link: https://patch.msgid.link/175623715978.2246365.7798520806218461199.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 weeks agofbnic: Push local unicast MAC addresses to FW to populate TCAMs
Alexander Duyck [Tue, 26 Aug 2025 19:45:07 +0000 (12:45 -0700)] 
fbnic: Push local unicast MAC addresses to FW to populate TCAMs

The MACDA TCAM can only be accessed by one entity at a time and as such we
cannot have simultaneous reads from the firmware to probe for changes from
the host. As such we have to send a message indicating what the state of
the MACDA is to the firmware when we updated it so that the firmware can
sync up the TCAMs it owns to route BMC packets to the host.

To support that we are adding a new message that is invoked when we write
the MACDA that will notify the firmware of updates from the host and allow
it to sync up the TCAM configuration to match the one on the host side.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/175623750782.2246365.9178255870985916357.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 weeks agofbnic: Add logic to repopulate RPC TCAM if BMC enables channel
Alexander Duyck [Tue, 26 Aug 2025 19:45:01 +0000 (12:45 -0700)] 
fbnic: Add logic to repopulate RPC TCAM if BMC enables channel

The BMC itself can decide to abandon a link and move onto another link in
the event of things such as a link flap. As a result the driver may load
with the BMC not present, and then needs to update things to support the
BMC being present while the link is up and the NIC is passing traffic.

To support this we add support to the watchdog to reinitialize the RPC to
support adding the BMC unicast, multicast, and multicast promiscuous
filters while the link is up and the NIC owns the link.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/175623750101.2246365.8518307324797058580.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 weeks agofbnic: Pass fbnic_dev instead of netdev to __fbnic_set/clear_rx_mode
Alexander Duyck [Tue, 26 Aug 2025 19:44:54 +0000 (12:44 -0700)] 
fbnic: Pass fbnic_dev instead of netdev to __fbnic_set/clear_rx_mode

To make the __fbnic_set_rx_mode and __fbnic_clear_rx_mode calls usable by
more points in the code we can make to that they expect a fbnic_dev pointer
instead of a netdev pointer.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/175623749436.2246365.6068665520216196789.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 weeks agofbnic: Move promisc_sync out of netdev code and into RPC path
Alexander Duyck [Tue, 26 Aug 2025 19:44:47 +0000 (12:44 -0700)] 
fbnic: Move promisc_sync out of netdev code and into RPC path

In order for us to support the BMC possibly connecting, disconnecting, and
then reconnecting we need to be able to support entities outside of just
the NIC setting up promiscuous mode as the BMC can use a multicast
promiscuous setup.

To support that we should move the promisc_sync code out of the netdev and
into the RPC section of the driver so that it is reachable from more paths.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/175623748769.2246365.2130394904175851458.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 weeks agoMerge branch 'add-si3474-pse-controller-driver'
Paolo Abeni [Thu, 28 Aug 2025 12:42:02 +0000 (14:42 +0200)] 
Merge branch 'add-si3474-pse-controller-driver'

Piotr Kubik says:

====================
Add Si3474 PSE controller driver

From: Piotr Kubik <piotr.kubik@adtran.com>

These patch series provide support for Skyworks Si3474 I2C Power
Sourcing Equipment controller.

Based on the TPS23881 driver code.

Supported features of Si3474:
- get port status,
- get port power,
- get port voltage,
- enable/disable port power

Signed-off-by: Piotr Kubik <piotr.kubik@adtran.com>
====================

Link: https://patch.msgid.link/6af537dc-8a52-4710-8a18-dcfbb911cf23@adtran.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 weeks agonet: pse-pd: Add Si3474 PSE controller driver
Piotr Kubik [Tue, 26 Aug 2025 14:41:58 +0000 (14:41 +0000)] 
net: pse-pd: Add Si3474 PSE controller driver

Add a driver for the Skyworks Si3474 I2C Power Sourcing Equipment
controller.

Driver supports basic features of Si3474 IC:
- get port status,
- get port power,
- get port voltage,
- enable/disable port power.

Only 4p configurations are supported at this moment.

Signed-off-by: Piotr Kubik <piotr.kubik@adtran.com>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/9b72c8cd-c8d3-4053-9c80-671b9481d166@adtran.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 weeks agodt-bindings: net: pse-pd: Add bindings for Si3474 PSE controller
Piotr Kubik [Tue, 26 Aug 2025 14:41:44 +0000 (14:41 +0000)] 
dt-bindings: net: pse-pd: Add bindings for Si3474 PSE controller

Add the Si3474 I2C Power Sourcing Equipment controller device tree
bindings documentation.

Signed-off-by: Piotr Kubik <piotr.kubik@adtran.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/71a67c6f-6fce-49c7-96ec-554602dbd4f1@adtran.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 weeks agoMerge branch 'net-better-drop-accounting'
Paolo Abeni [Thu, 28 Aug 2025 11:14:52 +0000 (13:14 +0200)] 
Merge branch 'net-better-drop-accounting'

Eric Dumazet says:

====================
net: better drop accounting

Incrementing sk->sk_drops for every dropped packet can
cause serious cache line contention under DOS.

Add optional sk->sk_drop_counters pointer so that
protocols can opt-in to use two dedicated cache lines
to hold drop counters.

Convert UDP and RAW to use this infrastructure.

Tested on UDP (see patch 4/5 for details)

Before:

nstat -n ; sleep 1 ; nstat | grep Udp
Udp6InDatagrams                 615091             0.0
Udp6InErrors                    3904277            0.0
Udp6RcvbufErrors                3904277            0.0

After:

nstat -n ; sleep 1 ; nstat | grep Udp
Udp6InDatagrams                 816281             0.0
Udp6InErrors                    7497093            0.0
Udp6RcvbufErrors                7497093            0.0
====================

Link: https://patch.msgid.link/20250826125031.1578842-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 weeks agoinet: raw: add drop_counters to raw sockets
Eric Dumazet [Tue, 26 Aug 2025 12:50:31 +0000 (12:50 +0000)] 
inet: raw: add drop_counters to raw sockets

When a packet flood hits one or more RAW sockets, many cpus
have to update sk->sk_drops.

This slows down other cpus, because currently
sk_drops is in sock_write_rx group.

Add a socket_drop_counters structure to raw sockets.

Using dedicated cache lines to hold drop counters
makes sure that consumers no longer suffer from
false sharing if/when producers only change sk->sk_drops.

This adds 128 bytes per RAW socket.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250826125031.1578842-6-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 weeks agoudp: add drop_counters to udp socket
Eric Dumazet [Tue, 26 Aug 2025 12:50:30 +0000 (12:50 +0000)] 
udp: add drop_counters to udp socket

When a packet flood hits one or more UDP sockets, many cpus
have to update sk->sk_drops.

This slows down other cpus, because currently
sk_drops is in sock_write_rx group.

Add a socket_drop_counters structure to udp sockets.

Using dedicated cache lines to hold drop counters
makes sure that consumers no longer suffer from
false sharing if/when producers only change sk->sk_drops.

This adds 128 bytes per UDP socket.

Tested with the following stress test, sending about 11 Mpps
to a dual socket AMD EPYC 7B13 64-Core.

super_netperf 20 -t UDP_STREAM -H DUT -l10 -- -n -P,1000 -m 120
Note: due to socket lookup, only one UDP socket is receiving
packets on DUT.

Then measure receiver (DUT) behavior. We can see both
consumer and BH handlers can process more packets per second.

Before:

nstat -n ; sleep 1 ; nstat | grep Udp
Udp6InDatagrams                 615091             0.0
Udp6InErrors                    3904277            0.0
Udp6RcvbufErrors                3904277            0.0

After:

nstat -n ; sleep 1 ; nstat | grep Udp
Udp6InDatagrams                 816281             0.0
Udp6InErrors                    7497093            0.0
Udp6RcvbufErrors                7497093            0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250826125031.1578842-5-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 weeks agonet: add sk->sk_drop_counters
Eric Dumazet [Tue, 26 Aug 2025 12:50:29 +0000 (12:50 +0000)] 
net: add sk->sk_drop_counters

Some sockets suffer from heavy false sharing on sk->sk_drops,
and fields in the same cache line.

Add sk->sk_drop_counters to:

- move the drop counter(s) to dedicated cache lines.
- Add basic NUMA awareness to these drop counter(s).

Following patches will use this infrastructure for UDP and RAW sockets.

sk_clone_lock() is not yet ready, it would need to properly
set newsk->sk_drop_counters if we plan to use this for TCP sockets.

v2: used Paolo suggestion from https://lore.kernel.org/netdev/8f09830a-d83d-43c9-b36b-88ba0a23e9b2@redhat.com/

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250826125031.1578842-4-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 weeks agonet: add sk_drops_skbadd() helper
Eric Dumazet [Tue, 26 Aug 2025 12:50:28 +0000 (12:50 +0000)] 
net: add sk_drops_skbadd() helper

Existing sk_drops_add() helper is renamed to sk_drops_skbadd().

Add sk_drops_add() and convert sk_drops_inc() to use it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250826125031.1578842-3-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 weeks agonet: add sk_drops_read(), sk_drops_inc() and sk_drops_reset() helpers
Eric Dumazet [Tue, 26 Aug 2025 12:50:27 +0000 (12:50 +0000)] 
net: add sk_drops_read(), sk_drops_inc() and sk_drops_reset() helpers

We want to split sk->sk_drops in the future to reduce
potential contention on this field.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250826125031.1578842-2-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 weeks agouapi: wrap compiler_types.h in an ifdef instead of the implicit strip
Jakub Kicinski [Mon, 25 Aug 2025 20:18:28 +0000 (13:18 -0700)] 
uapi: wrap compiler_types.h in an ifdef instead of the implicit strip

The uAPI stddef header includes compiler_types.h, a kernel-only
header, to make sure that kernel definitions of annotations
like __counted_by() take precedence.

There is a hack in scripts/headers_install.sh which strips includes
of compiler.h and compiler_types.h when installing uAPI headers.
While explicit handling makes sense for compiler.h, which is included
all over the uAPI, compiler_types.h is only included by stddef.h
(within the uAPI, obviously it's included in kernel code a lot).

Remove the stripping from scripts/headers_install.sh and wrap
the include of compiler_types.h in #ifdef __KERNEL__ instead.
This should be equivalent functionally, but is easier to understand
to a casual reader of the code. It also makes it easier to work
with kernel headers directly from under tools/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250825201828.2370083-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 weeks agonet: ipv4: fix regression in local-broadcast routes
Oscar Maes [Wed, 27 Aug 2025 06:23:21 +0000 (08:23 +0200)] 
net: ipv4: fix regression in local-broadcast routes

Commit 9e30ecf23b1b ("net: ipv4: fix incorrect MTU in broadcast routes")
introduced a regression where local-broadcast packets would have their
gateway set in __mkroute_output, which was caused by fi = NULL being
removed.

Fix this by resetting the fib_info for local-broadcast packets. This
preserves the intended changes for directed-broadcast packets.

Cc: stable@vger.kernel.org
Fixes: 9e30ecf23b1b ("net: ipv4: fix incorrect MTU in broadcast routes")
Reported-by: Brett A C Sheffield <bacs@librecast.net>
Closes: https://lore.kernel.org/regressions/20250822165231.4353-4-bacs@librecast.net
Signed-off-by: Oscar Maes <oscmaes92@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250827062322.4807-1-oscmaes92@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 weeks agonet: macb: Disable clocks once
Neil Mandir [Tue, 26 Aug 2025 14:30:22 +0000 (10:30 -0400)] 
net: macb: Disable clocks once

When the driver is removed the clocks are disabled twice: once in
macb_remove and a second time by runtime pm. Disable wakeup in remove so
all the clocks are disabled and skip the second call to macb_clks_disable.
Always suspend the device as we always set it active in probe.

Fixes: d54f89af6cc4 ("net: macb: Add pm runtime support")
Signed-off-by: Neil Mandir <neil.mandir@seco.com>
Co-developed-by: Sean Anderson <sean.anderson@linux.dev>
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250826143022.935521-1-sean.anderson@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 weeks agoMerge tag 'perf-tools-fixes-for-v6.17-2025-08-27' of git://git.kernel.org/pub/scm...
Linus Torvalds [Thu, 28 Aug 2025 02:18:51 +0000 (19:18 -0700)] 
Merge tag 'perf-tools-fixes-for-v6.17-2025-08-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf-tools fixes from Namhyung Kim:
 "A number of kernel header sync changes and two build-id fixes"

* tag 'perf-tools-fixes-for-v6.17-2025-08-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  perf symbol: Add blocking argument to filename__read_build_id
  perf symbol-minimal: Fix ehdr reading in filename__read_build_id
  tools headers: Sync uapi/linux/vhost.h with the kernel source
  tools headers: Sync uapi/linux/prctl.h with the kernel source
  tools headers: Sync uapi/linux/fs.h with the kernel source
  tools headers: Sync uapi/linux/fcntl.h with the kernel source
  tools headers: Sync syscall tables with the kernel source
  tools headers: Sync powerpc headers with the kernel source
  tools headers: Sync arm64 headers with the kernel source
  tools headers: Sync x86 headers with the kernel source
  tools headers: Sync linux/cfi_types.h with the kernel source
  tools headers: Sync linux/bits.h with the kernel source
  tools headers: Sync KVM headers with the kernel source
  perf test: Fix a build error in x86 topdown test

8 weeks agoMerge branch 'locking-fixes-for-fbnic-driver'
Jakub Kicinski [Thu, 28 Aug 2025 01:57:13 +0000 (18:57 -0700)] 
Merge branch 'locking-fixes-for-fbnic-driver'

Alexander Duyck says:

====================
Locking fixes for fbnic driver

Address a few locking issues that were reported on the fbnic driver.
Specifically in one case we were seeing locking leaks due to us not
releasing the locks in certain exception paths. In another case we were
using phylink_resume outside of a section in which we held the RTNL mutex
and as a result we were throwing an assert.
====================

Link: https://patch.msgid.link/175616242563.1963577.7257712519613275567.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agofbnic: Move phylink resume out of service_task and into open/close
Alexander Duyck [Mon, 25 Aug 2025 22:56:13 +0000 (15:56 -0700)] 
fbnic: Move phylink resume out of service_task and into open/close

The fbnic driver was presenting with the following locking assert coming
out of a PM resume:
[   42.208116][  T164] RTNL: assertion failed at drivers/net/phy/phylink.c (2611)
[   42.208492][  T164] WARNING: CPU: 1 PID: 164 at drivers/net/phy/phylink.c:2611 phylink_resume+0x190/0x1e0
[   42.208872][  T164] Modules linked in:
[   42.209140][  T164] CPU: 1 UID: 0 PID: 164 Comm: bash Not tainted 6.17.0-rc2-virtme #134 PREEMPT(full)
[   42.209496][  T164] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-5.fc42 04/01/2014
[   42.209861][  T164] RIP: 0010:phylink_resume+0x190/0x1e0
[   42.210057][  T164] Code: 83 e5 01 0f 85 b0 fe ff ff c6 05 1c cd 3e 02 01 90 ba 33 0a 00 00 48 c7 c6 20 3a 1d a5 48 c7 c7 e0 3e 1d a5 e8 21 b8 90 fe 90 <0f> 0b 90 90 e9 86 fe ff ff e8 42 ea 1f ff e9 e2 fe ff ff 48 89 ef
[   42.210708][  T164] RSP: 0018:ffffc90000affbd8 EFLAGS: 00010296
[   42.210983][  T164] RAX: 0000000000000000 RBX: ffff8880078d8400 RCX: 0000000000000000
[   42.211235][  T164] RDX: 0000000000000000 RSI: 1ffffffff4f10938 RDI: 0000000000000001
[   42.211466][  T164] RBP: 0000000000000000 R08: ffffffffa2ae79ea R09: fffffbfff4b3eb84
[   42.211707][  T164] R10: 0000000000000003 R11: 0000000000000000 R12: ffff888007ad8000
[   42.211997][  T164] R13: 0000000000000002 R14: ffff888006a18800 R15: ffffffffa34c59e0
[   42.212234][  T164] FS:  00007f0dc8e39740(0000) GS:ffff88808f51f000(0000) knlGS:0000000000000000
[   42.212505][  T164] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   42.212704][  T164] CR2: 00007f0dc8e9fe10 CR3: 000000000b56d003 CR4: 0000000000772ef0
[   42.213227][  T164] PKRU: 55555554
[   42.213366][  T164] Call Trace:
[   42.213483][  T164]  <TASK>
[   42.213565][  T164]  __fbnic_pm_attach.isra.0+0x8e/0xa0
[   42.213725][  T164]  pci_reset_function+0x116/0x1d0
[   42.213895][  T164]  reset_store+0xa0/0x100
[   42.214025][  T164]  ? pci_dev_reset_attr_is_visible+0x50/0x50
[   42.214221][  T164]  ? sysfs_file_kobj+0xc1/0x1e0
[   42.214374][  T164]  ? sysfs_kf_write+0x65/0x160
[   42.214526][  T164]  kernfs_fop_write_iter+0x2f8/0x4c0
[   42.214677][  T164]  ? kernfs_vma_page_mkwrite+0x1f0/0x1f0
[   42.214836][  T164]  new_sync_write+0x308/0x6f0
[   42.214987][  T164]  ? __lock_acquire+0x34c/0x740
[   42.215135][  T164]  ? new_sync_read+0x6f0/0x6f0
[   42.215288][  T164]  ? lock_acquire.part.0+0xbc/0x260
[   42.215440][  T164]  ? ksys_write+0xff/0x200
[   42.215590][  T164]  ? perf_trace_sched_switch+0x6d0/0x6d0
[   42.215742][  T164]  vfs_write+0x65e/0xbb0
[   42.215876][  T164]  ksys_write+0xff/0x200
[   42.215994][  T164]  ? __ia32_sys_read+0xc0/0xc0
[   42.216141][  T164]  ? do_user_addr_fault+0x269/0x9f0
[   42.216292][  T164]  ? rcu_is_watching+0x15/0xd0
[   42.216442][  T164]  do_syscall_64+0xbb/0x360
[   42.216591][  T164]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[   42.216784][  T164] RIP: 0033:0x7f0dc8ea9986

A bit of digging showed that we were invoking the phylink_resume as a part
of the fbnic_up path when we were enabling the service task while not
holding the RTNL lock. We should be enabling this sooner as a part of the
ndo_open path and then just letting the service task come online later.
This will help to enforce the correct locking and brings the phylink
interface online at the same time as the network interface, instead of at a
later time.

I tested this on QEMU to verify this was working by putting the system to
sleep using "echo mem > /sys/power/state" to put the system to sleep in the
guest and then using the command "system_wakeup" in the QEMU monitor.

Fixes: 69684376eed5 ("eth: fbnic: Add link detection")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/175616257316.1963577.12238158800417771119.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agofbnic: Fixup rtnl_lock and devl_lock handling related to mailbox code
Alexander Duyck [Mon, 25 Aug 2025 22:56:06 +0000 (15:56 -0700)] 
fbnic: Fixup rtnl_lock and devl_lock handling related to mailbox code

The exception handling path for the __fbnic_pm_resume function had a bug in
that it was taking the devlink lock and then exiting to exception handling
instead of waiting until after it released the lock to do so. In order to
handle that I am swapping the placement of the unlock and the exception
handling jump to label so that we don't trigger a deadlock by holding the
lock longer than we need to.

In addition this change applies the same ordering to the rtnl_lock/unlock
calls in the same function as it should make the code easier to follow if
it adheres to a consistent pattern.

Fixes: 82534f446daa ("eth: fbnic: Add devlink dev flash support")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/175616256667.1963577.5543500806256052549.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoMerge branch 'eth-fbnic-extend-hw-stats-support'
Jakub Kicinski [Thu, 28 Aug 2025 01:56:27 +0000 (18:56 -0700)] 
Merge branch 'eth-fbnic-extend-hw-stats-support'

Jakub Kicinski says:

====================
eth: fbnic: Extend hw stats support

Mohsin says:

Extend hardware stats support for fbnic by adding the ability to reset
hardware stats when the device experience a reset due to a PCI error and
include MAC stats in the hardware stats reset. Additionally, expand
hardware stats coverage to include FEC, PHY, and Pause stats.

v1: https://lore.kernel.org/20250822164731.1461754-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250825200206.2357713-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoeth: fbnic: Add pause stats support
Mohsin Bashir [Mon, 25 Aug 2025 20:02:06 +0000 (13:02 -0700)] 
eth: fbnic: Add pause stats support

Add support to read pause stats for fbnic. Unlike FEC and PCS stats,
pause stats won't wrap, do not fetch them under the service task. Since,
they are exclusively accessed via the ethtool API, don't include them in
fbnic_get_hw_stats().

]# ethtool -I -a eth0
Pause parameters for eth0:
Autonegotiate: on
RX: off
TX: off
Statistics:
  tx_pause_frames: 0
  rx_pause_frames: 0

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250825200206.2357713-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoeth: fbnic: Read PHY stats via the ethtool API
Mohsin Bashir [Mon, 25 Aug 2025 20:02:05 +0000 (13:02 -0700)] 
eth: fbnic: Read PHY stats via the ethtool API

Provide support to read PHY stats (FEC and PCS) via the ethtool API.

]# ethtool -I --show-fec eth0
FEC parameters for eth0:
Supported/Configured FEC encodings: RS
Active FEC encoding: RS
Statistics:
  corrected_blocks: 0
  uncorrectable_blocks: 0

]# ethtool -S eth0 --groups eth-phy
Standard stats for eth0:
eth-phy-SymbolErrorDuringCarrier: 0

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250825200206.2357713-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoeth: fbnic: Fetch PHY stats from device
Mohsin Bashir [Mon, 25 Aug 2025 20:02:04 +0000 (13:02 -0700)] 
eth: fbnic: Fetch PHY stats from device

Add support to fetch PHY stats consisting of PCS and FEC stats from the
device. When reading the stats counters, the lo part is read first, which
latches the hi part to ensure consistent reading of the stats counter.

FEC and PCS stats can wrap depending on the access frequency. To prevent
wrapping, fetch these stats periodically under the service task. Also to
maintain consistency fetch these stats along with other 32b stats under
__fbnic_get_hw_stats32().

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250825200206.2357713-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoeth: fbnic: Reset MAC stats
Mohsin Bashir [Mon, 25 Aug 2025 20:02:03 +0000 (13:02 -0700)] 
eth: fbnic: Reset MAC stats

Reset the MAC stats as part of the hardware stats reset to ensure
consistency. Currently, hardware stats are reset during device bring-up
and upon experiencing PCI errors; however, MAC stats are being skipped
during these resets.

When fbnic_reset_hw_stats() is called upon recovering from PCI error,
MAC stats are accessed outside the rtnl_lock. The only other access to
MAC stats is via the ethtool API, which is protected by rtnl_lock. This
can result in concurrent access to MAC stats and a potential race. Protect
the fbnic_reset_hw_stats() call in __fbnic_pm_attach() with rtnl_lock to
avoid this.

Note that fbnic_reset_hw_mac_stats() is called outside the hardware
stats lock which protects access to the fbnic_hw_stats. This is intentional
because MAC stats are fetched from the device outside this lock and are
exclusively read via the ethtool API.

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250825200206.2357713-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoeth: fbnic: Reset hw stats upon PCI error
Mohsin Bashir [Mon, 25 Aug 2025 20:02:02 +0000 (13:02 -0700)] 
eth: fbnic: Reset hw stats upon PCI error

Upon experiencing a PCI error, fbnic reset the device to recover from
the failure. Reset the hardware stats as part of the device reset to
ensure accurate stats reporting.

Note that the reset is not really resetting the aggregate value to 0,
which may result in a spike for a system collecting deltas in stats.
Rather, the reset re-latches the current value as previous, in case HW
got reset.

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250825200206.2357713-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoeth: fbnic: Move hw_stats_lock out of fbnic_dev
Mohsin Bashir [Mon, 25 Aug 2025 20:02:01 +0000 (13:02 -0700)] 
eth: fbnic: Move hw_stats_lock out of fbnic_dev

Move hw_stats_lock out of fbnic_dev to a more appropriate struct
fbnic_hw_stats since the only use of this lock is to protect access to
the hardware stats. While at it, enclose the lock and stats
initialization in a single init call.

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250825200206.2357713-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agoMerge branch 'macsec-replace-custom-netlink-attribute-checks-with-policy-level-checks'
Jakub Kicinski [Thu, 28 Aug 2025 01:34:55 +0000 (18:34 -0700)] 
Merge branch 'macsec-replace-custom-netlink-attribute-checks-with-policy-level-checks'

Sabrina Dubroca says:

====================
macsec: replace custom netlink attribute checks with policy-level checks

We can simplify attribute validation a lot by describing the accepted
ranges more precisely in the policies, using NLA_POLICY_MAX etc.

Some of the checks still need to be done later on, because the
attribute length and acceptable range can vary based on values that
can't be known when the policy is validated (cipher suite determines
the key length and valid ICV length, presence of XPN changes the PN
length, detection of duplicate SCIs or ANs, etc).

As a bonus, we get a few extack messages from the policy
validation. I'll add extack to the rest of the checks (mostly in the
genl commands) in an future series.

v1: https://lore.kernel.org/netdev/cover.1664379352.git.sd@queasysnail.net
====================

Link: https://patch.msgid.link/cover.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agomacsec: replace custom check on IFLA_MACSEC_ENCODING_SA with NLA_POLICY_MAX
Sabrina Dubroca [Tue, 26 Aug 2025 13:16:31 +0000 (15:16 +0200)] 
macsec: replace custom check on IFLA_MACSEC_ENCODING_SA with NLA_POLICY_MAX

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/085bc642136cf3d267ddbb114e6f0c4a9247c797.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agomacsec: replace custom checks for IFLA_MACSEC_* flags with NLA_POLICY_MAX
Sabrina Dubroca [Tue, 26 Aug 2025 13:16:30 +0000 (15:16 +0200)] 
macsec: replace custom checks for IFLA_MACSEC_* flags with NLA_POLICY_MAX

Those are all off/on flags.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/95707fb36adc1904fa327bc8f4eb055895aa6eff.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agomacsec: validate IFLA_MACSEC_VALIDATION with NLA_POLICY_MAX
Sabrina Dubroca [Tue, 26 Aug 2025 13:16:29 +0000 (15:16 +0200)] 
macsec: validate IFLA_MACSEC_VALIDATION with NLA_POLICY_MAX

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/629efe0b2150b30abc6472074018cbd521b46578.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agomacsec: use NLA_POLICY_VALIDATE_FN to validate IFLA_MACSEC_CIPHER_SUITE
Sabrina Dubroca [Tue, 26 Aug 2025 13:16:28 +0000 (15:16 +0200)] 
macsec: use NLA_POLICY_VALIDATE_FN to validate IFLA_MACSEC_CIPHER_SUITE

Unfortunately, since the value of MACSEC_DEFAULT_CIPHER_ID doesn't fit
near the others, we can't use a simple range in the policy.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/015e43ade9548c7682c9739087eba0853b3a1331.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agomacsec: replace custom checks on IFLA_MACSEC_ICV_LEN with NLA_POLICY_RANGE
Sabrina Dubroca [Tue, 26 Aug 2025 13:16:27 +0000 (15:16 +0200)] 
macsec: replace custom checks on IFLA_MACSEC_ICV_LEN with NLA_POLICY_RANGE

The existing checks already force this range.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/398cf16191a634ab343ecd811c481d7bdd44a933.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agomacsec: add NLA_POLICY_MAX for MACSEC_OFFLOAD_ATTR_TYPE and IFLA_MACSEC_OFFLOAD
Sabrina Dubroca [Tue, 26 Aug 2025 13:16:26 +0000 (15:16 +0200)] 
macsec: add NLA_POLICY_MAX for MACSEC_OFFLOAD_ATTR_TYPE and IFLA_MACSEC_OFFLOAD

This is equivalent to the existing checks allowing either
MACSEC_OFFLOAD_OFF or calling macsec_check_offload.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/37e1f1716f1d1d46d3d06c52317564b393fe60e6.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agomacsec: remove validate_add_rxsc
Sabrina Dubroca [Tue, 26 Aug 2025 13:16:25 +0000 (15:16 +0200)] 
macsec: remove validate_add_rxsc

It's not doing much anymore.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/218147f2f11cab885abc86b779dcefcd3208a2f8.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agomacsec: use NLA_UINT for MACSEC_SA_ATTR_PN
Sabrina Dubroca [Tue, 26 Aug 2025 13:16:24 +0000 (15:16 +0200)] 
macsec: use NLA_UINT for MACSEC_SA_ATTR_PN

MACSEC_SA_ATTR_PN is either a u32 or a u64, we can now use NLA_UINT
for this instead of a custom binary type. We can then use a min check
within the policy.

We need to keep the length checks done in macsec_{add,upd}_{rx,tx}sa
based on whether the device is set up for XPN (with 64b PNs instead of
32b).

On the dump side, keep the existing custom code as userspace may
expect a u64 when using XPN, and nla_put_uint may only output a u32
attribute if the value fits.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/c9d32bd479cd4464e09010fbce1becc75377c8a0.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agomacsec: use NLA_POLICY_MAX_LEN for MACSEC_SA_ATTR_KEY
Sabrina Dubroca [Tue, 26 Aug 2025 13:16:23 +0000 (15:16 +0200)] 
macsec: use NLA_POLICY_MAX_LEN for MACSEC_SA_ATTR_KEY

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/192227ca0047b643d6530ece0a3679998b010fac.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agomacsec: replace custom checks on MACSEC_SA_ATTR_KEYID with NLA_POLICY_EXACT_LEN
Sabrina Dubroca [Tue, 26 Aug 2025 13:16:22 +0000 (15:16 +0200)] 
macsec: replace custom checks on MACSEC_SA_ATTR_KEYID with NLA_POLICY_EXACT_LEN

The existing checks already specify that MACSEC_SA_ATTR_KEYID must
have length MACSEC_KEYID_LEN.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/c4c113328962aae4146183e7a27854e854c796fb.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agomacsec: replace custom checks on MACSEC_SA_ATTR_SALT with NLA_POLICY_EXACT_LEN
Sabrina Dubroca [Tue, 26 Aug 2025 13:16:21 +0000 (15:16 +0200)] 
macsec: replace custom checks on MACSEC_SA_ATTR_SALT with NLA_POLICY_EXACT_LEN

The existing checks already specify that MACSEC_SA_ATTR_SALT must have
length MACSEC_SALT_LEN.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/9699c5fd72322118b164cc8777fadabcce3b997c.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agomacsec: replace custom checks on MACSEC_*_ATTR_ACTIVE with NLA_POLICY_MAX
Sabrina Dubroca [Tue, 26 Aug 2025 13:16:20 +0000 (15:16 +0200)] 
macsec: replace custom checks on MACSEC_*_ATTR_ACTIVE with NLA_POLICY_MAX

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/2b07434304c725c72a7d81a8460d0bbe8af384a2.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 weeks agomacsec: replace custom checks on MACSEC_SA_ATTR_AN with NLA_POLICY_MAX
Sabrina Dubroca [Tue, 26 Aug 2025 13:16:19 +0000 (15:16 +0200)] 
macsec: replace custom checks on MACSEC_SA_ATTR_AN with NLA_POLICY_MAX

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/22a7820cfc2cbfe5e33f030f1a3276e529cc70dc.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>