git.ipfire.org Git - thirdparty/kernel/linux.git/log

ipv4: drop ipv6_stub usage and use direct function calls

As IPv6 is built-in only, the ipv6_stub infrastructure is no longer
necessary.

The IPv4 stack interacts with IPv6 mainly to support IPv4 routes with
IPv6 next-hops (RFC 8950). Convert all these cross-family calls from
ipv6_stub to direct function calls. The fallback functions introduced
previously will prevent linkage errors when CONFIG_IPV6 is disabled.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Tested-by: Ricardo B. Marlière <rbm@suse.com>
Link: https://patch.msgid.link/20260325120928.15848-8-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

drivers: net: drop ipv6_stub usage and use direct function calls

As IPv6 is built-in only, the ipv6_stub infrastructure is no longer
necessary.

Convert all drivers currently utilizing ipv6_stub to make direct
function calls. The fallback functions introduced previously will
prevent linkage errors when CONFIG_IPV6 is disabled.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Tested-by: Ricardo B. Marlière <rbm@suse.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Antonio Quartulli <antonio@openvpn.net>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20260325120928.15848-7-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: prepare headers for ipv6_stub removal

In preparation for dropping ipv6_stub and converting its users to direct
function calls, introduce static inline dummy functions and fallback
macros in the IPv6 networking headers. In addition, introduce checks on
fib6_nh_init(), ip6_dst_lookup_flow() and ip6_fragment() to avoid a
crash due to ipv6.disable=1 set during booting. The other functions are
safe as they cannot be called with ipv6.disable=1 set.

These fallbacks ensure that when CONFIG_IPV6 is completely disabled,
there are no compiling or linking errors due to code paths not guarded
by preprocessor macro IS_ENABLED(CONFIG_IPV6).

In addition, export ndisc_send_na(), ip6_route_input() and
ip6_fragment().

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Tested-by: Ricardo B. Marlière <rbm@suse.com>
Link: https://patch.msgid.link/20260325120928.15848-6-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: remove dynamic ICMPv6 sender registration infrastructure

As IPv6 is built-in only, there is no need to maintain the sender
registration infrastructure used to allow built-in subsystems to send
ICMPv6 messages when IPv6 was compiled as a module.

Drop the registration mechanism and the __icmpv6_send() sender
implementation. While icmpv6_send() users could be converted to
icmp6_send() that doesn't seems necessary as none of them are using the
force_saddr parameter.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Tested-by: Ricardo B. Marlière <rbm@suse.com>
Link: https://patch.msgid.link/20260325120928.15848-5-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: replace IS_BUILTIN(CONFIG_IPV6) with IS_ENABLED(CONFIG_IPV6)

As IPv6 is built-in only, it does not make sense to continue using
IS_BUILTIN(CONFIG_IPV6). Therefore, replace it with IS_ENABLED() when
necessary and drop it if it isn't valid anymore.

Notice that there is still one instance related to ICMPv6, as it
requires more changes it will be handle separately.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Tested-by: Ricardo B. Marlière <rbm@suse.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260325120928.15848-4-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: remove EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() macros

As IPv6 is built-in only, the macro is always evaluating to an empty
one. Remove it completely from the code.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260325120928.15848-3-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: convert CONFIG_IPV6 to built-in only and clean up Kconfigs

Maintaining a modular IPv6 stack offers image size savings for specific
setups, this benefit is outweighed by the architectural burden it
imposes on the subsystems on implementation and maintenance. Therefore,
drop it.

Change CONFIG_IPV6 from tristate to bool. Remove all Kconfig
dependencies across the tree that explicitly checked for IPV6=m. In
addition, remove MODULE_DESCRIPTION(), MODULE_ALIAS(), MODULE_AUTHOR()
and MODULE_LICENSE().

This is also replacing module_init() by device_initcall(). It is not
possible to use fs_initcall() as IPv4 does because that creates a race
condition on IPv6 addrconf.

Finally, modify the default configs from CONFIG_IPV6=m to CONFIG_IPV6=y
except for m68k as according to the bloat-o-meter the image is
increasing by 330KB~ and that isn't acceptable. Instead, disable IPv6 on
this architecture by default. This is aligned with m68k RAM requirements
and recommendations [1].

[1] http://www.linux-m68k.org/faq/ram.html

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Tested-by: Ricardo B. Marlière <rbm@suse.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org> # arm64
Link: https://patch.msgid.link/20260325120928.15848-2-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'vrf-a-few-cleanups'

Ido Schimmel says:

====================
vrf: A few cleanups

Perform a few cleanups in the VRF driver. Noticed these while reviewing
a recent patch [1]. See individual patches for more details.

[1] https://lore.kernel.org/netdev/20260310105331.2371-1-lirongqing@baidu.com/
====================

Link: https://patch.msgid.link/20260326203233.1128554-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

vrf: Remove unnecessary RCU protection around dst entries

During initialization of a VRF device, the VRF driver creates two dst
entries (for IPv4 and IPv6). They are attached to locally generated
packets that are transmitted out of the VRF ports (via the
l3mdev_l3_out() hook). Their purpose is to redirect packets towards the
VRF device instead of having the packets egress directly out of the VRF
ports. This is useful, for example, when a queuing discipline is
configured on the VRF device.

In order to avoid a NULL pointer dereference, commit b0e95ccdd775 ("net:
vrf: protect changes to private data with rcu") made the pointers to the
dst entries RCU protected. As far as I can tell, this was needed because
back then the dst entries were released (and the pointers reset to NULL)
before removing the VRF ports.

Later on, commit f630c38ef0d7 ("vrf: fix bug_on triggered by rx when
destroying a vrf") moved the removal of the VRF ports to the VRF
device's dellink() callback. As such, the tear down sequence of a VRF
device looks as follows:

1. VRF ports are removed.
2. VRF device is unregistered.
    a. Device is closed.
    b. An RCU grace period passes.
    c. ndo_uninit() is called.
        i. dst entries are released.

Given the above, the Tx path will always see the same fully initialized
dst entries and will never race with the ndo_uninit() callback.

Therefore, there is no need to make the pointers to the dst entries RCU
protected. Remove it as well as the unnecessary NULL checks in the Tx
path.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260326203233.1128554-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

vrf: Use dst_dev_put() instead of using loopback device

Use dst_dev_put() to clean up the device referenced by the dst entry
instead of partially open coding it. Internally, the helper uses the
blackhole device instead of the loopback device.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260326203233.1128554-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

vrf: Remove unnecessary NULL check

The VRF driver always allocates an IPv4 dst entry for a VRF device and
prevents the device from being registered if the allocation fails.

Therefore, there is no need to check if the entry exists when tearing
down a VRF device. Remove the check.

Note that the same is not true for the IPv6 dst entry. Its creation can
be skipped if IPv6 is administratively disabled (i.e.,
'ipv6.disable=1').

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260326203233.1128554-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-stmmac-disable-eee-on-i-mx'

Laurent Pinchart says:

====================
net: stmmac: Disable EEE on i.MX

This small patch series fixes a long-standing interrupt storm issue with
stmmac on NXP i.MX platforms.

The initial attempt to fix^Wwork around the problem in DT ([1]) was
painfully but rightfully rejected by Russell, who helped me investigate
the issue in depth. It turned out that the root cause is a mistake in
how interrupts are wired in the SoC, a hardware bug that has been
replicated in all i.MX SoCs that integrate an stmmac. The only viable
solution is to disable EEE on those devices.

Individual patches explain the issue in more details. Patch 1/2,
authored by Russell, adds a new STMMAC_FLAG to disable EEE, and patch
2/2 sets the flag for i.MX platforms.

[1] https://lore.kernel.org/r/20251026122905.29028-1-laurent.pinchart@ideasonboard.com
====================

Link: https://patch.msgid.link/20260325210003.2752013-1-laurent.pinchart@ideasonboard.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: imx: Disable EEE

The i.MX8MP suffers from an interrupt storm related to the stmmac and
EEE. A long and tedious analysis ([1]) concluded that the SoC wires the
stmmac lpi_intr_o signal to an OR gate along with the main dwmac
interrupts, which causes an interrupt storm for two reasons.

First, there's a race condition due to the interrupt deassertion being
synchronous to the RX clock domain:

- When the PHY exits LPI mode, it restarts generating the RX clock
  (clk_rx_i input signal to the GMAC).
- The MAC detects exit from LPI, and asserts lpi_intr_o. This triggers
  the ENET_EQOS interrupt.
- Before the CPU has time to process the interrupt, the PHY enters LPI
  mode again, and stops generating the RX clock.
- The CPU processes the interrupt and reads the GMAC4_LPI_CTRL_STATUS
  registers. This does not clear lpi_intr_o as there's no clk_rx_i.

An attempt was made to fixing the issue by not stopping RX_CLK in Rx LPI
state ([2]). This alleviates the symptoms but doesn't fix the issue.
Since lpi_intr_o takes four RX_CLK cycles to clear, an interrupt storm
can still occur during that window. In 1000T mode this is harder to
notice, but slower receive clocks cause hundreds to thousands of
spurious interrupts.

Fix the issue by disabling EEE completely on i.MX8MP.

[1] https://lore.kernel.org/all/20251026122905.29028-1-laurent.pinchart@ideasonboard.com/
[2] https://lore.kernel.org/all/20251123053518.8478-1-laurent.pinchart@ideasonboard.com/

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260325210003.2752013-3-laurent.pinchart@ideasonboard.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: provide flag to disable EEE

Some platforms have problems when EEE is enabled, and thus need a way
to disable stmmac EEE support. Add a flag before the other LPI related
flags which tells stmmac to avoid populating the phylink LPI
capabilities, which causes phylink to call phy_disable_eee() for any
PHY that is attached to the affected phylink instance.

iMX8MP is an example - the lpi_intr_o signal is wired to an OR gate
along with the main dwmac interrupts. Since lpi_intr_o is synchronous
to the receive clock domain, and takes four clock cycles to clear, this
leads to interrupt storms as the interrupt remains asserted for some
time after the LPI control and status register is read.

This problem becomes worse when the receive clock from the PHY stops
when the receive path enters LPI state - which means that lpi_intr_o
can not deassert until the clock restarts. Since the LPI state of the
receive path depends on the link partner, this is out of our control.
We could disable RX clock stop at the PHY, but that doesn't get around
the slow-to-deassert lpi_intr_o mentioned in the above paragraph.

Previously, iMX8MP worked around this by disabling gigabit EEE, but
this is insufficient - the problem is also visible at 100M speeds,
where the receive clock is slower.

There is extensive discussion and investigation in the thread linked
below, the result of which is summarised in this commit message.

Reported-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Closes: https://lore.kernel.org/r/20251026122905.29028-1-laurent.pinchart@ideasonboard.com
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Link: https://patch.msgid.link/20260325210003.2752013-2-laurent.pinchart@ideasonboard.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: Use at least SZ_4K in doorbell ID range check

mana_gd_ring_doorbell() accesses offsets up to DOORBELL_OFFSET_EQ
(0xFF8) + 8 bytes = 4KB within each doorbell page. A db_page_size
smaller than SZ_4K is fundamentally incompatible with the driver:
doorbell pages would overlap and the device cannot function correctly.

Validate db_page_size at the source and fail the
probe early if the value is below SZ_4K. This ensures the doorbell ID
range check in mana_gd_register_device() can rely on db_page_size
being valid.

Fixes: 89fe91c65992 ("net: mana: hardening: Validate doorbell ID from GDMA_REGISTER_DEVICE response")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260325180423.1923060-1-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlx5: shd: Gracefully avoid shared devlink creation when no usable SN is found

On some HW, not even fall-back "SERIALNO" is found in VPD data. Unify
the behavior with the case there are no VPD data at all and avoid
creation of shared devlink instance.

Fixes: 2a8c8a03f306 ("net/mlx5: Add a shared devlink instance for PFs on same chip")
Reported-by: Adam Young <admiyo@amperemail.onmicrosoft.com>
Closes: https://lore.kernel.org/all/bab5b6bc-aa42-4af1-80d1-e56bcef06bc2@amperemail.onmicrosoft.com/
Reported-by: Ben Copeland <ben.copeland@linaro.org>
Closes: https://lore.kernel.org/all/20260324151014.860376-1-ben.copeland@linaro.org/
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Tested-by: Ben Copeland <ben.copeland@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260325152801.236343-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ptr_ring: disable KCSAN warnings

Eric Dumazet reported KCSAN warnings:

BUG: KCSAN: data-race in pfifo_fast_dequeue / pfifo_fast_enqueue

write to 0xffff88811d5ccc00 of 8 bytes by interrupt on cpu 0:
__ptr_ring_zero_tail include/linux/ptr_ring.h:259 [inline]
__ptr_ring_discard_one include/linux/ptr_ring.h:291 [inline]
__ptr_ring_consume include/linux/ptr_ring.h:311 [inline]
__skb_array_consume include/linux/skb_array.h:98 [inline]
pfifo_fast_dequeue+0x770/0x8f0 net/sched/sch_generic.c:770
dequeue_skb net/sched/sch_generic.c:297 [inline]
qdisc_restart net/sched/sch_generic.c:402 [inline]
__qdisc_run+0x189/0xc80 net/sched/sch_generic.c:420
qdisc_run include/net/pkt_sched.h:120 [inline]
net_tx_action+0x379/0x590 net/core/dev.c:5793
handle_softirqs+0xb9/0x280 kernel/softirq.c:622
do_softirq+0x45/0x60 kernel/softirq.c:523
__local_bh_enable_ip+0x70/0x80 kernel/softirq.c:450
local_bh_enable include/linux/bottom_half.h:33 [inline]
bpf_test_run+0x2db/0x620 net/bpf/test_run.c:426
bpf_prog_test_run_skb+0x9a4/0xef0 net/bpf/test_run.c:1159
bpf_prog_test_run+0x204/0x340 kernel/bpf/syscall.c:4721
__sys_bpf+0x52e/0x7e0 kernel/bpf/syscall.c:6246
__do_sys_bpf kernel/bpf/syscall.c:6341 [inline]
__se_sys_bpf kernel/bpf/syscall.c:6339 [inline]
__x64_sys_bpf+0x41/0x50 kernel/bpf/syscall.c:6339
x64_sys_call+0x10cb/0x3020 arch/x86/include/generated/asm/syscalls_64.h:322
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x12c/0x370 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f

read to 0xffff88811d5ccc00 of 8 bytes by task 22632 on cpu 1:
__ptr_ring_produce include/linux/ptr_ring.h:106 [inline]
ptr_ring_produce include/linux/ptr_ring.h:129 [inline]
skb_array_produce include/linux/skb_array.h:44 [inline]
pfifo_fast_enqueue+0xd5/0x2c0 net/sched/sch_generic.c:741
dev_qdisc_enqueue net/core/dev.c:4144 [inline]
__dev_xmit_skb net/core/dev.c:4188 [inline]
__dev_queue_xmit+0x6a4/0x1f20 net/core/dev.c:4795
dev_queue_xmit include/linux/netdevice.h:3384 [inline]
__bpf_tx_skb net/core/filter.c:2153 [inline]
__bpf_redirect_common net/core/filter.c:2197 [inline]
__bpf_redirect+0x862/0x990 net/core/filter.c:2204
____bpf_clone_redirect net/core/filter.c:2487 [inline]
bpf_clone_redirect+0x20c/0x290 net/core/filter.c:2450
bpf_prog_53f18857bc887b09+0x22/0x2a
bpf_dispatcher_nop_func include/linux/bpf.h:1402 [inline]
__bpf_prog_run include/linux/filter.h:723 [inline]
bpf_prog_run include/linux/filter.h:730 [inline]
bpf_test_run+0x29d/0x620 net/bpf/test_run.c:423
bpf_prog_test_run_skb+0x9a4/0xef0 net/bpf/test_run.c:1159
bpf_prog_test_run+0x204/0x340 kernel/bpf/syscall.c:4721
__sys_bpf+0x52e/0x7e0 kernel/bpf/syscall.c:6246
__do_sys_bpf kernel/bpf/syscall.c:6341 [inline]
__se_sys_bpf kernel/bpf/syscall.c:6339 [inline]
__x64_sys_bpf+0x41/0x50 kernel/bpf/syscall.c:6339
x64_sys_call+0x10cb/0x3020 arch/x86/include/generated/asm/syscalls_64.h:322
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x12c/0x370 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f

value changed: 0xffff888104a93a00 -> 0x0000000000000000

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 22632 Comm: syz.0.4135 Tainted: G W syzkaller #0
PREEMPT(full)
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/24/2026

There is no race on ring accesses: reading/writing a partial pointer
would be fine, because the reading is done by the producer
which merely cares about NULL/non NULL.
Document and disable the warnings using data_race().

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/dd3984b3bce9df3591927f927668cb31cc7ecf34.1774460059.git.mst@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: qrtr: fix endian handling of confirm_rx field

Convert confirm_rx to little endian when enqueueing and convert it back on
receive. This fixes control flow on big endian hosts, little endian is
unaffected.

On transmit, store confirm_rx as __le32 using cpu_to_le32(). On receive,
apply le32_to_cpu() before using the value. !! ensures the value is 0 or 1
in native endianness, so the conversion isn’t strictly required here, but
it is kept for consistency and clarity.

Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Alexander Wilhelm <alexander.wilhelm@westermo.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix inconsistent indenting warning

Suppress such warning reported by test robot:
include/net/tcp.h:1449 tcp_ca_event() warn: inconsistent indenting

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202603251430.gQ3VuiKV-lkp@intel.com/
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260325071854.805-1-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-stmmac-remove-unused-and-unimplemented-axi-properties'

Russell King says:

====================
net: stmmac: remove unused and unimplemented AXI properties

commit afea03656add ("stmmac: rework DMA bus setting and introduce new
platform AXI structure") added support for parsing all the stmmac AXI
attributes, and added code to set most of the appropriate register bits
with three exceptions:

snps,kbbe
snps,mb
snps,rb

These were parsed by the driver, but the result of parsing was never
used by any of the cores.

Moreover, no DTS in the kernel makes use of these properties.

Thus, it doesn't make sense for the driver to parse these, so let's
remove them. Also remove them from the DT binding document.
====================

Link: https://patch.msgid.link/acJh4z3pRKkeaFbR@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: remove unimplemented AXI snps,kbbe snps,mb and snps,rb

Remove the AXI snps,kbbe snps,mb and snps,rb properties as they have
not been used, and although the driver parses these, the code hasn't
ever used the parsed result. This parsing has now been removed.

These were introduced by commit afea03656add ("stmmac: rework DMA bus
setting and introduce new platform AXI structure").

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/E1w4ydt-0000000Dlph-3WvI@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: remove axi_kbbe, axi_mb and axi_rb members

axi_kbbe, axi_mb and axi_rb are all written, but nothing ever reads
their values. Remove the code that sets these and the struct members.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w4ydo-0000000Dlpb-34jd@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

virtio-net: enable NETIF_F_GRO_HW only if GRO-related offloads are supported

Negotiating VIRTIO_NET_F_CTRL_GUEST_OFFLOADS indicates the device
allows control over offload support, but the offloads that can be
controlled may have nothing to do with GRO (e.g., if neither GUEST_TSO4
nor GUEST_TSO6 is supported).

In such a setup, reporting NETIF_F_GRO_HW as available for the device
is too optimistic and misleading to the user.

Improve the situation by masking off NETIF_F_GRO_HW unless the device
possesses actual GRO-related offload capabilities. Out of an abundance
of caution, this does not change the current behaviour for hardware with
just v6 or just v4 GRO: current interfaces do not allow distinguishing
between v6/v4 GRO, so we can't expose them to userspace precisely.

Signed-off-by: Di Zhu <zhud@hygon.cn>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20260323041730.986351-1-zhud@hygon.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: tcp_vegas: use tcp_vegas_cwnd_event_tx_start()

While net/ipv4/tcp_yeah.c is correctly setting .cwnd_event_tx_start
to tcp_vegas_cwnd_event_tx_start(), I forgot to do the same in tcp_vegas.c

Fixes: d1e59a469737 ("tcp: add cwnd_event_tx_start to tcp_congestion_ops")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260325212440.4146579-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

virtio_net: sync RX buffer before reading the header

receive_buf() reads the virtio header through buf before
page_pool_dma_sync_for_cpu() runs in receive_small() or
receive_mergeable(). The header buffer is thus unsynchronized at the
point where flags and, for mergeable buffers, num_buffers are consumed.

Omar Elghoul reported that on s390x Secure Execution this showed up as
greatly reduced virtio-net performance together with "bad gso" and
"bad csum" messages in dmesg. This is because with SE sync actually
copies data, so the header is uninitialized.

Move the sync into receive_buf() so the
header is synchronized before any access through buf.

Tool use: Cursor with GPT-5.4 drafted the initial code move from prompt:
"in drivers/net/virtio_net.c, move page_pool_dma_sync_for_cpu on receive
path to before memory is accessed through buf".
The result and the commit log were reviewed and edited manually.

Fixes: 24fbd3967f3f ("virtio_net: add page_pool support for buffer allocation")
Reported-by: Omar Elghoul <oelghoul@linux.ibm.com>
Tested-by: Srikanth Aithal <sraithal@amd.com>
Tested-by: Omar Elghoul <oelghoul@linux.ibm.com>
Link: https://lore.kernel.org/r/20260323150136.14452-1-oelghoul@linux.ibm.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Vishwanath Seshagiri <vishs@meta.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/f4caa9be9e5addae7851c012cab0a733be7f0974.1774365273.git.mst@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-dsa-microchip-small-cleanups-for-ksz9477-sgmii'

Maxime Chevallier says:

====================
net: dsa: microchip: Small cleanups for ksz9477 SGMII

While working with ksz9477, I've done some very minor cleanups around the
PCS code for the SGMII port. No changes intended.
====================

Link: https://patch.msgid.link/20260324180826.524327-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: microchip: drop an outdated comment about SGMII support

SGMII support has been added to ksz9477, we can drop the comment saying
that it'll be added later.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/20260324180826.524327-3-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: microchip: Drop unnecessary check in ksz9477 PCS setup

The ksz_dev_ops .pcs_create() is called under the assumption that the
switch has a PCS port :

  if (ksz_has_sgmii_port(dev) && dev->dev_ops->pcs_create) {
          ret = dev->dev_ops->pcs_create(dev);
  [...]
  }

The KSZ9477 implementation of .pcs_create() does the same check on
ksz_has_sgmii_port(), and protects the entire function with it.

Drop it, saving a level of indentation and increasing readability.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/20260324180826.524327-2-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: Remove unnecessary backslashes in fq_band_pktlimit.sh

Address "grep: warning: stray \ before white space" warning from GNU
grep 3.12. This warns the misplaced backslashes before whitespaces
(e.g. \\' ' or '\ ') which leads to unspecified behavior [1].

We can just remove the backslashes before whitespaces as POSIX says:

Enclosing characters in single-quotes ('') shall preserve the literal
value of each character within the single-quotes.

and bourne-compatible shells behave so.

[1]: https://lists.gnu.org/r/bug-gnulib/2022-05/msg00057.html

Signed-off-by: Yohei Kojima <yk@y-koj.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/dd0bbd48cdf468da56ec34fd61cecd4d2111d7ba.1774372510.git.yk@y-koj.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mctp: avoid copy in fragmentation loop for near-MTU messages

Currently, we incorrectly send messages that are within 4 bytes (a
struct mctp_hdr) smaller than the MTU through mctp_do_fragment_route().
This has no effect on the actual fragmentation, as we will still send as
one packet, but unnecessarily copies the original skb into a new
single-fragment skb.

Instead of having the MTU comparisons in both mctp_local_output() and
mctp_do_fragment_route(), feed all local messages through the latter,
and add the single-packet optimisation there.

This means we can coalesce the routing path of mctp_local_output, so our
out_release path is now solely for errors, so rename the label
accordingly.

Include a check in the route tests for the single-packet case too.

Reported-by: yuanzhaoming <yuanzm2@lenovo.com>
Closes: https://github.com/openbmc/linux/commit/269936db5eb3962fe290b1dc4dbf1859cd5a04dd#r175836230
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260324-dev-mtu-copy-v1-1-7af6bd7027d3@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'add-support-for-per-route-seg6-tunsrc'

Justin Iurman says:

====================
Add support for per-route seg6 tunsrc

This series adds support for the new per-route seg6 "tunsrc" parameter.
Selftests are extended to make sure it works as expected.

Example with the iproute2-next companion patch:

ip -6 r a 2001:db8:1::/64 encap seg6 mode encap tunsrc 2001:db8:ab::
segs 2001:db8:42::1,2001:db8:ffff::2 dev eth0
====================

Link: https://patch.msgid.link/20260324091434.359341-1-justin.iurman@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: add check for seg6 tunsrc

Extend srv6_hencap_red_l3vpn_test.sh to include checks for the new
"tunsrc" feature. If there is no support for tunsrc, it silently
falls back to the encap config without tunsrc.

Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Justin Iurman <justin.iurman@6wind.com>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20260324091434.359341-3-justin.iurman@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

seg6: add per-route tunnel source address

Add SEG6_IPTUNNEL_SRC in the uapi for users to configure a specific
tunnel source address. Make seg6_iptunnel handle the new attribute
correctly. It has priority over the configured per-netns tunnel
source address, if any.

Cc: David Ahern <dsahern@kernel.org>
Signed-off-by: Justin Iurman <justin.iurman@6wind.com>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20260324091434.359341-2-justin.iurman@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs/mlx5: Fix typo subfuction

Fix two typos:
- 'Subfunctons' -> 'Subfunctions'
- 'subfuction' -> 'subfunction'

Reviewed-by: Joe Damato <joe@dama.to>
Signed-off-by: Ryohei Kinugawa <ryohei.kinugawa@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20260324053416.70166-1-ryohei.kinugawa@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phylink: use phylink_expects_phy() in phylink_fwnode_phy_connect()

The tests in phylink_expects_phy() and phylink_fwnode_phy_connect() are
identical (by intention). Use phylink_expects_phy() to decide whether
to ignore a call to phylink_fwnode_phy_connect().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1w4zHg-0000000DmC4-2oyb@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-stmmac-dwmac-socfpga-cleanup-fix_mac_speed'

Maxime Chevallier says:

====================
net: stmmac: dwmac-socfpga: Cleanup .fix_mac_speed

This small series does a bit of cleanup in the dwmad-socfpga glue
driver, especially around the .fix_mac_speed() operation.

It's mostly about re-using existing helpers from the glue driver, as
well as reorganizing the code to make the local private structures a
little bit smaller.

No functionnal changes are intended, this was tested on cyclone V.
====================

Link: https://patch.msgid.link/20260324092102.687082-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac-sofcpga: Drop the struct device reference

We keep a reference to our the struct device in the socfpga_dwmac priv
structure, but now it's only ever used to produce logs in the
.set_phy_mode() ops, that are specific to this driver.

When we call that ops, we always have a ref to the struct device around,
so let's pass it to .set_phy_mode(). We can now discard that reference
from struct socfpga_dwmac.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260324092102.687082-6-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac-socfpga: get the phy_mode with the dedicated helper

We enable/disable the sgmii_adapter in the .fix_mac_speed() ops based on
the phy_mode used in the plat_data. We currently get it with :

socfpga_dwmac
  ->dev
    ->drv_data
      ->netdev
        ->priv
  ->stmmac_priv
    ->plat
      ->phy_interface

where we can get it with :

socfpga_dwmac
  ->plat_data
    ->phy_interface (done by socfpga_get_plat_phymode)

Use that helper here.

Note that we are also being passed a phy_interface_t from the
.fix_mac_speed() callback, provided by phylink.

We can handle that in the future when dynamic interface selection is
supported. We'd need to guarantee that we have a Lynx PCS to handle it.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260324092102.687082-5-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac-socfpga: Use the correct type for interface modes

The internal helper socfpga_get_plat_phymode() returns an int where we
actually return a PHY_INTERFACE_MODE_xxx, use the correct type for this.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260324092102.687082-4-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac-socfpga: Use the socfpga_sgmii_config() helper

Use the existing socfpga_sgmii_config() helper in
socfpga_dwmac_fix_mac_speed(), instead of re-coding the register access.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260324092102.687082-3-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac-socfpga: Move internal helpers

This is preparatory work to allow reusing the SGMII configuration helper
and the wrapper to get the interface in the fix_mac_speed() callback.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260324092102.687082-2-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'wireless-next-2026-03-26' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
A fairly big set of changes all over, notably with:
- cfg80211: new APIs for NAN (Neighbor Aware Networking,
   aka Wi-Fi Aware) so less work must be in firmware
- mt76:
   - mt7996/mt7925 MLO fixes/improvements
   - mt7996 NPU support (HW eth/wifi traffic offload)
- iwlwifi: UNII-9 and continuing UHR work

* tag 'wireless-next-2026-03-26' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (230 commits)
  wifi: mac80211: ignore reserved bits in reconfiguration status
  wifi: cfg80211: allow protected action frame TX for NAN
  wifi: ieee80211: Add some missing NAN definitions
  wifi: nl80211: Add a notification to notify NAN channel evacuation
  wifi: nl80211: add NL80211_CMD_NAN_ULW_UPDATE notification
  wifi: nl80211: allow reporting spurious NAN Data frames
  wifi: cfg80211: allow ToDS=0/FromDS=0 data frames on NAN data interfaces
  wifi: nl80211: define an API for configuring the NAN peer's schedule
  wifi: nl80211: add support for NAN stations
  wifi: cfg80211: separately store HT, VHT and HE capabilities for NAN
  wifi: cfg80211: add support for NAN data interface
  wifi: cfg80211: make sure NAN chandefs are valid
  wifi: cfg80211: Add an API to configure local NAN schedule
  wifi: mac80211: cleanup error path of ieee80211_do_open
  wifi: mac80211: extract channel logic from link logic
  wifi: iwlwifi: mld: set RX_FLAG_RADIOTAP_TLV_AT_END generically
  wifi: iwlwifi: reduce the number of prints upon firmware crash
  wifi: iwlwifi: fix the description of SESSION_PROTECTION_CMD
  wifi: iwlwifi: mld: introduce iwl_mld_vif_fw_id_valid
  wifi: iwlwifi: mld: block EMLSR during TDLS connections
  ...
====================

Link: https://patch.msgid.link/20260326152021.305959-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'add-support-for-nuvoton-ma35d1-gmac'

Joey Lu says:

====================
Add support for Nuvoton MA35D1 GMAC

This patch series is submitted to add GMAC support for Nuvoton MA35D1
SoC platform. This work involves implementing a GMAC driver glue layer
based on Synopsys DWMAC driver framework to leverage MA35D1's dual GMAC
interface capabilities.

Overview:
  1. Added a GMAC driver glue layer for MA35D1 SoC, providing support for
  the platform's two GMAC interfaces.
  2. Added device tree settings, with specific configurations for our
  development boards:
    a. SOM board: Configured for two RGMII interfaces.
    b. IoT board: Configured with one RGMII and one RMII interface.
  3. Added dt-bindings for the GMAC interfaces.
====================

Link: https://patch.msgid.link/20260323101756.81849-1-a0987203069@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac-nuvoton: Add dwmac glue for Nuvoton MA35 family

Add support for Gigabit Ethernet on Nuvoton MA35 series using dwmac driver.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Joey Lu <a0987203069@gmail.com>
Link: https://patch.msgid.link/20260323101756.81849-4-a0987203069@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: nuvoton: Add schema for Nuvoton MA35 family GMAC

Create initial schema for Nuvoton MA35 family Gigabit MAC.

Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Joey Lu <a0987203069@gmail.com>
Link: https://patch.msgid.link/20260323101756.81849-2-a0987203069@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-7.0-rc6).

No conflicts, or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from Bluetooth, CAN, IPsec and Netfilter.

  Notably, this includes the fix for the Bluetooth regression that you
  were notified about. I'm not aware of any other pending regressions.

  Current release - regressions:

    - bluetooth:
       - fix stack-out-of-bounds read in l2cap_ecred_conn_req
       - fix regressions caused by reusing ident

    - netfilter: revisit array resize logic

    - eth: ice: set max queues in alloc_etherdev_mqs()

  Previous releases - regressions:

    - core: correctly handle tunneled traffic on IPV6_CSUM GSO fallback

    - bluetooth:
       - fix dangling pointer on mgmt_add_adv_patterns_monitor_complete
       - fix deadlock in l2cap_conn_del()

    - sched: codel: fix stale state for empty flows in fq_codel

    - ipv6: remove permanent routes from tb6_gc_hlist when all exceptions expire.

    - xfrm: fix skb_put() panic on non-linear skb during reassembly

    - openvswitch:
       - avoid releasing netdev before teardown completes
       - validate MPLS set/set_masked payload length

    - eth: iavf: fix out-of-bounds writes in iavf_get_ethtool_stats()

  Previous releases - always broken:

    - bluetooth: fix null-ptr-deref on l2cap_sock_ready_cb

    - udp: fix wildcard bind conflict check when using hash2

    - netfilter: fix use of uninitialized rtp_addr in process_sdp

    - tls: Purge async_hold in tls_decrypt_async_wait()

    - xfrm:
       - prevent policy_hthresh.work from racing with netns teardown
       - fix skb leak with espintcp and async crypto

    - smc: fix double-free of smc_spd_priv when tee() duplicates splice pipe buffer

    - can:
       - add missing error handling to call can_ctrlmode_changelink()
       - fix OOB heap access in cgw_csum_crc8_rel()

    - eth:
       - mana: fix use-after-free in add_adev() error path
       - virtio-net: fix for VIRTIO_NET_F_GUEST_HDRLEN
       - bcmasp: fix double free of WoL irq"

* tag 'net-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (90 commits)
  net: macb: use the current queue number for stats
  netfilter: ctnetlink: use netlink policy range checks
  netfilter: nf_conntrack_sip: fix use of uninitialized rtp_addr in process_sdp
  netfilter: nf_conntrack_expect: skip expectations in other netns via proc
  netfilter: nf_conntrack_expect: store netns and zone in expectation
  netfilter: ctnetlink: ensure safe access to master conntrack
  netfilter: nf_conntrack_expect: use expect->helper
  netfilter: nf_conntrack_expect: honor expectation helper field
  netfilter: nft_set_rbtree: revisit array resize logic
  netfilter: ip6t_rt: reject oversized addrnr in rt_mt6_check()
  netfilter: nfnetlink_log: fix uninitialized padding leak in NFULA_PAYLOAD
  tls: Purge async_hold in tls_decrypt_async_wait()
  selftests: netfilter: nft_concat_range.sh: add check for flush+reload bug
  netfilter: nft_set_pipapo_avx2: don't return non-matching entry on expiry
  Bluetooth: btusb: clamp SCO altsetting table indices
  Bluetooth: L2CAP: Fix ERTM re-init and zero pdu_len infinite loop
  Bluetooth: L2CAP: Fix deadlock in l2cap_conn_del()
  Bluetooth: btintel: serialize btintel_hw_error() with hci_req_sync_lock
  Bluetooth: L2CAP: Fix send LE flow credits in ACL link
  net: mana: fix use-after-free in add_adev() error path
  ...

Merge tag 'pinctrl-v7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:

- Implement .get_direction() in the spmi-gpio gpio_chip

   Recent changes makes this start to print warnings and it's not nice,
   let's just fix it

- Clamp the return value of gpio_get() in the Renesas RZA1 driver

- Add the GPIO_GENERIC dependency to the STM32 HDP driver

- Modify the Mediatek driver to accept devices that do not use external
   interrupts (EINT) at all

- Fix flag propagation in the Sunxi driver, so that we can fix an issue
   with uninitialized pins in a follow-up patch using said flags

* tag 'pinctrl-v7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: sunxi: fix gpiochip_lock_as_irq() failure when pinmux is unknown
  pinctrl: sunxi: pass down flags to pinctrl routines
  pinctrl: mediatek: common: Fix probe failure for devices without EINT
  pinctrl: stm32: fix HDP driver dependency on GPIO_GENERIC
  pinctrl: renesas: rza1: Normalize return value of gpio_get()
  pinctrl: qcom: spmi-gpio: implement .get_direction()
  pinctrl: renesas: rzt2h: Fix invalid wait context
  pinctrl: renesas: rzt2h: Fix device node leak in rzt2h_gpio_register()

Merge tag 'dma-mapping-7.0-2026-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux

Pull dma-mapping fixes from Marek Szyprowski:
"A set of fixes for DMA-mapping subsystem, which resolve false-
  positive warnings from KMSAN and DMA-API debug (Shigeru Yoshida
  and Leon Romanovsky) as well as a simple build fix (Miguel Ojeda)"

* tag 'dma-mapping-7.0-2026-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma-mapping: add missing `inline` for `dma_free_attrs`
  mm/hmm: Indicate that HMM requires DMA coherency
  RDMA/umem: Tell DMA mapping that UMEM requires coherency
  iommu/dma: add support for DMA_ATTR_REQUIRE_COHERENT attribute
  dma-direct: prevent SWIOTLB path when DMA_ATTR_REQUIRE_COHERENT is set
  dma-mapping: Introduce DMA require coherency attribute
  dma-mapping: Clarify valid conditions for CPU cache line overlap
  dma-mapping: handle DMA_ATTR_CPU_CACHE_CLEAN in trace output
  dma-debug: Allow multiple invocations of overlapping entries
  dma: swiotlb: add KMSAN annotations to swiotlb_bounce()

Merge tag 'nf-26-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter for net

This is v3, I kept back an ipset fix and another to tigthen the xtables
interface to reject invalid combinations with the NFPROTO_ARP family.
They need a bit more discussion. I fixed the issues reported by AI on
patch 9 (add #ifdef to access ct zone, update nf_conntrack_broadcast
and patch 10 (use better Fixes: tag). Thanks!

The following patchset contains Netfilter fixes for *net*.

Note that most bugs fixed here stem from 2.6 days, the large PR is not
due to an increase in regressions.

1) Fix incorrect reject of set updates with nf_tables pipapo set
   avx2 backend.  This comes with a regression test in patch 2.
   From Florian Westphal.

2) nfnetlink_log needs to zero padding to prevent infoleak to userspace,
   from Weiming Shi.

3) xtables ip6t_rt module never validated that addrnr length is within the
   allowed array boundary. Reject bogus values.  From Ren Wei.

4) Fix high memory usage in rbtree set backend that was unwanted side-effect
   of the recently added binary search blob. From Pablo Neira Ayuso.

5) Patches 5 to 10, also from Pablo, address long-standing RCU safety bugs
   in conntracks handling of expectations: We can never safely defer
   a conntrack extension area without holding a reference. Yet expectation
   handling does so in multiple places.  Fix this by avoiding the need to
   look into the master conntrack to begin with and by extending locked
   sections in a few places.

11) Fix use of uninitialized rtp_addr in the sip conntrack helper,
    also from Weiming Shi.

12) Add stricter netlink policy checks in ctnetlink, from David Carlier.
    This avoids undefined behaviour when userspace provides huge wscale
    value.

netfilter pull request 26-03-26

* tag 'nf-26-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: ctnetlink: use netlink policy range checks
  netfilter: nf_conntrack_sip: fix use of uninitialized rtp_addr in process_sdp
  netfilter: nf_conntrack_expect: skip expectations in other netns via proc
  netfilter: nf_conntrack_expect: store netns and zone in expectation
  netfilter: ctnetlink: ensure safe access to master conntrack
  netfilter: nf_conntrack_expect: use expect->helper
  netfilter: nf_conntrack_expect: honor expectation helper field
  netfilter: nft_set_rbtree: revisit array resize logic
  netfilter: ip6t_rt: reject oversized addrnr in rt_mt6_check()
  netfilter: nfnetlink_log: fix uninitialized padding leak in NFULA_PAYLOAD
  selftests: netfilter: nft_concat_range.sh: add check for flush+reload bug
  netfilter: nft_set_pipapo_avx2: don't return non-matching entry on expiry
====================

Link: https://patch.msgid.link/20260326125153.685915-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
For ice:
Michal corrects call to alloc_etherdev_mqs() to provide maximum number
of queues supported rather than currently allocated number of queues.

Petr Oros fixes issues related to some ethtool operations in switchdev
mode.

For iavf:
Kohei Enju corrects number of reported queues for ethtool statistics to
absolute max as using current number could race and cause out-of-bounds
issues.

For idpf:
Josh NULLs cdev_info pointer after freeing to prevent possible subsequent
improper access. He also defers setting of refillqs value until after
allocation to prevent possible NULL pointer dereference.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  idpf: only assign num refillqs if allocation was successful
  idpf: clear stale cdev_info ptr
  iavf: fix out-of-bounds writes in iavf_get_ethtool_stats()
  ice: use ice_update_eth_stats() for representor stats
  ice: fix inverted ready check for VF representors
  ice: set max queues in alloc_etherdev_mqs()
====================

Link: https://patch.msgid.link/20260323205843.624704-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mana: Set default number of queues to 16

Set the default number of queues per vPort to MANA_DEF_NUM_QUEUES (16),
as 16 queues can achieve optimal throughput for typical workloads. The
actual number of queues may be lower if it exceeds the hardware reported
limit. Users can increase the number of queues up to max_queues via
ethtool if needed.

Signed-off-by: Long Li <longli@microsoft.com>
Link: https://patch.msgid.link/20260323194925.1766385-1-longli@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: macb: use the current queue number for stats

There's a potential mismatch between the memory reserved for statistics
and the amount of memory written.

gem_get_sset_count() correctly computes the number of stats based on the
active queues, whereas gem_get_ethtool_stats() indiscriminately copies
data using the maximum number of queues, and in the case the number of
active queues is less than MACB_MAX_QUEUES, this results in a OOB write
as observed in the KASAN splat.

==================================================================
BUG: KASAN: vmalloc-out-of-bounds in gem_get_ethtool_stats+0x54/0x78
  [macb]
Write of size 760 at addr ffff80008080b000 by task ethtool/1027

CPU: [...]
Tainted: [E]=UNSIGNED_MODULE
Hardware name: raspberrypi rpi/rpi, BIOS 2025.10 10/01/2025
Call trace:
show_stack+0x20/0x38 (C)
dump_stack_lvl+0x80/0xf8
print_report+0x384/0x5e0
kasan_report+0xa0/0xf0
kasan_check_range+0xe8/0x190
__asan_memcpy+0x54/0x98
gem_get_ethtool_stats+0x54/0x78 [macb
   926c13f3af83b0c6fe64badb21ec87d5e93fcf65]
dev_ethtool+0x1220/0x38c0
dev_ioctl+0x4ac/0xca8
sock_do_ioctl+0x170/0x1d8
sock_ioctl+0x484/0x5d8
__arm64_sys_ioctl+0x12c/0x1b8
invoke_syscall+0xd4/0x258
el0_svc_common.constprop.0+0xb4/0x240
do_el0_svc+0x48/0x68
el0_svc+0x40/0xf8
el0t_64_sync_handler+0xa0/0xe8
el0t_64_sync+0x1b0/0x1b8

The buggy address belongs to a 1-page vmalloc region starting at
  0xffff80008080b000 allocated at dev_ethtool+0x11f0/0x38c0
The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000
  index:0xffff00000a333000 pfn:0xa333
flags: 0x7fffc000000000(node=0|zone=0|lastcpupid=0x1ffff)
raw: 007fffc000000000 0000000000000000 dead000000000122 0000000000000000
raw: ffff00000a333000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff80008080b080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff80008080b100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff80008080b180: 00 00 00 00 00 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
                                  ^
ffff80008080b200: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
ffff80008080b280: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
==================================================================

Fix it by making sure the copied size only considers the active number of
queues.

Fixes: 512286bbd4b7 ("net: macb: Added some queue statistics")
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Link: https://patch.msgid.link/20260323191634.2185840-1-pvalerio@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'for-net-2026-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

- L2CAP: Fix deadlock in l2cap_conn_del()
- L2CAP: Fix ERTM re-init and zero pdu_len infinite loop
- L2CAP: Fix send LE flow credits in ACL link
- btintel: serialize btintel_hw_error() with hci_req_sync_lock
- btusb: clamp SCO altsetting table indices

* tag 'for-net-2026-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: btusb: clamp SCO altsetting table indices
  Bluetooth: L2CAP: Fix ERTM re-init and zero pdu_len infinite loop
  Bluetooth: L2CAP: Fix deadlock in l2cap_conn_del()
  Bluetooth: btintel: serialize btintel_hw_error() with hci_req_sync_lock
  Bluetooth: L2CAP: Fix send LE flow credits in ACL link
====================

Link: https://patch.msgid.link/20260325194358.618892-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

netfilter: ctnetlink: use netlink policy range checks

Replace manual range and mask validations with netlink policy
annotations in ctnetlink code paths, so that the netlink core rejects
invalid values early and can generate extack errors.

- CTA_PROTOINFO_TCP_STATE: reject values > TCP_CONNTRACK_SYN_SENT2 at
  policy level, removing the manual >= TCP_CONNTRACK_MAX check.
- CTA_PROTOINFO_TCP_WSCALE_ORIGINAL/REPLY: reject values > TCP_MAX_WSCALE
  (14). The normal TCP option parsing path already clamps to this value,
  but the ctnetlink path accepted 0-255, causing undefined behavior when
  used as a u32 shift count.
- CTA_FILTER_ORIG_FLAGS/REPLY_FLAGS: use NLA_POLICY_MASK with
  CTA_FILTER_F_ALL, removing the manual mask checks.
- CTA_EXPECT_FLAGS: use NLA_POLICY_MASK with NF_CT_EXPECT_MASK, adding
  a new mask define grouping all valid expect flags.

Extracted from a broader nf-next patch by Florian Westphal, scoped to
ctnetlink for the fixes tree.

Fixes: c8e2078cfe41 ("[NETFILTER]: ctnetlink: add support for internal tcp connection tracking flags handling")
Signed-off-by: David Carlier <devnexen@gmail.com>
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conntrack_sip: fix use of uninitialized rtp_addr in process_sdp

process_sdp() declares union nf_inet_addr rtp_addr on the stack and
passes it to the nf_nat_sip sdp_session hook after walking the SDP
media descriptions. However rtp_addr is only initialized inside the
media loop when a recognized media type with a non-zero port is found.

If the SDP body contains no m= lines, only inactive media sections
(m=audio 0 ...) or only unrecognized media types, rtp_addr is never
assigned. Despite that, the function still calls hooks->sdp_session()
with &rtp_addr, causing nf_nat_sdp_session() to format the stale stack
value as an IP address and rewrite the SDP session owner and connection
lines with it.

With CONFIG_INIT_STACK_ALL_ZERO (default on most distributions) this
results in the session-level o= and c= addresses being rewritten to
0.0.0.0 for inactive SDP sessions. Without stack auto-init the
rewritten address is whatever happened to be on the stack.

Fix this by pre-initializing rtp_addr from the session-level connection
address (caddr) when available, and tracking via a have_rtp_addr flag
whether any valid address was established. Skip the sdp_session hook
entirely when no valid address exists.

Fixes: 4ab9e64e5e3c ("[NETFILTER]: nf_nat_sip: split up SDP mangling")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conntrack_expect: skip expectations in other netns via proc

Skip expectations that do not reside in this netns.

Similar to e77e6ff502ea ("netfilter: conntrack: do not dump other netns's
conntrack entries via proc").

Fixes: 9b03f38d0487 ("netfilter: netns nf_conntrack: per-netns expectations")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conntrack_expect: store netns and zone in expectation

__nf_ct_expect_find() and nf_ct_expect_find_get() are called under
rcu_read_lock() but they dereference the master conntrack via
exp->master.

Since the expectation does not hold a reference on the master conntrack,
this could be dying conntrack or different recycled conntrack than the
real master due to SLAB_TYPESAFE_RCU.

Store the netns, the master_tuple and the zone in struct
nf_conntrack_expect as a safety measure.

This patch is required by the follow up fix not to dump expectations
that do not belong to this netns.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ctnetlink: ensure safe access to master conntrack

Holding reference on the expectation is not sufficient, the master
conntrack object can just go away, making exp->master invalid.

To access exp->master safely:

- Grab the nf_conntrack_expect_lock, this gets serialized with
  clean_from_lists() which also holds this lock when the master
  conntrack goes away.

- Hold reference on master conntrack via nf_conntrack_find_get().
  Not so easy since the master tuple to look up for the master conntrack
  is not available in the existing problematic paths.

This patch goes for extending the nf_conntrack_expect_lock section
to address this issue for simplicity, in the cases that are described
below this is just slightly extending the lock section.

The add expectation command already holds a reference to the master
conntrack from ctnetlink_create_expect().

However, the delete expectation command needs to grab the spinlock
before looking up for the expectation. Expand the existing spinlock
section to address this to cover the expectation lookup. Note that,
the nf_ct_expect_iterate_net() calls already grabs the spinlock while
iterating over the expectation table, which is correct.

The get expectation command needs to grab the spinlock to ensure master
conntrack does not go away. This also expands the existing spinlock
section to cover the expectation lookup too. I needed to move the
netlink skb allocation out of the spinlock to keep it GFP_KERNEL.

For the expectation events, the IPEXP_DESTROY event is already delivered
under the spinlock, just move the delivery of IPEXP_NEW under the
spinlock too because the master conntrack event cache is reached through
exp->master.

While at it, add lockdep notations to help identify what codepaths need
to grab the spinlock.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conntrack_expect: use expect->helper

Use expect->helper in ctnetlink and /proc to dump the helper name.
Using nfct_help() without holding a reference to the master conntrack
is unsafe.

Use exp->master->helper in ctnetlink path if userspace does not provide
an explicit helper when creating an expectation to retain the existing
behaviour. The ctnetlink expectation path holds the reference on the
master conntrack and nf_conntrack_expect lock and the nfnetlink glue
path refers to the master ct that is attached to the skb.

Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conntrack_expect: honor expectation helper field

The expectation helper field is mostly unused. As a result, the
netfilter codebase relies on accessing the helper through exp->master.

Always set on the expectation helper field so it can be used to reach
the helper.

nf_ct_expect_init() is called from packet path where the skb owns
the ct object, therefore accessing exp->master for the newly created
expectation is safe. This saves a lot of updates in all callsites
to pass the ct object as parameter to nf_ct_expect_init().

This is a preparation patches for follow up fixes.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_set_rbtree: revisit array resize logic

Chris Arges reports high memory consumption with thousands of
containers, this patch revisits the array allocation logic.

For anonymous sets, start by 16 slots (which takes 256 bytes on x86_64).
Expand it by x2 until threshold of 512 slots is reached, over that
threshold, expand it by x1.5.

For non-anonymous set, start by 1024 slots in the array (which takes 16
Kbytes initially on x86_64). Expand it by x1.5.

Use set->ndeact to subtract deactivated elements when calculating the
number of the slots in the array, otherwise the array size array gets
increased artifically. Add special case shrink logic to deal with flush
set too.

The shrink logic is skipped by anonymous sets.

Use check_add_overflow() to calculate the new array size.

Add a WARN_ON_ONCE check to make sure elements fit into the new array
size.

Reported-by: Chris Arges <carges@cloudflare.com>
Fixes: 7e43e0a1141d ("netfilter: nft_set_rbtree: translate rbtree to array for binary search")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ip6t_rt: reject oversized addrnr in rt_mt6_check()

Reject rt match rules whose addrnr exceeds IP6T_RT_HOPS.

rt_mt6() expects addrnr to stay within the bounds of rtinfo->addrs[].
Validate addrnr during rule installation so malformed rules are rejected
before the match logic can use an out-of-range value.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Co-developed-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Yuan Tan <yuantan098@gmail.com>
Suggested-by: Xin Liu <bird@lzu.edu.cn>
Tested-by: Yuhang Zheng <z1652074432@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink_log: fix uninitialized padding leak in NFULA_PAYLOAD

__build_packet_message() manually constructs the NFULA_PAYLOAD netlink
attribute using skb_put() and skb_copy_bits(), bypassing the standard
nla_reserve()/nla_put() helpers. While nla_total_size(data_len) bytes
are allocated (including NLA alignment padding), only data_len bytes
of actual packet data are copied. The trailing nla_padlen(data_len)
bytes (1-3 when data_len is not 4-byte aligned) are never initialized,
leaking stale heap contents to userspace via the NFLOG netlink socket.

Replace the manual attribute construction with nla_reserve(), which
handles the tailroom check, header setup, and padding zeroing via
__nla_reserve(). The subsequent skb_copy_bits() fills in the payload
data on top of the properly initialized attribute.

Fixes: df6fb868d611 ("[NETFILTER]: nfnetlink: convert to generic netlink attribute functions")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Merge branch 'net-dpaa2-mac-export-standard-statistics'

Ioana Ciornei says:

====================
net: dpaa2-mac: export standard statistics

This patch set adds support for standard ethtool statistics - rmon,
eth-ctrl, eth-mac and pause - to dpaa2-mac and its users dpaa2-eth and
dpaa2-switch.

The first patch extends the firmware APIs related to MAC counters and
adds dpmac_get_statistics() which can be used to retrieve multiple counter
values through a single firmware call.

This new API is put in use in the second patch by gathering all
previously exported ethtool statistics through a single MC firmware
call. In this patch we are also adding the setup and cleanup
infrastructure which will be also used for the standard ethtool
counters.

The third patch adds the actual suppord for rmon, eth-ctrl, eth-mac and
pause statistics in dpaa2-mac and its users.
====================

Link: https://patch.msgid.link/20260323115039.3932600-1-ioana.ciornei@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dpaa2-mac: export standard statistics

Take advantage of all the newly added MAC counters available through the
MC API and export them through the standard statistics structures -
rmon, eth-ctrl, eth-mac and pause.

A new version based feature is added into dpaa2-mac -
DPAA2_MAC_FEATURE_STANDARD_STATS - and based on the two memory zones
needed for gathering the MAC counters are setup for each statistics
group.

The dpmac_counter structure is extended with a new field - size_t offset
- which is being used to instruct the dpaa2_mac_transfer_stats()
function where exactly to store a counter value inside the standard
statistics structure.

The newly added support is used both in the dpaa2-eth driver as well as
the dpaa2-switch one.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20260323115039.3932600-4-ioana.ciornei@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dpaa2-mac: retrieve MAC statistics in one firmware command

The latest MC firmware version added a new command to retrieve all DPMAC
counters in a single firmware call. Use this new command, when possible,
in dpaa2-mac as well.

In order to use the dpmac_get_statistics() API, two DMA memory areas are
used: one to transmit what counters the driver is requesting and one to
receive the values of those counters. These memory areas are allocated
and DMA mapped at probe time so that we don't waste time at runtime.

And since we are planning to add rmon, eth-ctrl and other standard
statistics using the same infrastructure, make the setup and cleanup
processes as generic as possibile through the dpaa2_mac_setup_stats()
and dpaa2_mac_clear_stats() functions.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20260323115039.3932600-3-ioana.ciornei@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dpaa2-mac: extend APIs related to statistics

Extend the dpmac_counter_id enum with the newly added counters which can
be interrogated through the MC firmware. Also add the
dpmac_get_statistics() API which can be used to retrieve multiple MAC
counters through a single firmware command.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20260323115039.3932600-2-ioana.ciornei@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

octeontx2-pf: macsec: Use AES library instead of ecb(aes) skcipher

cn10k_ecb_aes_encrypt() just encrypts a single block with AES. That is
much more easily and efficiently done with the AES library than
crypto_skcipher. Use the AES library instead.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/20260321225208.64508-1-ebiggers@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

tls: Purge async_hold in tls_decrypt_async_wait()

The async_hold queue pins encrypted input skbs while
the AEAD engine references their scatterlist data. Once
tls_decrypt_async_wait() returns, every AEAD operation
has completed and the engine no longer references those
skbs, so they can be freed unconditionally.

A subsequent patch adds batch async decryption to
tls_sw_read_sock(), introducing a new call site that
must drain pending AEAD operations and release held
skbs. Move __skb_queue_purge(&ctx->async_hold) into
tls_decrypt_async_wait() so the purge is centralized
and every caller -- recvmsg's drain path, the -EBUSY
fallback in tls_do_decryption(), and the new read_sock
batch path -- releases held skbs on synchronization
without each site managing the purge independently.

This fixes a leak when tls_strp_msg_hold() fails part-way through,
after having added some cloned skbs to the async_hold
queue. tls_decrypt_sg() will then call tls_decrypt_async_wait() to
process all pending decrypts, and drop back to synchronous mode, but
tls_sw_recvmsg() only flushes the async_hold queue when one record has
been processed in "fully-async" mode, which may not be the case here.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Fixes: b8a6ff84abbc ("tls: wait for pending async decryptions if tls_strp_msg_hold fails")
Link: https://patch.msgid.link/20260324-tls-read-sock-v5-1-5408befe5774@oracle.com
[pabeni@redhat.com: added leak comment]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'erofs-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:

- Mark I/Os as failed when encountering short reads on file-backed
   mounts

- Label GFP_NOIO in the BIO completion when the completion is in the
   process context, and directly call into the decompression to avoid
   deadlocks

- Improve Kconfig descriptions to better highlight the overall efforts

- Fix .fadvise() for page cache sharing

* tag 'erofs-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix .fadvise() for page cache sharing
  erofs: update the Kconfig description
  erofs: add GFP_NOIO in the bio completion if needed
  erofs: set fileio bio failed in short read case

Merge tag 'rcu-fixes.v7.0-20260325a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU fixes from Boqun Feng:
"Fix a regression introduced by commit c27cea4416a3 ("rcu: Re-implement
  RCU Tasks Trace in terms of SRCU-fast"): BPF contexts can run with
  preemption disabled or scheduler locks held, so call_srcu() must work
  in all such contexts.

  Fix this by converting SRCU's spinlocks to raw spinlocks and avoiding
  scheduler lock acquisition in call_srcu() by deferring to an irq_work
  (similar to call_rcu_tasks_generic()), for both tree SRCU and tiny
  SRCU.

  Also fix a follow-on lockdep splat caused by srcu_node allocation
  under the newly introduced raw spinlock by deferring the allocation to
  grace-period worker context"

* tag 'rcu-fixes.v7.0-20260325a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux:
  srcu: Use irq_work to start GP in tiny SRCU
  rcu: Use an intermediate irq_work to start process_srcu()
  srcu: Push srcu_node allocation to GP when non-preemptible
  srcu: Use raw spinlocks so call_srcu() can be used under preempt_disable()

Merge tag 'hardening-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening fixes from Kees Cook:

- fix required Clang version for CC_HAS_COUNTED_BY_PTR (Nathan
   Chancellor)

- update Coccinelle script used for kmalloc_obj

* tag 'hardening-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  init/Kconfig: Require a release version of clang-22 for CC_HAS_COUNTED_BY_PTR
  coccinelle: kmalloc_obj: Remove default GFP_KERNEL arg

Merge tag 'platform-drivers-x86-v7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:
"Fixes and New HW Support. The trivial drop of unused gz_chain_head is
  not exactly fixes material but it allows other work to avoid problems
  so I decided to take it in along with the fixes.

   - amd/hsmp: Fix typo in error message

   - asus-armoury: Add support for G614FP, GA503QM, GZ302EAC, and GZ302EAC

   - asus-nb-wmi: Add DMI quirk for ASUS ROG Flow Z13-KJP GZ302EAC

   - hp-wmi: Support for Omen 16-k0xxx, 16-wf1xxx, 16-xf0xxx

   - intel-hid: Disable wakeup_mode during hibernation

   - ISST:
      - Check HWP support before MSR access
      - Correct locked bit width

   - lenovo: wmi-gamezone: Drop unused gz_chain_head

   - olpc-xo175-ec: Fix overflow error message"

* tag 'platform-drivers-x86-v7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: ISST: Correct locked bit width
  platform/x86: intel-hid: disable wakeup_mode during hibernation
  platform/x86: asus-armoury: add support for GZ302EA and GZ302EAC
  platform/x86: asus-nb-wmi: add DMI quirk for ASUS ROG Flow Z13-KJP GZ302EAC
  platform/x86/amd/hsmp: Fix typo in error message
  platform/olpc: olpc-xo175-ec: Fix overflow error message to print inlen
  platform/x86: lenovo: wmi-gamezone: Drop gz_chain_head
  platform/x86: ISST: Check HWP support before MSR access
  platform/x86: hp-wmi: Add support for Omen 16-k0xxx (8A4D)
  platform/x86: hp-wmi: Add support for Omen 16-wf1xxx (8C76)
  platform/x86: hp-wmi: Add Omen 16-xf0xxx (8BCA) support
  platform/x86: asus-armoury: add support for G614FP
  platform/x86: asus-armoury: add support for GA503QM
  MAINTAINERS: change email address of Denis Benato

selftests: netfilter: nft_concat_range.sh: add check for flush+reload bug

This test will fail without
the preceding commit ("netfilter: nft_set_pipapo_avx2: fix match retart if found element is expired"):

reject overlapping range on add 0s [ OK ]
reload with flush /dev/stdin:59:32-52: Error: Could not process rule: File exists
add element inet filter test { 10.0.0.29 . 10.0.2.29 }

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_set_pipapo_avx2: don't return non-matching entry on expiry

New test case fails unexpectedly when avx2 matching functions are used.

The test first loads a ranomly generated pipapo set
with 'ipv4 . port' key, i.e.  nft -f foo.

This works.  Then, it reloads the set after a flush:
(echo flush set t s; cat foo) | nft -f -

This is expected to work, because its the same set after all and it was
already loaded once.

But with avx2, this fails: nft reports a clashing element.

The reported clash is of following form:

    We successfully re-inserted
      a . b
      c . d

Then we try to insert a . d

avx2 finds the already existing a . d, which (due to 'flush set') is marked
as invalid in the new generation.  It skips the element and moves to next.

Due to incorrect masking, the skip-step finds the next matching
element *only considering the first field*,

i.e. we return the already reinserted "a . b", even though the
last field is different and the entry should not have been matched.

No such error is reported for the generic c implementation (no avx2) or when
the last field has to use the 'nft_pipapo_avx2_lookup_slow' fallback.

Bisection points to
7711f4bb4b36 ("netfilter: nft_set_pipapo: fix range overlap detection")
but that fix merely uncovers this bug.

Before this commit, the wrong element is returned, but erronously
reported as a full, identical duplicate.

The root-cause is too early return in the avx2 match functions.
When we process the last field, we should continue to process data
until the entire input size has been consumed to make sure no stale
bits remain in the map.

Link: https://lore.kernel.org/netfilter-devel/20260321152506.037f68c0@elisabeth/
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

wifi: mac80211: ignore reserved bits in reconfiguration status

The Link ID Info field in the Reconfiguration Status Duple subfield of
the Reconfiguration Response frame only uses the lower four bits for the
link ID. The upper bits are reserved and should therefore be ignored.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260325215404.ab5ccf4bc62e.I9aef8f4fb6f1b06671bb6cf0e2bd4ec6e4c8bda4@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: cfg80211: allow protected action frame TX for NAN

Allow transmitting protected dual of public action frames on
NAN device and NAN data interfaces, since NAN action frames
may be protected and can be sent on both.

Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260318143604.73801a92180c.I16000c3e1e2bbc320457db1ac728d789bb2f36c6@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: ieee80211: Add some missing NAN definitions

Add some missing NAN Device capabilities definitions.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260318143604.5f6b36d2b208.I7ef571682d5add96eabfcf87f81285893021e851@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: nl80211: Add a notification to notify NAN channel evacuation

If all available channel resources are used for NAN channels, and one of
them is shared with another interface, and that interface needs to move
to a different channel (for example STA interface that needs to do a
channel or a link switch), then the driver can evacuate one of the NAN
channels (i.e. detach it from its channel resource and announce to the
peers that this channel is ULWed). In that case, the driver needs to
notify user space about the channel evacuation, so the user space can
adjust the local schedule accordingly.

Add a notification to let userspace know about it.

Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260219114327.d5bebfd5ff73.Iaaf5ef17e1ab7a38c19d60558e68fcf517e2b400@changeid
Link: https://patch.msgid.link/20260318123926.206536-11-miriam.rachel.korenblit@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: nl80211: add NL80211_CMD_NAN_ULW_UPDATE notification

Add a new notification command that allows drivers to notify user space
when the device's ULW (Unaligned Schedule) blob has been updated. This
enables user space to attach the updated ULW blob to frames sent to NAN
peers.

Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260219114327.32b715af4ebb.Ibdb6e33941afd94abf77245245f87e4338d729d3@changeid
Link: https://patch.msgid.link/20260318123926.206536-10-miriam.rachel.korenblit@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: nl80211: allow reporting spurious NAN Data frames

Currently we have this ability for AP and GO. But it is now needed also for
NAN_DATA mode - as per Wi-Fi Aware (TM) 4.0 specification 6.2.5:
"If a NAN Device receives a unicast NAN Data frame destined for it, but
with A1 address and A2 address that are not assigned to the NDP, it shall
discard the frame, and should send a Data Path Termination NAF to the
frame transmitter"

To allow this, change NL80211_CMD_UNEXPECTED_FRAME to support also
NAN_DATA, so drivers can report such cases and the user space can act
accordingly.

Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260108102921.5cf9f1351655.I47c98ce37843730b8b9eb8bd8e9ef62ed6c17613@changeid
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260219094725.3846371-6-miriam.rachel.korenblit@intel.com
Link: https://patch.msgid.link/20260318123926.206536-9-miriam.rachel.korenblit@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: cfg80211: allow ToDS=0/FromDS=0 data frames on NAN data interfaces

According to Wi-Fi Aware (TM) specification Table 3, data frame should
have 0 in the FromDS/ToDS fields. Don't drop received frames with 0
FromDS/ToDS if they are received on NAN_DATA interface.
While at it, fix a double indent.

Signed-off-by: Daniel Gabay <daniel.gabay@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260108102921.de5f318a790a.Id34dd69552920b579e6881ffd38fa692a491b601@changeid
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260219094725.3846371-5-miriam.rachel.korenblit@intel.com
Link: https://patch.msgid.link/20260318123926.206536-8-miriam.rachel.korenblit@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: nl80211: define an API for configuring the NAN peer's schedule

Add an NL80211 command to configure the NAN schedule of a NAN peer.
Such a schedule contains a list of NAN channels, and a mapping from each
time slots to the corresponding channel (or unscheduled).
Also contains more information about the schedule, such as sequence ID
and map ID.

Not all of the restrictions are validated in this patch. In particular,
comparison of two maps of the same peer requires storing/retrieving each
map of each peer, only for validation.
Therefore, it is the responsibilty of the driver to check that.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260219114327.5b13fa5af4f6.If0e214ff5b52c9666e985fefa3f7be0ad14d93fb@changeid
Link: https://patch.msgid.link/20260318123926.206536-7-miriam.rachel.korenblit@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: nl80211: add support for NAN stations

There are 2 types of logical links with a NAN peer:
- management (NMI), which is used for Tx/Rx of NAN management frames.
- data (NDI), which is used for Tx/Rx of data frames, or non-NAN
  management frames.

The NMI station has two roles:
- representation of the NAN peer - for example, the peer's schedule
  and the HT, VHT, HE capabilities - belong to the NMI station, and not to
  the NDI ones.
- Tx/Rx of NAN management frames to/from the peer.

The NDI station is used for Tx/Rx data frames of a specific NDP that was
established with the NAN peer.

Note that a peer can choose to reuse its NMI address as the NDI address.
In that case, it is expected that two stations will be added even though
they will have the same address.

- An NDI station can only be added after the corresponding NMI station
  was configured with capabilities.
- All the NDI stations will be removed before the NDI interface is brought
  down.
- All NMI stations will be removed before NAN is stopped.
- Before NMI sta removal, all corresponding NDI stations will be removed

Add support for adding, removing, and changing NMI and NDI stations.

Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260219114327.d280936ee832.I6d859eee759bb5824a9ffd2984410faf879ba00e@changeid
Link: https://patch.msgid.link/20260318123926.206536-6-miriam.rachel.korenblit@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: cfg80211: separately store HT, VHT and HE capabilities for NAN

In NAN, unlike in other modes, there is only one set of (HT, VHT, HE)
capabilities that is used for all channels (and bands) used in the NAN
data path.

This set of capabilities will have to be a special one, for example - have
the minimum of (HT-for-5 GHz, HT-for-2.4 GHz), careful handling of the
bits that have a different meaning for each band, etc.

While we could use the exiting sband/iftype capabilities, and require
identical capabilities for all bands (makes no sense since this means
that we will have VHT capabilities in the 2.4 GHz slot),
or require that only one of the sbands will be set,
or have logic to extract the minimum and handle the conflicting bits -
it seems simpler to add a dedicated set of capabilities which is special
for NAN, and is band agnostic, to be populated by the driver.

That way we also let the driver decide how it wants to handle the
conflicting bits.

Add this special set of these capabilities to wiphy:nan_capabilities, to be
populated by the driver.
Send it to user space.

Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260219114327.4b6f3e4a81b4.I45422adc0df3ad4101d857a92e83f0de5cf241e1@changeid
Link: https://patch.msgid.link/20260318123926.206536-5-miriam.rachel.korenblit@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: cfg80211: add support for NAN data interface

This new interface type represents a NAN data interface (NDI).
It is used for data communication with NAN peers.

Note that the existing NL80211_IFTYPE_NAN interface, which is the NAN
Management Interface (NMI), is used for management communication.

An NDI interface is started when a new NAN data path is about to
be established, and is stopped after the NAN data path is terminated.

- An NDI interface can only be started if the NMI is running, and NAN is
started.
- Before the NMI is stopped, the NDI interfaces will be stopped.

Add the new interface type, handle add/remove operations for it,
and makes sure of the conditions above.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260219114327.0d681335c2e2.I92973483e927820ae2297853c141842fdb262747@changeid
Link: https://patch.msgid.link/20260318123926.206536-4-miriam.rachel.korenblit@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: cfg80211: make sure NAN chandefs are valid

Until now there was not handling for NAN in reg_wdev_chan_valid.
Now as this wdev might use chandefs, check the validity of those.

Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260108102921.51b42ffc9a42.Iacb030fc17027afb55707ca1d6dc146631d55767@changeid
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260219094725.3846371-4-miriam.rachel.korenblit@intel.com
Link: https://patch.msgid.link/20260318123926.206536-3-miriam.rachel.korenblit@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: cfg80211: Add an API to configure local NAN schedule

Add an nl80211 API to allow user space to configure the local NAN
schedule.
The local schedule consists of a list of channel definitions and a schedule
map, in which each element covers a time slot and indicates on what
channel the device should be in that time slot.

Channels can be added to schedule even without being scheduled, for
reservation purposes.

A schedule can be configured either immedietally or be deferred, in case
there are already connected peers.
When the deferred flag is set, the command is a request from the device
to perform an announced schedule update: send the updated NAN
Availability - as set in this command - to the peers, and do the
actual switch to the new schedule on the right time (i.e. at the end of
the slot after the slot in which the update was sent to the peers).
In addition, a notification will be sent to indicate a deferred update
completion.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260219114327.ecca178a2de0.Ic977ab08b4ed5cf9b849e55d3a59b01ad3fbd08e@changeid
Link: https://patch.msgid.link/20260318123926.206536-2-miriam.rachel.korenblit@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Merge tag 'iwlwifi-next-2026-03-25' of https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next into wireless-next

Miri Korenblit says:
====================
wifi: iwlwifi: updates - 2026-03-25
====================

Looks like among mostly cleanups there's
- UNII-9 enablement,
- more UHR work, and
- various FW API updates

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: cleanup error path of ieee80211_do_open

If we failed on drv_start, we currently cleanup AP_VLAN reference to
bss.
But this is not needed, since AP_VLAN must be tied to a pre-existing AP
interface, so open_count cannot be 0, so we will never call drv_start
for AP_VLAN interfaces.

Remove these cleanup and return immediately instead.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260325154742.3c532a9132c3.Idac5c38d5ad7ce97782a8c05ae72bb0c689c4fa9@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: extract channel logic from link logic

The logic that tries to reuse an existing chanctx or create a new one if
such doesn't exist will be used for other types of chanctx users.
Extract this logic from _ieee80211_link_use_channel.

Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260325154550.9a08397a7590.Id24934d14f240f8d38a23f3b1786235bac0b3e60@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Bluetooth: btusb: clamp SCO altsetting table indices

btusb_work() maps the number of active SCO links to USB alternate
settings through a three-entry lookup table when CVSD traffic uses
transparent voice settings. The lookup currently indexes alts[] with
data->sco_num - 1 without first constraining sco_num to the number of
available table entries.

While the table only defines alternate settings for up to three SCO
links, data->sco_num comes from hci_conn_num() and is used directly.
Cap the lookup to the last table entry before indexing it so the
driver keeps selecting the highest supported alternate setting without
reading past alts[].

Fixes: baac6276c0a9 ("Bluetooth: btusb: handle mSBC audio over USB Endpoints")
Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: L2CAP: Fix ERTM re-init and zero pdu_len infinite loop

l2cap_config_req() processes CONFIG_REQ for channels in BT_CONNECTED
state to support L2CAP reconfiguration (e.g. MTU changes). However,
since both CONF_INPUT_DONE and CONF_OUTPUT_DONE are already set from
the initial configuration, the reconfiguration path falls through to
l2cap_ertm_init(), which re-initializes tx_q, srej_q, srej_list, and
retrans_list without freeing the previous allocations and sets
chan->sdu to NULL without freeing the existing skb. This leaks all
previously allocated ERTM resources.

Additionally, l2cap_parse_conf_req() does not validate the minimum
value of remote_mps derived from the RFC max_pdu_size option. A zero
value propagates to l2cap_segment_sdu() where pdu_len becomes zero,
causing the while loop to never terminate since len is never
decremented, exhausting all available memory.

Fix the double-init by skipping l2cap_ertm_init() and
l2cap_chan_ready() when the channel is already in BT_CONNECTED state,
while still allowing the reconfiguration parameters to be updated
through l2cap_parse_conf_req(). Also add a pdu_len zero check in
l2cap_segment_sdu() as a safeguard.

Fixes: 96298f640104 ("Bluetooth: L2CAP: handle l2cap config request during open state")
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: L2CAP: Fix deadlock in l2cap_conn_del()

l2cap_conn_del() calls cancel_delayed_work_sync() for both info_timer
and id_addr_timer while holding conn->lock. However, the work functions
l2cap_info_timeout() and l2cap_conn_update_id_addr() both acquire
conn->lock, creating a potential AB-BA deadlock if the work is already
executing when l2cap_conn_del() takes the lock.

Move the work cancellations before acquiring conn->lock and use
disable_delayed_work_sync() to additionally prevent the works from
being rearmed after cancellation, consistent with the pattern used in
hci_conn_del().

Fixes: ab4eedb790ca ("Bluetooth: L2CAP: Fix corrupted list in hci_chan_del")
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: btintel: serialize btintel_hw_error() with hci_req_sync_lock

btintel_hw_error() issues two __hci_cmd_sync() calls (HCI_OP_RESET
and Intel exception-info retrieval) without holding
hci_req_sync_lock().  This lets it race against
hci_dev_do_close() -> btintel_shutdown_combined(), which also runs
__hci_cmd_sync() under the same lock.  When both paths manipulate
hdev->req_status/req_rsp concurrently, the close path may free the
response skb first, and the still-running hw_error path hits a
slab-use-after-free in kfree_skb().

Wrap the whole recovery sequence in hci_req_sync_lock/unlock so it
is serialized with every other synchronous HCI command issuer.

Below is the data race report and the kasan report:

  BUG: data-race in __hci_cmd_sync_sk / btintel_shutdown_combined

  read of hdev->req_rsp at net/bluetooth/hci_sync.c:199
  by task kworker/u17:1/83:
   __hci_cmd_sync_sk+0x12f2/0x1c30 net/bluetooth/hci_sync.c:200
   __hci_cmd_sync+0x55/0x80 net/bluetooth/hci_sync.c:223
   btintel_hw_error+0x114/0x670 drivers/bluetooth/btintel.c:254
   hci_error_reset+0x348/0xa30 net/bluetooth/hci_core.c:1030

  write/free by task ioctl/22580:
   btintel_shutdown_combined+0xd0/0x360
    drivers/bluetooth/btintel.c:3648
   hci_dev_close_sync+0x9ae/0x2c10 net/bluetooth/hci_sync.c:5246
   hci_dev_do_close+0x232/0x460 net/bluetooth/hci_core.c:526

  BUG: KASAN: slab-use-after-free in
   sk_skb_reason_drop+0x43/0x380 net/core/skbuff.c:1202
  Read of size 4 at addr ffff888144a738dc
  by task kworker/u17:1/83:
   __hci_cmd_sync_sk+0x12f2/0x1c30 net/bluetooth/hci_sync.c:200
   __hci_cmd_sync+0x55/0x80 net/bluetooth/hci_sync.c:223
   btintel_hw_error+0x186/0x670 drivers/bluetooth/btintel.c:260

Fixes: 973bb97e5aee ("Bluetooth: btintel: Add generic function for handling hardware errors")
Signed-off-by: Cen Zhang <zzzccc427@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: L2CAP: Fix send LE flow credits in ACL link

When the L2CAP channel mode is L2CAP_MODE_ERTM/L2CAP_MODE_STREAMING,
l2cap_publish_rx_avail will be called and le flow credits will be sent in
l2cap_chan_rx_avail, even though the link type is ACL.

The logs in question as follows:
> ACL Data RX: Handle 129 flags 0x02 dlen 12
      L2CAP: Unknown (0x16) ident 4 len 4
        40 00 ed 05
< ACL Data TX: Handle 129 flags 0x00 dlen 10
      L2CAP: Command Reject (0x01) ident 4 len 2
        Reason: Command not understood (0x0000)

Bluetooth: Unknown BR/EDR signaling command 0x16
Bluetooth: Wrong link type (-22)

Fixes: ce60b9231b66 ("Bluetooth: compute LE flow credits based on recvbuf space")
Signed-off-by: Zhang Chen <zhangchen01@kylinos.cn>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

srcu: Use irq_work to start GP in tiny SRCU

Tiny SRCU's srcu_gp_start_if_needed() directly calls schedule_work(),
which acquires the workqueue pool->lock.

This causes a lockdep splat when call_srcu() is called with a scheduler
lock held, due to:

  call_srcu() [holding pi_lock]
    srcu_gp_start_if_needed()
      schedule_work() -> pool->lock

  workqueue_init() / create_worker() [holding pool->lock]
    wake_up_process() -> try_to_wake_up() -> pi_lock

Also add irq_work_sync() to cleanup_srcu_struct() to prevent a
use-after-free if a queued irq_work fires after cleanup begins.

Tested with rcutorture SRCU-T and no lockdep warnings.

[ Thanks to Boqun for similar fix in patch "rcu: Use an intermediate irq_work
to start process_srcu()" ]

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun@kernel.org>

rcu: Use an intermediate irq_work to start process_srcu()

Since commit c27cea4416a3 ("rcu: Re-implement RCU Tasks Trace in terms
of SRCU-fast") we switched to SRCU in BPF. However as BPF instrument can
happen basically everywhere (including where a scheduler lock is held),
call_srcu() now needs to avoid acquiring scheduler lock because
otherwise it could cause deadlock [1]. Fix this by following what the
previous RCU Tasks Trace did: using an irq_work to delay the queuing of
the work to start process_srcu().

[boqun: Apply Joel's feedback]
[boqun: Apply Andrea's test feedback]

Reported-by: Andrea Righi <arighi@nvidia.com>
Closes: https://lore.kernel.org/all/abjzvz_tL_siV17s@gpd4/
Fixes: commit c27cea4416a3 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
Link: https://lore.kernel.org/rcu/3c4c5a29-24ea-492d-aeee-e0d9605b4183@nvidia.com/
Suggested-by: Zqiang <qiang.zhang@linux.dev>
Tested-by: Andrea Righi <arighi@nvidia.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Boqun Feng <boqun@kernel.org>

srcu: Push srcu_node allocation to GP when non-preemptible

When the srcutree.convert_to_big and srcutree.big_cpu_lim kernel boot
parameters specify initialization-time allocation of the srcu_node
tree for statically allocated srcu_struct structures (for example, in
DEFINE_SRCU() at build time instead of init_srcu_struct() at runtime),
init_srcu_struct_nodes() will attempt to dynamically allocate this tree
at the first run-time update-side use of this srcu_struct structure,
but while holding a raw spinlock. Because the memory allocator can
acquire non-raw spinlocks, this can result in lockdep splats.

This commit therefore uses the same SRCU_SIZE_ALLOC trick that is used
when the first run-time update-side use of this srcu_struct structure
happens before srcu_init() is called. The actual allocation then takes
place from workqueue context at the ends of upcoming SRCU grace periods.

[boqun: Adjust the sha1 of the Fixes tag]

Fixes: 175b45ed343a ("srcu: Use raw spinlocks so call_srcu() can be used under preempt_disable()")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun@kernel.org>