Yiqi Sun [Thu, 2 Apr 2026 07:04:19 +0000 (15:04 +0800)]
ipv4: icmp: fix null-ptr-deref in icmp_build_probe()
ipv6_stub->ipv6_dev_find() may return ERR_PTR(-EAFNOSUPPORT) when the
IPv6 stack is not active (CONFIG_IPV6=m and not loaded), and passing
this error pointer to dev_hold() will cause a kernel crash with
null-ptr-deref.
Instead, silently discard the request. RFC 8335 does not appear to
define a specific response for the case where an IPv6 interface
identifier is syntactically valid but the implementation cannot perform
the lookup at runtime, and silently dropping the request may safer than
misreporting "No Such Interface".
ipv4: nexthop: allocate skb dynamically in rtm_get_nexthop()
When querying a nexthop object via RTM_GETNEXTHOP, the kernel currently
allocates a fixed-size skb using NLMSG_GOODSIZE. While sufficient for
single nexthops and small Equal-Cost Multi-Path groups, this fixed
allocation fails for large nexthop groups like 512 nexthops.
Fix this by allocating the size dynamically using nh_nlmsg_size() and
using nlmsg_new(), this is consistent with nexthop_notify() behavior. In
addition, adjust nh_nlmsg_size_grp() so it calculates the size needed
based on flags passed. While at it, also add the size of NHA_FDB for
nexthop group size calculation as it was missing too.
This cannot be reproduced via iproute2 as the group size is currently
limited and the command fails as follows:
addattr_l ERROR: message exceeded bound of 1048
Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Reported-by: Yiming Qian <yimingqian591@gmail.com> Closes: https://lore.kernel.org/netdev/CAL_bE8Li2h4KO+AQFXW4S6Yb_u5X4oSKnkywW+LPFjuErhqELA@mail.gmail.com/ Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260402072613.25262-2-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ipv4: nexthop: avoid duplicate NHA_HW_STATS_ENABLE on nexthop group dump
Currently NHA_HW_STATS_ENABLE is included twice everytime a dump of
nexthop group is performed with NHA_OP_FLAG_DUMP_STATS. As all the stats
querying were moved to nla_put_nh_group_stats(), leave only that
instance of the attribute querying.
Fixes: 5072ae00aea4 ("net: nexthop: Expose nexthop group HW stats to user space") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260402072613.25262-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: qualcomm: qca_uart: report the consumed byte on RX skb allocation failure
qca_tty_receive() consumes each input byte before checking whether a
completed frame needs a fresh receive skb. When the current byte completes
a frame, the driver delivers that frame and then allocates a new skb for
the next one.
If that allocation fails, the current code returns i even though data[i]
has already been consumed and may already have completed the delivered
frame. Since serdev interprets the return value as the number of accepted
bytes, this under-reports progress by one byte and can replay the final
byte of the completed frame into a fresh parser state on the next call.
Return i + 1 in that failure path so the accepted-byte count matches the
actual receive-state progress.
Fixes: dfc768fbe618 ("net: qualcomm: add QCA7000 UART driver") Cc: stable@vger.kernel.org Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Reviewed-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260402071207.4036-1-pengpeng@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oleh Konko [Thu, 2 Apr 2026 09:48:57 +0000 (09:48 +0000)]
tipc: fix bc_ackers underflow on duplicate GRP_ACK_MSG
The GRP_ACK_MSG handler in tipc_group_proto_rcv() currently decrements
bc_ackers on every inbound group ACK, even when the same member has
already acknowledged the current broadcast round.
Because bc_ackers is a u16, a duplicate ACK received after the last
legitimate ACK wraps the counter to 65535. Once wrapped,
tipc_group_bc_cong() keeps reporting congestion and later group
broadcasts on the affected socket stay blocked until the group is
recreated.
Fix this by ignoring duplicate or stale ACKs before touching bc_acked or
bc_ackers. This makes repeated GRP_ACK_MSG handling idempotent and
prevents the underflow path.
Fixes: 2f487712b893 ("tipc: guarantee that group broadcast doesn't bypass group unicast") Cc: stable@vger.kernel.org Signed-off-by: Oleh Konko <security@1seal.org> Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/41a4833f368641218e444fdcff822039.security@1seal.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rtnetlink: add missing netlink_ns_capable() check for peer netns
rtnl_newlink() lacks a CAP_NET_ADMIN capability check on the peer
network namespace when creating paired devices (veth, vxcan,
netkit). This allows an unprivileged user with a user namespace
to create interfaces in arbitrary network namespaces, including
init_net.
Add a netlink_ns_capable() check for CAP_NET_ADMIN in the peer
namespace before allowing device creation to proceed.
Fixes: 81adee47dfb6 ("net: Support specifying the network namespace upon device creation.") Signed-off-by: Nikolaos Gkarlis <nickgarlis@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260402181432.4126920-1-nickgarlis@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
bridge: guard local VLAN-0 FDB helpers against NULL vlan group
When CONFIG_BRIDGE_VLAN_FILTERING is not set, br_vlan_group() and
nbp_vlan_group() return NULL (br_private.h stub definitions). The
BR_BOOLOPT_FDB_LOCAL_VLAN_0 toggle code is compiled unconditionally and
reaches br_fdb_delete_locals_per_vlan_port() and
br_fdb_insert_locals_per_vlan_port(), where the NULL vlan group pointer
is dereferenced via list_for_each_entry(v, &vg->vlan_list, vlist).
The observed crash is in the delete path, triggered when creating a
bridge with IFLA_BR_MULTI_BOOLOPT containing BR_BOOLOPT_FDB_LOCAL_VLAN_0
via RTM_NEWLINK. The insert helper has the same bug pattern.
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000056: 0000 [#1] KASAN NOPTI
KASAN: null-ptr-deref in range [0x00000000000002b0-0x00000000000002b7]
RIP: 0010:br_fdb_delete_locals_per_vlan+0x2b9/0x310
Call Trace:
br_fdb_toggle_local_vlan_0+0x452/0x4c0
br_toggle_fdb_local_vlan_0+0x31/0x80 net/bridge/br.c:276
br_boolopt_toggle net/bridge/br.c:313
br_boolopt_multi_toggle net/bridge/br.c:364
br_changelink net/bridge/br_netlink.c:1542
br_dev_newlink net/bridge/br_netlink.c:1575
Add NULL checks for the vlan group pointer in both helpers, returning
early when there are no VLANs to iterate. This matches the existing
pattern used by other bridge FDB functions such as br_fdb_add() and
br_fdb_delete().
Fixes: 21446c06b441 ("net: bridge: Introduce UAPI for BR_BOOLOPT_FDB_LOCAL_VLAN_0") Signed-off-by: Zijing Yin <yzjaurora@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260402140153.3925663-1-yzjaurora@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: airoha: Fix memory leak in airoha_qdma_rx_process()
If an error occurs on the subsequents buffers belonging to the
non-linear part of the skb (e.g. due to an error in the payload length
reported by the NIC or if we consumed all the available fragments for
the skb), the page_pool fragment will not be linked to the skb so it will
not return to the pool in the airoha_qdma_rx_process() error path. Fix the
memory leak partially reverting commit 'd6d2b0e1538d ("net: airoha: Fix
page recycling in airoha_qdma_rx_process()")' and always running
page_pool_put_full_page routine in the airoha_qdma_rx_process() error
path.
When CONFIG_FIXED_PHY is in a loadable module, the fec driver cannot be
built-in any more:
x86_64-linux-ld: vmlinux.o: in function `fec_enet_mii_probe':
fec_main.c:(.text+0xc4f367): undefined reference to `fixed_phy_unregister'
x86_64-linux-ld: vmlinux.o: in function `fec_enet_close':
fec_main.c:(.text+0xc59591): undefined reference to `fixed_phy_unregister'
x86_64-linux-ld: vmlinux.o: in function `fec_enet_mii_probe.cold':
Select the fixed phy support on all targets to make this build
correctly, not just on coldfire.
Notat that Essentially the stub helpers in include/linux/phy_fixed.h
cannot be used correctly because of this build time dependency,
and we could just remove them to hit the build failure more often
when a driver uses them without the 'select FIXED_PHY'.
Fixes: dc86b621e1b4 ("net: fec: register a fixed phy using fixed_phy_register_100fd if needed") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260402141048.2713445-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
tcf_csum_act() walks nested VLAN headers directly from skb->data when an
skb still carries in-payload VLAN tags. The current code reads
vlan->h_vlan_encapsulated_proto and then pulls VLAN_HLEN bytes without
first ensuring that the full VLAN header is present in the linear area.
If only part of an inner VLAN header is linearized, accessing
h_vlan_encapsulated_proto reads past the linear area, and the following
skb_pull(VLAN_HLEN) may violate skb invariants.
Fix this by requiring pskb_may_pull(skb, VLAN_HLEN) before accessing and
pulling each nested VLAN header. If the header still is not fully
available, drop the packet through the existing error path.
Fixes: 2ecba2d1e45b ("net: sched: act_csum: Fix csum calc for tagged packets") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Ruide Cao <caoruide123@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/22df2fcb49f410203eafa5d97963dd36089f4ecf.1774892775.git.caoruide123@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The jumbo_frm() chain-mode implementation unconditionally computes
len = nopaged_len - bmax;
where nopaged_len = skb_headlen(skb) (linear bytes only) and bmax is
BUF_SIZE_8KiB or BUF_SIZE_2KiB. However, the caller stmmac_xmit()
decides to invoke jumbo_frm() based on skb->len (total length including
page fragments):
When a packet has a small linear portion (nopaged_len <= bmax) but a
large total length due to page fragments (skb->len > bmax), the
subtraction wraps as an unsigned integer, producing a huge len value
(~0xFFFFxxxx). This causes the while (len != 0) loop to execute
hundreds of thousands of iterations, passing skb->data + bmax * i
pointers far beyond the skb buffer to dma_map_single(). On IOMMU-less
SoCs (the typical deployment for stmmac), this maps arbitrary kernel
memory to the DMA engine, constituting a kernel memory disclosure and
potential memory corruption from hardware.
Fix this by introducing a buf_len local variable clamped to
min(nopaged_len, bmax). Computing len = nopaged_len - buf_len is then
always safe: it is zero when the linear portion fits within a single
descriptor, causing the while (len != 0) loop to be skipped naturally,
and the fragment loop in stmmac_xmit() handles page fragments afterward.
David Carlier [Wed, 1 Apr 2026 21:12:18 +0000 (22:12 +0100)]
net: altera-tse: fix skb leak on DMA mapping error in tse_start_xmit()
When dma_map_single() fails in tse_start_xmit(), the function returns
NETDEV_TX_OK without freeing the skb. Since NETDEV_TX_OK tells the
stack the packet was consumed, the skb is never freed, leaking memory
on every DMA mapping failure.
Add dev_kfree_skb_any() before returning to properly free the skb.
Fixes: bbd2190ce96d ("Altera TSE: Add main and header file for Altera Ethernet Driver") Cc: stable@vger.kernel.org Signed-off-by: David Carlier <devnexen@gmail.com> Link: https://patch.msgid.link/20260401211218.279185-1-devnexen@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Qingfang Deng [Wed, 1 Apr 2026 02:28:39 +0000 (10:28 +0800)]
MAINTAINERS: orphan PPP over Ethernet driver
We haven't seen activities from Michal Ostrowski for quite a long time.
The last commit from him is fb64bb560e18 ("PPPoE: Fix flush/close
races."), which was in 2009. Email to mostrows@earthlink.net also
bounces.
Merge tag 'net-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"With fixes from wireless, bluetooth and netfilter included we're back
to each PR carrying 30%+ more fixes than in previous era.
The good news is that so far none of the "extra" fixes are themselves
causing real regressions. Not sure how much comfort that is.
Current release - fix to a fix:
- netdevsim: fix build if SKB_EXTENSIONS=n
- eth: stmmac: skip VLAN restore when VLAN hash ops are missing
Previous releases - regressions:
- wifi: iwlwifi: mvm: don't send a 6E related command when
not supported
Previous releases - always broken:
- some info leak fixes
- add missing clearing of skb->cb[] on ICMP paths from tunnels
- ipv6:
- flowlabel: defer exclusive option free until RCU teardown
- avoid overflows in ip6_datagram_send_ctl()
- mpls: add seqcount to protect platform_labels from OOB access
- bridge: improve safety of parsing ND options
- bluetooth: fix leaks, overflows and races in hci_sync
- netfilter: add more input validation, some to address bugs directly
some to prevent exploits from cooking up broken configurations
- wifi:
- ath: avoid poor performance due to stopping the wrong
aggregation session
- virt_wifi: remove SET_NETDEV_DEV to avoid use-after-free
- eth:
- fec: fix the PTP periodic output sysfs interface
- enetc: safely reinitialize TX BD ring when it has unsent frames"
* tag 'net-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
eth: fbnic: Increase FBNIC_QUEUE_SIZE_MIN to 64
ipv6: avoid overflows in ip6_datagram_send_ctl()
net: hsr: fix VLAN add unwind on slave errors
net: hsr: serialize seq_blocks merge across nodes
vsock: initialize child_ns_mode_locked in vsock_net_init()
selftests/tc-testing: add tests for cls_fw and cls_flow on shared blocks
net/sched: cls_flow: fix NULL pointer dereference on shared blocks
net/sched: cls_fw: fix NULL pointer dereference on shared blocks
net/x25: Fix overflow when accumulating packets
net/x25: Fix potential double free of skb
bnxt_en: Restore default stat ctxs for ULP when resource is available
bnxt_en: Don't assume XDP is never enabled in bnxt_init_dflt_ring_mode()
bnxt_en: Refactor some basic ring setup and adjustment logic
net/mlx5: Fix switchdev mode rollback in case of failure
net/mlx5: Avoid "No data available" when FW version queries fail
net/mlx5: lag: Check for LAG device before creating debugfs
net: macb: properly unregister fixed rate clocks
net: macb: fix clk handling on PCI glue driver removal
virtio_net: clamp rss_max_key_size to NETDEV_RSS_KEY_LEN
net/sched: sch_netem: fix out-of-bounds access in packet corruption
...
Merge tag 'iommu-fixes-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:
- IOMMU-PT related compile breakage in for AMD driver
- IOTLB flushing behavior when unmapped region is larger than requested
due to page-sizes
- Fix IOTLB flush behavior with empty gathers
* tag 'iommu-fixes-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommupt/amdv1: mark amdv1pt_install_leaf_entry as __always_inline
iommupt: Fix short gather if the unmap goes into a large mapping
iommu: Do not call drivers for empty gathers
Merge tag 'sound-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"People have been so busy for hunting and we're still getting more
changes than wished for, but it doesn't look too scary; almost all
changes are device-specific small fixes.
I guess it's rather a casual bump, and no more Easter eggs are left
for 7.0 (hopefully)...
- Fixes for the recent regression on ctxfi driver
- Fix missing INIT_LIST_HEAD() for ASoC card_aux_list
- Usual HD- and USB-audio, and ASoC AMD quirk updates
- ASoC fixes for AMD and Intel"
* tag 'sound-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
ASoC: amd: ps: Fix missing leading zeros in subsystem_device SSID log
ALSA: usb-audio: Exclude Scarlett 2i2 1st Gen (8016) from SKIP_IFACE_SETUP
ALSA: hda/realtek: add quirk for Acer Swift SFG14-73
ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14IMH9
ASoC: Intel: boards: fix unmet dependency on PINCTRL
ASoC: Intel: ehl_rt5660: Use the correct rtd->dev device in hw_params
ALSA: ctxfi: Don't enumerate SPDIF1 at DAIO initialization
ALSA: hda/realtek: Add quirk for Lenovo Yoga Slim 7 14AKP10
ALSA: hda/realtek: add quirk for HP Laptop 15-fc0xxx
ASoC: ep93xx: Fix unchecked clk_prepare_enable() and add rollback on failure
ASoC: soc-core: call missing INIT_LIST_HEAD() for card_aux_list
ALSA: hda/realtek: Add quirk for Samsung Book2 Pro 360 (NP950QED)
ASoC: amd: yc: Add DMI entry for HP Laptop 15-fc0xxx
ASoC: amd: yc: Add DMI quirk for ASUS Vivobook Pro 16X OLED M7601RM
ALSA: hda/realtek: Add quirk for ASUS ROG Strix SCAR 15
ALSA: usb-audio: Exclude Scarlett Solo 1st Gen from SKIP_IFACE_SETUP
ALSA: caiaq: fix stack out-of-bounds read in init_card
ALSA: ctxfi: Check the error for index mapping
ALSA: ctxfi: Fix missing SPDIFI1 index handling
ALSA: hda/realtek: add quirk for HP Victus 15-fb0xxx
...
Merge tag 'auxdisplay-v7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-auxdisplay
Pull auxdisplay fixes from Andy Shevchenko:
- Fix NULL dereference in linedisp_release()
- Fix ht16k33 DT bindings to avoid warnings
- Handle errors in I²C transfers in lcd2s driver
* tag 'auxdisplay-v7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-auxdisplay:
auxdisplay: line-display: fix NULL dereference in linedisp_release
auxdisplay: lcd2s: add error handling for i2c transfers
dt-bindings: auxdisplay: ht16k33: Use unevaluatedProperties to fix common property warning
On systems with 64K pages, RX queues will be wedged if users set the
descriptor count to the current minimum (16). Fbnic fragments large
pages into 4K chunks, and scales down the ring size accordingly. With
64K pages and 16 descriptors, the ring size mask is 0 and will never
be filled.
32 descriptors is another special case that wedges the RX rings.
Internally, the rings track pages for the head/tail pointers, not page
fragments. So with 32 descriptors, there's only 1 usable page as one
ring slot is kept empty to disambiguate between an empty/full ring.
As a result, the head pointer never advances and the HW stalls after
consuming 16 page fragments.
Eric Dumazet [Wed, 1 Apr 2026 15:47:21 +0000 (15:47 +0000)]
ipv6: avoid overflows in ip6_datagram_send_ctl()
Yiming Qian reported :
<quote>
I believe I found a locally triggerable kernel bug in the IPv6 sendmsg
ancillary-data path that can panic the kernel via `skb_under_panic()`
(local DoS).
The core issue is a mismatch between:
- a 16-bit length accumulator (`struct ipv6_txoptions::opt_flen`, type
`__u16`) and
- a pointer to the *last* provided destination-options header (`opt->dst1opt`)
when multiple `IPV6_DSTOPTS` control messages (cmsgs) are provided.
- `include/net/ipv6.h`:
- `struct ipv6_txoptions::opt_flen` is `__u16` (wrap possible).
(lines 291-307, especially 298)
- `net/ipv6/datagram.c:ip6_datagram_send_ctl()`:
- Accepts repeated `IPV6_DSTOPTS` and accumulates into `opt_flen`
without rejecting duplicates. (lines 909-933)
- `net/ipv6/ip6_output.c:__ip6_append_data()`:
- Uses `opt->opt_flen + opt->opt_nflen` to compute header
sizes/headroom decisions. (lines 1448-1466, especially 1463-1465)
- `net/ipv6/ip6_output.c:__ip6_make_skb()`:
- Calls `ipv6_push_frag_opts()` if `opt->opt_flen` is non-zero.
(lines 1930-1934)
- `net/ipv6/exthdrs.c:ipv6_push_frag_opts()` / `ipv6_push_exthdr()`:
- Push size comes from `ipv6_optlen(opt->dst1opt)` (based on the
pointed-to header). (lines 1179-1185 and 1206-1211)
1. `opt_flen` is a 16-bit accumulator:
- `include/net/ipv6.h:298` defines `__u16 opt_flen; /* after fragment hdr */`.
2. `ip6_datagram_send_ctl()` accepts *repeated* `IPV6_DSTOPTS` cmsgs
and increments `opt_flen` each time:
- In `net/ipv6/datagram.c:909-933`, for `IPV6_DSTOPTS`:
- It computes `len = ((hdr->hdrlen + 1) << 3);`
- It checks `CAP_NET_RAW` using `ns_capable(net->user_ns,
CAP_NET_RAW)`. (line 922)
- Then it does:
- `opt->opt_flen += len;` (line 927)
- `opt->dst1opt = hdr;` (line 928)
There is no duplicate rejection here (unlike the legacy
`IPV6_2292DSTOPTS` path which rejects duplicates at
`net/ipv6/datagram.c:901-904`).
If enough large `IPV6_DSTOPTS` cmsgs are provided, `opt_flen` wraps
while `dst1opt` still points to a large (2048-byte)
destination-options header.
In the attached PoC (`poc.c`):
- 32 cmsgs with `hdrlen=255` => `len = (255+1)*8 = 2048`
- 1 cmsg with `hdrlen=0` => `len = 8`
- Total increment: `32*2048 + 8 = 65544`, so `(__u16)opt_flen == 8`
- The last cmsg is 2048 bytes, so `dst1opt` points to a 2048-byte header.
3. The transmit path sizes headers using the wrapped `opt_flen`:
- The `IPV6_DSTOPTS` cmsg path requires `CAP_NET_RAW` in the target
netns user namespace (`ns_capable(net->user_ns, CAP_NET_RAW)`).
- Root (or any task with `CAP_NET_RAW`) can trigger this without user
namespaces.
- An unprivileged `uid=1000` user can trigger this if unprivileged
user namespaces are enabled and it can create a userns+netns to obtain
namespaced `CAP_NET_RAW` (the attached PoC does this).
- Local denial of service: kernel BUG/panic (system crash).
- Reproducible with a small userspace PoC.
</quote>
This patch does not reject duplicated options, as this might break
some user applications.
Instead, it makes sure to adjust opt_flen and opt_nflen to correctly
reflect the size of the current option headers, preventing the overflows
and the potential for panics.
This applies to IPV6_DSTOPTS, IPV6_HOPOPTS, and IPV6_RTHDR.
Specifically:
When a new IPV6_DSTOPTS is processed, the length of the old opt->dst1opt
is subtracted from opt->opt_flen before adding the new length.
When a new IPV6_HOPOPTS is processed, the length of the old opt->dst0opt
is subtracted from opt->opt_nflen.
When a new Routing Header (IPV6_RTHDR or IPV6_2292RTHDR) is processed,
the length of the old opt->srcrt is subtracted from opt->opt_nflen.
In the special case within IPV6_2292RTHDR handling where dst1opt is moved
to dst0opt, the length of the old opt->dst0opt is subtracted from
opt->opt_nflen before the new one is added.
Fixes: 333fad5364d6 ("[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542).") Reported-by: Yiming Qian <yimingqian591@gmail.com> Closes: https://lore.kernel.org/netdev/CAL_bE8JNzawgr5OX5m+3jnQDHry2XxhQT5=jThW1zDPtUikRYA@mail.gmail.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260401154721.3740056-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
net: hsr: fixes for PRP duplication and VLAN unwind
This series addresses two logic bugs in the HSR/PRP implementation
identified during a protocol audit. These are targeted for the 'net'
tree as they fix potential memory corruption and state inconsistency.
The primary change resolves a race condition in the node merging path by
implementing address-based lock ordering. This ensures that concurrent
mutations of sequence blocks do not lead to state corruption or
deadlocks.
An additional fix corrects asymmetric VLAN error unwinding by
implementing a centralized unwind path on slave errors.
====================
Luka Gejak [Wed, 1 Apr 2026 09:22:43 +0000 (11:22 +0200)]
net: hsr: fix VLAN add unwind on slave errors
When vlan_vid_add() fails for a secondary slave, the error path calls
vlan_vid_del() on the failing port instead of the peer slave that had
already succeeded. This results in asymmetric VLAN state across the HSR
pair.
Fix this by switching to a centralized unwind path that removes the VID
from any slave device that was already programmed.
Luka Gejak [Wed, 1 Apr 2026 09:22:42 +0000 (11:22 +0200)]
net: hsr: serialize seq_blocks merge across nodes
During node merging, hsr_handle_sup_frame() walks node_curr->seq_blocks
to update node_real without holding node_curr->seq_out_lock. This
allows concurrent mutations from duplicate registration paths, risking
inconsistent state or XArray/bitmap corruption.
Fix this by locking both nodes' seq_out_lock during the merge.
To prevent ABBA deadlocks, locks are acquired in order of memory
address.
Reviewed-by: Felix Maurer <fmaurer@redhat.com> Fixes: 415e6367512b ("hsr: Implement more robust duplicate discard for PRP") Signed-off-by: Luka Gejak <luka.gejak@linux.dev> Link: https://patch.msgid.link/20260401092243.52121-2-luka.gejak@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
vsock: initialize child_ns_mode_locked in vsock_net_init()
The `child_ns_mode_locked` field lives in `struct net`, which persists
across vsock module reloads. When the module is unloaded and reloaded,
`vsock_net_init()` resets `mode` and `child_ns_mode` back to their
default values, but does not reset `child_ns_mode_locked`.
The stale lock from the previous module load causes subsequent writes
to `child_ns_mode` to silently fail: `vsock_net_set_child_mode()` sees
the old lock, skips updating the actual value, and returns success
when the requested mode matches the stale lock. The sysctl handler
reports no error, but `child_ns_mode` remains unchanged.
Steps to reproduce:
$ modprobe vsock
$ echo local > /proc/sys/net/vsock/child_ns_mode
$ cat /proc/sys/net/vsock/child_ns_mode
local
$ modprobe -r vsock
$ modprobe vsock
$ echo local > /proc/sys/net/vsock/child_ns_mode
$ cat /proc/sys/net/vsock/child_ns_mode
global <--- expected "local"
Fix this by initializing `child_ns_mode_locked` to 0 (unlocked) in
`vsock_net_init()`, so the write-once mechanism works correctly after
module reload.
Fixes: 102eab95f025 ("vsock: lock down child_ns_mode as write-once") Reported-by: Jin Liu <jinl@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Link: https://patch.msgid.link/20260401092153.28462-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xiang Mei [Tue, 31 Mar 2026 05:02:17 +0000 (22:02 -0700)]
selftests/tc-testing: add tests for cls_fw and cls_flow on shared blocks
Regression tests for the shared-block NULL derefs fixed in the previous
two patches:
- fw: attempt to attach an empty fw filter to a shared block and
verify the configuration is rejected with EINVAL.
- flow: create a flow filter on a shared block without a baseclass
and verify the configuration is rejected with EINVAL.
Signed-off-by: Xiang Mei <xmei5@asu.edu> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20260331050217.504278-3-xmei5@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Xiang Mei [Tue, 31 Mar 2026 05:02:16 +0000 (22:02 -0700)]
net/sched: cls_flow: fix NULL pointer dereference on shared blocks
flow_change() calls tcf_block_q() and dereferences q->handle to derive
a default baseclass. Shared blocks leave block->q NULL, causing a NULL
deref when a flow filter without a fully qualified baseclass is created
on a shared block.
Check tcf_block_shared() before accessing block->q and return -EINVAL
for shared blocks. This avoids the null-deref shown below:
=======================================================================
KASAN: null-ptr-deref in range [0x0000000000000038-0x000000000000003f]
RIP: 0010:flow_change (net/sched/cls_flow.c:508)
Call Trace:
tc_new_tfilter (net/sched/cls_api.c:2432)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6980)
[...]
=======================================================================
Fixes: 1abf272022cf ("net: sched: tcindex, fw, flow: use tcf_block_q helper to get struct Qdisc") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260331050217.504278-2-xmei5@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Xiang Mei [Tue, 31 Mar 2026 05:02:15 +0000 (22:02 -0700)]
net/sched: cls_fw: fix NULL pointer dereference on shared blocks
The old-method path in fw_classify() calls tcf_block_q() and
dereferences q->handle. Shared blocks leave block->q NULL, causing a
NULL deref when an empty cls_fw filter is attached to a shared block
and a packet with a nonzero major skb mark is classified.
Reject the configuration in fw_change() when the old method (no
TCA_OPTIONS) is used on a shared block, since fw_classify()'s
old-method path needs block->q which is NULL for shared blocks.
The fixed null-ptr-deref calling stack:
KASAN: null-ptr-deref in range [0x0000000000000038-0x000000000000003f]
RIP: 0010:fw_classify (net/sched/cls_fw.c:81)
Call Trace:
tcf_classify (./include/net/tc_wrapper.h:197 net/sched/cls_api.c:1764 net/sched/cls_api.c:1860)
tc_run (net/core/dev.c:4401)
__dev_queue_xmit (net/core/dev.c:4535 net/core/dev.c:4790)
Fixes: 1abf272022cf ("net: sched: tcindex, fw, flow: use tcf_block_q helper to get struct Qdisc") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260331050217.504278-1-xmei5@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Martin Schiller [Tue, 31 Mar 2026 07:43:17 +0000 (09:43 +0200)]
net/x25: Fix potential double free of skb
When alloc_skb fails in x25_queue_rx_frame it calls kfree_skb(skb) at
line 48 and returns 1 (error).
This error propagates back through the call chain:
x25_queue_rx_frame returns 1
|
v
x25_state3_machine receives the return value 1 and takes the else
branch at line 278, setting queued=0 and returning 0
|
v
x25_process_rx_frame returns queued=0
|
v
x25_backlog_rcv at line 452 sees queued=0 and calls kfree_skb(skb)
again
This would free the same skb twice. Looking at x25_backlog_rcv:
Merge tag 'asoc-fix-v7.0-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v7.0
Another smallish batch of fixes and quirks, these days it's AMD that is
getting all the DMI entries added. We've got one core fix for a missing
list initialisation with auxiliary devices, otherwise it's all fairly
small things.
Jakub Kicinski [Thu, 2 Apr 2026 03:13:00 +0000 (20:13 -0700)]
Merge branch 'bnxt_en-bug-fixes'
Michael Chan says:
====================
bnxt_en: Bug fixes
The first patch is a refactor patch needed by the second patch to
fix XDP ring initialization during FW reset. The third patch
fixes an issue related to stats context reservation for RoCE.
====================
Pavan Chebbi [Tue, 31 Mar 2026 06:51:38 +0000 (23:51 -0700)]
bnxt_en: Restore default stat ctxs for ULP when resource is available
During resource reservation, if the L2 driver does not have enough
MSIX vectors to provide to the RoCE driver, it sets the stat ctxs for
ULP also to 0 so that we don't have to reserve it unnecessarily.
However, subsequently the user may reduce L2 rings thereby freeing up
some resources that the L2 driver can now earmark for RoCE. In this
case, the driver should restore the default ULP stat ctxs to make
sure that all RoCE resources are ready for use.
The RoCE driver may fail to initialize in this scenario without this
fix.
Fixes: d630624ebd70 ("bnxt_en: Utilize ulp client resources if RoCE is not registered") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260331065138.948205-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Tue, 31 Mar 2026 06:51:37 +0000 (23:51 -0700)]
bnxt_en: Don't assume XDP is never enabled in bnxt_init_dflt_ring_mode()
The original code made the assumption that when we set up the initial
default ring mode, we must be just loading the driver and XDP cannot
be enabled yet. This is not true when the FW goes through a resource
or capability change. Resource reservations will be cancelled and
reinitialized with XDP already enabled. devlink reload with XDP enabled
will also have the same issue. This scenario will cause the ring
arithmetic to be all wrong in the bnxt_init_dflt_ring_mode() path
causing failure:
bnxt_en 0000:a1:00.0 ens2f0np0: bnxt_setup_int_mode err: ffffffea
bnxt_en 0000:a1:00.0 ens2f0np0: bnxt_request_irq err: ffffffea
bnxt_en 0000:a1:00.0 ens2f0np0: nic open fail (rc: ffffffea)
Fix it by properly accounting for XDP in the bnxt_init_dflt_ring_mode()
path by using the refactored helper functions in the previous patch.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Fixes: ec5d31e3c15d ("bnxt_en: Handle firmware reset status during IF_UP.") Fixes: 228ea8c187d8 ("bnxt_en: implement devlink dev reload driver_reinit") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260331065138.948205-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Tue, 31 Mar 2026 06:51:36 +0000 (23:51 -0700)]
bnxt_en: Refactor some basic ring setup and adjustment logic
Refactor out the basic code that trims the default rings, sets up and
adjusts XDP TX rings and CP rings. There is no change in behavior.
This is to prepare for the next bug fix patch.
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260331065138.948205-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Saeed Mahameed [Mon, 30 Mar 2026 19:40:15 +0000 (22:40 +0300)]
net/mlx5: Fix switchdev mode rollback in case of failure
If for some internal reason switchdev mode fails, we rollback to legacy
mode, before this patch, rollback will unregister the uplink netdev and
leave it unregistered causing the below kernel bug.
To fix this, we need to avoid netdev unregister by setting the proper
rollback flag 'MLX5_PRIV_FLAGS_SWITCH_LEGACY' to indicate legacy mode.
Saeed Mahameed [Mon, 30 Mar 2026 19:40:14 +0000 (22:40 +0300)]
net/mlx5: Avoid "No data available" when FW version queries fail
Avoid printing the misleading "kernel answers: No data available" devlink
output when querying firmware or pending firmware version fails
(e.g. MLX5 fw state errors / flash failures).
FW can fail on loading the pending flash image and get its version due
to various reasons, examples:
mlxfw: Firmware flash failed: key not applicable, err (7)
mlx5_fw_image_pending: can't read pending fw version while fw state is 1
and the resulting:
$ devlink dev info
kernel answers: No data available
Instead, just report 0 or 0xfff.. versions in case of failure to indicate
a problem, and let other information be shown.
after the fix:
$ devlink dev info
pci/0000:00:06.0:
driver mlx5_core
serial_number xxx...
board.serial_number MT2225300179
versions:
fixed:
fw.psid MT_0000000436
running:
fw.version 22.41.0188
fw 22.41.0188
stored:
fw.version 255.255.65535
fw 255.255.65535
Shay Drory [Mon, 30 Mar 2026 19:40:13 +0000 (22:40 +0300)]
net/mlx5: lag: Check for LAG device before creating debugfs
__mlx5_lag_dev_add_mdev() may return 0 (success) even when an error
occurs that is handled gracefully. Consequently, the initialization
flow proceeds to call mlx5_ldev_add_debugfs() even when there is no
valid LAG context.
mlx5_ldev_add_debugfs() blindly created the debugfs directory and
attributes. This exposed interfaces (like the members file) that rely on
a valid ldev pointer, leading to potential NULL pointer dereferences if
accessed when ldev is NULL.
Add a check to verify that mlx5_lag_dev(dev) returns a valid pointer
before attempting to create the debugfs entries.
Fixes: 7f46a0b7327a ("net/mlx5: Lag, add debugfs to query hardware lag state") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260330194015.53585-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fedor Pchelkin [Mon, 30 Mar 2026 18:45:40 +0000 (21:45 +0300)]
net: macb: fix clk handling on PCI glue driver removal
platform_device_unregister() may still want to use the registered clks
during runtime resume callback.
Note that there is a commit d82d5303c4c5 ("net: macb: fix use after free
on rmmod") that addressed the similar problem of clk vs platform device
unregistration but just moved the bug to another place.
Save the pointers to clks into local variables for reuse after platform
device is unregistered.
BUG: KASAN: use-after-free in clk_prepare+0x5a/0x60
Read of size 8 at addr ffff888104f85e00 by task modprobe/597
Srujana Challa [Thu, 26 Mar 2026 14:23:44 +0000 (19:53 +0530)]
virtio_net: clamp rss_max_key_size to NETDEV_RSS_KEY_LEN
rss_max_key_size in the virtio spec is the maximum key size supported by
the device, not a mandatory size the driver must use. Also the value 40
is a spec minimum, not a spec maximum.
The current code rejects RSS and can fail probe when the device reports a
larger rss_max_key_size than the driver buffer limit. Instead, clamp the
effective key length to min(device rss_max_key_size, NETDEV_RSS_KEY_LEN)
and keep RSS enabled.
This keeps probe working on devices that advertise larger maximum key sizes
while respecting the netdev RSS key buffer size limit.
Fixes: 3f7d9c1964fc ("virtio_net: Add hash_key_length check") Cc: stable@vger.kernel.org Signed-off-by: Srujana Challa <schalla@marvell.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20260326142344.1171317-1-schalla@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yucheng Lu [Tue, 31 Mar 2026 08:00:21 +0000 (16:00 +0800)]
net/sched: sch_netem: fix out-of-bounds access in packet corruption
In netem_enqueue(), the packet corruption logic uses
get_random_u32_below(skb_headlen(skb)) to select an index for
modifying skb->data. When an AF_PACKET TX_RING sends fully non-linear
packets over an IPIP tunnel, skb_headlen(skb) evaluates to 0.
Passing 0 to get_random_u32_below() takes the variable-ceil slow path
which returns an unconstrained 32-bit random integer. Using this
unconstrained value as an offset into skb->data results in an
out-of-bounds memory access.
Fix this by verifying skb_headlen(skb) is non-zero before attempting
to corrupt the linear data area. Fully non-linear packets will silently
bypass the corruption logic.
Fixes: c865e5d99e25 ("[PKT_SCHED] netem: packet corruption option") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Signed-off-by: Yuan Tan <tanyuan98@outlook.com> Signed-off-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Yuhang Zheng <z1652074432@gmail.com> Signed-off-by: Yucheng Lu <kanolyc@gmail.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Link: https://patch.msgid.link/45435c0935df877853a81e6d06205ac738ec65fa.1774941614.git.kanolyc@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 2 Apr 2026 02:19:35 +0000 (19:19 -0700)]
Merge tag 'nf-26-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net. Note that most
of the bugs fixed here are >5 years old. The large PR is not due to an
increase in regressions.
1) Flowtable hardware offload support in IPv6 can lead to out-of-bounds
when populating the rule action array when combined with double-tagged
vlan. Bump the maximum number of actions from 16 to 24 and check that
such limit is never reached, otherwise bail out. This bugs stems from
the original flowtable hardware offload support.
2) nfnetlink_log does not include the netlink header size of the trailing
NLMSG_DONE message when calculating the skb size. From Florian Westphal.
3) Reject names in xt_cgroup and xt_rateest extensions which are not
nul-terminated. Also from Florian.
4) Use nla_strcmp in ipset lookup by set name, since IPSET_ATTR_NAME and
IPSET_ATTR_NAMEREF are of NLA_STRING type. From Florian Westphal.
5) When unregistering conntrack helpers, pass the helper that is going
away so the expectation cleanup is done accordingly, otherwise UaF is
possible when accessing expectation that refer to the helper that is
gone. From Qi Tang.
6) Zero expectation NAT fields to address leaking kernel memory through
the expectation netlink dump when unset. Also from Qi Tang.
7) Use the master conntrack helper when creating expectations via
ctnetlink, ignore the suggested helper through CTA_EXPECT_HELP_NAME.
This allows to address a possible read of kernel memory off the
expectation object boundary.
8) Fix incorrect release of the hash bucket logic in ipset when the
bucket is empty, leading to shrinking the hash bucket to size 0
which deals to out-of-bound write in next element additions.
From Yifan Wu.
9) Allow the use of x_tables extensions that explicitly declare
NFPROTO_ARP support only. This is to avoid an incorrect hook number
validation due to non-overlapping arp and inet hook number
definitions.
10) Reject immediate NF_QUEUE verdict in nf_tables. The userspace
nft tool always uses the nft_queue expression for queueing.
This ensures this verdict cannot be used for the arp family,
which does supported this.
* tag 'nf-26-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: reject immediate NF_QUEUE verdict
netfilter: x_tables: restrict xt_check_match/xt_check_target extensions for NFPROTO_ARP
netfilter: ipset: drop logically empty buckets in mtype_del
netfilter: ctnetlink: ignore explicit helper on new expectations
netfilter: ctnetlink: zero expect NAT fields when CTA_EXPECT_NAT absent
netfilter: nf_conntrack_helper: pass helper to expect cleanup
netfilter: ipset: use nla_strcmp for IPSET_ATTR_NAME attr
netfilter: x_tables: ensure names are nul-terminated
netfilter: nfnetlink_log: account for netlink header size
netfilter: flowtable: strictly check for maximum number of actions
====================
Weiming Shi [Mon, 30 Mar 2026 16:32:38 +0000 (00:32 +0800)]
rds: ib: reject FRMR registration before IB connection is established
rds_ib_get_mr() extracts the rds_ib_connection from conn->c_transport_data
and passes it to rds_ib_reg_frmr() for FRWR memory registration. On a
fresh outgoing connection, ic is allocated in rds_ib_conn_alloc() with
i_cm_id = NULL because the connection worker has not yet called
rds_ib_conn_path_connect() to create the rdma_cm_id. When sendmsg() with
RDS_CMSG_RDMA_MAP is called on such a connection, the sendmsg path parses
the control message before any connection establishment, allowing
rds_ib_post_reg_frmr() to dereference ic->i_cm_id->qp and crash the
kernel.
The existing guard in rds_ib_reg_frmr() only checks for !ic (added in
commit 9e630bcb7701), which does not catch this case since ic is allocated
early and is always non-NULL once the connection object exists.
Add a check in rds_ib_get_mr() that verifies ic, i_cm_id, and qp are all
non-NULL before proceeding with FRMR registration, mirroring the guard
already present in rds_ib_post_inv(). Return -ENODEV when the connection
is not ready, which the existing error handling in rds_cmsg_send() converts
to -EAGAIN for userspace retry and triggers rds_conn_connect_if_down() to
start the connection worker.
Fixes: 1659185fb4d0 ("RDS: IB: Support Fastreg MR (FRMR) memory registration mode") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Allison Henderson <achender@kernel.org> Link: https://patch.msgid.link/20260330163237.2752440-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When two CPUs process Router Advertisement packets for the same router
simultaneously, they can both arrive at fib6_metric_set() with the same
fib6_info pointer whose fib6_metrics still points to dst_default_metrics.
Fix this by:
- Set val for p->metrics before published via cmpxchg() so the metrics
value is ready before the pointer becomes visible to other CPUs.
- Replace the plain pointer store with cmpxchg() and free the allocation
safely when competition failed.
- Add READ_ONCE()/WRITE_ONCE() for metrics[] setting in the non-default
metrics path to prevent compiler-based data races.
Fixes: d4ead6b34b67 ("net/ipv6: move metrics from dst to rt6_info") Reported-by: Fei Liu <feliu@redhat.com> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260331-b4-fib6_metric_set-kmemleak-v3-1-88d27f4d8825@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
hkbinbin [Tue, 31 Mar 2026 05:39:16 +0000 (05:39 +0000)]
Bluetooth: hci_sync: fix stack buffer overflow in hci_le_big_create_sync
hci_le_big_create_sync() uses DEFINE_FLEX to allocate a
struct hci_cp_le_big_create_sync on the stack with room for 0x11 (17)
BIS entries. However, conn->num_bis can hold up to HCI_MAX_ISO_BIS (31)
entries — validated against ISO_MAX_NUM_BIS (0x1f) in the caller
hci_conn_big_create_sync(). When conn->num_bis is between 18 and 31,
the memcpy that copies conn->bis into cp->bis writes up to 14 bytes
past the stack buffer, corrupting adjacent stack memory.
This is trivially reproducible: binding an ISO socket with
bc_num_bis = ISO_MAX_NUM_BIS (31) and calling listen() will
eventually trigger hci_le_big_create_sync() from the HCI command
sync worker, causing a KASAN-detectable stack-out-of-bounds write:
BUG: KASAN: stack-out-of-bounds in hci_le_big_create_sync+0x256/0x3b0
Write of size 31 at addr ffffc90000487b48 by task kworker/u9:0/71
Fix this by changing the DEFINE_FLEX count from the incorrect 0x11 to
HCI_MAX_ISO_BIS, which matches the maximum number of BIS entries that
conn->bis can actually carry.
Fixes: 42ecf1947135 ("Bluetooth: ISO: Do not emit LE BIG Create Sync if previous is pending") Cc: stable@vger.kernel.org Signed-off-by: hkbinbin <hkbinbinbin@gmail.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Oleh Konko [Tue, 31 Mar 2026 11:52:13 +0000 (11:52 +0000)]
Bluetooth: SMP: derive legacy responder STK authentication from MITM state
The legacy responder path in smp_random() currently labels the stored
STK as authenticated whenever pending_sec_level is BT_SECURITY_HIGH.
That reflects what the local service requested, not what the pairing
flow actually achieved.
For Just Works/Confirm legacy pairing, SMP_FLAG_MITM_AUTH stays clear
and the resulting STK should remain unauthenticated even if the local
side requested HIGH security. Use the established MITM state when
storing the responder STK so the key metadata matches the pairing result.
This also keeps the legacy path aligned with the Secure Connections code,
which already treats JUST_WORKS/JUST_CFM as unauthenticated.
Fixes: fff3490f4781 ("Bluetooth: Fix setting correct authentication information for SMP STK") Cc: stable@vger.kernel.org Signed-off-by: Oleh Konko <security@1seal.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Oleh Konko [Tue, 31 Mar 2026 11:52:12 +0000 (11:52 +0000)]
Bluetooth: SMP: force responder MITM requirements before building the pairing response
smp_cmd_pairing_req() currently builds the pairing response from the
initiator auth_req before enforcing the local BT_SECURITY_HIGH
requirement. If the initiator omits SMP_AUTH_MITM, the response can
also omit it even though the local side still requires MITM.
tk_request() then sees an auth value without SMP_AUTH_MITM and may
select JUST_CFM, making method selection inconsistent with the pairing
policy the responder already enforces.
When the local side requires HIGH security, first verify that MITM can
be achieved from the IO capabilities and then force SMP_AUTH_MITM in the
response in both rsp.auth_req and auth. This keeps the responder auth bits
and later method selection aligned.
Fixes: 2b64d153a0cc ("Bluetooth: Add MITM mechanism to LE-SMP") Cc: stable@vger.kernel.org Suggested-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Signed-off-by: Oleh Konko <security@1seal.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
mesh_send() currently bounds MGMT_OP_MESH_SEND by total command
length, but it never verifies that the bytes supplied for the
flexible adv_data[] array actually match the embedded adv_data_len
field. MGMT_MESH_SEND_SIZE only covers the fixed header, so a
truncated command can still pass the existing 20..50 byte range
check and later drive the async mesh send path past the end of the
queued command buffer.
Keep rejecting zero-length and oversized advertising payloads, but
validate adv_data_len explicitly and require the command length to
exactly match the flexible array size before queueing the request.
Fixes: b338d91703fa ("Bluetooth: Implement support for Mesh") Reported-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Pauli Virtanen [Sun, 29 Mar 2026 13:43:02 +0000 (16:43 +0300)]
Bluetooth: hci_event: fix potential UAF in hci_le_remote_conn_param_req_evt
hci_conn lookup and field access must be covered by hdev lock in
hci_le_remote_conn_param_req_evt, otherwise it's possible it is freed
concurrently.
Extend the hci_dev_lock critical section to cover all conn usage.
Fixes: 95118dd4edfec ("Bluetooth: hci_event: Use of a function table to handle LE subevents") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Pauli Virtanen [Sun, 29 Mar 2026 13:43:01 +0000 (16:43 +0300)]
Bluetooth: hci_conn: fix potential UAF in set_cig_params_sync
hci_conn lookup and field access must be covered by hdev lock in
set_cig_params_sync, otherwise it's possible it is freed concurrently.
Take hdev lock to prevent hci_conn from being deleted or modified
concurrently. Just RCU lock is not suitable here, as we also want to
avoid "tearing" in the configuration.
Fixes: a091289218202 ("Bluetooth: hci_conn: Fix hci_le_set_cig_params") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Keenan Dong [Sat, 28 Mar 2026 08:46:47 +0000 (16:46 +0800)]
Bluetooth: MGMT: validate LTK enc_size on load
Load Long Term Keys stores the user-provided enc_size and later uses
it to size fixed-size stack operations when replying to LE LTK
requests. An enc_size larger than the 16-byte key buffer can therefore
overflow the reply stack buffer.
Reject oversized enc_size values while validating the management LTK
record so invalid keys never reach the stored key state.
Fixes: 346af67b8d11 ("Bluetooth: Add MGMT handlers for dealing with SMP LTK's") Reported-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Commit 5df5dafc171b ("Bluetooth: hci_uart: Fix another race during
initialization") fixed a race for hci commands sent during initialization.
However, there is still a race that happens if an hci event from one of
these commands is received before HCI_UART_REGISTERED has been set at
the end of hci_uart_register_dev(). The event will be ignored which
causes the command to fail with a timeout in the log:
"Bluetooth: hci0: command 0x1003 tx timeout"
This is because the hci event receive path (hci_uart_tty_receive ->
h4_recv) requires HCI_UART_REGISTERED to be set in h4_recv(), while the
hci command transmit path (hci_uart_send_frame -> h4_enqueue) only
requires HCI_UART_PROTO_INIT to be set in hci_uart_send_frame().
The check for HCI_UART_REGISTERED was originally added in commit c2578202919a ("Bluetooth: Fix H4 crash from incoming UART packets")
to fix a crash caused by hu->hdev being null dereferenced. That can no
longer happen: once HCI_UART_PROTO_INIT is set in hci_uart_register_dev()
all pointers (hu, hu->priv and hu->hdev) are valid, and
hci_uart_tty_receive() already calls h4_recv() on HCI_UART_PROTO_INIT
or HCI_UART_PROTO_READY.
Remove the check for HCI_UART_REGISTERED in h4_recv() to fix the race
condition.
Fixes: 5df5dafc171b ("Bluetooth: hci_uart: Fix another race during initialization") Signed-off-by: Jonathan Rissanen <jonathan.rissanen@axis.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth: hci_sync: Fix UAF in le_read_features_complete
This fixes the following backtrace caused by hci_conn being freed
before le_read_features_complete but after
hci_le_read_remote_features_sync so hci_conn_del -> hci_cmd_sync_dequeue
is not able to prevent it:
==================================================================
BUG: KASAN: slab-use-after-free in instrument_atomic_read_write include/linux/instrumented.h:96 [inline]
BUG: KASAN: slab-use-after-free in atomic_dec_and_test include/linux/atomic/atomic-instrumented.h:1383 [inline]
BUG: KASAN: slab-use-after-free in hci_conn_drop include/net/bluetooth/hci_core.h:1688 [inline]
BUG: KASAN: slab-use-after-free in le_read_features_complete+0x5b/0x340 net/bluetooth/hci_sync.c:7344
Write of size 4 at addr ffff8880796b0010 by task kworker/u9:0/52
The buggy address belongs to the object at ffff8880796b0000
which belongs to the cache kmalloc-8k of size 8192
The buggy address is located 16 bytes inside of
freed 8192-byte region [ffff8880796b0000, ffff8880796b2000)
Memory state around the buggy address: ffff8880796aff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8880796aff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8880796b0000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff8880796b0080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880796b0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Fixes: a106e50be74b ("Bluetooth: HCI: Add support for LL Extended Feature Set") Reported-by: syzbot+87badbb9094e008e0685@syzkaller.appspotmail.com Tested-by: syzbot+87badbb9094e008e0685@syzkaller.appspotmail.com Closes: https://syzbot.org/bug?extid=87badbb9094e008e0685 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Pauli Virtanen <pav@iki.fi>
Pauli Virtanen [Wed, 25 Mar 2026 19:07:43 +0000 (21:07 +0200)]
Bluetooth: hci_sync: hci_cmd_sync_queue_once() return -EEXIST if exists
hci_cmd_sync_queue_once() needs to indicate whether a queue item was
added, so caller can know if callbacks are called, so it can avoid
leaking resources.
Change the function to return -EEXIST if queue item already exists.
Modify all callsites to handle that.
Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Oleh Konko [Thu, 26 Mar 2026 17:31:24 +0000 (17:31 +0000)]
Bluetooth: hci_event: move wake reason storage into validated event handlers
hci_store_wake_reason() is called from hci_event_packet() immediately
after stripping the HCI event header but before hci_event_func()
enforces the per-event minimum payload length from hci_ev_table.
This means a short HCI event frame can reach bacpy() before any bounds
check runs.
Rather than duplicating skb parsing and per-event length checks inside
hci_store_wake_reason(), move wake-address storage into the individual
event handlers after their existing event-length validation has
succeeded. Convert hci_store_wake_reason() into a small helper that only
stores an already-validated bdaddr while the caller holds hci_dev_lock().
Use the same helper after hci_event_func() with a NULL address to
preserve the existing unexpected-wake fallback semantics when no
validated event handler records a wake address.
Annotate the helper with __must_hold(&hdev->lock) and add
lockdep_assert_held(&hdev->lock) so future call paths keep the lock
contract explicit.
Call the helper from hci_conn_request_evt(), hci_conn_complete_evt(),
hci_sync_conn_complete_evt(), le_conn_complete_evt(),
hci_le_adv_report_evt(), hci_le_ext_adv_report_evt(),
hci_le_direct_adv_report_evt(), hci_le_pa_sync_established_evt(), and
hci_le_past_received_evt().
Fixes: 2f20216c1d6f ("Bluetooth: Emit controller suspend and resume events") Cc: stable@vger.kernel.org Signed-off-by: Oleh Konko <security@1seal.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cen Zhang [Thu, 26 Mar 2026 15:16:45 +0000 (23:16 +0800)]
Bluetooth: SCO: fix race conditions in sco_sock_connect()
sco_sock_connect() checks sk_state and sk_type without holding
the socket lock. Two concurrent connect() syscalls on the same
socket can both pass the check and enter sco_connect(), leading
to use-after-free.
The buggy scenario involves three participants and was confirmed
with additional logging instrumentation:
Thread A (connect): HCI disconnect: Thread B (connect):
Thread B revives a BT_CLOSED + SOCK_ZAPPED socket back to
BT_CONNECT. Subsequent cleanup triggers double sock_put() and
use-after-free. Meanwhile conn1 is leaked as it was orphaned
when sco_conn_del() cleared the association.
Fix this by:
- Moving lock_sock() before the sk_state/sk_type checks in
sco_sock_connect() to serialize concurrent connect attempts
- Fixing the sk_type != SOCK_SEQPACKET check to actually
return the error instead of just assigning it
- Adding a state re-check in sco_connect() after lock_sock()
to catch state changes during the window between the locks
- Adding sco_pi(sk)->conn check in sco_chan_add() to prevent
double-attach of a socket to multiple connections
- Adding hci_conn_drop() on sco_chan_add failure to prevent
HCI connection leaks
Fixes: 9a8ec9e8ebb5 ("Bluetooth: SCO: Fix possible circular locking dependency on sco_connect_cfm") Signed-off-by: Cen Zhang <zzzccc427@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Pauli Virtanen [Wed, 25 Mar 2026 19:07:46 +0000 (21:07 +0200)]
Bluetooth: hci_sync: call destroy in hci_cmd_sync_run if immediate
hci_cmd_sync_run() may run the work immediately if called from existing
sync work (otherwise it queues a new sync work). In this case it fails
to call the destroy() function.
On immediate run, make it behave same way as if item was queued
successfully: call destroy, and return 0.
The only callsite is hci_abort_conn() via hci_cmd_sync_run_once(), and
this changes its return value. However, its return value is not used
except as the return value for hci_disconnect(), and nothing uses the
return value of hci_disconnect(). Hence there should be no behavior
change anywhere.
Fixes: c898f6d7b093b ("Bluetooth: hci_sync: Introduce hci_cmd_sync_run/hci_cmd_sync_run_once") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Simon Trimmer [Tue, 31 Mar 2026 13:19:16 +0000 (13:19 +0000)]
ASoC: amd: ps: Fix missing leading zeros in subsystem_device SSID log
Ensure that subsystem_device is printed with leading zeros when combined
with subsystem_vendor to form the SSID. Without this, devices with upper
bits unset may appear to have an incorrect SSID in the debug output.
nft_queue is always used from userspace nftables to deliver the NF_QUEUE
verdict. Immediately emitting an NF_QUEUE verdict is never used by the
userspace nft tools, so reject immediate NF_QUEUE verdicts.
The arp family does not provide queue support, but such an immediate
verdict is still reachable. Globally reject NF_QUEUE immediate verdicts
to address this issue.
netfilter: x_tables: restrict xt_check_match/xt_check_target extensions for NFPROTO_ARP
Weiming Shi says:
xt_match and xt_target structs registered with NFPROTO_UNSPEC can be
loaded by any protocol family through nft_compat. When such a
match/target sets .hooks to restrict which hooks it may run on, the
bitmask uses NF_INET_* constants. This is only correct for families
whose hook layout matches NF_INET_*: IPv4, IPv6, INET, and bridge
all share the same five hooks (PRE_ROUTING ... POST_ROUTING).
ARP only has three hooks (IN=0, OUT=1, FORWARD=2) with different
semantics. Because NF_ARP_OUT == 1 == NF_INET_LOCAL_IN, the .hooks
validation silently passes for the wrong reasons, allowing matches to
run on ARP chains where the hook assumptions (e.g. state->in being
set on input hooks) do not hold. This leads to NULL pointer
dereferences; xt_devgroup is one concrete example:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000044: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000220-0x0000000000000227]
RIP: 0010:devgroup_mt+0xff/0x350
Call Trace:
<TASK>
nft_match_eval (net/netfilter/nft_compat.c:407)
nft_do_chain (net/netfilter/nf_tables_core.c:285)
nft_do_chain_arp (net/netfilter/nft_chain_filter.c:61)
nf_hook_slow (net/netfilter/core.c:623)
arp_xmit (net/ipv4/arp.c:666)
</TASK>
Kernel panic - not syncing: Fatal exception in interrupt
Fix it by restricting arptables to NFPROTO_ARP extensions only.
Note that arptables-legacy only supports:
- arpt_CLASSIFY
- arpt_mangle
- arpt_MARK
that provide explicit NFPROTO_ARP match/target declarations.
Fixes: 9291747f118d ("netfilter: xtables: add device group match") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Yifan Wu [Mon, 30 Mar 2026 21:39:24 +0000 (14:39 -0700)]
netfilter: ipset: drop logically empty buckets in mtype_del
mtype_del() counts empty slots below n->pos in k, but it only drops the
bucket when both n->pos and k are zero. This misses buckets whose live
entries have all been removed while n->pos still points past deleted slots.
Treat a bucket as empty when all positions below n->pos are unused and
release it directly instead of shrinking it further.
Fixes: 8af1c6fbd923 ("netfilter: ipset: Fix forceadd evaluation path") Cc: stable@vger.kernel.org Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <dstsmallbird@foxmail.com> Signed-off-by: Yifan Wu <yifanwucs@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Reviewed-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: ctnetlink: ignore explicit helper on new expectations
Use the existing master conntrack helper, anything else is not really
supported and it just makes validation more complicated, so just ignore
what helper userspace suggests for this expectation.
This was uncovered when validating CTA_EXPECT_CLASS via different helper
provided by userspace than the existing master conntrack helper:
BUG: KASAN: slab-out-of-bounds in nf_ct_expect_related_report+0x2479/0x27c0
Read of size 4 at addr ffff8880043fe408 by task poc/102
Call Trace:
nf_ct_expect_related_report+0x2479/0x27c0
ctnetlink_create_expect+0x22b/0x3b0
ctnetlink_new_expect+0x4bd/0x5c0
nfnetlink_rcv_msg+0x67a/0x950
netlink_rcv_skb+0x120/0x350
Allowing to read kernel memory bytes off the expectation boundary.
CTA_EXPECT_HELP_NAME is still used to offer the helper name to userspace
via netlink dump.
Fixes: bd0779370588 ("netfilter: nfnetlink_queue: allow to attach expectations to conntracks") Reported-by: Qi Tang <tpluszz77@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Qi Tang [Tue, 31 Mar 2026 06:17:12 +0000 (14:17 +0800)]
netfilter: ctnetlink: zero expect NAT fields when CTA_EXPECT_NAT absent
ctnetlink_alloc_expect() allocates expectations from a non-zeroing
slab cache via nf_ct_expect_alloc(). When CTA_EXPECT_NAT is not
present in the netlink message, saved_addr and saved_proto are
never initialized. Stale data from a previous slab occupant can
then be dumped to userspace by ctnetlink_exp_dump_expect(), which
checks these fields to decide whether to emit CTA_EXPECT_NAT.
The safe sibling nf_ct_expect_init(), used by the packet path,
explicitly zeroes these fields.
Zero saved_addr, saved_proto and dir in the else branch, guarded
by IS_ENABLED(CONFIG_NF_NAT) since these fields only exist when
NAT is enabled.
Confirmed by priming the expect slab with NAT-bearing expectations,
freeing them, creating a new expectation without CTA_EXPECT_NAT,
and observing that the ctnetlink dump emits a spurious
CTA_EXPECT_NAT containing stale data from the prior allocation.
Fixes: 076a0ca02644 ("netfilter: ctnetlink: add NAT support for expectations") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Qi Tang <tpluszz77@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Qi Tang [Sun, 29 Mar 2026 16:50:36 +0000 (00:50 +0800)]
netfilter: nf_conntrack_helper: pass helper to expect cleanup
nf_conntrack_helper_unregister() calls nf_ct_expect_iterate_destroy()
to remove expectations belonging to the helper being unregistered.
However, it passes NULL instead of the helper pointer as the data
argument, so expect_iter_me() never matches any expectation and all
of them survive the cleanup.
After unregister returns, nfnl_cthelper_del() frees the helper
object immediately. Subsequent expectation dumps or packet-driven
init_conntrack() calls then dereference the freed exp->helper,
causing a use-after-free.
Pass the actual helper pointer so expectations referencing it are
properly destroyed before the helper object is freed.
BUG: KASAN: slab-use-after-free in string+0x38f/0x430
Read of size 1 at addr ffff888003b14d20 by task poc/103
Call Trace:
string+0x38f/0x430
vsnprintf+0x3cc/0x1170
seq_printf+0x17a/0x240
exp_seq_show+0x2e5/0x560
seq_read_iter+0x419/0x1280
proc_reg_read+0x1ac/0x270
vfs_read+0x179/0x930
ksys_read+0xef/0x1c0
Freed by task 103:
The buggy address is located 32 bytes inside of
freed 192-byte region [ffff888003b14d00, ffff888003b14dc0)
Fixes: ac7b84839003 ("netfilter: expect: add and use nf_ct_expect_iterate helpers") Signed-off-by: Qi Tang <tpluszz77@gmail.com> Reviewed-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: flowtable: strictly check for maximum number of actions
The maximum number of flowtable hardware offload actions in IPv6 is:
* ethernet mangling (4 payload actions, 2 for each ethernet address)
* SNAT (4 payload actions)
* DNAT (4 payload actions)
* Double VLAN (4 vlan actions, 2 for popping vlan, and 2 for pushing)
for QinQ.
* Redirect (1 action)
Which makes 17, while the maximum is 16. But act_ct supports for tunnels
actions too. Note that payload action operates at 32-bit word level, so
mangling an IPv6 address takes 4 payload actions.
Update flow_action_entry_next() calls to check for the maximum number of
supported actions.
While at it, rise the maximum number of actions per flow from 16 to 24
so this works fine with IPv6 setups.
Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Reported-by: Hyunwoo Kim <imv4bel@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
ALSA: usb-audio: Exclude Scarlett 2i2 1st Gen (8016) from SKIP_IFACE_SETUP
Same issue as the other 1st Gen Scarletts: QUIRK_FLAG_SKIP_IFACE_SETUP
causes distorted audio on this revision of the Scarlett 2i2 1st Gen
(1235:8016).
Michal Piekos [Sat, 28 Mar 2026 08:55:51 +0000 (09:55 +0100)]
net: stmmac: skip VLAN restore when VLAN hash ops are missing
stmmac_vlan_restore() unconditionally calls stmmac_vlan_update() when
NETIF_F_VLAN_FEATURES is set. On platforms where priv->hw->vlan (or
->update_vlan_hash) is not provided, stmmac_update_vlan_hash() returns
-EINVAL via stmmac_do_void_callback(), resulting in a spurious
"Failed to restore VLANs" error even when no VLAN filtering is in use.
Remove not needed comment.
Remove not used return value from stmmac_vlan_restore().
Yufan Chen [Sat, 28 Mar 2026 16:32:57 +0000 (00:32 +0800)]
net: ftgmac100: fix ring allocation unwind on open failure
ftgmac100_alloc_rings() allocates rx_skbs, tx_skbs, rxdes, txdes, and
rx_scratch in stages. On intermediate failures it returned -ENOMEM
directly, leaking resources allocated earlier in the function.
Rework the failure path to use staged local unwind labels and free
allocated resources in reverse order before returning -ENOMEM. This
matches common netdev allocation cleanup style.
Fixes: d72e01a0430f ("ftgmac100: Use a scratch buffer for failed RX allocations") Cc: stable@vger.kernel.org Signed-off-by: Yufan Chen <yufan.chen@linux.dev> Link: https://patch.msgid.link/20260328163257.60836-1-yufan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Li Xiasong [Mon, 30 Mar 2026 12:03:35 +0000 (20:03 +0800)]
mptcp: fix soft lockup in mptcp_recvmsg()
syzbot reported a soft lockup in mptcp_recvmsg() [0].
When receiving data with MSG_PEEK | MSG_WAITALL flags, the skb is not
removed from the sk_receive_queue. This causes sk_wait_data() to always
find available data and never perform actual waiting, leading to a soft
lockup.
Fix this by adding a 'last' parameter to track the last peeked skb.
This allows sk_wait_data() to make informed waiting decisions and prevent
infinite loops when MSG_PEEK is used.
Zhengchuan Liang [Mon, 30 Mar 2026 08:46:24 +0000 (16:46 +0800)]
net: ipv6: flowlabel: defer exclusive option free until RCU teardown
`ip6fl_seq_show()` walks the global flowlabel hash under the seq-file
RCU read-side lock and prints `fl->opt->opt_nflen` when an option block
is present.
Exclusive flowlabels currently free `fl->opt` as soon as `fl->users`
drops to zero in `fl_release()`. However, the surrounding
`struct ip6_flowlabel` remains visible in the global hash table until
later garbage collection removes it and `fl_free_rcu()` finally tears it
down.
A concurrent `/proc/net/ip6_flowlabel` reader can therefore race that
early `kfree()` and dereference freed option state, triggering a crash
in `ip6fl_seq_show()`.
Fix this by keeping `fl->opt` alive until `fl_free_rcu()`. That matches
the lifetime already required for the enclosing flowlabel while readers
can still reach it under RCU.
Fixes: d3aedd5ebd4b ("ipv6 flowlabel: Convert hash list to RCU.") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Zhengchuan Liang <zcliangcn@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/07351f0ec47bcee289576f39f9354f4a64add6e4.1774855883.git.zcliangcn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 31 Mar 2026 21:23:12 +0000 (14:23 -0700)]
Merge tag 'sched_ext-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Fix SCX_KICK_WAIT deadlock where multiple CPUs waiting for each other
in hardirq context form a cycle. Move the wait to a balance callback
which can drop the rq lock and process IPIs.
- Fix inconsistent NUMA node lookup in scx_select_cpu_dfl() where
the waker_node used cpu_to_node() while prev_cpu used
scx_cpu_node_if_enabled(), leading to undefined behavior when
per-node idle tracking is disabled.
* tag 'sched_ext-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
selftests/sched_ext: Add cyclic SCX_KICK_WAIT stress test
sched_ext: Fix SCX_KICK_WAIT deadlock by deferring wait to balance callback
sched_ext: Fix inconsistent NUMA node lookup in scx_select_cpu_dfl()
Linus Torvalds [Tue, 31 Mar 2026 21:20:39 +0000 (14:20 -0700)]
Merge tag 'wq-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
- Fix false positive stall reports on weakly ordered architectures
where the lockless worklist/timestamp check in the watchdog can
observe stale values due to memory reordering.
Recheck under pool->lock to confirm.
* tag 'wq-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Better describe stall check
workqueue: Fix false positive stall reports
Linus Torvalds [Tue, 31 Mar 2026 20:59:51 +0000 (13:59 -0700)]
Merge tag 'cgroup-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- Fix cgroup rmdir racing with dying tasks.
Deferred task cgroup unlink introduced a window where cgroup.procs
is empty but the cgroup is still populated, causing rmdir to fail
with -EBUSY and selftest failures.
Make rmdir wait for dying tasks to fully leave and fix selftests to
not depend on synchronous populated updates.
- Fix cpuset v1 task migration failure from empty cpusets under strict
security policies.
When CPU hotplug removes the last CPU from a v1 cpuset, tasks must be
migrated to an ancestor without a security_task_setscheduler() check
that would block the migration.
* tag 'cgroup-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: Skip security check for hotplug induced v1 task migration
cgroup/cpuset: Simplify setsched decision check in task iteration loop of cpuset_can_attach()
cgroup: Fix cgroup_drain_dying() testing the wrong condition
selftests/cgroup: Don't require synchronous populated update on task exit
cgroup: Wait for dying tasks to leave on rmdir
Waiman Long [Tue, 31 Mar 2026 15:11:08 +0000 (11:11 -0400)]
cgroup/cpuset: Skip security check for hotplug induced v1 task migration
When a CPU hot removal causes a v1 cpuset to lose all its CPUs, the
cpuset hotplug handler will schedule a work function to migrate tasks
in that cpuset with no CPU to its ancestor to enable those tasks to
continue running.
If a strict security policy is in place, however, the task migration
may fail when security_task_setscheduler() call in cpuset_can_attach()
returns a -EACCES error. That will mean that those tasks will have
no CPU to run on. The system administrators will have to explicitly
intervene to either add CPUs to that cpuset or move the tasks elsewhere
if they are aware of it.
This problem was found by a reported test failure in the LTP's
cpuset_hotplug_test.sh. Fix this problem by treating this special case as
an exception to skip the setsched security check in cpuset_can_attach()
when a v1 cpuset with tasks have no CPU left.
With that patch applied, the cpuset_hotplug_test.sh test can be run
successfully without failure.
Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
Waiman Long [Tue, 31 Mar 2026 15:11:07 +0000 (11:11 -0400)]
cgroup/cpuset: Simplify setsched decision check in task iteration loop of cpuset_can_attach()
Centralize the check required to run security_task_setscheduler() in
the task iteration loop of cpuset_can_attach() outside of the loop as
it has no dependency on the characteristics of the tasks themselves.
There is no functional change.
Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
Linus Torvalds [Tue, 31 Mar 2026 17:28:08 +0000 (10:28 -0700)]
Merge tag 'fs_for_v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull udf fix from Jan Kara:
"Fix for a race in UDF that can lead to memory corruption"
* tag 'fs_for_v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix race between file type conversion and writeback
mpage: Provide variant of mpage_writepages() with own optional folio handler
ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14IMH9
The Lenovo Yoga Pro 7 14IMH9 (DMI: 83E2) shares PCI SSID 17aa:3847
with the Legion 7 16ACHG6, but has a different codec subsystem ID
(17aa:38cf). The existing SND_PCI_QUIRK for 17aa:3847 applies
ALC287_FIXUP_LEGION_16ACHG6, which attempts to initialize an external
I2C amplifier (CLSA0100) that is not present on the Yoga Pro 7 14IMH9.
As a result, pin 0x17 (bass speakers) is connected to DAC 0x06 which
has no volume control, making hardware volume adjustment completely
non-functional. Audio is either silent or at maximum volume regardless
of the slider position.
Add a HDA_CODEC_QUIRK entry using the codec subsystem ID (17aa:38cf)
to correctly identify the Yoga Pro 7 14IMH9 and apply
ALC287_FIXUP_YOGA9_14IMH9_BASS_SPK_PIN, which redirects pin 0x17 to
DAC 0x02 and restores proper volume control. The existing Legion entry
is preserved unchanged.
This follows the same pattern used for 17aa:386e, where Legion Y9000X
and Yoga Pro 7 14ARP8 share a PCI SSID but are distinguished via
HDA_CODEC_QUIRK.
Xiang Mei [Sat, 28 Mar 2026 06:30:00 +0000 (23:30 -0700)]
bridge: mrp: reject zero test interval to avoid OOM panic
br_mrp_start_test() and br_mrp_start_in_test() accept the user-supplied
interval value from netlink without validation. When interval is 0,
usecs_to_jiffies(0) yields 0, causing the delayed work
(br_mrp_test_work_expired / br_mrp_in_test_work_expired) to reschedule
itself with zero delay. This creates a tight loop on system_percpu_wq
that allocates and transmits MRP test frames at maximum rate, exhausting
all system memory and causing a kernel panic via OOM deadlock.
The same zero-interval issue applies to br_mrp_start_in_test_parse()
for interconnect test frames.
Use NLA_POLICY_MIN(NLA_U32, 1) in the nla_policy tables for both
IFLA_BRIDGE_MRP_START_TEST_INTERVAL and
IFLA_BRIDGE_MRP_START_IN_TEST_INTERVAL, so zero is rejected at the
netlink attribute parsing layer before the value ever reaches the
workqueue scheduling code. This is consistent with how other bridge
subsystems (br_fdb, br_mst) enforce range constraints on netlink
attributes.
Fixes: 20f6a05ef635 ("bridge: mrp: Rework the MRP netlink interface") Fixes: 7ab1748e4ce6 ("bridge: mrp: Extend MRP netlink interface for configuring MRP interconnect") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260328063000.1845376-1-xmei5@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Julian Braha [Wed, 25 Mar 2026 00:15:21 +0000 (00:15 +0000)]
ASoC: Intel: boards: fix unmet dependency on PINCTRL
This reverts commit c073f0757663 ("ASoC: Intel: sof_sdw: select PINCTRL_CS42L43 and SPI_CS42L43")
Currently, SND_SOC_INTEL_SOUNDWIRE_SOF_MACH selects PINCTRL_CS42L43
without also selecting or depending on PINCTRL, despite PINCTRL_CS42L43
depending on PINCTRL.
In response to v1 of this patch [1], Arnd pointed out that there is
no compile-time dependency sof_sdw and the PINCTRL_CS42L43 driver.
After testing, I can confirm that the kernel compiled with
SND_SOC_INTEL_SOUNDWIRE_SOF_MACH enabled and PINCTRL_CS42L43 disabled.
This unmet dependency was detected by kconfirm, a static analysis
tool for Kconfig.
Sachin Mokashi [Fri, 27 Mar 2026 13:14:39 +0000 (09:14 -0400)]
ASoC: Intel: ehl_rt5660: Use the correct rtd->dev device in hw_params
In rt5660_hw_params(), the error path for snd_soc_dai_set_sysclk()
correctly uses rtd->dev as the logging device, but the error path for
snd_soc_dai_set_pll() uses codec_dai->dev instead.
These two devices are distinct:
- rtd->dev is the platform device of the PCM runtime (the Intel HDA/SSP
controller, e.g. 0000:00:1f.3), which owns the machine driver callback.
- codec_dai->dev is the I2C device of the rt5660 codec itself
(i2c-10EC5660:00).
Since hw_params is a machine driver operation and both calls are made
within the same function from the machine driver's context, all error
messages should be attributed to rtd->dev. Using codec_dai->dev for one
of them would suggest the error originates inside the codec driver,
which is misleading.
Align the PLL error log with the sysclk one to use rtd->dev, matching
the convention used by all other Intel board drivers in this directory.
====================
Correct BD length masks and BQL accounting for multi-BD TX packets
This patch series fixes two issues in the Xilinx AXI Ethernet driver:
1. Corrects the BD length masks to match the AXIDMA IP spec.
2. Fixes BQL accounting for multi-BD TX packets.
====================
Suraj Gupta [Fri, 27 Mar 2026 07:32:38 +0000 (13:02 +0530)]
net: xilinx: axienet: Fix BQL accounting for multi-BD TX packets
When a TX packet spans multiple buffer descriptors (scatter-gather),
axienet_free_tx_chain sums the per-BD actual length from descriptor
status into a caller-provided accumulator. That sum is reset on each
NAPI poll. If the BDs for a single packet complete across different
polls, the earlier bytes are lost and never credited to BQL. This
causes BQL to think bytes are permanently in-flight, eventually
stalling the TX queue.
The SKB pointer is stored only on the last BD of a packet. When that
BD completes, use skb->len for the byte count instead of summing
per-BD status lengths. This matches netdev_sent_queue(), which debits
skb->len, and naturally survives across polls because no partial
packet contributes to the accumulator.
Fixes: c900e49d58eb ("net: xilinx: axienet: Implement BQL") Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com> Reviewed-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20260327073238.134948-3-suraj.gupta2@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suraj Gupta [Fri, 27 Mar 2026 07:32:37 +0000 (13:02 +0530)]
net: xilinx: axienet: Correct BD length masks to match AXIDMA IP spec
The XAXIDMA_BD_CTRL_LENGTH_MASK and XAXIDMA_BD_STS_ACTUAL_LEN_MASK
macros were defined as 0x007FFFFF (23 bits), but the AXI DMA IP
product guide (PG021) specifies the buffer length field as bits 25:0
(26 bits). Update both masks to match the IP documentation.
In practice this had no functional impact, since Ethernet frames are
far smaller than 2^23 bytes and the extra bits were always zero, but
the masks should still reflect the hardware specification.
Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com> Reviewed-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20260327073238.134948-2-suraj.gupta2@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pengpeng Hou [Thu, 26 Mar 2026 14:20:33 +0000 (22:20 +0800)]
NFC: pn533: bound the UART receive buffer
pn532_receive_buf() appends every incoming byte to dev->recv_skb and
only resets the buffer after pn532_uart_rx_is_frame() recognizes a
complete frame. A continuous stream of bytes without a valid PN532 frame
header therefore keeps growing the skb until skb_put_u8() hits the tail
limit.
Drop the accumulated partial frame once the fixed receive buffer is full
so malformed UART traffic cannot grow the skb past
PN532_UART_SKB_BUFF_LEN.
Xiang Mei [Thu, 26 Mar 2026 07:55:53 +0000 (00:55 -0700)]
net: bonding: fix use-after-free in bond_xmit_broadcast()
bond_xmit_broadcast() reuses the original skb for the last slave
(determined by bond_is_last_slave()) and clones it for others.
Concurrent slave enslave/release can mutate the slave list during
RCU-protected iteration, changing which slave is "last" mid-loop.
This causes the original skb to be double-consumed (double-freed).
Replace the racy bond_is_last_slave() check with a simple index
comparison (i + 1 == slaves_count) against the pre-snapshot slave
count taken via READ_ONCE() before the loop. This preserves the
zero-copy optimization for the last slave while making the "last"
determination stable against concurrent list mutations.
The UAF can trigger the following crash:
==================================================================
BUG: KASAN: slab-use-after-free in skb_clone
Read of size 8 at addr ffff888100ef8d40 by task exploit/147
The buggy address belongs to the object at ffff888100ef8c80
which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 192 bytes inside of
freed 224-byte region [ffff888100ef8c80, ffff888100ef8d60)
Memory state around the buggy address: ffff888100ef8c00: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc ffff888100ef8c80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888100ef8d00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
^ ffff888100ef8d80: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb ffff888100ef8e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Fixes: 4e5bd03ae346 ("net: bonding: fix bond_xmit_broadcast return value error bug") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Link: https://patch.msgid.link/20260326075553.3960562-1-xmei5@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Takashi Iwai [Tue, 31 Mar 2026 08:12:17 +0000 (10:12 +0200)]
ALSA: ctxfi: Don't enumerate SPDIF1 at DAIO initialization
The recent refactoring of xfi driver changed the assignment of
atc->daios[] at atc_get_resources(); now it loops over all enum
DAIOTYP entries while it looped formerly only a part of them.
The problem is that the last entry, SPDIF1, is a special type that
is used only for hw20k1 CTSB073X model (as a replacement of SPDIFIO),
and there is no corresponding definition for hw20k2. Due to the lack
of the info, it caused a kernel crash on hw20k2, which was already
worked around by the commit b045ab3dff97 ("ALSA: ctxfi: Fix missing
SPDIFI1 index handling").
This patch addresses the root cause of the regression above properly,
simply by skipping the incorrect SPDIF1 type in the parser loop.
For making the change clearer, the code is slightly arranged, too.
songxiebing [Tue, 31 Mar 2026 03:36:50 +0000 (11:36 +0800)]
ALSA: hda/realtek: Add quirk for Lenovo Yoga Slim 7 14AKP10
The Pin Complex 0x17 (bass/woofer speakers) is incorrectly reported as
unconnected in the BIOS (pin default 0x411111f0 = N/A). This causes the
kernel to configure speaker_outs=0, meaning only the tweeters (pin 0x14)
are used. The result is very low, tinny audio with no bass.
The existing quirk ALC287_FIXUP_YOGA9_14IAP7_BASS_SPK_PIN (already present
in patch_realtek.c for SSID 0x17aa3801) fixes the issue completely.
Pengpeng Hou [Sat, 28 Mar 2026 23:43:56 +0000 (07:43 +0800)]
bnxt_en: set backing store type from query type
bnxt_hwrm_func_backing_store_qcaps_v2() stores resp->type from the
firmware response in ctxm->type and later uses that value to index
fixed backing-store metadata arrays such as ctx_arr[] and
bnxt_bstore_to_trace[].
ctxm->type is fixed by the current backing-store query type and matches
the array index of ctx->ctx_arr. Set ctxm->type from the current loop
variable instead of depending on resp->type.
Also update the loop to advance type from next_valid_type in the for
statement, which keeps the control flow simpler for non-valid and
unchanged entries.
Fixes: 6a4d0774f02d ("bnxt_en: Add support for new backing store query firmware API") Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Tested-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260328234357.43669-1-pengpeng@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lorenzo Bianconi [Sun, 29 Mar 2026 10:32:27 +0000 (12:32 +0200)]
net: airoha: Delay offloading until all net_devices are fully registered
Netfilter flowtable can theoretically try to offload flower rules as soon
as a net_device is registered while all the other ones are not
registered or initialized, triggering a possible NULL pointer dereferencing
of qdma pointer in airoha_ppe_set_cpu_port routine. Moreover, if
register_netdev() fails for a particular net_device, there is a small
race if Netfilter tries to offload flowtable rules before all the
net_devices are properly unregistered in airoha_probe() error patch,
triggering a NULL pointer dereferencing in airoha_ppe_set_cpu_port
routine. In order to avoid any possible race, delay offloading until
all net_devices are registered in the networking subsystem.