Linus Torvalds [Fri, 5 Jun 2026 17:38:45 +0000 (10:38 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"arm64:
- Correctly drop the ITS translation cache reference when it actually
gets invalidated
- Take the SRCU lock for SW page table walks
- Restore POR_EL0 access to host EL0, avoiding POR_EL0 becoming
inaccessible from EL0 after running a guest
- Reassign nested_mmus array behind mmu_lock, ensuring that vcpu init
and MMU notifiers are mutually exclusive
- Correctly handle FEAT_XNX at stage-2
s390:
- More fixes for the new page table management and nested
virtualization
x86:
- More fixes for GHCB issues:
- Read start/end indices of page size change requests exactly once
per vmexit
- Unmap and unpin the GHCB as needed on vCPU free"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (23 commits)
KVM: arm64: Correctly identify executable PTEs at stage-2
KVM: arm64: nv: Fix handling of XN[0] when !FEAT_XNX
KVM: arm64: Reassign nested_mmus array behind mmu_lock
KVM: arm64: Restore POR_EL0 access to host EL0
KVM: arm64: Take the SRCU lock for page table walks in fault injection and AT emulation
KVM: arm64: vgic-its: Drop the translation cache reference only for the erased entry
KVM: SEV: Unmap and unpin the GHCB as needed on vCPU free
KVM: SEV: Decouple the need to sync the GHCB SA from the need to free the SA
KVM: SEV: Move sev_free_vcpu() down below sev_es_unmap_ghcb()
KVM: Don't WARN if memory is dirtied without a vCPU when the VM is dying
KVM: SEV: Read start/end indices of PSC requests exactly once per #VMGEXIT
KVM: SEV: Add an anonymous "psc" struct to track current PSC metadata
KVM: SEV: Make it more obvious when KVM is writing back the current PSC index
KVM: s390: Remove ptep_zap_softleaf_entry()
KVM: s390: Fix possible reference leak in fault-in code
KVM: s390: Prevent memslots outside the ASCE range
KVM: s390: Lock pte when making page secure
KVM: s390: Fix fault-in code
KVM: s390: vsie: Fix rmap handling in _do_shadow_crste()
KVM: s390: Fix guest / virtual address confusion in _essa_clear_cbrl()
...
Linus Torvalds [Fri, 5 Jun 2026 17:33:32 +0000 (10:33 -0700)]
Merge tag 'probes-fixes-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing/probes fix from Masami Hiramatsu:
"Fix the eprobe event parser to point error position correctly"
* tag 'probes-fixes-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/probes: Point the error offset correctly for eprobe argument error
Zhou Yuhang [Wed, 20 May 2026 07:08:00 +0000 (15:08 +0800)]
kconfig: Fix repeated include selftest expectation
The err_repeated_inc test was added with an expected stderr fixture
that does not match the diagnostic printed by kconfig.
Running "make testconfig" currently fails in that test even though the
parser reports the duplicated include correctly:
[stderr]
Kconfig.inc1:4: error: repeated inclusion of Kconfig.inc3
Kconfig.inc2:3: note: location of first inclusion of Kconfig.inc3
The fixture expects "Repeated" and "Location" with capital letters, but
the diagnostic emitted by scripts/kconfig/util.c uses lowercase words.
Update the fixture to match the real message.
Fixes: 102d712ded3e ("kconfig: Error out on duplicated kconfig inclusion") Signed-off-by: Zhou Yuhang <zhouyuhang@kylinos.cn> Tested-by: Nicolas Schier <nsc@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://patch.msgid.link/20260520070800.2265479-1-zhouyuhang1010@163.com Signed-off-by: Nicolas Schier <nsc@kernel.org>
Linus Torvalds [Fri, 5 Jun 2026 15:34:32 +0000 (08:34 -0700)]
Merge tag 'xfs-fixes-7.1-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
"A collection of fixes mostly for the RT device, including a small
refactor that has no functional change"
* tag 'xfs-fixes-7.1-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: Remove mention of PageWriteback
xfs: abort mount if xfs_fs_reserve_ag_blocks fails
xfs: factor rtgroup geom write pointer reporting into a helper
xfs: drop the RTG reference later in xfs_ioc_rtgroup_geometry
xfs: fix rtgroup cleanup in CoW fork repair
xfs: fix error returns in CoW fork repair
xfs: fix overlapping extents returned for pNFS LAYOUTGET
xfs: fix use of uninitialized imap in xfs_fs_map_blocks error path
xfs: handle racing deletions in xfs_zone_gc_iter_irec
Linus Torvalds [Fri, 5 Jun 2026 15:28:10 +0000 (08:28 -0700)]
Merge tag 'erofs-for-7.1-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
- Fix a UAF of sbi->sync_decompress when compressed I/Os
race with unmount
- Fix a regression introduced this development cycle that
incorrectly rejects multiple-algorithm images
* tag 'erofs-for-7.1-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix EFSCORRUPTED on multi-algorithm images in z_erofs_map_sanity_check()
erofs: fix use-after-free on sbi->sync_decompress
Linus Torvalds [Fri, 5 Jun 2026 15:23:02 +0000 (08:23 -0700)]
Merge tag 'v7.1-rc7-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Fix use after free in SMB2_CANCEL
- Fix race in ksmbd_reopen_durable_fd
- Fix oplock and lease break potential NULL-dref
* tag 'v7.1-rc7-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix use-after-free of a deferred file_lock on double SMB2_CANCEL
ksmbd: fix durable reconnect double-bind race in ksmbd_reopen_durable_fd
ksmbd: fix NULL-deref of opinfo->conn in oplock/lease break notifiers
Oliver Upton [Tue, 2 Jun 2026 16:59:01 +0000 (09:59 -0700)]
KVM: arm64: Correctly identify executable PTEs at stage-2
KVM invalidates the I-cache before installing an executable PTE on
implementations without DIC. Unfortunately, support for FEAT_XNX
broke this check as KVM_PTE_LEAF_ATTR_HI_S2_XN was expanded to a
bitfield.
Fix it by reusing kvm_pgtable_stage2_pte_prot() and testing the abstract
permission bits instead.
Fixes: 2608563b466b ("KVM: arm64: Add support for FEAT_XNX stage-2 permissions") Reported-by: Sashiko (gemini/gemini-3.1-pro-preview) Signed-off-by: Oliver Upton <oupton@kernel.org> Reviewed-by: Wei-Lin Chang <weilin.chang@arm.com> Link: https://patch.msgid.link/20260602165901.52800-3-oupton@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org
Oliver Upton [Tue, 2 Jun 2026 16:59:00 +0000 (09:59 -0700)]
KVM: arm64: nv: Fix handling of XN[0] when !FEAT_XNX
XN has already been extracted from its bitfield position so using
FIELD_PREP() on the mask that clears XN[0] is completely broken, having
the effect of unconditionally granting execute permissions...
Fix the obvious mistake by manipulating the right bit.
Cc: stable@vger.kernel.org Fixes: d93febe2ed2e ("KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2") Reviewed-by: Wei-Lin Chang <weilin.chang@arm.com> Signed-off-by: Oliver Upton <oupton@kernel.org> Link: https://patch.msgid.link/20260602165901.52800-2-oupton@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
kvm->arch.nested_mmus[] is walked under kvm->mmu_lock, including from the
MMU notifier path (kvm_unmap_gfn_range() -> kvm_nested_s2_unmap()), which
can run at any time. kvm_vcpu_init_nested() reallocates the array and frees
the old buffer while holding only kvm->arch.config_lock, so such a walker
can reference the freed array.
Allocate the new array outside of mmu_lock, as the allocation can sleep.
Under the lock, copy the existing entries, fix up the back pointers and
reassign the array. Free the old buffer after dropping the lock, as
kvfree() can sleep as well.
Fixes: 4f128f8e1aaac ("KVM: arm64: nv: Support multiple nested Stage-2 mmu structures") Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Reviewed-by: Oliver Upton <oupton@kernel.org> Link: https://patch.msgid.link/aiKIVVeIr1aAB1yp@v4bel Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger,kernel.org
Joey Gouly [Thu, 4 Jun 2026 10:54:34 +0000 (11:54 +0100)]
KVM: arm64: Restore POR_EL0 access to host EL0
CPTR_EL2.E0POE was being cleared in __deactivate_cptr_traps_vhe(), which meant
that any accesses to POR_EL0 from host EL0 would trap and be reported to
userspace as an Illegal instruction. This would happen after running any VM,
regardless if it used POE or not.
Hyunwoo Kim [Wed, 3 Jun 2026 12:09:33 +0000 (21:09 +0900)]
KVM: arm64: Take the SRCU lock for page table walks in fault injection and AT emulation
walk_s1() and kvm_walk_nested_s2() expect to be called while holding
kvm->srcu to guard against memslot changes. While this is generally
the case, __kvm_at_s12() and __kvm_find_s1_desc_level() call into the
respective walkers without taking kvm->srcu.
Fix by acquiring kvm->srcu prior to the table walk in both instances.
Cc: stable@vger.kernel.org Fixes: 50f77dc87f13 ("KVM: arm64: Populate level on S1PTW SEA injection") Fixes: be04cebf3e78 ("KVM: arm64: nv: Add emulation of AT S12E{0,1}{R,W}") Suggested-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Reviewed-by: Oliver Upton <oupton@kernel.org> Link: https://patch.msgid.link/aiAZfdeyanIvP8SD@v4bel Signed-off-by: Marc Zyngier <maz@kernel.org>
Hyunwoo Kim [Mon, 1 Jun 2026 14:53:26 +0000 (23:53 +0900)]
KVM: arm64: vgic-its: Drop the translation cache reference only for the erased entry
vgic_its_invalidate_cache() walks the per-ITS translation cache with
xa_for_each() and drops the cache's reference on each entry with
vgic_put_irq(). It puts the iterated pointer, though, rather than the
value returned by xa_erase().
The function is called from contexts that do not exclude one another: the
ITS command handlers hold its_lock, the GITS_CTLR write path holds
cmd_lock, and the path that clears EnableLPIs in a redistributor's
GICR_CTLR holds neither. Two or more of them can drain the same cache
concurrently, and if each one observes the same entry, erases it and then
puts it, the single reference the cache holds on that entry is dropped
more than once. The entry can then be freed while an ITE still maps it.
xa_erase() is atomic and returns the previous entry, so put only the entry
that this context actually removed. The cache reference is then dropped
exactly once per entry even when the invalidations run concurrently, and
the behavior is unchanged when only one context runs.
Fixes: 8201d1028caa ("KVM: arm64: vgic-its: Maintain a translation cache per ITS") Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Reviewed-by: Oliver Upton <oupton@kernel.org> Link: https://patch.msgid.link/ah2c5lu4JbUg7dj-@v4bel Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org
Linus Torvalds [Thu, 4 Jun 2026 21:35:55 +0000 (14:35 -0700)]
Merge tag 'net-7.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from Netfilter, wireless and Bluetooth.
Current release - fix to a fix:
- Bluetooth: MGMT: fix backward compatibility with bluetoothd
which adds stray bytes to MGMT_OP_ADD_EXT_ADV_DATA
Previous releases - regressions:
- af_unix: fix inq_len update inaccuracy on partial read
- eth: fec: fix pinctrl default state restore order on resume
- wifi: iwlwifi:
- mvm: don't support the reset handshake for old firmwares
- pcie: simplify the resume flow if fast resume is not used,
work around NIC access failures
Previous releases - always broken:
- Bluetooth: L2CAP: reject BR/EDR signaling packets over MTUsig
- sctp: fix a couple of bugs in COOKIE_ECHO processing
- sched: fix pedit partial COW leading to page cache corruption
- wifi: nl80211: reject oversized EMA RNR lists
- netfilter:
- conntrack_irc: fix possible out-of-bounds read
- bridge: make ebt_snat ARP rewrite writable
- appletalk: zero-initialize aarp_entry to prevent heap info leak
- ipv4: restrict IPOPT_SSRR and IPOPT_LSRR options
- mptcp: fix number of bugs reported by AI scans and discovered
during NVMe over MPTCP testing"
* tag 'net-7.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
Reapply "bnxt_en: bring back rtnl_lock() in the bnxt_open() path"
udp: clear skb->dev before running a sockmap verdict
sctp: purge outqueue on stale COOKIE-ECHO handling
bonding: annotate data-races arcound churn variables
net/802/mrp: fix vector attribute parsing in mrp_pdu_parse_vecattr
rtase: Avoid sleeping in get_stats64()
ieee802154: 6lowpan: only accept IPv6 packets in lowpan_xmit()
ipv6: mcast: Fix use-after-free when processing MLD queries
selftests: net: add vxlan vnifilter notification test
vxlan: vnifilter: fix spurious notification on VNI update
vxlan: vnifilter: send notification on VNI add
rtase: Reset TX subqueue when clearing TX ring
octeontx2-af: npc: Fix CPT channel mask in npc_install_flow
dt-bindings: ethernet: eswin: fix hsp-sp-csr backward compatibility
sctp: validate cached peer INIT chunk length in COOKIE_ECHO processing
net/sched: fix pedit partial COW leading to page cache corruption
vsock/vmci: fix sk_ack_backlog leak on failed handshake
net: bonding: fix NULL pointer dereference in bond_do_ioctl()
geneve: fix length used in GRO hint UDP checksum adjustment
net: ethernet: mtk_eth_soc: Fix use-after-free in metadata dst teardown
...
Linus Torvalds [Thu, 4 Jun 2026 20:38:42 +0000 (13:38 -0700)]
Merge tag 'trace-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fix from Steven Rostedt:
- Fix CFI violation in probestub function
The probestub is a function to allow tprobes to hook to a tracepoint
to gain access to its parameters.
The function itself is only referenced by the tracepoint structure
which lives in the __tracepoint section. objtool explicitly ignores
that section and when processing functions in the kernel, if it
detects one that has no references it will seal it to have its ENDBR
stripped on boot up.
This means the probstub function will have its ENDBR stripped and if
a tprobe is attached to it with IBT enabled, it will go *boom*.
* tag 'trace-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix CFI violation in probestub being called by tprobes
Linus Torvalds [Thu, 4 Jun 2026 19:31:20 +0000 (12:31 -0700)]
Merge tag 's390-7.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:
- Enable IOMMUFD and VFIO cdev such that PCI pass-through to
QEMU/KVM can optionally utilize native IOMMUFD
- With HAVE_ARCH_BUG_FORMAT enabled the BUG infrastructure might
misinterpret flags or fault. Fix this by moving the "format"
field emission into __BUG_ENTRY()
- The generic version of _THIS_IP_ is known to be brittle and may
break with current and future GCC and Clang optimizations. Fix
it by overriding _THIS_IP_
* tag 's390-7.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: Implement _THIS_IP_ using inline asm
s390/bug: Always emit format word in __BUG_ENTRY
s390/configs: Enable IOMMUFD and VFIO cdev in defconfigs
Breno reports a lockdep warning in bnxt. During FW reset the driver
may end up calling netif_set_real_num_tx_queues() (if queue count
changes), so calls to bnxt_open() still require rtnl_lock.
Sechang Lim [Wed, 3 Jun 2026 16:27:33 +0000 (16:27 +0000)]
udp: clear skb->dev before running a sockmap verdict
On the UDP receive path skb->dev is repurposed as dev_scratch (the
truesize/state cache set by udp_set_dev_scratch()), through the
union { struct net_device *dev; unsigned long dev_scratch; } in sk_buff.
When a UDP socket is in a sockmap, sk_data_ready is
sk_psock_verdict_data_ready(), which calls udp_read_skb() -> recv_actor()
(sk_psock_verdict_recv) to run the attached SK_SKB verdict program in softirq.
If that program calls a socket-lookup helper (bpf_sk_lookup_tcp/udp,
bpf_skc_lookup_tcp), bpf_skc_lookup() does:
if (skb->dev)
caller_net = dev_net(skb->dev);
skb->dev still holds the dev_scratch value (a non-NULL integer), so dev_net()
dereferences it as a struct net_device * and the kernel takes a general
protection fault on a non-canonical address in softirq:
The rmem charge that dev_scratch accounted for is released by skb_recv_udp() on
dequeue, just above, so the scratch is dead by the time recv_actor() runs. Clear
skb->dev so bpf_skc_lookup() falls back to sock_net(skb->sk), which
skb_set_owner_sk_safe() set just above.
Fixes: 965b57b469a5 ("net: Introduce a new proto_ops ->read_skb()") Cc: stable@vger.kernel.org Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260603162737.697215-1-rhkrqnwk98@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Wed, 3 Jun 2026 18:11:44 +0000 (14:11 -0400)]
sctp: purge outqueue on stale COOKIE-ECHO handling
sctp_stream_update() is only invoked when the association is moved into
COOKIE_WAIT during association setup/reconfiguration. In this path, the
outbound stream scheduler state (stream->out_curr) is expected to be
clean, since no user data should have been transmitted yet unless the
state machine has already partially progressed.
However, a corner case exists in sctp_sf_do_5_2_6_stale(): when a
Stale Cookie ERROR is received, the association is rolled back from
COOKIE_ECHOED to COOKIE_WAIT. In this scenario, user data may already
have been queued and even bundled with the COOKIE-ECHO chunk.
During the rollback, sctp_stream_update() frees the old stream table
and installs a new one, but it does not invalidate stream->out_curr.
As a result, out_curr may still point to a freed sctp_stream_out
entry from the previous stream state.
Later, SCTP scheduler dequeue paths (FCFS, RR, PRIO, etc.) rely on
stream->out_curr->ext, which can lead to use-after-free once the old
stream state has been released via sctp_stream_free().
This results in crashes such as (reported by Yuqi):
BUG: KASAN: slab-use-after-free in sctp_sched_fcfs_dequeue+0x13a/0x140
Read of size 8 at addr ff1100004d4d3208 by task mini_poc/9312
CPU: 1 UID: 1001 PID: 9312 Comm: mini_poc Not tainted 7.1.0-rc1-00305-gbd3a4795d574 #5 PREEMPT(full)
sctp_sched_fcfs_dequeue+0x13a/0x140
sctp_outq_flush+0x1603/0x33e0
sctp_do_sm+0x31c9/0x5d30
sctp_assoc_bh_rcv+0x392/0x6f0
sctp_inq_push+0x1db/0x270
sctp_rcv+0x138d/0x3c10
Fix this by fully purging the association outqueue when handling the
Stale Cookie case. This ensures all pending transmit and retransmit
state is dropped, and any scheduler cached pointers are invalidated,
making it safe to rebuild stream state during COOKIE_WAIT restart.
Updating only stream->out_curr would be insufficient, since queued
and retransmittable data would still reference the old stream state and
trigger later use-after-free in dequeue paths.
Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Reported-by: Yuqi Xu <xuyq21@lenovo.com> Reported-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/94318159b9052907a6cbb7256aee8b5f8dfbfccb.1780510304.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
These fields are updated asynchronously by the bonding state machine
in ad_churn_machine() while holding bond->mode_lock.
bond_info_show_slave() and bond_fill_slave_info() read them without
bond->mode_lock being held, we need to add READ_ONCE() and
WRITE_ONCE() annotations.
Note that AD_CHURN_MONITOR, AD_CHURN, and AD_NO_CHURN are defined
exclusively in (kernel private) include/net/bond_3ad.h header.
They should be moved to include/uapi/linux/if_bonding.h or userspace
tools will have to hardcode their values.
Fixes: 4916f2e2f3fc ("bonding: print churn state via netlink") Fixes: 14c9551a32eb ("bonding: Implement port churn-machine (AD standard 43.4.17).") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260603123514.388226-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yizhou Zhao [Wed, 3 Jun 2026 06:00:13 +0000 (14:00 +0800)]
net/802/mrp: fix vector attribute parsing in mrp_pdu_parse_vecattr
In mrp_pdu_parse_vecattr(), vector attribute events are encoded three
per byte and valen tracks the number of events left to process.
The parser decrements valen after processing the first and second events
from each event byte, but not after processing the third one. When valen
is exactly a multiple of three, the loop continues after the last valid
event and consumes the next byte as a new event byte, applying a
spurious event to the MRP applicant state.
Additionally, when valen is zero the parser unconditionally consumes
attrlen bytes as FirstValue and advances the offset, even though per
IEEE 802.1ak a VectorAttribute with only a LeaveAllEvent has valen of
zero and no FirstValue or Vector fields. This corrupts the offset for
subsequent PDU parsing.
Also, when valen exceeds three the loop crosses byte boundaries but
the attribute value is not incremented between the last event of one
byte and the first event of the next. This causes the first event of
the next byte to use the same attribute value as the third event
rather than the next consecutive value.
Decrement valen after processing the third event, skip FirstValue
consumption when valen is zero, and increment the attribute value at
the end of each loop iteration.
Fixes: febf018d2234 ("net/802: Implement Multiple Registration Protocol (MRP)") Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn> Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn> Reported-by: Ao Wang <wangao@seu.edu.cn> Reported-by: Xuewei Feng <fengxw06@126.com> Reported-by: Qi Li <qli01@tsinghua.edu.cn> Reported-by: Ke Xu <xuke@tsinghua.edu.cn> Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn> Link: https://patch.msgid.link/20260603060016.21522-1-zhaoyz24@mails.tsinghua.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Justin Lai [Wed, 3 Jun 2026 06:18:16 +0000 (14:18 +0800)]
rtase: Avoid sleeping in get_stats64()
The .ndo_get_stats64 callback must not sleep because it can be
called when reading /proc/net/dev.
rtase_get_stats64() calls rtase_dump_tally_counter(), which polls
the tally counter dump bit with read_poll_timeout(). This may
sleep while waiting for the hardware counter dump to complete.
Use read_poll_timeout_atomic() instead to avoid sleeping in the
get_stats64() path.
Eric Dumazet [Wed, 3 Jun 2026 07:29:55 +0000 (07:29 +0000)]
ieee802154: 6lowpan: only accept IPv6 packets in lowpan_xmit()
The aoe driver (or similar) generates a non-IPv6 packet
(e.g., ETH_P_AOE) and queues it for transmission via dev_queue_xmit()
on a 6LoWPAN interface (configured by the user or test case).
Since the packet is not IPv6, the 6LoWPAN header_ops->create function
(lowpan_header_create or header_create) returns early without initializing
the lowpan_addr_info structure in the skb headroom.
In the transmit function (lowpan_xmit), the driver calls lowpan_header
(or setup_header) which unconditionally copies and uses the lowpan_addr_info
from the headroom, which contains uninitialized data.
Fix this by dropping non IPv6 packets.
A similar fix is needed in net/bluetooth/6lowpan.c bt_xmit().
Ido Schimmel [Wed, 3 Jun 2026 10:18:11 +0000 (13:18 +0300)]
ipv6: mcast: Fix use-after-free when processing MLD queries
When processing an MLD query, a pointer to the multicast group address
is retrieved when initially parsing the packet. This pointer is later
dereferenced without being reloaded despite the fact that the skb header
might have been reallocated following the pskb_may_pull() calls, leading
to a use-after-free [1].
Fix by copying the multicast group address when the packet is initially
parsed.
[1]
BUG: KASAN: slab-use-after-free in __mld_query_work (net/ipv6/mcast.c:1512)
Read of size 8 at addr ffff8881154b8e90 by task kworker/4:1/118
When a vxlan device has vnifilter enabled, userspace observers
(e.g., bridge monitor vni) miss VNI add events and see spurious
notifications on no-op VNI re-adds.
Patch 1 fixes the missing notification on VNI add: vxlan_vni_add()
guarded the notification on a 'changed' flag that vxlan_vni_update_group()
only sets when a multicast group or remote is supplied, so VNIs added
without a group (e.g., L3 VXLAN) were silently created.
Patch 2 fixes the spurious notification on VNI update: vxlan_vni_update()
tested 'if (changed)' against a bool pointer instead of dereferencing it,
so every re-add produced a notification regardless of whether anything
actually changed.
Patch 3 adds a selftest covering both bugs along with a few related
cases (add with remote, remote update, delete-nonexistent).
====================
Andy Roulin [Tue, 2 Jun 2026 18:51:38 +0000 (11:51 -0700)]
selftests: net: add vxlan vnifilter notification test
Add a selftest for VXLAN vnifilter netlink notifications that verifies
RTM_NEWTUNNEL and RTM_DELTUNNEL are sent correctly when VNIs are added,
deleted, or updated, and that no spurious notifications are sent when
a VNI is re-added with the same attributes.
Andy Roulin [Tue, 2 Jun 2026 18:51:37 +0000 (11:51 -0700)]
vxlan: vnifilter: fix spurious notification on VNI update
When a VNI is re-added with the same attributes (e.g. same group or no
group), vxlan_vni_update() sends a spurious RTM_NEWTUNNEL notification
even though nothing changed.
The bug is that 'if (changed)' tests whether the pointer is non-NULL,
not the bool value it points to. Since every caller passes a valid
pointer, the condition is always true and the notification fires
unconditionally.
Fix by dereferencing the pointer: 'if (*changed)'.
Reproducer:
# ip link add vxlan100 type vxlan dstport 4789 local 10.0.0.1 \
nolearning external vnifilter
# ip link set vxlan100 up
# bridge monitor vni &
# bridge vni add vni 1000 dev vxlan100
# bridge vni add vni 1000 dev vxlan100 # spurious notification
Fixes: f9c4bb0b245c ("vxlan: vni filtering support on collect metadata device") Signed-off-by: Andy Roulin <aroulin@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20260602185138.253265-3-aroulin@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Andy Roulin [Tue, 2 Jun 2026 18:51:36 +0000 (11:51 -0700)]
vxlan: vnifilter: send notification on VNI add
When a new VNI is added to a vxlan device with vnifilter enabled,
no RTM_NEWTUNNEL notification is sent to userspace. This means
'bridge monitor vni' never shows VNI add events, even though
VNI delete events are reported correctly.
The bug is in vxlan_vni_add(), where the notification is guarded by
'if (changed)'. The 'changed' flag is set by vxlan_vni_update_group()
only when the multicast group or remote IP is modified, but for a
new VNI added without a group (e.g. in L3 VxLAN interface scenarios),
the function returns early without setting changed=true. Since this
is a new VNI, the notification should be sent unconditionally.
The notification is not guarded by the return value of
vxlan_vni_update_group() because, at this point, the VNI has already
been inserted into the hash table and list with no rollback on error.
The VNI will be visible in 'bridge vni show' regardless, so userspace
should be informed. This is consistent with vxlan_vni_del() which also
notifies unconditionally.
The 'if (changed)' guard remains correct in vxlan_vni_update(), which
handles the case where a VNI already exists and is being re-added --
there, we only want to notify if the group/remote actually changed.
Reproducer:
# ip link add vxlan100 type vxlan dstport 4789 local 10.0.0.1 \
nolearning external vnifilter
# ip link set vxlan100 up
# bridge monitor vni &
# bridge vni add vni 1000 dev vxlan100 # no notification
# bridge vni delete vni 1000 dev vxlan100 # notification received
Fixes: f9c4bb0b245c ("vxlan: vni filtering support on collect metadata device") Reported-by: Chirag Shah <chirag@nvidia.com> Signed-off-by: Andy Roulin <aroulin@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20260602185138.253265-2-aroulin@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
octeontx2-af: npc: Fix CPT channel mask in npc_install_flow
Use the CPT-aware NIX channel mask in the npc_install_flow path so that
when the host PF installs steering rules in kernel for a VF used from
userspace (e.g. DPDK), MCAM entries see the same channel mask semantics as
other RX paths.
Fixes: 56bcef528bd8 ("octeontx2-af: Use npc_install_flow API for promisc and broadcast entries") Cc: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Link: https://patch.msgid.link/20260602045853.1558530-1-rkannoth@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Tue, 2 Jun 2026 01:06:06 +0000 (21:06 -0400)]
sctp: validate cached peer INIT chunk length in COOKIE_ECHO processing
When a listening SCTP server processes a COOKIE_ECHO chunk, the cached
peer INIT chunk embedded after the cookie is parsed and its parameters
are later walked by sctp_process_init() using sctp_walk_params().
However, the chunk header length of this cached INIT chunk was not
validated against the remaining buffer in the COOKIE_ECHO payload. If
the length field is inflated, the parameter walk can run beyond the
actual received data, leading to out-of-bounds reads and potential
memory corruption during later parameter handling (e.g. STATE_COOKIE
processing and kmemdup() copies).
Add a bounds check in sctp_unpack_cookie() to ensure the cached INIT
chunk length does not exceed the available data in the COOKIE_ECHO
buffer before it is used.
Rajat Gupta [Sun, 31 May 2026 12:32:21 +0000 (08:32 -0400)]
net/sched: fix pedit partial COW leading to page cache corruption
tcf_pedit_act() computes the COW range for skb_ensure_writable()
once before the key loop using tcfp_off_max_hint, but the hint does
not account for the runtime header offset added by typed keys. This
can leave part of the write region un-COW'd.
Fix by moving skb_ensure_writable() inside the per-key loop where
the actual write offset is known, and add overflow checking on the
offset arithmetic. For negative offsets (e.g. Ethernet header edits
at ingress), use skb_cow() to COW the headroom instead. Guard
offset_valid() against INT_MIN, where negation is undefined.
Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable") Reported-by: Yiming Qian <yimingqian591@gmail.com> Reported-by: Keenan Dong <keenanat2000@gmail.com> Reported-by: Han Guidong <2045gemini@gmail.com> Reported-by: Zhang Cen <rollkingzzc@gmail.com> Reviewed-by: Han Guidong <2045gemini@gmail.com> Tested-by: Han Guidong <2045gemini@gmail.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Tested-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Tested-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Victor Nogueira <victor@mojatatu.com> Tested-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com> Link: https://patch.msgid.link/20260531123221.48732-1-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Raf Dickson [Tue, 26 May 2026 10:43:56 +0000 (10:43 +0000)]
vsock/vmci: fix sk_ack_backlog leak on failed handshake
When vmci_transport_recv_connecting_server() returns an error,
vmci_transport_recv_listen() calls vsock_remove_pending() but never
calls sk_acceptq_removed(). This leaves sk_ack_backlog incremented
permanently.
Repeated handshake failures (malformed packets, queue pair alloc
failure, event subscribe failure) cause sk_ack_backlog to climb
toward sk_max_ack_backlog. Once it reaches the limit the listener
permanently refuses all new connections with -ECONNREFUSED, a
silent denial of service requiring a process restart to recover.
The two existing sk_acceptq_removed() calls in af_vsock.c do not
cover this path: line 764 checks vsock_is_pending() which returns
false after vsock_remove_pending(), and line 1889 is only reached
on successful accept().
Fix by balancing sk_acceptq_added() with sk_acceptq_removed() on
the error path.
ZhaoJinming [Mon, 1 Jun 2026 08:56:49 +0000 (16:56 +0800)]
net: bonding: fix NULL pointer dereference in bond_do_ioctl()
In bond_do_ioctl(), slave_dev is obtained via __dev_get_by_name() which
can return NULL if the requested interface name does not exist. However,
the subsequent slave_dbg() call is placed before the NULL check:
The slave_dbg() macro expands to netdev_dbg(bond_dev, "(slave %s): " fmt,
(slave_dev)->name, ...) which unconditionally dereferences slave_dev->name
before the NULL check is performed. This results in a NULL pointer
dereference kernel oops when a user calls bonding ioctl (e.g.
SIOCBONDENSLAVE, SIOCBONDRELEASE, etc.) with a non-existent slave
interface name.
This is reachable from userspace via the bonding ioctl interface with
CAP_NET_ADMIN capability, making it a potential local denial-of-service
vector.
Fix by moving the slave_dbg() call after the NULL check.
Eva Kurchatova [Wed, 3 Jun 2026 15:31:42 +0000 (18:31 +0300)]
tracing: Fix CFI violation in probestub being called by tprobes
The probestub is a function to allow tprobes to hook to a tracepoint to
gain access to its parameters. The function itself is only referenced by
the tracepoint structure which lives in the __tracepoint section. objtool
explicitly ignores that section and when processing functions in the
kernel, if it detects one that has no references it will seal it to have
its ENDBR stripped on boot up.
This means when a tprobe is attached to the sched_wakeup tracepoint, when it
is triggered it will call __probestub_sched_wakeup and due to the missing
ENDBR on a CFI-enabled machine it will take a #CP exception.
Fix this by adding CFI_NOSEAL annotation to probestub declaration.
Cc: stable@vger.kernel.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://patch.msgid.link/20260603153147.573589-1-eva.kurchatova@virtuozzo.com Fixes: d5173f753750 ("objtool: Exclude __tracepoints data from ENDBR checks") Signed-off-by: Eva Kurchatova <eva.kurchatova@virtuozzo.com>
[ Updated change log ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Antoine Tenart [Fri, 29 May 2026 14:47:00 +0000 (16:47 +0200)]
geneve: fix length used in GRO hint UDP checksum adjustment
In geneve_post_decap_hint the length used for adjusting the UDP checksum
should be 'skb->len - gro_hint->nested_tp_offset' (UDP length) instead
of 'skb->len - gro_hint->nested_nh_offset' (IP length).
Fixes: fd0dd796576e ("geneve: use GRO hint option in the RX path") Cc: Paolo Abeni <pabeni@redhat.com> Reported-by: Sashiko <sashiko-bot@kernel.org> Closes: https://sashiko.dev/#/patchset/20260521131436.748832-1-jhs%40mojatatu.com Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260529144713.780938-1-atenart@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
Fix use-after-free in metadata dst teardown in airoha_eth and mtk_eth_soc drivers
airoha_metadata_dst_free() and mtk_free_dev() call metadata_dst_free()
which frees the metadata_dst with kfree() immediately, bypassing the RCU
grace period.
Replace metadata_dst_free() with dst_release() which properly goes
through the refcount path and runs call_rcu_hurry() if refcount goes to
zero.
====================
net: ethernet: mtk_eth_soc: Fix use-after-free in metadata dst teardown
mtk_free_dev() calls metadata_dst_free() which frees the metadata_dst
with kfree() immediately, bypassing the RCU grace period.
In the RX path, skb_dst_set_noref() sets a non-refcounted pointer from
the skb to the metadata_dst. This function requires RCU read-side
protection and the dst must remain valid until all RCU readers complete.
Since metadata_dst_free() calls kfree() directly, a use-after-free can
occur if any skb still holds a noref pointer to the dst when the driver
tears it down.
Replace metadata_dst_free() with dst_release() which properly goes
through the refcount path: when the refcount drops to zero, it schedules
the actual free via call_rcu_hurry(), ensuring all RCU readers have
completed before the memory is freed.
net: airoha: Fix use-after-free in metadata dst teardown
airoha_metadata_dst_free() runs metadata_dst_free() which frees the
metadata_dst with kfree() immediately, bypassing the RCU grace period.
In the RX path, skb_dst_set_noref() sets a non-refcounted pointer from
the skb to the metadata_dst. This function requires RCU read-side
protection and the dst must remain valid until all RCU readers complete.
Since metadata_dst_free() calls kfree() directly, an use-after-free can
occur if any skb still holds a noref pointer to the dst when the driver
tears it down.
Replace metadata_dst_free() with dst_release() which properly goes
through the refcount path: when the refcount drops to zero, it schedules
the actual free via call_rcu_hurry(), ensuring all RCU readers have
completed before the memory is freed.
Jakub Kicinski [Thu, 4 Jun 2026 02:07:46 +0000 (19:07 -0700)]
Merge tag 'for-net-2026-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_core: fix memory leak in error path of hci_alloc_dev()
- hci_sync: reject oversized Broadcast Announcement prepend
- MGMT: Fix backward compatibility with userspace
- MGMT: validate advertising TLV before type checks
- L2CAP: reject BR/EDR signaling packets over MTUsig
- RFCOMM: validate skb length in MCC handlers
- RFCOMM: hold listener socket in rfcomm_connect_ind()
- ISO: Fix not releasing hdev reference on iso_conn_big_sync
- ISO: Fix a use-after-free of the hci_conn pointer
- ISO: Fix data-race on iso_pi fields in hci_get_route calls
- SCO: Fix data-race on sco_pi fields in sco_connect
- BNEP: reject short frames before parsing
* tag 'for-net-2026-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: MGMT: Fix backward compatibility with userspace
Bluetooth: SCO: Fix data-race on sco_pi fields in sco_connect
Bluetooth: ISO: Fix data-race on iso_pi fields in hci_get_route calls
Bluetooth: ISO: Fix a use-after-free of the hci_conn pointer
Bluetooth: ISO: Fix not releasing hdev reference on iso_conn_big_sync
Bluetooth: fix memory leak in error path of hci_alloc_dev()
Bluetooth: bnep: reject short frames before parsing
Bluetooth: hci_sync: reject oversized Broadcast Announcement prepend
Bluetooth: L2CAP: reject BR/EDR signaling packets over MTUsig
Bluetooth: RFCOMM: validate skb length in MCC handlers
Bluetooth: MGMT: validate advertising TLV before type checks
Bluetooth: RFCOMM: hold listener socket in rfcomm_connect_ind()
====================
Jakub Kicinski [Thu, 4 Jun 2026 02:07:34 +0000 (19:07 -0700)]
Merge tag 'wireless-2026-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Things are finally quieting down:
- iwlwifi:
- FW reset handshake removal for older devices
- NIC access fix in fast resume
- avoid too large command for some BIOSes
- fix TX power constraints in AP mode
- cfg80211:
- fix netlink parse overflow
- fix potential 6 GHz scan memory leak
- enforce HE/EHT consistency to avoid mac80211 crash
- mac80211: guard radiotap antenna parsing
* tag 'wireless-2026-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: cfg80211: enforce HE/EHT cap/oper consistency
wifi: fix leak if split 6 GHz scanning fails
wifi: mac80211: limit injected antenna index in ieee80211_parse_tx_radiotap
wifi: nl80211: reject oversized EMA RNR lists
wifi: iwlwifi: pcie: simplify the resume flow if fast resume is not used
wifi: iwlwifi: mvm: avoid oversized UATS command copy
wifi: iwlwifi: mld: send tx power constraints before link activation
wifi: iwlwifi: mvm: don't support the reset handshake for old firmwares
====================
Jakub Kicinski [Thu, 4 Jun 2026 02:04:46 +0000 (19:04 -0700)]
Merge branch 'mptcp-misc-fixes-for-v7-1-rc7'
Matthieu Baerts says:
====================
mptcp: misc fixes for v7.1-rc7
Here are various unrelated fixes:
- Patch 1: fix missing wakeups when multiple threads are reading from
the same fd. A fix for v5.7.
- Patch 2: fix retransmission loop when MPTCP checksum is enabled. A fix
for v5.14.
- Patch 3: fix a TOCTOU race while computing rcv_wnd. A fix for v5.11.
- Patch 4: allow subflows receive window to shrink if needed. A fix for
v5.19.
- Patches 5-6: avoid 'extra_subflows' to underflow with the userspace
PM. A fix for v5.19.
- Patch 7: report errors if one subflow cannot set SO_TIMESTAMPING. A
fix for v5.14.
- Patch 8: try to set TCP_MAXSEG on all subflows, before reporting
errors, if any. A fix for v6.17.
- Patch 9: check desc->count in read_sock, to act as expected. A fix
for v7.0.
- Patch 10: fix an uninit value in mptcp_established_options, reported
by syzbot. A fix for v7.1-rc1.
- Patch 11: fix a similar issue than the previous patch, exposed by the
same modification from v7.1-rc1, but was already causing issues since
v5.15.
====================
When an ADD_ADDR needs to be sent, it could be prepared if there is
enough remaining space and even if the packet is not a pure ACK. But it
would be dropped soon after.
Indeed, in mptcp_pm_add_addr_signal(), there is enough space to fit a
DSS of 20 octets and an ADD_ADDR echo containing an IPv4 address on 8
octets for example. In this case, the packet would be prepared, the
MPTCP_ADD_ADDR_ECHO bit would be removed from pm->addr_signal, but the
option would be silently dropped in mptcp_established_options_add_addr()
not to override DSS info in the union from 'struct mptcp_out_options',
and also because mptcp_write_options() will enforce mutually exclusion
with DSS.
Instead, don't even try to send an ADD_ADDR if it is not a pure ACK.
Retry for each new packet until a pure-ACK is emitted. That's fine to do
that, because each time an ADD_ADDR (echo) is scheduled, a pure ACK is
queued.
This also simplifies the code, and the skb checks can be done earlier,
before the lock.
Note: also, since commit 6d0060f600ad ("mptcp: Write MPTCP DSS headers
to outgoing data packets"), opts->ahmac would not have been set to 0
when other suboptions were not dropped, and when sending an ADD_ADDR
echo. That would have resulted in sending an ADD_ADDR using garbage
info, where there was not enough space, instead of an echo one without
the ADD_ADDR HMAC.
Local variable opts created at:
__tcp_transmit_skb+0x4d/0x5fe0 net/ipv4/tcp_output.c:1536
__tcp_send_ack+0x967/0xad0 net/ipv4/tcp_output.c:4499
The output path currently omits initializing the mptcp extension
`use_map` flag in a few corner cases.
Address the issue always zeroing all the extensions flags before
eventually initializing the individual bits. To that extent, introduce
and use a struct_group to avoid multiple bitwise operations.
Gang Yan [Tue, 2 Jun 2026 12:14:16 +0000 (22:14 +1000)]
mptcp: check desc->count in read_sock
__tcp_read_sock() checks desc->count after each skb is consumed and
breaks the loop when it reaches 0. The MPTCP variant lacks this check.
This is a functional bug, other subsystems also rely on this check:
TLS strparser sets desc->count to 0 once a full TLS record is assembled
and depends on this break to stop reading.
Add the same desc->count check to __mptcp_read_sock(), mirroring
__tcp_read_sock().
The mptcp_setsockopt_all_sf(), currently used only with TCP_MAXSEG,
stopped when one subflow returned an error.
Even if it is not wrong, this is different from the other helpers trying
to set the option on all subflows, and then returning an error if at
least one of them had an issue.
Follow this behaviour, for a question of uniformity.
sock_set_timestamping() can fail for different reasons. The returned
value should then be checked.
If sock_set_timestamping() fails for at least one subflow, the first
error is now reported to the userspace, similar to what is done with
other socket options.
Fixes: 9061f24bf82e ("mptcp: sockopt: propagate timestamp request to subflows") Cc: stable@vger.kernel.org Reported-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Closes: https://lore.kernel.org/willemdebruijn.kernel.178a41a53d041@gmail.com Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-7-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tao Cui [Tue, 2 Jun 2026 12:14:13 +0000 (22:14 +1000)]
selftests: mptcp: add test for extra_subflows underflow on userspace PM
Add a test to verify that when userspace PM fails to create a subflow
(e.g. using an unreachable address), the extra_subflows counter is not
decremented below zero.
Fixes: 77e4b94a3de6 ("mptcp: update userspace pm infos") Cc: stable@vger.kernel.org Signed-off-by: Tao Cui <cuitao@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-6-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tao Cui [Tue, 2 Jun 2026 12:14:12 +0000 (22:14 +1000)]
mptcp: pm: fix extra_subflows underflow on userspace PM subflow creation
The userspace PM increments extra_subflows after __mptcp_subflow_connect()
succeeds, but __mptcp_subflow_connect() calls mptcp_pm_close_subflow()
on failure to roll back the pre-increment done by the kernel PM's fill_*()
helpers. Because the userspace PM hasn't incremented yet at that point,
this decrement is spurious and causes extra_subflows to underflow.
Fix it by aligning the userspace PM with the kernel PM: increment
extra_subflows before calling __mptcp_subflow_connect(), so the existing
error path in subflow.c correctly rolls it back on failure. Also simplify
the error handling by taking pm.lock only when needed for cleanup.
Fixes: 77e4b94a3de6 ("mptcp: update userspace pm infos") Cc: stable@vger.kernel.org Signed-off-by: Tao Cui <cuitao@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-5-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Tue, 2 Jun 2026 12:14:11 +0000 (22:14 +1000)]
mptcp: allow subflow rcv wnd to shrink
In MPTCP connection, the `window` field in the TCP header refers to the
MPTCP-level rcv_nxt and it's right edge should not move backward. Such
constraint is enforced at DSS option generation time.
At the same time, the TCP stack ensures independently that the TCP-level
rcv wnd right's edge does not move backward. That in turn causes artificial
inflating of the MPTCP rcv window when the incoming data is acked at the
TCP level and is OoO in the MPTCP sequence space (or lands in the backlog).
As a consequence, the incoming traffic can exceed the receiver rcvbuf size
even when the sender is not misbehaving.
Prevent such scenario forcibly allowing the TCP subflow to shrink the
TCP-level rcv wnd regardless of the current netns setting.
Fixes: f3589be0c420 ("mptcp: never shrink offered window") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-4-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Tue, 2 Jun 2026 12:14:10 +0000 (22:14 +1000)]
mptcp: close TOCTOU race while computing rcv_wnd
The MPTCP output path access locklessly the MPTCP-level ack_seq
in multiple times, using possibly different values for the data_ack
in the DSS option and to compute the announced rcv wnd for the same
packet.
Refactor the cote to avoid inconsistencies which may confuse the
peer. Also ensure that the MPTCP level rcv wnd is updated only when
the egress packet actually contains a DSS ack.
Fixes: fa3fe2b15031 ("mptcp: track window announced to peer") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-3-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Tue, 2 Jun 2026 12:14:09 +0000 (22:14 +1000)]
mptcp: fix retransmission loop when csum is enabled
Sashiko noted that retransmission with csum enabled can actually
transmit new data, but currently the relevant code does not update
accordingly snd_nxt.
The may cause incoming ack drop and an endless retransmission loop.
Paolo Abeni [Tue, 2 Jun 2026 12:14:08 +0000 (22:14 +1000)]
mptcp: fix missing wakeups in edge scenarios
The mptcp_recvmsg() can fill MPTCP socket receive queue via
mptcp_move_skbs(), but currently does not try to wakeup any listener,
because the same process is going to check the receive queue soon.
When multiple threads are reading from the same fd, the above can
cause stall. Add the missing wakeup.
Fixes: 6771bfd9ee24 ("mptcp: update mptcp ack sequence from work queue") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-1-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kurt Kanzenbach [Fri, 29 May 2026 17:11:47 +0000 (19:11 +0200)]
ptp: vclock: Switch from RCU to SRCU
The usage of PTP vClocks leads immediately to the following issues with
ptp4l with LOCKDEP and DEBUG_ATOMIC_SLEEP enabled: "BUG: sleeping function
called from invalid context".
ptp_convert_timestamp() acquires a mutex_t within a RCU read section. This
is illegal, because acquiring a mutex_t can result in voluntary scheduling
request which is not allowed within a RCU read section.
Replace the RCU usage with SRCU where sleeping is allowed.
Reported-by: Florian Zeitz <florian.zeitz@schettke.com> Closes: https://lore.kernel.org/all/00a8cce8-410e-4038-98af-49be6d93d7bd@schettke.com/ Fixes: 67d93ffc0f3c ("ptp: vclock: use mutex to fix "sleep on atomic" bug") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20260529-vclock_rcu-v2-1-02a5531fab92@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Tue, 2 Jun 2026 16:15:47 +0000 (16:15 +0000)]
ipv4: restrict IPOPT_SSRR and IPOPT_LSRR options
This patch restricts setting Loose Source and Record Route (LSRR)
and Strict Source and Record Route (SSRR) IP options to users
with CAP_NET_RAW capability.
This prevents unprivileged applications from forcing packets to route
through attacker-controlled nodes to leak TCP ISN and possibly other
protocol information.
While LSRR and SSRR are commonly filtered in many network environments,
they may still be supported and forwarded along some network paths.
RFC 7126 (Recommendations on Filtering of IPv4 Packets Containing
IPv4 Options) recommend to drop these options in 4.3 and 4.4.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Tamir Shahar <tamirthesis@gmail.com> Reported-by: Amit Klein <aksecurity@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260602161547.2642155-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jianyu Li [Mon, 1 Jun 2026 11:36:40 +0000 (19:36 +0800)]
af_unix: Add test for SCM_INQ on partial read
Add test to verify that when a skb is partially consumed,
unix_inq_len() return correct remaining byte count.
Before:
# RUN scm_inq.stream.partial_read ...
# scm_inq.c:165:partial_read:Expected remain (512) == *(int *)CMSG_DATA(cmsg) (768)
# partial_read: Test terminated by assertion
# FAIL scm_inq.stream.partial_read
not ok 2 scm_inq.stream.partial_read
After:
# RUN scm_inq.stream.partial_read ...
# OK scm_inq.stream.partial_read
ok 2 scm_inq.stream.partial_read
Jianyu Li [Mon, 1 Jun 2026 11:36:39 +0000 (19:36 +0800)]
af_unix: Fix inq_len update problem in partial read
Currently inq_len is updated only when the whole skb is consumed.
If only part of the data is read, following SIOCINQ query would
get value greater than what actually left.
This change update inq_len timely in unix_stream_read_generic(),
and adjust unix_stream_read_skb() accordingly to prevent
repetitive update.
Fixes: f4e1fb04c123 ("af_unix: Use cached value for SOCK_STREAM in unix_inq_len().") Signed-off-by: Jianyu Li <jianyu.li@mediatek.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260601113640.231897-2-jianyu.li@mediatek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Suman Ghosh [Fri, 29 May 2026 11:37:05 +0000 (17:07 +0530)]
octeontx2-af: Fix initialization of mcam's entry2target_pffunc field
NPC mcam entry stores a mapping between mcam entry and target pcifunc.
During initialization of this field, API kmalloc_array has been used which
caused some junk values to array. Whereas, the array is expected to be
initialized by 0. This patch fixes the same by using kcalloc instead of
kmalloc_array.
Fixes: 55307fcb9258 ("octeontx2-af: Add mbox messages to install and delete MCAM rules") Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1780054625-17090-1-git-send-email-sbhatta@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geetha sowjanya [Fri, 29 May 2026 11:37:57 +0000 (17:07 +0530)]
octeontx2-pf: Fix NDC sync operation errors
On system reboot "rvu_nicpf 0002:03:00.0: NDC sync operation failed"
error messages are shown, even if the operations is successful.
This is due to wrong if error check in ndc_syc() function.
Fixes: 42c45ac1419c ("octeontx2-af: Sync NIX and NPA contexts from NDC to LLC/DRAM") Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1780054677-17249-1-git-send-email-sbhatta@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yizhou Zhao [Fri, 29 May 2026 10:50:16 +0000 (18:50 +0800)]
appletalk: aarp: zero-initialize aarp_entry to prevent heap info leak
aarp_alloc() allocates struct aarp_entry without zeroing it, but only
initializes refcnt and packet_queue. When an unresolved AARP entry is
created, hwaddr[ETH_ALEN] is left uninitialized.
aarp_seq_show() later prints this field with %pM when users read
/proc/net/atalk/arp. This can expose 6 bytes of stale heap data for
each unresolved entry.
Fix this by zero-initializing struct aarp_entry at allocation time.
Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn> Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn> Reported-by: Ao Wang <wangao@seu.edu.cn> Reported-by: Xuewei Feng <fengxw06@126.com> Reported-by: Qi Li <qli01@tsinghua.edu.cn> Reported-by: Ke Xu <xuke@tsinghua.edu.cn> Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260529105017.81531-1-zhaoyz24@mails.tsinghua.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonas Jelonek [Thu, 28 May 2026 20:52:40 +0000 (20:52 +0000)]
net: sfp: initialize i2c_block_size at adapter configure time
sfp->i2c_block_size is only assigned in sfp_sm_mod_probe(), which runs
from the state machine timer after SFP_F_PRESENT has been set. Between
those two points, sfp_module_eeprom() (the ethtool -m callback) gates
only on SFP_F_PRESENT and can be entered with i2c_block_size still at
its kzalloc'd value of 0.
On a pure-I2C adapter, sfp_i2c_read() then issues an i2c_transfer()
with msgs[1].len = 0 inside a loop that subtracts this_len from len
each iteration; on adapters that succeed a zero-length read the loop
never advances, spinning while holding rtnl_lock.
This was previously addressed by initializing i2c_block_size in
sfp_alloc() (commit 813c2dd78618), but the initialization was dropped
when i2c_block_size was split from i2c_max_block_size.
Initialize sfp->i2c_block_size from sfp->i2c_max_block_size in
sfp_i2c_configure(), so the field is valid as soon as the adapter is
known. sfp_sm_mod_probe() still reassigns it on each module insertion
to recover from a per-module clamp to 1 (sfp_id_needs_byte_io).
Jason Xing [Sat, 30 May 2026 04:26:30 +0000 (12:26 +0800)]
xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata()
The TX metadata area resides in the UMEM buffer which is memory-mapped
and concurrently writable by userspace. In xsk_skb_metadata(),
csum_start and csum_offset are read from shared memory for bounds
validation, then read again for skb assignment. A malicious userspace
application can race to overwrite these values between the two reads,
bypassing the bounds check and causing out-of-bounds memory access
during checksum computation in the transmit path.
Fix this by reading csum_start and csum_offset into local variables
once, then using the local copies for both validation and assignment.
Note that other metadata fields (flags, launch_time) and the cached
csum fields may be mutually inconsistent due to concurrent userspace
writes, but this is benign: the only security-critical invariant is
that each field's validated value is the same one used, which local
caching guarantees.
Closes: https://lore.kernel.org/all/20260503200927.73EA1C2BCB4@smtp.kernel.org/ Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Fixes: 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support") Link: https://patch.msgid.link/20260530042630.80626-1-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 3 Jun 2026 16:09:24 +0000 (09:09 -0700)]
Merge tag 'mmc-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Fix host controller programming for eMMC fixed driver type
MMC host:
- dw_mmc-rockchip: Add missing private data for very old controllers
- litex_mmc: Fix clock management
- renesas_sdhi: Add OF entry for RZ/G2H SoC
- sdhci: Manage signal voltage switch during system resume for some hosts
- sdhci-of-dwcmshc: Fix reset, clk and SDIO support for Eswin EIC7700"
* tag 'mmc-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci: add signal voltage switch in sdhci_resume_host
mmc: dw_mmc-rockchip: Add missing private data for very old controllers
mmc: litex_mmc: Set mandatory idle clocks before CMD0
mmc: litex_mmc: Use DIV_ROUND_UP for more accurate clock calculation
mmc: renesas_sdhi: Add OF entry for RZ/G2H SoC
mmc: sdhci-of-dwcmshc: Fix reset, clk, and SDIO support for Eswin EIC7700
mmc: core: Fix host controller programming for fixed driver type
Linus Torvalds [Wed, 3 Jun 2026 15:59:24 +0000 (08:59 -0700)]
Merge tag 'cgroup-for-7.1-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"One cpuset fix and a maintenance update, both low-risk:
- Fix cpuset partition CPU accounting under sibling CPU exclusion
that could produce wrong CPU assignments and trigger
scheduling-domain warnings. Includes selftests.
- Update an email address in MAINTAINERS"
* tag 'cgroup-for-7.1-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: Change Ridong's email
cgroup/cpuset: Add test cases for sibling CPU exclusion on partition update
cgroup/cpuset: Use effective_xcpus in partcmd_update add/del mask calculation
Linus Torvalds [Wed, 3 Jun 2026 15:52:26 +0000 (08:52 -0700)]
Merge tag 'sched_ext-for-7.1-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
"Two low-risk fixes:
- Drop a spurious warning that can fire during cgroup migration while
a sched_ext scheduler is loaded
- Fix a drgn-based debug script that broke after scheduler state
moved into a per-scheduler struct"
* tag 'sched_ext-for-7.1-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Don't warn on NULL cgrp_moving_from in scx_cgroup_move_task()
tools/sched_ext: Fix scx_show_state per-scheduler state reads
Bluetooth: MGMT: Fix backward compatibility with userspace
bluetoothd has a bug with makes it send extra bytes as part of
MGMT_OP_ADD_EXT_ADV_DATA which are now being checked to be the
exact the expected length, relax this so only when the expected
length is greater than the data length to cause an error since
that would result in accessing invalid memory, otherwise just
ignore the extra bytes.
SeungJu Cheon [Mon, 1 Jun 2026 11:19:08 +0000 (20:19 +0900)]
Bluetooth: SCO: Fix data-race on sco_pi fields in sco_connect
sco_sock_connect() copies the destination address into sco_pi(sk)->dst
under lock_sock(), then releases the lock and calls sco_connect(),
which reads dst, src, setting, and codec without holding lock_sock() in
hci_get_route() and hci_connect_sco().
These fields may be modified concurrently by connect(), bind(), or
setsockopt() on the same socket, resulting in data-races reported by
KCSAN.
Fix this by snapshotting dst, src, setting, and codec under lock_sock()
at the start of sco_connect() before passing them to hci_get_route()
and hci_connect_sco().
BUG: KCSAN: data-race in memcmp+0x45/0xb0
race at unknown origin, with read to 0xffff88800e6b0dd0 of 1 bytes
by task 315 on cpu 0:
memcmp+0x45/0xb0
hci_connect_acl+0x1b7/0x6b0
hci_connect_sco+0x4d/0xb30
sco_sock_connect+0x27b/0xd60
__sys_connect_file+0xbd/0xe0
__sys_connect+0xe0/0x110
__x64_sys_connect+0x40/0x50
x64_sys_call+0xcad/0x1c60
do_syscall_64+0x133/0x590
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: 9a8ec9e8ebb5 ("Bluetooth: SCO: Fix possible circular locking dependency on sco_connect_cfm") Signed-off-by: SeungJu Cheon <suunj1331@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
SeungJu Cheon [Mon, 1 Jun 2026 11:19:07 +0000 (20:19 +0900)]
Bluetooth: ISO: Fix data-race on iso_pi fields in hci_get_route calls
iso_connect_bis(), iso_connect_cis(), iso_listen_bis(), and
iso_conn_big_sync() call hci_get_route() using iso_pi(sk)->dst,
iso_pi(sk)->src, and iso_pi(sk)->src_type without holding lock_sock().
These fields may be modified concurrently by connect() or setsockopt()
on the same socket, resulting in data-races reported by KCSAN.
Fix this by snapshotting the required fields under lock_sock() before
calling hci_get_route().
BUG: KCSAN: data-race in memcmp+0x45/0xb0
race at unknown origin, with read to 0xffff8880122135cf of 1 bytes
by task 333 on cpu 1:
memcmp+0x45/0xb0
hci_get_route+0x27e/0x490
iso_connect_cis+0x4c/0xa10
iso_sock_connect+0x60e/0xb30
__sys_connect_file+0xbd/0xe0
__sys_connect+0xe0/0x110
__x64_sys_connect+0x40/0x50
x64_sys_call+0xcad/0x1c60
do_syscall_64+0x133/0x590
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: 241f51931c35 ("Bluetooth: ISO: Avoid circular locking dependency") Signed-off-by: SeungJu Cheon <suunj1331@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth: ISO: Fix a use-after-free of the hci_conn pointer
In iso_sock_rebind_bc(), the bis pointer is cached, then the socket lock is
dropped:
bis = iso_pi(sk)->conn->hcon;
/* Release the socket before lookups since that requires hci_dev_lock
* which shall not be acquired while holding sock_lock for proper
* ordering.
*/
release_sock(sk);
hci_dev_lock(bis->hdev);
During the unlocked window, could a concurrent close() destroy the connection
and free the bis structure, causing hci_dev_lock(bis->hdev) to access memory
after it is freed, fix this by using the hdev reference which was safely
acquired via iso_conn_get_hdev().
Fixes: d3413703d5f8 ("Bluetooth: ISO: Add support to bind to trigger PAST") Reported-by: Sashiko <sashiko-bot@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth: ISO: Fix not releasing hdev reference on iso_conn_big_sync
hci_get_route() returns a reference-counted hci_dev pointer via
hci_dev_hold(). The function exits normally or with an error without ever
releasing it.
Fixes: 07a9342b94a9 ("Bluetooth: ISO: Send BIG Create Sync via hci_sync") Reported-by: Sashiko <sashiko-bot@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bharath Reddy [Mon, 1 Jun 2026 03:24:26 +0000 (08:54 +0530)]
Bluetooth: fix memory leak in error path of hci_alloc_dev()
Early failures in Bluetooth HCI UART configuration leak SRCU percpu
memory.
When device initialization fails before hci_register_dev() completes,
the HCI_UNREGISTER flag is never set. As a result, when the device
reference count reaches zero, bt_host_release() evaluates this flag as
false and falls back to a direct kfree(hdev).
Because hci_release_dev() is bypassed, the SRCU struct initialized
early in hci_alloc_dev() is never cleaned up, resulting in a leak of
percpu memory.
Fix the leak by explicitly calling cleanup_srcu_struct() in the
fallback (unregistered) branch of bt_host_release() before freeing
the device.
Reported-by: syzbot+535ecc844591e50588a5@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=535ecc844591e50588a5 Tested-by: syzbot+535ecc844591e50588a5@syzkaller.appspotmail.com Fixes: 1d6123102e9f ("Bluetooth: hci_core: Fix use-after-free in vhci_flush()") Signed-off-by: Bharath Reddy <kbreddy.rpbc@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Zhang Cen [Fri, 29 May 2026 03:22:09 +0000 (11:22 +0800)]
Bluetooth: bnep: reject short frames before parsing
A BNEP peer can send a short BNEP SDU. bnep_rx_frame() reads the
packet type byte immediately and, for control packets, reads the control
opcode and setup UUID-size byte before proving that those bytes are
present. bnep_rx_control() also dereferences the control opcode without
rejecting an empty control payload.
Use skb_pull_data() for the fixed fields in bnep_rx_frame() so a NULL
return gates each dereference. Split the control handler so the frame
path can pass an opcode that has already been pulled, and keep the
byte-buffer wrapper for extension control payloads.
For BNEP_SETUP_CONN_REQ, name the UUID-size byte before pulling the
setup payload. struct bnep_setup_conn_req carries destination and source
service UUIDs after that byte, each uuid_size bytes, so the parser now
documents that tuple explicitly instead of leaving the pull length as an
opaque multiplication.
Validation reproduced this kernel report:
KASAN slab-out-of-bounds in bnep_rx_frame.isra.0+0x130c/0x1790
The buggy address belongs to the object at ffff88800c0f7908 which belongs
to the cache kmalloc-8 of size 8
The buggy address is located 0 bytes to the right of allocated 1-byte
region [ffff88800c0f7908, ffff88800c0f7909)
Read of size 1
Call trace:
dump_stack_lvl+0xb3/0x140 (?:?)
print_address_description+0x57/0x3a0 (?:?)
bnep_rx_frame+0x130c/0x1790 (net/bluetooth/bnep/core.c:306)
print_report+0xb9/0x2b0 (?:?)
__virt_addr_valid+0x1ba/0x3a0 (?:?)
srso_alias_return_thunk+0x5/0xfbef5 (?:?)
kasan_addr_to_slab+0x21/0x60 (?:?)
kasan_report+0xe0/0x110 (?:?)
process_one_work+0xfce/0x17e0 (kernel/workqueue.c:3200)
worker_thread+0x65c/0xe40 (?:?)
__kthread_parkme+0x184/0x230 (?:?)
kthread+0x35e/0x470 (?:?)
_raw_spin_unlock_irq+0x28/0x50 (?:?)
ret_from_fork+0x586/0x870 (?:?)
__switch_to+0x74f/0xdc0 (?:?)
ret_from_fork_asm+0x1a/0x30 (?:?)
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Assisted-by: Codex:gpt-5.5 Signed-off-by: Zhang Cen <rollkingzzc@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Existing advertising instances can already hold the maximum extended
advertising payload. When hci_adv_bcast_annoucement() prepends the
Broadcast Announcement service data to that payload, the combined data
may no longer fit in the temporary buffer used to rebuild the
advertising data.
Reject that case before copying the existing payload and report the
failure through the device log. This keeps the existing advertising
data intact and avoids overrunning the temporary buffer.
Fixes: 5725bc608252 ("Bluetooth: hci_sync: Fix broadcast/PA when using an existing instance") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Assisted-by: Codex:GPT-5.4 Signed-off-by: Yuqi Xu <xuyq21@lenovo.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth: L2CAP: reject BR/EDR signaling packets over MTUsig
net/bluetooth/l2cap_core.c:l2cap_sig_channel() accepts BR/EDR
signaling packets up to the channel MTU and dispatches each command
without enforcing the signaling MTU (MTUsig). A Bluetooth BR/EDR peer
within radio range can send a fixed-channel CID 0x0001 packet that is
larger than MTUsig and contains many L2CAP_ECHO_REQ commands before
pairing. In a real-radio stock-kernel run, one 681-byte signaling
packet containing 168 zero-length ECHO_REQ commands made the target
transmit 168 ECHO_RSP frames over about 220 ms.
Impact: a Bluetooth BR/EDR peer within radio range, before pairing, can
force 168 ECHO_RSP frames from one 681-byte fixed-channel signaling
packet containing packed ECHO_REQ commands.
Define Linux's BR/EDR signaling MTU as the spec minimum of 48 bytes and
reject any larger signaling packet with one L2CAP_COMMAND_REJECT_RSP
carrying L2CAP_REJ_MTU_EXCEEDED before any command is dispatched.
The Bluetooth Core spec wording for MTUExceeded says the reject
identifier shall match the first request command in the packet, and
that packets containing only responses shall be silently discarded.
Linux intentionally deviates from that prescription: silently
discarding desynchronizes the peer because the remote stack never
learns its responses were dropped, and locating the first request
command requires walking command headers past MTUsig, i.e. processing
bytes from a packet we have already decided is too large to process.
We therefore always emit one reject and use the identifier from the
first command header, a single fixed-offset byte read.
The unrestricted BR/EDR signaling parser and ECHO_REQ response path both
trace to the initial git import; no later introducing commit is
available for a Fixes tag.
SeungJu Cheon [Mon, 25 May 2026 11:04:43 +0000 (20:04 +0900)]
Bluetooth: RFCOMM: validate skb length in MCC handlers
The RFCOMM MCC handlers cast skb->data to protocol-specific structs
without validating skb->len first. A malicious remote device can send
truncated MCC frames and trigger out-of-bounds reads in these handlers.
Fix this by using skb_pull_data() to validate and access the required
data before dereferencing it.
rfcomm_recv_rpn() requires special handling since ETSI TS 07.10 allows
1-byte RPN requests. Handle this by validating only the DLCI byte first,
and validating the full struct only when len > 1.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Suggested-by: Muhammad Bilal <meatuni001@gmail.com> Signed-off-by: SeungJu Cheon <suunj1331@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Zhang Cen [Thu, 28 May 2026 09:45:06 +0000 (17:45 +0800)]
Bluetooth: MGMT: validate advertising TLV before type checks
tlv_data_is_valid() reads each advertising data field length from
data[i], then inspects data[i + 1] for managed EIR types before
checking that the current field still fits inside the supplied buffer.
A malformed field whose length byte is the last byte of the buffer can
therefore make the parser read one byte past the advertising data.
KASAN reported the following when a malformed MGMT_OP_ADD_ADVERTISING
request reached that path:
BUG: KASAN: vmalloc-out-of-bounds in tlv_data_is_valid()
Read of size 1
Call trace:
tlv_data_is_valid()
add_advertising()
hci_mgmt_cmd()
hci_sock_sendmsg()
Move the existing element-length check before any type-octet inspection
so each non-empty element is proven to contain its type byte before the
parser looks at data[i + 1].
Fixes: 2bb36870e8cb ("Bluetooth: Unify advertising instance flags check") Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Zhang Cen <rollkingzzc@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Zhang Cen [Thu, 28 May 2026 07:56:41 +0000 (15:56 +0800)]
Bluetooth: RFCOMM: hold listener socket in rfcomm_connect_ind()
rfcomm_get_sock_by_channel() scans rfcomm_sk_list under the list lock,
but returns the selected listener after dropping that lock without
taking a reference. rfcomm_connect_ind() then locks the listener,
queues a child socket on it, and may notify it after unlocking it.
The buggy scenario involves two paths, with each column showing the
order within that path:
rfcomm_connect_ind(): listener close:
1. Find parent in 1. close() enters
rfcomm_get_sock_by_channel() rfcomm_sock_release().
2. Drop rfcomm_sk_list.lock 2. rfcomm_sock_shutdown()
without pinning parent. closes the listener.
3. Call lock_sock(parent) and 3. rfcomm_sock_kill()
bt_accept_enqueue(parent, unlinks and puts parent.
sk, true).
4. Read parent flags and may 4. parent can be freed.
call sk_state_change().
If close wins the race, parent can be freed before
rfcomm_connect_ind() reaches lock_sock(), bt_accept_enqueue(), or the
deferred-setup callback.
Take a reference on the listener before leaving rfcomm_sk_list.lock.
After lock_sock() succeeds, recheck that it is still in BT_LISTEN
before queueing a child, cache the deferred-setup bit while the parent
is locked, and drop the reference after the last parent use.
KASAN reported a slab-use-after-free in lock_sock_nested() from
rfcomm_connect_ind(), with the freeing stack going through
rfcomm_sock_kill() and rfcomm_sock_release().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Zhang Cen <rollkingzzc@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
KVM: SEV: Unmap and unpin the GHCB as needed on vCPU free
Unmap and unpin the GHCB as needed when freeing a vCPU. If the VM is
destroyed after mapping+pinning the GHCB on #VMGEXIT, without re-running
the vCPU, KVM will effectively leak the GHCB and any mappings created for
the GHCB.
Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT") Cc: stable@vger.kernel.org Tested-by: Michael Roth <michael.roth@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-18-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20260529183549.1104619-18-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: SEV: Move sev_free_vcpu() down below sev_es_unmap_ghcb()
Relocate sev_free_vcpu() down in sev.c so that it's definition comes after
sev_es_unmap_ghcb(). This will allow sharing unmap functionality between
the two functions without needing a forward declaration (or weird placement
of the common code).
No functional change intended.
Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-16-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20260529183549.1104619-16-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: Don't WARN if memory is dirtied without a vCPU when the VM is dying
When marking a page dirty, complain about not having a running/loaded vCPU
if and only if the VM is still alive, i.e. its refcount is non-zero. This
will allow fixing a memory leak for x86 SEV-ES guests without hitting what
is effectively a false positive on the WARN.
For some SEV-ES VM-Exits, KVM keeps a writable mapping of a guest page
across an exit to userspace, and typically unmaps the page on the next
KVM_RUN. But if userspace never calls KVM_RUN after such an exit, then KVM
needs to unmap the page when the vCPU is destroyed, which in turn triggers
the WARN about not having a running vCPU.
Alternatively, SEV-ES could temporarily load the vCPU to suppress the WARN,
as is done in nested_vmx_free_vcpu() (but for completely unrelated reasons;
suppressing WARN from nested_put_vmcs12_pages() is pure happenstance). But
loading a vCPU during destruction is gross (ideally nVMX code would be
cleaned up), risks complicating the SEV-ES code (KVM would need to ensure
the temporarily load()+put() only runs when the vCPU isn't already loaded),
and is ultimately pointless.
The motivation for the WARN is to guard against KVM dirtying guest memory
without pushing the corresponding GFN to the active vCPU's dirty ring, e.g.
to ensure userspace doesn't miss a dirty page. But for the VM's refcount
to reach zero, there can't be _any_ userspace mappings to the dirty ring,
as mapping the dirty ring requires doing mmap() on the vCPU FD. I.e. if
userspace had a valid mapping for the dirty ring, then the vCPU file and
thus the owning VM would still be alive. And so since userspace can't
possibly reach the dirty ring, whether or not KVM technically "misses" a
push to the dirty ring is irrelevant.
Reported-by: Michael Roth <michael.roth@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-15-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20260529183549.1104619-15-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: SEV: Read start/end indices of PSC requests exactly once per #VMGEXIT
Rework Page State Change (PSC) handling to read the guest-provided start
and end indices exactly once, at the beginning of the request. Re-reading
the indices is "fine", _if_ the guest is well-behaved. KVM _should_ be
safe against concurrent guest modification of the indices, but there is
zero reason to introduce unnecessary risk.
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20260529183549.1104619-14-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: SEV: Add an anonymous "psc" struct to track current PSC metadata
Add a "psc" struct to vcpu_sev_es_state to avoid having to prefix all of
the fields with "psc_".
Take advantage of the code churn to opportunistically rename local
variables to "guest_psc" to make it more obvious that the buffer is guest
data, and more importantly, guest accessible!
Opportunistically rename inflight => batch_size as well, because there can
really only be one operation in-flight (per-vCPU), i.e. "inflight" _looks_
like a boolean, but in actuality is an integer tracking how many pages are
being handled by the current operation.
No functional change intended.
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20260529183549.1104619-13-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: SEV: Make it more obvious when KVM is writing back the current PSC index
Increment the guest-visible "cur_entry" index outside of the for-loop
when processing Page State Change entries, and add a comment to make it
more obvious which code is operating on trusted data, and which code is
touching guest-accessible data.
No functional change intended.
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20260529183549.1104619-12-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiang Mei reports that mac80211 could crash if eht_cap is set
but eht_oper isn't. Rather than fixing that for the individual
user(s), enforce that both HE/EHT have consistent elements.
Johannes Berg [Tue, 2 Jun 2026 11:28:38 +0000 (13:28 +0200)]
Merge tag 'iwlwifi-fixes-2026-05-31' of https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next
wifi: iwlwifi: fixes - 2026-05-31
Miri Korenblit says:
====================
This contains a few fixes:
- Don't grab nic access in non-fast-resume
- Don't send a large hcmd than transport supports
- In AP mode, don't send tx power constraints command before activating
the link
- Don't do sw reset handshake on older firmwares.
====================
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fedor Pchelkin [Mon, 1 Jun 2026 09:41:56 +0000 (12:41 +0300)]
wifi: fix leak if split 6 GHz scanning fails
rdev->int_scan_req is leaked if cfg80211_scan() fails. Note that it's
supposed to be released at ___cfg80211_scan_done() but this doesn't happen
as rdev->scan_req is NULL at that point, too, leading to the early return
from the freeing function.
Jiayuan Chen [Fri, 29 May 2026 15:22:18 +0000 (23:22 +0800)]
ipv6: anycast: insert aca into global hash under idev->lock
syzbot reported a splat [1]: a slab-use-after-free in
ipv6_chk_acast_addr(), which walks the global inet6_acaddr_lst[] hash
under RCU and dereferences a struct ifacaddr6 that has already been
freed while still linked in the hash, so a later reader walks into a
dangling node.
In __ipv6_dev_ac_inc() the aca is allocated with refcount 1, then
aca_get() bumps it to 2 to keep it alive across the unlocked region.
It is published to idev->ac_list under idev->lock, but
ipv6_add_acaddr_hash() runs after write_unlock_bh(). A concurrent
teardown (ipv6_ac_destroy_dev() from addrconf_ifdown(), under RTNL)
can slip into that window:
CPU0 __ipv6_dev_ac_inc CPU1 ipv6_ac_destroy_dev (RTNL)
------------------------------ ------------------------------------
aca_alloc() refcnt 1
aca_get() refcnt 2
write_lock_bh(idev->lock)
add aca to ac_list
write_unlock_bh(idev->lock)
write_lock_bh(idev->lock)
pull aca off ac_list
write_unlock_bh(idev->lock)
ipv6_del_acaddr_hash(aca)
hlist_del_init_rcu() is a no-op,
aca is not in the hash yet
aca_put() refcnt 2->1
ipv6_add_acaddr_hash(aca)
aca now inserted into the hash
aca_put() refcnt 1->0
call_rcu(aca_free_rcu) -> kfree(aca)
The hash removal becomes a no-op because the insertion has not
happened yet, so once CPU0 inserts and drops the last reference, the
aca is freed while still linked in inet6_acaddr_lst[], and readers
dereference freed memory after the slab slot is reused.
This window opened once RTNL stopped serializing the join path against
device teardown. Move ipv6_add_acaddr_hash() inside the idev->lock
section so the ac_list and hash insertions are atomic with respect to
teardown: a racing remover now either misses the aca entirely or finds
it in both lists.
acaddr_hash_lock is now nested under idev->lock, which is acquired in
softirq context, so switch all acaddr_hash_lock sites to spin_lock_bh()
to avoid the irq lock inversion reported in [2].
Tejun Heo [Mon, 1 Jun 2026 19:22:37 +0000 (09:22 -1000)]
sched_ext: Don't warn on NULL cgrp_moving_from in scx_cgroup_move_task()
A WARN fires when systemd's user manager writes "+cpu +memory +pids" to
its own subtree_control while a sched_ext scheduler is loaded:
WARNING: at kernel/sched/ext.c:3227 scx_cgroup_move_task+0xa8/0xb0
scx_cgroup_move_task+0xa8/0xb0
sched_move_task+0x134/0x290
cpu_cgroup_attach+0x39/0x70
cgroup_migrate_execute+0x37d/0x450
cgroup_update_dfl_csses+0x1e3/0x270
cgroup_subtree_control_write+0x3e7/0x440
scx_cgroup_can_attach() arms cgrp_moving_from only when a task's cpu
cgroup changes. It can still be NULL when scx_cgroup_move_task() runs,
through this sequence:
Step Result
--------------------------------- ----------------------------------
1. cpu enabled on cgroup G cpu css = A
2. cpu toggled off then on for G A killed, B created (same cgroup)
3. an exiting task keeps A alive migration skips it, A now stale
4. +memory migrates G stale A vs current B pulls cpu in
5. cpu attach runs for all tasks hits a live, cpu-unchanged task
6. scx_cgroup_move_task() on it cgrp_moving_from NULL -> WARN
The mismatch is that scx_cgroup_can_attach() keys on cgroup identity
while migration drives the move on css identity, so a NULL cgrp_moving_from
here is a legitimate css-only migration, not a missing prep.
The call is already gated on cgrp_moving_from, so just drop the warning.
ops.cgroup_prep_move() and ops.cgroup_move() stay paired.
Zhao Zhang [Sat, 30 May 2026 15:57:14 +0000 (23:57 +0800)]
sctp: diag: reject stale associations in dump_one path
The SCTP exact sock_diag lookup can hold a transport reference, block on
lock_sock(sk), and then resume after sctp_association_free() has marked
the association dead and freed its bind address list.
When that happens, inet_assoc_attr_size() and
inet_diag_msg_sctpasoc_fill() can still dereference association state
that is no longer valid for reporting. In particular,
inet_diag_msg_sctpasoc_fill() may read an empty bind-address list as a
real sctp_sockaddr_entry and trigger an out-of-bounds read from
unrelated association memory.
Reject the association after taking the socket lock if it has been
reaped or detached from the endpoint, and report the lookup as stale.
This keeps the exact dump-one path from formatting torn association
state.
Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Zhao Zhang <zzhan461@ucr.edu> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/fac6043fa20a2ff68e12958c431836f692c51268.1780113823.git.zzhan461@ucr.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tapio Reijonen [Fri, 29 May 2026 06:18:57 +0000 (06:18 +0000)]
net: fec: fix pinctrl default state restore order on resume
In fec_resume(), fec_enet_clk_enable() is called before
pinctrl_pm_select_default_state() in the non-WoL path, inverting the
ordering used in fec_suspend() which correctly switches to the sleep
pinctrl state before disabling clocks.
For PHYs with the PHY_RST_AFTER_CLK_EN flag (e.g. TI DP83848 or
SMSC LAN87xx), fec_enet_clk_enable() triggers a hardware reset pulse
via the phy-reset GPIO. With the GPIO pin still in sleep pinctrl state
at that point, the GPIO write has no physical effect and the PHY never
receives the required reset after clock enable, leading to unreliable
link establishment after system resume.
Fix by restoring the default pinctrl state before enabling clocks,
making resume the proper mirror of suspend. The call is made
unconditionally: fec_suspend() only switches to the sleep pinctrl state
on the non-WoL path and leaves the pins in the default state when WoL
is enabled, so on a WoL resume the device is already in the default
state and pinctrl_pm_select_default_state() is a no-op.
David Thompson [Fri, 29 May 2026 21:03:00 +0000 (21:03 +0000)]
net: lan743x: permit VLAN-tagged packets up to configured MTU
VLAN-tagged interfaces on lan743x devices were previously unreachable via
SSH and failed to respond to large ping packets (e.g. "ping -s 1469" given
MTU=1500). In these scenarios, "ethtool -S" reports non-zero "RX Oversize
Frame Errors". According to Microchip AN2948, the MAC_RX FSE (VLAN field
size enforcement) bit determines whether frames with VLAN tags exceeding
the base MTU plus tag length are discarded.
The driver must set the MAC_RX.FSE bit before setting MAC_RX.RXEN to allow
VLAN-tagged frames up to the interface MTU, preventing them from being
treated as oversized. As a result, both the base and VLAN-tagged interfaces
can use the same MTU without receive errors.
Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver") Signed-off-by: David Thompson <davthompson@nvidia.com> Reviewed-by: Thangaraj Samynathan <Thangaraj.s@microchip.com> Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de> Tested-by: Nicolai Buchwitz <nb@tipi-net.de> # lan7430 on arm64 (RevPi Link: https://patch.msgid.link/20260529210300.433135-1-davthompson@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yuqi Xu [Fri, 29 May 2026 13:01:44 +0000 (21:01 +0800)]
net: rds: clear i_sends on setup unwind
The RDS IB connection teardown path is written so it can run during
partial startup and on repeated shutdown attempts. It uses NULL
pointers to distinguish resources that are still owned from resources
that have already been released.
When rds_ib_setup_qp() fails after allocating i_sends but before
allocating i_recvs, the sends_out path frees i_sends without clearing
the pointer. A later shutdown pass can still treat that stale pointer
as a live send ring allocation.
Clear i_sends after vfree() in the error unwind path so the existing
shutdown logic continues to use the correct ownership state.
Yizhou Zhao [Wed, 27 May 2026 08:31:58 +0000 (16:31 +0800)]
net: garp: fix unsigned integer underflow in garp_pdu_parse_attr
The receive-side GARP attribute parser computes dlen with reversed
operands:
dlen = sizeof(*ga) - ga->len;
ga->len is the on-wire attribute length and includes the GARP attribute
header. For normal attributes with data, ga->len is larger than
sizeof(*ga), so the subtraction underflows in unsigned arithmetic.
The resulting value is later passed to garp_attr_lookup(), whose length
argument is u8. After truncation, the parsed data length usually no
longer matches the length stored for locally registered attributes, so
received Join/Leave events are ignored. This breaks the GARP receive path
for common attributes, such as GVRP VLAN registration attributes.
Compute the data length as the attribute length minus the header length.
Fixes: eca9ebac651f ("net: Add GARP applicant-only participant") Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn> Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn> Reported-by: Ao Wang <wangao@seu.edu.cn> Reported-by: Xuewei Feng <fengxw06@126.com> Reported-by: Qi Li <qli01@tsinghua.edu.cn> Reported-by: Ke Xu <xuke@tsinghua.edu.cn> Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260527083200.42861-1-zhaoyz24@mails.tsinghua.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>