Linus Torvalds [Wed, 13 May 2026 18:53:51 +0000 (11:53 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"arm64:
- Add the pKVM side of the workaround for ARM's erratum 4193714,
provided that the EL3 firmware does its part of the job. KVM will
refuse to initialise otherwise
- Correctly handle 52bit VAs for guest EL2 stage-1 translations when
running under NV with E2H==0
- Correctly deal with permission faults in guest_memfd memslots
- Fix the steal-time selftest after the infrastructure was reworked
- Make sure the host cannot pass a non-sensical clock update to the
EL2 tracing infrastructure
- Appoint Steffen Eiden as a reviewer in anticipation of the KVM/s390
ability to run arm64 guests, which will inevitably lead to arm64
code being directly used on s390
- Make sure that EL2 is configured with both exception entry and exit
being Context Synchronization Events
- Handle the current vcpu being NULL on EL2 panic
- Fix the selftest_vcpu memcache being empty at the point of donation
or sharing
- Check that the memcache has enough capacity before engaging on the
share/donate path
- Fix __deactivate_fgt() to use its parameter rather than a variable
in the macro context
s390:
- Fix array overrun with large amounts of PCI devices
x86:
- Never use L0's PAUSE loop exiting while L2 is running, since it's
unlikely that a nested guest will help solving the hypervisor's
spinlock contention
- Fix emulation of MOVNTDQA
- Fix typo in Xen hypercall tracepoint
- Add back an optimization that was left behind when recently fixing
a bug
- Add module parameter to disable CET, whose implementation seems to
have issues. For now it remains enabled by default
Generic:
- Reject offset causing an unsigned overflow in kvm_reset_dirty_gfn()
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
KVM: VMX: introduce module parameter to disable CET
KVM: x86: Swap the dst and src operand for MOVNTDQA
KVM: x86: use again the flush argument of __link_shadow_page()
KVM: selftests: Ensure gmem file sizes are multiple of host page size
Documentation: kvm: update links in the references section of AMD Memory Encryption
KVM: nSVM: Never use L0's PAUSE loop exiting while L2 is running
KVM: x86: Fix Xen hypercall tracepoint argument assignment
KVM: Reject wrapped offset in kvm_reset_dirty_gfn()
KVM: arm64: Pre-check vcpu memcache for host->guest donate
KVM: arm64: Pre-check vcpu memcache for host->guest share
KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache
KVM: arm64: Fix __deactivate_fgt macro parameter typo
KVM: arm64: Guard against NULL vcpu on VHE hyp panic path
KVM: arm64: Make EL2 exception entry and exit context-synchronization events
MAINTAINERS: Add Steffen as reviewer for KVM/arm64
KVM: arm64: Remove potential UB on nvhe tracing clock update
KVM: selftests: arm64: Fix steal_time test after UAPI refactoring
KVM: arm64: Handle permission faults with guest_memfd
KVM: arm64: nv: Consider the DS bit when translating TCR_EL2
KVM: arm64: Work around C1-Pro erratum 4193714 for protected guests
...
Paolo Bonzini [Tue, 12 May 2026 14:58:48 +0000 (16:58 +0200)]
KVM: VMX: introduce module parameter to disable CET
There have been reports of host hangs caused by CET virtualization.
Until these are analyzed further, introduce a module parameter that
makes it possible to easily disable it.
KVM: x86: Swap the dst and src operand for MOVNTDQA
Swap the MOVNTDQA operands, as MOVNTDQA does NOT in fact have "the same
characteristics as 0F E7 (MOVNTDQ)"; MOVNTDQA loads from memory and stores
to registers, while MOVNTDQ loads from registers and stores to memory.
Per the SDM:
MOVNTDQ - Move packed integer values in xmm1 to m128 using non-temporal
hint.
MOVNTDQA - Move double quadword from m128 to xmm1 using non-temporal hint
if WC memory type.
Reported-by: Josh Eads <josheads@google.com> Fixes: c57d9bafbd0b ("KVM: x86: Add support for emulating MOVNTDQA") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260506213514.2781948-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Sun, 3 May 2026 21:09:17 +0000 (23:09 +0200)]
KVM: x86: use again the flush argument of __link_shadow_page()
Except in the case of parentless nested-TDP pages, mmu_page_zap_pte()
clears the SPTE but leaves the invalid_list empty. In this case, using
kvm_flush_remote_tlbs() as kvm_mmu_remote_flush_or_zap() does is overkill.
Avoid flushing the entirety of the remote TLBs unless the invalid_list
was populated: instead, use a more efficient gfn-targeting flush (if
available) and skip it altogether if the caller guarantees that a TLB
flush is not necessary.
Based-on: <20260503201029.106481-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20260503210917.121840-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: selftests: Ensure gmem file sizes are multiple of host page size
When creating a guest_memfd file and associated memslot to validate shared
guest memory, size the file+memslot to the maximum of the host or guest
page size. Attempting to allocate a single guest page will fail if the
host page size is greater than the guest page size, as KVM requires that
the size of memslots and guest_memfd files are a multiple of the host page
size.
For simplicity, verify the entire file can be shared between guest and host,
e.g. instead of trying to validate "partial" mappings.
Fixes: 42188667be38 ("KVM: selftests: Add guest_memfd testcase to fault-in on !mmap()'d memory") Reported-by: Zenghui Yu <zenghui.yu@linux.dev> Closes: https://lore.kernel.org/all/0064952b-048c-455d-ad89-e27e5cb82591@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260512155634.772602-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 12 May 2026 20:19:20 +0000 (22:19 +0200)]
Merge tag 'kvmarm-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 7.1, take #2
- Add the pKVM side of the workaround for ARM's erratum 4193714, provided
that the EL3 firmware does its part of the job. KVM will refuse to
initialise otherwise.
- Correctly handle 52bit VAs for guest EL2 stage-1 translations when
running under NV with E2H==0.
- Correctly deal with permission faults in guest_memfd memslots.
- Fix the steal-time selftest after the infrastructure was reworked.
- Make sure the host cannot pass a non-sensical clock update to the
EL2 tracing infrastructure.
- Appoint Steffen Eiden as a reviewer in anticipation of the KVM/s390
ability to run arm64 guests, which will inevitably lead to arm64
code being directly used on s390.
- Make sure that EL2 is configured with both exception entry and exit
being Context Synchronization Events.
- Handle the current vcpu being NULL on EL2 panic.
- Fix the selftest_vcpu memcache being empty at the point of donation or
sharing.
- Check that the memcache has enough capacity before engaging on the
share/donate path.
- Fix __deactivate_fgt() to use its parameter rather than a variable
in the macro context.
KVM: nSVM: Never use L0's PAUSE loop exiting while L2 is running
Never use L0's (KVM's) PAUSE loop exiting controls while L2 is running,
and instead always configure vmcb02 according to L1's exact capabilities
and desires.
The purpose of intercepting PAUSE after N attempts is to detect when the
vCPU may be stuck waiting on a lock, so that KVM can schedule in a
different vCPU that may be holding said lock. Barring a very interesting
setup, L1 and L2 do not share locks, and it's extremely unlikely that an
L1 vCPU would hold a spinlock while running L2. I.e. having a vCPU
executing in L1 yield to a vCPU running in L2 will not allow the L1 vCPU
to make forward progress, and vice versa.
While teaching KVM's "on spin" logic to only yield to other vCPUs in L2 is
doable, in all likelihood it would do more harm than good for most setups.
KVM has limited visibility into which L2 "vCPUs" belong to the same VM,
and thus share a locking domain. And even if L2 vCPUs are in the same
VM, KVM has no visilibity into L2 vCPU's that are scheduled out by the
L1 hypervisor.
Furthermore, KVM doesn't actually steal PAUSE exits from L1. If L1 is
intercepting PAUSE, KVM will route PAUSE exits to L1, not L0, as
nested_svm_intercept() gives priority to the vmcb12 intercept. As such,
overriding the count/threshold fields in vmcb02 with vmcb01's values is
nonsensical, as doing so clobbers all the training/learning that has been
done in L1.
Even worse, if L1 is not intercepting PAUSE, i.e. KVM is handling PAUSE
exits, then KVM will adjust the PLE knobs based on L2 behavior, which could
very well be detrimental to L1, e.g. due to essentially poisoning L1 PLE
training with bad data.
And copying the count from vmcb02 to vmcb01 on a nested VM-Exit makes even
less sense, because again, the purpose of PLE is to detect spinning vCPUs.
Whether or not a vCPU is spinning in L2 at the time of a nested VM-Exit
has no relevance as to the behavior of the vCPU when it executes in L1.
The only scenarios where any of this actually works is if at least one
of KVM or L1 is NOT intercepting PAUSE for the guest. Per the original
changelog, those were the only scenarios considered to be supported.
Disabling KVM's use of PLE makes it so the VM is always in a "supported"
mode.
Last, but certainly not least, using KVM's count/threshold instead of the
values provided by L1 is a blatant violation of the SVM architecture.
Fixes: 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") Cc: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20260508213321.373309-1-seanjc@google.com/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Aaron Sacks [Tue, 12 May 2026 06:07:42 +0000 (02:07 -0400)]
KVM: Reject wrapped offset in kvm_reset_dirty_gfn()
kvm_reset_dirty_gfn() guards the gfn range with
if (!memslot || (offset + __fls(mask)) >= memslot->npages)
return;
but offset is u64 and the addition is unchecked. The check can be
silently bypassed by a u64 wrap.
The dirty ring backing those entries is MAP_SHARED at
KVM_DIRTY_LOG_PAGE_OFFSET of the vcpu fd, so the VMM can rewrite the
slot and offset fields of any entry between when the kernel pushes
them and when KVM_RESET_DIRTY_RINGS consumes them. On reset,
kvm_dirty_ring_reset() re-reads the values via READ_ONCE() and feeds
them straight back into this check; only the flags handshake is
treated as the handover, the slot/offset payload is taken on trust.
makes the coalescing loop in kvm_dirty_ring_reset() compute
delta = (s64)(0 - 0xffffffffffffffc1) = 63
which falls in [0, BITS_PER_LONG), so it folds entry[i+1] into the
existing mask by setting bit 63. The trailing kvm_reset_dirty_gfn()
call then sees offset = 0xffffffffffffffc1 and __fls(mask) = 63;
the sum is 0 in u64 and the bounds check passes.
That offset propagates into kvm_arch_mmu_enable_log_dirty_pt_masked()
unchanged. On the legacy MMU path -- kvm_memslots_have_rmaps() ==
true, i.e. shadow paging, any VM that has allocated shadow roots, or
a write-tracked slot -- it reaches gfn_to_rmap(), which indexes
slot->arch.rmap[0][] with a near-U64_MAX gfn. That is an
out-of-bounds load of a kvm_rmap_head, followed by a conditional
clear of PT_WRITABLE_MASK in whatever the loaded pointer points at.
The path is reachable from any process holding /dev/kvm.
Range-check offset on its own first, so the addition cannot wrap.
memslot->npages is bounded well below U64_MAX, so once offset <
npages holds, offset + __fls(mask) (with __fls(mask) < BITS_PER_LONG)
stays in range.
Linus Torvalds [Tue, 12 May 2026 17:18:02 +0000 (10:18 -0700)]
Merge tag 'probes-fixes-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:
- kprobes: skip non-symbol addresses in kprobe_add_ksym_blacklist()
Since the ftrace adds its NOPs at .kprobes.text section (which stores
an array), a wrong entry is added when loading a module which uses
"__kprobes" attribute.
To solve this, add "notrace" to __kprobes functions
- test_kprobes: clear kprobes between test runs
Clear all kprobes in the test program after running a test set,
because Kunit test can run several times
- fprobe: Fix unregister_fprobe() to wait for RCU grace period
Since the fprobe data structure is removed with hlist_del_rcu(), it
should wait for the RCU grace period. If the caller waits for RCU, we
can use the async variant (e.g. eBPF)
* tag 'probes-fixes-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
fprobe: Fix unregister_fprobe() to wait for RCU grace period
test_kprobes: clear kprobes between test runs
kprobes: skip non-symbol addresses in kprobe_add_ksym_blacklist()
Linus Torvalds [Mon, 11 May 2026 22:38:49 +0000 (15:38 -0700)]
Merge tag 'linux_kselftest-kunit-fixes-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit fixes from Shuah Khan:
"Fix to decouple KUNIT_DEBUGFS and KUNIT_ALL_TESTS options and fix
KUNIT_DEBUGFS dependencies so it depends on DEBUG_FS without which it
will not be useful"
* tag 'linux_kselftest-kunit-fixes-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: config: KUNIT_DEBUGFS should depend on DEBUG_FS
kunit: config: Enable KUNIT_DEBUGFS by default
Jann Horn [Mon, 11 May 2026 15:55:11 +0000 (08:55 -0700)]
exit: prevent preemption of oopsing TASK_DEAD task
When an already-exiting task oopses, make_task_dead() currently calls
do_task_dead() with preemption enabled. That is forbidden:
do_task_dead() calls __schedule(), which has a comment saying "WARNING:
must be called with preemption disabled!".
If an oopsing task is preempted in do_task_dead(), between becoming
TASK_DEAD and entering the scheduler explicitly, bad things happen:
finish_task_switch() assumes that once the scheduler has switched away
from a TASK_DEAD task, the task can never run again and its stack is no
longer needed; but that assumption apparently doesn't hold if the dead
task was preempted (the SM_PREEMPT case).
This means that the scheduler ends up repeatedly dropping references on
the dead task's stack, which can lead to use-after-free or double-free
of the entire task stack; in other words, two tasks can end up running
on the same stack, resulting in various kinds of memory corruption.
(This does not just affect "recursively oopsing" tasks; it is enough to
oops once during task exit, for example in a file_operations::release
handler)
Fixes: 7f80a2fd7db9 ("exit: Stop poorly open coding do_task_dead in make_task_dead") Cc: stable@kernel.org Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fprobe: Fix unregister_fprobe() to wait for RCU grace period
Commit 4346ba1604093 ("fprobe: Rewrite fprobe on function-graph tracer")
changed fprobe to register struct fprobe to an rcu-hlist, but it forgot
to wait for RCU GP. Thus there can be use-after-free if the fprobe is
released right after unregistering. This can be happened on fprobe
event and sample module code.
To fix this issue, add synchronize_rcu() in unregister_fprobe().
Note that BPF is OK because fprobe is used as a part of
bpf_kprobe_multi_link. This unregisters its fprobe in
bpf_kprobe_multi_link_release() and it is deallocated via
bpf_kprobe_multi_link_dealloc(), which is invoked from
bpf_link_defer_dealloc_rcu_gp() RCU callback.
For BPF, this also introduced unregister_fprobe_async() which does
NOT wait for RCU grace priod.
Hyunwoo Kim [Fri, 8 May 2026 08:53:09 +0000 (17:53 +0900)]
rxrpc: Also unshare DATA/RESPONSE packets when paged frags are present
The DATA-packet handler in rxrpc_input_call_event() and the RESPONSE
handler in rxrpc_verify_response() copy the skb to a linear one before
calling into the security ops only when skb_cloned() is true. An skb
that is not cloned but still carries externally-owned paged fragments
(e.g. SKBFL_SHARED_FRAG set by splice() into a UDP socket via
__ip_append_data, or a chained skb_has_frag_list()) falls through to
the in-place decryption path, which binds the frag pages directly into
the AEAD/skcipher SGL via skb_to_sgvec().
Extend the gate to also unshare when skb_has_frag_list() or
skb_has_shared_frag() is true. This catches the splice-loopback vector
and other externally-shared frag sources while preserving the
zero-copy fast path for skbs whose frags are kernel-private (e.g. NIC
page_pool RX, GRO). The OOM/trace handling already in place is reused.
Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()") Cc: stable@vger.kernel.org Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 10 May 2026 15:10:47 +0000 (08:10 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk driver fixes from Stephen Boyd:
- Mark the DDR bus clk critical in the SpaceMiT driver so that
boot doesn't fail
- Fix boot on Mobile EyeQ by creating the auxiliary device for
the ethernet PHY
- Plug an OF node leak in Rockchip rk808 clk driver
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: rk808: fix OF node reference imbalance
MAINTAINERS: add myself as a reviewer for the clk subsystem
reset: eyeq: drop device_set_of_node_from_dev() done by parent
clk: eyeq: add EyeQ5 children auxiliary device for generic PHYs
clk: eyeq: use the auxiliary device creation helper
clk: spacemit: k3: mark top_dclk as CLK_IS_CRITICAL
Linus Torvalds [Sun, 10 May 2026 01:42:54 +0000 (18:42 -0700)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix sk_local_storage diag dump via netlink (Amery Hung)
- Fix off-by-one in arena direct-value access (Junyoung Jang)
- Reject TCP_NODELAY in bpf-tcp congestion control (KaFai Wan)
- Fix type confusion in bpf_*_sock() (Kuniyuki Iwashima)
- Reject TX-only AF_XDP sockets (Linpu Yu)
- Don't run arg-tracking analysis twice on main subprog (Paul Chaignon)
- Fix NULL pointer dereference in bpf_sk_storage_clone and fib lookup
(Weiming Shi)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix off-by-one boundary validation in arena direct-value access
xskmap: reject TX-only AF_XDP sockets
bpf: Don't run arg-tracking analysis twice on main subprog
bpf: Free reuseport cBPF prog after RCU grace period.
bpf: tcp: Fix type confusion in sol_tcp_sockopt().
bpf: tcp: Fix type confusion in bpf_skc_to_tcp6_sock().
bpf: tcp: Fix type confusion in bpf_skc_to_tcp_sock().
mptcp: bpf: Fix type confusion in bpf_mptcp_sock_from_subflow()
selftest: bpf: Add test for bpf_tcp_sock() and RAW socket.
bpf: tcp: Fix type confusion in bpf_tcp_sock().
tools/headers: Regenerate stddef.h to fix BPF selftests
bpf: Fix sk_local_storage diag dumping uninitialized special fields
bpf: Fix NULL pointer dereference in bpf_skb_fib_lookup()
sockmap: Fix sk_psock_drop() race vs sock_map_{unhash,close,destroy}().
bpf: Fix NULL pointer dereference in bpf_sk_storage_clone and diag paths
selftests/bpf: Verify bpf-tcp-cc rejects TCP_NODELAY
selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks
bpf: Reject TCP_NODELAY in bpf-tcp-cc
bpf: Reject TCP_NODELAY in TCP header option callbacks
Junyoung Jang [Sun, 26 Apr 2026 17:25:05 +0000 (02:25 +0900)]
bpf: Fix off-by-one boundary validation in arena direct-value access
BPF_MAP_TYPE_ARENA accepts BPF_PSEUDO_MAP_VALUE offsets at exactly
the end of the arena mapping (off == arena_size). The boundary check
in arena_map_direct_value_addr() uses `>` instead of `>=`, which
incorrectly allows a one-past-end pointer to be accepted.
Change the condition to `>=` to correctly reject offsets that fall
outside the valid arena user_vm range.
Linpu Yu [Fri, 8 May 2026 14:43:43 +0000 (22:43 +0800)]
xskmap: reject TX-only AF_XDP sockets
XSKMAP entries are used as redirect targets for incoming XDP frames.
A TX-only AF_XDP socket lacks an Rx ring and cannot handle redirected
traffic, but xsk_map_update_elem() currently allows such sockets to
be inserted into the map.
Redirecting packets to such a socket on the veth generic-XDP path
causes a kernel crash in xsk_generic_rcv().
This became possible after xsk_is_setup_for_bpf_map() was removed from
the XSKMAP update path, which allowed bound TX-only sockets to be
inserted into the map.
Reject TX-only sockets during XSKMAP updates to avoid the crash.
They remain fully operational for pure Tx purposes outside XSKMAP.
Fixes: 968be23ceaca ("xsk: Fix possible segfault at xskmap entry insertion") Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Yifan Wu <yifanwucs@gmail.com> Signed-off-by: Linpu Yu <linpu5433@gmail.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://lore.kernel.org/r/20260508144344.694-1-linpu5433@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Paul Chaignon [Thu, 7 May 2026 18:22:06 +0000 (20:22 +0200)]
bpf: Don't run arg-tracking analysis twice on main subprog
Because subprog 0, the main subprog, is considered a global function,
we end up running the arg-tracking dataflow analysis twice on it. That
results in slightly longer verification but mostly in more verbose
verifier logs. This patch fixes it by keeping only the iteration over
global subprogs.
When running over all of Cilium's programs with BPF_LOG_LEVEL2, this
reduces verbosity by ~20% on average.
Linus Torvalds [Sat, 9 May 2026 18:47:39 +0000 (11:47 -0700)]
Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux
Pull fsverity fix from Eric Biggers:
"Fix a regression in overlayfs caused by an fsverity API change"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
ovl: fix verity lazy-load guard broken by fsverity_active() semantic change
Linus Torvalds [Sat, 9 May 2026 15:32:50 +0000 (08:32 -0700)]
Merge tag 'hwmon-for-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- ads7871: Fix endianness bug in 16-bit register reads
- lm75: Fix configuration register writes and AS6200/TMP112 setup and
alarm handling
- lm63: Fix TOCTOU problems
- corsair-psu: Close HID device on probe errors
- ltc2992: Fix overflow and threshold range
- Documentation: fix link to ideapad-laptop.c file
- Remove stale CONFIG_SENSORS_SBRMI Makefile reference
* tag 'hwmon-for-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (ads7871) Fix endianness bug in 16-bit register reads
hwmon: (lm75) Fix configuration register writes.
hwmon: (lm75) Fix AS6200 and TMP112 setup and alarm handling
hwmon: (lm63) Add locking to avoid TOCTOU
hwmon: (corsair-psu) Close HID device on probe errors
hwmon: Remove stale CONFIG_SENSORS_SBRMI Makefile reference
Documentation: hwmon: fix link to ideapad-laptop.c file
hwmon: (ltc2992) Fix u32 overflow in power read path
hwmon: (ltc2992) Clamp threshold writes to hardware range
Linus Torvalds [Sat, 9 May 2026 15:10:07 +0000 (08:10 -0700)]
Merge tag 'i2c-for-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- sanitize more input parameters in the core (found by syzkaller)
- usual set of driver fixes (proper completion handling, applying
quirks, correct workqueue selection...)
- ID additions to simplify dependency handling
- new email address for Peter Rosin
* tag 'i2c-for-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: smbus: reject oversized block transfers in the common path
MAINTAINERS: Update mail for Peter Rosin
i2c: stub: Reject I2C block transfers with invalid length
i2c: Compare the return value of gpiod_get_direction against GPIO_LINE_DIRECTION_OUT
i2c: dev: prevent integer overflow in I2C_TIMEOUT ioctl
i2c: acpi: Add ELAN0678 to i2c_acpi_force_100khz_device_ids
dt-bindings: i2c: apple,i2c: Add t8122 compatible
i2c: stm32f7: reinit_completion() per transfer not per msg
dt-bindings: i2c: amlogic: Add compatible for T7 SOC
i2c: testunit: Replace system_long_wq with system_dfl_long_wq
Linus Torvalds [Sat, 9 May 2026 15:03:21 +0000 (08:03 -0700)]
Merge tag 'powerpc-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Madhavan Srinivasan:
- Fix KASAN sanitization flag for core_$(BITS).o
- Fixes for handling offset values in pseries htmdump
- Fix interrupt mask in cpm1_gpiochip_add16()
- ps3/pasemi fixes to drop redundant result assignment
- Fixes in papr-hvpipe code path
- powerpc/perf: Update check for PERF_SAMPLE_DATA_SRC marked events
Thanks to Aboorva Devarajan, Athira Rajeev, Christophe Leroy (CS GROUP),
Geert Uytterhoeven, Haren Myneni, Krzysztof Kozlowski, Mukesh Kumar
Chaurasiya (IBM), Nathan Chancellor, Ritesh Harjani (IBM), Shivani
Nittor, Sourabh Jain, Thomas Zimmermann, and Venkat Rao Bagalkote.
* tag 'powerpc-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (21 commits)
powerpc/pasemi: Drop redundant res assignment
powerpc/ps3: Drop redundant result assignment
powerpc/vdso: Drop -DCC_USING_PATCHABLE_FUNCTION_ENTRY from 32-bit flags with clang
arch/powerpc: Drop CONFIG_FIRMWARE_EDID from defconfig files
powerpc/perf: Update check for PERF_SAMPLE_DATA_SRC marked events
powerpc/8xx: Fix interrupt mask in cpm1_gpiochip_add16()
powerpc/vmx: avoid KASAN instrumentation in enter_vmx_ops() for kexec
powerpc/kdump: fix KASAN sanitization flag for core_$(BITS).o
pseries/papr-hvpipe: Fix style and checkpatch issues in enable_hvpipe_IRQ()
pseries/papr-hvpipe: Refactor and simplify hvpipe_rtas_recv_msg()
pseries/papr-hvpipe: Kill task_struct pointer from struct hvpipe_source_info
pseries/papr-hvpipe: Simplify spin unlock usage in papr_hvpipe_handle_release()
pseries/papr-hvpipe: Fix the usage of copy_to_user()
pseries/papr-hvpipe: Fix & simplify error handling in papr_hvpipe_init()
pseries/papr-hvpipe: Fix null ptr deref in papr_hvpipe_dev_create_handle()
pseries/papr-hvpipe: Prevent kernel stack memory leak to userspace
pseries/papr-hvpipe: Fix race with interrupt handler
powerpc/pseries/htmdump: Add memory configuration dump support to htmdump module
powerpc/pseries/htmdump: Fix the offset value used in htm status dump
powerpc/pseries/htmdump: Fix the offset value used in processor configuration dump
...
Linus Torvalds [Sat, 9 May 2026 03:28:45 +0000 (20:28 -0700)]
Merge tag 'x86-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix memory map enumeration bug in the Xen e820 parsing code (Juergen
Gross)
- Re-enable e820 BIOS fallback if e820 table is empty (David Gow)
* tag 'x86-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/e820: Re-enable BIOS fallback if e820 table is empty
x86/xen: Fix a potential problem in xen_e820_resolve_conflicts()
Linus Torvalds [Sat, 9 May 2026 02:42:10 +0000 (19:42 -0700)]
Merge tag 'sched-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
- Fix spurious failures in rseq self-tests (Mark Brown)
- Fix rseq rseq::cpu_id_start ABI regression due to TCMalloc's creative
use of the supposedly read-only field
The fix is to introduce a new ABI variant based on a new (larger)
rseq area registration size, to keep the TCMalloc use of rseq
backwards compatible on new kernels (Thomas Gleixner)
- Fix wakeup_preempt_fair() for not waking up task (Vincent Guittot)
- Fix s64 mult overflow in vruntime_eligible() (Zhan Xusheng)
* tag 'sched-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix wakeup_preempt_fair() for not waking up task
sched/fair: Fix overflow in vruntime_eligible()
selftests/rseq: Expand for optimized RSEQ ABI v2
rseq: Reenable performance optimizations conditionally
rseq: Implement read only ABI enforcement for optimized RSEQ V2 mode
selftests/rseq: Validate legacy behavior
selftests/rseq: Make registration flexible for legacy and optimized mode
selftests/rseq: Skip tests if time slice extensions are not available
rseq: Revert to historical performance killing behaviour
rseq: Don't advertise time slice extensions if disabled
rseq: Protect rseq_reset() against interrupts
rseq: Set rseq::cpu_id_start to 0 on unregistration
selftests/rseq: Don't run tests with runner scripts outside of the scripts
Linus Torvalds [Sat, 9 May 2026 02:39:18 +0000 (19:39 -0700)]
Merge tag 'perf-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events fixes from Ingo Molnar:
- Fix deadlock in the perf_mmap() failure path (Peter Zijlstra)
- Intel ACR (Auto Counter Reload) fixes (Dapeng Mi):
- Fix validation and configuration of ACR masks
- Fix ACR rescheduling bug causing stale masks
- Disable the PMI on ACR-enabled hardware
- Enable ACR on Panther Cover uarch too
* tag 'perf-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Enable auto counter reload for DMR
perf/x86/intel: Disable PMI for self-reloaded ACR events
perf/x86/intel: Always reprogram ACR events to prevent stale masks
perf/x86/intel: Improve validation and configuration of ACR masks
perf/core: Fix deadlock in perf_mmap() failure path
Linus Torvalds [Fri, 8 May 2026 23:08:58 +0000 (16:08 -0700)]
Merge tag 'pci-v7.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fixes from Bjorn Helgaas:
- Don't fallback to bus reset after failed slot reset; a bus reset
isn't safe if the .reset_slot() callback is implemented (Keith Busch)
- Update saved_config_space upon resource assignment to fix passthrough
regressions when x86 pcibios_assign_resources() updates BARs (Lukas
Wunner)
- Initialize a temporary pci_dev->dev in sysfs 'new_id' attribute to
fix a lockdep regression after driver_override was moved from PCI to
device core (Samiullah Khawaja)
- Update MAINTAINERS email addresses (Marek Vasut, Hans Zhang)
- Add MAINTAINERS reviewer for PCIe Cadence IP (Aksh Garg)
* tag 'pci-v7.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
MAINTAINERS: Add Aksh Garg as PCIe CADENCE reviewer
MAINTAINERS: Update Hans Zhang email for PCIe CIX Sky1
MAINTAINERS: Update Marek Vasut email for PCIe R-Car
PCI: Initialize temporary device in new_id_store()
PCI: Update saved_config_space upon resource assignment
PCI: Don't fallback to bus reset after failed slot reset
PCI: Initialize temporary device in new_id_store()
When setting new_id of a PCI device driver using sysfs a lockdep splat
occurs. This is because new_id_store() builds a temporary pci_dev for
pci_match_device(), which calls device_match_driver_override(). That
depends on the driver_override.lock added by cb3d1049f4ea ("driver core:
generalize driver_override in struct device").
The new driver_override.lock was not initialized in the temporary pci_dev,
resulting in this lockdep splat.
Initialize the temporary pci_dev to fix this.
Repro:
Build with CONFIG_LOCKDEP=y, boot with QEMU, and add a new ID:
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
CPU: 2 UID: 0 PID: 177 Comm: liveupdate-iomm Not tainted 7.0.0+ #9 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x5d/0x80
register_lock_class+0x77e/0x790
lock_acquire+0xbf/0x2e0
pci_match_device+0x24/0x180
new_id_store+0x189/0x1d0
kernfs_fop_write_iter+0x14f/0x210
vfs_write+0x263/0x5e0
ksys_write+0x79/0xf0
do_syscall_64+0x117/0xf80
Fixes: 10a4206a2401 ("PCI: use generic driver_override infrastructure") Fixes: 8895d3bcb8ba ("PCI: Fail new_id for vendor/device values already built into driver") Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
[bhelgaas: add commit log details and repro, trim backtrace] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Link: https://patch.msgid.link/20260505234327.716630-1-skhawaja@google.com
PCI: Update saved_config_space upon resource assignment
Bernd reports passthrough failure of a Digital Devices Cine S2 V6 DVB
adapter plugged into an ASRock X570S PG Riptide board with BIOS version
P5.41 (09/07/2023):
ddbridge 0000:05:00.0: detected Digital Devices Cine S2 V6 DVB adapter
ddbridge 0000:05:00.0: cannot read registers
ddbridge 0000:05:00.0: fail
BIOS assigns an incorrect BAR to the DVB adapter which doesn't fit into the
upstream bridge window. The kernel corrects the BAR assignment:
pci 0000:07:00.0: BAR 0 [mem 0xfffffffffc500000-0xfffffffffc50ffff 64bit]: can't claim; no compatible bridge window
pci 0000:07:00.0: BAR 0 [mem 0xfc500000-0xfc50ffff 64bit]: assigned
Correction of the BAR assignment happens in an x86-specific fs_initcall,
pcibios_assign_resources(), after device enumeration in a subsys_initcall.
This order was introduced at the behest of Linus in 2004:
No other architecture performs such a late BAR correction.
Bernd bisected the issue to commit a2f1e22390ac ("PCI/ERR: Ensure error
recoverability at all times"), but it only occurs in the absence of commit 4d4c10f763d7 ("PCI: Explicitly put devices into D0 when initializing").
This combination exists in stable kernel v6.12.70, but not in mainline,
hence Bernd cannot reproduce the issue with mainline.
Since a2f1e22390ac, config space is saved on enumeration, prior to BAR
correction. Upon passthrough, the corrected BAR is overwritten with the
incorrect saved value by:
But only if the device's current_state is PCI_UNKNOWN, as it was prior to
commit 4d4c10f763d7. Since the commit, it is PCI_D0, which changes the
behavior of vfio_pci_set_power_state() to no longer restore the state
without saving it first.
Alexandre is reporting the same issue as Bernd, but in his case, mainline
is affected as well. The difference is that on Alexandre's system, the
host kernel binds a driver to the device which is unbound prior to
passthrough, whereas on Bernd's system no driver gets bound by the host
kernel.
Unbinding sets current_state to PCI_UNKNOWN in pci_device_remove(), so when
vfio-pci is subsequently bound to the device, pci_restore_state() is once
again called without invoking pci_save_state() first.
To robustly fix the issue, always update saved_config_space upon resource
assignment.
Reported-by: Bernd Schumacher <bernd@bschu.de> Closes: https://lore.kernel.org/r/acfZrlP0Ua_5D3U4@eldamar.lan/ Reported-by: Alexandre N. <an.tech@mailo.com> Closes: https://lore.kernel.org/r/dd3c3358-de0f-4a56-9c81-04aceaab4058@mailo.com/ Fixes: a2f1e22390ac ("PCI/ERR: Ensure error recoverability at all times") Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Bernd Schumacher <bernd@bschu.de> Tested-by: Alexandre N. <an.tech@mailo.com> Cc: stable@vger.kernel.org # v6.12+ Link: https://patch.msgid.link/febc3f354e0c1f5a9f5b3ee9ffddaa44caccf651.1776268054.git.lukas@wunner.de
bpf: Free reuseport cBPF prog after RCU grace period.
Eulgyu Kim reported the splat below with a repro. [0]
The repro sets up a UDP reuseport group with a cBPF prog and
replaces it with a new one while another thread is sending
a UDP packet to the group.
The reuseport prog is freed by sk_reuseport_prog_free().
bpf_prog_put() is called for "e"BPF prog to destruct through
multiple stages while cBPF prog is freed immediately by
bpf_release_orig_filter() and bpf_prog_free().
If a reuseport prog is detached from the setsockopt() path
(reuseport_attach_prog() or reuseport_detach_prog()),
sk_reuseport_prog_free() is called without waiting for RCU
readers to complete, resulting in various bugs.
Let's defer freeing the reuseport cBPF prog after one RCU
grace period.
Note "e"BPF prog is safe as is unless the fast path starts
to touch fields destroyed in bpf_prog_put_deferred() and
__bpf_prog_put_noref().
Linus Torvalds [Fri, 8 May 2026 20:18:13 +0000 (13:18 -0700)]
Merge tag 'block-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- Fix for ublk not doing an actual issue from the task_work fallback
path. Any request hitting that should be canceled automatically
- Fix for uring_cmd prep side handling, for the block side uring_cmd
discard handling
- Fix for missing validation of the io and physical block size shifts
- Fix for a use-after-free in ublk's cancel command handling
* tag 'block-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
ublk: fix use-after-free in ublk_cancel_cmd()
ublk: validate physical_bs_shift, io_min_shift and io_opt_shift
block: only read from sqe on initial invocation of blkdev_uring_cmd()
ublk: don't issue uring_cmd from fallback task work
Linus Torvalds [Fri, 8 May 2026 20:12:48 +0000 (13:12 -0700)]
Merge tag 'io_uring-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fixes from Jens Axboe:
- Ensure that the absolute timeouts for both the command side and the
waiting side honor the callers time namespace
- Ensure tracked NAPI entries are cleared at unregistration time, as
the NAPI polling loop checks the list state rather than the general
NAPI state. This can lead to NAPI polling even after unregistration
has been done. If unregistered, all NAPI polling should be disabled
- Fix for eventfd recursive invocation handling
* tag 'io_uring-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
io_uring/timeout: honour caller's time namespace for IORING_TIMEOUT_ABS
io_uring/eventfd: reset deferred signal state
io_uring/napi: clear tracked NAPI entries on unregister
bpf: tcp: Fix type confusion in sol_tcp_sockopt().
sol_tcp_sockopt() only checks if sk->sk_protocol is IPPROTO_TCP,
but RAW socket can bypass it:
socket(AF_INET, SOCK_RAW, IPPROTO_TCP)
Let's use sk_is_tcp().
Note that initially sol_tcp_sockopt() checked sk->sk_prot->setsockopt.
Fixes: 2ab42c7b871f ("bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt().") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260504210610.180150-7-kuniyu@google.com
mptcp: bpf: Fix type confusion in bpf_mptcp_sock_from_subflow()
bpf_mptcp_sock_from_subflow() only checks if sk->sk_protocol is
IPPROTO_TCP, but RAW socket can bypass it:
socket(AF_INET, SOCK_RAW, IPPROTO_TCP)
In this case, it would NOT be valid to call sk_is_mptcp() which will
assume sk is a pointer to a struct tcp_sock, and wrongly checks for:
tcp_sk(sk)->is_mptcp.
Linus Torvalds [Fri, 8 May 2026 17:24:35 +0000 (10:24 -0700)]
Merge tag 'v7.1-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Fix for two ACL issues (security fix to validate dacloffset better
and chmod fix)
- Fix out of bounds reads (in check_wsl_eas and smb2_check_msg for
symlinks)
- Two Kerberos fixes including an important one when AES-256 encryption
chosen
- Fix open_cached_dir problem when directory leases disabled
* tag 'v7.1-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: validate dacloffset before building DACL pointers
smb/client: fix out-of-bounds read in smb2_compound_op()
smb/client: fix out-of-bounds read in symlink_data()
smb: client: Zero-pad short GSS session keys per MS-SMB2
smb: client: Use FullSessionKey for AES-256 encryption key derivation
smb: client: use kzalloc to zero-initialize security descriptor buffer
cifs: abort open_cached_dir if we don't request leases
Linus Torvalds [Fri, 8 May 2026 17:14:51 +0000 (10:14 -0700)]
Merge tag 'spi-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"There's two main series here, fixing issues that came up in the
Microchip QSPI and Freescale i.MX drivers. Both of those could result
in some quite noticable issues if they were encountered in production.
We also have one minor documentation fix in the ch341 driver"
* tag 'spi-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: ch341: correct company name in MODULE_DESCRIPTION
spi: microchip-core-qspi: remove some inline markings
spi: microchip-core-qspi: don't attempt to transmit during emulated read-only dual/quad operations
spi: microchip-core-qspi: control built-in cs manually
spi: imx: Propagate prepare_transfer() error from spi_imx_setupxfer()
spi: imx: Fix UAF on package-1 prepare failure in spi_imx_dma_data_prepare()
spi: imx: Fix precedence bug in spi_imx_dma_max_wml_find()
The buggy address belongs to the object at ffff88801083d280
which belongs to the cache RAW of size 1792
The buggy address is located 1248 bytes inside of
allocated 1792-byte region [ffff88801083d280, ffff88801083d980)
Fixes: 655a51e536c0 ("bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock") Reported-by: Damiano Melotti <melotti@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://patch.msgid.link/20260504210610.180150-2-kuniyu@google.com
Linus Torvalds [Fri, 8 May 2026 15:23:06 +0000 (08:23 -0700)]
Merge tag 'drm-fixes-2026-05-08-1' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly fixes, lots of them but all pretty small, amdgpu and xe are the
usual but then a large amount of fixes all over.
xe:
- Add NULL check for media_gt in intel_hdcp_gsc_check_status
- Fix EAGAIN sign in pf_migration_consume
- Fix MMIO access using PF view instead of VF view during migration
- Exclude indirect ring state page from ADS engine state size
* tag 'drm-fixes-2026-05-08-1' of https://gitlab.freedesktop.org/drm/kernel: (37 commits)
drm: Set old handle to NULL before prime swap in change_handle
drm/bochs: Drop manual put on probe error path
drm/xe/guc: Exclude indirect ring state page from ADS engine state size
drm/xe/pf: Fix MMIO access using PF view instead of VF view during migration
drm/xe/pf: Fix EAGAIN sign in pf_migration_consume()
drm/xe/hdcp: Add NULL check for media_gt in intel_hdcp_gsc_check_status()
drm/exynos: remove bridge when component_add fails
drm/amdgpu: nuke amdgpu_userq_fence_slab v2
drm/amdgpu/userq: fix access to stale wptr mapping
drm/amdkfd: Check if there are kfd porcesses using adev by kfd_processes_count
drm/amdgpu: zero-initialize GART table on allocation
drm/amdgpu/sdma4: replace BUG_ON with WARN_ON in fence emission
drm/radeon: add missing revision check for CI
drm/amdgpu/pm: align Hawaii mclk workaround with radeon
drm/amdgpu/pm: add missing revision check for CI
drm/amdgpu/gfx9: drop unnecessary 64-bit fence flag check in KIQ
drm/amdkfd: Make all TLB-flushes heavy-weight
drm/panel: himax-hx83102: restore MODE_LPM after sending disable cmds
drm/panel: boe-tv101wum-nl6: restore MODE_LPM after sending disable cmds
drm/panel: feiyang-fy07024di26a30d: return display-on error
...
Linus Torvalds [Fri, 8 May 2026 15:16:07 +0000 (08:16 -0700)]
Merge tag 'iommu-fixes-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:
"Core:
- Cache-flushing fix for non-x86 platforms
AMD-Vi:
- Security fix when SEV-SNP is enabled
- Operator precedence fix in DTE setting"
* tag 'iommu-fixes-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommu/amd: Fix precedence order in set_dte_passthrough()
iommu/pages: Fix iommu_pages_flush_incoherent() for non-x86
iommu/amd: Use maximum PPR log buffer size when SNP is enabled on Family 0x19
iommu/amd: Use maximum Event log buffer size when SNP is enabled on Family 0x19
Ming Lei [Fri, 8 May 2026 12:37:46 +0000 (20:37 +0800)]
ublk: fix use-after-free in ublk_cancel_cmd()
When ublk_reset_ch_dev() clears io->cmd via ublk_queue_reinit()
concurrently with ublk_cancel_cmd(), ublk_cancel_cmd() can read a
stale pointer and pass it to io_uring_cmd_done(), causing a
use-after-free.
Fix by synchronizing the two paths with ubq->cancel_lock:
- ublk_cancel_cmd(): read and clear io->cmd under cancel_lock,
then call io_uring_cmd_done() on the saved local copy outside
the lock.
- ublk_reset_ch_dev(): hold cancel_lock across ublk_queue_reinit()
so that io->cmd and io->flags are cleared atomically with respect
to ublk_cancel_cmd().
Francis, David [Tue, 28 Apr 2026 19:25:50 +0000 (19:25 +0000)]
drm: Set old handle to NULL before prime swap in change_handle
There was a potential race condition in change_handle. The ioctl
briefly had a single object with two idr entries; a concurrent
gem_close could delete the object and remove one of the handles
while leaving the other one dangling, which could subsequently
be dereferenced for a use-after-free.
To fix this, do the same dance that gem_close itself does.
(f6cd7daecff5 drm: Release driver references to handle before making it available again)
First idr_replace the old handle to NULL. Later, if the prime
operations are successful, actually close it.
create_tail required a similar dance to avoid a similar problem.
(bd46cece51a3 drm/gem: Fix race in drm_gem_handle_create_tail())
It idr_allocs the new handle with NULL, then swaps in the correct
object later to avoid races. We don't need to do that here, since
the only operations that could race are drm_prime, and
change_handle holds the prime lock for the entire duration.
v2: cleanups of error paths
Signed-off-by: David Francis <David.Francis@amd.com> Co-authored-by: Dave Airlie <airlied@gmail.com> Reported-by: Puttimet Thammasaeng <pwn8official@gmail.com> Tested-by: Vitaly Prosyak <Vitaly.Prosyak@amd.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: stable@vger.kernel.org Cc: Christian Koenig <Christian.Koenig@amd.com> Fixes: 53096728b8910 ("drm: Add DRM prime interface to reassign GEM handle") Signed-off-by: Dave Airlie <airlied@redhat.com>
The testsuite defines several kprobes and kretprobes as static variables
that are preserved across test runs.
After register_kprobe and unregister_kprobe, a kprobe contains some
leftover data that must be cleared before the kprobe can be registered
again. The tests are setting symbol_name to define the probe location.
Address and flags must be cleared.
The existing code clears some of the probes between subsequent tests, but
not between two test runs. The leftover data from a previous test run
makes the registrations fail in the next run.
Move the cleanups for all kprobes into kprobes_test_init, this function
is called before each single test (including the first test of a test
run).
Jianpeng Chang [Fri, 8 May 2026 00:56:36 +0000 (09:56 +0900)]
kprobes: skip non-symbol addresses in kprobe_add_ksym_blacklist()
When kprobe_add_area_blacklist() iterates through a section like
.kprobes.text, the start address may not correspond to a named symbol.
On ARM64 with CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS=y (introduced by
commit baaf553d3bc3 ("arm64: Implement
HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS")), the compiler flag
-fpatchable-function-entry=4,2 inserts 2 NOPs before each function entry
point for ftrace call_ops. These pre-function NOPs sit at the section base
address, before the first named function symbol. The compiler emits a $x
mapping symbol at offset 0x00 to mark the start of code, but
find_kallsyms_symbol() ignores mapping symbols.
Without CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS (e.g. defconfig), no
pre-function NOPs are inserted, the first function starts at offset
0x00, and the bug does not trigger.
This only affects modules that have a .kprobes.text section (i.e. those
using the __kprobes annotation). Modules using NOKPROBE_SYMBOL() instead
(like kretprobe_example.ko) blacklist exact function addresses via the
_kprobe_blacklist section and are not affected.
For kprobe_example.ko on ARM64 with -fpatchable-function-entry=4,2,
the .kprobes.text section layout is:
kprobe_add_area_blacklist() starts iterating from the section base
address (offset 0x00), which only has the $x mapping symbol.
kprobe_add_ksym_blacklist() then calls kallsyms_lookup_size_offset()
for this address, which goes through:
find_kallsyms_symbol() scans all module symbols to find the closest
preceding symbol.
Since no named text symbol exists at offset 0x00,
find_kallsyms_symbol() picks __UNIQUE_ID_vermagic (a .modinfo symbol
whose address is in the temporary image) as the "best" match. The
computed "size" = next_text_symbol - modinfo_symbol spans across
these two unrelated memory regions, creating a blacklist entry with
a bogus range of tens of terabytes.
Whether this causes a visible failure depends on address randomization,
here is what happens on Raspberry Pi 4/5:
- On RPi5, the bogus size was ~35 TB. start + size stayed within
64-bit range, so the blacklist entry covered the entire kernel
text. register_kprobe() in the module's own init function failed
with -EINVAL.
- On RPi4, the bogus size was ~75 TB. start + size overflowed
64 bits and wrapped to a small address near zero. The range
check (addr >= start && addr < end) then failed because end
wrapped around, so the bogus entry was accidentally harmless
and kprobes worked by luck.
The same bug exists on both machines, but randomization determines whether
the integer overflow masks it or not.
Fix this by adding notrace to the __kprobes macro. Functions in
.kprobes.text are kprobe infrastructure handlers that should never be
traced by ftrace. With notrace, the compiler stops inserting them and the
non-symbol gap at the section start disappears entirely.
Linus Torvalds [Fri, 8 May 2026 00:26:43 +0000 (17:26 -0700)]
Merge tag 'selinux-pr-20260507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fixes from Paul Moore:
- Allow for multiple opens of /sys/fs/selinux/policy
Prevent a single process from blocking others from reading the
SELinux policy loaded in the kernel. This does have the side effect
of potentially allowing userspace to trigger additional kernel memory
allocations as part of the open/read operation, but this is mitigated
by requiring the SELinux security/read_policy permission.
- Reduce the critical sections where the SELinux policy mutex is held
This includes the patch to the policy loader code where we move the
permission checks and an allocation outside the mutex as well as the
the patch to checkreqprot which drops the code/lock entirely.
While the checkreqprot code had effectively been dropped in an
earlier release, portions of the code still remained that would have
triggered the mutex to perform an IMA measurement. This finally drops
all of that while preserving the user visible behavior.
- Eliminate potential sources of log spamming
There were a few areas where processes could flood the system logs
and hide other, more critical events. The previously disabled
checkreqprot and runtime disable knobs in selinuxfs were two such
areas that have now been greatly simplified and a pr_err() replaced
with a pr_err_once().
The third such place is the /sys/fs/selinux/user file, which hasn't
been used by a userspace release since 2020 and was scheduled for
removal after 2025; this effectively disables this functionality, but
similar to checkreqprot, it is done in a way that should not break
old userspace.
* tag 'selinux-pr-20260507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: shrink critical section in sel_write_load()
selinux: allow multiple opens of /sys/fs/selinux/policy
selinux: prune /sys/fs/selinux/user
selinux: prune /sys/fs/selinux/disable
selinux: prune /sys/fs/selinux/checkreqprot
Tabrez Ahmed [Sat, 2 May 2026 02:08:42 +0000 (07:38 +0530)]
hwmon: (ads7871) Fix endianness bug in 16-bit register reads
The ads7871_read_reg16() function relies on spi_w8r16() to read the
16-bit sensor output. The ADS7871 device transmits the Least Significant
Byte (LSB) first.
On Little-Endian architectures, spi_w8r16() correctly reconstructs the
16-bit value. However, on Big-Endian architectures, the byte swapping
causes the first received byte (LSB) to be placed in the most significant
byte of the u16, resulting in corrupted voltage readings.
To fix this, cast the integer result of spi_w8r16() to a restricted
__le16 type and convert it to the host CPU's native byte order using
le16_to_cpu(). Negative error codes returned by the SPI core are caught
and returned prior to the conversion to avoid mangling the error status.
Reported-by: Sashiko <sashiko-bot@kernel.org> Closes: https://sashiko.dev/#/patchset/20260418034601.90226-1-tabreztalks@gmail.com Fixes: e0c70b8078629 ("hwmon: add TI ads7871 a/d converter driver") Suggested-by: David Laight <david.laight.linux@gmail.com> Signed-off-by: Tabrez Ahmed <tabreztalks@gmail.com> Link: https://lore.kernel.org/r/20260502020844.110038-2-tabreztalks@gmail.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Sensors configurations are defined by set and clear masks. These
do not follow a consistent "clear mask is a superset of set mask"
rule. This relaxed definition breaks lm75_write_config()
Basically all bits from set_mask that are not defined in clr_mask are
dropped. Fix that by enhancing the helper to always combine clr_mask
and set_mask into the mask bits of regmap_update_bits().
hwmon: (lm75) Fix AS6200 and TMP112 setup and alarm handling
The initialization of the AS6200 has two shortcomings
- The device-add-commit states "Conversion mode: continuous" but the
the lm75_params structure uses set_mask = 0x94c0. This activates
single shot mode (bit 15). According to the datasheet "The device
features a single shot measurement mode if the device is in sleep
mode (SM=1)". This is quite contradictionary.
- It is the only device that activates polarity active-high (bit 10)
All this is paired with a undefined clear mask bug in function
lm75_write_config() that was introduced with a later refactoring
commit.
[as6200] = {
.config_reg_16bits = true,
.set_mask = 0x94C0,
-> .clr_mask not defined here
.default_resolution = 12,
...
static inline int lm75_write_config(struct lm75_data *data, u16 set_mask,
u16 clr_mask)
{
return regmap_update_bits(data->regmap, LM75_REG_CONF,
clr_mask | LM75_SHUTDOWN, set_mask);
}
regmap_update_bits() requires clr_mask to be a superset of set_mask.
So basically all sensors with "wrong" masks like the AS6200 are not
initialized as intended.
Fix that by
- Change the set_mask to 0xc010 to reflect the current active-low
setup properly and to drive the sensor in continous mode. This
takes into account that the config register is little endian and
the first byte sent to the chip is the LSB.
- Adapt the alarm handling so it can report the alarm correctly
even if it is high active. This is done by comparing config register
bit 5 and 10 (translated to 2 and 13).
This commit does not introduce any ABI breakage as the mutliple bugs
effectly drive the AS6200 in standard active-low mode.
Fixes: 4b6358e1fe46 ("hwmon: (lm75) Add AMS AS6200 temperature sensor") Suggested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de> Link: https://lore.kernel.org/r/20260502173207.3567876-2-markus.stockhausen@gmx.de
[groeck: Update set_mask for as6200 further: As modeled, the upper bits
contain the conversion rate, so the config register needs to be set to
0xc010 instead of 0x10c0 to reflect 8 samples/s and 4 consecutive faults.
Fix the same problem for TMP112.] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Dave Airlie [Thu, 7 May 2026 22:51:01 +0000 (08:51 +1000)]
Merge tag 'drm-xe-fixes-2026-05-07' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
Driver Changes:
- Add NULL check for media_gt in intel_hdcp_gsc_check_status (Gustavo)
- Fix EAGAIN sign in pf_migration_consume (Shuicheng)
- Fix MMIO access using PF view instead of VF view during migration (Shuicheng)
- Exclude indirect ring state page from ADS engine state size (Satya)
Dave Airlie [Thu, 7 May 2026 22:34:34 +0000 (08:34 +1000)]
Merge tag 'drm-rust-fixes-2026-05-07' of https://gitlab.freedesktop.org/drm/rust/kernel into drm-fixes
DRM Rust fixes for v7.1-rc3
- Fix unsound initialization in drm::Device::new(); if pinned
initialization of drm::Device::Data fails, make sure
drm::Device::release() isn't called, so we don't run the data's
destructor
- Fix missing GEM state cleanup in the init failure case; call
drm_gem_private_object_fini() if drm_gem_object_init() fails
- Fix wrong ARef import in the DRM shmem GEM helper abstraction
- Replace the nouveau mailing list with the new nova-gpu mailing list
for both nova-core and nova-drm, and remove unused patchwork entries
smb: client: validate dacloffset before building DACL pointers
parse_sec_desc(), build_sec_desc(), and the chown path in
id_mode_to_cifs_acl() all add the server-supplied dacloffset to pntsd
before proving a DACL header fits inside the returned security
descriptor.
On 32-bit builds a malicious server can return dacloffset near
U32_MAX, wrap the derived DACL pointer below end_of_acl, and then slip
past the later pointer-based bounds checks. build_sec_desc() and
id_mode_to_cifs_acl() can then dereference DACL fields from the wrapped
pointer in the chmod/chown rewrite paths.
Validate dacloffset numerically before building any DACL pointer and
reuse the same helper at the three DACL entry points.
Fixes: bc3e9dd9d104 ("cifs: Change SIDs in ACEs while transferring file ownership.") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Zisen Ye [Wed, 6 May 2026 03:49:08 +0000 (11:49 +0800)]
smb/client: fix out-of-bounds read in smb2_compound_op()
If a server sends a truncated response but a large OutputBufferLength, and
terminates the EA list early, check_wsl_eas() returns success without
validating that the entire OutputBufferLength fits within iov_len.
Then smb2_compound_op() does:
memcpy(idata->wsl.eas, data[0], size[0]);
Where size[0] is OutputBufferLength. If iov_len is smaller than size[0],
memcpy can read beyond the end of the rsp_iov allocation and leak adjacent
kernel heap memory.
Zisen Ye [Sat, 2 May 2026 10:48:36 +0000 (18:48 +0800)]
smb/client: fix out-of-bounds read in symlink_data()
Since smb2_check_message() returns success without length validation for
the symlink error response, in symlink_data() it is possible for
iov->iov_len to be smaller than sizeof(struct smb2_err_rsp). If the buffer
only contains the base SMB2 header (64 bytes), accessing
err->ErrorContextCount (at offset 66) or err->ByteCount later in
symlink_data() will cause an out-of-bounds read.
Piyush Sachdeva [Thu, 7 May 2026 16:52:14 +0000 (22:22 +0530)]
smb: client: Zero-pad short GSS session keys per MS-SMB2
Per MS-SMB2 section 3.2.5.3, Session.SessionKey is the first 16 bytes
of the GSS cryptographic key, right-padded with zero bytes if the key
is shorter than 16 bytes.
SMB2_auth_kerberos() copies the GSS session key from the cifs.upcall
response using kmemdup(msg->data, msg->sesskey_len, ...) and stores
the GSS-reported length verbatim in ses->auth_key.len. generate_key()
reads SMB2_NTLMV2_SESSKEY_SIZE bytes from this buffer when feeding the
HMAC-SHA256 KDF for signing key derivation. If a GSS mechanism returns
a session key shorter than 16 bytes (e.g. a deprecated single-DES
Kerberos enctype with an 8-byte session key), the KDF call performs an
out-of-bounds slab read and derives keys that do not match the server,
which pads per the spec.
Modern KDCs disable short-key enctypes by default, so this is latent
rather than reachable in production, but it is still a kernel heap
over-read.
Allocate auth_key.response with kzalloc() at a length of
max(msg->sesskey_len, SMB2_NTLMV2_SESSKEY_SIZE), copy the GSS key in,
and rely on kzalloc()'s zero initialization for the spec-mandated
padding. Set ses->auth_key.len to the padded length. Larger GSS keys
(e.g. the 32-byte aes256-cts-hmac-sha1-96 session key) continue to be
stored at their natural length, preserving the FullSessionKey path.
Emit a cifs_dbg(VFS, ...) message when a short key is encountered to
surface deprecated-enctype usage.
NTLMv2 and NTLMSSP code paths produce a 16-byte session key by
construction and are unaffected.
Signed-off-by: Piyush Sachdeva <psachdeva@microsoft.com> Signed-off-by: Piyush Sachdeva <s.piyush1024@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Piyush Sachdeva [Thu, 7 May 2026 16:52:13 +0000 (22:22 +0530)]
smb: client: Use FullSessionKey for AES-256 encryption key derivation
When Kerberos authentication is used with AES-256 encryption (AES-256-CCM
or AES-256-GCM), the SMB3 encryption and decryption keys must be derived
using the full session key (Session.FullSessionKey) rather than just the
first 16 bytes (Session.SessionKey).
Per MS-SMB2 section 3.2.5.3.1, when Connection.Dialect is "3.1.1" and
Connection.CipherId is AES-256-CCM or AES-256-GCM, Session.FullSessionKey
must be set to the full cryptographic key from the GSS authentication
context. The encryption and decryption key derivation (SMBC2SCipherKey,
SMBS2CCipherKey) must use this FullSessionKey as the KDF input. The
signing key derivation continues to use Session.SessionKey (first 16
bytes) in all cases.
Previously, generate_key() hardcoded SMB2_NTLMV2_SESSKEY_SIZE (16) as the
HMAC-SHA256 key input length for all derivations. When Kerberos with
AES-256 provides a 32-byte session key, the KDF for encryption/decryption
was using only the first 16 bytes, producing keys that did not match the
server's, causing mount failures with sec=krb5 and require_gcm_256=1.
Add a full_key_size parameter to generate_key() and pass the appropriate
size from generate_smb3signingkey():
- Signing: always SMB2_NTLMV2_SESSKEY_SIZE (16 bytes)
- Encryption/Decryption: ses->auth_key.len when AES-256, otherwise 16
Also fix cifs_dump_full_key() to report the actual session key length for
AES-256 instead of hardcoded CIFS_SESS_KEY_SIZE, so that userspace tools
like Wireshark receive the correct key for decryption.
Cc: <stable@vger.kernel.org> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Piyush Sachdeva <psachdeva@microsoft.com> Signed-off-by: Piyush Sachdeva <s.piyush1024@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Daniel Machon [Wed, 6 May 2026 07:25:39 +0000 (09:25 +0200)]
net: sparx5: configure serdes for 1000BASE-X in sparx5_port_init()
sparx5_port_init() only invokes sparx5_serdes_set() and the associated
shadow-device enable and low-speed device switch for SGMII and QSGMII.
On any port with a high-speed primary device (DEV5G/DEV10G/DEV25G)
configured for 1000BASE-X the serdes is therefore left uninitialized,
the DEV2G5 shadow is never enabled, and the port stays pointed at its
high-speed device rather than the DEV2G5. The PCS1G block looks
healthy in isolation, but no frames reach the link partner.
Add 1000BASE-X to the check so the same three steps run.
Note: the same issue might apply to 2500BASE-X, but that will,
eventually, be addressed in a separate commit.
The value read back from the chip is GCB_CHIP_ID_PART_ID, which is a
GENMASK(27, 12) field, i.e. at most 16 bits wide. It can never match
these IDs, so probing a TSN part fails with a "Target not supported"
error.
Fix the enum to use the actual 16-bit part IDs returned by the
hardware: 0x0546, 0x0549, 0x0552, 0x0556 and 0x0558.
Linus Torvalds [Thu, 7 May 2026 15:55:15 +0000 (08:55 -0700)]
Merge tag 'sound-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Again a collection of small fixes, mostly for device-specific ones.
The only big LOC is about the removal of pretty old dead code in
ab8500 codec driver, while the rest all nice small changes.
Core / API:
- Fix race in deferred fasync state checks
- Fix UMP group filtering in sequencer
ASoC:
- cs35l56: fixes for driver cleanup and error paths
- tas2764/2770: workaround for bogus temperature readings
- wm_adsp: fixes for firmware unit tests
- amd-yc: more DMI quirks for laptops
- Minor fixes for fsl_xcvr and spacemit
HD-Audio:
- Mute LED and speaker quirks for HP, Lenovo, and Xiaomi laptops
USB-audio:
- New device-specific quirks (Motu, JBL, AlphaTheta, Razer)
- Fix of MIDI2 playback on resume
Others:
- Firewire-tascam control event fix
- Minor cleanups and fixes for sparc/dbri and pcmtest"
* tag 'sound-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (28 commits)
ASoC: cs35l56: Destroy workqueue in probe error path
ASoC: cs35l56: Don't use devres to unregister component
ALSA: sparc/dbri: add missing fallthrough
ALSA: core: Serialize deferred fasync state checks
ALSA: hda/realtek: Add mute LED fixup for HP Pavilion 15-cs1xxx
ALSA: seq: Fix UMP group 16 filtering
ASoC: wm_adsp_fw_find_test: Clear searched_fw_files in find-by-index test
ASoC: wm_adsp_fw_find_test: Redirect wm_adsp_release_firmware_files()
ASoC: tas2770: Deal with bogus initial temperature value
ASoC: tas2764: Deal with bogus initial temperature register value
ALSA: usb-audio: add clock quirk for Motu 1248
ALSA: usb-audio: midi2: Restart output URBs on resume
ALSA: hda/realtek: Fix mute and mic-mute LEDs for HP Envy X360 15-fh0xxx
ALSA: usb-audio: Add quirk flags for JBL Pebbles
ALSA: firewire-tascam: Do not drop unread control events
ALSA: usb-audio: Add quirk flags for AlphaTheta EUPHONIA
ASoC: fsl_xcvr: Fix event generation for cached controls
ASoC: sdw_utils: avoid the SDCA companion function not supported failure
ASoC: amd: yc: Add HP OMEN Gaming Laptop 16-ap0xxx product line in quirk table
ASoC: cs35l56: Fix out-of-bounds in dev_err() in cs35l56_read_onchip_spkid()
...
Linus Torvalds [Thu, 7 May 2026 15:43:25 +0000 (08:43 -0700)]
Merge tag 'pmdomain-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain fixes from Ulf Hansson:
- Fix detach procedure for virtual devices in genpd
- mediatek: Fix use-after-free in scpsys_get_bus_protection_legacy()
* tag 'pmdomain-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: mediatek: fix use-after-free in scpsys_get_bus_protection_legacy()
pmdomain: core: Fix detach procedure for virtual devices in genpd
Joey Lu [Wed, 6 May 2026 08:46:13 +0000 (16:46 +0800)]
net: stmmac: dwmac-nuvoton: fix NULL pointer dereference in nvt_set_phy_intf_sel()
priv->dev was never initialized after devm_kzalloc() allocates the
private data structure. When nvt_set_phy_intf_sel() is later invoked
via the phylink interface_select callback, it calls
nvt_gmac_get_delay(priv->dev, ...) which dereferences the NULL pointer.
Fix this by assigning priv->dev = dev immediately after allocation.
If a socket is bound to a wildcard address, tcp_v[46]_connect()
updates it with a non-wildcard address based on the route lookup.
After bhash2 was introduced in the cited commit, we must call
inet_bhash2_update_saddr() to update the bhash2 entry as well.
If inet_bhash2_update_saddr() fails, we must release the refcount
for dst by ip_route_connect() or ip6_dst_lookup_flow().
While tcp_v4_connect() calls ip_rt_put() in the error path,
tcp_v6_connect() does not call dst_release().
Let's call dst_release() when inet_bhash2_update_saddr() fails
in tcp_v6_connect().
Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address") Reported-by: Damiano Melotti <melotti@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260506070443.1699879-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Justin Chen [Tue, 5 May 2026 17:39:26 +0000 (10:39 -0700)]
net: phy: broadcom: Save PHY counters during suspend
The PHY counters can be lost if the PHY is reset during suspend. We
need to save the values into the shadow counters or the accounting
will be incorrect over multiple suspend and resume cycles.
Fixes: 820ee17b8d3b ("net: phy: broadcom: Add support code for reading PHY counters") Signed-off-by: Justin Chen <justin.chen@broadcom.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20260505173926.2870069-1-justin.chen@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
D. Wythe [Wed, 6 May 2026 01:41:05 +0000 (09:41 +0800)]
net/smc: fix missing sk_err when TCP handshake fails
In smc_connect_work(), when the underlying TCP handshake fails, the error
code (rc) must be propagated to sk_err to ensure userspace can correctly
retrieve the error status via SO_ERROR. Currently, the code only handles
a restricted set of error codes (e.g., EPIPE, ECONNREFUSED). If other
errors occurs, such as EHOSTUNREACH, sk_err remains unset (zero).
This affects applications that rely on SO_ERROR to determine connect
outcome. For example, higher versions of Go's netpoller treats
SO_ERROR == 0 combined with a failed getpeername() as a spurious wakeup
and re-enters epoll_wait(). Under ET mode, no further edge will be
generated since the socket is already in a terminal state, causing the
connect to hang indefinitely or until a user-specified timeout, if one
is set.
Jiexun Wang [Wed, 6 May 2026 14:08:23 +0000 (22:08 +0800)]
af_unix: Reject SIOCATMARK on non-stream sockets
SIOCATMARK reports whether the receive queue is at the urgent mark for
MSG_OOB.
In AF_UNIX, MSG_OOB is supported only for SOCK_STREAM sockets.
SOCK_DGRAM and SOCK_SEQPACKET reject MSG_OOB in sendmsg() and recvmsg(),
so they should not support SIOCATMARK either.
Return -EOPNOTSUPP for non-stream sockets before checking the receive
queue.
Fixes: 314001f0bf92 ("af_unix: Add OOB support") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Suggested-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260506140825.2987635-1-n05ec@lzu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
veth: fix OOB txq access in veth_poll() with asymmetric queue counts
XDP redirect into a veth device (via bpf_redirect()) calls
veth_xdp_xmit(), which enqueues frames into the peer's ptr_ring using
smp_processor_id() % peer->real_num_rx_queues
as the ring index. With an asymmetric veth pair where the peer has
fewer TX queues than RX queues, that index can exceed
peer->real_num_tx_queues.
veth_poll() then resolves peer_txq for the ring via:
where queue_idx = rq->xdp_rxq.queue_index. When queue_idx exceeds
peer_dev->real_num_tx_queues this is an out-of-bounds (OOB) access
into the peer's netdev_queue array, triggering DEBUG_NET_WARN_ON_ONCE
in netdev_get_tx_queue().
The normal ndo_start_xmit path is not affected: the stack clamps
skb->queue_mapping via netdev_cap_txqueue() before invoking
ndo_start_xmit, so rxq in veth_xmit() never exceeds real_num_tx_queues.
Fix veth_poll() by clamping: only dereference peer_txq when queue_idx is
within bounds, otherwise set it to NULL. The out-of-range rings are fed
exclusively via XDP redirect (veth_xdp_xmit), never via ndo_start_xmit
(veth_xmit), so the peer txq was never stopped and there is nothing to
wake; NULL is the correct fallback.
Reported-by: Sashiko <sashiko-bot@kernel.org> Closes: https://lore.kernel.org/all/20260502071828.616C3C19425@smtp.kernel.org/ Fixes: dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops") Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://patch.msgid.link/20260505132159.241305-2-hawk@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Fuad Tabba [Fri, 1 May 2026 11:21:49 +0000 (12:21 +0100)]
KVM: arm64: Pre-check vcpu memcache for host->guest donate
__pkvm_host_donate_guest() flips the host stage-2 PTE for the
donated page to a non-valid annotation via
host_stage2_set_owner_metadata_locked() and then calls
kvm_pgtable_stage2_map() to install the matching guest stage-2
mapping. The map's return value is wrapped in WARN_ON() and
otherwise discarded, asserting that the call cannot fail.
WARN_ON() at nVHE EL2 panics, so this assertion is only correct
if the call genuinely cannot fail. kvm_pgtable_stage2_map() can
fail with -ENOMEM even at PAGE_SIZE granularity: the donate path
verifies PKVM_NOPAGE for the guest IPA before the map, so the
walker must allocate fresh page-table pages from the vcpu
memcache, and the host controls the vcpu memcache via the topup
interface. An under-provisioned donation request would otherwise
turn a recoverable -ENOMEM into a fatal hyp panic.
Bound the worst-case walker allocation alongside the existing
__host_check_page_state_range() / __guest_check_page_state_range()
pre-checks, using the helper introduced for host->guest share. If
the vcpu memcache holds fewer pages than kvm_mmu_cache_min_pages(),
return -ENOMEM before any state mutation.
Fuad Tabba [Fri, 1 May 2026 11:21:48 +0000 (12:21 +0100)]
KVM: arm64: Pre-check vcpu memcache for host->guest share
__pkvm_host_share_guest() ends with kvm_pgtable_stage2_map() to
install the guest stage-2 mapping, after a forward pass that mutates
the host vmemmap (sets PKVM_PAGE_SHARED_OWNED and increments
host_share_guest_count) for every page in the range. The map's
return value is wrapped in WARN_ON() and otherwise discarded,
asserting that the call cannot fail.
WARN_ON() at nVHE EL2 panics, so this assertion is only correct if
the call genuinely cannot fail. kvm_pgtable_stage2_map() can fail
with -ENOMEM when the stage-2 walker exhausts the caller's
memcache, and the host controls the vcpu memcache via the topup
interface, so an under-provisioned share request would otherwise
turn a recoverable -ENOMEM into a fatal hyp panic.
Bound the worst-case walker allocation in the existing pre-check
pass so that kvm_pgtable_stage2_map() cannot fail at the call
site, using kvm_mmu_cache_min_pages() -- the same bound host EL1
uses for its own stage-2 maps. If the vcpu memcache holds fewer
pages, return -ENOMEM before any state mutation.
The hypercall handlers call pkvm_refill_memcache() to top up the
hyp_vcpu memcache before invoking __pkvm_host_{share,donate}_guest().
pkvm_ownership_selftest invokes those functions directly with a
static selftest_vcpu that has an empty memcache.
Seed selftest_vcpu's memcache from the prepopulated selftest
pages, leaving the remainder for selftest_vm.pool. Required by
the memcache-sufficiency pre-check added in the following
patches.
__deactivate_fgt() declares its first parameter as "htcxt" but the body
references "hctxt". The parameter is unused; the macro silently captures
"hctxt" from the enclosing scope. Both existing callers
(__deactivate_traps_hfgxtr() and __deactivate_traps_ich_hfgxtr()) happen
to define a local "struct kvm_cpu_context *hctxt", so the macro works
by coincidence.
A future caller without an "hctxt" local in scope, or naming it
differently, would compile but bind to the wrong context. Align the
parameter name with the sibling __activate_fgt() macro.
The "vcpu" parameter remains unused in the body, kept for API symmetry
with __activate_fgt() (which uses it).
Fuad Tabba [Fri, 1 May 2026 11:21:45 +0000 (12:21 +0100)]
KVM: arm64: Guard against NULL vcpu on VHE hyp panic path
On VHE, __hyp_call_panic() unconditionally calls __deactivate_traps(vcpu)
on the vcpu pointer read from host_ctxt->__hyp_running_vcpu. That pointer
is cleared after every guest exit (and is never set when no guest is
running), so an unexpected EL2 exception landing in _guest_exit_panic,
e.g. via the el2t*_invalid / el2h_irq_invalid vectors - reaches this
function with vcpu == NULL. __deactivate_traps() then dereferences vcpu
via ___deactivate_traps() -> vserror_state_is_nested() -> vcpu_has_nv()
-> vcpu->arch.features, faulting inside the panic handler and obscuring
the original failure.
The nVHE counterpart (hyp_panic() in arch/arm64/kvm/hyp/nvhe/switch.c)
already guards its vcpu-using cleanup with "if (vcpu)"; mirror that
here. sysreg_restore_host_state_vhe() does not depend on vcpu and
continues to run unconditionally, preserving panic forensics. The
trailing panic("...VCPU:%p", vcpu) prints "(null)" safely via printk's
%p handling.
Fuad Tabba [Fri, 1 May 2026 11:21:44 +0000 (12:21 +0100)]
KVM: arm64: Make EL2 exception entry and exit context-synchronization events
SCTLR_EL2.EIS and SCTLR_EL2.EOS control whether exception entry and
exit at EL2 are Context Synchronisation Events (CSEs). Per ARM DDI
0487 M.b D24.2.175 (p. D24-9754):
- !FEAT_ExS: the bit is RES1, so the entry/exit is unconditionally
a CSE.
- FEAT_ExS: the reset value is architecturally UNKNOWN; software
must set the bit to make the entry/exit a CSE.
INIT_SCTLR_EL2_MMU_ON in arch/arm64/include/asm/sysreg.h sets neither
bit. KVM/arm64 hot paths rely on ERET from EL2 being a CSE, and on
synchronous EL1->EL2 entry being a CSE, to elide explicit ISBs after
MSRs to context-switching system registers (HCR_EL2, ZCR_EL2,
ptrauth keys, etc.). On FEAT_ExS hardware those reliances are not
architecturally backed unless EOS=1 (and, for entry, EIS=1).
Until commit 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg
infrastructure"), SCTLR_EL2_RES1 was a hand-rolled mask that
included BIT(11) (EOS) and BIT(22) (EIS), so INIT_SCTLR_EL2_MMU_ON
was setting both unconditionally. The conversion made
SCTLR_EL2_RES1 auto-generated; because the sysreg tooling only
models unconditionally-RES1 fields and EIS/EOS are RES1 only when
FEAT_ExS is absent, the auto-generated mask is UL(0). The seven
other bits dropped from the old mask (positions 4, 5, 16, 18, 23,
28, 29) are unconditionally RES1 in the E2H=0 SCTLR_EL2 layout per
DDI 0487 M.b D24.2.175, so dropping them is harmless. EIS and EOS
are the only bits whose semantics changed for FEAT_ExS hardware
and where the kernel relies on the value being 1.
Make the guarantee explicit: include SCTLR_ELx_EIS | SCTLR_ELx_EOS in
INIT_SCTLR_EL2_MMU_ON so that EL2 exception entry and exit are
unconditionally CSEs regardless of whether FEAT_ExS is implemented.
This matches the pairing in arch/arm64/kvm/config.c which treats EIS
and EOS together as RES1 under !FEAT_ExS.
Fixes: 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg infrastructure") Reviewed-by: Yuan Yao <yaoyuan@linux.alibaba.com> Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-2-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
Bobby Eshleman [Tue, 5 May 2026 01:42:11 +0000 (18:42 -0700)]
eth: fbnic: fix double-free of PCS on phylink creation failure
fbnic_phylink_create() stores the newly allocated PCS in fbn->pcs and
then calls phylink_create(). When phylink_create() fails, the error path
correctly destroys the PCS via xpcs_destroy_pcs(), but the caller,
fbnic_netdev_alloc(), responds by invoking fbnic_netdev_free() which
calls fbnic_phylink_destroy(). That function finds fbn->pcs non-NULL and
calls xpcs_destroy_pcs() a second time on the already-freed object,
triggering a refcount underflow use-after-free:
Instead of calling fbnic_phylink_destroy(), the prior initialization of
netdev should just be unrolled with free_netdev() and clearing
fbd->netdev.
Clearing fbd->netdev to NULL avoids UAF in init_failure_mode where
callers guard by checking !fbd->netdev, such as fbnic_mdio_read_pmd().
These callers remain active even after a failed probe, so fdb->netdev
still needs to be cleared.
Weiming Shi [Tue, 5 May 2026 17:55:11 +0000 (01:55 +0800)]
i2c: smbus: reject oversized block transfers in the common path
The SMBus block transfer length data->block[0] is validated in
i2c_smbus_xfer_emulated() but that check runs too late for tracepoints
and is skipped entirely when the adapter provides a native smbus_xfer
implementation. This allows user-controlled oversized block lengths to
reach tracepoint memcpy calls and driver callbacks unchecked.
Add an early validation in __i2c_smbus_xfer() that rejects block
transfers whose caller-supplied length is zero or exceeds
I2C_SMBUS_BLOCK_MAX before any tracepoint fires or driver callback
runs. data->block[0] is filled in by the device on SMBus block reads,
so the check is scoped to operations where the length is actually
supplied by the caller. This is consistent with the existing -EINVAL
convention in the emulated path and protects all downstream consumers
at once: the smbus_write tracepoint, all native smbus_xfer driver
implementations, and the emulated path.
Two distinct bugs are fixed by this change:
Bug 1: smbus_write tracepoint OOB (include/trace/events/smbus.h)
trace_smbus_write() fires before any validation and copies
data->block[0]+1 bytes into a 34-byte event buffer. With
block[0]=0xfe the tracepoint copies 255 bytes, overflowing by 221.
BUG: KASAN: stack-out-of-bounds in trace_event_raw_event_smbus_write+0x27c/0x530
Read of size 255 at addr ffff88800d98fcf8 by task poc_smbus/91
Call Trace:
<TASK>
__asan_memcpy+0x23/0x80
trace_event_raw_event_smbus_write+0x27c/0x530
__i2c_smbus_xfer+0x43a/0xa40
i2c_smbus_xfer+0x19e/0x340
i2cdev_ioctl_smbus+0x38f/0x7f0
i2cdev_ioctl+0x35e/0x680
__x64_sys_ioctl+0x147/0x1e0
do_syscall_64+0xcf/0x15a0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
</TASK>
Bug 2: i2c-stub I2C_SMBUS_I2C_BLOCK_DATA OOB (drivers/i2c/i2c-stub.c)
stub_xfer() implements .smbus_xfer directly and only clamps
block[0] against 256-command, not I2C_SMBUS_BLOCK_MAX. With
block[0]=0xff and command=0 the loop accesses block[1+i] for
i up to 254, far past the 34-byte union.
UBSAN: array-index-out-of-bounds in drivers/i2c/i2c-stub.c:223:44
index 34 is out of range for type '__u8 [34]'
Call Trace:
<TASK>
__ubsan_handle_out_of_bounds+0xd7/0x120
stub_xfer+0x1971/0x198f [i2c_stub]
__i2c_smbus_xfer+0x306/0xa40
i2c_smbus_xfer+0x19e/0x340
i2cdev_ioctl_smbus+0x38f/0x7f0
i2cdev_ioctl+0x35e/0x680
__x64_sys_ioctl+0x147/0x1e0
do_syscall_64+0xcf/0x15a0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
</TASK>
Both traces reproduced on v7.0-rc6+i2c/for-current with KASAN+UBSAN.
Fixes: 8a325997d95d ("i2c: Add message transfer tracepoints for SMBUS [ver #2]") Fixes: 4710317891e4 ("i2c-stub: Implement I2C block support") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Myeonghun Pak [Fri, 24 Apr 2026 12:34:28 +0000 (21:34 +0900)]
drm/bochs: Drop manual put on probe error path
bochs_pci_probe() allocates the DRM device with devm_drm_dev_alloc(),
which registers a devres action to drop the initial DRM device reference
on driver detach or probe failure.
The error path currently calls drm_dev_put() manually. If probe then
returns an error, devres will run the registered release action and put
the same device again, after the first put may already have released it.
Return the probe error directly and let devres own the final put.
Signed-off-by: Myeonghun Pak <mhun512@gmail.com> Fixes: 04826f588682 ("drm/bochs: Allocate DRM device in struct bochs_device") Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260424123506.32275-1-mhun512@gmail.com
David Gow [Thu, 16 Apr 2026 06:57:43 +0000 (14:57 +0800)]
x86/boot/e820: Re-enable BIOS fallback if e820 table is empty
In commit:
157266edcc56 ("x86/boot/e820: Simplify append_e820_table() and remove restriction on single-entry tables")
the check on the number of entries in the e820 table was removed. The intention
was to support single-entry maps, but by removing the check entirely, we also
skip the fallback (to, e.g., the BIOS 88h function).
This means that if no E820 map is passed in from the bootloader (which is the
case on some bootloaders, like linld), we end up with an empty memory map, and
the kernel fails to boot (either by deadlocking on OOM, or by failing to
allocate the real mode trampoline, or similar).
Re-instate the check in append_e820_table(), but only check that nr_entries is
non-zero. This allows e820__memory_setup_default() to fall back to other memory
size sources, and doesn't affect e820__memory_setup_extended(), as the latter
ignores the return value from append_e820_table().
In doing so, we also update the return values to be proper error codes, with
-ENOENT for this case (there are no entries), and -EINVAL for the case where an
entry appears invalid. Given none of the callers check the actual value -- just
whether it's nonzero -- this is largely aesthetic in practice.
Tested against linld, and the kernel boots again fine.
[ mingo: Readability edits to the comment and the changelog. ]
Fixes: 157266edcc56 ("x86/boot/e820: Simplify append_e820_table() and remove restriction on single-entry tables") Signed-off-by: David Gow <david@davidgow.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Cc: stable@vger.kernel.org Cc: Arnd Bergmann <arnd@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://patch.msgid.link/20260416065746.1896647-1-david@davidgow.net