KVM: TDX: Use struct_size to simplify tdx_get_capabilities()
Use struct_size() instead of manually calculating the number of bytes to
allocate for 'caps', including the nested flexible array, and copy all of
'caps' to user space with a single copy_to_user() call (thanks to the full
size being provided by struct_size()).
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://patch.msgid.link/20251017213914.167301-1-thorsten.blum@linux.dev
[sean: separate from swap of get_user() vs. kzalloc() ordering] Signed-off-by: Sean Christopherson <seanjc@google.com>
Thorsten Blum [Fri, 17 Oct 2025 21:39:14 +0000 (23:39 +0200)]
KVM: TDX: Check size of user's kvm_tdx_capabilities array before allocating
When userspace is getting TDX capabilities, retrieve and check the number
of user entries before allocating kernel scratch space to avoid having to
unwind the allocation if get_user() fails or if 'user_caps' is too small
to fit 'caps'.
Dave Hansen [Mon, 3 Nov 2025 23:44:37 +0000 (15:44 -0800)]
KVM: TDX: Remove __user annotation from kernel pointer
Separate __user pointer variable declaration from kernel one.
There are two 'kvm_cpuid2' pointers involved here. There's an "input"
side: 'td_cpuid' which is a normal kernel pointer and an 'output'
side. The output here is userspace and there is an attempt at properly
annotating the variable with __user:
struct kvm_cpuid2 __user *output, *td_cpuid;
But, alas, this is wrong. The __user in the definition applies to both
'output' and 'td_cpuid'. Sparse notices the address space mismatch and
will complain about it.
Fix it up by completely separating the two definitions so that it is
obviously correct without even having to know what the C syntax rules
even are.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Fixes: 488808e682e7 ("KVM: x86: Introduce KVM_TDX_GET_CPUID") Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Xiaoyao Li <xiaoyao.li@intel.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Kirill A. Shutemov" <kas@kernel.org> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Kiryl Shutsemau <kas@kernel.org> Link: https://patch.msgid.link/20251103234437.A0532420@davehans-spike.ostc.intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
Rick Edgecombe [Tue, 28 Oct 2025 00:28:24 +0000 (17:28 -0700)]
KVM: TDX: Take MMU lock around tdh_vp_init()
Take MMU lock around tdh_vp_init() in KVM_TDX_INIT_VCPU to prevent
meeting contention during retries in some no-fail MMU paths.
The TDX module takes various try-locks internally, which can cause
SEAMCALLs to return an error code when contention is met. Dealing with
an error in some of the MMU paths that make SEAMCALLs is not straight
forward, so KVM takes steps to ensure that these will meet no contention
during a single BUSY error retry. The whole scheme relies on KVM to take
appropriate steps to avoid making any SEAMCALLs that could contend while
the retry is happening.
Unfortunately, there is a case where contention could be met if userspace
does something unusual. Specifically, hole punching a gmem fd while
initializing the TD vCPU. The impact would be triggering a KVM_BUG_ON().
The resource being contended is called the "TDR resource" in TDX docs
parlance. The tdh_vp_init() can take this resource as exclusive if the
'version' passed is 1, which happens to be version the kernel passes. The
various MMU operations (tdh_mem_range_block(), tdh_mem_track() and
tdh_mem_page_remove()) take it as shared.
There isn't a KVM lock that maps conceptually and in a lock order friendly
way to the TDR lock. So to minimize infrastructure, just take MMU lock
around tdh_vp_init(). This makes the operations we care about mutually
exclusive. Since the other operations are under a write mmu_lock, the code
could just take the lock for read, however this is weirdly inverted from
the actual underlying resource being contended. Since this is covering an
edge case that shouldn't be hit in normal usage, be a little less weird
and take the mmu_lock for write around the call.
Fixes: 02ab57707bdb ("KVM: TDX: Implement hooks to propagate changes of TDP MMU mirror page table") Reported-by: Yan Zhao <yan.y.zhao@intel.com> Suggested-by: Yan Zhao <yan.y.zhao@intel.com> Link: https://patch.msgid.link/20251028002824.1470939-1-rick.p.edgecombe@intel.com Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
[sean: tweak comment and capture PUNCH_HOLE interaction] Signed-off-by: Sean Christopherson <seanjc@google.com>
Yan Zhao [Thu, 30 Oct 2025 20:09:51 +0000 (13:09 -0700)]
KVM: TDX: Fix list_add corruption during vcpu_load()
During vCPU creation, a vCPU may be destroyed immediately after
kvm_arch_vcpu_create() (e.g., due to vCPU id confiliction). However, the
vcpu_load() inside kvm_arch_vcpu_create() may have associate the vCPU to
pCPU via "list_add(&tdx->cpu_list, &per_cpu(associated_tdvcpus, cpu))"
before invoking tdx_vcpu_free().
Though there's no need to invoke tdh_vp_flush() on the vCPU, failing to
dissociate the vCPU from pCPU (i.e., "list_del(&to_tdx(vcpu)->cpu_list)")
will cause list corruption of the per-pCPU list associated_tdvcpus.
Then, a later list_add() during vcpu_load() would detect list corruption
and print calltrace as shown below.
Dissociate a vCPU from its associated pCPU in tdx_vcpu_free() for the vCPUs
destroyed immediately after creation which must be in
VCPU_TD_STATE_UNINITIALIZED state.
KVM: TDX: Bug the VM if extending the initial measurement fails
WARN and terminate the VM if TDH_MR_EXTEND fails, as extending the
measurement should fail if and only if there is a KVM bug, or if the S-EPT
mapping is invalid. Now that KVM makes all state transitions mutually
exclusive via tdx_vm_state_guard, it should be impossible for S-EPT
mappings to be removed between kvm_tdp_mmu_map_private_pfn() and
tdh_mr_extend().
Holding slots_lock prevents zaps due to memslot updates,
filemap_invalidate_lock() prevents zaps due to guest_memfd PUNCH_HOLE,
vcpu->mutex locks prevents updates from other vCPUs, kvm->lock prevents
VM-scoped ioctls from creating havoc (e.g. by creating new vCPUs), and all
usage of kvm_zap_gfn_range() is mutually exclusive with S-EPT entries that
can be used for the initial image.
For kvm_zap_gfn_range(), the call from sev.c is obviously mutually
exclusive, TDX disallows KVM_X86_QUIRK_IGNORE_GUEST_PAT so the same goes
for kvm_noncoherent_dma_assignment_start_or_stop(), and
__kvm_set_or_clear_apicv_inhibit() is blocked by virtue of holding all
VM and vCPU mutexes (and the APIC page has its own KVM-internal memslot
that is never created for TDX VMs, and so can't possibly be used for the
initial image, which means that too is mutually exclusive irrespective of
locking).
Opportunistically return early if the region doesn't need to be measured
in order to reduce line lengths and avoid wraps. Similarly, immediately
and explicitly return if TDH_MR_EXTEND fails to make it clear that KVM
needs to bail entirely if extending the measurement fails.
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-28-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: TDX: Guard VM state transitions with "all" the locks
Acquire kvm->lock, kvm->slots_lock, and all vcpu->mutex locks when
servicing ioctls that (a) transition the TD to a new state, i.e. when
doing INIT or FINALIZE or (b) are only valid if the TD is in a specific
state, i.e. when initializing a vCPU or memory region. Acquiring "all"
the locks fixes several KVM_BUG_ON() situations where a SEAMCALL can fail
due to racing actions, e.g. if tdh_vp_create() contends with either
tdh_mr_extend() or tdh_mr_finalize().
For all intents and purposes, the paths in question are fully serialized,
i.e. there's no reason to try and allow anything remotely interesting to
happen. Smack 'em with a big hammer instead of trying to be "nice".
Acquire kvm->lock to prevent VM-wide things from happening, slots_lock to
prevent kvm_mmu_zap_all_fast(), and _all_ vCPU mutexes to prevent vCPUs
from interefering. Use the recently-renamed kvm_arch_vcpu_unlocked_ioctl()
to service the vCPU-scoped ioctls to avoid a lock inversion problem, e.g.
due to taking vcpu->mutex outside kvm->lock.
See also commit ecf371f8b02d ("KVM: SVM: Reject SEV{-ES} intra host
migration if vCPU creation is in-flight"), which fixed a similar bug with
SEV intra-host migration where an in-flight vCPU creation could race with
a VM-wide state transition.
Define a fancy new CLASS to handle the lock+check => unlock logic with
guard()-like syntax:
CLASS(tdx_vm_state_guard, guard)(kvm);
if (IS_ERR(guard))
return PTR_ERR(guard);
to simplify juggling the many locks.
Note! Take kvm->slots_lock *after* all vcpu->mutex locks, as per KVM's
soon-to-be-documented lock ordering rules[1].
KVM: TDX: Don't copy "cmd" back to userspace for KVM_TDX_CAPABILITIES
Don't copy the kvm_tdx_cmd structure back to userspace when handling
KVM_TDX_CAPABILITIES, as tdx_get_capabilities() doesn't modify hw_error or
any other fields.
Opportunistically hoist the call to tdx_get_capabilities() outside of the
kvm->lock critical section, as getting the capabilities doesn't touch the
VM in any way, e.g. doesn't even take @kvm.
Suggested-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-26-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: TDX: Use guard() to acquire kvm->lock in tdx_vm_ioctl()
Use guard() in tdx_vm_ioctl() to tidy up the code a small amount, but more
importantly to minimize the diff of a future change, which will use
guard-like semantics to acquire and release multiple locks.
No functional change intended.
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-25-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: TDX: Convert INIT_MEM_REGION and INIT_VCPU to "unlocked" vCPU ioctl
Handle the KVM_TDX_INIT_MEM_REGION and KVM_TDX_INIT_VCPU vCPU sub-ioctls
in the unlocked variant, i.e. outside of vcpu->mutex, in anticipation of
taking kvm->lock along with all other vCPU mutexes, at which point the
sub-ioctls _must_ start without vcpu->mutex held.
No functional change intended.
Reviewed-by: Kai Huang <kai.huang@intel.com> Co-developed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-24-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: TDX: Add tdx_get_cmd() helper to get and validate sub-ioctl command
Add a helper to copy a kvm_tdx_cmd structure from userspace and verify
that must-be-zero fields are indeed zero.
No functional change intended.
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-23-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: TDX: Add macro to retry SEAMCALLs when forcing vCPUs out of guest
Add a macro to handle kicking vCPUs out of the guest and retrying
SEAMCALLs on TDX_OPERAND_BUSY instead of providing small helpers to be
used by each SEAMCALL. Wrapping the SEAMCALLs in a macro makes it a
little harder to tease out which SEAMCALL is being made, but
significantly reduces the amount of copy+paste code, and makes it all but
impossible to leave an elevated wait_for_sept_zap.
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-22-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: TDX: Assert that mmu_lock is held for write when removing S-EPT entries
Unconditionally assert that mmu_lock is held for write when removing S-EPT
entries, not just when removing S-EPT entries triggers certain conditions,
e.g. needs to do TDH_MEM_TRACK or kick vCPUs out of the guest.
Conditionally asserting implies that it's safe to hold mmu_lock for read
when those paths aren't hit, which is simply not true, as KVM doesn't
support removing S-EPT entries under read-lock.
Only two paths lead to remove_external_spte(), and both paths asserts that
mmu_lock is held for write (tdp_mmu_set_spte() via lockdep, and
handle_removed_pt() via KVM_BUG_ON()).
Deliberately leave lockdep assertions in the "no vCPUs" helpers to document
that wait_for_sept_zap is guarded by holding mmu_lock for write, and keep
the conditional assert in tdx_track() as well, but with a comment to help
explain why holding mmu_lock for write matters (above and beyond why
tdx_sept_remove_private_spte()'s requirements).
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-21-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: TDX: Derive error argument names from the local variable names
When printing SEAMCALL errors, use the name of the variable holding an
error parameter instead of the register from whence it came, so that flows
which use descriptive variable names will similarly print descriptive
error messages.
Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-20-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: TDX: Combine KVM_BUG_ON + pr_tdx_error() into TDX_BUG_ON()
Add TDX_BUG_ON() macros (with varying numbers of arguments) to deduplicate
the myriad flows that do KVM_BUG_ON()/WARN_ON_ONCE() followed by a call to
pr_tdx_error(). In addition to reducing boilerplate copy+paste code, this
also helps ensure that KVM provides consistent handling of SEAMCALL errors.
Opportunistically convert a handful of bare WARN_ON_ONCE() paths to the
equivalent of KVM_BUG_ON(), i.e. have them terminate the VM. If a SEAMCALL
error is fatal enough to WARN on, it's fatal enough to terminate the TD.
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-19-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: TDX: Fold tdx_sept_zap_private_spte() into tdx_sept_remove_private_spte()
Do TDH_MEM_RANGE_BLOCK directly in tdx_sept_remove_private_spte() instead
of using a one-off helper now that the nr_premapped tracking is gone.
Opportunistically drop the WARN on hugepages, which was dead code (see the
KVM_BUG_ON() in tdx_sept_remove_private_spte()).
No functional change intended.
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-18-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: TDX: ADD pages to the TD image while populating mirror EPT entries
When populating the initial memory image for a TDX guest, ADD pages to the
TD as part of establishing the mappings in the mirror EPT, as opposed to
creating the mappings and then doing ADD after the fact. Doing ADD in the
S-EPT callbacks eliminates the need to track "premapped" pages, as the
mirror EPT (M-EPT) and S-EPT are always synchronized, e.g. if ADD fails,
KVM reverts to the previous M-EPT entry (guaranteed to be !PRESENT).
Eliminating the hole where the M-EPT can have a mapping that doesn't exist
in the S-EPT in turn obviates the need to handle errors that are unique to
encountering a missing S-EPT entry (see tdx_is_sept_zap_err_due_to_premap()).
Keeping the M-EPT and S-EPT synchronized also eliminates the need to check
for unconsumed "premap" entries during tdx_td_finalize(), as there simply
can't be any such entries. Dropping that check in particular reduces the
overall cognitive load, as the management of nr_premapped with respect
to removal of S-EPT is _very_ subtle. E.g. successful removal of an S-EPT
entry after it completed ADD doesn't adjust nr_premapped, but it's not
clear why that's "ok" but having half-baked entries is not (it's not truly
"ok" in that removing pages from the image will likely prevent the guest
from booting, but from KVM's perspective it's "ok").
Doing ADD in the S-EPT path requires passing an argument via a scratch
field, but the current approach of tracking the number of "premapped"
pages effectively does the same. And the "premapped" counter is much more
dangerous, as it doesn't have a singular lock to protect its usage, since
nr_premapped can be modified as soon as mmu_lock is dropped, at least in
theory. I.e. nr_premapped is guarded by slots_lock, but only for "happy"
paths.
Note, this approach was used/tried at various points in TDX development,
but was ultimately discarded due to a desire to avoid stashing temporary
state in kvm_tdx. But as above, KVM ended up with such state anyways,
and fully committing to using temporary state provides better access
rules (100% guarded by slots_lock), and makes several edge cases flat out
impossible.
Note #2, continue to extend the measurement outside of mmu_lock, as it's
a slow operation (typically 16 SEAMCALLs per page whose data is included
in the measurement), and doesn't *need* to be done under mmu_lock, e.g.
for consistency purposes. However, MR.EXTEND isn't _that_ slow, e.g.
~1ms latency to measure a full page, so if it needs to be done under
mmu_lock in the future, e.g. because KVM gains a flow that can remove
S-EPT entries during KVM_TDX_INIT_MEM_REGION, then extending the
measurement can also be moved into the S-EPT mapping path (again, only if
absolutely necessary). P.S. _If_ MR.EXTEND is moved into the S-EPT path,
take care not to return an error up the stack if TDH_MR_EXTEND fails, as
removing the M-EPT entry but not the S-EPT entry would result in
inconsistent state!
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-17-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: TDX: Fold tdx_mem_page_record_premap_cnt() into its sole caller
Fold tdx_mem_page_record_premap_cnt() into tdx_sept_set_private_spte() as
providing a one-off helper for effectively three lines of code is at best a
wash, and splitting the code makes the comment for smp_rmb() _extremely_
confusing as the comment talks about reading kvm->arch.pre_fault_allowed
before kvm_tdx->state, but the immediately visible code does the exact
opposite.
Opportunistically rewrite the comments to more explicitly explain who is
checking what, as well as _why_ the ordering matters.
No functional change intended.
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: TDX: Use atomic64_dec_return() instead of a poor equivalent
Use atomic64_dec_return() when decrementing the number of "pre-mapped"
S-EPT pages to ensure that the count can't go negative without KVM
noticing. In theory, checking for '0' and then decrementing in a separate
operation could miss a 0=>-1 transition. In practice, such a condition is
impossible because nr_premapped is protected by slots_lock, i.e. doesn't
actually need to be an atomic (that wart will be addressed shortly).
Don't bother trying to keep the count non-negative, as the KVM_BUG_ON()
ensures the VM is dead, i.e. there's no point in trying to limp along.
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: TDX: Avoid a double-KVM_BUG_ON() in tdx_sept_zap_private_spte()
Return -EIO immediately from tdx_sept_zap_private_spte() if the number of
to-be-added pages underflows, so that the following "KVM_BUG_ON(err, kvm)"
isn't also triggered. Isolating the check from the "is premap error"
if-statement will also allow adding a lockdep assertion that premap errors
are encountered if and only if slots_lock is held.
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: TDX: WARN if mirror SPTE doesn't have full RWX when creating S-EPT mapping
Pass in the mirror_spte to kvm_x86_ops.set_external_spte() to provide
symmetry with .remove_external_spte(), and assert in TDX that the mirror
SPTE is shadow-present with full RWX permissions (the TDX-Module doesn't
allow the hypervisor to control protections).
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: x86/mmu: Drop the return code from kvm_x86_ops.remove_external_spte()
Drop the return code from kvm_x86_ops.remove_external_spte(), a.k.a.
tdx_sept_remove_private_spte(), as KVM simply does a KVM_BUG_ON() failure,
and that KVM_BUG_ON() is redundant since all error paths in TDX also do a
KVM_BUG_ON().
Opportunistically pass the spte instead of the pfn, as the API is clearly
about removing an spte.
Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: TDX: Fold tdx_sept_drop_private_spte() into tdx_sept_remove_private_spte()
Fold tdx_sept_drop_private_spte() into tdx_sept_remove_private_spte() as a
step towards having "remove" be the one and only function that deals with
removing/zapping/dropping a SPTE, e.g. to avoid having to differentiate
between "zap", "drop", and "remove". Eliminating the "drop" helper also
gets rid of what is effectively dead code due to redundant checks, e.g. on
an HKID being assigned.
No functional change intended.
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: TDX: Return -EIO, not -EINVAL, on a KVM_BUG_ON() condition
Return -EIO when a KVM_BUG_ON() is tripped, as KVM's ABI is to return -EIO
when a VM has been killed due to a KVM bug, not -EINVAL. Note, many (all?)
of the affected paths never propagate the error code to userspace, i.e.
this is about internal consistency more than anything else.
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
Yan Zhao [Thu, 30 Oct 2025 20:09:31 +0000 (13:09 -0700)]
KVM: TDX: Drop superfluous page pinning in S-EPT management
Don't explicitly pin pages when mapping pages into the S-EPT, guest_memfd
doesn't support page migration in any capacity, i.e. there are no migrate
callbacks because guest_memfd pages *can't* be migrated. See the WARN in
kvm_gmem_migrate_folio().
Eliminating TDX's explicit pinning will also enable guest_memfd to support
in-place conversion between shared and private memory[1][2]. Because KVM
cannot distinguish between speculative/transient refcounts and the
intentional refcount for TDX on private pages[3], failing to release
private page refcount in TDX could cause guest_memfd to indefinitely wait
on decreasing the refcount for the splitting.
Under normal conditions, not holding an extra page refcount in TDX is safe
because guest_memfd ensures pages are retained until its invalidation
notification to KVM MMU is completed. However, if there're bugs in KVM/TDX
module, not holding an extra refcount when a page is mapped in S-EPT could
result in a page being released from guest_memfd while still mapped in the
S-EPT. But, doing work to make a fatal error slightly less fatal is a net
negative when that extra work adds complexity and confusion.
Several approaches were considered to address the refcount issue, including
- Attempting to modify the KVM unmap operation to return a failure,
which was deemed too complex and potentially incorrect[4].
- Increasing the folio reference count only upon S-EPT zapping failure[5].
- Use page flags or page_ext to indicate a page is still used by TDX[6],
which does not work for HVO (HugeTLB Vmemmap Optimization).
- Setting HWPOISON bit or leveraging folio_set_hugetlb_hwpoison()[7].
Due to the complexity or inappropriateness of these approaches, and the
fact that S-EPT zapping failure is currently only possible when there are
bugs in the KVM or TDX module, which is very rare in a production kernel,
a straightforward approach of simply not holding the page reference count
in TDX was chosen[8].
When S-EPT zapping errors occur, KVM_BUG_ON() is invoked to kick off all
vCPUs and mark the VM as dead. Although there is a potential window that a
private page mapped in the S-EPT could be reallocated and used outside the
VM, the loud warning from KVM_BUG_ON() should provide sufficient debug
information. To be robust against bugs, the user can enable panic_on_warn
as normal.
KVM: x86/mmu: Rename kvm_tdp_map_page() to kvm_tdp_page_prefault()
Rename kvm_tdp_map_page() to kvm_tdp_page_prefault() now that it's used
only by kvm_arch_vcpu_pre_fault_memory().
No functional change intended.
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
Revert "KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU"
Remove the helper and exports that were added to allow TDX code to reuse
kvm_tdp_map_page() for its gmem post-populate flow now that a dedicated
TDP MMU API is provided to install a mapping given a gfn+pfn pair.
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: x86/mmu: WARN if KVM attempts to map into an invalid TDP MMU root
When mapping into the TDP MMU, WARN (if KVM_PROVE_MMU=y) if the root is
invalid, e.g. if KVM is attempting to insert a mapping without checking if
the information and MMU context is fresh.
Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: x86/mmu: Add dedicated API to map guest_memfd pfn into TDP MMU
Add and use a new API for mapping a private pfn from guest_memfd into the
TDP MMU from TDX's post-populate hook instead of partially open-coding the
functionality into the TDX code. Sharing code with the pre-fault path
sounded good on paper, but it's fatally flawed as simulating a fault loses
the pfn, and calling back into gmem to re-retrieve the pfn creates locking
problems, e.g. kvm_gmem_populate() already holds the gmem invalidation
lock.
Providing a dedicated API will also removing several MMU exports that
ideally would not be exposed outside of the MMU, let alone to vendor code.
On that topic, opportunistically drop the kvm_mmu_load() export. Leave
kvm_tdp_mmu_gpa_is_mapped() alone for now; the entire commit that added
kvm_tdp_mmu_gpa_is_mapped() will be removed in the near future.
Gate the API on CONFIG_KVM_GUEST_MEMFD=y as private memory _must_ be backed
by guest_memfd. Add a lockdep-only assert to that the incoming pfn is
indeed backed by guest_memfd, and that the gmem instance's invalidate lock
is held (which, combined with slots_lock being held, obviates the need to
check for a stale "fault").
Cc: Michael Roth <michael.roth@amd.com> Cc: Yan Zhao <yan.y.zhao@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Vishal Annapurve <vannapurve@google.com> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/all/20250709232103.zwmufocd3l7sqk7y@amd.com Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: TDX: Drop PROVE_MMU=y sanity check on to-be-populated mappings
Drop TDX's sanity check that a mirror EPT mapping isn't zapped between
creating said mapping and doing TDH.MEM.PAGE.ADD, as the check is
simultaneously superfluous and incomplete. Per commit 2608f1057601
("KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU"), the
justification for introducing kvm_tdp_mmu_gpa_is_mapped() was to check
that the target gfn was pre-populated, with a link that points to this
snippet:
: > One small question:
: >
: > What if the memory region passed to KVM_TDX_INIT_MEM_REGION hasn't been pre-
: > populated? If we want to make KVM_TDX_INIT_MEM_REGION work with these regions,
: > then we still need to do the real map. Or we can make KVM_TDX_INIT_MEM_REGION
: > return error when it finds the region hasn't been pre-populated?
:
: Return an error. I don't love the idea of bleeding so many TDX details into
: userspace, but I'm pretty sure that ship sailed a long, long time ago.
But that justification makes little sense for the final code, as the check
on nr_premapped after TDH.MEM.PAGE.ADD will detect and return an error if
KVM attempted to zap a S-EPT entry (tdx_sept_zap_private_spte() will fail
on TDH.MEM.RANGE.BLOCK due lack of a valid S-EPT entry). And as evidenced
by the "is mapped?" code being guarded with CONFIG_KVM_PROVE_MMU=y, KVM is
NOT relying on the check for general correctness.
The sanity check is also incomplete in the sense that mmu_lock is dropped
between the check and TDH.MEM.PAGE.ADD, i.e. will only detect KVM bugs that
zap SPTEs in a very specific window (note, this also applies to the check
on nr_premapped).
Removing the sanity check will allow removing kvm_tdp_mmu_gpa_is_mapped(),
which has no business being exposed to vendor code, and more importantly
will pave the way for eliminating the "pre-map" approach entirely in favor
of doing TDH.MEM.PAGE.ADD under mmu_lock.
Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: Rename kvm_arch_vcpu_async_ioctl() to kvm_arch_vcpu_unlocked_ioctl()
Rename the "async" ioctl API to "unlocked" so that upcoming usage in x86's
TDX code doesn't result in a massive misnomer. To avoid having to retry
SEAMCALLs, TDX needs to acquire kvm->lock *and* all vcpu->mutex locks, and
acquiring all of those locks after/inside the current vCPU's mutex is a
non-starter. However, TDX also needs to acquire the vCPU's mutex and load
the vCPU, i.e. the handling is very much not async to the vCPU.
No functional change intended.
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: Make support for kvm_arch_vcpu_async_ioctl() mandatory
Implement kvm_arch_vcpu_async_ioctl() "natively" in x86 and arm64 instead
of relying on an #ifdef'd stub, and drop HAVE_KVM_VCPU_ASYNC_IOCTL in
anticipation of using the API on x86. Once x86 uses the API, providing a
stub for one architecture and having all other architectures opt-in
requires more code than simply implementing the API in the lone holdout.
Eliminating the Kconfig will also reduce churn if the API is renamed in
the future (spoiler alert).
No functional change intended.
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
Linus Torvalds [Sat, 1 Nov 2025 17:49:12 +0000 (10:49 -0700)]
Merge tag 'regulator-fix-v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"A simple fix for a missed part of an API conversion in the bd718x7
driver"
* tag 'regulator-fix-v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: bd718x7: Fix voltages scaled by resistor divider
Linus Torvalds [Sat, 1 Nov 2025 17:45:39 +0000 (10:45 -0700)]
Merge tag 'regmap-fix-v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fixes from Mark Brown:
"One documentation fix and a fix for a problem with the slimbus regmap
which was uncovered by some changes in one of the drivers"
* tag 'regmap-fix-v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: irq: Correct documentation of wake_invert flag
regmap: slimbus: fix bus_context pointer in regmap init calls
Linus Torvalds [Sat, 1 Nov 2025 17:20:07 +0000 (10:20 -0700)]
Merge tag 'x86-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- Limit AMD microcode Entrysign sha256 signature checking to
known CPU generations
- Disable AMD RDSEED32 on certain Zen5 CPUs that have a
microcode version before when the microcode-based fix was
issued for the AMD-SB-7055 erratum
- Fix FPU AMD XFD state synchronization on signal delivery
- Fix (work around) a SSE4a-disassembly related build failure
on X86_NATIVE_CPU=y builds
- Extend the AMD Zen6 model space with a new range of models
- Fix <asm/intel-family.h> CPU model comments
- Fix the CONFIG_CFI=y and CONFIG_LTO_CLANG_FULL=y build, which
was unhappy due to missing kCFI type annotations of clear_page()
variants
* tag 'x86-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Ensure clear_page() variants always have __kcfi_typeid_ symbols
x86/cpu: Add/fix core comments for {Panther,Nova} Lake
x86/CPU/AMD: Extend Zen6 model range
x86/build: Disable SSE4a
x86/fpu: Ensure XFD state on signal delivery
x86/CPU/AMD: Add RDSEED fix for Zen5
x86/microcode/AMD: Limit Entrysign signature checking to known generations
Linus Torvalds [Sat, 1 Nov 2025 17:17:40 +0000 (10:17 -0700)]
Merge tag 'perf-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fixes from Ingo Molnar:
"Miscellaneous fixes and CPU model updates:
- Fix an out-of-bounds access on non-hybrid platforms in the Intel
PMU DS code, reported by KASAN
- Add WildcatLake PMU and uncore support: it's identical to the
PantherLake version"
* tag 'perf-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Add uncore PMU support for Wildcat Lake
perf/x86/intel: Add PMU support for WildcatLake
perf/x86/intel: Fix KASAN global-out-of-bounds warning
Linus Torvalds [Sat, 1 Nov 2025 17:07:35 +0000 (10:07 -0700)]
Merge tag 'objtool-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Ingo Molnar:
"Fix objtool warning when faced with raw STAC/CLAC instructions"
* tag 'objtool-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix skip_alt_group() for non-alternative STAC/CLAC
Linus Torvalds [Sat, 1 Nov 2025 17:04:35 +0000 (10:04 -0700)]
Merge tag 'xfs-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
"Just a single bug fix (and documentation for the issue)"
* tag 'xfs-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: document another racy GC case in xfs_zoned_map_extent
xfs: prevent gc from picking the same zone twice
Linus Torvalds [Sat, 1 Nov 2025 17:00:53 +0000 (10:00 -0700)]
Merge tag 'kbuild-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild fixes from Nathan Chancellor:
- Formally adopt Kconfig in MAINTAINERS
- Fix install-extmod-build for more O= paths
- Align end of .modinfo to fix Authenticode calculation in EDK2
- Restore dynamic check for '-fsanitize=kernel-memory' in
CONFIG_HAVE_KMSAN_COMPILER to ensure backend target has support
for it
- Initialize locale in menuconfig and nconfig to fix UTF-8 terminals
that may not support VT100 ACS by default like PuTTY
* tag 'kbuild-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kconfig/nconf: Initialize the default locale at startup
kconfig/mconf: Initialize the default locale at startup
KMSAN: Restore dynamic check for '-fsanitize=kernel-memory'
kbuild: align modinfo section for Secureboot Authenticode EDK2 compat
kbuild: install-extmod-build: Fix when given dir outside the build dir
MAINTAINERS: Update Kconfig section
Josh Poimboeuf [Wed, 29 Oct 2025 19:54:08 +0000 (12:54 -0700)]
objtool: Fix skip_alt_group() for non-alternative STAC/CLAC
If an insn->alt points to a STAC/CLAC instruction, skip_alt_group()
assumes it's part of an alternative ("alt group") as opposed to some
other kind of "alt" such as an exception fixup.
While that assumption may hold true in the current code base, Linus has
an out-of-tree patch which breaks that assumption by replacing the
STAC/CLAC alternatives with raw STAC/CLAC instructions.
Make skip_alt_group() more robust by making sure it's actually an alt
group before continuing.
Jakub Horký [Tue, 14 Oct 2025 14:44:06 +0000 (16:44 +0200)]
kconfig/nconf: Initialize the default locale at startup
Fix bug where make nconfig doesn't initialize the default locale, which
causes ncurses menu borders to be displayed incorrectly (lqqqqk) in
UTF-8 terminals that don't support VT100 ACS by default, such as PuTTY.
Jakub Horký [Tue, 14 Oct 2025 15:49:32 +0000 (17:49 +0200)]
kconfig/mconf: Initialize the default locale at startup
Fix bug where make menuconfig doesn't initialize the default locale, which
causes ncurses menu borders to be displayed incorrectly (lqqqqk) in
UTF-8 terminals that don't support VT100 ACS by default, such as PuTTY.
- Reject negative head_room in __bpf_skb_change_head (Daniel Borkmann)
- Conditionally include dynptr copy kfuncs (Malin Jonsson)
- Sync pending IRQ work before freeing BPF ring buffer (Noorain Eqbal)
- Do not audit capability check in x86 do_jit() (Ondrej Mosnacek)
- Fix arm64 JIT of BPF_ST insn when it writes into arena memory
(Puranjay Mohan)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf/arm64: Fix BPF_ST into arena memory
bpf: Make migrate_disable always inline to avoid partial inlining
bpf: Reject negative head_room in __bpf_skb_change_head
bpf: Conditionally include dynptr copy kfuncs
libbpf: Fix powerpc's stack register definition in bpf_tracing.h
bpf: Do not audit capability check in do_jit()
bpf: Sync pending IRQ work before freeing ring buffer
x86/mm: Ensure clear_page() variants always have __kcfi_typeid_ symbols
When building with CONFIG_CFI=y and CONFIG_LTO_CLANG_FULL=y, there is a series
of errors from the various versions of clear_page() not having __kcfi_typeid_
symbols.
$ cat kernel/configs/repro.config
CONFIG_CFI=y
# CONFIG_LTO_NONE is not set
CONFIG_LTO_CLANG_FULL=y
$ make -skj"$(nproc)" ARCH=x86_64 LLVM=1 clean defconfig repro.config bzImage
ld.lld: error: undefined symbol: __kcfi_typeid_clear_page_rep
>>> referenced by ld-temp.o
>>> vmlinux.o:(__cfi_clear_page_rep)
With full LTO, it is possible for LLVM to realize that these functions never
have their address taken (as they are only used within an alternative, which
will make them a direct call) across the whole kernel and either drop or skip
generating their kCFI type identification symbols.
clear_page_{rep,orig,erms}() are defined in clear_page_64.S with
SYM_TYPED_FUNC_START as a result of
2981557cb040 ("x86,kcfi: Fix EXPORT_SYMBOL vs kCFI"),
as exported functions are free to be called indirectly thus need kCFI type
identifiers.
Use KCFI_REFERENCE with these clear_page() functions to force LLVM to see
these functions as address-taken and generate then keep the kCFI type
identifiers.
Linus Torvalds [Fri, 31 Oct 2025 21:47:02 +0000 (14:47 -0700)]
Merge tag 'drm-fixes-2025-10-31' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Simona Vetter:
"Looks like stochastics conspired to make this one a bit bigger, but
nothing scary at all. Also first examples of the new Link: tags, yay!
* tag 'drm-fixes-2025-10-31' of https://gitlab.freedesktop.org/drm/kernel: (44 commits)
drm/ast: Clear preserved bits from register output value
drm/imx: parallel-display: add the bridge before attaching it
drm/imx: parallel-display: convert to devm_drm_bridge_alloc() API
drm/panel: kingdisplay-kd097d04: Disable EoTp
drm/panel: sitronix-st7789v: fix sync flags for t28cp45tn89
drm/xe: Do not wake device during a GT reset
drm/xe: Fix uninitialized return value from xe_validation_guard()
drm/msm/dpu: Fix adjusted mode clock check for 3d merge
drm/msm/dpu: Disable broken YUV on QSEED2 hardware
drm/msm/dpu: Require linear modifier for writeback framebuffers
drm/msm/dpu: Fix pixel extension sub-sampling
drm/msm/dpu: Disable scaling for unsupported scaler types
drm/msm/dpu: Propagate error from dpu_assign_plane_resources
drm/msm/dpu: Fix allocation of RGB SSPPs without scaling
drm/msm: dsi: fix PLL init in bonded mode
drm/i915/dmc: Clear HRR EVT_CTL/HTP to zero on ADL-S
drm/amd/display: Fix incorrect return of vblank enable on unconfigured crtc
drm/amd/display: Add HDR workaround for a specific eDP
drm/amdgpu: fix SPDX header on cyan_skillfish_reg_init.c
drm/amdgpu: fix SPDX header on irqsrcs_vcn_5_0.h
...
Linus Torvalds [Fri, 31 Oct 2025 21:24:32 +0000 (14:24 -0700)]
Merge tag 'pci-v6.18-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Restore custom qcom ASPM enablement code so L1 PM Substates are
enabled as they were in v6.17 even though the PCI core now enables
just L0s and L1 by default (Bjorn Helgaas)
- Size prefetchable bridge windows only when they actually exist, to
avoid a WARN_ON() regression (Ilpo Järvinen)
* tag 'pci-v6.18-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: Do not size non-existing prefetchable window
Revert "PCI: qcom: Remove custom ASPM enablement code"
Linus Torvalds [Fri, 31 Oct 2025 21:20:09 +0000 (14:20 -0700)]
Merge tag 'vfio-v6.18-rc4' of https://github.com/awilliam/linux-vfio
Pull VFIO fixes from Alex Williamson:
- Fix overflows in vfio type1 backend for mappings at the end of the
64-bit address space, resulting in leaked pinned memory.
New selftest support included to avoid such issues in the future
(Alex Mastro)
* tag 'vfio-v6.18-rc4' of https://github.com/awilliam/linux-vfio:
vfio: selftests: add end of address space DMA map/unmap tests
vfio: selftests: update DMA map/unmap helpers to support more test kinds
vfio/type1: handle DMA map/unmap up to the addressable limit
vfio/type1: move iova increment to unmap_unpin_*() caller
vfio/type1: sanitize for overflow using check_*_overflow()
Ilpo Järvinen [Mon, 27 Oct 2025 13:24:23 +0000 (15:24 +0200)]
PCI: Do not size non-existing prefetchable window
pbus_size_mem() should only be called for bridge windows that exist but
__pci_bus_size_bridges() may point 'pref' to a resource that does not exist
(has zero flags) in case of non-root buses.
When prefetchable bridge window does not exist, the same non-prefetchable
bridge window is sized more than once which may result in duplicating
entries into the realloc_head list. Duplicated entries are shown in this
log and trigger a WARN_ON() because realloc_head had residual entries after
the resource assignment algorithm:
pci 0000:00:03.0: [11ab:6820] type 01 class 0x060400 PCIe Root Port
pci 0000:00:03.0: PCI bridge to [bus 00]
pci 0000:00:03.0: bridge window [io 0x0000-0x0fff]
pci 0000:00:03.0: bridge window [mem 0x00000000-0x000fffff]
pci 0000:00:03.0: bridge window [mem 0x00200000-0x003fffff] to [bus 02] add_size 200000 add_align 200000
pci 0000:00:03.0: bridge window [mem 0x00200000-0x003fffff] to [bus 02] add_size 200000 add_align 200000
pci 0000:00:03.0: bridge window [mem 0xe0000000-0xe03fffff]: assigned
pci 0000:00:03.0: PCI bridge to [bus 02]
pci 0000:00:03.0: bridge window [mem 0xe0000000-0xe03fffff]
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at drivers/pci/setup-bus.c:2373 pci_assign_unassigned_root_bus_resources+0x1bc/0x234
Check resource flags of 'pref' and only size the prefetchable window if the
resource has the IORESOURCE_PREFETCH flag.
Fixes: ae88d0b9c57f ("PCI: Use pbus_select_window_for_type() during mem window sizing") Reported-by: Klaus Kudielka <klaus.kudielka@gmail.com> Closes: https://lore.kernel.org/r/51e8cf1c62b8318882257d6b5a9de7fdaaecc343.camel@gmail.com/ Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Klaus Kudielka <klaus.kudielka@gmail.com> Link: https://patch.msgid.link/20251027132423.8841-1-ilpo.jarvinen@linux.intel.com
Prior to a729c1664619 ("PCI: qcom: Remove custom ASPM enablement code"),
the qcom controller driver enabled ASPM, including L0s, L1, and L1 PM
Substates, for all devices powered on at the time the controller driver
enumerates them.
ASPM was *not* enabled for devices powered on later by pwrctrl (unless the
kernel was built with PCIEASPM_POWERSAVE or PCIEASPM_POWER_SUPERSAVE, or
the user enabled ASPM via module parameter or sysfs).
After f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for
devicetree platforms"), the PCI core enabled all ASPM states for all
devices whether powered on initially or by pwrctrl, so a729c1664619 was
unnecessary and reverted.
But f3ac2ff14834 was too aggressive and broke platforms that didn't support
CLKREQ# or required device-specific configuration for L1 Substates, so df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")
enabled only L0s and L1.
On Qualcomm platforms, this left L1 Substates disabled, which was a
regression. Revert a729c1664619 so L1 Substates will be enabled on devices
that are initially powered on. Devices powered on by pwrctrl will be
addressed later.
Fixes: df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms") Reported-by: Johan Hovold <johan@kernel.org> Closes: https://lore.kernel.org/lkml/aPuXZlaawFmmsLmX@hovoldconsulting.com/ Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Johan Hovold <johan@kernel.org> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://patch.msgid.link/20251024210514.1365996-1-helgaas@kernel.org
Linus Torvalds [Fri, 31 Oct 2025 19:57:19 +0000 (12:57 -0700)]
Merge tag 'block-6.18-20251031' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- Fix blk-crypto reporting EIO when EINVAL is the correct error code
- Two bug fixes for the block zone support
- NVME pull request via Keith:
- Target side authentication fixup
- Peer-to-peer metadata fixup
- null_blk DMA alignment fix
* tag 'block-6.18-20251031' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
null_blk: set dma alignment to logical block size
blk-crypto: use BLK_STS_INVAL for alignment errors
block: make REQ_OP_ZONE_OPEN a write operation
block: fix op_is_zone_mgmt() to handle REQ_OP_ZONE_RESET_ALL
nvme-pci: use blk_map_iter for p2p metadata
nvmet-auth: update sc_c in host response
Linus Torvalds [Fri, 31 Oct 2025 19:50:35 +0000 (12:50 -0700)]
Merge tag 's390-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- Use correct locking in zPCI event code to avoid deadlock
- Get rid of irqs_registered flag in zpci_dev structure and restore IRQ
unconditionally for zPCI devices. This fixes sit uations where the
flag was not correctly updated
- Disable (revert) ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP for s390 again.
The optimized hugetlb vmemmap code modifies kernel page tables in a
way which does not work on s390 and leads to reproducible kernel
crashes due to stale TLB entries. This needs to be addressed with
some larger changes. For now simply disable the feature
- Update defconfigs
* tag 's390-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: Disable ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
s390/mm: Fix memory leak in add_marker() when kvrealloc() fails
s390/pci: Restore IRQ unconditionally for the zPCI device
s390: Update defconfigs
s390/pci: Avoid deadlock between PCI error recovery and mlx5 crdump
Puranjay Mohan [Thu, 30 Oct 2025 12:17:14 +0000 (12:17 +0000)]
bpf/arm64: Fix BPF_ST into arena memory
The arm64 JIT supports BPF_ST with BPF_PROBE_MEM32 (arena) by using the
tmp2 register to hold the dst + arena_vm_base value and using tmp2 as the
new dst register. But this is broken because in case is_lsi_offset()
returns false the tmp2 will be clobbered by emit_a64_mov_i(1, tmp2, off,
ctx); and hence the emitted store instruction will be of the form:
strb w10, [x11, x11]
Fix this by using the third temporary register to hold the dst +
arena_vm_base.
Two functions with identical names but different addresses are
considered ambiguous and removed by "pahole" from vmlinux BTF.
Later resolve_btfids warns since it cannot find them.
Commit 378b7708194f ("sched: Make migrate_{en,dis}able() inline") made
them inlineable in most places, but in vmlinux built with llvm 21 and 22
there are four symbols for migrate_{enable,disable}:
three static functions and one global function.
Fix the issue by marking migrate_{enable,disable} as always inline.
The alternative is to mark them as notrace/nokprobe which is more
drastic. Only bpf programs are prevented from attaching to these
functions. The rest of the tracing shouldn't be affected.
[note: Peter ok-ed the patch, Alexei rewrote commit log]
Fixes: 378b7708194f ("sched: Make migrate_{en,dis}able() inline") Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Menglong Dong <menglong.dong@linux.dev> Link: https://lore.kernel.org/r/20251029183646.3811774-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Linus Torvalds [Fri, 31 Oct 2025 16:34:21 +0000 (09:34 -0700)]
Merge tag '6.18-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- fix potential UAF in statfs
- DFS fix for expired referrals
- fix minor modinfo typo
- small improvement to reconnect for smbdirect
* tag '6.18-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: call smbd_destroy() in the same splace as kernel_sock_shutdown()/sock_release()
smb: client: handle lack of IPC in dfs_cache_refresh()
smb: client: fix potential cfid UAF in smb2_query_info_compound
cifs: fix typo in enable_gcm_256 module parameter
Hans Holmberg [Fri, 31 Oct 2025 09:48:26 +0000 (10:48 +0100)]
null_blk: set dma alignment to logical block size
This driver assumes that bio vectors are memory aligned to the logical
block size, so set the queue limit to reflect that.
Unless we set up the limit based on the logical block size, we will go
out of page bounds in copy_to_nullb / copy_from_nullb.
Apparently this wasn't noticed so far because none of the tests generate
such buffers, but since commit 851c4c96db00 ("xfs: implement
XFS_IOC_DIOINFO in terms of vfs_getattr") xfstests generates unaligned
I/O, which now lead to memory corruption when using null_blk devices
with 4k block size.
Fixes: bf8d08532bc1 ("iomap: add support for dma aligned direct-io") Fixes: b1a000d3b8ec ("block: relax direct io memory alignment") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 31 Oct 2025 14:29:09 +0000 (07:29 -0700)]
Merge tag 'sound-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes. It became slightly bigger than usual due
to timing issues (holidays, etc), but all changes are rather
device-specific fixes, so not really worrisome.
- ASoC Cirrus codec fixes for AMD
- Various fixes for ASoC Intel AVS, Qualcomm, SoundWire, FSL,
Mediatek, Renesas
- A few HD-audio quirks, and USB-audio regression fixes for Presonus"
* tag 'sound-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
ALSA: hda/realtek: Enable mic on Vaio RPL
ASoC: dt-bindings: pm4125-sdw: correct number of soundwire ports
ASoC: renesas: rz-ssi: Use proper dma_buffer_pos after resume
ASoC: soc_sdw_utils: remove cs42l43 component_name
ASoC: fsl_sai: Fix sync error in consumer mode
ASoC: Fix build for sdw_utils
ALSA: hda/realtek: Fix mute led for HP Victus 15-fa1xxx (MB 8C2D)
ASoC: rt721: fix prepare clock stop failed
ALSA: usb-audio: don't log messages meant for 1810c when initializing 1824c
ASoC: mediatek: Fix double pm_runtime_disable in remove functions
ASoC: fsl_micfil: correct the endian format for DSD
ASoC: fsl_sai: fix bit order for DSD format
ASoC: Intel: avs: Use snd_codec format when initializing probe
ASoC: Intel: avs: Disable periods-elapsed work when closing PCM
ASoC: Intel: avs: Unprepare a stream when XRUN occurs
ASoC: sdw_utils: add name_prefix for rt1321 part id
ASoC: qdsp6: q6asm: do not sleep while atomic
ASoC: Intel: soc-acpi-intel-ptl-match: Remove cs42l43 match from sdw link3
ASOC: max98090/91: fix for filter configuration: AHPF removed DMIC2_HPF added
ASoC: amd: acp: Add ACP7.0 match entries for cs35l56 and cs42l43
...
Linus Torvalds [Fri, 31 Oct 2025 14:25:10 +0000 (07:25 -0700)]
Merge tag 'v6.18-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- Fix double free in aspeed
- Fix req->nbytes clobbering in s390/phmac
* tag 'v6.18-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: aspeed - fix double free caused by devm
crypto: s390/phmac - Do not modify the req->nbytes value
Linus Torvalds [Fri, 31 Oct 2025 14:08:47 +0000 (07:08 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"ufs driver plus two core fixes.
One core fix makes the unit attention counters atomic (just in case
multiple commands detect them) and the other is fixing a merge window
regression caused by changes in the block tree"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: core: Fix the unit attention counter implementation
scsi: ufs: core: Declare tx_lanes witout initialization
scsi: ufs: core: Initialize value of an attribute returned by uic cmd
scsi: ufs: core: Fix error handler host_sem issue
scsi: core: Fix a regression triggered by scsi_host_busy()
xfs: document another racy GC case in xfs_zoned_map_extent
Besides blocks being invalidated, there is another case when the original
mapping could have changed between querying the rmap for GC and calling
xfs_zoned_map_extent. Document it there as it took us quite some time
to figure out what is going on while developing the multiple-GC
protection fix.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
When we are picking a zone for gc it might already be in the pipeline
which can lead to us moving the same data twice resulting in in write
amplification and a very unfortunate case where we keep on garbage
collecting the zone we just filled with migrated data stopping all
forward progress.
Fix this by introducing a count of on-going GC operations on a zone, and
skip any zone with ongoing GC when picking a new victim.
Fixes: 080d01c41 ("xfs: implement zoned garbage collection") Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Co-developed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Tested-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Linus Torvalds [Fri, 31 Oct 2025 02:48:13 +0000 (19:48 -0700)]
Merge tag 'linux_kselftest-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"Fix build warning in cachestat found during clang build and add
tmpshmcstat to .gitignore"
* tag 'linux_kselftest-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: cachestat: Fix warning on declaration under label
selftests/cachestat: add tmpshmcstat file to .gitignore
Linus Torvalds [Fri, 31 Oct 2025 02:11:27 +0000 (19:11 -0700)]
Merge tag 'linux_kselftest-kunit-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit fixes from Shuah Khan:
"Fix log overwrite in param_tests and fixes incorrect cast of priv
pointer in test_dev_action().
Update email address for Rae Moar in MAINTAINERS KUnit entry"
* tag 'linux_kselftest-kunit-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
MAINTAINERS: Update KUnit email address for Rae Moar
kunit: prevent log overwrite in param_tests
kunit: test_dev_action: Correctly cast 'priv' pointer to long*
Linus Torvalds [Fri, 31 Oct 2025 02:05:46 +0000 (19:05 -0700)]
Merge tag 'acpi-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix three ACPI driver issues and add version checks to two ACPI
table parsers:
- Call input_free_device() on failing input device registration as
necessary (and mentioned in the input subsystem documentation) in
the ACPI button driver (Kaushlendra Kumar)
- Fix use-after-free in acpi_video_switch_brightness() by canceling a
delayed work during tear-down (Yuhao Jiang)
- Use platform device for devres-related actions in the ACPI fan
driver to allow device-managed resources to be cleaned up properly
(Armin Wolf)
- Add version checks to the MRRM and SPCR table parsers (Tony Luck
and Punit Agrawal)"
* tag 'acpi-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: SPCR: Check for table version when using precise baudrate
ACPI: MRRM: Check revision of MRRM table
ACPI: fan: Use platform device for devres-related actions
ACPI: fan: Use ACPI handle when retrieving _FST
ACPI: video: Fix use-after-free in acpi_video_switch_brightness()
ACPI: button: Call input_free_device() on failing input device registration
Linus Torvalds [Fri, 31 Oct 2025 02:02:16 +0000 (19:02 -0700)]
Merge tag 'pm-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix three regressions, two recent ones and one introduced during
the 6.17 development cycle:
- Add an exit latency check to the menu cpuidle governor in the case
when it considers using a real idle state instead of a polling one
to address a performance regression (Rafael Wysocki)
- Revert an attempted cleanup of a system suspend code path that
introduced a regression elsewhere (Samuel Wu)
- Allow pm_restrict_gfp_mask() to be called multiple times in a row
and adjust pm_restore_gfp_mask() accordingly to avoid having to
play nasty games with these calls during hibernation (Rafael
Wysocki)"
* tag 'pm-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: sleep: Allow pm_restrict_gfp_mask() stacking
cpuidle: governors: menu: Select polling state in some more cases
Revert "PM: sleep: Make pm_wakeup_clear() call more clear"
- fbcon: Fix slab-use-after-free in fb_mode_is_equal (Quanmin Yan)
- fb.h: Fix typo in "vertical" (Piyush Choudhary)
* tag 'fbdev-for-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: atyfb: Check if pll_ops->init_pll failed
fbcon: Set fb_display[i]->mode to NULL when the mode is released
fbdev: bitblit: bound-check glyph index in bit_putcs*
fbdev: pvr2fb: Fix leftover reference to ONCHIP_NR_DMA_CHANNELS
fbdev: valkyriefb: Fix reference count leak in valkyriefb_init
video: fb: Fix typo in comment in fb.h
drm/ast: Clear preserved bits from register output value
Preserve the I/O register bits in __ast_write8_i_masked() as specified
by preserve_mask. Accidentally OR-ing the output value into these will
overwrite the register's previous settings.
Fixes display output on the AST2300, where the screen can go blank at
boot. The driver's original commit 312fec1405dd ("drm: Initial KMS
driver for AST (ASpeed Technologies) 2000 series (v2)") already added
the broken code. Commit 6f719373b943 ("drm/ast: Blank with VGACR17 sync
enable, always clear VGACRB6 sync off") triggered the bug.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reported-by: Peter Schneider <pschneider1968@googlemail.com> Closes: https://lore.kernel.org/dri-devel/a40caf8e-58ad-4f9c-af7f-54f6f69c29bb@googlemail.com/ Tested-by: Peter Schneider <pschneider1968@googlemail.com> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Fixes: 6f719373b943 ("drm/ast: Blank with VGACR17 sync enable, always clear VGACRB6 sync off") Fixes: 312fec1405dd ("drm: Initial KMS driver for AST (ASpeed Technologies) 2000 series (v2)") Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Nick Bowler <nbowler@draconx.ca> Cc: Douglas Anderson <dianders@chromium.org> Cc: Dave Airlie <airlied@redhat.com> Cc: Jocelyn Falempe <jfalempe@redhat.com> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v3.5+ Link: https://patch.msgid.link/20251024073626.129032-1-tzimmermann@suse.de
Merge branches 'acpi-button', 'acpi-video' and 'acpi-fan'
Merge ACPI button, ACPI backlight (video), and ACPI fan driver fixes for
6.18-rc4:
- Call input_free_device() on failing input device registration as
necessary (and mentioned in the input subsystem documentation) in the
ACPI button driver (Kaushlendra Kumar)
- Fix use-after-free in acpi_video_switch_brightness() by canceling
a delayed work during tear-down (Yuhao Jiang)
- Use platform device for devres-related actions in the ACPI fan driver
to allow device-managed resources to be cleaned up properly (Armin
Wolf)
Merge a cpuidle fix and two fixes related to system sleep for 6.18-rc4:
- Add an exit latency check to the menu cpuidle governor in the case
when it considers using a real idle state instead of a polling one to
address a performance regression (Rafael Wysocki)
- Revert an attempted cleanup of a system suspend code path that
introduced a regression elsewhere (Samuel Wu)
- Allow pm_restrict_gfp_mask() to be called multiple times in a row
and adjust pm_restore_gfp_mask() accordingly to avoid having to play
nasty games with these calls during hibernation (Rafael Wysocki)
* pm-cpuidle:
cpuidle: governors: menu: Select polling state in some more cases
* pm-sleep:
PM: sleep: Allow pm_restrict_gfp_mask() stacking
Revert "PM: sleep: Make pm_wakeup_clear() call more clear"
Heiko Carstens [Thu, 30 Oct 2025 14:55:05 +0000 (15:55 +0100)]
s390: Disable ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
As reported by Luiz Capitulino enabling HVO on s390 leads to reproducible
crashes. The problem is that kernel page tables are modified without
flushing corresponding TLB entries.
Even if it looks like the empty flush_tlb_all() implementation on s390 is
the problem, it is actually a different problem: on s390 it is not allowed
to replace an active/valid page table entry with another valid page table
entry without the detour over an invalid entry. A direct replacement may
lead to random crashes and/or data corruption.
In order to invalidate an entry special instructions have to be used
(e.g. ipte or idte). Alternatively there are also special instructions
available which allow to replace a valid entry with a different valid
entry (e.g. crdte or cspg).
Given that the HVO code currently does not provide the hooks to allow for
an implementation which is compliant with the s390 architecture
requirements, disable ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP again, which is
basically a revert of the original patch which enabled it.
Luca Ceresoli [Tue, 14 Oct 2025 11:30:51 +0000 (13:30 +0200)]
drm/imx: parallel-display: convert to devm_drm_bridge_alloc() API
This is the new API for allocating DRM bridges.
This conversion was missed during the initial conversion of all bridges to
the new API. Thus all kernels with commit 94d50c1a2ca3 ("drm/bridge:
get/put the bridge reference in drm_bridge_attach/detach()") and using this
driver now warn due to drm_bridge_attach() incrementing the refcount, which
is not initialized without using devm_drm_bridge_alloc() for allocation.
To make the conversion simple and straightforward without messing up with
the drmm_simple_encoder_alloc(), move the struct drm_bridge from struct
imx_parallel_display_encoder to struct imx_parallel_display.
Also remove the 'struct imx_parallel_display *pd' from struct
imx_parallel_display_encoder, not needed anymore.
Fixes: 94d50c1a2ca3 ("drm/bridge: get/put the bridge reference in drm_bridge_attach/detach()") Reported-by: Ernest Van Hoecke <ernestvanhoecke@gmail.com> Closes: https://lore.kernel.org/all/hlf4wdopapxnh4rekl5s3kvoi6egaga3lrjfbx6r223ar3txri@3ik53xw5idyh/ Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com> Tested-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com> Link: https://patch.msgid.link/20251014-drm-bridge-alloc-imx-ipuv3-v1-1-a1bb1dcbff50@bootlin.com Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Takashi Iwai [Thu, 30 Oct 2025 12:08:08 +0000 (13:08 +0100)]
Merge tag 'asoc-fix-v6.18-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.18
A bigger batch of fixes than I'd like, things built up due to holidays
and some last minute issues which caused me to hold off on sending a pul
request. None of these are super remarkable, and there's a few new
device IDs in here too including a relatively big block of AMD devices.
The Cirrus Logic CS530x support subject line is actually a fix that was
on the start of that series and got pulled in here, I forgot to fix the
subject up when merging.
Maud Spierings [Thu, 30 Oct 2025 06:35:38 +0000 (07:35 +0100)]
regulator: bd718x7: Fix voltages scaled by resistor divider
The .min_sel and .max_sel fields remained uninitialized in the new
linear_range, causing an error further down the line. Copy the old
values of these fields to the new one as they represent the range of
register values, which does not change.
Fixes: d2ad981151b3a ("regulator: bd718x7: Support external connection to scale voltages") Signed-off-by: Maud Spierings <maudspierings@gocontroll.com> Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com> Link: https://patch.msgid.link/20251030-mini_iv-v3-2-ef56c4d9f219@gocontroll.com Signed-off-by: Mark Brown <broonie@kernel.org>
Ranganath V N [Sun, 26 Oct 2025 16:33:12 +0000 (22:03 +0530)]
net: sctp: fix KMSAN uninit-value in sctp_inq_pop
Fix an issue detected by syzbot:
KMSAN reported an uninitialized-value access in sctp_inq_pop
BUG: KMSAN: uninit-value in sctp_inq_pop
The issue is actually caused by skb trimming via sk_filter() in sctp_rcv().
In the reproducer, skb->len becomes 1 after sk_filter(), which bypassed the
original check:
if (skb->len < sizeof(struct sctphdr) + sizeof(struct sctp_chunkhdr) +
skb_transport_offset(skb))
To handle this safely, a new check should be performed after sk_filter().
Reported-by: syzbot+d101e12bccd4095460e7@syzkaller.appspotmail.com Tested-by: syzbot+d101e12bccd4095460e7@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d101e12bccd4095460e7 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Suggested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Ranganath V N <vnranganath.20@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251026-kmsan_fix-v3-1-2634a409fa5f@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vaio RPL is equipped with ACL256, and needs a
fix to make the internal mic and headphone mic to work.
Also must to limits the internal microphone boost.
Shivaji Kant [Wed, 29 Oct 2025 06:54:19 +0000 (06:54 +0000)]
net: devmem: refresh devmem TX dst in case of route invalidation
The zero-copy Device Memory (Devmem) transmit path
relies on the socket's route cache (`dst_entry`) to
validate that the packet is being sent via the network
device to which the DMA buffer was bound.
However, this check incorrectly fails and returns `-ENODEV`
if the socket's route cache entry (`dst`) is merely missing
or expired (`dst == NULL`). This scenario is observed during
network events, such as when flow steering rules are deleted,
leading to a temporary route cache invalidation.
This patch fixes -ENODEV error for `net_devmem_get_binding()`
by doing the following:
1. It attempts to rebuild the route via `rebuild_header()`
if the route is initially missing (`dst == NULL`). This
allows the TCP/IP stack to recover from transient route
cache misses.
2. It uses `rcu_read_lock()` and `dst_dev_rcu()` to safely
access the network device pointer (`dst_dev`) from the
route, preventing use-after-free conditions if the
device is concurrently removed.
3. It maintains the critical safety check by validating
that the retrieved destination device (`dst_dev`) is
exactly the device registered in the Devmem binding
(`binding->dev`).
These changes prevent unnecessary ENODEV failures while
maintaining the critical safety requirement that the
Devmem resources are only used on the bound network device.
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Reported-by: Eric Dumazet <edumazet@google.com> Reported-by: Vedant Mathur <vedantmathur@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Fixes: bd61848900bf ("net: devmem: Implement TX path") Signed-off-by: Shivaji Kant <shivajikant@google.com> Link: https://patch.msgid.link/20251029065420.3489943-1-shivajikant@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
net: stmmac: Fixes for stmmac Tx VLAN insert and EST
This patchset includes following fixes for stmmac Tx VLAN insert and
EST implementations:
1. Disable STAG insertion offloading, as DWMAC IPs doesn't support
offload of STAG for double VLAN packets and CTAG for single VLAN
packets when using the same register configuration. The current
configuration in the driver is undocumented and is adding an
additional 802.1Q tag with VLAN ID 0 for double VLAN packets.
2. Consider Tx VLAN offload tag length for maxSDU estimation.
3. Fix GCL bounds check
====================
Rohan G Thomas [Tue, 28 Oct 2025 03:18:45 +0000 (11:18 +0800)]
net: stmmac: est: Fix GCL bounds checks
Fix the bounds checks for the hw supported maximum GCL entry
count and gate interval time.
Fixes: b60189e0392f ("net: stmmac: Integrate EST with TAPRIO scheduler API") Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com> Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com> Link: https://patch.msgid.link/20251028-qbv-fixes-v4-3-26481c7634e3@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rohan G Thomas [Tue, 28 Oct 2025 03:18:44 +0000 (11:18 +0800)]
net: stmmac: Consider Tx VLAN offload tag length for maxSDU
Queue maxSDU requirement of 802.1 Qbv standard requires mac to drop
packets that exceeds maxSDU length and maxSDU doesn't include
preamble, destination and source address, or FCS but includes
ethernet type and VLAN header.
On hardware with Tx VLAN offload enabled, VLAN header length is not
included in the skb->len, when Tx VLAN offload is requested. This
leads to incorrect length checks and allows transmission of
oversized packets. Add the VLAN_HLEN to the skb->len before checking
the Qbv maxSDU if Tx VLAN offload is requested for the packet.
Fixes: c5c3e1bfc9e0 ("net: stmmac: Offload queueMaxSDU from tc-taprio") Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com> Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com> Link: https://patch.msgid.link/20251028-qbv-fixes-v4-2-26481c7634e3@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rohan G Thomas [Tue, 28 Oct 2025 03:18:43 +0000 (11:18 +0800)]
net: stmmac: vlan: Disable 802.1AD tag insertion offload
The DWMAC IP's VLAN tag insertion offload does not support inserting
STAG (802.1AD) and CTAG (802.1Q) types in bytes 13 and 14 using the
same MAC_VLAN_Incl and MAC_VLAN_Inner_Incl register configurations.
Currently, MAC_VLAN_Incl is configured to offload only STAG type
insertion. However, the DWMAC IP inserts a CTAG type when the inner
VLAN ID field of the descriptor is not configured, and a STAG type
when it is configured. This behavior is not documented and leads to
inconsistent double VLAN tagging.
Additionally, an unexpected CTAG with VLAN ID 0 is inserted, resulting
in frames like:
Frame 1: 110 bytes on wire (880 bits), 110 bytes captured (880 bits)
Ethernet II, Src: <src> (<src>), Dst: <dst> (<dst>)
IEEE 802.1ad, ID: 100
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 0 (unexpected)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 200
Internet Protocol Version 4, Src: 192.168.4.10, Dst: 192.168.4.11
Internet Control Message Protocol
To avoid this undocumented and incorrect behavior, disable 802.1AD tag
insertion offload. Also, don't set CSVL bit. As per the data book,
when this bit is set, S-VLAN type (0x88A8) is inserted in the 13th and
14th bytes of transmitted packets and when this bit is reset, C-VLAN
type (0x8100) is inserted in the 13th and 14th bytes of transmitted
packets.
Fixes: 30d932279dc2 ("net: stmmac: Add support for VLAN Insertion Offload") Fixes: e94e3f3b51ce ("net: stmmac: Add support for VLAN Insertion Offload in GMAC4+") Fixes: 1d2c7a5fee31 ("net: stmmac: Refactor VLAN implementation") Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com> Reviewed-by: Boon Khai Ng <boon.khai.ng@altera.com> Link: https://patch.msgid.link/20251028-qbv-fixes-v4-1-26481c7634e3@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
tls: Introduce and use RX async resync request cancel function
This series by Shahar introduces RX async resync request cancel function
in tls module, and uses it in mlx5e driver.
For a device-offloaded TLS RX connection, the TLS module increments
rcd_delta each time a new TLS record is received, tracking the distance
from the original resync request. In the meanwhile, the device is
queried and is expected to respond, asynchronously.
However, if the device response is delayed or fails (e.g due to unstable
connection and device getting out of tracking, hardware errors, resource
exhaustion etc.), the TLS module keeps logging and incrementing
rcd_delta, which can lead to a WARN() when rcd_delta exceeds the
threshold.
This series improves this code area by canceling the resync request when
spotting an issue with the device response.
====================
Shahar Shitrit [Sun, 26 Oct 2025 20:03:03 +0000 (22:03 +0200)]
net/mlx5e: kTLS, Cancel RX async resync request in error flows
When device loses track of TLS records, it attempts to resync by
monitoring records and requests an asynchronous resynchronization
from software for this TLS connection.
The TLS module handles such device RX resync requests by logging record
headers and comparing them with the record tcp_sn when provided by the
device. It also increments rcd_delta to track how far the current
record tcp_sn is from the tcp_sn of the original resync request.
If the device later responds with a matching tcp_sn, the TLS module
approves the tcp_sn for resync.
However, the device response may be delayed or never arrive,
particularly due to traffic-related issues such as packet drops or
reordering. In such cases, the TLS module remains unaware that resync
will not complete, and continues performing unnecessary work by logging
headers and incrementing rcd_delta, which can eventually exceed the
threshold and trigger a WARN(). For example, this was observed when the
device got out of tracking, causing
mlx5e_ktls_handle_get_psv_completion() to fail and ultimately leading
to the rcd_delta warning.
To address this, call tls_offload_rx_resync_async_request_cancel()
to cancel the resync request and stop resync tracking in such error
cases. Also, increment the tls_resync_req_skip counter to track these
cancellations.
Shahar Shitrit [Sun, 26 Oct 2025 20:03:02 +0000 (22:03 +0200)]
net: tls: Cancel RX async resync request on rcd_delta overflow
When a netdev issues a RX async resync request for a TLS connection,
the TLS module handles it by logging record headers and attempting to
match them to the tcp_sn provided by the device. If a match is found,
the TLS module approves the tcp_sn for resynchronization.
While waiting for a device response, the TLS module also increments
rcd_delta each time a new TLS record is received, tracking the distance
from the original resync request.
However, if the device response is delayed or fails (e.g due to
unstable connection and device getting out of tracking, hardware
errors, resource exhaustion etc.), the TLS module keeps logging and
incrementing, which can lead to a WARN() when rcd_delta exceeds the
threshold.
To address this, introduce tls_offload_rx_resync_async_request_cancel()
to explicitly cancel resync requests when a device response failure is
detected. Call this helper also as a final safeguard when rcd_delta
crosses its threshold, as reaching this point implies that earlier
cancellation did not occur.
Shahar Shitrit [Sun, 26 Oct 2025 20:03:01 +0000 (22:03 +0200)]
net: tls: Change async resync helpers argument
Update tls_offload_rx_resync_async_request_start() and
tls_offload_rx_resync_async_request_end() to get a struct
tls_offload_resync_async parameter directly, rather than
extracting it from struct sock.
This change aligns the function signatures with the upcoming
tls_offload_rx_resync_async_request_cancel() helper, which
will be introduced in a subsequent patch.
Jakub Kicinski [Thu, 30 Oct 2025 01:25:12 +0000 (18:25 -0700)]
Merge tag 'nf-25-10-29' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter: updates for net
1) its not possible to attach conntrack labels via ctnetlink
unless one creates a dummy 'ct labels set' rule in nftables.
This is an oversight, the 'ruleset tests presence, userspace
(netlink) sets' use-case is valid and should 'just work'.
Always broken since this got added in Linux 4.7.
2) nft_connlimit reads count value without holding the relevant
lock, add a READ_ONCE annotation. From Fernando Fernandez Mancera.
3) There is a long-standing bug (since 4.12) in nftables helper infra
when NAT is in use: if the helper gets assigned after the nat binding
was set up, we fail to initialise the 'seqadj' extension, which is
needed in case NAT payload rewrites need to add (or remove) from the
packet payload. Fix from Andrii Melnychenko.
* tag 'nf-25-10-29' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_ct: add seqadj extension for natted connections
netfilter: nft_connlimit: fix possible data race on connection count
netfilter: nft_ct: enable labels for get case too
====================
smb: client: call smbd_destroy() in the same splace as kernel_sock_shutdown()/sock_release()
With commit b0432201a11b ("smb: client: let destroy_mr_list() keep
smbdirect_mr_io memory if registered") the changes from commit 214bab448476 ("cifs: Call MID callback before destroying transport") and
commit 1d2a4f57cebd ("cifs:smbd When reconnecting to server, call
smbd_destroy() after all MIDs have been called") are no longer needed.
And it's better to use the same logic flow, so that
the chance of smbdirect related problems is smaller.
Fixes: 214bab448476 ("cifs: Call MID callback before destroying transport") Fixes: 1d2a4f57cebd ("cifs:smbd When reconnecting to server, call smbd_destroy() after all MIDs have been called") Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>