git.ipfire.org Git - thirdparty/linux.git/log

Merge tag 'kvm-x86-misc-6.18' of https://github.com/kvm-x86/linux into HEAD

KVM x86 changes for 6.18

- Don't (re)check L1 intercepts when completing userspace I/O to fix a flaw
   where a misbehaving usersepace (a.k.a. syzkaller) could swizzle L1's
   intercepts and trigger a variety of WARNs in KVM.

- Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is
   supposed to exist for v2 PMUs.

- Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs.

- Clean up KVM's vector hashing code for delivering lowest priority IRQs.

- Clean up the fastpath handler code to only handle IPIs and WRMSRs that are
   actually "fast", as opposed to handling those that KVM _hopes_ are fast, and
   in the process of doing so add fastpath support for TSC_DEADLINE writes on
   AMD CPUs.

- Clean up a pile of PMU code in anticipation of adding support for mediated
   vPMUs.

- Add support for the immediate forms of RDMSR and WRMSRNS, sans full
   emulator support (KVM should never need to emulate the MSRs outside of
   forced emulation and other contrived testing scenarios).

- Clean up the MSR APIs in preparation for CET and FRED virtualization, as
   well as mediated vPMU support.

- Rejecting a fully in-kernel IRQCHIP if EOIs are protected, i.e. for TDX VMs,
   as KVM can't faithfully emulate an I/O APIC for such guests.

- KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS in preparation
   for mediated vPMU support, as KVM will need to recalculate MSR intercepts in
   response to PMU refreshes for guests with mediated vPMUs.

- Misc cleanups and minor fixes.

Merge tag 'kvm-x86-ciphertext-6.18' of https://github.com/kvm-x86/linux into HEAD

KVM SEV-SNP CipherText Hiding support for 6.18

Add support for SEV-SNP's CipherText Hiding, an opt-in feature that prevents
unauthorized CPU accesses from reading the ciphertext of SNP guest private
memory, e.g. to attempt an offline attack. Instead of ciphertext, the CPU
will always read back all FFs when CipherText Hiding is enabled.

Add new module parameter to the KVM module to enable CipherText Hiding and
control the number of ASIDs that can be used for VMs with CipherText Hiding,
which is in effect the number of SNP VMs. When CipherText Hiding is enabled,
the shared SEV-ES/SEV-SNP ASID space is split into separate ranges for SEV-ES
and SEV-SNP guests, i.e. ASIDs that can be used for CipherText Hiding cannot
be used to run SEV-ES guests.

Merge tag 'kvm-x86-svm-6.18' of https://github.com/kvm-x86/linux into HEAD

KVM SVM changes for 6.18

- Require a minimum GHCB version of 2 when starting SEV-SNP guests via
   KVM_SEV_INIT2 so that invalid GHCB versions result in immediate errors
   instead of latent guest failures.

- Add support for Secure TSC for SEV-SNP guests, which prevents the untrusted
   host from tampering with the guest's TSC frequency, while still allowing the
   the VMM to configure the guest's TSC frequency prior to launch.

- Mitigate the potential for TOCTOU bugs when accessing GHCB fields by
   wrapping all accesses via READ_ONCE().

- Validate the XCR0 provided by the guest (via the GHCB) to avoid tracking a
   bogous XCR0 value in KVM's software model.

- Save an SEV guest's policy if and only if LAUNCH_START fully succeeds to
   avoid leaving behind stale state (thankfully not consumed in KVM).

- Explicitly reject non-positive effective lengths during SNP's LAUNCH_UPDATE
   instead of subtly relying on guest_memfd to do the "heavy" lifting.

- Reload the pre-VMRUN TSC_AUX on #VMEXIT for SEV-ES guests, not the host's
   desired TSC_AUX, to fix a bug where KVM could clobber a different vCPU's
   TSC_AUX due to hardware not matching the value cached in the user-return MSR
   infrastructure.

- Enable AVIC by default for Zen4+ if x2AVIC (and other prereqs) is supported,
   and clean up the AVIC initialization code along the way.

Merge tag 'kvm-x86-vmx-6.18' of https://github.com/kvm-x86/linux into HEAD

KVM VMX changes for 6.18

- Add read/write helpers for MSRs that need to be accessed with preemption
disable to prepare for virtualizing FRED RSP0.

- Fix a bug where KVM would return 0/success from __tdx_bringup() on error,
i.e. where KVM would load with enable_tdx=true despite TDX not being usable.

- Minor cleanups.

Merge tag 'kvm-x86-mmu-6.18' of https://github.com/kvm-x86/linux into HEAD

KVM x86 MMU changes for 6.18

- Recover possible NX huge pages within the TDP MMU under read lock to
   reduce guest jitter when restoring NX huge pages.

- Return -EAGAIN during prefault if userspace concurrently deletes/moves the
   relevant memslot to fix an issue where prefaulting could deadlock with the
   memslot update.

- Don't retry in TDX's anti-zero-step mitigation if the target memslot is
   invalid, i.e. is being deleted or moved, to fix a deadlock scenario similar
   to the aforementioned prefaulting case.

Merge tag 'kvm-x86-generic-6.18' of https://github.com/kvm-x86/linux into HEAD

KVM common changes for 6.18

Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as __GFP_NOWARN is
now included in GFP_NOWAIT.

Merge tag 'kvm-x86-guest-6.18' of https://github.com/kvm-x86/linux into HEAD

x86/kvm guest side changes for 6.18

- For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
   overriding guest MTRR for TDX/SNP to fix an issue where ACPI auto-mapping
   could map devices as WB and prevent the device drivers from mapping their
   devices with UC/UC-.

- Make kvm_async_pf_task_wake() a local static helper and remove its
   export.

- Use native qspinlocks when running in a VM with dedicated vCPU=>pCPU
   bindings even when PV_UNHALT is unsupported.

Merge tag 'kvm-x86-selftests-6.18' of https://github.com/kvm-x86/linux into HEAD

KVM selftests changes for 6.18

- Add #DE coverage in the fastops test (the only exception that's guest-
triggerable in fastop-emulated instructions).

- Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra
Forest (SRF) and Clearwater Forest (CWF).

- Minor cleanups and improvements

Merge tag 'loongarch-kvm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD

LoongArch KVM changes for v6.18

1. Add PTW feature detection on new hardware.
2. Add sign extension with kernel MMIO/IOCSR emulation.
3. Improve in-kernel IPI emulation.
4. Improve in-kernel PCH-PIC emulation.
5. Move kvm_iocsr tracepoint out of generic code.

Merge tag 'kvm-riscv-6.18-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 6.18

- Added SBI FWFT extension for Guest/VM with misaligned
  delegation and pointer masking PMLEN features
- Added ONE_REG interface for SBI FWFT extension
- Added Zicbop and bfloat16 extensions for Guest/VM
- Enabled more common KVM selftests for RISC-V such as
  access_tracking_perf_test, dirty_log_perf_test,
  memslot_modification_stress_test, memslot_perf_test,
  mmu_stress_test, and rseq_test
- Added SBI v3.0 PMU enhancements in KVM and perf driver

Merge tag 'kvmarm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 6.18

- Add support for FF-A 1.2 as the secure memory conduit for pKVM,
  allowing more registers to be used as part of the message payload.

- Change the way pKVM allocates its VM handles, making sure that the
  privileged hypervisor is never tricked into using uninitialised
  data.

- Speed up MMIO range registration by avoiding unnecessary RCU
  synchronisation, which results in VMs starting much quicker.

- Add the dump of the instruction stream when panic-ing in the EL2
  payload, just like the rest of the kernel has always done. This will
  hopefully help debugging non-VHE setups.

- Add 52bit PA support to the stage-1 page-table walker, and make use
  of it to populate the fault level reported to the guest on failing
  to translate a stage-1 walk.

- Add NV support to the GICv3-on-GICv5 emulation code, ensuring
  feature parity for guests, irrespective of the host platform.

- Fix some really ugly architecture problems when dealing with debug
  in a nested VM. This has some bad performance impacts, but is at
  least correct.

- Add enough infrastructure to be able to disable EL2 features and
  give effective values to the EL2 control registers. This then allows
  a bunch of features to be turned off, which helps cross-host
  migration.

- Large rework of the selftest infrastructure to allow most tests to
  transparently run at EL2. This is the first step towards enabling
  NV testing.

- Various fixes and improvements all over the map, including one BE
  fix, just in time for the removal of the feature.

Merge tag 'kvmarm-fixes-6.17-2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 changes for 6.17, round #3

- Invalidate nested MMUs upon freeing the PGD to avoid WARNs when
   visiting from an MMU notifier

- Fixes to the TLB match process and TLB invalidation range for
   managing the VCNR pseudo-TLB

- Prevent SPE from erroneously profiling guests due to UNKNOWN reset
   values in PMSCR_EL1

- Fix save/restore of host MDCR_EL2 to account for eagerly programming
   at vcpu_load() on VHE systems

- Correct lock ordering when dealing with VGIC LPIs, avoiding scenarios
   where an xarray's spinlock was nested with a *raw* spinlock

- Permit stage-2 read permission aborts which are possible in the case
   of NV depending on the guest hypervisor's stage-2 translation

- Call raw_spin_unlock() instead of the internal spinlock API

- Fix parameter ordering when assigning VBAR_EL1

[Pull into kvm/master to fix conflicts. - Paolo]

Merge tag 'kvm-s390-next-6.18-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: A bugfix and a performance improvement

* Improve interrupt cpu for wakeup, change the heuristic to decide wich
vCPU to deliver a floating interrupt to.
* Clear the pte when discarding a swapped page because of CMMA; this
bug was introduced in 6.16 when refactoring gmap code.

KVM: s390: Fix to clear PTE when discarding a swapped page

KVM run fails when guests with 'cmm' cpu feature and host are
under memory pressure and use swap heavily. This is because
npages becomes ENOMEN (out of memory) in hva_to_pfn_slow()
which inturn propagates as EFAULT to qemu. Clearing the page
table entry when discarding an address that maps to a swap
entry resolves the issue.

Fixes: 200197908dc4 ("KVM: s390: Refactor and split some gmap helpers")
Cc: stable@vger.kernel.org
Suggested-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Gautam Gala <ggala@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

Merge branch kvm-arm64/selftests-6.18 into kvmarm-master/next

* kvm-arm64/selftests-6.18:
  : .
  : KVM/arm64 selftest updates for 6.18:
  :
  : - Large update to run EL1 selftests at EL2 when possible
  :   (20250917212044.294760-1-oliver.upton@linux.dev)
  :
  : - Work around lack of ID_AA64MMFR4_EL1 trapping on CPUs
  :   without FEAT_FGT
  :   (20250923173006.467455-1-oliver.upton@linux.dev)
  :
  : - Additional fixes and cleanups
  :   (20250920-kvm-arm64-id-aa64isar3-el1-v1-0-1764c1c1c96d@kernel.org)
  : .
  KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs
  KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs
  KVM: arm64: selftests: Cope with arch silliness in EL2 selftest
  KVM: arm64: selftests: Add basic test for running in VHE EL2
  KVM: arm64: selftests: Enable EL2 by default
  KVM: arm64: selftests: Initialize HCR_EL2
  KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters
  KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2
  KVM: arm64: selftests: Select SMCCC conduit based on current EL
  KVM: arm64: selftests: Provide helper for getting default vCPU target
  KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts
  KVM: arm64: selftests: Create a VGICv3 for 'default' VMs
  KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation
  KVM: arm64: selftests: Add helper to check for VGICv3 support
  KVM: arm64: selftests: Initialize VGICv3 only once
  KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code

Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs

We have a couple of writable bitfields in ID_AA64ISAR3_EL1 but the
set_id_regs selftest does not cover this register at all, add coverage.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs

Currently we list the main set of registers with bits we test three
times, once in the test_regs array which is used at runtime, once in the
guest code and once in a list of ARRAY_SIZE() operations we use to tell
kselftest how many tests we plan to execute. This is needlessly fiddly,
when adding new registers as the test_cnt calculation is formatted with
two registers per line. Instead count the number of bitfields in the
register arrays at runtime.

The existing code subtracts ARRAY_SIZE(test_regs) from the number of
tests to account for the terminating FTR_REG_END entries in the per
register arrays, the new code accounts for this when enumerating.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: selftests: Cope with arch silliness in EL2 selftest

Implementations without FEAT_FGT aren't required to trap the entire ID
register space when HCR_EL2.TID3 is set. This is a terrible idea, as the
hypervisor may need to advertise the absence of a feature to the VM
using a negative value in a signed field, FEAT_E2H0 being a great
example of this.

Cope with uncooperative implementations in the EL2 selftest by accepting
a zero value when FEAT_FGT is absent and otherwise only tolerating the
expected nonzero value.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: selftests: Add basic test for running in VHE EL2

Add an embarrassingly simple selftest for sanity checking KVM's VHE EL2
and test that the ID register bits are consistent with HCR_EL2.E2H being
RES1.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: selftests: Enable EL2 by default

Take advantage of VHE to implicitly promote KVM selftests to run at EL2
with only slight modification. Update the smccc_filter test to account
for this now that the EL2-ness of a VM is visible to tests.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: selftests: Initialize HCR_EL2

Initialize HCR_EL2 such that EL2&0 is considered 'InHost', allowing the
use of (mostly) unmodified EL1 selftests at EL2.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters

Configuring the number of implemented counters via PMCR_EL0.N was a bad
idea in retrospect as it interacts poorly with nested. Migrate the
selftest to use the vCPU attribute instead of the KVM_SET_ONE_REG
mechanism.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2

Arch timer registers are redirected to their hypervisor counterparts
when running in VHE EL2. This is great, except for the fact that the
hypervisor timers use different PPIs. Use the correct INTIDs when that
is the case.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: selftests: Select SMCCC conduit based on current EL

HVCs are taken within the VM when EL2 is in use. Ensure tests use the
SMC instruction when running at EL2 to interact with the host.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: selftests: Provide helper for getting default vCPU target

The default vCPU target in KVM selftests is pretty boring in that it
doesn't enable any vCPU features. Expose a helper for getting the
default target to prepare for cramming in more features. Call
KVM_ARM_PREFERRED_TARGET directly from get-reg-list as it needs
fine-grained control over feature flags.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts

FEAT_VHE has the somewhat nice property of implicitly redirecting EL1
register aliases to their corresponding EL2 representations when E2H=1.
Unfortunately, there's no such abstraction for userspace and EL2
registers are always accessed by their canonical encoding.

Introduce a helper that applies EL2 redirections to sysregs and use
aggressive inlining to catch misuse at compile time. Go a little past
the architectural definition for ease of use for test authors (e.g. the
stack pointer).

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: selftests: Create a VGICv3 for 'default' VMs

Start creating a VGICv3 by default unless explicitly opted-out by the
test. While having an interrupt controller is nice, the real benefit
here is clearing a hurdle for EL2 VMs which mandate the presence of a
VGIC.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation

vgic_v3_setup() has a good bit of sanity checking internally to ensure
that vCPUs have actually been created and match the dimensioning of the
vgic itself. Spin off an unsanitised setup and initialization helper so
vgic initialization can be wired in around a 'default' VM's vCPU
creation.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: selftests: Add helper to check for VGICv3 support

Introduce a proper predicate for probing VGICv3 by performing a 'test'
creation of the device on a dummy VM.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: selftests: Initialize VGICv3 only once

vgic_v3_setup() unnecessarily initializes the vgic twice. Keep the
initialization after configuring MMIO frames and get rid of the other.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code

In order to compel the default usage of EL2 in selftests, move
kvm_arch_vm_post_create() to library code and expose an opt-in for using
MTE by default.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: SVM: Enable AVIC by default for Zen4+ if x2AVIC is support

AVIC and x2AVIC are fully functional since Zen 4, with no known hardware
errata. Enable AVIC and x2AVIC by default on Zen4+ so long as x2AVIC is
supported (to avoid enabling partial support for APIC virtualization by
default).

Internally, convert "avic" to an integer so that KVM can identify if the
user has asked to explicitly enable or disable AVIC, i.e. so that KVM
doesn't override an explicit 'y' from the user. Arbitrarily use -1 to
denote auto-mode, and accept the string "auto" for the module param in
addition to standard boolean values, i.e. continue to allow the user to
configure the "avic" module parameter to explicitly enable/disable AVIC.

To again maintain backward compatibility with a standard boolean param,
set KERNEL_PARAM_OPS_FL_NOARG, which tells the params infrastructure to
allow empty values for %true, i.e. to interpret a bare "avic" as "avic=y".
Take care to check for a NULL @val when looking for "auto"!

Lastly, always print "avic" as a boolean, since auto-mode is resolved
during module initialization, i.e. the user should never see "auto" in
sysfs.

Signed-off-by: Naveen N Rao (AMD) <naveen@kernel.org>
Tested-by: Naveen N Rao (AMD) <naveen@kernel.org>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20250919215934.1590410-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: SVM: Move global "avic" variable to avic.c

Move "avic" to avic.c so that it's colocated with the other AVIC specific
globals and module params, and so that avic_hardware_setup() is a bit more
self-contained, e.g. similar to sev_hardware_setup().

Deliberately set enable_apicv in svm.c as it's already globally visible
(defined by kvm.ko, not by kvm-amd.ko), and to clearly capture the
dependency on enable_apicv being initialized (svm_hardware_setup() clears
several AVIC-specific hooks when enable_apicv is disabled).

Alternatively, clearing of the hooks (and enable_ipiv) could be moved to
avic_hardware_setup(), but that's not obviously better, e.g. it's helpful
to isolate the setting of enable_apicv when reading code from the generic
x86 side of the world.

No functional change intended.

Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
Tested-by: Naveen N Rao (AMD) <naveen@kernel.org>
Link: https://lore.kernel.org/r/20250919215934.1590410-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: SVM: Don't advise the user to do force_avic=y (when x2AVIC is detected)

Don't advise the end user to try to force enable AVIC when x2AVIC is
reported as supported in CPUID, as forcefully enabling AVIC isn't something
that should be done lightly.  E.g. some Zen4 client systems hide AVIC but
leave x2AVIC behind, and while such a configuration is indeed due to buggy
firmware in the sense the reporting x2AVIC without AVIC is nonsensical,
KVM has no idea _why_ firmware disabled AVIC in the first place.

Suggesting that the user try to run with force_avic=y is sketchy even if
the user explicitly tries to enable AVIC, and will be downright
irresponsible once KVM starts enabling AVIC by default.  Alternatively,
KVM could print the message only when the user explicitly asks for AVIC,
but running with force_avic=y isn't something that should be encouraged
for random users.  force_avic is a useful knob for developers and perhaps
even advanced users, but isn't something that KVM should advertise broadly.

Opportunistically append a newline to the pr_warn() so that it prints out
immediately, and tweak the message to say that AVIC is unsupported instead
of disabled (disabled suggests that the kernel/KVM is somehow responsible).

Suggested-by: Naveen N Rao (AMD) <naveen@kernel.org>
Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>
Tested-by: Naveen N Rao (AMD) <naveen@kernel.org>
Link: https://lore.kernel.org/r/20250919215934.1590410-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: SVM: Always print "AVIC enabled" separately, even when force enabled

Print the customary "AVIC enabled" informational message even when AVIC is
force enabled on a system that doesn't advertise supported for AVIC in
CPUID, as not printing the standard message can confuse users and tools.

Opportunistically clean up the scary message when AVIC is force enabled,
but keep it as separate message so that it is printed at level "warn",
versus the standard message only being printed for level "info".

Suggested-by: Naveen N Rao (AMD) <naveen@kernel.org>
Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>
Tested-by: Naveen N Rao (AMD) <naveen@kernel.org>
Link: https://lore.kernel.org/r/20250919215934.1590410-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: SVM: Update "APICv in x2APIC without x2AVIC" in avic.c, not svm.c

Set the "allow_apicv_in_x2apic_without_x2apic_virtualization" flag as part
of avic_hardware_setup() instead of handling in svm_hardware_setup(), and
make x2avic_enabled local to avic.c (setting the flag was the only use in
svm.c).

Tag avic_hardware_setup() with __init as necessary (it should have been
tagged __init long ago).

No functional change intended (aside from the side effects of tagging
avic_hardware_setup() with __init).

Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
Tested-by: Naveen N Rao (AMD) <naveen@kernel.org>
Link: https://lore.kernel.org/r/20250919215934.1590410-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: SVM: Move x2AVIC MSR interception helper to avic.c

Move svm_set_x2apic_msr_interception() to avic.c as it's only relevant
when x2AVIC is enabled/supported and only called by AVIC code. In
addition to scoping AVIC code to avic.c, this will allow burying the
global x2avic_enabled variable in avic.

Opportunistically rename the helper to explicitly scope it to "avic".

No functional change intended.

Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>
Tested-by: Naveen N Rao (AMD) <naveen@kernel.org>
Link: https://lore.kernel.org/r/20250919215934.1590410-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: SVM: Make svm_x86_ops globally visible, clean up on-HyperV usage

Make svm_x86_ops globally visible in anticipation of modifying the struct
in avic.c, and clean up the KVM-on-HyperV usage, as declaring _and using_
a local variable in a header that's only defined in one specific .c-file
is all kinds of ugly.

Opportunistically make svm_hv_enable_l2_tlb_flush() local to
svm_onhyperv.c, as the only reason it was visible was due to the
aforementioned shenanigans in svm_onhyperv.h.

Alternatively, svm_x86_ops could be explicitly passed to
svm_hv_hardware_setup() as a parameter.  While that approach is slightly
safer, e.g. avoids "hidden" updates, for better or worse, the Intel side
of KVM has already chosen to expose vt_x86_ops (and vt_init_ops).  Given
that svm_x86_ops is only truly consumed by kvm_ops_update, the odds of a
"hidden" update causing problems are extremely low.  So, absent a strong
reason to rework the VMX/TDX code, make svm_x86_ops visible, as having all
updates use exactly "svm_x86_ops." is advantageous in its own right.

No functional change intended.

Link: https://lore.kernel.org/r/20250919215934.1590410-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: SVM: Re-load current, not host, TSC_AUX on #VMEXIT from SEV-ES guest

Prior to running an SEV-ES guest, set TSC_AUX in the host save area to the
current value in hardware, as tracked by the user return infrastructure,
instead of always loading the host's desired value for the CPU. If the
pCPU is also running a non-SEV-ES vCPU, loading the host's value on #VMEXIT
could clobber the other vCPU's value, e.g. if the SEV-ES vCPU preempted
the non-SEV-ES vCPU, in which case KVM expects the other vCPU's TSC_AUX
value to be resident in hardware.

Note, unlike TDX, which blindly _zeroes_ TSC_AUX on TD-Exit, SEV-ES CPUs
can load an arbitrary value. Stuff the current value in the host save
area instead of refreshing the user return cache so that KVM doesn't need
to track whether or not the vCPU actually enterred the guest and thus
loaded TSC_AUX from the host save area.

Opportunistically tag tsc_aux_uret_slot as read-only after init to guard
against unexpected modifications, and to make it obvious that using the
variable in sev_es_prepare_switch_to_guest() is safe.

Fixes: 916e3e5f26ab ("KVM: SVM: Do not use user return MSR support for virtualized TSC_AUX")
Cc: stable@vger.kernel.org
Suggested-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
[sean: handle the SEV-ES case in sev_es_prepare_switch_to_guest()]
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250923153738.1875174-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: x86: Add helper to retrieve current value of user return MSR

In the user return MSR support, the cached value is always the hardware
value of the specific MSR. Therefore, add a helper to retrieve the
cached value, which can replace the need for RDMSR, for example, to
allow SEV-ES guests to restore the correct host hardware value without
using RDMSR.

Cc: stable@vger.kernel.org
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
[sean: drop "cache" from the name, make it a one-liner, tag for stable]
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250923153738.1875174-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: SEV: Reject non-positive effective lengths during LAUNCH_UPDATE

Check for an invalid length during LAUNCH_UPDATE at the start of
snp_launch_update() instead of subtly relying on kvm_gmem_populate() to
detect the bad state.  Code that directly handles userspace input
absolutely should sanitize those inputs; failure to do so is asking for
bugs where KVM consumes an invalid "npages".

Keep the check in gmem, but wrap it in a WARN to flag any bad usage by
the caller.

Note, this is technically an ABI change as KVM would previously allow a
length of '0'.  But allowing a length of '0' is nonsensical and creates
pointless conundrums in KVM.  E.g. an empty range is arguably neither
private nor shared, but LAUNCH_UPDATE will fail if the starting gpa can't
be made private.  In practice, no known or well-behaved VMM passes a
length of '0'.

Note #2, the PAGE_ALIGNED(params.len) check ensures that lengths between
1 and 4095 (inclusive) are also rejected, i.e. that KVM won't end up with
npages=0 when doing "npages = params.len / PAGE_SIZE".

Cc: Thomas Lendacky <thomas.lendacky@amd.com>
Cc: Michael Roth <michael.roth@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250919211649.1575654-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: SEV: Validate XCR0 provided by guest in GHCB

Use __kvm_set_xcr() to propagate XCR0 changes from the GHCB to KVM's
software model in order to validate the new XCR0 against KVM's view of
the supported XCR0.  Allowing garbage is thankfully mostly benign, as
kvm_load_{guest,host}_xsave_state() bail early for vCPUs with protected
state, xstate_required_size() will simply provide garbage back to the
guest, and attempting to save/restore the bad value via KVM_{G,S}ET_XCRS
will only harm the guest (setting XCR0 will fail).

However, allowing the guest to put junk into a field that KVM assumes is
valid is a CVE waiting to happen.  And as a bonus, using the proper API
eliminates the ugly open coding of setting arch.cpuid_dynamic_bits_dirty.

Simply ignore bad values, as either the guest managed to get an
unsupported value into hardware, or the guest is misbehaving and providing
pure garbage.  In either case, KVM can't fix the broken guest.

Note, using __kvm_set_xcr() also avoids recomputing dynamic CPUID bits
if XCR0 isn't actually changing (relatively to KVM's previous snapshot).

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: SEV: Read save fields from GHCB exactly once

Wrap all reads of GHCB save fields with READ_ONCE() via a KVM-specific
GHCB get() utility to help guard against TOCTOU bugs. Using READ_ONCE()
doesn't completely prevent such bugs, e.g. doesn't prevent KVM from
redoing get() after checking the initial value, but at least addresses
all potential TOCTOU issues in the current KVM code base.

To prevent unintentional use of the generic helpers, take only @svm for
the kvm_ghcb_get_xxx() helpers and retrieve the ghcb instead of explicitly
passing it in.

Opportunistically reduce the indentation of the macro-defined helpers and
clean up the alignment.

Fixes: 4e15a0ddc3ff ("KVM: SEV: snapshot the GHCB before accessing it")
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: SEV: Rename kvm_ghcb_get_sw_exit_code() to kvm_get_cached_sw_exit_code()

Rename kvm_ghcb_get_sw_exit_code() to kvm_get_cached_sw_exit_code() to make
it clear that KVM is getting the cached value, not reading directly from
the guest-controlled GHCB. More importantly, vacating
kvm_ghcb_get_sw_exit_code() will allow adding a KVM-specific macro-built
kvm_ghcb_get_##field() helper to read values from the GHCB.

No functional change intended.

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: selftests: Add ex_str() to print human friendly name of exception vectors

Steal exception_mnemonic() from KVM-Unit-Tests as ex_str() (to keep line
lengths reasonable) and use it in assert messages that currently print the
raw vector number.

Co-developed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-45-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

selftests/kvm: remove stale TODO in xapic_state_test

The TODO about using the number of vCPUs instead of vcpu.id + 1
was already addressed by commit 376bc1b458c9 ("KVM: selftests: Don't
assume vcpu->id is '0' in xAPIC state test"). The comment is now
stale and can be removed.

Signed-off-by: Sukrut Heroorkar <hsukrut3@gmail.com>
Link: https://lore.kernel.org/r/20250908210547.12748-1-hsukrut3@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount

Add a PMU errata framework and use it to relax precise event counts on
Atom platforms that overcount "Instruction Retired" and "Branch Instruction
Retired" events, as the overcount issues on VM-Exit/VM-Entry are impossible
to prevent from userspace, e.g. the test can't prevent host IRQs.

Setup errata during early initialization and automatically sync the mask
to VMs so that tests can check for errata without having to manually
manage host=>guest variables.

For Intel Atom CPUs, the PMU events "Instruction Retired" or
"Branch Instruction Retired" may be overcounted for some certain
instructions, like FAR CALL/JMP, RETF, IRET, VMENTRY/VMEXIT/VMPTRLD
and complex SGX/SMX/CSTATE instructions/flows.

The detailed information can be found in the errata (section SRF7):
https://edc.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/sierra-forest/xeon-6700-series-processor-with-e-cores-specification-update/errata-details/

For the Atom platforms before Sierra Forest (including Sierra Forest),
Both 2 events "Instruction Retired" and "Branch Instruction Retired" would
be overcounted on these certain instructions, but for Clearwater Forest
only "Instruction Retired" event is overcounted on these instructions.

Signed-off-by: dongsheng <dongsheng.x.zhang@intel.com>
Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250919214648.1585683-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: selftests: Validate more arch-events in pmu_counters_test

Add support for 5 new architectural events (4 topdown level 1 metrics
events and LBR inserts event) that will first show up in Intel's
Clearwater Forest CPUs. Detailed info about the new events can be found
in SDM section 21.2.7 "Pre-defined Architectural Performance Events".

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
[sean: drop "unavailable_mask" changes]
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250919214648.1585683-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: selftests: Reduce number of "unavailable PMU events" combos tested

Reduce the number of combinations of unavailable PMU events masks that are
testing by the PMU counters test. In reality, testing every possible
combination isn't all that interesting, and certainly not worth the tens
of seconds (or worse, minutes) of runtime. Fully testing the N^2 space
will be especially problematic in the near future, as 5! new arch events
are on their way.

Use alternating bit patterns (and 0 and -1u) in the hopes that _if_ there
is ever a KVM bug, it's not something horribly convoluted that shows up
only with a super specific pattern/value.

Reported-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250919214648.1585683-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: selftests: Track unavailable_mask for PMU events as 32-bit value

Track the mask of "unavailable" PMU events as a 32-bit value. While bits
31:9 are currently reserved, silently truncating those bits is unnecessary
and asking for missed coverage. To avoid running afoul of the sanity check
in vcpu_set_cpuid_property(), explicitly adjust the mask based on the
non-reserved bits as reported by KVM's supported CPUID.

Opportunistically update the "all ones" testcase to pass -1u instead of
0xff.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250919214648.1585683-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: selftests: Add timing_info bit support in vmx_pmu_caps_test

A new bit PERF_CAPABILITIES[17] called "PEBS_TIMING_INFO" bit is added
to indicated if PEBS supports to record timing information in a new
"Retried Latency" field.

Since KVM requires user can only set host consistent PEBS capabilities,
otherwise the PERF_CAPABILITIES setting would fail, add pebs_timing_info
into the "immutable_caps" to block host inconsistent PEBS configuration
and cause errors.

Opportunistically drop the anythread_deprecated bit. It isn't and likely
never was a PERF_CAPABILITIES flag, the test's definition snuck in when
the union was copy+pasted from the kernel's definition.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
[sean: call out anythread_deprecated change]
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250919214648.1585683-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

LoongArch: KVM: Move kvm_iocsr tracepoint out of generic code

The tracepoint kvm_iocsr is only used by the loongarch architecture. As
trace events can take up to 5K of memory, move this tracepoint into the
LoongArch specific tracing file so that it doesn't waste memory for all
other architectures.

Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: KVM: Rework pch_pic_update_batch_irqs()

Use proper bitmap API and drop all the housekeeping code.

Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: KVM: Add different length support in loongarch_pch_pic_write()

With function loongarch_pch_pic_write(), currently there is only four
bytes register write support. But in theory, all length 1/2/4/8 should
be supported for all the registers, here add different length support
about register write emulation in function loongarch_pch_pic_write().

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: KVM: Add different length support in loongarch_pch_pic_read()

With function loongarch_pch_pic_read(), currently it is hardcoded length
for different registers, and the length comes from exising linux pch_pic
driver code. But in theory, all length 1/2/4/8 should be supported for
all the registers, here add different length support about register read
emulation in function loongarch_pch_pic_read().

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: KVM: Add IRR and ISR register read emulation

With LS7A user manual, there are registers PCH_PIC_INT_IRR_START
and PCH_PIC_INT_ISR_START. So add read access emulation in function
loongarch_pch_pic_read() here.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: KVM: Set version information at initial stage

Register PCH_PIC_INT_ID constains version and supported irq number
information, and it is a read only register. The detailed value can
be set at initial stage, rather than read callback.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: KVM: Access mailbox directly in mail_send()

With function mail_send(), it is to write mailbox of other VCPUs.
Existing simple APIs read_mailbox()/write_mailbox() can be used directly
rather than send command on IOCSR address.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: KVM: Add implementation with IOCSR_IPI_SET

IPI IOCSR register IOCSR_IPI_SET can send ipi interrupt to other vCPUs,
but it can also send an interrupt to vCPU itself. Indeed there are such
operations on Linux as arch_irq_work_raise() which will send ipi message
to vCPU itself.

Here add implementation of write operation with IOCSR_IPI_SET register.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: KVM: Add sign extension with kernel IOCSR read emulation

Function kvm_complete_iocsr_read() is to add sign extension with IOCSR
read emulation, it is used in user space IOCSR read completion now. Also
it should be used in kernel IOCSR read emulation.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: KVM: Add sign extension with kernel MMIO read emulation

Function kvm_complete_mmio_read() is to add sign extension with MMIO
read emulation, it is used in user space MMIO read completion now. Also
it should be used in kernel MMIO read emulation.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: KVM: Add PTW feature detection on new hardware

With new Loongson-3A6000/3C6000 hardware platforms (or later), hardware
page table walking (PTW) feature is supported on host. So here add this
feature detection on KVM host.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

KVM: x86: Fix hypercalls docs section number order

Commit 4180bf1b655a79 ("KVM: X86: Implement "send IPI" hypercall")
documents KVM_HC_SEND_IPI hypercall, yet its section number duplicates
KVM_HC_CLOCK_PAIRING one (which both are 6th). Fix the numbering order
so that the former should be 7th.

Fixes: 4180bf1b655a ("KVM: X86: Implement "send IPI" hypercall")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20250909003952.10314-1-bagasdotme@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: x86: Don't treat ENTER and LEAVE as branches, because they aren't

Remove the IsBranch flag from ENTER and LEAVE in KVM's emulator, as ENTER
and LEAVE are stack operations, not branches. Add forced emulation of
said instructions to the PMU counters test to prove that KVM diverges from
hardware, and to guard against regressions.

Opportunistically add a missing "1 MOV" to the selftest comment regarding
the number of instructions per loop, which commit 7803339fa929 ("KVM:
selftests: Use data load to trigger LLC references/misses in Intel PMU")
forgot to add.

Fixes: 018d70ffcfec ("KVM: x86: Update vPMCs when retiring branch instructions")
Cc: Jim Mattson <jmattson@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919004639.1360453-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

Linux 6.17-rc7

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
"Fixes to the Allwinner and Renesas clk drivers:

   - Do the math properly in Allwinner's ccu_mp_recalc_rate() so clk
     rates aren't bogus

   - Fix a clock domain regression on Renesas R-Car M1A, R-Car H1,
     and RZ/A1 by registering the domain after the pmdomain bus is
     registered instead of before"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: sunxi-ng: mp: Fix dual-divider clock rate readback
  clk: renesas: mstp: Add genpd OF provider at postcore_initcall()

Merge tag 'for-6.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull a few more btrfs fixes from David Sterba:

- in tree-checker, fix wrong size of check for inode ref item

- in ref-verify, handle combination of mount options that allow
   partially damaged extent tree (reported by syzbot)

- additional validation of compression mount option to catch invalid
   string as level

* tag 'for-6.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: reject invalid compression level
  btrfs: ref-verify: handle damaged extent root tree
  btrfs: tree-checker: fix the incorrect inode ref size check

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fix from James Bottomley:
"One driver fix for a dma error checking thinko"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: mcq: Fix memory allocation checks for SQE and CQE

Merge tag 'firewire-fixes-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire fix from Takashi Sakamoto:
"When new structures and events were added to UAPI in v6.5 kernel, the
  required update to the subsystem ABI version returned to userspace
  client was overlooked. The version is now updated"

* tag 'firewire-fixes-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: core: fix overlooked update of subsystem ABI version

Merge tag 'x86-urgent-2025-09-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Ingo Molnar:
"Fix a SEV-SNP regression when CONFIG_KVM_AMD_SEV is disabled"

* tag 'x86-urgent-2025-09-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Guard sev_evict_cache() with CONFIG_AMD_MEM_ENCRYPT

Merge branch kvm-arm64/misc-6.18 into kvmarm-master/next

* kvm-arm64/misc-6.18:
  : .
  : .
  : Misc improvements and bug fixes:
  :
  : - Fix XN handling in the S2 page table dumper
  :   (20250809135356.1003520-1-r09922117@csie.ntu.edu.tw)
  :
  : - Fix sanitity checks for huge mapping with pKVM running np guests
  :   (20250815162655.121108-1-ben.horgan@arm.com)
  :
  : - Fix use of TRBE when KVM is disabled, and Linux running under
  :   a lesser hypervisor (20250902-etm_crash-v2-1-aa9713a7306b@oss.qualcomm.com)
  :
  : - Fix out of date MTE-related comments (20250915155234.196288-1-alexandru.elisei@arm.com)
  :
  : - Fix PSCI BE support when running a NV guest (20250916161103.1040727-1-maz@kernel.org)
  :
  : - Fix page reference leak when refusing to map a page due to mismatched attributes
  :   (20250917130737.2139403-1-tabba@google.com)
  :
  : - Add trap handling for PMSDSFR_EL1
  :   (20250901-james-perf-feat_spe_eft-v8-7-2e2738f24559@linaro.org)
  :
  : - Add advertisement from FEAT_LSFE (Large System Float Extension)
  :   (20250918-arm64-lsfe-v4-1-0abc712101c7@kernel.org)
  : .
  KVM: arm64: Expose FEAT_LSFE to guests
  KVM: arm64: Add trap configs for PMSDSFR_EL1
  KVM: arm64: Fix page leak in user_mem_abort()
  KVM: arm64: Fix kvm_vcpu_{set,is}_be() to deal with EL2 state
  KVM: arm64: Update stale comment for sanitise_mte_tags()
  KVM: arm64: Return early from trace helpers when KVM isn't available
  KVM: arm64: Fix debug checking for np-guests using huge mappings
  KVM: arm64: ptdump: Don't test PTE_VALID alongside other attributes

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/nv-misc-6.18 into kvmarm-master/next

* kvm-arm64/nv-misc-6.18:
  : .
  : Various NV-related fixes:
  :
  : - Relax KVM's SError injection to consider that HCR_EL2.AMO's
  :   effective value is 1 when HCR_EL2.{E2H,TGE)=={1,0}.
  :   (20250918164632.410404-1-oliver.upton@linux.dev)
  :
  : - Allow userspace to disable some S2 base granule sizes
  :   (20250918165505.415017-1-oliver.upton@linux.dev)
  : .
  KVM: arm64: nv: Allow userspace to de-feature stage-2 TGRANs
  KVM: arm64: nv: Treat AMO as 1 when at EL2 and {E2H,TGE} = {1, 0}

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/el2-feature-control into kvmarm-master/next

* kvm-arm64/el2-feature-control: (23 commits)
  : .
  : General rework of EL2 features that can be disabled to satisfy
  : the requirement of migration between heterogeneous hosts:
  :
  : - Handle effective RES0 behaviour of undefined registers, making sure
  :   that disabling a feature affects full registeres, and not just
  :   individual control bits. (20250918151402.1665315-1-maz@kernel.org)
  :
  : - Allow ID_AA64MMFR1_EL1.{TWED,HCX} to be disabled from userspace.
  :   (20250911114621.3724469-1-yangjinqian1@huawei.com)
  :
  : - Turn the NV feature management into a deny-list, and expose
  :   missing features to EL2 guests.
  :   (20250912212258.407350-1-oliver.upton@linux.dev)
  : .
  KVM: arm64: nv: Expose up to FEAT_Debugv8p8 to NV-enabled VMs
  KVM: arm64: nv: Advertise FEAT_TIDCP1 to NV-enabled VMs
  KVM: arm64: nv: Advertise FEAT_SpecSEI to NV-enabled VMs
  KVM: arm64: nv: Expose FEAT_TWED to NV-enabled VMs
  KVM: arm64: nv: Exclude guest's TWED configuration when TWE isn't set
  KVM: arm64: nv: Expose FEAT_AFP to NV-enabled VMs
  KVM: arm64: nv: Expose FEAT_ECBHB to NV-enabled VMs
  KVM: arm64: nv: Expose FEAT_RASv1p1 via RAS_frac
  KVM: arm64: nv: Expose FEAT_DF2 to NV-enabled VMs
  KVM: arm64: nv: Don't erroneously claim FEAT_DoubleLock for NV VMs
  KVM: arm64: nv: Convert masks to denylists in limit_nv_id_reg()
  KVM: arm64: selftests: Test writes to ID_AA64MMFR1_EL1.{HCX, TWED}
  KVM: arm64: Make ID_AA64MMFR1_EL1.{HCX, TWED} writable from userspace
  KVM: arm64: Convert MDCR_EL2 RES0 handling to compute_reg_res0_bits()
  KVM: arm64: Convert SCTLR_EL1 RES0 handling to compute_reg_res0_bits()
  KVM: arm64: Enforce absence of FEAT_TCR2 on TCR2_EL2
  KVM: arm64: Enforce absence of FEAT_SCTLR2 on SCTLR2_EL{1,2}
  KVM: arm64: Convert HCR_EL2 RES0 handling to compute_reg_res0_bits()
  KVM: arm64: Enforce absence of FEAT_HCX on HCRX_EL2
  KVM: arm64: Enforce absence of FEAT_FGT2 on FGT2 registers
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/nv-debug into kvmarm-master/next

* kvm-arm64/nv-debug:
  : .
  : Fix handling of MDSCR_EL1 in NV context, which is unfortunately
  : mishandled by the architecture. Patches courtesy of Oliver Upton
  : (20250917203125.283116-2-oliver.upton@linux.dev)
  : .
  KVM: arm64: nv: Apply guest's MDCR traps in nested context
  KVM: arm64: nv: Trap debug registers when in hyp context

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/gic-v5-nv into kvmarm-master/next

* kvm-arm64/gic-v5-nv:
  : .
  : Add NV support to GICv5 in GICv3 emulation mode, ensuring that the v3
  : guest support is identical to that of a pure v3 platform.
  :
  : Patches courtesy of Sascha Bischoff (20250828105925.3865158-1-sascha.bischoff@arm.com)
  : .
  irqchip/gic-v5: Drop has_gcie_v3_compat from gic_kvm_info
  KVM: arm64: Use ARM64_HAS_GICV5_LEGACY for GICv5 probing
  arm64: cpucaps: Add GICv5 Legacy vCPU interface (GCIE_LEGACY) capability
  KVM: arm64: Enable nested for GICv5 host with FEAT_GCIE_LEGACY
  KVM: arm64: Don't access ICC_SRE_EL2 if GICv3 doesn't support v2 compatibility

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/52bit-at into kvmarm-master/next

* kvm-arm64/52bit-at:
  : .
  : Upgrade the S1 page table walker to support 52bit PA, and use it to
  : report the fault level when taking a S2 fault on S1PTW, which is required
  : by the architecture (20250915114451.660351-1-maz@kernel.org).
  : .
  KVM: arm64: selftest: Expand external_aborts test to look for TTW levels
  KVM: arm64: Populate level on S1PTW SEA injection
  KVM: arm64: Add S1 IPA to page table level walker
  KVM: arm64: Add filtering hook to S1 page table walk
  KVM: arm64: Don't switch MMU on translation from non-NV context
  KVM: arm64: Allow EL1 control registers to be accessed from the CPU state
  KVM: arm64: Allow use of S1 PTW for non-NV vcpus
  KVM: arm64: Report faults from S1 walk setup at the expected start level
  KVM: arm64: Expand valid block mappings to FEAT_LPA/LPA2 support
  KVM: arm64: Populate PAR_EL1 with 52bit addresses
  KVM: arm64: Compute shareability for LPA2
  KVM: arm64: Pass the walk_info structure to compute_par_s1()
  KVM: arm64: Decouple output address from the PT descriptor
  KVM: arm64: Compute 52bit TTBR address and alignment
  KVM: arm64: Account for 52bit when computing maximum OA
  KVM: arm64: Add helper computing the state of 52bit PA support

Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: selftest: Expand external_aborts test to look for TTW levels

Add a basic test corrupting a level-2 table entry to check that
the resulting abort is a SEA on a PTW at level-3.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Populate level on S1PTW SEA injection

Our fault injection mechanism is mildly primitive, and doesn't
really implement the architecture when it comes to reporting
the level of a failing S1 PTW (we blindly report a SEA outside
of a PTW).

Now that we can walk the S1 page tables and look for a particular
IPA in the descriptors, it is pretty easy to improve the SEA
injection code.

Note that we only do it for AArch64 guests, and that 32bit guests
are left to their own device (oddly enough, I don't fancy writing
a 32bit PTW...).

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Add S1 IPA to page table level walker

Use the filtering hook infrastructure to implement a new walker
that, for a given VA and an IPA, returns the level of the first
occurence of this IPA in the walk from that VA.

This will be used to improve our SEA syndrome reporting.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Add filtering hook to S1 page table walk

Add a filtering hook that can get called on each level of the
walk, and providing access to the full state.

Crucially, this is called *before* the access is made, so that
it is possible to track down the level of a faulting access.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Don't switch MMU on translation from non-NV context

If calling into the AT code from guest EL1, there is no need
to consider any context switch, as we are guaranteed to be
in the correct context.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Allow EL1 control registers to be accessed from the CPU state

As we are about to plug the SW PTW into the EL1-only code, we can
no longer assume that the EL1 state is not resident on the CPU,
as we don't necessarily get there from EL2 traps.

Turn the __vcpu_sys_reg() access on the EL1 state into calls to
the vcpu_read_sys_reg() helper, which is guaranteed to do the
right thing.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Allow use of S1 PTW for non-NV vcpus

As we are about to use the S1 PTW in non-NV contexts, we must make
sure that we don't evaluate the EL2 state when dealing with the EL1&0
translation regime.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Report faults from S1 walk setup at the expected start level

Translation faults from TTBR must be reported on the start level,
and not level-0. Enforcing this requires moving quite a lot of
code around so that the start level can be computed early enough
that it is usable.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Expand valid block mappings to FEAT_LPA/LPA2 support

With 52bit PAs, block mappings can exist at different levels (such
as level 0 for 4kB pages, or level 1 for 16kB and 64kB pages).

Account for this in walk_s1().

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Populate PAR_EL1 with 52bit addresses

Expand the output address populated in PAR_EL1 to 52bit addresses.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Compute shareability for LPA2

LPA2 gets the memory access shareability from TCR_ELx instead of
getting it form the descriptors. Store it in the walk info struct
so that it is passed around and evaluated as required.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Pass the walk_info structure to compute_par_s1()

Instead of just passing the translation regime, pass the full
walk_info structure to compute_par_s1(). This will help further
chamges that will require it.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Decouple output address from the PT descriptor

Add a helper converting the descriptor into a nicely formed OA,
irrespective of the in-descriptor representation (< 52bit, LPA
or LPA2).

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Compute 52bit TTBR address and alignment

52bit addresses from TTBR need extra adjustment and alignment
checks. Implement the requirements of the architecture.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Account for 52bit when computing maximum OA

Adjust the computation of the max OA to account for 52bit PAs.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Add helper computing the state of 52bit PA support

Track whether the guest is using 52bit PAs, either LPA or LPA2.
This further simplifies the handling of LVA for 4k and 16k pages,
as LPA2 implies LVA in this case.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge tag 'sunxi-clk-fixes-for-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into clk-fixes

Pull an Allwinner clk driver fix from Chen-Yu Tsai:

- One fix for the clock rate readback on the recently added dual
divider clocks

* tag 'sunxi-clk-fixes-for-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
clk: sunxi-ng: mp: Fix dual-divider clock rate readback

firewire: core: fix overlooked update of subsystem ABI version

In kernel v6.5, several functions were added to the cdev layer. This
required updating the default version of subsystem ABI up to 6, but
this requirement was overlooked.

This commit updates the version accordingly.

Fixes: 6add87e9764d ("firewire: cdev: add new version of ABI to notify time stamp at request/response subaction of transaction#")
Link: https://lore.kernel.org/r/20250920025148.163402-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>

Merge tag '6.17-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- Two unlink fixes: one for rename and one for deferred close

- Four smbdirect/RDMA fixes: fix buffer leak in negotiate, two fixes
   for races in smbd_destroy, fix offset and length checks in recv_done

* tag '6.17-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: fix smbdirect_recv_io leak in smbd_negotiate() error path
  smb: client: fix file open check in __cifs_unlink()
  smb: client: let smbd_destroy() call disable_work_sync(&info->post_send_credits_work)
  smb: client: use disable[_delayed]_work_sync in smbdirect.c
  smb: client: fix filename matching of deferred files
  smb: client: let recv_done verify data_offset, data_length and remaining_data_length

Merge tag 'iommu-fixes-v6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu fixes from Joerg Roedel:

- Fixes for memory leak and memory corruption bugs on S390 and AMD-Vi

- Race condition fix in AMD-Vi page table code and S390 device attach
   code

- Intel VT-d: Fix alignment checks in __domain_mapping()

- AMD-Vi: Fix potentially incorrect DTE settings when device has
   aliases

* tag 'iommu-fixes-v6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommu/amd/pgtbl: Fix possible race while increase page table level
  iommu/amd: Fix alias device DTE setting
  iommu/s390: Make attach succeed when the device was surprise removed
  iommu/vt-d: Fix __domain_mapping()'s usage of switch_to_super_page()
  iommu/s390: Fix memory corruption when using identity domain
  iommu/amd: Fix ivrs_base memleak in early_amd_iommu_init()

KVM: TDX: Fix uninitialized error code for __tdx_bringup()

Fix a Smatch static checker warning reported by Dan:

arch/x86/kvm/vmx/tdx.c:3464 __tdx_bringup()
warn: missing error code 'r'

Initialize r to -EINVAL before tdx_get_sysinfo() to simplify the code and
to prevent similar issues from sneaking in later on as suggested by Kai.

Cc: stable@vger.kernel.org
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: 61bb28279623 ("KVM: TDX: Get system-wide info about TDX module on initialization")
Suggested-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Tony Lindgren <tony.lindgren@linux.intel.com>
Link: https://lore.kernel.org/r/20250918053226.802204-1-tony.lindgren@linux.intel.com
[sean: tag for stable]
Signed-off-by: Sean Christopherson <seanjc@google.com>

Merge tag 'block-6.17-20250918' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:
"A set of fixes for an issue with md array assembly and drbd for
  devices supporting write zeros"

* tag 'block-6.17-20250918' of git://git.kernel.dk/linux:
  drbd: init queue_limits->max_hw_wzeroes_unmap_sectors parameter
  md: init queue_limits->max_hw_wzeroes_unmap_sectors parameter

Merge tag 'io_uring-6.17-20250919' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

- Fix for a regression introduced in the io-wq worker creation logic.

- Remove the allocation cache for the msg_ring io_kiocb allocations. I
   have a suspicion that there's a bug there, and since we just fixed
   one in that area, let's just yank the use of that cache entirely.
   It's not that important, and it kills some code.

- Treat a closed ring like task exiting in that any requests that
   trigger post that condition should just get canceled. Doesn't fix any
   real issues, outside of having tasks being able to rely on that
   guarantee.

- Fix for a bug in the network zero-copy notification mechanism, where
   a comparison for matching tctx/ctx for notifications was buggy in
   that it didn't correctly compare with the previous notification.

* tag 'io_uring-6.17-20250919' of git://git.kernel.dk/linux:
  io_uring: fix incorrect io_kiocb reference in io_link_skb
  io_uring/msg_ring: kill alloc_cache for io_kiocb allocations
  io_uring: include dying ring in task_work "should cancel" state
  io_uring/io-wq: fix `max_workers` breakage and `nr_workers` underflow

Merge tag 'gpio-fixes-for-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

- fix an ACPI I2C HID driver breakage due to not initializing a
   structure on the stack and passing garbage down to GPIO core

- ignore touchpad wakeup on GPD G1619-05

- fix debouncing configuration when looking up GPIOs in ACPI

* tag 'gpio-fixes-for-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpiolib: acpi: initialize acpi_gpio_info struct
  gpiolib: acpi: Ignore touchpad wakeup on GPD G1619-05
  gpiolib: acpi: Program debounce when finding GPIO