From: Paolo Bonzini Date: Mon, 28 Jul 2025 15:13:57 +0000 (-0400) Subject: Merge tag 'kvm-x86-misc-6.17' of https://github.com/kvm-x86/linux into HEAD X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=1a14928e2e91a098c6117ba52b06327e3fc5072c;p=thirdparty%2Fkernel%2Flinux.git Merge tag 'kvm-x86-misc-6.17' of https://github.com/kvm-x86/linux into HEAD KVM x86 misc changes for 6.17 - Prevert the host's DEBUGCTL.FREEZE_IN_SMM (Intel only) when running the guest. Failure to honor FREEZE_IN_SMM can bleed host state into the guest. - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter (Intel only) to prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF. - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the vCPU's CPUID model. - Rework the MSR interception code so that the SVM and VMX APIs are more or less identical. - Recalculate all MSR intercepts from the "source" on MSR filter changes, and drop the dedicated "shadow" bitmaps (and their awful "max" size defines). - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the nested SVM MSRPM offsets tracker can't handle an MSR. - Advertise support for LKGS (Load Kernel GS base), a new instruction that's loosely related to FRED, but is supported and enumerated independently. - Fix a user-triggerable WARN that syzkaller found by stuffing INIT_RECEIVED, a.k.a. WFS, and then putting the vCPU into VMX Root Mode (post-VMXON). Use the same approach KVM uses for dealing with "impossible" emulation when running a !URG guest, and simply wait until KVM_RUN to detect that the vCPU has architecturally impossible state. - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of APERF/MPERF reads, so that a "properly" configured VM can "virtualize" APERF/MPERF (with many caveats). - Reject KVM_SET_TSC_KHZ if vCPUs have been created, as changing the "default" frequency is unsupported for VMs with a "secure" TSC, and there's no known use case for changing the default frequency for other VM types. --- 1a14928e2e91a098c6117ba52b06327e3fc5072c diff --cc Documentation/virt/kvm/api.rst index 6f93c8134f95c,dcddef1dd47be..fcb783735dd1b --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@@ -2006,15 -2006,8 +2006,15 @@@ frequency is KHz If the KVM_CAP_VM_TSC_CONTROL capability is advertised, this can also be used as a vm ioctl to set the initial tsc frequency of subsequently - created vCPUs. + created vCPUs. Note, the vm ioctl is only allowed prior to creating vCPUs. +For TSC protected Confidential Computing (CoCo) VMs where TSC frequency +is configured once at VM scope and remains unchanged during VM's +lifetime, the vm ioctl should be used to configure the TSC frequency +and the vcpu ioctl is not supported. + +Example of such CoCo VMs: TDX guests. + 4.56 KVM_GET_TSC_KHZ -------------------- diff --cc arch/x86/include/asm/kvm_host.h index bfa6d05d38bc0,f2f641583a503..c3cf1875606b2 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@@ -1408,11 -1392,9 +1408,8 @@@ struct kvm_arch gpa_t wall_clock; - bool mwait_in_guest; - bool hlt_in_guest; - bool pause_in_guest; - bool cstate_in_guest; + u64 disabled_exits; - unsigned long irq_sources_bitmap; s64 kvmclock_offset; /* diff --cc arch/x86/kvm/x86.c index f641903659e9a,a247f67954eab..9ea2e54e75655 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@@ -10920,12 -11046,22 +10924,22 @@@ static int vcpu_enter_guest(struct kvm_ set_debugreg(vcpu->arch.eff_db[3], 3); /* When KVM_DEBUGREG_WONT_EXIT, dr6 is accessible in guest. */ if (unlikely(vcpu->arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT)) - kvm_x86_call(set_dr6)(vcpu, vcpu->arch.dr6); + run_flags |= KVM_RUN_LOAD_GUEST_DR6; } else if (unlikely(hw_breakpoint_active())) { - set_debugreg(0, 7); + set_debugreg(DR7_FIXED_1, 7); } - vcpu->arch.host_debugctl = get_debugctlmsr(); + /* + * Refresh the host DEBUGCTL snapshot after disabling IRQs, as DEBUGCTL + * can be modified in IRQ context, e.g. via SMP function calls. Inform + * vendor code if any host-owned bits were changed, e.g. so that the + * value loaded into hardware while running the guest can be updated. + */ + debug_ctl = get_debugctlmsr(); + if ((debug_ctl ^ vcpu->arch.host_debugctl) & kvm_x86_ops.HOST_OWNED_DEBUGCTL && + !vcpu->arch.guest_state_protected) + run_flags |= KVM_RUN_LOAD_DEBUGCTL; + vcpu->arch.host_debugctl = debug_ctl; guest_timing_enter_irqoff(); diff --cc tools/testing/selftests/kvm/include/kvm_util.h index 1ddf85eebd1d0,492f0edc79559..e18d4bb1372fb --- a/tools/testing/selftests/kvm/include/kvm_util.h +++ b/tools/testing/selftests/kvm/include/kvm_util.h @@@ -18,9 -18,10 +18,11 @@@ #include #include +#include #include + #include + #include "kvm_util_arch.h" #include "kvm_util_types.h" #include "sparsebit.h"