Extend KVM's restriction on CPUID and feature MSR changes to disallow
updates while L2 is active in addition to rejecting updates after the vCPU
has run at least once. Like post-run vCPU model updates, attempting to
react to model changes while L2 is active is practically infeasible, e.g.
KVM would need to do _something_ in response to impossible situations where
userspace has a removed a feature that was consumed as parted of nested
VM-Enter.
In practice, disallowing vCPU model changes while L2 is active is largely
uninteresting, as the only way for L2 to be active without the vCPU having
run at least once is if userspace stuffed state via KVM_SET_NESTED_STATE.
And because KVM_SET_NESTED_STATE can't put the vCPU into L2 without
userspace first defining the vCPU model, e.g. to enable SVM/VMX, modifying
the vCPU model while L2 is active would require deliberately setting the
vCPU model, then loading nested state, and then changing the model. I.e.
no sane VMM should run afoul of the new restriction, and any VMM that does
encounter problems has likely been running a broken setup for a long time.
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Kevin Cheng <chengkev@google.com>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230205641.4092235-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
BUILD_BUG_ON(sizeof(vcpu_caps) != sizeof(vcpu->arch.cpu_caps));
/*
- * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
- * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
- * tracked in kvm_mmu_page_role. As a result, KVM may miss guest page
- * faults due to reusing SPs/SPTEs. In practice no sane VMM mucks with
- * the core vCPU model on the fly. It would've been better to forbid any
- * KVM_SET_CPUID{,2} calls after KVM_RUN altogether but unfortunately
- * some VMMs (e.g. QEMU) reuse vCPU fds for CPU hotplug/unplug and do
+ * KVM does not correctly handle changing guest CPUID after KVM_RUN or
+ * while L2 is active, as MAXPHYADDR, GBPAGES support, AMD reserved bit
+ * behavior, etc. aren't tracked in kvm_mmu_page_role, and L2 state
+ * can't be adjusted (without breaking L2 in some way). As a result,
+ * KVM may reuse SPs/SPTEs and/or run L2 with bad/misconfigured state.
+ *
+ * In practice, no sane VMM mucks with the core vCPU model on the fly.
+ * It would've been better to forbid any KVM_SET_CPUID{,2} calls after
+ * KVM_RUN or KVM_SET_NESTED_STATE altogether, but unfortunately some
+ * VMMs (e.g. QEMU) reuse vCPU fds for CPU hotplug/unplug and do
* KVM_SET_CPUID{,2} again. To support this legacy behavior, check
* whether the supplied CPUID data is equal to what's already set.
*/
- if (kvm_vcpu_has_run(vcpu)) {
+ if (!kvm_can_set_cpuid_and_feature_msrs(vcpu)) {
r = kvm_cpuid_check_equal(vcpu, e2, nent);
if (r)
goto err;
vcpu->arch.nested_mmu.cpu_role.ext.valid = 0;
kvm_mmu_reset_context(vcpu);
- /*
- * Changing guest CPUID after KVM_RUN is forbidden, see the comment in
- * kvm_arch_vcpu_ioctl().
- */
- KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm);
+ KVM_BUG_ON(!kvm_can_set_cpuid_and_feature_msrs(vcpu), vcpu->kvm);
}
void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
- if (KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm))
+ if (KVM_BUG_ON(!kvm_can_set_cpuid_and_feature_msrs(vcpu), vcpu->kvm))
return;
/*
u64 val;
/*
- * Disallow writes to immutable feature MSRs after KVM_RUN. KVM does
- * not support modifying the guest vCPU model on the fly, e.g. changing
- * the nVMX capabilities while L2 is running is nonsensical. Allow
- * writes of the same value, e.g. to allow userspace to blindly stuff
- * all MSRs when emulating RESET.
- */
- if (kvm_vcpu_has_run(vcpu) && kvm_is_immutable_feature_msr(index) &&
+ * Reject writes to immutable feature MSRs if the vCPU model is frozen,
+ * as KVM doesn't support modifying the guest vCPU model on the fly,
+ * e.g. changing the VMX capabilities MSRs while L2 is active is
+ * nonsensical. Allow writes of the same value, e.g. so that userspace
+ * can blindly stuff all MSRs when emulating RESET.
+ */
+ if (!kvm_can_set_cpuid_and_feature_msrs(vcpu) &&
+ kvm_is_immutable_feature_msr(index) &&
(do_get_msr(vcpu, index, &val) || *data != val))
return -EINVAL;
indirect_branch_prediction_barrier();
}
-static inline bool kvm_vcpu_has_run(struct kvm_vcpu *vcpu)
+/*
+ * Disallow modifying CPUID and feature MSRs, which affect the core virtual CPU
+ * model exposed to the guest and virtualized by KVM, if the vCPU has already
+ * run or is in guest mode (L2). In both cases, KVM has already consumed the
+ * current virtual CPU model, and doesn't support "unwinding" to react to the
+ * new model.
+ *
+ * Note, the only way is_guest_mode() can be true with 'last_vmentry_cpu == -1'
+ * is if userspace sets CPUID and feature MSRs (to enable VMX/SVM), then sets
+ * nested state, and then attempts to set CPUID and/or feature MSRs *again*.
+ */
+static inline bool kvm_can_set_cpuid_and_feature_msrs(struct kvm_vcpu *vcpu)
{
- return vcpu->arch.last_vmentry_cpu != -1;
+ return vcpu->arch.last_vmentry_cpu == -1 && !is_guest_mode(vcpu);
}
static inline void kvm_set_mp_state(struct kvm_vcpu *vcpu, int mp_state)