git.ipfire.org Git - thirdparty/kernel/linux.git/commitdiff
KVM: SVM: Triple fault L1 on unintercepted EFER.SVME clear by L2
author Yosry Ahmed <yosry.ahmed@linux.dev>
Mon, 9 Feb 2026 19:51:41 +0000 (19:51 +0000)
committer Sean Christopherson <seanjc@google.com>
Thu, 5 Mar 2026 00:09:10 +0000 (16:09 -0800)
KVM tracks when EFER.SVME is set and cleared to initialize and tear down
nested state. However, it doesn't differentiate if EFER.SVME is getting
toggled in L1 or L2+. If L2 clears EFER.SVME, and L1 does not intercept
the EFER write, KVM exits guest mode and tears down nested state while
L2 is running, executing L1 without injecting a proper #VMEXIT.

According to the APM:

    The effect of turning off EFER.SVME while a guest is running is
    undefined; therefore, the VMM should always prevent guests from
    writing EFER.

Since the behavior is architecturally undefined, KVM gets to choose what
to do. Inject a triple fault into L1 as a more graceful option than
running L1 with corrupted state.
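
The decision described above can be sketched as a small standalone model.
This is only an illustration under stated assumptions, not the KVM code:
struct model_vcpu, its fields, and model_set_efer() are hypothetical
stand-ins for the vCPU state, is_guest_mode(), vcpu->wants_to_run,
KVM_REQ_TRIPLE_FAULT and svm_leave_nested(). Only the EFER.SVME bit position
(bit 12) is architectural.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_SVME (1ULL << 12)  /* architectural bit position of EFER.SVME */

/* Hypothetical stand-in for the vCPU state that the patch consults. */
struct model_vcpu {
	uint64_t efer;
	bool guest_mode;            /* models is_guest_mode(vcpu) */
	bool wants_to_run;          /* models vcpu->wants_to_run */
	bool triple_fault_pending;  /* models a queued KVM_REQ_TRIPLE_FAULT */
};

/* Hypothetical model of the EFER.SVME 1 -> 0 handling described above. */
static void model_set_efer(struct model_vcpu *v, uint64_t efer)
{
	uint64_t old_efer = v->efer;

	if ((old_efer & EFER_SVME) != (efer & EFER_SVME) &&
	    !(efer & EFER_SVME)) {
		/*
		 * SVME is being cleared.  Request a triple fault only if
		 * L2 is active and the vCPU is actually running, so that
		 * userspace restoring state is not penalized.
		 */
		if (v->guest_mode && v->wants_to_run)
			v->triple_fault_pending = true;

		/* Models svm_leave_nested(): the vCPU is forced back to L1. */
		v->guest_mode = false;
	}

	v->efer = efer;
}
```

In this model, a write from a running L2 leaves the vCPU in L1 with a triple
fault pending, while the same write issued by userspace (wants_to_run false)
or by L1 itself only tears down the nested state.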

Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
base-commit: 95deaec3557dced322e2540bfa426e60e5373d46
Link: https://patch.msgid.link/20260209195142.2554532-2-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
arch/x86/kvm/svm/svm.c

index 2c511f86b79d95282b7a178f7f257348174fb2c8..4bf0f5d7167faa3a98b1f8dc3590b971d33d6f44 100644
@@ -217,6 +217,19 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 
        if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) {
                if (!(efer & EFER_SVME)) {
+                       /*
+                        * Architecturally, clearing EFER.SVME while a guest is
+                        * running yields undefined behavior, i.e. KVM can do
+                        * literally anything.  Force the vCPU back into L1 as
+                        * that is the safest option for KVM, but synthesize a
+                        * triple fault (for L1!) so that KVM at least doesn't
+                        * run random L2 code in the context of L1.  Do so if
+                        * and only if the vCPU is actively running, e.g. to
+                        * avoid false positives if userspace is stuffing state.
+                        */
+                       if (is_guest_mode(vcpu) && vcpu->wants_to_run)
+                               kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+
                        svm_leave_nested(vcpu);
                        /* #GP intercept is still needed for vmware backdoor */
                        if (!enable_vmware_backdoor)