From: Sean Christopherson Date: Thu, 14 May 2026 21:31:15 +0000 (-0700) Subject: KVM: SVM: Only disable x2AVIC WRMSR interception for MSRs that are accelerated X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=8c63179d975f2029c948ecce622f72af616dbff7;p=thirdparty%2Fkernel%2Flinux.git KVM: SVM: Only disable x2AVIC WRMSR interception for MSRs that are accelerated When x2AVIC is enabled, disable WRMSR interception only for MSRs that are actually accelerated by hardware. Disabling interception for MSRs that aren't accelerated is functionally "fine", and in some cases a weird "win" for performance, but only for cases that should never be triggered by a well-behaved VM (writes to read-only registers; the #GP will typically occur in the guest without taking a #VMEXIT, even for fault-like exits). But overall, disabling interception for MSRs that aren't accelerated is at best confusing and unintuitive, and at worst introduces avoidable risk, as the APM's documentation is imperfect and contradictory. The table in "15.29.3.1 Virtual APIC Register Accesses" of simply states that such writes generate exits, where as "Section 15.29.10 x2AVIC" says: x2APIC MSR intercept checks and access checks have higher priority than AVIC access permission checks. CPU behavior follows the latter (which makes perfect sense), but all in all there's simply no reason to disable interception just to make a #GP faster. Note, the set of MSRs that are passed through for write is identical to VMX's set when IPI virtualization is enabled. This is not a coincidence, and is another motiviating factor for cleaning up the intercepts, as x2AVIC is functionally equivalent to APICv+IPIv. Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode") Cc: stable@vger.kernel.org Reviewed-by: Naveen N Rao (AMD) Link: https://patch.msgid.link/20260514213115.1637082-4-seanjc@google.com Signed-off-by: Sean Christopherson --- diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c index 8e4926c7b8dc..724a45c2aa23 100644 --- a/arch/x86/kvm/svm/avic.c +++ b/arch/x86/kvm/svm/avic.c @@ -124,39 +124,6 @@ static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm, { struct kvm_vcpu *vcpu = &svm->vcpu; u64 rd_regs; - - static const u32 x2avic_passthrough_msrs[] = { - X2APIC_MSR(APIC_ID), - X2APIC_MSR(APIC_LVR), - X2APIC_MSR(APIC_TASKPRI), - X2APIC_MSR(APIC_ARBPRI), - X2APIC_MSR(APIC_PROCPRI), - X2APIC_MSR(APIC_EOI), - X2APIC_MSR(APIC_RRR), - X2APIC_MSR(APIC_LDR), - X2APIC_MSR(APIC_DFR), - X2APIC_MSR(APIC_SPIV), - X2APIC_MSR(APIC_ISR), - X2APIC_MSR(APIC_TMR), - X2APIC_MSR(APIC_IRR), - X2APIC_MSR(APIC_ESR), - X2APIC_MSR(APIC_ICR), - X2APIC_MSR(APIC_ICR2), - - /* - * Note! Always intercept LVTT, as TSC-deadline timer mode - * isn't virtualized by hardware, and the CPU will generate a - * #GP instead of a #VMEXIT. - */ - X2APIC_MSR(APIC_LVTTHMR), - X2APIC_MSR(APIC_LVTPC), - X2APIC_MSR(APIC_LVT0), - X2APIC_MSR(APIC_LVT1), - X2APIC_MSR(APIC_LVTERR), - X2APIC_MSR(APIC_TMICT), - X2APIC_MSR(APIC_TMCCT), - X2APIC_MSR(APIC_TDCR), - }; int i; if (intercept == svm->x2avic_msrs_intercepted) @@ -171,9 +138,10 @@ static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm, svm_set_intercept_for_msr(vcpu, APIC_BASE_MSR + i, MSR_TYPE_R, intercept); - for (i = 0; i < ARRAY_SIZE(x2avic_passthrough_msrs); i++) - svm_set_intercept_for_msr(vcpu, x2avic_passthrough_msrs[i], - MSR_TYPE_W, intercept); + svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_W, intercept); + svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W, intercept); + svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W, intercept); + svm_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_ICR), MSR_TYPE_W, intercept); svm->x2avic_msrs_intercepted = intercept; }