From: Sasha Levin
Date: Sat, 13 Jun 2026 15:52:04 +0000 (-0400)
Subject: Fixes for all trees
X-Git-Url: http://git.ipfire.org/gitweb/?a=commitdiff_plain;ds=sidebyside;p=thirdparty%2Fkernel%2Fstable-queue.git

Fixes for all trees

Signed-off-by: Sasha Levin
---

diff --git a/queue-6.12/kvm-vmx-update-svi-during-runtime-apicv-activation.patch b/queue-6.12/kvm-vmx-update-svi-during-runtime-apicv-activation.patch
new file mode 100644
index 0000000000..e2c471a71f
--- /dev/null
+++ b/queue-6.12/kvm-vmx-update-svi-during-runtime-apicv-activation.patch
@@ -0,0 +1,156 @@
+From 38915d6cbfd63e277b35055fd4ab769fa59fe32e Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Fri, 12 Jun 2026 14:10:01 -0700
+Subject: KVM: VMX: Update SVI during runtime APICv activation
+
+From: Dongli Zhang
+
+commit b2849bec936be642b5420801f902337f2507648e upstream.
+
+The APICv (apic->apicv_active) can be activated or deactivated at runtime,
+for instance, because of APICv inhibit reasons. Intel VMX employs different
+mechanisms to virtualize LAPIC based on whether APICv is active.
+
+When APICv is activated at runtime, GUEST_INTR_STATUS is used to configure
+and report the current pending IRR and ISR states. Unless a specific vector
+is explicitly included in EOI_EXIT_BITMAP, its EOI will not be trapped to
+KVM. Intel VMX automatically clears the corresponding ISR bit based on the
+GUEST_INTR_STATUS.SVI field.
+
+When APICv is deactivated at runtime, the VM_ENTRY_INTR_INFO_FIELD is used
+to specify the next interrupt vector to invoke upon VM-entry. The
+VMX IDT_VECTORING_INFO_FIELD is used to report un-invoked vectors on
+VM-exit. EOIs are always trapped to KVM, so the software can manually clear
+pending ISR bits.
+
+There are scenarios where, with APICv activated at runtime, a guest-issued
+EOI may not be able to clear the pending ISR bit.
+
+Taking vector 236 as an example, here is one scenario.
+
+1. Suppose APICv is inactive. Vector 236 is pending in the IRR.
+2. To handle KVM_REQ_EVENT, KVM moves vector 236 from the IRR to the ISR,
+and configures the VM_ENTRY_INTR_INFO_FIELD via vmx_inject_irq().
+3. After VM-entry, vector 236 is invoked through the guest IDT. At this
+point, the data in VM_ENTRY_INTR_INFO_FIELD is no longer valid. The guest
+interrupt handler for vector 236 is invoked.
+4. Suppose a VM exit occurs very early in the guest interrupt handler,
+before the EOI is issued.
+5. Nothing is reported through the IDT_VECTORING_INFO_FIELD because
+vector 236 has already been invoked in the guest.
+6. Now, suppose APICv is activated. Before the next VM-entry, KVM calls
+kvm_vcpu_update_apicv() to activate APICv.
+7. Unfortunately, GUEST_INTR_STATUS.SVI is not configured, although
+vector 236 is still pending in the ISR.
+8. After VM-entry, the guest finally issues the EOI for vector 236.
+However, because SVI is not configured, vector 236 is not cleared.
+9. ISR is stalled forever on vector 236.
+
+Here is another scenario.
+
+1. Suppose APICv is inactive. Vector 236 is pending in the IRR.
+2. To handle KVM_REQ_EVENT, KVM moves vector 236 from the IRR to the ISR,
+and configures the VM_ENTRY_INTR_INFO_FIELD via vmx_inject_irq().
+3. VM-exit occurs immediately after the next VM-entry. The vector 236 is
+not invoked through the guest IDT. Instead, it is saved to the
+IDT_VECTORING_INFO_FIELD during the VM-exit.
+4. KVM calls kvm_queue_interrupt() to re-queue the un-invoked vector 236
+into vcpu->arch.interrupt. A KVM_REQ_EVENT is requested.
+5. Now, suppose APICv is activated. Before the next VM-entry, KVM calls
+kvm_vcpu_update_apicv() to activate APICv.
+6. Although APICv is now active, KVM still uses the legacy
+VM_ENTRY_INTR_INFO_FIELD to re-inject vector 236. GUEST_INTR_STATUS.SVI is
+not configured.
+7. After the next VM-entry, vector 236 is invoked through the guest IDT.
+Finally, an EOI occurs. However, due to the lack of GUEST_INTR_STATUS.SVI
+configuration, vector 236 is not cleared from the ISR.
+8. ISR is stalled forever on vector 236.
+
+Using QEMU as an example, vector 236 is stuck in ISR forever.
+
+(qemu) info lapic 1
+dumping local APIC state for CPU 1
+
+LVT0 0x00010700 active-hi edge masked ExtINT (vec 0)
+LVT1 0x00010400 active-hi edge masked NMI
+LVTPC 0x00000400 active-hi edge NMI
+LVTERR 0x000000fe active-hi edge Fixed (vec 254)
+LVTTHMR 0x00010000 active-hi edge masked Fixed (vec 0)
+LVTT 0x000400ec active-hi edge tsc-deadline Fixed (vec 236)
+Timer DCR=0x0 (divide by 2) initial_count = 0 current_count = 0
+SPIV 0x000001ff APIC enabled, focus=off, spurious vec 255
+ICR 0x000000fd physical edge de-assert no-shorthand
+ICR2 0x00000000 cpu 0 (X2APIC ID)
+ESR 0x00000000
+ISR 236
+IRR 37(level) 236
+
+The issue isn't applicable to AMD SVM as KVM simply writes vmcb01 directly
+irrespective of whether L1 (vmcs01) or L2 (vmcb02) is active (unlike VMX,
+there is no need/cost to switch between VMCBs). In addition,
+APICV_INHIBIT_REASON_IRQWIN ensures AMD SVM AVIC is not activated until
+the last interrupt is EOI'd.
+
+Fix the bug by configuring Intel VMX GUEST_INTR_STATUS.SVI if APICv is
+activated at runtime.
+
+Signed-off-by: Dongli Zhang
+Reviewed-by: Chao Gao
+Link: https://patch.msgid.link/20251110063212.34902-1-dongli.zhang@oracle.com
+[sean: call out that SVM writes vmcb01 directly, tweak comment]
+Link: https://patch.msgid.link/20251205231913.441872-2-seanjc@google.com
+Signed-off-by: Sean Christopherson
+(cherry picked from commit b2849bec936be642b5420801f902337f2507648e)
+Cc: stable@vger.kernel.org # 6.6.x and above
+Cc: Gulshan Gabel
+Signed-off-by: Jon Kohler
+Signed-off-by: Sasha Levin
+---
+ arch/x86/kvm/vmx/vmx.c | 9 ---------
+ arch/x86/kvm/x86.c | 7 +++++++
+ 2 files changed, 7 insertions(+), 9 deletions(-)
+
+diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
+index b8aa9ef73e7a46..d9011af23fb625 100644
+--- a/arch/x86/kvm/vmx/vmx.c
++++ b/arch/x86/kvm/vmx/vmx.c
+@@ -6853,15 +6853,6 @@ void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
+ 	 * VM-Exit, otherwise L1 with run with a stale SVI.
+ 	 */
+ 	if (is_guest_mode(vcpu)) {
+-		/*
+-		 * KVM is supposed to forward intercepted L2 EOIs to L1 if VID
+-		 * is enabled in vmcs12; as above, the EOIs affect L2's vAPIC.
+-		 * Note, userspace can stuff state while L2 is active; assert
+-		 * that VID is disabled if and only if the vCPU is in KVM_RUN
+-		 * to avoid false positives if userspace is setting APIC state.
+-		 */
+-		WARN_ON_ONCE(vcpu->wants_to_run &&
+-			     nested_cpu_has_vid(get_vmcs12(vcpu)));
+ 		to_vmx(vcpu)->nested.update_vmcs01_hwapic_isr = true;
+ 		return;
+ 	}
+diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
+index a1ee8bd3ca1569..21c10a87eed5b2 100644
+--- a/arch/x86/kvm/x86.c
++++ b/arch/x86/kvm/x86.c
+@@ -10629,9 +10629,16 @@ void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
+ 	 * pending. At the same time, KVM_REQ_EVENT may not be set as APICv was
+ 	 * still active when the interrupt got accepted. Make sure
+ 	 * kvm_check_and_inject_events() is called to check for that.
++	 *
++	 * Update SVI when APICv gets enabled, otherwise SVI won't reflect the
++	 * highest bit in vISR and the next accelerated EOI in the guest won't
++	 * be virtualized correctly (the CPU uses SVI to determine which vISR
++	 * vector to clear).
+ 	 */
+ 	if (!apic->apicv_active)
+ 		kvm_make_request(KVM_REQ_EVENT, vcpu);
++	else
++		kvm_apic_update_hwapic_isr(vcpu);
+ 
+ out:
+ 	preempt_enable();
+-- 
+2.53.0
+
diff --git a/queue-6.12/series b/queue-6.12/series
index e792c43bb2..146cd6f37f 100644
--- a/queue-6.12/series
+++ b/queue-6.12/series
@@ -109,3 +109,4 @@ writeback-avoid-contention-on-wb-list_lock-when-swit.patch
 writeback-fix-use-after-free-in-inode_switch_wbs_wor.patch
 xfrm-hold-device-only-for-the-asynchronous-decryptio.patch
 xfrm-hold-dev-ref-until-after-transport_finish-nf_ho.patch
+kvm-vmx-update-svi-during-runtime-apicv-activation.patch
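
Reviewer note (illustrative only, not part of the patch): the stuck-ISR behavior described in the commit message can be reproduced in a small user-space toy model. Everything below is hypothetical and exists only for illustration: the toy_* names, the struct layout, and the sync_svi flag are not KVM or hardware definitions. The sketch assumes that an accelerated EOI clears the ISR bit named by SVI and then reloads SVI from the highest remaining ISR bit, and that the added kvm_apic_update_hwapic_isr() call has the effect of loading SVI with the highest in-service vector. The patch itself does not show that helper's body.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names exist in KVM or in hardware. */
struct toy_vapic {
	uint32_t isr[8];	/* 256-bit in-service register, one bit per vector */
	uint8_t svi;		/* stands in for GUEST_INTR_STATUS.SVI */
	bool apicv_active;
};

static void toy_set_isr(struct toy_vapic *a, int vec)
{
	a->isr[vec / 32] |= UINT32_C(1) << (vec % 32);
}

static bool toy_isr_test(const struct toy_vapic *a, int vec)
{
	return (a->isr[vec / 32] >> (vec % 32)) & 1u;
}

/* Highest in-service vector, or -1 if the ISR is empty. */
static int toy_highest_isr(const struct toy_vapic *a)
{
	for (int vec = 255; vec >= 0; vec--)
		if (toy_isr_test(a, vec))
			return vec;
	return -1;
}

/*
 * Runtime APICv activation. With sync_svi == false this models the old
 * behavior (SVI left untouched); with sync_svi == true it models the
 * assumed effect of the fix (SVI loaded from the highest ISR bit).
 */
static void toy_activate_apicv(struct toy_vapic *a, bool sync_svi)
{
	int hi = toy_highest_isr(a);

	a->apicv_active = true;
	if (sync_svi)
		a->svi = (hi < 0) ? 0 : (uint8_t)hi;
}

/* Accelerated EOI: the vector to clear comes from SVI, not from an ISR scan. */
static void toy_accelerated_eoi(struct toy_vapic *a)
{
	int hi;

	assert(a->apicv_active);
	a->isr[a->svi / 32] &= ~(UINT32_C(1) << (a->svi % 32));
	hi = toy_highest_isr(a);
	a->svi = (hi < 0) ? 0 : (uint8_t)hi;
}

/*
 * Replays the first scenario: the vector is already in service when APICv
 * is activated, then the guest issues its EOI. Returns true if that EOI
 * removed the vector from the ISR.
 */
static bool toy_eoi_clears_vector(int vec, bool sync_svi)
{
	struct toy_vapic a = { .svi = 0, .apicv_active = false };

	toy_set_isr(&a, vec);			/* injected while APICv was off */
	toy_activate_apicv(&a, sync_svi);	/* runtime activation */
	toy_accelerated_eoi(&a);		/* guest EOI for the vector */
	return !toy_isr_test(&a, vec);
}
```

Calling toy_eoi_clears_vector(236, false) models the pre-patch path, where the EOI acts on a stale SVI of 0 and vector 236 stays in service; toy_eoi_clears_vector(236, true) models the patched path, where the same EOI retires vector 236.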