Paolo Bonzini [Tue, 2 Dec 2025 17:36:26 +0000 (18:36 +0100)]
Merge tag 'kvmarm-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.19
- Support for userspace handling of synchronous external aborts (SEAs),
allowing the VMM to potentially handle the abort in a non-fatal
manner.
- Large rework of the VGIC's list register handling with the goal of
supporting more active/pending IRQs than available list registers in
hardware. In addition, the VGIC now supports EOImode==1 style
deactivations for IRQs which may occur on a separate vCPU than the
one that acked the IRQ.
- Support for FEAT_XNX (user / privileged execute permissions) and
FEAT_HAF (hardware update to the Access Flag) in the software page
table walkers and shadow MMU.
- Allow page table destruction to reschedule, fixing long need_resched
latencies observed when destroying a large VM.
Paolo Bonzini [Tue, 2 Dec 2025 17:35:25 +0000 (18:35 +0100)]
Merge tag 'kvm-riscv-6.19-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv changes for 6.19
- SBI MPXY support for KVM guest
- New KVM_EXIT_FAIL_ENTRY_NO_VSFILE for the case when in-kernel
AIA virtualization fails to allocate IMSIC VS-file
- Support enabling dirty log gradually in small chunks
- Fix guest page fault within HLV* instructions
- Flush VS-stage TLB after VCPU migration for Andes cores
Paolo Bonzini [Tue, 2 Dec 2025 17:34:22 +0000 (18:34 +0100)]
Merge tag 'loongarch-kvm-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.19
1. Get VM PMU capability from HW GCFG register.
2. Add AVEC basic support.
3. Use 64-bit register definition for EIOINTC.
4. Add KVM timer test cases for tools/selftests.
Oliver Upton [Mon, 1 Dec 2025 08:47:41 +0000 (00:47 -0800)]
Merge branch 'kvm-arm64/nv-xnx-haf' into kvmarm/next
* kvm-arm64/nv-xnx-haf: (22 commits)
: Support for FEAT_XNX and FEAT_HAF in nested
:
: Add support for a couple of MMU-related features that weren't
: implemented by KVM's software page table walk:
:
: - FEAT_XNX: Allows the hypervisor to describe execute permissions
: separately for EL0 and EL1
:
: - FEAT_HAF: Hardware update of the Access Flag, which in the context of
: nested means software walkers must also set the Access Flag.
:
: The series also adds some basic support for testing KVM's emulation of
: the AT instruction, including the implementation detail that AT sets the
: Access Flag in KVM.
KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
KVM: arm64: selftests: Add test for AT emulation
KVM: arm64: nv: Expose hardware access flag management to NV guests
KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
KVM: arm64: Implement HW access flag management in stage-1 SW PTW
KVM: arm64: Propagate PTW errors up to AT emulation
KVM: arm64: Add helper for swapping guest descriptor
KVM: arm64: nv: Use pgtable definitions in stage-2 walk
KVM: arm64: Handle endianness in read helper for emulated PTW
KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
KVM: arm64: Call helper for reading descriptors directly
KVM: arm64: nv: Advertise support for FEAT_XNX
KVM: arm64: Teach ptdump about FEAT_XNX permissions
KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2
...
Oliver Upton [Mon, 1 Dec 2025 08:47:32 +0000 (00:47 -0800)]
Merge branch 'kvm-arm64/vgic-lr-overflow' into kvmarm/next
* kvm-arm64/vgic-lr-overflow: (50 commits)
: Support for VGIC LR overflows, courtesy of Marc Zyngier
:
: Address deficiencies in KVM's GIC emulation when a vCPU has more active
: IRQs than can be represented in the VGIC list registers. Sort the AP
: list to prioritize inactive and pending IRQs, potentially spilling
: active IRQs outside of the LRs.
:
: Handle deactivation of IRQs outside of the LRs for both EOImode=0/1,
: which involves special consideration for SPIs being deactivated from a
: different vCPU than the one that acked it.
KVM: arm64: Convert ICH_HCR_EL2_TDIR cap to EARLY_LOCAL_CPU_FEATURE
KVM: arm64: selftests: vgic_irq: Add timer deactivation test
KVM: arm64: selftests: vgic_irq: Add Group-0 enable test
KVM: arm64: selftests: vgic_irq: Add asymmetric SPI deactivation test
KVM: arm64: selftests: vgic_irq: Perform EOImode==1 deactivation in ack order
KVM: arm64: selftests: vgic_irq: Remove LR-bound limitation
KVM: arm64: selftests: vgic_irq: Exclude timer-controlled interrupts
KVM: arm64: selftests: vgic_irq: Change configuration before enabling interrupt
KVM: arm64: selftests: vgic_irq: Fix GUEST_ASSERT_IAR_EMPTY() helper
KVM: arm64: selftests: gic_v3: Disable Group-0 interrupts by default
KVM: arm64: selftests: gic_v3: Add irq group setting helper
KVM: arm64: GICv2: Always trap GICV_DIR register
KVM: arm64: GICv2: Handle deactivation via GICV_DIR traps
KVM: arm64: GICv2: Handle LR overflow when EOImode==0
KVM: arm64: GICv3: Force exit to sync ICH_HCR_EL2.En
KVM: arm64: GICv3: nv: Plug L1 LR sync into deactivation primitive
KVM: arm64: GICv3: nv: Resync LRs/VMCR/HCR early for better MI emulation
KVM: arm64: GICv3: Avoid broadcast kick on CPUs lacking TDIR
KVM: arm64: GICv3: Handle in-LR deactivation when possible
KVM: arm64: GICv3: Add SPI tracking to handle asymmetric deactivation
...
Oliver Upton [Mon, 1 Dec 2025 08:47:20 +0000 (00:47 -0800)]
Merge branch 'kvm-arm64/sea-user' into kvmarm/next
* kvm-arm64/sea-user:
: Userspace handling of SEAs, courtesy of Jiaqi Yan
:
: Add support for processing external aborts in userspace in situations
: where the host has failed to do so, allowing the VMM to potentially
: reinject an external abort into the VM.
Documentation: kvm: new UAPI for handling SEA
KVM: selftests: Test for KVM_EXIT_ARM_SEA
KVM: arm64: VM exit to userspace to handle SEA
Oliver Upton [Mon, 1 Dec 2025 08:47:12 +0000 (00:47 -0800)]
Merge branch 'kvm-arm64/misc' into kvmarm/next
* kvm-arm64/misc:
: Miscellaneous fixes/cleanups for KVM/arm64
:
: - Fix for need_resched warnings on non-preemptible kernels when
: tearing down a VM's stage-2
:
: - Improvements to KVM struct allocation, getting rid of pointless
: __GFP_HIGHMEM and switching to kvzalloc()
:
: - SYNC ITS configuration before injecting LPIs in vgic_lpi_stress
: selftest
KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables
KVM: arm64: Split kvm_pgtable_stage2_destroy()
KVM: arm64: Only drop references on empty tables in stage2_free_walker
KVM: selftests: SYNC after guest ITS setup in vgic_lpi_stress
KVM: selftests: Assert GICR_TYPER.Processor_Number matches selftest CPU number
KVM: arm64: Use kvzalloc() for kvm struct allocation
KVM: arm64: Drop useless __GFP_HIGHMEM from kvm struct allocation
Alexandru Elisei [Fri, 28 Nov 2025 10:09:46 +0000 (10:09 +0000)]
KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
A guest can write 1 to TCR_ELx.HA, making the KVM software walker update
the access flag in a table descriptor even if FEAT_HAFDBS is not present.
Avoid this by making wi->ha depend on FEAT_HAFDBS being enabled in the VM,
similar to how the software walker treats FEAT_HPDS.
This is not needed for VTCR_EL2.HA, since a guest will always write to
the in-memory copy of the register, where the HA bit is masked (set to
0) by KVM if the VM doesn't have FEAT_HAFDBS.
Fixes: c59ca4b5b0c3 ("KVM: arm64: Implement HW access flag management in stage-1 SW PTW") Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://msgid.link/20251128100946.74210-5-alexandru.elisei@arm.com Signed-off-by: Oliver Upton <oupton@kernel.org>
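A minimal sketch of the intended gating; the helper and field names follow upstream conventions but are assumptions here, not the actual patch:
static bool s1_walk_updates_af(struct kvm *kvm, u64 tcr)
{
	/* Honour TCR_ELx.HA only if the VM has been given FEAT_HAFDBS,
	 * mirroring the existing treatment of FEAT_HPDS. */
	if (!kvm_has_feat(kvm, ID_AA64MMFR1_EL1, HAFDBS, AF))
		return false;

	return tcr & TCR_HA;
}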
Alexandru Elisei [Fri, 28 Nov 2025 10:09:43 +0000 (10:09 +0000)]
KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
Commit 2608563b466b ("KVM: arm64: Add support for FEAT_XNX stage-2
permissions") added the KVM_PGTABLE_PROT_{UX,PX} permissions to stage 2 and
to EL2 translation regimes, but left them undocumented. Let's fix that.
KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
Clang warns (or errors with CONFIG_WERROR=y / W=e):
arch/arm64/kvm/hyp/pgtable.c:757:2: error: label at end of compound statement is a C23 extension [-Werror,-Wc23-extensions]
757 | }
| ^
With older versions of clang (15 and older) and GCC (at least the minimum
supported, 8.1), this is an unconditional hard error:
arch/arm64/kvm/hyp/pgtable.c: In function 'kvm_pgtable_stage2_pte_prot':
arch/arm64/kvm/hyp/pgtable.c:756:2: error: label at end of compound statement
default:
^~~~~~~
arch/arm64/kvm/hyp/pgtable.c:756:10: error: label at end of compound statement: expected statement
default:
^
;
Add a break statement to this default case to clear up the error/warning.
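A reduced, self-contained illustration of the diagnostic and the fix; the enum and function are placeholders, not the pgtable.c code:
enum example_prot { EXAMPLE_EXEC, EXAMPLE_OTHER };

static void example(enum example_prot prot)
{
	switch (prot) {
	case EXAMPLE_EXEC:
		/* translate the permission here */
		break;
	default:
		break;	/* added: pre-C23, a label cannot end a compound statement */
	}
}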
Oliver Upton [Mon, 24 Nov 2025 23:54:09 +0000 (15:54 -0800)]
KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
__lse_swap_desc() is compiled unconditionally, even if LSE is disabled
using the config option. Align with the spirit of the config option and
fix some build errors due to __LSE_PREAMBLE being undefined with the
application of some ifdeffery.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202511250700.kAutzJFm-lkp@intel.com/ Link: https://msgid.link/20251124235409.1731253-1-oupton@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Oliver Upton [Mon, 24 Nov 2025 19:01:54 +0000 (11:01 -0800)]
KVM: arm64: Implement HW access flag management in stage-1 SW PTW
Atomically update the Access flag at stage-1 when the guest has
configured the MMU to do so. Make the implementation choice (and liberal
interpretation of speculation) that any access type updates the Access
flag, including AT and CMO instructions.
Restart the entire walk by returning to the exception-generating
instruction in the case of a failed Access flag update.
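A rough sketch of the update-and-restart flow, assuming a CAS-style descriptor swap helper; all names here are illustrative rather than KVM's actual API:
static int s1_set_access_flag(struct kvm_vcpu *vcpu, u64 ipa, u64 desc)
{
	u64 new = desc | PTE_AF;

	if (desc & PTE_AF)
		return 0;			/* already set */

	/* swap_guest_desc() is assumed to return the old value it observed */
	if (swap_guest_desc(vcpu, ipa, desc, new) != desc)
		return -EAGAIN;			/* lost the race: restart the walk */

	return 0;
}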
Oliver Upton [Mon, 24 Nov 2025 19:01:53 +0000 (11:01 -0800)]
KVM: arm64: Propagate PTW errors up to AT emulation
KVM's software PTW will soon support 'hardware' updates to the access
flag. Similar to fault handling, races to update the descriptor will be
handled by restarting the instruction. Prepare for this by propagating
errors up to the AT emulation, only retiring the instruction if the walk
succeeds.
Oliver Upton [Mon, 24 Nov 2025 19:01:52 +0000 (11:01 -0800)]
KVM: arm64: Add helper for swapping guest descriptor
Implementing FEAT_HAFDBS in KVM's software PTWs requires the ability to
CAS a descriptor to update the in-memory value. Add an accessor to do
exactly that, coping with the fact that guest descriptors are in user
memory (duh).
While FEAT_LSE is required on any system that implements NV, KVM now uses
the stage-1 PTW for non-nested use cases, meaning an LL/SC implementation
is necessary as well.
Oliver Upton [Mon, 24 Nov 2025 19:01:50 +0000 (11:01 -0800)]
KVM: arm64: Handle endianness in read helper for emulated PTW
Implementing FEAT_HAFDBS means adding another descriptor accessor that
needs to deal with the guest-configured endianness. Prepare by moving
the endianness handling into the read accessor and out of the main body
of the S1/S2 PTWs.
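A sketch of what such a read accessor could look like; kvm_read_guest() is the real KVM API, while guest_is_be() stands in for however the walker determines the guest's data endianness:
static int read_guest_desc(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *desc)
{
	u64 raw;
	int ret;

	ret = kvm_read_guest(vcpu->kvm, gpa, &raw, sizeof(raw));
	if (ret)
		return ret;

	/* Descriptors live in guest memory in the guest's endianness. */
	*desc = guest_is_be(vcpu) ? be64_to_cpu((__force __be64)raw)
				  : le64_to_cpu((__force __le64)raw);
	return 0;
}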
Oliver Upton [Mon, 24 Nov 2025 19:01:49 +0000 (11:01 -0800)]
KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
The stage-2 table walker passes down the vCPU as a void pointer. That
might've made sense if the walker were generic, but at this point it
is clear this will only ever be used in the context of a vCPU.
Suggested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20251124190158.177318-8-oupton@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Oliver Upton [Mon, 24 Nov 2025 19:01:46 +0000 (11:01 -0800)]
KVM: arm64: Teach ptdump about FEAT_XNX permissions
Although KVM doesn't make direct use of the feature, guest hypervisors
can use FEAT_XNX which influences the permissions of the shadow stage-2.
Update ptdump to separately print the privileged and unprivileged
execute permissions.
Bibo Mao [Fri, 28 Nov 2025 06:49:47 +0000 (14:49 +0800)]
KVM: LoongArch: selftests: Add SW emulated timer test case
This test case setup one-shot timer and execute idle instruction
immediately to indicate giving up CPU, hypervisor will emulate SW
hrtimer and wakeup vCPU when SW hrtimer is fired.
Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Bibo Mao [Thu, 27 Nov 2025 03:00:18 +0000 (11:00 +0800)]
LoongArch: KVM: Use 64-bit register definition for EIOINTC
With the in-kernel emulated eiointc driver, hardware registers can be
accessed with different sizes, so the EIOINTC registers carry a
reg_u8/reg_u16/reg_u32/reg_u64 union type.
Use a 64-bit type for the register definitions and remove the union type,
since most registers are accessed with 64-bit operations. This also makes
the emulated EIOINTC driver simpler.
Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Bibo Mao [Thu, 27 Nov 2025 03:00:18 +0000 (11:00 +0800)]
LoongArch: KVM: Get VM PMU capability from HW GCFG register
Currently the VM PMU capability comes directly from the host PMU
capability, even though bit 23 of the HW GCFG CSR register also indicates
PMU capability for the VM. It is better to take it from the HW GCFG CSR
register rather than just the host PMU capability, especially when the LVZ
feature is emulated in TCG mode, in which case there is no PMU capability.
Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Paolo Bonzini [Wed, 26 Nov 2025 08:46:45 +0000 (09:46 +0100)]
Merge tag 'kvm-x86-svm-6.19' of https://github.com/kvm-x86/linux into HEAD
KVM SVM changes for 6.19:
- Fix a few missing "VMCB dirty" bugs.
- Fix the worst of KVM's lack of EFER.LMSLE emulation.
- Add AVIC support for addressing 4k vCPUs in x2AVIC mode.
- Fix incorrect handling of selective CR0 writes when checking intercepts
during emulation of L2 instructions.
- Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on
VMRUN and #VMEXIT.
- Fix a bug where KVM could corrupt the guest code stream when re-injecting a soft
interrupt if the guest patched the underlying code after the VM-Exit, e.g.
when Linux patches code with a temporary INT3.
- Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to
userspace, and extend KVM "support" to all policy bits that don't require
any actual support from KVM.
Paolo Bonzini [Wed, 26 Nov 2025 08:44:52 +0000 (09:44 +0100)]
Merge tag 'kvm-x86-vmx-6.19' of https://github.com/kvm-x86/linux into HEAD
KVM VMX changes for 6.19:
- Use the root role from kvm_mmu_page to construct EPTPs instead of the
current vCPU state, partly as worthwhile cleanup, but mostly to pave the
way for tracking per-root TLB flushes so that KVM can elide EPT flushes on
pCPU migration if KVM has flushed the root at least once.
- Add a few missing nested consistency checks.
- Rip out support for doing "early" consistency checks via hardware as the
functionality hasn't been used in years and is no longer useful in general,
and replace it with an off-by-default module param to detect missed
consistency checks (i.e. WARN if hardware finds a check that KVM does not).
- Fix a currently-benign bug where KVM would drop the guest's SPEC_CTRL[63:32]
on VM-Enter.
Paolo Bonzini [Wed, 26 Nov 2025 08:36:37 +0000 (09:36 +0100)]
Merge tag 'kvm-x86-tdx-6.19' of https://github.com/kvm-x86/linux into HEAD
KVM TDX changes for 6.19:
- Overhaul the TDX code to address systemic races where KVM (acting on behalf
of userspace) could inadvertently trigger lock contention in the TDX-Module,
which KVM was either working around in weird, ugly ways, or was simply
oblivious to (as proven by Yan tripping several KVM_BUG_ON()s with clever
selftests).
- Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a vCPU if
creating said vCPU failed partway through.
- Fix a few sparse warnings (bad annotation, 0 != NULL).
- Use struct_size() to simplify copying capabilities to userspace.
Paolo Bonzini [Wed, 26 Nov 2025 08:35:40 +0000 (09:35 +0100)]
Merge tag 'kvm-x86-selftests-6.19' of https://github.com/kvm-x86/linux into HEAD
KVM selftests changes for 6.19:
- Fix a math goof in mmu_stress_test when running on a single-CPU system/VM.
- Forcefully override ARCH from x86_64 to x86 to play nice with specifying
ARCH=x86_64 on the command line.
- Extend a bunch of nested VMX tests to validate nested SVM as well.
- Add support for LA57 in the core VM_MODE_xxx macro, and add a test to
verify KVM can save/restore nested VMX state when L1 is using 5-level
paging, but L2 is not.
- Clean up the guest paging code in anticipation of sharing the core logic for
nested EPT and nested NPT.
Paolo Bonzini [Wed, 26 Nov 2025 08:34:21 +0000 (09:34 +0100)]
Merge tag 'kvm-x86-misc-6.19' of https://github.com/kvm-x86/linux into HEAD
KVM x86 misc changes for 6.19:
- Fix an async #PF bug where KVM would clear the completion queue when the
guest transitioned in and out of paging mode, e.g. when handling an SMI and
then returning to paged mode via RSM.
- Fix a bug where TDX would effectively corrupt user-return MSR values if the
TDX Module rejects VP.ENTER and thus doesn't clobber host MSRs as expected.
- Leave the user-return notifier used to restore MSRs registered when
disabling virtualization, and instead pin kvm.ko. Restoring host MSRs via
IPI callback is either pointless (clean reboot) or dangerous (forced reboot)
since KVM has no idea what code it's interrupting.
- Use the checked version of {get,put}_user(), as Linus wants to kill them
off, and they're measurably faster on modern CPUs due to the unchecked
versions containing an LFENCE.
- Fix a long-lurking bug where KVM's lack of catch-up logic for periodic APIC
timers can result in a hard lockup in the host.
- Revert the periodic kvmclock sync logic now that KVM doesn't use a
clocksource that's subject to NTP corrections.
- Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the latter
behind CONFIG_CPU_MITIGATIONS.
- Context switch XCR0, XSS, and PKRU outside of the entry/exit fastpath as
the only reason they were handled in the fastpath was to paper over a bug in
the core #MC code that has long since been fixed.
- Add emulator support for AVX MOV instructions to play nice with emulated
devices whose PCI BARs guest drivers like to access with large multi-byte
instructions.
Paolo Bonzini [Wed, 26 Nov 2025 08:32:44 +0000 (09:32 +0100)]
Merge tag 'kvm-x86-gmem-6.19' of https://github.com/kvm-x86/linux into HEAD
KVM guest_memfd changes for 6.19:
- Add NUMA mempolicy support for guest_memfd, and clean up a variety of
rough edges in guest_memfd along the way.
- Define a CLASS to automatically handle get+put when grabbing a guest_memfd
from a memslot to make it harder to leak references.
- Enhance KVM selftests to make it easier to develop and debug selftests like
those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs
often result in hard-to-debug SIGBUS errors.
Marc Zyngier [Tue, 25 Nov 2025 16:01:44 +0000 (16:01 +0000)]
KVM: arm64: Convert ICH_HCR_EL2_TDIR cap to EARLY_LOCAL_CPU_FEATURE
Suzuki noticed that making the ICH_HCR_EL2_TDIR capability a system
one isn't a very good idea, should we end up with CPUs that have
asymmetric TDIR support (somewhat unlikely, but you never know what
level of stupidity vendors are up to). For this hypothetical setup,
making this an "EARLY_LOCAL_CPU_FEATURE" is a much better option.
This is actually consistent with what we already do with the GICv5
legacy interface, so flip the capability over.
Wanpeng Li [Mon, 10 Nov 2025 03:32:27 +0000 (11:32 +0800)]
KVM: Fix last_boosted_vcpu index assignment bug
In kvm_vcpu_on_spin(), the loop counter 'i' is incorrectly written to
last_boosted_vcpu instead of the actual vCPU index 'idx'. This causes
last_boosted_vcpu to store the loop iteration count rather than the
vCPU index, leading to incorrect round-robin behavior in subsequent
directed yield operations.
Fix this by using 'idx' instead of 'i' in the assignment.
Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20251110033232.12538-7-kernellwp@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
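A reduced illustration of the fix; the eligibility check is a placeholder for the real heuristics in kvm_vcpu_on_spin():
	kvm_for_each_vcpu(idx, vcpu, kvm) {
		if (!vcpu_eligible_for_boost(vcpu))	/* placeholder */
			continue;

		if (kvm_vcpu_yield_to(vcpu) > 0) {
			/* record the vCPU index, not the iteration count */
			WRITE_ONCE(kvm->last_boosted_vcpu, idx);	/* was: i */
			break;
		}
	}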
Marc Zyngier [Thu, 20 Nov 2025 17:25:39 +0000 (17:25 +0000)]
KVM: arm64: selftests: vgic_irq: Add timer deactivation test
Add a new test case that triggers the HW deactivation emulation path
when trapping ICV_DIR_EL1. This is obviously tied to the way KVM
works now, but the test follows the expected architectural behaviour.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-50-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:38 +0000 (17:25 +0000)]
KVM: arm64: selftests: vgic_irq: Add Group-0 enable test
Add a new test case that injects a Group-0 interrupt together
with a bunch of Group-1 interrupts, Acks/EOIs the G1 interrupts,
and only then enables G0, expecting to get the G0 interrupt.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-49-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:37 +0000 (17:25 +0000)]
KVM: arm64: selftests: vgic_irq: Add asymmetric SPI deactivation test
Add a new test case that makes an interrupt pending on a vcpu,
activates it, does the priority drop, and then gets *another* vcpu
to do the deactivation.
Special care is taken not to trigger an exit in the process, so
that we are sure that the active interrupt is in an LR. Joy.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-48-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:36 +0000 (17:25 +0000)]
KVM: arm64: selftests: vgic_irq: Perform EOImode==1 deactivation in ack order
When EOImode==1, perform the deactivation in the order of activation,
just to make things a bit worse for KVM. Yes, I'm nasty.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-47-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Good news: our GIC emulation is not completely broken, and we can
activate as many interrupts as we want.
Bump the test to cover all the SGIs, all the allowed PPIs, and
31 SPIs. Yes, 31, because we have 31 available priorities, and the
test is not happy with having two interrupts with the same priority.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-46-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
The PPI injection API is clear that you can't inject the timer PPIs
from userspace, since they are controlled by the timers themselves.
Add an exclusion list for this purpose.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-45-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:33 +0000 (17:25 +0000)]
KVM: arm64: selftests: vgic_irq: Change configuration before enabling interrupt
The architecture is pretty clear that changing the configuration of
an enabled interrupt is not OK. It doesn't really matter here, but
doing the right thing is not more expensive.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-44-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
No, 0 is not a spurious INTID. Never been, never was.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-43-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:31 +0000 (17:25 +0000)]
KVM: arm64: selftests: gic_v3: Disable Group-0 interrupts by default
Make sure G0 is disabled at the point of initialising the GIC.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-42-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:30 +0000 (17:25 +0000)]
KVM: arm64: selftests: gic_v3: Add irq group setting helper
Being able to set the group of an interrupt is pretty useful.
Add such a helper.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-41-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:29 +0000 (17:25 +0000)]
KVM: arm64: GICv2: Always trap GICV_DIR register
Since we can't decide to trap the DIR register on a per-vcpu basis,
always trap the second page of the GIC CPU interface. Yes, this is
costly. On the bright side, no sane SW should use EOImode==1 on
GICv2...
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-40-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:28 +0000 (17:25 +0000)]
KVM: arm64: GICv2: Handle deactivation via GICV_DIR traps
Add the plumbing of GICv2 interrupt deactivation via GICV_DIR.
This requires adding a new device so that we can easily decode
the DIR address.
The deactivation itself is very similar to the GICv3 version.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-39-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:27 +0000 (17:25 +0000)]
KVM: arm64: GICv2: Handle LR overflow when EOImode==0
Similarly to the GICv3 version, handle the EOIcount-driven deactivation
by walking the overflow list.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-38-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:26 +0000 (17:25 +0000)]
KVM: arm64: GICv3: Force exit to sync ICH_HCR_EL2.En
FEAT_NV2 is pretty terrible for anything that tries to enforce immediate
effects, and writing to ICH_HCR_EL2 in the hope of disabling a maintenance
interrupt is in vain. This only hits memory, and the guest hasn't cleared
anything -- the MI will fire.
For example, running the vgic_irq test under NV results in about 800
maintenance interrupts being actually handled by the L1 guest,
when none were expected.
As a cheap workaround, read back ICH_MISR_EL2 after writing 0 to
ICH_HCR_EL2. This is very cheap on real HW, and causes a trap to
the host in NV, giving it the opportunity to retire the pending MI.
With this, the above test runs to completion without any MI being
actually handled.
Yes, this is really poor...
Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-37-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:25 +0000 (17:25 +0000)]
KVM: arm64: GICv3: nv: Plug L1 LR sync into deactivation primitive
Pretty much like the rest of the LR handling, deactivation of an
L2 interrupt gets reflected in the L1 LRs, and therefore must be
propagated into the L1 shadow state if the interrupt is HW-bound.
Instead of directly handling the active state (which looks a bit
off as it ignores locking and L1->L0 HW propagation), use the new
deactivation primitive to perform the deactivation and deal with
the required maintenance.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-36-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:24 +0000 (17:25 +0000)]
KVM: arm64: GICv3: nv: Resync LRs/VMCR/HCR early for better MI emulation
The current approach to nested GICv3 support is to not do anything
while L2 is running, wait for a transition from L2 to L1 to resync
LRs, VMCR and HCR, and only then evaluate the state to decide
whether to generate a maintenance interrupt.
This doesn't provide a good quality of emulation, and it would be
far preferable to find out early that we need to perform a switch.
Move the LRs/VMCR and HCR resync into vgic_v3_sync_nested(), so
that we have most of the state available. As we are turning the vgic
off at this stage to avoid a screaming host MI, add a new helper
vgic_v3_flush_nested() that switches the vgic on again. The MI can
then be directly injected as required.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-35-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:23 +0000 (17:25 +0000)]
KVM: arm64: GICv3: Avoid broadcast kick on CPUs lacking TDIR
CPUs lacking TDIR always trap ICV_DIR_EL1, no matter what, since
we have ICH_HCR_EL2.TC set permanently. For these CPUs, it is
useless to use a broadcast kick on SPI injection, as the sole
purpose of this is to set TDIR.
We can therefore skip this on these CPUs, which are challenged
enough not to be burdened by extra IPIs. As a consequence,
permanently set the TDIR bit in the shadow state to notify the
fast-path emulation code of the exit reason.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-34-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:22 +0000 (17:25 +0000)]
KVM: arm64: GICv3: Handle in-LR deactivation when possible
Even when we have either an LR overflow or SPIs in flight, it is
extremely likely that the interrupt being deactivated is still in
the LRs, and that going all the way back to the generic trap
handling code is a waste of time.
Instead, try and deactivate in place when possible, and only if
this fails, perform a full exit.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-33-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:21 +0000 (17:25 +0000)]
KVM: arm64: GICv3: Add SPI tracking to handle asymmetric deactivation
SPIs are especially annoying, as they can be activated on one CPU and
deactivated on another. Which means that when an SPI is in flight
anywhere, all CPUs need to have their TDIR trap bit set.
This translates into broadcasting an IPI across all CPUs to make sure
they set their trap bit. The number of in-flight SPIs is kept in
an atomic variable so that CPUs can turn the trap bit off as soon
as possible.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-32-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
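A sketch of the bookkeeping described above; the counter, the vCPU request, and the helper names are assumptions for illustration:
static void vgic_spi_went_active(struct kvm *kvm)
{
	/* First in-flight SPI: every vCPU must start trapping ICV_DIR_EL1 */
	if (atomic_inc_return(&kvm->arch.vgic.in_flight_spis) == 1)
		kvm_make_all_cpus_request(kvm, KVM_REQ_VGIC_SET_TDIR);
}

static void vgic_spi_went_inactive(struct kvm *kvm)
{
	/* Once the count drops to zero, vCPUs may clear TDIR again */
	atomic_dec(&kvm->arch.vgic.in_flight_spis);
}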
Marc Zyngier [Thu, 20 Nov 2025 17:25:20 +0000 (17:25 +0000)]
KVM: arm64: GICv3: Set ICH_HCR_EL2.TDIR when interrupts overflow LR capacity
Now that we are ready to handle deactivation through ICV_DIR_EL1,
set the trap bit if we have active interrupts outside of the LRs.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-31-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:19 +0000 (17:25 +0000)]
KVM: arm64: GICv3: Add GICv2 SGI handling to deactivation primitive
The GICv2 SGIs require additional handling for deactivation, as they
are effectively multiple interrupts muxed into one. Make sure we
check for the source CPU when deactivating.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-30-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:18 +0000 (17:25 +0000)]
KVM: arm64: GICv3: Handle deactivation via ICV_DIR_EL1 traps
Deactivation via ICV_DIR_EL1 is both relatively straightforward
(we have the interrupt that needs deactivation) and really awkward.
The main issue is that the interrupt may either be in an LR on
another CPU, or outside of any LR.
In the former case, we process the deactivation as if it was
a write to GICD_CACTIVERn, which is already implemented as a big
hammer IPI'ing all vcpus. In the latter case, we just perform
a normal deactivation, similar to what we do for EOImode==0.
Another annoying aspect is that we need to tell the CPU owning
the interrupt that its ap_list needs laundering. We use a brand new
vcpu request to that effect.
Note that this doesn't address deactivation via the GICV MMIO view,
which will be taken care of in a later change.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-29-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:17 +0000 (17:25 +0000)]
KVM: arm64: GICv3: Handle LR overflow when EOImode==0
Now that we can identify interrupts that have not made it into the LRs,
it becomes relatively easy to use EOIcount to walk the overflow list.
What is a bit odd is that we compute a fake LR for the original
state of the interrupt, clear the active bit, and feed it into the existing
logic for processing. In a way, this is what would have happened if
the interrupt was in an LR.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-28-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:16 +0000 (17:25 +0000)]
KVM: arm64: Use MI to detect groups being enabled/disabled
Add the maintenance interrupt to force an exit when the guest
enables/disables individual groups, so that we can re-sort the
ap_list accordingly.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-27-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:15 +0000 (17:25 +0000)]
KVM: arm64: Move undeliverable interrupts to the end of ap_list
Interrupts in the ap_list that cannot be acted upon, because they are
not enabled or their group is not enabled, shouldn't make it into the
LRs if we are space-constrained.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-26-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:14 +0000 (17:25 +0000)]
KVM: arm64: Invert ap_list sorting to push active interrupts out
Having established that pending interrupts should have priority
to be moved into the LRs over the active interrupts, implement this
in the ap_list sorting.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-25-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:13 +0000 (17:25 +0000)]
KVM: arm64: Make vgic_target_oracle() globally available
Make the internal crystal ball global, so that implementation-specific
code can use it.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-24-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:12 +0000 (17:25 +0000)]
KVM: arm64: Turn kvm_vgic_vcpu_enable() into kvm_vgic_vcpu_reset()
Now that we always reconfigure the vgic HCR register on entry,
the "enable" part of kvm_vgic_vcpu_enable() is pretty useless.
Removing the enable bits from these functions makes it plain that
they are just about computing the reset state. Just rename the
functions accordingly.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-23-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
We currently don't use the maintenance interrupt very much, apart
from EOI on level interrupts, and for LR underflow in limited cases.
However, as we are moving toward a setup where active interrupts
can live outside of the LRs, we need to use the MIs in a more
diverse set of cases.
Add a new helper that produces a digest of the ap_list, and use
that summary to set the various control bits as required.
This slightly changes the way v2 SGIs are handled, as they used to
count for more than one interrupt, but not anymore.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-22-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:10 +0000 (17:25 +0000)]
KVM: arm64: Eagerly save VMCR on exit
We currently save/restore the VMCR register in a pretty lazy way
(on load/put, consistent with what we do with the APRs).
However, we are going to need the group-enable bits that are backed
by VMCR on each entry (so that we can avoid injecting interrupts for
disabled groups).
Move the synchronisation from put to sync, which results in some minor
churn in the nVHE hypercalls to simplify things.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-21-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:09 +0000 (17:25 +0000)]
KVM: arm64: Compute vgic state irrespective of the number of interrupts
As we are going to rely on the [G]ICH_HCR{,_EL2} register to be
programmed with MI information at all times, slightly de-optimise
the flush/sync code to always be called. This is rather lightweight
when no interrupts are in flight.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-20-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:08 +0000 (17:25 +0000)]
KVM: arm64: GICv2: Extract LR computing primitive
Split vgic_v2_populate_lr() into two helpers, so that we have another
primitive that computes the LR from a vgic_irq, but doesn't update
anything in the shadow structure.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-19-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:07 +0000 (17:25 +0000)]
KVM: arm64: GICv2: Extract LR folding primitive
As we are going to need to handle deactivation for interrupts that
are not in the LRs, split vgic_v2_fold_lr_state() into a helper
that deals with a single interrupt, and the function that loops
over the used LRs.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-18-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:06 +0000 (17:25 +0000)]
KVM: arm64: GICv2: Decouple GICH_HCR programming from LRs being loaded
Not programming GICH_HCR while no LRs are populated is a bit
of an issue, as we otherwise don't see any maintenance interrupt
when the guest interacts with the LRs.
Decouple the two and always program the control register, even when
we don't have to touch the LRs.
This is very similar to what we are already doing for GICv3.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-17-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:05 +0000 (17:25 +0000)]
KVM: arm64: GICv2: Preserve EOIcount on exit
EOIcount is how the virtual CPU interface signals that the guest
is deactivating interrupts outside of the LRs when EOImode==0.
We therefore need to preserve that information so that we can find
out what actually needs deactivating, just like we already do on
GICv3.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-16-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:04 +0000 (17:25 +0000)]
KVM: arm64: GICv3: Extract LR computing primitive
Split vgic_v3_populate_lr() into two, so that we have another
primitive that computes the LR from a vgic_irq, but doesn't
update anything in the shadow structure.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-15-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:03 +0000 (17:25 +0000)]
KVM: arm64: GICv3: Extract LR folding primitive
As we are going to need to handle deactivation for interrupts that
are not in the LRs, split vgic_v3_fold_lr_state() into a helper
that deals with a single interrupt, and the function that loops
over the used LRs.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-14-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:02 +0000 (17:25 +0000)]
KVM: arm64: GICv3: Decouple ICH_HCR_EL2 programming from LRs
Not programming ICH_HCR_EL2 while no LRs are populated is a bit
of an issue, as we otherwise don't see any maintenance interrupt
when the guest interacts with the LRs.
Decouple the two and always program the control register, even when
we don't have to touch the LRs.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-13-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:01 +0000 (17:25 +0000)]
KVM: arm64: GICv3: Preserve EOIcount on exit
EOIcount is how the virtual CPU interface signals that the guest
is deactivating interrupts outside of the LRs when EOImode==0.
We therefore need to preserve that information so that we can find
out what actually needs deactivating.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-12-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:25:00 +0000 (17:25 +0000)]
KVM: arm64: GICv3: Drop LPI active state when folding LRs
Despite LPIs not having an active state, *virtual* LPIs do have
one, which gets cleared on EOI. So far, so good.
However, this leads to a small problem: when an active LPI is not
in the LRs, EOImode==0 is in use, and the guest EOIs it, EOIcount
doesn't get bumped up. Which means that under these conditions, the
LPI would stay active forever.
Clearly, we can't have that. So if we spot an active LPI, we drop
that state. It's pretty pointless anyway, and only serves as a way
to trip SW over.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-11-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:24:59 +0000 (17:24 +0000)]
KVM: arm64: Add LR overflow handling documentation
Add a bit of documentation describing how we are dealing with LR
overflow. This is mostly a braindump of how things are expected
to work. For now anyway.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-10-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:24:58 +0000 (17:24 +0000)]
KVM: arm64: Add tracking of vgic_irq being present in a LR
We currently cannot identify whether an interrupt is queued into
a LR. It wasn't needed until now, but that's about to change.
Add yet another flag to track that state.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-9-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:24:57 +0000 (17:24 +0000)]
KVM: arm64: Repack struct vgic_irq fields
struct vgic_irq has grown over the years, in a rather bad way.
Repack it, using bitfields for the individual flags, and move
things around a bit so that it is a bit smaller.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-8-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:24:56 +0000 (17:24 +0000)]
KVM: arm64: GICv3: Detect and work around the lack of ICV_DIR_EL1 trapping
A long time ago, an unsuspecting architect forgot to add a trap
bit for ICV_DIR_EL1 in ICH_HCR_EL2. Which was unfortunate, but
what's a bit of spec between friends? Thankfully, this was fixed
in a later revision, and ARM "deprecates" the lack of trapping
ability.
Unfortunately, a few (billion) CPUs went out with that defect,
anything ARMv8.0 from ARM, give or take. And on these CPUs,
you can't trap DIR on its own, full stop.
As the next best thing, we can trap everything in the common group,
which is a tad expensive, but hey ho, that's what you get. You can
otherwise recycle the HW in the nearby bin.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-7-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:24:55 +0000 (17:24 +0000)]
KVM: arm64: vgic-v3: Fix GICv3 trapping in protected mode
As we are about to start trapping a bunch of extra things, augment
the pKVM trap description with all the registers trapped by ICH_HCR_EL2.TC,
making them legal instead of resulting in a UNDEF injection in the guest.
While we're at it, ensure that pKVM captures the vgic model so that it
can be checked by the emulation code.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-6-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:24:54 +0000 (17:24 +0000)]
KVM: arm64: Turn vgic-v3 errata traps into a patched-in constant
The trap bits are currently only set to manage CPU errata. However,
we are about to make use of them for purposes beyond beating broken
CPUs into submission.
For this purpose, turn these errata-driven bits into a patched-in
constant that is merged with the KVM-driven value at the point of
programming the ICH_HCR_EL2 register, rather than being directly
stored with the shadow value.
This allows the KVM code to distinguish between a trap being handled
for the purpose of an erratum workaround, or for KVM's own need.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-5-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:24:53 +0000 (17:24 +0000)]
irqchip/apple-aic: Spit out ICH_MISR_EL2 value on spurious vGIC MI
It is all good and well to scream about spurious vGIC maintenance
interrupts. It would be even better to output the reason why, which
is already checked, but not printed out.
The unsuspecting kernel tinkerer thanks you.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-4-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:24:52 +0000 (17:24 +0000)]
irqchip/gic: Expose CPU interface VA to KVM
Future changes will require KVM to be able to perform deactivations
by writing to the physical CPU interface. Add the corresponding
VA to the kvm_info structure, and let KVM stash it.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-3-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Marc Zyngier [Thu, 20 Nov 2025 17:24:51 +0000 (17:24 +0000)]
irqchip/gic: Add missing GICH_HCR control bits
The GICH_HCR description is missing a bunch of control bits that
control the maintenance interrupt. Add them.
Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-2-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
Oliver Upton [Mon, 24 Nov 2025 19:01:45 +0000 (11:01 -0800)]
KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2
Add support for FEAT_XNX to shadow stage-2 MMUs, being careful to only
evaluate XN[0] when the feature is actually exposed to the VM.
Restructure the layering of permissions in the fault handler to assume
pX and uX then restricting based on the guest's stage-2 afterwards.
Oliver Upton [Mon, 24 Nov 2025 19:01:44 +0000 (11:01 -0800)]
KVM: arm64: Add support for FEAT_XNX stage-2 permissions
FEAT_XNX adds support for encoding separate execute permissions for EL0
and EL1 at stage-2. Add support for this to the page table library,
hiding the unintuitive encoding scheme behind generic pX and uX
permission flags.
RISC-V: KVM: Flush VS-stage TLB after VCPU migration for Andes cores
Most implementations cache the combined result of two-stage translation,
but some, like Andes cores, use split TLBs that store VS-stage and
G-stage entries separately.
On such systems, when a VCPU migrates to another CPU, an additional
HFENCE.VVMA is required to avoid using stale VS-stage entries, which
could otherwise cause guest faults.
Introduce a static key to identify CPUs with split two-stage TLBs.
When enabled, KVM issues an extra HFENCE.VVMA on VCPU migration to
prevent stale VS-stage mappings.
Signed-off-by: Hui Min Mina Chou <minachou@andestech.com> Signed-off-by: Ben Zong-You Xie <ben717@andestech.com> Reviewed-by: Radim Krčmář <rkrcmar@ventanamicro.com> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Link: https://lore.kernel.org/r/20251117084555.157642-1-minachou@andestech.com Signed-off-by: Anup Patel <anup@brainfault.org>
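A sketch of the mechanism; the static key, field, and hook names are assumptions, with kvm_riscv_local_hfence_vvma_all() standing in for whatever flush primitive the patch actually uses:
DEFINE_STATIC_KEY_FALSE(kvm_riscv_split_tlb);

static void vcpu_flush_vs_stage_on_migration(struct kvm_vcpu *vcpu, int cpu,
					     unsigned long vmid)
{
	/* Split-TLB cores may still hold VS-stage entries from the old CPU */
	if (static_branch_unlikely(&kvm_riscv_split_tlb) &&
	    vcpu->arch.last_exec_cpu != cpu)
		kvm_riscv_local_hfence_vvma_all(vmid);
}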
Fangyu Yu [Fri, 21 Nov 2025 13:35:43 +0000 (21:35 +0800)]
RISC-V: KVM: Fix guest page fault within HLV* instructions
When executing HLV* instructions at the HS mode, a guest page fault
may occur when a g-stage page table migration between triggering the
virtual instruction exception and executing the HLV* instruction.
This may be a corner case, and one simpler way to handle this is to
re-execute the instruction where the virtual instruction exception
occurred, and the guest page fault will be automatically handled.
Dong Yang [Mon, 3 Nov 2025 06:28:25 +0000 (14:28 +0800)]
KVM: riscv: Support enabling dirty log gradually in small chunks
There is already support of enabling dirty log gradually in small chunks
for x86 in commit 3c9bd4006bfc ("KVM: x86: enable dirty log gradually in
small chunks") and c862626 ("KVM: arm64: Support enabling dirty log
gradually in small chunks"). This adds support for riscv.
x86 and arm64 writes protect both huge pages and normal pages now, so
riscv protect also protects both huge pages and normal pages.
On a nested virtualization setup (RISC-V KVM running inside a QEMU VM
on an [Intel® Core™ i5-12500H] host), I did some tests with a 2G Linux
VM using different backing page sizes. The time taken for
memory_global_dirty_log_start in the L2 QEMU is listed below:
Page Size    Before Optimization    After Optimization
4K           4490.23ms              31.94ms
2M           48.97ms                45.46ms
1G           28.40ms                30.93ms
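For reference, the userspace side of opting in looks roughly like this (error handling and fd setup omitted); the capability and flag names are the existing KVM UAPI:
#include <linux/kvm.h>
#include <sys/ioctl.h>

static int enable_gradual_dirty_log(int vm_fd)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2,
		.args[0] = KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
			   KVM_DIRTY_LOG_INITIALLY_SET,
	};

	/* With INITIALLY_SET, enabling dirty logging on a memslot no longer
	 * write-protects every page up front; pages are protected lazily as
	 * KVM_CLEAR_DIRTY_LOG is issued in small chunks. */
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}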