]> git.ipfire.org Git - thirdparty/kernel/linux.git/log
thirdparty/kernel/linux.git
2 weeks agoMerge branch 'kvm-arm64/nv-xnx-haf' into kvmarm/next
Oliver Upton [Mon, 1 Dec 2025 08:47:41 +0000 (00:47 -0800)] 
Merge branch 'kvm-arm64/nv-xnx-haf' into kvmarm/next

* kvm-arm64/nv-xnx-haf: (22 commits)
  : Support for FEAT_XNX and FEAT_HAF in nested
  :
  : Add support for a couple of MMU-related features that weren't
  : implemented by KVM's software page table walk:
  :
  :  - FEAT_XNX: Allows the hypervisor to describe execute permissions
  :    separately for EL0 and EL1
  :
  :  - FEAT_HAF: Hardware update of the Access Flag, which in the context of
  :    nested means software walkers must also set the Access Flag.
  :
  : The series also adds some basic support for testing KVM's emulation of
  : the AT instruction, including the implementation detail that AT sets the
  : Access Flag in KVM.
  KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
  KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
  KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
  KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
  KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
  KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
  KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
  KVM: arm64: selftests: Add test for AT emulation
  KVM: arm64: nv: Expose hardware access flag management to NV guests
  KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
  KVM: arm64: Implement HW access flag management in stage-1 SW PTW
  KVM: arm64: Propagate PTW errors up to AT emulation
  KVM: arm64: Add helper for swapping guest descriptor
  KVM: arm64: nv: Use pgtable definitions in stage-2 walk
  KVM: arm64: Handle endianness in read helper for emulated PTW
  KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
  KVM: arm64: Call helper for reading descriptors directly
  KVM: arm64: nv: Advertise support for FEAT_XNX
  KVM: arm64: Teach ptdump about FEAT_XNX permissions
  KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2
  ...

Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoMerge branch 'kvm-arm64/vgic-lr-overflow' into kvmarm/next
Oliver Upton [Mon, 1 Dec 2025 08:47:32 +0000 (00:47 -0800)] 
Merge branch 'kvm-arm64/vgic-lr-overflow' into kvmarm/next

* kvm-arm64/vgic-lr-overflow: (50 commits)
  : Support for VGIC LR overflows, courtesy of Marc Zyngier
  :
  : Address deficiencies in KVM's GIC emulation when a vCPU has more active
  : IRQs than can be represented in the VGIC list registers. Sort the AP
  : list to prioritize inactive and pending IRQs, potentially spilling
  : active IRQs outside of the LRs.
  :
  : Handle deactivation of IRQs outside of the LRs for both EOImode=0/1,
  : which involves special consideration for SPIs being deactivated from a
  : different vCPU than the one that acked it.
  KVM: arm64: Convert ICH_HCR_EL2_TDIR cap to EARLY_LOCAL_CPU_FEATURE
  KVM: arm64: selftests: vgic_irq: Add timer deactivation test
  KVM: arm64: selftests: vgic_irq: Add Group-0 enable test
  KVM: arm64: selftests: vgic_irq: Add asymmetric SPI deaectivation test
  KVM: arm64: selftests: vgic_irq: Perform EOImode==1 deactivation in ack order
  KVM: arm64: selftests: vgic_irq: Remove LR-bound limitation
  KVM: arm64: selftests: vgic_irq: Exclude timer-controlled interrupts
  KVM: arm64: selftests: vgic_irq: Change configuration before enabling interrupt
  KVM: arm64: selftests: vgic_irq: Fix GUEST_ASSERT_IAR_EMPTY() helper
  KVM: arm64: selftests: gic_v3: Disable Group-0 interrupts by default
  KVM: arm64: selftests: gic_v3: Add irq group setting helper
  KVM: arm64: GICv2: Always trap GICV_DIR register
  KVM: arm64: GICv2: Handle deactivation via GICV_DIR traps
  KVM: arm64: GICv2: Handle LR overflow when EOImode==0
  KVM: arm64: GICv3: Force exit to sync ICH_HCR_EL2.En
  KVM: arm64: GICv3: nv: Plug L1 LR sync into deactivation primitive
  KVM: arm64: GICv3: nv: Resync LRs/VMCR/HCR early for better MI emulation
  KVM: arm64: GICv3: Avoid broadcast kick on CPUs lacking TDIR
  KVM: arm64: GICv3: Handle in-LR deactivation when possible
  KVM: arm64: GICv3: Add SPI tracking to handle asymmetric deactivation
  ...

Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoMerge branch 'kvm-arm64/sea-user' into kvmarm/next
Oliver Upton [Mon, 1 Dec 2025 08:47:20 +0000 (00:47 -0800)] 
Merge branch 'kvm-arm64/sea-user' into kvmarm/next

* kvm-arm64/sea-user:
  : Userspace handling of SEAs, courtesy of Jiaqi Yan
  :
  : Add support for processing external aborts in userspace in situations
  : where the host has failed to do so, allowing the VMM to potentially
  : reinject an external abort into the VM.
  Documentation: kvm: new UAPI for handling SEA
  KVM: selftests: Test for KVM_EXIT_ARM_SEA
  KVM: arm64: VM exit to userspace to handle SEA

Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoMerge branch 'kvm-arm64/misc' into kvmarm/next
Oliver Upton [Mon, 1 Dec 2025 08:47:12 +0000 (00:47 -0800)] 
Merge branch 'kvm-arm64/misc' into kvmarm/next

* kvm-arm64/misc:
  : Miscellaneous fixes/cleanups for KVM/arm64
  :
  :  - Fix for need_resched warnings on non-preemptible kernels when
  :    tearing down a VM's stage-2
  :
  :  - Improvements to KVM struct allocation, getting rid of pointless
  :    __GFP_HIGHMEM and switching to kvzalloc()
  :
  :  - SYNC ITS configuration before injecting LPIs in vgic_lpi_stress
  :    selftest
  KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables
  KVM: arm64: Split kvm_pgtable_stage2_destroy()
  KVM: arm64: Only drop references on empty tables in stage2_free_walker
  KVM: selftests: SYNC after guest ITS setup in vgic_lpi_stress
  KVM: selftests: Assert GICR_TYPER.Processor_Number matches selftest CPU number
  KVM: arm64: Use kvzalloc() for kvm struct allocation
  KVM: arm64: Drop useless __GFP_HIGHMEM from kvm struct allocation

Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoKVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
Alexandru Elisei [Fri, 28 Nov 2025 10:09:46 +0000 (10:09 +0000)] 
KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS

A guest can write 1 to TCR_ELx.HA, making the KVM software walker update
the access flag in a table descriptor even if FEAT_HAFDBS is not present.
Avoid this by making wi->ha depend on FEAT_HAFDBS being enabled in the VM,
similar to how the software walker treats FEAT_HPDS.

This is not needed for VTCR_EL2.HA, since a guest will always write to
the in-memory copy of the register, where the HA bit is masked (set to
0) by KVM if the VM doesn't have FEAT_HAFDBS.

Fixes: c59ca4b5b0c3 ("KVM: arm64: Implement HW access flag management in stage-1 SW PTW")
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://msgid.link/20251128100946.74210-5-alexandru.elisei@arm.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoKVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
Alexandru Elisei [Fri, 28 Nov 2025 10:09:44 +0000 (10:09 +0000)] 
KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2

According to ARM DDI 0487L.b, the HA bit in TCR_EL2 when the translation
regime is EL2 (or !ELIsInHost(EL2)) is bit 21, not 39.

Fixes: c59ca4b5b0c3 ("KVM: arm64: Implement HW access flag management in stage-1 SW PTW")
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://msgid.link/20251128100946.74210-3-alexandru.elisei@arm.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoKVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
Alexandru Elisei [Fri, 28 Nov 2025 10:09:43 +0000 (10:09 +0000)] 
KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}

Commit 2608563b466b ("KVM: arm64: Add support for FEAT_XNX stage-2
permissions") added the KVM_PGTABLE_PROX_{UX,PX} permissions to stage 2 and
to EL2 translation regimes, but left them undocumented. Let's fix that.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://msgid.link/20251128100946.74210-2-alexandru.elisei@arm.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoKVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
Colin Ian King [Fri, 28 Nov 2025 17:51:24 +0000 (17:51 +0000)] 
KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"

There is a spelling mistake in a TEST_FAIL message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://msgid.link/20251128175124.319094-1-colin.i.king@gmail.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoKVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
Nathan Chancellor [Tue, 25 Nov 2025 17:59:15 +0000 (10:59 -0700)] 
KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()

Clang warns (or errors with CONFIG_WERROR=y / W=e):

  arch/arm64/kvm/hyp/pgtable.c:757:2: error: label at end of compound statement is a C23 extension [-Werror,-Wc23-extensions]
    757 |         }
        |         ^

With older versions of clang (15 and older) and GCC (at least the minimum
supported, 8.1), this is an unconditional hard error:

  arch/arm64/kvm/hyp/pgtable.c: In function 'kvm_pgtable_stage2_pte_prot':
  arch/arm64/kvm/hyp/pgtable.c:756:2: error: label at end of compound statement
    default:
    ^~~~~~~

  arch/arm64/kvm/hyp/pgtable.c:756:10: error: label at end of compound statement: expected statement
          default:
                  ^
                   ;

Add a break statement to this default case to clear up the error/warning.

Fixes: 2608563b466b ("KVM: arm64: Add support for FEAT_XNX stage-2 permissions")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251125-arm64-kvm-hyp-pgtable-fix-c23-ext-warn-v1-1-98b506ddefbf@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoKVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
Marc Zyngier [Tue, 25 Nov 2025 20:48:48 +0000 (20:48 +0000)] 
KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()

Keep sparse quiet by explicitly casting endianness conversion
when swapping S1 and S2 descriptors.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511260246.JQDGsQKa-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202511260344.9XehvH5Q-lkp@intel.com/
Fixes: c59ca4b5b0c3f ("KVM: arm64: Implement HW access flag management in stage-1 SW PTW")
Fixes: 39db933ba67f8 ("KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251125204848.1136383-1-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoKVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
Oliver Upton [Mon, 24 Nov 2025 23:54:09 +0000 (15:54 -0800)] 
KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n

__lse_swap_desc() is compiled unconditionally, even if LSE is disabled
using the config option. Align with the spirit of the config option and
fix some build errors due to __LSE_PREAMBLE being undefined with the
application of some ifdeffery.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511250700.kAutzJFm-lkp@intel.com/
Link: https://msgid.link/20251124235409.1731253-1-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoKVM: arm64: selftests: Add test for AT emulation
Oliver Upton [Mon, 24 Nov 2025 19:01:57 +0000 (11:01 -0800)] 
KVM: arm64: selftests: Add test for AT emulation

Add a basic test for AT emulation in the EL2&0 and EL1&0 translation
regimes.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-16-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoKVM: arm64: nv: Expose hardware access flag management to NV guests
Oliver Upton [Mon, 24 Nov 2025 19:01:56 +0000 (11:01 -0800)] 
KVM: arm64: nv: Expose hardware access flag management to NV guests

Everything is in place to update the access flag at S1 and S2. Expose
support for the access flag flavor of FEAT_HAFDBS to NV guests.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-15-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoKVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
Oliver Upton [Mon, 24 Nov 2025 19:01:55 +0000 (11:01 -0800)] 
KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW

Give the stage-2 walk similar treatment to stage-1: update the access
flag during the table walk and do so for any walk context.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-14-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoKVM: arm64: Implement HW access flag management in stage-1 SW PTW
Oliver Upton [Mon, 24 Nov 2025 19:01:54 +0000 (11:01 -0800)] 
KVM: arm64: Implement HW access flag management in stage-1 SW PTW

Atomically update the Access flag at stage-1 when the guest has
configured the MMU to do so. Make the implementation choice (and liberal
interpretation of speculation) that any access type updates the Access
flag, including AT and CMO instructions.

Restart the entire walk by returning to the exception-generating
instruction in the case of a failed Access flag update.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-13-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoKVM: arm64: Propagate PTW errors up to AT emulation
Oliver Upton [Mon, 24 Nov 2025 19:01:53 +0000 (11:01 -0800)] 
KVM: arm64: Propagate PTW errors up to AT emulation

KVM's software PTW will soon support 'hardware' updates to the access
flag. Similar to fault handling, races to update the descriptor will be
handled by restarting the instruction. Prepare for this by propagating
errors up to the AT emulation, only retiring the instruction if the walk
succeeds.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-12-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoKVM: arm64: Add helper for swapping guest descriptor
Oliver Upton [Mon, 24 Nov 2025 19:01:52 +0000 (11:01 -0800)] 
KVM: arm64: Add helper for swapping guest descriptor

Implementing FEAT_HAFDBS in KVM's software PTWs requires the ability to
CAS a descriptor to update the in-memory value. Add an accessor to do
exactly that, coping with the fact that guest descriptors are in user
memory (duh).

While FEAT_LSE required on any system that implements NV, KVM now uses
the stage-1 PTW for non-nested use cases meaning an LL/SC implementation
is necessary as well.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-11-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoKVM: arm64: nv: Use pgtable definitions in stage-2 walk
Oliver Upton [Mon, 24 Nov 2025 19:01:51 +0000 (11:01 -0800)] 
KVM: arm64: nv: Use pgtable definitions in stage-2 walk

Use the existing page table definitions instead of magic numbers for the
stage-2 table walk.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-10-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoKVM: arm64: Handle endianness in read helper for emulated PTW
Oliver Upton [Mon, 24 Nov 2025 19:01:50 +0000 (11:01 -0800)] 
KVM: arm64: Handle endianness in read helper for emulated PTW

Implementing FEAT_HAFDBS means adding another descriptor accessor that
needs to deal with the guest-configured endianness. Prepare by moving
the endianness handling into the read accessor and out of the main body
of the S1/S2 PTWs.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-9-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoKVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
Oliver Upton [Mon, 24 Nov 2025 19:01:49 +0000 (11:01 -0800)] 
KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW

The stage-2 table walker passes down the vCPU as a void pointer. That
might've made sense if the walker was generic although at this point it
is clear this will only ever be used in the context of a vCPU.

Suggested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-8-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoKVM: arm64: Call helper for reading descriptors directly
Oliver Upton [Mon, 24 Nov 2025 19:01:48 +0000 (11:01 -0800)] 
KVM: arm64: Call helper for reading descriptors directly

Going through a function pointer doesn't serve much purpose when there's
only one implementation.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-7-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoKVM: arm64: nv: Advertise support for FEAT_XNX
Oliver Upton [Mon, 24 Nov 2025 19:01:47 +0000 (11:01 -0800)] 
KVM: arm64: nv: Advertise support for FEAT_XNX

Everything is in place to support FEAT_XNX, advertise support.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-6-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2 weeks agoKVM: arm64: Teach ptdump about FEAT_XNX permissions
Oliver Upton [Mon, 24 Nov 2025 19:01:46 +0000 (11:01 -0800)] 
KVM: arm64: Teach ptdump about FEAT_XNX permissions

Although KVM doesn't make direct use of the feature, guest hypervisors
can use FEAT_XNX which influences the permissions of the shadow stage-2.
Update ptdump to separately print the privileged and unprivileged
execute permissions.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-5-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: Convert ICH_HCR_EL2_TDIR cap to EARLY_LOCAL_CPU_FEATURE
Marc Zyngier [Tue, 25 Nov 2025 16:01:44 +0000 (16:01 +0000)] 
KVM: arm64: Convert ICH_HCR_EL2_TDIR cap to EARLY_LOCAL_CPU_FEATURE

Suzuki notices that making the ICH_HCR_EL2_TDIR capability a system
one isn't a very good idea, should we end-up with CPUs that have
asymmetric TDIR support (somehow unlikely, but you never know what
level of stupidity vendors are up to). For this hypothetical setup,
making this an "EARLY_LOCAL_CPU_FEATURE" is a much better option.

This is actually consistent with what we already do with GICv5
legacy interface, so flip the capability over.

Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Fixes: 2a28810cbb8b2 ("KVM: arm64: GICv3: Detect and work around the lack of ICV_DIR_EL1 trapping")
Link: https://lore.kernel.org/r/5df713d4-8b79-4456-8fd1-707ca89a61b6@arm.com
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://msgid.link/20251125160144.1086511-1-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: selftests: vgic_irq: Add timer deactivation test
Marc Zyngier [Thu, 20 Nov 2025 17:25:39 +0000 (17:25 +0000)] 
KVM: arm64: selftests: vgic_irq: Add timer deactivation test

Add a new test case that triggers the HW deactivation emulation path
when trapping ICV_DIR_EL1. This is obviously tied to the way KVM
works now, but the test follows the expected architectural behaviour.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-50-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: selftests: vgic_irq: Add Group-0 enable test
Marc Zyngier [Thu, 20 Nov 2025 17:25:38 +0000 (17:25 +0000)] 
KVM: arm64: selftests: vgic_irq: Add Group-0 enable test

Add a new test case that inject a Group-0 interrupt together
with a bunch of Group-1 interrupts, Ack/EOI the G1 interrupts,
and only then enable G0, expecting to get the G0 interrupt.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-49-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: selftests: vgic_irq: Add asymmetric SPI deaectivation test
Marc Zyngier [Thu, 20 Nov 2025 17:25:37 +0000 (17:25 +0000)] 
KVM: arm64: selftests: vgic_irq: Add asymmetric SPI deaectivation test

Add a new test case that makes an interrupt pending on a vcpu,
activates it, do the priority drop, and then get *another* vcpu
to do the deactivation.

Special care is taken not to trigger an exit in the process, so
that we are sure that the active interrupt is in an LR. Joy.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-48-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: selftests: vgic_irq: Perform EOImode==1 deactivation in ack order
Marc Zyngier [Thu, 20 Nov 2025 17:25:36 +0000 (17:25 +0000)] 
KVM: arm64: selftests: vgic_irq: Perform EOImode==1 deactivation in ack order

When EOImode==1, perform the deactivation in the order of activation,
just to make things a bit worse for KVM. Yes, I'm nasty.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-47-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: selftests: vgic_irq: Remove LR-bound limitation
Marc Zyngier [Thu, 20 Nov 2025 17:25:35 +0000 (17:25 +0000)] 
KVM: arm64: selftests: vgic_irq: Remove LR-bound limitation

Good news: our GIC emulation is not completely broken, and we can
activate as many interrupts as we want.

Bump the test to cover all the SGIs, all the allowed PPIs, and
31 SPIs. Yes, 31, because we have 31 available priorities, and the
test is not happy with having two interrupts with the same priority.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-46-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: selftests: vgic_irq: Exclude timer-controlled interrupts
Marc Zyngier [Thu, 20 Nov 2025 17:25:34 +0000 (17:25 +0000)] 
KVM: arm64: selftests: vgic_irq: Exclude timer-controlled interrupts

The PPI injection API is clear that you can't inject the timer PPIs
from userspace, since they are controlled by the timers themselves.

Add an exclusion list for this purpose.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-45-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: selftests: vgic_irq: Change configuration before enabling interrupt
Marc Zyngier [Thu, 20 Nov 2025 17:25:33 +0000 (17:25 +0000)] 
KVM: arm64: selftests: vgic_irq: Change configuration before enabling interrupt

The architecture is pretty clear that changing the configuration of
an enable interrupt is not OK. It doesn't really matter here, but
doing the right thing is not more expensive.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-44-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: selftests: vgic_irq: Fix GUEST_ASSERT_IAR_EMPTY() helper
Marc Zyngier [Thu, 20 Nov 2025 17:25:32 +0000 (17:25 +0000)] 
KVM: arm64: selftests: vgic_irq: Fix GUEST_ASSERT_IAR_EMPTY() helper

No, 0 is not a spurious INTID. Never been, never was.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-43-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: selftests: gic_v3: Disable Group-0 interrupts by default
Marc Zyngier [Thu, 20 Nov 2025 17:25:31 +0000 (17:25 +0000)] 
KVM: arm64: selftests: gic_v3: Disable Group-0 interrupts by default

Make sure G0 is disabled at the point of initialising the GIC.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-42-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: selftests: gic_v3: Add irq group setting helper
Marc Zyngier [Thu, 20 Nov 2025 17:25:30 +0000 (17:25 +0000)] 
KVM: arm64: selftests: gic_v3: Add irq group setting helper

Being able to set the group of an interrupt is pretty useful.
Add such a helper.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-41-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv2: Always trap GICV_DIR register
Marc Zyngier [Thu, 20 Nov 2025 17:25:29 +0000 (17:25 +0000)] 
KVM: arm64: GICv2: Always trap GICV_DIR register

Since we can't decide to trap the DIR register on a per-vcpu basis,
always trap the second page of the GIC CPU interface. Yes, this is
costly. On the bright side, no sane SW should use EOImode==1 on
GICv2...

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-40-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv2: Handle deactivation via GICV_DIR traps
Marc Zyngier [Thu, 20 Nov 2025 17:25:28 +0000 (17:25 +0000)] 
KVM: arm64: GICv2: Handle deactivation via GICV_DIR traps

Add the plumbing of GICv2 interrupt deactivation via GICV_DIR.
This requires adding a new device so that we can easily decode
the DIR address.

The deactivation itself is very similar to the GICv3 version.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-39-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv2: Handle LR overflow when EOImode==0
Marc Zyngier [Thu, 20 Nov 2025 17:25:27 +0000 (17:25 +0000)] 
KVM: arm64: GICv2: Handle LR overflow when EOImode==0

Similarly to the GICv3 version, handle the EOIcount-driven deactivation
by walking the overflow list.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-38-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv3: Force exit to sync ICH_HCR_EL2.En
Marc Zyngier [Thu, 20 Nov 2025 17:25:26 +0000 (17:25 +0000)] 
KVM: arm64: GICv3: Force exit to sync ICH_HCR_EL2.En

FEAT_NV2 is pretty terrible for anything that tries to enforce immediate
effects, and writing to ICH_HCR_EL2 in the hope to disable a maintenance
interrupt is vain. This only hits memory, and the guest hasn't cleared
anything -- the MI will fire.

For example, running the vgic_irq test under NV results in about 800
maintenance interrupts being actually handled by the L1 guest,
when none were expected.

As a cheap workaround, read back ICH_MISR_EL2 after writing 0 to
ICH_HCR_EL2. This is very cheap on real HW, and causes a trap to
the host in NV, giving it the opportunity to retire the pending MI.
With this, the above test runs to completion without any MI being
actually handled.

Yes, this is really poor...

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-37-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv3: nv: Plug L1 LR sync into deactivation primitive
Marc Zyngier [Thu, 20 Nov 2025 17:25:25 +0000 (17:25 +0000)] 
KVM: arm64: GICv3: nv: Plug L1 LR sync into deactivation primitive

Pretty much like the rest of the LR handling, deactivation of an
L2 interrupt gets reflected in the L1 LRs, and therefore must be
propagated into the L1 shadow state if the interrupt is HW-bound.

Instead of directly handling the active state (which looks a bit
off as it ignores locking and L1->L0 HW propagation), use the new
deactivation primitive to perform the deactivation and deal with
the required maintenance.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-36-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv3: nv: Resync LRs/VMCR/HCR early for better MI emulation
Marc Zyngier [Thu, 20 Nov 2025 17:25:24 +0000 (17:25 +0000)] 
KVM: arm64: GICv3: nv: Resync LRs/VMCR/HCR early for better MI emulation

The current approach to nested GICv3 support is to not do anything
while L2 is running, wait a transition from L2 to L1 to resync
LRs, VMCR and HCR, and only then evaluate the state to decide
whether to generate a maintenance interrupt.

This doesn't provide a good quality of emulation, and it would be
far preferable to find out early that we need to perform a switch.

Move the LRs/VMCR and HCR resync into vgic_v3_sync_nested(), so
that we have most of the state available. As we turning the vgic
off at this stage to avoid a screaming host MI, add a new helper
vgic_v3_flush_nested() that switches the vgic on again. The MI can
then be directly injected as required.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-35-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv3: Avoid broadcast kick on CPUs lacking TDIR
Marc Zyngier [Thu, 20 Nov 2025 17:25:23 +0000 (17:25 +0000)] 
KVM: arm64: GICv3: Avoid broadcast kick on CPUs lacking TDIR

CPUs lacking TDIR always trap ICV_DIR_EL1, no matter what, since
we have ICH_HCR_EL2.TC set permanently. For these CPUs, it is
useless to use a broadcast kick on SPI injection, as the sole
purpose of this is to set TDIR.

We can therefore skip this on these CPUs, which are challenged
enough not to be burdened by extra IPIs. As a consequence,
permanently set the TDIR bit in the shadow state to notify the
fast-path emulation code of the exit reason.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-34-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv3: Handle in-LR deactivation when possible
Marc Zyngier [Thu, 20 Nov 2025 17:25:22 +0000 (17:25 +0000)] 
KVM: arm64: GICv3: Handle in-LR deactivation when possible

Even when we have either an LR overflow or SPIs in flight, it is
extremely likely that the interrupt being deactivated is still in
the LRs, and that going all the way back to the the generic trap
handling code is a waste of time.

Instead, try and deactivate in place when possible, and only if
this fails, perform a full exit.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-33-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv3: Add SPI tracking to handle asymmetric deactivation
Marc Zyngier [Thu, 20 Nov 2025 17:25:21 +0000 (17:25 +0000)] 
KVM: arm64: GICv3: Add SPI tracking to handle asymmetric deactivation

SPIs are specially annpying, as they can be activated on a CPU and
deactivated on another. WHich means that when an SPI is in flight
anywhere, all CPUs need to have their TDIR trap bit set.

This translates into broadcasting an IPI across all CPUs to make sure
they set their trap bit, The number of in-flight SPIs is kept in
an atomic variable so that CPUs can turn the trap bit off as soon
as possible.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-32-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv3: Set ICH_HCR_EL2.TDIR when interrupts overflow LR capacity
Marc Zyngier [Thu, 20 Nov 2025 17:25:20 +0000 (17:25 +0000)] 
KVM: arm64: GICv3: Set ICH_HCR_EL2.TDIR when interrupts overflow LR capacity

Now that we are ready to handle deactivation through ICV_DIR_EL1,
set the trap bit if we have active interrupts outside of the LRs.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-31-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv3: Add GICv2 SGI handling to deactivation primitive
Marc Zyngier [Thu, 20 Nov 2025 17:25:19 +0000 (17:25 +0000)] 
KVM: arm64: GICv3: Add GICv2 SGI handling to deactivation primitive

The GICv2 SGIs require additional handling for deactivation, as they
are effectively multiple interrrupts muxed into one. Make sure we
check for the source CPU when deactivating.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-30-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv3: Handle deactivation via ICV_DIR_EL1 traps
Marc Zyngier [Thu, 20 Nov 2025 17:25:18 +0000 (17:25 +0000)] 
KVM: arm64: GICv3: Handle deactivation via ICV_DIR_EL1 traps

Deactivation via ICV_DIR_EL1 is both relatively straightforward
(we have the interrupt that needs deactivation) and really awkward.

The main issue is that the interrupt may either be in an LR on
another CPU, or ourside of any LR.

In the former case, we process the deactivation is if ot was
a write to GICD_CACTIVERn, which is already implemented as a big
hammer IPI'ing all vcpus. In the latter case, we just perform
a normal deactivation, similar to what we do for EOImode==0.

Another annoying aspect is that we need to tell the CPU owning
the interrupt that its ap_list needs laudering. We use a brand new
vcpu request to that effect.

Note that this doesn't address deactivation via the GICV MMIO view,
which will be taken care of in a later change.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-29-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv3: Handle LR overflow when EOImode==0
Marc Zyngier [Thu, 20 Nov 2025 17:25:17 +0000 (17:25 +0000)] 
KVM: arm64: GICv3: Handle LR overflow when EOImode==0

Now that we can identify interrupts that have not made it into the LRs,
it becomes relatively easy to use EOIcount to walk the overflow list.

What is a bit odd is that we compute a fake LR for the original
state of the interrupt, clear the active bit, and feed into the existing
logic for processing. In a way, this is what would have happened if
the interrupt was in an LR.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-28-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: Use MI to detect groups being enabled/disabled
Marc Zyngier [Thu, 20 Nov 2025 17:25:16 +0000 (17:25 +0000)] 
KVM: arm64: Use MI to detect groups being enabled/disabled

Add the maintenance interrupt to force an exit when the guest
enables/disables individual groups, so that we can resort the
ap_list accordingly.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-27-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: Move undeliverable interrupts to the end of ap_list
Marc Zyngier [Thu, 20 Nov 2025 17:25:15 +0000 (17:25 +0000)] 
KVM: arm64: Move undeliverable interrupts to the end of ap_list

Interrupts in the ap_list that cannot be acted upon because they
are not enabled, or that their group is not enabled, shouldn't
make it into the LRs if we are space-constrained.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-26-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: Invert ap_list sorting to push active interrupts out
Marc Zyngier [Thu, 20 Nov 2025 17:25:14 +0000 (17:25 +0000)] 
KVM: arm64: Invert ap_list sorting to push active interrupts out

Having established that pending interrupts should have priority
to be moved into the LRs over the active interrupts, implement this
in the ap_list sorting.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-25-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: Make vgic_target_oracle() globally available
Marc Zyngier [Thu, 20 Nov 2025 17:25:13 +0000 (17:25 +0000)] 
KVM: arm64: Make vgic_target_oracle() globally available

Make the internal crystal ball global, so that implementation-specific
code can use it.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-24-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: Turn kvm_vgic_vcpu_enable() into kvm_vgic_vcpu_reset()
Marc Zyngier [Thu, 20 Nov 2025 17:25:12 +0000 (17:25 +0000)] 
KVM: arm64: Turn kvm_vgic_vcpu_enable() into kvm_vgic_vcpu_reset()

Now that we always reconfigure the vgic HCR register on entry,
the "enable" part of kvm_vgic_vcpu_enable() is pretty useless.

Removing the enable bits from these functions makes it plain that
they are just about computing the reset state. Just rename the
functions accordingly.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-23-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: Revamp vgic maintenance interrupt configuration
Marc Zyngier [Thu, 20 Nov 2025 17:25:11 +0000 (17:25 +0000)] 
KVM: arm64: Revamp vgic maintenance interrupt configuration

We currently don't use the maintenance interrupt very much, apart
from EOI on level interrupts, and for LR underflow in limited cases.

However, as we are moving toward a setup where active interrupts
can live outside of the LRs, we need to use the MIs in a more
diverse set of cases.

Add a new helper that produces a digest of the ap_list, and use
that summary to set the various control bits as required.

This slightly changes the way v2 SGIs are handled, as they used to
count for more than one interrupt, but not anymore.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-22-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: Eagerly save VMCR on exit
Marc Zyngier [Thu, 20 Nov 2025 17:25:10 +0000 (17:25 +0000)] 
KVM: arm64: Eagerly save VMCR on exit

We currently save/restore the VMCR register in a pretty lazy way
(on load/put, consistently with what we do with the APRs).

However, we are going to need the group-enable bits that are backed
by VMCR on each entry (so that we can avoid injecting interrupts for
disabled groups).

Move the synchronisation from put to sync, which results in some minor
churn in the nVHE hypercalls to simplify things.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-21-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: Compute vgic state irrespective of the number of interrupts
Marc Zyngier [Thu, 20 Nov 2025 17:25:09 +0000 (17:25 +0000)] 
KVM: arm64: Compute vgic state irrespective of the number of interrupts

As we are going to rely on the [G]ICH_HCR{,_EL2} register to be
programmed with MI information at all times, slightly de-optimise
the flush/sync code to always be called. This is rather lightweight
when no interrupts are in flight.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-20-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv2: Extract LR computing primitive
Marc Zyngier [Thu, 20 Nov 2025 17:25:08 +0000 (17:25 +0000)] 
KVM: arm64: GICv2: Extract LR computing primitive

Split vgic_v2_populate_lr() into two helpers, so that we have another
primitive that computes the LR from a vgic_irq, but doesn't update
anything in the shadow structure.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-19-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv2: Extract LR folding primitive
Marc Zyngier [Thu, 20 Nov 2025 17:25:07 +0000 (17:25 +0000)] 
KVM: arm64: GICv2: Extract LR folding primitive

As we are going to need to handle deactivation for interrupts that
are not in the LRs, split vgic_v2_fold_lr_state() into a helper
that deals with a single interrupt, and the function that loops
over the used LRs.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-18-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv2: Decouple GICH_HCR programming from LRs being loaded
Marc Zyngier [Thu, 20 Nov 2025 17:25:06 +0000 (17:25 +0000)] 
KVM: arm64: GICv2: Decouple GICH_HCR programming from LRs being loaded

Not programming GICH_HCR while no LRs are populated is a bit
of an issue, as we otherwise don't see any maintenance interrupt
when the guest interacts with the LRs.

Decouple the two and always program the control register, even when
we don't have to touch the LRs.

This is very similar to what we are already doing for GICv3.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-17-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv2: Preserve EOIcount on exit
Marc Zyngier [Thu, 20 Nov 2025 17:25:05 +0000 (17:25 +0000)] 
KVM: arm64: GICv2: Preserve EOIcount on exit

EOIcount is how the virtual CPU interface signals that the guest
is deactivating interrupts outside of the LRs when EOImode==0.

We therefore need to preserve that information so that we can find
out what actually needs deactivating, just like we already do on
GICv3.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-16-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv3: Extract LR computing primitive
Marc Zyngier [Thu, 20 Nov 2025 17:25:04 +0000 (17:25 +0000)] 
KVM: arm64: GICv3: Extract LR computing primitive

Split vgic_v3_populate_lr() into two, so that we have another
primitive that computes the LR from a vgic_irq, but doesn't
update anything in the shadow structure.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-15-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv3: Extract LR folding primitive
Marc Zyngier [Thu, 20 Nov 2025 17:25:03 +0000 (17:25 +0000)] 
KVM: arm64: GICv3: Extract LR folding primitive

As we are going to need to handle deactivation for interrupts that
are not in the LRs, split vgic_v3_fold_lr_state() into a helper
that deals with a single interrupt, and the function that loops
over the used LRs.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-14-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv3: Decouple ICH_HCR_EL2 programming from LRs
Marc Zyngier [Thu, 20 Nov 2025 17:25:02 +0000 (17:25 +0000)] 
KVM: arm64: GICv3: Decouple ICH_HCR_EL2 programming from LRs

Not programming ICH_HCR_EL2 while no LRs are populated is a bit
of an issue, as we otherwise don't see any maintenance interrupt
when the guest interacts with the LRs.

Decouple the two and always program the control register, even when
we don't have to touch the LRs.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-13-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv3: Preserve EOIcount on exit
Marc Zyngier [Thu, 20 Nov 2025 17:25:01 +0000 (17:25 +0000)] 
KVM: arm64: GICv3: Preserve EOIcount on exit

EOIcount is how the virtual CPU interface signals that the guest
is deactivating interrupts outside of the LRs when EOImode==0.

We therefore need to preserve that information so that we can find
out what actually needs deactivating.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-12-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv3: Drop LPI active state when folding LRs
Marc Zyngier [Thu, 20 Nov 2025 17:25:00 +0000 (17:25 +0000)] 
KVM: arm64: GICv3: Drop LPI active state when folding LRs

Despite LPIs not having an active state, *virtual* LPIs do have
one, which gets cleared on EOI. So far, so good.

However, this leads to a small problem: when an active LPI is not
in the LRs, that EOImode==0 and that the guest EOIs it, EOIcount
doesn't get bumped up. Which means that in these condition, the
LPI would stay active forever.

Clearly, we can't have that. So if we spot an active LPI, we drop
that state. It's pretty pointless anyway, and only serves as a way
to trip SW over.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-11-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: Add LR overflow handling documentation
Marc Zyngier [Thu, 20 Nov 2025 17:24:59 +0000 (17:24 +0000)] 
KVM: arm64: Add LR overflow handling documentation

Add a bit of documentation describing how we are dealing with LR
overflow. This is mostly a braindump of how things are expected
to work. For now anyway.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-10-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: Add tracking of vgic_irq being present in a LR
Marc Zyngier [Thu, 20 Nov 2025 17:24:58 +0000 (17:24 +0000)] 
KVM: arm64: Add tracking of vgic_irq being present in a LR

We currently cannot identify whether an interrupt is queued into
a LR. It wasn't needed until now, but that's about to change.

Add yet another flag to track that state.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-9-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: Repack struct vgic_irq fields
Marc Zyngier [Thu, 20 Nov 2025 17:24:57 +0000 (17:24 +0000)] 
KVM: arm64: Repack struct vgic_irq fields

struct vgic_irq has grown over the years, in a rather bad way.
Repack it using bitfields so that the individual flags, and move
things around a bit so that it a bit smaller.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-8-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: GICv3: Detect and work around the lack of ICV_DIR_EL1 trapping
Marc Zyngier [Thu, 20 Nov 2025 17:24:56 +0000 (17:24 +0000)] 
KVM: arm64: GICv3: Detect and work around the lack of ICV_DIR_EL1 trapping

A long time ago, an unsuspecting architect forgot to add a trap
bit for ICV_DIR_EL1 in ICH_HCR_EL2. Which was unfortunate, but
what's a bit of spec between friends? Thankfully, this was fixed
in a later revision, and ARM "deprecates" the lack of trapping
ability.

Unfortuantely, a few (billion) CPUs went out with that defect,
anything ARMv8.0 from ARM, give or take. And on these CPUs,
you can't trap DIR on its own, full stop.

As the next best thing, we can trap everything in the common group,
which is a tad expensive, but hey ho, that's what you get. You can
otherwise recycle the HW in the neaby bin.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-7-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: vgic-v3: Fix GICv3 trapping in protected mode
Marc Zyngier [Thu, 20 Nov 2025 17:24:55 +0000 (17:24 +0000)] 
KVM: arm64: vgic-v3: Fix GICv3 trapping in protected mode

As we are about to start trapping a bunch of extra things, augment
the pKVM trap description with all the registers trapped by ICH_HCR_EL2.TC,
making them legal instead of resulting in a UNDEF injection in the guest.

While we're at it, ensure that pKVM captures the vgic model so that it
can be checked by the emulation code.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-6-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: Turn vgic-v3 errata traps into a patched-in constant
Marc Zyngier [Thu, 20 Nov 2025 17:24:54 +0000 (17:24 +0000)] 
KVM: arm64: Turn vgic-v3 errata traps into a patched-in constant

The trap bits are currently only set to manage CPU errata. However,
we are about to make use of them for purposes beyond beating broken
CPUs into submission.

For this purpose, turn these errata-driven bits into a patched-in
constant that is merged with the KVM-driven value at the point of
programming the ICH_HCR_EL2 register, rather than being directly
stored with with the shadow value..

This allows the KVM code to distinguish between a trap being handled
for the purpose of an erratum workaround, or for KVM's own need.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-5-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoirqchip/apple-aic: Spit out ICH_MISR_EL2 value on spurious vGIC MI
Marc Zyngier [Thu, 20 Nov 2025 17:24:53 +0000 (17:24 +0000)] 
irqchip/apple-aic: Spit out ICH_MISR_EL2 value on spurious vGIC MI

It is all good and well to scream about spurious vGIC maintenance
interrupts. It would be even better to output the reason why, which
is already checked, but not printed out.

The unsuspecting kernel tinkerer thanks you.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-4-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoirqchip/gic: Expose CPU interface VA to KVM
Marc Zyngier [Thu, 20 Nov 2025 17:24:52 +0000 (17:24 +0000)] 
irqchip/gic: Expose CPU interface VA to KVM

Future changes will require KVM to be able to perform deactivations
by writing to the physical CPU interface. Add the corresponding
VA to the kvm_info structure, and let KVM stash it.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-3-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoirqchip/gic: Add missing GICH_HCR control bits
Marc Zyngier [Thu, 20 Nov 2025 17:24:51 +0000 (17:24 +0000)] 
irqchip/gic: Add missing GICH_HCR control bits

The GICH_HCR description is missing a bunch of control bits that
control the maintenance interrupt. Add them.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-2-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2
Oliver Upton [Mon, 24 Nov 2025 19:01:45 +0000 (11:01 -0800)] 
KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2

Add support for FEAT_XNX to shadow stage-2 MMUs, being careful to only
evaluate XN[0] when the feature is actually exposed to the VM.
Restructure the layering of permissions in the fault handler to assume
pX and uX then restricting based on the guest's stage-2 afterwards.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-4-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoKVM: arm64: Add support for FEAT_XNX stage-2 permissions
Oliver Upton [Mon, 24 Nov 2025 19:01:44 +0000 (11:01 -0800)] 
KVM: arm64: Add support for FEAT_XNX stage-2 permissions

FEAT_XNX adds support for encoding separate execute permissions for EL0
and EL1 at stage-2. Add support for this to the page table library,
hiding the unintuitive encoding scheme behind generic pX and uX
permission flags.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-3-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
3 weeks agoarm64: Detect FEAT_XNX
Oliver Upton [Mon, 24 Nov 2025 19:01:43 +0000 (11:01 -0800)] 
arm64: Detect FEAT_XNX

Detect the feature in anticipation of using it in KVM.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-2-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
4 weeks agoKVM: arm64: Reschedule as needed when destroying the stage-2 page-tables
Raghavendra Rao Ananta [Thu, 13 Nov 2025 05:24:52 +0000 (05:24 +0000)] 
KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables

When a large VM, specifically one that holds a significant number of PTEs,
gets abruptly destroyed, the following warning is seen during the
page-table walk:

 sched: CPU 0 need_resched set for > 100018840 ns (100 ticks) without schedule
 CPU: 0 UID: 0 PID: 9617 Comm: kvm_page_table_ Tainted: G O 6.16.0-smp-DEV #3 NONE
 Tainted: [O]=OOT_MODULE
 Call trace:
  show_stack+0x20/0x38 (C)
  dump_stack_lvl+0x3c/0xb8
  dump_stack+0x18/0x30
  resched_latency_warn+0x7c/0x88
  sched_tick+0x1c4/0x268
  update_process_times+0xa8/0xd8
  tick_nohz_handler+0xc8/0x168
  __hrtimer_run_queues+0x11c/0x338
  hrtimer_interrupt+0x104/0x308
  arch_timer_handler_phys+0x40/0x58
  handle_percpu_devid_irq+0x8c/0x1b0
  generic_handle_domain_irq+0x48/0x78
  gic_handle_irq+0x1b8/0x408
  call_on_irq_stack+0x24/0x30
  do_interrupt_handler+0x54/0x78
  el1_interrupt+0x44/0x88
  el1h_64_irq_handler+0x18/0x28
  el1h_64_irq+0x84/0x88
  stage2_free_walker+0x30/0xa0 (P)
  __kvm_pgtable_walk+0x11c/0x258
  __kvm_pgtable_walk+0x180/0x258
  __kvm_pgtable_walk+0x180/0x258
  __kvm_pgtable_walk+0x180/0x258
  kvm_pgtable_walk+0xc4/0x140
  kvm_pgtable_stage2_destroy+0x5c/0xf0
  kvm_free_stage2_pgd+0x6c/0xe8
  kvm_uninit_stage2_mmu+0x24/0x48
  kvm_arch_flush_shadow_all+0x80/0xa0
  kvm_mmu_notifier_release+0x38/0x78
  __mmu_notifier_release+0x15c/0x250
  exit_mmap+0x68/0x400
  __mmput+0x38/0x1c8
  mmput+0x30/0x68
  exit_mm+0xd4/0x198
  do_exit+0x1a4/0xb00
  do_group_exit+0x8c/0x120
  get_signal+0x6d4/0x778
  do_signal+0x90/0x718
  do_notify_resume+0x70/0x170
  el0_svc+0x74/0xd8
  el0t_64_sync_handler+0x60/0xc8
  el0t_64_sync+0x1b0/0x1b8

The warning is seen majorly on the host kernels that are configured
not to force-preempt, such as CONFIG_PREEMPT_NONE=y. To avoid this,
instead of walking the entire page-table in one go, split it into
smaller ranges, by checking for cond_resched() between each range.
Since the path is executed during VM destruction, after the
page-table structure is unlinked from the KVM MMU, relying on
cond_resched_rwlock_write() isn't necessary.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Link: https://msgid.link/20251113052452.975081-4-rananta@google.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
4 weeks agoKVM: arm64: Split kvm_pgtable_stage2_destroy()
Raghavendra Rao Ananta [Thu, 13 Nov 2025 05:24:51 +0000 (05:24 +0000)] 
KVM: arm64: Split kvm_pgtable_stage2_destroy()

Split kvm_pgtable_stage2_destroy() into two:
  - kvm_pgtable_stage2_destroy_range(), that performs the
    page-table walk and free the entries over a range of addresses.
  - kvm_pgtable_stage2_destroy_pgd(), that frees the PGD.

This refactoring enables subsequent patches to free large page-tables
in chunks, calling cond_resched() between each chunk, to yield the
CPU as necessary.

Existing callers of kvm_pgtable_stage2_destroy(), that probably cannot
take advantage of this (such as nVMHE), will continue to function as is.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Suggested-by: Oliver Upton <oupton@kernel.org>
Link: https://msgid.link/20251113052452.975081-3-rananta@google.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
4 weeks agoKVM: arm64: Only drop references on empty tables in stage2_free_walker
Oliver Upton [Wed, 19 Nov 2025 22:11:50 +0000 (14:11 -0800)] 
KVM: arm64: Only drop references on empty tables in stage2_free_walker

A subsequent change to the way KVM frees stage-2s will invoke the free
walker on sub-ranges of the VM's IPA space, meaning there's potential
for only partially visiting a table's PTEs.

Split the leaf and table visitors and only drop references on a table
when the page count reaches 1, implying there are no valid PTEs that
need to be visited. Invalidate the table PTE to avoid traversing the
stale reference.

Link: https://msgid.link/20251113052452.975081-2-rananta@google.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
4 weeks agoKVM: selftests: SYNC after guest ITS setup in vgic_lpi_stress
Maximilian Dittgen [Wed, 19 Nov 2025 13:57:44 +0000 (14:57 +0100)] 
KVM: selftests: SYNC after guest ITS setup in vgic_lpi_stress

vgic_lpi_stress sends MAPTI and MAPC commands during guest GIC setup to
map interrupt events to ITT entries and collection IDs to
redistributors, respectively.

We have no guarantee that the ITS will finish handling these mapping
commands before the selftest calls KVM_SIGNAL_MSI to inject LPIs to the
guest. If LPIs are injected before ITS mapping completes, the ITS cannot
properly pass the interrupt on to the redistributor.

Fix by adding a SYNC command to the selftests ITS library, then calling
SYNC after ITS mapping to ensure mapping completes before signal_lpi()
writes to GITS_TRANSLATER.

Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de>
Link: https://msgid.link/20251119135744.68552-2-mdittgen@amazon.de
Signed-off-by: Oliver Upton <oupton@kernel.org>
4 weeks agoKVM: selftests: Assert GICR_TYPER.Processor_Number matches selftest CPU number
Maximilian Dittgen [Wed, 19 Nov 2025 13:57:43 +0000 (14:57 +0100)] 
KVM: selftests: Assert GICR_TYPER.Processor_Number matches selftest CPU number

The selftests GIC library and tests assume that the
GICR_TYPER.Processor_number associated with a given CPU is the same as
the CPU's selftest index.

Since this assumption is not guaranteed by specification, add an assert
in gicv3_cpu_init() that validates this is true.

Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de>
Link: https://msgid.link/20251119135744.68552-1-mdittgen@amazon.de
Signed-off-by: Oliver Upton <oupton@kernel.org>
4 weeks agoKVM: arm64: Use kvzalloc() for kvm struct allocation
Oliver Upton [Wed, 19 Nov 2025 09:38:22 +0000 (01:38 -0800)] 
KVM: arm64: Use kvzalloc() for kvm struct allocation

Physically-allocated KVM structs aren't necessary when in VHE mode as
there's no need to share with the hyp's address space. Of course, there
can still be a performance benefit from physical allocations.

Use kvzalloc() for opportunistic physical allocations.

Acked-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://msgid.link/20251119093822.2513142-3-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
4 weeks agoKVM: arm64: Drop useless __GFP_HIGHMEM from kvm struct allocation
Oliver Upton [Wed, 19 Nov 2025 09:38:21 +0000 (01:38 -0800)] 
KVM: arm64: Drop useless __GFP_HIGHMEM from kvm struct allocation

A recent change on the receiving end of vmalloc() started warning about
unsupported GFP flags passed by the caller. Nathan reports that this
warning fires in kvm_arch_alloc_vm(), owing to the fact that KVM is
passing a meaningless __GFP_HIGHMEM.

Do as the warning says and fix the code.

Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://lore.kernel.org/kvmarm/20251118224448.GA998046@ax162/
Acked-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://msgid.link/20251119093822.2513142-2-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
5 weeks agoDocumentation: kvm: new UAPI for handling SEA
Jiaqi Yan [Mon, 13 Oct 2025 18:59:03 +0000 (18:59 +0000)] 
Documentation: kvm: new UAPI for handling SEA

Document the new userspace-visible features and APIs for handling
synchronous external abort (SEA)
- KVM_CAP_ARM_SEA_TO_USER: How userspace enables the new feature.
- KVM_EXIT_ARM_SEA: exit userspace gets when it needs to handle SEA
  and what userspace gets while taking the SEA.

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Link: https://msgid.link/20251013185903.1372553-4-jiaqiyan@google.com
[ oliver: make documentation concise, remove implementation detail ]
Signed-off-by: Oliver Upton <oupton@kernel.org>
5 weeks agoKVM: selftests: Test for KVM_EXIT_ARM_SEA
Jiaqi Yan [Mon, 13 Oct 2025 18:59:02 +0000 (18:59 +0000)] 
KVM: selftests: Test for KVM_EXIT_ARM_SEA

Test how KVM handles guest SEA when APEI is unable to claim it, and
KVM_CAP_ARM_SEA_TO_USER is enabled.

The behavior is triggered by consuming recoverable memory error (UER)
injected via EINJ. The test asserts two major things:
1. KVM returns to userspace with KVM_EXIT_ARM_SEA exit reason, and
   has provided expected fault information, e.g. esr, flags, gva, gpa.
2. Userspace is able to handle KVM_EXIT_ARM_SEA by injecting SEA to
   guest and KVM injects expected SEA into the VCPU.

Tested on a data center server running Siryn AmpereOne processor
that has RAS support.

Several things to notice before attempting to run this selftest:
- The test relies on EINJ support in both firmware and kernel to
  inject UER. Otherwise the test will be skipped.
- The under-test platform's APEI should be unable to claim the SEA.
  Otherwise the test will be skipped.
- Some platform doesn't support notrigger in EINJ, which may cause
  APEI and GHES to offline the memory before guest can consume
  injected UER, and making test unable to trigger SEA.

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Link: https://msgid.link/20251013185903.1372553-3-jiaqiyan@google.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
5 weeks agoKVM: arm64: VM exit to userspace to handle SEA
Jiaqi Yan [Mon, 13 Oct 2025 18:59:01 +0000 (18:59 +0000)] 
KVM: arm64: VM exit to userspace to handle SEA

When APEI fails to handle a stage-2 synchronous external abort (SEA),
today KVM injects an asynchronous SError to the VCPU then resumes it,
which usually results in unpleasant guest kernel panic.

One major situation of guest SEA is when vCPU consumes recoverable
uncorrected memory error (UER). Although SError and guest kernel panic
effectively stops the propagation of corrupted memory, guest may
re-use the corrupted memory if auto-rebooted; in worse case, guest
boot may run into poisoned memory. So there is room to recover from
an UER in a more graceful manner.

Alternatively KVM can redirect the synchronous SEA event to VMM to
- Reduce blast radius if possible. VMM can inject a SEA to VCPU via
  KVM's existing KVM_SET_VCPU_EVENTS API. If the memory poison
  consumption or fault is not from guest kernel, blast radius can be
  limited to the triggering thread in guest userspace, so VM can
  keep running.
- Allow VMM to protect from future memory poison consumption by
  unmapping the page from stage-2, or to interrupt guest of the
  poisoned page so guest kernel can unmap it from stage-1 page table.
- Allow VMM to track SEA events that VM customers care about, to restart
  VM when certain number of distinct poison events have happened,
  to provide observability to customers in log management UI.

Introduce an userspace-visible feature to enable VMM handle SEA:
- KVM_CAP_ARM_SEA_TO_USER. As the alternative fallback behavior
  when host APEI fails to claim a SEA, userspace can opt in this new
  capability to let KVM exit to userspace during SEA if it is not
  owned by host.
- KVM_EXIT_ARM_SEA. A new exit reason is introduced for this.
  KVM fills kvm_run.arm_sea with as much as possible information about
  the SEA, enabling VMM to emulate SEA to guest by itself.
  - Sanitized ESR_EL2. The general rule is to keep only the bits
    useful for userspace and relevant to guest memory.
  - Flags indicating if faulting guest physical address is valid.
  - Faulting guest physical and virtual addresses if valid.

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://msgid.link/20251013185903.1372553-2-jiaqiyan@google.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
7 weeks agoLinux 6.18-rc3 v6.18-rc3
Linus Torvalds [Sun, 26 Oct 2025 22:59:49 +0000 (15:59 -0700)] 
Linux 6.18-rc3

7 weeks agoMerge tag 'char-misc-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sun, 26 Oct 2025 17:33:46 +0000 (10:33 -0700)] 
Merge tag 'char-misc-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are some small char/misc/android driver fixes for 6.18-rc3 for
  reported issues. Included in here are:

   - rust binder fixes for reported issues

   - mei device id addition

   - mei driver fixes

   - comedi bugfix

   - most usb driver bugfixes

   - fastrpc memory leak fix

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  most: usb: hdm_probe: Fix calling put_device() before device initialization
  most: usb: Fix use-after-free in hdm_disconnect
  binder: remove "invalid inc weak" check
  mei: txe: fix initialization order
  comedi: fix divide-by-zero in comedi_buf_munge()
  mei: late_bind: Fix -Wincompatible-function-pointer-types-strict
  misc: fastrpc: Fix dma_buf object leak in fastrpc_map_lookup
  mei: me: add wildcat lake P DID
  misc: amd-sbi: Clarify that this is a BMC driver
  nvmem: rcar-efuse: add missing MODULE_DEVICE_TABLE
  binder: Fix missing kernel-doc entries in binder.c
  rust_binder: report freeze notification only when fully frozen
  rust_binder: don't delete FreezeListener if there are pending duplicates
  rust_binder: freeze_notif_done should resend if wrong state
  rust_binder: remove warning about orphan mappings
  rust_binder: clean `clippy::mem_replace_with_default` warning

7 weeks agoMerge tag 'staging-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 26 Oct 2025 17:29:45 +0000 (10:29 -0700)] 
Merge tag 'staging-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
 "Here are some small staging driver fixes for the gpib subsystem to
  resolve some reported issues. Included in here are:

   - memory leak fixes

   - error code fixes

   - proper protocol fixes

  All of these have been in linux-next for almost 2 weeks now with no
  reported issues"

* tag 'staging-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: gpib: Fix device reference leak in fmh_gpib driver
  staging: gpib: Return -EINTR on device clear
  staging: gpib: Fix sending clear and trigger events
  staging: gpib: Fix no EOI on 1 and 2 byte writes

7 weeks agoMerge tag 'tty-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sun, 26 Oct 2025 17:24:39 +0000 (10:24 -0700)] 
Merge tag 'tty-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver fixes from Greg KH:
 "Here are some small tty and serial driver fixes for reported issues.
  Included in here are:

   - sh-sci serial driver fixes

   - 8250_dw and _mtk driver fixes

   - sc16is7xx driver bugfix

   - new 8250_exar device ids added

  All of these have been in linux-next this past week with no reported
  issues"

* tag 'tty-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: 8250_mtk: Enable baud clock and manage in runtime PM
  serial: 8250_dw: handle reset control deassert error
  dt-bindings: serial: sh-sci: Fix r8a78000 interrupts
  serial: sc16is7xx: remove useless enable of enhanced features
  serial: 8250_exar: add support for Advantech 2 port card with Device ID 0x0018
  tty: serial: sh-sci: fix RSCI FIFO overrun handling

7 weeks agoMerge tag 'usb-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 26 Oct 2025 17:21:13 +0000 (10:21 -0700)] 
Merge tag 'usb-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB driver fixes from Greg KH:
 "Here are some small USB driver fixes and new device ids for 6.18-rc3.
  Included in here are:

   - new option serial driver device ids added

   - dt bindings fixes for numerous platforms

   - xhci bugfixes for many reported regressions

   - usbio dependency bugfix

   - dwc3 driver fix

   - raw-gadget bugfix

  All of these have been in linux-next this week with no reported issues"

* tag 'usb-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: serial: option: add Telit FN920C04 ECM compositions
  USB: serial: option: add Quectel RG255C
  tcpm: switch check for role_sw device with fw_node
  usb/core/quirks: Add Huawei ME906S to wakeup quirk
  usb: raw-gadget: do not limit transfer length
  USB: serial: option: add UNISOC UIS7720
  xhci: dbc: enable back DbC in resume if it was enabled before suspend
  xhci: dbc: fix bogus 1024 byte prefix if ttyDBC read races with stall event
  usb: xhci-pci: Fix USB2-only root hub registration
  dt-bindings: usb: qcom,snps-dwc3: Fix bindings for X1E80100
  usb: misc: Add x86 dependency for Intel USBIO driver
  dt-bindings: usb: switch: split out ports definition
  usb: dwc3: Don't call clk_bulk_disable_unprepare() twice
  dt-bindings: usb: dwc3-imx8mp: dma-range is required only for imx8mp

7 weeks agoMerge tag 'x86_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 26 Oct 2025 16:57:18 +0000 (09:57 -0700)] 
Merge tag 'x86_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Remove dead code leftovers after a recent mitigations cleanup which
   fail a Clang build

 - Make sure a Retbleed mitigation message is printed only when
   necessary

 - Correct the last Zen1 microcode revision for which Entrysign sha256
   check is needed

 - Fix a NULL ptr deref when mounting the resctrl fs on a system which
   supports assignable counters but where L3 total and local bandwidth
   monitoring has been disabled at boot

* tag 'x86_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Remove dead code which might prevent from building
  x86/bugs: Qualify RETBLEED_INTEL_MSG
  x86/microcode: Fix Entrysign revision check for Zen1/Naples
  x86,fs/resctrl: Fix NULL pointer dereference with events force-disabled in mbm_event mode

7 weeks agoMerge tag 'irq_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 26 Oct 2025 16:54:36 +0000 (09:54 -0700)] 
Merge tag 'irq_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Borislav Petkov:

 - Restore the original buslock locking in a couple of places in the irq
   core subsystem after a rework

* tag 'irq_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/manage: Add buslock back in to enable_irq()
  genirq/manage: Add buslock back in to __disable_irq_nosync()
  genirq/chip: Add buslock back in to irq_set_handler()

7 weeks agoMerge tag 'objtool_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 26 Oct 2025 16:44:36 +0000 (09:44 -0700)] 
Merge tag 'objtool_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fixes from Borislav Petkov:

 - Fix x32 build due to wrong format specifier on that sub-arch

 - Add one more Rust noreturn function to objtool's list

* tag 'objtool_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix failure when being compiled on x32 system
  objtool/rust: add one more `noreturn` Rust function

7 weeks agoMerge tag 'sched_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 26 Oct 2025 16:42:19 +0000 (09:42 -0700)] 
Merge tag 'sched_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Borislav Petkov:

 - Make sure a CFS runqueue on a throttled hierarchy has its PELT clock
   throttled otherwise task movement and manipulation would lead to
   dangling cfs_rq references and an eventual crash

* tag 'sched_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Start a cfs_rq on throttled hierarchy with PELT clock throttled

7 weeks agoMerge tag 'timers_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 26 Oct 2025 16:40:16 +0000 (09:40 -0700)] 
Merge tag 'timers_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Borislav Petkov:

 - Do not create more than eight (max supported) AUX clocks sysfs
   hierarchies

* tag 'timers_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Fix aux clocks sysfs initialization loop bound

8 weeks agoMerge tag 'driver-core-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 25 Oct 2025 18:03:46 +0000 (11:03 -0700)] 
Merge tag 'driver-core-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core

Pull driver core fixes from Danilo Krummrich:

 - In Device::parent(), do not make any assumptions on the device
   context of the parent device

 - Check visibility before changing ownership of a sysfs attribute
   group

 - In topology_parse_cpu_capacity(), replace an incorrect usage of
   PTR_ERR_OR_ZERO() with IS_ERR_OR_NULL()

 - In devcoredump, fix a circular locking dependency between
   struct devcd_entry::mutex and kernfs

 - Do not warn about a pending fw_devlink sync state

* tag 'driver-core-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
  arch_topology: Fix incorrect error check in topology_parse_cpu_capacity()
  rust: device: fix device context of Device::parent()
  sysfs: check visibility before changing group attribute ownership
  devcoredump: Fix circular locking dependency with devcd->mutex.
  driver core: fw_devlink: Don't warn about sync_state() pending

8 weeks agoMerge tag 'firewire-fixes-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 25 Oct 2025 17:58:32 +0000 (10:58 -0700)] 
Merge tag 'firewire-fixes-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire fixes from Takashi Sakamoto:
 "A small collection of FireWire fixes. This includes corrections to
  sparse and API documentation"

* tag 'firewire-fixes-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: init_ohci1394_dma: add missing function parameter documentation
  firewire: core: fix __must_hold() annotation

8 weeks agoMerge tag 'riscv-for-linus-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 25 Oct 2025 16:35:26 +0000 (09:35 -0700)] 
Merge tag 'riscv-for-linus-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Paul Walmsley:

 - Close a race during boot between userspace vDSO usage and some
   late-initialized vDSO data

 - Improve performance on systems with non-CPU-cache-coherent
   DMA-capable peripherals by enabling write combining on
   pgprot_dmacoherent() allocations

 - Add human-readable detail for RISC-V IPI tracing

 - Provide more information to zsmalloc on 64-bit RISC-V to improve
   allocation

 - Silence useless boot messages about CPUs that have been disabled in
   DT

 - Resolve some compiler and smatch warnings and remove a redundant
   macro

* tag 'riscv-for-linus-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: hwprobe: avoid uninitialized variable use in hwprobe_arch_id()
  riscv: cpufeature: avoid uninitialized variable in has_thead_homogeneous_vlenb()
  riscv: hwprobe: Fix stale vDSO data for late-initialized keys at boot
  riscv: add a forward declaration for cpuinfo_op
  RISC-V: Don't print details of CPUs disabled in DT
  riscv: Remove the PER_CPU_OFFSET_SHIFT macro
  riscv: mm: Define MAX_POSSIBLE_PHYSMEM_BITS for zsmalloc
  riscv: Register IPI IRQs with unique names
  ACPI: RIMT: Fix unused function warnings when CONFIG_IOMMU_API is disabled
  RISC-V: Define pgprot_dmacoherent() for non-coherent devices

8 weeks agoMerge tag 'xfs-fixes-6.18-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Sat, 25 Oct 2025 16:31:13 +0000 (09:31 -0700)] 
Merge tag 'xfs-fixes-6.18-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Carlos Maiolino:
 "The main highlight here is a fix for a bug brought in by the removal
  of attr2 mount option, where some installations might actually have
  'attr2' explicitly configured in fstab preventing system to boot by
  not being able to remount the rootfs as RW.

  Besides that there are a couple fix to the zonefs implementation,
  changing XFS_ONLINE_SCRUB_STATS to depend on DEBUG_FS (was select
  before), and some other minor changes"

* tag 'xfs-fixes-6.18-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix locking in xchk_nlinks_collect_dir
  xfs: loudly complain about defunct mount options
  xfs: always warn about deprecated mount options
  xfs: don't set bt_nr_sectors to a negative number
  xfs: don't use __GFP_NOFAIL in xfs_init_fs_context
  xfs: cache open zone in inode->i_private
  xfs: avoid busy loops in GCD
  xfs: XFS_ONLINE_SCRUB_STATS should depend on DEBUG_FS
  xfs: do not tightly pack-write large files
  xfs: Improve CONFIG_XFS_RT Kconfig help