git.ipfire.org Git - thirdparty/kernel/linux.git/log

KVM: arm64: Fix FEAT_Debugv8p9 to check DebugVer, not PMUVer

FEAT_Debugv8p9 is incorrectly defined against ID_AA64DFR0_EL1.PMUVer
instead of ID_AA64DFR0_EL1.DebugVer.  All three consumers of the macro
gate features that are architecturally tied to FEAT_Debugv8p9
(DebugVer = 0b1011, DDI0487 M.b A2.2.10):

  - HDFGRTR2_EL2.nMDSELR_EL1, HDFGWTR2_EL2.nMDSELR_EL1: MDSELR_EL1
    is present only when FEAT_Debugv8p9 is implemented (D24.3.21).

  - MDCR_EL2.EBWE: the Extended Breakpoint and Watchpoint Enable bit
    is RES0 unless FEAT_Debugv8p9 is implemented (D24.3.17).

Neither register has any dependency on PMUVer.

FEAT_Debugv8p9 and FEAT_PMUv3p9 are independent.  Per DDI0487 M.b
A2.2.10, FEAT_Debugv8p9 is unconditionally mandatory from Armv8.9,
whereas FEAT_PMUv3p9 is mandatory only when FEAT_PMUv3 is implemented.
An Armv8.9 CPU without a PMU has DebugVer = 0b1011 but PMUVer = 0b0000,
so the wrong field check would cause KVM to incorrectly treat EBWE and
MDSELR_EL1 as RES0 on such hardware.

Fixes: 4bc0fe089840 ("KVM: arm64: Add sanitisation for FEAT_FGT2 registers")
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260424084908.370776-2-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org

KVM: arm64: Reject non compliant SMCCC function calls in pKVM

Prevent the propagation of a function-id that has the top bits set since
this is not compliant with the SMCCC spec and can overlap with the
already known function-id decoders. (eg. if we invoke an smc with
0xffffffffc4000012 it will be decoded as a PSCI reset call). Instead,
make it clear that we don't support it and return an error.

Signed-off-by: Sebastian Ene <sebastianene@google.com>
Link: https://patch.msgid.link/20260408114118.422604-1-sebastianene@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: vgic: Fix IIDR revision field extracted from wrong value

The uaccess write handlers for GICD_IIDR in both GICv2 and GICv3
extract the revision field from 'reg' (the current IIDR value read back
from the emulated distributor) instead of 'val' (the value userspace is
trying to write). This means userspace can never actually change the
implementation revision — the extracted value is always the current one.

Fix the FIELD_GET to use 'val' so that userspace can select a different
revision for migration compatibility.

Fixes: 49a1a2c70a7f ("KVM: arm64: vgic-v3: Advertise GICR_CTLR.{IR, CES} as a new GICD_IIDR revision")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://patch.msgid.link/20260407210949.2076251-2-dwmw2@infradead.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org

KVM: arm64: pkvm: Adopt MARKER() to define host hypercall ranges

The EL2 code defines ranges of host hypercalls that are either
enabled at boot-time only, used by [nh]VHE KVM, or reserved to pKVM.

The way these ranges are delineated is error prone, as the enum symbols
defining the limits are expressed in terms of actual function symbols.
This means that should a new function be added, special care must be
taken to also update the limit symbol.

Improve this by reusing the mechanism introduced for the vcpu_sysreg
enum, which uses a MARKER() macro and some extra trickery to make
the limit symbol standalone. Crucially, the limit symbol has the
same value as the *following* symbol.

The handle_host_hcall() function is then updated to make use of
the new limit definitions and get rid of the brittle default
upper limit. This allows for some more strict checks at build
time, and the removal of an comparison at run time.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260414160528.2218858-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Re-allow hyp tracing HVCs for [nh]VHE

The introduction of __KVM_HOST_SMCCC_FUNC_MAX_NO_PKVM excluded hyp
tracing HVCs from the common [nh]VHE/pKVM list. Re-allow them.

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://patch.msgid.link/20260414100231.1859687-1-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/misc-7.1 into kvmarm-master/next

* kvm-arm64/misc-7.1:
  KVM: arm64: selftests: Avoid testing the IMPDEF behavior
  KVM: arm64: Destroy stage-2 page-table in kvm_arch_destroy_vm()
  KVM: arm64: Don't leave mmu->pgt dangling on kvm_init_stage2_mmu() error
  KVM: arm64: Prevent the host from using an smc with imm16 != 0

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/vgic-fixes-7.1 into kvmarm-master/next

* kvm-arm64/vgic-fixes-7.1:
  : .
  : FIrst pass at fixing a number of vgic-v5 bugs that were found
  : after the merge of the initial series.
  : .
  KVM: arm64: Advertise ID_AA64PFR2_EL1.GCIE
  KVM: arm64: vgic-v5: Fold PPI state for all exposed PPIs
  KVM: arm64: set_id_regs: Allow GICv3 support to be set at runtime
  KVM: arm64: Don't advertises GICv3 in ID_PFR1_EL1 if AArch32 isn't supported
  KVM: arm64: Correctly plumb ID_AA64PFR2_EL1 into pkvm idreg handling
  KVM: arm64: Move GICv5 timer PPI validation into timer_irqs_are_valid()
  KVM: arm64: Remove evaluation of timer state in kvm_cpu_has_pending_timer()
  KVM: arm64: Kill arch_timer_context::direct field
  KVM: arm64: vgic-v5: Correctly set dist->ready once initialised
  KVM: arm64: vgic-v5: Make the effective priority mask a strict limit
  KVM: arm64: vgic-v5: Cast vgic_apr to u32 to avoid undefined behaviours
  KVM: arm64: vgic-v5: Transfer edge pending state to ICH_PPI_PENDRx_EL2
  KVM: arm64: vgic-v5: Hold config_lock while finalizing GICv5 PPIs
  KVM: arm64: Account for RESx bits in __compute_fgt()
  KVM: arm64: Fix writeable mask for ID_AA64PFR2_EL1
  arm64: Fix field references for ICH_PPI_DVIR[01]_EL2
  KVM: arm64: Don't skip per-vcpu NV initialisation
  KVM: arm64: vgic: Don't reset cpuif/redist addresses at finalize time

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/pkvm-protected-guest into kvmarm-master/next

* kvm-arm64/pkvm-protected-guest: (41 commits)
  : .
  : pKVM support for protected guests, implementing the very long
  : awaited support for anonymous memory, as the elusive guestmem
  : has failed to deliver on its promises despite a multi-year
  : effort. Patches courtesy of Will Deacon. From the initial cover
  : letter:
  :
  : "[...] this patch series implements support for protected guest
  : memory with pKVM, where pages are unmapped from the host as they are
  : faulted into the guest and can be shared back from the guest using pKVM
  : hypercalls. Protected guests are created using a new machine type
  : identifier and can be booted to a shell using the kvmtool patches
  : available at [2], which finally means that we are able to test the pVM
  : logic in pKVM. Since this is an incremental step towards full isolation
  : from the host (for example, the CPU register state and DMA accesses are
  : not yet isolated), creating a pVM requires a developer Kconfig option to
  : be enabled in addition to booting with 'kvm-arm.mode=protected' and
  : results in a kernel taint."
  : .
  KVM: arm64: Don't hold 'vm_table_lock' across guest page reclaim
  KVM: arm64: Allow get_pkvm_hyp_vm() to take a reference to a dying VM
  KVM: arm64: Prevent teardown finalisation of referenced 'hyp_vm'
  drivers/virt: pkvm: Add Kconfig dependency on DMA_RESTRICTED_POOL
  KVM: arm64: Rename PKVM_PAGE_STATE_MASK
  KVM: arm64: Extend pKVM page ownership selftests to cover guest hvcs
  KVM: arm64: Extend pKVM page ownership selftests to cover forced reclaim
  KVM: arm64: Register 'selftest_vm' in the VM table
  KVM: arm64: Extend pKVM page ownership selftests to cover guest donation
  KVM: arm64: Add some initial documentation for pKVM
  KVM: arm64: Allow userspace to create protected VMs when pKVM is enabled
  KVM: arm64: Implement the MEM_UNSHARE hypercall for protected VMs
  KVM: arm64: Implement the MEM_SHARE hypercall for protected VMs
  KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs
  KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte
  KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler
  KVM: arm64: Introduce hypercall to force reclaim of a protected page
  KVM: arm64: Annotate guest donations with handle and gfn in host stage-2
  KVM: arm64: Change 'pkvm_handle_t' to u16
  KVM: arm64: Introduce host_stage2_set_owner_metadata_locked()
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/spe-trbe-nvhe into kvmarm-master/next

* kvm-arm64/spe-trbe-nvhe:
  : .
  : Fix SPE and TRBE nVHE world switch which can otherwise result in
  : pretty bad behaviours, as they have the nasty habit of performing
  : out of context speculative page table walks.
  :
  : Patches courtesy of Will Deacon.
  : .
  KVM: arm64: Don't pass host_debug_state to BRBE world-switch routines
  KVM: arm64: Disable SPE Profiling Buffer when running in guest context
  KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/user_mem_abort-rework into kvmarm-master/next

* kvm-arm64/user_mem_abort-rework: (30 commits)
  : .
  : user_mem_abort() has become an absolute pain to maintain,
  : to the point that each single fix is likely to introduce
  : *two* new bugs.
  :
  : Deconstruct the whole thing in logical units, reducing
  : the amount of visible and/or mutable state between functions,
  : and finally making the code a bit more maintainable.
  : .
  KVM: arm64: Convert gmem_abort() to struct kvm_s2_fault_desc
  KVM: arm64: Simplify integration of adjust_nested_*_perms()
  KVM: arm64: Directly expose mapping prot and kill kvm_s2_fault
  KVM: arm64: Move device mapping management into kvm_s2_fault_pin_pfn()
  KVM: arm64: Replace force_pte with a max_map_size attribute
  KVM: arm64: Move kvm_s2_fault.{pfn,page} to kvm_s2_vma_info
  KVM: arm64: Restrict the scope of the 'writable' attribute
  KVM: arm64: Kill logging_active from kvm_s2_fault
  KVM: arm64: Move VMA-related information to kvm_s2_fault_vma_info
  KVM: arm64: Kill topup_memcache from kvm_s2_fault
  KVM: arm64: Kill exec_fault from kvm_s2_fault
  KVM: arm64: Kill write_fault from kvm_s2_fault
  KVM: arm64: Constrain fault_granule to kvm_s2_fault_map()
  KVM: arm64: Replace fault_is_perm with a helper
  KVM: arm64: Move fault context to const structure
  KVM: arm64: Make fault_ipa immutable
  KVM: arm64: Kill fault->ipa
  KVM: arm64: Clean up control flow in kvm_s2_fault_map()
  KVM: arm64: Hoist MTE validation check out of MMU lock path
  KVM: arm64: Optimize early exit checks in kvm_s2_fault_pin_pfn()
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/pkvm-psci into kvmarm-master/next

* kvm-arm64/pkvm-psci:
  : .
  : Cleanup of the pKVM PSCI relay CPU entry code, making it slightly
  : easier to follow, should someone have to wade into these waters
  : ever again.
  : .
  KVM: arm64: Remove extra ISBs when using msr_hcr_el2
  KVM: arm64: pkvm: Use direct function pointers for cpu_{on,resume}
  KVM: arm64: pkvm: Turn __kvm_hyp_init_cpu into an inner label
  KVM: arm64: pkvm: Simplify BTI handling on CPU boot
  KVM: arm64: pkvm: Move error handling to the end of kvm_hyp_cpu_entry

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/nv-s2-debugfs into kvmarm-master/next

* kvm-arm64/nv-s2-debugfs:
  : .
  : Expand the stage-2 ptdump infrastructure to be able to display
  : the content of the shadow s2 tables generated by nested virt.
  :
  : Patches courtesy of Wei-Lin Chang.
  : .
  KVM: arm64: ptdump: Initialize parser_state before pgtable walk
  KVM: arm64: nv: Expose shadow page tables in debugfs
  KVM: arm64: ptdump: Make KVM ptdump code s2 mmu aware

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/vgic-v5-ppi into kvmarm-master/next

* kvm-arm64/vgic-v5-ppi: (40 commits)
  : .
  : Add initial GICv5 support for KVM guests, only adding PPI support
  : for the time being. Patches courtesy of Sascha Bischoff.
  :
  : From the cover letter:
  :
  : "This is v7 of the patch series to add the virtual GICv5 [1] device
  : (vgic_v5). Only PPIs are supported by this initial series, and the
  : vgic_v5 implementation is restricted to the CPU interface,
  : only. Further patch series are to follow in due course, and will add
  : support for SPIs, LPIs, the GICv5 IRS, and the GICv5 ITS."
  : .
  KVM: arm64: selftests: Add no-vgic-v5 selftest
  KVM: arm64: selftests: Introduce a minimal GICv5 PPI selftest
  KVM: arm64: gic-v5: Communicate userspace-driveable PPIs via a UAPI
  Documentation: KVM: Introduce documentation for VGICv5
  KVM: arm64: gic-v5: Probe for GICv5 device
  KVM: arm64: gic-v5: Set ICH_VCTLR_EL2.En on boot
  KVM: arm64: gic-v5: Introduce kvm_arm_vgic_v5_ops and register them
  KVM: arm64: gic-v5: Hide FEAT_GCIE from NV GICv5 guests
  KVM: arm64: gic: Hide GICv5 for protected guests
  KVM: arm64: gic-v5: Mandate architected PPI for PMU emulation on GICv5
  KVM: arm64: gic-v5: Enlighten arch timer for GICv5
  irqchip/gic-v5: Introduce minimal irq_set_type() for PPIs
  KVM: arm64: gic-v5: Initialise ID and priority bits when resetting vcpu
  KVM: arm64: gic-v5: Create and initialise vgic_v5
  KVM: arm64: gic-v5: Support GICv5 interrupts with KVM_IRQ_LINE
  KVM: arm64: gic-v5: Implement direct injection of PPIs
  KVM: arm64: Introduce set_direct_injection irq_op
  KVM: arm64: gic-v5: Trap and mask guest ICC_PPI_ENABLERx_EL1 writes
  KVM: arm64: gic-v5: Check for pending PPIs
  KVM: arm64: gic-v5: Clear TWI if single task running
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/hyp-tracing into kvmarm-master/next

* kvm-arm64/hyp-tracing: (40 commits)
  : .
  : EL2 tracing support, adding both 'remote' ring-buffer
  : infrastructure and the tracing itself, courtesy of
  : Vincent Donnefort. From the cover letter:
  :
  : "The growing set of features supported by the hypervisor in protected
  : mode necessitates debugging and profiling tools. Tracefs is the
  : ideal candidate for this task:
  :
  :   * It is simple to use and to script.
  :
  :   * It is supported by various tools, from the trace-cmd CLI to the
  :     Android web-based perfetto.
  :
  :   * The ring-buffer, where are stored trace events consists of linked
  :     pages, making it an ideal structure for sharing between kernel and
  :     hypervisor.
  :
  : This series first introduces a new generic way of creating remote events and
  : remote buffers. Then it adds support to the pKVM hypervisor."
  : .
  tracing: selftests: Extend hotplug testing for trace remotes
  tracing: Non-consuming read for trace remotes with an offline CPU
  tracing: Adjust cmd_check_undefined to show unexpected undefined symbols
  tracing: Restore accidentally removed SPDX tag
  KVM: arm64: avoid unused-variable warning
  tracing: Generate undef symbols allowlist for simple_ring_buffer
  KVM: arm64: tracing: add ftrace dependency
  tracing: add more symbols to whitelist
  tracing: Update undefined symbols allow list for simple_ring_buffer
  KVM: arm64: Fix out-of-tree build for nVHE/pKVM tracing
  tracing: selftests: Add hypervisor trace remote tests
  KVM: arm64: Add selftest event support to nVHE/pKVM hyp
  KVM: arm64: Add hyp_enter/hyp_exit events to nVHE/pKVM hyp
  KVM: arm64: Add event support to the nVHE/pKVM hyp and trace remote
  KVM: arm64: Add trace reset to the nVHE/pKVM hyp
  KVM: arm64: Sync boot clock with the nVHE/pKVM hyp
  KVM: arm64: Add trace remote for the nVHE/pKVM hyp
  KVM: arm64: Add tracing capability for the nVHE/pKVM hyp
  KVM: arm64: Support unaligned fixmap in the pKVM hyp
  KVM: arm64: Initialise hyp_nr_cpus for nVHE hyp
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Advertise ID_AA64PFR2_EL1.GCIE

As we are missing ID_AA64PFR2_EL1.GCIE from the kernel feature set,
userspace cannot write ID_AA64PFR2_EL1 with GCIE set, even if we are
on a GICv5 host.

Add the required field description.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://patch.msgid.link/20260401170017.369529-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

tracing: selftests: Extend hotplug testing for trace remotes

The hotplug testing only tries reading a trace remote buffer, loaded
before a CPU is offline. Extend this testing to cover:

* A trace remote buffer loaded after a CPU is offline.
* A trace remote buffer loaded before a CPU is online.

Because of these added test cases, move the hotplug testing into a
separate hotplug.tc file.

Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://patch.msgid.link/20260401045100.3394299-3-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

tracing: Non-consuming read for trace remotes with an offline CPU

When a trace_buffer is created while a CPU is offline, this CPU is
cleared from the trace_buffer CPU mask, preventing the creation of a
non-consuming iterator (ring_buffer_iter). For trace remotes, it means
the iterator fails to be allocated (-ENOMEM) even though there are
available ring buffers in the trace_buffer.

For non-consuming reads of trace remotes, skip missing ring_buffer_iter
to allow reading the available ring buffers.

Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://patch.msgid.link/20260401045100.3394299-2-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: vgic-v5: Fold PPI state for all exposed PPIs

GICv5 supports up to 128 PPIs, which would introduce a large amount of
overhead if all of them were actively tracked. Rather than keeping
track of all 128 potential PPIs, we instead only consider the set of
architected PPIs (the first 64). Moreover, we further reduce that set
by only exposing a subset of the PPIs to a guest. In practice, this
means that only 4 PPIs are typically exposed to a guest - the SW_PPI,
PMUIRQ, and the timers.

When folding the PPI state, changed bits in the active or pending were
used to choose which state to sync back. However, this breaks badly
for Edge interrupts when exiting the guest before it has consumed the
edge. There is no change in pending state detected, and the edge is
lost forever.

Given the reduced set of PPIs exposed to the guest, and the issues
around tracking the edges, drop the tracking of changed state, and
instead iterate over the limited subset of PPIs exposed to the guest
directly.

This change drops the second copy of the PPI pending state used for
detecting edges in the pending state, and reworks
vgic_v5_fold_ppi_state() to iterate over the VM's PPI mask instead.

Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Link: https://patch.msgid.link/20260401162152.932243-1-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: selftests: Avoid testing the IMPDEF behavior

It turned out that we can't really force KVM to use the "slow" path when
emulating AT instructions [1]. We should therefore avoid testing the IMPDEF
behavior (i.e., TEST_ACCESS_FLAG - address translation instructions are
permitted to update AF but not required).

Remove it and improve the comment a bit.

[1] https://lore.kernel.org/r/b951dcfb-0ad1-4d7b-b6ce-d54b272dd9be@linux.dev

Signed-off-by: Zenghui Yu (Huawei) <zenghui.yu@linux.dev>
Link: https://patch.msgid.link/20260317131558.52751-1-zenghui.yu@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Destroy stage-2 page-table in kvm_arch_destroy_vm()

kvm_arch_destroy_vm() can be called on the kvm_create_vm() error path
after we have failed to register the MMU notifiers for the new VM. In
this case, we cannot rely on the MMU ->release() notifier to call
kvm_arch_flush_shadow_all() and so the stage-2 page-table allocated in
kvm_arch_init_vm() will be leaked.

Explicitly destroy the stage-2 page-table in kvm_arch_destroy_vm(), so
that we clean up after kvm_arch_destroy_vm() without relying on the MMU
notifiers.

Link: https://sashiko.dev/#/patchset/20260327140039.21228-1-will%40kernel.org?patch=12265
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260327192758.21739-3-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Don't leave mmu->pgt dangling on kvm_init_stage2_mmu() error

If kvm_init_stage2_mmu() fails to allocate 'mmu->last_vcpu_ran', it
destroys the newly allocated stage-2 page-table before returning ENOMEM.

Unfortunately, it also leaves a dangling pointer in 'mmu->pgt' which
points at the freed 'kvm_pgtable' structure. This is likely to confuse
the kvm_vcpu_init_nested() failure path which can double-free the
structure if it finds it via kvm_free_stage2_pgd().

Ensure that the dangling 'mmu->pgt' pointer is cleared when returning an
error from kvm_init_stage2_mmu().

Link: https://sashiko.dev/#/patchset/20260327140039.21228-1-will%40kernel.org?patch=12265
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260327192758.21739-2-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Prevent the host from using an smc with imm16 != 0

The ARM Service Calling Convention (SMCCC) specifies that the function
identifier and parameters should be passed in registers, leaving the
16-bit immediate field un-handled in pKVM when an SMC instruction is
trapped.
Since the HVC is a private interface between EL2 and the host,
enforce the host kernel running under pKVM to use an immediate value
of 0 only when using SMCs to make it clear for non-compliant software
talking to Trustzone that we only use SMCCC.

Signed-off-by: Sebastian Ene <sebastianene@google.com>
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://patch.msgid.link/20260330105441.3226904-1-sebastianene@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: set_id_regs: Allow GICv3 support to be set at runtime

set_id_regs creates a GIC3 guest when possible, and then proceeds
to write the ID registers as if they were not affected by the presence
of a GIC. As it turns out, ID_AA64PFR1_EL1 is the proof of the
contrary.

KVM now makes a point in exposing the GIC support to the guest,
no matter what userspace says (userspace such as QEMU is known to
write silly things at times).

Accommodate for this level of nonsense by teaching set_id_regs about
fields that are mutable, and only compare registers that have been
re-sanitised first.

Reported-by: Mark Brown <broonie@kernel.org>
Link: https://patch.msgid.link/20260401103611.357092-17-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Don't advertises GICv3 in ID_PFR1_EL1 if AArch32 isn't supported

Although the AArch32 ID regs are architecturally UNKNOWN when AArch32
isn't supported at any EL, KVM makes a point in making them RAZ.

Therefore, advertising GICv3 in ID_PFR1_EL1 must be gated on AArch32
being supported at least at EL0.

Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Fixes: a258a383b9177 ("KVM: arm64: gic-v5: Sanitize ID_AA64PFR2_EL1.GCIE")
Reported-by: Mark Brown <broonie@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://patch.msgid.link/20260401103611.357092-16-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Correctly plumb ID_AA64PFR2_EL1 into pkvm idreg handling

While we now compute ID_AA64PFR2_EL1 to a glorious 0, we never use
that data and instead return the 0 that corresponds to an allocated
idreg. Not a big deal, but we might as well be consistent.

Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Fixes: 5aefaf11f9af5 ("KVM: arm64: gic: Hide GICv5 for protected guests")
Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com
Link: https://patch.msgid.link/20260401103611.357092-15-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Move GICv5 timer PPI validation into timer_irqs_are_valid()

Userspace can set the timer PPI numbers way before a GIC has been
created, leading to odd behaviours on GICv5 as we'd accept non
architectural PPI numbers.

Move the v5 check into timer_irqs_are_valid(), which aligns the
behaviour with the pre-v5 GICs, and is also guaranteed to run
only once a GIC has been configured.

Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Fixes: 9491c63b6cd7b ("KVM: arm64: gic-v5: Enlighten arch timer for GICv5")
Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com
Link: https://patch.msgid.link/20260401103611.357092-14-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Remove evaluation of timer state in kvm_cpu_has_pending_timer()

The vgic-v5 code added some evaluations of the timers in a helper funtion
(kvm_cpu_has_pending_timer()) that is called to determine whether
the vcpu can wake-up.

But looking at the timer there is wrong:

- we want to see timers that are signalling an interrupt to the
  vcpu, and not just that have a pending interrupt

- we already have kvm_arch_vcpu_runnable() that evaluates the
  state of interrupts

- kvm_cpu_has_pending_timer() really is about WFIT, as the timeout
  does not generate an interrupt, and is therefore distinct from
  the point above

As a consequence, revert these changes and teach vgic_v5_has_pending_ppi()
about checking for pending HW interrupts instead.

Fixes: 9491c63b6cd7b ("KVM: arm64: gic-v5: Enlighten arch timer for GICv5")
Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com
Link: https://patch.msgid.link/20260401103611.357092-13-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Kill arch_timer_context::direct field

The newly introduced arch_timer_context::direct field is a bit pointless,
as it is always set on timers that are... err... direct, while
we already have a way to get to that by doing a get_map() operation.

Additionally, this field is:

- only set when get_map() is called

- never cleared

and the single point where it is actually checked doesn't call get_map()
at all.

At this stage, it is probably better to just kill it, and rely on
get_map() to give us the correct information.

Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Fixes: 9491c63b6cd7b ("KVM: arm64: gic-v5: Enlighten arch timer for GICv5")
Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com
Link: https://patch.msgid.link/20260401103611.357092-12-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: vgic-v5: Correctly set dist->ready once initialised

kvm_vgic_map_resources() targetting a v5 model results in vgic->dist_ready
never being set. This doesn't result in anything really bad, only
some more heavy locking as we go and re-init something for no good reason.

Rejig the code to correctly set the ready flag in all non-failing
cases.

Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Fixes: f4d37c7c35769 ("KVM: arm64: gic-v5: Create and initialise vgic_v5")
Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com
Link: https://patch.msgid.link/20260401103611.357092-11-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: vgic-v5: Make the effective priority mask a strict limit

The way the effective priority mask is compared to the priority of
an interrupt to decide whether to wake-up or not, is slightly odd,
and breaks at the limits.

This could result in spurious wake-ups that are undesirable.

Make the computed priority mask comparison a strict inequality, so
that interrupts that have the same priority as the mask are not
signalled.

Fixes: 933e5288fa971 ("KVM: arm64: gic-v5: Check for pending PPIs")
Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com
Link: https://patch.msgid.link/20260401103611.357092-10-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: vgic-v5: Cast vgic_apr to u32 to avoid undefined behaviours

Passing a u64 to __builtin_ctz() is odd, and requires some digging to
figure out why this construct is indeed safe as long as the HW is
correct.

But it is much easier to make it clear to the compiler by casting
the u64 into an intermediate u32, and be done with the UD.

Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Fixes: 933e5288fa971 ("KVM: arm64: gic-v5: Check for pending PPIs")
Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com
Link: https://patch.msgid.link/20260401103611.357092-9-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: vgic-v5: Transfer edge pending state to ICH_PPI_PENDRx_EL2

While it is perfectly correct to leave the pending state of a level
interrupt as is when queuing it (it is, after all, only driven by
the line), edge pending state must be transfered, as nothing will
lower it.

Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Fixes: 4d591252bacb2 ("KVM: arm64: gic-v5: Implement PPI interrupt injection")
Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com
Link: https://patch.msgid.link/20260401103611.357092-8-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: vgic-v5: Hold config_lock while finalizing GICv5 PPIs

Finalizing the PPI state is done without holding any lock, which
means that two vcpus can race against each other and have one zeroing
the state while another one is setting it, or even maybe using it.

Fixing this is done by:

- holding the config lock while performing the initialisation

- checking if SW_PPI has already been advertised, meaning that
we have already completed the initialisation once

Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Fixes: 8f1fbe2fd2792 ("KVM: arm64: gic-v5: Finalize GICv5 PPIs and generate mask")
Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com
Link: https://patch.msgid.link/20260401103611.357092-7-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Account for RESx bits in __compute_fgt()

When computing Fine Grained Traps, it is preferable to account for
the reserved bits. The HW will most probably ignore them, unless the
bits have been repurposed to do something else.

Use caution, and fold our view of the reserved bits in,

Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Fixes: c259d763e6b09 ("KVM: arm64: Account for RES1 bits in DECLARE_FEAT_MAP() and co")
Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260401103611.357092-6-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Fix writeable mask for ID_AA64PFR2_EL1

The writeable mask for fields in ID_AA64PFR2_EL1 has been accidentally
inverted, which isn't a very good idea.

Restore the expected polarity.

Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Fixes: a258a383b9177 ("KVM: arm64: gic-v5: Sanitize ID_AA64PFR2_EL1.GCIE")
Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com
Link: https://patch.msgid.link/20260401103611.357092-5-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

arm64: Fix field references for ICH_PPI_DVIR[01]_EL2

The ICH_PPI_DVIR[01]_EL2 registers should refer to the ICH_PPI_DVIRx_EL2
fields, instead of ICH_PPI_DVIx_EL2.

Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Fixes: 2808a8337078f ("arm64/sysreg: Add remaining GICv5 ICC_ & ICH_ sysregs for KVM support")
Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com
Link: https://patch.msgid.link/20260401103611.357092-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Don't skip per-vcpu NV initialisation

Some GICv5-related rework have resulted in the NV sanitisation of
registers being skipped for secondary vcpus, which is a pretty bad
idea.

Hoist the NV init early so that it is always executed.

Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Fixes: cbd8c958be54a ("KVM: arm64: Return early from kvm_finalize_sys_regs() if guest has run")
Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com
Link: https://patch.msgid.link/20260401103611.357092-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: vgic: Don't reset cpuif/redist addresses at finalize time

Although we are OK with rewriting idregs at finalize time, resetting
the guest's cpuif (GICv3) or redistributor (GICv3) addresses once
we start running the guest is a pretty bad idea.

Move back this initialisation to vgic creation time.

Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Fixes: a258a383b9177 ("KVM: arm64: gic-v5: Sanitize ID_AA64PFR2_EL1.GCIE")
Link: https://patch.msgid.link/20260323174713.3183111-1-maz@kernel.org
Link: https://patch.msgid.link/20260401103611.357092-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Don't hold 'vm_table_lock' across guest page reclaim

Now that the teardown of a VM cannot be finalised as long as a reference
is held on the VM, rework __pkvm_reclaim_dying_guest_page() to hold a
reference to the dying VM rather than take the global 'vm_table_lock'
during the reclaim operation.

Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260331155056.28220-4-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Allow get_pkvm_hyp_vm() to take a reference to a dying VM

Now that completion of the teardown path requires a refcount of zero for
the target VM, we can allow get_pkvm_hyp_vm() to take a reference on a
dying VM, which is necessary to unshare pages with a non-protected VM
during the teardown process itself.

Note that vCPUs belonging to a dying VM cannot be loaded and pages can
only be reclaimed from a protected VM (via
__pkvm_reclaim_dying_guest_page()) if the target VM is in the dying
state.

Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260331155056.28220-3-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Prevent teardown finalisation of referenced 'hyp_vm'

Destroying a 'hyp_vm' with an elevated referenced count in
__pkvm_finalize_teardown_vm() is only going to lead to tears.

In preparation for allowing limited references to be acquired on dying
VMs during the teardown process, factor out the handle-to-vm logic for
the teardown path and reuse it for both the 'start' and 'finalise'
stages of the teardown process.

Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260331155056.28220-2-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

drivers/virt: pkvm: Add Kconfig dependency on DMA_RESTRICTED_POOL

pKVM guests practically rely on CONFIG_DMA_RESTRICTED_POOL=y in order
to establish shared memory regions with the host for virtio buffers.

Make CONFIG_ARM_PKVM_GUEST depend on CONFIG_DMA_RESTRICTED_POOL to avoid
the inevitable segmentation faults experience if you have the former but
not the latter.

Reported-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-39-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Rename PKVM_PAGE_STATE_MASK

Rename PKVM_PAGE_STATE_MASK to PKVM_PAGE_STATE_VMEMMAP_MASK to make it
clear that the mask applies to the page state recorded in the entries
of the 'hyp_vmemmap', rather than page states stored elsewhere (e.g. in
the ptes).

Suggested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-38-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Extend pKVM page ownership selftests to cover guest hvcs

Now that the guest can share and unshare memory with the host using
hypercalls, extend the pKVM page ownership selftest to exercise these
new transitions.

Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-37-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Extend pKVM page ownership selftests to cover forced reclaim

Extend the pKVM page ownership selftests to forcefully reclaim a donated
page and check that it cannot be re-donated at the same IPA.

Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-36-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Register 'selftest_vm' in the VM table

In preparation for extending the pKVM page ownership selftests to cover
forceful reclaim of donated pages, rework the creation of the
'selftest_vm' so that it is registered in the VM table while the tests
are running.

Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-35-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Extend pKVM page ownership selftests to cover guest donation

Extend the pKVM page ownership selftests to donate and reclaim a page
to/from a guest.

Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-34-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Add some initial documentation for pKVM

Add some initial documentation for pKVM to help people understand what
is supported, the limitations of protected VMs when compared to
non-protected VMs and also what is left to do.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-33-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Allow userspace to create protected VMs when pKVM is enabled

Introduce a new VM type for KVM/arm64 to allow userspace to request the
creation of a "protected VM" when the host has booted with pKVM enabled.

For now, this feature results in a taint on first use as many aspects of
a protected VM are not yet protected!

Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-32-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Implement the MEM_UNSHARE hypercall for protected VMs

Implement the ARM_SMCCC_KVM_FUNC_MEM_UNSHARE hypercall to allow
protected VMs to unshare memory that was previously shared with the host
using the ARM_SMCCC_KVM_FUNC_MEM_SHARE hypercall.

Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-31-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Implement the MEM_SHARE hypercall for protected VMs

Implement the ARM_SMCCC_KVM_FUNC_MEM_SHARE hypercall to allow protected
VMs to share memory (e.g. the swiotlb bounce buffers) back to the host.

Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-30-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs

Add a hypercall handler at EL2 for hypercalls originating from protected
VMs. For now, this implements only the FEATURES and MEMINFO calls, but
subsequent patches will implement the SHARE and UNSHARE functions
necessary for virtio.

Unhandled hypercalls (including PSCI) are passed back to the host.

Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-29-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte

If a protected vCPU faults on an IPA which appears to be mapped, query
the hypervisor to determine whether or not the faulting pte has been
poisoned by a forceful reclaim. If the pte has been poisoned, return
-EFAULT back to userspace rather than retrying the instruction forever.

Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-28-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler

Host kernel accesses to pages that are inaccessible at stage-2 result in
the injection of a translation fault, which is fatal unless an exception
table fixup is registered for the faulting PC (e.g. for user access
routines). This is undesirable, since a get_user_pages() call could be
used to obtain a reference to a donated page and then a subsequent
access via a kernel mapping would lead to a panic().

Rework the spurious fault handler so that stage-2 faults injected back
into the host result in the target page being forcefully reclaimed when
no exception table fixup handler is registered.

Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-27-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Introduce hypercall to force reclaim of a protected page

Introduce a new hypercall, __pkvm_force_reclaim_guest_page(), to allow
the host to forcefully reclaim a physical page that was previous donated
to a protected guest. This results in the page being zeroed and the
previous guest mapping being poisoned so that new pages cannot be
subsequently donated at the same IPA.

Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-26-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Annotate guest donations with handle and gfn in host stage-2

Handling host kernel faults arising from accesses to donated guest
memory will require an rmap-like mechanism to identify the guest mapping
of the faulting page.

Extend the page donation logic to encode the guest handle and gfn
alongside the owner information in the host stage-2 pte.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-25-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Change 'pkvm_handle_t' to u16

'pkvm_handle_t' doesn't need to be a 32-bit type and subsequent patches
will rely on it being no more than 16 bits so that it can be encoded
into a pte annotation.

Change 'pkvm_handle_t' to a u16 and add a compile-type check that the
maximum handle fits into the reduced type.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-24-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Introduce host_stage2_set_owner_metadata_locked()

Rework host_stage2_set_owner_locked() to add a new helper function,
host_stage2_set_owner_metadata_locked(), which will allow us to store
additional metadata alongside a 3-bit owner ID for invalid host stage-2
entries.

Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-23-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Generalise kvm_pgtable_stage2_set_owner()

kvm_pgtable_stage2_set_owner() can be generalised into a way to store
up to 59 bits in the page tables alongside a 4-bit 'type' identifier
specific to the format of the 59-bit payload.

Introduce kvm_pgtable_stage2_annotate() and move the existing invalid
ptes (for locked ptes and donated pages) over to the new scheme.

Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-22-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Avoid pointless annotation when mapping host-owned pages

When a page is transitioned to host ownership, we can eagerly map it
into the host stage-2 page-table rather than going via the convoluted
step of a faulting annotation to trigger the mapping.

Call host_stage2_idmap_locked() directly when transitioning a page to
be owned by the host.

Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-21-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Inject SIGSEGV on illegal accesses

The pKVM hypervisor will currently panic if the host tries to access
memory that it doesn't own (e.g. protected guest memory). Sadly, as
guest memory can still be mapped into the VMM's address space, userspace
can trivially crash the kernel/hypervisor by poking into guest memory.

To prevent this, inject the abort back in the host with S1PTW set in the
ESR, hence allowing the host to differentiate this abort from normal
userspace faults and inject a SIGSEGV cleanly.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-20-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Support translation faults in inject_host_exception()

Extend inject_host_exception() to support the injection of translation
faults on both the data and instruction side to 32-bit and 64-bit EL0
as well as 64-bit EL1. This will be used in a subsequent patch when
resolving an unhandled host stage-2 abort.

Cc: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-19-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Factor out pKVM host exception injection logic

inject_undef64() open-codes the logic to inject an exception into the
pKVM host. In preparation for reusing this logic to inject a data abort
on an unhandled stage-2 fault from the host, factor out the meat and
potatoes of the function into a new inject_host_exception() function
which takes the ESR as a parameter.

Cc: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-18-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy()

During teardown of a protected guest, its memory pages must be reclaimed
from the hypervisor by issuing the '__pkvm_reclaim_dying_guest_page'
hypercall.

Add a new helper, __pkvm_pgtable_stage2_reclaim(), which is called
during the VM teardown operation to reclaim pages from the hypervisor
and drop the GUP pin on the host.

Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-17-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page()

To enable reclaim of pages from a protected VM during teardown,
introduce a new hypercall to reclaim a single page from a protected
guest that is in the dying state.

Since the EL2 code is non-preemptible, the new hypercall deliberately
acts on a single page at a time so as to allow EL1 to reschedule
frequently during the teardown operation.

Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Co-developed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-16-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Handle aborts from protected VMs

Introduce a new abort handler for resolving stage-2 page faults from
protected VMs by pinning and donating anonymous memory. This is
considerably simpler than the infamous user_mem_abort() as we only have
to deal with translation faults at the pte level.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-15-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Hook up donation hypercall to pkvm_pgtable_stage2_map()

Mapping pages into a protected guest requires the donation of memory
from the host.

Extend pkvm_pgtable_stage2_map() to issue a donate hypercall when the
target VM is protected. Since the hypercall only handles a single page,
the splitting logic used for the share path is not required.

Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-14-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Introduce __pkvm_host_donate_guest()

In preparation for supporting protected VMs, whose memory pages are
isolated from the host, introduce a new pKVM hypercall to allow the
donation of pages to a guest.

Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-13-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Split teardown hypercall into two phases

In preparation for reclaiming protected guest VM pages from the host
during teardown, split the current 'pkvm_teardown_vm' hypercall into
separate 'start' and 'finalise' calls.

The 'pkvm_start_teardown_vm' hypercall puts the VM into a new 'is_dying'
state, which is a point of no return past which no vCPU of the pVM is
allowed to run any more. Once in this new state,
'pkvm_finalize_teardown_vm' can be used to reclaim meta-data and
page-table pages from the VM. A subsequent patch will add support for
reclaiming the individual guest memory pages.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Co-developed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-12-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Ignore -EAGAIN when mapping in pages for the pKVM host

If the host takes a stage-2 translation fault on two CPUs at the same
time, one of them will get back -EAGAIN from the page-table mapping code
when it runs into the mapping installed by the other.

Rather than handle this explicitly in handle_host_mem_abort(), pass the
new KVM_PGTABLE_WALK_IGNORE_EAGAIN flag to kvm_pgtable_stage2_map() from
__host_stage2_idmap() and return -EEXIST if host_stage2_adjust_range()
finds a valid pte. This will avoid having to test for -EAGAIN on the
reclaim path in subsequent patches.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-11-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Prevent unsupported memslot operations on protected VMs

Protected VMs do not support deleting or moving memslots after first
run nor do they support read-only or dirty logging.

Return -EPERM to userspace if such an operation is attempted.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-10-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Ignore MMU notifier callbacks for protected VMs

In preparation for supporting the donation of pinned pages to protected
VMs, return early from the MMU notifiers when called for a protected VM,
as the necessary hypercalls are exposed only for non-protected guests.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-9-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Remove is_protected_kvm_enabled() checks from hypercalls

When pKVM is not enabled, the host shouldn't issue pKVM-specific
hypercalls and so there's no point checking for this in the pKVM
hypercall handlers.

Remove the redundant is_protected_kvm_enabled() checks from each
hypercall and instead rejig the hypercall table so that the
pKVM-specific hypercalls are unreachable when pKVM is not being used.

Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-8-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Expose self-hosted debug regs as RAZ/WI for protected guests

Debug and trace are not currently supported for protected guests, so
trap accesses to the related registers and emulate them as RAZ/WI for
now. Although this isn't strictly compatible with the architecture, it's
sufficient for Linux guests and means that debug support can be added
later on.

Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-7-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Don't advertise unsupported features for protected guests

Both SVE and PMUv3 are treated as "restricted" features for protected
guests and attempts to access their corresponding architectural state
from a protected guest result in an undefined exception being injected
by the hypervisor.

Since these exceptions are unexpected and typically fatal for the guest,
don't advertise these features for protected guests.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-6-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Rename __pkvm_pgtable_stage2_unmap()

In preparation for adding support for protected VMs, where pages are
donated rather than shared, rename __pkvm_pgtable_stage2_unmap() to
__pkvm_pgtable_stage2_unshare() to make it clearer about what is going
on.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-5-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Move handle check into pkvm_pgtable_stage2_destroy_range()

When pKVM is enabled, a VM has a 'handle' allocated by the hypervisor
in kvm_arch_init_vm() and released later by kvm_arch_destroy_vm().

Consequently, the only time __pkvm_pgtable_stage2_unmap() can run into
an uninitialised 'handle' is on the kvm_arch_init_vm() failure path,
where we destroy the empty stage-2 page-table if we fail to allocate a
handle.

Move the handle check into pkvm_pgtable_stage2_destroy_range(), which
will additionally handle protected VMs in subsequent patches.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-4-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Don't leak stage-2 page-table if VM fails to init under pKVM

If pkvm_init_host_vm() fails, we should free the stage-2 page-table
previously allocated by kvm_init_stage2_mmu().

Cc: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Fixes: 07aeb70707b1 ("KVM: arm64: Reserve pKVM handle during pkvm_init_host_vm()")
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-3-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Remove unused PKVM_ID_FFA definition

Commit 7cbf7c37718e ("KVM: arm64: Drop pkvm_mem_transition for host/hyp
sharing") removed the last users of PKVM_ID_FFA, so drop the definition
altogether.

Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260330144841.26181-2-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Don't pass host_debug_state to BRBE world-switch routines

Now that the SPE and BRBE nVHE world-switch routines operate on the
host_debug_state directly, tweak the BRBE code to do the same for
consistency.

This is purely cosmetic.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260327130047.21065-4-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Disable SPE Profiling Buffer when running in guest context

The nVHE world-switch code relies on zeroing PMSCR_EL1 to disable
profiling data generation in guest context when SPE is in use by the
host.

Unfortunately, this may leave PMBLIMITR_EL1.E set and consequently we
can end up running in guest/hypervisor context with the Profiling Buffer
enabled. The current "known issues" document for Rev M.a of the Arm ARM
states that this can lead to speculative, out-of-context translations:

  | 2.18 D23136:
  |
  | When the Profiling Buffer is enabled, profiling is not stopped, and
  | Discard mode is not enabled, the Statistical Profiling Unit might
  | cause speculative translations for the owning translation regime,
  | including when the owning translation regime is out-of-context.

In a similar fashion to TRBE, ensure that the Profiling Buffer is
disabled during the nVHE world switch before we start messing with the
stage-2 MMU and trap configuration.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Fuad Tabba <tabba@google.com>
Fixes: f85279b4bd48 ("arm64: KVM: Save/restore the host SPE state when entering/leaving a VM")
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260327130047.21065-3-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context

The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
generation in guest context when self-hosted TRBE is in use by the host.

Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
per R_YCHKJ the Trace Buffer Unit will still be enabled if
TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
Trace Buffer Unit can perform address translation for the "owning
exception level" even when it is out of context.

Consequently, we can end up in a state where TRBE performs speculative
page-table walks for a host VA/IPA in guest/hypervisor context depending
on the value of MDCR_EL2.E2TB, which changes over world-switch. The
potential result appears to be a heady mixture of SErrors, data
corruption and hardware lockups.

Extend the TRBE world-switch code to clear TRBLIMITR_EL1.E after
draining the buffer, restoring the register on return to the host. This
unfortunately means we need to tackle CPU errata #2064142 and #2038923
which add additional synchronisation requirements around manipulations
of the limit register. Hopefully this doesn't need to be fast.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Fixes: a1319260bf62 ("arm64: KVM: Enable access to TRBE support for host")
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260327130047.21065-2-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Convert gmem_abort() to struct kvm_s2_fault_desc

Having fully converted user_mem_abort() to kvm_s2_fault_desc and
co, convert gmem_abort() to it as well. The change is obviously
much simpler.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Simplify integration of adjust_nested_*_perms()

Instead of passing pointers to adjust_nested_*_perms(), allow
them to return a new set of permissions.

With some careful moving around so that the canonical permissions
are computed before the nested ones are applied, we end-up with
a bit less code, and something a bit more readable.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Directly expose mapping prot and kill kvm_s2_fault

The 'prot' field is the only one left in kvm_s2_fault. Expose it
directly to the functions needing it, and get rid of kvm_s2_fault.

It has served us well during this refactoring, but it is now no
longer needed.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Move device mapping management into kvm_s2_fault_pin_pfn()

Attributes computed for devices are computed very late in the fault
handling process, meanning they are mutable for that long.

Introduce both 'device' and 'map_non_cacheable' attributes to the
vma_info structure, allowing that information to be set in stone
earlier, in kvm_s2_fault_pin_pfn().

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Replace force_pte with a max_map_size attribute

force_pte is annoyingly limited in what it expresses, and we'd
be better off with a more generic primitive. Introduce max_map_size
instead, which does the trick and can be moved into the vma_info
structure. This firther allows it to reduce the scopes in which
it is mutable.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Move kvm_s2_fault.{pfn,page} to kvm_s2_vma_info

Continue restricting the visibility/mutability of some attributes
by moving kvm_s2_fault.{pfn,page} to kvm_s2_vma_info.

This is a pretty mechanical change.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Restrict the scope of the 'writable' attribute

The 'writable' field is ambiguous, and indicates multiple things:

- whether the underlying memslot is writable

- whether we are resolving the fault with writable attributes

Add a new field to kvm_s2_fault_vma_info (map_writable) to indicate
the former condition, and have local writable variables to track
the latter.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Kill logging_active from kvm_s2_fault

There are only two spots where we evaluate whether logging is
active. Replace the boolean with calls to the relevant helper.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Move VMA-related information to kvm_s2_fault_vma_info

Mecanically extract a bunch of VMA-related fields from kvm_s2_fault
and move them to a new kvm_s2_fault_vma_info structure.

This is not much, but it already allows us to define which functions
can update this structure, and which ones are pure consumers of the
data. Those in the latter camp are updated to take a const pointer
to that structure.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Kill topup_memcache from kvm_s2_fault

The topup_memcache field can be easily replaced by the equivalent
conditions, and the resulting code is not much worse. While at it,
split prepare_mmu_memcache() into get/topup helpers, which makes
the code more readable.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Kill exec_fault from kvm_s2_fault

Similarly to write_fault, exec_fault can be advantageously replaced
by the kvm_vcpu_trap_is_exec_fault() predicate where needed.

Another one bites the dust...

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Kill write_fault from kvm_s2_fault

We already have kvm_is_write_fault() as a predicate indicating
a S2 fault on a write, and we're better off just using that instead
of duplicating the state.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Constrain fault_granule to kvm_s2_fault_map()

The notion of fault_granule is specific to kvm_s2_fault_map(), and
is unused anywhere else.

Move this variable locally, removing it from kvm_s2_fault.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Replace fault_is_perm with a helper

Carrying a boolean to indicate that a given fault is a permission fault
is slightly odd, as this is a property of the fault itself, and we'd
better avoid duplicating state.

For this purpose, introduce a kvm_s2_fault_is_perm() predicate that
can take a fault descriptor as a parameter. fault_is_perm is therefore
dropped from kvm_s2_fault.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Move fault context to const structure

In order to make it clearer what gets updated or not during fault
handling, move a set of information that losely represents the
fault context.

This gets populated early, from handle_mem_abort(), and gets passed
along as a const pointer. user_mem_abort()'s signature is majorly
improved in doing so, and kvm_s2_fault loses a bunch of fields.

gmem_abort() will get a similar treatment down the line.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Make fault_ipa immutable

Updating fault_ipa is conceptually annoying, as it changes something
that is a property of the fault itself.

Stop doing so and instead use fault->gfn as the sole piece of state
that can be used to represent the faulting IPA.

At the same time, introduce get_canonical_gfn() for the couple of case
we're we are concerned with the memslot-related IPA and not the faulting
one.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Kill fault->ipa

fault->ipa, in a nested contest, represents the output of the guest's
S2 translation for the fault->fault_ipa input, and is equal to
fault->fault_ipa otherwise,

Given that this is readily available from kvm_s2_trans_output(),
drop fault->ipa and directly compute fault->gfn instead, which
is really what we want.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Clean up control flow in kvm_s2_fault_map()

Clean up the KVM MMU lock retry loop by pre-assigning the error code.
Add clear braces to the THP adjustment integration for readability, and
safely unnest the transparent hugepage logic branches.

Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>