Marc Zyngier [Wed, 25 Feb 2026 10:47:18 +0000 (10:47 +0000)]
KVM: arm64: Deduplicate ASID retrieval code
We currently have three versions of the ASID retrieval code, one
in the S1 walker, and two in the VNCR handling (although the last
two are limited to the EL2&0 translation regime).
Make this code common, and take this opportunity to also simplify
the code a bit while switching over to the TTBRx_EL1_ASID macro.
Sascha Bischoff [Wed, 25 Feb 2026 08:31:40 +0000 (08:31 +0000)]
irqchip/gic-v5: Fix inversion of IRS_IDR0.virt flag
It appears that a !! became ! during a cleanup, resulting in inverted
logic when detecting if a host GICv5 implementation is capable of
virtualization.
Fuad Tabba [Sun, 22 Feb 2026 08:33:52 +0000 (08:33 +0000)]
KVM: arm64: Revert accidental drop of kvm_uninit_stage2_mmu() for non-NV VMs
Commit 0c4762e26879 ("KVM: arm64: nv: Avoid NV stage-2 code when NV is
not supported") added an early return to several functions in
arch/arm64/kvm/nested.c to prevent a UBSAN shift-out-of-bounds error
when accessing the pgt union for non-nested VMs.
However, this early return was inadvertently applied to
kvm_arch_flush_shadow_all() as well, causing it to skip the call to
kvm_uninit_stage2_mmu(kvm) for all non-nested VMs.
For pKVM, skipping this teardown means the host never unshares the
guest's memory with the EL2 hypervisor. When the host kernel later
recycles these leaked pages for a new VM, it attempts to re-share them.
The hypervisor correctly rejects this with -EPERM, triggering a host
WARN_ON and hanging the guest.
Fix this by dropping the early return from kvm_arch_flush_shadow_all().
The for-loop guarding the nested MMU cleanup already bounds itself when
nested_mmus_size == 0, allowing execution to proceed to
kvm_uninit_stage2_mmu() as intended.
Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/all/60916cb6-f460-4751-b910-f63c58700ad0@sirena.org.uk/ Fixes: 0c4762e26879 ("KVM: arm64: nv: Avoid NV stage-2 code when NV is not supported") Signed-off-by: Fuad Tabba <tabba@google.com> Tested-by: Mark Brown <broonie@kernel.org> Link: https://patch.msgid.link/20260222083352.89503-1-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Sun, 22 Feb 2026 13:35:13 +0000 (13:35 +0000)]
KVM: arm64: Fix protected mode handling of pages larger than 4kB
Since 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings"),
pKVM tracks the memory that has been mapped into a guest in a
side data structure. Crucially, it uses it to find out whether
a page has already been mapped, and therefore refuses to map it
twice. So far, so good.
However, this very patch completely breaks non-4kB page support,
with guests being unable to boot. The most obvious symptom is that
we take the same fault repeatedly, and not making forward progress.
A quick investigation shows that this is because of the above
rejection code.
As it turns out, there are multiple issues at play:
- while the HPFAR_EL2 register gives you the faulting IPA minus
the bottom 12 bits, it will still give you the extra bits that
are part of the page offset for anything larger than 4kB,
even for a level-3 mapping
- pkvm_pgtable_stage2_map() assumes that the address passed as
a parameter is aligned to the size of the intended mapping
- the faulting address is only aligned for a non-page mapping
When the planets are suitably aligned (pun intended), the guest
faults on a page by accessing it past the bottom 4kB, and extra bits
get set in the HPFAR_EL2 register. If this results in a page mapping
(which is likely with large granule sizes), nothing aligns it further
down, and pkvm_mapping_iter_first() finds an intersection that
doesn't really exist. We assume this is a spurious fault and return
-EAGAIN. And again...
This doesn't hit outside of the protected code, as the page table
code always aligns the IPA down to a page boundary, hiding the issue
for everyone else.
Fix it by always forcing the alignment on vma_pagesize, irrespective
of the value of vma_pagesize.
Fixes: 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings") Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://https://patch.msgid.link/20260222141000.3084258-1-maz@kernel.org Cc: stable@vger.kernel.org
Kees Cook [Fri, 6 Feb 2026 22:30:23 +0000 (14:30 -0800)]
KVM: arm64: vgic: Handle const qualifier from gic_kvm_info allocation type
In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)
The assigned type is "struct gic_kvm_info", but the returned type,
while matching, is const qualified. To get them exactly matching, just
use the dereferenced pointer for the sizeof().
Fuad Tabba [Fri, 13 Feb 2026 14:38:15 +0000 (14:38 +0000)]
KVM: arm64: Remove redundant kern_hyp_va() in unpin_host_sve_state()
The `sve_state` pointer in `hyp_vcpu->vcpu.arch` is initialized as a
hypervisor virtual address during vCPU initialization in
`pkvm_vcpu_init_sve()`.
`unpin_host_sve_state()` calls `kern_hyp_va()` on this address. Since
`kern_hyp_va()` is idempotent, it's not a bug. However, it is
unnecessary and potentially confusing. Remove the redundant conversion.
Fuad Tabba [Fri, 13 Feb 2026 14:38:14 +0000 (14:38 +0000)]
KVM: arm64: Fix ID register initialization for non-protected pKVM guests
In protected mode, the hypervisor maintains a separate instance of
the `kvm` structure for each VM. For non-protected VMs, this structure is
initialized from the host's `kvm` state.
Currently, `pkvm_init_features_from_host()` copies the
`KVM_ARCH_FLAG_ID_REGS_INITIALIZED` flag from the host without the
underlying `id_regs` data being initialized. This results in the
hypervisor seeing the flag as set while the ID registers remain zeroed.
Consequently, `kvm_has_feat()` checks at EL2 fail (return 0) for
non-protected VMs. This breaks logic that relies on feature detection,
such as `ctxt_has_tcrx()` for TCR2_EL1 support. As a result, certain
system registers (e.g., TCR2_EL1, PIR_EL1, POR_EL1) are not
saved/restored during the world switch, which could lead to state
corruption.
Fix this by explicitly copying the ID registers from the host `kvm` to
the hypervisor `kvm` for non-protected VMs during initialization, since
we trust the host with its non-protected guests' features. Also ensure
`KVM_ARCH_FLAG_ID_REGS_INITIALIZED` is cleared initially in
`pkvm_init_features_from_host` so that `vm_copy_id_regs` can properly
initialize them and set the flag once done.
Fuad Tabba [Fri, 13 Feb 2026 14:38:13 +0000 (14:38 +0000)]
KVM: arm64: Optimise away S1POE handling when not supported by host
Although ID register sanitisation prevents guests from seeing the
feature, adding this check to the helper allows the compiler to entirely
eliminate S1POE-specific code paths (such as context switching POR_EL1)
when the host kernel is compiled without support (CONFIG_ARM64_POE is
disabled).
This aligns with the pattern used for other optional features like SVE
(kvm_has_sve()) and FPMR (kvm_has_fpmr()), ensuring no POE logic if the
host lacks support, regardless of the guest configuration state.
Fuad Tabba [Fri, 13 Feb 2026 14:38:12 +0000 (14:38 +0000)]
KVM: arm64: Hide S1POE from guests when not supported by the host
When CONFIG_ARM64_POE is disabled, KVM does not save/restore POR_EL1.
However, ID_AA64MMFR3_EL1 sanitisation currently exposes the feature to
guests whenever the hardware supports it, ignoring the host kernel
configuration.
If a guest detects this feature and attempts to use it, the host will
fail to context-switch POR_EL1, potentially leading to state corruption.
Fix this by masking ID_AA64MMFR3_EL1.S1POE in the sanitised system
registers, preventing KVM from advertising the feature when the host
does not support it (i.e. system_supports_poe() is false).
Marc Zyngier [Thu, 5 Feb 2026 09:17:58 +0000 (09:17 +0000)]
Merge branch kvm-arm64/misc-6.20 into kvmarm-master/next
* kvm-arm64/misc-6.20:
: .
: Misc KVM/arm64 changes for 6.20
:
: - Trivial FPSIMD cleanups
:
: - Calculate hyp VA size only once, avoiding potential mapping issues when
: VA bits is smaller than expected
:
: - Silence sparse warning for the HYP stack base
:
: - Fix error checking when handling FFA_VERSION
:
: - Add missing trap configuration for DBGWCR15_EL1
:
: - Don't try to deal with nested S2 when NV isn't enabled for a guest
:
: - Various spelling fixes
: .
KVM: arm64: nv: Avoid NV stage-2 code when NV is not supported
KVM: arm64: Fix various comments
KVM: arm64: nv: Add trap config for DBGWCR<15>_EL1
KVM: arm64: Fix error checking for FFA_VERSION
KVM: arm64: Fix missing <asm/stackpage/nvhe.h> include
KVM: arm64: Calculate hyp VA size only once
KVM: arm64: Remove ISB after writing FPEXC32_EL2
KVM: arm64: Shuffle KVM_HOST_DATA_FLAG_* indices
KVM: arm64: Fix comment in fpsimd_lazy_switch_to_host()
Marc Zyngier [Thu, 5 Feb 2026 09:17:48 +0000 (09:17 +0000)]
Merge branch kvm-arm64/resx into kvmarm-master/next
* kvm-arm64/resx:
: .
: Add infrastructure to deal with the full gamut of RESx bits
: for NV. As a result, it is now possible to have the expected
: semantics for some bits such as SCTLR_EL2.SPAN.
: .
KVM: arm64: Add debugfs file dumping computed RESx values
KVM: arm64: Add sanitisation to SCTLR_EL2
KVM: arm64: Remove all traces of HCR_EL2.MIOCNCE
KVM: arm64: Remove all traces of FEAT_TME
KVM: arm64: Simplify handling of full register invalid constraint
KVM: arm64: Get rid of FIXED_VALUE altogether
KVM: arm64: Simplify handling of HCR_EL2.E2H RESx
KVM: arm64: Move RESx into individual register descriptors
KVM: arm64: Add RES1_WHEN_E2Hx constraints as configuration flags
KVM: arm64: Add REQUIRES_E2H1 constraint as configuration flags
KVM: arm64: Simplify FIXED_VALUE handling
KVM: arm64: Convert HCR_EL2.RW to AS_RES1
KVM: arm64: Correctly handle SCTLR_EL1 RES1 bits for unsupported features
KVM: arm64: Allow RES1 bits to be inferred from configuration
KVM: arm64: Inherit RESx bits from FGT register descriptors
KVM: arm64: Extend unified RESx handling to runtime sanitisation
KVM: arm64: Introduce data structure tracking both RES0 and RES1 bits
KVM: arm64: Introduce standalone FGU computing primitive
KVM: arm64: Remove duplicate configuration for SCTLR_EL1.{EE,E0E}
arm64: Convert SCTLR_EL2 to sysreg infrastructure
Marc Zyngier [Thu, 5 Feb 2026 09:17:44 +0000 (09:17 +0000)]
Merge branch kvm-arm64/debugfs-fixes into kvmarm-master/next
* kvm-arm64/debugfs-fixes:
: .
: Cleanup of the debugfs iterator, which are way more complicated
: than they ought to be, courtesy of Fuad Tabba. From the cover letter:
:
: "This series refactors the debugfs implementations for `idregs` and
: `vgic-state` to use standard `seq_file` iterator patterns.
:
: The existing implementations relied on storing iterator state within
: global VM structures (`kvm_arch` and `vgic_dist`). This approach
: prevented concurrent reads of the debugfs files (returning -EBUSY) and
: created improper dependencies between transient file operations and
: long-lived VM state."
: .
KVM: arm64: Use standard seq_file iterator for vgic-debug debugfs
KVM: arm64: Reimplement vgic-debug XArray iteration
KVM: arm64: Use standard seq_file iterator for idregs debugfs
Marc Zyngier [Thu, 5 Feb 2026 09:17:30 +0000 (09:17 +0000)]
Merge branch kvm-arm64/gicv5-prologue into kvmarm-master/next
* kvm-arm64/gicv5-prologue:
: .
: Prologue to GICv5 support, courtesy of Sascha Bischoff.
:
: This is preliminary work that sets the scene for the full-blow
: support.
: .
irqchip/gic-v5: Check if impl is virt capable
KVM: arm64: gic: Set vgic_model before initing private IRQs
arm64/sysreg: Drop ICH_HFGRTR_EL2.ICC_HAPR_EL1 and make RES1
KVM: arm64: gic-v3: Switch vGIC-v3 to use generated ICH_VMCR_EL2
Marc Zyngier [Thu, 5 Feb 2026 09:17:04 +0000 (09:17 +0000)]
Merge branch kvm-arm64/gicv3-tdir-fixes into kvmarm-master/next
* kvm-arm64/gicv3-tdir-fixes:
: .
: Address two trapping-related issues when running legacy (i.e. GICv3)
: guests on GICv5 hosts, courtesy of Sascha Bischoff.
: .
KVM: arm64: Correct test for ICH_HCR_EL2_TDIR cap for GICv5 hosts
KVM: arm64: gic: Enable GICv3 CPUIF trapping on GICv5 hosts if required
Marc Zyngier [Thu, 5 Feb 2026 09:16:42 +0000 (09:16 +0000)]
Merge branch kvm-arm64/fwb-for-all into kvmarm-master/next
* kvm-arm64/fwb-for-all:
: .
: Allow pKVM's host stage-2 mappings to use the Force Write Back version
: of the memory attributes by using the "pass-through' encoding.
:
: This avoids having two separate encodings for S2 on a given platform.
: .
KVM: arm64: Simplify PAGE_S2_MEMATTR
KVM: arm64: Kill KVM_PGTABLE_S2_NOFWB
KVM: arm64: Switch pKVM host S2 over to KVM_PGTABLE_S2_AS_S1
KVM: arm64: Add KVM_PGTABLE_S2_AS_S1 flag
arm64: Add MT_S2{,_FWB}_AS_S1 encodings
Marc Zyngier [Thu, 5 Feb 2026 09:16:31 +0000 (09:16 +0000)]
Merge branch kvm-arm64/pkvm-no-mte into kvmarm-master/next
* kvm-arm64/pkvm-no-mte:
: .
: pKVM updates preventing the host from using MTE-related system
: sysrem registers when the feature is disabled from the kernel
: command-line (arm64.nomte), courtesy of Fuad Taba.
:
: From the cover letter:
:
: "If MTE is supported by the hardware (and is enabled at EL3), it remains
: available to lower exception levels by default. Disabling it in the host
: kernel (e.g., via 'arm64.nomte') only stops the kernel from advertising
: the feature; it does not physically disable MTE in the hardware.
:
: The ability to disable MTE in the host kernel is used by some systems,
: such as Android, so that the physical memory otherwise used as tag
: storage can be used for other things (i.e. treated just like the rest of
: memory). In this scenario, a malicious host could still access tags in
: pages donated to a guest using MTE instructions (e.g., STG and LDG),
: bypassing the kernel's configuration."
: .
KVM: arm64: Use kvm_has_mte() in pKVM trap initialization
KVM: arm64: Inject UNDEF when accessing MTE sysregs with MTE disabled
KVM: arm64: Trap MTE access and discovery when MTE is disabled
KVM: arm64: Remove dead code resetting HCR_EL2 for pKVM
Computing RESx values is hard. Verifying that they are correct is
harder. Add a debugfs file called "resx" that will dump all the RESx
values for a given VM.
Marc Zyngier [Mon, 2 Feb 2026 18:43:27 +0000 (18:43 +0000)]
KVM: arm64: Remove all traces of HCR_EL2.MIOCNCE
MIOCNCE had the potential to eat your data, and also was never
implemented by anyone. It's been retrospectively removed from
the architecture, and we're happy to follow that lead.
Marc Zyngier [Mon, 2 Feb 2026 18:43:25 +0000 (18:43 +0000)]
KVM: arm64: Simplify handling of full register invalid constraint
Now that we embed the RESx bits in the register description, it becomes
easier to deal with registers that are simply not valid, as their
existence is not satisfied by the configuration (SCTLR2_ELx without
FEAT_SCTLR2, for example). Such registers essentially become RES0 for
any bit that wasn't already advertised as RESx.
Marc Zyngier [Mon, 2 Feb 2026 18:43:23 +0000 (18:43 +0000)]
KVM: arm64: Simplify handling of HCR_EL2.E2H RESx
Now that we can link the RESx behaviour with the value of HCR_EL2.E2H,
we can trivially express the tautological constraint that makes E2H
a reserved value at all times.
Marc Zyngier [Mon, 2 Feb 2026 18:43:21 +0000 (18:43 +0000)]
KVM: arm64: Add RES1_WHEN_E2Hx constraints as configuration flags
"Thanks" to VHE, SCTLR_EL2 radically changes shape depending on the
value of HCR_EL2.E2H, as a lot of the bits that didn't have much
meaning with E2H=0 start impacting EL0 with E2H=1.
This has a direct impact on the RESx behaviour of these bits, and
we need a way to express them.
For this purpose, introduce two new constaints that, when the
controlling feature is not present, force the field to RES1 depending
on the value of E2H. Note that RES0 is still implicit,
This allows diverging RESx values depending on the value of E2H,
something that is required by a bunch of SCTLR_EL2 bits.
Marc Zyngier [Mon, 2 Feb 2026 18:43:20 +0000 (18:43 +0000)]
KVM: arm64: Add REQUIRES_E2H1 constraint as configuration flags
A bunch of EL2 configuration are very similar to their EL1 counterpart,
with the added constraint that HCR_EL2.E2H being 1.
For us, this means HCR_EL2.E2H being RES1, which is something we can
statically evaluate.
Add a REQUIRES_E2H1 constraint, which allows us to express conditions
in a much simpler way (without extra code). Existing occurrences are
converted, before we add a lot more.
Marc Zyngier [Mon, 2 Feb 2026 18:43:19 +0000 (18:43 +0000)]
KVM: arm64: Simplify FIXED_VALUE handling
The FIXED_VALUE qualifier (mostly used for HCR_EL2) is pointlessly
complicated, as it tries to piggy-back on the previous RES0 handling
while being done in a different phase, on different data.
Instead, make it an integral part of the RESx computation, and allow
it to directly set RESx bits. This is much easier to understand.
It also paves the way for some additional changes to that will allow
the full removal of the FIXED_VALUE handling.
Marc Zyngier [Mon, 2 Feb 2026 18:43:16 +0000 (18:43 +0000)]
KVM: arm64: Allow RES1 bits to be inferred from configuration
So far, when a bit field is tied to an unsupported feature, we set
it as RES0. This is almost correct, but there are a few exceptions
where the bits become RES1.
Add a AS_RES1 qualifier that instruct the RESx computing code to
simply do that.
Marc Zyngier [Mon, 2 Feb 2026 18:43:14 +0000 (18:43 +0000)]
KVM: arm64: Extend unified RESx handling to runtime sanitisation
Add a new helper to retrieve the RESx values for a given system
register, and use it for the runtime sanitisation.
This results in slightly better code generation for a fairly hot
path in the hypervisor, and additionally covers all sanitised
registers in all conditions, not just the VNCR-based ones.
Marc Zyngier [Mon, 2 Feb 2026 18:43:13 +0000 (18:43 +0000)]
KVM: arm64: Introduce data structure tracking both RES0 and RES1 bits
We have so far mostly tracked RES0 bits, but only made a few attempts
at being just as strict for RES1 bits (probably because they are both
rarer and harder to handle).
Start scratching the surface by introducing a data structure tracking
RES0 and RES1 bits at the same time.
Note that contrary to the usual idiom, this structure is mostly passed
around by value -- the ABI handles it nicely, and the resulting code is
much nicer.
Marc Zyngier [Mon, 2 Feb 2026 18:43:10 +0000 (18:43 +0000)]
arm64: Convert SCTLR_EL2 to sysreg infrastructure
Convert SCTLR_EL2 to the sysreg infrastructure, as per the 2025-12_rel
revision of the Registers.json file.
Note that we slightly deviate from the above, as we stick to the ARM
ARM M.a definition of SCTLR_EL2[9], which is RES0, in order to avoid
dragging the POE2 definitions...
Fuad Tabba [Mon, 2 Feb 2026 15:22:53 +0000 (15:22 +0000)]
KVM: arm64: nv: Avoid NV stage-2 code when NV is not supported
The NV stage-2 manipulation functions kvm_nested_s2_unmap(),
kvm_nested_s2_wp(), and others, are being called for any stage-2
manipulation regardless of whether nested virtualization is supported or
enabled for the VM.
For protected KVM (pKVM), `struct kvm_pgtable` uses the
`pkvm_mappings` member of the union. This member aliases `ia_bits`,
which is used by the non-protected NV code paths. Attempting to
read `pgt->ia_bits` in these functions results in treating
protected mapping pointers or state values as bit-shift amounts.
This triggers a UBSAN shift-out-of-bounds error:
UBSAN: shift-out-of-bounds in arch/arm64/kvm/nested.c:1127:34
shift exponent 174565952 is too large for 64-bit type 'unsigned long'
Call trace:
__ubsan_handle_shift_out_of_bounds+0x28c/0x2c0
kvm_nested_s2_unmap+0x228/0x248
kvm_arch_flush_shadow_memslot+0x98/0xc0
kvm_set_memslot+0x248/0xce0
Since pKVM and NV are mutually exclusive, prevent entry into these
NV handling functions if the VM has not allocated any nested MMUs
(i.e., `kvm->arch.nested_mmus_size` is 0).
Fixes: 7270cc9157f47 ("KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers") Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202152310.113467-1-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
Fuad Tabba [Mon, 2 Feb 2026 08:57:21 +0000 (08:57 +0000)]
KVM: arm64: Use standard seq_file iterator for vgic-debug debugfs
The current implementation uses `vgic_state_iter` in `struct
vgic_dist` to track the sequence position. This effectively makes the
iterator shared across all open file descriptors for the VM.
This approach has significant drawbacks:
- It enforces mutual exclusion, preventing concurrent reads of the
debugfs file (returning -EBUSY).
- It relies on storing transient iterator state in the long-lived
VM structure (`vgic_dist`).
Refactor the implementation to use the standard `seq_file` iterator.
Instead of storing state in `kvm_arch`, rely on the `pos` argument
passed to the `start` and `next` callbacks, which tracks the logical
index specific to the file descriptor.
This change enables concurrent access and eliminates the
`vgic_state_iter` field from `struct vgic_dist`.
The vgic-debug interface implementation uses XArray marks
(`LPI_XA_MARK_DEBUG_ITER`) to "snapshot" LPIs at the start of iteration.
This modifies global state for a read-only operation and complicates
reference counting, leading to leaks if iteration is aborted or fails.
Reimplement the iterator to use dynamic iteration logic:
- Remove `lpi_idx` from `struct vgic_state_iter`.
- Replace the XArray marking mechanism with dynamic iteration using
`xa_find_after(..., XA_PRESENT)`.
- Wrap XArray traversals in `rcu_read_lock()`/`rcu_read_unlock()` to
ensure safety against concurrent modifications (e.g., LPI unmapping).
- Handle potential races where an LPI is removed during iteration by
gracefully skipping it in `show()`, rather than warning.
- Remove the unused `LPI_XA_MARK_DEBUG_ITER` definition.
This simplifies the lifecycle management of the iterator and prevents
resource leaks associated with the marking mechanism, and paves the way
for using a standard seq_file iterator.
Fuad Tabba [Mon, 2 Feb 2026 08:57:19 +0000 (08:57 +0000)]
KVM: arm64: Use standard seq_file iterator for idregs debugfs
The current implementation uses `idreg_debugfs_iter` in `struct
kvm_arch` to track the sequence position. This effectively makes the
iterator shared across all open file descriptors for the VM.
This approach has significant drawbacks:
- It enforces mutual exclusion, preventing concurrent reads of the
debugfs file (returning -EBUSY).
- It relies on storing transient iterator state in the long-lived VM
structure (`kvm_arch`).
- The use of `u8` for the iterator index imposes an implicit limit of
255 registers. While not currently exceeded, this is fragile against
future architectural growth. Switching to `loff_t` eliminates this
overflow risk.
Refactor the implementation to use the standard `seq_file` iterator.
Instead of storing state in `kvm_arch`, rely on the `pos` argument
passed to the `start` and `next` callbacks, which tracks the logical
index specific to the file descriptor.
This change enables concurrent access and eliminates the
`idreg_debugfs_iter` field from `struct kvm_arch`.
Sascha Bischoff [Wed, 28 Jan 2026 18:07:33 +0000 (18:07 +0000)]
irqchip/gic-v5: Check if impl is virt capable
Now that there is support for creating a GICv5-based guest with KVM,
check that the hardware itself supports virtualisation, skipping the
setting of struct gic_kvm_info if not.
Note: If native GICv5 virt is not supported, then nor is
FEAT_GCIE_LEGACY, so we are able to skip altogether.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260128175919.3828384-33-sascha.bischoff@arm.com
[maz: cosmetic changes] Signed-off-by: Marc Zyngier <maz@kernel.org>
Sascha Bischoff [Wed, 28 Jan 2026 18:00:52 +0000 (18:00 +0000)]
KVM: arm64: gic: Set vgic_model before initing private IRQs
Different GIC types require the private IRQs to be initialised
differently. GICv5 is the culprit as it supports both a different
number of private IRQs, and all of these are PPIs (there are no
SGIs). Moreover, as GICv5 uses the top bits of the interrupt ID to
encode the type, the intid also needs to computed differently.
Up until now, the GIC model has been set after initialising the
private IRQs for a VCPU. Move this earlier to ensure that the GIC
model is available when configuring the private IRQs. While we're at
it, also move the setting of the in_kernel flag and implementation
revision to keep them grouped together as before.
Sascha Bischoff [Wed, 28 Jan 2026 18:00:06 +0000 (18:00 +0000)]
arm64/sysreg: Drop ICH_HFGRTR_EL2.ICC_HAPR_EL1 and make RES1
The GICv5 architecture is dropping the ICC_HAPR_EL1 and ICV_HAPR_EL1
system registers. These registers were never added to the sysregs, but
the traps for them were.
Drop the trap bit from the ICH_HFGRTR_EL2 and make it Res1 as per the
upcoming GICv5 spec change. Additionally, update the EL2 setup code to
not attempt to set that bit.
Sascha Bischoff [Wed, 28 Jan 2026 17:59:50 +0000 (17:59 +0000)]
KVM: arm64: gic-v3: Switch vGIC-v3 to use generated ICH_VMCR_EL2
The VGIC-v3 code relied on hand-written definitions for the
ICH_VMCR_EL2 register. This register, and the associated fields, is
now generated as part of the sysreg framework. Move to using the
generated definitions instead of the hand-written ones.
There are no functional changes as part of this change.
Sascha Bischoff [Mon, 8 Dec 2025 15:28:23 +0000 (15:28 +0000)]
KVM: arm64: Correct test for ICH_HCR_EL2_TDIR cap for GICv5 hosts
The original order of checks in the ICH_HCR_EL2_TDIR test returned
with false early in the case where the native GICv3 CPUIF was not
present. The result was that on GICv5 hosts with legacy support -
which do not have the GICv3 CPUIF - the test always returned false.
Reshuffle the checks such that support for GICv5 legacy is checked
prior to checking for the native GICv3 CPUIF.
Sascha Bischoff [Mon, 8 Dec 2025 15:28:22 +0000 (15:28 +0000)]
KVM: arm64: gic: Enable GICv3 CPUIF trapping on GICv5 hosts if required
Factor out the enable (and printing of) the GICv3 CPUIF traps from the
main GICv3 probe into a separate function. Call said function from the
GICv5 probe for legacy support, ensuring that any required GICv3 CPUIF
traps on GICv5 hosts will be correctly handled, rather than injecting
an undef into the guest.
Marc Zyngier [Fri, 23 Jan 2026 19:16:36 +0000 (19:16 +0000)]
KVM: arm64: Kill KVM_PGTABLE_S2_NOFWB
Nobody is using this flag anymore, so remove it. This allows
some cleanup by removing stage2_has_fwb(), which is can be replaced
by a direct check on the capability.
Marc Zyngier [Fri, 23 Jan 2026 19:16:33 +0000 (19:16 +0000)]
arm64: Add MT_S2{,_FWB}_AS_S1 encodings
pKVM usage of S2 translation on the host is purely for isolation
purposes, not translation. To that effect, the memory attributes
being used must be that of S1.
With FWB=0, this is easily achieved by using the Normal Cacheable
type (which is the weakest possible memory type) at S2, and let S1
pick something stronger as required.
With FWB=1, the attributes are combined in a different way, and we
cannot arbitrarily use Normal Cacheable. We can, however, use a
memattr encoding that indicates that the final attributes are that
of Stage-1.
Add these encoding and a few pointers to the relevant parts of the
specification. It might come handy some day.
Fuad Tabba [Thu, 22 Jan 2026 11:22:18 +0000 (11:22 +0000)]
KVM: arm64: Use kvm_has_mte() in pKVM trap initialization
When initializing HCR traps in protected mode, use kvm_has_mte() to
check for MTE support rather than kvm_has_feat(kvm, ID_AA64PFR1_EL1,
MTE, IMP).
kvm_has_mte() provides a more comprehensive check:
- kvm_has_feat() only checks if MTE is in the guest's ID register view
(i.e., what we advertise to the guest)
- kvm_has_mte() checks both system_supports_mte() AND whether
KVM_ARCH_FLAG_MTE_ENABLED is set for this VM instance
Fuad Tabba [Thu, 22 Jan 2026 11:22:17 +0000 (11:22 +0000)]
KVM: arm64: Inject UNDEF when accessing MTE sysregs with MTE disabled
When MTE hardware is present but disabled via software (`arm64.nomte` or
`CONFIG_ARM64_MTE=n`), the kernel clears `HCR_EL2.ATA` and sets
`HCR_EL2.TID5`, to prevent the use of MTE instructions.
Additionally, accesses to certain MTE system registers trap to EL2 with
exception class ESR_ELx_EC_SYS64. To emulate hardware without MTE (where
such accesses would cause an Undefined Instruction exception), inject
UNDEF into the host.
Fuad Tabba [Thu, 22 Jan 2026 11:22:16 +0000 (11:22 +0000)]
KVM: arm64: Trap MTE access and discovery when MTE is disabled
If MTE is not supported by the hardware, or is disabled in the kernel
configuration (`CONFIG_ARM64_MTE=n`) or command line (`arm64.nomte`),
the kernel stops advertising MTE to userspace and avoids using MTE
instructions. However, this is a software-level disable only.
When MTE hardware is present and enabled by EL3 firmware, leaving
`HCR_EL2.ATA` set allows the host to execute MTE instructions (STG, LDG,
etc.) and access allocation tags in physical memory.
Prevent this by clearing `HCR_EL2.ATA` when MTE is disabled. Remove it
from the `HCR_HOST_NVHE_FLAGS` default, and conditionally set it in
`cpu_prepare_hyp_mode()` only when `system_supports_mte()` returns true.
This causes MTE instructions to trap to EL2 when `HCR_EL2.ATA` is
cleared.
Additionally, set `HCR_EL2.TID5` when MTE is disabled. This traps reads
of `GMID_EL1` (Multiple tag transfer ID register) to EL2, preventing the
discovery of MTE parameters (such as tag block size) when the feature is
suppressed.
Early boot code in `head.S` temporarily keeps `HCR_ATA` set to avoid
special-casing initialization paths. This is safe because this code
executes before untrusted code runs and will clear `HCR_ATA` if MTE is
disabled.
Fuad Tabba [Thu, 22 Jan 2026 11:22:15 +0000 (11:22 +0000)]
KVM: arm64: Remove dead code resetting HCR_EL2 for pKVM
The pKVM lifecycle does not support tearing down the hypervisor and
returning to the hyp stub once initialized. The transition to protected
mode is one-way.
Consequently, the code path in hyp-init.S responsible for resetting
EL2 registers (triggered by kexec or hibernation) is unreachable in
protected mode.
Remove the dead code handling HCR_EL2 reset for
ARM64_KVM_PROTECTED_MODE.
Marc Zyngier [Fri, 23 Jan 2026 10:04:47 +0000 (10:04 +0000)]
Merge branch kvm-arm64/pkvm-features-6.20 into kvmarm-master/next
* kvm-arm64/pkvm-features-6.20:
: .
: pKVM guest feature trapping fixes, courtesy of Fuad Tabba.
: .
KVM: arm64: Prevent host from managing timer offsets for protected VMs
KVM: arm64: Check whether a VM IOCTL is allowed in pKVM
KVM: arm64: Track KVM IOCTLs and their associated KVM caps
KVM: arm64: Do not allow KVM_CAP_ARM_MTE for any guest in pKVM
KVM: arm64: Include VM type when checking VM capabilities in pKVM
KVM: arm64: Introduce helper to calculate fault IPA offset
KVM: arm64: Fix MTE flag initialization for protected VMs
KVM: arm64: Fix Trace Buffer trap polarity for protected VMs
KVM: arm64: Fix Trace Buffer trapping for protected VMs
Marc Zyngier [Fri, 23 Jan 2026 10:04:35 +0000 (10:04 +0000)]
Merge branch kvm-arm64/feat_idst into kvmarm-master/next
* kvm-arm64/feat_idst:
: .
: Add support for FEAT_IDST, allowing ID registers that are not implemented
: to be reported as a normal trap rather than as an UNDEF exception.
: .
KVM: arm64: selftests: Add a test for FEAT_IDST
KVM: arm64: pkvm: Report optional ID register traps with a 0x18 syndrome
KVM: arm64: pkvm: Add a generic synchronous exception injection primitive
KVM: arm64: Force trap of GMID_EL1 when the guest doesn't have MTE
KVM: arm64: Handle CSSIDR2_EL1 and SMIDR_EL1 in a generic way
KVM: arm64: Handle FEAT_IDST for sysregs without specific handlers
KVM: arm64: Add a generic synchronous exception injection primitive
KVM: arm64: Add trap routing for GMID_EL1
arm64: Repaint ID_AA64MMFR2_EL1.IDS description
Marc Zyngier [Fri, 23 Jan 2026 10:03:03 +0000 (10:03 +0000)]
Merge branch kvm-arm64/vtcr into kvmarm-master/next
* kvm-arm64/vtcr:
: .
: VTCR_EL2 conversion to the configuration-driven RESx framework,
: fixing a couple of UXN/PXN/XN bugs in the process.
: .
KVM: arm64: nv: Return correct RES0 bits for FGT registers
KVM: arm64: Always populate FGT masks at boot time
KVM: arm64: Honor UX/PX attributes for EL2 S1 mappings
KVM: arm64: Convert VTCR_EL2 to config-driven sanitisation
KVM: arm64: Account for RES1 bits in DECLARE_FEAT_MAP() and co
arm64: Convert VTCR_EL2 to sysreg infratructure
arm64: Convert ID_AA64MMFR0_EL1.TGRAN{4,16,64}_2 to UnsignedEnum
KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers
KVM: arm64: Don't blindly set set PSTATE.PAN on guest exit
KVM: arm64: nv: Respect stage-2 write permssion when setting stage-1 AF
KVM: arm64: Remove unused vcpu_{clear,set}_wfx_traps()
KVM: arm64: Remove unused parameter in synchronize_vcpu_pstate()
KVM: arm64: Remove extra argument for __pvkm_host_{share,unshare}_hyp()
KVM: arm64: Inject UNDEF for a register trap without accessor
KVM: arm64: Copy FGT traps to unprotected pKVM VCPU on VCPU load
KVM: arm64: Fix EL2 S1 XN handling for hVHE setups
KVM: arm64: gic: Check for vGICv3 when clearing TWI
Marc Zyngier [Fri, 23 Jan 2026 09:51:57 +0000 (09:51 +0000)]
Merge branch arm64/for-next/cpufeature into kvmarm-master/next
Merge arm64/for-next/cpufeature in to resolve conflicts resulting from
the removal of CONFIG_PAN.
* arm64/for-next/cpufeature:
arm64: Add support for FEAT_{LS64, LS64_V}
KVM: arm64: Enable FEAT_{LS64, LS64_V} in the supported guest
arm64: Provide basic EL2 setup for FEAT_{LS64, LS64_V} usage at EL0/1
KVM: arm64: Handle DABT caused by LS64* instructions on unsupported memory
KVM: arm64: Add documentation for KVM_EXIT_ARM_LDST64B
KVM: arm64: Add exit to userspace on {LD,ST}64B* outside of memslots
arm64: Unconditionally enable PAN support
arm64: Unconditionally enable LSE support
arm64: Add support for TSV110 Spectre-BHB mitigation
Yicong Yang [Mon, 19 Jan 2026 02:29:27 +0000 (10:29 +0800)]
arm64: Add support for FEAT_{LS64, LS64_V}
Armv8.7 introduces single-copy atomic 64-byte loads and stores
instructions and its variants named under FEAT_{LS64, LS64_V}.
These features are identified by ID_AA64ISAR1_EL1.LS64 and the
use of such instructions in userspace (EL0) can be trapped.
As st64bv (FEAT_LS64_V) and st64bv0 (FEAT_LS64_ACCDATA) can not be tell
apart, FEAT_LS64 and FEAT_LS64_ACCDATA which will be supported in later
patch will be exported to userspace, FEAT_LS64_V will be enabled only
in kernel.
In order to support the use of corresponding instructions in userspace:
- Make ID_AA64ISAR1_EL1.LS64 visbile to userspace
- Add identifying and enabling in the cpufeature list
- Expose these support of these features to userspace through HWCAP3
and cpuinfo
ld64b/st64b (FEAT_LS64) and st64bv (FEAT_LS64_V) is intended for
special memory (device memory) so requires support by the CPU, system
and target memory location (device that support these instructions).
The HWCAP3_LS64, implies the support of CPU and system (since no
identification method from system, so SoC vendors should advertise
support in the CPU if system also support them).
Otherwise for ld64b/st64b the atomicity may not be guaranteed or a
DABT will be generated, so users (probably userspace driver developer)
should make sure the target memory (device) also have the support.
For st64bv 0xffffffffffffffff will be returned as status result for
unsupported memory so user should check it.
Document the restrictions along with HWCAP3_LS64.
Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Will Deacon <will@kernel.org>
Yicong Yang [Mon, 19 Jan 2026 02:29:26 +0000 (10:29 +0800)]
KVM: arm64: Enable FEAT_{LS64, LS64_V} in the supported guest
Using FEAT_{LS64, LS64_V} instructions in a guest is also controlled
by HCRX_EL2.{EnALS, EnASR}. Enable it if guest has related feature.
Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Will Deacon <will@kernel.org>
Yicong Yang [Mon, 19 Jan 2026 02:29:25 +0000 (10:29 +0800)]
arm64: Provide basic EL2 setup for FEAT_{LS64, LS64_V} usage at EL0/1
Instructions introduced by FEAT_{LS64, LS64_V} is controlled by
HCRX_EL2.{EnALS, EnASR}. Configure all of these to allow usage
at EL0/1.
This doesn't mean these instructions are always available in
EL0/1 if provided. The hypervisor still have the control at
runtime.
Acked-by: Will Deacon <will@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Will Deacon <will@kernel.org>
Yicong Yang [Mon, 19 Jan 2026 02:29:24 +0000 (10:29 +0800)]
KVM: arm64: Handle DABT caused by LS64* instructions on unsupported memory
If FEAT_LS64WB not supported, FEAT_LS64* instructions only support
to access Device/Uncacheable memory, otherwise a data abort for
unsupported Exclusive or atomic access (0x35, UAoEF) is generated
per spec. It's implementation defined whether the target exception
level is routed and is possible to implemented as route to EL2 on a
VHE VM according to DDI0487L.b Section C3.2.6 Single-copy atomic
64-byte load/store.
If it's implemented as generate the DABT to the final enabled stage
(stage-2), inject the UAoEF back to the guest after checking the
memslot is valid.
Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Will Deacon <will@kernel.org>
Marc Zyngier [Mon, 19 Jan 2026 02:29:23 +0000 (10:29 +0800)]
KVM: arm64: Add documentation for KVM_EXIT_ARM_LDST64B
Add a bit of documentation for KVM_EXIT_ARM_LDST64B so that userspace
knows what to expect.
Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Will Deacon <will@kernel.org>
Marc Zyngier [Mon, 19 Jan 2026 02:29:22 +0000 (10:29 +0800)]
KVM: arm64: Add exit to userspace on {LD,ST}64B* outside of memslots
The main use of {LD,ST}64B* is to talk to a device, which is hopefully
directly assigned to the guest and requires no additional handling.
However, this does not preclude a VMM from exposing a virtual device
to the guest, and to allow 64 byte accesses as part of the programming
interface. A direct consequence of this is that we need to be able
to forward such access to userspace.
Given that such a contraption is very unlikely to ever exist, we choose
to offer a limited service: userspace gets (as part of a new exit reason)
the ESR, the IPA, and that's it. It is fully expected to handle the full
semantics of the instructions, deal with ACCDATA, the return values and
increment PC. Much fun.
A canonical implementation can also simply inject an abort and be done
with it. Frankly, don't try to do anything else unless you have time
to waste.
Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Will Deacon <will@kernel.org>
Marc Zyngier [Wed, 7 Jan 2026 18:06:59 +0000 (18:06 +0000)]
arm64: Unconditionally enable LSE support
LSE atomics have been in the architecture since ARMv8.1 (released in
2014), and are hopefully supported by all modern toolchains.
Drop the optional nature of LSE support in the kernel, and always
compile the support in, as this really is very little code. LL/SC
still is the default, and the switch to LSE is done dynamically.
Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
KVM: arm64: nv: Return correct RES0 bits for FGT registers
We had extended the sysreg masking infrastructure to more general
registers, instead of restricting it to VNCR-backed registers, since
commit a0162020095e ("KVM: arm64: Extend masking facility to arbitrary
registers"). Fix kvm_get_sysreg_res0() to reflect this fact.
Note that we're sure that we only deal with FGT registers in
kvm_get_sysreg_res0(), the
if (sr < __VNCR_START__)
is actually a never false, which should probably be removed later.
Fixes: 69c19e047dfe ("KVM: arm64: Add TCR2_EL2 to the sysreg arrays") Signed-off-by: Zenghui Yu (Huawei) <zenghui.yu@linux.dev> Link: https://patch.msgid.link/20260121101631.41037-1-zenghui.yu@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org
Marc Zyngier [Thu, 22 Jan 2026 08:51:53 +0000 (08:51 +0000)]
KVM: arm64: Always populate FGT masks at boot time
We currently only populate the FGT masks if the underlying HW does
support FEAT_FGT. However, with the addition of the RES1 support for
system registers, this results in a lot of noise at boot time, as
reported by Nathan.
That's because even if FGT isn't supported, we still check for the
attribution of the bits to particular features, and not keeping the
masks up-to-date leads to (fairly harmess) warnings.
Given that we want these checks to be enforced even if the HW doesn't
support FGT, enable the generation of FGT masks unconditionally (this
is rather cheap anyway). Only the storage of the FGT configuration is
avoided, which will save a tiny bit of memory on these machines.
Kornel Dulęba [Fri, 14 Nov 2025 11:11:53 +0000 (11:11 +0000)]
KVM: arm64: Fix error checking for FFA_VERSION
According to section 13.2 of the DEN0077 FF-A specification, when
firmware does not support the requested version, it should reply with
FFA_RET_NOT_SUPPORTED(-1). Table 13.6 specifies the type of the error
code as int32.
Currently, the error checking logic compares the unsigned long return
value it got from the SMC layer, against a "-1" literal. This fails due
to a type mismatch: the literal is extended to 64 bits, whereas the
register contains only 32 bits of ones(0x00000000ffffffff).
Consequently, hyp_ffa_init misinterprets the "-1" return value as an
invalid FF-A version. This prevents pKVM initialization on devices where
FF-A is not supported in firmware.
Fix this by explicitly casting res.a0 to s32.
Fuad Tabba [Thu, 11 Dec 2025 10:47:09 +0000 (10:47 +0000)]
KVM: arm64: Prevent host from managing timer offsets for protected VMs
For protected VMs, the guest's timer offset state should not be
controlled by the host and must always run with a virtual counter offset
of 0. The existing timer logic allowed the host to set and manage the
timer counter offsets for protected VMs in certain cases.
Disable all host-side management of timer offsets for protected VMs by
adding checks in the relevant code paths.
Fuad Tabba [Thu, 11 Dec 2025 10:47:08 +0000 (10:47 +0000)]
KVM: arm64: Check whether a VM IOCTL is allowed in pKVM
Certain VM IOCTLs are tied to specific VM features. Since pKVM does not
support all features, restrict which IOCTLs are allowed depending on
whether the associated feature is supported.
Use the existing VM capability check as the source of truth to whether
an IOCTL is allowed for a particular VM by mapping the IOCTLs with their
associated capabilities.
Fuad Tabba [Thu, 11 Dec 2025 10:47:07 +0000 (10:47 +0000)]
KVM: arm64: Track KVM IOCTLs and their associated KVM caps
Track KVM IOCTLs (VM IOCTLs for now), and the associated KVM capability
that enables that IOCTL. Add a function that performs the lookup.
This will be used by CoCo VM Hypervisors (e.g., pKVM) to determine
whether a particular KVM IOCTL is allowed for its VMs.
Suggested-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com>
[maz: don't expose KVM_CAP_BASIC to userspace, and rely on NR_VCPUS
as a proxy for this] Link: https://patch.msgid.link/20251211104710.151771-8-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
Fuad Tabba [Thu, 11 Dec 2025 10:47:06 +0000 (10:47 +0000)]
KVM: arm64: Do not allow KVM_CAP_ARM_MTE for any guest in pKVM
Supporting MTE in pKVM introduces significant complexity to the
hypervisor at EL2, even for non-protected VMs, since it would require
EL2 to handle tag management.
For now, do not allow KVM_CAP_ARM_MTE for any VM type in protected mode.
Fuad Tabba [Thu, 11 Dec 2025 10:47:05 +0000 (10:47 +0000)]
KVM: arm64: Include VM type when checking VM capabilities in pKVM
Certain features and capabilities are restricted in protected mode. Most
of these features are restricted only for protected VMs, but some
are restricted for ALL VMs in protected mode.
Extend the pKVM capability check to pass the VM (kvm), and use that when
determining supported features.
Fuad Tabba [Thu, 11 Dec 2025 10:47:03 +0000 (10:47 +0000)]
KVM: arm64: Fix MTE flag initialization for protected VMs
The function pkvm_init_features_from_host() initializes guest
features, propagating them from the host. The logic to propagate
KVM_ARCH_FLAG_MTE_ENABLED (Memory Tagging Extension)
has a couple of issues.
First, the check was in the common path, before the divergence for
protected and non-protected VMs. For non-protected VMs, this was
unnecessary, as 'kvm->arch.flags' is completely overwritten by
host_arch_flags immediately after, which already contains the MTE flag.
For protected VMs, this was setting the flag even if the feature is not
allowed.
Second, the check was reading 'host_kvm->arch.flags' instead of using
the local 'host_arch_flags', which is read once from the host flags.
Fix these by moving the MTE flag check inside the protected-VM-only
path, checking if the feature is allowed, and changing it to use the
correct host_arch_flags local variable. This ensures non-protected VMs
get the flag via the bulk copy, and protected VMs get it via an explicit
check.
Fixes: b7f345fbc32a ("KVM: arm64: Fix FEAT_MTE in pKVM") Reviewed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251211104710.151771-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
Fuad Tabba [Thu, 11 Dec 2025 10:47:02 +0000 (10:47 +0000)]
KVM: arm64: Fix Trace Buffer trap polarity for protected VMs
The E2TB bits in MDCR_EL2 control trapping of Trace Buffer system
register accesses. These accesses are trapped to EL2 when the bits are
clear.
The trap initialization logic for protected VMs in pvm_init_traps_mdcr()
had the polarity inverted. When a guest did not support the Trace Buffer
feature, the code was setting E2TB. This incorrectly disabled the trap,
potentially allowing a protected guest to access registers for a feature
it was not given.
Fix this by inverting the operation.
Fixes: f50758260bff ("KVM: arm64: Group setting traps for protected VMs by control register") Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251211104710.151771-3-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
Fuad Tabba [Thu, 11 Dec 2025 10:47:01 +0000 (10:47 +0000)]
KVM: arm64: Fix Trace Buffer trapping for protected VMs
For protected VMs in pKVM, the hypervisor should trap accesses to trace
buffer system registers if Trace Buffer isn't supported by the VM.
However, the current code only traps if Trace Buffer External Mode isn't
supported.
Fix this by checking for FEAT_TRBE (Trace Buffer) rather than
FEAT_TRBE_EXT.
Fixes: 9d5261269098 ("KVM: arm64: Trap external trace for protected VMs") Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251211104710.151771-2-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
Fuad Tabba [Fri, 9 Jan 2026 08:22:18 +0000 (08:22 +0000)]
KVM: selftests: Fix typos and stale comments in kvm_util
Fix minor documentation errors in `kvm_util.h` and `kvm_util.c`.
- Correct the argument description for `vcpu_args_set` in `kvm_util.h`,
which incorrectly listed `vm` instead of `vcpu`.
- Fix a typo in the comment for `kvm_selftest_arch_init` ("exeucting" ->
"executing").
- Correct the return value description for `vm_vaddr_unused_gap` in
`kvm_util.c` to match the implementation, which returns an address "at
or above" `vaddr_min`, not "at or below".
Fuad Tabba [Fri, 9 Jan 2026 08:22:17 +0000 (08:22 +0000)]
KVM: selftests: Move page_align() to shared header
To avoid code duplication, move page_align() to the shared `kvm_util.h`
header file. Rename it to vm_page_align(), to make it clear that the
alignment is done with respect to the guest's base page size.
Fuad Tabba [Fri, 9 Jan 2026 08:22:16 +0000 (08:22 +0000)]
KVM: riscv: selftests: Fix incorrect rounding in page_align()
The implementation of `page_align()` in `processor.c` calculates
alignment incorrectly for values that are already aligned. Specifically,
`(v + vm->page_size) & ~(vm->page_size - 1)` aligns to the *next* page
boundary even if `v` is already page-aligned, potentially wasting a page
of memory.
Fix the calculation to use standard alignment logic: `(v + vm->page_size
- 1) & ~(vm->page_size - 1)`.
Fixes: 3e06cdf10520 ("KVM: selftests: Add initial support for RISC-V 64-bit") Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260109082218.3236580-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
Fuad Tabba [Fri, 9 Jan 2026 08:22:15 +0000 (08:22 +0000)]
KVM: arm64: selftests: Fix incorrect rounding in page_align()
The implementation of `page_align()` in `processor.c` calculates
alignment incorrectly for values that are already aligned. Specifically,
`(v + vm->page_size) & ~(vm->page_size - 1)` aligns to the *next* page
boundary even if `v` is already page-aligned, potentially wasting a page
of memory.
Fix the calculation to use standard alignment logic: `(v + vm->page_size
- 1) & ~(vm->page_size - 1)`.
Fixes: 7a6629ef746d ("kvm: selftests: add virt mem support for aarch64") Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260109082218.3236580-3-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
KVM selftests map all guest code and data into the lower virtual address
range (0x0000...) managed by TTBR0_EL1. The upper range (0xFFFF...)
managed by TTBR1_EL1 is unused and uninitialized.
If a guest accesses the upper range, the MMU attempts a translation
table walk using uninitialized registers, leading to unpredictable
behavior.
Set `TCR_EL1.EPD1` to disable translation table walks for TTBR1_EL1,
ensuring that any access to the upper range generates an immediate
Translation Fault. Additionally, set `TCR_EL1.TBI1` (Top Byte Ignore) to
ensure that tagged pointers in the upper range also deterministically
trigger a Translation Fault via EPD1.
Define `TCR_EPD1_MASK`, `TCR_EPD1_SHIFT`, and `TCR_TBI1` in
`processor.h` to support this configuration. These are based on their
definitions in `arch/arm64/include/asm/pgtable-hwdef.h`.
Marc Zyngier [Thu, 8 Jan 2026 17:32:32 +0000 (17:32 +0000)]
KVM: arm64: pkvm: Report optional ID register traps with a 0x18 syndrome
With FEAT_IDST, unimplemented system registers in the feature ID space
must be reported using EC=0x18 at the closest handling EL, rather than
with an UNDEF.
Most of these system registers are always implemented thanks to their
dependency on FEAT_AA64, except for a set of (currently) three registers:
GMID_EL1 (depending on MTE2), CCSIDR2_EL1 (depending on FEAT_CCIDX),
and SMIDR_EL1 (depending on SME).
For these three registers, report their trap as EC=0x18 if they
end-up trapping into KVM and that FEAT_IDST is implemented in the guest.
Otherwise, just make them UNDEF.
Marc Zyngier [Thu, 8 Jan 2026 17:32:30 +0000 (17:32 +0000)]
KVM: arm64: Force trap of GMID_EL1 when the guest doesn't have MTE
If our host has MTE, but the guest doesn't, make sure we set HCR_EL2.TID5
to force GMID_EL1 being trapped. Such trap will be handled by the
FEAT_IDST handling.
Marc Zyngier [Thu, 8 Jan 2026 17:32:28 +0000 (17:32 +0000)]
KVM: arm64: Handle FEAT_IDST for sysregs without specific handlers
Add a bit of infrastrtcture to triage_sysreg_trap() to handle the
case of registers falling into the Feature ID space that do not
have a local handler.
For these, we can directly apply the FEAT_IDST semantics and inject
an EC=0x18 exception. Otherwise, an UNDEF will do.
Marc Zyngier [Thu, 8 Jan 2026 17:32:26 +0000 (17:32 +0000)]
KVM: arm64: Add trap routing for GMID_EL1
HCR_EL2.TID5 is currently ignored by the trap routing infrastructure.
Wire it in the routing table so that GMID_EL1, the sole register
trapped by this bit, is correctly handled in the NV case.
Marc Zyngier [Thu, 8 Jan 2026 17:32:25 +0000 (17:32 +0000)]
arm64: Repaint ID_AA64MMFR2_EL1.IDS description
ID_AA64MMFR2_EL1.IDS, as described in the sysreg file, is pretty horrible
as it diesctly give the ESR value. Repaint it using the usual NI/IMP
identifiers to describe the absence/presence of FEAT_IDST.
Also add the new EL3 routing feature, even if we really don't care about it.
Marc Zyngier [Wed, 10 Dec 2025 17:30:24 +0000 (17:30 +0000)]
KVM: arm64: Honor UX/PX attributes for EL2 S1 mappings
Now that we potentially have two bits to deal with when setting
execution permissions, make sure we correctly handle them when both
when building the page tables and when reading back from them.
Marc Zyngier [Wed, 10 Dec 2025 17:30:23 +0000 (17:30 +0000)]
KVM: arm64: Convert VTCR_EL2 to config-driven sanitisation
Describe all the VTCR_EL2 fields and their respective configurations,
making sure that we correctly ignore the bits that are not defined
for a given guest configuration.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251210173024.561160-6-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 10 Dec 2025 17:30:22 +0000 (17:30 +0000)]
KVM: arm64: Account for RES1 bits in DECLARE_FEAT_MAP() and co
None of the registers we manage in the feature dependency infrastructure
so far has any RES1 bit. This is about to change, as VTCR_EL2 has
its bit 31 being RES1.
In order to not fail the consistency checks by not describing a bit,
add RES1 bits to the set of immutable bits. This requires some extra
surgery for the FGT handling, as we now need to track RES1 bits there
as well.
There are no RES1 FGT bits *yet*. Watch this space.
Marc Zyngier [Wed, 10 Dec 2025 17:30:21 +0000 (17:30 +0000)]
arm64: Convert VTCR_EL2 to sysreg infratructure
Our definition of VTCR_EL2 is both partial (tons of fields are
missing) and totally inconsistent (some constants are shifted,
some are not). They are also expressed in terms of TCR, which is
rather inconvenient.
Replace the ad-hoc definitions with the the generated version.
This results in a bunch of additional changes to make the code
with the unshifted nature of generated enumerations.
The register data was extracted from the BSD licenced AARCHMRS
(AARCHMRS_OPENSOURCE_A_profile_FAT-2025-09_ASL0).
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251210173024.561160-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>