Marc Zyngier [Tue, 3 Dec 2024 19:02:36 +0000 (19:02 +0000)]
KVM: arm64: Do not allow ID_AA64MMFR0_EL1.ASIDbits to be overridden
Catalin reports that a hypervisor lying to a guest about the size
of the ASID field may result in unexpected issues:
- if the underlying HW only supports 8-bit ASIDs, the ASID
field in a TLBI VAE1* operation is only 8 bits, and the HW will
ignore the other 8 bits
- if, on the contrary, the HW is 16-bit capable, the ASID field
in the same TLBI operation is always 16 bits, irrespective of
the value of TCR_ELx.AS.
This could lead to missed invalidations if the guest was led to
assume that the HW had 8-bit ASIDs while they really are 16 bits wide.
In order to avoid any potential disaster that would be hard to debug,
prevent migration from a host with 8-bit ASIDs to one with
wider ASIDs (the converse was obviously always forbidden). This is
also consistent with what we already do for VMIDs.
If it becomes absolutely mandatory to support such a migration path
in the future, we will have to trap and emulate all TLBIs, something
that nobody should look forward to.
Fixes: d5a32b60dc18 ("KVM: arm64: Allow userspace to change ID_AA64MMFR{0-2}_EL1") Reported-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20241203190236.505759-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
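For illustration only (a hypothetical helper, not KVM's actual sysreg handling), the effect of making the field non-writable is equivalent to accepting only the host's own value:

  #include <stdbool.h>
  #include <stdint.h>

  /*
   * With ID_AA64MMFR0_EL1.ASIDBits no longer writable, the only value a
   * saved guest may carry is the host's own, which forbids restoring an
   * 8-bit-ASID snapshot on a 16-bit-ASID host (and the converse).
   */
  static bool asidbits_acceptable(uint8_t host_asidbits, uint8_t guest_asidbits)
  {
          return guest_asidbits == host_asidbits;
  }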
Marc Zyngier [Mon, 25 Nov 2024 09:47:56 +0000 (09:47 +0000)]
KVM: arm64: Fix S1/S2 combination when FWB==1 and S2 has Device memory type
The G.a revision of the ARM ARM had it pretty clear that HCR_EL2.FWB
had no influence on "The way that stage 1 memory types and attributes
are combined with stage 2 Device type and attributes." (D5.5.5).
However, this wording was lost in further revisions of the architecture.
Restore the intended behaviour, which is to take the strongest memory
type of S1 and S2 in this case, as if FWB was 0. The specification is
being fixed accordingly.
James Clark [Fri, 22 Nov 2024 16:46:35 +0000 (16:46 +0000)]
arm64: Fix usage of new shifted MDCR_EL2 values
Since the linked fixes commit, these masks are already shifted so remove
the shifts. One issue that this fixes is SPE and TRBE not being
available anymore:
arm_spe_pmu arm,spe-v1: profiling buffer owned by higher exception level
Fixes: 641630313e9c ("arm64: sysreg: Migrate MDCR_EL2 definition to table") Signed-off-by: James Clark <james.clark@linaro.org> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241122164636.2944180-1-james.clark@linaro.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Wed, 20 Nov 2024 00:52:30 +0000 (16:52 -0800)]
KVM: arm64: Use MDCR_EL2.HPME to evaluate overflow of hyp counters
The 'global enable control' (as it is termed in the architecture) for
counters reserved by EL2 is MDCR_EL2.HPME. Use that instead of
PMCR_EL0.E when evaluating the overflow state for hyp counters.
Change the return value to a bool while at it, which better reflects the
fact that the overflow state is a shared signal and not a per-counter
property.
KVM: arm64: Ignore PMCNTENSET_EL0 while checking for overflow status
DDI0487K.a D13.3.1 describes the PMU overflow condition, which evaluates
to true if the global enable (PMCR_EL0.E) is set and, for some counter n,
the overflow flag (PMOVSSET_EL0[n]) and interrupt enable (PMINTENSET_EL1[n])
are both 1.
Of note, this does not require a counter to be enabled
(i.e. PMCNTENSET_EL0[n] = 1) to generate an overflow.
Align kvm_pmu_overflow_status() with the reality of the architecture
and stop using PMCNTENSET_EL0 as part of the overflow condition. The
bug was discovered while running an SBSA PMU test [*], which only sets
PMCR.E, PMOVSSET<0>, PMINTENSET<0>, and expects an overflow interrupt.
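As a rough, self-contained C sketch of the condition above (illustrative only, not KVM's actual kvm_pmu_overflow_status() implementation):

  #include <stdbool.h>
  #include <stdint.h>

  #define PMCR_E  (1u << 0)   /* PMCR_EL0.E: global counter enable */

  /*
   * DDI0487K.a D13.3.1: a counter contributes to the overflow signal if
   * its overflow flag and its interrupt enable are both set, gated by
   * the global enable. PMCNTENSET_EL0 is not consulted at all.
   */
  static bool pmu_overflow_status(uint32_t pmcr, uint64_t pmovsset,
                                  uint64_t pmintenset)
  {
          if (!(pmcr & PMCR_E))
                  return false;

          return (pmovsset & pmintenset) != 0;
  }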
Marc Zyngier [Sun, 17 Nov 2024 16:57:57 +0000 (16:57 +0000)]
KVM: arm64: vgic-its: Add stronger type-checking to the ITS entry sizes
The ITS ABI infrastructure allows for some pretty lax code, where
the size of the data doesn't have to match the size of the entry,
potentially leading to a collection of interesting bugs.
Commit 7fe28d7e68f9 ("KVM: arm64: vgic-its: Add a data length check
in vgic_its_save_*") added some checks, but starts by implicitly
casting all writes to a 64bit value, hiding some of the issues.
Instead, introduce macros that will check the data type actually used
for dealing with the table entries. The macros take a symbolic
entry type that is used to fetch the size of the entry type for the
current ABI. This immediately catches a couple of low-impact gotchas
(zero values that are implicitly 32bit), easy enough to fix.
Given that we currently only have a single ABI, hardcode a couple of
BUILD_BUG_ON()s that will fire if we use anything but a 64bit quantity,
and some (currently unreachable) fallback code that may become useful
one day.
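A minimal sketch of the idea (macro and constant names here are illustrative, not the kernel's): bind the C type used for an entry to the ABI's entry size at build time, so a mismatch fails to compile rather than being silently widened to 64 bits.

  #include <stdint.h>

  /* Illustrative: the single existing ABI uses 64-bit table entries. */
  #define ABI_ITE_ESZ  sizeof(uint64_t)

  /* Fail the build if the entry variable does not match the ABI size. */
  #define CHECK_ENTRY_SIZE(esz, val)                                     \
          _Static_assert(sizeof(val) == (esz),                           \
                         "entry data does not match ABI entry size")

  static uint64_t ite_val;
  CHECK_ENTRY_SIZE(ABI_ITE_ESZ, ite_val);        /* compiles */
  /* uint32_t bad; CHECK_ENTRY_SIZE(ABI_ITE_ESZ, bad); -> build error */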
Marc Zyngier [Sun, 17 Nov 2024 16:57:55 +0000 (16:57 +0000)]
KVM: arm64: vgic: Make vgic_get_irq() more robust
vgic_get_irq() has an awkward signature, as it takes both a kvm
*and* a vcpu, where the vcpu is allowed to be NULL if the INTID
being looked up is a global interrupt (SPI or LPI).
This leads to potentially problematic situations where the INTID
passed is a private interrupt, but there is no vcpu.
In order to make things less ambiguous, let's have *two* helpers
instead:
- vgic_get_irq(struct kvm *kvm, u32 intid), which is only concerned
with *global* interrupts, as indicated by the lack of vcpu.
- vgic_get_vcpu_irq(struct kvm_vcpu *vcpu, u32 intid), which can
return *any* interrupt class, but must of course be given a non-NULL
vcpu.
Most of the code nicely falls under one or the other situation,
except for a couple of cases (close to the UABI or in the debug code)
where we have to distinguish between the two.
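In rough prototype form, the resulting split looks like this (a sketch; the return type follows the vgic code, the internals are elided):

  /* Global interrupts (SPIs, LPIs) only -- no vcpu involved. */
  struct vgic_irq *vgic_get_irq(struct kvm *kvm, u32 intid);

  /* Any interrupt class, including private ones, hence the vcpu. */
  struct vgic_irq *vgic_get_vcpu_irq(struct kvm_vcpu *vcpu, u32 intid);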
James Clark [Tue, 12 Nov 2024 10:56:03 +0000 (10:56 +0000)]
KVM: arm64: Pass on SVE mapping failures
This function can fail but its return value isn't passed onto the
caller. Presumably this could result in a broken state.
Fixes: 66d5b53e20a6 ("KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVM") Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Fuad Tabba <tabba@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241112105604.795809-1-james.clark@linaro.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Mon, 11 Nov 2024 20:09:09 +0000 (20:09 +0000)]
Merge branch kvm-arm64/vgic-its-fixes into kvmarm/next
* kvm-arm64/vgic-its-fixes:
: Fixes for vgic-its save/restore, courtesy of Kunkun Jiang and Jing Zhang
:
: Address bugs where restoring an ITS consumes a stale DTE/ITE, which
: may lead to either garbage mappings in the ITS or the overall restore
: ioctl failing. The fix in both cases is to zero a DTE/ITE when its
: translation has been invalidated by the guest.
KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE
KVM: arm64: vgic-its: Clear DTE when MAPD unmaps a device
KVM: arm64: vgic-its: Add a data length check in vgic_its_save_*
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Kunkun Jiang [Thu, 7 Nov 2024 21:41:37 +0000 (13:41 -0800)]
KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE
When DISCARD frees an ITE, it does not invalidate the
corresponding ITE. In the scenario of continuous saves and
restores, there may be a situation where an ITE is not saved
but is restored. This is unreasonable and may cause restore
to fail. This patch clears the corresponding ITE when DISCARD
frees an ITE.
Cc: stable@vger.kernel.org Fixes: eff484e0298d ("KVM: arm64: vgic-its: ITT save and restore") Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
[Jing: Update with entry write helper] Signed-off-by: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20241107214137.428439-6-jingzhangos@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Kunkun Jiang [Thu, 7 Nov 2024 21:41:36 +0000 (13:41 -0800)]
KVM: arm64: vgic-its: Clear DTE when MAPD unmaps a device
vgic_its_save_device_tables will traverse its->device_list to
save DTE for each device. vgic_its_restore_device_tables will
traverse each entry of device table and check if it is valid.
Restore if valid.
But when MAPD unmaps a device, it does not invalidate the
corresponding DTE. In the scenario of continuous saves
and restores, there may be a situation where a device's DTE
is not saved but is restored. This is unreasonable and may
cause restore to fail. This patch clears the corresponding
DTE when MAPD unmaps a device.
Cc: stable@vger.kernel.org Fixes: 57a9a117154c ("KVM: arm64: vgic-its: Device table save/restore") Co-developed-by: Shusen Li <lishusen2@huawei.com> Signed-off-by: Shusen Li <lishusen2@huawei.com> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
[Jing: Update with entry write helper] Signed-off-by: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20241107214137.428439-5-jingzhangos@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Jing Zhang [Thu, 7 Nov 2024 21:41:34 +0000 (13:41 -0800)]
KVM: arm64: vgic-its: Add a data length check in vgic_its_save_*
None of the vgic_its_save_*() functions check whether
the data length is 8 bytes before calling vgic_write_guest_lock().
This patch adds the check. To prevent the kernel from being blown up
when the fault occurs, KVM_BUG_ON() is used, and the other BUG_ON()s
are replaced along the way.
Cc: stable@vger.kernel.org Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
[Jing: Update with the new entry read/write helpers] Signed-off-by: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20241107214137.428439-4-jingzhangos@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Mon, 11 Nov 2024 18:48:49 +0000 (18:48 +0000)]
Merge branch kvm-arm64/nv-pmu into kvmarm/next
* kvm-arm64/nv-pmu:
: Support for vEL2 PMU controls
:
: Align the vEL2 PMU support with the current state of non-nested KVM,
: including:
:
: - Trap routing, with the annoying complication of EL2 traps that apply
: in Host EL0
:
: - PMU emulation, using the correct configuration bits depending on
: whether a counter falls in the hypervisor or guest range of PMCs
:
: - Perf event swizzling across nested boundaries, as the event filtering
: needs to be remapped to cope with vEL2
KVM: arm64: nv: Reprogram PMU events affected by nested transition
KVM: arm64: nv: Apply EL2 event filtering when in hyp context
KVM: arm64: nv: Honor MDCR_EL2.HLP
KVM: arm64: nv: Honor MDCR_EL2.HPME
KVM: arm64: Add helpers to determine if PMC counts at a given EL
KVM: arm64: nv: Adjust range of accessible PMCs according to HPMN
KVM: arm64: Rename kvm_pmu_valid_counter_mask()
KVM: arm64: nv: Advertise support for FEAT_HPMN0
KVM: arm64: nv: Describe trap behaviour of MDCR_EL2.HPMN
KVM: arm64: nv: Honor MDCR_EL2.{TPM, TPMCR} in Host EL0
KVM: arm64: nv: Reinject traps that take effect in Host EL0
KVM: arm64: nv: Rename BEHAVE_FORWARD_ANY
KVM: arm64: nv: Allow coarse-grained trap combos to use complex traps
KVM: arm64: Describe RES0/RES1 bits of MDCR_EL2
arm64: sysreg: Add new definitions for ID_AA64DFR0_EL1
arm64: sysreg: Migrate MDCR_EL2 definition to table
arm64: sysreg: Describe ID_AA64DFR2_EL1 fields
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Mon, 11 Nov 2024 18:48:12 +0000 (18:48 +0000)]
Merge branch kvm-arm64/mmio-sea into kvmarm/next
* kvm-arm64/mmio-sea:
: Fix for SEA injection in response to MMIO
:
: Fix + test coverage for SEA injection in response to an unhandled MMIO
: exit to userspace. Naturally, if userspace decides to abort an MMIO
: instruction KVM shouldn't continue with instruction emulation...
KVM: arm64: selftests: Add tests for MMIO external abort injection
KVM: arm64: selftests: Convert to kernel's ESR terminology
tools: arm64: Grab a copy of esr.h from kernel
KVM: arm64: Don't retire aborted MMIO instruction
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Mon, 11 Nov 2024 18:47:50 +0000 (18:47 +0000)]
Merge branch kvm-arm64/misc into kvmarm/next
* kvm-arm64/misc:
: Miscellaneous updates
:
: - Drop useless check against vgic state in ICC_CLTR_EL1.SEIS read
: emulation
:
: - Fix trap configuration for pKVM
:
: - Close the door on initialization bugs surrounding userspace irqchip
: static key by removing it.
KVM: selftests: Don't bother deleting memslots in KVM when freeing VMs
KVM: arm64: Get rid of userspace_irqchip_in_use
KVM: arm64: Initialize trap register values in hyp in pKVM
KVM: arm64: Initialize the hypervisor's VM state at EL2
KVM: arm64: Refactor kvm_vcpu_enable_ptrauth() for hyp use
KVM: arm64: Move pkvm_vcpu_init_traps() to init_pkvm_hyp_vcpu()
KVM: arm64: Don't map 'kvm_vgic_global_state' at EL2 with pKVM
KVM: arm64: Just advertise SEIS as 0 when emulating ICC_CTLR_EL1
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
KVM: selftests: Don't bother deleting memslots in KVM when freeing VMs
When freeing a VM, don't call into KVM to manually remove each memslot,
simply cleanup and free any userspace assets associated with the memory
region. KVM is ultimately responsible for ensuring kernel resources are
freed when the VM is destroyed, deleting memslots one-by-one is
unnecessarily slow, and unless a test is already leaking the VM fd, the
VM will be destroyed when kvm_vm_release() is called.
Not deleting KVM's memslots also allows cleaning up VMs without having
to care whether the to-be-freed VM is dead or alive.
Reported-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Reported-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/kvmarm/Zy0bcM0m-N18gAZz@google.com/ Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Mon, 11 Nov 2024 18:38:30 +0000 (18:38 +0000)]
Merge branch kvm-arm64/mpam-ni into kvmarm/next
* kvm-arm64/mpam-ni:
: Hiding FEAT_MPAM from KVM guests, courtesy of James Morse + Joey Gouly
:
: Fix a longstanding bug where FEAT_MPAM was accidentally exposed to KVM
: guests + the EL2 trap configuration was not explicitly configured. As
: part of this, bring in skeletal support for initialising the MPAM CPU
: context so KVM can actually set traps for its guests.
:
: Be warned -- if this series leads to boot failures on your system,
: you're running on turd firmware.
:
: As an added bonus (that builds upon the infrastructure added by the MPAM
: series), allow userspace to configure CTR_EL0.L1Ip, courtesy of Shameer
: Kolothum.
KVM: arm64: Make L1Ip feature in CTR_EL0 writable from userspace
KVM: arm64: selftests: Test ID_AA64PFR0.MPAM isn't completely ignored
KVM: arm64: Disable MPAM visibility by default and ignore VMM writes
KVM: arm64: Add a macro for creating filtered sys_reg_descs entries
KVM: arm64: Fix missing traps of guest accesses to the MPAM registers
arm64: cpufeature: discover CPU support for MPAM
arm64: head.S: Initialise MPAM EL2 registers and disable traps
arm64/sysreg: Convert existing MPAM sysregs and add the remaining entries
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Mon, 11 Nov 2024 18:36:46 +0000 (18:36 +0000)]
Merge branch kvm-arm64/psci-1.3 into kvmarm/next
* kvm-arm64/psci-1.3:
: PSCI v1.3 support, courtesy of David Woodhouse
:
: Bump KVM's PSCI implementation up to v1.3, with the added bonus of
: implementing the SYSTEM_OFF2 call. Like other system-scoped PSCI calls,
: this gets relayed to userspace for further processing with a new
: KVM_SYSTEM_EVENT_SHUTDOWN flag.
:
: As an added bonus, implement client-side support for hibernation with
: the SYSTEM_OFF2 call.
arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate
KVM: arm64: nvhe: Pass through PSCI v1.3 SYSTEM_OFF2 call
KVM: selftests: Add test for PSCI SYSTEM_OFF2
KVM: arm64: Add support for PSCI v1.2 and v1.3
KVM: arm64: Add PSCI v1.3 SYSTEM_OFF2 function for hibernation
firmware/psci: Add definitions for PSCI v1.3 specification
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Mon, 11 Nov 2024 18:36:12 +0000 (18:36 +0000)]
Merge branch kvm-arm64/nv-s1pie-s1poe into kvmarm/next
* kvm-arm64/nv-s1pie-s1poe: (36 commits)
: NV support for S1PIE/S1POE, courtesy of Marc Zyngier
:
: Complete support for S1PIE/S1POE at vEL2, including:
:
: - Save/restore of the vEL2 sysreg context
:
: - Use the S1PIE/S1POE context for fast-path AT emulation
:
: - Enlightening the software walker to the behavior of S1PIE/S1POE
:
: - Like any other good NV series, some trap routing descriptions
KVM: arm64: Handle WXN attribute
KVM: arm64: Handle stage-1 permission overlays
KVM: arm64: Make PAN conditions part of the S1 walk context
KVM: arm64: Disable hierarchical permissions when POE is enabled
KVM: arm64: Add POE save/restore for AT emulation fast-path
KVM: arm64: Add save/restore support for POR_EL2
KVM: arm64: Add basic support for POR_EL2
KVM: arm64: Add kvm_has_s1poe() helper
KVM: arm64: Subject S1PIE/S1POE registers to HCR_EL2.{TVM,TRVM}
KVM: arm64: Drop bogus CPTR_EL2.E0POE trap routing
arm64: Add encoding for POR_EL2
KVM: arm64: Rely on visibility to let PIR*_ELx/TCR2_ELx UNDEF
KVM: arm64: Hide S1PIE registers from userspace when disabled for guests
KVM: arm64: Hide TCR2_EL1 from userspace when disabled for guests
KVM: arm64: Define helper for EL2 registers with custom visibility
KVM: arm64: Add a composite EL2 visibility helper
KVM: arm64: Implement AT S1PIE support
KVM: arm64: Disable hierarchical permissions when S1PIE is enabled
KVM: arm64: Split S1 permission evaluation into direct and hierarchical parts
KVM: arm64: Add AT fast-path support for S1PIE
...
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Shameer Kolothum [Tue, 22 Oct 2024 07:39:43 +0000 (08:39 +0100)]
KVM: arm64: Make L1Ip feature in CTR_EL0 writable from userspace
Only allow userspace to set VIPT (0b10) or PIPT (0b11) for L1Ip based on
what hardware reports, as both AIVIVT (0b01) and VPIPT (0b00) are
documented as reserved.
Using VIPT for the guest where hardware reports PIPT may lead to
over-invalidation, but is still correct. Hence, we can allow
downgrading PIPT to VIPT, but not the other way around.
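A loose sketch of the resulting acceptance rule (illustrative values and helper name, not the actual sysreg plumbing):

  #include <stdbool.h>
  #include <stdint.h>

  /* CTR_EL0.L1Ip encodings: AIVIVT (0b01) and VPIPT (0b00) are reserved. */
  #define L1IP_VIPT  0x2
  #define L1IP_PIPT  0x3

  /*
   * Only VIPT or PIPT may be set, and never anything "stronger" than
   * the hardware: PIPT hardware may be downgraded to VIPT (at worst
   * over-invalidation), but VIPT hardware must not claim PIPT.
   */
  static bool l1ip_value_acceptable(uint8_t hw_l1ip, uint8_t user_l1ip)
  {
          if (user_l1ip != L1IP_VIPT && user_l1ip != L1IP_PIPT)
                  return false;

          return user_l1ip <= hw_l1ip;
  }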
The following sequence led to the scenario:
- Userspace creates a VM and a vCPU.
- The vCPU is initialized with KVM_ARM_VCPU_PMU_V3 during
KVM_ARM_VCPU_INIT.
- Without any other setup, such as vGIC or vPMU, userspace issues
KVM_RUN on the vCPU. Since the vPMU is requested, but not setup,
kvm_arm_pmu_v3_enable() fails in kvm_arch_vcpu_run_pid_change().
As a result, KVM_RUN returns after enabling the timer, but before
incrementing 'userspace_irqchip_in_use':
  kvm_arch_vcpu_run_pid_change()
      ret = kvm_arm_pmu_v3_enable()
          if (!vcpu->arch.pmu.created)
              return -EINVAL;
      if (ret)
          return ret;
      [...]
      if (!irqchip_in_kernel(kvm))
          static_branch_inc(&userspace_irqchip_in_use);
- Userspace ignores the error and issues KVM_ARM_VCPU_INIT again.
Since the timer is already enabled, control moves through the
following flow, ultimately hitting the WARN_ON():
  kvm_timer_vcpu_reset()
      if (timer->enabled)
          kvm_timer_update_irq()
              if (!userspace_irqchip())
                  ret = kvm_vgic_inject_irq()
                      ret = vgic_lazy_init()
                          if (unlikely(!vgic_initialized(kvm)))
                              if (kvm->arch.vgic.vgic_model !=
                                  KVM_DEV_TYPE_ARM_VGIC_V2)
                                  return -EBUSY;
                  WARN_ON(ret);
Theoretically, since userspace_irqchip_in_use's functionality can be
simply replaced by '!irqchip_in_kernel()', get rid of the static key
to avoid the mismanagement, which also helps with the syzbot issue.
Cc: <stable@vger.kernel.org> Reported-by: syzbot <syzkaller@googlegroups.com> Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
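A before/after sketch of the substitution (the call site below is a placeholder, not a specific function in KVM):

  /* Before: a static key that had to be kept in sync by hand. */
  if (static_branch_unlikely(&userspace_irqchip_in_use))
          handle_userspace_irqchip(vcpu);          /* placeholder */

  /* After: derive the same answer from the VM itself. */
  if (!irqchip_in_kernel(vcpu->kvm))
          handle_userspace_irqchip(vcpu);          /* placeholder */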
Oliver Upton [Fri, 25 Oct 2024 18:25:59 +0000 (18:25 +0000)]
KVM: arm64: nv: Reprogram PMU events affected by nested transition
Start reprogramming PMU events at nested boundaries now that everything
is in place to handle the EL2 event filter. Only repaint events where
the filter differs between EL1 and EL2 as a slight optimization.
Oliver Upton [Fri, 25 Oct 2024 18:23:52 +0000 (18:23 +0000)]
KVM: arm64: nv: Apply EL2 event filtering when in hyp context
It hopefully comes as no surprise when I say that vEL2 actually runs at
EL1. So, the guest hypervisor's EL2 event filter (NSH) needs to actually
be applied to EL1 in the perf event. In addition to this, the disable
bit for the guest counter range (HPMD) needs to have the effect of
stopping the affected counters.
Do exactly that by stuffing ::exclude_kernel with the combined effect of
these controls. This isn't quite enough yet, as the backing perf events
need to be reprogrammed upon nested ERET/exception entry to remap the
effective filter onto ::exclude_kernel.
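A minimal sketch of that combination (a hypothetical helper using booleans for MDCR_EL2.HPMD and the PMEVTYPER.NSH filter; not KVM's actual attribute computation):

  #include <stdbool.h>

  /*
   * Does this counter count while the vCPU is in (v)EL2, which really
   * runs at EL1? Counters in the guest range are silenced by
   * MDCR_EL2.HPMD; otherwise the PMEVTYPER.NSH filter decides. The
   * result is folded into the perf event's exclude_kernel attribute
   * as exclude_kernel = !counts_at_vel2(...).
   */
  static bool counts_at_vel2(bool in_guest_range, bool hpmd_set, bool nsh_set)
  {
          if (in_guest_range && hpmd_set)
                  return false;

          return nsh_set;
  }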
Oliver Upton [Fri, 25 Oct 2024 18:23:49 +0000 (18:23 +0000)]
KVM: arm64: Add helpers to determine if PMC counts at a given EL
Checking the exception level filters for a PMC is a minor annoyance to
open code. Add helpers to check if an event counts at EL0 and EL1, which
will prove useful in a subsequent change.
Oliver Upton [Fri, 25 Oct 2024 18:23:48 +0000 (18:23 +0000)]
KVM: arm64: nv: Adjust range of accessible PMCs according to HPMN
The value of MDCR_EL2.HPMN controls the number of event counters made
visible to EL0 and EL1. This means it is possible for the guest
hypervisor to allow direct access to event counters to the L2.
Rework KVM's PMU register emulation to take the effects of HPMN into
account when handling a trap. For bitmask-style registers, writes only
affect accessible registers.
Oliver Upton [Fri, 25 Oct 2024 18:23:47 +0000 (18:23 +0000)]
KVM: arm64: Rename kvm_pmu_valid_counter_mask()
Nested PMU support requires dynamically changing the visible range of
PMU counters based on the exception level and value of MDCR_EL2.HPMN. At
the same time, the PMU emulation code needs to know the absolute number
of implemented counters, regardless of context.
Rename the existing helper to make it obvious that it returns the number
of implemented counters and not anything else.
Oliver Upton [Fri, 25 Oct 2024 18:23:46 +0000 (18:23 +0000)]
KVM: arm64: nv: Advertise support for FEAT_HPMN0
Everything is in place now for KVM to actually handle MDCR_EL2.HPMN. Not
only that, the emulation is capable of doing FEAT_HPMN0. Advertise
support for the feature in the VM's ID registers. It is possible to
emulate FEAT_HPMN0 on hardware that doesn't support it since KVM
currently traps all PMU registers. Having said that, let's only
advertise the feature on supporting hardware in case KVM ever provides
'direct' PMU support to VMs w/o involving host perf.
Oliver Upton [Fri, 25 Oct 2024 18:23:45 +0000 (18:23 +0000)]
KVM: arm64: nv: Describe trap behaviour of MDCR_EL2.HPMN
MDCR_EL2.HPMN splits the PMU event counters into two ranges: the first
range is accessible from all ELs, and the second range is accessible
only to EL2/3. Supposing the guest hypervisor allows direct access to
the PMU counters from the L2, KVM needs to locally handle those
accesses.
Add a new complex trap configuration for HPMN that checks if the counter
index is accessible to the current context. As written, the architecture
suggests HPMN only causes PMEVCNTR<n>_EL0 to trap, though intuition (and
the pseudocode) suggest that the trap applies to PMEVTYPER<n>_EL0 as
well.
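A minimal sketch of the resulting accessibility check for the event counters (hypothetical helper; the cycle counter is not affected by HPMN and is left out):

  #include <stdbool.h>
  #include <stdint.h>

  /*
   * MDCR_EL2.HPMN splits the event counters: indices below HPMN are
   * visible to EL0/EL1, the rest only to EL2/3. An access from the L2
   * (i.e. not in hyp context) to a counter at or above HPMN belongs to
   * the L1 hypervisor and must be handled as a trap.
   */
  static bool pmc_accessible(uint8_t idx, uint8_t hpmn, bool in_hyp_context)
  {
          return in_hyp_context || idx < hpmn;
  }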
Oliver Upton [Fri, 25 Oct 2024 18:23:43 +0000 (18:23 +0000)]
KVM: arm64: nv: Reinject traps that take effect in Host EL0
Wire up the other end of traps that affect host EL0 by actually
injecting them into the guest hypervisor. Skip over FGT entirely, as a
cursory glance suggests no FGT is effective in host EL0.
Note that kvm_inject_nested() is already equipped for handling
exceptions while the VM is already in a host context.
Oliver Upton [Fri, 25 Oct 2024 18:23:41 +0000 (18:23 +0000)]
KVM: arm64: nv: Allow coarse-grained trap combos to use complex traps
KVM uses a sanity check to avoid infinite recursion in trap combinations
that could potentially depend on themselves. Narrow the scope of this sanity
check to the exact CGT IDs that correspond w/ trap combos, opening the
door to using 'complex' traps as part of a combination.
Fuad Tabba [Fri, 18 Oct 2024 07:48:32 +0000 (08:48 +0100)]
KVM: arm64: Initialize trap register values in hyp in pKVM
Handle the initialization of trap registers at the hypervisor in
pKVM, even for non-protected guests. The host is not trusted with
the values of the trap registers, regardless of the VM type.
Therefore, when switching between the host and the guests, only
flush the HCR_EL2 TWI and TWE bits. The host is allowed to
configure these for opportunistic scheduling, as neither affects
the protection of VMs or the hypervisor.
Reported-by: Will Deacon <will@kernel.org> Fixes: 814ad8f96e92 ("KVM: arm64: Drop trapping of PAuth instructions/keys") Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241018074833.2563674-5-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Fuad Tabba [Fri, 18 Oct 2024 07:48:31 +0000 (08:48 +0100)]
KVM: arm64: Initialize the hypervisor's VM state at EL2
Do not trust the state of the VM as provided by the host when
initializing the hypervisor's view of the VM state. Initialize it
instead at EL2 to a known good and safe state, as pKVM already
does with hypervisor VCPU states.
Fuad Tabba [Fri, 18 Oct 2024 07:48:29 +0000 (08:48 +0100)]
KVM: arm64: Move pkvm_vcpu_init_traps() to init_pkvm_hyp_vcpu()
Move pkvm_vcpu_init_traps() to the initialization of the
hypervisor's vcpu state in init_pkvm_hyp_vcpu(), and remove the
associated hypercall.
In protected mode, traps need to be initialized whenever a VCPU
is initialized anyway, and not only for protected VMs. This also
saves an unnecessary hypercall.
James Morse [Wed, 30 Oct 2024 16:03:17 +0000 (16:03 +0000)]
KVM: arm64: selftests: Test ID_AA64PFR0.MPAM isn't completely ignored
The ID_AA64PFR0.MPAM bit was previously accidentally exposed to guests,
and is ignored by KVM. KVM will always present the guest with 0 here,
and trap the MPAM system registers to inject an undef.
But, this value is still needed to prevent migration when the value
is incompatible with the target hardware. Add a kvm unit test to try
and write multiple values to ID_AA64PFR0.MPAM. Only the hardware value
previously exposed should be ignored, all other values should be
rejected.
Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241030160317.2528209-8-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
James Morse [Wed, 30 Oct 2024 16:03:16 +0000 (16:03 +0000)]
KVM: arm64: Disable MPAM visibility by default and ignore VMM writes
commit 011e5f5bf529f ("arm64/cpufeature: Add remaining feature bits in
ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to guests,
but didn't add trap handling. A previous patch supplied the missing trap
handling.
Existing VMs that have the MPAM field of ID_AA64PFR0_EL1 set need to
be migratable, but there is little point enabling the MPAM CPU
interface on new VMs until there is something a guest can do with it.
Clear the MPAM field from the guest's ID_AA64PFR0_EL1 and, on hardware
that supports MPAM, politely ignore the VMM's attempts to set this bit.
Guests exposed to this bug have the sanitised value of the MPAM field,
so only the correct value needs to be ignored. This means the field
can continue to be used to block migration to incompatible hardware
(between MPAM=1 and MPAM=5), and the VMM can't rely on the field
being ignored.
Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241030160317.2528209-7-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
James Morse [Wed, 30 Oct 2024 16:03:15 +0000 (16:03 +0000)]
KVM: arm64: Add a macro for creating filtered sys_reg_descs entries
The sys_reg_descs array holds function pointers and reset values for
managing the user-space and guest view of system registers. These
are mostly created by a set of macros as only some combinations
of behaviour are needed.
If a register needs special treatment, its sys_reg_descs entry is
open-coded. This is true of some id registers where the value provided
by user-space is validated by some helpers.
Before adding another one of these, add a helper that covers the
existing special cases. 'ID_FILTERED' expects helpers to set the
user-space value, and retrieve the modified reset value.
Like ID_WRITABLE() this uses id_visibility(), which should have no
functional change for the registers converted to use ID_FILTERED().
read_sanitised_id_aa64dfr0_el1() and read_sanitised_id_aa64pfr0_el1()
have been refactored to be called from kvm_read_sanitised_id_reg(), to
try to be consistent with ID_WRITABLE().
Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241030160317.2528209-6-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
James Morse [Wed, 30 Oct 2024 16:03:14 +0000 (16:03 +0000)]
KVM: arm64: Fix missing traps of guest accesses to the MPAM registers
commit 011e5f5bf529f ("arm64/cpufeature: Add remaining feature bits in
ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to guests,
but didn't add trap handling.
If you are unlucky, this results in an MPAM aware guest being delivered
an undef during boot. The host prints:
| kvm [97]: Unsupported guest sys_reg access at: ffff800080024c64 [00000005]
| { Op0( 3), Op1( 0), CRn(10), CRm( 5), Op2( 0), func_read },
Add support to enable the traps, and handle the three guest-accessible
registers by injecting an UNDEF. This stops KVM from spamming the host
log, but doesn't yet hide the feature from the id registers.
With MPAM v1.0 we can trap the MPAMIDR_EL1 register only if
ARM64_HAS_MPAM_HCR; with v1.1 an additional MPAM2_EL2.TIDR bit traps
MPAMIDR_EL1 on platforms that don't have MPAMHCR_EL2. Enable one of
these if either is supported. If neither is supported, the guest can
discover that the CPU has MPAM support, and how many PARTIDs etc. the
host has ... but it can't influence anything, so it's harmless.
James Morse [Wed, 30 Oct 2024 16:03:13 +0000 (16:03 +0000)]
arm64: cpufeature: discover CPU support for MPAM
ARMv8.4 adds support for 'Memory Partitioning And Monitoring' (MPAM)
which describes an interface to cache and bandwidth controls wherever
they appear in the system.
Add support to detect MPAM. Like SVE, MPAM has an extra id register that
describes some more properties, including the virtualisation support,
which is optional. Detect this separately so we can detect
mismatched/insane systems, but still use MPAM on the host even if the
virtualisation support is missing.
MPAM needs enabling at the highest implemented exception level, otherwise
the register accesses trap. The 'enabled' flag is accessible to lower
exception levels, but it's in a register that traps when MPAM isn't enabled.
The cpufeature 'matches' hook is extended to test this on one of the
CPUs, so that firmware can emulate MPAM as disabled if it is reserved
for use by secure world.
Secondary CPUs that appear late could trip cpufeature's 'lower safe'
behaviour after the MPAM properties have been advertised to user-space.
Add a verify call to ensure late secondaries match the existing CPUs.
(If you have a boot failure that bisects here, it's likely your CPUs
advertise MPAM in the id registers, but firmware failed to either enable
MPAM or emulate the trap as if it were disabled.)
Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241030160317.2528209-4-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
James Morse [Wed, 30 Oct 2024 16:03:12 +0000 (16:03 +0000)]
arm64: head.S: Initialise MPAM EL2 registers and disable traps
Add code to head.S's el2_setup to detect MPAM and disable any EL2 traps.
This register resets to an unknown value; setting it to the default
partitions/PMG before we enable the MMU is the best thing to do.
Kexec/kdump will depend on this if the previous kernel left the CPU
configured with a restrictive configuration.
If Linux is booted at the highest implemented exception level, el2_setup
will clear the enable bit, disabling MPAM.
This code can't be enabled until a subsequent patch adds the Kconfig
and cpufeature boilerplate.
Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241030160317.2528209-3-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
David Woodhouse [Sat, 19 Oct 2024 17:15:47 +0000 (18:15 +0100)]
arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate
The PSCI v1.3 specification adds support for a SYSTEM_OFF2 function
which is analogous to ACPI S4 state. This will allow hosting
environments to determine that a guest is hibernated rather than just
powered off, and handle that state appropriately on subsequent launches.
Since commit 60c0d45a7f7a ("efi/arm64: use UEFI for system reset and
poweroff") the EFI shutdown method is deliberately preferred over PSCI
or other methods. So register a SYS_OFF_MODE_POWER_OFF handler which
*only* handles the hibernation, leaving the original PSCI SYSTEM_OFF as
a last resort via the legacy pm_power_off function pointer.
The hibernation code already exports a system_entering_hibernation()
function which can be used by the higher-priority handler to check for
hibernation. That existing function just returns the value of a static
boolean variable from hibernate.c, which was previously only set in the
hibernation_platform_enter() code path. Set the same flag in the simpler
code path around the call to kernel_power_off() too.
An alternative way to hook SYSTEM_OFF2 into the hibernation code would
be to register a platform_hibernation_ops structure with an ->enter()
method which makes the new SYSTEM_OFF2 call. But that would have the
unwanted side-effect of making hibernation take a completely different
code path in hibernation_platform_enter(), invoking a lot of special dpm
callbacks.
Another option might be to add a new SYS_OFF_MODE_HIBERNATE mode, with
fallback to SYS_OFF_MODE_POWER_OFF. Or to use the sys_off_data to
indicate whether the power off is for hibernation.
But this version works and is relatively simple.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20241019172459.2241939-7-dwmw2@infradead.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
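A condensed sketch of that registration, assuming the generic sys_off handler API from include/linux/reboot.h (the SYSTEM_OFF2 invocation itself is a placeholder, and the exact priority relative to the EFI handler is glossed over):

  #include <linux/reboot.h>
  #include <linux/suspend.h>

  static void psci_system_off2_hibernate(void)
  {
          /* Placeholder: issue the PSCI SYSTEM_OFF2 (HIBERNATE_OFF) call. */
  }

  static int psci_sys_off2_handler(struct sys_off_data *data)
  {
          /* Only intercept the power-off when it really is a hibernation. */
          if (system_entering_hibernation())
                  psci_system_off2_hibernate();

          /* Otherwise defer to lower-priority handlers (legacy pm_power_off). */
          return NOTIFY_DONE;
  }

  /* Registered above the legacy pm_power_off path:
   *   register_sys_off_handler(SYS_OFF_MODE_POWER_OFF, SYS_OFF_PRIO_FIRMWARE,
   *                            psci_sys_off2_handler, NULL);
   */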
Marc Zyngier [Wed, 23 Oct 2024 14:53:45 +0000 (15:53 +0100)]
KVM: arm64: Handle WXN attribute
Until now, we didn't really care about WXN, as it didn't have an
effect on the R/W permissions (only the execution permission could be
dropped), and it was therefore of no interest for AT.
However, with S1POE, WXN can revoke the Write permission if an
overlay is active and execution is allowed. This *is* relevant
to AT.
Add full handling of WXN so that we correctly handle this case.
Marc Zyngier [Wed, 23 Oct 2024 14:53:44 +0000 (15:53 +0100)]
KVM: arm64: Handle stage-1 permission overlays
We now have the infrastructure in place to emulate S1POE:
- direct permissions are always overlay-capable
- indirect permissions are overlay-capable if the permissions are
in the 0b0xxx range
- the overlays are strictly subtractive
Marc Zyngier [Wed, 23 Oct 2024 14:53:41 +0000 (15:53 +0100)]
KVM: arm64: Add POE save/restore for AT emulation fast-path
Just like the other extensions affecting address translation,
we must save/restore POE so that an out-of-context translation
context can be restored and used with the AT instructions.
Marc Zyngier [Wed, 23 Oct 2024 14:53:39 +0000 (15:53 +0100)]
KVM: arm64: Add basic support for POR_EL2
S1POE support implies support for POR_EL2, which we provide by
- adding it to the vcpu_sysreg enum
- advertising it as mapped to its EL1 counterpart in get_el2_to_el1_mapping
- wiring it in the sys_reg_desc table with the correct visibility
- handling POR_EL1 in __vcpu_{read,write}_sys_reg_from_cpu()
Marc Zyngier [Wed, 23 Oct 2024 14:53:36 +0000 (15:53 +0100)]
KVM: arm64: Drop bogus CPTR_EL2.E0POE trap routing
It took me some time to realise it, but CPTR_EL2.E0POE does not
apply to a guest, only to EL0 when InHost(). And when InHost(),
CPTR_EL2 is mapped to CPACR_EL1, meaning that the E0POE bit naturally
takes effect without any trap.
To sum it up, this trap bit is better left ignored; we will never
have to handle it.
Marc Zyngier [Wed, 23 Oct 2024 14:53:34 +0000 (15:53 +0100)]
KVM: arm64: Rely on visibility to let PIR*_ELx/TCR2_ELx UNDEF
With a visibility defined for these registers, there is no need
to check again for S1PIE or TCRX being implemented as perform_access()
already handles it.
Mark Brown [Wed, 23 Oct 2024 14:53:33 +0000 (15:53 +0100)]
KVM: arm64: Hide S1PIE registers from userspace when disabled for guests
When the guest does not support S1PIE we should not allow any access
to the system registers it adds in order to ensure that we do not create
spurious issues with guest migration. Add a visibility operation for these
registers.
Mark Brown [Wed, 23 Oct 2024 14:53:32 +0000 (15:53 +0100)]
KVM: arm64: Hide TCR2_EL1 from userspace when disabled for guests
When the guest does not support FEAT_TCR2 we should not allow any access
to it in order to ensure that we do not create spurious issues with guest
migration. Add a visibility operation for it.
Mark Brown [Wed, 23 Oct 2024 14:53:31 +0000 (15:53 +0100)]
KVM: arm64: Define helper for EL2 registers with custom visibility
In preparation for adding more visibility filtering for EL2 registers add
a helper macro like EL2_REG() which allows specification of a custom
visibility operation.
Marc Zyngier [Wed, 23 Oct 2024 14:53:30 +0000 (15:53 +0100)]
KVM: arm64: Add a composite EL2 visibility helper
We are starting to have a bunch of visibility helpers checking
for EL2 + something else, and we are going to add more.
Simplify things somewhat by introducing a helper that implements
exactly that, taking a visibility helper as a parameter,
and convert the existing ones to it.
Marc Zyngier [Wed, 23 Oct 2024 14:53:29 +0000 (15:53 +0100)]
KVM: arm64: Implement AT S1PIE support
It doesn't take much effort to implement S1PIE support in AT.
It is only a matter of using the AArch64.S1IndirectBasePermissions()
encodings for the permissions, ignoring GCS which has no impact on AT,
and enforcing that FEAT_PAN3 is enabled, as this is a requirement of
FEAT_S1PIE.
Marc Zyngier [Wed, 23 Oct 2024 14:53:28 +0000 (15:53 +0100)]
KVM: arm64: Disable hierarchical permissions when S1PIE is enabled
S1PIE implicitly disables hierarchical permissions, as specified in
R_JHSVW, by making TCR_ELx.HPDn RES1.
Add a predicate for S1PIE being enabled for a given translation regime,
and emulate this behaviour by forcing the hpd field to true if S1PIE
is enabled for that translation regime.
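In its simplest form, the forced-HPD behaviour boils down to something like this (illustrative helper, not the walker's actual field names):

  #include <stdbool.h>

  /*
   * R_JHSVW: S1PIE makes TCR_ELx.HPDn behave as RES1, so hierarchical
   * permission disable is effectively set whenever S1PIE is enabled
   * for the translation regime.
   */
  static bool effective_hpd(bool tcr_hpd, bool s1pie_enabled)
  {
          return tcr_hpd || s1pie_enabled;
  }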
Marc Zyngier [Wed, 23 Oct 2024 14:53:27 +0000 (15:53 +0100)]
KVM: arm64: Split S1 permission evaluation into direct and hierarchical parts
The AArch64.S1DirectBasePermissions() pseudocode deals with both
direct and hierarchical S1 permission evaluation. While this is
probably convenient in the pseudocode, we would like a bit more
flexibility to slot things like indirect permissions.
To that effect, split the two permission check parts out of
handle_at_slow() and into their own functions. The permissions
are passed around as part of the walk_result structure.
Marc Zyngier [Wed, 23 Oct 2024 14:53:23 +0000 (15:53 +0100)]
KVM: arm64: Add save/restore for PIR{,E0}_EL2
Like their EL1 equivalents, the EL2-specific FEAT_S1PIE registers
are context-switched. This is made conditional on both FEAT_TCRX
and FEAT_S1PIE being advertised.
Note that this change only makes sense if read together with the
issue D22677 contained in 102105_K.a_04_en.
Marc Zyngier [Wed, 23 Oct 2024 14:53:17 +0000 (15:53 +0100)]
KVM: arm64: Extend masking facility to arbitrary registers
We currently only use the masking (RES0/RES1) facility for VNCR
registers, as they are memory-based and thus easy to sanitise.
But we could apply the same thing to other registers if we:
- split the sanitisation from __VNCR_START__
- apply the sanitisation when reading from a HW register
This involves a new "marker" in the vcpu_sysreg enum, which
defines the point at which the sanitisation applies (the VNCR
registers being of course after this marker).
While we are at it, rename kvm_vcpu_sanitise_vncr_reg() to
kvm_vcpu_apply_reg_masks(), which is vaguely more explicit,
and harden set_sysreg_masks() against setting masks for
random registers...
Marc Zyngier [Wed, 23 Oct 2024 14:53:16 +0000 (15:53 +0100)]
KVM: arm64: Correctly access TCR2_EL1, PIR_EL1, PIRE0_EL1 with VHE
For code that accesses any of the guest registers for emulation
purposes, it is crucial to know where the most up-to-date data is.
While this is pretty clear for nVHE (memory is the sole repository),
things are a lot muddier for VHE, as depending on the SYSREGS_ON_CPU
flag, registers can either be loaded on the HW or be in memory.
Even worse with NV, where the loaded state is by definition partial.
For these reasons, KVM offers the vcpu_read_sys_reg() and
vcpu_write_sys_reg() primitives that always do the right thing.
However, these primitives must know what register to access, and
this is the role of the __vcpu_read_sys_reg_from_cpu() and
__vcpu_write_sys_reg_to_cpu() helpers.
As it turns out, TCR2_EL1, PIR_EL1 and PIRE0_EL1 are not described
in the latter helpers, meaning that the AT code cannot use them
to emulate S1PIE.
Add the three registers to the (long) list.
Fixes: 86f9de9db178 ("KVM: arm64: Save/restore PIE registers") Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20241023145345.1613824-9-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Marc Zyngier [Wed, 23 Oct 2024 14:53:15 +0000 (15:53 +0100)]
KVM: arm64: nv: Save/Restore vEL2 sysregs
Whenever we need to restore the guest's system registers to the CPU, we
now need to take care of the EL2 system registers as well. Most of them
are accessed via traps only, but some have an immediate effect and also
a guest running in VHE mode would expect them to be accessible via their
EL1 encoding, which we do not trap.
For vEL2 we write the virtual EL2 registers with an identical format directly
into their EL1 counterparts, and translate the few registers that have a
different format so that they have the same effect on execution when running a
non-VHE guest hypervisor.
Based on an initial patch from Andre Przywara, rewritten many times
since.
Marc Zyngier [Wed, 23 Oct 2024 14:53:14 +0000 (15:53 +0100)]
KVM: arm64: nv: Handle CNTHCTL_EL2 specially
Accessing CNTHCTL_EL2 is fraught with danger if running with
HCR_EL2.E2H=1: half of the bits are held in CNTKCTL_EL1, and
thus can be changed behind our back, while the rest lives
in the CNTHCTL_EL2 shadow copy that is memory-based.
Yes, this is a lot of fun!
Make sure that we merge the two on read access, while we can
write to CNTKCTL_EL1 in a more straightforward manner.
Marc Zyngier [Wed, 23 Oct 2024 14:53:10 +0000 (15:53 +0100)]
arm64: Remove VNCR definition for PIRE0_EL2
As of the ARM ARM Known Issues document 102105_K.a_04_en, D22677
fixes a problem with the PIRE0_EL2 register, resulting in its
removal from the VNCR page (it had no purpose being there in the
first place).
Follow the architecture update by removing this offset.
Marc Zyngier [Wed, 23 Oct 2024 14:53:09 +0000 (15:53 +0100)]
arm64: Drop SKL0/SKL1 from TCR2_EL2
Despite what the documentation says, TCR2_EL2.{SKL0,SKL1} do not exist,
and the corresponding information is in the respective TTBRx_EL2. This
is a leftover from a development version of the architecture.
This change makes TCR2_EL2 similar to TCR2_EL1 in that respect.
Linus Torvalds [Sun, 27 Oct 2024 19:01:36 +0000 (09:01 -1000)]
Merge tag 'x86_urgent_for_v6.12_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Prevent a certain range of pages which get marked as hypervisor-only
from being allocated to a CoCo (SNP) guest, which cannot use them and
thus fails to boot
- Fix the microcode loader on AMD to pay attention to the stepping of a
patch and to handle the case where a BIOS config option splits the
machine into logical NUMA nodes per L3 cache slice
- Disable LAM from being built by default due to security concerns
* tag 'x86_urgent_for_v6.12_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Ensure that RMP table fixups are reserved
x86/microcode/AMD: Split load_microcode_amd()
x86/microcode/AMD: Pay attention to the stepping dynamically
x86/lam: Disable ADDRESS_MASKING in most cases
Linus Torvalds [Sun, 27 Oct 2024 18:56:22 +0000 (08:56 -1000)]
Merge tag 'ftrace-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace fixes from Steven Rostedt:
- Fix missing mutex unlock in error path of register_ftrace_graph()
A previous fix added a return on an error path and forgot to unlock
the mutex. Instead of dealing with error paths, use guard(mutex) as
the mutex is just released at the exit of the function anyway. Other
functions in this file should be updated with this, but that's a
cleanup and not a fix.
- Change cpuhp setup name to be consistent with other cpuhp states
The same commit that the above patch fixes added a cpuhp_setup_state()
call with the name "fgraph_idle_init". I was informed that it
should instead be something like: "fgraph:online". Update that too.
* tag 'ftrace-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
fgraph: Change the name of cpuhp state to "fgraph:online"
fgraph: Fix missing unlock in register_ftrace_graph()
Linus Torvalds [Sun, 27 Oct 2024 18:40:33 +0000 (08:40 -1000)]
Merge tag 'platform-drivers-x86-v6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
- Asus thermal profile fix, fixing performance issues on Lunar Lake
- Intel PMC: one revert for a lockdep issue and one bugfix
- Dell WMI: Ignore some WMI events on suspend/resume to silence warnings
* tag 'platform-drivers-x86-v6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: asus-wmi: Fix thermal profile initialization
platform/x86: dell-wmi: Ignore suspend notifications
platform/x86/intel/pmc: Fix pmc_core_iounmap to call iounmap for valid addresses
platform/x86:intel/pmc: Revert "Enable the ACPI PM Timer to be turned off when suspended"
Linus Torvalds [Sun, 27 Oct 2024 18:36:01 +0000 (08:36 -1000)]
Merge tag 'firewire-fixes-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire fix from Takashi Sakamoto:
"A single commit to resolve a regression existing in v6.11 or later.
The change in 1394 OHCI driver in v6.11 kernel could cause general
protection faults when rediscovering nodes in IEEE 1394 bus while
holding a spin lock. Consequently, watchdog checks can report a hard
lockup.
Currently, this issue is observed primarily during the system resume
phase when an extra node with three or more ports is used.
However, it could potentially occur in other cases as well"
* tag 'firewire-fixes-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: core: fix invalid port index for parent device
Linus Torvalds [Sun, 27 Oct 2024 18:29:36 +0000 (08:29 -1000)]
Merge tag 'block-6.12-20241026' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- Pull request for MD via Song fixing a few issues
- Fix a wrong check in blk_rq_map_user_bvec(), causing IO errors on
passthrough IO (Xinyu)
* tag 'block-6.12-20241026' of git://git.kernel.dk/linux:
block: fix sanity checks in blk_rq_map_user_bvec
md/raid10: fix null ptr dereference in raid10_size()
md: ensure child flush IO does not affect origin bio->bi_status
Linus Torvalds [Sun, 27 Oct 2024 18:23:49 +0000 (08:23 -1000)]
Merge tag 'xfs-6.12-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
- Fix recovery of allocator ops after a growfs
- Do not fail repairs on metadata files with no attr fork
* tag 'xfs-6.12-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: update the pag for the last AG at recovery time
xfs: don't use __GFP_RETRY_MAYFAIL in xfs_initialize_perag
xfs: error out when a superblock buffer update reduces the agcount
xfs: update the file system geometry after recoverying superblock buffers
xfs: merge the perag freeing helpers
xfs: pass the exact range to initialize to xfs_initialize_perag
xfs: don't fail repairs on metadata files with no attr fork
Takashi Sakamoto [Fri, 25 Oct 2024 03:41:37 +0000 (12:41 +0900)]
firewire: core: fix invalid port index for parent device
In commit 24b7f8e5cd65 ("firewire: core: use helper functions for self
ID sequence"), the enumeration over the self ID sequence was refactored with
some helper functions with KUnit tests. These helper functions are
guaranteed to work as expected by the KUnit tests, however their application
includes a mistake that assigns an invalid value to the index of the port
connected to the parent device.
This bug affects the case where extra node devices with three or
more ports are connected to the 1394 OHCI controller. In that case, the path
that updates the tree cache could hit a WARN_ON() and get a general protection
fault due to an access to an invalid address computed from the invalid value.
This commit fixes the bug to assign the correct port index.
Cc: stable@vger.kernel.org Reported-by: Edmund Raile <edmund.raile@proton.me> Closes: https://lore.kernel.org/lkml/8a9902a4ece9329af1e1e42f5fea76861f0bf0e8.camel@proton.me/ Fixes: 24b7f8e5cd65 ("firewire: core: use helper functions for self ID sequence") Link: https://lore.kernel.org/r/20241025034137.99317-1-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>