Oliver Upton [Sat, 26 Jul 2025 15:50:06 +0000 (08:50 -0700)]
Merge branch 'kvm-arm64/gcie-legacy' into kvmarm/next
* kvm-arm64/gcie-legacy:
: Support for GICv3 emulation on GICv5, courtesy of Sascha Bischoff
:
: FEAT_GCIE_LEGACY adds the necessary hardware for GICv5 systems to
: support the legacy GICv3 for VMs, including a backwards-compatible VGIC
: implementation that we all know and love.
:
: As a starting point for GICv5 enablement in KVM, enable + use the
: GICv3-compatible feature when running VMs on GICv5 hardware.
KVM: arm64: gic-v5: Probe for GICv5
KVM: arm64: gic-v5: Support GICv3 compat
arm64/sysreg: Add ICH_VCTLR_EL2
irqchip/gic-v5: Populate struct gic_kvm_info
irqchip/gic-v5: Skip deactivate for forwarded PPI interrupts
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Sat, 26 Jul 2025 15:47:22 +0000 (08:47 -0700)]
Merge branch 'kvm-arm64/doublefault2' into kvmarm/next
* kvm-arm64/doublefault2: (33 commits)
: NV Support for FEAT_RAS + DoubleFault2
:
: Delegate the vSError context to the guest hypervisor when in a nested
: state, including registers related to ESR propagation. Additionally,
: catch up KVM's external abort infrastructure to the architecture,
: implementing the effects of FEAT_DoubleFault2.
:
: This has some impact on non-nested guests, as SErrors deemed unmasked at
: the time they're made pending are now immediately injected with an
: emulated exception entry rather than using the VSE bit.
KVM: arm64: Make RAS registers UNDEF when RAS isn't advertised
KVM: arm64: Filter out HCR_EL2 bits when running in hypervisor context
KVM: arm64: Check for SYSREGS_ON_CPU before accessing the CPU state
KVM: arm64: Commit exceptions from KVM_SET_VCPU_EVENTS immediately
KVM: arm64: selftests: Test ESR propagation for vSError injection
KVM: arm64: Populate ESR_ELx.EC for emulated SError injection
KVM: arm64: selftests: Catch up set_id_regs with the kernel
KVM: arm64: selftests: Add SCTLR2_EL1 to get-reg-list
KVM: arm64: selftests: Test SEAs are taken to SError vector when EASE=1
KVM: arm64: selftests: Add basic SError injection test
KVM: arm64: Don't retire MMIO instruction w/ pending (emulated) SError
KVM: arm64: Advertise support for FEAT_DoubleFault2
KVM: arm64: Advertise support for FEAT_SCTLR2
KVM: arm64: nv: Enable vSErrors when HCRX_EL2.TMEA is set
KVM: arm64: nv: Honor SError routing effects of SCTLR2_ELx.NMEA
KVM: arm64: nv: Take "masked" aborts to EL2 when HCRX_EL2.TMEA is set
KVM: arm64: Route SEAs to the SError vector when EASE is set
KVM: arm64: nv: Ensure Address size faults affect correct ESR
KVM: arm64: Factor out helper for selecting exception target EL
KVM: arm64: Describe SCTLR2_ELx RESx masks
...
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Sat, 26 Jul 2025 15:47:02 +0000 (08:47 -0700)]
Merge branch 'kvm-arm64/cacheable-pfnmap' into kvmarm/next
* kvm-arm64/cacheable-pfnmap:
: Cacheable PFNMAP support at stage-2, courtesy of Ankit Agrawal
:
: For historical reasons, KVM only allows cacheable mappings at stage-2
: when a kernel alias exists in the direct map for the memory region. On
: hardware without FEAT_S2FWB, this is necessary as KVM must do cache
: maintenance to keep guest/host accesses coherent.
:
: This is unnecessarily restrictive on systems with FEAT_S2FWB and
: CTR_EL0.DIC, as KVM no longer needs to perform cache maintenance to
: maintain correctness.
:
: Allow cacheable mappings at stage-2 on supporting hardware when the
: corresponding VMA has cacheable memory attributes and advertise a
: capability to userspace such that a VMM can determine if a stage-2
: mapping can be established (e.g. VFIO device).
KVM: arm64: Expose new KVM cap for cacheable PFNMAP
KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
KVM: arm64: Block cacheable PFNMAP mapping
KVM: arm64: Assume non-PFNMAP/MIXEDMAP VMAs can be mapped cacheable
KVM: arm64: Rename the device variable to s2_force_noncacheable
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Marc Zyngier [Mon, 21 Jul 2025 10:19:51 +0000 (11:19 +0100)]
KVM: arm64: Make RAS registers UNDEF when RAS isn't advertised
We currently always expose FEAT_RAS when available on the host.
As we are about to make this feature selectable from userspace,
check for it being present before emulating register accesses
as RAZ/WI, and inject an UNDEF otherwise.
Marc Zyngier [Mon, 21 Jul 2025 10:19:50 +0000 (11:19 +0100)]
KVM: arm64: Filter out HCR_EL2 bits when running in hypervisor context
Most HCR_EL2 bits are not supposed to affect EL2 at all, but only
the guest. However, we gladly merge these bits with the host's
HCR_EL2 configuration, irrespective of entering L1 or L2.
This leads to some funky behaviour, such as L1 trying to inject
a virtual SError for L2, and getting a taste of its own medecine.
Not quite what the architecture anticipated.
In the end, the only bits that matter are those we have defined as
invariants, either because we've made them RESx (E2H, HCD...), or
that we actively refuse to merge because the mess with KVM's own
logic.
Use the sanitisation infrastructure to get the RES1 bits, and let
things rip in a safer way.
Fixes: 04ab519bb86df ("KVM: arm64: nv: Configure HCR_EL2 for FEAT_NV2") Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250721101955.535159-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Marc Zyngier [Sun, 20 Jul 2025 10:22:29 +0000 (11:22 +0100)]
KVM: arm64: Check for SYSREGS_ON_CPU before accessing the CPU state
Mark Brown reports that since we commit to making exceptions
visible without the vcpu being loaded, the external abort selftest
fails.
Upon investigation, it turns out that the code that makes registers
affected by an exception visible to the guest is completely broken
on VHE, as we don't check whether the system registers are loaded
on the CPU at this point. We managed to get away with this so far,
but that's obviously as bad as it gets,
Add the required checksm and document the absolute need to check
for the SYSREGS_ON_CPU flag before calling into any of the
__vcpu_write_sys_reg_to_cpu()__vcpu_read_sys_reg_from_cpu() helpers.
Oliver Upton [Tue, 15 Jul 2025 06:25:07 +0000 (23:25 -0700)]
KVM: arm64: Commit exceptions from KVM_SET_VCPU_EVENTS immediately
syzkaller has found that it can trip a warning in KVM's exception
emulation infrastructure by repeatedly injecting exceptions into the
guest.
While it's unlikely that a reasonable VMM will do this, further
investigation of the issue reveals that KVM can potentially discard the
"pending" SEA state. While the handling of KVM_GET_VCPU_EVENTS presumes
that userspace-injected SEAs are realized immediately, in reality the
emulated exception entry is deferred until the next call to KVM_RUN.
Hack-a-fix the immediate issues by committing the pending exceptions to
the vCPU's architectural state immediately in KVM_SET_VCPU_EVENTS. This
is no different to the way KVM-injected exceptions are handled in
KVM_RUN where we potentially call __kvm_adjust_pc() before returning to
userspace.
Reported-by: syzbot+4e09b1432de3774b86ae@syzkaller.appspotmail.com Reported-by: syzbot+1f6f096afda6f4f8f565@syzkaller.appspotmail.com Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Marc Zyngier [Tue, 15 Jul 2025 17:11:12 +0000 (18:11 +0100)]
arm64: smp: Fix pNMI setup after GICv5 rework
Breno reports that pNMIs are not behaving the way they should since
they were reworked for GICv5. Turns out we feed the IRQ number to
the pNMI helper instead of the IPI number -- not a good idea.
Fix it by providing the correct number (duh).
Fixes: ba1004f861d16 ("arm64: smp: Support non-SGIs for IPIs") Reported-by: Breno Leitao <leitao@debian.org> Suggested-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
Oliver Upton [Tue, 8 Jul 2025 23:06:32 +0000 (16:06 -0700)]
KVM: arm64: selftests: Test ESR propagation for vSError injection
Ensure that vSErrors taken in the guest have an appropriate ESR_ELx
value for the expected exception. Additionally, switch the EASE test to
install the SEA handler at the SError offset, as the ESR is still
expected to match an SEA in that case.
Oliver Upton [Tue, 8 Jul 2025 23:06:31 +0000 (16:06 -0700)]
KVM: arm64: Populate ESR_ELx.EC for emulated SError injection
The hardware vSError injection mechanism populates ESR_ELx.EC as part of
ESR propagation and the contents of VSESR_EL2 populate the ISS field. Of
course, this means our emulated injection needs to set up the EC
correctly for an SError too.
Sascha Bischoff [Fri, 27 Jun 2025 10:09:02 +0000 (10:09 +0000)]
KVM: arm64: gic-v5: Support GICv3 compat
Add support for GICv3 compat mode (FEAT_GCIE_LEGACY) which allows a
GICv5 host to run GICv3-based VMs. This change enables the
VHE/nVHE/hVHE/protected modes, but does not support nested
virtualization.
A lazy-disable approach is taken for compat mode; it is enabled on the
vgic_v3_load path but not disabled on the vgic_v3_put path. A
non-GICv3 VM, i.e., one based on GICv5, is responsible for disabling
compat mode on the corresponding vgic_v5_load path. Currently, GICv5
is not supported, and hence compat mode is not disabled again once it
is enabled, and this function is intentionally omitted from the code.
Co-authored-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Link: https://lore.kernel.org/r/20250627100847.1022515-5-sascha.bischoff@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Sascha Bischoff [Fri, 27 Jun 2025 10:09:01 +0000 (10:09 +0000)]
irqchip/gic-v5: Skip deactivate for forwarded PPI interrupts
If a PPI interrupt is forwarded to a guest, skip the deactivate and
only EOI. Rely on the guest deactivating both the virtual and physical
interrupts (due to ICH_LRx_EL2.HW being set) later on as part of
handling the injected interrupt. This mimics the behaviour seen on
native GICv3.
This is part of adding support for the GICv3 compatibility mode on a
GICv5 host.
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Co-authored-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Link: https://lore.kernel.org/r/20250627100847.1022515-2-sascha.bischoff@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
KVM might have an emulated SError queued for the guest if userspace
returned an abort for MMIO. Better yet, it could actually be a
*synchronous* exception in disguise if SCTLR2_ELx.EASE is set.
Don't advance PC if KVM owes an emulated SError, just like the handling
of emulated SEA injection.
Oliver Upton [Tue, 8 Jul 2025 17:25:24 +0000 (10:25 -0700)]
KVM: arm64: nv: Honor SError routing effects of SCTLR2_ELx.NMEA
As the name might imply, when NMEA is set SErrors are non-maskable and
can be taken regardless of PSTATE.A. As is the recurring theme with
DoubleFault2, the effects on SError routing are entirely backwards to
this.
If at EL1, NMEA is *not* considered for SError routing when TMEA is set
and the exception is taken to EL2 when PSTATE.A is set.
Oliver Upton [Tue, 8 Jul 2025 17:25:23 +0000 (10:25 -0700)]
KVM: arm64: nv: Take "masked" aborts to EL2 when HCRX_EL2.TMEA is set
HCRX_EL2.TMEA further modifies the external abort behavior where
unmasked aborts are taken to EL1 and masked aborts are taken to EL2.
It's rather weird when you consider that SEAs are, well, *synchronous*
and therefore not actually maskable. However, for the purposes of
exception routing, they're considered "masked" if the A flag is set.
This gets a bit hairier when considering the fact that TMEA
also enables vSErrors, i.e. KVM has delegated the HW vSError context to
the guest hypervisor. We can keep the vSError context delegation as-is
by taking advantage of a couple properties:
- If SErrors are unmasked, the 'physical' SError can be taken
in-context immediately. In other words, KVM can emulate the EL1
SError while preserving vEL2's ownership of the vSError context.
- If SErrors are masked, the 'physical' SError is taken to EL2
immediately and needs the usual nested exception entry.
Note that the new in-context handling has the benign effect where
unmasked SError injections are emulated even for non-nested VMs.
Oliver Upton [Tue, 8 Jul 2025 17:25:22 +0000 (10:25 -0700)]
KVM: arm64: Route SEAs to the SError vector when EASE is set
One of the finest additions of FEAT_DoubleFault2 is the ability for
software to request *synchronous* external aborts be taken to the
SError vector, which of coure are *asynchronous* in nature.
Opinions be damned, implement the architecture and send SEAs to the
SError vector if EASE is set for the target context.
For historical reasons, Address size faults are first injected into the
guest as an SEA and ESR_EL1 is subsequently modified to reflect the
correct FSC. Of course, when dealing with a vEL2 this should poke
ESR_EL2.
Oliver Upton [Tue, 8 Jul 2025 17:25:20 +0000 (10:25 -0700)]
KVM: arm64: Factor out helper for selecting exception target EL
Pull out the exception target selection from pend_sync_exception() for
general use. Use PSR_MODE_ELxh as a shorthand for the target EL, as
SP_ELx selection is handled further along in the hyp's exception
emulation.
Oliver Upton [Tue, 8 Jul 2025 17:25:13 +0000 (10:25 -0700)]
KVM: arm64: nv: Use guest hypervisor's vSError state
When HCR_EL2.AMO is set, physical SErrors are routed to EL2 and virtual
SError injection is enabled for EL1. Conceptually treating
host-initiated SErrors as 'physical', this means we can delegate control
of the vSError injection context to the guest hypervisor when nesting &&
AMO is set.
To date KVM has used HCR_EL2.VSE to track the state of a pending SError
for the guest. With this bit set, hardware respects the EL1 exception
routing / masking rules and injects the vSError when appropriate.
This isn't correct for NV guests as hardware is oblivious to vEL2's
intentions for SErrors. Better yet, with FEAT_NV2 the guest can change
the routing behind our back as HCR_EL2 is redirected to memory. Cope
with this mess by:
- Using a flag (instead of HCR_EL2.VSE) to track the pending SError
state when SErrors are unconditionally masked for the current context
- Resampling the routing / masking of a pending SError on every guest
entry/exit
- Emulating exception entry when SError routing implies a translation
regime change
Oliver Upton [Tue, 8 Jul 2025 17:25:10 +0000 (10:25 -0700)]
KVM: arm64: nv: Respect exception routing rules for SEAs
Synchronous external aborts are taken to EL2 if ELIsInHost() or
HCR_EL2.TEA=1. Rework the SEA injection plumbing to respect the imposed
routing of the guest hypervisor and opportunistically rephrase things to
make their function a bit more obvious.
Oliver Upton [Tue, 8 Jul 2025 17:25:09 +0000 (10:25 -0700)]
KVM: arm64: Treat vCPU with pending SError as runnable
Per R_VRLPB, a pending SError is a WFI wakeup event regardless of
PSTATE.A, meaning that the vCPU is runnable. Sample VSE in addition to
the other IRQ lines.
Marc Zyngier [Tue, 8 Jul 2025 17:25:08 +0000 (10:25 -0700)]
KVM: arm64: Add helper to identify a nested context
A common idiom in the KVM code is to check if we are currently
dealing with a "nested" context, defined as having NV enabled,
but being in the EL1&0 translation regime.
This is usually expressed as:
if (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu) ... )
which is a mouthful and a bit hard to read, specially when followed
by additional conditions.
Introduce a new helper that encapsulate these two terms, allowing
the above to be written as
if (is_nested_context(vcpu) ... )
which is both shorter and easier to read, and makes more obvious
the potential for simplification on some code paths.
docs: arm64: gic-v5: Document booting requirements for GICv5
Document the requirements for booting a kernel on a system implementing
a GICv5 interrupt controller.
Specifically, other than DT/ACPI providing the required firmware
representation, define what traps must be disabled if the kernel is
booted at EL1 on a system where EL2 is implemented.
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-30-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
The GICv5 architecture implements the Interrupt Wire Bridge (IWB) in
order to support wired interrupts that cannot be connected directly
to an IRS and instead uses the ITS to translate a wire event into
an IRQ signal.
Add the wired-to-MSI IWB driver to manage IWB wired interrupts.
An IWB is connected to an ITS and it has its own deviceID for all
interrupt wires that it manages; the IWB input wire number must be
exposed to the ITS as an eventID with a 1:1 mapping.
This eventID is not programmable and therefore requires a new
msi_alloc_info_t flag to make sure the ITS driver does not allocate
an eventid for the wire but rather it uses the msi_alloc_info_t.hwirq
number to gather the ITS eventID.
Co-developed-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Co-developed-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-29-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
The GICv5 architecture implements Interrupt Translation Service
(ITS) components in order to translate events coming from peripherals
into interrupt events delivered to the connected IRSes.
Events (ie MSI memory writes to ITS translate frame), are translated
by the ITS using tables kept in memory.
ITS translation tables for peripherals is kept in memory storage
(device table [DT] and Interrupt Translation Table [ITT]) that
is allocated by the driver on boot.
Both tables can be 1- or 2-level; the structure is chosen by the
driver after probing the ITS HW parameters and checking the
allowed table splits and supported {device/event}_IDbits.
DT table entries are allocated on demand (ie when a device is
probed); the DT table is sized using the number of supported
deviceID bits in that that's a system design decision (ie the
number of deviceID bits implemented should reflect the number
of devices expected in a system) therefore it makes sense to
allocate a DT table that can cater for the maximum number of
devices.
DT and ITT tables are allocated using the kmalloc interface;
the allocation size may be smaller than a page or larger,
and must provide contiguous memory pages.
LPIs INTIDs backing the device events are allocated one-by-one
and only upon Linux IRQ allocation; this to avoid preallocating
a large number of LPIs to cover the HW device MSI vector
size whereas few MSI entries are actually enabled by a device.
ITS cacheability/shareability attributes are programmed
according to the provided firmware ITS description.
The GICv5 partially reuses the GICv3 ITS MSI parent infrastructure
and adds functions required to retrieve the ITS translate frame
addresses out of msi-map and msi-parent properties to implement
the GICv5 ITS MSI parent callbacks.
Co-developed-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Co-developed-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-28-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
In some irqchip implementations the fwnode representing the IRQdomain
and the MSI controller fwnode do not match; in particular the IRQdomain
fwnode is the MSI controller fwnode parent.
To support selecting such IRQ domains, add a flag in core IRQ domain
code that explicitly tells the MSI lib to use the parent fwnode while
carrying out IRQ domain selection.
Update the msi-lib select callback with the resulting logic.
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-27-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
The GICv5 ITS will reuse some GICv3 ITS MSI parent functions therefore
it makes sense to keep the code functionality in a compilation unit
shared by the two drivers.
Rename the GICv3 ITS MSI parent file and update the related
Kconfig/Makefile entries to pave the way for code sharing.
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-26-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
PCI/MSI: Add pci_msi_map_rid_ctlr_node() helper function
IRQchip drivers need a PCI/MSI function to map a RID to a MSI
controller deviceID namespace and at the same time retrieve the
struct device_node pointer of the MSI controller the RID is mapped
to.
Add pci_msi_map_rid_ctlr_node() to achieve this purpose.
Cc Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-25-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
Add an of_msi_xlate() helper that maps a device ID and returns
the device node of the MSI controller the device ID is mapped to.
Required by core functions that need an MSI controller device node
pointer at the same time as a mapped device ID, of_msi_map_id() is not
sufficient for that purpose.
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-24-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
An IRS supports Logical Peripheral Interrupts (LPIs) and implement
Linux IPIs on top of it.
LPIs are used for interrupt signals that are translated by a
GICv5 ITS (Interrupt Translation Service) but also for software
generated IRQs - namely interrupts that are not driven by a HW
signal, ie IPIs.
LPIs rely on memory storage for interrupt routing and state.
LPIs state and routing information is kept in the Interrupt
State Table (IST).
IRSes provide support for 1- or 2-level IST tables configured
to support a maximum number of interrupts that depend on the
OS configuration and the HW capabilities.
On systems that provide 2-level IST support, always allow
the maximum number of LPIs; On systems with only 1-level
support, limit the number of LPIs to 2^12 to prevent
wasting memory (presumably a system that supports a 1-level
only IST is not expecting a large number of interrupts).
On a 2-level IST system, L2 entries are allocated on
demand.
The IST table memory is allocated using the kmalloc() interface;
the allocation required may be smaller than a page and must be
made up of contiguous physical pages if larger than a page.
On systems where the IRS is not cache-coherent with the CPUs,
cache mainteinance operations are executed to clean and
invalidate the allocated memory to the point of coherency
making it visible to the IRS components.
On GICv5 systems, IPIs are implemented using LPIs.
Add an LPI IRQ domain and implement an IPI-specific IRQ domain created
as a child/subdomain of the LPI domain to allocate the required number
of LPIs needed to implement the IPIs.
IPIs are backed by LPIs, add LPIs allocation/de-allocation
functions.
The LPI INTID namespace is managed using an IDA to alloc/free LPI INTIDs.
Associate an IPI irqchip with IPI IRQ descriptors to provide
core code with the irqchip.ipi_send_single() method required
to raise an IPI.
Co-developed-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Co-developed-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-22-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
The GICv5 Interrupt Routing Service (IRS) component implements
interrupt management and routing in the GICv5 architecture.
A GICv5 system comprises one or more IRSes, that together
handle the interrupt routing and state for the system.
An IRS supports Shared Peripheral Interrupts (SPIs), that are
interrupt sources directly connected to the IRS; they do not
rely on memory for storage. The number of supported SPIs is
fixed for a given implementation and can be probed through IRS
IDR registers.
SPI interrupt state and routing are managed through GICv5
instructions.
Each core (PE in GICv5 terms) in a GICv5 system is identified with
an Interrupt AFFinity ID (IAFFID).
An IRS manages a set of cores that are connected to it.
Firmware provides a topology description that the driver uses
to detect to which IRS a CPU (ie an IAFFID) is associated with.
Use probeable information and firmware description to initialize
the IRSes and implement GICv5 IRS SPIs support through an
SPI-specific IRQ domain.
The GICv5 IRS driver:
- Probes IRSes in the system to detect SPI ranges
- Associates an IRS with a set of cores connected to it
- Adds an IRQchip structure for SPI handling
SPIs priority is set to a value corresponding to the lowest
permissible priority in the system (taking into account the
implemented priority bits of the IRS and CPU interface).
Since all IRQs are set to the same priority value, the value
itself does not matter as long as it is a valid one.
Co-developed-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Co-developed-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-21-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
The GICv5 CPU interface implements support for PE-Private Peripheral
Interrupts (PPI), that are handled (enabled/prioritized/delivered)
entirely within the CPU interface hardware.
To enable PPI interrupts, implement the baseline GICv5 host kernel
driver infrastructure required to handle interrupts on a GICv5 system.
Add the exception handling code path and definitions for GICv5
instructions.
Add GICv5 PPI handling code as a specific IRQ domain to:
- Set-up PPI priority
- Manage PPI configuration and state
- Manage IRQ flow handler
- IRQs allocation/free
- Hook-up a PPI specific IRQchip to provide the relevant methods
PPI IRQ priority is chosen as the minimum allowed priority by the
system design (after probing the number of priority bits implemented
by the CPU interface).
Co-developed-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Co-developed-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-20-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Thu, 3 Jul 2025 10:25:08 +0000 (12:25 +0200)]
arm64: smp: Support non-SGIs for IPIs
The arm64 arch has relied so far on GIC architectural software
generated interrupt (SGIs) to handle IPIs. Those are per-cpu
software generated interrupts.
arm64 architecture code that allocates the IPIs virtual IRQs and
IRQ descriptors was written accordingly.
On GICv5 systems, IPIs are implemented using LPIs that are not
per-cpu interrupts - they are just normal routable IRQs.
Add arch code to set-up IPIs on systems where they are handled
using normal routable IRQs.
For those systems, force the IRQ affinity (and make it immutable)
to the cpu a given IRQ was assigned to.
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
[lpieralisi: changed affinity set-up, log] Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-18-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
arm64: cpucaps: Rename GICv3 CPU interface capability
In preparation for adding a GICv5 CPU interface capability,
rework the existing GICv3 CPUIF capability - change its name and
description so that the subsequent GICv5 CPUIF capability
can be added with a more consistent naming on top.
Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-16-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
The GICv5 interrupt controller architecture is composed of:
- one or more Interrupt Routing Service (IRS)
- zero or more Interrupt Translation Service (ITS)
- zero or more Interrupt Wire Bridge (IWB)
Describe a GICv5 implementation by specifying a top level node
corresponding to the GICv5 system component.
IRS nodes are added as GICv5 system component children.
An ITS is associated with an IRS so ITS nodes are described
as IRS children - use the hierarchy explicitly in the device
tree to define the association.
IWB nodes are described as a separate schema.
An IWB is connected to a single ITS, the connection is made explicit
through the msi-parent property and therefore is not required to be
explicit through a parent-child relationship in the device tree.
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Conor Dooley <conor+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Krzysztof Kozlowski <krzk+dt@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-1-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
KVM: arm64: Expose new KVM cap for cacheable PFNMAP
Introduce a new KVM capability to expose to the userspace whether
cacheable mapping of PFNMAP is supported.
The ability to safely do the cacheable mapping of PFNMAP is contingent
on S2FWB and ARM64_HAS_CACHE_DIC. S2FWB allows KVM to avoid flushing
the D cache, ARM64_HAS_CACHE_DIC allows KVM to avoid flushing the icache
and turns icache_inval_pou() into a NOP. The cap would be false if
those requirements are missing and is checked by making use of
kvm_arch_supports_cacheable_pfnmap.
This capability would allow userspace to discover the support.
It could for instance be used by userspace to prevent live-migration
across FWB and non-FWB hosts.
CC: Catalin Marinas <catalin.marinas@arm.com> CC: Jason Gunthorpe <jgg@nvidia.com> CC: Oliver Upton <oliver.upton@linux.dev> CC: David Hildenbrand <david@redhat.com> Suggested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Donald Dutile <ddutile@redhat.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250705071717.5062-7-ankita@nvidia.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
KVM currently forces non-cacheable memory attributes (either Normal-NC
or Device-nGnRE) for a region based on pfn_is_map_memory(), i.e. whether
or not the kernel has a cacheable alias for it. This is necessary in
situations where KVM needs to perform CMOs on the region but is
unnecessarily restrictive when hardware obviates the need for CMOs.
KVM doesn't need to perform any CMOs on hardware with FEAT_S2FWB and
CTR_EL0.DIC. As luck would have it, there are implementations in the
wild that need to map regions of a device with cacheable attributes to
function properly. An example of this is Nvidia's Grace Hopper/Blackwell
systems where GPU memory is interchangeable with DDR and retains
properties such as cacheability, unaligned accesses, atomics and
handling of executable faults. Of course, for this to work in a VM the
GPU memory needs to have a cacheable mapping at stage-2.
Allow cacheable stage-2 mappings to be created on supporting hardware
when the VMA has cacheable memory attributes. Check these preconditions
during memslot creation (in addition to fault handling) to potentially
'fail-fast' as a courtesy to userspace.
CC: Oliver Upton <oliver.upton@linux.dev> CC: Sean Christopherson <seanjc@google.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Suggested-by: David Hildenbrand <david@redhat.com> Tested-by: Donald Dutile <ddutile@redhat.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250705071717.5062-6-ankita@nvidia.com
[ Oliver: refine changelog, squash kvm_supports_cacheable_pfnmap() patch ] Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Fixes a security bug due to mismatched attributes between S1 and
S2 mapping.
Currently, it is possible for a region to be cacheable in the userspace
VMA, but mapped non cached in S2. This creates a potential issue where
the VMM may sanitize cacheable memory across VMs using cacheable stores,
ensuring it is zeroed. However, if KVM subsequently assigns this memory
to a VM as uncached, the VM could end up accessing stale, non-zeroed data
from a previous VM, leading to unintended data exposure. This is a security
risk.
Block such mismatch attributes case by returning EINVAL when userspace
try to map PFNMAP cacheable. Only allow NORMAL_NC and DEVICE_*.
CC: Oliver Upton <oliver.upton@linux.dev> CC: Catalin Marinas <catalin.marinas@arm.com> CC: Sean Christopherson <seanjc@google.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: David Hildenbrand <david@redhat.com> Tested-by: Donald Dutile <ddutile@redhat.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250705071717.5062-4-ankita@nvidia.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
KVM: arm64: Assume non-PFNMAP/MIXEDMAP VMAs can be mapped cacheable
Despite its name, kvm_is_device_pfn() is actually used to determine if a
given PFN has a kernel mapping that can be used to perform cache
maintenance, as it calls pfn_is_map_memory() internally.
Expand the helper into its single callsite and further condition the
check on the VMA having either VM_PFNMAP or VM_MIXEDMAP set. VMAs that
set neither of these flags must always contain Normal, struct page
backed memory with valid aliases in the kernel address space.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: David Hildenbrand <david@redhat.com> Tested-by: Donald Dutile <ddutile@redhat.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250705071717.5062-3-ankita@nvidia.com
[ Oliver: fixed typos, refined changelog ] Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
KVM: arm64: Rename the device variable to s2_force_noncacheable
To perform cache maintenance on a region of memory, KVM/arm64 relies on
that region having a cacheable alias in the kernel's address space which
can be used with CMO instructions.
The 'device' variable is somewhat of a misnomer, as it actually
indicates whether or not the stage-2 alias is allowed to have cacheable
memory attributes. The resulting stage-2 memory attributes are further
modified by VM_ALLOW_ANY_UNCACHED, selecting between Normal-NC or
Device-nGnRE depending on what the endpoint supports.
Rename the to s2_force_noncacheable such that its purpose is a bit more
obvious.
CC: Catalin Marinas <catalin.marinas@arm.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: David Hildenbrand <david@redhat.com> Tested-by: Donald Dutile <ddutile@redhat.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250705071717.5062-2-ankita@nvidia.com
[ Oliver: addressed typos, wound up rewriting changelog ] Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Linus Torvalds [Sun, 22 Jun 2025 17:50:36 +0000 (10:50 -0700)]
Merge tag 'i2c-for-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- subsystem: convert drivers to use recent callbacks of struct
i2c_algorithm A typical after-rc1 cleanup, which I couldn't send in
time for rc2
- tegra: fix YAML conversion of device tree bindings
- k1: re-add a check which got lost during upstreaming
* tag 'i2c-for-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: k1: check for transfer error
i2c: use inclusive callbacks in struct i2c_algorithm
dt-bindings: i2c: nvidia,tegra20-i2c: Specify the required properties
Linus Torvalds [Sun, 22 Jun 2025 17:30:44 +0000 (10:30 -0700)]
Merge tag 'x86_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Make sure the array tracking which kernel text positions need to be
alternatives-patched doesn't get mishandled by out-of-order
modifications, leading to it overflowing and causing page faults when
patching
- Avoid an infinite loop when early code does a ranged TLB invalidation
before the broadcast TLB invalidation count of how many pages it can
flush, has been read from CPUID
- Fix a CONFIG_MODULES typo
- Disable broadcast TLB invalidation when PTI is enabled to avoid an
overflow of the bitmap tracking dynamic ASIDs which need to be
flushed when the kernel switches between the user and kernel address
space
- Handle the case of a CPU going offline and thus reporting zeroes when
reading top-level events in the resctrl code
* tag 'x86_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/alternatives: Fix int3 handling failure from broken text_poke array
x86/mm: Fix early boot use of INVPLGB
x86/its: Fix an ifdef typo in its_alloc()
x86/mm: Disable INVLPGB when PTI is enabled
x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl subsystem
Linus Torvalds [Sun, 22 Jun 2025 17:17:51 +0000 (10:17 -0700)]
Merge tag 'irq_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Fix missing prototypes warnings
- Properly initialize work context when allocating it
- Remove a method tracking when managed interrupts are suspended during
hotplug, in favor of the code using a IRQ disable depth tracking now,
and have interrupts get properly enabled again on restore
- Make sure multiple CPUs getting hotplugged don't cause wrong tracking
of the managed IRQ disable depth
* tag 'irq_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/ath79-misc: Fix missing prototypes warnings
genirq/irq_sim: Initialize work context pointers properly
genirq/cpuhotplug: Restore affinity even for suspended IRQ
genirq/cpuhotplug: Rebalance managed interrupts across multi-CPU hotplug
Linus Torvalds [Sun, 22 Jun 2025 17:11:45 +0000 (10:11 -0700)]
Merge tag 'perf_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Avoid a crash on a heterogeneous machine where not all cores support
the same hw events features
- Avoid a deadlock when throttling events
- Document the perf event states more
- Make sure a number of perf paths switching off or rescheduling events
call perf_cgroup_event_disable()
- Make sure perf does task sampling before its userspace mapping is
torn down, and not after
* tag 'perf_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix crash in icl_update_topdown_event()
perf: Fix the throttle error of some clock events
perf: Add comment to enum perf_event_state
perf/core: Fix WARN in perf_cgroup_switch()
perf: Fix dangling cgroup pointer in cpuctx
perf: Fix cgroup state vs ERROR
perf: Fix sample vs do_exit()
Linus Torvalds [Sun, 22 Jun 2025 17:09:23 +0000 (10:09 -0700)]
Merge tag 'locking_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Borislav Petkov:
- Make sure the switch to the global hash is requested always under a
lock so that two threads requesting that simultaneously cannot get to
inconsistent state
- Reject negative NUMA nodes earlier in the futex NUMA interface
handling code
- Selftests fixes
* tag 'locking_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Verify under the lock if hash can be replaced
futex: Handle invalid node numbers supplied by user
selftests/futex: Set the home_node in futex_numa_mpol
selftests/futex: getopt() requires int as return value.
* tag 'edac_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/igen6: Fix NULL pointer dereference
EDAC/amd64: Correct number of UMCs for family 19h models 70h-7fh
Linus Torvalds [Sun, 22 Jun 2025 16:46:11 +0000 (09:46 -0700)]
Merge tag 'v6.16-rc2-smb3-client-fixes-v2' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Multichannel channel allocation fix for Kerberos mounts
- Two reconnect fixes
- Fix netfs_writepages crash with smbdirect/RDMA
- Directory caching fix
- Three minor cleanup fixes
- Log error when close cached dirs fails
* tag 'v6.16-rc2-smb3-client-fixes-v2' of git://git.samba.org/sfrench/cifs-2.6:
smb: minor fix to use SMB2_NTLMV2_SESSKEY_SIZE for auth_key size
smb: minor fix to use sizeof to initialize flags_string buffer
smb: Use loff_t for directory position in cached_dirents
smb: Log an error when close_all_cached_dirs fails
cifs: Fix prepare_write to negotiate wsize if needed
smb: client: fix max_sge overflow in smb_extract_folioq_to_rdma()
smb: client: fix first command failure during re-negotiation
cifs: Remove duplicate fattr->cf_dtype assignment from wsl_to_fattr() function
smb: fix secondary channel creation issue with kerberos by populating hostname when adding channels
Alex Elder [Mon, 16 Jun 2025 12:51:36 +0000 (07:51 -0500)]
i2c: k1: check for transfer error
If spacemit_i2c_xfer_msg() times out waiting for a message transfer to
complete, or if the hardware reports an error, it returns a negative
error code (-ETIMEDOUT, -EAGAIN, -ENXIO. or -EIO).
The sole caller of spacemit_i2c_xfer_msg() is spacemit_i2c_xfer(),
which is the i2c_algorithm->xfer callback function. It currently
does not save the value returned by spacemit_i2c_xfer_msg().
The result is that transfer errors go unreported, and a caller
has no indication anything is wrong.
When this code was out for review, the return value *was* checked
in early versions. But for some reason, that assignment got dropped
between versions 5 and 6 of the series, perhaps related to reworking
the code to merge spacemit_i2c_xfer_core() into spacemit_i2c_xfer().
Simply assigning the value returned to "ret" fixes the problem.
Fixes: 5ea558473fa31 ("i2c: spacemit: add support for SpacemiT K1 SoC") Signed-off-by: Alex Elder <elder@riscstar.com> Cc: <stable@vger.kernel.org> # v6.15+ Reviewed-by: Troy Mitchell <troymitchell988@gmail.com> Link: https://lore.kernel.org/r/20250616125137.1555453-1-elder@riscstar.com Signed-off-by: Andi Shyti <andi@smida.it> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Linus Torvalds [Sat, 21 Jun 2025 16:20:15 +0000 (09:20 -0700)]
Merge tag 'nfsd-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Two fixes for commits in the nfsd-6.16 merge
- One fix for the recently-added NFSD netlink facility
- One fix for a remote SunRPC crasher
* tag 'nfsd-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
sunrpc: handle SVC_GARBAGE during svc auth processing as auth error
nfsd: use threads array as-is in netlink interface
SUNRPC: Cleanup/fix initial rq_pages allocation
NFSD: Avoid corruption of a referring call list
Bharath SM [Thu, 19 Jun 2025 15:35:34 +0000 (21:05 +0530)]
smb: minor fix to use SMB2_NTLMV2_SESSKEY_SIZE for auth_key size
Replaced hardcoded value 16 with SMB2_NTLMV2_SESSKEY_SIZE
in the auth_key definition and memcpy call.
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Bharath SM [Thu, 19 Jun 2025 15:35:33 +0000 (21:05 +0530)]
smb: minor fix to use sizeof to initialize flags_string buffer
Replaced hardcoded length with sizeof(flags_string).
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Bharath SM [Thu, 19 Jun 2025 15:35:32 +0000 (21:05 +0530)]
smb: Use loff_t for directory position in cached_dirents
Change the pos field in struct cached_dirents from int to loff_t
to support large directory offsets. This avoids overflow and
matches kernel conventions for directory positions.
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Paul Aurich [Wed, 20 Nov 2024 16:01:54 +0000 (08:01 -0800)]
smb: Log an error when close_all_cached_dirs fails
Under low-memory conditions, close_all_cached_dirs() can't move the
dentries to a separate list to dput() them once the locks are dropped.
This will result in a "Dentry still in use" error, so add an error
message that makes it clear this is what happened:
[ 495.281119] CIFS: VFS: \\otters.example.com\share Out of memory while dropping dentries
[ 495.281595] ------------[ cut here ]------------
[ 495.281887] BUG: Dentry ffff888115531138{i=78,n=/} still in use (2) [unmount of cifs cifs]
[ 495.282391] WARNING: CPU: 1 PID: 2329 at fs/dcache.c:1536 umount_check+0xc8/0xf0
Also, bail out of looping through all tcons as soon as a single
allocation fails, since we're already in trouble, and kmalloc() attempts
for subseqeuent tcons are likely to fail just like the first one did.
Signed-off-by: Paul Aurich <paul@darkrain42.org> Acked-by: Bharath SM <bharathsm@microsoft.com> Suggested-by: Ruben Devos <rdevos@oxya.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
David Howells [Wed, 18 Jun 2025 15:39:47 +0000 (16:39 +0100)]
cifs: Fix prepare_write to negotiate wsize if needed
Fix cifs_prepare_write() to negotiate the wsize if it is unset.
Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
cc: Tom Talpey <tom@talpey.com>
cc: linux-cifs@vger.kernel.org Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Tom Talpey <tom@talpey.com> Fixes: c45ebd636c32 ("cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
zhangjian [Thu, 19 Jun 2025 01:18:29 +0000 (09:18 +0800)]
smb: client: fix first command failure during re-negotiation
after fabc4ed200f9, server_unresponsive add a condition to check whether client
need to reconnect depending on server->lstrp. When client failed to reconnect
for some time and abort connection, server->lstrp is updated for the last time.
In the following scene, server->lstrp is too old. This cause next command
failure in re-negotiation rather than waiting for re-negotiation done.
1. mount -t cifs -o username=Everyone,echo_internal=10 //$server_ip/export /mnt
2. ssh $server_ip "echo b > /proc/sysrq-trigger &"
3. ls /mnt
4. sleep 21s
5. ssh $server_ip "service firewalld stop"
6. ls # return EHOSTDOWN
If the interval between 5 and 6 is too small, 6 may trigger sending negotiation
request. Before backgrounding cifsd thread try to receive negotiation response
from server in cifs_readv_from_socket, server_unresponsive may trigger
cifs_reconnect which cause 6 to be failed:
ls thread
----------------
smb2_negotiate
server->tcpStatus = CifsInNegotiate
compound_send_recv
wait_for_compound_request
ls thread
----------------
cifs_sync_mid_result return EAGAIN
smb2_negotiate return EHOSTDOWN
Though server->lstrp means last server response time, it is updated in
cifs_abort_connection and cifs_get_tcp_session. We can also update server->lstrp
before switching into CifsInNegotiate state to avoid failure in 6.
Fixes: 7ccc1465465d ("smb: client: fix hang in wait_for_response() for negproto") Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Acked-by: Meetakshi Setiya <msetiya@microsoft.com> Signed-off-by: zhangjian <zhangjian496@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Linus Torvalds [Sat, 21 Jun 2025 15:27:12 +0000 (08:27 -0700)]
Merge tag 'acpi-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix a crash in ACPICA while attempting to evaluate a control method
that expects more arguments than are being passed to it, which was
exposed by a defective firmware update from a prominent OEM on
multiple systems (Rafael Wysocki)"
* tag 'acpi-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Refuse to evaluate a method if arguments are missing