KVM: VMX: Inject #UD if guest tries to execute SEAMCALL or TDCALL
Add VMX exit handlers for SEAMCALL and TDCALL to inject a #UD if a non-TD
guest attempts to execute SEAMCALL or TDCALL. Neither SEAMCALL nor TDCALL
is gated by any software enablement other than VMXON, and so will generate
a VM-Exit instead of e.g. a native #UD when executed from the guest kernel.
Note! No unprivileged DoS of the L1 kernel is possible as TDCALL and
SEAMCALL #GP at CPL > 0, and the CPL check is performed prior to the VMX
non-root (VM-Exit) check, i.e. userspace can't crash the VM. And for a
nested guest, KVM forwards unknown exits to L1, i.e. an L2 kernel can
crash itself, but not L1.
Note #2! The IntelĀ® Trust Domain CPU Architectural Extensions spec's
pseudocode shows the CPL > 0 check for SEAMCALL coming _after_ the VM-Exit,
but that appears to be a documentation bug (likely because the CPL > 0
check was incorrectly bundled with other lower-priority #GP checks).
Testing on SPR and EMR shows that the CPL > 0 check is performed before
the VMX non-root check, i.e. SEAMCALL #GPs when executed in usermode.
Note #3! The aforementioned Trust Domain spec uses confusing pseudocode
that says that SEAMCALL will #UD if executed "inSEAM", but "inSEAM"
specifically means in SEAM Root Mode, i.e. in the TDX-Module. The long-
form description explicitly states that SEAMCALL generates an exit when
executed in "SEAM VMX non-root operation". But that's a moot point as the
TDX-Module injects #UD if the guest attempts to execute SEAMCALL, as
documented in the "Unconditionally Blocked Instructions" section of the
TDX-Module base specification.
Cc: stable@vger.kernel.org Cc: Kai Huang <kai.huang@intel.com> Cc: Xiaoyao Li <xiaoyao.li@intel.com> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20251016182148.69085-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
Paolo Bonzini [Sat, 18 Oct 2025 08:25:43 +0000 (10:25 +0200)]
Merge tag 'kvm-x86-fixes-6.18-rc2' of https://github.com/kvm-x86/linux into HEAD
KVM x86 fixes for 6.18:
- Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the
bug fixed by commit 3ccbf6f47098 ("KVM: x86/mmu: Return -EAGAIN if userspace
deletes/moves memslot during prefault")
- Don't try to get PMU capabbilities from perf when running a CPU with hybrid
CPUs/PMUs, as perf will rightly WARN.
- Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more
generic KVM_CAP_GUEST_MEMFD_FLAGS
- Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set
said flag to initialize memory as SHARED, irrespective of MMAP. The
behavior merged in 6.18 is that enabling mmap() implicitly initializes
memory as SHARED, which would result in an ABI collision for x86 CoCo VMs
as their memory is currently always initialized PRIVATE.
- Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private
memory, to enable testing such setups, i.e. to hopefully flush out any
other lurking ABI issues before 6.18 is officially released.
- Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP,
and host userspace accesses to mmap()'d private memory.
Marc Zyngier [Wed, 8 Oct 2025 21:35:15 +0000 (22:35 +0100)]
arm64: Revamp HCR_EL2.E2H RES1 detection
We currently have two ways to identify CPUs that only implement FEAT_VHE
and not FEAT_E2H0:
- either they advertise it via ID_AA64MMFR4_EL1.E2H0,
- or the HCR_EL2.E2H bit is RAO/WI
However, there is a third category of "cpus" that fall between these
two cases: on CPUs that do not implement FEAT_FGT, it is IMPDEF whether
an access to ID_AA64MMFR4_EL1 can trap to EL2 when the register value
is zero.
A consequence of this is that on systems such as Neoverse V2, a NV
guest cannot reliably detect that it is in a VHE-only configuration
(E2H is writable, and ID_AA64MMFR0_EL1 is 0), despite the hypervisor's
best effort to repaint the id register.
Replace the RAO/WI test by a sequence that makes use of the VHE
register remnapping between EL1 and EL2 to detect this situation,
and work out whether we get the VHE behaviour even after having
set HCR_EL2.E2H to 0.
This solves the NV problem, and provides a more reliable acid test
for CPUs that do not completely follow the letter of the architecture
while providing a RES1 behaviour for HCR_EL2.E2H.
Suggested-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Tested-by: Jan Kotas <jank@cadence.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/15A85F2B-1A0C-4FA7-9FE4-EEC2203CC09E@global.cadence.com
Oliver Upton [Wed, 24 Sep 2025 23:51:50 +0000 (16:51 -0700)]
KVM: arm64: nv: Use FGT write trap of MDSCR_EL1 when available
Marc reports that the performance of running an L3 guest has regressed
by 60% as a result of setting MDCR_EL2.TDA to hide bad architecture.
That's of course terrible for the single user of recursive NV ;-)
While there's nothing to be done on non-FGT systems, take advantage of
the precise write trap of MDSCR_EL1 and leave the rest of the debug
registers untrapped.
Reported-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
Oliver Upton [Wed, 24 Sep 2025 23:51:49 +0000 (16:51 -0700)]
KVM: arm64: Compute per-vCPU FGTs at vcpu_load()
To date KVM has used the fine-grained traps for the sake of UNDEF
enforcement (so-called FGUs), meaning the constituent parts could be
computed on a per-VM basis and folded into the effective value when
programmed.
Prepare for traps changing based on the vCPU context by computing the
whole mess of them at vcpu_load(). Aggressively inline all the helpers
to preserve the build-time checks that were there before.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Mon, 29 Sep 2025 16:04:57 +0000 (17:04 +0100)]
KVM: arm64: selftests: Fix misleading comment about virtual timer encoding
The userspace-visible encoding for CNTV_CVAL_EL0 and CNTVCNT_EL0
have been swapped for as long as usersapce has had access to the
registers. This is documented in arch/arm64/include/uapi/asm/kvm.h.
Despite that, the get_reg_list test has unhelpful comments indicating
the wrong register for the encoding.
Replace this with definitions exposed in the include file, and
a comment explaining again the brokenness.
Marc Zyngier [Mon, 29 Sep 2025 16:04:53 +0000 (17:04 +0100)]
KVM: arm64: Fix WFxT handling of nested virt
The spec for WFxT indicates that the parameter to the WFxT instruction
is relative to the reading of CNTVCT_EL0. This means that the implementation
needs to take the execution context into account, as CNTVOFF_EL2
does not always affect readings of CNTVCT_EL0 (such as when HCR_EL2.E2H
is 1 and that we're in host context).
This also rids us of the last instance of KVM_REG_ARM_TIMER_CNT
outside of the userspace interaction code.
Marc Zyngier [Mon, 29 Sep 2025 16:04:52 +0000 (17:04 +0100)]
KVM: arm64: Move CNT*CT_EL0 userspace accessors to generic infrastructure
Moving the counter registers is a bit more involved than for the control
and comparator (there is no shadow data for the counter), but still
pretty manageable.
Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Mon, 29 Sep 2025 16:04:49 +0000 (17:04 +0100)]
KVM: arm64: Add timer UAPI workaround to sysreg infrastructure
Amongst the numerous bugs that plague the KVM/arm64 UAPI, one of
the most annoying thing is that the userspace view of the virtual
timer has its CVAL and CNT encodings swapped.
In order to reduce the amount of code that has to know about this,
start by adding handling for this bug in the sys_reg code.
Nothing is making use of it yet, as the code responsible for userspace
interaction is catching the accesses early.
We currently have a vcpu pointer nested into each timer context.
As we are about to remove this pointer, introduce a helper (aptly
named timer_context_to_vcpu()) that returns this pointer, at least
until we repaint the data structure.
Sascha Bischoff [Tue, 7 Oct 2025 15:48:54 +0000 (15:48 +0000)]
Documentation: KVM: Update GICv3 docs for GICv5 hosts
GICv5 hosts optionally include FEAT_GCIE_LEGACY, which allows them to
execute GICv3-based VMs on GICv5 hardware. Update the GICv3
documentation to reflect this now that GICv3 guests are supports on
compatible GICv5 hosts.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
Sascha Bischoff [Tue, 7 Oct 2025 16:07:13 +0000 (16:07 +0000)]
KVM: arm64: gic-v3: Only set ICH_HCR traps for v2-on-v3 or v3 guests
The ICH_HCR_EL2 traps are used when running on GICv3 hardware, or when
running a GICv3-based guest using FEAT_GCIE_LEGACY on GICv5
hardware. When running a GICv2 guest on GICv3 hardware the traps are
used to ensure that the guest never sees any part of GICv3 (only GICv2
is visible to the guest), and when running a GICv3 guest they are used
to trap in specific scenarios. They are not applicable for a
GICv2-native guest, and won't be applicable for a(n upcoming) GICv5
guest.
The traps themselves are configured in the vGIC CPU IF state, which is
stored as a union. Updating the wrong aperture of the union risks
corrupting state, and therefore needs to be avoided at all costs.
Bail early if we're not running a compatible guest (GICv2 on GICv3
hardware, GICv3 native, GICv3 on GICv5 hardware). Trap everything
unconditionally if we're running a GICv2 guest on GICv3
hardware. Otherwise, conditionally set up GICv3-native trapping.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
Oliver Upton [Tue, 7 Oct 2025 19:52:55 +0000 (12:52 -0700)]
KVM: arm64: selftests: Actually enable IRQs in vgic_lpi_stress
vgic_lpi_stress rather hilariously leaves IRQs disabled for the duration
of the test. While the ITS translation of MSIs happens regardless of
this, for completeness the guest should actually handle the LPIs.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Zenghui Yu <zenghui.yu@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
Zenghui Yu [Wed, 8 Oct 2025 15:45:20 +0000 (23:45 +0800)]
KVM: arm64: selftests: Allocate vcpus with correct size
vcpus array contains pointers to struct kvm_vcpu {}. It is way overkill
to allocate the array with (nr_cpus * sizeof(struct kvm_vcpu)). Fix the
allocation by using the correct size.
Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
Mukesh Ojha [Fri, 10 Oct 2025 17:47:07 +0000 (23:17 +0530)]
KVM: arm64: Guard PMSCR_EL1 initialization with SPE presence check
Commit efad60e46057 ("KVM: arm64: Initialize PMSCR_EL1 when in VHE")
does not perform sufficient check before initializing PMSCR_EL1 to 0
when running in VHE mode. On some platforms, this causes the system to
hang during boot, as EL3 has not delegated access to the Profiling
Buffer to the Non-secure world, nor does it reinject an UNDEF on sysreg
trap.
To avoid this issue, restrict the PMSCR_EL1 initialization to CPUs that
support Statistical Profiling Extension (FEAT_SPE) and have the
Profiling Buffer accessible in Non-secure EL1. This is determined via a
new helper `cpu_has_spe()` which checks both PMSVer and PMBIDR_EL1.P.
This ensures the initialization only affects CPUs where SPE is
implemented and usable, preventing boot failures on platforms where SPE
is not properly configured.
Fixes: efad60e46057 ("KVM: arm64: Initialize PMSCR_EL1 when in VHE") Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
Oliver Upton [Tue, 30 Sep 2025 19:33:02 +0000 (12:33 -0700)]
KVM: selftests: Fix irqfd_test for non-x86 architectures
The KVM_IRQFD ioctl fails if no irqchip is present in-kernel, which
isn't too surprising as there's not much KVM can do for an IRQ if it
cannot resolve a destination.
As written the irqfd_test assumes that a 'default' VM created in
selftests has an in-kernel irqchip created implicitly. That may be the
case on x86 but it isn't necessarily true on other architectures.
Add an arch predicate indicating if 'default' VMs get an irqchip and
make the irqfd_test depend on it. Work around arm64 VGIC initialization
requirements by using vm_create_with_one_vcpu(), ignoring the created
vCPU as it isn't used for the test.
Reported-by: Sebastian Ott <sebott@redhat.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Acked-by: Sean Christopherson <seanjc@google.com> Fixes: 7e9b231c402a ("KVM: selftests: Add a KVM_IRQFD test to verify uniqueness requirements") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
Oliver Upton [Tue, 30 Sep 2025 23:36:20 +0000 (16:36 -0700)]
KVM: arm64: Document vCPU event ioctls as requiring init'ed vCPU
KVM rejects calls to KVM_{GET,SET}_VCPU_EVENTS for an uninitialized vCPU
as of commit cc96679f3c03 ("KVM: arm64: Prevent access to vCPU events
before init"). Update the corresponding API documentation.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
Oliver Upton [Tue, 30 Sep 2025 08:52:37 +0000 (01:52 -0700)]
KVM: arm64: Prevent access to vCPU events before init
Another day, another syzkaller bug. KVM erroneously allows userspace to
pend vCPU events for a vCPU that hasn't been initialized yet, leading to
KVM interpreting a bunch of uninitialized garbage for routing /
injecting the exception.
In one case the injection code and the hyp disagree on whether the vCPU
has a 32bit EL1 and put the vCPU into an illegal mode for AArch64,
tripping the BUG() in exception_target_el() during the next injection:
Reject the ioctls outright as no sane VMM would call these before
KVM_ARM_VCPU_INIT anyway. Even if it did the exception would've been
thrown away by the eventual reset of the vCPU's state.
Cc: stable@vger.kernel.org # 6.17 Fixes: b7b27facc7b5 ("arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
KVM: arm64: selftests: Track width of timer counter as "int", not "uint64_t"
Store the width of arm64's timer counter as an "int", not a "uint64_t".
ilog2() returns an "int", and more importantly using what is an "unsigned
long" under the hood makes clang unhappy due to a type mismatch when
clamping the width to a sane value.
Fixes: fad4cf944839 ("KVM: arm64: selftests: Determine effective counter width in arch_timer_edge_cases") Cc: Sebastian Ott <sebott@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
Oliver Upton [Fri, 26 Sep 2025 22:44:54 +0000 (15:44 -0700)]
KVM: arm64: selftests: Test effective value of HCR_EL2.AMO
A defect against the architecture now allows an implementation to treat
AMO as 1 when HCR_EL2.{E2H, TGE} = {1, 0}. KVM now takes advantage of
this interpretation to address a quality of emulation issue w.r.t.
SError injection.
Add a corresponding test case and expect a pending SError to be taken.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
Oliver Upton [Fri, 26 Sep 2025 22:42:46 +0000 (15:42 -0700)]
KVM: arm64: Use the in-context stage-1 in __kvm_find_s1_desc_level()
Running the external_aborts selftest at EL2 leads to an ugly splat due
to the stage-1 MMU being disabled for the walked context, owing to the
fact that __kvm_find_s1_desc_level() is hardcoded to the EL1&0 regime.
Select the appropriate translation regime for the stage-1 walk based on
the current vCPU context.
Fixes: b8e625167a32 ("KVM: arm64: Add S1 IPA to page table level walker") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 26 Sep 2025 19:41:08 +0000 (12:41 -0700)]
KVM: arm64: nv: Don't advance PC when pending an SVE exception
Jan reports that running a nested guest on Neoverse-V2 leads to a WARN
in the host due to simultaneously pending an exception and PC increment
after an access to ZCR_EL2.
Returning true from a sysreg accessor is an indication that the sysreg
instruction has been retired. Of course this isn't the case when we've
pended a synchronous SVE exception for the guest. Fix the return value
and let the exception propagate to the guest as usual.
Reported-by: Jan Kotas <jank@cadence.com> Closes: https://lore.kernel.org/kvmarm/865xd61tt5.wl-maz@kernel.org/ Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
Oliver Upton [Fri, 26 Sep 2025 19:41:07 +0000 (12:41 -0700)]
KVM: arm64: nv: Don't treat ZCR_EL2 as a 'mapped' register
Unlike the other mapped EL2 sysregs ZCR_EL2 isn't guaranteed to be
resident when a vCPU is loaded as it actually follows the SVE
context. As such, the contents of ZCR_EL1 may belong to another guest if
the vCPU has been preempted before reaching sysreg emulation.
Unconditionally use the in-memory value of ZCR_EL2 and switch to the
memory-only accessors. The in-memory value is guaranteed to be valid as
fpsimd_lazy_switch_to_{guest,host}() will restore/save the register
appropriately.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
Linus Torvalds [Sun, 12 Oct 2025 20:27:56 +0000 (13:27 -0700)]
Merge tag 'i2c-for-6.18-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"One revert because of a regression in the I2C core which has sadly not
showed up during its time in -next"
* tag 'i2c-for-6.18-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
Revert "i2c: boardinfo: Annotate code used in init phase only"
Linus Torvalds [Sun, 12 Oct 2025 15:45:52 +0000 (08:45 -0700)]
Merge tag 'irq_urgent_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Skip interrupt ID 0 in sifive-plic during suspend/resume because
ID 0 is reserved and accessing reserved register space could result
in undefined behavior
- Fix a function's retval check in aspeed-scu-ic
* tag 'irq_urgent_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/sifive-plic: Avoid interrupt ID 0 handling during suspend/resume
irqchip/aspeed-scu-ic: Fix an IS_ERR() vs NULL check
Linus Torvalds [Sat, 11 Oct 2025 23:06:04 +0000 (16:06 -0700)]
Merge tag 'trace-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
"The previous fix to trace_marker required updating trace_marker_raw as
well. The difference between trace_marker_raw from trace_marker is
that the raw version is for applications to write binary structures
directly into the ring buffer instead of writing ASCII strings. This
is for applications that will read the raw data from the ring buffer
and get the data structures directly. It's a bit quicker than using
the ASCII version.
Unfortunately, it appears that our test suite has several tests that
test writes to the trace_marker file, but lacks any tests to the
trace_marker_raw file (this needs to be remedied). Two issues came
about the update to the trace_marker_raw file that syzbot found:
- Fix tracing_mark_raw_write() to use per CPU buffer
The fix to use the per CPU buffer to copy from user space was
needed for both the trace_maker and trace_maker_raw file.
The fix for reading from user space into per CPU buffers properly
fixed the trace_marker write function, but the trace_marker_raw
file wasn't fixed properly. The user space data was correctly
written into the per CPU buffer, but the code that wrote into the
ring buffer still used the user space pointer and not the per CPU
buffer that had the user space data already written.
- Stop the fortify string warning from writing into trace_marker_raw
After converting the copy_from_user_nofault() into a memcpy(),
another issue appeared. As writes to the trace_marker_raw expects
binary data, the first entry is a 4 byte identifier. The entry
structure is defined as:
struct {
struct trace_entry ent;
int id;
char buf[];
};
The size of this structure is reserved on the ring buffer with:
size = sizeof(*entry) + cnt;
Then it is copied from the buffer into the ring buffer with:
memcpy(&entry->id, buf, cnt);
This use to be a copy_from_user_nofault(), but now converting it to
a memcpy() triggers the fortify-string code, and causes a warning.
The allocated space is actually more than what is copied, as the
cnt used also includes the entry->id portion. Allocating
sizeof(*entry) plus cnt is actually allocating 4 bytes more than
what is needed.
* tag 'trace-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Stop fortify-string from warning in tracing_mark_raw_write()
tracing: Fix tracing_mark_raw_write() to use buf and not ubuf
Linus Torvalds [Sat, 11 Oct 2025 22:47:12 +0000 (15:47 -0700)]
Merge tag 'kbuild-fixes-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild fixes from Nathan Chancellor:
- Fix UAPI types check in headers_check.pl
- Only enable -Werror for hostprogs with CONFIG_WERROR / W=e
- Ignore fsync() error when output of gen_init_cpio is a pipe
- Several little build fixes for recent modules.builtin.modinfo series
* tag 'kbuild-fixes-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kbuild: Use '--strip-unneeded-symbol' for removing module device table symbols
s390/vmlinux.lds.S: Move .vmlinux.info to end of allocatable sections
kbuild: Add '.rel.*' strip pattern for vmlinux
kbuild: Restore pattern to avoid stripping .rela.dyn from vmlinux
gen_init_cpio: Ignore fsync() returning EINVAL on pipes
scripts/Makefile.extrawarn: Respect CONFIG_WERROR / W=e for hostprogs
kbuild: uapi: Strip comments before size type check
Reported-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Closes: https://lore.kernel.org/r/29ec0082-4dd4-4120-acd2-44b35b4b9487@oss.qualcomm.com Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Linus Torvalds [Sat, 11 Oct 2025 18:56:47 +0000 (11:56 -0700)]
Merge tag 'rtc-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"This cycle, we have a new RTC driver, for the SpacemiT P1. The optee
driver gets alarm support. We also get a fix for a race condition that
was fairly rare unless while stress testing the alarms.
Subsystem:
- Fix race when setting alarm
- Ensure alarm irq is enabled when UIE is enabled
- remove unneeded 'fast_io' parameter in regmap_config
New driver:
- SpacemiT P1 RTC
Drivers:
- efi: Remove wakeup functionality
- optee: add alarms support
- s3c: Drop support for S3C2410
- zynqmp: Restore alarm functionality after kexec transition"
* tag 'rtc-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (29 commits)
rtc: interface: Ensure alarm irq is enabled when UIE is enabled
rtc: tps6586x: Fix initial enable_irq/disable_irq balance
rtc: cpcap: Fix initial enable_irq/disable_irq balance
rtc: isl12022: Fix initial enable_irq/disable_irq balance
rtc: interface: Fix long-standing race when setting alarm
rtc: pcf2127: fix watchdog interrupt mask on pcf2131
rtc: zynqmp: Restore alarm functionality after kexec transition
rtc: amlogic-a4: Optimize global variables
rtc: sd2405al: Add I2C address.
rtc: Kconfig: move symbols to proper section
rtc: optee: make optee_rtc_pm_ops static
rtc: optee: Fix error code in optee_rtc_read_alarm()
rtc: optee: fix error code in probe()
dt-bindings: rtc: Convert apm,xgene-rtc to DT schema
rtc: spacemit: support the SpacemiT P1 RTC
rtc: optee: add alarm related rtc ops to optee rtc driver
rtc: optee: remove unnecessary memory operations
rtc: optee: fix memory leak on driver removal
rtc: x1205: Fix Xicor X1205 vendor prefix
dt-bindings: rtc: Fix Xicor X1205 vendor prefix
...
Linus Torvalds [Sat, 11 Oct 2025 18:49:00 +0000 (11:49 -0700)]
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Fixes only in drivers (ufs, mvsas, qla2xxx, target) that came in just
before or during the merge window.
The most important one is the qla2xxx which reverts a conversion to
fix flexible array member warnings, that went up in this merge window
but which turned out on further testing to be causing data corruption"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Include UTP error in INT_FATAL_ERRORS
scsi: ufs: sysfs: Make HID attributes visible
scsi: mvsas: Fix use-after-free bugs in mvs_work_queue
scsi: ufs: core: Fix PM QoS mutex initialization
scsi: ufs: core: Fix runtime suspend error deadlock
Revert "scsi: qla2xxx: Fix memcpy() field-spanning write issue"
scsi: target: target_core_configfs: Add length check to avoid buffer overflow
Linus Torvalds [Sat, 11 Oct 2025 18:19:16 +0000 (11:19 -0700)]
Merge tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more x86 updates from Borislav Petkov:
- Remove a bunch of asm implementing condition flags testing in KVM's
emulator in favor of int3_emulate_jcc() which is written in C
- Replace KVM fastops with C-based stubs which avoids problems with the
fastop infra related to latter not adhering to the C ABI due to their
special calling convention and, more importantly, bypassing compiler
control-flow integrity checking because they're written in asm
- Remove wrongly used static branches and other ugliness accumulated
over time in hyperv's hypercall implementation with a proper static
function call to the correct hypervisor call variant
- Add some fixes and modifications to allow running FRED-enabled
kernels in KVM even on non-FRED hardware
- Add kCFI improvements like validating indirect calls and prepare for
enabling kCFI with GCC. Add cmdline params documentation and other
code cleanups
- Use the single-byte 0xd6 insn as the official #UD single-byte
undefined opcode instruction as agreed upon by both x86 vendors
- Other smaller cleanups and touchups all over the place
* tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86,retpoline: Optimize patch_retpoline()
x86,ibt: Use UDB instead of 0xEA
x86/cfi: Remove __noinitretpoline and __noretpoline
x86/cfi: Add "debug" option to "cfi=" bootparam
x86/cfi: Standardize on common "CFI:" prefix for CFI reports
x86/cfi: Document the "cfi=" bootparam options
x86/traps: Clarify KCFI instruction layout
compiler_types.h: Move __nocfi out of compiler-specific header
objtool: Validate kCFI calls
x86/fred: KVM: VMX: Always use FRED for IRQs when CONFIG_X86_FRED=y
x86/fred: Play nice with invoking asm_fred_entry_from_kvm() on non-FRED hardware
x86/fred: Install system vector handlers even if FRED isn't fully enabled
x86/hyperv: Use direct call to hypercall-page
x86/hyperv: Clean up hv_do_hypercall()
KVM: x86: Remove fastops
KVM: x86: Convert em_salc() to C
KVM: x86: Introduce EM_ASM_3WCL
KVM: x86: Introduce EM_ASM_1SRC2
KVM: x86: Introduce EM_ASM_2CL
KVM: x86: Introduce EM_ASM_2W
...
Linus Torvalds [Sat, 11 Oct 2025 17:51:14 +0000 (10:51 -0700)]
Merge tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Borislav Petkov:
- Simplify inline asm flag output operands now that the minimum
compiler version supports the =@ccCOND syntax
- Remove a bunch of AS_* Kconfig symbols which detect assembler support
for various instruction mnemonics now that the minimum assembler
version supports them all
- The usual cleanups all over the place
* tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Remove code depending on __GCC_ASM_FLAG_OUTPUTS__
x86/sgx: Use ENCLS mnemonic in <kernel/cpu/sgx/encls.h>
x86/mtrr: Remove license boilerplate text with bad FSF address
x86/asm: Use RDPKRU and WRPKRU mnemonics in <asm/special_insns.h>
x86/idle: Use MONITORX and MWAITX mnemonics in <asm/mwait.h>
x86/entry/fred: Push __KERNEL_CS directly
x86/kconfig: Remove CONFIG_AS_AVX512
crypto: x86 - Remove CONFIG_AS_VPCLMULQDQ
crypto: X86 - Remove CONFIG_AS_VAES
crypto: x86 - Remove CONFIG_AS_GFNI
x86/kconfig: Drop unused and needless config X86_64_SMP
Linus Torvalds [Sat, 11 Oct 2025 17:40:24 +0000 (10:40 -0700)]
Merge tag 'slab-for-6.18-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:
"A NULL pointer deref hotfix"
* tag 'slab-for-6.18-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
slab: fix barn NULL pointer dereference on memoryless nodes
- Fix metadata_dst leak in __bpf_redirect_neigh_v{4,6}() (Daniel
Borkmann)
- Fix undefined behavior in {get,put}_unaligned_be32() (Eric Biggers)
- Use correct context to unpin bpf hash map with special types (KaFai
Wan)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Add test for unpinning htab with internal timer struct
bpf: Avoid RCU context warning when unpinning htab with internal structs
xsk: Harden userspace-supplied xdp_desc validation
bpf: Fix metadata_dst leak __bpf_redirect_neigh_v{4,6}
libbpf: Fix undefined behavior in {get,put}_unaligned_be32()
bpf: Finish constification of 1st parameter of bpf_d_path()
Linus Torvalds [Sat, 11 Oct 2025 17:27:52 +0000 (10:27 -0700)]
Merge tag 'mm-nonmm-stable-2025-10-10-15-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more updates from Andrew Morton:
"Just one series here - Mike Rappoport has taught KEXEC handover to
preserve vmalloc allocations across handover"
* tag 'mm-nonmm-stable-2025-10-10-15-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
lib/test_kho: use kho_preserve_vmalloc instead of storing addresses in fdt
kho: add support for preserving vmalloc allocations
kho: replace kho_preserve_phys() with kho_preserve_pages()
kho: check if kho is finalized in __kho_preserve_order()
MAINTAINERS, .mailmap: update Umang's email address
Linus Torvalds [Sat, 11 Oct 2025 17:14:55 +0000 (10:14 -0700)]
Merge tag 'mm-hotfixes-stable-2025-10-10-15-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"7 hotfixes. All 7 are cc:stable and all 7 are for MM.
All singletons, please see the changelogs for details"
* tag 'mm-hotfixes-stable-2025-10-10-15-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: hugetlb: avoid soft lockup when mprotect to large memory area
fsnotify: pass correct offset to fsnotify_mmap_perm()
mm/ksm: fix flag-dropping behavior in ksm_madvise
mm/damon/vaddr: do not repeat pte_offset_map_lock() until success
mm/rmap: fix soft-dirty and uffd-wp bit loss when remapping zero-filled mTHP subpage to shared zeropage
mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
memcg: skip cgroup_file_notify if spinning is not allowed
This is because fortify string sees that the size of entry->id is only 4
bytes, but it is writing more than that. But this is OK as the
dynamic_array is allocated to handle that copy.
The size allocated on the ring buffer was actually a bit too big:
size = sizeof(*entry) + cnt;
But cnt includes the 'id' and the buffer data, so adding cnt to the size
of *entry actually allocates too much on the ring buffer.
Vlastimil Babka [Sat, 11 Oct 2025 08:45:41 +0000 (10:45 +0200)]
slab: fix barn NULL pointer dereference on memoryless nodes
Phil reported a boot failure once sheaves become used in commits 59faa4da7cd4 ("maple_tree: use percpu sheaves for maple_node_cache") and 3accabda4da1 ("mm, vma: use percpu sheaves for vm_area_struct cache"):
Linus decoded the stacktrace to get_barn() and get_node() and determined
that kmem_cache->node[numa_mem_id()] is NULL.
The problem is due to a wrong assumption that memoryless nodes only
exist on systems with CONFIG_HAVE_MEMORYLESS_NODES, where numa_mem_id()
points to the nearest node that has memory. SLUB has been allocating its
kmem_cache_node structures only on nodes with memory and so it does with
struct node_barn.
For kmem_cache_node, get_partial_node() checks if get_node() result is
not NULL, which I assumed was for protection from a bogus node id passed
to kmalloc_node() but apparently it's also for systems where
numa_mem_id() (used when no specific node is given) might return a
memoryless node.
Fix the sheaves code the same way by checking the result of get_node()
and bailing out if it's NULL. Note that cpus on such memoryless nodes
will have degraded sheaves performance, which can be improved later,
preferably by making numa_mem_id() work properly on such systems.
Steven Rostedt [Sat, 11 Oct 2025 03:51:42 +0000 (23:51 -0400)]
tracing: Fix tracing_mark_raw_write() to use buf and not ubuf
The fix to use a per CPU buffer to read user space tested only the writes
to trace_marker. But it appears that the selftests are missing tests to
the trace_maker_raw file. The trace_maker_raw file is used by applications
that writes data structures and not strings into the file, and the tools
read the raw ring buffer to process the structures it writes.
The fix that reads the per CPU buffers passes the new per CPU buffer to
the trace_marker file writes, but the update to the trace_marker_raw write
read the data from user space into the per CPU buffer, but then still used
then passed the user space address to the function that records the data.
Pass in the per CPU buffer and not the user space address.
TODO: Add a test to better test trace_marker_raw.
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20251011035243.386098147@kernel.org Fixes: 64cf7d058a00 ("tracing: Have trace_marker use per-cpu data to read user space") Reported-by: syzbot+9a2ede1643175f350105@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68e973f5.050a0220.1186a4.0010.GAE@google.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
kbuild: Use '--strip-unneeded-symbol' for removing module device table symbols
After commit 5ab23c7923a1 ("modpost: Create modalias for builtin
modules"), relocatable RISC-V kernels with CONFIG_KASAN=y start failing
when attempting to strip the module device table symbols:
riscv64-linux-objcopy: not stripping symbol `__mod_device_table__kmod_irq_starfive_jh8100_intc__of__starfive_intc_irqchip_match_table' because it is named in a relocation
make[4]: *** [scripts/Makefile.vmlinux:97: vmlinux] Error 1
The relocation appears to come from .LASANLOC5 in .data.rel.local:
This section appears to come from GCC for including additional
information about global variables that may be protected by KASAN.
There appears to be no way to opt out of the generation of these symbols
through either a flag or attribute. Attempting to remove '.LASANLOC*'
with '--strip-symbol' results in the same error as above because these
symbols may refer to (thus have relocation between) each other.
Avoid this build breakage by switching to '--strip-unneeded-symbol' for
removing __mod_device_table__ symbols, as it will only remove the symbol
when there is no relocation pointing to it. While this may result in a
little more bloat in the symbol table in certain configurations, it is
not as bad as outright build failures.
Fixes: 5ab23c7923a1 ("modpost: Create modalias for builtin modules") Reported-by: Charles Mirabile <cmirabil@redhat.com> Closes: https://lore.kernel.org/20251007011637.2512413-1-cmirabil@redhat.com/ Suggested-by: Alexey Gladkov <legion@kernel.org> Tested-by: Nicolas Schier <nsc@kernel.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org>
KVM: selftests: Verify that reads to inaccessible guest_memfd VMAs SIGBUS
Expand the guest_memfd negative testcases for overflow and MAP_PRIVATE to
verify that reads to inaccessible memory also get a SIGBUS.
Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Lisa Wang <wyihan@google.com> Tested-by: Lisa Wang <wyihan@google.com> Link: https://lore.kernel.org/r/20251003232606.4070510-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: selftests: Verify that faulting in private guest_memfd memory fails
Add a guest_memfd testcase to verify that faulting in private memory gets
a SIGBUS. For now, test only the case where memory is private by default
since KVM doesn't yet support in-place conversion.
Deliberately run the CoW test with and without INIT_SHARED set as KVM
should disallow MAP_PRIVATE regardless of whether the memory itself is
private from a CoCo perspective.
KVM: selftests: Add wrapper macro to handle and assert on expected SIGBUS
Extract the guest_memfd test's SIGBUS handling functionality into a common
TEST_EXPECT_SIGBUS() macro in anticipation of adding more SIGBUS testcases.
Eating a SIGBUS isn't terrible difficult, but it requires a non-trivial
amount of boilerplate code, and using a macro allows selftests to print
out the exact action that failed to generate a SIGBUS without the developer
needing to remember to add a useful error message.
Explicitly mark the SIGBUS handler as "used", as gcc-14 at least likes to
discard the function before linking.
Opportunistically use TEST_FAIL(...) instead of TEST_ASSERT(false, ...),
and fix the write path of the guest_memfd test to use the local "val"
instead of hardcoding the literal value a second time.
Suggested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Lisa Wang <wyihan@google.com> Tested-by: Lisa Wang <wyihan@google.com> Link: https://lore.kernel.org/r/20251003232606.4070510-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: selftests: Add wrappers for mmap() and munmap() to assert success
Add and use wrappers for mmap() and munmap() that assert success to reduce
a significant amount of boilerplate code, to ensure all tests assert on
failure, and to provide consistent error messages on failure.
Ackerley Tng [Fri, 3 Oct 2025 23:26:01 +0000 (16:26 -0700)]
KVM: selftests: Add test coverage for guest_memfd without GUEST_MEMFD_FLAG_MMAP
If a VM type supports KVM_CAP_GUEST_MEMFD_MMAP, the guest_memfd test will
run all test cases with GUEST_MEMFD_FLAG_MMAP set. This leaves the code
path for creating a non-mmap()-able guest_memfd on a VM that supports
mappable guest memfds untested.
Refactor the test to run the main test suite with a given set of flags.
Then, for VM types that support the mappable capability, invoke the test
suite twice: once with no flags, and once with GUEST_MEMFD_FLAG_MMAP
set.
This ensures both creation paths are properly exercised on capable VMs.
Run test_guest_memfd_flags() only once per VM type since it depends only
on the set of valid/supported flags, i.e. iterating over an arbitrary set
of flags is both unnecessary and wrong.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
[sean: use double-underscores for the inner helper] Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20251003232606.4070510-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: selftests: Create a new guest_memfd for each testcase
Refactor the guest_memfd selftest to improve test isolation by creating a
a new guest_memfd for each testcase. Currently, the test reuses a single
guest_memfd instance for all testcases, and thus creates dependencies
between tests, e.g. not truncating folios from the guest_memfd instance
at the end of a test could lead to unexpected results (see the PUNCH_HOLE
purging that needs to done by in-flight the NUMA testcases[1]).
Invoke each test via a macro wrapper to create and close a guest_memfd
to cut down on the boilerplate copy+paste needed to create a test.
KVM: selftests: Stash the host page size in a global in the guest_memfd test
Use a global variable to track the host page size in the guest_memfd test
so that the information doesn't need to be constantly passed around. The
state is purely a reflection of the underlying system, i.e. can't be set
by the test and is constant for a given invocation of the test, and thus
explicitly passing the host page size to individual testcases adds no
value, e.g. doesn't allow testing different combinations.
Making page_size a global will simplify an upcoming change to create a new
guest_memfd instance per testcase.
KVM: guest_memfd: Allow mmap() on guest_memfd for x86 VMs with private memory
Allow mmap() on guest_memfd instances for x86 VMs with private memory as
the need to track private vs. shared state in the guest_memfd instance is
only pertinent to INIT_SHARED. Doing mmap() on private memory isn't
terrible useful (yet!), but it's now possible, and will be desirable when
guest_memfd gains support for other VMA-based syscalls, e.g. mbind() to
set NUMA policy.
Lift the restriction now, before MMAP support is officially released, so
that KVM doesn't need to add another capability to enumerate support for
mmap() on private memory.
Fixes: 3d3a04fad25a ("KVM: Allow and advertise support for host mmap() on guest_memfd files") Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20251003232606.4070510-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: Explicitly mark KVM_GUEST_MEMFD as depending on KVM_GENERIC_MMU_NOTIFIER
Add KVM_GENERIC_MMU_NOTIFIER as a dependency for selecting KVM_GUEST_MEMFD,
as guest_memfd relies on kvm_mmu_invalidate_{begin,end}(), which are
defined if and only if the generic mmu_notifier implementation is enabled.
The missing dependency is currently benign as s390 is the only KVM arch
that doesn't utilize the generic mmu_notifier infrastructure, and s390
doesn't currently support guest_memfd.
Fixes: a7800aa80ea4 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory") Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20251003232606.4070510-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: guest_memfd: Invalidate SHARED GPAs if gmem supports INIT_SHARED
When invalidating gmem ranges, e.g. in response to PUNCH_HOLE, process all
possible range types (PRIVATE vs. SHARED) for the gmem instance. Since
since guest_memfd doesn't yet support in-place conversions, simply pivot
on INIT_SHARED as a gmem instance can currently only have private or shared
memory, not both.
Failure to mark shared GPAs for invalidation is benign in the current code
base, as only x86's TDX consumes KVM_FILTER_{PRIVATE,SHARED}, and TDX
doesn't yet support INIT_SHARED with guest_memfd. However, invalidating
only private GPAs is conceptually wrong and a lurking bug, e.g. could
result in missed invalidations if ARM starts filtering invalidations based
on attributes.
Fixes: 3d3a04fad25a ("KVM: Allow and advertise support for host mmap() on guest_memfd files") Reviewed-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20251003232606.4070510-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: guest_memfd: Add INIT_SHARED flag, reject user page faults if not set
Add a guest_memfd flag to allow userspace to state that the underlying
memory should be configured to be initialized as shared, and reject user
page faults if the guest_memfd instance's memory isn't shared. Because
KVM doesn't yet support in-place private<=>shared conversions, all
guest_memfd memory effectively follows the initial state.
Alternatively, KVM could deduce the initial state based on MMAP, which for
all intents and purposes is what KVM currently does. However, implicitly
deriving the default state based on MMAP will result in a messy ABI when
support for in-place conversions is added.
For x86 CoCo VMs, which don't yet support MMAP, memory is currently private
by default (otherwise the memory would be unusable). If MMAP implies
memory is shared by default, then the default state for CoCo VMs will vary
based on MMAP, and from userspace's perspective, will change when in-place
conversion support is added. I.e. to maintain guest<=>host ABI, userspace
would need to immediately convert all memory from shared=>private, which
is both ugly and inefficient. The inefficiency could be avoided by adding
a flag to state that memory is _private_ by default, irrespective of MMAP,
but that would lead to an equally messy and hard to document ABI.
Bite the bullet and immediately add a flag to control the default state so
that the effective behavior is explicit and straightforward.
Fixes: 3d3a04fad25a ("KVM: Allow and advertise support for host mmap() on guest_memfd files") Cc: David Hildenbrand <david@redhat.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20251003232606.4070510-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: Rework KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS
Rework the not-yet-released KVM_CAP_GUEST_MEMFD_MMAP into a more generic
KVM_CAP_GUEST_MEMFD_FLAGS capability so that adding new flags doesn't
require a new capability, and so that developers aren't tempted to bundle
multiple flags into a single capability.
Note, kvm_vm_ioctl_check_extension_generic() can only return a 32-bit
value, but that limitation can be easily circumvented by adding e.g.
KVM_CAP_GUEST_MEMFD_FLAGS2 in the unlikely event guest_memfd supports more
than 32 flags.
Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20251003232606.4070510-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
Dapeng Mi [Fri, 10 Oct 2025 00:52:39 +0000 (08:52 +0800)]
KVM: x86/pmu: Don't try to get perf capabilities for hybrid CPUs
Explicitly zero kvm_host_pmu instead of attempting to get the perf PMU
capabilities when running on a hybrid CPU to avoid running afoul of perf's
sanity check.
Always read the capabilities for non-hybrid CPUs, i.e. don't entirely
revert to reading capabilities if and only if KVM wants to use a PMU, as
it may be useful to have the host PMU capabilities available, e.g. if only
or debug.
Reported-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Closes: https://lore.kernel.org/all/70b64347-2aca-4511-af78-a767d5fa8226@intel.com/ Fixes: 51f34b1e650f ("KVM: x86/pmu: Snapshot host (i.e. perf's) reported PMU capabilities") Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20251010005239.146953-1-dapeng1.mi@linux.intel.com
[sean: rework changelog, call out hybrid CPUs in shortlog] Signed-off-by: Sean Christopherson <seanjc@google.com>
Linus Torvalds [Fri, 10 Oct 2025 21:06:02 +0000 (14:06 -0700)]
Merge tag 'for-6.18/hpfs-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull hpfs updates from Mikulas Patocka:
- Avoid -Wflex-array-member-not-at-end warnings
- Replace simple_strtoul with kstrtoint
- Fix error code for new_inode() failure
* tag 'for-6.18/hpfs-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
fs/hpfs: Fix error code for new_inode() failure in mkdir/create/mknod/symlink
hpfs: Replace simple_strtoul with kstrtoint in hpfs_parse_param
fs: hpfs: Avoid multiple -Wflex-array-member-not-at-end warnings
amdkfd:
- Fix kfd process ref leak
- mmap write lock handling fix
- Fix comments in IOCTL
xe:
- Fix build with clang 16
- Fix handling of invalid configfs syntax usage and spell out the
expected syntax in the documentation
- Do not try late bind firmware when running as VF since it shouldn't
handle firmware loading
- Fix idle assertion for local BOs
- Fix uninitialized variable for late binding
- Do not require perfmon_capable to expose free memory at page
granularity. Handle it like other drm drivers do
- Fix lock handling on suspend error path
- Fix I2C controller resume after S3
v3d:
- fix fence locking"
* tag 'drm-next-2025-10-11-1' of https://gitlab.freedesktop.org/drm/kernel: (34 commits)
drm/amd/display: Incorrect Mirror Cositing
drm/amd/display: Enable Dynamic DTBCLK Switch
drm/amdgpu: Report individual reset error
drm/amdgpu: partially revert "revert to old status lock handling v3"
drm/amd/display: Fix unsafe uses of kernel mode FPU
drm/amd/pm: Disable VCN queue reset on SMU v13.0.6 due to regression
drm/amdgpu: Fix general protection fault in amdgpu_vm_bo_reset_state_machine
drm/amdgpu: Check swus/ds for switch state save
drm/amdkfd: Fix two comments in kfd_ioctl.h
drm/amd/pm: Avoid interface mismatch messaging
drm/amdgpu: Merge amdgpu_vm_set_pasid into amdgpu_vm_init
drm/amd/amdgpu: Fix the mes version that support inv_tlbs
drm/amd: Check whether secure display TA loaded successfully
drm/amdkfd: Fix mmap write lock not release
drm/amdkfd: Fix kfd process ref leaking when userptr unmapping
drm/amdgpu: Fix for GPU reset being blocked by KIQ I/O.
drm/amd/display: Disable scaling on DCE6 for now
drm/amd/display: Properly disable scaling on DCE6
drm/amd/display: Properly clear SCL_*_FILTER_CONTROL on DCE6
drm/amd/display: Add missing DCE6 SCL_HORZ_FILTER_INIT* SRIs
...
Linus Torvalds [Fri, 10 Oct 2025 20:59:38 +0000 (13:59 -0700)]
Merge tag 'drm-fixes-2025-10-11' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Some fixes leftover from our fixes branch, just nouveau and vmwgfx:
nouveau:
- Return errno code from TTM move helper
vmwgfx:
- Fix null-ptr access in cursor code
- Fix UAF in validation
- Use correct iterator in validation"
* tag 'drm-fixes-2025-10-11' of https://gitlab.freedesktop.org/drm/kernel:
drm/nouveau: fix bad ret code in nouveau_bo_move_prep
drm/vmwgfx: Fix copy-paste typo in validation
drm/vmwgfx: Fix Use-after-free in validation
drm/vmwgfx: Fix a null-ptr access in the cursor snooper
Allow additional properties to enable devices attached to the bus.
Fixes warnings like these:
arch/arm/boot/dts/renesas/sh73a0-kzm9g.dtb: bus@fec10000 (renesas,bsc-sh73a0): Unevaluated properties are not allowed ('ethernet@10000000' was unexpected)
arch/arm/boot/dts/renesas/r8a73a4-ape6evm.dtb: bus@fec10000 (renesas,bsc-r8a73a4): Unevaluated properties are not allowed ('ethernet@8000000', 'flash@0' were unexpected)
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Node names are already and properly checked by the core schema. No need
to do it again.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
[robh: Also drop [A-F] in unit address] Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Linus Torvalds [Fri, 10 Oct 2025 18:30:19 +0000 (11:30 -0700)]
Merge tag 'ceph-for-6.18-rc1' of https://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
- some messenger improvements (Eric and Max)
- address an issue (also affected userspace) of incorrect permissions
being granted to users who have access to multiple different CephFS
instances within the same cluster (Kotresh)
- a bunch of assorted CephFS fixes (Slava)
* tag 'ceph-for-6.18-rc1' of https://github.com/ceph/ceph-client:
ceph: add bug tracking system info to MAINTAINERS
ceph: fix multifs mds auth caps issue
ceph: cleanup in ceph_alloc_readdir_reply_buffer()
ceph: fix potential NULL dereference issue in ceph_fill_trace()
libceph: add empty check to ceph_con_get_out_msg()
libceph: pass the message pointer instead of loading con->out_msg
libceph: make ceph_con_get_out_msg() return the message pointer
ceph: fix potential race condition on operations with CEPH_I_ODIRECT flag
ceph: refactor wake_up_bit() pattern of calling
ceph: fix potential race condition in ceph_ioctl_lazyio()
ceph: fix overflowed constant issue in ceph_do_objects_copy()
ceph: fix wrong sizeof argument issue in register_session()
ceph: add checking of wait_for_completion_killable() return value
ceph: make ceph_start_io_*() killable
libceph: Use HMAC-SHA256 library instead of crypto_shash
Linus Torvalds [Fri, 10 Oct 2025 18:23:57 +0000 (11:23 -0700)]
Merge tag 'v6.18-rc-part2-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull more smb client updates from Steve French:
- fix i_size in fallocate
- two truncate fixes
- utime fix
- minor cleanups
- SMB1 fixes
- improve error check in read
- improve perf of copy file_range (copy_chunk)
* tag 'v6.18-rc-part2-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal version number
cifs: Add comments for DeletePending assignments in open functions
cifs: Add fallback code path for cifs_mkdir_setinfo()
cifs: Allow fallback code in smb_set_file_info() also for directories
cifs: Query EA $LXMOD in cifs_query_path_info() for WSL reparse points
smb: client: remove cfids_invalidation_worker
smb: client: remove redudant assignment in cifs_strict_fsync()
smb: client: fix race with fallocate(2) and AIO+DIO
smb: client: fix missing timestamp updates after utime(2)
smb: client: fix missing timestamp updates after ftruncate(2)
smb: client: fix missing timestamp updates with O_TRUNC
cifs: Fix copy_to_iter return value check
smb: client: batch SRV_COPYCHUNK entries to cut round trips
smb: client: Omit an if branch in smb2_find_smb_tcon()
smb: client: Return directly after a failed genlmsg_new() in cifs_swn_send_register_message()
smb: client: Use common code in cifs_do_create()
smb: client: Improve unlocking of a mutex in cifs_get_swn_reg()
smb: client: Return a status code only as a constant in cifs_spnego_key_instantiate()
smb: client: Use common code in cifs_lookup()
smb: client: Reduce the scopes for a few variables in two functions
Linus Torvalds [Fri, 10 Oct 2025 18:20:19 +0000 (11:20 -0700)]
Merge tag 'xtensa-20251010' of https://github.com/jcmvbkbc/linux-xtensa
Pull Xtensa updates from Max Filippov:
- minor cleanups
* tag 'xtensa-20251010' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: use HZ_PER_MHZ in platform_calibrate_ccount
xtensa: simdisk: add input size check in proc_write_simdisk
Linus Torvalds [Fri, 10 Oct 2025 17:37:13 +0000 (10:37 -0700)]
Merge tag 'block-6.18-20251009' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- Don't include __GFP_NOWARN for loop worker allocation, as it already
uses GFP_NOWAIT which has __GFP_NOWARN set already
- Small series cleaning up the recent bio_iov_iter_get_pages() changes
- loop fix for leaking the backing reference file, if validation fails
- Update of a comment pertaining to disk/partition stat locking
* tag 'block-6.18-20251009' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
loop: remove redundant __GFP_NOWARN flag
block: move bio_iov_iter_get_bdev_pages to block/fops.c
iomap: open code bio_iov_iter_get_bdev_pages
block: rename bio_iov_iter_get_pages_aligned to bio_iov_iter_get_pages
block: remove bio_iov_iter_get_pages
block: Update a comment of disk statistics
loop: fix backing file reference leak on validation error
Linus Torvalds [Fri, 10 Oct 2025 17:25:24 +0000 (10:25 -0700)]
Merge tag 'io_uring-6.18-20251009' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fixes from Jens Axboe:
- Fixup indentation in the UAPI header
- Two fixes for zcrx. One fixes receiving too much in some cases, and
the other deals with not correctly incrementing the source in the
fallback copy loop
- Fix for a race in the IORING_OP_WAITID command, where there was a
small window where the request would be left on the wait_queue_head
list even though it was being canceled/completed
- Update liburing git URL in the kernel tree
* tag 'io_uring-6.18-20251009' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/zcrx: increment fallback loop src offset
io_uring/zcrx: fix overshooting recv limit
io_uring: use tab indentation for IORING_SEND_VECTORIZED comment
io_uring/waitid: always prune wait queue entry in io_waitid_wait()
io_uring: update liburing git URL
Merge patch series "kbuild: Fixes for fallout from recent modules.builtin.modinfo series"
This is a series to address some problems that were exposed by the
recent modules.builtin.modinfo series that landed in commit c7d3dd9163e6
("Merge patch series "Add generated modalias to
modules.builtin.modinfo"").
The third patch is not directly related to the aforementioned series, as
the warning it fixes happens prior to the series but commit 8d18ef04f940
("s390: vmlinux.lds.S: Reorder sections") from the series creates
conflicts in this area, so I included it here.
s390/vmlinux.lds.S: Move .vmlinux.info to end of allocatable sections
When building s390 defconfig with binutils older than 2.32, there are
several warnings during the final linking stage:
s390-linux-ld: .tmp_vmlinux1: warning: allocated section `.got.plt' not in segment
s390-linux-ld: .tmp_vmlinux2: warning: allocated section `.got.plt' not in segment
s390-linux-ld: vmlinux.unstripped: warning: allocated section `.got.plt' not in segment
s390-linux-objcopy: vmlinux: warning: allocated section `.got.plt' not in segment
s390-linux-objcopy: st7afZyb: warning: allocated section `.got.plt' not in segment
binutils commit afca762f598 ("S/390: Improve partial relro support for
64 bit") [1] in 2.32 changed where .got.plt is emitted, avoiding the
warning.
The :NONE in the .vmlinux.info output section description changes the
segment for subsequent allocated sections. Move .vmlinux.info right
above the discards section to place all other sections in the previously
defined segment, .data.
Prior to binutils commit c12d9fa2afe ("Support objcopy
--remove-section=.relaFOO") [1] in 2.32, stripping relocation sections
required the trailing period (i.e., '.rel.*') to work properly.
After commit 3e86e4d74c04 ("kbuild: keep .modinfo section in
vmlinux.unstripped"), there is an error with binutils 2.31.1 or earlier
because these sections are not properly removed:
s390-linux-objcopy: st6tO8Ev: symbol `.modinfo' required but not present
s390-linux-objcopy:st6tO8Ev: no symbols
Add the old pattern to resolve this issue (along with a comment to allow
cleaning this when binutils 2.32 or newer is the minimum supported
version). While the aforementioned kbuild change exposes this, the
pattern was originally changed by commit 71d815bf5dfd ("kbuild: Strip
runtime const RELA sections correctly"), where it would still be
incorrect with binutils older than 2.32.
kbuild: Restore pattern to avoid stripping .rela.dyn from vmlinux
Commit 0ce5139fd96e ("kbuild: always create intermediate
vmlinux.unstripped") removed the pattern to avoid stripping .rela.dyn
sections added by commit e9d86b8e17e7 ("scripts: Do not strip .rela.dyn
section"). Restore it so that .rela.dyn sections remain in the final
vmlinux.
KaFai Wan [Wed, 8 Oct 2025 10:26:27 +0000 (18:26 +0800)]
selftests/bpf: Add test for unpinning htab with internal timer struct
Add test to verify that unpinning hash tables containing internal timer
structures does not trigger context warnings.
Each subtest (timer_prealloc and timer_no_prealloc) can trigger the
context warning when unpinning, but the warning cannot be triggered
twice within a short time interval (a HZ), which is expected behavior.
KaFai Wan [Wed, 8 Oct 2025 10:26:26 +0000 (18:26 +0800)]
bpf: Avoid RCU context warning when unpinning htab with internal structs
When unpinning a BPF hash table (htab or htab_lru) that contains internal
structures (timer, workqueue, or task_work) in its values, a BUG warning
is triggered:
BUG: sleeping function called from invalid context at kernel/bpf/hashtab.c:244
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 14, name: ksoftirqd/0
...
The issue arises from the interaction between BPF object unpinning and
RCU callback mechanisms:
1. BPF object unpinning uses ->free_inode() which schedules cleanup via
call_rcu(), deferring the actual freeing to an RCU callback that
executes within the RCU_SOFTIRQ context.
2. During cleanup of hash tables containing internal structures,
htab_map_free_internal_structs() is invoked, which includes
cond_resched() or cond_resched_rcu() calls to yield the CPU during
potentially long operations.
However, cond_resched() or cond_resched_rcu() cannot be safely called from
atomic RCU softirq context, leading to the BUG warning when attempting
to reschedule.
Fix this by changing from ->free_inode() to ->destroy_inode() and rename
bpf_free_inode() to bpf_destroy_inode() for BPF objects (prog, map, link).
This allows direct inode freeing without RCU callback scheduling,
avoiding the invalid context warning.
Reported-by: Le Chen <tom2cat@sjtu.edu.cn> Closes: https://lore.kernel.org/all/1444123482.1827743.1750996347470.JavaMail.zimbra@sjtu.edu.cn/ Fixes: 68134668c17f ("bpf: Add map side support for bpf timers.") Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: KaFai Wan <kafai.wan@linux.dev> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20251008102628.808045-2-kafai.wan@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Turned out certain clearly invalid values passed in xdp_desc from
userspace can pass xp_{,un}aligned_validate_desc() and then lead
to UBs or just invalid frames to be queued for xmit.
desc->len close to ``U32_MAX`` with a non-zero pool->tx_metadata_len
can cause positive integer overflow and wraparound, the same way low
enough desc->addr with a non-zero pool->tx_metadata_len can cause
negative integer overflow. Both scenarios can then pass the
validation successfully.
This doesn't happen with valid XSk applications, but can be used
to perform attacks.
Always promote desc->len to ``u64`` first to exclude positive
overflows of it. Use explicit check_{add,sub}_overflow() when
validating desc->addr (which is ``u64`` already).
bloat-o-meter reports a little growth of the code size:
add/remove: 0/0 grow/shrink: 2/1 up/down: 60/-16 (44)
Function old new delta
xskq_cons_peek_desc 299 330 +31
xsk_tx_peek_release_desc_batch 973 1002 +29
xsk_generic_xmit 3148 3132 -16
but hopefully this doesn't hurt the performance much.
Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len") Cc: stable@vger.kernel.org # 6.8+ Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20251008165659.4141318-1-aleksander.lobakin@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Linus Torvalds [Fri, 10 Oct 2025 17:01:55 +0000 (10:01 -0700)]
Merge tag 'parisc-for-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
"Minor enhancements and fixes, specifically:
- report emulation and alignment faults via perf
- add initial kernel-side support for perf_events
- small initialization fixes in the parisc firmware layer
- adjust TC* constants and avoid referencing termio structs to avoid
userspace build errors"
* tag 'parisc-for-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix iodc and device path return values on old machines
parisc: Firmware: Fix returned path for PDC_MODULE_FIND on older machines
parisc: Add initial kernel-side perf_event support
parisc: Report software alignment faults via perf
parisc: Report emulation faults via perf
parisc: don't reference obsolete termio struct for TC* constants
parisc: Remove spurious if statement from raw_copy_from_user()
Linus Torvalds [Fri, 10 Oct 2025 16:55:19 +0000 (09:55 -0700)]
Merge tag 'sound-fix-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A few more small fixes for 6.18-rc1.
Most of changes are about ASoC Intel and SOF drivers, while a few
other device-specific fixes are found for HD-audio, USB-audio, ASoC
RT722VB and Meson"
* tag 'sound-fix-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: rt722: add settings for rt722VB
ASoC: meson: aiu-encoder-i2s: fix bit clock polarity
ALSA: usb: fpc: replace kmalloc_array followed by copy_from_user with memdup_array_user
ALSA: hda/tas2781: Enable init_profile_id for device initialization
ALSA: emu10k1: Fix typo in docs
ALSA: hda/realtek: Add quirk for ASUS ROG Zephyrus Duo
ASoC: SOF: Intel: Read the LLP via the associated Link DMA channel
ASoC: SOF: ipc4-pcm: do not report invalid delay values
ASoC: SOF: sof-audio: add dev_dbg_ratelimited wrapper
ASoC: SOF: Intel: hda-pcm: Place the constraint on period time instead of buffer time
ASoC: SOF: ipc4-topology: Account for different ChainDMA host buffer size
ASoC: SOF: ipc4-topology: Correct the minimum host DMA buffer size
ASoC: SOF: ipc4-pcm: fix start offset calculation for chain DMA
ASoC: SOF: ipc4-pcm: fix delay calculation when DSP resamples
ASoC: SOF: ipc3-topology: Fix multi-core and static pipelines tear down
ALSA: hda/hdmi: Add pin fix for HP ProDesk model
Linus Torvalds [Fri, 10 Oct 2025 16:36:23 +0000 (09:36 -0700)]
Merge tag 'fbdev-for-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev
Pull fbdev updates from Helge Deller:
"Beside the usual bunch of smaller bug fixes, the majority of changes
were by Zsolt Kajtar to improve the s3fb driver.
Bug fixes:
- Bounds checking to fix vmalloc-out-of-bounds (Albin Babu Varghese)
- Fix logic error in "offb" name match (Finn Thain)
- simplefb: Fix use after free in (Janne Grunau)
- s3fb: Various fixes and powersave improvements (Zsolt Kajtar)
Enhancements & code cleanups:
- Various fixes in the documentation (Bagas Sanjaya)
- Use string choices helpers (Chelsy Ratnawat)
- xenfb: Use vmalloc_array to simplify code (Qianfeng Rong)
- mb862xxfb: use signed type for error codes (Qianfeng Rong)
- Make drivers depend on LCD_CLASS_DEVICE (Thomas Zimmermann)
- radeonfb: Remove stale product link in Kconfig (Sukrut Heroorkar)"
* tag 'fbdev-for-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: Fix logic error in "offb" name match
fbdev: Add bounds checking in bit_putcs to fix vmalloc-out-of-bounds
fbdev: Make drivers depend on LCD_CLASS_DEVICE
fbdev: radeonfb: Remove stale product link in Kconfig
Documentation: fb: Retitle driver docs
Documentation: fb: ep93xx: Demote section headings
Documentation: fb: Split toctree
fbdev: simplefb: Fix use after free in simplefb_detach_genpds()
fbdev: s3fb: Revert mclk stop in suspend
fbdev: mb862xxfb: Use int type to store negative error codes
fbdev: Use string choices helpers
fbdev: core: Fix ubsan warning in pixel_to_pat
fbdev: s3fb: Implement 1 and 2 BPP modes, improve 4 BPP
fbdev: s3fb: Implement powersave for S3 FB
fbdev: xenfb: Use vmalloc_array to simplify code
Linus Torvalds [Fri, 10 Oct 2025 16:22:39 +0000 (09:22 -0700)]
Merge tag 'gpio-fixes-for-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- add a missing ACPI ID for MTL-CVF devices in gpio-usbio
- mark the gpio-wcd934x controller as "sleeping" as it uses a mutex for
locking internally
* tag 'gpio-fixes-for-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: wcd934x: mark the GPIO controller as sleeping
gpio: usbio: Add ACPI device-id for MTL-CVF devices
Linus Torvalds [Fri, 10 Oct 2025 16:18:19 +0000 (09:18 -0700)]
Merge tag 'ntb-6.18' of https://github.com/jonmason/ntb
Pull NTB updates from Jon Mason:
- Add support for Renesas R-Car and allow arbitrary BAR mapping in EPF
- Update ntb_hw_amd to support the latest generation secondary topology
and add a new maintainer
- Fix a bug by adding a mutex to ensure `link_event_callback` executes
sequentially
* tag 'ntb-6.18' of https://github.com/jonmason/ntb:
NTB: epf: Add Renesas rcar support
NTB: epf: Allow arbitrary BAR mapping
ntb: Add mutex to make link_event_callback executed linearly.
MAINTAINERS: Update for the NTB AMD driver maintainer
ntb_hw_amd: Update amd_ntb_get_link_status to support latest generation secondary topology
Linus Torvalds [Fri, 10 Oct 2025 16:13:11 +0000 (09:13 -0700)]
Merge tag 'i2c-for-6.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull more i2c updates from Wolfram Sang:
- Second part of rtl9300 updates since dependencies are in now:
- general cleanups
- implement block read/write support
- add RTL9310 support
- DT schema conversion of hix5hd2 binding
- namespace cleanup for i2c-algo-pca
- minor simplification for mt65xx
* tag 'i2c-for-6.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
dt-bindings: i2c: hisilicon,hix5hd2: convert to DT schema
i2c: mt65xx: convert set_speed function to void
i2c: rename wait_for_completion callback to wait_for_completion_cb
i2c: rtl9300: add support for RTL9310 I2C controller
dt-bindings: i2c: realtek,rtl9301-i2c: extend for RTL9310 support
i2c: rtl9300: use scoped guard instead of explicit lock/unlock
i2c: rtl9300: separate xfer configuration and execution
i2c: rtl9300: do not set read mode on every transfer
i2c: rtl9300: move setting SCL frequency to config_io
i2c: rtl9300: rename internal sda_pin to sda_num
dt-bindings: i2c: realtek,rtl9301-i2c: fix wording and typos
i2c: rtl9300: use regmap fields and API for registers
i2c: rtl9300: Implement I2C block read and write
Linus Torvalds [Fri, 10 Oct 2025 15:34:11 +0000 (08:34 -0700)]
Merge tag 'tpmdd-next-v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm updates from Jarkko Sakkinen:
- Disable TCG_TPM2_HMAC from defconfig
It causes performance issues, and breaks some atypical
configurations.
- simplify code using the new crypto library
- misc fixes and cleanups
* tag 'tpmdd-next-v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: Prevent local DOS via tpm/tpm0/ppi/*operations
tpm: use a map for tpm2_calc_ordinal_duration()
tpm_tis: Fix incorrect arguments in tpm_tis_probe_irq_single
tpm: Use HMAC-SHA256 library instead of open-coded HMAC
tpm: Compare HMAC values in constant time
tpm: Disable TPM2_TCG_HMAC by default
The ozlabs.org PW instance is slow due to being geographically far away
from any of the maintainers and seems to have gotten slower as of late
(AI scrapers perhaps). The kernel.org PW also has some additional
features (i.e. pwbot) we want to use.
DT core patches also go into PW, so add the PW link for it.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
gpio: wcd934x: mark the GPIO controller as sleeping
The slimbus regmap passed to the GPIO driver down from MFD does not use
fast_io. This means a mutex is used for locking and thus this GPIO chip
must not be used in atomic context. Change the can_sleep switch in
struct gpio_chip to true.
Fixes: 59c324683400 ("gpio: wcd934x: Add support to wcd934x gpio controller") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
tpm: Prevent local DOS via tpm/tpm0/ppi/*operations
Reads on tpm/tpm0/ppi/*operations can become very long on
misconfigured systems. Reading the TPM is a blocking operation,
thus a user could effectively trigger a DOS.
Resolve this by caching the results and avoiding the blocking
operations after the first read.
[ jarkko: fixed atomic sleep:
sed -i 's/spin_/mutex_/g' drivers/char/tpm/tpm_ppi.c
sed -i 's/DEFINE_SPINLOCK/DEFINE_MUTEX/g' drivers/char/tpm/tpm_ppi.c ]
Signed-off-by: Denis Aleksandrov <daleksan@redhat.com> Reported-by: Jan Stancek <jstancek@redhat.com> Closes: https://lore.kernel.org/linux-integrity/20250915210829.6661-1-daleksan@redhat.com/T/#u Suggested-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Jarkko Sakkinen [Thu, 18 Sep 2025 19:30:18 +0000 (22:30 +0300)]
tpm: use a map for tpm2_calc_ordinal_duration()
The current shenanigans for duration calculation introduce too much
complexity for a trivial problem, and further the code is hard to patch and
maintain.
Address these issues with a flat look-up table, which is easy to understand
and patch. If leaf driver specific patching is required in future, it is
easy enough to make a copy of this table during driver initialization and
add the chip parameter back.
'chip->duration' is retained for TPM 1.x.
As the first entry for this new behavior address TCG spec update mentioned
in this issue:
https://github.com/raspberrypi/linux/issues/7054
Therefore, for TPM_SelfTest the duration is set to 3000 ms.
This does not categorize a as bug, given that this is introduced to the
spec after the feature was originally made.
Reviewed-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
tpm_tis: Fix incorrect arguments in tpm_tis_probe_irq_single
The tpm_tis_write8() call specifies arguments in wrong order. Should be
(data, addr, value) not (data, value, addr). The initial correct order
was changed during the major refactoring when the code was split.
Fixes: 41a5e1cf1fe1 ("tpm/tpm_tis: Split tpm_tis driver into a core and TCG TIS compliant phy") Signed-off-by: Gunnar Kudrjavets <gunnarku@amazon.com> Reviewed-by: Justinien Bouron <jbouron@amazon.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>