Daniel Borkmann [Tue, 2 Jun 2026 13:30:52 +0000 (15:30 +0200)]
selftests/bpf: Test that exclusive maps are rejected as iter targets
Add a subtest to map_excl that creates an exclusive map and verifies a
bpf_map_elem iterator cannot be attached to it, which would otherwise
let an unrelated program read and overwrite the map's contents through
the iterator's writable value buffer.
sashiko complained that 38498c0ebacd ("selftests/bpf: Adjust verifier_map_ptr
for the map's excl field") would slightly decrease test coverage, given that
the test previously targeted the verifier rejecting the ops pointer. Recover
the old test with the right offsets and add the existing one as an additional
test case.
# LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_map_ptr
[ 1.672932] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
#637/1 verifier_map_ptr/bpf_map_ptr: read with negative offset rejected:OK
#637/2 verifier_map_ptr/bpf_map_ptr: read with negative offset rejected @unpriv:OK
#637/3 verifier_map_ptr/bpf_map_ptr: write rejected:OK
#637/4 verifier_map_ptr/bpf_map_ptr: write rejected @unpriv:OK
#637/5 verifier_map_ptr/bpf_map_ptr: read non-existent field rejected:OK
#637/6 verifier_map_ptr/bpf_map_ptr: read non-existent field rejected @unpriv:OK
#637/7 verifier_map_ptr/bpf_map_ptr: read beyond excl field rejected:OK
#637/8 verifier_map_ptr/bpf_map_ptr: read beyond excl field rejected @unpriv:OK
#637/9 verifier_map_ptr/bpf_map_ptr: read ops field accepted:OK
#637/10 verifier_map_ptr/bpf_map_ptr: read ops field accepted @unpriv:OK
#637/11 verifier_map_ptr/bpf_map_ptr: r = 0, map_ptr = map_ptr + r:OK
#637/12 verifier_map_ptr/bpf_map_ptr: r = 0, map_ptr = map_ptr + r @unpriv:OK
#637/13 verifier_map_ptr/bpf_map_ptr: r = 0, r = r + map_ptr:OK
#637/14 verifier_map_ptr/bpf_map_ptr: r = 0, r = r + map_ptr @unpriv:OK
#637 verifier_map_ptr:OK
[...]
Summary: 2/20 PASSED, 0 SKIPPED, 0 FAILED
Daniel Borkmann [Tue, 2 Jun 2026 13:30:50 +0000 (15:30 +0200)]
libbpf: Guard add_data() against size overflow
add_data() computes size8 = roundup(size, 8) and then hands size8 to
realloc_data_buf() before doing memcpy(gen->data_cur, data, size) with
the original size. A wrapped size8 passes through the realloc_data_buf()
INT32_MAX check. Harden this against overflow, even though it is unlikely
to happen in practice.
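As a minimal sketch of the kind of guard described above, assuming a 32-bit size: roundup8_checked() and roundup8_unchecked() are hypothetical helpers written for illustration and are not the libbpf code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: round size up to a multiple of 8, refusing
 * inputs for which the 32-bit addition would wrap around.
 */
static bool roundup8_checked(uint32_t size, uint32_t *size8)
{
	assert(size8 != NULL);
	if (size > UINT32_MAX - 7)
		return false;	/* size + 7 would overflow */
	*size8 = (size + 7) & ~(uint32_t)7;
	return true;
}

/* Unchecked variant, shown only to illustrate the wrap. */
static uint32_t roundup8_unchecked(uint32_t size)
{
	return (size + 7) & ~(uint32_t)7;
}
```

With the unchecked variant, a size close to UINT32_MAX rounds "up" to a small value, which a later upper-bound check cannot catch while the copy still uses the original size.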
Daniel Borkmann [Tue, 2 Jun 2026 13:30:49 +0000 (15:30 +0200)]
bpf: Reject exclusive maps for bpf_map_elem iterators
Exclusive maps (aka excl_prog_hash) are meant to be reachable only
from the single program whose hash matches. This is enforced by
check_map_prog_compatibility() when the map is referenced from a
program such as signed BPF loaders.
A bpf_map_elem iterator, however, binds its target map at attach
time in bpf_iter_attach_map() instead of referencing it from the
program, so the exclusivity check is never reached. On top of that,
the iterator exposes the map value as a writable buffer.
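The gap can be pictured with a toy model. Everything below is hypothetical (struct toy_map, both functions, and the errno values are assumptions chosen for illustration); it only shows why a check on the program-reference path does not cover a map that is bound at attach time.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_HASH_LEN 32

/* Toy stand-in for a map; not the kernel's struct bpf_map. */
struct toy_map {
	bool exclusive;				/* bound to a single program hash */
	unsigned char excl_hash[TOY_HASH_LEN];
};

/* Program-reference path: only the program with the matching hash passes. */
static int toy_check_prog_compat(const struct toy_map *map,
				 const unsigned char *prog_hash)
{
	assert(map != NULL && prog_hash != NULL);
	if (map->exclusive &&
	    memcmp(map->excl_hash, prog_hash, TOY_HASH_LEN) != 0)
		return -EACCES;	/* errno is an assumption */
	return 0;
}

/* Attach path: the map is bound here, so it needs its own rejection. */
static int toy_iter_attach_map(const struct toy_map *map)
{
	assert(map != NULL);
	if (map->exclusive)
		return -EPERM;	/* errno is an assumption */
	return 0;
}
```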
Linus Torvalds [Tue, 2 Jun 2026 15:59:35 +0000 (08:59 -0700)]
Merge tag 'mm-hotfixes-stable-2026-06-01-20-58' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM fixes from Andrew Morton:
"13 hotfixes. All are for MM. 10 are cc:stable and the remaining 3
address post-7.1 issues or aren't considered suitable for backporting.
There's a three-patch series "userfaultfd: verify VMA state across
UFFDIO_COPY retry" from Mike Rapoport which fixes a few uffd things.
The rest are singletons - please see the individual changelogs for
details"
* tag 'mm-hotfixes-stable-2026-06-01-20-58' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
userfaultfd: remove redundant check in vm_uffd_ops()
userfaultfd: refuse to __mfill_atomic_pte() for unsupported VMAs
userfaultfd: verify VMA state across UFFDIO_COPY retry
mm/huge_memory: update file PMD counter before folio_put()
mm/huge_memory: update file PUD counter before folio_put()
mm/hugetlb_vmemmap: fix incorrect vmemmap restore in rollback
mm/damon/ops-common: call folio_test_lru() after folio_get()
mm/cma: fix reserved page leak on activation failure
mm/memory-failure: fix hugetlb_lock AA deadlock in get_huge_page_for_hwpoison
mm/hugetlb: restore reservation on error in hugetlb folio copy paths
mm/cma_debug: fix invalid accesses for inactive CMA areas
memcg: use round-robin victim selection in refill_stock
mm/hugetlb: avoid false positive lockdep assertion
dt-bindings: arm-smmu: Correct and add constraints for Hawi, Shikra and Kaanapali
Previous commit 75949eb02653 ("dt-bindings: arm-smmu: Constrain clocks
for newer Qualcomm variants") duplicated constraints for
qcom,sm6350-smmu-500 and qcom,sm6375-smmu-500 - these are already part
of the previous "if:" block.
It also missed enforcing one clock for qcom,kaanapali-smmu-500 in the GPU
case and missed Shikra and Hawi, which were added at the same time.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Lizhi Hou [Tue, 2 Jun 2026 04:06:24 +0000 (21:06 -0700)]
accel/amdxdna: Preserve user address when PASID is disabled
When PASID is not used, the buffer user address is set to
AMDXDNA_INVALID_ADDR. As a result, heap buffer user address validation
fails even though the original userspace address is available.
Preserve the userspace address regardless of PASID usage so heap buffer
address validation works correctly.
Carlos López [Fri, 29 May 2026 14:00:14 +0000 (16:00 +0200)]
KVM: x86: Take PIC lock on KVM_GET_IRQCHIP path
When userspace issues the KVM_SET_IRQCHIP ioctl to set the state of
the PIC, kvm_vm_ioctl_set_irqchip() grabs @kvm->arch.vpic->lock before
updating the state. However, the KVM_GET_IRQCHIP ioctl to retrieve the
same PIC state does not grab that lock, potentially causing torn reads
for userspace.
Fix this by grabbing the lock on the read path.
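A minimal userspace sketch of the pattern, assuming a toy spinlock built on C11 atomic_flag; struct toy_pic and its helpers are hypothetical and are not KVM code. The point is only that the getter copies the state under the same lock the setter holds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define TOY_PIC_BYTES 16

/* Toy device state protected by a simple spinlock. */
struct toy_pic {
	atomic_flag lock;
	unsigned char state[TOY_PIC_BYTES];
};

static void toy_pic_init(struct toy_pic *pic)
{
	assert(pic != NULL);
	atomic_flag_clear(&pic->lock);
	memset(pic->state, 0, sizeof(pic->state));
}

static void toy_pic_lock(struct toy_pic *pic)
{
	while (atomic_flag_test_and_set_explicit(&pic->lock,
						 memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void toy_pic_unlock(struct toy_pic *pic)
{
	atomic_flag_clear_explicit(&pic->lock, memory_order_release);
}

/* Writer: update the whole state under the lock. */
static void toy_pic_set(struct toy_pic *pic, const unsigned char *src)
{
	toy_pic_lock(pic);
	memcpy(pic->state, src, TOY_PIC_BYTES);
	toy_pic_unlock(pic);
}

/* Reader: take the same lock so the copy cannot observe a partial update. */
static void toy_pic_get(struct toy_pic *pic, unsigned char *dst)
{
	toy_pic_lock(pic);
	memcpy(dst, pic->state, TOY_PIC_BYTES);
	toy_pic_unlock(pic);
}
```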
This issue goes all the way back. The bug was introduced with the
addition of PIC ioctl code itself in 6ceb9d791eee ("KVM: Add get/set
irqchip ioctls for in-kernel PIC live migration support"). Later,
894a9c5543ab ("KVM: x86: missing locking in PIT/IRQCHIP/SET_BSP_CPU
ioctl paths") added the locking for kvm_vm_ioctl_set_irqchip(), but
missed kvm_vm_ioctl_get_irqchip().
Fixes: 6ceb9d791eee ("KVM: Add get/set irqchip ioctls for in-kernel PIC live migration support")
Fixes: 894a9c5543ab ("KVM: x86: missing locking in PIT/IRQCHIP/SET_BSP_CPU ioctl paths")
Reported-by: Claude Code:claude-opus-4.6
Signed-off-by: Carlos López <clopez@suse.de>
Link: https://patch.msgid.link/20260529140013.14925-2-clopez@suse.de
Signed-off-by: Sean Christopherson <seanjc@google.com>
Ard Biesheuvel [Fri, 29 May 2026 15:02:06 +0000 (17:02 +0200)]
arm64: mm: Unmap kernel data/bss entirely from the linear map
The linear aliases of the kernel text and rodata are also mapped
read-only in the linear map. Given that the contents of these regions
are mostly identical to the version in the loadable image, mapping them
read-only and leaving their contents visible is a reasonable hardening
measure.
Data and bss, however, are now also mapped read-only but the contents of
these regions are more likely to contain data that we'd rather not leak.
So let's unmap these entirely in the linear map when the kernel is
running normally.
When going into hibernation or waking up from it, these regions need to
be mapped, so map the region initially, and toggle the valid bit to
map/unmap the region as needed.
Doing so is required because pages covering the kernel image are marked
as PageReserved, and therefore disregarded for snapshotting by the
hibernate logic unless they are mapped.
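Toggling only the valid bit keeps the rest of each descriptor (output address and attributes) in place, so the region can be made visible again without rebuilding the mapping. A toy sketch, assuming 64-bit descriptors whose bit 0 is the valid bit; the helper names are hypothetical and no TLB maintenance is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PTE_VALID	((uint64_t)1 << 0)	/* bit 0 marks a descriptor valid */

/* Set or clear the valid bit, leaving all other bits untouched. */
static uint64_t toy_pte_set_valid(uint64_t pte, bool valid)
{
	return valid ? (pte | TOY_PTE_VALID) : (pte & ~TOY_PTE_VALID);
}

/* Apply the toggle to every descriptor covering a region. */
static void toy_region_set_valid(uint64_t *ptes, size_t n, bool valid)
{
	assert(ptes != NULL);
	for (size_t i = 0; i < n; i++)
		ptes[i] = toy_pte_set_valid(ptes[i], valid);
}
```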
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Fri, 29 May 2026 15:02:05 +0000 (17:02 +0200)]
arm64: mm: Map the kernel data/bss read-only in the linear map
On systems where the bootloader adheres to the original arm64 boot
protocol, the placement of the kernel in the physical address space is
highly predictable, and this makes the placement of its linear alias in
the kernel virtual address space equally predictable, given the lack of
randomization of the linear map.
The linear aliases of the kernel text and rodata regions are already
mapped read-only, but the kernel data and bss are mapped read-write in
this region. This is not needed, so map them read-only as well.
Note that the statically allocated kernel page tables do need to be
modifiable via the linear map, so leave these mapped read-write.
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Fri, 29 May 2026 15:02:04 +0000 (17:02 +0200)]
mm: Make empty_zero_page[] const
The empty zero page is used to back any kernel or user space mapping
that is supposed to remain cleared, and so the page itself is never
supposed to be modified.
So mark it as const, which moves it into .rodata rather than .bss: on
most architectures, this ensures that both the kernel's mapping of it
and any aliases that are accessible via the kernel direct (linear) map
are mapped read-only, and cannot be used (inadvertently or maliciously)
to corrupt the contents of the zero page.
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Feng Tang <feng.tang@linux.alibaba.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
the zero page did double duty as a boot params region, and was cleared
separately, as it was not part of BSS. The memset() in question was
dropped by that commit, but the __flush_wback_region() call remained.
As empty_zero_page[] has been moved to BSS, it can be treated as any
other BSS memory, and so the cache flush can be dropped.
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Fri, 29 May 2026 15:02:02 +0000 (17:02 +0200)]
powerpc/code-patching: Avoid r/w mapping of the zero page
The only remaining use of map_patch_area() is mapping the zero page, and
immediately unmapping it again so that the intermediate page table
levels are all guaranteed to be populated.
The use of the zero page here is completely arbitrary, and not harmful
per se, but currently, it creates a writable mapping, and does so in a
manner that requires that the empty_zero_page[] symbol is not
const-qualified.
Given that this is about to change, and that map_patch_area() now never
maps anything other than the zero page, let's simplify the code and
- remove the helpers and call [un]map_kernel_page() directly
- take the PA of empty_zero_page directly
- create a read-only temporary mapping.
This allows empty_zero_page[] to be repainted as const u8[] in a
subsequent patch, without making substantial changes to this code
patching logic.
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Link: https://lore.kernel.org/all/20260520085423.485402-1-ardb@kernel.org/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Fri, 29 May 2026 15:02:01 +0000 (17:02 +0200)]
arm64: mm: Don't abuse memblock NOMAP to check for overlaps
Now that the linear region mapping routines respect existing table
mappings and contiguous block and page mappings, it is no longer necessary
to fiddle with the memblock tables to set and clear the NOMAP attribute
in order to omit text and rodata when creating the linear map.
Instead, map the kernel text and rodata alias first with the desired
initial attributes and granularity, so that the loop iterating over the
memblocks will not remap it in a manner that prevents it from being
remapped with updated attributes later.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Fri, 29 May 2026 15:02:00 +0000 (17:02 +0200)]
arm64: Move fixmap and kasan page tables to end of kernel image
Move the fixmap and kasan page tables out of the BSS section, and place
them at the end of the image, right before the init_pg_dir section where
some of the other statically allocated page tables live.
These page tables are currently the only data objects in vmlinux that
are meant to be accessed via the kernel image's linear alias, and so
placing them together allows the remainder of the data/bss section to be
remapped read-only or unmapped entirely.
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Fri, 29 May 2026 15:01:59 +0000 (17:01 +0200)]
arm64: mm: Permit contiguous attribute for preliminary mappings
There are a few cases where we omit the contiguous hint for mappings
that start out as read-write and are remapped read-only later, on the
basis that manipulating live descriptors with the PTE_CONT attribute set
is unsafe. When support for the contiguous hint was added to the code,
the ARM ARM was ambiguous about this, and so we erred on the side of
caution.
In the meantime, this has been clarified [0], and regions that will be
remapped in their entirety, retaining the contiguous bit on all entries,
can use the contiguous hint both in the initial mapping as well as the
one that replaces it. Note that this requires that the logic that may be
called to remap overlapping regions respects existing valid descriptors
that have the contiguous bit cleared.
So omit the NO_CONT_MAPPINGS flag in places where it is unneeded.
[0] RJQQTC
For a TLB lookup in a contiguous region mapped by translation table entries that
have consistent values for the Contiguous bit, but have the OA, attributes, or
permissions misprogrammed, that TLB lookup is permitted to produce an OA, access
permissions, and memory attributes that are consistent with any one of the
programmed translation table values.
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Fri, 29 May 2026 15:01:58 +0000 (17:01 +0200)]
arm64: kfence: Avoid NOMAP tricks when mapping the early pool
Now that the map_mem() routines respect existing page mappings and
contiguous granule sized blocks with the contiguous bit cleared, there
is no longer a reason to play tricks with the memblock NOMAP attribute.
Instead, the kfence pool can be allocated and mapped with page
granularity first, and this granularity will be respected when the rest
of DRAM is mapped later, even if block and contiguous mappings are
allowed for the remainder of those mappings.
Add the NO_EXEC_MAPPINGS flag to ensure that hierarchical XN attributes
are set on the intermediate page tables that are allocated when mapping
the pool.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Fri, 29 May 2026 15:01:57 +0000 (17:01 +0200)]
arm64: mm: Permit contiguous descriptors to be manipulated
Currently, pgattr_change_is_safe() is overly pedantic when it comes to
descriptors with the contiguous hint attribute set, as it rejects
assignments even if the old and the new value are the same.
In fact, as per ARM ARM RJQQTC, manipulating descriptors with the
contiguous bit set is safe as long as the bit itself does not change
value, in the sense that no TLB conflict aborts or other exceptions may
be raised as a result. Inconsistent permission attributes within the
contiguous region may result in any one of the alternatives being applied
to the entire region, which might be a programming error, but it
does not constitute an unsafe manipulation in terms of what
pgattr_change_is_safe() is intended to detect.
So drop the special PTE_CONT check, but still omit PTE_CONT from 'mask'
so that modifying the bit is still regarded as unsafe.
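The effect of leaving PTE_CONT out of 'mask' can be sketched with a simplified model. The function below is a hypothetical reduction, not the kernel's pgattr_change_is_safe(): the bit positions and the two-bit mask are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy descriptor bits; positions are assumptions for this sketch. */
#define TOY_PTE_VALID	((uint64_t)1 << 0)
#define TOY_PTE_RDONLY	((uint64_t)1 << 7)
#define TOY_PTE_CONT	((uint64_t)1 << 52)
#define TOY_PTE_PXN	((uint64_t)1 << 53)

/* A change is safe if only bits inside the mask differ. */
static bool toy_change_is_safe(uint64_t old, uint64_t next)
{
	/* Bits that may change on a live entry; CONT is deliberately absent. */
	const uint64_t mask = TOY_PTE_RDONLY | TOY_PTE_PXN;

	assert((mask & TOY_PTE_CONT) == 0);

	/* Creating or tearing down an entry is always allowed. */
	if (!(old & TOY_PTE_VALID) || !(next & TOY_PTE_VALID))
		return true;

	return ((old ^ next) & ~mask) == 0;
}
```

In this model a permission change on a contiguous descriptor passes as long as the contiguous bit is the same on both sides, while flipping the bit itself is still reported as unsafe.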
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Fri, 29 May 2026 15:01:56 +0000 (17:01 +0200)]
arm64: mm: Preserve non-contiguous descriptors when mapping DRAM
Instead of blindly overwriting existing live entries regardless of the
value of their contiguous bit when mapping DRAM regions at
contiguous-hint granularity, check whether the contiguous region in
question contains any valid descriptors that have the contiguous bit
cleared, and in that case, leave the contiguous bit unset on the entire
region. This permits the logic of mapping the kernel's linear alias to
be simplified in a subsequent patch.
Note that this can only result in a misprogrammed contiguous bit (as per
ARM ARM RNGLXZ) if the region in question already contains a mix of
valid contiguous and valid non-contiguous descriptors, in which case it
was already misprogrammed to begin with.
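The decision described above can be sketched as a scan over one contiguous group. This is a hypothetical helper with assumed bit positions, not the arm64 mapping code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy descriptor bits; positions are assumptions for this sketch. */
#define TOY_PTE_VALID	((uint64_t)1 << 0)
#define TOY_PTE_CONT	((uint64_t)1 << 52)

/*
 * Return true if the contiguous hint may be used for this group, i.e.
 * no live descriptor in it has the contiguous bit cleared.
 */
static bool toy_may_use_cont(const uint64_t *ptes, size_t n)
{
	assert(ptes != NULL);
	for (size_t i = 0; i < n; i++) {
		bool valid = (ptes[i] & TOY_PTE_VALID) != 0;
		bool cont = (ptes[i] & TOY_PTE_CONT) != 0;

		if (valid && !cont)
			return false;	/* keep the whole group non-contiguous */
	}
	return true;
}
```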
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Fri, 29 May 2026 15:01:55 +0000 (17:01 +0200)]
arm64: mm: Preserve existing table mappings when mapping DRAM
Instead of blindly overwriting an existing table entry when mapping DRAM
regions, take care not to replace a pre-existing table entry with a
block entry. This permits the logic of mapping the kernel's linear alias
to be simplified in a subsequent patch.
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Fri, 29 May 2026 15:01:54 +0000 (17:01 +0200)]
arm64: mm: Check for pud_/pmd_set_huge() failures on kernel mappings
Sashiko reports:
| If pmd_set_huge() rejects an unsafe page table transition (such as
| mapping a different physical address over an existing block mapping),
| it returns 0 and leaves the page table entry unmodified.
|
| Because *pmdp remains unmodified, READ_ONCE(pmd_val(*pmdp)) will equal
| pmd_val(old_pmd). The transition from old_pmd to old_pmd is evaluated
| as safe by pgattr_change_is_safe(), so the BUG_ON never triggers.
|
| This allows invalid and unsafe mapping updates to be silently dropped
| instead of panicking, leaving stale memory mappings active while the
| caller assumes the update was successful.
The same applies to pud_set_huge() in alloc_init_pud().
Given how it is generally preferred to limp on rather than blow up the
system if an unexpected condition such as this one occurs, and the fact
that there are no known cases where this disparity results in real
problems, let's WARN on these failures rather than BUG, allowing the
system to survive to the point where it can actually report them.
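A toy model of the reported problem and of checking the return value directly. toy_set_huge() is a hypothetical stand-in that refuses to change the address of a live entry, and the counter stands in for a WARN; neither is the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PMD_VALID	((uint64_t)1 << 0)
#define TOY_ADDR_MASK	(~(uint64_t)0xfff)

static unsigned int toy_warn_count;	/* counts reported failures */

/* Returns 1 on success; returns 0 and leaves *pmdp unmodified if refused. */
static int toy_set_huge(uint64_t *pmdp, uint64_t next)
{
	assert(pmdp != NULL);
	if ((*pmdp & TOY_PMD_VALID) && ((*pmdp ^ next) & TOY_ADDR_MASK))
		return 0;
	*pmdp = next;
	return 1;
}

/*
 * Check the return value itself. Comparing the old and current entry
 * afterwards cannot detect a refusal, since the entry is unchanged.
 */
static void toy_map_block(uint64_t *pmdp, uint64_t next)
{
	if (!toy_set_huge(pmdp, next))
		toy_warn_count++;	/* report and carry on */
}
```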
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Fri, 29 May 2026 15:01:53 +0000 (17:01 +0200)]
arm64: mm: Drop redundant pgd_t* argument from map_mem()
__map_memblock() and map_mem() always operate on swapper_pg_dir, so
there is no need to pass around a pgd_t pointer between them.
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Fri, 29 May 2026 15:01:52 +0000 (17:01 +0200)]
arm64: mm: Remove bogus stop condition from map_mem() loop
The memblock API guarantees that start is not greater than or equal to
end, so there is no need to test it. And if it were, it is doubtful that
breaking out of the loop would be a reasonable course of action here
(rather than attempting to map the remaining regions).
So let's drop this check.
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Tue, 2 Jun 2026 15:21:48 +0000 (16:21 +0100)]
ASoC: loongson: Refactor DMA and regmap handling
Binbin Zhou <zhoubinbin@loongson.cn> says:
This series refactors the Loongson I2S ASoC drivers, reducing code
duplication and improving DMA differentiation. It also adds an entry
in MAINTAINERS and applies a few fixes to the es8323 codec driver.
These changes have been tested on Loongson-2K0300 (platform, eDMA) and
Loongson-2K2000 (PCI, iDMA) boards.
Binbin Zhou [Mon, 1 Jun 2026 09:29:39 +0000 (17:29 +0800)]
ASoC: loongson: Separate external shared DMA from the platform interface
The Loongson I2S platform driver (used on LS2K1000, LS7A etc.) relies on
an external DMA engine (e.g., dw_dmac) rather than the internal DMA.
However, its DMA-related code was originally embedded in
loongson_i2s_plat.c, duplicating logic that should be shared.
Extract the external DMA (eDMA) support from the platform driver and move
it into loongson_dma.c alongside the existing internal DMA (iDMA) code.
This change eliminates code duplication and prepares for future
consolidation of DMA selection logic.
Binbin Zhou [Mon, 1 Jun 2026 09:29:38 +0000 (17:29 +0800)]
ASoC: loongson: Use the `idma` identifier for internal DMA variables
The Loongson I2S controller can work with two types of DMA:
- Internal DMA (iDMA): integrated DMA engine, driven by dedicated
registers and interrupts.
- External DMA (eDMA): generic DMA engine (e.g., dw_dmac), using the
standard dmaengine API.
To distinguish these two distinct implementations, rename all
internal-DMA-related structures, functions, and the component driver
to use the "idma" prefix.
Binbin Zhou [Mon, 1 Jun 2026 09:29:37 +0000 (17:29 +0800)]
ASoC: loongson: Combined regmap definitions
Previously, the regmap configuration for Loongson I2S controller was
duplicated in both PCI and platform glue drivers. Move the common
regmap configuration into the shared loongson_i2s.c to avoid code
duplication and centralize register access handling.
While moving, adjust the following:
- Mark RX_DATA/TX_DATA/I2S_CTRL as volatile registers. The PCI version
incorrectly marked CFG/CFG1 as volatile, which prevented proper
regcache synchronization.
- Change cache type from REGCACHE_FLAT to REGCACHE_MAPLE. The register
map is sparse and the number of registers is small; MAPLE tree provides
better scalability and is the recommended cache type for modern
regmap users.
Also, the following warning for the i2s_plat driver will be eliminated:
loongson-i2s-plat loongson-i2s: using zero-initialized flat cache, this may cause unexpected behavior.
Wandun Chen [Mon, 25 May 2026 12:17:00 +0000 (20:17 +0800)]
of: reserved_mem: only support one <base size> entry in reg property
A /reserved-memory child node may have multiple <base size> tuples in
'reg' property, but multiple entries in 'reg' have never been fully
functional:
- fdt_scan_reserved_mem() in the early pass loops over every
tuple and reserves them all.
- fdt_scan_reserved_mem_late() reads 'reg' by
of_flat_dt_get_addr_size(), which returns false if entries != 1.
So a 'reg' property with multiple <base size> entries is
skipped and no reserved_mem entry is created in reserved_mem[].
Supporting multiple <base size> tuples is not a good idea:
- It requires reserved_mem_ops->node_init support. Currently,
CMA(rmem_cma_setup) and DMA(rmem_dma_setup) are not supported.
- of_reserved_mem_lookup() is name-based, so only the first entry in
multiple <base size> tuples will be found.
So support only one <base size> entry in the 'reg' property.
Also update the DT binding:
https://github.com/devicetree-org/dt-schema/pull/197
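The "entries != 1" condition mentioned above comes down to simple arithmetic on the property length. A sketch with hypothetical helper names, assuming 32-bit cells as in the flattened devicetree format:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of <base size> tuples in a 'reg' property of reg_len bytes,
 * or -1 if the length is not a whole number of tuples.
 */
static int toy_reg_entry_count(size_t reg_len, unsigned int addr_cells,
			       unsigned int size_cells)
{
	size_t tuple_bytes;

	assert(addr_cells + size_cells > 0);
	tuple_bytes = (size_t)(addr_cells + size_cells) * sizeof(uint32_t);
	if (reg_len % tuple_bytes != 0)
		return -1;
	return (int)(reg_len / tuple_bytes);
}

/* Only a single <base size> entry is accepted. */
static bool toy_reg_is_supported(size_t reg_len, unsigned int addr_cells,
				 unsigned int size_cells)
{
	return toy_reg_entry_count(reg_len, addr_cells, size_cells) == 1;
}
```

With two address cells and two size cells, a 16-byte property holds one entry and is accepted, while a 32-byte property holds two and is not.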
Mark Brown [Tue, 2 Jun 2026 15:13:08 +0000 (16:13 +0100)]
ASoC: mediatek: mt8192 probe cleanup
Cássio Gabriel <cassiogabrielcontato@gmail.com> says:
Fix two MT8192 AFE probe cleanup issues that mirror the recently fixed
MT8189 and MT8196 paths.
The first patch registers a devm cleanup action for a successful
reserved-memory assignment so later probe failures and driver unbind
release it.
The second patch checks the temporary runtime resume used while
reinitializing the regmap cache and makes the regcache failure path drop
the PM reference and clear pm_runtime_bypass_reg_ctl.
Cássio Gabriel [Wed, 27 May 2026 13:55:47 +0000 (10:55 -0300)]
ASoC: mediatek: mt8192: Check runtime resume during probe
The MT8192 AFE probe enables runtime PM temporarily while reinitializing
the regmap cache from hardware, but it uses pm_runtime_get_sync()
without checking the return value. If runtime resume fails, probe keeps
going without the device necessarily being accessible, and
pm_runtime_get_sync() may leave the PM usage count incremented.
The regmap_reinit_cache() failure path also returns before dropping the
temporary PM reference and before clearing pm_runtime_bypass_reg_ctl.
Use pm_runtime_resume_and_get() so resume failures do not leak a usage
count, and clear the temporary bypass flag after dropping the probe PM
reference on all regmap_reinit_cache() outcomes.
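The difference between the two calls can be modelled with a plain counter. This is a toy model (struct toy_dev and the toy_ functions are hypothetical, and -EIO is an arbitrary failure code); it only mirrors the behaviour described above, where the unchecked call leaves the usage count raised on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy device: a usage counter and a switch to make resume fail. */
struct toy_dev {
	int usage_count;
	bool resume_fails;
};

/* Models a get that bumps the count even when resume fails. */
static int toy_get_sync(struct toy_dev *dev)
{
	assert(dev != NULL);
	dev->usage_count++;
	return dev->resume_fails ? -EIO : 0;
}

/* Models a get that drops the count again on failure. */
static int toy_resume_and_get(struct toy_dev *dev)
{
	int ret = toy_get_sync(dev);

	if (ret < 0) {
		dev->usage_count--;
		return ret;
	}
	return 0;
}

/* Drop a reference taken by a successful get. */
static void toy_put(struct toy_dev *dev)
{
	assert(dev != NULL && dev->usage_count > 0);
	dev->usage_count--;
}
```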
Cássio Gabriel [Wed, 27 May 2026 13:55:46 +0000 (10:55 -0300)]
ASoC: mediatek: mt8192: Release reserved memory on cleanup
The MT8192 AFE probe calls of_reserved_mem_device_init() and falls
back to preallocated buffers when no reserved memory region is
available. When the reserved memory assignment succeeds, however, the
driver never releases it.
Register a devm cleanup action after a successful reserved-memory
assignment so the assignment is released on probe failure and driver
unbind.
Cássio Gabriel <cassiogabrielcontato@gmail.com> says:
The MT8183 AFE probe has two cleanup gaps that match issues
recently fixed in newer MediaTek AFE drivers.
First, reserved memory assigned with of_reserved_mem_device_init()
is never released on driver removal or later probe failures.
Second, the probe-time runtime PM resume used before reinitializing
the regmap cache is unchecked, and a regmap_reinit_cache() failure
skips the temporary PM put.
Fix both issues with a devm reserved-memory release action and
checked runtime PM resume handling.
Cássio Gabriel [Wed, 27 May 2026 13:41:49 +0000 (10:41 -0300)]
ASoC: mediatek: mt8183: Check runtime resume during probe
The MT8183 AFE probe uses pm_runtime_get_sync() before reading hardware
defaults into the regmap cache, but does not check whether runtime resume
failed. If regmap_reinit_cache() then fails, the temporary runtime PM
usage count is also not released.
Use pm_runtime_resume_and_get() so resume failures abort probe without
leaking a usage count, and release the temporary reference before
handling the regmap cache result.
Cássio Gabriel [Wed, 27 May 2026 13:41:48 +0000 (10:41 -0300)]
ASoC: mediatek: mt8183: Release reserved memory on cleanup
The MT8183 AFE probe can assign reserved memory with
of_reserved_mem_device_init(), but the assignment is never released on
driver removal or later probe failures.
Register a devm cleanup action so the reserved memory assignment is
released consistently, matching newer MediaTek AFE drivers.
Mark Brown [Tue, 2 Jun 2026 15:09:27 +0000 (16:09 +0100)]
regulator: Use named initializers for platform_device_id arrays
Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com> says:
This series aims to use named initializers for platform_device_id
arrays. In general these are more readable for humans and more robust
to changes in the respective struct definition.
regulator: Unify usage of space and comma in platform_device_id arrays
After converting all these arrays to use named initializers and fixing
coding style en passant, also adapt the coding style of those drivers that
already used named initializers, for consistency.
regulator: Use named initializers for platform_device_id arrays
Named initializers are more readable and more robust to changes to the
struct definition. This robustness is relevant for a planned change to
struct platform_device_id replacing .driver_data by an anonymous union.
While touching these arrays, unify spacing and usage of commas.
regulator: Drop unused assignment of platform_device_id driver data
Several drivers explicitly set the .driver_data member of struct
platform_device_id to zero without relying on that value. Drop these
unused assignments.
While touching these arrays, unify spacing and usage of commas, and use
named initializers for .name.
Cássio Gabriel [Mon, 25 May 2026 17:18:03 +0000 (14:18 -0300)]
ASoC: codecs: rk3328: Use managed GPIO and clock helpers
rk3328_platform_probe() acquires the mute GPIO with gpiod_get_optional()
but never releases it. It also enables mclk and pclk manually while
relying on probe error labels for unwind, and the driver has no platform
remove callback to disable those clocks after a successful unbind.
This path has already needed fixes for missing clock unwinds on probe
errors. Use devm_gpiod_get_optional() and devm_clk_get_enabled() so the
GPIO and enabled clock lifetimes are tied to the device. This removes the
manual error labels and makes both probe failure and driver unbind follow
the normal devres cleanup path.
Wentao Liang [Wed, 27 May 2026 10:48:50 +0000 (10:48 +0000)]
regulator: scmi: fix of_node refcount leak in scmi_regulator_probe()
scmi_regulator_probe() calls of_find_node_by_name() which takes a
reference on the returned device node. On the error path where
process_scmi_regulator_of_node() fails, the function returns without
calling of_node_put() on the child node, leaking the reference.
Add of_node_put(np) on the error path to properly release the
reference.
ntfs3: fix out-of-bounds read in ntfs_dir_emit() and hdr_find_e()
The bounds check in ntfs_dir_emit() compares fname->name_len (a
character count) against e->size (a byte count) without accounting
for the 2-byte-per-character UTF-16LE encoding or the ATTR_FILE_NAME
header size:
if (fname->name_len + sizeof(struct NTFS_DE) > le16_to_cpu(e->size))
This computes: name_len + 16 > e_size
The correct check must account for the ATTR_FILE_NAME header (66 bytes
before the name) and the UTF-16LE character size (2 bytes each):
The correct calculation already exists as fname_full_size() in ntfs.h
and is used in cmp_fnames(), namei.c, and fslog.c, but was not used
in the readdir path.
A crafted NTFS image with an index entry containing a small e->size
but large fname->name_len bypasses the current check, causing
ntfs_utf16_to_nls() to read past the entry boundary.
Additionally, add a key_size validation in hdr_find_e() to ensure the
declared key_size does not exceed the available entry data, preventing
comparison functions from reading past entry boundaries on the lookup
path.
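The arithmetic can be sketched with the sizes quoted above (16-byte entry header, 66-byte file-name header, 2 bytes per character). The struct and helpers are hypothetical, and the exact form of the corrected comparison is an assumption; the message only states which sizes it must account for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NTFS_DE_SIZE	16	/* sizeof(struct NTFS_DE) per the text */
#define TOY_FNAME_HDR_SIZE	66	/* bytes before the name per the text */

/* Toy index entry: only the two fields the check looks at. */
struct toy_entry {
	uint16_t size;		/* entry size in bytes */
	uint8_t name_len;	/* name length in UTF-16 characters */
};

/* Flawed check: compares a character count against a byte count. */
static bool toy_old_check_rejects(const struct toy_entry *e)
{
	assert(e != NULL);
	return (size_t)e->name_len + TOY_NTFS_DE_SIZE > e->size;
}

/* Full size of the file-name attribute: header plus 2 bytes per character. */
static size_t toy_fname_full_size(const struct toy_entry *e)
{
	assert(e != NULL);
	return TOY_FNAME_HDR_SIZE + (size_t)e->name_len * 2;
}

/* Assumed corrected check: entry header plus full file-name size. */
static bool toy_new_check_rejects(const struct toy_entry *e)
{
	return TOY_NTFS_DE_SIZE + toy_fname_full_size(e) > e->size;
}
```

For an entry with size 256 and name_len 200, the old comparison is 216 > 256 and lets the entry through, while the byte-accurate one is 482 > 256 and rejects it.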
Signed-off-by: Alessandro Schino <7991aleschino@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Jamie Nguyen [Tue, 19 May 2026 19:42:20 +0000 (12:42 -0700)]
fs/ntfs3: fix mount failure on 64K page-size kernels
On 64K page-size kernels, mounting NTFS volumes smaller than ~650 MB
fails with EINVAL. The issue is in log_replay(): the initial log page
size probe uses PAGE_SIZE (65536) instead of DefaultLogPageSize (4096)
when PAGE_SIZE exceeds DefaultLogPageSize * 2.
This makes norm_file_page() require the $LogFile to be at least
50 * 65536 = 3.2 MB, but mkfs.ntfs creates a $LogFile of only ~1.5 MB
for a typical 300 MB volume. norm_file_page() returns 0 and the mount
is rejected with EINVAL.
On 4K kernels the #if guard evaluates to true, so use_default=true is
passed and DefaultLogPageSize (4096) is used, requiring only ~200 KB.
This path works fine.
Fix this by always passing use_default=true, which forces the initial
probe to use DefaultLogPageSize regardless of the kernel's PAGE_SIZE.
This is safe because, after reading the on-disk restart area, log_replay()
already re-adjusts log->page_size to match the volume's actual
sys_page_size.
Also fix read_log_page() to pass log->page_size instead of PAGE_SIZE to
ntfs_fix_post_read(), matching the actual buffer size.
Fixes: b46acd6a6a62 ("fs/ntfs3: Add NTFS journal")
Tested-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Arnd Bergmann [Fri, 15 May 2026 09:09:50 +0000 (11:09 +0200)]
ntfs3: avoid another -Wmaybe-uninitialized warning
The ntfs3-specific -Wmaybe-uninitialized flag found one more false positive,
this time with gcc-10 on s390:
fs/ntfs3/frecord.c: In function 'ni_expand_list':
fs/ntfs3/frecord.c:1370:16: error: 'ins_attr' may be used uninitialized in this function [-Werror=maybe-uninitialized]
Add an explicit NULL pointer check before using the pointer, and
initialize it to NULL.
Fixes: 48d9b57b169f ("fs/ntfs3: add a subset of W=1 warnings for stricter checks")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Mihai Brodschi [Mon, 11 May 2026 17:19:04 +0000 (20:19 +0300)]
ntfs3: Allocate iomap inline_data using alloc_page
This fixes a BUG reported in iomap_write_end_inline:
iomap_inline_data_valid checks that the inline_data fits within
a page. If the inline_data is allocated with kmemdup there's no
guarantee that it's page-aligned, so the check sometimes fails.
Allocate it with alloc_page to ensure it's page-aligned.
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221446
Fixes: 099ef9ab9203 ("fs/ntfs3: implement iomap-based file operations")
Signed-off-by: Mihai Brodschi <m.brodschi@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
fs/ntfs3: reject SEEK_DATA and SEEK_HOLE past EOF early
Handle non-data/hole seeks through generic_file_llseek_size() and return
-ENXIO immediately when SEEK_DATA or SEEK_HOLE is requested at or past
EOF. Handle compressed files in such cases properly as well.
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
fs/ntfs3: fold file size handling into ntfs_set_size()
Remove the separate ntfs_extend() and ntfs_truncate() helpers and route
file size changes through ntfs_set_size().
This consolidates ntfs3 size updates in one place and lets the write,
fallocate, and setattr paths share the same logic for updating i_size,
valid data length, and preallocated extents.
This patch fixes a few issues found during internal tests.
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
fs/ntfs3: fold resident writeback into writepages loop
Remove the separate ntfs_resident_writepage() helper and handle resident
writeback directly from ntfs_writepages(). This simplifies the resident
writeback path and keeps the folio handling local to ntfs_writepages().
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
fs/ntfs3: handle delayed allocation overlap in run lookup
Introduce run_lookup_entry_da() to look up data runs while taking
delayed allocation into account.
ntfs3 may have both committed extents and delayed allocation extents for
the same VCN range. The new helper checks delayed allocation first and
falls back to the real run, then corrects the returned range when a real
run overlaps with a delayed allocation run.
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
fs/ntfs3: zero stale pagecache beyond valid data length
Zero cached folios beyond the valid data length when closing a writable
mapping. This keeps cached data beyond initialized file contents zeroed
and prevents stale pagecache exposure after mmap-based writes.
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Zhan Xusheng [Fri, 8 May 2026 09:52:45 +0000 (17:52 +0800)]
fs/ntfs3: fix wrong LCN in run_remove_range() when splitting a run
When run_remove_range() removes a middle portion of a non-sparse run,
it splits the run into head and tail parts. The tail is inserted via
run_add_entry() but uses the original r->lcn as its starting LCN
instead of advancing it by the split offset.
For example, removing VCN range [10, 20) from a run
{vcn=0, lcn=100, len=30} should produce:
{vcn=0, lcn=100, len=10} (head)
{vcn=20, lcn=120, len=10} (tail, lcn advanced by 20)
But the current code produces:
{vcn=0, lcn=100, len=10}
{vcn=20, lcn=100, len=10} (wrong: points to same physical clusters)
This creates overlapping physical mappings in the in-memory run tree,
which can corrupt cluster allocation decisions and lead to data
corruption.
The correct pattern is already used in run_insert_range():
CLST lcn2 = r->lcn == SPARSE_LCN ? SPARSE_LCN : (r->lcn + len1);
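The tail computation from the example above can be sketched in isolation. The struct, the sparse marker value, and the function name are hypothetical stand-ins for illustration; only the "advance lcn by the split offset unless sparse" rule comes from this message.

```c
#include <stdint.h>

#define SKETCH_SPARSE_LCN UINT64_MAX /* stand-in for SPARSE_LCN */

struct run_sketch {
	uint64_t vcn;
	uint64_t lcn;
	uint64_t len;
};

/*
 * Compute the tail left after removing [rm_vcn, rm_vcn + rm_len)
 * from the middle of run r. The caller guarantees the removed range
 * lies strictly inside r.
 */
static struct run_sketch sketch_split_tail(struct run_sketch r,
					   uint64_t rm_vcn, uint64_t rm_len)
{
	struct run_sketch tail;
	uint64_t off = rm_vcn + rm_len - r.vcn; /* clusters skipped from run start */

	tail.vcn = rm_vcn + rm_len;
	tail.len = r.len - off;
	/* A sparse run has no physical clusters, so there is nothing to advance. */
	tail.lcn = r.lcn == SKETCH_SPARSE_LCN ? SKETCH_SPARSE_LCN : r.lcn + off;
	return tail;
}
```

For the run {vcn=0, lcn=100, len=30} and the removed range [10, 20), this yields the tail {vcn=20, lcn=120, len=10}.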
Yunpeng Tian [Mon, 4 May 2026 14:19:43 +0000 (07:19 -0700)]
fs/ntfs3: validate Dirty Page Table capacity in log_replay copy_lcns
In the analysis pass of $LogFile journal replay, log_replay() copies
LCNs from each action log record into an existing Dirty Page Table
(DPT) entry without bounding the destination index. A crafted NTFS
image with DPT entry lcns_follow=1 and an action log record with
lcns_follow=2 produces a kernel slab out-of-bounds write at mount
time:
BUG: KASAN: slab-out-of-bounds in log_replay+0x654c/0xdb60
Write of size 8 at addr ffff8880095e1040 by task mount
Two attacker-controlled fields can drive j+i past the allocated
page_lcns[] array:
1. dp->lcns_follow (capacity) can be smaller than lrh->lcns_follow.
2. lrh->target_vcn may be smaller than dp->vcn, making the u64
subtraction wrap to a huge size_t.
Validate target VCN delta and per-record LCN count against the
DPT entry capacity, bail via the existing out: cleanup label with
-EINVAL.
This mirrors the bounds-check pattern added in commit b2bc7c44ed17
("fs/ntfs3: Fix slab-out-of-bounds read in DeleteIndexEntryRoot")
and commit 0ca0485e4b2e ("fs/ntfs3: validate rec->used in
journal-replay file record check").
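A minimal sketch of the two validations described above, written as a standalone function. The parameter names and types are assumptions for illustration; the actual patch operates on the on-disk dp and lrh structures inside log_replay().

```c
#include <errno.h>
#include <stdint.h>

/*
 * Returns 0 if copying lcn_count LCNs starting at target_vcn fits in a
 * Dirty Page Table entry that starts at dp_vcn and holds dp_capacity LCNs.
 */
static int sketch_validate_copy_lcns(uint64_t dp_vcn, uint32_t dp_capacity,
				     uint64_t target_vcn, uint16_t lcn_count)
{
	uint64_t delta;

	/* Reject before subtracting, so the unsigned difference cannot wrap. */
	if (target_vcn < dp_vcn)
		return -EINVAL;

	delta = target_vcn - dp_vcn;

	/* The start index and the copied range must stay inside the entry. */
	if (delta > dp_capacity || lcn_count > dp_capacity - delta)
		return -EINVAL;

	return 0;
}
```

The case from the report, a DPT entry with capacity 1 and a record carrying 2 LCNs, is rejected by the second check.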
Zhan Xusheng [Wed, 6 May 2026 07:55:54 +0000 (15:55 +0800)]
fs/ntfs3: fix syncing wrong inode on DIRSYNC cross-directory rename
In ntfs3_rename(), when IS_DIRSYNC(new_dir) is true, the code syncs
the renamed file inode instead of the target directory new_dir:
if (IS_DIRSYNC(new_dir))
ntfs_sync_inode(inode); /* should be new_dir */
DIRSYNC requires that directory metadata changes are written to disk
synchronously. Since new_dir was modified (a new directory entry was
added), it is new_dir that must be synced to satisfy the guarantee,
not the renamed file itself.
This bug has existed since the initial ntfs3 implementation and was
carried through the refactoring in commit 78ab59fee07f
("fs/ntfs3: Rework file operations").
Fix by syncing new_dir instead of inode.
Fixes: 4342306f0f0d ("fs/ntfs3: Add file operations and implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
[BUG]
A malformed NTFS directory index entry can advertise a key_size larger
than the bytes actually present in its NTFS_DE payload. Directory lookup
then passes that malformed key to cmp_fnames(), which can read past the
end of the kmalloc'ed index buffer.
BUG: KASAN: slab-out-of-bounds in fname_full_size fs/ntfs3/ntfs.h:590 [inline]
BUG: KASAN: slab-out-of-bounds in cmp_fnames+0x1ea/0x230 fs/ntfs3/index.c:46
Read of size 1 at addr ffff88801c313018 by task syz.6.3365/9279
[CAUSE]
The index-header validators only validated INDEX_HDR-level geometry.
They did not walk each NTFS_DE to verify entry alignment, subnode
layout, or that key_size fit inside the entry payload. They also
allowed a last sentinel entry to carry a non-zero key_size.
[FIX]
Walk every NTFS_DE in ntfs3's index-header validators and reject
entries with invalid layout, mismatched subnode state, oversized
key_size, or non-zero sentinel keys before lookup or log replay can
consume them.
Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
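The per-entry checks listed in the [FIX] section above can be sketched as a single predicate. The 16-byte entry header, the 8-byte alignment, the 8-byte trailing subnode VCN, and the flag values are assumptions about the NTFS on-disk layout made for this illustration; the names are hypothetical and this is not the patch's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DE_HDR        16u  /* assumed entry header size */
#define SKETCH_DE_HAS_SUBNODE 0x1u /* assumed flag: entry ends with a VCN */
#define SKETCH_DE_LAST        0x2u /* assumed flag: sentinel entry */

/* Validate one index entry against the bytes left in the index buffer. */
static bool sketch_de_is_valid(uint16_t size, uint16_t key_size,
			       uint16_t flags, size_t bytes_left)
{
	size_t payload;

	/* The entry must hold its header, fit the buffer, and be 8-byte aligned. */
	if (size < SKETCH_DE_HDR || size > bytes_left || (size & 7u))
		return false;

	payload = size - SKETCH_DE_HDR;

	/* A subnode entry reserves its last 8 bytes for the child VCN. */
	if (flags & SKETCH_DE_HAS_SUBNODE) {
		if (payload < 8)
			return false;
		payload -= 8;
	}

	/* The declared key must fit inside what is left. */
	if (key_size > payload)
		return false;

	/* The sentinel entry carries no key. */
	if ((flags & SKETCH_DE_LAST) && key_size != 0)
		return false;

	return true;
}
```

A validator would call such a predicate on each entry while walking the index header, stopping at the first entry flagged as last.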
fs/ntfs3: preserve non-DOS attribute bits in system.dos_attrib
[BUG]
A corrupted ntfs3 image can hit a NULL function pointer call in
generic_perform_write() after toggling system.ntfs_attrib and then
overwriting system.dos_attrib on the same file.
[CAUSE]
system.ntfs_attrib updates ATTR_DATA flags via ni_new_attr_flags()
and switches i_mapping->a_ops to ntfs_aops_cmpr when
FILE_ATTRIBUTE_COMPRESSED is set. system.dos_attrib then overwrites
ni->std_fa from a one-byte DOS attribute value, clearing the compression
bit without updating ATTR_DATA or the mapping operations.
Old buffered writes use is_compressed(ni) to choose
__generic_file_write_iter(). That leaves generic_perform_write() calling
a NULL write_begin callback from ntfs_aops_cmpr.
[FIX]
Treat system.dos_attrib as a low-byte DOS attribute update and preserve the
existing non-DOS attribute bits in ni->std_fa. This keeps compressed and
sparse state consistent with ATTR_DATA and the mapping operations while
keeping the existing DOS attribute semantics intact.
Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
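The low-byte update described in the [FIX] section above can be sketched as a pure function. The constant values follow the usual Windows file attribute definitions and are stated here as assumptions; the function name is hypothetical.

```c
#include <stdint.h>

#define SKETCH_FILE_ATTRIBUTE_READONLY    0x00000001u /* assumed value */
#define SKETCH_FILE_ATTRIBUTE_ARCHIVE     0x00000020u /* assumed value */
#define SKETCH_FILE_ATTRIBUTE_SPARSE_FILE 0x00000200u /* assumed value */
#define SKETCH_FILE_ATTRIBUTE_COMPRESSED  0x00000800u /* assumed value */

/* Replace only the low (DOS) byte and keep every higher attribute bit. */
static uint32_t sketch_apply_dos_attrib(uint32_t std_fa, uint8_t dos_attrib)
{
	return (std_fa & ~0xffu) | dos_attrib;
}
```

Writing a DOS value of 0x01 over a compressed archive file keeps the compressed bit set, whereas assigning the byte directly would have cleared it.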
fs/ntfs3: hold ni_lock across readdir metadata walk
[BUG]
KASAN reports a slab-use-after-free during getdents(2):
BUG: KASAN: slab-use-after-free in ntfs_read_mft fs/ntfs3/inode.c:79 [inline]
BUG: KASAN: slab-use-after-free in ntfs_iget5+0x59b/0x3450 fs/ntfs3/inode.c:541
Read of size 2 at addr ffff88800b7a5a4e by task syz.0.1061/2354
The faulting address sits 590 bytes inside a freed kmalloc-1k object
allocated by ni_add_subrecord() and freed from ni_write_inode()
writeback.
[CAUSE]
ntfs_readdir() loads all subrecords once, but then drops ni_lock()
before it starts walking the directory metadata through ntfs_read_hdr().
That leaves the current NTFS_DE pointer backed by parent-directory
subrecord memory that concurrent writeback is still allowed to compact
and free.
The later ntfs_dir_emit() -> ntfs_iget5() call exposes the stale e->ref,
but the lifetime bug starts earlier: readdir is still consuming
parent-directory metadata after releasing the lock that protects it.
[FIX]
Keep ni_lock() held from the point where ntfs_readdir() starts
consuming the directory metadata until the walk over root/index entries
is finished.
This closes the parent-directory lifetime hole directly and keeps the
existing readdir d_type behaviour unchanged.
Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
In file included from fs/ntfs3/index.c:15:
fs/ntfs3/index.c: In function 'indx_add_allocate':
fs/ntfs3/ntfs_fs.h:463:9: error: 'bmp_size' may be used uninitialized in this function [-Werror=maybe-uninitialized]
463 | return attr_set_size_ex(ni, type, name, name_len, run, new_size,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
464 | new_valid, keep_prealloc, NULL, false);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/ntfs3/index.c:1498:6: note: 'bmp_size' was declared here
1498 | u64 bmp_size, bmp_size_v;
| ^~~~~~~~
The warning does look correct, as the 'out2' label can be reached
without initializing bmp_size and bmp_size_v. Initialize these at
the same place as bmp.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
fs/ntfs3: add bounds check to run_get_highest_vcn()
run_get_highest_vcn() parses a packed NTFS mapping-pairs buffer without
any length bound, relying solely on a 0x00 terminator to stop. A
crafted $LogFile UpdateMappingPairs record whose embedded attribute
contains mapping-pairs runs without a terminator causes the function to
read past the slab allocation, triggering a KASAN slab-out-of-bounds
read on mount.
The sibling function run_unpack() received an analogous bounds-check in
commit b62567bca474 ("ntfs3: add buffer boundary checks to run_unpack()"),
but run_get_highest_vcn() was missed.
Take a run_buf_size parameter and reject any run header whose payload
would extend past the buffer end, mirroring the pattern used by
run_unpack(). The caller in fslog.c passes the remaining attribute
bytes after the mapping-pairs offset.
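A bounded walk over a mapping-pairs buffer can be sketched as follows. It assumes the standard NTFS encoding in which the low nibble of each header byte gives the length-field size and the high nibble gives the offset-field size; the function name and signature are hypothetical and do not match the kernel's run_get_highest_vcn().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Walk packed runs in buf, never reading at or beyond buf + buf_size. */
static int sketch_highest_vcn(uint64_t start_vcn, const uint8_t *buf,
			      size_t buf_size, uint64_t *highest)
{
	uint64_t vcn = start_vcn;
	size_t pos = 0;

	while (pos < buf_size && buf[pos] != 0) {
		unsigned int len_bytes = buf[pos] & 0x0fu;
		unsigned int off_bytes = buf[pos] >> 4;
		uint64_t len = 0;
		unsigned int i;

		if (len_bytes == 0 || len_bytes > 8 || off_bytes > 8)
			return -EINVAL;

		/* Reject a run whose payload would extend past the buffer end. */
		if ((size_t)1 + len_bytes + off_bytes > buf_size - pos)
			return -EINVAL;

		/* The run length is stored little-endian after the header byte. */
		for (i = 0; i < len_bytes; i++)
			len |= (uint64_t)buf[pos + 1 + i] << (8 * i);
		if (len == 0)
			return -EINVAL;

		vcn += len;
		pos += (size_t)1 + len_bytes + off_bytes;
	}

	/* Running off the end means the 0x00 terminator was missing. */
	if (pos >= buf_size)
		return -EINVAL;

	*highest = vcn - 1;
	return 0;
}
```

The key difference from a terminator-only loop is that both the header byte and the run payload are checked against buf_size before any byte is read.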
KASAN report (on mainline v7.1 merge window HEAD):
BUG: KASAN: slab-out-of-bounds in run_get_highest_vcn+0x3c0/0x410
Read of size 1 at addr ffff88800e2d5400 by task mount/72
Call Trace:
run_get_highest_vcn+0x3c0/0x410
do_action.isra.0+0x3ba8/0x7b50
log_replay+0x9ddd/0x10200
ntfs_loadlog_and_replay+0x4ad/0x610
ntfs_fill_super+0x214a/0x4540
Fixes: b62567bca474 ("ntfs3: add buffer boundary checks to run_unpack()")
Signed-off-by: Jaeyeong Lee <lee@jaeyeong.cc>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cássio Gabriel [Fri, 22 May 2026 02:30:07 +0000 (23:30 -0300)]
ASoC: rockchip: i2s: Use managed hclk and runtime PM cleanup
The Rockchip I2S driver mixes devm-managed probe resources with manual
runtime PM and hclk cleanup. This leaves the remove path doing runtime PM
shutdown and clock disable before devm-managed ASoC and PCM resources are
released.
Keep the bus clock enabled for the device lifetime with
devm_clk_get_enabled(), and move the runtime PM teardown into devres so the
unwind order matches the managed registrations. This also removes the
remove callback, which only existed for cleanup.
Use a devm action for the final runtime suspend and register it before the
managed runtime PM action, so teardown disables runtime PM before forcing
the device into the suspended state.
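The ordering argument above relies on managed actions being released in reverse order of registration. The following userspace sketch models only that last-in, first-out behaviour; it does not use the kernel devres API, and every name in it is hypothetical.

```c
#include <stddef.h>

#define SKETCH_MAX_ACTIONS 8

enum sketch_action {
	SKETCH_NONE = 0,
	SKETCH_FORCE_RUNTIME_SUSPEND, /* final suspend of the device */
	SKETCH_DISABLE_RUNTIME_PM,    /* managed runtime PM disable */
};

struct devres_sketch {
	enum sketch_action actions[SKETCH_MAX_ACTIONS];
	size_t count;
};

/* Register a cleanup action; later registrations run first at teardown. */
static int sketch_add_action(struct devres_sketch *d, enum sketch_action a)
{
	if (d->count >= SKETCH_MAX_ACTIONS)
		return -1;
	d->actions[d->count++] = a;
	return 0;
}

/* Return the next action to run during teardown, or SKETCH_NONE. */
static enum sketch_action sketch_unwind_next(struct devres_sketch *d)
{
	return d->count ? d->actions[--d->count] : SKETCH_NONE;
}
```

Registering the suspend action first and the runtime PM action second means teardown disables runtime PM first and only then forces the suspended state.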
ASoC: cs35l56: Share common SoundWire interrupt enable/disable code
Move the duplicated SoundWire interrupt enable/disable code into shared
functions. These new functions are in cs35l56.c to prevent a circular
dependency between cs35l56.c and cs35l56-sdw.c.
Make sure _kvm_s390_pv_make_secure() takes the pte lock for the given
address when attempting to make the page secure.
One of the steps in making the page secure is freezing the folio using
folio_ref_freeze(), which temporarily sets the reference count to 0.
Any attempt to get such a folio while frozen will fail and cause a
warning to be printed.
Other users of folio_ref_freeze() make sure that the page is not mapped
while it's being frozen, thus preventing gup functions from being able
to access it. For _kvm_s390_pv_make_secure(), this is not possible,
because the page needs to be mapped in order for the import to succeed.
By taking the pte lock, gup functions will be blocked until the import
operation is done, thus avoiding the race.
In theory this does not completely solve the issue: if a page is mapped
through multiple mappings, locking one pte does not protect from
calling gup on it through the other mapping. In practice this does not
happen and it is a decent stopgap solution until a more correct
solution is available.
Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-8-imbrenda@linux.ibm.com>
Fix the fault-in code so that it does not return success if a
concurrent unmap event invalidated the fault-in process between the
best-effort lockless check and the proper check with lock.
The new behaviour is to retry, like the best-effort lockless check
already did.
This prevents the fault-in handler from returning success without
having actually faulted in the requested page.
KVM: s390: vsie: Fix rmap handling in _do_shadow_crste()
Fix _do_shadow_crste() to also apply a mask on the reverse address, to
prevent spurious entries from being created, like already done in
gmap_protect_rmap().
Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-6-imbrenda@linux.ibm.com>
KVM: s390: Avoid potentially sleeping while atomic when zapping pages
Factor out try_get_locked_pte(), which behaves similarly to
get_locked_pte(), but does not attempt to allocate missing tables and
performs a spin_trylock() instead of blocking.
The new function is also exported, since it will be used in other
patches.
If intermediate entries are missing, there can be no pte swap entry to
free, so it's safe to ignore them.
This avoids potentially sleeping while atomic.
Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-4-imbrenda@linux.ibm.com>
The previous incorrect behaviour cleared the vsie_notif bit without
returning false, which allowed shadow crstes to be installed without
the vsie_notif bit.
Return false and do not perform the operation if an unshadow event has
been triggered, but still attempt to clear the vsie_notif bit from the
existing crste.
This will prevent the installation of shadow crstes without vsie_notif
bit and will also prevent the caller from looping forever if it was
not checking for the sg->invalidated flag.
In _gmap_unmap_crste(), the crste to be unmapped is zapped calling
gmap_crstep_xchg_atomic() exactly once, and expecting it to succeed.
This is a reasonable sanity check, since kvm->mmu_lock is being held in
write mode, and thus no races should be possible.
An upcoming patch will change the behaviour of gmap_crstep_xchg_atomic()
to return false and clear the vsie_notif bit if the operation triggers
an unshadow operation. With the new behaviour, an unmap operation that
triggers an unshadow would cause the VM to be killed.
Prepare for the change by checking if the vsie_notif bit was set in
the old crste if gmap_crstep_xchg_atomic() fails the first time, and
try a second time. The second time no failures are allowed.
Steven Rostedt [Mon, 1 Jun 2026 17:07:46 +0000 (13:07 -0400)]
tracing/eprobes: Allow use of BTF names to dereference pointers
Add syntax to the parsing of eprobes to be able to typecast a trace event
field that is a pointer to a structure.
Currently, a dereference must be a number, where the user has to figure
out manually the offset of a member of a structure that they want to
dereference.
But for event probes that record a field that happens to be a pointer to
a structure, these values cannot be dereferenced with BTF naming;
numerical offsets must be used instead.
For example, to find out what device a sk_buff is pointing to in the
net_dev_xmit trace event, one must first use gdb to find the offsets of the
members of the structures:
Rodrigo Alencar [Sun, 24 May 2026 10:17:08 +0000 (11:17 +0100)]
iio: dac: ad5686: create bus ops struct
Create struct with bus operations, which will be used to extend bus
implementation features. Auxiliary functions ad5686_write() and
ad5686_read() are created and ad5686_probe() now receives an ops struct
pointer rather than individual read and write functions.
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Rodrigo Alencar [Sun, 24 May 2026 10:17:07 +0000 (11:17 +0100)]
iio: dac: ad5686: cleanup doc header of local structs
Review documentation comment header for ad5686_chip_info and ad5686_state.
Update variable names and description and remove unnecessary blank line
between comment and struct declaration.
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Rodrigo Alencar [Sun, 24 May 2026 10:17:06 +0000 (11:17 +0100)]
iio: dac: ad5686: add control_sync() for single-channel devices
Create ad5310_control_sync() and ad5683_control_sync() functions that
properly consume the mask definitions with FIELD_PREP(). This allows
reusing a function that updates the control register with cached values,
without relying on confusing logic that depends on st->use_internal_vref,
which is initialized earlier in ad5686_probe() because it is also
applicable to the AD5686_REGMAP case, removing the need for
has_external_vref. Powerdown mask initialization is simplified, as
*_control_sync() masks out any unused bits for the single-channel case.
The change cleans up ad5686_write_dac_powerdown() and ad5686_probe(),
organizing the code for feature extension, e.g. gain control support for
single-channel devices.
Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Rodrigo Alencar [Sun, 24 May 2026 10:17:05 +0000 (11:17 +0100)]
iio: dac: ad5686: add helpers to handle powerdown masks
Add ad5686_pd_field_set() and ad5686_pd_field_get() helpers to cleanup
powerdown mask control. Define AD5686_PD_* constants, e.g. AD5686_PD_MSK
to hold powerdown mask value for a single channel. AD5686_LDAC_PWRDN_*
macros are replaced by AD5686_PD_MODE_*, because they are unused and the
LDAC feature for async load of DAC channel values is not related to power
down control.
Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Rodrigo Alencar [Sun, 24 May 2026 10:17:03 +0000 (11:17 +0100)]
iio: dac: ad5686: drop enum id
Split chip info table into separate structs and expose them to the SPI
and I2C drivers. That is the preferable approach and allows for the drivers
to have knowledge of the device info before the common probe function gets
called. Those chip info structs may be shared by SPI and I2C driver
variants.
Channel declaration definitions are grouped according to channel count and
DECLARE_AD5693_CHANNELS() macro is renamed to DECLARE_AD5683_CHANNELS() to
match the regmap_type enum.
Use spi_get_device_match_data() and i2c_get_match_data() to get chip info
struct reference, passing it as parameter to the core probe function.
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
AD5683_REGMAP and AD5693_REGMAP behave the same way in the common code,
and that is because they target single channel devices from the same
sub-family. There is no reason to separate them and it will make things
simpler when refactoring the chip info table.
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Antoniu Miclaus [Tue, 2 Jun 2026 08:40:19 +0000 (11:40 +0300)]
iio: adc: ad4080: fix AD4880 chip ID
The AD4880 chip ID was incorrectly set to 0x0750. According to the
datasheet, the product ID registers read 0x00 (PRODUCT_ID_H) and 0x59
(PRODUCT_ID_L), giving a combined chip ID of 0x0059. Fix the value to
match the actual hardware.
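The combined value follows directly from the two register reads quoted above. A minimal sketch, with a hypothetical helper name:

```c
#include <stdint.h>

/* Combine PRODUCT_ID_H (high byte) and PRODUCT_ID_L (low byte). */
static uint16_t sketch_combine_product_id(uint8_t id_h, uint8_t id_l)
{
	return (uint16_t)(((uint16_t)id_h << 8) | id_l);
}
```

With PRODUCT_ID_H = 0x00 and PRODUCT_ID_L = 0x59 this gives 0x0059, which differs from the previous 0x0750.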
Extract the SHA-384 hash, RSA public key, and RSA signature from the
FMC ELF32 firmware sections. FSP Chain of Trust verification needs
these to validate the FMC image during boot.
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260602032111.224790-14-jhubbard@nvidia.com
[acourbot: derive `Zeroable` on `FmcSignature` for in-place initialization]
Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Hopper and Blackwell use FSP instead of SEC2 for secure boot. The
driver must wait for FSP secure boot to complete before continuing
with GSP bring-up. Poll for boot success with a 5-second timeout, and
return the FSP interface only on success so that later Chain of Trust
operations cannot run before FSP is ready. The interface owns the FSP
falcon and the FMC firmware.
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260602032111.224790-13-jhubbard@nvidia.com
[acourbot: use `inspect_err` instead of `map_err` and display actual error]
[acourbot: limit visibility of `fsp_hal` to `super`]
Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
FSP is the Falcon that runs FMC firmware on Hopper and Blackwell.
Load the FMC ELF in two forms: the image section that FSP boots from,
and the full Firmware object for later signature extraction during
Chain of Trust verification. Declare the FMC image in the module's
firmware table so it is bundled for FSP-based chipsets.
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260602032111.224790-12-jhubbard@nvidia.com
Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Add the FSP (Foundation Security Processor) falcon engine type that
will handle secure boot and Chain of Trust operations on Hopper and
Blackwell architectures.
The FSP falcon replaces SEC2's role in the boot sequence for these newer
architectures. This initial stub just defines the falcon type and its
base address.
John Hubbard [Tue, 2 Jun 2026 03:20:57 +0000 (20:20 -0700)]
gpu: nova-core: add auto-detection of 32-bit, 64-bit firmware images
A firmware image may be either a 32-bit or a 64-bit ELF, and callers
should not have to know which. Detect the ELF class from the image
header at parse time and dispatch to the matching parser, so a single
entry point handles both layouts.
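Class detection of this kind can be sketched in C as shown below. The driver itself is written in Rust, so this is only an illustration of the idea; it relies on the ELF identification bytes (magic at offsets 0 to 3, class at offset 4, with 1 for 32-bit and 2 for 64-bit), and the enum and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum sketch_elf_class {
	SKETCH_ELF_INVALID = 0,
	SKETCH_ELF32 = 32,
	SKETCH_ELF64 = 64,
};

/* Inspect the identification bytes and report which parser applies. */
static enum sketch_elf_class sketch_detect_elf_class(const uint8_t *img,
						      size_t len)
{
	static const uint8_t magic[4] = { 0x7f, 'E', 'L', 'F' };

	if (len < 5 || memcmp(img, magic, sizeof(magic)) != 0)
		return SKETCH_ELF_INVALID;

	switch (img[4]) { /* EI_CLASS */
	case 1:
		return SKETCH_ELF32;
	case 2:
		return SKETCH_ELF64;
	default:
		return SKETCH_ELF_INVALID;
	}
}
```

A single entry point can then switch on the returned class and hand the image to the matching 32-bit or 64-bit section parser.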
John Hubbard [Tue, 2 Jun 2026 03:20:56 +0000 (20:20 -0700)]
gpu: nova-core: add support for 32-bit firmware images
Some GPU firmware images are packaged as 32-bit ELF rather than 64-bit.
Add a 32-bit implementation of the shared ELF section-parsing
abstraction so those images can be parsed alongside the existing 64-bit
path.
Introduce a single ELF format abstraction that ties each ELF header
type to its matching section-header type. This keeps the shared
section parser ready for upcoming ELF32 support and avoids mixing
32-bit and 64-bit ELF layouts by mistake.
John Hubbard [Tue, 2 Jun 2026 03:20:54 +0000 (20:20 -0700)]
gpu: nova-core: Blackwell: use correct sysmem flush registers
Blackwell GPUs moved the sysmem flush page registers away from the
Ampere/Ada location. GB10x routes the flush through a pair of HSHUB0
register sets (primary and egress) that must both be programmed to
the same address. GB20x routes it through FBHUB0.
Define these registers relative to their HSHUB0 and FBHUB0 bases, as
Open RM does, and implement the flush paths in the GB10x and GB20x
framebuffer HALs.
The GSP-RM boot working memory portion of the WPR2 heap must be
larger on Hopper and later GPUs than on Turing, Ampere, and Ada.
Select the larger value for those generations.
Hopper and Blackwell need a larger non-WPR heap than the 1 MiB that
earlier architectures use. Hopper and Blackwell GB10x need 2 MiB, while
Blackwell GB20x needs 2 MiB + 128 KiB. These sizes diverge by family,
so give Hopper and each Blackwell family its own framebuffer HAL and
select the non-WPR heap size per chipset family.
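The per-family selection can be sketched as a small lookup. The sizes are the ones stated above; the enum and function names are hypothetical, and the sketch is in C for illustration although the driver is written in Rust.

```c
#include <stdint.h>

#define SKETCH_KIB 1024u
#define SKETCH_MIB (1024u * SKETCH_KIB)

enum sketch_family {
	SKETCH_PRE_HOPPER,      /* earlier architectures */
	SKETCH_HOPPER,
	SKETCH_BLACKWELL_GB10X,
	SKETCH_BLACKWELL_GB20X,
};

/* Non-WPR heap size in bytes for a chipset family. */
static uint32_t sketch_non_wpr_heap_size(enum sketch_family family)
{
	switch (family) {
	case SKETCH_HOPPER:
	case SKETCH_BLACKWELL_GB10X:
		return 2u * SKETCH_MIB;
	case SKETCH_BLACKWELL_GB20X:
		return 2u * SKETCH_MIB + 128u * SKETCH_KIB;
	case SKETCH_PRE_HOPPER:
	default:
		return 1u * SKETCH_MIB;
	}
}
```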
GSP boot needs to know how much framebuffer memory is reserved for
the PMU. Compute it per architecture: Blackwell dGPUs reserve a
non-zero amount, earlier architectures leave it at zero, matching
Open RM behavior.
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260602032111.224790-4-jhubbard@nvidia.com
Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
John Hubbard [Tue, 2 Jun 2026 03:20:50 +0000 (20:20 -0700)]
gpu: nova-core: Hopper/Blackwell: new location for PCI config mirror
Hopper and Blackwell GPUs moved the PCI config space mirror from
0x088000 to 0x092000. Select the correct address per architecture
when building the GSP system info command.
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260602032111.224790-3-jhubbard@nvidia.com
Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
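The address selection described in the message above amounts to a two-way choice. A minimal C sketch using the two addresses it quotes; the enum and function names are hypothetical, and the driver itself implements this in Rust.

```c
#include <stdint.h>

enum sketch_arch {
	SKETCH_ARCH_TURING,
	SKETCH_ARCH_AMPERE,
	SKETCH_ARCH_ADA,
	SKETCH_ARCH_HOPPER,
	SKETCH_ARCH_BLACKWELL,
};

/* Base of the PCI config space mirror for a GPU architecture. */
static uint32_t sketch_pci_config_mirror_base(enum sketch_arch arch)
{
	switch (arch) {
	case SKETCH_ARCH_HOPPER:
	case SKETCH_ARCH_BLACKWELL:
		return 0x092000u;
	default:
		return 0x088000u;
	}
}
```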
John Hubbard [Tue, 2 Jun 2026 03:20:49 +0000 (20:20 -0700)]
gpu: nova-core: set DMA mask width based on GPU architecture
Replace the hardcoded 47-bit DMA mask with a GPU HAL method that
provides the correct value for the architecture.
Set the DMA mask in Gpu::new(). Gpu owns all DMA allocations for
the device, so no concurrent allocations can exist while the
constructor is still running.
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260602032111.224790-2-jhubbard@nvidia.com
Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Yu Kuai [Mon, 1 Jun 2026 06:15:02 +0000 (14:15 +0800)]
block, bfq: release cgroup stats with bfq_group
BFQ cgroup stats contain percpu counters embedded in struct bfq_group,
but the old free path destroys them from bfq_pd_free(), which is tied
to blkg policy-data teardown.
That is not the same lifetime as struct bfq_group. BFQ pins bfq_group
while bfq_queue entities refer to it, so bfq_pd_free() can drop the
policy-data reference while other bfq_group references still exist. The
following blkcg change also defers policy-data release through RCU and
leaves BFQ to run the final bfqg_put() from an RCU callback. For that
conversion, stats teardown must belong to the last bfq_group put, not to
policy-data teardown.
Move stats teardown to bfqg_put() so the embedded counters are destroyed
exactly when the last bfq_group reference is released, before kfree(bfqg).
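The lifetime rule above can be sketched with a plain reference count. This is a userspace model under simplifying assumptions (a non-atomic counter and boolean flags in place of percpu counters and kfree()); all names are hypothetical and it is not the BFQ code.

```c
#include <stdbool.h>

struct group_sketch {
	int ref;          /* references held on the group */
	bool stats_alive; /* models the embedded percpu counters */
	bool freed;       /* models kfree() of the group */
};

/* Drop one reference; tear down stats only with the last one. */
static void sketch_group_put(struct group_sketch *g)
{
	if (--g->ref > 0)
		return;
	g->stats_alive = false; /* destroy counters first */
	g->freed = true;        /* then release the group itself */
}

/* Policy-data teardown only drops its own reference. */
static void sketch_pd_free(struct group_sketch *g)
{
	sketch_group_put(g);
}
```

If a queue still holds a reference when policy data is torn down, the stats stay valid until that queue drops its reference.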
Without this preparatory change, the RCU-delayed policy-data free
conversion reproduced the following KASAN report:
BUG: KASAN: slab-use-after-free in percpu_counter_destroy_many+0xf1/0x2e0
Write of size 8 at addr ffff88811d9409e0 by task test_blkcg/535
Last potentially related work creation:
kasan_save_stack+0x3e/0x60
kasan_record_aux_stack+0x99/0xb0
call_rcu+0x55/0x5c0
blkg_free_workfn+0x130/0x220
process_scheduled_works+0x655/0xb60
worker_thread+0x446/0x600
kthread+0x1f4/0x230
ret_from_fork+0x259/0x420
ret_from_fork_asm+0x1a/0x30