git.ipfire.org Git - thirdparty/kernel/stable.git/log
3 weeks ago selftests/bpf: Test that exclusive maps are rejected as iter targets
Daniel Borkmann [Tue, 2 Jun 2026 13:30:52 +0000 (15:30 +0200)] 
selftests/bpf: Test that exclusive maps are rejected as iter targets

Add a subtest to map_excl that creates an exclusive map and verifies a
bpf_map_elem iterator cannot be attached to it, which would otherwise
let an unrelated program read and overwrite the map's contents through
the iterator's writable value buffer.

  # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t map_excl
  [...]
  ./test_progs -t map_excl
  [    1.704382] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.706068] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  #215/1   map_excl/map_excl_allowed:OK
  #215/2   map_excl/map_excl_denied:OK
  #215/3   map_excl/map_excl_no_map_in_map:OK
  #215/4   map_excl/map_excl_no_map_iter:OK
  #215     map_excl:OK
  Summary: 1/4 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260602133052.423725-5-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 weeks ago selftests/bpf: Keep verifier_map_ptr exercising ops pointer access
Daniel Borkmann [Tue, 2 Jun 2026 13:30:51 +0000 (15:30 +0200)] 
selftests/bpf: Keep verifier_map_ptr exercising ops pointer access

Sashiko complained that 38498c0ebacd ("selftests/bpf: Adjust verifier_map_ptr
for the map's excl field") slightly decreases test coverage, given that
the test previously exercised the verifier's handling of ops pointer
access. Recover the old test with the right offsets and keep the current
one as an additional test case.

  # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_map_ptr
  [    1.672932] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  #637/1   verifier_map_ptr/bpf_map_ptr: read with negative offset rejected:OK
  #637/2   verifier_map_ptr/bpf_map_ptr: read with negative offset rejected @unpriv:OK
  #637/3   verifier_map_ptr/bpf_map_ptr: write rejected:OK
  #637/4   verifier_map_ptr/bpf_map_ptr: write rejected @unpriv:OK
  #637/5   verifier_map_ptr/bpf_map_ptr: read non-existent field rejected:OK
  #637/6   verifier_map_ptr/bpf_map_ptr: read non-existent field rejected @unpriv:OK
  #637/7   verifier_map_ptr/bpf_map_ptr: read beyond excl field rejected:OK
  #637/8   verifier_map_ptr/bpf_map_ptr: read beyond excl field rejected @unpriv:OK
  #637/9   verifier_map_ptr/bpf_map_ptr: read ops field accepted:OK
  #637/10  verifier_map_ptr/bpf_map_ptr: read ops field accepted @unpriv:OK
  #637/11  verifier_map_ptr/bpf_map_ptr: r = 0, map_ptr = map_ptr + r:OK
  #637/12  verifier_map_ptr/bpf_map_ptr: r = 0, map_ptr = map_ptr + r @unpriv:OK
  #637/13  verifier_map_ptr/bpf_map_ptr: r = 0, r = r + map_ptr:OK
  #637/14  verifier_map_ptr/bpf_map_ptr: r = 0, r = r + map_ptr @unpriv:OK
  #637     verifier_map_ptr:OK
  [...]
  Summary: 2/20 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260602133052.423725-4-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 weeks ago libbpf: Guard add_data() against size overflow
Daniel Borkmann [Tue, 2 Jun 2026 13:30:50 +0000 (15:30 +0200)] 
libbpf: Guard add_data() against size overflow

add_data() computes size8 = roundup(size, 8) and then hands size8 to
realloc_data_buf() before doing memcpy(gen->data_cur, data, size) with
the original size. A wrapped size8 passes through the realloc_data_buf()
INT32_MAX check. Harden this against overflow, although it is unlikely
to happen in practice.
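
For illustration, a minimal sketch of this kind of guard, assuming 32-bit
unsigned sizes. The helper name round_up8_checked() is hypothetical and
this is not the libbpf code itself:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the libbpf implementation: round size up to a
 * multiple of 8 and report failure if the result would wrap around.
 */
static bool round_up8_checked(uint32_t size, uint32_t *size8)
{
	/* Sizes above UINT32_MAX - 7 would wrap to a small value when rounded. */
	if (size > UINT32_MAX - 7)
		return false;

	*size8 = (size + 7) & ~(uint32_t)7;
	return true;
}
```

A caller would bail out on a false return before growing the buffer, so
the later memcpy() with the original size can never exceed what was
allocated.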

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260602133052.423725-3-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 weeks ago bpf: Reject exclusive maps for bpf_map_elem iterators
Daniel Borkmann [Tue, 2 Jun 2026 13:30:49 +0000 (15:30 +0200)] 
bpf: Reject exclusive maps for bpf_map_elem iterators

Exclusive maps (aka excl_prog_hash) are meant to be reachable only
from the single program whose hash matches. This is enforced by
check_map_prog_compatibility() when the map is referenced from a
program such as signed BPF loaders.

A bpf_map_elem iterator, however, binds its target map at attach
time in bpf_iter_attach_map() instead of referencing it from the
program, so the exclusivity check is never reached. On top of that,
the iterator exposes the map value as a writable buffer.
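
A toy model of an attach-time check, for illustration only. The toy_map
structure, the toy_iter_attach_map() name and the -EPERM return value
are assumptions of this sketch, not the kernel code:

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a BPF map; only the field relevant here is modelled. */
struct toy_map {
	const unsigned char *excl_prog_hash; /* NULL when the map is not exclusive */
};

/*
 * Hypothetical attach-time check: an iterator binds the map directly, so
 * the exclusivity test has to happen here rather than at program load.
 * The -EPERM return value is an assumption of this sketch.
 */
static int toy_iter_attach_map(const struct toy_map *map)
{
	if (map->excl_prog_hash)
		return -EPERM;
	return 0;
}
```

The point of the sketch is only where the check sits: on the path that
binds the map, since the program-reference path is never taken.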

Fixes: baefdbdf6812 ("bpf: Implement exclusive map creation")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260602133052.423725-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 weeks ago Merge tag 'mm-hotfixes-stable-2026-06-01-20-58' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Tue, 2 Jun 2026 15:59:35 +0000 (08:59 -0700)] 
Merge tag 'mm-hotfixes-stable-2026-06-01-20-58' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM fixes from Andrew Morton:
 "13 hotfixes. All are for MM. 10 are cc:stable and the remaining 3
  address post-7.1 issues or aren't considered suitable for backporting.

  There's a three-patch series "userfaultfd: verify VMA state across
  UFFDIO_COPY retry" from Mike Rapoport which fixes a few uffd things.
  The rest are singletons - please see the individual changelogs for
  details"

* tag 'mm-hotfixes-stable-2026-06-01-20-58' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  userfaultfd: remove redundant check in vm_uffd_ops()
  userfaultfd: refuse to __mfill_atomic_pte() for unsupported VMAs
  userfaultfd: verify VMA state across UFFDIO_COPY retry
  mm/huge_memory: update file PMD counter before folio_put()
  mm/huge_memory: update file PUD counter before folio_put()
  mm/hugetlb_vmemmap: fix incorrect vmemmap restore in rollback
  mm/damon/ops-common: call folio_test_lru() after folio_get()
  mm/cma: fix reserved page leak on activation failure
  mm/memory-failure: fix hugetlb_lock AA deadlock in get_huge_page_for_hwpoison
  mm/hugetlb: restore reservation on error in hugetlb folio copy paths
  mm/cma_debug: fix invalid accesses for inactive CMA areas
  memcg: use round-robin victim selection in refill_stock
  mm/hugetlb: avoid false positive lockdep assertion

3 weeks ago dt-bindings: arm-smmu: Correct and add constraints for Hawi, Shikra and Kaanapali
Krzysztof Kozlowski [Wed, 20 May 2026 11:09:14 +0000 (13:09 +0200)] 
dt-bindings: arm-smmu: Correct and add constraints for Hawi, Shikra and Kaanapali

Previous commit 75949eb02653 ("dt-bindings: arm-smmu: Constrain clocks
for newer Qualcomm variants") duplicated constraints for
qcom,sm6350-smmu-500 and qcom,sm6375-smmu-500 - these are already part
of the previous "if:" block.

It also missed enforcing one clock for qcom,kaanapali-smmu-500 in the GPU
case and missed Shikra and Hawi, which were added at the same time.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Will Deacon <will@kernel.org>
3 weeks ago dt-bindings: arm-smmu: Add compatible for Qualcomm Nord SoC
Shawn Guo [Tue, 19 May 2026 01:39:50 +0000 (09:39 +0800)] 
dt-bindings: arm-smmu: Add compatible for Qualcomm Nord SoC

Document Applications Processor Subsystem (APSS) SMMU on Qualcomm
Nord SoC.

Signed-off-by: Shawn Guo <shengchao.guo@oss.qualcomm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Will Deacon <will@kernel.org>
3 weeks ago accel/amdxdna: Preserve user address when PASID is disabled
Lizhi Hou [Tue, 2 Jun 2026 04:06:24 +0000 (21:06 -0700)] 
accel/amdxdna: Preserve user address when PASID is disabled

When PASID is not used, the buffer user address is set to
AMDXDNA_INVALID_ADDR. As a result, heap buffer user address validation
fails even though the original userspace address is available.

Preserve the userspace address regardless of PASID usage so heap buffer
address validation works correctly.

Fixes: dbc8fd7a03cb ("accel/amdxdna: Add expandable device heap support")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260602040624.2206774-1-lizhi.hou@amd.com
3 weeks ago KVM: x86: Take PIC lock on KVM_GET_IRQCHIP path
Carlos López [Fri, 29 May 2026 14:00:14 +0000 (16:00 +0200)] 
KVM: x86: Take PIC lock on KVM_GET_IRQCHIP path

When userspace issues the KVM_SET_IRQCHIP ioctl to set the state of
the PIC, kvm_vm_ioctl_set_irqchip() grabs @kvm->arch.vpic->lock before
updating the state. However, the KVM_GET_IRQCHIP ioctl that retrieves
the same PIC state does not take that lock, potentially causing torn
reads for userspace.

Fix this by grabbing the lock on the read path.
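
A userspace toy model of the pattern, for illustration. The structures
and toy_* names are hypothetical and a C11 atomic_flag spinlock stands
in for the PIC lock; this is not the KVM code:

```c
#include <stdatomic.h>
#include <string.h>

/* Toy model of a lock-protected device state; not the KVM data structures. */
struct toy_pic_state {
	unsigned char irr, imr, isr;
};

struct toy_pic {
	atomic_flag lock;	/* minimal spinlock standing in for the PIC lock */
	struct toy_pic_state state;
};

static void toy_lock(struct toy_pic *pic)
{
	while (atomic_flag_test_and_set_explicit(&pic->lock, memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void toy_unlock(struct toy_pic *pic)
{
	atomic_flag_clear_explicit(&pic->lock, memory_order_release);
}

/* Writer: update the whole state while holding the lock. */
static void toy_set_state(struct toy_pic *pic, const struct toy_pic_state *src)
{
	toy_lock(pic);
	memcpy(&pic->state, src, sizeof(*src));
	toy_unlock(pic);
}

/* Reader: take the same lock so the copy cannot interleave with a writer. */
static void toy_get_state(struct toy_pic *pic, struct toy_pic_state *dst)
{
	toy_lock(pic);
	memcpy(dst, &pic->state, sizeof(*dst));
	toy_unlock(pic);
}
```

Without the lock in toy_get_state(), a concurrent toy_set_state() could
leave the reader with some fields from the old state and some from the
new one.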

This issue goes all the way back. The bug was introduced with the
addition of the PIC ioctl code itself in 6ceb9d791eee ("KVM: Add
get/set irqchip ioctls for in-kernel PIC live migration support").
Later, 894a9c5543ab ("KVM: x86: missing locking in
PIT/IRQCHIP/SET_BSP_CPU ioctl paths") added the locking for
kvm_vm_ioctl_set_irqchip(), but missed kvm_vm_ioctl_get_irqchip().

Fixes: 6ceb9d791eee ("KVM: Add get/set irqchip ioctls for in-kernel PIC live migration support")
Fixes: 894a9c5543ab ("KVM: x86: missing locking in PIT/IRQCHIP/SET_BSP_CPU ioctl paths")
Reported-by: Claude Code:claude-opus-4.6
Signed-off-by: Carlos López <clopez@suse.de>
Link: https://patch.msgid.link/20260529140013.14925-2-clopez@suse.de
Signed-off-by: Sean Christopherson <seanjc@google.com>
3 weeks ago arm64: mm: Unmap kernel data/bss entirely from the linear map
Ard Biesheuvel [Fri, 29 May 2026 15:02:06 +0000 (17:02 +0200)] 
arm64: mm: Unmap kernel data/bss entirely from the linear map

The linear aliases of the kernel text and rodata are also mapped
read-only in the linear map. Given that the contents of these regions
are mostly identical to the version in the loadable image, mapping them
read-only and leaving their contents visible is a reasonable hardening
measure.

Data and bss, however, are now also mapped read-only but the contents of
these regions are more likely to contain data that we'd rather not leak.
So let's unmap these entirely in the linear map when the kernel is
running normally.

When going into hibernation or waking up from it, these regions need to
be mapped, so map the region initially, and toggle the valid bit to
map/unmap the region as needed.

Doing so is required because pages covering the kernel image are marked
as PageReserved, and therefore disregarded for snapshotting by the
hibernate logic unless they are mapped.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
3 weeks ago arm64: mm: Map the kernel data/bss read-only in the linear map
Ard Biesheuvel [Fri, 29 May 2026 15:02:05 +0000 (17:02 +0200)] 
arm64: mm: Map the kernel data/bss read-only in the linear map

On systems where the bootloader adheres to the original arm64 boot
protocol, the placement of the kernel in the physical address space is
highly predictable, and this makes the placement of its linear alias in
the kernel virtual address space equally predictable, given the lack of
randomization of the linear map.

The linear aliases of the kernel text and rodata regions are already
mapped read-only, but the kernel data and bss are mapped read-write in
this region. This is not needed, so map them read-only as well.

Note that the statically allocated kernel page tables do need to be
modifiable via the linear map, so leave these mapped read-write.

Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
3 weeks ago mm: Make empty_zero_page[] const
Ard Biesheuvel [Fri, 29 May 2026 15:02:04 +0000 (17:02 +0200)] 
mm: Make empty_zero_page[] const

The empty zero page is used to back any kernel or user space mapping
that is supposed to remain cleared, and so the page itself is never
supposed to be modified.

So mark it as const, which moves it into .rodata rather than .bss: on
most architectures, this ensures that both the kernel's mapping of it
and any aliases that are accessible via the kernel direct (linear) map
are mapped read-only, and cannot be used (inadvertently or maliciously)
to corrupt the contents of the zero page.

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Feng Tang <feng.tang@linux.alibaba.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
3 weeks ago sh: Drop cache flush of the zero page at boot
Ard Biesheuvel [Fri, 29 May 2026 15:02:03 +0000 (17:02 +0200)] 
sh: Drop cache flush of the zero page at boot

SuperH performs cache maintenance on the zero page during boot,
presumably because before commit

  6215d9f4470f ("arch, mm: consolidate empty_zero_page")

the zero page did double duty as a boot params region, and was cleared
separately, as it was not part of BSS. The memset() in question was
dropped by that commit, but the __flush_wback_region() call remained.

As empty_zero_page[] has been moved to BSS, it can be treated as any
other BSS memory, and so the cache flush can be dropped.

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
3 weeks ago powerpc/code-patching: Avoid r/w mapping of the zero page
Ard Biesheuvel [Fri, 29 May 2026 15:02:02 +0000 (17:02 +0200)] 
powerpc/code-patching: Avoid r/w mapping of the zero page

The only remaining use of map_patch_area() is mapping the zero page, and
immediately unmapping it again so that the intermediate page table
levels are all guaranteed to be populated.

The use of the zero page here is completely arbitrary, and not harmful
per se, but currently, it creates a writable mapping, and does so in a
manner that requires that the empty_zero_page[] symbol is not
const-qualified.

Given that this is about to change, and that map_patch_area() now never
maps anything other than the zero page, let's simplify the code and
- remove the helpers and call [un]map_kernel_page() directly
- take the PA of empty_zero_page directly
- create a read-only temporary mapping.

This allows empty_zero_page[] to be repainted as const u8[] in a
subsequent patch, without making substantial changes to this code
patching logic.

Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Link: https://lore.kernel.org/all/20260520085423.485402-1-ardb@kernel.org/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
3 weeks ago arm64: mm: Don't abuse memblock NOMAP to check for overlaps
Ard Biesheuvel [Fri, 29 May 2026 15:02:01 +0000 (17:02 +0200)] 
arm64: mm: Don't abuse memblock NOMAP to check for overlaps

Now that the linear region mapping routines respect existing table
mappings and contiguous block and page mappings, it is no longer
necessary to fiddle with the memblock tables to set and clear the NOMAP
attribute in order to omit text and rodata when creating the linear map.

Instead, map the kernel text and rodata alias first with the desired
initial attributes and granularity, so that the loop iterating over the
memblocks will not remap it in a manner that prevents it from being
remapped with updated attributes later.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
3 weeks ago arm64: Move fixmap and kasan page tables to end of kernel image
Ard Biesheuvel [Fri, 29 May 2026 15:02:00 +0000 (17:02 +0200)] 
arm64: Move fixmap and kasan page tables to end of kernel image

Move the fixmap and kasan page tables out of the BSS section, and place
them at the end of the image, right before the init_pg_dir section where
some of the other statically allocated page tables live.

These page tables are currently the only data objects in vmlinux that
are meant to be accessed via the kernel image's linear alias, and so
placing them together allows the remainder of the data/bss section to be
remapped read-only or unmapped entirely.

Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
3 weeks ago arm64: mm: Permit contiguous attribute for preliminary mappings
Ard Biesheuvel [Fri, 29 May 2026 15:01:59 +0000 (17:01 +0200)] 
arm64: mm: Permit contiguous attribute for preliminary mappings

There are a few cases where we omit the contiguous hint for mappings
that start out as read-write and are remapped read-only later, on the
basis that manipulating live descriptors with the PTE_CONT attribute set
is unsafe. When support for the contiguous hint was added to the code,
the ARM ARM was ambiguous about this, and so we erred on the side of
caution.

In the meantime, this has been clarified [0], and regions that will be
remapped in their entirety, retaining the contiguous bit on all entries,
can use the contiguous hint both in the initial mapping as well as the
one that replaces it. Note that this requires that the logic that may be
called to remap overlapping regions respects existing valid descriptors
that have the contiguous bit cleared.

So omit the NO_CONT_MAPPINGS flag in places where it is unneeded.

[0] RJQQTC

For a TLB lookup in a contiguous region mapped by translation table entries that
have consistent values for the Contiguous bit, but have the OA, attributes, or
permissions misprogrammed, that TLB lookup is permitted to produce an OA, access
permissions, and memory attributes that are consistent with any one of the
programmed translation table values.

Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
3 weeks ago arm64: kfence: Avoid NOMAP tricks when mapping the early pool
Ard Biesheuvel [Fri, 29 May 2026 15:01:58 +0000 (17:01 +0200)] 
arm64: kfence: Avoid NOMAP tricks when mapping the early pool

Now that the map_mem() routines respect existing page mappings and
contiguous granule sized blocks with the contiguous bit cleared, there
is no longer a reason to play tricks with the memblock NOMAP attribute.

Instead, the kfence pool can be allocated and mapped with page
granularity first, and this granularity will be respected when the rest
of DRAM is mapped later, even if block and contiguous mappings are
allowed for the remainder of those mappings.

Add the NO_EXEC_MAPPINGS flag to ensure that hierarchical XN attributes
are set on the intermediate page tables that are allocated when mapping
the pool.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
3 weeks ago arm64: mm: Permit contiguous descriptors to be manipulated
Ard Biesheuvel [Fri, 29 May 2026 15:01:57 +0000 (17:01 +0200)] 
arm64: mm: Permit contiguous descriptors to be manipulated

Currently, pgattr_change_is_safe() is overly pedantic when it comes to
descriptors with the contiguous hint attribute set, as it rejects
assignments even if the old and the new value are the same.

In fact, as per ARM ARM RJQQTC, manipulating descriptors with the
contiguous bit set is safe as long as the bit itself does not change
value, in the sense that no TLB conflict aborts or other exceptions may
be raised as a result. Inconsistent permission attributes within the
contiguous region may result in any one of the alternatives being
applied to the entire region, which might be a programming error, but it
does not constitute an unsafe manipulation in terms of what
pgattr_change_is_safe() is intended to detect.

So drop the special PTE_CONT check, but still omit PTE_CONT from 'mask'
so that modifying the bit is still regarded as unsafe.
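
A simplified model of a mask-based check of this kind, for illustration.
The TOY_* bit positions and the toy_change_is_safe() name are
hypothetical, and the real function considers more attributes:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions for this sketch; not the arm64 definitions. */
#define TOY_PTE_VALID	(UINT64_C(1) << 0)
#define TOY_PTE_RDONLY	(UINT64_C(1) << 7)
#define TOY_PTE_CONT	(UINT64_C(1) << 52)

/*
 * A change to a live descriptor is treated as safe when only bits in
 * 'mask' differ. TOY_PTE_CONT is deliberately left out of the mask, so
 * flipping it is unsafe, while a descriptor that merely has the bit set
 * can still have its permissions changed.
 */
static bool toy_change_is_safe(uint64_t old_val, uint64_t new_val)
{
	const uint64_t mask = TOY_PTE_RDONLY;

	/* Creating or tearing down a mapping is always allowed. */
	if (!(old_val & TOY_PTE_VALID) || !(new_val & TOY_PTE_VALID))
		return true;

	return ((old_val ^ new_val) & ~mask) == 0;
}
```

With no dedicated test for the contiguous bit, a read-write to read-only
change on a contiguous descriptor passes, while clearing the bit on a
live entry is still rejected.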

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
3 weeks ago arm64: mm: Preserve non-contiguous descriptors when mapping DRAM
Ard Biesheuvel [Fri, 29 May 2026 15:01:56 +0000 (17:01 +0200)] 
arm64: mm: Preserve non-contiguous descriptors when mapping DRAM

Instead of blindly overwriting existing live entries regardless of the
value of their contiguous bit when mapping DRAM regions at
contiguous-hint granularity, check whether the contiguous region in
question contains any valid descriptors that have the contiguous bit
cleared, and in that case, leave the contiguous bit unset on the entire
region. This permits the logic of mapping the kernel's linear alias to
be simplified in a subsequent patch.

Note that this can only result in a misprogrammed contiguous bit (as per
ARM ARM RNGLXZ) if the region in question already contains a mix of
valid contiguous and valid non-contiguous descriptors, in which case it
was already misprogrammed to begin with.
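
The scan described above could look roughly like the following sketch.
The TOY_* bits and the toy_may_use_cont() helper are hypothetical and
operate on a plain array rather than on real page tables:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions for this sketch; not the arm64 definitions. */
#define TOY_PTE_VALID	(UINT64_C(1) << 0)
#define TOY_PTE_CONT	(UINT64_C(1) << 52)

/*
 * Return true if the contiguous hint may be used for a group of 'n'
 * entries, i.e. no valid entry in the group currently has the hint cleared.
 */
static bool toy_may_use_cont(const uint64_t *ptes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if ((ptes[i] & TOY_PTE_VALID) && !(ptes[i] & TOY_PTE_CONT))
			return false;
	}
	return true;
}
```

When the helper returns false, the caller would map the whole group
without the contiguous hint, which keeps the existing page-granular
entries intact.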

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
3 weeks ago arm64: mm: Preserve existing table mappings when mapping DRAM
Ard Biesheuvel [Fri, 29 May 2026 15:01:55 +0000 (17:01 +0200)] 
arm64: mm: Preserve existing table mappings when mapping DRAM

Instead of blindly overwriting an existing table entry when mapping DRAM
regions, take care not to replace a pre-existing table entry with a
block entry. This permits the logic of mapping the kernel's linear alias
to be simplified in a subsequent patch.

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
3 weeks ago arm64: mm: Check for pud_/pmd_set_huge() failures on kernel mappings
Ard Biesheuvel [Fri, 29 May 2026 15:01:54 +0000 (17:01 +0200)] 
arm64: mm: Check for pud_/pmd_set_huge() failures on kernel mappings

Sashiko reports:

| If pmd_set_huge() rejects an unsafe page table transition (such as
| mapping a different physical address over an existing block mapping),
| it returns 0 and leaves the page table entry unmodified.
|
| Because *pmdp remains unmodified, READ_ONCE(pmd_val(*pmdp)) will equal
| pmd_val(old_pmd). The transition from old_pmd to old_pmd is evaluated
| as safe by pgattr_change_is_safe(), so the BUG_ON never triggers.
|
| This allows invalid and unsafe mapping updates to be silently dropped
| instead of panicking, leaving stale memory mappings active while the
| caller assumes the update was successful.

The same applies to pud_set_huge() in alloc_init_pud().

Given how it is generally preferred to limp on rather than blow up the
system if an unexpected condition such as this one occurs, and the fact
that there are no known cases where this disparity results in real
problems, let's WARN on these failures rather than BUG, allowing the
system to survive to the point where it can actually report them.
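
A toy model of why the before/after comparison cannot catch the failure,
for illustration. All toy_* names are hypothetical and the entries are
plain integers rather than page table descriptors:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: refuse to replace a live entry with a different value. */
static int toy_set_huge(uint64_t *entry, uint64_t val)
{
	if (*entry != 0 && *entry != val)
		return 0;	/* rejected, entry left untouched */
	*entry = val;
	return 1;
}

/* Toy safety rule: populating an empty entry or a no-op change is safe. */
static bool toy_transition_is_safe(uint64_t old_val, uint64_t cur_val)
{
	return old_val == 0 || old_val == cur_val;
}

/* Old-style check: compare the entry before and after the call. */
static bool toy_detects_failure_by_compare(uint64_t *entry, uint64_t val)
{
	uint64_t old_val = *entry;

	toy_set_huge(entry, val);
	/* On rejection *entry == old_val, so the transition looks like a no-op. */
	return !toy_transition_is_safe(old_val, *entry);
}

/* New-style check: look at the return value instead. */
static bool toy_detects_failure_by_retval(uint64_t *entry, uint64_t val)
{
	return toy_set_huge(entry, val) == 0;
}
```

In this model the comparison-based check stays silent for a rejected
update, while the return-value check reports it, which is where a WARN
would be placed.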

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
3 weeks ago arm64: mm: Drop redundant pgd_t* argument from map_mem()
Ard Biesheuvel [Fri, 29 May 2026 15:01:53 +0000 (17:01 +0200)] 
arm64: mm: Drop redundant pgd_t* argument from map_mem()

__map_memblock() and map_mem() always operate on swapper_pg_dir, so
there is no need to pass around a pgd_t pointer between them.

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
3 weeks ago arm64: mm: Remove bogus stop condition from map_mem() loop
Ard Biesheuvel [Fri, 29 May 2026 15:01:52 +0000 (17:01 +0200)] 
arm64: mm: Remove bogus stop condition from map_mem() loop

The memblock API guarantees that start is not greater than or equal to
end, so there is no need to test it. And if it were, it is doubtful that
breaking out of the loop would be a reasonable course of action here
(rather than attempting to map the remaining regions).

So let's drop this check.

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
3 weeks ago ASoC: loongson: Refactor DMA and regmap handling
Mark Brown [Tue, 2 Jun 2026 15:21:48 +0000 (16:21 +0100)] 
ASoC: loongson: Refactor DMA and regmap handling

Binbin Zhou <zhoubinbin@loongson.cn> says:

This series refactors the Loongson I2S ASoC drivers, reducing code
duplication and improving DMA differentiation. It also adds an entry
in MAINTAINERS and applies a few fixes to the es8323 codec driver.

These changes have been tested on Loongson-2K0300 (platform, eDMA) and
Loongson-2K2000 (PCI, iDMA) boards.

Link: https://patch.msgid.link/cover.1780304703.git.zhoubinbin@loongson.cn
3 weeks ago ASoC: loongson: Separate external shared DMA from the platform interface
Binbin Zhou [Mon, 1 Jun 2026 09:29:39 +0000 (17:29 +0800)] 
ASoC: loongson: Separate external shared DMA from the platform interface

The Loongson I2S platform driver (used on LS2K1000, LS7A etc.) relies on
an external DMA engine (e.g., dw_dmac) rather than the internal DMA.
However, its DMA-related code was originally embedded in
loongson_i2s_plat.c, duplicating logic that should be shared.

Extract the external DMA (eDMA) support from the platform driver and move
it into loongson_dma.c alongside the existing internal DMA (iDMA) code.

This change eliminates code duplication and prepares for future
consolidation of DMA selection logic.

Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Link: https://patch.msgid.link/979368ad269f192703ed24e9a19eebce32316745.1780304703.git.zhoubinbin@loongson.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks ago ASoC: loongson: Use the `idma` identifier for internal DMA variables
Binbin Zhou [Mon, 1 Jun 2026 09:29:38 +0000 (17:29 +0800)] 
ASoC: loongson: Use the `idma` identifier for internal DMA variables

The Loongson I2S controller can work with two types of DMA:
- Internal DMA (iDMA): integrated DMA engine, driven by dedicated
  registers and interrupts.
- External DMA (eDMA): generic DMA engine (e.g., dw_dmac), using the
  standard dmaengine API.

To distinguish these two distinct implementations, rename all
internal-DMA-related structures, functions, and the component driver
to use the "idma" prefix.

No functional change intended.

Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Link: https://patch.msgid.link/58e91c54f2bf658ac9b773741ca2aebc3866e550.1780304703.git.zhoubinbin@loongson.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks ago ASoC: loongson: Combined regmap definitions
Binbin Zhou [Mon, 1 Jun 2026 09:29:37 +0000 (17:29 +0800)] 
ASoC: loongson: Combined regmap definitions

Previously, the regmap configuration for Loongson I2S controller was
duplicated in both PCI and platform glue drivers. Move the common
regmap configuration into the shared loongson_i2s.c to avoid code
duplication and centralize register access handling.

While moving, adjust the following:
- Mark RX_DATA/TX_DATA/I2S_CTRL as volatile registers. The PCI version
  incorrectly marked CFG/CFG1 as volatile, which prevented proper
  regcache synchronization.
- Change cache type from REGCACHE_FLAT to REGCACHE_MAPLE. The register
  map is sparse and the number of registers is small; MAPLE tree provides
  better scalability and is the recommended cache type for modern
  regmap users.

Also, the following warning for the i2s_plat driver will be eliminated:

loongson-i2s-plat loongson-i2s: using zero-initialized flat cache, this may cause unexpected behavior.

Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Link: https://patch.msgid.link/e32d24479fc382dc3de6aded6351c13b43b6391d.1780304703.git.zhoubinbin@loongson.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks ago MAINTAINERS: Add entry for Loongson ASoC driver
Binbin Zhou [Mon, 1 Jun 2026 09:29:36 +0000 (17:29 +0800)] 
MAINTAINERS: Add entry for Loongson ASoC driver

Add MAINTAINERS entry for Loongson I2S ASoC drivers to track
changes in sound/soc/loongson/ directory.

Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Link: https://patch.msgid.link/9451dfcd6ff3048eac0656d3720908386128b7fc.1780304703.git.zhoubinbin@loongson.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks ago ASoC: es9356: Use new SoundWire enumeration helper
Charles Keepax [Tue, 2 Jun 2026 10:27:49 +0000 (11:27 +0100)] 
ASoC: es9356: Use new SoundWire enumeration helper

Update the driver to use the new core helper that waits for the device
to enumerate on SoundWire and be initialised by the SoundWire core.

Link: https://lore.kernel.org/linux-sound/20260512103022.1154645-1-ckeepax@opensource.cirrus.com/
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20260602102749.3962261-1-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks ago of: reserved_mem: only support one <base size> entry in reg property
Wandun Chen [Mon, 25 May 2026 12:17:00 +0000 (20:17 +0800)] 
of: reserved_mem: only support one <base size> entry in reg property

A /reserved-memory child node may have multiple <base size> tuples in
its 'reg' property, but multiple entries in 'reg' have never been fully
functional:
 - fdt_scan_reserved_mem() in the early pass loops over every
   tuple and reserves them all.

 - fdt_scan_reserved_mem_late() reads 'reg' via
   of_flat_dt_get_addr_size(), which returns false if entries != 1.
   So a 'reg' property with multiple <base size> entries is
   skipped, and no entry is created for it in reserved_mem[].

Supporting multiple <base size> tuples is not a good idea:
 - It requires reserved_mem_ops->node_init support. Currently,
   CMA (rmem_cma_setup) and DMA (rmem_dma_setup) do not support this.

 - of_reserved_mem_lookup() is name-based, so only the first entry
   of multiple <base size> tuples will be found.

So change to support only one <base size> entry in the 'reg' property.
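
A minimal sketch of a single-entry check, for illustration. The helper
toy_reg_has_single_entry() is hypothetical and works on the raw property
length instead of going through the flattened device tree API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the byte length of a 'reg' property and the
 * parent's #address-cells / #size-cells, accept it only if it holds exactly
 * one <base size> tuple. Each cell is a 32-bit big-endian word.
 */
static bool toy_reg_has_single_entry(size_t len, unsigned int addr_cells,
				     unsigned int size_cells)
{
	size_t tuple_bytes = (size_t)(addr_cells + size_cells) * sizeof(uint32_t);

	if (tuple_bytes == 0)
		return false;
	return len == tuple_bytes;
}
```

For example, with #address-cells = 2 and #size-cells = 2, a 16-byte
'reg' is accepted and a 32-byte 'reg' (two tuples) is rejected.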

Also update dt binding:
  https://github.com/devicetree-org/dt-schema/pull/197

Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Wandun Chen <chenwandun@lixiang.com>
Tested-by: Meijing Zhao <zhaomeijing@lixiang.com>
Link: https://lore.kernel.org/all/20260506014752.GA280279-robh@kernel.org/
Link: https://patch.msgid.link/20260525121700.2706141-1-chenwandun1@gmail.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
3 weeks ago ASoC: mediatek: mt8192 probe cleanup
Mark Brown [Tue, 2 Jun 2026 15:13:08 +0000 (16:13 +0100)] 
ASoC: mediatek: mt8192 probe cleanup

Cássio Gabriel <cassiogabrielcontato@gmail.com> says:

Fix two MT8192 AFE probe cleanup issues that mirror the recently fixed
MT8189 and MT8196 paths.

The first patch registers a devm cleanup action for a successful
reserved-memory assignment so later probe failures and driver unbind
release it.

The second patch checks the temporary runtime resume used while
reinitializing the regmap cache and makes the regcache failure path drop
the PM reference and clear pm_runtime_bypass_reg_ctl.

Link: https://patch.msgid.link/20260527-asoc-mt8192-probe-cleanup-v1-0-1bb834d05b72@gmail.com
3 weeks ago ASoC: mediatek: mt8192: Check runtime resume during probe
Cássio Gabriel [Wed, 27 May 2026 13:55:47 +0000 (10:55 -0300)] 
ASoC: mediatek: mt8192: Check runtime resume during probe

The MT8192 AFE probe enables runtime PM temporarily while reinitializing
the regmap cache from hardware, but it uses pm_runtime_get_sync()
without checking the return value. If runtime resume fails, probe keeps
going without the device necessarily being accessible, and
pm_runtime_get_sync() may leave the PM usage count incremented.

The regmap_reinit_cache() failure path also returns before dropping the
temporary PM reference and before clearing pm_runtime_bypass_reg_ctl.

Use pm_runtime_resume_and_get() so resume failures do not leak a usage
count, and clear the temporary bypass flag after dropping the probe PM
reference on all regmap_reinit_cache() outcomes.
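
The usage-count difference between the two helpers can be modelled in
userspace. The sketch below is a mock, not kernel code: struct mock_dev,
mock_get_sync() and mock_resume_and_get() are hypothetical stand-ins that
mimic only the usage-count behaviour described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device with a runtime PM usage counter. */
struct mock_dev {
	int usage_count;
	bool resume_fails;	/* simulate a failing runtime resume */
};

/* Mimics pm_runtime_get_sync(): the count is bumped even if resume fails. */
int mock_get_sync(struct mock_dev *dev)
{
	dev->usage_count++;
	return dev->resume_fails ? -EIO : 0;
}

/* Mimics pm_runtime_resume_and_get(): the count is dropped again on failure. */
int mock_resume_and_get(struct mock_dev *dev)
{
	int ret = mock_get_sync(dev);

	if (ret < 0) {
		dev->usage_count--;
		return ret;
	}
	return 0;
}

/* Returns the usage count left behind after a failed resume attempt. */
int leaked_count(bool use_resume_and_get)
{
	struct mock_dev dev = { .usage_count = 0, .resume_fails = true };
	int ret = use_resume_and_get ? mock_resume_and_get(&dev)
				     : mock_get_sync(&dev);

	assert(ret < 0);
	return dev.usage_count;
}
```

With the unchecked helper, a failed resume leaves one reference behind;
with the checked helper the count is back at zero when the error is
returned.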

Fixes: 125ab5d588b0 ("ASoC: mediatek: mt8192: add platform driver")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260527-asoc-mt8192-probe-cleanup-v1-2-1bb834d05b72@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks agoASoC: mediatek: mt8192: Release reserved memory on cleanup
Cássio Gabriel [Wed, 27 May 2026 13:55:46 +0000 (10:55 -0300)] 
ASoC: mediatek: mt8192: Release reserved memory on cleanup

The MT8192 AFE probe calls of_reserved_mem_device_init() and falls
back to preallocated buffers when no reserved memory region is
available. When the reserved memory assignment succeeds, however, the
driver never releases it.

Register a devm cleanup action after a successful reserved-memory
assignment so the assignment is released on probe failure and driver
unbind.
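
The effect of a managed cleanup action can be sketched with a small
userspace mock. Everything here is hypothetical (struct mock_device,
mock_add_action_or_reset(), mock_release_all()); it is loosely modelled on
the devres idea of running registered actions in reverse order on probe
failure or unbind, and it assumes the driver uses a helper of the
devm_add_action_or_reset() kind, which the message above does not spell out.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_MAX_ACTIONS 8

/* Hypothetical device carrying a small stack of cleanup actions. */
struct mock_device {
	void (*action[MOCK_MAX_ACTIONS])(void *);
	void *data[MOCK_MAX_ACTIONS];
	size_t nr_actions;
};

/* Register a cleanup action; if that fails, run the action right away. */
int mock_add_action_or_reset(struct mock_device *dev,
			     void (*action)(void *), void *data)
{
	if (dev->nr_actions == MOCK_MAX_ACTIONS) {
		action(data);
		return -1;
	}
	dev->action[dev->nr_actions] = action;
	dev->data[dev->nr_actions] = data;
	dev->nr_actions++;
	return 0;
}

/* Teardown on probe failure or unbind: newest action runs first. */
void mock_release_all(struct mock_device *dev)
{
	while (dev->nr_actions > 0) {
		dev->nr_actions--;
		dev->action[dev->nr_actions](dev->data[dev->nr_actions]);
	}
}

/* Stand-in for releasing the reserved-memory assignment. */
void mock_release_reserved_mem(void *data)
{
	int *assigned = data;

	*assigned = 0;
}

/* Returns 1 if the assignment survives teardown (a leak), 0 otherwise. */
int reserved_mem_still_assigned(int register_cleanup)
{
	struct mock_device dev = { .nr_actions = 0 };
	int assigned = 1;	/* the reserved-memory assignment succeeded */
	int ret = 0;

	if (register_cleanup)
		ret = mock_add_action_or_reset(&dev, mock_release_reserved_mem,
					       &assigned);
	assert(ret == 0);

	/* A later probe step fails, or the driver is unbound. */
	mock_release_all(&dev);
	return assigned;
}
```

Without the registered action nothing undoes the assignment; with it, the
same teardown path covers both probe failure and unbind.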

Fixes: ec4a10ca4a68 ("ASoC: mediatek: use reserved memory or enable buffer pre-allocation")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260527-asoc-mt8192-probe-cleanup-v1-1-1bb834d05b72@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks agoASoC: mediatek: mt8183: Fix probe resource cleanup
Mark Brown [Tue, 2 Jun 2026 15:10:37 +0000 (16:10 +0100)] 
ASoC: mediatek: mt8183: Fix probe resource cleanup

Cássio Gabriel <cassiogabrielcontato@gmail.com> says:

The MT8183 AFE probe has two cleanup gaps that match issues
recently fixed in newer MediaTek AFE drivers.

First, reserved memory assigned with of_reserved_mem_device_init()
is never released on driver removal or later probe failures.

Second, the probe-time runtime PM resume used before reinitializing
the regmap cache is unchecked, and a regmap_reinit_cache() failure
skips the temporary PM put.

Fix both issues with a devm reserved-memory release action and
checked runtime PM resume handling.

Link: https://patch.msgid.link/20260527-asoc-mt8183-probe-cleanup-v1-0-4f4f5593c8d1@gmail.com
3 weeks agoASoC: mediatek: mt8183: Check runtime resume during probe
Cássio Gabriel [Wed, 27 May 2026 13:41:49 +0000 (10:41 -0300)] 
ASoC: mediatek: mt8183: Check runtime resume during probe

The MT8183 AFE probe uses pm_runtime_get_sync() before reading hardware
defaults into the regmap cache, but does not check whether runtime resume
failed. If regmap_reinit_cache() then fails, the temporary runtime PM
usage count is also not released.

Use pm_runtime_resume_and_get() so resume failures abort probe without
leaking a usage count, and release the temporary reference before
handling the regmap cache result.

Fixes: a94aec035a12 ("ASoC: mediatek: mt8183: add platform driver")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260527-asoc-mt8183-probe-cleanup-v1-2-4f4f5593c8d1@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks agoASoC: mediatek: mt8183: Release reserved memory on cleanup
Cássio Gabriel [Wed, 27 May 2026 13:41:48 +0000 (10:41 -0300)] 
ASoC: mediatek: mt8183: Release reserved memory on cleanup

The MT8183 AFE probe can assign reserved memory with
of_reserved_mem_device_init(), but the assignment is never released on
driver removal or later probe failures.

Register a devm cleanup action so the reserved memory assignment is
released consistently, matching newer Mediatek AFE drivers.

Fixes: ec4a10ca4a68 ("ASoC: mediatek: use reserved memory or enable buffer pre-allocation")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260527-asoc-mt8183-probe-cleanup-v1-1-4f4f5593c8d1@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks agoregulator: Use named initializers for platform_device_id arrays
Mark Brown [Tue, 2 Jun 2026 15:09:27 +0000 (16:09 +0100)] 
regulator: Use named initializers for platform_device_id arrays

Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com> says:

This series converts platform_device_id arrays to named initializers.
In general these are more readable for humans and more robust
to changes in the respective struct definition.

This robustness is needed as I want to do

Link: https://patch.msgid.link/cover.1779878004.git.u.kleine-koenig@baylibre.com
3 weeks agoregulator: Unify usage of space and comma in platform_device_id arrays
Uwe Kleine-König (The Capable Hub) [Wed, 27 May 2026 10:47:46 +0000 (12:47 +0200)] 
regulator: Unify usage of space and comma in platform_device_id arrays

After converting all these arrays to use named initializers and fixing
coding style en passant, also adapt the coding style of those drivers that
already used named initializers, for consistency.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Link: https://patch.msgid.link/a3a2736ebfcfa5a228dcebfbfefc14960dcce314.1779878004.git.u.kleine-koenig@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks agoregulator: Use named initializers for platform_device_id arrays
Uwe Kleine-König (The Capable Hub) [Wed, 27 May 2026 10:47:45 +0000 (12:47 +0200)] 
regulator: Use named initializers for platform_device_id arrays

Named initializers are more readable and more robust to changes of the
struct definition. This robustness is relevant for a planned change to
struct platform_device_id replacing .driver_data by an anonymous union.
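
A minimal sketch of why named initializers survive layout changes. The
struct below is hypothetical (struct demo_device_id is only loosely
modelled on struct platform_device_id, with the members deliberately
swapped), and demo_find() is an illustrative helper, not a kernel API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout: imagine driver_data was moved in front of name. */
struct demo_device_id {
	unsigned long driver_data;
	char name[20];
};

/*
 * Named initializers bind values to members, not to positions, so the
 * table stays correct after the reordering above. A positional entry
 * such as { "demo-regulator-b", 2 } would no longer match the layout.
 */
const struct demo_device_id demo_ids[] = {
	{ .name = "demo-regulator-a" },
	{ .name = "demo-regulator-b", .driver_data = 2 },
	{ 0 }	/* sentinel: an empty name terminates the table */
};

/* Look up an entry by name; returns NULL if there is no match. */
const struct demo_device_id *demo_find(const char *name)
{
	const struct demo_device_id *id;

	assert(name != NULL);
	for (id = demo_ids; id->name[0] != '\0'; id++) {
		if (strcmp(id->name, name) == 0)
			return id;
	}
	return NULL;
}
```

Members that are not named, such as .driver_data in the first entry, are
zero-initialized, which is why an explicit zero assignment is redundant.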

While touching these arrays unify spacing and usage of commas.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Acked-by: Karel Balej <balejk@matfyz.cz>
Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com>
Link: https://patch.msgid.link/d02f55dfd5bdd743ae5cd76f2a5af0d346226a68.1779878004.git.u.kleine-koenig@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks agoregulator: Drop unused assignment of platform_device_id driver data
Uwe Kleine-König (The Capable Hub) [Wed, 27 May 2026 10:47:44 +0000 (12:47 +0200)] 
regulator: Drop unused assignment of platform_device_id driver data

Several drivers explicitly set the .driver_data member of struct
platform_device_id to zero without relying on that value. Drop these
unused assignments.

While touching these arrays unify spacing, usage of commas and use
named initializers for .name.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Link: https://patch.msgid.link/613cd1bed263c2bf562ee714595f6d57f442804d.1779878004.git.u.kleine-koenig@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks agoASoC: codecs: rk3328: Use managed GPIO and clock helpers
Cássio Gabriel [Mon, 25 May 2026 17:18:03 +0000 (14:18 -0300)] 
ASoC: codecs: rk3328: Use managed GPIO and clock helpers

rk3328_platform_probe() acquires the mute GPIO with gpiod_get_optional()
but never releases it. It also enables mclk and pclk manually while
relying on probe error labels for unwind, and the driver has no platform
remove callback to disable those clocks after a successful unbind.

This path has already needed fixes for missing clock unwinds on probe
errors. Use devm_gpiod_get_optional() and devm_clk_get_enabled() so the
GPIO and enabled clock lifetimes are tied to the device. This removes the
manual error labels and makes both probe failure and driver unbind follow
the normal devres cleanup path.

Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260525-asoc-rk3328-devm-resources-v1-1-2abde0006f89@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks agoregulator: scmi: fix of_node refcount leak in scmi_regulator_probe()
Wentao Liang [Wed, 27 May 2026 10:48:50 +0000 (10:48 +0000)] 
regulator: scmi: fix of_node refcount leak in scmi_regulator_probe()

scmi_regulator_probe() calls of_find_node_by_name() which takes a
reference on the returned device node. On the error path where
process_scmi_regulator_of_node() fails, the function returns without
calling of_node_put() on the child node, leaking the reference.

Add of_node_put(np) on the error path to properly release the
reference.
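
The leak pattern can be illustrated with a userspace mock. The names
(struct mock_node, mock_find_node(), mock_node_put(), mock_probe()) are
hypothetical and only mimic the get/put pairing described above; this is
not the driver's code.

```c
#include <assert.h>

/* Hypothetical stand-in for a reference-counted device node. */
struct mock_node {
	int refcount;
};

/* Mimics a lookup that takes a reference on the node it returns. */
struct mock_node *mock_find_node(struct mock_node *np)
{
	np->refcount++;
	return np;
}

/* Mimics of_node_put(): drops one reference. */
void mock_node_put(struct mock_node *np)
{
	np->refcount--;
}

/* Probe sketch: the processing step fails when fail_processing is set. */
int mock_probe(struct mock_node *node, int fail_processing, int put_on_error)
{
	struct mock_node *np = mock_find_node(node);

	if (fail_processing) {
		if (put_on_error)
			mock_node_put(np);	/* the fix: release on error */
		return -1;
	}

	mock_node_put(np);
	return 0;
}

/* Returns the refcount left after a failed probe (1 means balanced). */
int refs_after_failed_probe(int put_on_error)
{
	struct mock_node node = { .refcount = 1 };
	int ret = mock_probe(&node, 1, put_on_error);

	assert(ret < 0);
	return node.refcount;
}
```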

Cc: stable@vger.kernel.org
Fixes: 0fbeae70ee7c ("regulator: add SCMI driver")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Link: https://patch.msgid.link/20260527104850.872415-1-vulab@iscas.ac.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks agontfs3: fix out-of-bounds read in ntfs_dir_emit() and hdr_find_e()
Alessandro Schino [Mon, 11 May 2026 18:15:15 +0000 (20:15 +0200)] 
ntfs3: fix out-of-bounds read in ntfs_dir_emit() and hdr_find_e()

The bounds check in ntfs_dir_emit() compares fname->name_len (a
character count) against e->size (a byte count) without accounting
for the 2-byte-per-character UTF-16LE encoding or the ATTR_FILE_NAME
header size:

  if (fname->name_len + sizeof(struct NTFS_DE) > le16_to_cpu(e->size))

This computes: name_len + 16 > e_size

The correct check must account for the ATTR_FILE_NAME header (66 bytes
before the name) and the UTF-16LE character size (2 bytes each):

  sizeof(NTFS_DE) + offsetof(ATTR_FILE_NAME, name) +
  name_len * sizeof(short) > e_size

Which computes: 16 + 66 + name_len * 2 > e_size

The correct calculation already exists as fname_full_size() in ntfs.h
and is used in cmp_fnames(), namei.c, and fslog.c, but was not used
in the readdir path.

A crafted NTFS image with an index entry containing a small e->size
but large fname->name_len bypasses the current check, causing
ntfs_utf16_to_nls() to read past the entry boundary.
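
The two checks can be compared numerically. This is a standalone sketch
using the sizes quoted above (16 and 66 bytes); the DEMO_* macros and both
helper functions are illustrative names, and treating name_len as an 8-bit
count is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes quoted in the commit message. */
#define DEMO_NTFS_DE_SIZE	16u	/* sizeof(struct NTFS_DE) */
#define DEMO_FNAME_NAME_OFF	66u	/* offsetof(struct ATTR_FILE_NAME, name) */

static_assert(sizeof(uint16_t) == 2, "UTF-16LE code units are 2 bytes");

/* Old check: compares a character count against a byte count. */
bool old_check_rejects(uint8_t name_len, uint16_t e_size)
{
	return name_len + DEMO_NTFS_DE_SIZE > e_size;
}

/* Corrected check: header bytes plus 2 bytes per UTF-16LE character. */
bool new_check_rejects(uint8_t name_len, uint16_t e_size)
{
	size_t need = DEMO_NTFS_DE_SIZE + DEMO_FNAME_NAME_OFF +
		      (size_t)name_len * sizeof(uint16_t);

	return need > e_size;
}
```

For an entry with e_size = 288 and name_len = 255, the old check sees
255 + 16 = 271 and accepts it, while the corrected check computes
16 + 66 + 510 = 592 and rejects it.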

Additionally, add a key_size validation in hdr_find_e() to ensure the
declared key_size does not exceed the available entry data, preventing
comparison functions from reading past entry boundaries on the lookup
path.

Signed-off-by: Alessandro Schino <7991aleschino@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks agofs/ntfs3: fix mount failure on 64K page-size kernels
Jamie Nguyen [Tue, 19 May 2026 19:42:20 +0000 (12:42 -0700)] 
fs/ntfs3: fix mount failure on 64K page-size kernels

On 64K page-size kernels, mounting NTFS volumes smaller than ~650 MB
fails with EINVAL. The issue is in log_replay(): the initial log page
size probe uses PAGE_SIZE (65536) instead of DefaultLogPageSize (4096)
when PAGE_SIZE exceeds DefaultLogPageSize * 2.

This makes norm_file_page() require the $LogFile to be at least
50 * 65536 = 3.2 MB, but mkfs.ntfs creates a $LogFile of only ~1.5 MB
for a typical 300 MB volume. norm_file_page() returns 0 and the mount
is rejected with EINVAL.

On 4K kernels the #if guard evaluates to true, so use_default=true is
passed and DefaultLogPageSize (4096) is used, requiring only ~200 KB.
This path works fine.
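
The arithmetic above can be reproduced in a few lines. This sketch only
models the "at least 50 pages" requirement stated in this message; the
real norm_file_page() logic may do more, and the DEMO_* names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MIN_LOG_PAGES		50u	/* factor quoted above */
#define DEMO_DEFAULT_LOG_PAGE_SIZE	4096u	/* DefaultLogPageSize */

/* True if a $LogFile of this size passes the minimum-size probe. */
bool demo_log_file_large_enough(uint64_t log_file_size,
				uint32_t probe_page_size)
{
	assert(probe_page_size != 0);
	return log_file_size >= (uint64_t)DEMO_MIN_LOG_PAGES * probe_page_size;
}
```

A 1.5 MB $LogFile (1572864 bytes) fails the probe with a 65536-byte page
(3276800 bytes required) but passes with the 4096-byte default (204800
bytes required).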

Fix this by always passing use_default=true, which forces the initial
probe to use DefaultLogPageSize regardless of the kernel's PAGE_SIZE.
This is safe because, after reading the on-disk restart area, log_replay()
already re-adjusts log->page_size to match the volume's actual
sys_page_size.

Also fix read_log_page() to pass log->page_size instead of PAGE_SIZE to
ntfs_fix_post_read(), matching the actual buffer size.

Fixes: b46acd6a6a62 ("fs/ntfs3: Add NTFS journal")
Tested-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks agontfs3: avoid another -Wmaybe-uninitialized warning
Arnd Bergmann [Fri, 15 May 2026 09:09:50 +0000 (11:09 +0200)] 
ntfs3: avoid another -Wmaybe-uninitialized warning

The ntfs3-specific -Wmaybe-uninitialized flag found one more false positive,
this time with gcc-10 on s390:

fs/ntfs3/frecord.c: In function 'ni_expand_list':
fs/ntfs3/frecord.c:1370:16: error: 'ins_attr' may be used uninitialized in this function [-Werror=maybe-uninitialized]

Add an explicit NULL pointer check before using the pointer, and
initialize it to NULL.

Fixes: 48d9b57b169f ("fs/ntfs3: add a subset of W=1 warnings for stricter checks")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks agontfs3: Allocate iomap inline_data using alloc_page
Mihai Brodschi [Mon, 11 May 2026 17:19:04 +0000 (20:19 +0300)] 
ntfs3: Allocate iomap inline_data using alloc_page

This fixes a BUG reported in iomap_write_end_inline:
iomap_inline_data_valid checks that the inline_data fits within
a page. If the inline_data is allocated with kmemdup there's no
guarantee that it's page-aligned, so the check sometimes fails.
Allocate it with alloc_page to ensure it's page-aligned.
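
Why alignment matters can be shown with plain address arithmetic. The
helper below is a userspace sketch of a "fits within one page" test in
the spirit of iomap_inline_data_valid(); the exact kernel expression is
an assumption here, and DEMO_PAGE_SIZE is a hypothetical 4K page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u

static_assert((DEMO_PAGE_SIZE & (DEMO_PAGE_SIZE - 1)) == 0,
	      "page size must be a power of two");

/* True if [addr, addr + length) does not cross a page boundary. */
bool demo_inline_data_valid(uintptr_t addr, size_t length)
{
	size_t off = addr & (DEMO_PAGE_SIZE - 1);	/* offset within page */

	return length <= DEMO_PAGE_SIZE - off;
}
```

A page-aligned buffer can hold up to a full page of inline data, whereas
a buffer that starts 3840 bytes into a page passes only for lengths up to
256 bytes.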

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221446
Fixes: 099ef9ab9203 ("fs/ntfs3: implement iomap-based file operations")
Signed-off-by: Mihai Brodschi <m.brodschi@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks agofs/ntfs3: format code, deal with comments
Konstantin Komarov [Wed, 27 May 2026 08:23:28 +0000 (10:23 +0200)] 
fs/ntfs3: format code, deal with comments

Format code according to .clang-format, add useful comments and remove
unhelpful ones.

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks agofs/ntfs3: reject SEEK_DATA and SEEK_HOLE past EOF early
Konstantin Komarov [Fri, 22 May 2026 13:14:39 +0000 (15:14 +0200)] 
fs/ntfs3: reject SEEK_DATA and SEEK_HOLE past EOF early

Handle non-data/hole seeks through generic_file_llseek_size() and return
-ENXIO immediately when SEEK_DATA or SEEK_HOLE is requested at or past
EOF. Handle compressed files in such cases properly as well.

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks agofs/ntfs3: fold file size handling into ntfs_set_size()
Konstantin Komarov [Fri, 22 May 2026 13:11:25 +0000 (15:11 +0200)] 
fs/ntfs3: fold file size handling into ntfs_set_size()

Remove the separate ntfs_extend() and ntfs_truncate() helpers and route
file size changes through ntfs_set_size().

This consolidates ntfs3 size updates in one place and lets the write,
fallocate, and setattr paths share the same logic for updating i_size,
valid data length, and preallocated extents.

This patch fixes a few issues found during internal tests.

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks agofs/ntfs3: force waiting for direct I/O completion
Konstantin Komarov [Fri, 22 May 2026 12:58:12 +0000 (14:58 +0200)] 
fs/ntfs3: force waiting for direct I/O completion

Make ntfs3 wait for direct I/O completion before returning to the
caller, instead of allowing the write path to complete asynchronously.

The issue was discovered during internal tests.

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks agofs/ntfs3: fold resident writeback into writepages loop
Konstantin Komarov [Fri, 22 May 2026 12:55:38 +0000 (14:55 +0200)] 
fs/ntfs3: fold resident writeback into writepages loop

Remove the separate ntfs_resident_writepage() helper and handle resident
writeback directly from ntfs_writepages(). This simplifies the resident
writeback path and keeps the folio handling local to ntfs_writepages().

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks agofs/ntfs3: handle delayed allocation overlap in run lookup
Konstantin Komarov [Fri, 22 May 2026 12:52:37 +0000 (14:52 +0200)] 
fs/ntfs3: handle delayed allocation overlap in run lookup

Introduce run_lookup_entry_da() to look up data runs while taking
delayed allocation into account.

ntfs3 may have both committed extents and delayed allocation extents for
the same VCN range.  The new helper checks delayed allocation first and
falls back to the real run, then corrects the returned range when a real
run overlaps with a delayed allocation run.

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks agofs/ntfs3: zero stale pagecache beyond valid data length
Konstantin Komarov [Fri, 22 May 2026 12:46:23 +0000 (14:46 +0200)] 
fs/ntfs3: zero stale pagecache beyond valid data length

Zero cached folios beyond the valid data length when closing a writable
mapping. This keeps cached data beyond initialized file contents zeroed
and prevents stale pagecache exposure after mmap-based writes.

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks agofs/ntfs3: add fileattr support
Konstantin Komarov [Fri, 22 May 2026 12:36:13 +0000 (14:36 +0200)] 
fs/ntfs3: add fileattr support

Implement fileattr_get() and fileattr_set() to fix a problem found
during internal testing.

This allows ntfs3 to expose and modify inode flags through the generic
file attribute interface used by FS_IOC_GETFLAGS and FS_IOC_SETFLAGS.

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks agofs/ntfs3: call _ntfs_bad_inode() when failing to rename
Helen Koike [Wed, 6 May 2026 17:08:22 +0000 (14:08 -0300)] 
fs/ntfs3: call _ntfs_bad_inode() when failing to rename

It is safe to call _ntfs_bad_inode on live inodes since:
  commit 519b078998ce ("fs/ntfs3: Exclude call make_bad_inode for live nodes.")

The WARN_ON was added when it wasn't safe by:
  commit d99208b91933 ("fs/ntfs3: cancle set bad inode after removing name fails")

Replace the WARN_ON with a call to _ntfs_bad_inode() to prevent further
operations on the inconsistent inode.

Reported-by: syzbot+4d8e30dbafb5c1260479@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4d8e30dbafb5c1260479
Fixes: 519b078998ce ("fs/ntfs3: Exclude call make_bad_inode for live nodes.")
Signed-off-by: Helen Koike <koike@igalia.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks agofs/ntfs3: fix wrong LCN in run_remove_range() when splitting a run
Zhan Xusheng [Fri, 8 May 2026 09:52:45 +0000 (17:52 +0800)] 
fs/ntfs3: fix wrong LCN in run_remove_range() when splitting a run

When run_remove_range() removes a middle portion of a non-sparse run,
it splits the run into head and tail parts.  The tail is inserted via
run_add_entry() but uses the original r->lcn as its starting LCN
instead of advancing it by the split offset.

For example, removing VCN range [10, 20) from a run
{vcn=0, lcn=100, len=30} should produce:
  {vcn=0,  lcn=100, len=10}   (head)
  {vcn=20, lcn=120, len=10}   (tail, lcn advanced by 20)

But the current code produces:
  {vcn=0,  lcn=100, len=10}
  {vcn=20, lcn=100, len=10}   (wrong: points to same physical clusters)

This creates overlapping physical mappings in the in-memory run tree,
which can corrupt cluster allocation decisions and lead to data
corruption.

The correct pattern is already used in run_insert_range():
  CLST lcn2 = r->lcn == SPARSE_LCN ? SPARSE_LCN : (r->lcn + len1);

Apply the same logic in run_remove_range().
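
The worked example above can be checked with a small standalone model.
struct demo_run, DEMO_SPARSE_LCN and demo_split_run() are hypothetical
and do not reproduce the ntfs3 run tree; they only show how the tail LCN
is derived.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SPARSE_LCN UINT64_MAX	/* hypothetical sparse marker */

/* Simplified run: len clusters mapping VCN vcn onto LCN lcn. */
struct demo_run {
	uint64_t vcn, lcn, len;
};

/* Remove [vcn, vcn + len) from the middle of r, yielding head and tail. */
void demo_split_run(const struct demo_run *r, uint64_t vcn, uint64_t len,
		    struct demo_run *head, struct demo_run *tail)
{
	uint64_t end = vcn + len;
	uint64_t skip = end - r->vcn;	/* clusters before the tail starts */

	/* Only the "middle portion" case is modelled here. */
	assert(vcn > r->vcn && end < r->vcn + r->len);

	head->vcn = r->vcn;
	head->lcn = r->lcn;
	head->len = vcn - r->vcn;

	tail->vcn = end;
	/* A sparse run stays sparse; otherwise advance the LCN. */
	tail->lcn = r->lcn == DEMO_SPARSE_LCN ? DEMO_SPARSE_LCN
					      : r->lcn + skip;
	tail->len = r->vcn + r->len - end;
}

/* Tail LCN after removing [10, 20) from {vcn=0, lcn=run_lcn, len=30}. */
uint64_t demo_tail_lcn(uint64_t run_lcn)
{
	struct demo_run r = { .vcn = 0, .lcn = run_lcn, .len = 30 };
	struct demo_run head, tail;

	demo_split_run(&r, 10, 10, &head, &tail);
	return tail.lcn;
}
```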

Fixes: 10d7c95af043 ("fs/ntfs3: add delayed-allocation (delalloc) support")
Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks agofs/ntfs3: validate Dirty Page Table capacity in log_replay copy_lcns
Yunpeng Tian [Mon, 4 May 2026 14:19:43 +0000 (07:19 -0700)] 
fs/ntfs3: validate Dirty Page Table capacity in log_replay copy_lcns

In the analysis pass of $LogFile journal replay, log_replay() copies
LCNs from each action log record into an existing Dirty Page Table
(DPT) entry without bounding the destination index. A crafted NTFS
image with DPT entry lcns_follow=1 and an action log record with
lcns_follow=2 produces a kernel slab out-of-bounds write at mount
time:

  BUG: KASAN: slab-out-of-bounds in log_replay+0x654c/0xdb60
  Write of size 8 at addr ffff8880095e1040 by task mount

Two attacker-controlled fields can drive j+i past the allocated
page_lcns[] array:

  1. dp->lcns_follow (capacity) can be smaller than lrh->lcns_follow.
  2. lrh->target_vcn may be smaller than dp->vcn, making the u64
     subtraction wrap to a huge size_t.

Validate the target VCN delta and the per-record LCN count against the
DPT entry capacity, and bail out via the existing out: cleanup label with
-EINVAL when either check fails.
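
A standalone sketch of the two checks, written so that the unsigned
subtraction cannot wrap. demo_check_copy_lcns() and its parameter names
are hypothetical; the field widths are assumptions, and the actual patch
may order or express the checks differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Returns 0 if lrh_lcns_follow LCNs starting at index
 * (target_vcn - dp_vcn) fit into a table of dp_lcns_follow slots,
 * -EINVAL otherwise.
 */
int demo_check_copy_lcns(uint64_t dp_vcn, uint32_t dp_lcns_follow,
			 uint64_t target_vcn, uint16_t lrh_lcns_follow)
{
	uint64_t delta;

	/* Case 2 above: the subtraction would wrap to a huge index. */
	if (target_vcn < dp_vcn)
		return -EINVAL;

	delta = target_vcn - dp_vcn;
	if (delta >= dp_lcns_follow)
		return -EINVAL;

	/* Case 1 above: the record carries more LCNs than slots remain. */
	if (lrh_lcns_follow > dp_lcns_follow - delta)
		return -EINVAL;

	return 0;
}
```

The crafted case from the description (capacity 1, record count 2) is
rejected by the last check, and a target VCN below the entry's VCN is
rejected before any subtraction happens.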

This mirrors the bounds-check pattern added in commit b2bc7c44ed17
("fs/ntfs3: Fix slab-out-of-bounds read in DeleteIndexEntryRoot")
and commit 0ca0485e4b2e ("fs/ntfs3: validate rec->used in
journal-replay file record check").

Fixes: b46acd6a6a62 ("fs/ntfs3: Add NTFS journal")
Reported-by: Yunpeng Tian <shionthanatos@gmail.com>
Reported-by: Mingda Zhang <npczmd@qq.com>
Reported-by: Gongming Wang <gmwgg05@gmail.com>
Reported-by: Peiyuan Xu <paulbucket12@gmail.com>
Reported-by: Qinrun Dai <jupmouse@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Yunpeng Tian <shionthanatos@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks agofs/ntfs3: fix syncing wrong inode on DIRSYNC cross-directory rename
Zhan Xusheng [Wed, 6 May 2026 07:55:54 +0000 (15:55 +0800)] 
fs/ntfs3: fix syncing wrong inode on DIRSYNC cross-directory rename

In ntfs3_rename(), when IS_DIRSYNC(new_dir) is true, the code syncs
the renamed file inode instead of the target directory new_dir:
    if (IS_DIRSYNC(new_dir))
        ntfs_sync_inode(inode);      /* should be new_dir */

DIRSYNC requires that directory metadata changes are written to disk
synchronously.  Since new_dir was modified (a new directory entry was
added), it is new_dir that must be synced to satisfy the guarantee,
not the renamed file itself.

This bug has existed since the initial ntfs3 implementation and was
carried through the refactoring in commit 78ab59fee07f
("fs/ntfs3: Rework file operations").

Fix by syncing new_dir instead of inode.

Fixes: 4342306f0f0d ("fs/ntfs3: Add file operations and implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks agofs/ntfs3: validate index entry key bounds
ZhengYuan Huang [Fri, 24 Apr 2026 03:47:36 +0000 (11:47 +0800)] 
fs/ntfs3: validate index entry key bounds

[BUG]
A malformed NTFS directory index entry can advertise a key_size larger
than the bytes actually present in its NTFS_DE payload. Directory lookup
then passes that malformed key to cmp_fnames(), which can read past the
end of the kmalloc'ed index buffer.

BUG: KASAN: slab-out-of-bounds in fname_full_size fs/ntfs3/ntfs.h:590 [inline]
BUG: KASAN: slab-out-of-bounds in cmp_fnames+0x1ea/0x230 fs/ntfs3/index.c:46
Read of size 1 at addr ffff88801c313018 by task syz.6.3365/9279

Call Trace:
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0xbe/0x130 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xd1/0x650 mm/kasan/report.c:482
 kasan_report+0xfb/0x140 mm/kasan/report.c:595
 __asan_report_load1_noabort+0x14/0x30 mm/kasan/report_generic.c:378
 fname_full_size fs/ntfs3/ntfs.h:590 [inline]
 cmp_fnames+0x1ea/0x230 fs/ntfs3/index.c:46
 hdr_find_e.isra.0+0x3ed/0x670 fs/ntfs3/index.c:762
 indx_find+0x4b5/0x900 fs/ntfs3/index.c:1186
 dir_search_u+0x2c0/0x460 fs/ntfs3/dir.c:254
 ntfs_lookup+0x1cc/0x2a0 fs/ntfs3/namei.c:85
 __lookup_slow+0x241/0x450 fs/namei.c:1816
 lookup_slow fs/namei.c:1833 [inline]
 walk_component+0x31c/0x570 fs/namei.c:2151
 link_path_walk+0x592/0xd60 fs/namei.c:2519
 path_lookupat+0x138/0x660 fs/namei.c:2675
 filename_lookup+0x1f3/0x560 fs/namei.c:2705
 filename_setxattr+0xad/0x1c0 fs/xattr.c:660
 path_setxattrat+0x1d8/0x280 fs/xattr.c:713
 __do_sys_lsetxattr fs/xattr.c:754 [inline]
 __se_sys_lsetxattr fs/xattr.c:750 [inline]
 __x64_sys_lsetxattr+0xd0/0x150 fs/xattr.c:750
 ...

Allocated by task 9279:
 kasan_save_stack+0x39/0x70 mm/kasan/common.c:56
 kasan_save_track+0x14/0x40 mm/kasan/common.c:77
 kasan_save_alloc_info+0x37/0x60 mm/kasan/generic.c:573
 poison_kmalloc_redzone mm/kasan/common.c:400 [inline]
 __kasan_kmalloc+0xc3/0xd0 mm/kasan/common.c:417
 kasan_kmalloc include/linux/kasan.h:262 [inline]
 __do_kmalloc_node mm/slub.c:5650 [inline]
 __kmalloc_noprof+0x2bd/0x900 mm/slub.c:5662
 kmalloc_noprof include/linux/slab.h:961 [inline]
 indx_read+0x41d/0xad0 fs/ntfs3/index.c:1059
 indx_find+0x447/0x900 fs/ntfs3/index.c:1179
 dir_search_u+0x2c0/0x460 fs/ntfs3/dir.c:254
 ntfs_lookup+0x1cc/0x2a0 fs/ntfs3/namei.c:85
 __lookup_slow+0x241/0x450 fs/namei.c:1816
 lookup_slow fs/namei.c:1833 [inline]
 walk_component+0x31c/0x570 fs/namei.c:2151
 link_path_walk+0x592/0xd60 fs/namei.c:2519
 path_lookupat+0x138/0x660 fs/namei.c:2675
 filename_lookup+0x1f3/0x560 fs/namei.c:2705
 filename_setxattr+0xad/0x1c0 fs/xattr.c:660
 path_setxattrat+0x1d8/0x280 fs/xattr.c:713
 __do_sys_lsetxattr fs/xattr.c:754 [inline]
 __se_sys_lsetxattr fs/xattr.c:750 [inline]
 __x64_sys_lsetxattr+0xd0/0x150 fs/xattr.c:750
 ...

[CAUSE]
The index-header validators only validated INDEX_HDR-level geometry.
They did not walk each NTFS_DE to verify entry alignment, subnode
layout, or that key_size fit inside the entry payload. They also
allowed a last sentinel entry to carry a non-zero key_size.

[FIX]
Walk every NTFS_DE in ntfs3's index-header validators and reject
entries with invalid layout, mismatched subnode state, oversized
key_size, or non-zero sentinel keys before lookup or log replay can
consume them.

Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks agofs/ntfs3: preserve non-DOS attribute bits in system.dos_attrib
ZhengYuan Huang [Mon, 27 Apr 2026 03:24:18 +0000 (11:24 +0800)] 
fs/ntfs3: preserve non-DOS attribute bits in system.dos_attrib

[BUG]
A corrupted ntfs3 image can hit a NULL function pointer call in
generic_perform_write() after toggling system.ntfs_attrib and then
overwriting system.dos_attrib on the same file.

BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD bed5067 P4D bed5067 PUD 0
Oops: Oops: 0010 [#1] SMP KASAN NOPTI
RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0018:ffff88801025f988 EFLAGS: 00010246
Call Trace:
 generic_perform_write+0x409/0x8c0 mm/filemap.c:4255
 __generic_file_write_iter+0x1bb/0x200 mm/filemap.c:4372
 ntfs_file_write_iter+0xcd9/0x1c20 fs/ntfs3/file.c:1253
 new_sync_write fs/read_write.c:593 [inline]
 vfs_write+0x63b/0xf70 fs/read_write.c:686
 ksys_write+0x133/0x250 fs/read_write.c:738
 __do_sys_write fs/read_write.c:749 [inline]
 __se_sys_write fs/read_write.c:746 [inline]
 __x64_sys_write+0x77/0xc0 fs/read_write.c:746
 ...

[CAUSE]
system.ntfs_attrib updates ATTR_DATA flags via ni_new_attr_flags()
and switches i_mapping->a_ops to ntfs_aops_cmpr when
FILE_ATTRIBUTE_COMPRESSED is set. system.dos_attrib then overwrites
ni->std_fa from a one-byte DOS attribute value, clearing the compression
bit without updating ATTR_DATA or the mapping operations.

Old buffered writes use is_compressed(ni) to choose
__generic_file_write_iter(). That leaves generic_perform_write() calling
a NULL write_begin callback from ntfs_aops_cmpr.

[FIX]
Treat system.dos_attrib as a low-byte DOS attribute update and preserve the
existing non-DOS attribute bits in ni->std_fa. This keeps compressed and
sparse state consistent with ATTR_DATA and the mapping operations while
keeping the existing DOS attribute semantics intact.
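
The masking idea can be sketched as follows. The attribute values are the
standard Windows FILE_ATTRIBUTE_* bits; the 0xff mask is an assumption
drawn from the "low-byte" wording above, and both helper functions are
hypothetical rather than the patch's code.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_FILE_ATTRIBUTE_READONLY	0x00000001u
#define DEMO_FILE_ATTRIBUTE_HIDDEN	0x00000002u
#define DEMO_FILE_ATTRIBUTE_SPARSE_FILE	0x00000200u
#define DEMO_FILE_ATTRIBUTE_COMPRESSED	0x00000800u
#define DEMO_DOS_ATTR_MASK		0x000000ffu	/* assumed low byte */

static_assert((DEMO_DOS_ATTR_MASK & DEMO_FILE_ATTRIBUTE_COMPRESSED) == 0,
	      "the compression bit lies outside the DOS byte");

/* Old behaviour: the one-byte value replaces the whole attribute word. */
uint32_t demo_overwrite_dos_attrib(uint32_t std_fa, uint8_t dos)
{
	(void)std_fa;
	return dos;
}

/* Fixed behaviour: only the low byte changes, other bits are preserved. */
uint32_t demo_apply_dos_attrib(uint32_t std_fa, uint8_t dos)
{
	return (std_fa & ~DEMO_DOS_ATTR_MASK) | dos;
}
```

Starting from COMPRESSED | READONLY and writing HIDDEN, the old behaviour
yields just HIDDEN, silently dropping the compression bit, while the
masked update yields COMPRESSED | HIDDEN.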

Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks agofs/ntfs3: hold ni_lock across readdir metadata walk
ZhengYuan Huang [Mon, 27 Apr 2026 07:26:50 +0000 (15:26 +0800)] 
fs/ntfs3: hold ni_lock across readdir metadata walk

[BUG]
KASAN reports a slab-use-after-free during getdents(2):

BUG: KASAN: slab-use-after-free in ntfs_read_mft fs/ntfs3/inode.c:79 [inline]
BUG: KASAN: slab-use-after-free in ntfs_iget5+0x59b/0x3450 fs/ntfs3/inode.c:541
Read of size 2 at addr ffff88800b7a5a4e by task syz.0.1061/2354

Call Trace:
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0xbe/0x130 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xd1/0x650 mm/kasan/report.c:482
 kasan_report+0xfb/0x140 mm/kasan/report.c:595
 __asan_report_load2_noabort+0x14/0x30 mm/kasan/report_generic.c:379
 ntfs_read_mft fs/ntfs3/inode.c:79 [inline]
 ntfs_iget5+0x59b/0x3450 fs/ntfs3/inode.c:541
 ntfs_dir_emit fs/ntfs3/dir.c:337 [inline]
 ntfs_read_hdr+0x714/0x930 fs/ntfs3/dir.c:385
 ntfs_readdir+0xaad/0x1010 fs/ntfs3/dir.c:458
 iterate_dir+0x276/0x9e0 fs/readdir.c:108
 __do_sys_getdents fs/readdir.c:326 [inline]
 __se_sys_getdents fs/readdir.c:312 [inline]
 __x64_sys_getdents+0x143/0x290 fs/readdir.c:312
 ...

Allocated by task 2160:
 kasan_save_stack+0x39/0x70 mm/kasan/common.c:56
 kasan_save_track+0x14/0x40 mm/kasan/common.c:77
 kasan_save_alloc_info+0x37/0x60 mm/kasan/generic.c:573
 poison_kmalloc_redzone mm/kasan/common.c:400 [inline]
 __kasan_kmalloc+0xc3/0xd0 mm/kasan/common.c:417
 kasan_kmalloc include/linux/kasan.h:262 [inline]
 __do_kmalloc_node mm/slub.c:5650 [inline]
 __kmalloc_noprof+0x2bd/0x900 mm/slub.c:5662
 kmalloc_noprof include/linux/slab.h:961 [inline]
 mi_init+0x9d/0x110 fs/ntfs3/record.c:105
 mi_format_new+0x6b/0x500 fs/ntfs3/record.c:422
 ni_add_subrecord+0x129/0x540 fs/ntfs3/frecord.c:321
 ntfs_look_free_mft+0x238/0xd90 fs/ntfs3/fsntfs.c:715
 ni_create_attr_list+0x8e6/0x1690 fs/ntfs3/frecord.c:826
 ni_ins_attr_ext+0x5ec/0x9d0 fs/ntfs3/frecord.c:924
 ni_insert_attr+0x2bf/0x830 fs/ntfs3/frecord.c:1091
 ni_insert_resident+0xec/0x3d0 fs/ntfs3/frecord.c:1475
 ni_add_name+0x4b2/0x8a0 fs/ntfs3/frecord.c:2987
 ni_rename+0xa6/0x160 fs/ntfs3/frecord.c:3026
 ntfs_rename+0xa19/0xe00 fs/ntfs3/namei.c:332
 vfs_rename+0xd42/0x1d50 fs/namei.c:5216
 do_renameat2+0x715/0xb60 fs/namei.c:5364
 __do_sys_rename fs/namei.c:5411 [inline]
 __se_sys_rename fs/namei.c:5409 [inline]
 __x64_sys_rename+0x83/0xb0 fs/namei.c:5409
 x64_sys_call+0x8c4/0x26a0 arch/x86/include/generated/asm/syscalls_64.h:83
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x93/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Freed by task 85:
 kasan_save_stack+0x39/0x70 mm/kasan/common.c:56
 kasan_save_track+0x14/0x40 mm/kasan/common.c:77
 __kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:587
 kasan_save_free_info mm/kasan/kasan.h:406 [inline]
 poison_slab_object mm/kasan/common.c:252 [inline]
 __kasan_slab_free+0x6f/0xa0 mm/kasan/common.c:284
 kasan_slab_free include/linux/kasan.h:234 [inline]
 slab_free_hook mm/slub.c:2543 [inline]
 slab_free mm/slub.c:6642 [inline]
 kfree+0x2bf/0x6b0 mm/slub.c:6849
 mi_clear fs/ntfs3/ntfs_fs.h:1107 [inline]
 mi_put+0x10e/0x1a0 fs/ntfs3/record.c:97
 ni_write_inode+0x479/0x2a00 fs/ntfs3/frecord.c:3320
 ntfs3_write_inode+0x51/0x70 fs/ntfs3/inode.c:1042
 write_inode fs/fs-writeback.c:1564 [inline]
 __writeback_single_inode+0x8c9/0xc30 fs/fs-writeback.c:1784
 writeback_sb_inodes+0x5e6/0xf60 fs/fs-writeback.c:2015
 __writeback_inodes_wb+0x10c/0x2d0 fs/fs-writeback.c:2086
 wb_writeback+0x63f/0x900 fs/fs-writeback.c:2197
 wb_check_old_data_flush fs/fs-writeback.c:2301 [inline]
 wb_do_writeback fs/fs-writeback.c:2354 [inline]
 wb_workfn+0x8cc/0xd60 fs/fs-writeback.c:2382
 process_one_work+0x8e0/0x1980 kernel/workqueue.c:3263
 process_scheduled_works kernel/workqueue.c:3346 [inline]
 worker_thread+0x683/0xf80 kernel/workqueue.c:3427
 kthread+0x3f0/0x850 kernel/kthread.c:463
 ret_from_fork+0x50f/0x610 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

The faulting address sits 590 bytes inside a freed kmalloc-1k object
allocated by ni_add_subrecord() and freed from ni_write_inode()
writeback.

[CAUSE]
ntfs_readdir() loads all subrecords once, but then drops ni_lock()
before it starts walking the directory metadata through ntfs_read_hdr().
That leaves the current NTFS_DE pointer backed by parent-directory
subrecord memory that concurrent writeback is still allowed to compact
and free.

The later ntfs_dir_emit() -> ntfs_iget5() call exposes the stale e->ref,
but the lifetime bug starts earlier: readdir is still consuming
parent-directory metadata after releasing the lock that protects it.

[FIX]
Keep ni_lock() held from the point where ntfs_readdir() starts
consuming the directory metadata until the walk over root/index entries
is finished.

This closes the parent-directory lifetime hole directly and keeps the
existing readdir d_type behaviour unchanged.

Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks ago ntfs3: avoid -Wmaybe-uninitialized warning
Arnd Bergmann [Tue, 21 Apr 2026 20:26:20 +0000 (22:26 +0200)] 
ntfs3: avoid -Wmaybe-uninitialized warning

This warning shows up with gcc-10 now:

In file included from fs/ntfs3/index.c:15:
fs/ntfs3/index.c: In function 'indx_add_allocate':
fs/ntfs3/ntfs_fs.h:463:9: error: 'bmp_size' may be used uninitialized in this function [-Werror=maybe-uninitialized]
  463 |  return attr_set_size_ex(ni, type, name, name_len, run, new_size,
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  464 |     new_valid, keep_prealloc, NULL, false);
      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/ntfs3/index.c:1498:6: note: 'bmp_size' was declared here
 1498 |  u64 bmp_size, bmp_size_v;
      |      ^~~~~~~~

The warning does look correct, as the 'out2' label can be reached
without initializing bmp_size and bmp_size_v. Initialize these at
the same place as bmp.
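
A minimal sketch of the pattern behind this warning, not the code from
fs/ntfs3/index.c: grow_bitmap() and its arithmetic are hypothetical. An
early goto reaches a label that reads a variable assigned only later, so
giving it a value before the first jump makes every path well defined.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the shape of the warned function: an early
 * exit jumps to a label that consumes bmp_size.
 */
uint64_t grow_bitmap(int fail_early, uint64_t old_size)
{
	/* Initialized up front so the 'out' label never reads garbage. */
	uint64_t bmp_size = 0;

	if (fail_early)
		goto out;	/* reaches 'out' before the assignment below */

	bmp_size = old_size + 8;

out:
	/* Cleanup path: reads bmp_size on both the early and the late path. */
	return bmp_size;
}
```

Without the initializer, the early path would return an indeterminate
value, which is what the compiler is reporting.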

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks ago fs/ntfs3: add bounds check to run_get_highest_vcn()
Konstantin Komarov [Thu, 30 Apr 2026 12:30:13 +0000 (14:30 +0200)] 
fs/ntfs3: add bounds check to run_get_highest_vcn()

run_get_highest_vcn() parses a packed NTFS mapping-pairs buffer without
any length bound, relying solely on a 0x00 terminator to stop.  A
crafted $LogFile UpdateMappingPairs record whose embedded attribute
contains mapping-pairs runs without a terminator causes the function to
read past the slab allocation, triggering a KASAN slab-out-of-bounds
read on mount.

The sibling function run_unpack() received an analogous bounds-check in
commit b62567bca474 ("ntfs3: add buffer boundary checks to run_unpack()"),
but run_get_highest_vcn() was missed.

Take a run_buf_size parameter and reject any run header whose payload
would extend past the buffer end, mirroring the pattern used by
run_unpack().  The caller in fslog.c passes the remaining attribute
bytes after the mapping-pairs offset.
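
The bounded walk can be sketched roughly as follows. This is an
illustration under assumptions, not the patched kernel function: the name
highest_vcn_bounded(), its signature, and its return convention are
hypothetical. It relies on the usual mapping-pairs layout, where the low
nibble of each header byte gives the size of the length field and the high
nibble gives the size of the offset field.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative bounded walk over NTFS mapping pairs.  A 0x00 header byte
 * terminates the list.  Returns 0 and stores the highest VCN, or -1 on a
 * malformed or unterminated buffer.
 */
int highest_vcn_bounded(const uint8_t *buf, size_t buf_size,
			uint64_t svcn, uint64_t *highest)
{
	size_t pos = 0;
	uint64_t vcn = svcn;

	while (pos < buf_size) {
		uint8_t hdr = buf[pos];
		size_t len_bytes = hdr & 0x0f;
		size_t off_bytes = hdr >> 4;
		uint64_t len = 0;
		size_t i;

		if (hdr == 0) {		/* terminator found inside the buffer */
			*highest = vcn - 1;
			return 0;
		}
		if (len_bytes == 0 || len_bytes > 8 || off_bytes > 8)
			return -1;
		/* Reject a run whose payload would extend past the buffer. */
		if (len_bytes + off_bytes > buf_size - pos - 1)
			return -1;

		for (i = 0; i < len_bytes; i++)	/* little-endian length */
			len |= (uint64_t)buf[pos + 1 + i] << (8 * i);

		vcn += len;
		pos += 1 + len_bytes + off_bytes;
	}
	return -1;	/* ran off the end without seeing a terminator */
}
```

The final return is the case this patch is about: a buffer that never
supplies the 0x00 terminator is rejected instead of being read past its
end.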

KASAN report (on mainline v7.1 merge window HEAD):

  BUG: KASAN: slab-out-of-bounds in run_get_highest_vcn+0x3c0/0x410
  Read of size 1 at addr ffff88800e2d5400 by task mount/72
  Call Trace:
   run_get_highest_vcn+0x3c0/0x410
   do_action.isra.0+0x3ba8/0x7b50
   log_replay+0x9ddd/0x10200
   ntfs_loadlog_and_replay+0x4ad/0x610
   ntfs_fill_super+0x214a/0x4540

Fixes: b62567bca474 ("ntfs3: add buffer boundary checks to run_unpack()")
Signed-off-by: Jaeyeong Lee <lee@jaeyeong.cc>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks ago ASoC: rockchip: i2s: Use managed hclk and runtime PM cleanup
Cássio Gabriel [Fri, 22 May 2026 02:30:07 +0000 (23:30 -0300)] 
ASoC: rockchip: i2s: Use managed hclk and runtime PM cleanup

The Rockchip I2S driver mixes devm-managed probe resources with manual
runtime PM and hclk cleanup.  This leaves the remove path doing runtime PM
shutdown and clock disable before devm-managed ASoC and PCM resources are
released.

Keep the bus clock enabled for the device lifetime with
devm_clk_get_enabled(), and move the runtime PM teardown into devres so the
unwind order matches the managed registrations.  This also removes the
remove callback, which only existed for cleanup.

Use a devm action for the final runtime suspend and register it before the
managed runtime PM action, so teardown disables runtime PM before forcing
the device into the suspended state.
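
The ordering argument rests on devres releasing actions in reverse
registration order. A toy model of that behaviour, with every toy_* name
invented for illustration (none of this is the devres API):

```c
#include <stddef.h>

#define MAX_ACTIONS 8

/* Toy model of a devres list: actions run in reverse registration order. */
struct toy_devres {
	void (*action[MAX_ACTIONS])(void *);
	void *data[MAX_ACTIONS];
	size_t count;
};

int toy_add_action(struct toy_devres *dr, void (*fn)(void *), void *data)
{
	if (dr->count == MAX_ACTIONS)
		return -1;
	dr->action[dr->count] = fn;
	dr->data[dr->count] = data;
	dr->count++;
	return 0;
}

void toy_release_all(struct toy_devres *dr)
{
	while (dr->count > 0) {
		dr->count--;
		dr->action[dr->count](dr->data[dr->count]);
	}
}

/* Each callback appends one letter to a log so the order is visible. */
struct toy_log {
	char buf[8];
	size_t len;
};

static void toy_append(struct toy_log *log, char c)
{
	log->buf[log->len++] = c;
	log->buf[log->len] = '\0';
}

void toy_force_suspend(void *p) { toy_append(p, 'S'); }
void toy_pm_disable(void *p) { toy_append(p, 'D'); }

/* Registers the suspend action first, then the PM-disable action. */
void toy_probe_then_remove(struct toy_log *log)
{
	struct toy_devres dr = { .count = 0 };

	toy_add_action(&dr, toy_force_suspend, log);
	toy_add_action(&dr, toy_pm_disable, log);
	toy_release_all(&dr);
}
```

Because the suspend action is registered first, it runs last: the log
records the PM-disable step ('D') before the forced suspend ('S').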

Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Link: https://patch.msgid.link/20260521-asoc-rockchip-i2s-devm-cleanup-v1-1-9319bd781393@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks ago mount: honour SB_NOUSER in the new mount API
Al Viro [Tue, 2 Jun 2026 02:04:44 +0000 (03:04 +0100)] 
mount: honour SB_NOUSER in the new mount API

One should *not* be allowed to mount one of those, new API or not.

Reported-by: Denis Arefev <arefev@swemel.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://patch.msgid.link/20260602020444.GP2636677@ZenIV
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks ago ASoC: cs35l56: Share common SoundWire interrupt enable/disable code
Richard Fitzgerald [Fri, 29 May 2026 14:03:50 +0000 (15:03 +0100)] 
ASoC: cs35l56: Share common SoundWire interrupt enable/disable code

Move the duplicated SoundWire interrupt enable/disable code into shared
functions. These new functions live in cs35l56.c to prevent a circular
dependency between cs35l56.c and cs35l56-sdw.c.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://patch.msgid.link/20260529140350.408557-1-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks ago KVM: s390: Lock pte when making page secure
Claudio Imbrenda [Tue, 2 Jun 2026 14:23:53 +0000 (16:23 +0200)] 
KVM: s390: Lock pte when making page secure

Make sure _kvm_s390_pv_make_secure() takes the pte lock for the given
address when attempting to make the page secure.

One of the steps in making the page secure is freezing the folio using
folio_ref_freeze(), which temporarily sets the reference count to 0.
Any attempt to get such a folio while frozen will fail and cause a
warning to be printed.

Other users of folio_ref_freeze() make sure that the page is not mapped
while it's being frozen, thus preventing gup functions from being able
to access it. For _kvm_s390_pv_make_secure(), this is not possible,
because the page needs to be mapped in order for the import to succeed.

By taking the pte lock, gup functions will be blocked until the import
operation is done, thus avoiding the race.

In theory this does not completely solve the issue: if a page is mapped
through multiple mappings, locking one pte does not protect from
calling gup on it through the other mapping. In practice this does not
happen and it is a decent stopgap solution until a more correct
solution is available.

Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-8-imbrenda@linux.ibm.com>

3 weeks ago KVM: s390: Fix fault-in code
Claudio Imbrenda [Tue, 2 Jun 2026 14:23:52 +0000 (16:23 +0200)] 
KVM: s390: Fix fault-in code

Fix the fault-in code so that it does not return success if a
concurrent unmap event invalidated the fault-in process between the
best-effort lockless check and the proper check with lock.

The new behaviour is to retry, like the best-effort lockless check
already did.

This prevents the fault-in handler from returning success without
having actually faulted in the requested page.
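
A simplified, single-threaded sketch of the retry idea, assuming a
sequence counter that every unmap event bumps. The toy_* names are
hypothetical and the real code uses two checks (lockless, then locked);
this model collapses them into one comparison.

```c
/* Toy state: 'seq' is bumped by every (simulated) unmap event. */
struct toy_mmu {
	unsigned long seq;
	int pending_unmaps;	/* unmap events that will race with fault-in */
	int mapped;		/* set once the page is really faulted in */
};

/* Simulates the slow part of fault-in; an unmap may race with it. */
static void toy_resolve_page(struct toy_mmu *m)
{
	if (m->pending_unmaps > 0) {
		m->pending_unmaps--;
		m->seq++;
	}
}

/* Returns the number of attempts; never reports success on a stale seq. */
int toy_fault_in(struct toy_mmu *m)
{
	int attempts = 0;

	for (;;) {
		unsigned long seq = m->seq;	/* snapshot before the work */

		attempts++;
		toy_resolve_page(m);

		/* Invalidated in the meantime: retry, do not return success. */
		if (seq != m->seq)
			continue;

		m->mapped = 1;
		return attempts;
	}
}
```

With two racing unmap events the function succeeds on the third attempt,
and 'mapped' is only set on the attempt that was not invalidated.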

Fixes: e907ae530133 ("KVM: s390: Add helper functions for fault handling")
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-7-imbrenda@linux.ibm.com>

3 weeks ago KVM: s390: vsie: Fix rmap handling in _do_shadow_crste()
Claudio Imbrenda [Tue, 2 Jun 2026 14:23:51 +0000 (16:23 +0200)] 
KVM: s390: vsie: Fix rmap handling in _do_shadow_crste()

Fix _do_shadow_crste() to also apply a mask on the reverse address, to
prevent spurious entries from being created, like already done in
gmap_protect_rmap().

Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-6-imbrenda@linux.ibm.com>

3 weeks ago KVM: s390: Fix guest / virtual address confusion in _essa_clear_cbrl()
Claudio Imbrenda [Tue, 2 Jun 2026 14:23:50 +0000 (16:23 +0200)] 
KVM: s390: Fix guest / virtual address confusion in _essa_clear_cbrl()

Until now, gmap_helper_zap_one_page() was being called with the guest
absolute address, but it expects a userspace virtual address.

This meant that in the best case the requested pages were not being
discarded, and in the worst case that the wrong pages were being
discarded.

Fix this by converting the guest absolute address to host virtual
before passing it to gmap_helper_zap_one_page().
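
To illustrate why the two address spaces must not be mixed, here is a
rough sketch of a guest-absolute to host-virtual translation through a
single, hypothetical memory slot. struct toy_memslot and
toy_gaddr_to_hva() are made up for this example and are not KVM APIs.

```c
#include <stdint.h>

/* Hypothetical, simplified memory slot: one contiguous guest range. */
struct toy_memslot {
	uint64_t guest_base;	/* first guest absolute address covered */
	uint64_t size;		/* length of the range in bytes */
	uint64_t user_base;	/* host virtual address backing guest_base */
};

/* Returns 0 and stores the host virtual address, or -1 if out of range. */
int toy_gaddr_to_hva(const struct toy_memslot *slot, uint64_t gaddr,
		     uint64_t *hva)
{
	if (gaddr < slot->guest_base || gaddr - slot->guest_base >= slot->size)
		return -1;
	*hva = slot->user_base + (gaddr - slot->guest_base);
	return 0;
}
```

Passing gaddr where hva is expected would either miss the mapping or hit
an unrelated one, which matches the best and worst cases described above.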

Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-5-imbrenda@linux.ibm.com>

3 weeks ago KVM: s390: Avoid potentially sleeping while atomic when zapping pages
Claudio Imbrenda [Tue, 2 Jun 2026 14:23:49 +0000 (16:23 +0200)] 
KVM: s390: Avoid potentially sleeping while atomic when zapping pages

Factor out try_get_locked_pte(), which behaves similarly to
get_locked_pte(), but does not attempt to allocate missing tables and
performs a spin_trylock() instead of blocking.

The new function is also exported, since it will be used in other
patches.

If intermediate entries are missing, there can be no pte swap entry to
free, so it's safe to ignore them.

This avoids potentially sleeping while atomic.

Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-4-imbrenda@linux.ibm.com>

3 weeks ago KVM: s390: Fix _gmap_crstep_xchg_atomic()
Claudio Imbrenda [Tue, 2 Jun 2026 14:23:48 +0000 (16:23 +0200)] 
KVM: s390: Fix _gmap_crstep_xchg_atomic()

The previous incorrect behaviour cleared the vsie_notif bit without
returning false, which allowed shadow crstes to be installed without
the vsie_notif bit.

Return false and do not perform the operation if an unshadow event has
been triggered, but still attempt to clear the vsie_notif bit from the
existing crste.

This will prevent the installation of shadow crstes without vsie_notif
bit and will also prevent the caller from looping forever if it was
not checking for the sg->invalidated flag.

Fixes: b827ef02f409 ("KVM: s390: Remove non-atomic dat_crstep_xchg()")
Fixes: a2c17f9270cc ("KVM: s390: New gmap code")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-3-imbrenda@linux.ibm.com>

3 weeks ago KVM: s390: Fix _gmap_unmap_crste()
Claudio Imbrenda [Tue, 2 Jun 2026 14:23:47 +0000 (16:23 +0200)] 
KVM: s390: Fix _gmap_unmap_crste()

In _gmap_unmap_crste(), the crste to be unmapped is zapped by calling
gmap_crstep_xchg_atomic() exactly once, expecting it to succeed.
This is a reasonable sanity check, since kvm->mmu_lock is being held in
write mode, and thus no races should be possible.

An upcoming patch will change the behaviour of gmap_crstep_xchg_atomic()
to return false and clear the vsie_notif bit if the operation triggers
an unshadow operation. With the new behaviour, an unmap operation that
triggers an unshadow would cause the VM to be killed.

Prepare for the change by checking if the vsie_notif bit was set in
the old crste if gmap_crstep_xchg_atomic() fails the first time, and
try a second time. The second time no failures are allowed.
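
The two-attempt logic can be modelled as below. This is a non-atomic toy
under stated assumptions: TOY_VSIE_NOTIF, toy_crste_xchg() and
toy_unmap_crste() are hypothetical, and the toy exchange treats every
entry with the notification bit set as triggering an unshadow, which is a
simplification of the real condition.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_VSIE_NOTIF	(1ull << 0)	/* made-up bit position */

/* Toy exchange: refuses when an unshadow is triggered, clearing the bit. */
bool toy_crste_xchg(uint64_t *crste, uint64_t old, uint64_t new_val)
{
	if (*crste != old)
		return false;
	if (old & TOY_VSIE_NOTIF) {
		/* Clear the bit on the existing entry, install nothing. */
		*crste = old & ~TOY_VSIE_NOTIF;
		return false;
	}
	*crste = new_val;
	return true;
}

/* Returns 0 when the entry was zapped, -1 on an unexpected failure. */
int toy_unmap_crste(uint64_t *crste)
{
	uint64_t old = *crste;

	if (toy_crste_xchg(crste, old, 0))
		return 0;
	if (!(old & TOY_VSIE_NOTIF))
		return -1;	/* failure not explained by the notif bit */

	/* Second attempt: the bit is gone, so no failure is allowed now. */
	old = *crste;
	return toy_crste_xchg(crste, old, 0) ? 0 : -1;
}
```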

Fixes: b827ef02f409 ("KVM: s390: Remove non-atomic dat_crstep_xchg()")
Fixes: a2c17f9270cc ("KVM: s390: New gmap code")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-2-imbrenda@linux.ibm.com>

3 weeks ago tracing/eprobes: Allow use of BTF names to dereference pointers
Steven Rostedt [Mon, 1 Jun 2026 17:07:46 +0000 (13:07 -0400)] 
tracing/eprobes: Allow use of BTF names to dereference pointers

Add syntax to the parsing of eprobes to be able to typecast a trace event
field that is a pointer to a structure.

Currently, a dereference must be a number, where the user has to figure
out manually the offset of a member of a structure that they want to
dereference.

But an event probe that records a field that happens to be a pointer to
a structure cannot dereference that value with BTF naming; it must use
numerical offsets.

For example, to find out what device a sk_buff is pointing to in the
net_dev_xmit trace event, one must first use gdb to find the offsets of the
members of the structures:

 (gdb) p &((struct sk_buff *)0)->dev
 $1 = (struct net_device **) 0x10
 (gdb) p &((struct net_device *)0)->name
 $2 = (char (*)[16]) 0x118

And then use the raw numbers to dereference:

  # echo 'e:xmit net.net_dev_xmit +0x118(+0x10($skbaddr)):string' >> dynamic_events

If BTF is available in the kernel, skbaddr can instead be typecast to
sk_buff and dereferenced with the normal member syntax.

  # echo 'e:xmit net.net_dev_xmit (sk_buff)skbaddr->dev->name:string' >> dynamic_events
  # echo 1 > events/eprobes/xmit/enable
  # cat trace
[..]
    sshd-session-1022    [000] b..2.   860.249343: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.250061: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.250142: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.263553: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.283820: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.302716: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.322905: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.342828: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.362268: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.382335: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.400856: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.419893: xmit: (net.net_dev_xmit) arg1="enp7s0"

The syntax is simply: (STRUCT)(FIELD)->MEMBER[->MEMBER..]

Also add comments around the #else and #endif of #ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
to know what they are for.
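
For readers unfamiliar with the offset form, a small userspace sketch of
what the two spellings compute. The toy structures are hypothetical and
much smaller than the real sk_buff and net_device, so the offsets differ
from the 0x10 and 0x118 shown above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature structures; real kernel layouts differ. */
struct toy_net_device {
	long flags;
	char name[16];
};

struct toy_sk_buff {
	void *next;
	struct toy_net_device *dev;
};

/* Offset form: what '+OFF_NAME(+OFF_DEV(ptr))' does with raw numbers. */
const char *toy_name_by_offset(const void *skb)
{
	const char *base = skb;
	struct toy_net_device *dev;

	/* Load the pointer stored at skb + offsetof(dev). */
	memcpy(&dev, base + offsetof(struct toy_sk_buff, dev), sizeof(dev));
	/* name is an embedded array, so only its offset is added. */
	return (const char *)dev + offsetof(struct toy_net_device, name);
}

/* Typed form: what '(STRUCT)FIELD->dev->name' expresses by member name. */
const char *toy_name_by_type(const void *skb)
{
	return ((const struct toy_sk_buff *)skb)->dev->name;
}
```

Both functions return the same address; the typed form simply lets type
information supply the offsets, which is the job BTF does for the eprobe
parser.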

Link: https://lore.kernel.org/all/20260601130746.2139d926@gandalf.local.home/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
3 weeks ago iio: dac: ad5686: create bus ops struct
Rodrigo Alencar [Sun, 24 May 2026 10:17:08 +0000 (11:17 +0100)] 
iio: dac: ad5686: create bus ops struct

Create struct with bus operations, which will be used to extend bus
implementation features. Auxiliary functions ad5686_write() and
ad5686_read() are created and ad5686_probe() now receives an ops struct
pointer rather than individual read and write functions.
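
A rough sketch of the shape such a change takes, with hypothetical names
throughout (struct toy_bus_ops, toy_dac_probe() and the fake bus are not
the driver's definitions): the core stores one ops pointer and reaches the
bus only through small wrappers.

```c
#include <stdint.h>

struct toy_dac;

/* Hypothetical bus operations table; the field names are illustrative. */
struct toy_bus_ops {
	int (*write)(struct toy_dac *dac, uint8_t cmd, uint8_t addr,
		     uint16_t val);
	int (*read)(struct toy_dac *dac, uint8_t addr, uint16_t *val);
};

struct toy_dac {
	const struct toy_bus_ops *ops;
	uint16_t regs[4];	/* stands in for the device behind the bus */
};

/* Wrappers so core code never touches the function pointers directly. */
int toy_dac_write(struct toy_dac *dac, uint8_t cmd, uint8_t addr,
		  uint16_t val)
{
	return dac->ops->write(dac, cmd, addr, val);
}

int toy_dac_read(struct toy_dac *dac, uint8_t addr, uint16_t *val)
{
	return dac->ops->read(dac, addr, val);
}

/* A fake "bus" backed by the regs[] array. */
static int fake_write(struct toy_dac *dac, uint8_t cmd, uint8_t addr,
		      uint16_t val)
{
	(void)cmd;
	if (addr >= 4)
		return -1;
	dac->regs[addr] = val;
	return 0;
}

static int fake_read(struct toy_dac *dac, uint8_t addr, uint16_t *val)
{
	if (addr >= 4)
		return -1;
	*val = dac->regs[addr];
	return 0;
}

const struct toy_bus_ops toy_fake_bus_ops = {
	.write = fake_write,
	.read = fake_read,
};

/* Probe receives one ops pointer instead of separate callbacks. */
int toy_dac_probe(struct toy_dac *dac, const struct toy_bus_ops *ops)
{
	if (!ops || !ops->write || !ops->read)
		return -1;
	dac->ops = ops;
	return 0;
}
```

Adding a further bus feature later then means adding one member to the
table rather than one more parameter to the probe function.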

Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
3 weeks ago iio: dac: ad5686: cleanup doc header of local structs
Rodrigo Alencar [Sun, 24 May 2026 10:17:07 +0000 (11:17 +0100)] 
iio: dac: ad5686: cleanup doc header of local structs

Review documentation comment header for ad5686_chip_info and ad5686_state.
Update variable names and description and remove unnecessary blank line
between comment and struct declaration.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
3 weeks ago iio: dac: ad5686: add control_sync() for single-channel devices
Rodrigo Alencar [Sun, 24 May 2026 10:17:06 +0000 (11:17 +0100)] 
iio: dac: ad5686: add control_sync() for single-channel devices

Create ad5310_control_sync() and ad5683_control_sync() functions that
properly consume the mask definitions with FIELD_PREP(). This allows
reusing a function that updates the control register with cached values,
without relying on confusing logic that depends on st->use_internal_vref,
which is initialized earlier in ad5686_probe() because it is also
applicable to the AD5686_REGMAP case, removing the need for
has_external_vref. Powerdown mask initialization is simplified because
*_control_sync() masks out any unused bits for the single-channel case.
The change cleans up ad5686_write_dac_powerdown() and ad5686_probe(),
organizing the code for feature extension, e.g. gain control support for
single-channel devices.
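
As an illustration of building a control word from cached values with a
FIELD_PREP()-style macro, here is a userspace sketch. The TOY_* macros
imitate the kernel helpers using the GCC/Clang __builtin_ctz() builtin,
and the register layout (bit positions and field names) is invented, not
taken from any AD5310 or AD5683 datasheet.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel's GENMASK() and FIELD_PREP(). */
#define TOY_GENMASK(h, l)	(((~0u) << (l)) & (~0u >> (31 - (h))))
#define TOY_FIELD_PREP(mask, val) \
	(((uint32_t)(val) << __builtin_ctz(mask)) & (mask))

/* Hypothetical control-register layout; bit positions are made up. */
#define TOY_CTRL_PD_MSK		TOY_GENMASK(14, 13)
#define TOY_CTRL_REF_BIT	TOY_GENMASK(12, 12)
#define TOY_CTRL_GAIN_BIT	TOY_GENMASK(11, 11)

/* Assemble the whole register from cached state in one place. */
uint32_t toy_control_word(unsigned int pd_mode, unsigned int ext_ref,
			  unsigned int gain)
{
	return TOY_FIELD_PREP(TOY_CTRL_PD_MSK, pd_mode) |
	       TOY_FIELD_PREP(TOY_CTRL_REF_BIT, ext_ref) |
	       TOY_FIELD_PREP(TOY_CTRL_GAIN_BIT, gain);
}
```

Because each value is masked after shifting, out-of-range bits in a
cached value cannot spill into a neighbouring field.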

Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
3 weeks ago iio: dac: ad5686: add helpers to handle powerdown masks
Rodrigo Alencar [Sun, 24 May 2026 10:17:05 +0000 (11:17 +0100)] 
iio: dac: ad5686: add helpers to handle powerdown masks

Add ad5686_pd_field_set() and ad5686_pd_field_get() helpers to clean up
powerdown mask control. Define AD5686_PD_* constants, e.g. AD5686_PD_MSK
to hold the powerdown mask value for a single channel. AD5686_LDAC_PWRDN_*
macros are replaced by AD5686_PD_MODE_*, because they are unused and the
LDAC feature for async load of DAC channel values is not related to power
down control.
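
A sketch of what per-channel powerdown field helpers might look like,
assuming two powerdown bits per channel packed into one word. The
signatures and the TOY_PD_* values are assumptions for illustration; the
driver's actual helpers may differ.

```c
#include <stdint.h>

#define TOY_PD_BITS	2u	/* assumed number of bits per channel */
#define TOY_PD_MSK	0x3u	/* mask for a single channel's field */

/* Return 'reg' with the powerdown mode of channel 'chan' replaced. */
uint32_t toy_pd_field_set(uint32_t reg, unsigned int chan, unsigned int mode)
{
	unsigned int shift = chan * TOY_PD_BITS;

	reg &= ~(TOY_PD_MSK << shift);
	reg |= (mode & TOY_PD_MSK) << shift;
	return reg;
}

/* Extract the powerdown mode of channel 'chan' from 'reg'. */
unsigned int toy_pd_field_get(uint32_t reg, unsigned int chan)
{
	return (reg >> (chan * TOY_PD_BITS)) & TOY_PD_MSK;
}
```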

Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
3 weeks ago iio: dac: ad5686: add of_match table to the spi driver
Rodrigo Alencar [Sun, 24 May 2026 10:17:04 +0000 (11:17 +0100)] 
iio: dac: ad5686: add of_match table to the spi driver

Add of_match table for the SPI device variants to be consistent with the
AD5696 I2C driver.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
3 weeks ago iio: dac: ad5686: drop enum id
Rodrigo Alencar [Sun, 24 May 2026 10:17:03 +0000 (11:17 +0100)] 
iio: dac: ad5686: drop enum id

Split the chip info table into separate structs and expose them to the SPI
and I2C drivers. That is the preferable approach and allows the drivers
to have knowledge of the device info before the common probe function gets
called. Those chip info structs may be shared by SPI and I2C driver
variants.
Channel declaration definitions are grouped according to channel count and
DECLARE_AD5693_CHANNELS() macro is renamed to DECLARE_AD5683_CHANNELS() to
match the regmap_type enum.
Use spi_get_device_match_data() and i2c_get_match_data() to get chip info
struct reference, passing it as parameter to the core probe function.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
3 weeks ago iio: dac: ad5686: remove redundant register definition
Rodrigo Alencar [Sun, 24 May 2026 10:17:02 +0000 (11:17 +0100)] 
iio: dac: ad5686: remove redundant register definition

AD5683_REGMAP and AD5693_REGMAP behave the same way in the common code,
and that is because they target single channel devices from the same
sub-family. There is no reason to separate them and it will make things
simpler when refactoring the chip info table.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
3 weeks ago iio: dac: ad5686: refactor include headers
Rodrigo Alencar [Sun, 24 May 2026 10:17:01 +0000 (11:17 +0100)] 
iio: dac: ad5686: refactor include headers

Apply the IWYU principle, replacing unused/generic headers with
specific/missing headers. The resulting include directive lists are sorted
accordingly.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
3 weeks ago Merge tag 'v7.1-rc6' into work
Jonathan Cameron [Tue, 2 Jun 2026 14:24:19 +0000 (15:24 +0100)] 
Merge tag 'v7.1-rc6' into work

Linux 7.1-rc6

3 weeks ago iio: adc: ad4080: fix AD4880 chip ID
Antoniu Miclaus [Tue, 2 Jun 2026 08:40:19 +0000 (11:40 +0300)] 
iio: adc: ad4080: fix AD4880 chip ID

The AD4880 chip ID was incorrectly set to 0x0750. According to the
datasheet, the product ID registers read 0x00 (PRODUCT_ID_H) and 0x59
(PRODUCT_ID_L), giving a combined chip ID of 0x0059. Fix the value to
match the actual hardware.
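
The arithmetic behind the combined value, as a tiny sketch.
toy_product_id() is a hypothetical helper, not a function in the driver.

```c
#include <stdint.h>

/* Combine the high and low product ID bytes into one 16-bit chip ID. */
uint16_t toy_product_id(uint8_t id_h, uint8_t id_l)
{
	return (uint16_t)((id_h << 8) | id_l);
}
```

With PRODUCT_ID_H = 0x00 and PRODUCT_ID_L = 0x59 this yields 0x0059, so a
stored ID of 0x0750 could never match what the registers return.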

Signed-off-by: Antoniu Miclaus <antoniu.miclaus@analog.com>
Reviewed-by: Joshua Crofts <joshua.crofts1@gmail.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
3 weeks ago printk: fix typos in comments
Naveen Kumar Chaudhary [Mon, 1 Jun 2026 03:56:26 +0000 (09:26 +0530)] 
printk: fix typos in comments

Fix spelling/grammatical errors in printk.c and nbcon.c:
- "precation" -> "precautionary"
- "othrewise" -> "otherwise"
- "An usable" -> "A usable"
- "made a progress" -> "made progress"
- "preemtible" -> "preemptible"
- "mechasism" -> "mechanism"
- "ownerhip" -> "ownership"

Signed-off-by: Naveen Kumar Chaudhary <naveen.osdev@gmail.com>
Link: https://patch.msgid.link/pakfewagyzb7da3yuxnaxdaoma5w4j2c7i3xebmcld3xy4mqs5@zxsx2idpxrdq
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
3 weeks ago gpu: nova-core: Hopper/Blackwell: add FMC signature extraction
John Hubbard [Tue, 2 Jun 2026 03:21:01 +0000 (20:21 -0700)] 
gpu: nova-core: Hopper/Blackwell: add FMC signature extraction

Extract the SHA-384 hash, RSA public key, and RSA signature from the
FMC ELF32 firmware sections. FSP Chain of Trust verification needs
these to validate the FMC image during boot.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260602032111.224790-14-jhubbard@nvidia.com
[acourbot: derive `Zeroable` on `FmcSignature` for in-place initialization]
Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
3 weeks ago gpu: nova-core: Hopper/Blackwell: add FSP secure boot completion waiting
John Hubbard [Tue, 2 Jun 2026 03:21:00 +0000 (20:21 -0700)] 
gpu: nova-core: Hopper/Blackwell: add FSP secure boot completion waiting

Hopper and Blackwell use FSP instead of SEC2 for secure boot. The
driver must wait for FSP secure boot to complete before continuing
with GSP bring-up. Poll for boot success with a 5-second timeout, and
return the FSP interface only on success so that later Chain of Trust
operations cannot run before FSP is ready. The interface owns the FSP
falcon and the FMC firmware.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260602032111.224790-13-jhubbard@nvidia.com
[acourbot: use `inspect_err` instead of `map_err` and display actual error]
[acourbot: limit visibility of `fsp_hal` to `super`]
Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
3 weeks ago gpu: nova-core: Hopper/Blackwell: add FMC firmware image
John Hubbard [Tue, 2 Jun 2026 03:20:59 +0000 (20:20 -0700)] 
gpu: nova-core: Hopper/Blackwell: add FMC firmware image

FSP is the Falcon that runs FMC firmware on Hopper and Blackwell.
Load the FMC ELF in two forms: the image section that FSP boots from,
and the full Firmware object for later signature extraction during
Chain of Trust verification. Declare the FMC image in the module's
firmware table so it is bundled for FSP-based chipsets.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260602032111.224790-12-jhubbard@nvidia.com
Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
3 weeks ago gpu: nova-core: Hopper/Blackwell: add FSP falcon engine stub
John Hubbard [Tue, 2 Jun 2026 03:20:58 +0000 (20:20 -0700)] 
gpu: nova-core: Hopper/Blackwell: add FSP falcon engine stub

Add the FSP (Foundation Security Processor) falcon engine type that
will handle secure boot and Chain of Trust operations on Hopper and
Blackwell architectures.

The FSP falcon replaces SEC2's role in the boot sequence for these newer
architectures. This initial stub just defines the falcon type and its
base address.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260602032111.224790-11-jhubbard@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
3 weeks ago gpu: nova-core: add auto-detection of 32-bit, 64-bit firmware images
John Hubbard [Tue, 2 Jun 2026 03:20:57 +0000 (20:20 -0700)] 
gpu: nova-core: add auto-detection of 32-bit, 64-bit firmware images

A firmware image may be either a 32-bit or a 64-bit ELF, and callers
should not have to know which. Detect the ELF class from the image
header at parse time and dispatch to the matching parser, so a single
entry point handles both layouts.
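
The detection step can be sketched in C as follows (nova-core itself is
written in Rust; this is only an illustration and toy_elf_bits() is a
hypothetical name). The class byte position and values come from the ELF
specification's e_ident[] layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Values from the ELF specification's e_ident[] layout. */
#define TOY_EI_CLASS	4
#define TOY_ELFCLASS32	1
#define TOY_ELFCLASS64	2

/* Returns 32 or 64 for a recognized ELF image, or -1 otherwise. */
int toy_elf_bits(const uint8_t *img, size_t len)
{
	static const uint8_t magic[4] = { 0x7f, 'E', 'L', 'F' };
	size_t i;

	if (len <= TOY_EI_CLASS)
		return -1;
	for (i = 0; i < sizeof(magic); i++)
		if (img[i] != magic[i])
			return -1;

	switch (img[TOY_EI_CLASS]) {
	case TOY_ELFCLASS32:
		return 32;	/* a caller would pick the ELF32 parser */
	case TOY_ELFCLASS64:
		return 64;	/* a caller would pick the ELF64 parser */
	default:
		return -1;
	}
}
```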

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260602032111.224790-10-jhubbard@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
3 weeks ago gpu: nova-core: add support for 32-bit firmware images
John Hubbard [Tue, 2 Jun 2026 03:20:56 +0000 (20:20 -0700)] 
gpu: nova-core: add support for 32-bit firmware images

Some GPU firmware images are packaged as 32-bit ELF rather than 64-bit.
Add a 32-bit implementation of the shared ELF section-parsing
abstraction so those images can be parsed alongside the existing 64-bit
path.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260602032111.224790-9-jhubbard@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
3 weeks ago gpu: nova-core: don't assume 64-bit firmware images
John Hubbard [Tue, 2 Jun 2026 03:20:55 +0000 (20:20 -0700)] 
gpu: nova-core: don't assume 64-bit firmware images

Introduce a single ELF format abstraction that ties each ELF header
type to its matching section-header type. This keeps the shared
section parser ready for upcoming ELF32 support and avoids mixing
32-bit and 64-bit ELF layouts by mistake.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260602032111.224790-8-jhubbard@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
3 weeks ago gpu: nova-core: Blackwell: use correct sysmem flush registers
John Hubbard [Tue, 2 Jun 2026 03:20:54 +0000 (20:20 -0700)] 
gpu: nova-core: Blackwell: use correct sysmem flush registers

Blackwell GPUs moved the sysmem flush page registers away from the
Ampere/Ada location. GB10x routes the flush through a pair of HSHUB0
register sets (primary and egress) that must both be programmed to
the same address. GB20x routes it through FBHUB0.

Define these registers relative to their HSHUB0 and FBHUB0 bases, as
Open RM does, and implement the flush paths in the GB10x and GB20x
framebuffer HALs.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260602032111.224790-7-jhubbard@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
3 weeks ago gpu: nova-core: Hopper/Blackwell: larger WPR2 (GSP) heap
John Hubbard [Tue, 2 Jun 2026 03:20:53 +0000 (20:20 -0700)] 
gpu: nova-core: Hopper/Blackwell: larger WPR2 (GSP) heap

The GSP-RM boot working memory portion of the WPR2 heap must be
larger on Hopper and later GPUs than on Turing, Ampere, and Ada.
Select the larger value for those generations.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260602032111.224790-6-jhubbard@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
3 weeks ago gpu: nova-core: Hopper/Blackwell: larger non-WPR heap
John Hubbard [Tue, 2 Jun 2026 03:20:52 +0000 (20:20 -0700)] 
gpu: nova-core: Hopper/Blackwell: larger non-WPR heap

Hopper and Blackwell need a larger non-WPR heap than the 1 MiB that
earlier architectures use. Hopper and Blackwell GB10x need 2 MiB, while
Blackwell GB20x needs 2 MiB + 128 KiB. These sizes diverge by family,
so give Hopper and each Blackwell family its own framebuffer HAL and
select the non-WPR heap size per chipset family.
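
The per-family selection, written out as a C sketch for illustration only
(the driver is Rust and uses per-family HALs; enum toy_family and
toy_non_wpr_heap_size() are hypothetical). The sizes are the ones quoted
above.

```c
#include <stdint.h>

enum toy_family {
	TOY_PRE_HOPPER,
	TOY_HOPPER,
	TOY_BLACKWELL_GB10X,
	TOY_BLACKWELL_GB20X,
};

#define TOY_KIB(n)	((uint64_t)(n) << 10)
#define TOY_MIB(n)	((uint64_t)(n) << 20)

/* Non-WPR heap size in bytes for a given chipset family. */
uint64_t toy_non_wpr_heap_size(enum toy_family family)
{
	switch (family) {
	case TOY_HOPPER:
	case TOY_BLACKWELL_GB10X:
		return TOY_MIB(2);
	case TOY_BLACKWELL_GB20X:
		return TOY_MIB(2) + TOY_KIB(128);
	case TOY_PRE_HOPPER:
	default:
		return TOY_MIB(1);
	}
}
```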

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260602032111.224790-5-jhubbard@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
3 weeks ago gpu: nova-core: Blackwell: compute PMU-reserved framebuffer size
John Hubbard [Tue, 2 Jun 2026 03:20:51 +0000 (20:20 -0700)] 
gpu: nova-core: Blackwell: compute PMU-reserved framebuffer size

GSP boot needs to know how much framebuffer memory is reserved for
the PMU. Compute it per architecture: Blackwell dGPUs reserve a
non-zero amount, earlier architectures leave it at zero, matching
Open RM behavior.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260602032111.224790-4-jhubbard@nvidia.com
Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
3 weeks ago gpu: nova-core: Hopper/Blackwell: new location for PCI config mirror
John Hubbard [Tue, 2 Jun 2026 03:20:50 +0000 (20:20 -0700)] 
gpu: nova-core: Hopper/Blackwell: new location for PCI config mirror

Hopper and Blackwell GPUs moved the PCI config space mirror from
0x088000 to 0x092000. Select the correct address per architecture
when building the GSP system info command.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260602032111.224790-3-jhubbard@nvidia.com
Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
3 weeks ago gpu: nova-core: set DMA mask width based on GPU architecture
John Hubbard [Tue, 2 Jun 2026 03:20:49 +0000 (20:20 -0700)] 
gpu: nova-core: set DMA mask width based on GPU architecture

Replace the hardcoded 47-bit DMA mask with a GPU HAL method that
provides the correct value for the architecture.

Set the DMA mask in Gpu::new(). Gpu owns all DMA allocations for
the device, so no concurrent allocations can exist while the
constructor is still running.
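
For reference, a mask width in bits maps to a mask value as sketched
below. toy_dma_bit_mask() is a userspace stand-in modelled on the C
kernel's DMA_BIT_MASK() macro; the per-architecture widths returned by the
new HAL method are not listed in this message, so only the previous
47-bit value is used as an example.

```c
#include <stdint.h>

/* Mask with the low 'bits' bits set; 64 or more yields all ones. */
uint64_t toy_dma_bit_mask(unsigned int bits)
{
	return bits >= 64 ? ~0ULL : (1ULL << bits) - 1;
}
```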

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260602032111.224790-2-jhubbard@nvidia.com
Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
3 weeks ago block, bfq: release cgroup stats with bfq_group
Yu Kuai [Mon, 1 Jun 2026 06:15:02 +0000 (14:15 +0800)] 
block, bfq: release cgroup stats with bfq_group

BFQ cgroup stats contain percpu counters embedded in struct bfq_group,
but the old free path destroys them from bfq_pd_free(), which is tied
to blkg policy-data teardown.

That is not the same lifetime as struct bfq_group. BFQ pins bfq_group
while bfq_queue entities refer to it, so bfq_pd_free() can drop the
policy-data reference while other bfq_group references still exist. The
following blkcg change also defers policy-data release through RCU and
leaves BFQ to run the final bfqg_put() from an RCU callback. For that
conversion, stats teardown must belong to the last bfq_group put, not to
policy-data teardown.

Move stats teardown to bfqg_put() so the embedded counters are destroyed
exactly when the last bfq_group reference is released, before kfree(bfqg).
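
The lifetime rule can be modelled in userspace as follows. This is a single-threaded Rust sketch, not the BFQ code (which is C): `Group`, `pd_free()`, and the `stats_destroyed` flag are hypothetical stand-ins for struct bfq_group, the policy-data free path, and the percpu counter teardown. `Rc` stands in for the bfq_group reference count, and `Drop` plays the role of the final bfqg_put().

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for struct bfq_group with embedded stats.
pub struct Group {
    /// Set to true when the embedded stats are torn down.
    stats_destroyed: Rc<Cell<bool>>,
}

impl Group {
    /// Creates a group and returns the first reference to it.
    pub fn new(stats_destroyed: Rc<Cell<bool>>) -> Rc<Group> {
        Rc::new(Group { stats_destroyed })
    }
}

impl Drop for Group {
    /// Runs only when the last reference goes away, just before the
    /// memory is freed: this is where stats teardown belongs.
    fn drop(&mut self) {
        self.stats_destroyed.set(true);
    }
}

/// Models policy-data teardown: it only drops its own reference and
/// does not touch the stats, since other references may still exist.
pub fn pd_free(policy_ref: Rc<Group>) {
    drop(policy_ref);
}
```

If a second reference (for example one held on behalf of a queue) outlives `pd_free()`, the flag stays false until that reference is dropped too, so the stats are never destroyed while the group is still reachable.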

Without this preparatory change, the RCU-delayed policy-data free
conversion reproduced the following KASAN report:

  BUG: KASAN: slab-use-after-free in percpu_counter_destroy_many+0xf1/0x2e0
  Write of size 8 at addr ffff88811d9409e0 by task test_blkcg/535

  CPU: 0 UID: 0 PID: 535 Comm: test_blkcg Not tainted 7.1.0-rc2-g1e14adca0199 #1 PREEMPT  ea13f83d4b74a12510d20db4a7d9a0fe8275f05c
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x54/0x70
   print_address_description+0x77/0x200
   ? percpu_counter_destroy_many+0xf1/0x2e0
   print_report+0x64/0x70
   kasan_report+0x118/0x150
   ? percpu_counter_destroy_many+0xf1/0x2e0
   percpu_counter_destroy_many+0xf1/0x2e0
   __mmdrop+0x1d8/0x350
   finish_task_switch+0x3f5/0x570
   __schedule+0xe8e/0x18a0
   schedule+0xfe/0x1c0
   schedule_timeout+0x7f/0x1d0
   __wait_for_common+0x26c/0x3f0
   wait_for_completion_state+0x21/0x40
   call_usermodehelper_exec+0x271/0x2c0
   __request_module+0x296/0x410
   elv_iosched_store+0x1bc/0x2c0
   queue_attr_store+0x152/0x1c0
   kernfs_fop_write_iter+0x1d7/0x280
   vfs_write+0x580/0x630
   ksys_write+0xec/0x190
   do_syscall_64+0x156/0x490
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

  Allocated by task 535:
   kasan_save_track+0x3e/0x80
   __kasan_kmalloc+0x72/0x90
   bfq_pd_alloc+0x60/0x100 [bfq]
   blkg_create+0x3bb/0xbe0
   blkg_lookup_create+0x3a2/0x460
   blkg_conf_start+0x24a/0x2d0
   bfq_io_set_weight+0x17f/0x430 [bfq]
   cgroup_file_write+0x1c5/0x4b0
   kernfs_fop_write_iter+0x1d7/0x280
   vfs_write+0x580/0x630
   ksys_write+0xec/0x190
   do_syscall_64+0x156/0x490
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

  Freed by task 0:
   kasan_save_track+0x3e/0x80
   kasan_save_free_info+0x46/0x50
   __kasan_slab_free+0x3a/0x60
   kfree+0x14e/0x4f0
   rcu_core+0x6f3/0xcd0
   handle_softirqs+0x1a0/0x550
   __irq_exit_rcu+0x8c/0x150
   irq_exit_rcu+0xe/0x20
   sysvec_apic_timer_interrupt+0x6e/0x80
   asm_sysvec_apic_timer_interrupt+0x1a/0x20

  Last potentially related work creation:
   kasan_save_stack+0x3e/0x60
   kasan_record_aux_stack+0x99/0xb0
   call_rcu+0x55/0x5c0
   blkg_free_workfn+0x130/0x220
   process_scheduled_works+0x655/0xb60
   worker_thread+0x446/0x600
   kthread+0x1f4/0x230
   ret_from_fork+0x259/0x420
   ret_from_fork_asm+0x1a/0x30

Signed-off-by: Yu Kuai <yukuai@fygo.io>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260601061502.899552-1-yukuai@fygo.io
Signed-off-by: Jens Axboe <axboe@kernel.dk>