git.ipfire.org Git - thirdparty/linux.git/log

mm/khugepaged: generalize hugepage_vma_revalidate for mTHP support

Patch series "khugepaged: add mTHP collapse support", v19.

The following series provides khugepaged with the capability to collapse
anonymous memory regions to mTHPs.

To achieve this we generalize the khugepaged functions to no longer depend
on PMD_ORDER.  Then during the PMD scan, we use a bitmap to track
individual pages that are occupied (!none/zero).  After the PMD scan is
done, we use the bitmap to find the optimal mTHP sizes for the PMD range.
The restriction on max_ptes_none is removed during the scan, to make sure
we account for the whole PMD range in the bitmap.  When no mTHP size is
enabled, the legacy behavior of khugepaged is maintained.

We currently only support max_ptes_none values of 0 or HPAGE_PMD_NR - 1
(ie 511).  If any other value is specified, the kernel will emit a warning
and mTHP collapse will default to max_ptes_none=0.  If a mTHP collapse is
attempted, but contains swapped out, or shared pages, we don't perform the
collapse.

It is now also possible to collapse to mTHPs without requiring the PMD THP
size to be enabled.  These limitations are to prevent collapse "creep"
behavior.  This prevents constantly promoting mTHPs to the next available
size, which would occur because a collapse introduces more non-zero pages
that would satisfy the promotion condition on subsequent scans.

Patch 1-2:   Generalize hugepage_vma_revalidate and alloc_charge_folio
             for arbitrary orders.
Patch 3:     Rework max_ptes_* handling into helper functions
Patch 4:     Generalize __collapse_huge_page_* for mTHP support
Patch 5:     Require collapse_huge_page to enter/exit with the lock dropped
Patch 6:     Generalize collapse_huge_page for mTHP collapse
Patch 7:     Skip collapsing mTHP to smaller orders
Patch 8-9:   Add per-order mTHP statistics and tracepoints
Patch 10:    Introduce collapse_possible_orders helper functions
Patch 11-13: Introduce bitmap and mTHP collapse support, fully enabled
Patch 14:    Documentation

Testing:
- Built for x86_64, aarch64, ppc64le, and s390x
- ran all arches on test suites provided by the kernel-tests project
- internal testing suites: functional testing and performance testing
- selftests mm
- I created a test script that I used to push khugepaged to its limits
   while monitoring a number of stats and tracepoints. The code is
   available here[1] (Run in legacy mode for these changes and set mthp
   sizes to inherit)
   The summary from my testings was that there was no significant
   regression noticed through this test. In some cases my changes had
   better collapse latencies, and was able to scan more pages in the same
   amount of time/work, but for the most part the results were consistent.
- redis testing. I did some testing with these changes along with my defer
  changes (see followup [2] post for more details). We've decided to get
  the mTHP changes merged first before attempting the defer series.
- some basic testing on 64k page size.
- lots of general use.

This patch (of 14):

For khugepaged to support different mTHP orders, we must generalize this
to check if the PMD is not shared by another VMA and that the order is
enabled.

We cannot collapse VMA regions that do not span the full PMD.  This is due
to the potential of the PMD being shared by another VMA which leaves us
vulnerable to race conditions if neighboring VMAs are resized.  Always
check the PMD order here to ensure its not shared by another VMA.  We'd
need to lock all VMAs in the PMD range to support this which may lead to
increased lock contention and code complexity.

No functional change in this patch.  Also correct a comment about the
functionality of the revalidation and fix a double space issues.

Link: https://lore.kernel.org/20260605161422.213817-1-npache@redhat.com
Link: https://lore.kernel.org/20260605161422.213817-2-npache@redhat.com
Link: https://gitlab.com/npache/khugepaged_mthp_test
Link: https://lore.kernel.org/lkml/20250515033857.132535-1-npache@redhat.com/
Co-developed-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Nico Pache <npache@redhat.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lorenzo Stoakes <ljs@kernel.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Usama Arif <usama.arif@linux.dev>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rafael Aquini <raquini@redhat.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shivank Garg <shivankg@amd.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Takashi Iwai (SUSE) <tiwai@suse.de>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/page_alloc: only update NUMA min ratios on sysctl write

The sysctl handlers for min_unmapped_ratio and min_slab_ratio invoke
setup_min_unmapped_ratio() and setup_min_slab_ratio() unconditionally
after proc_dointvec_minmax(), even for read operations.

These setup functions first zero all per-NUMA node thresholds
(min_unmapped_pages and min_slab_pages) before recalculating them.
Reading /proc sysctl entries therefore temporarily resets node reclaim
thresholds to zero, which may disturb the behavior of __node_reclaim() and
node_reclaim() during the recomputation.

Fix this by only calling the setup functions when the sysctl is actually
written (write == 1), matching the behavior of existing sysctl handlers
like min_free_kbytes and watermark_scale_factor.

This only affects systems with CONFIG_NUMA.

Link: https://lore.kernel.org/tencent_5891052AF9A4C2D490A62F478D446F74AB09@qq.com
Signed-off-by: Jianlin Shi <shijianlin11@foxmail.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

zsmalloc: simplify data output in zs_stats_size_show()

Move the specification for a line break from a seq_puts() call to a
seq_printf() call.

The source code was transformed by using the Coccinelle software.

Link: https://lore.kernel.org/126a924b-6f68-43bf-ae5a-449fb93e527b@web.de
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib: split codetag_lock_module_list()

Letting a function argument indicate whether a lock or unlock operation
should be performed is incompatible with compile-time analysis of locking
operations by sparse and Clang.  Hence, split codetag_lock_module_list()
into two functions: a function that locks cttype->mod_lock and another
function that unlocks cttype->mod_lock.  No functionality has been
changed.  See also commit 916cc5167cc6 ("lib: code tagging framework").

Link: https://lore.kernel.org/20260324214226.3684605-1-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

alloc_tag: fix use-after-free in /proc/allocinfo after module unload

allocinfo_start() only reinitializes the codetag iterator at position 0.
For subsequent reads (position > 0), it reuses cached iterator state from
the previous batch.  allocinfo_stop() drops mod_lock between read batches,
which allows module unload to complete and free the module memory that the
cached iterator still references:

  CPU0 (read)                        CPU1 (rmmod)
  ----                               ----
  allocinfo_start(pos=0)
    down_read(mod_lock)
    allocinfo_show()
    ...
  allocinfo_stop()
    up_read(mod_lock)
                                     codetag_unload_module()
                                       kfree(cmod)
                                       release_module_tags()
                                     ...
                                     free_mod_mem()
  allocinfo_start(pos=N)
    down_read(mod_lock)
    // reuses cached iter, skips re-init
  allocinfo_show()
    ct->filename   <-- UAF

After free_mod_mem() frees the module's .rodata, allocinfo_show()
dereferences ct->filename, ct->function which point there.

Save the iterator state in allocinfo_next() and resume from it in
allocinfo_start() with codetag_next_ct(), which detects module removal via
idr_find() returning NULL and skips to the next module.

Link: https://lore.kernel.org/20260604065938.105991-1-hao.ge@linux.dev
Fixes: 9f44df50fee4 ("alloc_tag: keep codetag iterator active between read()")
Signed-off-by: Hao Ge <hao.ge@linux.dev>
Suggested-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/alloc_tag: replace fixed-size early PFN array with dynamic linked list

Pages allocated before page_ext is available have their codetag left
uninitialized.  Track these early PFNs and clear their codetag in
clear_early_alloc_pfn_tag_refs() to avoid "alloc_tag was not set" warnings
when they are freed later.

Currently a fixed-size array of 8192 entries is used, with a warning if
the limit is exceeded.  However, the number of early allocations depends
on the number of CPUs and can be larger than 8192.

Replace the fixed-size array with a dynamically allocated linked list of
pfn_pool structs.  Each node is allocated via alloc_page() and mapped to a
pfn_pool containing a next pointer, an atomic slot counter, and a PFN
array that fills the remainder of the page.

The tracking pages themselves are allocated via alloc_page(), which would
trigger __pgalloc_tag_add() -> alloc_tag_add_early_pfn() and recurse
indefinitely.  Introduce __GFP_NO_CODETAG (reuses the %__GFP_NO_OBJ_EXT
bit) and pass gfp_flags through pgalloc_tag_add() so that the early path
can skip recording allocations that carry this flag.

Link: https://lore.kernel.org/20260604024008.46592-1-hao.ge@linux.dev
Signed-off-by: Hao Ge <hao.ge@linux.dev>
Suggested-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
"Only ufs driver updates this time, apart from which this is just an
  assortment of bug fixes and AI assisted changes.

  The biggest other change is the reversion of the sas_user_scan patch
  which supported a mpi3mr NVME behaviour but caused major issues for
  other sas controllers. The next biggest is the removal of target reset
  in tcm_loop.c"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (56 commits)
  scsi: target: Remove tcm_loop target reset handling
  scsi: lpfc: Fix spelling mistakes in comments
  scsi: ufs: ufs-pci: Add AMD device ID support
  scsi: ufs: core: Handle PM commands timeout before SCSI EH
  scsi: devinfo: Broaden Promise VTrak E310/E610 identification
  scsi: target: Use constant-time crypto_memneq() for CHAP digests
  scsi: target: Fix hexadecimal CHAP_I handling
  scsi: scsi_debug: Fix one-partition tape setup bounds
  scsi: ufs: qcom: dt-bindings: Document the Hawi UFS controller
  scsi: mailmap: Update Avri Altman's email address
  scsi: ufs: Remove redundant vops NULL check and trivial wrapper
  scsi: ufs: Remove unnecessary return in void vops wrappers
  scsi: ufs: Fix wrong value printed in unexpected UPIU response case
  scsi: ufs: core: Fix NULL pointer dereference in scsi_cmd_priv() calls
  scsi: megaraid_mbox: Avoid double kfree()
  scsi: pm8001: Fix error code in non_fatal_log_show()
  scsi: lpfc: Turn lpfc_queue q_pgs into a flexible array
  scsi: ufs: core: Skip link param validation when lanes_per_direction is unset
  scsi: sas: Skip opt_sectors when DMA reports no real optimization hint
  scsi: Revert "scsi: Fix sas_user_scan() to handle wildcard and multi-channel scans"
  ...

Merge tag '9p-for-7.2-rc1' of https://github.com/martinetd/linux

Pull 9p updates from Dominique Martinet:
"Asides of the avalanche of LLM-driven fixes, there are a couple of big
  changes this cycle:

   - negative dentry and symlink cache

   - a way out of the unkillable "io_wait_event_killable" (because it
     looped around waiting for the request flush to come back from
     server; this has been bugging syzcaller folks since forever): I'm
     still not 100% sure about this patch, but I think it's as good as
     we'll ever get, and will keep testing a bit further in the coming
     weeks

  The rest is more noisy than usual, but shouldn't cause any trouble"

* tag '9p-for-7.2-rc1' of https://github.com/martinetd/linux:
  9p: Add missing read barrier in virtio zero-copy path
  net/9p: Replace strlen() strcpy() pair with strscpy()
  9p: skip nlink update in cacheless mode to fix WARN_ON
  net/9p: fix race condition on rdma->state in trans_rdma.c
  9p: v9fs_file_do_lock: replace WARN_ONCE with p9_debug
  9p: Enable symlink caching in page cache
  9p: Set default negative dentry retention time for cache=loose
  9p: Add mount option for negative dentry cache retention
  9p: Cache negative dentries for lookup performance
  9p: avoid returning ERR_PTR(0) from mkdir operations
  9p: avoid putting oldfid in p9_client_walk() error path
  net/9p: fix infinite loop in p9_client_rpc on fatal signal
  docs/filesystems/9p: fix broken external links
  9p: invalidate readdir buffer on seek
  9p: use kvzalloc for readdir buffer
  net/9p/usbg: Constify struct configfs_item_operations

Merge tag 'firewire-updates-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire updates from Takashi Sakamoto:

- firewire drivers have been able to assign an arbitrary value in the
   mod_device entry, which is typed as kernel_ulong_t.

   While storing the pointer value is legitimate, conversion back to a
   pointer has been performed without preserving the const qualifier.

   Uwe Kleine-König introduced an union to provide safer and more robust
   conversions, as part of the ongoing CHERI enhancement work for ARM
   and RISC-V architectures. This includes changes to the sound
   subsystem, since the conversion pattern is widely used in ALSA
   firewire stack.

- Userspace applications can request the core function to perform
   isochronous resource management procedures. Dingsoul reported a
   reference-count leak when these procedures are processed in workqueue
   contexts.

   This refactors the relevant code paths following a divide and conquer
   approach. Consequently, it became clear that the issue still remain
   in the path when userspace applications delegate automatic resource
   reallocation after bus resets to the core.

   In practice, the leak is rarely triggered, and a complete fix is
   still in progress.

* tag 'firewire-updates-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: core: Open-code topology list walk
  firewire: core: cancel using delayed work for iso_resource_once management
  firewire: core: rename member name for channel mask of isoc resource
  firewire: core: minor code refactoring for case-dependent parameters of iso resources management
  ALSA: firewire: Make use of ieee1394's .driver_data_ptr
  firewire: Simplify storing pointers in device id struct
  firewire: core: move allocation/reallocation paths into specific branch after isoc resource management in cdev
  firewire: core: refactor notification type determination after isoc resource management in cdev
  firewire: core: use switch statement for post-processing of isoc resource management in cdev
  firewire: core: reduce critical section duration in pre-processing of isoc resource management in cdev
  firewire: core: code cleanup for iso resource auto creation
  firewire: core: append _auto suffix for non-once iso resource operations
  firewire: core: code cleanup to remove old implementations for once operation
  firewire: core: split functions for iso_resource once operation
  firewire: core: code refactoring for helper function to fill iso_resource parameters
  firewire: core: code refactoring to queue work item for iso_resource
  firewire: core: code refactoring for early return at client resource allocation

Merge tag 'liveupdate-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux

Pull liveupdate updates from Mike Rapoport:
"Kexec Handover (KHO):

   - make memory preservation compatible with deferred initialization
     of the memory map

  Live Update Orchestrator (LUO):

   - add LIVEUPDATE_SESSION_GET_NAME ioctl and parameter verification
     for LIVEUPDATE_IOCTL_CREATE_SESSION ioctl

   - documentation updates for liveupdate=on command line option,
     systemd support and the current compatibility status

   - remove the fixed limits on the number of files that can be
     preserved within a single session, and the total number of
     sessions managed by the LUO

  Misc fixes:

   - reference count incoming File-Lifecycle-Bound (FLB) data so
     it cannot be freed while a subsystem is still using it

   - fixes for a TOCTOU race in luo_session_retrieve(), a use-
     after-free in the file finish and unpreserve paths, concurrent
     session mutations during reboot and serialization on
     preserve_context kexec

   - make sure ioctls for incoming LUO sessions are blocked for
     outgoing sessions and vice versa

   - make sure KHO scratch size is always aligned by
     CMA_MIN_ALIGNMENT_BYTES

   - fix memblock tests build issue introduced by KHO changes"

* tag 'liveupdate-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux: (36 commits)
  liveupdate: Document that retrieve failure is permanent
  docs: memfd_preservation: fix rendering of ABI documentation
  selftests/liveupdate: Add stress-files kexec test
  selftests/liveupdate: Add stress-sessions kexec test
  selftests/liveupdate: Test session and file limit removal
  liveupdate: Remove limit on the number of files per session
  liveupdate: Remove limit on the number of sessions
  liveupdate: defer session block allocation and physical address setting
  kho: add support for linked-block serialization
  liveupdate: Extract luo_session_deserialize_one helper
  liveupdate: Extract luo_file_deserialize_one helper
  liveupdate: register luo_ser as KHO subtree
  liveupdate: centralize state management into struct luo_ser
  liveupdate: avoid mixing cleanup guards with goto in luo_session_retrieve_fd
  liveupdate: change file_set->count type to u64 for type safety
  liveupdate: Remove unused ser field from struct luo_session
  liveupdate: fix u-a-f in luo_file_unpreserve_files() and luo_file_finish()
  liveupdate: block session mutations during reboot
  liveupdate: fix TOCTOU race in luo_session_retrieve()
  liveupdate: skip serialization for context-preserving kexec
  ...

Merge tag 'for-linus' of https://github.com/openrisc/linux

Pull OpenRISC updates from Stafford Horne:
"A few fixes for text patching related code:

   - Update the section of map_page used in text patching. It was
     left with __init when text patching was introduced to OpenRISC

   - Add fix to invalidate remote SMP core i-caches after text is
     patched"

* tag 'for-linus' of https://github.com/openrisc/linux:
  openrisc: Fix jump_label smp syncing
  openrisc: Add full instruction cache invalidate functions
  openrisc: Cache invalidation cleanup
  openrisc: mm: Fix section mismatch between map_page and __set_fixmap

Merge tag 'nand/for-7.2' into mtd/next

* Extend SPI NAND continuous read to Winbond devices, which requires
  numerous changes in the spi-{mem,nand} layers such as the need for a
  secondary read operation template.

* Continuous reads in general have also been enhanced/fixed for avoiding
  potential issues at probe time and at block boundaries.

Plus, there is the usual load of misc fixes and improvements.

Merge tag 'spi-nor/for-7.2' into mtd/next

SPI NOR changes for 7.2

Notable changes:

- Big set of cleanups and improvements to the locking support. This
  series contains some cleanups and bug fixes for code and documentation
  around write protection. Then support is added for complement locking,
  which allows finer grained configuration of what is considered locked
  and unlocked. Then complement locking is enabled on a bunch of Winbond
  W25 flashes.

- Fix die erase support on Spansion flashes. Die erase is only supported
  on multi-die flashes, but the die erase opcode was set for all. When
  the opcode is set, it overrides the default chip erase opcode which
  should be used for single-die flashes. Only set the opcode on
  multi-die flashes. Also, the opcode was not set on multi-die s28hx-t
  flashes. Set it so they can use die-erase correctly.

drm/nouveau: fix reversed error cleanup order in ucopy functions

nouveau_uvmm_vm_bind_ucopy() and nouveau_exec_ucopy() place their error
cleanup labels in allocation order rather than reverse allocation order.
On a u_memcpya() failure for in_sync.s, the goto to err_free_ops (or
err_free_pushs) frees the first allocation and then falls through to
err_free_ins, which calls u_free() on args->in_sync.s.

Since args->in_sync.s still holds the ERR_PTR returned by the failed
u_memcpya(), and ERR_PTR values are not caught by ZERO_OR_NULL_PTR(),
kvfree() proceeds to dereference it, which can result in a kernel oops.
A failure for out_sync.s instead jumps to err_free_ins and skips freeing
the first allocation, leading to a memory leak.

Fix by swapping the cleanup label order so resources are freed in the
correct reverse allocation sequence.

Fixes: b88baab82871 ("drm/nouveau: implement new VM_BIND uAPI")
Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
Link: https://patch.msgid.link/SYBPR01MB7881484D91A6F80271415F71AF1A2@SYBPR01MB7881.ausprd01.prod.outlook.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

drm/nouveau/acr: fix missing nvkm_done() in error path of nvkm_acr_oneinit()

In nvkm_acr_oneinit(), nvkm_kmap(acr->wpr) is invoked unconditionally
at line 309 to obtain a mapping reference. Additionally, when both
acr->wpr_fw and acr->wpr_comp are present, a second nvkm_kmap() is
called inside the conditional block. Both mappings are expected to be
released by nvkm_done(acr->wpr) at line 320 before the function returns
successfully.

However, when a mismatch is detected during the loop within the
conditional block, the function returns -EINVAL at line 318 without
calling nvkm_done(). This results in a leak of the kmap reference(s)
acquired earlier.

Fix the issue by invoking nvkm_done(acr->wpr) prior to the early return
to ensure proper release of the mapping references.

Fixes: 22dcda45a3d1 ("drm/nouveau/acr: implement new subdev to replace "secure boot"")
Cc: stable@vger.kernel.org
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Link: https://patch.msgid.link/20260606155606.77593-1-vulab@iscas.ac.cn
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

irqchip/crossbar: Fix parent domain resource leak

irq_domain_alloc_irqs_parent() is called in allocate_gic_irq() but
irq_domain_free_irqs_parent() is never called which causes a resource leak.

Fix this by calling irq_domain_free_irqs_parent() in
crossbar_domain_free().

Fixes: 783d31863fb82 ("irqchip: crossbar: Convert dra7 crossbar to stacked domains")
Signed-off-by: Bhargav Joshi <j.bhargav.u@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260620-irq-crossbar-fix-v2-2-b8e8499f468a@gmail.com

irqchip/crossbar: Use correct index in crossbar_domain_free()

crossbar_domain_free() resets the domain data and then uses the nulled
out irq_data->hwirq member as index to reset the irq_map[] entry and to
write the relevant crossbar register with a safe entry. That means it
never frees the correct index and keeps the crossbar register connection
to the source interrupt active.

If it would not reset the domain data, then this would be even worse as
irq_data->hwirq holds the source interrupt number, but both the map and
register index need the corresponding GIC SPI number and not the source
interrupt number. This might even result in an out of bounds access as
the source interrupt number can be higher than the maximal index space.

Fix this by using the GIC SPI index from the parent domain's irq_data.

Fixes: 783d31863fb82 ("irqchip: crossbar: Convert dra7 crossbar to stacked domains")
Signed-off-by: Bhargav Joshi <j.bhargav.u@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260620-irq-crossbar-fix-v2-1-b8e8499f468a@gmail.com

locking/rt: Fix the incorrect RCU protection in rt_spin_unlock()

rt_spin_unlock() releases the RCU protection before unlocking the
lock. That opens the door for the following UAF scenario:

T1 T2
spin_lock(&p->lock); rcu_read_lock();
invalidate(p); p = rcu_dereference(ptr);
rcu_assign_pointer(ptr, NULL); if (!p) return;
spin_unlock(&p->lock); spin_lock(&p->lock)
   lock(&lock->lock);
   rcu_read_lock();
kfree_rcu(p); rcu_read_unlock();
....
spin_unlock(&p->lock)
  rcu_read_unlock(); // Ends grace period
rcu_do_batch()
   kfree(p);
    UAF ->   rt_mutex_cmpxchg_release(&lock->lock...)

Regular spinlocks keep preemption disabled accross the unlock operation,
which provides full RCU protection, but the RT substitution fails to
resemble that. Same applies for the rwlock substitution.

Move the rcu_read_unlock() invocation past the unlock operations to match
the non-RT semantics. This makes it asymmetric vs. rt_xxx_lock(), but
that's harmless as the caller needs to hold RCU read lock across the lock
operation. The migrate_enable() call stays before the unlock operation
because there is no per CPU operation in the unlock path which would
require migration to be kept disabled.

Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant")
Reported-by: syzbot+000c800a02097aaa10ed@syzkaller.appspotmail.com
Decoded-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/87jyrud75z.ffs@fw13

Merge tag 'hwlock-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux

Pull hwspinlock update from Bjorn Andersson:

- Avoid uninitialized struct members in the Qualcomm hwspinlock driver

* tag 'hwlock-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
hwspinlock: qcom: avoid uninitialized struct members

Merge tag 'rpmsg-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux

Pull rpmsg update from Bjorn Andersson:

- Fix use-after-free in rpmsg-char driver

* tag 'rpmsg-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
rpmsg: char: Fix use-after-free on probe error path

Merge tag 'rproc-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux

Pull remoteproc updates from Bjorn Andersson:

- Add i.MX94 support to the i.MX remoteproc driver, covering the
   Cortex-M7 and Cortex-M33 Sync cores. This also fixes programming of
   non-zero System Manager CPU/LMM reset vectors.

- Move the remoteproc resource table definitions to a separate header,
   so they can be used by clients that do not otherwise depend on
   remoteproc. Switch the firmware resource handling over to the common
   iterator.

- Update the Xilinx R5F remoteproc driver to check the remote core
   state before attaching, drop a binding header dependency, and add
   firmware-name based auto boot support.

- Add Qualcomm Hawi ADSP/CDSP bindings, together with Shikra RPM
   bindings and CDSP, LPAICP, and MPSS PAS support. Fix a Qualcomm
   minidump leak, clean up PAS and WCSS reset handling, and make the
   user-visible Qualcomm naming consistent.

- Remove a duplicate STM32_RPROC Kconfig dependency and make i.MX
   remoteproc instances use the device node name so multiple processors
   can be distinguished in sysfs.

* tag 'rproc-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
  remoteproc: qcom: pas: Drop start/stop completion from struct qcom_pas
  remoteproc: qcom: pas: Add Shikra remoteproc support
  dt-bindings: remoteproc: qcom,shikra-pas: Document Shikra PAS remoteprocs
  dt-bindings: remoteproc: Add Shikra RPM processor compatible
  remoteproc: qcom: Unify user-visible "Qualcomm" name
  remoteproc: qcom: Fix leak when custom dump_segments addition fails
  remoteproc: qcom_q6v5_wcss: drop redundant wcss_q6_bcr_reset
  dt-bindings: remoteproc: qcom,sm8550-pas: Add Hawi CDSP compatible
  dt-bindings: remoteproc: qcom,sm8550-pas: Add Hawi ADSP compatible
  remoteproc: xlnx: Enable auto boot feature
  dt-bindings: remoteproc: xlnx: Add firmware-name property
  remoteproc: xlnx: Remove binding header dependency
  remoteproc: imx_rproc: Use device node name as processor name
  remoteproc: use rsc_table_for_each_entry() in rproc_handle_resources()
  remoteproc: Move resource table data structure to its own header
  remoteproc: xlnx: Check remote core state
  remoteproc: imx_rproc: Add support for i.MX94
  remoteproc: imx_rproc: Program non-zero SM CPU/LMM reset vector
  dt-bindings: remoteproc: imx-rproc: Support i.MX94
  remoteproc: Dead code cleanup in Kconfig for STM32_RPROC

9p: Add missing read barrier in virtio zero-copy path

Commit 2b6e72ed747f ("9P: Add memory barriers to protect request
fields over cb/rpc threads handoff") added a read barrier after
p9_client_rpc() waits for req->status, pairing with the write barrier in
p9_client_cb(). The virtio zero-copy wait path was missed.

Add the same read barrier after the zero-copy wait before reading the
completed request.

Fixes: 2b6e72ed747f ("9P: Add memory barriers to protect request fields over cb/rpc threads handoff")
Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com>
Message-ID: <20260529075441.233369-1-hanguidong02@gmail.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>

net/9p: Replace strlen() strcpy() pair with strscpy()

Use the result of strscpy() for the overflow check.

Signed-off-by: David Laight <david.laight.linux@gmail.com>
Message-ID: <20260606202744.5113-3-david.laight.linux@gmail.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>

9p: skip nlink update in cacheless mode to fix WARN_ON

v9fs_dec_count() unconditionally calls drop_nlink() on regular files,
even when the inode's nlink is already zero. In cacheless mode the
client refetches inode metadata from the server (the source of truth)
on every operation, so by the time v9fs_remove() returns, the locally
cached nlink may already reflect the post-unlink value:

  1. Client initiates unlink, server processes it and sets nlink to 0
  2. Client refetches inode metadata (nlink=0) before unlink returns
  3. Client's v9fs_remove() completes successfully
  4. Client calls v9fs_dec_count() which calls drop_nlink() on nlink=0

This race is easily triggered under heavy unlink workloads, such as
stress-ng's unlink stressor, producing the following warning:

  WARNING: fs/inode.c:417 at drop_nlink+0x4c/0xc8
  Call trace:
   drop_nlink+0x4c/0xc8
   v9fs_remove+0x1e0/0x250 [9p]
   v9fs_vfs_unlink+0x20/0x38 [9p]
   vfs_unlink+0x13c/0x258
   ...

In cacheless mode the server is authoritative and the inode is on its
way out, so locally adjusting nlink buys nothing. Skip v9fs_dec_count()
entirely when neither CACHE_META nor CACHE_LOOSE is set, which both
avoids the warning and removes a class of nlink races (two concurrent
unlinkers observing nlink > 0 and both calling drop_nlink()) that an
nlink == 0 guard alone would only narrow rather than close.

Fixes: ac89b2ef9b55 ("9p: don't maintain dir i_nlink if the exported fs doesn't either")
Cc: stable@vger.kernel.org
Suggested-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Message-ID: <20260421-9p-v2-1-48762d294fad@debian.org>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>

net/9p: fix race condition on rdma->state in trans_rdma.c

The rdma->state field is modified without holding req_lock in both
recv_done() and p9_cm_event_handler(), while rdma_request() accesses
the same field under the req_lock spinlock. This inconsistent locking
creates a race condition:

- recv_done() running in softirq completion context sets
  rdma->state = P9_RDMA_FLUSHING without acquiring req_lock

- p9_cm_event_handler() modifies rdma->state at multiple points
  (ADDR_RESOLVED, ROUTE_RESOLVED, ESTABLISHED, CLOSED) without
  req_lock

- rdma_request() uses spin_lock_irqsave(&rdma->req_lock, flags) to
  protect the read-modify-write of rdma->state

The race can cause lost state transitions: recv_done() or the CM
event handler could set state to FLUSHING/CLOSED while rdma_request()
is concurrently checking or modifying state under the lock, leading to
the FLUSHING transition being silently overwritten by CLOSING. This
corrupts the connection state machine and can cause use-after-free on
RDMA request objects during teardown.

Fix by adding req_lock protection to all rdma->state modifications in
recv_done() and p9_cm_event_handler(), matching the pattern already
used in rdma_request(). Use spin_lock_irqsave/spin_unlock_irqrestore
in the CM event handler since it can race with recv_done() which runs
in softirq context.

Tested with a kernel module that races two threads (simulating
rdma_request and recv_done/CM handler) on rdma->state with proper
locking: 5.5M+ FLUSHING writes over 27M iterations with 0 lost
transitions.

Fixes: 473c7dd1d7b5 ("9p/rdma: remove useless check in cm_event_handler")
Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn>
Reported-by: Ao Wang <wangao@seu.edu.cn>
Reported-by: Xuewei Feng <fengxw06@126.com>
Reported-by: Qi Li <qli01@tsinghua.edu.cn>
Reported-by: Ke Xu <xuke@tsinghua.edu.cn>
Assisted-by: GLM:GLM-5.1
Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Message-ID: <20260529073933.77315-1-zhaoyz24@mails.tsinghua.edu.cn>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>

9p: v9fs_file_do_lock: replace WARN_ONCE with p9_debug

This warning depends on server-provided data, we should not use
WARN here

Reported-by: Yifei Chu <yifeichu24@gmail.com>
Closes: https://lore.kernel.org/r/CAPJnbgJ7ZK7DCjCfG56hd_iKGePmAzudb4hOWd4=9r32nM+KcA@mail.gmail.com
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Message-ID: <20260529-lock-warn-v1-1-20c29580d61d@codewreck.org>

9p: Enable symlink caching in page cache

Currently, when cache=loose is enabled, file reads are cached in the
page cache, but symlink reads are not. This patch allows the results
of p9_client_readlink() to be stored in the page cache, eliminating
the need for repeated 9P transactions on subsequent symlink accesses.

This change improves performance for workloads that involve frequent
symlink resolution.

Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Message-ID: <982462d17c0c0d2856763266a25eb04d080c1dbb.1779355927.git.repk@triplefau.lt>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>

9p: Set default negative dentry retention time for cache=loose

For cache=loose mounts, set the default negative dentry cache retention
time to 24 hours.

Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Message-ID: <b5beca3e70890ab8a4f0b9e99bd69cb97f5cb9eb.1779355927.git.repk@triplefau.lt>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>

9p: Add mount option for negative dentry cache retention

Introduce a new mount option, negtimeout, for v9fs that allows users
to specify how long negative dentries are retained in the cache. The
retention time can be set in milliseconds (e.g. negtimeout=10000 for
a 10secs retention time) or a negative value (e.g. negtimeout=-1) to
keep negative entries until the buffer cache management removes them.

For consistency reasons, this option should only be used in exclusive
or read-only mount scenarios, aligning with the cache=loose usage.

Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Message-ID: <b2d66500aa5a2f6540347c4aa46a4be10dd01bc6.1779355927.git.repk@triplefau.lt>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>

9p: Cache negative dentries for lookup performance

Not caching negative dentries can result in poor performance for
workloads that repeatedly look up non-existent paths. Each such
lookup triggers a full 9P transaction with the server, adding
unnecessary overhead.

A typical example is source compilation, where multiple cc1 processes
are spawned and repeatedly search for the same missing header files
over and over again.

This change enables caching of negative dentries, so that lookups for
known non-existent paths do not require a full 9P transaction. The
cached negative dentries are retained for a configurable duration
(expressed in milliseconds), as specified by the ndentry_timeout
field in struct v9fs_session_info. If set to -1, negative dentries
are cached indefinitely.

This optimization reduces lookup overhead and improves performance for
workloads involving frequent access to non-existent paths.

Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Message-ID: <e542317dd03bbadb5249abd3ea6aecfdca692c19.1779355927.git.repk@triplefau.lt>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>

9p: avoid returning ERR_PTR(0) from mkdir operations

When mkdir succeeds, v9fs_vfs_mkdir_dotl() and v9fs_vfs_mkdir() return
ERR_PTR(0) which is incorrect. They should return NULL instead for
success and ERR_PTR() only with negative error codes for failure.

Return NULL instead of passing to ERR_PTR while err is zero
Fixes smatch warnings:
fs/9p/vfs_inode_dotl.c:420 v9fs_vfs_mkdir_dotl() warn: passing zero to 'ERR_PTR'
fs/9p/vfs_inode.c:695 v9fs_vfs_mkdir() warn: passing zero to 'ERR_PTR'

The v9fs_vfs_mkdir() code was further simplified because v9fs_create()
can never return NULL, so we do not need to check for fid being set
separately, and the error path can be a simple return immediately after
v9fs_create() failure.
There is no intended functional change.

Fixes: 88d5baf69082 ("Change inode_operations.mkdir to return struct dentry *")
Suggested-by: David Laight <david.laight.linux@gmail.com>
Acked-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Signed-off-by: Hongling Zeng <zenghongling@kylinos.cn>
Message-ID: <20260520022650.14217-1-zenghongling@kylinos.cn>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>

9p: avoid putting oldfid in p9_client_walk() error path

When p9_client_walk() is called with clone set to false, fid aliases
oldfid. If the walk subsequently fails after the request has been sent,
the error path jumps to clunk_fid, which currently calls p9_fid_put(fid)
unconditionally.

This drops a reference to oldfid even though ownership of oldfid remains
with the caller. If this is the last reference, oldfid can be clunked and
destroyed while the caller still expects it to be valid. A later use or
put of oldfid can then trigger a use-after-free or refcount underflow.

Fix this by only putting fid in the clunk_fid error path when it does not
alias oldfid, matching the existing guard in the error path below.

This can be triggered when a multi-component walk is split into multiple
p9_client_walk() calls and a later non-cloning walk fails. A reproducer
and refcount warning logs are available on request.

Fixes: b48dbb998d70 ("9p fid refcount: add p9_fid_get/put wrappers")
Cc: stable@vger.kernel.org
Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn>
Reported-by: Ao Wang <wangao@seu.edu.cn>
Reported-by: Xuewei Feng <fengxw06@126.com>
Reported-by: Qi Li <qli01@tsinghua.edu.cn>
Reported-by: Ke Xu <xuke@tsinghua.edu.cn>
Assisted-by: GLM 5.1
Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Message-ID: <20260528053918.53550-1-zhaoyz24@mails.tsinghua.edu.cn>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>

mailbox: imx: Don't force-thread the primary handler

The primary interrupt handler (imx_mu_isr()) no longer invokes any
callbacks it only masks the interrupt source and returns. In a
forced-threaded environment the IRQ-core will force-thread the primary
handler which can be avoided.

The primary handler uses a spinlock_t to protect the RMW operation in
imx_mu_xcr_rmw() - nothing that may introduce long latencies.

The lock can be turned into a raw_spinlock_t and then the primary
handler can run in hardirq context even on PREEMPT_RT skipping one
thread.

Make struct imx_mu_priv::xcr_lock a raw_spinlock_t and skip
force-threading the primrary handler by marking it IRQF_NO_THREAD.

Reviewed-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>

mailbox: imx: Move the RXDB part of the mailbox into the threaded handler

Move RXDB callback handling into the threaded handler. This similar to
the RX side and since the imx_mu_dcfg::rxdb callback can return an error, the
interrupt is only enabled on success.

Move RXDB callback handling into the threaded handler.

Reviewed-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>

mailbox: imx: Move the RX part of the mailbox into the threaded handler

Move RX callback handling into the threaded handler. This is similar to
the TX side except that we explicitly mask the source interrupt in the
primary handler and unmask it in the threaded handler again after
success. This was done automatically in the TX part.

The masking/ unmasking can be removed from imx_mu_specific_rx() since it
already happens in the primary/ threaded handler before invoking the
channel specific callback.

Move RX channel handling into threaded handler.

Reviewed-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>

mailbox: imx: Start splitting the IRQ handler in primary and threaded handler

Split the mailbox irq handling into a primary handler (imx_mu_isr()) and
a threaded handler (imx_mu_isr_th()). The primary handler masks the
interrupt event so the threaded handler can run without raising the
interrupt again.

The goal here is to invoke the mailbox core functions (such as
mbox_chan_received_data(), mbox_chan_txdone()) in preemptible context which is
made possible by using an threaded interrupt handler. This in turn means that
mailbox's client callbacks are invoked in preemptible context, too. This then
allows the mailbox client callback to skip an indirection via a workqueue if
it requries preemptible callback.

As a first step, prepare the logic and move TX handling part.

Reviewed-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>

mailbox: imx: Use channel index instead of zero in imx_mu_specific_rx()

imx_mu_specific_rx() masks channel 0 and unmasks it again at the end of
the function. Given that at startup the channel index got unmasked it
should do the right job.

This here either unmasks the actual channel or another one but should
have no impact given that it reverses its doing at the end.

Peng Fan commented here:
| For specific rx channel, whether it is i.MX8 SCU or i.MX ELE, actually there is
| only 1 channel as of now, but it seems better to use cp->idx in case more
| channels in future.

Use the channel index instead of zero.

Reviewed-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>

mailbox: imx: use devm_of_platform_populate()

The driver uses of_platform_populate() but does not remove the added
devices on removal. This can lead to "double devices" on module removal
followed by adding the module again.

Use devm_of_platform_populate() to remove the populated devices once the
parent device is removed.

Reviewed-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>

mailbox: imx: Use devm_pm_runtime_enable()

sashiko complained about early usage of the device while probe isn't
completed. This can be mitigated by delaying the pm_runtime_enable()
into the removal path instead doing it early. This ensures that in an
error case the device is removed (and imx_mu_shutdown()) before
pm_runtime_disable() so we don't have to do this manually.

For the order to work, lets move devm_mbox_controller_register() until
after the pm-runtime part. So the reverse order will be mbox-controller
removal followed by disabling pm runtime.

Use devm_pm_runtime_enable(), remove manual pm_runtime_disable()
invocations and move the pm_runtime handling in probe before
devm_mbox_controller_register().

Reviewed-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>

mailbox: imx: Add a channel shutdown field

sashiko complained about possible teardown problem. The scenario

CPU 0                              CPU 1
  imx_mu_isr()                   imx_mu_shutdown()
                                   imx_mu_xcr_rmw(priv, IMX_MU_RCR, 0, IMX_MU_xCR_RIEn(priv->dcfg->type, cp->idx));
    imx_mu_specific_rx()
      imx_mu_xcr_rmw(priv, IMX_MU_RCR, IMX_MU_xCR_RIEn(priv->dcfg->type, 0), 0);
                                   free_irq()

The RX event remains enabled because in this short window the RX event
was disabled in ->shutdown() while the interrupt was active and then
enabled again by the ISR while ->shutdown waited in free_irq().

This race requires timing and if happens can be problematic on shared
handlers if the "removed" channel triggers an interrupt. In this case
the irq-core will shutdown the interrupt with the "nobody cared"
message.

Introduce imx_mu_con_priv::shutdown to signal that the channel is
shutting down. This flag is set with the lock held (by
imx_mu_xcr_clr_shut()). The unmask side uses imx_mu_xcr_set_act() which
only enables the event if the channel has not been shutdown and
serialises on the same lock.

Reviewed-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>

mailbox: imx: Forward the timeout/ error in imx_mu_generic_tx()

imx_mu_generic_tx() for the IMX_MU_TYPE_TXDB_V2 type polls on a register
which may timeout and is recognized as an error. This error is siltently
dropped and not dropped to the caller.

Forward the error to the caller.

Fixes: b5ef17917f3a7 ("mailbox: imx: fix TXDB_V2 channel race condition")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>

tpm: fix event_size output in tpm1_binary_bios_measurements_show

Commit 186d124f07da ("tpm_eventlog.c: fix binary_bios_measurements")
split the output to write the endian-converted event header first and
then the variable-length event data.

However, the split was at sizeof(struct tcpa_event) - 1, even though
event_data was a zero-length array, and later a flexible array member,
both of which already excluded the event data.

Therefore, the current code writes the first three bytes of event_size
from the endian-converted header and then the last byte from the raw
header, which can emit a corrupted event_size on PPC64, where
do_endian_conversion() maps to be32_to_cpu().

Split one byte later to write the full endian-converted header first,
followed by the variable-length event->event_data.

Fixes: 186d124f07da ("tpm_eventlog.c: fix binary_bios_measurements")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

tpm: tpm_crb_ffa: revert defered_probed when tpm_crb_ffa is built-in

commit 746d9e9f62a6 ("tpm: tpm_crb_ffa: try to probe tpm_crb_ffa when it's built-in")
probe tpm_crb_ffa forcefully when it's built-in to integrate with IMA.

However, IMA now provides the IMA_INIT_LATE_SYNC build option, which
initialises IMA at the late_initcall_sync level, so this change is no
longer required.

Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Link: https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux.git/commit/?h=for-next/ffa/updates&id=cc7e8f21b9f0c229d68cf19a837cba82b5ac2d87
Link: https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux.git/commit/?h=for-next/ffa/updates&id=e659fc8e537c7a21d5d693d6f30d8852f2fa8d91
Link: https://lore.kernel.org/r/20260605144325.434436-5-yeoreum.yun@arm.com
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

tpm: tpm2-sessions: wait for async KPP completion in tpm_buf_append_salt

tpm_buf_append_salt() in drivers/char/tpm/tpm2-sessions.c calls
crypto_kpp_generate_public_key() and crypto_kpp_compute_shared_secret()
without installing a completion callback, discards both return values,
and immediately frees the kpp_request via kpp_request_free(). When the
resolved ecdh-nist-p256 KPP backend is asynchronous (atmel-ecc, HPRE,
keembay-ocs), either operation returns -EINPROGRESS and the deferred
completion worker dereferences the freed request.

The path fires automatically from the hwrng_fillfn kernel thread via
tpm_get_random -> tpm2_get_random -> tpm2_start_auth_session ->
tpm_buf_append_salt on every entropy poll, without any userland action.

Install crypto_req_done as the completion callback, wrap both KPP
operations in crypto_wait_req(), and propagate errors to the caller.
The wait is a no-op for synchronous backends.

Fixes: 1085b8276bb4 ("tpm: Add the rest of the session HMAC API")
Cc: stable@vger.kernel.org # v6.10+
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

tpm: tpm_tis: Add settle time for some TPMs

Some TPMs fail to grant locality when requested immediately after being
relinquished. In this case, the TPM_ACCESS_REQUEST_USE bit of the
TPM_ACCESS register is cleared immediately without setting
TPM_ACCESS_ACTIVE_LOCALITY.

This issue can be seen at boot since tpm_chip_start, called right
after locality is relinquished, will fail. This causes the probe to
fail:

tpm_tis MSFT0101:00: probe with driver tpm_tis failed with error -1

This occurs on some older Dell Latitudes. For the Nuvoton TPM used in
these machines, add a delay after locality is relinquished.

Signed-off-by: Jim Broadus <jbroadus@gmail.com>
Link: https://lore.kernel.org/r/20260526232245.5409-3-jbroadus@gmail.com
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

tpm: tpm_tis: store entire did_vid

The entire 32 bit did_vid is read from the device, but only the 16 bit
vendor id portion was stored in the tpm_tis_data structure. Storing the
entire value allows the device id to be used to handle quirks. Printing
the vid and did in the error case also helps identify problem devices.

Signed-off-by: Jim Broadus <jbroadus@gmail.com>
Link: https://lore.kernel.org/r/20260526232245.5409-2-jbroadus@gmail.com
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

tpm_crb: Check ACPI_COMPANION() against NULL during probe

Every platform driver can be forced to match a device that doesn't match
its list of device IDs because of device_match_driver_override(), so
platform drivers that rely on the existence of a device's ACPI companion
object need to verify its presence.

Accordingly, add a requisite ACPI_COMPANION() check against NULL to the
tpm_crb driver.

Fixes: 48fe2cddc85c ("tpm_crb: Convert ACPI driver to a platform one")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/2848144.mvXUDI8C0e@rafael.j.wysocki
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

tpm: tpm_tis_spi: Use wait_woken() in wait_for_tmp_stat()

wait_event_interruptible_timeout() evaluates its condition after setting
the current task state to TASK_INTERRUPTIBLE.

With CONFIG_DEBUG_ATOMIC_SLEEP this triggers a warning when the IRQ wait
path is used:

    tpm_tis_status()
      tpm_tis_spi_read_bytes()
        tpm_tis_spi_transfer_full()
          spi_bus_lock()
            mutex_lock()

Address this with the following measures:

1. Call wait_tpm_stat_cond() only while tasking is running.
2. Use wait_woken() to wait for changes.

Cc: stable@vger.kernel.org # v4.19+
Cc: Linus Walleij <linusw@kernel.org>
Reported-by: Stefan Wahren <wahrenst@gmx.net>
Closes: https://lore.kernel.org/linux-integrity/6964bec7-3dbb-453b-89ef-9b990217a8b9@gmx.net/
Fixes: 1a339b658d9d ("tpm_tis_spi: Pass the SPI IRQ down to the driver")
Reviewed-by: Linus Walleij <linusw@kernel.org>
Tested-by: Stefan Wahren <wahrenst@gmx.net>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

tpm: Initialize name_size_alg for non-NULL name in tpm_buf_append_name()

tpm_buf_append_name() supports callers passing a pre-computed name
for handles. When name is non-NULL, the code skips the
tpm2_read_public() path but leaves name_size_alg uninitialized
before it is used as the memcpy size argument.

No current in-tree caller passes a non-NULL name, but future use
cases such as name caching would exercise this path. Initialize
name_size_alg by calling name_size() on the caller-provided name,
sharing the error check and assignment with the existing
tpm2_read_public() path. This prevents unmasking a latent bug when
the non-NULL name path is eventually used.

Assisted-by: Kiro:claude-opus-4.6
Reviewed-by: Justinien Bouron <jbouron@amazon.com>
Reviewed-by: Muhammad Hammad Ijaz <mhijaz@amazon.com>
Signed-off-by: Gunnar Kudrjavets <gunnarku@amazon.com>
Link: https://lore.kernel.org/r/20260510171152.4607-1-gunnarku@amazon.com
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

tpm: restore timeout for key creation commands

Commit 207696b17f38 ("tpm: use a map for tpm2_calc_ordinal_duration()")
inadvertently reduced the timeout for TPM2 key creation commands
(`CREATE_PRIMARY`, `CREATE`, `CREATE_LOADED`) from 300 seconds to 30
seconds.

This causes intermittent timeout failures, with several failures observed
across hundreds of test runs on some Intel platforms using Infineon
SLB9670 and SLB9672 TPM modules. Restore the timeout to 300 seconds to
avoid spurious failures.

Cc: stable@vger.kernel.org # v6.18+
Fixes: 207696b17f38 ("tpm: use a map for tpm2_calc_ordinal_duration()")
Co-developed-by: Lili Li <lili.li@intel.com>
Signed-off-by: Lili Li <lili.li@intel.com>
Signed-off-by: Baoli Zhang <baoli.zhang@linux.intel.com>
Link: https://lore.kernel.org/r/20260421005021.13765-1-baoli.zhang@linux.intel.com
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

tpm: svsm: constify tpm_chip_ops

Constify the SVSM vTPM ops. It is statically initialized and never
written to, so let's store it in .rodata.

Every other tpm_class_ops instance in drivers/char/tpm/ is already
const.

Signed-off-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20260505202738.145800-1-dwindsor@gmail.com
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

netfilter: nft_meta_bridge: fix NFT_META_BRI_IIFPVID stack leak

This needs to test for nonzero retval.

Fixes: c54c7c685494 ("netfilter: nft_meta_bridge: add NFT_META_BRI_IIFPVID support")
Closes: https://sashiko.dev/#/patchset/20260618061631.21919-1-fw%40strlen.de
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conntrack_expect: use conntrack GC to reap expectations

This patch replaces the timer API by GC worker approach for
expectations, as it already happened in many other subsystems.

Use the existing conntrack GC worker to iterate over the local list of
expectations in the master conntrack to reap expired expectations.
Check IPS_HELPER_BIT to run GC for expectations, set it on for nft_ct
expectation which nevers sets it. Hold the expectation spinlock while
iterating over the master conntrack expectation list to synchronize with
nf_ct_remove_expectations(). This also performs runtime packet path
garbage collection through the expectation insertion and lookup
functions while walking over one of the chains of the global expectation
hashtables. Unconfirmed conntrack entries are skipped since ct->ext can
be reallocated and dying are skipped since those will be gone soon.
Set on IPS_HELPER_BIT if the helper ct extension is added, then the new
GC worker does not need to bump the ct refcount to check if the ct->ext
helper is available.

This removes the extra bump on the refcount for expectation timers, this
allows to remove several nf_ct_expect_put() calls after the unlink,
after this update only refcount remains at 1 while on the expectation
hashes.

This patch implicitly addresses a race with the existing timer API
allowing an expectation to access a stale exp->master pointer which has
been already released when expectation removal loses races with an
expiring timer, ie. timer_del() reporting false.

Add a new NF_CT_EXPECT_DEAD flag to reap this expectation via GC. This
is needed by nf_conntrack_unexpect_related() which is called in error
paths to invalidate newly created expectations that has been added into
the hashes. These expectactions cannot be inmediately released as GC or
nf_ct_remove_expectations() could race to make it. On expectation
insert, the runtime GC reaps stale expectations before checking the
expectation limit set by policy.

Set current timestamp in nf_ct_expect_alloc(), then add the expectation
policy timeout (or custom timeout specified added on top of this) to
specify the expectation lifetime.

Fixes: bffcaad9afdf ("netfilter: ctnetlink: ensure safe access to master conntrack")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_reject: skip iphdr options when looking for icmp header

Not a big deal but this hould have used the real ip header length and not the
base header size. As-is, if there are options then
nf_skb_is_icmp_unreach() result will be random.

Fixes: db99b2f2b3e2 ("netfilter: nf_reject: don't reply to icmp error messages")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_flow_offload: zero device address for non-ether case

LLM points out that the skip causes unitialised stack array to
propagate down into dev_fill_forward_path(). Its not clear to me that
there is a guarantee that a later ctx.dev->netdev_ops->ndo_fill_forward_path()
would always fix this up.

Cc: Felix Fietkau <nbd@nbd.name>
Fixes: 45ca3e61999e ("netfilter: nft_flow_offload: skip dst neigh lookup for ppp devices")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_meta_bridge: add validate callback for get operations

Blamed commit added NFT_META_BRI_IIFHWADDR to the set validate callback,
yet this is a get operation.

Add a get validate callback and move the NFT_META_BRI_IIFHWADDR key
there.

AFAICS this is harmless, NFT_META_BRI_IIFHWADDR can deal with a NULL
input device and the set handler ignores a NFT_META_BRI_IIFHWADDR
operation, but it allows to read 4 bytes off bridge skb->cb[].

Fixes: cbd2257dc96e ("netfilter: nft_meta_bridge: introduce NFT_META_BRI_IIFHWADDR support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_payload: reject offsets exceeding 65535 bytes

Large offsets were rejected based on netlink policy, but blamed commit
removed the policy without updating nft_payload_inner_init() to use the
truncation-check helper.

Silent truncation is not a problem, but not wanted either, so add a
check.

Fixes: 077dc4a27579 ("netfilter: nft_payload: extend offset to 65535 bytes")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ipset: make sure gc is properly stopped

Sashiko noticed that when destroying a set,
cancel_delayed_work_sync() was called while gc
calls queue_delayed_work() unconditionally which
can lead not to properly shutting down the gc.

Fixes: f66ee0410b1c ("netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports")
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ipset: fix order of kfree_rcu() and rcu_assign_pointer()

Sashiko pointed out that kfree_rcu() was called before
rcu_assign_pointer() in handling the comment extension.
Fix the order so that rcu_assign_pointer() called first.

Fixes: b57b2d1fa53f ("netfilter: ipset: Prepare the ipset core to use RCU at set level")
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ipset: Don't use test_bit() in lockless RCU readers in bitmap types

The pair of the patch "netfilter: ipset: Don't use test_bit() in lockless
RCU readers in hash types" for the bitmap types.

Fixes: 02a3231b6d82 ("netfilter: nf_conntrack_expect: store netns and zone in expectation")
Fixes: b0da3905bb1e ("netfilter: ipset: Bitmap types using the unified code base")
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ipset: Don't use test_bit() in lockless RCU readers in hash types

Sashiko pointed out that there are a few lockless RCU readers
using test_bit() which is a relaxed atomic operation and
provides no memory barrier guarantees. Use test_bit_acquire()
instead where the operation may run parallel with add/del/gc,
i.e. is not one from the next cases

- protected by region lock
- in a set destroy phase
- in a new/temporary set creation phase

Fixes: 18f84d41d34f ("netfilter: ipset: Introduce RCU locking in hash:* types")
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

md/raid5: let stripe batch bm_seq comparison wrap-safe

Once the 32-bit seq wraps, a newer bm_seq can look smaller
than old, so .. covert to wrap-safe calculate way.

Signed-off-by: Chen Cheng <chencheng@fnnas.com>
Link: https://patch.msgid.link/20260618025735.915113-1-chencheng@fnnas.com
Signed-off-by: Yu Kuai <yukuai@fygo.io>

md/raid1: protect head_position for read balance

KCSAN reports a data race between raid1_end_read_request() and
raid1_read_request().

The completion path updates conf->mirrors[disk].head_position in
update_head_pos() without a lock, while the read-balance heuristic reads
the same field locklessly in is_sequential() and choose_best_rdev().

KCSAN report:
=========================
  BUG: KCSAN: data-race in raid1_end_read_request / raid1_read_request

  write to 0xffff8f0306ba7868 of 8 bytes by interrupt on cpu 9:
    raid1_end_read_request+0xb5/0x440
    bio_endio+0x3c9/0x3e0
    blk_update_request+0x257/0x770
    scsi_end_request+0x4d/0x520
    scsi_io_completion+0x6f/0x990
    scsi_finish_command+0x188/0x280
    scsi_complete+0xac/0x160
    blk_complete_reqs+0x8e/0xb0
    blk_done_softirq+0x1d/0x30
   [...]

  read to 0xffff8f0306ba7868 of 8 bytes by task 667002 on cpu 11:
    raid1_read_request+0x497/0x1a10
    raid1_make_request+0xdf/0x1950
    md_handle_request+0x2c5/0x700
    md_submit_bio+0x126/0x320
    __submit_bio+0x2ec/0x3a0
    submit_bio_noacct_nocheck+0x572/0x890
   [...]

  value changed: 0x0000000000000078 -> 0x00000000005fe448

Signed-off-by: Chen Cheng <chencheng@fnnas.com>
Link: https://patch.msgid.link/20260619044114.1208456-1-chencheng@fnnas.com
Signed-off-by: Yu Kuai <yukuai@fygo.io>

md/raid1: free r1_bio when REQ_NOWAIT is set and read would block on retry

When a read is retried, raid1_read_request() may be called with a
pre-allocated r1_bio. If wait_read_barrier() fails for a REQ_NOWAIT
read, the bio is completed and the function returns immediately. In this
case the existing r1_bio is leaked.

This fixes a leak of pre-allocated r1_bio structures for retried reads.

Fixes: 5aa705039c4f ("md: raid1 add nowait support")
Reported-by: sashiko-bot <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260611083514.754922-1-abd.masalkhi@gmail.com?part=1
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
Link: https://patch.msgid.link/20260611101350.759154-1-abd.masalkhi@gmail.com
Signed-off-by: Yu Kuai <yukuai@fygo.io>

md/raid1: honor REQ_NOWAIT when waiting for behind writes

raid1 supports REQ_NOWAIT reads by avoiding waits in the barrier path
through wait_read_barrier(). However, a read can still block on a
WriteMostly device when the array uses a bitmap and there are
outstanding behind writes.

In that case raid1 unconditionally calls wait_behind_writes(), which
may sleep until all behind writes complete. As a result, a REQ_NOWAIT
read can block despite the caller explicitly requesting non-blocking
behavior.

This ensures that raid1 consistently honors REQ_NOWAIT reads across all
paths that may otherwise wait for behind writes.

Fixes: 5aa705039c4f ("md: raid1 add nowait support")
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
Link: https://patch.msgid.link/20260611083514.754922-1-abd.masalkhi@gmail.com
Signed-off-by: Yu Kuai <yukuai@fygo.io>

md/raid5: always convert llbitmap bits for discard

llbitmap discard is useful even when no underlying member device supports
it. The discard still converts the llbitmap range to unwritten, so later
reads and recovery do not rely on stale parity for that range.

Let llbitmap discard bypass the raid5 lower discard support check. If lower
discard is not safe or not supported, complete the accounted clone after
md_account_bio() so the llbitmap conversion callbacks run without member
discard bios.

Link: https://patch.msgid.link/20260605072639.2434847-4-yukuai@kernel.org
Signed-off-by: Yu Kuai <yukuai@fygo.io>

md/raid5: validate discard support at request time

Raid5 used to disable discard limits when devices_handle_discard_safely
was not set or when stacked member limits could not support a full-stripe
discard. That hides discard from userspace before raid5 can decide whether
a request can be handled safely.

Follow other virtual drivers and advertise a UINT_MAX discard limit for the
md device. Cache lower discard support in r5conf when setting queue limits,
and reject unsupported discard bios before queuing stripe work.

Link: https://patch.msgid.link/20260605072639.2434847-3-yukuai@kernel.org
Signed-off-by: Yu Kuai <yukuai@fygo.io>

md/raid5: account discard IO

Raid5 handles discard bios internally through make_discard_request() and
never passes them through md_account_bio(). As a result, discard IO is
missing the md-device iostat accounting that normal raid5 IO and discard
IO in other raid levels get from md_account_bio().

Before accounting the bio, trim the request to the full data stripes that
raid5 will actually discard. The first full stripe is the ceiling of the
bio start divided by data-stripe sectors, and the last full stripe is the
floor of the bio end divided by data-stripe sectors. Account that exact
MD logical full-stripe range, then restore the original iterator so bio
completion and iostat still cover the original request.

Link: https://patch.msgid.link/20260605072639.2434847-2-yukuai@kernel.org
Signed-off-by: Yu Kuai <yukuai@fygo.io>

md/raid1: simplify raid1_write_request() error handling

raid1_write_request() increments rdev->nr_pending before checking the
badblocks and then immediately decrements it again when a device is
skipped. Move the increment until after the checks succeed so the
reference accounting is easier to follow.

Consolidate the failure paths so that each error label releases exactly
the resources acquired up to that point. err_dec_pending drops pending
references and frees the r1bio, while err_allow_barrier handles the
barrier release before returning.

When a REQ_ATOMIC write cannot be satisfied due to a badblock range,
complete the bio with BLK_STS_NOTSUPP rather than reporting an I/O
error, since the operation is unsupported rather than having failed
during I/O.

Rename max_write_sectors to max_sectors and remove the redundant local
copy.

Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
Link: https://patch.msgid.link/20260613182810.1317258-5-abd.masalkhi@gmail.com
Signed-off-by: Yu Kuai <yukuai@fygo.io>

md/raid10: fix writes_pending and barrier reference leaks on discard failures

raid10_make_request() acquires a writes_pending reference with
md_write_start() before calling raid10_handle_discard(). Several failure
paths in raid10_handle_discard() complete the bio and return without
releasing the corresponding reference, causing md_write_end() to be
skipped.

Call md_write_end() before returning from these failure paths to keep
writes_pending accounting balanced.

Additionally, discard split allocation failures can occur after
wait_barrier() succeeds. Those paths return without calling
allow_barrier(), leaking the associated barrier reference.

Release the barrier before returning from those paths.

Fixes: c9aa889b035f ("md: raid10 add nowait support")
Fixes: 4cf58d952909 ("md/raid10: Handle bio_split() errors")
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
Link: https://patch.msgid.link/20260613182810.1317258-4-abd.masalkhi@gmail.com
Signed-off-by: Yu Kuai <yukuai@fygo.io>

md/raid10: fix writes_pending leak on write request failures

raid10_make_request() acquires a writes_pending reference with
md_write_start() before dispatching write requests. Several failure
paths in raid10_write_request() complete the bio and return without
reaching the normal write completion path, causing the corresponding
md_write_end() to be skipped.

Make raid10_write_request() return a status indicating whether the write
request was successfully queued. This allows raid10_make_request() to
release the writes_pending reference with md_write_end() when a write
request fails.

Fixes: 4cf58d952909 ("md/raid10: Handle bio_split() errors")
Fixes: c9aa889b035f ("md: raid10 add nowait support")
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
Link: https://patch.msgid.link/20260613182810.1317258-3-abd.masalkhi@gmail.com
Signed-off-by: Yu Kuai <yukuai@fygo.io>

md/raid1: fix writes_pending and barrier reference leaks on write failures

raid1_make_request() acquires a writes_pending reference with
md_write_start() before calling raid1_write_request(). Several failure
paths in raid1_write_request() complete the bio and return without
reaching the normal write completion path, causing the corresponding
md_write_end() to be skipped.

Make raid1_write_request() return a status indicating whether the write
request was successfully queued. This allows raid1_make_request() to
call md_write_end() when raid1_write_request() fails.

Additionally, if wait_blocked_rdev() fails after wait_barrier()
succeeds, the associated barrier reference is not released.

Call allow_barrier() before returning from that path to keep the barrier
accounting balanced.

Fixes: b1a7ad8b5c4f ("md/raid1: Handle bio_split() errors")
Fixes: f2a38abf5f1c ("md/raid1: Atomic write support")
Fixes: 5aa705039c4f ("md: raid1 add nowait support")
Reported-by: sashiko-bot <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260611083514.754922-1-abd.masalkhi@gmail.com?part=1
Closes: https://sashiko.dev/#/patchset/20260611132500.763528-1-abd.masalkhi@gmail.com?part=1
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
Link: https://patch.msgid.link/20260613182810.1317258-2-abd.masalkhi@gmail.com
Signed-off-by: Yu Kuai <yukuai@fygo.io>

docs: ipmi: Fix path of the "hotmod" module parameter

The correct path of the "hotmod" module parameter should be
/sys/module/ipmi_si/parameters/hotmod. Fix it.

Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev>
Message-ID: <20260620122747.7902-1-zenghui.yu@linux.dev>
Signed-off-by: Corey Minyard <corey@minyard.net>

smb: client: refactor ACL setting control flow in id_mode_to_cifs_acl()

Refactor the control flow in id_mode_to_cifs_acl() to reduce nesting and
prevent error code overwriting.

Instead of wrapping the call to ops->set_acl() in a conditional block,
introduce early exits (goto id_mode_to_cifs_acl_exit) when build_sec_desc()
fails or ops->set_acl is NULL. This ensures that any actual error returned
by build_sec_desc() is not overwritten with -EOPNOTSUPP.

Signed-off-by: Ralph Boehme <slow@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

Merge tag 'for-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply

Pull power supply and reset updates from Sebastian Reichel:
"Power-supply drivers:

   - New EC driver providing battery info for Microsoft Surface RT

   - New driver for battery charger in Samsung S2M PMICs

   - Rework max17042 driver

   - sysfs control for bd71828 auto input current limitation

  All over:

   - Use named fields for struct platform_device_id and of_device_id
     entries

   - Misc small cleanups and fixes"

* tag 'for-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (33 commits)
  Documentation: ABI: sysfs-class-reboot-mode-reboot_modes: fix doc warnings
  power: supply: charger-manager: fix refcount leak in is_full_charged()
  power: supply: core: fix supplied_from allocations
  power: supply: max17042_battery: Use modern PM ops to clear up warning
  power: supply: add support for Samsung S2M series PMIC charger device
  power: supply: Add support for Surface RT battery and charger
  dt-bindings: embedded-controller: Document Surface RT EC
  power: supply: bd71828: sysfs for auto input current limitation
  power: supply: cpcap-charger: include missing <linux/property.h>
  power: supply: cros_charge-control: Move MODULE_DEVICE_TABLE next to the table itself
  power: supply: ab8500_fg: Fix typos in comments
  power: supply: Use named initializers for arrays of i2c_device_data
  power: supply: Remove unused jz4740-battery.h
  power: reset: st-poweroff: Use of_device_get_match_data()
  power: supply: bq257xx: Add fields for 'charging' and 'overvoltage' states
  power: supply: bq257xx: Consistently use indirect get/set helpers
  power: supply: bq257xx: Make the default current limit a per-chip attribute
  power: supply: bq257xx: Fix VSYSMIN clamping logic
  power: supply: cpcap-battery: Fix missing nvmem_device_put() causing reference leak
  power: supply: max17042: fix OF node reference imbalance
  ...

Merge tag 'strncpy-removal-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull strncpy removal from Kees Cook:

- Remove the per-arch strncpy implementations in alpha, m68k, powerpc,
   x86, and xtensa

- Remove strncpy API

   Over the last 6 years working on strncpy removal there were 362
   commits by 70 contributors. Folks with more than 1 commit were:

    211  Justin Stitt <justinstitt@google.com>
     22  Xu Panda <xu.panda@zte.com.cn>
     21  Kees Cook <kees@kernel.org>
     17  Thorsten Blum <thorsten.blum@linux.dev>
     12  Arnd Bergmann <arnd@arndb.de>
      4  Pranav Tyagi <pranav.tyagi03@gmail.com>
      4  Lee Jones <lee@kernel.org>
      2  Steven Rostedt <rostedt@goodmis.org>
      2  Sam Ravnborg <sam@ravnborg.org>
      2  Marcelo Moreira <marcelomoreira1905@gmail.com>
      2  Krzysztof Kozlowski <krzk@kernel.org>
      2  Kalle Valo <kvalo@kernel.org>
      2  Jaroslav Kysela <perex@perex.cz>
      2  Daniel Thompson <danielt@kernel.org>
      2  Andrew Lunn <andrew@lunn.ch>

* tag 'strncpy-removal-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  string: Remove strncpy() from the kernel
  xtensa: Remove arch-specific strncpy() implementation
  x86: Remove arch-specific strncpy() implementation
  powerpc: Remove arch-specific strncpy() implementation
  m68k: Remove arch-specific strncpy() implementation
  alpha: Remove arch-specific strncpy() implementation

ieee802154: allow legacy LLSEC ADD/DEL ops to pass strict validation

The LLSEC ADD/DEL doit handlers under the legacy IEEE802154_NL family
consume IEEE802154_ATTR_LLSEC_KEY_BYTES and
IEEE802154_ATTR_LLSEC_KEY_USAGE_COMMANDS, both declared in
net/ieee802154/nl_policy.c as bare length entries with no .type
(defaulting to NLA_UNSPEC). Generic netlink strict validation rejects
all NLA_UNSPEC attributes via validate_nla(), so every LLSEC_ADD_KEY,
LLSEC_DEL_KEY, LLSEC_ADD_DEV, LLSEC_DEL_DEV, LLSEC_ADD_DEVKEY,
LLSEC_DEL_DEVKEY, LLSEC_ADD_SECLEVEL, and LLSEC_DEL_SECLEVEL request
fails at the dispatcher with "Unsupported attribute" before reaching
the handler.

The doit path has been silently dead since strict validation became
the default for genl families that do not opt out. The dump path is
unaffected because dump requests carry no LLSEC attributes to
validate, which is why the LLSEC_LIST_KEY read remained reachable
(patch 1/2). Introduce IEEE802154_OP_RELAXED() mirroring
IEEE802154_OP() but with .validate = GENL_DONT_VALIDATE_STRICT, and
use it for the eight legacy LLSEC mutate ops so admin-driven LLSEC
configuration via the legacy interface works again.

Fixes: 3e9c156e2c21 ("ieee802154: add netlink interfaces for llsec")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Link: https://lore.kernel.org/20260520141640.1149513-3-michael.bommarito@gmail.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>

ieee802154: admin-gate legacy LLSEC dump operations

In net/ieee802154/netlink.c, the legacy IEEE802154_NL family ops table
builds the LLSEC dump entries (LLSEC_LIST_KEY, LLSEC_LIST_DEV,
LLSEC_LIST_DEVKEY, LLSEC_LIST_SECLEVEL) with IEEE802154_DUMP() which
sets no .flags, so generic netlink runs them ungated. The modern
nl802154 family admin-gates the equivalent reads via
NL802154_CMD_GET_SEC_KEY and friends with .flags = GENL_ADMIN_PERM.

Any local uid that can open AF_NETLINK / NETLINK_GENERIC can resolve
the "802.15.4 MAC" family and dump LLSEC_LIST_KEY on any wpan netdev
that has an LLSEC key installed; the dump handler writes the raw
16-byte AES-128 key bytes (IEEE802154_ATTR_LLSEC_KEY_BYTES, copied
verbatim from struct ieee802154_llsec_key.key) into the reply.
Recovering the AES key compromises 802.15.4 LLSEC link confidentiality
and authenticity, since LLSEC uses CCM* and the same key authenticates
and encrypts frames.

Impact: any local uid with no capabilities can read the raw 16-byte
AES-128 LLSEC key from the kernel keytable on any wpan netdev that has
an administrator-installed LLSEC key, by issuing an LLSEC_LIST_KEY
dump on the legacy IEEE802154_NL generic-netlink family.

Introduce IEEE802154_DUMP_PRIV() mirroring IEEE802154_DUMP() but
setting .flags = GENL_ADMIN_PERM, and use it for the four LLSEC dump
entries. LIST_PHY and LIST_IFACE retain IEEE802154_DUMP() because the
modern nl802154 family exposes their equivalents to unprivileged
readers by design (NL802154_CMD_GET_WPAN_PHY and
NL802154_CMD_GET_INTERFACE carry "can be retrieved by unprivileged
users" annotations).

Fixes: 3e9c156e2c21 ("ieee802154: add netlink interfaces for llsec")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Link: https://lore.kernel.org/20260520141640.1149513-2-michael.bommarito@gmail.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>

mac802154: Prevent overwrite return code in mac802154_perform_association()

When assoc_status not equal to IEEE802154_ASSOCIATION_SUCCESSFUL, the
return value assigned to either "-ERANGE" or "-EPERM" but this return
value will be overwritten to 0 after exiting the conditional scope.
So, jump to clear_assoc label to preserve the return value when
assoc_status not equal to IEEE802154_ASSOCIATION_SUCCESSFUL.

This is reported by Coverity Scan as "Unused value".

Fixes: fefd19807fe9 ("mac802154: Handle associating")
Signed-off-by: Robertus Diawan Chris <robertusdchris@gmail.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/20260602054133.470293-1-robertusdchris@gmail.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>

ieee802154: fix kernel-infoleak in dgram_recvmsg()

KMSAN reported a kernel-infoleak in move_addr_to_user():

BUG: KMSAN: kernel-infoleak in instrument_copy_to_user
include/linux/instrumented.h:131 [inline]
BUG: KMSAN: kernel-infoleak in _inline_copy_to_user
include/linux/uaccess.h:205 [inline]
BUG: KMSAN: kernel-infoleak in _copy_to_user+0xcc/0x120
lib/usercopy.c:26
instrument_copy_to_user include/linux/instrumented.h:131 [inline]
_inline_copy_to_user include/linux/uaccess.h:205 [inline]
_copy_to_user+0xcc/0x120 lib/usercopy.c:26
copy_to_user include/linux/uaccess.h:236 [inline]
move_addr_to_user+0x2e7/0x440 net/socket.c:302
____sys_recvmsg+0x232/0x610 net/socket.c:2925
...
Uninit was stored to memory at:
ieee802154_addr_to_sa include/net/ieee802154_netdev.h:369 [inline]
dgram_recvmsg+0xa09/0xbe0 net/ieee802154/socket.c:739

The issue occurs because the `pan_id` field of `struct ieee802154_addr`
is left uninitialized when the address mode is `IEEE802154_ADDR_NONE`.
The execution flow is as follows:

1. `__ieee802154_rx_handle_packet()` declares a local `struct
ieee802154_hdr hdr` on the stack.
2. `ieee802154_hdr_pull()` calls `ieee802154_hdr_get_addr()` to parse
the source and destination addresses into this structure.
3. If the address mode is `IEEE802154_ADDR_NONE`,
`ieee802154_hdr_get_addr()` previously only set the `mode` field,
leaving the `pan_id` field containing uninitialized stack memory.
4. This uninitialized `pan_id` is later copied into a `struct
sockaddr_ieee802154` in `dgram_recvmsg()` via `ieee802154_addr_to_sa()`.
5. Finally, `move_addr_to_user()` copies the socket address structure to
user space, leaking the uninitialized bytes.

Fix this by using `memset` to zero out the address structure in
`ieee802154_hdr_get_addr()` when the mode is `IEEE802154_ADDR_NONE`.

Fixes: 94b4f6c21cf5 ("ieee802154: add header structs with endiannes and operations")
Assisted-by: Gemini:gemini-3.1-pro-preview Gemini:gemini-3-flash-preview syzbot
Reported-by: syzbot+346474e3bf0b26bd3090@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=346474e3bf0b26bd3090
Link: https://syzkaller.appspot.com/ai_job?id=a507a109-d683-4a2c-bc03-93394f491b17
Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/62795fd9-fc0c-48eb-bb82-05ffc5a57104@mail.kernel.org
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>

Merge tag 'exfat-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

Pull exfat updates from Namjae Jeon:

- Convert exfat buffered and direct I/O to the iomap infrastructure

- Add the supporting block mapping changes needed for that conversion,
   including multi-cluster allocation, byte-based cluster mapping
   helpers

- Support SEEK_HOLE/SEEK_DATA and swapfile activation through iomap

- Fix damaged upcase-table handling so a zero-sized table does not lead
   to an infinite loop

- Fix a potential use-after-free in exfat_find_dir_entry()

- Bound filename-entry advancement in exfat_find_dir_entry()

- Preserve benign secondary entries during rename and move

- Serialize truncate against in-flight direct I/O

- Simplify exfat_lookup()

- Replace unsafe arithmetic macros with static inline helpers

* tag 'exfat-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: bound uniname advance in exfat_find_dir_entry()
  exfat: add swap_activate support
  exfat: preserve benign secondary entries during rename and move
  exfat: serialize truncate against in-flight DIO
  exfat: add support for SEEK_HOLE and SEEK_DATA in llseek
  exfat: add iomap direct I/O support
  exfat: add iomap buffered I/O support
  exfat: fix implicit declaration of brelse()
  exfat: add data_start_bytes and exfat_cluster_to_phys_bytes() helper
  exfat: add support for multi-cluster allocation
  exfat: add exfat_file_open()
  exfat: add balloc parameter to exfat_map_cluster() for iomap support
  exfat: replace unsafe macros with static inline functions
  exfat: simplify exfat_lookup()
  exfat: fix potential use-after-free in exfat_find_dir_entry()
  exfat: fix handling of damaged volume in exfat_create_upcase_table()

mac802154: llsec: add skb_cow_data() before in-place crypto

llsec_do_encrypt_unauth(), llsec_do_encrypt_auth(),
llsec_do_decrypt_unauth(), and llsec_do_decrypt_auth() all perform
in-place cryptographic transformations on skb data.  They build a
scatterlist with sg_init_one() pointing into the skb's linear data area
and then pass the same scatterlist as both src and dst to the crypto API
(e.g. crypto_skcipher_encrypt/decrypt, crypto_aead_encrypt/decrypt).

On the RX path, __ieee802154_rx_handle_packet() clones the received skb
before handing it to each subscriber via ieee802154_subif_frame().  The
cloned skb shares the same underlying data buffer via reference
counting.  When llsec_do_decrypt() subsequently modifies this shared
buffer in place, it corrupts data that other clones -- potentially
belonging to other sockets or subsystems -- still reference.

On the TX path, similar data sharing can occur when an skb's head has
been cloned (skb_cloned() returns true).

The fix is to call skb_cow_data() before performing any in-place crypto
operation.  skb_cow_data() ensures that the skb's data area is not
shared: if the skb head is cloned or the data spans multiple fragments,
it copies the data into a private buffer that can be safely modified in
place.  This is the same pattern used by:

  - ESP (net/ipv4/esp4.c, net/ipv6/esp6.c)
  - MACsec (drivers/net/macsec.c)
  - WireGuard (drivers/net/wireguard/receive.c)
  - TIPC (net/tipc/crypto.c)

Without this guard, in-place crypto on shared skb data leads to:
  - Silent data corruption of other skb clones
  - Use-after-free when the crypto API scatterwalk writes through a
    page that has already been freed by another clone's kfree_skb()
  - Kernel crashes under concurrent 802.15.4 traffic with security
    enabled (KASAN/KMSAN reports slab-use-after-free)

Found by 0sec (https://0sec.ai) using automated source analysis.

Fixes: 4c14a2fb5d14 ("mac802154: add llsec decryption method")
Fixes: 03556e4d0dbb ("mac802154: add llsec encryption method")
Cc: stable@vger.kernel.org
Reported-by: Doruk Tan Ozturk <doruk@0sec.ai>
Closes: https://lore.kernel.org/linux-wpan/20260525161806.96158-1-doruk@0sec.ai/
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Doruk Tan Ozturk <doruk@0sec.ai>
Closes: <link to your mail on lore>
Link: https://lore.kernel.org/20260526183726.56100-1-doruk@0sec.ai
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>

ieee802154: ca8210: fix pointer truncation in kfifo on 64-bit

ca8210_test_int_driver_write() and ca8210_test_int_user_read() exchange
a kmalloc'd buffer pointer through a struct kfifo, but pass a literal
'4' as the byte count to kfifo_in()/kfifo_out().

This is correct on 32-bit (pointer = 4 bytes), but on 64-bit only the
low 4 bytes of the 8-byte pointer are written into the FIFO. The reader
then reads back 4 bytes into an 8-byte local pointer variable, leaving
the upper 4 bytes uninitialized stack data. The first dereference of
the reconstructed pointer (fifo_buffer[1]) accesses an arbitrary kernel
address and generally results in an oops.

Use sizeof(fifo_buffer) so the byte count matches pointer width on every
architecture.

The driver has no architecture restriction in Kconfig, so any 64-bit
build with CONFIG_IEEE802154_CA8210_DEBUGFS=y is exposed. Issue has
been latent since the driver was added in 2017 because it is most
commonly deployed on 32-bit MCUs.

Found via a custom Coccinelle semantic patch hunting for short-byte
kfifo I/O on byte-mode kfifos used to shuttle pointers.

Fixes: ded845a781a5 ("ieee802154: Add CA8210 IEEE 802.15.4 device driver")
Cc: stable@vger.kernel.org
Signed-off-by: Shitalkumar Gandhi <shitalkumar.gandhi@cambiumnetworks.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/20260520105750.30144-1-shitalkumar.gandhi@cambiumnetworks.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>

Merge tag 'ntfs-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/ntfs

Pull ntfs updates from Namjae Jeon:

- Harden handling of malformed on-disk metadata.

   This adds stricter validation for attributes, attribute lists, index
   roots and entries, EA entries, mapping pairs, and $LogFile restart
   areas. These changes fix several out-of-bounds access, integer
   overflow, and inconsistent metadata handling issues.

- Prevent a writeback deadlock involving extent MFT records

- Fix resource leaks in fill_super() failure paths and the name cache

- Serialize volume label access and improving its error handling

- Fix mapping-pairs decoding bounds and LCN overflow checks

- Keep resident index root metadata consistent during resize

- Fix the reported size of symbolic links

- Avoid an unnecessary allocation for resident inline data

- Add support for following and creating Windows native symbolic links.

   Relative links, absolute links, and junctions are handled, with new
   mount options controlling native symlink creation and absolute target
   translation. The existing WSL symlink behavior remains the default.

- The unsupported quota code is removed, along with several smaller
   cleanups

* tag 'ntfs-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/ntfs: (39 commits)
  docs/fs/ntfs: add mount options to support Windows native symbolic links
  ntfs: support creating Windows native symlinks
  ntfs: clean up target name conversion for WSL symlinks
  ntfs: add native_symlink mount option
  ntfs: support following Windows native symlink with absolute paths
  ntfs: support following Windows native symlink with relative paths
  ntfs: fix incorrect size of symbolic link
  ntfs: use direct pointer for inline data to avoid redundant allocation
  ntfs: validate resident index root values on lookup
  ntfs: update index root allocated size before shrink
  ntfs: grow index root value before reparent header update
  ntfs: reject non-resident records for resident-only attributes
  ntfs: fix u16 truncation of restart-area length check
  ntfs: bound the attribute-list entry in ntfs_read_inode_mount()
  ntfs: bound the look-ahead attribute-list entry in ntfs_external_attr_find()
  ntfs: validate resident attribute lists and harden the validator
  ntfs: validate resident volume name values on lookup
  ntfs: reinit search context before volume information lookup
  ntfs: do not replace volume name after lookup errors
  ntfs: validate attribute values on lookup
  ...

ieee802154: ca8210: fix cas_ctl leak on spi_async failure

ca8210_spi_transfer() allocates cas_ctl with kzalloc_obj(GFP_ATOMIC)
and relies entirely on the SPI completion callback
ca8210_spi_transfer_complete() to free it.

The spi_async() API only invokes the completion callback on successful
submission.  On failure it returns a negative error code without ever
queuing the callback, which leaves cas_ctl and its embedded spi_message
and spi_transfer orphaned.  Every kfree(cas_ctl) in the driver is
inside the completion callback, so there is no other reclamation path.

ca8210_spi_transfer() is called from ca8210_spi_exchange(), the
interrupt handler ca8210_interrupt_handler(), and from the retry path
inside the completion callback itself.  The exchange and interrupt
handler paths loop on -EBUSY, so under sustained SPI bus contention
every retry iteration leaks a fresh cas_ctl (~600 bytes per
occurrence).

Fix it by freeing cas_ctl on the spi_async() error path.  While here,
correct the misleading error string: the function calls spi_async(),
not spi_sync().

Fixes: ded845a781a5 ("ieee802154: Add CA8210 IEEE 802.15.4 device driver")
Cc: stable@vger.kernel.org
Signed-off-by: Shitalkumar Gandhi <shitalkumar.gandhi@cambiumnetworks.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/20260421073259.2259783-1-shitalkumar.gandhi@cambiumnetworks.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>

sched/mmcid: Fix OOB clear_bit when CID is MM_CID_UNSET in fixup path

In mm_cid_fixup_cpus_to_tasks(), when rq->curr has the target mm and
mm_cid.active is set, the CID is checked with cid_in_transit() before
setting the transition bit.  In per-CPU mode a newly forked or exec'd
task can be running with mm_cid.cid == MM_CID_UNSET because CIDs are
assigned lazily on schedule-in.  With cid_in_transit() the guard passes
for MM_CID_UNSET (no transit bit), converts it to MM_CID_UNSET |
MM_CID_TRANSIT and stores it back; later mm_cid_schedout() feeds this
to clear_bit() with MM_CID_UNSET as the bit number, triggering an
out-of-bounds write.

Symptoms: this is genuine memory corruption, but a bounded out-of-bounds
write, not an arbitrary one.  MM_CID_UNSET is the fixed sentinel BIT(31),
so once the bad value reaches mm_cid_schedout() the cid_from_transit_cid()
strip leaves MM_CID_UNSET, which fails the "cid < max_cids" convergence
test and falls into mm_drop_cid() -> clear_bit(MM_CID_UNSET,
mm_cidmask(mm)).  The cid bitmap is embedded in the mm_struct slab object
(after cpu_bitmap and mm_cpus_allowed) and is only num_possible_cpus()
bits wide, so clearing bit 31 is a deterministic OOB bit-clear at a
fixed offset of 2^31 / 8 == 256 MiB past the bitmap base.  The address is
not attacker-influenced (fixed sentinel -> fixed offset) and the op only
clears a single bit; what sits 256 MiB further along the direct map is
whatever kernel object happens to live there, so this corrupts one bit of
unpredictable kernel memory -- it is not an arbitrary-address or
arbitrary-value write.

It triggers only in per-CPU CID mode, when a CPU is running an active
task of the target mm whose cid is still MM_CID_UNSET -- the
fork()/execve() window before that task's next schedule-in assigns it a
real CID -- and a per-CPU -> per-task fixup walks over it (the mode
fallback driven by a thread exit, sched_mm_cid_exit(), or by the deferred
max_cids recompute in mm_cid_work_fn()).

In practice syzkaller surfaced it as a KASAN use-after-free reported in
__schedule -> mm_cid_switch_to, where the offending clear_bit() is inlined
via mm_cid_schedout() -> mm_drop_cid().

Guard the transition-bit assignment against MM_CID_UNSET, in addition to
the existing cid_in_transit() check, so the bit is only set on a genuine
task-owned CID.  A CPU-owned (MM_CID_ONCPU) CID of a running active task
is handled by the cid_on_cpu(pcp->cid) branch above and never reaches
this path, so excluding MM_CID_UNSET (and the already-transitioning case)
is sufficient.

Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Assisted-by: Claude:claude-opus-4-8 syzkaller
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260616203818.1516263-1-riel@surriel.com

ieee802154: Remove WARN_ON() in cfg802154_pernet_exit()

There's no need to call WARN_ON() in cfg802154_pernet_exit(), since
every point of failure in cfg802154_switch_netns() is covered with
WARN_ON(), so remove it.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: 66e5c2672cd1 ("ieee802154: add netns support")
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Ivan Abramov <i.abramov@mt-integration.ru>
Link: https://lore.kernel.org/20250403101935.991385-4-i.abramov@mt-integration.ru
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>

ieee802154: Avoid calling WARN_ON() on -ENOMEM in cfg802154_switch_netns()

It's pointless to call WARN_ON() in case of an allocation failure in
dev_change_net_namespace() and device_rename(), since it only leads to
useless splats caused by deliberate fault injections, so avoid it.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: 66e5c2672cd1 ("ieee802154: add netns support")
Reported-by: syzbot+e0bd4e4815a910c0daa8@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/000000000000f4a1b7061f9421de@google.com/#t
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Ivan Abramov <i.abramov@mt-integration.ru>
Link: https://lore.kernel.org/20250403101935.991385-3-i.abramov@mt-integration.ru
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>

ieee802154: Restore initial state on failed device_rename() in cfg802154_switch_netns()

Currently, the return value of device_rename() is not acted upon.

To avoid an inconsistent state in case of failure, roll back the changes
made before the device_rename() call.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: 66e5c2672cd1 ("ieee802154: add netns support")
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Ivan Abramov <i.abramov@mt-integration.ru>
Link: https://lore.kernel.org/20250403101935.991385-2-i.abramov@mt-integration.ru
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>

Merge tag 'landlock-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull landlock updates from Mickaël Salaün:
"This adds new Landlock access rights to control UDP bind and
  connect/send operations, and a new "quiet" feature to mute specific
  specific audit logs (and other future observability events).

  A few commits also fix Landlock issues"

* tag 'landlock-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: (24 commits)
  selftests/landlock: Add tests for invalid use of quiet flag
  selftests/landlock: Add tests for quiet flag with scope
  selftests/landlock: Add tests for quiet flag with net rules
  selftests/landlock: Add tests for quiet flag with fs rules
  selftests/landlock: Replace hard-coded 16 with a constant
  samples/landlock: Add quiet flag support to sandboxer
  landlock: Suppress logging when quiet flag is present
  landlock: Add API support and docs for the quiet flags
  landlock: Add a place for flags to layer rules
  landlock: Add documentation for UDP support
  samples/landlock: Add sandboxer UDP access control
  selftests/landlock: Add tests for UDP send
  selftests/landlock: Add tests for UDP bind/connect
  landlock: Add UDP send+connect access control
  landlock: Add UDP bind() access control
  landlock: Fix unmarked concurrent access to socket family
  selftests/landlock: Explicitly disable audit in teardowns
  selftests/landlock: Test SCOPE_SIGNAL on the SIGIO/fowner pgid path
  landlock: Fix LANDLOCK_SCOPE_SIGNAL bypass on the SIGIO path
  landlock: Demonstrate best-effort allowed_access filtering
  ...

Merge tag 'for-next-keys-7.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull keys update from Jarkko Sakkinen:
"This contains only bug fixes"

* tag 'for-next-keys-7.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  keys: keyctl_pkey: replace BUG with return -EOPNOTSUPP
  keys: request_key: replace BUG with return -EINVAL
  keys: Pin request_key_auth payload in instantiate paths
  keys: prevent slab cache merging for key_jar
  keys: Replace strcpy(derived_buf, "AUTH_KEY") with strscpy(..., HASH_SIZE)
  KEYS: Use acquire when reading state in keyring search
  keys/trusted_keys: mark 'migratable' as __ro_after_init
  keys: use kmalloc_flex in user_preparse
  KEYS: trusted: Debugging as a feature
  KEYS: encrypted: Remove unnecessary selection of CRYPTO_RNG
  KEYS: fix overflow in keyctl_pkey_params_get_2()

ACPI: IPMI: Fix inverted interface check in ipmi_bmc_gone()

Before commit a1a69b297e47 ("ACPI / IPMI: Fix race caused by the
unprotected ACPI IPMI user"), ipmi_bmc_gone() skipped entries whose
interface number did not match the SMI being removed, then killed the
matching entry:

if (ipmi_device->ipmi_ifnum != iface)
continue;

__ipmi_dev_kill(ipmi_device);

That commit folded the removal block into the existing non-match test
while converting the object lifetime handling, but left the comparison
unchanged. The old != meant "continue past this entry"; after the
refactor it meant "kill this entry".

As a result, a single ACPI IPMI interface is never removed when its SMI
disappears. If multiple interfaces are tracked, the first interface
whose number differs from iface is removed instead, while the interface
that actually disappeared remains on driver_data.ipmi_devices. The
stale entry is not marked dead and can continue to be selected for ACPI
IPMI transactions. It can also prevent the same ACPI handle from being
registered again.

Change the comparison to == so ipmi_bmc_gone() removes exactly the
interface reported as gone by the SMI watcher. This restores the
pre-a1a69b297e47 behavior and is the correct interface matching logic.

Fixes: a1a69b297e47 ("ACPI / IPMI: Fix race caused by the unprotected ACPI IPMI user")
Signed-off-by: Xu Rao <raoxu@uniontech.com>
Link: https://patch.msgid.link/B486593E06E6F6E0+20260616093621.1039943-1-raoxu@uniontech.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Merge tag 'integrity-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull IMA updates from Mimi Zohar:

- Introduce IMA and EVM post-quantum ML-DSA signature support

   ML-DSA signature support for IMA and EVM is limited to sigv3
   signatures, which calculates and verifies a hash of a compact
   structure containing the file data/metadata hash, hash type, and hash
   algorithm. IMA and EVM still calculate the file data/metadata hashes
   respectively.

- Introduce support for removing IMA measurement list records stored in
   kernel memory

   The IMA measurement list can grow large depending on policy, but
   removing records breaks remote attestation, unless they are safely
   preserved and made available for attestation requests. Until
   environments are prepared to preserve the measurement records, a new
   CONFIG_IMA_STAGING Kconfig option is introduced to guard against
   deletion.

   Several approaches for removing measurement list records were
   evaluated but rejected due to filesystem constraints, the
   introduction of a new critical data record, and locking concerns. Two
   methods are being upstreamed: staged deletion with confirmation, and
   staged deletion of N records without confirmation. Both methods
   minimize the period during which new measurements are blocked from
   being appended to the measurement list by staging the measurement
   list.

   A comparison of the two methods is included in the documentation.

- Some code cleanup, and a couple of bug fixes

* tag 'integrity-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  doc: security: Add documentation of exporting and deleting IMA measurements
  ima: Support staging and deleting N measurements records
  ima: Add support for flushing the hash table when staging measurements
  ima: Add support for staging measurements with prompt
  ima: Introduce ima_dump_measurement()
  ima: Use snprintf() in create_securityfs_measurement_lists
  ima: Mediate open/release method of the measurements list
  ima: Introduce _ima_measurements_start() and _ima_measurements_next()
  ima: Introduce per binary measurements list type binary_runtime_size value
  ima: Introduce per binary measurements list type ima_num_records counter
  ima: Replace static htable queue with dynamically allocated array
  ima: Remove ima_h_table structure
  evm: terminate and bound the evm_xattrs read buffer
  integrity: Add support for sigv3 verification using ML-DSA keys
  integrity: Refactor asymmetric_verify for reusability
  integrity: Check that algo parameter is within valid range
  integrity: Check for NULL returned by asymmetric_key_public_key
  ima: return error early if file xattr cannot be changed
  ima: Fix sigv3 signature handling for EVM_IMA_XATTR_DIGSIG

ACPI: resource: Amend kernel-doc style

The functions are referred as func() in the kernel-doc. The % (percent)
character makes the rendering for constants as described in the respective
documentation. Amend all these.

Fixes: 8e345c991c8c ("ACPI: Centralized processing of ACPI device resources")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20260617090555.2648709-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI: sysfs: Fix path of module parameters in comments

The correct path of module parameters should be
/sys/module/acpi/parameters/xxx. Fix them.

Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev>
Link: https://patch.msgid.link/20260611142518.77343-1-zenghui.yu@linux.dev
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

thermal: intel: Fix dangling resources on thermal_throttle_online() failure

The function thermal_throttle_add_dev() may fail and abort a CPU hotplug
online operation. Since the failure occurs within the online callback,
thermal_throttle_online(), the CPU hotplug framework does not invoke the
corresponding offline callback. As a result, the hardware and software
resources set up during the failed operation are not torn down.

Since only thermal_throttle_add_dev() can fail, call it before setting up
the rest of the resources.

Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Link: https://patch.msgid.link/20260613-rneri-directed-therm-intr-v3-1-3a26d1e47fc8@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Merge tag 'mm-stable-2026-06-18-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

- "selftests/mm: clean up build output and verbosity" (Li Wang)

   Remove some noise from the MM selftests build

- "mm: Free contiguous order-0 pages efficiently" (Ryan Roberts)

   Speed up the freeing of a batch of 0-order pages by first scanning
   them for coalescing opportunities. This is applicable to vfree() and
   to the releasing of frozen pages

- "mm/damon: introduce DAMOS failed region quota charge ratio"
   (SeongJae Park)

   Address a DAMOS usability issue: The DAMOS quota often exhausts
   prematurely because it charges for all memory attempted, causing slow
   and inconsistent performance when actions fail on unreclaimable
   memory.

   To fix this, a new feature lets users set a smaller, flexible quota
   charge ratio (via a numerator and denominator) for failed regions.
   Since failed actions cause less overhead, reducing their quota cost
   ensures more predictable and efficient DAMOS processing

- "selftests/cgroup: improve zswap tests robustness and support large
   page sizes" (Li Wang)

   Fix various spurious failures and improves the overall robustness of
   the cgroup zswap selftests

- "fix MAP_DROPPABLE not supported errno" (Anthony Yznaga)

   Fix an issue in the mlock selftests on arm32

- "mm: huge_memory: clean up defrag sysfs with shared" (Breno Leitao)

   Some maintenance work in the huge_memory code

- "treewide: fixup gfp_t printks" (Brendan Jackman)

   Use the special vprintf() gfp_t conversion in various places

- "mm: Fix vmemmap optimization accounting and initialization" (Muchun
   Song)

   Fix several bugs in the vmemmap optimization, mainly around incorrect
   page accounting and memmap initialization in the DAX and memory
   hotplug paths. It also fixes pageblock migratetype initialization and
   struct page initialization for ZONE_DEVICE compound pages

- "mm/damon: repost non-hotfix reviewed patches in damon/next tree"

   A sprinkle of unrelated minor bugfixes for DAMON

- "mm: remove page_mapped()" (David Hildenbrand)

   Remove this function from the tree, replacing it with folio_mapped()

- "mm/damon: let DAMON be paused and resumed" (SeongJae Park)

   Allow DAMON to be paused and resumed without losing its current state

- "kasan: hw_tags: Disable tagging for stack and page-tables" (Muhammad
   Usama Anjum)

   Simplify and speed up kasan by removing its ineffective tagging of
   stacks and page tables

- "mm/damon/reclaim,lru_sort: monitor all system rams by default"
   (SeongJae Park)

   Simplify deployment on diverse hardware like NUMA systems by updating
   DAMON_RECLAIM and DAMON_LRU_SORT to automatically monitor the
   physical address range covering all System RAM areas by default,
   replacing the overly restrictive behavior that only targeted the
   single largest memory block to save on negligible overhead

- "mm/damon/sysfs: document filters/ directory as deprecated" (SeongJae
   Park)

   Update some DAMON docs

- "mm: use spinlock guards for zone lock" (Dmitry Ilvokhin)

   Switch zone->lock handling over to using the guard() mechanisms

- "mm/filemap: tighten mmap_miss hit accounting" (fujunjie)

   Fix a flaw where the mmap_miss counter over-credited page cache hits
   during fault-arounds and page-fault retries. This results in
   significant reduction of redundant synchronous mmap readahead I/O,
   drastically cutting down execution time and gigabytes read for sparse
   random or strided memory access workloads

- "selftests/cgroup: Fix false positive failures in test_percpu_basic"
   (Li Wang)

   Fix a couple of false-positives in the cgroup kmem selftests

- "mm/damon/reclaim: support monitoring intervals auto-tuning"
   (SeongJae Park)

   Add a new parameter to DAMON permitting DAMON_RECLAIM to
   automatically tune DAMON's sampling and aggregation intervals

- "mm/damon/stat: add kdamond_pid parameter" (SeongJae Park)

   Change DAMON_STAT to provide the pid of its kdamond

- "mm/kmemleak: dedupe verbose scan output" (Breno Leitao)

   Remove large amounts of duplicated backtraces from the verbose-mode
   kmemleak output

- "mm: remove CONFIG_HAVE_BOOTMEM_INFO_NODE (Part 1)" (David
   Hildenbrand)

   Reduce our use of CONFIG_HAVE_BOOTMEM_INFO_NODE, with a view to
   removing it entirely in a later series

- "mm/damon: validate min_region_size to be power of 2" (Liew Rui Yan)

   Prevent users from passing a non-power-of-2 value of `addr_unit', as
   this later results in undesirable behavior

- "mm: document read_pages and simplify usage" (Frederick Mayle)

- "tools/mm/page-types: Fix misc bugs" (Ye Liu)

   Fix three issues in tools/mm/page-types.c

- "mm: misc cleanups from __GFP_UNMAPPED series" (Brendan Jackman)

   Implement several cleanups in the page allocator and related code

- "mm, swap: swap table phase IV: unify allocation" (Kairui Song)

   Unify the allocation and charging of anon and shmem swap in folios,
   provides better synchronization, consolidates the metadata
   management, hence dropping the static array and map, and improves
   performance

- "mm/damon: introduce data attributes monitoring" (SeongJae Park(

   Extend DAMON to monitor general data attributes other than accesses

- "mm/vmalloc: free unused pages on vrealloc() shrink" (Shivam Kalra)

   Implement the TODO in vrealloc() to unmap and free unused pages when
   shrinking across a page boundary

- "mm/damon: documentation and comment fixes" (niecheng)

- "remove mmap_action success, error hooks" (Lorenzo Stoakes)

   Eliminate custom hooks from mmap_action by removing the problematic
   success_hook which allowed drivers to improperly access uninitialized
   VMAs. It replaces the error_hook with a simple error-code field and
   updates the memory char driver accordingly

- "mm/damon: minor improvements for code readability and tests"
   (SeongJae Park)

- "mm/damon: fix macro arguments and clarify quota goals doc" (Maksym
   Shcherba)

- "userfaultfd: merge fs/userfaultfd.c into mm/userfaultfd.c" (Mike
   Rapoport)

- "mm/mglru: improve reclaim loop and dirty folio" (Kairui Song and
   others)

   Clean up and slightly improves MGLRU's reclaim loop and dirty
   writeback handling. Large performance improvements are measured

- "use vma locks for proc/pid/{smaps|numa_maps} reads" (Suren
   Baghdasaryan)

   Use per-vma locks when reading /proc/pid/smaps and numa_maps similar
   to reduce contention on central mmap_lock

- "refactors thpsize_shmem_enabled_store() and thpsize_shmem_enabled_show()"
   (Ran Xiaokai)

   Some cleanup work in the THP code

- "selftests/memfd: fix compilation warnings" (Konstantin Khorenko)

   Fix a few build glitches in the memfd selftest code.

- "memcg: shrink obj_stock_pcp and cache multiple objcgs" (Shakeel
   Butt)

   Resolve a 68% performance regression caused by NUMA-node cache
   thrashing around struct obj_stock_pcp by shrinking its existing
   fields and expanding it into a multi-slot array that caches up to
   five obj_cgroup pointers per CPU, allowing per-node variants of the
   same memcg to coexist within a single 64-byte cache line.

- "zram: writeback fixes" (Sergey Senozhatsky)

   address a couple of unrelated zram writeback issues

- "mm: switch THP shrinker to list_lru" (Johannes Weiner)

   Resolve NUMA-awareness issues and streamlines callsite interaction by
   refactoring and extending the list_lru API to completely replace the
   complex, open-coded deferred split queue for Transparent Huge Pages

- "mm: improve large folio readahead for exec memory" (Usama Arif)

   Improve large-folio readahead on systems like 64K-page arm64 by
   preventing the mmap_miss check from permanently disabling
   target-oriented VM_EXEC readahead, and by generalizing the
   force_thp_readahead gate to support mappings with any usefully large
   maximum folio order under the cache cap.

- "userfaultfd/pagemap: pre-existing fixes" (Kiryl Shutsemau)

   Fix a bunch of minor issues in the userfaultfd/pagemap, all of which
   were flagged by Sashiko review of proposed new material

- "mm/sparse-vmemmap: Provide generic vmemmap_set_pmd() and
   vmemmap_check_pmd()" (Muchun Song)

   Provide generic versions of these two functions so the four
   arch-specific implementations can be removed.

- "mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap
   device" (Youngjun Park)

   Address a uswsusp-vs-swapoff race and reduces the swap device
   reference taking/releasing frequency.

- "mm/hmm: A fix and a selftest" (Dev Jain)

* tag 'mm-stable-2026-06-18-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
  selftests/mm/hmm-tests: test pagemap reads of PMD device-private entries
  fs/proc/task_mmu: do not warn on seeing non-migration pmd entry
  lib/test_hmm: check alloc_page_vma() return value and handle OOM
  mm/compaction: cap compact_gap() at COMPACT_CLUSTER_MAX
  mm/swap: remove redundant swap device reference in alloc/free
  mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device
  mm/filemap: use folio_next_index() for start
  vmalloc: fix NULL pointer dereference in is_vm_area_hugepages()
  sparc/mm: drop vmemmap_check_pmd helper and use generic code
  loongarch/mm: drop vmemmap_check_pmd helper and use generic code
  riscv/mm: drop vmemmap_pmd helpers and use generic code
  arm64/mm: drop vmemmap_pmd helpers and use generic code
  mm/sparse-vmemmap: provide generic vmemmap_set_pmd() and vmemmap_check_pmd()
  rust: page: mark Page::nid as inline
  userfaultfd: build __VMA_UFFD_FLAGS from config-gated masks
  userfaultfd: gate must_wait writability check on pte_present()
  mm/huge_memory: preserve pmd_swp_uffd_wp on device-private PMD downgrade
  fs/proc/task_mmu: fix hugetlb self-deadlock in pagemap_scan_pte_hole()
  fs/proc/task_mmu: use huge_page_size() in pagemap_scan_hugetlb_entry()
  fs/proc/task_mmu: fix make_uffd_wp_huge_pte() prot-update race
  ...

regulator: pca9450: Correct default t_off_deb for PCA9451A/PCA9452

The PMIC PCA9451A and PCA9452 have a default power-off debounce time of
2ms according to their datasheet, while PCA9450A and PCA9450BC use 120us.

Add default_t_off_deb field to struct pca9450 to support per-variant
default configuration when the device tree property is not specified.

Datasheet reference links:
- PCA9451A Rev.2.1: https://www.nxp.com/docs/en/data-sheet/PCA9451A.pdf
- PCA9452 Rev.1.0: https://www.nxp.com/docs/en/data-sheet/PCA9452.pdf

Signed-off-by: Joy Zou <joy.zou@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260618-b4-regulator-opt-v1-1-c43b1f62aaf6@oss.nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>

spi: dw: Add support for snps,dwc-ssi-2.00a

Add a new compatible entry "snps,dwc-ssi-2.00a" for the Synopsys
DesignWare SSI controller version 2.00a. This variant uses the same
initialization routine as snps,dwc-ssi-1.01a (dw_spi_hssi_init).

Signed-off-by: Changhuang Liang <changhuang.liang@starfivetech.com>
Link: https://patch.msgid.link/20260619143443.22267-3-changhuang.liang@starfivetech.com
Signed-off-by: Mark Brown <broonie@kernel.org>

spi: dt-bindings: snps,dw-apb-ssi: Add starfive,jhb100-spi

Add a new compatible string "starfive,jhb100-spi" for the StarFive
JHB100 SPI, it based on the Synopsys DesignWare SSI version 2.00a,
uses snps,dwc-ssi-2.00a as the primary fallback and snps,dwc-ssi-1.01a
as the secondary fallback.

Signed-off-by: Changhuang Liang <changhuang.liang@starfivetech.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260619143443.22267-2-changhuang.liang@starfivetech.com
Signed-off-by: Mark Brown <broonie@kernel.org>