git.ipfire.org Git - thirdparty/kernel/linux.git/log

rv: Prevent in-flight per-task handlers from using invalid slots

Per-task monitors use a slot in the task_struct->rv[] array and store
that locally (e.g. task_mon_slot), this slot is returned during the
destruction process but currently hanlers can be running while that slot
is returning and this race may lead to accessing an invalid slot.

Synchronise with all in-flight tracepoint handlers using
tracepoint_synchronize_unregister() before returning the slot.

Fixes: f5587d1b6ec9 ("rv: Add Hybrid Automata monitor type")
Fixes: a9769a5b9878 ("rv: Add support for LTL monitors")
Suggested-by: Wen Yang <wen.yang@linux.dev>
Reviewed-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20260601153840.124372-4-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

rv: Reset per-task DA monitors before releasing the slot

Per-task monitors use task_mon_slot to determine which slot in the array
to use for the monitor. During destruction, this slot is returned but
this is done before resetting the monitor. As a result, the monitor's
reset is in fact resetting a slot that is outside of the array
(RV_PER_TASK_MONITOR_INIT).

Release the slot only after the reset to avoid out-of-bound memory
access.

Fixes: f5587d1b6ec93 ("rv: Add Hybrid Automata monitor type")
Cc: stable@vger.kernel.org
Suggested-by: Wen Yang <wen.yang@linux.dev>
Reviewed-by: Wen Yang <wen.yang@linux.dev>
Reviewed-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20260601153840.124372-3-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

rv: Fix __user specifier usage in extract_params()

The attributes variables extracted from syscalls in the helper are both
defined with the __user specifier although only the actual pointer to
user data should be marked.

Remove the __user specifier from attr.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202604150820.Ny143u6X-lkp@intel.com
Fixes: b133207deb72 ("rv: Add nomiss deadline monitor")
Reviewed-by: Wen Yang <wen.yang@linux.dev>
Reviewed-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20260601153840.124372-2-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

pmdomain: Merge branch fixes into next

Merge the pmdomain fixes for v7.1-rc[n] into the next branch, to allow
them to get tested together with the pmdomain changes that are targeted
for the next release.

Signed-off-by: Ulf Hansson <ulfh@kernel.org>

pmdomain: mediatek: mfg: move __packed after struct name to fix kernel-doc

The kernel-doc parser cannot parse 'struct __packed mtk_mfg_opp_entry {'.
Move __packed to the closing brace, which is the more common kernel style.

Assisted-by: Opencode:Big-pickle
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

pmdomain: qcom: rpmpd: Add Shikra RPM Power Domains

Add RPM power domain support for Shikra, reusing SM6125 power
domains with RPM_SMD_LEVEL_TURBO_NO_CPR as the max state.

Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Rakesh Kota <rakesh.kota@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

pmdomain: qcom: rpmhpd: Add power domains for Nord SoC

Add RPMh power domains required for Nord SoC. This includes
new definitions for power domains supplying GFX1 and NSP3 subsystem.

Co-developed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Kamal Wadhwa <kamal.wadhwa@oss.qualcomm.com>
Signed-off-by: Shawn Guo <shengchao.guo@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

pmdomain: Merge branch dt into next

Merge the immutable branch dt into next, to allow the updated DT bindings
to be tested together with the pmdomain changes that are targeted for the
next release.

Signed-off-by: Ulf Hansson <ulfh@kernel.org>

dt-bindings: power: qcom,rpmpd: document the Shikra RPM Power Domains

Document the RPM Power Domains on the Shikra Platform.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Rakesh Kota <rakesh.kota@oss.qualcomm.com>
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

pmdomain: imx: fix OF node refcount

for_each_child_of_node_scoped() decrements the reference count of the
nod after each iteration. Assigning it without incrementing the refcount
to a dynamically allocated platform device will result in a double put
in platform_device_release(). Add the missing call to of_node_get().

Cc: stable@vger.kernel.org
Fixes: 3e4d109ee8fc ("pmdomain: imx: gpc: Simplify with scoped for each OF child loop")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

pmdomain: ti_sci: add wakeup constraint to parent devices of wakeup source

Set wakeup constraint for any device in a wakeup path. All parent devices
of a wakeup device should not be turned off during suspend. This ensures
the wakeup device is kept on while the system is suspended.

Cc: stable@vger.kernel.org
Fixes: 9d8aa0dd3be4 ("pmdomain: ti_sci: add wakeup constraint management")
Reported-by: Vitor Soares <vitor.soares@toradex.com>
Closes: https://lore.kernel.org/linux-pm/c0fe43a2339c802e9ce5900092cd530a2ba17a6b.camel@gmail.com/
Signed-off-by: Kendall Willis <k-willis@ti.com>
Reviewed-by: Sebin Francis <sebin.francis@ti.com>
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

nvme-tcp: move nvme_tcp_reclassify_socket()

Move nvme_tcp_reclassify_socket() in tcp.c after the struct
nvme_tcp_queue definition. This is preparation for adding a reference
to struct nvme_tcp_queue in the function, which would otherwise cause a
compile failure due to the struct being defined after the function.

Move the entire CONFIG_DEBUG_LOCK_ALLOC block along with the function
to maintain the code organization.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvme: validate FDP configuration descriptor sizes

Validate descriptor sizes while walking the FDP configurations log so
dsze == 0 or a descriptor past the log end cannot cause unbounded
iteration or reads past the buffer.

Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: liuxixin <gliuxen@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvmet-auth: validate reply message payload bounds against transfer length

nvmet_auth_reply() accesses the variable-length rval[] array using
attacker-controlled hl (hash length) and dhvlen (DH value length) fields
without verifying they fit within the allocated buffer of tl bytes.

A malicious NVMe-oF initiator can craft a DHCHAP_REPLY message with a
small transfer length but large hl/dhvlen values, causing out-of-bounds
heap reads when the target processes the DH public key (rval + 2*hl) or
performs the host response memcmp.

With DH authentication configured, the OOB pointer is passed directly to
sg_init_one() and read by crypto_kpp_compute_shared_secret(), reaching
up to 526 bytes past the buffer. This is exploitable pre-authentication.

Add bounds validation ensuring sizeof(*data) + 2*hl + dhvlen <= tl before
any access to the variable-length fields.

Discovered by Atuin - Automated Vulnerability Discovery Engine.

Fixes: db1312dd9548 ("nvmet: implement basic In-Band Authentication")
Cc: stable@vger.kernel.org
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Signed-off-by: Tianchu Chen <flynnnchen@tencent.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

selftests: futex: Add tests for robust release operations

Add tests for __vdso_futex_robust_listXX_try_unlock() and for the futex()
op FUTEX_ROBUST_UNLOCK.

Test the contended and uncontended cases for the vDSO functions and all
ops combinations for FUTEX_ROBUST_UNLOCK.

[ tglx: Replace the VDSO function lookup ]

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260329-tonyk-vdso_test-v2-2-b7db810e44a1@igalia.com
Link: https://patch.msgid.link/20260602090535.988101541@kernel.org

Documentation: futex: Add a note about robust list race condition

Add a note to the documentation giving a brief explanation why doing a
robust futex release in userspace is racy, what should be done to avoid
it and provide links to read more.

[ tglx: Fixed a few typos ]

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260329-tonyk-vdso_test-v2-1-b7db810e44a1@igalia.com
Link: https://patch.msgid.link/20260602090535.936286833@kernel.org

x86/vdso: Implement __vdso_futex_robust_try_unlock()

When the FUTEX_ROBUST_UNLOCK mechanism is used for unlocking (PI-)futexes,
then the unlock sequence in userspace looks like this:

  1) robust_list_set_op_pending(mutex);
  2) robust_list_remove(mutex);

   lval = gettid();
  3) if (atomic_try_cmpxchg(&mutex->lock, lval, 0))
  4) robust_list_clear_op_pending();
   else
  5) sys_futex(OP,...FUTEX_ROBUST_UNLOCK);

That still leaves a minimal race window between #3 and #4 where the mutex
could be acquired by some other task which observes that it is the last
user and:

  1) unmaps the mutex memory
  2) maps a different file, which ends up covering the same address

When then the original task exits before reaching #5 then the kernel robust
list handling observes the pending op entry and tries to fix up user space.

In case that the newly mapped data contains the TID of the exiting thread
at the address of the mutex/futex the kernel will set the owner died bit in
that memory and therefore corrupt unrelated data.

Provide a VDSO function which exposes the critical section window in the
VDSO symbol table. The resulting addresses are updated in the task's mm
when the VDSO is (re)map()'ed.

The core code detects when a task was interrupted within the critical
section and is about to deliver a signal. It then invokes an architecture
specific function which determines whether the pending op pointer has to be
cleared or not. The unlock assembly sequence on 64-bit is:

mov %esi,%eax // Load TID into EAX
        xor %ecx,%ecx // Set ECX to 0
lock cmpxchg %ecx,(%rdi) // Try the TID -> 0 transition
  .Lstart:
jnz     .Lend
movq %rcx,(%rdx) // Clear list_op_pending
  .Lend:
ret

So the decision can be simply based on the ZF state in regs->flags. The
pending op pointer is always in DX independent of the build mode
(32/64-bit) to make the pending op pointer retrieval uniform. The size of
the pointer is stored in the matching criticial section range struct and
the core code retrieves it from there. So the pointer retrieval function
does not have to care. It is bit-size independent:

     return regs->flags & X86_EFLAGS_ZF ? regs->dx : NULL;

There are two entry points to handle the different robust list pending op
pointer size:

__vdso_futex_robust_list64_try_unlock()
__vdso_futex_robust_list32_try_unlock()

The 32-bit VDSO provides only __vdso_futex_robust_list32_try_unlock().

The 64-bit VDSO provides always __vdso_futex_robust_list64_try_unlock() and
when COMPAT is enabled also the list32 variant, which is required to
support multi-size robust list pointers used by gaming emulators.

The unlock function is inspired by an idea from Mathieu Desnoyers.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Acked-by: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/20260311185409.1988269-1-mathieu.desnoyers@efficios.com
Link: https://patch.msgid.link/20260602090535.883796247@kernel.org

x86/vdso: Prepare for robust futex unlock support

There will be a VDSO function to unlock non-contended robust futexes in
user space. The unlock sequence is racy vs. clearing the list_pending_op
pointer in the task's robust list head. To plug this race the kernel needs
to know the critical section window so it can clear the pointer when the
task is interrupted within that race window. The window is determined by
labels in the inline assembly.

Add these symbols to the vdso2c generator and use them in the VDSO VMA code
to update the critical section addresses in mm_struct::futex on (re)map().

The symbols are not exported to user space, but available in the debug
version of the vDSO.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260602090535.828312645@kernel.org

futex: Provide infrastructure to plug the non contended robust futex unlock race

When the FUTEX_ROBUST_UNLOCK mechanism is used for unlocking (PI-)futexes,
then the unlock sequence in user space looks like this:

  1) robust_list_set_op_pending(mutex);
  2) robust_list_remove(mutex);

   lval = gettid();
  3) if (atomic_try_cmpxchg(&mutex->lock, lval, 0))
  4) robust_list_clear_op_pending();
   else
  5) sys_futex(OP | FUTEX_ROBUST_UNLOCK, ....);

That still leaves a minimal race window between #3 and #4 where the mutex
could be acquired by some other task, which observes that it is the last
user and:

  1) unmaps the mutex memory
  2) maps a different file, which ends up covering the same address

When then the original task exits before reaching #5 then the kernel robust
list handling observes the pending op entry and tries to fix up user space.

In case that the newly mapped data contains the TID of the exiting thread
at the address of the mutex/futex the kernel will set the owner died bit in
that memory and therefore corrupt unrelated data.

On X86 this boils down to this simplified assembly sequence:

mov %esi,%eax // Load TID into EAX
         xor %ecx,%ecx // Set ECX to 0
   #3 lock cmpxchg %ecx,(%rdi) // Try the TID -> 0 transition
.Lstart:
jnz     .Lend
   #4 movq %rcx,(%rdx) // Clear list_op_pending
.Lend:

If the cmpxchg() succeeds and the task is interrupted before it can clear
list_op_pending in the robust list head (#4) and the task crashes in a
signal handler or gets killed then it ends up in do_exit() and subsequently
in the robust list handling, which then might run into the unmap/map issue
described above.

This is only relevant when user space was interrupted and a signal is
pending. The fix-up has to be done before signal delivery is attempted
because:

   1) The signal might be fatal so get_signal() ends up in do_exit()

   2) The signal handler might crash or the task is killed before returning
      from the handler. At that point the instruction pointer in pt_regs is
      not longer the instruction pointer of the initially interrupted unlock
      sequence.

The right place to handle this is in __exit_to_user_mode_loop() before
invoking arch_do_signal_or_restart() as this covers obviously both
scenarios.

As this is only relevant when the task was interrupted in user space, this
is tied to RSEQ and the generic entry code as RSEQ keeps track of user
space interrupts unconditionally even if the task does not have a RSEQ
region installed. That makes the decision very lightweight:

       if (current->rseq.user_irq && within(regs, csr->unlock_ip_range))
        futex_fixup_robust_unlock(regs, csr);

futex_fixup_robust_unlock() then invokes a architecture specific function
to return the pending op pointer or NULL. The function evaluates the
register content to decide whether the pending ops pointer in the robust
list head needs to be cleared.

Assuming the above unlock sequence, then on x86 this decision is the
trivial evaluation of the zero flag:

return regs->eflags & X86_EFLAGS_ZF ? regs->dx : NULL;

Other architectures might need to do more complex evaluations due to LLSC,
but the approach is valid in general. The size of the pointer is determined
from the matching range struct, which covers both 32-bit and 64-bit builds
including COMPAT.

The unlock sequence is going to be placed in the VDSO so that the kernel
can keep everything synchronized, especially the register usage. The
resulting code sequence for user space is:

   if (__vdso_futex_robust_list$SZ_try_unlock(lock, tid, &pending_op) != tid)
err = sys_futex($OP | FUTEX_ROBUST_UNLOCK,....);

Both the VDSO unlock and the kernel side unlock ensure that the pending_op
pointer is always cleared when the lock becomes unlocked.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260602090535.773669210@kernel.org

futex: Add robust futex unlock IP range

There will be a VDSO function to unlock robust futexes in user space. The
unlock sequence is racy vs. clearing the list_pending_op pointer in the
tasks robust list head. To plug this race the kernel needs to know the
instruction window. As the VDSO is per MM the addresses are stored in
mm_struct::futex.

Architectures which implement support for this have to update these
addresses when the VDSO is (re)mapped and indicate the pending op pointer
size which is matching the IP.

Arguably this could be resolved by chasing mm->context->vdso->image, but
that's architecture specific and requires to touch quite some cache
lines. Having it in mm::futex reduces the cache line impact and avoids
having yet another set of architecture specific functionality.

To support multi size robust list applications (gaming) this provides two
ranges when COMPAT is enabled.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260602090535.718926819@kernel.org

futex: Add support for unlocking robust futexes

Unlocking robust non-PI futexes happens in user space with the following
sequence:

  1) robust_list_set_op_pending(mutex);
  2) robust_list_remove(mutex);

   lval = 0;
  3) lval = atomic_xchg(lock, lval);
  4) if (lval & WAITERS)
  5) sys_futex(WAKE,....);
  6) robust_list_clear_op_pending();

That opens a window between #3 and #6 where the mutex could be acquired by
some other task which observes that it is the last user and:

  A) unmaps the mutex memory
  B) maps a different file, which ends up covering the same address

When the original task exits before reaching #6 then the kernel robust list
handling observes the pending op entry and tries to fix up user space.

In case that the newly mapped data contains the TID of the exiting thread
at the address of the mutex/futex the kernel will set the owner died bit in
that memory and therefore corrupting unrelated data.

PI futexes have a similar problem both for the non-contented user space
unlock and the in kernel unlock:

  1) robust_list_set_op_pending(mutex);
  2) robust_list_remove(mutex);

   lval = gettid();
  3) if (!atomic_try_cmpxchg(lock, lval, 0))
  4) sys_futex(UNLOCK_PI,....);
  5) robust_list_clear_op_pending();

Address the first part of the problem where the futexes have waiters and
need to enter the kernel anyway. Add a new FUTEX_ROBUST_UNLOCK flag, which
is valid for the sys_futex() FUTEX_UNLOCK_PI, FUTEX_WAKE, FUTEX_WAKE_BITSET
operations.

This deliberately omits FUTEX_WAKE_OP from this treatment as it's unclear
whether this is needed and there is no usage of it in glibc either to
investigate.

For the futex2 syscall family this needs to be implemented with a new
syscall.

The sys_futex() case [ab]uses the @uaddr2 argument to hand the pointer to
robust_list_head::list_pending_op into the kernel. This argument is only
evaluated when the FUTEX_ROBUST_UNLOCK bit is set and is therefore backward
compatible.

This is an explicit argument to avoid the lookup of the robust list pointer
and retrieving the pending op pointer from there. User space has the
pointer already available so it can just put it into the @uaddr2
argument. Aside of that this allows the usage of multiple robust lists in
the future without any changes to the internal functions as they just operate
on the provided pointer.

This requires a second flag FUTEX_ROBUST_LIST32 which indicates that the
robust list pointer points to an u32 and not to an u64. This is required
for two reasons:

    1) sys_futex() has no compat variant

    2) The gaming emulators use both both 64-bit and compat 32-bit robust
       lists in the same 64-bit application

As a consequence 32-bit applications have to set this flag unconditionally
so they can run on a 64-bit kernel in compat mode unmodified. 32-bit
kernels return an error code when the flag is not set. 64-bit kernels will
happily clear the full 64 bits if user space fails to set it.

In case of FUTEX_UNLOCK_PI this clears the robust list pending op when the
unlock succeeded. In case of errors, the user space value is still locked
by the caller and therefore the above cannot happen.

In case of FUTEX_WAKE* this does the unlock of the futex in the kernel and
clears the robust list pending op when the unlock was successful. If not,
the user space value is still locked and user space has to deal with the
returned error. That means that the unlocking of non-PI robust futexes has
to use the same try_cmpxchg() unlock scheme as PI futexes.

If the clearing of the pending list op fails (fault) then the kernel clears
the registered robust list pointer if it matches to prevent that exit()
will try to handle invalid data. That's a valid paranoid decision because
the robust list head sits usually in the TLS and if the TLS is not longer
accessible then the chance for fixing up the resulting mess is very close
to zero.

The problem of non-contended unlocks still exists and will be addressed
separately.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260602090535.670514505@kernel.org

futex: Cleanup UAPI defines

Make the operand defines tabular for readability sake.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260602090535.615600933@kernel.org

x86: Select ARCH_MEMORY_ORDER_TSO

The generic unsafe_atomic_store_release_user() implementation does:

    if (!IS_ENABLED(CONFIG_ARCH_MEMORY_ORDER_TSO))
        smp_mb();
    unsafe_put_user();

As x86 implements Total Store Order (TSO) which means stores imply release,
select ARCH_MEMORY_ORDER_TSO to avoid the unnecessary smp_mb().

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260602090535.564499644@kernel.org

uaccess: Provide unsafe_atomic_store_release_user()

The upcoming support for unlocking robust futexes in the kernel requires
store release semantics. Syscalls do not imply memory ordering on all
architectures so the unlock operation requires a barrier.

This barrier can be avoided when stores imply release like on x86.

Provide a generic version with a smp_mb() before the unsafe_put_user(),
which can be overridden by architectures.

Provide also a ARCH_MEMORY_ORDER_TSO Kconfig option, which can be selected
by architectures with Total Store Order (TSO), where store implies release,
so that the smp_mb() in the generic implementation can be avoided.

If that is set a barrier() is used instead of smp_mb(), which is not
required for the use case at hand, but makes it future proof for other
usage to prevent the compiler from reordering.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260602090535.513181528@kernel.org

futex: Provide UABI defines for robust list entry modifiers

The marker for PI futexes in the robust list is a hardcoded 0x1 which lacks
any sensible form of documentation.

Provide proper defines for the bit and the mask and fix up the usage
sites. Thereby convert the boolean pi argument into a modifier argument,
which allows new modifier bits to be trivially added and conveyed.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260602090535.458758556@kernel.org

futex: Move futex related mm_struct data into a struct

Having all these members in mm_struct along with the required #ifdeffery is
annoying, does not allow efficient initializing of the data with
memset() and makes extending it tedious.

Move it into a data structure and fix up all usage sites.

The extra struct for the private hash is intentional to make integration of
other conditional mechanisms easier in terms of initialization and separation.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260602090535.407756793@kernel.org

futex: Make futex_mm_init() void

Nothing fails there. Mop up the leftovers of the early version of this,
which did an allocation.

While at it clean up the stubs and the #ifdef comments to make the header
file readable.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260602090535.356789395@kernel.org

futex: Move futex task related data into a struct

Having all these members in task_struct along with the required #ifdeffery
is annoying, does not allow efficient initializing of the data with
memset() and makes extending it tedious.

Move it into a data structure and fix up all usage sites.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20260602090535.308220888@kernel.org

percpu: Sanitize __percpu_qual include hell

Slapping __percpu_qual into the next available header is sloppy at best.

It's required by __percpu which is defined in compiler_types.h and that is
meant to be included without requiring a boatload of other headers so that
a struct or function declaration can contain a __percpu qualifier w/o
further prerequisites.

This implicit dependency on linux/percpu.h makes that impossible and causes
a major problem when trying to separate headers.

Create asm/percpu_types.h and move it there. Include that from
compiler_types.h and the whole recursion problem goes away.

Fix up UM so it uses the generic header and includes it in the UM_HOST
build, which pulls in compiler_types.h. The USER_CFLAGS fix was suggested
by Richard.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260602090535.254874125@kernel.org

cleanup: Remove NULL check from unconditional guards

The unconditional guard destructors check whether the lock pointer is
NULL before unlocking. This check is dead code because unconditional
guards guarantee a non-NULL lock pointer at destructor time.

DEFINE_GUARD() runs the lock operation unconditionally in the
constructor. If the pointer were NULL, the lock operation (e.g.
mutex_lock(NULL)) would crash before the constructor returns. The
destructor never runs with a NULL pointer. All DEFINE_GUARD() users
dereference the pointer in their lock. Verified by auditing every
instance found by: git grep -n -A 1 'DEFINE_GUARD('. The only exception
is xe_pm_runtime_release_only, whose constructor is a noop, but it has
no callers.

__DEFINE_UNLOCK_GUARD() has only a few usages outside of
include/linux/cleanup.h: tty_port_tty (NULL-checks in its tty_kref_put()
call), irqdesc_lock (fixed earlier) and two guards in
kernel/sched/sched.h (dereference the pointer unconditionally in their
lock constructors).

DEFINE_LOCK_GUARD_1() sets .lock from its argument and runs the lock
operation in the constructor. Same reasoning applies. All
DEFINE_LOCK_GUARD_1() users dereference the pointer in their lock. Also,
verified by auditing every match of: git grep -n 'DEFINE_LOCK_GUARD_1('.

DEFINE_LOCK_GUARD_0() hardcodes .lock = (void *)1 in the constructor,
so it is never NULL by construction.

Conditional (_try) variants: DEFINE_GUARD_COND() and
DEFINE_LOCK_GUARD_1_COND() use EXTEND_CLASS_COND(), whose wrapper
destructor returns early when the lock was not acquired, before reaching
the base destructor since commit 2deccd5c862a ("cleanup: Optimize
guards"):

if (_cond) return; class_##_name##_destructor(_T);

As compiled by GCC-11 with defconfig on top of the locking/core:

Total: Before=23889980, After=23834334, chg -0.23%

Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/0503a089389b2270c478a873e095cf0a4ff26d24.1780064327.git.d@ilvokhin.com

cleanup: Annotate guard constructors with nonnull

Add __nonnull_args() to unconditional guard constructors so the compiler
warns when NULL is statically known to be passed:

- DEFINE_GUARD(): re-declare the constructor with __nonnull_args().
- __DEFINE_LOCK_GUARD_1(): annotate the constructor directly.

DEFINE_LOCK_GUARD_0() needs no annotation: its constructor takes no
pointer arguments (.lock is hardcoded to (void *)1).

Define the __nonnull_args() macro in compiler_attributes.h, following
the existing convention for attribute wrappers. Deliberately not named
'__nonnull', to avoid clashing with glibc's __nonnull() when kernel and
userspace headers are combined (User Mode Linux for example).

Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/85fee12eec20abfcf711443518e8f0caec982a86.1780064327.git.d@ilvokhin.com

genirq: Move NULL check into irqdesc_lock guard unlock expression

irqdesc_lock uses __DEFINE_UNLOCK_GUARD() directly with a custom
constructor that can set .lock to NULL.

In preparation for removing the NULL check from __DEFINE_UNLOCK_GUARD(),
move the NULL check into the irqdesc_lock unlock expression, making the
NULL handling explicit at the call site.

No functional change.

Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/ab457810653e4356e29b2d74ba616478bd9328ad.1780064327.git.d@ilvokhin.com

nvdimm: Convert nvdimm_bus guard to class

The nvdimm_bus guard accepts NULL and skips locking when NULL is passed.
Convert from DEFINE_GUARD() to DEFINE_CLASS() + DEFINE_CLASS_IS_GUARD().

This is a preparatory change for making DEFINE_GUARD() constructors
__nonnull_args(). nvdimm_bus legitimately passes NULL, so it must be
adjusted to avoid a compile error.

No functional change.

Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/8c0417904d280896ecf2e9923ffa9f20076f59b8.1780064327.git.d@ilvokhin.com

lockdep/selftests: Restore sched_rt_mutex state on PREEMPT_RT

The WW-mutex selftests deliberately exercise failing lock paths. On
PREEMPT_RT, some of those paths enter the RT-mutex scheduler helpers.

The change referenced by the Fixes tag made those helpers track RT-mutex
scheduling state in current->sched_rt_mutex. The bit is normally cleared by
the matching post-schedule helper, but some WW-mutex selftests disable
the runtime debug_locks flag before that happens. With debug_locks cleared,
lockdep_assert() does not evaluate the expression that clears the bit,
leaving stale state for the next testcase.

With CONFIG_PREEMPT_RT=y and CONFIG_DEBUG_LOCKING_API_SELFTESTS=y, that
stale state produces warnings such as:

WARNING: kernel/sched/core.c:7557 at rt_mutex_pre_schedule+0x26/0x2d
RIP: 0010:rt_mutex_pre_schedule+0x26/0x2d

Save and restore current->sched_rt_mutex around each testcase, matching the
existing PREEMPT_RT cleanup for task-local migration and RCU state.

Fixes: d14f9e930b90 ("locking/rtmutex: Use rt_mutex specific scheduler helpers")
Assisted-by: Codex:gpt-5
Signed-off-by: Karl Mehltretter <kmehltretter@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20260523185123.17482-3-kmehltretter@gmail.com

lockdep/selftests: Restore migrate_disable() state on PREEMPT_RT

The lockdep selftests deliberately run unbalanced locking patterns.
dotest() restores the task state they leave behind before running the
next testcase.

On PREEMPT_RT, spin_lock() uses migrate_disable() instead of disabling
preemption. dotest() cleans up the resulting migration-disabled state, but
that cleanup is still guarded by CONFIG_SMP.

That used to match the scheduler data model, where migration_disabled was
also CONFIG_SMP-only. The commit referenced below made SMP scheduler state
unconditional, so CONFIG_SMP=n PREEMPT_RT kernels with
CONFIG_DEBUG_LOCKING_API_SELFTESTS=y report success from the selftests and
then trip over stale current->migration_disabled state:

  releasing a pinned lock
  bad: scheduling from the idle thread!
  Kernel panic - not syncing: Fatal exception

Save and restore current->migration_disabled for every PREEMPT_RT build.

Fixes: cac5cefbade9 ("sched/smp: Make SMP unconditional")
Assisted-by: Codex:gpt-5
Signed-off-by: Karl Mehltretter <kmehltretter@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20260523185123.17482-2-kmehltretter@gmail.com

locking/qspinlock: Clarify pending field layout

For CONFIG_NR_CPUS < 16K, _Q_PENDING_BITS is 8 and the pending
field occupies bits 8-15 of the lock word. The current comment
documents bit 8 as pending and bits 9-15 as unused, which describes
the pending flag value rather than the field layout.

Describe bits 8-15 as the pending byte so the layout description
is consistent with the lock byte.

Signed-off-by: WEI-HONG, YE <1234567weewee457@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://patch.msgid.link/20260525130450.723937-1-1234567weewee457@gmail.com

pmdomain: sunxi: support power domain flags for pck600

While bringing up the PowerVR GPU on the A733 (Radxa Cubie A7Z), we
found that one of the GPU power domains must be configured as "always
on." While the Radxa BSP device tree leaves the GPU power domain nodes
commented out, the GPU driver code contains traces indicating an "always
on" requirement [1].

Currently, sunxi_pck600_desc only supports specifying pd_names. This
patch introduces sunxi_pck600_pd_desc, which stores both the name and
its associated flags. This also (more or less) aligns the implementation
with the existing sun50i PPU handling of always-on domains.

With this change, individual power domains can now be configured more
granularly. In particular, the GPU_CORE domain in sun60i_a733_pck600_pds
can now be explicitly marked with GENPD_FLAG_ALWAYS_ON.

The patch was tested on the Radxa Cubie A7Z, where the GPU now functions
as expected.

Thanks to Icenowy for her support and expertise on sunxi and PowerVR,
and thanks to Mikhail for identifying this exact cause of the GPU
bring-up issue.

[1] https://github.com/radxa/allwinner-bsp/blob/cubie-aiot-v1.4.6/modules/gpu/img-bxm/linux/rogue_km/services/system/rogue/rgx_sunxi/sunxi_platform.c#L62

Signed-off-by: Yuanshen Cao <alex.caoys@gmail.com>
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

pmdomain: core: switch to dynamic root device

Driver core expects devices to be dynamically allocated and will, for
example, complain loudly if a device that lacks a release function is
ever freed.

Use root_device_register() to allocate and register the root device
instead of open coding using a static device.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

pmdomain: qcom: Unify user-visible "Qualcomm" name

Various names for Qualcomm as a company are used in user-visible config
options: QCOM, Qualcomm and Qualcomm Technologies. Switch to unified
"Qualcomm" so it will be easier for users to identify the options when
for example running menuconfig.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

dt-bindings: power: qcom,rpmhpd: Add RPMh power domain for Nord SoC

Document the RPMh power domain for Nord SoC, and add definitions for
the new power domains present on Nord SoC.

- RPMHPD_NSP3: power domain for the 4th NSP subsystem
- RPMHPD_GFX1: power domain for the 2nd GFX subsystem

Signed-off-by: Kamal Wadhwa <kamal.wadhwa@oss.qualcomm.com>
Signed-off-by: Shawn Guo <shengchao.guo@oss.qualcomm.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

dt-bindings: power: qcom,rpmhpd: Fix whitespace in RPMHPD defines

Some RPMHPD_* defines in the Generic RPMH Power Domain Indexes section
were using spaces instead of tabs for alignment. Fix them to be
consistent with the rest of the file.

Signed-off-by: Shawn Guo <shengchao.guo@oss.qualcomm.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

drm/i915: Fix color blob reference handling in intel_plane_state

Take proper references for hw color blobs (degamma_lut, gamma_lut,
ctm, lut_3d) in intel_plane_duplicate_state() and drop them in
intel_plane_destroy_state().

v2:
- handle blobs in hw state clear

Cc: <stable@vger.kernel.org> #v6.19+
Fixes: 3b7476e786c2 ("drm/i915/color: Add framework to program PRE/POST CSC LUT")
Fixes: a78f1b6baf4d ("drm/i915/color: Add framework to program CSC")
Fixes: 65db7a1f9cf7 ("drm/i915/color: Add 3D LUT to color pipeline")
Reviewed-by: Pranay Samala <pranay.samala@intel.com> #v1
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Signed-off-by: Uma Shankar <uma.shankar@intel.com>
Link: https://patch.msgid.link/20260601082953.128539-4-chaitanya.kumar.borah@intel.com
(cherry picked from commit c6eea1925154b6697fe22b217faab9bb30635e6b)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>

clocksource/drivers/timer-ti-dm: Add clockevent support

Add support for using the TI Dual-Mode Timer for clockevents. The second
always on device with the "ti,timer-alwon" property is selected to be
used for clockevents. The first one is used as clocksource.

This allows clockevents to be setup independently of the CPU.

Signed-off-by: Markus Schneider-Pargmann (TI) <msp@baylibre.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Link: https://patch.msgid.link/20260508-topic-ti-dm-clkevt-v6-16-v5-3-61d546a0aff9@baylibre.com

clocksource/drivers/timer-ti-dm: Add clocksource support

Add support for using the TI Dual-Mode Timer as a clocksource. The
driver automatically picks the first timer that is marked as always-on
on with the "ti,timer-alwon" property to be the clocksource.

The timer can then be used for CPU independent time keeping.

Signed-off-by: Markus Schneider-Pargmann (TI) <msp@baylibre.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Link: https://patch.msgid.link/20260508-topic-ti-dm-clkevt-v6-16-v5-2-61d546a0aff9@baylibre.com

clocksource/drivers/timer-ti-dm: Fix property name in comment

ti,always-on property doesn't exist. ti,timer-alwon is meant here. Fix
this minor bug in the comment.

Signed-off-by: Markus Schneider-Pargmann (TI) <msp@baylibre.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Link: https://patch.msgid.link/20260508-topic-ti-dm-clkevt-v6-16-v5-1-61d546a0aff9@baylibre.com

dt-bindings: timer: arm,arch_timer: Fix requirements for interrupt description

The arm,arch_timer DT binding is extremely imprecise in describing
the requirements for interrupts.

Follow the architecture by making it explicit that:
- the EL1 secure timer irq is required if EL3 is implemented
- the EL1 physical timer irq is always required
- the EL1 virtual timer irq is always required
- the EL2 physical timer irq is required if EL2 is implemented
- the EL2 virtual timer irq is required if FEAT_VHE is implemented

The consequence of the above is that the minimum number of interrupts
to be described is 2, and not 1.

Finally, clean up the description which made the assumption that
the timers are plugged into a GIC (unfortunately, that's not always
true), drop the MMIO nonsense that has long be moved to a separate
binding, and use the architectural terminology to describe the various
interrupts.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260523140242.586031-5-maz@kernel.org

clocksource/drivers/arm_arch_timer: Default to EL2 virtual timer when running VHE

When running with at EL2 with VHE enabled, the architecture provides
two EL2 timer/counters, dubbed physical and virtual. Apart from their
names, they are strictly identical.

However, they don't get virtualised the same way, specially when
it comes to adding arbitrary offsets to the timers. When running as
a guest, the host CNTVOFF_EL2 does apply to the guest's view of
CNTHV*_El2. This is not true for CNTPOFF_EL2 and CNTHP*_EL2, as
the architecture is broken past the first level of virtualisation
(it lacks some essential mechanisms to be usable, despite what
the ARM ARM pretends).

This means that when running as a L2 guest hypervisor, using the
physical timer results in traps to L0, which are then forwarded to
L1 in order to emulate the offset, leading to even worse performance
due to massive trap amplification (the combination of register and
ERET trapping is absolutely lethal).

Switch the arch timer code to using the virtual timer when running
in VHE by default, only using the physical timer if the interrupt
is not correctly described in the firmware tables (which seems
to be an unfortunately common case). This comes as no impact on
bare-metal, and slightly improves the situation in the virtualised
case.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Link: https://patch.msgid.link/20260523140242.586031-4-maz@kernel.org

ACPI: GTDT: Parse information related to the EL2 virtual timer

Now that we have a way to identify GTDTv3, allow the information
related to the EL2 virtual timer to be retrieved by the interface
used by the architected timer driver.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Sudeep Holla <sudeep.holla@kernel.org>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://patch.msgid.link/20260523140242.586031-3-maz@kernel.org

ACPI: GTDT: Account for GTDTv3 size when walking the platform timer descriptors

Since ARMv8.1, the architecture has grown an EL2-private virtual
timer. This has been described in ACPI since ACPI v6.3 and revision
3 of the GTDT table.

An aditional structure was added in ACPICA, though in a rather
bizarre way, and merged in v5.1 as 8f5a14d053100 ("ACPICA: ACPI 6.3:
add GTDT Revision 3 support").

Finally plug the table parsing in GTDT, and correct the parsing of
the platform timer subtables to account for the expanded size of
the base table. This also comes with some extra sanitisation of
the table, in the unlikely case someone got it wrong...

Suggested-by: Sudeep Holla <sudeep.holla@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Sudeep Holla <sudeep.holla@kernel.org>
Link: https://patch.msgid.link/20260523140242.586031-2-maz@kernel.org

Merge tag 'iwlwifi-fixes-2026-05-31' of https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next

wifi: iwlwifi: fixes - 2026-05-31

Miri Korenblit says:
====================
This contains a few fixes:
- Don't grab nic access in non-fast-resume
- Don't send a large hcmd than transport supports
- In AP mode, don't send tx power constraints command before activating
the link
- Don't do sw reset handshake on older firmwares.
====================

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: fix leak if split 6 GHz scanning fails

rdev->int_scan_req is leaked if cfg80211_scan() fails.  Note that it's
supposed to be released at ___cfg80211_scan_done() but this doesn't happen
as rdev->scan_req is NULL at that point, too, leading to the early return
from the freeing function.

unreferenced object 0xffff8881161d0800 (size 512):
  comm "wpa_supplicant", pid 379, jiffies 4294749765
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 f0 81 13 16 81 88 ff ff  ................
  backtrace (crc c867fdb6):
    kmemleak_alloc+0x89/0x90
    __kmalloc_noprof+0x2fd/0x410
    cfg80211_scan+0x133/0x730
    nl80211_trigger_scan+0xc69/0x1cc0
    genl_family_rcv_msg_doit+0x204/0x2f0
    genl_rcv_msg+0x431/0x6b0
    netlink_rcv_skb+0x143/0x3f0
    genl_rcv+0x27/0x40
    netlink_unicast+0x4f6/0x820
    netlink_sendmsg+0x797/0xce0
    __sock_sendmsg+0xc4/0x160
    ____sys_sendmsg+0x5e4/0x890
    ___sys_sendmsg+0xf8/0x180
    __sys_sendmsg+0x136/0x1e0
    __x64_sys_sendmsg+0x76/0xc0
    x64_sys_call+0x13f0/0x17d0

Found by Linux Verification Center (linuxtesting.org).

Fixes: c8cb5b854b40 ("nl80211/cfg80211: support 6 GHz scanning")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Link: https://patch.msgid.link/20260601094157.92703-1-pchelkin@ispras.ru
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

configfs_lookup(): don't leave ->s_dentry dangling on failure

Normally ->s_dentry is cleared when dentry it's pointing to becomes
negative (on eviction, realistically).  However, that only happens
if dentry gets to be positive in the first place; in case of inode
allocation failure dentry never becomes positive, so ->d_iput()
is not called at all.

We do part of what normally would've been done by configfs_d_iput()
(dropping the reference to configfs_dirent) manually, but we do
not clear ->s_dentry there.  Sloppy as it is, it does not matter in
case of configfs_create_{dir,link}() - there configfs_dirent does
not survive dropping the sole reference to it.

However, for configfs_lookup() it *does* survive, with a dangling
pointer to soon to be freed dentry sitting it its ->s_dentry.

Subsequent getdents(2) in that directory will end up dereferencing
that pointer in order to pick the inode number.  Use after free...

This is the minimal fix; the right approach is to set the linkage
between dentry and configfs_dirent only after we know that we have
an inode, but that takes more surgery and the bug had been there
since 2006, so...

Fixes: 3d0f89bb1694 ("configfs: Add permission and ownership to configfs objects") # 2.6.16-rc3
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

thermal/drivers/qcom/tsens: Disable wakeup interrupt setup on automotive targets

Add a no_irq_wake flag to struct tsens_plat_data to allow platforms
to control whether TSENS interrupts should be configured as wakeup
sources.

Create a new data_automotive structure and add compatible strings for
automotive TSENS variants (SA8775P, SA8255P) with wakeup interrupts
disabled.

Automotive platforms can enter a low-power parking suspend state where the
application processors and thermal mitigation paths are not active. In this
state, waking the system due to TSENS threshold interrupts does not enable
useful thermal action, but it does repeatedly break suspend residency and
increase battery drain.

Allow these automotive variants to keep TSENS monitoring enabled during
normal runtime while opting out of TSENS wakeup interrupts during suspend,
so the system can remain in low power until ignition/resume.

Signed-off-by: Priyansh Jain <priyansh.jain@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Link: https://patch.msgid.link/20260601-tsens_interrupt_wake_control-v2-2-ce9570946abd@oss.qualcomm.com

thermal/drivers/qcom/tsens: Switch wake IRQ handling to PM callbacks

This change improves power management by using the standardized PM
framework for wake IRQ handling.

Move wake IRQ control to the PM suspend/resume path:
- store uplow/critical IRQ numbers in struct tsens_priv
- enable wake IRQs in tsens_suspend_common() when wakeup is allowed
- disable wake IRQs in tsens_resume_common()
- mark the device wakeup-capable during probe

This aligns TSENS wake behavior with suspend flow and avoids keeping
wake IRQs permanently enabled during runtime.

Signed-off-by: Priyansh Jain <priyansh.jain@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Link: https://patch.msgid.link/20260601-tsens_interrupt_wake_control-v2-1-ce9570946abd@oss.qualcomm.com

thermal/core: Fix missing stub for devm_thermal_cooling_device_register

Even it is very unlikely the thermal framework is disabled, the newly
added devm_thermal_cooling_device_register() function has not the stub
when the thermal framework is optout in the kernel.

Add it.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605301554.S9n45bfQ-lkp@intel.com/
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260601090152.1243983-2-daniel.lezcano@kernel.org

dt-bindings: thermal: cooling-devices: Update support for 3 cells cooling device

Extend the thermal cooling device binding to support a 3 cells specifier
along with the 2 cells format.

Update #cooling-cells property to enum to support both 2 and 3 arguments.

Fix pwm-fan.yaml to restrict the number of cells to 'const: 2'

Signed-off-by: Gaurav Kohli <gaurav.kohli@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260526140802.1059293-22-daniel.lezcano@oss.qualcomm.com

thermal/of: Support cooling device ID in cooling-spec

Extend the cooling device specifier parsing to support an optional
cooling device identifier (cdev_id).

Two formats are now supported:

  - Legacy format:
        <&cdev lower upper>

  - Indexed format:
        <&cdev cdev_id lower upper>

When the indexed format is used, both the device node and the
cdev_id must match in order to bind a cooling device to a thermal
zone. The legacy format continues to match on the device node only,
preserving backward compatibility.

Update the parsing logic accordingly to handle both formats and
extract the mitigation limits from the appropriate arguments.

This is a preparatory step for upcoming DT bindings describing
cooling devices using (device node, id) tuples instead of child
nodes.

No functional change for existing device trees.

Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260526140802.1059293-21-daniel.lezcano@oss.qualcomm.com

thermal/of: Pass cdev_id and introduce devm registration helper

Extend the OF cooling device registration to support an explicit
cooling device identifier (cdev_id), preparing for upcoming DT
bindings where cooling devices are identified by a tuple (device node,
id) instead of relying on child nodes.

Introduce a new helper:

devm_thermal_of_cooling_device_register()

which registers a cooling device using the device's of_node and an
explicit cdev_id. This complements the existing
devm_thermal_of_child_cooling_device_register() helper, which
remains dedicated to the legacy child-node based bindings.

Internally, factorize the devm registration logic into a common
helper to avoid code duplication.

Existing users are unaffected, as the child-based helper continues
to pass a default cdev_id of 0, preserving current behavior.

This change is a preparatory step for supporting indexed cooling
devices in thermal OF bindings.

Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260526140802.1059293-20-daniel.lezcano@oss.qualcomm.com

thermal/of: Add cooling device ID support

Introduce an identifier (cdev_id) for cooling devices registered from
device tree.

This prepares support for a new DT binding where cooling devices are
identified by a tuple (device node, ID), instead of relying on child
nodes.

Existing users are updated to pass a default ID of 0, preserving the
current behavior.

Future changes will extend the cooling map parsing to match cooling
devices based on both the device node and the ID.

No functional change intended.

Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260526140802.1059293-19-daniel.lezcano@oss.qualcomm.com

thermal/of: Rename the devm_thermal_of_cooling_device_register() function

To clarify that the function operates on child nodes, rename:

          devm_thermal_of_cooling_device_register()
                     |
     v
       devm_thermal_of_child_cooling_device_register()

Used the command:

      find . -type f -name '*.[ch]' -exec \
sed -i 's/devm_thermal_of_cooling_device_register/\
devm_thermal_of_child_cooling_device_register/g' {} \;

Did not used clang-format-diff because it does not indent correctly
and checkpatch complained. Manually reindented to make checkpatch
happy

This prepares for upcoming support of cooling devices identified by
an ID rather than device tree child nodes.

No functional change.

Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260526140802.1059293-18-daniel.lezcano@oss.qualcomm.com

thermal/core: Make cooling device OF node conditional on CONFIG_THERMAL_OF

The device node pointer stored in struct thermal_cooling_device is
only used by the OF-specific thermal code to associate cooling devices
with thermal zones defined in device tree.

Now that OF and non-OF registration paths are separated and non-OF
users no longer rely on devm_thermal_of_cooling_device_register() with
a NULL device node, the np field is no longer required for non-OF
configurations.

Make this field conditional on CONFIG_THERMAL_OF to reduce memory
footprint and better reflect its usage.

Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20260526140802.1059293-16-daniel.lezcano@oss.qualcomm.com

thermal/of: Move cooling device OF helpers out of thermal core

The functions:
- thermal_of_cooling_device_register()
- devm_thermal_of_cooling_device_register()

are specific to device tree usage but are currently implemented in
thermal_core.c.

Move them to thermal_of.c to better reflect the separation between
generic thermal core code and OF-specific logic.

This change is enabled by the recent split of the cooling device
registration into allocation and addition phases, allowing OF-specific
handling (such as device node assignment) to be isolated from the core.

No functional change intended.

Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260526140802.1059293-17-daniel.lezcano@oss.qualcomm.com

hwmon: Use non-OF thermal cooling device registration API

Some HWMON drivers register cooling devices using the OF helper
devm_thermal_of_cooling_device_register() with a NULL device node.

With the introduction of a dedicated non-OF registration API,
switch these users to devm_thermal_cooling_device_register()
to make the intent explicit and avoid relying on OF-specific helpers.

This is a pure refactoring with no functional change.

Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260526140802.1059293-15-daniel.lezcano@oss.qualcomm.com

thermal/core: Add devm_thermal_cooling_device_register()

Introduce a device-managed variant of the non-OF cooling device
registration API.

This complements devm_thermal_of_cooling_device_register() and allows
non-device-tree users to register cooling devices with automatic
cleanup tied to the device lifecycle.

The helper relies on devm_add_action_or_reset() to release the cooling
device via thermal_cooling_device_release() on driver detach or probe
failure.

This keeps the API consistent across OF and non-OF users and avoids
manual cleanup in error paths.

Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260526140802.1059293-14-daniel.lezcano@oss.qualcomm.com

thermal/core: Introduce non-OF thermal_cooling_device_register()

Split the cooling device registration API into OF and non-OF variants.

Introduce thermal_cooling_device_register() for non-device-tree users
and rework thermal_of_cooling_device_register() to use the new
alloc/add split.

This removes the need for the internal __thermal_cooling_device_register()
helper and makes the separation between OF and non-OF users explicit.

Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260526140802.1059293-13-daniel.lezcano@oss.qualcomm.com

thermal/drivers/samsung: Enable TMU by default

The SoC Thermal Management Unit (TMU) is essential for proper operation
of the SoCs.  Kernel should not ask users choice of drivers when that
choice is obvious and known to the developers that answer should be
'yes' or 'module'.

Impact of making it default:

1. arm64 defconfig: No changes, already present in defconfig.

2. arm32: No changes, the driver is already selected by MACH_EXYNOS.

3. COMPILE_TEST builds: enable by default for arm32 or arm64 builds,
   whenever ARCH_EXYNOS is selected.  This has impact on build time and
   feels logical, because if one selects ARCH_EXYNOS then probably by
   default wants to build test it entirely.  Kernels with COMPILE_TEST
   are not supposed to be used for booting.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Link: https://patch.msgid.link/20260526135312.8697-2-krzysztof.kozlowski@oss.qualcomm.com

thermal/driver/qoriq: Workaround unexpected temperature readings from tmu

Invalid temperature measurements may be observed across the temperature
range specified in the device data sheet. The invalid temperature can
be read from any remote site and from any capture or report registers.
The invalid change in temperature can be positive or negative and the
resulting temperature can be outside the calibrated range, in which
case the TSR[ORL] or TSR[ORH] bit will be set.

Workaround:
Use the raising/falling edge threshold to filter out the invalid temp.
Check the TIDR register to make sure no jump happens When reading the temp.

i.MX93 ERR052243:
(https://www.nxp.com/webapp/Download?colCode=IMX93_2P87F&appType=license)

Signed-off-by: Jacky Bai <ping.bai@nxp.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Link: https://patch.msgid.link/20260430-imx93_tmu-v6-3-485459d7b54f@nxp.com

thermal/drivers/qoriq: Add i.MX93 tmu support

For Thermal monitor unit(TMU) used on i.MX93, the HW revision info read
from the ID register is the same the one used on some of the QorIQ
platform, but the config has some slight differance. Add i.MX93 compatible
string and corresponding code for it.

Signed-off-by: Alice Guo <alice.guo@nxp.com>
Signed-off-by: Jacky Bai <ping.bai@nxp.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260430-imx93_tmu-v6-2-485459d7b54f@nxp.com

dt-bindings: thermal: qoriq: Add compatible string for imx93

Add i.MX93 compatible string 'fsl,imx93-tmu' because Thermal monitor
unit(TMU) on i.MX93 has differences with QorIQ platform and not fully
compatible with existing Platform, such as fsl,qoriq-tmu.

Signed-off-by: Jacky Bai <ping.bai@nxp.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20260430-imx93_tmu-v6-1-485459d7b54f@nxp.com

thermal/drivers/spacemit/k1: Add thermal sensor support

The thermal sensor on K1 supports monitoring five temperature zones.
The driver registers these sensors with the thermal framework
and supports standard operations:
- Reading temperature (millidegree Celsius)
- Setting high/low thresholds for interrupts

Signed-off-by: Shuwei Wu <shuwei.wu@mailbox.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Anand Moon <linux.amoon@gmail.com>
Reviewed-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Reviewed-by: Yao Zi <me@ziyao.cc>
Tested-by: Anand Moon <linux.amoon@gmail.com>
Tested-by: Vincent Legoll <legoll@online.fr> # OrangePi-RV2
Tested-by: Gong Shuai <gsh517025@gmail.com>
Link: https://patch.msgid.link/20260427-k1-thermal-v5-2-df39187480ed@mailbox.org

dt-bindings: thermal: Add SpacemiT K1 thermal sensor

Document the SpacemiT K1 Thermal Sensor, which supports
monitoring temperatures for five zones: soc, package, gpu, cluster0,
and cluster1.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Shuwei Wu <shuwei.wu@mailbox.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.or>
Link: https://patch.msgid.link/20260427-k1-thermal-v5-1-df39187480ed@mailbox.org

thermal/drivers/imx: Do not split quoted string across lines

The checkpatch tool warns against splitting quoted strings across
multiple lines. Join the dev_info message into a single line to
improve the ability to grep for the message in the source.

Signed-off-by: Mayur Kumar <kmayur809@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Link: https://patch.msgid.link/20260511174255.215207-1-kmayur809@gmail.com

thermal/of: Fix trailing whitespace and repeated word

Correct a trailing whitespace error on line 101 and remove a
duplicated "which" in the kernel-doc comment for thermal_of_zone_register.

Signed-off-by: Mayur Kumar <kmayur809@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Link: https://patch.msgid.link/20260511161854.193573-1-kmayur809@gmail.com

thermal/drivers/qcom/tsens: Atomic temperature read with hardware-guided retries

The existing TSENS temperature read logic polls the valid bit and then
reads the temperature register. When temperature reads are triggered
at very short intervals, this can race with hardware updates and allow
the temperature field to be read while it is still being updated.

In this case, the valid bit may already be asserted even though the
temperature value is transitioning, resulting in an incorrect reading.

Hardware programming guidelines require the temperature value and the
valid bit to be sampled atomically in the same read transaction. A
reading is considered valid only if the valid bit is observed set in
that same sample.

The guidelines further specify that software should attempt the
temperature read up to three times to account for transient update
windows. If none of the attempts yields a valid sample, a stable fallback
value must be returned: if the first and second samples match, the second
value is returned;otherwise, if the second and third samples match, the
third value is returned;if neither pair matches, -EAGAIN is returned.

Update the TSENS sensor read logic to implement atomic sampling along
with the recommended retry-and-compare fallback behavior. This removes
the race window and ensures deterministic temperature values in
accordance with hardware requirements.

Signed-off-by: Priyansh Jain <priyansh.jain@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://patch.msgid.link/20260514113643.1954111-1-priyansh.jain@oss.qualcomm.com

dt-bindings: thermal: qcom-tsens: Document the Hawi Temperature Sensor

Document the Temperature Sensor (TSENS) on the Qualcomm Hawi SoC.

Signed-off-by: Dipa Ramesh Mantre <dipa.mantre@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260512-dtbinding-hawi-v1-1-96149d06cccf@oss.qualcomm.com

dt-bindings: thermal: qcom-tsens: Document the Shikra Temperature Sensor

Document the Temperature Sensor (TSENS) on the Shikra SoC.

Signed-off-by: Gaurav Kohli <gaurav.kohli@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260513-tsens_binding-v1-1-1780c6a6caf2@oss.qualcomm.com

thermal/drivers/qcom: Fix typo in comment

Fix a typo in the struct tsens_irq_data comment.
Replace "uppper" with "upper".

Signed-off-by: Jinseok Kim <always.starving0@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Acked-by: Amit Kucheria <amitk@kernel.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://patch.msgid.link/20260516152324.1863-1-always.starving0@gmail.com

thermal/drivers/amlogic: Add support for secure monitor calibration readout

Some SoCs (e.g. T7) expose thermal calibration data through the secure
monitor rather than a directly accessible eFuse register. Add a use_sm
flag to amlogic_thermal_data to select this path, and retrieve the
firmware handle and tsensor_id from the "amlogic,secure-monitor" DT
phandle with one fixed argument.

Also introduce the amlogic,t7-thermal compatible using this new path.

While refactoring, fix a pre-existing bug where
amlogic_thermal_initialize() was called after
devm_thermal_of_zone_register(), causing the thermal framework to
read an uninitialized trim_info on zone registration.

Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20260424-add-thermal-t7-vim4-v5-4-9040ca36afe2@aliel.fr

thermal/drivers/amlogic: Add missing dependency on MESON_SM

The amlogic thermal driver calls meson_sm_get() and
meson_sm_get_thermal_calib() which are exported by the meson_sm
driver. Without CONFIG_MESON_SM enabled, the build fails with
undefined references to these symbols.

Add a proper Kconfig dependency on MESON_SM instead of relying on
stub functions, which makes the dependency explicit and prevents
invalid configurations.

Closes: https://lore.kernel.org/oe-kbuild-all/202605291530.en7aGn7w-lkp@intel.com/
Reported-by: Mark Brown <broonie@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Link: https://lore.kernel.org/oe-kbuild-all/202605291530.en7aGn7w-lkp@intel.com/
Link: https://patch.msgid.link/20260602-fix-missing-meson_sm-symbol-v3-1-6f7f69cd7d6c@aliel.fr

Merge patch series "super: retire sget()"

Christian Brauner <brauner@kernel.org> says:

CIFS plus the two ext4 KUnit tests (extents-test, mballoc-test) were the
last in-tree callers, and all three convert cleanly to sget_fc(). That
lets sget() and its prototype come out, taking ~60 lines that only
existed to be kept in lockstep with sget_fc() on every publish-path
change.

* patches from https://patch.msgid.link/20260529-work-sget-v2-0-57bbe08604e4@kernel.org:
  fs: retire sget()
  smb: client: convert cifs_smb3_do_mount() to sget_fc()
  ext4: convert mballoc KUnit test to sget_fc()
  ext4: convert extents KUnit test to sget_fc()

Link: https://patch.msgid.link/20260529-work-sget-v2-0-57bbe08604e4@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>

fs: retire sget()

sget() and sget_fc() have lived side by side as near-duplicate
find-or-create-and-publish helpers for the legacy and fs_context mount
APIs. The three remaining in-tree callers (CIFS plus the ext4 extents
and mballoc KUnit tests) have all been moved to sget_fc(). Nothing
calls sget() anymore.

Delete sget() from fs/super.c and the prototype in <linux/fs.h>.
Update the two comments that referred to "sget()" or "sget{_fc}()" to
just say "sget_fc()".

This removes ~60 lines of code that only existed to be kept in
lockstep with sget_fc() on every superblock publish-path change.

Link: https://patch.msgid.link/20260529-work-sget-v2-4-57bbe08604e4@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

smb: client: convert cifs_smb3_do_mount() to sget_fc()

The CIFS mount path already runs through fs_context: smb3_get_tree()
calls smb3_get_tree_common() with a struct fs_context * in hand. But
the fc is dropped on the way to sget(). Plumb it through to sget_fc()
so the legacy sget() interface can go.

cifs_smb3_do_mount() now takes (struct fs_context *, struct
smb3_fs_context *). The old (fs_type, flags) pair is reconstructed
from fc->fs_type and fc->sb_flags. The flags argument was always
passed as 0 by the sole caller anyway. The cifs_dbg diagnostic now
prints fc->sb_flags directly.

cifs_match_super() and cifs_set_super() were the two void-data
callbacks for sget(). The match callback now takes
(struct super_block *, struct fs_context *) and reads struct
cifs_mnt_data out of fc->sget_key. The set callback is gone entirely:
sget_fc() pre-populates sb->s_fs_info from fc->s_fs_info before
invoking set() so set_anon_super_fc() (which just allocates an anon
bdev) is sufficient.

Before sget_fc() we stash cifs_sb in fc->s_fs_info, the per-mount data
in fc->sget_key and force fc->sb_flags to SB_NODIRATIME | SB_NOATIME
to reproduce the previous hard-coded behaviour (alloc_super() reads
fc->sb_flags). The original sb_flags is saved and restored around the
call so the rest of the mount path sees the same fc semantics as
before.

mnt_data.flags keeps its historical value of 0 so the CIFS_MS_MASK
comparison in compare_mount_options() returns the same (always-equal)
result.

No functional change. With this in place sget() has no remaining CIFS
caller.

Link: https://patch.msgid.link/20260529-work-sget-v2-3-57bbe08604e4@kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

ext4: convert mballoc KUnit test to sget_fc()

Same treatment as the extents KUnit test. The mballoc test uses sget()
as a thin "give me an initialized superblock" wrapper for a fake
file_system_type. Move it onto sget_fc() so sget() can go away.

Add a no-op mbt_init_fs_context() so fs_context_for_mount() has
something to call on the fake fs_type. mbt_set() now takes a struct
fs_context * (still a no-op). mbt_ext4_alloc_super_block() allocates
the fc, hands it to sget_fc() and drops the fc reference once the sb
is published.

No functional change.

Link: https://patch.msgid.link/20260529-work-sget-v2-2-57bbe08604e4@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

ext4: convert extents KUnit test to sget_fc()

The extents KUnit test uses sget() to get an initialized superblock for
its fake file_system_type. sget() predates fs_context and we want to
retire it. Switch this caller over to sget_fc().

Add a no-op ext_init_fs_context() so fs_context_for_mount() has
something to call on the fake fs_type. ext_set() now takes a struct
fs_context * (still a no-op). extents_kunit_init() allocates the fc,
hands it to sget_fc() and drops the fc reference once the sb is
published. sget_fc() does not retain a pointer to it.

No functional change for the test.

Link: https://patch.msgid.link/20260529-work-sget-v2-1-57bbe08604e4@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

dma-mapping: direct: fix missing mapping for THRU_HOST_BRIDGE segments

In dma_direct_map_sg(), the case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE
incorrectly used 'break' instead of falling through to MAP_NONE.
As a result, segments traversing the host bridge skipped the required
dma_direct_map_phys() call entirely, leaving sg->dma_address
uninitialized and leading to DMA failures. Fix this by using
'fallthrough;'.

Fixes: a25e7962db0d79 ("PCI/P2PDMA: Refactor the p2pdma mapping helpers")
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260603013723.2439-1-lirongqing@baidu.com

wifi: rtw89: usb: add serial_number and uuid sysfs attributes for 0x28de:0x2432

Expose the device's Serial Number (SN) and UUID from EFUSE via two
read-only sysfs attributes, `serial_number` and `uuid`, on the ieee80211
phy device under the `rtw89_usb` attribute group.

This hardware identification information is essential for user-space
applications to uniquely identify, track, and manage specific Wi-Fi
adapters. For example, in automated factory provisioning or device
management systems, user-space tools rely on the EFUSE serial number and
UUID to bind configurations to specific physical adapters. Currently,
standard wireless APIs do not expose this low-level hardware
information, making these sysfs nodes the only viable solution for
user space to extract this data.

The attributes are gated behind a new RTW89_QUIRK_HW_INFO_SYSFS quirk,
enabled only for the VID 0x28de / PID 0x2432 device via the
dev_id_quirks field in rtw89_driver_info.

Example usage from user-space:
  $ cat /sys/class/ieee80211/phy0/rtw89_usb/serial_number
  3642000123
  $ cat /sys/class/ieee80211/phy0/rtw89_usb/uuid
  aaec2b7c-0a55-4727-8de0-b30febccbbaa

Cc: Elliot Saba <sabae@valvesoftware.com>
Cc: Charles Lohr <charlesl@valvesoftware.com>
Signed-off-by: Johnson Tsai <wenjie.tsai@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20260529075032.16807-3-pkshih@realtek.com

wifi: rtw89: add dev_id_quirks to driver_info for per-device quirk control

Add a dev_id_quirks field to rtw89_driver_info so that per-device
(VID/PID) quirks can be expressed independently of chip-level
default_quirks. Apply the bitmap in rtw89_alloc_ieee80211_hw() so
both USB and PCI probes benefit automatically.

All existing driver_info structs initialize dev_id_quirks to 0;
no behavior change.

Signed-off-by: Johnson Tsai <wenjie.tsai@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20260529075032.16807-2-pkshih@realtek.com

wifi: rtw89: usb: skip ACPI capability check for USB devices

Skip the ACPI capability check for all USB devices by default,
allowing them to use their default configurations.

For USB dongles, customers will manage their own compliance and
certification. This initial patch focuses on the generic USB skip
infrastructure; specific customer certifications and localized
configurations will be handled by quirks afterward.

Signed-off-by: David Lee <sc.lee@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20260525082636.31105-1-pkshih@realtek.com

Merge tag 'amd-drm-next-7.2-2026-05-29' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-7.2-2026-05-29:

amdgpu:
- GEM_OP warning fix
- GEM_OP locking fix
- Userq fixes
- DCN 2.1 refclk fix
- SI fixes
- HMM fixes
- Add DC KUNIT tests
- UML fixes
- Switch to system_dfl_wq
- Old DC power state cleanup
- RAS fixes

amdkfd:
- svm_range_set_attr locking fix
- CRIU restore fix
- KFD debugger fix

radeon:
- Use struct drm_edid instead of struct edid

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260529214346.2328355-1-alexander.deucher@amd.com

wifi: rtw89: 8851bu: add Mercusys MA60XNB (2c4e:0128)

Add the specific USB device ID which adapter tested fully functional on
Fedora 44 with kernel 7.0.8-200.fc44.x86_64 and linux-firmware
20260410-1.fc44.

Reported-by: Guillermo Servera Negre <guillem@gservera.com>
Tested-by: Guillermo Servera Negre <guillem@gservera.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20260525011728.6836-1-pkshih@realtek.com

Merge tag 'drm-intel-gt-next-2026-05-29' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next

Cross-subsystem Changes:

- Backmerge of drm-next to pull in a commit to revert

Driver Changes:

- Avoid skipping already signaled fence after reset (Sebastian)
- Fix potential UAF in TTM object purge (Janusz)
- Fix refcount underflow in intel_engine_park_heartbeat (Sebastian)

- Drop check for changed VM in EXECBUF (Joonas)
- Revert the "else vma = NULL" patch for being superseded (Joonas)
- Selfest improvements (Janusz, Krzysztof)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patch.msgid.link/ahlc1R5bzJvmBLlZ@jlahtine-mobl

batman-adv: tt: directly retrieve wifi flags of net_device

batadv_tt_local_add() tries to retrieve the wifi flags of an interface to
mark the TT entry as wifi client for the AP isolation feature. In the past,
it was necessary to look up the batadv_hard_iface because the wifi_flags
were stored inside this struct. But with the batadv_wifi_net_devices
rhashtable, it is preferred to directly retrieve the wifi_flags instead of
the indirect route via batadv_hard_iface - which at the end only provides
the net_device (which we used to find the batadv_hard_iface).

This will also be essential when the global batadv_hardif_list is removed
and each lookup via batadv_hardif_get_by_netdev() will require the RTNL
lock.

Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: tt: sync local and global tvlv preparation return values

The batadv_tt_prepare_tvlv_local_data() and
batadv_tt_prepare_tvlv_global_data() functions are supposed to work the
same - just with different sources for the TT entries. But both handled the
return values completely different. The global variant only made sure that
the *tt_len parameter was set to 0 and didn't care about the actual return
value of the function.

The local function never made sure that the *tt_len value was set to some
value when the operation failed. The callers were handling these
differences and made sure that they didn't access the incorrectly
initialized variable.

Sync both function as good as possible to avoid problems with new code
which might not be aware of these differences in the behavior.

Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: prevent ELP transmission interval underflow

batadv_v_elp_start_timer() enqeues a delayed work. The time when it starts
is randomly chosen between (elp_interval - BATADV_JITTER) and
(elp_interval + BATADV_JITTER). The configured elp_interval must therefore
be larger or equal to BATADV_JITTER to avoid that it causes an underflow of
the unsigned integer. If this would happen, then a "fast" ELP interval
would turn into a "day long" delay.

At the same time, it must not be larger than the maximum value the variable
can store.

Cc: stable@kernel.org
Fixes: a10800829040 ("batman-adv: Add elp_interval hardif genl configuration")
Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: bla: annotate lasttime access with READ/WRITE_ONCE

The lasttime field for claim, backbone_gw, and loopdetect tracks the
jiffies value of the most recent activity and is used to detect timeouts.
These accesses are not consistently protected by a lock, so
READ_ONCE/WRITE_ONCE must be used to prevent data races caused by compiler
optimizations.

Cc: stable@kernel.org
Fixes: 23721387c409 ("batman-adv: add basic bridge loop avoidance code")
Signed-off-by: Sven Eckelmann <sven@narfation.org>

dma: map_benchmark: turn dma_sg_map_param buf into a flexible array

The buf pointer was kmalloc_array()'d immediately after the parent
struct allocation, with the count (granule, validated to 1..1024 by
the ioctl) trivially available beforehand. Move buf to the struct
tail as a flexible array member and fold the two allocations into a
single kzalloc_flex(), dropping the kfree(params->buf) in both the
prepare error path and unprepare.

Add __counted_by for extra runtime analysis.

Assisted-by: Claude:Opus-4.7
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Qinxin Xia <xiaqinxin@huawei.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260603031758.290538-1-rosenp@gmail.com

batman-adv: tp_meter: consolidate congestion control variables

Congestion control variables in batadv_tp_sender were scattered across the
struct without clear grouping, making it difficult to reason about which
fields require cwnd_lock (now "cc_lock") protection. These should be
combined in a structure to make it more easily visible which variable
should be read/modified with the cc_lock held.

Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: tp_meter: use locking for all congestion control variables

Some variables used atomic_t for concurrent access while others relied on
cwnd_lock, leading to an inconsistent locking model. This can be simplified
by:

* keeping all congestion control decisions inside the cc_lock
* variables which can be accessed without a lock must use
READ_ONCE/WRITE_ONE

This is only possible, by extracting the congestion control logic from
batadv_tp_recv_ack() into a new helper batadv_tp_handle_ack(). Its
decisions are returned as a batadv_tp_ack_reaction enum value and then
applied by the caller. This separates the algorithm (deciding what to do)
from the mechanism (actually doing it).

Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: tp_meter: split vars into sender and receiver types

The monolithic batadv_tp_vars struct holds fields for both sender and
receiver roles, distinguished only by a runtime enum role. This makes it
easy to accidentally access a field intended for the opposite role, since
neither the compiler nor the type system provide any guard against such
mistakes. The role check also adds unnecessary branching in several code
paths.

Introduce batadv_tp_vars_common to hold fields shared across both roles,
then derive two separate types (sender/receiver) from it. The functions can
operate on them without any ambiguity about the available fields. This also
reduces the memory footprint of receiver sessions, which no longer carry
the substantial sender-only fields.

Care must be taken to prevent concurrent TP sessions between the same two
peers in opposite directions, since sender and receiver sessions are now
tracked in separate lists and a lookup in one list no longer detects a
session in the other.

Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: tp_meter: add only finished tp_vars to lists

When the receiver variables (aka "session") are initialized, then they are
added to the list of sessions before the timer is set up. A RCU protected
reader could therefore find the entry and run mod_setup before
batadv_tp_init_recv() finished the timer initialization.

The same is true for batadv_tp_start(), which must first initialize the
finish_work and the test_length to avoid a similar problem.

Cc: stable@kernel.org
Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation")
Signed-off-by: Sven Eckelmann <sven@narfation.org>