git.ipfire.org Git - thirdparty/linux.git/log

sched/fair: Unify cfs_rq throttling via account_cfs_rq_runtime()

assign_cfs_rq_runtime() during update_curr() sets the resched indicator
and relies on check_cfs_rq_runtime() during pick_next_task() /
put_prev_entity() to throttle the hierarchy once current task is
preempted / blocks.

Per-task throttle, on the other hand, uses throttle_cfs_rq() to simply
propagate the throttle signals, and then relies on task work to
individually throttle the runnable tasks on their way out to the
userspace.

Remove check_cfs_rq_runtime() and unify throttling into
account_cfs_rq_runtime() which only sets the cfs_rq->throttled,
cfs_rq->throttle_count indicators via throttle_cfs_rq() and optionally
adds the task work to the current task (donor) it is on the throttled
hierarchy.

throttle_cfs_rq() requests for sched_cfs_bandwidth_slice() worth of
bandwidth for the current hierarchy that enable it to continue running
uninterrupted when selected. For the rest, it requests a bare minimum of
"1" to ensure some bandwidth is available and pass the
"runtime_remaining > 0" checks once selected.

For SCHED_PROXY_EXEC, a mutex holder cannot exit to userspace without
dropping it first and the mutex_unlock() ensures proxy is stopped before
the mutex handoff which preserves the current semantics for running a
throttled task until it exits to the userspace even if it acts as a
donor.

[ prateek: rebased on tip, comments, commit message. ]

Reviewed-By: Benjamin Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Aaron Lu <ziqianlu@bytedance.com>
Link: https://patch.msgid.link/20260602071005.11942-1-kprateek.nayak@amd.com

sched/fair: Move the throttled tasks to a local list in tg_unthrottle_up()

An update_curr() during the enqueue of throttled task will start
throttling the hierarchy from subsequent commit. This can lead to
tg_throttle_down() seeing non-empty throttled_limbo_list for the cfs_rq
attaching the task from throttled_limbo_list one by one. For example:

     R
     |
     A
    / \
  *B   C
       |
       rq->curr

*B is throttled with tasks on hte limbo list. When the tasks are
unthrottled via tg_unthrottle_up() and entity of group B is placed onto
A, update_curr() is called to catch up the vruntime and it may throttle
group A causing the subsequent tg_throttle_down() to see the pending
task's on B's limbo list.

  tg_unthrottle_up()
    /* --cfs_rq->throttle_count == 0 */
    list_for_each_entry_safe(p, cfs_rq->throttled_limbo_list)
      enqueue_task_fair()
        enqueue_entity(se /* B->se */)
          update_curr(cfs_rq /* A->gcfs_rq */)
            account_cfs_rq_runtime(cfs_rq)
              throttle_cfs_rq(cfs_rq /* A->gcfs_rq */ )
                tg_throttle_down()
                  /* Reaches B->cfs_rq with throttle_count == 0 */

                  !!! !list_empty(&cfs_rq->throttled_limbo_list)) !!!

Move the tasks from throttled_limbo_list onto a local list before
starting the unthrottle to prevent the splat described above. If the
hierarchy is throttled again in middle of an unthrottle, put the pending
tasks back onto the limbo list to prevent running them unnecessarily.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Benjamin Segall <bsegall@google.com>
Tested-by: Aaron Lu <ziqianlu@bytedance.com>
Link: https://patch.msgid.link/20260602052531.11450-2-kprateek.nayak@amd.com

sched/fair: Call update_curr() before unthrottling the hierarchy

Subsequent commits will allow update_curr() to throttle the hierarchy
when the runtime accounting exceeds allocated quota. Call update_curr()
before the unthrottle event, and in tg_unthrottle_up() to catch up on
any remaining runtime and stabilize the "runtime_remaining" and
"throttle_count" for that cfs_rq.

Doing an update_curr() early ensures the cfs_rq is not throttled right
back up again when the unthrottle is in progress.

Since all callers of unthrottle_cfs_rq(), except two, already update the
rq_clock and call rq_clock_start_loop_update(), move the
update_rq_clock() from unthrottle_cfs_rq() to the callers that don't
update the rq_clock.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Benjamin Segall <bsegall@google.com>
Tested-by: Aaron Lu <ziqianlu@bytedance.com>
Link: https://patch.msgid.link/20260602052531.11450-1-kprateek.nayak@amd.com

sched/fair: Use throttled_csd_list for local unthrottle

When distribute_cfs_runtime() encounters a local cfs_rq, it adds it to a
local list and unthrottles it at the end, when it is done unthrottling
other cfs_rq(s) on cfs_b->throttled_cfs_rq until the bandwidth runs out.

Instead of using a local list, reuse the local CPU's
rq->throttled_csd_list and the __cfsb_csd_unthrottle() path for
unthrottle.

If this is the first cfs_rq to be queued on the "throttled_csd_list", it
prevents the need for a remote CPUs to interrupt this local CPU if they
themselves are performing async unthrottle.

If this is not the first cfs_rq on the list, there is an async unthrottle
operation pending on this local CPU and the unthrottle can be batched
together.

No functional changes intended.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Benjamin Segall <bsegall@google.com>
Tested-by: Aaron Lu <ziqianlu@bytedance.com>
Link: https://patch.msgid.link/20260602050005.11160-3-kprateek.nayak@amd.com

sched/fair: Convert cfs bandwidth throttling to use guards

Routine conversion of rcu_read_lock(), spin_lock*, and rq_lock usage
within the cfs bandwidth controller to use class guards.

Only notable changes are:

- Checking for "cfs_rq->runtime_remaining <= 0" instead of the inverse
   to spot a throttle and break early. This also saves the need
   for extra indentation in the unthrottle case.

- Reordering of list_del_rcu() against throttled_clock indicator update
   in unthrottle_cfs_rq(). Both are done with "cfs_b->lock" held after
   the "cfs_rq->throttled" is cleared which make the reordering safe
   against concurrent list modifications.

No functional changes intended.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Tested-by: Aaron Lu <ziqianlu@bytedance.com>
Link: https://patch.msgid.link/20260602050005.11160-2-kprateek.nayak@amd.com

sched/fair: Allocate cfs_tg_state with percpu allocator

To remove the cfs_rq pointer array in task_group, allocate the combined
cfs_rq and sched_entity using the per-cpu allocator.

This patch implements the following:

- Changes task_group->cfs_rq from 'struct cfs_rq **' to
   'struct cfs_rq __percpu *'.

- Updates memory allocation in alloc_fair_sched_group() and
   free_fair_sched_group() to use alloc_percpu() and free_percpu()
   respectively.

- Uses the inline accessor tg_cfs_rq(tg, cpu) with per_cpu_ptr() to retrieve
   the pointer to cfs_rq for the given task group and CPU.

- Replaces direct accesses tg->cfs_rq[cpu] with calls to the new tg_cfs_rq(tg,
   cpu) helper.

- Handles the root_task_group: since struct rq is already a per-cpu variable
   (runqueues), its embedded cfs_rq (rq->cfs) is also per-cpu. Therefore, we
   assign root_task_group.cfs_rq = &runqueues.cfs.

- Cleanup the code in initializing the root task group.

This change places each CPU's cfs_rq and sched_entity in its local per-cpu
memory area to remove the per-task_group pointer arrays.

Signed-off-by: Zecheng Li <zecheng@google.com>
Signed-off-by: Zecheng Li <zli94@ncsu.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Reviewed-by: Josh Don <joshdon@google.com>
Link: https://patch.msgid.link/20260522141623.600235-4-zli94@ncsu.edu

sched/fair: Remove task_group->se pointer array

Now that struct sched_entity is co-located with struct cfs_rq for non-root task
groups, the task_group->se pointer array is redundant. The associated
sched_entity can be loaded directly from the cfs_rq.

This patch performs the access conversion with the helpers:

- is_root_task_group(tg): checks if a task group is the root task group. It
   compares the task group's address with the global root_task_group variable.

- tg_se(tg, cpu): retrieves the cfs_rq and returns the address of the
   co-located se. This function checks if tg is the root task group to ensure
   behaving the same of previous tg->se[cpu]. Replaces all accesses that use
   the tg->se[cpu] pointer array with calls to the new tg_se(tg, cpu) accessor.

- cfs_rq_se(cfs_rq): simplifies access paths like cfs_rq->tg->se[...] to use
   the co-located sched_entity. This function also checks if tg is the root
   task group to ensure same behavior.

Since tg_se is not in very hot code paths, and the branch is a register
comparison with an immediate value (`&root_task_group`), the performance impact
is expected to be negligible.

Signed-off-by: Zecheng Li <zecheng@google.com>
Signed-off-by: Zecheng Li <zli94@ncsu.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Reviewed-by: Josh Don <joshdon@google.com>
Link: https://patch.msgid.link/20260522141623.600235-3-zli94@ncsu.edu

sched/fair: Co-locate cfs_rq and sched_entity in cfs_tg_state

Improve data locality and reduce pointer chasing by allocating struct
cfs_rq and struct sched_entity together for non-root task groups. This
is achieved by introducing a new combined struct cfs_tg_state that
holds both objects in a single allocation.

This patch:

- Introduces struct cfs_tg_state that embeds cfs_rq, sched_entity, and
   sched_statistics together in a single structure.

- Updates __schedstats_from_se() in stats.h to use cfs_tg_state for accessing
   sched_statistics from a group sched_entity.

- Modifies alloc_fair_sched_group() and free_fair_sched_group() to allocate
   and free the new struct as a single unit.

- Modifies the per-CPU pointers in task_group->se and task_group->cfs_rq to
   point to the members in the new combined structure.

Signed-off-by: Zecheng Li <zecheng@google.com>
Signed-off-by: Zecheng Li <zli94@ncsu.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Reviewed-by: Josh Don <joshdon@google.com>
Link: https://patch.msgid.link/20260522141623.600235-2-zli94@ncsu.edu

sched: restore timer_slack_ns when resetting RT policy on fork

Commit ed4fb6d7ef68 ("hrtimer: Use and report correct timerslack values
for realtime tasks") sets timer_slack_ns to 0 for RT tasks in
__setscheduler_params(). However, when an RT task with SCHED_RESET_ON_FORK
creates child threads, the children inherit timer_slack_ns=0 from the
parent. sched_fork() resets the child's policy to SCHED_NORMAL but does
not restore timer_slack_ns, leaving the child permanently running with
zero slack.

Fix this by restoring timer_slack_ns from default_timer_slack_ns in
sched_fork() when resetting from RT/DL to NORMAL policy, matching the
existing behavior in __setscheduler_params().

Note: this fix alone requires a correct default_timer_slack_ns to be
effective. See the following patch for that fix.

Fixes: ed4fb6d7ef68 ("hrtimer: Use and report correct timerslack values for realtime tasks")
Reported-by: Qiaoting.Lin <linqiaoting@xiaomi.com>
Signed-off-by: Guanyou.Chen <chenguanyou@xiaomi.com>
Signed-off-by: Chunhui.Li <chunhui.li@mediatek.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260522131000.1664983-2-chenguanyou@xiaomi.com

MAINTAINERS: Fix spelling mistake in Peter's name

Fix a typo in Peter's name which was added by commit 113d0a6b3954
("MAINTAINERS: Add Peter explicitly to the psi section").

Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260530160842.29089-1-zenghui.yu@linux.dev

sched: Simplify ttwu_runnable()

Note that both proxy and delayed tasks have ->is_blocked set. Use this one
condition to guard both paths.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260526113322.714832584%40infradead.org

sched/proxy: Remove superfluous clear_task_blocked_in()

Per the discussion here:

  https://lore.kernel.org/all/20260403112810.GG3738786@noisy.programming.kicks-ass.net/

The reason for this condition is that the signal condition in
try_to_block_task() would set_task_blocked_in_waking(). However, it no longer
does that, in fact, that path does clear_task_blocked_on().

Further, per the discussions here:

  https://lore.kernel.org/r/dc61cf77-e541-441d-a708-c40e19aa0db2%40amd.com
  https://lore.kernel.org/r//9dd1d24d-45d3-4ee2-8e67-8305b34bfb6d%40amd.com

there are a few other edge cases that needed this. But they're all
variants of PROXY_WAKING leaking out. And since PROXY_WAKING is now
gone, this is no longer needed either.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Stultz <jstultz@google.com>
Link: https://patch.msgid.link/20260526113322.120970670%40infradead.org

sched/proxy: Remove PROXY_WAKING

Now that the proxy path uses ->is_blocked, use the '->is_blocked &&
!->blocked_on' state instead of PROXY_WAKING. Notably, this is where a
blocked_on relation is broken but the donor task might still need a return
migration.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260526113322.596522894%40infradead.org

sched/proxy: Switch proxy to use p->is_blocked

Rather than gate the proxy paths with p->blocked_on, use p->is_blocked.

This opens up the state: '->is_blocked && !->blocked_on' for future use.

Notably, only proxy and delayed tasks can be ->on_rq && ->is_blocked, and it is
guaranteed that sched_class::pick_task() will never return a delayed task.
Therefore any task returned from pick_next_task() that has ->is_blocked set,
must be a proxy task.

Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260526113322.477954312%40infradead.org

sched/proxy: Only return migrate when needed

Current code will 'unconditionally' return migrate on PROXY_WAKING, even if the
task is (still) on the original CPU.

Check task_cpu(p) against p->waking_cpu, which per proxy_set_task_cpu()
preserves the original CPU the task was on. If they do not mis-match, there is
no need to go through the more expensive wakeup path.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260527082916.GP3126523%40noisy.programming.kicks-ass.net

sched: Be more strict about p->is_blocked

Upon entry to try_to_block_task(), p->is_blocked should be false. After all,
the prior wakeup would have made it so per ttwu_do_wakeup().

Ensure this is the case, rather than clearing it in the path that doesn't set
it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Stultz <jstultz@google.com>
Link: https://patch.msgid.link/20260526113322.364017314%40infradead.org

sched/proxy: Optimize try_to_wake_up()

The reason for the clause in try_to_wake_up() is, per its comment, that
find_proxy_task()'s proxy_deactivate() is not always called with a cleared
p->blocked_on.

However, that seems silly and easily cured. Make sure to always call
proxy_deactivate() with a cleared p->blocked_on such that we might remove this
clause from the common wake-up path.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Stultz <jstultz@google.com>
Link: https://patch.msgid.link/20260526113322.244729903%40infradead.org

sched: Add blocked_donor link to task for smarter mutex handoffs

Add link to the task this task is proxying for, and use it so
the mutex owner can do an intelligent hand-off of the mutex to
the task that the owner is running on behalf.

[jstultz: This patch was split out from larger proxy patch]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Connor O'Brien <connoro@google.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260512025635.2840817-8-jstultz@google.com

sched: Add is_blocked task flag

Add a new is_blocked flag to the task struct. This flag is set
by try_to_block_task() and cleared by ttwu_do_wakeup() and
tracks if the task is blocked.

Traditionally this would mirror !p->on_rq, however due things
like DELAY_DEQUEUE and PROXY_EXEC, this can diverge, so its
useful to manage separately.

Additionally with this, we might be able to get rid of the
p->se.sched_delayed (ab)use in the core code (eventually).

Taken whole cloth from Peter's email:
https://lore.kernel.org/lkml/20260501132143.GC1026330@noisy.programming.kicks-ass.net/

With a few additional p->is_blocked = 0 in a few cases where
we return current if blocked_on gets zeroed or there is
no owner. This may hint that these current special cases
might be dropped eventually.

This change also helps resolve wait-queue stalls seen with
proxy-execution. See previous patch attempts for details:
https://lore.kernel.org/lkml/20260430215103.2978955-2-jstultz@google.com/

Reported-by: Vineeth Pillai <vineethrp@google.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260512025635.2840817-7-jstultz@google.com

sched: Have try_to_wake_up() handle return-migration for PROXY_WAKING case

This patch adds logic so try_to_wake_up() will notice if we are
waking a task where blocked_on == PROXY_WAKING, and if necessary
dequeue the task so the wakeup will naturally return-migrate the
donor task back to a cpu it can run on.

This helps performance as we do the dequeue and wakeup under the
locks normally taken in the try_to_wake_up() and avoids having
to do proxy_force_return() from __schedule(), which has to
re-take similar locks and then force a pick again loop.

This was split out from the larger proxy patch, and
significantly reworked.

Credits for the original patch go to:
  Peter Zijlstra (Intel) <peterz@infradead.org>
  Juri Lelli <juri.lelli@redhat.com>
  Valentin Schneider <valentin.schneider@arm.com>
  Connor O'Brien <connoro@google.com>

Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260512025635.2840817-6-jstultz@google.com

sched: Rework block_task so it can be directly called

Pull most of the logic out of try_to_block_task() and put it
into block_task() directly, so that we can call block_task() and
not have to worry about the failing cases in try_to_block_task()

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260512025635.2840817-5-jstultz@google.com

sched: deadline: Add dl_rq->curr pointer to address issues with Proxy Exec

The DL scheduler keeps the current task in the rbtree, since
the deadline value isn't usually chagned while the task is
runnable.

This results in set_next_task() and put_prev_task() being
simpler, but unfortunately this causes complexity elsewhere.

Specifically when update_curr_dl() updates the deadline, it has
to dequeue and then enqueue the task.

From put_prev_task_dl(), we first call update_curr_dl(), and
then call enqueue_pushable_dl_task().

However, with Proxy Exec this goes awry. Since when a mutex is
released, we might wake the waiting rq->donor. This will cause
put_prev_task() to be called on the donor to take it off the
cpu for return migration.

At that point, from put_prev_task_dl() the update_curr_dl()
logic will dequeue & enqueue the task, and the enqueue function
will call enqueue_pushable_dl_task() (since the task_current()
check won't prevent it). Then back up the callstack in
put_prev_task_dl() we'll end up calling
enqueue_pushable_dl_task() again, tripping the
!RB_EMPTY_NODE(&p->pushable_dl_tasks) warning.

So to avoid this, use Peter's suggested[1] approach, and add a
dl_rq->curr pointer that is set/cleared from set_next_task()/
put_prev_task(), which effectively tracks the rq->donor. We can
then use this to avoid adding the active donor to the pushable
list from enqueue_task_dl().

[1]: https://lore.kernel.org/lkml/20260304095123.GP606826@noisy.programming.kicks-ass.net/

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260512025635.2840817-4-jstultz@google.com

sched: deadline: Add some helper variables to cleanup deadline logic

As part of an improvement to handling pushable deadline tasks,
Peter suggested this cleanup[1], to use helper values for
dl_entity and dl_rq in the enqueue_task_dl() and
put_prev_task_dl() functions. There should be no functional
change from this patch.

To make sure this cleanup change doesn't obscure later logic
changes, I've split it into its own patch.

[1]: https://lore.kernel.org/lkml/20260304095123.GP606826@noisy.programming.kicks-ass.net/

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260512025635.2840817-3-jstultz@google.com

sched: Rework prev_balance() to avoid stale prev references

Historically, the prev value from __schedule() was the rq->curr.
This prev value is passed down through numerous functions, and
used in the class scheduler implementations. The fact that
prev was on_cpu until the end of __schedule(), meant it was
stable across the rq lock drops that the class->balance()
implementations often do.

However, with proxy-exec, the prev passed to functions called
by __schedule() is rq->donor, which may not be the same as
rq->curr and may not be on_cpu, this makes the prev value
potentially unstable across rq lock drops.

A recently found issue with proxy-exec, is when we begin doing
return migration from try_to_wake_up(), its possible we may be
waking up the rq->donor. When we do this, we proxy_resched_idle()
to put_prev_set_next() setting the rq->donor to rq->idle, allowing
the rq->donor to be return migrated and allowed to run.

This however runs into trouble, as on another cpu we might be in
the middle of calling __schedule(). Conceptually the rq lock is
held for the majority of the time, but in calling prev_balance()
its possible the class->balance() handler call may briefly drop the rq lock.
This opens a window for try_to_wake_up() to wake and return migrate the
rq->donor before the class logic reacquires the rq lock.

Unfortunately prev_balance() pass in a prev argument, to which we pass
rq->donor. However this prev value can now become stale and incorrect across a
rq lock drop.

So, to correct this, rework the prev_balance() call so that it does not take a
"prev" argument.

Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260512025635.2840817-2-jstultz@google.com

locking: mutex: Fix proxy-exec potentially deactivating tasks marked TASK_RUNNING

Vineeth found came up with a test driver that could trip up
workqueue stalls. After fixing one issue this test found,
Vineeth reported the test was still failing.

Greatly simplified, a task that tries to take a mutex already
owned by another task that is sleeping, can hit a edge case in
the mutex_lock_common() case.

If the task fails to get the lock, calls into schedule, but gets
a spurious wakeup, it will find that it is first waiter, and
go into the mutex_optimistic_spin() logic. Though before calling
mutex_optimistic_spin(), we clear task blocked_on state, since
mutex_optimistic_spin() may call schedule() if need_resched() is
set.

After mutex_optimistic_spin() fails, we set blocked_on again,
restart the main mutex loop, try to take the lock and call into
schedule_preempt_disabled().

From there, with proxy-execution, we'll see the task is
blocked_on, follow the chain, see the owner is sleeping and
dequeue the waiting task from the runqueue.

This all sounds fine and reasonable. But what I had missed is
that in mutex_optimistic_spin(), not only do we call schedule()
but we set TASK_RUNNABLE right before doing so.

This is ok for that invocation of schedule(). But when we come
back we re-set the blocked_on we had just cleared, but we do not
re-set the task state to TASK_INTERRUPTIBLE/UNINTERRUPTIBLE.

This means we have a task that is blocked_on & TASK_RUNNABLE,
so when the proxy execution code dequeues the task, we are
in trouble since future wakeups will be shortcut by the
ttwu_state_match() check.

Thus, to avoid this, after mutex_optimistic_spin(), set the task
state back when we set blocked_on.

Many many thanks again to Vineeth for his very useful testing
driver that uncovered this long hidden bug, that I hadn't
tripped in all my testing! Very impressed with the problems he's
uncovered!

Reported-by: Vineeth Pillai <vineethrp@google.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Vineeth Pillai <vineethrp@google.com>
Link: https://patch.msgid.link/20260430215103.2978955-3-jstultz@google.com

Merge branch 'tip/sched/urgent'

Pick up urgent fixes.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>

drm/tyr: use IoMem directly instead of Devres

Now that IoMem is lifetime-parameterized, use it directly in probe
rather than wrapping it in Devres and Arc. The I/O memory mapping is
only used during probe and not stored in driver data, so device-managed
revocation is unnecessary.

This removes the Devres access(dev) pattern from issue_soft_reset(),
GpuInfo::new(), and l2_power_on(), simplifying register access.

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tested-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Link: https://patch.msgid.link/20260529000106.2257996-3-dakr@kernel.org
Signed-off-by: Alice Ryhl <aliceryhl@google.com>

drm/tyr: separate driver type from driver data

Introduce TyrPlatformDriver as a unit struct for the platform::Driver
trait implementation and keep TyrPlatformDriverData for the private
driver data.

Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20260529000106.2257996-2-dakr@kernel.org
Signed-off-by: Alice Ryhl <aliceryhl@google.com>

nvme-multipath: set BIO_REMAPPED on bios remapped to per-path namespace disks

When nvme_ns_head_submit_bio() remaps a bio from the multipath head to a
per-path namespace, bio_set_dev() clears BIO_REMAPPED.  The remapped bio
is then resubmitted through submit_bio_noacct() which calls
bio_check_eod() because BIO_REMAPPED is not set.

This races with nvme_ns_remove() which zeroes the per-path capacity
before synchronize_srcu():

  CPU 0 (IO submission)
  ---------------------
  srcu_read_lock()
  nvme_find_path() -> ns
    [NVME_NS_READY is set]

  CPU 1 (namespace removal)
  -------------------------
  clear_bit(NVME_NS_READY)
  set_capacity(ns->disk, 0)
  synchronize_srcu()  <- blocks

  CPU 0 (IO submission)
  ---------------------
  bio_set_dev(bio, ns->disk->part0)
    [clears BIO_REMAPPED]
  submit_bio_noacct(bio)
    -> bio_check_eod() sees capacity=0
    -> bio fails with IO error

The SRCU read lock prevents synchronize_srcu() from completing, but does
not prevent set_capacity(0) from executing.  The bio fails the EOD check
before it reaches the NVMe driver, so nvme_failover_req() never gets a
chance to redirect it to another path of multipath.  IO errors are
reported to the application despite another path being available.

On older kernels (before commit 0b64682e78f7 "block: skip unnecessary
checks for split bio"), the same race was also reachable through split
remainders resubmitted via submit_bio_noacct().

Fix this by setting BIO_REMAPPED after bio_set_dev() in
nvme_ns_head_submit_bio().  This skips bio_check_eod() on the per-path
device; the EOD check already passed on the multipath head.

NVMe per-path namespace devices are always whole disks (bd_partno=0), so
the blk_partition_remap() skip also gated by BIO_REMAPPED is a no-op.
The flag does not persist across failover and cannot go stale if the
namespace geometry changes between attempts: nvme_failover_req() calls
bio_set_dev() to redirect the bio back to the multipath head, which
clears BIO_REMAPPED.  When nvme_requeue_work() resubmits through
submit_bio_noacct(), bio_check_eod() runs normally against the current
capacity.

Same approach as commit 3a905c37c351 ("block: skip bio_check_eod for
partition-remapped bios").

Fixes: a7c7f7b2b641 ("nvme: use bio_set_dev to assign ->bi_bdev")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Igor Achkinazi <igor.achkinazi@dell.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

xfrm: iptfs: fix use-after-free on first_skb in __input_process_payload

__input_process_payload() stores first_skb into xtfs->ra_newskb under
drop_lock when starting partial reassembly, then unlocks and breaks out
of the processing loop. The post-loop check reads xtfs->ra_newskb
without the lock to decide whether first_skb is still owned:

if (first_skb && first_iplen && !defer && first_skb != xtfs->ra_newskb)

Between spin_unlock and this read, a concurrent CPU running
iptfs_reassem_cont() (or the drop_timer hrtimer) can complete
reassembly, NULL xtfs->ra_newskb, and free the skb. The check then
evaluates first_skb != NULL as true, and pskb_trim/ip_summed/consume_skb
operate on the freed skb — a use-after-free in skbuff_head_cache.

Replace the unlocked read with a local bool that records whether
first_skb was handed to the reassembly state in the current call. The
flag is set after the existing spin_unlock, before the break, using the
pointer equality that is stable at that point (first_skb == skb iff
first_skb was stored in ra_newskb).

Fixes: 3f3339885fb3 ("xfrm: iptfs: add reusing received skb for the tunnel egress packet")
Signed-off-by: Zhenghang Xiao <kipreyyy@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

nvme-multipath: require exact iopolicy names for module parameter

The iopolicy module parameter uses strncmp prefix matching, so values
like "numax" are accepted as "numa". The per-subsystem sysfs attribute
already requires an exact match via sysfs_streq(). Parse both through
a shared helper so invalid values are rejected consistently.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: liyouhong <liyouhong@kylinos.cn>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvme-multipath: pass NS head to nvme_mpath_revalidate_paths()

In nvme_mpath_revalidate_paths(), we are passed a NS pointer and use that
to lookup the NS head and then use that same NS pointer as an iter variable.

It makes more sense pass the NS head and use a local variable for the NS
iter.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

USB: serial: io_ti: fix heap overflow in build_i2c_fw_hdr()

build_i2c_fw_hdr() allocates a fixed-size buffer of
(16*1024 - 512) + sizeof(struct ti_i2c_firmware_rec) bytes, then
copies le16_to_cpu(img_header->Length) bytes into it without
validating that Length fits within the available space after the
firmware record header.

img_header->Length is a __le16 from the firmware file and can be
up to 65535. check_fw_sanity() validates the total firmware size
but not img_header->Length specifically.

Fix by rejecting images where img_header->Length exceeds the
available destination space.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Korwel <adriank20047@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>

USB: serial: io_ti: fix heap overflow in get_manuf_info()

get_manuf_info() reads le16_to_cpu(rom_desc->Size) bytes from the
device I2C EEPROM into a buffer allocated with kmalloc_obj(), which
is sizeof(struct edge_ti_manuf_descriptor) = 10 bytes.

The Size field comes from the device and is only validated (in
check_i2c_image()) to make sure the descriptor fits within
TI_MAX_I2C_SIZE (16384 bytes), not against the destination buffer size.
A malicious USB device can therefore set Size to any value up to 16377,
causing a heap overflow of up to 16367 bytes when plugged into a host
running this driver.

valid_csum() is called after read_rom() and also iterates
buffer[0..Size-1], compounding the out-of-bounds access.

Fix by rejecting descriptors with unexpected length before calling
read_rom().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Korwel <adriank20047@gmail.com>
[ johan: amend commit message; also check for short descriptors ]
Signed-off-by: Johan Hovold <johan@kernel.org>

ata: pata_ep93xx: add COMPILE_TEST support

Now that the build failures have been fixed, we can add COMPILE_TEST so
the buildbots can find potentially more problems.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>

ata: pata_ep93xx: use unsigned long for data

An int is being encoded as a void pointer but that breaks on 64-bit
systems as the type needs to match pointer size.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>

ata: pata_ep93xx: avoid asm on non ARM

The raw ARM asm delay loop prevents COMPILE_TEST builds on
non-ARM architectures. Guard it with CONFIG_ARM and provide a
cpu_relax() fallback for compilation on other architectures.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>

pps: Convert to ktime_get_snapshot_id()

ktime_get_snapshot() resolves to ktime_get_snapshot_id(CLOCK_REALTIME).

Make it obvious in the code and convert the readout to use the
snapshot::systime and monoraw fields instead of snapshot::real and raw,
which aregoing away.

Similar to the PPS generators, avoid the more expensive snapshot when
CONFIG_NTP_PPS is disabled.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.123410250@kernel.org

pps: generators: Use ktime_get_real_ts64() instead of ktime_get_snapshot()

There is no reason to use the more complex ktime_get_snapshot() for
retrieving CLOCK_REALTIME.

Just use ktime_get_real_ts64(), which avoids the extra timespec64
conversion as a bonus.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Rodolfo Giometti <giometti@enneenne.com>
Link: https://patch.msgid.link/20260529195557.074439049@kernel.org

timekeeping: Use system_time_snapshot::systime/monoraw instead of ::real/raw

system_time_snapshot::systime provides the same information as
system_time_snapshot::real when the snapshot was taken with
ktime_get_snapshot_id(CLOCK_REALTIME).

Convert the history interpolation over to use 'systime' and 'monoraw' as
'real/raw' are going away once all users are converted.

As a side effect this is the first step to support CLOCK_AUX with
get_device_crosstime_stamp() and the history interpolation.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.024415766@kernel.org

timekeeping: Provide ktime_get_snapshot_id()

ktime_get_snapshot() provides a snapshot of the underlying clocksource
counter value and the corresponding CLOCK_MONOTONIC_RAW, CLOCK_REALTIME and
CLOCK_BOOTTIME timestamps.

There is no usage of CLOCK_REALTIME and CLOCK_BOOTTIME at the same time and
CLOCK_BOOTTIME support was just added for the ARM64 KVM tracing mechanism,
which needs CLOCK_BOOTTIME and the underlying clocksource counter value.

ktime_get_snapshot() is also not suitable for usage with CLOCK_AUX, but
that's a prerequisite to support PTP hardware timestamping for CLOCK_AUX
steering.

As a first step, rename ktime_get_snapshot() to ktime_get_snapshot_id(),
which now takes a clockid argument to select the clock which needs to be
captured. The result is stored in system_time_snapshot::systime, which will
replace the system_time_snapshot::real/boot members once all usage sites
have been converted.

ktime_get_snapshot() is a simple wrapper which hands in CLOCK_REALTIME as
clockid argument for the conversion period. That means CLOCK_REALTIME is
now captured twice, but that redunancy is only temporary.

As all usage sites of struct system_time_snapshot has to be updated anyway,
rename the 'raw' member to 'monoraw' for clarity.

No functional change vs. current users of ktime_get_snapshot()

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195556.971591633@kernel.org

ext2: fix ignored return value of generic_write_sync()

Fix ext2_dio_write_iter() to propagate the error returned by
generic_write_sync() instead of silently discarding it, which could
cause write(2) to return success to userspace on O_SYNC/O_DSYNC files
even when the sync failed.

The correct pattern, already used in ext2_dax_write_iter() in the same
file and in ext4, xfs, f2fs among others, is:
if (ret > 0)
ret = generic_write_sync(iocb, ret);

Found by Linux Verification Center (linuxtesting.org) with SVACE.

[JK: Reflect also filemap_write_and_wait() return value]

Fixes: fb5de4358e1a ("ext2: Move direct-io to use iomap")
Signed-off-by: Danila Chernetsov <listdansp@mail.ru>
Link: https://patch.msgid.link/20260530122311.136803-1-listdansp@mail.ru
Signed-off-by: Jan Kara <jack@suse.cz>

Merge tag 'qcom-arm64-for-7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/dt

Qualcomm Arm64 DeviceTree updates for v7.2

Introduce the Qualcomm IPQ9650 router/gateway platform and the RDP488
board. Add support for the Motorola Edge 30 and the Nothing Phone.

Describe the IPA block on the Agatti platform and missing OPP-levels for
the video encoder/decoder.

For Eliza, describe the QUP Serial Engines, GPI DMA, SDHCI, LLCC, IMEM,
QCE crypto, ADSP remoteproc and USB nodes. Enable DSI panel,
DisplayPort, USB, and ADSP support on the Eliza MTP.

On Glymur enable ADSP and CDSP remoteprocs, FastRPC, crypto hardware,
CPUfreq cooling devices, and coresight nodes. Enable the remoteprocs and
the LID sensor on the Glymur CRD.

Describe the CAN-FD controller found on the Hamoa EVK. Correct the
DisplayPort controller OPP tables.

Describe the watchdog on IPQ5210 and IPQ9650.

Describe USB controller and PHYs for the Kaanapali platform and enable
basic USB support on the MTP and QRD devices.

Enable the second display subsystem on Lemans and use this to enable
additional DisplayPort outputs on the Lemans Ride board, and IFP
mezzanine for the EVK. Also enable the GPIO expander on the Lemans EVK
to get the CAN signals out.

Add crypto hardware and qfprom nodes on Milos. Reduce the remotefs
shared memory size to avoid sanity checks in the modem firmware
rejecting the region.
Enable the vibrator on FairPhone FP6.

Add GPSDP FastRPC support on Monaco, and describe the Bluetooth
controller on the Arduino VENTUNO Q board.

Introduce an EL2 overlay for the Purwa IoT EVK.

Enable CAN bus controller on QCS6490 RB3gen2 and add a remotefs node.

Enable FastRPC on the SC8280XP ADSP.

Correct SDM630 and SDM660 ADSP FastRPC channel ids. Also add the ADSP
memory region on SDM630.

On SDM845 devices, enable NFC on Google Pixel 3, OnePlus 6, OnePlus 6T,
and SHIFT SHIFT6mq. Enable camera flash on LG devices. Rework the
framebuffer description on Samsung, SHIFT and Xiaomi devices. Enable
camera flash on LG devices. Fix Bluetooth and WiFi on LG and Xiaomi
devices.

Enable MDSS and the display panel on Xiaomi Mi A3.

Scale L3 and DDR clock votes based on CPUfreq selection.

Enable camera clock controller, cpufreq cooling devices, and correct the
DSI1 reference clock on SM8750.

On the Talos platform, describe the QSPI support, GPR and audio
services, and enable sound on the EVK target. Enable QSPI and describe
the SPINOR on this bus, on the QCS615 Ride.

Describe power-domain and iface clock for the Inline Crypto Engine (ICE)
across various platforms.

Fix the Bluetooth RFA supply name across a variety of devices.

* tag 'qcom-arm64-for-7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (131 commits)
  arm64: dts: qcom: add support for pixel 3a xl with the tianma panel
  arm64: dts: qcom: sdm670-google: add common device tree include
  arm64: dts: qcom: hamoa-iot-evk: add MCP2518FD CAN on spi18
  arm64: dts: qcom: sm8750: allow mode-switch events to reach the QMP Combo PHY
  arm64: dts: qcom: sc8280xp: drop unused polling-delay-passive properties
  arm64: dts: qcom: ipq5210: add watchdog node
  arm64: dts: qcom: sdm845-xiaomi-beryllium: Correct IPA FW path
  arm64: dts: qcom: monaco-arduino-monza: Add Bluetooth UART node
  arm64: dts: qcom: glymur: Add qfprom efuse node
  arm64: dts: qcom: milos: Add qfprom efuse node
  arm64: dts: qcom: glymur: add coresight nodes
  arm64: dts: qcom: qcs6490-rb3gen2: add rmtfs node
  arm64: dts: qcom: lemans-evk: Enable CAN RX via I2C GPIO expander
  arm64: dts: qcom: glymur: Fix wrong interrupt number for i2c19
  arm64: dts: qcom: Drop unused remoteproc_adsp_glink label
  arm64: dts: qcom: lemans: Add eDP ref clock for eDP PHYs
  arm64: dts: qcom: sm8750: Add power-domain and iface clk for ice node
  arm64: dts: qcom: sm8650: Add power-domain and iface clk for ice node
  arm64: dts: qcom: sm8550: Add power-domain and iface clk for ice node
  arm64: dts: qcom: sm8450: Add power-domain and iface clk for ice node
  ...

Signed-off-by: Linus Walleij <linusw@kernel.org>

iommu/amd: Don't split flush for amd_iommu_domain_flush_all()

We have observed multiple full invalidations occurring during device
detach when we are done using the vfio-device.

blocked_domain_attach_device()
  -> detach_device()
    -> amd_iommu_domain_flush_all()
      -> amd_iommu_domain_flush_pages(..., CMD_INV_IOMMU_ALL_PAGES_ADDRESS)

       while (size != 0) {

          -> __domain_flush_pages( flush_size /* power of 2 flush_size */)
            -> domain_flush_pages_v1()
              -> build_inv_iommu_pages()
                -> build_inv_address()

         }

build_inv_address() will trigger a full invalidation  if the chunk
size > (1 << 51). Consequently, the guest will issue multiple full
invalidations for a single call to  amd_iommu_domain_flush_all()

Without this patch, we will see 10 time instead of 1 time full
invalidations for every amd_iommu_domain_flush_all().

Cc: stable@vger.kernel.org
Fixes: a270be1b3fdf ("iommu/amd: Use only natural aligned flushes in a VM")
Suggested-by: Josef Bacik <josef@toxicpanda.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Weinan Liu <wnliu@google.com>
Reviewed-by: Wei Wang <wei.w.wang@hotmail.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

iommu/rockchip: disable fetch dte time limit

Disable the Bit 31 of the AUTO_GATING iommu register, as it causes
hangups with the RGA3 (Raster Graphics Acceleration 3) peripheral.
The RGA3 register description of the TRM already states that the bit
must be set to 1. The vendor kernel sets the bit unconditionally to
1 to fix VOP (Video Output Processor) screen black issues. This patch
squashes the 2 vendor kernel commits with the following commit messages:

Master fetch data and cpu update page table may work in parallel, may
have the following procedure:

master                  cpu
fetch dte               update page tabl
        |                       |
(make dte invalid)  <-  zap iotlb entry
        |                       |
fetch dte again
(make dte invalid)  <-  zap iotlb entry
        |                       |
fetch dte again
(make dte invalid)  <-  zap iotlb entry
        |                       |
fetch dte again
(make iommu block)  <-  zap iotlb entry

New iommu version has the above bug, if fetch dte consecutively four
times, then it will be blocked. Fortunately, we can set bit 31 of
register MMU_AUTO_GATING to 1 to make it work as old version which does
not have this issue.

This issue only appears on RV1126 so far, so make a workaround dedicated
to "rockchip,rv1126" machine type.

iommu/rockchip: fix vop blocked and screen black on RK356X and RK3588

RK3568 and RK3588 has the same issue as RV1126/RV1109 that caused by
dte fetch time limit, So we can set BIT(31) of register 0x24 default
to 1 as a workaround.

Signed-off-by: Simon Xue <xxm@rock-chips.com>
Signed-off-by: Sven Püschel <s.pueschel@pengutronix.de>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

HID: Input: Add battery list cleanup with devm action

The batteries list (hdev->batteries) is not cleaned up during
hidinput_disconnect(), but struct hid_battery entries are allocated
with devm_kzalloc.
When a driver is unbound (e.g. during devicereprobe), devm frees those
entries while their list_head nodesremain dangling in hdev->batteries,
which persists across rebinds.

Link: https://lore.kernel.org/all/20260602011949.2825852-1-rafael@rcpassos.me/
Fixes: 4a58ae85c3f9 ("HID: input: Add support for multiple batteries per device")
Signed-off-by: Rafael Passos <rafael@rcpassos.me>
Acked-by: Lucas Zampieri <lcasmz54@gmail.com>
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>

NFS: write_completion: dereference loop-local req, not hdr->req

5d3869a41f36 ("NFS: fix writeback in presence of errors") introduced
a dereference of hdr->req->wb_lock_context in nfs_write_completion's
per-request loop.  hdr->req is set once at nfs_pgheader_init() time
and is not refcount-protected for the lifetime of the loop; when hdr
aggregates requests from multiple page groups (common under heavy
NFSv3 writeback), a parallel COMMIT on hdr->req's group can drop the
last reference and free it while the outer loop is still iterating
requests from other groups.  KASAN catches this as an 8-byte read at
offset +24 of a freed nfs_page slab object (wb_lock_context).

All requests in a given pgio share the same open_context, so reading
the loop-local req's wb_lock_context yields the same value and is
safe -- req is still on hdr->pages and holds its writeback kref
through the commit branch.

Caught with kasan:

BUG: KASAN: slab-use-after-free in nfs_write_completion+0x8f8/0xa50 [nfs]
Read of size 8 at addr ffff888118af2058 by task kworker/u16:16/122062
CPU: 2 UID: 0 PID: 122062 Comm: kworker/u16:16 Kdump: loaded Not tainted 7.1.0-rc4+ #ge05a759574b2 PREEMPT
Workqueue: nfsiod rpc_async_release
Call Trace:
<TASK>
dump_stack_lvl+0xaf/0x100
? nfs_write_completion+0x8f8/0xa50 [nfs]
print_report+0x157/0x4a1
? __virt_addr_valid+0x1fb/0x400
? nfs_write_completion+0x8f8/0xa50 [nfs]
kasan_report+0xc2/0x190
? nfs_write_completion+0x8f8/0xa50 [nfs]
nfs_write_completion+0x8f8/0xa50 [nfs]
? nfs_commit_release_pages+0xbd0/0xbd0 [nfs]
? lock_acquire+0x182/0x2e0
? process_one_work+0x937/0x1890
? nfs_pgio_header_alloc+0xd0/0xd0 [nfs]
rpc_free_task+0xee/0x160
rpc_async_release+0x5d/0xb0
process_one_work+0x9b0/0x1890
? pwq_dec_nr_in_flight+0xed0/0xed0
? rpc_final_put_task+0x140/0x140
worker_thread+0x75a/0x10a0
? process_one_work+0x1890/0x1890
? kthread+0x1af/0x4d0
? process_one_work+0x1890/0x1890
kthread+0x3d3/0x4d0
? kthread_affine_node+0x2c0/0x2c0
ret_from_fork+0x669/0xa50
? native_tss_update_io_bitmap+0x660/0x660
? __switch_to+0x9dd/0x1310
? kthread_affine_node+0x2c0/0x2c0
ret_from_fork_asm+0x11/0x20
</TASK>

Allocated by task 121997 on cpu 3 at 31643.290294s:
kasan_save_stack+0x1e/0x40
kasan_save_track+0x13/0x60
__kasan_slab_alloc+0x62/0x70
kmem_cache_alloc_noprof+0x1ab/0x4e0
nfs_page_create+0x152/0x460 [nfs]
nfs_page_create_from_folio+0x7e/0x210 [nfs]
nfs_update_folio+0x7a9/0x32a0 [nfs]
nfs_write_end+0x290/0xc60 [nfs]
generic_perform_write+0x4ce/0x990
nfs_file_write+0x6b3/0xce0 [nfs]
vfs_write+0x63c/0xfa0
ksys_write+0x122/0x240
do_syscall_64+0xc3/0x13f0
entry_SYSCALL_64_after_hwframe+0x4b/0x53

Freed by task 122046 on cpu 0 at 31647.037964s:
kasan_save_stack+0x1e/0x40
kasan_save_track+0x13/0x60
kasan_save_free_info+0x37/0x60
__kasan_slab_free+0x3b/0x60
kmem_cache_free+0x11b/0x5a0
nfs_page_group_destroy+0x13a/0x210 [nfs]
nfs_unlock_and_release_request+0x64/0x90 [nfs]
nfs_commit_release_pages+0x339/0xbd0 [nfs]
nfs_commit_release+0x51/0xb0 [nfs]
rpc_free_task+0xee/0x160
rpc_async_release+0x5d/0xb0
process_one_work+0x9b0/0x1890
worker_thread+0x75a/0x10a0
kthread+0x3d3/0x4d0
ret_from_fork+0x669/0xa50
ret_from_fork_asm+0x11/0x20

The buggy address belongs to the object at ffff888118af2040\x0a which belongs to the cache nfs_page of size 96
The buggy address is located 24 bytes inside of\x0a freed 96-byte region [ffff888118af2040, ffff888118af20a0)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x118af2
head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x4000000000000040(head|zone=2)
page_type: f5(slab)
raw: 4000000000000040 ffff88818cf2c4c0 ffffea000e61b990 ffffea0004e7d110
raw: 0000000000000000 0000000800190019 00000000f5000000 0000000000000000
head: 4000000000000040 ffff88818cf2c4c0 ffffea000e61b990 ffffea0004e7d110
head: 0000000000000000 0000000800190019 00000000f5000000 0000000000000000
head: 4000000000000001 ffffffffffffff81 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000002
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 1, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 121997, tgid 121997 (rsync), ts 31643290274577, free_ts 31642154777182
post_alloc_hook+0xd1/0x100
get_page_from_freelist+0xbad/0x2910
__alloc_frozen_pages_noprof+0x1c6/0x4a0
allocate_slab+0x330/0x620
___slab_alloc+0xe9/0x930
kmem_cache_alloc_noprof+0x35b/0x4e0
nfs_page_create+0x152/0x460 [nfs]
nfs_page_create_from_folio+0x7e/0x210 [nfs]
nfs_update_folio+0x7a9/0x32a0 [nfs]
nfs_write_end+0x290/0xc60 [nfs]
generic_perform_write+0x4ce/0x990
nfs_file_write+0x6b3/0xce0 [nfs]
vfs_write+0x63c/0xfa0
ksys_write+0x122/0x240
do_syscall_64+0xc3/0x13f0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
page last free pid 122202 tgid 122202 stack trace:
__free_frozen_pages+0x6da/0xf30
qlist_free_all+0x53/0x130
kasan_quarantine_reduce+0x198/0x1f0
__kasan_slab_alloc+0x46/0x70
kmem_cache_alloc_noprof+0x1ab/0x4e0
__alloc_object+0x2f/0x230
__create_object+0x22/0x80
kmem_cache_alloc_node_noprof+0x416/0x4d0
__alloc_skb+0x146/0x6e0
tcp_stream_alloc_skb+0x35/0x660
tcp_sendmsg_locked+0x1746/0x4260
tcp_sendmsg+0x2f/0x40
inet_sendmsg+0x9e/0xe0
__sock_sendmsg+0xd9/0x180
sock_sendmsg+0x122/0x200
xprt_sock_sendmsg+0x4ff/0x9a0

Memory state around the buggy address:
ffff888118af1f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
ffff888118af1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888118af2000: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
                                                    ^
ffff888118af2080: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
ffff888118af2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Reviewed-by Jeff Layton <jlayton@kernel.org>

Fixes: 5d3869a41f36 ("NFS: fix writeback in presence of errors")
Cc: Olga Kornievskaia <okorniev@redhat.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: linux-nfs@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

dt-bindings: soc: sophgo: add sg2000 plic and clint documentation

Document the compatible strings for the sg2000 interrupt
controller and timer.

Signed-off-by: Joshua Milas <josh.milas@gmail.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260530173347.33533-4-josh.milas@gmail.com
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Chen Wang <unicorn_wang@outlook.com>

dt-bindings: soc: sophgo: add Milk-V Duo S board compatibles

Document the compatible strings for the Milk-V Duo S board [1]
which uses the SOPHGO SG2000 SoC.

Link: https://milkv.io/duo-s
Signed-off-by: Joshua Milas <josh.milas@gmail.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260530173347.33533-2-josh.milas@gmail.com
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Chen Wang <unicorn_wang@outlook.com>

MAINTAINERS: update Chen Wang's email address

Update my email address as my original email provider may
block emails in kernel development.

Signed-off-by: Chen Wang <chen.wang@linux.dev>
Link: https://patch.msgid.link/20260507022058.3913-1-chen.wang@linux.dev
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Chen Wang <unicorn_wang@outlook.com>

drm/imx: Fix three kernel-doc warnings in dcss-scaler.c

Fix the following W=1 kerneldoc warnings by adding the missing parameter
descriptions for @phase0_identity and @nn_interpolation in
dcss_scaler_filter_design() and @phase0_identity in
dcss_scaler_gaussian_filter()

Warning: drivers/gpu/drm/imx/dcss/dcss-scaler.c:173 function parameter 'phase0_identity' not described in 'dcss_scaler_gaussian_filter'
Warning: drivers/gpu/drm/imx/dcss/dcss-scaler.c:270 function parameter 'phase0_identity' not described in 'dcss_scaler_filter_design'
Warning: drivers/gpu/drm/imx/dcss/dcss-scaler.c:270 function parameter 'nn_interpolation' not described in 'dcss_scaler_filter_design'

Fixes: 9021c317b770 ("drm/imx: Add initial support for DCSS on iMX8MQ")
Signed-off-by: Yicong Hui <yiconghui@gmail.com>
Reviewed-by: Laurentiu Palcu <laurentiu.palcu@oss.nxp.com>
Link: https://patch.msgid.link/20260406180013.2442096-1-yiconghui@gmail.com
Signed-off-by: Liu Ying <victor.liu@nxp.com>

kbuild: rust: clean `*.long-type-*.txt` files

When rustc prints an error containing a long type that doesn't fit
in a line, it will write the whole thing in a .txt file -- see commit
420dd187e157 (".gitignore: ignore rustc long type txt files") for more
details.

These files are purely compiler artifacts and are not created
intentionally by the build system.

Thus add them to the `clean` target to stop them from cluttering up the
source tree.

Suggested-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://github.com/Rust-for-Linux/linux/issues/1236
Signed-off-by: Joel Kamminga <contact@jkam.dev>
Acked-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260530184944.10459-1-contact@jkam.dev
[ Reworded and linked to the previous related commit. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

arm64: dts: amlogic: t7: Add i2c pinctrl node

Add the T7 pinctrl used by the Khadas VIM4 for MCU communication.

Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
Link: https://patch.msgid.link/20260516-add-mcu-fan-khadas-vim4-v6-6-cccc9b61f465@aliel.fr
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

arm64: dts: amlogic: t7: khadas-vim4: add PWM-driven status LED

The VIM4 board exposes a status LED wired to the PWM_AO_C_D output.
Enable the pwm_ao_cd controller with its pinmux, and declare a
pwm-leds node with a heartbeat trigger.

Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
Link: https://patch.msgid.link/20260513-add-kvim4-sysled-v2-3-3ec9779e8875@aliel.fr
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

arm64: dts: amlogic: t7: khadas-vim4: reorder root node

Move the xtal-clk node to restore alphabetical ordering.

Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20260513-add-kvim4-sysled-v2-2-3ec9779e8875@aliel.fr
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

arm64: dts: amlogic: t7: Fix missing required reset property

CHECK_DTBS shows missing reset required property in T7 DTBS.
A new CHECK_DTBS with this patch does not show this anymore.

Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
Link: https://patch.msgid.link/20260331-fix-aml-t7-null-reset-v1-2-eb95b625234c@aliel.fr
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

arm64: dts: amlogic: t7: khadas-vim4: Add MMC nodes

Enable and configure the three MMC controllers for the Khadas VIM4 board:
- sd_emmc_a: SDIO interface for the BCM43752 Wi-Fi module
- sd_emmc_b: SD card slot
- sd_emmc_c: eMMC storage

Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
Link: https://patch.msgid.link/20260326-add-emmc-t7-vim4-v5-9-d3f182b48e9d@aliel.fr
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

arm64: dts: amlogic: t7: khadas-vim4: Add SDIO power sequence and WiFi clock

Add the SDIO power sequence node using mmc-pwrseq-simple and a
32.768kHz PWM-based clock required by the Wi-Fi module.

Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
Link: https://patch.msgid.link/20260326-add-emmc-t7-vim4-v5-7-d3f182b48e9d@aliel.fr
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

arm64: dts: amlogic: t7: Add PWM controller nodes

Add device tree nodes for the seven PWM controllers available
on the Amlogic T7 SoC, using amlogic,meson-s4-pwm as fallback compatible.
All nodes are disabled by default and should be
enabled in the board-specific DTS file.

Co-developed-by: Nick Xie <nick@khadas.com>
Signed-off-by: Nick Xie <nick@khadas.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
Link: https://patch.msgid.link/20260326-add-emmc-t7-vim4-v5-5-d3f182b48e9d@aliel.fr
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

arm64: dts: amlogic: t7: Add MMC controller nodes

Add device tree nodes for the three MMC controllers available
on the Amlogic T7 SoC, using amlogic,meson-axg-mmc as fallback compatible.
All nodes are disabled by default and should be
enabled in the board-specific DTS file.

Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20260326-add-emmc-t7-vim4-v5-3-d3f182b48e9d@aliel.fr
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

arm64: dts: amlogic: t7: Add clock controller nodes

Add the required clock controller nodes for Amlogic T7 SoC family:
- SCMI clock controller
- PLL clock controller
- Peripheral clock controller

Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
Signed-off-by: Jian Hu <jian.hu@amlogic.com>
Link: https://patch.msgid.link/20260326092645.1053261-4-jian.hu@amlogic.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

arm64: dts: amlogic: meson-s4-s905y4-khadas-vim1s: use rc-khadas keymap

The Khadas VIM1S board has an onboard IR receiver.
Configure the default keymap to "rc-khadas" to support the official
Khadas IR remote control.

Signed-off-by: Nick Xie <nick@khadas.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20260327093016.722095-4-nick@khadas.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

arm64: dts: amlogic: meson-s4-s905y4-khadas-vim1s: enable HYM8563 RTC

The Khadas VIM1S board has an on-board Haoyu Micro HYM8563 Real Time
Clock (RTC) connected to the I2C1 bus.

Enable the I2C1 controller and add the RTC child node to support
hardware clock persistence.

Signed-off-by: Nick Xie <nick@khadas.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20260327093016.722095-3-nick@khadas.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

arm64: dts: amlogic: meson-s4: add VRTC node

Add the Virtual RTC (VRTC) controller node to the Meson S4 SoC dtsi.

Signed-off-by: Nick Xie <nick@khadas.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20260327093016.722095-2-nick@khadas.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

arm64: dts: amlogic: meson-s4-s905y4-khadas-vim1s: add Function key support

Enable the SARADC controller and add the adc-keys node to support
the Function key found on the Khadas VIM1S board.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Nick Xie <nick@khadas.com>
Link: https://patch.msgid.link/20260325070618.81955-5-nick@khadas.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

arm64: dts: amlogic: meson-s4: add internal SARADC controller

Add the SARADC (Successive Approximation Register ADC) controller
node to the Meson S4 SoC dtsi.

It uses the S4-specific compatible string with a fallback to the
G12A generation, as there are no known hardware differences.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Nick Xie <nick@khadas.com>
Link: https://patch.msgid.link/20260325070618.81955-4-nick@khadas.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

arm64: dts: amlogic: t7: khadas-vim4: Add power regulators

Add voltage regulator nodes describing the VIM4 power tree,
required by peripheral nodes such as the SD card controller.

Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
Link: https://patch.msgid.link/20260326-add-emmc-t7-vim4-v5-6-d3f182b48e9d@aliel.fr
[narmstrong: squashed "Remove invalid property fix" from Ronald]
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

ALSA: oxygen: add HT-Omega eClaro (7284:9783) support

The HT-Omega eClaro is a PCI sound card built on the C-Media CMI8788
(Oxygen HD) controller, with PCI subsystem ID 7284:9783.

Output hardware:
- AK4396VF stereo DAC: front L/R output, connected via SPI CE0
- CS4362A 6-channel DAC: surround, center/LFE, and side outputs,
   connected via SPI CE1 with a 3-byte [0x30, reg, val] frame

The CS4362A uses inverse attenuation encoding (0 = 0 dB, 127 = max
attenuation) and a 0.5 dB/step logarithmic scale. Volume TLV is set
to TLV_DB_SCALE(-6350, 50, 0) to match the hardware. The channel-to-
register mapping was verified by listening test:
- Pair 1 (regs 7/8):   side L/R   (ALSA channels 6/7)
- Pair 2 (regs 10/11): center/LFE (ALSA channels 4/5)
- Pair 3 (regs 13/14): rear L/R   (ALSA channels 2/3)

Input hardware:
- CS5361 stereo ADC: Line In and Mic In capture

GPIO assignments:
- GPIO 0 (0x0001): CS4362A RESET# (active-low, driven high)
- GPIO 2/3:        CS5361 M0/M1 (sample rate mode)
- GPIO 5 (0x0020): front output stage enable (driven high)
- GPIO 8 (0x0100): headphone amplifier enable

Signed-off-by: Ferus Castor <feruscastor@proton.me>
Assisted-by: Claude:claude-sonnet-4-6
Link: https://patch.msgid.link/20260601015848.128566-1-feruscastor@proton.me
Signed-off-by: Takashi Iwai <tiwai@suse.de>

powerpc/pseries/lparcfg: Replace deprecated strcpy in parse_system_parameter_string

strcpy() is deprecated; use strscpy() instead. Use the return value of
strscpy() instead of calling strlen() again, and ignore any potential
string truncation since both strings are much shorter than the buffer
size SPLPAR_MAXLENGTH.

Change both if conditions to silence the following checkpatch warnings:

Comparisons should place the constant on the right side of the test

Link: https://github.com/KSPP/linux/issues/88
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251013135703.97260-2-thorsten.blum@linux.dev

powerpc: Fix indentation and replace typedef with struct name

Indent several struct members using tabs instead of spaces.

Replace the typedef alias AOUTHDR with an explicit struct name to
silence a checkpatch warning.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250517125834.421088-2-thorsten.blum@linux.dev

powerpc/rtas: Replace one-element array with flexible array member

Replace the deprecated one-element array with a modern flexible array
member in the struct rtas_error_log and add the __counted_by_be()
compiler attribute to improve access bounds-checking via
CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE.

Link: https://github.com/KSPP/linux/issues/79
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250813103101.163698-2-thorsten.blum@linux.dev

powerpc: use sysfs_emit{_at} in sysfs show functions

Replace sprintf() with sysfs_emit() and sysfs_emit_at() in sysfs show
functions, which are preferred for formatting sysfs output because they
provide safer bounds checking.

While the current code only emits strings that fit easily within
PAGE_SIZE, use sysfs_emit() and sysfs_emit_at() to follow secure coding
best practices.

This is a mechanical cleanup with a few simple edge cases:

- In domains_show(), drop the redundant n < 0 check since neither
  sprintf() nor sysfs_emit() return negative values.

- In powercap_show() and psr_show(), also drop the dead ret < 0 checks.

- In ps3_fw_version_show(), normalize the output by adding a terminating
  newline as suggested by checkpatch.

- In vio's modalias_show(), replace the deprecated strcpy() [1] followed
  by strlen() with sysfs_emit().

Leave validate_show() and the variable-length hv-gpci helpers unchanged
since they already have explicit bounds handling, and converting those
would be more than a mechanical conversion.

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260524130002.793476-2-thorsten.blum@linux.dev

MAINTAINERS: powerpc: update VMX AES entries

Commit 7cf2082e74ce ("lib/crypto: powerpc/aes: Migrate POWER8 optimized
code into library") removed arch/powerpc/crypto/aes.c and moved
arch/powerpc/crypto/aesp8-ppc.pl to lib/crypto/powerpc/.

However, the "IBM Power VMX Cryptographic instructions" entry still
references the removed file and no longer covers the moved aesp8-ppc.pl.

Remove the stale entry, add lib/crypto/powerpc/aesp8-ppc.pl, and tighten
the arch/powerpc/crypto/aesp8-ppc.* pattern to match the remaining
header only.

Fixes: 7cf2082e74ce ("lib/crypto: powerpc/aes: Migrate POWER8 optimized code into library")
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Breno Leitao <leitao@debian.org>
Acked-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260524212943.799757-3-thorsten.blum@linux.dev

ppc/fadump: invoke kmsg_dump in fadump panic path

fadump is registered in panic_notifier_list and gets triggered before
kmsg_dump_desc() in the panic path. As a result, kmsg_dumpers such as
pstore are not executed during fadump crashes.

This is problematic because pstore provides a critical fallback mechanism
for crash analysis. When fadump fails to successfully reboot the system
or capture a dump, pstore logs may be the only available information from
the crashed kernel. Without invoking kmsg_dump_desc() in the fadump path,
we lose this valuable diagnostic data.

Invoke kmsg_dump_desc() from the fadump panic handler, but only when
fadump is actually registered (checked via should_fadump_crash()). This
ensures kmsg_dumpers are called without duplicating the call that occurs
later in panic() when fadump is not active.

The call is placed before crash_fadump() to ensure logs are captured
before the system attempts to trigger the firmware-assisted dump.

Reported-by: Shirisha G <shirisha@linux.ibm.com>
Suggested-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Shivang Upadhyay <shivangu@linux.ibm.com>
Tested-by: Shirisha G <shirisha@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260412113057.46090-1-shivangu@linux.ibm.com

powerpc/xive: Add warning if target CPU not found

Add a warn_once to warn if the CPU target is not found. This could help
to find about any such usecase.

This is a very rare case, which either means mask was empty or
atomic update failed for all online CPUs. So it is worth printing that
path for potential fix.

Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Yury Norov <ynorov@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260427044715.559137-5-sshegde@linux.ibm.com

powerpc/perf: Use cpumask_intersects api for checking disable path

First online CPU in the node disables the nest counters by
making an OPAL call. Any other CPU in that node, will bail out.

Instead of using a temporary mask to find out if any cpu in the
node is visited or not, it is better to use the cpumask_intersects
api to achieve the same.

Similarly a temporary cpumask is used to check if a core is already part
of core_imc_cpumask. Use the same cpumask_intersects api there.

Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Yury Norov <ynorov@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260427044715.559137-4-sshegde@linux.ibm.com

powerpc: Simplify cpumask api usage for cpuinfo display

- cpumask_next can take -1 as valid argument. So simplify cpuinfo
iterator.

- Use cpumask_last to find if this_cpu is last online CPU.

/proc/cpuinfo shows same info with patch.

Reviewed-by: Yury Norov <ynorov@nvidia.com>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Yury Norov <ynorov@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260427044715.559137-3-sshegde@linux.ibm.com

powerpc: Use cpumask_next_wrap instead

cpu = cpumask_next(cpu, mask)
if (cpu >= nr_cpu_ids)
cpu = cpumask_first(mask)

Above block is identical to:
cpu = cpumask_next_wrap(cpu, mask)

Replace it, No change in functionality or performance.
Slightly simpler code.

Reviewed-by: Yury Norov <ynorov@nvidia.com>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Yury Norov <ynorov@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260427044715.559137-2-sshegde@linux.ibm.com

powerpc/fadump: Add timeout to RTAS busy-wait loops

The ibm,configure-kernel-dump RTAS call sites in
rtas_fadump_register(), rtas_fadump_unregister(), and
rtas_fadump_invalidate() polled indefinitely while firmware returned
a busy status. A misbehaving or hung firmware could stall these paths
forever, blocking fadump registration at boot or preventing clean
teardown.

Introduce rtas_fadump_call(), a helper that wraps the common
busy-wait pattern shared by all three sites. The helper accumulates
the total delay and returns -ETIMEDOUT if firmware keeps returning a
busy status beyond RTAS_FADUMP_MAX_WAIT_MS (60 seconds). A pr_debug()
message is emitted on each busy iteration to aid diagnosis when the
timeout is hit.

Signed-off-by: Adriano Vero <adri.vero.dev@gmail.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
[Maddy: Fixed newline after Signed-off-by]
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260506222024.30352-1-adri.vero.dev@gmail.com

accel/ivpu: Add buffer overflow check in MS get_info_ioctl

Add validation that the info size returned from the metric stream info
query is not exceeded when checked against the allocated buffer size.
If the firmware returns a size larger than the buffer, reject the
operation with -EOVERFLOW instead of proceeding with an incorrect
buffer copy.

Fixes: cdfad4db7756 ("accel/ivpu: Add NPU profiling support")
Cc: stable@vger.kernel.org # v6.18+
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://patch.msgid.link/20260529120841.135852-1-andrzej.kacprowski@linux.intel.com

accel/ivpu: Add bounds checks for firmware log indices

Add validation that read and write indices in the firmware log buffer
are within valid bounds (< data_size) before using them. If
out-of-bounds indices are encountered (from firmware), clamp them to
safe values instead of proceeding with invalid offsets.

This prevents potential out-of-bounds buffer access when firmware
supplies invalid log indices.

Fixes: 1fc1251149a7 ("accel/ivpu: Refactor functions in ivpu_fw_log.c")
Cc: stable@vger.kernel.org # v6.18+
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://patch.msgid.link/20260529115842.135378-1-andrzej.kacprowski@linux.intel.com

accel/ivpu: Add bounds check for firmware runtime memory

Validate that the firmware runtime memory specified in the image
header is properly aligned and sized to hold the firmware image.
This prevents errors during memory allocation and image transfer.

Fixes: 2007e210b6a1 ("accel/ivpu: Split FW runtime and global memory buffers")
Cc: stable@vger.kernel.org # v7.0+
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://patch.msgid.link/20260529120853.135876-1-andrzej.kacprowski@linux.intel.com

powerpc64/bpf: Add powerpc64 JIT support for timed may_goto

When verifier sees a timed may_goto instruction, it emits a call to
arch_bpf_timed_may_goto() with a stack offset in BPF_REG_AX
(powerpc64 R12) and expects the refreshed count value to be returned
in the same register. The verifier doesn't save or restore any registers
before emitting this call.

arch_bpf_timed_may_goto() should act as a trampoline to call
bpf_check_timed_may_goto() with powerpc64 ELF ABI calling convention.

To support this custom calling convention, implement
arch_bpf_timed_may_goto() in assembly and make sure BPF caller saved
registers are preserved, then call bpf_check_timed_may_goto with
the powerpc64 ABI calling convention where first argument and return
value both are in R3. Finally, move the result back into BPF_REG_AX(R12)
before returning.

Also, introduce bpf_jit_emit_func_call() that computes the offset from
kernel_toc_addr(), validates that the target and emits the ADDIS/ADDI
sequence to load the function address before performing the indirect
branch via MTCTR/BCTRL. The existing code in bpf_jit_emit_func_call_rel()
is refactored to use this function.

Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260403105016.74775-1-skb99@linux.ibm.com

tools/testing/memblock: fix stale NUMA reservation tests

memblock allocations now reserve memory with MEMBLOCK_RSRV_KERN and,
on NUMA configurations, record the requested node on the reserved
region. Several memblock simulator NUMA tests still expected merges
that only worked before those reservation semantics changed, so the
suite aborted even though the allocator behavior was correct.

Update the NUMA merge expectations in the memblock_alloc_try_nid()
and memblock_alloc_exact_nid_raw() tests to match the current reserved
region metadata rules. For cases that should still merge, create the
pre-existing reservation with matching nid and MEMBLOCK_RSRV_KERN
metadata. Also strengthen the memblock_alloc_node() coverage by
checking the newly created reserved region directly instead of
re-reading the source memory node descriptor.

Finally, drop the stale README/TODO notes that still claimed
memblock_alloc_node() could not be tested.

The memblock simulator passes again with NUMA enabled after these
updates.

Signed-off-by: Priyanshu Kumar <priyanshukumarpu@gmail.com>
Link: https://patch.msgid.link/20260415122731.1768912-1-priyanshukumarpu@gmail.com
[rppt: dropped unrelated changes]
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

mm/fake-numa: fix under-allocation detection in uniform split

When splitting NUMA node uniformly, split_nodes_size_interleave_uniform()
returns the next absolute node ID, not the number of nodes created.

The existing under-allocation detection logic compares next absolute node
ID (ret) and request count (n), which only works when nid starts at 0.

For example, on a system with 2 physical NUMA nodes (node 0: 2GB, node
1: 128MB) and numa=fake=8U, 8 fake nodes are successfully created from
node 0 and split_nodes_size_interleave_uniform() returns 8. For node 1,
fake node nid starts at 8, but only 4 fake nodes are created due to
current FAKE_NODE_MIN_SIZE being 32MB, and
split_nodes_size_interleave_uniform() returns 12. By existing
under-allocation detection logic, "ret < n" (12 < 8) is false, so the
under-allocation will not be detected.

Fix under-allocation detection logic to compare the number of actually
created nodes (ret - nid) against the request count (n). Also skip
under-allocation detection logic for memoryless physical nodes where no
fake nodes are created.

Also, fix the outdated comment describing
split_nodes_size_interleave_uniform() to match the actual return value.

Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com>
Reported-by: Donghyeon Lee <asd142513@gmail.com>
Reported-by: Munhui Chae <mochae@student.42seoul.kr>
Fixes: cc9aec03e58f ("x86/numa_emulation: Introduce uniform split capability") # 4.19
Link: https://patch.msgid.link/20260417135805.1758378-1-ekffu200098@gmail.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

powerpc: dts: Build devicetrees of enabled platforms

Follow the same approach as other architectures such as Arm or RISC-V,
and build devicetrees based on platforms selected in Kconfig. This makes
it unnecessary to use CONFIG_OF_ALL_DTBS on PowerPC in order to build
DTB files.

This makes it easier to use other build and test infrastructure such as
`make dtbs_check`, and is a first step towards generating FIT images
that include all the relevant DTBs with `make image.fit`.

Signed-off-by: J. Neuschäfer <j.ne@posteo.net>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260315-mpc83xx-dtb-v4-1-243849be4280@posteo.net

powerpc: dts: Add missing model properties

These are the remaining PowerPC devicetrees that do not have a /model
property, even though it is required by the devicetree specification.
Fix them by simply copying the compatible string.

Signed-off-by: J. Neuschäfer <j.ne@posteo.net>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260313-ppc-model-v1-2-bf19b3d1b65d@posteo.net

powerpc: dts: mpc8315erdb: Add missing model property

The mpc8315erdb devicetree did thus far not have a /model property, even
though it is required by the devicetree specification.

Signed-off-by: J. Neuschäfer <j.ne@posteo.net>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260313-ppc-model-v1-1-bf19b3d1b65d@posteo.net

bpf: fix UAF by restoring RCU-delayed inode freeing in bpffs

commit 4f375ade6aa9 ("bpf: Avoid RCU context warning when unpinning
htab with internal structs") moved inode cleanup from ->free_inode()
into ->destroy_inode() to avoid sleeping in RCU context when calling
bpf_any_put(). However this removed the RCU delay on freeing the
inode itself and the cached symlink body (i_link), both of which
can be accessed by RCU pathwalk (pick_link, may_lookup etc.).

This causes a use-after-free when a concurrent unlinkat() drops the
last inode reference and destroy_inode() frees the inode immediately,
while another task is still walking the path in RCU mode and reads
inode->i_opflags (offset +2) inside current_time() -> is_mgtime().

KASAN reports:
  BUG: KASAN: slab-use-after-free in is_mgtime include/linux/fs.h:2313
  Read of size 2 at addr ffff8880407e4282 (offset +2 = i_opflags)

The rules (per Al Viro):
  ->destroy_inode()  called immediately, can sleep, use for blocking
                     cleanup e.g. bpf_any_put()
  ->free_inode()     called after RCU grace period, use for freeing
                     inode and anything RCU-accessible e.g. i_link

Fix: split the two concerns properly:
  - keep bpf_any_put() in bpf_destroy_inode() since it is blocking
    and needs to run promptly
  - introduce bpf_free_inode() to handle kfree(i_link) and
    free_inode_nonrcu() with proper RCU delay, preventing the UAF

Fixes: 4f375ade6aa9 ("bpf: Avoid RCU context warning when unpinning htab with internal structs")
Reported-by: syzbot+36e50496c8ac4bcde3f9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=36e50496c8ac4bcde3f9
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/all/20260423043906.GN3518998@ZenIV/
Link: https://lore.kernel.org/all/20260602002607.110866-1-kartikey406@gmail.com/T/
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20260602025249.113828-1-kartikey406@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

platform/chrome: Prevent build for big-endian systems

Both ARM and ARM64 which are a dependency for CHROME_PLATFORMS have
seldomly used big-endian variants.

The ChromeOS EC framework and drivers are written under the assumption
that they will be running on a little-endian systems. Code which would
be broken on big-endian can be found trivially.

Some examples:
cros_ec.c: suspend_params.sleep_timeout_ms = ec_dev->suspend_timeout_ms
cros_ec_debugfs.c: resp->time_since_ec_boot_ms
cros_ec_wdt.c: arg.req.reboot_timeout_sec = wdd->timeout

Prevent the build for big-endian systems.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20260531-cros-big-endian-v1-2-0cc90f39c636@weissschuh.net
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>

platform/chrome: Remove superfluous dependencies from CROS_EC

CROS_EC depends on CHROME_PLATFORMS which already declares these
dependencies.

Remove the duplication.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20260531-cros-big-endian-v1-1-0cc90f39c636@weissschuh.net
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>

Merge branch 'add-starfive-jhb100-soc-sgmii-gmac-support'

Minda Chen says:

====================
Add StarFive jhb100 soc SGMII GMAC support

jhb100 is a Starfive new RISC-V SoC for datacenter BMC (BaseBoard
Managent Controller). Similar with Aspeed 27x0.

The jhb100 minimal system upstream is in progress:
https://patchwork.kernel.org/project/linux-riscv/cover/20260508053632.818548-1-changhuang.liang@starfivetech.com/

jhb100 GMAC still using designware GMAC core like JH7100 and JH7110,
and contains 2 SGMII interfaces, 1 RGMII/RMII interface, 1 RMII
interface. In JH7100/JH7110 dwmac-starfive.c have supported RGMII/RMII
interface. So require to add SGMII support to dwmac-starfive.c for JHB100.

SGMII serdes PHY has been integrated in JHB100 and do not have driver
setting.

In JHB100 EVB board, SGMII connect with motorcomm YT8531s external PHY
and support RJ45 ethernet port.
====================

Link: https://patch.msgid.link/20260527084108.121416-1-minda.chen@starfivetech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: starfive: Add STMMAC_FLAG_SPH_DISABLE flag

Add default disable split header flag in all the starfive
soc.

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260527084108.121416-5-minda.chen@starfivetech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: starfive: Add jhb100 SGMII interface

Add jhb100 compatible and SGMII support. jhb100 soc contains
2 SGMII interfaces and integrated with serdes PHY. SGMII with
split TX/RX MAC clock and need to set 2.5M/25M/125M TX/RX clock
rate in 10M/100M/1000M speed mode.

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
Reviewed-by: Sai Krishna <saikrishnag@marvell.com>
Link: https://patch.msgid.link/20260527084108.121416-4-minda.chen@starfivetech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: starfive,jh7110-dwmac: Add jhb100 support

The jhb100 GMAC still using Synopsys designware GMAC core.
hardware features are similar with jh7100.

Add jhb100 GMAC compatible and reset, interrupts features.
jhb100 dwmac has only one reset signal and one interrupt
line.

jhb100 SGMII interface tx/rx mac clock is split and require to
set clock rate in 10M/100M/1000M speed. So dts need to add a
new rx clock in code, dts and dt binding doc.

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260527084108.121416-3-minda.chen@starfivetech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: starfive,jh7110-dwmac: Remove jh8100

Remove jh8100 dt-bindings because do not support it now.
StarFive have stopped jh8100 developing and will not release
it outside.

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260527084108.121416-2-minda.chen@starfivetech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-7.1/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fix from Mikulas Patocka:

- fix race condition in dm-cache-policy-smq

* tag 'for-7.1/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache policy smq: check allocation under invalidate lock

devlink: Release nested relation on devlink free

devlink relation state is normally released from devl_unregister(), which
calls devlink_rel_put(). This misses devlink instances that get a nested
relation before registration and then fail probe before devl_register() is
reached.

That flow can happen for SFs. The child devlink gets linked to its
parent before registration, then a later probe error calls devlink_free()
directly. Since the instance was never registered, devl_unregister() is not
called and devlink->rel is leaked.

Release any pending relation from devlink_free() as well. The registered
path is unchanged because devl_unregister() already clears devlink->rel
before devlink_free() runs.

Fixes: c137743bce02 ("devlink: introduce object and nested devlink relationship infra")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20260528191411.3270532-1-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: qrtr: fix node refcount leak on ctrl packet alloc failure

qrtr_send_resume_tx() calls qrtr_node_lookup() which takes a
reference on the returned node. If the subsequent call to
qrtr_alloc_ctrl_packet() fails due to memory allocation failure, the
function returns -ENOMEM without calling qrtr_node_release() to
release the node reference.

Add qrtr_node_release(node) before returning on the allocation failure
path to properly release the reference.

Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260528080019.1176700-1-vulab@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan743x: avoid netdev-based logging before netdev registration

This patch updates the lan743x driver to prevent the use of netdev-based
logging APIs (such as netdev_dbg) before the network device has been
successfully registered. Using netdev-based logging prior to registration
results in log messages referencing "(unnamed net_device) (uninitialized)",
which can be confusing and less informative.

The driver must use netif_msg_ APIs and device-based logging (e.g. dev_dbg)
until netdev registration is complete. This ensures log entries are
associated with the correct device context and improves log clarity. After
registration, netdev-based logging APIs can be used safely.

Signed-off-by: David Thompson <davthompson@nvidia.com>
Link: https://patch.msgid.link/20260528165017.421576-1-davthompson@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>