From: Greg Kroah-Hartman
Date: Fri, 15 May 2026 09:18:20 +0000 (+0200)
Subject: 7.0-stable patches
X-Git-Tag: v5.10.256~9
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=594b7278f95d7a3f7ace3a8cd90c3f444f19d797;p=thirdparty%2Fkernel%2Fstable-queue.git

7.0-stable patches

added patches:
	cgroup-defer-css-percpu_ref-kill-on-rmdir-until-cgroup-is-depopulated.patch
	cgroup-increment-nr_dying_subsys_-from-rmdir-context.patch
	edac-versalnet-fix-device-name-memory-leak.patch
	edac-versalnet-refactor-memory-controller-initialization-and-cleanup.patch
	io_uring-zcrx-use-guards-for-locking.patch
	io_uring-zcrx-warn-on-freelist-violations.patch
	kho-fix-error-handling-in-kho_add_subtree.patch
	mm-vma-do-not-try-to-unmap-a-vma-if-mmap_prepare-invoked-from-mmap.patch
	perf-build-fix-argument-list-too-long-in-second-location.patch
	sched_ext-skip-tasks-with-stale-task_rq-in-bypass_lb_cpu.patch
	spi-uniphier-fix-controller-deregistration.patch
	spi-uniphier-simplify-clock-handling-with-devm_clk_get_enabled.patch
	usb-typec-tcpm-reset-internal-port-states-on-soft-reset-ams.patch
---

diff --git a/queue-7.0/cgroup-defer-css-percpu_ref-kill-on-rmdir-until-cgroup-is-depopulated.patch b/queue-7.0/cgroup-defer-css-percpu_ref-kill-on-rmdir-until-cgroup-is-depopulated.patch
new file mode 100644
index 0000000000..de7961cbb4
--- /dev/null
+++ b/queue-7.0/cgroup-defer-css-percpu_ref-kill-on-rmdir-until-cgroup-is-depopulated.patch
@@ -0,0 +1,488 @@
+From stable+bounces-246936-greg=kroah.com@vger.kernel.org Wed May 13 19:05:49 2026
+From: Sasha Levin
+Date: Wed, 13 May 2026 12:33:14 -0400
+Subject: cgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulated
+To: stable@vger.kernel.org
+Cc: Tejun Heo , Martin Pitt , Sebastian Andrzej Siewior , Sasha Levin
+Message-ID: <20260513163314.3807064-2-sashal@kernel.org>
+
+From: Tejun Heo
+
+[ Upstream commit 93618edf753838a727dbff63c7c291dee22d656b ]
+
+A chain of commits going back to v7.0 reworked rmdir to satisfy the
+controller invariant that a subsystem's ->css_offline() must not run while
+tasks are still doing kernel-side work in the cgroup.
+
+[1] d245698d727a ("cgroup: Defer task cgroup unlink until after the task is done switching out")
+[2] a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup")
+[3] 1b164b876c36 ("cgroup: Wait for dying tasks to leave on rmdir")
+[4] 4c56a8ac6869 ("cgroup: Fix cgroup_drain_dying() testing the wrong condition")
+[5] 13e786b64bd3 ("cgroup: Increment nr_dying_subsys_* from rmdir context")
+
+[1] moved task cset unlink from do_exit() to finish_task_switch() so a
+task's cset link drops only after the task has fully stopped scheduling.
+That made tasks past exit_signals() linger on cset->tasks until their final
+context switch, which led to a series of problems as what userspace expected
+to see after rmdir diverged from what the kernel needs to wait for. [2]-[5]
+tried to bridge that divergence: [2] filtered the exiting tasks from
+cgroup.procs; [3] had rmdir(2) sleep in TASK_UNINTERRUPTIBLE for them; [4]
+fixed the wait's condition; [5] made nr_dying_subsys_* visible
+synchronously.
+
+The cgroup_drain_dying() wait in [3] turned out to be a dead end. When the
+rmdir caller is also the reaper of a zombie that pins a pidns teardown (e.g.
+host PID 1 systemd reaping orphan pids that were re-parented to it during
+the same teardown), rmdir blocks in TASK_UNINTERRUPTIBLE waiting for those
+pids to free, the pids can't free because PID 1 is the reaper and it's stuck
+in rmdir, and the system A-A deadlocks. No internal lock ordering breaks
+this; the wait itself is the bug.
+
+The css killing side that drove the original reorder, however, can be made
+cleanly asynchronous: ->css_offline() is already async, run from
+css_killed_work_fn() driven by percpu_ref_kill_and_confirm(). The fix is to
+make that chain start only after all tasks have left the cgroup. rmdir's
+user-visible side then returns as soon as cgroup.procs and friends are
+empty, while ->css_offline() still runs only after the cgroup is fully
+drained.
+
+Verified by the original reproducer (pidns teardown + zombie reaper, runs
+under vng) which hangs vanilla and succeeds here, and by per-commit
+deterministic repros for [2], [3], [4], [5] with a boot parameter that
+widens the post-exit_signals() window so each state is reliably reachable.
+Some stress tests on top of that.
+
+cgroup_apply_control_disable() has the same shape of pre-existing race:
+when a controller is disabled via subtree_control, kill_css() ran
+synchronously while tasks past exit_signals() could still be linked to
+the cgroup's csets, and ->css_offline() could fire before they drained.
+This patch preserves the existing synchronous behavior at that call site
+(kill_css_sync() + kill_css_finish() back-to-back) and a follow-up patch
+will defer kill_css_finish() there using a per-css trigger.
+
+This seems like the right approach and I don't see problems with it. The
+changes are somewhat invasive but not excessively so, so backporting to
+-stable should be okay. If something does turn out to be wrong, the fallback
+is to revert the entire chain ([1]-[5]) and rework in the development branch
+instead.
+
+v2: Pin cgrp across the deferred destroy work with explicit
+    cgroup_get()/cgroup_put() around queue_work() and the work_fn. v1
+    wasn't actually broken (ordered cgroup_offline_wq + queue_work order
+    in cgroup_task_dead() saved it) but the explicit ref removes the
+    dependency on those non-obvious invariants. Also note the
+    pre-existing cgroup_apply_control_disable() race in the description;
+    a follow-up will defer kill_css_finish() there.
+
+Fixes: 1b164b876c36 ("cgroup: Wait for dying tasks to leave on rmdir")
+Cc: stable@vger.kernel.org # v7.0+
+Reported-and-tested-by: Martin Pitt
+Link: https://lore.kernel.org/all/afHNg2VX2jy9bW7y@piware.de/
+Link: https://lore.kernel.org/all/35e0670adb4abeab13da2c321582af9f@kernel.org/
+Signed-off-by: Tejun Heo
+Acked-by: Sebastian Andrzej Siewior
+Signed-off-by: Sasha Levin
+Signed-off-by: Greg Kroah-Hartman
+---
+ include/linux/cgroup-defs.h |    4 
+ kernel/cgroup/cgroup.c      |  250 ++++++++++++++++++++------------------
+ 2 files changed, 119 insertions(+), 135 deletions(-)
+
+--- a/include/linux/cgroup-defs.h
++++ b/include/linux/cgroup-defs.h
+@@ -609,8 +609,8 @@ struct cgroup {
+ 	/* used to wait for offlining of csses */
+ 	wait_queue_head_t offline_waitq;
+ 
+-	/* used by cgroup_rmdir() to wait for dying tasks to leave */
+-	wait_queue_head_t dying_populated_waitq;
++	/* defers killing csses after removal until cgroup is depopulated */
++	struct work_struct finish_destroy_work;
+ 
+ 	/* used to schedule release agent */
+ 	struct work_struct release_agent_work;
+--- a/kernel/cgroup/cgroup.c
++++ b/kernel/cgroup/cgroup.c
+@@ -278,10 +278,12 @@ static void cgroup_finalize_control(stru
+ static void css_task_iter_skip(struct css_task_iter *it,
+ 			       struct task_struct *task);
+ static int cgroup_destroy_locked(struct cgroup *cgrp);
++static void cgroup_finish_destroy(struct cgroup *cgrp);
++static void kill_css_sync(struct cgroup_subsys_state *css);
++static void kill_css_finish(struct cgroup_subsys_state *css);
+ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
+ 					      struct cgroup_subsys *ss);
+ static void css_release(struct percpu_ref *ref);
+-static void kill_css(struct cgroup_subsys_state *css);
+ static int cgroup_addrm_files(struct cgroup_subsys_state *css,
+ 			      struct cgroup *cgrp, struct cftype cfts[],
+ 			      bool is_add);
+@@ -858,6 +860,16 @@ static void cgroup_update_populated(stru
+ 		if (was_populated == cgroup_is_populated(cgrp))
+ 			break;
+ 
++		/*
++		 * Subtree just emptied below an offlined cgrp. Fire deferred
++		 * destroy. The transition is one-shot.
++		 */
++		if (was_populated && !css_is_online(&cgrp->self)) {
++			cgroup_get(cgrp);
++			WARN_ON_ONCE(!queue_work(cgroup_offline_wq,
++						 &cgrp->finish_destroy_work));
++		}
++
+ 		cgroup1_check_for_release(cgrp);
+ 		TRACE_CGROUP_PATH(notify_populated, cgrp,
+ 				  cgroup_is_populated(cgrp));
+@@ -2100,6 +2112,16 @@ static int cgroup_reconfigure(struct fs_
+ 	return 0;
+ }
+ 
++static void cgroup_finish_destroy_work_fn(struct work_struct *work)
++{
++	struct cgroup *cgrp = container_of(work, struct cgroup, finish_destroy_work);
++
++	cgroup_lock();
++	cgroup_finish_destroy(cgrp);
++	cgroup_unlock();
++	cgroup_put(cgrp);
++}
++
+ static void init_cgroup_housekeeping(struct cgroup *cgrp)
+ {
+ 	struct cgroup_subsys *ss;
+@@ -2126,7 +2148,7 @@ static void init_cgroup_housekeeping(str
+ #endif
+ 
+ 	init_waitqueue_head(&cgrp->offline_waitq);
+-	init_waitqueue_head(&cgrp->dying_populated_waitq);
++	INIT_WORK(&cgrp->finish_destroy_work, cgroup_finish_destroy_work_fn);
+ 	INIT_WORK(&cgrp->release_agent_work, cgroup1_release_agent);
+ }
+ 
+@@ -3436,7 +3458,8 @@ static void cgroup_apply_control_disable
+ 
+ 		if (css->parent &&
+ 		    !(cgroup_ss_mask(dsct) & (1 << ss->id))) {
+-			kill_css(css);
++			kill_css_sync(css);
++			kill_css_finish(css);
+ 		} else if (!css_visible(css)) {
+ 			css_clear_dir(css);
+ 			if (ss->css_reset)
+@@ -5558,7 +5581,7 @@ static struct cftype cgroup_psi_files[]
+  * css destruction is four-stage process.
+  *
+  * 1. Destruction starts. Killing of the percpu_ref is initiated.
+- *    Implemented in kill_css().
++ *    Implemented in kill_css_finish().
+  *
+  * 2. When the percpu_ref is confirmed to be visible as killed on all CPUs
+  *    and thus css_tryget_online() is guaranteed to fail, the css can be
+@@ -6037,7 +6060,7 @@ out_unlock:
+ /*
+  * This is called when the refcnt of a css is confirmed to be killed.
+  * css_tryget_online() is now guaranteed to fail. Tell the subsystem to
+- * initiate destruction and put the css ref from kill_css().
++ * initiate destruction and put the css ref from kill_css_finish().
+  */
+ static void css_killed_work_fn(struct work_struct *work)
+ {
+@@ -6069,15 +6092,12 @@ static void css_killed_ref_fn(struct per
+ }
+ 
+ /**
+- * kill_css - destroy a css
+- * @css: css to destroy
++ * kill_css_sync - synchronous half of css teardown
++ * @css: css being killed
+  *
+- * This function initiates destruction of @css by removing cgroup interface
+- * files and putting its base reference. ->css_offline() will be invoked
+- * asynchronously once css_tryget_online() is guaranteed to fail and when
+- * the reference count reaches zero, @css will be released.
++ * See cgroup_destroy_locked().
+  */
+-static void kill_css(struct cgroup_subsys_state *css)
++static void kill_css_sync(struct cgroup_subsys_state *css)
+ {
+ 	struct cgroup_subsys *ss = css->ss;
+ 
+@@ -6100,24 +6120,6 @@ static void kill_css(struct cgroup_subsy
+ 	 */
+ 	css_clear_dir(css);
+ 
+-	/*
+-	 * Killing would put the base ref, but we need to keep it alive
+-	 * until after ->css_offline().
+-	 */
+-	css_get(css);
+-
+-	/*
+-	 * cgroup core guarantees that, by the time ->css_offline() is
+-	 * invoked, no new css reference will be given out via
+-	 * css_tryget_online(). We can't simply call percpu_ref_kill() and
+-	 * proceed to offlining css's because percpu_ref_kill() doesn't
+-	 * guarantee that the ref is seen as killed on all CPUs on return.
+-	 *
+-	 * Use percpu_ref_kill_and_confirm() to get notifications as each
+-	 * css is confirmed to be seen as killed on all CPUs.
+-	 */
+-	percpu_ref_kill_and_confirm(&css->refcnt, css_killed_ref_fn);
+-
+ 	css->cgroup->nr_dying_subsys[ss->id]++;
+ 	/*
+ 	 * Parent css and cgroup cannot be freed until after the freeing
+@@ -6130,44 +6132,88 @@
+ }
+ 
+ /**
+- * cgroup_destroy_locked - the first stage of cgroup destruction
++ * kill_css_finish - deferred half of css teardown
++ * @css: css being killed
++ *
++ * See cgroup_destroy_locked().
++ */
++static void kill_css_finish(struct cgroup_subsys_state *css)
++{
++	lockdep_assert_held(&cgroup_mutex);
++
++	/*
++	 * Skip on re-entry: cgroup_apply_control_disable() may have killed @css
++	 * earlier. cgroup_destroy_locked() can still walk it because
++	 * offline_css() (which NULLs cgrp->subsys[ssid]) runs async.
++	 */
++	if (percpu_ref_is_dying(&css->refcnt))
++		return;
++
++	/*
++	 * Killing would put the base ref, but we need to keep it alive until
++	 * after ->css_offline().
++	 */
++	css_get(css);
++
++	/*
++	 * cgroup core guarantees that, by the time ->css_offline() is invoked,
++	 * no new css reference will be given out via css_tryget_online(). We
++	 * can't simply call percpu_ref_kill() and proceed to offlining css's
++	 * because percpu_ref_kill() doesn't guarantee that the ref is seen as
++	 * killed on all CPUs on return.
++	 *
++	 * Use percpu_ref_kill_and_confirm() to get notifications as each css is
++	 * confirmed to be seen as killed on all CPUs.
++	 */
++	percpu_ref_kill_and_confirm(&css->refcnt, css_killed_ref_fn);
++}
++
++/**
++ * cgroup_destroy_locked - destroy @cgrp (called on rmdir)
+  * @cgrp: cgroup to be destroyed
+  *
+- * css's make use of percpu refcnts whose killing latency shouldn't be
+- * exposed to userland and are RCU protected. Also, cgroup core needs to
+- * guarantee that css_tryget_online() won't succeed by the time
+- * ->css_offline() is invoked. To satisfy all the requirements,
+- * destruction is implemented in the following two steps.
+- *
+- * s1. Verify @cgrp can be destroyed and mark it dying. Remove all
+- *     userland visible parts and start killing the percpu refcnts of
+- *     css's. Set up so that the next stage will be kicked off once all
+- *     the percpu refcnts are confirmed to be killed.
+- *
+- * s2. Invoke ->css_offline(), mark the cgroup dead and proceed with the
+- *     rest of destruction. Once all cgroup references are gone, the
+- *     cgroup is RCU-freed.
+- *
+- * This function implements s1. After this step, @cgrp is gone as far as
+- * the userland is concerned and a new cgroup with the same name may be
+- * created. As cgroup doesn't care about the names internally, this
+- * doesn't cause any problem.
++ * Tear down @cgrp on behalf of rmdir. Constraints:
++ *
++ * - Userspace: rmdir must succeed when cgroup.procs and friends are empty.
++ *
++ * - Kernel: subsystem ->css_offline() must not run while any task in @cgrp's
++ *   subtree is still doing kernel work. A task hidden from cgroup.procs (past
++ *   exit_signals() with signal->live cleared) can still schedule, allocate, and
++ *   consume resources until its final context switch. Dying descendants in the
++ *   subtree can host such tasks too.
++ *
++ * - Kernel: css_tryget_online() must fail by the time ->css_offline() runs.
++ *
++ * The destruction runs in three parts:
++ *
++ * - This function: synchronous user-visible state teardown plus kill_css_sync()
++ *   on each subsystem css.
++ *
++ * - cgroup_finish_destroy(): kicks the percpu_ref kill via kill_css_finish() on
++ *   each subsystem css. Fires once @cgrp's subtree is fully drained, either
++ *   inline here or from cgroup_update_populated().
++ *
++ * - The percpu_ref kill chain: css_killed_ref_fn -> css_killed_work_fn ->
++ *   ->css_offline() -> release/free.
++ *
++ * Return 0 on success, -EBUSY if a userspace-visible task or an online child
++ * remains.
+  */
+ static int cgroup_destroy_locked(struct cgroup *cgrp)
+-	__releases(&cgroup_mutex) __acquires(&cgroup_mutex)
+ {
+ 	struct cgroup *tcgrp, *parent = cgroup_parent(cgrp);
+ 	struct cgroup_subsys_state *css;
+ 	struct cgrp_cset_link *link;
++	struct css_task_iter it;
++	struct task_struct *task;
+ 	int ssid, ret;
+ 
+ 	lockdep_assert_held(&cgroup_mutex);
+ 
+-	/*
+-	 * Only migration can raise populated from zero and we're already
+-	 * holding cgroup_mutex.
+-	 */
+-	if (cgroup_is_populated(cgrp))
++	css_task_iter_start(&cgrp->self, 0, &it);
++	task = css_task_iter_next(&it);
++	css_task_iter_end(&it);
++	if (task)
+ 		return -EBUSY;
+ 
+ 	/*
+@@ -6191,9 +6237,8 @@ static int cgroup_destroy_locked(struct
+ 		link->cset->dead = true;
+ 	spin_unlock_irq(&css_set_lock);
+ 
+-	/* initiate massacre of all css's */
+ 	for_each_css(css, ssid, cgrp)
+-		kill_css(css);
++		kill_css_sync(css);
+ 
+ 	/* clear and remove @cgrp dir, @cgrp has an extra ref on its kn */
+ 	css_clear_dir(&cgrp->self);
+@@ -6224,79 +6269,27 @@ static int cgroup_destroy_locked(struct
+ 	/* put the base reference */
+ 	percpu_ref_kill(&cgrp->self.refcnt);
+ 
++	if (!cgroup_is_populated(cgrp))
++		cgroup_finish_destroy(cgrp);
++
+ 	return 0;
+ };
+ 
+ /**
+- * cgroup_drain_dying - wait for dying tasks to leave before rmdir
+- * @cgrp: the cgroup being removed
++ * cgroup_finish_destroy - deferred half of @cgrp destruction
++ * @cgrp: cgroup whose subtree just became empty
+  *
+- * cgroup.procs and cgroup.threads use css_task_iter which filters out
+- * PF_EXITING tasks so that userspace doesn't see tasks that have already been
+- * reaped via waitpid(). However, cgroup_has_tasks() - which tests whether the
+- * cgroup has non-empty css_sets - is only updated when dying tasks pass through
+- * cgroup_task_dead() in finish_task_switch(). This creates a window where
+- * cgroup.procs reads empty but cgroup_has_tasks() is still true, making rmdir
+- * fail with -EBUSY from cgroup_destroy_locked() even though userspace sees no
+- * tasks.
+- *
+- * This function aligns cgroup_has_tasks() with what userspace can observe. If
+- * cgroup_has_tasks() but the task iterator sees nothing (all remaining tasks are
+- * PF_EXITING), we wait for cgroup_task_dead() to finish processing them. As the
+- * window between PF_EXITING and cgroup_task_dead() is short, the wait is brief.
+- *
+- * This function only concerns itself with this cgroup's own dying tasks.
+- * Whether the cgroup has children is cgroup_destroy_locked()'s problem.
+- *
+- * Each cgroup_task_dead() kicks the waitqueue via cset->cgrp_links, and we
+- * retry the full check from scratch.
+- *
+- * Must be called with cgroup_mutex held.
++ * See cgroup_destroy_locked() for the rationale.
+  */
+-static int cgroup_drain_dying(struct cgroup *cgrp)
+-	__releases(&cgroup_mutex) __acquires(&cgroup_mutex)
++static void cgroup_finish_destroy(struct cgroup *cgrp)
+ {
+-	struct css_task_iter it;
+-	struct task_struct *task;
+-	DEFINE_WAIT(wait);
++	struct cgroup_subsys_state *css;
++	int ssid;
+ 
+ 	lockdep_assert_held(&cgroup_mutex);
+-retry:
+-	if (!cgroup_has_tasks(cgrp))
+-		return 0;
+ 
+-	/* Same iterator as cgroup.threads - if any task is visible, it's busy */
+-	css_task_iter_start(&cgrp->self, 0, &it);
+-	task = css_task_iter_next(&it);
+-	css_task_iter_end(&it);
+-
+-	if (task)
+-		return -EBUSY;
+-
+-	/*
+-	 * All remaining tasks are PF_EXITING and will pass through
+-	 * cgroup_task_dead() shortly. Wait for a kick and retry.
+-	 *
+-	 * cgroup_has_tasks() can't transition from false to true while we're
+-	 * holding cgroup_mutex, but the true to false transition happens
+-	 * under css_set_lock (via cgroup_task_dead()). We must retest and
+-	 * prepare_to_wait() under css_set_lock. Otherwise, the transition
+-	 * can happen between our first test and prepare_to_wait(), and we
+-	 * sleep with no one to wake us.
+-	 */
+-	spin_lock_irq(&css_set_lock);
+-	if (!cgroup_has_tasks(cgrp)) {
+-		spin_unlock_irq(&css_set_lock);
+-		return 0;
+-	}
+-	prepare_to_wait(&cgrp->dying_populated_waitq, &wait,
+-			TASK_UNINTERRUPTIBLE);
+-	spin_unlock_irq(&css_set_lock);
+-	mutex_unlock(&cgroup_mutex);
+-	schedule();
+-	finish_wait(&cgrp->dying_populated_waitq, &wait);
+-	mutex_lock(&cgroup_mutex);
+-	goto retry;
++	for_each_css(css, ssid, cgrp)
++		kill_css_finish(css);
+ }
+ 
+ int cgroup_rmdir(struct kernfs_node *kn)
+@@ -6308,12 +6301,9 @@ int cgroup_rmdir(struct kernfs_node *kn)
+ 	if (!cgrp)
+ 		return 0;
+ 
+-	ret = cgroup_drain_dying(cgrp);
+-	if (!ret) {
+-		ret = cgroup_destroy_locked(cgrp);
+-		if (!ret)
+-			TRACE_CGROUP_PATH(rmdir, cgrp);
+-	}
++	ret = cgroup_destroy_locked(cgrp);
++	if (!ret)
++		TRACE_CGROUP_PATH(rmdir, cgrp);
+ 
+ 	cgroup_kn_unlock(kn);
+ 	return ret;
+@@ -7073,7 +7063,6 @@ void cgroup_task_exit(struct task_struct
+ 
+ static void do_cgroup_task_dead(struct task_struct *tsk)
+ {
+-	struct cgrp_cset_link *link;
+ 	struct css_set *cset;
+ 	unsigned long flags;
+ 
+@@ -7087,11 +7076,6 @@ static void do_cgroup_task_dead(struct t
+ 	if (thread_group_leader(tsk) && atomic_read(&tsk->signal->live))
+ 		list_add_tail(&tsk->cg_list, &cset->dying_tasks);
+ 
+-	/* kick cgroup_drain_dying() waiters, see cgroup_rmdir() */
+-	list_for_each_entry(link, &cset->cgrp_links, cgrp_link)
+-		if (waitqueue_active(&link->cgrp->dying_populated_waitq))
+-			wake_up(&link->cgrp->dying_populated_waitq);
+-
+ 	if (dl_task(tsk))
+ 		dec_dl_tasks_cs(tsk);
+ 
diff --git a/queue-7.0/cgroup-increment-nr_dying_subsys_-from-rmdir-context.patch b/queue-7.0/cgroup-increment-nr_dying_subsys_-from-rmdir-context.patch
new file mode 100644
index 0000000000..480ca7eef6
--- /dev/null
+++ b/queue-7.0/cgroup-increment-nr_dying_subsys_-from-rmdir-context.patch
@@ -0,0 +1,76 @@
+From stable+bounces-246935-greg=kroah.com@vger.kernel.org Wed May 13 19:05:07 2026
+From: Sasha Levin
+Date: Wed, 13 May 2026 12:33:13 -0400
+Subject: cgroup: Increment nr_dying_subsys_* from rmdir context
+To: stable@vger.kernel.org
+Cc: Petr Malat , Tejun Heo , Sasha Levin
+Message-ID: <20260513163314.3807064-1-sashal@kernel.org>
+
+From: Petr Malat
+
+[ Upstream commit 13e786b64bd3fd81c7eb22aa32bf8305c32f2ccf ]
+
+Incrementing nr_dying_subsys_* in offline_css(), which is executed by
+cgroup_offline_wq worker, leads to a race where user can see the value
+to be 0 if he reads cgroup.stat after calling rmdir and before the worker
+executes. This makes the user wrongly expect resources released by the
+removed cgroup to be available for a new assignment.
+
+Increment nr_dying_subsys_* from kill_css(), which is called from the
+cgroup_rmdir() context.
+
+Fixes: ab0312526867 ("cgroup: Show # of subsystem CSSes in cgroup.stat")
+Signed-off-by: Petr Malat
+Signed-off-by: Tejun Heo
+Stable-dep-of: 93618edf7538 ("cgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulated")
+Signed-off-by: Sasha Levin
+Signed-off-by: Greg Kroah-Hartman
+---
+ kernel/cgroup/cgroup.c |   22 ++++++++++++----------
+ 1 file changed, 12 insertions(+), 10 deletions(-)
+
+--- a/kernel/cgroup/cgroup.c
++++ b/kernel/cgroup/cgroup.c
+@@ -5768,16 +5768,6 @@ static void offline_css(struct cgroup_su
+ 	RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL);
+ 
+ 	wake_up_all(&css->cgroup->offline_waitq);
+-
+-	css->cgroup->nr_dying_subsys[ss->id]++;
+-	/*
+-	 * Parent css and cgroup cannot be freed until after the freeing
+-	 * of child css, see css_free_rwork_fn().
+-	 */
+-	while ((css = css->parent)) {
+-		css->nr_descendants--;
+-		css->cgroup->nr_dying_subsys[ss->id]++;
+-	}
+ }
+ 
+ /**
+@@ -6089,6 +6079,8 @@ static void css_killed_ref_fn(struct per
+  */
+ static void kill_css(struct cgroup_subsys_state *css)
+ {
++	struct cgroup_subsys *ss = css->ss;
++
+ 	lockdep_assert_held(&cgroup_mutex);
+ 
+ 	if (css->flags & CSS_DYING)
+@@ -6125,6 +6117,16 @@ static void kill_css(struct cgroup_subsy
+ 	 * css is confirmed to be seen as killed on all CPUs.
+ 	 */
+ 	percpu_ref_kill_and_confirm(&css->refcnt, css_killed_ref_fn);
++
++	css->cgroup->nr_dying_subsys[ss->id]++;
++	/*
++	 * Parent css and cgroup cannot be freed until after the freeing
++	 * of child css, see css_free_rwork_fn().
++	 */
++	while ((css = css->parent)) {
++		css->nr_descendants--;
++		css->cgroup->nr_dying_subsys[ss->id]++;
++	}
+ }
+ 
+ /**
diff --git a/queue-7.0/edac-versalnet-fix-device-name-memory-leak.patch b/queue-7.0/edac-versalnet-fix-device-name-memory-leak.patch
new file mode 100644
index 0000000000..a226fad2b2
--- /dev/null
+++ b/queue-7.0/edac-versalnet-fix-device-name-memory-leak.patch
@@ -0,0 +1,67 @@
+From stable+bounces-247230-greg=kroah.com@vger.kernel.org Thu May 14 17:11:10 2026
+From: Sasha Levin
+Date: Thu, 14 May 2026 11:08:25 -0400
+Subject: EDAC/versalnet: Fix device name memory leak
+To: stable@vger.kernel.org
+Cc: Prasanna Kumar T S M , "Borislav Petkov (AMD)" , Sasha Levin
+Message-ID: <20260514150825.274588-2-sashal@kernel.org>
+
+From: Prasanna Kumar T S M
+
+[ Upstream commit 8cf5dd235eff6008cb04c3d8064d2acfa90616f1 ]
+
+The device name allocated via kzalloc() in init_one_mc() is assigned to
+dev->init_name but never freed on the normal removal path. device_register()
+copies init_name and then sets dev->init_name to NULL, so the name pointer
+becomes unreachable from the device. Thus leaking memory.
+
+Use a stack-local char array instead of using kzalloc() for name.
+
+Fixes: d5fe2fec6c40 ("EDAC: Add a driver for the AMD Versal NET DDR controller")
+Signed-off-by: Prasanna Kumar T S M
+Signed-off-by: Borislav Petkov (AMD)
+Cc: stable@vger.kernel.org
+Link: https://patch.msgid.link/20260401111856.2342975-1-ptsm@linux.microsoft.com
+Signed-off-by: Sasha Levin
+Signed-off-by: Greg Kroah-Hartman
+---
+ drivers/edac/versalnet_edac.c |   10 ++--------
+ 1 file changed, 2 insertions(+), 8 deletions(-)
+
+--- a/drivers/edac/versalnet_edac.c
++++ b/drivers/edac/versalnet_edac.c
+@@ -777,9 +777,9 @@ static int init_one_mc(struct mc_priv *p
+ 	u32 num_chans, rank, dwidth, config;
+ 	struct edac_mc_layer layers[2];
+ 	struct mem_ctl_info *mci;
++	char name[MC_NAME_LEN];
+ 	struct device *dev;
+ 	enum dev_type dt;
+-	char *name;
+ 	int rc;
+ 
+ 	config = priv->adec[CONF + i * ADEC_NUM];
+@@ -813,13 +813,9 @@ static int init_one_mc(struct mc_priv *p
+ 	layers[1].is_virt_csrow = false;
+ 
+ 	rc = -ENOMEM;
+-	name = kzalloc(MC_NAME_LEN, GFP_KERNEL);
+-	if (!name)
+-		return rc;
+-
+ 	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+ 	if (!dev)
+-		goto err_name_free;
++		return rc;
+ 
+ 	mci = edac_mc_alloc(i, ARRAY_SIZE(layers), layers, sizeof(struct mc_priv));
+ 	if (!mci) {
+@@ -858,8 +854,6 @@ err_mc_free:
+ 	edac_mc_free(mci);
+ err_dev_free:
+ 	kfree(dev);
+-err_name_free:
+-	kfree(name);
+ 
+ 	return rc;
+ }
diff --git a/queue-7.0/edac-versalnet-refactor-memory-controller-initialization-and-cleanup.patch b/queue-7.0/edac-versalnet-refactor-memory-controller-initialization-and-cleanup.patch
new file mode 100644
index 0000000000..780ee68d34
--- /dev/null
+++ b/queue-7.0/edac-versalnet-refactor-memory-controller-initialization-and-cleanup.patch
@@ -0,0 +1,252 @@
+From stable+bounces-247229-greg=kroah.com@vger.kernel.org Thu May 14 17:11:14 2026
+From: Sasha Levin
+Date: Thu, 14 May 2026 11:08:24 -0400
+Subject: EDAC/versalnet: Refactor memory controller initialization and cleanup
+To: stable@vger.kernel.org
+Cc: Shubhrajyoti Datta , "Borislav Petkov (AMD)" , Sasha Levin
+Message-ID: <20260514150825.274588-1-sashal@kernel.org>
+
+From: Shubhrajyoti Datta
+
+[ Upstream commit 62a9fc50e8d947601ea3484e732b1a65a0a54b96 ]
+
+Simplify the initialization and cleanup flow for Versal Net DDRMC
+controllers in the EDAC driver by carving out the single controller init
+into a separate function which allows for a much better and more
+readable error handling and unwinding.
+
+  [ bp:
+    - do the kzalloc allocations first
+    - "publish" the structures only after they've been initialized
+      properly so that you don't need to unwind unnecessarily when
+      it fails later
+    - remove_versalnet() is now trivial
+  ]
+
+Signed-off-by: Shubhrajyoti Datta
+Signed-off-by: Borislav Petkov (AMD)
+Link: https://patch.msgid.link/20251104093932.3838876-1-shubhrajyoti.datta@amd.com
+Stable-dep-of: 8cf5dd235eff ("EDAC/versalnet: Fix device name memory leak")
+Signed-off-by: Sasha Levin
+Signed-off-by: Greg Kroah-Hartman
+---
+ drivers/edac/versalnet_edac.c |  174 +++++++++++++++++++++++-------------------
+ 1 file changed, 97 insertions(+), 77 deletions(-)
+
+--- a/drivers/edac/versalnet_edac.c
++++ b/drivers/edac/versalnet_edac.c
+@@ -70,6 +70,8 @@
+ #define XDDR5_BUS_WIDTH_32	1
+ #define XDDR5_BUS_WIDTH_16	2
+ 
++#define MC_NAME_LEN	32
++
+ /**
+  * struct ecc_error_info - ECC error log information.
+  * @burstpos:		Burst position.
+@@ -760,7 +762,17 @@ static void versal_edac_release(struct d
+ 	kfree(dev);
+ }
+ 
+-static int init_versalnet(struct mc_priv *priv, struct platform_device *pdev)
++static void remove_one_mc(struct mc_priv *priv, int i)
++{
++	struct mem_ctl_info *mci;
++
++	mci = priv->mci[i];
++	device_unregister(mci->pdev);
++	edac_mc_del_mc(mci->pdev);
++	edac_mc_free(mci);
++}
++
++static int init_one_mc(struct mc_priv *priv, struct platform_device *pdev, int i)
+ {
+ 	u32 num_chans, rank, dwidth, config;
+ 	struct edac_mc_layer layers[2];
+@@ -768,102 +780,110 @@ static int init_versalnet(struct mc_priv
+ 	struct device *dev;
+ 	enum dev_type dt;
+ 	char *name;
+-	int rc, i;
++	int rc;
+ 
+-	for (i = 0; i < NUM_CONTROLLERS; i++) {
+-		config = priv->adec[CONF + i * ADEC_NUM];
+-		num_chans = FIELD_GET(MC5_NUM_CHANS_MASK, config);
+-		rank = 1 << FIELD_GET(MC5_RANK_MASK, config);
+-		dwidth = FIELD_GET(MC5_BUS_WIDTH_MASK, config);
+-
+-		switch (dwidth) {
+-		case XDDR5_BUS_WIDTH_16:
+-			dt = DEV_X16;
+-			break;
+-		case XDDR5_BUS_WIDTH_32:
+-			dt = DEV_X32;
+-			break;
+-		case XDDR5_BUS_WIDTH_64:
+-			dt = DEV_X64;
+-			break;
+-		default:
+-			dt = DEV_UNKNOWN;
+-		}
++	config = priv->adec[CONF + i * ADEC_NUM];
++	num_chans = FIELD_GET(MC5_NUM_CHANS_MASK, config);
++	rank = 1 << FIELD_GET(MC5_RANK_MASK, config);
++	dwidth = FIELD_GET(MC5_BUS_WIDTH_MASK, config);
++
++	switch (dwidth) {
++	case XDDR5_BUS_WIDTH_16:
++		dt = DEV_X16;
++		break;
++	case XDDR5_BUS_WIDTH_32:
++		dt = DEV_X32;
++		break;
++	case XDDR5_BUS_WIDTH_64:
++		dt = DEV_X64;
++		break;
++	default:
++		dt = DEV_UNKNOWN;
++	}
+ 
+-		if (dt == DEV_UNKNOWN)
+-			continue;
++	if (dt == DEV_UNKNOWN)
++		return 0;
+ 
+-		/* Find the first enabled device and register that one. */
+-		layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+-		layers[0].size = rank;
+-		layers[0].is_virt_csrow = true;
+-		layers[1].type = EDAC_MC_LAYER_CHANNEL;
+-		layers[1].size = num_chans;
+-		layers[1].is_virt_csrow = false;
+-
+-		rc = -ENOMEM;
+-		mci = edac_mc_alloc(i, ARRAY_SIZE(layers), layers,
+-				    sizeof(struct mc_priv));
+-		if (!mci) {
+-			edac_printk(KERN_ERR, EDAC_MC, "Failed memory allocation for MC%d\n", i);
+-			goto err_alloc;
+-		}
++	/* Find the first enabled device and register that one. */
++	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
++	layers[0].size = rank;
++	layers[0].is_virt_csrow = true;
++	layers[1].type = EDAC_MC_LAYER_CHANNEL;
++	layers[1].size = num_chans;
++	layers[1].is_virt_csrow = false;
+ 
+-		priv->mci[i] = mci;
+-		priv->dwidth = dt;
++	rc = -ENOMEM;
++	name = kzalloc(MC_NAME_LEN, GFP_KERNEL);
++	if (!name)
++		return rc;
+ 
+-		dev = kzalloc_obj(*dev);
+-		dev->release = versal_edac_release;
+-		name = kmalloc(32, GFP_KERNEL);
+-		sprintf(name, "versal-net-ddrmc5-edac-%d", i);
+-		dev->init_name = name;
+-		rc = device_register(dev);
+-		if (rc)
+-			goto err_alloc;
++	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
++	if (!dev)
++		goto err_name_free;
+ 
+-		mci->pdev = dev;
++	mci = edac_mc_alloc(i, ARRAY_SIZE(layers), layers, sizeof(struct mc_priv));
++	if (!mci) {
++		edac_printk(KERN_ERR, EDAC_MC, "Failed memory allocation for MC%d\n", i);
++		goto err_dev_free;
++	}
+ 
+-		platform_set_drvdata(pdev, priv);
++	sprintf(name, "versal-net-ddrmc5-edac-%d", i);
+ 
+-		mc_init(mci, dev);
+-		rc = edac_mc_add_mc(mci);
+-		if (rc) {
+-			edac_printk(KERN_ERR, EDAC_MC, "Failed to register MC%d with EDAC core\n", i);
+-			goto err_alloc;
+-		}
+-	}
+-	return 0;
++	dev->init_name = name;
++	dev->release = versal_edac_release;
+ 
+-err_alloc:
+-	while (i--) {
+-		mci = priv->mci[i];
+-		if (!mci)
+-			continue;
+-
+-		if (mci->pdev) {
+-			device_unregister(mci->pdev);
+-			edac_mc_del_mc(mci->pdev);
+-		}
++	rc = device_register(dev);
++	if (rc)
++		goto err_mc_free;
+ 
+-		edac_mc_free(mci);
++	mci->pdev = dev;
++	mc_init(mci, dev);
++
++	rc = edac_mc_add_mc(mci);
++	if (rc) {
++		edac_printk(KERN_ERR, EDAC_MC, "Failed to register MC%d with EDAC core\n", i);
++		goto err_unreg;
+ 	}
+ 
++	priv->mci[i] = mci;
++	priv->dwidth = dt;
++
++	platform_set_drvdata(pdev, priv);
++
++	return 0;
++
++err_unreg:
++	device_unregister(mci->pdev);
++err_mc_free:
++	edac_mc_free(mci);
++err_dev_free:
++	kfree(dev);
++err_name_free:
++	kfree(name);
++
+ 	return rc;
+ }
+ 
+-static void remove_versalnet(struct mc_priv *priv)
++static int init_versalnet(struct mc_priv *priv, struct platform_device *pdev)
+ {
+-	struct mem_ctl_info *mci;
+-	int i;
++	int rc, i;
+ 
+ 	for (i = 0; i < NUM_CONTROLLERS; i++) {
+-		device_unregister(priv->mci[i]->pdev);
+-		mci = edac_mc_del_mc(priv->mci[i]->pdev);
+-		if (!mci)
+-			return;
++		rc = init_one_mc(priv, pdev, i);
++		if (rc) {
++			while (i--)
++				remove_one_mc(priv, i);
+ 
+-		edac_mc_free(mci);
++			return rc;
++		}
+ 	}
++	return 0;
++}
++
++static void remove_versalnet(struct mc_priv *priv)
++{
++	for (int i = 0; i < NUM_CONTROLLERS; i++)
++		remove_one_mc(priv, i);
+ }
+ 
+ static int mc_probe(struct platform_device *pdev)
diff --git a/queue-7.0/io_uring-zcrx-use-guards-for-locking.patch b/queue-7.0/io_uring-zcrx-use-guards-for-locking.patch
new file mode 100644
index 0000000000..b128835774
--- /dev/null
+++ b/queue-7.0/io_uring-zcrx-use-guards-for-locking.patch
@@ -0,0 +1,67 @@
+From 898ad80d1207cbdb22b21bafb6de4adfd7627bd0 Mon Sep 17 00:00:00 2001
+From: Pavel Begunkov
+Date: Mon, 23 Mar 2026 12:43:57 +0000
+Subject: io_uring/zcrx: use guards for locking
+
+From: Pavel Begunkov
+
+commit 898ad80d1207cbdb22b21bafb6de4adfd7627bd0 upstream.
+
+Convert last several places using manual locking to guards to simplify
+the code.
+
+Signed-off-by: Pavel Begunkov
+Link: https://patch.msgid.link/eb4667cfaf88c559700f6399da9e434889f5b04a.1774261953.git.asml.silence@gmail.com
+Signed-off-by: Jens Axboe
+Signed-off-by: Harshit Mogalapalli
+Signed-off-by: Greg Kroah-Hartman
+---
+ io_uring/zcrx.c |   15 +++++++--------
+ 1 file changed, 7 insertions(+), 8 deletions(-)
+
+--- a/io_uring/zcrx.c
++++ b/io_uring/zcrx.c
+@@ -586,9 +586,8 @@ static void io_zcrx_return_niov_freelist
+ {
+ 	struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
+ 
+-	spin_lock_bh(&area->freelist_lock);
++	guard(spinlock_bh)(&area->freelist_lock);
+ 	area->freelist[area->free_count++] = net_iov_idx(niov);
+-	spin_unlock_bh(&area->freelist_lock);
+ }
+ 
+ static void io_zcrx_return_niov(struct net_iov *niov)
+@@ -1029,7 +1028,8 @@ static void io_zcrx_refill_slow(struct p
+ {
+ 	struct io_zcrx_area *area = ifq->area;
+ 
+-	spin_lock_bh(&area->freelist_lock);
++	guard(spinlock_bh)(&area->freelist_lock);
++
+ 	while (area->free_count && pp->alloc.count < PP_ALLOC_CACHE_REFILL) {
+ 		struct net_iov *niov = __io_zcrx_get_free_niov(area);
+ 		netmem_ref netmem = net_iov_to_netmem(niov);
+@@ -1038,7 +1038,6 @@ static void io_zcrx_refill_slow(struct p
+ 		io_zcrx_sync_for_device(pp, niov);
+ 		net_mp_netmem_place_in_cache(pp, netmem);
+ 	}
+-	spin_unlock_bh(&area->freelist_lock);
+ }
+ 
+ static netmem_ref io_pp_zc_alloc_netmems(struct page_pool *pp, gfp_t gfp)
+@@ -1264,10 +1263,10 @@ static struct net_iov *io_alloc_fallback
+ 	if (area->mem.is_dmabuf)
+ 		return NULL;
+ 
+-	spin_lock_bh(&area->freelist_lock);
+-	if (area->free_count)
+-		niov = __io_zcrx_get_free_niov(area);
+-	spin_unlock_bh(&area->freelist_lock);
++	scoped_guard(spinlock_bh, &area->freelist_lock) {
++		if (area->free_count)
++			niov = __io_zcrx_get_free_niov(area);
++	}
+ 
+ 	if (niov)
+ 		page_pool_fragment_netmem(net_iov_to_netmem(niov), 1);
diff --git a/queue-7.0/io_uring-zcrx-warn-on-freelist-violations.patch b/queue-7.0/io_uring-zcrx-warn-on-freelist-violations.patch
new file mode 100644
index 0000000000..1e6457e33e
--- /dev/null
+++ b/queue-7.0/io_uring-zcrx-warn-on-freelist-violations.patch
@@ -0,0 +1,34 @@
+From 770594e78c3964cf23cf5287f849437cdde9b7d0 Mon Sep 17 00:00:00 2001
+From: Pavel Begunkov
+Date: Tue, 21 Apr 2026 09:45:29 +0100
+Subject: io_uring/zcrx: warn on freelist violations
+
+From: Pavel Begunkov
+
+commit 770594e78c3964cf23cf5287f849437cdde9b7d0 upstream.
+
+The freelist is appropriately sized to always be able to take a free
+niov, but let's be more defensive and check the invariant with a
+warning. That should help to catch any double-free issues.
+
+Suggested-by: Kai Aizen
+Signed-off-by: Pavel Begunkov
+Link: https://patch.msgid.link/2f3cea363b04649755e3b6bb9ab66485a95936d5.1776760901.git.asml.silence@gmail.com
+Signed-off-by: Jens Axboe
+Signed-off-by: Harshit Mogalapalli
+Signed-off-by: Greg Kroah-Hartman
+---
+ io_uring/zcrx.c |    2 ++
+ 1 file changed, 2 insertions(+)
+
+--- a/io_uring/zcrx.c
++++ b/io_uring/zcrx.c
+@@ -587,6 +587,8 @@ static void io_zcrx_return_niov_freelist
+ 	struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
+ 
+ 	guard(spinlock_bh)(&area->freelist_lock);
++	if (WARN_ON_ONCE(area->free_count >= area->nia.num_niovs))
++		return;
+ 	area->freelist[area->free_count++] = net_iov_idx(niov);
+ }
+ 
diff --git a/queue-7.0/kho-fix-error-handling-in-kho_add_subtree.patch b/queue-7.0/kho-fix-error-handling-in-kho_add_subtree.patch
new file mode 100644
index 0000000000..d73ade27a7
--- /dev/null
+++ b/queue-7.0/kho-fix-error-handling-in-kho_add_subtree.patch
@@ -0,0 +1,82 @@
+From stable+bounces-247282-greg=kroah.com@vger.kernel.org Thu May 14 21:27:11 2026
+From: Sasha Levin
+Date: Thu, 14 May 2026 15:25:53 -0400
+Subject: kho: fix error handling in kho_add_subtree()
+To: stable@vger.kernel.org
+Cc: Breno Leitao , Pratyush Yadav , "Mike Rapoport (Microsoft)" , Alexander Graf , Pasha Tatashin , Andrew Morton , Sasha Levin
+Message-ID: <20260514192553.1255751-1-sashal@kernel.org>
+
+From: Breno Leitao
+
+[ Upstream commit 9ec95329894864170a1a7685b9a11b739393131a ]
+
+Fix two error handling issues in kho_add_subtree(), where it doesn't
+handle the error path correctly.
+
+1. If fdt_setprop() fails after the subnode has been created, the
+   subnode is not removed. This leaves an incomplete node in the FDT
+   (missing "preserved-data" or "blob-size" properties).
+
+2. The fdt_setprop() return value (an FDT error code) is stored
+   directly in err and returned to the caller, which expects -errno.
+
+Fix both by storing fdt_setprop() results in fdt_err, jumping to a new
+out_del_node label that removes the subnode on failure, and only setting
+err = 0 on the success path, otherwise returning -ENOMEM (instead of
+FDT_ERR_ errors that would come from fdt_setprop).
+
+No user-visible changes. This patch fixes error handling in the KHO
+(Kexec HandOver) subsystem, which is used to preserve data across kexec
+reboots. The fix only affects a rare failure path during kexec
+preparation — specifically when the kernel runs out of space in the
+Flattened Device Tree buffer while registering preserved memory regions.
+
+In the unlikely event that this error path was triggered, the old code
+would leave a malformed node in the device tree and return an incorrect
+error code to the calling subsystem, which could lead to confusing log
+messages or incorrect recovery decisions. With this fix, the incomplete
+node is properly cleaned up and the appropriate errno value is propagated,
+this error code is not returned to the user.
+
+Link: https://lore.kernel.org/20260410-kho_fix_send-v2-1-1b4debf7ee08@debian.org
+Fixes: 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers")
+Signed-off-by: Breno Leitao
+Suggested-by: Pratyush Yadav
+Reviewed-by: Mike Rapoport (Microsoft)
+Reviewed-by: Pratyush Yadav
+Cc: Alexander Graf
+Cc: Breno Leitao
+Cc: Pasha Tatashin
+Cc:
+Signed-off-by: Andrew Morton
+Signed-off-by: Sasha Levin
+Signed-off-by: Greg Kroah-Hartman
+---
+ kernel/liveupdate/kexec_handover.c |   13 +++++++++----
+ 1 file changed, 9 insertions(+), 4 deletions(-)
+
+--- a/kernel/liveupdate/kexec_handover.c
++++ b/kernel/liveupdate/kexec_handover.c
+@@ -757,13 +757,18 @@ int kho_add_subtree(const char *name, vo
+ 		goto out_pack;
+ 	}
+ 
+-	err = fdt_setprop(root_fdt, off, KHO_FDT_SUB_TREE_PROP_NAME,
+-			  &phys, sizeof(phys));
+-	if (err < 0)
+-		goto out_pack;
++	fdt_err = fdt_setprop(root_fdt, off, KHO_FDT_SUB_TREE_PROP_NAME,
++			      &phys, sizeof(phys));
++	if (fdt_err < 0)
++		goto out_del_node;
+ 
+ 	WARN_ON_ONCE(kho_debugfs_fdt_add(&kho_out.dbg, name, fdt, false));
+ 
++	err = 0;
++	goto out_pack;
++
++out_del_node:
++	fdt_del_node(root_fdt, off);
+ out_pack:
+ 	fdt_pack(root_fdt);
+ 
diff --git a/queue-7.0/mm-vma-do-not-try-to-unmap-a-vma-if-mmap_prepare-invoked-from-mmap.patch b/queue-7.0/mm-vma-do-not-try-to-unmap-a-vma-if-mmap_prepare-invoked-from-mmap.patch
new file mode 100644
index 0000000000..8df1fea072
--- /dev/null
+++ b/queue-7.0/mm-vma-do-not-try-to-unmap-a-vma-if-mmap_prepare-invoked-from-mmap.patch
@@ -0,0 +1,298 @@
+From stable+bounces-247163-greg=kroah.com@vger.kernel.org Thu May 14 12:35:09 2026
+From: Lorenzo Stoakes
+Date: Thu, 14 May 2026 11:33:20 +0100
+Subject: mm/vma: do not try to unmap a VMA if mmap_prepare() invoked from mmap()
+To: stable@vger.kernel.org
+Message-ID: <20260514103320.155081-1-ljs@kernel.org>
+
+From: Lorenzo Stoakes
+
+[ Upstream commit 619eab23e1ce7c97e54bfc5a417306d94b3f6f13 ]
+
+The mmap_prepare hook functionality includes the ability to invoke
+mmap_prepare() from the mmap() hook of existing 'stacked' drivers, that is
+ones which are capable of calling the mmap hooks of other drivers/file
+systems (e.g. overlayfs, shm).
+
+As part of the mmap_prepare action functionality, we deal with errors by
+unmapping the VMA should one arise. This works in the usual mmap_prepare
+case, as we invoke this action at the last moment, when the VMA is
+established in the maple tree.
+
+However, the mmap() hook passes a not-fully-established VMA pointer to the
+caller (which is the motivation behind the mmap_prepare() work), which is
+detached.
+
+So attempting to unmap a VMA in this state will be problematic, with the
+most obvious symptom being a warning in vma_mark_detached(), because the
+VMA is already detached.
+
+It's also unncessary - the mmap() handler will clean up the VMA on error.
+
+So to fix this issue, this patch propagates whether or not an mmap action
+is being completed via the compatibility layer or directly.
+
+If the former, then we do not attempt VMA cleanup, if the latter, then we
+do.
+
+This patch also updates the userland VMA tests to reflect the change.
+
+Link: https://lore.kernel.org/20260421102150.189982-1-ljs@kernel.org
+Fixes: ac0a3fc9c07d ("mm: add ability to take further action in vm_area_desc")
+Signed-off-by: Lorenzo Stoakes
+Reported-by: syzbot+db390288d141a1dccf96@syzkaller.appspotmail.com
+Closes: https://lore.kernel.org/all/69e69734.050a0220.24bfd3.0027.GAE@google.com/
+Cc: David Hildenbrand
+Cc: Jann Horn
+Cc: Liam Howlett
+Cc: Michal Hocko
+Cc: Mike Rapoport
+Cc: Pedro Falcato
+Cc: Suren Baghdasaryan
+Cc:
+Signed-off-by: Andrew Morton
+Signed-off-by: Lorenzo Stoakes
+Signed-off-by: Greg Kroah-Hartman
+---
+ include/linux/mm.h                |    2 -
+ mm/util.c                         |   51 +++++++++++++++++++++-----------------
+ mm/vma.c                          |    3 --
+ tools/testing/vma/include/dup.h   |   41 ++++++++++++++----------------
+ tools/testing/vma/include/stubs.h |    3 +-
+ 5 files changed, 53 insertions(+), 47 deletions(-)
+
+--- a/include/linux/mm.h
++++ b/include/linux/mm.h
+@@ -4080,7 +4080,7 @@ static inline void mmap_action_ioremap_f
+ 
+ int mmap_action_prepare(struct vm_area_desc *desc);
+ int mmap_action_complete(struct vm_area_struct *vma,
+-			 struct mmap_action *action);
++			 struct mmap_action *action, bool is_compat);
+ 
+ /* Look up the first VMA which exactly match the interval vm_start ... vm_end */
+ static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
+--- a/mm/util.c
++++ b/mm/util.c
+@@ -1186,7 +1186,8 @@ int compat_vma_mmap(struct file *file, s
+ 		return err;
+ 
+ 	set_vma_from_desc(vma, &desc);
+-	err = mmap_action_complete(vma, &desc.action);
++	err = mmap_action_complete(vma, &desc.action,
++				   /*is_compat=*/true);
+ 	if (err) {
+ 		const size_t len = vma_pages(vma) << PAGE_SHIFT;
+ 
+@@ -1277,28 +1278,31 @@ again:
+ }
+ 
+ static int mmap_action_finish(struct vm_area_struct *vma,
+-			      struct mmap_action *action, int err)
++			      struct mmap_action *action, int err,
++			      bool is_compat)
+ {
++	if (!err && action->success_hook)
++		err = action->success_hook(vma);
++
++	/*
++	 * If this is invoked from the compatibility layer, post-mmap() hook
++	 * logic will handle cleanup for us.
++	 */
++	if (!err || is_compat)
++		return err;
++
+ 	/*
+ 	 * If an error occurs, unmap the VMA altogether and return an error. We
+ 	 * only clear the newly allocated VMA, since this function is only
+ 	 * invoked if we do NOT merge, so we only clean up the VMA we created.
+ 	 */
+-	if (err) {
+-		if (action->error_hook) {
+-			/* We may want to filter the error. */
+-			err = action->error_hook(err);
+-
+-			/* The caller should not clear the error. */
+-			VM_WARN_ON_ONCE(!err);
+-		}
+-		return err;
++	if (action->error_hook) {
++		/* We may want to filter the error. */
++		err = action->error_hook(err);
++		/* The caller should not clear the error. */
++		VM_WARN_ON_ONCE(!err);
+ 	}
+-
+-	if (action->success_hook)
+-		return action->success_hook(vma);
+-
+-	return 0;
++	return err;
+ }
+ 
+ #ifdef CONFIG_MMU
+@@ -1329,14 +1333,16 @@ EXPORT_SYMBOL(mmap_action_prepare);
+  * mmap_action_complete - Execute VMA descriptor action.
+  * @vma: The VMA to perform the action upon.
+  * @action: The action to perform.
++ * @is_compat: Is this being invoked from the compatibility layer?
+  *
+  * Similar to mmap_action_prepare().
+  *
+- * Return: 0 on success, or error, at which point the VMA will be unmapped.
++ * Return: 0 on success, or error, at which point the VMA will be unmapped if
++ * !@is_compat.
+  */
+ int mmap_action_complete(struct vm_area_struct *vma,
+-			 struct mmap_action *action)
+-
++			 struct mmap_action *action,
++			 bool is_compat)
+ {
+ 	int err = 0;
+ 
+@@ -1353,7 +1359,7 @@ int mmap_action_complete(struct vm_area_
+ 		break;
+ 	}
+ 
+-	return mmap_action_finish(vma, action, err);
++	return mmap_action_finish(vma, action, err, is_compat);
+ }
+ EXPORT_SYMBOL(mmap_action_complete);
+ #else
+@@ -1373,7 +1379,8 @@ int mmap_action_prepare(struct vm_area_d
+ EXPORT_SYMBOL(mmap_action_prepare);
+ 
+ int mmap_action_complete(struct vm_area_struct *vma,
+-			 struct mmap_action *action)
++			 struct mmap_action *action,
++			 bool is_compat)
+ {
+ 	int err = 0;
+ 
+@@ -1388,7 +1395,7 @@ int mmap_action_complete(struct vm_area_
+ 		break;
+ 	}
+ 
+-	return mmap_action_finish(vma, action, err);
++	return mmap_action_finish(vma, action, err, is_compat);
+ }
+ EXPORT_SYMBOL(mmap_action_complete);
+ #endif
+--- a/mm/vma.c
++++ b/mm/vma.c
+@@ -2708,7 +2708,7 @@ static int call_action_complete(struct m
+ {
+ 	int err;
+ 
+-	err = mmap_action_complete(vma, action);
++	err = mmap_action_complete(vma, action, /*is_compat=*/false);
+ 
+ 	/* If we held the file rmap we need to release it. */
+ 	if (map->hold_file_rmap_lock) {
+@@ -2778,7 +2778,6 @@ static unsigned long __mmap_region(struc
+ 
+ 	if (have_mmap_prepare && allocated_new) {
+ 		error = call_action_complete(&map, &desc.action, vma);
+-
+ 		if (error)
+ 			return error;
+ 	}
+--- a/tools/testing/vma/include/dup.h
++++ b/tools/testing/vma/include/dup.h
+@@ -1071,8 +1071,17 @@ static inline void vma_set_anonymous(str
+ static inline void set_vma_from_desc(struct vm_area_struct *vma,
+ 				     struct vm_area_desc *desc);
+ 
+-static inline int __compat_vma_mmap(const struct file_operations *f_op,
+-				    struct file *file, struct vm_area_struct *vma)
++static inline unsigned long vma_pages(struct vm_area_struct *vma)
++{
++	return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
++}
++
++static inline int vfs_mmap_prepare(struct file *file, struct vm_area_desc *desc)
++{
++	return file->f_op->mmap_prepare(desc);
++}
++
++static inline int compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
+ {
+ 	struct vm_area_desc desc = {
+ 		.mm = vma->vm_mm,
+@@ -1082,14 +1091,14 @@ static inline int __compat_vma_mmap(cons
+ 
+ 		.pgoff = vma->vm_pgoff,
+ 		.vm_file = vma->vm_file,
+-		.vm_flags = vma->vm_flags,
++		.vma_flags = vma->flags,
+ 		.page_prot = vma->vm_page_prot,
+ 
+ 		.action.type = MMAP_NOTHING, /* Default */
+ 	};
+ 	int err;
+ 
+-	err = f_op->mmap_prepare(&desc);
++	err = vfs_mmap_prepare(file, &desc);
+ 	if (err)
+ 		return err;
+ 
+@@ -1098,27 +1107,22 @@ static inline int __compat_vma_mmap(cons
+ 		return err;
+ 
+ 	set_vma_from_desc(vma, &desc);
+-	return mmap_action_complete(vma, &desc.action);
+-}
++	err = mmap_action_complete(vma, &desc.action,
++				   /*is_compat=*/true);
++	if (err) {
++		const size_t len = vma_pages(vma) << PAGE_SHIFT;
+ 
+-static inline int compat_vma_mmap(struct file *file,
+-				  struct vm_area_struct *vma)
+-{
+-	return __compat_vma_mmap(file->f_op, file, vma);
++		do_munmap(current->mm, vma->vm_start, len, NULL);
++	}
++	return err;
+ }
+ 
+-
+ static inline void vma_iter_init(struct vma_iterator *vmi,
+ 				 struct mm_struct *mm, unsigned long addr)
+ {
+ 	mas_init(&vmi->mas, &mm->mm_mt, addr);
+ }
+ 
+-static inline unsigned long vma_pages(struct vm_area_struct *vma)
+-{
+-	return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+-}
+-
+ static inline void mmap_assert_locked(struct mm_struct *);
+ static inline struct vm_area_struct *find_vma_intersection(struct mm_struct *mm,
+ 							    unsigned long start_addr,
+@@ -1309,11 +1313,6 @@ static inline int vfs_mmap(struct file *
+ 	return file->f_op->mmap(file, vma);
+ }
+ 
+-static inline int vfs_mmap_prepare(struct file *file, struct vm_area_desc *desc)
+-{
+-	return file->f_op->mmap_prepare(desc);
+-}
+-
+ static inline void vma_set_file(struct vm_area_struct *vma, struct file *file)
+ {
+ 	/* Changing an anonymous vma with this is illegal */
+--- a/tools/testing/vma/include/stubs.h
++++ b/tools/testing/vma/include/stubs.h
+@@ -87,7 +87,8 @@ static inline int mmap_action_prepare(st
+ }
+ 
+ static inline int mmap_action_complete(struct vm_area_struct *vma,
+-				       struct mmap_action *action)
++				       struct mmap_action *action,
++				       bool is_compat)
+ {
+ 	return 0;
+ }
diff --git a/queue-7.0/perf-build-fix-argument-list-too-long-in-second-location.patch b/queue-7.0/perf-build-fix-argument-list-too-long-in-second-location.patch
new file mode 100644
index 0000000000..4f1f734cce
--- /dev/null
+++ b/queue-7.0/perf-build-fix-argument-list-too-long-in-second-location.patch
@@ -0,0 +1,53 @@
+From stable+bounces-247054-greg=kroah.com@vger.kernel.org Thu May 14 01:47:02 2026
+From: Florian Fainelli
+Date: Wed, 13 May 2026 16:46:38 -0700
+Subject: perf build: fix "argument list too long" in second location
+To: stable@vger.kernel.org
+Cc: Markus Mayer , James Clark , Namhyung Kim , Florian Fainelli , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org (open list:PERFORMANCE EVENTS SUBSYSTEM), linux-kernel@vger.kernel.org (open list:PERFORMANCE EVENTS SUBSYSTEM)
+Message-ID: <20260513234639.128528-1-florian.fainelli@broadcom.com>
+
+From: Markus Mayer
+
+commit 97ab89686a9e5d087042dbe73604a32b3de72653 upstream
+
+Turns out that displaying "RM $^" via quiet_cmd_rm can also upset the
+shell and cause it to display "argument list too long".
+
+Trying to quote $^ doesn't help.
+
+In the end, *not* displaying the (potentially long) list of files is
+probably the right thing to do for a "quiet" message, anyway. Instead,
+let's display a count of how many files were removed. There is always
+V=1 if more detail is required.
+
+  TEST    linux/tools/perf/pmu-events/metric_test.log
+  RM      ...634 orphan file(s)...
+  LD      linux/tools/perf/util/perf-util-in.o
+
+Also move the comment regarding xargs before the rule, so it doesn't
+show up in the build output.
+
+Signed-off-by: Markus Mayer
+Reviewed-by: James Clark
+Signed-off-by: Namhyung Kim
+Signed-off-by: Florian Fainelli
+Signed-off-by: Greg Kroah-Hartman
+---
+ tools/perf/pmu-events/Build |    4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/tools/perf/pmu-events/Build
++++ b/tools/perf/pmu-events/Build
+@@ -211,10 +211,10 @@ ifneq ($(strip $(ORPHAN_FILES)),)
+ 
+ # Message for $(call echo-cmd,rm). Generally cleaning files isn't part
+ # of a build step.
+-quiet_cmd_rm = RM $^
++quiet_cmd_rm = RM ...$(words $^) orphan file(s)...
+ 
++# The list of files can be long. Use xargs to prevent issues.
+ prune_orphans: $(ORPHAN_FILES)
+-	# The list of files can be long. Use xargs to prevent issues.
+ 	$(Q)$(call echo-cmd,rm)echo "$^" | xargs rm -f
+ 
+ JEVENTS_DEPS += prune_orphans
diff --git a/queue-7.0/sched_ext-skip-tasks-with-stale-task_rq-in-bypass_lb_cpu.patch b/queue-7.0/sched_ext-skip-tasks-with-stale-task_rq-in-bypass_lb_cpu.patch
new file mode 100644
index 0000000000..e634eb03d9
--- /dev/null
+++ b/queue-7.0/sched_ext-skip-tasks-with-stale-task_rq-in-bypass_lb_cpu.patch
@@ -0,0 +1,50 @@
+From arighi@nvidia.com Wed May 13 15:01:26 2026
+From: Andrea Righi
+Date: Wed, 13 May 2026 15:01:11 +0200
+Subject: sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()
+To: Greg Kroah-Hartman , Tejun Heo , David Vernet , Changwoo Min
+Cc: Chris Mason , Peter Schneider , sched-ext@lists.linux.dev, stable@vger.kernel.org, linux-kernel@vger.kernel.org
+Message-ID: <20260513130111.689740-1-arighi@nvidia.com>
+
+From: Tejun Heo
+
+commit da2d81b4118a74e65d2335e221a38d665902a98c upstream.
+
+bypass_lb_cpu() transfers tasks between per-CPU bypass DSQs without
+migrating them - task_cpu() only updates when the donee later consumes the
+task via move_remote_task_to_local_dsq(). If the LB timer fires again before
+consumption and the new DSQ becomes a donor, @p is still on the previous CPU
+and task_rq(@p) != donor_rq. @p can't be moved without its own rq locked.
+
+Skip such tasks.
+
+Fixes: 95d1df610cdc ("sched_ext: Implement load balancer for bypass mode")
+Cc: stable@vger.kernel.org # v6.19+
+Reported-by: Chris Mason
+Signed-off-by: Tejun Heo
+Reviewed-by: Andrea Righi
+[ arighi: replace donor_rq with rq, not present in v7.0.y ]
+Signed-off-by: Andrea Righi
+Signed-off-by: Greg Kroah-Hartman
+---
+ kernel/sched/ext.c |    9 +++++++++
+ 1 file changed, 9 insertions(+)
+
+--- a/kernel/sched/ext.c
++++ b/kernel/sched/ext.c
+@@ -4010,6 +4010,15 @@ resume:
+ 		if (cpumask_empty(donee_mask))
+ 			break;
+ 
++		/*
++		 * If an earlier pass placed @p on @donor_dsq from a different
++		 * CPU and the donee hasn't consumed it yet, @p is still on the
++		 * previous CPU and task_rq(@p) != @rq. @p can't be moved
++		 * without its rq locked. Skip.
++		 */
++		if (task_rq(p) != rq)
++			continue;
++
+ 		donee = cpumask_any_and_distribute(donee_mask, p->cpus_ptr);
+ 		if (donee >= nr_cpu_ids)
+ 			continue;
diff --git a/queue-7.0/series b/queue-7.0/series
index c9a1566e32..80e6b3e7e3 100644
--- a/queue-7.0/series
+++ b/queue-7.0/series
@@ -180,3 +180,16 @@ batman-adv-bla-prevent-use-after-free-when-deleting-claims.patch
 batman-adv-bla-only-purge-non-released-claims.patch
 batman-adv-bla-put-backbone-reference-on-failed-claim-hash-insert.patch
 sched_ext-use-hk_type_domain_boot-to-detect-isolcpus-domain-isolation.patch
+usb-typec-tcpm-reset-internal-port-states-on-soft-reset-ams.patch
+io_uring-zcrx-use-guards-for-locking.patch
+io_uring-zcrx-warn-on-freelist-violations.patch
+kho-fix-error-handling-in-kho_add_subtree.patch
+edac-versalnet-refactor-memory-controller-initialization-and-cleanup.patch
+edac-versalnet-fix-device-name-memory-leak.patch
+spi-uniphier-simplify-clock-handling-with-devm_clk_get_enabled.patch
+spi-uniphier-fix-controller-deregistration.patch
+cgroup-increment-nr_dying_subsys_-from-rmdir-context.patch
+cgroup-defer-css-percpu_ref-kill-on-rmdir-until-cgroup-is-depopulated.patch
+sched_ext-skip-tasks-with-stale-task_rq-in-bypass_lb_cpu.patch
+perf-build-fix-argument-list-too-long-in-second-location.patch
+mm-vma-do-not-try-to-unmap-a-vma-if-mmap_prepare-invoked-from-mmap.patch
diff --git a/queue-7.0/spi-uniphier-fix-controller-deregistration.patch b/queue-7.0/spi-uniphier-fix-controller-deregistration.patch
new file mode 100644
index 0000000000..4678ce1573
--- /dev/null
+++ b/queue-7.0/spi-uniphier-fix-controller-deregistration.patch
@@ -0,0 +1,59 @@
+From stable+bounces-247104-greg=kroah.com@vger.kernel.org Thu May 14 06:36:57 2026
+From: Sasha Levin
+Date: Thu, 14 May 2026 00:36:31 -0400
+Subject: spi: uniphier: fix controller deregistration
+To: stable@vger.kernel.org
+Cc: Johan Hovold , Keiji Hayashibara , Mark Brown , Sasha Levin
+Message-ID: <20260514043631.10946-2-sashal@kernel.org>
+
+From: Johan Hovold
+
+[ Upstream commit 0245435f777264ac45945ed2f325dd095a41d1af ]
+
+Make sure to deregister the controller before releasing underlying
+resources like DMA during driver unbind.
+
+Note that clocks were also disabled before the recent commit
+fdca270f8f87 ("spi: uniphier: Simplify clock handling with
+devm_clk_get_enabled()").
+
+Fixes: 5ba155a4d4cc ("spi: add SPI controller driver for UniPhier SoC")
+Cc: stable@vger.kernel.org # 4.19
+Cc: Keiji Hayashibara
+Signed-off-by: Johan Hovold
+Link: https://patch.msgid.link/20260410081757.503099-25-johan@kernel.org
+Signed-off-by: Mark Brown
+Signed-off-by: Sasha Levin
+Signed-off-by: Greg Kroah-Hartman
+---
+ drivers/spi/spi-uniphier.c |    8 +++++++-
+ 1 file changed, 7 insertions(+), 1 deletion(-)
+
+--- a/drivers/spi/spi-uniphier.c
++++ b/drivers/spi/spi-uniphier.c
+@@ -746,7 +746,7 @@ static int uniphier_spi_probe(struct pla
+ 
+ 	host->max_dma_len = min(dma_tx_burst, dma_rx_burst);
+ 
+-	ret = devm_spi_register_controller(&pdev->dev, host);
++	ret = spi_register_controller(host);
+ 	if (ret)
+ 		goto out_release_dma;
+ 
+@@ -771,10 +771,16 @@ static void uniphier_spi_remove(struct p
+ {
+ 	struct spi_controller *host = platform_get_drvdata(pdev);
+ 
++	spi_controller_get(host);
++
++	spi_unregister_controller(host);
++
+ 	if (host->dma_tx)
+ 		dma_release_channel(host->dma_tx);
+ 	if (host->dma_rx)
+ 		dma_release_channel(host->dma_rx);
++
++	spi_controller_put(host);
+ }
+ 
+ static const struct of_device_id uniphier_spi_match[] = {
diff --git a/queue-7.0/spi-uniphier-simplify-clock-handling-with-devm_clk_get_enabled.patch b/queue-7.0/spi-uniphier-simplify-clock-handling-with-devm_clk_get_enabled.patch
new file mode 100644
index 0000000000..c6c6fe88a4
--- /dev/null
+++ b/queue-7.0/spi-uniphier-simplify-clock-handling-with-devm_clk_get_enabled.patch
@@ -0,0 +1,99 @@
+From stable+bounces-247103-greg=kroah.com@vger.kernel.org Thu May 14 06:36:56 2026
+From: Sasha Levin
+Date: Thu, 14 May 2026 00:36:30 -0400
+Subject: spi: uniphier: Simplify clock handling with devm_clk_get_enabled()
+To: stable@vger.kernel.org
+Cc: Pei Xiao , Kunihiko Hayashi , Mark Brown , Sasha Levin
+Message-ID: <20260514043631.10946-1-sashal@kernel.org>
+
+From: Pei Xiao
+
+[ Upstream commit fdca270f8f87cae2eb5b619234b9dd11a863ce6b ]
+
+Replace devm_clk_get() followed by clk_prepare_enable() with
+devm_clk_get_enabled() for the clock. This removes the need for
+explicit clock enable and disable calls, as the managed API automatically
+handles clock disabling on device removal or probe failure.
+
+Remove the now-unnecessary clk_disable_unprepare() calls from the probe
+error path and the remove callback. Adjust error labels accordingly.
+
+Signed-off-by: Pei Xiao
+Reviewed-by: Kunihiko Hayashi
+Link: https://patch.msgid.link/b2deeefd4ef1a4bce71116aabfcb7e81400f6d37.1775546948.git.xiaopei01@kylinos.cn
+Signed-off-by: Mark Brown
+Stable-dep-of: 0245435f7772 ("spi: uniphier: fix controller deregistration")
+Signed-off-by: Sasha Levin
+Signed-off-by: Greg Kroah-Hartman
+---
+ drivers/spi/spi-uniphier.c | 18 ++++--------------
+ 1 file changed, 4 insertions(+), 14 deletions(-)
+
+--- a/drivers/spi/spi-uniphier.c
++++ b/drivers/spi/spi-uniphier.c
+@@ -666,28 +666,24 @@ static int uniphier_spi_probe(struct pla
+ }
+ priv->base_dma_addr = res->start;
+
+- priv->clk = devm_clk_get(&pdev->dev, NULL);
++ priv->clk = devm_clk_get_enabled(&pdev->dev, NULL);
+ if (IS_ERR(priv->clk)) {
+ dev_err(&pdev->dev, "failed to get clock\n");
+ ret = PTR_ERR(priv->clk);
+ goto out_host_put;
+ }
+
+- ret = clk_prepare_enable(priv->clk);
+- if (ret)
+- goto out_host_put;
+-
+ irq = platform_get_irq(pdev, 0);
+ if (irq < 0) {
+ ret = irq;
+- goto out_disable_clk;
++ goto out_host_put;
+ }
+
+ ret = devm_request_irq(&pdev->dev, irq, uniphier_spi_handler,
+ 0, "uniphier-spi", priv);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to request IRQ\n");
+- goto out_disable_clk;
++ goto out_host_put;
+ }
+
+ init_completion(&priv->xfer_done);
+@@ -716,7 +712,7 @@ static int uniphier_spi_probe(struct pla
+ if (IS_ERR_OR_NULL(host->dma_tx)) {
+ if (PTR_ERR(host->dma_tx) == -EPROBE_DEFER) {
+ ret = -EPROBE_DEFER;
+- goto out_disable_clk;
++ goto out_host_put;
+ }
+ host->dma_tx = NULL;
+ dma_tx_burst = INT_MAX;
+@@ -766,9 +762,6 @@ out_release_dma:
+ host->dma_tx = NULL;
+ }
+
+-out_disable_clk:
+- clk_disable_unprepare(priv->clk);
+-
+ out_host_put:
+ spi_controller_put(host);
+ return ret;
+@@ -777,14 +770,11 @@ out_host_put:
+ static void uniphier_spi_remove(struct platform_device *pdev)
+ {
+ struct spi_controller *host = platform_get_drvdata(pdev);
+- struct uniphier_spi_priv *priv = spi_controller_get_devdata(host);
+
+ if (host->dma_tx)
+ dma_release_channel(host->dma_tx);
+ if (host->dma_rx)
+ dma_release_channel(host->dma_rx);
+-
+- clk_disable_unprepare(priv->clk);
+ }
+
+ static const struct of_device_id uniphier_spi_match[] = {
diff --git a/queue-7.0/usb-typec-tcpm-reset-internal-port-states-on-soft-reset-ams.patch b/queue-7.0/usb-typec-tcpm-reset-internal-port-states-on-soft-reset-ams.patch
new file mode 100644
index 0000000000..4ba2d49f04
--- /dev/null
+++ b/queue-7.0/usb-typec-tcpm-reset-internal-port-states-on-soft-reset-ams.patch
@@ -0,0 +1,85 @@
+From 2909f0d4994fb4306bf116df5ccee797791fce2c Mon Sep 17 00:00:00 2001
+From: Amit Sunil Dhamne
+Date: Tue, 14 Apr 2026 00:58:32 +0000
+Subject: usb: typec: tcpm: reset internal port states on soft reset AMS
+
+From: Amit Sunil Dhamne
+
+commit 2909f0d4994fb4306bf116df5ccee797791fce2c upstream.
+
+Reset internal port states (such as vdm_sm_running and
+explicit_contract) on soft reset AMS as the port needs to negotiate a
+new contract. The consequences of leaving the states as-is are as
+follows:
+ * port is in SRC power role and an explicit contract is negotiated
+ with the port partner (in sink role)
+ * port partner sends a Soft Reset AMS while VDM State Machine is
+ running
+ * port accepts the Soft Reset request and the port advertises src caps
+ * port partner sends a Request message but since the explicit_contract
+ and vdm_sm_running are true from previous negotiation, the port ends
+ up sending Soft Reset instead of Accept msg.
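[ Editor's sketch: the failure mode in the sequence above, reduced to
the two flags involved. example_port is a stand-in for struct
tcpm_port and the handler is simplified; only the field names and the
two assignments mirror the hunk below. ]

#include <stdbool.h>

struct example_port {			/* stand-in for struct tcpm_port */
	bool vdm_sm_running;		/* a VDM AMS is in flight */
	bool explicit_contract;		/* an explicit PD contract exists */
};

/* Called once a received Soft Reset has been Accepted. */
static void example_soft_reset(struct example_port *port)
{
	/*
	 * Soft Reset voids the prior negotiation. If these flags stay
	 * true, the partner's next Request is answered with another
	 * Soft Reset instead of Accept (the trace below shows exactly
	 * that), so clear them before renegotiating.
	 */
	port->vdm_sm_running = false;
	port->explicit_contract = false;
}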
+ +Stub Log: +[ 203.653942] AMS DISCOVER_IDENTITY start +[ 203.653947] PD TX, header: 0x176f +[ 203.655901] PD TX complete, status: 0 +[ 203.657470] PD RX, header: 0x124f [1] +[ 203.657477] Rx VDM cmd 0xff008081 type 2 cmd 1 len 1 +[ 203.657482] AMS DISCOVER_IDENTITY finished +[ 203.657484] cc:=4 +[ 204.155698] PD RX, header: 0x144f [1] +[ 204.155718] Rx VDM cmd 0xeeee8001 type 0 cmd 1 len 1 +[ 204.155741] PD TX, header: 0x196f +[ 204.157622] PD TX complete, status: 0 +[ 204.160060] PD RX, header: 0x4d [1] +[ 204.160066] state change SRC_READY -> SOFT_RESET [rev2 SOFT_RESET_AMS] +[ 204.160076] PD TX, header: 0x163 +[ 204.162486] PD TX complete, status: 0 +[ 204.162832] AMS SOFT_RESET_AMS finished +[ 204.162840] cc:=4 +[ 204.162891] AMS POWER_NEGOTIATION start +[ 204.162896] state change SOFT_RESET -> AMS_START [rev2 POWER_NEGOTIATION] +[ 204.162908] state change AMS_START -> SRC_SEND_CAPABILITIES [rev2 POWER_NEGOTIATION] +[ 204.162913] PD TX, header: 0x1361 +[ 204.165529] PD TX complete, status: 0 +[ 204.165571] pending state change SRC_SEND_CAPABILITIES -> SRC_SEND_CAPABILITIES_TIMEOUT @ 60 ms [rev2 POWER_NEGOTIATION] +[ 204.166996] PD RX, header: 0x1242 [1] +[ 204.167009] state change SRC_SEND_CAPABILITIES -> SRC_SOFT_RESET_WAIT_SNK_TX [rev2 POWER_NEGOTIATION] +[ 204.167019] AMS POWER_NEGOTIATION finished +[ 204.167020] cc:=4 +[ 204.167083] AMS SOFT_RESET_AMS start +[ 204.167086] state change SRC_SOFT_RESET_WAIT_SNK_TX -> SOFT_RESET_SEND [rev2 SOFT_RESET_AMS] +[ 204.167092] PD TX, header: 0x16d +[ 204.168824] PD TX complete, status: 0 +[ 204.168854] pending state change SOFT_RESET_SEND -> HARD_RESET_SEND @ 60 ms [rev2 SOFT_RESET_AMS] +[ 204.171876] PD RX, header: 0x43 [1] +[ 204.171879] AMS SOFT_RESET_AMS finished + +This causes COMMON.PROC.PD.11.2 check failure for +TEST.PD.VDM.SRC.2_Rev2Src test on the PD compliance tester. + +Signed-off-by: Amit Sunil Dhamne +Fixes: 8d3a0578ad1a ("usb: typec: tcpm: Respond Wait if VDM state machine is running") +Fixes: f0690a25a140 ("staging: typec: USB Type-C Port Manager (tcpm)") +Cc: stable +Reviewed-by: Badhri Jagan Sridharan +Acked-by: Heikki Krogerus +Link: https://patch.msgid.link/20260414-fix-soft-reset-v1-1-01d7cb9764e2@google.com +Signed-off-by: Greg Kroah-Hartman +Signed-off-by: Greg Kroah-Hartman +--- + drivers/usb/typec/tcpm/tcpm.c | 2 ++ + 1 file changed, 2 insertions(+) + +--- a/drivers/usb/typec/tcpm/tcpm.c ++++ b/drivers/usb/typec/tcpm/tcpm.c +@@ -5539,6 +5539,8 @@ static void run_state_machine(struct tcp + usb_power_delivery_unregister_capabilities(port->partner_source_caps); + port->partner_source_caps = NULL; + tcpm_pd_send_control(port, PD_CTRL_ACCEPT, TCPC_TX_SOP); ++ port->vdm_sm_running = false; ++ port->explicit_contract = false; + tcpm_ams_finish(port); + if (port->pwr_role == TYPEC_SOURCE) { + port->upcoming_state = SRC_SEND_CAPABILITIES;