--- /dev/null
+From stable+bounces-246936-greg=kroah.com@vger.kernel.org Wed May 13 19:05:49 2026
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 13 May 2026 12:33:14 -0400
+Subject: cgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulated
+To: stable@vger.kernel.org
+Cc: Tejun Heo <tj@kernel.org>, Martin Pitt <martin@piware.de>, Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Sasha Levin <sashal@kernel.org>
+Message-ID: <20260513163314.3807064-2-sashal@kernel.org>
+
+From: Tejun Heo <tj@kernel.org>
+
+[ Upstream commit 93618edf753838a727dbff63c7c291dee22d656b ]
+
+A chain of commits going back to v7.0 reworked rmdir to satisfy the
+controller invariant that a subsystem's ->css_offline() must not run while
+tasks are still doing kernel-side work in the cgroup.
+
+[1] d245698d727a ("cgroup: Defer task cgroup unlink until after the task is done switching out")
+[2] a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup")
+[3] 1b164b876c36 ("cgroup: Wait for dying tasks to leave on rmdir")
+[4] 4c56a8ac6869 ("cgroup: Fix cgroup_drain_dying() testing the wrong condition")
+[5] 13e786b64bd3 ("cgroup: Increment nr_dying_subsys_* from rmdir context")
+
+[1] moved task cset unlink from do_exit() to finish_task_switch() so a
+task's cset link drops only after the task has fully stopped scheduling.
+That made tasks past exit_signals() linger on cset->tasks until their final
+context switch, which led to a series of problems as what userspace
+expected to see after rmdir diverged from what the kernel needed to
+wait for. [2]-[5]
+tried to bridge that divergence: [2] filtered the exiting tasks from
+cgroup.procs; [3] had rmdir(2) sleep in TASK_UNINTERRUPTIBLE for them; [4]
+fixed the wait's condition; [5] made nr_dying_subsys_* visible
+synchronously.
+
+The cgroup_drain_dying() wait in [3] turned out to be a dead end. When the
+rmdir caller is also the reaper of a zombie that pins a pidns teardown (e.g.
+host PID 1 systemd reaping orphan pids that were re-parented to it during
+the same teardown), rmdir blocks in TASK_UNINTERRUPTIBLE waiting for
+those pids to be freed, the pids can't be freed because PID 1 is the
+reaper and it's stuck in rmdir, and the system A-A deadlocks. No
+internal lock ordering breaks
+this; the wait itself is the bug.
+
+The css killing side that drove the original reorder, however, can be made
+cleanly asynchronous: ->css_offline() is already async, run from
+css_killed_work_fn() driven by percpu_ref_kill_and_confirm(). The fix is to
+make that chain start only after all tasks have left the cgroup. rmdir's
+user-visible side then returns as soon as cgroup.procs and friends are
+empty, while ->css_offline() still runs only after the cgroup is fully
+drained.
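+
+Condensed from the patch below, the resulting flow is:
+
+  /* rmdir context, cgroup_mutex held */
+  cgroup_destroy_locked(cgrp);          /* user-visible teardown + kill_css_sync() */
+  if (!cgroup_is_populated(cgrp))
+          cgroup_finish_destroy(cgrp);  /* kill_css_finish() kicks the percpu_ref kill */
+
+  /* otherwise, once the last task leaves the subtree: */
+  cgroup_update_populated()
+    -> queue_work(cgroup_offline_wq, &cgrp->finish_destroy_work)
+      -> cgroup_finish_destroy(cgrp)    /* -> css_killed_work_fn() -> ->css_offline() */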
+
+Verified by the original reproducer (pidns teardown + zombie reaper, runs
+under vng) which hangs vanilla and succeeds here, and by per-commit
+deterministic repros for [2], [3], [4], [5] with a boot parameter that
+widens the post-exit_signals() window so each state is reliably reachable.
+Some stress tests on top of that.
+
+cgroup_apply_control_disable() has a pre-existing race of the same shape:
+when a controller is disabled via subtree_control, kill_css() ran
+synchronously while tasks past exit_signals() could still be linked to
+the cgroup's csets, and ->css_offline() could fire before they drained.
+This patch preserves the existing synchronous behavior at that call site
+(kill_css_sync() + kill_css_finish() back-to-back) and a follow-up patch
+will defer kill_css_finish() there using a per-css trigger.
+
+This seems like the right approach and I don't see problems with it. The
+changes are somewhat invasive but not excessively so, so backporting to
+-stable should be okay. If something does turn out to be wrong, the fallback
+is to revert the entire chain ([1]-[5]) and rework in the development branch
+instead.
+
+v2: Pin cgrp across the deferred destroy work with explicit
+ cgroup_get()/cgroup_put() around queue_work() and the work_fn. v1
+ wasn't actually broken (ordered cgroup_offline_wq + queue_work order
+ in cgroup_task_dead() saved it) but the explicit ref removes the
+ dependency on those non-obvious invariants. Also note the
+ pre-existing cgroup_apply_control_disable() race in the description;
+ a follow-up will defer kill_css_finish() there.
+
+Fixes: 1b164b876c36 ("cgroup: Wait for dying tasks to leave on rmdir")
+Cc: stable@vger.kernel.org # v7.0+
+Reported-and-tested-by: Martin Pitt <martin@piware.de>
+Link: https://lore.kernel.org/all/afHNg2VX2jy9bW7y@piware.de/
+Link: https://lore.kernel.org/all/35e0670adb4abeab13da2c321582af9f@kernel.org/
+Signed-off-by: Tejun Heo <tj@kernel.org>
+Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ include/linux/cgroup-defs.h | 4
+ kernel/cgroup/cgroup.c | 250 ++++++++++++++++++++------------------------
+ 2 files changed, 119 insertions(+), 135 deletions(-)
+
+--- a/include/linux/cgroup-defs.h
++++ b/include/linux/cgroup-defs.h
+@@ -609,8 +609,8 @@ struct cgroup {
+ /* used to wait for offlining of csses */
+ wait_queue_head_t offline_waitq;
+
+- /* used by cgroup_rmdir() to wait for dying tasks to leave */
+- wait_queue_head_t dying_populated_waitq;
++ /* defers killing csses after removal until cgroup is depopulated */
++ struct work_struct finish_destroy_work;
+
+ /* used to schedule release agent */
+ struct work_struct release_agent_work;
+--- a/kernel/cgroup/cgroup.c
++++ b/kernel/cgroup/cgroup.c
+@@ -278,10 +278,12 @@ static void cgroup_finalize_control(stru
+ static void css_task_iter_skip(struct css_task_iter *it,
+ struct task_struct *task);
+ static int cgroup_destroy_locked(struct cgroup *cgrp);
++static void cgroup_finish_destroy(struct cgroup *cgrp);
++static void kill_css_sync(struct cgroup_subsys_state *css);
++static void kill_css_finish(struct cgroup_subsys_state *css);
+ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
+ struct cgroup_subsys *ss);
+ static void css_release(struct percpu_ref *ref);
+-static void kill_css(struct cgroup_subsys_state *css);
+ static int cgroup_addrm_files(struct cgroup_subsys_state *css,
+ struct cgroup *cgrp, struct cftype cfts[],
+ bool is_add);
+@@ -858,6 +860,16 @@ static void cgroup_update_populated(stru
+ if (was_populated == cgroup_is_populated(cgrp))
+ break;
+
++ /*
++ * Subtree just emptied below an offlined cgrp. Fire deferred
++ * destroy. The transition is one-shot.
++ */
++ if (was_populated && !css_is_online(&cgrp->self)) {
++ cgroup_get(cgrp);
++ WARN_ON_ONCE(!queue_work(cgroup_offline_wq,
++ &cgrp->finish_destroy_work));
++ }
++
+ cgroup1_check_for_release(cgrp);
+ TRACE_CGROUP_PATH(notify_populated, cgrp,
+ cgroup_is_populated(cgrp));
+@@ -2100,6 +2112,16 @@ static int cgroup_reconfigure(struct fs_
+ return 0;
+ }
+
++static void cgroup_finish_destroy_work_fn(struct work_struct *work)
++{
++ struct cgroup *cgrp = container_of(work, struct cgroup, finish_destroy_work);
++
++ cgroup_lock();
++ cgroup_finish_destroy(cgrp);
++ cgroup_unlock();
++ cgroup_put(cgrp);
++}
++
+ static void init_cgroup_housekeeping(struct cgroup *cgrp)
+ {
+ struct cgroup_subsys *ss;
+@@ -2126,7 +2148,7 @@ static void init_cgroup_housekeeping(str
+ #endif
+
+ init_waitqueue_head(&cgrp->offline_waitq);
+- init_waitqueue_head(&cgrp->dying_populated_waitq);
++ INIT_WORK(&cgrp->finish_destroy_work, cgroup_finish_destroy_work_fn);
+ INIT_WORK(&cgrp->release_agent_work, cgroup1_release_agent);
+ }
+
+@@ -3436,7 +3458,8 @@ static void cgroup_apply_control_disable
+
+ if (css->parent &&
+ !(cgroup_ss_mask(dsct) & (1 << ss->id))) {
+- kill_css(css);
++ kill_css_sync(css);
++ kill_css_finish(css);
+ } else if (!css_visible(css)) {
+ css_clear_dir(css);
+ if (ss->css_reset)
+@@ -5558,7 +5581,7 @@ static struct cftype cgroup_psi_files[]
+ * css destruction is four-stage process.
+ *
+ * 1. Destruction starts. Killing of the percpu_ref is initiated.
+- * Implemented in kill_css().
++ * Implemented in kill_css_finish().
+ *
+ * 2. When the percpu_ref is confirmed to be visible as killed on all CPUs
+ * and thus css_tryget_online() is guaranteed to fail, the css can be
+@@ -6037,7 +6060,7 @@ out_unlock:
+ /*
+ * This is called when the refcnt of a css is confirmed to be killed.
+ * css_tryget_online() is now guaranteed to fail. Tell the subsystem to
+- * initiate destruction and put the css ref from kill_css().
++ * initiate destruction and put the css ref from kill_css_finish().
+ */
+ static void css_killed_work_fn(struct work_struct *work)
+ {
+@@ -6069,15 +6092,12 @@ static void css_killed_ref_fn(struct per
+ }
+
+ /**
+- * kill_css - destroy a css
+- * @css: css to destroy
++ * kill_css_sync - synchronous half of css teardown
++ * @css: css being killed
+ *
+- * This function initiates destruction of @css by removing cgroup interface
+- * files and putting its base reference. ->css_offline() will be invoked
+- * asynchronously once css_tryget_online() is guaranteed to fail and when
+- * the reference count reaches zero, @css will be released.
++ * See cgroup_destroy_locked().
+ */
+-static void kill_css(struct cgroup_subsys_state *css)
++static void kill_css_sync(struct cgroup_subsys_state *css)
+ {
+ struct cgroup_subsys *ss = css->ss;
+
+@@ -6100,24 +6120,6 @@ static void kill_css(struct cgroup_subsy
+ */
+ css_clear_dir(css);
+
+- /*
+- * Killing would put the base ref, but we need to keep it alive
+- * until after ->css_offline().
+- */
+- css_get(css);
+-
+- /*
+- * cgroup core guarantees that, by the time ->css_offline() is
+- * invoked, no new css reference will be given out via
+- * css_tryget_online(). We can't simply call percpu_ref_kill() and
+- * proceed to offlining css's because percpu_ref_kill() doesn't
+- * guarantee that the ref is seen as killed on all CPUs on return.
+- *
+- * Use percpu_ref_kill_and_confirm() to get notifications as each
+- * css is confirmed to be seen as killed on all CPUs.
+- */
+- percpu_ref_kill_and_confirm(&css->refcnt, css_killed_ref_fn);
+-
+ css->cgroup->nr_dying_subsys[ss->id]++;
+ /*
+ * Parent css and cgroup cannot be freed until after the freeing
+@@ -6130,44 +6132,88 @@ static void kill_css(struct cgroup_subsy
+ }
+
+ /**
+- * cgroup_destroy_locked - the first stage of cgroup destruction
++ * kill_css_finish - deferred half of css teardown
++ * @css: css being killed
++ *
++ * See cgroup_destroy_locked().
++ */
++static void kill_css_finish(struct cgroup_subsys_state *css)
++{
++ lockdep_assert_held(&cgroup_mutex);
++
++ /*
++ * Skip on re-entry: cgroup_apply_control_disable() may have killed @css
++ * earlier. cgroup_destroy_locked() can still walk it because
++ * offline_css() (which NULLs cgrp->subsys[ssid]) runs async.
++ */
++ if (percpu_ref_is_dying(&css->refcnt))
++ return;
++
++ /*
++ * Killing would put the base ref, but we need to keep it alive until
++ * after ->css_offline().
++ */
++ css_get(css);
++
++ /*
++ * cgroup core guarantees that, by the time ->css_offline() is invoked,
++ * no new css reference will be given out via css_tryget_online(). We
++ * can't simply call percpu_ref_kill() and proceed to offlining css's
++ * because percpu_ref_kill() doesn't guarantee that the ref is seen as
++ * killed on all CPUs on return.
++ *
++ * Use percpu_ref_kill_and_confirm() to get notifications as each css is
++ * confirmed to be seen as killed on all CPUs.
++ */
++ percpu_ref_kill_and_confirm(&css->refcnt, css_killed_ref_fn);
++}
++
++/**
++ * cgroup_destroy_locked - destroy @cgrp (called on rmdir)
+ * @cgrp: cgroup to be destroyed
+ *
+- * css's make use of percpu refcnts whose killing latency shouldn't be
+- * exposed to userland and are RCU protected. Also, cgroup core needs to
+- * guarantee that css_tryget_online() won't succeed by the time
+- * ->css_offline() is invoked. To satisfy all the requirements,
+- * destruction is implemented in the following two steps.
+- *
+- * s1. Verify @cgrp can be destroyed and mark it dying. Remove all
+- * userland visible parts and start killing the percpu refcnts of
+- * css's. Set up so that the next stage will be kicked off once all
+- * the percpu refcnts are confirmed to be killed.
+- *
+- * s2. Invoke ->css_offline(), mark the cgroup dead and proceed with the
+- * rest of destruction. Once all cgroup references are gone, the
+- * cgroup is RCU-freed.
+- *
+- * This function implements s1. After this step, @cgrp is gone as far as
+- * the userland is concerned and a new cgroup with the same name may be
+- * created. As cgroup doesn't care about the names internally, this
+- * doesn't cause any problem.
++ * Tear down @cgrp on behalf of rmdir. Constraints:
++ *
++ * - Userspace: rmdir must succeed when cgroup.procs and friends are empty.
++ *
++ * - Kernel: subsystem ->css_offline() must not run while any task in @cgrp's
++ * subtree is still doing kernel work. A task hidden from cgroup.procs (past
++ * exit_signals() with signal->live cleared) can still schedule, allocate, and
++ * consume resources until its final context switch. Dying descendants in the
++ * subtree can host such tasks too.
++ *
++ * - Kernel: css_tryget_online() must fail by the time ->css_offline() runs.
++ *
++ * The destruction runs in three parts:
++ *
++ * - This function: synchronous user-visible state teardown plus kill_css_sync()
++ * on each subsystem css.
++ *
++ * - cgroup_finish_destroy(): kicks the percpu_ref kill via kill_css_finish() on
++ * each subsystem css. Fires once @cgrp's subtree is fully drained, either
++ * inline here or from cgroup_update_populated().
++ *
++ * - The percpu_ref kill chain: css_killed_ref_fn -> css_killed_work_fn ->
++ * ->css_offline() -> release/free.
++ *
++ * Return 0 on success, -EBUSY if a userspace-visible task or an online child
++ * remains.
+ */
+ static int cgroup_destroy_locked(struct cgroup *cgrp)
+- __releases(&cgroup_mutex) __acquires(&cgroup_mutex)
+ {
+ struct cgroup *tcgrp, *parent = cgroup_parent(cgrp);
+ struct cgroup_subsys_state *css;
+ struct cgrp_cset_link *link;
++ struct css_task_iter it;
++ struct task_struct *task;
+ int ssid, ret;
+
+ lockdep_assert_held(&cgroup_mutex);
+
+- /*
+- * Only migration can raise populated from zero and we're already
+- * holding cgroup_mutex.
+- */
+- if (cgroup_is_populated(cgrp))
++ css_task_iter_start(&cgrp->self, 0, &it);
++ task = css_task_iter_next(&it);
++ css_task_iter_end(&it);
++ if (task)
+ return -EBUSY;
+
+ /*
+@@ -6191,9 +6237,8 @@ static int cgroup_destroy_locked(struct
+ link->cset->dead = true;
+ spin_unlock_irq(&css_set_lock);
+
+- /* initiate massacre of all css's */
+ for_each_css(css, ssid, cgrp)
+- kill_css(css);
++ kill_css_sync(css);
+
+ /* clear and remove @cgrp dir, @cgrp has an extra ref on its kn */
+ css_clear_dir(&cgrp->self);
+@@ -6224,79 +6269,27 @@ static int cgroup_destroy_locked(struct
+ /* put the base reference */
+ percpu_ref_kill(&cgrp->self.refcnt);
+
++ if (!cgroup_is_populated(cgrp))
++ cgroup_finish_destroy(cgrp);
++
+ return 0;
+ };
+
+ /**
+- * cgroup_drain_dying - wait for dying tasks to leave before rmdir
+- * @cgrp: the cgroup being removed
++ * cgroup_finish_destroy - deferred half of @cgrp destruction
++ * @cgrp: cgroup whose subtree just became empty
+ *
+- * cgroup.procs and cgroup.threads use css_task_iter which filters out
+- * PF_EXITING tasks so that userspace doesn't see tasks that have already been
+- * reaped via waitpid(). However, cgroup_has_tasks() - which tests whether the
+- * cgroup has non-empty css_sets - is only updated when dying tasks pass through
+- * cgroup_task_dead() in finish_task_switch(). This creates a window where
+- * cgroup.procs reads empty but cgroup_has_tasks() is still true, making rmdir
+- * fail with -EBUSY from cgroup_destroy_locked() even though userspace sees no
+- * tasks.
+- *
+- * This function aligns cgroup_has_tasks() with what userspace can observe. If
+- * cgroup_has_tasks() but the task iterator sees nothing (all remaining tasks are
+- * PF_EXITING), we wait for cgroup_task_dead() to finish processing them. As the
+- * window between PF_EXITING and cgroup_task_dead() is short, the wait is brief.
+- *
+- * This function only concerns itself with this cgroup's own dying tasks.
+- * Whether the cgroup has children is cgroup_destroy_locked()'s problem.
+- *
+- * Each cgroup_task_dead() kicks the waitqueue via cset->cgrp_links, and we
+- * retry the full check from scratch.
+- *
+- * Must be called with cgroup_mutex held.
++ * See cgroup_destroy_locked() for the rationale.
+ */
+-static int cgroup_drain_dying(struct cgroup *cgrp)
+- __releases(&cgroup_mutex) __acquires(&cgroup_mutex)
++static void cgroup_finish_destroy(struct cgroup *cgrp)
+ {
+- struct css_task_iter it;
+- struct task_struct *task;
+- DEFINE_WAIT(wait);
++ struct cgroup_subsys_state *css;
++ int ssid;
+
+ lockdep_assert_held(&cgroup_mutex);
+-retry:
+- if (!cgroup_has_tasks(cgrp))
+- return 0;
+
+- /* Same iterator as cgroup.threads - if any task is visible, it's busy */
+- css_task_iter_start(&cgrp->self, 0, &it);
+- task = css_task_iter_next(&it);
+- css_task_iter_end(&it);
+-
+- if (task)
+- return -EBUSY;
+-
+- /*
+- * All remaining tasks are PF_EXITING and will pass through
+- * cgroup_task_dead() shortly. Wait for a kick and retry.
+- *
+- * cgroup_has_tasks() can't transition from false to true while we're
+- * holding cgroup_mutex, but the true to false transition happens
+- * under css_set_lock (via cgroup_task_dead()). We must retest and
+- * prepare_to_wait() under css_set_lock. Otherwise, the transition
+- * can happen between our first test and prepare_to_wait(), and we
+- * sleep with no one to wake us.
+- */
+- spin_lock_irq(&css_set_lock);
+- if (!cgroup_has_tasks(cgrp)) {
+- spin_unlock_irq(&css_set_lock);
+- return 0;
+- }
+- prepare_to_wait(&cgrp->dying_populated_waitq, &wait,
+- TASK_UNINTERRUPTIBLE);
+- spin_unlock_irq(&css_set_lock);
+- mutex_unlock(&cgroup_mutex);
+- schedule();
+- finish_wait(&cgrp->dying_populated_waitq, &wait);
+- mutex_lock(&cgroup_mutex);
+- goto retry;
++ for_each_css(css, ssid, cgrp)
++ kill_css_finish(css);
+ }
+
+ int cgroup_rmdir(struct kernfs_node *kn)
+@@ -6308,12 +6301,9 @@ int cgroup_rmdir(struct kernfs_node *kn)
+ if (!cgrp)
+ return 0;
+
+- ret = cgroup_drain_dying(cgrp);
+- if (!ret) {
+- ret = cgroup_destroy_locked(cgrp);
+- if (!ret)
+- TRACE_CGROUP_PATH(rmdir, cgrp);
+- }
++ ret = cgroup_destroy_locked(cgrp);
++ if (!ret)
++ TRACE_CGROUP_PATH(rmdir, cgrp);
+
+ cgroup_kn_unlock(kn);
+ return ret;
+@@ -7073,7 +7063,6 @@ void cgroup_task_exit(struct task_struct
+
+ static void do_cgroup_task_dead(struct task_struct *tsk)
+ {
+- struct cgrp_cset_link *link;
+ struct css_set *cset;
+ unsigned long flags;
+
+@@ -7087,11 +7076,6 @@ static void do_cgroup_task_dead(struct t
+ if (thread_group_leader(tsk) && atomic_read(&tsk->signal->live))
+ list_add_tail(&tsk->cg_list, &cset->dying_tasks);
+
+- /* kick cgroup_drain_dying() waiters, see cgroup_rmdir() */
+- list_for_each_entry(link, &cset->cgrp_links, cgrp_link)
+- if (waitqueue_active(&link->cgrp->dying_populated_waitq))
+- wake_up(&link->cgrp->dying_populated_waitq);
+-
+ if (dl_task(tsk))
+ dec_dl_tasks_cs(tsk);
+
--- /dev/null
+From stable+bounces-246935-greg=kroah.com@vger.kernel.org Wed May 13 19:05:07 2026
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 13 May 2026 12:33:13 -0400
+Subject: cgroup: Increment nr_dying_subsys_* from rmdir context
+To: stable@vger.kernel.org
+Cc: Petr Malat <oss@malat.biz>, Tejun Heo <tj@kernel.org>, Sasha Levin <sashal@kernel.org>
+Message-ID: <20260513163314.3807064-1-sashal@kernel.org>
+
+From: Petr Malat <oss@malat.biz>
+
+[ Upstream commit 13e786b64bd3fd81c7eb22aa32bf8305c32f2ccf ]
+
+Incrementing nr_dying_subsys_* in offline_css(), which is executed by
+the cgroup_offline_wq worker, leads to a race where the user can see a
+value of 0 when reading cgroup.stat after calling rmdir but before the worker
+executes. This makes the user wrongly expect resources released by the
+removed cgroup to be available for a new assignment.
+
+Increment nr_dying_subsys_* from kill_css(), which is called from the
+cgroup_rmdir() context.
+
+Fixes: ab0312526867 ("cgroup: Show # of subsystem CSSes in cgroup.stat")
+Signed-off-by: Petr Malat <oss@malat.biz>
+Signed-off-by: Tejun Heo <tj@kernel.org>
+Stable-dep-of: 93618edf7538 ("cgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulated")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ kernel/cgroup/cgroup.c | 22 ++++++++++++----------
+ 1 file changed, 12 insertions(+), 10 deletions(-)
+
+--- a/kernel/cgroup/cgroup.c
++++ b/kernel/cgroup/cgroup.c
+@@ -5768,16 +5768,6 @@ static void offline_css(struct cgroup_su
+ RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL);
+
+ wake_up_all(&css->cgroup->offline_waitq);
+-
+- css->cgroup->nr_dying_subsys[ss->id]++;
+- /*
+- * Parent css and cgroup cannot be freed until after the freeing
+- * of child css, see css_free_rwork_fn().
+- */
+- while ((css = css->parent)) {
+- css->nr_descendants--;
+- css->cgroup->nr_dying_subsys[ss->id]++;
+- }
+ }
+
+ /**
+@@ -6089,6 +6079,8 @@ static void css_killed_ref_fn(struct per
+ */
+ static void kill_css(struct cgroup_subsys_state *css)
+ {
++ struct cgroup_subsys *ss = css->ss;
++
+ lockdep_assert_held(&cgroup_mutex);
+
+ if (css->flags & CSS_DYING)
+@@ -6125,6 +6117,16 @@ static void kill_css(struct cgroup_subsy
+ * css is confirmed to be seen as killed on all CPUs.
+ */
+ percpu_ref_kill_and_confirm(&css->refcnt, css_killed_ref_fn);
++
++ css->cgroup->nr_dying_subsys[ss->id]++;
++ /*
++ * Parent css and cgroup cannot be freed until after the freeing
++ * of child css, see css_free_rwork_fn().
++ */
++ while ((css = css->parent)) {
++ css->nr_descendants--;
++ css->cgroup->nr_dying_subsys[ss->id]++;
++ }
+ }
+
+ /**
--- /dev/null
+From stable+bounces-247230-greg=kroah.com@vger.kernel.org Thu May 14 17:11:10 2026
+From: Sasha Levin <sashal@kernel.org>
+Date: Thu, 14 May 2026 11:08:25 -0400
+Subject: EDAC/versalnet: Fix device name memory leak
+To: stable@vger.kernel.org
+Cc: Prasanna Kumar T S M <ptsm@linux.microsoft.com>, "Borislav Petkov (AMD)" <bp@alien8.de>, Sasha Levin <sashal@kernel.org>
+Message-ID: <20260514150825.274588-2-sashal@kernel.org>
+
+From: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
+
+[ Upstream commit 8cf5dd235eff6008cb04c3d8064d2acfa90616f1 ]
+
+The device name allocated via kzalloc() in init_one_mc() is assigned to
+dev->init_name but never freed on the normal removal path. device_register()
+copies init_name and then sets dev->init_name to NULL, so the name
+pointer becomes unreachable from the device, leaking the allocation.
+
+Use a stack-local char array for the name instead of allocating it with
+kzalloc().
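+
+A minimal sketch of the resulting pattern (device_register() duplicates
+init_name into the kobject name, so the stack buffer is safe):
+
+  char name[MC_NAME_LEN];
+
+  snprintf(name, sizeof(name), "versal-net-ddrmc5-edac-%d", i);
+  dev->init_name = name;
+  rc = device_register(dev);    /* copies the name, then clears init_name */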
+
+Fixes: d5fe2fec6c40 ("EDAC: Add a driver for the AMD Versal NET DDR controller")
+Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
+Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
+Cc: stable@vger.kernel.org
+Link: https://patch.msgid.link/20260401111856.2342975-1-ptsm@linux.microsoft.com
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/edac/versalnet_edac.c | 10 ++--------
+ 1 file changed, 2 insertions(+), 8 deletions(-)
+
+--- a/drivers/edac/versalnet_edac.c
++++ b/drivers/edac/versalnet_edac.c
+@@ -777,9 +777,9 @@ static int init_one_mc(struct mc_priv *p
+ u32 num_chans, rank, dwidth, config;
+ struct edac_mc_layer layers[2];
+ struct mem_ctl_info *mci;
++ char name[MC_NAME_LEN];
+ struct device *dev;
+ enum dev_type dt;
+- char *name;
+ int rc;
+
+ config = priv->adec[CONF + i * ADEC_NUM];
+@@ -813,13 +813,9 @@ static int init_one_mc(struct mc_priv *p
+ layers[1].is_virt_csrow = false;
+
+ rc = -ENOMEM;
+- name = kzalloc(MC_NAME_LEN, GFP_KERNEL);
+- if (!name)
+- return rc;
+-
+ dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+ if (!dev)
+- goto err_name_free;
++ return rc;
+
+ mci = edac_mc_alloc(i, ARRAY_SIZE(layers), layers, sizeof(struct mc_priv));
+ if (!mci) {
+@@ -858,8 +854,6 @@ err_mc_free:
+ edac_mc_free(mci);
+ err_dev_free:
+ kfree(dev);
+-err_name_free:
+- kfree(name);
+
+ return rc;
+ }
--- /dev/null
+From stable+bounces-247229-greg=kroah.com@vger.kernel.org Thu May 14 17:11:14 2026
+From: Sasha Levin <sashal@kernel.org>
+Date: Thu, 14 May 2026 11:08:24 -0400
+Subject: EDAC/versalnet: Refactor memory controller initialization and cleanup
+To: stable@vger.kernel.org
+Cc: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>, "Borislav Petkov (AMD)" <bp@alien8.de>, Sasha Levin <sashal@kernel.org>
+Message-ID: <20260514150825.274588-1-sashal@kernel.org>
+
+From: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
+
+[ Upstream commit 62a9fc50e8d947601ea3484e732b1a65a0a54b96 ]
+
+Simplify the initialization and cleanup flow for Versal Net DDRMC
+controllers in the EDAC driver by carving out the single controller init
+into a separate function, which allows for much better and more
+readable error handling and unwinding.
+
+ [ bp:
+ - do the kzalloc allocations first
+ - "publish" the structures only after they've been initialized
+ properly so that you don't need to unwind unnecessarily when
+ it fails later
+ - remove_versalnet() is now trivial
+ ]
+
+Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
+Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
+Link: https://patch.msgid.link/20251104093932.3838876-1-shubhrajyoti.datta@amd.com
+Stable-dep-of: 8cf5dd235eff ("EDAC/versalnet: Fix device name memory leak")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/edac/versalnet_edac.c | 174 +++++++++++++++++++++++-------------------
+ 1 file changed, 97 insertions(+), 77 deletions(-)
+
+--- a/drivers/edac/versalnet_edac.c
++++ b/drivers/edac/versalnet_edac.c
+@@ -70,6 +70,8 @@
+ #define XDDR5_BUS_WIDTH_32 1
+ #define XDDR5_BUS_WIDTH_16 2
+
++#define MC_NAME_LEN 32
++
+ /**
+ * struct ecc_error_info - ECC error log information.
+ * @burstpos: Burst position.
+@@ -760,7 +762,17 @@ static void versal_edac_release(struct d
+ kfree(dev);
+ }
+
+-static int init_versalnet(struct mc_priv *priv, struct platform_device *pdev)
++static void remove_one_mc(struct mc_priv *priv, int i)
++{
++ struct mem_ctl_info *mci;
++
++ mci = priv->mci[i];
++ device_unregister(mci->pdev);
++ edac_mc_del_mc(mci->pdev);
++ edac_mc_free(mci);
++}
++
++static int init_one_mc(struct mc_priv *priv, struct platform_device *pdev, int i)
+ {
+ u32 num_chans, rank, dwidth, config;
+ struct edac_mc_layer layers[2];
+@@ -768,102 +780,110 @@ static int init_versalnet(struct mc_priv
+ struct device *dev;
+ enum dev_type dt;
+ char *name;
+- int rc, i;
++ int rc;
+
+- for (i = 0; i < NUM_CONTROLLERS; i++) {
+- config = priv->adec[CONF + i * ADEC_NUM];
+- num_chans = FIELD_GET(MC5_NUM_CHANS_MASK, config);
+- rank = 1 << FIELD_GET(MC5_RANK_MASK, config);
+- dwidth = FIELD_GET(MC5_BUS_WIDTH_MASK, config);
+-
+- switch (dwidth) {
+- case XDDR5_BUS_WIDTH_16:
+- dt = DEV_X16;
+- break;
+- case XDDR5_BUS_WIDTH_32:
+- dt = DEV_X32;
+- break;
+- case XDDR5_BUS_WIDTH_64:
+- dt = DEV_X64;
+- break;
+- default:
+- dt = DEV_UNKNOWN;
+- }
++ config = priv->adec[CONF + i * ADEC_NUM];
++ num_chans = FIELD_GET(MC5_NUM_CHANS_MASK, config);
++ rank = 1 << FIELD_GET(MC5_RANK_MASK, config);
++ dwidth = FIELD_GET(MC5_BUS_WIDTH_MASK, config);
++
++ switch (dwidth) {
++ case XDDR5_BUS_WIDTH_16:
++ dt = DEV_X16;
++ break;
++ case XDDR5_BUS_WIDTH_32:
++ dt = DEV_X32;
++ break;
++ case XDDR5_BUS_WIDTH_64:
++ dt = DEV_X64;
++ break;
++ default:
++ dt = DEV_UNKNOWN;
++ }
+
+- if (dt == DEV_UNKNOWN)
+- continue;
++ if (dt == DEV_UNKNOWN)
++ return 0;
+
+- /* Find the first enabled device and register that one. */
+- layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+- layers[0].size = rank;
+- layers[0].is_virt_csrow = true;
+- layers[1].type = EDAC_MC_LAYER_CHANNEL;
+- layers[1].size = num_chans;
+- layers[1].is_virt_csrow = false;
+-
+- rc = -ENOMEM;
+- mci = edac_mc_alloc(i, ARRAY_SIZE(layers), layers,
+- sizeof(struct mc_priv));
+- if (!mci) {
+- edac_printk(KERN_ERR, EDAC_MC, "Failed memory allocation for MC%d\n", i);
+- goto err_alloc;
+- }
++ /* Find the first enabled device and register that one. */
++ layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
++ layers[0].size = rank;
++ layers[0].is_virt_csrow = true;
++ layers[1].type = EDAC_MC_LAYER_CHANNEL;
++ layers[1].size = num_chans;
++ layers[1].is_virt_csrow = false;
+
+- priv->mci[i] = mci;
+- priv->dwidth = dt;
++ rc = -ENOMEM;
++ name = kzalloc(MC_NAME_LEN, GFP_KERNEL);
++ if (!name)
++ return rc;
+
+- dev = kzalloc_obj(*dev);
+- dev->release = versal_edac_release;
+- name = kmalloc(32, GFP_KERNEL);
+- sprintf(name, "versal-net-ddrmc5-edac-%d", i);
+- dev->init_name = name;
+- rc = device_register(dev);
+- if (rc)
+- goto err_alloc;
++ dev = kzalloc(sizeof(*dev), GFP_KERNEL);
++ if (!dev)
++ goto err_name_free;
+
+- mci->pdev = dev;
++ mci = edac_mc_alloc(i, ARRAY_SIZE(layers), layers, sizeof(struct mc_priv));
++ if (!mci) {
++ edac_printk(KERN_ERR, EDAC_MC, "Failed memory allocation for MC%d\n", i);
++ goto err_dev_free;
++ }
+
+- platform_set_drvdata(pdev, priv);
++ sprintf(name, "versal-net-ddrmc5-edac-%d", i);
+
+- mc_init(mci, dev);
+- rc = edac_mc_add_mc(mci);
+- if (rc) {
+- edac_printk(KERN_ERR, EDAC_MC, "Failed to register MC%d with EDAC core\n", i);
+- goto err_alloc;
+- }
+- }
+- return 0;
++ dev->init_name = name;
++ dev->release = versal_edac_release;
+
+-err_alloc:
+- while (i--) {
+- mci = priv->mci[i];
+- if (!mci)
+- continue;
+-
+- if (mci->pdev) {
+- device_unregister(mci->pdev);
+- edac_mc_del_mc(mci->pdev);
+- }
++ rc = device_register(dev);
++ if (rc)
++ goto err_mc_free;
+
+- edac_mc_free(mci);
++ mci->pdev = dev;
++ mc_init(mci, dev);
++
++ rc = edac_mc_add_mc(mci);
++ if (rc) {
++ edac_printk(KERN_ERR, EDAC_MC, "Failed to register MC%d with EDAC core\n", i);
++ goto err_unreg;
+ }
+
++ priv->mci[i] = mci;
++ priv->dwidth = dt;
++
++ platform_set_drvdata(pdev, priv);
++
++ return 0;
++
++err_unreg:
++ device_unregister(mci->pdev);
++err_mc_free:
++ edac_mc_free(mci);
++err_dev_free:
++ kfree(dev);
++err_name_free:
++ kfree(name);
++
+ return rc;
+ }
+
+-static void remove_versalnet(struct mc_priv *priv)
++static int init_versalnet(struct mc_priv *priv, struct platform_device *pdev)
+ {
+- struct mem_ctl_info *mci;
+- int i;
++ int rc, i;
+
+ for (i = 0; i < NUM_CONTROLLERS; i++) {
+- device_unregister(priv->mci[i]->pdev);
+- mci = edac_mc_del_mc(priv->mci[i]->pdev);
+- if (!mci)
+- return;
++ rc = init_one_mc(priv, pdev, i);
++ if (rc) {
++ while (i--)
++ remove_one_mc(priv, i);
+
+- edac_mc_free(mci);
++ return rc;
++ }
+ }
++ return 0;
++}
++
++static void remove_versalnet(struct mc_priv *priv)
++{
++ for (int i = 0; i < NUM_CONTROLLERS; i++)
++ remove_one_mc(priv, i);
+ }
+
+ static int mc_probe(struct platform_device *pdev)
--- /dev/null
+From 898ad80d1207cbdb22b21bafb6de4adfd7627bd0 Mon Sep 17 00:00:00 2001
+From: Pavel Begunkov <asml.silence@gmail.com>
+Date: Mon, 23 Mar 2026 12:43:57 +0000
+Subject: io_uring/zcrx: use guards for locking
+
+From: Pavel Begunkov <asml.silence@gmail.com>
+
+commit 898ad80d1207cbdb22b21bafb6de4adfd7627bd0 upstream.
+
+Convert last several places using manual locking to guards to simplify
+the code.
+
+Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
+Link: https://patch.msgid.link/eb4667cfaf88c559700f6399da9e434889f5b04a.1774261953.git.asml.silence@gmail.com
+Signed-off-by: Jens Axboe <axboe@kernel.dk>
+Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ io_uring/zcrx.c | 15 +++++++--------
+ 1 file changed, 7 insertions(+), 8 deletions(-)
+
+--- a/io_uring/zcrx.c
++++ b/io_uring/zcrx.c
+@@ -586,9 +586,8 @@ static void io_zcrx_return_niov_freelist
+ {
+ struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
+
+- spin_lock_bh(&area->freelist_lock);
++ guard(spinlock_bh)(&area->freelist_lock);
+ area->freelist[area->free_count++] = net_iov_idx(niov);
+- spin_unlock_bh(&area->freelist_lock);
+ }
+
+ static void io_zcrx_return_niov(struct net_iov *niov)
+@@ -1029,7 +1028,8 @@ static void io_zcrx_refill_slow(struct p
+ {
+ struct io_zcrx_area *area = ifq->area;
+
+- spin_lock_bh(&area->freelist_lock);
++ guard(spinlock_bh)(&area->freelist_lock);
++
+ while (area->free_count && pp->alloc.count < PP_ALLOC_CACHE_REFILL) {
+ struct net_iov *niov = __io_zcrx_get_free_niov(area);
+ netmem_ref netmem = net_iov_to_netmem(niov);
+@@ -1038,7 +1038,6 @@ static void io_zcrx_refill_slow(struct p
+ io_zcrx_sync_for_device(pp, niov);
+ net_mp_netmem_place_in_cache(pp, netmem);
+ }
+- spin_unlock_bh(&area->freelist_lock);
+ }
+
+ static netmem_ref io_pp_zc_alloc_netmems(struct page_pool *pp, gfp_t gfp)
+@@ -1264,10 +1263,10 @@ static struct net_iov *io_alloc_fallback
+ if (area->mem.is_dmabuf)
+ return NULL;
+
+- spin_lock_bh(&area->freelist_lock);
+- if (area->free_count)
+- niov = __io_zcrx_get_free_niov(area);
+- spin_unlock_bh(&area->freelist_lock);
++ scoped_guard(spinlock_bh, &area->freelist_lock) {
++ if (area->free_count)
++ niov = __io_zcrx_get_free_niov(area);
++ }
+
+ if (niov)
+ page_pool_fragment_netmem(net_iov_to_netmem(niov), 1);
--- /dev/null
+From 770594e78c3964cf23cf5287f849437cdde9b7d0 Mon Sep 17 00:00:00 2001
+From: Pavel Begunkov <asml.silence@gmail.com>
+Date: Tue, 21 Apr 2026 09:45:29 +0100
+Subject: io_uring/zcrx: warn on freelist violations
+
+From: Pavel Begunkov <asml.silence@gmail.com>
+
+commit 770594e78c3964cf23cf5287f849437cdde9b7d0 upstream.
+
+The freelist is appropriately sized to always be able to take a free
+niov, but let's be more defensive and check the invariant with a
+warning. That should help to catch any double-free issues.
+
+Suggested-by: Kai Aizen <kai@snailsploit.com>
+Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
+Link: https://patch.msgid.link/2f3cea363b04649755e3b6bb9ab66485a95936d5.1776760901.git.asml.silence@gmail.com
+Signed-off-by: Jens Axboe <axboe@kernel.dk>
+Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ io_uring/zcrx.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+--- a/io_uring/zcrx.c
++++ b/io_uring/zcrx.c
+@@ -587,6 +587,8 @@ static void io_zcrx_return_niov_freelist
+ struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
+
+ guard(spinlock_bh)(&area->freelist_lock);
++ if (WARN_ON_ONCE(area->free_count >= area->nia.num_niovs))
++ return;
+ area->freelist[area->free_count++] = net_iov_idx(niov);
+ }
+
--- /dev/null
+From stable+bounces-247282-greg=kroah.com@vger.kernel.org Thu May 14 21:27:11 2026
+From: Sasha Levin <sashal@kernel.org>
+Date: Thu, 14 May 2026 15:25:53 -0400
+Subject: kho: fix error handling in kho_add_subtree()
+To: stable@vger.kernel.org
+Cc: Breno Leitao <leitao@debian.org>, Pratyush Yadav <pratyush@kernel.org>, "Mike Rapoport (Microsoft)" <rppt@kernel.org>, Alexander Graf <graf@amazon.com>, Pasha Tatashin <pasha.tatashin@soleen.com>, Andrew Morton <akpm@linux-foundation.org>, Sasha Levin <sashal@kernel.org>
+Message-ID: <20260514192553.1255751-1-sashal@kernel.org>
+
+From: Breno Leitao <leitao@debian.org>
+
+[ Upstream commit 9ec95329894864170a1a7685b9a11b739393131a ]
+
+Fix two issues in kho_add_subtree(), where the error path is not
+handled correctly.
+
+1. If fdt_setprop() fails after the subnode has been created, the
+ subnode is not removed. This leaves an incomplete node in the FDT
+ (missing "preserved-data" or "blob-size" properties).
+
+2. The fdt_setprop() return value (an FDT error code) is stored
+ directly in err and returned to the caller, which expects -errno.
+
+Fix both by storing fdt_setprop() results in fdt_err, jumping to a new
+out_del_node label that removes the subnode on failure, and only setting
+err = 0 on the success path, otherwise returning -ENOMEM (instead of
+FDT_ERR_ errors that would come from fdt_setprop).
+
+No user-visible changes. This patch fixes error handling in the KHO
+(Kexec HandOver) subsystem, which is used to preserve data across kexec
+reboots. The fix only affects a rare failure path during kexec
+preparation — specifically when the kernel runs out of space in the
+Flattened Device Tree buffer while registering preserved memory regions.
+
+In the unlikely event that this error path was triggered, the old code
+would leave a malformed node in the device tree and return an incorrect
+error code to the calling subsystem, which could lead to confusing log
+messages or incorrect recovery decisions. With this fix, the incomplete
+node is properly cleaned up and the appropriate errno value is
+propagated to the calling subsystem rather than to userspace.
+
+Link: https://lore.kernel.org/20260410-kho_fix_send-v2-1-1b4debf7ee08@debian.org
+Fixes: 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers")
+Signed-off-by: Breno Leitao <leitao@debian.org>
+Suggested-by: Pratyush Yadav <pratyush@kernel.org>
+Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
+Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
+Cc: Alexander Graf <graf@amazon.com>
+Cc: Breno Leitao <leitao@debian.org>
+Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
+Cc: <stable@vger.kernel.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ kernel/liveupdate/kexec_handover.c | 13 +++++++++----
+ 1 file changed, 9 insertions(+), 4 deletions(-)
+
+--- a/kernel/liveupdate/kexec_handover.c
++++ b/kernel/liveupdate/kexec_handover.c
+@@ -757,13 +757,18 @@ int kho_add_subtree(const char *name, vo
+ goto out_pack;
+ }
+
+- err = fdt_setprop(root_fdt, off, KHO_FDT_SUB_TREE_PROP_NAME,
+- &phys, sizeof(phys));
+- if (err < 0)
+- goto out_pack;
++ fdt_err = fdt_setprop(root_fdt, off, KHO_FDT_SUB_TREE_PROP_NAME,
++ &phys, sizeof(phys));
++ if (fdt_err < 0)
++ goto out_del_node;
+
+ WARN_ON_ONCE(kho_debugfs_fdt_add(&kho_out.dbg, name, fdt, false));
+
++ err = 0;
++ goto out_pack;
++
++out_del_node:
++ fdt_del_node(root_fdt, off);
+ out_pack:
+ fdt_pack(root_fdt);
+
--- /dev/null
+From stable+bounces-247163-greg=kroah.com@vger.kernel.org Thu May 14 12:35:09 2026
+From: Lorenzo Stoakes <ljs@kernel.org>
+Date: Thu, 14 May 2026 11:33:20 +0100
+Subject: mm/vma: do not try to unmap a VMA if mmap_prepare() invoked from mmap()
+To: stable@vger.kernel.org
+Message-ID: <20260514103320.155081-1-ljs@kernel.org>
+
+From: Lorenzo Stoakes <ljs@kernel.org>
+
+[ Upstream commit 619eab23e1ce7c97e54bfc5a417306d94b3f6f13 ]
+
+The mmap_prepare hook functionality includes the ability to invoke
+mmap_prepare() from the mmap() hook of existing 'stacked' drivers, that is
+ones which are capable of calling the mmap hooks of other drivers/file
+systems (e.g. overlayfs, shm).
+
+As part of the mmap_prepare action functionality, we deal with errors by
+unmapping the VMA should one arise. This works in the usual mmap_prepare
+case, as we invoke this action at the last moment, when the VMA is
+established in the maple tree.
+
+However, the mmap() hook passes a not-fully-established, detached VMA
+pointer to the caller (this is the motivation behind the mmap_prepare()
+work).
+
+So attempting to unmap a VMA in this state will be problematic, with the
+most obvious symptom being a warning in vma_mark_detached(), because the
+VMA is already detached.
+
+It's also unnecessary - the mmap() handler will clean up the VMA on error.
+
+To fix this issue, this patch propagates whether an mmap action is
+being completed via the compatibility layer or directly.
+
+If the former, we do not attempt VMA cleanup; if the latter, we do.
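+
+Condensed from the change below, the cleanup gate in
+mmap_action_finish() becomes:
+
+  if (!err && action->success_hook)
+          err = action->success_hook(vma);
+
+  /* compat (mmap() hook) callers clean up the detached VMA themselves */
+  if (!err || is_compat)
+          return err;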
+
+This patch also updates the userland VMA tests to reflect the change.
+
+Link: https://lore.kernel.org/20260421102150.189982-1-ljs@kernel.org
+Fixes: ac0a3fc9c07d ("mm: add ability to take further action in vm_area_desc")
+Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
+Reported-by: syzbot+db390288d141a1dccf96@syzkaller.appspotmail.com
+Closes: https://lore.kernel.org/all/69e69734.050a0220.24bfd3.0027.GAE@google.com/
+Cc: David Hildenbrand <david@kernel.org>
+Cc: Jann Horn <jannh@google.com>
+Cc: Liam Howlett <liam.howlett@oracle.com>
+Cc: Michal Hocko <mhocko@suse.com>
+Cc: Mike Rapoport <rppt@kernel.org>
+Cc: Pedro Falcato <pfalcato@suse.de>
+Cc: Suren Baghdasaryan <surenb@google.com>
+Cc: <stable@vger.kernel.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ include/linux/mm.h | 2 -
+ mm/util.c | 51 +++++++++++++++++++++-----------------
+ mm/vma.c | 3 --
+ tools/testing/vma/include/dup.h | 41 ++++++++++++++----------------
+ tools/testing/vma/include/stubs.h | 3 +-
+ 5 files changed, 53 insertions(+), 47 deletions(-)
+
+--- a/include/linux/mm.h
++++ b/include/linux/mm.h
+@@ -4080,7 +4080,7 @@ static inline void mmap_action_ioremap_f
+
+ int mmap_action_prepare(struct vm_area_desc *desc);
+ int mmap_action_complete(struct vm_area_struct *vma,
+- struct mmap_action *action);
++ struct mmap_action *action, bool is_compat);
+
+ /* Look up the first VMA which exactly match the interval vm_start ... vm_end */
+ static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
+--- a/mm/util.c
++++ b/mm/util.c
+@@ -1186,7 +1186,8 @@ int compat_vma_mmap(struct file *file, s
+ return err;
+
+ set_vma_from_desc(vma, &desc);
+- err = mmap_action_complete(vma, &desc.action);
++ err = mmap_action_complete(vma, &desc.action,
++ /*is_compat=*/true);
+ if (err) {
+ const size_t len = vma_pages(vma) << PAGE_SHIFT;
+
+@@ -1277,28 +1278,31 @@ again:
+ }
+
+ static int mmap_action_finish(struct vm_area_struct *vma,
+- struct mmap_action *action, int err)
++ struct mmap_action *action, int err,
++ bool is_compat)
+ {
++ if (!err && action->success_hook)
++ err = action->success_hook(vma);
++
++ /*
++ * If this is invoked from the compatibility layer, post-mmap() hook
++ * logic will handle cleanup for us.
++ */
++ if (!err || is_compat)
++ return err;
++
+ /*
+ * If an error occurs, unmap the VMA altogether and return an error. We
+ * only clear the newly allocated VMA, since this function is only
+ * invoked if we do NOT merge, so we only clean up the VMA we created.
+ */
+- if (err) {
+- if (action->error_hook) {
+- /* We may want to filter the error. */
+- err = action->error_hook(err);
+-
+- /* The caller should not clear the error. */
+- VM_WARN_ON_ONCE(!err);
+- }
+- return err;
++ if (action->error_hook) {
++ /* We may want to filter the error. */
++ err = action->error_hook(err);
++ /* The caller should not clear the error. */
++ VM_WARN_ON_ONCE(!err);
+ }
+-
+- if (action->success_hook)
+- return action->success_hook(vma);
+-
+- return 0;
++ return err;
+ }
+
+ #ifdef CONFIG_MMU
+@@ -1329,14 +1333,16 @@ EXPORT_SYMBOL(mmap_action_prepare);
+ * mmap_action_complete - Execute VMA descriptor action.
+ * @vma: The VMA to perform the action upon.
+ * @action: The action to perform.
++ * @is_compat: Is this being invoked from the compatibility layer?
+ *
+ * Similar to mmap_action_prepare().
+ *
+- * Return: 0 on success, or error, at which point the VMA will be unmapped.
++ * Return: 0 on success, or error, at which point the VMA will be unmapped if
++ * !@is_compat.
+ */
+ int mmap_action_complete(struct vm_area_struct *vma,
+- struct mmap_action *action)
+-
++ struct mmap_action *action,
++ bool is_compat)
+ {
+ int err = 0;
+
+@@ -1353,7 +1359,7 @@ int mmap_action_complete(struct vm_area_
+ break;
+ }
+
+- return mmap_action_finish(vma, action, err);
++ return mmap_action_finish(vma, action, err, is_compat);
+ }
+ EXPORT_SYMBOL(mmap_action_complete);
+ #else
+@@ -1373,7 +1379,8 @@ int mmap_action_prepare(struct vm_area_d
+ EXPORT_SYMBOL(mmap_action_prepare);
+
+ int mmap_action_complete(struct vm_area_struct *vma,
+- struct mmap_action *action)
++ struct mmap_action *action,
++ bool is_compat)
+ {
+ int err = 0;
+
+@@ -1388,7 +1395,7 @@ int mmap_action_complete(struct vm_area_
+ break;
+ }
+
+- return mmap_action_finish(vma, action, err);
++ return mmap_action_finish(vma, action, err, is_compat);
+ }
+ EXPORT_SYMBOL(mmap_action_complete);
+ #endif
+--- a/mm/vma.c
++++ b/mm/vma.c
+@@ -2708,7 +2708,7 @@ static int call_action_complete(struct m
+ {
+ int err;
+
+- err = mmap_action_complete(vma, action);
++ err = mmap_action_complete(vma, action, /*is_compat=*/false);
+
+ /* If we held the file rmap we need to release it. */
+ if (map->hold_file_rmap_lock) {
+@@ -2778,7 +2778,6 @@ static unsigned long __mmap_region(struc
+
+ if (have_mmap_prepare && allocated_new) {
+ error = call_action_complete(&map, &desc.action, vma);
+-
+ if (error)
+ return error;
+ }
+--- a/tools/testing/vma/include/dup.h
++++ b/tools/testing/vma/include/dup.h
+@@ -1071,8 +1071,17 @@ static inline void vma_set_anonymous(str
+ static inline void set_vma_from_desc(struct vm_area_struct *vma,
+ struct vm_area_desc *desc);
+
+-static inline int __compat_vma_mmap(const struct file_operations *f_op,
+- struct file *file, struct vm_area_struct *vma)
++static inline unsigned long vma_pages(struct vm_area_struct *vma)
++{
++ return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
++}
++
++static inline int vfs_mmap_prepare(struct file *file, struct vm_area_desc *desc)
++{
++ return file->f_op->mmap_prepare(desc);
++}
++
++static inline int compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
+ {
+ struct vm_area_desc desc = {
+ .mm = vma->vm_mm,
+@@ -1082,14 +1091,14 @@ static inline int __compat_vma_mmap(cons
+
+ .pgoff = vma->vm_pgoff,
+ .vm_file = vma->vm_file,
+- .vm_flags = vma->vm_flags,
++ .vma_flags = vma->flags,
+ .page_prot = vma->vm_page_prot,
+
+ .action.type = MMAP_NOTHING, /* Default */
+ };
+ int err;
+
+- err = f_op->mmap_prepare(&desc);
++ err = vfs_mmap_prepare(file, &desc);
+ if (err)
+ return err;
+
+@@ -1098,27 +1107,22 @@ static inline int __compat_vma_mmap(cons
+ return err;
+
+ set_vma_from_desc(vma, &desc);
+- return mmap_action_complete(vma, &desc.action);
+-}
++ err = mmap_action_complete(vma, &desc.action,
++ /*is_compat=*/true);
++ if (err) {
++ const size_t len = vma_pages(vma) << PAGE_SHIFT;
+
+-static inline int compat_vma_mmap(struct file *file,
+- struct vm_area_struct *vma)
+-{
+- return __compat_vma_mmap(file->f_op, file, vma);
++ do_munmap(current->mm, vma->vm_start, len, NULL);
++ }
++ return err;
+ }
+
+-
+ static inline void vma_iter_init(struct vma_iterator *vmi,
+ struct mm_struct *mm, unsigned long addr)
+ {
+ mas_init(&vmi->mas, &mm->mm_mt, addr);
+ }
+
+-static inline unsigned long vma_pages(struct vm_area_struct *vma)
+-{
+- return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+-}
+-
+ static inline void mmap_assert_locked(struct mm_struct *);
+ static inline struct vm_area_struct *find_vma_intersection(struct mm_struct *mm,
+ unsigned long start_addr,
+@@ -1309,11 +1313,6 @@ static inline int vfs_mmap(struct file *
+ return file->f_op->mmap(file, vma);
+ }
+
+-static inline int vfs_mmap_prepare(struct file *file, struct vm_area_desc *desc)
+-{
+- return file->f_op->mmap_prepare(desc);
+-}
+-
+ static inline void vma_set_file(struct vm_area_struct *vma, struct file *file)
+ {
+ /* Changing an anonymous vma with this is illegal */
+--- a/tools/testing/vma/include/stubs.h
++++ b/tools/testing/vma/include/stubs.h
+@@ -87,7 +87,8 @@ static inline int mmap_action_prepare(st
+ }
+
+ static inline int mmap_action_complete(struct vm_area_struct *vma,
+- struct mmap_action *action)
++ struct mmap_action *action,
++ bool is_compat)
+ {
+ return 0;
+ }
--- /dev/null
+From stable+bounces-247054-greg=kroah.com@vger.kernel.org Thu May 14 01:47:02 2026
+From: Florian Fainelli <florian.fainelli@broadcom.com>
+Date: Wed, 13 May 2026 16:46:38 -0700
+Subject: perf build: fix "argument list too long" in second location
+To: stable@vger.kernel.org
+Cc: Markus Mayer <mmayer@broadcom.com>, James Clark <james.clark@linaro.org>, Namhyung Kim <namhyung@kernel.org>, Florian Fainelli <florian.fainelli@broadcom.com>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, Arnaldo Carvalho de Melo <acme@kernel.org>, Mark Rutland <mark.rutland@arm.com>, Alexander Shishkin <alexander.shishkin@linux.intel.com>, Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>, Adrian Hunter <adrian.hunter@intel.com>, linux-perf-users@vger.kernel.org (open list:PERFORMANCE EVENTS SUBSYSTEM), linux-kernel@vger.kernel.org (open list:PERFORMANCE EVENTS SUBSYSTEM)
+Message-ID: <20260513234639.128528-1-florian.fainelli@broadcom.com>
+
+From: Markus Mayer <mmayer@broadcom.com>
+
+commit 97ab89686a9e5d087042dbe73604a32b3de72653 upstream
+
+Turns out that displaying "RM $^" via quiet_cmd_rm can also upset the
+shell and cause it to display "argument list too long".
+
+Trying to quote $^ doesn't help.
+
+In the end, *not* displaying the (potentially long) list of files is
+probably the right thing to do for a "quiet" message, anyway. Instead,
+let's display a count of how many files were removed. There is always
+V=1 if more detail is required.
+
+ TEST linux/tools/perf/pmu-events/metric_test.log
+ RM ...634 orphan file(s)...
+ LD linux/tools/perf/util/perf-util-in.o
+
+Also move the comment regarding xargs before the rule, so it doesn't
+show up in the build output.
+
+Signed-off-by: Markus Mayer <mmayer@broadcom.com>
+Reviewed-by: James Clark <james.clark@linaro.org>
+Signed-off-by: Namhyung Kim <namhyung@kernel.org>
+Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ tools/perf/pmu-events/Build | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/tools/perf/pmu-events/Build
++++ b/tools/perf/pmu-events/Build
+@@ -211,10 +211,10 @@ ifneq ($(strip $(ORPHAN_FILES)),)
+
+ # Message for $(call echo-cmd,rm). Generally cleaning files isn't part
+ # of a build step.
+-quiet_cmd_rm = RM $^
++quiet_cmd_rm = RM ...$(words $^) orphan file(s)...
+
++# The list of files can be long. Use xargs to prevent issues.
+ prune_orphans: $(ORPHAN_FILES)
+- # The list of files can be long. Use xargs to prevent issues.
+ $(Q)$(call echo-cmd,rm)echo "$^" | xargs rm -f
+
+ JEVENTS_DEPS += prune_orphans
--- /dev/null
+From arighi@nvidia.com Wed May 13 15:01:26 2026
+From: Andrea Righi <arighi@nvidia.com>
+Date: Wed, 13 May 2026 15:01:11 +0200
+Subject: sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()
+To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>, Changwoo Min <changwoo@igalia.com>
+Cc: Chris Mason <clm@meta.com>, Peter Schneider <pschneider1968@googlemail.com>, sched-ext@lists.linux.dev, stable@vger.kernel.org, linux-kernel@vger.kernel.org
+Message-ID: <20260513130111.689740-1-arighi@nvidia.com>
+
+From: Tejun Heo <tj@kernel.org>
+
+commit da2d81b4118a74e65d2335e221a38d665902a98c upstream.
+
+bypass_lb_cpu() transfers tasks between per-CPU bypass DSQs without
+migrating them - task_cpu() only updates when the donee later consumes the
+task via move_remote_task_to_local_dsq(). If the LB timer fires again before
+consumption and the new DSQ becomes a donor, @p is still on the previous CPU
+and task_rq(@p) != donor_rq. @p can't be moved without its own rq locked.
+
+Skip such tasks.
+
+Fixes: 95d1df610cdc ("sched_ext: Implement load balancer for bypass mode")
+Cc: stable@vger.kernel.org # v6.19+
+Reported-by: Chris Mason <clm@meta.com>
+Signed-off-by: Tejun Heo <tj@kernel.org>
+Reviewed-by: Andrea Righi <arighi@nvidia.com>
+[ arighi: replace donor_rq with rq, not present in v7.0.y ]
+Signed-off-by: Andrea Righi <arighi@nvidia.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ kernel/sched/ext.c | 9 +++++++++
+ 1 file changed, 9 insertions(+)
+
+--- a/kernel/sched/ext.c
++++ b/kernel/sched/ext.c
+@@ -4010,6 +4010,15 @@ resume:
+ if (cpumask_empty(donee_mask))
+ break;
+
++ /*
++ * If an earlier pass placed @p on @donor_dsq from a different
++ * CPU and the donee hasn't consumed it yet, @p is still on the
++ * previous CPU and task_rq(@p) != @rq. @p can't be moved
++ * without its rq locked. Skip.
++ */
++ if (task_rq(p) != rq)
++ continue;
++
+ donee = cpumask_any_and_distribute(donee_mask, p->cpus_ptr);
+ if (donee >= nr_cpu_ids)
+ continue;
batman-adv-bla-only-purge-non-released-claims.patch
batman-adv-bla-put-backbone-reference-on-failed-claim-hash-insert.patch
sched_ext-use-hk_type_domain_boot-to-detect-isolcpus-domain-isolation.patch
+usb-typec-tcpm-reset-internal-port-states-on-soft-reset-ams.patch
+io_uring-zcrx-use-guards-for-locking.patch
+io_uring-zcrx-warn-on-freelist-violations.patch
+kho-fix-error-handling-in-kho_add_subtree.patch
+edac-versalnet-refactor-memory-controller-initialization-and-cleanup.patch
+edac-versalnet-fix-device-name-memory-leak.patch
+spi-uniphier-simplify-clock-handling-with-devm_clk_get_enabled.patch
+spi-uniphier-fix-controller-deregistration.patch
+cgroup-increment-nr_dying_subsys_-from-rmdir-context.patch
+cgroup-defer-css-percpu_ref-kill-on-rmdir-until-cgroup-is-depopulated.patch
+sched_ext-skip-tasks-with-stale-task_rq-in-bypass_lb_cpu.patch
+perf-build-fix-argument-list-too-long-in-second-location.patch
+mm-vma-do-not-try-to-unmap-a-vma-if-mmap_prepare-invoked-from-mmap.patch
--- /dev/null
+From stable+bounces-247104-greg=kroah.com@vger.kernel.org Thu May 14 06:36:57 2026
+From: Sasha Levin <sashal@kernel.org>
+Date: Thu, 14 May 2026 00:36:31 -0400
+Subject: spi: uniphier: fix controller deregistration
+To: stable@vger.kernel.org
+Cc: Johan Hovold <johan@kernel.org>, Keiji Hayashibara <hayashibara.keiji@socionext.com>, Mark Brown <broonie@kernel.org>, Sasha Levin <sashal@kernel.org>
+Message-ID: <20260514043631.10946-2-sashal@kernel.org>
+
+From: Johan Hovold <johan@kernel.org>
+
+[ Upstream commit 0245435f777264ac45945ed2f325dd095a41d1af ]
+
+Make sure to deregister the controller before releasing underlying
+resources like DMA during driver unbind.
+
+Note that clocks were also disabled before the recent commit
+fdca270f8f87 ("spi: uniphier: Simplify clock handling with
+devm_clk_get_enabled()").
+
+Fixes: 5ba155a4d4cc ("spi: add SPI controller driver for UniPhier SoC")
+Cc: stable@vger.kernel.org # 4.19
+Cc: Keiji Hayashibara <hayashibara.keiji@socionext.com>
+Signed-off-by: Johan Hovold <johan@kernel.org>
+Link: https://patch.msgid.link/20260410081757.503099-25-johan@kernel.org
+Signed-off-by: Mark Brown <broonie@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/spi/spi-uniphier.c | 8 +++++++-
+ 1 file changed, 7 insertions(+), 1 deletion(-)
+
+--- a/drivers/spi/spi-uniphier.c
++++ b/drivers/spi/spi-uniphier.c
+@@ -746,7 +746,7 @@ static int uniphier_spi_probe(struct pla
+
+ host->max_dma_len = min(dma_tx_burst, dma_rx_burst);
+
+- ret = devm_spi_register_controller(&pdev->dev, host);
++ ret = spi_register_controller(host);
+ if (ret)
+ goto out_release_dma;
+
+@@ -771,10 +771,16 @@ static void uniphier_spi_remove(struct p
+ {
+ struct spi_controller *host = platform_get_drvdata(pdev);
+
++ spi_controller_get(host);
++
++ spi_unregister_controller(host);
++
+ if (host->dma_tx)
+ dma_release_channel(host->dma_tx);
+ if (host->dma_rx)
+ dma_release_channel(host->dma_rx);
++
++ spi_controller_put(host);
+ }
+
+ static const struct of_device_id uniphier_spi_match[] = {
--- /dev/null
+From stable+bounces-247103-greg=kroah.com@vger.kernel.org Thu May 14 06:36:56 2026
+From: Sasha Levin <sashal@kernel.org>
+Date: Thu, 14 May 2026 00:36:30 -0400
+Subject: spi: uniphier: Simplify clock handling with devm_clk_get_enabled()
+To: stable@vger.kernel.org
+Cc: Pei Xiao <xiaopei01@kylinos.cn>, Kunihiko Hayashi <hayashi.kunihiko@socionext.com>, Mark Brown <broonie@kernel.org>, Sasha Levin <sashal@kernel.org>
+Message-ID: <20260514043631.10946-1-sashal@kernel.org>
+
+From: Pei Xiao <xiaopei01@kylinos.cn>
+
+[ Upstream commit fdca270f8f87cae2eb5b619234b9dd11a863ce6b ]
+
+Replace devm_clk_get() followed by clk_prepare_enable() with
+devm_clk_get_enabled() for the clock. This removes the need for
+explicit clock enable and disable calls, as the managed API automatically
+handles clock disabling on device removal or probe failure.
+
+Remove the now-unnecessary clk_disable_unprepare() calls from the probe
+error path and the remove callback. Adjust error labels accordingly.
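+
+Condensed from the hunk below, the managed pattern is:
+
+  priv->clk = devm_clk_get_enabled(&pdev->dev, NULL);  /* get + prepare_enable */
+  if (IS_ERR(priv->clk)) {
+          dev_err(&pdev->dev, "failed to get clock\n");
+          ret = PTR_ERR(priv->clk);
+          goto out_host_put;
+  }
+  /* no clk_disable_unprepare() needed on any exit path afterwards */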
+
+Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn>
+Reviewed-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
+Link: https://patch.msgid.link/b2deeefd4ef1a4bce71116aabfcb7e81400f6d37.1775546948.git.xiaopei01@kylinos.cn
+Signed-off-by: Mark Brown <broonie@kernel.org>
+Stable-dep-of: 0245435f7772 ("spi: uniphier: fix controller deregistration")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/spi/spi-uniphier.c | 18 ++++--------------
+ 1 file changed, 4 insertions(+), 14 deletions(-)
+
+--- a/drivers/spi/spi-uniphier.c
++++ b/drivers/spi/spi-uniphier.c
+@@ -666,28 +666,24 @@ static int uniphier_spi_probe(struct pla
+ }
+ priv->base_dma_addr = res->start;
+
+- priv->clk = devm_clk_get(&pdev->dev, NULL);
++ priv->clk = devm_clk_get_enabled(&pdev->dev, NULL);
+ if (IS_ERR(priv->clk)) {
+ dev_err(&pdev->dev, "failed to get clock\n");
+ ret = PTR_ERR(priv->clk);
+ goto out_host_put;
+ }
+
+- ret = clk_prepare_enable(priv->clk);
+- if (ret)
+- goto out_host_put;
+-
+ irq = platform_get_irq(pdev, 0);
+ if (irq < 0) {
+ ret = irq;
+- goto out_disable_clk;
++ goto out_host_put;
+ }
+
+ ret = devm_request_irq(&pdev->dev, irq, uniphier_spi_handler,
+ 0, "uniphier-spi", priv);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to request IRQ\n");
+- goto out_disable_clk;
++ goto out_host_put;
+ }
+
+ init_completion(&priv->xfer_done);
+@@ -716,7 +712,7 @@ static int uniphier_spi_probe(struct pla
+ if (IS_ERR_OR_NULL(host->dma_tx)) {
+ if (PTR_ERR(host->dma_tx) == -EPROBE_DEFER) {
+ ret = -EPROBE_DEFER;
+- goto out_disable_clk;
++ goto out_host_put;
+ }
+ host->dma_tx = NULL;
+ dma_tx_burst = INT_MAX;
+@@ -766,9 +762,6 @@ out_release_dma:
+ host->dma_tx = NULL;
+ }
+
+-out_disable_clk:
+- clk_disable_unprepare(priv->clk);
+-
+ out_host_put:
+ spi_controller_put(host);
+ return ret;
+@@ -777,14 +770,11 @@ out_host_put:
+ static void uniphier_spi_remove(struct platform_device *pdev)
+ {
+ struct spi_controller *host = platform_get_drvdata(pdev);
+- struct uniphier_spi_priv *priv = spi_controller_get_devdata(host);
+
+ if (host->dma_tx)
+ dma_release_channel(host->dma_tx);
+ if (host->dma_rx)
+ dma_release_channel(host->dma_rx);
+-
+- clk_disable_unprepare(priv->clk);
+ }
+
+ static const struct of_device_id uniphier_spi_match[] = {
--- /dev/null
+From 2909f0d4994fb4306bf116df5ccee797791fce2c Mon Sep 17 00:00:00 2001
+From: Amit Sunil Dhamne <amitsd@google.com>
+Date: Tue, 14 Apr 2026 00:58:32 +0000
+Subject: usb: typec: tcpm: reset internal port states on soft reset AMS
+
+From: Amit Sunil Dhamne <amitsd@google.com>
+
+commit 2909f0d4994fb4306bf116df5ccee797791fce2c upstream.
+
+Reset internal port states (such as vdm_sm_running and
+explicit_contract) on soft reset AMS as the port needs to negotiate a
+new contract. The consequences of leaving the states as-is are as
+follows:
+ * port is in SRC power role and an explicit contract is negotiated
+ with the port partner (in sink role)
+ * port partner sends a Soft Reset AMS while VDM State Machine is
+ running
+ * port accepts the Soft Reset request and the port advertises src caps
+ * port partner sends a Request message but since the explicit_contract
+ and vdm_sm_running are true from previous negotiation, the port ends
+ up sending Soft Reset instead of Accept msg.
+
+Stub Log:
+[ 203.653942] AMS DISCOVER_IDENTITY start
+[ 203.653947] PD TX, header: 0x176f
+[ 203.655901] PD TX complete, status: 0
+[ 203.657470] PD RX, header: 0x124f [1]
+[ 203.657477] Rx VDM cmd 0xff008081 type 2 cmd 1 len 1
+[ 203.657482] AMS DISCOVER_IDENTITY finished
+[ 203.657484] cc:=4
+[ 204.155698] PD RX, header: 0x144f [1]
+[ 204.155718] Rx VDM cmd 0xeeee8001 type 0 cmd 1 len 1
+[ 204.155741] PD TX, header: 0x196f
+[ 204.157622] PD TX complete, status: 0
+[ 204.160060] PD RX, header: 0x4d [1]
+[ 204.160066] state change SRC_READY -> SOFT_RESET [rev2 SOFT_RESET_AMS]
+[ 204.160076] PD TX, header: 0x163
+[ 204.162486] PD TX complete, status: 0
+[ 204.162832] AMS SOFT_RESET_AMS finished
+[ 204.162840] cc:=4
+[ 204.162891] AMS POWER_NEGOTIATION start
+[ 204.162896] state change SOFT_RESET -> AMS_START [rev2 POWER_NEGOTIATION]
+[ 204.162908] state change AMS_START -> SRC_SEND_CAPABILITIES [rev2 POWER_NEGOTIATION]
+[ 204.162913] PD TX, header: 0x1361
+[ 204.165529] PD TX complete, status: 0
+[ 204.165571] pending state change SRC_SEND_CAPABILITIES -> SRC_SEND_CAPABILITIES_TIMEOUT @ 60 ms [rev2 POWER_NEGOTIATION]
+[ 204.166996] PD RX, header: 0x1242 [1]
+[ 204.167009] state change SRC_SEND_CAPABILITIES -> SRC_SOFT_RESET_WAIT_SNK_TX [rev2 POWER_NEGOTIATION]
+[ 204.167019] AMS POWER_NEGOTIATION finished
+[ 204.167020] cc:=4
+[ 204.167083] AMS SOFT_RESET_AMS start
+[ 204.167086] state change SRC_SOFT_RESET_WAIT_SNK_TX -> SOFT_RESET_SEND [rev2 SOFT_RESET_AMS]
+[ 204.167092] PD TX, header: 0x16d
+[ 204.168824] PD TX complete, status: 0
+[ 204.168854] pending state change SOFT_RESET_SEND -> HARD_RESET_SEND @ 60 ms [rev2 SOFT_RESET_AMS]
+[ 204.171876] PD RX, header: 0x43 [1]
+[ 204.171879] AMS SOFT_RESET_AMS finished
+
+This causes COMMON.PROC.PD.11.2 check failure for
+TEST.PD.VDM.SRC.2_Rev2Src test on the PD compliance tester.
+
+Signed-off-by: Amit Sunil Dhamne <amitsd@google.com>
+Fixes: 8d3a0578ad1a ("usb: typec: tcpm: Respond Wait if VDM state machine is running")
+Fixes: f0690a25a140 ("staging: typec: USB Type-C Port Manager (tcpm)")
+Cc: stable <stable@kernel.org>
+Reviewed-by: Badhri Jagan Sridharan <badhri@google.com>
+Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
+Link: https://patch.msgid.link/20260414-fix-soft-reset-v1-1-01d7cb9764e2@google.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/usb/typec/tcpm/tcpm.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+--- a/drivers/usb/typec/tcpm/tcpm.c
++++ b/drivers/usb/typec/tcpm/tcpm.c
+@@ -5539,6 +5539,8 @@ static void run_state_machine(struct tcp
+ usb_power_delivery_unregister_capabilities(port->partner_source_caps);
+ port->partner_source_caps = NULL;
+ tcpm_pd_send_control(port, PD_CTRL_ACCEPT, TCPC_TX_SOP);
++ port->vdm_sm_running = false;
++ port->explicit_contract = false;
+ tcpm_ams_finish(port);
+ if (port->pwr_role == TYPEC_SOURCE) {
+ port->upcoming_state = SRC_SEND_CAPABILITIES;