]> git.ipfire.org Git - thirdparty/kernel/linux.git/commit
cgroup: Wait for dying tasks to leave on rmdir
authorTejun Heo <tj@kernel.org>
Tue, 24 Mar 2026 20:21:25 +0000 (10:21 -1000)
committerTejun Heo <tj@kernel.org>
Tue, 24 Mar 2026 20:21:40 +0000 (10:21 -1000)
commit1b164b876c36c3eb5561dd9b37702b04401b0166
treef691f34f1be6889dc67c6c73ff47189ffa9c85ff
parenta72f73c4dd9b209c53cf8b03b6e97fcefad4262c
cgroup: Wait for dying tasks to leave on rmdir

a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup") hid PF_EXITING
tasks from cgroup.procs so that systemd doesn't see tasks that have already
been reaped via waitpid(). However, the populated counter (nr_populated_csets)
is only decremented when the task later passes through cgroup_task_dead() in
finish_task_switch(). This means cgroup.procs can appear empty while the
cgroup is still populated, causing rmdir to fail with -EBUSY.

Fix this by making cgroup_rmdir() wait for dying tasks to fully leave. If the
cgroup is populated but all remaining tasks have PF_EXITING set (the task
iterator returns none due to the existing filter), wait for a kick from
cgroup_task_dead() and retry. The wait is brief as tasks are removed from the
cgroup's css_set between PF_EXITING assertion in do_exit() and
cgroup_task_dead() in finish_task_switch().

v2: cgroup_is_populated() true to false transition happens under css_set_lock
    not cgroup_mutex, so retest under css_set_lock before sleeping to avoid
    missed wakeups (Sebastian).

Fixes: a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202603222104.2c81684e-lkp@intel.com
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Michal Koutny <mkoutny@suse.com>
Cc: cgroups@vger.kernel.org
include/linux/cgroup-defs.h
kernel/cgroup/cgroup.c