git.ipfire.org Git - thirdparty/kernel/stable.git/commit
cpuset: Defer flushing of the cpuset_migrate_mm_wq to task_work
author:    Chuyi Zhou <zhouchuyi@bytedance.com>
           Thu, 4 Sep 2025 07:45:04 +0000 (15:45 +0800)
committer: Tejun Heo <tj@kernel.org>
           Thu, 4 Sep 2025 17:22:38 +0000 (07:22 -1000)
commit:    3514309e03222c0ad06cd3fda0f0d2c98e786bf8
tree:      40654dd9c801dbce6797a090247aaebe7c44bb41
parent:    c0fb16ef887d364766d03574ec824509939cf9cc

Currently, cpuset_attach() synchronously waits for flush_workqueue() to
complete. The time needed to flush cpuset_migrate_mm_wq depends on how
much mm migration cpusets have initiated at that moment. Modifying
cpuset.mems of a cgroup that occupies a large amount of memory may
trigger extensive mm migration, causing cpuset_attach() to block in
flush_workqueue() for an extended period. This can be dangerous because
cpuset_attach() runs inside the cgroup_mutex critical section, so a long
flush can ultimately block every cgroup-related operation in the system.

This patch defers the flush_workqueue() call until the task returns to
userspace, using task_work as originally proposed by Tejun [1], so that
the flush happens after cgroup_mutex is dropped. The task that initiated
the attach still waits for the flush before returning to userspace, which
preserves the synchronous behavior, but other cgroup operations are no
longer blocked behind it.

[1]: https://lore.kernel.org/cgroups/ZgMFPMjZRZCsq9Q-@slm.duckdns.org/T/#m117f606fa24f66f0823a60f211b36f24bd9e1883

Originally-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
kernel/cgroup/cpuset.c