From: John Stultz
Date: Tue, 29 Apr 2025 15:07:26 +0000 (-0700)
Subject: sched/core: Tweak wait_task_inactive() to force dequeue sched_delayed tasks
X-Git-Tag: v6.16-rc1~197^2~4
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=b7ca5743a2604156d6083b88cefacef983f3a3a6;p=thirdparty%2Fkernel%2Flinux.git

sched/core: Tweak wait_task_inactive() to force dequeue sched_delayed tasks

It was reported that in 6.12, smpboot_create_threads() was
taking much longer than in 6.6.

I narrowed down the call path to:
 smpboot_create_threads()
 -> kthread_create_on_cpu()
  -> kthread_bind()
   -> __kthread_bind_mask()
    -> wait_task_inactive()

In wait_task_inactive() we were regularly hitting the queued
case, which sets a 1 tick timeout. When the function is called
multiple times in a row, those timeouts quickly accumulate into
a long delay.

I noticed that disabling the DELAY_DEQUEUE sched feature recovered
the performance, and it seems the newly created tasks are usually
sched_delayed and left on the runqueue.

So in wait_task_inactive(), when we see that the task has
p->se.sched_delayed set, manually dequeue it with DEQUEUE_DELAYED,
so we don't have to constantly wait a tick.

Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
Reported-by: peter-yc.chang@mediatek.com
Signed-off-by: John Stultz
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: K Prateek Nayak
Link: https://lkml.kernel.org/r/20250429150736.3778580-1-jstultz@google.com
---

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 79692f85643fe..a3507ed58424f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2283,6 +2283,12 @@ unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state
 		 * just go back and repeat.
 		 */
 		rq = task_rq_lock(p, &rf);
+		/*
+		 * If task is sched_delayed, force dequeue it, to avoid always
+		 * hitting the tick timeout in the queued case
+		 */
+		if (p->se.sched_delayed)
+			dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
 		trace_sched_wait_task(p);
 		running = task_on_cpu(rq, p);
 		queued = task_on_rq_queued(p);
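
The accumulation described in the commit message can be illustrated with a rough userspace model. This is only a sketch, not kernel code: it assumes every wait_task_inactive() call hits the queued case exactly once and sleeps for one full tick. The helper names (model_tick_ns, model_total_wait_ns), the HZ values, and the call counts are hypothetical and chosen purely for illustration; they are not measurements from the report.

```c
#include <assert.h>
#include <stdint.h>

/* Nanoseconds per second (same value as the kernel's NSEC_PER_SEC). */
#define MODEL_NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: length of one scheduler tick in nanoseconds
 * for a given HZ configuration.
 */
static uint64_t model_tick_ns(unsigned int hz)
{
	assert(hz != 0);
	return MODEL_NSEC_PER_SEC / hz;
}

/*
 * Hypothetical helper: total time slept if each of `calls` back-to-back
 * wait_task_inactive() calls hits the queued case once and therefore
 * waits one tick (an assumption of this model).
 */
static uint64_t model_total_wait_ns(unsigned int calls, unsigned int hz)
{
	return (uint64_t)calls * model_tick_ns(hz);
}
```

Under these assumptions, 64 calls at HZ=250 would sleep 64 x 4 ms = 256 ms in total, which is the kind of delay the forced dequeue is meant to avoid.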