From: Shubhang Kaushik
Date: Wed, 21 Jan 2026 09:31:53 +0000 (-0800)
Subject: sched: Update rq->avg_idle when a task is moved to an idle CPU
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=4b603f1551a73e2868b9e7a14b3938c23275cefb;p=thirdparty%2Fkernel%2Flinux.git

sched: Update rq->avg_idle when a task is moved to an idle CPU

Currently, rq->idle_stamp is only used to calculate avg_idle during
wakeups. This means other paths that move a task to an idle CPU such as
fork/clone, execve, or migrations, do not end the CPU's idle status in
the scheduler's eyes, leading to an inaccurate avg_idle.

This patch introduces update_rq_avg_idle() to provide a more accurate
measurement of CPU idle duration. By invoking this helper in
put_prev_task_idle(), we ensure avg_idle is updated whenever a CPU
stops being idle, regardless of how the new task arrived.

Testing on an 80-core Ampere Altra (ARMv8) with 6.19-rc5 baseline:
 - Hackbench: +7.2% performance gain at 16 threads.
 - Schbench: Reduced p99.9 tail latencies at high concurrency.
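
For readers who want to see the arithmetic, the following is a minimal userspace sketch of what update_rq_avg_idle() computes. It is not kernel code: struct rq_model, update_avg_model() and update_rq_avg_idle_model() are hypothetical names introduced here, the clock is passed in as a plain field instead of rq_clock(), and update_avg() is assumed to be a 1/8-weight moving average (check kernel/sched/ in your tree before relying on that).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rq, reduced to the fields the helper touches. */
struct rq_model {
	uint64_t clock;                 /* stands in for rq_clock(rq), in ns */
	uint64_t idle_stamp;            /* time at which the CPU went idle */
	uint64_t avg_idle;              /* running average of idle durations */
	uint64_t max_idle_balance_cost; /* avg_idle is capped at twice this */
};

/* Assumed shape of update_avg(): move 1/8 of the way toward the new sample. */
static void update_avg_model(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);

	*avg += diff / 8;
}

/* Mirrors the body of update_rq_avg_idle() from the patch, on the model struct. */
static void update_rq_avg_idle_model(struct rq_model *rq)
{
	uint64_t delta, max;

	/* Sanity check for the model only: the clock must not run backwards. */
	assert(rq->clock >= rq->idle_stamp);

	delta = rq->clock - rq->idle_stamp;
	max = 2 * rq->max_idle_balance_cost;

	update_avg_model(&rq->avg_idle, delta);

	if (rq->avg_idle > max)
		rq->avg_idle = max;
	rq->idle_stamp = 0;
}
```

Under these assumptions, an avg_idle of 1,000,000 ns followed by an idle period of 200,000 ns moves avg_idle to 900,000 ns, and a subsequent very long idle period is clamped to 2 * max_idle_balance_cost.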

Signed-off-by: Shubhang Kaushik
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Vincent Guittot
Tested-by: Shubhang Kaushik
Link: https://patch.msgid.link/20260121-v8-patch-series-v8-1-b7f1cbee5055@os.amperecomputing.com
---

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3cca012d1259c..c5431afe23b05 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3613,6 +3613,18 @@ static inline void ttwu_do_wakeup(struct task_struct *p)
 	trace_sched_wakeup(p);
 }
 
+void update_rq_avg_idle(struct rq *rq)
+{
+	u64 delta = rq_clock(rq) - rq->idle_stamp;
+	u64 max = 2*rq->max_idle_balance_cost;
+
+	update_avg(&rq->avg_idle, delta);
+
+	if (rq->avg_idle > max)
+		rq->avg_idle = max;
+	rq->idle_stamp = 0;
+}
+
 static void
 ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
 		 struct rq_flags *rf)
@@ -3648,18 +3660,6 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
 		p->sched_class->task_woken(rq, p);
 		rq_repin_lock(rq, rf);
 	}
-
-	if (rq->idle_stamp) {
-		u64 delta = rq_clock(rq) - rq->idle_stamp;
-		u64 max = 2*rq->max_idle_balance_cost;
-
-		update_avg(&rq->avg_idle, delta);
-
-		if (rq->avg_idle > max)
-			rq->avg_idle = max;
-
-		rq->idle_stamp = 0;
-	}
 }
 
 /*
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 65eb8f8c1a5d3..aba5ad53c07d0 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -460,6 +460,7 @@ static void put_prev_task_idle(struct rq *rq, struct task_struct *prev, struct t
 {
 	update_curr_idle(rq);
 	scx_update_idle(rq, false, true);
+	update_rq_avg_idle(rq);
 }
 
 static void set_next_task_idle(struct rq *rq, struct task_struct *next, bool first)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 58c9d244f12b0..127633b1377b5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1670,6 +1670,7 @@ static inline struct cfs_rq *group_cfs_rq(struct sched_entity *grp)
 
 #endif /* !CONFIG_FAIR_GROUP_SCHED */
 
+extern void update_rq_avg_idle(struct rq *rq);
 extern void update_rq_clock(struct rq *rq);
 
 /*