The access rule for local_cpu_mask_dl requires it to be called on the
local CPU with preemption disabled. However, dl_add_task_root_domain()
currently violates this rule.
Without preemption disabled, the following race can occur:
1. ThreadA calls dl_add_task_root_domain() on CPU 0
2. Gets pointer to CPU 0's local_cpu_mask_dl
3. ThreadA is preempted and migrated to CPU 1
4. ThreadA continues using CPU 0's local_cpu_mask_dl
5. Meanwhile, the scheduler on CPU 0 calls find_later_rq() which also
uses local_cpu_mask_dl (with preemption properly disabled)
6. Both contexts now corrupt the same per-CPU buffer concurrently
Fix this by moving the local_cpu_mask_dl access to the preemption
disabled section.
Closes: https://lore.kernel.org/lkml/aSBjm3mN_uIy64nz@jlelli-thinkpadt14gen4.remote.csb
Fixes: 318e18ed22e8 ("sched/deadline: Walk up cpuset hierarchy to decide root domain when hot-unplug")
Reported-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://patch.msgid.link/20251125032630.8746-3-piliu@redhat.com
struct rq *rq;
struct dl_bw *dl_b;
unsigned int cpu;
- struct cpumask *msk = this_cpu_cpumask_var_ptr(local_cpu_mask_dl);
+ struct cpumask *msk;
raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
if (!dl_task(p) || dl_entity_is_special(&p->dl)) {
return;
}
+ msk = this_cpu_cpumask_var_ptr(local_cpu_mask_dl);
dl_get_task_effective_cpus(p, msk);
cpu = cpumask_first_and(cpu_active_mask, msk);
BUG_ON(cpu >= nr_cpu_ids);