git.ipfire.org Git - thirdparty/kernel/linux.git/commit
sched_ext: Fix scx_sched_lock / rq lock ordering
author Tejun Heo <tj@kernel.org>
Tue, 10 Mar 2026 17:12:21 +0000 (07:12 -1000)
committer Tejun Heo <tj@kernel.org>
Tue, 10 Mar 2026 17:12:21 +0000 (07:12 -1000)
commit 6b36c4c2935c54d6a103389fad2a2a9d25591501
tree 610da0ce3d1c32e1b0a8eeb9362e65abe34b5645
parent f4a6c506d11823e7123bc6573fbd8e432245acf4

There are two sites that nest rq lock inside scx_sched_lock:

- scx_bypass() takes scx_sched_lock then rq lock per CPU to propagate
  per-cpu bypass flags and re-enqueue tasks.

- sysrq_handle_sched_ext_dump() takes scx_sched_lock to iterate all
  scheds; scx_dump_state() then takes rq lock per CPU for the dump.

And scx_claim_exit() takes scx_sched_lock to propagate exits to
descendants. It can be reached from scx_tick(), BPF kfuncs, and many
other paths with rq lock already held, creating the reverse ordering:

  rq lock -> scx_sched_lock vs. scx_sched_lock -> rq lock
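
The inversion above can be illustrated with a minimal userspace sketch. It
is not kernel code and not how lockdep is implemented: the enum values and
record_nesting() are hypothetical names, and the check only catches a direct
two-lock inversion rather than longer dependency cycles.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two kernel locks named above. */
enum lock_id { RQ_LOCK, SCX_SCHED_LOCK, NR_LOCKS };

/* seen[a][b] is true once some path has taken b while holding a. */
static bool seen[NR_LOCKS][NR_LOCKS];

/*
 * Record that @inner was taken while @outer was held.
 * Returns false if the reverse nesting was recorded earlier, which is
 * the ABBA pattern that can deadlock two CPUs against each other.
 */
static bool record_nesting(enum lock_id outer, enum lock_id inner)
{
	if (seen[inner][outer])
		return false;
	seen[outer][inner] = true;
	return true;
}
```

Recording scx_sched_lock -> rq lock (the old bypass and dump paths) succeeds,
and a later attempt to record rq lock -> scx_sched_lock (the exit path) is
rejected as an inversion.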

Fix by flipping scx_bypass() to take rq lock first, and dropping
scx_sched_lock from sysrq_handle_sched_ext_dump() as scx_sched_all is
already RCU-traversable and scx_dump_lock now prevents dumping a dead
sched. This makes rq lock -> scx_sched_lock the consistent ordering.
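
As a rough userspace analogy of the resulting ordering, the sketch below
uses pthread mutexes in place of the kernel's rq lock and scx_sched_lock.
All names (rq_lock, sched_lock, set_bypass_all(), claim_exit_locked(),
tick()) are illustrative assumptions; the real code in kernel/sched/ext.c
uses kernel locking primitives and does considerably more work.

```c
#include <pthread.h>

#define NR_CPUS 4

/* Stand-ins for the per-CPU rq locks and the global sched lock. */
static pthread_mutex_t rq_lock[NR_CPUS] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};
static pthread_mutex_t sched_lock = PTHREAD_MUTEX_INITIALIZER;

static int bypass_flag[NR_CPUS];
static int exits_claimed;

/* Bypass-like path after the fix: rq lock is the outer lock per CPU. */
static void set_bypass_all(int on)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		pthread_mutex_lock(&rq_lock[cpu]);
		pthread_mutex_lock(&sched_lock);
		bypass_flag[cpu] = on;
		pthread_mutex_unlock(&sched_lock);
		pthread_mutex_unlock(&rq_lock[cpu]);
	}
}

/* Exit-like path: the caller already holds an rq lock. */
static void claim_exit_locked(void)
{
	pthread_mutex_lock(&sched_lock);
	exits_claimed++;
	pthread_mutex_unlock(&sched_lock);
}

/* Tick-like caller: takes the rq lock, then reaches the exit path. */
static void tick(int cpu)
{
	pthread_mutex_lock(&rq_lock[cpu]);
	claim_exit_locked();
	pthread_mutex_unlock(&rq_lock[cpu]);
}
```

Both paths now acquire the locks in the same order, so running
set_bypass_all() on one thread and tick() on another cannot produce the
ABBA deadlock described earlier.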

Reported-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Link: https://lore.kernel.org/r/20260309163025.2240221-1-yphbchou0911@gmail.com
Fixes: ebeca1f930ea ("sched_ext: Introduce cgroup sub-sched support")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
kernel/sched/ext.c