From: Linus Torvalds
Date: Wed, 3 Dec 2025 21:25:39 +0000 (-0800)
Subject: Merge tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj...
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=02baaa67d9afc2e56c6e1ac6a1fb1f1dd2be366f;p=thirdparty%2Flinux.git

Merge tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext updates from Tejun Heo:

 - Improve recovery from misbehaving BPF schedulers. When a scheduler
   puts many tasks with varying affinity restrictions on a shared DSQ,
   CPUs scanning through tasks they cannot run can overwhelm the system,
   causing lockups. Bypass mode now uses per-CPU DSQs with a load
   balancer to avoid this, and hooks into the hardlockup detector to
   attempt recovery. Add the scx_cpu0 example scheduler to demonstrate
   this scenario.

 - Add a lockless peek operation for DSQs to reduce lock contention for
   schedulers that need to query queue state during load balancing.

 - Allow scx_bpf_reenqueue_local() to be called from anywhere, in
   preparation for deprecating the cpu_acquire/release() callbacks in
   favor of generic BPF hooks.

 - Prepare for hierarchical scheduler support: add
   scx_bpf_task_set_slice() and scx_bpf_task_set_dsq_vtime() kfuncs,
   make scx_bpf_dsq_insert*() return bool, and wrap kfunc args in
   structs for a future aux__prog parameter.

 - Implement the cgroup_set_idle() callback to notify BPF schedulers
   when a cgroup's idle state changes.

 - Fix migration tasks being incorrectly downgraded from
   stop_sched_class to rt_sched_class across sched_ext enable/disable.
   Applied late as the fix is low risk; the bug is subtle but needs
   stable backporting.

 - Various fixes and cleanups, including cgroup exit ordering,
   SCX_KICK_WAIT reliability, and backward-compatibility improvements.

* tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (44 commits)
  sched_ext: Fix incorrect sched_class settings for per-cpu migration tasks
  sched_ext: tools: Removing duplicate targets during non-cross compilation
  sched_ext: Use kvfree_rcu() to release per-cpu ksyncs object
  sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs
  sched_ext: Update comments replacing breather with aborting mechanism
  sched_ext: Implement load balancer for bypass mode
  sched_ext: Factor out abbreviated dispatch dequeue into dispatch_dequeue_locked()
  sched_ext: Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR
  sched_ext: Add scx_cpu0 example scheduler
  sched_ext: Hook up hardlockup detector
  sched_ext: Make handle_lockup() propagate scx_verror() result
  sched_ext: Refactor lockup handlers into handle_lockup()
  sched_ext: Make scx_exit() and scx_vexit() return bool
  sched_ext: Exit dispatch and move operations immediately when aborting
  sched_ext: Simplify breather mechanism with scx_aborting flag
  sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode
  sched_ext: Refactor do_enqueue_task() local and global DSQ paths
  sched_ext: Use shorter slice in bypass mode
  sched_ext: Mark racy bitfields to prevent adding fields that can't tolerate races
  sched_ext: Minor cleanups to scx_task_iter
  ...
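As a quick illustration of the enqueue-side API change above ("make
scx_bpf_dsq_insert*() return bool"), here is a minimal BPF sketch
assuming the tools/sched_ext common headers; SHARED_DSQ follows the
scx_simple example convention, and the local-DSQ fallback policy is a
hypothetical choice for illustration, not taken from the merged code:

    #include <scx/common.bpf.h>

    char _license[] SEC("license") = "GPL";

    /* user DSQ id, created via scx_bpf_create_dsq() in ops.init() */
    #define SHARED_DSQ 0

    void BPF_STRUCT_OPS(sketch_enqueue, struct task_struct *p, u64 enq_flags)
    {
            /*
             * As of this series, scx_bpf_dsq_insert() returns bool, so
             * the scheduler can notice a failed insertion.  Falling
             * back to the local DSQ is an assumed policy, shown only
             * to demonstrate checking the new return value.
             */
            if (!scx_bpf_dsq_insert(p, SHARED_DSQ, SCX_SLICE_DFL, enq_flags))
                    scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, enq_flags);
    }

Before this series, scx_bpf_dsq_insert() returned void, so a scheduler
had no direct way to observe whether the insertion took effect.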
--- 02baaa67d9afc2e56c6e1ac6a1fb1f1dd2be366f

diff --cc kernel/sched/ext.c
index 6827689a0966c,b563b8c3fd24d..05f5a49e9649a
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@@ -473,10 -526,9 +526,9 @@@ struct scx_task_iter
   */
  static void scx_task_iter_start(struct scx_task_iter *iter)
  {
- 	BUILD_BUG_ON(__SCX_DSQ_ITER_ALL_FLAGS &
- 		     ((1U << __SCX_DSQ_LNODE_PRIV_SHIFT) - 1));
+ 	memset(iter, 0, sizeof(*iter));
  
- 	spin_lock_irq(&scx_tasks_lock);
+ 	raw_spin_lock_irq(&scx_tasks_lock);
  
  	iter->cursor = (struct sched_ext_entity){ .flags = SCX_TASK_CURSOR };
  	list_add(&iter->cursor.tasks_node, &scx_tasks);
@@@ -2342,8 -2436,18 +2436,17 @@@ do_pick_task_scx(struct rq *rq, struct 
  	rq_unpin_lock(rq, rf);
  	balance_one(rq, prev);
  	rq_repin_lock(rq, rf);
  
- 	maybe_queue_balance_callback(rq);
- 	if (rq_modified_above(rq, &ext_sched_class))
+ 	/*
+ 	 * If any higher-priority sched class enqueued a runnable task on
+ 	 * this rq during balance_one(), abort and return RETRY_TASK, so
+ 	 * that the scheduler loop can restart.
+ 	 *
+ 	 * If @force_scx is true, always try to pick a SCHED_EXT task,
+ 	 * regardless of any higher-priority sched classes activity.
+ 	 */
+ 	if (!force_scx && rq_modified_above(rq, &ext_sched_class))
  		return RETRY_TASK;
  
  	keep_prev = rq->scx.flags & SCX_RQ_BAL_KEEP;
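For context on the second hunk: RETRY_TASK is consumed by the core
scheduler's pick loop, which restarts class selection from the top so a
newly runnable higher-priority task is seen. A loosely modeled sketch of
that consumption follows; it is not the actual __pick_next_task() from
kernel/sched/core.c, whose fast paths and signatures differ:

    /* simplified sketch of the pick loop that consumes RETRY_TASK */
    static struct task_struct *pick_next_task_sketch(struct rq *rq)
    {
            const struct sched_class *class;
            struct task_struct *p;

    restart:
            for_each_class(class) {
                    p = class->pick_task(rq);
                    if (p == RETRY_TASK)
                            goto restart;   /* rq changed above this class; rescan */
                    if (p)
                            return p;
            }

            /* the idle class always has a task, so we never get here */
            BUG();
    }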