From: John Stultz
Date: Thu, 30 Apr 2026 21:50:47 +0000 (+0000)
Subject: locking: mutex: Fix proxy-exec potentially deactivating tasks marked TASK_RUNNING
X-Git-Url: http://git.ipfire.org/gitweb/?a=commitdiff_plain;h=bdaf235913e1f31453c6e0e109d797269f9f0a37;p=thirdparty%2Fkernel%2Flinux.git

locking: mutex: Fix proxy-exec potentially deactivating tasks marked TASK_RUNNING

Vineeth came up with a test driver that could trip up workqueue
stalls. After fixing one issue this test found, Vineeth reported
the test was still failing.

Greatly simplified, a task that tries to take a mutex already owned
by another task that is sleeping can hit an edge case in
__mutex_lock_common().

If the task fails to get the lock and calls into schedule, but gets
a spurious wakeup, it will find that it is the first waiter and go
into the mutex_optimistic_spin() logic.

Before calling mutex_optimistic_spin(), though, we clear the task's
blocked_on state, since mutex_optimistic_spin() may call schedule()
if need_resched() is set.

After mutex_optimistic_spin() fails, we set blocked_on again,
restart the main mutex loop, try to take the lock and call into
schedule_preempt_disabled().

From there, with proxy-execution, we'll see the task is blocked_on,
follow the chain, see the owner is sleeping and dequeue the waiting
task from the runqueue.

This all sounds fine and reasonable. But what I had missed is that
in mutex_optimistic_spin(), not only do we call schedule(), but we
also set TASK_RUNNING right before doing so.

This is ok for that invocation of schedule(). But when we come
back, we re-set the blocked_on we had just cleared without
re-setting the task state to TASK_INTERRUPTIBLE/UNINTERRUPTIBLE.

This means we have a task that is both blocked_on and TASK_RUNNING,
so when the proxy execution code dequeues the task, we are in
trouble, since future wakeups will be shortcut by the
ttwu_state_match() check.

Thus, to avoid this, after mutex_optimistic_spin(), set the task
state back when we set blocked_on.

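As a rough illustration of the sequence described above, here is a
small userspace model. It is only a sketch and not kernel code: every
toy_* name is hypothetical, the structure keeps just the three fields
the scenario touches, and the wakeup path is reduced to the state-mask
check. The state values mirror include/linux/sched.h, where
TASK_RUNNING is zero and therefore can never match a wakeup mask.

```c
#include <assert.h>
#include <stdbool.h>

/* State bits mirror include/linux/sched.h; TASK_RUNNING is zero. */
#define TOY_TASK_RUNNING         0x0000u
#define TOY_TASK_INTERRUPTIBLE   0x0001u
#define TOY_TASK_UNINTERRUPTIBLE 0x0002u
#define TOY_TASK_NORMAL (TOY_TASK_INTERRUPTIBLE | TOY_TASK_UNINTERRUPTIBLE)

/* Hypothetical stand-in for the few task fields this scenario touches. */
struct toy_task {
	unsigned int state;
	bool blocked_on;	/* waiting on a mutex */
	bool on_rq;		/* still on the runqueue */
};

/* Simplified state check: a wakeup only proceeds if state matches mask. */
static bool toy_ttwu_state_match(const struct toy_task *t, unsigned int mask)
{
	return (t->state & mask) != 0;
}

/* Toy wakeup: bails out early when the state does not match. */
static bool toy_try_to_wake_up(struct toy_task *t, unsigned int mask)
{
	if (!toy_ttwu_state_match(t, mask))
		return false;
	t->state = TOY_TASK_RUNNING;
	t->on_rq = true;
	return true;
}

/* Toy proxy-exec step: dequeue a blocked_on task whose lock owner sleeps. */
static void toy_proxy_deactivate(struct toy_task *t, bool owner_sleeping)
{
	if (t->blocked_on && owner_sleeping)
		t->on_rq = false;
}

/* Replays the sequence; returns whether the later wakeup re-queues the task. */
static bool toy_scenario(bool restore_state)
{
	struct toy_task t = { TOY_TASK_UNINTERRUPTIBLE, true, true };

	/* Spurious wakeup, first waiter: clear blocked_on, then spin. */
	t.blocked_on = false;
	t.state = TOY_TASK_RUNNING;	/* set by the spin path before schedule() */

	/* Spin failed: re-set blocked_on; the fix also restores the state. */
	t.blocked_on = true;
	if (restore_state)
		t.state = TOY_TASK_UNINTERRUPTIBLE;

	/* Owner is asleep, so the waiter is taken off the runqueue. */
	toy_proxy_deactivate(&t, true);
	assert(!t.on_rq);

	/* Unlock-time wakeup: only lands if the state matches the mask. */
	toy_try_to_wake_up(&t, TOY_TASK_NORMAL);
	return t.on_rq;
}
```

In this model, toy_scenario(false) replays the pre-fix path and leaves
the task off the runqueue after the wakeup attempt, while
toy_scenario(true) restores the sleeping state alongside blocked_on and
the wakeup re-queues the task.
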
Many many thanks again to Vineeth for his very useful testing
driver that uncovered this long hidden bug, that I hadn't tripped
in all my testing! Very impressed with the problems he's uncovered!

Reported-by: Vineeth Pillai
Signed-off-by: John Stultz
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: Vineeth Pillai
Link: https://patch.msgid.link/20260430215103.2978955-3-jstultz@google.com
---

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 09534628dc01a..a93d4c6bee1a3 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -763,6 +763,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 			raw_spin_lock_irqsave(&lock->wait_lock, flags);
 			raw_spin_lock(&current->blocked_lock);
 			__set_task_blocked_on(current, lock);
+			set_current_state(state);
 
 			if (opt_acquired)
 				break;