From: Greg Kroah-Hartman
Date: Thu, 13 Jan 2022 08:52:16 +0000 (+0100)
Subject: 5.15-stable patches
X-Git-Tag: v5.16.1~43
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=ff9ebcf76e024279e65995402f4a4cd11c96ef6b;p=thirdparty%2Fkernel%2Fstable-queue.git

5.15-stable patches

added patches:
	workqueue-fix-unbind_workers-vs-wq_worker_running-race.patch
---

diff --git a/queue-5.15/series b/queue-5.15/series
index d7d89ff0f88..9ef83e68088 100644
--- a/queue-5.15/series
+++ b/queue-5.15/series
@@ -1,2 +1,3 @@
 s390-kexec-handle-r_390_plt32dbl-rela-in-arch_kexec_apply_relocations_add.patch
+workqueue-fix-unbind_workers-vs-wq_worker_running-race.patch
 fget-clarify-and-improve-__fget_files-implementation.patch
diff --git a/queue-5.15/workqueue-fix-unbind_workers-vs-wq_worker_running-race.patch b/queue-5.15/workqueue-fix-unbind_workers-vs-wq_worker_running-race.patch
new file mode 100644
index 00000000000..09db3bd0f07
--- /dev/null
+++ b/queue-5.15/workqueue-fix-unbind_workers-vs-wq_worker_running-race.patch
@@ -0,0 +1,101 @@
+From 07edfece8bcb0580a1828d939e6f8d91a8603eb2 Mon Sep 17 00:00:00 2001
+From: Frederic Weisbecker
+Date: Wed, 1 Dec 2021 16:19:44 +0100
+Subject: workqueue: Fix unbind_workers() VS wq_worker_running() race
+
+From: Frederic Weisbecker
+
+commit 07edfece8bcb0580a1828d939e6f8d91a8603eb2 upstream.
+
+At CPU-hotplug time, unbind_worker() may preempt a worker while it is
+waking up. In that case the following scenario can happen:
+
+    unbind_workers()                     wq_worker_running()
+    --------------                       -------------------
+                                         if (!(worker->flags & WORKER_NOT_RUNNING))
+                                             //PREEMPTED by unbind_workers
+    worker->flags |= WORKER_UNBOUND;
+    [...]
+    atomic_set(&pool->nr_running, 0);
+    //resume to worker
+                                         atomic_inc(&worker->pool->nr_running);
+
+After unbind_worker() resets pool->nr_running, the value is expected to
+remain 0 until the pool ever gets rebound in case cpu_up() is called on
+the target CPU in the future. But here the race leaves pool->nr_running
+with a value of 1, triggering the following warning when the worker goes
+idle:
+
+    WARNING: CPU: 3 PID: 34 at kernel/workqueue.c:1823 worker_enter_idle+0x95/0xc0
+    Modules linked in:
+    CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 5.16.0-rc1+ #34
+    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
+    Workqueue:  0x0 (rcu_par_gp)
+    RIP: 0010:worker_enter_idle+0x95/0xc0
+    Code: 04 85 f8 ff ff ff 39 c1 7f 09 48 8b 43 50 48 85 c0 74 1b 83 e2 04 75 99 8b 43 34 39 43 30 75 91 8b 83 00 03 00 00 85 c0 74 87 <0f> 0b 5b c3 48 8b 35 70 f1 37 01 48 8d 7b 48 48 81 c6 e0 93  0
+    RSP: 0000:ffff9b7680277ed0 EFLAGS: 00010086
+    RAX: 00000000ffffffff RBX: ffff93465eae9c00 RCX: 0000000000000000
+    RDX: 0000000000000000 RSI: ffff9346418a0000 RDI: ffff934641057140
+    RBP: ffff934641057170 R08: 0000000000000001 R09: ffff9346418a0080
+    R10: ffff9b768027fdf0 R11: 0000000000002400 R12: ffff93465eae9c20
+    R13: ffff93465eae9c20 R14: ffff93465eae9c70 R15: ffff934641057140
+    FS:  0000000000000000(0000) GS:ffff93465eac0000(0000) knlGS:0000000000000000
+    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
+    CR2: 0000000000000000 CR3: 000000001cc0c000 CR4: 00000000000006e0
+    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
+    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
+    Call Trace:
+
+     worker_thread+0x89/0x3d0
+     ? process_one_work+0x400/0x400
+     kthread+0x162/0x190
+     ? set_kthread_struct+0x40/0x40
+     ret_from_fork+0x22/0x30
+
+
+Also due to this incorrect "nr_running == 1", further queued work may
+end up not being served, because no worker is awaken at work insert time.
+This raises rcutorture writer stalls for example.
+
+Fix this with disabling preemption in the right place in
+wq_worker_running().
+
+It's worth noting that if the worker migrates and runs concurrently with
+unbind_workers(), it is guaranteed to see the WORKER_UNBOUND flag update
+due to set_cpus_allowed_ptr() acquiring/releasing rq->lock.
+
+Fixes: 6d25be5782e4 ("sched/core, workqueues: Distangle worker accounting from rq lock")
+Reviewed-by: Lai Jiangshan
+Tested-by: Paul E. McKenney
+Acked-by: Peter Zijlstra (Intel)
+Signed-off-by: Frederic Weisbecker
+Cc: Thomas Gleixner
+Cc: Ingo Molnar
+Cc: Sebastian Andrzej Siewior
+Cc: Daniel Bristot de Oliveira
+Signed-off-by: Tejun Heo
+Signed-off-by: Greg Kroah-Hartman
+---
+ kernel/workqueue.c |    9 +++++++++
+ 1 file changed, 9 insertions(+)
+
+--- a/kernel/workqueue.c
++++ b/kernel/workqueue.c
+@@ -867,8 +867,17 @@ void wq_worker_running(struct task_struc
+ 
+ 	if (!worker->sleeping)
+ 		return;
++
++	/*
++	 * If preempted by unbind_workers() between the WORKER_NOT_RUNNING check
++	 * and the nr_running increment below, we may ruin the nr_running reset
++	 * and leave with an unexpected pool->nr_running == 1 on the newly unbound
++	 * pool. Protect against such race.
++	 */
++	preempt_disable();
+ 	if (!(worker->flags & WORKER_NOT_RUNNING))
+ 		atomic_inc(&worker->pool->nr_running);
++	preempt_enable();
+ 	worker->sleeping = 0;
+ }
+ 