From: Willy Tarreau <w@1wt.eu>
Date: Wed, 8 Nov 2017 13:05:19 +0000 (+0100)
Subject: BUG/MAJOR: threads/tasks: fix the scheduler again
X-Git-Tag: v1.8-rc3~21
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=9e45b33f7ee78953d3ec6ad32d4d9eed3bfc897a;p=thirdparty%2Fhaproxy.git

BUG/MAJOR: threads/tasks: fix the scheduler again

My recent change in commit ce4e0aa ("MEDIUM: task: change the construction
of the loop in process_runnable_tasks()") was bogus: it kept rq_next across
an unlock/lock sequence, during which another thread may pick the task that
rq_next points to. This occasionally led to crashes for tasks that are
eligible to run on any thread. We must perform the lookup again for each new
batch instead. The problem is easily triggered with a configuration such as:

    global
        nbthread 4

    listen check
        mode http
        bind 0.0.0.0:8080
        redirect location /
        option httpchk GET /
        server s1 127.0.0.1:8080 check inter 1
        server s2 127.0.0.1:8080 check inter 1

Thanks to Olivier for diagnosing this one. No backport is needed.
---

diff --git a/src/task.c b/src/task.c
index 4555f2f01b..98829038e4 100644
--- a/src/task.c
+++ b/src/task.c
@@ -252,13 +252,16 @@ void process_runnable_tasks()
 	}
 
 	HA_SPIN_LOCK(TASK_RQ_LOCK, &rq_lock);
-	rq_next = eb32sc_lookup_ge(&rqueue, rqueue_ticks - TIMER_LOOK_BACK, tid_bit);
 
 	do {
 		/* Note: this loop is one of the fastest code path in
 		 * the whole program. It should not be re-arranged
 		 * without a good reason.
 		 */
+
+		/* we have to restart looking up after every batch */
+		rq_next = eb32sc_lookup_ge(&rqueue, rqueue_ticks - TIMER_LOOK_BACK, tid_bit);
+
 		for (local_tasks_count = 0; local_tasks_count < 16; local_tasks_count++) {
 			if (unlikely(!rq_next)) {
 				/* either we just started or we reached the end