From: Willy Tarreau Date: Wed, 10 Aug 2022 16:03:11 +0000 (+0200) Subject: BUG/MEDIUM: task: relax one thread consistency check in task_unlink_wq() X-Git-Tag: v2.7-dev4~62 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=341ac99f4df128bd522fbf5ff8162c9cb450c1d3;p=thirdparty%2Fhaproxy.git BUG/MEDIUM: task: relax one thread consistency check in task_unlink_wq() While testing the fix for the previous issue related to reloads with hard_stop_after, I've met another one which could spuriously produce: FATAL: bug condition "t->tid >= 0 && t->tid != tid" matched at include/haproxy/task.h:266 In 2.3-dev2, we've added more consistency checks for a number of bug- inducing programming errors related to the tasks, via commit e5d79bccc ("MINOR: tasks/debug: add a few BUG_ON() to detect use of wrong timer queue"), and this check comes from there. The problem that happens here is that when hard-stop-after is set, we can abort the current thread even if there are still ongoing checks (or connections in fact). In this case some tasks are present in a thread's wait queue and are thus bound exclusively to this thread. During deinit(), the collect and cleanup of all memory areas also stops servers and kills their check tasks. And calling task_destroy() does in turn call task_unlink_wq()... except that it's called from thread 0 which doesn't match the initially planned thread number. Several approaches are possible. One of them would consist in letting threads perform their own cleanup (tasks, pools, FDs, etc). This would possibly be even faster since done in parallel, but some corner cases might be way more complicated (e.g. who will kill a check's task, or what to do with a task found in a local wait queue or run queue, and what about other consistency checks this could violate?). Thus for now this patches takes an easier and more conservative approach consisting in admitting that when the process is stopping, this rule is not necessarily valid, and to let thread 0 collect all other threads' garbage. As such this patch can be backpoted to 2.4. --- diff --git a/include/haproxy/task.h b/include/haproxy/task.h index a9b260aa1c..3806428db8 100644 --- a/include/haproxy/task.h +++ b/include/haproxy/task.h @@ -263,7 +263,7 @@ static inline struct task *task_unlink_wq(struct task *t) if (likely(task_in_wq(t))) { locked = t->tid < 0; - BUG_ON(t->tid >= 0 && t->tid != tid); + BUG_ON(t->tid >= 0 && t->tid != tid && !(global.mode & MODE_STOPPING)); if (locked) HA_RWLOCK_WRLOCK(TASK_WQ_LOCK, &tg_ctx->wq_lock); __task_unlink_wq(t);