From: Willy Tarreau <w@1wt.eu>
Date: Fri, 16 Oct 2020 07:31:41 +0000 (+0200)
Subject: MEDIUM: task: use an upgradable seek lock when scanning the wait queue
X-Git-Tag: v2.3-dev7~9
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=d48ed6643b29a9f16616916742e83bf36622df00;p=thirdparty%2Fhaproxy.git

MEDIUM: task: use an upgradable seek lock when scanning the wait queue

Right now when running a configuration with many global timers (e.g. many
health checks), there is a lot of contention on the global wait queue
lock because all threads queue up in front of it to scan it.

With 2000 servers checked every 10 milliseconds (200k checks per second),
after 23 seconds running on 8 threads, the lock stats were this high:

  Stats about Lock TASK_WQ:
      write lock  : 9872564
      write unlock: 9872564 (0)
      wait time for write     : 9208.409 msec
      wait time for write/lock: 932.727 nsec
      read lock   : 240367
      read unlock : 240367 (0)
      wait time for read      : 149.025 msec
      wait time for read/lock : 619.991 nsec

i.e. ~5% of the total runtime spent waiting on this specific lock.

With upgradable locks we don't need to work like this anymore. We
can just try to upgade the read lock to a seek lock before scanning
the queue, then upgrade the seek lock to a write lock for each element
we want to delete there and immediately downgrade it to a seek lock.

The benefit is double:
  - all other threads which need to call next_expired_task() before
    polling won't wait anymore since the seek lock is compatible with
    the read lock ;

  - all other threads competing on trying to grab this lock will fail
    on the upgrade attempt from read to seek, and will let the current
    lock owner finish collecting expired entries.

Doing only this has reduced the wake_expired_tasks() CPU usage in a
very large servers test from 2.15% to 1.04% as reported by perf top,
and increased by 3% the health check rate (all threads being saturated).

This is expected to help against (and possibly solve) the problem
described in issue #875.
---

diff --git a/src/task.c b/src/task.c
index 293aefe3b4..3336ebc70f 100644
--- a/src/task.c
+++ b/src/task.c
@@ -289,15 +289,23 @@ void wake_expired_tasks()
 		}
 	}
 	key = eb->key;
-	HA_RWLOCK_RDUNLOCK(TASK_WQ_LOCK, &wq_lock);
 
-	if (tick_is_lt(now_ms, key))
+	if (tick_is_lt(now_ms, key)) {
+		HA_RWLOCK_RDUNLOCK(TASK_WQ_LOCK, &wq_lock);
 		goto leave;
+	}
 
 	/* There's really something of interest here, let's visit the queue */
 
+	if (HA_RWLOCK_TRYRDTOSK(TASK_WQ_LOCK, &wq_lock)) {
+		/* if we failed to grab the lock it means another thread is
+		 * already doing the same here, so let it do the job.
+		 */
+		HA_RWLOCK_RDUNLOCK(TASK_WQ_LOCK, &wq_lock);
+		goto leave;
+	}
+
 	while (1) {
-		HA_RWLOCK_WRLOCK(TASK_WQ_LOCK, &wq_lock);
   lookup_next:
 		if (max_processed-- <= 0)
 			break;
@@ -315,26 +323,29 @@ void wake_expired_tasks()
 		task = eb32_entry(eb, struct task, wq);
 		if (tick_is_expired(task->expire, now_ms)) {
 			/* expired task, wake it up */
+			HA_RWLOCK_SKTOWR(TASK_WQ_LOCK, &wq_lock);
 			__task_unlink_wq(task);
+			HA_RWLOCK_WRTOSK(TASK_WQ_LOCK, &wq_lock);
 			task_wakeup(task, TASK_WOKEN_TIMER);
 		}
 		else if (task->expire != eb->key) {
 			/* task is not expired but its key doesn't match so let's
 			 * update it and skip to next apparently expired task.
 			 */
+			HA_RWLOCK_SKTOWR(TASK_WQ_LOCK, &wq_lock);
 			__task_unlink_wq(task);
 			if (tick_isset(task->expire))
 				__task_queue(task, &timers);
+			HA_RWLOCK_WRTOSK(TASK_WQ_LOCK, &wq_lock);
 			goto lookup_next;
 		}
 		else {
 			/* task not expired and correctly placed */
 			break;
 		}
-		HA_RWLOCK_WRUNLOCK(TASK_WQ_LOCK, &wq_lock);
 	}
 
-	HA_RWLOCK_WRUNLOCK(TASK_WQ_LOCK, &wq_lock);
+	HA_RWLOCK_SKUNLOCK(TASK_WQ_LOCK, &wq_lock);
 #endif
 leave:
 	return;