From: Willy Tarreau
Date: Thu, 11 Jun 2020 12:25:47 +0000 (+0200)
Subject: BUG/MEDIUM: log: don't hold the log lock during writev() on a file descriptor
X-Git-Tag: v2.2-dev10~61
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=df187875da479c71f12e2e8edb0a2e345f189523;p=thirdparty%2Fhaproxy.git

BUG/MEDIUM: log: don't hold the log lock during writev() on a file descriptor

In issue #648 a second problem was reported, indicating that some users
mistakenly send the log to an FD mapped on a file. This situation doesn't
even enable O_NONBLOCK and results in huge access times on the order of
milliseconds, with the lock held and other threads waiting until the
watchdog fires to unblock the situation.

The problem with files is that O_NONBLOCK is ignored, and we still need
the lock, otherwise we can end up with interleaved log messages.

What this patch does is different. Instead of locking all writers, it
uses a trylock so that there is always at most one logger, and other
candidates simply give up and report a failure, just as would happen if
writev() returned -1 due to a pipe-full condition. This solution is
elegant because it gives the control back to haproxy to decide to give up
when it takes too much time, while previously it was the kernel that used
to block the syscall.

However, at high log rates (500000 req/s) there were up to 50% dropped
logs due to the contention on the lock. In order to address this, we try
to grab the lock up to 200 times and call ha_thread_relax() on failure.
This results in almost no failure (no more than previously with
O_NONBLOCK). A typical test with 6 competing threads writing to stdout
chained to a pipe to a single process shows around 1000 drops for 10
million logs at 480000 lines per second.

Please note that this doesn't mean that writing to a blocking FD is a
good idea; it should only be done temporarily in testing environments
for debugging.
A file or a terminal will continue to block the writing thread while
others spin a little bit and lose their logs, but the writing thread will
still experience performance-killing latencies.

This patch should be backported to 2.1 and 2.0. The code is in log.c in
2.0, but the principle is the same.
---

diff --git a/src/fd.c b/src/fd.c
index 2356a9002b..919e433514 100644
--- a/src/fd.c
+++ b/src/fd.c
@@ -446,6 +446,7 @@ ssize_t fd_write_frag_line(int fd, size_t maxlen, const struct ist pfx[], size_t
 	size_t totlen = 0;
 	size_t sent = 0;
 	int vec = 0;
+	int attempts = 0;
 
 	if (!maxlen)
 		maxlen = ~0;
@@ -481,16 +482,31 @@ ssize_t fd_write_frag_line(int fd, size_t maxlen, const struct ist pfx[], size_t
 		vec++;
 	}
 
+	/* make sure we never interleave writes and we never block. This means
+	 * we prefer to fail on collision than to block. But we don't want to
+	 * lose too many logs so we just perform a few lock attempts then give
+	 * up.
+	 */
+
+	while (HA_SPIN_TRYLOCK(OTHER_LOCK, &log_lock) != 0) {
+		if (++attempts >= 200) {
+			/* so that the caller knows the message couldn't be delivered */
+			sent = -1;
+			errno = EAGAIN;
+			goto leave;
+		}
+		ha_thread_relax();
+	}
+
 	if (unlikely(!fdtab[fd].initialized)) {
 		fdtab[fd].initialized = 1;
 		if (!isatty(fd))
 			fcntl(fd, F_SETFL, O_NONBLOCK);
 	}
-
-	HA_SPIN_LOCK(OTHER_LOCK, &log_lock);
 	sent = writev(fd, iovec, vec);
 	HA_SPIN_UNLOCK(OTHER_LOCK, &log_lock);
+ leave:
 	/* sent > 0 if the message was delivered */
 	return sent;
 }