From: Willy Tarreau
Date: Thu, 11 Jun 2020 12:25:47 +0000 (+0200)
Subject: BUG/MEDIUM: log: don't hold the log lock during writev() on a file descriptor
X-Git-Tag: v2.2-dev10~61
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=df187875da479c71f12e2e8edb0a2e345f189523;p=thirdparty%2Fhaproxy.git

BUG/MEDIUM: log: don't hold the log lock during writev() on a file descriptor

In issue #648 a second problem was reported, indicating that some users
mistakenly send the log to an FD mapped on a file. This situation doesn't
even enable O_NONBLOCK and results in huge access times on the order of
milliseconds, with the lock held and other threads waiting until the
watchdog fires to unblock the situation.

The problem with files is that O_NONBLOCK is ignored, and we still need
the lock, otherwise we can end up with interleaved log messages.

What this patch does is different. Instead of locking all writers, it
uses a trylock so that there is always at most one logger, and other
candidates simply give up and report a failure, just as would happen if
writev() returned -1 due to a pipe-full condition. This solution is
elegant because it gives the control back to haproxy to decide to give up
when it takes too much time, while previously it was the kernel that used
to block the syscall.

However, at high log rates (500000 req/s) there were up to 50% dropped
logs due to the contention on the lock. In order to address this, we try
to grab the lock up to 200 times and call ha_thread_relax() on failure.
This results in almost no failure (no more than previously with
O_NONBLOCK). A typical test with 6 competing threads writing to stdout
chained to a pipe to a single process shows around 1000 drops for 10
million logs at 480000 lines per second.

Please note that this doesn't mean that writing to a blocking FD is a
good idea; it should only be done temporarily in testing environments
for debugging.
A file or a terminal will continue to block the writing thread while
others spin a little bit and lose their logs, but the writing thread will
still experience performance-killing latencies.

This patch should be backported to 2.1 and 2.0. The code is in log.c in
2.0, but the principle is the same.
---

diff --git a/src/fd.c b/src/fd.c
index 2356a9002b..919e433514 100644
--- a/src/fd.c
+++ b/src/fd.c
@@ -446,6 +446,7 @@ ssize_t fd_write_frag_line(int fd, size_t maxlen, const struct ist pfx[], size_t
 	size_t totlen = 0;
 	size_t sent = 0;
 	int vec = 0;
+	int attempts = 0;
 
 	if (!maxlen)
 		maxlen = ~0;
@@ -481,16 +482,31 @@ ssize_t fd_write_frag_line(int fd, size_t maxlen, const struct ist pfx[], size_t
 		vec++;
 	}
 
+	/* make sure we never interleave writes and we never block. This means
+	 * we prefer to fail on collision than to block. But we don't want to
+	 * lose too many logs so we just perform a few lock attempts then give
+	 * up.
+	 */
+
+	while (HA_SPIN_TRYLOCK(OTHER_LOCK, &log_lock) != 0) {
+		if (++attempts >= 200) {
+			/* so that the caller knows the message couldn't be delivered */
+			sent = -1;
+			errno = EAGAIN;
+			goto leave;
+		}
+		ha_thread_relax();
+	}
+
 	if (unlikely(!fdtab[fd].initialized)) {
 		fdtab[fd].initialized = 1;
 		if (!isatty(fd))
 			fcntl(fd, F_SETFL, O_NONBLOCK);
 	}
-
-	HA_SPIN_LOCK(OTHER_LOCK, &log_lock);
 	sent = writev(fd, iovec, vec);
 	HA_SPIN_UNLOCK(OTHER_LOCK, &log_lock);
+ leave:
 	/* sent > 0 if the message was delivered */
 	return sent;
 }