From: Willy Tarreau
Date: Fri, 15 Nov 2019 09:20:07 +0000 (+0100)
Subject: BUG/MEDIUM: listeners: always pause a listener on out-of-resource condition
X-Git-Tag: v2.1-dev5~19
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=93604edb652542f4149282438dc6e0548cd4d545;p=thirdparty%2Fhaproxy.git

BUG/MEDIUM: listeners: always pause a listener on out-of-resource condition

A corner case was opened in the listener_accept() code by commit 3f0d02bbc2
("MAJOR: listener: do not hold the listener lock in listener_accept()"). The
issue arises when one listener (or a group of them) manages to eat all of the
proxy's or all of the process's maxconn, and another listener tries to accept
a new socket. The atomic increment then detects the excess connection count
and immediately aborts, without pausing the listener, so the call is
immediately performed again. This doesn't happen when the test is run on a
single listener, because that listener gets limited when crossing the limit;
with 2 or more listeners, we don't have this luxury.

The solution consists in limiting the listener as soon as we have to decline
accepting an incoming connection. This means that the listener will not be
marked full yet if it reaches the exact connection count, but this is not a
problem in practice since all other listeners will only be marked full after
their first failed attempt. Thus from now on, a listener is only full once it
has already failed to take an incoming connection.

This bug was definitely responsible for the unreproducible occasional reports
of high CPU usage showing epoll_wait() returning immediately without accepting
an incoming connection, like in bug #129.

This fix must be backported to 1.9 and 1.8.
---

diff --git a/src/listener.c b/src/listener.c
index 522967edcc..1ce8c596d7 100644
--- a/src/listener.c
+++ b/src/listener.c
@@ -727,57 +727,46 @@ void listener_accept(int fd)
 	 */
 	do {
 		count = l->nbconn;
-		if (l->maxconn && count >= l->maxconn) {
+		if (unlikely(l->maxconn && count >= l->maxconn)) {
 			/* the listener was marked full or another
 			 * thread is going to do it.
 			 */
 			next_conn = 0;
+			listener_full(l);
 			goto end;
 		}
 		next_conn = count + 1;
 	} while (!_HA_ATOMIC_CAS(&l->nbconn, (int *)(&count), next_conn));
 
-	if (l->maxconn && next_conn == l->maxconn) {
-		/* we filled it, mark it full */
-		listener_full(l);
-	}
-
 	if (p) {
 		do {
 			count = p->feconn;
-			if (count >= p->maxconn) {
+			if (unlikely(count >= p->maxconn)) {
 				/* the frontend was marked full or another
 				 * thread is going to do it.
 				 */
 				next_feconn = 0;
+				limit_listener(l, &p->listener_queue);
 				goto end;
 			}
 			next_feconn = count + 1;
 		} while (!_HA_ATOMIC_CAS(&p->feconn, &count, next_feconn));
-
-		if (unlikely(next_feconn == p->maxconn)) {
-			/* we just filled it */
-			limit_listener(l, &p->listener_queue);
-		}
 	}
 
 	if (!(l->options & LI_O_UNLIMITED)) {
 		do {
 			count = actconn;
-			if (count >= global.maxconn) {
+			if (unlikely(count >= global.maxconn)) {
 				/* the process was marked full or another
 				 * thread is going to do it.
 				 */
 				next_actconn = 0;
+				limit_listener(l, &global_listener_queue);
+				task_schedule(global_listener_queue_task, tick_add(now_ms, 1000)); /* try again in 1 second */
 				goto end;
 			}
 			next_actconn = count + 1;
 		} while (!_HA_ATOMIC_CAS(&actconn, (int *)(&count), next_actconn));
-
-		if (unlikely(next_actconn == global.maxconn)) {
-			limit_listener(l, &global_listener_queue);
-			task_schedule(global_listener_queue_task, tick_add(now_ms, 1000)); /* try again in 1 second */
-		}
 	}
 
 	/* with sockpair@ we don't want to do an accept */