From 160fff665e326952ed35aea9be5ed93cefae9c2e Mon Sep 17 00:00:00 2001
From: Christopher Faulet
Date: Tue, 26 Jul 2022 19:19:18 +0200
Subject: [PATCH] BUG/MEDIUM: peers: limit reconnect attempts of the old
 process on reload

When peers are configured and HAProxy is reloaded or restarted, a
synchronization is performed between the old process and the new one. To do
so, the old process connects to the new one. If the synchronization fails,
it retries. However, there is no delay and reconnect attempts are not
bounded. Thus, it may loop for a while, consuming all the CPU. Of course, it
is unexpected, but it is possible. For instance, if the local peer is
misconfigured, an infinite loop can be observed if the connection succeeds
but not the synchronization. This prevents the old process from exiting,
unless the "hard-stop-after" option is set.

To fix the bug, the reconnect is delayed. The local peer already has an
expiration date to delay the reconnects, but it was not used in stopping
mode. So we use it now. Thanks to the previous fix, the reconnect timeout is
shorter in this case (500ms against 5s in running mode). In addition, we
also use the peers resync expiration date so that retries are not infinite.
It is accurate because the new process, on its side, uses this timeout to
switch from a local resync to a remote resync.

This patch depends on "MINOR: peers: Use a dedicated reconnect timeout when
stopping the local peer". It fixes the issue #1799. It should be backported
as far as 2.0.
---
 src/peers.c | 36 ++++++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/src/peers.c b/src/peers.c
index f9d8077ff2..ba2e0d65ce 100644
--- a/src/peers.c
+++ b/src/peers.c
@@ -3421,6 +3421,10 @@ struct task *process_peer_sync(struct task * task, void *context, unsigned int s
 							peer_session_forceshutdown(ps);
 						}
 					}
+
+					/* Set resync timeout for the local peer and request a immediate reconnect */
+					peers->resync_timeout = tick_add(now_ms, MS_TO_TICKS(PEER_RESYNC_TIMEOUT));
+					peers->local->reconnect = now_ms;
 				}
 			}
 
@@ -3436,18 +3440,26 @@ struct task *process_peer_sync(struct task * task, void *context, unsigned int s
 			}
 			else if (!ps->appctx) {
 				/* If there's no active peer connection */
-				if (ps->statuscode == 0 ||
-				    ps->statuscode == PEER_SESS_SC_SUCCESSCODE ||
-				    ps->statuscode == PEER_SESS_SC_CONNECTEDCODE ||
-				    ps->statuscode == PEER_SESS_SC_TRYAGAIN) {
-					/* connection never tried
-					 * or previous peer connection was successfully established
-					 * or previous tcp connect succeeded but init state incomplete
-					 * or during previous connect, peer replies a try again statuscode */
-
-					/* connect to the local peer if we must push a local sync */
-					if (peers->flags & PEERS_F_DONOTSTOP) {
-						peer_session_create(peers, ps);
+				if (!tick_is_expired(peers->resync_timeout, now_ms) &&
+				    (ps->statuscode == 0 ||
+				     ps->statuscode == PEER_SESS_SC_SUCCESSCODE ||
+				     ps->statuscode == PEER_SESS_SC_CONNECTEDCODE ||
+				     ps->statuscode == PEER_SESS_SC_TRYAGAIN)) {
+					/* The resync timeout is not expired and
+					 * connection never tried
+					 * or previous peer connection was successfully established
+					 * or previous tcp connect succeeded but init state incomplete
+					 * or during previous connect, peer replies a try again statuscode */
+
+					if (!tick_is_expired(ps->reconnect, now_ms)) {
+						/* reconnection timer is not expired. reschedule task for reconnect */
+						task->expire = tick_first(task->expire, ps->reconnect);
+					}
+					else {
+						/* connect to the local peer if we must push a local sync */
+						if (peers->flags & PEERS_F_DONOTSTOP) {
+							peer_session_create(peers, ps);
+						}
 					}
 				}
 				else {
-- 
2.39.5
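
Note: the bounded, delayed retry described in the commit message can be modeled with the minimal standalone sketch below. It is not HAProxy code. `struct sync_model`, `deadline_reached()` and `count_attempts()` are hypothetical names, time is a plain millisecond counter without the wraparound handling of the real tick helpers, and the sketch assumes every failed attempt re-arms the reconnect timer. The 500ms delay comes from the commit message; the 5000ms resync timeout used in the usage note is only an assumed value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the two timers used by the fix.
 * Plain millisecond counters, no wraparound handling. */
struct sync_model {
	unsigned int resync_deadline_ms; /* give up retrying once reached */
	unsigned int next_reconnect_ms;  /* earliest time of the next attempt */
};

/* True once 'now_ms' has reached 'deadline_ms'. */
static bool deadline_reached(unsigned int deadline_ms, unsigned int now_ms)
{
	return now_ms >= deadline_ms;
}

/* Count the connection attempts made by a stopping process whose
 * synchronization never succeeds (worst case). */
static unsigned int count_attempts(unsigned int resync_timeout_ms,
                                   unsigned int reconnect_delay_ms)
{
	unsigned int now_ms = 0;
	unsigned int attempts = 0;
	struct sync_model m;

	/* A zero delay would reproduce the unbounded busy loop. */
	assert(reconnect_delay_ms > 0);

	/* On the stop signal: arm the resync deadline and allow an
	 * immediate first attempt. */
	m.resync_deadline_ms = now_ms + resync_timeout_ms;
	m.next_reconnect_ms = now_ms;

	while (!deadline_reached(m.resync_deadline_ms, now_ms)) {
		if (!deadline_reached(m.next_reconnect_ms, now_ms)) {
			/* Too early: sleep until the reconnect timer fires. */
			now_ms = m.next_reconnect_ms;
			continue;
		}
		/* Attempt a connection; it fails, so re-arm the timer. */
		attempts++;
		m.next_reconnect_ms = now_ms + reconnect_delay_ms;
	}
	return attempts;
}
```

Under these assumptions, `count_attempts(5000, 500)` yields 10 attempts, at 0ms, 500ms, and so on up to 4500ms, after which the model stops retrying because the resync deadline has been reached.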