From: Willy Tarreau
Date: Thu, 2 Aug 2018 08:38:07 +0000 (+0200)
Subject: MEDIUM: checks: use the new rendez-vous point to spread check result
X-Git-Tag: v1.9-dev2~186
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=3d3700f2167cffd9f8b5695279ffa8734eeb321d;p=thirdparty%2Fhaproxy.git

MEDIUM: checks: use the new rendez-vous point to spread check result

The current sync point causes some important stress when a high number
of threads is in use on a config with lots of checks, because it wakes
up all threads every time a server state changes.

A config like the following can easily saturate a 4-core machine reaching
only 750 checks per second out of the ~2000 configured :

    global
        nbthread 4

    defaults
        mode    http
        timeout connect 5s
        timeout client  5s
        timeout server  5s

    frontend srv
        bind :8001 process 1/1
        redirect location / if { method OPTIONS } { rand(100) ge 50 }
        stats uri /

    backend chk
        option httpchk
        server-template srv 1-100 127.0.0.1:8001 check rise 1 fall 1 inter 50

The reason is that the random on the fake server causes the responses
to randomly match an HTTP check, and results in a lot of up/down events
that are broadcasted to all threads. It's worth noting that the CPU usage
already dropped by about 60% between 1.8 and 1.9 just due to the scheduler
updates, but the sync point remains expensive. In addition, it's visible
on the stats page that a lot of requests end up with an L7TOUT status in
~60ms. With smaller timeouts, it's even L4TOUT around 20-25ms.

By not using THREAD_WANT_SYNC() anymore and only calling the server updates
under thread_isolate(), we can avoid all these wakeups. The CPU usage on
the same config drops to around 44% on the same machine, with all checks
being delivered at ~1900 checks per second, and the stats page shows no
more timeouts, even at 10 ms check interval. The difference is mainly
caused by the fact that there's no more need to wait for a thread to wake
up from poll() before starting to process check results.
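
For illustration only, the "check the list first, isolate only when there
is work" pattern described above can be modelled in a few lines of plain C.
This is a single-threaded sketch, not HAProxy code: `struct pending`,
`queue_update()`, `apply_pending_updates()`, `fake_isolate()` and
`fake_release()` are hypothetical stand-ins for the updated_servers list,
srv_register_update(), servers_check_for_updates(), thread_isolate() and
thread_release(), and no locking is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one queued server state change. */
struct pending {
	struct pending *next;
	int *target;     /* state variable to update */
	int new_state;   /* value to commit */
};

static struct pending *pending_head; /* plays the role of updated_servers */
static int isolation_count;          /* number of times isolation was paid for */

/* Stand-ins for thread_isolate()/thread_release(): only count the calls. */
static void fake_isolate(void) { isolation_count++; }
static void fake_release(void) { }

/* Queue an update without waking anyone up (no sync request). */
static void queue_update(struct pending *p, int *target, int new_state)
{
	assert(target != NULL);
	p->target = target;
	p->new_state = new_state;
	p->next = pending_head;
	pending_head = p;
}

/* Called once per poll loop iteration: cheap emptiness test first, and
 * the isolated section is entered only when something is queued.
 * Returns the number of updates applied.
 */
static int apply_pending_updates(void)
{
	int applied = 0;

	if (!pending_head)
		return 0;

	fake_isolate();
	while (pending_head) {
		struct pending *p = pending_head;

		pending_head = p->next;
		*p->target = p->new_state;
		applied++;
	}
	fake_release();
	return applied;
}
```

In this model, calling apply_pending_updates() on an empty list costs one
pointer test and never touches the isolation counter, while any number of
queued updates is committed under a single isolation.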
---

diff --git a/src/haproxy.c b/src/haproxy.c
index bf982c59c2..fcba6cd222 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -2375,6 +2375,16 @@ void mworker_pipe_register()
 	fd_want_recv(mworker_pipe[0]);
 }
 
+/* verifies if some servers' statuses need to be updated and call the update */
+static inline void servers_check_for_updates()
+{
+	if (!LIST_ISEMPTY(&updated_servers)) {
+		thread_isolate();
+		servers_update_status();
+		thread_release();
+	}
+}
+
 static int sync_poll_loop()
 {
 	int stop = 0;
@@ -2387,12 +2397,6 @@ static int sync_poll_loop()
 	if (!THREAD_NEED_SYNC())
 		goto exit;
 
-	/* *** { */
-	/* Put here all sync functions */
-
-	servers_update_status(); /* Commit server status changes */
-
-	/* *** } */
 exit:
 	stop = (jobs == 0); /* stop when there's nothing left to do */
 	THREAD_EXIT_SYNC();
@@ -2446,6 +2450,8 @@ static void run_poll_loop()
 			HA_ATOMIC_AND(&sleeping_thread_mask, ~tid_bit);
 		fd_process_cached_events();
 
+		/* check for server status updates */
+		servers_check_for_updates();
 
 		/* Synchronize all polling loops */
 		if (sync_poll_loop())
diff --git a/src/server.c b/src/server.c
index 1aedaf88c5..f0d510ac80 100644
--- a/src/server.c
+++ b/src/server.c
@@ -2675,7 +2675,6 @@ struct server *server_find_best_match(struct proxy *bk, char *name, int id, int
 static void srv_register_update(struct server *srv)
 {
 	if (LIST_ISEMPTY(&srv->update_status)) {
-		THREAD_WANT_SYNC();
 		HA_SPIN_LOCK(UPDATED_SERVERS_LOCK, &updated_servers_lock);
 		if (LIST_ISEMPTY(&srv->update_status))
 			LIST_ADDQ(&updated_servers, &srv->update_status);