From: Willy Tarreau
Date: Fri, 30 Jan 2026 12:41:08 +0000 (+0100)
Subject: MEDIUM: backend: make "balance random" consider req rate when loads are equal
X-Git-Tag: v3.4-dev4~2
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=b6bdb2553bd1031acfeadea3dfee5366dabed462;p=thirdparty%2Fhaproxy.git

MEDIUM: backend: make "balance random" consider req rate when loads are equal

As reported by Damien Claisse and Cédric Paillet, the "random" LB
algorithm can become particularly unfair with large numbers of servers
having few connections. It's indeed fairly common to see many servers
with zero connections in a thousand-server farm, and in this case the
P2C algo, which consists in checking the servers' loads, doesn't help
at all and is basically similar to random(1). In that case we only
rely on the distribution of server IDs in the random space to pick the
best server, and huge discrepancies can be observed.

An attempt to model the problem clearly shows that with 1600 servers
of weight 10, for 1 million requests, the lowest loaded ones will take
300 req while the most loaded ones will get 780, with most of the
values between 520 and 700. In addition, only the lower 28 bits of the
server IDs are used for the key calculation, which makes node keys
more deterministic. Setting random keys in the lowest 28 bits only
packs the values a bit better, with a min around 530 and a max around
710, and values mostly between 550 and 680. This can only be
compensated for by increasing weights and draws, without being a
perfect fix either: at 4 draws, the min is around 560 and the max
around 670, with most values between 590 and 650.

This patch takes another approach to the problem: when servers are
tied on their loads, instead of arbitrarily taking the second one, we
now compare their current request rates, which are updated all the
time and smoothed over one second, and we pick the server with the
lowest request rate. Now with 2 draws, the curve is mostly flat, with
the min at 580 and the max at 628, and almost all values between 611
and 625. And 4 draws exclusively gives values from 614 to 624.

Other points will need to be addressed separately (bits of server ID,
maybe refining the hash algorithm), but these would affect how caches
are selected and cannot be changed without an extra option. For random
however we can perform the change without impacting anyone.

This should be backported, probably only to 3.3 since that's where the
"random" algo became the default.
---

diff --git a/doc/configuration.txt b/doc/configuration.txt
index 886b5fd54..33fba9b85 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -6283,8 +6283,16 @@ balance url_param [check_post]
                    will take away N-1 of the highest loaded servers at the
                    expense of performance. With very high values, the
                    algorithm will converge towards the leastconn's result
                    but much slower.
+                   In addition, for large server farms with very low loads (or
+                   perfect balance), comparing loads will often lead to a tie,
+                   so in case of equal loads between all measured servers, their
+                   request rates over the last second are compared, which allows
+                   to better balance server usage over time in the same spirit
+                   as roundrobin does, and smooth consistent hash unfairness.
                    The default value is 2, which generally shows very good
-                   distribution and performance. This algorithm is also known as
+                   distribution and performance. For large farms with low loads
+                   (less than a few requests per second per server), it may help
+                   to raise it to 3 or even 4. This algorithm is also known as
                    the Power of Two Random Choices and is described here :
                    http://www.eecs.harvard.edu/~michaelm/postscripts/handbook2001.pdf
diff --git a/src/backend.c b/src/backend.c
index 3f00e1d19..73b39306b 100644
--- a/src/backend.c
+++ b/src/backend.c
@@ -576,9 +576,20 @@ struct server *get_server_rnd(struct stream *s, const struct server *avoid)
         /* compare the new server to the previous best choice and pick
          * the one with the least currently served requests.
          */
-        if (prev && prev != curr &&
-            curr->served * prev->cur_eweight > prev->served * curr->cur_eweight)
-            curr = prev;
+        if (prev && prev != curr) {
+            uint64_t wcurr = (uint64_t)curr->served * prev->cur_eweight;
+            uint64_t wprev = (uint64_t)prev->served * curr->cur_eweight;
+
+            if (wcurr > wprev)
+                curr = prev;
+            else if (wcurr == wprev && curr->counters.shared.tg && prev->counters.shared.tg) {
+                /* same load: pick the lowest weighted request rate */
+                wcurr = read_freq_ctr_period_estimate(&curr->counters._sess_per_sec, MS_TO_TICKS(1000));
+                wprev = read_freq_ctr_period_estimate(&prev->counters._sess_per_sec, MS_TO_TICKS(1000));
+                if (wprev * curr->cur_eweight < wcurr * prev->cur_eweight)
+                    curr = prev;
+            }
+        }
     } while (--draws > 0);
 
     /* if the selected server is full, pretend we have none so that we reach
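
Note: below is a minimal standalone sketch (not part of the patch above)
illustrating how the request-rate tie-break flattens the distribution in
the "idle large farm" scenario modelled in the commit message. All names
(struct srv_model, pick_p2c, RATE_WINDOW) are hypothetical; candidates are
drawn with a plain uniform rand() instead of HAProxy's consistent-hash
lookup, and the per-second rate is approximated by a crude windowed counter
rather than the smoothed freq_ctr read in get_server_rnd(), so the printed
numbers will not exactly match those quoted above.

/* p2c_rate_tiebreak.c - toy model of "balance random" with P2C and a
 * request-rate tie-break. Build with: cc -O2 -o p2c p2c_rate_tiebreak.c
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NSRV        1600     /* servers in the farm */
#define WEIGHT      10       /* identical weight for every server */
#define NREQ        1000000  /* requests to distribute */
#define DRAWS       2        /* P2C: number of random candidates */
#define RATE_WINDOW 1000     /* requests per simulated "second" */

struct srv_model {
    unsigned served;    /* currently served requests */
    unsigned eweight;   /* effective weight */
    unsigned rate;      /* requests taken in the current window */
    unsigned total;     /* total picks, to measure fairness */
};

static struct srv_model srv[NSRV];

/* Pick one server: least weighted load among DRAWS random candidates,
 * ties broken by the lowest weighted request rate, as in the patch.
 */
static struct srv_model *pick_p2c(void)
{
    struct srv_model *curr = NULL, *prev;
    int draws = DRAWS;

    do {
        prev = curr;
        curr = &srv[rand() % NSRV];

        if (prev && prev != curr) {
            uint64_t wcurr = (uint64_t)curr->served * prev->eweight;
            uint64_t wprev = (uint64_t)prev->served * curr->eweight;

            if (wcurr > wprev)
                curr = prev;
            else if (wcurr == wprev &&
                     (uint64_t)prev->rate * curr->eweight <
                     (uint64_t)curr->rate * prev->eweight)
                curr = prev;
        }
    } while (--draws > 0);
    return curr;
}

int main(void)
{
    unsigned min = ~0U, max = 0;
    int i, j;

    for (i = 0; i < NSRV; i++)
        srv[i].eweight = WEIGHT;

    for (i = 0; i < NREQ; i++) {
        /* idle farm: requests complete immediately, so "served"
         * stays at zero and the load comparison is always a tie.
         */
        struct srv_model *s = pick_p2c();

        s->rate++;
        s->total++;

        if ((i + 1) % RATE_WINDOW == 0)     /* start a new "second" */
            for (j = 0; j < NSRV; j++)
                srv[j].rate = 0;
    }

    for (i = 0; i < NSRV; i++) {
        if (srv[i].total < min)
            min = srv[i].total;
        if (srv[i].total > max)
            max = srv[i].total;
    }
    printf("min=%u max=%u (ideal %d per server)\n", min, max, NREQ / NSRV);
    return 0;
}

Running it with DRAWS set to 1 and then 2 shows the spread between the
least and most loaded servers collapsing once the tie-break is active,
which is the effect the measurements in the commit message describe.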