From: Willy Tarreau
Date: Thu, 16 Feb 2023 08:19:21 +0000 (+0100)
Subject: BUG/MEDIUM: sched: allow a bit more TASK_HEAVY to be processed when needed
X-Git-Tag: v2.8-dev5~190
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=ba4c7a15978deaf74b6af09d2a13b4fff7ccea74;p=thirdparty%2Fhaproxy.git

BUG/MEDIUM: sched: allow a bit more TASK_HEAVY to be processed when needed

As reported in github issue #1881, there are situations where an excess
of TLS handshakes can cause a livelock. What's happening is that
normally we process at most one TLS handshake per loop iteration to
keep the latency low. This is done by tagging them with TASK_HEAVY and
queuing these tasklets in the TL_HEAVY queue. But if something slows
down the loop, such as a connect() call when no more ports are
available, we could end up processing no more than a few hundred or
thousand handshakes per second. If that limit becomes lower than the
rate of incoming handshakes, we will accumulate them and at some point
users will get impatient and give up or retry.

Then a new problem happens: the queue fills up with even more handshake
attempts, only one of which will be handled per iteration, so we can
end up processing only outdated handshakes at a low rate, with
basically nothing else in the queue. This can for example happen in
parallel with health checks, which do not require incoming handshakes
to succeed and thus keep generating enough activity to maintain the
high-latency condition.

Here we're taking a slightly different approach. First, instead of
always allowing only one handshake per loop (and usually it's critical
for latency), we take the current situation into account (a standalone
sketch of the resulting rule follows the message below):

  - if configured with tune.sched.low-latency, the limit remains 1

  - if there are other non-heavy tasks, we set the limit to 1 + one per
    1024 tasks, so that a heavily loaded queue of 4k handshakes per
    thread will be able to drain them at ~4 per loop with a limited
    impact on latency

  - if there are no other tasks, the limit grows to 1 + one per 128
    tasks, so that a heavily loaded queue of 4k handshakes per thread
    will be able to drain them at ~32 per loop, still with a very
    limited impact on latency since only I/O will get delayed

It was verified on a 56-core Xeon-8480 that this did not degrade the
latency; all requests remained below 1ms end-to-end in full close +
handshake, and even 500us under low-lat + busy-polling.

This must be backported to 2.4.
---
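For illustration, a minimal stand-alone sketch of the budget rule above;
heavy_budget() and its parameter names are hypothetical stand-ins for
GTUNE_SCHED_LOW_LATENCY, tt->tl_class_mask and tt->rq_total, while the
constants and the three-way split mirror the patch:

#include <stdio.h>

/* Hypothetical stand-alone version of the budget rule; heavy_budget()
 * and its parameters are illustrative names, not the haproxy API.
 */
static unsigned heavy_budget(int low_latency, int other_classes_queued,
                             unsigned rq_total)
{
	if (low_latency)
		return 1;                    /* tune.sched.low-latency: stay at 1 */
	if (other_classes_queued)
		return 1 + rq_total / 1024;  /* mixed load: ~4 for 4k tasks */
	return 1 + rq_total / 128;           /* heavy-only: ~32 for 4k tasks */
}

int main(void)
{
	/* 4096 queued handshakes per thread, as in the message above */
	printf("low-latency: %u\n", heavy_budget(1, 1, 4096)); /* 1 */
	printf("mixed:       %u\n", heavy_budget(0, 1, 4096)); /* 5 */
	printf("heavy-only:  %u\n", heavy_budget(0, 0, 4096)); /* 33 */
	return 0;
}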
diff --git a/src/task.c b/src/task.c
index d4625535ab..990ab42813 100644
--- a/src/task.c
+++ b/src/task.c
@@ -770,11 +770,26 @@ void process_runnable_tasks()
 	for (queue = 0; queue < TL_CLASSES; queue++)
 		max[queue] = ((unsigned)max_processed * max[queue] + max_total - 1) / max_total;
 
-	/* The heavy queue must never process more than one task at once
-	 * anyway.
+	/* The heavy queue must never process more than very few tasks at once
+	 * anyway. We set the limit to 1 if running on low_latency scheduling,
+	 * given that we know that other values can have an impact on latency
+	 * (~500us end-to-end connection achieved at 130kcps in SSL), 1 + one
+	 * per 1024 tasks if there is at least one non-heavy task while still
+	 * respecting the ratios above, or 1 + one per 128 tasks if only heavy
+	 * tasks are present. This allows to drain excess SSL handshakes more
+	 * efficiently if the queue becomes congested.
 	 */
-	if (max[TL_HEAVY] > 1)
-		max[TL_HEAVY] = 1;
+	if (max[TL_HEAVY] > 1) {
+		if (global.tune.options & GTUNE_SCHED_LOW_LATENCY)
+			budget = 1;
+		else if (tt->tl_class_mask & ~(1 << TL_HEAVY))
+			budget = 1 + tt->rq_total / 1024;
+		else
+			budget = 1 + tt->rq_total / 128;
+
+		if (max[TL_HEAVY] > budget)
+			max[TL_HEAVY] = budget;
+	}
 
 	lrq = grq = NULL;
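And a rough, equally hypothetical sketch of the drain-rate claim: it
counts how many scheduler loops are needed to empty a 4k-task
heavy-only backlog (no new arrivals assumed) under the old fixed budget
of 1 versus the new 1 + backlog/128 rule:

#include <stdio.h>

/* Illustration only, not haproxy code: count how many scheduler loops
 * are needed to drain a backlog of heavy tasks, assuming no new
 * arrivals, under the old fixed budget of 1 versus the new heavy-only
 * budget of 1 + backlog/128 from the patch above.
 */
static unsigned loops_to_drain(unsigned backlog, int new_policy)
{
	unsigned loops = 0;

	while (backlog) {
		unsigned budget = new_policy ? 1 + backlog / 128 : 1;

		backlog -= budget < backlog ? budget : backlog;
		loops++;
	}
	return loops;
}

int main(void)
{
	printf("old policy: %u loops\n", loops_to_drain(4096, 0)); /* 4096 */
	printf("new policy: %u loops\n", loops_to_drain(4096, 1)); /* a few hundred */
	return 0;
}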