This reduces the average wakeup latency of sc_conn_io_cb() from 1900 to
51us, and the L2 cache misses went from 1.4 to 1.2 billion for 20k req,
but the performance is not better. Also, there are situations where we
must not perform such wakeups: they may only be done from h2_io_cb,
hence the test on the next_tasklet pointer and its reset when leaving
the function. In practice, all callers of h2s_close() or h2s_destroy()
can reach that code; this includes h2_detach, h2_snd_buf, h2_shut, etc.
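The queueing decision described above can be illustrated with a small toy model. This is only a sketch and not HAProxy's scheduler: the toy_* types and functions are hypothetical, and the in_io_cb flag stands in for the check that the currently running tasklet is h2_io_cb. It shows how chaining through next_tasklet places woken streams right behind each other at the front of the queue, while callers outside the I/O callback fall back to a regular tail wakeup.

```c
#include <stddef.h>

/* Toy model only (hypothetical names): a run queue as a singly linked list. */
struct toy_tasklet {
    const char *name;
    struct toy_tasklet *next;
};

struct toy_queue {
    struct toy_tasklet *first;
    struct toy_tasklet *last;
};

/* Stand-in for the connection: remembers the last instantly-woken tasklet. */
struct toy_conn {
    struct toy_tasklet *next_tasklet;
};

/* Regular wakeup: append at the tail, behind everything already queued. */
static void toy_wakeup(struct toy_queue *q, struct toy_tasklet *t)
{
    t->next = NULL;
    if (q->last)
        q->last->next = t;
    else
        q->first = t;
    q->last = t;
}

/* Instant wakeup: insert right after <prev>, or at the front of the queue
 * when <prev> is NULL. Returns <t> so that the caller can chain the next one.
 */
static struct toy_tasklet *toy_wakeup_after(struct toy_queue *q,
                                            struct toy_tasklet *prev,
                                            struct toy_tasklet *t)
{
    if (prev) {
        t->next = prev->next;
        prev->next = t;
        if (q->last == prev)
            q->last = t;
    } else {
        t->next = q->first;
        q->first = t;
        if (!q->last)
            q->last = t;
    }
    return t;
}

/* Same shape as the test in the patch: chain instant wakeups if one is
 * already pending or if we are called from the I/O callback, otherwise
 * use the regular wakeup.
 */
static void toy_notify_recv(struct toy_queue *q, struct toy_conn *c,
                            struct toy_tasklet *t, int in_io_cb)
{
    if (c->next_tasklet || in_io_cb)
        c->next_tasklet = toy_wakeup_after(q, c->next_tasklet, t);
    else
        toy_wakeup(q, t);
}
```

In this model, resetting c->next_tasklet to NULL when the I/O callback returns is what keeps later external callers on the regular wakeup path.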
Another test with 40 concurrent connections, transferring 40k 1MB objects
at different concurrency levels from 1 to 80 also showed a 21% drop in L2
cache misses, and a 2% perf improvement:
Before:
329,510,887,528 instructions
50,907,966,181 branches
843,515,912 branch-misses
2,753,360,222 cache-misses
19,306,172,474 L1-icache-load-misses
17,321,132,742 L1-dcache-load-misses
951,787,350 LLC-load-misses
44.660469000 seconds user
62.459354000 seconds sys
=> avg perf: 373 MB/s
After:
331,310,219,157 instructions
51,343,396,257 branches
851,567,572 branch-misses
2,183,369,149 cache-misses
19,129,827,134 L1-icache-load-misses
17,441,877,512 L1-dcache-load-misses
906,923,115 LLC-load-misses
42.795458000 seconds user
62.277983000 seconds sys
=> avg perf: 380 MB/s
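As a cross-check of the percentages quoted for this test, the relative change can be recomputed from the counters listed above. The helper below is a hypothetical illustration and is not part of the patch.

```c
/* Relative change from <before> to <after>, in percent (negative = drop). */
static double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Applied to the cache-misses lines (2,753,360,222 before, 2,183,369,149 after), this gives a drop of roughly 20.7%, and applied to the average throughput (373 to 380 MB/s), a gain of roughly 1.9%, consistent with the rounded 21% and 2% figures.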
With small requests, it is the L1 and L3 cache misses which dropped by
3% and 7% respectively, and the performance went up by 3%.
struct list blocked_list; /* list of streams blocked for other reasons (e.g. sfctl, dep) */
struct buffer_wait buf_wait; /* wait list for buffer allocations */
struct wait_event wait_event; /* To be used if we're waiting for I/Os */
+
+ struct list *next_tasklet; /* which applet to wake up next (NULL by default) */
};
h2c->proxy = prx;
h2c->task = NULL;
h2c->wait_event.tasklet = NULL;
+ h2c->next_tasklet = NULL;
h2c->shared_rx_bufs = NULL;
h2c->idle_start = now_ms;
if (tick_isset(h2c->timeout)) {
{
if (h2s->subs && h2s->subs->events & SUB_RETRY_RECV) {
TRACE_POINT(H2_EV_STRM_WAKE, h2s->h2c->conn, h2s);
- tasklet_wakeup(h2s->subs->tasklet);
+ if (h2s->h2c->next_tasklet ||
+ (th_ctx->current && th_ctx->current->process == h2_io_cb))
+ h2s->h2c->next_tasklet = tasklet_wakeup_after(h2s->h2c->next_tasklet, h2s->subs->tasklet);
+ else
+ tasklet_wakeup(h2s->subs->tasklet);
h2s->subs->events &= ~SUB_RETRY_RECV;
if (!h2s->subs->events)
h2s->subs = NULL;
/* If we were in an idle list, we want to add it back into it,
* unless h2_process() returned -1, which mean it has destroyed
* the connection (testing !ret is enough, if h2_process() wasn't
- * called then ret will be 0 anyway.
+ * called then ret will be 0 anyway. Otherwise we reset the next
+ * tasklet to disable instant wakeups from external callers.
*/
if (ret < 0)
t = NULL;
+ else
+ h2c->next_tasklet = NULL;
if (!ret && conn_in_list) {
struct server *srv = objt_server(conn->target);