OPTIM: sink: try to merge "dropped" messages faster
When a reader doesn't read fast enough and causes drops, subsequent
threads try to produce a "dropped" message. But it takes time to
produce and emit this message, in part due to the use of chunk_printf()
that relies on vfprintf() which has to parse the printf format, and
during this time other threads may continue to increment the counter.
This is the reason why this is currently performed in a loop. When
reading what is received, it's common to see a large count followed
by one or two single-digit counts, indicating that we could possibly
have improved that by writing faster.
Let's improve the situation a little bit. First we're now using a
static message prefixed with enough space to write the digits, and a
call to ultoa_r() fills these digits from right to left so that we
don't have to process a format string nor perform a copy of the message.
Second, we now re-check the counter immediately after having prepared
the message so that we still get an opportunity for updating it. In
order to avoid too long loops, this is limited to 10 iterations.
Tests show that the number of single-digit "dropped" counters on output
now dropped roughly by 15-30%. Also, it was observed that with 8 threads,
there's almost never more than one retry.