git.ipfire.org Git - thirdparty/haproxy.git/log

]> git.ipfire.org Git - thirdparty/haproxy.git/log

projects / thirdparty / haproxy.git / log

Willy Tarreau [Tue, 21 Mar 2023 09:50:51 +0000 (10:50 +0100)]

BUG/MEDIUM: applet: only set appctx->sedesc on successful allocation

If appctx_new_on() fails to allocate a task, it will not remove the
freshly allocated sedesc from the appctx despite freeing it, causing
a UAF. Let's only assign appctx->sedesc upon success.

This needs to be backported to 2.6. In 2.6 the function is slightly
different and called appctx_new(), though the issue is exactly the
same.

commit | commitdiff | tree

Willy Tarreau [Tue, 21 Mar 2023 09:44:44 +0000 (10:44 +0100)]

BUG/MEDIUM: mux-h1: properly destroy a partially allocated h1s

In h1c_frt_stream_new() and h1c_bck_stream_new(), if we fail to completely
initialize the freshly allocated h1s, typically because sc_attach_mux()
fails, we must use h1s_destroy() to de-initialize it. Otherwise it stays
attached to the h1c when released, causing use-after-free upon the next
wakeup. This can be triggered upon memory shortage.

This needs to be backported to 2.6.

commit | commitdiff | tree

Willy Tarreau [Tue, 21 Mar 2023 08:24:53 +0000 (09:24 +0100)]

MINOR: pools: preset the allocation failure rate to 1% with -dMfail

Using -dMfail alone does nothing unless tune.fail-alloc is set, which
renders it pretty useless as-is, and is not intuitive. Let's change
this so that the filure rate is preset to 1% when the option is set on
the command line. This allows to inject failures without having to edit
the configuration.

commit | commitdiff | tree

Willy Tarreau [Tue, 21 Mar 2023 08:15:13 +0000 (09:15 +0100)]

MINOR: dynbuf: set POOL_F_NO_FAIL on buffer allocation

b_alloc() is used to allocate a buffer. We can provoke fault injection
based on forced memory allocation failures using -dMfail on the command
line, but we know that the buffer_wait list is a bit weak and doesn't
always recover well. As such, submitting buffer allocation to such a
treatment seriously limits the usefulness of -dMfail which cannot really
be used for other purposes. Let's just disable it for buffers for now.

commit | commitdiff | tree

Willy Tarreau [Mon, 20 Mar 2023 18:53:14 +0000 (19:53 +0100)]

BUG/MINOR: stconn: fix sedesc memory leak on stream allocation failure

If we fail to allocate a new stream in sc_new_from_endp(), and the call
to sc_new() allocated the sedesc itself (which normally doesn't happen),
then it doesn't get released on the failure path. Let's explicitly
handle this case so that it's not overlooked and avoids some head
scratching sessions.

This may be backported to 2.6.

commit | commitdiff | tree

Willy Tarreau [Mon, 20 Mar 2023 18:45:41 +0000 (19:45 +0100)]

BUG/MEDIUM: stconn: don't set the type before allocation succeeds

There's an occasional crash that can be triggered in sc_detach_endp()
when calling conn->mux->detach() upon memory allocation error. The
problem in fact comes from sc_attach_mux(), which doesn't reset the
sc type flags upon tasklet allocation failure, leading to an attempt
at detaching an incompletely initialized stconn. Let's just attach
the sc after the tasklet allocation succeeds, not before.

This must be backported to 2.6.

commit | commitdiff | tree

Willy Tarreau [Mon, 20 Mar 2023 18:16:04 +0000 (19:16 +0100)]

BUG/MEDIUM: mux-h2: erase h2c->wait_event.tasklet on error path

On the allocation error path in h2_init() we may check if
h2c->wait_event.tasklet needs to be released but it has not yet been
zeroed. Let's do this before jumping to the freeing location.

This needs to be backported to all maintained versions.

commit | commitdiff | tree

Willy Tarreau [Mon, 20 Mar 2023 18:14:47 +0000 (19:14 +0100)]

BUG/MEDIUM: mux-h2: do not try to free an unallocated h2s->sd

In h2s_close() we may dereference h2s->sd to get the sc, but this
function may be called on allocation error paths, so we must check
for this specific condition. Let's also update the comment to make
it explicitly permitted.

This needs to be backported to 2.6.

commit | commitdiff | tree

Willy Tarreau [Mon, 20 Mar 2023 18:11:08 +0000 (19:11 +0100)]

BUG/MEDIUM: stream: do not try to free a failed stream-conn

In stream_free() if we fail to allocate s->scb() we go to the path where
we try to free it, and it doesn't like being called with a null at all.
It's easily reproducible with -dMfail,no-cache and "tune.fail-alloc 10"
in the global section.

This must be backported to 2.6.

commit | commitdiff | tree

Frédéric Lécaille [Mon, 20 Mar 2023 16:23:19 +0000 (17:23 +0100)]

BUG/MINOR: quic: Dysfunctional 01RTT packet number space probing

This bug arrived with this commit:
"MINOR: quic: implement qc_notify_send()".
The ->tx.pto_probe variable was no more set when qc_processt_timer() the timer
task for the connection responsible of detecting packet loss and probing upon
PTO expiration leading to interrupted stream transfers. This was revealed by
blackhole interop failed tests where one could see that qc_process_timer()
was wakeup without traces as follows in the log file:
"needs to probe 01RTT packet number space"

Must be backported to 2.7 and to 2.6 if the commit mentionned above
is backported to 2.6 in the meantime.

commit | commitdiff | tree

Frédéric Lécaille [Wed, 15 Mar 2023 16:21:13 +0000 (17:21 +0100)]

MINOR: quic: Stop stressing the acknowledgments process (RX ACK frames)

The ACK frame range of packets were handled from the largest to the smallest
packet number, leading to big number of ebtree insertions when the packet are
handled in the inverse way they are sent. This was detected a long time ago
but left in the code to stress our implementation. It is time to be more
efficient and process the packet so that to avoid useless ebtree insertions.

Modify qc_ackrng_pkts() responsible of handling the acknowledged packets from an
ACK frame range of acknowledged packets.

Must be backported to 2.7.

commit | commitdiff | tree

Willy Tarreau [Fri, 17 Mar 2023 15:13:05 +0000 (16:13 +0100)]

MINOR: ssl-sock: pass the CO_SFL_MSG_MORE info down the stack

Despite having replaced the SSL BIOs to use our own raw_sock layer, we
still didn't exploit the CO_SFL_MSG_MORE flag which is pretty useful to
avoid sending incomplete packets. It's particularly important for SSL
since the extra overhead almost guarantees that each send() will be
followed by an incomplete (and often odd-sided) segment.

We already have an xprt_st set of flags to pass info to the various
layers, so let's just add a new one, SSL_SOCK_SEND_MORE, that is set
or cleared during ssl_sock_from_buf() to transfer the knowledge of
CO_SFL_MSG_MORE. This way we can recover this information and pass
it to raw_sock.

This alone is sufficient to increase by ~5-10% the H2 bandwidth over
SSL when multiple streams are used in parallel.

commit | commitdiff | tree

Willy Tarreau [Fri, 17 Mar 2023 15:09:14 +0000 (16:09 +0100)]

MINOR: mux-h2: set CO_SFL_MSG_MORE when sending multiple buffers

Traces show that sendto() rarely has MSG_MORE on H2 despite sending
multiple buffers. The reason is that the loop iterating over the buffer
ring doesn't have this info and doesn't pass it down.

But now we know how many buffers are left to be sent, so we know whether
or not the current buffer is the last one. As such we can set this flag
for all buffers but the last one.

commit | commitdiff | tree

Willy Tarreau [Fri, 17 Mar 2023 11:30:38 +0000 (12:30 +0100)]

OPTIM: mux-h1: limit first read size to avoid wrapping

Before muxes were used, we used to refrain from reading past the
buffer's reserve. But with muxes which have their own buffer, this
rule was a bit forgotten, resulting in an extraneous read to be
performed just because the rx buffer cannot be entirely transferred
to the stream layer:

  sendto(12, "GET /?s=16k HTTP/1.1\r\nhost: 127."..., 84, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 84
  recvfrom(12, "HTTP/1.1 200\r\nContent-length: 16"..., 16320, 0, NULL, NULL) = 16320
  recvfrom(12, ".123456789.12345", 16, 0, NULL, NULL) = 16
  recvfrom(12, "6789.123456789.12345678\n.1234567"..., 15244, 0, NULL, NULL) = 182
  recvfrom(12, 0x1e5d5d6, 15062, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)

Here the server sends 16kB of payload after a headers block, the mux reads
16320 into the ibuf, and the stream layer consumes 15360 from the first
h1_rcv_buf(), which leaves 960 into the buffer and releases a few indexes.
The buffer cannot be realigned due to these remaining data, and a subsequent
read is made on 16 bytes, then again on 182 bytes.

By avoiding to read too much on the first call, we can avoid needlessly
filling this buffer:

  recvfrom(12, "HTTP/1.1 200\r\nContent-length: 16"..., 15360, 0, NULL, NULL) = 15360
  recvfrom(12, "456789.123456789.123456789.12345"..., 16220, 0, NULL, NULL) = 1158
  recvfrom(12, 0x1d52a3a, 15062, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)

This is much more efficient and uses less RAM since the first buffer that
was emptied can now be released.

Note that a further improvement (tested) consists in reading even less
(typically 1kB) so that most of the data are transferred in zero-copy, and
are not read until process_stream() is scheduled. This patch doesn't do that
for now so that it can be backported without any obscure impact.

commit | commitdiff | tree

Willy Tarreau [Fri, 17 Mar 2023 15:40:09 +0000 (16:40 +0100)]

BUG/MAJOR: qpack: fix possible read out of bounds in static table

CertiK Skyfall Team reported that passing an index greater than
QPACK_SHT_SIZE in a qpack instruction referencing a literal field
name with name reference or and indexed field line will cause a
read out of bounds that may crash the process, and confirmed that
this fix addresses the issue.

This needs to be backported as far as 2.5.

commit | commitdiff | tree

Aurelien DARRAGON [Fri, 17 Mar 2023 10:46:37 +0000 (11:46 +0100)]

MINOR: doc: missing entries for sc-add-gpc()

When sc-add-gpc() action was implemented in 5a72d03 ("MINOR:
stick-table: implement the sc-add-gpc() action"), its usage was
only documented for "http-request", but according to the code it
now applies everywhere sc-inc-gpc() is mentioned.

Adding the missing entries in the doc everywhere the action may
be used.

The issue was detected by the haproxy-controller bot and was reported
by Pratik Mohanty and Marko Juraga.

No backport needed, unless 5a72d03 ("MINOR: stick-table: implement the
sc-add-gpc() action") is being backported.

commit | commitdiff | tree

Aurelien DARRAGON [Fri, 17 Mar 2023 10:28:58 +0000 (11:28 +0100)]

MINOR: stick-table: add sc-add-gpc() to http-after-response

sc-add-gpc() was implemented in 5a72d03 ("MINOR: stick-table:
implement the sc-add-gpc() action")

This new action was exposed everywhere sc-inc-gpc() is available,
except for http-after-response.
But there doesn't seem to be a technical constraint that prevents us from
exposing it in http-after-response.
It was probably overlooked, let's add it.

No backport needed, unless 5a72d03 ("MINOR: stick-table: implement the
sc-add-gpc() action") is being backported.

commit | commitdiff | tree

Frédéric Lécaille [Fri, 17 Mar 2023 07:56:50 +0000 (08:56 +0100)]

BUG/MINOR: quic: Missing STREAM frame data pointer updates

This patch follows this one which was not sufficient:
"BUG/MINOR: quic: Missing STREAM frame length updates"
Indeed, it is not sufficient to update the ->len and ->offset member
of a STREAM frame to move it forward. The data pointer must also be updated.
This is not done by the STREAM frame builder.

Must be backported to 2.6 and 2.7.

commit | commitdiff | tree

Willy Tarreau [Thu, 16 Mar 2023 16:30:30 +0000 (17:30 +0100)]

BUG/MINOR: mux-h2: set CO_SFL_STREAMER when sending lots of data

Emeric noticed that h2 bit-rate performance was always slightly lower
than h1 when the CPU is saturated. Strace showed that we were always
data in 2kB chunks, corresponding to the max_record size. What's
happening is that when this mechanism of dynamic record size was
introduced, the STREAMER flag at the stream level was relied upon.
Since all this was moved to the muxes, the flag has to be passed as
an argument to the snd_buf() function, but the mux h2 did not use it
despite a comment mentioning it, probably because before the multi-buf
it was not easy to figure the status of the buffer.

The solution here consists in checking if the mbuf is congested or not,
by checking if it has more than one buffer allocated. If so we set the
CO_SFL_STREAMER flag, otherwise we don't. This way moderate size
exchanges continue to be made over small chunks, but downloads will
be able to use the large ones.

While it could be backported to all supported versions, it would be
better to limit it to the last LTS, so let's do it for 2.7 and 2.6 only.
This patch requires previous commit "MINOR: buffer: add br_single() to
check if a buffer ring has more than one buf".

commit | commitdiff | tree

Willy Tarreau [Thu, 16 Mar 2023 15:47:44 +0000 (16:47 +0100)]

BUG/MEDIUM: mux-h2: only restart sending when mux buffer is decongested

During performance tests, Emeric faced a case where the wakeups of
sc_conn_io_cb() caused by h2_resume_each_sending_h2s() was multiplied
by 5-50 and a lot of CPU was being spent doing this for apparently no
reason.

The culprit is h2_send() not behaving well with congested buffers and
small SSL records. What happens when the output is congested is that
all buffers are full, and data are emitted in 2kB chunks, which are
sufficient to wake all streams up again to ask them to send data again,
something that will obviously only work for one of them at best, and
waste a lot of CPU in wakeups and memcpy() due to the small buffers.
When this happens, the performance can be divided by 2-2.5 on large
objects.

Here the chosen solution against this is to keep in mind that as long
as there are still at least two buffers in the ring after calling
xprt->snd_buf(), it means that the output is congested and there's
no point trying again, because these data will just be placed into
such buffers and will wait there. Instead we only mark the buffer
decongested once we're back to a single allocated buffer in the ring.

By doing so we preserve the ability to deal with large concurrent
bursts while not causing a thundering herd by waking all streams for
almost nothing.

This needs to be backported to 2.7 and 2.6. Other versions could
benefit from it as well but it's not strictly necessary, and we can
reconsider this option if some excess calls to sc_conn_io_cb() are
faced.

Note that this fix depends on this recent commit:
MINOR: buffer: add br_single() to check if a buffer ring has more than one buf

commit | commitdiff | tree

Willy Tarreau [Thu, 16 Mar 2023 16:30:04 +0000 (17:30 +0100)]

MINOR: buffer: add br_single() to check if a buffer ring has more than one buf

It's cheaper and cleaner than using br_count()==1 given that it just compares
two indexes, and that a ring having a single buffer is in a special case where
it is between empty and used up-to-1. In other words it's not congested.

commit | commitdiff | tree

Willy Tarreau [Thu, 16 Mar 2023 15:46:56 +0000 (16:46 +0100)]

MINOR: buffer: add br_count() to return the number of allocated bufs

We have no way to know how many buffers are currently allocated in a
buffer ring. Let's add br_count() for this.

commit | commitdiff | tree

Willy Tarreau [Thu, 16 Mar 2023 17:06:19 +0000 (18:06 +0100)]

BUG/MINOR: mux-h2: make sure the h2c task exists before refreshing it

When detaching a stream, if it's the last one and the mbuf is blocked,
we leave without freeing the stream yet. We also refresh the h2c task's
timeout, except that it's possible that there's no such task in case
there is no client timeout, causing a crash. The fix just consists in
doing this when the task exists.

This bug has always been there and is extremely hard to meet even
without a client timeout. This fix has to be backported to all
branches, but it's unlikely anyone has ever met it anyay.

commit | commitdiff | tree

Christopher Faulet [Thu, 16 Mar 2023 10:43:05 +0000 (11:43 +0100)]

BUG/MEDIUM: connection: Preserve flags when a conn is removed from an idle list

The commit 5e1b0e7bf ("BUG/MEDIUM: connection: Clear flags when a conn is
removed from an idle list") introduced a regression. CO_FL_SAFE_LIST and
CO_FL_IDLE_LIST flags are used when the connection is released to properly
decrement used/idle connection counters. if a connection is idle, these
flags must be preserved till the connection is really released. It may be
removed from the list but not immediately released. If these flags are lost
when it is finally released, the current number of used connections is
erroneously decremented. If means this counter may become negative and the
counters tracking the number of idle connecitons is not decremented,
suggesting a leak.

So, the above commit is reverted and instead we improve a bit the way to
detect an idle connection. The function conn_get_idle_flag() must now be
used to know if a connection is in an idle list. It returns the connection
flag corresponding to the idle list if the connection is idle
(CO_FL_SAFE_LIST or CO_FL_IDLE_LIST) or 0 otherwise. But if the connection
is scheduled to be removed, 0 is also returned, regardless the connection
flags.

This new function is used when the connection is temporarily removed from
the list to be used, mainly in muxes.

This patch should fix #2078 and #2057. It must be backported as far as 2.2.

commit | commitdiff | tree

Frédéric Lécaille [Thu, 16 Mar 2023 11:30:36 +0000 (12:30 +0100)]

BUG/MINOR: quic: Missing STREAM frame length updates

Some STREAM frame lengths were not updated before being duplicated, built
of requeued contrary to their ack offsets. This leads haproxy to crash when
receiving acknowledgements for such frames with this thread #1 backtrace:

Thread 1 (Thread 0x7211b6ffd640 (LWP 986141)):
#0  ha_crash_now () at include/haproxy/bug.h:52
No locals.
#1  b_del (b=<optimized out>, del=<optimized out>) at include/haproxy/buf.h:436
No locals.
#2  qc_stream_desc_ack (stream=stream@entry=0x7211b6fd9bc8, offset=offset@entry=53176, len=len@entry=1122) at src/quic_stream.c:111

Thank you to @Tristan971 for having provided such traces which reveal this issue:

[04|quic|5|c_conn.c:1865] qc_requeue_nacked_pkt_tx_frms(): entering : qc@0x72119c22cfe0
[04|quic|5|_frame.c:1179] qc_frm_unref(): entering : qc@0x72119c22cfe0
[04|quic|5|_frame.c:1186] qc_frm_unref(): remove frame reference : qc@0x72119c22cfe0 frm@0x72118863d260 STREAM_F uni=0 fin=1 id=460 off=52957 len=1122 3244
[04|quic|5|_frame.c:1194] qc_frm_unref(): leaving : qc@0x72119c22cfe0
[04|quic|5|c_conn.c:1902] qc_requeue_nacked_pkt_tx_frms(): updated partially acked frame : qc@0x72119c22cfe0 frm@0x72119c472290 STREAM_F uni=0 fin=1 id=460 off=53176 len=1122

Note that haproxy has much more chance to crash if this frame is the last one
(fin bit set). But another condition must be fullfilled to update the ack offset.
A previous STREAM frame from the same stream with the same offset but with less
data must be acknowledged by the peer. This is the condition to update the ack offset.

For others frames without fin bit in the same conditions, I guess the stream may be
truncated because too much data are removed from the stream when they are
acknowledged.

Must be backported to 2.6 and 2.7.

commit | commitdiff | tree

Aurelien DARRAGON [Fri, 10 Mar 2023 15:53:43 +0000 (16:53 +0100)]

BUG/MINOR: tcp_sample: fix a bug in fc_dst_port and fc_dst_is_local sample fetches

There is a bug in the smp_fetch_dport() function which affects the 'f' case,
also known as 'fc_dst_port' sample fetch.

conn_get_src() is used to retrieve the address prior to calling conn_dst().
But this is wrong: conn_get_dst() should be used instead.

Because of that, conn_dst() may return unexpected results since the dst
address is not guaranteed to be set depending on the conn state at the time
the sample fetch is used.

This was reported by Corin Langosch on the ML:

during his tests he noticed that using fc_dst_port in a log-format string
resulted in the correct value being printed in the logs but when he used it
in an ACL, the ACL did not evaluate properly.

This can be easily reproduced with the following test conf:
    |frontend test-http
    |  bind 127.0.0.1:8080
    |  mode http
    |
    |  acl test fc_dst_port eq 8080
    |  http-request return status 200 if test
    |  http-request return status 500 if !test

A request on 127.0.0.1:8080 should normally return 200 OK, but here it
will return a 500.

The same bug was also found in smp_fetch_dst_is_local() (fc_dst_is_local
sample fetch) by reading the code: the fix was applied twice.

This needs to be backported up to 2.5
[both sample fetches were introduced in 2.5 with 888cd70 ("MINOR:
tcp-sample: Add samples to get original info about client connection")]

commit | commitdiff | tree

Christopher Faulet [Wed, 15 Mar 2023 18:10:55 +0000 (19:10 +0100)]

BUG/MEDIUM: mux-h1: Don't block SE_FL_ERROR if EOS is not reported on H1C

When a connection error is encountered during a receive, the error is not
immediatly reported to the SE descriptor. We take care to process all
pending input data first. However, when the error is finally reported, a
fatal error is reported only if a read0 was also received. Otherwise, only a
pending error is reported.

With a raw socket, it is not an issue because shutdowns for read and write
are systematically reported too when a connection error is detected. So in
this case, the fatal error is always reported to the SE descriptor. But with
a SSL socket, in case of pure SSL error, only the connection error is
reported. This prevent the fatal error to go up. And because the connection
is in error, no more receive or send are preformed on the socket. The mux is
blocked till a timeout is triggered at the stream level, looping infinitly
to progress.

To fix the bug, during the demux stage, when there is no longer pending
data, the read error is reported to the SE descriptor, regardless the
shutdown for reads was received or not. No more data are expected, so it is
safe.

This patch should fix the issue #2046. It must be backported to 2.7.

commit | commitdiff | tree

Christopher Faulet [Tue, 14 Mar 2023 14:51:33 +0000 (15:51 +0100)]

DEBUG: ssl-sock/show_fd: Display SSL error code

Like for connection error code, when FD are dumps, the ssl error code is now
displayed. This may help to diagnose why a connection error occurred.

This patch may be backported to help debugging.

commit | commitdiff | tree

Christopher Faulet [Tue, 14 Mar 2023 14:48:06 +0000 (15:48 +0100)]

DEBUG: cli/show_fd: Display connection error code

When FD are dumps, the connection error code is now displayed. This may help
to diagnose why a connection error occurred.

This patch may be backported to help debugging.

commit | commitdiff | tree

Christopher Faulet [Tue, 14 Mar 2023 13:41:55 +0000 (14:41 +0100)]

BUG/MEDIUM: resolvers: Properly stop server resolutions on soft-stop

When HAproxy is stopping, the DNS resolutions must be stopped, except those
triggered from a "do-resolve" action. To do so, the resolutions themselves
cannot be destroyed, the current design is too complex. However, it is
possible to mute the resolvers tasks. The same is already performed with the
health-checks. On soft-stop, the tasks are still running periodically but
nothing if performed.

For the resolvers, when the process is stopping, before running a
resolution, we check all the requesters attached to this resolution. If t
least a request is a stream or if there is a requester attached to a running
proxy, a new resolution is triggered. Otherwise, we ignored the
resolution. It will be evaluated again on the next wakeup. This way,
"do-resolv" action are still working during soft-stop but other resoluation
are stopped.

Of course, it may be see as a feature and not a bug because it was never
performed. But it is in fact not expected at all to still performing
resolutions when HAProxy is stopping. In addution, a proxy option will be
added to change this behavior.

This patch partially fixes the issue #1874. It could be backported to 2.7
and maybe to 2.6. But no further.

commit | commitdiff | tree

Christopher Faulet [Tue, 14 Mar 2023 13:33:11 +0000 (14:33 +0100)]

BUG/MEDIUM: proxy: properly stop backends on soft-stop

On soft-stop, we must properlu stop backends and not only proxies with at
least a listener. This is mandatory in order to stop the health checks. A
previous fix was provided to do so (ba29687bc1 "BUG/MEDIUM: proxy: properly
stop backends"). However, only stop_proxy() function was fixed. When HAproxy
is stopped, this function is no longer used. So the same kind of fix must be
done on do_soft_stop_now().

This patch partially fixes the issue #1874. It must be backported as far as
2.4.

commit | commitdiff | tree

Remi Tricot-Le Breton [Mon, 13 Mar 2023 14:56:35 +0000 (15:56 +0100)]

MINOR: ssl: Add certificate path to 'show ssl ocsp-response' output

The ocsp-related CLI commands tend to work with OCSP_CERTIDs as well as
certificate paths so the path should also be added to the output of the
"show ssl ocsp-response" command when no certid or path is provided.

commit | commitdiff | tree

Remi Tricot-Le Breton [Mon, 13 Mar 2023 14:56:34 +0000 (15:56 +0100)]

MINOR: ssl: Accept certpath as param in "show ssl ocsp-response" CLI command

In order to increase usability, the "show ssl ocsp-response" also takes
a frontend certificate path as parameter. In such a case, it behaves the
same way as "show ssl cert foo.pem.ocsp".

commit | commitdiff | tree

Remi Tricot-Le Breton [Mon, 13 Mar 2023 14:56:33 +0000 (15:56 +0100)]

BUG/MINOR: ssl: Fix double free in ocsp update deinit

If the last update before a deinit happens was successful, the pointer
to the httpclient in the ocsp update context was not reset while the
httpclient instance was already destroyed.

commit | commitdiff | tree

Remi Tricot-Le Breton [Mon, 13 Mar 2023 14:56:32 +0000 (15:56 +0100)]

MINOR: ssl: Use ocsp update task for "update ssl ocsp-response" command

Instead of having a dedicated httpclient instance and its own code
decorrelated from the actual auto update one, the "update ssl
ocsp-response" will now use the update task in order to perform updates.

Since the cli command allows to update responses that were never
included in the auto update tree, a new flag was added to the
certificate_ocsp structure so that the said entry can be inserted into
the tree "by hand" and it won't be reinserted back into the tree after
the update process is performed. The 'update_once' flag "stole" a bit
from the 'fail_count' counter since it is the one less likely to reach
UINT_MAX among the ocsp counters of the certificate_ocsp structure.

This new logic required that every certificate_ocsp entry contained all
the ocsp-related information at all time since entries that are not
supposed to be configured automatically can still be updated through the
cli. The logic of the ssl_sock_load_ocsp was changed accordingly.

commit | commitdiff | tree

Remi Tricot-Le Breton [Mon, 13 Mar 2023 14:56:31 +0000 (15:56 +0100)]

MINOR: ssl: Change the ocsp update log-format

The dedicated proxy used for OCSP auto update is renamed OCSP-UPDATE
which should be more explicit than the previous HC_OCSP name. The
reference to the underlying httpclient is simply kept in the
documentation.
The certid is removed from the log line since it is not really
comprehensible and is replaced by the path to the corresponding frontend
certificate.

commit | commitdiff | tree

Christopher Faulet [Mon, 13 Mar 2023 10:07:37 +0000 (11:07 +0100)]

BUG/MEDIUM: mux-pt: Set EOS on error on sending path if read0 was received

It is more a less a revert of the commit b65af26e1 ("MEDIUM: mux-pt: Don't
always set a final error on SE on the sending path"). The PT multiplexer is
so simple that an error on the sending path is terminal. Unlike other muxes,
there is no connection level here. However, instead of reporting an final
error by setting SE_FL_ERROR, we set SE_FL_EOS flag instead if a read0 was
received on the underlying connection. Concretely, it is always true with
the current design of the raw socket layer. But it is cleaner this way.

Without this patch, it is possible to block a TCP socket if a connection
error is triggered when data are sent (for instance a broken pipe) while the
upper stream does not expect to receive more data.

Note the patch above introduced a regression because errors handling at the
connection level is quite simple. All errors are final. But we must keep in
mind it may change. And if so, this will require to move back on a 2-step
errors handling in the mux-pt.

This patch must be backported to 2.7.

commit | commitdiff | tree

Willy Tarreau [Fri, 10 Mar 2023 15:28:37 +0000 (16:28 +0100)]

[RELEASE] Released version 2.8-dev5

Released version 2.8-dev5 with the following main changes :
    - MINOR: ssl: rename confusing ssl_bind_kws
    - BUG/MINOR: config: crt-list keywords mistaken for bind ssl keywords
    - BUG/MEDIUM: http-ana: Detect closed SC on opposite side during body forwarding
    - BUG/MEDIUM: stconn: Don't rearm the read expiration date if EOI was reached
    - MINOR: global: Add an option to disable the data fast-forward
    - MINOR: haproxy: Add an command option to disable data fast-forward
    - REGTESTS: Remove unsupported feature command in http_splicing.vtc
    - BUG/MEDIUM: wdt: fix wrong thread being checked for sleeping
    - BUG/MINOR: sched: properly report long_rq when tasks remain in the queue
    - BUG/MEDIUM: sched: allow a bit more TASK_HEAVY to be processed when needed
    - MINOR: threads: add flags to know if a thread is started and/or running
    - MINOR: h3/hq-interop: handle no data in decode_qcs() with FIN set
    - BUG/MINOR: mux-quic: transfer FIN on empty STREAM frame
    - BUG/MINOR: mworker: prevent incorrect values in uptime
    - MINOR: h3: add traces on decode_qcs callback
    - BUG/MINOR: quic: Possible unexpected counter incrementation on send*() errors
    - MINOR: quic: Add new traces about by connection RX buffer handling
    - MINOR: quic: Move code to wakeup the timer task to avoid anti-amplication deadlock
    - BUG/MINOR: quic: Really cancel the connection timer from qc_set_timer()
    - MINOR: quic: Simplication for qc_set_timer()
    - MINOR: quic: Kill the connections on ICMP (port unreachable) packet receipt
    - MINOR: quic: Add traces to qc_kill_conn()
    - MINOR: quic: Make qc_dgrams_retransmit() return a status.
    - BUG/MINOR: quic: Missing call to task_queue() in qc_idle_timer_do_rearm()
    - MINOR: quic: Add a trace to identify connections which sent Initial packet.
    - MINOR: quic: Add <pto_count> to the traces
    - BUG/MINOR: quic: Do not probe with too little Initial packets
    - BUG/MINOR: quic: Wrong initialization for io_cb_wakeup boolean
    - BUG/MINOR: quic: Do not drop too small datagrams with Initial packets
    - BUG/MINOR: quic: Missing padding for short packets
    - MINOR: quic: adjust request reject when MUX is already freed
    - BUG/MINOR: quic: also send RESET_STREAM if MUX released
    - BUG/MINOR: quic: acknowledge STREAM frame even if MUX is released
    - BUG/MINOR: h3: prevent hypothetical demux failure on int overflow
    - MEDIUM: h3: enforce GOAWAY by resetting higher unhandled stream
    - MINOR: mux-quic: define qc_shutdown()
    - MINOR: mux-quic: define qc_process()
    - MINOR: mux-quic: implement client-fin timeout
    - MEDIUM: mux-quic: properly implement soft-stop
    - MINOR: quic: mark quic-conn as jobs on socket allocation
    - MEDIUM: quic: trigger fast connection closing on process stopping
    - MINOR: mux-h2/traces: do not log h2s pointer for dummy streams
    - MINOR: mux-h2/traces: add a missing TRACE_LEAVE() in h2s_frt_handle_headers()
    - BUG/MEDIUM: quic: Missing TX buffer draining from qc_send_ppkts()
    - DEBUG: stream: Add a BUG_ON to never exit process_stream with an expired task
    - DOC: config: Fix description of options about HTTP connection modes
    - MINOR: proxy: Only consider backend httpclose option for server connections
    - BUG/MINOR: haproxy: Fix option to disable the fast-forward
    - DOC: config: Add the missing tune.fail-alloc option from global listing
    - MINOR: cfgcond: Implement strstr condition expression
    - MINOR: cfgcond: Implement enabled condition expression
    - REGTESTS: Skip http_splicing.vtc script if fast-forward is disabled
    - REGTESTS: Fix ssl_errors.vtc script to wait for connections close
    - BUG/MINOR: mworker: stop doing strtok directly from the env
    - BUG/MEDIUM: mworker: prevent inconsistent reload when upgrading from old versions
    - BUG/MEDIUM: mworker: don't register mworker_accept_wrapper() when master FD is wrong
    - MINOR: startup: HAPROXY_STARTUP_VERSION contains the version used to start
    - BUG/MINOR: cache: Cache response even if request has "no-cache" directive
    - BUG/MINOR: cache: Check cache entry is complete in case of Vary
    - MINOR: compiler: add a TOSTR() macro to turn a value into a string
    - BUG/MINOR: lua/httpclient: missing free in hlua_httpclient_send()
    - BUG/MEDIUM: httpclient/lua: fix a race between lua GC and hlua_ctx_destroy
    - MEDIUM: channel: Remove CF_READ_NOEXP flag
    - MAJOR: channel: Remove flags to report READ or WRITE errors
    - DEBUG: stream/trace: Add sedesc flags in trace messages
    - MINOR: channel/stconn: Move rto/wto from the channel to the stconn
    - MEDIUM: channel/stconn: Move rex/wex timer from the channel to the sedesc
    - MEDIUM: stconn: Don't requeue the stream's task after I/O
    - MEDIUM: stconn: Replace read and write timeouts by a unique I/O timeout
    - MEDIUM: stconn: Add two date to track successful reads and blocked sends
    - MINOR: applet/stconn: Add a SE flag to specify an endpoint does not expect data
    - MAJOR: stream: Use SE descriptor date to detect read/write timeouts
    - MINOR: stream: Dump the task expiration date in trace messages
    - MINOR: stream: Report rex/wex value using the sedesc date in trace messages
    - MINOR: stream: Use relative expiration date in trace messages
    - MINOR: stconn: Always report READ/WRITE event on shutr/shutw
    - CLEANUP: stconn: Remove old read and write expiration dates
    - MINOR: stconn: Set half-close timeout using proxy settings
    - MINOR: stconn: Remove half-closed timeout
    - REGTESTS: cache: Use rxresphdrs to only get headers for 304 responses
    - MINOR: stconn: Add functions to set/clear SE_FL_EXP_NO_DATA flag from endpoint
    - BUG/MINOR: proto_ux: report correct error when bind_listener fails
    - BUG/MINOR: protocol: fix minor memory leak in protocol_bind_all()
    - MINOR: proto_uxst: add resume method
    - MINOR: listener/api: add lli hint to listener functions
    - MINOR: listener: add relax_listener() function
    - MINOR: listener: workaround for closing a tiny race between resume_listener() and stopping
    - MINOR: listener: make sure we don't pause/resume bypassed listeners
    - BUG/MEDIUM: listener: fix pause_listener() suspend return value handling
    - BUG/MINOR: listener: fix resume_listener() resume return value handling
    - BUG/MEDIUM: resume from LI_ASSIGNED in default_resume_listener()
    - MINOR: listener: pause_listener() becomes suspend_listener()
    - BUG/MEDIUM: listener/proxy: fix listeners notify for proxy resume
    - BUG/MINOR: sock_unix: match finalname with tempname in sock_unix_addrcmp()
    - MEDIUM: proto_ux: properly suspend named UNIX listeners
    - MINOR: proto_ux: ability to dump ABNS names in error messages
    - MINOR: haproxy: always protocol unbind on startup error path
    - BUILD: quic: 32-bits compilation issue with %zu in quic_rx_pkts_del()
    - BUG/MINOR: ring: do not realign ring contents on resize
    - MEDIUM: ring: make the offset relative to the head/tail instead of absolute
    - CLEANUP: ring: remove the now unused ring's offset
    - MINOR: config: add HAPROXY_BRANCH environment variable
    - BUILD: thead: Fix several 32 bits compilation issues with uint64_t variables
    - BUG/MEDIUM: fd: avoid infinite loops in fd_add_to_fd_list and fd_rm_from_fd_list
    - BUG/MEDIUM: h1-htx: Never copy more than the max data allowed during parsing
    - BUG/MINOR: stream: Remove BUG_ON about the task expiration in process_stream()
    - MINOR: stream: Handle stream's timeouts in a dedicated function
    - MEDIUM: stream: Eventually handle stream timeouts when exiting process_stream()
    - MINOR: stconn: Report a send activity when endpoint is willing to consume data
    - BUG/MEDIUM: stconn: Report a blocked send if some output data are not consumed
    - MEDIUM: mux-h1: Don't expect data from server as long as request is unfinished
    - MEDIUM: mux-h2: Don't expect data from server as long as request is unfinished
    - MEDIUM: mux-quic: Don't expect data from server as long as request is unfinished
    - DOC: config: Clarify the meaning of 'hold' in the 'resolvers' section
    - DOC: config: Replace TABs by spaces
    - BUG/MINOR: fd: used the update list from the fd's group instead of tgid
    - BUG/MEDIUM: fd: make fd_delete() support being called from a different group
    - CLEANUP: listener: only store conn counts for local threads
    - MINOR: tinfo: make thread_set functions return nth group/mask instead of first
    - MEDIUM: quic: improve fatal error handling on send
    - MINOR: quic: consider EBADF as critical on send()
    - BUG/MEDIUM: connection: Clear flags when a conn is removed from an idle list
    - BUG/MINOR: mux-h1: Don't report an error on an early response close
    - BUG/MINOR: http-check: Don't set HTX_SL_F_BODYLESS flag with a log-format body
    - BUG/MINOR: http-check: Skip C-L header for empty body when it's not mandatory
    - BUG/MINOR: http-fetch: recognize IPv6 addresses in square brackets in req.hdr_ip()
    - REGTEST: added tests covering smp_fetch_hdr_ip()
    - MINOR: quic: simplify return path in send functions
    - MINOR: quic: implement qc_notify_send()
    - MINOR: quic: purge txbuf before preparing new packets
    - MEDIUM: quic: implement poller subscribe on sendto error
    - MINOR: quic: notify on send ready
    - BUG/MINOR: http-ana: Don't increment conn_retries counter before the L7 retry
    - BUG/MINOR: http-ana: Do a L7 retry on read error if there is no response
    - BUG/MEDIUM: http-ana: Don't close request side when waiting for response
    - BUG/MINOR: mxu-h1: Report a parsing error on abort with pending data
    - MINOR: ssl: Destroy ocsp update http_client during cleanup
    - MINOR: ssl: Reinsert ocsp update entries later in case of unknown error
    - MINOR: ssl: Add ocsp update success/failure counters
    - MINOR: ssl: Store specific ocsp update errors in response and update ctx
    - MINOR: ssl: Add certificate's path to certificate_ocsp structure
    - MINOR: ssl: Add 'show ssl ocsp-updates' CLI command
    - MINOR: ssl: Add sample fetches related to OCSP update
    - MINOR: ssl: Use dedicated proxy and log-format for OCSP update
    - MINOR: ssl: Reorder struct certificate_ocsp members
    - MINOR: ssl: Increment OCSP update replay delay in case of failure
    - MINOR: ssl: Add way to dump ocsp response in base64
    - MINOR: ssl: Add global options to modify ocsp update min/max delay
    - REGTESTS: ssl: Fix ocsp update crt-lists
    - REGTESTS: ssl: Add test for new ocsp update cli commands
    - MINOR: ssl: Add ocsp-update information to "show ssl crt-list"
    - BUG/MINOR: ssl: Fix ocsp-update when using "add ssl crt-list"
    - MINOR: ssl: Replace now.tv_sec with date.tv_sec in ocsp update task
    - BUG/MINOR: ssl: Use 'date' instead of 'now' in ocsp stapling callback
    - BUG/MEDIUM: quic: properly handle duplicated STREAM frames
    - BUG/MINOR: cli: fix CLI handler "set anon global-key" call
    - MINOR: http_ext: adding some documentation, forgot to inline function
    - BUG/MINOR: quic: Do not send too small datagrams (with Initial packets)
    - MINOR: quic: Add a BUG_ON_HOT() call for too small datagrams
    - BUG/MINOR: quic: Ensure to be able to build datagrams to be retransmitted
    - BUG/MINOR: quic: v2 Initial packets decryption failed
    - MINOR: quic: Add traces about QUIC TLS key update
    - BUG/MINOR: quic: Remove force_ack for Initial,Handshake packets
    - BUG/MINOR: quic: Ensure not to retransmit packets with no ack-eliciting frames
    - BUG/MINOR: quic: Do not resend already acked frames
    - BUG/MINOR: quic: Missing detections of amplification limit reached
    - MINOR: quic: Send PING frames when probing Initial packet number space
    - BUG/MEDIUM: quic: do not crash when handling STREAM on released MUX
    - BUG/MAJOR: fd/thread: fix race between updates and closing FD
    - BUG/MEDIUM: dns: ensure ring offset is properly reajusted to head
    - BUG/MINOR: mux-quic: properly init STREAM frame as not duplicated
    - MINOR: quic: Do not accept wrong active_connection_id_limit values
    - MINOR: quic: Store the next connection IDs sequence number in the connection
    - MINOR: quic: Typo fix for ACK_ECN frame
    - MINOR: quic: RETIRE_CONNECTION_ID frame handling (RX)
    - MINOR: quic: Useless TLS context allocations in qc_do_rm_hp()
    - MINOR: quic: Add spin bit support
    - MINOR: quic: Add transport parameters to "show quic"
    - BUG/MEDIUM: sink/forwarder: ensure ring offset is properly readjusted to head
    - BUG/MINOR: dns: fix ring offset calculation on first read
    - BUG/MINOR: dns: fix ring offset calculation in dns_resolve_send()
    - MINOR: jwt: Add support for RSA-PSS signatures (PS256 algorithm)
    - MINOR: h3: add traces on h3_init_uni_stream() error paths
    - MINOR: quic: create a global list dedicated for closing QUIC conns
    - MINOR: quic: handle new closing list in show quic
    - MEDIUM: quic: release closing connections on stopping
    - BUG/MINOR: quic: Wrong RETIRE_CONNECTION_ID sequence number check
    - MINOR: fd/cli: report the polling mask in "show fd"
    - CLEANUP: sock: always perform last connection updates before wakeup
    - MINOR: quic: Do not stress the peer during retransmissions of lost packets
    - BUG/MINOR: init: properly detect NUMA bindings on large systems
    - BUG/MINOR: thread: report thread and group counts in the correct order
    - BUG/MAJOR: fd/threads: close a race on closing connections after takeover
    - MINOR: debug: add random delay injection with "debug dev delay-inj"
    - BUG/MINOR: mworker: use MASTER_MAXCONN as default maxconn value
    - BUG/MINOR: quic: Missing listener accept queue tasklet wakeups
    - MINOR: quic_sock: un-statify quic_conn_sock_fd_iocb()
    - DOC: config: fix typo "dependeing" in bind thread description
    - DOC/CLEANUP: fix typos

commit | commitdiff | tree

Michael Prokop [Fri, 9 Dec 2022 11:28:46 +0000 (12:28 +0100)]

DOC/CLEANUP: fix typos

s/algorithmm/algorithm/
s/an other/another/
s/certicates/certificates/
s/exemples/examples/
s/informations/information/
s/optionnal/optional/

commit | commitdiff | tree

Willy Tarreau [Tue, 28 Feb 2023 07:19:37 +0000 (08:19 +0100)]

DOC: config: fix typo "dependeing" in bind thread description

"dependeing" -> "depending".

commit | commitdiff | tree

Willy Tarreau [Fri, 10 Mar 2023 11:04:02 +0000 (12:04 +0100)]

MINOR: quic_sock: un-statify quic_conn_sock_fd_iocb()

This one is printed as the iocb in the "show fd" output, and arguably
this wasn't very convenient as-is:
293 : st=0x000123(cl heopI W:sRa R:sRA) ref=0 gid=1 tmask=0x8 umask=0x0 prmsk=0x8 pwmsk=0x0 owner=0x7f488487afe0 iocb=0x50a2c0(main+0x60f90)

Let's unstatify it and export it so that the symbol can now be resolved
from the various points that need it.

commit | commitdiff | tree

Frédéric Lécaille [Fri, 10 Mar 2023 12:34:30 +0000 (13:34 +0100)]

BUG/MINOR: quic: Missing listener accept queue tasklet wakeups

This bug was revealed by h2load tests run as follows:

      h2load -t 4 --npn-list h3 -c 64 -m 16 -n 16384  -v https://127.0.0.1:4443/

This open (-c) 64 QUIC connections (-n) 16384 h3 requets from (-t) 4 threads, i.e.
256 requests by connection. Such tests could not always pass and often ended with
such results displays by h2load:

finished in 53.74s, 38.11 req/s, 493.78KB/s
requests: 16384 total, 2944 started, 2048 done, 2048 succeeded, 14336
failed, 14336 errored, 0 timeout
status codes: 2048 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 25.92MB (27174537) total, 102.00KB (104448) headers (space
savings 1.92%), 25.80MB (27053569) data
UDP datagram: 3883 sent, 24330 received
                     min         max         mean         sd        ± sd
time for request:    48.75ms    502.86ms    134.12ms     75.80ms    92.68%
time for connect:    20.94ms    331.24ms    189.59ms     84.81ms    59.38%
time to 1st byte:   394.36ms    417.01ms    406.72ms      9.14ms    75.00%
req/s           :       0.00      115.45       14.30       38.13    87.50%

The number of successful requests was always a multiple of 256.
Activating the traces also shew that some connections were blocked after having
successfully completed their handshakes due to the fact that the mux. The mux
is started upon the acceptation of the connection.

Under heavy load, some connections were never accepted. From the moment where
more than 4 (MAXACCEPT) connections were enqueued before a listener could be
woken up to accept at most 4 connections, the remaining connections were not
accepted ore lately at the second listener tasklet  wakeup.

Add a call to tasklet_wakeup() to the accept list tasklet of the listeners to
wake up it if there are remaining connections to accept after having called
listener_accept(). In this case the listener must not be removed of this
accept list, if not at the next call it will not accept anything more.

Must be backported to 2.7 and 2.6.

commit | commitdiff | tree

William Lallemand [Thu, 9 Mar 2023 13:28:44 +0000 (14:28 +0100)]

BUG/MINOR: mworker: use MASTER_MAXCONN as default maxconn value

In environments where SYSTEM_MAXCONN is defined when compiling, the
master will use this value instead of the original minimal value which
was set to 100. When this happens, the master process could allocate
RAM excessively since it does not need to have an high maxconn. (For
example if SYSTEM_MAXCONN was set to 100000 or more)

This patch fixes the issue by using the new define MASTER_MAXCONN which
define a default maxconn of 100 for the master process.

Must be backported as far as 2.5.

commit | commitdiff | tree

Willy Tarreau [Thu, 9 Mar 2023 07:25:01 +0000 (08:25 +0100)]

MINOR: debug: add random delay injection with "debug dev delay-inj"

The goal is to send signals to random threads at random instants so that
they spin for a random delay in a relax() loop, trying to give back the
CPU to another competing hardware thread, in hope that from time to time
this can trigger in critical areas and increase the chances to provoke a
latent concurrency bug. For now none were observed.

For example, this command starts 64 such tasks waking after random delays
of 0-1ms and delivering signals to trigger such loops on 3 random threads:

  for i in {1..64}; do
    socat - /tmp/sock1 <<< "expert-mode on;debug dev delay-inj 2 3"
  done

This command is only enabled when DEBUG_DEV is set at build time.

commit | commitdiff | tree

Willy Tarreau [Tue, 7 Mar 2023 18:11:02 +0000 (10:11 -0800)]

BUG/MAJOR: fd/threads: close a race on closing connections after takeover

As mentioned in commit 237e6a0d6 ("BUG/MAJOR: fd/thread: fix race between
updates and closing FD"), a race was found during stress tests involving
heavy backend connection reuse with many competing closes.

Here the problem is complex. The analysis in commit f69fea64e ("MAJOR:
fd: get rid of the DWCAS when setting the running_mask") that removed
the DWCAS in 2.5 overlooked a few races.

First, a takeover from thread1 could happen just after fd_update_events()
in thread2 validates it holds the tmask bit in the CAS loop. Since thread1
releases running_mask after the operation, thread2 will succeed the CAS
and both will believe the FD is theirs. This does explain the occasional
crashes seen with h1_io_cb() being called on a bad context, or
sock_conn_iocb() seeing conn->subs vanish after checking it. This issue
can be addressed using a DWCAS in both fd_takeover() and fd_update_events()
as it was before the patch above but this is not portable to all archs and
is not easy to adapt for those lacking it, due to some operations still
happening only on individual masks after the thread groups were added.

Second, the checks after fd_clr_running() for the current thread being
the last one is not sufficient: at the exact moment the operation
completes, another thread may also set and drop the running bit and see
itself as alone, and both can call _fd_close_orphan() in parallel. In
order to prevent this from happening, we cannot rely on the absence of
others, we need an explicit flag indicating that the FD must be closed.
One approach that was attempted consisted in playing with the thread_mask
but that was not reliable since it could still match between the late
deletion and the early insertion that follows. Instead, a new FD flag
was added, FD_MUST_CLOSE, that exactly indicates that the call to
_fd_delete_orphan() must be done. It is set by fd_delete(), and
atomically cleared by the first one which checks it, and which is the
only one to call _fd_delete_orphan().

With both points addressed, there's no more visible race left:

- takeover() only happens under the connection list's lock and cannot
  compete with fd_delete() since fd_delete() must first remove the
  connection from the list before deleting the FD. That's also why it
  doesn't need to call _fd_delete_orphan() when dropping its running
  bit.

- takeover() sets its running bit then atomically replaces the thread
  mask, so that until that's done, it doesn't validate the condition
  to end the synchonization loop in fd_update_events(). Once it's OK,
  the previous thread's bit is lost, and this is checked for in
  fd_update_events()

- fd_update_events() can compete with fd_delete() at various places
  which are explained above. Since fd_delete() clears the thread mask
  as after setting its running bit and after setting the FD_MUST_CLOSE
  bit, the synchronization loop guarantees that the thread mask is seen
  before going further, and that once it's seen, the FD_MUST_CLOSE flag
  is already present.

- fd_delete() may start while fd_update_events() has already started,
  but fd_delete() must hold a bit in thread_mask before starting, and
  that is checked by the first test in fd_update_events() before setting
  the running_mask.

- the poller's _update_fd() will not compete against _fd_delete_orphan()
  nor fd_insert() thanks to the fd_grab_tgid() that's always done before
  updating the polled_mask, and guarantees that we never pretend that a
  polled_mask has a bit before the FD is added.

The issue is very hard to reproduce and is extremely time-sensitive.
Some tests were required with a 1-ms timeout with request rates
closely matching 1 kHz per server, though certain tests sometimes
benefitted from saturation. It was found that adding the following
slowdown at a few key places helped a lot and managed to trigger the
bug in 0.5 to 5 seconds instead of tens of minutes on a 20-thread
setup:

    { volatile int i = 10000; while (i--); }

Particularly, placing it at key places where only one of running_mask
or thread_mask is set and not the other one yet (e.g. after the
synchronization loop in fd_update_events or after dropping the
running bit) did yield great results.

Many thanks to Olivier Houchard for this expert help analysing these
races and reviewing candidate fixes.

The patch must be backported to 2.5. Note that 2.6 does not have tgid
in FDs, and that it requires a change of output on fd_clr_running() as
we need the previous bit. This is provided by carefully backporting
commit d6e1987612 ("MINOR: fd: make fd_clr_running() return the previous
value instead"). Tests have shown that the lack of tgid is a showstopper
for 2.6 and that unless a better workaround is found, it could still be
preferable to backport the minimum pieces required for fd_grab_tgid()
to 2.6 so that it stays stable long.

commit | commitdiff | tree

Willy Tarreau [Thu, 9 Mar 2023 10:39:51 +0000 (11:39 +0100)]

BUG/MINOR: thread: report thread and group counts in the correct order

In case too many thread groups are needed for the threads, we emit
an error indicating the problem. Unfortunately the threads and groups
counts were reversed.

This can be backported to 2.6.

commit | commitdiff | tree

Willy Tarreau [Thu, 9 Mar 2023 09:12:06 +0000 (10:12 +0100)]

BUG/MINOR: init: properly detect NUMA bindings on large systems

The NUMA detection code tries not to interfer with any taskset the user
could have specified in init scripts. For this it compares the number of
CPUs available with the number the process is bound to. However, the CPU
count is retrieved after being applied an upper bound of MAX_THREADS, so
if the machine has more than 64 CPUs, the comparison always fails and
makes haproxy think the user has already enforced a binding, and it does
not pin it anymore to a single NUMA node.

This can be verified by issuing:

  $ socat /path/to/sock - <<< "show info" | grep thread

On a dual 48-CPU machine it reports 64, implying that threads are allowed
to run on the second socket:

  Nbthread: 64

With this fix, the function properly reports 96, and the output shows 48,
indicating that a single NUMA node was used:

  Nbthread: 48

Of course nothing is changed when "no numa-cpu-mapping" is specified:

  Nbthread: 64

This can be backported to 2.4.

commit | commitdiff | tree

Frédéric Lécaille [Wed, 8 Mar 2023 17:23:13 +0000 (18:23 +0100)]

MINOR: quic: Do not stress the peer during retransmissions of lost packets

This issue was revealed by "Multiple streams" QUIC tracker test which very often
fails (locally) with a file of about 1Mbytes (x4 streams). The log of QUIC tracker
revealed that from its point of view, the 4 files were never all received entirely:

"results" : {
      "stream_0_rec_closed" : true,
      "stream_0_rec_offset" : 1024250,
      "stream_0_snd_closed" : true,
      "stream_0_snd_offset" : 15,
      "stream_12_rec_closed" : false,
      "stream_12_rec_offset" : 72689,
      "stream_12_snd_closed" : true,
      "stream_12_snd_offset" : 15,
      "stream_4_rec_closed" : true,
      "stream_4_rec_offset" : 1024250,
      "stream_4_snd_closed" : true,
      "stream_4_snd_offset" : 15,
      "stream_8_rec_closed" : true,
      "stream_8_rec_offset" : 1024250,
      "stream_8_snd_closed" : true,
      "stream_8_snd_offset" : 15
   },

But this in contradiction with others QUIC tracker logs which confirms that haproxy
has really (re)sent the stream at the suspected offset(stream_12_rec_offset):

       1152085,
       "transport",
       "packet_received",
       {
          "frames" : [
             {
                "frame_type" : "stream",
                "length" : "155",
                "offset" : "72689",
                "stream_id" : "12"
             }
          ],
          "header" : {
             "dcid" : "a14479169ebb9dba",
             "dcil" : "8",
             "packet_number" : "466",
             "packet_size" : 190
          },
          "packet_type" : "1RTT"
       }

When detected as losts, the packets are enlisted, then their frames are
requeued in their packet number space by qc_requeue_nacked_pkt_tx_frms().
This was done using a local list which was spliced to the packet number
frame list. This had as bad effect to retransmit the frames in the inverse
order they have been sent. This is something the QUIC tracker go client
does not like at all!

Removing the frame splicing fixes this issue and allows haproxy to pass the
"Multiple streams" test.

Must be backported to 2.7.

commit | commitdiff | tree

Willy Tarreau [Thu, 2 Mar 2023 14:07:51 +0000 (15:07 +0100)]

CLEANUP: sock: always perform last connection updates before wakeup

Normally the task_wakeup() in sock_conn_io_cb() is expected to
happen on the same thread the FD is attached to. But due to the
way the code was arranged in the past (with synchronous callbacks)
we continue to update connections after the wakeup, which always
makes the reader have to think deeply whether it's possible or not
to call another thread there. Let's just move the tasklet_wakeup()
at the end to make sure there's no problem with that.

commit | commitdiff | tree

Willy Tarreau [Thu, 2 Mar 2023 14:05:31 +0000 (15:05 +0100)]

MINOR: fd/cli: report the polling mask in "show fd"

It's missing and often needed when trying to debug a situation, let's
report the polling mask as well in "show fd".

commit | commitdiff | tree

Frédéric Lécaille [Wed, 8 Mar 2023 10:01:58 +0000 (11:01 +0100)]

BUG/MINOR: quic: Wrong RETIRE_CONNECTION_ID sequence number check

This bug arrived with this commit:
b5a8020e9 MINOR: quic: RETIRE_CONNECTION_ID frame handling (RX)
and was revealed by h3 interop tests with clients like s2n-quic and quic-go
as noticed by Amaury.

Indeed, one must check that the CID matching the sequence number provided by a received
RETIRE_CONNECTION_ID frame does not match the DCID of the packet.
Remove useless ->curr_cid_seq_num member from quic_conn struct.
The sequence number lookup must be done in qc_handle_retire_connection_id_frm()
to check the validity of the RETIRE_CONNECTION_ID frame, it returns the CID to be
retired into <cid_to_retire> variable passed as parameter to this function if
the frame is valid and if the CID was not already retired

Must be backported to 2.7.

commit | commitdiff | tree

Amaury Denoyelle [Wed, 8 Mar 2023 09:37:45 +0000 (10:37 +0100)]

MEDIUM: quic: release closing connections on stopping

Since the following commit :
commit fb375574f947143e185225558c274ac00a3f8cb4
MINOR: quic: mark quic-conn as jobs on socket allocation

quic-conn instances are marked as jobs. This prevent haproxy process to
stop while there is transfer in progress. To not delay process
termination, idle connections are woken up through their MUX instances
to be able to release them immediately.

However, there is no mechanism to wake up quic connections left on
closing or draining state. This means that haproxy process termination
is delayed until every closing quic connections timer has expired.

To improve this, a new function quic_handle_stopping() is called when
haproxy process is stopping. It simply wakes up the idle timer task of
all connections in the global closing list. These connections will thus
be released immediately to not interrupt haproxy process stopping.

This should be backported up to 2.7.

commit | commitdiff | tree

Amaury Denoyelle [Wed, 8 Mar 2023 08:42:31 +0000 (09:42 +0100)]

MINOR: quic: handle new closing list in show quic

A new global quic-conn list has been added by the previous patch. It will
contain every quic-conn in closing or draining state.

Thus, it is now easier to include or skip them on a "show quic" output :
when the default list on the current thread has been browsed entirely,
either we skip to the next thread or we look at the closing list on the
current thread.

This should be backported up to 2.7.

commit | commitdiff | tree

Amaury Denoyelle [Wed, 8 Mar 2023 08:42:04 +0000 (09:42 +0100)]

MINOR: quic: create a global list dedicated for closing QUIC conns

When a CONNECTION_CLOSE is emitted or received, a QUIC connection enters
respectively in draining or closing state. These states are a loose
equivalent of TCP TIME_WAIT. No data can be exchanged anymore but the
connection is maintained during a certain timer to handle packet
reordering or loss.

A new global list has been defined for QUIC connections in
closing/draining state inside thread_ctx structure. Each time a
connection enters in one of this state, it will be moved from the
default global list to the new closing list.

The objective of this patch is to quickly filter connections on
closing/draining. Most notably, this will be used to wake up these
connections and avoid that haproxy process stopping is delayed by them.

A dedicated function qc_detach_th_ctx_list() has been implemented to
transfer a quic-conn from one list instance to the other. This takes
care of back-references attach to a quic-conn instance in case of a
running "show quic".

This should be backported up to 2.7.

commit | commitdiff | tree

Amaury Denoyelle [Wed, 8 Mar 2023 09:25:39 +0000 (10:25 +0100)]

MINOR: h3: add traces on h3_init_uni_stream() error paths

Complete traces on h3_init_uni_stream(). This ensures there is always a
dedicated trace for each error paths.

This should be backported up to 2.7.

commit | commitdiff | tree

Remi Tricot-Le Breton [Tue, 7 Mar 2023 16:43:57 +0000 (17:43 +0100)]

MINOR: jwt: Add support for RSA-PSS signatures (PS256 algorithm)

This patch adds the support for the PS algorithms when verifying JWT
signatures (rsa-pss). It was not managed during the first implementation
and previously raised an "Unmanaged algorithm" error.
The tests use the same rsa signature as the plain rsa tests (RS256 ...)
and the implementation simply adds a call to
EVP_PKEY_CTX_set_rsa_padding in the function that manages rsa and ecdsa
signatures.
The signatures in the reg-test were built thanks to the PyJWT python
library once again.

commit | commitdiff | tree

Aurelien DARRAGON [Tue, 7 Mar 2023 17:01:34 +0000 (18:01 +0100)]

BUG/MINOR: dns: fix ring offset calculation in dns_resolve_send()

With 737d10f ("BUG/MEDIUM: dns: ensure ring offset is properly reajusted
to head") relative offset calculation was fixed in dns_session_io_handler()
and dns_process_req() functions.

But if we compare with the changes performed in the patch that introduced
the bug: d9c7188 ("MEDIUM: ring: make the offset relative to the head/tail
instead of absolute"), we can see that dns_resolve_send() is missing from
the patch.

Applying both 737d10f + ("BUG/MINOR: dns: fix ring offset calculation on
first read") to dns_resolve_send() function.
With this last commit, we should be back at pre d9c7188 behavior.

No backport needed.

commit | commitdiff | tree

Aurelien DARRAGON [Tue, 7 Mar 2023 16:45:02 +0000 (17:45 +0100)]

BUG/MINOR: dns: fix ring offset calculation on first read

With 737d10f ("BUG/MEDIUM: dns: ensure ring offset is properly reajusted
to head") ring offset is now properly re-adjusted in dns_session_io_handler()
and dns_process_req().

But the previous patch does not cope well if the first read is performed
on a non-empty ring since relative ofs will be computed from ds->ofs=0 or
dss->ofs_req=0.
In this case: relative offset could become invalid since we mix up relative
offsets with absolute offsets.

To fix this, we apply the same logic performed in d9c7188 ("MEDIUM: ring:
make the offset relative to the head/tail instead of absolute") for the
cli_io_handler_show_ring() function: that is using b_peek_ofs(buf, 0) to
set the contextual offset instead of hard-coding it to 0.

This should be considered as a minor bugfix since this bug was discovered by
reading the code: 737d10f already survived a good amount of stress-tests as
shown in GH #2068.

No backport needed as 737d10f is not marked for backports.

commit | commitdiff | tree

Aurelien DARRAGON [Tue, 7 Mar 2023 15:09:33 +0000 (16:09 +0100)]

BUG/MEDIUM: sink/forwarder: ensure ring offset is properly readjusted to head

Since d9c7188 ("MEDIUM: ring: make the offset relative to the head/tail instead
of absolute"), ring offset calculation has changed: we don't rely on ring->ofs
absolute offset anymore.

But with the above patch, relative offset is not properly calculated in
sink_forward_oc_io_handler() and sink_forward_io_handler().

The issue here is the same as 737d10f ("BUG/MEDIUM: dns: ensure ring offset is
properly reajusted to head") since dns and sink_forward share the same
ring logic:

When the ring is becoming full, ring_write() will try to regain some space to
insert new data by calling b_del() on older messages. Here b_del() moves
buffer's head under the hood, and since ring->ofs cannot be used to "correct"
the relative offset, both sink_forward_oc_io_handler() and
sink_forward_io_handler() start to get invalid offset.
At this point, we will suffer from ring data corruption resulting in unexpected
behavior or process crashes.

This can be easily demonstrated with the following test:

    |log-forward syslog
    |  dgram-bind 127.0.0.1:5114
    |  log ring@logbuffer local0
    |
    |ring logbuffer
    |  format rfc5424
    |  size 16384
    |  server logserver 127.0.0.1:5114

Haproxy will forward incoming logs on udp@127.0.0.1:5114 to
tcp@127.0.0.1:5114

Then use the following tcp server:
  nc -l -p 5114

With the following udp log sender:
    |while [ 1 ]
    |do
    |  logger --udp  --server 127.0.0.1 -P 5114 -p user.warn "Test 7"
    |done

Once the ring buffer is full (it takes less that a second to fill the 16k
buffer) haproxy starts to misbehave and the log forwarding stops.

We apply the same fix as in 737d10f ("BUG/MEDIUM: dns: ensure ring offset is
properly reajusted to head").
Please note the ~0 case that is handled slightly differently in this patch:
this is required to properly start reading from a non-empty ring. This case
will be fixed in dns related code in the following patch.

This does not need to be backported as d9c7188 was not marked for backports.

commit | commitdiff | tree

Frédéric Lécaille [Tue, 7 Mar 2023 14:18:02 +0000 (15:18 +0100)]

MINOR: quic: Add transport parameters to "show quic"

Modify quic_transport_params_dump() and others function relative to the
transport parameters value dump from TRACE() to make their output more
compact.
Add call to quic_transport_params_dump() to dump the transport parameters
from "show quic" CLI command.

Must be backported to 2.7.

commit | commitdiff | tree

Frédéric Lécaille [Tue, 7 Mar 2023 10:53:43 +0000 (11:53 +0100)]

MINOR: quic: Add spin bit support

Add QUIC_FL_RX_PACKET_SPIN_BIT new RX packet flag to mark an RX packet as having
the spin bit set. Idem for the connection with QUIC_FL_CONN_SPIN_BIT flag.
Implement qc_handle_spin_bit() to set/unset QUIC_FL_CONN_SPIN_BIT for the connection
as soon as a packet number could be deciphered.
Modify quic_build_packet_short_header() to set the spin bit when building
a short packet header.

Validated by quic-tracker spin bit test.

Must be backported to 2.7.

commit | commitdiff | tree

Frédéric Lécaille [Tue, 7 Mar 2023 09:28:16 +0000 (10:28 +0100)]

MINOR: quic: Useless TLS context allocations in qc_do_rm_hp()

These allocations are definitively useless.

Must be backported to 2.7.

commit | commitdiff | tree

Frédéric Lécaille [Mon, 6 Mar 2023 17:16:34 +0000 (18:16 +0100)]

MINOR: quic: RETIRE_CONNECTION_ID frame handling (RX)

Add ->curr_cid_seq_num new quic_conn struct frame to store the connection
ID sequence number currently used by the connection.
Implement qc_handle_retire_connection_id_frm() to handle this RX frame.
Implement qc_retire_connection_seq_num() to remove a connection ID from its
sequence number.
Implement qc_build_new_connection_id_frm to allocate a new NEW_CONNECTION_ID
frame from a CID.
Modify qc_parse_pkt_frms() which parses the frames of an RX packet to handle
the case of the RETIRE_CONNECTION_ID frame.

Must be backported to 2.7.

commit | commitdiff | tree

Frédéric Lécaille [Mon, 6 Mar 2023 14:34:19 +0000 (15:34 +0100)]

MINOR: quic: Typo fix for ACK_ECN frame

Wrong name displayed by TRACE().

Must be backported to 2.7.

commit | commitdiff | tree

Frédéric Lécaille [Mon, 6 Mar 2023 13:07:59 +0000 (14:07 +0100)]

MINOR: quic: Store the next connection IDs sequence number in the connection

Add ->next_cid_seq_num new member to quic_conn struct to store the next
connection ID to be used to alloacated a connection ID.
It is initialized to 0 from qc_new_conn() which initializes a connection.
Modify new_quic_cid() to use this variable each time it is called without
giving the possibility to the caller to pass the sequence number for the
connection to be allocated.

Modify quic_build_post_handshake_frames() to use ->next_cid_seq_num
when building NEW_CONNECTION_ID frames after the hanshake has been completed.
Limit the number of connection IDs provided to the peer to the minimum
between 4 and the value it sent with active_connection_id_limit transport
parameter. This includes the connection ID used by the connection to send
this new connection IDs.

Must be backported to 2.7.

commit | commitdiff | tree

Frédéric Lécaille [Mon, 6 Mar 2023 12:57:40 +0000 (13:57 +0100)]

MINOR: quic: Do not accept wrong active_connection_id_limit values

A peer must not send active_connection_id_limit values smaller than 2
which is also the minimum value when not sent.

Make the transport parameters decoding fail in this case.

Must be backported to 2.7.

commit | commitdiff | tree

Amaury Denoyelle [Tue, 7 Mar 2023 17:07:08 +0000 (18:07 +0100)]

BUG/MINOR: mux-quic: properly init STREAM frame as not duplicated

STREAM frame retransmission has been recently fixed. A new boolean field
<dup> was created for quic_stream frame type. It is set for duplicated
STREAM frame to ensure extra checks on the underlying buffer are
conducted before sending the frame. All of this has been implemented by
this commit :
315a4f6ae54da17fd28f7a14373b05bab0b5aa08
BUG/MEDIUM: quic: do not crash when handling STREAM on released MUX

However, the above commit is incomplete. In the MUX code, when a new
STREAM frame is created, <dup> is left uninitialized. In most cases this
is harmless as it will only add extra unneeded checks before sending the
frame. So this is mainly a performance issue.

There is however one case where this bug will lead to a crash : when the
response consists only of an empty STREAM frame. In this case, the empty
frame will be silently removed as it is incorrectly assimilated to an
already acked frame range in qc_build_frms(). This can trigger a
BUG_ON() on the MUX code as a qcs instance is still in the send list
after qc_send_frames() invocation.

Note that this is extremely rare to have only an empty STREAM frame. It
was reproduced with HTTP/0.9 where no HTTP status line exists on an
empty body. I do not know if this is possible on HTTP/3 as a status line
should be present each time in a HEADERS frame.

Properly initialize <dup> field to 0 on each STREAM frames generated by
the QUIC MUX to fix this issue.

This crash may be linked to github issue #2049.

This should be backported up to 2.6.

commit | commitdiff | tree

Amaury Denoyelle [Tue, 7 Mar 2023 10:18:27 +0000 (11:18 +0100)]

BUG/MEDIUM: dns: ensure ring offset is properly reajusted to head

Since the below patch, ring offset calculation for readers has changed.
  commit d9c718863384e32307f65a9ce319dc362b73feb6
  MEDIUM: ring: make the offset relative to the head/tail instead of absolute

For readers, this requires to adjust their offsets to be relative to the
ring head each time read is resumed. Indeed, buffer head can change any
time a ring_write() is performed after older entries were purged.
This operation was not performed on the DNS code which causes the offset
to become invalid. In most cases, the following BUG_ON() was triggered :

  FATAL: bug condition "msg_len + ofs + cnt + 1 > b_data(buf)" matched
  at src/dns.c:522

Fix this by adjusting DNS reader offsets when entering
dns_session_io_handler() and dns_process_req().

This bug was reproduced by using a backend with 10 servers using SRV
record resolution on a single resolvers section. A BUG_ON() crash would
occur after less than 5 minutes of process execution.

This does not need to be backported as the above patch is not.

This should fix github issue #2068.

commit | commitdiff | tree

Willy Tarreau [Sat, 4 Mar 2023 14:33:24 +0000 (15:33 +0100)]

BUG/MAJOR: fd/thread: fix race between updates and closing FD

While running some L7 retries tests, Christopher and I stumbled upon a
very strange behavior showing some occasional server timeouts when the
server closes keep-alive connections quickly. The issue can be
reproduced with the following config:

    global
        expose-experimental-directives
        #tune.fd.edge-triggered on   # can speed up the issue

    defaults
        mode http
        timeout client 5s
        timeout server 10s
        timeout connect 2s

    listen f
        bind :8001
        http-reuse always
        retry-on all-retryable-errors
        server next 127.0.0.1:8002

    frontend b
        bind :8002
        timeout http-keep-alive 1  # one ms
        redirect location /

Sending fast requests without reusing the client connection on port 8001
with a single connection and at least 3 threads on haproxy occasionally
shows some glitches pauses (below with timeout server 2s):

  $ taskset -c 2,3 h1load  -e -t 1 -r 1 -c 1 http://127.0.0.1:8001/
  #     time conns tot_conn  tot_req      tot_bytes    err  cps  rps  bps   ttfb
           1     1     9794     9793         959714      0 9k79 9k79 7M67 42.94u
           2     1     9794     9793         959714      0 0.00 0.00 0.00    -
           3     1     9794     9793         959714      0 0.00 0.00 0.00    -
           4     0    16015    16015        1569470      0 6k22 6k22 4M87 522.9u
           5     0    18657    18656        1828190      2 2k63 2k63 2M06 39.22u

If this doesn't happen, limiting to a request rate close to 1/timeout
may help.

What is happening is that after several migrations, a late report
via fd_update_events() may detect that the thread is not welcome, and
will want to program an update so that the current thread's poller
disables its polling on it. It is allowed to do so because it used
fd_grab_tgid(). But what if _fd_delete_orphan() was just starting to
be called and already reset the update_mask ? We'll end up with a bit
present in the update mask, then _fd_delete_orphan() resets the tgid,
which will prevent the poller from consuming that update. The update
is not needed anymore since the FD was closed, but in this case nobody
will clear this bit until the same FD is reused again and cleared. And
as long as the thread's bit remains in the update_mask, no new updates
will be programmed for the next use of this FD on the same thread since
due to the bit being present, fd_nbupdt will not be changed. This is
what is causing this timeout.

The fix consists in making sure _fd_delete_orphan() waits for the
occasional watchers to leave, and to do this before clearing the
update_mask. This will be either fd_update_events() trying to check
its thread_mask, or the poller checking its updates, so that's pretty
short. But it definitely closes this race.

This fix is needed since the introduction of fd_grab_tgid(), hence 2.7.

Note that while testing the fix, another related issue concerning the
atomicity of running_mask vs thread_mask popped up and will have to be
fixed till 2.5 as part of another patch. It may make the tests for this
fix occasionally tigger a few BUG_ON() or face a null conn->subs in
sock_conn_iocb(), though these ones are much more difficult to trigger.
This is not caused by this fix.

commit | commitdiff | tree

Amaury Denoyelle [Mon, 6 Mar 2023 08:10:53 +0000 (09:10 +0100)]

BUG/MEDIUM: quic: do not crash when handling STREAM on released MUX

The MUX instance is released before its quic-conn counterpart. On
termination, a H3 GOAWAY is emitted to prevent the client to open new
streams for this connection.

The quic-conn instance will stay alive until all opened streams data are
acknowledged. If the client tries to open a new stream during this
interval despite the GOAWAY, quic-conn is responsible to request its
immediate closure with a STOP_SENDING + RESET_STREAM.

This behavior was already implemented but the received packet with the
new STREAM was never acknowledged. This was fixed with the following
commit :
  commit 156a89aef8c63910502b266251dc34f648a99fae
  BUG/MINOR: quic: acknowledge STREAM frame even if MUX is released

However, this patch introduces a regression as it did not skip the call
to qc_handle_strm_frm() despite the MUX instance being released. This
can cause a segfault when using qcc_get_qcs() on a released MUX
instance. To fix this, add a missing break statement which will skip
qc_handle_strm_frm() when the MUX instance is not initialized.

This commit was reproduced using a short timeout client and sending
several requests with delay between them by using a modified aioquic. It
produces a crash with the following backtrace :
#0  0x000055555594d261 in __eb64_lookup (x=4, root=0x7ffff4091f60) at include/import/eb64tree.h:132
#1  eb64_lookup (root=0x7ffff4091f60, x=4) at src/eb64tree.c:37
#2  0x000055555563fc66 in qcc_get_qcs (qcc=0x7ffff4091dc0, id=4, receive_only=1, send_only=0, out=0x7ffff780ca70) at src/mux_quic.c:668
#3  0x0000555555641e1a in qcc_recv (qcc=0x7ffff4091dc0, id=4, len=40, offset=0, fin=1 '\001', data=0x7ffff40c4fef "\001&") at src/mux_quic.c:974
#4  0x0000555555619d28 in qc_handle_strm_frm (pkt=0x7ffff4088e60, strm_frm=0x7ffff780cf50, qc=0x7ffff7cef000, fin=1 '\001') at src/quic_conn.c:2515
#5  0x000055555561d677 in qc_parse_pkt_frms (qc=0x7ffff7cef000, pkt=0x7ffff4088e60, qel=0x7ffff7cef6c0) at src/quic_conn.c:3050
#6  0x00005555556230aa in qc_treat_rx_pkts (qc=0x7ffff7cef000, cur_el=0x7ffff7cef6c0, next_el=0x0) at src/quic_conn.c:4214
#7  0x0000555555625fee in quic_conn_app_io_cb (t=0x7ffff40c1fa0, context=0x7ffff7cef000, state=32848) at src/quic_conn.c:4640
#8  0x00005555558a676d in run_tasks_from_lists (budgets=0x7ffff780d470) at src/task.c:596
#9  0x00005555558a725b in process_runnable_tasks () at src/task.c:876
#10 0x00005555558522ba in run_poll_loop () at src/haproxy.c:2945
#11 0x00005555558529ac in run_thread_poll_loop (data=0x555555d14440 <ha_thread_info+64>) at src/haproxy.c:3141
#12 0x00007ffff789ebb5 in ?? () from /usr/lib/libc.so.6
#13 0x00007ffff7920d90 in ?? () from /usr/lib/libc.so.6

This should fix github issue #2067.

This must be backported up to 2.6.

commit | commitdiff | tree

Frédéric Lécaille [Fri, 3 Mar 2023 16:34:41 +0000 (17:34 +0100)]

MINOR: quic: Send PING frames when probing Initial packet number space

In very very rare cases, it is possible the Initial packet number space
must be probed even if it there is no more in flight CRYPTO frames.
In such cases, a PING frame is sent into an Initial packet. As this
packet is ack-eliciting, it must be padded by the server. qc_do_build_pkt()
is modified to do so.

Take the opportunity of this patch to modify the trace for TX frames to
easily distinguished them from other frame relative traces.

Must be backported to 2.7.

commit | commitdiff | tree

Frédéric Lécaille [Fri, 3 Mar 2023 09:16:32 +0000 (10:16 +0100)]

BUG/MINOR: quic: Missing detections of amplification limit reached

Mark the connection as limited by the anti-amplification limit when trying to
probe the peer.
Wakeup the connection PTO/dectection loss timer as soon as a datagram is
received. This was done only when the datagram was dropped.
This fixes deadlock issues revealed by some interop runner tests.

Must be backported to 2.7 and 2.6.

commit | commitdiff | tree

Frédéric Lécaille [Thu, 2 Mar 2023 13:49:22 +0000 (14:49 +0100)]

BUG/MINOR: quic: Do not resend already acked frames

Some frames are marked as already acknowledged from duplicated packets
whose the original packet has been acknowledged. There is no need
to resend such packets or frames.

Implement qc_pkt_with_only_acked_frms() to detect packet with only
already acknowledged frames inside and use it from qc_prep_fast_retrans()
which selects the packet to be retransmitted.

Must be backported to 2.6 and 2.7.

commit | commitdiff | tree

Frédéric Lécaille [Thu, 2 Mar 2023 10:53:43 +0000 (11:53 +0100)]

BUG/MINOR: quic: Ensure not to retransmit packets with no ack-eliciting frames

Even if there is a check in callers of qc_prep_hdshk_fast_retrans() and
qc_prep_fast_retrans() to prevent retransmissions of packets with no ack-eliciting
frames, these two functions should pay attention not do to that especially if
someone decides to modify their implementations in the future.

Must be backported to 2.6 and 2.7.

commit | commitdiff | tree

Frédéric Lécaille [Sat, 25 Feb 2023 10:27:34 +0000 (11:27 +0100)]

BUG/MINOR: quic: Remove force_ack for Initial,Handshake packets

This is an old bug which arrived in this commit due to a misinterpretation
of the RFC I guess where the desired effect was to acknowledge all the
handshake packets:

77ac6f566 BUG/MINOR: quic: Missing acknowledgments for trailing packets

This had as bad effect to acknowledge all the handshake packets even the
ones which are not ack-eliciting.

Must be backported to 2.7 and 2.6.

commit | commitdiff | tree

Frédéric Lécaille [Thu, 23 Feb 2023 19:38:23 +0000 (20:38 +0100)]

MINOR: quic: Add traces about QUIC TLS key update

Dump the secret used to derive the next one during a key update initiated by the
client and dump the resulted new secret and the new key and iv to be used to
decryption Application level packets.

Also add a trace when the key update is supposed to be initiated on haproxy side.

This has already helped in diagnosing an issue evealed by the key update interop
test with xquic as client.

Must be backported to 2.7.

commit | commitdiff | tree

Frédéric Lécaille [Wed, 22 Feb 2023 15:20:09 +0000 (16:20 +0100)]

BUG/MINOR: quic: v2 Initial packets decryption failed

v2 interop runner test revealed this bug as follows:

     [01|quic|4|c_conn.c:4087] new packet : qc@0x7f62ec026e30 pkt@0x7f62ec056390 el=I pn=491940080 rel=H
     [01|quic|5|c_conn.c:1509] qc_pkt_decrypt(): entering : qc@0x7f62ec026e30
     [01|quic|0|c_conn.c:1553] quic_tls_decrypt() failed : qc@0x7f62ec026e30
     [01|quic|5|c_conn.c:1575] qc_pkt_decrypt(): leaving : qc@0x7f62ec026e30
     [01|quic|0|c_conn.c:4091] packet decryption failed -> dropped : qc@0x7f62ec026e30 pkt@0x7f62ec056390 el=I pn=491940080

Only v2 Initial packets decryption received by the clients were impacted. There
is no issue to encrypt v2 Initial packets. This is due to the fact that when
negotiated the client may send two versions of Initial packets (currently v1,
then v2). The selection was done for the TX path but not on the RX path.

Implement qc_select_tls_ctx() to select the correct TLS cipher context for all
types of packets and call this function before removing the header protection
and before deciphering the packet.

Must be backported to 2.7.

commit | commitdiff | tree

Frédéric Lécaille [Tue, 21 Feb 2023 15:44:05 +0000 (16:44 +0100)]

BUG/MINOR: quic: Ensure to be able to build datagrams to be retransmitted

When retransmitting datagrams with two coalesced packets inside, the second
packet was not taken into consideration when checking there is enough space
into the network for the datagram, especially when limited by the anti-amplification.

Must be backported to 2.6 and 2.7.

commit | commitdiff | tree

Frédéric Lécaille [Mon, 20 Feb 2023 13:43:55 +0000 (14:43 +0100)]

MINOR: quic: Add a BUG_ON_HOT() call for too small datagrams

This should be helpful to detect too small datagrams: datagrams
smaller than 1200 bytes, with Initial packets inside.

Must be backported to 2.7.

commit | commitdiff | tree

Frédéric Lécaille [Mon, 20 Feb 2023 13:39:41 +0000 (14:39 +0100)]

BUG/MINOR: quic: Do not send too small datagrams (with Initial packets)

Before building a packet into a datagram, ensure there is sufficient space for at
least 1200 bytes. Also pad datagrams with only one ack-eliciting Initial packet
inside.

Must be backported to 2.7 and 2.6.

commit | commitdiff | tree

Aurelien DARRAGON [Fri, 3 Mar 2023 12:11:36 +0000 (13:11 +0100)]

MINOR: http_ext: adding some documentation, forgot to inline function

Making http_7239_valid_obfsc() inline because it is only called by inline
functions.

Removing dead comment and documenting proxy_http_parse_{7239,xff,xot} functions.

No backport needed.

commit | commitdiff | tree

Amaury Denoyelle [Fri, 3 Mar 2023 16:11:10 +0000 (17:11 +0100)]

BUG/MINOR: cli: fix CLI handler "set anon global-key" call

Anonymization mode has two CLI handlers "set anon <on|off>" and "set
anon global-key". The last one only requires admin level. However, as
cli_find_kw() is implemented, only the first handler will be retrieved
as they both start with the same prefix "set anon".

This has the effect to execute the wrong handler for "set anon
global-key" with an error message about an invalid keyword. To fix this,
handlers definition have been separated for both "set anon on" and "set
anon off" commands. This allows to have minimal changes while keeping
the same "set anon" prefix for each commands.

Also take this opportunity to fix a reference to a non-existing "set
global-key" CLI handler in the documentation.

This must be backported up to 2.7.

commit | commitdiff | tree

Amaury Denoyelle [Wed, 22 Feb 2023 09:44:27 +0000 (10:44 +0100)]

BUG/MEDIUM: quic: properly handle duplicated STREAM frames

When a STREAM frame is re-emitted, it will point to the same stream
buffer as the original one. If an ACK is received for either one of
these frame, the underlying buffer may be freed. Thus, if the second
frame is declared as lost and schedule for retransmission, we must
ensure that the underlying buffer is still allocated or interrupt the
retransmission.

Stream buffer is stored as an eb_tree indexed by the stream ID. To avoid
to lookup over a tree each time a STREAM frame is re-emitted, a lost
STREAM frame is flagged as QUIC_FL_TX_FRAME_LOST.

In most cases, this code is functional. However, there is several
potential issues which may cause a segfault :
- when explicitely probing with a STREAM frame, the frame won't be
  flagged as lost
- when splitting a STREAM frame during retransmission, the flag is not
  copied

To fix both these cases, QUIC_FL_TX_FRAME_LOST flag has been converted
to a <dup> field in quic_stream structure. This field is now properly
copied when splitting a STREAM frame. Also, as this is now an inner
quic_frame field, it will be copied automatically on qc_frm_dup()
invocation thus ensuring that it will be set on probing.

This issue was encounted randomly with the following backtrace :
#0  __memmove_avx512_unaligned_erms ()
#1  0x000055f4d5a48c01 in memcpy (__len=18446698486215405173, __src=<optimized out>,
#2  quic_build_stream_frame (buf=0x7f6ac3fcb400, end=<optimized out>, frm=0x7f6a00556620,
#3  0x000055f4d5a4a147 in qc_build_frm (buf=buf@entry=0x7f6ac3fcb5d8,
#4  0x000055f4d5a23300 in qc_do_build_pkt (pos=<optimized out>, end=<optimized out>,
#5  0x000055f4d5a25976 in qc_build_pkt (pos=0x7f6ac3fcba10,
#6  0x000055f4d5a30c7e in qc_prep_app_pkts (frms=0x7f6a0032bc50, buf=0x7f6a0032bf30,
#7  qc_send_app_pkts (qc=0x7f6a0032b310, frms=0x7f6a0032bc50) at src/quic_conn.c:4184
#8  0x000055f4d5a35f42 in quic_conn_app_io_cb (t=0x7f6a0009c660, context=0x7f6a0032b310,

This should fix github issue #2051.

This should be backported up to 2.6.

commit | commitdiff | tree

Remi Tricot-Le Breton [Thu, 2 Mar 2023 14:49:55 +0000 (15:49 +0100)]

BUG/MINOR: ssl: Use 'date' instead of 'now' in ocsp stapling callback

In the OCSP response callback, instead of using the actual date of the
system, the scheduler's 'now' timer is used when checking a response's
validity.

This patch can be backported to all stable versions.

commit | commitdiff | tree

Remi Tricot-Le Breton [Thu, 2 Mar 2023 14:49:54 +0000 (15:49 +0100)]

MINOR: ssl: Replace now.tv_sec with date.tv_sec in ocsp update task

Instead of relying on the scheduler's timer in the main ocsp update
task, we use the actual system's date.

commit | commitdiff | tree

Remi Tricot-Le Breton [Thu, 2 Mar 2023 14:49:53 +0000 (15:49 +0100)]

BUG/MINOR: ssl: Fix ocsp-update when using "add ssl crt-list"

When adding a new certificate through the CLI and appending it to a
crt-list with the 'ocsp-update' option set, the new certificate would
not be added to the OCSP response update list.
The only thing that was missing was the copy of the ocsp_update mode
from the ssl_bind_conf into the ckch_store's object.
An extra wakeup of the update task also needed to happen in case the
newly inserted entry needs to be updated before the next wakeup of the
task.

This patch does not need to be backported.

commit | commitdiff | tree

Remi Tricot-Le Breton [Thu, 2 Mar 2023 14:49:52 +0000 (15:49 +0100)]

MINOR: ssl: Add ocsp-update information to "show ssl crt-list"

The "show ssl crt-list <list>" CLI command did not manage the new
ocsp-update option yet.

commit | commitdiff | tree

Remi Tricot-Le Breton [Thu, 2 Mar 2023 14:49:51 +0000 (15:49 +0100)]

REGTESTS: ssl: Add test for new ocsp update cli commands

Add tests for the "show ssl ocsp-updates" cli command as well as the new
'base64' parameter that can be passed to the "show ssl ocsp-response"
command.

commit | commitdiff | tree

Remi Tricot-Le Breton [Tue, 28 Feb 2023 16:46:30 +0000 (17:46 +0100)]

REGTESTS: ssl: Fix ocsp update crt-lists

The options were after the filters which does not work well and now
raises a warning. It did not break the regtest because the crt-lists
were not actually used by clients.

commit | commitdiff | tree

Remi Tricot-Le Breton [Tue, 28 Feb 2023 16:46:29 +0000 (17:46 +0100)]

MINOR: ssl: Add global options to modify ocsp update min/max delay

The minimum and maximum delays between two automatic updates of a given
OCSP response can now be set via global options. It allows to limit the
update rate of OCSP responses for configurations that use many frontend
certificates with the ocsp-update option set if the updates are deemed
too costly.

commit | commitdiff | tree

Remi Tricot-Le Breton [Tue, 28 Feb 2023 16:46:28 +0000 (17:46 +0100)]

MINOR: ssl: Add way to dump ocsp response in base64

A new format option can be passed to the "show ssl ocsp-response" CLI
command to dump the contents of an OCSP response in base64. This is
needed because thanks to the new OCSP auto update mechanism, we could
end up using an OCSP response internally that was never provided by the
user.

commit | commitdiff | tree

Remi Tricot-Le Breton [Tue, 28 Feb 2023 16:46:27 +0000 (17:46 +0100)]

MINOR: ssl: Increment OCSP update replay delay in case of failure

In case of successive OCSP update errors for a given OCSP response, the
retry delay will be multiplied by 2 for every new failure in order to
avoid retrying too often to update responses for which the responder is
unresponsive (for instance). The maximum delay will still be taken into
account so the OCSP update requests will wtill be sent at least every
hour.

commit | commitdiff | tree

Remi Tricot-Le Breton [Tue, 28 Feb 2023 16:46:26 +0000 (17:46 +0100)]

MINOR: ssl: Reorder struct certificate_ocsp members

Just swapping those two 'refcount' and 'response' members enables to
fill two 4 bytes holes in the structure.

commit | commitdiff | tree

Remi Tricot-Le Breton [Tue, 28 Feb 2023 16:46:25 +0000 (17:46 +0100)]

MINOR: ssl: Use dedicated proxy and log-format for OCSP update

Instead of using the same proxy as other http client calls (through lua
for instance), the OCSP update will use a dedicated proxy which will
enable it to change the log format and log conditions (for instance).
This proxy will have the NOLOGNORM option and regular logging will be
managed by the update task itself because in order to dump information
related to OCSP updates, we need to control the moment when the logs are
emitted (instead or relying on the stream's life which is decorrelated
from the update itself).
The update task then calls sess_log directly, which uses a dedicated
ocsp logformat that fetches specific OCSP data. Sess_log was preferred
to the more low level app_log because it offers the strength of
"regular" sample fetches and allows to add generic information alongside
OCSP ones in the log line.
In case of connection error (unreachable server for instance), a regular
httpclient log line will also be emitted. This line will have some extra
HTTP related info that can't be provided by the ocsp update logging
mechanism.

commit | commitdiff | tree

Remi Tricot-Le Breton [Tue, 28 Feb 2023 16:46:24 +0000 (17:46 +0100)]

MINOR: ssl: Add sample fetches related to OCSP update

This patch adds a series of sample fetches that rely on the specified
OCSP update context structure. They will then be of use only in the
context of an ongoing OCSP update.
They cannot be used directly in the configuration so they won't be made
public. They will be used in the OCSP update's specific log format which
should be emitted by the update task itself in a future patch.

commit | commitdiff | tree

Remi Tricot-Le Breton [Tue, 28 Feb 2023 16:46:23 +0000 (17:46 +0100)]

MINOR: ssl: Add 'show ssl ocsp-updates' CLI command

This command can be used to dump information about the entries contained
in the ocsp update tree. It will display one line per concerned OCSP
response and will contain the expected next update time as well as the
time of the last successful update, and the number of successful and
failed attempts.

commit | commitdiff | tree

Remi Tricot-Le Breton [Wed, 1 Mar 2023 15:11:50 +0000 (16:11 +0100)]

MINOR: ssl: Add certificate's path to certificate_ocsp structure

In order to have some information about the frontend certificate when
dumping the contents of the ocsp update tree from the cli, we could
either keep a reference to a ckch_store in the certificate_ocsp
structure, which might cause some dangling reference problems, or
simply copy the path to the certificate in the ocsp response structure.
This latter solution was chosen because of its simplicity.

commit | commitdiff | tree

Remi Tricot-Le Breton [Tue, 28 Feb 2023 16:46:21 +0000 (17:46 +0100)]

MINOR: ssl: Store specific ocsp update errors in response and update ctx

Those new specific error codes will enable to know a bit better what
went wrong during and OCSP update process. They will come to use in
future sample fetches as well as in debugging means (via the cli or
future traces).

commit | commitdiff | tree

Remi Tricot-Le Breton [Tue, 28 Feb 2023 16:46:20 +0000 (17:46 +0100)]

MINOR: ssl: Add ocsp update success/failure counters

Those counters will be used for debugging purposes and will be dumped
via a cli command.

commit | commitdiff | tree

Remi Tricot-Le Breton [Tue, 28 Feb 2023 16:46:19 +0000 (17:46 +0100)]

MINOR: ssl: Reinsert ocsp update entries later in case of unknown error

In case of allocation error during the construction of an OCSP request
for instance, we would have ended reinserting the ocsp entry at the same
place in the ocsp update tree which could potentially lead to an
"endless" loop of errors in ssl_ocsp_update_responses. In such a case,
entries are now reinserted further in the tree (1 minute later) in order
to avoid such a chain of alloc failure.

Mirror of https://github.com/haproxy/haproxy.git

RSS Atom