MAJOR: threads: change thread_isolate to support inter-group synchronization
thread_isolate() and thread_isolate_full() were relying on a set of thread
masks for all threads in different states (rdv, harmless, idle). This cannot
work anymore when the number of threads increases beyond LONGBITS so we need
to change the mechanism.
What is done here is to have a counter of requesters and the number of the
current isolated thread. Threads which want to isolate themselves increment
the request counter and wait for all threads to be marked harmless (or idle)
by scanning all groups and watching the respective masks. This is possible
because threads cannot escape once they discover this counter, unless they
also want to isolate and possibly pass first. Once all threads are harmless,
the requesting thread tries to self-assign the isolated thread number, and
if it fails it loops back to checking all threads. If it wins, it is
guaranteed to be alone and can drop its harmless bit, so that other competing
threads go back to the loop waiting for all threads to be harmless. The
benefit of proceeding this way is that there is very little write contention
on the thread number (none during work), hence the cache line does not bounce
between caches, and frozen threads do not slow down the isolated one.
Once it's done, the isolated thread resets the thread number (hence lets
another thread take the place) and decrements the requester count, thus
possibly releasing all harmless threads.
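For illustration, a minimal sketch of this scheme using C11 atomics could look
like the following; the identifiers and the wait helper are illustrative, not
the real haproxy ones, and the harmless-bit handling is reduced to comments:

  #include <limits.h>
  #include <stdatomic.h>

  static atomic_uint rdv_requests;                /* threads wanting isolation */
  static atomic_uint isolated_thread = UINT_MAX;  /* tid of the isolated thread */

  /* placeholder: the real code scans the per-group harmless/idle masks */
  static void wait_for_all_harmless(void) { }

  void thread_isolate_sketch(unsigned int my_tid)
  {
      unsigned int none;

      atomic_fetch_add(&rdv_requests, 1);
      /* mark ourselves harmless here so that other requesters may proceed */
      do {
          wait_for_all_harmless();        /* all threads harmless or idle */
          none = UINT_MAX;
      } while (!atomic_compare_exchange_strong(&isolated_thread, &none, my_tid));
      /* we won the race: drop our harmless bit, we now run alone */
  }

  void thread_release_sketch(void)
  {
      atomic_store(&isolated_thread, UINT_MAX);  /* let another requester in */
      atomic_fetch_sub(&rdv_requests, 1);        /* possibly release waiters */
  }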
With this change there's no more need for any global mask to synchronize
any thread, and we only need to loop over a number of groups to check
64 threads at a time per iteration. As such, tinfo's threads_want_rdv
could be dropped.
This was tested with 64 threads spread into 2 groups, running 64 tasks
(from the debug dev command), 20 "show sess" (thread_isolate()), 20
"add server blah/blah" (thread_isolate()), and 20 "del server blah/blah"
(thread_isolate_full()). The load remained very low (limited by external
socat forks) and no stuck nor starved thread was found.
Willy Tarreau [Tue, 28 Jun 2022 17:29:29 +0000 (19:29 +0200)]
MEDIUM: thread: make stopping_threads per-group and add stopping_tgroups
Stopping threads need a mask to figure out which ones are still there without
scanning everything in the poll loop, so this mask will have to be per-group.
We also need a global mask of stopping groups to know which groups were
already signaled. It is used both to figure out which thread is the first one
to catch the event, and which one is the first one to detect the end of the
last job. The logic isn't changed, though a loop is required in the slow path
to make sure all threads are aware of the end.
Note that for now the soft-stop still takes time for group IDs > 1 as the
poller is not yet started on these threads and needs to expire its timeout
as there's no way to wake it up. But all threads are eventually stopped.
Willy Tarreau [Mon, 27 Jun 2022 14:02:24 +0000 (16:02 +0200)]
MEDIUM: tinfo: add a dynamic thread-group context
The thread group info is not sufficient to represent a thread group's
current state as it's read-only. We also need something comparable to
the thread context to represent the aggregate state of the threads in
that group. This patch introduces ha_tgroup_ctx[] and tg_ctx for this.
It's indexed on the group id and must be cache-line aligned. The thread
masks that were global and that do not need to remain global were moved
there (want_rdv, harmless, idle).
Given that all the masks placed there now become group-specific, the
associated thread mask (tid_bit) now switches to the thread's local
bit (ltid_bit). Both are the same for nbtgroups 1 but will differ for
other values.
There's also a tg_ctx pointer in the thread so that it can be reached
from other threads.
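For illustration, the rough shape of this per-group context could look like
the sketch below (field names taken from the description above; the real
structure and its alignment macro may differ):

  /* aggregate state of all threads of a group, written to by its threads */
  struct tgroup_ctx {
      unsigned long threads_want_rdv;   /* threads requesting a rendez-vous */
      unsigned long threads_harmless;   /* threads harmless for isolation   */
      unsigned long threads_idle;       /* threads idle in the poller       */
  } __attribute__((aligned(64)));       /* one cache line per group         */

  extern struct tgroup_ctx ha_tgroup_ctx[];  /* indexed on the group id - 1 */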
Willy Tarreau [Mon, 27 Jun 2022 13:05:44 +0000 (15:05 +0200)]
CLEANUP: thread: remove thread_sync_release() and thread_sync_mask
This function was added in 2.0 when reworking the thread isolation
mechanism to make it more reliable. However, it is fundamentally
incompatible with the full isolation mechanism provided by
thread_isolate_full(), since that one will wait for all threads to
become idle while the former will wait for all threads to finish
waiting, causing a deadlock.
Given that it's not used, let's just drop it entirely before it gets
used by accident.
Willy Tarreau [Fri, 24 Jun 2022 13:55:11 +0000 (15:55 +0200)]
MINOR: thread: add a new all_tgroups_mask variable to know about active tgroups
In order to kill all_threads_mask we'll need to have an equivalent for
the thread groups. The all_tgroups_mask does just this: it keeps one bit
set per enabled group.
Willy Tarreau [Mon, 27 Jun 2022 14:23:44 +0000 (16:23 +0200)]
MINOR: thread: use ltid_bit in ha_tkillall()
Since commit cc7a11ee3 ("MINOR: threads: set the tid, ltid and their bit
in thread_cfg") we ought not use (1UL << thr) to get the group mask for
thread <thr>, but (ha_thread_info[thr].ltid_bit). ha_tkillall() needs
this.
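In code terms, the change boils down to a substitution of this form (sketch;
the struct declaration is reduced to the field of interest):

  struct thread_info { unsigned long ltid_bit; /* ... other fields ... */ };
  extern struct thread_info ha_thread_info[];

  /* before: bit relative to the whole process, wrong beyond one group */
  #define THR_BIT_OLD(thr)  (1UL << (thr))

  /* after: bit local to the thread's own group */
  #define THR_BIT_NEW(thr)  (ha_thread_info[(thr)].ltid_bit)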
Willy Tarreau [Mon, 27 Jun 2022 14:22:22 +0000 (16:22 +0200)]
MINOR: clock: use ltid_bit in clock_report_idle()
Since commit cc7a11ee3 ("MINOR: threads: set the tid, ltid and their bit
in thread_cfg") we ought not use (1UL << thr) to get the group mask for
thread <thr>, but (ha_thread_info[thr].ltid_bit). clock_report_idle()
needs this.
This also implies not using all_threads_mask anymore but taking the mask
from the tgroup since it becomes relative now.
Willy Tarreau [Mon, 27 Jun 2022 14:20:13 +0000 (16:20 +0200)]
MINOR: wdt: use ltid_bit in wdt_handler()
Since commit cc7a11ee3 ("MINOR: threads: set the tid, ltid and their bit
in thread_cfg") we ought not use (1UL << thr) to get the group mask for
thread <thr>, but (ha_thread_info[thr].ltid_bit). wdt_handler() needs
this.
Willy Tarreau [Mon, 27 Jun 2022 14:13:50 +0000 (16:13 +0200)]
MINOR: debug: use ltid_bit in ha_thread_dump()
Since commit cc7a11ee3 ("MINOR: threads: set the tid, ltid and their bit
in thread_cfg") we ought not use (1UL << thr) to get the group mask for
thread <thr>, but (ha_thread_info[thr].ltid_bit). ha_thread_dump() needs
this.
Willy Tarreau [Fri, 24 Jun 2022 13:18:49 +0000 (15:18 +0200)]
MINOR: tinfo: add the mask of enabled threads in each group
In order to replace the global "all_threads_mask" we'll need to have an
equivalent per group. Take this opportunity to call it threads_enabled
and to make it clear which threads are counted there (in case we later
allow some to be stopped).
Willy Tarreau [Tue, 28 Jun 2022 15:48:07 +0000 (17:48 +0200)]
MINOR: tinfo: replace the tgid with tgid_bit in tgroup_info
Now that the tgid is accessible from the thread, it's pointless to have
it in the group, and it was only set but never used. However we'll soon
frequently need the mask corresponding to the group ID, and the risk of
getting it wrong (forgetting the +1 offset, or shifting 1 instead of 1UL)
is significant, so let's store the tgid_bit there.
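A short sketch of the pitfall this avoids (only tgid_bit comes from this
patch, the surrounding declarations are reduced for illustration):

  struct tgroup_info { unsigned long tgid_bit; /* ... other fields ... */ };
  extern struct tgroup_info ha_tgroup_info[];

  /* risky: int-width shift, and the -1 offset is easy to forget */
  unsigned long group_mask_by_hand(int tgid)
  {
      return 1 << (tgid - 1);
  }

  /* safe: the group stores its own precomputed 1UL << (tgid - 1) */
  unsigned long group_mask_stored(int tgid)
  {
      return ha_tgroup_info[tgid - 1].tgid_bit;
  }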
Willy Tarreau [Tue, 28 Jun 2022 08:49:57 +0000 (10:49 +0200)]
MINOR: tinfo: add the tgid to the thread_info struct
At several places we're dereferencing the thread group just to retrieve
the group number, and this will become even more frequent once we start
to use per-group contexts. Let's just add the tgid to the thread_info
struct to make this easier.
Willy Tarreau [Mon, 20 Jun 2022 07:23:24 +0000 (09:23 +0200)]
MEDIUM: tasks/fd: replace sleeping_thread_mask with a TH_FL_SLEEPING flag
Every single place where sleeping_thread_mask was still used was to test
or set a single thread. We can now add a per-thread flag to indicate a
thread is sleeping, and remove this shared mask.
The wake_thread() function now always performs an atomic fetch-and-or
instead of a first load then an atomic OR. That's cleaner and more
reliable.
This is not easy to test, as broadcast FD events are rare. The good
way to test for this is to run a very low rate-limited frontend with
a listener that listens to the fewest possible threads (2), and to
send it only 1 connection at a time. The listener will periodically
pause and the wakeup task will sometimes wake up on a random thread
and will call wake_thread().
Alternatively, disabling/enabling a frontend in loops via the CLI also
broadcasts such events, but they're more difficult to observe since this
causes connection failures.
Willy Tarreau [Wed, 22 Jun 2022 13:38:38 +0000 (15:38 +0200)]
MEDIUM: thread: add a new per-thread flag TH_FL_NOTIFIED to remember wakeups
Right now when an inter-thread wakeup happens, we first check whether the
thread was asleep, and if so we wake the poller up and remove its bit from
the sleeping mask. That's not very clean because the sleeping mask cannot
be entirely trusted: a thread that's about to wake up will already have
its sleeping bit removed.
This patch adds a new per-thread flag (TH_FL_NOTIFIED) to remember that a
thread was notified to wake up. It's cleared before checking the task lists
last, so that new wakeups can be considered again (since wake_thread() is
only used to notify about task wakeups and FD polling changes). This way
we do not need to modify a remote thread's sleeping mask anymore. As such
wake_thread() now only tests and sets the TH_FL_NOTIFIED flag but doesn't
clear sleeping anymore.
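A hedged sketch of the resulting logic, using gcc builtins and illustrative
names (the real code uses haproxy's per-thread context and atomic macros, and
the flag values below are made up):

  #define TH_FL_SLEEPING  0x01   /* illustrative values */
  #define TH_FL_NOTIFIED  0x02

  extern unsigned int th_flags[];        /* one flags word per thread */
  extern void wake_up_poller(int thr);   /* assumed helper (pipe/eventfd) */

  void wake_thread_sketch(int thr)
  {
      unsigned int prev = __atomic_fetch_or(&th_flags[thr], TH_FL_NOTIFIED,
                                            __ATOMIC_SEQ_CST);

      /* only disturb the poller if the thread was sleeping and had not
       * already been notified
       */
      if ((prev & (TH_FL_SLEEPING | TH_FL_NOTIFIED)) == TH_FL_SLEEPING)
          wake_up_poller(thr);
  }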
Willy Tarreau [Thu, 23 Jun 2022 16:31:08 +0000 (18:31 +0200)]
MINOR: poller: update_fd_polling: wake a random other thread
When enabling an FD that's only bound to another thread, instead of
always picking the first one, let's pick a random one. This is rarely
used (when enabling a frontend, or when a session rate-limiting period
ends), and makes it less likely that some obscure corner case degenerates
into a poorly distributed load.
Willy Tarreau [Thu, 23 Jun 2022 16:38:06 +0000 (18:38 +0200)]
MEDIUM: polling: make update_fd_polling() not care about sleeping threads
Till now, update_fd_polling() used to check if all the target threads were
sleeping, and only then would wake an owning thread up. This causes several
problems, among which the need for the sleeping_thread_mask and the fact that
by the time we wake one thread up, the sleeping status may already have
changed.
This commit changes this by leaving it to wake_thread() to perform this
check on the selected thread, since wake_thread() is already capable of
doing this now. Concretely speaking, for updt_fd_polling() it will mean
performing one ffsl() computation before knowing the sleeping status on a
global FD state change, which is very rare and not important here, as it
basically happens after relaxing a rate-limit (i.e. once a second at best)
or after enabling a frontend from the CLI; thus we don't care.
Willy Tarreau [Wed, 22 Jun 2022 13:21:34 +0000 (15:21 +0200)]
MINOR: poller: centralize poll return handling
When returning from the polling syscall, all pollers have a certain
dance to follow, made of wall clock updates, thread harmless updates,
idle time management and sleeping mask updates. Let's have a centralized
function to deal with all of this boring stuff: fd_leaving_poll(), and
make all the pollers use it.
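In substance, the helper gathers the dance described above; a hedged sketch
follows (the helper names are those of existing haproxy functions, but their
exact prototypes are not re-checked here):

  void clock_leaving_poll(int timeout, int interrupted);
  void thread_harmless_end(void);
  void thread_idle_end(void);

  static void fd_leaving_poll_sketch(int timeout, int status)
  {
      clock_leaving_poll(timeout, status);  /* wall clock + idle accounting */
      thread_harmless_end();                /* leave the "harmless" state   */
      thread_idle_end();                    /* leave the "idle" state       */
      /* and finally clear this thread's sleeping indication */
  }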
Willy Tarreau [Wed, 22 Jun 2022 07:19:46 +0000 (09:19 +0200)]
MINOR: thread: only use atomic ops to touch the flags
The thread flags are occasionally touched by other threads (e.g. the STUCK
flag may be set by other ones) and are occasionally read by them as well. As
such we must only use atomic ops to manipulate them. Most places were already
using them, but here we generalize the practice. Only ha_thread_dump() does
not change because it's run under isolation.
Willy Tarreau [Wed, 22 Jun 2022 07:00:08 +0000 (09:00 +0200)]
MINOR: thread: move the flags to the shared cache line
The thread flags were once believed to be local to the thread, but as
it stands, even the STUCK flag is shared since it's looked at by the
watchdog. As such we'll need to use atomic ops to manipulate them, and
likely to move them into the shared area.
This patch only moves the flag into the shared area so that we can later
decide whether it's best to leave them there or to move them back to the
local area. Interestingly, some tests have shown a 3% better performance
on dequeuing with this, even though the flags are not used by other threads
yet, so there are definitely alignment effects that might change over time.
Willy Tarreau [Mon, 20 Jun 2022 07:14:40 +0000 (09:14 +0200)]
MINOR: thread: make wake_thread() take care of the sleeping threads mask
Almost every call place of wake_thread() checks for sleeping threads and
clears the sleeping mask itself, while the function is solely used for this
purpose. Let's move the check and the clearing of the bit inside the function
itself. Note that updt_fd_polling() still performs the check because its
rules are a bit different.
Willy Tarreau [Thu, 16 Jun 2022 14:10:05 +0000 (16:10 +0200)]
MEDIUM: queue: revert to regular inter-task wakeups
Now that the inter-task wakeups are cheap, there's no point in using
task_instant_wakeup() anymore when dequeueing tasks. The use of the
regular task_wakeup() is sufficient there and will preserve a better
fairness: the test that went from 40k to 570k RPS now gives 580k RPS
(down from 585k RPS with previous commit). This essentially reverts
commit 27fab1dcb ("MEDIUM: queue: use tasklet_instant_wakeup() to
wake tasks").
Willy Tarreau [Thu, 16 Jun 2022 14:28:01 +0000 (16:28 +0200)]
MEDIUM: task: use regular eb32 trees for the run queues
Since we don't mix tasks from different threads in the run queues
anymore, we don't need to use the eb32sc_ trees and we can switch
to the regular eb32 ones. This uses cheaper lookup and insert code,
and a 16-thread test on the queues shows a performance increase
from 570k RPS to 585k RPS.
Willy Tarreau [Thu, 16 Jun 2022 13:59:36 +0000 (15:59 +0200)]
MINOR: task: replace global_tasks_mask with a check for tree's emptiness
This bit field used to be a per-thread cache of the result of the last
lookup of the presence of a task for each thread in the shared run queue.
Since each thread now has its own shared run queue, a test of emptiness is
sufficient to decide whether or not the shared tree has a task for the
current thread. Let's just remove this mask.
Willy Tarreau [Thu, 16 Jun 2022 13:52:49 +0000 (15:52 +0200)]
MINOR: task: remove grq_total and use rq_total instead
grq_total was only used to know how many tasks were being queued in the
global runqueue for stats purposes, and that was transferred to the per
thread rq_total counter once assigned. We don't need this anymore since
we know where they are, so let's just directly update rq_total and drop
that one.
Willy Tarreau [Thu, 16 Jun 2022 13:30:50 +0000 (15:30 +0200)]
MEDIUM: task: move the shared runqueue to one per thread
Since we only use the shared runqueue to put tasks assigned to known
threads, let's move that runqueue to each of these threads. The goal will
be to arrange an N*(N-1) mesh instead of a central contention point.
The global_rqueue_ticks had to be dropped (for good) since we'll now
use the per-thread rqueue_ticks counter for both trees.
A few points to note:
- the rq_lock still remains the global one for now so there should not
be any gain in doing this, but should this trigger any regression, it
is important to detect whether it's related to the lock or to the tree.
- there's no more reason for using the scope-based version of the ebtree
now, we could switch back to the regular eb32_tree.
- it's worth checking if we still need TASK_GLOBAL (probably only to
delete a task in one's own shared queue maybe).
Willy Tarreau [Thu, 16 Jun 2022 13:44:35 +0000 (15:44 +0200)]
MINOR: task: make rqueue_ticks atomic
The runqueue ticks counter is per-thread and wasn't initially meant to
be shared. We'll soon have to share it so let's make it atomic. It's
only updated when waking up a task, and no performance difference was
observed. It was moved into the thread_ctx struct so that it doesn't
pollute the local cache line when it's later updated by other threads.
Willy Tarreau [Thu, 16 Jun 2022 13:36:58 +0000 (15:36 +0200)]
CLEANUP: task: remove the unused task_unlink_rq()
This function stopped being used before 2.4 because either the task is
dequeued by the scheduler itself and it knows where to find it, or it's
killed by any thread, and task_kill() must be used for this as only this
one is safe.
It's difficult to say whether task_unlink_rq() is still safe, but once
the lock moves to a thread declared in the task itself, it will be even
more difficult to keep it safe.
Let's just remove it now before someone reuses it and causes trouble.
Willy Tarreau [Wed, 15 Jun 2022 14:48:45 +0000 (16:48 +0200)]
MEDIUM: task: remove TASK_SHARED_WQ and only use t->tid
TASK_SHARED_WQ was set upon task creation and never changed afterwards.
Thus if a task was created to run anywhere (e.g. a check or a Lua task),
all its timers would always pass through the shared timers queue with a
lock. Now we know that tid<0 indicates a shared task, so we can use that
to decide whether or not to use the shared queue. The task might be
migrated using task_set_affinity() but it's always dequeued first so
the check will still be valid.
Not only this removes a flag that's difficult to keep synchronized with
the thread ID, but it should significantly lower the load on systems with
many checks. A quick test with 5000 servers and fast checks that were
saturating the CPU shows that the check rate increased by 20% (hence the
CPU usage dropped by 17%). It's worth noting that run_task_lists() almost
no longer appears in perf top now.
Willy Tarreau [Wed, 15 Jun 2022 14:37:33 +0000 (16:37 +0200)]
MEDIUM: task: only keep task_new_*() and drop task_new()
As previously advertised in comments, the mask-based task_new() is now
gone. The low-level function is now task_new_on(), which takes a thread
number or a negative value for "any thread", the latter being turned to
zero for thread-less builds since there's no shared WQ in this case. The
task_new_here() and task_new_anywhere() functions were adjusted
accordingly.
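Hedged usage sketch after the change (prototypes as described above; error
handling omitted):

  struct task;
  struct task *task_new_on(int thr);       /* thr >= 0: bound; thr < 0: any */
  struct task *task_new_here(void);        /* bound to the calling thread   */
  struct task *task_new_anywhere(void);    /* may run on any thread         */

  void example(void)
  {
      struct task *t1 = task_new_on(2);      /* runs on thread 2 only        */
      struct task *t2 = task_new_anywhere(); /* shared, tid < 0 internally   */
      struct task *t3 = task_new_here();     /* bound to the current thread  */
      (void)t1; (void)t2; (void)t3;
  }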
Willy Tarreau [Wed, 15 Jun 2022 14:35:51 +0000 (16:35 +0200)]
MEDIUM: applet: only keep appctx_new_*() and drop appctx_new()
This removes the mask-based variant so that from now on the low-level
function becomes appctx_new_on() and it takes either a thread number or
a negative value for "any thread". This way we can use task_new_on() and
task_new_anywhere() instead of task_new() which will soon disappear.
Willy Tarreau [Wed, 15 Jun 2022 13:44:00 +0000 (15:44 +0200)]
MAJOR: task: replace t->thread_mask with 1<<t->tid when thread mask is needed
At a few places the task's thread mask was needed. Now we know that it's
always either one bit or all bits of all_threads_mask, so we can replace it
with either 1<<tid or all_threads_mask depending on what's expected.
It's worth noting that the global_tasks_mask is still set this way and
that it's reaching its limits. Similarly, the task_new() API would deserve
an update to stop using a thread mask and use a thread number instead.
Similarly, task_set_affinity() should be updated to directly take a
thread number.
At this point the task's thread mask is not used anymore.
Willy Tarreau [Wed, 15 Jun 2022 12:31:38 +0000 (14:31 +0200)]
MAJOR: task: use t->tid instead of ffsl(t->thread_mask) to take the thread ID
At several places we need to figure the ID of the first thread allowed
to run a task. Till now this was performed using my_ffsl(t->thread_mask)
but since we now have the thread ID stored into the task, let's use it
instead. This is tagged major because it starts to assume that tid<0 is
strictly equivalent to atleast2(thread_mask), and that as such, the
current thread is always among the allowed ones.
Willy Tarreau [Wed, 15 Jun 2022 12:19:48 +0000 (14:19 +0200)]
MEDIUM: task: add and preset a thread ID in the task struct
The tasks currently rely on a mask but do not have an assigned thread ID,
contrary to tasklets. However, in practice they're either running on a
single thread or on any thread, so that it will be worth simplifying all
this in order to ease the transition to the thread groups.
This patch introduces a "tid" field in the task struct, that's either
the number of the thread the task is attached to, or a negative value
if the task is not bound to a thread, (i.e. its mask is all_threads_mask).
The new ID is only set and updated but not used yet.
Willy Tarreau [Wed, 15 Jun 2022 14:32:41 +0000 (16:32 +0200)]
MINOR: debug: remove mask support from "debug dev sched"
The thread mask will not be used anymore; only the thread ID is used.
Interestingly it was already implemented in the parsing but not used. The
single/multi thread argument is not needed anymore since it's sufficient
to pass tid<0 to get a multi-threaded task/tasklet.
This is in preparation for the removal of the thread_mask in tasks as
only this debug code was using it!
Willy Tarreau [Tue, 14 Jun 2022 08:43:01 +0000 (10:43 +0200)]
MINOR: tinfo: make tid temporarily still reflect global ID
For now we still set tid_bit to (1UL << tid) because FDs will not
work with more than one group without this, but once FDs start to
adopt local masks this must change to thr->ltid_bit.
BUG/MEDIUM: ssl/fd: unexpected fd close using async engine
Before 2.3, after an async crypto processing or on session close, the
engine's async file descriptors were removed from the fdtab but not closed,
because it is the engine which created the file descriptors and it is
responsible for closing them. In 2.3 the fd_remove() call was replaced by
fd_stop_both(), which stops the polling but does not remove the fd from the
fdtab, so the fd remains indefinitely in the fdtab.
A simple replacement by fd_delete() is not a valid fix because fd_delete()
removes the fd from the fdtab but also closes the fd, and the fd would then
be closed twice: by haproxy's core and by the engine itself.
Instead, let's set FD_DISOWN on the FD before calling fd_delete() which will
take care of not closing it.
This patch must be backported on branches >= 2.3, and it relies on this
previous patch:
MINOR: fd: add a new FD_DISOWN flag to prevent from closing a deleted FD
As mentioned in the patch above, a different flag will be needed in 2.3.
MINOR: fd: add a new FD_DISOWN flag to prevent from closing a deleted FD
Some FDs might be offered to some external code (external libraries)
which will deal with them until they close them. As such we must not
close them upon fd_delete() but we need to delete them anyway so that
they do not appear anymore in the fdtab. This used to be handled by
fd_remove() before 2.3 but we don't have this anymore.
This patch introduces a new flag FD_DISOWN to let fd_delete() know that
the core doesn't own the fd and it must not be closed upon removal from
the fdtab. This way it's totally unregistered from the poller but still
open.
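The intended usage pattern, as a hedged sketch (the exact flag field and
atomic macro may differ from the real code):

  /* the external library owns <fd>: unregister it without closing it */
  HA_ATOMIC_OR(&fdtab[fd].state, FD_DISOWN);
  fd_delete(fd);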
This patch must be backported on branches >= 2.3 because it will be
needed to fix a bug affecting SSL async. It should be adapted on 2.3
because state flags were stored in a different way (via bits in the
structure).
BUG/MINOR: mux-quic: do not signal FIN if gap in buffer
Adjust the FIN signal on the Rx path for the application layer: ensure that
the receive buffer has no gap.
Without this extra condition, FIN was signalled as soon as the STREAM
frame with FIN was received, even if we were still waiting to receive
missing offsets.
This bug could have led to incomplete requests being read by the
application protocol. However, in practice this bug has very little
chance to happen as the application layer ensures that the demuxed frame
length is equivalent to the buffer data size. The only way for it to happen
is to receive the STREAM frame with FIN while the H3 demuxer is still
processing a frame which is not the last one of the stream.
This must be backported up to 2.6. The previous patch on ncbuf is
required for the newly defined function ncb_is_fragmented().
MINOR: ncbuf: implement ncb_is_fragmented()
Implement a new status function for ncbuf. It quickly reports whether a
buffer contains data in a fragmented way, i.e. with gaps in between or at
the start of the buffer.
To summarize, a buffer is considered non-fragmented in the following cases:
- a null or empty buffer
- a full buffer
- a buffer containing exactly one data block at the beginning, followed by
  a gap until the end.
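A hedged usage sketch on the Rx path (only ncb_is_fragmented() comes from
this patch; the surrounding names are illustrative):

  #include <haproxy/ncbuf.h>

  /* only report FIN to the upper layer once no gap remains in the buffer */
  static int may_forward_fin(struct ncbuf *buf, int fin_received)
  {
      return fin_received && !ncb_is_fragmented(buf);
  }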
CLEANUP: mux-quic: adjust comment on qcs_consume()
Since a previous refactoring, the application protocol layer is not required
anymore to call qcs_consume(). This function is now automatically used by
the MUX itself.
With ~1500-byte QUIC datagrams, we can handle fewer than 200 datagrams,
which is less than the default maxpollevents value. This should reduce
the chances of filling up the connections' RX buffers as reported by
Tristan in GH #1737.
MINOR: quic: Improvements for the datagrams receipt
First we add a loop around recvfrom() in the lowest-level I/O handler
quic_sock_fd_iocb() to collect as many datagrams as possible during its
tasklet wakeup, with a limit: we recvfrom() at most "maxpollevents"
datagrams. Furthermore we add a local task list to the datagram handler
quic_lstnr_dghdlr() which is passed to the first datagram parser,
qc_lstnr_pkt_rcv(). This parser only identifies the connection associated
to the datagrams, then wakes up the highest-level packet parser I/O handlers
(quic_conn.*io_cb()) once it is done, thanks to the call to
tasklet_wakeup_after() which now replaces the call to tasklet_wakeup().
This should drastically reduce the latency and the chances of filling up
the RX buffers at the QUIC connection level as reported in GH #1737 by
Tristan.
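The shape of the new low-level read loop, as a simplified sketch (the
datagram hand-off and the real buffer management are elided; <max> stands
for the "maxpollevents" limit mentioned above):

  #include <sys/types.h>
  #include <sys/socket.h>

  static void drain_datagrams(int fd, int max)
  {
      char buf[2048];
      struct sockaddr_storage peer;
      socklen_t plen;
      int cnt;

      for (cnt = 0; cnt < max; cnt++) {
          ssize_t ret;

          plen = sizeof(peer);
          ret = recvfrom(fd, buf, sizeof(buf), 0,
                         (struct sockaddr *)&peer, &plen);
          if (ret <= 0)
              break;  /* socket drained (or error): stop for this wakeup */
          /* hand the datagram over to the datagram handler here */
      }
  }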
These modifications depend on this commit:
"MINOR: task: Add tasklet_wakeup_after()"
Must be backported to 2.6 with the previous commit.
MINOR: quic: Add new stats counter to diagnose RX buffer overrun
Remove the call to qc_list_all_rx_pkts(), which printed messages on stderr
during RX buffer overruns, and add a new counter for the number of packets
dropped because of such events.
BUG/MINOR: quic: Dropped packets not counted (with RX buffers full)
When the connection RX buffer is full, the received packets are dropped.
Some of them were not taken into account by the ->dropped_pkt counter.
This very simple patch has no impact at all on the packet handling workflow.
We want to be able to schedule a tasklet onto a thread after the current
tasklet is done. What we have to do is to insert this tasklet at the head of
the thread's task list. Furthermore, we would like to serialize the tasklets:
they must run in the same order as the order in which they have been
scheduled. This is implemented by passing a list of tasklets as parameter
(see the <head> parameter) which must be reused for subsequent calls.
_tasklet_wakeup_after_on() is implemented to accomplish this job.
tasklet_wakeup_after_on() and tasklet_wakeup_after() are only wrapper macros
around _tasklet_wakeup_after_on(). tasklet_wakeup_after_on() does exactly the
same thing as _tasklet_wakeup_after_on() without having to pass the file name
and line number as parameters (useful when DEBUG_TASK is enabled).
tasklet_wakeup_after() also hides the thread parameter, which is the <tl>
tasklet's thread ID.
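A hedged usage sketch based on the description above (the exact prototype,
in particular whether the head is returned or passed by address, is an
assumption):

  #include <haproxy/task.h>

  /* wake <tl1> then <tl2> right after the current tasklet, in that order */
  static void wake_two_in_order(struct tasklet *tl1, struct tasklet *tl2)
  {
      struct list *head = NULL;   /* reused across calls to keep ordering */

      head = tasklet_wakeup_after(head, tl1);
      head = tasklet_wakeup_after(head, tl2);
  }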
Return the QPACK_DECOMPRESSION_FAILED error code when dealing with dynamic
table references. This is justified as for now haproxy does not
implement dynamic table support and advertises a zero-sized table.
The H3 calling function will thus reuse this code in a CONNECTION_CLOSE
frame, in conformance with the QPACK RFC.
This proper error management makes it possible to remove obsolete ABORT_NOW
guards.
The main objective is to fix the coverity report about the usage of an
uninitialized variable when receiving dynamic table references. These
references are invalid as for the moment haproxy advertises a 0-sized
dynamic table. An ABORT_NOW clause is present to catch this. A following
patch will clean this up in order to properly handle QPACK errors with
CONNECTION_CLOSE.
This should fix github issue #1753.
No need to backport as this was introduced in the current dev branch.
Amaury Denoyelle [Thu, 30 Jun 2022 08:04:42 +0000 (10:04 +0200)]
MINOR: h3: handle errors on HEADERS parsing/QPACK decoding
Emit a CONNECTION_CLOSE if HEADERS parsing function returns an error.
This is useful to remove previous ABORT_NOW guards.
For the moment, the whole connection is closed. In the future, it may be
justified to only reset the faulting stream in some cases. This requires
the implementation of RESET_STREAM emission.
Willy Tarreau [Fri, 24 Jun 2022 20:09:05 +0000 (22:09 +0200)]
[RELEASE] Released version 2.7-dev1
Released version 2.7-dev1 with the following main changes :
- BUG/MINOR: ssl_ckch: Free error msg if commit changes on a cert entry fails
- BUG/MINOR: ssl_ckch: Free error msg if commit changes on a CA/CRL entry fails
- BUG/MEDIUM: ssl_ckch: Don't delete a cert entry if it is being modified
- BUG/MEDIUM: ssl_ckch: Don't delete CA/CRL entry if it is being modified
- BUG/MINOR: ssl_ckch: Don't duplicate path when replacing a cert entry
- BUG/MINOR: ssl_ckch: Don't duplicate path when replacing a CA/CRL entry
- BUG/MEDIUM: ssl_ckch: Rework 'commit ssl cert' to handle full buffer cases
- BUG/MEDIUM: ssl_ckch: Rework 'commit ssl ca-file' to handle full buffer cases
- BUG/MEDIUM: ssl/crt-list: Rework 'add ssl crt-list' to handle full buffer cases
- BUG/MEDIUM: httpclient: Don't remove HTX header blocks before duplicating them
- BUG/MEDIUM: httpclient: Rework CLI I/O handler to handle full buffer cases
- MEDIUM: httpclient: Don't close CLI applet at the end of a response
- MEDIUM: http-ana: Always report rewrite failures as PRXCOND in logs
- CLEANUP: Re-apply xalloc_size.cocci (2)
- REGTESTS: abortonclose: Add a barrier to not mix up log messages
- REGTESTS: http_request_buffer: Increase client timeout to wait "slow" clients
- CLEANUP: ssl_ckch: Use corresponding enum for commit_cacrlfile_ctx.cafile_type
- MINOR: ssl_ckch: Simplify I/O handler to commit changes on CA/CRL entry
- BUG/MINOR: ssl_ckch: Use right type for old entry in show_crlfile_ctx
- BUG/MINOR: ssl_ckch: Dump CRL transaction only once if show command yield
- BUG/MINOR: ssl_ckch: Dump CA transaction only once if show command yield
- BUG/MINOR: ssl_ckch: Dump cert transaction only once if show command yield
- BUG/MINOR: ssl_ckch: Init right field when parsing "commit ssl crl-file" cmd
- CLEANUP: ssl_ckch: Remove unused field in commit_cacrlfile_ctx structure
- MINOR: ssl_ckch: Simplify structure used to commit changes on CA/CRL entries
- MINOR: ssl_ckch: Remove service context for "set ssl cert" command
- MINOR: ssl_ckch: Remove service context for "set ssl ca-file" command
- MINOR: ssl_ckch: Remove service context for "set ssl crl-file" command
- BUG/MINOR: ssl_ckch: Fix possible uninitialized value in show_cert I/O handler
- BUG/MINOR: ssl_ckch: Fix possible uninitialized value in show_cafile I/O handler
- BUG/MINOR: ssl_ckch: Fix possible uninitialized value in show_crlfile I/O handler
- BUILD: ssl_ckch: Fix build error about a possible uninitialized value
- BUG/MINOR: ssl_ckch: Fix another possible uninitialized value
- REGTESTS: http_abortonclose: Extend supported versions
- REGTESTS: restrict_req_hdr_names: Extend supported versions
- MINOR: connection: support HTTP/3.0 for smp_*_http_major fetch
- MINOR: h3: add h3c pointer into h3s instance
- MINOR: mux-quic: simplify decode_qcs API
- MINOR: mux-quic/h3: adjust demuxing function return values
- BUG/MINOR: h3: fix return value on decode_qcs on error
- BUILD: quic: fix anonymous union for gcc-4.4
- BUILD: compiler: implement unreachable for older compilers too
- DEV: tcploop: reorder options in the usage message
- DEV: tcploop: make the current address the default address
- DEV: tcploop: make it possible to change the target address of a connect()
- DEV: tcploop: factor out the socket creation
- DEV: tcploop: permit port 0 to ease handling of default options
- DEV: tcploop: add a new "bind" command to bind to ip/port.
- DEV: tcploop: add minimal UDP support
- BUG/MINOR: trace: Test server existence for health-checks to get proxy
- BUG/MINOR: checks: Properly handle email alerts in trace messages
- BUG/MEDIUM: mailers: Set the object type for check attached to an email alert
- REGTESTS: healthcheckmail: Update the test to be functionnal again
- REGTESTS: healthcheckmail: Relax health-check failure condition
- BUG/MINOR: h3: fix incorrect BUG_ON assert on SETTINGS parsing
- MEDIUM: mux-h2: try to coalesce outgoing WINDOW_UPDATE frames
- OPTIM: mux-h2: increase h2_settings_initial_window_size default to 64k
- BUG/MINOR: h3: fix frame type definition
- BUG/MEDIUM: h3: fix SETTINGS parsing
- BUG/MINOR: cli/stats: add missing trailing LF after JSON outputs
- BUG/MINOR: server: do not enable DNS resolution on disabled proxies
- BUG/MINOR: cli/stats: add missing trailing LF after "show info json"
- DOC: design: update the notes on thread groups
- BUG/MEDIUM: mux-quic: fix flow control connection Tx level
- MINOR: mux-quic: complete BUG_ON on TX flow-control enforcing
- BUG/MINOR: mux-quic: fix memleak on frames rejected by transport
- BUG/MINOR: tcp-rules: Make action call final on read error and delay expiration
- CLEANUP: check: Remove useless tests on check's stream-connector
- BUG/MEDIUM: stconn: Don't wakeup applet for send if it won't consume data
- BUG/MEDIUM: cli: Notify cli applet won't consume data during request processing
- BUG/MEDIUM: mux-quic: fix segfault on flow-control frame cleanup
- MINOR: task: move profiling bit to per-thread
- CLEANUP: quic: use task_new_on() for single-threaded tasks
- MINOR: tinfo: remove the global thread ID bit (tid_bit)
- CLEANUP: hlua: check for at least 2 threads on a task
- MINOR: thread: get rid of MAX_THREADS_MASK
- OPTIM: task: do not consult shared WQ when we're already full
- DOC: design: update the task vs thread affinity requirements
- MINOR: qpack: add comments and remove a useless trace
- MINOR: qpack: reduce dependencies on other modules
- BUG/MINOR: qpack: support header litteral name decoding
- MINOR: qpack: add ABORT_NOW on unimplemented decoding
- BUG/MINOR: h3/qpack: deal with too many headers
- MINOR: qpack: improve decoding function
- MINOR: qpack: implement standalone decoder tool
- BUG/BUILD: h3: fix wrong label name
- BUG/MINOR: quic: Stop hardcoding Retry packet Version field
- MINOR: quic: Add several nonce and key definitions for Retry tag
- BUG/MINOR: quic: Wrong PTO calculation
- MINOR: quic: Parse long packet version from qc_parse_hd_form()
- CLEANUP: quid: QUIC draft-28 no more supported
- MEDIUM: quic: Add QUIC v2 draft support
- MINOR: quic: Released QUIC TLS extension for QUIC v2 draft
- MEDIUM: quic: Compatible version negotiation implementation (draft-08)
- CLEANUP: quic: Remove any reference to boringssl
- BUG/MINOR: task: fix thread assignment in tasklet_kill()
- BUG/MEDIUM: stream: Properly handle destructive client connection upgrades
- MINOR: stream: Rely on stconn flags to abort stream destructive upgrade
- CLEANUP: stconn: Don't expect to have no sedesc on detach
- BUG/MINOR: log: Properly test connection retries to fix dontlog-normal option
- MINOR: hlua: don't dump empty entries in hlua_traceback()
- MINOR: hlua: add a new hlua_show_current_location() function
- MEDIUM: debug: add a tainted flag when a shared library is loaded
- MEDIUM: debug: detect redefinition of symbols upon dlopen()
- BUILD: quic: Wrong HKDF label constant variable initializations
- BUG/MINOR: quic: Unexpected half open connection counter wrapping
- BUG/MINOR: quic_stats: Duplicate "quic_streams_data_blocked_bidi" field name
- BUG/MINOR: quic: purge conn Rx packet list on release
- BUG/MINOR: quic: free rejected Rx packets
- BUG/MINOR: qpack: abort on dynamic index field line decoding
- BUG/MEDIUM: ssl/cli: crash when crt inserted into a crt-list
- REGTESTS: ssl: add the same cert for client/server
- BUG/MINOR: quic: Acknowledgement must be forced during handshake
- MINOR: quic: Dump version_information transport parameter
- BUG/MEDIUM: mworker: use default maxconn in wait mode
- MINOR: intops: add a function to return a valid bit position from a mask
- TESTS: add a unit test for one_among_mask()
- BUILD: ssl_ckch: fix "maybe-uninitialized" build error on gcc-9.4 + ARM
- BUG/MINOR: ssl: Do not look for key in extra files if already in pem
- BUG/MINOR: quic: Missing acknowledgments for trailing packets
- BUG/MINOR: http-ana: Set method to HTTP_METH_OTHER when an HTTP txn is created
- BUG/MINOR: http-fetch: Use integer value when possible in "method" sample fetch
- MINOR: freq_ctr: Add a function to get events excess over the current period
- BUG/MINOR: stream: only free the req/res captures when set
- CLEANUP: pool/tree-wide: remove suffix "_pool" from certain pool names
- MEDIUM: debug: improve DEBUG_MEM_STATS to also report pool alloc/free
- BUG/MINOR: quic: Wrong reuse of fulfilled dgram RX buffer
- BUG/MAJOR: quic: Big RX dgrams leak when fulfilling a buffer
- BUG/MAJOR: quic: Big RX dgrams leak with POST requests
- BUILD: quic+h3: 32-bit compilation errors fixes
- MEDIUM: bwlim: Add support of bandwith limitation at the stream level
MEDIUM: bwlim: Add support of bandwith limitation at the stream level
This patch adds a filter to limit bandwidth at the stream level. Several
filters can be defined. A filter may limit incoming data (upload) or
outgoing data (download). The limit can be defined per-stream or shared via
a stick-table. For a given stream, the bandwidth limitation filters can be
enabled using the "set-bandwidth-limit" action.
A bandwidth limitation filter can be used indifferently for HTTP or TCP
streams. For HTTP streams, only the payload transfer is limited. The filter
is pretty simple for now, but it was designed to be extensible. The current
design tries, as far as possible, to never exceed the limit. There is no
burst.
In GH #1760 (which is marked as being a feature), there were compilation
errors on MacOS which could be reproduced on Linux when building 32-bit code
(-m32 gcc option). Most of them were due to mixed variable types in the
QUIC_MIN macro or to using the size_t type in place of the uint64_t type.
BUG/MAJOR: quic: Big RX dgrams leak with POST requests
This previous commit:
"BUG/MAJOR: Big RX dgrams leak when fulfilling a buffer"
partially fixed an RX dgram memleak. There was a missing break in the loop
which looks for the first datagram attached to an RX buffer's dgrams list
which may be reused (because it was consumed by the connection thread). So
when several dgrams were consumed by the connection thread and were present
in the RX buffer list, some were leaked because they were never reused: they
had been removed from their list.
Furthermore, as commented in this patch, there is always at least one dgram
object attached to an RX dgrams list, except the first time we enter this
I/O handler function for this RX buffer. So, there is no need to use a loop
to look up and reuse the first datagram in an RX buffer's dgrams list.
This issue was reproduced with the quiche client with plenty of POST requests
(100000 streams):
cargo run --bin quiche-client -- https://127.0.0.1:8080/helloworld.html
--no-verify -n 100000 --method POST --body /var/www/html/helloworld.html
and could also be reproduced with GET requests.
BUG/MAJOR: quic: Big RX dgrams leak when fulfilling a buffer
When entering the quic_sock_fd_iocb() I/O handler, which is responsible
for receiving datagrams via recvfrom(), the first thing which is done is
to try to reuse a dgram object containing metadata about the received
datagrams which has been consumed by the connection thread.
If this object could not be used for any reason, then when we "goto out"
of this function, we must release the memory allocated for this object,
or it will leak. Most of the time, this happened when we filled up a
buffer, as reported in GH #1749 by Tristan. This is why we added a
pool_free() call just before the out label. We set <new_dgram> to NULL
when it could successfully be used.
Thanks to Tristan and Willy for their participation on this issue.
BUG/MINOR: quic: Wrong reuse of fulfilled dgram RX buffer
After having filled up a buffer, then marked it as full, we must
consume the remaining space. But to do that, and not to erase the
already existing data, we must check that there is no remaining data
after the tail of the buffer (between the tail and the head).
This is done by adding a condition to test that adding the number of
bytes from the remaining contiguous space to the tail does not
pass the wrapping position in the buffer.
Willy Tarreau [Thu, 23 Jun 2022 08:54:17 +0000 (10:54 +0200)]
MEDIUM: debug: improve DEBUG_MEM_STATS to also report pool alloc/free
Sometimes using "debug dev memstats" can be frustrating because all
pool allocations are reported through pool-os.h and that's all.
But in practice there's nothing wrong with also intercepting pool_alloc,
pool_free and pool_zalloc and reporting their call counts and locations,
so that's what this patch does. It only uses an alternate set of macros
for these 3 calls when DEBUG_MEM_STATS is defined. The outputs are
reported as P_ALLOC (for both pool_alloc() and pool_zalloc()) and
P_FREE (for pool_free()).
Willy Tarreau [Thu, 23 Jun 2022 09:02:08 +0000 (11:02 +0200)]
CLEANUP: pool/tree-wide: remove suffix "_pool" from certain pool names
A curious practice seems to have started long ago and contaminated various
code areas, consisting in appending "_pool" at the end of the name of a
given pool. That makes no sense as the name is only used to name the pool
in diags such as "show pools", and since names are truncated there, this
adds some confusion when analysing the dump outputs. Let's just clean all
of them at once; they were essentially in SSL and QUIC.
Willy Tarreau [Thu, 23 Jun 2022 09:46:14 +0000 (11:46 +0200)]
BUG/MINOR: stream: only free the req/res captures when set
There's a subtle bug in stream_free() when releasing captures. The
pools may be NULL when no capture is defined, and the calls to
pool_free() are unconditional. The only reason why this doesn't
cause trouble is that the pointer to be freed is always NULL in
this case and we don't go further down the chain. That's particularly
ugly and it complicates debugging, so let's only call these ones when
the pointers are set.
There's no impact on running code, it only fools those trying to debug
pools manually. There's no need to backport it though, unless it helps
for debugging sessions.
MINOR: freq_ctr: Add a function to get events excess over the current period
The freq_ctr_overshoot_period() function may be used to retrieve the excess
of events over the current period for a given frequency counter, ignoring
the history. It is a way to compare the "current rate" (the number of events
over the current period) to a given rate and estimate the excess of events.
It may be used to safely add new events, especially at the beginning of the
current period for a frequency counter with a large period. This way, it is
possible to smoothly add events during the whole period without quickly
consuming all the quota at the beginning of the period and waiting for the
next one to be able to add new events.
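A hedged usage sketch (only the function name comes from this patch; the
exact prototype, counter then period then target frequency, is assumed):

  #include <haproxy/freq_ctr.h>

  /* don't add new events while the counter already exceeds <freq> events
   * per <period>
   */
  static int may_add_event(struct freq_ctr *ctr, unsigned int period,
                           unsigned int freq)
  {
      return freq_ctr_overshoot_period(ctr, period, freq) == 0;
  }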
BUG/MINOR: http-fetch: Use integer value when possible in "method" sample fetch
Because of the previous fix, if the HTTP parsing is performed when the
"method" sample fetch is called, we always rely on the string representation
of the request method.
Indeed, if no parsing was performed when the "method" sample fetch is
called, the transaction method is HTTP_METH_OTHER because it was just
initialized. However, without this patch, in this case, we always retrieve
the method by reading the request start-line.
Now, when the method is HTTP_METH_OTHER, we systematically try to parse the
request but the method is tested once again after the parsing to be able to
use the integer representation when possible.
BUG/MINOR: http-ana: Set method to HTTP_METH_OTHER when an HTTP txn is created
This patch is required to fix the "method" sample fetch. But it makes sense to
initialize the method of an HTTP transaction to HTTP_METH_OTHER. This way,
before the request parsing, the method is considered as unknown except if we
are able to retrieve the request start-line. It is especially important for
TCP streams.
About the "method" sample fetch, this patch is a way to be sure no random
method is returned when the sample fetch is used on a TCP stream before any
HTTP parsing.
BUG/MINOR: quic: Missing acknowledgments for trailing packets
This bug was revealed by the key_update QUIC tracker test. During this test,
after the handshake has succeeded, the client sends a 1-RTT packet containing
only a PING frame. On our side, we do not acknowledge it, because we have
no ack-eliciting packet to send. This is not correct. We must acknowledge all
the ack-eliciting packets unconditionally. But we must not send too many ACK
frames: only one every two packets received since we last sent an ACK frame.
This is the test
(nb_aepkts_since_last_ack >= QUIC_MAX_RX_AEPKTS_SINCE_LAST_ACK) which ensures
this is the case. But this condition must be checked at the very last moment,
when we are building a packet and when an acknowledgment is required. It is
not up to qc_may_build_pkt() to do that. This boolean function decides if
we have packets to send. It must return "true" if there are no more
ack-eliciting packets to send and if an acknowledgment is required.
We must also add a "force_ack" parameter to qc_build_pkt() to force the
acknowledgments during the handshakes (for each packet). If not, they will
be sent every two packets. This parameter must be passed to qc_do_build_pkt()
and be checked alongside the other conditions when building a packet to
decide whether or not to add an ACK frame to this packet.
BUG/MINOR: ssl: Do not look for key in extra files if already in pem
A bug was introduced by commit 9bf3a1f67eb3bc6f02abcabf8ab141840c7a1db2
"BUG/MINOR: ssl: Fix crash when no private key is found in pem".
If a private key is already contained in a pem file, we will still look
for a .key file and load its private key if it exists when we should
not.
This patch should be backported to all branches where the original fix
was backported (all the way to 2.2).
Willy Tarreau [Wed, 22 Jun 2022 03:40:25 +0000 (05:40 +0200)]
BUILD: ssl_ckch: fix "maybe-uninitialized" build error on gcc-9.4 + ARM
As reported in issue #1755, gcc-9.3 and 9.4 emit a "maybe-uninitialized"
warning in cli_io_handler_commit_cafile_crlfile() because it sees that
when the "path" variable is not set, we're jumping to the error label
inside the loop but cannot see that the new state will avoid the places
where the value is used. Thus it's a false positive but a difficult one.
Let's just preset the value to NULL to make it happy.
This was introduced in 2.7-dev by commit ddc8e1cf8 ("MINOR: ssl_ckch:
Simplify I/O handler to commit changes on CA/CRL entry"), thus no
backport is needed for now.
Willy Tarreau [Tue, 21 Jun 2022 18:19:54 +0000 (20:19 +0200)]
MINOR: intops: add a function to return a valid bit position from a mask
Sometimes we need to be able to signal one thread among a mask, without
caring much about which bit will be picked. At the moment we use ffsl()
for this but it sometimes results in an imbalance at certain specific
places where the first thread of a set is always the one selected.
Another approach would consist in using the rank finding function but it
requires a popcount and a setup phase, and possibly a modulo operation
depending on the popcount, which starts to be very expensive.
Here we take a different approach. The idea is that an input bit position is
passed, from 0 to LONGBITS-1, and that as much as possible we try to pick
the matching bit if it is set. Otherwise we look at a mirror position based
on a decreasing power of two, and jump to the side that still has bits left.
In 6 iterations it ends up spotting one bit among 64, and the operations are
very cheap and optimizable. This method
has the benefit that we don't care where the holes are located in the
mask, thus it shows a good distribution of output bits based on the
input ones. A long-time test shows an average of 16 cycles, or ~4ns
per lookup at 3.8 GHz, which is about twice as fast as using the rank
finding function.
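The following is an illustrative re-implementation of that idea for a 64-bit
mask (a sketch, not the actual code stored in tools.c):

  /* Return the position of one set bit of <mask> (which must be non-zero),
   * trying to honor the preferred position <bit> (0..63), otherwise jumping
   * to mirror positions by decreasing powers of two.
   */
  static unsigned int pick_one_among(unsigned long mask, unsigned int bit)
  {
      unsigned int half;

      for (half = 32; half; half /= 2) {
          /* window of size <half>, aligned on <half>, containing <bit> */
          unsigned long win = ((1UL << half) - 1) << (bit & ~(half - 1));

          if (!(mask & win))
              bit ^= half;   /* nothing on this side: take the mirror side */
      }
      return bit;
  }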
Just like for that one, the code was stored into tools.c since we don't
have a C file for intops.
BUG/MEDIUM: mworker: use default maxconn in wait mode
In bug #1751, it was reported that haproxy has been consuming too much memory
since version 2.4. This is because of a change in the master, which
completely loses its configuration in wait mode, and loses its maxconn.
Without the maxconn, haproxy will try to compute one itself, and will
allocate RAM accordingly, too much in our case. This means the master
will have a too high maxconn and too much allocated RAM.
The patch fixes the issue by setting the maxconn to the default value
when re-executing the master in wait mode.
MINOR: quic: Dump version_information transport parameter
Implement quic_tp_version_info_dump() to dump such a transport parameter (only the remote one).
Call it from quic_transport_params_dump() which dumps all the transport parameters.
Can be backported to 2.6 as it's useful for debugging.
BUG/MINOR: quic: Acknowledgement must be forced during handshake
All packets received during handshakes must be acknowledged asap. This was
not the case for received Handshake packets. At this time, this had
no impact because the client often has only one Handshake packet to send
and the last Handshake packet sent on our side always embeds a HANDSHAKE_DONE
frame which leads the client to consider it has no more handshake packets
to send.
Add <force_ack> to qc_may_build_pkt() to force an ACK frame to be sent.
Set this parameter to 1 when sending packets from the Initial or Handshake
packet number spaces, 0 when sending only Application level packets.
Amaury Denoyelle [Mon, 20 Jun 2022 13:47:46 +0000 (15:47 +0200)]
BUG/MINOR: qpack: abort on dynamic index field line decoding
Add an ABORT_NOW() clause if an indexed field line refers to the dynamic
table. This is required as the current haproxy QPACK implementation does not
support the dynamic table.
Note that this should not happen as haproxy explicitly advertises a
null-sized dynamic table to the other peer.
Amaury Denoyelle [Mon, 20 Jun 2022 08:58:03 +0000 (10:58 +0200)]
BUG/MINOR: quic: free rejected Rx packets
Free the Rx packet in the datagram handler if the packet was not taken in
charge by a quic_conn instance. This is reflected by the packet refcount
which is null.
A packet can be rejected for a variety of reasons: for example a failed
decryption, a missing Initial token with Retry emission, or datagram null
padding.
This patch should resolve the Rx packets memory leak observed via "show
pools" with the previous commit 2c31e1293661cd44b71457b8fd0567b08ef95b0b
BUG/MINOR: quic: purge conn Rx packet list on release
This specific memory leak instance was reproduced using quiche client
which uses null datagram padding.
Amaury Denoyelle [Mon, 20 Jun 2022 08:52:55 +0000 (10:52 +0200)]
BUG/MINOR: quic: purge conn Rx packet list on release
When releasing a quic_conn instance, free all remaining Rx packets in
quic_conn.rx.pkt_list. This partially fixes a memory leak on Rx packets
which can be observed after several QUIC connections establishment.
BUG/MINOR: quic_stats: Duplicate "quic_streams_data_blocked_bidi" field name
As reported by broxio in GH #1757, there was a duplicated field name for
"quic_streams_data_blocked_bidi", due to a copy and paste without
renaming, I guess.
BUG/MINOR: quic: Unexpected half open connection counter wrapping
This counter must be incremented only once per connection and decremented as
soon as the handshake has failed or succeeded. This is a gauge. Under certain
conditions this counter could be decremented twice, for instance after having
received a TLS alert, then upon SSL_do_handshake() failure.
To stop having to deal with all the current combinations which can lead to
such a situation (and those to come), add a connection flag to denote if this
counter has already been decremented for a connection. So, this counter must
be decremented only if this flag has not already been set.
Non constant expressions were used to initialize constant variables leading to
such compilation errors:
src/xprt_quic.c:66:3: error: initializer element is not a constant expression
.key_label_len = strlen(QUIC_HKDF_KEY_LABEL_V1),
Reproduced with the CC=gcc-4.9 compilation option.
Fix this by using macros for each HKDF label.
Willy Tarreau [Sun, 19 Jun 2022 14:49:51 +0000 (16:49 +0200)]
MEDIUM: debug: detect redefinition of symbols upon dlopen()
In order to better detect the danger caused by extra shared libraries
which replace some symbols, upon dlopen() we now compare a few critical
symbols such as malloc(), free(), and some OpenSSL symbols, to see if
the loaded library comes with its own version. If this happens, a
warning is emitted and TAINTED_REDEFINITION is set. This is important
because some external libs might be linked against different libraries
than the ones haproxy was linked with, and most often this will end
very badly (e.g. an OpenSSL object is allocated by haproxy and freed
by such libs).
Since the main source of dlopen() calls is the Lua lib, via a "require"
statement, it's worth trying to show a Lua call trace when detecting a
symbol redefinition during dlopen(). As such we emit a Lua backtrace if
Lua is detected as being in use.
Willy Tarreau [Sun, 19 Jun 2022 14:41:59 +0000 (16:41 +0200)]
MEDIUM: debug: add a tainted flag when a shared library is loaded
Several bug reports were caused by shared libraries being loaded by other
libraries or some Lua code. Such libraries could define alternate symbols
or include dependencies to alternate versions of a library used by haproxy,
making it very hard to understand backtraces and analyze the issue.
Let's intercept dlopen() and set a new TAINTED_SHARED_LIBS flag when it
succeeds, so that "show info" makes it visible that some external libs
were added.
The redefinition is based on the definition of RTLD_DEFAULT or RTLD_NEXT
that were previously used to detect that dlsym() is usable, since we need
it as well. This should be sufficient to catch most situations.
Willy Tarreau [Sun, 19 Jun 2022 15:39:33 +0000 (17:39 +0200)]
MINOR: hlua: add a new hlua_show_current_location() function
This function may be used to try to show where some Lua code is currently
being executed. It tries hard to detect the initialization phase, both for
the global and the per-thread states, and for runtime states. This intends
to be used by error handlers to provide the users with indications about
what Lua code was being executed when the error triggered.
Willy Tarreau [Sun, 19 Jun 2022 15:35:53 +0000 (17:35 +0200)]
MINOR: hlua: don't dump empty entries in hlua_traceback()
Calling hlua_traceback() sometimes reports empty entries looking like:
[C]: ?
These ones correspond to certain internal C functions of the Lua library,
but they do not provide any information and even complicate the
interpretation of the dump. Better just skip them.
BUG/MINOR: log: Properly test connection retries to fix dontlog-normal option
The commit 731c8e6cf ("MINOR: stream: Simplify retries counter calculation")
introduced a regression. It broke the dontlog-normal option because the test
on the connection retries counter was not updated accordingly.
This patch should fix the issue #1754. It must be backported to 2.6.
MINOR: stream: Rely on stconn flags to abort stream destructive upgrade
On destructive connection upgrade, instead of using the new mux name to
abort the old stream, we can rely on the stream connector flags. If it is
detached after the upgrade, it means the stream will not be reused by the
new mux and it must be aborted.
When the protocol is changed for a client connection at the stream level
(from TCP to H1/H2), there are two cases: the stream may be reused or not.
The first case, when the stream is reused, works. The second one has been
buggy since the conn-stream refactoring and leads to a crash.
In this case, the new mux doesn't reuse the stream. It must be silently
aborted. However, its front stream connector is still referencing the
connection, so it must be detached. But this must be performed in two
stages, to be sure not to lose the context for the upgrade and to be able
to roll back on error. So now, before the upgrade, we prepare to detach the
stconn and it is finally detached if the upgrade succeeds. There is a trick
here: we pretend the stconn is detached but its state is preserved.
Willy Tarreau [Wed, 15 Jun 2022 13:54:56 +0000 (15:54 +0200)]
BUG/MINOR: task: fix thread assignment in tasklet_kill()
tasklet_kill() was introduced in 2.5-dev4 with commit 7b368339a
("MEDIUM: task: implement tasklet kill"), but a comparison error
there causes tasklets killed on thread 1 to be assigned to the killing
thread. Fortunately, the function was finally not used so there's
no harm right now, hence the minor tag, but this must be fixed and
backported in case a later fix relies on it.
MEDIUM: quic: Compatible version negotiation implementation (draft-08)
Until now, haproxy supported only the incompatible version negotiation
feature, which consists in sending a Version Negotiation packet after having
received a long packet without a compatible value in its version field. This
version value is the version used to build the current packet. This patch
does not modify this behavior.
This patch adds support for the compatible version negotiation feature, which
allows endpoints to negotiate the QUIC version to use for the connection
during the first flight of packets sent by the client (or after the first
flight). This is done thanks to the "version_information" parameter sent by
both endpoints. To be short, the client offers a list of supported versions
by preference order. The server (or haproxy listener) chooses the first
version it also supports as the negotiated version.
This implementation has an impact on the transport parameters handling (in
both directions). Indeed, the server must send its version information, but
only after having received and parsed the client transport parameters. So we
cannot encode these parameters at the same time we instantiate a new
connection.
Add QUIC_TP_DRAFT_VERSION_INFORMATION(0xff73db) new transport parameter.
Add tp_version_information new C struct to handle this new parameter.
Implement quic_transport_param_enc_version_info() (resp.
quic_transport_param_dec_version_info()) to encode (resp. decode) this
parameter.
Add qc_conn_finalize() which encodes the transport parameters and configure
the TLS stack to send them.
Add ->negotiated_ictx quic_conn C struct new member to store the Initial
QUIC TLS context for the negotiated version. The Initial secrets derivation
is version dependent.
Rename ->version to ->original_version and add ->negotiated_version to
this C struct to reflect the QUIC-VN RFC denomination.
Modify most of the QUIC TLS API functions to pass a version as parameter.
Export the QUIC version definitions to be reused at least from quic_tp.c
(transport parameters).
Move the token check after the QUIC connection lookup. As this is the
original version which is sent into a Retry packet, and because this original
version is stored into the connection, we must check the token after having
retrieved this connection.
Add the packet version to traces.
See https://datatracker.ietf.org/doc/html/draft-ietf-quic-version-negotiation-08
for more information about this new feature.
MINOR: quic: Released QUIC TLS extension for QUIC v2 draft
It is not clear at all how to distinguish a QUIC draft version number from a
released one, nor which of these QUIC draft versions must use the draft
QUIC TLS extension.
According to the QUIC implementations which support the v2 draft, the TLS
extension (transport parameters) to be used is the released one
(TLS_EXTENSION_QUIC_TRANSPORT_PARAMETERS).
As the only QUIC draft version we support is 0xff00001d, and as at this time
it is the only version with 0xff as the most significant byte which must use
the draft TLS extension, we select the draft TLS extension
(TLS_EXTENSION_QUIC_TRANSPORT_PARAMETERS_DRAFT) only for versions with 0xff
as the most significant byte.
It is becoming difficult to handle the QUIC TLS related definitions
which arrive with a QUIC version (draft or not). So, here we add a
quic_version C struct which does not only define the QUIC version number,
but also the QUIC TLS definitions which depend on a QUIC version.
Modify consequently all the QUIC TLS API to reuse these definitions
through the new quic_version C struct.
Implement the quic_pkt_type() function which returns a packet type (0 up to
3) depending on the QUIC version number.
Stop hardcoding the Retry packet's first byte in send_retry(): this is no
longer possible because the packet type field depends on the QUIC version.
Also modify quic_build_packet_long_header() for the same reason: the packet
type depends on the QUIC version.
Add a quic_version C struct member to quic_conn C struct.
Modify qc_lstnr_pkt_rcv() to set this member asap.
Remove the version member from quic_rx_packet C struct: a packet is attached
asap to a connection (or dropped) which is the unique object which should
store the QUIC version.
Modify qc_pkt_is_supported_version() to return a supported quic_version C
struct from a version number.
Add Initial salt for QUIC v2 draft (initial_salt_v2_draft).