DEBUG: pools: inspect pools on fatal error and dump information found
It's a bit frustrating sometimes to see pool checks catch a bug but not
provide exploitable information without a core.
Here we're adding a function "pool_inspect_item()" which is called just
before aborting in pool_check_pattern() and POOL_DEBUG_CHECK_MARK() and
which will display the error type, the pool's pointer and name, and will
try to check if the item's tag matches the pool, and if not, will iterate
over all pools to see if one would be a better candidate, then will try
to figure the last known caller and possibly other likely candidates if
the pool's tag is not sufficiently trusted. This typically helps better
diagnose corruption in use-after-free scenarios, or freeing to a pool
that differs from the one the object was allocated from, and will also
indicate calling points that may help figure where an object was last
released or allocated. The info is printed on stderr just before the
backtrace.
For example, the recent off-by-one test in the PPv2 changes would have
produced the following output in vtest logs:
*** h1 debug|FATAL: pool inconsistency detected in thread 1: tag mismatch on free().
*** h1 debug| caller: 0x62bb87 (conn_free+0x147/0x3c5)
*** h1 debug| pool: 0x2211ec0 ('pp_tlv_256', size 304, real 320, users 1)
*** h1 debug|Tag does not match. Possible origin pool(s):
*** h1 debug| tag: @0x2565530 = 0x2216740 (pp_tlv_128, size 176, real 192, users 1)
*** h1 debug|Recorded caller if pool 'pp_tlv_128':
*** h1 debug| @0x2565538 (+0184) = 0x62c76d (conn_recv_proxy+0x4cd/0xa24)
A mismatch in the allocated/released pool is already visible, and the
callers confirm it once resolved, where the allocator indeed allocates
from pp_tlv_128 and conn_free() releases to pp_tlv_256:
$ addr2line -spafe ./haproxy <<< $'0x62bb87\n0x62c76d'
0x000000000062bb87: conn_free at connection.c:568
0x000000000062c76d: conn_recv_proxy at connection.c:1177
DEBUG: pools: pass the caller pointer to the check functions and macros
In preparation for more detailed pool error reports, let's pass the
caller pointers to the check functions. This will be useful to produce
messages indicating where the issue happened.
DEBUG: pools: always record the caller for uncached allocs as well
When recording the caller of a pool_alloc(), we currently store it only
when the object comes from the cache and never when it comes from the
heap. There's no valid reason for this except that the caller's pointer
was not passed to pool_alloc_nocache(), so it used to set NULL there.
Let's just pass it down the chain.
When using the listener socket as file descriptor, qc->fd value is -1.
In this case one must not access fdtab[qc->fd] element to change its value.
This bug could have been detected by asan with such a backtrace:
=================================================================
==402222==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fa8ecf417ex7fa8e915cf90 sp 0x7fa8e915cf88
WRITE of size 8 at 0x7fa8ecf417e8 thread T6
#0 0x55707a0bf18a in qc_new_cc_conn src/quic_conn.c:838
#1 0x55707a0c6dc0 in quic_conn_release src/quic_conn.c:1408
#2 0x55707a10916f in quic_close src/xprt_quic.c:35
#3 0x55707a0cec77 in conn_xprt_close include/haproxy/connection.h:153
#4 0x55707a0ceed0 in conn_full_close include/haproxy/connection.h:197
#5 0x55707a0ec253 in qcc_release src/mux_quic.c:2412
#6 0x55707a0ec7d0 in qcc_io_cb src/mux_quic.c:2443
#7 0x55707a63ff2a in run_tasks_from_lists src/task.c:596
#8 0x55707a641cc9 in process_runnable_tasks src/task.c:876
#9 0x55707a56f7b2 in run_poll_loop src/haproxy.c:2954
#10 0x55707a5705fd in run_thread_poll_loop src/haproxy.c:3153
#11 0x7fa8f9450ea6 in start_thread nptl/pthread_create.c:477
#12 0x7fa8f936ea2e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xfba2e)
0x7fa8ecf417e8 is located 24 bytes to the left of 134217728-byte region [0x7fa8e
allocated by thread T0 here:
#0 0x7fa8f9a37037 in __interceptor_calloc ../../../../src/libsanitizer/asan/
#1 0x55707a71a61d in init_pollers src/fd.c:1161
#2 0x55707a56cdf1 in init src/haproxy.c:2672
#3 0x55707a5714c2 in main src/haproxy.c:3298
#4 0x7fa8f9296d09 in __libc_start_main ../csu/libc-start.c:308
Thread T6 created by T0 here:
#0 0x7fa8f99e22a2 in __interceptor_pthread_create ../../../../src/libsanitizpp:214
#1 0x55707a748a21 in setup_extra_threads src/thread.c:252
#2 0x55707a5735c9 in main src/haproxy.c:3844
#3 0x7fa8f9296d09 in __libc_start_main ../csu/libc-start.c:308
SUMMARY: AddressSanitizer: heap-buffer-overflow src/quic_conn.c:838 in qc_new_cc
Shadow bytes around the buggy address:
0x0ff59d9e02a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff59d9e02b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff59d9e02c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff59d9e02d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff59d9e02e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0ff59d9e02f0: fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]fa fa
0x0ff59d9e0300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff59d9e0310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff59d9e0320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff59d9e0330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff59d9e0340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==402222==ABORTING
Aborted
Thank you to @Tristan971 for having reported this bug in GH #2247.
Released version 2.9-dev5 with the following main changes :
- BUG/MEDIUM: mux-h2: fix crash when checking for reverse connection after error
- BUILD: import: guard plock.h against multiple inclusion
- BUILD: pools: import plock.h to build even without thread support
- BUG/MINOR: ssl/cli: can't find ".crt" files when replacing a certificate
- BUG/MINOR: stream: protect stream_dump() against incomplete streams
- DOC: config: mention uid dependency on the tune.quic.socket-owner option
- MEDIUM: capabilities: enable support for Linux capabilities
- CLEANUP/MINOR: connection: Improve consistency of PPv2 related constants
- MEDIUM: connection: Generic, list-based allocation and look-up of PPv2 TLVs
- MEDIUM: sample: Add fetch for arbitrary TLVs
- MINOR: sample: Refactor fc_pp_authority by wrapping the generic TLV fetch
- MINOR: sample: Refactor fc_pp_unique_id by wrapping the generic TLV fetch
- MINOR: sample: Add common TLV types as constants for fc_pp_tlv
- MINOR: ssl_sock: avoid iterating realloc(+1) on stored context
- DOC: ssl: add some comments about the non-obvious session allocation stuff
- CLEANUP: ssl: keep a pointer to the server in ssl_sock_init()
- MEDIUM: ssl_sock: always use the SSL's server name, not the one from the tid
- MEDIUM: server/ssl: place an rwlock in the per-thread ssl server session
- MINOR: server/ssl: maintain an index of the last known valid SSL session
- MINOR: server/ssl: clear the shared good session index on failure
- MEDIUM: server/ssl: pick another thread's session when we have none yet
- MINOR: activity: report the current run queue size
- BUG/MINOR: checks: do not queue/wake a bounced check
- MINOR: checks: start the checks in sleeping state
- MINOR: checks: pin the check to its thread upon wakeup
- MINOR: check: remember when we migrate a check
- MINOR: check/activity: collect some per-thread check activity stats
- MINOR: checks: maintain counters of active checks per thread
- MINOR: check: also consider the random other thread's active checks
- MEDIUM: checks: search more aggressively for another thread on overload
- MEDIUM: checks: implement a queue in order to limit concurrent checks
- MINOR: checks: also consider the thread's queue for rebalancing
- DEBUG: applet: Properly report opposite SC expiration dates in traces
- BUG/MEDIUM: stconn: Update stream expiration date on blocked sends
- BUG/MINOR: stconn: Don't report blocked sends during connection establishment
- BUG/MEDIUM: stconn: Wake applets on sending path if there is a pending shutdown
- BUG/MEDIUM: stconn: Don't block sends if there is a pending shutdown
- BUG/MINOR: quic: Possible skipped RTT sampling
- MINOR: quic: Add a trace to quic_release_frm()
- BUG/MAJOR: quic: Really ignore malformed ACK frames.
- BUG/MINOR: quic: Unchecked pointer to packet number space dereferenced
- BUG/MEDIUM: connection: fix pool free regression with recent ppv2 TLV patches
- BUG/MEDIUM: h1-htx: Ensure chunked parsing with full output buffer
- BUG/MINOR: stream: further protect stream_dump() against incomplete sessions
- DOC: configuration: update examples for req.ver
- MINOR: properly mark the end of the CLI command in error messages
- BUILD: ssl: Build with new cryptographic library AWS-LC
- REGTESTS: ssl: skip ssl_dh test with AWS-LC
- BUILD: bug: make BUG_ON() void to avoid a rare warning
- BUILD: checks: shut up yet another stupid gcc warning
- MINOR: cpuset: add ha_cpuset_isset() to check for the presence of a CPU in a set
- MINOR: cpuset: add ha_cpuset_or() to bitwise-OR two CPU sets
- MINOR: cpuset: centralize a reliable bound cpu detection
- MEDIUM: threads: detect incomplete CPU bindings
- MEDIUM: threads: detect excessive thread counts vs cpu-map
- BUILD: quic: Compilation issue on 32-bits systems with quic_may_send_bytes()
- BUG/MINOR: quic: Unchecked pointer to Handshake packet number space
- MINOR: global: export the display_version() symbol
- MEDIUM: mworker: display a more accessible message when a worker crash
- MINOR: httpclient: allow to configure the retries
- MINOR: httpclient: allow to configure the timeout.connect
- BUG/MINOR: quic: Wrong RTT adjusments
- BUG/MINOR: quic: Wrong RTT computation (srtt and rrt_var)
- BUG/MINOR: stconn: Don't inhibit shutdown on connection on error
- BUG/MEDIUM: applet: Fix API for function to push new data in channels buffer
- BUG/MEDIUM: stconn: Report read activity when a stream is attached to front SC
- BUG/MEDIUM: applet: Report an error if applet request more room on aborted SC
- BUG/MEDIUM: stconn/stream: Forward shutdown on write timeout
- NUG/MEDIUM: stconn: Always update stream's expiration date after I/O
- BUG/MINOR: applet: Always expect data when CLI is waiting for a new command
- BUG/MINOR: ring/cli: Don't expect input data when showing events
- BUG/MINOR: quic: Dereferenced unchecked pointer to Handshke packet number space
- BUG/MINOR: hlua/action: incorrect message on E_YIELD error
- MINOR: http_ana: position the FINAL flag for http_after_res execution
- CI: scripts: add support to build-ssl.sh to download and build AWS-LC
- CI: add support to matrix.py to determine the latest AWS-LC release
- CI: Update matrix.py so all code is contained in functions.
- CI: github: Add a weekly CI run building with AWS-LC
- MINOR: ring: add a function to compute max ring payload
- BUG/MEDIUM: ring: adjust maxlen consistency check
- MINOR: sink: simplify post_sink_resolve function
- MINOR: log/sink: detect when log maxlen exceeds sink size
- MINOR: sink: inform the user when logs will be implicitly truncated
- MEDIUM: sink: don't perform implicit truncations when maxlen is not set
- MINOR: log: move log-forwarders cleanup in log.c
- MEDIUM: httpclient/logs: rely on per-proxy post-check instead of global one
- MINOR: log: add dup_logsrv() helper function
- MEDIUM: log/sink: make logsrv postparsing more generic
- MEDIUM: fcgi-app: properly postresolve logsrvs
- MEDIUM: spoe-agent: properly postresolve log rings
- MINOR: sink: add helper function to deallocate sink struct
- MEDIUM: sink/ring: introduce high level ring creation helper function
- MEDIUM: sink: add sink_finalize() function
- CLEANUP: log: remove unnecessary trim in __do_send_log
- MINOR: cache: Change hash function in default normalizer used in case of "vary"
- MINOR: tasks/stats: report the number of niced tasks in "show info"
- CI: Update to actions/checkout@v4
- MINOR: ssl: add support for 'curves' keyword on server lines
- BUG/MINOR: quic: Wrong cluster secret initialization
- CLEANUP: quic: Remove useless free_quic_tx_pkts() function.
- MEDIUM: init: initialize the trash earlier
- MINOR: tools: add function read_line_to_trash() to read a line of a file
- MINOR: cfgparse: use read_line_from_trash() to read from /sys
- MEDIUM: cfgparse: assign NUMA affinity to cpu-maps
- MINOR: cpuset: dynamically allocate cpu_map
- REORG: cpuset: move parse_cpu_set() and parse_cpumap() to cpuset.c
- CI: musl: highlight section if there are coredumps
- CI: musl: drop shopt in workflow invocation
REORG: cpuset: move parse_cpu_set() and parse_cpumap() to cpuset.c
These ones were still in cfgparse.c but they're not specific to the
config at all and may actually be used even when parsing cpu list
entries in /sys. Better move them where they can be reused.
cpu_map is 8.2kB/entry and there's one such entry per group, that's
~520kB total. In addition, the init code is still in haproxy.c enclosed
in ifdefs. Let's make this a dynamically allocated array in the cpuset
code and remove that init code.
Later we may even consider reallocating it once the number of threads
and groups is known, in order to shrink it a little bit, as the typical
setup with a single group will only need 8.2kB, thus saving half a MB
of RAM. This would require that the upper bound is placed in a variable
though.
MEDIUM: cfgparse: assign NUMA affinity to cpu-maps
Do not force affinity on the process, instead let's just apply it to
cpu-map, it will automatically be used later in the init process. We
can do this because we know that cpu-map was not set when we're using
this detection code.
This is much saner, as we don't need to manipulate the process' affinity
at this point in time, and just update the info that the user omitted to
set by themselves, which guarantees a better long-term consistency with
the documented feature.
MINOR: cfgparse: use read_line_from_trash() to read from /sys
It's easier to use this function now to natively support variable
fields in the file's path. This also removes read_file_from_trash()
that was only used here and was static.
MINOR: tools: add function read_line_to_trash() to read a line of a file
This function takes on input a printf format for the file name, making
it particularly suitable for /proc or /sys entries which take a lot of
numbers. It also automatically trims the trailing CR and/or LF chars.
More and more utility functions rely on the trash while most of the init
code doesn't have access to it because it's initialized very late (in
PRE_CHECK for the initial one). It's a pool, and it purposely supports
being reallocated, so let's initialize it in STG_POOL so that early
STG_INIT code can at least use it.
The function generate_random_cluster_secret() which initializes the cluster secret
when not supplied by configuration is buggy. There 1/256 that the cluster secret
string is empty.
To fix this, one stores the cluster as a reduced size first 128 bits of its own
SHA1 (160 bits) digest, if defined by configuration. If this is not the case, it
is initialized with a 128 bits random value. Furthermore, thus the cluster secret
is always initialized.
As the cluster secret is always initialized, there are several tests which
are for now on useless. This patch removes such tests (if(global.cluster_secret))
in the QUIC code part and at parsing time: no need to check that a cluster
secret was initialized with "quic-force-retry" option.
MINOR: tasks/stats: report the number of niced tasks in "show info"
We currently know the number of tasks in the run queue that are niced,
and we don't expose it. It's too bad because it can give a hint about
what share of the load is relevant. For example if one runs a Lua
script that was purposely reniced, or if a stats page or the CLI is
hammered with slow operations, seeing them appear there can help
identify what part of the load is not caused by the traffic, and
improve monitoring systems or autoscalers.
MINOR: cache: Change hash function in default normalizer used in case of "vary"
When building the secondary signature for cache entries when vary is
enabled, the referer part of the signature was a simple crc32 of the
first referer header.
This patch changes it to a 64bits hash based of xxhash algorithm with a
random seed built during init. This will prevent "malicious" hash
collisions between entries of the cache.
CLEANUP: log: remove unnecessary trim in __do_send_log
Since both sink_write and fd_write_frag_line take the maxlen parameter
as argument, there is no added value for the trim before passing the
msg parameter to those functions.
To further clean the code and remove duplication, some sink postparsing
and sink->sft finalization is now performed in a dedicated function
named sink_finalize().
MINOR: sink: add helper function to deallocate sink struct
In this patch we move sink freeing logic outside of sink_deinit() function
in order to create the sink_free() helper function that could be used
on error paths for example.
MEDIUM: log/sink: make logsrv postparsing more generic
We previously had postparsing logic but only for logsrv sinks, but now we
need to make this operation on logsrv directly instead of sinks to prepare
for additional postparsing logic that is not sink-specific.
To do this, we migrated post_sink_resolve() and sink_postresolve_logsrvs()
to their postresolve_logsrvs() and postresolve_logsrv_list() equivalents.
Then, we split postresolve_logsrv_list() so that the sink-only logic stays
in sink.c (sink_resolve_logsrv_buffer() function), and the "generic"
target part stays in log.c as resolve_logsrv().
Error messages formatting was preserved as far as possible but some slight
variations are to be expected.
As for the functional aspect, no change should be expected.
MEDIUM: httpclient/logs: rely on per-proxy post-check instead of global one
httpclient used to register a global post-check function to iterate over
all known proxies and post-initialize httpclient related ones (mainly
for logs initialization).
But we currently have an issue: post_sink_resolve() function which is
also registered using REGISTER_POST_CHECK() macro conflicts with
httpclient_postcheck() function.
This is because post_sink_resolve() relies on proxy->logsrvs to be
correctly initialized already, and httpclient_postcheck() may create
and insert new logsrvs entries to existing proxies when executed.
So depending on which function runs first, we could run into trouble.
Hopefully, to this day, everything works "by accident" due to
http_client.c file being loaded before sink.c file when compiling source
code.
But as soon as we would move one of the two functions to other files, or
if we rename files or make changes to the Makefile build recipe, we could
break this at any time.
To prevent post_sink_resolve() from randomly failing in the future, we now
make httpclient postcheck rely on per-proxy post-checks by slightly
modifying httpclient_postcheck() function so that it can be registered
using REGISTER_POST_PROXY_CHECK() macro.
As per-proxy post-check functions are executed right after config parsing
for each known proxy (vs global post-check which are executed a bit later
in the init process), we can be certain that functions registered using
global post-check macro, ie: post_sink_resolve(), will always be executed
after httpclient postcheck, effectively resolving the ordering conflict.
This should normally not cause visible behavior changes, and while it
could be considered as a bug, it's probably not worth backporting it
since the only way to trigger the issue is through code refactors,
unless we want to backport it to ease code maintenance of course,
in which case it should easily apply for >= 2.7.
MEDIUM: sink: don't perform implicit truncations when maxlen is not set
maxlen now defaults ~0 (instead of BUFSIZE) to make sure no implicit
truncation will be performed when the option is not specified, since the
doc doesn't mention any default value for maxlen. As such, if the payload
is too big, it will be dropped (this is the default expected behavior).
This would result in emitted logs being silently truncated to 1000 because
test-ring maxlen is smaller than the log directive maxlen.
In this patch we're adding an extra check in post_sink_resolve() to detect
this kind of confusing setups and warn the user about the implicit
truncation when DIAG mode is on.
This commit depends on:
- "MINOR: sink: simplify post_sink_resolve function"
MINOR: log/sink: detect when log maxlen exceeds sink size
To prevent logs from being silently (and unexpectly droppped) at runtime,
we check that the maxlen parameter from the log directives are
strictly inferior to the targeted ring size.
|global
| tune.bufsize 16384
| log tcp@127.0.0.1:514 len 32768
| log myring@127.0.0.1:514 len 32768
|ring myring
| # no explicit size
On such configs, a diag warning will be reported.
This commit depends on:
- "MINOR: sink: simplify post_sink_resolve function"
- "MINOR: ring: add a function to compute max ring payload"
When user specifies a maxlen parameter that is greater than the size of
a given ring section, a warning is emitted to inform that the max length
exceeds size, and then the maxlen is forced to size.
The logic is good, but imprecise, because it doesn't take into account
the slight overhead from storing payloads into the ring.
In practise, we cannot store a single message which is exactly the same
length than size. Doing so will result in the message being dropped at
runtime.
Thanks to the ring_max_payload() function introduced in "MINOR: ring: add
a function to compute max ring payload", we can now deduce the maximum
value for the maxlen parameter before it could result in messages being
dropped.
When maxlen value is set to an improper value, the warning will be emitted
and maxlen will be forced to the maximum "single" payload len that could
fit in the ring buffer, preventing messages from being dropped
unexpectedly.
This commit depends on:
- "MINOR: ring: add a function to compute max ring payload"
Andrew Hopkins [Tue, 5 Sep 2023 23:32:50 +0000 (16:32 -0700)]
CI: Update matrix.py so all code is contained in functions.
Refactor matrix.py so all the logic is contained inside either
helper functions or a new main function. Run the new main function
by default. This lets other GitHub actions use functions in the
python code without generating the whole matrix.
Andrew Hopkins [Tue, 5 Sep 2023 23:27:28 +0000 (16:27 -0700)]
CI: add support to matrix.py to determine the latest AWS-LC release
Refactor the existing OpenSSL tag parsing logic to share some of GitHub
tag logic. OpenSSL and AWS-LC don't follow the same naming convention so
each library has it's own sorting logic.
MINOR: http_ana: position the FINAL flag for http_after_res execution
Ensure that the ACT_OPT_FINAL flag is always set when executing actions
from http_after_res context.
This will permit lua functions to be executed as http_after_res actions
since hlua_ctx_resume() automatically disables "yielding" when such flag
is set: the hlua handler will only allow 1shot executions at this point
(lua or not, we don't wan't to reschedule http_after_res actions).
BUG/MINOR: hlua/action: incorrect message on E_YIELD error
When hlua_action error messages were reworked in d5b073cf1
("MINOR: lua: Improve error message"), an error was made for the
E_YIELD case.
Indeed, everywhere E_YIELD error is handled: "yield is not allowed" or
similar error message is reported to the user. But instead we currently
have: "aborting Lua processing on expired timeout".
It is quite misleading because this error message often refers to the
HLUA_E_ETMOUT case.
Thus, we now report the proper error message thanks to this patch.
This should be backported to all stable versions.
[on 2.0, the patch needs to be slightly adapted]
BUG/MINOR: ring/cli: Don't expect input data when showing events
The "show events" command may wait for now events if "-w" option is used. In
this case, no timeout must be triggered. So we explicitly state no input
data are expected. This disables the read timeout on the client side.
This patch should be backported to 2.8. It is probably useless to backport
it further. In all cases, it depends on the commit "BUG/MINOR: applet:
Always expect data when CLI is waiting for a new command"
BUG/MINOR: applet: Always expect data when CLI is waiting for a new command
There is a mechanism for applets to disable the read timeout on the opposite
side if it is now waiting for any data. Of course, there is also a way to
re-activate it. But, it must excplicitly be handle by applets.
For the CLI, some commands may state no input data are expected. So we must
be sure to reset its state when the applet is waiting for a new command. For
now, it is not a bug because no CLI command uses this mechanism.
NUG/MEDIUM: stconn: Always update stream's expiration date after I/O
It is a revert of following patches:
* d7111e7ac ("MEDIUM: stconn: Don't requeue the stream's task after I/O")
* 3479d99d5 ("BUG/MEDIUM: stconn: Update stream expiration date on blocked sends")
Because the first one is reverted, the second one is useless and can be reverted
too.
The issue here is that I/O may be performed without stream wakeup. So if no
expiration date was set on the last call to process_stream(), the stream is
never rescheduled and no timeout can be detected. This especially happens on
TCP streams because fast-forward is enabled very early.
Instead of tracking all places where the stream's expiration data must be
updated, it is now centralized in sc_notify(), as it was performed before
the timeout refactoring.
BUG/MEDIUM: stconn/stream: Forward shutdown on write timeout
The commit 7f59d68fe2 ("BUG/MEDIIM: stconn: Flush output data before
forwarding close to write side") introduced a regression. When a write
timeout is detected, the shutdown is no longer forwarded. Dependig on the
channels state, it may block the processing, waiting the client or the
server leaves.
The commit above tries to avoid to truncate messages on shutdown but on
write timeout, if the channel is not empty, there is nothing more we can do
to send these data. It means the endpoint is unable to send data. In this
case, we must forward the shutdown.
BUG/MEDIUM: applet: Report an error if applet request more room on aborted SC
If an abort was performed and the applet still request more room, it means
the applet has not properly handle the error on its own. At least the CLI
applet is concerned. Instead of reviewing all applets, the error is now
handled in task_run_applet() function.
Because of this bug, a session may be blocked infinitly and may also lead to
a wakup loop.
This patch must only be backported to 2.8 for now. And only to lower
versions if a bug is reported because it is a bit sensitive and the code
older versions are very different.
BUG/MEDIUM: stconn: Report read activity when a stream is attached to front SC
It only concerns the front SC. But it is important to report a read activity
when a stream is created and attached to the front SC, especially in TCP. In
HTTP, when this happens, the request was necessarily received. But in TCP,
the client may open a connection without sending anything. We must still
report a first read activity in this case to be able to properly report
client timeout.
BUG/MEDIUM: applet: Fix API for function to push new data in channels buffer
All applets only check the -1 error value (need room) for applet_put*
functions while the underlying functions may also return -2 if the input is
closed or -3 if the data length is invalid. It means applets already handle
other cases by their own.
The API should be fixed but for now, to ease backports, we only fix
applet_put* functions to always return -1 on error. This way, at least for
the applets point of view, the API is consistent.
This patch should be backported to 2.8. Probably not further. Except if we
suspect it could fix a bug.
BUG/MINOR: stconn: Don't inhibit shutdown on connection on error
In the SC function responsible to perform shutdown, there is a statement
inhibiting the shutdown if an error was encountered on the SC. This
statement is inherited from very old version and should in fact be
removed. The error may be set from the stream. In this case the shutdown
must be performed. In all cases, it is not a big deal if the shutdown is
performed twice because underlying functions already handle multiple calls.
This patch does not fix any bug. Thus there is no reason to backport it.
BUG/MINOR: quic: Wrong RTT computation (srtt and rrt_var)
Due to the fact that several variable values (rtt_var, srtt) were stored as multiple
of their real values, some calculations were less accurate as expected.
Stop storing 4*rtt_var values, and 8*srtt values.
Adjust all the impacted statements.
MINOR: httpclient: allow to configure the timeout.connect
When using the httpclient, one could be bothered with it returning
after a very long time when failing. By default the httpclient has a
retries of 3 and a timeout connect of 5s, which can results in pause of
20s upon failure.
This patch allows the user to configure the "timeout connect" of the
httpclient so it could reduce the time to return an error.
When using the httpclient, one could be bothered with it returning after
a very long time when failing. By default the httpclient has a retries
of 3 and a timeout connect of 5s, which can results in pause of 20s
upon failure.
This patch allows the user to configure the retries of the httpclient so
it could reduce the time to return an error.
MEDIUM: mworker: display a more accessible message when a worker crash
Should fix issue #1034.
Display a more accessible message when a worker crash about what to do.
Example:
$ ./haproxy -W -f haproxy.cfg
[NOTICE] (308877) : New worker (308884) forked
[NOTICE] (308877) : Loading success.
[NOTICE] (308877) : haproxy version is 2.9-dev4-d90d3b-58
[NOTICE] (308877) : path to executable is ./haproxy
[ALERT] (308877) : Current worker (308884) exited with code 139 (Segmentation fault)
[WARNING] (308877) : A worker process unexpectedly died and this can only be explained by a bug in haproxy or its dependencies.
Please check that you are running an up to date and maintained version of haproxy and open a bug report.
HAProxy version 2.9-dev4-d90d3b-58 2023/09/05 - https://haproxy.org/
Status: development branch - not safe for use in production.
Known bugs: https://github.com/haproxy/haproxy/issues?q=is:issue+is:open
Running on: Linux 6.2.0-31-generic #31-Ubuntu SMP PREEMPT_DYNAMIC Mon Aug 14 13:42:26 UTC 2023 x86_64
[ALERT] (308877) : exit-on-failure: killing every processes with SIGTERM
[WARNING] (308877) : All workers exited. Exiting... (139)
MEDIUM: threads: detect excessive thread counts vs cpu-map
This detects when there are more threads bound via cpu-map than CPUs
enabled in cpu-map, or when there are more total threads than the total
number of CPUs available at boot (for unbound threads) and configured
for bound threads. In this case, a warning is emitted to explain the
problems it will cause, and explaining how to address the situation.
Note that some configurations will not be detected as faulty because
the algorithmic complexity to resolve all arrangements grows in O(N!).
This means that having 3 threads on 2 CPUs and one thread on 2 CPUs
will not be detected as it's 4 threads for 4 CPUs. But at least configs
such as T0:(1,4) T1:(1,4) T2:(2,4) T3:(3,4) will not trigger a warning
since they're valid.
It's very easy to mess up with some cpu-map directives and to leave
some thread unbound. Let's add a test that checks that either all
threads are bound or none are bound, but that we do not face the
intermediary situation where some are pinned and others are left
wandering around, possibly on the same CPUs as bound ones.
Note that this should not be backported, or maybe turned into a
notice only, as it appears that it will easily catch invalid
configs and that may break updates for some users.
MINOR: cpuset: centralize a reliable bound cpu detection
Till now the CPUs that were bound were only retrieved in
thread_cpus_enabled() in order to count the number of CPUs allowed,
and it relied on arch-specific code.
Let's slightly arrange this into ha_cpuset_detect_bound() that
reuses the ha_cpuset struct and the accompanying code. This makes
the code much clearer without having to carry along some arch-specific
stuff out of this area.
Note that the macos-specific code used in thread.c to only count
online CPUs but not retrieve a mask, so for now we can't infer
anything from it and can't implement it.
In addition and more importantly, this function is reliable in that
it will only return a value when the detection is accurate, and will
not return incomplete sets on operating systems where we don't have
an exact list, such as online CPUs.
BUILD: checks: shut up yet another stupid gcc warning
gcc has always had hallucinations regarding value ranges, and this one
is interesting, and affects branches 4.7 to 11.3 at least. When building
without threads, the randomly picked new_tid that is reduced to a multiply
by 1 shifted right 32 bits, hence a constant output of 0 shows this
warning:
src/check.c: In function 'process_chk_conn':
src/check.c:1150:32: warning: array subscript [-1, 0] is outside array bounds of 'struct thread_ctx[1]' [-Warray-bounds]
In file included from include/haproxy/thread.h:28,
from include/haproxy/list.h:26,
from include/haproxy/action.h:28,
from src/check.c:31:
or this one when trying to force the test to see that it cannot be zero(!):
src/check.c: In function 'process_chk_conn':
src/check.c:1150:54: warning: array subscript [0, 0] is outside array bounds of 'struct thread_ctx[1]' [-Warray-bounds]
1150 | uint t2_act = _HA_ATOMIC_LOAD(&ha_thread_ctx[thr2].active_checks);
| ~~~~~~~~~~~~~^~~~~~
include/haproxy/atomic.h:66:40: note: in definition of macro 'HA_ATOMIC_LOAD'
66 | #define HA_ATOMIC_LOAD(val) *(val)
| ^~~
src/check.c:1150:24: note: in expansion of macro '_HA_ATOMIC_LOAD'
1150 | uint t2_act = _HA_ATOMIC_LOAD(&ha_thread_ctx[thr2].active_checks);
| ^~~~~~~~~~~~~~~
Let's just add an ALREADY_CHECKED() statement there, no other check seems
to get rid of it. No backport is needed.
BUILD: bug: make BUG_ON() void to avoid a rare warning
When building without threads, the recently introduced BUG_ON(tid != 0)
turns to a constant expression that evaluates to 0 and that is not used,
resulting in this warning:
src/connection.c: In function 'conn_free':
src/connection.c:584:3: warning: statement with no effect [-Wunused-value]
This is because the whole thing is declared as an expression for clarity.
Make it return void to avoid this. No backport is needed.
Andrew Hopkins [Thu, 6 Jul 2023 22:41:46 +0000 (15:41 -0700)]
BUILD: ssl: Build with new cryptographic library AWS-LC
This adds a new option for the Makefile USE_OPENSSL_AWSLC, and
update the documentation with instructions to use HAProxy with
AWS-LC.
Update the type of the OCSP callback retrieved with
SSL_CTX_get_tlsext_status_cb with the actual type for
libcrypto versions greater than 1.0.2. This doesn't affect
OpenSSL which casts the callback to void* in SSL_CTX_ctrl.
MINOR: properly mark the end of the CLI command in error messages
In several places in the file src/ssl_ckch.c, in the message about the
incorrect use of the CLI command, the end of that CLI command is not
correctly marked with the sign ' .
BUG/MINOR: stream: further protect stream_dump() against incomplete sessions
As found by Coverity in issue #2273, the fix in commit e64bccab2 ("BUG/MINOR:
stream: protect stream_dump() against incomplete streams") was still not
enough, as scf/scb are still dereferenced to dump their flags and states.
Chris Staite [Mon, 4 Sep 2023 08:52:26 +0000 (08:52 +0000)]
BUG/MEDIUM: h1-htx: Ensure chunked parsing with full output buffer
A previous fix to ensure that there is sufficient space on the output buffer
to place parsed data (#2053) introduced an issue that if the output buffer is
filled on a chunk boundary no data is parsed but the congested flag is not set
due to the state not being H1_MSG_DATA.
The check to ensure that there is sufficient space in the output buffer is
actually already performed in all downstream functions before it is used.
This makes the early optimisation that avoids the state transition to
H1_MSG_DATA needless. Therefore, in order to allow the chunk parser to
continue in this edge case we can simply remove the early check. This
ensures that the state can progress and set the congested flag correctly
in the caller.
This patch fixes #2262. The upstream change that caused this logic error was
backported as far as 2.5, therefore it makes sense to backport this fix back
that far also.
BUG/MEDIUM: connection: fix pool free regression with recent ppv2 TLV patches
In commit fecc573da ("MEDIUM: connection: Generic, list-based allocation
and look-up of PPv2 TLVs") there was a tiny mistake, elements of length
<= 128 are allocated from pool_pp_128 but only those of length < 128 are
released to this pool, other ones go to pool_pp_256. Because of this,
elements of size exactly 128 are allocated from 128 and released to 256.
It can be reproduced a few times by running sample_fetches/tlvs.vtc 1000
times with -DDEBUG_DONT_SHARE_POOLS -DDEBUG_MEMORY_POOLS -DDEBUG_EXPR
-DDEBUG_STRICT=2 -DDEBUG_POOL_INTEGRITY -DDEBUG_POOL_TRACING
-DDEBUG_NO_POOLS. Not sure why it doesn't reproduce more often though.
No backport is needed. This should address github issues #2275 and #2274.
If not correctly parsed, an ACK frame must be ignored without any more
treatment. Before this patch an ACK frame could be partially correctly
parsed, then some errors could be detected which leaded newly acknowledged
packets to be released in a wrong way calling free_quic_tx_pkts() called
by qc_parse_ack_frm(). But there is no reason to release such packets because
of a malformed ACK frame.
This patch modifies qc_parse_ack_frm(). The newly acknowledged TX packets is done
in two steps. It first collects the newly acknowledged packet calling
qc_newly_acked_pkts(). Then proceed the same way as before for the treatments of
haproxy TX packets acknowledged by the peer. If the ACK frame could not be fully
parsed, the newly ackowledged packets are replaced back from where they were
detached: the tree of TX packets for their encryption level.
Display the address of the frame to be released as soon as entering into
quic_release_frm() whose job is obviously to released the memory allocated
for the frame <frm> passed as parameter.
There are very few chances this bug may occur. Furthermore the consequences
are not dramatic: an RTT sampling may be ignored. I guess this may happen
when the now_ms global value wraps.
Do not rely on the time variable value a packet was sent to decide if it
is a newly acknowledged packet but on its presence or not in the tx packet
ebtree.
BUG/MEDIUM: stconn: Don't block sends if there is a pending shutdown
For the same reason than the previous patch, we must not block the sends
when there is a pending shutdown. In other words, we must consider the sends
are allowed when there is a pending shutdown.
This patch must slowly be backported as far as 2.2. It should partially fix
issue #2249.
BUG/MEDIUM: stconn: Wake applets on sending path if there is a pending shutdown
An applet is not woken up on sending path if it is not waiting for data or
if it states it will not consume data. However, it is important to still
wake it up if there is a pending shutdown. Otherwise, the event may be
missed and some data may remain blocked in the channel's buffer.
Because of this bug, it is possible to have a stream stuck if data are also
blocked on the opposite channel. It is for instance possible to hit the buf
with the stats applet and a client not consuming data.
This patch must slowly be backported as far as 2.2. It should partially fix
issue #2249.
BUG/MINOR: stconn: Don't report blocked sends during connection establishment
The server timeout must not be handled during the connection establishment
to not superseed the connect timeout. To do so, we must not consider
outgoing data are blocked during this stage. Concretly, it means the fsb
time must not be updated during connection establishment.
It is not an issue with regular clients because the server timeout is only
defined when the connection is estalished. However, it may be an issue for the
HTTP client, when the server timeout is lower than the connect timeout. In this
case, an early 502 may be reported with no connection retries.
BUG/MEDIUM: stconn: Update stream expiration date on blocked sends
When outgoing data are blocked, we must update the stream expiration date
and requeue the task. It is important to be sure to properly handle write
timeout, expecially if the stream cannot expire on reads. This bug was
introduced when handling of channel's timeouts was refactored to be managed
by the stream-connectors.
It is an issue if there is no server timeout and the client does not consume
the response (or the opposite but it is less common). It is also possible to
trigger the same scenario with applets on server side because, most of time,
there is no server timeout.
Willy Tarreau [Thu, 31 Aug 2023 15:51:00 +0000 (17:51 +0200)]
MINOR: checks: also consider the thread's queue for rebalancing
Let's also check for other threads when the current one is queueing,
let's not wait for the load to be high. Now this totally eliminates
differences between threads.
Willy Tarreau [Thu, 31 Aug 2023 14:54:22 +0000 (16:54 +0200)]
MEDIUM: checks: implement a queue in order to limit concurrent checks
The progressive adoption of OpenSSL 3 and its abysmal handshake
performance has started to reveal situations where it simply isn't
possible anymore to succesfully run health checks on many servers,
because between the moment all the checks are started and the moment
the handshake finally completes, the timeout has expired!
This also has consequences on production traffic which gets
significantly delayed as well, all that for lots of checks. While it's
possible to increase the check delays, it doesn't solve everything as
checks still take a huge amount of time to converge in such conditions.
Here we take a different approach by permitting to enforce the maximum
concurrent checks per thread limitation and implementing an ordered
queue. Thanks to this, if a thread about to start a check has reached
its limit, it will add the check at the end of a queue and it will be
processed once another check is finished. This proves to be extremely
efficient, with all checks completing in a reasonable amount of time
and not being disturbed by the rest of the traffic from other checks.
They're just cycling slower, but at the speed the machine can handle.
One must understand however that if some complex checks perform multiple
exchanges, they will take a check slot for all the required duration.
This is why the limit is not enforced by default.
Tests on SSL show that a limit of 5-50 checks per thread on local
servers gives excellent results already, so that could be a good starting
point.
Willy Tarreau [Wed, 23 Aug 2023 09:39:00 +0000 (11:39 +0200)]
MEDIUM: checks: search more aggressively for another thread on overload
When the current check is overloaded (more running checks than the
configured limit), we'll try more aggressively to find another thread.
Instead of just opportunistically looking for one half as loaded, now if
the current thread has more than 1% more active checks than another one,
or has more than a configured limit of concurrent running checks, it will
search for a more suitable thread among 3 other random ones in order to
migrate the check there. The number of migrations remains very low (~1%)
and the checks load very fair across all threads (~1% as well). The new
parameter is called tune.max-checks-per-thread.
Willy Tarreau [Wed, 23 Aug 2023 09:25:25 +0000 (11:25 +0200)]
MINOR: check: also consider the random other thread's active checks
When checking if it's worth transferring a sleeping thread to another
random thread, let's also check if that random other thread has less
checks than the current one, which is another reason for transferring
the load there.
This commit adds a function "check_thread_cmp_load()" to compare two
threads' loads in order to simplify the decision taking.
The minimum active check count before starting to consider rebalancing
the load was now raised from 2 to 3, because tests show that at 15k
concurrent checks, at 2, 50% are evaluated for rebalancing and 30%
are rebalanced, while at 3, this is cut in half.
Willy Tarreau [Thu, 17 Aug 2023 14:25:28 +0000 (16:25 +0200)]
MINOR: checks: maintain counters of active checks per thread
Let's keep two check counters per thread:
- one for "active" checks, i.e. checks that are no more sleeping
and are assigned to the thread. These include sleeping and
running checks ;
- one for "running" checks, i.e. those which are currently
executing on the thread.
By doing so, we'll be able to spread the health checks load a bit better
and refrain from sending too many at once per thread. The counters are
atomic since a migration increments the target thread's active counter.
These numbers are reported in "show activity", which allows to check
per thread and globally how many checks are currently pending and running
on the system.
Ideally, we should only consider checks in the process of establishing
a connection since that's really the expensive part (particularly with
OpenSSL 3.0). But the inner layers are really not suitable to doing
this. However knowing the number of active checks is already a good
enough hint.
Willy Tarreau [Thu, 31 Aug 2023 14:06:10 +0000 (16:06 +0200)]
MINOR: check/activity: collect some per-thread check activity stats
We now count the number of times a check was started on each thread
and the number of times a check was adopted. This helps understand
better what is observed regarding checks.
Willy Tarreau [Wed, 23 Aug 2023 09:58:24 +0000 (11:58 +0200)]
MINOR: check: remember when we migrate a check
The goal here is to explicitly mark that a check was migrated so that
we don't do it again. This will allow us to perform other actions on
the target thread while still knowing that we don't want to be migrated
again. The new READY bit combine with SLEEPING to form 4 possible states:
SLP RDY State Description
0 0 - (reserved)
0 1 RUNNING Check is bound to current thread and running
1 0 SLEEPING Check is sleeping, not bound to a thread
1 1 MIGRATING Check is migrating to another thread
Thus we set READY upon migration, and check for it before migrating, this
is sufficient to prevent a second migration. To make things a bit clearer,
the SLEEPING bit was switched with FASTINTER so that SLEEPING and READY are
adjacent.
MINOR: checks: pin the check to its thread upon wakeup
When a check leaves the sleeping state, we must pin it to the thread that
is processing it. It's normally always the case after the first execution,
but initial checks that start assigned to any thread (-1) could be assigned
much later, causing problems with planned changes involving queuing. Thus
better do it early, so that all threads start properly pinned.
Willy Tarreau [Thu, 17 Aug 2023 14:09:05 +0000 (16:09 +0200)]
MINOR: checks: start the checks in sleeping state
The CHK_ST_SLEEPING state was introduced by commit d114f4a68 ("MEDIUM:
checks: spread the checks load over random threads") to indicate that
a check was not currently bound to a thread and that it could easily
be migrated to any other thread. However it did not start the checks
in this state, meaning that they were not redispatchable on startup.
Sometimes under heavy load (e.g. when using SSL checks with OpenSSL 3.0)
the cost of setting up new connections is so high that some threads may
experience connection timeouts on startup. In this case it's better if
they can transfer their excess load to other idle threads. By just
marking the check as sleeping upon startup, we can do this and
significantly reduce the number of failed initial checks.
BUG/MINOR: checks: do not queue/wake a bounced check
A small issue was introduced with commit d114f4a68 ("MEDIUM: checks:
spread the checks load over random threads"): when a check is bounced
to another thread, its expiration time is set to TICK_ETERNITY. This
makes it show as not expired upon first wakeup on the next thread,
thus being detected as "woke up too early" and being instantly
rescheduled. Only this after this next wakeup it will be properly
considered.
Several approaches were attempted to fix this. The best one seems to
consist in resetting t->expire and expired upon wakeup, and changing
the !expired test for !tick_is_expired() so that we don't trigger on
this case.
Willy Tarreau [Mon, 21 Aug 2023 10:12:12 +0000 (12:12 +0200)]
MEDIUM: server/ssl: pick another thread's session when we have none yet
The per-thread SSL context in servers causes a burst of connection
renegotiations on startup, both for the forwarded traffic and for the
health checks. Health checks have been seen to continue to cause SSL
rekeying for several minutes after a restart on large thread-count
machines. The reason is that the context is exlusively per-thread
and that the more threads there are, the more likely it is for a new
connection to start on a thread that doesn't have such a context yet.
In order to improve this situation, this commit ensures that a thread
starting an SSL connection to a server without a session will first
look at the last session that was updated by another thread, and will
try to use it. In order to minimize the contention, we're using a read
lock here to protect the data, and the first-level index is an integer
containing the thread number, that is always valid and may always be
dereferenced. This way the session retrieval algorithm becomes quite
simple:
- if the last thread index is valid, then try to use the same session
under a read lock ;
- if any error happens, then atomically nuke the index so that other
threads don't use it and the next one to update a connection updates
it again
And for the ssl_sess_new_srv_cb(), we have this:
- update the entry under a write lock if the new session is valid,
otherwise kill it if the session is not valid;
- atomically update the index if it was 0 and the new one is valid,
otherwise atomically nuke it if the session failed.
Note that even if only the pointer is destroyed, the element will be
re-allocated by the next thread during the sess_new_srv_sb().
Right now a session is picked even if the SNI doesn't match, because
we don't know the SNI yet during ssl_sock_init(), but that's essentially
a matter of API, since connect_server() figures the SNI very early, then
calls conn_prepare() which calls ssl_sock_init(). Thus in the future we
could easily imaging storing a number of SNI-based contexts instead of
storing contexts per thread.
It could be worth backporting this to one LTS version after some
observation, though this is not strictly necessary. the current commit
depends on the following ones:
BUG/MINOR: ssl_sock: fix possible memory leak on OOM
MINOR: ssl_sock: avoid iterating realloc(+1) on stored context
DOC: ssl: add some comments about the non-obvious session allocation stuff
CLEANUP: ssl: keep a pointer to the server in ssl_sock_init()
MEDIUM: ssl_sock: always use the SSL's server name, not the one from the tid
MEDIUM: server/ssl: place an rwlock in the per-thread ssl server session
MINOR: server/ssl: maintain an index of the last known valid SSL session
MINOR: server/ssl: clear the shared good session index on failure
MEDIUM: server/ssl: pick another thread's session when we have none yet
Willy Tarreau [Mon, 21 Aug 2023 10:04:01 +0000 (12:04 +0200)]
MINOR: server/ssl: clear the shared good session index on failure
If we fail to set the session using SSL_set_session(), we want to quickly
erase our index from the shared one so that any other thread with a valid
session replaces it.
Willy Tarreau [Mon, 21 Aug 2023 09:55:42 +0000 (11:55 +0200)]
MINOR: server/ssl: maintain an index of the last known valid SSL session
When a thread creates a new session for a server, if none was known yet,
we assign the thread id (hence the reused_sess index) to a shared variable
so that other threads will later be able to find it when they don't have
one yet. For now we only set and clear the pointer upon session creation,
we do not yet pick it.
Note that we could have done it per thread-group, so as to avoid any
cross-thread exchanges, but it's anticipated that this is essentially
used during startup, at a moment where the cost of inter-thread contention
is very low compared to the ability to restart at full speed, which
explains why instead we store a single entry.
Willy Tarreau [Mon, 21 Aug 2023 06:41:49 +0000 (08:41 +0200)]
MEDIUM: server/ssl: place an rwlock in the per-thread ssl server session
The goal will be to permit a thread to update its session while having
it shared with other threads. For now we only place the lock and arrange
the code around it so that this is quite light. For now only the owner
thread uses this lock so there is no contention.
Note that there is a subtlety in the openssl API regarding
i2s_SSL_SESSION() in that it fills the area pointed to by its argument
with a dump of the session and returns a size that's equal to the
previously allocated one. As such, it does modify the shared area even
if that's not obvious at first glance.
Willy Tarreau [Wed, 30 Aug 2023 10:02:33 +0000 (12:02 +0200)]
MEDIUM: ssl_sock: always use the SSL's server name, not the one from the tid
In ssl_sock_set_servername(), we're retrieving the current server name
from the current thread, hoping it will not have changed. This is a
bit dangerous as strictly speaking it's not easy to prove that no other
connection had to use one between the moment it was retrieved in
ssl_sock_init() and the moment it's being read here. In addition, this
forces us to maintain one session per thread while this is not the real
need, in practice we only need one session per SNI. And the current model
prevents us from sharing sessions between threads.
This had been done in 2.5 via commit e18d4e828 ("BUG/MEDIUM: ssl: backend
TLS resumption with sni and TLSv1.3"), but as analyzed with William, it
turns out that a saner approach consists in keeping the call to
SSL_get_servername() there and instead to always assign the SNI to the
current SSL context via SSL_set_tlsext_host_name() immediately when the
session is retreived. This way the session and SNI are consulted atomically
and the host name is only checked from the session and not from possibly
changing elements.
As a bonus the rdlock that was added by that commit could now be removed,
though it didn't cost much.
Willy Tarreau [Mon, 21 Aug 2023 09:17:10 +0000 (11:17 +0200)]
DOC: ssl: add some comments about the non-obvious session allocation stuff
The SSL session allocation/reuse part is far from being trivial, and
there are some necessary tricks such as allocating then immediately
freeing that are required by the API due to internal refcount. All of
this is particularly hard to grasp, even with the scarce man pages.
Let's document a little bit what's granted and expected along this path
to help the reader later.