git.ipfire.org Git - thirdparty/haproxy.git/log

MINOR: trace: split the CLI "trace" parser in CLI vs statement

In order to be able to reuse the "trace" statements elsewhere (e.g.
global section), we'll first need to split its parser. It turns out
that the whole thing is self-contained inside a single function that
emits a single message on warning/error or nothing on success. That's
quite easy to split in two parts, the one that does the job and produces
the status message and the one that sends it to the CLI. That's what
this patch does.

DOC: config: fix alphabetical ordering of global section

the global section keywords were seriously misordered, and it's visible
that some mistakes have induced other ones over time, so it was about
time to fix this. Roughly 20% of the keywords were misplaced.

This commit only reordered the keywords index and their description,
nothing else was changed. It might be backported because it's a real
pain to find certain options there.

REG-TESTS: cache: Remove T-E header for 304-Not-Modified responses

VTEST does not properly handle 304-Not-Modified responses. If a
Transfer-Encoding header (and probably a Content-Lenght header too), it
waits for a body. Waiting for a fix, the Transfer-Encoding encoding of
cached responses in 2 VTEST scripts are removed.

Note it is now an issue because of a fix in the H1 multiplexer :

* 226082d13a "BUG/MINOR: mux-h1: Do not send a last null chunk on body-less answers"

This patch must be backported with the above commit.

BUG/MINOR: mux-h1: Do not send a last null chunk on body-less answers

HEAD answers should not contain any body data. Currently when a
"transfer-encoding: chunked" header is returned, a last null-chunk is added to
the answer. Some clients choke on it and fail when trying to reuse the
connection. Check that the response should not be body-less before sending the
null-chunk.

This patch should fix #1932. It must be backported as far as 2.4.

MINOR: dynbuf: switch allocation and release to macros to better track users

When building with DEBUG_MEM_STATS, we only see b_alloc() and b_free() as
users of the "buffer" pool, because all call places rely on these more
convenient functions. It's annoying because it makes it very hard to see
which parts of the code are consuming buffers.

By switching the b_alloc() and b_free() inline functions to macros, we
can now finally track the users of struct buffer, e.g:

  mux_h1.c:513            P_FREE  size:   1275002880  calls:     38910  size/call:  32768 buffer
  mux_h1.c:498           P_ALLOC  size:   1912438784  calls:     58363  size/call:  32768 buffer
  stream.c:763            P_FREE  size:   4121493504  calls:    125778  size/call:  32768 buffer
  stream.c:759            P_FREE  size:   2061697024  calls:     62918  size/call:  32768 buffer
  stream.c:742           P_ALLOC  size:   3341123584  calls:    101963  size/call:  32768 buffer
  stream.c:632            P_FREE  size:   1275068416  calls:     38912  size/call:  32768 buffer
  stream.c:631            P_FREE  size:    637435904  calls:     19453  size/call:  32768 buffer
  channel.h:850          P_ALLOC  size:   4116480000  calls:    125625  size/call:  32768 buffer
  channel.h:850          P_ALLOC  size:       720896  calls:        22  size/call:  32768 buffer
  dynbuf.c:55             P_FREE  size:        65536  calls:         2  size/call:  32768 buffer

Let's do this since it doesn't change anything for the output code
(beyond adding the call places). Interestingly the code even got
slightly smaller now.

MINOR: pool/debug: create a new pool_alloc_flag() macro

This macro just serves as an intermediary for __pool_alloc() and forwards
the flag. When DEBUG_MEM_STATS is set, it will be used to collect all
pool allocations including those which need to pass an explicit flag.

It's now used by b_alloc() which previously couldn't be tracked by
DEBUG_MEM_STATS, causing some free() calls to have no corresponding
allocations.

BUG/MINOR: ssl: SSL_load_error_strings might not be defined

The SSL_load_error_strings function was marked as deprecated in OpenSSL
1.1.0 so compiling HAProxy with OPENSSL_NO_DEPRECATED set and a recent
OpenSSL library would fail.
The manpages say that this function was replaced by OPENSSL_init_crypto
and OPENSSL_init_ssl which are already called at start up by the SSL
lib. We do not seem to be in a case where explicit call of those
functions is required.

This patch fixes GitHub issue #1813.
It can be backported to 2.6.

BUG/MEDIUM: mux-fcgi: Avoid value length overflow when it doesn't fit at once

When the request data are copied in a mbuf, if the free space is too small
to copy all data at once, the data length is shortened. When this is
performed, we reserve the size of the STDIN recod header and eventually the
same for the empty STDIN record if it is the last HTX block of the request.

However, there is no test to be sure the free space is large enough. Thus,
on this special case, when the mbuf is almost full, it is possible to
overflow the value length. Because of this bug, it is possible to experience
crashes from time to time.

This patch should fix the issue #1923. It must be backported as far as 2.4.

BUG/MINOR: mux-fcgi: Be sure to send empty STDING record in case of zero-copy

When the last HTX DATA block was copied in zero-copy, the empty STDIN
record, marking the end of the request data was never sent. Thanks to this
patch, it is now sent.

This patch must be backported as far as 2.4.

BUG/MINOR: resolvers: Set port before IP address when processing SRV records

For a server subject to SRV resolution, when the server's address is set,
its dynamic cookie, if any, and its server key are computed. Both are based
on the ip/port pair. However, this happens before the server's port is
set. Thus the port is equal to 0 at this stage. It is a problem if several
servers share the same IP but with different ports because they will share
the same dynamic cookie and the same server key, disturbing this way the
connection persistency and the session stickiness.

This patch must be backported as far as 2.2.

BUG/MINOR: resolvers: Don't wait periodic resolution on healthcheck failure

DNS resoltions may be triggered via a "do-resolve" action or when a connection
failure is experienced during a healthcheck. Cached valid responses are used, if
possible. But if the entry is expired or if there is no valid response, a new
reolution should be performed. However, an resolution is only performed if the
"resolve" timeout is expired. Thus, when this comes from a healthcheck, it means
no extra resolution is performed at all.

Now, when the resolution is performed for a server (SRV or SRVEQ) and no valid
response is found, the resolution timer is reset (last_resolution is set to
TICK_ETERNITY). Of course, it is only performed if no resolution is already
running.

Note that this feature was broken 5 years ago when the resolvers code was
refactored (67957bd59e).

This patch should fix the issue #1906. It affects all stable versions. However,
it is probably a good idea to not backport it too far (2.6, maybe 2.4) and with
some delay.

BUG/MINOR: http-htx: Fix error handling during parsing http replies

When an error is triggered during arguments parsing of an http reply (for
instance, from a "return" rule), while a log-format body was expected but
not evaluated yet, HAproxy crashes when the body log-format string is
released because it was not properly initialized.

The list used for the log-format string must be initialized earlier.

This patch should fix the issue #1925. It must be backported as far as 2.2.

MINOR: ssl: reintroduce ERR_GET_LIB(ret) == ERR_LIB_PEM in ssl_sock_load_pem_into_ckch()

Commit 432cd1a ("MEDIUM: ssl: be stricter about chain error") removed
the ERR_GET_LIB(ret) != ERR_LIB_PEM to be stricter about errors.

However, PEM_R_NO_START_LINE is better be checked with ERR_LIB_PEM.
So this patch complete the previous one.

The original problem was that the condition was wrongly inversed. This
original code from openssl:

  if (ERR_GET_LIB(err) == ERR_LIB_PEM &&
      ERR_GET_REASON(err) == PEM_R_NO_START_LINE)

became:

  if (ret && (ERR_GET_LIB(ret) != ERR_LIB_PEM &&
              ERR_GET_REASON(ret) != PEM_R_NO_START_LINE))

instead of:

  if (ret && !(ERR_GET_LIB(ret) == ERR_LIB_PEM &&
               ERR_GET_REASON(ret) == PEM_R_NO_START_LINE))

This must not be backported as it will break a lot of setup. That's too
bad because a lot of errors are lost. Not marked as a bug because of the
breakage it could cause on working setups.

MINOR: ssl: ssl_sock_load_cert_chain() display error strings

Display error strings when SSL_CTX_use_certificate() or
SSL_CTX_set1_chain() doesn't work.

OPTIM: ebtree: make ebmb_insert_prefix() keep a copy the new node's key

Similarly to the previous patch, it's better to keep a local copy of
the new node's key instead of accessing it every time. This slightly
reduces the code's size in the descent and further improves the load
time to 7.45s.

OPTIM: ebtree: make ebmb_insert_prefix() keep a copy the new node's pfx

looking at a perf profile while loading a conf with a huge map, it
appeared that there was a hot spot on the access to the new node's
prefix, which is unexpectedly being reloaded for each visited node
during the tree descent. Better keep a copy of it because with large
trees that don't fit into the L3 cache the memory bandwidth is scarce.
Doing so reduces the load time from 8.0 to 7.5 seconds.

MINOR: deinit: add a "quick-exit" option to bypass the deinit step

Once in a while we spot a bug in the deinit code that is complex,
especially when it has to deal with incomplete initializations, and the
ability to bypass this step has regularly been raised. In addition for
fast-reloading setups it could theoretically save some time. Tests have
shown that very large configs can barely save ~100-150ms by skipping the
deinit step. However the ability not to crash if a bug is encountered can
occasionally help.

This patch adds an option to do exactly this. It's obviously not enabled
by default and the documentation discourages from using it, but this might
be useful in the future.

BUG/MEDIUM: wdt/clock: properly handle early task hangs

In ae053b30 - BUG/MEDIUM: wdt: don't trigger the watchdog when p is unitialized:
wdt is not triggering until prev_cpu_time
is initialized to prevent unexpected process
termination.

Unfortunately this is not enough, some tasks could start
immediately after process startup, and in such cases
prev_cpu_time could be uninitialized, because
prev_cpu_time is set after the polling loop while
process_runnable_tasks() is executed before the polling loop.

It happens to be the case with lua tasks registered using
register_task function from lua script.

Those tasks are registered in early init stage of haproxy and
they are scheduled to run before the first polling loop,
leading to prev_cpu_time being uninitialized (equals 0)
on the thread when the task is first executed.

Because of this, if such tasks get stuck right away
(e.g: blocking IO) the watchdog won't behave as expected
and the thread will remain stuck indefinitely.
(polling loop for the thread won't run at all as
the thread is already stuck)

To solve this, we're now making sure that prev_cpu_time is first
set before any tasks are processed on the thread.
This is done by setting initial prev_cpu_time value directly
in clock_init_thread_date()

Thanks to Abhijeet Rastogi for reporting this unexpected behavior.

It could be backported in every stable versions.
(everywhere ae053b30 is, because both are related)

MEDIUM: http-ana: remove set-cookie2 support

This has never really been implemented in clients nor servers. We wanted
to drop it from 2.5 already but forgot, so let's do it now. The code was
only minimally changed. It could possibly be slightly simplified but it
would only be marginal, at the great risk of breaking something, thus
let's keep it in its proven state instead.

Tracked in github issue #1551.

BUG/MEDIUM: stick-table: fix a race condition when updating the expiration task

Pierre Cheynier reported a rare crash that can affect stick-tables. When
a entry is created, the stick-table's expiration date is updated. But if
at exactly the same time the expiration task runs, it finishes by updating
its expiration timer without any protection, which may collide with the
call to task_queue() in another thread. In this case, it sometimes happens
that the first test for TICK_ETERNITY in task_queue() passes, then the
"expire" field is reset, then the BUG_ON() triggers, like below:

  FATAL: bug condition "task->expire == 0" matched at src/task.c:279
    call trace(13):
    |       0x649d86 [c6 04 25 01 00 00 00 00]: __task_queue+0xc6/0xce
    |       0x596bef [eb 90 ba 03 00 00 00 be]: stktable_requeue_exp+0x1ef/0x258
    |       0x596c87 [48 83 bb 90 00 00 00 00]: stktable_touch_with_exp+0x27/0x312
    |       0x563698 [48 8b 4c 24 18 4c 8b 4c]: stream_process_counters+0x3a8/0x6a2
    |       0x569344 [49 8b 87 f8 00 00 00 48]: process_stream+0x3964/0x3b4f
    |       0x64a80b [49 89 c7 e9 23 ff ff ff]: run_tasks_from_lists+0x3ab/0x566
    |       0x64ad66 [29 44 24 14 8b 7c 24 14]: process_runnable_tasks+0x396/0x71e
    |       0x6184b2 [83 3d 47 b3 a6 00 01 0f]: run_poll_loop+0x92/0x4ff
    |       0x618acf [48 8b 1d aa 20 7d 00 48]: main+0x1877ef
    | 0x7fc7d6ec1e45 [64 48 89 04 25 30 06 00]: libpthread:+0x7e45
    | 0x7fc7d6c9e4af [48 89 c7 b8 3c 00 00 00]: libc:clone+0x3f/0x5a

This one is extremely difficult to reproduce in practice, but adding a
printf() in process_table_expire() before assigning the value, while
running with an expire delay of 1ms helps a lot and may trigger the
crash in less than one minute on a 8-thread machine. Interestingly,
depending on the sequencing, this bug could also have made a table fail
to expire if the expire field got reset after the last update but before
the call to task_queue(). It would require to be quite unlucky so that
the table is never touched anymore after the race though.

The solution taken by this patch is to take the table's lock when
updating its expire value in stktable_requeue_exp(), enclosing the call
to task_queue(), and to update the task->expire field while still under
the lock in process_table_expire(). Note that thanks to previous changes,
taking the table's lock for the update in stktable_requeue_exp() costs
almost nothing since we now have the guarantee that this is not done more
than 1000 times a second.

Since process_table_expire() sets the timeout after returning from
stktable_trash_expired() which just released the lock, the two functions
were merged so that the task's expire field is updated while still under
the lock. Note that this heavily depends on the two previous patches
below:

  CLEANUP: stick-table: remove the unused table->exp_next
  OPTIM: stick-table: avoid atomic ops in stktable_requeue_exp() when possible

This is a bit complicated due to the fact that in 2.7 some parts were
made lockless. In 2.6 and older, the second part (the merge of the
two functions) will be sufficient since the task_queue() call was
already performed under the table's lock, and the patches above are
not needed.

This needs to be backported as far as 1.8 scrupulously following
instructions above.

OPTIM: stick-table: avoid atomic ops in stktable_requeue_exp() when possible

Since the task's time resolution is the millisecond we know there
will not be more than 1000 useful updates per second, so there's no
point in doing a CAS and a task_queue() for each call, better first
check if we're going to change the date. Now we're certain not to
perform such operations more than 1000 times a second for a given
table.

The loop was modified because this improvement will also be used to fix a
bug later.

CLEANUP: stick-table: remove the unused table->exp_next

The ->exp_next field of the stick-table was probably useful in 1.5 but
it currently only carries a copy of what the future value of the table's
task's expire value will be, while it's systematically copied over there
immediately after being assigned. As such it provides exactly a local
variable. Let's remove it, as it costs atomic operations.

BUG/MINOR: ssl: Fix potential overflow

Coverity raised a potential overflow issue in these new functions that
work on unsigned long long objects. They were added in commit 9b25982
"BUG/MEDIUM: ssl: Verify error codes can exceed 63".

This patch needs to be backported alongside 9b25982.

BUG/MINOR: ssl: crt-ignore-err memory leak with 'all' parameter

Only allocate "str" if the parameter is not "all" in order to avoid any
memory leak.

No backport needed.

CLEANUP: ssl: remove printf in bind_parse_ignore_err

Remove debug printf.

No backport needed.

BUILD: prometheus: use __fallthrough in promex_dump_metrics() and IO handler()

This avoids 11 build warnings when preprocessing happens before compiling
with gcc >= 7.

BUILD: stconn: use __fallthrough in various shutw() functions

This avoids 7 build warnings when preprocessing happens before compiling
with gcc >= 7.

BUILD: compression: use __fallthrough in comp_http_payload()

This avoids one build warning when preprocessing happens before compiling
with gcc >= 7.

BUILD: map: use __fallthrough in cli_io_handler_*()

This avoids 6 build warnings when preprocessing happens before compiling
with gcc >= 7.

BUILD: vars: use __fallthrough in var_accounting_{diff,add}()

This avoids 6 build warnings when preprocessing happens before compiling
with gcc >= 7.

BUILD: h1_htx: use __fallthrough in h1_parse_chunk()

This avoids one build warning when preprocessing happens before compiling
with gcc >= 7.

BUILD: http_act: use __fallthrough in parse_http_del_header()

This avoids one build warning when preprocessing happens before compiling
with gcc >= 7.

BUILD: check: use __fallthrough in __health_adjust()

This avoids one build warning when preprocessing happens before compiling
with gcc >= 7.

BUILD: logs: use __fallthrough in build_log_header()

This avoids 4 build warnings when preprocessing happens before compiling
with gcc >= 7.

BUILD: spoe: use __fallthrough in spoe_handle_appctx()

This avoids two build warnings when preprocessing happens before compiling
with gcc >= 7.

BUILD: acl: use __fallthrough in parse_acl_expr()

This avoids two build warnings when preprocessing happens before compiling
with gcc >= 7.

BUILD: args: use __fallthrough in make_arg_list()

This avoids one build warning when preprocessing happens before compiling
with gcc >= 7.

BUILD: tools: use __fallthrough in url_decode()

This avoids one build warning when preprocessing happens before compiling
with gcc >= 7.

BUILD: hash: use __fallthrough in hash_djb2()

This avoids 5 build warnings when preprocessing happens before compiling
with gcc >= 7.

BUILD: peers: use __fallthrough in peer_io_handler()

This avoids 7 build warnings when preprocessing happens before compiling
with gcc >= 7.

BUILD: stats: use __fallthrough in stats_dump_proxy_to_buffer()

This avoids 12 build warnings when preprocessing happens before compiling
with gcc >= 7.

BUILD: tcpcheck: use __fallthrough in check_proxy_tcpcheck()

This avoids 1 build warning when preprocessing happens before compiling
with gcc >= 7.

BUILD: stream: use __fallthrough in stats_dump_full_strm_to_buffer()

This avoids 1 build warning when preprocessing happens before compiling
with gcc >= 7.

BUILD: hlua: use __fallthrough in hlua_post_init_state()

This avoids 5 build warnings when preprocessing happens before compiling
with gcc >= 7.

BUILD: ssl: use __fallthrough in cli_io_handler_tlskeys_files()

This avoids two build warnings when preprocessing happens before compiling
with gcc >= 7.

BUILD: ssl: use __fallthrough in cli_io_handler_commit_{cert,cafile_crlfile}()

This avoids 8 build warnings when preprocessing happens before compiling
with gcc >= 7.

BUILD: ssl/crt-list: use __fallthrough in cli_io_handler_add_crtlist()

This avoids 3 build warnings when preprocessing happens before compiling
with gcc >= 7.

BUILD: quic: use __fallthrough in quic_connect_server()

This avoids one build warning when preprocessing happens before compiling
with gcc >= 7.

BUILD: sample: use __fallthrough in smp_is_rw() and smp_dup()

This avoids three build warnings when preprocessing happens before compiling
with gcc >= 7.

BUILD: compiler: define a __fallthrough statement for switch/case

When the code is preprocessed first and compiled later, such as when
built under distcc, a lot of fallthrough warnings are emitted because
the preprocessor has already stripped the comments.

As an alternative, a "fallthrough" attribute was added with the same
compilers as those which started to emit those warnings. However it's
not portable to older compilers. Let's just define a __fallthrough
statement that corresponds to this attribute on supported compilers
and only switches to the classical empty do {} while (0) on other ones.

This way the code will support being cleaned up using __fallthrough.

BUILD: compiler: add a default definition for __has_attribute()

It happens that gcc since 5.x has this macro which is only mentioned
once in the doc, associated with __builtin_has_attribute(). Clang had
it at least since 3.0. In addition it validates #ifdef when present,
so it's easy to detect it. Here we're providing a fallback to another
macro __has_attribute_<name> so that it's possible to define that macro
to the value 1 for older compilers when the attribute is supported.

BUILD: compiler: add a macro to detect if another one is set and equals 1

In order to simplify compiler-specific checks, we'll need to check if some
attributes exist. In order to ease declarations, we'll only focus on those
that exist and will set them to 1. Let's first add a macro aimed at doing
this. Passed a macro name in argument, it will return 1 if the macro is
defined and equals 1, otherwise it will return 0. This is based on the
concatenation of the macro's value with a name to form the name of a macro
which contains one comma, resulting in some other macros arguments being
shifted by one when the macro is defined. As such it's only a matter of
pushing both a 1 and a 0 and picking the correct argument to see the
desired one. It was verified to work since at least gcc-3.4 so it should
be portable enough.

IMPORT: slz: define and use a __fallthrough statement for switch/case

When the code is preprocessed first and compiled later, such as when
built under distcc, the "fall through" comments are dropped and warnings
are emitted. Let's use the alternative "fallthrough" attribute instead,
that is supported by versions of gcc and clang that also produce this
warning.

This is libslz upstream commit 0fdf8ae218f3ecb0b7f22afd1a6b35a4f94053e2

IMPORT: slz: mention the potential header in slz_finish()

There may be 2 or 10 bytes sent respectively for zlib and gzip.

This is libslz upstream commit de1cac155ac730ba0491a6c866a510760c01fa9b

IMPORT: slz: declare len to fix debug build when optimal match is enabled

Building with -DFIND_OPTIMAL_MATCH would fail on undeclared "len".
This one likely vanished in some cleanup.

This is libslz upstream commit 1ea20360715e1ad0cd81db83fa4361310716b8cc

IMPORT: xxhash: update xxHash to version 0.8.1

This is the latest released version and a minor update on top of the
current one (0.8.0). It addresses a few build issues (some for which
patches were already backported), and particularly the fallthrough
issue by using an attribute instead of a comment.

CI: emit the compiler's version in the build reports

Some occasional builds fail only on a specific platform and being able
to figure the exact compiler version used there is crucial. It's not
easy to guess from the rest of the output, so let's add it before the
platform-specific defines, which suit the same needs.

BUILD: debug: remove unnecessary quotes in HA_WEAK() calls

HA_WEAK() is supposed to take a symbol in argument, not a string, since
the asm statements it produces already quote the argument. Having it
quoted twice doesn't work on older compilers and was the only reason
why DEBUG_MEM_STATS didn't work on older compilers.

BUILD: ssl_utils: fix build on gcc versions before 8

Commit 960fb74ca ("MEDIUM: ssl: {ca,crt}-ignore-err can now use error
constant name") provided a very convenient way to initialize only desired
macros. Unfortunately with gcc versions older than 8, it breaks with:

src/ssl_utils.c:473:12: error: initializer element is not constant

because it seems that the compiler cannot resolve strings to constants
at build time.

This patch takes a different approach, it stores the value of the macro
as a string and this string is converted to integer at boot time. This
way it works everywhere.

BUG/MINOR: ssl: bind_conf is uncorrectly accessed when using QUIC

Since commit 9b2598 ("BUG/MEDIUM: ssl: Verify error codes can exceed
63"), the ca_ignerr_bitfield and crt_ignerr_bietfield are incorrecly
accessed from __objt_listener(conn->target)->bind_conf which is not
avaiable from QUIC. The bind_conf variable was mistakenly replaced.

This patch fixes the issue by using again the bind_conf variable.

Must be backported where 9b2598 was backported.

MINOR: server: clear prefix on stderr logs after add server

cli_parse_add_server() is the CLI handler for 'add server' command. This
functions uses usermsgs_ctx to retrieve logs messages from internal
ha_alert() calls and display it at the end of the handler.

At the beginning of the handler, stderr prefix is defined to "CLI" via
usermsgs_clr() function. However, this is not resetted at the end. This
causes inconsistency for stderr output :
1. each ha_alert() invocation will reuse "CLI" prefix if 'add server'
   command was executed before, even in non-CLI context
2. usermsgs_ctx is thread local, so this is only true if this runs on
   the same thread as 'add server' handler.

To fix this, ensure that "CLI" prefix is now resetted after
cli_parse_add_server(). This is done thanks to the addition to
cli_umsg()/cli_umsgerr() functions.

This can be backported up to 2.5 if we prefer to ensure output
consistency at the risk of changing stderr behaviors in stable versions.
In this case, the previous commit should be backported before :
  MINOR: cli: define usermsgs print context

MINOR: cli: define usermsgs print context

CLI 'add server' handler relies on usermsgs_ctx to display errors in
internal function on CLI output. This may be also extended to other
handlers.

However, to not clutter stderr from another contextes, usermsgs_ctx must
be resetted when it is not needed anymore. This operation cannot be
conducted in the CLI parse handler as display is conducted after it.

To achieve this, define new CLI states CLI_ST_PRINT_UMSG /
CLI_ST_PRINT_UMSGERR. Their principles is nearly identical to states for
dynamic messages printing.

CLEANUP: cli: rename dynamic error printing state

Rename CLI_ST_PRINT_FREE to CLI_ST_PRINT_DYNERR.

Most notably, this highlights that this is reserved to error printing.

This is done to ensure consistency between CLI_ST_PRINT/CLI_ST_PRINT_DYN
and CLI_ST_PRINT_ERR/CLI_ST_PRINT_DYNERR. The name is also consistent
with the function cli_dynerr() which activates it.

MINOR: ssl: x509_v_err_str converter transforms an integer to a X509_V_ERR name

The x509_v_err_str converter transforms a numerical X509 verify error
to its constant name.

MEDIUM: ssl: {ca,crt}-ignore-err can now use error constant name

The ca-ignore-err and crt-ignore-err directives are now able to use the
openssl X509_V_ERR constant names instead of the numerical values.

This allow a configuration to survive an OpenSSL upgrade, because the
numerical ID can change between versions. For example
X509_V_ERR_INVALID_CA was 24 in OpenSSL 1 and is 79 in OpenSSL 3.

The list of errors must be updated when a new major OpenSSL version is
released.

BUG/MEDIUM: ssl: Verify error codes can exceed 63

The CRT and CA verify error codes were stored in 6 bits each in the
xprt_st field of the ssl_sock_ctx meaning that only error code up to 63
could be stored. Likewise, the ca-ignore-err and crt-ignore-err options
relied on two unsigned long longs that were used as bitfields for all
the ignored error codes. On the latest OpenSSL1.1.1 and with OpenSSLv3
and newer, verify errors have exceeded this value so these two storages
must be increased. The error codes will now be stored on 7 bits each and
the ignore-err bitfields are replaced by a big enough array and
dedicated bit get and set functions.

It can be backported on all stable branches.

[wla: let it be tested a little while before backport]
Signed-off-by: William Lallemand <wlallemand@haproxy.org>

CI: enable QUIC for LibreSSL builds

since LibreSSL-3.6.x supports QUIC, let us enable it

CI: switch to the "latest" LibreSSL

LibreSSL-3.6.0 had some regression, it was fixed in 3.6.1, let us
switch back to the latest LibreSSL available

BUG/MINOR: ssl: ocsp structure not freed properly in case of error

In case of error, the ocsp item might already be in the ocsp certificate
tree but simply freed instead of destroyed through ssl_sock_free_ocsp.

This patch can be backported to all stable versions.

BUG/MINOR: ssl: Memory leak of AUTHORITY_KEYID struct when loading issuer

When calling ssl_get0_issuer_chain, if akid is not NULL but its keyid
is, then the AUTHORITY_KEYID is not freed.

This patch can be backported to all stable branches.

BUG/MINOR: ssl: Memory leak of DH BIGNUM fields

When running HAProxy with OpenSSLv3, the two BIGNUMs used to build our
own DH parameters are not freed. It was not necessary previously because
ownership of those parameters was transferred to OpenSSL through the
DH_set0_pqg call.

This patch should be backported to 2.6.

BUG/MINOR: httpclient: fixed memory allocation for the SSL ca_file

The memory for the SSL ca_file was allocated only once (in the function
httpclient_create_proxy()) and that pointer was assigned to each created
proxy that the HTTP client uses.  This would not be a problem if this
memory was not freed in each individual proxy when it was deinitialized
in the function ssl_sock_free_srv_ctx().

  Memory allocation:
    src/http_client.c, function httpclient_create_proxy():
      1277: if (!httpclient_ssl_ca_file)
      1278: httpclient_ssl_ca_file = strdup("@system-ca");
      1280: srv_ssl->ssl_ctx.ca_file = httpclient_ssl_ca_file;

  Memory deallocation:
    src/ssl_sock.c, function ssl_sock_free_srv_ctx():
      5613: ha_free(&srv->ssl_ctx.ca_file);

This should be backported to version 2.6.

CLEANUP: ssl: remove dead code in ssl_sock_load_pem_into_ckch()

Commit 432cd1a ("MEDIUM: ssl: be stricter about chain error")
introduced some dead code, let's remove it.

Should fix issue #1909.

CLEANUP: assorted typo fixes in the code and comments

This is 32nd iteration of typo fixes

CI: add monthly gcc cross compile jobs

Build only gcc cross compile jobs are added with monthly run to catch
rare errors, mostly 32bit <--> 64bit

BUG/MINOR: quic: fix race condition on datagram purging

Each datagram is received by a random thread and dispatch to its
destination thread linked to the connection. Then, the datagram is
handled by the connection thread. Once this is done, datagram buffer
pointer is atomically set to NULL to mark it as consumed.

Consumed datagrams are purged before recvfrom() invocation on random
receiver threads. The check for NULL buffer must thus be done
atomically. This was not the case before this patch, which may have
triggered race conditions.

This bug has been introduced by commit
91b2305ad79bb7086840797b6e98bd791992444f
MINOR: quic: implement datagram cleanup for quic_receiver_buf

This should be backported up to 2.6 after previously mentionned commit.

MINOR: quic: add counter for interrupted reception

Add a new counter "quic_rxbuf_full". It is incremented each time
quic_sock_fd_iocb() is interrupted on full buffer.

This should help to debug github issue #1903. It is suspected that
QUIC receiver buffers are full which in turn cause quic_sock_fd_iocb()
to be called repeatedly resulting in a high CPU consumption.

MINOR: ssl: dump the SSL string error when SSL_CTX_use_PrivateKey() failed.

Display the OpenSSL reason error string when SSL_CTX_use_PrivateKey()
failed.

BUG/MINOR: log: fixing bug in tcp syslog_io_handler Octet-Counting

syslog_io_handler does specific treatment to handle syslog tcp octet
counting:

Logic was good, but a sneaky mistake prevented
rfc-6587 octet counting from working properly.

trash.area was used as an input buffer.
It does not make sense here since it is uninitialized.
Compilation was unaffected because trash is a thread
local "global" variable.

buf->area should definitely be used instead.

This should be backported as far as 2.4.

BUG/MINOR: quic: fix subscribe operation

Subscribing was not properly designed between quic-conn and quic MUX
layers. Align this as with in other haproxy components : <subs> field is
moved from the MUX to the quic-conn structure. All mention of qcc MUX is
cleaned up in quic_conn_subscribe()/quic_conn_unsubscribe().

Thanks to this change, ACK reception notification has been simplified.
It's now unnecessary to check for the MUX existence before waking it.
Instead, if <subs> quic-conn field is set, just wake-up the upper layer
tasklet without mentionning MUX. This should probably be extended to
other part in quic-conn code.

This should be backported up to 2.6.

MINOR: quic: remove unnecessary quic_session_accept()

A specialized listener accept was previously used for QUIC. This is now
unneeded and we can revert to the default one session_accept_fd().

One change of importance is that the call order between
conn_xprt_start() and conn_complete_session() is now reverted to the
default one. This means that MUX instance is now NULL during
qc_xprt_start() and its app-ops layer cannot be set here. This operation
has been delayed to qc_init() to prevent a segfault.

This should be backported up to 2.6.

BUG/MAJOR: stick-table: don't process store-response rules for applets

The commit bc7c207f74 ("BUG/MAJOR: stick-tables: do not try to index a
server name for applets") tried to catch applets case when we tried to index
the server name. However, there is still an issue. The applets are
unconditionally casted to servers and this bug exists since a while. it's
just luck if it doesn't crash.

Now, when store rules are processed, we skip the rule if the stream's target
is not a server or, of course, if it is a server but the "non-stick" option
is set. However, we still take care to release the sticky session.

This patch must be backported to all stable versions.

MEDIUM: ssl: be stricter about chain error

The error check on certificate chain was ignoring all decoding error,
silently ignoring some errors.

This patch fixes the issue by being stricter on errors when reading the
chain, this is a change of behavior, it could break existing setup that
has a wrong chain.

MINOR: ssl: add the SSL error string before the chain

Add the SSL error string when failing to load a certificate in
ssl_sock_load_pem_into_ckch(). It's difficult to know what happen when no
descriptive errror are emitted. This one is for the certificate before
trying to load the complete chain.

MINOR: ssl: add the SSL error string when failing to load a certificate

Add the SSL error string when failing to load a certificate in
ssl_sock_load_pem_into_ckch(). It's difficult to know what happen when no
descriptive errror are emitted.

Example:
[ALERT] (1264006) : config : parsing [ssl_default_server.cfg:51] : 'bind /tmp/ssl.sock' in section 'listen' : unable to load certificate chain from file 'reg-tests/ssl//common.pem': ASN no PEM Header Error

BUG/MINOR: sink: Set default connect/server timeout for implicit ring buffers

Ring buffers may be implicitly created from log declarations when "tcp@",
"tcp6@", "tcp4@" or "uxst@" prefixes are used. These ring buffers rely on
unconfigurable proxies. While connect and server timeouts should be defined for
explicit ring buffers, it is no possible for implicit ones. However, a default
value must be set and TICK_ETERNITY is not an acceptable one.

Thus, now "1s" is set for the connect timeout and "5s" is set for server one.

This patch may be backported as far as 2.4.

BUG/MINOR: sink: Only use backend capability for the sink proxies

When a ring section is parsed, a proxy is created. For now, it has the
frontend (PR_CAP_FE) and the internal (PR_CAP_INT) capabilities, in addition
to the expected backend capability (PR_CAP_BE).

PR_CAP_INT capability was added to silent warning triggered because of
PR_CAP_FE capability. Indeed, Because the proxy is declared as a frontend,
warnings about missing bind lines and missing client timeout should be
triggered during the configuration parsing. These warnings are inhibited
because PR_CAP_INT capability is set. It is an issue on the 2.4 because
PR_CAP_INT capability does not exist. So warnings are always emitted.

But the true bug is that these proxies should not have PR_CAP_FE and
PR_CAP_INT capabilities. Removing these capabilities is enough to remove any
warnings on the 2.4, with no regression on higher versions. However, it may
be a good idea to eval if a dedicated frontend for sinks should be added or
not. This way, a true frontend would be used to start the sink applets. In
addition, proxies capabilities/modes have to be reviewed to have a less
ambiguous API. For instance a dedicate mode for sinks (PR_MODE_SINK ?) may
be added. Finally, it could be very nice to have all proxies in the same
list, including internal ones.

This patch should fix the issue #1900. It must be backported as far as 2.4.

MINOR: peers: handle multiple resync requests using shards

We considered the resync process is finished if a full resync request
is ended receiving the "resync-finish" message. But in the case of
"shards" each node declared with a "shard" has only a partial view
of the table. And the resync process is ended whereas the original
peer tables content contains only a "shard" of the full content.

This patch allow to retrieve the entire tables requesting a resync
from all different "shards".

To do so we don't commit the end of a resync process receiving a
"resync-finish" if the node is part of "shard", we only flag this
peer and all peers using the same shard as "notup2date" as if we
received a "resync-partial" message, and we re-schedule a request
of a resync as it is done receiving a "resync-partial" message.

Doing this the peers flagged "notup2date" won't be addressed for the
next resync request round and the next resync request will be send to
a shard not yet requested.

Receving a "resync-finish" message we also check if all peers using
"shards" are flagged "notup2date". It meens that all peers have been
addressed and we can considered the resync process is now finished.

Note also that the "resync request" scheduler already handle a timeout
and if we are not able to retrieve a full resync after a delay. The
resync process is ended.

This patch should be backported in all versions handling "shard"
on peer lines.

MINOR: peers: Support for peer shards

Add "shards" new keyword for "peers" section to configure the number
of peer shards attached to such secions. This impact all the stick-tables
attached to the section.
Add "shard" new "server" parameter to configure the peers which participate to
all the stick-tables contents distribution. Each peer receive the stick-tables updates
only for keys with this shard value as distribution hash. The "shard" value
is stored in ->shard new server struct member.
cfg_parse_peers() which is the function which is called to parse all
the lines of a "peers" section is modified to parse the "shards" parameter
stored in ->nb_shards new peers struct member.
Add srv_parse_shard() new callback into server.c to pare the "shard"
parameter.
Implement stksess_getkey_hash() to compute the distribution hash for a
stick-table key as the 64-bits xxhash of the key concatenated to the stick-table
name. This function is called by stksess_setkey_shard(), itself
called by the already implemented function which create a new stick-table
key (stksess_new()).
Add ->idlen new stktable struct member to store the stick-table name length
to not have to compute it each time a stick-table key hash is computed.

MINOR: quic: display unknown error sendto counter on stat page

This patch complete the previous incomplete commit. The new counter
sendto_err_unknown is now displayed on stats page/CLI show stats.

This is related to github issue #1903.

This should be backported up to 2.6.

MINOR: quic: do not crash on unhandled sendto error

Remove ABORT_NOW() statement on unhandled sendto error. Instead use a
dedicated counter sendto_err_unknown to report these cases.

If we detect increment of this counter, strace can be used to detect
errno value :
$ strace -p $(pidof haproxy) -f -e trace=sendto -Z

This should be backported up to 2.6.

This should help to debug github issue #1903.

BUG/MEDIUM: compression: handle rewrite errors when updating response headers

When an HTTP response is compressed by HAProxy, the headers are updated.
However it is possible to encounter a rewrite error because the buffer is
full. In this case, the compression is aborted. Thus, we must be sure to
leave the response in a valid state.

For now, it is an issue because the "Content-Encoding" header is added
before all other headers manipulations. So if the compression is aborted on
error, the "Content-Encoding" header may remain while the payload is not
compressed.

So now, we take care to leave with a valid response on error by reordering
the headers manipulations. It is too painful to really rollback all changes,
especially for an edge case.

This patch should be backported as far as 2.0. Note that on the 2.0, the
legacy HTTP part is also concerned.

BUG/MINOR: mux-quic: complete flow-control for uni streams

Max stream data was not enforced and respect for local/remote uni
streams. Previously, qcs instances incorrectly reused the limit defined
from bidirectional ones.

This is now fixed. Two fields are added in qcc structure connection :
* value for local flow control to enforce on remote uni streams
* value for remote flow control to respect on local uni streams

These two values can be reused to properly initialized msd field of a
qcs instance in qcs_new(). The rest of the code is similar.

This must be backported up to 2.6.

MINOR: list: adding MT_LIST_APPEND_LOCKED macro

adding a new mt macro: MT_LIST_APPEND_LOCKED.

This macro may be used to append an item to an existing
list, like MT_LIST_APPEND.

But here the item will be forced into locked/busy state
prior to appending, so that it is already referenced
in the list while still preventing concurrent accesses
until we decide to unlock it.

The macro returns a struct mt_list "np", that is needed
at unlock time using regular MT_LIST_UNLOCK_ELT macro.

DOC/MINOR: list: fixing MT_LIST_LOCK_ELT macro documentation

MT_LIST_LOCK_ELT macro was documented with an ambiguous
usage restriction, implying that concurrent list deletion
was not supported.

But it seems that either the code has evolved, or the comment is
wrong because the locking behavior implemented here is exactly
the same one used in MT_LIST_DELETE, and no such restriction is
described for MT_LIST_DELETE.

I made some tests to make sure concurrent MT_LIST_DELETE (or deletion
from mt_list_for_each_entry_safe) don't cause unexepected results.

At the present time, this macro is not used, this fix only
targets upcoming developments that might rely on this.

No backport needed.

MINOR: list: fixing typo in MT_LIST_LOCK_ELT

A minor typo was made in MT_LIST_LOCK_ELT, preventing
haproxy from compiling if MT_LIST_LOCK_ELT is
used in the code.

Today, the macro is unused, and that's the reason why
the typo has remained unnoticed for such a long time.

Fixing it so it can be used in upcoming developments.

No backport required.

MINOR: mworker/cli: does no try to dump the startup-logs w/o USE_SHM_OPEN

When haproxy is compiled without USE_SHM_OPEN, does not try to dump the
startup-logs in the "reload" output, because it won't show anything
interesting.

CLEANUP: mworker/cli: rename the status function to loadstatus

clarify the name of the IO handler which show the reload status.

DOC: lua: add a note about compression w/ httpclient

Decompression is not supported by the httpclient.

BUILD: Makefile: add "USE_SHM_OPEN" on the linux-musl target

The startup-logs with the shm works correctly with Alpine and Musl,
enable the feature by default for the linux-musl target.