Willy Tarreau [Tue, 31 Aug 2021 06:51:02 +0000 (08:51 +0200)]
MEDIUM: vars: replace the global name index with a hash
The global table of known variables names can only grow and was designed
for static names that are registered at boot. Nowadays it's possible to
set dynamic variable names from Lua or from the CLI, which causes a real
problem that was partially addressed in 2.2 with commit 4e172c93f
("MEDIUM: lua: Add `ifexist` parameter to `set_var`"). Please see github
issue #624 for more context.
This patch simplifies all this by removing the need for a central
registry of known names, and storing 64-bit hashes instead. This is
highly sufficient given the low number of variables in each context.
The hash is calculated using XXH64() which is bijective over the 64-bit
space thus is guaranteed collision-free for 1..8 chars. Above that the
risk remains around 1/2^64 per extra 8 chars so in practice this is
highly sufficient for our usage. A random seed is used at boot to seed
the hash so that it's not attackable from Lua for example.
There's one particular nit though. The "ifexist" hack mentioned above
is now limited to variables of scope "proc" only, and will only match
variables that were already created or declared, but will now verify
the scope as well. This may affect some bogus Lua scripts and SPOE
agents which used to accidentally work because a similarly named
variable used to exist in a different scope. These ones may need to be
fixed to comply with the doc.
Now we can sum up the situation as this one:
- ephemeral variables (scopes sess, txn, req, res) will always be
usable, regardless of any prior declaration. This effectively
addresses the most problematic change from the commit above that
in order to work well could have required some script auditing ;
- process-wide variables (scope proc) that are mentioned in the
configuration, referenced in a "register-var-names" SPOE directive,
or created via "set-var" in the global section or the CLI, are
permanent and will always accept to be set, with or without the
"ifexist" restriction (SPOE uses this internally as well).
- process-wide variables (scope proc) that are only created via a
set-var() tcp/http action, via Lua's set_var() calls, or via an
SPOE with the "force-set-var" directive), will not be permanent
but will always accept to be replaced once they are created, even
if "ifexist" is present
- process-wide variables (scope proc) that do not exist will only
support being created via the set-var() tcp/http action, Lua's
set_var() calls without "ifexist", or an SPOE declared with
"force-set-var".
This means that non-proc variables do not care about "ifexist" nor
prior declaration, and that using "ifexist" should most often be
reliable in Lua and that SPOE should most often work without any
prior declaration. It may be doable to turn "ifexist" to 1 by default
in Lua to further ease the transition. Note: regtests were adjusted.
Willy Tarreau [Tue, 31 Aug 2021 06:48:55 +0000 (08:48 +0200)]
MINOR: vars: preset a random seed to hash variables names
Variables names will be hashed, but for this we need a random seed.
The XXH3() algorithms is bijective over the whole 64-bit space, which
is great as it guarantees no collision for 1..8 byte names. But above
that even if the risk is extremely faint, it theoretically exists and
since variables may be set from Lua we'd rather do our best to limit
the risk of controlled collision, hence the random seed.
MEDIUM: vars: pre-create parsed SCOPE_PROC variables as permanent ones
All variables whose names are parsed by the config parser, the
command-line parser or the SPOE's register-var-names parser are
now preset as permanent. This will guarantee that these variables
will exist through out all the process' life, and that it will be
possible to implement the "ifexist" feature by looking them up.
This was marked medium because pre-setting a variable with an empty
value may always have side effects, even though none was spotted at
this stage.
MEDIUM: vars: make var_clear() only reset VF_PERMANENT variables
We certainly do not want that a permanent variable (one that is listed
in the configuration) be erased by accident by an "unset-var" action.
Let's make sure these ones are only reset to an empty sample, like at
the moment of their initial registration. One trick is that the same
function is used to purge the memory at the end and to delete, so we
need to add an extra "force" argument to make the choice.
MINOR: vars: store flags into variables and add VF_PERMANENT
In order to continue to honor the ifexist Lua option and prevent rogue
SPOA agents from creating too many variables, we'll need to keep the
ability to mark certain proc.* variables as permanent when they're
known from the config file.
Let's add a flag there for this. It's added to the variable when the
variable is created with this flag set by the caller.
Another approach could have been to use a distinct list or distinct
scope but that sounds complicated and bug-prone.
MINOR: vars: support storing empty sample data with a variable
Storing an unset sample (SMP_T_ANY == 0) will be used to only reserve
the variable's space but associate no value. We need to slightly adjust
var_to_smp() for this so that it considers a value-less variable as non
existent and falls back to the default value.
MINOR: vars: add a VF_CREATEONLY flag for creation
Passing this flag to var_set() will result in the variable to only be
created if it did not exist, otherwise nothing is done (it's not even
updated). This will be used for pre-registering names.
MEDIUM: vars: make the ifexist variant of set-var only apply to the proc scope
When setting variables, there are currently two variants, one which will
always create the variable, and another one, "ifexist", which will only
create or update a variable if a similarly named variable in any scope
already existed before.
The goal was to limit the risk of injecting random names in the proc
scope, but it was achieved by making use of the somewhat limited name
indexing model, which explains the scope-agnostic restriction.
With this change, we're moving the check downwards in the chain, at the
variable level, and only variables under the scope "proc" will be subject
to the restriction. A new set of VF_* flags was added to adjust how
variables are set, and VF_UPDATEONLY is used to mention this restriction.
In this exact state of affairs, this is not completely exact, as if a
similar name was not known in any scope, the variable will continue to
be rejected like before, but this will change soon.
REORG: vars: remerge sample_store{,_stream}() into var_set()
The names for these two functions are totally misleading, they have
nothing to do with samples, they're purely dedicated to variables. The
former is only used by the second one and makes no sense by itself, so
it cannot even get a meaningful name. Let's remerge them into a single
one called "var_set()" which, as its name tries to imply, sets a variable
to a given value.
CLEANUP: vars: rename sample_clear_stream() to var_unset()
This name was quite misleading, as it has nothing to do with samples nor
streams. This function's sole purpose is to unset a variable, so let's
call it "var_unset()" and document it a little bit.
Willy Tarreau [Tue, 31 Aug 2021 06:13:25 +0000 (08:13 +0200)]
MINOR: vars: rename vars_init() to vars_init_head()
The vars_init() name is particularly confusing as it does not initialize
the variables code but the head of a list of variables passed in
arguments. And we'll soon need to have proper initialization code, so
let's rename it now.
MINOR: proxy: add a global "grace" directive to postpone soft-stop
In ticket #1348 some users expressed some concerns regarding the removal
of the "grace" directive from the proxies. Their use case very closely
mimmicks the original intent of the grace keyword, which is, let haproxy
accept traffic for some time when stopping, while indicating an external
LB that it's stopping.
This is implemented here by starting a task whose expiration triggers
the soft-stop for real. The global "stopping" variable is immediately
set however. For example, this below will be sufficient to instantly
notify an external check on port 9999 that the service is going down,
while other services remain active for 10s:
At first glance, channel_is_empty() was used on purpose in si_update_rx(),
because of the HTX ("b3e0de46c" MEDIUM: stream-int: Rely only on
SI_FL_WAIT_ROOM to stop data receipt). It is not pretty clear for now why
channel_may_recv() sould not be used here but this change introduce a
possible infinite loop with the stats applet. So, it is safer to revert the
patch, waiting for a better understanding of the probelm.
This means the abortonclose option will be broken again on the 2.3 and lower
versions.
This patch should fix the issue #1360. It must be backported as far as 2.0.
Willy Tarreau [Thu, 26 Aug 2021 14:23:37 +0000 (16:23 +0200)]
BUG/MAJOR: htx: fix missing header name length check in htx_add_header/trailer
Ori Hollander of JFrog Security reported that htx_add_header() and
htx_add_trailer() were missing a length check on the header name. While
this does not allow to overwrite any memory area, it results in bits of
the header name length to slip into the header value length and may
result in forging certain header names on the input. The sad thing here
is that a FIXME comment was present suggesting to add the required length
checks :-(
The injected headers are visible to the HTTP internals and to the config
rules, so haproxy will generally stay synchronized with the server. But
there is one exception which is the content-length header field, because
it is already deduplicated on the input, but before being indexed. As
such, injecting a content-length header after the deduplication stage
may be abused to present a different, shorter one on the other side and
help build a request smuggling attack, or even maybe a response splitting
attack. CVE-2021-40346 was assigned to this problem.
As a mitigation measure, it is sufficient to verify that no more than
one such header is present in any message, which is normally the case
thanks to the duplicate checks:
http-request deny if { req.hdr_cnt(content-length) gt 1 }
http-response deny if { res.hdr_cnt(content-length) gt 1 }
This must be backported to all HTX-enabled versions, hence as far as 2.0.
In 2.3 and earlier, the functions are in src/htx.c instead.
Many thanks to Ori for his work and his responsible report!
Willy Tarreau [Thu, 26 Aug 2021 14:07:22 +0000 (16:07 +0200)]
CLEANUP: htx: remove comments about "must be < 256 MB"
Since commit "BUG/MINOR: config: reject configs using HTTP with bufsize
>= 256 MB" we are now sure that it's not possible anymore to have an HTX
block of a size 256 MB or more, even after concatenation thanks to the
tests for len >= htx_free_data_space(). Let's remove these now obsolete
comments.
A BUG_ON() was added in htx_add_blk() to track any such exception if
the conditions would change later, to complete the one that is performed
on the start address that must remain within the buffer.
Willy Tarreau [Thu, 26 Aug 2021 13:59:44 +0000 (15:59 +0200)]
BUG/MINOR: config: reject configs using HTTP with bufsize >= 256 MB
As seen in commit 5ef965606 ("BUG/MINOR: lua: use strlcpy2() not
strncpy() to copy sample keywords"), configs with large values of
tune.bufsize were not practically usable since Lua was introduced,
regardless of the machine's available memory.
In addition, HTX encoding already limits block sizes to 256 MB, thus
it is not technically possible to use that large a buffer size when
HTTP is in use. This is absurdly high anyway, and for example Lua
initialization would take around one minute on a 4 GHz CPU. Better
prevent such a config from starting than having to deal with bug
reports that make no sense.
The check is only enforced if at least one HTX proxy was found, as
there is no techincal reason to block it for configs that are solely
based on raw TCP, and it could still be imagined that some such might
exist with single connections (e.g. a log forwarder that buffers to
cover for the storage I/O latencies).
This should be backported to all HTX-enabled versions (2.0 and above).
Released version 2.5-dev6 with the following main changes :
- BUG/MINOR threads: Use get_(local|gm)time instead of (local|gm)time
- BUG/MINOR: tools: Fix loop condition in dump_text()
- BUILD: ssl: next round of build warnings on LIBRESSL_VERSION_NUMBER
- BUILD: ssl: fix two remaining occurrences of #if USE_OPENSSL
- BUILD: tools: properly guard __GLIBC__ with defined()
- BUILD: globally enable -Wundef
- MINOR: log: Remove log-error-via-logformat option
- MINOR: log: Add new "error-log-format" option
- BUG/MAJOR: queue: better protect a pendconn being picked from the proxy
- CLEANUP: Add missing include guard to signal.h
- MINOR: ssl: Add new ssl_bc_hsk_err sample fetch
- MINOR: connection: Add a connection error code sample fetch for backend side
- REGTESTS: ssl: Add tests for bc_conn_err and ssl_bc_hsk_err sample fetches
- MINOR: http-rules: add a new "ignore-empty" option to redirects.
- CI: Github Actions: temporarily disable BoringSSL builds
- BUG/MINOR: vars: fix set-var/unset-var exclusivity in the keyword parser
- BUG/MINOR: vars: improve accuracy of the rules used to check expression validity
- MINOR: sample: add missing ARGC_ entries
- BUG/MINOR: vars: properly set the argument parsing context in the expression
- DOC: configuration: remove wrong tcp-request examples in tcp-response
- MEDIUM: vars: add a new "set-var-fmt" action
- BUG/MEDIUM: vars: run over the correct list in release_store_rules()
- BUG/MINOR: vars: truncate the variable name in error reports about scope.
- BUG/MINOR: vars: do not talk about global section in CLI errors for set-var
- CLEANUP: vars: name the temporary proxy "CFG" instead of "CLI" for global vars
- MINOR: log: make log-format expressions completely usable outside of req/resp
- MINOR: vars: add a "set-var-fmt" directive to the global section
- MEDIUM: vars: also support format strings in CLI's "set var" command
- CLEANUP: vars: factor out common code from vars_get_by_{desc,name}
- MINOR: vars: make vars_get_by_* support an optional default value
- MINOR: vars: make the vars() sample fetch function support a default value
- BUILD: ot: add argument for default value to vars_get_by_name()
BUILD: ot: add argument for default value to vars_get_by_name()
The API was extended by commit e352b9dac ("MINOR: vars: make vars_get_by_*
support an optional default value") but I didn't notice that opentracing
was using it, so it broke the build. No backport needed.
The set-var() rules are almost always duplicated when manipulating
integers or any other value that is mandatory along operations. This is
a problem because it makes the configurations complicated to maintain
and slower than needed. And it becomes even more complicated when several
conditions may set the same variable because the risk of forgetting to
initialize it or to accidentally reset it is high.
This patch extends the var() sample fetch function to take an optional
argument which contains a default value to be returned if the variable
was not set. This way it becomes much simpler to use the variable, just
set it where needed, and read it with a fall back to the default value:
The default value is always passed as a string, thus it will experience
a cast to the output type. It doesn't seem userful to complicate the
configuration to pass an explicit type at this point.
MINOR: vars: make vars_get_by_* support an optional default value
In preparation for support default values when fetching variables, we
need to update the internal API to pass an extra argument to functions
vars_get_by_{name,desc} to provide an optional default value. This
patch does this and always passes NULL in this argument. var_to_smp()
was extended to fall back to this value when available.
CLEANUP: vars: factor out common code from vars_get_by_{desc,name}
The two functions vars_get_by_name() and vars_get_by_scope() perform
almost the same operations except that they differ from the way the
name and scope are retrieved. The second part in common is more
complex and involves locking, so better factor this one out into a
new function.
MEDIUM: vars: also support format strings in CLI's "set var" command
Most often "set var" on the CLI is used to set a string, and using only
expressions is not always convenient, particularly when trying to
concatenate variables sur as host names and paths.
Now the "set var" command supports an optional keyword before the value
to indicate its type. "expr" takes an expression just like before this
patch, and "fmt" a format string, making it work like the "set-var-fmt"
actions.
The VTC was updated to include a test on the format string.
MINOR: vars: add a "set-var-fmt" directive to the global section
Just like the set-var-fmt action for tcp/http rules, the set-var-fmt
directive in global sections allows to pre-set process-wide variables
using a format string instead of a sample expression. This is often
more convenient when it is required to concatenate multiple fields,
or when emitting just one word.
MINOR: log: make log-format expressions completely usable outside of req/resp
The log-format strings are usable at plenty of places, but the expressions
using %[] were restricted to request or response context and nothing else.
This prevents from using them from the config context or the CLI, let's
relax this.
CLEANUP: vars: name the temporary proxy "CFG" instead of "CLI" for global vars
We're using a dummy temporary proxy when creating global variables in
the configuration file, it was copied from the CLI's code and was
mistakenly called "CLI", better name it "CFG". It should not appear
anywhere except maybe when debugging cores.
BUG/MINOR: vars: do not talk about global section in CLI errors for set-var
When attempting to set a variable does not start with the "proc" scope on
the CLI, we used to emit "only proc is permitted in the global section"
which obviously is a leftover from the initial code.
BUG/MINOR: vars: truncate the variable name in error reports about scope.
When a variable starts with the wrong scope, it is named without stripping
the extra characters that follow it, which usually are closing parenthesis.
Let's make sure we only report what is expected.
BUG/MEDIUM: vars: run over the correct list in release_store_rules()
In commit 9a621ae76 ("MEDIUM: vars: add a new "set-var-fmt" action")
we introduced the support for format strings in variables with the
ability to release them on exit, except that it's the wrong list that
was being scanned for the rule (http vs vars), resulting in random
crashes during deinit.
This was a recent commit in 2.5-dev, no backport is needed.
The set-var() action is convenient because it preserves the input type
but it's a pain to deal with when trying to concatenate values. The
most recurring example is when it's needed to build a variable composed
of the source address and the source port. Usually it ends up like this:
This is even worse when trying to aggregate multiple fields from stick-table
data for example. Due to this a lot of users instead abuse headers from HTTP
rules:
But this requires some careful cleanups to make sure they won't leak, and
it's significantly more expensive to deal with. And generally speaking it's
not clean. Plus it must be performed for each and every request, which is
expensive for this common case of ip+port that doesn't change for the whole
session.
This patch addresses this limitation by implementing a new "set-var-fmt"
action which performs the same work as "set-var" but takes a format string
in argument instead of an expression. This way it becomes pretty simple to
just write:
It is usable in all rulesets that already support the "set-var" action.
It is not yet implemented for the global "set-var" directive (which already
takes a string) and the CLI's "set var" command, which would definitely
benefit from it but currently uses its own parser and engine, thus it
must be reworked.
DOC: configuration: remove wrong tcp-request examples in tcp-response
There is a massive abuse of copy-paste in the doc that is visible in
the examples and arguments declaration. Let's at least remove irrelevant
examples for now.
BUG/MINOR: vars: properly set the argument parsing context in the expression
When the expression called in "set-var" uses argments that require late
resolution, the context must be set. At the moment, any unknown argument
is misleadingly reported as "ACL":
frontend f
bind :8080
mode http
http-request set-var(proc.a) be_conn(foo)
parsing [b1.cfg:4]: unable to find backend 'foo' referenced in arg 1 \
of ACL keyword 'be_conn' in proxy 'f'.
Once the context is properly set, it now says the truth:
parsing [b1.cfg:8]: unable to find backend 'foo' referenced in arg 1 \
of sample fetch keyword 'be_conn' in http-request expression in proxy 'f'.
This may be backported but is not really important. If so, the preceeding
patches "BUG/MINOR: vars: improve accuracy of the rules used to check
expression validity" and "MINOR: sample: add missing ARGC_ entries" must
be backported as well.
For a long time we couldn't have arguments in expressions used in
tcp-request, tcp-response etc rules. But now due to the variables
it's possible, and their context in case of failure to resolve an
argument (e.g. backend name not found) is not properly reported
because there is no arg context values in ARGC_* to report them.
Let's add a number of missing ones for tcp-request {connection,
session,content}, tcp-response content, tcp-check, the config
parser (for "set-var" in the global section) and the CLI parser
(for "set-var" on the CLI).
BUG/MINOR: vars: improve accuracy of the rules used to check expression validity
The set-var() expression naturally checks whether expressions are valid
in the context of the rule, but it fails to differentiate frontends from
backends. As such for tcp-content and http-request rules, it will only
accept frontend-compatible sample-fetches, excluding those declared with
SMP_UES_BKEND (a few such as be_id, be_name). For the response it accepts
the backend-compatible expressions only, though it seems that there are
no sample-fetch function that are valid only in the frontend's content,
so that should not cause any problem.
Note that while allowing valid configs to be used, the fix might also
uncover some incorrect configurations where some expressions currently
return nothing (e.g. something depending on frontend declared in a
backend), and which could be rejected, but there does not seem to be
any such keyword. Thus while it should be backported, better not backport
it too far (2.4 and possibly 2.3 only).
BUG/MINOR: vars: fix set-var/unset-var exclusivity in the keyword parser
The parser checks first for "set-var" then "unset-var" from the updated
offset instead of testing it only when the other one fails, so it
validates this rule as "unset-var":
http-request set-varunset-var(proc.a)
This should be backported everywhere relevant, though it's mostly harmless
as it's unlikely that some users are purposely writing this in their conf!
A recent update to BoringSSL broke the build again, and given that
it's not used except for QUIC development, let's temporarily disable
it until the issue is analysed and fixed.
MINOR: http-rules: add a new "ignore-empty" option to redirects.
Sometimes it is convenient to remap large sets of URIs to new ones (e.g.
after a site migration for example). This can be achieved using
"http-request redirect" combined with maps, but one difficulty there is
that non-matching entries will return an empty response. In order to
avoid this, duplicating the operation as an ACL condition ending in
"-m found" is possible but it becomes complex and error-prone while it's
known that an empty URL is not valid in a location header.
This patch addresses this by improving the redirect rules to be able to
simply ignore the rule and skip to the next one if the result of the
evaluation of the "location" expression is empty. However in order not
to break existing setups, it requires a new "ignore-empty" keyword.
There used to be an ACT_FLAG_FINAL on redirect rules that's used during
the parsing to emit a warning if followed by another rule, so here we
only set it if the option is not there. The http_apply_redirect_rule()
function now returns a 3rd value to mention that it did nothing and
that this was not an error, so that callers can just ignore the rule.
The regular "redirect" rules were not modified however since this does
not apply there.
The map_redirect VTC was completed with such a test and updated to 2.5
and an example was added into the documentation.
REGTESTS: ssl: Add tests for bc_conn_err and ssl_bc_hsk_err sample fetches
Those fetches are used to identify connection errors and SSL handshake
errors on the backend side of a connection. They can for instance be
used in a log-format line as in the regtest.
MINOR: connection: Add a connection error code sample fetch for backend side
The bc_conn_err and bc_conn_err_str sample fetches give the status of
the connection on the backend side. The error codes and error messages
are the same than the ones that can be raised by the fc_conn_err fetch.
This new sample fetch along the ssl_bc_hsk_err_str fetch contain the
last SSL error of the error stack that occurred during the SSL
handshake (from the backend's perspective).
Willy Tarreau [Tue, 31 Aug 2021 15:21:39 +0000 (17:21 +0200)]
BUG/MAJOR: queue: better protect a pendconn being picked from the proxy
The locking in the dequeuing process was significantly improved by commit 49667c14b ("MEDIUM: queue: take the proxy lock only during the px queue
accesses") in that it tries hard to limit the time during which the
proxy's queue lock is held to the strict minimum. Unfortunately it's not
enough anymore, because we take up the task and manipulate a few pendconn
elements after releasing the proxy's lock (while we're under the server's
lock) but the task will not necessarily hold the server lock since it may
not have successfully found one (e.g. timeout in the backend queue). As
such, stream_free() calling pendconn_free() may release the pendconn
immediately after the proxy's lock is released while the other thread
currently proceeding with the dequeuing tries to wake up the owner's
task and dies in task_wakeup().
One solution consists in releasing le proxy's lock later. But tests have
shown that we'd have to sacrifice a significant share of the performance
gained with the patch above (roughly a 20% loss).
This patch takes another approach. It adds a "del_lock" to each pendconn
struct, that allows to keep it referenced while the proxy's lock is being
released. It's mostly a serialization lock like a refcount, just to maintain
the pendconn alive till the task_wakeup() call is complete. This way we can
continue to release the proxy's lock early while keeping this one. It had
to be added to the few points where we're about to free a pendconn, namely
in pendconn_dequeue() and pendconn_unlink(). This way we continue to
release the proxy's lock very early and there is no performance degradation.
This lock may only be held under the queue's lock to prevent lock
inversion.
No backport is needed since the patch above was merged in 2.5-dev only.
This option can be used to define a specific log format that will be
used in case of error, timeout, connection failure on a frontend... It
will be used for any log line concerned by the log-separate-errors
option. It will also replace the format of specific error messages
decribed in section 8.2.6.
If no "error-log-format" is defined, the legacy error messages are still
emitted and the other error logs keep using the regular log-format.
This option will be replaced by a "error-log-format" that enables to use
a dedicated log-format for connection error messages instead of the
regular log-format (in which most of the fields would be invalid in such
a case).
The "log-error-via-logformat" mechanism will then be replaced by a test
on the presence of such an error log format or not. If a format is
defined, it is used for connection error messages, otherwise the legacy
error log format is used.
Willy Tarreau [Mon, 30 Aug 2021 04:02:47 +0000 (06:02 +0200)]
BUILD: globally enable -Wundef
As seen in issue #1369, supporting #if with unknown macros can silently
hide typos that may result in suboptimal code paths to be used, or even
possibly bugs. It looks like our code base does not rely that much on
this, so it's worth enabling -Wundef to catch future ones and have them
turned to more explicit "#if defined()" or #ifdef.
Willy Tarreau [Mon, 30 Aug 2021 07:35:18 +0000 (09:35 +0200)]
BUILD: ssl: fix two remaining occurrences of #if USE_OPENSSL
One was in backend.c and the other one in hlua.c. No other candidate
was found with "git grep '^#if\s*USE'". It's worth noting that 3
other such tests exist for SSL_OP_NO_{SSLv3,TLSv1_1,TLSv1_2} but
that these ones are properly set to 0 in openssl-compat.h when not
defined.
Willy Tarreau [Mon, 30 Aug 2021 04:10:04 +0000 (06:10 +0200)]
BUILD: ssl: next round of build warnings on LIBRESSL_VERSION_NUMBER
Other build warnings were emitted on LIBRESSL_VERSION_NUMBER with -Wundef
under openssl < 1.1. Related to GH issue #1369. Seems like some of them
could be simplified a little bit.
Tim Duesterhus [Sat, 28 Aug 2021 22:58:22 +0000 (00:58 +0200)]
BUG/MINOR: tools: Fix loop condition in dump_text()
The condition should first check whether `bsize` is reached, before
dereferencing the offset. Even if this always works fine, due to the
string being null-terminated, this certainly looks odd.
Tim Duesterhus [Sat, 28 Aug 2021 21:57:01 +0000 (23:57 +0200)]
BUG/MINOR threads: Use get_(local|gm)time instead of (local|gm)time
Using localtime / gmtime is not thread-safe, whereas the `get_*` wrappers are.
Found using GitHub's CodeQL scan.
The use in sample_conv_ltime() can be traced back to at least fac9ccfb705702f211f99e67d5f5d5129002086a (first appearing in 1.6-dev3), so all
supported branches with thread support are affected.
Willy Tarreau [Sat, 28 Aug 2021 11:46:11 +0000 (13:46 +0200)]
[RELEASE] Released version 2.5-dev5
Released version 2.5-dev5 with the following main changes :
- MINOR: httpclient: initialize the proxy
- MINOR: httpclient: implement a simple HTTP Client API
- MINOR: httpclient/cli: implement a simple client over the CLI
- MINOR: httpclient/cli: change the User-Agent to "HAProxy"
- MEDIUM: ssl: Keep a reference to the client's certificate for use in logs
- BUG/MEDIUM: h2: match absolute-path not path-absolute for :path
- BUILD/MINOR: ssl: Fix compilation with OpenSSL 1.0.2
- MINOR: server: check if srv is NULL in free_server()
- MINOR: proxy: check if p is NULL in free_proxy()
- BUG/MEDIUM: cfgparse: do not allocate IDs to automatic internal proxies
- BUG/MINOR: http_client: make sure to preset the proxy's default settings
- REGTESTS: http_upgrade: fix incorrect expectation on TCP->H1->H2
- REGTESTS: abortonclose: after retries, 503 is expected, not close
- REGTESTS: server: fix agent-check syntax and expectation
- BUG/MINOR: httpclient: fix uninitialized sl variable
- BUG/MINOR: httpclient/cli: change the appctx test in the callbacks
- BUG/MINOR: httpclient: check if hdr_num is not 0
- MINOR: httpclient: cleanup the include files
- MINOR: hlua: take the global Lua lock inside a global function
- MINOR: tools: add FreeBSD support to get_exec_path()
- BUG/MINOR: systemd: ExecStartPre must use -Ws
- MINOR: systemd: remove the ExecStartPre line in the unit file
- MINOR: ssl: add an openssl version string parser
- MINOR: cfgcond: implements openssl_version_atleast and openssl_version_before
- CLEANUP: ssl: remove useless check on p in openssl_version_parser()
- BUG/MINOR: stick-table: fix the sc-set-gpt* parser when using expressions
- BUG/MINOR: httpclient: remove deinit of the httpclient
- BUG/MEDIUM: base64: check output boundaries within base64{dec,urldec}
- MINOR: httpclient: set verify none on the https server
- MINOR: httpclient: add the server to the proxy
- BUG/MINOR: httpclient: fix Host header
- BUILD: httpclient: fix build without OpenSSL
- CI: github-actions: remove obsolete options
- CLEANUP: assorted typo fixes in the code and comments
- MINOR: proc: setting the process to produce a core dump on FreeBSD.
- BUILD: adopt script/build-ssl.sh for OpenSSL-3.0.0beta2
- MINOR: server: return the next srv instance on free_server
- BUG/MINOR: stats: use refcount to protect dynamic server on dump
- MEDIUM: server: extend refcount for all servers
- MINOR: server: define non purgeable server flag
- MINOR: server: mark referenced servers as non purgeable
- MINOR: server: mark servers referenced by LUA script as non purgeable
- MEDIUM: server: allow to remove servers at runtime except non purgeable
- BUG/MINOR: base64: base64urldec() ignores padding in output size check
- REGTEST: add missing lua requirements on server removal test
- REGTEST: fix haproxy required version for server removal test
- BUG/MINOR: proxy: don't dump servers of internal proxies
- REGTESTS: Use `feature cmd` for 2.5+ tests
- REGTESTS: Remove REQUIRE_VERSION=1.5 from all tests
- BUG/MINOR: resolvers: mark servers with name-resolution as non purgeable
- MINOR: compiler: implement an ONLY_ONCE() macro
- BUG/MINOR: lua: use strlcpy2() not strncpy() to copy sample keywords
- MEDIUM: ssl: Capture more info from Client Hello
- MINOR: sample: Expose SSL captures using new fetchers
- MINOR: sample: Add be2dec converter
- MINOR: sample: Add be2hex converter
- MEDIUM: config: Deprecate tune.ssl.capture-cipherlist-size
- BUG/MINOR: time: fix idle time computation for long sleeps
- MINOR: time: add report_idle() to report process-wide idle time
- BUG/MINOR: ebtree: remove dependency on incorrect macro for bits per long
- BUILD: activity: use #ifdef not #if on USE_MEMORY_PROFILING
- BUILD/MINOR: defaults: eliminate warning on MAXHOSTNAMELEN with -Wundef
- BUILD/MINOR: ssl: avoid a build warning on LIBRESSL_VERSION with -Wundef
- IMPORT: slz: silence a build warning with -Wundef
- BUILD/MINOR: regex: avoid a build warning on USE_PCRE2 with -Wundef
Willy Tarreau [Sat, 28 Aug 2021 10:06:51 +0000 (12:06 +0200)]
BUILD/MINOR: ssl: avoid a build warning on LIBRESSL_VERSION with -Wundef
Openssl-compat emits a warning for the test on LIBRESSL_VERSION that might
be underfined, if built with -Wundef. The fix is easy, let's do it. Related
to GH issue #1369.
Willy Tarreau [Sat, 28 Aug 2021 10:05:32 +0000 (12:05 +0200)]
BUILD/MINOR: defaults: eliminate warning on MAXHOSTNAMELEN with -Wundef
As reported in GH issue #1369, there is a single case of #if with a
possibly undefined value in defaults.h which is on MAXHOSTNAMELEN. Let's
turn it to a #ifdef.
Willy Tarreau [Sat, 28 Aug 2021 09:55:53 +0000 (11:55 +0200)]
BUG/MINOR: ebtree: remove dependency on incorrect macro for bits per long
The code used to rely on BITS_PER_LONG to decide on the most efficient
way to perform a 64-bit shift, but this macro is not defined (at best
it's __BITS_PER_LONG) and it's likely that it's been like this since
the early implementation of ebtrees designed on i386. Let's remove the
test on this macro and rely on sizeof(long) instead, it also has the
benefit of letting the compiler validate the two branches.
This can be backported to all versions. Thanks to Ezequiel Garcia for
reporting this one in issue #1369.
Willy Tarreau [Sat, 28 Aug 2021 09:07:31 +0000 (11:07 +0200)]
MINOR: time: add report_idle() to report process-wide idle time
Before threads were introduced in 1.8, idle_pct used to be a global
variable indicating the overall process idle time. Threads made it
thread-local, meaning that its reporting in the stats made little
sense, though this was not easy to spot. In 2.0, the idle_pct variable
moved to the struct thread_info via commit 81036f273 ("MINOR: time:
move the cpu, mono, and idle time to thread_info"). It made it more
obvious that the idle_pct was per thread, and also allowed to more
accurately measure it. But no more effort was made in that direction.
This patch introduces a new report_idle() function that accurately
averages the per-thread idle time over all running threads (i.e. it
should remain valid even if some threads are paused or stopped), and
makes use of it in the stats / "show info" reports.
Sending traffic over only two connections of an 8-thread process
would previously show this erratic CPU usage pattern:
This is not technically a bug but this lack of precision definitely affects
some users who rely on the idle_pct measurement. This should at least be
backported to 2.4, and might be to some older releases depending on users
demand.
Willy Tarreau [Fri, 27 Aug 2021 21:16:58 +0000 (23:16 +0200)]
BUG/MINOR: time: fix idle time computation for long sleeps
In 2.4 we extended the max poll time from 1s to 60s with commit 4f59d3861 ("MINOR: time: increase the minimum wakeup interval to 60s").
This had the consequence that the calculation of the idle time percentage
may overflow during the multiply by 100 if the thread had slept 43s or
more. Let's change this to a 64 bit computation. This will have no
performance impact since this is done at most twice per second.
Marcin Deranek [Tue, 13 Jul 2021 12:05:24 +0000 (14:05 +0200)]
MINOR: sample: Add be2dec converter
Add be2dec converter which allows to build JA3 compatible TLS
fingerprints by converting big-endian binary data into string
separated unsigned integers eg.
Marcin Deranek [Tue, 13 Jul 2021 13:14:21 +0000 (15:14 +0200)]
MINOR: sample: Expose SSL captures using new fetchers
To be able to provide JA3 compatible TLS Fingerprints we need to expose
all Client Hello captured data using fetchers. Patch provides new
and modifies existing fetchers to add ability to filter out GREASE values:
- ssl_fc_cipherlist_*
- ssl_fc_ecformats_bin
- ssl_fc_eclist_bin
- ssl_fc_extlist_bin
- ssl_fc_protocol_hello_id
Marcin Deranek [Mon, 12 Jul 2021 12:16:55 +0000 (14:16 +0200)]
MEDIUM: ssl: Capture more info from Client Hello
When we set tune.ssl.capture-cipherlist-size to a non-zero value
we are able to capture cipherlist supported by the client. To be able to
provide JA3 compatible TLS fingerprinting we need to capture more
information from Client Hello message:
- SSL Version
- SSL Extensions
- Elliptic Curves
- Elliptic Curve Point Formats
This patch allows HAProxy to capture such information and store it for
later use.
Willy Tarreau [Thu, 26 Aug 2021 14:48:53 +0000 (16:48 +0200)]
BUG/MINOR: lua: use strlcpy2() not strncpy() to copy sample keywords
The lua initialization code which creates the Lua mapping of all converters
and sample fetch keywords makes use of strncpy(), and as such can take ages
to start with large values of tune.bufsize because it spends its time zeroing
gigabytes of memory for nothing. A test performed with an extreme value of
16 MB takes roughly 4 seconds, so it's possible that some users with huge
1 MB buffers (e.g. for payload analysis) notice a small startup latency.
However this does not affect config checks since the Lua stack is not yet
started. Let's replace this with strlcpy2().
This should be backported to all supported versions.
Willy Tarreau [Thu, 26 Aug 2021 13:46:51 +0000 (15:46 +0200)]
MINOR: compiler: implement an ONLY_ONCE() macro
There are regularly places, especially in config analysis, where we
need to report certain things (warnings or errors) only once, but
where implementing a counter is sufficiently deterrent so that it's
not done.
Let's add a simple ONLY_ONCE() macro that implements a static variable
(char) which is atomically turned on, and returns true if it's set for
the first time. This uses fairly compact code, a single byte of BSS
and is thread-safe. There are probably a number of places in the config
parser where this could be used. It may also be used to implement a
WARN_ON() similar to BUG_ON() but which would only warn once.
Amaury Denoyelle [Thu, 26 Aug 2021 13:35:59 +0000 (15:35 +0200)]
BUG/MINOR: resolvers: mark servers with name-resolution as non purgeable
When a server is configured with name-resolution, resolvers objects are
created with reference to this server. Thus the server is marked as non
purgeable to prevent its removal at runtime.
Amaury Denoyelle [Wed, 25 Aug 2021 14:24:23 +0000 (16:24 +0200)]
REGTEST: add missing lua requirements on server removal test
The test that removes server via CLI is using LUA to check that servers
referenced in a LUA script cannot be removed. This requires LUA support
to be built in haproxy.
Split the test and create a new one containing only the LUA relevant
test. Mark it as LUA dependant.
Dragan Dosen [Wed, 25 Aug 2021 09:57:01 +0000 (11:57 +0200)]
BUG/MINOR: base64: base64urldec() ignores padding in output size check
Without this fix, the decode function would proceed even when the output
buffer is not large enough, because the padding was not considered. For
example, it would not fail with the input length of 23 and the output
buffer size of 15, even the actual decoded output size is 17.
This patch should be backported to all stable branches that have a
base64urldec() function available.
Amaury Denoyelle [Mon, 23 Aug 2021 12:10:51 +0000 (14:10 +0200)]
MEDIUM: server: allow to remove servers at runtime except non purgeable
Relax the condition on "delete server" CLI handler to be able to remove
all servers, even non dynamic, except if they are flagged as non
purgeable.
This change is necessary to extend the use cases for dynamic servers
with reload. It's expected that each dynamic server created via the CLI
is manually commited in the haproxy configuration by the user. Dynamic
servers will be present on reload only if they are present in the
configuration file. This means that non-dynamic servers must be allowed
to be removable at runtime.
The dynamic servers removal reg-test has been updated and renamed to
reflect its purpose. A new test is present to check that non-purgeable
servers cannot be removed.
Amaury Denoyelle [Mon, 23 Aug 2021 12:05:07 +0000 (14:05 +0200)]
MINOR: server: mark referenced servers as non purgeable
Mark servers that are referenced by configuration elements as non
purgeable. This includes the following list :
- tracked servers
- servers referenced in a use-server rule
- servers referenced in a sample fetch
Amaury Denoyelle [Mon, 23 Aug 2021 12:03:20 +0000 (14:03 +0200)]
MINOR: server: define non purgeable server flag
Define a flag to mark a server as non purgeable. This flag will be used
for "delete server" CLI handler. All servers without this flag will be
eligible to runtime suppression.
Amaury Denoyelle [Wed, 25 Aug 2021 13:34:53 +0000 (15:34 +0200)]
MEDIUM: server: extend refcount for all servers
In a future patch, it will be possible to remove at runtime every
servers, both static and dynamic. This requires to extend the server
refcount for all instances.
First, refcount manipulation functions have been renamed to better
express the API usage.
* srv_refcount_use -> srv_take
The refcount is always initialize to 1 on the server creation in
new_server. It's also incremented for each check/agent configured on a
server instance.
* free_server -> srv_drop
This decrements the refcount and if null, the server is freed, so code
calling it must not use the server reference after it. As a bonus, this
function now returns the next server instance. This is useful when
calling on the server loop without having to save the next pointer
before each invocation.
In these functions, remove the checks that prevent refcount on
non-dynamic servers. Each reference to "dynamic" in variable/function
naming have been eliminated as well.
Amaury Denoyelle [Wed, 25 Aug 2021 12:39:53 +0000 (14:39 +0200)]
BUG/MINOR: stats: use refcount to protect dynamic server on dump
A dynamic server may be deleted at runtime at the same moment when the
stats applet is pointing to it. Use the server refcount to prevent
deletion in this case.
This should be backported up to 2.4, with an observability period of 2
weeks. Note that it requires the dynamic server refcounting feature
which has been implemented on 2.5; the following commits are required :
- MINOR: server: implement a refcount for dynamic servers
- BUG/MINOR: server: do not use refcount in free_server in stopping mode
- MINOR: server: return the next srv instance on free_server
Ilya Shipitsin [Sat, 21 Aug 2021 11:01:25 +0000 (16:01 +0500)]
BUILD: adopt script/build-ssl.sh for OpenSSL-3.0.0beta2
starting with
https://github.com/openssl/openssl/commit/74b7f339aa58af57c0e71b7efca66e6f2db5ae2e,
libs are installed to "lib64", to get back required behaviour, let us
set libdir explicitly
Willy Tarreau [Wed, 25 Aug 2021 03:10:29 +0000 (05:10 +0200)]
CI: github-actions: remove obsolete options
2.5-dev1 removed http-use-htx but the h2spec config was not updated
accordingly, causing failures. In addition, let's also remove the
unneeded "nbthread 4" which is either too much or not enough (it's
automatic nowadays), and remove "option httplog" which causes a
warning since there's no defined log destination.
THe http_update_update_host function takes an URL and extract the domain
to use as a host header. However it only update an existing host header
and does not create one.
This patch add an empty host header so the function can update it.
Add the raw and ssl server to the proxy list so they can be freed during
the deinit() of HAProxy. As a side effect the 2 servers need to have a
different ID so the SSL one was renamed "<HTTPSCLIENT>".
Dragan Dosen [Tue, 24 Aug 2021 07:48:04 +0000 (09:48 +0200)]
BUG/MEDIUM: base64: check output boundaries within base64{dec,urldec}
Ensure that no more than olen bytes is written to the output buffer,
otherwise we might experience an unexpected behavior.
While the original code used to validate that the output size was
always large enough before starting to write, this validation was
later broken by the commit below, allowing to 3-byte blocks to
areas whose size is not multiple of 3:
BUG/MINOR: base64: dec func ignores padding for output size checking
Decode function returns an error even if the ouptut buffer is
large enought because the padding was not considered. This
case was never met with current code base.
For base64urldec(), it's basically the same problem except that since
the input format supports arbitrary lengths, the problem has always
been there since its introduction in 2.4.
This should be backported to all stable branches having a backport of
the patch above (i.e. 2.0), with some adjustments depending on the
availability of the base64dec() and base64urldec().
BUG/MINOR: httpclient: remove deinit of the httpclient
The httpclient does a free of the servers and proxies it uses, however
since we are including them in the global proxy list, haproxy already
free them during the deinit. We can safely remove these free.
Willy Tarreau [Tue, 24 Aug 2021 12:57:28 +0000 (14:57 +0200)]
BUG/MINOR: stick-table: fix the sc-set-gpt* parser when using expressions
The sc-set-gpt0() parser was extended in 2.1 by commit 0d7712dff ("MINOR:
stick-table: allow sc-set-gpt0 to set value from an expression") to support
sample expressions in addition to plain integers. However there is a
subtlety there, which is that while the arg position must be incremented
when parsing an integer, it must not be touched when calling an expression
since the expression parser already does it.
The effect is that rules making use of sc-set-gpt0() followed by an
expression always ignore one word after that expression, and will typically
fail to parse if followed by an "if" as the parser will restart after the
"if". With no condition it's different because an empty condition doesn't
result in trying to parse anything.
This patch moves the increment at the right place and adds a few
explanations for a code part that was far from being obvious.
This should be backported to branches having the commit above (2.1+).
MINOR: systemd: remove the ExecStartPre line in the unit file
The ExecStartPre line was introduced a long time ago in the systemd unit
file, at the time of systemd wrapper. With the haproxy master worker
mode, this line is now useless, since starting haproxy itself will check
the configuration.
However this does not concern the check in the ExecReload which is still
needed to return a reload status to HAProxy.
Willy Tarreau [Fri, 20 Aug 2021 13:47:25 +0000 (15:47 +0200)]
MINOR: hlua: take the global Lua lock inside a global function
Some users are facing huge CPU usage or even watchdog panics due to
the Lua global lock when many threads compete on it, but they have
no way to see that in the usual dumps. We take the lock at 2 or 3
places only, thus it's trivial to move it to a global function so
that stack dumps will now explicitly show it, increasing the change
that it rings a bell and someone suggests switch to lua-load-per-thread: