Willy Tarreau [Sat, 8 May 2021 10:30:50 +0000 (12:30 +0200)]
REORG: mworker: move proc_self from global to mworker
Only mworker uses proc_self, and it was declared in global.h, forcing
users of global.h to include mworker and its dependencies.
Moving it to mworker reduces the preprocessed size of version.c from
170 to 125kB by shrinking the number of local includes from 30 to 16
and the number of system includes from 147 to 132.
Willy Tarreau [Sat, 8 May 2021 09:41:28 +0000 (11:41 +0200)]
REORG: vars: move the "proc" scope variables out of the global struct
The presence of this field causes a long dependency chain because almost
everyone includes global-t.h, and vars include sample_data which include
some system includes as well as HTTP parts.
There is absolutely no reason for having the process-wide variables in
the global struct, let's just move them into vars.c and vars.h. This
reduces from ~190k to ~170k the preprocessed output of version.c.
Willy Tarreau [Sat, 8 May 2021 09:06:32 +0000 (11:06 +0200)]
MINOR: config: mark tune.fd.edge-triggered as experimental
This one is stated as experimental in the doc but could still be used
by accidental copy-paste. Let's mark it with KWF_EXPERIMENTAL so that
users have to opt-in to use it.
Willy Tarreau [Sat, 8 May 2021 06:14:04 +0000 (08:14 +0200)]
MINOR: stats: make "show info" able to report rates as floats when asked
Now "show info float" will also report SSL rates, connection rates and
key reuse ratios as floats. This can be convenient at very low rates.
Note that the SSL reuse ratio which used to commonly oscillate between
0 and 1 under load is now more often above zero with small values. It
indicates that for better stability we shouldn't be comparing a key rate
with a connection rate but instead we should measure the reuse rate at
its source.
Willy Tarreau [Sat, 8 May 2021 05:40:52 +0000 (07:40 +0200)]
MINOR: stats: use tv_remain() to precisely compute the uptime
We'll have to support reporting sub-second uptimes, so let's use the
appropriate function which will automatically adjust the tv_usec field.
In addition to this, it will also report a more accurate uptime thanks
to considering the sub-second part in the result.
Willy Tarreau [Sat, 8 May 2021 05:54:24 +0000 (07:54 +0200)]
MINOR: stats: support an optional "float" option to "show info"
This will allow some fields to be produced with a higher accuracy when
the requester indicates being able to parse floats. Rates and times are
among the elements which can make sense.
Willy Tarreau [Sat, 8 May 2021 05:43:53 +0000 (07:43 +0200)]
MINOR: stats: pass the appctx flags to stats_fill_info()
Currently the stats filling function knows nothing about the caller's
needs, so let's pass the STAT_* flags so that it can adapt to the
requester's constraints.
Willy Tarreau [Sat, 8 May 2021 05:37:38 +0000 (07:37 +0200)]
MINOR: stats: add the HTML conversion for float types
For the prometheus exporter, a new float type was added for the fields
and its conversion was added everywhere except for the HTML output.
Now that we have F2H() we can implement it for consistency.
Willy Tarreau [Sat, 8 May 2021 08:38:20 +0000 (10:38 +0200)]
MINOR: stats: avoid excessive padding of float values with trailing zeroes
When emitting stats, we don't need to have 6 zeroes after the decimal point
for each value, so let's trim floating point numbers to the longest needed
only.
Willy Tarreau [Sat, 8 May 2021 06:12:37 +0000 (08:12 +0200)]
MINOR: freq_ctr: add new functions to report float measurements
For stats reporting it can be convenient to report floats at low rates
instead of discrete integers. We do have quite some precision since we
currently divide counters by number of milliseconds, so we can usually
add 3 digits after the decimal point.
Willy Tarreau [Sat, 8 May 2021 05:35:00 +0000 (07:35 +0200)]
MINOR: tools: add a float-to-ascii conversion function
We already had ultoa_r() and friends but nothing to emit inline floats.
This is now done with ftoa_r() and F2A/F2H. Note that the latter both use
the itoa_str[] as temporary storage and that the HTML format currently is
the exact same as the ASCII one. The trailing zeroes are always timmed so
these outputs are usable in user-visible output.
Willy Tarreau [Sat, 8 May 2021 08:28:53 +0000 (10:28 +0200)]
MINOR: tools: implement trimming of floating point numbers
When using "%f" to print a float, it automatically gets 6 digits after
the decimal point and there's no way to automatically adjust to the
required ones by dropping trailing zeroes. This function does exactly
this and automatically drops the decimal point if all digits after it
were zeroes. This will make numbers more friendly in stats and makes
outputs shorter (e.g. JSON where everything is just a "number").
The function is designed to be easy to use with snprint() and chunks:
Willy Tarreau [Sat, 8 May 2021 04:50:28 +0000 (06:50 +0200)]
MINOR: sample: improve error reporting on missing arg to strcmp() converter
Calling the strcmp() converter with no argument yields this strange error:
[ALERT] (31439) : parsing [test.cfg:3] : error detected in frontend 'f' while parsing 'http-request redirect' rule : failed to parse sample expression <src,strcmp]> : invalid args in converter 'strcmp' : failed to register variable name ''.
This is because the vars name check tries to see if it can create such a
variable having an empty name. Let's at least make a special case of the
missing argument. Now we can read a more explicit:
[ALERT] (31655) : parsing [test.cfg:3] : error detected in frontend 'f' while parsing 'http-request redirect' rule : failed to parse sample expression <src,strcmp]> : invalid args in converter 'strcmp' : missing variable name.
When using the crl-file option with multiple Certificate Authority
levels in the CA chain, there must be one CRL per CA or the verify
function on the backend side will raise an "unagle to get certificate
CRL" error (error code 3).
DOC: ssl: Extra files loading now works for backends too
When implementing the server side certificate hot update, the ckch
mechanism was used on the backend side in order to mimic the frontend
certificate management and to enable server line certificate update via
the CLI (see GitHub issue #427). As an unexpected side effect, we now
also look for ssl extra files (cert.pem.key, cert.pem.ocsp ...) for the
backend side.
This patch updates the documentation accordingly.
Add a new proxy capability for proxy with load-balancing capabilities.
This help to differentiate listen/frontend/backend with special proxies
such as peer proxies.
MINOR: http_act: mark normalize-uri as experimental
normalize-uri http rule is marked as experimental, so it cannot be
activated without the global 'expose-experimental-directives'. The
associated vtc is updated to be able to use it.
Support experimental actions. It is mandatory to use
'expose-experimental-directives' before to be able to use them.
If such action is present in the config file, the tainted status of the
process is updated. Another tainted status is set when an experimental
action is executed.
Add a new flag to mark a keyword as experimental. An experimental
keyword cannot be used if the global 'expose-experimental-directives' is
not present first.
Only keywords parsed through a standard cfg_keywords lists in
global/proxies section will be automatically detected if declared
experimental. To support a keyword outside of these lists,
check_kw_experimental must be called manually during its parsing.
If an experimental keyword is present in the config, the tainted flag is
updated.
For the moment, no keyword is marked as experimental.
MINOR: cfgparse: add a new field flags in cfg_keyword
This field will be used to add various mechanism to config parsing.
Currently no flag value is implemented. The following commit will
implement experimental keywords.
Add a global flag named 'tainted'. Its purpose is to report various
status about experimental features used for the current process
lifetime.
By default it is initialized to 0. It can be set/retrieve by a couple of
new functions mark_tainted()/get_tainted(). Once a flag is set, it
cannot be resetted.
Currently, no tainted status is implemented, it will be the subject of
the following commits.
BUG/MINOR: checks: Reschedule check on observe mode only if fastinter is set
On observe mode, if a server is marked as DOWN, the server's health-check is
rescheduled using the fastinter timeout if the new expiration date is newer
that the current one. But this must only be performed if the fastinter
timeout is defined.
Internally, tick_is_lt() function only checks the date and does not perform any
verification on the provided args. Thus, we must take care of it. However, it is
possible to disable the server health-check by setting its task expiration date
to TICK_ETERNITY.
This patch must be backported as far as 2.2. It is related to
BUG/MINOR: checks: Handle synchronous connect when a tcpcheck is started
A connection may be synchronously established. In the tcpcheck context, it
may be a problem if several connections come one after another. In this
case, there is no event to close the very first connection before starting
the next one. The checks is thus blocked and timed out, a L7 timeout error
is reported.
To fix the bug, when a tcpcheck is started, we immediately evaluate its
state. Most of time, nothing is performed and we must wait. But it is thus
possible to handle the result of a successfull connection.
This patch should fix the issue #1234. It must be backported as far as 2.2.
BUG/MINOR: stream: Reset stream final state and si error type on L7 retry
Thanks to a previous fix, the stream error mask is now cleared on L7
retry. But the stream final state (SF_FINST_*) and the stream-interface
error type must also be reset to properly restart a new connection and be
sure to not inherit errors from the previous connection attempt.
In addition, SF_ADDR_SET flag is not systematically removed.
stream_choose_redispatch() already takes care to unset it if necessary. When
the connection is not redispatch, the server address can be preserved.
Willy Tarreau [Fri, 7 May 2021 09:38:37 +0000 (11:38 +0200)]
CLEANUP: cli/tree-wide: properly re-align the CLI commands' help messages
There were 102 CLI commands whose help were zig-zagging all along the dump
making them unreadable. This patch realigns all these messages so that the
command now uses up to 40 characters before the delimiting colon. About a
third of the commands did not correctly list their arguments which were
added after the first version, so they were all updated. Some abuses of
the term "id" were fixed to use a more explanatory term. The
"set ssl ocsp-response" command was not listed because it lacked a help
message, this was fixed as well. The deprecated enable/disable commands
for agent/health/server were prominently written as deprecated. Whenever
possible, clearer explanations were provided.
Willy Tarreau [Fri, 7 May 2021 06:59:50 +0000 (08:59 +0200)]
MINOR: config: add a new message directive: .diag
This one works just like .notice/.warning/.alert except that it prints
the message at level "DIAG" only when haproxy runs in diagnostic mode
(-dD). This can be convenient for example to pass a few hints to help
locate certain config parts or to leave messages about certain temporary
workarounds.
Example:
.diag "WTA/2021-05-07: $.LINE: replace 'redirect' with 'return' after final switch to 2.4"
http-request redirect location /goaway if ABUSE
Willy Tarreau [Fri, 7 May 2021 06:42:39 +0000 (08:42 +0200)]
MEDIUM: log: slightly refine the output format of alerts/warnings/etc
For about 20 years we've been emitting cryptic messages on warnings and
alerts, that nobody knows how to parse:
[NOTICE] 126/080118 (3115) : haproxy version is 2.4-dev18-0b7c78-49
[NOTICE] 126/080118 (3115) : path to executable is ./haproxy
[WARNING] 126/080119 (3115) : Server default/srv1 is DOWN via static/srv1. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 126/080119 (3115) : backend 'default' has no server available!
Hint: the first 3-digit number is the day of year, and the 6 digits
after it represent the time of day in format HHMMSS, then the pid in
parenthesis. These are not quite user-friendly and such cryptic into
are not useful at all.
This patch slightly adjusts the output by performing these minimal changes:
- removing the date/time, as they were added very early when haproxy
was meant to be used in foreground as a debugging tool, and they're
provided in more details in logs nowadays ;
- better aligning the fields by padding the severity tag to 10 chars.
The diag output was renamed to "DIAG" only.
Now the output provides this:
[NOTICE] (4563) : haproxy version is 2.4-dev18-75a428-51
[NOTICE] (4563) : path to executable is ./haproxy
[WARNING] (4563) : Server default/srv1 is DOWN via static/srv1. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] (4563) : backend 'default' has no server available!
The useless space before the colon was kept so as not to confuse any
possible output parser.
The few entries in the doc referring to this format were adjusted to
reflect the new one.
The change was tagged "MEDIUM" as it may have visible consequences on
home-grown monitoring tools, though it is extremely unlikely due to the
limited extent of these changes.
Willy Tarreau [Fri, 7 May 2021 06:19:30 +0000 (08:19 +0200)]
BUG/MINOR: stream: properly clear the previous error mask on L7 retries
The cleanup of the previous error was incorrect on L7 retries, it would
OR two values while they're part of an enum, leaving some bits set.
Depending on the errors it was possible to occasionally see an internal
error ("I" flag) being logged.
This should be backported as far as 2.0, though the do_l7_retry() function
in in proto_htx.c in older versions.
Willy Tarreau [Fri, 7 May 2021 06:01:35 +0000 (08:01 +0200)]
BUG/MINOR: activity: use the new pointer to calculate the new size in realloc()
When memory profiling is enabled, realloc() can occasionally get the area
size wrong due to the wrong pointer being used to check the new size. When
the old area gets unmapped in the operation, this may even result in a
crash. There's no impact without memory profiling though.
No backport is needed as this is exclusively 2.4-dev.
Willy Tarreau [Thu, 6 May 2021 14:53:26 +0000 (16:53 +0200)]
MINOR: config: add predicates "version_atleast" and "version_before" to cond blocks
These predicates respectively verify that the current version is at least
a given version or is before a specific one. The syntax is exactly the one
reported by "haproxy -v", though each component is optional, so both "1.5"
and "2.4-dev18-88910-48" are supported. Missing components equal zero, and
"dev" is below "pre" or "rc", which are both inferior to no such mention
(i.e. they are negative). Thus "2.4-dev18" is older than "2.4-rc1" which
is older than "2.4".
Willy Tarreau [Thu, 6 May 2021 14:34:23 +0000 (16:34 +0200)]
MINOR: config: add predicate "feature" to detect certain built-in features
The "feature(name)" predicate will return true if <name> corresponds to
a name listed after a '+' in the features list, that is it was enabled at
build time with USE_<name>=1. Typical use cases will include OPENSSL, LUA
and LINUX_SPLICE. But maybe it will also be convenient to use with optional
addons such as PROMEX and the device detection modules to help keeping the
same configs across various deployments.
Willy Tarreau [Thu, 6 May 2021 14:10:09 +0000 (16:10 +0200)]
MINOR: config: add predicates "streq()" and "strneq()" to conditional expressions
"streq(str1,str2)" will return true if the two strings match while
"strneq(str1,str2)" will return true only if they differ. This is
convenient to match an environment variable against a predefined value.
Willy Tarreau [Thu, 6 May 2021 13:49:04 +0000 (15:49 +0200)]
MINOR: config: make cfg_eval_condition() support predicates with arguments
Now we can look up a list of known predicates and pre-parse their
arguments. For now the list is empty. The code needed to be arranged with
a common exit point to release all arguments because there's no default
argument freeing function (it likely only used to exist in the deinit
code). Since we only support simple arguments for now it's no big deal,
only a 2-liner loop.
Let's return the position of the first unparsable character on error,
so that instead of just saying "unparsable conditional expression blah"
we can have:
[ALERT] 125/150618 (13995) : parsing [test-conds2.cfg:1]: unparsable conditional expression '12/blah' in '.if' at position 1:
.if 12/blah
^
This is important because conditions will be made from environment
variables or later from more complex expressions where the error will
not always be easy to locate.
Willy Tarreau [Thu, 6 May 2021 05:43:35 +0000 (07:43 +0200)]
MINOR: global: add version comparison functions
The new function split_version() converts a parsable haproxy version to
an array of integers. The function compare_current_version() compares an
arbitrary version to the current one. These two functions were written
by Thierry Fournier in 2013, and are still usable as-is. They will be
used to write config language predicates.
Willy Tarreau [Thu, 6 May 2021 14:30:32 +0000 (16:30 +0200)]
MINOR: global: export the build features string list
Till now it was only presented in the version output but could not be
consulted outside of haproxy.c, let's export it as a variable, and set
it to an empty string if not defined.
Willy Tarreau [Thu, 6 May 2021 12:50:30 +0000 (14:50 +0200)]
MINOR: arg: improve the error message on missing closing parenthesis
When the closing brace is missing after an argument (acl, ...), the
error may report something like "expected ')' before ''". Let's just
drop "before ''" when the final word is empty to make the message a
bit clearer.
Willy Tarreau [Thu, 6 May 2021 08:25:11 +0000 (10:25 +0200)]
MINOR: config: support some pseudo-variables for file/line/section
The new pseudo-variables ".FILE", ".LINE" and ".SECTION" will be resolved
on the fly by the config parser and will respectively retrieve the current
configuration file name, the current line number and the current section
being parsed. This may help emit logs, errors, and debugging information
(e.g. which rule matched).
The '.' in the first char was reserved for such pseudo-variables and no
other variable is permitted. This will allow to add support for new ones
in the future if they prove to be useful (e.g. randoms/uuid for secret
keying or automatic naming of configuration objects).
Willy Tarreau [Thu, 6 May 2021 08:04:45 +0000 (10:04 +0200)]
MINOR: config: keep up-to-date current file/line/section in the global struct
Let's add a few fields to the global struct to store information about
the current file being processed, the current line number and the current
section. This will be used to retrieve them using special variables.
Willy Tarreau [Thu, 6 May 2021 06:19:48 +0000 (08:19 +0200)]
MINOR: config: centralize the ".if"/".elif" condition parser and evaluator
Instead of duplicating the condition evaluations, let's have a single
function cfg_eval_condition() that returns true/false/error. It takes
less code and will ease its extension.
Willy Tarreau [Thu, 6 May 2021 06:48:09 +0000 (08:48 +0200)]
BUG/MINOR: config: add a missing "ELIF_TAKE" test for ".elif" condition evaluator
This missing state was causing a second elif condition to be evaluated
after a first one succeeded after a .if failed. For example in the test
below the else would be executed:
Willy Tarreau [Thu, 6 May 2021 06:46:11 +0000 (08:46 +0200)]
BUG/MINOR: config: fix uninitialized initial state in ".if" block evaluator
The condition to skip the block in the ".if" evaluator forgot to check
that the level was high enough, resulting in rare cases where a random
value matched one of the 5 values that cause the block to be skipped.
BUG/MINOR: stream: Decrement server current session counter on L7 retry
When a L7 retry is performed, we must not forget to decrement the current
session counter of the assigned server. Of course, it must only be done if
the current session is already counted on the server, thus if SF_CURR_SESS
flag is set on the stream.
This patch is related to the issue #1003. It must be backported as far as
2.0.
MINOR: mux-h1: Manage processing blocking flags on the H1 stream
Because H1C_F_RX_BLK and H1C_F_TX_BLK flags now only concerns data
processing, at the H1 stream level, there is no reason to still manage them
on the H1 connection. Thus, these flags are now set on the H1 stream.
These flags are used to block, respectively, the output and the input
processing. Thus, to be more explicit, H1C_F_WAIT_INPUT is renamed to
H1C_F_TX_BLK and H1C_F_WAIT_OUTPUT is renamed to H1C_F_RX_BLK.
MEDIUM: mux-h1: Wake H1 stream when both sides a synchronized
Instead of subscribing for reads or sends to restart data processing, when
both sides are synchronized, the H1 stream is woken up. This happens when
H1C_F_WAIT_INPUT or H1C_F_WAIT_OUTPUT flags are removed, Indeed, these flags
block the data processing and not raw data sending or receiving.
MINOR: mux-h1: Always subscribe for reads when splicing is disabled
In h1_rcv_pipe(), when the splicing is not possible or disabled at the end
of the fnuction, we make sure to subscribe for reads. It is not a bug but it
avoid an extra call to h1_rcv_pipe() to handle the subscription in some
cases (end of message, end of chunk or read0).
In addition, the condition to detect end of splicing has been simplified. We
now only rely on H1C_F_WANT_SPLICE flags.
MINOR: mux-h1: Subscribe for sends if output buffer is not empty in h1_snd_pipe
In h1_snd_pipe(), before sending spliced data, we take care to flush the
output buffer by subscribing for sends. However, the condition to do so is
not accurate. We test data remaining in the pipe. It works but it also
unnecessarily subscribes H1C for sends when the output buffer is empty if we
are unable to send all spliced data in one time. Instead, H1C is now
subscribed for sends if output buffer is not empty.
MINOR: mux-h1: clean up conditions to enabled and disabled splicing
First, there is no reason to announce the splicing support at the
conn-stream level when it is created, at least for now. GTUNE_USE_SPLICE
option is already handled at the stream level.
Second, in h1_rcv_buf(), there is no reason to test the message state to
switch the H1C in splicing mode (via H1C_F_WANT_SPLICE flag).
h1_process_input() already takes care to set CS_FL_MAY_SPLICE flag on the
conn-stream when appropriate. Thus, in h1_rcv_buf(), we can rely on this
flag to change the H1C state.
Finally, if h1_rcv_pipe() is called, it means the H1C is already in the
splicing mode. H1C_F_WANT_SPLICE flag is necessarily already set. Thus no
reason to force it.
This script test abortonclose option for HTTP/1 client only. It may be
backported as far as 2.0. But on the 2.2 and prior, the syslog part must be
adapted to catch log messages emitted by proxy during HAProxy
startup. Following lines must be added :
BUG/MEDIUM: mux-h1: Properly report client close if abortonclose option is set
On client side, if CO_RFL_KEEP_RECV flags is set when h1_rcv_buf() is
called, we force subscription for reads to be able to catch read0. This way,
the event will be reported to upper layer to let the stream abort the
request.
This patch fixes the abortonclose option for H1 connections. It depends on
following patches :
* MEDIUM: mux-h1: Don't block reads when waiting for the other side
* MINOR: conn-stream: Force mux to wait for read events if abortonclose is set
But to be sure the event is handled by the stream, the following patches are
also required :
* BUG/MINOR: stream-int: Don't block reads in si_update_rx() if chn may receive
* MINOR: channel: Rely on HTX version if appropriate in channel_may_recv()
All the series must be backported with caution as far as 2.0, and only after
a period of observation to be sure nothing broke.
MEDIUM: mux-h1: Don't block reads when waiting for the other side
When we are waiting for the other side to read more data, or to read the
next request, we must only stop the processing of input data and not the
data receipt. This patch don't change anything on the subscribes for
reads. So it should not change anything. The only difference is that the H1
connection will try to read data if it is woken up for an I/O event and if
it was subscribed for reads.
This patch is required to fix abortonclose option for H1 client connections.
MINOR: conn-stream: Force mux to wait for read events if abortonclose is set
When the abortonclose option is enabled, to be sure to be immediately
notified when a shutdown is received from the client, the frontend
conn-stream must be sure the mux will wait for read events. To do so, the
CO_RFL_KEEP_RECV flag is set when mux->rcv_buf() is called. This new flag
instructs the mux to wait for read events, regardless its internal state.
This patch is required to fix abortonclose option for H1 client connections.
BUG/MINOR: stream-int: Don't block reads in si_update_rx() if chn may receive
In si_update_rx() function, the reads may be blocked because we explicitly
don't want to read or because of a lack of room in the input buffer. The
first condition is valid. However the second one only test if the channel is
empty or not. It means the reads are blocked if there are still some output
data in the input channel, in its buffer or its pipe. This condition is not
accurate. The reads must not be blocked if the channel can still receive
data. Thus instead of relying on channel_is_empty() function, we now call
channel_may_recv().
This patch is especially useful to be able to catch read0 on client side
when we are waiting for a connection to the server, when abortonclose option
is enabled. Otherwise, the client abort is not detected.
This patch depends on "MINOR: channel: Rely on HTX version if appropriate in
channel_may_recv()". Both must be backported as far as 2.0 after a period of
observation to be sure nothing broke.
MINOR: channel: Rely on HTX version if appropriate in channel_may_recv()
When channel_may_recv() is called for an HTX stream, the HTX version,
channel_htx_may_recv() is called. This patch is mandatory to fix a bug
related to the abortonclose option.
Willy Tarreau [Wed, 5 May 2021 16:17:39 +0000 (18:17 +0200)]
BUILD: makefile: add new option USE_MEMORY_PROFILING
It is not enabled by default, and may only work on linux-glibc for now,
though maybe other platforms could adopt it, possibly with certain
restrictions.
Willy Tarreau [Wed, 5 May 2021 16:07:02 +0000 (18:07 +0200)]
MINOR: activity: make "show profiling" also dump the memoery usage
Now the memory usage stats are dumped. They are first sorted by total
alloc+free so that the first ones are always the most relevant, and
that most symmetric alloc/free pairs appear next to each other. This
way it becomes convenient to only show a small part of them such as:
show profiling memory 20
It's worth noting that the sorting is performed upon each call to the
iohandler so it is technically possible that an entry could appear
twice or be dropped if the ordering changes between two calls. In
practice it is not an issue but it's worth being mentioned.
Willy Tarreau [Wed, 5 May 2021 15:33:27 +0000 (17:33 +0200)]
MINOR: activity: clean up the show profiling io_handler a little bit
Let's rearrange it to make it more configurable and allow to iterate
over multiple parts (header, tasks, memory etc), to restart from a
given line number (previously it didn't work, though fortunately it
didn't happen), and to support dumping only certain parts and a given
number of lines. A few entries from ctx.cli are now used to store a
restart point and the current step.
Willy Tarreau [Wed, 5 May 2021 15:07:09 +0000 (17:07 +0200)]
MEDIUM: activity: collect memory allocator statistics with USE_MEMORY_PROFILING
When built with USE_MEMORY_PROFILING the main memory allocation functions
are diverted to collect statistics per caller. It is a bit tricky because
the only way to call the original ones is to find their pointer, which
requires dlsym(), and which is not available everywhere.
Thus all functions are designed to call their fallback function (the
original one), which is preset to an initialization function that is
supposed to call dlsym() to resolve the missing symbols, and vanish.
This saves expensive tests in the critical path.
A second problem is that dlsym() calls calloc() to initialize some
error messages. After plenty of tests with posix_memalign(), valloc()
and friends, it turns out that returning NULL still makes it happy.
Thus we currently use a visit counter (in_memprof) to detect if we're
reentering, in which case all allocation functions return NULL.
In order to convert a return address to an entry in the stats, we
perform a cheap hash consisting in multiplying the pointer by a
balanced number (as many zeros as ones) and keeping the middle bits.
The hash is already pretty good like this, achieving to store up to
638 entries in a 2048-entry table without collision. But in order to
further refine this and improve the fill ratio of the table, in case
of collision we move up to 16 adjacent entries to find a free place.
This remains quite cheap and manages to store all of these inside a
1024-entries hash table with even less risk of collision.
Also, free(NULL) does not produce any stats. By doing so we reduce
from 638 to 208 the average number of entries needed for a basic
config using SSL. free(NULL) not only provides no information as it's
a NOP, but keeping it is pure pollution as it happens all the time.
When DEBUG_MEM_STATS is enabled, malloc/calloc/realloc are redefined as
macros, preventing the code from compiling. Thus, when this option is
detected, the macros are undefined as they are pointless there anyway.
The functions are optimized to quickly jump to the fallback and as such
become almost invisible in terms of processing time, execpt an extra
"if" on a read_mostly variable and a jump. Considering that this only
happens for pool misses and library routines, this remains acceptable.
Performance tests in SSL (the most stressful test) shows less than 1%
performance loss when profiling is enabled on 2c4t.
The code was written in a way to ease backporting to modern versions
(2.2+) if needed, so it keeps the long names for integers and doesn't
use the _INC version of the atomic ops.
Willy Tarreau [Wed, 5 May 2021 14:50:40 +0000 (16:50 +0200)]
MINOR: activity: declare the storage for memory usage statistics
We'll need to store for each call place, the pointer to the caller
(the return address to be more exact as with free() it's not uncommon
to see tail calls), the number of calls to alloc/free and the total
alloc/free bytes. realloc() will be counted either as alloc or free
depending on the balance of the size before vs after.
We store 1024+1 entries. The first ones are used as hashes and the
last one for collisions.
When profiling is enabled via the CLI, all the stats are reset.
Willy Tarreau [Wed, 5 May 2021 14:28:31 +0000 (16:28 +0200)]
CLEANUP: activity: mark the profiling and task_profiling_mask __read_mostly
These ones are only read by the scheduler and occasionally written to
by the CLI parser, so let's move them to read_mostly so that they do
not risk to suffer from cache line pollution.
Willy Tarreau [Wed, 5 May 2021 07:06:21 +0000 (09:06 +0200)]
MINOR: tools: add functions to retrieve the address of a symbol
get_sym_curr_addr() will return the address of the first occurrence of
the given symbol while get_sym_next_addr() will return the address of
the next occurrence of the symbol. These ones return NULL on non-linux,
non-ELF, non-USE_DL.
MEDIUM: connection: close front idling connection on soft-stop
Implement a safe mechanism to close front idling connection which
prevents the soft-stop to complete. Every h1/h2 front connection is
added in a new per-thread list instance. On shutdown, a new task is
waking up which calls wake mux operation on every connection still
present in the new list.
A new stopping_list attach point has been added in the connection
structure. As this member is only used for frontend connections, it
shared the same union as the session_list reserved for backend
connections.
MEDIUM: mux_h1: release idling frontend conns on soft-stop
In h1_process, if the proxy of a frontend connection is disabled,
release the connection.
This commit is in preparation to properly close idling front connections
on soft-stop. h1_process must still be called, this will be done via a
dedicated task which monitors the global variable stopping.
MINOR: connection: move session_list member in a union
Move the session_list attach point in an anonymous union. This member is
only used for backend connections. This commit is in preparation for the
support of stopping frontend idling connections which will add another
member to the union.
This change means that a special care must be taken to be sure that only
backend connections manipulate the session_list. A few BUG_ON has been
added as special guard to prevent from misuse.
MINOR: srv: close all idle connections on shutdown
Implement a function to close all server idle connections. This function
is called via a global deinit server handler.
The main objective is to prevents from leaving sockets in TIME_WAIT
state. To limit the set of operations on shutdown and prevents
tasks rescheduling, only the ctrl stack closing is done.
The purpose of this debugging option was to prevent certain pools from
masking other ones when they were shared. For example, task, http_txn,
h2s, h1s, h1c, session, fcgi_strm, and connection are all 192 bytes and
would normally be mergedi, but not with this option. The problem is that
certain pools are declared multiple times with various parameters, which
are often very close, and due to the way the option works, they're not
shared either. Good examples of this are captures and stick tables. Some
configurations have large numbers of stick-tables of pretty similar types
and it's very common to end up with the following when the option is
enabled:
In addition to not being convenient, it can have important effects on the
memory usage because these pools will not share their entries, so one stick
table cannot allocate from another one's pool.
This patch solves this by going back to the initial goal which was not to
have different pools in the same list. Instead of masking the MAP_F_SHARED
flag, it simply adds a test on the pool's name, and disables pool sharing
if the names differ. This way pools are not shared unless they're of the
same name and size, which doesn't hinder debugging. The same test above
now returns this:
Willy Tarreau [Tue, 4 May 2021 16:40:50 +0000 (18:40 +0200)]
MINOR: debug: add a new "debug dev sym" command in expert mode
This command attempts to resolve a pointer to a symbol name. This is
convenient during development as it's easier to get such pointers live
than by issuing a debugger or calling addr2line.
Willy Tarreau [Tue, 4 May 2021 14:27:45 +0000 (16:27 +0200)]
BUG/MEDIUM: cli: prevent memory leak on write errors
Since the introduction of payload support on the CLI in 1.9-dev1 by
commit abbf60710 ("MEDIUM: cli: Add payload support"), a chunk is
temporarily allocated for the CLI to support defragmenting a payload
passed with a command. However it's only released when passing via
the CLI_ST_END state (i.e. on clean shutdown), but not on errors.
Something as trivial as:
$ while :; do ncat --send-only -U /path/to/cli <<< "show stat"; done
with a few hundreds of servers is enough see the number of allocated
trash chunks go through the roof in "show pools".
BUG/MINOR: hlua: Don't rely on top of the stack when using Lua buffers
When the lua buffers are used, a variable number of stack slots may be
used. Thus we cannot assume that we know where the top of the stack is. It
was not an issue for lua < 5.4.3 (at least for small buffers). But
'socket:receive()' now fails with lua 5.4.3 because a light userdata is
systematically pushed on the top of the stack when a buffer is initialized.
To fix the bug, in hlua_socket_receive(), we save the index of the top of
the stack before creating the buffer. This way, we can check the number of
arguments, regardless anything was pushed on the stack or not.
Note that the other buffer usages seem to be safe.
This patch should solve the issue #1240. It should be backport to all stable
branches.
Willy Tarreau [Sat, 1 May 2021 06:25:15 +0000 (08:25 +0200)]
[RELEASE] Released version 2.4-dev18
Released version 2.4-dev18 with the following main changes :
- DOC: Fix indentation for `path-strip-dot` normalizer
- DOC: Fix RFC reference for the percent-to-uppercase normalizer
- DOC: Add RFC references for the path-strip-dot(dot)? normalizers
- MINOR: uri_normalizer: Add a `percent-decode-unreserved` normalizer
- BUG/MINOR: mux-fcgi: Don't send normalized uri to FCGI application
- REORG: htx: Inline htx functions to add HTX blocks in a message
- CLEANUP: assorted typo fixes in the code and comments
- DOC: general: fix white spaces for HTML converter
- BUG/MINOR: ssl: ssl_sock_prepare_ssl_ctx does not return an error code
- BUG/MINOR: cpuset: move include guard at the very beginning
- BUG/MAJOR: fix build on musl with cpu_set_t support
- BUG/MEDIUM: cpuset: fix build on MacOS
- BUG/MINOR: htx: Preserve HTX flags when draining data from an HTX message
- MEDIUM: htx: Refactor htx_xfer_blks() to not rely on hdrs_bytes field
- CLEANUP: htx: Remove unsued hdrs_bytes field from the HTX start-line
- BUG/MINOR: mux-h2: Don't encroach on the reserve when decoding headers
- MEDIUM: http-ana: handle read error on server side if waiting for response
- MINOR: htx: Limit length of headers name/value when a HTX message is dumped
- BUG/MINOR: applet: Notify the other side if data were consumed by an applet
- BUG/MINOR: hlua: Don't consume headers when starting an HTTP lua service
- BUG/MEDIUM: mux-h2: Handle EOM flag when sending a DATA frame with zero-copy
- CLEANUP: channel: No longer notify the producer in co_skip()/co_htx_skip()
- DOC: general: fix example in set-timeout
- CLEANUP: cfgparse: de-uglify early file error handling in readcfgfile()
- MINOR: config: add a new "default-path" global directive
- BUG/MEDIUM: peers: initialize resync timer to get an initial full resync
- BUG/MEDIUM: peers: register last acked value as origin receiving a resync req
- BUG/MEDIUM: peers: stop considering ack messages teaching a full resync
- BUG/MEDIUM: peers: reset starting point if peers appears longly disconnected
- BUG/MEDIUM: peers: reset commitupdate value in new conns
- BUG/MEDIUM: peers: re-work updates lookup during the sync on the fly
- BUG/MEDIUM: peers: reset tables stage flags stages on new conns
- MINOR: peers: add informative flags about resync process for debugging
- BUG/MEDIUM: time: fix updating of global_now upon clock drift
- CLEANUP: freq_ctr: make arguments of freq_ctr_total() const
- CLEANUP: hlua: rename hlua_appctx* appctx to luactx
- MINOR: server: fix doc/trace on lb algo for dynamic server creation
- REGTESTS: server: fix cli_add_server due to previous trace update
- REGTESTS: add minimal CLI "add map" tests
- DOC: management: move "set var" to the proper place
- CLEANUP: map: slightly reorder the add map function
- MINOR: map: get rid of map_add_key_value()
- MINOR: map: show the current and next pattern version in "show map"
- MINOR: map/acl: add the possibility to specify the version in "show map/acl"
- MINOR: pattern: support purging arbitrary ranges of generations
- MINOR: map/acl: add the possibility to specify the version in "clear map/acl"
- MINOR: map/acl: add the "prepare map/acl" CLI command
- MINOR: map/acl: add the "commit map/acl" CLI command
- MINOR: map/acl: make "add map/acl" support an optional version number
- CLEANUP: map/cli: properly align the map/acl help
- BUILD: compiler: do not use already defined __read_mostly on dragonfly
MINOR: map/acl: make "add map/acl" support an optional version number
By passing a version number to "add map/acl", it becomes possible to
atomically replace maps and ACLs. The principle is that a new version
number is first retrieved by calling"prepare map/acl", and this version
number is used with "add map" and "add acl". Newly added entries then
remain invisible to the matching mechanism but are visible in "show
map/acl" when the version number is specified, or may be cleard with
"clear map/acl". Finally when the insertion is complete, a
"commit map/acl" command must be issued, and the version is atomically
updated so that there is no intermediate state with incomplete entries.
MINOR: map/acl: add the "commit map/acl" CLI command
The command is used to atomically replace a map/acl with the pending
contents of the designated version. The new version must have been
allocated by "prepare map/acl" prior to this. At the moment it is not
possible to force the version when adding new entries, so this may only
be used to atomically clear an ACL/map.
MINOR: map/acl: add the "prepare map/acl" CLI command
This command allocates a new version for the map/acl, that will be usable
later to prepare the addition of new values to atomically replace existing
ones. Technically speaking the operation consists in atomically incrementing
the next version. There's no "undo" operation here, if a version is not
committed, it will automatically be trashed when committing a newer version.
MINOR: map/acl: add the possibility to specify the version in "clear map/acl"
This will ease maintenance of versionned maps by allowing to clear old or
failed updates instead of the current version. Nothing was done to allow
clearing everyhing, though if there was a need for this, implementing "@all"
or something equivalent wouldn't require more than 3 lines of code.
MINOR: pattern: support purging arbitrary ranges of generations
Instead of being able to purge only values older than a specific value,
let's support arbitrary ranges and make pat_ref_purge_older() just be
one special case of this one.