Willy Tarreau [Fri, 26 Mar 2021 13:51:31 +0000 (14:51 +0100)]
MINOR: vars/cli: add a "get var" CLI command to retrieve global variables
Process-wide variables can now be displayed from the CLI using "get var"
followed by the variable name. They must all start with "proc." otherwise
they will not be found. The output is very similar to the one of the
debug converter, with a type and value being reported for the embedded
sample.
This command is limited to clients with the level "operator" or higher,
since it can possibly expose traffic-related data.
Willy Tarreau [Fri, 26 Mar 2021 14:36:44 +0000 (15:36 +0100)]
MINOR: action: add a new ACT_F_CLI_PARSER origin designation
In order to process samples from the command line interface we'll need
rules as well, and these rules will have to be marked as coming from
the CLI parser. This new origin is used for this.
Willy Tarreau [Fri, 26 Mar 2021 14:29:35 +0000 (15:29 +0100)]
MINOR: sample: add a new CLI_PARSER context for samples
In order to prepare for supporting calling sample expressions from the
CLI, let's create a new CLI_PARSER parsing context. This one supports
constants and internal samples only.
Willy Tarreau [Fri, 26 Mar 2021 13:03:57 +0000 (14:03 +0100)]
REGTESTS: add a basic reg-test for some "set-var" commands
This reg-test tests "set-var" in the global section, with some overlapping
variables and using a few samples and converters, then at the TCP and HTTP
levels using proc/sess/req variables.
Willy Tarreau [Fri, 26 Mar 2021 10:38:08 +0000 (11:38 +0100)]
MEDIUM: vars: add support for a "set-var" global directive
While we do support process-wide variables ("proc.<name>"), there was
no way to preset them from the configuration. This was particularly
limiting their usefulness since configs involving them always had to
first check if the variable was set prior to performing an operation.
This patch adds a new "set-var" directive in the global section that
supports setting the proc.<name> variables from an expression, like
other set-var actions do. The syntax however follows what is already
being done for setenv, which consists in having one argument for the
variable name and another one for the expression.
Only "constant" expressions are allowed here, such as "int", "str"
etc, combined with arithmetic or string converters, and variable
lookups. A few extra sample fetch keywords like "date", "rand" and
"uuid" are also part of the constant expressions and may make sense
to allow to create a random key or differentiate processes.
The way it was done consists in parsing a dummy rule an executing the
expression in the CFG_PARSE context, then releasing the expression.
This is safe because the sample that variables store does not hold a
back pointer to expression that created them.
Willy Tarreau [Fri, 26 Mar 2021 10:11:34 +0000 (11:11 +0100)]
MINOR: action: add a new ACT_F_CFG_PARSER origin designation
In order to process samples from the config file we'll need rules as
well, and these rules will have to be marked as coming from the
config parser. This new origin is used for this.
Willy Tarreau [Fri, 26 Mar 2021 10:09:38 +0000 (11:09 +0100)]
MINOR: sample: add a new CFG_PARSER context for samples
We'd sometimes like to be able to process samples while parsing
the configuration based on purely internal thing but that's not
possible right now. Let's add a new CFG_PARSER context for samples
which only permits constant samples (i.e. those which do not change
in the process' life and which are stable during config parsing).
Willy Tarreau [Fri, 26 Mar 2021 11:03:11 +0000 (12:03 +0100)]
MINOR: sample: mark the truly constant sample fetch keywords as such
A number of keywords are really constant and safe to use at config
time. This is the case for str(), int() etc but also env(), hostname(),
nbproc() etc. By extension a few other ones which can be useful to
preset values in a configuration were enabled as well, like data(),
rand() or uuid(). At the moment this doesn't change anything as they
are still only usable from runtime rules.
The "var()" keyword was also marked as const as it can definitely
return stable stuff at boot time.
Willy Tarreau [Fri, 26 Mar 2021 10:56:11 +0000 (11:56 +0100)]
MINOR: sample: add a new SMP_SRC_CONST sample capability
This level indicates that everything it constant in the expression during
the whole process' life and that it may safely be used at config parsing
time.
Willy Tarreau [Fri, 26 Mar 2021 15:11:55 +0000 (16:11 +0100)]
MINOR: sample: make smp_resolve_args() return an allocate error message
For now smp_resolve_args() complains on stderr via ha_alert(), but if we
want to make it a bit more dynamic, we need it to return errors in an
allocated message. Let's pass it an error pointer and have it fill it.
On return we indent the output if it contains more than one line.
The "stopping" sample fetch keyword was accidently duplicated in 1.9
by commit 70fe94419 ("MINOR: sample: add cpu_calls, cpu_ns_avg,
cpu_ns_tot, lat_ns_avg, lat_ns_tot"). This has no effect so no
backport is needed.
Willy Tarreau [Fri, 26 Mar 2021 11:08:06 +0000 (12:08 +0100)]
MINOR: vars: make the var() sample fetch keyword depend on nothing
This sample fetch doesn't require any L4 client session in practice, as
get_var() now checks for the session. This is important to remove this
dependency in order to support accessing variables in scope "proc" from
anywhere.
Amaury Denoyelle [Wed, 24 Mar 2021 16:57:47 +0000 (17:57 +0100)]
MINOR: lua: properly allocate the lua Socket servers
Instantiate both lua Socket servers tcp/ssl using standard function
new_server. There is currently no need to tune their settings except to
activate the ssl mode with noverify for the second one. Both servers are
freed with the free_server function.
Amaury Denoyelle [Wed, 24 Mar 2021 09:22:03 +0000 (10:22 +0100)]
MINOR: lua: properly allocate the lua Socket proxy
Replace static initialization of the lua Socket proxy with the standard
function alloc_new_proxy. The settings proxy are properly applied thanks
to PR_CAP_LUA. The proxy is freed with the free_proxy function.
Amaury Denoyelle [Wed, 24 Mar 2021 09:49:34 +0000 (10:49 +0100)]
MINOR: proxy: define cap PR_CAP_LUA
Define a new cap PR_CAP_LUA. It can be used to allocate the internal
proxy for lua Socket class. This cap overrides default settings for
preferable values in the lua context.
Amaury Denoyelle [Wed, 24 Mar 2021 15:13:20 +0000 (16:13 +0100)]
MINOR: proxy: implement a free_proxy function
Move all liberation code related to a proxy in a dedicated function
free_proxy in proxy.c. For now, this function is only called in
haproxy.c. In the future, it will be used to free the lua proxy.
Amaury Denoyelle [Tue, 23 Mar 2021 16:27:05 +0000 (17:27 +0100)]
REORG: split proxy allocation functions
Create a new function parse_new_proxy specifically designed to allocate
a new proxy from the configuration file and copy settings from the
default proxy.
The function alloc_new_proxy is reduced to a minimal allocation. It is
used for default proxy allocation and could also be used for internal
proxies such as the lua Socket proxy.
Amaury Denoyelle [Thu, 25 Mar 2021 16:15:52 +0000 (17:15 +0100)]
REORG: global: move free acl/action in their related source files
Move deinit_acl_cond and deinit_act_rules from haproxy.c respectively in
acl.c and action.c. The name of the functions has been slightly altered,
replacing the prefix deinit_* by free_* to reflect their purpose more
clearly.
This change has been made in preparation to the implementation of a free
proxy function. As a side-effect, it helps to clean up haproxy.c.
Amaury Denoyelle [Thu, 25 Mar 2021 14:09:38 +0000 (15:09 +0100)]
REORG: global: move initcall register code in a dedicated file
Create a new module init which contains code related to REGISTER_*
macros for initcalls. init.h is included in api.h to make init code
available to all modules.
BUG/MINOR: ssl: Prevent removal of crt-list line if the instance is a default one
If the first active line of a crt-list file is also the first mentioned
certificate of a frontend that does not have the strict-sni option
enabled, then its certificate will be used as the default one. We then
do not want this instance to be removable since it would make a frontend
lose its default certificate.
Considering that a crt-list file can be used by multiple frontends, and
that its first mentioned certificate can be used as default certificate
for only a subset of those frontends, we do not want the line to be
removable for some frontends and not the others. So if any of the ckch
instances corresponding to a crt-list line is a default instance, the
removal of the crt-list line will be forbidden.
The default SSL_CTX used by a specific frontend is the one of the first
ckch instance created for this frontend. If this instance has SNIs, then
the SSL context is linked to the instance through the list of SNIs
contained in it. If the instance does not have any SNIs though, then the
SSL_CTX is only referenced by the bind_conf structure and the instance
itself has no link to it.
When trying to update a certificate used by the default instance through
a cli command, a new version of the default instance was rebuilt but the
default SSL context referenced in the bind_conf structure would not be
changed, resulting in a buggy behavior in which depending on the SNI
used by the client, he could either use the new version of the updated
certificate or the original one.
This patch adds a reference to the default SSL context in the default
ckch instances so that it can be hot swapped during a certificate
update.
Willy Tarreau [Fri, 26 Mar 2021 08:09:42 +0000 (09:09 +0100)]
BUG/MEDIUM: mux-h1: make h1_shutw_conn() idempotent
In issue #1197, Stéphane Graber reported a rare case of crash that
results from an attempt to close an already closed H1 connection. It
indeed looks like under some circumstances it should be possible to
call the h1_shutw_conn() function more than once, though these
conditions are not very clear.
Without going through a deep analysis of all possibilities, one
potential case seems to be a detach() called with pending output data,
causing H1C_F_ST_SHUTDOWN to be set on the connection, then h1_process()
being immediately called on I/O, causing h1_send() to flush these data
and call h1_shutw_conn(), and finally the upper stream calling cs_shutw()
hence h1_shutw(), which itself will call h1_shutw_conn() again while the
transport and control layers have already been released. But the whole
sequence is not certain as it's not very clear in which case it's
possible to leave h1_send() without the connection anymore (at least
the obuf is empty).
However what is certain is that a shutdown function must be idempotent,
so let's fix h1_shutw_conn() regarding this point. Stéphane reported the
issue as far back as 2.0, so this patch should be backported this far.
Willy Tarreau [Thu, 25 Mar 2021 13:12:29 +0000 (14:12 +0100)]
BUG/MINOR: http_fetch: make hdr_ip() reject trailing characters
The hdr_ip() sample fetch function will try to extract IP addresses
from a header field. These IP addresses are parsed using url2ipv4()
and if it fails it will fall back to inet_pton(AF_INET6), otherwise
will fail.
There is a small problem there which is that if a field starts with
an IP address and is immediately followed by some garbage, the IP
address part is still returned. This is a problem with fields such
as x-forwarded-for because it prevents detection of accidental
corruption or bug along the chain. For example, the following string:
x-forwarded-for: 1.2.3.4; 5.6.7.8
or this one:
x-forwarded-for: 1.2.3.4O ( the last one being the letter 'O')
would still return "1.2.3.4" despite the trailing characters. This is
bad because it will silently cover broken code running on intermediary
proxies and may even in some cases allow haproxy to pass improperly
formatted headers after they were apparently validated, for example,
if someone extracts the address from this field to place it into
another one.
This issue would only affect the IPv4 parser, because the IPv6 parser
already uses inet_pton() which fails at the first invalid character and
rejects trailing port numbers.
In strict compliance with RFC7239, let's make sure that if there are any
characters left in the string, the parsing fails and makes hdr_ip()
return nothing. However, a special case has to be handled to support
IPv4 addresses followed by a colon and a valid port number, because till
now the parser used to implicitly accept them and it appears that this
practice, though rare, does exist at least in Azure:
https://docs.microsoft.com/en-us/azure/application-gateway/how-application-gateway-works
This issue has always been there so the fix may be backported to all
versions. It will need the following commit in order to work as expected:
MINOR: tools: make url2ipv4 return the exact number of bytes parsed
Many thanks to https://twitter.com/melardev and the BitMEX Security Team
for their detailed report.
Willy Tarreau [Thu, 25 Mar 2021 10:34:40 +0000 (11:34 +0100)]
MINOR: tools: make url2ipv4 return the exact number of bytes parsed
The function's return value is currently used as a boolean but we'll
need it to return the number of bytes parsed. Right now it returns
it minus one, unless the last char doesn't match what is permitted.
Let's update this to make it more usable.
BUG/MEDIUM: thread: Fix a deadlock if an isolated thread is marked as harmless
If an isolated thread is marked as harmless, it will loop forever in
thread_harmless_till_end() waiting no threads are isolated anymore. It never
happens because the current thread is isolated. To fix the bug, we exclude
the current thread for the test. We now wait for all other threads to leave
the rendez-vous point.
This bug only seems to occurr if HAProxy is compiled with DEBUG_UAF, when
pool_gc() is called. pool_gc() isolates the current thread, while
pool_free_area() set the thread as harmless when munmap is called.
Amaury Denoyelle [Tue, 23 Mar 2021 09:44:43 +0000 (10:44 +0100)]
BUG/MEDIUM: release lock on idle conn killing on reached pool high count
Release the lock before calling mux destroy in connect_server when
trying to kill an idle connection because the pool high count has been
reached.
The lock must be released because the mux destroy will call
srv_release_conn which also takes the lock to remove the connection from
the tree. As the connection was already deleted from the tree at this
stage, it is safe to release the lock, and the removal in
srv_release_conn will be a noop.
It does not need to be backported because it is only present in the
current release. It has been introduced by 5c7086f6b06d546c5800486ed9e4bb8d8d471e09
MEDIUM: connection: protect idle conn lists with locks
Olivier Houchard [Thu, 25 Mar 2021 00:38:54 +0000 (01:38 +0100)]
BUG/MEDIUM: fd: Take the fd_mig_lock when closing if no DWCAS is available.
In fd_delete(), if we're running with no double-width cas, take the
fd_mig_lock before setting thread_mask to 0 to make sure that
another thread calling fd_set_running() won't miss the new value of
thread_mask and set its bit in running_mask after we checked it.
This should be backported to 2.2 as part of the series fixing fd_delete().
Willy Tarreau [Wed, 24 Mar 2021 09:51:32 +0000 (10:51 +0100)]
BUG/MEDIUM: fd: do not wait on FD removal in fd_delete()
Christopher discovered an issue mostly affecting 2.2 and to a less extent
2.3 and above, which is that it's possible to deadlock a soft-stop when
several threads are using a same listener:
This simple case disappeared from 2.3 due to the removal of some locked
operations at the end of listener_accept() on the regular path, but the
architectural problem is still here and caused by a lock inversion built
around the loop on running_mask in fd_clr_running_excl(), because there
are situations where the caller of fd_delete() may hold a lock that is
preventing other threads from dropping their bit in running_mask.
The real need here is to make sure the last user deletes the FD. We have
all we need to know the last one, it's the one calling fd_clr_running()
last, or entering fd_delete() last, both of which can be summed up as
the last one calling fd_clr_running() if fd_delete() calls fd_clr_running()
at the end. And we can prevent new threads from appearing in running_mask
by removing their bits in thread_mask.
So what this patch does is that it sets the running_mask for the thread
in fd_delete(), clears the thread_mask, thus marking the FD as orphaned,
then clears the running mask again, and completes the deletion if it was
the last one. If it was not, another thread will pass through fd_clr_running
and will complete the deletion of the FD.
The bug is easily reproducible in 2.2 under high connection rates during
soft close. When the old process stops its listener, occasionally two
threads will deadlock and the old process will then be killed by the
watchdog. It's strongly believed that similar situations do exist in 2.3
and 2.4 (e.g. if the removal attempt happens during resume_listener()
called from listener_accept()) but if so, they should be much harder to
trigger.
This should be backported to 2.2 as the issue appeared with the FD
migration. It requires previous patches "fd: make fd_clr_running() return
the remaining running mask" and "MINOR: fd: remove the unneeded running
bit from fd_insert()".
Notes for backport: in 2.2, the fd_dodelete() function requires an extra
argument "do_close" indicating whether we want to remove and close the FD
(fd_delete) or just delete it (fd_remove). While this information is not
conveyed along the chain, we know that late calls always imply do_close=1
become do_close=0 exclusively results from fd_remove() which is only used
by the config parser and the master, both of which are single-threaded,
hence are always the last ones in the running_mask. Thus it is safe to
assume that a postponed FD deletion always implies do_close=1.
Thanks to Olivier for his help in designing this optimal solution.
Willy Tarreau [Wed, 24 Mar 2021 09:42:16 +0000 (10:42 +0100)]
MINOR: fd: remove the unneeded running bit from fd_insert()
There's no point taking the running bit in fd_insert() since by
definition there will never be more than one thread inserting the FD,
and that fd_insert() may only be done after the fd was allocated by
the system, indicating the end of use by any other thread.
This will need to be backported to 2.2 to fix an issue.
Willy Tarreau [Wed, 24 Mar 2021 09:27:56 +0000 (10:27 +0100)]
MINOR: fd: make fd_clr_running() return the remaining running mask
We'll need to know that a thread is the last one to use an fd, so let's
make fd_clr_running() return the remaining bits after removal. Note that
in practice we're only interested in knowing if it's zero but the compiler
doesn't make use of the clags after the AND and emits a CMPXCHG anyway :-/
This will need to be backported to 2.2 to fix an issue.
BUG/MEDIUM: lua: Always init the lua stack before referencing the context
When a lua context is allocated, its stack must be initialized to NULL
before attaching it to its owner (task, stream or applet). Otherwise, if
the watchdog is fired before the stack is really created, that may lead to a
segfault because we try to dump the traceback of an uninitialized lua stack.
It is easy to trigger this bug if a lua script do a blocking call while
another thread try to initialize a new lua context. Because of the global
lua lock, the init is blocked before the stack creation. Of course, it only
happens if the script is executed in the shared global context.
BUG/MEDIUM: debug/lua: Use internal hlua function to dump the lua traceback
The commit reverts following commits:
* 83926a04 BUG/MEDIUM: debug/lua: Don't dump the lua stack if not dumpable
* a61789a1 MEDIUM: lua: Use a per-thread counter to track some non-reentrant parts of lua
Instead of relying on a Lua function to print the lua traceback into the
debugger, we are now using our own internal function (hlua_traceback()).
This one does not allocate memory and use a chunk instead. This avoids any
issue with a possible deadlock in the memory allocator because the thread
processing was interrupted during a memory allocation.
This patch relies on the commit "BUG/MEDIUM: debug/lua: Use internal hlua
function to dump the lua traceback". Both must be backported wherever the
patches above are backported, thus as far as 2.0
MINOR: lua: Slightly improve function dumping the lua traceback
The separator string is now configurable, passing it as parameter when the
function is called. In addition, the message have been slightly changed to
be a bit more readable.
BUG/MINOR: ssl: Prevent disk access when using "add ssl crt-list"
If an unknown CA file was first mentioned in an "add ssl crt-list" CLI
command, it would result in a call to X509_STORE_load_locations which
performs a disk access which is forbidden during runtime. The same would
happen if a "ca-verify-file" or "crl-file" was specified. This was due
to the fact that the crt-list file parsing and the crt-list related CLI
commands parsing use the same functions.
The patch simply adds a new parameter to all the ssl_bind parsing
functions so that they know if the call is made during init or by the
CLI, and the ssl_store_load_locations function can then reject any new
cafile_entry creation coming from a CLI call.
Willy Tarreau [Tue, 23 Mar 2021 17:36:37 +0000 (18:36 +0100)]
BUILD: tools: fix build error with new PA_O_DEFAULT_DGRAM
Previous commit 69ba35146 ("MINOR: tools: introduce new option
PA_O_DEFAULT_DGRAM on str2sa_range.") managed to introduce a
parenthesis imbalance that broke the build. No backport is needed.
Emeric Brun [Mon, 22 Mar 2021 16:17:34 +0000 (17:17 +0100)]
MINOR: tools: introduce new option PA_O_DEFAULT_DGRAM on str2sa_range.
str2sa_range function options PA_O_DGRAM and PA_O_STREAM are used to
define the supported address types but also to set the default type
if it is not explicit. If the used address support both STREAM and DGRAM,
the default was always set to STREAM.
This patch introduce a new option PA_O_DEFAULT_DGRAM to force the
default to DGRAM type if it is not explicit in the address field
and both STREAM and DGRAM are supported. If only DGRAM or only STREAM
is supported, it continues to be considered as the default.
Willy Tarreau [Tue, 23 Mar 2021 07:58:22 +0000 (08:58 +0100)]
BUG/MEDIUM: freq_ctr/threads: use the global_now_ms variable
In commit a1ecbca0a ("BUG/MINOR: freq_ctr/threads: make use of the last
updated global time"), for period-based counters, the millisecond part
of the global_now variable was used as the date for the new period. But
it's wrong, it only works with sub-second periods as it wraps every
second, and for other periods the counters never rotate anymore.
Let's make use of the newly introduced global_now_ms variable instead,
which contains the global monotonic time expressed in milliseconds.
This patch needs to be backported wherever the patch above is backported.
It depends on previous commit "MINOR: time: also provide a global,
monotonic global_now_ms timer".
Willy Tarreau [Tue, 23 Mar 2021 07:45:42 +0000 (08:45 +0100)]
MINOR: time: also provide a global, monotonic global_now_ms timer
The period-based freq counters need the global date in milliseconds,
so better calculate it and expose it rather than letting all call
places incorrectly retrieve it.
Here what we do is that we maintain a new globally monotonic timer,
global_now_ms, which ought to be very close to the global_now one,
but maintains the monotonic approach of now_ms between all threads
in that global_now_ms is always ahead of any now_ms.
This patch is made simple to ease backporting (it will be needed for
a subsequent fix), but it also opens the way to some simplifications
on the time handling: instead of computing the local time and trying
to force it to the global one, we should soon be able to proceed in
the opposite way, that is computing the new global time an making the
local one just the latest snapshot of it. This will bring the benefit
of making sure that the global time is always ahead of the local one.
Willy Tarreau [Mon, 22 Mar 2021 19:54:15 +0000 (20:54 +0100)]
MINOR: pools: make the pool allocator support a few flags
The pool_alloc_dirty() function was renamed to __pool_alloc() and now
takes a set of flags indicating whether poisonning is permitted or not
and whether zeroing the area is needed or not. The pool_alloc() function
is now just a wrapper calling __pool_alloc(pool, 0).
Willy Tarreau [Mon, 22 Mar 2021 13:56:30 +0000 (14:56 +0100)]
CLEANUP: pools: remove the unused pool_get_first() function
This one used to maintain a shortcut in the pools allocation path that
was only justified by b_alloc_fast() which was not used! Let's get rid
of it as well so that the allocator becomes a bit more straight forward.
Willy Tarreau [Mon, 22 Mar 2021 13:53:56 +0000 (14:53 +0100)]
CLEANUP: dynbuf: remove the unused b_alloc_fast() function
It is never used anymore since 1.7 where it was used by b_alloc_margin()
then replaced by direct calls to the pools function, and it maintains a
dependency on the exposed pools functions. It's time to get rid of it,
as it's not even certain it still works.
Willy Tarreau [Mon, 22 Mar 2021 13:44:31 +0000 (14:44 +0100)]
MEDIUM: dynbuf: remove last usages of b_alloc_margin()
The function's purpose used to be to fail a buffer allocation if that
allocation wouldn't result in leaving some buffers available. Thus,
some allocations could succeed and others fail for the sole purpose of
trying to provide 2 buffers at once to process_stream(). But things
have changed a lot with 1.7 breaking the promise that process_stream()
would always succeed with only two buffers, and later the thread-local
pool caches that keep certain buffers available that are not accounted
for in the global pool so that local allocators cannot guess anything
from the number of currently available pools.
Let's just replace all last uses of b_alloc_margin() with b_alloc() once
for all.
Willy Tarreau [Mon, 22 Mar 2021 13:41:38 +0000 (14:41 +0100)]
MINOR: channel: simplify the channel's buffer allocation
The channel's buffer allocator, channel_alloc_buffer(), was still relying
on the principle of a margin for the request and not for the response.
But this margin stopped working around 1.7 with the introduction of the
content filters such as SPOE, and was completely anihilated with the
local pools that came with threads. Let's simplify this and just use
b_alloc().
Willy Tarreau [Mon, 22 Mar 2021 15:10:22 +0000 (16:10 +0100)]
MINOR: dynbuf: make b_alloc() always check if the buffer is allocated
Right now there is a discrepancy beteween b_alloc() and b_allow_margin():
the former forcefully overwrites the target pointer while the latter tests
it and returns it as-is if already allocated.
As a matter of fact, all callers of b_alloc() either preliminary test the
buffer, or assume it's already null.
Let's remove this pain and make the function test the buffer's allocation
before doing it again, and match call places' expectations.
Willy Tarreau [Mon, 22 Mar 2021 14:10:51 +0000 (15:10 +0100)]
MINOR: opentracing: use pool_alloc(), not pool_alloc_dirty()
pool_alloc_dirty() is the version below pool_alloc() that never performs
the memory poisonning. It should only be called directly for very large
unstructured areas for which enabling memory poisonning would not bring
anything but could significantly hurt performance (e.g. buffers). Using
this function here will not provide any benefit and will hurt the ability
to debug.
It would be desirable to backport this, although it does not cause any
user-visible bug, it just complicates debugging.
Willy Tarreau [Mon, 22 Mar 2021 14:09:41 +0000 (15:09 +0100)]
MINOR: ssl: use pool_alloc(), not pool_alloc_dirty()
pool_alloc_dirty() is the version below pool_alloc() that never performs
the memory poisonning. It should only be called directly for very large
unstructured areas for which enabling memory poisonning would not bring
anything but could significantly hurt performance (e.g. buffers). Using
this function here will not provide any benefit and will hurt the ability
to debug.
It would be desirable to backport this, although it does not cause any
user-visible bug, it just complicates debugging.
Willy Tarreau [Mon, 22 Mar 2021 14:00:49 +0000 (15:00 +0100)]
MINOR: cache: use pool_alloc(), not pool_alloc_dirty()
pool_alloc_dirty() is the version below pool_alloc() that never performs
the memory poisonning. It should only be called directly for very large
unstructured areas for which enabling memory poisonning would not bring
anything but could significantly hurt performance (e.g. buffers). Using
this function here will not provide any benefit and will hurt the ability
to debug.
It would be desirable to backport this, although it does not cause any
user-visible bug, it just complicates debugging.
Willy Tarreau [Mon, 22 Mar 2021 14:07:37 +0000 (15:07 +0100)]
MINOR: fcgi-app: use pool_alloc(), not pool_alloc_dirty()
pool_alloc_dirty() is the version below pool_alloc() that never performs
the memory poisonning. It should only be called directly for very large
unstructured areas for which enabling memory poisonning would not bring
anything but could significantly hurt performance (e.g. buffers). Using
this function here will not provide any benefit and will hurt the ability
to debug.
It would be desirable to backport this, although it does not cause any
user-visible bug, it just complicates debugging.
Willy Tarreau [Mon, 22 Mar 2021 14:05:54 +0000 (15:05 +0100)]
MINOR: spoe: use pool_alloc(), not pool_alloc_dirty()
pool_alloc_dirty() is the version below pool_alloc() that never performs
the memory poisonning. It should only be called directly for very large
unstructured areas for which enabling memory poisonning would not bring
anything but could significantly hurt performance (e.g. buffers). Using
this function here will not provide any real benefit, it only avoids the
area being poisonned before being zeroed. Ideally a pool_calloc() function
should be provided for this.
Willy Tarreau [Mon, 22 Mar 2021 14:08:17 +0000 (15:08 +0100)]
MINOR: compression: use pool_alloc(), not pool_alloc_dirty()
pool_alloc_dirty() is the version below pool_alloc() that never performs
the memory poisonning. It should only be called directly for very large
unstructured areas for which enabling memory poisonning would not bring
anything but could significantly hurt performance (e.g. buffers). Using
this function here will not provide any benefit and will hurt the ability
to debug.
It would be desirable to backport this, although it does not cause any
user-visible bug, it just complicates debugging.
Amaury Denoyelle [Mon, 22 Mar 2021 10:44:12 +0000 (11:44 +0100)]
REGTESTS: wait for proper return of enable server in cli add server test
Add an empty expect statement after the 'enable server' cli command.
This ensures that the command has been properly handled by haproxy and
its processing is over.
It should fix the unstable behavior of the test which causes reports of
503 even after the server has been enabled.
Willy Tarreau [Fri, 19 Mar 2021 16:16:18 +0000 (17:16 +0100)]
[RELEASE] Released version 2.4-dev13
Released version 2.4-dev13 with the following main changes :
- BUG/MEDIUM: cli: fix "help" crashing since recent spelling fixes
- BUG/MINOR: cfgparse: use the GLOBAL not LISTEN keywords list for spell checking
- MINOR: tools: improve word fingerprinting by counting presence
- MINOR: tools: do not sum squares of differences for word fingerprints
- MINOR: cli: improve fuzzy matching to work on all remaining words at once
- MINOR: cli: sort the suggestions by order of relevance
- MINOR: cli: limit spelling suggestions to 5
- MINOR: cfgparse/proxy: also support spelling fixes on options
- BUG/MINOR: resolvers: Add missing case-insensitive comparisons of DNS hostnames
- MINOR: time: export the global_now variable
- BUG/MINOR: freq_ctr/threads: make use of the last updated global time
- MINOR: freq_ctr/threads: relax when failing to update a sliding window value
- MINOR/BUG: mworker/cli: do not use the unix_bind prefix for the master CLI socket
- MINOR: mworker/cli: alert the user if we enabled a master CLI but not the master-worker mode
- MINOR: cli: implement experimental-mode
- REORG: server: add a free server function
- MINOR: cfgparse: always alloc idle conns task
- REORG: server: move keywords in srv_kws
- MINOR: server: remove fastinter from mistyped kw list
- REORG: server: split parse_server
- REORG: server: move alert traces in parse_server
- REORG: server: rename internal functions from parse_server
- REORG: server: attach servers in parse_server
- REORG: server: use flags for parse_server
- MINOR: server: prepare parsing for dynamic servers
- MINOR: stats: export function to allocate extra proxy counters
- MEDIUM: server: implement 'add server' cli command
- REGTESTS: implement test for 'add server' cli
- MINOR: server: enable standard options for dynamic servers
- MINOR: server: support keyword proto in 'add server' cli
- BUG/MINOR: protocol: add missing support of dgram unix socket.
- CLEANUP: Fix a typo in fix_is_valid description
- MINOR: raw_sock: Add a close method.
- MEDIUM: connections: Introduce a new XPRT method, start().
- MEDIUM: connections: Implement a start() method for xprt_handshake.
- MEDIUM: connections: Implement a start() method in ssl_sock.
- MINOR: muxes: garbage collect the reset() method.
- CLEANUP: tcp-rules: Fix a typo in error messages about expect-netscaler-cip
- MEDIUM: lua: Use a per-thread counter to track some non-reentrant parts of lua
- BUG/MEDIUM: debug/lua: Don't dump the lua stack if not dumpable
BUG/MEDIUM: debug/lua: Don't dump the lua stack if not dumpable
When we try to dump the stack of a lua context, if it is not dumpable,
nothing is performed and a message is emitted instead. This happens when a
lua execution was interrupted inside a non-reentrant part.
This patch depends on following commit :
* MEDIUM: lua: Use a per-thread counter to track some non-reentrant parts of lua
Thanks to this patch, we avoid a possible deadllock if the lua is
interrupted by the watchdog in the lua memory allocator, because realloc()
is not async-signal-safe.
MEDIUM: lua: Use a per-thread counter to track some non-reentrant parts of lua
Some parts of the Lua are non-reentrant. We must be sure to carefully track
these parts to not dump the lua stack when it is interrupted inside such
parts. For now, we only identified the custom lua allocator. If the thread
is interrupted during the memory allocation, we must not try to print the
lua stack wich also allocate memory. Indeed, realloc() is not
async-signal-safe.
In this patch we introduce a thread-local counter. It is incremented before
entering in a non-reentrant part and decremented when exiting. It is only
performed in hlua_alloc() for now.
CLEANUP: tcp-rules: Fix a typo in error messages about expect-netscaler-cip
It was misspelled (expect-netscaler-ip instead of expect-netscaler-cip). 2
commits are concerned :
* db67b0ed7 MINOR: tcp-rules: suggest approaching action names on mismatch
* 72d012fbd CLEANUP: tcp-rules: add missing actions in the tcp-request error message
The first one will not be backported, but the second one was backported as
far as 1.8. Thus this one may also be backported, but only the 2nd part
about the list of accepted keywords.
Olivier Houchard [Sat, 13 Mar 2021 23:39:49 +0000 (00:39 +0100)]
MINOR: muxes: garbage collect the reset() method.
Now that connections aren't being reused when they failed, remove the
reset() method. It was unimplemented anywhere, except for H1 where it did
nothing, anyway.
MEDIUM: connections: Implement a start() method in ssl_sock.
Add a start() method to ssl_sock. It is responsible with initiating the
SSL handshake, currently by just scheduling the tasklet, instead of doing
it in the init() method, when all the XPRT may not have been initialized.
MEDIUM: connections: Implement a start() method for xprt_handshake.
Add a start_method to xprt_handshake. It schedules the tasklet that does
the handshake. This used to be done in xprt_handshake_add_xprt(), but that's
a much better place.
MEDIUM: connections: Introduce a new XPRT method, start().
Introduce a new XPRT method, start(). The init() method will now only
initialize whatever is needed for the XPRT to run, but any action the XPRT
has to do before being ready, such as handshakes, will be done in the new
start() method. That way, we will be sure the full stack of xprt will be
initialized before attempting to do anything.
The init() call is also moved to conn_prepare(). There's no longer any reason
to wait for the ctrl to be ready, any action will be deferred until start(),
anyway. This means conn_xprt_init() is no longer needed.
Amaury Denoyelle [Fri, 12 Mar 2021 17:03:27 +0000 (18:03 +0100)]
MINOR: server: support keyword proto in 'add server' cli
Allow to specify the mux proto for a dynamic server. It must be
compatible with the backend mode to be accepted. The reg-tests has been
extended for this error case.
MINOR: server: enable standard options for dynamic servers
Enable a subset of server options to be used as keywords on the CLI
command 'add server'. These options are safe and can be applied
flawlessly for a dynamic server.
Amaury Denoyelle [Fri, 12 Mar 2021 09:45:12 +0000 (10:45 +0100)]
REGTESTS: implement test for 'add server' cli
Write a regtest for the cli command 'add server'. This test will execute
some invalid commands and validates the reported error. A client will
then try to connect to a dynamic server just created and activated.
Add a new cli command 'add server'. This command is used to create a new
server at runtime attached on an existing backend. The syntax is the
following one :
$ add server <be_name>/<sv_name> [<kws>...]
This command is only available through experimental mode for the moment.
Currently, no server keywords are supported. They will be activated
individually when deemed properly functional and safe.
Another limitation is put on the backend load-balancing algorithm. The
algorithm must use consistent hashing to guarantee a minimal
reallocation of existing connections on the new server insertion.
Amaury Denoyelle [Thu, 18 Mar 2021 14:48:14 +0000 (15:48 +0100)]
MINOR: stats: export function to allocate extra proxy counters
Remove static qualifier on stats_allocate_proxy_counters_internal. This
function will be used to allocate extra counters at runtime for dynamic
servers.
MINOR: server: prepare parsing for dynamic servers
Prepare the server parsing API to support dynamic servers.
- define a new parsing flag to be used for dynamic servers
- each keyword contains a new field dynamic_ok to indicate if it can be
used for a dynamic server. For now, no keyword are supported.
- do not copy settings from the default server for a new dynamic server.
- a dynamic server is created in a maintenance mode and requires an
explicit 'enable server' command.
- a new server flag named SRV_F_DYNAMIC is created. This flag is set for
all servers created at runtime. It might be useful later, for example
to know if a server can be purged.
Modify the API of parse_server function. Use flags to describe the type
of the parsed server instead of discrete arguments. These flags can be
used to specify if a server/default-server/server-template is parsed.
Additional parameters are also specified (parsing of the address
required, resolve of a name must be done immediately).
It is now unneeded to use strcmp on args[0] in parse_server. Also, the
calls to parse_server are more explicit thanks to the flags.
Move server linked into proxy backend list outside of _srv_parse_init to
parse_server.
This is groundwork for dynamic servers support. There will be two
differences in case of a dynamic server :
- the server will be attached to the proxy list only at the very end of the
operations when everything is ok
- the server will be directly attached to the end of the server proxy
list
Move every ha_alert calls in parsing functions into parse_server.
Parsing functions now support a pointer-to-string argument which will be
allocated with an error message if needed via memprintf.
parse_server has then the responsibility to display errors with ha_alert.
This is groundwork for dynamic server. No traces should be printed on
stderr as a response to a cli command. cli_err will replace ha_alert in
this case.
The huge parse_server function is splitted into two smaller ones.
* _srv_parse_init allocates a new server instance and parses the address
parameter
* _srv_parse_kw parse the current server keyword
This simplify a bit the parse_server function. Besides, it will be
useful for dynamic server creation.