MINOR: stream: Pass an optional input buffer when a stream is created
It is now possible to set the buffer used by the channel request buffer when
a stream is created. It may be useful if input data are already received,
instead of waiting the first call to the mux rcv_buf() callback. This change
is mandatory to support H1 connection with no stream attached.
For now, the multiplexers don't pass any buffer. BUF_NULL is thus used to
call stream_create_from_cs().
MINOR: muxes: Remove get_cs_info callback function now useless
This callback function was only defined by the mux-h1. But it has been
removed in the previous commit because it is unused now. So, we can do a
step forward removing the callback function from the mux definition and the
cs_info structure.
MINOR: stream: Don't retrieve anymore timing info from the mux csinfo
These info are only provided by the mux-h1. But, thanks to previous patches,
we can get them from the session directly. There is no need to retrieve them
from the mux anymore.
MINOR: stream: Always get idle duration from the session
Since the idle duration provided by the session is always up-to-date, there
is no more reason to rely on the multiplexer cs_info to set it to the
stream.
MINOR: logs: Use session idle duration when no stream is provided
When a log message is emitted from the session, using sess_log() function,
there is no stream available. In this case, instead of deducing the idle
duration from the accept date, we use the one provided by the session. 0 is
used if it is undefined (i.e set to -1).
MINOR: mux-h1: Reset session dates and durations info when the CS is detached
These info are reset for the next transaction, if the connection is kept
alive. From the stream point of view, it should be the same a new
connection, except there is no handshake. Thus the handshake duration is set
to 0.
MINOR: session: Add the idle duration field into the session
The idle duration between two streams is added to the session structure. It
is not necessarily pertinent on all protocols. In fact, it is only defined
for H1 connections. It is the duration between two H1 transactions. But the
.get_cs_info() callback function on the multiplexers only exists because
this duration is missing at the session level. So it is a simplification
opportunity for a really low cost.
To reduce the cost, a hole in the session structure is filled by moving
.srv_list field at the end of the structure.
BUG/MINOR: mux-h1: Handle keep-alive timeout for idle frontend connections
IDLE frontend connections have no stream attached. The stream is only
created when new data are received, when the parsing of the next request
starts. Thus the keep-alive timeout, handled into the HTTP analysers, is not
considered while nothing is received. But this is especially when this
timeout must be considered. Concretely the http-keep-alive is ignored while
no data are received. Only the client timeout is used. It will only be
considered on incomplete requests, if the http-request timeout is not set.
To fix the bug, the http-keep-alive timeout must be handled at the mux
level, for IDLE frontend connection only.
This patch should fix the issue #984. It must be backported as far as
2.2. On prior versions, the stream is created earlier. So, it is not a
problem, except if this behavior changes of course (it was an optim of the
2.2, but don't remember the commit).
Willy Tarreau [Fri, 4 Dec 2020 13:28:23 +0000 (14:28 +0100)]
BUG/MINOR: listener: use sockaddr_in6 for IPv6
A copy-paste bug between {tcp,udp}{4,6}_add_listener() resulted in
using a struct sockaddr_in to set the TCP/UDP port while it ought to
be a struct sockaddr_in6. Fortunately, the port has the same offset
(2) in both so it was harmless. A cleaner way to proceed would be
to have a set_port function exported by the address family layer.
Willy Tarreau [Fri, 4 Dec 2020 10:48:12 +0000 (11:48 +0100)]
BUG/MINOR: lua-thread: close all states on deinit
It seems to me that lua_close() must be called on all states at deinit
time, not just the first two ones. This is likely a remnant of commit 59f11be43 ("MEDIUM: lua-thread: Add the lua-load-per-thread directive").
There should likely be some memory leak reports when using Lua without
this fix, though none were observed for now.
No backport is needed as this was merged into 2.4-dev.
BUG/MEDIUM: lua-thread: some parts must be initialized once
Lua dedicated TCP, HTTP and SSL socket and proxies must be initialized
once. Right now, they are initialized from the Lua init state, but since
commit 59f11be43 ("MEDIUM: lua-thread: Add the lua-load-per-thread
directive") this function is called one time per lua context. This
caused some fields to be cleared and overwritten, and pre-allocated
object to be lost. This is why the address sanitizer detected memory
leaks from the socket_ssl server initialization.
Let's move all the state-independent part of the function to the
hlua_init() function to avoid this.
MEDIUM: cache: Remove cache entry in case of POST on the same resource
In case of successful unsafe method on a stored resource, the cached entry
must be invalidated (see RFC7234#4.4).
A "non-error response" is one with a 2xx (Successful) or 3xx (Redirection)
status code.
This implies that the primary hash must now be calculated on requests
that have an unsafe method (POST or PUT for instance) so that we can
disable the corresponding entries when we process the response.
MINOR: cache: Add extra "cache-control" value checks
The Cache-Control max-age and s-maxage directives should be followed by
a positive numerical value (see RFC 7234#5.2.1.1). According to the
specs, a sender "should not" generate a quoted-string value but we will
still accept this format.
When a response has an Age header (filled in by another cache on the
message's path) that is greater than its defined maximum age (extracted
either from cache-control directives or an expires header), it is
already stale and should not be cached.
Willy Tarreau [Wed, 2 Dec 2020 15:44:34 +0000 (16:44 +0100)]
REGTESTS: add a test for the threaded Lua code
This is simply txn_get_priv.vtc with the loading made using
lua-load-per-thread, allowing all threads to run independently.
There's nothing global nor thread-specific in this test, which makes
an excellent example of something that should work out of the box.
Four concurrent clients are initialized with 4 loops each so as to
give a little chance to various threads to run concurrently.
Thierry Fournier [Sat, 28 Nov 2020 23:37:41 +0000 (00:37 +0100)]
MEDIUM: lua-thread: Add the lua-load-per-thread directive
The goal is to allow execution of one main lua state per thread.
This patch contains the main job. The lua init is done using these
steps:
- "lua-load-per-thread" loads the lua code in the first thread
- it creates the structs
- it stores loaded files
- the 1st step load is completed (execution of hlua_post_init)
and now, we known the number of threads
- we initilize lua states for all remaining threads
- for each one, we load the lua file
- for each one, we execute post-init
Once all is loaded, we control consistency of functions references.
The rules are:
- a function reference cannot be in the shared lua state and in
a per-thread lua state at the same time.
- if a function reference is declared in a per-thread lua state, it
must be declared in all per-thread lua states
Thierry Fournier [Sat, 28 Nov 2020 22:57:24 +0000 (23:57 +0100)]
MINOR: lua-thread: Store each function reference and init reference in array
The goal is to allow execution of one main lua state per thread.
The array introduces storage of one reference per thread, because each
lua state can have different reference id for a same function. A function
returns the preferred state id according to configuration and current
thread id.
Thierry Fournier [Sat, 28 Nov 2020 22:42:03 +0000 (23:42 +0100)]
MINOR: lua-thread: Replace state_from by state_id
The goal is to allow execution of one main lua state per thread.
"state_from" is a pointer to the parent lua state. "state_id"
is the index of the parent state id in the reference lua states
array. "state_id" is better because the lock is a "== 0" test
which is quick than pointer comparison. In other way, the state_id
index could index other things the the Lua state concerned. I
think to the function references.
Thierry Fournier [Sat, 28 Nov 2020 20:06:35 +0000 (21:06 +0100)]
MINOR: lua-thread: Replace "struct hlua_function" allocation by dedicated function
The goal is to allow execution of one main lua state per thread.
This function will initialize the struct with other things than 0.
With this function helper, the initialization is centralized and
it prevents mistakes. This patch also keeps a reference to each
declared function in a list. It will be useful in next patches to
control consistency of declared references.
Thierry Fournier [Sat, 28 Nov 2020 16:06:51 +0000 (17:06 +0100)]
MINOR: lua-thread: Replace global gL var with an array of states
The goal is to allow execution of one main lua state per thread.
The array of states is initialized at the max number of thread +1.
We define the index 0 is the common state shared by all threads
and should be locked. Other index index are dedicated to each
one thread. The old gL now becomes hlua_states[0].
Thierry Fournier [Sat, 28 Nov 2020 16:02:21 +0000 (17:02 +0100)]
MEDIUM: lua-thread: Apply lock only if the parent state is the main thread
The goal is to allow execution of one main lua state per thread.
This patch opens the way to addition of a per-thread dedicated lua state.
By passing the hlua we can figure the original state that's been used
and decide to lock or not.
Thierry Fournier [Sat, 28 Nov 2020 15:05:05 +0000 (16:05 +0100)]
MEDIUM: lua-thread: No longer use locked context in initialization parts
The goal is to allow execution of one main lua state per thread.
Stop using locks in init part, we will use only in parts where
the parent lua state is known, so we could take decision about lock
according with the lua parent state.
Thierry Fournier [Sat, 28 Nov 2020 14:49:44 +0000 (15:49 +0100)]
MINOR: lua-thread: Add the "thread" core variable
The goal is to allow execution of one main lua state per thread.
This commit introduces this variable in the core. Lua state initialized
by thread will have access to this variable, which reports the executing
thread. 0 indicates the shared thread. Programs which must be executed
only once can check for core.thread <= 1.
Thierry Fournier [Sat, 28 Nov 2020 12:18:56 +0000 (13:18 +0100)]
MINOR: lua-thread: make hlua_ctx_init() get L from its caller
The goal is to allow execution of one main lua state per thread.
The function hlua_ctx_init() now gets the original lua state from
its caller. This allows the initialisation of lua_thread (coroutines)
from any master lua state.
The parent lua state is stored in the hlua struct.
This patch is a temporary transition, it will be modified later.
Thierry Fournier [Sun, 29 Nov 2020 01:05:57 +0000 (02:05 +0100)]
MINOR: lua-thread: Replace embedded struct hlua_function by a pointer
The goal is to allow execution of one main lua state per thread.
Because this struct will be filled after the configuration parser, we
cannot copy the content. The actual state of the Haproxy code doesn't
justify this change, it is an update preparing next steps.
Thierry Fournier [Sat, 28 Nov 2020 12:18:23 +0000 (13:18 +0100)]
MINOR: lua-thread: Use NULL context for main lua state
The goal is to no longer use "struct hlua" with global main lua_state.
This patch returns NULL value when some code tries go get the hlua struct
associated with a task through hlua_gethlua(). This functions is useful
only during runtime because the struct hlua contains only runtime states.
Some Lua functions allowed to yield are called from init environment.
I'm not sure this is a good practice. Maybe it will be clever to
disallow calling this kind of functions.
Thierry Fournier [Sat, 28 Nov 2020 10:15:14 +0000 (11:15 +0100)]
MINOR: lua-thread: hlua_ctx_renew() is never called with main gL lua state
The goal is no longer using "struct hlua" with global main lua_state.
if somewhere in the code, hlua_ctx_renew() is called with a global Lua
context, we have a serious bug. A crash is better than working with
this bug, so this patch remove a useless control.
In other way, this control were used during hlua_post_init() function.
The function hlua_post_init() used a call to the runtime hlua_ctx_resume()
function. This call no longer exists.
Thierry Fournier [Sat, 28 Nov 2020 09:49:59 +0000 (10:49 +0100)]
MEDIUM: lua-thread: make hlua_post_init() no longer use the runtime execution function
The goal is to no longer use "struct hlua" with global main lua_state.
The hlua_post_init() is executed during start phase, it does not require
yielding nor any advanced runtime error processing. Let's simplify this
by re-implementing the code using lower-level functions which directly
take a state and not an hlua anymore.
Willy Tarreau [Wed, 2 Dec 2020 11:12:00 +0000 (12:12 +0100)]
MEDIUM: lua-thread: use atomics for memory accounting
Let's switch memory accounting to atomics so that the allocator function
may safely be used from concurrent Lua states.
Given that this function is extremely hot on the call path, we try to
optimize it for the most common case, which is:
- no limit
- there's enough memory
The accounting is what is particuarly expensive in threads since all
CPUs compete for a cache line, so when the limit is not used, we don't
want to use accounting. However we need to preserve it during the boot
phase until we may parse a "tune.lua.maxmem" value. For this, we turn
the unlimited "0" value to ~0 at the end of the boot phase to mark the
definite end of accounting. The function then detects this value and
directly jumps to realloc() in this case.
When the limit is enforced however, we use a CAS to check and reserve
our share of memory, and we roll back on failure. The CAS is used both
for increments and decrements so that a single operation is enough to
update the counters.
Willy Tarreau [Wed, 2 Dec 2020 11:26:29 +0000 (12:26 +0100)]
MINOR: lua: simplify hlua_alloc() to only rely on realloc()
The function really has the semantics of a realloc() except that it
also passes the old size to help with accounting. No need to special
case the free or malloc, realloc does everything we need.
Emeric Brun [Wed, 2 Dec 2020 16:02:09 +0000 (17:02 +0100)]
BUG/MAJOR: ring: tcp forward on ring can break the reader counter.
If the session is not established, the applet handler could leave
with the applet detached from the ring. At next call, the attach
counter will be decreased again causing unpredectable behavior.
With commit a1f12746b ("MINOR: traces: add a new level "error" below
the "user" level") a new trace level was inserted, resulting in
shifting all exiting ones by one. But the levels reported in the
__trace() function were not updated accordingly, resulting in the
TRACE_LEVEL_DEVELOPER not to be properly reported anymore. This
patch fixes it by extending the number of levels to 6.
MINOR: cache: Add entry to the tree as soon as possible
When many concurrent requests targeting the same resource were seen, the
cache could sometimes be filled by too many partial responses resulting
in the impossibility to cache a single one of them. This happened
because the actual tree insertion happened only after all the payload of
every response was seen. So until then, every response was added to the
cache because none of the streams knew that a similar request/response
was already being treated.
This patch consists in adding the cache_entry as soon as possible in the
tree (right after the first packet) so that the other responses do not
get cached as well (if they have the same primary key).
A "complete" flag is also added to the cache_entry so that we know if
all the payload is already stored in the entry or if it is still being
processed.
Turn the "Accept-Encoding" value to lower case before processing it.
Calculate the CRC on every token instead of a sorted concatenation of
them all (in order to avoir copying them) then XOR all the CRCs into a
single hash (while ignoring duplicates).
Thierry Fournier [Sat, 28 Nov 2020 19:41:07 +0000 (20:41 +0100)]
BUG/MINOR: lua: warn when registering action, conv, sf, cli or applet multiple times
Lua allows registering multiple sample-fetches, converters, action, cli,
applet/services with the same name. This is absolutely useless since only
the first registration will be used. This patch sends a warning if the case
is encountered.
This pach could be backported until 1.8, with the 3 associated patches:
- MINOR: actions: Export actions lookup functions
- MINOR: actions: add a function returning a service pointer from its name
- MINOR: cli: add a function to look up a CLI service description
Thierry Fournier [Sat, 28 Nov 2020 18:32:14 +0000 (19:32 +0100)]
MINOR: actions: add a function returning a service pointer from its name
This function simply calls action_lookup() on the private service_keywords,
to look up a service name. This will be used to detect double registration
of a same service from Lua.
This will be needed by a next patch to fix a bug and will have to be
backported.
Thierry Fournier [Sat, 28 Nov 2020 16:40:24 +0000 (17:40 +0100)]
MINOR: actions: Export actions lookup functions
These functions will be useful to check if a keyword is already registered.
This will be needed by a next patch to fix a bug, and will need to be
backported.
Thierry Fournier [Sat, 28 Nov 2020 15:08:02 +0000 (16:08 +0100)]
BUG/MINOR: lua: Some lua init operation are processed unsafe
Operation luaL_openlibs() and lua_prepend path are processed whithout
the safe context, so in case of failure Haproxy aborts or stops without
error message.
Thierry Fournier [Sun, 29 Nov 2020 00:06:24 +0000 (01:06 +0100)]
BUG/MINOR: lua: lua-load doesn't check its parameters
"lua-load" doesn't check if the expected parameter is present. It tries to
open() directly the argument at second position. So if the filename is
omitted, it tries to load an empty filename.
Willy Tarreau [Tue, 1 Dec 2020 09:47:18 +0000 (10:47 +0100)]
BUG/MINOR: mux-h2/stats: not all GOAWAY frames are errors
The stats on haproxy.org reported ~12k GOAWAY for ~34k connections, with
only 2 protocol errorss. It turns out that the GOAWAY frame counter added
in commit a8879238c ("MINOR: mux-h2: report detected error on stats")
matches a bit too many situations. First it counts those which are not
sent as well as failed retries, second it counts as errors the cases of
attempts to cleanly close, while it's titled "GOAWAY sent on detected
error". Let's address this by moving the counter up one line and excluding
the clean codes.
Willy Tarreau [Tue, 1 Dec 2020 09:24:29 +0000 (10:24 +0100)]
MINOR: mux-h2/trace: add traces at level ERROR for protocol errors
A number of traces could be added, and a few TRACE_PROTO were replaced
with TRACE_ERROR. The goal is to be able to enable error tracing only
to detect anomalies.
It looks like they're mostly correct as they don't seem to strike on
valid H2 traffic but are very verbose on h2spec.
Willy Tarreau [Tue, 1 Dec 2020 08:46:46 +0000 (09:46 +0100)]
MINOR: traces: add a new level "error" below the "user" level
Sometimes it would be nice to be able to only trace abnormal events such
as protocol errors. Let's add a new "error" level below the "user" level
for this. This will allow to add TRACE_ERROR() at various error points
and only see them.
Willy Tarreau [Tue, 1 Dec 2020 09:22:43 +0000 (10:22 +0100)]
BUG/MINOR: mux-h2/stats: make stream/connection proto errors more accurate
Since commit a8879238c ("MINOR: mux-h2: report detected error on stats")
we now have some error stats on stream/connection level protocol errors,
but some were improperly marked as stream while they're connection, and
2 or 3 relevant ones were missing and have now been added.
Willy Tarreau [Tue, 1 Dec 2020 07:15:26 +0000 (08:15 +0100)]
[RELEASE] Released version 2.4-dev2
Released version 2.4-dev2 with the following main changes :
- BUILD: Make DEBUG part of .build_opts
- BUILD: Show the value of DEBUG= in haproxy -vv
- CI: Set DEBUG=-DDEBUG_STRICT=1 in GitHub Actions
- MINOR: stream: Add level 7 retries on http error 401, 403
- CLEANUP: remove unused function "ssl_sock_is_ckch_valid"
- BUILD: SSL: add BoringSSL guarding to "RAND_keep_random_devices_open"
- BUILD: SSL: do not "update" BoringSSL version equivalent anymore
- BUG/MEDIUM: http_act: Restore init of log-format list
- DOC: better describes how to configure a fallback crt
- BUG/MAJOR: filters: Always keep all offsets up to date during data filtering
- MINOR: cache: Prepare helper functions for Vary support
- MEDIUM: cache: Add the Vary header support
- MINOR: cache: Add a process-vary option that can enable/disable Vary processing
- BUG/CRITICAL: cache: Fix trivial crash by sending accept-encoding header
- BUG/MAJOR: peers: fix partial message decoding
- DOC: cache: Add new caching limitation information
- DOC: cache: Add information about Vary support
- DOC: better document the config file format and escaping/quoting rules
- DOC: Clarify %HP description in log-format
- CI: github actions: update LibreSSL to 3.3.0
- CI: github actions: enable 51degrees feature
- MINOR: fd/threads: silence a build warning with threads disabled
- BUG/MINOR: tcpcheck: Don't forget to reset tcp-check flags on new kind of check
- MINOR: tcpcheck: Don't handle anymore in-progress send rules in tcpcheck_main
- BUG/MAJOR: tcpcheck: Allocate input and output buffers from the buffer pool
- MINOR: tcpcheck: Don't handle anymore in-progress connect rules in tcpcheck_main
- MINOR: config: Deprecate and ignore tune.chksize global option
- MINOR: config: Add a warning if tune.chksize is used
- REORG: tcpcheck: Move check option parsing functions based on tcp-check
- MINOR: check: Always increment check health counter on CONPASS
- MINOR: tcpcheck: Add support of L7OKC on expect rules error-status argument
- DOC: config: Make disable-on-404 option clearer on transition conditions
- DOC: config: Move req.hdrs and req.hdrs_bin in L7 samples fetches section
- BUG/MINOR: http-fetch: Fix smp_fetch_body() when called from a health-check
- MINOR: plock: use an ARMv8 instruction barrier for the pause instruction
- MINOR: debug: add "debug dev sched" to stress the scheduler.
- MINOR: debug: add a trivial PRNG for scheduler stress-tests
- BUG/MEDIUM: lists: Lock the element while we check if it is in a list.
- MINOR: task: remove tasklet_insert_into_tasklet_list()
- MINOR: task: perform atomic counter increments only once per wakeup
- MINOR: task: remove __tasklet_remove_from_tasklet_list()
- BUG/MEDIUM: task: close a possible data race condition on a tasklet's list link
- BUG/MEDIUM: local log format regression.
Emeric Brun [Fri, 27 Nov 2020 15:24:34 +0000 (16:24 +0100)]
BUG/MEDIUM: local log format regression.
Since 2.3 default local log format always adds hostame field.
This behavior change was due to log/sink re-work, because according
to rfc3164 the hostname field is mandatory.
This patch re-introduce a legacy "local" format which is analog
to rfc3164 but with hostname stripped. This is the new
default if logs are generated by haproxy.
To stay compliant with previous configurations, the option
"log-send-hostname" acts as if the default format is switched
to rfc3164.
This patch addresses the github issue #963
This patch should be backported in branches >= 2.3.
Willy Tarreau [Mon, 30 Nov 2020 13:58:53 +0000 (14:58 +0100)]
BUG/MEDIUM: task: close a possible data race condition on a tasklet's list link
In issue #958 Ashley Penney reported intermittent crashes on AWS's ARM
nodes which would not happen on x86 nodes. After investigation it turned
out that the Neoverse N1 CPU cores used in the Graviton2 CPU are much
more aggressive than the usual Cortex A53/A72/A55 or any x86 regarding
memory ordering.
The issue that was triggered there is that if a tasklet_wakeup() call
is made on a tasklet scheduled to run on a foreign thread and that
tasklet is just being dequeued to be processed, there can be a race at
two places:
- if MT_LIST_TRY_ADDQ() happens between MT_LIST_BEHEAD() and
LIST_SPLICE_END_DETACHED() if the tasklet is alone in the list,
because the emptiness tests matches ;
- if MT_LIST_TRY_ADDQ() happens during LIST_DEL_INIT() in
run_tasks_from_lists(), then depending on how LIST_DEL_INIT() ends
up being implemented, it may even corrupt the adjacent nodes while
they're being reused for the in-tree storage.
This issue was introduced in 2.2 when support for waking up remote
tasklets was added. Initially the attachment of a tasklet to a list
was enough to know its status and this used to be stable information.
Now it's not sufficient to rely on this anymore, thus we need to use
a different information.
This patch solves this by adding a new task flag, TASK_IN_LIST, which
is atomically set before attaching a tasklet to a list, and is only
removed after the tasklet is detached from a list. It is checked
by tasklet_wakeup_on() so that it may only be done while the tasklet
is out of any list, and is cleared during the state switch when calling
the tasklet. Note that the flag is not set for pure tasks as it's not
needed.
However this introduces a new special case: the function
tasklet_remove_from_tasklet_list() needs to keep both states in sync
and cannot check both the state and the attachment to a list at the
same time. This function is already limited to being used by the thread
owning the tasklet, so in this case the test remains reliable. However,
just like its predecessors, this function is wrong by design and it
should probably be replaced with a stricter one, a lazy one, or be
totally removed (it's only used in checks to avoid calling a possibly
scheduled event, and when freeing a tasklet). Regardless, for now the
function exists so the flag is removed only if the deletion could be
done, which covers all cases we're interested in regarding the insertion.
This removal is safe against a concurrent tasklet_wakeup_on() since
MT_LIST_DEL() guarantees the atomic test, and will ultimately clear
the flag only if the task could be deleted, so the flag will always
reflect the last state.
This should be carefully be backported as far as 2.2 after some
observation period. This patch depends on previous patch
"MINOR: task: remove __tasklet_remove_from_tasklet_list()".
This function is only used at a single place directly within the
scheduler in run_tasks_from_lists() and it really ought not be called
by anything else, regardless of what its comment says. Let's delete
it, move the two lines directly into the call place, and take this
opportunity to factor the atomic decrement on tasks_run_queue. A comment
was added on the remaining one tasklet_remove_from_tasklet_list() to
mention the risks in using it.
Willy Tarreau [Mon, 30 Nov 2020 14:39:00 +0000 (15:39 +0100)]
MINOR: task: perform atomic counter increments only once per wakeup
In process_runnable_tasks(), we walk the run queue and pick tasks to
insert them into the local list. And for each of these operations we
perform a few increments, some of which are atomic, and they're even
performed under the runqueue's lock. This is useless inside the loop,
better do them at the end, since we don't use these values inside the
loop and they're not used anywhere else either during this time. The
only one is task_list_size which is accessed in parallel by other
threads performing remote tasklet wakeups, but it's already
approximative and is used to decide to get out of the loop when the
limit is reached. So now we compute it first as an initial budget
instead.
This function is only called at a single place and adds more confusion
than it removes. It also makes one think it could be used outside of
the scheduler while it must absolutely not. Let's just move its two
lines to the call place, making the code more readable there. In
addition this clearly shows that the preliminary LIST_INIT() is
useless since the entry is immediately overwritten.
Olivier Houchard [Wed, 25 Nov 2020 19:38:00 +0000 (20:38 +0100)]
BUG/MEDIUM: lists: Lock the element while we check if it is in a list.
In MT_LIST_TRY_ADDQ() and MT_LIST_TRY_ADD() we can't just check if the
element is already in a list, because there's a small race condition, it
could be added between the time we checked, and the time we actually set
its next and prev, so we have to lock it first.
Willy Tarreau [Mon, 30 Nov 2020 15:17:33 +0000 (16:17 +0100)]
MINOR: debug: add a trivial PRNG for scheduler stress-tests
Commit a5a447984 ("MINOR: debug: add "debug dev sched" to stress the
scheduler.") doesn't scale with threads because ha_random64() takes care
of being totally thread-safe for use with UUIDs. We don't need this for
the stress-testing functions, let's just implement a xorshift PRNG
instead. On 8 threads the performance jumped from 230k ctx/s with 96%
spent in ha_random64() to 14M ctx/s.
Willy Tarreau [Sun, 29 Nov 2020 16:12:15 +0000 (17:12 +0100)]
MINOR: debug: add "debug dev sched" to stress the scheduler.
This command supports starting a bunch of tasks or tasklets, either on the
current thread (mask=0), all (default), or any set, either single-threaded
or multi-threaded, and possibly auto-scheduled.
These tasks/tasklets will randomly pick another one to wake it up. The
tasks only do it 50% of the time while tasklets always wake two tasks up,
in order to achieve roughly 50% load (since the target might already be
woken up).
Your Name [Sat, 28 Nov 2020 15:37:14 +0000 (15:37 +0000)]
MINOR: plock: use an ARMv8 instruction barrier for the pause instruction
As suggested by @AGSaidi in issue #958, on ARMv8 its convenient to use
an "isb" instruction in pl_cpu_relax() to improve fairness. Without it
I've met a few watchdog conditions on valid locks with 16 threads,
indicating that some threads couldn't manage to get it in 2 seconds. I
never happened again with it. In addition, the performance increased
by slightly more than 5% thanks to the reduced contention.
This should be backported as far as 2.2, possibly even 2.0.
BUG/MINOR: http-fetch: Fix smp_fetch_body() when called from a health-check
res.body may be called from a health-check. It is probably never used. But it is
possibe. In such case, there is no channel. Thus we must not use it
unconditionally to set the flag SMP_F_MAY_CHANGE on the smp.
Now the condition test the channel first. In addtion, the flag is not set if the
payload is fully received.
MINOR: tcpcheck: Add support of L7OKC on expect rules error-status argument
L7OKC may now be used as an error status for an HTTP/TCP expect rule. Thus
it is for instance possible to write:
option httpchk GET /isalive
http-check expect status 200,404
http-check expect status 200 error-status L7OKC
It is more or less the same than the disable-on-404 option except that if a
DOWN is up again but still replying a 404 will be set to NOLB state. While
it will stay in DOWN state with the disable-on-404 option.
MINOR: check: Always increment check health counter on CONPASS
Regarding the health counter, a check finished with the CONDPASS result is
now the same than with the PASSED result: The health counter is always
incemented. Before, it was only performed is the health counter was not 0.
There is no change for the disable-on-404 option because it is only
evaluated for running or stopping servers. So with an health check counter
greater than 0. But it will make possible to handle (STOPPED -> STOPPING)
transition for servers.
REORG: tcpcheck: Move check option parsing functions based on tcp-check
The parsing of the check options based on tcp-check rules (redis, spop,
smtp, http...) are moved aways from check.c. Now, these functions are placed
in tcpcheck.c. These functions are only related to the tcpcheck ruleset
configured on a proxy and not to the health-check attached to a server.
MINOR: config: Add a warning if tune.chksize is used
This option is now deprecated. It is recent, but it is now marked as
deprecated as far as 2.2. Thus, there is now a warning in the 2.4 if this
option is still used. It will be removed in 2.5.
Becaue the 2.3 is quite new, this patch may be backported to 2.3.
MINOR: config: Deprecate and ignore tune.chksize global option
This option is now ignored because I/O check buffers are now allocated using the
buffer pool. Thus, it is marked as deprecated in the documentation and ignored
during the configuration parsing. The field is also removed from the global
structure.
Because this option is ignored since a recent fix, backported as fare as 2.2,
this patch should be backported too. Especially because it updates the
documentation.
MINOR: tcpcheck: Don't handle anymore in-progress connect rules in tcpcheck_main
The special handling of in-progress connect rules at the begining of
tcpcheck_main() function can be removed. Instead, at the begining of the
tcpcheck_eval_connect() function, we test is there is already an existing
connection. In this case, it means we are waiting for a connection
establishment. In addition, before evaluating a new connect rule, we take
care to release any previous connection.
BUG/MAJOR: tcpcheck: Allocate input and output buffers from the buffer pool
Historically, the input and output buffers of a check are allocated by hand
during the startup, with a specific size (not necessarily the same than
other buffers). But since the recent refactoring of the checks to rely
exclusively on the tcp-checks and to use the underlying mux layer, this part
is totally buggy. Indeed, because these buffers are now passed to a mux,
they maybe be swapped if a zero-copy is possible. In fact, for now it is
only possible in h2_rcv_buf(). Thus the bug concretely only exists if a h2
health-check is performed. But, it is a latent bug for other muxes.
Another problem is the size of these buffers. because it may differ for the
other buffer size, it might be source of bugs.
Finally, for configurations with hundreds of thousands of servers, having 2
buffers per check always allocated may be an issue.
To fix the bug, we now allocate these buffers when required using the buffer
pool. Thus not-running checks don't waste memory and muxes may swap them if
possible. The only drawback is the check buffers have now always the same
size than buffers used by the streams. This deprecates indirectly the
"tune.chksize" global option.
In addition, the http-check regtest have been update to perform some h2
health-checks.
Many thanks to @VigneshSP94 for its help on this bug.
This patch should solve the issue #936. It relies on the commit "MINOR:
tcpcheck: Don't handle anymore in-progress send rules in tcpcheck_main".
Both must be backport as far as 2.2.
MINOR: tcpcheck: Don't handle anymore in-progress send rules in tcpcheck_main
The special handling of in-progress send rules at the begining of
tcpcheck_main() function can be removed. Instead, at the begining of the
tcpcheck_eval_send() function, we test is there is some data in the output
buffer. In this case, it means we are evaluating an unfinished send rule and
we can jump to the sending part, skipping the formatting part.
This patch is mandatory for a major fix on the checks and must be backported
as far as 2.2.
BUG/MINOR: tcpcheck: Don't forget to reset tcp-check flags on new kind of check
When a new kind of check is found during the parsing of a proxy section (via
an option directive), we must reset tcpcheck flags for this proxy. It is
mandatory to not inherit some flags from a previously declared check (for
instance in the default section).
Willy Tarreau [Thu, 26 Nov 2020 21:25:10 +0000 (22:25 +0100)]
MINOR: fd/threads: silence a build warning with threads disabled
Building with gcc-9.3.0 without threads may result in this warning:
In file included from include/haproxy/api-t.h:36,
from include/haproxy/api.h:33,
from src/fd.c:90:
src/fd.c: In function 'updt_fd_polling':
include/haproxy/fd.h:507:11: warning: array subscript 63 is above array bounds of 'int[1]' [-Warray-bounds]
507 | DISGUISE(write(poller_wr_pipe[tid], &c, 1));
include/haproxy/compiler.h:92:41: note: in definition of macro 'DISGUISE'
92 | #define DISGUISE(v) ({ typeof(v) __v = (v); ALREADY_CHECKED(__v); __v; })
| ^
src/fd.c:113:5: note: while referencing 'poller_wr_pipe'
113 | int poller_wr_pipe[MAX_THREADS]; // Pipe to wake the threads
| ^~~~~~~~~~~~~~
gcc is wrong but this time it cannot be blamed because it doesn't know
that the FD's thread_mask always has at least one bit set. Let's add
the test for all_threads_mask there. It will also remove that test and
drop the else block.
Maciej Zdeb [Thu, 26 Nov 2020 10:45:52 +0000 (10:45 +0000)]
DOC: Clarify %HP description in log-format
%HP is used to report HTTP request URI in logs, which might be relative
or absolute. Description in documentation should not suggest that it
behaves exactly the same as "path" sample fetch.
Willy Tarreau [Wed, 25 Nov 2020 18:58:20 +0000 (19:58 +0100)]
DOC: better document the config file format and escaping/quoting rules
It's always a pain to figure how to proceed when special characters need
to be embedded inside arguments of an expression. Let's document the
configuration file format and how unquoting/unescaping works at each
level (top level and argument level) so that everyone hopefully finds
suitable reminders or examples for complex cases.
This is related to github issue #200 and addresses issues #712 and #966.
Willy Tarreau [Thu, 26 Nov 2020 16:06:04 +0000 (17:06 +0100)]
BUG/MAJOR: peers: fix partial message decoding
Another bug in the peers message parser was uncovered by last commit 1dfd4f106 ("BUG/MEDIUM: peers: fix decoding of multi-byte length in
stick-table messages"): the function return on incomplete message does
not check if the channel has a pending close before deciding to return
0. It did not hurt previously because the loop calling co_getblk() once
per character would have depleted the buffer and hit the end, causing
<0 to be returned and matching the condition. But now that we process
at once what is available this cannot be relied on anymore and it's
now clearly visible that the final check is missing.
What happens when this strikes is that if a peer connection breaks in
the middle of a message, the function will return 0 (missing data) but
the caller doesn't check for the closed buffer, subscribes to reads,
and the applet handler is immediately called again since some data are
still available. This is detected by the loop prevention and the process
dies complaining that an appctx is spinning.
This patch simply adds the check for closed channel. It must be
backported to the same versions as the fix above.
Tim Duesterhus [Tue, 24 Nov 2020 21:22:56 +0000 (22:22 +0100)]
BUG/CRITICAL: cache: Fix trivial crash by sending accept-encoding header
Since commit 3d08236cb344607557f22e00cf1cc3e321a1fa95 HAProxy can be trivially
crashed remotely by sending an `accept-encoding` HTTP request header that
contains 16 commas.
This is because the `values` array in `accept_encoding_normalizer` accepts only
16 entries and it is not verified whether the end is reached during looping.
Fix this issue by checking the length. This patch also simplifies the ist
processing in the loop, because it manually calculated offsets and lengths,
when the ist API exposes perfectly safe functions to advance and truncate ists.
I wonder whether the accept_encoding_normalizer function is able to re-use some
existing function for parsing headers that may contain lists of values. I'll
leave this evaluation up to someone else, only patching the obvious crash.
This commit is 2.4-dev specific and was merged just a few hours ago. No
backport needed.
MINOR: cache: Add a process-vary option that can enable/disable Vary processing
The cache section's process-vary option takes a 0 or 1 value to disable
or enable the vary processing.
When disabled, a response containing such a header will never be cached.
When enabled, we will calculate a preliminary hash for a subset of request
headers on all the incoming requests (which might come with a cpu cost) which
will be used to build a secondary key for a given request (see RFC 7234#4.1).
The default value is 0 (disabled).
Calculate a preliminary secondary key for every request we see so that
we can have a real secondary key if the response is cacheable and
contains a manageable Vary header.
The cache's ebtree is now allowed to have multiple entries with the same
primary key. Two of those entries will be distinguished thanks to
secondary keys stored in the cache_entry (based on hashes of a subset of
their headers).
When looking for an entry in the cache (cache_use), we still use the
primary key (built the same way as before), but in case of match, we
also need to check if the entry has a vary signature. If it has one, we
need to perform an extra check based on the newly built secondary key.
We will only be able to forge a response out of the cache if both the
primary and secondary keys match with one of our entries. Otherwise the
request will be forwarder to the server.
MINOR: cache: Prepare helper functions for Vary support
The Vary functionality is based on a secondary key that needs to be
calculated for every request to which a server answers with a Vary
header. The Vary header, which can only be found in server responses,
determines which headers of the request need to be taken into account in
the secondary key. Since we do not want to have to store all the headers
of the request until we have the response, we will pre-calculate as many
sub-hashes as there are headers that we want to manage in a Vary
context. We will only focus on a subset of headers which are likely to
be mentioned in a Vary response (accept-encoding and referer for now).
Every managed header will have its own normalization function which is
in charge of transforming the header value into a core representation,
more robust to insignificant changes that could exist between multiple
clients. For instance, two accept-encoding values mentioning the same
encodings but in different orders should give the same hash.
This patch adds a function that parses a Vary header value and checks if
all the values belong to our supported subset. It also adds the
normalization functions for our two headers, as well as utility
functions that can prebuild a secondary key for a given request and
transform it into an actual secondary key after the vary signature is
determined from the response.
BUG/MAJOR: filters: Always keep all offsets up to date during data filtering
When at least one data filter is registered on a channel, the offsets of all
filters must be kept up to date. For data filters but also for others. It is
safer to do it in that way. Indirectly, this patch fixes 2 hidden bugs
revealed by the commit 22fca1f2c ("BUG/MEDIUM: filters: Forward all filtered
data at the end of http filtering").
The first one, the worst of both, happens at the end of http filtering when
at least one data filtered is registered on the channel. We call the
http_end() callback function on the filters, when defined, to finish the
http filtering. But it is performed for all filters. Before the commit 22fca1f2c, the only risk was to call the http_end() callback function
unexpectedly on a filter. Now, we may have an overflow on the offset
variable, used at the end to forward all filtered data. Of course, from the
moment we forward an arbitrary huge amount of data, all kinds of bad things
may happen. So offset computation is performed for all filters and
http_end() callback function is called only for data filters.
The other one happens when a data filter alter the data of a channel, it
must update the offsets of all previous filters. But the offset of non-data
filters must be up to date, otherwise, here too we may have an integer
overflow.
Another way to fix these bugs is to always ignore non-data filters from the
offsets computation. But this patch is safer and probably easier to
maintain.
This patch must be backported in all versions where the above commit is. So
as far as 2.0.
Joao Morais [Tue, 24 Nov 2020 11:24:30 +0000 (08:24 -0300)]
DOC: better describes how to configure a fallback crt
A default certificate is always the first one declared in the bind line,
either from `crt` or from `crt-line` option. This commit updates the
description of how to configure a fallback certificate, clarifying that
it needs to be the first one of the bind line.
Should be merged as far as the first SNI filter implementation.
Maciej Zdeb [Mon, 23 Nov 2020 16:03:09 +0000 (16:03 +0000)]
BUG/MEDIUM: http_act: Restore init of log-format list
Restore init of log-format list in parse_http_del_header which was
accidently deleted by commit ebdd4c55da4360bde7878604ea528c2031a26541
(implementation of different header matching methods for
http-request/response del-header).
Ilya Shipitsin [Sat, 21 Nov 2020 18:10:53 +0000 (23:10 +0500)]
BUILD: SSL: add BoringSSL guarding to "RAND_keep_random_devices_open"
"RAND_keep_random_devices_open" is OpenSSL specific, does not present
in other OpenSSL variants like LibreSSL or BoringSSL. BoringSSL recently
"updated" its internal openssl version to 1.1.1, we temporarily set it
back to 1.1.0, as we are going to remove that hack, let us add proper
guarding.
Level-7 retries are only possible with a restricted number of HTTP
return codes. While it is usually not safe to retry on 401 and 403, I
came up with an authentication backend which was not synchronizing
authentication of users. While not perfect, being allowed to also retry
on those return codes is really helpful and acts as a hotfix until we
can fix the backend.