Willy Tarreau [Fri, 12 Mar 2021 08:30:14 +0000 (09:30 +0100)]
MINOR: cfgparse: suggest correct spelling for unknown words in global section
The global section also knows a large number of keywords that are not
referenced in any list, so this needed them to be specifically listed.
It becomes particularly handy now because some tunables are never easy
to remember, but now it works remarkably well:
$ printf "global\nsched.queue_depth\n" | ./haproxy -c -f /dev/stdin
[NOTICE] 070/093007 (23457) : haproxy version is 2.4-dev11-dd8ee5-24
[NOTICE] 070/093007 (23457) : path to executable is ./haproxy
[ALERT] 070/093007 (23457) : parsing [/dev/stdin:2] : unknown keyword 'sched.queue_depth' in 'global' section; did you mean 'tune.runqueue-depth' maybe ?
[ALERT] 070/093007 (23457) : Error(s) found in configuration file : /dev/stdin
[ALERT] 070/093007 (23457) : Fatal errors found in configuration.
Willy Tarreau [Fri, 12 Mar 2021 08:14:19 +0000 (09:14 +0100)]
MINOR: cfgparse: suggest correct spelling for unknown words in proxy sections
Let's start by the largest keyword list, the listeners. Many keywords were
still not part of a list, so a common_kw_list array was added to list the
not enumerated ones. Now for example, typing "tmout" properly suggests
"timeout":
$ printf "frontend f\ntmout client 10s\n" | ./haproxy -c -f /dev/stdin
[NOTICE] 070/091355 (22545) : haproxy version is 2.4-dev11-3b728a-21
[NOTICE] 070/091355 (22545) : path to executable is ./haproxy
[ALERT] 070/091355 (22545) : parsing [/dev/stdin:2] : unknown keyword 'tmout' in 'frontend' section; did you mean 'timeout' maybe ?
[ALERT] 070/091355 (22545) : Error(s) found in configuration file : /dev/stdin
[ALERT] 070/091355 (22545) : Fatal errors found in configuration.
Willy Tarreau [Fri, 12 Mar 2021 08:08:04 +0000 (09:08 +0100)]
MINOR: cfgparse: add cfg_find_best_match() to suggest an existing word
Instead of just reporting "unknown keyword", let's provide a function which
will look through a list of registered keywords for a similar-looking word
to the one that wasn't matched. This will help callers suggest correct
spelling. Also, given that a large part of the config parser still relies
on a long chain of strcmp(), we'll need to be able to pass extra candidates.
Thus the function supports an optional extra list for this purpose.
Willy Tarreau [Fri, 12 Mar 2021 08:01:52 +0000 (09:01 +0100)]
MINOR: tools: add simple word fingerprinting to find similar-looking words
This introduces two functions, one which creates a fingerprint of a word,
and one which computes a distance between two words fingerprints. The
fingerprint is made by counting the transitions between one character and
another one. Here we consider the 26 alphabetic letters regardless of
their case, then any digit as a digit, and anything else as "other". We
also consider the first and last locations as transitions from begin to
first char, and last char to end. The distance is simply the sum of the
squares of the differences between two fingerprints. This way, doubling/
missing a letter has the same cost, however some repeated transitions
such as "e"->"r" like in "server" are very unlikely to match against
situations where they do not exist. This is a naive approach but it seems
to work sufficiently well for now. It may be refined in the future if
needed.
Willy Tarreau [Fri, 12 Mar 2021 10:37:05 +0000 (11:37 +0100)]
CLEANUP: http-rules: remove the unexpected comma before the list of action keywords
The error message for http-request and http-response starts with a comma
that very likely is a leftover from a previous list construct. Let's remove
it: "'http-request' expects , 'wait-for-handshake', 'use-service' ...".
Willy Tarreau [Fri, 12 Mar 2021 13:09:10 +0000 (14:09 +0100)]
BUG/MINOR: server-state: use the argument, not the global state
The refactoring in commit 131b07be3 ("MEDIUM: server: Refactor
apply_server_state() to make it more readable") also had a copy-paste
error resulting in using global.server_state_file instead of the
function's argument, which easily crashes with a conf having a
state file in a backend and no global state file.
In addition, let's simplify the code and get rid of strcpy() which
almost certainly will break the build on OpenBSD.
This was introduced in 2.4-dev10, no backport is needed.
Willy Tarreau [Fri, 12 Mar 2021 12:57:19 +0000 (13:57 +0100)]
BUG/MINOR: server-state: properly handle the case where the base is not set
The refactoring in commit 131b07be3 ("MEDIUM: server: Refactor
apply_server_state() to make it more readable") made the global
server_state_base be dereferenced before being checked, resulting
in a crash on certain files.
This happened in 2.4-dev10, no backport is needed.
BUG/MINOR: tcpcheck: Fix double free on error path when parsing tcp/http-check
When a "tcp-check" or a "http-check" rule is parsed, we try to get the
previous rule in the ruleset to get its index. We must take care to reset
the pointer on this rule in case an error is triggered later on the
parsing. Otherwise, the same rule may be released twice. For instance, it
happens with such line :
http-check meth GET uri / ## note there is no "send" parameter
BUG/MINOR: proxy/session: Be sure to have a listener to increment its counters
It is possible to have a session without a listener. It happens for applets
on the client side. Thus all accesses to the listener info from the session
must be guarded. It was the purpose of the commit 36119de18 ("BUG/MEDIUM:
session: NULL dereference possible when accessing the listener"). However,
some tests on the session's listener existence are missing in proxy_inc_*
functions.
This patch should fix the issues #1171, #1172, #1173, #1174 and #1175. It
must be backported with the above commit as far as 1.8.
BUG/MINOR: tcpcheck: Update .health threshold of agent inside an agent-check
If an agent-check is configured for a server, When the response is parsed,
the .health threshold of the agent must be updated on up/down/stopped/fail
command and not the threshold of the health-check. Otherwise, the
agent-check will compete with the health-check and may mark a DOWN server as
UP.
This patch should fix the issue #1176. It must be backported as far as 2.2.
BUG/MEDIUM: filters: Set CF_FL_ANALYZE on channels when filters are attached
CF_FL_ANALYZE flag is used to know a channel is filtered. It is important to
synchronize request and response channels when the filtering ends.
However, it is possible to call all request analyzers before starting the
filtering on the response channel. This means flt_end_analyze() may be
called for the request channel before flt_start_analyze() on the response
channel. Thus because CF_FL_ANALYZE flag is not set on the response channel,
we consider the filtering is finished on both sides. The consequence is that
flt_end_analyze() is not called for the response and backend filters are
unregistered before their execution on the response channel.
It is possible to encounter this bug on TCP frontend or CONNECT request on
HTTP frontend if the client shutdown is reveiced with the first read.
To fix this bug, CF_FL_ANALYZE is set when filters are attached to the
stream. It means, on the request channel when the stream is created, in
flt_stream_start(). And on both channels when the backend is set, in
flt_set_stream_backend().
Willy Tarreau [Fri, 12 Mar 2021 05:06:14 +0000 (06:06 +0100)]
BUILD: atomic/arm64: force the register pairs to use in __ha_cas_dw()
Since commit f8fb4f75f ("MINOR: atomic: implement a more efficient arm64
__ha_cas_dw() using pairs"), on some modern arm64 (armv8.1+) compiled
with -march=armv8.1-a under gcc-7.5.0, a build error may appear on ev_poll.o :
/tmp/ccHD2lN8.s:1771: Error: reg pair must start from even reg at operand 1 -- `casp x27,x28,x22,x23,[x12]'
Makefile:927: recipe for target 'src/ev_poll.o' failed
It appears that the compiler cannot always assign register pairs there
for a structure made of two u64. It was possibly later addressed since
gcc-9.3 never caused this, but there's no trivially available info on
the subject in the changelogs. Unsuprizingly, using a u128 instead does
fix this, but it significantly inflates the code (+4kB for just 6 places,
very likely that it loaded some extra stubs) and the comparison is ugly,
involving two slower conditional jumps instead of a single one and a
conditional comparison. For example, ha_random64() grew from 144 bytes
to 232.
However, simply forcing the base register does work pretty well, and
makes the code even cleaner and more efficient by further reducing it
by about 4.5kB, possibly because it helps the compiler to pick suitable
registers for the pair there. And the perf on 64-cores looks steadily
0.5% above the previous one, so let's do this.
Note that the commit above was backported to 2.3 to fix scalability
issues on AWS Graviton2 platform, so this one will need to be as well.
Emeric Brun [Wed, 10 Mar 2021 15:58:03 +0000 (16:58 +0100)]
BUG/MEDIUM: stick-tables: fix ref counter in table entry using multiple http tracksc.
Setting multiple http-request track-scX rules generates entries
which never expires.
If there was already an entry registered by a previous http rule
'stream_track_stkctr(&s->stkctr[rule->action], t, ts)' didn't
register the new 'ts' into the stkctr. And function is left
with no reference on 'ts' whereas refcount had been increased
by the '_get_entry'
The patch applies the same policy as the one showed on tcp track
rules and if there is successive rules the track counter keep the
first entry registered in the counter and nothing more is computed.
After validation this should be backported in all versions.
The QUIC connection struct connection member was not initialized. This may
make randomly haproxy handle TLS connections as QUIC ones only when QUIC support
is enabled leading to such OpenSSL errors (captured from a reg test output, TLS
Client-Hello callback failed):
Willy Tarreau [Wed, 10 Mar 2021 10:06:26 +0000 (11:06 +0100)]
OPTIM: task: automatically adjust the default runqueue-depth to the threads
The recent default runqueue size reduction appeared to have significantly
lowered performance on low-thread count configs. Testing various values
runqueue values on different workloads under thread counts ranging from
1 to 64, it appeared that lower values are more optimal for high thread
counts and conversely. It could even be drawn that the optimal value for
various workloads sits around 280/sqrt(nbthread), and probably has to do
with both the L3 cache usage and how to optimally interlace the threads'
activity to minimize contention. This is much easier to optimally
configure, so let's do this by default now.
Willy Tarreau [Wed, 10 Mar 2021 08:26:24 +0000 (09:26 +0100)]
MINOR: task: give the scheduler a bit more flexibility in the runqueue size
Instead of setting a hard-limit on runqueue-depth and keeping it short
to maintain fairness, let's allow the scheduler to automatically cut
the existing one in two equal halves if its size is between the configured
size and its double. This will allow to increase the default value while
keeping a low latency.
Daniel Corbett [Wed, 10 Mar 2021 04:00:34 +0000 (23:00 -0500)]
BUG/MINOR: sample: Rename SenderComID/TargetComID to SenderCompID/TargetCompID
The recently introduced Financial Information eXchange (FIX)
converters have some hard coded tags based on the specification that
were misspelled. Specifically, SenderComID and TargetComID should
be SenderCompID and TargetCompID according to the specification [1][2].
This patch updates all references, which includes the converters
themselves, the regression test, and the documentation.
Willy Tarreau [Tue, 9 Mar 2021 16:58:02 +0000 (17:58 +0100)]
BUG/MEDIUM: ssl: properly remove the TASK_HEAVY flag at end of handshake
Emeric found that SSL+keepalive traffic had dropped quite a bit in the
recent changes, which could be bisected to recent commit 9205ab31d
("MINOR: ssl: mark the SSL handshake tasklet as heavy"). Indeed, a
first incarnation of this commit made use of the TASK_SELF_WAKING
flag but the last version directly used TASK_HEAVY, but it would still
continue to remove the already absent TASK_SELF_WAKING one instead of
TASK_HEAVY. As such, the SSL traffic remained processed with low
granularity.
Willy Tarreau [Tue, 9 Mar 2021 15:55:18 +0000 (16:55 +0100)]
CLEANUP: config: also address the cfg_keyword API change in the compression code
The tests were made on slz and the zlib parsers for memlevel and windowsize
managed to escape the change made by commit 018251667 ("CLEANUP: config: make
the cfg_keyword parsers take a const for the defproxy"). This is now fixed.
Emeric Brun [Mon, 8 Mar 2021 15:41:29 +0000 (16:41 +0100)]
BUG/MEDIUM: resolvers: handle huge responses over tcp servers.
Parameter "accepted_payload_size" is currently considered regardless
the used nameserver is using TCP or UDP. It remains mandatory to annouce
such capability to support e-dns, so a value have to be announced also
in TCP. Maximum DNS message size in TCP is limited by protocol to 65535
and so for UDP (65507) if system supports such UDP messages. But
the maximum value for this option was arbitrary forced to 8192.
This patch change this maximum to 65535 to allow user to set bigger value
for UDP if its system supports. It also sets accepted_payload_size
in TCP allowing to retrieve huge responses if the configuration uses
TCP nameservers.
The request announcing the accepted_payload_size capability is currently
built at resolvers level and is common to all used nameservers of the
section regardess transport protocol used. A further patch should be
made to at least specify a different payload size depending of the
transport, and perhaps could be forced to 65535 in case of TCP and
maximum would be forced back to 65507 matching UDP max.
Willy Tarreau [Tue, 9 Mar 2021 14:43:32 +0000 (15:43 +0100)]
CLEANUP: stream: rename a few remaining occurrences of "stream *sess"
These are some leftovers from the ancient code where they were still
called sessions, but these areas in the code remain confusing due to
this naming. They were now called "strm" which will not even affect
indenting nor alignment.
clang always return 0 if an option in not recognized unless
-Werror is also passed, preventing a correct probing of options
supported by the compiler (tested with clang 6.0.1 to 11.1.0).
Please note today this is not visible since clang 11 exit with SIGABRT
or with return code 1 on older version due to bad file descriptor from
file descriptor handling
BUG/MEDIUM: session: NULL dereference possible when accessing the listener
When implementing a client applet, a NULL dereference was encountered on
the error path which increment the counters.
Indeed, the counters incremented are the one in the listener which does
not exist in the case of client applets, so in sess->listener->counters,
listener is NULL.
This patch fixes the access to the listener structure when accessing
from a sesssion, most of the access are the counters in error paths.
Willy Tarreau [Tue, 9 Mar 2021 09:15:16 +0000 (10:15 +0100)]
BUILD: connection: do not use VAR_ARRAY in struct tlv
It was brought by commit c44b8de99 ("CLEANUP: connection: Use `VAR_ARRAY`
in `struct tlv` definition") but breaks the build with clang. Actually it
had already been done 6 months ago by commit 4987a4744 ("CLEANUP: tree-wide:
use VAR_ARRAY instead of [0] in various definitions") then reverted by
commit 441b6c31e ("BUILD: connection: fix build on clang after the VAR_ARRAY
cleanup") which explained the same thing but didn't place a comment in the
code to justify this (in short it's just an end of struct marker).
Willy Tarreau [Tue, 9 Mar 2021 08:53:46 +0000 (09:53 +0100)]
CLEANUP: config: make the cfg_keyword parsers take a const for the defproxy
The default proxy was passed as a variable to all parsers instead of a
const, which is not without risk, especially when some timeout parsers used
to make some int pointers point to the default values for comparisons. We
want to be certain that none of these parsers will modify the defaults
sections by accident, so it's important to mark this proxy as const.
Willy Tarreau [Tue, 9 Mar 2021 09:08:05 +0000 (10:08 +0100)]
BUILD: bug: refine HA_LINK_ERROR() to only be used on gcc and derivatives
TCC happens to define __OPTIMIZE__ at -O2 but doesn't proceed with dead
code elimination, resulting in ha_free() to always reference the link
error symbol. Let's condition this test on __GCC__ which others like
Clang also define.
Willy Tarreau [Tue, 9 Mar 2021 08:59:50 +0000 (09:59 +0100)]
BUILD: task: fix build at -O0 with threads disabled
grq_total was incremented when picking tasks from the global run queue,
but this variable was not defined with threads disabled, and the code
was optimized away at -O2. No backport is needed.
Willy Tarreau [Fri, 5 Mar 2021 20:24:23 +0000 (21:24 +0100)]
[RELEASE] Released version 2.4-dev11
Released version 2.4-dev11 with the following main changes :
- CI: codespell: skip Makefile for spell check
- CLEANUP: assorted typo fixes in the code and comments
- BUG/MINOR: tcp-act: Don't forget to set the original port for IPv4 set-dst rule
- BUG/MINOR: connection: Use the client's dst family for adressless servers
- BUG/MEDIUM: spoe: Kill applets if there are pending connections and nbthread > 1
- CLEANUP: Use ist2(const void*, size_t) whenever possible
- CLEANUP: Use IST_NULL whenever possible
- BUILD: proxy: Missing header inclusion for quic_transport_params_init()
- BUILD: quic: Implicit conversion between SSL related enums.
- DOC: spoe: Add a note about fragmentation support in HAProxy
- MINOR: contrib: add support for heartbeat control messages.
- MINOR: contrib: Enhance peers dissector heuristic.
- BUG/MINOR: mux-h2: Fix typo in scheme adjustment
- CLEANUP: Reapply the ist2() replacement patch
- CLEANUP: Use istadv(const struct ist, const size_t) whenever possible
- CLEANUP: Use isttest(const struct ist) whenever possible
- Revert "CI: Pin VTest to a known good commit"
- CLEANUP: backend: fix a wrong comment
- BUG/MINOR: backend: free allocated bind_addr if reuse conn
- MINOR: backend: handle reuse for conns with no server as target
- REGTESTS: test http-reuse if no server target
- BUG/MINOR: hlua: Don't strip last non-LWS char in hlua_pushstrippedstring()
- BUG/MINOR: server-state: Don't load server-state file for disabled backends
- CLEANUP: dns: Use DISGUISE() on a never-failing ring_attach() call
- CLEANUP: dns: Remove useless test on ns->dgram in dns_connect_nameserver()
- DOC: fix originalto except clause on destination address
- CLEANUP: Use the ist() macro whenever possible
- CLEANUP: Replace for loop with only a condition by while
- REORG: atomic: reimplement pl_cpu_relax() from atomic-ops.h
- BUG/MINOR: mt-list: always perform a cpu_relax call on failure
- MINOR: atomic: add armv8.1-a atomics variant for cas-dw
- MINOR: atomic: implement a more efficient arm64 __ha_cas_dw() using pairs
- BUG/MINOR: ssl: don't truncate the file descriptor to 16 bits in debug mode
- MEDIUM: pools: add CONFIG_HAP_NO_GLOBAL_POOLS and CONFIG_HAP_GLOBAL_POOLS
- MINOR: pools: double the local pool cache size to 1 MB
- MINOR: stream: use ABORT_NOW() and not abort() in stream_dump_and_crash()
- CLEANUP: stream: explain why we queue the stream at the head of the server list
- MEDIUM: backend: use a trylock when trying to grab an idle connection
- REORG: tools: promote the debug PRNG to more general use as a statistical one
- OPTIM: lb-random: use a cheaper PRNG to pick a server
- MINOR: task: stop abusing the nice field to detect a tasklet
- MINOR: task: move the nice field to the struct task only
- MEDIUM: task: extend the state field to 32 bits
- MINOR: task: add an application specific flag to the state: TASK_F_USR1
- MEDIUM: muxes: mark idle conns tasklets with TASK_F_USR1
- MINOR: xprt: add new xprt_set_idle and xprt_set_used methods
- MEDIUM: ssl: implement xprt_set_used and xprt_set_idle to relax context checks
- MINOR: server: don't read curr_used_conns multiple times
- CLEANUP: global: reorder some fields to respect cache lines
- CLEANUP: sockpair: silence a coverity check about fcntl()
- CLEANUP: lua: set a dummy file name and line number on the dummy servers
- MINOR: server: add a global list of all known servers
- MINOR: cfgparse: finish to set up servers outside of the proxy setup loop
- MINOR: server: allocate a per-thread struct for the per-thread connections stuff
- MINOR: server: move actconns to the per-thread structure
- CLEANUP: server: reorder some fields in the server struct to respect cache lines
- MINOR: backend: add a BUG_ON if conn mux NULL in connect_server
- BUG/MINOR: backend: fix condition for reuse on mode HTTP
- BUILD: Fix build when using clang without optimizing.
- CLEANUP: assorted typo fixes in the code and comments
BUILD: Fix build when using clang without optimizing.
ha_free() uses code that attempts to set a non-existant variable to provoke
a link-time error, with the expectation that the compiler will not omit that
if the code is unreachable. However, clang will emit it when compiling with
no optimization, so only do that if __OPTIMIZE__ is defined.
It fixes the check for the early insertion of backend connections in
the reuse lists if the backend mode is HTTP.
The impact of this bug seems limited because :
- in tcp mode, no insertion is done in the avail list as mux_pt does not
support multiple streams.
- in http mode, muxes are also responsible to insert backend connections
in lists in their detach functions. Prior to this fix the reuse rate
could be slightly inferior.
MINOR: backend: add a BUG_ON if conn mux NULL in connect_server
Currently, there seems to be no way to have the transport layer ready
but not the mux in the function connect_server. Add a BUG_ON to report
if this implicit condition is not true anymore.
This should fix coverity report from github issue #1120.
Willy Tarreau [Fri, 5 Mar 2021 10:37:32 +0000 (11:37 +0100)]
CLEANUP: server: reorder some fields in the server struct to respect cache lines
There's currently quite some thread contention in the server struct
because frequently fields accessed fields are mixed with those being
often written to by any thread. Let's split this a little bit to
separate a few areas:
- pure config / admin / operating status (almost never changes)
- idle and queuing (fast changes, done almost together)
- LB (fast changes, not necessarily dependent on the above)
- counters (fast changes, at a different instant again)
Willy Tarreau [Thu, 4 Mar 2021 09:47:54 +0000 (10:47 +0100)]
MINOR: server: move actconns to the per-thread structure
The actconns list creates massive contention on low server counts because
it's in fact a list of streams using a server, all threads compete on the
list's head and it's still possible to see some watchdog panics on 48
threads under extreme contention with 47 threads trying to add and one
thread trying to delete.
Moving this list per thread is trivial because it's only used by
srv_shutdown_streams(), which simply required to iterate over the list.
The field was renamed to "streams" as it's really a list of streams
rather than a list of connections.
Willy Tarreau [Thu, 4 Mar 2021 08:45:32 +0000 (09:45 +0100)]
MINOR: server: allocate a per-thread struct for the per-thread connections stuff
There are multiple per-thread lists in the listeners, which isn't the
most efficient in terms of cache, and doesn't easily allow to store all
the per-thread stuff.
Now we introduce an srv_per_thread structure which the servers will have an
array of, and place the idle/safe/avail conns tree heads into. Overall this
was a fairly mechanical change, and the array is now always initialized for
all servers since we'll put more stuff there. It's worth noting that the Lua
code still has to deal with its own deinit by itself despite being in a
global list, because its server is not dynamically allocated.
Willy Tarreau [Fri, 5 Mar 2021 09:48:42 +0000 (10:48 +0100)]
MINOR: cfgparse: finish to set up servers outside of the proxy setup loop
Till now servers were only initialized as part of the proxy setup loop,
which doesn't cover peers, tcp log, dns, lua etc. Let's move this part
out of this loop and instead iterate over all registered servers. This
way we're certain to visit them all.
The patch looks big but it's just a move of a large block with the
corresponding reindent (as can be checked with diff -b). It relies
on the two previous ones ("MINOR: server: add a global list of all
known servers and" and "CLEANUP: lua: set a dummy file name and line
number on the dummy servers").
Willy Tarreau [Fri, 5 Mar 2021 09:23:32 +0000 (10:23 +0100)]
MINOR: server: add a global list of all known servers
It's a real pain not to have access to the list of all registered servers,
because whenever there is a need to late adjust their configuration, only
those attached to regular proxies are seen, but not the peers, lua, logs
nor DNS.
What this patch does is that new_server() will automatically add the newly
created server to a global list, and it does so as well for the 1 or 2
statically allocated servers created for Lua. This way it will be possible
to iterate over all of them.
Willy Tarreau [Fri, 5 Mar 2021 09:41:48 +0000 (10:41 +0100)]
CLEANUP: lua: set a dummy file name and line number on the dummy servers
The "socket_tcp" and "socket_ssl" servers had no config file name nor
line number, but this is sometimes annoying during debugging or later
in error messages, while all other places using new_server() or
parse_server() make sure to have a valid file:line set. Let's set
something to address this.
Willy Tarreau [Wed, 3 Mar 2021 17:23:19 +0000 (18:23 +0100)]
CLEANUP: global: reorder some fields to respect cache lines
Some entries are atomically updated by various threads, such as the
global counters, and they're mixed with others which are read all the
time like the mode. This explains why "perf" was seeing a huge access
cost on global.mode in process_stream()! Let's reorder them so that
the static config stuff is at the beginning and the live stuff is at
the end.
Willy Tarreau [Tue, 2 Mar 2021 16:29:56 +0000 (17:29 +0100)]
MEDIUM: ssl: implement xprt_set_used and xprt_set_idle to relax context checks
Currently the SSL layer checks the validity of its tasklet's context just
in case it would have been stolen, had the connection been idle. Now it
will be able to be notified by the mux when this situation happens so as
not to have to grab the idle connection lock on each pass. This reuses the
TASK_F_USR1 flag just as the muxes do.
Willy Tarreau [Tue, 2 Mar 2021 16:27:58 +0000 (17:27 +0100)]
MINOR: xprt: add new xprt_set_idle and xprt_set_used methods
These functions are used on the mux layer to indicate that the connection
is becoming idle and that the xprt ought to be careful before checking the
context or that it's not idle anymore and that the context is safe. The
purpose is to allow a mux which is going to release a connection to tell
the xprt to be careful when touching it. At the moment, the xprt are
always careful and that's costly so we want to have the ability to relax
this a bit.
Willy Tarreau [Tue, 2 Mar 2021 15:51:09 +0000 (16:51 +0100)]
MEDIUM: muxes: mark idle conns tasklets with TASK_F_USR1
The muxes are touching the idle_conns_lock all the time now because
they need to be careful that no other thread has stolen their tasklet's
context.
This patch changes this a little bit by setting the TASK_F_USR1 flag on
the tasklet before marking a connection idle, and removing it once it's
not idle anymore. Thanks to this we have the guarantee that a tasklet
without this flag cannot be present in an idle list and does not need
to go through this costly lock. This is especially true for front
connections.
Willy Tarreau [Tue, 2 Mar 2021 15:26:05 +0000 (16:26 +0100)]
MINOR: task: add an application specific flag to the state: TASK_F_USR1
This flag will be usable by any application. It will be preserved across
wakeups so the application can use it to do various stuff. Some I/O
handlers will soon benefit from this.
Willy Tarreau [Tue, 2 Mar 2021 15:09:26 +0000 (16:09 +0100)]
MEDIUM: task: extend the state field to 32 bits
It's been too short for quite a while now and is now full. It's still
time to extend it to 32-bits since we have room for this without
wasting any space, so we now gained 16 new bits for future flags.
The values were not reassigned just in case there would be a few
hidden u16 or short somewhere in which these flags are placed (as
it used to be the case with stream->pending_events).
The patch is tagged MEDIUM because this required to update the task's
process() prototype to use an int instead of a short, that's quite a
bunch of places.
Willy Tarreau [Tue, 2 Mar 2021 15:06:38 +0000 (16:06 +0100)]
MINOR: task: move the nice field to the struct task only
The nice field isn't needed anymore for the tasklet so we can move it
from the TASK_COMMON area into the struct task which already has a
hole around the expire entry.
Willy Tarreau [Tue, 2 Mar 2021 14:54:11 +0000 (15:54 +0100)]
MINOR: task: stop abusing the nice field to detect a tasklet
It's cleaner to use a flag from the task's state to detect a tasklet
and it's even cheaper. One of the best benefits is that this will
allow to get the nice field out of the common part since the tasklet
doesn't need it anymore. This commit uses the last task bit available
but that's temporary as the purpose of the change is to extend this.
Ubuntu [Mon, 1 Mar 2021 07:57:54 +0000 (07:57 +0000)]
OPTIM: lb-random: use a cheaper PRNG to pick a server
The PRNG used by the "random" LB algorithm was the central one which tries
hard to produce "correct" (i.e. hardly predictable) values suitable for use
in UUIDs or cookies. It's much too expensive for pure load balancing where
a cheaper thread-local PRNG is sufficient, and the current PRNG is part of
the hot places when running with many threads.
Let's switch to the stastistical PRNG instead, it's thread-local, very
fast, and with a period of (2^32)-1 which is more than enough to decide
on a server.
Willy Tarreau [Tue, 2 Mar 2021 13:01:35 +0000 (14:01 +0100)]
REORG: tools: promote the debug PRNG to more general use as a statistical one
We frequently need to access a simple and fast PRNG for statistical
purposes. The debug_prng() function did exactly this using a xorshift
generator but its use was limited to debug only. Let's move this to
tools.h and tools.c to make it accessible everywhere. Since it needs to
be fast, its state is thread-local. An initialization function starts a
different initial value for each thread for better distribution.
Ubuntu [Mon, 1 Mar 2021 06:22:17 +0000 (06:22 +0000)]
MEDIUM: backend: use a trylock when trying to grab an idle connection
In conn_backend_get() we can cause some extreme contention due to the
idle_conns_lock. Indeed, even though it's per-thread, it still causes
high contention when running with many threads. The reason is that all
threads which do not have any idle connections are quickly skipped,
till the point where there are still some, so the first reaching that
point will grab the lock and the other ones wait behind. From this
point, all threads are synchronized waiting on the same lock, and
will follow the leader in small jumps, all hindering each other.
Here instead of doing this we're using a trylock. This way when a thread
is already checking a list, other ones will continue to next thread. In
the worst case, a high contention will lead to a few new connections to
be set up, but this may actually be what is required to avoid contention
in the first place. With this change, the contention has mostly
disappeared on this lock (it's still present in muxes and transport
layers due to the takeover).
Surprisingly, checking for emptiness of the tree root before taking
the lock didn't address any contention.
A few improvements are still possible and desirable here. The first
one would be to avoid seeing all threads jump to the next one. We
could have each thread use a different prime number as the increment
so as to spread them across the entire table instead of keeping them
synchronized. The second one is that the lock in the muck layers
shouldn't be needed to check for the tasklet's context availability.
Ubuntu [Mon, 1 Mar 2021 06:21:47 +0000 (06:21 +0000)]
CLEANUP: stream: explain why we queue the stream at the head of the server list
In stream_add_srv_conn() MT_LIST_ADD() is used instead of MT_LIST_ADDQ(),
resulting in the stream being queued at the end of the server list. This
has no particular effect since we cannot dump the streams on a server,
and this is only used by "shutdown sessions" on a server. But it also
turns out to be significantly faster due to the shorter recovery from
the conflict with an adjacent MT_LIST_DEL(), thus it remains desirable
to use it, but at least it deserves a comment.
In addition to this, it's worth mentioning that this list should creates
extreme contention with threads while almost never used. It should be
made per-thread just like the global streams list.
Willy Tarreau [Tue, 2 Mar 2021 18:19:41 +0000 (19:19 +0100)]
MINOR: stream: use ABORT_NOW() and not abort() in stream_dump_and_crash()
Using abort() occasionally results in unexploitable core due to issues
rewinding the stack. Let's use ABORT_NOW() which in addition to crashing
much closer to the call point also has the benefit of showing the call
trace.
Willy Tarreau [Tue, 2 Mar 2021 19:11:29 +0000 (20:11 +0100)]
MINOR: pools: double the local pool cache size to 1 MB
The reason is that H2 can already require 32 16kB buffers for the mux
output at once, which will deplete the local cache. Thus it makes sense
to go further to leave some time to other connection to release theirs.
In addition, the L2 cache on modern CPUs is already 1 MB, so this change
is welcome in any case.
Willy Tarreau [Tue, 2 Mar 2021 19:05:09 +0000 (20:05 +0100)]
MEDIUM: pools: add CONFIG_HAP_NO_GLOBAL_POOLS and CONFIG_HAP_GLOBAL_POOLS
We've reached a point where the global pools represent a significant
bottleneck with threads. On a 64-core machine, the performance was
divided by 8 between 32 and 64 H2 connections only because there were
not enough entries in the local caches to avoid picking from the global
pools, and the contention on the list there was very high. It becomes
obvious that we need to have an array of lists, but that will require
more changes.
In parallel, standard memory allocators have improved, with tcmalloc
and jemalloc finding their ways through mainstream systems, and glibc
having upgraded to a thread-aware ptmalloc variant, keeping this level
of contention here isn't justified anymore when we have both the local
per-thread pool caches and a fast process-wide allocator.
For these reasons, this patch introduces a new compile time setting
CONFIG_HAP_NO_GLOBAL_POOLS which is set by default when threads are
enabled with thread local pool caches, and we know we have a fast
thread-aware memory allocator (currently set for glibc>=2.26). In this
case we entirely bypass the global pool and directly use the standard
memory allocator when missing objects from the local pools. It is also
possible to force it at compile time when a good allocator is used with
another setup.
It is still possible to re-enable the global pools using
CONFIG_HAP_GLOBAL_POOLS, if a corner case is discovered regarding the
operating system's default allocator, or when building with a recent
libc but a different allocator which provides other benefits but does
not scale well with threads.
Willy Tarreau [Tue, 2 Mar 2021 18:32:39 +0000 (19:32 +0100)]
BUG/MINOR: ssl: don't truncate the file descriptor to 16 bits in debug mode
Errors reported by ssl_sock_dump_errors() to stderr would only report the
16 lower bits of the file descriptor because it used to be casted to ushort.
This can be backported to all versions but has really no importance in
practice since this is never seen.
Ubuntu [Mon, 1 Mar 2021 07:01:20 +0000 (07:01 +0000)]
MINOR: atomic: implement a more efficient arm64 __ha_cas_dw() using pairs
There finally is a way to support register pairs on aarch64 assembly
under gcc, it's just undocumented, like many of the options there :-(
As indicated below, it's possible to pass "%H" to mention the high
part of a register pair (e.g. "%H0" to go with "%0"):
By making local variables from pairs of registers via a struct (as
is used in IST for example), we can let gcc choose the correct register
pairs and avoid a few moves in certain situations. The code is now slightly
more efficient than the previous one on AWS' Graviton2 platform, and
noticeably smaller (by 4.5kB approx). A few tests on older releases show
that even Linaro's gcc-4.7 used to support such register pairs and %H,
and by then ATOMICS were not supported so this should not cause build
issues, and as such this patch replaces the earlier implementation.
Willy Tarreau [Mon, 30 Nov 2020 17:58:16 +0000 (18:58 +0100)]
MINOR: atomic: add armv8.1-a atomics variant for cas-dw
This variant uses the CASP instruction available on armv8.1-a CPU cores,
which is detected when __ARM_FEATURE_ATOMICS is set (gcc-linaro >= 7,
mainline >= 9). This one was tested on cortex-A55 (S905D3) and on AWS'
Graviton2 CPUs.
The instruction performs way better on high thread counts since it
guarantees some forward progress when facing extreme contention while
the original LL/SC approach is light on low-thread counts but doesn't
guarantee progress.
The implementation is not the most optimal possible. In particular since
the instruction requires to work on register pairs and there doesn't seem
to be a way to force gcc to emit register pairs, we have to decide to force
to use the pair (x0,x1) to store the old value, and (x2,x3) to store the
new one, and this necessarily involves some extra moves. But at least it
does improve the situation with 16 threads and more. See issue #958 for
more context.
Note, a first implementation of this function was making use of an
input/output constraint passed using "+Q"(*(void**)target), which was
resulting in smaller overall code than passing "target" as an input
register only. It turned out that the cause was directly related to
whether the function was inlined or not, hence the "forceinline"
attribute. Any changes to this code should still pay attention to this
important factor.
Willy Tarreau [Mon, 1 Mar 2021 05:21:22 +0000 (06:21 +0100)]
BUG/MINOR: mt-list: always perform a cpu_relax call on failure
On highly threaded machines it is possible to occasionally trigger the
watchdog on certain contended areas like the server's connection list,
because while the mechanism inherently cannot guarantee a constant
progress, it lacks CPU relax calls which are absolutely necessary in
this situation to let a thread finish its job.
The loop's "while (1)" was changed to use a "for" statement calling
__ha_cpu_relax() as its continuation expression. This way the "continue"
statements jump to the unique place containing the pause without
excessively inflating the code.
This was sufficient to definitely fix the problem on 64-core ARM Graviton2
machines. This patch should probably be backported once it's confirmed it
also helps on many-cores x86 machines since some people are facing
contention in these environments. This patch depends on previous commit
"REORG: atomic: reimplement pl_cpu_relax() from atomic-ops.h".
An attempt was made to first read the value before exchanging, and it
significantly degraded the performance. It's very likely that this caused
other cores to lose exclusive ownership on their line and slow down their
next xchg operation.
In addition it was found that MT_LIST_ADD is significantly faster than
MT_LIST_ADDQ under high contention, because it fails one step earlier
when conflicting with an adjacent MT_LIST_DEL(). It might be worth
switching some operations' order to favor MT_LIST_ADDQ() instead.
Willy Tarreau [Tue, 2 Mar 2021 06:08:34 +0000 (07:08 +0100)]
REORG: atomic: reimplement pl_cpu_relax() from atomic-ops.h
There is some confusion here as we need to place some cpu_relax statements
in some loops where it's not easily possible to condition them on the use
of threads. That's what atomic.h already does. So let's take the various
pl_cpu_relax() implementations from there and place them in atomic.h under
the name __ha_cpu_relax() and let them adapt to the presence or absence of
threads and to the architecture (currently only x86 and aarch64 use a barrier
instruction), though it's very likely that arm would work well with a cache
flushing ISB instruction as well).
This time they were implemented as expressions returning 1 rather than
statements, in order to ease their placement as the loop condition or the
continuation expression inside "for" loops. We should probably do the same
with barriers and a few such other ones.
Note that this replacement is safe even in the strdup() case, because `ist()`
will not call `strlen()` on a `NULL` pointer. Instead is inserts a length of
`0`, effectively resulting in `IST_NULL`.
DOC: fix originalto except clause on destination address
Fix the description of the except clause of the originalto option. The
destination address and not the source is compared with the except range
address to prevent the addition of the X-Original-To header.
CLEANUP: dns: Remove useless test on ns->dgram in dns_connect_nameserver()
When dns_connect_nameserver() is called, the nameserver has always a dgram
field properly defined. The caller, dns_send_nameserver(), already performed
the appropriate verification.
CLEANUP: dns: Use DISGUISE() on a never-failing ring_attach() call
When a DNS session is created, the call to ring_attach() never fails. The
ring is freshly initialized and there is other watcher on it. Thus, the call
always succeeds.
Instead of catching an error that must never happen, we use the DISGUISE()
macro to make static analyzers happy.
BUG/MINOR: server-state: Don't load server-state file for disabled backends
Recent changes on the server-state file loading have introduced a
regression. HAproxy crashes if a backend with no server-state file is
disabled in the configuration. Indeed, configuration of such backends is not
finalized. Thus many fields are not defined.
To fix the bug, disabled backends must be ignored. In addition a BUG_ON()
has been added to verify the proxy mode regarding the server-state file. It
must be specified (none, global or local) for enabled backends.
BUG/MINOR: hlua: Don't strip last non-LWS char in hlua_pushstrippedstring()
hlua_pushstrippedstring() function strips leading and trailing LWS
characters. But the result length it too short by 1 byte. Thus the last
non-LWS character is stripped. Note that a string containing only LWS
characters resulting to a stipped string with an invalid length (-1). This
leads to a lua runtime error.
This bug was reported in the issue #1155. It must be backported as far as
1.7.
Add two new regtests which check the behavior of http-reuse when the
connection target is not a server. More specifically check the dispatch
and transparent backend. In these cases, the behavior should be similar
to http-reuse never mode.
MINOR: backend: handle reuse for conns with no server as target
If dispatch mode or transparent backend is used, the backend connection
target is a proxy instead of a server. In these cases, the reuse of
backend connections is not consistent.
With the default behavior, no reuse is done and every new request uses a
new connection. However, if http-reuse is set to never, the connection
are stored by the mux in the session and can be reused for future
requests in the same session.
As no server is used for these connections, no reuse can be made outside
of the session, similarly to http-reuse never mode. A different
http-reuse config value should not have an impact. To achieve this, mark
these connections as private to have a defined behavior.
For this feature to properly work, the connection hash has been slightly
adjusted. The server pointer as an input as been replaced by a generic
target pointer to refer to the server or proxy instance. The hash is
always calculated on connect_server even if the connection target is not
a server. This also requires to allocate the connection hash node for
every backend connections, not just the one with a server target.
BUG/MINOR: backend: free allocated bind_addr if reuse conn
Fix a leak in connect_server which happens when a connection is reused
and a bind_addr was allocated because transparent mode is active. The
connection has already an allocated bind_addr so free the newly
allocated one.
Tim Duesterhus [Sun, 28 Feb 2021 15:12:20 +0000 (16:12 +0100)]
BUG/MINOR: mux-h2: Fix typo in scheme adjustment
That comma should've been a semicolon. Fortunately, as it is now there
is no impact thanks to operators precedence, and all expressions are
properly evaluated. But this is troubling and the risk is high to
turn it into an effective bug with a minor change.
DOC: spoe: Add a note about fragmentation support in HAProxy
Add a note in SPOE.txt to make it clear that HAPRoxy does not support the
fragmentation. It can send fragmented frames if an agent supports it but it
cannot receives and handles fragmented frames.
This patch should fix the issue #659. It may be backported as far as 1.8.
BUILD: proxy: Missing header inclusion for quic_transport_params_init()
Since this commit: 144289b45 ("REORG: move init_default_instance() to proxy.c and pass it the defproxy pointer")
as quic_transport_params_init() has been moved from cfgparse.c to proxy.c this
latter source file must include xprt_quic.h header.
BUG/MEDIUM: spoe: Kill applets if there are pending connections and nbthread > 1
When the processing stage is finished for a SPOE applet, before returning it
into the idle list, we check if the assigned server appears as full or if
there are some pending connections on the backend or the assigned server. If
yes, it means we reach a maxconn and we close the applet to free a
slot. Otherwise, the applet can be reused. This test is only performed if
there are more than one thread.
It is important to close SPOE applets when there are pending connections for
multithreaded instances because connections with the SPOE agents are
persistent and local to a thread (applets are local to a thread). If a
maxconn is configured, some threads may take all available slots for a
while, leaving remaining threads without any free slot to process SPOE
messages. It is especially true if the maxconn is low.
This patch should fix the issue #705. It must be backported as far as
1.8. However, the code in 1.8 is quite different, a test must be performed
to be sure it works well.
BUG/MINOR: connection: Use the client's dst family for adressless servers
When the selected server has no address, the destination address of the
client is used. However, for now, only the address is set, not the
family. Thus depending on how the server is configured and the client's
destination address, the server address family may be wrong.
For instance, with such server :
server srv 0.0.0.0:0
The server address family is AF_INET. The server connection will fail if a
client is asking for an IPv6 destination.
To fix the bug, we take care to set the rigth family, the family of the
client destination address.
This patch should fix the issue #202. It must be backported to all stable
versions.
BUG/MINOR: tcp-act: Don't forget to set the original port for IPv4 set-dst rule
If an IPv4 is set via a TCP/HTTP set-dst rule, the original port must be
preserved or set to 0 if the previous family was neither AF_INET nor
AF_INET6. The first case is not an issue because the port remains the
same. But if the previous family was, for instance, AF_UNIX, the port is not
set to 0 and have an undefined value.
Willy Tarreau [Fri, 26 Feb 2021 21:49:10 +0000 (22:49 +0100)]
[RELEASE] Released version 2.4-dev10
Released version 2.4-dev10 with the following main changes :
- BUILD: SSL: introduce fine guard for RAND_keep_random_devices_open
- MINOR: Configure the `cpp` userdiff driver for *.[ch] in .gitattributes
- BUG/MINOR: ssl/cli: potential null pointer dereference in "set ssl cert"
- BUG/MINOR: sample: secure convs that accept base64 string and var name as args
- BUG/MEDIUM: vars: make functions vars_get_by_{name,desc} thread-safe
- CLEANUP: vars: make smp_fetch_var() to reuse vars_get_by_desc()
- DOC: muxes: add a diagram of the exchanges between muxes and outer world
- BUG/MEDIUM: proxy: use thread-safe stream killing on hard-stop
- BUG/MEDIUM: cli/shutdown sessions: make it thread-safe
- BUG/MINOR: proxy: wake up all threads when sending the hard-stop signal
- MINOR: stream: add an "epoch" to figure which streams appeared when
- MINOR: cli/streams: make "show sess" dump all streams till the new epoch
- MINOR: streams: use one list per stream instead of a global one
- MEDIUM: streams: do not use the streams lock anymore
- BUILD: dns: avoid a build warning when threads are disabled (dss unused)
- MEDIUM: task: remove the tasks_run_queue counter and have one per thread
- MINOR: tasks: do not maintain the rqueue_size counter anymore
- CLEANUP: tasks: use a less confusing name for task_list_size
- CLEANUP: task: move the tree root detection from __task_wakeup() to task_wakeup()
- MINOR: task: limit the remote thread wakeup to the global runqueue only
- MINOR: task: move the allocated tasks counter to the per-thread struct
- CLEANUP: task: split the large tasklet_wakeup_on() function in two
- BUG/MINOR: fd: properly wait for !running_mask in fd_set_running_excl()
- BUG/MINOR: resolvers: Fix condition to release received ARs if not assigned
- BUG/MINOR: resolvers: Only renew TTL for SRV records with an additional record
- BUG/MINOR: resolvers: new callback to properly handle SRV record errors
- BUG/MEDIUM: resolvers: Reset server address and port for obselete SRV records
- BUG/MEDIUM: resolvers: Reset address for unresolved servers
- DOC: Update the module list in MAINTAINERS file
- MINOR: htx: Add function to reserve the max possible size for an HTX DATA block
- DOC: Update the HTX API documentation
- DOC: Update the filters guide
- BUG/MEDIUM: contrib/prometheus-exporter: fix segfault in listener name dump
- MINOR: task: split the counts of local and global tasks picked
- MINOR: task: do not use __task_unlink_rq() from process_runnable_tasks()
- MINOR: task: don't decrement then increment the local run queue
- CLEANUP: task: re-merge __task_unlink_rq() with task_unlink_rq()
- MINOR: task: make grq_total atomic to move it outside of the grq_lock
- MINOR: tasks: also compute the tasklet latency when DEBUG_TASK is set
- MINOR: task: make tasklet wakeup latency measurements more accurate
- MINOR: server: Be more strict on the server-state line parsing
- MINOR: server: Only fill one array when parsing a server-state line
- MEDIUM: server: Refactor apply_server_state() to make it more readable
- CLEANUP: server: Rename state_line node to node instead of name_name
- CLEANUP: server: Rename state_line structure into server_state_line
- CLEANUP: server: Use a local eb-tree to store lines of the global server-state file
- MINOR: server: Be more strict when reading the version of a server-state file
- MEDIUM: server: Store parsed params of a server-state line in the tree
- MINOR: server: Remove cached line from global server-state tree when found
- MINOR: server: Move loading state of servers in a dedicated function
- MEDIUM: server: Use a tree to store local server-state lines
- MINOR: server: Parse and store server-state lines in a dedicated function
- MEDIUM: server: Don't load server-state file if a line is corrupted
- REORG: server: Export and rename some functions updating server info
- REORG: server-state: Move functions to deal with server-state in its own file
- MINOR: server-state: Don't load server-state file for serverless proxies
- CLEANUP: muxes: Remove useless if condition in show_fd function
- BUG/MINOR: stats: fix compare of no-maint url suffix
- MINOR: task: limit the number of subsequent heavy tasks with flag TASK_HEAVY
- MINOR: ssl: mark the SSL handshake tasklet as heavy
- CLEANUP: server: rename srv_cleanup_{idle,toremove}_connections()
- BUG/MINOR: ssl: potential null pointer dereference in ckchs_dup()
- MINOR: task: add one extra tasklet class: TL_HEAVY
- MINOR: task: place the heavy elements in TL_HEAVY
- MINOR: task: only limit TL_HEAVY tasks but not others
- BUG/MINOR: http-ana: Only consider dst address to process originalto option
- MINOR: tools: Add net_addr structure describing a network addess
- MINOR: tools: Add function to compare an address to a network address
- MEDIUM: http-ana: Add IPv6 support for forwardfor and orignialto options
- CLEANUP: hlua: Use net_addr structure internally to parse and compare addresses
- REGTESTS: Add script to test except param for fowardedfor/originalto options
- DOC: scheduler: add a diagram showing the different queues and their usages
- CLEANUP: tree-wide: replace free(x);x=NULL with ha_free(&x)
- CLEANUP: config: replace a few free() with ha_free()
- CLEANUP: vars: always zero the pointers after a free()
- CLEANUP: ssl: remove a useless "if" before freeing an error message
- CLEANUP: ssl: make ssl_sock_free_srv_ctx() zero the pointers after free
- CLEANUP: ssl: use realloc() instead of free()+malloc()
Willy Tarreau [Fri, 26 Feb 2021 20:05:08 +0000 (21:05 +0100)]
CLEANUP: ssl: use realloc() instead of free()+malloc()
There was a free(ptr) followed by ptr=malloc(ptr, len), which is the
equivalent of ptr = realloc(ptr, len) but slower and less clean. Let's
replace this.
Willy Tarreau [Fri, 26 Feb 2021 20:06:32 +0000 (21:06 +0100)]
CLEANUP: ssl: make ssl_sock_free_srv_ctx() zero the pointers after free
In ssl_sock_free_srv_ctx() there are some calls to free() which are not
followed by a zeroing of the pointers. For now this function is only used
during deinit but it could be used at run time in the near future, so
better secure this.