Willy Tarreau [Mon, 29 Jun 2020 12:23:31 +0000 (14:23 +0200)]
BUG/MINOR: debug: fix "show fd" null-deref when built with DEBUG_FD
DEBUG_FD was added by commit 38e8a1c in 2.2-dev, and "show fd" was
slightly modified to still allow to print orphaned/closed FDs if their
count is non-null. But bypassing the existing test made it possible
to dereference fdt.owner which can be null. Let's adjust the condition
to avoid this.
Willy Tarreau [Sat, 27 Jun 2020 22:54:27 +0000 (00:54 +0200)]
MINOR: pools: move the LRU cache heads to thread_info
The LRU cache head was an array of list, which causes false sharing
between 4 to 8 threads in the same cache line. Let's move it to the
thread_info structure instead. There's no need to do the same for the
pool_cache[] array since it's already quite large (32 pointers each).
By doing this the request rate increased by 1% on a 16-thread machine.
Willy Tarreau [Mon, 29 Jun 2020 08:11:24 +0000 (10:11 +0200)]
CLEANUP: pool: only include the type files from types
pool-t.h was mistakenly including the full-blown includes for threads,
lists and api instead of the types, and as such, CONFIG_HAP_LOCAL_POOLS
and CONFIG_HAP_LOCKLESS_POOLS were not visible everywhere.
Willy Tarreau [Mon, 29 Jun 2020 07:57:23 +0000 (09:57 +0200)]
REORG: includes: create tinfo.h for the thread_info struct
The thread_info struct is convenient to store various per-thread info
without having to resort to a painful thread_local storage which is
slow and painful to initialize.
The problem is, by having this one in thread.h it's very difficult to
add more entries there because everyone already includes thread.h so
conversely thread.h cannot reference certain types.
There's no point in having this there, instead let's create a new pair
of files, tinfo{,-t}.h, which declare the structure. This way it will
become possible to extend them with other includes and have certain
files store their own types there.
Olivier Houchard [Sun, 28 Jun 2020 14:23:09 +0000 (16:23 +0200)]
BUG/MEDIUM: checks: Increment the server's curr_used_conns
In tcpcheck_eval_connect(), if we're targetting a server, increase its
curr_used_conns when creating a new connection, as the counter will be
decreased later when the connection is destroyed and conn_free() is called.
Olivier Houchard [Sun, 28 Jun 2020 14:14:09 +0000 (16:14 +0200)]
BUG/MEDIUM: connections: Don't increase curr_used_conns for shared connections.
In connect_server(), we want to increase curr_used_conns only if the
connection is new, or if it comes from an idle_pool, otherwise it means
the connection is already used by at least one another stream, and it is
already accounted for.
Willy Tarreau [Sat, 27 Jun 2020 22:19:17 +0000 (00:19 +0200)]
MINOR: connection: align toremove_{lock,connections} and cleanup into idle_conns
We used to have 3 thread-based arrays for toremove_lock, idle_cleanup,
and toremove_connections. The problem is that these items are small,
and that this creates false sharing between threads since it's possible
to pack up to 8-16 of these values into a single cache line. This can
cause real damage where there is contention on the lock.
This patch creates a new array of struct "idle_conns" that is aligned
on a cache line and which contains all three members above. This way
each thread has access to its variables without hindering the other
ones. Just doing this increased the HTTP/1 request rate by 5% on a
16-thread machine.
The definition was moved to connection.{c,h} since it appeared a more
natural evolution of the ongoing changes given that there was already
one of them declared in connection.h previously.
Willy Tarreau [Sun, 28 Jun 2020 08:31:15 +0000 (10:31 +0200)]
BUG/MEDIUM: buffers: always allocate from the local cache first
It looked strange to see pool_evict_from_cache() always very present
on "perf top", but there was actually a reason to this: while b_free()
uses pool_free() which properly disposes the buffer into the local cache
and b_alloc_fast() allocates using pool_get_first() which considers the
local cache, b_alloc_margin() does not consider the local cache as it
only uses __pool_get_first() which only allocates from the shared pools.
The impact is that basically everywhere a buffer is allocated (muxes,
streams, applets), it's always picked from the shared pool (hence
involves locking) and is released to the local one and makes it grow
until it's required to trigger a flush using pool_evict_from_cache().
Buffers usage are thus not thread-local at all, and cause eviction of
a lot of possibly useful objects from the local caches.
Just fixing this results in a 10% request rate increase in an HTTP/1 test
on a 16-thread machine.
This bug was caused by recent commit ed891fd ("MEDIUM: memory: make local
pools independent on lockless pools") merged into 2.2-dev9, so not backport
is needed.
Willy Tarreau [Sat, 27 Jun 2020 23:24:12 +0000 (01:24 +0200)]
MINOR: cli: make "show sess" stop at the last known session
"show sess" and particularly "show sess all" can be very slow when dumping
lots of information, and while dumping, new sessions might appear, making
the output really endless. When threads are used, this causes a double
problem:
- all threads are paused during the dump, so an overly long dump degrades
the quality of service ;
- since all threads are paused, more events get postponed, possibly
resulting in more streams to be dumped on next invocation of the dump
function.
This patch addresses this long-lasting issue by doing something simple:
the CLI's stream is moved at the end of the steams list, serving as an
identifiable marker to end the dump, because all entries past it were
added after the command was entered. As a result, the CLI's stream always
appears as the last one.
It may make sense to backport this to stable branches where dumping live
streams is difficult as well.
Willy Tarreau [Sat, 27 Jun 2020 22:31:13 +0000 (00:31 +0200)]
BUG/MINOR: mux_h2: don't lose the leaving trace in h2_io_cb()
Commit cd4159f ("MEDIUM: mux_h2: Implement the takeover() method.")
added a return in the middle of the function, and as usual with such
stray return statements, some unrolling was lost. Here it's only the
TRACE_LEAVE() call, so it's mostly harmless. That's 2.2 only, no
backport is needed.
Willy Tarreau [Fri, 26 Jun 2020 20:01:04 +0000 (22:01 +0200)]
[RELEASE] Released version 2.2-dev11
Released version 2.2-dev11 with the following main changes :
- REGTEST: Add a simple script to tests errorfile directives in proxy sections
- BUG/MEDIUM: fcgi-app: Resolve the sink if a fcgi-app logs in a ring buffer
- BUG/MINOR: spoe: correction of setting bits for analyzer
- BUG/MINOR: cfgparse: Support configurations without newline at EOF
- MINOR: cfgparse: Warn on truncated lines / files
- BUG/MINOR: http_ana: clarify connection pointer check on L7 retry
- MINOR: debug: add a new DEBUG_FD build option
- BUG/MINOR: tasks: make sure never to exceed max_processed
- MINOR: task: add a new pointer to current tasklet queue
- BUG/MEDIUM: task: be careful not to run too many tasks at TL_URGENT
- BUG/MINOR: cfgparse: Fix argument reference in PARSE_ERR_TOOMANY message
- BUG/MINOR: cfgparse: Fix calculation of position for PARSE_ERR_TOOMANY message
- BUG/MEDIUM: ssl: fix ssl_bind_conf double free
- MINOR: ssl: free bind_conf_node in crtlist_free()
- MINOR: ssl: free the crtlist and the ckch during the deinit()
- BUG/MINOR: ssl: fix build with ckch_deinit() and crtlist_deinit()
- BUG/MINOR: ssl/cli: certs added from the CLI can't be deleted
- MINOR: ssl: move the ckch/crtlist deinit to ssl_sock.c
- MEDIUM: tasks: apply a fair CPU distribution between tasklet classes
- MINOR: tasks: make current_queue an index instead of a pointer
- MINOR: tasks: add a mask of the queues with active tasklets
- MINOR: tasks: pass the queue index to run_task_from_list()
- MINOR: tasks: make run_tasks_from_lists() scan the queues itself
- MEDIUM: tasks: add a tune.sched.low-latency option
- BUG/MEDIUM: ssl/cli: 'commit ssl cert' crashes when no private key
- BUG/MINOR: cfgparse: don't increment linenum on incomplete lines
- MINOR: tools: make parse_line() always terminate the args list
- BUG/MINOR: cfgparse: report extraneous args *after* the string is allocated
- MINOR: cfgparse: sanitize the output a little bit
- MINOR: cli/ssl: handle trailing slashes in crt-list commands
- MINOR: ssl: add the ssl_s_* sample fetches for server side certificate
- BUG/MEDIUM: http-ana: Don't loop trying to generate a malformed 500 response
- BUG/MINOR: stream-int: Don't wait to send truncated HTTP messages
- BUG/MINOR: http-ana: Set CF_EOI on response channel for generated responses
- BUG/MINOR: http-ana: Don't wait to send 1xx responses generated by HAProxy
- MINOR: spoe: Don't systematically create new applets if processing rate is low
- DOC: fix some typos in the ssl_s_{s|i}_dn documentation
- BUILD: fix ssl_sample.c when building against BoringSSL
- CI: travis-ci: switch BoringSSL builds to ninja
- CI: extend spellchecker whitelist
- DOC: assorted typo fixes in the documentation
- CLEANUP: assorted typo fixes in the code and comments
- MINOR: http: Add support for http 413 status
- REGTEST: ssl: tests the ssl_f_* sample fetches
- REGTEST: ssl: add some ssl_c_* sample fetches test
- DOC: ssl: update the documentation of "commit ssl cert"
- BUG/MINOR: cfgparse: correctly deal with empty lines
- BUG/MEDIUM: fetch: Fix hdr_ip misparsing IPv4 addresses due to missing NUL
Tim Duesterhus [Fri, 26 Jun 2020 13:44:48 +0000 (15:44 +0200)]
BUG/MEDIUM: fetch: Fix hdr_ip misparsing IPv4 addresses due to missing NUL
The IPv4 code did not take into account that the header value might not
contain the trailing NUL byte, possibly reading stray data after the header
value, failing the parse and testing the IPv6 branch. That one adds the
missing NUL, but fails to parse IPv4 addresses.
Fix this issue by always adding the trailing NUL.
The bug was reported on GitHub as issue #715.
It's not entirely clear when this bug started appearing, possibly earlier
versions of smp_fetch_hdr guaranteed the NUL termination. However the
addition of the NUL in the IPv6 case was added together with IPv6 support,
hinting that at that point in time the NUL was not guaranteed.
The commit that added IPv6 support was 69fa99292e689e355080d83ab19db4698b7c502b
which first appeared in HAProxy 1.5. This patch should be backported to
1.5+, taking into account the various buffer / chunk changes and the movement
across different files.
Willy Tarreau [Fri, 26 Jun 2020 15:24:54 +0000 (17:24 +0200)]
BUG/MINOR: cfgparse: correctly deal with empty lines
Issue 23653 in oss-fuzz reports a heap overflow bug which is in fact a
bug introduced by commit 9e1758efb ("BUG/MEDIUM: cfgparse: use
parse_line() to expand/unquote/unescape config lines") to address
oss-fuzz issue 22689, which was only partially fixed by commit 70f58997f
("BUG/MINOR: cfgparse: Support configurations without newline at EOF").
Actually on an empty line, end == line so we cannot dereference end-1
to check for a trailing LF without first being sure that end is greater
than line.
Ilya Shipitsin [Sat, 20 Jun 2020 18:38:37 +0000 (23:38 +0500)]
BUILD: fix ssl_sample.c when building against BoringSSL
BoringSSL does not have X509_get_X509_PUBKEY
let our emulation level define that for BoringSSL as well
Build log:
src/ssl_sample.o: In function `smp_fetch_ssl_x_key_alg':
/home/travis/build/haproxy/haproxy/src/ssl_sample.c:592: undefined reference to `X509_get_X509_PUBKEY'
clang-7: error: linker command failed with exit code 1 (use -v to see invocation)
Makefile:860: recipe for target 'haproxy' failed
make: *** [haproxy] Error 1
MINOR: spoe: Don't systematically create new applets if processing rate is low
When an event must be processed, we decide to create a new SPOE applet if there
is no idle applet at all or if the processing rate is lower than the number of
waiting events. But when the processing rate is very low (< 1 event/second), a
new applet is created independently of the number of idle applets.
Now, when there is at least one idle applet when there is only one event to
process, no new applet is created.
BUG/MINOR: http-ana: Don't wait to send 1xx responses generated by HAProxy
When an informational response (1xx) is returned by HAProxy, we must be sure to
send it ASAP. To do so, CF_SEND_DONTWAIT flag must be set on the response
channel to instruct the stream-interface to not set the CO_SFL_MSG_MORE flag on
the transport layer. Otherwise the response delivery may be delayed, because of
the commit 8945bb6c0 ("BUG/MEDIUM: stream-int: fix loss of CO_SFL_MSG_MORE flag
in forwarding").
This patch may be backported as far as 1.9, for HTX part only. But this part has
changed in the 2.2, so it may be a bit tricky. Note it does not fix any known
bug on previous versions because the CO_SFL_MSG_MORE flag is ignored by the h1
mux.
BUG/MINOR: http-ana: Set CF_EOI on response channel for generated responses
To be consistent with other processings on the channels, when HAProxy generates
a final response, the CF_EOI flag must be set on the response channel. This flag
is used to know that a full message was pushed into the channel (HTX messages
with an EOM block). It is used in conjunction with other channel's flags in
stream-interface functions. Especially when si_cs_send() is called, to know if
we must set or not the CO_SFL_MSG_MORE flag. Without CF_EOI, the CO_SFL_MSG_MORE
flag is always set and the message forwarding is delayed.
This patch may be backported as far as 1.9, for HTX part only. But this part has
changed in the 2.2, so it may be a bit tricky. Note it does not fix any known
bug on previous versions because the CO_SFL_MSG_MORE flag is ignored by the h1
mux.
BUG/MINOR: stream-int: Don't wait to send truncated HTTP messages
In HTX, since the commit 8945bb6c0 ("BUG/MEDIUM: stream-int: fix loss of
CO_SFL_MSG_MORE flag in forwarding"), the CO_SFL_MSG_MORE flag is set on the
transport layer if the end of the HTTP message is not reached, to delay the data
forwarding. To do so, the CF_EOI flag is tested and must not be set on the
output channel.
But the CO_SFL_MSG_MORE flag is also added if the message was truncated. Only
CF_SHUTR is set if this case. So the forwarding may be delayed to wait more data
that will never come. So, in HTX, the CO_SFL_MSG_MORE flag must not be set if
the message is finished (full or truncated).
BUG/MEDIUM: http-ana: Don't loop trying to generate a malformed 500 response
When HAProxy generates a 500 response, if the formatting failed, for instance
because the message is larger than a buffer, it retries to format it in loop. To
fix the bug, we must stop trying to send a response if it is a non-rewritable
response (TX_CONST_REPLY flag is set on the HTTP transaction).
Because this part is not trivial, some comments have been added.
Willy Tarreau [Thu, 25 Jun 2020 07:15:40 +0000 (09:15 +0200)]
MINOR: cfgparse: sanitize the output a little bit
With the rework of the config line parser, we've started to emit a dump
of the initial line underlined by a caret character indicating the error
location. But with extremely large lines it starts to take time and can
even cause trouble to slow terminals (e.g. over ssh), and this becomes
useless. In addition, control characters could be dumped as-is which is
bad, especially when the input file is accidently wrong (an executable).
This patch adds a string sanitization function which isolates an area
around the error position in order to report only that area if the string
is too large. The limit was set to 80 characters, which will result in
roughly 40 chars around the error being reported only, prefixed and suffixed
with "..." as needed. In addition, non-printable characters in the line are
now replaced with '?' so as not to corrupt the terminal. This way invalid
variable names, unmatched quotes etc will be easier to spot.
A typical output is now:
[ALERT] 176/092336 (23852) : parsing [bad.cfg:8]: forbidden first char in environment variable name at position 811957:
...c$PATH$PATH$d(xlc`%?$PATH$PATH$dgc?T$%$P?AH?$PATH$PATH$d(?$PATH$PATH$dgc?%...
^
Willy Tarreau [Thu, 25 Jun 2020 05:41:22 +0000 (07:41 +0200)]
BUG/MINOR: cfgparse: report extraneous args *after* the string is allocated
The config parser change in commit 9e1758efb ("BUG/MEDIUM: cfgparse: use
parse_line() to expand/unquote/unescape config lines") is wrong when
displaying the last parsed word, because it doesn't verify that the output
string was properly allocated. This may fail in two cases:
- very first line (outline is NULL, as in oss-fuzz issue 23657)
- much longer line than previous ones, requiring a realloc(), in which
case the final 0 is out of the allocated space.
This patch moves the reporting after the allocation check to fix this.
Willy Tarreau [Thu, 25 Jun 2020 05:35:42 +0000 (07:35 +0200)]
MINOR: tools: make parse_line() always terminate the args list
parse_line() as added in commit c8d167bcf ("MINOR: tools: add a new
configurable line parse, parse_line()") presents an difficult usage
because it's up to the caller to determine the last written argument
based on what was passed to it. In practice the only way to safely
use it is for the caller to always pass nbarg-1 and make that last
entry point to the last arg + its strlen. This is annoying because
it makes it as painful to use as the infamous strncpy() while it has
all the information the caller needs.
This patch changes its behavior so that it guarantees that at least
one argument will point to the trailing zero at the end of the output
string, as long as there is at least one argument. The caller just
has to pass +1 to the arg count to make sure at least a last one is
empty.
Willy Tarreau [Thu, 25 Jun 2020 07:37:54 +0000 (09:37 +0200)]
BUG/MINOR: cfgparse: don't increment linenum on incomplete lines
When fgets() returns an incomplete line we must not increment linenum
otherwise line numbers become incorrect. This may happen when parsing
files with extremely long lines which require a realloc().
The bug has been present since unbounded line length was supported, so
the fix should be backported to older branches.
Willy Tarreau [Wed, 24 Jun 2020 09:11:02 +0000 (11:11 +0200)]
MEDIUM: tasks: add a tune.sched.low-latency option
Now that all tasklet queues are scanned at once by run_tasks_from_lists(),
it becomes possible to always check for lower priority classes and jump
back to them when they exist.
This patch adds tune.sched.low-latency global setting to enable this
behavior. What it does is stick to the lowest ranked priority list in
which tasks are still present with an available budget, and leave the
loop to refill the tasklet lists if the trees got new tasks or if new
work arrived into the shared urgent queue.
Doing so allows to cut the latency in half when running with extremely
deep run queues (10k-100k), thus allowing forwarding of small and large
objects to coexist better. It remains off by default since it does have
a small impact on large traffic by default (shorter batches).
Willy Tarreau [Wed, 24 Jun 2020 08:17:29 +0000 (10:17 +0200)]
MINOR: tasks: make run_tasks_from_lists() scan the queues itself
Now process_runnable_tasks is responsible for calculating the budgets
for each queue, dequeuing from the tree, and calling run_tasks_from_lists().
This latter one scans the queues, picking tasks there and respecting budgets.
Note that its name was updated with a plural "s" for this reason.
Willy Tarreau [Wed, 24 Jun 2020 07:39:48 +0000 (09:39 +0200)]
MINOR: tasks: add a mask of the queues with active tasklets
It is neither convenient nor scalable to check each and every tasklet
queue to figure whether it's empty or not while we often need to check
them all at once. This patch introduces a tasklet class mask which gets
a bit 1 set for each queue representing one class of service. A single
test on the mask allows to figure whether there's still some work to be
done. It will later be usable to better factor the runqueue code.
Bits are set when tasklets are queued. They're cleared when queues are
emptied. It is possible that a queue is empty but has a bit if a tasklet
was added then removed, but this is not a problem as this is properly
checked for in run_tasks_from_list().
Willy Tarreau [Wed, 24 Jun 2020 07:19:50 +0000 (09:19 +0200)]
MINOR: tasks: make current_queue an index instead of a pointer
It will be convenient to have the tasklet queue number soon, better make
current_queue an index rather than a pointer to the queue. When not currently
running (e.g. from I/O), the index is -1.
Willy Tarreau [Wed, 24 Jun 2020 05:21:08 +0000 (07:21 +0200)]
MEDIUM: tasks: apply a fair CPU distribution between tasklet classes
Till now in process_runnable_tasks() we used to reserve a fixed portion
of max_processed to urgent tasks, then a portion of what remains for
normal tasks, then what remains for bulk tasks. This causes two issues:
- the current budget for processed tasks could be drained once for
all by higher level tasks so that they couldn't have enough left
for the next run. For example, if bulk tasklets cause task wakeups,
the required share to run them could be eaten by other bulk tasklets.
- it forces the urgent tasks to be run before scanning the tree so that
we know how many tasks to pick from the tree, and this isn't very
efficient cache-wise.
This patch changes this so that we compute upfront how max_processed will
be shared between classes that require so. We can then decide in advance
to pick a certain number of tasks from the tree, then execute all tasklets
in turn. When reaching the end, if there's still some budget, we can go
back and do the same thing again, improving chances to pick new work
before the global budget is depleted.
The default weights have been set to 50% for urgent tasklets, 37% for
normal ones and 13% for the bulk ones. In practice, there are not that
many urgent tasklets but when they appear they are cheap and must be
processed in as large batches as possible. Every time there is nothing
to pick there, the unused budget is shared between normal and bulk and
this allows bulk tasklets to still have quite some CPU to run on.
MINOR: ssl: move the ckch/crtlist deinit to ssl_sock.c
Move the ckch_deinit() and crtlist_deinit() call to ssl_sock.c,
also unlink the SNI from the ckch_inst because they are free'd before in
ssl_sock_free_all_ctx().
BUG/MINOR: ssl/cli: certs added from the CLI can't be deleted
In ticket #706 it was reported that a certificate which was added from
the CLI can't be removed with 'del ssl cert' and is marked as 'Used'.
The problem is that the certificate instances are not added to the
created crtlist_entry, so they can't be deleted upon a 'del ssl
crt-list', and the store can't never be marked 'Unused' because of this.
This patch fixes the issue by adding the instances to the crtlist_entry,
which is enough to fix the issue.
Since commit 2954c47 ("MEDIUM: ssl: allow crt-list caching"), the
ssl_bind_conf is allocated directly in the crt-list, and the crt-list
can be shared between several bind_conf. The deinit() code wasn't
changed to handle that.
This patch fixes the issue by removing the free of the ssl_conf in
ssl_sock_free_all_ctx().
It should be completed with a patch that free the ssl_conf and the
crt-list.
Tim Duesterhus [Tue, 23 Jun 2020 16:56:09 +0000 (18:56 +0200)]
BUG/MINOR: cfgparse: Fix argument reference in PARSE_ERR_TOOMANY message
The returned `arg` value is the number of arguments found, but in case
of the error message it's not a valid argument index.
Because we know how many arguments we allowed (MAX_LINE_ARGS) we know
what to print in the error message, so do just that.
Consider a configuration like this:
listen foo
1 2 3 [...] 64 65
Then running a configuration check within valgrind reports the following:
==18265== Conditional jump or move depends on uninitialised value(s)
==18265== at 0x56E8B83: vfprintf (vfprintf.c:1631)
==18265== by 0x57B1895: __vsnprintf_chk (vsnprintf_chk.c:63)
==18265== by 0x4A8642: vsnprintf (stdio2.h:77)
==18265== by 0x4A8642: memvprintf (tools.c:3647)
==18265== by 0x4CB8A4: print_message (log.c:1085)
==18265== by 0x4CE0AC: ha_alert (log.c:1128)
==18265== by 0x459E41: readcfgfile (cfgparse.c:1978)
==18265== by 0x507CB5: init (haproxy.c:2029)
==18265== by 0x4182A2: main (haproxy.c:3137)
==18265==
==18265== Use of uninitialised value of size 8
==18265== at 0x56E576B: _itoa_word (_itoa.c:179)
==18265== by 0x56E912C: vfprintf (vfprintf.c:1631)
==18265== by 0x57B1895: __vsnprintf_chk (vsnprintf_chk.c:63)
==18265== by 0x4A8642: vsnprintf (stdio2.h:77)
==18265== by 0x4A8642: memvprintf (tools.c:3647)
==18265== by 0x4CB8A4: print_message (log.c:1085)
==18265== by 0x4CE0AC: ha_alert (log.c:1128)
==18265== by 0x459E41: readcfgfile (cfgparse.c:1978)
==18265== by 0x507CB5: init (haproxy.c:2029)
==18265== by 0x4182A2: main (haproxy.c:3137)
==18265==
==18265== Conditional jump or move depends on uninitialised value(s)
==18265== at 0x56E5775: _itoa_word (_itoa.c:179)
==18265== by 0x56E912C: vfprintf (vfprintf.c:1631)
==18265== by 0x57B1895: __vsnprintf_chk (vsnprintf_chk.c:63)
==18265== by 0x4A8642: vsnprintf (stdio2.h:77)
==18265== by 0x4A8642: memvprintf (tools.c:3647)
==18265== by 0x4CB8A4: print_message (log.c:1085)
==18265== by 0x4CE0AC: ha_alert (log.c:1128)
==18265== by 0x459E41: readcfgfile (cfgparse.c:1978)
==18265== by 0x507CB5: init (haproxy.c:2029)
==18265== by 0x4182A2: main (haproxy.c:3137)
==18265==
==18265== Conditional jump or move depends on uninitialised value(s)
==18265== at 0x56E91AF: vfprintf (vfprintf.c:1631)
==18265== by 0x57B1895: __vsnprintf_chk (vsnprintf_chk.c:63)
==18265== by 0x4A8642: vsnprintf (stdio2.h:77)
==18265== by 0x4A8642: memvprintf (tools.c:3647)
==18265== by 0x4CB8A4: print_message (log.c:1085)
==18265== by 0x4CE0AC: ha_alert (log.c:1128)
==18265== by 0x459E41: readcfgfile (cfgparse.c:1978)
==18265== by 0x507CB5: init (haproxy.c:2029)
==18265== by 0x4182A2: main (haproxy.c:3137)
==18265==
==18265== Conditional jump or move depends on uninitialised value(s)
==18265== at 0x56E8C59: vfprintf (vfprintf.c:1631)
==18265== by 0x57B1895: __vsnprintf_chk (vsnprintf_chk.c:63)
==18265== by 0x4A8642: vsnprintf (stdio2.h:77)
==18265== by 0x4A8642: memvprintf (tools.c:3647)
==18265== by 0x4CB8A4: print_message (log.c:1085)
==18265== by 0x4CE0AC: ha_alert (log.c:1128)
==18265== by 0x459E41: readcfgfile (cfgparse.c:1978)
==18265== by 0x507CB5: init (haproxy.c:2029)
==18265== by 0x4182A2: main (haproxy.c:3137)
==18265==
==18265== Conditional jump or move depends on uninitialised value(s)
==18265== at 0x56E941A: vfprintf (vfprintf.c:1631)
==18265== by 0x57B1895: __vsnprintf_chk (vsnprintf_chk.c:63)
==18265== by 0x4A8642: vsnprintf (stdio2.h:77)
==18265== by 0x4A8642: memvprintf (tools.c:3647)
==18265== by 0x4CB8A4: print_message (log.c:1085)
==18265== by 0x4CE0AC: ha_alert (log.c:1128)
==18265== by 0x459E41: readcfgfile (cfgparse.c:1978)
==18265== by 0x507CB5: init (haproxy.c:2029)
==18265== by 0x4182A2: main (haproxy.c:3137)
==18265==
==18265== Conditional jump or move depends on uninitialised value(s)
==18265== at 0x56E8CAB: vfprintf (vfprintf.c:1631)
==18265== by 0x57B1895: __vsnprintf_chk (vsnprintf_chk.c:63)
==18265== by 0x4A8642: vsnprintf (stdio2.h:77)
==18265== by 0x4A8642: memvprintf (tools.c:3647)
==18265== by 0x4CB8A4: print_message (log.c:1085)
==18265== by 0x4CE0AC: ha_alert (log.c:1128)
==18265== by 0x459E41: readcfgfile (cfgparse.c:1978)
==18265== by 0x507CB5: init (haproxy.c:2029)
==18265== by 0x4182A2: main (haproxy.c:3137)
==18265==
==18265== Conditional jump or move depends on uninitialised value(s)
==18265== at 0x56E8CE2: vfprintf (vfprintf.c:1631)
==18265== by 0x57B1895: __vsnprintf_chk (vsnprintf_chk.c:63)
==18265== by 0x4A8642: vsnprintf (stdio2.h:77)
==18265== by 0x4A8642: memvprintf (tools.c:3647)
==18265== by 0x4CB8A4: print_message (log.c:1085)
==18265== by 0x4CE0AC: ha_alert (log.c:1128)
==18265== by 0x459E41: readcfgfile (cfgparse.c:1978)
==18265== by 0x507CB5: init (haproxy.c:2029)
==18265== by 0x4182A2: main (haproxy.c:3137)
==18265==
==18265== Conditional jump or move depends on uninitialised value(s)
==18265== at 0x56EA2DB: vfprintf (vfprintf.c:1632)
==18265== by 0x57B1895: __vsnprintf_chk (vsnprintf_chk.c:63)
==18265== by 0x4A8642: vsnprintf (stdio2.h:77)
==18265== by 0x4A8642: memvprintf (tools.c:3647)
==18265== by 0x4CB8A4: print_message (log.c:1085)
==18265== by 0x4CE0AC: ha_alert (log.c:1128)
==18265== by 0x459E41: readcfgfile (cfgparse.c:1978)
==18265== by 0x507CB5: init (haproxy.c:2029)
==18265== by 0x4182A2: main (haproxy.c:3137)
==18265==
[ALERT] 174/165720 (18265) : parsing [./config.cfg:2]: too many words, truncating at word 65, position -95900735: <(null)>.
[ALERT] 174/165720 (18265) : Error(s) found in configuration file : ./config.cfg
[ALERT] 174/165720 (18265) : Fatal errors found in configuration.
Valgrind reports conditional jumps relying on an undefined value and the
error message clearly shows incorrect stuff.
After this patch is applied the relying on undefined values is gone and
the <(null)> will actually show the argument. However the position value
still is incorrect. This will be fixed in a follow up patch.
Willy Tarreau [Tue, 23 Jun 2020 14:36:36 +0000 (16:36 +0200)]
BUG/MEDIUM: task: be careful not to run too many tasks at TL_URGENT
A test on large objects revealed a big performance loss from 2.1. The
cause was found to be related to cache locality between scheduled
operations that are batched using tasklets. It happens that we now
have several layers of tasklets and that queuing all these operations
leaves time to let memory objects cool down in the CPU cache, effectively
resulting in halving the performance.
A quick test consisting in putting most unknown tasklets into the BULK
queue almost fixed the performance regression, but this is a wrong
approach as it can also slow down some low-latency transfers or access
to applets like the CLI.
What this patch does instead is to queue unknown tasklets into the same
queue as the current one when tasklet_wakeup() is itself called from a
task/tasklet, otherwise it uses urgent for real I/O (when sched->current
is NULL). This results in the called tasklet being woken up much sooner,
often at the end of the current batch of tasklets.
By doing so, a test on 2 cores 4 threads with 256 concurrent H1 conns
transferring 16m objects with 256kB buffers jumped from 55 to 88 Gbps.
It's even possible to go as high as 101 Gbps by evaluating the URGENT
queue after the BULK one, though this was not done as considered
dangerous for latency sensitive operations.
This reinforces the importance of getting back the CPU transfer
mechanisms based on tasklet_wakeup_after() to work at the tasklet level
by supporting an immediate wakeup in certain cases.
Willy Tarreau [Tue, 23 Jun 2020 14:35:38 +0000 (16:35 +0200)]
MINOR: task: add a new pointer to current tasklet queue
In task_per_thread[] we now have current_queue which is a pointer to
the current tasklet_list entry being evaluated. This will be used to
know the class under which the current task/tasklet is currently
running.
Willy Tarreau [Tue, 23 Jun 2020 09:32:35 +0000 (11:32 +0200)]
BUG/MINOR: tasks: make sure never to exceed max_processed
We want to be sure not to exceed max_processed. It can actually go
slightly negative due to the rounding applied to ratios, but we must
refrain from processing too many tasks if it's already low.
This became particularly relevant since recent commit 5c8be272c ("MEDIUM:
tasks: also process late wakeups in process_runnable_tasks()") which was
merged into 2.2-dev10. No backport is needed.
Willy Tarreau [Tue, 23 Jun 2020 08:04:54 +0000 (10:04 +0200)]
MINOR: debug: add a new DEBUG_FD build option
When DEBUG_FD is set at build time, we'll keep a counter of per-FD events
in the fdtab. This counter is reported in "show fd" even for closed FDs if
not zero. The purpose is to help spot situations where an apparently closed
FD continues to be reported in loops, or where some events are dismissed.
Willy Tarreau [Tue, 23 Jun 2020 03:58:20 +0000 (05:58 +0200)]
BUG/MINOR: http_ana: clarify connection pointer check on L7 retry
Coverity reports a possible null deref in issue #703. It seems this
cannot happen as in order to have a CF_READ_ERROR we'd need to have
attempted a recv() which implies a conn_stream, thus conn cannot be
NULL anymore. But at least one line tests for conn and the other one
not, which is confusing. So let's add a check for conn before
dereferencing it.
This needs to be backported to 2.1 and 2.0. Note that in 2.0 it's
in proto_htx.c.
Miroslav Zagorac [Fri, 19 Jun 2020 20:17:17 +0000 (22:17 +0200)]
BUG/MINOR: spoe: correction of setting bits for analyzer
When a SPOE filter starts the response analyze, the wrong flag is tested on the
pre_analyzers bit field. AN_RES_INSPECT must be tested instead of
SPOE_EV_ON_TCP_RSP.
This patch must be backported to all versions with the SPOE support, i.e as far
as 1.7.
BUG/MEDIUM: fcgi-app: Resolve the sink if a fcgi-app logs in a ring buffer
If a fcgi application is configured to send its logs to a ring buffer, the
corresponding sink must be resolved during the configuration post
parsing. Otherwise, the sink is undefined when a log message is emitted,
crashing HAProxy.
Willy Tarreau [Fri, 19 Jun 2020 19:43:26 +0000 (21:43 +0200)]
[RELEASE] Released version 2.2-dev10
Released version 2.2-dev10 with the following main changes :
- BUILD: include: add sys/types before netinet/tcp.h
- BUG/MEDIUM: log: don't hold the log lock during writev() on a file descriptor
- BUILD: Remove nowarn for warnings that do not trigger
- BUG/MEDIUM: pattern: fix thread safety of pattern matching
- BUILD: Re-enable -Wimplicit-fallthrough
- BUG/MINOR: ssl: fix ssl-{min,max}-ver with openssl < 1.1.0
- BUILD: thread: add parenthesis around values of locking macros
- BUILD: proto_uxst: shut up yet another gcc's absurd warning
- BUG/MEDIUM: checks: Fix off-by-one in allocation of SMTP greeting cmd
- CI: travis-ci: use "-O1" for clang builds
- MINOR: haproxy: Add void deinit_and_exit(int)
- MINOR: haproxy: Make use of deinit_and_exit() for clean exits
- BUG/MINOR: haproxy: Free rule->arg.vars.expr during deinit_act_rules
- BUILD: compression: make gcc 10 happy with free_zlib()
- BUILD: atomic: add string.h for memcpy() on ARM64
- BUG/MINOR: http: make smp_fetch_body() report that the contents may change
- BUG/MINOR: tcp-rules: tcp-response must check the buffer's fullness
- BUILD: haproxy: mark deinit_and_exit() as noreturn
- BUG/MAJOR: vars: Fix bogus free() during deinit() for http-request rules
- BUG/MEDIUM: ebtree: use a byte-per-byte memcmp() to compare memory blocks
- MINOR: tools: add a new configurable line parse, parse_line()
- BUG/MEDIUM: cfgparse: use parse_line() to expand/unquote/unescape config lines
- BUG/MEDIUM: cfgparse: stop after a reasonable amount of fatal error
- MINOR: http: do not close connections anymore after internal responses
- BUG/MINOR: cfgparse: Add missing fatal++ in PARSE_ERR_HEX case
- BUG/MINOR: spoe: add missing key length check before checking key names
- MINOR: version: put the compiler version output into version.c not haproxy.c
- MINOR: compiler: always define __has_feature()
- MINOR: version: report the presence of the compiler's address sanitizer
- BUILD: Fix build by including haproxy/global.h
- BUG/MAJOR: connection: always disable ready events once reported
- CLEANUP: activity: remove unused counter fd_lock
- DOC: fd: make it clear that some fields ordering must absolutely be respected
- MINOR: activity: report the number of times poll() reports I/O
- MINOR: activity: rename confusing poll_* fields in the output
- MINOR: fd: Fix a typo in a coment.
- BUG/MEDIUM: fd: Don't fd_stop_recv() a fd we don't own.
- BUG/MEDIUM: fd: Call fd_stop_recv() when we just got a fd.
- MINOR: activity: group the per-loop counters at the top
- MINOR: activity: rename the "stream" field to "stream_calls"
- MEDIUM: fd: refine the fd_takeover() migration lock
- MINOR: fd: slightly optimize the fd_takeover double-CAS loop
- MINOR: fd: factorize the fd_takeover() exit path to make it safer
- MINOR: peers: do not use localpeer as an array anymore
- MEDIUM: peers: add the "localpeer" global option
- MEDIUM: fd: add experimental support for edge-triggered polling
- CONTRIB: debug: add the missing flags CO_FL_SAFE_LIST and CO_FL_IDLE_LIST
- MINOR: haproxy: process signals before runnable tasks
- MEDIUM: tasks: clean up the front side of the wait queue in wake_expired_tasks()
- MEDIUM: tasks: also process late wakeups in process_runnable_tasks()
- BUG/MINOR: cli: allow space escaping on the CLI
- BUG/MINOR: mworker/cli: fix the escaping in the master CLI
- BUG/MINOR: mworker/cli: fix semicolon escaping in master CLI
- REGTEST: http-rules: test spaces in ACLs
- REGTEST: http-rules: test spaces in ACLs with master CLI
- BUG/MAJOR: init: properly compute the default global.maxpipes value
- MEDIUM: map: make the "clear map" operation yield
- BUG/MEDIUM: stream-int: fix loss of CO_SFL_MSG_MORE flag in forwarding
- MINOR: mux_h1: Set H1_F_CO_MSG_MORE if we know we have more to send.
- BUG/MINOR: systemd: Wait for network to be online
- DOC: configuration: Unindent non-code sentences in the protobuf example
- DOC: configuration: http-check send was missing from matrix
Peter Gervai [Thu, 11 Jun 2020 16:26:36 +0000 (18:26 +0200)]
DOC: configuration: http-check send was missing from matrix
The new directive and its doc were added by commit 8acb1284b ("MINOR:
checks: Add a way to send custom headers and payload during http chekcs")
but the index was not updated.
Peter Gervai [Thu, 11 Jun 2020 16:05:11 +0000 (18:05 +0200)]
DOC: configuration: Unindent non-code sentences in the protobuf example
Unindent to make the explanation go back to text from code formatted
example in tyhe HTMLized version. Still it's not perfect since these
are not haproxy examples but protobuf config, but... way better.
Ryan O'Hara [Mon, 15 Jun 2020 16:34:54 +0000 (11:34 -0500)]
BUG/MINOR: systemd: Wait for network to be online
Change systemd service file to wait for network to be completely
online. This solves two problems:
If haproxy is configured to bind to IP address(es) that are not yet
assigned, haproxy would previously fail. The workaround is to use
"option transparent".
If haproxy us configured to use a resolver to resolve servers via DNS,
haproxy would previously fail due to the fact that the network is not
fully online yet. This is the most compelling reason for this patch.
Signed-off-by: Ryan O'Hara <rohara@redhat.com> Acked-by: Lukas Tribus <lukas@ltri.eu>
Olivier Houchard [Fri, 19 Jun 2020 14:15:05 +0000 (16:15 +0200)]
MINOR: mux_h1: Set H1_F_CO_MSG_MORE if we know we have more to send.
In h1_snd_buf(), also set H1_F_CO_MSG_MORE if we know we still have more to
send, not just if the stream-interface told us to do so. This may happen if
the last block of a transfer doesn't fit in the buffer, it remains useful
for the transport layer to know that more data follows what's already in
the buffer.
Willy Tarreau [Fri, 19 Jun 2020 15:07:06 +0000 (17:07 +0200)]
BUG/MEDIUM: stream-int: fix loss of CO_SFL_MSG_MORE flag in forwarding
In 2.2-dev1, a change was made by commit 46230363a ("MINOR: mux-h1: Inherit
send flags from the upper layer"). The purpose was to accurately set the
CO_SFL_MSG_MORE flag on the transport layer because previously it as only
set based on the buffer full condition, which does not accurately indicate
that there are more data to follow.
The problem is that the stream-interface never sets this flag anymore in
HTX mode due to the channel's to_forward always being set to infinity.
Because of this, HTX transfers are always performed without the MSG_MORE
flag and experience a severe performance degradation on large transfers.
This patch addresses this by making the stream-interface aware of HTX and
having it check for CF_EOI to check if more contents are expected or not.
With this change, the single-threaded forwarding performance on 10 MB
objects jumped from 29 to 40 Gbps.
Willy Tarreau [Fri, 20 Dec 2019 17:22:02 +0000 (18:22 +0100)]
MEDIUM: map: make the "clear map" operation yield
As reported in issue #419, a "clear map" operation on a very large map
can take a lot of time and freeze the entire process for several seconds.
This patch makes sure that pat_ref_prune() can regularly yield after
clearing some entries so that the rest of the process continues to work.
The first part, the removal of the patterns, can take quite some time
by itself in one run but it's still relatively fast. It may block for
up to 100ms for 16M IP addresses in a tree typically. This change needed
to declare an I/O handler for the clear operation so that we can get
back to it after yielding.
The second part can be much slower because it deconstructs the elements
and its users, but it iterates progressively so we can yield less often
here.
The patch was tested with traffic in parallel sollicitating the map being
released and showed no problem. Some traffic will definitely notice an
incomplete map but the filling is already not atomic anyway thus this is
not different.
It may be backported to stable versions once sufficiently tested for side
effects, at least as far as 2.0 in order to avoid the watchdog triggering
when the process is frozen there. For a better behaviour, all these
prune_* functions should support yielding so that the callers have a
chance to continue also yield in turn.
Willy Tarreau [Fri, 19 Jun 2020 14:20:59 +0000 (16:20 +0200)]
BUG/MAJOR: init: properly compute the default global.maxpipes value
Initial default settings for maxconn/maxsock/maxpipes were rearranged
in commit a409f30d0 ("MINOR: init: move the maxsock calculation code
to compute_ideal_maxsock()") but as a side effect, the calculated
maxpipes value was not stored anymore into global.maxpipes. This
resulted in splicing being disabled unless there is an explicit
maxpipes setting in the global section.
This patch just stores the calculated ideal value as planned in the
computation and as was done before the patch above.
BUG/MINOR: mworker/cli: fix semicolon escaping in master CLI
Fix the semicolon escaping which must be handled in the master CLI,
the commands were wrongly splitted and could be forwarded partially to
the target CLI.
BUG/MINOR: mworker/cli: fix the escaping in the master CLI
The master CLI must not do the escaping since it forwards the commands
to another CLI. It should be able to split into words by taking care of
the escaping, but must not remove the forwarded backslashes.
This fix do the same thing as the previous patch applied to the
cli_parse_request() function, by taking care of the escaping during the
word split, but it also remove the part which was removing the
backslashes from the forwarded command.
Willy Tarreau [Fri, 19 Jun 2020 10:17:55 +0000 (12:17 +0200)]
MEDIUM: tasks: also process late wakeups in process_runnable_tasks()
Since version 1.8, we've started to use tasks and tasklets more
extensively to defer I/O processing. Originally with the simple
scheduler, a task waking another one up using task_wakeup() would
have caused it to be processed right after the list of runnable ones.
With the introduction of tasklets, we've started to spill running
tasks from the run queues to the tasklet queues, so if a task wakes
another one up, it will only be executed on the next call to
process_runnable_task(), which means after yet another round of
polling loop.
This is particularly visible with I/Os hitting muxes: poll() reports
a read event, the connection layer performs a tasklet_wakeup() on the
mux subscribed to this I/O, and this mux in turn signals the upper
layer stream using task_wakeup(). The process goes back to poll() with
a null timeout since there's one active task, then back to checking all
possibly expired events, and finally back to process_runnable_tasks()
again. Worse, when there is high I/O activity, doing so will make the
task's execution further apart from the tasklet and will both increase
the total processing latency and reduce the cache hit ratio.
This patch brings back to the original spirit of process_runnable_tasks()
which is to execute runnable tasks as long as the execution budget is not
exhausted. By doing so, we're immediately cutting in half the number of
calls to all functions called by run_poll_loop(), and halving the number
of calls to poll(). Furthermore, calling poll() less often also means
purging FD updates less often and offering more chances to merge them.
This also has the nice effect of making tune.runqueue-depth effective
again, as in the past it used to be quickly bounded by this artificial
event horizon which was preventing from executing remaining tasks. On
certain workloads we can see a 2-3% performance increase.
Willy Tarreau [Fri, 19 Jun 2020 09:50:27 +0000 (11:50 +0200)]
MEDIUM: tasks: clean up the front side of the wait queue in wake_expired_tasks()
Due to the way the wait queue works, some tasks might be postponed but not
requeued. However when we exit wake_expired_tasks() on a not-yet-expired
task and leave it in this situation, the next call to next_timer_expiry()
will use this first task's key in the tree as an expiration date, but this
date might be totally off and cause needless wakeups just to reposition it.
This patch makes sure that we leave wake_expired_tasks with a clean state
of frontside tasks and that their tree's key matches their expiration date.
Doing so we can already observe a ~15% reduction of the number of wakeups
when dealing with large numbers of health checks.
The patch looks large because the code was rearranged but the real change
is to take the wakeup/requeue decision on the task's expiration date instead
of the tree node's key, the rest is unchanged.
Willy Tarreau [Fri, 19 Jun 2020 10:06:34 +0000 (12:06 +0200)]
MINOR: haproxy: process signals before runnable tasks
Nowadays signals cause tasks to be woken up. The historic code still
processes signals after tasks, which forces a second round in the loop
before they can effectively be processed. Let's move the signal queue
handling between wake_expired_tasks() and process_runnable_tasks() where
it makes much more sense.
Willy Tarreau [Thu, 18 Jun 2020 06:58:47 +0000 (08:58 +0200)]
MEDIUM: fd: add experimental support for edge-triggered polling
Some of the recent optimizations around the polling to save a few
epoll_ctl() calls have shown that they could also cause some trouble.
However, over time our code base has become totally asynchronous with
I/Os always attempted from the upper layers and only retried at the
bottom, making it look like we're getting closer to EPOLLET support.
There are showstoppers there such as the listeners which cannot support
this. But given that most of the epoll_ctl() dance comes from the
connections, we can try to enable edge-triggered polling on connections.
What this patch does is to add a new global tunable "tune.fd.edge-triggered",
that makes fd_insert() automatically set an et_possible bit on the fd if
the I/O callback is conn_fd_handler. When the epoll code sees an update
for such an FD, it immediately registers it in both directions the first
time and doesn't update it anymore.
On a few tests it proved quite useful with a 14% request rate increase in
a H2->H1 scenario, reducing the epoll_ctl() calls from 2 per request to
2 per connection.
The option is obviously disabled by default as bugs are still expected,
particularly around the subscribe() code where it is possible that some
layers do not always re-attempt reading data after being woken up.
Dragan Dosen [Thu, 18 Jun 2020 16:24:05 +0000 (18:24 +0200)]
MEDIUM: peers: add the "localpeer" global option
localpeer <name>
Sets the local instance's peer name. It will be ignored if the "-L"
command line argument is specified or if used after "peers" section
definitions. In such cases, a warning message will be emitted during
the configuration parsing.
This option will also set the HAPROXY_LOCALPEER environment variable.
See also "-L" in the management guide and "peers" section in the
configuration manual.
Willy Tarreau [Thu, 18 Jun 2020 06:14:59 +0000 (08:14 +0200)]
MINOR: fd: factorize the fd_takeover() exit path to make it safer
Since there was a risk of leaving fd_takeover() without properly
stopping the fd, let's take this opportunity for factoring the code
around a commont exit point that's common to both double-cas and locked
modes. This means using the "ret" variable inside the double-CAS code,
and inverting the loop to first test the old values. Doing do also
produces cleaner code because the compiler cannot factorize common
exit paths using asm statements that are present in some atomic ops.
Willy Tarreau [Thu, 18 Jun 2020 06:05:15 +0000 (08:05 +0200)]
MINOR: fd: slightly optimize the fd_takeover double-CAS loop
The loop in fd_takeover() around the double-CAS is conditionned on
a previous value of old_masks[0] that always matches tid_bit on the
first iteration because it does not result from the atomic op but
from a pre-loaded value. Let's set the result of the atomic op there
instead so that the conflict between threads can be detected earlier
and before performing the double-word CAS.
Willy Tarreau [Thu, 18 Jun 2020 05:28:09 +0000 (07:28 +0200)]
MEDIUM: fd: refine the fd_takeover() migration lock
When haproxy is compiled without double-word CAS, we use a migration lock
in fd_takeover(). This lock was covering the atomic OR on the running_mask
before checking its value, while it is not needed since this atomic op
already returns the result. Let's just refine the code to avoid grabbing
the lock in the event another thread has already stolen the FD, this may
reduce contention in high reuse rate scenarios.
Willy Tarreau [Wed, 17 Jun 2020 18:49:49 +0000 (20:49 +0200)]
MINOR: activity: rename the "stream" field to "stream_calls"
This one was confusingly called, I thought it was the cumulated number
of streams but it's the number of calls to process_stream(). Let's make
this clearer.
Willy Tarreau [Wed, 17 Jun 2020 18:44:28 +0000 (20:44 +0200)]
MINOR: activity: group the per-loop counters at the top
empty_rq and long_rq are per-loop so it makes sense to group them
together with the loop count. In addition since ctxsw and tasksw
apply in the context of these counters, let's move them as well.
More precisely the difference between wake_tasks and long_rq should
roughly correspond to the number of inter-task messages. Visually
it's much easier to spot ratios of wakeup causes now.
Olivier Houchard [Wed, 17 Jun 2020 18:34:05 +0000 (20:34 +0200)]
BUG/MEDIUM: fd: Call fd_stop_recv() when we just got a fd.
In fd_takeover(), when a double-width compare-and-swap is implemented,
make sure, if we managed to get the fd, to call fd_stop_recv() on it, so
that the thread that used to own it will know it has to stop polling it.
Olivier Houchard [Wed, 17 Jun 2020 18:32:34 +0000 (20:32 +0200)]
BUG/MEDIUM: fd: Don't fd_stop_recv() a fd we don't own.
In fd_takeover(), if we failed to grab the fd, when a double-width
compare-and-swap is not implemented, do not call fd_stop_recv() on the
fd, it is not ours and may be used by another thread.
Willy Tarreau [Wed, 17 Jun 2020 18:35:33 +0000 (20:35 +0200)]
MINOR: activity: rename confusing poll_* fields in the output
We have poll_drop, poll_dead and poll_skip which are confusingly named
like their poll_io and poll_exp counterparts except that they are not
per poll() call but per-fd. This patch renames them to poll_drop_fd(),
poll_dead_fd() and poll_skip_fd() for this reason.
Willy Tarreau [Wed, 17 Jun 2020 18:25:18 +0000 (20:25 +0200)]
MINOR: activity: report the number of times poll() reports I/O
The "show activity" output mentions a number of indicators to explain
wake up reasons but doesn't have the number of times poll() sees some
I/O. And given that multiple events can happen simultaneously, it's
not always possible to deduce this metric by subtracting.
This patch adds a new "poll_io" counter that allows one to see how
often poll() returns with at least one active FD. This should help
detect stuck events and measure various ratios of poll sub-metrics.
Willy Tarreau [Wed, 17 Jun 2020 17:58:37 +0000 (19:58 +0200)]
DOC: fd: make it clear that some fields ordering must absolutely be respected
fd_set_running() and fd_takeover() may both use a double-word CAS on the
(running_mask, thread_mask) couple and as such they expect the fields to
be exactly arranged like this. It's critical not to reorder them, so add
a comment to avoid such a potential mistake later.
Willy Tarreau [Wed, 17 Jun 2020 17:12:43 +0000 (19:12 +0200)]
CLEANUP: activity: remove unused counter fd_lock
Since 2.1-dev2, with commit 305d5ab46 ("MAJOR: fd: Get rid of the fd cache.")
we don't have the fd_lock anymore and as such its acitvity counter is always
zero. Let's remove it from the struct and from "show activity" output, as
there are already plenty of indicators to look at.
The cache line comment in the struct activity was updated to reflect
reality as it looks like another one already got removed in the past.
Willy Tarreau [Wed, 17 Jun 2020 14:26:22 +0000 (16:26 +0200)]
BUG/MAJOR: connection: always disable ready events once reported
This effectively reverts the two following commits:
6f95f6e11 ("OPTIM: connection: disable receiving on disabled events when the run queue is too high") 065a02561 ("MEDIUM: connection: don't stop receiving events in the FD handler")
The problem as reported in issue #662 is that when the events signals
the readiness of input data that has to be forwarded over a congested
stream, the mux will read data and wake the stream up to forward them,
but the buffer full condition makes this impossible immediately, then
nobody in the chain will be able to disable the event after it was
first reported. And given we don't know at the connection level whether
an event was already reported or not, we can't decide anymore to
forcefully stop it if for any reason its processing gets delayed.
The problem is magnified in issue #662 by the fact that a shutdown is
reported with pending data occupying the buffer. The shutdown will
strike in loops and cause the upper layer stream to be notified until
it's handled, but with a buffer full it's not possible to call cs_recv()
hence to purge the event.
All this can only be handled optimally by implementing a lower layer,
direct mux-to-mux forwarding that will not require any scheduling. This
was no wake up will be needed and the event will be instantly handled
or paused for a long time.
For now let's simply revert these optimizations. Running a 1 MB transfer
test over H2 using 8 connections having each 32 streams with a limited
link of 320 Mbps shows the following profile before this fix:
Willy Tarreau [Tue, 16 Jun 2020 17:14:19 +0000 (19:14 +0200)]
MINOR: version: report the presence of the compiler's address sanitizer
Since we've seen clang emit bad code when the address sanitizer is enabled
at -O2, better clearly report it in the version output. It is detected both
for clang and gcc (both tested with and without).
Willy Tarreau [Tue, 16 Jun 2020 17:13:24 +0000 (19:13 +0200)]
MINOR: compiler: always define __has_feature()
This macro is provided by clang but gcc lacks it. Not having it makes
it painful to test features on both compilers. Better define it to zero
when not available so that __has_feature(foo) never errors.
Willy Tarreau [Tue, 16 Jun 2020 17:11:11 +0000 (19:11 +0200)]
MINOR: version: put the compiler version output into version.c not haproxy.c
For an unknown reason in commit bb1b63c079 I placed the compiler version
output in haproxy.c instead of version.c. Better have it in version.c which
is more suitable to this sort of things.
The spoe parser fails to check that the decoded key length is large
enough to match a given key but it uses the returned length in memcmp().
So returning "ver" could match "version" for example. In addition this
makes clang 10's ASAN complain because the second argument to memcmp()
is the static key which is shorter than the decoded buffer size, which
in practice has no impact.
I'm still not 100% sure the parser is entirely correct because even
with this fix it cannot parse a key whose name matches the beginning
of another one, but in practice this does not happen. Ideally a
preliminary length check before the comparison would be safer.