This flag was used in some internal functions to be sure the current stream is
able to handle HTTP content. It was introduced when the legacy HTTP code was
still there. Now, It is possible to rely on stream's flags to be sure we have an
HTX stream.
So the flag HLUA_TXN_HTTP_RDY can be removed. Everywhere it was tested, it is
replaced by a call to the IS_HTX_STRM() macro.
This patch is mandatory to allow the support of the filters written in lua.
MINOR: lua: Stop using the lua txn in hlua_http_rep_hdr()
In this function, the lua txn was only used to retrieve the stream. But it can
be retieve from the HTTP message, using its channel pointer. So, the lua txn can
be removed from the function argument list.
This patch is mandatory to allow the support of the filters written in lua.
MINOR: lua: Stop using the lua txn in hlua_http_get_headers()
In this function, the lua txn was only used to test if the HTTP transaction is
defined. But it is always used in a context where it is true. So, the lua txn
can be removed from the function argument list.
This patch is mandatory to allow the support of the filters written in lua.
BUG/MINOR: lua: Ignore the reserve to know if a channel is full or not
The Lua function Channel.is_full() should not take care of the reserve because
it is not called from a producer (an applet for instance). From an action, it is
allowed to overwrite the buffer reserve.
This patch should be backported as far as 1.7. But it must be adapted for 1.8
and lower because there is no HTX on these versions.
BUG/MINOR: lua: Abort when txn:done() is called from a Lua action
When a lua action aborts a transaction calling txn:done() function, the action
must return ACT_RET_ABRT instead of ACT_RET_DONE. It is mandatory to
abort the message analysis.
This patch must be backported everywhere the commit 7716cdf45 ("MINOR: lua: Get
the action return code on the stack when an action finishes") was
backported. For now, no backport needed.
BUG/MINOR: http-ana: Reset request analysers on a response side error
When an error occurred on the response side, request analysers must be reset. At
this stage, only AN_REQ_HTTP_XFER_BODY analyser remains, and possibly
AN_REQ_FLT_END, if at least one filter is attached to the stream. So it is safe
to remove the AN_REQ_HTTP_XFER_BODY analyser. An error was already handled and a
response was already returned to the client (or it was at least scheduled to be
sent). So there is no reason to continue to process the request payload. It may
cause some troubles for the filters because when an error occurred, data from
the request buffer are truncated.
This patch must be backported as far as 1.9, for the HTX part only. I don't know
if the legacy HTTP code is affected.
BUG/MEDIUM: compression/filters: Fix loop on HTX blocks compressing the payload
During the payload filtering, the offset is relative to the head of the HTX
message and not its first index. This index is the position of the first block
to (re)start the HTTP analysis. It must be used during HTTP analysis but not
during the payload forwarding.
So, from the compression filter point of view, when we loop on the HTX blocks to
compress the response payload, we must start from the head of the HTX
message. To ease the loop, we use the function htx_find_offset().
This patch must be backported as far as 2.0. It depends on the commit "MINOR:
htx: Add a function to return a block at a specific an offset". So this one must
be backported first.
BUG/MEDIUM: cache/filters: Fix loop on HTX blocks caching the response payload
During the payload filtering, the offset is relative to the head of the HTX
message and not its first index. This index is the position of the first block
to (re)start the HTTP analysis. It must be used during HTTP analysis but not
during the payload forwarding.
So, from the cache point of view, when we loop on the HTX blocks to cache the
response payload, we must start from the head of the HTX message. To ease the
loop, we use the function htx_find_offset().
This patch must be backported as far as 2.0. It depends on the commit "MINOR:
htx: Add a function to return a block at a specific an offset". So this one must
be backported first.
BUG/MINOR: filters: Forward everything if no data filters are called
If a filter enable the data filtering, in TCP or in HTTP, but it does not
defined the corresponding callback function (so http_payload() or
tcp_payload()), it will be ignored. If all configured data filter do the same,
we must be sure to forward everything. Otherwise nothing will be forwarded at
all.
BUG/MINOR: filters: Use filter offset to decude the amount of forwarded data
When the tcp or http payload is filtered, it is important to use the filter
offset to decude the amount of forwarded data because this offset may change
during the call to the callback function. So we should not rely on a local
variable defined before this call.
For now, existing HAproxy filters don't change this offset, so this bug may only
affect external filters.
MINOR: htx: Add a function to return a block at a specific offset
The htx_find_offset() function may be used to look for a block at a specific
offset in an HTX message, starting from the message head. A compound result is
returned, an htx_ret structure, with the found block and the position of the
offset in the block. If the offset is ouside of the HTX message, the returned
block is NULL.
MINOR: buf: Add function to insert a string at an absolute offset in a buffer
The b_insert_blk() function may now be used to insert a string, given a pointer
and the string length, at an absolute offset in a buffer, moving data between
this offset and the buffer's tail just after the end of the inserted string. The
buffer's length is automatically updated. This function supports wrapping. All
the string is copied or nothing. So it returns 0 if there are not enough space
to perform the copy. Otherwise, the number of bytes copied is returned.
This patch fixes PROXYv2 parsing when the payload of the TCP connection is
fused with the PROXYv2 header within a single recv() call.
Previously HAProxy ignored the PROXYv2 header length when attempting to
parse the TLV, possibly interpreting the first byte of the payload as a
TLV type.
This patch adds proper validation. It ensures that:
1. TLV parsing stops when the end of the PROXYv2 header is reached.
2. TLV lengths cannot exceed the PROXYv2 header length.
3. The PROXYv2 header ends together with the last TLV, not allowing for
"stray bytes" to be ignored.
A reg-test was added to ensure proper behavior.
This patch tries to find the sweat spot between a small and easily
backportable one, and a cleaner one that's more easily adaptable to
older versions, hence why it merges the "if" and "while" blocks which
causes a reindent of the whole block. It should be used as-is for
versions 1.9 to 2.1, the block about PP2_TYPE_AUTHORITY should be
dropped for 2.0 and the block about CRC32C should be dropped for 1.8.
This bug was introduced when TLV parsing was added. This happened in commit b3e54fe387c7c1ea750f39d3029672d640c499f9. This commit was first released
with HAProxy 1.6-dev1.
Willy Tarreau [Fri, 6 Mar 2020 09:25:31 +0000 (10:25 +0100)]
BUG/MINOR: init: make the automatic maxconn consider the max of soft/hard limits
James Stroehmann reported something working as documented but that can
be considered as a regression in the way the automatic maxconn is
calculated from the process' limits :
The purpose of the changes in 2.0 was to have maxconn default to the
highest possible value permitted to the user based on the ulimit -n
setting, however the calculation starts from the soft limit, which
can be lower than what users were allowed to with previous versions
where the default value of 2000 would force a higher ulimit -n as
long as it fitted in the hard limit.
Usually this is not noticeable if the user changes the limits, because
quite commonly setting a new value restricts both the soft and hard
values.
Let's instead always use the max between the hard and soft limits, as
we know these values are permitted. This was tried on the following
setup:
OPTIM: startup: fast unique_id allocation for acl.
pattern_finalize_config() uses an inefficient algorithm which is a
problem with very large configuration files. This affects startup, and
therefore reload time. When haproxy is deployed as a router in a
Kubernetes cluster the generated configuration file may be large and
reloads are frequently occuring, which makes this a significant issue.
The old algorithm is O(n^2)
* allocate missing uids - O(n^2)
* sort linked list - O(n^2)
The new algorithm is O(n log n):
* find the user allocated uids - O(n)
* store them for efficient lookup - O(n log n)
* allocate missing uids - n times O(log n)
* sort all uids - O(n log n)
* convert back to linked list - O(n)
Willy Tarreau [Thu, 5 Mar 2020 16:16:24 +0000 (17:16 +0100)]
MINOR: debug: add CLI command "debug dev write" to write an arbitrary size
This command is used to produce an arbitrary amount of data on the
output. It can be used to test the CLI's state machine as well as
the internal parts related to applets an I/O. A typical test consists
in asking for all sizes from 0 to 16384:
$ (echo "prompt;expert-mode on";for i in {0..16384}; do
echo "debug dev write $i"; done) | socat - /tmp/sock1 | wc -c 134258738
A better test would consist in first waiting for the response before
sending a new request.
This command is not restricted to the admin since it's harmless.
It's just caused by an inconditional assignment of tmp_filter to
*sni_filter without having been initialized, though it's harmless because
this return pointer is not used when fcount is NULL, which is the only
case where this happens.
No backport is needed as this was brought today by commit 38df1c8006
("MINOR: ssl/cli: support crt-list filters").
Generate a list of the previous filters when updating a certificate
which use filters in crt-list. Then pass this list to the function
generating the sni_ctx during the commit.
This feature allows the update of the crt-list certificates which uses
the filters with "set ssl cert".
This function could be probably replaced by creating a new
ckch_inst_new_load_store() function which take the previous sni_ctx list as
an argument instead of the char **sni_filter, avoiding the
allocation/copy during runtime for each filter. But since are still
handling the multi-cert bundles, it's better this way to avoid code
duplication.
Willy Tarreau [Thu, 5 Mar 2020 06:41:20 +0000 (07:41 +0100)]
BUG/MINOR: connection/debug: do not enforce !event_type on subscribe() anymore
When building with DEBUG_STRICT, there are still some BUG_ON(events&event_type)
in the subscribe() code which are not welcome anymore since we explicitly
permit to wake the caller up on readiness. This causes some regtests to fail
since 2c1f37d353 ("OPTIM: mux-h1: subscribe rather than waking up at a few
other places") when built with this option.
Tim Duesterhus [Fri, 28 Feb 2020 14:13:32 +0000 (15:13 +0100)]
REGTEST: Add unique-id reg-test
This reg-test verifies the following behavior:
1. That unique IDs are stable (i.e. the bug fixed in 530408f976e5fe2f2f2b4b733b39da36770b566f)
2. That unique IDs can use values from the HTTP request (see https://www.mail-archive.com/haproxy@formilux.org/msg36436.html)
Willy Tarreau [Wed, 4 Mar 2020 16:50:02 +0000 (17:50 +0100)]
OPTIM: mux-h1: subscribe rather than waking up at a few other places
This is another round of conversion from a blind tasklet_wakeup() to
a more careful subscribe(). It has significantly improved the number
of function calls per HTTP request (/?s=1k/t=20) :
before after
tasklet_wakeup: 3 2
conn_subscribe: 3 2
h1_iocb: 3 2
h1_process: 3 2
h1_parse_msg_hdrs: 4 3
h1_rcv_buf: 5 3
h1_send: 5 4
h1_subscribe: 2 1
h1_wake_stream_for_send: 5 4
http_wait_for_request: 2 1
process_stream: 3 2
si_cs_io_cb: 4 2
si_cs_process: 3 1
si_cs_rcv: 5 3
si_sync_send: 2 1
si_update_both: 2 1
stream_int_chk_rcv_conn: 3 2
stream_int_notify: 3 1
stream_release_buffers: 9 4
Willy Tarreau [Wed, 4 Mar 2020 17:33:19 +0000 (18:33 +0100)]
OPTIM: connection: disable receiving on disabled events when the run queue is too high
In order to save a lot on syscalls, we currently don't disable receiving
on a file descriptor anymore if its handler was already woken up. But if
the run queue is huge and the poller collects a lot of events, this causes
excess wakeups which take CPU time which is not used to flush these tasklets.
This patch simply considers the run queue size to decide whether or not to
stop receiving. Tests show that by stopping receiving when the run queue
reaches ~16 times its configured size, we can still hold maximal performance
in extreme situations like maxpollevents=20k for runqueue_depth=2, and still
totally avoid calling epoll_event under moderate load using default settings
on keep-alive connections.
Willy Tarreau [Wed, 4 Mar 2020 16:45:21 +0000 (17:45 +0100)]
MEDIUM: connection: only call ->wake() for connect() without I/O
We used to call ->wake() for any I/O event for which there was no
subscriber. But this is a problem because this causes massive wake()
storms since we disabled fd_stop_recv() to save syscalls.
The only reason for the io_available condition is to detect that an
asynchronous connect() just finished and will not be handled by any
registered event handler. Since we now properly handle synchronous
connects, we can detect this situation by the fact that we had a
success on conn_fd_check() and no requested I/O took over.
Willy Tarreau [Wed, 4 Mar 2020 15:42:03 +0000 (16:42 +0100)]
MEDIUM: stream-int: make sure to try to immediately validate the connection
In the rare case of immediate connect() (unix sockets, socket pairs, and
occasionally TCP over the loopback), it is counter-productive to subscribe
for sending and then getting immediately back to process_stream() after
having passed through si_cs_process() just to update the connection. We
already know it is established and it doesn't have any handshake anymore
so we just have to complete it and return to process_stream() with the
stream_interface in the SI_ST_RDY state. In this case, process_stream will
simply loop back to the beginning to synchronize the state and turn it to
SI_ST_EST/ASS/CLO/TAR etc.
This will save us from having to needlessly subscribe in the connect()
code, something which in addition cannot work with edge-triggered pollers.
Willy Tarreau [Wed, 4 Mar 2020 16:22:10 +0000 (17:22 +0100)]
BUG/MEDIUM: connection: stop polling for sending when the event is ready
With commit 065a025610 ("MEDIUM: connection: don't stop receiving events
in the FD handler") we disabled a number of fd_stop_* in conn_fd_handler(),
in order to wait for their respective handlers to deal with them. But it
is not correct to do that for the send direction, as we may very well
have nothing to send. This is visible when connecting in TCP mode to
a server with no data to send, there's nobody anymore to disable the
polling for the send direction.
And it is logical, on the recv() path we know the system has data to
deliver and that some code will be in charge of it. On the send
direction we simply don't know if it was the result of a successful
connect() or if there is still something to send. In any case we
almost never fill the network buffer on a single send() after being
woken up by the system, so disabling the FD immediately or much later
will not change the number of operations.
Willy Tarreau [Wed, 4 Mar 2020 15:21:06 +0000 (16:21 +0100)]
BUILD: makefile: do not modify the build options during make reg-tests
I'm quite fed up with having to rebuild everything from scratch after each
and every "make reg-tests", especially during bisects. The only reason for
this is that there are no build options passed to make for reg-tests, which
modifies the .build_opts file again, resulting in a change upon next build.
Let's just keep this file out of the dependency check for make reg-tests.
Willy Tarreau [Wed, 4 Mar 2020 09:31:58 +0000 (10:31 +0100)]
BUILD: tools: rely on __ELF__ not USE_DL to enable use of dladdr()
The approach was wrong. USE_DL is for the makefile to know if it's required
to append -ldl at link time. Some platforms do not need it (and in fact do
not have it) yet they have a working dladdr(). The real condition is related
to ELF. Given that due to Lua, all platforms that require -ldl already have
USE_DL set, let's replace USE_DL with __ELF__ here and consider that dladdr
is always needed on ELF, which basically is already the case.
Willy Tarreau [Wed, 4 Mar 2020 09:19:36 +0000 (10:19 +0100)]
BUILD: tools: unbreak resolve_sym_name() on non-GNU platforms
resolve_sym_name() doesn't build when USE_DL is set on non-GNU platforms
because "Elf(W)" isn't defined. Since it's only used for dladdr1(), let's
refactor all this so that we can completely ifdef out that part on other
platforms. Now we have a separate function to perform the call depending
on the platform and it only returns the size when available.
Willy Tarreau [Wed, 4 Mar 2020 06:39:32 +0000 (07:39 +0100)]
MINOR: debug: dump the whole trace if we can't spot the starting point
Instead of special-casing the use of the symbol resolving to decide
whether to dump a partial or complete trace, let's simply start over
and dump everything when we reach the end after having found nothing.
It will be more robust against dirty traces as well.
So let's add clang+x86_64 to the list of platforms that will use our
simplified version. As a bonus it will not require to link with
-lexecinfo on FreeBSD and will work out of the box when passing
USE_BACKTRACE=1.
Willy Tarreau [Wed, 4 Mar 2020 06:44:06 +0000 (07:44 +0100)]
MINOR: debug: improve backtrace() on aarch64 and possibly other systems
It happens that on aarch64 backtrace() only returns one entry (tested
with gcc 4.7.4, 5.5.0 and 7.4.1). Probably that it refrains from unwinding
the stack due to the risk of hitting a bad pointer. Here we can use
may_access() to know when it's safe, so we can actually unwind the stack
without taking risks. It happens that the faulting function (the one
just after the signal handler) is not listed here, very likely because
the signal handler uses a special stack and did not create a new frame.
So this patch creates a new my_backtrace() function in standard.h that
either calls backtrace() or does its own unrolling. The choice depends
on HA_HAVE_WORKING_BACKTRACE which is set in compat.h based on the build
target.
Willy Tarreau [Wed, 4 Mar 2020 06:38:23 +0000 (07:38 +0100)]
MINOR: debug: report the number of entries in the backtrace
It's useful to get an indication of unresolved stuff or memory
corruption to have the apparent depth of the stack trace in the
output, especially if we dump nothing.
Willy Tarreau [Wed, 4 Mar 2020 09:48:18 +0000 (10:48 +0100)]
MEDIUM: wdt: fall back to CLOCK_REALTIME if CLOCK_THREAD_CPUTIME is not available
At least FreeBSD has a fully functional CLOCK_THREAD_CPUTIME but it
cannot create a timer on it. This is not a problem since our timer is
only used to measure each thread's usage using now_cpu_time_thread().
So by just replacing this clock with CLOCK_REALTIME we allow such
platforms to periodically call the wdt and check the thread's CPU usage.
The consequence is that even on a totally idle system there will still
be a few extra periodic wakeups, but the watchdog becomes usable there
as well.
Willy Tarreau [Wed, 4 Mar 2020 07:31:47 +0000 (08:31 +0100)]
BUILD: Makefile: include librt before libpthread
Statically building on for i386/x86_64 on linux+glibc 2.18 fails in rt with
undefined references to pthread_attr_init and a few others. Let's just swap
the two libs in order to fix this.
Willy Tarreau [Wed, 4 Mar 2020 09:46:13 +0000 (10:46 +0100)]
BUG/MINOR: wdt: do not return an error when the watchdog couldn't be enabled
On operating systems not supporting to create a timer on
POSIX_THREAD_CPUTIME we emit a warning but we return an error so the
process fails to start, which is absurd. Let's return a success once
the warning is emitted instead.
Emmanuel Hocdet [Mon, 16 Dec 2019 15:39:17 +0000 (16:39 +0100)]
MINOR: ssl: add "ca-verify-file" directive
It's only available for bind line. "ca-verify-file" allows to separate
CA certificates from "ca-file". CA names sent in server hello message is
only compute from "ca-file". Typically, "ca-file" must be defined with
intermediate certificates and "ca-verify-file" with certificates to
ending the chain, like root CA.
Willy Tarreau [Wed, 4 Mar 2020 05:01:40 +0000 (06:01 +0100)]
MINOR: debug: call backtrace() once upon startup
Calling backtrace() will access libgcc at runtime. We don't want to do
it after the chroot, so let's perform a first call to have it ready in
memory for later use.
Willy Tarreau [Tue, 3 Mar 2020 14:40:23 +0000 (15:40 +0100)]
MEDIUM: debug: add support for dumping backtraces of stuck threads
When a panic() occurs due to a stuck thread, we'll try to dump a
backtrace of this thread if the config directive USE_BACKTRACE is
set (which is the case on linux+glibc). For this we use the
backtrace() call provided by glibc and iterate the pointers through
resolve_sym_name(). In order to minimize the output (which is limited
to one buffer), we only do this for stuck threads, and we start the
dump above ha_panic()/ha_thread_dump_all_to_trash(), and stop when
meeting known points such as main/run_tasks_from_list/run_poll_loop.
If enabled without USE_DL, the dump will be complete with no details
except that pointers will all be given relative to main, which is
still better than nothing.
The new USE_BACKTRACE config option is enabled by default on glibc since
it has been present for ages. When it is set, the export-dynamic linker
option is enabled so that all non-static symbols are properly resolved.
Willy Tarreau [Tue, 3 Mar 2020 16:13:02 +0000 (17:13 +0100)]
MINOR: debug: use resolve_sym_name() to dump task handlers
Now in "show threads", the task/tasklet handler will be resolved
using this function, which will provide more detailed results and
will still support offsets to main for unresolved symbols.
Willy Tarreau [Tue, 3 Mar 2020 16:09:08 +0000 (17:09 +0100)]
MINOR: tools: add resolve_sym_name() to resolve function pointers
We use various hacks at a few places to try to identify known function
pointers in debugging outputs (show threads & show fd). Let's centralize
this into a new function dedicated to this. It already knows about the
functions matched by "show threads" and "show fd", and when built with
USE_DL, it can rely on dladdr1() to resolve other functions. There are
some limitations, as static functions are not resolved, linking with
-rdynamic is mandatory, and even then some functions will not necessarily
appear. It's possible to do a better job by rebuilding the whole symbol
table from the ELF headers in memory but it's less portable and the gains
are still limited, so this solution remains a reasonable tradeoff.
Willy Tarreau [Tue, 3 Mar 2020 14:57:10 +0000 (15:57 +0100)]
MINOR: tools: add new function dump_addr_and_bytes()
This function dumps <n> bytes from <addr> in hex form into buffer <buf>
enclosed in brackets after the address itself, formatted on 14 chars
including the "0x" prefix. This is meant to be used as a prefix for code
areas. For example: "0x7f10b6557690 [48 c7 c0 0f 00 00 00 0f]: "
It relies on may_access() to know if the bytes are dumpable, otherwise "--"
is emitted. An optional prefix is supported.
Willy Tarreau [Tue, 3 Mar 2020 16:06:52 +0000 (17:06 +0100)]
BUILD: tools: remove obsolete and conflicting trace() from standard.c
Since commit 4c2ae48375 ("MINOR: trace: implement a very basic trace()
function") merged in 2.1, trace() is an inline function. It must not
appear in standard.c anymore and may break build depending on includes.
Willy Tarreau [Tue, 3 Mar 2020 07:31:34 +0000 (08:31 +0100)]
BUG/MEDIUM: debug: make the debug_handler check for the thread in threads_to_dump
It happens that just sending the debug signal to the process makes on
thread wait for its turn while nobody wants to dump. We need to at
least verify that a dump was really requested for this thread.
Willy Tarreau [Tue, 3 Mar 2020 06:04:42 +0000 (07:04 +0100)]
MINOR: debug: report the task handler's pointer relative to main
Often in crash dumps we see unknown function pointers. Let's display
them relative to main, that helps quite a lot figure the function
from an executable, for example:
Willy Tarreau [Sat, 29 Feb 2020 08:08:02 +0000 (09:08 +0100)]
MINOR: tools: make sure to correctly check the returned 'ms' in date2std_log
In commit 4eee38a ("BUILD/MINOR: tools: fix build warning in the date
conversion functions") we added some return checks to shut build
warnings but the last test is useless since the tested pointer is not
updated by the last call to utoa_pad() used to convert the milliseconds.
It turns out the original code from 2012 already skipped this part,
probably in order to avoid the risk of seeing a millisecond field not
belonging to the 0-999 range. Better keep the check and put the code
into stricter shape.
Commit 80b53ffb1c ("MEDIUM: arg: make make_arg_list() stop after its
own arguments") changed the way we detect the empty list because we
cannot stop by looking up the closing parenthesis anymore, thus for
the first missing arg we have to enter the parsing loop again. And
there, finding an empty arg means we go to the empty_err label, where
it was not initially planned to handle this condition. This results
in %[date()] to fail while %[date] works. Let's simply check if we've
reached the minimally supported args there (it used to be done during
the function entry).
Willy Tarreau [Fri, 28 Feb 2020 14:21:42 +0000 (15:21 +0100)]
MEDIUM: mux-h1: do not blindly wake up the tasklet at end of request anymore
Since commit "MEDIUM: connection: make the subscribe() call able to wakeup
if ready" we have the guarantee that the tasklet will be woken up if
subscribing to a connection for an even that's ready. Since we have too
many tasklet_wakeup() calls in mux-h1, let's now use this property to
improve the situation a bit.
With this change, no syscall count changed, however the number of useless
calls to some functions significantly went down. Here are the differences
for the test below (100k req), in number of calls per request :
Note that the situation could be further improved by having muxes lazily
subscribe to Rx events in case the FD is already being polled. However
this requires deeper changes to implement a LAZY_RECV subscribe mode,
and to replace the FD's active bit by 3 states representing the desired
action to perform on the FD during the update, among NONE (no need to
change), POLL (can't proceed without), and STOP (buffer full). This
would only impact Rx since on Tx we know what we have to send. The
savings to expect from this might be more visible with splicing and/or
when dealing with many connections having long think times.
Willy Tarreau [Fri, 28 Feb 2020 13:42:26 +0000 (14:42 +0100)]
MEDIUM: connection: don't stop receiving events in the FD handler
The remaining epoll_ctl() calls are exclusively caused by the disagreement
between conn_fd_handler() and the mux receiving the data: the fd handler
wants to stop after having woken up the tasklet, then the mux after
receiving data wants to receive again. Given that they don't happen in
the same poll loop when there are many FDs, this causes a lot of state
changes.
As suggested by Olivier, if the task is already scheduled for running,
we don't need to disable the event because it's in the run queue, poll()
cannot stop, and reporting it again will be harmless. What *might*
happen however is that a sampling-based poller like epoll() would report
many times the same event and has trouble getting others behind. But if
it would happen, it would still indicate the run queue has plenty of
pending operations, so it would in fact only displace the problem from
the poller to the run queue, which doesn't seem to be worse (and in
fact we do support priorities while the poller does not).
By doing this change, the keep-alive test with 1k conns and 100k reqs
completely gets rid of the per-request epoll_ctl changes, while still
not causing extra recvfrom() :
Willy Tarreau [Fri, 28 Feb 2020 13:24:49 +0000 (14:24 +0100)]
MEDIUM: connection: make the subscribe() call able to wakeup if ready
There's currently an internal API limitation at the connection layer
regarding conn_subscribe(). We must not subscribe if we haven't yet
met EAGAIN or such a condition, so we sometimes force ourselves to read
in order to meet this condition and being allowed to call subscribe.
But reading cannot always be done (e.g. at the end of a loop where we
cannot afford to retrieve new data and start again) so we instead
perform a tasklet_wakeup() of the requester's io_cb. This is what is
done in mux_h1 for example. The problem with this is that it forces
a new receive when we're not necessarily certain we need one. And if
the FD is not ready and was already being polled, it's a useless
wakeup.
The current patch improves the connection-level subscribe() so that
it really manipulates the polling if the FD is marked not-ready, but
instead schedules the argument tasklet for a wakeup if the FD is
ready. This guarantees we'll wake this tasklet up in any case once the
FD is ready, either immediately or after polling.
By doing so, a test on pure close mode shows we cut in half the number
of epoll_ctl() calls and almost eliminate failed recvfrom():
Willy Tarreau [Fri, 28 Feb 2020 13:09:12 +0000 (14:09 +0100)]
MINOR: rawsock: always mark the FD not ready when we're certain it happens
This partially reverts commit 1113116b4a ("MEDIUM: raw-sock: remove
obsolete calls to fd_{cant,cond,done}_{send,recv}") so that we can mark
the FD not ready as required since commit 19bc201c9f ("MEDIUM: connection:
remove the intermediary polling state from the connection"). Indeed, with
the removal of the latter we don't have any other reliable indication that
the FD is blocked, which explains why there are so many EAGAIN in traces.
It's worth noting that a short read or a short write are also reliable
indicators of exhausted buffers and are even documented as such in the
epoll man page in case of edge-triggered mode. That's why we also report
the FD as blocked in such a case.
With this change we completely got rid of EAGAIN in keep-alive tests, but
they were expectedly transferred to epoll_ctl:
BUG/MINOR: http-htx: Do case-insensive comparisons on Host header name
When a header is added or modified, in http_add_header() or
http_replace_header(), a comparison is performed on its name to know if it is
the Host header and if the authority part of the uri must be updated or
not. This comparision must be case-insensive.
This patch should fix the issue #522. It must be backported to 2.1.
MINOR: contrib/prometheus-exporter: Add the last heathcheck duration metric
ST_F_CHECK_DURATION is now part of exported server metrics, named
haproxy_server_check_duration_seconds and expressed in seconds. For a given
server, this value is exported only if the healthcheck is finished (the status
is greater or equal to HCHK_STATUS_CHECKED).
This patch fixes the issue #519. It may be backported as fat as 2.0.
Lukas Tribus [Thu, 27 Feb 2020 14:47:24 +0000 (15:47 +0100)]
BUG/MINOR: dns: ignore trailing dot
As per issue #435 a hostname with a trailing dot confuses our DNS code,
as for a zero length DNS label we emit a null-byte. This change makes us
ignore the zero length label instead.
Willy Tarreau [Wed, 26 Feb 2020 15:12:45 +0000 (16:12 +0100)]
MINOR: fd: merge the read and write error bits into RW error
We always set them both, which makes sense since errors at the FD level
indicate a terminal condition for the socket that cannot be recovered.
Usually this is detected via a write error, but sometimes such an error
may asynchronously be reported on the read side. Let's simplify this
using only the write bit and calling it RW since it's used like this
everywhere, and leave the R bit spare for future use.
Willy Tarreau [Fri, 21 Feb 2020 15:26:19 +0000 (16:26 +0100)]
CLEANUP: fd: remove some unneeded definitions of FD_EV_* flags
There's no point in trying to be too generic for these flags as the
read and write sides will soon differ a bit. Better explicitly define
the flags for each direction without trying to be direction-agnostic.
this clarifies the code and removes some defines.
Willy Tarreau [Fri, 21 Feb 2020 13:25:34 +0000 (14:25 +0100)]
CLEANUP: fd: remove the FD_EV_STATUS aggregate
This was used only by fd_recv_state() and fd_send_state(), both of
which are unused. This will not work anymore once recv and send flags
start to differ, so let's remove this.
Jerome Magnin [Thu, 27 Feb 2020 22:36:56 +0000 (23:36 +0100)]
BUG/MINOR: http_ana: make sure redirect flags don't have overlapping bits
commit c87e46881 ("MINOR: http-rules: Add a flag on redirect rules to know the
rule direction") introduced a new flag for redirect rules, but its value has
bits in common with REDIRECT_FLAG_DROP_QS, which makes us enter this code path
in http_apply_redirect_rule(), which will then drop the query string.
To fix this, just give REDIRECT_FLAG_FROM_REQ its own unique value.
Ilya Shipitsin [Wed, 26 Feb 2020 14:29:36 +0000 (19:29 +0500)]
BUILD: cirrus-ci: suppress OS version check when installing packages
since we run "snapshot" images of FreeBSD, it is possible that kernel
ABI version might change from time to time. It might differ from
prebuilt packages (installed via "pkg"). We do not test kernel modules,
so for us is safe to ignore ABI mismatch.
BUG/MEDIUM: ssl: chain must be initialized with sk_X509_new_null()
Even when there isn't a chain, it must be initialized to a empty X509
structure with sk_X509_new_null().
This patch fixes a segfault which appears with older versions of the SSL
libs (openssl 0.9.8, libressl 2.8.3...) because X509_chain_up_ref() does
not check the pointer.
This bug was introduced by b90d2cb ("MINOR: ssl: resolve issuers chain
later").
Tim Duesterhus [Wed, 26 Feb 2020 15:20:49 +0000 (16:20 +0100)]
BUG/MINOR: sample: Make sure to return stable IDs in the unique-id fetch
Previously when the `unique-id-format` contained non-deterministic parts,
such as the `uuid` fetch each use of the `unique-id` fetch would generate
a new unique ID, replacing the old one. The following configuration shows
the error:
global
log stdout format short daemon
listen test
log global
log-format "%ID"
unique-id-format %{+X}o\ TEST-%[uuid]
mode http
bind *:8080
http-response set-header A %[unique-id]
http-response set-header B %[unique-id]
server example example.com:80
Without the patch the contents of the `A` and `B` response header would
differ.
Willy Tarreau [Wed, 26 Feb 2020 13:32:53 +0000 (14:32 +0100)]
MINOR: epoll: always initialize all of epoll_event to please valgrind
valgrind complains that epoll_ctl() uses an epoll_event in which we
have only set the part we use from the data field (i.e. the fd). Tests
show that pre-initializing the struct in the stack doesn't have a
measurable impact so let's do it.
Willy Tarreau [Wed, 26 Feb 2020 13:03:05 +0000 (14:03 +0100)]
MINOR: wdt: always clear sigev_value to make valgrind happy
In issue #471 it was reported that valgrind sometimes complains about
timer_create() being called with uninitialized bytes. These are in fact
the bits from sigev_value.sival_ptr that are not part of sival_int that
are tagged as such, as valgrind has no way to know we're using the int
instead of the ptr in the union. It's cheap to initialize the field so
let's do it.
Willy Tarreau [Wed, 26 Feb 2020 12:51:38 +0000 (13:51 +0100)]
BUG/MINOR: h2: reject again empty :path pseudo-headers
Since commit 92919f7fd5 ("MEDIUM: h2: make the request parser rebuild
a complete URI") we make sure to rebuild a complete URI. Unfortunately
the test for an empty :path pseudo-header that is mandated by #8.1.2.3
appened to be performed on the URI before this patch, which is never
empty anymore after being rebuilt, causing h2spec to complain :
8. HTTP Message Exchanges
8.1. HTTP Request/Response Exchange
8.1.2. HTTP Header Fields
8.1.2.3. Request Pseudo-Header Fields
- 1: Sends a HEADERS frame with empty ":path" pseudo-header field
-> The endpoint MUST respond with a stream error of type PROTOCOL_ERROR.
Expected: GOAWAY Frame (Error Code: PROTOCOL_ERROR)
RST_STREAM Frame (Error Code: PROTOCOL_ERROR)
Connection closed
Actual: DATA Frame (length:0, flags:0x01, stream_id:1)
It's worth noting that this error doesn't trigger when calling h2spec
with a timeout as some scripts do, which explains why it wasn't detected
after the patch above.
This fixes one half of issue #471 and should be backported to 2.1.
Emmanuel Hocdet [Tue, 18 Feb 2020 14:56:39 +0000 (15:56 +0100)]
MINOR: ssl: resolve ocsp_issuer later
The goal is to use the ckch to store data from PEM files or <payload> and
only for that. This patch adresses the ckch->ocsp_issuer case. It finds
issuers chain if no chain is present in the ckch in ssl_sock_put_ckch_into_ctx(),
filling the ocsp_issuer from the chain must be done after.
It changes the way '.issuer' is managed: it tries to load '.issuer' in
ckch->ocsp_issuer first and then look for the issuer in the chain later
(in ssl_sock_load_ocsp() ). "ssl-load-extra-files" without the "issuer"
parameter can negate extra '.issuer' file check.
Emmanuel Hocdet [Tue, 18 Feb 2020 14:27:32 +0000 (15:27 +0100)]
MINOR: ssl: resolve issuers chain later
The goal is to use the ckch to store data from a loaded PEM file or a
<payload> and only for that. This patch addresses the ckch->chain case.
Looking for the issuers chain, if no chain is present in the ckch, can
be done in ssl_sock_put_ckch_into_ctx(). This way it is possible to know
the origin of the certificate chain without an extra state.
Willy Tarreau [Wed, 26 Feb 2020 09:39:36 +0000 (10:39 +0100)]
MEDIUM: buffer: remove the buffer_wq lock
This lock was only needed to protect the buffer_wq list, but now we have
the mt_list for this. This patch simply turns the buffer_wq list to an
mt_list and gets rid of the lock.
It's worth noting that the whole buffer_wait thing still looks totally
wrong especially in a threaded context: the wakeup_cb() callback is
called synchronously from any thread and may end up calling some
connection code that was not expected to run on a given thread. The
whole thing should probably be reworked to use tasklets instead and be
a bit more centralized.
Willy Tarreau [Tue, 25 Feb 2020 17:14:02 +0000 (18:14 +0100)]
[RELEASE] Released version 2.2-dev3
Released version 2.2-dev3 with the following main changes :
- SCRIPTS: announce-release: place the send command in the mail's header
- SCRIPTS: announce-release: allow the user to force to overwrite old files
- SCRIPTS: backport: fix the master branch detection
- BUG/MINOR: http-act: Set stream error flag before returning an error
- BUG/MINOR: http-act: Fix bugs on error path during parsing of return actions
- BUG/MEDIUM: ssl/cli: 'commit ssl cert' wrong SSL_CTX init
- BUG/MEDIUM: tcp-rules: Fix track-sc* actions for L4/L5 TCP rules
- DOC: schematic of the SSL certificates architecture
- BUG/MAJOR: mux-h2: don't wake streams after connection was destroyed
- BUG/MINOR: unix: better catch situations where the unix socket path length is close to the limit
- BUILD: cirrus-ci: switch to "snap" images to unify openssl naming
- BUILD: cirrus-ci: workaround "pkg install" bug
- BUILD: cirrus-ci: add ERR=1 to freebsd builds
- BUG/MINOR: connection: correctly retry I/O on signals
- CLEANUP: mini-clist: simplify nested do { while(1) {} } while (0)
- BUILD: http_act: cast file sizes when reporting file size error
- BUG/MEDIUM: listener: only consider running threads when resuming listeners
- BUG/MINOR: listener: enforce all_threads_mask on bind_thread on init
- BUG/MINOR: tcp: avoid closing fd when socket failed in tcp_bind_listener
- MINOR: build: add aix72-gcc build TARGET and power{8,9} CPUs
- BUILD: travis-ci: no more allowed failures for openssl-1.0.2
- BUILD: travis-ci: harden builds, add ERR=1 (warning ought to be errors)
- BUILD: scripts/build-ssl.sh: use "uname" instead of ${TRAVIS_OS_NAME}
- BUG/MINOR: tcp: don't try to set defaultmss when value is negative
- SCRIPTS: make announce-release executable again
- BUG/MINOR: namespace: avoid closing fd when socket failed in my_socketat
- BUG/MEDIUM: muxes: Use the right argument when calling the destroy method.
- BUG/MINOR: mux-fcgi: Forbid special characters when matching PATH_INFO param
- CLEANUP: ssl: remove unused functions in openssl-compat.h
- MINOR: mux-fcgi: Make the capture of the path-info optional in pathinfo regex
- MINOR: tools: add is_idchar() to tell if a char may belong to an identifier
- MINOR: chunk: implement chunk_strncpy() to copy partial strings
- MINOR: sample/acl: use is_idchar() to locate the fetch/conv name
- MEDIUM: arg: make make_arg_list() stop after its own arguments
- MEDIUM: arg: copy parsed arguments into the trash instead of allocating them
- MEDIUM: arg: make make_arg_list() support quotes in arguments
- MINOR: sample: make sample_parse_expr() able to return an end pointer
- MEDIUM: log-format: make the LF parser aware of sample expressions' end
- BUG/MINOR: arg: report an error if an argument is larger than bufsize
- SCRIPTS: announce-release: use mutt -H instead of -i to include the draft
- BUILD: enable ERR=1 in github cygwin builds
- BUG/MINOR: arg: fix again incorrect argument length check
- MINOR: sample: regsub now supports backreferences
- BUG/MINOR: tools: also accept '+' as a valid character in an identifier
- MINOR: http-htx: Add a function to retrieve the headers size of an HTX message
- MINOR: filters: Forward data only if the last filter forwards something
- BUG/MINOR: filters: Count HTTP headers as filtered data but don't forward them
- BUG/MINOR: http-htx: Don't return error if authority is updated without changes
- BUG/MINOR: stream: Don't incr frontend cum_req counter when stream is closed
- BUG/MINOR: sample: exit regsub() in case of trash allocation error
- MINOR: ssl: add "issuers-chain-path" directive.
- REGTESTS: use "command -v" instead of "which"
- BUG/MINOR: http-ana: Matching on monitor-uri should be case-sensitive
- MINOR: http-ana: Match on the path if the monitor-uri starts by a /
- BUG/MINOR: ssl: Stop passing dynamic strings as format arguments
- BUG/MAJOR: http-ana: Always abort the request when a tarpit is triggered
- BUG/MINOR: mux: do not call conn_xprt_stop_recv() on buffer shortage
- MINOR: checks: do not call conn_xprt_stop_send() anymore
- CLEANUP: epoll: place the struct epoll_event in the stack
- MEDIUM: connection: remove the intermediary polling state from the connection
- MINOR: raw_sock: directly call fd_stop_send() and not conn_xprt_stop_send()
- MINOR: tcp/uxst/sockpair: use fd_want_send() instead of conn_xprt_want_send()
- MINOR: connection: remove the last calls to conn_xprt_{want,stop}_*
- CLEANUP: connection: remove the definitions of conn_xprt_{stop,want}_{send,recv}
- MINOR: connection: introduce a new receive flag: CO_RFL_READ_ONCE
- MINOR: mux-h1: pass CO_RFL_READ_ONCE to the lower layers when relevant
- MINOR: ist: add an iststop() function
- BUG/MINOR: http: http-request replace-path duplicates the query string
- CLEANUP: sample: use iststop instead of a for loop
- BUG/MEDIUM: shctx: make sure to keep all blocks aligned
- MINOR: compiler: move CPU capabilities definition from config.h and complete them
- BUG/MEDIUM: ebtree: don't set attribute packed without unaligned access support
- CLEANUP: http/h1: rely on HA_UNALIGNED_LE instead of checking for CPU families
- BUILD: fix recent build failure on unaligned archs
- MINOR: ssl: load the key from a dedicated file
- BUG/MINOR: ssl: load .key in a directory only after PEM
- MINOR: compiler: drop special cases of likely/unlikely for older compilers
- CLEANUP: conn: Do not pass a pointer to likely
- CLEANUP: net_helper: Do not negate the result of unlikely
- BUILD: remove obsolete support for -mregparm / USE_REGPARM
- CLEANUP: cfgparse: Fix type of second calloc() parameter
- BUILD: ssl: only pass unsigned chars to isspace()
- BUILD: general: always pass unsigned chars to is* functions
- BUG/MINOR: sample: fix the json converter's endian-sensitivity
- BUG/MEDIUM: ssl: fix several bad pointer aliases in a few sample fetch functions
- CLEANUP: fd: use a union in fd_rm_from_fd_list() to shut aliasing warnings
- CLEANUP: cache: use read_u32/write_u32 to access the cache entry's hash
- CLEANUP: stick-tables: use read_u32() to display a node's key
- CLEANUP: sample: use read_u64() in ipmask() to apply an IPv6 mask
- MINOR: pattern: fix all remaining strict aliasing issues
- CLEANUP: lua: fix aliasing issues in the address matching code
- CLEANUP: connection: use read_u32() instead of a cast in the netscaler parser
- BUILD: makefile: re-enable strict aliasing
- BUG/MINOR: connection: make sure to correctly tag local PROXY connections
- MINOR: compiler: add new alignment macros
- BUILD: ebtree: improve architecture-specific alignment
- MINOR: config: mark global.debug as deprecated
- BUILD: travis-ci: enable s390x builds
- MINOR: ssl/cli: 'show ssl cert' displays the chain
- MINOR: ssl/cli: 'show ssl cert'displays the issuer in the chain
- MINOR: ssl/cli: reorder 'show ssl cert' output
- CLEANUP: ssl: move issuer_chain tree and definition
- DOC: proxy-protocol: clarify IPv6 address representation in the spec
Willy Tarreau [Tue, 25 Feb 2020 17:04:39 +0000 (18:04 +0100)]
DOC: proxy-protocol: clarify IPv6 address representation in the spec
Daniel Barclay reported that the wording around "IPv6 addresses must be
indicated as series of 4 hex digits" is confusing and can be interpreted
two ways (only 4 digits or series of sets of 4 digits), so let's adjust
the wording to resolve this ambiguity.
CLEANUP: ssl: move issuer_chain tree and definition
Move the cert_issuer_tree outside the global_ssl structure since it's
not a configuration variable. And move the declaration of the
issuer_chain structure in types/ssl_sock.h
Reorder the 'show ssl cert' output so it's easier to see if the whole
chain is correct.
For a chain to be correct, an "Issuer" line must have the same
content as the next "Subject" line.
Example:
Subject: /C=FR/ST=Paris/O=HAProxy Test Certificate/CN=test.haproxy.local
Issuer: /C=FR/ST=Paris/O=HAProxy Test Intermediate CA 2/CN=ca2.haproxy.local
Chain Subject: /C=FR/ST=Paris/O=HAProxy Test Intermediate CA 2/CN=ca2.haproxy.local
Chain Issuer: /C=FR/ST=Paris/O=HAProxy Test Intermediate CA 1/CN=ca1.haproxy.local
Chain Subject: /C=FR/ST=Paris/O=HAProxy Test Intermediate CA 1/CN=ca1.haproxy.local
Chain Issuer: /C=FR/ST=Paris/O=HAProxy Test Root CA/CN=root.haproxy.local
MINOR: ssl/cli: 'show ssl cert'displays the issuer in the chain
For each certificate in the chain, displays the issuer, so it's easy to
know if the chain is right.
Also rename "Chain" to "Chain Subject".
Example:
Chain Subject: /C=FR/ST=Paris/O=HAProxy Test Intermediate CA 2/CN=ca2.haproxy.local
Chain Issuer: /C=FR/ST=Paris/O=HAProxy Test Intermediate CA 1/CN=ca1.haproxy.local
Chain Subject: /C=FR/ST=Paris/O=HAProxy Test Intermediate CA 1/CN=ca1.haproxy.local
Chain Issuer: /C=FR/ST=Paris/O=HAProxy Test Root CA/CN=root.haproxy.local
MINOR: ssl/cli: 'show ssl cert' displays the chain
Display the subject of each certificate contained in the chain in the
output of "show ssl cert <filename>".
Each subjects are on a unique line prefixed by "Chain: "
Example:
Chain: /C=FR/ST=Paris/O=HAProxy Test Intermediate CA 2/CN=ca2.haproxy.local
Chain: /C=FR/ST=Paris/O=HAProxy Test Intermediate CA 1/CN=ca1.haproxy.local