git.ipfire.org Git - thirdparty/haproxy.git/log

BUILD: travis-ci: link with ssl libraries using rpath instead of LD_LIBRARY_PATH/DYLD_LIBRARY_PATH

modifying LD_LIBRARY_PATH/DYLD_LIBRARY_PATH also affects other utilities like curl
to avoid side effects let us use rpath for ssl library linking

Fixes #418

BUILD: ssl: improve SSL_CTX_set_ecdh_auto compatibility

SSL_CTX_set_ecdh_auto() is not defined when OpenSSL 1.1.1 is compiled
with the no-deprecated option. Remove existing, incomplete guards and
add a compatibility macro in openssl-compat.h, just as OpenSSL does:

https://github.com/openssl/openssl/blob/bf4006a6f9be691ba6eef0e8629e63369a033ccf/include/openssl/ssl.h#L1486

This should be backported as far as 2.0 and probably even 1.9.

BUG/MEDIUM: stream: Be sure to never assign a TCP backend to an HTX stream

With a TCP frontend, it is possible to upgrade a connection to HTTP when the
backend is in HTTP mode. Concretly the upgrade install a new mux. So, once it is
done, the downgrade to TCP is no longer possible. So we must take care to never
assign a TCP backend to a stream on this connection. Otherwise, HAProxy crashes
because raw data from the server are handled as structured data on the client
side.

This patch fixes the issue #420. It must be backported to all versions
supporting the HTX.

BUG/MAJOR: mux-h1: Don't pretend the input channel's buffer is full if empty

A regression was introduced by the commit 76014fd1 ("MEDIUM: h1-htx: Add HTX EOM
block when the message is in H1_MSG_DONE state"). When nothing is copied in the
channel's buffer when the input message is parsed, we erroneously pretend it is
because there is not enough room by setting the CS_FL_WANT_ROOM flag on the
conn-stream. This happens when a partial request is parsed. Because of this
flag, we never try anymore to get input data from the mux because we first wait
for more room in the channel's buffer, which is empty. Because of this bug, it
is pretty easy to freeze a h1 connection.

To fix the bug, we must obsiously set the CS_FL_WANT_ROOM flag only when there
are still data to transfer while the channel's buffer is not empty.

This patch must be backported if the patch 76014fd1 is backported too. So for
now, no backport needed.

BUG/MINOR: state-file: do not leak memory on parse errors

Issue #417 reports a possible memory leak in the state-file loading code.
There's one such place in the loop which corresponds to parsing errors
where the curreently allocated line is not freed when dropped. In any
case this is very minor in that no more than the file's length may be
lost in the worst case, considering that the whole file is kept anyway
in case of success. This fix addresses this.

It should be backported to 2.1.

BUG/MINOR: state-file: do not store duplicates in the global tree

The global state file tree isn't configured for unique keys, so if an
entry appears multiple times, e.g. due to a bogus script that concatenates
entries multiple times, this will needlessly eat memory. Let's just drop
duplicates.

This should be backported to 2.1.

BUG/MEDIUM: state-file: do not allocate a full buffer for each server entry

Starting haproxy with a state file of 700k servers eats 11.2 GB of RAM
due to a mistake in the function that loads the strings into a tree: it
allocates a full buffer for each backend+server name instead of allocating
just the required string. By just fixing this we're down to 80 MB.

This should be backported to 2.1.

BUG/MINOR: ssl: openssl-compat: Fix getm_ defines

LIBRESSL_VERSION_NUMBER evaluates to 0 under OpenSSL, making the condition
always true. Check for the define before checking it.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
[wt: to be backported as far as 1.9]

REGTEST: make the "set ssl cert" require version 2.1

This test fails on 2.0 and earlier since the feature was introduced in 2.1,
let's add the REQUIRE_VERSION tag.

BUG/MEDIUM: fd/threads: fix a concurrency issue between add and rm on the same fd

There's a very hard-to-trigger bug in the FD list code where the
fd_add_to_fd_list() function assumes that if the FD it's trying to add
is already locked, it's in the process of being added. Unfortunately, it
can also be in the process of being removed. It is very hard to trigger
because it requires that one thread is removing the FD while another one
is adding it. First very few FDs run on multiple threads (listeners and
DNS), and second, it does not make sense to add and remove the FD at the
same time.

In practice the DNS code built on the older callback-only model does
perform bursts of fd_want_send() for all resolvers at once when it wants
to send a new query (dns_send_query()). And this is more likely to happen
when here are lots of resolutions in parallel and many resolvers, because
the dns_response_recv() callback can also trigger a series of queries on
all resolvers for each invalid response it receives. This means that it
really is perfectly possible to both stop and start in parallel during
short periods of time there.

This issue was not reported before 2.1, but 2.1 had the FD cache, built
on the exact same code base. It's very possible that the issue caused
exactly the opposite situation, where an event was occasionally lost,
causing a DNS retry that worked, and nobody noticing the problem in the
end. In 2.1 the lost entries are the updates asking for not polling for
writes anymore, and the effect is that the poller contiuously reports
writability on the socket when the issue happens.

This patch fixes bug #416 and must be backported as far as 1.8, and
absolutely requires that previous commit "MINOR: fd/threads: make
_GET_NEXT()/_GET_PREV() use the volatile attribute" is backported as
well otherwise it will make the issue worse.

Special thanks to Julien Pivotto for setting up a reliable reproducer
for this difficult issue.

MINOR: fd/threads: make _GET_NEXT()/_GET_PREV() use the volatile attribute

These macros are either used between atomic ops which cause the volatile
to be implicit, or with an explicit volatile cast. However not having it
in the macro causes some traps in the code because certain loop paths
cannot safely be used without risking infinite loops if one isn't careful
enough.

Let's place the volatile attribute inside the macros and remove them from
the explicit places to avoid this. It was verified that the output executable
remains exactly the same byte-wise.

BUG/MEDIUM: ssl: Revamp the way early data are handled.

Instead of attempting to read the early data only when the upper layer asks
for data, allocate a temporary buffer, stored in the ssl_sock_ctx, and put
all the early data in there. Requiring that the upper layer takes care of it
means that if for some reason the upper layer wants to emit data before it
has totally read the early data, we will be stuck forever.

This should be backported to 2.1 and 2.0.
This may fix github issue #411.

BUG/MAJOR: task: add a new TASK_SHARED_WQ flag to fix foreing requeuing

Since 1.9 with commit b20aa9eef3 ("MAJOR: tasks: create per-thread wait
queues") a task bound to a single thread will not use locks when being
queued or dequeued because the wait queue is assumed to be the owner
thread's.

But there exists a rare situation where this is not true: the health
check tasks may be running on one thread waiting for a response, and
may in parallel be requeued by another thread calling health_adjust()
after a detecting a response error in traffic when "observe l7" is set,
and "fastinter" is lower than "inter", requiring to shorten the running
check's timeout. In this case, the task being requeued was present in
another thread's wait queue, thus opening a race during task_unlink_wq(),
and gets requeued into the calling thread's wait queue instead of the
running one's, opening a second race here.

This patch aims at protecting against the risk of calling task_unlink_wq()
from one thread while the task is queued on another thread, hence unlocked,
by introducing a new TASK_SHARED_WQ flag.

This new flag indicates that a task's position in the wait queue may be
adjusted by other threads than then one currently executing it. This means
that such WQ manipulations must be performed under a lock. There are two
types of such tasks:
  - the global ones, using the global wait queue (technically speaking,
    those whose thread_mask has at least 2 bits set).
  - some local ones, which for now will be placed into the global wait
    queue as well in order to benefit from its lock.

The flag is automatically set on initialization if the task's thread mask
indicates more than one thread. The caller must also set it if it intends
to let other threads update the task's expiration delay (e.g. delegated
I/Os), or if it intends to change the task's affinity over time as this
could lead to the same situation.

Right now only the situation described above seems to be affected by this
issue, and it is very difficult to trigger, and even then, will often have
no visible effect beyond stopping the checks for example once the race is
met. On my laptop it is feasible with the following config, chained to
httpterm:

    global
        maxconn 400 # provoke FD errors, calling health_adjust()

    defaults
        mode http
        timeout client 10s
        timeout server 10s
        timeout connect 10s

    listen px
        bind :8001
        option httpchk /?t=50
        server sback 127.0.0.1:8000 backup
        server-template s 0-999 127.0.0.1:8000 check port 8001 inter 100 fastinter 10 observe layer7

This patch will automatically address the case for the checks because
check tasks are created with multiple threads bound and will get the
TASK_SHARED_WQ flag set.

If in the future more tasks need to rely on this (multi-threaded muxes
for example) and the use of the global wait queue becomes a bottleneck
again, then it should not be too difficult to place locks on the local
wait queues and queue the task on its bound thread.

This patch needs to be backported to 2.1, 2.0 and 1.9. It depends on
previous patch "MINOR: task: only check TASK_WOKEN_ANY to decide to
requeue a task".

Many thanks to William Dauchy for providing detailed traces allowing to
spot the problem.

MINOR: task: only check TASK_WOKEN_ANY to decide to requeue a task

After processing a task, its RUNNING bit is cleared and at the same time
we check for other bits to decide whether to requeue the task or not. It
happens that we only want to check the TASK_WOKEN_* bits, because :
  - TASK_RUNNING was just cleared
  - TASK_GLOBAL and TASK_QUEUE cannot be set yet as the task was running,
    preventing it from being requeued

It's important not to catch yet undefined flags there because it would
prevent addition of new task flags. This also shows more clearly that
waking a task up with flags 0 is not something safe to do as the task
will not be woken up if it's already running.

REGTEST: run-regtests: implement #REQUIRE_BINARIES

Implement #REQUIRE_BINARIES for vtc files.

The run-regtests.sh script will check if the binary is available in the
environment, if not, it wil disable the vtc.

REGTEST: ssl: test the "set ssl cert" CLI command

Add a reg-test which test the update of a certificate over the CLI. This
test requires socat and curl.

This commit also adds an ECDSA certificate in the ssl directory.

MINOR: http: add a new "replace-path" action

This action is very similar to "replace-uri" except that it only acts on the
path component. This is assumed to better match users' expectations when they
used to rely on "replace-uri" in HTTP/1 because mostly origin forms were used
in H1 while mostly absolute URI form is used in H2, and their rules very often
start with a '/', and as such do not match.

It could help users to get this backported to 2.0 and 2.1.

MINOR: debug: support logging to various sinks

As discussed in the thread below [1], the debug converter is currently
not of much use given that it's only built when DEBUG_EXPR is set, and
it is limited to stderr only.

This patch changes this to make it take an optional prefix and an optional
target sink so that it can log to stdout, stderr or a ring buffer. The
default output is the "buf0" ring buffer, that can be consulted from the
CLI.

[1] https://www.mail-archive.com/haproxy@formilux.org/msg35671.html

Note: if this patch is backported, it also requires the following commit to
work: 46dfd78cbf ("BUG/MINOR: sample: always check converters' arguments").

BUG/MINOR: ssl/cli: fix build for openssl < 1.0.2

Commit d4f946c ("MINOR: ssl/cli: 'show ssl cert' give information on the
certificates") introduced a build issue with openssl version < 1.0.2
because it uses the certificate bundles.

MINOR: ssl/cli: 'show ssl cert' give information on the certificates

Implement the 'show ssl cert' command on the CLI which list the frontend
certificates. With a certificate name in parameter it will show more
details.

BUG/MEDIUM: ssl: Don't set the max early data we can receive too early.

When accepting the max early data, don't set it on the SSL_CTX while parsing
the configuration, as at this point global.tune.maxrewrite may still be -1,
either because it was not set, or because it hasn't been set yet. Instead,
set it for each connection, just after we created the new SSL.
Not doing so meant that we could pretend to accept early data bigger than one
of our buffer.

This should be backported to 2.1, 2.0, 1.9 and 1.8.

MINOR: sample: Validate the number of bits for the sha2 converter

Instead of failing the conversion when an invalid number of bits is
given the sha2 converter now fails with an appropriate error message
during startup.

The sha2 converter was introduced in d4376302377e4f51f43a183c2c91d929b27e1ae3,
which is in 2.1 and higher.

BUG/MINOR: sample: always check converters' arguments

In 1.5-dev20, sample-fetch arguments parsing was addresse by commit
689a1df0a1 ("BUG/MEDIUM: sample: simplify and fix the argument parsing").
The issue was that argument checks were not run for sample-fetches if
parenthesis were not present. Surprisingly, the fix was mde only for
sample-fetches and not for converters which suffer from the exact same
problem. There are even a few comments in the code mentioning that some
argument validation functions are not called when arguments are missing.

This fix applies the exact same method as the one above. The impact of
this bug is limited because over the years the code has learned to work
around this issue instead of fixing it.

This may be backported to all maintained versions.

BUG/MINOR: sample: fix the closing bracket and LF in the debug converter

The closing bracket was emitted for the "debug" converter even when the
opening one was not sent, and the new line was not always emitted. Let's
fix this. This is harmless since this converter is not built by default.

DOC: clarify the fact that replace-uri works on a full URI

With H2 deployments becoming more common, replace-uri starts to hit
users by not always matching absolute URIs due to rules expecting the
URI to start with a '/'.

REGTEST: Add an HTX reg-test to check an edge case

This test checks that an HTTP message is properly processed when we failed to
add the HTX EOM block in an HTX message during the parsing because the buffer is
full. Some space must be released in the buffer to make it possible. This
requires an extra pass in the H1 multiplexer. Here, we must be sure the mux is
called while there is no more incoming data.

It is a "devel" test because conditions to run the test successfully is highly
dependent on the implementation. So if it fail, it is not necessarily a bug. It
may be due of an internal change. It relies on internal HTX sample fetches.

MINOR: http-htx: Add some htx sample fetches for debugging purpose

These sample fetches are internal and must be used for debugging purpose. Idea
is to have a way to add some checks on the HTX content from http rules. The main
purpose is to ease reg-tests writing.

MEDIUM: h1-htx: Add HTX EOM block when the message is in H1_MSG_DONE state

During H1 parsing, the HTX EOM block is added before switching the message state
to H1_MSG_DONE. It is an exception in the way to convert an H1 message to
HTX. Except for this block, the message is first switched to the right state
before starting to add the corresponding HTX blocks. For instance, the message
is switched in H1_MSG_DATA state and then the HTX DATA blocks are added.

With this patch, the message is switched to the H1_MSG_DONE state when all data
blocks or trailers were processed. It is the caller responsibility to call
h1_parse_msg_eom() when the H1_MSG_DONE state is reached. This way, it is far
easier to catch failures when the HTX buffer is full.

The H1 and FCGI muxes have been updated accordingly.

This patch may eventually be backported to 2.1 if it helps other backports.

BUILD/MINOR: unix sockets: silence an absurd gcc warning about strncpy()

Apparently gcc developers decided that strncpy() semantics are no longer
valid and now deserve a warning, especially if used exactly as designed.
This results in issue #304. Let's just remove one to the target size to
please her majesty gcc, the God of C Compilers, who tries hard to make
users completely eliminate any use of string.h and reimplement it by
themselves at much higher risks. Pfff....

This can be backported to stable version, the fix is harmless since it
ignores the last zero that is already set on next line.

BUG/MINOR: listener: fix off-by-one in state name check

As reported in issue #380, the state check in listener_state_str() is
invalid as it allows state value 9 to report crap. We don't use such
a state value so the issue should never happen unless the memory is
already corrupted, but better clean this now while it's harmless.

This should be backported to all maintained branches.

BUG/MINOR: server: make "agent-addr" work on default-server line

As reported in issue #408, "agent-addr" doesn't work on default-server
lines. This is due to the transcription of the old "addr" option in commit
6e5e0d8f9e ("MINOR: server: Make 'default-server' support 'addr' keyword.")
which correctly assigns it to the check.addr and agent.addr fields, but
which also copies the default check.addr into both the check's and the
agent's addr fields. Thus the default agent's address is never used.

This fix makes sure to copy the check from the check and the agent from
the agent. However it's worth noting that if "addr" is specified on the
server line, it will still overwrite both the check and the agent's
addresses.

This must be backported as far as 1.8.

BUG/MINOR: listener: do not immediately resume on transient error

The listener supports a "transient error" situation, which corresponds
to those situations where accept fails badly but poll() reports an event.
This happens for example when a listener is paused, or on out of FD. The
same mechanism is used when facing a maxconn or maxsessrate limitation.
When this happens, the listener is disabled for up to 100ms and put back
into the global listener queue so that it automatically wakes up again
as soon as the conditions change from an existing connection releasing
one resource, or the system recovers from a transient issue.

The listener_accept() function has a bug in its exit path causing a
freshly limited listener to be immediately enabled again because all
the conditions are met (connection count < max). It doesn't take into
account the fact that the listener might have been queued and must
first wait for the timeout to expire before doing so. The impact is
that upon certain errors, the faulty process will busy loop on the
accept code without sleeping. This is the scenario reported and
diagnosed by @hedong0411 in issue #382.

This commit fixes it by verifying that the global queue's delay is
at least expired before deciding to resume the listener. Another
approach could consist in having an extra state like LI_DELAY for
situations where only a delay is acceptable, but this would probably
not bring anything except more complex code.

This issue was introduced with the lock-free listener accept code
(commits 3f0d02b and 82c9789a) that were backported to 1.8.20+ and
1.9.7+, so this fix must be backported to the relevant branches.

BUG/MINOR: mworker: properly pass SIGTTOU/SIGTTIN to workers

If a new process is started with -sf and it fails to bind, it may send
a SIGTTOU to the master process in hope that it will temporarily unbind.
Unfortunately this one doesn't catch it and stops to background instead
of forwarding the signal to the workers. The same is true for SIGTTIN.

This commit simply implements an extra signal handler for the master to
deal with such signals that must be passed down to the workers. It must
be backported as far as 1.8, though there the code differs in that it's
entirely in haproxy.c and doesn't require an extra sig handler.

BUG/MINOR: log: fix minor resource leaks on logformat error path

As reported by Ilya in issue #392, Coverity found that we're leaking
allocated strings on error paths in parse_logformat(). Let's use a
proper exit label for failures instead of seeding return 0 everywhere.

This should be backported to all supported versions.

DOC: remove references to the outdated architecture.txt

As mentionned in bug #405 we continue to reference architecture.txt from
places in the doc despite this file not being packaged for many years.
Better drop the reference if it's confusing.

DOC: proxies: HAProxy only supports 3 connection modes

The 4th one (forceclose) has been deprecated and deleted from the
documentation in 10c6c16cde0b0b473a1ab16e958a7d6b61ed36fc

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

MINOR: tasks: split wake_expired_tasks() in two parts to avoid useless wakeups

We used to have wake_expired_tasks() wake up tasks and return the next
expiration delay. The problem this causes is that we have to call it just
before poll() in order to consider latest timers, but this also means that
we don't wake up all newly expired tasks upon return from poll(), which
thus systematically requires a second poll() round.

This is visible when running any scheduled task like a health check, as there
are systematically two poll() calls, one with the interval, nothing is done
after it, and another one with a zero delay, and the task is called:

  listen test
    bind *:8001
    server s1 127.0.0.1:1111 check

  09:37:38.200959 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8696843}) = 0
  09:37:38.200967 epoll_wait(3, [], 200, 1000) = 0
  09:37:39.202459 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8712467}) = 0
>> nothing run here, as the expired task was not woken up yet.
  09:37:39.202497 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8715766}) = 0
  09:37:39.202505 epoll_wait(3, [], 200, 0) = 0
  09:37:39.202513 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8719064}) = 0
>> now the expired task was woken up
  09:37:39.202522 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 7
  09:37:39.202537 fcntl(7, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
  09:37:39.202565 setsockopt(7, SOL_TCP, TCP_NODELAY, [1], 4) = 0
  09:37:39.202577 setsockopt(7, SOL_TCP, TCP_QUICKACK, [0], 4) = 0
  09:37:39.202585 connect(7, {sa_family=AF_INET, sin_port=htons(1111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
  09:37:39.202659 epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLOUT, {u32=7, u64=7}}) = 0
  09:37:39.202673 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8814713}) = 0
  09:37:39.202683 epoll_wait(3, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=7, u64=7}}], 200, 1000) = 1
  09:37:39.202693 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8818617}) = 0
  09:37:39.202701 getsockopt(7, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
  09:37:39.202715 close(7)                = 0

Let's instead split the function in two parts:
  - the first part, wake_expired_tasks(), called just before
    process_runnable_tasks(), wakes up all expired tasks; it doesn't
    compute any timeout.
  - the second part, next_timer_expiry(), called just before poll(),
    only computes the next timeout for the current thread.

Thanks to this, all expired tasks are properly woken up when leaving
poll, and each poll call's timeout remains up to date:

  09:41:16.270449 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=10223556}) = 0
  09:41:16.270457 epoll_wait(3, [], 200, 999) = 0
  09:41:17.270130 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=10238572}) = 0
  09:41:17.270157 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 7
  09:41:17.270194 fcntl(7, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
  09:41:17.270204 setsockopt(7, SOL_TCP, TCP_NODELAY, [1], 4) = 0
  09:41:17.270216 setsockopt(7, SOL_TCP, TCP_QUICKACK, [0], 4) = 0
  09:41:17.270224 connect(7, {sa_family=AF_INET, sin_port=htons(1111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
  09:41:17.270299 epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLOUT, {u32=7, u64=7}}) = 0
  09:41:17.270314 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=10337841}) = 0
  09:41:17.270323 epoll_wait(3, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=7, u64=7}}], 200, 1000) = 1
  09:41:17.270332 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=10341860}) = 0
  09:41:17.270340 getsockopt(7, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
  09:41:17.270367 close(7)                = 0

This may be backported to 2.1 and 2.0 though it's unlikely to bring any
user-visible improvement except to clarify debugging.

BUG/MINOR: tasks: only requeue a task if it was already in the queue

Commit 0742c314c3 ("BUG/MEDIUM: tasks: Make sure we switch wait queues
in task_set_affinity().") had a slight side effect on expired timeouts,
which is that when used before a timeout is updated, it will cause an
existing task to be requeued earlier than its expected timeout when done
before being updated, resulting in the next poll wakup timeout too early
or even instantly if the previous wake up was done on a timeout. This is
visible in strace when health checks are enabled because there are two
poll calls, one of which has a short or zero delay. The correct solution
is to only requeue a task if it was already in the queue.

This can be backported to all branches having the fix above.

DOC: listeners: add a few missing transitions

Some disable() transitions were missing, and the distinction between
multi-threaded and single-threaded transitions was not mentioned.

BUG/MEDIUM: proto_udp/threads: recv() and send() must not be exclusive.

This is a complement to previous fix for bug #399. The exclusion between
the recv() and send() calls prevents send handlers from being called if
rx readiness is reported. The DNS code can trigger this situations with
threads where the fd_recv_ready() flag disappears between the test in
dgram_fd_handler() and the second test in dns_resolve_recv() while a
thread calls fd_cant_recv(), and this situation can sustain itself for
a while. With 8 threads and an error in the socket queue, placing a
printf on the return statement in dns_resolve_recv() scrolls very fast.

Simply removing the "else" in dgram_fd_handler() addresses the issue.

This fix must be backported as far as 1.6.

BUG/MAJOR: dns: add minimalist error processing on the Rx path

It was reported in bug #399 that the DNS sometimes enters endless loops
after hours working fine. The issue is caused by a lack of error
processing in the DNS's recv() path combined with an exclusive recv OR
send in the UDP layer, resulting in some errors causing CPU loops that
will never stop until the process is restarted.

The basic cause is that the FD_POLL_ERR and FD_POLL_HUP flags are sticky
on the FD, and contrary to a stream socket, receiving an error on a
datagram socket doesn't indicate that this socket cannot be used anymore.
Thus the Rx code must at least handle this situation and flush the error
otherwise it will constantly be reported. In theory this should not be a
big issue but in practise it is due to another bug in the UDP datagram
handler which prevents the send() callback from being called when Rx
readiness was reported, so the situation cannot go away. It happens way
more easily with threads enabled, so that there is no dead time between
the moment the FD is disabled and another recv() is called, such as in
the example below where the request was sent to a closed port on the
loopback provoking an ICMP unreachable to be sent back:

  [pid 20888] 18:26:57.826408 sendto(29, ";\340\1\0\0\1\0\0\0\0\0\1\0031wt\2eu\0\0\34\0\1\0\0)\2\0\0\0\0\0\0\0", 35, 0, NULL, >
  [pid 20893] 18:26:57.826566 recvfrom(29, 0x7f97c54ef2f0, 513, 0, NULL, NULL) = -1 ECONNREFUSED (Connection refused)
  [pid 20889] 18:26:57.826601 recvfrom(29, 0x7f97c76182f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  [pid 20892] 18:26:57.826630 recvfrom(29, 0x7f97c5cf02f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  [pid 20891] 18:26:57.826684 recvfrom(29, 0x7f97c66162f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  [pid 20895] 18:26:57.826716 recvfrom(29, 0x7f97bffda2f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  [pid 20894] 18:26:57.826747 recvfrom(29, 0x7f97c4cee2f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  [pid 20888] 18:26:58.419838 recvfrom(29, 0x7ffcc8712c20, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  [pid 20893] 18:26:58.419900 recvfrom(29, 0x7f97c54ef2f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  (... hundreds before next sendto() ...)

This situation was handled by clearing HUP and ERR when recv()
returns <0.

A second case was handled, there was a control for a missing dgram
handler, but it does nothing, causing the FD to ring again if this
situation ever happens. After looking at the rest of the code, it
doesn't seem possible to face such a situation because these handlers
are registered during startup, but at least we need to handle it
properly.

A third case was handled, that's mainly a small optimization. With
threads and massive responses, due to the large lock around the loop,
it's likely that some threads will have seen fd_recv_ready() and will
wait at the lock(). But if they wait here, chances are that other
threads will have eliminated pending data and issued fd_cant_recv().
In this case, better re-check fd_recv_ready() before performing the
recv() call to avoid the huge amounts of syscalls that happen on
massively threaded setups.

This patch must be backported as far as 1.6 (the atomic AND just
needs to be turned to a regular AND).

BUG/MEDIUM: kqueue: Make sure we report read events even when no data.

When we have a EVFILT_READ event, an optimization was made, and the FD was
not reported as ready to receive if there were no data available. That way,
if the socket was closed by our peer (the EV8EOF flag was set), and there were
no remaining data to read, we would just close(), and avoid doing a recv().
However, it may be fine for TCP socket, but it is not for UDP.
If we send data via UDP, and we receive an error, the only way to detect it
is to attempt a recv(). However, in this case, kevent() will report a read
event, but with no data, so we'd just ignore that read event, nothing would be
done about it, and the poller would be woken up by it over and over.
To fix this, report read events if either we have data, or the EV_EOF flag
is not set.

This should be backported to 2.1, 2.0, 1.9 and 1.8.

DOC: document the listener state transitions

This was done by reading all the code affecting a listener's state,
hopefully it will save some time in the future.

REORG: listener: move the global listener queue code to listener.c

The global listener queue code and declarations were still lying in
haproxy.c while not needed there anymore at all. This complicates
the code for no reason. As a result, the global_listener_queue_task
and the global_listener_queue were made static.

MINOR: listener: split dequeue_all_listener() in two

We use it half times for the global_listener_queue and half times
for a proxy's queue and this requires the callers to take care of
these. Let's split it in two versions, the current one working only
on the global queue and another one dedicated to proxies for the
per-proxy queues. This cleans up quite a bit of code.

MINOR: listener: make the wait paths cleaner and more reliable

In listener_accept() there are several situations where we have to wait
for an event or a delay. These ones all implement their own call to
limit_listener() and the associated task_schedule(). In addition to
being ugly and confusing, one expire date computation is even wrong as
it doesn't take in account the fact that we're using threads and that
the value might change in the middle. Fortunately task_schedule() gets
it right for us.

This patch creates two jump locations, one for the global queue and
one for the proxy queue, allowing the rest of the code to only compute
the expire delay and jump to the right location.

BUG/MEDIUM: listener/threads: fix a remaining race in the listener's accept()

Recent fix 4c044e274c ("BUG/MEDIUM: listener/thread: fix a race when
pausing a listener") is insufficient and moves the race slightly farther.
What now happens is that if we're limiting a listener due to a transient
error such as an accept() error for example, or because the proxy's
maxconn was reached, another thread might in the mean time have switched
again to LI_READY and at the end of the function we'll disable polling on
this FD, resulting in a listener that never accepts anything anymore. It
can more easily happen when sending SIGTTOU/SIGTTIN to temporarily pause
the listeners to let another process bind next to them.

What this patch does instead is to move all enable/disable operations at
the end of the function and condition them to the state. The listener's
state is checked under the lock and the FD's polling state adjusted
accordingly so that the listener's state and the FD always remain 100%
synchronized. It was verified with 16 threads that the cost of taking
that lock is not measurable so that's fine.

This should be backported to the same branches the patch above is
backported to.

BUG/MINOR: listener: also clear the error flag on a paused listener

When accept() fails because a listener is temporarily paused, the
FD might have both FD_POLL_HUP and FD_POLL_ERR bits set. While we do
not exploit FD_POLL_ERR here it's better to clear it because it is
reported on "show fd" and is confusing.

This may be backported to all versions.

BUG/MINOR: listener/threads: always use atomic ops to clear the FD events

There was a leftover of the single-threaded era when removing the
FD_POLL_HUP flag from the listeners. By not using an atomic operation
to clear the flag, another thread acting on the same listener might
have lost some events, though this would have resulted in that thread
to reprocess them immediately on the next loop pass.

This should be backported as far as 1.8.

BUG/MINOR: proxy: make soft_stop() also close FDs in LI_PAUSED state

The proxies' soft_stop() function closes the FDs in all opened states
except LI_PAUSED. This means that a transient error on a listener might
cause it to turn back to the READY state if it happens exactly when a
reload signal is received.

This must be backported to all supported versions.

BUG/MEDIUM: mux-fcgi: Handle cases where the HTX EOM block cannot be inserted

During the HTTP response parsing, if there is not enough space in the channel's
buffer, it is possible to fail to add the HTX EOM block while all data in the
rxbuf were consumed. As for the h1 mux, we must notify the conn-stream the
buffer is full to have a chance to add the HTX EOM block later. In this case, we
must also be carefull to not report a server abort by setting too early the
CS_FL_EOS flag on the conn-stream.

To do so, the FCGI_SF_APPEND_EOM flag must be set on the FCGI stream to know the
HTX EOM block is missing.

This patch must be backported to 2.1.

BUG/MINOR: mux-h1: Be sure to set CS_FL_WANT_ROOM when EOM can't be added

During the message parsing, when the HTX buffer is full and only the HTX EOM
block cannot be added, it is important to notify the conn-stream that some
processing must still be done but it is blocked because there is not enough room
in the buffer. The way to do so is to set the CS_FL_WANT_ROOM flag on the
conn-stream. Otherwise, because all data are received and consumed, the mux is
not called anymore to add this last block, leaving the message unfinished from
the HAProxy point of view. The only way to unblock it is to receive a shutdown
for reads or to hit a timeout.

This patch must be backported to 2.1 and 2.0. The 1.9 does not seem to be
affected.

MEDIUM: init: set NO_NEW_PRIVS by default when supported

HAProxy doesn't need to call executables at run time (except when using
external checks which are strongly recommended against), and is even expected
to isolate itself into an empty chroot. As such, there basically is no valid
reason to allow a setuid executable to be called without the user being fully
aware of the risks. In a situation where haproxy would need to call external
checks and/or disable chroot, exploiting a vulnerability in a library or in
haproxy itself could lead to the execution of an external program. On Linux
it is possible to lock the process so that any setuid bit present on such an
executable is ignored. This significantly reduces the risk of privilege
escalation in such a situation. This is what haproxy does by default. In case
this causes a problem to an external check (for example one which would need
the "ping" command), then it is possible to disable this protection by
explicitly adding this directive in the global section. If enabled, it is
possible to turn it back off by prefixing it with the "no" keyword.

Before the option:

  $ socat - /tmp/sock1 <<< "expert-mode on; debug dev exec sudo /bin/id"
  uid=0(root) gid=0(root) groups=0(root

After the option:
  $ socat - /tmp/sock1 <<< "expert-mode on; debug dev exec sudo /bin/id"
  sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the
        'nosuid' option set or an NFS file system without root privileges?

MINOR: debug: replace popen() with pipe+fork() in "debug dev exec"

popen() is annoying because it doesn't catch stderr. The command was
implemented using it just by pure laziness, let's just redo it a bit
cleaner using normal syscalls. Note that this command is only enabled
when built with -DDEBUG_DEV.

BUG/MEDIUM: checks: Make sure we set the task affinity just before connecting.

In process_chk_conn(), make sure we set the task affinity to the current
thread as soon as we're attempting a connection (and reset the affinity to
"any thread" if we detect a failure).
We used to only set the task affinity if connect_conn_chk() returned
SF_ERR_NONE, however for TCP checks, SF_ERR_UP is returned, so for those
checks, the task could still run on any thread, and this could lead to a
race condition where the connection runs on one thread, while the task runs
on another one, which could create random memory corruption and/or crashes.
This may fix github issue #369.

This should be backported to 2.1, 2.0 and 1.9.

BUG/MEDIUM: tasks: Make sure we switch wait queues in task_set_affinity().

In task_set_affinity(), leave the wait_queue if any before changing the
affinity, and re-enter a wait queue once it is done. If we don't do that,
the task may stay in the wait queue of another thread, and we later may
end up modifying that wait queue while holding no lock, which could lead
to memory corruption.

THis should be backported to 2.1, 2.0 and 1.9.

BUG/MINOR: mux-h1: Fix conditions to know whether or not we may receive data

The h1_recv_allowed() function is inherited from the h2 multiplexer. But for the
h1, conditions to know if we may receive data are less complex because there is
no multiplexing and because data are not parsed when received. So now, following
rules are respected :

* if an error or a shutdown for reads was detected on the connection we must
   not attempt to receive
* if the input buffer failed to be allocated or is full, we must not try to
   receive
* if the input processing is busy waiting for the output side, we may attempt
   to receive
* otherwise must may not attempt to receive

This patch must be backported as far as 1.9.

BUG/MINOR: mux-h1: Don't rely on CO_FL_SOCK_RD_SH to set H1C_F_CS_SHUTDOWN

The CO_FL_SOCK_RD_SH flag is only set when a read0 is received. So we must not
rely on it to set the H1 connection in shutdown state (H1C_F_CS_SHUTDOWN). In
fact, it is suffisant to set the connection in shutdown state when the shutdown
for writes is forwared to the sock layer.

This patch must be backported as far as 1.9.

BUG/MEDIUM: mux-h1: Never reuse H1 connection if a shutw is pending

On the server side, when a H1 stream is detached from the connection, if the
connection is not reusable but some outgoing data remain, the connection is not
immediatly released. In this case, the connection is not inserted in any idle
connection list. But it is still attached to the session. Because of that, it
can be erroneously reused. h1_avail_streams() always report a free slot if no
stream is attached to the connection, independently on the connection's
state. It is obviously a bug. If a second request is handled by the same session
(it happens with H2 connections on the client side), this connection is reused
before we close it.

There is small window to hit the bug, but it may lead to very strange
behaviors. For instance, if a first h2 request is quickly aborted by the client
while it is blocked in the mux on the server side (so before any response is
received), a second request can be processed and sent to the server. Because the
connection was not closed, the possible reply to the first request will be
interpreted as a reply to the second one. It is probably the bug described by
Peter Fröhlich in the issue #290.

To fix the bug, a new flag has been added to know if an H1 connection is idle or
not. So now, H1C_F_CS_IDLE is set when a connection is idle and useable to
handle a new request. If it is set, we try to add the connection in an idle
connection list. And h1_avail_streams() only relies on this flag
now. Concretely, this flag is set when a K/A stream is detached and both the
request and the response are in DONE state. It is exclusive to other H1C_F_CS
flags.

This patch must be backported as far as 1.9.

BUG/MINOR: ssl: certificate choice can be unexpected with openssl >= 1.1.1

It's regression from 9f9b0c6 "BUG/MEDIUM: ECC cert should work with
TLS < v1.2 and openssl >= 1.1.1". Wilcard EC certifcate could be selected
at the expense of specific RSA certificate.
In any case, specific certificate should always selected first, next wildcard.
Reflect this rule in a loop to avoid any bug in certificate selection changes.

Fix issue #394.

It should be backported as far as 1.8.

BUG/MEDIUM: listener/thread: fix a race when pausing a listener

There exists a race in the listener code where a thread might disable
receipt on a listener's FD then turn it to LI_PAUSED while at the same
time another one faces EAGAIN on accept() and enables it again via
fd_cant_recv(). The result is that the FD is in LI_PAUSED state with
its polling still enabled. listener_accept() does not do anything then
and doesn't disable the FD either, resulting in a thread eating all the
CPU as reported in issue #358. A solution would be to take the listener's
lock to perform the fd_cant_recv() call and do it only if the FD is still
in LI_READY state, but this would be totally overkill while in practice
the issue only happens during shutdown.

Instead what is done here is that when leaving we recheck the state and
disable polling if the listener is not in LI_READY state, which never
happens except when being limited. In the worst case there could be one
extra check per thread for the time required to converge, which is
absolutely nothing.

This fix was successfully tested, and should be backported to all
versions using the lock-free listeners, which means all those containing
commit 3f0d02bb ("MAJOR: listener: do not hold the listener lock in
listener_accept()"), hence 2.1, 2.0, 1.9.7+, 1.8.20+.

BUG/MINOR: ssl/cli: don't overwrite the filters variable

When a crt-list line using an already used ckch_store does not contain
filters, it will overwrite the ckchs->filters variable with 0.
This problem will generate all sni_ctx of this ckch_store without
filters. Filters generation mustn't be allowed in any case.

Must be backported in 2.1.

BUG/MINOR: stream-int: avoid calling rcv_buf() when splicing is still possible

In si_cs_recv(), we can end up with a partial splice() call that will be
followed by an attempt to us rcv_buf(). Sometimes this works and places
data into the buffer, which then prevent splicing from being used, and
this causes splice() and recvfrom() calls to alternate. Better simply
refrain from calling rcv_buf() when there are data in the pipe and still
data to be forwarded. Usually this indicates that we've ate everything
available and that we still want to use splice() on subsequent calls.

This should be backported to 2.1 and 2.0.

BUG/MEDIUM: stream-int: don't subscribed for recv when we're trying to flush data

If we cannot splice incoming data using rcv_pipe() due to remaining data
in the buffer, we must not subscribe to the mux but instead tag the
stream-int as blocked on missing Rx room. Otherwise when data are
flushed, calling si_chk_rcv() will have no effect because the WAIT_EP
flag remains present, and we'll end in an rx timeout. This case is very
hard to reproduce, and requires an inversion of the polling side in the
middle of a transfer. This can only happen when the client and the server
are using similar links and when splicing is enabled. It typically takes
hundreds of MB to GB for the problem to happen, and tends to be magnified
by the use of option contstats which causes process_stream() to be called
every 5s and to try again to recv.

This fix must be backported to 2.1, 2.0, and possibly 1.9.

BUG/MINOR: ssl/cli: 'ssl cert' cmd only usable w/ admin rights

The 3 commands 'set ssl cert', 'abort ssl cert' and 'commit ssl cert'
must be only usable with admin rights over the CLI.

Must be backported in 2.1.

MEDIUM: init: prevent process and thread creation at runtime

Some concerns are regularly raised about the risk to inherit some Lua
files which make use of a fork (e.g. via os.execute()) as well as
whether or not some of bugs we fix might or not be exploitable to run
some code. Given that haproxy is event-driven, any foreground activity
completely stops processing and is easy to detect, but background
activity is a different story. A Lua script could very well discretely
fork a sub-process connecting to a remote location and taking commands,
and some injected code could also try to hide its activity by creating
a process or a thread without blocking the rest of the processing. While
such activities should be extremely limited when run in an empty chroot
without any permission, it would be better to get a higher assurance
they cannot happen.

This patch introduces something very simple: it limits the number of
processes and threads to zero in the workers after the last thread was
created. By doing so, it effectively instructs the system to fail on
any fork() or clone() syscall. Thus any undesired activity has to happen
in the foreground and is way easier to detect.

This will obviously break external checks (whose concept is already
totally insecure), and for this reason a new option
"insecure-fork-wanted" was added to disable this protection, and it
is suggested in the fork() error report from the checks. It is
obviously recommended not to use it and to reconsider the reasons
leading to it being enabled in the first place.

If for any reason we fail to disable forks, we still start because it
could be imaginable that some operating systems refuse to set this
limit to zero, but in this case we emit a warning, that may or may not
be reported since we're after the fork point. Ideally over the long
term it should be conditionned by strict-limits and cause a hard fail.

DOC: move the "group" keyword at the right place

It looks like "hard-stop-after", "h1-case-adjust" and "h1-case-adjust-file"
were added before "group", breaking alphabetical ordering.

DOC: Fix ordered list in summary

Section 6 about the cache was placed between 7 and 8. This should
be backported to 2.1.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

DOC: clarify matching strings on binary fetches

Add clarification and example to string matching on binary samples,
as comparison stops at first null byte due to strncmp behaviour.

Backporting all the way down to 1.5 is suggested as it might save
from headaches.

BUG/MINOR: ssl: fix X509 compatibility for openssl < 1.1.0

Commit d4f9a60e "MINOR: ssl: deduplicate ca-file" uses undeclared X509
functions when build with openssl < 1.1.0. Introduce this functions
in openssl-compat.h .

Fix issue #385.

BUG/MINOR: stats: Fix HTML output for the frontends heading

Since the flag STAT_SHOWADMIN was removed, the frontends heading in the HTML
output appears unaligned because the space reserved for the checkbox (not
displayed for frontends) is not inserted.

This patch fixes the issue #390. It must be backported to 2.1.

BUG/MINOR: fcgi-app: Make the directive pass-header case insensitive

The header name configured by the directive "pass-header", in the "fcgi-app"
section, must be case insensitive. For now, it must be in lowercase to match an
header. Internally, header names are in lowercase but there is no reason to
impose this syntax in the configuration.

This patch must be backported to 2.1.

BUG/MINOR: ssl: fix SSL_CTX_set1_chain compatibility for openssl < 1.0.2

Commit 1c65fdd5 "MINOR: ssl: add extra chain compatibility" really implement
SSL_CTX_set0_chain. Since ckch can be used to init more than one ctx with
openssl < 1.0.2 (commit 89f58073 for X509_chain_up_ref compatibility),
SSL_CTX_set1_chain compatibility is required.

This patch must be backported to 2.1.

DOC: ssl/cli: set/commit/abort ssl cert

Document the "set/commit/abort ssl cert" CLI commands in management.txt.

Must be backported in 2.1.

BUG/MINOR: http-htx: Don't make http_find_header() fail if the value is empty

http_find_header() is used to find the next occurrence of a header matching on
its name. When found, the matching header is returned with the corresponding
value. This value may be empty. Unfortunatly, because of a bug, an empty value
make the function fail.

This patch must be backported to 2.1, 2.0 and 1.9.

CLEANUP: dns: resolution can never be null

`eb` being tested above, `res` cannot be null, so the condition is
not needed and introduces potential dead code.

also fix a typo in associated comment

This should fix issue #349

Reported-by: Ð\98Ð»Ñ\8cÑ\8f Ð¨Ð¸Ð¿Ð¸Ñ\86Ð¸Ð½ <chipitsine@gmail.com>
Signed-off-by: William Dauchy <w.dauchy@criteo.com>

MINOR: ssl: deduplicate crl-file

Load file for crl or ca-cert is realy done with the same function in OpenSSL,
via X509_STORE_load_locations. Accordingly, deduplicate crl-file and ca-file
can share the same function.

MINOR: ssl: compute ca-list from deduplicate ca-file

ca-list can be extracted from ca-file already loaded in memory.
This patch set ca-list from deduplicated ca-file when needed
and share it in ca-file tree.

As a corollary, this will prevent file access for ca-list when
updating a certificate via CLI.

MINOR: ssl: deduplicate ca-file

Typically server line like:
'server-template srv 1-1000 *:443 ssl ca-file ca-certificates.crt'
load ca-certificates.crt 1000 times and stay duplicated in memory.
Same case for bind line: ca-file is loaded for each certificate.
Same 'ca-file' can be load one time only and stay deduplicated in
memory.

As a corollary, this will prevent file access for ca-file when
updating a certificate via CLI.

DOC: Clarify behavior of server maxconn in HTTP mode

In HTTP mode the number of concurrent requests is limited, not the
number of actual connections.

BUILD/MINOR: trace: fix use of long type in a few printf format strings

Building on a 32-bit platform produces these warnings in trace code:

src/stream.c: In function 'strm_trace':
src/stream.c:226:29: warning: format '%lu' expects argument of type 'long unsigned int', but argument 9 has type 'size_t {aka const unsigned int}' [-Wformat=]
   chunk_appendf(&trace_buf, " req=(%p .fl=0x%08x .ana=0x%08x .exp(r,w,a)=(%u,%u,%u) .o=%lu .tot=%llu .to_fwd=%u)",
                             ^
src/stream.c:229:29: warning: format '%lu' expects argument of type 'long unsigned int', but argument 9 has type 'size_t {aka const unsigned int}' [-Wformat=]
   chunk_appendf(&trace_buf, " res=(%p .fl=0x%08x .ana=0x%08x .exp(r,w,a)=(%u,%u,%u) .o=%lu .tot=%llu .to_fwd=%u)",
                             ^
src/mux_fcgi.c: In function 'fcgi_trace':
src/mux_fcgi.c:443:29: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka const unsigned int}' [-Wformat=]
   chunk_appendf(&trace_buf, " - VAL=%lu", *val);
                             ^
src/mux_h1.c: In function 'h1_trace':
src/mux_h1.c:290:29: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka const unsigned int}' [-Wformat=]
   chunk_appendf(&trace_buf, " - VAL=%lu", *val);
                             ^

Let's just cast the type to long. This should be backported to 2.1.

BUG/MINOR: h1: Don't test the host header during response parsing

During the H1 message parsing, the host header is tested to be sure it matches
the request's authority, if defined. When there are multiple host headers, we
also take care they are all the same. Of course, these tests must only be
performed on the requests. A host header in a response has no special meaning.

This patch must be backported to 2.1.

BUG/MINOR: contrib/prometheus-exporter: decode parameter and value only

we were decoding all substring and then parsing; this could lead to
consider & and = in decoding result as delimiters where it should not.
this patch reverses the order by first parsing and then decoding each key
and value separately.

we also stop parsing after number sign (#).

This patch should be backported to 2.1 and 2.0

Signed-off-by: William Dauchy <w.dauchy@criteo.com>

CLEANUP: ssl: Clean up error handling

This commit removes the explicit checks for `if (err)` before
passing `err` to `memprintf`. `memprintf` already checks itself
whether the `**out*` parameter is `NULL` before doing anything.
This reduces the indentation depth and makes the code more readable,
before there is less boilerplate code.

Instead move the check into the ternary conditional when the error
message should be appended to a previous message. This is consistent
with the rest of ssl_sock.c and with the rest of HAProxy.

Thus this patch is the arguably cleaner fix for issue #374 and builds
upon
5f1fa7db86c53827c97f8a8c3f5fa75bfcb5be9a and
8b453912ce9a4e1a3b1329efb2af04d1e470852e

Additionally it fixes a few places where the check *still* was missing.

SCRIPTS: update create-release to fix the changelog on new branches

The changelog is empty when creating a dev0 version and this confuses
the commit message, let's clearly mention the exact copy when there are
no changes.

MINOR: version: this is development again, update the status

It's basically a revert of commit 9ca7f8cea.

DOC: this is development again

This is basically a revert of commit eb1a3ee5 ("DOC: mention in INSTALL
haproxy 2.1 is a stable stable version").

[RELEASE] Released version 2.2-dev0

Released version 2.2-dev0 with the following main changes :
- exact copy of 2.1.0

[RELEASE] Released version 2.1.0

Released version 2.1.0 with the following main changes :
    - BUG/MINOR: init: fix set-dumpable when using uid/gid
    - MINOR: init: avoid code duplication while setting identify
    - BUG/MINOR: ssl: ssl_pkey_info_index ex_data can store a dereferenced pointer
    - BUG/MINOR: ssl: fix crt-list neg filter for openssl < 1.1.1
    - MINOR: peers: Alway show the table info for disconnected peers.
    - MINOR: peers: Add TX/RX heartbeat counters.
    - MINOR: peers: Add debugging information to "show peers".
    - BUG/MINOR: peers: Wrong null "server_name" data field handling.
    - MINOR: ssl/cli: 'abort ssl cert' deletes an on-going transaction
    - BUG/MEDIUM: mworker: don't fill the -sf argument with -1 during the reexec
    - BUG/MINOR: peers: "peer alive" flag not reset when deconnecting.
    - BUILD/MINOR: ssl: fix compiler warning about useless statement
    - BUG/MEDIUM: stream-int: Don't loose events on the CS when an EOS is reported
    - MINOR: contrib/prometheus-exporter: filter exported metrics by scope
    - MINOR: contrib/prometheus-exporter: Add a param to ignore servers in maintenance
    - BUILD: debug: Avoid warnings in dev mode with -02 because of some BUG_ON tests
    - BUG/MINOR: mux-h1: Fix tunnel mode detection on the response path
    - BUG/MINOR: http-ana: Properly catch aborts during the payload forwarding
    - DOC: Update http-buffer-request description to remove the part about chunks
    - BUG/MINOR: stream-int: Fix si_cs_recv() return value
    - DOC: internal: document the init calls
    - MEDIUM: dns: Add resolve-opts "ignore-weight"
    - MINOR: ssl: ssl_sock_prepare_ctx() return an error code
    - MEDIUM: ssl/cli: apply SSL configuration on SSL_CTX during commit
    - MINOR: ssl/cli: display warning during 'commit ssl cert'
    - MINOR: version: report the version status in "haproxy -v"
    - MINOR: version: emit the link to the known bugs in output of "haproxy -v"
    - DOC: Add documentation about the use-service action
    - MINOR: ssl: fix possible null dereference in error handling
    - BUG/MINOR: ssl: fix curve setup with LibreSSL
    - BUG/MINOR: ssl: Stop passing dynamic strings as format arguments
    - CLEANUP: ssl: check if a transaction exists once before setting it
    - BUG/MINOR: cli: fix out of bounds in -S parser
    - MINOR: ist: add ist_find_ctl()
    - BUG/MAJOR: h2: reject header values containing invalid chars
    - BUG/MAJOR: h2: make header field name filtering stronger
    - BUG/MAJOR: mux-h2: don't try to decode a response HEADERS frame in idle state
    - MINOR: h2: add a function to report H2 error codes as strings
    - MINOR: mux-h2/trace: report the connection and/or stream error code
    - SCRIPTS: create-release: show the correct origin name in suggested commands
    - SCRIPTS: git-show-backports: add "-s" to proposed cherry-pick commands
    - BUG/MEDIUM: trace: fix a typo causing an incorrect startup error
    - BUILD: reorder the objects in the makefile
    - DOC: mention in INSTALL haproxy 2.1 is a stable stable version
    - MINOR: version: indicate that this version is stable

MINOR: version: indicate that this version is stable

Also indicate that it will get fixes till ~Q1 2021.

DOC: mention in INSTALL haproxy 2.1 is a stable stable version

Let's switch back to the stable wording now.

BUILD: reorder the objects in the makefile

After a number of reorganization, addition of fcgi and the removal of
the legacy mode, some late files ended up being slow to build and were
slowing down the parallel build. Let's reorder them based on the build
time. Full build went down from 8.3-9.2s to 6.8s.

BUG/MEDIUM: trace: fix a typo causing an incorrect startup error

Since commit 88ebd40 ("MINOR: trace: add allocation of buffer-sized
trace buffers") we have a trace buffer allocated at boot time. But
there was a copy-paste error there making the test verify that the
trash was allocated instead of the trace buffer. The result is that
depending on the link order either the test will succeed or fail,
preventing haproxy from starting at all.

No backport is needed.

SCRIPTS: git-show-backports: add "-s" to proposed cherry-pick commands

Since we're using signed-off-by tags for backports, let's add -s to
the command so that we can finally copy-paste it!

SCRIPTS: create-release: show the correct origin name in suggested commands

create-release shows the next steps at the end and suggest to use
"git push origin master" but on my machine it's not "origin" so let's
determine it using git config and only use origin as a fall back.

MINOR: mux-h2/trace: report the connection and/or stream error code

We were missing the error code when tracing a call to h2s_error() or
h2c_error(), let's report it when it's set.

MINOR: h2: add a function to report H2 error codes as strings

Just like we have frame type to string, let's have error to string to
improve debugging and traces.

BUG/MAJOR: mux-h2: don't try to decode a response HEADERS frame in idle state

Christopher found another issue in the H2 backend implementation that
results from a miss in the H2 spec: the processing of a HEADERS frame
is always permitted in IDLE state, but this doesn't make sense on the
response path! And here when facing such a frame, we try to decode it
while we didn't allocate any stream, so we end up trying to fill the
idle stream's buffer (read-only) and crash.

What we're doing here is that if we get a HEADERS frame in IDLE state
from a server, we terminate the connection with a PROTOCOL_ERROR. No
such transition seems to be permitted by the spec but it seems to be
the only sane solution.

This fix must be backported as far as 1.9. Note that in 2.0 and earlier
there's no h2_frame_check_vs_state() function, instead the check is
inlined in h2_process_demux().

BUG/MAJOR: h2: make header field name filtering stronger

Tim Düsterhus found that the amount of sanitization we perform on HTTP
header field names received in H2 is insufficient. Currently we reject
upper case letters as mandated by RFC7540#8.1.2, but section 10.3 also
requires that intermediaries translating streams to HTTP/1 further
refine the filtering to also reject invalid names (which means any name
that doesn't match a token). There is a small trick here which is that
the colon character used to start pseudo-header names doesn't match a
token, so pseudo-header names fall into that category, thus we have to
swap the pseudo-header name lookup with this check so that we only check
from the second character (past the ':') in case of pseudo-header names.

Another possibility could have been to perform this check only in the
HTX-to-H1 trancoder but doing would still expose the configured rules
and logs to such header names.

This fix must be backported as far as 1.8 since this bug could be
exploited and serve as the base for an attack. In 2.0 and earlier,
functions h2_make_h1_request() and h2_make_h1_trailers() must also
be adapted to sanitize requests coming in legacy mode.

BUG/MAJOR: h2: reject header values containing invalid chars

Tim Düsterhus reported an annoying problem in the H2 decoder related to
an ambiguity in the H2 spec. The spec says in section 10.3 that HTTP/2
allows header field values that are not valid (since they're binary) and
at the same time that an H2 to H1 gateway must be careful to reject headers
whose values contain \0, \r or \n.

Till now, and for the sake of the ability to maintain end-to-end binary
transparency in H2-to-H2, the H2 mux wouldn't reject this since it does
not know what version will be used on the other side.

In theory we should in fact perform such a check when converting an HTX
header to H1. But this causes a problem as it means that all our rule sets,
sample fetches, captures, logs or redirects may still find an LF in a header
coming from H2. Also in 2.0 and older in legacy mode, the frames are instantly
converted to H1 and HTX couldn't help there. So this means that in practice
we must refrain from delivering such a header upwards, regardless of any
outgoing protocol consideration.

Applying such a lookup on all headers leaving the mux comes with a
significant performance hit, especially for large ones. A first attempt
was made at placing this into the HPACK decoder to refrain from learning
invalid literals but error reporting becomes more complicated. Additional
tests show that doing this within the HTX transcoding loop benefits from
the hot L1 cache, and that by skipping up to 8 bytes per iteration the
CPU cost remains within noise margin, around ~0.5%.

This patch must be backported as far as 1.8 since this bug could be
exploited and serve as the base for an attack. In 2.0 and earlier the
fix must also be added to functions h2_make_h1_request() and
h2_make_h1_trailers() to handle legacy mode. It relies on previous patch
"MINOR: ist: add ist_find_ctl()" to speed up the control bytes lookup.

All credits go to Tim for his detailed bug report and his initial patch.