git.ipfire.org Git - thirdparty/haproxy.git/log

BUG/MEDIUM: connections: force connections cleanup on server changes

I've been trying to understand a change of behaviour between v2.2dev5 and
v2.2dev6. Indeed our probe is regularly testing to add and remove
servers on a given backend such as:

# echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
113 be_foo 1 srv0 10.236.139.34 2 0 1 1 263 15 3 4 6 0 0 0 - 31255 -
113 be_foo 2 srv1 0.0.0.0 0 1 256 256 0 15 3 0 14 0 0 0 - 0 -

-> curl on the corresponding frontend: reply from server:31255

# echo "set server be_foo/srv1 addr 10.236.139.34 port 31257" | sudo socat stdio /var/lib/haproxy/stats
IP changed from '0.0.0.0' to '10.236.139.34', port changed from '0' to '31257' by 'stats socket command'
# echo "set server be_foo/srv1 weight 256" | sudo socat stdio /var/lib/haproxy/stats
# echo "set server be_foo/srv1 check-port 8500" | sudo socat stdio /var/lib/haproxy/stats
health check port updated.
# echo "set server be_foo/srv1 state ready" | sudo socat stdio /var/lib/haproxy/stats
# echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
113 be_foo 1 srv0 10.236.139.34 2 0 1 1 105 15 3 4 6 0 0 0 - 31255 -
113 be_foo 2 srv1 10.236.139.34 2 0 256 256 2319 15 3 2 6 0 0 0 - 31257 -

-> curl on the corresponding frontend: reply for server:31257
(notice the difference of weight)

# echo "set server be_foo/srv1 state maint" | sudo socat stdio /var/lib/haproxy/stats
# echo "set server be_foo/srv1 addr 0.0.0.0 port 0" | sudo socat stdio /var/lib/haproxy/stats
IP changed from '10.236.139.34' to '0.0.0.0', port changed from '31257' to '0' by 'stats socket command'
# echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
113 be_foo 1 srv0 10.236.139.34 2 0 1 1 263 15 3 4 6 0 0 0 - 31255 -
113 be_foo 2 srv1 0.0.0.0 0 1 256 256 0 15 3 0 14 0 0 0 - 0 -

-> curl on the corresponding frontend: reply from server:31255

# echo "set server be_foo/srv1 addr 10.236.139.34 port 31256" | sudo socat stdio /var/lib/haproxy/stats
IP changed from '0.0.0.0' to '10.236.139.34', port changed from '0' to '31256' by 'stats socket command'
# echo "set server be_foo/srv1 weight 256" | sudo socat stdio /var/lib/haproxy/stats
# echo "set server be_foo/srv1 check-port 8500" | sudo socat stdio /var/lib/haproxy/stats
health check port updated.
# echo "set server be_foo/srv1 state ready" | sudo socat stdio /var/lib/haproxy/stats
# echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
113 be_foo 1 srv0 10.236.139.34 2 0 1 1 105 15 3 4 6 0 0 0 - 31255 -
113 be_foo 2 srv1 10.236.139.34 2 0 256 256 2319 15 3 2 6 0 0 0 - 31256 -

-> curl on the corresponding frontend: reply from server:31257 (!)

Here we indeed would expect to get an anver from server:31256. The issue
is highly linked to the usage of `pool-purge-delay`, with a value which
is higher than the duration of the test, 10s in our case.

a git bisect between dev5 and dev6 seems to show commit
079cb9af22da6 ("MEDIUM: connections: Revamp the way idle connections are killed")
being the origin of this new behaviour.

So if I understand the later correctly, it seems that it was more a
matter of chance that we did not saw the issue earlier.

My patch proposes to force clean idle connections in the two following
cases:
- we set a (still running) server to maintenance
- we change the ip/port of a server

This commit should be backported to 2.1, 2.0, and 1.9.

Signed-off-by: William Dauchy <w.dauchy@criteo.com>

BUG/MEDIUM: mux-fcgi: Fix wrong test on FCGI_CF_KEEP_CONN in fcgi_detach()

When a stream is detached from its connection, we try to move the connection in
an idle list to keep it opened, the session one or the server one. But it must
only be done if there is no connection error and if we want to keep it
open. This last statement is true if FCGI_CF_KEEP_CONN flag is set. But the test
is inverted at the stage.

This patch must be backported to 2.1.

BUG/MEDIUM: mux_fcgi: Free the FCGI connection at the end of fcgi_release()

fcgi_release() function is responsible to release a FCGI connection. But the
release of the connection itself is missing.

This patch must be backported to 2.1.

BUG/MEDIUM: mux-fcgi: Return from detach if server don't keep the connection

When the last stream is detached from a FCGI connection, if the server don't add
the connection in its idle list, the connection is destroyed. Thus it is
important to exist immediately from the detach function. A return statement is
missing here.

This bug was introduced in the commit 2444aa5b6 ("MEDIUM: sessions: Don't be
responsible for connections anymore.").

It is a 2.2-dev bug. No need to backport.

MINOR: stream: report the list of active filters on stream crashes

Now we very rarely catch spinning streams, and whenever we catch one it
seems a filter is involved, but we currently report no info about them.
Let's print the list of enabled filters on the stream with such a crash
to help with the reports. A typical output will now look like this:

[ALERT] 121/165908 (1110) : A bogus STREAM [0x7fcaf4016a60] is spinning at 2 calls per second and refuses to die, aborting now! Please report this error to developers [strm=0x7fcaf4016a60 src=127.0.0.1 fe=l1 be=l1 dst=<CACHE> rqf=6dc42000 rqa=48000 rpf=a0040223 rpa=24000000 sif=EST,10008 sib=DIS,80110 af=(nil),0 csf=0x7fcaf4023c00,10c000 ab=0x7fcaf40235f0,4 csb=(nil),0 cof=0x7fcaf4016610,1300:H1(0x7fcaf4016840)/RAW((nil))/tcpv4(29) cob=(nil),0:NONE((nil))/NONE((nil))/NONE(0) filters={0x7fcaf4016fb0="cache store filter", 0x7fcaf4017080="compression filter"}]

This may be backported to 2.0.

BUG/MEDIUM: shctx: bound the number of loops that can happen around the lock

Given that a "count" value of 32M was seen in _shctx_wait4lock(), it
is very important to prevent this from happening again. It's absolutely
essential to prevent the value from growing unbounded because with an
increase of the number of threads, the number of successive failed
attempts will necessarily grow.

Instead now we're scanning all 2^p-1 values from 3 to 255 and are
bounding to count to 255 so that in the worst case each thread tries an
xchg every 255 failed read attempts. That's one every 4 on average per
thread when there are 64 threads, which corresponds to the initial count
of 4 for the first attempt so it seems like a reasonable value to keep a
low latency.

The bug was introduced with the shctx entries in 1.5 so the fix must
be backported to all versions. Before 1.8 the function was called
_shared_context_wait4lock() and was in shctx.c.

BUG/MEDIUM: shctx: really check the lock's value while waiting

Jérôme reported an amazing crash in the spinlock version of
_shctx_wait4lock() with an extremely high <count> value of 32M! The
root cause is that the function cannot deal with contention on the lock
at all because it forgets to check if the lock's value has changed! As
such, every time it's called due to a contention, it waits twice as
long before trying again and lets the caller check for the contention
by itself.

The correct thing to do is to compare the value again at each loop.
This way it makes sure to mostly perform read accesses on the shared
cache line without writing too often, and to be ready fast enough to
try to grab the lock. And we must not increase the count on success
either!

Unfortunately I'd have expected to see a performance boost on the cache
with this but there was absolutely no change, so it's very likely that
these issues only happen once in a while and are sufficient to derail
the process when they strike, but not to have a permanent performance
impact.

The bug was introduced with the shctx entries in 1.5 so the fix must
be backported to all versions. Before 1.8 the function was called
_shared_context_wait4lock() and was in shctx.c.

BUG/MINOR: debug: properly use long long instead of long for the thread ID

I changed my mind twice on this one and pushed after the last test with
threads disabled, without re-enabling long long, causing this rightful
build warning.

This needs to be backported if the previous commit ff64d3b027 ("MINOR:
threads: export the POSIX thread ID in panic dumps") is backported as
well.

MINOR: threads: export the POSIX thread ID in panic dumps

It is very difficult to map a panic dump against a gdb thread dump
because the thread numbers do not match. However gdb provides the
pthread ID but this one is supposed to be opaque and not to be cast
to a scalar.

This patch provides a fnuction, ha_get_pthread_id() which retrieves
the pthread ID of the indicated thread and casts it to an unsigned
long long so as to lose the least possible amount of information from
it. This is done cleanly using a union to maintain alignment so as
long as these IDs are stored on 1..8 bytes they will be properly
reported. This ID is now presented in the panic dumps so it now
becomes possible to map these threads. When threads are disabled,
zero is returned. For example, this is a panic dump:

  Thread 1 is about to kill the process.
  *>Thread 1 : id=0x7fe92b825180 act=0 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
               stuck=1 prof=0 harmless=0 wantrdv=0
               cpu_ns: poll=5119122 now=2009446995 diff=2004327873
               curr_task=0xc99bf0 (task) calls=4 last=0
                 fct=0x592440(task_run_applet) ctx=0xca9c50(<CLI>)
               strm=0xc996a0 src=unix fe=GLOBAL be=GLOBAL dst=<CLI>
               rqf=848202 rqa=0 rpf=80048202 rpa=0 sif=EST,200008 sib=EST,204018
               af=(nil),0 csf=0xc9ba40,8200
               ab=0xca9c50,4 csb=(nil),0
               cof=0xbf0e50,1300:PASS(0xc9cee0)/RAW((nil))/unix_stream(20)
               cob=(nil),0:NONE((nil))/NONE((nil))/NONE(0)
               call trace(20):
               |       0x59e4cf [48 83 c4 10 5b 5d 41 5c]: wdt_handler+0xff/0x10c
               | 0x7fe92c170690 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x13690
               | 0x7ffce29519d9 [48 c1 e2 20 48 09 d0 48]: linux-vdso:+0x9d9
               | 0x7ffce2951d54 [eb d9 f3 90 e9 1c ff ff]: linux-vdso:__vdso_gettimeofday+0x104/0x133
               |       0x57b484 [48 89 e6 48 8d 7c 24 10]: main+0x157114
               |       0x50ee6a [85 c0 75 76 48 8b 55 38]: main+0xeaafa
               |       0x50f69c [48 63 54 24 20 85 c0 0f]: main+0xeb32c
               |       0x59252c [48 c7 c6 d8 ff ff ff 44]: task_run_applet+0xec/0x88c
    Thread 2 : id=0x7fe92b6e6700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
               stuck=0 prof=0 harmless=1 wantrdv=0
               cpu_ns: poll=786738 now=1086955 diff=300217
               curr_task=0
    Thread 3 : id=0x7fe92aee5700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
               stuck=0 prof=0 harmless=1 wantrdv=0
               cpu_ns: poll=828056 now=1129738 diff=301682
               curr_task=0
    Thread 4 : id=0x7fe92a6e4700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
               stuck=0 prof=0 harmless=1 wantrdv=0
               cpu_ns: poll=818900 now=1153551 diff=334651
               curr_task=0

And this is the gdb output:

  (gdb) info thr
    Id   Target Id                         Frame
  * 1    Thread 0x7fe92b825180 (LWP 15234) 0x00007fe92ba81d6b in raise () from /lib64/libc.so.6
    2    Thread 0x7fe92b6e6700 (LWP 15235) 0x00007fe92bb56a56 in epoll_wait () from /lib64/libc.so.6
    3    Thread 0x7fe92a6e4700 (LWP 15237) 0x00007fe92bb56a56 in epoll_wait () from /lib64/libc.so.6
    4    Thread 0x7fe92aee5700 (LWP 15236) 0x00007fe92bb56a56 in epoll_wait () from /lib64/libc.so.6

We can clearly see that while threads 1 and 2 are the same, gdb's
threads 3 and 4 respectively are haproxy's threads 4 and 3.

This may be backported to 2.0 as it removes some confusion in github issues.

BUG/MEDIUM: listener: mark the thread as not stuck inside the loop

We tried hard to make sure we report threads as not stuck at various
crucial places, but one of them is special, it's the listener_accept()
function. The reason it is special is because it will loop a certain
number of times (default: 64) accepting incoming connections, allocating
resources, dispatching them to other threads or running L4 rules on them,
and while all of this is supposed to be extremely fast, when the machine
slows down or runs low on memory, the expectedly small delays in malloc()
caused by contention with other threads can quickly accumulate and suddenly
become critical to the point of triggering the watchdog. Furthermore, it
is technically possible to trigger this by pure configuration by setting
a huge tune.maxaccept value, which should not be possible.

Given that each operation isn't related to the same task but to a different
one each time, it is appropriate to mark the thread as not stuck each time
it accepts new work that possibly gets dispatched to other threads which
execute it.

This looks like this could be a good reason for the issue reported in
issue #388.

This fix must be backported to 2.0.

CLEANUP: ssl: silence a build warning when threads are disabled

Building without threads now shows this warning:

src/ssl_sock.c: In function 'cli_io_handler_commit_cert':
src/ssl_sock.c:12121:24: warning: unused variable 'bind_conf' [-Wunused-variable]
struct bind_conf *bind_conf = ckchi->bind_conf;
^~~~~~~~~

This is because the variable is needed only to unlock the structure, and
the unlock operation does nothing in this case. Let's mark the variable
__maybe_unused for this, but it would be convenient in the long term if
we could make the thread macros pretend they consume the argument so that
this remains less visible outside.

No backport is needed.

REGTEST: ssl: improve the "set ssl cert" test

Improve the test by removing the curl command and using the same proxy
chaining technique as in commit 3ed722f ("REGTEST: ssl: remove curl from
the "add ssl crt-list" test").

A 3rd request was added which must fail, to ensure that the SNI was
effectively removed from HAProxy.

This patch also adds timeouts in the default section, logs on stderr and
fix some indentation issues.

REGTEST: ssl: remove curl from the "add ssl crt-list" test

Using curl for SSL tests can be a problem if it wasn't compiled with the
right SSL library and if it didn't share any cipher with HAProxy. To
have more robust tests we now use HAProxy as an SSL client, so we are
sure that the client and the server share the same SSL requirements.

This patch also adds timeouts in the default section, logs on stderr and
fix some indentation issues.

REGTEST: http-rules: Require PCRE or PCRE2 option to run map_redirect script

Only PCRE was specified as required option to execute this script. But PCRE2 is
an valid alternative.

DOC: Add more info about request formatting in http-check send description

Only one Host header can be defined and some headers are automatically skipped
(Connection, Content-Length and Transfer-Encoding). In addition, a note about
the synchronisation of the Host header value and the request uri has been added.

DOC: Fix send rules in the http-check connect example

Method, uri and version arguments must be explicitly named.

CLEANUP: checks: Fix checks includes

MINOR: checks: Keep the Host header and the request uri synchronized

Because in HTTP, the host header and the request authority, if any, must be
identical, we keep both synchornized. It means the right flags are set on the
HTX statrt-line calling http_update_host(). There is no header when it happens,
but it is not an issue. Then, if a Host header is inserted,
http_update_authority() is called.

Note that for now, the host header is not automatically added when required.

MINOR: checks: Skip some headers for http-check send rules

Connection, content-length and transfer-encoding headers are ignored for
http-check send rules. For now, the keep-alive is not supported and the
"connection: close" header is always added to the request. And the
content-length header is automatically added.

MINOR: checks: Don't support multiple host header for http-check send rule

Only one host header definition is supported. There is no reason to define it
several times.

MINOR: http-htx: Export functions to update message authority and host

These functions will be used by HTTP health checks when a request is formatted
before sending it.

BUG/MEDIUM: sample: make the CPU and latency sample fetches check for a stream

cpu_calls, cpu_ns_avg, cpu_ns_tot, lat_ns_avg and lat_ns_tot depend on the
stream to find the current task and must check for it or they may cause a
crash if misused or used in a log-format string after commit 5f940703b3
("MINOR: log: Don't depends on a stream to process samples in log-format
string").

This must be backported as far as 1.9.

CLEANUP: http: add a few comments on certain functions' assumptions about streams

get_http_auth() expects a valid stream but this is not mentioned, though
fortunately it's always called from places which already check this.

smp_prefetch_htx() performs all the required checks and is the key to the
stability of almost all sample fetch functions, so let's make this clearer.

BUG/MEDIUM: http: the "unique-id" sample fetch could crash without a steeam

Since commit 5f940703b3 ("MINOR: log: Don't depends on a stream to process
samples in log-format string") it has become quite obvious that a few sample
fetch functions and converters were still heavily dependent on the presence
of a stream without testing for it.

The unique-id sample fetch function, if called without a stream, will result
in a crash.

This fix adds a check for the stream's existence, and should be backported
to all stable versions up to 1.7.

BUG/MEDIUM: http: the "http_first_req" sample fetch could crash without a steeam

Since commit 5f940703b3 ("MINOR: log: Don't depends on a stream to process
samples in log-format string") it has become quite obvious that a few sample
fetch functions and converters were still heavily dependent on the presence
of a stream without testing for it.

The http_first_req sample fetch function, if called without a stream, will
result in a crash.

This fix adds a check for the stream's existence, and should be backported
to all stable versions up to 1.6.

BUG/MEDIUM: capture: capture.{req,res}.* crash without a stream

Since commit 5f940703b3 ("MINOR: log: Don't depends on a stream to process
samples in log-format string") it has become quite obvious that a few sample
fetch functions and converters were still heavily dependent on the presence
of a stream without testing for it.

The capture.req.hdr, capture.res.hdr, capture.req.method, capture.req.uri,
capture.req.ver and capture.res.ver sample fetches used to assume the
presence of a stream, which is not necessarily the case (especially after
the commit above) and would crash haproxy if incorrectly used. Let's make
sure they check for this stream.

This fix adds a check for the stream's existence, and should be backported
to all stable versions up to 1.6.

BUG/MEDIUM: capture: capture-req/capture-res converters crash without a stream

Since commit 5f940703b3 ("MINOR: log: Don't depends on a stream to process
samples in log-format string") it has become quite obvious that a few sample
fetch functions and converters were still heavily dependent on the presence
of a stream without testing for it.

The capture-req and capture-res converters were in this case and could
crash the process if misused.

This fix adds a check for the stream's existence, and should be backported
to all stable versions up to 1.6.

DOC: give a more accurate description of what check does

The documentation for check implies that without an application
level check configured, it only enables simple tcp checks. What it
actually does is verify that the configured transport layer is available,
and that optional application level checks succeed.

REGTEST: ssl: test the client certificate authentication

This reg-test tests the client auth feature of HAProxy for both the
backend and frontend section with a CRL list.

This reg-test uses 2 chained listeners because vtest does not handle the
SSL. Test the frontend client auth and the backend side at the same
time.

It sends 3 requests: one with a correct certificate, one with an expired
one and one which was revoked. The client then checks if we received the
right one with the right error.

Certificates, CA and CRL are expiring in 2050 so it should be fine for
the CI.

This test could be backported as far as HAProxy 1.6

BUG/MEDIUM: mux-h1: make sure we always have a timeout on front connections

Mux-h1 currently heavily relies on the presence of an upper stream, even
when waiting for a new request after one is being finished, and it's that
upper stream that's in charge of request and keep-alive timeouts for now.
But since recent commit 493d9dc6ba ("MEDIUM: mux-h1: do not blindly wake
up the tasklet at end of request anymore") that assumption was broken as
the purpose of this change was to avoid initiating processing of a request
when there's no data in the buffer. The side effect is that there's no more
timeout to handle the front connection, resulting in dead front connections
stacking up as clients get kicked off the net.

This fix makes sure we always enable the timeout when there's no stream
attached to the connection. It doesn't do this for back connections since
they may purposely be left idle.

No backport is needed as this bug was introduced in 2.2-dev4.

BUG/MINOR: checks: Set the output buffer length before calling parse_binary()

A bug was introduced in the commit 2edcd4cbd ("BUG/MINOR: checks: Avoid
incompatible cast when a binary string is parsed"). The length of the
destination buffer must be set before call the parse_binary() function.

No backport needed.

MINOR: log: Add "Tu" timer

It can be sometimes useful to measure total time of a request as seen
from an end user, including TCP/TLS negotiation, server response time
and transfer time. "Tt" currently provides something close to that, but
it also takes client idle time into account, which is problematic for
keep-alive requests as idle time can be very long. "Ta" is also not
sufficient as it hides TCP/TLS negotiationtime. To improve that, introduce
a "Tu" timer, without idle time and everything else. It roughly estimates
time spent time spent from user point of view (without DNS resolution
time), assuming network latency is the same in both directions.

BUG/MINOR: checks: Don't lose warning on proxy capability

When a tcp-check line is parsed, a warning may be reported if the keyword is
used for a frontend. The return value must be used to report it. But this info
is lost before the end of the function.

Partly fixes issue #600. No backport needed.

BUG/MINOR: checks: Remove bad call to free() when an expect rule is parsed

When an error is found during the parsing of an expect rule (tcp or http),
everything is released at the same place, at the end of the function.

Partly fixes issue #600. No backport needed.

BUG/MINOR: checks: Avoid incompatible cast when a binary string is parsed

parse_binary() function must be called with a pointer on an integer. So don't
pass a pointer on a size_t element, casting it to a pointer on a integer.

Partly fixes issue #600. No backport needed.

MINOR: checks: Make the use of the check's server more explicit on connect

The variable s, pointing on the check server, may be null when a connection is
openned. It happens for email alerts. To avoid ambiguities, its use is now more
explicit. Comments have been added at some places and tests on the variable have
been added elsewhere (useless but explicit).

Partly fixes issue #600.

CLEANUP: checks: Remove unused code when ldap server message is parsed

In tcpcheck_ldap_expect_bindrsp(), wait_more_data label cannot be reached.

Partly fixes issue #600.

BUG/MINOR: checks: Properly handle truncated mysql server messages

If a message is not fully received from a mysql server, depending on last_read
value, an error must be reported or we must wait for more data. The first if
statement must not check last_read.

Partly fixes issue #600. No backport needed.

BUG/MINOR: checks: Remove wrong variable redeclaration

When mysql-check option is parsed, the user variable is redeclared without any
reason. thus the redeclared variable is removed.

No backport needed.

MINOR: checks: Use ver keyword to specify the HTTP version for http checks

'ver' keyword is already used by sample fetches while 'vsn' is not used anywhere
else. So better to use 'ver' too for http-check send rules.

MINOR: checks: Support HTTP/2 version (without '.0') for http-check send rules

The version is partially parsed to set the flag HTX_SL_F_VER_11 on the HTX
message. But exactly 8 chars is expected. So if "HTTP/2" is specified, the flag
is not set. Thus, the version parsing has been updated to handle "HTTP/2" and
"HTTP/2.0" the same way.

CI: cirrus-ci: remove reg-tests/checks/tcp-check-ssl.vtc on CentOS 6

reg-tests/checks/tcp-check-ssl.vtc requires ALPN which is not
available on CentOS 6

BUG/MINOR: checks: Fix PostgreSQL regex on the authentication packet

For PostgreSQL health check, there is a regex on the backend authentication
packet. It must match to succeed. But it exists 6 types of authentication
packets and the regex only matches the first one (AuthenticationOK). This patch
fixes the regex to match all authentication packets.

No backport needed.

BUG/MEDIUM: checks: Destroy the conn-stream before the session

At the end of a tcp-check based health check, if there is still a connection
attached to the check, it must be closed. But it must be done before releasing
the session, because the session may still be referenced by the mux. For
instance, an h2 stream may still have a reference on the session.

No need to backport.

BUG/MEDIUM: sessions: Always pass the mux context as argument to destroy a mux

This bug was introduced by the commit 2444aa5b ("MEDIUM: sessions: Don't be
responsible for connections anymore."). In session_check_idle_conn(), when the
mux is destroyed, its context must be passed as argument instead of the
connection.

It is de 2.2-dev bug. No need to backport.

BUG/MINOR: checks/server: use_ssl member must be signed

BUG/MINOR: checks: Only use ssl_sock_is_ssl() if compiled with SSL support

ssl_sock_is_ssl() only exists if HAProxy is complied with SSL support.

No backport needed.

BUG/MEDIUM: checks: unsubscribe for events on the old conn-stream on connect

When a new connection is established, if an old connection is still attached to
the current check, it must be detroyed. When it happens, the old conn-stream
must be used to unsubscribe for events, not the new one.

No backport is needed.

BUG/MINOR: server: Fix server_finalize_init() to avoid unused variable

The variable 'ret' must only be declared When HAProxy is compiled with the SSL
support (more precisely SSL_CTRL_SET_TLSEXT_HOSTNAME must be defined).

No backport needed.

REGTEST: Add a script to validate agent checks

BUG/MEDIUM: checks: Unsubscribe to mux events when a conn-stream is destroyed

Since the tcp-check based heath checks uses the best multuplexer for a
connection, the mux-pt is no longer the only possible choice. So events
subscriptions and unsubscriptions must be done with the mux.

No backport needed.

MINOR: checks: Support list of status codes on http-check expect rules

It is now possible to match on a comma-separated list of status codes or range
of codes. In addtion, instead of a string comparison to match the response's
status code, a integer comparison is performed. Here is an example:

http-check expect status 200,201,300-310

BUG/MINOR: mux-fcgi: Be sure to have a connection as session's origin to use it

When default parameters are set in a request message, we get the client
connection using the session's origin. But it is not necessarily a
conncection. For instance, for health checks, thanks to recent changes, it may
be a check object. At this step, the client connection may be NULL. Thus, we
must be sure to have a client connection before using it.

This patch must be backported to 2.1.

MINOR: checks: Support mux protocol definition for tcp and http health checks

It is now possible to force the mux protocol for a tcp-check based health check
using the server keyword "check-proto". If set, this parameter overwrites the
server one.

In the same way, a "proto" parameter has been added for tcp-check and http-check
connect rules. If set, this mux protocol overwrites all others for the current
connection.

BUG/MEDIUM: checks: Use the mux protocol specified on the server line

First, when a server health check is initialized, it inherits the mux protocol
from the server if it is not already specified. Because there is no option to
specify the mux protocol for the checks, it is always inherited from the server
for now.

Then, if the connect rule is configured to use the server options, the mux
protocol of the check is used, if defined. Of course, if a mux protocol is
already defined for the connect rule, it is used in priority. But for now, it is
not possible.

Thus, if a server is configured to use, for instance, the h2 protocol, it is
possible to do the same for the health-checks.

No backport needed.

DOC: Fix the tcp-check and http-check directives layout

DOC: Add documentation about comments for tcp-check and http-check directives

The documentation about the comment argument for some tcp-check and http-check
directives was missing. As well as the description of "tcp-check comment" and
"http-check comment" directives.

Revert "MEDIUM: checks: capture groups in expect regexes"

This reverts commit 1979943c30ef285ed04f07ecf829514de971d9b2.

Captures in comment was only used when a tcp-check expect based on a negative
regex matching failed to eventually report what was captured while it was not
expected. It is a bit far-fetched to be useable IMHO. on-error and on-success
log-format strings are far more usable. For now there is few check sample
fetches (in fact only one...). But it could be really powerful to report info in
logs.

REGTEST: Add scripts to test based tcp-check health-checks

These scripts have been added to validate the health-checks based on tcp-check
rules (http, redis, MySQL...).

BUG/MINOR: checks: Send the right amount of outgoing data for HTTP checks

HTTP health-checks now use HTX multiplexers. So it is important to really send
the amount of outgoing data for such checks because the HTX buffers appears
always full.

No backport needed.

MINOR: checks: Use a tree instead of a list to store tcp-check rulesets

Since all tcp-check rulesets are globally stored, it is a problem to use
list. For configuration with many backends, the lookups in list may be costly
and slow downs HAProxy startup. To solve this problem, tcp-check rulesets are
now stored in a tree.

BUG/MEDIUM: checks: Be sure to subscribe for sends if outgoing data remains

When some data are scheduled to be sent, we must be sure to subscribe for sends
if nothing was sent. Because of a bug, when nothing was sent, connection errors
are checks. If no error is found, we exit, waiting for more data, without any
subcription on send events.

No need to backport.

MINOR: checks: Use ist API as far as possible

Instead of accessing directly to the ist fields, the ist API is used instead. To
get its length or its pointer, to release it or to duplicate it. It is more
readable this way.

MINOR: ist: Add a function to retrieve the ist pointer

There is already the istlen() function to get the ist length. Now, it is
possible to call istptr() to get the ist pointer.

REGTEST: Fix reg-tests about health-checks to adapt them to recent changes

CLEANUP: checks: Reorg checks.c file to be more readable

The patch is not obvious at the first glance. But it is just a reorg. Functions
have been grouped and ordered in a more logical way. Some structures and flags
are now private to the checks module (so moved from the .h to the .c file).

MINOR: checks: Remove unused code about pure TCP checks

Thanks to previous change, it is now possible to removed all code handling pure
tcp checks. Now every connection based health-checks are handled by the
tcpcheck_main() function. __event_srv_chk_w() and __event_srv_chk_r() have been
removed. And all connection establishment is handled at one place.

MEDIUM: checks: Implement default TCP check using tcp-check rules

Defaut health-checks, without any option, doing only a connection check, are now
based on tcp-checks. An implicit default tcp-check connect rule is used. A
shared tcp-check ruleset, name "*tcp-check" is created to support these checks.

MAJOR: checks: Use the best mux depending on the protocol for health checks

When a tcp-check connect rule is evaluated, the mux protocol corresponding to
the health-check is chosen. So for TCP based health-checks, the mux-pt is
used. For HTTP based health-checks, the mux-h1 is used. The connection is marked
as private to be sure to not ruse regular HTTP connection for
health-checks. Connections reuse will be evaluated later.

The functions evaluating HTTP send rules and expect rules have been updated to
be HTX compliant. The main change for users is that HTTP health-checks are now
stricter on the HTTP message format. While before, the HTTP formatting and
parsing were minimalist, now messages should be well formatted.

MINOR: connection: Add a function to install a mux for a health-check

This function is unused for now. But it will have be used to install a mux for
an outgoing connection openned in a health-check context. In this case, the
session's origin is the check itself, and it is used to know the mode, HTTP or
TCP, depending on the tcp-check type and not the proxy mode. The check is also
used to get the mux protocol if configured.

MINOR: checks: Add a mux proto to health-check and tcp-check connect rule

It is not set and not used for now, but it will be possible to force the mux
protocol thanks to this patch. A mux proto field is added to the checks and to
tcp-check connect rules.

MINOR: checks: Use the check as origin when a session is created

Before, the server was used as origin during session creation. It was only used
to get the check associated to the server when a variable is get or set in the
check scope or when a check sample fetch was called. So it seems easier to use
the check as origin of a session. It is also more logical becaues the session is
created by the health-check itself and not its server.

BUG/MINOR: obj_type: Handle stream object in obj_base_ptr() function

The stream object (OBJ_TYPE_STREAM) was missing in the switch statement of the
obj_base_ptr() function.

This patch must be backported as far as 2.0.

MINOR: checks/obj_type: Add a new object type for checks

An object type is now affected to the check structure.

MEDIUM: checks: Refactor how data are received in tcpcheck_main()

A dedicated function is now used to received data. fundamentally, it should do
the same operations than before. But the way data are received has been reworked
to be closer to the si_cs_recv() function.

MINOR: connection: Add macros to know if a conn or a cs uses an HTX mux

IS_HTX_CONN() and IS_HTX_CS may now be used to know if a connection or a
conn-stream use an HTX based multiplexer.

MINOR: checks: Make resume conditions more explicit in tcpcheck_main()

First tests before executing the loop on tcp-check rules in tcpcheck_main()
function have been slightly modified to be more explicit and easier to
understand.

MAJOR: checks: Implement HTTP check using tcp-check rules

HTTP health-checks are now internally based on tcp-checks. Of course all the
configuration parsing of the "http-check" keyword and the httpchk option has
been rewritten. But the main changes is that now, as for tcp-check ruleset, it
is possible to perform several send/expect sequences into the same
health-checks. Thus the connect rule is now also available from HTTP checks, jst
like set-var, unset-var and comment rules.

Because the request defined by the "option httpchk" line is used for the first
request only, it is now possible to set the method, the uri and the version on a
"http-check send" line.

MINOR: checks: Add a reverse non-comment rule iterator to get last rule

the get_last_tcpcheck_rule() function iters on a rule list in the reverse order
and returns the first non comment and non action-kw rule. If no such rule is
found, NULL is returned.

MINOR: standard: Add my_memspn and my_memcspn

Do the same than strsnp() and strcspn() but on a raw bytes buffer.

MINOR: checks: Introduce flags to configure in tcp-check expect rules

Instead of having 2 independent integers, used as boolean values, to know if the
expect rule is invered and to know if the matching regexp has captures, we know
use a 32-bits bitfield.

MINOR: checks: Use an indirect string to represent the expect matching string

Instead of having a string in the expect union with its length outside of the
union, directly in the expect structure, an indirect string is now used.

MEDIUM: checks: Use a shared ruleset to store tcp-check rules

All tcp-check rules are now stored in the globla shared list. The ones created
to parse a specific protocol, for instance redis, are already stored in this
list. Now pure tcp-check rules are also stored in it. The ruleset name is
created using the proxy name and its config file and line. tcp-check rules
declared in a defaults section are also stored this way using "defaults" as
proxy name.

For now, all tcp-check ruleset are stored in a list. But it could be a bit slow
to looks for a specific ruleset with a huge number of backends. So, it could be
a good idea to use a tree instead.

MINOR: proxy/checks: Register a keyword to parse external-check rules

The keyword 'external-check' is now parsed in a dedicated callback
function. Thus the code to parse these rules is now located in checks.c.

MINOR: proxy/checks: Move parsing of external-check option in checks.c

Parsing of the proxy directive "option external-check" have been moved in checks.c.

MINOR: proxy/checks: Register a keyword to parse http-check rules

The keyword 'http-check' is now parsed in a dedicated callback function. Thus
the code to parse these rules is now located in checks.c.

MINOR: proxy/checks: Move parsing of tcp-check option in checks.c

Parsing of the proxy directive "option tcp-check" have been moved in checks.c.

MINOR: proxy/checks: Move parsing of httpchk option in checks.c

Parsing of the proxy directive "option httpchk" have been moved in checks.c.

MINOR: checks: Improve log message of tcp-checks on success

MINOR: checks: Add an option to set success status of tcp-check expect rules

It is now possible to specified the healthcheck status to use on success of a
tcp-check rule, if it is the last evaluated rule. The option "ok-status"
supports "L4OK", "L6OK", "L7OK" and "L7OKC" status.

MINOR: Produce tcp-check info message for pure tcp-check rules only

This way, messages reported by protocol checks are closer that the old one.

REGTEST: Adapt regtests about checks to recent changes

MEDIUM: checks: Implement agent check using tcp-check rules

A shared tcp-check ruleset is now created to support agent checks. The following
sequence is used :

tcp-check send "%[var(check.agent_string)] log-format
tcp-check expect custom

The custom function to evaluate the expect rule does the same that it was done
to handle agent response when a custom check was used.

MINOR: server/checks: Move parsing of server check keywords in checks.c

Parsing of following keywords have been moved in checks.c file : addr, check,
check-send-proxy, check-via-socks4, no-check, no-check-send-proxy, rise, fall,
inter, fastinter, downinter and port.

MINOR: server/checks: Move parsing of agent keywords in checks.c

Parsing of following keywords have been moved in checks.c file: agent-addr,
agent-check, agent-inter, agent-port, agent-send and no-agent-check.

MEDIUM: checks: Implement SPOP check using tcp-check rules

A share tcp-check ruleset is now created to support SPOP checks. This way no
extra memory is used if several backends use a SPOP check.

The following sequence is used :

tcp-check send-binary SPOP_REQ
tcp-check expect custom min-recv 4

The spop request is the result of the function
spoe_prepare_healthcheck_request() and the expect rule relies on a custom
function calling spoe_handle_healthcheck_response().

MEDIUM: checks: Implement LDAP check using tcp-check rules

A shared tcp-check ruleset is now created to support LDAP check. This way no
extra memory is used if several backends use a LDAP check.

The following sequance is used :

    tcp-check send-binary "300C020101600702010304008000"

    tcp-check expect rbinary "^30" min-recv 14 \
        on-error "Not LDAPv3 protocol"

    tcp-check expect custom

The last expect rule relies on a custom function to check the LDAP server reply.

MEDIUM: checks: Implement MySQL check using tcp-check rules

A share tcp-check ruleset is now created to support MySQL checks. This way no
extra memory is used if several backends use a MySQL check.

One for the following sequence is used :

    ## If no extra params are set
    tcp-check connect default linger
    tcp-check expect custom  ## will test the initial handshake

    ## If the username is defined
    tcp-check connect default linger
    tcp-check send-binary MYSQL_REQ log-format
    tcp-check expect custom  ## will test the initial handshake
    tcp-check expect custom  ## will test the reply to the client message

The log-format hexa string MYSQL_REQ depends on 2 preset variables, the packet
header containing the packet length and the sequence ID (check.header) and the
username (check.username). If is also different if the "post-41" option is set
or not. Expect rules relies on custom functions to check MySQL server packets.

MEDIUM: checks: Implement postgres check using tcp-check rules

A shared tcp-check ruleset is now created to support postgres check. This way no
extra memory is used if several backends use a pgsql check.

The following sequence is used :

    tcp-check connect default linger

    tcp-check send-binary PGSQL_REQ log-format

    tcp-check expect !rstring "^E" min-recv 5 \
        error-status "L7RSP" on-error "%[check.payload(6,0)]"

    tcp-check expect rbinary "^520000000800000000 min-recv "9" \
        error-status "L7STS" \
        on-success "PostgreSQL server is ok" \
        on-error "PostgreSQL unknown error"

The log-format hexa string PGSQL_REQ depends on 2 preset variables, the packet
length (check.plen) and the username (check.username).

MEDIUM: checks: Implement smtp check using tcp-check rules

A share tcp-check ruleset is now created to support smtp checks. This way no
extra memory is used if several backends use a smtp check.

The following sequence is used :

    tcp-check connect default linger

    tcp-check expect rstring "^[0-9]{3}[ \r]" min-recv 4 \
        error-status "L7RSP" on-error "%[check.payload(),cut_crlf]"

    tcp-check expect rstring "^2[0-9]{2}[ \r]" min-recv 4 \
        error-status "L7STS" \
        on-error %[check.payload(4,0),ltrim(' '),cut_crlf] \
        status-code "check.payload(0,3)"

    tcp-echeck send "%[var(check.smtp_cmd)]\r\n" log-format

    tcp-check expect rstring "^2[0-9]{2}[- \r]" min-recv 4 \
        error-status "L7STS" \
        on-error %[check.payload(4,0),ltrim(' '),cut_crlf] \
        on-success "%[check.payload(4,0),ltrim(' '),cut_crlf]" \
        status-code "check.payload(0,3)"

The variable check.smtp_cmd is by default the string "HELO localhost" by may be
customized setting <helo> and <domain> parameters on the option smtpchk
line. Note there is a difference with the old smtp check. The server gretting
message is checked before send the HELO/EHLO comand.