git.ipfire.org Git - thirdparty/haproxy.git/log

BUG/MINOR: filters: Don't force the stream's wakeup when we wait in flt_end_analyze

In flt_end_analyze, we wait that the anlayze is finished for both the request
and the response. In this case, because of a task_wakeup, some streams can
consume too much CPU to do nothing. So now, this is the filter's responsibility
to know if this wakeup is needed.

This fix should be backported in 1.7.

BUG/MEDIUM: http: Fix blocked HTTP/1.0 responses when compression is enabled

When the compression filter is enabled, if a HTTP/1.0 response is received (no
content-length and no transfer-encoding), no data are forwarded to the client
because of a bug and the transaction is blocked indefinitly.

The bug comes from the fact we need to synchronize the end of the request and
the response because of the compression filter. This inhibits the infinite
forwarding of data. But for these responses, the compression is not
activated. So the response body is not analyzed. This leads to a deadlock.

The solution is to enable the analyze of the response body in all cases and
handle this one to enable the infinite forwarding. All other cases should
already by handled.

This fix should be backported to 1.7.

MINOR: http: Add debug messages when HTTP body analyzers are called

Only DPRINTF() for developers.

MINOR: http: remove useless check on HTTP_MSGF_XFER_LEN for the request

The flag HTTP_MSGF_XFER_LEN is always set for an HTTP request because we always
now the body length. So there is no need to do check on it.

CLEANUP: buffers: Remove buffer_contig_area and buffer_work_area functions

Not used anymore since last commit.

CLEANUP: buffers: Remove buffer_bounce_realign function

Not used anymore since last commit.

CLEANUP: http: Remove channel_congested function

Not used anymore since last commit.

BUG/MINOR: http: Fix conditions to clean up a txn and to handle the next request

To finish a HTTP transaction and to start the new one, we check, among other
things, that there is enough space in the reponse buffer to eventually inject a
message during the parsing of the next request. Because these messages can reach
the maximum buffers size, it is mandatory to have an empty response
buffer. Remaining input data are trimmed during the txn cleanup (in
http_reset_txn), so we just need to check that the output data were flushed.

The current implementation depends on channel_congested, which does check the
reserved area is available. That's not of course good enough. There are other
tests on the reponse buffer is http_wait_for_request. But conditions to move on
are almost the same. So, we can imagine some scenarii where some output data
remaining in the reponse buffer during the request parsing prevent any messages
injection.

To fix this bug, we just wait that output data were flushed before cleaning up
the HTTP txn (ie. s->res.buf->o == 0). In addition, in http_reset_txn we realign
the response buffer (note the buffer is empty at this step).

Thanks to this changes, there is no more need to set CF_EXPECT_MORE on the
response channel in http_end_txn_clean_session. And more important, there is no
more need to check the response buffer state in http_wait_for_request. This
remove a workaround on response analysers to handle HTTP pipelining.

This patch can be backported in 1.7, 1.6 and 1.5.

BUG/MEDIUM: buffers: Fix how input/output data are injected into buffers

The function buffer_contig_space is buggy and could lead to pernicious bugs
(never hitted until now, AFAIK). This function should return the number of bytes
that can be written into the buffer at once (without wrapping).

First, this function is used to inject input data (bi_putblk) and to inject
output data (bo_putblk and bo_inject). But there is no context. So it cannot
decide where contiguous space should placed. For input data, it should be after
bi_end(buf) (ie, buf->p + buf->i modulo wrapping calculation). For output data,
it should be after bo_end(buf) (ie, buf->p) and input data are assumed to not
exist (else there is no space at all).

Then, considering we need to inject input data, this function does not always
returns the right value. And when we need to inject output data, we must be sure
to have no input data at all (buf->i == 0), else the result can also be wrong
(but this is the caller responsibility, so everything should be fine here).

The buffer can be in 3 different states:

1) no wrapping

              <---- o ----><----- i ----->
+------------+------------+-------------+------------+
|            |oooooooooooo|iiiiiiiiiiiii|xxxxxxxxxxxx|
+------------+------------+-------------+------------+
                           ^             <contig_space>
                           p             ^            ^
                 l            r

2) input wrapping

...--->            <---- o ----><-------- i -------...
+-----+------------+------------+--------------------+
|iiiii|xxxxxxxxxxxx|oooooooooooo|iiiiiiiiiiiiiiiiiiii|
+-----+------------+------------+--------------------+
       <contig_space>            ^
       ^            ^            p
       l            r

3) output wrapping

...------ o ------><----- i ----->            <----...
+------------------+-------------+------------+------+
|oooooooooooooooooo|iiiiiiiiiiiii|xxxxxxxxxxxx|oooooo|
+------------------+-------------+------------+------+
                    ^             <contig_space>
                    p             ^            ^
                  l            r

buffer_contig_space returns (l - r). The cases 1 and 3 are correctly
handled. But for the second case, r is wrong. It points on the buffer's end
(buf->data + buf->size). It should be bo_end(buf) (ie, buf->p - buf->o).

To fix the bug, the function has been splitted. Now, bi_contig_space and
bo_contig_space should be used to know the contiguous space available to insert,
respectively, input data and output data. For bo_contig_space, input data are
assumed to not exist. And the right version is used, depending what we want to
do.

In addition, to clarify the buffer's API, buffer_realign does not return value
anymore. So it has the same API than buffer_slow_realign.

This patch can be backported in 1.7, 1.6 and 1.5.

BUG/MEDIUM: peers: fix buffer overflow control in intdecode.

A buffer overflow could happen if an integer is badly encoded in
the data part of a msg received from a peer. It should not happen
with authenticated peers (the handshake do not use this function).

This patch makes the code of the 'intdecode' function more robust.

It also adds some comments about the intencode function.

This bug affects versions >= 1.6.

BUG/MEDIUM: server: Wrong server default CRT filenames initialization.

This patch fixes a bug which came with 5e57643 commit where server
default CRT filenames were initialized to the same value as server
default CRL filenames.

CLEANUP: time: curr_sec_ms doesn't need to be exported

It's not used anywhere outside of tv_update_date().

BUILD: scripts: fix typo in announce-release error message

It used to say the tag already existed instead of the opposite.

BUILD: make the release script use shortlog for the final changelog

It used to reuse the same command producting the list for the changelog,
requiring to run shortlog manually.

BUG/MEDIUM: tcp: don't require privileges to bind to device

Ankit Malp reported a bug that we've had since binding to devices was
implemented. Haproxy wrongly checks that the process stays privileged
after startup when a binding to a device is specified via the bind
keyword "interface". This is wrong, because after startup we're not
binding any socket anymore, and during startup if there's a permission
issue it will be immediately reported ("permission denied"). More
importantly there's no way around it as the process exits on startup
when facing such an option.

This fix should be backported to 1.7, 1.6 and 1.5.

MINOR: doc: fix use-server example (imap vs mail)

Another minor doc issue in the use-server example, use-server refers
to server "imap", but the server below is actually called "mail".

Renames the server from "mail" to "imap".

DOC: server: Add docs for "server" and "default-server" new "no-*" and other settings.

New boolean settings have been added to disable others. Most of them have "no-" as prefix.

"enabled" disables "disabled" setting,
"no-agent-check" disables "agent-check",
"no-backup" disables "backup",
"no-check" disables "check",
"no-check-ssl" disables "check-ssl",
"no-force-sslv3" disables "force-sslv3",
"no-force-tlsv10" disables "force-tlsv10",
"no-force-tlsv11" disables "force-tlsv11",
"no-force-tlsv12" disables "force-tlsv12,
"no-send-proxy" disables "send-proxy",
"no-send-proxy-v2" disables "send-proxy-v2",
"no-send-proxy-v2-ssl" disables "send-proxy-v2-ssl",
"no-send-proxy-v2-ssl-cn" disables "send-proxy-v2-ssl-cn",
"no-ssl" disables "ssl",
"no-verifyhost" disables "verifyhost",
"sslv3" disables "no-sslv3",
"ssl-reuse" disables "no-ssl-reuse",
"stick" disables "non-stick",
"tlsv10" disables "no-tlsv10",
"tlsv11" disables "no-tlsv11",
"tlsv12" disables "no-tlsv12",
"tls-tickets" disables "no-tls-tickets".

Settings with arguments are now supported on "default-server" lines:

"addr", "ca-file", "ciphers", "crl-file", "crt", "cookie", "namespace", "observe",
"redir", "sni", "source", "tcp-ut" and "track".

From now on, all server "settings" including the new ones above are supported by
"default-server" except "id" which is only supported on "server" lines.

MINOR: server: Add 'no-agent-check' server keyword.

This patch adds 'no-agent-check' setting supported both by 'default-server'
and 'server' directives to disable an agent check for a specific server which would
have 'agent-check' set as default value (inherited from 'default-server'
'agent-check' setting), or, on 'default-server' lines, to disable 'agent-check' setting
as default value for any further 'server' declarations.

For instance, provided this configuration:

    default-server agent-check
    server srv1
    server srv2 no-agent-check
    server srv3
    default-server no-agent-check
    server srv4

srv1 and srv3 would have an agent check enabled contrary to srv2 and srv4.

We do not allocate anymore anything when parsing 'default-server' 'agent-check'
setting.

MINOR: server: Make 'default-server' support 'disabled' keyword.

Before this patch, only 'server' directives could support 'disabled' setting.
This patch makes also 'default-server' directives support this setting.
It is used to disable a list of servers declared after a 'defaut-server' directive.
'enabled' new keyword has been added, both supported as 'default-server' and
'server' setting, to enable again a list of servers (so, declared after a
'default-server enabled' directive) or to explicitly enable a specific server declared
after a 'default-server disabled' directive.

For instance provided this configuration:

    default-server disabled
    server srv1...
    server srv2...
    server srv3... enabled
    server srv4... enabled

srv1 and srv2 are disabled and srv3 and srv4 enabled.

This is equivalent to this configuration:

    default-server disabled
    server srv1...
    server srv2...
    default-server enabled
    server srv3...
    server srv4...

even if it would have been preferable/shorter to declare:

    server srv3...
    server srv4...
    default-server disabled
    server srv1...
    server srv2...

as 'enabled' is the default server state.

MINOR: server: Make 'default-server' support 'addr' keyword.

This patch makes 'default-server' support 'addr' setting.
The code which was responsible of parsing 'server' 'addr' setting
has moved from parse_server() to implement a new parser
callable both as 'default-server' and 'server' 'addr' setting parser.

Should not break anything.

MINOR: server: Make 'default-server' support 'sni' keyword.

This patch makes 'default-server' directives support 'sni' settings.
A field 'sni_expr' has been added to 'struct server' to temporary
stores SNI expressions as strings during both 'default-server' and 'server'
lines parsing. So, to duplicate SNI expressions from 'default-server' 'sni' setting
for new 'server' instances we only have to "strdup" these strings as this is
often done for most of the 'server' settings.
Then, sample expressions are computed calling sample_parse_expr() (only for 'server'
instances).
A new function has been added to produce the same error output as before in case
of any error during 'sni' settings parsing (display_parser_err()).
Should not break anything.

MINOR: server: Make 'default-server' support 'source' keyword.

Before this patch, only 'server' directives could support 'source' setting.
This patch makes also 'default-server' directives support this setting.

To do so, we had to extract the code responsible of parsing 'source' setting
arguments from parse_server() function and make it callable both
as 'default-server' and 'server' 'source' setting parser. So, the code is mostly
the same as before except that before allocating anything for 'struct conn_src'
members, we must free the memory previously allocated.

Should not break anything.

MINOR: server: Make 'default-server' support 'namespace' keyword.

Before this patch, 'namespace' setting was only supported by 'server' directive.
This patch makes 'default-server' directive support this setting.

MINOR: server: Make 'default-server' support 'tcp-ut' keyword.

This patch makes 'default-server' directive support 'tcp-ut' keyword.

MINOR: server: Make 'default-server' support 'ciphers' keyword.

This patch makes 'default-server' directive support 'ciphers' setting.

MINOR: server: Make 'default-server' support 'cookie' keyword.

Before this patch, 'cookie' setting was only supported by 'server' directives.
This patch makes 'default-server' directive also support 'cookie' setting.
Should not break anything.

MINOR: server: Make 'default-server' support 'observe' keyword.

Before this path, 'observe' setting was only supported by 'server' directives.
This patch makes 'default-server' directives also support 'observe' setting.
Should not break anything.

MINOR: server: Make 'default-server' support 'redir' keyword.

Before this patch only 'server' directives could support 'redir' setting.
This patch makes also 'default-server' directives support 'redir' setting.
Should not break anything.

MINOR: server: Make 'default-server' support 'ca-file', 'crl-file' and 'crt' settings.

This patch makes 'default-server' directives support 'ca-file', 'crl-file' and
'crt' settings.

MINOR: server: Make 'default-server' support 'track' setting.

Before this patch only 'server' directives could support 'track' setting.
This patch makes 'default-server' directives also support this setting.
Should not break anything.

MINOR: server: Make 'default-server' support 'check' keyword.

Before this patch 'check' setting was only supported by 'server' directives.
This patch makes also 'default-server' directives support this setting.
A new 'no-check' keyword parser has been implemented to disable this setting both
in 'default-server' and 'server' directives.
Should not break anything.

MINOR: server: Make 'default-server' support 'verifyhost' setting.

This patch makes 'default-server' directive support 'verifyhost' setting.
Note: there was a little memory leak when several 'verifyhost' arguments were
supplied on the same 'server' line.

MINOR: server: Make 'default-server' support 'verify' keyword.

This patch makes 'default-server' directive support 'verify' keyword.

CLEANUP: server: code alignement.

Code alignement again.

MINOR: server: Make 'default-server' support 'send-proxy-v2-ssl*' keywords.

This patch makes 'default-server' directive support 'send-proxy-v2-ssl'
(resp. 'send-proxy-v2-ssl-cn') setting.
A new keyword 'no-send-proxy-v2-ssl' (resp. 'no-send-proxy-v2-ssl-cn') has been
added to disable 'send-proxy-v2-ssl' (resp. 'send-proxy-v2-ssl-cn') setting both
in 'server' and 'default-server' directives.

MINOR: server: Make 'default-server' support 'ssl' keyword.

This patch makes 'default-server' directive support 'ssl' setting.
A new keyword 'no-ssl' has been added to disable this setting both
in 'server' and 'default-server' directives.

MINOR: server: Make 'default-server' support 'no-ssl*' and 'no-tlsv*' keywords.

This patch makes 'default-server' directive support 'no-sslv3' (resp. 'no-ssl-reuse',
'no-tlsv10', 'no-tlsv11', 'no-tlsv12', and 'no-tls-tickets') setting.
New keywords 'sslv3' (resp. 'ssl-reuse', 'tlsv10', 'tlsv11', 'tlsv12', and
'tls-no-tickets') have been added to disable these settings both in 'server' and
'default-server' directives.

CLEANUP: server: code alignement.

Code alignement.

MINOR: server: Make 'default-server' support 'force-sslv3' and 'force-tlsv1[0-2]' keywords.

This patch makes 'default-server' directive support 'force-sslv3'
and 'force-tlsv1[0-2]' settings.
New keywords 'no-force-sslv3' (resp. 'no-tlsv1[0-2]') have been added
to disable 'force-sslv3' (resp. 'force-tlsv1[0-2]') setting both in 'server' and
'default-server' directives.

MINOR: server: Make 'default-server' support 'check-ssl' keyword.

This patch makes 'default-server' directive support 'check-ssl' setting
to enable SSL for health checks.
A new keyword 'no-check-ssl' has been added to disable this setting both in
'server' and 'default-server' directives.

MINOR: server: Make 'default-server' support 'send-proxy' and 'send-proxy-v2 keywords.

This patch makes 'default-server' directive support 'send-proxy'
(resp. 'send-proxy-v2') setting.
A new keyword 'no-send-proxy' (resp. 'no-send-proxy-v2') has been added
to disable 'send-proxy' (resp. 'send-proxy-v2') setting both in 'server' and
'default-server' directives.

MINOR: server: Make 'default-server' support 'non-stick' keyword.

This patch makes 'default-server' directive support 'non-stick' setting.
A new keyword 'stick' has been added so that to disable
'non-stick' setting both in 'server' and 'default-server' directives.

CLEANUP: server: code alignement.

Code alignement.

MINOR: server: Make 'default-server' support 'check-send-proxy' keyword.

This patch makes 'default-server' directive support 'check-send-proxy' setting.
A new keyword 'no-check-send-proxy' has been added so that to disable
'check-send-proxy' setting both in 'server' and 'default-server' directives.

MINOR: server: Make 'default-server' support 'backup' keyword.

At this time, only 'server' supported 'backup' keyword.
This patch makes also 'default-server' directive support this keyword.
A new keyword 'no-backup' has been added so that to disable 'backup' setting
both in 'server' and 'default-server' directives.

For instance, provided the following sequence of directives:

default-server backup
server srv1
server srv2 no-backup

default-server no-backup
server srv3
server srv4 backup

srv1 and srv4 are declared as backup servers,
srv2 and srv3 are declared as non-backup servers.

MINOR: server: irrelevant error message with 'default-server' config file keyword.

There is no reason to emit such an error message:
"'default-server' expects <name> and <addr>[:<port>] as arguments."
if less than two arguments are provided on 'default-server' lines.
This is a 'server' specific error message.

BUG/MINOR: cfgparse: loop in tracked servers lists not detected by check_config_validity().

There is a silly case where a loop is not detected in tracked servers lists:
when a server tracks itself.

Ex:
   server srv1 127.0.0.1:8000 track srv1

Well, this never happens and this does not prevent haproxy from working.

But with this next following configuration:

   server srv1 127.0.0.1:8000 track srv2
   server srv2 127.0.0.1:8000 track srv2
   server srv3 127.0.0.1:8000 track srv2

the code in charge of detecting such loops never returns (without any error message).
haproxy becomes stuck in an infinite loop because of this statement found
in check_config_validity():

for (loop = srv->track; loop && loop != newsrv; loop = loop->track);

Again, such a configuration is never accidentally used I guess.
This latter example seems silly, but as several 'default-server' directives may be used
in the same proxy section, and as 'default-server' settings are not resetted each a
new 'default-server' line is created, it will match the following configuration, in the future,
when 'track' setting will be supported by 'default-server':

   default-server track srv3
   server srv1 127.0.0.1:8000
   server srv2 127.0.0.1:8000
   .
   .
   .
   default-server check
   server srv3 127.0.0.1:8000
(cherry picked from commit 6528fc93d3c065fdac841f24e55cfe9674a67414)

MINOR: dns: improve DNS response parsing to use as many available records as possible

A "weakness" exist in the first implementation of the parsing of the DNS
responses: HAProxy always choses the first IP available matching family
preference, or as a failover, the first IP.

It should be good enough, since most DNS servers do round robin on the
records they send back to clients.
That said, some servers does not do proper round robin, or we may be
unlucky too and deliver the same IP to all the servers sharing the same
hostname.

Let's take the simple configuration below:

  backend bk
    srv s1 www:80 check resolvers R
    srv s2 www:80 check resolvers R

The DNS server configured with 2 IPs for 'www'.
If you're unlucky, then HAProxy may apply the same IP to both servers.

Current patch improves this situation by weighting the decision
algorithm to ensure we'll prefer use first an IP found in the response
which is not already affected to any server.

The new algorithm does not guarantee that the chosen IP is healthy,
neither a fair distribution of IPs amongst the servers in the farm,
etc...
It only guarantees that if the DNS server returns many records for a
hostname and that this hostname is being resolved by multiple servers in
the same backend, then we'll use as many records as possible.
If a server fails, HAProxy won't pick up an other record from the
response.

MEDIUM: global: add a 'hard-stop-after' option to cap the soft-stop time

When SIGUSR1 is received, haproxy enters in soft-stop and quits when no
connection remains.
It can happen that the instance remains alive for a long time, depending
on timeouts and traffic. This option ensures that soft-stop won't run
for too long.

Example:
  global
    hard-stop-after 30s  # Once in soft-stop, the instance will remain
                         # alive for at most 30 seconds.

DOC: Protocol doc: add noop TLV

Add definition of the PP2_TYPE_NOOP TLV which can be used for data
padding and alignment.

DOC: Protocol doc: add SSL TLVs, rename CHECKSUM

Add SSL-related TLV types PP2_SUBTYPE_SSL_CIPHER,
PP2_SUBTYPE_SSL_SIG_ALG and PP2_SUBTYPE_SSL_KEY_ALG. Rename
PP2_TYPE_CHECKSUM to PP2_TYPE_CRC32C to make it easier to add checksums
using other algorithms. Clarified encoding of the string fields.
Renamed ASCII to US-ASCII as recommended by
https://www.iana.org/assignments/character-sets/character-sets.xhtml.

DOC: Protocol doc: add checksum, TLV type ranges

Add the CRC32c checksum TLV PP2_TYPE_CHECKSUM. Reserve TLV type ranges
CUSTOM, EXPERIMENT and FUTURE. Clarify that only UNSPEC protocol byte is
mandatory to implement on the receiver.

DOC/MINOR: Fix typos in proxy protocol doc

MEDIUM: kqueue: only set FD_POLL_IN when there are pending data

Let's avoid setting FD_POLL_IN when there's no pending data. It will
save a useless recv() syscall on pure closes.

MEDIUM: kqueue: take care of EV_EOF to improve polling status accuracy

kevent() always sets EV_EOF with EVFILT_READ to notify of a read shutdown
and EV_EOF with EVFILT_WRITE to notify of a write error. Let's check this
flag to properly update the FD's polled status (FD_POLL_HUP and FD_POLL_ERR
respectively).

It's worth noting that this one can be coupled with a regular read event
to notify about a pending read followed by a shutdown, but for now we only
use this to set the relevant flags (HUP and ERR).

The poller now exhibits the flag HAP_POLL_F_RDHUP to indicate this new
capability.

An improvement may consist in not setting FD_POLL_IN when the "data"
field is null since it normally only reflects the amount of pending
data.

MINOR: kqueue: exclusively rely on the kqueue returned status

After commit e852545 ("MEDIUM: polling: centralize polled events processing")
all pollers stopped to explicitly check the FD's polled status, except the
kqueue poller which is constructed a bit differently. It doesn't seem possible
to cause any issue such as missing an event however, but anyway it's better to
definitely get rid of this since the event filter already provides the event
direction.

OPTIM: poll: enable support for POLLRDHUP

On Linux since 2.6.17 poll() supports POLLRDHUP to notify of an upcoming
hangup after pending data. Making use of it allows us to avoid a useless
recv() after short responses on keep-alive connections. Note that we
automatically enable the feature once this flag has been met first in a
poll() status. Till now it was only enabled on epoll.

BUG/MINOR: raw_sock: always perfom the last recv if RDHUP is not available

Curu Wong reported a case where haproxy used to send RST to a server
after receiving its FIN. The problem is caused by the fact that being
a server connection, its fd is marked with linger_risk=1, and that the
poller didn't report POLLRDHUP, making haproxy unaware of a pending
shutdown that came after the data, so it used to resort to nolinger
for closing.

However when pollers support RDHUP we're pretty certain whether or not
a shutdown comes after the data and we don't need to perform that extra
recv() call.

Similarly when we're dealing with an inbound connection we don't care
and don't want to perform this extra recv after a request for a very
unlikely case, as in any case we'll have to deal with the client-facing
TIME_WAIT socket.

So this patch ensures that only when it's known that there's no risk with
lingering data, as well as in cases where it's known that the poller would
have detected a pending RDHUP, we perform the fd_done_recv() otherwise we
continue, trying a subsequent recv() to try to detect a pending shutdown.

This effectively results in an extra recv() call for keep-alive sockets
connected to a server when POLLRDHUP isn't known to be supported, but
it's the only way to know whether they're still alive or closed.

This fix should be backported to 1.7, 1.6 and 1.5. It relies on the
previous patch bringing support for the HAP_POLL_F_RDHUP flag.

MINOR: fd: add a new flag HAP_POLL_F_RDHUP to struct poller

We'll need to differenciate between pollers which can report hangup at
the same time as read (POLL_RDHUP) from the other ones, because only
these ones may benefit from the fd_done_recv() optimization. Epoll has
had support for EPOLLRDHUP since Linux 2.6.17 and has always been used
this way in haproxy, so now we only set the flag once we've observed it
once in a response. It means that some initial requests may try to
perform a second recv() call, but after the first closed connection it
will be enough to know that the second call is not needed anymore.

Later we may extend these flags to designate event-triggered pollers.

BUG/MEDIUM: stream: fix client-fin/server-fin handling

A tcp half connection can cause 100% CPU on expiration.

First reproduced with this haproxy configuration :

  global
      tune.bufsize 10485760
  defaults
      timeout server-fin 90s
      timeout client-fin 90s
  backend node2
      mode tcp
      timeout server 900s
      timeout connect 10s
      server def 127.0.0.1:3333
  frontend fe_api
      mode  tcp
      timeout client 900s
      bind :1990
      use_backend node2

Ie timeout server-fin shorter than timeout server, the backend server
sends data, this package is left in the cache of haproxy, the backend
server continue sending fin package, haproxy recv fin package. this
time the session information is as follows:

  time the session information is as follows:
      0x2373470: proto=tcpv4 src=127.0.0.1:39513 fe=fe_api be=node2
      srv=def ts=08 age=1s calls=3 rq[f=848000h,i=0,an=00h,rx=14m58s,wx=,ax=]
      rp[f=8004c020h,i=0,an=00h,rx=,wx=14m58s,ax=] s0=[7,0h,fd=6,ex=]
      s1=[7,18h,fd=7,ex=] exp=14m58s

  rp has set the CF_SHUTR state, next, the client sends the fin package,

  session information is as follows:
      0x2373470: proto=tcpv4 src=127.0.0.1:39513 fe=fe_api be=node2
      srv=def ts=08 age=38s calls=4 rq[f=84a020h,i=0,an=00h,rx=,wx=,ax=]
      rp[f=8004c020h,i=0,an=00h,rx=1m11s,wx=14m21s,ax=] s0=[7,0h,fd=6,ex=]
      s1=[9,10h,fd=7,ex=] exp=1m11s

  After waiting 90s, session information is as follows:
      0x2373470: proto=tcpv4 src=127.0.0.1:39513 fe=fe_api be=node2
      srv=def ts=04 age=4m11s calls=718074391 rq[f=84a020h,i=0,an=00h,rx=,wx=,ax=]
      rp[f=8004c020h,i=0,an=00h,rx=?,wx=10m49s,ax=] s0=[7,0h,fd=6,ex=]
      s1=[9,10h,fd=7,ex=] exp=? run(nice=0)

  cpu information:
      6899 root      20   0  112224  21408   4260 R 100.0  0.7   3:04.96 haproxy

Buffering is set to ensure that there is data in the haproxy buffer, and haproxy
can receive the fin package, set the CF_SHUTR flag, If the CF_SHUTR flag has been
set, The following code does not clear the timeout message, causing cpu 100%:

stream.c:process_stream:

        if (unlikely((res->flags & (CF_SHUTR|CF_READ_TIMEOUT)) == CF_READ_TIMEOUT)) {
            if (si_b->flags & SI_FL_NOHALF)
            si_b->flags |= SI_FL_NOLINGER;
            si_shutr(si_b);
        }

If you have closed the read, set the read timeout does not make sense.
With or without cf_shutr, read timeout is set:

       if (tick_isset(s->be->timeout.serverfin)) {
           res->rto = s->be->timeout.serverfin;
           res->rex = tick_add(now_ms, res->rto);
       }

After discussion on the mailing list, setting half-closed timeouts the
hard way here doesn't make sense. They should be set only at the moment
the shutdown() is performed. It will also solve a special case which was
already reported of some half-closed timeouts not working when the shutw()
is performed directly at the stream-interface layer (no analyser involved).
Since the stream interface layer cannot know the timeout values, we'll have
to store them directly in the stream interface so that they are used upon
shutw(). This patch does this, fixing the problem.

An easier reproducer to validate the fix is to keep the huge buffer and
shorten all timeouts, then call it under tcploop server and client, and
wait 3 seconds to see haproxy run at 100% CPU :

  global
      tune.bufsize 10485760

  listen px
      bind :1990
      timeout client 90s
      timeout server 90s
      timeout connect 1s
      timeout server-fin 3s
      timeout client-fin 3s
      server def 127.0.0.1:3333

  $ tcploop 3333 L W N20 A P100 F P10000 &
  $ tcploop 127.0.0.1:1990 C S10000000 F

MINOR: doc: 2.4. Examples should be 2.5. Examples

Guillaume Michaud reported against Cyril's haproxy-dconv project
that the index for the Examples section should be 2.5 instead of
2.4.

Should be backported to 1.7 and 1.6, so that the example section
can be linked to:
https://cbonte.github.io/haproxy-dconv/1.6/configuration.html#2.5

BUG/MAJOR: http: fix typo in http_apply_redirect_rule

Because of this typo, AN_RES_FLT_END was never called when a redirect rule is
applied on a keep-alive connection. In almost all cases, this bug has no
effect. But, it leads to a memory leak if a redirect is done on a http-response
rule when the HTTP compression is enabled.

This patch should be backported in 1.7.

BUILD: ssl: fix OPENSSL_NO_SSL_TRACE for boringssl and libressl

"sample-fetch which captures the cipherlist" patch introduce #define
do deal with trace functions only available in openssl > 1.0.2.
Add this #define to libressl and boringssl environment.
Thanks to Piotr Kubaj for postponing and testing with libressl.

BUILD: ssl: simplify SSL_CTX_set_ecdh_auto compatibility

SSL_CTX_set_ecdh_auto is declared (when present) with #define. A simple #ifdef
avoid to list all cases of ssllibs. It's a placebo in new ssllibs. It's ok with
openssl 1.0.1, 1.0.2, 1.1.0, libressl and boringssl.
Thanks to Piotr Kubaj for postponing and testing with libressl.

BUG: payload: fix payload not retrieving arbitrary lengths

This fixes a regression introduced in d7bdcb874bcb, that removed the
ability to use req.payload(0,0) to read the whole buffer content. The
offending commit is present starting in version 1.6, so the patch
should be backported to versions 1.6 and 1.7.

CLEANUP: connection: completely remove CO_FL_WAKE_DATA

Since it's only set and never tested anymore, let's remove it.

MEDIUM: connection: don't test for CO_FL_WAKE_DATA

This flag is always set when we end up here, for each and every data
layer (idle, stream-interface, checks), and continuing to test it
leaves a big risk of forgetting to set it as happened once already
before 1.5-dev13.

It could make sense to backport this into stable branches as part of
the connection flag fixes, after some cool down period.

BUG/MEDIUM: connection: ensure to always report the end of handshakes

Despite the previous commit working fine on all tests, it's still not
sufficient to completely address the problem. If the connection handler
is called with an event validating an L4 connection but some handshakes
remain (eg: accept-proxy), it will still wake the function up, which
will not report the activity, and will not detect a change once the
handshake it complete so it will not notify the ->wake() handler.

In fact the only reason why the ->wake() handler is still called here
is because after dropping the last handshake, we try to call ->recv()
and ->send() in turn and change the flags in order to detect a data
activity. But if for any reason the data layer is not interested in
reading nor writing, it will not get these events.

A cleaner way to address this is to call the ->wake() handler only
on definitive status changes (shut, error), on real data activity,
and on a complete connection setup, measured as CONNECTED with no
more handshake pending.

It could be argued that the handshake flags have to be made part of
the condition to set CO_FL_CONNECTED but that would currently break
a part of the health checks. Also a handshake could appear at any
moment even after a connection is established so we'd lose the
ability to detect a second end of handshake.

For now the situation around CO_FL_CONNECTED is not clean :
  - session_accept() only sets CO_FL_CONNECTED if there's no pending
    handshake ;

  - conn_fd_handler() will set it once L4 and L6 are complete, which
    will do what session_accept() above refrained from doing even if
    an accept_proxy handshake is still pending ;

  - ssl_sock_infocbk() and ssl_sock_handshake() consider that a
    handshake performed with CO_FL_CONNECTED set is a renegociation ;
    => they should instead filter on CO_FL_WAIT_L6_CONN

  - all ssl_fc_* sample fetch functions wait for CO_FL_CONNECTED before
    accepting to fetch information
    => they should also get rid of any pending handshake

  - smp_fetch_fc_rcvd_proxy() uses !CO_FL_CONNECTED instead of
    CO_FL_ACCEPT_PROXY

  - health checks (standard and tcp-checks) don't check for HANDSHAKE
    and may report a successful check based on CO_FL_CONNECTED while
    not yet done (eg: send buffer full on send_proxy).

This patch aims at solving some of these side effects in a backportable
way before this is reworked in depth :
  - we need to call ->wake() to report connection success, measure
    connection time, notify that the data layer is ready and update
    the data layer after activity ; this has to be done either if
    we switch from pending {L4,L6}_CONN to nothing with no handshakes
    left, or if we notice some handshakes were pending and are now
    done.

  - we document that CO_FL_CONNECTED exactly means "L4 connection
    setup confirmed at least once, L6 connection setup confirmed
    at least once or not necessary, all this regardless of any
    possibly remaining handshakes or future L6 negociations".

This patch also renames CO_FL_CONN_STATUS to the more explicit
CO_FL_NOTIFY_DATA, and works around the previous flags trick consiting
in setting an impossible combination of flags to notify the data layer,
by simply clearing the current flags.

This fix should be backported to 1.7, 1.6 and 1.5.

BUG/MAJOR: stream-int: do not depend on connection flags to detect connection

Recent fix 7bf3fa3 ("BUG/MAJOR: connection: update CO_FL_CONNECTED before
calling the data layer") marked an end to a fragile situation where the
absence of CO_FL_{CONNECTED,L4,L6}* flags is used to mark the completion
of a connection setup. The problem is that by setting the CO_FL_CONNECTED
flag earlier, we can indeed call the ->wake() function from conn_fd_handler
but the stream-interface's wake function needs to see CO_FL_CONNECTED unset
to detect that a connection has just been established, so if there's no
pending data in the buffer, the connection times out. The other ->wake()
functions (health checks and idle connections) don't do this though.

So instead of trying to detect a subtle change in connection flags,
let's simply rely on the stream-interface's state and validate that the
connection is properly established and that handshakes are completed
before reporting the WRITE_NULL indicating that a pending connection was
just completed.

This patch passed all tests of handshake and non-handshake combinations,
with synchronous and asynchronous connect() and should be safe for backport
to 1.7, 1.6 and 1.5 when the fix above is already present.

TESTS: add a test configuration to stress handshake combinations

This config tries to involve the various possible combinations of connection
handshakes, on the accept side and on the connect side. It also produces logs
indicating the handshake time.

May be tested with tcploop as the server, both for TCP and HTTP mode :
   - accept new connection
   - pause 100ms
   - send what looks like an HTTP response
   - wait 500ms and close

Starting log server (mainly to check timers) :
   $ socat udp-recvfrom:5514,fork -

Starting server :
   $ tcploop 8000 L N A W P100 S:"HTTP/1.0 200 OK\r\nConnection: close\r\n\r\n" P500

Testing all combinations with server-speaks-first (tcp) :
   $ nc 0 8007

Testing all combinations with client-speaks-first (tcp) :
   $ (printf "GET / HTTP/1.0\r\n\r\n";sleep 1) | nc 0 8007

Testing all combinations with client-speaks-first after pause (tcp) :
   $ (usleep 0.05 ; printf "GET / HTTP/1.0\r\n\r\n";sleep 1) | nc 0 8007

Testing all combinations with client-speaks-first (http) :
   $ (printf "GET / HTTP/1.0\r\n\r\n";sleep 1) | nc 0 8017

Testing all combinations with client-speaks-first after pause (http) :
   $ (usleep 0.05 ; printf "GET / HTTP/1.0\r\n\r\n";sleep 1) | nc 0 8017

Same tests must be redone after surrounding connect() in tcp_connect_server()
with fcntl(fd, F_SETFL, 0) and fcntl(fd, F_SETFL, O_NONBLOCK) for sycnhronous
connect().

BUG/MEDIUM: filters: Fix channels synchronization in flt_end_analyze

When a filter is used, there are 2 channel's analyzers to surround all the
others, flt_start_analyze and flt_end_analyze. This is the good place to acquire
and release resources used by filters, when needed. In addition, the last one is
used to synchronize the both channels, especially for HTTP streams. We must wait
that the analyze is finished for the both channels for an HTTP transaction
before restarting it for the next one.

But this part was buggy, leading to unexpected behaviours. First, depending on
which channel ends first, the request or the response can be switch in a
"forward forever" mode. Then, the HTTP transaction can be cleaned up too early,
while a processing is still in progress on a channel.

To fix the bug, the flag CF_FLT_ANALYZE has been added. It is set on channels in
flt_start_analyze and is kept if at least one filter is still analyzing the
channel. So, we can trigger the channel syncrhonization if this flag was removed
on the both channels. In addition, the flag TX_WAIT_CLEANUP has been added on
the transaction to know if the transaction must be cleaned up or not during
channels syncrhonization. This way, we are sure to reset everything once all the
processings are finished.

This patch should be backported in 1.7.

CLEANUP: config: Typo in comment.

This is for the recently merged dynamic cookie patch set.

BUG/MEDIUM server: Fix crash when dynamic is defined, but not key is provided.

Wait until we're sure we have a key before trying to calculate its length.

[wt: no backport needed, was just merged]

BUG/MEDIUM: listener: do not try to rebind another process' socket

When the "process" setting of a bind line limits the processes a
listening socket is enabled on, a "disable frontend" operation followed
by an "enable frontend" triggers a bug because all declared listeners
are attempted to be bound again regardless of their assigned processes.
This can at minima create new sockets not receiving traffic, and at worst
prevent from re-enabling a frontend if it's bound to a privileged port.

This bug was introduced by commit 1c4b814 ("MEDIUM: listener: support
rebinding during resume()") merged in 1.6-dev1, trying to perform the
bind() before checking the process list instead of after.

Just move the process check before the bind() operation to fix this.
This fix must be backported to 1.7 and 1.6.

Thanks to Pavlos for reporting this one.

CONTRIB: tcploop: use the trash instead of NULL for recv()

NULL is Linux-centric and we're not focused on performance here but
portability and reproducibility. Don't use NULL and use the trash
instead. It may lead to multiple recv() calls for large blocks but
as a benefit it will be possible to see the contents with strace.

CONTRIB: tcploop: fix connect's address length

FreeBSD wants the address size to be correct, so let's pass the size
of a sockaddr_in struct, not the sockaddr_storage.

CONTRIB: tcploop: report action 'K' (kill) in usage message

It was missing from the initial import.

CONTRIB: tcploop: fix time format to silence build warnings

timeval doesn't necessarily have ints for tv_sec/tv_usec, let's cast
them.

CONTRIB: tcploop: make it build on FreeBSD

A few changes :
- SOL_TCP must be replaced with IPPROTO_TCP
- TCP_NOQUICKACK is not defined
- MSG_MORE can be ignored and replaced with 0

BUG/MINOR: checks: attempt clean shutw for SSL check

Strict interpretation of TLS can cause SSL sessions to be thrown away
when the socket is shutdown without sending a "close notify", resulting
in each check to go through the complete handshake, eating more CPU on
the servers.

[wt: strictly speaking there's no guarantee that the close notify will
be delivered, it's only best effort, but that may be enough to ensure
that once at least one is received, next checks will be cheaper. This
should be backported to 1.7 and possibly 1.6]

MINOR: cli: Let configure the dynamic cookies from the cli.

This adds 3 new commands to the cli :
enable dynamic-cookie backend <backend> that enables dynamic cookies for a
specified backend
disable dynamic-cookie backend <backend> that disables dynamic cookies for a
specified backend
set dynamic-cookie-key backend <backend> that lets one change the dynamic
cookie secret key, for a specified backend.

MINOR: server: Add dynamic session cookies.

This adds a new "dynamic" keyword for the cookie option. If set, a cookie
will be generated for each server (assuming one isn't already provided on
the "server" line), from the IP of the server, the TCP port, and a secret
key provided. To provide the secret key, a new keyword as been added,
"dynamic-cookie-key", for backends.

Example :
backend bk_web
  balance roundrobin
  dynamic-cookie-key "bla"
  cookie WEBSRV insert dynamic
  server s1 127.0.0.1:80 check
  server s2 192.168.56.1:80 check

This is a first step to be able to dynamically add and remove servers,
without modifying the configuration file, and still have all the load
balancers redirect the traffic to the right server.

Provide a way to generate session cookies, based on the IP address of the
server, the TCP port, and a secret key provided.

BUG/MAJOR: connection: update CO_FL_CONNECTED before calling the data layer

Matthias Fechner reported a regression in 1.7.3 brought by the backport
of commit 819efbf ("BUG/MEDIUM: tcp: don't poll for write when connect()
succeeds"), causing some connections to fail to establish once in a while.
While this commit itself was a fix for a bad sequencing of connection
events, it in fact unveiled a much deeper bug going back to the connection
rework era in v1.5-dev12 : 8f8c92f ("MAJOR: connection: add a new
CO_FL_CONNECTED flag").

It's worth noting that in a lab reproducing a similar environment as
Matthias' about only 1 every 19000 connections exhibit this behaviour,
making the issue not so easy to observe. A trick to make the problem
more observable consists in disabling non-blocking mode on the socket
before calling connect() and re-enabling it later, so that connect()
always succeeds. Then it becomes 100% reproducible.

The problem is that this CO_FL_CONNECTED flag is tested after deciding to
call the data layer (typically the stream interface but might be a health
check as well), and that the decision to call the data layer relies on a
change of one of the flags covered by the CO_FL_CONN_STATE set, which is
made of CO_FL_CONNECTED among others.

Before the fix above, this bug couldn't appear with TCP but it could
appear with Unix sockets. Indeed, connect() was always considered
blocking so the CO_FL_WAIT_L4_CONN connection flag was always set, and
polling for write events was always enabled. This used to guarantee that
the conn_fd_handler() could detect a change among the CO_FL_CONN_STATE
flags.

Now with the fix above, if a connect() immediately succeeds for non-ssl
connection with send-proxy enabled, and no data in the buffer (thus TCP
mode only), the CO_FL_WAIT_L4_CONN flag is not set, the lack of data in
the buffer doesn't enable polling flags for the data layer, the
CO_FL_CONNECTED flag is not set due to send-proxy still being pending,
and once send-proxy is done, its completion doesn't cause the data layer
to be woken up due to the fact that CO_FL_CONNECT is still not present
and that the CO_FL_SEND_PROXY flag is not watched in CO_FL_CONN_STATE.

Then no progress is made when data are received from the client (and
attempted to be forwarded), because a CF_WRITE_NULL (or CF_WRITE_PARTIAL)
flag is needed for the stream-interface state to turn from SI_ST_CON to
SI_ST_EST, allowing ->chk_snd() to be called when new data arrive. And
the only way to set this flag is to call the data layer of course.

After the connect timeout, the connection gets killed and if in the mean
time some data have accumulated in the buffer, the retry will succeed.

This patch fixes this situation by simply placing the update of
CO_FL_CONNECTED where it should have been, before the check for a flag
change needed to wake up the data layer and not after.

This fix must be backported to 1.7, 1.6 and 1.5. Versions not having
the patch above are still affected for unix sockets.

Special thanks to Matthias Fechner who provided a very detailed bug
report with a bisection designating the faulty patch, and to Olivier
Houchard for providing full access to a pretty similar environment where
the issue could first be reproduced.

MEDIUM: stats: Add show json schema

This may be used to output the JSON schema which describes the output of
show info json and show stats json.

The JSON output is without any extra whitespace in order to reduce the
volume of output. For human consumption passing the output through a
pretty printer may be helpful.

e.g.:
$ echo "show schema json" | socat /var/run/haproxy.stat stdio | \
     python -m json.tool

The implementation does not generate the schema. Some consideration could
be given to integrating the output of the schema with the output of
typed and json info and stats. In particular the types (u32, s64, etc...)
and tags.

A sample verification of show info json and show stats json using
the schema is as follows. It uses the jsonschema python module:

cat > jschema.py <<  __EOF__
import json

from jsonschema import validate
from jsonschema.validators import Draft3Validator

with open('schema.txt', 'r') as f:
    schema = json.load(f)
    Draft3Validator.check_schema(schema)

    with open('instance.txt', 'r') as f:
        instance = json.load(f)
validate(instance, schema, Draft3Validator)
__EOF__

$ echo "show schema json" | socat /var/run/haproxy.stat stdio > schema.txt
$ echo "show info json" | socat /var/run/haproxy.stat stdio > instance.txt
python ./jschema.py
$ echo "show stats json" | socat /var/run/haproxy.stat stdio > instance.txt
python ./jschema.py

Signed-off-by: Simon Horman <horms@verge.net.au>

MEDIUM: stats: Add JSON output option to show (info|stat)

Add a json parameter to show (info|stat) which will output information
in JSON format. A follow-up patch will add a JSON schema which describes
the format of the JSON output of these commands.

The JSON output is without any extra whitespace in order to reduce the
volume of output. For human consumption passing the output through a
pretty printer may be helpful.

e.g.:
$ echo "show info json" | socat /var/run/haproxy.stat stdio | \
python -m json.tool

STAT_STARTED has bee added in order to track if show output has begun or
not. This is used in order to allow the JSON output routines to only insert
a "," between elements when needed. I would value any feedback on how this
might be done better.

Signed-off-by: Simon Horman <horms@verge.net.au>

CLEANUP: http: make http_server_error() not set the status anymore

Given that all call places except one had to set txn->status prior to
calling http_server_error(), it's simpler to make this function rely
on txn->status than have it store it from an argument.

MINOR: http-request tarpit deny_status.

Implements deny_status for http-request tarpit rule (allows setting
custom http status code). This commit depends on:
MEDIUM: http_error_message: txn->status / http_get_status_idx.

MEDIUM: http_error_message: txn->status / http_get_status_idx.

This commit removes second argument(msgnum) from http_error_message and
changes http_error_message to use s->txn->status/http_get_status_idx for
mapping status code from 200..504 to HTTP_ERR_200..HTTP_ERR_504(enum).

This is needed for http-request tarpit deny_status commit.

CLEANUP: Remove comment that's no longer valid

Code was deleted in ad63582eb, but the comment remained.

Signed-off-by: Nenad Merdanovic <nmerdan@haproxy.com>

MINOR: Add hostname sample fetch

It adds "hostname" as a new sample fetch. It does exactly the same as
"%H" in a log format except that it can be used outside of log formats.

Signed-off-by: Nenad Merdanovic <nmerdan@haproxy.com>

CLEANUP: Replace repeated code to count usable servers with be_usable_srv()

2 places were using an open-coded implementation of this function to count
available servers. Note that the avg_queue_size() fetch didn't check that
the proxy was in STOPPED state so it would possibly return a wrong server
count here but that wouldn't impact the returned value.

Signed-off-by: Nenad Merdanovic <nmerdan@haproxy.com>

MINOR: Add nbsrv sample converter

This is like the nbsrv() sample fetch function except that it works as
a converter so it can count the number of available servers of a backend
name retrieved using a sample fetch or an environment variable.

Signed-off-by: Nenad Merdanovic <nmerdan@haproxy.com>

BUG/MINOR: Fix "get map <map> <value>" CLI command

The said form of the CLI command didn't return anything since commit
ad8be61c7.

This fix needs to be backported to 1.7.

Signed-off-by: Nenad Merdanovic <nmerdan@haproxy.com>

BUG/MEDIUM: cli: Prevent double free in CLI ACL lookup

The memory is released by cli_release_mlook, which also properly sets the
pointer to NULL. This was introduced with a big code reorganization
involving moving to the new keyword registration form in commit ad8be61c7.

This fix needs to be backported to 1.7.

Signed-off-by: Nenad Merdanovic <nmerdan@haproxy.com>

BUG/MEDIUM: ssl: Clear OpenSSL error stack after trying to parse OCSP file

Invalid OCSP file (for example empty one that can be used to enable
OCSP response to be set dynamically later) causes errors that are
placed on OpenSSL error stack. Those errors are not cleared so
anything that checks this stack later will fail.

Following configuration:
  bind :443 ssl crt crt1.pem crt crt2.pem

With following files:
  crt1.pem
  crt1.pem.ocsp - empty one
  crt2.pem.rsa
  crt2.pem.ecdsa

Will fail to load.

This patch should be backported to 1.7.

MINOR: config: warn when some HTTP rules are used in a TCP proxy

Surprizingly, http-request, http-response, block, redirect, and capture
rules did not cause a warning to be emitted when used in a TCP proxy, so
let's fix this.

This patch may be backported to older versions as it helps spotting
configuration issues.

DOC: spoe: Update SPOE documentation to reflect recent changes

MINOR: spoe: Add "max-frame-size" statement in spoe-agent section

As its named said, this statement customize the maximum allowed size for frames
exchanged between HAProxy and SPOAs. It should be greater than or equal to 256
and less than or equal to (tune.bufsize - 4) (4 bytes are reserved to the frame
length).

MINOR: spoe: Add "send-frag-payload" option in spoe-agent section

This option can be used to enable or to disable (prefixing the option line with
the "no" keyword) the sending of fragmented payload to agents. By default, this
option is enabled.

MINOR: spoe: Rely on alertif_too_many_arg during configuration parsing