git.ipfire.org Git - thirdparty/haproxy.git/log

DOC: explain HTTP2 timeout behavior

Clarifies that in HTTP2 we don't consider "timeout http-keep-alive", but
"timeout client" instead.

MEDIUM: cache: max-age configuration keyword

Add a configuration keyword to change the max-age.
The default one is still 60s.

MINOR: cache: replace a fprint() by an abort()

In the applet I/O handler we can never get an object bigger than a
buffer, so we should never reach this case.

DOC: add Christopher and Emeric as maintainers of the threads

We'll need to be extremely careful at the beginning regarding changes.

DOC: mention William as maintainer of the cache and master-worker

The latter is very tricky, better not touch anything there without
his approval.

DOC: add initial peers protovol v2.0 documentation.

[wt: the new version is 2.1 but it's useful to document the different
versions since they're found in field. There's some overlap with the
new one and they complement on certain areas. Most likely they'll
ultimately be merged.]

DOC: fix mangled version in peers protocol documentation

Tim Düsterhus noticed that the create-release script had mangled the
version in the peers protocol doc, forcing it to 1.8 due to its syntax
matching the format of an haproxy version. Let's just slightly readjust
the header not to match this by removing the word "version" and placing
it on the same line as the title.

DOC: update the roadmap file with the latest changes merged in 1.8

We're making progress :-)

CLEANUP: pools: rename all pool functions and pointers to remove this "2"

During the migration to the second version of the pools, the new
functions and pool pointers were all called "pool_something2()" and
"pool2_something". Now there's no more pool v1 code and it's a real
pain to still have to deal with this. Let's clean this up now by
removing the "2" everywhere, and by renaming the pool heads
"pool_head_something".

MINOR/CLEANUP: proxy: rename "proxy" to "proxies_list"

Rename the global variable "proxy" to "proxies_list".
There's been multiple proxies in haproxy for quite some time, and "proxy"
is a potential source of bugs, a number of functions have a "proxy" argument,
and some code used "proxy" when it really meant "px" or "curproxy". It worked
by pure luck, because it usually happened while parsing the config, and thus
"proxy" pointed to the currently parsed proxy, but we should probably not
rely on this.

[wt: some of these are definitely fixes that are worth backporting]

CLEANUP: log: Rename Alert/Warning in ha_alert/ha_warning

CLEANUP: debug: Use DPRINTF instead of fprintf into #ifdef DEBUG_FULL/#endif

MEDIUM: listener: Bind listeners on a thread subset if specified

If a "process" option with a thread set is used on the bind line, we use the
corresponding bitmask when the listener's FD is created.

MINOR: config: Add threads support for "process" option on "bind" lines

It is now possible on a "bind" line (or a "stats socket" line) to specify the
thread set allowed to process listener's connections. For instance:

    # HTTPS connections will be processed by all threads but the first and HTTP
    # connection will be processed on the first thread.
    bind *:80 process 1/1
    bind *:443 ssl crt mycert.pem process 1/2-

MINOR: config: Add the threads support in cpu-map directive

Now, it is possible to bind CPU at the thread level instead of the process level
by defining a thread set in "cpu-map" directives. Thus, its format is now:

cpu-map [auto:]<process-set>[/<thread-set>] <cpu-set>...

where <process-set> and <thread-set> must follow the format:

all | odd | even | number[-[number]]

Having a process range and a thread range in same time with the "auto:" prefix
is not supported. Only one range is supported, the other one must be a fixed
number. But it is allowed when there is no "auto:" prefix.

Because it is possible to define a mapping for a process and another for a
thread on this process, threads will be bound on the intersection of their
mapping and the one of the process on which they are attached. If the
intersection is null, no specific binding will be set for the threads.

MINOR:: config: Remove thread-map directive

It was a temporary directive used for development purpose. Now, CPU mapping for
at the thread level should be done using the cpu-map directive. This feature
will be added in a next commit.

MINOR: config: Support partial ranges in cpu-map directive

Now, processa and CPU ranges can be partially defined. The higher bound can be
omitted. In such case, it is replaced by the corresponding maximum value, 32 or
64 depending on the machine's word size.

By extension, It is also true for the "bind-process" directive and "process"
parameter on a "bind" or a "stats socket" line.

MINOR: config: Add auto-increment feature for cpu-map

The prefix "auto:" can be added before the process set to let HAProxy
automatically bind a process to a CPU by incrementing process and CPU sets. To
be valid, both sets must have the same size. No matter the declaration order of
the CPU sets, it will be bound from the lower to the higher bound.

  Examples:
      # all these lines bind the process 1 to the cpu 0, the process 2 to cpu 1
      #  and so on.
      cpu-map auto:1-4   0-3
      cpu-map auto:1-4   0-1 2-3
      cpu-map auto:1-4   3 2 1 0

      # bind each process to exaclty one CPU using all/odd/even keyword
      cpu-map auto:all   0-63
      cpu-map auto:even  0-31
      cpu-map auto:odd   32-63

      # invalid cpu-map because process and CPU sets have different sizes.
      cpu-map auto:1-4   0    # invalid
      cpu-map auto:1     0-3  # invalid

MINOR: standard: Add my_ffsl function to get the position of the bit set to one

MINOR: config: Export parse_process_number and use it wherever it's applicable

This function is used when "bind-process" directive is parsed and when "process"
parameter on a "bind" or a "stats socket" line is parsed.

MINOR: config: Slightly change how parse_process_number works

Now, this function returns a status code to indicate a success (0) or a failure
(1) and the error message in set in <err> parameter. And the result of the parsing
is set in <proc> parameter.

MINOR: config: Support a range to specify processes in "cpu-map" parameter

Now, you can define processes concerned by a cpu-map line using a range. For
instance, the following line binds the first 32 processes on CPUs 0 to 3:

cpu-map 1-32 0-3

BUG/MINOR: listener: Allow multiple "process" options on "bind" lines

The documentation specifies that you can have several "process" options to
define several ranges on "bind" lines (or "stats socket" lines). It is uncommon,
but it should be possible. So the bind_proc bitmask in bind_conf structure must
not be overwritten at each new "process" option parsed.

This bug also exists in 1.7, 1.6 and 1.5. So it may be backported. But no one
seems to have noticed it, so it was probably never hitted.

MINOR: cache: move the refcount decrease in the applet release

Move the refcount decrease of the cache in the release callback of the
applet. We don't need to decrease it in the applet code.

BUG/MEDIUM: cache: free ressources in chn_end_analyze

Upon an aborted HTTP connection, or an error, the filter cache does not
decrement the refcount and does not free the allocated ressources.

BUG/MEDIUM: stream: always release the stream-interface on abort

The cache exhibited a but in process_stream() where upon abort it is
possible to switch the stream-int's state to SI_ST_CLO without calling
si_release_endpoint(), resulting in a possibly missing ->release() for
the applet.

It should affect all other applets as well (eg: lua, spoe, peers) and
should carefully be backported to stable branches after some observation
period.

MINOR: ssl: Handle early data with BoringSSL

BoringSSL early data differ from OpenSSL 1.1.1 implementation. When early
handshake is done, SSL_in_early_data report if SSL_read will be done on early
data. CO_FL_EARLY_SSL_HS and CO_FL_EARLY_DATA can be adjust accordingly.

MEDIUM: config: ensure that tune.bufsize is at least 16384 when using HTTP/2

HTTP/2 mandates the support of 16384 bytes frames by default, so we need
a large enough buffer to process them. Till now if tune.bufsize was too
small, H2 connections were simply rejected during their establishment,
making it quite hard to troubleshoot the issue.

Now we detect when HTTP/2 is enabled on an HTTP frontend and emit an
error if tune.bufsize is not large enough, with the appropriate
recommendation.

MINOR: h2: make use of client-fin timeout after GOAWAY

At the moment, the "client" timeout is used on an HTTP/2 connection once
it's idle with no active stream. With this patch, this timeout is replaced
by client-fin once a GOAWAY frame is sent. This closely matches what is
done on HTTP/1 since the principle is the same, as it indicates a willing
ness to quickly close a connection on which we don't expect to see anything
anymore.

MEDIUM: h2: don't gracefully close the connection anymore on Connection: close

As reported by Lukas, it causes more harm than good, for example on
prompt for authentication. Now we have an "http-request reject" rule
to use instead of "http-request deny" if we absolutely want to close
the connection.

MINOR: h2: send RST_STREAM before GOAWAY on reject

Apparently the h2c client has trouble reading the RST_STREAM frame after
a GOAWAY was sent, so it's likely that other clients may face the same
difficulty. Curl and Firefox don't care about this ordering, so let's
send it first.

MINOR: http: implement the "http-request reject" rule

This one acts similarly to its tcp-request counterpart. It immediately
closes the request without emitting any response. It can be suitable in
certain DoS conditions, as well as to close an HTTP/2 connection.

MEDIUM: cache: store sha1 for hashing the cache key

The cache was relying on the txn->uri for creating its key, which was a
big problem when there was no log activated.

This patch does a sha1 of the host + uri, and stores it in the txn.
When a object is stored, the eb32node uses the first 32 bits of the hash
as a key, and the whole hash is stored in the cache entry.

During a lookup, the truncated hash is used, and when it matches an
entry we check the real sha1.

MINOR: mux: Make sure every string is woken up after the handshake.

In case any stream was waiting for the handshake after receiving early data,
we have to wake all of them. Do so by making the mux responsible for
removing the CO_FL_EARLY_DATA flag after all of them are woken up, instead
of doing it in si_cs_wake_cb(), which would then only work for the first one.
This makes wait_for_handshake work with HTTP/2.

MINOR: ssl: Handle reading early data after writing better.

It can happen that we want to read early data, write some, and then continue
reading them.
To do so, we can't reuse tmp_early_data to store the amount of data sent,
so introduce a new member.
If we read early data, then ssl_sock_to_buf() is now the only responsible
for getting back to the handshake, to make sure we don't miss any early data.

BUG/MAJOR: threads/task: dequeue expired tasks under the WQ lock

There is a small unprotected window for a task between the wait queue
and the run queue where a task could be woken up and destroyed at the
same time. What typically happens is that a timeout is reached at the
same time an I/O completes and wakes it up, and the I/O terminates the
task, causing a use after free in wake_expired_tasks() possibly causing
a crash and/or memory corruption :

       thread 1                             thread 2
  (wake_expired_tasks)                (stream_int_notify)

HA_SPIN_UNLOCK(TASK_WQ_LOCK, &wq_lock);
                              task_wakeup(task, TASK_WOKEN_IO);
                              ...
                              process_stream()
                                stream_free()
                                   task_free()
                                      pool_free(task)
task_wakeup(task, TASK_WOKEN_TIMER);

This case is reasonably easy to reproduce with a config using very short
server timeouts (100ms) and client timeouts (10ms), while injecting on
httpterm requesting medium sized objects (5kB) over SSL. All this is
easier done with more threads than allocated CPUs so that pauses can
happen anywhere and last long enough for process_stream() to kill the
task.

This patch inverts the lock and the wakeup(), but requires some changes
in process_runnable_tasks() to ensure we never try to grab the WQ lock
while having the RQ lock held. This means we have to release the RQ lock
before calling task_queue(), so we can't hold the RQ lock during the
loop and must take and drop it.

It seems that a different approach with the scope-aware trees could be
easier, but it would possibly not cover situations where a task is
allowed to run on multiple threads. The current solution covers it and
doesn't seem to have any measurable performance impact.

BUG/MAJOR: h2: always remove a stream from the send list before freeing it

When a stream is aborted on timeout or any reason initiated by the stream,
and this stream was subscribed to the send list, we forgot to detach it
when freeing it, resulting in a dead node remaining present in the send
list with all usual funny consequences (memory corruption, crashes, etc).
Let's simply unconditionally delete the stream.

BUG/MINOR: stream: fix tv_request calculation for applets

When the stats code was moved to an applet, it wasn't completely
cleaned of its usage of the HTTP transaction and it used to store
the HTTP status in txn->status and to set the HTTP request date to
<now> from within the applet. This is totally wrong because the
applet is seen as a server from the HTTP engine, which parses its
response, so the http_txn must not be touched there.

This was made visible by the cache which would always exhibit a
negative TR log, indicating that nowhere in the code we took care of
setting s->logs.tv_request while the code above used to continue to
hide this. Another side effect of this issue is that under load, if
the stats applet call risks to be delayed, the reported t_queue can
appear negative by being below tv_request-tv_accept.

This patch removes the assignment of tv_request and txn->status from
the applet code and instead sets the tv_request if still unset when
connecting to the applet. This ensures that all applets report correct
request timers now.

BUG/MINOR: Use crt_base instead of ca_base when crt is parsed on a server line

In srv_parse_crt, crt_base was checked but ca_base was used to build the
certifacte path.

This patch must be backported in 1.7, 1.6 and 1.5.

MINOR: sample: Add "thread" sample fetch

It returns id of the thread calling the function.

BUG/MEDIUM: threads/time: maintain a common time reference between all threads

During high loads it becomes visible that the time drifts between threads,
sometimes showing tens of seconds after several minutes. The root cause is
the per-thread correction which is performed based on a local offset and
local time. But we can't use a unique global time either as we need the
thread-local time to be stable between two poll() calls.

This commit takes a stab at this problem by proceeding this way :

  - a global "global_now" date is monotonous and common between all threads.
  - each thread has its own local <now> which is resynced with <global_now>
    on each invocation of tv_update_date()
  - each thread detects its own drift based on its poll() timeout and its
    local <now>, and recalculates its adjusted local time
  - each thread then ensures its new local time is no older than the current
    global time, otherwise it readjusts its local time to match this one
  - finally threads do atomically update the global time to match its own
    local one

This guarantees a monotonous global time and a monotonous+stable local time.

It is still possible by definition for two threads to report a minor time
variation on subsequent events but that variation will only be caused by
the moment they watched the time and are very small. When a common global
time is needed between all threads, global_now could be used as a reference
(with care). The wallclock time used in logs is still <date> anyway.

BUG/MEDIUM: threads/time: fix time drift correction

With threads, it became mandatory to implement a thread-local time with
its own correction. However, it was noticed that during high thread
contention, the time correction could occasionally be wrong, reporting
huge negative or positive timers in logs. This was caused by the
conversion between struct timeval and a single 64-bit offset, due to
an erroneous shift and due to a loss of sign during the conversion.

Given that time_t is not always signed, and that timeval is not really
needed here, better avoid playing dangerous games with these operations
and use a single 64-bit offset representing a signed 32-bit offset, for
the seconds part and an unsigned offset for the microsecond part.
It still supports atomic updates and doesn't cause issues anymore.

MINOR: pools: implement DEBUG_UAF to detect use after free

This code has been used successfully a few times in the past to detect
that a pool was used after being freed. Its main goal is to allocate a
full page for each object so that they are always released individually
and unmapped from memory. This way if any part of the code reference the
object after is was freed and before it is reallocated, a segv occurs at
the exact offending location. It does a few extra things such as writing
to the memory area before freeing to detect double-frees and free of
read-only areas, and placing the data at the end of the page instead of
the beginning so that out of bounds accesses are easier to spot. The
amount of memory used with this is huge (about 10 times the regular
usage) but it can be useful sometimes.

MINOR: pools: prepare functions to override malloc/free in pools

This will be useful to add some debugging capabilities. For now it
changes nothing.

MINOR: ssl: Don't disable early data handling if we could not write.

If we can't write early data, for some reason, don't give up on reading them,
they may still be early data to be read, and if we don't do so, openssl
internal states might be inconsistent, and the handshake will fail.

BUG/MINOR: ssl: Always start the handshake if we can't send early data.

The current code only tries to do the handshake in case we can't send early
data if we're acting as a client, which is wrong, it has to be done on the
server side too, or we end up in an infinite loop.

BUG/MEDIUM: deinit: correctly deinitialize the proxy and global listener tasks

While using mmap() to allocate pools for debugging purposes, kill -USR1 caused
libc aborts in deinit() on two calls to free() on proxies' tasks and the global
listener task. The issue comes from the fact that we're using free() to release
a task instead of task_free(), so the task was allocated from a pool and released
using a different method.

This bug has been there since at least 1.5, so a backport is desirable to all
maintained versions.

BUG/MEDIUM: cache fix cli_kws structure

The cli_kws structure was not ended and was causing undefined behavior.

BUG/MEDIUM: cache: refcount forbids to free the objects

Some refcount decrementation were forgotten and they were forbidding to
reuse the objects in some cases.

BUG/MEDIUM: cache: use key=0 as a condition for freeing

The cache was trying to remove objects from the tree while they were
already removed from it. We set the key to 0 as a check for not trying
to remove the object from the tree when we are still using the object.

MEDIUM: cache: "show cache" on the cli

The cli command "show cache" displays the status of the cache, the first
displayed line is the shctx informations with how much blocks available
blocks it contains (blocks are 1k by default).

The next lines are the objects stored in the cache tree, the pointer,
the size of the object and how much blocks it uses, a refcount for the
number of users of the object, and the remaining expiration time (which
can be negative if expired)

Example:

    $ echo "show cache" | socat - /run/haproxy.sock
    0x7fa54e9ab03a: foobar (shctx:0x7fa54e9ab000, available blocks:3921)
    0x7fa54ed65b8c (size: 43190 (43 blocks), refcount:2, expire: 2)
    0x7fa54ecf1b4c (size: 45238 (45 blocks), refcount:0, expire: 2)
    0x7fa54ed70cec (size: 61622 (61 blocks), refcount:0, expire: 2)
    0x7fa54ecdbcac (size: 42166 (42 blocks), refcount:1, expire: 2)
    0x7fa54ec9736c (size: 44214 (44 blocks), refcount:2, expire: 2)
    0x7fa54eca28ec (size: 46262 (46 blocks), refcount:2, expire: -2)

MEDIUM: shctx: use unsigned int for len and block_count

Allows bigger objects to be cached in the shctx, the first
implementation was only storing small ssl session, but we want to store
bigger HTTP response.

CLEANUP: cache: reorder includes

CONTRIB: spoa_example: remove SPOE enums that are useless for clients

CONTRIB: spoa_example: remove last dependencies on type "sample"

Being an external agent, it's confusing that it uses haproxy's internal
types and it seems to have encouraged other implementations to do so.
Let's completely remove any reference to struct sample and use the
native DATA types instead of converting to and from haproxy's sample
types.

CONTRIB: spoa_example: remove bref, wordlist, cond_wordlist

These ones are not needed, let's further reduce the include file.

CONTRIB: spoa_example: allow to compile outside HAProxy.

Don't include haproxy's includes anymore and use a local copy instead.

BUG/MINOR: systemd: ignore daemon mode

Since we switched to notify mode in the systemd unit file in commit
d6942c8, haproxy won't start if the daemon keyword is present in the
configuration.

This change makes sure that haproxy remains in foreground when using
systemd mode and adds a note in the documentation.

BUG/MEDIUM: h2: always reassemble the Cookie request header field

The special case of the Cookie header field was overlooked in the
implementation, considering that most servers do handle cookie lists,
but as reported here on discourse it's not the case at all :

https://discourse.haproxy.org/t/h2-cookie-header-splitted-header/1742

This patch fixes this by skipping all occurences of the Cookie header
in the request while building the H1 request, and then building a single
Cookie header with all values appended at once, according to what is
requested in RFC7540#8.1.2.5.

In order to build the list of values, the list struct is used as a linked
list (as there can't be more cookies than headers). This makes the list
walking quite efficient and ensures all values are quickly found without
having to rescan the list.

A test case provided by Lukas shows that it properly works :

> GET /? HTTP/1.1
> user-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0
> accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> accept-language: en-US,en;q=0.5
> accept-encoding: gzip, deflate
> referer: https://127.0.0.1:4443/?expectValue=1511294406
> host: 127.0.0.1:4443

< HTTP/1.1 200 OK
< Server: nginx
< Date: Tue, 21 Nov 2017 20:00:13 GMT
< Content-Type: text/html; charset=utf-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< X-Powered-By: PHP/5.3.10-1ubuntu3.26
< Set-Cookie: HAPTESTa=1511294413
< Set-Cookie: HAPTESTb=1511294413
< Set-Cookie: HAPTESTc=1511294413
< Set-Cookie: HAPTESTd=1511294413
< Content-Encoding: gzip

> GET /?expectValue=1511294413 HTTP/1.1
> user-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0
> accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> accept-language: en-US,en;q=0.5
> accept-encoding: gzip, deflate
> host: 127.0.0.1:4443
> cookie: SERVERID=s1; HAPTESTa=1511294413; HAPTESTb=1511294413; HAPTESTc=1511294413; HAPTESTd=1511294413

Many thanks to @Nurza, @adrianw and @lukastribus for their helpful reports
and investigations here.

MEDIUM: h2: change hpack_decode_headers() to only provide a list of headers

The current H2 to H1 protocol conversion presents some issues which will
require to perform some processing on certain headers before writing them
so it's not possible to convert HPACK to H1 on the fly.

This commit modifies the headers decoding so that it now works in two
phases : hpack_decode_headers() only decodes the HPACK stream in the
HEADERS frame and puts the result into a list. Headers which require
storage (huffman-compressed or from the dynamic table) are stored in
a chunk allocated by the H2 demuxer. Then once the headers are properly
decoded into this list, h2_make_h1_request() is called with this list
to produce the HTTP/1.1 request into the destination buffer. The list
necessarily enforces a limit. Here we use 2*MAX_HTTP_HDR, which means
that we can have as many individual cookies as we have regular headers
if a client decides to break their cookies into multiple values. This
seams reasonable and will allow the H1 parser to decide whether it's
too much or not.

Thus the output stream is not produced on the fly anymore and this will
permit to deal with certain corner cases like reparing the Cookie header
(which for now is not done).

In order to limit header duplication and parsing, the known pseudo headers
continue to be passed by their index : the name element in the list then
has a NULL pointer and the value is the pseudo header's index. Given that
these ones represent about half of the incoming requests and need to be
found quickly, it maintains an acceptable level of performance.

The code was significantly reduced by doing this because the orignal code
had to deal with HPACK and H1 combinations (eg: index vs not indexed, etc)
and now the HPACK decoding is totally focused on the decompression, and
the H1 encoding doesn't have to deal with the issue of wrapping input for
example.

One bug was addressed here (though it couldn't happen at the moment). The
H2 demuxer used to detect a failure to write the request into the H1 buffer
and would then detect if the output buffer wraps, realign it and try again.
The problem by doing so was that the HPACK context was already modified and
not rewindable. Thus the size check is now performed first and a failure is
reported if it doesn't fit.

MEDIUM: h2: add a function to emit an HTTP/1 request from a headers list

The current H2 to H1 protocol conversion presents some issues which will
require to perform some processing on certain headers before writing them
so it's not possible to convert HPACK to H1 on the fly.

Here we introduce a function which performs half of what hpack_decode_header()
used to do, which is to take a list of headers on input and emit the
corresponding request in HTTP/1.1 format. The code is the same and functions
were renamed to be prefixed with "h2" instead of "hpack", though it ends
up being simpler as the various HPACK-specific cases could be fused into
a single one (ie: add header).

Moving this part here makes a lot of sense as now this code is specific to
what is documented in HTTP/2 RFC 7540 and will be able to deal with special
cases related to H2 to H1 conversion enumerated in section 8.1.

Various error codes which were previously assigned to HPACK were never
used (aside being negative) and were all replaced by -1 with a comment
indicating what error was detected. The code could be further factored
thanks to this but this commit focuses on compatibility first.

This code is not yet used but builds fine.

BUG/MEDIUM: h2: properly report connection errors in headers and data handlers

We used to return >0 indicating a success when an error was present on the
connection, preventing the caller from detecting and handling it. This for
example happens when sending too many headers in a frame, making the request
impossible to decompress.

BUILD: server: check->desc always exists

Clang reports this warning :

src/server.c:872:14: warning: address of array 'check->desc' will
always evaluate to 'true' [-Wpointer-bool-conversion]

Indeed, check->desc used to be a pointer to a dynamically allocated area
a long time ago and is now an array. Let's remove the useless test.

BUILD: h2: mark some inlined functions "unused"

Clang complains that h2_get_n64() is not used, and a few other protocol
specific functions may fall in that category depending on how the code
evolves. Better mark them unused to silence the warning since it's on
purpose.

BUILD: compiler: add a new type modifier __maybe_unused

While gcc only emits warnings about unused static functions, Clang also
emits such a warning when the functions are inlined. This is a bit
annoying at certain places where functions are provided to manipulate
multiple data types and are not yet used. Let's have a type modifier
"__maybe_unused" which sets the "unused" attribute like the Linux kernel
does. It's elegant as it allows the code author to indicate that it knows
that this element might be unused. It works on variables as well, which
is convenient to remove ifdefs around local variables in certain functions,
but doesn't work on labels.

BUILD: ebtree: don't redefine types u32/s32 in scope-aware trees

Clang emits a warning about these types being redefined in eb32sctree
while they are already defined in eb32tree. Let's simply not redefine
them if eb32tree was already included.

BUILD: threads/plock: fix a build issue on Clang without optimization

[ plock commit 4c53fd3a0b2b1892817cebd0db012a52f4087850 ]

Pieter Baauw reported a build issue affecting haproxy after plock was
included. It happens that expressions of the form :

     if ((const) ? (expr1) : (expr2))
       do_something()

always produce code for both expr1 and expr2 on Clang when building
without optimization. The resulting asm code is even funny, basically
doing :

     mov reg, 1
     cmp reg, 1
     ...

This causes our sizeof() tests to fail to build because we purposely
dereference a fake function that reports the location and nature of the
inconsistency, but this fake function appears in the object code despite
all conditions being there to avoid it.

However the compiler is still smart enough to optimize away code doing

    if (const)
       do_something()

So we simply repeat the condition before do_something(), and the dummy
function is not referenced anymore unless really required.

MINOR: threads/build: atomic: replace the few inlines with macros

[ plock commit 61e255286ae32e83e1a3174dd7c49eda99880a8b]

There are a few inlines such as pl_barrier() and pl_cpu_relax() which
are used a lot. Unfortunately, while building test code at -O0, inlining
is disabled and these ones are called a lot and show up a lot in any
profile, are traced into when single-stepping with a debugger, etc, thus
they are polluting the landscape. Since they're single-asm statements,
there is no reason for not turning them into macros.

The result becomes fairly visible here at -O0 :

  $ size latency.inline latency.macro
     text    data     bss     dec     hex filename
    11431     692     656   12779    31eb treelock.inline
    10967     692     656   12315    301b treelock.macro

And it was verified that regularly optimized code remains strictly identical.

MINOR: threads/atomic: implement pl_bts() on non-x86

[ plock commit da17ba320aad3a8faf08e36fca604de9cad21fdd ]

This one was missing, it can be done using sync_fetch_and_or().

MINOR: threads/atomic: implement pl_mb() in asm on x86

[ plock commit 44081ea493dd78dab48076980e881748e9b33db5 ]

Older compilers (eg: gcc 3.4) don't provide __sync_synchronize() so let's
do it by hand on this platform.

MINOR: threads/plock: rename local variables in macros to avoid conflicts

[ plock commit b155d5c762fb9a9793911881f80e61faa6b0e889 ]

Local variables "l", "i" and "ret" were renamed "__pl_l", "__pl_i" and
"__pl_r" respectively, to limit the risk of conflicts with existing
variables in application code.

MINOR: threads/atomic: rename local variables in macros to avoid conflicts

[ plock commit bfac5887ebabb8ef753b0351f162265767eb219b ]

Local variable "t" was renamed "__pl_t" to limit the risk of conflicts
with existing variables in application code.

CLEANUP: cache: remove wrong comment

MEDIUM: cache: enable the HTTP analysers

Enable the same analysers as the stats applet.
Allows keepalive and termination flags to work.

CLEANUP: cache: remove unused struct

Remove unused structure which remain from old dev.

BUG/MEDIUM: cache: free callback to remove from tree

Call the shctx free_blocks callback in order to remove the row from the
cache tree.

Put the row in the hot list during allocation, forbid the blocks to be
stolen by a free or a row_reserve

MEDIUM: mworker: Add systemd `Type=notify` support

This patch adds support for `Type=notify` to the systemd unit.

Supporting `Type=notify` improves both starting as well as reloading
of the unit, because systemd will be let known when the action completed.

See this quote from `systemd.service(5)`:
> Note however that reloading a daemon by sending a signal (as with the
> example line above) is usually not a good choice, because this is an
> asynchronous operation and hence not suitable to order reloads of
> multiple services against each other. It is strongly recommended to
> set ExecReload= to a command that not only triggers a configuration
> reload of the daemon, but also synchronously waits for it to complete.

By making systemd aware of a reload in progress it is able to wait until
the reload actually succeeded.

This patch introduces both a new `USE_SYSTEMD` build option which controls
including the sd-daemon library as well as a `-Ws` runtime option which
runs haproxy in master-worker mode with systemd support.

When haproxy is running in master-worker mode with systemd support it will
send status messages to systemd using `sd_notify(3)` in the following cases:

- The master process forked off the worker processes (READY=1)
- The master process entered the `mworker_reload()` function (RELOADING=1)
- The master process received the SIGUSR1 or SIGTERM signal (STOPPING=1)

Change the unit file to specify `Type=notify` and replace master-worker
mode (`-W`) with master-worker mode with systemd support (`-Ws`).

Future evolutions of this feature could include making use of the `STATUS`
feature of `sd_notify()` to send information about the number of active
connections to systemd. This would require bidirectional communication
between the master and the workers and thus is left for future work.

BUG/MINOR: stream-int: don't try to read again when CF_READ_DONTWAIT is set

Commit 9aaf778 ("MAJOR: connection : Split struct connection into struct
connection and struct conn_stream.") had to change the way the stream
interface deals with incoming data to accomodate the mux. A break
statement got lost during a change, leading to the receive call being
performed twice even when CF_READ_DONTWAIT is set. The most noticeable
effect is that it made the bug described in commit 33982cb ("BUG/MAJOR:
stream: ensure analysers are always called upon close") much easier to
reproduce as it would appear even with an HTTP frontend.

Let's just restore the stream-interface flag and the break here, as in
the previous code.

No backport is needed as this was introduced during 1.8-dev.

BUG/MAJOR: stream: ensure analysers are always called upon close

A recent issue affecting HTTP/2 + redirect + cache has uncovered an old
problem affecting all existing versions regarding the way events are
reported to analysers.

It happens that when an event is reported, analysers see it and may
decide to temporarily pause processing and prevent other analysers from
processing the same event. Then the event may be cleared and upon the
next call to the analysers, some of them will never see it.

This is exactly what happens with CF_READ_NULL if it is received before
the request is processed, like during redirects : the first time, some
analysers see it, pause, then the event may be converted to a SHUTW and
cleared, and on next call, there's nothing to process. In practice it's
hard to get the CF_READ_NULL flag during the request because requests
have CF_READ_DONTWAIT, preventing the read0 from happening. But on
HTTP/2 it's presented along with any incoming request. Also on a TCP
frontend the flag is not set and it's possible to read the NULL before
the request is parsed.

This causes a problem when filters are present because flt_end_analyse
needs to be called to release allocated resources and remove the
CF_FLT_ANALYZE flag. And the loss of this event prevents the analyser
from being called and from removing itself, preventing the connection
from ever ending.

This problem just shows that the event processing needs a serious revamp
after 1.8. In the mean time we can deal with the really problematic case
which is that we *want* to call analysers if CF_SHUTW is set on any side
ad it's the last opportunity to terminate a processing. It may
occasionally result in some analysers being called for nothing in half-
closed situations but it will take care of the issue.

An example of problematic configuration triggering the bug in 1.7 is :

    frontend tcp
        bind :4445
        default_backend http

    backend http
        redirect location /
        compression algo identity

Then submitting requests which immediately close will have for effect
to accumulate streams which will never be freed :

   $ printf "GET / HTTP/1.1\r\n\r\n" >/dev/tcp/0/4445

This fix must be backported to 1.7 as well as any version where commit
c0c672a ("BUG/MINOR: http: Fix conditions to clean up a txn and to
handle the next request") was backported. This commit didn't cause the
bug but made it much more likely to happen.

BUG/MEDIUM: stream: don't automatically forward connect nor close

Upon stream instanciation, we used to enable channel auto connect
and auto close to ease TCP processing. But commit 9aaf778 ("MAJOR:
connection : Split struct connection into struct connection and
struct conn_stream.") has revealed that it was a bad idea because
this commit enables reading of the trailing shutdown that may follow
a small requests, resulting in a read and a shutr turned into shutw
before the stream even has a chance to apply the filters. This
causes an issue with impossible situations where the backend stream
interface is still in SI_ST_INI with a closed output, which blocks
some streams for example when performing a redirect with filters
enabled.

Let's change this so that we only enable these two flags if there is
no analyser on the stream. This way process_stream() has a chance to
let the analysers decide whether or not to allow the shutdown event
to be transferred to the other side.

It doesn't seem possible to trigger this issue before 1.8, so for now
it is preferable not to backport this fix.

[RELEASE] Released version 1.8-rc4

Released version 1.8-rc4 with the following main changes :
    - BUG/MEDIUM: cache: does not cache if no Content-Length
    - BUILD: thread/pipe: fix build without threads
    - BUG/MINOR: spoe: check buffer size before acquiring or releasing it
    - MINOR: debug/flags: Add missing flags
    - MINOR: threads: Use __decl_hathreads to declare locks
    - BUG/MINOR: buffers: Fix b_alloc_margin to be "fonctionnaly" thread-safe
    - BUG/MAJOR: ebtree/scope: fix insertion and removal of duplicates in scope-aware trees
    - BUG/MAJOR: ebtree/scope: fix lookup of next node in scope-aware trees
    - MINOR: ebtree/scope: add a function to find next node from a parent
    - MINOR: ebtree/scope: simplify the lookup functions by using eb32sc_next_with_parent()
    - BUG/MEDIUM: mworker: Fix re-exec when haproxy is started from PATH
    - BUG/MEDIUM: cache: use msg->sov to forward header
    - MINOR: cache: forward data with headers
    - MINOR: cache: disable cache if shctx_row_data_append fail
    - BUG/MINOR: threads: tid_bit must be a unsigned long
    - CLEANUP: tasks: Remove useless double test on rq_next
    - BUG/MEDIUM: standard: itao_str/idx and quote_str/idx must be thread-local
    - MINOR: tools: add a function to dump a scope-aware tree to a file
    - MINOR: tools: improve the DOT dump of the ebtree
    - MINOR: tools: emphasize the node being worked on in the tree dump
    - BUG/MAJOR: ebtree/scope: properly tag upper nodes during insertion
    - DOC: peers: Add a first version of peers protocol v2.1.
    - CONTRIB: Wireshark dissector for HAProxy Peer Protocol.
    - MINOR: mworker: display an accurate error when the reexec fail
    - BUG/MEDIUM: mworker: wait again for signals when execvp fail
    - BUG/MEDIUM: mworker: does not deinit anymore
    - BUG/MEDIUM: mworker: does not close inherited FD
    - MINOR: tests: add a python wrapper to test inherited fd
    - BUG/MINOR: Allocate the log buffers before the proxies startup
    - MINOR: tasks: Use a bitfield to track tasks activity per-thread
    - MAJOR: polling: Use active_tasks_mask instead of tasks_run_queue
    - MINOR: applets: Use a bitfield to track applets activity per-thread
    - MAJOR: polling: Use active_appels_mask instead of applets_active_queue
    - MEDIUM: applets: Don't process more than 200 active applets at once
    - MINOR: stream: Add thread-mask of tasks/FDs/applets in "show sess all" command
    - MINOR: SSL: Store the ASN1 representation of client sessions.
    - MINOR: ssl: Make sure we don't shutw the connection before the handshake.
    - BUG/MEDIUM: deviceatlas: ignore not valuable HTTP request data

BUG/MEDIUM: deviceatlas: ignore not valuable HTTP request data

A customer reported a crash when within the HTTP request some headers
were not set leading to the module to crash. So the module ignore them
since empty data have no value for the detection.
Needs to be backported to 1.7.

MINOR: ssl: Make sure we don't shutw the connection before the handshake.

Instead of trying to finish the handshake in ssl_sock_shutw, which may
fail, try not to shutdown until the handshake is finished.

MINOR: SSL: Store the ASN1 representation of client sessions.

Instead of storing the SSL_SESSION pointer directly in the struct server,
store the ASN1 representation, otherwise, session resumption is broken with
TLS 1.3, when multiple outgoing connections want to use the same session.

MINOR: stream: Add thread-mask of tasks/FDs/applets in "show sess all" command

MEDIUM: applets: Don't process more than 200 active applets at once

Now, we process at most 200 active applets per call to applet_run_active. We use
the same limit as the tasks. With the cache filter and the SPOE, the number of
active applets can now be huge. So, it is important to limit the number of
applets processed in applet_run_active.

MAJOR: polling: Use active_appels_mask instead of applets_active_queue

applets_active_queue is the active queue size. It is a global variable. So it is
underoptimized because we may be lead to consider there are active applets for a
thread while in fact all active applets are assigned to the otherthreads. So, in
such cases, the polling loop will be evaluated many more times than necessary.

Instead, we now check if the thread id is set in the bitfield active_applets_mask.

This is specific to threads, no backport is needed.

MINOR: applets: Use a bitfield to track applets activity per-thread

a bitfield has been added to know if there are runnable applets for a
thread. When an applet is woken up, the bits corresponding to its thread_mask
are set. When all active applets for a thread is get to be processed, the thread
is removed from active ones by unsetting its tid_bit from the bitfield.

MAJOR: polling: Use active_tasks_mask instead of tasks_run_queue

tasks_run_queue is the run queue size. It is a global variable. So it is
underoptimized because we may be lead to consider there are active tasks for a
thread while in fact all active tasks are assigned to the other threads. So, in
such cases, the polling loop will be evaluated many more times than necessary.

Instead, we now check if the thread id is set in the bitfield active_tasks_mask.

Another change has been made in process_runnable_tasks. Now, we always limit the
number of tasks processed to 200.

This is specific to threads, no backport is needed.

MINOR: tasks: Use a bitfield to track tasks activity per-thread

a bitfield has been added to know if there are runnable tasks for a thread. When
a task is woken up, the bits corresponding to its thread_mask are set. When all
tasks for a thread have been evaluated without any wakeup, the thread is removed
from active ones by unsetting its tid_bit from the bitfield.

BUG/MINOR: Allocate the log buffers before the proxies startup

Since the commit cd7879adc ("BUG/MEDIUM: threads: Run the poll loop on the main
thread too"), the log buffers are allocated after the proxies startup. So log
messages produced during this startup was ignored.

To fix the bug, we restore the initialization of these buffers before proxies
startup.

This is specific to threads, no backport is needed.

MINOR: tests: add a python wrapper to test inherited fd

BUG/MEDIUM: mworker: does not close inherited FD

At the end of the master initialisation, a call to protocol_unbind_all()
was made, in order to close all the FDs.

Unfortunately, this function closes the inherited FDs (fd@), upon reload
the master wasn't able to reload a configuration with those FDs.

The create_listeners() function now store a flag to specify if the fd
was inherited or not.

Replace the protocol_unbind_all() by mworker_cleanlisteners() +
deinit_pollers()

BUG/MEDIUM: mworker: does not deinit anymore

Does not use the deinit() function during a reload, it's dangerous and
might be subject to double free, segfault and hazardous behavior if
it's called twice in the case of a execvp fail.

BUG/MEDIUM: mworker: wait again for signals when execvp fail

After execvp fails, the signals were ignored, preventing to try a reload
again. It is now fixed by reaching the top of the mworker_wait()
function once the execvp failed.

MINOR: mworker: display an accurate error when the reexec fail

When the master worker fail the execvp, it returns the wrong error
"Cannot allocate memory".

We now display the accurate error corresponding to the errno value.

CONTRIB: Wireshark dissector for HAProxy Peer Protocol.

DOC: peers: Add a first version of peers protocol v2.1.

This documentation has to be completed.

BUG/MAJOR: ebtree/scope: properly tag upper nodes during insertion

Christopher found a case where some tasks would remain unseen in the run
queue and would spontaneously appear after certain apparently unrelated
operations performed by the other thread.

It's in fact the insertion which is not correct, the node serving as the
top of duplicate tree wasn't properly updated, just like the each top of
subtree in a duplicate tree. This had the effect that after some removals,
the incorrectly tagged node would hide the underlying ones, which would
then suddenly re-appear once they were removed.

This is 1.8-specific, no backport is needed.

MINOR: tools: emphasize the node being worked on in the tree dump

Now we can show in dotted red the node being removed or surrounded in red
a node having been inserted, and add a description on the graph related to
the operation in progress for example.