git.ipfire.org Git - thirdparty/haproxy.git/log

MINOR: tools: add a generic function to generate UUIDs

We currently have two UUID generation functions, one for the sample
fetch and the other one in the SPOE filter. Both were a bit complicated
since they were made to support random() implementations returning an
arbitrary number of bits, and were throwing away 33 bits every 64. Now
we don't need this anymore, so let's have a generic function consuming
64 bits at once and use it as appropriate.

MINOR: sample: make all bits random on the rand() sample fetch

The rand() sample fetch supports being limited to a certain range, but
it only uses 31 bits and scales them as requested, which means that when
the requested output range is larger than 31 bits, the least significant
one is not random and may even be constant.

Let's make use of the whole 32 bits now that we have access ot them.

BUG/MINOR: checks/threads: use ha_random() and not rand()

In order to honor spread_checks we currently call rand() which is not
thread safe and which must never turn its internal state to zero. This
is not thread safe, let's use ha_random() instead. This is a complement
to commimt 52bf839394 ("BUG/MEDIUM: random: implement a thread-safe and
process-safe PRNG") and may be backported with it.

MINOR: backend: use a single call to ha_random32() for the random LB algo

For the random LB algorithm we need a random 32-bit hashing key that used
to be made of two calls to random(). Now we can simply perform a single
call to ha_random32() and get rid of the useless operations.

BUG/MEDIUM: random: implement a thread-safe and process-safe PRNG

This is the replacement of failed attempt to add thread safety and
per-process sequences of random numbers initally tried with commit
1c306aa84d ("BUG/MEDIUM: random: implement per-thread and per-process
random sequences").

This new version takes a completely different approach and doesn't try
to work around the horrible OS-specific and non-portable random API
anymore. Instead it implements "xoroshiro128**", a reputedly high
quality random number generator, which is one of the many variants of
xorshift, which passes all quality tests and which is described here:

   http://prng.di.unimi.it/

While not cryptographically secure, it is fast and features a 2^128-1
period. It supports fast jumps allowing to cut the period into smaller
non-overlapping sequences, which we use here to support up to 2^32
processes each having their own, non-overlapping sequence of 2^96
numbers (~7*10^28). This is enough to provide 1 billion randoms per
second and per process for 2200 billion years.

The implementation was made thread-safe either by using a double 64-bit
CAS on platforms supporting it (x86_64, aarch64) or by using a local
lock for the time needed to perform the shift operations. This ensures
that all threads pick numbers from the same pool so that it is not
needed to assign per-thread ranges. For processes we use the fast jump
method to advance the sequence by 2^96 for each process.

Before this patch, the following config:
    global
        nbproc 8

    frontend f
        bind :4445
        mode http
        log stdout format raw daemon
        log-format "%[uuid] %pid"
        redirect location /

Would produce this output:
    a4d0ad64-2645-4b74-b894-48acce0669af 12987
    a4d0ad64-2645-4b74-b894-48acce0669af 12992
    a4d0ad64-2645-4b74-b894-48acce0669af 12986
    a4d0ad64-2645-4b74-b894-48acce0669af 12988
    a4d0ad64-2645-4b74-b894-48acce0669af 12991
    a4d0ad64-2645-4b74-b894-48acce0669af 12989
    a4d0ad64-2645-4b74-b894-48acce0669af 12990
    82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12987
    82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12992
    82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12986
    (...)

And now produces:
    f94b29b3-da74-4e03-a0c5-a532c635bad9 13011
    47470c02-4862-4c33-80e7-a952899570e5 13014
    86332123-539a-47bf-853f-8c8ea8b2a2b5 13013
    8f9efa99-3143-47b2-83cf-d618c8dea711 13012
    3cc0f5c7-d790-496b-8d39-bec77647af5b 13015
    3ec64915-8f95-4374-9e66-e777dc8791e0 13009
    0f9bf894-dcde-408c-b094-6e0bb3255452 13011
    49c7bfde-3ffb-40e9-9a8d-8084d650ed8f 13014
    e23f6f2e-35c5-4433-a294-b790ab902653 13012

There are multiple benefits to using this method. First, it doesn't
depend anymore on a non-portable API. Second it's thread safe. Third it
is fast and more proven than any hack we could attempt to try to work
around the deficiencies of the various implementations around.

This commit depends on previous patches "MINOR: tools: add 64-bit rotate
operators" and "BUG/MEDIUM: random: initialize the random pool a bit
better", all of which will need to be backported at least as far as
version 2.0. It doesn't require to backport the build fixes for circular
include files dependecy anymore.

MINOR: tools: add 64-bit rotate operators

This adds rotl64/rotr64 to rotate a 64-bit word by an arbitrary number
of bits. It's mainly aimed at being used with constants.

BUILD: cirrus-ci: get rid of unstable freebsd images

the only stable available freebsd image on cirrus is 12.1
let us drop all "snap" images, they are unusable for running tests
because of being fragile

Revert "BUG/MEDIUM: random: implement per-thread and per-process random sequences"

This reverts commit 1c306aa84d785b9c2240bf7767dcc1f2596cfcfd.

It breaks the build on all non-glibc platforms. I got confused by the
man page (which possibly is the most confusing man page I've ever read
about a standard libc function) and mistakenly understood that random_r
was portable, especially since it appears in latest freebsd source as
well but not in released versions, and with a slightly different API :-/

We need to find a different solution with a fallback. Among the
possibilities, we may reintroduce this one with a fallback relying on
locking around the standard functions, keeping fingers crossed for no
other library function to call them in parallel, or we may also provide
our own PRNG, which is not necessarily more difficult than working
around the totally broken up design of the portable API.

BUG/MEDIUM: random: implement per-thread and per-process random sequences

As mentioned in previous patch, the random number generator was never
made thread-safe, which used not to be a problem for health checks
spreading, until the uuid sample fetch function appeared. Currently
it is possible for two threads or processes to produce exactly the
same UUID. In fact it's extremely likely that this will happen for
processes, as can be seen with this config:

    global
        nbproc 8

    frontend f
        bind :4445
        mode http
        log stdout daemon format raw
        log-format "%[uuid] %pid"
        redirect location /

It typically produces this log:

  551ce567-0bfb-4bbd-9b58-cdc7e9365325 30645
  551ce567-0bfb-4bbd-9b58-cdc7e9365325 30641
  551ce567-0bfb-4bbd-9b58-cdc7e9365325 30644
  551ce567-0bfb-4bbd-9b58-cdc7e9365325 30639
  551ce567-0bfb-4bbd-9b58-cdc7e9365325 30646
  07764439-c24d-4e6f-a5a6-0138be59e7a8 30645
  07764439-c24d-4e6f-a5a6-0138be59e7a8 30639
  551ce567-0bfb-4bbd-9b58-cdc7e9365325 30643
  07764439-c24d-4e6f-a5a6-0138be59e7a8 30646
  b6773fdd-678f-4d04-96f2-4fb11ad15d6b 30646
  551ce567-0bfb-4bbd-9b58-cdc7e9365325 30642
  07764439-c24d-4e6f-a5a6-0138be59e7a8 30642

What this patch does is to use a distinct per-thread and per-process
seed to make sure the same sequences will not appear, and will then
extend these seeds by "burning" a number of randoms that depends on
the global random seed, the thread ID and the process ID. This adds
roughly 20 extra bits of randomness, resulting in 52 bits total per
thread and per process.

It only takes a few milliseconds to burn these randoms and given
that threads start with a different seed, we know they will not
catch each other. So these random extra bits are essentially added
to ensure randomness between boots and cluster instances.

This replaces all uses of random() with ha_random() which uses the
thread-local state.

This must be backported as far as 2.0 or any version having the
UUID sample-fetch function since it's the main victim here.

It's important to note that this patch, in addition to depending on
the previous one "BUG/MEDIUM: init: initialize the random pool a bit
better", also depends on the preceeding build fixes to address a
circular dependency issue in the include files that prevented it
from building. Part or all of these patches may need to be backported
or adapted as well.

BUG/MEDIUM: random: initialize the random pool a bit better

Since the UUID sample fetch was created, some people noticed that in
certain virtualized environments they manage to get exact same UUIDs
on different instances started exactly at the same moment. It turns
out that the randoms were only initialized to spread the health checks
originally, not to provide "clean" randoms.

This patch changes this and collects more randomness from various
sources, including existing randoms, /dev/urandom when available,
RAND_bytes() when OpenSSL is available, as well as the timing for such
operations, then applies a SHA1 on all this to keep a 160 bits random
seed available, 32 of which are passed to srandom().

It's worth mentioning that there's no clean way to pass more than 32
bits to srandom() as even initstate() provides an opaque state that
must absolutely not be tampered with since known implementations
contain state information.

At least this allows to have up to 4 billion different sequences
from the boot, which is not that bad.

Note that the thread safety was still not addressed, which is another
issue for another patch.

This must be backported to all versions containing the UUID sample
fetch function, i.e. as far as 2.0.

BUILD: listener: types/listener.h must not include standard.h

It's only a type definition, this header is not needed and causes
some circular dependency issues.

BUILD: freq_ctr: proto/freq_ctr needs to include common/standard.h

This is needed for div_64_32() which is there and currently accidently
inherited via global.h!

BUILD: global: must not include common/standard.h but only types/freq_ctr.h

This one was accidently inherited and used to work but causes a circular
dependency.

BUILD: ssl: include mini-clist.h

We use some list definitions and we don't include this header which
is in fact accidently inherited from others, causing a circular
dependency issue.

BUILD: buffer: types/{ring.h,checks.h} should include buf.h, not buffer.h

buffer.h relies on proto/activity because it contains some code and not
just type definitions. It must not be included from types files. It
should probably also be split in two if it starts to include a proto.
This causes some circular dependencies at other places.

BUG/MINOR: http-rules: Abort transaction when a redirect is applied on response

In the same way than for the request, when a redirect rule is applied the
transction is aborted. This must be done returning HTTP_RULE_RES_ABRT from
http_res_get_intercept_rule() function.

No backport needed because on previous versions, the action return values are
not handled the same way.

BUG/MINOR: rules: Increment be_counters if backend is assigned for a silent-drop

Backend counters must be incremented only if a backend was already assigned to
the stream (when the stream exists). Otherwise, it means we are still on the
frontend side.

This patch may be backported as far as 1.6.

BUG/MINOR: rules: Return ACT_RET_ABRT when a silent-drop action is executed

When an action interrupts a transaction, returning a response or not, it must
return the ACT_RET_ABRT value and not ACT_RET_STOP. ACT_RET_STOP is reserved to
stop the processing of the current ruleset.

No backport needed because on previous versions, the action return values are
not handled the same way.

BUG/MINOR: rules: Preserve FLT_END analyzers on silent-drop action

When at least a filter is attached to a stream, FLT_END analyzers must be
preserved on request and response channels.

This patch should be backported as far as 1.7.

MINOR: compression/filters: Initialize the comp filter when stream is created

Since the HTX mode is the only mode to process HTTP messages, the stream is
created for a uniq transaction. The keep-alive is handled at the mux level. So,
the compression filter can be initialized when the stream is created and
released with the stream. Concretly, .channel_start_analyze and
.channel_end_analyze callback functions are replaced by .attach and .detach
ones.

With this change, it is no longer necessary to call FLT_START_FE/BE and FLT_END
analysers for the compression filter.

MINOR: cache/filters: Initialize the cache filter when stream is created

Since the HTX mode is the only mode to process HTTP messages, the stream is
created for a uniq transaction. The keep-alive is handled at the mux level. So,
the cache filter can be initialized when the stream is created and released with
the stream. Concretly, .channel_start_analyze and .channel_end_analyze callback
functions are replaced by .attach and .detach ones.

With this change, it is no longer necessary to call FLT_START_FE/BE and FLT_END
analysers for the cache filter.

BUG/MINOR: http-rules: Fix a typo in the reject action function

A typo was introduced by the commit c5bb5a0f2 ("BUG/MINOR: http-rules: Preserve
FLT_END analyzers on reject action").

This patch must be backported with the commit c5bb5a0f2.

BUG/MINOR: http-rules: Preserve FLT_END analyzers on reject action

When at least a filter is attached to a stream, FLT_END analyzers must be
preserved on request and response channels.

This patch should be backported as far as 1.8.

BUG/MINOR: http-rules: Return ACT_RET_ABRT to abort a transaction

When an action interrupts a transaction, returning a response or not, it must
return the ACT_RET_ABRT value and not ACT_RET_DONE. ACT_RET_DONE is reserved to
stop the processing on the current channel but some analysers may still be
active. When ACT_RET_ABRT is returned, all analysers are removed, except FLT_END
if it is set.

No backport needed because on previous verions, the action return value was not
handled the same way.

It is stated in the comment the return action returns ACT_RET_ABRT on
success. It it the right code to use to abort a transaction. ACT_RET_DONE must
be used when the message processing must be stopped. This does not means the
transaction is interrupted.

No backport needed.

BUG/MINOR: lua: Init the lua wake_time value before calling a lua function

The wake_time of a lua context is now always set to TICK_ETERNITY when the
context is initialized and when everytime the execution of the lua stack is
started. It is mandatory to not set arbitrary wake_time when an action yields.

No backport needed.

MINOR: lua: Rename hlua_action_wake_time() to hlua_set_wake_time()

This function does not depends on the action class. So use a more generic
name. It will be easier to bind it on another class if necessary.

MINOR: lua: Remove the flag HLUA_TXN_HTTP_RDY

This flag was used in some internal functions to be sure the current stream is
able to handle HTTP content. It was introduced when the legacy HTTP code was
still there. Now, It is possible to rely on stream's flags to be sure we have an
HTX stream.

So the flag HLUA_TXN_HTTP_RDY can be removed. Everywhere it was tested, it is
replaced by a call to the IS_HTX_STRM() macro.

This patch is mandatory to allow the support of the filters written in lua.

MINOR: lua: Stop using lua txn in hlua_http_del_hdr() and hlua_http_add_hdr()

In these functions, the lua txn was not used. So it can be removed from the
function argument list.

This patch is mandatory to allow the support of the filters written in lua.

MINOR: lua: Stop using the lua txn in hlua_http_rep_hdr()

In this function, the lua txn was only used to retrieve the stream. But it can
be retieve from the HTTP message, using its channel pointer. So, the lua txn can
be removed from the function argument list.

This patch is mandatory to allow the support of the filters written in lua.

MINOR: lua: Stop using the lua txn in hlua_http_get_headers()

In this function, the lua txn was only used to test if the HTTP transaction is
defined. But it is always used in a context where it is true. So, the lua txn
can be removed from the function argument list.

This patch is mandatory to allow the support of the filters written in lua.

MINOR: lua: Add function to know if a channel is a response one

It is now possible to call Channel.is_resp(chn) method to know if a channel is a
response channel or not.

BUG/MINOR: lua: Ignore the reserve to know if a channel is full or not

The Lua function Channel.is_full() should not take care of the reserve because
it is not called from a producer (an applet for instance). From an action, it is
allowed to overwrite the buffer reserve.

This patch should be backported as far as 1.7. But it must be adapted for 1.8
and lower because there is no HTX on these versions.

BUG/MINOR: lua: Abort when txn:done() is called from a Lua action

When a lua action aborts a transaction calling txn:done() function, the action
must return ACT_RET_ABRT instead of ACT_RET_DONE. It is mandatory to
abort the message analysis.

This patch must be backported everywhere the commit 7716cdf45 ("MINOR: lua: Get
the action return code on the stack when an action finishes") was
backported. For now, no backport needed.

BUG/MINOR: http-ana: Reset request analysers on a response side error

When an error occurred on the response side, request analysers must be reset. At
this stage, only AN_REQ_HTTP_XFER_BODY analyser remains, and possibly
AN_REQ_FLT_END, if at least one filter is attached to the stream. So it is safe
to remove the AN_REQ_HTTP_XFER_BODY analyser. An error was already handled and a
response was already returned to the client (or it was at least scheduled to be
sent). So there is no reason to continue to process the request payload. It may
cause some troubles for the filters because when an error occurred, data from
the request buffer are truncated.

This patch must be backported as far as 1.9, for the HTX part only. I don't know
if the legacy HTTP code is affected.

BUG/MEDIUM: compression/filters: Fix loop on HTX blocks compressing the payload

During the payload filtering, the offset is relative to the head of the HTX
message and not its first index. This index is the position of the first block
to (re)start the HTTP analysis. It must be used during HTTP analysis but not
during the payload forwarding.

So, from the compression filter point of view, when we loop on the HTX blocks to
compress the response payload, we must start from the head of the HTX
message. To ease the loop, we use the function htx_find_offset().

This patch must be backported as far as 2.0. It depends on the commit "MINOR:
htx: Add a function to return a block at a specific an offset". So this one must
be backported first.

BUG/MEDIUM: cache/filters: Fix loop on HTX blocks caching the response payload

During the payload filtering, the offset is relative to the head of the HTX
message and not its first index. This index is the position of the first block
to (re)start the HTTP analysis. It must be used during HTTP analysis but not
during the payload forwarding.

So, from the cache point of view, when we loop on the HTX blocks to cache the
response payload, we must start from the head of the HTX message. To ease the
loop, we use the function htx_find_offset().

This patch must be backported as far as 2.0. It depends on the commit "MINOR:
htx: Add a function to return a block at a specific an offset". So this one must
be backported first.

BUG/MINOR: filters: Forward everything if no data filters are called

If a filter enable the data filtering, in TCP or in HTTP, but it does not
defined the corresponding callback function (so http_payload() or
tcp_payload()), it will be ignored. If all configured data filter do the same,
we must be sure to forward everything. Otherwise nothing will be forwarded at
all.

This patch must be forwarded as far as 1.9.

BUG/MINOR: filters: Use filter offset to decude the amount of forwarded data

When the tcp or http payload is filtered, it is important to use the filter
offset to decude the amount of forwarded data because this offset may change
during the call to the callback function. So we should not rely on a local
variable defined before this call.

For now, existing HAproxy filters don't change this offset, so this bug may only
affect external filters.

This patch must be forwarded as far as 1.9.

MINOR: flt_trace: Use htx_find_offset() to get the available payload length

The trace_get_htx_datalen() function now uses htx_find_offset() to get the
payload length, ie. the length of consecutives DATA blocks.

MINOR: htx: Use htx_find_offset() to truncate an HTX message

The htx_truncate() function now uses htx_find_offset() to find the first block
to start the truncation.

MINOR: htx: Add a function to return a block at a specific offset

The htx_find_offset() function may be used to look for a block at a specific
offset in an HTX message, starting from the message head. A compound result is
returned, an htx_ret structure, with the found block and the position of the
offset in the block. If the offset is ouside of the HTX message, the returned
block is NULL.

MINOR: buf: Add function to insert a string at an absolute offset in a buffer

The b_insert_blk() function may now be used to insert a string, given a pointer
and the string length, at an absolute offset in a buffer, moving data between
this offset and the buffer's tail just after the end of the inserted string. The
buffer's length is automatically updated. This function supports wrapping. All
the string is copied or nothing. So it returns 0 if there are not enough space
to perform the copy. Otherwise, the number of bytes copied is returned.

CLEANUP: proxy_protocol: Use `size_t` when parsing TLVs

Change `int` to `size_t` for consistency.

BUG/MAJOR: proxy_protocol: Properly validate TLV lengths

This patch fixes PROXYv2 parsing when the payload of the TCP connection is
fused with the PROXYv2 header within a single recv() call.

Previously HAProxy ignored the PROXYv2 header length when attempting to
parse the TLV, possibly interpreting the first byte of the payload as a
TLV type.

This patch adds proper validation. It ensures that:

1. TLV parsing stops when the end of the PROXYv2 header is reached.
2. TLV lengths cannot exceed the PROXYv2 header length.
3. The PROXYv2 header ends together with the last TLV, not allowing for
"stray bytes" to be ignored.

A reg-test was added to ensure proper behavior.

This patch tries to find the sweat spot between a small and easily
backportable one, and a cleaner one that's more easily adaptable to
older versions, hence why it merges the "if" and "while" blocks which
causes a reindent of the whole block. It should be used as-is for
versions 1.9 to 2.1, the block about PP2_TYPE_AUTHORITY should be
dropped for 2.0 and the block about CRC32C should be dropped for 1.8.

This bug was introduced when TLV parsing was added. This happened in commit
b3e54fe387c7c1ea750f39d3029672d640c499f9. This commit was first released
with HAProxy 1.6-dev1.

A similar issue was fixed in commit 7209c204bd6f3c49132264c7a58f689cdc741c12.

This patch must be backported to HAProxy 1.6+.

BUG/MINOR: init: make the automatic maxconn consider the max of soft/hard limits

James Stroehmann reported something working as documented but that can
be considered as a regression in the way the automatic maxconn is
calculated from the process' limits :

  https://www.mail-archive.com/haproxy@formilux.org/msg36523.html

The purpose of the changes in 2.0 was to have maxconn default to the
highest possible value permitted to the user based on the ulimit -n
setting, however the calculation starts from the soft limit, which
can be lower than what users were allowed to with previous versions
where the default value of 2000 would force a higher ulimit -n as
long as it fitted in the hard limit.

Usually this is not noticeable if the user changes the limits, because
quite commonly setting a new value restricts both the soft and hard
values.

Let's instead always use the max between the hard and soft limits, as
we know these values are permitted. This was tried on the following
setup:

  $ cat ulimit-n.cfg
  global
    stats socket /tmp/sock1 level admin
  $ ulimit -n
  1024

Before the change the limits would show like this:

  $ socat - /tmp/sock1 <<< "show info" | grep -im2 ^Max
  Maxsock: 1023
  Maxconn: 489

After the change the limits are now much better and more in line with
the default settings in earlier versions:

  $ socat - /tmp/sock1 <<< "show info" | grep -im2 ^Max
  Maxsock: 4095
  Maxconn: 2025

The difference becomes even more obvious when running moderately large
configs with hundreds of checked servers and hundreds of listeners:

  $ cat ulimit-n.cfg
  global
    stats socket /tmp/sock1 level admin

  listen l
    bind :10000-10300
    server-template srv- 300 0.0.0.0 check disabled

          Before   After
  Maxsock  1024    4096
  Maxconn  189     1725

This issue is tagged as minor since a trivial config change fixes it,
but it would help new users to have it backported as far as 2.0.

DOC: assorted typo fixes in the documentation and Makefile

This is another round of cleanups in various docs and comments in the
Makefile.

DOC: configuration.txt: fix various typos

This was done using automatic spellcheck.

OPTIM: startup: fast unique_id allocation for acl.

pattern_finalize_config() uses an inefficient algorithm which is a
problem with very large configuration files. This affects startup, and
therefore reload time. When haproxy is deployed as a router in a
Kubernetes cluster the generated configuration file may be large and
reloads are frequently occuring, which makes this a significant issue.

The old algorithm is O(n^2)
* allocate missing uids - O(n^2)
* sort linked list - O(n^2)

The new algorithm is O(n log n):
* find the user allocated uids - O(n)
* store them for efficient lookup - O(n log n)
* allocate missing uids - n times O(log n)
* sort all uids - O(n log n)
* convert back to linked list - O(n)

Performance examples, startup time in seconds:

    pat_refs old     new
    1000      0.02   0.01
    10000     2.1    0.04
    20000    12.3    0.07
    30000    27.9    0.10
    40000    52.5    0.14
    50000    77.5    0.17

Please backport to 1.8, 2.0 and 2.1.

MEDIUM: stream: Make the `unique_id` member of `struct stream` a `struct ist`

The `unique_id` member of `struct stream` now is a `struct ist`.

MINOR: proxy: Make `header_unique_id` a `struct ist`

The `header_unique_id` member of `struct proxy` now is a `struct ist`.

MINOR: ist: Add `struct ist istdup(const struct ist)`

istdup() performs the equivalent of strdup() on a `struct ist`.

CLEANUP: Use `isttest()` and `istfree()`

This adjusts a few locations to make use of `isttest()` and `istfree()`.

MINOR: ist: Add `struct ist istalloc(size_t)` and `void istfree(struct ist*)`

`istalloc` allocates memory and returns an `ist` with the size `0` that points
to this allocation.

`istfree` frees the pointed memory and clears the pointer.

MINOR: ist: Add `int isttest(const struct ist)`

`isttest` returns whether the `.ptr` is non-null.

MINOR: ist: Add `IST_NULL` macro

`IST_NULL` is equivalent to an `struct ist` with `.ptr = NULL` and
`.len = 0`.

MINOR: debug: add CLI command "debug dev write" to write an arbitrary size

This command is used to produce an arbitrary amount of data on the
output. It can be used to test the CLI's state machine as well as
the internal parts related to applets an I/O. A typical test consists
in asking for all sizes from 0 to 16384:

  $ (echo "prompt;expert-mode on";for i in {0..16384}; do
     echo "debug dev write $i"; done) | socat - /tmp/sock1 | wc -c
  134258738

A better test would consist in first waiting for the response before
sending a new request.

This command is not restricted to the admin since it's harmless.

BUG/MINOR: ssl-sock: do not return an uninitialized pointer in ckch_inst_sni_ctx_to_sni_filters

There's a build error reported here:
https://github.com/haproxy/haproxy/commit/c9c6cdbf9c0e61ee88d68960a3220a4fcbf912cd/checks?check_suite_id=501365673

It's just caused by an inconditional assignment of tmp_filter to
*sni_filter without having been initialized, though it's harmless because
this return pointer is not used when fcount is NULL, which is the only
case where this happens.

No backport is needed as this was brought today by commit 38df1c8006
("MINOR: ssl/cli: support crt-list filters").

DOC: fix incorrect indentation of http_auth_*

These ones were incorrectly indented and thus not displayed optimally
in the HTML version. This addresses issue #533.

MINOR: ssl: reach a ckch_store from a sni_ctx

It was only possible to go down from the ckch_store to the sni_ctx but
not to go up from the sni_ctx to the ckch_store.

To allow that, 2 pointers were added:

- a ckch_inst pointer in the struct sni_ctx
- a ckckh_store pointer in the struct ckch_inst

MINOR: ssl/cli: support crt-list filters

Generate a list of the previous filters when updating a certificate
which use filters in crt-list. Then pass this list to the function
generating the sni_ctx during the commit.

This feature allows the update of the crt-list certificates which uses
the filters with "set ssl cert".

This function could be probably replaced by creating a new
ckch_inst_new_load_store() function which take the previous sni_ctx list as
an argument instead of the char **sni_filter, avoiding the
allocation/copy during runtime for each filter. But since are still
handling the multi-cert bundles, it's better this way to avoid code
duplication.

BUG/MINOR: connection/debug: do not enforce !event_type on subscribe() anymore

When building with DEBUG_STRICT, there are still some BUG_ON(events&event_type)
in the subscribe() code which are not welcome anymore since we explicitly
permit to wake the caller up on readiness. This causes some regtests to fail
since 2c1f37d353 ("OPTIM: mux-h1: subscribe rather than waking up at a few
other places") when built with this option.

No backport is needed, this is 2.2-dev.

MINOR: stream: Use stream_generate_unique_id

This patch replaces the ad-hoc generation of stream's `unique_id` values
by calls to `stream_generate_unique_id`.

MINOR: stream: Add stream_generate_unique_id function

Currently unique IDs for a stream are generated using repetitive code in
multiple locations, possibly allowing for inconsistent behavior.

REGTEST: Add unique-id reg-test

This reg-test verifies the following behavior:

1. That unique IDs are stable (i.e. the bug fixed in 530408f976e5fe2f2f2b4b733b39da36770b566f)
2. That unique IDs can use values from the HTTP request (see https://www.mail-archive.com/haproxy@formilux.org/msg36436.html)

OPTIM: mux-h1: subscribe rather than waking up at a few other places

This is another round of conversion from a blind tasklet_wakeup() to
a more careful subscribe(). It has significantly improved the number
of function calls per HTTP request (/?s=1k/t=20) :
                        before    after
tasklet_wakeup:          3         2
conn_subscribe:          3         2
h1_iocb:                 3         2
h1_process:              3         2
h1_parse_msg_hdrs:       4         3
h1_rcv_buf:              5         3
h1_send:                 5         4
h1_subscribe:            2         1
h1_wake_stream_for_send: 5         4
http_wait_for_request:   2         1
process_stream:          3         2
si_cs_io_cb:             4         2
si_cs_process:           3         1
si_cs_rcv:               5         3
si_sync_send:            2         1
si_update_both:          2         1
stream_int_chk_rcv_conn: 3         2
stream_int_notify:       3         1
stream_release_buffers:  9         4

OPTIM: connection: disable receiving on disabled events when the run queue is too high

In order to save a lot on syscalls, we currently don't disable receiving
on a file descriptor anymore if its handler was already woken up. But if
the run queue is huge and the poller collects a lot of events, this causes
excess wakeups which take CPU time which is not used to flush these tasklets.
This patch simply considers the run queue size to decide whether or not to
stop receiving. Tests show that by stopping receiving when the run queue
reaches ~16 times its configured size, we can still hold maximal performance
in extreme situations like maxpollevents=20k for runqueue_depth=2, and still
totally avoid calling epoll_event under moderate load using default settings
on keep-alive connections.

MEDIUM: connection: only call ->wake() for connect() without I/O

We used to call ->wake() for any I/O event for which there was no
subscriber. But this is a problem because this causes massive wake()
storms since we disabled fd_stop_recv() to save syscalls.

The only reason for the io_available condition is to detect that an
asynchronous connect() just finished and will not be handled by any
registered event handler. Since we now properly handle synchronous
connects, we can detect this situation by the fact that we had a
success on conn_fd_check() and no requested I/O took over.

MINOR: tcp/uxst/sockpair: only ask for I/O when really waiting for a connect()

Now that the stream-interface properly handles synchonous connects, there
is no more reason for subscribing and doing nothing.

MEDIUM: stream-int: make sure to try to immediately validate the connection

In the rare case of immediate connect() (unix sockets, socket pairs, and
occasionally TCP over the loopback), it is counter-productive to subscribe
for sending and then getting immediately back to process_stream() after
having passed through si_cs_process() just to update the connection. We
already know it is established and it doesn't have any handshake anymore
so we just have to complete it and return to process_stream() with the
stream_interface in the SI_ST_RDY state. In this case, process_stream will
simply loop back to the beginning to synchronize the state and turn it to
SI_ST_EST/ASS/CLO/TAR etc.

This will save us from having to needlessly subscribe in the connect()
code, something which in addition cannot work with edge-triggered pollers.

BUG/MEDIUM: connection: stop polling for sending when the event is ready

With commit 065a025610 ("MEDIUM: connection: don't stop receiving events
in the FD handler") we disabled a number of fd_stop_* in conn_fd_handler(),
in order to wait for their respective handlers to deal with them. But it
is not correct to do that for the send direction, as we may very well
have nothing to send. This is visible when connecting in TCP mode to
a server with no data to send, there's nobody anymore to disable the
polling for the send direction.

And it is logical, on the recv() path we know the system has data to
deliver and that some code will be in charge of it. On the send
direction we simply don't know if it was the result of a successful
connect() or if there is still something to send. In any case we
almost never fill the network buffer on a single send() after being
woken up by the system, so disabling the FD immediately or much later
will not change the number of operations.

No backport is needed, this is 2.2-dev.

BUILD: makefile: do not modify the build options during make reg-tests

I'm quite fed up with having to rebuild everything from scratch after each
and every "make reg-tests", especially during bisects. The only reason for
this is that there are no build options passed to make for reg-tests, which
modifies the .build_opts file again, resulting in a change upon next build.
Let's just keep this file out of the dependency check for make reg-tests.

CLEANUP: contrib/spoa_example: Fix several typos

This patch can be backported as far as 1.8.

BUILD: tools: rely on __ELF__ not USE_DL to enable use of dladdr()

The approach was wrong. USE_DL is for the makefile to know if it's required
to append -ldl at link time. Some platforms do not need it (and in fact do
not have it) yet they have a working dladdr(). The real condition is related
to ELF. Given that due to Lua, all platforms that require -ldl already have
USE_DL set, let's replace USE_DL with __ELF__ here and consider that dladdr
is always needed on ELF, which basically is already the case.

BUILD: tools: unbreak resolve_sym_name() on non-GNU platforms

resolve_sym_name() doesn't build when USE_DL is set on non-GNU platforms
because "Elf(W)" isn't defined. Since it's only used for dladdr1(), let's
refactor all this so that we can completely ifdef out that part on other
platforms. Now we have a separate function to perform the call depending
on the platform and it only returns the size when available.

MINOR: debug: dump the whole trace if we can't spot the starting point

Instead of special-casing the use of the symbol resolving to decide
whether to dump a partial or complete trace, let's simply start over
and dump everything when we reach the end after having found nothing.
It will be more robust against dirty traces as well.

MINOR: debug: use our own backtrace function on clang+x86_64

A test on FreeBSD with clang 4 to 8 produces this on a call to a
spinning loop on the CLI:

  call trace(5):
  |       0x53e2bc [eb 16 48 63 c3 48 c1 e0]: wdt_handler+0x10c
  |    0x800e02cfe [e8 5d 83 00 00 8b 18 8b]: libthr:pthread_sigmask+0x53e

with our own function it correctly produces this:

  call trace(20):
  |       0x53e2dc [eb 16 48 63 c3 48 c1 e0]: wdt_handler+0x10c
  |    0x800e02cfe [e8 5d 83 00 00 8b 18 8b]: libthr:pthread_sigmask+0x53e
  |    0x800e022bf [48 83 c4 38 5b 41 5c 41]: libthr:pthread_getspecific+0xdef
  | 0x7ffffffff003 [48 8d 7c 24 10 6a 00 48]: main+0x7fffffb416f3
  |    0x801373809 [85 c0 0f 84 6f ff ff ff]: libc:__sys_gettimeofday+0x199
  |    0x801373709 [89 c3 85 c0 75 a6 48 8b]: libc:__sys_gettimeofday+0x99
  |    0x801371c62 [83 f8 4e 75 0f 48 89 df]: libc:gettimeofday+0x12
  |       0x51fa0a [48 89 df 4c 89 f6 e8 6b]: ha_thread_dump_all_to_trash+0x49a
  |       0x4b723b [85 c0 75 09 49 8b 04 24]: mworker_cli_sockpair_new+0xd9b
  |       0x4b6c68 [85 c0 75 08 4c 89 ef e8]: mworker_cli_sockpair_new+0x7c8
  |       0x532f81 [4c 89 e7 48 83 ef 80 41]: task_run_applet+0xe1

So let's add clang+x86_64 to the list of platforms that will use our
simplified version. As a bonus it will not require to link with
-lexecinfo on FreeBSD and will work out of the box when passing
USE_BACKTRACE=1.

MINOR: debug: improve backtrace() on aarch64 and possibly other systems

It happens that on aarch64 backtrace() only returns one entry (tested
with gcc 4.7.4, 5.5.0 and 7.4.1). Probably that it refrains from unwinding
the stack due to the risk of hitting a bad pointer. Here we can use
may_access() to know when it's safe, so we can actually unwind the stack
without taking risks. It happens that the faulting function (the one
just after the signal handler) is not listed here, very likely because
the signal handler uses a special stack and did not create a new frame.

So this patch creates a new my_backtrace() function in standard.h that
either calls backtrace() or does its own unrolling. The choice depends
on HA_HAVE_WORKING_BACKTRACE which is set in compat.h based on the build
target.

MINOR: debug: report the number of entries in the backtrace

It's useful to get an indication of unresolved stuff or memory
corruption to have the apparent depth of the stack trace in the
output, especially if we dump nothing.

MINOR: wdt: do not depend on USE_THREAD

There is no reason for restricting the use of the watchdog to threads
anymore, as it works perfectly without threads as well.

MEDIUM: wdt: fall back to CLOCK_REALTIME if CLOCK_THREAD_CPUTIME is not available

At least FreeBSD has a fully functional CLOCK_THREAD_CPUTIME but it
cannot create a timer on it. This is not a problem since our timer is
only used to measure each thread's usage using now_cpu_time_thread().
So by just replacing this clock with CLOCK_REALTIME we allow such
platforms to periodically call the wdt and check the thread's CPU usage.
The consequence is that even on a totally idle system there will still
be a few extra periodic wakeups, but the watchdog becomes usable there
as well.

BUILD: Makefile: include librt before libpthread

Statically building on for i386/x86_64 on linux+glibc 2.18 fails in rt with
undefined references to pthread_attr_init and a few others. Let's just swap
the two libs in order to fix this.

BUG/MINOR: wdt: do not return an error when the watchdog couldn't be enabled

On operating systems not supporting to create a timer on
POSIX_THREAD_CPUTIME we emit a warning but we return an error so the
process fails to start, which is absurd. Let's return a success once
the warning is emitted instead.

This may be backported to 2.1 and 2.0.

MINOR: ssl: add "ca-verify-file" directive

It's only available for bind line. "ca-verify-file" allows to separate
CA certificates from "ca-file". CA names sent in server hello message is
only compute from "ca-file". Typically, "ca-file" must be defined with
intermediate certificates and "ca-verify-file" with certificates to
ending the chain, like root CA.

Fix issue #404.

MINOR: debug: call backtrace() once upon startup

Calling backtrace() will access libgcc at runtime. We don't want to do
it after the chroot, so let's perform a first call to have it ready in
memory for later use.

MEDIUM: debug: add support for dumping backtraces of stuck threads

When a panic() occurs due to a stuck thread, we'll try to dump a
backtrace of this thread if the config directive USE_BACKTRACE is
set (which is the case on linux+glibc). For this we use the
backtrace() call provided by glibc and iterate the pointers through
resolve_sym_name(). In order to minimize the output (which is limited
to one buffer), we only do this for stuck threads, and we start the
dump above ha_panic()/ha_thread_dump_all_to_trash(), and stop when
meeting known points such as main/run_tasks_from_list/run_poll_loop.

If enabled without USE_DL, the dump will be complete with no details
except that pointers will all be given relative to main, which is
still better than nothing.

The new USE_BACKTRACE config option is enabled by default on glibc since
it has been present for ages. When it is set, the export-dynamic linker
option is enabled so that all non-static symbols are properly resolved.

MINOR: cli: make "show fd" rely on resolve_sym_name()

This way we can drop all hard-coded iocb matching.

MINOR: debug: use resolve_sym_name() to dump task handlers

Now in "show threads", the task/tasklet handler will be resolved
using this function, which will provide more detailed results and
will still support offsets to main for unresolved symbols.

MINOR: tools: add resolve_sym_name() to resolve function pointers

We use various hacks at a few places to try to identify known function
pointers in debugging outputs (show threads & show fd). Let's centralize
this into a new function dedicated to this. It already knows about the
functions matched by "show threads" and "show fd", and when built with
USE_DL, it can rely on dladdr1() to resolve other functions. There are
some limitations, as static functions are not resolved, linking with
-rdynamic is mandatory, and even then some functions will not necessarily
appear. It's possible to do a better job by rebuilding the whole symbol
table from the ELF headers in memory but it's less portable and the gains
are still limited, so this solution remains a reasonable tradeoff.

MINOR: tools: add new function dump_addr_and_bytes()

This function dumps <n> bytes from <addr> in hex form into buffer <buf>
enclosed in brackets after the address itself, formatted on 14 chars
including the "0x" prefix. This is meant to be used as a prefix for code
areas. For example: "0x7f10b6557690 [48 c7 c0 0f 00 00 00 0f]: "
It relies on may_access() to know if the bytes are dumpable, otherwise "--"
is emitted. An optional prefix is supported.

BUILD: tools: remove obsolete and conflicting trace() from standard.c

Since commit 4c2ae48375 ("MINOR: trace: implement a very basic trace()
function") merged in 2.1, trace() is an inline function. It must not
appear in standard.c anymore and may break build depending on includes.

This can be backported to 2.1.

MINOR: task: export run_tasks_from_list

This will help refine debug traces.

MINOR: haproxy: export run_poll_loop

This will help refine debug traces.

MINOR: haproxy: export main to ease access from debugger

Better just export main instead of declaring it as extern, it's cleaner
and may be usable elsewhere.

BUG/MEDIUM: debug: make the debug_handler check for the thread in threads_to_dump

It happens that just sending the debug signal to the process makes on
thread wait for its turn while nobody wants to dump. We need to at
least verify that a dump was really requested for this thread.

This can be backported to 2.1 and 2.0.

MINOR: debug: report the task handler's pointer relative to main

Often in crash dumps we see unknown function pointers. Let's display
them relative to main, that helps quite a lot figure the function
from an executable, for example:

(gdb) x/a main+645360
0x4c56a0 <h1_timeout_task>: 0x2e6666666666feeb

This could be backported to 2.0.

MINOR: tools: make sure to correctly check the returned 'ms' in date2std_log

In commit 4eee38a ("BUILD/MINOR: tools: fix build warning in the date
conversion functions") we added some return checks to shut build
warnings but the last test is useless since the tested pointer is not
updated by the last call to utoa_pad() used to convert the milliseconds.
It turns out the original code from 2012 already skipped this part,
probably in order to avoid the risk of seeing a millisecond field not
belonging to the 0-999 range. Better keep the check and put the code
into stricter shape.

No backport is needed. This fixes issue #526.

BUG/MINOR: arg: don't reject missing optional args

Commit 80b53ffb1c ("MEDIUM: arg: make make_arg_list() stop after its
own arguments") changed the way we detect the empty list because we
cannot stop by looking up the closing parenthesis anymore, thus for
the first missing arg we have to enter the parsing loop again. And
there, finding an empty arg means we go to the empty_err label, where
it was not initially planned to handle this condition. This results
in %[date()] to fail while %[date] works. Let's simply check if we've
reached the minimally supported args there (it used to be done during
the function entry).

Thanks to Jérôme for reporting this issue. No backport is needed,
this is 2.2-dev2+ only.

MEDIUM: mux-h1: do not blindly wake up the tasklet at end of request anymore

Since commit "MEDIUM: connection: make the subscribe() call able to wakeup
if ready" we have the guarantee that the tasklet will be woken up if
subscribing to a connection for an even that's ready. Since we have too
many tasklet_wakeup() calls in mux-h1, let's now use this property to
improve the situation a bit.

With this change, no syscall count changed, however the number of useless
calls to some functions significantly went down. Here are the differences
for the test below (100k req), in number of calls per request :

  $ ./h1load -n 100000 -t 4 -c 1000 -T 20 -F 127.0.0.1:8001/?s=1k/t=20

                           before   after  change   note
  tasklet_wakeup:           3        1      -66%
  h1_io_cb:                 4        3      -25%
  h1_send:                  6.7      5.4    -19%
  h1_wake:                  0.73     0.44   -39%
  h1_process:               4.7      3.4    -27%
  h1_wake_stream_for_send:  6.7      5.5    -18%
  si_cs_process             3.7      3.4     -7.8%
  conn_fd_handler           2.7      2.4    -10%
  raw_sock_to_buf:          4        2      -50%
  pool_free:                4        2      -50%    from failed rx calls

Note that the situation could be further improved by having muxes lazily
subscribe to Rx events in case the FD is already being polled. However
this requires deeper changes to implement a LAZY_RECV subscribe mode,
and to replace the FD's active bit by 3 states representing the desired
action to perform on the FD during the update, among NONE (no need to
change), POLL (can't proceed without), and STOP (buffer full). This
would only impact Rx since on Tx we know what we have to send. The
savings to expect from this might be more visible with splicing and/or
when dealing with many connections having long think times.

MEDIUM: connection: don't stop receiving events in the FD handler

The remaining epoll_ctl() calls are exclusively caused by the disagreement
between conn_fd_handler() and the mux receiving the data: the fd handler
wants to stop after having woken up the tasklet, then the mux after
receiving data wants to receive again. Given that they don't happen in
the same poll loop when there are many FDs, this causes a lot of state
changes.

As suggested by Olivier, if the task is already scheduled for running,
we don't need to disable the event because it's in the run queue, poll()
cannot stop, and reporting it again will be harmless. What *might*
happen however is that a sampling-based poller like epoll() would report
many times the same event and has trouble getting others behind. But if
it would happen, it would still indicate the run queue has plenty of
pending operations, so it would in fact only displace the problem from
the poller to the run queue, which doesn't seem to be worse (and in
fact we do support priorities while the poller does not).

By doing this change, the keep-alive test with 1k conns and 100k reqs
completely gets rid of the per-request epoll_ctl changes, while still
not causing extra recvfrom() :

  $ ./h1load -n 100000 -t 4 -c 1000 -T 20 -F 127.0.0.1:8001/?s=1k/t=20

  200000 sendto 1
  200000 recvfrom 1
   10762 epoll_wait 1
    3664 epoll_ctl 1
    1999 recvfrom -1

In close mode, it didn't change anything, we're still in the optimal
case (2 epoll per connection) :

  $ ./h1load -n 100000 -r 1 -t 4 -c 1000 -T 20 -F 127.0.0.1:8001/?s=1k/t=20

  203764 epoll_ctl 1
  200000 sendto 1
  200000 recvfrom 1
    6091 epoll_wait 1
    2994 recvfrom -1

MEDIUM: connection: make the subscribe() call able to wakeup if ready

There's currently an internal API limitation at the connection layer
regarding conn_subscribe(). We must not subscribe if we haven't yet
met EAGAIN or such a condition, so we sometimes force ourselves to read
in order to meet this condition and being allowed to call subscribe.
But reading cannot always be done (e.g. at the end of a loop where we
cannot afford to retrieve new data and start again) so we instead
perform a tasklet_wakeup() of the requester's io_cb. This is what is
done in mux_h1 for example. The problem with this is that it forces
a new receive when we're not necessarily certain we need one. And if
the FD is not ready and was already being polled, it's a useless
wakeup.

The current patch improves the connection-level subscribe() so that
it really manipulates the polling if the FD is marked not-ready, but
instead schedules the argument tasklet for a wakeup if the FD is
ready. This guarantees we'll wake this tasklet up in any case once the
FD is ready, either immediately or after polling.

By doing so, a test on pure close mode shows we cut in half the number
of epoll_ctl() calls and almost eliminate failed recvfrom():

  $ ./h1load -n 100000 -r 1 -t 4 -c 1000 -T 20 -F 127.0.0.1:8001/?s=1k/t=20

  before:
   399464 epoll_ctl 1
   200007 recvfrom 1
   200000 sendto 1
   100000 recvfrom -1
     7508 epoll_wait 1

  after:
   205739 epoll_ctl 1
   200000 sendto 1
   200000 recvfrom 1
     6084 epoll_wait 1
     2651 recvfrom -1

On keep-alive there is no change however.