git.ipfire.org Git - thirdparty/haproxy.git/log

MEDIUM: connections: Move some fields from struct connection to ssl_sock_ctx.

Move xprt_st, tmp_early_data and sent_early_data from struct connection to
struct ssl_sock_ctx, as they are only used in the SSL code.

MEDIUM: ssl: Give ssl_sock its own context.

Instead of using directly a SSL * as xprt_ctx, give ssl_sock its own context.
It's useless for now, but will be useful later when we'll want to be able to
stack xprts.

MEDIUM: tasks: Use __ha_barrier_store after modifying global_tasks_mask.

Now that we no longer use atomic operations to update global_tasks_mask,
as it's always modified while holding the TASK_RQ_LOCK, we have to use
__ha_barrier_store() instead of __ha_barrier_atomic_store() to ensure
any modification of global_tasks_mask is seen before modifying
active_tasks_mask.

This should be backported to 1.9.

BUG/MINOR: mworker: disable busy polling in the master process

When enabling busy polling, we don't want the master to use it, or it
wastes a dedicated processor to this!

Must be backported to 1.9.

MINOR: contrib/prometheus-exporter: Follow best practices about metrics type

In short, _total metrics are now counters and others are gauges.

No backport needed. See issue #81 on github.

MINOR: contrib/prometheus-exporter: Rename some metrics to be more usable

Some metrics have been renamed and their type adapted to be more usable in
Prometheus:

  * haproxy_process_uptime_seconds -> haproxy_process_start_time_seconds
  * haproxy_process_max_memory -> haproxy_process_max_memory_bytes
  * haproxy_process_pool_allocated_total -> haproxy_process_pool_allocated_bytes
  * haproxy_process_pool_used_total -> haproxy_process_pool_used_bytes
  * haproxy_process_ssl_cache_lookups -> haproxy_process_ssl_cache_lookups_total
  * haproxy_process_ssl_cache_misses -> haproxy_process_ssl_cache_misses_total

No backport needed. See issue #81 on github.

MINOR: contrib/prometheus-exporter: Remove usless rate metrics

Following metrics have been removed:

  * haproxy_frontend_connections_rate_current (ST_F_CONN_RATE)
  * haproxy_frontend_http_requests_rate_current (ST_F_REQ_RATE)
  * haproxy_*_current_session_rate (ST_F_RATE)

These rates can be deduced using the total value with this kind of formula:

  rate(haproxy_frontend_connections_total[1m])

No backport needed. See issue #81 on github.

BUG/MINOR: contrib/prometheus-exporter: Fix a typo in the run-queue metric type

No backport needed.

MEDIUM: tasks: Don't account a destroyed task as a runned task.

In process_runnable_tasks(), if the task we're about to run has been
destroyed, and should be free, don't account for it in the number of task
we ran. We're only allowed a maximum number of tasks to run per call to
process_runnable_tasks(), and freeing one shouldn't take the slot of a
valid task.

MEDIUM: tasks: Merge task_delete() and task_free() into task_destroy().

task_delete() was never used without calling task_free() just after, and
task_free() was only used on error pathes to destroy a just-created task,
so merge them into task_destroy(), that will remove the task from the
wait queue, and make sure the task is either destroyed immediately if it's
not in the run queue, or destroyed when it's supposed to run.

CLEANUP: task: remain consistent when using the task's handler

A pointer "process" is assigned the task's handler in
process_runnable_tasks(), we have no reason to use t->process
right after it is assigned.

MINOR: task/thread: factor out a wake-up condition

The wakeup condition in task_wakeup() is redundant as it is already
validated by the CAS. Better move the __task_wakeup() call there, it
also has the merit of being easier to audit this way. This also reduces
the code size by around 1.8 kB :

  $ size haproxy-?
     text    data     bss     dec     hex filename
  2153806  100208 1307676 3561690  3658da haproxy-1
  2152094  100208 1307676 3559978  36522a haproxy-2

BUG/MAJOR: task: make sure never to delete a queued task

Commit 0c7a4b6 ("MINOR: tasks: Don't set the TASK_RUNNING flag when
adding in the tasklet list.") revealed a hole in the way tasks may
be freed : they could be removed while in the run queue when the
TASK_QUEUED flag was present but not the TASK_RUNNING one. But it
seems the issue was emphasized by commit cde7902 ("MEDIUM: tasks:
improve fairness between the local and global queues") though the
code it replaces was already affected given how late the TASK_RUNNING
flag was set after removal from the global queue.

At the moment the task is picked from the global run queue, if it
is the last one, the global run queue lock is dropped, and then
the TASK_RUNNING flag was added. In the mean time another thread
might have performed a task_free(), and immediately after, the
TASK_RUNNING flag was re-added to the task, which was then added
to the tasklet list. The unprotected window was extremely faint
but does definitely exist and inconsistent task lists have been
observed a few times during very intensive tests over the last few
days. From this point various options are possible, the task might
have been re-allocated while running, and assigned state 0 and/or
state QUEUED while it was still running, resulting in the tast not
being put back into the tree.

This commit simply makes sure that tests on TASK_RUNNING before removing
the task also cover TASK_QUEUED.

It must be backported to 1.9 along with the previous ones touching
that area.

BUG/MEDIUM: applets: Don't use task_in_rq().

When deciding if we want to wake the task of an applet up, don't give up
if task_in_rq returns 1, as there's a race condition and another thread
may run it. Instead, always attempt to task_wakeup(), at worst the task
is already in the run queue, and nothing will happen.

MINOR: tasks: Don't set the TASK_RUNNING flag when adding in the tasklet list.

Now that TASK_QUEUED is enforced, there's no need to set TASK_RUNNING when
removing the task from the runqueue to add it to the tasklet list. The flag
will only be set right before we run the task.

MEDIUM: tasks: No longer use rq.node.leaf_p as a lock.

Now that we have the warranty that a task won't be added in the runqueue
while the TASK_QUEUED or the TASK_RUNNING flag is set, don't bother trying
to lock the task by setting leaf_p to 0x1 while inserting it in the runqueue
or having it in the tasklet_list, as nobody else will attempt to add it.

MINOR: tasks: Don't consider we can wake task with tasklet_wakeup().

In tasklet_wakeup(), don't bother checking if the tasklet is really a task,
calling tasklet_wakeup() with a task is invalid.

BUG/MEDIUM: tasks: Make sure we modify global_tasks_mask with the rq_lock.

When modifying global_tasks_mask, make sure we hold the rq_lock, or we might
remove the bit while it has been re-set by somebody else, and we make not
be waked when needed.

BUG/MEDIUM: tasks: Make sure we set TASK_QUEUED before adding a task to the rq.

Make sure we set TASK_QUEUED in every case before adding the task to the
run queue. task_wakeup() now checks if either TASK_QUEUED or TASK_RUNNING
is set, and if neither is set, add TASK_QUEUED and effectively add the task
to the runqueue.
No longer use __task_wakeup() anywhere except in task_wakeup(), always use
task_wakeup() instead.
With the old code, process_runnable_task() may re-add a task in the runqueue
without setting the TASK_QUEUED flag, and there were race conditions that could
lead to a task having the TASK_QUEUED flag but not in the runqueue, thus
being unschedulable.

This should be backported to 1.9.

BUG/MINOR: http_fetch/htx: Use HTX versions if the proxy enables the HTX mode

Because the HTX is now the default mode for all proxies (HTTP and TCP), it is
better to match on the proxy options to know if the HTX is enabled or not. This
way, if a TCP proxy explicitly disables the HTX mode, the legacy version of HTTP
fetches will be used.

No backport needed except if the patch activating the HTX by default for all
proxies is backported.

BUG/MINOR: http_fetch/htx: Allow permissive sample prefetch for the HTX

As for smp_prefetch_http(), there is now a way to successfully perform a
prefetch in HTX, even if the message forwarding already begun. It is used for
the sample fetches "req.proto_http" and "method".

This patch must be backported to 1.9.

BUG/MAJOR: http_fetch: Get the channel depending on the keyword used

All HTTP samples are buggy because the channel tested in the prefetch functions
(HTX and legacy HTTP) is chosen depending on the sample direction and not the
keyword really used. It means the request channel is used if the sample is
called during the request analysis and the response channel is used if it is
called during the response analysis, regardless the sample really called. For
instance, if you use the sample "req.ver" in an http-response rule, the response
channel will be prefeched because it is called during the response analysis,
while the request channel should have been used instead. So some assumptions on
the validity of the sample may be made on the wrong channel. It is the first
bug.

Then the same error is done in some samples themselves. So fetches are performed
on the wrong channel. For instance, the header extraction (req.fhdr, res.fhdr,
req.hdr, res.hdr...). If the sample "req.hdr" is used in an http-response rule,
then the matching is done on the response headers and not the request ones. It
is the second bug.

Finally, the last one but not the least, in some samples, the right channel is
used. But because the prefetch was done on the wrong one, this channel may be in
a undefined state. For instance, using the sample "req.ver" in an http-response
rule leads to a matching on a posibility released buffer.

To fix all these bugs, the right channel is now chosen in sample fetches, before
the prefetch. If the same function is used to fetch requests and responses
elements, then the keyword is used to choose the right one. This channel is then
used by the functions smp_prefetch_htx() and smp_prefetch_http(). Of course, it
is also used by the samples themselves to extract information.

This patch must be backported to all supported versions. For version 1.8 and
priors, it must be totally refactored. First because there is no HTX into these
versions. Then the buffers API has changed in HAProxy 1.9. The files
http_fetch.{ch} doesn't exist on old versions.

BUG/MEDIUM: htx: Don't return the start-line if the HTX message is empty

In the function htx_get_stline(), NULL must be returned if the HTX message
doesn't contain any element.

This patch must be backported to 1.9.

MINOR: mux-h1: Handle read0 during TCP splicing

It avoids a roundtrip with underlying I/O callbacks to do so. If a read0 is
handled at the end of h1_rcv_pipe(), the flag CS_FL_REOS is set on the
conn_stream. And if there is no data in the pipe, the flag CS_FL_EOS is also
set.

This path may be backported to 1.9.

BUG/MEDIUM: mux-h1: Enable TCP splicing to exchange data only

Use the TCP splicing only when the input parser is in the state H1_MSG_DATA or
H1_MSG_TUNNEL and don't transfer more than then known expected length for these
data (unlimited for the tunnel mode). In other states or when all data are
transferred, the TCP splicing is disabled.

This patch must be backported to 1.9.

BUG/MEDIUM: mux-h1: Notify the stream waiting for TCP splicing if ibuf is empty

When a stream-interface want to use the TCP splicing to forward its data, it
notifies the mux h1. We will then flush the input buffer and don't read more
data. So the stream-interface will not be notified for read anymore, except if
an error or a read0 is detected. It is a problem everytime the receive I/O
callback is called again. It happens when the pipe is full or when no data are
received on the pipe. It also happens when the input buffer is freshly
flushed. Because the TCP splicing is enabled, nothing is done in h1_recv() and
the stream-interface is never woken up. So, now, in h1_recv(), if the TCP
splicing is used and the input buffer is empty, the stream-interface is notified
for read.

This patch must be backported to 1.9.

BUG/MINOR: mux-h1: Don't switch the parser in busy mode if other side has done

There is no reaon to switch the input parser in busy mode if all the output has
been processed.

This patch must be backported to 1.9.

BUG/MINOR: mux-h1: Process input even if the input buffer is empty

It is required, at least, to add the EOM block and finish the message when the
TCP splicing was used to send all data. Otherwise, there is no way to finish the
parsing.

This patch must be backported to 1.9.

REGTESTS: exclude tests that require ssl, pcre if no such feature is enabled

Signed-off-by: Ilya Shipitsin <chipitsine@gmail.com>

BUG/MINOR: mworker: ensure that we still quits with SIGINT

Since the fix "BUG/MINOR: mworker: don't exit with an ambiguous value"
we are leaving with a EXIT_SUCCESS upon a SIGINT.

We still need to quit with a SIGINT when a worker leaves with a SIGINT.

This is done this way because vtest expect a 130 during the process
stop, haproxy without mworker returns a 130, so it should be the same in
mworker mode.

This should be backported in 1.9, with the previous patch ("BUG/MINOR:
mworker: don't exit with an ambiguous value").

Code has moved, mworker_catch_sigchld() is in haproxy.c.

BUG/MINOR: mworker: don't exit with an ambiguous value

When the sigchld handler is called and waitpid() returns -1,
the behavior of waitpid() with the status variable is undefined.
It is not a good idea to exit with the value contained in it.

Since this exit path does not use the exitcode variable, it means that
this is an expected and successful exit.

This should be backported in 1.9, code has moved,
mworker_catch_sigchld() is in haproxy.c.

BUG/MINOR: mworker: mworker_kill should apply on every children

Commit 3f12887 ("MINOR: mworker: don't use children variable anymore")
introduced a regression.

The previous behavior was to send a signal to every children, whether or
not they are former children. Instead of this, we only send a signal to
the current children, so we don't try to kill -INT or -TERM all
processes during a reload.

No backport needed.

BUG/MINOR: listener/mq: correctly scan all bound threads under low load

When iterating on the CLI using "show activity" and no other load, it
was visible that the last thread was always skipped. This was caused by
the way the thread bits were walking : t1 was updated after t2 to make
sure it never equals t2 (thus it skips t2), and in case of a tie we
choose t1. This results in the chosen thread never to equal t2 unless
the other ones already have one connection. In addition to this, t2 was
recalulated upon each pass due to the fact that only the 31th bit was
looked at instead of looking at the t2'th bit.

This patch fixes this by updating t2 after t1 so that t1 is free to
walk over all positions under equal load. No measurable performance
gains are expected from this though, but it at least removes one
strange indicator which could lead to some suspicion.

No backport is needed.

MINOR: init: add a "set-dumpable" global directive to enable core dumps

It's always a pain to get a core dump when enabling user/group setting
(which disables the dumpable flag on Linux), when using a chroot and/or
when haproxy is started by a service management tool which requires
complex operations to just raise the core dump limit.

This patch introduces a new "set-dumpable" global directive to work
around these troubles by doing the following :

  - remove file size limits     (equivalent of ulimit -f unlimited)
  - remove core size limits     (equivalent of ulimit -c unlimited)
  - mark the process dumpable again (equivalent of suid_dumpable=1)

Some of these will depend on the operating system. This way it becomes
much easier to retrieve a core file. Temporarily moving the chroot to
a user-writable place generally enough.

MINOR: mworker: export HAPROXY_MWORKER=1 when running in mworker mode

Export HAPROXY_MWORKER=1 in an environment variable when running in
mworker mode.

MINOR: cli: don't add a semicolon at the end of HAPROXY_CLI

Only add the semicolon when there is several CLI in HAPROXY_CLI and
HAPROXY_MASTER_CLI.

MEDIUM: mworker/cli: export the HAPROXY_MASTER_CLI variable

It works the same way as the HAPROXY_CLI variable, it exports the
listeners addresses separated by semicolons.

CLEANUP: mworker: remove the type field in mworker_proc

Since the introduction of the options field, we can use it to store the
type of process.

type = 'm' is replaced by PROC_O_TYPE_MASTER
type = 'w' is replaced by PROC_O_TYPE_WORKER
type = 'e' is replaced by PROC_O_TYPE_PROG

The old values are still used in the HAPROXY_PROCESSES environment
variable to pass the information during a reload.

MEDIUM: mworker-prog: implements 'option start-on-reload'

This option is already the default, but its opposite 'no option
start-on-reload' allows the master to keep a previous instance of a
program and don't start a new one upon a reload.

The old program will then appear as a current one in "show proc" and
could also trigger an exit-on-failure upon a segfault.

MEDIUM: mworker: store the leaving state of a process

Previously we were assuming than a process was in a leaving state when
its number of reload was greater than 0. With mworker programs it's not
the case anymore so we need to store a leaving state.

BUG/MAJOR: lb/threads: fix insufficient locking on round-robin LB

Maksim Kupriianov reported very strange crashes in fwrr_update_position()
which didn't make sense because of an apparent divide overflow except that
the value was not null in the core.

It happens that while the locking is correct in all the functions' call
graph, the uppermost one (fwrr_get_next_server()) incorrectly expected
that its target server was already locked when called. This stupid
assumption causd the server lock not to be held when calling the other
ones, explaining how it was possible to change the server's eweight by
calling srv_lb_commit_status() under the server lock yet collide with
its unprotected usage.

This commit makes sure that fwrr_get_server_from_group() retrieves a
locked server and that fwrr_get_next_server() is responsible for
unlocking the server before returning it. There is one subtlety in
this function which is that it builds a list of avoided servers that
were full while scanning the tree, and all of them are queued in a
full state so they must be unlocked upon return.

Many thanks to Maksim for providing detailed info allowing to narrow
down this bug.

This fix must be backported to 1.9. In 1.8 the lock seems much wider
and changes to the server's state are performed under the rendez-vous
point so this it doesn't seem possible that it happens there.

DOC: update for "show peers" CLI command.

Add the documentation for the new "show peers" CLI command which comes with
this commit "MINOR: peers: Add a new command to the CLI for peers.".

MINOR: peers: Add a new command to the CLI for peers.

Implements "show peers [peers section]" new CLI command to dump information
about the peers and their stick-tables to be synchronized and others internal.

May be backported as far as 1.5.

BUILD: htx: fix a used uninitialized warning on is_cookie2

gcc-3.4 reports this which actually looks like a valid warning when
looking at the code, it's unsure why others didn't notice it :

src/proto_htx.c: In function `htx_manage_server_side_cookies':
src/proto_htx.c:4266: warning: 'is_cookie2' might be used uninitialized in this function

BUILD: do not specify "const" on functions returning structs or scalars

Older compilers (like gcc-3.4) warn about the use of "const" on functions
returning a struct, which makes sense since the return may only be copied :

include/common/htx.h:233: warning: type qualifiers ignored on function return type

Let's simply drop "const" here.

BUILD: address a few cases of "static <type> inline foo()"

Older compilers don't like to see "inline" placed after the type in a
function declaration, it must be "static inline <type>" only. This
patch touches various areas. The warnings were seen with gcc-3.4.

BUG/MEDIUM: Threads: Only use the gcc >= 4.7 builtins when using gcc >= 4.7.

Move the definition of the various _HA_ATOMIC_* macros that use
__atomic_* in the #if GCC_VERSION >= 4.7, not just after it, so that we
can build with older versions of gcc again.

MINOR: connections: Remove the SUB_CALL_UNSUBSCRIBE flag.

Garbage collect SUB_CALL_UNSUBSCIRBE, as it's now unused.

BUG/MEDIUM: h2: Revamp the way send subscriptions works.

Instead of abusing the SUB_CALL_UNSUBSCRIBE flag, revamp the H2 code a bit
so that it just checks if h2s->sending_list is empty to know if the tasklet
of the stream_interface has been waken up or not.
send_wait is now set to NULL in h2_snd_buf() (ideally we'd set it to NULL
as soon as we're waking the tasklet, but it can't be done, because we still
need it in case we have to remove the tasklet from the task list).

BUG/MEDIUM: h2: Make sure we're not already in the send_list in h2_subscribe().

In h2_subscribe(), don't add ourself to the send_list if we're already in it.
That may happen if we try to send and fail twice, as we're only removed
from the send_list if we managed to send data, to promote fairness.
Failing to do so can lead to either an infinite loop, or some random crashes,
as we'd get the same h2s in the send_list twice.

This should be backported to 1.9.

BUG/MEDIUM: muxes: Make sure we unsubcribed when destroying mux ctx.

In the h1 and h2 muxes, make sure we unsubscribed before destroying the
mux context.
Failing to do so will lead in a segfault later, as the connection will
attempt to dereference its conn->send_wait or conn->recv_wait, which pointed
to the now-free'd mux context.

This was introduced by commit 39a96ee16eeec51774f9f52a783cf624a0de4ccb, so
should only be backported if that commit gets backported.

BUILD: cli/threads: fix build in single-threaded mode

Commit a8f57d51a ("MINOR: cli/activity: report the accept queue sizes
in "show activity"") broke the single-threaded build because the
accept-rings are not implemented there. Let's ifdef this out. Ideally
we should start to think about always having such elements initialized
even without threads to improve the test coverage.

BUILD: task/thread: fix single-threaded build of task.c

As expected, commit cde7902ac ("MEDIUM: tasks: improve fairness between
the local and global queues") broke the build with threads disabled,
and I forgot to rerun this test before committing. No backport is
needed.

BUG/MINOR: ssl: Fix 48 byte TLS ticket key rotation

Whenever HAProxy was reloaded with rotated keys, the resumption would be
broken for previous encryption key. The bug was introduced with the addition
of 80 byte keys in 9e7547 (MINOR: ssl: add support of aes256 bits ticket keys
on file and cli.).

This fix needs to be backported to 1.9.

BUG/MEDIUM: map: Fix memory leak in the map converter

The allocated trash chunk is not freed properly and causes a memory leak
exhibited as the growth in the trash pool allocations. Bug was introduced
in commit 271022 (BUG/MINOR: map: fix map_regm with backref).

This should be backported to all branches where the above commit was
backported.

MINOR: tasks: restore the lower latency scheduling when niced tasks are present

In the past we used to reduce the number of tasks consulted at once when
some niced tasks were present in the run queue. This was dropped in 1.8
when the scheduler started to take batches. With the recent fixes it now
becomes possible to restore this behaviour which guarantees a better
latency between tasks when niced tasks are present. Thanks to this, with
the default number of 200 for tune.runqueue-depth, with a parasitic load
of 14000 requests per second, nice 0 gives 14000 rps, nice 1024 gives
12000 rps and nice -1024 gives 16000 rps. The amplitude widens if the
runqueue depth is lowered.

MEDIUM: tasks: only base the nice offset on the run queue depth

The offset calculated for the nice value used to be wrong for a long
time and got even worse when the improved multi-thread sheduler was
implemented because it continued to rely on the run queue size, which
become irrelevant given that we extract tasks in batches, so the run
queue size moves following a sawtooth form.

However the offsets much better reflects insertion positions in the
queue, so it's worth dropping this rq_size component of the equation.
Last point, due to the batches made of runqueue-depth entries at once,
the higher the depth, the lower the effect of the nice setting since
values are picked together in batches and placed into a list. An
intuitive approach consists in multiplying the nice value with the
batch size to allow tasks to participate to a different batch. And
experimentation shows that this works pretty well.

With a runqueue-depth of 16 and a parasitic load of 16000 requests
per second on 100 streams, a default nice of 0 shows 16000 requests
per second for nice 0, 22000 for nice -1024 and 10000 for nice 1024.

The difference is even bigger with a runqueue depth of 5. At 200
however it's much smoother (16000-22000).

MEDIUM: tasks: improve fairness between the local and global queues

Tasks allowed to run on multiple threads, as well as those scheduled by
one thread to run on another one pass through the global queue. The
local queues only see tasks scheduled by one thread to run on itself.
The tasks extracted from the global queue are transferred to the local
queue when they're picked by one thread. This causes a priority issue
because the global tasks experience a priority contest twice while the
local ones experience it only once. Thus if a tasks returns still
running, it's immediately reinserted into the local run queue and runs
much faster than the ones coming from the global queue.

Till 1.9 the tasks going through the global queue were mostly :
  - health checks initialization
  - queue management
  - listener dequeue/requeue

These ones are moderately sensitive to unfairness so it was not that
big an issue.

Since 2.0-dev2 with the multi-queue accept, tasks are scheduled to
remote threads on most accept() and it becomes fairly visible under
load that the accept slows down, even for the CLI.

This patch remedies this by consulting both the local and the global
run queues in parallel and by always picking the task whose deadline
is the earliest. This guarantees to maintain an excellent fairness
between the two queues and removes the cascade effect experienced
by the global tasks.

Now the CLI always continues to respond quickly even in presence of
expensive tasks running for a long time.

This patch may possibly be backported to 1.9 if some scheduling issues
are reported but at this time it doesn't seem necessary.

CLEANUP: task: do not export rq_next anymore

This one hasn't been used anymore since the scheduler changes after 1.8
but it kept being exported and maintained up to date while it's always
reset when scanning the trees. Let's stop exporting it and updating it.

BUG/MEDIUM: muxes: Don't dereference mux context if null in release functions

When a mux context is released, we must be sure it exists before dereferencing
it. The bug was introduced in the commit 39a96ee16 ("MEDIUM: muxes: Be prepared
to don't own connection during the release").

No need to backport this patch, expect if the commit 39a96ee16 is backported
too.

REGTEST: Use HTX by default and add '--no-htx' option to disable it

Because the HTX is now the default mode in HAProxy, it is also enabled by
default in reg-tests. The option '--use-htx' is still available, but deprecated
and have concretly no effect. To run reg-tests with the legacy HTTP mode, you
should use the option '--no-htx'.

MAJOR: htx: Enable the HTX mode by default for all proxies

The legacy HTTP mode is no more the default one. So now, by default, without any
option in your configuration, all proxies will use the HTX mode. The line
"option http-use-htx" in proxy sections are now useless, except to cancel the
legacy HTTP mode. To fallback on legacy HTTP mode, you should use the line "no
option http-use-htx" explicitly.

Note that the reg-tests still work by default on legacy HTTP mode. The HTX will
be enabled by default in a futur commit.

MAJOR: muxes/htx: Handle inplicit upgrades from h1 to h2

The upgrade is performed when an H2 preface is detected when the first request
on a connection is parsed. The CS is destroyed by setting EOS flag on it. A
special flag is added on the HTX message to warn the HTX analyzers the stream
will be closed because of an upgrade. This way, no error and no log are
emitted. When the mux h1 is released, we create a mux h2, without any CS and
passing the buffer with the unparsed H2 preface.

MAJOR: proxy/htx: Handle mux upgrades from TCP to HTTP in HTX mode

It is now possible to upgrade TCP streams to HTX when an HTTP backend is set for
a TCP frontend (both with the HTX enabled). So concretely, in such case, an
upgrade is performed from the mux pt to the mux h1. The current CS and the
channel's buffer are used to initialize the mux h1.

MEDIUM: htx: Allow the option http-use-htx to be used on TCP proxies too

This will be mandatory to allow upgrades from TCP to HTTP in HTX. Of course, raw
buffers will still be used by default on TCP proxies, this option sets or
not. But if you want to handle mux upgrades from a TCP proxy, you must enable
the HTX on it and on all its backends.

There is only a small change in the lua code. Because TCP proxies can be HTX
aware, to exclude TCP services only for HTTP proxies, we must also check the
mode (TCP/HTTP) now.

MEDIUM: connection: Add conn_upgrade_mux_fe() to handle mux upgrades

This function will handle mux upgrades, for frontend connections only. It will
retrieve the best mux in the same way than conn_install_mux_fe except that the
mode and optionnally the proto are forced.

The new multiplexer is initialized using a new context and a specific input
buffer. Then, the old one is destroyed. If an error occurred, everything is
rolled back.

MEDIUM: muxes: Be prepared to don't own connection during the release

This happens during mux upgrades. In such case, when the destroy() callback is
called, the connection points to a different mux's context than the one passed
to the callback. It means the connection is owned by another mux. The old mux is
then released but the connection is not closed.

MINOR: muxes: Pass the context of the mux to destroy() instead of the connection

It is mandatory to handle mux upgrades, because during a mux upgrade, the
connection will be reassigned to another multiplexer. So when the old one is
destroyed, it does not own the connection anymore. Or in other words, conn->ctx
does not point to the old mux's context when its destroy() callback is
called. So we now rely on the multiplexer context do destroy it instead of the
connection.

In addition, h1_release() and h2_release() have also been updated in the same
way.

MEDIUM: muxes: Add an optional input buffer during mux initialization

The mux's callback init() now take a pointer to a buffer as extra argument. It
must be used by the multiplexer as its input buffer. This buffer is always NULL
when a multiplexer is initialized with a fresh connection. But if a mux upgrade
is performed, it may be filled with existing data. Note that, for now, mux
upgrades are not supported. But this commit is mandatory to do so.

MINOR: muxes: Rely on conn_is_back() during init to handle front/back conn

Instead of using the connection context to make the difference between a
frontend connection and a backend connection, we now rely on the function
conn_is_back().

MINOR: filters/htx: Use stream flags instead of px mode to instanciate a filter

In the function flt_stream_add_filter(), if the HTX is enabled, before attaching
a filter to a stream, we test if the filter can handle it or not. If not, the
filter is ignored. Before the proxy mode was tested. Now we test if the stream
is an HTX stream or not.

MINOR: http_fetch/htx: Use stream flags instead of px mode in smp_prefetch_htx

In the function smp_prefetch_htx(), we must know if data in the channel's buffer
are structured or not. Before the proxy mode was tested. Now we test if the
stream is an HTX stream or not. If yes, we know the HTX is used to structure
data in the channel's buffer.

MINOR: http: update the macro IS_HTX_STRM() to check the stream flag SF_HTX

Instead of matching on the frontend options, we now check if the flag SF_HTX is
set or not on the stream to know if it is an HTX stream or not.

MINOR: stream: Set a flag when the stream uses the HTX

The flag SF_HTX has been added to know when a stream uses the HTX or not. It is
set when an HTX stream is created. There are 2 conditions to set it. The first
one is when the HTTP frontend enables the HTX. The second one is when the attached
conn_stream uses an HTX multiplexer.

MINOR: muxes: Add a flag to specify a multiplexer uses the HTX

A multiplexer must now set the flag MX_FL_HTX when it uses the HTX to structured
the data exchanged with channels. the muxes h1 and h2 set this flag. Of course,
for the mux h2, it is set on h2_htx_ops only.

MINOR: mux-h2: Add a mux_ops dedicated to the HTX mode

Instead of using the same mux_ops structure for the legacy HTTP mode and the HTX
mode, a dedicated mux_ops is now used for the HTX mode. Same callbacks are used
for both. But the flags may be different depending on the mode used.

BUG/MINOR: mux-h1: Handle the flag CS_FL_KILL_CONN during a shutdown read/write

This flag is used to explicitly kill the connection when the CS is closed. It
may be set by tcp rules. It must be respect by the mux-h1.

This patch must be backported to 1.9.

MINOR: mux-h1: Don't release the conn_stream anymore when h1s is destroyed

An H1 stream is destroyed when the conn_stream is detached or when the H1
connection is destroyed. In the first case, the CS is released by the caller. In
the second one, because the connection is closed, no CS is attached anymore. In
both, there is no reason to release the conn_stream in h1s_destroy().

MEDIUM: mux-h1: Simplify the connection mode management by sanitizing headers

Connection headers are now sanitized during the parsing and the formatting. This
means "close" and "keep-alive" values are always removed but right flags are
set. This way, client side and server side are independent of each other. On the
input side, after the parsing, neither "close" nor "keep-alive" values
remain. So on the output side, if we found one of these values in a connection
headers, it means it was explicitly added by HAProxy. So it overwrites the other
rules, if applicable. Always sanitizing the output is also a way to simplifiy
conditions to update the connection header. Concretly, only additions of "close"
or "keep-alive" values remain, depending the case.

No need to backport this patch.

MEDIUM: h1: Add an option to sanitize connection headers during parsing

The flag H1_MF_CLEAN_CONN_HDR has been added to let the H1 parser sanitize
connection headers. It means it will remove all "close" and "keep-alive" values
during the parsing. One noticeable effect is that connection headers may be
unfolded. In practice, this is not a problem because it is not frequent to have
multiple values for the connection headers.

If this flag is set, during the parsing The function
h1_parse_next_connection_header() is called in a loop instead of
h1_parse_conection_header().

No need to backport this patch

MINOR: stats/htx: Don't add "Connection: close" header anymore in stats responses

On the client side, as far as possible, we will try to keep connection
alive. So, in most of cases, this header will be removed. So it is better to not
add it at all. If finally the connection must be closed, the header will be
added by the mux h1.

No need to backport this patch.

MINOR: mux-h1: Simplify handling of 1xx responses

Because of previous changes on http tunneling, the synchronization of the
transaction can be simplified. Only the check on intermediate messages remains
and it only concerns the response path.

This patch must be backported to 1.9. It is not strictly speaking required but
it will ease futur backports.

BUG/MEDIUM: htx: Fix the process of HTTP CONNECT with h2 connections

In HTX, the HTTP tunneling does not work if h1 and h2 are mixed (an h1 client
sending requests to an h2 server or this opposite) because the h1 multiplexer
always adds an EOM before switching it to tunnel mode. The h2 multiplexer
interprets it as an end of stream, closing the stream as for any other
transaction.

To make it works again, we need to swith to the tunnel mode without emitting any
EOM blocks. Because of that, HTX analyzers have been updated to switch the
transaction to tunnel mode before end of the message (because there is no end of
message...).

To be consistent, the protocol switching is also handled the same way even
though the 101 responses are not supported in h2.

This patch must be backported to 1.9.

MINOR: proto_htx: Don't adjust transaction mode anymore in HTX analyzers

Because the option http-tunnel is now ignored in HTX, there is no longer any
need to adjust the transaction mode in HTX analyzers. A channel can still be
switch to the tunnel mode for legitimate cases (HTTP CONNECT or switching
protocols). So the function htx_adjust_conn_mode() is now useless.

This patch must be backported to 1.9. It is not strictly speaking required but
it will ease futur backports.

MEDIUM: htx: Deprecate the option 'http-tunnel' and ignore it in HTX

The option http-tunnel disables any HTTP processing past the first
transaction. In HTX, it works for full h1 transactions. As for the legacy HTTP,
it is a workaround, but it works. But it is impossible to make it works with an
h2 connection. In such case, it has no effect, the stream is closed at the end
of the transaction. So to avoid any inconsistancies between h1 and h2
connections, this option is now always ignored when the HTX is enabled. It is
also a good opportinity to deprecate an old and ugly option. A warning is
emitted during HAProxy startup to encourage users to remove this option.

Note that in legacy HTTP, this option only works with full h1 transactions
too. If an h2 connection is established on a frontend with this option enabled,
it will have no effect at all. But we keep it for the legacy HTTP for
compatibility purpose. It will be removed with the legacy HTTP.

So to be short, if you have to really (REALLY) use it, it will only work for
legacy HTTP frontends with H1 clients.

The documentation has been updated accordingly.

This patch must be backported to 1.9. It is not strictly speaking required but
it will ease futur backports.

BUG/MEDIUM: htx: Don't crush blocks payload when append is done on a data block

If there is a data block when a header block is added in a HTX message, its
payload will be inserted after the data block payload. But its index will be
moved before the EOH block. So at this stage, if a new data block is added, we
will try to append its payload to the last data block (because it is also the
tail). Thus the payload of the further header block will be crushed.

This cannot happens if the payloads wrap thanks to the previous fix. But it
happens when the tail is not the front too. So now, in this case, we add a new
block instead of appending.

This patch must be backported in 1.9.

BUG/MEDIUM: htx: Defrag if blocks position is changed and the payloads wrap

When a header is added or when a data block is added before another one, the
blocks position may be changed (but not their payloads position). For instance,
when a header is added, we move the block just before the EOH, if any. When the
payloads wraps, it is pretty annoying because we loose the last inserted
block. It is neither the tail nor the head. And it is not the front either.

It is a design problem. Waiting for fixing this problem, we force a
defragmentation in such case. Anyway, it should be pretty rare, so it's not
really critical.

This patch must be backported to 1.9.

BUG/MINOR: spoe: Be sure to set tv_request when each message fragment is encoded

When a message or a fragment is encoded, the date the frame processing starts
must be set if it is undefined. The test on tv_request field was wrong.

This patch must be backported to 1.9.

BUG/MEDIUM: spoe: Return an error if nothing is encoded for fragmented messages

If the maximum frame size is very small with a large message or argument name,
it is possible to be unable to encode anything. In such case, it is important to
stop processing returning an error otherwise we will retry in loop to encode the
message, failing each time because of the too small frame size.

This patch must be backported to 1.9 and 1.8.

BUG/MEDIUM: spoe: Queue message only if no SPOE applet is attached to the stream

If a SPOE applet is already attached to a stream to handle its messages, we must
not queue them. Otherwise it could be handled by another applet leading to
errors. This happens with fragmented messages only. When the first framgnent is
sent, the SPOE applet sending it is attached to the stream. It should be used to
send all other fragments.

This patch must be backported to 1.9 and 1.8.

MINOR: cli/activity: report the accept queue sizes in "show activity"

Seeing the size of each ring helps understand which threads are
overloaded and why some of them are less often elected than others
by the multi-queue load balancer.

MINOR: cli/listener: report the number of accepts on "show activity"

The "show activity" command reports the number of incoming connections
dispatched per thread but doesn't report the number of connections
received by each thread. It is important to be able to monitor this
value as it can show that for whatever reason a smaller set of threads
is receiving the connections and dispatching them to all other ones.

BUG/MINOR: listener: renice the accept ring processing task

It is not acceptable that the accept queues are handled with a normal
priority since they are supposed to quickly dispatch the incoming
traffic, resulting in tasks which will have their respective nice
values and place in the queue. Let's renice the accept ring tasks
to -1024.

No backport is needed, this is strictly 2.0.

BUG/MINOR: tasks: make sure the first task to be queued keeps its nice value

The run queue offset computed from the nice value depends on the run
queue size, but for the first task to enter the run queue, this size
is zero and the task gets queued just as if its nice value was zero as
well. This is problematic for example for the CLI socket if another
higher priority task gets queued immediately after as it can steal its
place.

This patch simply adds one to the rq_size value to make sure the nice
is never multiplied by zero. The way the offset is calculated is
questionable anyway these days, since with the newer scheduler it seems
that just using the nice value as an offset should work (possibly damped
by the task's number of calls).

This fix must be backported to 1.9. It may possibly be backported to
older versions if it proves to make the CLI more interactive.

BUG/MEDIUM: task/threads: address a fairness issue between local and global tasks

It is possible to hit a fairness issue in the scheduler when a local
task runs for a long time (i.e. process_stream() returns running), and
a global task wants to run on the same thread and remains in the global
queue. What happens in this case is that the condition to extract tasks
from the global queue will rarely be satisfied for very low task counts
since whatever non-null queue size multiplied by a thread count >1 is
always greater than the small remaining number of tasks in the queue.
In theory another thread should pick the task but we do have some mono
threaded tasks in the global queue as well during inter-thread wakeups.

Note that this can only happen with task counts lower than the thread
counts, typically one task in each queue for more than two threads.

This patch works around the problem by allowing a very small unfairness,
making sure that we can always pick at least one task from the global
queue even if there is already one in the local queue.

A better approach will consist in scanning the two trees in parallel
and always pick the best task. This will be more complex and will
constitute a separate patch.

This fix must be backported to 1.9.

BUG/MEDIUM: stream_interface: Don't bother doing chk_rcv/snd if not connected.

If the interface is not in state SI_ST_CON or SI_ST_EST, don't bother
trying to send/recv data, we can't do it anyway, and if we're in SI_ST_TAR,
that may lead to adding the SI_FL_ERR flag back on the stream_interface,
while we don't want it.

This should be backported to 1.9.

BUG/MEDIUM: streams: Only re-run process_stream if we're in a connected state.

In process_stream(), only try again when there's the SI_FL_ERR flag and we're
in a connected state, otherwise we can loop forever.
It used to work because si_update_both() bogusly removed the SI_FL_ERR flag,
and it would never be set at this point. Now it does, so take that into
account.
Many, many thanks to Maciej Zdeb for reporting the problem, and helping
investigating it.

This should be backported to 1.9.

MINOR: ssl: Activate aes_gcm_dec converter for BoringSSL

BoringSSL can support it, no need to disable.

MINOR: skip get_gmtime where tm is unused

For LOG_FMT_TS (%Ts), the tm variable is not used, so save some cycles
on the call to get_gmtime.

Backport: 1.9 1.8
Signed-off-by: Robin H. Johnson <rjohnson@digitalocean.com>

BUG/MEDIUM: pattern: assign pattern IDs after checking the config validity

Pavlos Parissis reported an interesting case where some map identifiers
were not assigned (appearing as -1 in show map). It turns out that it
only happens for log-format expressions parsed in check_config_validity()
that involve maps (log-format, use_backend, unique-id-header), as in the
sample configuration below :

    frontend foo
        bind :8001
        unique-id-format %[src,map(addr.lst)]
        log-format %[src,map(addr.lst)]
        use_backend %[src,map(addr.lst)]

The reason stems from the initial introduction of unique IDs in 1.5 via
commit af5a29d5f ("MINOR: pattern: Each pattern is identified by unique
id.") : the unique_id assignment was done before calling
check_config_validity() so all maps loaded after this call are not
properly configured. From what the function does, it seems they will not
be able to use a cache, will not have a unique_id assigned and will not
be updatable from the CLI.

This fix must be backported to all supported versions.