git.ipfire.org Git - thirdparty/haproxy.git/log

MEDIUM: stream: do not forcefully close the client connection anymore

Now that the mux will take care of closing the client connection at the
right moment, we don't need to close the client connection anymore, and
we just need to close the conn_stream.

MEDIUM: connection: make mux->detach() release the connection

For H2, only the mux's timeout or other conditions might cause a
release of the mux and the connection, no stream should be allowed
to kill such a shared connection. So a stream will only detach using
cs_destroy() which will call mux->detach() then free the cs.

For now it's only handled by mux_pt. The goal is that the data layer
never has to care about the connection, which will have to be released
depending on the mux's mood.

MEDIUM: connection: replace conn_full_close() with cs_close()

At all call places where a conn_stream is in use, we can now use
cs_close() to get rid of a conn_stream and of its underlying connection
if the mux estimates it makes sense. This is what is currently being
done for the pass-through mux.

MEDIUM: mux_pt: make cs_shutr() / cs_shutw() properly close the connection

Now these functions are able to automatically close both the transport
and the socket layer, causing the whole connection to be torn down if
needed.

The two shutdown modes are implemented for both directions, and when
a direction is closed, if it sees the other one is closed as well, it
completes by closing the connection. This is similar to what is performed
in the stream interface.
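
As a rough illustration of the "close once both directions are shut" rule
described above, here is a minimal sketch. All names (sk_conn, SK_SHUT_RD,
sk_shutr, ...) are hypothetical and only model the behaviour; they are not
the mux_pt code.

```c
#include <stdbool.h>

/* Hypothetical flags for this sketch only, not HAProxy's real values. */
enum {
    SK_SHUT_RD = 1 << 0,   /* read side has been shut */
    SK_SHUT_WR = 1 << 1,   /* write side has been shut */
    SK_CLOSED  = 1 << 2    /* whole connection torn down */
};

struct sk_conn {
    unsigned int flags;
};

/* Complete the close when both directions are already shut. */
static void sk_close_if_done(struct sk_conn *c)
{
    const unsigned int both = SK_SHUT_RD | SK_SHUT_WR;

    if ((c->flags & both) == both)
        c->flags |= SK_CLOSED;
}

/* Shut the read side; a repeated call is a no-op. */
void sk_shutr(struct sk_conn *c)
{
    if (c->flags & SK_SHUT_RD)
        return;
    c->flags |= SK_SHUT_RD;
    sk_close_if_done(c);
}

/* Shut the write side; a repeated call is a no-op. */
void sk_shutw(struct sk_conn *c)
{
    if (c->flags & SK_SHUT_WR)
        return;
    c->flags |= SK_SHUT_WR;
    sk_close_if_done(c);
}

bool sk_is_closed(const struct sk_conn *c)
{
    return (c->flags & SK_CLOSED) != 0;
}
```

Whichever direction is shut last triggers the close, so callers do not need
to know the order in which the two shutdowns happen.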

It's not deployed yet but the purpose is to get rid of conn_full_close()
where only conn_stream should be known.

MINOR: connection: add cs_close() to close a conn_stream

This basically calls cs_shutw() followed by cs_shutr(). Both of them
are called in the most conservative mode so that any previous call is
still respected. The CS flags are cleared so that it can be reused
(this is important for connection retries when conn and CS are reused
without being reallocated).

MEDIUM: connection: make conn_sock_shutw() aware of lingering

Instead of having to manually handle lingering outside, let's make
conn_sock_shutw() check for it before calling shutdown(). We simply
don't want to emit the FIN if we're going to reset the connection
due to lingering. It's particularly important for silent-drop where
it's absolutely mandatory that no packet leaves the machine.

MINOR: conn_stream: modify cs_shut{r,w} API to pass the desired mode

Now we can specify how we want to shutdown (drain vs reset, and normal
vs silent), and this propagates to the mux then the transport layer.

MINOR: conn_stream: new shutr/w status flags

In order to support all shutdown modes on the CS, we introduce the
following flags :
  CS_FL_SHRD : shut read, drain extra data
  CS_FL_SHRR : shut read, reset extra data
  CS_FL_SHWN : shut write, normal notification
  CS_FL_SHWS : shut write, silent mode (no notification)

And the following modes for shutr/shutw :

  CS_SHR_DRAIN, CS_SHR_RESET, CS_SHW_NORMAL, CS_SHW_SILENT.

Note: it's possible that we won't need to distinguish the two shutw
above as they're only an action.

For now they are not used.
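
To make the relation between modes and flags concrete, here is a hedged
sketch. The flag and mode names come from the message above, but the bit
values and the two helper functions (sketch_shutr, sketch_shutw) are
assumptions made for illustration only.

```c
/* Flag names as listed above; the bit values are placeholders. */
enum {
    CS_FL_SHRD = 0x01,  /* shut read, drain extra data */
    CS_FL_SHRR = 0x02,  /* shut read, reset extra data */
    CS_FL_SHWN = 0x04,  /* shut write, normal notification */
    CS_FL_SHWS = 0x08   /* shut write, silent mode */
};

enum cs_shr_mode { CS_SHR_DRAIN, CS_SHR_RESET };
enum cs_shw_mode { CS_SHW_NORMAL, CS_SHW_SILENT };

/* Hypothetical helper: record a read shutdown unless one was already
 * recorded, in which case the first mode is kept.
 */
unsigned int sketch_shutr(unsigned int flags, enum cs_shr_mode mode)
{
    if (flags & (CS_FL_SHRD | CS_FL_SHRR))
        return flags;
    return flags | (mode == CS_SHR_DRAIN ? CS_FL_SHRD : CS_FL_SHRR);
}

/* Hypothetical helper: same logic for the write side. */
unsigned int sketch_shutw(unsigned int flags, enum cs_shw_mode mode)
{
    if (flags & (CS_FL_SHWN | CS_FL_SHWS))
        return flags;
    return flags | (mode == CS_SHW_NORMAL ? CS_FL_SHWN : CS_FL_SHWS);
}
```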

MINOR: connection: make conn_stream users also check for per-stream error flag

In a 1:1 connection:stream there's no problem relying on the connection
flags alone to check for errors. But in a mux, it will be possible to mark
certain streams in error without having to mark all of them. An example is
an H2 client sending RST_STREAM frames to abort a long download, or a parse
error requiring to abort only this specific stream.

This commit ensures that stream-interface and checks properly check for
CS_FL_ERROR in cs->flags wherever CO_FL_ERROR was in use. Most likely over
the long term, any check for CO_FL_ERROR will have to disappear.

MAJOR: connection : Split struct connection into struct connection and struct conn_stream.

All the references to connections in the data path from streams and
stream_interfaces were changed to use conn_streams. Most functions named
"something_conn" were renamed to "something_cs" for this. Sometimes the
connection still is what matters (eg during a connection establishment)
and were not always renamed. The change is significant and minimal at the
same time, and was quite thoroughly tested now. As of this patch, all
accesses to the connection from upper layers go through the pass-through
mux.

MINOR: mux_pt: implement remaining mux_ops methods

This is a basic pass-through implementation which is now complete and
operational, just not used yet.

MINOR: connection: introduce the conn_stream manipulation functions

Most of the functions dealing with conn_streams are here. They act at
the data layer and interact with the mux. For now they are not used yet
but everything builds.

MINOR: mux: add more methods to mux_ops

We'll need to support reading/writing from both sides, with buffers and
pipes, as well as retrieving/updating flags.

MINOR: connection: introduce conn_stream

This patch introduces a new struct conn_stream. It's the stream-side of
a multiplexed connection. A pool is created and destroyed on exit. For
now the conn_streams are not used at all.

MINOR: connection: report the major HTTP version from the MUX for logging (fc_http_major)

A new sample fetch function reports either 1 or 2 for the on-wire encoding,
to indicate if the request was received using the HTTP/1.x format or HTTP/2
format. Note that it reports the on-wire encoding, not the version presented
in the request header.

This will possibly have to evolve if it becomes necessary to report the
encoding on the server side as well.

MEDIUM: session: use the ALPN token and proxy mode to select the mux

When an incoming connection is made on an HTTP mode frontend, the
session now looks up the mux to use based on the ALPN token and the
proxy mode. This will allow easier mux registration, and we don't
need to hard-code the mux_pt_ops anymore.

MINOR: mux: register the pass-through mux for any ALPN string

The pass-through mux is the fallback used on any incoming connection
unless another mux claims the ALPN name and the proxy mode. Thus mux_pt
registers ALPN token "" (empty name) which catches everything.

MINOR: connection: implement alpn registration of muxes

Selecting a mux based on ALPN and the proxy mode will quickly become a
pain. This commit provides new functions to register/lookup a mux based
on the ALPN string and the proxy mode to make this easier. Given that
we're not supposed to support a wide range of muxes, the lookup should
not have any measurable performance impact.
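
A registry of this kind could look roughly like the sketch below. The types
and function names (sk_mux_entry, sk_register_mux, sk_lookup_mux) are
invented for illustration and are not the functions added by this commit.
The sketch assumes that an entry registered with the empty token acts as
the catch-all fallback.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical proxy modes, usable as a bit mask. */
enum sk_mode { SK_MODE_TCP = 1 << 0, SK_MODE_HTTP = 1 << 1 };

struct sk_mux_entry {
    const char *token;          /* ALPN token, "" matches anything */
    unsigned int modes;         /* proxy modes this mux supports */
    const char *name;           /* for debugging only */
    struct sk_mux_entry *next;
};

static struct sk_mux_entry *sk_mux_list;

/* Add an entry at the head of the list. */
void sk_register_mux(struct sk_mux_entry *e)
{
    e->next = sk_mux_list;
    sk_mux_list = e;
}

/* Return the entry whose token matches exactly for this mode, or else the
 * catch-all entry (empty token) for this mode, or NULL if none applies.
 * A linear scan is enough since only a handful of muxes are expected.
 */
const struct sk_mux_entry *sk_lookup_mux(const char *token, size_t len,
                                         unsigned int mode)
{
    const struct sk_mux_entry *fallback = NULL;
    const struct sk_mux_entry *e;

    for (e = sk_mux_list; e; e = e->next) {
        if (!(e->modes & mode))
            continue;
        if (len > 0 && strlen(e->token) == len &&
            memcmp(e->token, token, len) == 0)
            return e;
        if (e->token[0] == '\0')
            fallback = e;
    }
    return fallback;
}
```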

MEDIUM: connection: start to introduce a mux layer between xprt and data

For HTTP/2 and QUIC, we'll need to deal with multiplexed streams inside
a connection. After quite a long brainstorming, it appears that the
connection interface to the existing streams is appropriate just like
the connection interface to the lower layers. In fact we need to have
the mux layer in the middle of the connection, between the transport
and the data layer.

A mux can exist on two directions/sides. On the inbound direction, it
instantiates new streams from incoming connections, while on the outbound
direction it muxes streams into outgoing connections. The difference is
visible on the mux->init() call : in one case, an upper context is already
known (outgoing connection), and in the other case, the upper context is
not yet known (incoming connection) and will have to be allocated by the
mux. The session doesn't have to create the new streams anymore, as this
is performed by the mux itself.

This patch introduces this and creates a pass-through mux called
"mux_pt" which is used for all new connections and which only
calls the data layer's recv,send,wake() calls. One incoming stream
is immediately created when init() is called on the inbound direction.
There should not be any visible impact.

Note that the connection's mux is purposely not set until the session
is completed so that we don't accidentally run with the wrong mux. This
must not cause any issue as the xprt_done_cb function is always called
prior to using mux's recv/send functions.

BUG/MEDIUM: threads: Initialize the sync-point

The sync point must be initialized before starting threads. This line was lost
in one of the merges preparing the integration of threads support.

BUG/MAJOR: threads/freq_ctr: use a memory barrier to detect changes

commit 6e01286 (BUG/MAJOR: threads/freq_ctr: fix lock on freq counters)
attempted to fix the loop using volatile but that doesn't work depending
on the level of optimization, resulting in situations where the threads
could remain looping forever. Here we use memory barriers between reads
to enforce a strict ordering and the asm code produced does exactly what
the C code does and works perfectly, with a 3-digit measurement accuracy
observed during a test.

MINOR: threads: add a portable barrier for threads and non-threads

HA_BARRIER() is just a simple memory barrier to prevent the compiler
from reordering our code.
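
For readers unfamiliar with compiler barriers, the following sketch shows
the general idea with GCC/Clang inline assembly. SK_BARRIER and sk_read_ctr
are hypothetical names; the actual definition of HA_BARRIER() is not shown
in this log. Note that this only constrains the compiler: it forces values
to be re-read from memory, but does not emit a CPU fence.

```c
/* Compiler-only barrier (GCC/Clang syntax): the "memory" clobber tells the
 * compiler that memory may have changed, so cached values are discarded.
 */
#define SK_BARRIER() __asm__ __volatile__("" ::: "memory")

struct sk_ctr {
    unsigned int curr_sec;  /* period the counter belongs to */
    unsigned int curr_ctr;  /* events counted in that period */
};

/* Read the counter, retrying if the period changed during the read.
 * Without the barriers the compiler could load curr_sec only once and
 * turn the loop condition into a constant.
 */
unsigned int sk_read_ctr(struct sk_ctr *ctr)
{
    unsigned int sec, val;

    do {
        sec = ctr->curr_sec;
        SK_BARRIER();
        val = ctr->curr_ctr;
        SK_BARRIER();
    } while (sec != ctr->curr_sec);

    return val;
}
```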

MINOR: h1: add a function to measure the trailers length

This is needed in the H2->H1 gateway so that we know how long the trailers
block is in chunked encoding. It returns the number of bytes, or 0 if some
are missing, or -1 in case of parse error.
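
The return convention above (length, 0 when incomplete, -1 on error) can be
illustrated with a small stand-alone scanner. This is a sketch with a
hypothetical name (sk_trailers_len), not the function added by the commit.
It assumes that trailers are header lines ending in CRLF or a bare LF,
terminated by an empty line, and that a CR not followed by LF is the only
parse error.

```c
#include <stddef.h>

/* Measure a trailers block starting at <buf>. Returns the number of bytes
 * up to and including the terminating empty line, 0 if more data is
 * needed, or -1 if a CR is not followed by LF.
 */
long sk_trailers_len(const char *buf, size_t len)
{
    size_t pos = 0;

    for (;;) {
        size_t start = pos;

        /* look for the end of the current line */
        while (pos < len && buf[pos] != '\r' && buf[pos] != '\n')
            pos++;
        if (pos == len)
            return 0;               /* line not complete yet */

        if (buf[pos] == '\r') {
            if (pos + 1 == len)
                return 0;           /* LF not received yet */
            if (buf[pos + 1] != '\n')
                return -1;          /* CR must be followed by LF */
            pos++;
        }
        pos++;                      /* skip the LF */

        /* a line starting with its own terminator is the empty line */
        if (buf[start] == '\r' || buf[start] == '\n')
            return (long)pos;
    }
}
```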

CLEANUP: threads: rename process_mask to thread_mask

It was a leftover from the last cleaning session; this mask applies
to threads and calling it process_mask is a bit confusing. It's the
same in fd, task and applets.

CLEANUP: threads: replace the last few 1UL<<tid with tid_bit

There were a few occurrences left, better replace them now.

MINOR: ssl: Remove the global allow-0rtt option.

BUG/MINOR: dns: Fix SRV records with the new thread code.

srv_set_fqdn() may be called with the DNS lock already held, but tries to
lock it anyway. So, add a new parameter to let it know whether the lock is
already held or not.

BUILD: stick-tables: silence an uninitialized variable warning

Commit 819fc6f ("MEDIUM: threads/stick-tables: handle multithreads on
stick tables") introduced a valid warning about an uninitialized return
value in stksess_kill_if_expired(). It just happens that this result is
never used, so let's turn the function back to void as previously.

BUG/MAJOR: threads/time: Store the time deviation in a 64-bit integer

In function tv_update_date, we keep an offset representing the time deviation to
adjust the system time. At every call, we check if this offset must be updated
or not. Of course, it must be shared by all threads. It was stored in a
timeval, but a timeval cannot be updated atomically. So now, instead, we store
it in a 64-bit integer. In tv_update_date, we convert this integer into a
timeval. Once updated, it is converted back into an integer to be atomically
stored.

To store a tv_offset into an integer, we use 32 bits from tv_sec and 32 bits
from tv_usec to avoid shift operations.
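
One way to pack two 32-bit fields into a 64-bit word without shifts is a
union, as in the sketch below. This is an assumption about the general
technique, not a copy of the patch; the names (sk_tv_offset,
sk_store_offset, sk_load_offset) are hypothetical, and the GCC "__atomic"
builtins are used for the atomic load and store.

```c
#include <stdint.h>

/* Two 32-bit halves overlaid on one 64-bit word: no shifts are needed to
 * go from one representation to the other.
 */
union sk_tv_offset {
    struct {
        int32_t sec;
        int32_t usec;
    } tv;
    uint64_t raw;
};

static uint64_t sk_shared_offset;   /* shared by all threads */

/* Publish a new offset with a single atomic 64-bit store. */
void sk_store_offset(int32_t sec, int32_t usec)
{
    union sk_tv_offset o;

    o.tv.sec = sec;
    o.tv.usec = usec;
    __atomic_store_n(&sk_shared_offset, o.raw, __ATOMIC_SEQ_CST);
}

/* Fetch both halves consistently with a single atomic 64-bit load. */
void sk_load_offset(int32_t *sec, int32_t *usec)
{
    union sk_tv_offset o;

    o.raw = __atomic_load_n(&sk_shared_offset, __ATOMIC_SEQ_CST);
    *sec = o.tv.sec;
    *usec = o.tv.usec;
}
```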

BUG/MAJOR: threads/freq_ctr: fix lock on freq counters.

The wrong bit was set to keep the lock on freq counter update. And the read
functions were re-worked to use volatile.

Moreover, when a freq counter is updated, it is now rotated only if the current
counter is in the past (now.tv_sec > ctr->curr_sec). It is important with
threads because the current time (now) is thread-local. So, rounded to the
second, the time may vary by plus or minus 1 second. So a freq counter rotated
by one thread may be seen 1 second in the future by another one. In this case,
it is updated but not rotated.

MAJOR: threads: Officially enable the threads support in HAProxy

Now, USE_THREAD option is implicitly enabled when HAProxy is compiled, for
targets linux2628 and freebsd. To enable it for other targets, you can set
"USE_THREAD=1" explicitly on the command line. And to disable it explicitly, you
must set "USE_THREAD=" on the command line.

Now, to be clear: this does not mean it is bug free, far from it. But it
seems stable enough to be tested. You can experiment with it by setting the
nbthread parameter, and report bugs of course. By leaving it to 1 (or not using
it at all), it should be as safe as an HAProxy compiled without threads.

Between the commit "MINOR: threads: Prepare makefile to link with pthread" and
this one, the feature was in development and really unstable. It could be hard
to track a bug using a bisect for all these commits.

BUG/MINOR: threads: Add missing THREAD_LOCAL on static here and there

BUG/MEDIUM: threads: Run the poll loop on the main thread too

There was a flaw in the way the threads were created. The main one was just used
to create all the others and then wait to exit. Now, it is used to run a poll
loop. So we only create nbthread-1 threads.

This also fixes a bug in the compression filter when there is only 1 thread
(nbthread == 1 or no threads support). The bug was in the way thread-local
resources were initialized: per-thread init/deinit callbacks were never called
for the main process. So, with nbthread set to 1, some buffers remained
uninitialized.

MINOR: threads: Don't start when a device detection module is used

For now, we don't know if device detection modules (51degrees, deviceatlas and
wurfl) are thread-safe or not. So HAProxy exits with an error when you try to
use one of them with nbthread greater than 1.

We will ask the maintainers of these modules to make them thread-safe or to give
us hints to do so.

MEDIUM: threads/server: Use the server lock to protect health check and cli concurrency

MINOR: threads/mailers: Add a lock to protect queues of email alerts

MINOR: threads/checks: Set the task process_mask when a check is executed

Tasks used to process checks are created to be processed by any thread. But,
once a check is started, we must be sure to be sticky on the running thread
because I/O will also be sticky on it. This is a requirement for now: tasks and
I/O handlers linked to the same session must be executed on the same thread.

MINOR: threads/checks: Add a lock to protect the pid list used by external checks

MINOR: threads: Add thread-map config parameter in the global section

By default, no affinity is set for threads. To bind threads to CPUs, you must
define a "thread-map" in the global section. The format is the same as for the
"cpu-map" parameter, with a small difference: the process number must be
defined, with the same format as cpu-map ("all", "even", "odd" or a number
between 1 and 31/63).

A thread will be bound on the intersection of its mapping and the one of the
process on which it is attached. If the intersection is null, no specific bind
will be set for the thread.
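
The intersection rule can be expressed with plain bit masks. The sketch
below uses a hypothetical helper (sk_thread_affinity) and assumes that both
the process and the thread have a CPU mask configured; it is not the code
from the patch.

```c
#include <stdbool.h>

/* Compute the CPU set a thread should be bound to: the intersection of
 * the process mask and the thread mask. Returns false (and leaves *out
 * untouched) when the intersection is empty, meaning no binding is set.
 */
bool sk_thread_affinity(unsigned long proc_mask, unsigned long thread_mask,
                        unsigned long *out)
{
    unsigned long m = proc_mask & thread_mask;

    if (!m)
        return false;
    *out = m;
    return true;
}
```

For instance, a process mapped to CPUs 0-3 with a thread mapped to CPUs 2-5
yields a binding on CPUs 2-3.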

MEDIUM: thread/dns: Make DNS thread-safe

MEDIUM: thread/spoe: Make the SPOE thread-safe

Because there is no migration mechanism yet, all runtime information about an
SPOE agent is thread-local and async exchanges with agents are disabled when we
have several threads. However, pipelining is still available. So for now, the
thread part of the SPOE is pretty simple.

MEDIUM: threads/tasks: Add lock around notifications

This patch adds locks around some notification calls.

MEDIUM: threads/xref: Convert xref function to a thread safe model

Ensure that the unlink is done safely between threads and that
the peer struct will not be destroyed while the peer is in use.

MEDIUM: threads/lua: Cannot access the socket if we try to access it from another thread.

We have two ways of ensuring that the data is not concurrently manipulated:
- locks
- running the task on the same thread.
Locks are expensive, so it is better to avoid them.

This patch checks that the Lua task runs on the same thread as
the stream associated with the coprocess.

TODO: in a next version, the error should be replaced by a yield
and thread migration request.

MEDIUM: threads/lua: Ensure that the launched tasks run on the same thread as me

The applet manipulates the session and its buffers. We have two methods for
ensuring that the memory of the session will not change during its manipulation
by the task:
1 - adding a mutex
2 - running on the same thread as the task.
The second point is smart because it cannot lock the execution of another thread.

MEDIUM: threads/lua: Add locks around the Lua execution parts.

Note that the Lua processing is not really thread safe. It provides a
heavy system which consists in adding our own lock functions to the Lua
code and recompiling the library. This system will probably not be accepted
by the maintainers of various distros.

Our main execution point of Lua is the function lua_resume(). A
quick look at the Lua sources shows a lua_lock() at the start
of the function and a lua_unlock() at the end of the function. So I
conclude that the Lua thread-safe mode just places a mutex around
all execution. So I prefer to do this in the HAProxy code, it will be
easier for distro maintainers.

Note that the HAProxy Lua functions surrounded by the macros SET_SAFE_LJMP
and RESET_SAFE_LJMP manipulate the Lua stack, so care must be taken
to set a mutex around these functions.

MEDIUM: threads/lua: Makes the jmpbuf and some other buffers local to the current thread.

The jmpbuf contains a pointer to the stack memory address in use
when the jmpbuf is set. So the information is local to each thread.

The struct field is too big to put it on the stack, but it is used
as a buffer for retrieving stats values. So, this buffer is local to each
thread. Each function using this buffer uses it without a break (yield),
so the consistency of the local buffer is ensured.

MEDIUM: threads/compression: Make HTTP compression thread-safe

MINOR: threads/filters: Update trace filter to add _per_thread callbacks

MEDIUM: threads/filters: Add init/deinit callback per thread

Now, it is possible to define init_per_thread and deinit_per_thread callbacks to
deal with resource allocation for each thread.

It is the filter's responsibility to deal with concurrency. It is also the
filter's responsibility to know if HAProxy is started with some threads. A good
way to do so is to check the "global.nbthread" value. If it is greater than 1,
then _per_thread callbacks will be called.

MEDIUM: thread/vars: Make vars thread-safe

A RW lock has been added to the vars structure to protect each list of
variables. And a global RW lock is used to protect registered names.

When a variable is fetched, we duplicate sample data because the variable could
be modified by another thread.

MEDIUM: threads/freq_ctr: Make the frequency counters thread-safe

When a frequency counter must be updated, we use the curr_sec/curr_tick fields
as a lock, by setting the MSB to 1 in a compare-and-swap to lock and by resetting
it to unlock. And when we need to read it, we loop until the counter is
unlocked. This way, the frequency counters are thread-safe without any external
lock. It is important to avoid increasing the size of many structures (global,
proxy, server, stick_table).
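
The MSB-as-lock idea can be sketched as follows with the GCC "__atomic"
builtins. This is a simplified model with hypothetical names (sk_freq_ctr,
sk_update_ctr, sk_read_ctr); the rotation into prev_ctr is reduced to the
bare minimum and is an assumption, not the actual freq_ctr code.

```c
#include <stdint.h>

#define SK_LOCK_BIT 0x80000000u     /* MSB of curr_sec used as a lock */

struct sk_freq_ctr {
    uint32_t curr_sec;  /* start of current period, MSB = lock */
    uint32_t curr_ctr;  /* events in the current period */
    uint32_t prev_ctr;  /* events in the previous period */
};

/* Add <inc> events at time <now_sec>. The counter is rotated only when
 * <now_sec> is strictly after the stored period.
 */
void sk_update_ctr(struct sk_freq_ctr *ctr, uint32_t now_sec, uint32_t inc)
{
    uint32_t old;

    now_sec &= ~SK_LOCK_BIT;

    /* lock: CAS from an unlocked value to the same value with the MSB set */
    for (;;) {
        old = __atomic_load_n(&ctr->curr_sec, __ATOMIC_RELAXED) & ~SK_LOCK_BIT;
        if (__atomic_compare_exchange_n(&ctr->curr_sec, &old,
                                        old | SK_LOCK_BIT, 0,
                                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
            break;
    }

    if (now_sec > old) {
        /* keep the last period only if it is the one just before now */
        ctr->prev_ctr = (now_sec - old == 1) ? ctr->curr_ctr : 0;
        ctr->curr_ctr = 0;
        old = now_sec;
    }
    ctr->curr_ctr += inc;

    /* unlock: store the period with the MSB cleared */
    __atomic_store_n(&ctr->curr_sec, old, __ATOMIC_RELEASE);
}

/* Read the current counter, looping while it is locked or changing. */
uint32_t sk_read_ctr(struct sk_freq_ctr *ctr)
{
    uint32_t sec, val;

    do {
        sec = __atomic_load_n(&ctr->curr_sec, __ATOMIC_ACQUIRE);
        val = __atomic_load_n(&ctr->curr_ctr, __ATOMIC_RELAXED);
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
    } while ((sec & SK_LOCK_BIT) ||
             sec != __atomic_load_n(&ctr->curr_sec, __ATOMIC_RELAXED));

    return val;
}
```

Because the lock lives in a bit of an existing field, the structure does
not grow, which is the point made above.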

MAJOR: threads/map: Make acls/maps thread safe

Locks have been added in pat_ref and pattern_expr structures to protect all
accesses to an instance of one of them. Moreover, a global lock has been added
to protect the LRU cache used for pattern matching.

Patterns are now duplicated after a successful match, to avoid modification
by other threads when the result is used.

Finally, the function reloading a pattern list has been modified to be
thread-safe.

MEDIUM: threads/queue: Make queues thread-safe

The list of pending connections are now protected using the proxy or server
lock, depending on the context.

MAJOR: threads/ssl: Make SSL part thread-safe

First, OpenSSL is now initialized to be thread-safe. This is done by setting 2
callbacks. The first one is ssl_locking_function. It handles the locks and
unlocks. The second one is ssl_id_function. It returns the current thread
id. During the init step, we create as many R/W locks as needed, i.e. the
number returned by the CRYPTO_num_locks function.

Next, the reusable SSL session in the server context is now thread-local.

Shctx is now also initialized if HAProxy is started with several threads.

And finally, a global lock has been added to protect the LRU cache used to store
generated certificates. The function ssl_sock_get_generated_cert is now
deprecated because the retrieved certificate can be removed by another thread
at the same time. Instead, a new function has been added,
ssl_sock_assign_generated_cert. It must be used to search for a certificate in
the cache and set it immediately if found.

MEDIUM: threads/stream: Make streams list thread safe

Adds a global lock to protect the full streams list used to dump
sessions on stats socket.

MAJOR: threads/buffer: Make buffer wait queue thread safe

Adds a global lock to protect the buffer wait queue.

MAJOR: threads/peers: Make peers thread safe

A lock is used to protect accesses to a peer structure.

The lock is taken in the applet handler when the peer is identified
and released when leaving the applet handler.

In the scheduling task for peers section, the lock is taken for every
listed peer and released at the end of the process task function.

The peer 'force shutdown' function was also re-worked.

MAJOR: threads/applet: Handle multithreading for applets

A global lock has been added to protect accesses to the list of active
applets. A process mask has also been added on each applet. Like for FDs and
tasks, it is used to know which threads are allowed to process an
applet. Because applets are, most of the time, linked to a session, they should
be sticky on the same thread. But in all cases, it is the responsibility of the
applet handler to lock what has to be protected in the applet context.

MINOR: threads/regex: Change Regex trash buffer into a thread local variable

MEDIUM: threads/http: Make http_capture_bad_message thread-safe

This is done by passing the right stream's proxy (the frontend or the backend,
depending on the context) to lock the error snapshot used to store the error
info.

MINOR: threads/sample: Change temp_smp into a thread local variable

MEDIUM: threads/stick-tables: handle multithreads on stick tables

The stick table API was slightly reworked:

A global spin lock on stick table was added to perform lookup and
insert in a thread safe way. The handling of refcount on entries
is now handled directly by stick tables functions under protection
of this lock and was removed from the code of callers.

The "stktable_store" function is no longer exported and users should
now use "stktable_set_entry" for any insertion. The latter performs
a lookup followed by a store if not found. So the code using "stktable_store"
was re-worked.

Lookup, and set_entry functions automatically increase the refcount
of the returned/stored entry.

The function "sticktable_touch" was renamed "sticktable_touch_local"
and is now able to decrease the refcount if the last arg is set to true. This
allows releasing the entry without taking the lock twice.

A new function "sticktable_touch_remote" is now used to insert
entries coming from remote peers at the right place in the update tree.
The code of peer update was re-worked to use this new function.
This function is also able to decrease the refcount if wanted.

The function "stksess_kill" also handles a parameter to decrease
the refcount on the entry.

A read/write lock is added on each entry to protect the data content
updates of the entry.

MEDIUM: threads/lb: Make LB algorithms (lb_*.c) thread-safe

A lock for LB parameters has been added inside the proxy structure and atomic
operations have been used to update server variables related to LB.

The only significant change is about lb_map. Because the servers' statuses are
updated in the sync-point, we can call the recalc_server_map function
synchronously in the map_set_server_status_up/down functions.

MINOR: threads/server: Add a lock to deal with insert in updates_servers list

This list is used to save changes on the servers' state. So when several threads
are used, it must be locked. The changes are then applied in the sync-point. To
do so, servers_update_status has been moved into the sync-point. So it is useless
to lock it at this step because the sync-point is a protected area by itself.

MEDIUM: threads/server: Add a lock per server and atomically update server vars

The server's lock is used, among other things, to lock access to the active
connection list of a server.

MEDIUM: threads/server: Make connection list (priv/idle/safe) thread-safe

For now, we have a list of each type per thread. So there is no need to lock
them. This is the easiest solution for now, but not the best one because there
is no sharing between threads. An idle connection on a thread cannot be
used by a stream on another thread. So it could be a good idea to rework this
patch later.

MEDIUM: threads/proxy: Add a lock per proxy and atomically update proxy vars

Now, each proxy contains a lock that must be used when necessary to protect
it. Moreover, all proxy's counters are now updated using atomic operations.

MEDIUM: threads/listeners: Make listeners thread-safe

First, we use atomic operations to update jobs/totalconn/actconn variables,
listener's nbconn variable and listener's counters. Then we add a lock on
listeners to protect access to their information. And finally, listener queues
(global and per proxy) are also protected by a lock. Here, because accesses to
these queues are unusual, we use the same lock for all queues instead of a global
one for the global queue and a lock per proxy for others.

MEDIUM: threads/signal: Add a lock to make signals thread-safe

A global lock has been added to protect the signal processing. So when a signal
is triggered, only one thread will catch it.

MAJOR: threads/task: handle multithread on task scheduler

2 global locks have been added to protect, respectively, the run queue and the
wait queue. And a process mask has been added on each task. Like for FDs, this
mask is used to know which threads are allowed to process a task.

For many tasks, all threads are granted. This must be your first intention
when you create a new task, unless you have a good reason to make a task sticky
on some threads. It is then the responsibility of the process callback to lock
what has to be locked in the task context.

Nevertheless, all tasks linked to a session must be sticky on the thread
creating the session. It is important that I/O handlers processing session FDs
and these tasks run on the same thread to avoid conflicts.
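
The mask test itself is a single bitwise AND. The sketch below uses
hypothetical names (sk_task, sk_tid_bit, sk_task_runnable_on,
sk_task_stick_to) to show a task that any thread may run and a task pinned
to one thread; it does not reproduce the scheduler code.

```c
/* One bit per thread: bit N set means thread N may process the task. */
struct sk_task {
    unsigned long thread_mask;
};

/* Bit corresponding to thread <tid>; <tid> must be lower than the number
 * of bits in an unsigned long.
 */
unsigned long sk_tid_bit(unsigned int tid)
{
    return 1UL << tid;
}

/* Non-zero if thread <tid> is allowed to process the task. */
int sk_task_runnable_on(const struct sk_task *t, unsigned int tid)
{
    return (t->thread_mask & sk_tid_bit(tid)) != 0;
}

/* Make the task sticky on a single thread, e.g. the one that created the
 * session the task belongs to.
 */
void sk_task_stick_to(struct sk_task *t, unsigned int tid)
{
    t->thread_mask = sk_tid_bit(tid);
}
```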

WIP: SQUASH WITH SYNC POINT

MINOR: threads/polling: pollers now handle FDs depending on the process mask

MINOR: threads/fd: Process cached events of FDs depending on the process mask

MEDIUM: threads/fd: Initialize the process mask during the call to fd_insert

Listeners will allow any thread to process the corresponding FD. But for other
FDs, we limit the processing to the current thread.

MINOR: threads/fd: Add a mask of threads allowed to process on each fd in fdtab array

MAJOR: threads/fd: Make fd stuffs thread-safe

Many changes have been made to do so. First, the fd_updt array, where all
pending FDs for polling are stored, is now a thread-local array. Then 3 locks
have been added to protect, respectively, the fdtab array, the fd_cache array
and poll information. In addition, a lock for each entry in the fdtab array has
been added to protect all accesses to a specific FD or its information.

For pollers, the way to manage concurrency depends on the poller. There is a
poller loop on each thread, so the set of monitored FDs may need to be
protected. epoll and kqueue are thread-safe per se, so there are few
things to do to protect these pollers. This is not possible with select and
poll, so there is no sharing between the threads. The poller on each thread is
independent from the others.

Finally, per-thread init/deinit functions are used for each poller and for the
FD part to manage thread-local resources.

Now, you must be careful when an FD is created during the HAProxy startup. All
updates of the FD state must be made in the threads' context and never before
their creation. This is mandatory because the fd_updt array is thread-local and
initialized only for threads. Because there is no poller for the main one, this
array remains uninitialized in this context. For this reason, listeners are now
enabled in the run_thread_poll_loop function, just like the worker pipe.

MEDIUM: threads/pool: Make pool thread-safe by locking all access to a pool

A lock has been added for each memory pool. It is used to protect the pool
during allocations and releases. It is also used when pool info are dumped.

MEDIUM: threads/logs: Make logs thread-safe

log buffers and static variables used in log functions are now thread-local. So
there is no need to lock anything to log messages. Moreover, per-thread
init/deinit functions are now used to initialize these buffers.

MEDIUM: threads/time: Many global variables from time.h are now thread-local

MEDIUM: threads/chunks: Transform trash chunks in thread-local variables

So, per-thread init/deinit functions are registered to allocate/release them.

MEDIUM: threads/buffers: Define and register per-thread init/deinit functions

For now, only the swap_buffer is handled in these functions. Moreover,
swap_buffer has been changed to be a thread-local variable.

MINOR: threads: Define the sync-point inside run_poll_loop

The function sync_poll_loop is called at the end of each loop inside the
run_poll_loop function. It is a protected area where all threads have a chance
to execute tricky tasks with the guarantee that no concurrent access is
possible. Of course, it comes with a cost because all threads must be
synchronized. So changes must be uncommon.

MAJOR: threads: Start threads to experiment multithreading

[WARNING] For now, HAProxy is not thread-safe, so from this commit, it will be
broken for a while, when compiled with threads.

When the nbthread parameter is greater than 1, HAProxy will create the
corresponding number of threads. If nbthread is set to 1, nothing should be
done. So if there are concurrency issues (and be sure there will be,
unfortunately), an obvious workaround is to disable the multithreading...

Each created thread will run a polling loop. So, in a certain way, it is pretty
similar to the nbproc mode ("outside" the bugs and the lock
contention). Nevertheless, there are init and deinit steps for each thread
to deal with per-thread allocation.

Each thread has a tid (thread-id), numbered from 0 to (nbthread-1). It is used
in many places to do bitwise operations or to improve debugging information.

MEDIUM: threads: Adds a set of functions to handle sync-point

A sync-point is a protected area where you have the guarantee that no concurrent
access is possible. It is implemented as a thread barrier to enter the
sync-point and another one to exit from it. Inside the sync-point, all threads
that must do some synchronous processing will be called one after the other
while all other threads will wait. All threads will then exit from the
sync-point at the same time.

A sync-point will be evaluated only when necessary because it is a costly
operation. To limit the waiting time of each thread, we must have a mechanism
to wake up all threads. This is done with a pipe shared by all threads. By
writing into this pipe, we will interrupt all threads blocked on a poller. The
pipe is then flushed before exiting from the sync-point.

MINOR: threads: Add nbthread parameter

It is only parsed and initialized for now. It will be used later. This parameter
is only available when support for threads was built in.

MINOR: threads: Add mechanism to register per-thread init/deinit functions

The hap_register_per_thread_init and hap_register_per_thread_deinit functions
have been added to register functions which perform, for each thread,
respectively, some initialization and deinitialization. These functions are
added to the global lists per_thread_init_list and per_thread_deinit_list.

These functions are called only when HAProxy is started with more than 1 thread
(global.nbthread > 1).
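
A minimal sketch of such a registration mechanism is shown below. The types,
names and signatures (per_thread_fct, register_per_thread_init,
run_per_thread_init, sample_init) are assumptions made for the example. The
real functions may take different arguments and use HAProxy's own list
primitives.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types and names; the real signatures may differ. */
struct per_thread_fct {
    int (*fct)(void);              /* returns non-zero on success */
    struct per_thread_fct *next;
};

static struct per_thread_fct *init_list; /* registered init callbacks */

/* Register <fct> to be run by each thread when it starts.
 * Returns non-zero on success, 0 on allocation failure. */
static int register_per_thread_init(int (*fct)(void))
{
    struct per_thread_fct *e = malloc(sizeof(*e));

    if (!e)
        return 0;
    e->fct = fct;
    e->next = init_list;
    init_list = e;
    return 1;
}

/* To be called at the beginning of each thread: run every registered
 * callback and stop at the first failure. */
static int run_per_thread_init(void)
{
    struct per_thread_fct *e;

    for (e = init_list; e; e = e->next)
        if (!e->fct())
            return 0;
    return 1;
}

/* Example callback: counts how many times it was called. */
static int sample_init_calls;

static int sample_init(void)
{
    sample_init_calls++;
    return 1;
}
```

A deinit list would follow the same pattern, with the callbacks run when each
thread exits.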

MEDIUM: threads: Add hathreads header file

This file contains all functions and macros used to deal with concurrency in
HAProxy. It contains all high-level functions to do atomic operations
(HA_ATOMIC_*). Note that, for now, we rely on "__atomic" GCC builtins to do
atomic operations. So HAProxy can be compiled with the thread support iff these
builtins are available.
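
To give an idea of what thin wrappers over the "__atomic" builtins can look
like, here is a hedged sketch. The MY_ATOMIC_* names and the choice of
sequentially consistent ordering are assumptions made for the example. Only the
builtins themselves are real (GCC and Clang), and the actual HA_ATOMIC_* macros
may be defined differently.

```c
#include <assert.h>

/* Hypothetical wrapper names; only the __atomic builtins are real. */

/* Add <v> to *ptr and return the new value. */
#define MY_ATOMIC_ADD(ptr, v) \
    __atomic_add_fetch((ptr), (v), __ATOMIC_SEQ_CST)

/* Atomically read or write *ptr. */
#define MY_ATOMIC_LOAD(ptr) \
    __atomic_load_n((ptr), __ATOMIC_SEQ_CST)
#define MY_ATOMIC_STORE(ptr, v) \
    __atomic_store_n((ptr), (v), __ATOMIC_SEQ_CST)

/* Compare-and-swap: if *ptr equals *expected, store <v> and return
 * non-zero. Otherwise copy the current value into *expected and return 0. */
#define MY_ATOMIC_CAS(ptr, expected, v) \
    __atomic_compare_exchange_n((ptr), (expected), (v), 0, \
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)

/* A counter that several threads could update concurrently. */
static unsigned int counter;
```

Hiding the builtins behind macros keeps the callers unchanged if another
implementation has to be plugged in later.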

It also contains wrappers around plocks to use spin or read/write locks. These
wrappers are used to abstract the internal representation of the locking system
and to add information to help debugging, when compiled with suitable
options.

To add extra info on locks, you need to add the DEBUG=-DDEBUG_THREAD or
DEBUG=-DDEBUG_FULL compilation option. In addition to timing info on locks, we
keep info on where a lock was last acquired (function name, file and
line). There are also the thread id and a flag to know if it is still locked or
not. This will be useful to debug deadlocks.

MINOR: threads: Add atomic-ops and plock includes in import dir

atomic-ops header contains some low-level functions to do atomic
operations. These operations are used by the progressive locks (plock).

MINOR: threads: Add THREAD_LOCAL macro

When compiled with threads support, this macro is set to __thread. Else it is
empty.
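
The definition described above can be sketched as follows. It assumes that the
build defines a USE_THREAD preprocessor symbol when threads are enabled. The
local_counter variable and bump_local_counter function are hypothetical and
only show how the macro is meant to be used.

```c
#include <assert.h>

/* Assumption: USE_THREAD is defined by the build when threads are enabled. */
#ifdef USE_THREAD
#define THREAD_LOCAL __thread
#else
#define THREAD_LOCAL
#endif

/* One copy per thread with threads support, a plain global otherwise. */
static THREAD_LOCAL unsigned int local_counter;

/* Increment the calling thread's own counter and return its new value. */
static unsigned int bump_local_counter(void)
{
    return ++local_counter;
}
```

The same declaration therefore compiles in both builds, without any #ifdef at
the place where the variable is declared.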

MINOR: threads: Prepare makefile to link with pthread

The USE_THREAD option has been added to enable compilation with the experimental
support of threads. Of course, for now, there is nothing. And for a while,
HAProxy will be unstable. When we are confident enough, this option will be
removed.

For this implementation and probably for a while, only the pthread library will
be supported.

MINOR: startup: Extend the scope the MODE_STARTING flag

Now, MODE_STARTING is set at the beginning of the init function and it is removed
just before the polling loop. So more alerts or warnings are saved.

MINOR: cli: Add "show startup-logs" command

This command will dump the whole startup_logs buffer, which contains all alerts
and warnings emitted during HAProxy startup.

MINOR: log: Save alerts and warnings emitted during HAProxy startup

Because we can't always display the standard error messages when HAProxy is
started, all alerts and warnings emitted during the startup will now be saved in
a buffer. It can also be handy to store these messages just in case you
missed something during the startup.

To implement this feature, the Alert and Warning functions now rely on
display_message. They differ only in the conditions under which they call this
function, and these conditions remain unchanged. In display_message, if the
MODE_STARTING flag is set, we save the message.
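
The logic can be sketched as below. Everything here is a simplified assumption:
the starting variable stands in for the MODE_STARTING flag, saved_logs for the
startup_logs buffer, and show_message / my_warning for display_message and
Warning. The buffer sizes are arbitrary and the real code may store messages
differently.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names and sizes; not HAProxy's actual code. */
static int    starting = 1;      /* stands in for the MODE_STARTING flag */
static char   saved_logs[4096];  /* stands in for the startup_logs buffer */
static size_t saved_len;

/* Format a message, print it on stderr, and keep a copy while starting. */
static void show_message(const char *label, const char *fmt, va_list args)
{
    char msg[512];
    size_t room;
    int n;

    vsnprintf(msg, sizeof(msg), fmt, args);
    fprintf(stderr, "[%s] %s", label, msg);

    if (!starting || saved_len >= sizeof(saved_logs) - 1)
        return;

    /* Append to the saved logs, truncating if the buffer is full. */
    room = sizeof(saved_logs) - saved_len - 1;
    n = snprintf(saved_logs + saved_len, room + 1, "[%s] %s", label, msg);
    if (n > 0)
        saved_len += ((size_t)n < room) ? (size_t)n : room;
}

/* Variadic front end, in the spirit of the Warning function. */
static void my_warning(const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    show_message("WARNING", fmt, args);
    va_end(args);
}
```

Once the starting flag is cleared, messages are still printed but no longer
saved, so the buffer only holds what was emitted during startup.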

MINOR: standard: Add memvprintf function

Now memprintf relies on memvprintf. This new function does exactly what
memprintf did before, but it must be called with a va_list instead of a variable
number of arguments. So there is no change for any function using
memprintf. But it is now also possible to get the same functionality from any
function with variadic arguments.
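
The split between a variadic function and its va_list counterpart is a common C
pattern. The sketch below uses hypothetical, simplified functions
(my_memprintf, my_memvprintf) whose behaviour is an assumption: they format
into a newly allocated string and replace *out. The real memprintf and
memvprintf may behave differently in the details.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified versions; the real functions may differ. */

/* Format into a freshly allocated string stored in *out. The previous
 * *out is freed only after formatting, so it may be used as an argument.
 * Returns the new string, or NULL on error (*out is then NULL too). */
static char *my_memvprintf(char **out, const char *fmt, va_list orig)
{
    va_list args;
    char *ret = NULL;
    int len;

    va_copy(args, orig);
    len = vsnprintf(NULL, 0, fmt, args);   /* first pass: compute length */
    va_end(args);

    if (len >= 0 && (ret = malloc((size_t)len + 1)) != NULL) {
        va_copy(args, orig);
        vsnprintf(ret, (size_t)len + 1, fmt, args); /* second pass: write */
        va_end(args);
    }
    free(*out);
    *out = ret;
    return ret;
}

/* Variadic front end: only packs its arguments into a va_list. */
static char *my_memprintf(char **out, const char *fmt, ...)
{
    va_list args;
    char *ret;

    va_start(args, fmt);
    ret = my_memvprintf(out, fmt, args);
    va_end(args);
    return ret;
}
```

Any other variadic function can then call my_memvprintf directly with its own
va_list, which is not possible when only the variadic form exists.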

MINOR: mailers: Use pools to allocate email alerts and its tcpcheck_rules

MEDIUM: mailers: Init alerts during conf parsing and refactor their processing

Email alerts rely on checks to send emails. The link between a mailers section
and a proxy was resolved during the configuration parsing, but initialization was
done when the first alert was triggered. This implied memory allocations and
task creations. With this patch, everything is now initialized during the
configuration parsing. So when an alert is triggered, only the memory required
by this alert is dynamically allocated.

Moreover, alerts processing had a flaw. The task handler used to process alerts
to be sent to the same mailer, process_email_alert, was designed to give back
the control to the scheduler when an alert was sent. So there was a delay
between the sending of 2 consecutive alerts (the min of
"proxy->timeout.connect" and "mailer->timeout.mail"). To fix this problem, now,
we try to process as many queued alerts as possible when the task is woken up.

BUG/MINOR: mailers: Fix a memory leak when email alerts are released

An email alert contains a list of tcpcheck_rule. Each one is dynamically
allocated, just like its internal members. So, when an email alert is freed, we
must be sure to properly free each tcpcheck_rule too.
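
The kind of release loop this implies can be sketched as follows. The
structures are hypothetical and heavily simplified (a single string member and
a singly linked list). The real tcpcheck_rule and email alert structures have
other fields and use HAProxy's list macros.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified structures; the real ones differ. */
struct rule {
    char *string;        /* dynamically allocated member */
    struct rule *next;
};

struct alert {
    struct rule *rules;  /* list of rules owned by the alert */
};

static int freed_rules;  /* only here to make the sketch observable */

/* Release an alert, all its rules, and the members of each rule. */
static void alert_free(struct alert *alert)
{
    struct rule *r, *next;

    if (!alert)
        return;
    for (r = alert->rules; r; r = next) {
        next = r->next;  /* save the link before freeing the rule */
        free(r->string);
        free(r);
        freed_rules++;
    }
    free(alert);
}
```

Freeing only the alert itself would leave every rule and its members
allocated, which is the kind of leak described above.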

This patch must be backported in 1.7 and 1.6.

MAJOR: dns: Refactor the DNS code

This is a huge patch with many changes, all about the DNS. Initially, the idea
was to update the DNS part to ease the threads support integration. But quickly,
I started to refactor some parts. And after several iterations, it was
impossible for me to commit the different parts atomically. So, instead of
adding tens of patches, often reworking the same parts, it was easier to merge
all my changes in a single patch. Here are all changes made on the DNS.

First, the DNS initialization has been refactored. The DNS configuration parsing
remains untouched, in cfgparse.c. But all checks have been moved in a post-check
callback. In the function dns_finalize_config, for each resolvers section, the
nameservers configuration is tested and the task used to manage DNS resolutions
is created. The links between the backend's servers and the resolvers are also
created at this step. Here no connections are kept alive. So there is no need
anymore to reopen them after HAProxy forks. Connections used to send DNS queries
will be opened on demand.

Then, the way DNS requesters are linked to a DNS resolution has been
reworked. The resolution used by a requester is now referenced into the
dns_requester structure and the resolution pointers in server and dns_srvrq
structures have been removed. The wait and curr lists of requesters, for a DNS
resolution, have been replaced by a single list. And finally, the way a requester
is removed from a DNS resolution has been simplified. Now everything is done in
dns_unlink_resolution.

The srv_set_fqdn function has been simplified. Now, there is only 1 way to set
the server's FQDN, whether it is done by the CLI or when a SRV record is
resolved.

The static DNS resolutions pool has been replaced by a dynamic pool. This part
has been modified by Baptiste Assmann.

The way the DNS resolutions are triggered by the task or by a health-check has
been totally refactored. Now, all timeouts are respected, especially
hold.valid. The default frequency to wake up a resolvers section is now
configurable using the "timeout resolve" parameter.

Now, as documented, as long as invalid responses are received, we really wait
for all name servers' responses before retrying.

As far as possible, resources allocated during DNS configuration parsing are
released when HAProxy is shut down.

Besides all these changes, the code has been cleaned to ease code review and the
doc has been updated.

BUG/MINOR: dns: Fix CLI keyword declaration

The cli command to show resolvers stats is in conflict with the command to show
proxies and servers stats. When you use the command "show stat resolvers [id]",
instead of printing stats about resolvers, you get the stats about all proxies
and servers.

Now, to avoid conflict, to print resolvers stats, you must use the following
command:

show resolvers [id]

This patch must be backported in 1.7.