BUG/MEDIUM: threads: Run the poll loop on the main thread too
There was a flaw in the way the threads was created. the main one was just used
to create all the others and just wait to exit. Now, it is used to run a poll
loop. So we only create nbthread-1 threads.
This also fixes a bug about the compression filter when there is only 1 thread
(nbthread == 1 or no threads support). The bug was in the way thread-local
resources was initialized. per-thread init/deinit callbacks were never called
for the main process. So, with nthread set to 1, some buffers remained
uninitialized.
MINOR: threads: Don't start when device a detection module is used
For now, we don't know if device detection modules (51degrees, deviceatlas and
wurfl) are thread-safe or not. So HAproxy exits with an error when you try to
use one of them with nbthread greater than 1.
We will ask to maintainers of these modules to make them thread-safe or to give
us hints to do so.
MINOR: threads/checks: Set the task process_mask when a check is executed
Tasks used to process checks are created to be processed by any threads. But,
once a check is started, we must be sure to be sticky on the running thread
because I/O will be also sticky on it. This is a requirement for now: Tasks and
I/O handlers linked to the same session must be executed on the same thread.
MINOR: threads: Add thread-map config parameter in the global section
By default, no affinity is set for threads. To bind threads on CPU, you must
define a "thread-map" in the global section. The format is the same than the
"cpu-map" parameter, with a small difference. The process number must be
defined, with the same format than cpu-map ("all", "even", "odd" or a number
between 1 and 31/63).
A thread will be bound on the intersection of its mapping and the one of the
process on which it is attached. If the intersection is null, no specific bind
will be set for the thread.
Because there is not migration mechanism yet, all runtime information about an
SPOE agent are thread-local and async exchanges with agents are disabled when we
have serveral threads. Howerver, pipelining is still available. So for now, the
thread part of the SPOE is pretty simple.
MEDIUM: threads/lua: Cannot acces to the socket if we try to access from another thread.
We have two y for nsuring that the data is not concurently manipulated:
- locks
- running task on the same thread.
locks are expensives, it is better to avoid it.
This patch cecks that the Lua task run on the same thread that
the stream associated to the coprocess.
TODO: in a next version, the error should be replaced by a yield
and thread migration request.
MEDIUM: threads/lua: Ensure that the launched tasks runs on the same threads than me
The applet manipulates the session and its buffers. We have two methods for
ensuring that the memory of the session will not change during its manipulation
by the task:
1 - adding mutex
2 - running on the same threads than the task.
The second point is smart because it cannot lock the execution of another thread.
MEDIUM: threads/lua: Add locks around the Lua execution parts.
Note that the Lua processing is not really thread safe. It provides
heavy system which consists to add our own lock function in the Lua
code and recompile the library. This system will probably not accepted
by maintainers of various distribs.
Our main excution point of the Lua is the function lua_resume(). A
quick looking on the Lua sources displays a lua_lock() a the start
of function and a lua_unlock() at the end of the function. So I
conclude that the Lua thread safe mode just perform a mutex around
all execution. So I prefer to do this in the HAProxy code, it will be
easier for distro maintainers.
Note that the HAProxy lua functions rounded by the macro SET_SAFE_LJMP
and RESET_SAFE_LJMP manipulates the Lua stack, so it will be careful
to set mutex around these functions.
MEDIUM: threads/lua: Makes the jmpbuf and some other buffers local to the current thread.
The jmpbuf contains pointer on the stack memory address currently use
when the jmpbuf is set. So the information is local to each thread.
The struct field is too big to put it in the stack, but it is used
as buffer for retriving stats values. So, this buffer si local to each
threads. Each function using this buffer, use it whithout break (yield)
so, the consistency of local buffer is ensured.
MEDIUM: threads/filters: Add init/deinit callback per thread
Now, it is possible to define init_per_thread and deinit_per_thread callbacks to
deal with ressources allocation for each thread.
This is the filter responsibility to deal with concurrency. This is also the
filter responsibility to know if HAProxy is started with some threads. A good
way to do so is to check "global.nbthread" value. If it is greater than 1, then
_per_thread callbacks will be called.
MEDIUM: threads/freq_ctr: Make the frequency counters thread-safe
When a frequency counter must be updated, we use the curr_sec/curr_tick fields
as a lock, by setting the MSB to 1 in a compare-and-swap to lock and by reseting
it to unlock. And when we need to read it, we loop until the counter is
unlocked. This way, the frequency counters are thread-safe without any external
lock. It is important to avoid increasing the size of many structures (global,
proxy, server, stick_table).
locks have been added in pat_ref and pattern_expr structures to protect all
accesses to an instance of on of them. Moreover, a global lock has been added to
protect the LRU cache used for pattern matching.
Patterns are now duplicated after a successfull matching, to avoid modification
by other threads when the result is used.
Finally, the function reloading a pattern list has been modified to be
thread-safe.
Emeric Brun [Thu, 15 Jun 2017 14:37:39 +0000 (16:37 +0200)]
MAJOR: threads/ssl: Make SSL part thread-safe
First, OpenSSL is now initialized to be thread-safe. This is done by setting 2
callbacks. The first one is ssl_locking_function. It handles the locks and
unlocks. The second one is ssl_id_function. It returns the current thread
id. During the init step, we create as much as R/W locks as needed, ie the
number returned by CRYPTO_num_locks function.
Next, The reusable SSL session in the server context is now thread-local.
Shctx is now also initialized if HAProxy is started with several threads.
And finally, a global lock has been added to protect the LRU cache used to store
generated certificates. The function ssl_sock_get_generated_cert is now
deprecated because the retrieved certificate can be removed by another threads
in same time. Instead, a new function has been added,
ssl_sock_assign_generated_cert. It must be used to search a certificate in the
cache and set it immediatly if found.
Emeric Brun [Mon, 19 Jun 2017 10:38:55 +0000 (12:38 +0200)]
MAJOR: threads/applet: Handle multithreading for applets
A global lock has been added to protect accesses to the list of active
applets. A process mask has also been added on each applet. Like for FDs and
tasks, it is used to know which threads are allowed to process an
applet. Because applets are, most of time, linked to a session, it should be
sticky on the same thread. But in all cases, it is the responsibility of the
applet handler to lock what have to be protected in the applet context.
Emeric Brun [Thu, 15 Jun 2017 09:30:06 +0000 (11:30 +0200)]
MEDIUM: threads/http: Make http_capture_bad_message thread-safe
This is done by passing the right stream's proxy (the frontend or the backend,
depending on the context) to lock the error snapshot used to store the error
info.
Emeric Brun [Tue, 13 Jun 2017 17:37:32 +0000 (19:37 +0200)]
MEDIUM: threads/stick-tables: handle multithreads on stick tables
The stick table API was slightly reworked:
A global spin lock on stick table was added to perform lookup and
insert in a thread safe way. The handling of refcount on entries
is now handled directly by stick tables functions under protection
of this lock and was removed from the code of callers.
The "stktable_store" function is no more externalized and users should
now use "stktable_set_entry" in any case of insertion. This last one performs
a lookup followed by a store if not found. So the code using "stktable_store"
was re-worked.
Lookup, and set_entry functions automatically increase the refcount
of the returned/stored entry.
The function "sticktable_touch" was renamed "sticktable_touch_local"
and is now able to decrease the refcount if last arg is set to true. It
is allowing to release the entry without taking the lock twice.
A new function "sticktable_touch_remote" is now used to insert
entries coming from remote peers at the right place in the update tree.
The code of peer update was re-worked to use this new function.
This function is also able to decrease the refcount if wanted.
The function "stksess_kill" also handle a parameter to decrease
the refcount on the entry.
A read/write lock is added on each entry to protect the data content
updates of the entry.
MEDIUM: threads/lb: Make LB algorithms (lb_*.c) thread-safe
A lock for LB parameters has been added inside the proxy structure and atomic
operations have been used to update server variables releated to lb.
The only significant change is about lb_map. Because the servers status are
updated in the sync-point, we can call recalc_server_map function synchronously
in map_set_server_status_up/down function.
MINOR: threads/server: Add a lock to deal with insert in updates_servers list
This list is used to save changes on the servers state. So when serveral threads
are used, it must be locked. The changes are then applied in the sync-point. To
do so, servers_update_status has be moved in the sync-point. So this is useless
to lock it at this step because the sync-point is a protected area by iteself.
MEDIUM: threads/server: Make connection list (priv/idle/safe) thread-safe
For now, we have a list of each type per thread. So there is no need to lock
them. This is the easiest solution for now, but not the best one because there
is no sharing between threads. An idle connection on a thread will not be able
be used by a stream on another thread. So it could be a good idea to rework this
patch later.
MEDIUM: threads/proxy: Add a lock per proxy and atomically update proxy vars
Now, each proxy contains a lock that must be used when necessary to protect
it. Moreover, all proxy's counters are now updated using atomic operations.
MEDIUM: threads/listeners: Make listeners thread-safe
First, we use atomic operations to update jobs/totalconn/actconn variables,
listener's nbconn variable and listener's counters. Then we add a lock on
listeners to protect access to their information. And finally, listener queues
(global and per proxy) are also protected by a lock. Here, because access to
these queues are unusal, we use the same lock for all queues instead of a global
one for the global queue and a lock per proxy for others.
MAJOR: threads/task: handle multithread on task scheduler
2 global locks have been added to protect, respectively, the run queue and the
wait queue. And a process mask has been added on each task. Like for FDs, this
mask is used to know which threads are allowed to process a task.
For many tasks, all threads are granted. And this must be your first intension
when you create a new task, else you have a good reason to make a task sticky on
some threads. This is then the responsibility to the process callback to lock
what have to be locked in the task context.
Nevertheless, all tasks linked to a session must be sticky on the thread
creating the session. It is important that I/O handlers processing session FDs
and these tasks run on the same thread to avoid conflicts.
Many changes have been made to do so. First, the fd_updt array, where all
pending FDs for polling are stored, is now a thread-local array. Then 3 locks
have been added to protect, respectively, the fdtab array, the fd_cache array
and poll information. In addition, a lock for each entry in the fdtab array has
been added to protect all accesses to a specific FD or its information.
For pollers, according to the poller, the way to manage the concurrency is
different. There is a poller loop on each thread. So the set of monitored FDs
may need to be protected. epoll and kqueue are thread-safe per-se, so there few
things to do to protect these pollers. This is not possible with select and
poll, so there is no sharing between the threads. The poller on each thread is
independant from others.
Finally, per-thread init/deinit functions are used for each pollers and for FD
part for manage thread-local ressources.
Now, you must be carefull when a FD is created during the HAProxy startup. All
update on the FD state must be made in the threads context and never before
their creation. This is mandatory because fd_updt array is thread-local and
initialized only for threads. Because there is no pollers for the main one, this
array remains uninitialized in this context. For this reason, listeners are now
enabled in run_thread_poll_loop function, just like the worker pipe.
log buffers and static variables used in log functions are now thread-local. So
there is no need to lock anything to log messages. Moreover, per-thread
init/deinit functions are now used to initialize these buffers.
MINOR: threads: Define the sync-point inside run_poll_loop
The function sync_poll_loop is called at the end of each loop inside
run_poll_loop function. It is a protected area where all threads have a chance
to execute tricky tasks with the warranty that no concurrent access is
possible. Of course, it comes with a cost because all threads must be
syncrhonized. So changes must be uncommon.
MAJOR: threads: Start threads to experiment multithreading
[WARNING] For now, HAProxy is not thread-safe, so from this commit, it will be
broken for a while, when compiled with threads.
When nbthread parameter is greater than 1, HAProxy will create the corresponding
number of threads. If nbthread is set to 1, nothing should be done. So if there
are concurrency issues (and be sure there will be, unfortunatly), an obvious
workaround is to disable the multithreading...
Each created threads will run a polling loop. So, in a certain way, it is pretty
similar to the nbproc mode ("outside" the bugs and the lock
contention). Nevertheless, there are an init and a deinit steps for each thread
to deal with per-thread allocation.
Each thread has a tid (thread-id), numbered from 0 to (nbtread-1). It is used in
many place to do bitwise operations or to improve debugging information.
MEDIUM: threads: Adds a set of functions to handle sync-point
A sync-point is a protected area where you have the warranty that no concurrency
access is possible. It is implementated as a thread barrier to enter in the
sync-point and another one to exit from it. Inside the sync-point, all threads
that must do some syncrhonous processing will be called one after the other
while all other threads will wait. All threads will then exit from the
sync-point at the same time.
A sync-point will be evaluated only when necessary because it is a costly
operation. To limit the waiting time of each threads, we must have a mechanism
to wakeup all threads. This is done with a pipe shared by all threads. By
writting in this pipe, we will interrupt all threads blocked on a poller. The
pipe is then flushed before exiting from the sync-point.
MINOR: threads: Add mechanism to register per-thread init/deinit functions
hap_register_per_thread_init and hap_register_per_thread_deinit functions has
been added to register functions to do, for each thread, respectively, some
initialization and deinitialization. These functions are added in the global
lists per_thread_init_list and per_thread_deinit_list.
These functions are called only when HAProxy is started with more than 1 thread
(global.nbthread > 1).
This file contains all functions and macros used to deal with concurrency in
HAProxy. It contains all high-level function to do atomic operation
(HA_ATOMIC_*). Note, for now, we rely on "__atomic" GCC builtins to do atomic
operation. So HAProxy can be compiled with the thread support iff these builtins
are available.
It also contains wrappers around plocks to use spin or read/write locks. These
wrappers are used to abstract the internal representation of the locking system
and to add information to help debugging, when compiled with suitable
options.
To add extra info on locks, you need to add DEBUG=-DDEBUG_THREAD or
DEBUG=-DDEBUG_FULL compilation option. In addition to timing info on locks, we
keep info on where a lock was acquired the last time (function name, file and
line). There are also the thread id and a flag to know if it is still locked or
not. This will be useful to debug deadlocks.
Emeric Brun [Mon, 26 Jun 2017 16:41:42 +0000 (18:41 +0200)]
MINOR: threads: Prepare makefile to link with pthread
USE_THREAD option has been added to enable the compilation with the experimental
support of threads . Of course for now, there is nothing. And for a while,
HAProxy will be unstable. When we will be confident enough, this option will be
removed.
For this implementation and probably for a while, only the pthread library will
be supported.
MINOR: log: Save alerts and warnings emitted during HAProxy startup
Because we can't always display the standard error messages when HAProxy is
started, all alerts and warnings emitted during the startup will now be saved in
a buffer. It can also be handy to store these messages just in case you
missed something during the startup
To implement this feature, Alert and Warning functions now relies on
display_message. The difference is just on conditions to call this function and
it remains unchanged. In display_message, if MODE_STARTING flag is set, we save
the message.
Now memprintf relies on memvprintf. This new function does exactly what
memprintf did before, but it must be called with a va_list instead of a variable
number of arguments. So there is no change for every functions using
memprintf. But it is now also possible to have same functionnality from any
function with variadic arguments.
MEDIUM: mailers: Init alerts during conf parsing and refactor their processing
Email alerts relies on checks to send emails. The link between a mailers section
and a proxy was resolved during the configuration parsing, But initialization was
done when the first alert is triggered. This implied memory allocations and
tasks creations. With this patch, everything is now initialized during the
configuration parsing. So when an alert is triggered, only the memory required
by this alert is dynamically allocated.
Moreover, alerts processing had a flaw. The task handler used to process alerts
to be sent to the same mailer, process_email_alert, was designed to give back
the control to the scheduler when an alert was sent. So there was a delay
between the sending of 2 consecutives alerts (the min of
"proxy->timeout.connect" and "mailer->timeout.mail"). To fix this problem, now,
we try to process as much queued alerts as possible when the task is woken up.
BUG/MINOR: mailers: Fix a memory leak when email alerts are released
An email alert contains a list of tcpcheck_rule. Each one is dynamically
allocated, just like its internal members. So, when an email alerts is freed, we
must be sure to properly free each tcpcheck_rule too.
This is a huge patch with many changes, all about the DNS. Initially, the idea
was to update the DNS part to ease the threads support integration. But quickly,
I started to refactor some parts. And after several iterations, it was
impossible for me to commit the different parts atomically. So, instead of
adding tens of patches, often reworking the same parts, it was easier to merge
all my changes in a uniq patch. Here are all changes made on the DNS.
First, the DNS initialization has been refactored. The DNS configuration parsing
remains untouched, in cfgparse.c. But all checks have been moved in a post-check
callback. In the function dns_finalize_config, for each resolvers, the
nameservers configuration is tested and the task used to manage DNS resolutions
is created. The links between the backend's servers and the resolvers are also
created at this step. Here no connection are kept alive. So there is no needs
anymore to reopen them after HAProxy fork. Connections used to send DNS queries
will be opened on demand.
Then, the way DNS requesters are linked to a DNS resolution has been
reworked. The resolution used by a requester is now referenced into the
dns_requester structure and the resolution pointers in server and dns_srvrq
structures have been removed. wait and curr list of requesters, for a DNS
resolution, have been replaced by a uniq list. And Finally, the way a requester
is removed from a DNS resolution has been simplified. Now everything is done in
dns_unlink_resolution.
srv_set_fqdn function has been simplified. Now, there is only 1 way to set the
server's FQDN, independently it is done by the CLI or when a SRV record is
resolved.
The static DNS resolutions pool has been replaced by a dynamoc pool. The part
has been modified by Baptiste Assmann.
The way the DNS resolutions are triggered by the task or by a health-check has
been totally refactored. Now, all timeouts are respected. Especially
hold.valid. The default frequency to wake up a resolvers is now configurable
using "timeout resolve" parameter.
Now, as documented, as long as invalid repsonses are received, we really wait
all name servers responses before retrying.
As far as possible, resources allocated during DNS configuration parsing are
releases when HAProxy is shutdown.
Beside all these changes, the code has been cleaned to ease code review and the
doc has been updated.
The cli command to show resolvers stats is in conflict with the command to show
proxies and servers stats. When you use the command "show stat resolvers [id]",
instead of printing stats about resolvers, you get the stats about all proxies
and servers.
Now, to avoid conflict, to print resolvers stats, you must use the following
command:
MEDIUM: spoe/rules: Process "send-spoe-group" action
The messages processing is done using existing functions. So here, the main task
is to find the SPOE engine to use. To do so, we loop on all filter instances
attached to the stream. For each, we check if it is a SPOE filter and, if yes,
if its name is the one used to declare the "send-spoe-group" action.
We also take care to return an error if the action processing is interrupted by
HAProxy (because of a timeout or an error at the HAProxy level). This is done by
checking if the flag ACT_FLAG_FINAL is set.
The function spoe_send_group is the action_ptr callback ot
MINOR: spoe: Add a type to qualify the message list during encoding
Because we can have messages chained by event or by group, we need to have a way
to know which kind of list we manipulate during the encoding. So 2 types of list
has been added, SPOE_MSGS_BY_EVENT and SPOE_MSGS_BY_GROUP. And the right type is
passed when spoe_encode_messages is called.
MEDIUM: spoe/rules: Add "send-spoe-group" action for tcp/http rules
This action is used to trigger sending of a group of SPOE messages. To do so,
the SPOE engine used to send messages must be defined, as well as the SPOE group
to send. Of course, the SPOE engine must refer to an existing SPOE filter. If
not engine name is provided on the SPOE filter line, the SPOE agent name must be
used. For example:
http-request send-spoe-group my-engine some-group
This action is available for "tcp-request content", "tcp-response content",
"http-request" and "http-response" rulesets. It cannot be used for tcp
connection/session rulesets because actions for these rulesets cannot yield.
For now, the action keyword is parsed and checked. But it does nothing. Its
processing will be added in another patch.
MEDIUM: spoe: Parse new "spoe-group" section in SPOE config file
For now, this section is only parsed. It should have the following format:
spoe-group <grp-name>
messages <msg-name> ...
And then SPOE groups must be referenced in spoe-agent section:
spoe-agnt <name>
...
groups <grp-name> ...
The purpose of these groups is to trigger messages sending from TCP or HTTP
rules, directly from HAProxy configuration, and not on specific event. This part
will be added in another patch.
It is important to note that a message belongs at most to a group.
MINOR: spoe: Check uniqness of SPOE engine names during config parsing
The engine name is now kept in "spoe_config" struture. Because a SPOE filter can
be declared without engine name, we use the SPOE agent name by default. Then,
its uniqness is checked against all others SPOE engines configured for the same
proxy.
MEDIUM: spoe: Add support of ACLS to enable or disable sending of SPOE messages
Now, it is possible to conditionnaly send a SPOE message by adding an ACL-based
condition on the "event" line, in a "spoe-message" section. Here is the example
coming for the SPOE documentation:
To avoid mixin with proxy's ACLs, each SPOE message has its private ACL list. It
possible to declare named ACLs in "spoe-message" section, using the same syntax
than for proxies. So we can rewrite the previous example to use a named ACL:
MINOR: action: Add a functions to check http capture rules
"check_http_req_capture" and "check_http_res_capture" functions have been added
to check validity of "http-request capture" and "http-response capture"
rules. Code for these functions come from cfgparse.c.
BUG/MINOR: spoa: Update pointer on the end of the frame when a reply is encoded
The same buffer is used for a request and its response. So we need to be sure
to correctly reset info when the response is encoded. And here there was a
bug. The pointer on the end of the frame was not updated. So it was not
possible to encode a response bigger than the corresponding request.
BUG/MINOR: spoe: Don't compare engine name and SPOE scope when both are NULL
SPOE filter can be declared without engine name. This is an optional
parameter. But in this case, no scope must be used in the SPOE configuration
file. So engine name and scope are both undefined, and, obviously, we must not
try to compare them.
Willy Tarreau [Tue, 31 Oct 2017 07:02:24 +0000 (08:02 +0100)]
MINOR: h1: store the status code in the H1 message
It was painful not to have the status code available, especially when
it was computed. Let's store it and ensure we don't claim content-length
anymore on 1xx, only 0 body bytes.
This patch reorganize the shctx API in a generic storage API, separating
the shared SSL session handling from its core.
The shctx API only handles the generic data part, it does not know what
kind of data you use with it.
A shared_context is a storage structure allocated in a shared memory,
allowing its usage in a multithread or a multiprocess context.
The structure use 2 linked list, one containing the available blocks,
and another for the hot locked blocks. At initialization the available
list is filled with <maxblocks> blocks of size <blocksize>. An <extra>
space is initialized outside the list in case you need some specific
storage.
The API allows to store content on several linked blocks. For example,
if you allocated blocks of 16 bytes, and you want to store an object of
60 bytes, the object will be allocated in a row of 4 blocks.
The API was made for LRU usage, each time you get an object, it pushes
the object at the end of the list. When it needs more space, it discards
The functions name have been renamed in a more logical way, the part
regarding shctx have been prefixed by shctx_ and the functions for the
shared ssl session cache have been prefixed by sh_ssl_sess_.
Move the ssl callback functions of the ssl shared session cache to
ssl_sock.c. The shctx functions still needs to be separated of the ssl
tree and data.
Willy Tarreau [Mon, 30 Oct 2017 18:31:59 +0000 (19:31 +0100)]
MEDIUM: h1: ensure that 1xx, 204 and 304 don't have a payload body
It's important for the H2 to H1 gateway that the response parser properly
clears the H1 message's body_len when seeing these status codes so that we
don't hang waiting to transfer data that will not come.
Olivier Houchard [Fri, 27 Oct 2017 12:58:08 +0000 (14:58 +0200)]
MINOR: ssl: Don't abuse ssl_options.
A bind_conf does contain a ssl_bind_conf, which already has a flag to know
if early data are activated, so use that, instead of adding a new flag in
the ssl_options field.
MINOR: ssl/proto_http: Add keywords to take care of early data.
Add a new sample fetch, "ssl_fc_has_early", a boolean that will be true
if early data were sent, and a new action, "wait-for-handshake", if used,
the request won't be forwarded until the SSL handshake is done.