Baptiste Assmann [Mon, 21 Jan 2019 07:34:50 +0000 (08:34 +0100)]
MINOR: action: new '(http-request|tcp-request content) do-resolve' action
The 'do-resolve' action is an http-request or tcp-request content action
which allows to run DNS resolution at run time in HAProxy.
The name to be resolved can be picked up in the request sent by the
client and the result of the resolution is stored in a variable.
The time the resolution is being performed, the request is on pause.
If the resolution can't provide a suitable result, then the variable
will be empty. It's up to the admin to take decisions based on this
statement (return 503 to prevent loops).
Read carefully the documentation concerning this feature, to ensure your
setup is secure and safe to be used in production.
This patch creates a global counter to track various errors reported by
the action 'do-resolve'.
Baptiste Assmann [Tue, 30 Jan 2018 07:08:04 +0000 (08:08 +0100)]
MINOR: dns: move callback affection in dns_link_resolution()
In dns.c, dns_link_resolution(), each type of dns requester is managed
separately, that said, the callback function is affected globaly (and
points to server type callbacks only).
This design prevents the addition of new dns requester type and this
patch aims at fixing this limitation: now, the callback setting is done
directly into the portion of code dedicated to each requester type.
Baptiste Assmann [Mon, 21 Jan 2019 07:18:09 +0000 (08:18 +0100)]
MINOR: dns: dns_requester structures are now in a memory pool
dns_requester structure can be allocated at run time when servers get
associated to DNS resolution (this happens when SRV records are used in
conjunction with service discovery).
Well, this memory allocation is safer if managed in an HAProxy pool,
furthermore with upcoming HTTP action which can perform DNS resolution
at runtime.
This patch moves the memory management of the dns_requester structure
into its own pool.
This is dummy version of the Scientiamobile WURFL C API that can be used
to successfully build/run haproxy compiled with USE_WURFL=1.
It is marked as version 1.11.2.100 to distinguish it from any real version
of the lib. It has no external dependencies so it should work out of the
box by building it like this :
$ make -C contrib/wurfl
In order to use it, simply reference this directory as the WURFL include
and library paths :
$ make TARGET=<target> USE_WURFL=1 WURFL_INC=$PWD/contrib/wurfl WURFL_LIB=$PWD/contrib/wurfl
Initially excluded multithreaded mode is completely supported (libwurfl is fully MT safe).
Internal tests now are run also with multithreading enabled.
last 2 major releases of libwurfl included a complete review of engine options with
the result of deprecating many features. The patch removes unecessary code and fixes
the documentation.
Can be backported on any version of haproxy.
[wt: must not be backported since it removes config keywords and would
thus break existing configurations] Signed-off-by: Willy Tarreau <w@1wt.eu>
MINOR: gcc: Fix a silly gcc warning in connect_server()
Don't know why it happens now, but gcc seems to think srv_conn may be NULL when
a reused connection is removed from the orphan list. It happens when HAProxy is
compiled with -O2 with my gcc (8.3.1) on fedora 29... Changing a little how
reuse parameter is tested removes the warnings. So...
BUG/MINOR: da: Get the request channel to call CHECK_HTTP_MESSAGE_FIRST()
Since the commit 89dc49935 ("BUG/MAJOR: http_fetch: Get the channel depending on
the keyword used"), the right channel must be passed as argument when the macro
CHECK_HTTP_MESSAGE_FIRST is called.
BUG/MINOR: 51d: Get the request channel to call CHECK_HTTP_MESSAGE_FIRST()
Since the commit 89dc49935 ("BUG/MAJOR: http_fetch: Get the channel depending on
the keyword used"), the right channel must be passed as argument when the macro
CHECK_HTTP_MESSAGE_FIRST is called.
BUG/MEDIUM: stream: Don't request a server connection if a shutw was scheduled
If a shutdown for writes was performed on the client side (CF_SHUTW is set on
the request channel) while the server connection is still unestablished (the
stream-int is in the state SI_ST_INI), then it is aborted. It must also be
aborted when the shudown for write is pending (only CF_SHUTW_NOW is
set). Otherwise, some errors on the request channel can be ignored, leaving the
stream in an undefined state.
This patch must be backported to 1.9. It may probably be backported to all
suported versions, but it is unclear if the bug is visbile for older versions
than 1.9. So it is probably safer to wait bug reports on these versions to
backport this patch.
BUG/MEDIUM: thread/http: Add missing locks in set-map and add-acl HTTP rules
Locks are missing in the rules "http-request set-map" and "http-response
add-acl" when an acl or map update is performed. Pattern elements must be
locked.
This patch must be backported to 1.9 and 1.8. For the 1.8, the HTX part must be
ignored.
BUG/MEDIUM: h1: Don't parse chunks CRLF if not enough data are available
As specified in the function comment, the function h1_skip_chunk_crlf() must not
change anything and return zero if not enough data are available. This must
include the case where there is no data at all. On this point, it must do the
same that other h1 parsing functions. This bug is made visible since the commit 91f77d599 ("BUG/MINOR: mux-h1: Process input even if the input buffer is
empty").
MINOR: proto_tcp: tcp-request content: enable set-dst and set-dst-var
The set-dst and set dst-var are available at both 'tcp-request
connection' and 'http-request' but not at the layer in the middle.
This patch fixes this miss and enables both set-dst and set-dst-var at
'tcp-request content' layer.
checks/s00001.vtc needs support for "srvrecord" which came with 1.8 version.
peers/s_basic_sync.vtc and s_tls_basic_sync.vtc need support for "server"
keyword usage in "peers" section which came with 2.0 version.
BUG/MINOR: acl: properly detect pattern type SMP_T_ADDR
Since 1.6-dev4 with commit b2f8f087f ("MINOR: map: The map can return
IPv4 and IPv6"), maps can return both IPv4 and IPv6 addresses, which
is represented as SMP_T_ADDR at the output of the map converter. But
the ACL parser only checks for either SMP_T_IPV4 or SMP_T_IPV6 and
requires to see an explicit matching method specified. Given that it
uses the same pattern parser for both address families, it implicitly
is also compatible with SMP_T_ADDR, which ought to have been added
there.
BUG/MEDIUM: maps: only try to parse the default value when it's present
Maps returning an IP address (e.g. map_str_ip) support an optional
default value which must be parsed. Unfortunately the parsing code does
not check for this argument's existence and uncondtionally tries to
resolve the argument whenever the output is of type address, resulting
in segfaults at parsing time when no such argument is provided. This
patch adds the appropriate check.
MEDIUM: connections: Add a way to control the number of idling connections.
As by default we add all keepalive connections to the idle pool, if we run
into a pathological case, where all client don't do keepalive, but the server
does, and haproxy is configured to only reuse "safe" connections, we will
soon find ourself having lots of idling, unusable for new sessions, connections,
while we won't have any file descriptors available to create new connections.
To fix this, add 2 new global settings, "pool_low_ratio" and "pool_high_ratio".
pool-low-fd-ratio is the % of fds we're allowed to use (against the maximum
number of fds available to haproxy) before we stop adding connections to the
idle pool, and destroy them instead. The default is 20. pool-high-fd-ratio is
the % of fds we're allowed to use (against the maximum number of fds available
to haproxy) before we start killing idling connection in the event we have to
create a new outgoing connection, and no reuse is possible. The default is 25.
Make sure it builds with OpenSSL < 1.1.0, a lot of the BIO_get/set methods
were introduced with OpenSSL 1.1.0, so fallback with the old way of doing
things if needed.
Instead of letting the OpenSSL code handle the file descriptor directly,
provide a custom BIO, that will use the underlying XPRT to send/recv data.
This will let us implement QUIC later, and probably clean the upper layer,
if/when the SSL code provide its own subscribe code, so that the upper layers
won't have to care if we're still waiting for the handshake to complete or not.
Olivier Houchard [Thu, 21 Mar 2019 17:27:17 +0000 (18:27 +0100)]
MEDIUM: connections: Provide a xprt_ctx for each xprt method.
For most of the xprt methods, provide a xprt_ctx. This will be useful later
when we'll want to be able to stack xprts.
The init() method now has to create and provide the said xprt_ctx if needed.
Olivier Houchard [Thu, 21 Mar 2019 15:30:07 +0000 (16:30 +0100)]
MEDIUM: ssl: provide its own subscribe/unsubscribe function.
In order to prepare for the possibility of using different kinds of xprt
with ssl, make the ssl code provide its own subscribe and unsubscribe
functions, right now it just calls conn_subscribe and conn_unsubsribe.
Olivier Houchard [Tue, 26 Feb 2019 17:37:15 +0000 (18:37 +0100)]
MEDIUM: ssl: Give ssl_sock its own context.
Instead of using directly a SSL * as xprt_ctx, give ssl_sock its own context.
It's useless for now, but will be useful later when we'll want to be able to
stack xprts.
MEDIUM: tasks: Use __ha_barrier_store after modifying global_tasks_mask.
Now that we no longer use atomic operations to update global_tasks_mask,
as it's always modified while holding the TASK_RQ_LOCK, we have to use
__ha_barrier_store() instead of __ha_barrier_atomic_store() to ensure
any modification of global_tasks_mask is seen before modifying
active_tasks_mask.
MEDIUM: tasks: Don't account a destroyed task as a runned task.
In process_runnable_tasks(), if the task we're about to run has been
destroyed, and should be free, don't account for it in the number of task
we ran. We're only allowed a maximum number of tasks to run per call to
process_runnable_tasks(), and freeing one shouldn't take the slot of a
valid task.
MEDIUM: tasks: Merge task_delete() and task_free() into task_destroy().
task_delete() was never used without calling task_free() just after, and
task_free() was only used on error pathes to destroy a just-created task,
so merge them into task_destroy(), that will remove the task from the
wait queue, and make sure the task is either destroyed immediately if it's
not in the run queue, or destroyed when it's supposed to run.
MINOR: task/thread: factor out a wake-up condition
The wakeup condition in task_wakeup() is redundant as it is already
validated by the CAS. Better move the __task_wakeup() call there, it
also has the merit of being easier to audit this way. This also reduces
the code size by around 1.8 kB :
BUG/MAJOR: task: make sure never to delete a queued task
Commit 0c7a4b6 ("MINOR: tasks: Don't set the TASK_RUNNING flag when
adding in the tasklet list.") revealed a hole in the way tasks may
be freed : they could be removed while in the run queue when the
TASK_QUEUED flag was present but not the TASK_RUNNING one. But it
seems the issue was emphasized by commit cde7902 ("MEDIUM: tasks:
improve fairness between the local and global queues") though the
code it replaces was already affected given how late the TASK_RUNNING
flag was set after removal from the global queue.
At the moment the task is picked from the global run queue, if it
is the last one, the global run queue lock is dropped, and then
the TASK_RUNNING flag was added. In the mean time another thread
might have performed a task_free(), and immediately after, the
TASK_RUNNING flag was re-added to the task, which was then added
to the tasklet list. The unprotected window was extremely faint
but does definitely exist and inconsistent task lists have been
observed a few times during very intensive tests over the last few
days. From this point various options are possible, the task might
have been re-allocated while running, and assigned state 0 and/or
state QUEUED while it was still running, resulting in the tast not
being put back into the tree.
This commit simply makes sure that tests on TASK_RUNNING before removing
the task also cover TASK_QUEUED.
It must be backported to 1.9 along with the previous ones touching
that area.
When deciding if we want to wake the task of an applet up, don't give up
if task_in_rq returns 1, as there's a race condition and another thread
may run it. Instead, always attempt to task_wakeup(), at worst the task
is already in the run queue, and nothing will happen.
MINOR: tasks: Don't set the TASK_RUNNING flag when adding in the tasklet list.
Now that TASK_QUEUED is enforced, there's no need to set TASK_RUNNING when
removing the task from the runqueue to add it to the tasklet list. The flag
will only be set right before we run the task.
MEDIUM: tasks: No longer use rq.node.leaf_p as a lock.
Now that we have the warranty that a task won't be added in the runqueue
while the TASK_QUEUED or the TASK_RUNNING flag is set, don't bother trying
to lock the task by setting leaf_p to 0x1 while inserting it in the runqueue
or having it in the tasklet_list, as nobody else will attempt to add it.
BUG/MEDIUM: tasks: Make sure we modify global_tasks_mask with the rq_lock.
When modifying global_tasks_mask, make sure we hold the rq_lock, or we might
remove the bit while it has been re-set by somebody else, and we make not
be waked when needed.
BUG/MEDIUM: tasks: Make sure we set TASK_QUEUED before adding a task to the rq.
Make sure we set TASK_QUEUED in every case before adding the task to the
run queue. task_wakeup() now checks if either TASK_QUEUED or TASK_RUNNING
is set, and if neither is set, add TASK_QUEUED and effectively add the task
to the runqueue.
No longer use __task_wakeup() anywhere except in task_wakeup(), always use
task_wakeup() instead.
With the old code, process_runnable_task() may re-add a task in the runqueue
without setting the TASK_QUEUED flag, and there were race conditions that could
lead to a task having the TASK_QUEUED flag but not in the runqueue, thus
being unschedulable.
BUG/MINOR: http_fetch/htx: Use HTX versions if the proxy enables the HTX mode
Because the HTX is now the default mode for all proxies (HTTP and TCP), it is
better to match on the proxy options to know if the HTX is enabled or not. This
way, if a TCP proxy explicitly disables the HTX mode, the legacy version of HTTP
fetches will be used.
No backport needed except if the patch activating the HTX by default for all
proxies is backported.
BUG/MINOR: http_fetch/htx: Allow permissive sample prefetch for the HTX
As for smp_prefetch_http(), there is now a way to successfully perform a
prefetch in HTX, even if the message forwarding already begun. It is used for
the sample fetches "req.proto_http" and "method".
BUG/MAJOR: http_fetch: Get the channel depending on the keyword used
All HTTP samples are buggy because the channel tested in the prefetch functions
(HTX and legacy HTTP) is chosen depending on the sample direction and not the
keyword really used. It means the request channel is used if the sample is
called during the request analysis and the response channel is used if it is
called during the response analysis, regardless the sample really called. For
instance, if you use the sample "req.ver" in an http-response rule, the response
channel will be prefeched because it is called during the response analysis,
while the request channel should have been used instead. So some assumptions on
the validity of the sample may be made on the wrong channel. It is the first
bug.
Then the same error is done in some samples themselves. So fetches are performed
on the wrong channel. For instance, the header extraction (req.fhdr, res.fhdr,
req.hdr, res.hdr...). If the sample "req.hdr" is used in an http-response rule,
then the matching is done on the response headers and not the request ones. It
is the second bug.
Finally, the last one but not the least, in some samples, the right channel is
used. But because the prefetch was done on the wrong one, this channel may be in
a undefined state. For instance, using the sample "req.ver" in an http-response
rule leads to a matching on a posibility released buffer.
To fix all these bugs, the right channel is now chosen in sample fetches, before
the prefetch. If the same function is used to fetch requests and responses
elements, then the keyword is used to choose the right one. This channel is then
used by the functions smp_prefetch_htx() and smp_prefetch_http(). Of course, it
is also used by the samples themselves to extract information.
This patch must be backported to all supported versions. For version 1.8 and
priors, it must be totally refactored. First because there is no HTX into these
versions. Then the buffers API has changed in HAProxy 1.9. The files
http_fetch.{ch} doesn't exist on old versions.
It avoids a roundtrip with underlying I/O callbacks to do so. If a read0 is
handled at the end of h1_rcv_pipe(), the flag CS_FL_REOS is set on the
conn_stream. And if there is no data in the pipe, the flag CS_FL_EOS is also
set.
BUG/MEDIUM: mux-h1: Enable TCP splicing to exchange data only
Use the TCP splicing only when the input parser is in the state H1_MSG_DATA or
H1_MSG_TUNNEL and don't transfer more than then known expected length for these
data (unlimited for the tunnel mode). In other states or when all data are
transferred, the TCP splicing is disabled.
BUG/MEDIUM: mux-h1: Notify the stream waiting for TCP splicing if ibuf is empty
When a stream-interface want to use the TCP splicing to forward its data, it
notifies the mux h1. We will then flush the input buffer and don't read more
data. So the stream-interface will not be notified for read anymore, except if
an error or a read0 is detected. It is a problem everytime the receive I/O
callback is called again. It happens when the pipe is full or when no data are
received on the pipe. It also happens when the input buffer is freshly
flushed. Because the TCP splicing is enabled, nothing is done in h1_recv() and
the stream-interface is never woken up. So, now, in h1_recv(), if the TCP
splicing is used and the input buffer is empty, the stream-interface is notified
for read.
BUG/MINOR: mux-h1: Process input even if the input buffer is empty
It is required, at least, to add the EOM block and finish the message when the
TCP splicing was used to send all data. Otherwise, there is no way to finish the
parsing.
BUG/MINOR: mworker: ensure that we still quits with SIGINT
Since the fix "BUG/MINOR: mworker: don't exit with an ambiguous value"
we are leaving with a EXIT_SUCCESS upon a SIGINT.
We still need to quit with a SIGINT when a worker leaves with a SIGINT.
This is done this way because vtest expect a 130 during the process
stop, haproxy without mworker returns a 130, so it should be the same in
mworker mode.
This should be backported in 1.9, with the previous patch ("BUG/MINOR:
mworker: don't exit with an ambiguous value").
Code has moved, mworker_catch_sigchld() is in haproxy.c.
BUG/MINOR: mworker: don't exit with an ambiguous value
When the sigchld handler is called and waitpid() returns -1,
the behavior of waitpid() with the status variable is undefined.
It is not a good idea to exit with the value contained in it.
Since this exit path does not use the exitcode variable, it means that
this is an expected and successful exit.
This should be backported in 1.9, code has moved,
mworker_catch_sigchld() is in haproxy.c.
BUG/MINOR: mworker: mworker_kill should apply on every children
Commit 3f12887 ("MINOR: mworker: don't use children variable anymore")
introduced a regression.
The previous behavior was to send a signal to every children, whether or
not they are former children. Instead of this, we only send a signal to
the current children, so we don't try to kill -INT or -TERM all
processes during a reload.
BUG/MINOR: listener/mq: correctly scan all bound threads under low load
When iterating on the CLI using "show activity" and no other load, it
was visible that the last thread was always skipped. This was caused by
the way the thread bits were walking : t1 was updated after t2 to make
sure it never equals t2 (thus it skips t2), and in case of a tie we
choose t1. This results in the chosen thread never to equal t2 unless
the other ones already have one connection. In addition to this, t2 was
recalulated upon each pass due to the fact that only the 31th bit was
looked at instead of looking at the t2'th bit.
This patch fixes this by updating t2 after t1 so that t1 is free to
walk over all positions under equal load. No measurable performance
gains are expected from this though, but it at least removes one
strange indicator which could lead to some suspicion.
MINOR: init: add a "set-dumpable" global directive to enable core dumps
It's always a pain to get a core dump when enabling user/group setting
(which disables the dumpable flag on Linux), when using a chroot and/or
when haproxy is started by a service management tool which requires
complex operations to just raise the core dump limit.
This patch introduces a new "set-dumpable" global directive to work
around these troubles by doing the following :
- remove file size limits (equivalent of ulimit -f unlimited)
- remove core size limits (equivalent of ulimit -c unlimited)
- mark the process dumpable again (equivalent of suid_dumpable=1)
Some of these will depend on the operating system. This way it becomes
much easier to retrieve a core file. Temporarily moving the chroot to
a user-writable place generally enough.
This option is already the default, but its opposite 'no option
start-on-reload' allows the master to keep a previous instance of a
program and don't start a new one upon a reload.
The old program will then appear as a current one in "show proc" and
could also trigger an exit-on-failure upon a segfault.
MEDIUM: mworker: store the leaving state of a process
Previously we were assuming than a process was in a leaving state when
its number of reload was greater than 0. With mworker programs it's not
the case anymore so we need to store a leaving state.
BUG/MAJOR: lb/threads: fix insufficient locking on round-robin LB
Maksim Kupriianov reported very strange crashes in fwrr_update_position()
which didn't make sense because of an apparent divide overflow except that
the value was not null in the core.
It happens that while the locking is correct in all the functions' call
graph, the uppermost one (fwrr_get_next_server()) incorrectly expected
that its target server was already locked when called. This stupid
assumption causd the server lock not to be held when calling the other
ones, explaining how it was possible to change the server's eweight by
calling srv_lb_commit_status() under the server lock yet collide with
its unprotected usage.
This commit makes sure that fwrr_get_server_from_group() retrieves a
locked server and that fwrr_get_next_server() is responsible for
unlocking the server before returning it. There is one subtlety in
this function which is that it builds a list of avoided servers that
were full while scanning the tree, and all of them are queued in a
full state so they must be unlocked upon return.
Many thanks to Maksim for providing detailed info allowing to narrow
down this bug.
This fix must be backported to 1.9. In 1.8 the lock seems much wider
and changes to the server's state are performed under the rendez-vous
point so this it doesn't seem possible that it happens there.
MINOR: peers: Add a new command to the CLI for peers.
Implements "show peers [peers section]" new CLI command to dump information
about the peers and their stick-tables to be synchronized and others internal.
BUILD: htx: fix a used uninitialized warning on is_cookie2
gcc-3.4 reports this which actually looks like a valid warning when
looking at the code, it's unsure why others didn't notice it :
src/proto_htx.c: In function `htx_manage_server_side_cookies':
src/proto_htx.c:4266: warning: 'is_cookie2' might be used uninitialized in this function
BUILD: address a few cases of "static <type> inline foo()"
Older compilers don't like to see "inline" placed after the type in a
function declaration, it must be "static inline <type>" only. This
patch touches various areas. The warnings were seen with gcc-3.4.
BUG/MEDIUM: Threads: Only use the gcc >= 4.7 builtins when using gcc >= 4.7.
Move the definition of the various _HA_ATOMIC_* macros that use
__atomic_* in the #if GCC_VERSION >= 4.7, not just after it, so that we
can build with older versions of gcc again.
BUG/MEDIUM: h2: Revamp the way send subscriptions works.
Instead of abusing the SUB_CALL_UNSUBSCRIBE flag, revamp the H2 code a bit
so that it just checks if h2s->sending_list is empty to know if the tasklet
of the stream_interface has been waken up or not.
send_wait is now set to NULL in h2_snd_buf() (ideally we'd set it to NULL
as soon as we're waking the tasklet, but it can't be done, because we still
need it in case we have to remove the tasklet from the task list).
BUG/MEDIUM: h2: Make sure we're not already in the send_list in h2_subscribe().
In h2_subscribe(), don't add ourself to the send_list if we're already in it.
That may happen if we try to send and fail twice, as we're only removed
from the send_list if we managed to send data, to promote fairness.
Failing to do so can lead to either an infinite loop, or some random crashes,
as we'd get the same h2s in the send_list twice.
BUG/MEDIUM: muxes: Make sure we unsubcribed when destroying mux ctx.
In the h1 and h2 muxes, make sure we unsubscribed before destroying the
mux context.
Failing to do so will lead in a segfault later, as the connection will
attempt to dereference its conn->send_wait or conn->recv_wait, which pointed
to the now-free'd mux context.
BUILD: cli/threads: fix build in single-threaded mode
Commit a8f57d51a ("MINOR: cli/activity: report the accept queue sizes
in "show activity"") broke the single-threaded build because the
accept-rings are not implemented there. Let's ifdef this out. Ideally
we should start to think about always having such elements initialized
even without threads to improve the test coverage.
BUILD: task/thread: fix single-threaded build of task.c
As expected, commit cde7902ac ("MEDIUM: tasks: improve fairness between
the local and global queues") broke the build with threads disabled,
and I forgot to rerun this test before committing. No backport is
needed.
Whenever HAProxy was reloaded with rotated keys, the resumption would be
broken for previous encryption key. The bug was introduced with the addition
of 80 byte keys in 9e7547 (MINOR: ssl: add support of aes256 bits ticket keys
on file and cli.).
BUG/MEDIUM: map: Fix memory leak in the map converter
The allocated trash chunk is not freed properly and causes a memory leak
exhibited as the growth in the trash pool allocations. Bug was introduced
in commit 271022 (BUG/MINOR: map: fix map_regm with backref).
This should be backported to all branches where the above commit was
backported.
MINOR: tasks: restore the lower latency scheduling when niced tasks are present
In the past we used to reduce the number of tasks consulted at once when
some niced tasks were present in the run queue. This was dropped in 1.8
when the scheduler started to take batches. With the recent fixes it now
becomes possible to restore this behaviour which guarantees a better
latency between tasks when niced tasks are present. Thanks to this, with
the default number of 200 for tune.runqueue-depth, with a parasitic load
of 14000 requests per second, nice 0 gives 14000 rps, nice 1024 gives
12000 rps and nice -1024 gives 16000 rps. The amplitude widens if the
runqueue depth is lowered.
MEDIUM: tasks: only base the nice offset on the run queue depth
The offset calculated for the nice value used to be wrong for a long
time and got even worse when the improved multi-thread sheduler was
implemented because it continued to rely on the run queue size, which
become irrelevant given that we extract tasks in batches, so the run
queue size moves following a sawtooth form.
However the offsets much better reflects insertion positions in the
queue, so it's worth dropping this rq_size component of the equation.
Last point, due to the batches made of runqueue-depth entries at once,
the higher the depth, the lower the effect of the nice setting since
values are picked together in batches and placed into a list. An
intuitive approach consists in multiplying the nice value with the
batch size to allow tasks to participate to a different batch. And
experimentation shows that this works pretty well.
With a runqueue-depth of 16 and a parasitic load of 16000 requests
per second on 100 streams, a default nice of 0 shows 16000 requests
per second for nice 0, 22000 for nice -1024 and 10000 for nice 1024.
The difference is even bigger with a runqueue depth of 5. At 200
however it's much smoother (16000-22000).
MEDIUM: tasks: improve fairness between the local and global queues
Tasks allowed to run on multiple threads, as well as those scheduled by
one thread to run on another one pass through the global queue. The
local queues only see tasks scheduled by one thread to run on itself.
The tasks extracted from the global queue are transferred to the local
queue when they're picked by one thread. This causes a priority issue
because the global tasks experience a priority contest twice while the
local ones experience it only once. Thus if a tasks returns still
running, it's immediately reinserted into the local run queue and runs
much faster than the ones coming from the global queue.
Till 1.9 the tasks going through the global queue were mostly :
- health checks initialization
- queue management
- listener dequeue/requeue
These ones are moderately sensitive to unfairness so it was not that
big an issue.
Since 2.0-dev2 with the multi-queue accept, tasks are scheduled to
remote threads on most accept() and it becomes fairly visible under
load that the accept slows down, even for the CLI.
This patch remedies this by consulting both the local and the global
run queues in parallel and by always picking the task whose deadline
is the earliest. This guarantees to maintain an excellent fairness
between the two queues and removes the cascade effect experienced
by the global tasks.
Now the CLI always continues to respond quickly even in presence of
expensive tasks running for a long time.
This patch may possibly be backported to 1.9 if some scheduling issues
are reported but at this time it doesn't seem necessary.
This one hasn't been used anymore since the scheduler changes after 1.8
but it kept being exported and maintained up to date while it's always
reset when scanning the trees. Let's stop exporting it and updating it.
BUG/MEDIUM: muxes: Don't dereference mux context if null in release functions
When a mux context is released, we must be sure it exists before dereferencing
it. The bug was introduced in the commit 39a96ee16 ("MEDIUM: muxes: Be prepared
to don't own connection during the release").
No need to backport this patch, expect if the commit 39a96ee16 is backported
too.
REGTEST: Use HTX by default and add '--no-htx' option to disable it
Because the HTX is now the default mode in HAProxy, it is also enabled by
default in reg-tests. The option '--use-htx' is still available, but deprecated
and have concretly no effect. To run reg-tests with the legacy HTTP mode, you
should use the option '--no-htx'.
MAJOR: htx: Enable the HTX mode by default for all proxies
The legacy HTTP mode is no more the default one. So now, by default, without any
option in your configuration, all proxies will use the HTX mode. The line
"option http-use-htx" in proxy sections are now useless, except to cancel the
legacy HTTP mode. To fallback on legacy HTTP mode, you should use the line "no
option http-use-htx" explicitly.
Note that the reg-tests still work by default on legacy HTTP mode. The HTX will
be enabled by default in a futur commit.
MAJOR: muxes/htx: Handle inplicit upgrades from h1 to h2
The upgrade is performed when an H2 preface is detected when the first request
on a connection is parsed. The CS is destroyed by setting EOS flag on it. A
special flag is added on the HTX message to warn the HTX analyzers the stream
will be closed because of an upgrade. This way, no error and no log are
emitted. When the mux h1 is released, we create a mux h2, without any CS and
passing the buffer with the unparsed H2 preface.
MAJOR: proxy/htx: Handle mux upgrades from TCP to HTTP in HTX mode
It is now possible to upgrade TCP streams to HTX when an HTTP backend is set for
a TCP frontend (both with the HTX enabled). So concretely, in such case, an
upgrade is performed from the mux pt to the mux h1. The current CS and the
channel's buffer are used to initialize the mux h1.
MEDIUM: htx: Allow the option http-use-htx to be used on TCP proxies too
This will be mandatory to allow upgrades from TCP to HTTP in HTX. Of course, raw
buffers will still be used by default on TCP proxies, this option sets or
not. But if you want to handle mux upgrades from a TCP proxy, you must enable
the HTX on it and on all its backends.
There is only a small change in the lua code. Because TCP proxies can be HTX
aware, to exclude TCP services only for HTTP proxies, we must also check the
mode (TCP/HTTP) now.
MEDIUM: connection: Add conn_upgrade_mux_fe() to handle mux upgrades
This function will handle mux upgrades, for frontend connections only. It will
retrieve the best mux in the same way than conn_install_mux_fe except that the
mode and optionnally the proto are forced.
The new multiplexer is initialized using a new context and a specific input
buffer. Then, the old one is destroyed. If an error occurred, everything is
rolled back.
MEDIUM: muxes: Be prepared to don't own connection during the release
This happens during mux upgrades. In such case, when the destroy() callback is
called, the connection points to a different mux's context than the one passed
to the callback. It means the connection is owned by another mux. The old mux is
then released but the connection is not closed.
MINOR: muxes: Pass the context of the mux to destroy() instead of the connection
It is mandatory to handle mux upgrades, because during a mux upgrade, the
connection will be reassigned to another multiplexer. So when the old one is
destroyed, it does not own the connection anymore. Or in other words, conn->ctx
does not point to the old mux's context when its destroy() callback is
called. So we now rely on the multiplexer context do destroy it instead of the
connection.
In addition, h1_release() and h2_release() have also been updated in the same
way.
MEDIUM: muxes: Add an optional input buffer during mux initialization
The mux's callback init() now take a pointer to a buffer as extra argument. It
must be used by the multiplexer as its input buffer. This buffer is always NULL
when a multiplexer is initialized with a fresh connection. But if a mux upgrade
is performed, it may be filled with existing data. Note that, for now, mux
upgrades are not supported. But this commit is mandatory to do so.