Willy Tarreau [Mon, 20 Nov 2017 14:37:13 +0000 (15:37 +0100)]
BUG/MAJOR: stream: ensure analysers are always called upon close
A recent issue affecting HTTP/2 + redirect + cache has uncovered an old
problem affecting all existing versions regarding the way events are
reported to analysers.
It happens that when an event is reported, analysers see it and may
decide to temporarily pause processing and prevent other analysers from
processing the same event. Then the event may be cleared and upon the
next call to the analysers, some of them will never see it.
This is exactly what happens with CF_READ_NULL if it is received before
the request is processed, like during redirects : the first time, some
analysers see it, pause, then the event may be converted to a SHUTW and
cleared, and on next call, there's nothing to process. In practice it's
hard to get the CF_READ_NULL flag during the request because requests
have CF_READ_DONTWAIT, preventing the read0 from happening. But on
HTTP/2 it's presented along with any incoming request. Also on a TCP
frontend the flag is not set and it's possible to read the NULL before
the request is parsed.
This causes a problem when filters are present because flt_end_analyse
needs to be called to release allocated resources and remove the
CF_FLT_ANALYZE flag. And the loss of this event prevents the analyser
from being called and from removing itself, preventing the connection
from ever ending.
This problem just shows that the event processing needs a serious revamp
after 1.8. In the mean time we can deal with the really problematic case
which is that we *want* to call analysers if CF_SHUTW is set on any side
ad it's the last opportunity to terminate a processing. It may
occasionally result in some analysers being called for nothing in half-
closed situations but it will take care of the issue.
An example of problematic configuration triggering the bug in 1.7 is :
This fix must be backported to 1.7 as well as any version where commit c0c672a ("BUG/MINOR: http: Fix conditions to clean up a txn and to
handle the next request") was backported. This commit didn't cause the
bug but made it much more likely to happen.
Willy Tarreau [Sat, 18 Nov 2017 14:39:10 +0000 (15:39 +0100)]
BUG/MEDIUM: stream: don't automatically forward connect nor close
Upon stream instanciation, we used to enable channel auto connect
and auto close to ease TCP processing. But commit 9aaf778 ("MAJOR:
connection : Split struct connection into struct connection and
struct conn_stream.") has revealed that it was a bad idea because
this commit enables reading of the trailing shutdown that may follow
a small requests, resulting in a read and a shutr turned into shutw
before the stream even has a chance to apply the filters. This
causes an issue with impossible situations where the backend stream
interface is still in SI_ST_INI with a closed output, which blocks
some streams for example when performing a redirect with filters
enabled.
Let's change this so that we only enable these two flags if there is
no analyser on the stream. This way process_stream() has a chance to
let the analysers decide whether or not to allow the shutdown event
to be transferred to the other side.
It doesn't seem possible to trigger this issue before 1.8, so for now
it is preferable not to backport this fix.
Willy Tarreau [Sun, 19 Nov 2017 08:55:29 +0000 (09:55 +0100)]
[RELEASE] Released version 1.8-rc4
Released version 1.8-rc4 with the following main changes :
- BUG/MEDIUM: cache: does not cache if no Content-Length
- BUILD: thread/pipe: fix build without threads
- BUG/MINOR: spoe: check buffer size before acquiring or releasing it
- MINOR: debug/flags: Add missing flags
- MINOR: threads: Use __decl_hathreads to declare locks
- BUG/MINOR: buffers: Fix b_alloc_margin to be "fonctionnaly" thread-safe
- BUG/MAJOR: ebtree/scope: fix insertion and removal of duplicates in scope-aware trees
- BUG/MAJOR: ebtree/scope: fix lookup of next node in scope-aware trees
- MINOR: ebtree/scope: add a function to find next node from a parent
- MINOR: ebtree/scope: simplify the lookup functions by using eb32sc_next_with_parent()
- BUG/MEDIUM: mworker: Fix re-exec when haproxy is started from PATH
- BUG/MEDIUM: cache: use msg->sov to forward header
- MINOR: cache: forward data with headers
- MINOR: cache: disable cache if shctx_row_data_append fail
- BUG/MINOR: threads: tid_bit must be a unsigned long
- CLEANUP: tasks: Remove useless double test on rq_next
- BUG/MEDIUM: standard: itao_str/idx and quote_str/idx must be thread-local
- MINOR: tools: add a function to dump a scope-aware tree to a file
- MINOR: tools: improve the DOT dump of the ebtree
- MINOR: tools: emphasize the node being worked on in the tree dump
- BUG/MAJOR: ebtree/scope: properly tag upper nodes during insertion
- DOC: peers: Add a first version of peers protocol v2.1.
- CONTRIB: Wireshark dissector for HAProxy Peer Protocol.
- MINOR: mworker: display an accurate error when the reexec fail
- BUG/MEDIUM: mworker: wait again for signals when execvp fail
- BUG/MEDIUM: mworker: does not deinit anymore
- BUG/MEDIUM: mworker: does not close inherited FD
- MINOR: tests: add a python wrapper to test inherited fd
- BUG/MINOR: Allocate the log buffers before the proxies startup
- MINOR: tasks: Use a bitfield to track tasks activity per-thread
- MAJOR: polling: Use active_tasks_mask instead of tasks_run_queue
- MINOR: applets: Use a bitfield to track applets activity per-thread
- MAJOR: polling: Use active_appels_mask instead of applets_active_queue
- MEDIUM: applets: Don't process more than 200 active applets at once
- MINOR: stream: Add thread-mask of tasks/FDs/applets in "show sess all" command
- MINOR: SSL: Store the ASN1 representation of client sessions.
- MINOR: ssl: Make sure we don't shutw the connection before the handshake.
- BUG/MEDIUM: deviceatlas: ignore not valuable HTTP request data
David Carlier [Fri, 17 Nov 2017 08:47:25 +0000 (08:47 +0000)]
BUG/MEDIUM: deviceatlas: ignore not valuable HTTP request data
A customer reported a crash when within the HTTP request some headers
were not set leading to the module to crash. So the module ignore them
since empty data have no value for the detection.
Needs to be backported to 1.7.
Olivier Houchard [Thu, 16 Nov 2017 16:42:52 +0000 (17:42 +0100)]
MINOR: SSL: Store the ASN1 representation of client sessions.
Instead of storing the SSL_SESSION pointer directly in the struct server,
store the ASN1 representation, otherwise, session resumption is broken with
TLS 1.3, when multiple outgoing connections want to use the same session.
MEDIUM: applets: Don't process more than 200 active applets at once
Now, we process at most 200 active applets per call to applet_run_active. We use
the same limit as the tasks. With the cache filter and the SPOE, the number of
active applets can now be huge. So, it is important to limit the number of
applets processed in applet_run_active.
MAJOR: polling: Use active_appels_mask instead of applets_active_queue
applets_active_queue is the active queue size. It is a global variable. So it is
underoptimized because we may be lead to consider there are active applets for a
thread while in fact all active applets are assigned to the otherthreads. So, in
such cases, the polling loop will be evaluated many more times than necessary.
Instead, we now check if the thread id is set in the bitfield active_applets_mask.
This is specific to threads, no backport is needed.
MINOR: applets: Use a bitfield to track applets activity per-thread
a bitfield has been added to know if there are runnable applets for a
thread. When an applet is woken up, the bits corresponding to its thread_mask
are set. When all active applets for a thread is get to be processed, the thread
is removed from active ones by unsetting its tid_bit from the bitfield.
MAJOR: polling: Use active_tasks_mask instead of tasks_run_queue
tasks_run_queue is the run queue size. It is a global variable. So it is
underoptimized because we may be lead to consider there are active tasks for a
thread while in fact all active tasks are assigned to the other threads. So, in
such cases, the polling loop will be evaluated many more times than necessary.
Instead, we now check if the thread id is set in the bitfield active_tasks_mask.
Another change has been made in process_runnable_tasks. Now, we always limit the
number of tasks processed to 200.
This is specific to threads, no backport is needed.
MINOR: tasks: Use a bitfield to track tasks activity per-thread
a bitfield has been added to know if there are runnable tasks for a thread. When
a task is woken up, the bits corresponding to its thread_mask are set. When all
tasks for a thread have been evaluated without any wakeup, the thread is removed
from active ones by unsetting its tid_bit from the bitfield.
BUG/MINOR: Allocate the log buffers before the proxies startup
Since the commit cd7879adc ("BUG/MEDIUM: threads: Run the poll loop on the main
thread too"), the log buffers are allocated after the proxies startup. So log
messages produced during this startup was ignored.
To fix the bug, we restore the initialization of these buffers before proxies
startup.
This is specific to threads, no backport is needed.
Does not use the deinit() function during a reload, it's dangerous and
might be subject to double free, segfault and hazardous behavior if
it's called twice in the case of a execvp fail.
BUG/MEDIUM: mworker: wait again for signals when execvp fail
After execvp fails, the signals were ignored, preventing to try a reload
again. It is now fixed by reaching the top of the mworker_wait()
function once the execvp failed.
Willy Tarreau [Wed, 15 Nov 2017 18:38:29 +0000 (19:38 +0100)]
BUG/MAJOR: ebtree/scope: properly tag upper nodes during insertion
Christopher found a case where some tasks would remain unseen in the run
queue and would spontaneously appear after certain apparently unrelated
operations performed by the other thread.
It's in fact the insertion which is not correct, the node serving as the
top of duplicate tree wasn't properly updated, just like the each top of
subtree in a duplicate tree. This had the effect that after some removals,
the incorrectly tagged node would hide the underlying ones, which would
then suddenly re-appear once they were removed.
Willy Tarreau [Wed, 15 Nov 2017 17:51:29 +0000 (18:51 +0100)]
MINOR: tools: emphasize the node being worked on in the tree dump
Now we can show in dotted red the node being removed or surrounded in red
a node having been inserted, and add a description on the graph related to
the operation in progress for example.
Willy Tarreau [Wed, 15 Nov 2017 16:49:54 +0000 (17:49 +0100)]
MINOR: tools: improve the DOT dump of the ebtree
Use a smaller and cleaner fixed font, use upper case to indicate sides on
branches, remove the useless node/leaf markers on branches since the colors
already indicate them, and show the node's key as it helps spot the matching
leaf.
MINOR: cache: disable cache if shctx_row_data_append fail
Disable the cache if the append of data failed, it should never happen
because the allocated row size is at least equal to the size of the
object to allocate.
Use msg->sov to forward headers instead of msg->eoh. It can causes some
problem because eoh does not contains the last \r\n, and the filter does
not support to send the headers partially.
Tim Duesterhus [Sun, 12 Nov 2017 16:39:18 +0000 (17:39 +0100)]
BUG/MEDIUM: mworker: Fix re-exec when haproxy is started from PATH
If haproxy is started using the name of the binary only (i.e.
not using a relative or absolute path) the `execv` in
`mworker_reload` fails with `ENOENT`, because it does not
examine the `PATH`:
[WARNING] 315/161139 (7) : Reexecuting Master process
[WARNING] 315/161139 (7) : Cannot allocate memory
[WARNING] 315/161139 (7) : Failed to reexecute the master processs [7]
The error messages are misleading, because the return value of
`execv` is not checked. This should be fixed in a separate commit.
Once this happened the master process ignores any further
signals sent by the administrator.
Replace `execv` with `execvp` to establish the expected
behaviour.
Willy Tarreau [Mon, 13 Nov 2017 18:13:06 +0000 (19:13 +0100)]
MINOR: ebtree/scope: add a function to find next node from a parent
Several parts of the code need to access the next node but don't start
from a node but a tagged parent link. Even eb32sc_next() does this.
Let's provide this function to prepare a cleanup for the lookup function.
Willy Tarreau [Mon, 13 Nov 2017 17:55:44 +0000 (18:55 +0100)]
BUG/MAJOR: ebtree/scope: fix lookup of next node in scope-aware trees
The eb32sc_walk_down_left() function needs to be able to go up when
it doesn't find a matching entry because this situation may always
happen, especially when fixing two constraints (scope + value). It
also happens after certain removal situations where some bits remain
on some intermediary nodes in the tree.
In addition, the algorithm for deciding to take the right branch is
wrong as it would take it if the current node shows a scope that
doesn't matchthe required one.
The current code is flakey in that it returns NULL when the bottom
has been reached and it's up to the caller to visit other nodes above.
In addition to being complex it's not reliable, and it was noticed a
few times that some tasks could remain lying in the tree after heavy
insertion/removals under multi-threaded workloads.
Now instead we make eb32sc_walk_down_left() visit the leftmost branch
that matches the scope, and automatically go up to visit the closest
matching right branch. This effectively does the same operations as a
next() operation but in reverse order (down then up instead of up then
down).
The eb32sc_next() function now becomes very simple again and matches
the original one, and the initial issues cannot be met anymore.
No backport is needed, this is purely 1.8-specific.
Willy Tarreau [Mon, 13 Nov 2017 15:16:09 +0000 (16:16 +0100)]
BUG/MAJOR: ebtree/scope: fix insertion and removal of duplicates in scope-aware trees
Commit ca30839 and following ("MINOR: ebtree: implement the scope-aware
functions for eb32") improperly dealt with the scope in duplicate trees.
The insertion was too lenient in that it would always mark the whole
rightmost chain below the insertion point, and the removal could leave
marks of non-existing scopes causing next()/first() to visit the wrong
branch and return NULL.
For insertion, we must only tag the nodes between the head of the dup
tree and the insertion point which is the top of the lowest subtree. For
removal, the new scope must be be calculated by oring the scopes of the
two new branches and is irrelevant to the previous values.
No backport is needed, this is purely 1.8-specific.
BUG/MINOR: buffers: Fix b_alloc_margin to be "fonctionnaly" thread-safe
b_alloc_margin is, strickly speeking, thread-safe. It will not crash
HAproxy. But its contract is not respected anymore in a multithreaded
environment. In this function, we need to be sure to have <margin> buffers
available in the pool after the allocation. So to have this guarantee, we must
lock the memory pool during all the operation. This also means, we must call
internal and lockless memory functions (prefixed with '__').
For the record, this patch fixes a pernicious bug happens after a soft reload
where some streams can be blocked infinitly, waiting for a buffer in the
buffer_wq list. This happens because, during a soft reload, pool_gc2 is called,
making some calls to b_alloc_fast fail.
This is specific to threads, no backport is needed.
MINOR: threads: Use __decl_hathreads to declare locks
This macro should be used to declare variables or struct members depending on
the USE_THREAD compile option. It avoids the encapsulation of such declarations
between #ifdef/#endif. It is used to declare all lock variables.
BUG/MINOR: spoe: check buffer size before acquiring or releasing it
In spoe_acquire_buffer and spoe_release_buffer, instead of checking the buffer
against buf_empty, we now check its size. It is important because when an
allocation fails, it will be set to buf_wanted. In both cases, the size is 0.
It is a proactive bug fix, no real problem was observed till now. It cannot be
backported as is in 1.7 because of all changes made on the SPOE in 1.8.
Willy Tarreau [Sat, 11 Nov 2017 16:58:31 +0000 (17:58 +0100)]
BUILD: thread/pipe: fix build without threads
Marcus Rückert reported that commit d8b3b65 ("BUG/MEDIUM: splice/threads:
pipe reuse list was not protected.") broke threadless support. Add the
required #ifdef.
Willy Tarreau [Sat, 11 Nov 2017 08:06:48 +0000 (09:06 +0100)]
[RELEASE] Released version 1.8-rc3
Released version 1.8-rc3 with the following main changes :
- BUILD: use MAXPATHLEN instead of NAME_MAX.
- BUG/MAJOR: threads/checks: add 4 missing spin_unlock() in various functions
- BUG/MAJOR: threads/server: missing unlock in CLI fqdn parser
- BUG/MINOR: cli: do not perform an invalid action on "set server check-port"
- BUG/MAJOR: threads/checks: wrong use of SPIN_LOCK instead of SPIN_UNLOCK
- CLEANUP: checks: remove return statements in locked functions
- BUG/MINOR: cli: add severity in "set server addr" parser
- CLEANUP: server: get rid of return statements in the CLI parser
- BUG/MAJOR: cli/streams: missing unlock on exit "show sess"
- BUG/MAJOR: threads/dns: add missing unlock on allocation failure path
- BUG/MAJOR: threads/lb: fix missing unlock on consistent hash LB
- BUG/MAJOR: threads/lb: fix missing unlock on map-based hash LB
- BUG/MEDIUM: threads/stick-tables: close a race condition on stktable_trash_expired()
- BUG/MAJOR: h2: set the connection's task to NULL when no client timeout is set
- BUG/MAJOR: thread/listeners: enable_listener must not call unbind_listener()
- BUG/MEDIUM: threads: don't try to free build option message on exit
- MINOR: applets: no need to check for runqueue's emptiness in appctx_res_wakeup()
- MINOR: add master-worker in the warning about nbproc
- MINOR: mworker: allow pidfile in mworker + foreground
- MINOR: mworker: write parent pid in the pidfile
- MINOR: mworker: do not store child pid anymore in the pidfile
- MINOR: ebtree: implement the scope-aware functions for eb32
- MEDIUM: ebtree: specify the scope of every node inserted via eb32sc
- MINOR: ebtree: update the eb32sc parent node's scope on delete
- MEDIUM: ebtree: only consider the branches matching the scope in lookups
- MINOR: ebtree: implement eb32sc_lookup_ge_or_first()
- MAJOR: task: make use of the scope-aware ebtree functions
- MINOR: task: simplify wake_expired_tasks() to avoid unlocking in the loop
- MEDIUM: task: change the construction of the loop in process_runnable_tasks()
- MINOR: threads: use faster locks for the spin locks
- MINOR: tasks: only visit filled task slots after processing them
- MEDIUM: tasks: implement a lockless scheduler for single-thread usage
- BUG/MINOR: dns: Don't try to get the server lock if it's already held.
- BUG/MINOR: dns: Don't lock the server lock in snr_check_ip_callback().
- DOC: Add note about encrypted password CPU usage
- BUG/MINOR: h2: set the "HEADERS_SENT" flag on stream, not connection
- BUG/MEDIUM: h2: properly send an RST_STREAM on mux stream error
- BUG/MEDIUM: h2: properly send the GOAWAY frame in the mux
- BUG/MEDIUM: h2: don't try (and fail) to send non-existing data in the mux
- MEDIUM: h2: remove the H2_SS_RESET intermediate state
- BUG/MEDIUM: h2: fix some wrong error codes on connections
- BUILD: threads: Rename SPIN/RWLOCK macros using HA_ prefix
- BUILD: enable USE_THREAD for Solaris build.
- BUG/MEDIUM: h2: don't close the connection is there are data left
- MINOR: h2: don't re-enable the connection's task when we're closing
- BUG/MEDIUM: h2: properly set H2_SF_ES_SENT when sending the final frame
- BUG/MINOR: h2: correctly check for H2_SF_ES_SENT before closing
- MINOR: h2: add new stream flag H2_SF_OUTGOING_DATA
- BUG/MINOR: h2: don't send GOAWAY on failed response
- BUG/MEDIUM: splice/threads: pipe reuse list was not protected.
- BUG/MINOR: comp: fix compilation warning compiling without compression.
- BUG/MINOR: stream-int: don't set MSG_MORE on closed request path
- BUG/MAJOR: threads/tasks: fix the scheduler again
- BUG/MINOR; ssl: Don't assume we have a ssl_bind_conf because a SNI is matched.
- MINOR: ssl: Handle session resumption with TLS 1.3
- MINOR: ssl: Spell 0x10101000L correctly.
- MINOR: ssl: Handle sending early data to server.
- BUILD: ssl: fix build of backend without ssl
- BUILD: shctx: do not depend on openssl anymore
- BUG/MINOR: h1: the HTTP/1 make status code parser check for digits
- BUG/MEDIUM: h2: reject non-3-digit status codes
- BUG/MEDIUM: stream-int: Don't loss write's notifs when a stream is woken up
- BUG/MINOR: pattern: Rely on the sample type to copy it in pattern_exec_match
- BUG/MEDIUM: h2: split the function to send RST_STREAM
- BUG/MEDIUM: h1: ensure the chunk size parser can deal with full buffers
- MINOR: tools: don't use unlikely() in hex2i()
- BUG/MEDIUM: h2: support orphaned streams
- BUG/MEDIUM: threads/cli: fix "show sess" locking on release
- CLEANUP: mux: remove the unused "release()" function
- MINOR: cli: make "show fd" report the fd's thread mask
- BUG/MEDIUM: stream: don't ignore res.analyse_exp anymore
- CLEANUP: global: introduce variable pid_bit to avoid shifts with relative_pid
- MEDIUM: http: always reject the "PRI" method
Willy Tarreau [Fri, 10 Nov 2017 18:38:10 +0000 (19:38 +0100)]
MEDIUM: http: always reject the "PRI" method
This method was reserved for the HTTP/2 connection preface, must never
be used and must be rejected. In normal situations it doesn't happen,
but it may be visible if a TCP frontend has alpn "h2" enabled, and
forwards to an HTTP backend which tries to parse the request. Before
this patch it would pass the wrong request to the backend server, now
it properly returns 400 bad req.
This patch should probably be backported to stable versions.
Willy Tarreau [Fri, 10 Nov 2017 18:08:14 +0000 (19:08 +0100)]
CLEANUP: global: introduce variable pid_bit to avoid shifts with relative_pid
At a number of places, bitmasks are used for process affinity and to map
listeners to processes. Every time 1UL<<(relative_pid-1) is used. Let's
create a "pid_bit" variable corresponding to this value to clean this up.
It happens that no single analyser has ever needed to set res.analyse_exp,
so that process_stream() didn't consider it when computing the next task
expiration date. Since Lua actions were introduced in 1.6, this can be
needed on http-response actions for example, so let's ensure it's properly
handled.
Thanks to Nick Dimov for reporting this bug. The fix needs to be
backported to 1.7 and 1.6.
Willy Tarreau [Fri, 10 Nov 2017 15:53:09 +0000 (16:53 +0100)]
MINOR: cli: make "show fd" report the fd's thread mask
This is useful to know what thread(s) an fd is scheduled to be
handled on. It's worth noting that at the moment the "show fd"d
doesn't seem totally thread-safe.
Willy Tarreau [Fri, 10 Nov 2017 15:43:05 +0000 (16:43 +0100)]
CLEANUP: mux: remove the unused "release()" function
In commit 53a4766 ("MEDIUM: connection: start to introduce a mux layer
between xprt and data") we introduced a release() function which ends
up never being used. Let's get rid of it now.
Willy Tarreau [Fri, 10 Nov 2017 10:42:33 +0000 (11:42 +0100)]
BUG/MEDIUM: h2: support orphaned streams
When a stream_interface performs a shutw() then a shutr(), the stream
is marked closed. Then cs_destroy() calls h2_detach() and it cannot
fail since we're on the leaving path of the caller. The problem is that
in order to close streams we usually have to send either an emty DATA
frame with the ES flag set or an RST_STREAM frame, and the mux buffer
might already be full, forcing the stream to be queued. The forced
removal of this stream causes this last message to silently disappear,
and the client to wait forever for a response.
This commit ensures we can detach the conn_stream from the h2 stream
if the stream is blocked, effectively making the h2 stream an orphan,
ensures that the mux can deal with orphaned streams after processing
them, and that the demux can kill them upon receipt of GOAWAY.
Willy Tarreau [Fri, 10 Nov 2017 10:19:54 +0000 (11:19 +0100)]
MINOR: tools: don't use unlikely() in hex2i()
This small inline function causes some pain to the compiler when used
inside other functions due to its use of the unlikely() hint for non-digits.
It causes the letters to be processed far away in the calling function and
makes the code less efficient. Removing these unlikely() hints has increased
the chunk size parsing by around 5%.
Willy Tarreau [Fri, 10 Nov 2017 10:17:08 +0000 (11:17 +0100)]
BUG/MEDIUM: h1: ensure the chunk size parser can deal with full buffers
The HTTP/1 code always has the reserve left available so the buffer is
never full there. But with HTTP/2 we have to deal with full buffers,
and it happens that the chunk size parser cannot tell the difference
between a full buffer and an empty one since it compares the start and
the stop pointer.
Let's change this to instead deal with the number of bytes left to process.
As a side effect, this code ends up being about 10% faster than the previous
one, even on HTTP/1.
Willy Tarreau [Fri, 10 Nov 2017 09:05:24 +0000 (10:05 +0100)]
BUG/MEDIUM: h2: split the function to send RST_STREAM
There is an issue with how the RST_STREAM frames are sent. Some of
them are sent from the demux, either for valid or for closed streams,
and some are sent from the mux always for valid streams. At the moment
the demux stream ID is used, which is wrong for all streams being muxed,
and sometimes results in certain bad HTTP responses causing the emission
of an RST_STREAM referencing stream zero. In addition, the stream's
blocked flags could be updated even if the stream was the closed or
idle ones.
We really need to split the function for the two distinct use cases where
one is used to send an RST on a condition detected at the connection level
(such as a closed stream) and the other one is used to send an RST for a
condition detected at the stream level. The first one is used only in the
demux, and the other one only by a valid stream.
BUG/MINOR: pattern: Rely on the sample type to copy it in pattern_exec_match
To be thread safe, the function pattern_exec_match copy data (the pattern and
the inner sample) in thread-local variables. But when the sample is duplicated,
we must check its type and not the pattern one.
This is specific to threads, no backport is needed.
BUG/MEDIUM: stream-int: Don't loss write's notifs when a stream is woken up
When a write activity is reported on a channel, it is important to keep this
information for the stream because it take part on the analyzers' triggering.
When some data are written, the flag CF_WRITE_PARTIAL is set. It participates to
the task's timeout updates and to the stream's waking. It is also used in
CF_MASK_ANALYSER mask to trigger channels anaylzers. In the past, it was cleared
by process_stream. Because of a bug (fixed in commit 95fad5ba4 ["BUG/MAJOR:
stream-int: don't re-arm recv if send fails"]), It is now cleared before each
send and in stream_int_notify. So it is possible to loss this information when
process_stream is called, preventing analyzers to be called, and possibly
leading to a stalled stream.
Today, this happens in HTTP2 when you call the stat page or when you use the
cache filter. In fact, this happens when the response is sent by an applet. In
HTTP1, everything seems to work as expected.
To fix the problem, we need to make the difference between the write activity
reported to lower layers and the one reported to the stream. So the flag
CF_WRITE_EVENT has been added to notify the stream of the write activity on a
channel. It is set when a send succedded and reset by process_stream. It is also
used in CF_MASK_ANALYSER. finally, it is checked in stream_int_notify to wake up
a stream and in channel_check_timeouts.
This bug is probably present in 1.7 but it seems to have no effect. So for now,
no needs to backport it.
Willy Tarreau [Thu, 9 Nov 2017 10:23:00 +0000 (11:23 +0100)]
BUG/MEDIUM: h2: reject non-3-digit status codes
If the H1 parser would report a status code length not consisting in
exactly 3 digits, the error case was confused with a lack of buffer
room and was causing the parser to loop infinitely.
Willy Tarreau [Thu, 9 Nov 2017 10:15:45 +0000 (11:15 +0100)]
BUG/MINOR: h1: the HTTP/1 make status code parser check for digits
The H1 parser used by the H2 gateway was a bit lax and could validate
non-numbers in the status code. Since it computes the code on the fly
it's problematic, as "30:" is read as status code 310. Let's properly
check that it's a number now. No backport needed.
Willy Tarreau [Wed, 8 Nov 2017 13:33:36 +0000 (14:33 +0100)]
BUILD: shctx: do not depend on openssl anymore
The build breaks on a machine without openssl/crypto.h because shctx
still loads openssl-compat.h while it doesn't need it anymore since
the code was moved :
In file included from src/shctx.c:20:0:
include/proto/openssl-compat.h:3:28: fatal error: openssl/crypto.h: No such file or directory
#include <openssl/crypto.h>
Willy Tarreau [Wed, 8 Nov 2017 13:25:59 +0000 (14:25 +0100)]
BUILD: ssl: fix build of backend without ssl
Commit 522eea7 ("MINOR: ssl: Handle sending early data to server.") added
a dependency on SRV_SSL_O_EARLY_DATA which only exists when USE_OPENSSL
is defined (which is probably not the best solution) and breaks the build
when ssl is not enabled. Just add an ifdef USE_OPENSSL around the block
for now.
This adds a new keyword on the "server" line, "allow-0rtt", if set, we'll try
to send early data to the server, as long as the client sent early data, as
in case the server rejects the early data, we no longer have them, and can't
resend them, so the only option we have is to send back a 425, and we need
to be sure the client knows how to interpret it correctly.
MINOR: ssl: Handle session resumption with TLS 1.3
With TLS 1.3, session aren't established until after the main handshake
has completed. So we can't just rely on calling SSL_get1_session(). Instead,
we now register a callback for the "new session" event. This should work for
previous versions of TLS as well.
Willy Tarreau [Wed, 8 Nov 2017 13:05:19 +0000 (14:05 +0100)]
BUG/MAJOR: threads/tasks: fix the scheduler again
My recent change in commit ce4e0aa ("MEDIUM: task: change the construction
of the loop in process_runnable_tasks()") was bogus as it used to keep the
rq_next across an unlock/lock sequence, occasionally leading to crashes for
tasks that are eligible to any thread. We must use the lookup call for each
new batch instead. The problem is easily triggered with such a configuration :
global
nbthread 4
listen check
mode http
bind 0.0.0.0:8080
redirect location /
option httpchk GET /
server s1 127.0.0.1:8080 check inter 1
server s2 127.0.0.1:8080 check inter 1
Thanks to Olivier for diagnosing this one. No backport is needed.
Willy Tarreau [Tue, 7 Nov 2017 14:07:25 +0000 (15:07 +0100)]
BUG/MINOR: stream-int: don't set MSG_MORE on closed request path
Commit 4ac4928 ("BUG/MINOR: stream-int: don't set MSG_MORE on SHUTW_NOW
without AUTO_CLOSE") was incomplete. H2 reveals another situation where
the input stream is marked closed with the request and we set MSG_MORE,
causing a delay before the request leaves.
Better avoid setting the flag on the request path for close cases in
general.
Willy Tarreau [Tue, 7 Nov 2017 13:42:12 +0000 (14:42 +0100)]
BUG/MINOR: h2: don't send GOAWAY on failed response
As part of the detection for intentional closes, we can kill the
connection if a shutw() happens before the headers. But it can also
happen that an invalid response is not properly parsed, preventing
any headers frame from being sent and making the function believe
it was an abort. Now instead we check if any response was received
from the stream, regardless of the fact that it was properly
converted.
Willy Tarreau [Tue, 7 Nov 2017 10:59:51 +0000 (11:59 +0100)]
MINOR: h2: don't re-enable the connection's task when we're closing
It's pointless to requeue the task when we're closing, so swap the
order of the task_queue() and h2_release(). It also matches what
was written in the comment regarding re-arming the timer.
Willy Tarreau [Tue, 7 Nov 2017 10:48:46 +0000 (11:48 +0100)]
BUG/MEDIUM: h2: don't close the connection is there are data left
h2_detach() is called after a stream was closed, and it evaluates if it's
worth closing the connection. The issue there is that the connection is
closed too early in case there's demand for closing after the last stream,
even if some data remain in the mux. Let's change the condition to check
for this.
Willy Tarreau [Tue, 7 Nov 2017 10:08:28 +0000 (11:08 +0100)]
BUG/MEDIUM: h2: fix some wrong error codes on connections
When the assignment of the connection state was moved into h2c_error(),
3 of them were missed because they were wrong, using H2_SS_ERROR instead.
This resulted in the connection's state being set to H2_CS_ERROR2 in fact,
so the error was not properly sent.
Willy Tarreau [Tue, 7 Nov 2017 10:05:42 +0000 (11:05 +0100)]
MEDIUM: h2: remove the H2_SS_RESET intermediate state
This one was created to maintain the knowledge that a stream was closed
after having sent an RST_STREAM frame but that's not needed anymore and
it confuses certain conditions on the error processing path. It's time
to get rid of it.
Willy Tarreau [Tue, 7 Nov 2017 10:03:56 +0000 (11:03 +0100)]
BUG/MEDIUM: h2: don't try (and fail) to send non-existing data in the mux
The call to xprt->snd_buf() was not conditionned on the presence of
data in the buffer, resulting in snd_buf() returning 0 and never
disabling the polling. It was revealed by the previous bug on error
processing but must properly be handled.
Willy Tarreau [Tue, 7 Nov 2017 08:43:06 +0000 (09:43 +0100)]
BUG/MEDIUM: h2: properly send an RST_STREAM on mux stream error
Some stream errors are detected on the MUX path (eg: H1 response
encoding). The ones forgot to emit an RST_STREAM frame, causing the
client to wait and/or to see the connection being immediately closed.
This is now fixed.
Willy Tarreau [Mon, 6 Nov 2017 19:20:51 +0000 (20:20 +0100)]
BUG/MINOR: h2: set the "HEADERS_SENT" flag on stream, not connection
This flag was added after the GOAWAY flags were introduced and mistakenly
placed in the connection, but that doesn't make sense as it's specific to
the stream. The main impact is the risk of returning a DATA0+ES frame for
an error instead of an RST_STREAM.
From first-hand experience I realized that using encrypted passwords in
userlists can quickly become overwhelming for busy sites. In my case
just about 100 rq/s were enough to drive (user) CPU usage from 2-3% up
to >90%. While it is perfectly explicable why this is the case, having
it mentioned in the relevant documentation section might spare someone
some confusion in the future.
BUG/MINOR: dns: Don't lock the server lock in snr_check_ip_callback().
snr_check_ip_callback() may be called with the server lock, so don't attempt
to lock it again, instead, make sure the callers always have the lock before
calling it.
Willy Tarreau [Sun, 5 Nov 2017 15:35:59 +0000 (16:35 +0100)]
MEDIUM: tasks: implement a lockless scheduler for single-thread usage
The scheduler is complex and uses local queues to amortize the cost of
locks. But all this comes with a cost that is quite observable with
single-thread workloads.
The purpose of this patch is to reimplement the much simpler scheduler
for the case where threads are not used. The code is very small and
simple. It doesn't impact the multi-threaded performance at all, and
provides a nice 10% performance increase in single-thread by reaching
606kreq/s on the tests that showed 550kreq/s before.
Willy Tarreau [Mon, 6 Nov 2017 07:36:53 +0000 (08:36 +0100)]
MINOR: tasks: only visit filled task slots after processing them
process_runnable_tasks() needs to requeue or wake up tasks after
processing them in batches. By only refilling the existing ones, we
avoid revisiting all the queue. The performance gain is measurable
starting with two threads, where the request rate climbs to 657k/s
compared to 644k.
Willy Tarreau [Mon, 6 Nov 2017 00:03:26 +0000 (01:03 +0100)]
MINOR: threads: use faster locks for the spin locks
The spin locks used to rely on W locks, which involve a loop waiting
for readers to leave, and this doesn't happen here. It's more efficient
to use S locks instead, which are also mutually exclusive and do not
have this loop. This saves one test per spinlock and a few tens of
bytes allowing certain functions to be inlined.
Willy Tarreau [Sun, 5 Nov 2017 22:57:00 +0000 (23:57 +0100)]
MEDIUM: task: change the construction of the loop in process_runnable_tasks()
This patch slightly rearranges the loop to pack the locked code a little
bit, and to try to concentrate accesses to the tree together to benefit
more from the cache.
It also fixes how the loop handles the right margin : now that is guaranteed
that the retrieved nodes are filtered to only match the current thread, we
don't need to rewind every 16 entries. Instead we can rewind each time we
reach the right margin again.
With this change, we now achieve the following performance for 10 H2 conns
each containing 100 streams :
Willy Tarreau [Sun, 5 Nov 2017 18:09:27 +0000 (19:09 +0100)]
MINOR: task: simplify wake_expired_tasks() to avoid unlocking in the loop
This function is sensitive, let's make it shorter by factoring out the
unlock and leave code. This reduced the function's size by a few tens
of bytes and increased the overall performance by about 1%.
Willy Tarreau [Sun, 5 Nov 2017 12:34:20 +0000 (13:34 +0100)]
MAJOR: task: make use of the scope-aware ebtree functions
Currently the task scheduler suffers from an O(n) lookup when
skipping tasks that are not for the current thread. The reason
is that eb32_lookup_ge() has no information about the current
thread so it always revisits many tasks for other threads before
finding its own tasks.
This is particularly visible with HTTP/2 since the number of
concurrent streams created at once causes long series of tasks
for the same stream in the scheduler. With only 10 connections
and 100 streams each, by running on two threads, the performance
drops from 640kreq/s to 11.2kreq/s! Lookup metrics show that for
only 200000 task lookups, 430 million skips had to be performed,
which means that on average, each lookup leads to 2150 nodes to
be visited.
This commit backports the principle of scope lookups for ebtrees
from the ebtree_v7 development tree. The idea is that each node
contains a mask indicating the union of the scopes for the nodes
below it, which is fed during insertion, and used during lookups.
Then during lookups, branches that do not contain any leaf matching
the requested scope are simply ignored. This perfectly matches a
thread mask, allowing a thread to only extract the tasks it cares
about from the run queue, and to always find them in O(log(n))
instead of O(n). Thus the scheduler uses tid_bit and
task->thread_mask as the ebtree scope here.
Doing this has recovered most of the performance, as can be seen on
the test below with two threads, 10 connections, 100 streams each,
and 1 million requests total :
Before After Gain
test duration : 89.6s 4.73s x19
HTTP requests/s (DEBUG) : 11200 211300 x19
HTTP requests/s (PROD) : 15900 447000 x28
spin_lock time : 85.2s 0.46s /185
time per lookup : 13us 40ns /325
Even when going to 6 threads (on 3 hyperthreaded CPU cores), the
performance stays around 284000 req/s, showing that the contention
is much lower.
A test showed that there's no benefit in using this for the wait queue
though.
In the scheduler we always have to loop back to the beginning after
we don't find the last entry, so let's implement this in a new lookup
function instead. The resulting code is slightly faster, mostly due
to the fact that there's much less inlined code in the fast path.
Willy Tarreau [Sun, 5 Nov 2017 13:33:01 +0000 (14:33 +0100)]
MEDIUM: ebtree: only consider the branches matching the scope in lookups
Now when looking up a node via eb32sc_first(), eb32sc_next(), and
eb32sc_lookup_ge(), we only focus on the branches matching the requested
scope. The code must be careful to miss no branch. It changes a little
bit from the previous one because the scope stored on the intermediary
nodes is not exact (since we don't propagate upwards during deletion),
so in case a lookup fails, we have to walk up and pick the next matching
entry.
Willy Tarreau [Sun, 5 Nov 2017 17:06:22 +0000 (18:06 +0100)]
MINOR: ebtree: update the eb32sc parent node's scope on delete
During a delete operation, if the deleted node is above its leaf's
parent, this parent will replace the node and then go up. In this
case it is important to update the new parent's scope to reflect
the presence of other branches.
It's worth noting that in theory we should precisely recompute the
exact node value, but it seems that it's not worth it for the rare
cases there is a mismatch.
Willy Tarreau [Sun, 5 Nov 2017 12:31:29 +0000 (13:31 +0100)]
MINOR: ebtree: implement the scope-aware functions for eb32
A new kind of tree nodes is currently being developed in ebtree v7,
consisting in storing a scope in each node indicating a visibility
mask so that certain nodes are not reported on certain lookups. The
initial goal was to make this usable with a multi-thread scheduler.
Since the ebtree v7 code is completely different from v6, this patch
instead copies the minimally required functions from eb32 and ebtree
and calls them "eb32sc_*". At the moment the scope is not implemented,
it's only passed in arguments.
Willy Tarreau [Sun, 5 Nov 2017 11:01:11 +0000 (12:01 +0100)]
MINOR: applets: no need to check for runqueue's emptiness in appctx_res_wakeup()
The __appctx_wakeup() function already does it. It matters with threads
enabled because it simplifies the code in appctx_res_wakeup() to get rid
of this test.
Willy Tarreau [Sun, 5 Nov 2017 10:50:18 +0000 (11:50 +0100)]
BUG/MEDIUM: threads: don't try to free build option message on exit
Commit 0493149 ("MINOR: thread: report multi-thread support in haproxy -vv")
added information about thread support in haproxy -vv output but accidently
marked the message as "must_free" while it's a constant. This causes a segv
on the old process on clean exit if threads are enabled. It doesn't affect
the stability during operations however.