In order to use arbitrary data in the CLI (multiple lines or group of words
that must be considered as a whole, for example), it is now possible to add a
payload to the commands. To do so, the first line needs to end with a special
pattern: <<\n. Everything that follows will be left untouched by the CLI parser
and will be passed to the commands parsers.
Per-command support will need to be added to take advantage of this
feature.
BUG/MEDIUM: h2: implement missing support for chunked encoded uploads
Upload requests not carrying a content-length nor tunnelling data must
be sent chunked-encoded over HTTP/1. The code was planned but for some
reason forgotten during the implementation, leading to such payloads to
be sent as tunnelled data.
Browsers always emit a content length in uploads so this problem doesn't
happen for most sites. However some applications may send data frames
after a request without indicating it earlier.
The only way to detect that a client will need to send data is that the
HEADERS frame doesn't hold the ES bit. In this case it's wise to look
for the content-length header. If it's not there, either we're in tunnel
(CONNECT method) or chunked-encoding (other methods).
This patch implements this.
The following request is sent using content-length :
MINOR: h2: detect presence of CONNECT and/or content-length
We'll need this in order to support uploading chunks. The h2 to h1
converter checks for the presence of the content-length header field
as well as the CONNECT method and returns these information to the
caller. The caller indicates whether or not a body is detected for
the message (presence of END_STREAM or not). No transfer-encoding
header is emitted yet.
Tim Duesterhus [Tue, 24 Apr 2018 11:56:01 +0000 (13:56 +0200)]
BUG/MEDIUM: lua: Fix segmentation fault if a Lua task exits
PiBa-NL reported that haproxy crashes with a segmentation fault
if a function registered using `core.register_task` returns.
An example Lua script that reproduces the bug is:
mytask = function()
core.Info("Stopping task")
end
core.register_task(mytask)
The Valgrind output is as follows:
==6759== Process terminating with default action of signal 11 (SIGSEGV)
==6759== Access not within mapped region at address 0x20
==6759== at 0x5B60AA9: lua_sethook (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==6759== by 0x430264: hlua_ctx_resume (hlua.c:1009)
==6759== by 0x43BB68: hlua_process_task (hlua.c:5525)
==6759== by 0x4FED0A: process_runnable_tasks (task.c:231)
==6759== by 0x4B2256: run_poll_loop (haproxy.c:2397)
==6759== by 0x4B2256: run_thread_poll_loop (haproxy.c:2459)
==6759== by 0x41A7E4: main (haproxy.c:3049)
Add the missing `task = NULL` for the `HLUA_E_OK` case. The error cases
have been fixed as of 253e53e661c49fb9723535319cf511152bf09bc7 which
first was included in haproxy v1.8-dev3. This bugfix should be backported
to haproxy 1.8.
BUG/MINOR: log: t_idle (%Ti) is not set for some requests
If TCP content inspection is used, msg_state can be >= HTTP_MSG_ERROR
the first time http_wait_for_request is called. t_idle was being left
unset in that case.
In the example below :
stick-table type string len 64 size 100k expire 60s
tcp-request inspect-delay 1s
tcp-request content track-sc1 hdr(X-Session)
%Ti will always be -1, because the msg_state is already at HTTP_MSG_BODY
when http_wait_for_request is called for the first time.
Tim Duesterhus [Tue, 24 Apr 2018 17:20:43 +0000 (19:20 +0200)]
BUG/MAJOR: channel: Fix crash when trying to read from a closed socket
When haproxy is compiled using GCC <= 3.x or >= 5.x the `unlikely`
macro performs a comparison with zero: `(x) != 0`, thus returning
either 0 or 1.
In `int co_getline_nc()` this macro was accidentally applied to
the variable `retcode` itself, instead of the result of the
comparison `retcode <= 0`. As a result any negative `retcode`
is converted to `1` for purposes of the comparison.
Thus never taking the branch (and exiting the function) for
negative values.
This in turn leads to reads of uninitialized memory in the for-loop
below:
==12141== Conditional jump or move depends on uninitialised value(s)
==12141== at 0x4EB6B4: co_getline_nc (channel.c:346)
==12141== by 0x421CA4: hlua_socket_receive_yield (hlua.c:1713)
==12141== by 0x421F6F: hlua_socket_receive (hlua.c:1896)
==12141== by 0x529B08F: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x52A7EFC: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x529B497: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x529711A: lua_pcallk (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x52ABDF0: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x529B08F: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x52A7EFC: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x529A9F1: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x529B523: lua_resume (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141==
==12141== Use of uninitialised value of size 8
==12141== at 0x4EB6B9: co_getline_nc (channel.c:346)
==12141== by 0x421CA4: hlua_socket_receive_yield (hlua.c:1713)
==12141== by 0x421F6F: hlua_socket_receive (hlua.c:1896)
==12141== by 0x529B08F: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x52A7EFC: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x529B497: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x529711A: lua_pcallk (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x52ABDF0: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x529B08F: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x52A7EFC: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x529A9F1: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x529B523: lua_resume (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141==
==12141== Invalid read of size 1
==12141== at 0x4EB6B9: co_getline_nc (channel.c:346)
==12141== by 0x421CA4: hlua_socket_receive_yield (hlua.c:1713)
==12141== by 0x421F6F: hlua_socket_receive (hlua.c:1896)
==12141== by 0x529B08F: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x52A7EFC: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x529B497: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x529711A: lua_pcallk (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x52ABDF0: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x529B08F: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x52A7EFC: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x529A9F1: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== by 0x529B523: lua_resume (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==12141== Address 0x8637171e928bb500 is not stack'd, malloc'd or (recently) free'd
Fix this bug by correctly applying the `unlikely` macro to the result of the comparison.
The incoming H2 frame length was checked against the max_frame_size
setting instead of being checked against the bufsize. The max_frame_size
only applies to outgoing traffic and not to incoming one, so if a large
enough frame size is advertised in the SETTINGS frame, a wrapped frame
will be defragmented into a temporary allocated buffer where the second
fragment my overflow the heap by up to 16 kB.
It is very unlikely that this can be exploited for code execution given
that buffers are very short lived and their address not realistically
predictable in production, but the likeliness of an immediate crash is
absolutely certain.
This fix must be backported to 1.8.
Many thanks to Jordan Zebor from F5 Networks for reporting this issue
in a responsible way.
Recent commit 9631a28 ("MEDIUM: sample: Extend functionality for field/word
converters") introduced this minor build warning that this patch addresses :
src/sample.c: In function 'sample_conv_word':
src/sample.c:2108:8: warning: suggest explicit braces to avoid ambiguous 'else' [-Wparentheses]
src/sample.c:2137:8: warning: suggest explicit braces to avoid ambiguous 'else' [-Wparentheses]
BUG/MEDIUM: kqueue: When adding new events, provide an output to get errors.
When adding new events using kevent(), if there's an error, because we're
trying to delete an event that wasn't there, or because the fd has already
been closed, kevent() will either add an event in the eventlist array if
there's enough room for it, and keep on handling other events, or stop and
return -1.
We want it to process all the events, so give it a large-enough array to
store any error.
Special thanks to PiBa-NL for diagnosing the root cause of this bug.
Marcin Deranek [Mon, 16 Apr 2018 12:30:46 +0000 (14:30 +0200)]
MEDIUM: sample: Extend functionality for field/word converters
Extend functionality of field/word converters, so it's possible
to extract field(s)/word(s) counting from the beginning/end and/or
extract multiple fields/words (including separators) eg.
MINOR: cli: Ensure the CLI always outputs an error when it should
When using the CLI_ST_PRINT_FREE state, always output something back
if the faulty function did not fill the 'err' variable.
The map/acl code could lead to a crash whereas the SSL code was silently
failing.
BUG/MINOR: cli: Guard against NULL messages when using CLI_ST_PRINT_FREE
Some error paths (especially those followed when running out of memory)
can set the error message to NULL. In order to avoid a crash, use a
generic message ("Out of memory") when this case arises.
Ben Draut [Fri, 13 Apr 2018 21:43:04 +0000 (15:43 -0600)]
MINOR: config: Warn if resolvers has no nameservers
Today, a `resolvers` section may be configured without any `nameserver`
directives, which is useless. This implements a warning when such
sections are detected.
Marcin Deranek [Fri, 13 Apr 2018 12:37:50 +0000 (14:37 +0200)]
MINOR: proxy: Add fe_defbe fetcher
Patch adds ability to fetch frontend's default backend name in your
logic, so it can be used later to derive other backend names to make routing
decisions.
BUG/MINOR: http: Return an error in proxy mode when url2sa fails
In proxy mode, the result of url2sa is never checked. So when the function fails
to resolve the destination server from the URL, we continue. Depending on the
internal state of the connection, we get different behaviours. With a newly
allocated connection, the field <addr.to> is not set. So we will get a HTTP
error. The status code is 503 instead of 400, but it's not really critical. But,
if it's a recycled connection, we will reuse the previous value of <addr.to>,
opening a connection on an unexpected server.
To fix the bug, we return an error when url2sa fails.
This patch should be backported in all version from 1.5.
BUG/MEDIUM: connection: Make sure we have a mux before calling detach().
In some cases, we call cs_destroy() very early, so early the connection
doesn't yet have a mux, so we can't call mux->detach(). In this case,
just destroy the associated connection.
BUG/MEDIUM: threads: Fix the max/min calculation because of name clashes
With gcc < 4.7, when HAProxy is built with threads, the macros
HA_ATOMIC_CAS/XCHG/STORE relies on the legacy __sync builtins. These macros
are slightly complicated than the versions relying on the '_atomic'
builtins. Internally, some local variables are defined, prefixed with '__' to
avoid name clashes with the caller.
On the other hand, the macros HA_ATOMIC_UPDATE_MIN/MAX call HA_ATOMIC_CAS. Some
local variables are also definied in these macros, following the same naming
rule as below. The problem is that '__new' variable is used in
HA_ATOMIC_MIN/_MAX and in HA_ATOMIC_CAS. Obviously, the behaviour is undefined
because '__new' in HA_ATOMIC_CAS is left uninitialized. Unfortunatly gcc fails
to detect this error.
To fix the problem, all internal variables to macros are now suffixed with name
of the macros to avoid clashes (for instance, '__new_cas' in HA_ATOMIC_CAS).
BUG/MAJOR: cache: always initialize newly created objects
Recent commit 5bd37fa ("BUG/MAJOR: cache: fix random crashes caused by
incorrect delete() on non-first blocks") addressed an issue where
dangling objects could be deleted in the cache, but even after this fix
some similar segfaults were reported at the same place (cache_free_blocks()).
The tree was always corrupted as well. Placing some traces revealed
that this time it's caused by a missing initialization in
http_action_store_cache() : while object->eb.key is used to note that
the object is not in the tree, the first retrieved block may contain
random data and is not initialized. Further, this entry can be updated
later without the object being inserted into the tree. Thus, if at the
end the object is not stored and the blocks are put back to the avail
list, the next attempt to use them will find eb.key != 0 and will try
to delete the uninitialized block, will see that eb.node.leaf_p is not
NULL (random data), and will dereference it as well as a few other
uninitialized pointers. It was harder to trigger than the previous one,
despite being very closely related. This time the following config
was used :
Other values don't work well. The size and the small pieces in the
responses (p=1) are critical to make it work.
Here the fix consists in pre-zeroing object->eb.key AND object->eb.leaf_p
just after the object is allocated so as to stay consistent with other
locations. Ideally this could be simplified later by only relying on
eb->node.leaf_p everywhere since in the end the key alone is not a
reliable indicator, so that we use only one indicator of being part of
the tree or not.
MINOR: spoe: Add counters to log info about SPOE agents
In addition to metrics about time spent in the SPOE, following counters have
been added:
* applets : number of SPOE applets.
* idles : number of idle applets.
* nb_sending : number of streams waiting to send data.
* nb_waiting : number of streams waiting for a ack.
* nb_processed : number of events/groups processed by the SPOE (from the
stream point of view).
* nb_errors : number of errors during the processing (from the stream point of
view).
Log messages has been updated to report these counters. Following pattern has
been added at the end of the log message:
MINOR: spoe: Add loggers dedicated to the SPOE agent
Now it is possible to configure a logger in a spoe-agent section using a "log"
line, as for a proxy. "no log", "log global" and "log <address> ..." syntaxes
are supported.
MINOR: log: Keep the ref when a log server is copied to avoid duplicate entries
With "log global" line, the global list of loggers are copied into the proxy's
struct. The list coming from the default section is also copied when a frontend
or a backend section is parsed. So it is possible to have duplicate entries in
the proxy's list. For instance, with this following config, all messages will be
logged twice:
global
log 127.0.0.1 local0 debug
daemon
defaults
mode http
log global
option httplog
frontend front-http
log global
bind *:8888
default_backend back-http
MINOR: log: move 'log' keyword parsing in dedicated function
Now, the function parse_logsrv should be used to parse a "log" line. This
function will update the list of loggers passed in argument. It can release all
log servers when "no log" line was parsed (by the caller) or it can parse "log
global" or "log <address> ... " lines. It takes care of checking the caller
context (global or not) to prohibit "log global" usage in the global section.
MINOR: spoe: Add options to store processing times in variables
"set-process-time" and "set-total-time" options have been added to store
processing times in the transaction scope, at each event and group processing,
the current one and the total one. So it is possible to get them.
MINOR: spoe: Add metrics in to know time spent in the SPOE
Following metrics are added for each event or group of messages processed in the
SPOE:
* processing time: the delay to process the event or the group. From the
stream point of view, it is the latency added by the SPOE
processing.
* request time : It is the encoding time. It includes ACLs processing, if
any. For fragmented frames, it is the sum of all fragments.
* queue time : the delay before the request gets out the sending queue. For
fragmented frames, it is the sum of all fragments.
* waiting time: the delay before the reponse is received. No fragmentation
supported here.
* response time: the delay to process the response. No fragmentation supported
here.
* total time: (unused for now). It is the sum of all events or groups
processed by the SPOE for a specific threads.
Log messages has been updated. Before, only errors was logged (status_code !=
0). Now every processing is logged, following this format:
AGENT is the agent name
TYPE is EVENT of GROUP
NAME is the event or the group name
STREAM-ID is an integer, the unique id of the stream
STATUS_CODE is the processing's status code
reqT/qT/wT/resT/pT are delays descrive above
For all these delays, -1 means the processing was interrupted before the end. So
-1 for the queue time means the request was never dequeued. For fragmented
frames it is harder to know when the interruption happened.
For now, messages are logged using the same logger than the backend of the
stream which initiated the request.
BUG/MINOR: spoe: Don't forget to decrement fpa when a processing is interrupted
In async or pipelining mode, we count the number of NOTIFY frames sent waiting
for their corresponding ACK frames. This is a way to evaluate the "load" of a
SPOE applet. For pipelining mode, it is easy to make the link between a NOTIFY
frame and its ACK one, because exchanges are done using the same TCP connection.
For async mode, it is harder because a ACK frame can be received on another
connection than the one sending the NOTIFY frame. So to decrement the fpa of the
right applet, we need to keep it in the SPOE context. Most of time, it works
expect when the processing is interrupted by the stream, because of a timeout.
This patch fixes this issue. If a SPOE applet is still link to a SPOE context
when the processing is interrupted by the stream, the applet's fpa is
decremented. This is only done for unfragmented frames.
BUG/MINOR: spoe: Register the variable to set when an error occurred
Variables referenced in HAProxy's configuration file are registered during the
configuration parsing (during parsing of "var", "set-var" or "unset-var"
keywords). For the SPOE, you can use "register-var-names" directive to
explicitly register variable names. All unknown variables will be rejected
(unless you set "force-set-var" option). But, the variable set when an error
occurred (when "set-on-error" option is defined) should also be regiestered by
default. This is done with this patch.
BUG/MINOR: spoe: Don't release the context buffer in .check_timeouts callbaclk
It is better to let spoe_stop_processing release this buffer because, in
.check_timeouts callback, we lack information to know if it should be release or
not. For instance, if the processing timeout is reached while the SPOE applet
receives the reply, it is preferable to ignore the timeout and process the
result.
BUG/MINOR: spoe: Initialize variables used during conf parsing before any check
Some initializations must be done at the beginning of parse_spoe_flt to avoid
segmentaion fault when first errors are catched, when the "filter spoe" line is
parsed.
This patch must be backported in 1.8.
[cf: the variable "curvars" doesn't exist in 1.8. So the patch must be adapted.]
BUG/MAJOR: cache: fix random crashes caused by incorrect delete() on non-first blocks
Several segfaults were reported in the cache, each time in eb_delete()
called from cache_free_blocks() itself called from shctx_row_reserve_hot().
Each time the tree node was corrupted with random cached data (often JS or
HTML contents).
The problem comes from an incompatibility between the cache's expectations
and the recycling algorithm used in the shctx. The shctx allocates and
releases a chain of blocks at once. And when it needs to allocate N blocks
from the avail list while a chain of M>N is found, it picks the first N
from the list, moves them to the hot list, and marks all remaining M-N
blocks as isolated blocks (chains of 1).
For each such released block, the shctx->free_block() callback is used
and passed a pointer to the first and current block of the chain. For
the cache, it's cache_free_blocks(). What this function does is check
that the current block is the first one, and in this case delete the
object from the tree and mark it as not in tree by setting key to zero.
The problem this causes is that the tail blocks when M>N become first
blocks for the next call to shctx_row_reserve_hot(), these ones will
be passed to cache_free_blocks() as list heads, and will be sent to
eb_delete() despite containing only cached data.
The simplest solution for now is to mark each block as holding no cache
object by setting key to zero all the time. It keeps the principle used
elsewhere in the code. The SSL code is not subject to this problem
because it relies on the block's len not being null, which happens
immediately after a block was released. It was uncertain however whether
this method is suitable for the cache. It is not critical though since
this code is going to change soon in 1.9 to dynamically allocate only
the number of required blocks.
This fix must be backported to 1.8. Thanks to Thierry for providing
exploitable cores.
The "show cache" command used to dump the header for each entry into into
the handler loop, making it repeated every ~16kB of output data. Additionally
chunk_appendf() was used instead of chunk_printf(), causing the output to
repeat already emitted lines, and the output size to grow in O(n^2). It used
to take several minutes to report tens of millions of objects from a small
cache containing only a few thousands. There was no more impact though.
BUG/MINOR: email-alert: Set the mailer port during alert initialization
Since the commit 2f3a56b4f ("BUG/MINOR: tcp-check: use the server's service port
as a fallback"), email alerts stopped working because the mailer's port was
overriden by the server's port. Remember, email alerts are defined as checks
with specific tcp-check rules and triggered on demand to send alerts. So to send
an email, a check is executed. Because no specific port's was defined, the
server's one was used.
To fix the bug, the ports used for checks attached an email alert are explicitly
set using the mailer's port. So this port will be used instead of the server's
one.
In this patch, the assignement to a default port (587) when an email alert is
defined has been removed. Indeed, when a mailer is defined, the port must be
defined. So the default port was never used.
BUG/MINOR: fd: Don't clear the update_mask in fd_insert.
Clearing the update_mask bit in fd_insert may lead to duplicate insertion
of fd in fd_updt, that could lead to a write past the end of the array.
Instead, make sure the update_mask bit is cleared by the pollers no matter
what.
This should be backported to 1.8.
[wt: warning: 1.8 doesn't have the lockless fdcache changes and will
require some careful changes in the pollers]
BUG/MINOR: checks: check the conn_stream's readiness and not the connection
Since commit 9aaf778 ("MAJOR: connection : Split struct connection into
struct connection and struct conn_stream."), the checks use a conn_stream
and not directly the connection anymore. However wake_srv_chk() still used
to verify the connection's readiness instead of the conn_stream's. Due to
the existence of a mux, the connection is always waiting for receiving
something, and doesn't reflect the changes made in event_srv_chk_{r,w}(),
causing the connection appear as not ready yet, and the check to be
validated only after its timeout. The difference is only visible when
sending pure TCP checks, and simply adding a "tcp-check connect" line
is enough to work around it.
Willy Tarreau [Fri, 30 Mar 2018 15:35:38 +0000 (17:35 +0200)]
BUG/MEDIUM: h2: always add a stream to the send or fctl list when blocked
When a stream blocks on a mux buffer full/unallocated or on connection
flow control, a flag among H2_SF_MUX_M* is set, but the stream is not
always added to the connection's list. It's properly done when the
operations are performed from the connection handler but not always when
done from the stream handler. For instance, a simple shutr or shutw may
fail by lack of room. If it's immediately followed by a call to h2_detach(),
the stream remains lying around in no list at all, and prevents the
connection from ending. This problem is actually quite difficult to
trigger and seems to require some large objects and low server-side
timeouts.
This patch covers all identified paths. Some are redundant but since the
code will change and will be simplified in 1.9, it's better to stay on
the safe side here for now. It must be backported to 1.8.
Willy Tarreau [Fri, 30 Mar 2018 15:41:19 +0000 (17:41 +0200)]
BUG/MINOR: h2: remove accidental debug code introduced with show_fd function
Commit e3f36cd ("MINOR: h2: implement a basic "show_fd" function")
accidently brought one surrounding debugging part that was in the same
context. No backport needed.
- st0 is the connection's state (FRAME_H here)
- flg is the connection's flags (MUX_MFULL here)
- fctl_cnt is the number of streams in the fctl_list
- send_cnt is the number of streams in the send_list
- tree_cnt is the number of streams in the streams_by_id tree
- orph_cnt is the number of orphaned streams (cs==0) in the tree
Willy Tarreau [Fri, 30 Mar 2018 12:41:19 +0000 (14:41 +0200)]
MINOR: mux: add a "show_fd" function to dump debugging information for "show fd"
This function will be called from the CLI's "show fd" command to append some
extra mux-specific information that only the mux handler can decode. This is
supposed to help collect various hints about what is happening when facing
certain anomalies.
Willy Tarreau [Thu, 29 Mar 2018 13:41:32 +0000 (15:41 +0200)]
BUG/MEDIUM: h2: don't consider pending data on detach if connection is in error
Interrupting an h2load test shows that some connections remain active till
the client timeout. This is due to the fact that h2_detach() immediately
returns if the h2s flags indicate that the h2s is still waiting for some
buffer room in the output mux (possibly to emit a response or to send some
window updates). If the connection is broken, these data will never leave
and must not prevent the stream from being terminated nor the connection
from being released.
Willy Tarreau [Thu, 29 Mar 2018 13:22:59 +0000 (15:22 +0200)]
BUG/MEDIUM: h2/threads: never release the task outside of the task handler
Currently, h2_release() will release all resources assigned to the h2
connection, including the timeout task if any. But since the multi-threaded
scheduler, the timeout task could very well be queued in the thread-local
list of running tasks without any way to remove it, so task_delete() will
have no effect and task_free() will cause this undefined object to be
dereferenced.
In order to prevent this from happening, we never release the task in
h2_release(), instead we wake it up after marking its context NULL so that
the task handler can release the task.
Future improvements could consist in modifying the scheduler so that a
task_wakeup() has to be done on any task having to be killed, letting
the scheduler take care of it.
This fix must be backported to 1.8. This bug was apparently not reported
so far.
Willy Tarreau [Wed, 28 Mar 2018 11:56:39 +0000 (13:56 +0200)]
MINOR: h2: fuse h2s_detach() and h2s_free() into h2s_destroy()
Since these two functions are always used together, let's simplify
the code by having a single one for both operations. It also ensures
we don't leave wandering elements that risk to leak later.
Willy Tarreau [Wed, 28 Mar 2018 11:51:45 +0000 (13:51 +0200)]
MINOR: h2: always call h2s_detach() in h2_detach()
The code is safer and more robust this way, it avoids multiple paths.
This is possible due to the idempotence of LIST_DEL() and eb32_delete()
that are called in h2s_detach().
Willy Tarreau [Wed, 28 Mar 2018 09:29:04 +0000 (11:29 +0200)]
BUG/MAJOR: h2: remove orphaned streams from the send list before closing
Several people reported very strange occasional crashes when using H2.
Every time it appeared that either an h2s or a task was corrupted. The
outcome is that a missing LIST_DEL() when removing an orphaned stream
from the list in h2_wake_some_streams() can cause this stream to
remain present in the send list after it was freed. This may happen
when receiving a GOAWAY frame for example. In the mean time the send
list may be processed due to pending streams, and the just released
stream is still found. If due to a buffer full condition we left the
h2_process_demux() loop before being able to process the pending
stream, the pool entry may be reassigned somewhere else. Either another
h2 connection will get it, or a task, since they are the same size and
are shared. Then upon next pass in h2_process_mux(), the stream is
processed again. Either it crashes here due to modifications, or the
contents are harmless to it and its last changes affect the other object
reasigned to this area (typically a struct task). In the case of a
collision with struct task, the LIST_DEL operation performed on h2s
corrupts the task's wait queue's leaf_p pointer, thus all the wait
queue's structure.
The fix consists in always performing the LIST_DEL in h2s_detach().
It will also make h2s_stream_new() more robust against a possible
future situation where stream_create_from_cs() could have sent data
before failing.
Many thanks to all the reporters who provided extremely valuable
information, traces and/or cores, namely Thierry Fournier, Yves Lafon,
Holger Amann, Peter Lindegaard Hansen, and discourse user "slawekc".
This fix must be backported to 1.8. It is probably better to also
backport the following code cleanups with it as well to limit the
divergence between master and 1.8-stable :
00dd078 CLEANUP: h2: rename misleading h2c_stream_close() to h2s_close() 0a10de6 MINOR: h2: provide and use h2s_detach() and h2s_free()
Willy Tarreau [Thu, 29 Mar 2018 11:19:37 +0000 (13:19 +0200)]
BUILD/MINOR: cli: fix a build warning introduced by last commit
Commit 35b1b48 ("MINOR: cli: make "show fd" report the mux and mux_ctx
pointers when available") introduced an accidental build warning due to
a missing const statement.
Willy Tarreau [Wed, 28 Mar 2018 16:41:30 +0000 (18:41 +0200)]
MINOR: cli: make "show fd" report the mux and mux_ctx pointers when available
This is handy to quickly distinguish H2 connections as well as to easily
access the h2c context. It could be backported to 1.8 to help during
troubleshooting sessions.
Willy Tarreau [Tue, 27 Mar 2018 13:06:02 +0000 (15:06 +0200)]
BUG/MINOR: hpack: fix harmless use of uninitialized value in hpack_dht_insert
A warning is reported here by valgrind on first pass in hpack_dht_insert().
The cause is that the not-yet-initialized dht->head is checked in
hpack_dht_get_tail(), though the result is not used, making it have
no impact. At the very least it confuses valgrind, and maybe it makes
it harder for gcc to optimize the code path. Let's move the variable
initialization around to shut it up. Thanks to Olivier for reporting
this one.
This fix may be backported to 1.8 at least to make valgrind usage less
painful.
Mark Lakes [Tue, 27 Mar 2018 07:48:06 +0000 (09:48 +0200)]
MINOR: lua: allow socket api settimeout to accept integers, float, and doubles
Instead of hlua_socket_settimeout() accepting only integers, allow user
to specify float and double as well. Convert to milliseconds much like
cli_parse_set_timeout but also sanity check the value.
The main goal is to keep compatibility with the LuaSocket API. This
API only accept seconds, so using a float to specify milliseconds is
an acceptable way.
Ilya Shipitsin [Sat, 24 Mar 2018 12:17:32 +0000 (17:17 +0500)]
BUILD/MINOR: fix build when USE_THREAD is not defined
src/queue.o: In function `pendconn_redistribute':
/home/ilia/haproxy/src/queue.c:272: undefined reference to `thread_want_sync'
src/queue.o: In function `pendconn_grab_from_px':
/home/ilia/haproxy/src/queue.c:311: undefined reference to `thread_want_sync'
src/queue.o: In function `process_srv_queue':
/home/ilia/haproxy/src/queue.c:184: undefined reference to `thread_want_sync'
collect2: error: ld returned 1 exit status
make: *** [Makefile:900: haproxy] Error 1
The output of these function indicates that one element is pushed in
the stack, but no element is set in the stack. Actually, if anyone
read the value returned by this function, is gets "something"
present in the stack.
Ilya Shipitsin [Fri, 23 Mar 2018 12:41:48 +0000 (17:41 +0500)]
CLEANUP: map, stream: remove duplicate code in src/map.c, src/stream.c
issue was identified by cppcheck
[src/map.c:372] -> [src/map.c:376]: (warning) Variable 'appctx->st2' is reassigned a value before the old one has been used. 'break;' missing?
[src/map.c:433] -> [src/map.c:437]: (warning) Variable 'appctx->st2' is reassigned a value before the old one has been used. 'break;' missing?
[src/map.c:555] -> [src/map.c:559]: (warning) Variable 'appctx->st2' is reassigned a value before the old one has been used. 'break;' missing?
[src/stream.c:3264] -> [src/stream.c:3268]: (warning) Variable 'appctx->st2' is reassigned a value before the old one has been used. 'break;' missing?
BUG/MINOR: listener: Don't decrease actconn twice when a new session is rejected
When a freshly created session is rejected, for any reason, during the accept in
the function "session_accept_fd", the variable "actconn" is decreased twice. The
first time when the rejected session is released, then in the function
"listener_accpect", because of the failure. So it is possible to have an
negative value for actconn. Note that, in this case, we will also have a negatve
value for the current number of connections on the listener rejecting the
session (actconn and l->nbconn are in/decreased in same time).
It is easy to reproduce the bug with this small configuration:
global
stats socket /tmp/haproxy
listen test
bind *:12345
tcp-request connection reject if TRUE
A "show info" on the stat socket, after a connection attempt, will show a very
high value (the unsigned representation of -1).
To fix the bug, if the function "session_accept_fd" returns an error, it
decrements the right counters and "listener_accpect" leaves them untouched.
Willy Tarreau [Thu, 22 Mar 2018 16:37:05 +0000 (17:37 +0100)]
BUG/MINOR: h2: ensure we can never send an RST_STREAM in response to an RST_STREAM
There are some corner cases where this could happen by accident. Since
the spec explicitly forbids this (RFC7540#5.4.2), let's add a test in
the two only functions which make the RST to avoid this. Thanks to user
klzgrad for reporting this problem. Usually it is expected to be harmless
but may result in browsers issuing a warning.
Willy Tarreau [Thu, 22 Mar 2018 15:53:12 +0000 (16:53 +0100)]
BUG/MEDIUM: h2: properly account for DATA padding in flow control
Recent fixes made to process partial frames broke the flow control on
DATA frames, as the padding is not considered anymore, only the actual
data is. Let's simply take account of the padding once the transfer
ends. The probability to meet this bug is low because, when used, padding
is small and it can require a large number of padded transfers before the
window is completely depleted.
Thanks to user klzgrad for reporting this bug and confirming the fix.
Emmanuel Hocdet [Mon, 5 Feb 2018 14:26:43 +0000 (15:26 +0100)]
MINOR: proxy-v2-options: add crc32c
This patch add option crc32c (PP2_TYPE_CRC32C) to proxy protocol v2.
It compute the checksum of proxy protocol v2 header as describe in
"doc/proxy-protocol.txt".
Willy Tarreau [Tue, 20 Mar 2018 18:06:52 +0000 (19:06 +0100)]
BUG/MEDIUM: fd/threads: ensure the fdcache_mask always reflects the cache contents
Commit 4815c8c ("MAJOR: fd/threads: Make the fdcache mostly lockless.")
made the fd cache lockless, but after a few iterations, a subtle part was
lost, consisting in setting the bit on the fd_cache_mask immediately when
adding an event. Now it was done only when the cache started to process
events, but the problem it causes is that fd_cache_mask isn't reliable
anymore as an indicator of presence of events to be processed with no
delay outside of fd_process_cached_events(). This results in some spurious
delays when processing inter-thread wakeups between tasks. Just restoring
the flag when the event is added is enough to fix the problem.
Kudos to Christopher for spotting this one!
No backport is needed as this is only in the development version.
Willy Tarreau [Tue, 20 Mar 2018 15:46:46 +0000 (16:46 +0100)]
BUILD/BUG: enable -fno-strict-overflow by default
Some time ago, integer overflows detection stopped working in the timer
code on recent compliers and were addressed by commit 73bdb32 ("BUG/MAJOR:
Use -fwrapv."). By then it was thought that -fno-strict-overflow was not
needed as implied, but it resulted from a misinterpretation of the doc,
as this one is still needed to disable pointer overflow optimization that
is automatically enabled at -O2/-O3/-Os.
Unfortunately the compiler happily removes overflow checks without the
slightest warning so it's not trivial to guess the extent of this issue
without comparing the emitted asm code. By checking the emitted assembly
code with and without the option, it was found that the only affected
location was the reported one, in ssl_sock_parse_clienthello(), where
the test can never fail on any system where the highest userland pointer
is at least 64kB away from wrapping (ie all 32/64 bit OS in field), so
there it is harmless.
This patch must be backported to all maintained versions.
Special thanks to Ilya Shipitsin for reporting this issue.
Willy Tarreau [Tue, 20 Mar 2018 10:17:29 +0000 (11:17 +0100)]
MINOR: log: stop emitting alerts when it's not possible to write on the socket
This is a recurring pain when using certain unix domain sockets or when
sending to temporarily unroutable addresses, if the process remains in
the foreground, the console is full of error which it's impossible to
do anything about. It's even worse when the process is remote, or when
run from a serial console which will slow the whole process down. Let's
send them only once now to warn about a possible config issue, and not
pollute the system nor slow everything down.
BUG/MEDIUM: threads/queue: wake up other threads upon dequeue
The previous patch about queues (5cd4bbd7a "BUG/MAJOR: threads/queue: Fix
thread-safety issues on the queues management") revealed a performance drop when
multithreading is enabled (nbthread > 1). This happens when pending connections
handled by other theads are dequeued. If these other threads are blocked in the
poller, we have to wait the poller's timeout (or any I/O event) to process the
dequeued connections.
To fix the problem, at least temporarly, we "wake up" the threads by requesting
a synchronization. This may seem a bit overkill to use the sync point to do a
wakeup on threads, but it fixes this performance issue. So we can now think
calmly on the good way to address this kind of issues.
This patch should be backported in 1.8 with the commit 5cd4bbd7a ("BUG/MAJOR:
threads/queue: Fix thread-safety issues on the queues management").
Baptiste Assmann [Mon, 19 Mar 2018 11:22:41 +0000 (12:22 +0100)]
BUG/MINOR: tcp-check: use the server's service port as a fallback
When running tcp-check scripts, one must ensure we can establish a tcp
connection first.
When doing this action, HAProxy needs a TCP port configured either on
the server or on the check itself or on the connect rule itself.
For some reasons, the connect code did not evaluate the service port on
the server structure...
BUG/MEDIUM: tcp-check: single connect rule can't detect DOWN servers
When tcpcheck is used to do TCP port monitoring only and the script is
composed by a single "tcp-check connect" rule (whatever port and ssl
options enabled), then the server can't be seen as DOWN.
Simple configuration to reproduce:
backend b
[...]
option tcp-check
tcp-check connect
server s1 127.0.0.1:22 check
The main reason for this issue is that the piece of code which validates
that we're not at the end of the chained list (of rules) prevents
executing the validation of the establishment of the TCP connection.
Since validation is not executed, the rule is terminated and the report
says no errors were encountered, hence the server is UP all the time.
The workaround is simple: move the connection validation outsied the
CONNECT rule processing loop, into the main function.
That way, if the connection status is not CONNECTED, then HAProxy will
now add more time to wait for it. If the time is expired, an error is
now well reported.
Bernard Spil [Thu, 15 Feb 2018 12:34:58 +0000 (13:34 +0100)]
BUILD: ssl: Fix build with OpenSSL without NPN capability
OpenSSL can be built without NEXTPROTONEG support by passing
-no-npn to the configure script. This sets the
OPENSSL_NO_NEXTPROTONEG flag in opensslconf.h
Since NEXTPROTONEG is now considered deprecated, it is superseeded
by ALPN (Application Layer Protocol Next), HAProxy should allow
building withough NPN support.
BUG/MINOR: cli: Ensure all command outputs end with a LF
Since 200b0fac ("MEDIUM: Add support for updating TLS ticket keys via
socket"), 4147b2ef ("MEDIUM: ssl: basic OCSP stapling support."), 4df59e9 ("MINOR: cli: add socket commands and config to prepend
informational messages with severity") and 654694e1 ("MEDIUM: stats/cli:
add support for "set table key" to enter values"), commands
'set ssl tls-key', 'set ssl ocsp-response', 'set severity-output' and
'set table' do not always send an extra LF at the end of their outputs.
This is required as mentioned in doc/management.txt:
"Since multiple commands may be issued at once, haproxy uses the empty
line as a delimiter to mark an end of output for each command"
Olivier Houchard [Thu, 15 Mar 2018 16:48:49 +0000 (17:48 +0100)]
BUG/MINOR: seemless reload: Fix crash when an interface is specified.
When doing a seemless reload, while receiving the sockets from the old process
the new process will die if the socket has been bound to a specific
interface.
This happens because the code that tries to parse the informations bogusly
try to set xfer_sock->namespace, while it should be setting wfer_sock->iface.
BUG/MINOR: dns: don't downgrade DNS accepted payload size automatically
Automatic downgrade of DNS accepted payload size may have undesired side
effect, which could make a backend with all servers DOWN.
After talking with Lukas on the ML, I realized this "feature" introduces
more issues that it fixes problem.
The "best" way to handle properly big responses will be to implement DNS
over TCP.
BUG/MAJOR: threads/queue: Fix thread-safety issues on the queues management
The management of the servers and the proxies queues was not thread-safe at
all. First, the accesses to <strm>->pend_pos were not protected. So it was
possible to release it on a thread (for instance because the stream is released)
and to use it in same time on another one (because we redispatch pending
connections for a server). Then, the accesses to stream's information (flags and
target) from anywhere is forbidden. To be safe, The stream's state must always
be updated in the context of process_stream.
So to fix these issues, the queue module has been refactored. A lock has been
added in the pendconn structure. And now, when we try to dequeue a pending
connection, we start by unlinking it from the server/proxy queue and we wake up
the stream. Then, it is the stream reponsibility to really dequeue it (or
release it). This way, we are sure that only the stream can create and release
its <pend_pos> field.
However, be careful. This new implementation should be thread-safe
(hopefully...). But it is not optimal and in some situations, it could be really
slower in multi-threaded mode than in single-threaded one. The problem is that,
when we try to dequeue pending connections, we process it from the older one to
the newer one independently to the thread's affinity. So we need to wait the
other threads' wakeup to really process them. If threads are blocked in the
poller, this will add a significant latency. This problem happens when maxconn
values are very low.
BUG/MEDIUM: threads/unix: Fix a deadlock when a listener is temporarily disabled
When a listener is temporarily disabled, we start by locking it and then we call
.pause callback of the underlying protocol (tcp/unix). For TCP listeners, this
is not a problem. But listeners bound on an unix socket are in fact closed
instead. So .pause callback relies on unbind_listener function to do its job.
Unfortunatly, unbind_listener hold the listener's lock and then call an internal
function to unbind it. So, there is a deadlock here. This happens during a
reload. To fix the problemn, the function do_unbind_listener, which is lockless,
is now exported and is called when a listener bound on an unix socket is
temporarily disabled.
Cyril Bonté [Mon, 12 Mar 2018 21:02:59 +0000 (22:02 +0100)]
BUG/MINOR: force-persist and ignore-persist only apply to backends
>From the very first day of force-persist and ignore-persist features,
they only applied to backends, except that the documentation stated it
could also be applied to frontends.
In order to make it clear, the documentation is updated and the parser
will raise a warning if the keywords are used in a frontend section.
This patch should be backported up to the 1.5 branch.
Cyril Bonté [Mon, 12 Mar 2018 20:47:39 +0000 (21:47 +0100)]
BUG/MEDIUM: fix a 100% cpu usage with cpu-map and nbthread/nbproc
Krishna Kumar reported a 100% cpu usage with a configuration using
cpu-map and a high number of threads,
Indeed, this minimal configuration to reproduce the issue :
global
nbthread 40
cpu-map auto:1/1-40 0-39
frontend test
bind :8000
This is due to a wrong type in a shift operator (int vs unsigned long int),
causing an endless loop while applying the cpu affinity on threads. The same
issue may also occur with nbproc under FreeBSD. This commit addresses both
cases.
BUG/MINOR: cli: Fix a typo in the 'set rate-limit' usage
The correct keyword is 'ssl-sessions' (vs. 'ssl-session').
The typo was introduced in 45c742be05 ('REORG: cli: move the "set
rate-limit" functions to their own parser').
Willy Tarreau [Mon, 5 Mar 2018 15:10:54 +0000 (16:10 +0100)]
BUG/MEDIUM: h2: also arm the h2 timeout when sending
Right now the h2 idle timeout is only set when there is no stream. If we
fail to send because the socket buffers are full (generally indicating
the client has left), we also need to arm it so that we can properly
expire such connections, otherwise some failed transfers might leave
H2 connections pending forever.
Thanks to Thierry Fournier for the diag and the traces.
Emeric Brun [Mon, 5 Mar 2018 16:46:16 +0000 (17:46 +0100)]
BUG/MINOR: session: Fix tcp-request session failure if handshake.
Some sample fetches check if session is established using
the flag CO_FL_CONNECTED. But in some cases, when a handshake
is performed this flag is set too late, after the process
of the tcp-request session rules.
This fix move the raising of the flag at the beginning of the
conn_complete_session function which processes the tcp-request
session rules.
This fix must be backported to 1.8 (and perhaps 1.7)