BUG/MEDIUM: mux-h1: Trim excess server data at the end of a transaction
At the end of a transaction, when the conn_stream is detach from the H1
connection, on the server side, we must release the input buffer to trim any
excess data received from the server to be sure to block invalid responses. A
typical example of such data would be from a buggy server responding to a HEAD
with some data, or sending more than the advertised content-length.
This issue was reported on Gitbub. See issue #176.
MINOR: config: Warn only if the option http-use-htx is used with "no" prefix
No warning message is emitted anymore if the option is used to enable the
HTX. But it is still diplayed when the "no" prefix is used to disable the HTX
explicitly. So, for existing configs, we display a warning only if there is a
change in the behavior of HAProxy between the 2.1 and the previous versions.
BUG/MINOR: checks: do not exit tcp-checks from the middle of the loop
There's a comment above tcpcheck_main() clearly stating that no return
statement should be placed in the middle, still we did have one after
installing the mux. It looks mostly harmless though as it will only
fail to mark the server as being in error in case of allocation failure
or config issue.
This fix should be backported to 2.0 and probably 1.9 as well.
BUG/MINOR: session: Send a default HTTP error if accept fails for a H1 socket
If session_accept_fd() fails for a raw HTTP socket, we try to send an HTTP error
500. But we must not rely on error messages of the proxy or on the array
http_err_chunks because these are HTX messages. And it should be too expensive
to convert an HTX message to a raw message at this place. So instead, we send a
default HTTP error message from the array http_err_msgs.
BUG/MINOR: session: Emit an HTTP error if accept fails only for H1 connection
If session_accept_fd() fails for a raw HTTP socket, we try to send an HTTP error
500. But, we must also take care it is an HTTP/1 connection. We cannot rely on
the mux at this stage, because the error, if any, happens before or during its
creation. So, instead, we check if the mux_proto is specified or not. Indeed,
the mux h1 cannot be forced on the bind line and there is no ALPN to choose
another mux on a raw socket. So if there is no mux_proto defined for a raw HTTP
socket, we are sure to have an HTTP/1 connection.
MINOR: http: Don't store raw HTTP errors in chunks anymore
Default HTTP error messages are stored in an array of chunks. And since the HTX
was added, these messages are also converted in HTX and stored in another
array. But now, the first array is not used anymore because the legacy HTTP mode
was removed.
So now, only the array with the HTX messages are kept. The other one was
removed.
MINOR: global: Preset tune.max_http_hdr to its default value
By default, this tune parameter is set to MAX_HTTP_HDR. This assignment is done
after the configuration parsing, when we check the configuration validity. So
during the configuration parsing, its value is 0. Now, it is set to MAX_HTTP_HDR
from the start. So, it is possible to rely on it during the configuration
parsing.
MINOR: proxy: Remove support of the option 'http-tunnel'
The option 'http-tunnel' is deprecated and it was only used in the legacy HTTP
mode. So this option is now totally ignored and a warning is emitted during
HAProxy startup if it is found in a configuration file.
REORG: proto_htx: Move HTX analyzers & co to http_ana.{c,h} files
The old module proto_http does not exist anymore. All code dedicated to the HTTP
analysis is now grouped in the file proto_htx.c. So, to finish the polishing
after removing the legacy HTTP code, proto_htx.{c,h} files have been moved in
http_ana.{c,h} files.
In addition, all HTX analyzers and related functions prefixed with "htx_" have
been renamed to start with "http_" instead.
Many flags of the HTTP transction (TX_*) are now unused and useless. So the
flags TX_WAIT_CLEANUP, TX_HDR_CONN_*, TX_CON_CLO_SET and TX_CON_KAL_SET were
removed. Most of TX_CON_WANT_* were also removed. Only TX_CON_WANT_TUN has been
kept.
MINOR: hlua: Remove useless test on TX_CON_WANT_* flags
When an HTTP applet is initialized, it is useless to force server-close mode on
the HTTP transaction because the connection mode is now handled by muxes. In
HTX, during analysis, the flag TX_CON_WANT_CLO is set by default in
htx_wait_for_request(), and TX_CON_WANT_SCL is never tested anywere.
First of all, all legacy HTTP analyzers and all functions exclusively used by
them were removed. So the most of the functions in proto_http.{c,h} were
removed. Only functions to deal with the HTTP transaction have been kept. Then,
http_msg and hdr_idx modules were entirely removed. And finally the structure
http_msg was lightened of all its useless information about the legacy HTTP. The
structure hdr_ctx was also removed because unused now, just like unused states
in the enum h1_state. Note that the memory pool "hdr_idx" was removed and
"http_txn" is now smaller.
MAJOR: filters: Remove code relying on the legacy HTTP mode
This commit breaks the compatibility with filters still relying on the legacy
HTTP code. The legacy callbacks were removed (http_data, http_chunk_trailers and
http_forward_data).
For now, the filters must still set the flag FLT_CFG_FL_HTX to be used on HTX
streams.
MINOR: stats: Remove code relying on the legacy HTTP mode
The part of the applet dealing with raw buffer was removed, for the HTTP part
only. So the old functions stats_send_http_headers() and
stats_send_http_redirect() were removed and replaced by the htx ones. The legacy
applet I/O handler was replaced by the htx one. And the parsing of POST data was
purged of the legacy HTTP code.
MINOR: flt_trace: Remove code relying on the legacy HTTP mode
The legacy HTTP callbacks were removed (trace_http_data,
trace_http_chunk_trailers and trace_http_forward_data). And the loop on the HTTP
headers was updated to only handle HTX messages.
MEDIUM: compression: Remove code relying on the legacy HTTP mode
The legacy HTTP callbacks were removed (comp_http_data, comp_http_chunk_trailers
and comp_http_forward_data). Functions emitting compressed chunks of data for
the legacy HTTP mode were also removed. The state for the compression filter was
updated accordingly. The compression context and the algorigttm used to compress
data are the only useful information remaining.
MEDIUM: cache: Remove code relying on the legacy HTTP mode
The applet delivering cached objects based on the legacy HTTP code was removed
as the filter callback cache_store_http_forward_data(). And the action analyzing
the response coming from the server to store it in the cache or not was purged
of the legacy HTTP code.
MEDIUM: backend: Remove code relying on the HTTP legacy mode
The L7 loadbalancing algorithms are concerned (uri, url_param and hdr), the
"sni" parameter on the server line and the "source" parameter on the server line
when used with "use_src hdr_ip()".
MINOR: proxy: Don't adjust connection mode of HTTP proxies anymore
This was only used for the legacy HTTP mode where the connection mode was
handled by the HTTP analyzers. In HTX, the function http_adjust_conn_mode() does
nothing. The connection mode is handled by the muxes.
MINOR: proxy: Remove tests on the option 'http-use-htx' during H1 upgrade
To know if an upgrade from TCP to H1 must be performed, we now only need to know
if a non HTX stream is assigned to an HTTP backend. So we don't rely anymore on
the flag PR_O2_USE_HTX to handle such upgrades.
MINOR: stream: Remove tests on the option 'http-use-htx' in stream_new()
All streams created for an HTTP proxy must now use the HTX internal
resprentation. So, it is no more necessary to test the flag PR_O2_USE_HTX. It
means a stream is an HTX stream if the frontend is an HTTP proxy or if the
frontend multiplexer, if any, set the flag MX_FL_HTX.
MEDIUM: http_fetch: Remove code relying on HTTP legacy mode
Since the legacy HTTP mode is disbabled, all HTTP sample fetches work on HTX
streams. So it is safe to remove all code relying on HTTP legacy mode. Among
other things, the function smp_prefetch_http() was removed with the associated
macros CHECK_HTTP_MESSAGE_FIRST() and CHECK_HTTP_MESSAGE_FIRST_PERM().
MINOR: stream: Rely on HTX analyzers instead of legacy HTTP ones
Since the legacy HTTP mode is disabled, old HTTP analyzers do nothing but call
those of the HTX. So, it is safe to directly call HTX analyzers from
process_stream().
MINOR: connection: Remove the multiplexer protocol PROTO_MODE_HTX
Since the legacy HTTP mode is disabled and no multiplexer relies on it anymore,
there is no reason to have 2 multiplexer protocols for the HTTP. So the protocol
PROTO_MODE_HTX was removed and all HTTP multiplexers use now PROTO_MODE_HTTP.
MAJOR: http: Deprecate and ignore the option "http-use-htx"
From this commit, the legacy HTTP mode is now definitely disabled. It is the
first commit of a long series to remove the legacy HTTP code. Now, all HTTP
processing is done using the HTX internal representation. Since the version 2.0,
It is the default mode. So now, it is no more possible to disable the HTX to
fallback on the legacy HTTP mode. If you still use "[no] option http-use-htx", a
warning will be emitted during HAProxy startup. Note the passthough multiplexer
is now only usable for TCP proxies.
MINOR: htx: Use an array of char to store HTX blocks
Instead of using a array of (struct block), it is more natural and intuitive to
use an array of char. Indeed, not only (struct block) are stored in this array,
but also their payload.
MINOR: htx: Deduce the number of used blocks from tail and head values
<head> and <tail> fields are now signed 32-bits integers. For an empty HTX
message, these fields are set to -1. So the field <used> is now useless and can
safely be removed. To know if an HTX message is empty or not, we just compare
<head> against -1 (it also works with <tail>). The function htx_nbblks() has
been added to get the number of used blocks.
MINOR: proto_htx: Don't stop forwarding when there is a post-connect processing
The TXN flag HTTP_MSGF_WAIT_CONN is now ignored on HTX streams. There is no
reason to not start to forward data in HTX. This is required for the legacy mode
and this was copied from it during the HTX development. But it is simply
useless.
BUG/MINOR: cache/htx: Make maxage calculation HTX aware
The function http_calc_maxage() was not updated to be HTX aware. So the header
"Cache-Control" on the response was never parsed to find "max-age" or "s-maxage"
values.
BUG/MINOR: http_htx: Initialize HTX error messages for TCP proxies
Since the HTX is the default mode for all proxies, HTTP and TCP, we must
initialize all HTX error messages for all HTX-aware proxies and not only for
HTTP ones. It is required to support HTTP upgrade for TCP proxies.
BUG/MINOR: http_fetch: Fix http_auth/http_auth_group when called from TCP rules
These sample fetches rely on the static fnuction get_http_auth(). For HTX
streams and TCP proxies, this last one gets its HTX message from the request's
channel. When called from an HTTP rule, There is no problem. Bu when called from
TCP rules for a TCP proxy, this buffer is a raw buffer not an HTX message. For
instance, using the following TCP rule leads to a crash :
tcp-request content accept if { http_auth(Users) }
To fix the bug, we must rely on the HTX message returned by the function
smp_prefetch_htx(). So now, the HTX message is passed as argument to the
function get_http_auth().
MINOR: mux-h2: Don't adjust anymore the amount of data sent in h2_snd_buf()
Because the infinite forward is HTX aware, it is useless to tinker with the
number of bytes really sent. This was fixed long ago for the H1 and forgotten to
do so for the H2.
BUG/MINOR: backend: do not try to install a mux when the connection failed
If si_connect() failed, do not try to install the mux nor to complete
the operations or add the connection to an idle list, and abort quickly
instead. No obvious side effects were identified, but continuing to
allocate some resources after something has already failed seems risky.
This was a result of a prior fix which already wanted to push this code
further : aa089d80b ("BUG/MEDIUM: server: Defer the mux init until after
xprt has been initialized.") but it ought to have pushed it even further
to maintain the error check just after si_connect().
The temporary connection used to hold the target connection's address
was missing a valid target, resulting in a 500 server error being
reported when trying to connect to a remote host. Strangely this
issue was introduced as a side effect of commit 2c52a2b9e ("MEDIUM:
connection: make mux->detach() release the connection") which at
first glance looks unrelated but solidly stops the bisection (note
that by default this part even crashes). It's suspected that the
error only happens when closing and destroys pending data in fact.
Given that this feature was broken very early during 1.8-rc1 development
it doesn't seem to be used often. This must be backported as far as 1.8.
BUG/MEDIUM: checks: Don't attempt to receive data if we already subscribed.
tcpcheck_main() might be called while we already attempted to subscribe, and
failed. There's no point in trying to call rcv_buf() again, and failing
would lead to us trying to subscribe again, which is not allowed.
A long time ago, applets were seen as an alternative to connections,
and since their respective sizes were roughly equal it appeared wise
to share the same pool. Nowadays, connections got significantly larger
but applets are not that often used, except for the cache. However
applets are mostly complementary and not alternatives anymore, as
it's very possible not to have a back connection or to share one with
other streams.
The connections will soon lose their addresses and their size will
shrink so much that appctx won't fit anymore. Given that the old
benefits of sharing these pools have long disappeared, let's stop
doing this and have a dedicated pool for appctx.
BUG/MINOR: dns: remove irrelevant dependency on a client connection
The do-resolve action tests for a client connection to the stream and
tries to get the client's address, otherwise it refrains from performing
the resolution. This really makes no sense at all and looks like an
earlier attempt at resolving the client's address to test that the
code was working. Further, it prevents the action from being used
from other places such as an autonomous applet for example, even if
at the moment this use case does not exist.
DOC: management: document cache_hits and cache_lookups in the CSV format
Counters for cache_hits and cache_lookups were added with commit a1214a50 ("MINOR: cache: report the number of cache lookups and cache
hits") but not documented in management.txt.
DOC: management: document reuse and connect counters in the CSV format
Counters for connect and reuse were added in the stats with commit f1573848 ("MINOR: backend: count the number of connect and reuse
per server and per backend") but not documented the CSV format in
management.txt
Released version 2.1-dev1 with the following main changes :
- BUG/MEDIUM: h2/htx: Update data length of the HTX when the cookie list is built
- DOC: this is a development branch again.
- MEDIUM: Make 'block' directive fatal
- MEDIUM: Make 'redispatch' directive fatal
- MEDIUM: Make '(cli|con|srv)timeout' directive fatal
- MEDIUM: Remove 'option independant-streams'
- MINOR: sample: Add sha2([<bits>]) converter
- MEDIUM: server: server-state global file stored in a tree
- BUG/MINOR: lua/htx: Make txn.req_req_* and txn.res_rep_* HTX aware
- BUG/MINOR: mux-h1: Add the header connection in lower case in outgoing messages
- BUG/MEDIUM: compression: Set Vary: Accept-Encoding for compressed responses
- MINOR: htx: Add the function htx_change_blk_value_len()
- BUG/MEDIUM: htx: Fully update HTX message when the block value is changed
- BUG/MEDIUM: mux-h2: Reset padlen when several frames are demux
- BUG/MEDIUM: mux-h2: Remove the padding length when a DATA frame size is checked
- BUG/MEDIUM: lb_fwlc: Don't test the server's lb_tree from outside the lock
- BUG/MAJOR: sample: Wrong stick-table name parsing in "if/unless" ACL condition.
- BUILD: mworker: silence two printf format warnings around getpid()
- BUILD: makefile: use :space: instead of digits to count commits
- BUILD: makefile: adjust the sed expression of "make help" for solaris
- BUILD: makefile: do not rely on shell substitutions to determine git version
- BUG/MINOR: mworker-prog: Fix segmentation fault during cfgparse
- BUG/MINOR: spoe: Fix memory leak if failing to allocate memory
- BUG/MEDIUM: mworker: don't call the thread and fdtab deinit
- BUG/MEDIUM: stream_interface: Don't add SI_FL_ERR the state is < SI_ST_CON.
- BUG/MEDIUM: connections: Always add the xprt handshake if needed.
- BUG/MEDIUM: ssl: Don't do anything in ssl_subscribe if we have no ctx.
- BUG/MEDIUM: mworker/cli: command pipelining doesn't work anymore
- BUG/MINOR: htx: Save hdrs_bytes when the HTX start-line is replaced
- BUG/MAJOR: mux-h1: Don't crush trash chunk area when outgoing message is formatted
- BUG/MINOR: memory: Set objects size for pools in the per-thread cache
- BUG/MINOR: log: Detect missing sampling ranges in config
- BUG/MEDIUM: proto_htx: Don't add EOM on 1xx informational messages
- BUG/MEDIUM: mux-h1: Use buf_room_for_htx_data() to detect too large messages
- BUG/MINOR: mux-h1: Make format errors during output formatting fatal
- BUG/MEDIUM: ssl: Don't attempt to set alpn if we're not using SSL.
- BUG/MEDIUM: mux-h1: Always release H1C if a shutdown for writes was reported
- BUG/MINOR: mworker/cli: don't output a \n before the response
- BUG/MEDIUM: checks: unblock signals in external checks
- BUG/MINOR: mux-h1: Skip trailers for non-chunked outgoing messages
- BUG/MINOR: mux-h1: Don't return the empty chunk on HEAD responses
- BUG/MEDIUM: connections: Always call shutdown, with no linger.
- BUG/MEDIUM: checks: Make sure the tasklet won't run if the connection is closed.
- BUG/MINOR: contrib/prometheus-exporter: Don't use channel_htx_recv_max()
- BUG/MINOR: hlua: Don't use channel_htx_recv_max()
- BUG/MEDIUM: channel/htx: Use the total HTX size in channel_htx_recv_limit()
- BUG/MINOR: hlua/htx: Respect the reserve when HTX data are sent
- BUG/MINOR: contrib/prometheus-exporter: Respect the reserve when data are sent
- BUG/MEDIUM: connections: Make sure we're unsubscribe before upgrading the mux.
- BUG/MEDIUM: servers: Authorize tfo in default-server.
- BUG/MEDIUM: sessions: Don't keep an extra idle connection in sessions.
- MINOR: server: Add "no-tfo" option.
- BUG/MINOR: contrib/prometheus-exporter: Don't try to add empty data blocks
- MINOR: action: Add the return code ACT_RET_DONE for actions
- BUG/MEDIUM: http/applet: Finish request processing when a service is registered
- BUG/MEDIUM: lb_fas: Don't test the server's lb_tree from outside the lock
- BUG/MEDIUM: mux-h1: Handle TUNNEL state when outgoing messages are formatted
- BUG/MINOR: mux-h1: Don't process input or ouput if an error occurred
- MINOR: stream-int: Factorize processing done after sending data in si_cs_send()
- BUG/MEDIUM: stream-int: Don't rely on CF_WRITE_PARTIAL to unblock opposite si
- DOC: contrib: spoa_server Add some hints for building spoa_server
- DOC: Fix typo in intro.txt
- BUG/MEDIUM: servers: Don't forget to set srv_cs to NULL if we can't reuse it.
- BUG/MINOR: ssl: revert empty handshake detection in OpenSSL <= 1.0.2
- MINOR: pools: release the pool's lock during the malloc/free calls
- MINOR: pools: always pre-initialize allocated memory outside of the lock
- MINOR: pools: make the thread harmless during the mmap/munmap syscalls
- BUG/MEDIUM: fd/threads: fix excessive CPU usage on multi-thread accept
- BUG/MINOR: server: Be really able to keep "pool-max-conn" idle connections
- BUG/MEDIUM: checks: Don't attempt to read if we destroyed the connection.
- BUG/MEDIUM: da: cast the chunk to string.
- DOC: Fix typos and grammer in configuration.txt
- CLEANUP: proto_tcp: Remove useless header inclusions.
- BUG/MEDIUM: servers: Fix a race condition with idle connections.
- MINOR: task: introduce work lists
- BUG/MAJOR: listener: fix thread safety in resume_listener()
- BUG/MEDIUM: mux-h1: Don't release h1 connection if there is still data to send
- BUG/MINOR: mux-h1: Correctly report Ti timer when HTX and keepalives are used
- BUG/MEDIUM: streams: Don't give up if we couldn't send the request.
- BUG/MEDIUM: streams: Don't redispatch with L7 retries if redispatch isn't set.
- BUG/MINOR: mux-pt: do not pretend there's more data after a read0
- BUG/MEDIUM: tcp-check: unbreak multiple connect rules again
- MEDIUM: mworker-prog: Add user/group options to program section
- REGTESTS: checks: tcp-check connect to multiple ports
- BUG/MEDIUM: threads: cpu-map designating a single thread/process are ignored
BUG/MEDIUM: threads: cpu-map designating a single thread/process are ignored
Since commit 81492c989 ("MINOR: threads: flatten the per-thread cpu-map"),
we don't keep the proc*thread matrix anymore to represent the full binding
possibilities, but only the proc and thread ones. The problem is that the
per-process binding is not the same for each thread and for the process,
and the proc[] array was assumed to store the per-proc first thread value
when doing this change. Worse, the logic present there tries to deal with
thread ranges and process ranges in a way which automatically exclused the
other possibility (since ranges cannot be used on both) but as such fails
to apply changes if neither the process nor the thread is expressed as a
range.
The real problem comes from the fact that specifying cpu-map 1/1 doesn't
yet reveal if the per-process mask or the per-thread mask needs to be
updated. In practice it's the thread one but then the current storage
doesn't allow to store the binding of the first thread of each other
process in nbproc>1 configurations.
When removing the proc*thread matrix, what ought to have been kept was
both the thread column for process 1 and the process line for threads 1,
but instead only the thread column was kept. This patch reintroduces the
storage of the configuration for the first thread of each process so that
it is again possible to store either the per-thread or per-process
configuration.
As a partial workaround for existing configurations, it is possible to
systematically indicate at least two processes or two threads at once
and map them by pairs or more so that at least two values are present
in the range. E.g :
REGTESTS: checks: tcp-check connect to multiple ports
This test uses two sets of tcp-check connect port rules, with one
of the two ports being closed and expects the check to fail for both
backends at different steps. It aims at detecting regressions such as
the one fixed by 7df8ca62 (BUG/MEDIUM: tcp-check: unbreak multiple
connect rules again).
BUG/MEDIUM: tcp-check: unbreak multiple connect rules again
The last connect rule used to be ignored and that was fixed by commit 248f1173f ("BUG/MEDIUM: tcp-check: single connect rule can't detect
DOWN servers") during 1.9 development. However this patch went a bit
too far by not breaking out of the loop after a pending connect(),
resulting in a series of failed connect() to be quickly skipped and
only the last one to be taken into account.
Technically speaking the series is not exactly skipped, it's just that
TCP checks suffer from a design issue which is that there is no
distinction between a new rule and this rule's completion in the
"connect" rule handling code. As such, when evaluating TCPCHK_ACT_CONNECT
a new connection is created regardless of any previous connection in
progress, and the previous result is ignored. It seems that this issue
is mostly specific to the connect action if we refer to the comments at
the top of the function, so it might be possible to durably address it
by reworking the connect state.
For now this patch does something simpler, it restores the behaviour
before the commit above consisting in breaking out of the loop when
the connection is in progress and after skipping comment rules. This
way we fall back to the default code waiting for completion.
This patch must be backported as far as 1.8 since the commit above
was backported there. Thanks to Jérôme Magnin for reporting and
bisecting this issue.
BUG/MINOR: mux-pt: do not pretend there's more data after a read0
Commit 8706c8131 ("BUG/MEDIUM: mux_pt: Always set CS_FL_RCV_MORE.")
was a bit excessive in setting this flag, it refrained from removing
it after read0 unless it was on an empty call. The problem it causes
is that read0 is thus ignored on the first call :
If CO_FL_SOCK_RD_SH is reported by the transport layer, it indicates the
read0 was already seen thus we must not try again and we must immedaitely
report it. The simple fix consists in removing the test on ret==0 :
BUG/MEDIUM: streams: Don't redispatch with L7 retries if redispatch isn't set.
Move the logic to decide if we redispatch to a new server from
sess_update_st_cer() to a new inline function, stream_choose_redispatch(), and
use it in do_l7_retry() instead of just setting the state to SI_ST_REQ.
That way, when using L7 retries, we won't redispatch the request to another
server except if "option redispatch" is used.
BUG/MEDIUM: streams: Don't give up if we couldn't send the request.
In htx_request_forward_body(), don't give up if we failed to send the request,
and we have L7 retries activated. If we do, we will not retry when we should.
Dave Pirotte [Wed, 10 Jul 2019 13:57:38 +0000 (13:57 +0000)]
BUG/MINOR: mux-h1: Correctly report Ti timer when HTX and keepalives are used
When HTTP keepalives are used in conjunction with HTX, the Ti timer
reports the elapsed time since the beginning of the connection instead
of the end of the previous request as stated in the documentation. Th,
Tq and Tt also report incorrectly as a result.
When creating a new h1s, check if it is the first request on the
connection. If not, set the session create times to the current
timestamp rather than the initial session accept timestamp. This makes
the logged timers behave as stated in the documentation.
BUG/MEDIUM: mux-h1: Don't release h1 connection if there is still data to send
When the h1 stream (h1s) is detached, If the connection is not really shutdown
yet and if there is still some data to send, the h1 connection (h1c) must not be
released. Otherwise, the remaining data are lost. This bug was introduced by the
commit 3ac0f430 ("BUG/MEDIUM: mux-h1: Always release H1C if a shutdown for
writes was reported").
Here is the conditions to release an h1 connection when the h1 stream is
detached :
* An error or a shutdown write occurred on the connection
(CO_FL_ERROR|CO_FL_SOCK_WR_SH)
* an error, an h2 upgrade or full shutdown occurred on the h1 connection
(H1C_F_CS_ERROR||H1C_F_UPG_H2C|H1C_F_CS_SHUTDOWN)
* A shutdown write is pending on the h1 connection and there is no more data
in the output buffer
((h1c->flags & H1C_F_CS_SHUTW_NOW) && !b_data(&h1c->obuf))
If one of these conditions is fulfilled, the h1 connection is
released. Otherwise, the release is delayed. If we are waiting to send remaining
data, a timeout is set.
This patch must be backported to 2.0 and 1.9. It fixes the issue #164.
BUG/MAJOR: listener: fix thread safety in resume_listener()
resume_listener() can be called from a thread not part of the listener's
mask after a curr_conn has gone lower than a proxy's or the process' limit.
This results in fd_may_recv() being called unlocked if the listener is
bound to only one thread, and quickly locks up.
This patch solves this by creating a per-thread work_list dedicated to
listeners, and modifying resume_listener() so that it bounces the listener
to one of its owning thread's work_list and waking it up. This thread will
then call resume_listener() again and will perform the operation on the
file descriptor itself. It is important to do it this way so that the
listener's state cannot be modified while the listener is being moved,
otherwise multiple threads can take conflicting decisions and the listener
could be put back into the global queue if the listener was used at the
same time.
It seems like a slightly simpler approach would be possible if the locked
list API would provide the ability to return a locked element. In this
case the listener would be immediately requeued in dequeue_all_listeners()
without having to go through resume_listener() with its associated lock.
This fix must be backported to all versions having the lock-less accept
loop, which is as far as 1.8 since deadlock fixes involving this feature
had to be backported there. It is expected that the code should not differ
too much there. However, previous commit "MINOR: task: introduce work lists"
will be needed as well and should not present difficulties either. For 1.8,
the commits introducing thread_mask() and LIST_ADDED() will be needed as
well, either backporting my_flsl() or switching to my_ffsl() will be OK,
and some changes will have to be performed so that the init function is
properly called (and maybe the deinit one can be dropped).
In order to test for the fix, simply set up a multi-threaded frontend with
multiple bind lines each attached to a single thread (reproduced with 16
threads here), set up a very low maxconn value on the frontend, and inject
heavy traffic on all listeners in parallel with slightly more connections
than the configured limit ( typically +20%) so that it flips very
frequently. If the bug is still there, at some point (5-20 seconds) the
traffic will go much lower or even stop, either with spinning threads or
not.
Sometimes we need to delegate some list processing to a function running
on another thread. In this case the list element will simply be queued
into a dedicated self-locked list and the task responsible for this list
will be woken up, calling the associated function which will run over the
list.
This is what work_list does. Such lists will be dedicated to a limited
type of work but will significantly ease such remote handling. A function
is provided to create these per-thread lists, their tasks and to properly
bind each task to a distinct thread, so that the caller only has to store
the resulting pointer to the start of the structure.
These structures should not be abused though as each head will consume
4 pointers per thread, hence 32 bytes per thread or 2 kB for 64 threads.
BUG/MEDIUM: servers: Fix a race condition with idle connections.
When we're purging idle connections, there's a race condition, when we're
removing the connection from the idle list, to add it to the list of
connections to free, if the thread owning the connection tries to free it
at the same time.
To fix this, simply add a per-thread lock, that has to be hold before
removing the connection from the idle list, and when, in conn_free(), we're
about to remove the connection from every list. That way, we know for sure
the connection will stay valid while we remove it from the idle list, to add
it to the list of connections to free.
This should happen rarely enough that it shouldn't have any impact on
performances.
This has not been reported yet, but could provoke random segfaults.
BUG/MEDIUM: checks: Don't attempt to read if we destroyed the connection.
In event_srv_chk_io(), only call __event_srv_chk_r() if we did not subscribe
for reading, and if wake_srv_chk() didn't return -1, as it would mean it
just destroyed the connection and the conn_stream, and attempting to use
those to recv data would lead to a crash.
BUG/MINOR: server: Be really able to keep "pool-max-conn" idle connections
The maximum number of idle connections for a server can be configured by setting
the server option "pool-max-conn". But when we try to add a connection in its
idle list, because of a wrong comparison, it may be rejected because there are
already "pool-max-conn - 1" idle connections.
BUG/MEDIUM: fd/threads: fix excessive CPU usage on multi-thread accept
While experimenting with potentially improved fairness and latency using
ticket locks on a Ryzen 16-thread/8-core, a very strange situation happened
a lot for some levels of traffic. Around 300k connections per second, no
more connections would be accepted on the multi-threaded listener but all
others would continue to work fine. All attempts to trace showed that the
threads were all in the trylock in the fd cache, or in the spinlock of
fd_update_events(), or in the one of fd_may_recv(). But as indicated this
was not a deadlock since the process continues to work fine.
After quite some investigation it appeared that the issue is caused by a
lack of fairness between the fdcache's trylock and these functions' spin
locks above. In fact, regardless of the success or failure of the fdcache's
attempt at grabbing the lock, the poller was calling fd_update_events()
which locks the FD once for something that can be done with a CAS, and
then calls fd_may_recv() with another lock for something that most often
didn't change. The high contention on these spinlocks leaves no chance to
any other thread to grab the lock using trylock(), and once this happens,
there is no thread left to process incoming connection events nor to stop
polling on the FD, leaving all threads at 100% CPU but partially operational.
This patch addresses the issue by using bit-test-and-set instead of the OR
in fd_may_recv() / fd_may_send() so that nothing is done if the FD was
already configured as expected. It does the same in fd_update_events()
using a CAS to check if the FD's events need to be changed at all or not.
With this patch applied, it became impossible to reproduce the issue, and
now there's no way to saturate all 16 CPUs with the load used for testing,
as no more than 1350-1400 were noticed at 300+kcps vs 1600.
Ideally this patch should go further and try to remove the remaining
incarnations of the fdlock as this seems possible, but it's difficult
enough to be done in a distinct patch that will not have to be backported.
It is possible that workloads involving a high connection rate may slightly
benefit from this patch and observe a slightly lower CPU usage even when
the service doesn't misbehave.
MINOR: pools: make the thread harmless during the mmap/munmap syscalls
These calls can take quite some time and leave the thread harmless so
it's better to mark it as such. This makes "show sess" respond way
faster during high loads running on processes build with DEBUG_UAF
since these calls are stressed a lot.
MINOR: pools: always pre-initialize allocated memory outside of the lock
When calling mmap(), in general the system gives us a page but does not
really allocate it until we first dereference it. And it turns out that
this time is much longer than the time to perform the mmap() syscall.
Unfortunately, when running with memory debugging enabled, we mmap/munmap()
each object resulting in lots of such calls and a high contention on the
allocator. And the first accesses to the page being done under the pool
lock is extremely damaging to other threads.
The simple fact of writing a 0 at the beginning of the page after
allocating it and placing the POOL_LINK pointer outside of the lock is
enough to boost the performance by 8x in debug mode and to save the
watchdog from triggering on lock contention. This is what this patch
does.
MINOR: pools: release the pool's lock during the malloc/free calls
The malloc and free calls and especially the underlying mmap/munmap()
can occasionally take a huge amount of time and even cause the thread
to sleep. This is visible when haproxy is compiled with DEBUG_UAF which
causes every single pool allocation/free to allocate and release pages.
In this case, when using the locked pools, the watchdog can occasionally
fire under high contention (typically requesting 40000 1M objects in
parallel over 8 threads). Then, "perf top" shows that 50% of the CPU
time is spent in mmap() and munmap(). The reason the watchdog fires is
because some threads spin on the pool lock which is held by other threads
waiting on mmap() or munmap().
This patch modifies this so that the pool lock is released during these
syscalls. Not only this allows other threads to request try to allocate
their data in parallel, but it also considerably reduces the lock
contention.
Note that the locked pools are only used on small architectures where
high thread counts would not make sense, so this will not provide any
benefit in the general case. However it makes the debugging versions
way more stable, which is always appreciated.
Lukas Tribus [Mon, 8 Jul 2019 12:29:15 +0000 (14:29 +0200)]
BUG/MINOR: ssl: revert empty handshake detection in OpenSSL <= 1.0.2
Commit 54832b97 ("BUILD: enable several LibreSSL hacks, including")
changed empty handshake detection in OpenSSL <= 1.0.2 and LibreSSL,
from accessing packet_length directly (not available in LibreSSL) to
calling SSL_state() instead.
However, SSL_state() appears to be fully broken in both OpenSSL and
LibreSSL.
Since there is no possibility in LibreSSL to detect an empty handshake,
let's not try (like BoringSSL) and restore this functionality for
OpenSSL 1.0.2 and older, by reverting to the previous behavior.
BUG/MEDIUM: servers: Don't forget to set srv_cs to NULL if we can't reuse it.
In connect_server(), if there were already a CS assosciated with the stream,
but we can't reuse it, because the target is different (because we tried a
previous connection, it failed, and we use redispatch so we switched servers),
don't forget to set srv_cs to NULL. Otherwise, if we end up reusing another
connection, we would consider we already have a conn_stream, and we won't
create a new one, so we'd have a new connection but we would not be able to
use it.
This can explain frozen streams and connections stuck in CLOSE_WAIT when
using redispatch.
BUG/MEDIUM: stream-int: Don't rely on CF_WRITE_PARTIAL to unblock opposite si
In the function stream_int_notify(), when the opposite stream-interface is
blocked because there is no more room into the input buffer, if the flag
CF_WRITE_PARTIAL is set on this buffer, it is unblocked. It is a way to unblock
the reads on the other side because some data was sent.
But it is a problem during the fast-forwarding because only the stream is able
to remove the flag CF_WRITE_PARTIAL. So it is possible to have this flag because
of a previous send while the input buffer of the opposite stream-interface is
now full. In such case, the opposite stream-interface will be woken up for
nothing because its input buffer is full. If the same happens on the opposite
side, we will have a loop consumming all the CPU.
To fix the bug, the opposite side is now only notify if there is some available
room in its input buffer in the function si_cs_send(), so only if some data was
sent.
MINOR: stream-int: Factorize processing done after sending data in si_cs_send()
In the function si_cs_send(), what is done when an error occurred on the
connection or the conn_stream or when some successfully data was send via a pipe
or the channel's buffer may be factorized at the function. It slightly simplify
the function.
This patch must be backported to 2.0 and 1.9 because a bugfix depends on it.
BUG/MINOR: mux-h1: Don't process input or ouput if an error occurred
It is useless to proceed if an error already occurred. Instead, it is better to
wait it will be catched by the stream or the connection, depending on which is
the first one to detect it.
BUG/MEDIUM: mux-h1: Handle TUNNEL state when outgoing messages are formatted
Since the commit 94b2c7 ("MEDIUM: mux-h1: refactor output processing"), the
formatting of outgoing messages is performed on the message state and no more on
the HTX blocks read. But the TUNNEL state was left out. So, the HTTP tunneling
using the CONNECT method or switching the protocol (for instance,
the WebSocket) does not work.
This issue was reported on Github. See #131. This patch must be backported to
2.0.