Willy Tarreau [Mon, 10 Jun 2019 08:14:52 +0000 (10:14 +0200)]
OPTIM/MINOR: init/threads: only call protocol_enable_all() on first thread
There's no point in calling this on each and every thread since the first
thread passing there will enable the listeners, and the next ones will
simply scan all of them in turn to discover that they are already
initialized. Let's only initilize them on the first thread. This could
slightly speed up start up on very large configurations, eventhough most
of the time is still spent in the main thread binding the sockets.
A few measurements have constantly shown that this decreases the startup
time by ~0.1s for 150k listeners. Starting all of them in parallel doesn't
provide better results and can still expose some undesired races.
Willy Tarreau [Mon, 10 Jun 2019 07:51:04 +0000 (09:51 +0200)]
BUG/MEDIUM: init/threads: prevent initialized threads from starting before others
Since commit 6ec902a ("MINOR: threads: serialize threads initialization")
we now serialize threads initialization. But doing so has emphasized another
race which is that some threads may actually start the loop before others
are done initializing.
As soon as all threads enter the first thread_release() call, their rdv
bit is cleared and they're all waiting for all others' rdv to be cleared
as well, with their harmless bit set. The first one to notice the cleared
mask will progress through thread_isolate(), take rdv again preventing
most others from noticing its short pass to zero, and this first one will
be able to run all the way through the initialization till the last call
to thread_release() which it happily crosses, being the only one with the
rdv bit, leaving the room for one or a few others to do the same. This
results in some threads entering the loop before others are done with
their initialization, which is particularly bad. PiBa-NL reported that
some regtests fail for him due to this (which was impossible to reproduce
here, but races are racy by definition). However placing some printf()
in the initialization code definitely shows this unsychronized startup.
This patch takes a different approach in three steps :
- first, we don't start with thread_release() anymore and we don't
set the rdv mask anymore in the main call. This was initially done
to let all threads start toghether, which we don't want. Instead
we just start with thread_isolate(). Since all threads are harmful
by default, they all wait for each other's readiness before starting.
- second, we don't release with thread_release() but with
thread_sync_release(), meaning that we don't leave the function until
other ones have reached the point in the function where they decide
to leave it as well.
- third, it makes sure we don't start the listeners using
protocol_enable_all() before all threads have allocated their local
FD tables or have initialized their pollers, otherwise startup could
be racy as well. It's worth noting that it is even possible to limit
this call to thread #0 as it only needs to be performed once.
This now guarantees that all thread init calls start only after all threads
are ready, and that no thread enters the polling loop before all others have
completed their initialization.
Please check GH issues #111 and #117 for more context.
No backport is needed, though if some new init races are reported in
1.9 (or even 1.8) which do not affect 2.0, then it may make sense to
carefully backport this small series.
Willy Tarreau [Sun, 9 Jun 2019 10:20:02 +0000 (12:20 +0200)]
MEDIUM: threads: add thread_sync_release() to synchronize steps
This function provides an alternate way to leave a critical section run
under thread_isolate(). Currently, a thread may remain in thread_release()
without having the time to notice that the rdv mask was released and taken
again by another thread entering thread_isolate() (often the same that just
released it). This is because threads wait in harmless mode in the loop,
which is compatible with the conditions to enter thread_isolate(). It's
not possible to make them wait with the harmless bit off or we cannot know
when the job is finished for the next thread to start in thread_isolate(),
and if we don't clear the rdv bit when going there, we create another
race on the start point of thread_isolate().
This new synchronous variant of thread_release() makes use of an extra
mask to indicate the threads that want to be synchronously released. In
this case, they will be marked harmless before releasing their sync bit,
and will wait for others to release their bit as well, guaranteeing that
thread_isolate() cannot be started by any of them before they all left
thread_sync_release(). This allows to construct synchronized blocks like
this :
thread_isolate()
/* optionally do something alone here */
thread_sync_release()
/* do something together here */
thread_isolate()
/* optionally do something alone here */
thread_sync_release()
And so on. This is particularly useful during initialization where several
steps have to be respected and no thread must start a step before the
previous one is completed by other threads.
This one must not be placed after any call to thread_release() or it would
risk to block an earlier call to thread_isolate() which the current thread
managed to leave without waiting for others to complete, and end up here
with the thread's harmless bit cleared, blocking others. This might be
improved in the future.
Willy Tarreau [Sun, 9 Jun 2019 06:44:19 +0000 (08:44 +0200)]
MINOR: threads: avoid clearing harmless twice in thread_release()
thread_release() is to be called after thread_isolate(), i.e. when the
thread already has its harmless bit cleared. No need to clear it twice,
thus avoid calling thread_harmless_end() and directly check the rdv
bits then loop on them.
BUG/MEDIUM: stream_interface: Make sure we call si_cs_process() if CS_FL_EOI.
In si_cs_recv(), if we got the CS_FL_EOI flag on the conn_stream, make sure
we return 1, so that si_cs_process() will be called, and wake
process_stream() up, otherwise if we're unlucky the flag will never be
noticed, and the stream won't be woken up.
BUG/MEDIUM: H1: When upgrading, make sure we don't free the buffer too early.
In h1_release(), when we want to upgrade the mux to h2, make sure we set
h1c->ibuf to BUF_NULL before calling conn_upgrade_mux_fe().
If the upgrade is successful, the buffer will be provided to the new mux,
h1_release() will be called recursively, it will so try to free h1c->ibuf,
and freeing the buffer we just provided to the new mux would be unfortunate.
Willy Tarreau [Fri, 7 Jun 2019 17:00:37 +0000 (19:00 +0200)]
MEDIUM: tools: improve time format error detection
As reported in GH issue #109 and in discourse issue
https://discourse.haproxy.org/t/haproxy-returns-408-or-504-error-when-timeout-client-value-is-every-25d
the time parser doesn't error on overflows nor underflows. This is a
recurring problem which additionally has the bad taste of taking a long
time before hitting the user.
This patch makes parse_time_err() return special error codes for overflows
and underflows, and adds the control in the call places to report suitable
errors depending on the requested unit. In practice, underflows are almost
never returned as the parsing function takes care of rounding values up,
so this might possibly happen on 64-bit overflows returning exactly zero
after rounding though. It is not really possible to cut the patch into
pieces as it changes the function's API, hence all callers.
Tests were run on about every relevant part (cookie maxlife/maxidle,
server inter, stats timeout, timeout*, cli's set timeout command,
tcp-request/response inspect-delay).
MINOR: peers: Optimization for dictionary cache lookup.
When we look up an dictionary entry in the cache used upon transmission
we store the last result in ->prev_lookup of struct dcache_tx so that
to compare it with the subsequent entries to look up and save performances.
MINOR: peers: A bit of optimization when encoding cached server names.
When a server name is cached we only send its cache entry ID which has
an encoded length of 1 (because smaller than PEER_ENC_2BYTES_MIN).
So, in this case we only have to encode 1, the already known encoded length
of this ID before encoding it.
Furthermore we do not have to call strlen() to compute the lengths of server
name strings thanks to this commit: "MINOR: dict: Store the length of the
dictionary entries".
MINOR: dict: Store the length of the dictionary entries.
When allocating new dictionary entries we store the length of the strings.
May be useful so that not to have to call strlen() too much often at runing
time.
DOC: peers: Update for dictionary cache entries for peers protocol.
Add information about how the peers protocol send/receive entries of
LRU caches for literal dictionaries (e.g. server names in replacement
for server IDs).
MINOR peers: data structure simplifications for server names dictionary cache.
We store pointers to server names dictionary entries in a pre-allocated array of
ebpt_node's (->entries member of struct dcache_tx) to cache those sent to remote
peers. Consequently the ID used to identify the server name dictionary entry is
also used as index for this array. There is no need to implement a lookup by key
for this dictionary cache.
Willy Tarreau [Fri, 7 Jun 2019 12:41:11 +0000 (14:41 +0200)]
MINOR: threads: serialize threads initialization
There is no point in initializing threads in parallel when we know that
it's the moment where some global variables are turned to thread-local
ones, and/or that some global variables are updated (like global_now or
trash_size). Some FDs might be created/destroyed/reallocated and could
be tricky to follow as well (think about epoll_fd for example).
Instead of having to be extremely careful about all these, and to trigger
false positives in thread sanitizers, let's simply initialize one thread
at a time. The init step is very fast so nobody should even notice, and
we won't have any more doubts about what might have happened when
analysing a dump.
See GH issues #111 and #117 for some background on this.
It was made only to silence the thread sanitizer but ends up creating a
bug. Indeed, if "tune.bufsize" is in the global section, the trash_size
value is not updated anymore and the trash becomes smaller than a buffer!
Let's stop trying to fix the thread sanitizer reports, they are invalid,
and trying to fix them actually introduces bugs where there were none.
See GH issue #117 for more context. No backport is needed.
MINOR: chunks: Make sure trash_size is only set once.
The trash_size variable is shared by all threads, and is set by all threads,
when alloc_trash_buffers() is called. To make sure it's set only once,
to silence a harmless data race, use a CAS to set it, and only set it if it
was 0.
Willy Tarreau [Fri, 7 Jun 2019 09:10:07 +0000 (11:10 +0200)]
MINOR: logs: use the new bitmap functions instead of fd_sets for encoding maps
The fd_sets we've been using in the log encoding functions are not portable
and were shown to break at least under Cygwin. This patch gets rid of them
in favor of the new bitmap functions. It was verified with the config below
that the log output was exactly the same before and after the change :
Willy Tarreau [Fri, 7 Jun 2019 08:42:43 +0000 (10:42 +0200)]
MINOR: tools: add new bitmap manipulation functions
We now have ha_bit_{set,clr,flip,test} to manipulate bitfields made
of arrays of longs. The goal is to get rid of the remaining non-portable
FD_{SET,CLR,ISSET} that still exist at a few places.
Willy Tarreau [Fri, 7 Jun 2019 06:20:46 +0000 (08:20 +0200)]
BUG/MEDIUM: mux-h2: make sure the connection timeout is always set
There seems to be a tricky case in the H2 mux related to stream flow
control versus buffer a full situation : is a large response cannot
be entirely sent to the client due to the stream window being too
small, the stream is paused with the SFCTL flag. Then the upper
layer stream might get bored and expire this stream. It will then
shut it down first. But the shutdown operation might fail if the
mux buffer is full, resulting in the h2s being subscribed to the
deferred_shut event with the stream *not* added to the send_list
since it's blocked in SFCTL. In the mean time the upper layer completely
closes, calling h2_detach(). There we have a send_wait (the pending
shutw), the stream is marked with SFCTL so we orphan it.
Then if the client finally reads all the data that were clogging the
buffer, the send_list is run again, but our stream is not there. From
this point, the connection's stream list is not empty, the mux buffer
is empty, so the connection's timeout is not set. If the client
disappears without updating the stream's window, nothing will expire
the connection.
This patch makes sure we always keep the connection timeout updated.
There might be finer solutions, such as checking that there are still
living streams in the connection (i.e. streams not blocked in SFCTL
state), though this is not necessarily trivial nor useful, since the
client timeout is the same for the upper level stream and the connection
anyway.
This patch needs to be backported to 1.9 and 1.8 after some observation.
Willy Tarreau [Fri, 7 Jun 2019 04:12:59 +0000 (06:12 +0200)]
[RELEASE] Released version 2.0-dev6
Released version 2.0-dev6 with the following main changes :
- BUG/MEDIUM: connection: fix multiple handshake polling issues
- MINOR: connection: also stop receiving after a SOCKS4 response
- MINOR: mux-h1: don't try to recv() before the connection is ready
- BUG/MEDIUM: mux-h1: only check input data for the current stream, not next one
- MEDIUM: mux-h1: don't use CS_FL_REOS anymore
- CLEANUP: connection: remove the now unused CS_FL_REOS flag
- CONTRIB: debug: add 4 missing connection/conn_stream flags
- MEDIUM: stream: make a full process_stream() loop when completing I/O on exit
- MINOR: server: increase the default pool-purge-delay to 5 seconds
- BUILD: tools: do not use the weak attribute for trace() on obsolete linkers
- BUG/MEDIUM: vars: make sure the scope is always valid when accessing vars
- BUG/MEDIUM: vars: make the tcp/http unset-var() action support conditions
- BUILD: task: fix a build warning when threads are disabled
- CLEANUP: peers: Remove tabs characters.
- CLEANUP: peers: Replace hard-coded values by macros.
- BUG/MINOR: peers: Wrong stick-table update message building.
- MINOR: dict: Add dictionary new data structure.
- MINOR: peers: Add a LRU cache implementation for dictionaries.
- MINOR: stick-table: Add "server_name" new data type.
- MINOR: cfgparse: Space allocation for "server_name" stick-table data type.
- MINOR: proxy: Add a "server by name" tree to proxy.
- MINOR: server: Add a dictionary for server names.
- MINOR: stream: Stickiness server lookup by name.
- MINOR: peers: Make peers protocol support new "server_name" data type.
- MINOR: stick-table: Make the CLI stick-table handler support dictionary entry data type.
- REGTEST: Add a basic server by name stickiness reg test.
- MINOR: peers: Add dictionary cache information to "show peers" CLI command.
- MINOR: peers: Replace hard-coded for peer protocol 64-bits value encoding by macros.
- MINOR: peers: Replace hard-coded values for peer protocol messaging by macros.
- CLEANUP: ssl: remove unneeded defined(OPENSSL_IS_BORINGSSL)
- BUILD: travis-ci improvements
- MINOR: SSL: add client/server random sample fetches
- BUG/MINOR: channel/htx: Don't alter channel during forward for empty HTX message
- BUG/MINOR: contrib/prometheus-exporter: Add HTX data block in one time
- BUG/MINOR: mux-h1: errflag must be set on H1S and not H1M during output processing
- MEDIUM: mux-h1: refactor output processing
- MINOR: mux-h1: Add the flag HAVE_O_CONN on h1s
- MINOR: mux-h1: Add h1_eval_htx_hdrs_size() to estimate size of the HTX headers
- MINOR: mux-h1: Don't count the EOM in the estimated size of headers
- MEDIUM: cache/htx: Always store info about HTX blocks in the cache
- MEDIUM: htx: Add the parsing of trailers of chunked messages
- MINOR: htx: Don't use end-of-data blocks anymore
- BUG/MINOR: mux-h1: Don't send more data than expected
- BUG/MINOR: flt_trace/htx: Only apply the random forwarding on the message body.
- BUG/MINOR: peers: Wrong "server_name" decoding.
- BUG/MEDIUM: servers: Don't attempt to destroy idle connections if disabled.
- MEDIUM: checks: Make sure we unsubscribe before calling cs_destroy().
- MEDIUM: connections: Wake the upper layer even if sending/receiving is disabled.
- MEDIUM: ssl: Handle subscribe by itself.
- MINOR: ssl: Make ssl_sock_handshake() static.
- MINOR: connections: Add a new xprt method, remove_xprt.
- MINOR: connections: Add a new xprt method, add_xprt().
- MEDIUM: connections: Introduce a handshake pseudo-XPRT.
- MEDIUM: connections: Remove CONN_FL_SOCK*
- BUG/MEDIUM: ssl: Don't forget to initialize ctx->send_recv and ctx->recv_wait.
- BUG/MINOR: peers: Wrong server name parsing.
- MINOR: server: really increase the pool-purge-delay default to 5 seconds
- BUG/MINOR: stream: don't emit a send-name-header in conn error or disconnect states
- MINOR: stream-int: use bit fields to match multiple stream-int states at once
- MEDIUM: stream-int: remove dangerous interval checks for stream-int states
- MEDIUM: stream-int: introduce a new state SI_ST_RDY
- MAJOR: stream-int: switch from SI_ST_CON to SI_ST_RDY on I/O
- MEDIUM: stream-int: make idle-conns switch to ST_RDY
- MEDIUM: stream: re-arrange the connection setup status reporting
- MINOR: stream-int: split si_update() into si_update_rx() and si_update_tx()
- MINOR: stream-int: make si_sync_send() from the send code of si_update_both()
- MEDIUM: stream: rearrange the events to remove the loop
- MEDIUM: stream: only loop on flags relevant to the analysers
- MEDIUM: stream: don't abusively loop back on changes on CF_SHUT*_NOW
- BUILD: stream-int: avoid a build warning in dev mode in si_state_bit()
- BUILD: peers: fix a build warning about an incorrect intiialization
- BUG/MINOR: time: make sure only one thread sets global_now at boot
- BUG/MEDIUM: tcp: Make sure we keep the polling consistent in tcp_probe_connect.
BUG/MEDIUM: tcp: Make sure we keep the polling consistent in tcp_probe_connect.
In tcp_probe_connect(), if the connection is still pending, do not disable
want_recv, we don't have any business to do so, but explicitely use
__conn_xprt_want_send(), otherwise the next time we'll reach tcp_probe_connect,
fd_send_ready() would return 0 and we would never flag the connection as
CO_FL_CONNECTED, which can lead to various problems, such as check not
completing because they consider it is not connected yet.
Willy Tarreau [Thu, 6 Jun 2019 14:50:39 +0000 (16:50 +0200)]
BUG/MINOR: time: make sure only one thread sets global_now at boot
All threads call tv_update_date(-1) at boot to set their own local time
offset. While doing so they also overwrite global_now, which is not that
much of a problem except that it's not done using an atomic write and
that it will be overwritten by every there in parallel. We only need the
first thread to set it anyway, so let's simply set it if not set and do
it using a CAS. This should fix GH issue #111.
Willy Tarreau [Thu, 6 Jun 2019 14:40:43 +0000 (16:40 +0200)]
BUILD: peers: fix a build warning about an incorrect intiialization
Just got this one :
src/peers.c:528:13: warning: missing braces around initializer [-Wmissing-braces]
src/peers.c:528:13: warning: (near initialization for 'cde.key') [-Wmissing-braces]
Indeed, this struct contains two structs so scalar zero is not a valid
value for the first field. Let's just leave it as an empty struct since
it was the purpose.
Willy Tarreau [Thu, 6 Jun 2019 12:45:26 +0000 (14:45 +0200)]
MEDIUM: stream: don't abusively loop back on changes on CF_SHUT*_NOW
These flags are not used by analysers, only by the shut* functions, and
they were covered by CF_MASK_STATIC only because in the past the shut
functions were in the middle of the analysers. But here they are causing
excess loop backs which provide no value and increase processing cost.
Ideally the CF_MASK_STATIC bitfield should be revisited, but doing this
alone is enough to reduce by 30% the number of calls to si_sync_send().
Willy Tarreau [Thu, 6 Jun 2019 12:32:49 +0000 (14:32 +0200)]
MEDIUM: stream: only loop on flags relevant to the analysers
In process_stream() we detect a number of conditions to decide to loop
back to the analysers. Some of them are excessive in that they perform
a strict comparison instead of filtering on the flags relevant to the
analysers as is done at other places, resulting in excess wakeups. One
of the effect is that after a successful WRITE_PARTIAL, a second send is
not possible, resulting in the loss of WRITE_PARTIAL, causing another
wakeup! Let's apply the same mask and verify the flags correctly.
Willy Tarreau [Thu, 6 Jun 2019 07:17:23 +0000 (09:17 +0200)]
MEDIUM: stream: rearrange the events to remove the loop
The "goto redo" at the end of process_stream() to make the states converge
is still a big source of problems and mostly stems from the very late call
to the send() functions, whose results need to be considered, while it's
being done in si_update_both() when leaving.
This patch extracts the si_sync_send() calls from si_update_both(), and
places them at the relevant places in process_stream(), which are just
after the amount of data to forward is updated and before the shutw()
calls (which were also moved). The stream-interface resynchronization
needs to go slightly upper to take into account the transition from CON
to RDY that will happen consecutive to some successful send(), and that's
all.
By doing so we can now get rid of this loop and have si_update_both()
called only to update the stream interface and channel when leaving the
function, as it was initially designed to work.
It is worth noting that a number of the remaining conditions to perform
a goto resync_XXX still seem suboptimal and would benefit from being
refined to perform les resynchronization. But what matters at this stage
is that the code remains valid and efficient.
Willy Tarreau [Thu, 6 Jun 2019 06:20:17 +0000 (08:20 +0200)]
MINOR: stream-int: make si_sync_send() from the send code of si_update_both()
Just like we have a synchronous recv() function for the stream interface,
let's have a synchronous send function that we'll be able to call from
different places. For now this only moves the code, nothing more.
Willy Tarreau [Thu, 6 Jun 2019 06:19:20 +0000 (08:19 +0200)]
MINOR: stream-int: split si_update() into si_update_rx() and si_update_tx()
We should not update the two directions at once, in fact we should update
the Rx path after recv() and the Tx path after send(). Let's start by
splitting the update function in two for this.
Willy Tarreau [Wed, 5 Jun 2019 16:02:04 +0000 (18:02 +0200)]
MEDIUM: stream: re-arrange the connection setup status reporting
Till now when a wakeup happens after a connection is attempted, we go
through sess_update_st_con_tcp() to deal with the various possible events,
then to sess_update_st_cer() to deal with a possible error detected by the
former, or to sess_establish() to complete the connection validation. There
are multiple issues in the way this is handled, which have accumulated over
time. One of them is that any spurious wakeup during SI_ST_CON would validate
the READ_ATTACHED flag and wake the analysers up. Another one is that nobody
feels responsible for clearing SI_FL_EXP if it happened at the same time as
a success (and it is present in all reports of loops to date). And another
issue is that aborts cannot happen after a clean connection setup with no
data transfer (since CF_WRITE_NULL is part of CF_WRITE_ACTIVITY). Last, the
flags cleanup work was hackish, added here and there to please the next
function (typically what had to be donne in commit 7a3367cca to work around
the url_param+reuse issue by moving READ_ATTACHED to CON).
This patch performs a significant lift up of this setup code. First, it
makes sure that the state handlers are the ones responsible for the cleanup
of the stuff they rely on. Typically sess_sestablish() will clean up the
SI_FL_EXP flag because if we decided to validate the connection it means
that we want to ignore this late timeout. Second, it splits the CON and
RDY state handlers because the former only has to deal with failures,
timeouts and non-events, while the latter has to deal with partial or
total successes. Third, everything related to connection success was
moved to sess_establish() since it's the only safe place to do so, and
this function is also called at a few places to deal with synchronous
connections, which are not seen by intermediary state handlers.
The code was made a bit more robust, for example by making sure we
always set SI_FL_NOLINGER when aborting a connection so that we don't
have any risk to leave a connection in SHUTW state in case it was
validated late. The useless return codes of some of these functions
were dropped so that callers only rely on the stream-int's state now
(which was already partially the case anyway).
The code is now a bit cleaner, could be further improved (and functions
renamed) but given the sensitivity of this part, better limit changes to
strictly necessary. It passes all reg tests.
Willy Tarreau [Thu, 6 Jun 2019 07:17:15 +0000 (09:17 +0200)]
MEDIUM: stream-int: make idle-conns switch to ST_RDY
The purpose of making idle-conns switch to SI_ST_CON was to make the
transition detectable and the operation retryable in case of connection
error. Now we have the RDY state for this which is much more suitable
since it indicates a validated connection on which we didn't necessarily
send anything yet. This will still lead to a transition to EST while not
requiring unnatural write polling nor connect timeouts.
Willy Tarreau [Wed, 5 Jun 2019 14:43:44 +0000 (16:43 +0200)]
MAJOR: stream-int: switch from SI_ST_CON to SI_ST_RDY on I/O
Now whenever an I/O event succeeds during a connection attempt, we
switch the stream-int's state to SI_ST_RDY. This allows si_update()
to update R/W timeouts on the channel and end points to start to
consume outgoing data and to subscribe to lower layers in case of
failure. It also allows chk_rcv() to be performed on the other side
to enable data forwarding and make sure we don't fall into a situation
where no more events happen and nothing moves anymore.
Willy Tarreau [Wed, 5 Jun 2019 12:34:03 +0000 (14:34 +0200)]
MEDIUM: stream-int: introduce a new state SI_ST_RDY
The main reason for all the trouble we're facing with stream interface
error or timeout reports during the connection phase is that we currently
can't make the difference between a connection attempt and a validated
connection attempt. It is problematic because we tend to switch early
to SI_ST_EST but can't always do what we want in this state since it's
supposed to be set when we don't need to visit sess_establish() again.
This patch introduces a new state betwen SI_ST_CON and SI_ST_EST, which
is SI_ST_RDY. It indicates that we've verified that the connection is
ready. It's a transient state, like SI_ST_DIS, that cannot persist when
leaving process_stream(). For now it is not set, only verified in various
tests where SI_ST_CON was used or SI_ST_EST depending on the cases.
The stream-int state diagram was minimally updated to reflect the new
state, though it is largely obsolete and would need to be seriously
updated.
Willy Tarreau [Wed, 5 Jun 2019 12:53:22 +0000 (14:53 +0200)]
MEDIUM: stream-int: remove dangerous interval checks for stream-int states
The stream interface state checks involving ranges were replaced with
checks on a set of states, already revealing some issues. No issue was
fixed, all was replaced in a one-to-one mapping for easier control. Some
checks involving a strict difference were also replaced with fields to
be clearer. At this stage, the result must be strictly equivalent. A few
tests were also turned to their bit-field equivalent for better readability
or in preparation for upcoming changes.
The test performed in the SPOE filter was swapped so that the closed and
error states are evicted first and that the established vs conn state is
tested second.
Willy Tarreau [Wed, 5 Jun 2019 12:45:06 +0000 (14:45 +0200)]
MINOR: stream-int: use bit fields to match multiple stream-int states at once
At some places we do check for ranges of stream-int states but those
are confusing as states ordering is not well known (e.g. it's not obvious
that CER is between CON and EST). Let's create a bit field from states so
that we can match multiple states at once instead. The new enum si_state_bit
contains SI_SB_* which are state bits instead of state values. The function
si_state_in() indicates if the state in argument is one of those represented
by the bit mask in second argument.
Willy Tarreau [Wed, 5 Jun 2019 13:29:38 +0000 (15:29 +0200)]
BUG/MINOR: stream: don't emit a send-name-header in conn error or disconnect states
The test for the send-name-header field used to cover all states between
SI_ST_CON and SI_ST_CLO, which include SI_ST_CER and SI_ST_DIS. Trying to
send a header in these states makes no sense at all, so let's fix this.
This should have no visible impact so no backport is needed.
Willy Tarreau [Thu, 6 Jun 2019 14:25:55 +0000 (16:25 +0200)]
MINOR: server: really increase the pool-purge-delay default to 5 seconds
Commit fb55365f9 ("MINOR: server: increase the default pool-purge-delay
to 5 seconds") did this but the setting placed in new_server() was
overwritten by srv_settings_cpy() from the default-server values preset
in init_default_instance(). Now let's put it at the right place.
This commit was not complete:
BUG/MINOR: peers: Wrong "server_name" decoding.
We forgot forgotten to move forward <msg_cur> pointer variable after
having parse the server name string.
Again this bug may happen only if we add stick-table new data type after
the server name which is the current last one. Furthermore this bug is
visible only the first time a peer sends a server name for a stick-table
entry.
BUG/MEDIUM: ssl: Don't forget to initialize ctx->send_recv and ctx->recv_wait.
When creating a new ssl_sock_ctx, don't forget to initialize its send_recv
and recv_wait to NULL, or we may end up dereferencing random values, and
crash.
Olivier Houchard [Tue, 28 May 2019 08:12:02 +0000 (10:12 +0200)]
MEDIUM: connections: Remove CONN_FL_SOCK*
Now that the various handshakes come with their own XPRT, there's no
need for the CONN_FL_SOCK* flags, and the conn_sock_want|stop functions,
so garbage-collect them.
Olivier Houchard [Mon, 27 May 2019 10:09:19 +0000 (12:09 +0200)]
MEDIUM: connections: Introduce a handshake pseudo-XPRT.
Add a new XPRT that is used when using non-SSL handshakes, such as proxy
protocol or Netscaler, instead of taking care of it in conn_fd_handler().
This XPRT is installed when any of those is used, and it removes itself once
the handshake is done.
This should allow us to remove the distinction between CO_FL_SOCK* and
CO_FL_XPRT*.
Olivier Houchard [Thu, 23 May 2019 15:47:36 +0000 (17:47 +0200)]
MINOR: connections: Add a new xprt method, remove_xprt.
Add a new method to xprt_ops, remove_xprt. When called, if the provided
xprt_ctx is the same as the xprt's underlying xprt_ctx, it then uses the
new xprt provided, otherwise it calls the remove_xprt method of the next
xprt.
The goal is to be able to add a temporary xprt, that removes itself from
the chain when it did what it had to do. This will be used to implement
a pseudo-xprt for anything that just requires a handshake (such as the
proxy protocol).
Olivier Houchard [Mon, 20 May 2019 12:02:16 +0000 (14:02 +0200)]
MEDIUM: ssl: Handle subscribe by itself.
As the SSL code may have different needs than the upper layer, ie it may want
to receive when the upper layer wants to right, instead of directly forwarding
the subscribe to the underlying xprt, handle it ourself. The SSL code will
know remember any subscribe call, and wake the tasklet when it is ready
for more I/O.
MEDIUM: connections: Wake the upper layer even if sending/receiving is disabled.
In conn_fd_handler(), if the fd is ready to send/recv, wake the upper layer
even if we have CO_FL_ERROR, or if CO_FL_XPRT_RD_ENA/CO_FL_XPRT_WR_ENA isn't
set. The only reason we should reach that point is if we had a shutw/shutr,
and the upper layer may want to know about it, and is supposed to handle it
anyway.
Olivier Houchard [Fri, 31 May 2019 17:20:36 +0000 (19:20 +0200)]
MEDIUM: checks: Make sure we unsubscribe before calling cs_destroy().
When we want to destroy the conn_stream for some reason, usually on error,
make sure we unsubscribed before doing so. If we subsscribed, the xprt may
ultimately wake our tasklet on close, aand the check tasklet doesn't expect
it ot happen when we have no longer any conn_stream.
BUG/MEDIUM: servers: Don't attempt to destroy idle connections if disabled.
In connect_server(), when deciding if we should attempt to remove idle
connections, because we have to many file descriptors opened, don't attempt
to do so if idle connection pool is disabled (with pool-max-conn 0), as
if it is, srv->idle_orphan_conns won't even be allocated, and trying to
dereference it will cause a crash.
This patch fixes a bug which does not occur at this time because the "server_name"
stick-table data type is the last one (see STKTABLE_DT_SERVER_NAME). It was introduced
by this commit: "MINOR: peers: Make peers protocol support new "server_name" data type".
Indeed when receiving STD_T_DICT stick-table data type we first decode the length
of these data, then we decode the ID of this dictionary entry. To know if there
is remaining data to parse, we check if we have reached the end of the current data,
relying on <msg_end> variable. But <msg_end> is at the end of the entire message!
So this patch computes the correct end of the current STD_T_DICT before doing
anything else with it.
BUG/MINOR: flt_trace/htx: Only apply the random forwarding on the message body.
In the function trace_http_payload(), when the random forwarding is enabled,
only blocks of type HTX_BLK_DATA must be considered. Because other blocks must
be forwarding in one time.
This patch must be backported to 1.9. But it will have to be adapted. Because
several changes on the HTX in the 2.0 are missing in the 1.9.
BUG/MINOR: mux-h1: Don't send more data than expected
In h1_snd_buf(), we try to consume as much data as possible in a loop. In this
loop, we first format the raw HTTP message from the HTX message, then we try to
send it. But we must be carefull to never send more data than specified by the
stream-interface.
This type of blocks is useless because transition between data and trailers is
obvious. And when there is no trailers, the end-of-message is still there to
know when data end for chunked messages.
MEDIUM: htx: Add the parsing of trailers of chunked messages
HTTP trailers are now parsed in the same way headers are. It means trailers are
converted to K/V blocks followed by an end-of-trailer marker. For now, to make
things simple, the type for trailer blocks are not the same than for header
blocks. But the aim is to make no difference between headers and trailers by
using the same type. Probably for the end-of marker too.
MEDIUM: cache/htx: Always store info about HTX blocks in the cache
It was only done for the headers (including the EOH marker). data were prefixed
by the info field of these blocks. The payload and the trailers of the messages
were stored in raw. The total size of headers and payload were kept in the
cached object state to help output formatting.
Now, info about each HTX block is store in the cache. Only data are allowed to
be splitted. Otherwise, all blocks of an HTX message are handled the same way,
both when storing a message in the cache and when delivering it from the
cache. This will help the cache implementation to be more robust to internal
changes in the HTX. Especially for the upcoming parsing of trailers. There is
also no more need to keep extra info in the cached object state.
MINOR: mux-h1: Don't count the EOM in the estimated size of headers
If there is not enough space in the HTX message, the EOM can be delayed when a
bodyless message is added. So, don't count it in the estimated size of headers.
This flag is set on h1s when output messages are formatted to know the
connection mode was already processed. It replace the variable process_conn_mode
in the function h1_process_output().
When we format the H1 output, in the loop on the HTX message, instead of
switching on the block types, we now switch on the message state. It is almost
the same, but it will ease futur changes, on trailers and end-of markers.
BUG/MINOR: contrib/prometheus-exporter: Add HTX data block in one time
Since recent changes on the way HTX data blocks are added in an HTX message, we
must now be sure the prometheus service add its own blocks in one time. Indeed,
the function htx_add_data() may now decide to only copy a part of data. So
instead, we must call htx_add_data_atonce() instead.
BUG/MINOR: channel/htx: Don't alter channel during forward for empty HTX message
In channel_htx_forward() and channel_htx_forward_forever(), if the HTX message
is empty, the underlying buffer may be really empty too. And we have no warranty
the caller will call htx_to_buf() later. And in practice, it is almost never
done. So the channel's buffer must not be altered. Otherwise, the buffer may be
considered as full (data == size) for an empty HTX message and no outgoing data.
These fetches retrieve the client or server random value sent during the
handshake.
Their use is to be able to decrypt traffic sent using ephemeral ciphers. Tools
like wireshark expect a TLS log file with lines in a few known formats
(https://code.wireshark.org/review/gitweb?p=wireshark.git;a=blob;f=epan/dissectors/packet-tls-utils.c;h=28a51fb1fb029eae5cea52d37ff5b67d9b11950f;hb=HEAD#l5209).
Previously the only format supported using data retrievable from HAProxy state
was the one utilizing the Session-ID. However an SSL/TLS session ID is
optional, and thus cannot be relied upon for this purpose.
This change introduces the ability to extract the client random instead which
can be used for one of the other formats. The change also adds the ability to
extract the server random, just in case it might have some other use, as the
code change to support this was trivial.
Ilya Shipitsin [Tue, 4 Jun 2019 21:16:51 +0000 (02:16 +0500)]
BUILD: travis-ci improvements
full list:
update LibreSSL to 2.9.2
speed up build by using "make -j3"
cache BoringSSL checkout
build prometeus exporter
add basic cygwin build
add USE_TFO=1, USE_SYSTEMD=1 to linux builds
MINOR: peers: Replace hard-coded for peer protocol 64-bits value encoding by macros.
With this patch we define macros for the minimum values which are
encoded for 2 up to 10 bytes. This latter is big enough to encode
UINT64_MAX. We replaced at several places 240 value by PEER_ENC_2BYTES_MIN
which is the minimum value which is encoded with 2 bytes. The peer protocol
encoding consisting in encoding with only one byte a value which is
less than PEER_ENC_2BYTES_MIN and with at least 2 bytes a 64-bits value greater
than PEER_ENC_2BYTES_MIN.
REGTEST: Add a basic server by name stickiness reg test.
With this new reg test we ensure the server by names stickiness is functional
between servers organized differently (with identical names, but with different IDs)
among two haproxy processes backends.
MINOR: peers: Make peers protocol support new "server_name" data type.
Make usage of the APIs implemented for dictionaries (dict.c) and their LRU caches (struct dcache)
so that to send/receive server names used for the server by name stickiness. These
names are sent over the network as follows:
- in every case we send the encode length of the data (STD_T_DICT), then
- if the server names is not present in the cache used upon transmission (struct dcache_tx)
we cache it and we the ID of this TX cache entry followed the encode length of the
server name, and finally the sever name itseft (non NULL terminated string).
- if the server name is present, we repead these operations but we only send the TX cache
entry ID.
Upon receipt, the couple of (cache IDs, server name) are stored the LRU cache used
only upon receipt (struct dcache_rx). As the peers protocol is symetrical, the fact
that the server name is present in the received data (resp. or not) denotes if
the entry is absent (resp. or not).
With this patch we modify the stickiness server targets lookup behavior.
First we look for this server targets by their names before looking for them by their
IDs if not found. We also insert a dictionary entry for the name of the server targets
and store the address of this entry in the underlying stick-table.
MINOR: cfgparse: Space allocation for "server_name" stick-table data type.
When parsing sticking rules, with this patch we reserve some room for the new
"server_name" stick-table data type, as this is already done for "server_id",
setting the offset and used space (in bytes) in the stick-table entry thanks
to stkable_alloc_data_type().
MINOR: stick-table: Add "server_name" new data type.
This simple patch only adds definitions to create a new stick-table
data type ID and a new standard type to store information in relation
wich dictionary entries (STD_T_DICT).
MINOR: peers: Add a LRU cache implementation for dictionaries.
We want to send some stick-table data fields stored as strings in dictionaries
without consuming too much memory and CPU. To do so we implement with this patch
a cache for send/received dictionaries entries. These dictionary of strings entries are
stored in others real dictionary entries with an identifier as key (unsigned int)
and a pointer to the dictionary of strings entries as values.
This patch adds minimalistic definitions to implement dictionary new data structure
which is an ebtree of ebpt_node structs with strings as keys. Note that this has nothing
to see with real dictionary data structure (maps of keys in association with values).
When creating this patch "CLEANUP: peers: Replace hard-coded values by macros",
we realized there was a remaining place in peer_prepare_updatemsg() where the maximum
of an encoded length harcoded value could be replaced by PEER_MSG_ENCODED_LENGTH_MAXLEN
macro. But in this case, the 1 harcoded value for the header length is wrong. Should
be 2 or PEER_MSG_HEADER_LEN. So, there is a missing byte to encode the length of
remaining data after the header.
Note that the bug was never encountered because even with a missing byte, we could
encode a maximum length which would be (1<<25) (32MB) according to the following
extract of the peers protocol documentation which were from far a never reached limit
I guess:
CLEANUP: peers: Replace hard-coded values by macros.
All the peer stick-table messages are made of a 2-byte header (PEER_MSG_HEADER_LEN)
followed by the encoded length of the remaining data wich is harcoded as 5 (in bytes)
for the maximum (PEER_MSG_ENCODED_LENGTH_MAXLEN). With such a length we can encode
a maximum length which equals to (1 << 32) - 1, which is from far enough.
This patches replaces both these values by macros where applicable.
Willy Tarreau [Tue, 4 Jun 2019 15:16:29 +0000 (17:16 +0200)]
BUILD: task: fix a build warning when threads are disabled
The __decl_hathreads() macro will leave a lone semi-colon making the end
of variables declarations, resulting in a warning if threads are disabled.
Let's simply swap it with the last variable. Thanks to Ilya Shipitsin for
reporting this issue.
Willy Tarreau [Tue, 4 Jun 2019 14:43:29 +0000 (16:43 +0200)]
BUG/MEDIUM: vars: make the tcp/http unset-var() action support conditions
Patrick Hemmer reported that http-request unset-var(foo) if ... fails to
parse. The reason is that it reuses the same parser as "set-var(foo)" which
makes a special case of the arguments, supposed to be a sample expression
for set-var, but which must not exist for unset-var. Unfortunately the
parser finds "if" or "unless" and believes it's an expression. Let's simply
drop the test so that the outer rule parser deals with potential extraneous
keywords.
This should be backported to all versions supporting unset-var().
The tests on the variables scopes is not strict enough, it needs to always
verify if the stream is valid when accessing a req/res/txn variable. This
patch does this by adding a new get_vars() function which does the job
instead of open-coding all the lookups everywhere.
It must be backported to all versions supporting set-var and
"tcp-request session" so at least 1.9 and 1.8.
Willy Tarreau [Tue, 4 Jun 2019 14:02:26 +0000 (16:02 +0200)]
BUILD: tools: do not use the weak attribute for trace() on obsolete linkers
The default dummy trace() function is marked weak in order to be easily
replaced at link time. Some linkers are having issues with the weak
attribute, so let's not mark it on these linkers. They will simply not
be able to build with TRACE=1, which is no big deal since it's only used
by developers.
Willy Tarreau [Tue, 4 Jun 2019 12:06:31 +0000 (14:06 +0200)]
MINOR: server: increase the default pool-purge-delay to 5 seconds
The default used to be a very aggressive delay of 1 second before starting
to purge idle connections, but tests show that with bursty traffic it's a
bit short. Let's increase this to 5 seconds.
Willy Tarreau [Mon, 3 Jun 2019 15:27:52 +0000 (17:27 +0200)]
MEDIUM: stream: make a full process_stream() loop when completing I/O on exit
During 1.9 development cycle a shortcut was made in process_stream() to
update the analysers immediately after an I/O even detected on the send()
path while leaving the function. In order to prevent this from being abused
by a single stream stealing all the CPU, the loop didn't cover the initial
recv() call, so that events ultimately converge.
This has caused a number of issues over time because the conditions to
decide to loop are a bit tricky. For example the CF_READ_PARTIAL flag is
not immediately removed from rqf_last and may appear for a long time at
this point, sometimes causing some loops to last long.
Another unexpected side effect is that all analysers are called again with
no data to process, just because CF_WRITE_PARTIAL is present. We cannot get
rid of this event even if of very rare use, because some analysers might
wait for some data to leave a buffer before proceeding. With a full loop,
this event would have been merged with a subsequent recv() allowing analysers
to do something more useful than just ack an event they don't care about.
While during early 1.9-dev it was very important to be kind with the
scheduler, nowadays it's lock-free for local tasks so this optimization
is much less interesting to use it for I/Os, especially if we factor in
the trouble it causes.
This patch thus removes the use of the loop for regular I/Os and instead
performs a task_wakeup() with an I/O event so that the task will be
scheduled after all other ones and will have a chance to perform another
recv() and possibly to gather more I/O events to be processed at once.
Synchronous errors and transitions to SI_ST_DIS however are still handled
by the loop.
Doing so significantly reduces the average number of calls to analysers
(those are typically halved when compression is enabled in legacy mode),
and as a side benefit, has increased the H1 performance by about 1%.
Willy Tarreau [Mon, 3 Jun 2019 12:18:22 +0000 (14:18 +0200)]
MEDIUM: mux-h1: don't use CS_FL_REOS anymore
This flag was already removed from other muxes and from the upper layers,
because it was misused. It indicates to the mux that the end of a stream
was already seen and is pending after existing data, but this should not
be on the conn_stream but internal to the mux.
This patch creates a new H1S flag H1S_F_REOS to replace it and uses it to
replace the last uses of CS_FL_REOS.
Willy Tarreau [Mon, 3 Jun 2019 11:42:54 +0000 (13:42 +0200)]
BUG/MEDIUM: mux-h1: only check input data for the current stream, not next one
The mux-h1 doesn't properly propagate end of streams to the application
layer when requests are pipelined. This is visible by launching h2load
in h1 mode with -m greater than 1 : issuing Ctrl-C has no effect until
the client timeout expires.
The reason is that among the checks conditionning the reporting of the
end of stream status and waking up the streams, is a test on the presence
of remaining input data in the demux. But with pipelining, these data may
be present for another stream and should not prevent the end of stream
condition from being reported.
This patch addresses this issue by introducing a new function
"h1s_data_pending" which returns a boolean indicating if there are in
the demux buffer any data for the current stream. That is, if the stream
is in H1_MSG_DONE state, there are never any data for it. And if it's in
a different state, then the demux buffer is checked. This replaces the
tests on b_data(&h1c->ibuf) and correctly allows end of streams to be
reported at the end of requests.
It's worth noting that 1.9 doesn't suffer from this issue but it possibly
isn't completely immune either given that the same tests are present.
Willy Tarreau [Mon, 3 Jun 2019 08:12:22 +0000 (10:12 +0200)]
MINOR: mux-h1: don't try to recv() before the connection is ready
Just as we already do in h1_send(), if the connection is not yet ready,
do not proceed and instead subscribe. This avoids a needless recvfrom()
and subscription to polling for a case which will never work since the
request was not even sent.
Willy Tarreau [Mon, 3 Jun 2019 08:14:18 +0000 (10:14 +0200)]
MINOR: connection: also stop receiving after a SOCKS4 response
Just as is done in previous patch for all handshake handlers,
also stop receiving after a SOCKS4 response was received. This
one escaped the previous cleanup but must be done to keep the
code safe.
Connection handshakes were rarely stacked on top of each other, but the
recent experiments consisting in sending PROXY over SOCKS4 revealed a
number of issues in these lower layers. First, each handler waiting for
data MUST subscribe to recv events with __conn_sock_want_recv() and MUST
unsubscribe from send events using __conn_sock_stop_send() to avoid any
wake-up loop in case a previous sender has set this. Second, each handler
waiting for sending MUST subscribe to send events with __conn_sock_want_send()
and MUST unsubscribe from recv events using __conn_sock_stop_recv() to
avoid any wake-up loop in case some data are available on the connection.
Till now this was done at various random places, and in particular the
cases where the FD was not ready for recv forgot to re-enable reading.
Second, while senders can happily use conn_sock_send() which automatically
handles EINTR, loops, and marks the FD as not ready with fd_cant_send(),
there is no equivalent for recv so receivers facing EAGAIN MUST call
fd_cant_send() to enable polling. It could be argued that implementing
an equivalent conn_sock_recv() function could be useful and more
long-term proof than the current situation.
Third, both types of handlers MUST unsubscribe from their respective
events once they managed to do their job, and none may even play with
__conn_xprt_*(). Here again this was lacking, and one surprizing call
to __conn_xprt_stop_recv() was present in the proxy protocol parser
for TCP6 messages!
Thanks to Alexander Liu for his help on this issue.
This patch must be backported to 1.9 and possibly some older versions,
though the SOCKS parts should be dropped.
Willy Tarreau [Sun, 2 Jun 2019 10:06:08 +0000 (12:06 +0200)]
[RELEASE] Released version 2.0-dev5
Released version 2.0-dev5 with the following main changes :
- BUILD: watchdog: use si_value.sival_int, not si_int for the timer's value
- BUILD: signals: FreeBSD has SI_LWP instead of SI_TKILL
- BUILD: watchdog: condition it to USE_RT
- MINOR: raw_sock: report global traffic statistics
- MINOR: stats: report the global output bit rate in human readable form
- BUG/MINOR: proto-htx: Try to keep connections alive on redirect
- BUG/MEDIUM: spoe: Don't use the SPOE applet after releasing it
- BUG/MINOR: lua: Set right direction and flags on new HTTP objects
- BUG/MINOR: mux-h2: Count EOM in bytes sent when a HEADERS frame is formatted
- BUG/MINOR: mux-h1: Report EOI instead EOS on parsing error or H2 upgrade
- BUG/MEDIUM: proto-htx: Not forward too much data when 1xx reponses are handled
- BUG/MINOR: htx: Remove a forgotten while loop in htx_defrag()
- DOC: fix typos
- BUG/MINOR: ssl_sock: Fix memory leak when disabling compression
- OPTIM: freq-ctr: don't take the date lock for most updates
- MEDIUM: mux-h2: avoid doing expensive buffer realigns when not absolutely needed
- CLEANUP: debug: remove the TRACE() macro
- MINOR: buffer: introduce b_make() to make a buffer from its parameters
- MINOR: buffer: add a new buffer ring API to manipulate rings of buffers
- MEDIUM: mux-h2: replace all occurrences of mbuf with a buffer ring
- MEDIUM: mux-h2: make the conditions to send based on mbuf, not just its tail
- MINOR: mux-h2: introduce h2_release_mbuf() to release all buffers in the mbuf ring
- MEDIUM: mux-h2: make the send() function iterate over all mux buffers
- CLEANUP: mux-h2: consistently use a local variable for the mbuf
- MINOR: mux-h2: report the mbuf's head and tail in "show fd"
- MAJOR: mux-h2: switch to next mux buffer on buffer full condition.
- BUILD: connections: shut up gcc about impossible out-of-bounds warning
- BUILD: ssl: fix latest LibreSSL reg-test error
- MINOR: cli/activity: remove "fd_del" and "fd_skip" from show activity
- MINOR: cli/activity: add 3 general purpose counters in development mode
- BUG/MAJOR: lb/threads: make sure the avoided server is not full on second pass
- BUG/MEDIUM: queue: fix the tree walk in pendconn_redistribute.
- BUG/MEDIUM: threads: fix double-word CAS on non-optimized 32-bit platforms
- MEDIUM: config: now alert when two servers have the same name
- MINOR: htx: Remove the macro IS_HTX_SMP() and always use IS_HTX_STRM() instead
- MINOR: htx: Move the macro IS_HTX_STRM() in proto/stream.h
- MINOR: htx: Store the head position instead of the wrap one
- MINOR: htx: Store start-line block's position instead of address of its payload
- MINOR: htx: Add functions to get the first block of an HTX message
- MINOR: mux-h2/htx: Get the start-line from the head when HEADERS frame is built
- MINOR: htx: Replace the function http_find_stline() by http_get_stline()
- CLEANUP: htx: Remove unused function htx_get_stline()
- MINOR: http/htx: Use sl_pos directly to replace the start-line
- MEDIUM: http/htx: Perform analysis relatively to the first block
- MINOR: channel/htx: Call channel_htx_recv_max() from channel_recv_max()
- MINOR: htx: Add function htx_get_max_blksz()
- BUG/MINOR: htx: Change htx_xfer_blk() to also count metadata
- MEDIUM: mux-h1: Use the count value received from the SI in h1_rcv_buf()
- MINOR: mux-h2: Use the count value received from the SI in h2_rcv_buf()
- MINOR: stream-int: Don't use the flag CO_RFL_KEEP_RSV anymore in si_cs_recv()
- MINOR: connection: Remove the unused flag CO_RFL_KEEP_RSV
- MINOR: mux-h2/htx: Support zero-copy when possible in h2_rcv_buf()
- MINOR: htx: Add a field to set the memory used by headers in the HTX start-line
- MINOR: h2/htx: Set hdrs_bytes on the SL when an HTX message is produced
- MINOR: mux-h1: Set hdrs_bytes on the SL when an HTX message is produced
- MINOR: htx: Be sure to xfer all headers in one time in htx_xfer_blks()
- MEDIUM: htx: 1xx messages are now part of the final reponses
- MINOR: channel/htx: Add function to forward headers of an HTX message
- MINOR: filters/htx: Use channel_htx_fwd_headers() after headers filtering
- MINOR: proto-htx: Use channel_htx_fwd_headers() to forward 1xx responses
- MEDIUM: htx: Store the first block position instead of the start-line one
- MINOR: stats/htx: don't use the first block position but the head one
- MINOR: channel/htx: Add functions to forward a part or all HTX payload
- MINOR: proto-htx: Use channel_htx_fwd_all() when unfiltered body are forwarded
- MEDIUM: filters/htx: Filter body relatively to the first block
- MINOR: htx: Optimize htx_drain() when all data are drained
- MINOR: htx: don't rely on htx_find_blk() anymore in the function htx_truncate()
- MINOR: htx: remove the unused function htx_find_blk()
- MINOR: htx: Remove support of pseudo headers because it is unused
- BUG/MEDIUM: http: fix "http-request reject" when not final
- MINOR: ssl: Make sure the underlying xprt's init method doesn't fail.
- MINOR: ssl: Don't forget to call the close method of the underlying xprt.
- MINOR: htx: rename htx_append_blk_value() to htx_add_data_atonce()
- MINOR: htx: make htx_add_data() return the transmitted byte count
- MEDIUM: htx: make htx_add_data() never defragment the buffer
- MINOR: activity: write totals on the "show activity" output
- MINOR: activity: report totals and average separately
- MEDIUM: poller: separate the wait time from the wake events
- MINOR: activity: report the number of failed pool/buffer allocations
- MEDIUM: buffers: relax the buffer lock a little bit
- MINOR: task: turn the WQ lock to an RW_LOCK
- MEDIUM: task: don't grab the WR lock just to check the WQ
- BUG/MEDIUM: mux-h1: Don't skip the TCP splicing when there is no more data to read
- MEDIUM: sessions: Introduce session flags.
- BUG/MEDIUM: h2: Don't forget to set h2s->cs to NULL after having free'd cs.
- BUG/MEDIUM: mux-h2: fix the conditions to end the h2_send() loop
- BUG/MEDIUM: mux-h2: don't refrain from offering oneself a used buffer
- BUG/MEDIUM: connection: Use the session to get the origin address if needed.
- MEDIUM: tasks: Get rid of active_tasks_mask.
- MEDIUM: connection: Upstream SOCKS4 proxy support
- BUILD: contrib/prometheus: fix build breakage caused by move of idle_pct
- BUG/MINOR: deinit/threads: make hard-stop-after perform a clean exit
Willy Tarreau [Sun, 2 Jun 2019 09:11:29 +0000 (11:11 +0200)]
BUG/MINOR: deinit/threads: make hard-stop-after perform a clean exit
As reported in GH issue #99, when hard-stop-after triggers and threads
are in use, the chance that any thread releases the resources in use by
the other ones is non-null. Thus no thread should be allowed to deinit()
nor exit by itself.
Here we take a different approach. We simply use a 3rd possible value
for the "killed" variable so that all threads know they must break out
of the run-poll-loop and immediately stop.
This patch was tested by commenting the stream_shutdown() calls in
hard_stop() to increase the chances to see a stream use released
resources. With this fix applied, it never crashes anymore.
Willy Tarreau [Sun, 2 Jun 2019 08:38:48 +0000 (10:38 +0200)]
BUILD: contrib/prometheus: fix build breakage caused by move of idle_pct
The idle_pct thread-local variable was moved to struct thread_info by
commit 81036f2 ("MINOR: time: move the cpu, mono, and idle time to
thread_info") but not updated in service-prometheus.c, thus breaking
it.
Alexander Liu [Wed, 22 May 2019 11:44:48 +0000 (19:44 +0800)]
MEDIUM: connection: Upstream SOCKS4 proxy support
Have "socks4" and "check-via-socks4" server keyword added.
Implement handshake with SOCKS4 proxy server for tcp stream connection.
See issue #82.
I have the "SOCKS: A protocol for TCP proxy across firewalls" doc found
at "https://www.openssh.com/txt/socks4.protocol". Please reference to it.
[wt: for now connecting to the SOCKS4 proxy over unix sockets is not
supported, and mixing IPv4/IPv6 is discouraged; indeed, the control
layer is unique for a connection and will be used both for connecting
and for target address manipulation. As such it may for example report
incorrect destination addresses in logs if the proxy is reached over
IPv6]
Olivier Houchard [Wed, 29 May 2019 17:22:43 +0000 (19:22 +0200)]
MEDIUM: tasks: Get rid of active_tasks_mask.
Remove the active_tasks_mask variable, we can deduce if we've work to do
by other means, and it is costly to maintain. Instead, introduce a new
function, thread_has_tasks(), that returns non-zero if there's tasks
scheduled for the thread, zero otherwise.
Olivier Houchard [Wed, 29 May 2019 15:08:03 +0000 (17:08 +0200)]
BUG/MEDIUM: connection: Use the session to get the origin address if needed.
In conn_si_send_proxy(), if we don't have a conn_stream yet, because the mux
won't be created until the SSL handshake is done, retrieve the opposite's
connection from the session. At this point, we know the session associated
with the connection is the one that initiated it, and we can thus just use
the session's origin.
Willy Tarreau [Wed, 29 May 2019 15:50:48 +0000 (17:50 +0200)]
BUG/MEDIUM: mux-h2: don't refrain from offering oneself a used buffer
Usually when calling offer_buffer(), we don't expect to offer it to
ourselves. But with h2 we have the same buffer_wait for the two directions
so we can unblock the recv path when completing a send(), or we can unblock
part of the mux buffer after sending the first few buffers that we managed
to collect. Thus it is important to always accept to wake up any requester.
A few parts of this patch could possibly be backported but earlier versions
already have other issues related to low-buffer condition so it's not sure
it's worth taking the risk to make things worse.
Willy Tarreau [Wed, 29 May 2019 15:36:37 +0000 (17:36 +0200)]
BUG/MEDIUM: mux-h2: fix the conditions to end the h2_send() loop
The test for the mux alloc failure in h2_send() right after an attempt
at h2_process_mux() used to make sense as it tried to detect that this
latter failed to produce data. But now that we have a list of buffers,
it is a perfectly valid situation where there can still be data in the
buffer(s).
So now when we see this flag we only declare it's the last run on the
loop. In addition we need to make sure we break out of the loop on
snd_buf failure, or we'll loop indefinitely, for example when the buf
is full and we can't send.