BUG/MEDIUM: tcp-check: unbreak multiple connect rules again
The last connect rule used to be ignored and that was fixed by commit 248f1173f ("BUG/MEDIUM: tcp-check: single connect rule can't detect
DOWN servers") during 1.9 development. However this patch went a bit
too far by not breaking out of the loop after a pending connect(),
resulting in a series of failed connect() being quickly skipped and
only the last one being taken into account.
Technically speaking the series is not exactly skipped, it's just that
TCP checks suffer from a design issue which is that there is no
distinction between a new rule and this rule's completion in the
"connect" rule handling code. As such, when evaluating TCPCHK_ACT_CONNECT
a new connection is created regardless of any previous connection in
progress, and the previous result is ignored. It seems that this issue
is mostly specific to the connect action if we refer to the comments at
the top of the function, so it might be possible to durably address it
by reworking the connect state.
For now this patch does something simpler, it restores the behaviour
before the commit above consisting in breaking out of the loop when
the connection is in progress and after skipping comment rules. This
way we fall back to the default code waiting for completion.
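In outline, the restored behaviour looks like this (a simplified sketch, not
the literal tcpcheck_main() code; the connection setup is omitted):

    /* sketch: stop evaluating rules once a connect() is pending */
    list_for_each_entry(rule, &check->tcpcheck_rules, list) {
        if (rule->action == TCPCHK_ACT_CONNECT) {
            /* ... start the new connection ... */
            if (conn->flags & CO_FL_WAIT_L4_CONN) {
                /* skip any following comment rules, then break out
                 * and let the default completion path take over */
                break;
            }
        }
    }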
This patch must be backported as far as 1.8 since the commit above
was backported there. Thanks to Jérôme Magnin for reporting and
bisecting this issue.
BUG/MINOR: mux-pt: do not pretend there's more data after a read0
Commit 8706c8131 ("BUG/MEDIUM: mux_pt: Always set CS_FL_RCV_MORE.")
was a bit excessive in setting this flag, it refrained from removing
it after read0 unless it was on an empty call. The problem it causes
is that read0 is thus ignored on the first call :
If CO_FL_SOCK_RD_SH is reported by the transport layer, it indicates the
read0 was already seen, thus we must not try again and we must immediately
report it. The simple fix consists in removing the test on ret==0 :
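A minimal sketch of the idea, simplified from the real mux_pt receive path
(the rest of the function is omitted):

    /* after the transport-layer read in mux_pt's rcv_buf() */
    if (cs->conn->flags & CO_FL_SOCK_RD_SH)
        cs->flags &= ~CS_FL_RCV_MORE; /* read0 already seen: report it now;
                                       * this used to be guarded by ret==0 */
    else
        cs->flags |= CS_FL_RCV_MORE;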
BUG/MEDIUM: streams: Don't redispatch with L7 retries if redispatch isn't set.
Move the logic to decide if we redispatch to a new server from
sess_update_st_cer() to a new inline function, stream_choose_redispatch(), and
use it in do_l7_retry() instead of just setting the state to SI_ST_REQ.
That way, when using L7 retries, we won't redispatch the request to another
server unless "option redispatch" is used.
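Schematically, in do_l7_retry() (hedged; the argument list is simplified):

    /* instead of unconditionally doing "si->state = SI_ST_REQ;", let the
     * common helper decide the next state, so that a new server is only
     * picked when "option redispatch" permits it */
    stream_choose_redispatch(s);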
BUG/MEDIUM: streams: Don't give up if we couldn't send the request.
In htx_request_forward_body(), don't give up if we failed to send the request
and we have L7 retries activated. If we give up there, we will not retry when
we should.
Dave Pirotte [Wed, 10 Jul 2019 13:57:38 +0000 (13:57 +0000)]
BUG/MINOR: mux-h1: Correctly report Ti timer when HTX and keepalives are used
When HTTP keepalives are used in conjunction with HTX, the Ti timer
reports the elapsed time since the beginning of the connection instead
of the end of the previous request as stated in the documentation. Th,
Tq and Tt are also reported incorrectly as a result.
When creating a new h1s, check if it is the first request on the
connection. If not, set the session create times to the current
timestamp rather than the initial session accept timestamp. This makes
the logged timers behave as stated in the documentation.
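The gist of the fix, sketched (the flag and field names here are assumptions
about the tree at that time):

    /* in h1s_create(): this is a keep-alive follow-up request, not the
     * first one on the connection, so rebase the logging clocks */
    if (h1c->flags & H1C_F_WAIT_NEXT_REQ) {
        sess->accept_date = date; /* wall-clock time of this request */
        sess->tv_accept   = now;  /* monotonic base for Th/Ti/Tq/Tt */
    }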
BUG/MEDIUM: mux-h1: Don't release h1 connection if there is still data to send
When the h1 stream (h1s) is detached, if the connection is not really shut down
yet and if there is still some data to send, the h1 connection (h1c) must not be
released. Otherwise, the remaining data are lost. This bug was introduced by the
commit 3ac0f430 ("BUG/MEDIUM: mux-h1: Always release H1C if a shutdown for
writes was reported").
Here are the conditions to release an h1 connection when the h1 stream is
detached :
* An error or a shutdown write occurred on the connection
(CO_FL_ERROR|CO_FL_SOCK_WR_SH)
* An error, an h2 upgrade or a full shutdown occurred on the h1 connection
(H1C_F_CS_ERROR|H1C_F_UPG_H2C|H1C_F_CS_SHUTDOWN)
* A shutdown write is pending on the h1 connection and there is no more data
in the output buffer
((h1c->flags & H1C_F_CS_SHUTW_NOW) && !b_data(&h1c->obuf))
If one of these conditions is fulfilled, the h1 connection is
released. Otherwise, the release is delayed. If we are waiting to send remaining
data, a timeout is set.
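Put together, the release decision looks roughly like this (a sketch built
from the flags listed above; the timeout arming is schematic):

    if ((h1c->conn->flags & (CO_FL_ERROR|CO_FL_SOCK_WR_SH)) ||
        (h1c->flags & (H1C_F_CS_ERROR|H1C_F_UPG_H2C|H1C_F_CS_SHUTDOWN)) ||
        ((h1c->flags & H1C_F_CS_SHUTW_NOW) && !b_data(&h1c->obuf)))
        h1_release(h1c);
    else
        /* delay the release and arm a timeout while data remains */
        h1c->task->expire = tick_add(now_ms, h1c->shut_timeout);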
This patch must be backported to 2.0 and 1.9. It fixes the issue #164.
BUG/MAJOR: listener: fix thread safety in resume_listener()
resume_listener() can be called from a thread not part of the listener's
mask after a curr_conn has gone lower than a proxy's or the process' limit.
This results in fd_may_recv() being called unlocked if the listener is
bound to only one thread, and quickly locks up.
This patch solves this by creating a per-thread work_list dedicated to
listeners, and modifying resume_listener() so that it bounces the listener
to one of its owning thread's work_list and wakes it up. This thread will
then call resume_listener() again and will perform the operation on the
file descriptor itself. It is important to do it this way so that the
listener's state cannot be modified while the listener is being moved,
otherwise multiple threads can take conflicting decisions and the listener
could be put back into the global queue if the listener was used at the
same time.
It seems like a slightly simpler approach would be possible if the locked
list API would provide the ability to return a locked element. In this
case the listener would be immediately requeued in dequeue_all_listeners()
without having to go through resume_listener() with its associated lock.
This fix must be backported to all versions having the lock-less accept
loop, which is as far as 1.8 since deadlock fixes involving this feature
had to be backported there. It is expected that the code should not differ
too much there. However, previous commit "MINOR: task: introduce work lists"
will be needed as well and should not present difficulties either. For 1.8,
the commits introducing thread_mask() and LIST_ADDED() will be needed as
well, either backporting my_flsl() or switching to my_ffsl() will be OK,
and some changes will have to be performed so that the init function is
properly called (and maybe the deinit one can be dropped).
In order to test for the fix, simply set up a multi-threaded frontend with
multiple bind lines each attached to a single thread (reproduced with 16
threads here), set up a very low maxconn value on the frontend, and inject
heavy traffic on all listeners in parallel with slightly more connections
than the configured limit (typically +20%) so that it flips very
frequently. If the bug is still there, at some point (5-20 seconds) the
traffic will go much lower or even stop, either with spinning threads or
not.
MINOR: task: introduce work lists
Sometimes we need to delegate some list processing to a function running
on another thread. In this case the list element will simply be queued
into a dedicated self-locked list and the task responsible for this list
will be woken up, calling the associated function which will run over the
list.
This is what work_list does. Such lists will be dedicated to a limited
type of work but will significantly ease such remote handling. A function
is provided to create these per-thread lists, their tasks and to properly
bind each task to a distinct thread, so that the caller only has to store
the resulting pointer to the start of the structure.
These structures should not be abused though as each head will consume
4 pointers per thread, hence 32 bytes per thread or 2 kB for 64 threads.
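The resulting structure is intentionally tiny; roughly (assumed layout):

    struct work_list {
        struct list head;  /* self-locked list of queued elements */
        struct task *task; /* task bound to one thread, woken on queueing */
        void *arg;         /* context passed back to the handler */
    };

    /* one entry per thread, created by something like:
     *   wl = work_list_create(global.nbthread, handler, arg);
     */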
BUG/MEDIUM: servers: Fix a race condition with idle connections.
When we're purging idle connections, there's a race condition: while one
thread removes a connection from the idle list to add it to the list of
connections to free, the thread owning the connection may be trying to free
it at the same time.
To fix this, simply add a per-thread lock, that has to be held before
removing the connection from the idle list, and when, in conn_free(), we're
about to remove the connection from every list. That way, we know for sure
the connection will stay valid while we remove it from the idle list, to add
it to the list of connections to free.
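In outline (the lock label and list names below are placeholders):

    /* purging side: hold the owner's lock while moving the connection */
    HA_SPIN_LOCK(OTHER_LOCK, &idle_conns[thr].lock);
    LIST_DEL(&conn->list);                        /* off the idle list  */
    LIST_ADDQ(&toremove_conns[thr], &conn->list); /* onto the free list */
    HA_SPIN_UNLOCK(OTHER_LOCK, &idle_conns[thr].lock);

    /* conn_free() takes the same lock before unlinking the connection */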
This should happen rarely enough that it shouldn't have any impact on
performances.
This has not been reported yet, but could provoke random segfaults.
BUG/MEDIUM: checks: Don't attempt to read if we destroyed the connection.
In event_srv_chk_io(), only call __event_srv_chk_r() if we did not subscribe
for reading, and if wake_srv_chk() didn't return -1, as it would mean it
just destroyed the connection and the conn_stream, and attempting to use
those to recv data would lead to a crash.
BUG/MINOR: server: Be really able to keep "pool-max-conn" idle connections
The maximum number of idle connections for a server can be configured by setting
the server option "pool-max-conn". But when we try to add a connection in its
idle list, because of a wrong comparison, it may be rejected once there are
already "pool-max-conn - 1" idle connections.
BUG/MEDIUM: fd/threads: fix excessive CPU usage on multi-thread accept
While experimenting with potentially improved fairness and latency using
ticket locks on a Ryzen 16-thread/8-core, a very strange situation happened
a lot for some levels of traffic. Around 300k connections per second, no
more connections would be accepted on the multi-threaded listener but all
others would continue to work fine. All attempts to trace showed that the
threads were all in the trylock in the fd cache, or in the spinlock of
fd_update_events(), or in the one of fd_may_recv(). But as indicated this
was not a deadlock since the process continues to work fine.
After quite some investigation it appeared that the issue is caused by a
lack of fairness between the fdcache's trylock and these functions' spin
locks above. In fact, regardless of the success or failure of the fdcache's
attempt at grabbing the lock, the poller was calling fd_update_events()
which locks the FD once for something that can be done with a CAS, and
then calls fd_may_recv() with another lock for something that most often
didn't change. The high contention on these spinlocks leaves no chance to
any other thread to grab the lock using trylock(), and once this happens,
there is no thread left to process incoming connection events nor to stop
polling on the FD, leaving all threads at 100% CPU but partially operational.
This patch addresses the issue by using bit-test-and-set instead of the OR
in fd_may_recv() / fd_may_send() so that nothing is done if the FD was
already configured as expected. It does the same in fd_update_events()
using a CAS to check if the FD's events need to be changed at all or not.
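Conceptually (bit and field names are approximations of the real atomic
macros and fdtab fields):

    /* fd_may_recv(): leave the shared word alone if the bit is already set */
    if (!(fdtab[fd].state & FD_EV_READY_R))
        HA_ATOMIC_BTS(&fdtab[fd].state, FD_EV_READY_R_BIT);

    /* fd_update_events(): only swap the event mask when it changes */
    old_ev = fdtab[fd].ev;
    new_ev = (old_ev & FD_POLL_STICKY) | evts;
    if (new_ev != old_ev)
        HA_ATOMIC_CAS(&fdtab[fd].ev, &old_ev, new_ev);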
With this patch applied, it became impossible to reproduce the issue, and
now there's no way to saturate all 16 CPUs with the load used for testing,
as no more than 1350-1400% CPU was noticed at 300+ kcps, versus 1600% before.
Ideally this patch should go further and try to remove the remaining
incarnations of the fdlock as this seems possible, but it's difficult
enough to be done in a distinct patch that will not have to be backported.
It is possible that workloads involving a high connection rate may slightly
benefit from this patch and observe a slightly lower CPU usage even when
the service doesn't misbehave.
MINOR: pools: make the thread harmless during the mmap/munmap syscalls
These calls can take quite some time and leave the thread harmless so
it's better to mark it as such. This makes "show sess" respond way
faster during high loads running on processes built with DEBUG_UAF
since these calls are stressed a lot.
MINOR: pools: always pre-initialize allocated memory outside of the lock
When calling mmap(), in general the system gives us a page but does not
really allocate it until we first dereference it. And it turns out that
this time is much longer than the time to perform the mmap() syscall.
Unfortunately, when running with memory debugging enabled, we mmap/munmap()
each object resulting in lots of such calls and a high contention on the
allocator. And having the first accesses to the page done under the pool
lock is extremely damaging to other threads.
The simple fact of writing a 0 at the beginning of the page after
allocating it and placing the POOL_LINK pointer outside of the lock is
enough to boost the performance by 8x in debug mode and to save the
watchdog from triggering on lock contention. This is what this patch
does.
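The whole trick fits in two lines executed before taking the pool lock
(sketch; POOL_LINK is the pool's free-list link macro):

    /* allocate and immediately touch the page: the write faults it in
     * now, outside the lock, instead of later under contention */
    ptr = pool_alloc_area(pool->size + POOL_EXTRA);
    *POOL_LINK(pool, ptr) = NULL;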
MINOR: pools: release the pool's lock during the malloc/free calls
The malloc and free calls and especially the underlying mmap/munmap()
can occasionally take a huge amount of time and even cause the thread
to sleep. This is visible when haproxy is compiled with DEBUG_UAF which
causes every single pool allocation/free to allocate and release pages.
In this case, when using the locked pools, the watchdog can occasionally
fire under high contention (typically requesting 40000 1M objects in
parallel over 8 threads). Then, "perf top" shows that 50% of the CPU
time is spent in mmap() and munmap(). The reason the watchdog fires is
because some threads spin on the pool lock which is held by other threads
waiting on mmap() or munmap().
This patch modifies this so that the pool lock is released during these
syscalls. Not only does this allow other threads to try to allocate
their data in parallel, it also considerably reduces the lock
contention.
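Schematically, in the locked-pool allocation path (simplified):

    HA_SPIN_UNLOCK(POOL_LOCK, &pool->lock);
    ptr = pool_alloc_area(pool->size + POOL_EXTRA); /* may mmap() and block */
    HA_SPIN_LOCK(POOL_LOCK, &pool->lock);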
Note that the locked pools are only used on small architectures where
high thread counts would not make sense, so this will not provide any
benefit in the general case. However it makes the debugging versions
way more stable, which is always appreciated.
Lukas Tribus [Mon, 8 Jul 2019 12:29:15 +0000 (14:29 +0200)]
BUG/MINOR: ssl: revert empty handshake detection in OpenSSL <= 1.0.2
Commit 54832b97 ("BUILD: enable several LibreSSL hacks, including")
changed empty handshake detection in OpenSSL <= 1.0.2 and LibreSSL,
from accessing packet_length directly (not available in LibreSSL) to
calling SSL_state() instead.
However, SSL_state() appears to be fully broken in both OpenSSL and
LibreSSL.
Since there is no possibility in LibreSSL to detect an empty handshake,
let's not try (like BoringSSL) and restore this functionality for
OpenSSL 1.0.2 and older, by reverting to the previous behavior.
BUG/MEDIUM: servers: Don't forget to set srv_cs to NULL if we can't reuse it.
In connect_server(), if there was already a CS associated with the stream,
but we can't reuse it, because the target is different (because we tried a
previous connection, it failed, and we use redispatch so we switched servers),
don't forget to set srv_cs to NULL. Otherwise, if we end up reusing another
connection, we would consider we already have a conn_stream, and we won't
create a new one, so we'd have a new connection but we would not be able to
use it.
This can explain frozen streams and connections stuck in CLOSE_WAIT when
using redispatch.
BUG/MEDIUM: stream-int: Don't rely on CF_WRITE_PARTIAL to unblock opposite si
In the function stream_int_notify(), when the opposite stream-interface is
blocked because there is no more room in the input buffer, if the flag
CF_WRITE_PARTIAL is set on this buffer, it is unblocked. It is a way to unblock
the reads on the other side because some data was sent.
But it is a problem during the fast-forwarding because only the stream is able
to remove the flag CF_WRITE_PARTIAL. So it is possible to have this flag because
of a previous send while the input buffer of the opposite stream-interface is
now full. In this case, the opposite stream-interface will be woken up for
nothing because its input buffer is full. If the same happens on the opposite
side, we will have a loop consuming all the CPU.
To fix the bug, the opposite side is now only notified if there is some
available room in its input buffer in the function si_cs_send(), so only if
some data was sent.
MINOR: stream-int: Factorize processing done after sending data in si_cs_send()
In the function si_cs_send(), what is done when an error occurs on the
connection or the conn_stream, or when some data was successfully sent via a
pipe or the channel's buffer, may be factorized at the end of the function.
This slightly simplifies the function.
This patch must be backported to 2.0 and 1.9 because a bugfix depends on it.
BUG/MINOR: mux-h1: Don't process input or output if an error occurred
It is useless to proceed if an error already occurred. Instead, it is better
to wait for it to be caught by the stream or the connection, depending on
which is the first one to detect it.
BUG/MEDIUM: mux-h1: Handle TUNNEL state when outgoing messages are formatted
Since the commit 94b2c7 ("MEDIUM: mux-h1: refactor output processing"), the
formatting of outgoing messages is performed on the message state and no
longer on the HTX blocks read. But the TUNNEL state was left out. So, the HTTP
tunneling
using the CONNECT method or switching the protocol (for instance,
the WebSocket) does not work.
This issue was reported on Github. See #131. This patch must be backported to
2.0.
BUG/MEDIUM: lb_fas: Don't test the server's lb_tree from outside the lock
In the function fas_srv_reposition(), the server's lb_tree is tested from
outside the lock. So it is possible to remove it after the test and then call
eb32_insert() in fas_queue_srv() with a NULL root pointer, which is
invalid. Moving the test in the scope of the lock fixes the bug.
This issue was reported on Github, issue #126.
This patch must be backported to 2.0, 1.9 and 1.8.
BUG/MEDIUM: http/applet: Finish request processing when a service is registered
In the analyzers AN_REQ_HTTP_PROCESS_FE/BE, when a service is registered, it is
important not to interrupt the remaining processing, but just the http-request
rules processing. Otherwise, the part that handles the applets installation is
skipped.
Among the several effects, if the service is registered on a frontend (not a
listen), the forwarding of the request is skipped because not all the
analyzers are set on the request channel. If the service does not depend on
it, the response is still produced and forwarded to the client. But the
stream is infinitely blocked because the request is not fully consumed. This
issue was reported on
Github, see #151.
So this bug is fixed thanks to the new action return ACT_RET_DONE. Once a
service is registered, the action process_use_service() still returns
ACT_RET_STOP. But now, only rules processing is stopped. As a side effect, the
action http_action_reject() must now return ACT_RET_DONE to really stop all
processing.
This patch must be backported to 2.0. It depends on the commit introducing the
return code ACT_RET_DONE.
MINOR: action: Add the return code ACT_RET_DONE for actions
This code should now be used by actions to stop at the same time the rules
processing and the possible following processing. And from its side, the
return code ACT_RET_STOP should be used to only stop rules processing.
So concretely, for TCP rules, there are no changes. ACT_RET_STOP and ACT_RET_DONE
are handled the same way. However, for HTTP rules, ACT_RET_STOP should now be
mapped to HTTP_RULE_RES_STOP and ACT_RET_DONE to HTTP_RULE_RES_DONE. This
way, an action will have the possibility to stop all processing or only rules
processing.
Note that the changes for TCP are done in this commit, but the changes for
HTTP will be done in another one because it will fix a bug at the same time.
This patch must be backported to 2.0 because a bugfix depends on it.
BUG/MINOR: contrib/prometheus-exporter: Don't try to add empty data blocks
When the response buffer is full and nothing more can be inserted, it is
important to not try to insert an empty data block. Otherwise, when the function
channel_add_input() is called, the flag CF_READ_PARTIAL is set on the response
channel while nothing was read and the stream is uselessly woken up. Finally,
we have a loop while the response buffer is full.
BUG/MEDIUM: sessions: Don't keep an extra idle connection in sessions.
When deciding if we keep an idle connection in the session, check if
the number of connections currently in the session is >= the max allowed,
not >, or we'll keep an extra connection.
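The one-character fix, illustrated (names approximate):

    /* was: "if (session->idle_conns > max_allowed)", which kept one
     * connection too many */
    if (session->idle_conns >= max_allowed)
        return 0; /* don't keep an extra idle connection */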
BUG/MEDIUM: connections: Make sure we're unsubscribed before upgrading the mux.
Just calling conn_force_unsubscribe() from conn_upgrade_mux_fe() is not
enough, as there may be multiple XPRT involved. Instead, require that
any user of conn_upgrade_mux_fe() unsubscribe itself before calling it.
This should fix upgrading a TCP connection to HTX when using SSL.
BUG/MINOR: contrib/prometheus-exporter: Respect the reserve when data are sent
The previous commit e6cdfe574 ("BUG/MINOR: contrib/prometheus-exporter: Don't
use channel_htx_recv_max()") is buggy. The buffer's reserve must be respected.
BUG/MEDIUM: channel/htx: Use the total HTX size in channel_htx_recv_limit()
The receive limit of an HTX channel must be calculated against the total size of
the HTX message. Otherwise, the buffer may never be seen as full whereas the
receive limit is 0. Indeed, the function channel_htx_full() already takes care
to add a block size to the buffer's reserve (8 bytes). So if the function
channel_htx_recv_limit() also keeps a block size free in addition to the buffer's
reserve, it means that at least 2 block sizes will be kept free but only one will
be taken into account, freezing the stream if the option http-buffer-request is
enabled.
This patch fixes the Github issue #136. It should be backported to 2.0 and
1.9. Thanks to jaroslawr (Jarosław Rzeszótko) for his help.
BUG/MINOR: contrib/prometheus-exporter: Don't use channel_htx_recv_max()
The function htx_free_data_space() must be used instead. Otherwise, if there are
some output data not already forwarded, the maximum amount of data that may be
inserted into the buffer may be greater than what we can really insert.
BUG/MEDIUM: checks: Make sure the tasklet won't run if the connection is closed.
wake_srv_chk() can be called from conn_fd_handler(), and may decide to
destroy the conn_stream and the connection, by calling cs_close(). If that
happens, we have to make sure the tasklet isn't scheduled to run, or it will
probably crash trying to access the connection or the conn_stream.
This fixes a crash that can be seen when using tcp checks.
This should be backported to 1.9 and 2.0.
For 1.9, the call should instead be :
    task_remove_from_tasklet_list((struct task *)check->wait_list.task);
That function was renamed in 2.0.
BUG/MEDIUM: connections: Always call shutdown, with no linger.
Revert commit fe4abe62c7c5206dff1802f42d17014e198b9141.
The goal was to make sure for health-checks, we would not get sockets in
TIME_WAIT. To do so, we would not call shutdown() if linger_risk is set.
However that is wrong, and that means shutw would never be forwarded to
the server, and thus we could get connection that are never properly closed.
Instead, to fix the original problem as described here :
https://www.mail-archive.com/haproxy@formilux.org/msg34080.html
Just make sure the checks code call cs_shutr() before calling cs_shutw().
If shutr has been called, conn_sock_shutw() will make no attempt to call
shutdown(), as it knows close() will be called.
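So the checks teardown sequence becomes, schematically (shut modes assumed):

    /* shut the read side first: conn_sock_shutw() then knows close()
     * will follow and skips the shutdown() syscall entirely */
    cs_shutr(cs, CS_SHR_RESET);
    cs_shutw(cs, CS_SHW_NORMAL);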
We should really review and revamp the shutr/shutw code, as described in
github issue #142.
BUG/MINOR: mux-h1: Skip trailers for non-chunked outgoing messages
Unlike H1, H2 messages may contain trailers while the header "Content-Length"
is set. Indeed, because of the framed structure of HTTP/2, it is no longer
necessary to use the chunked transfer encoding. So trailing HEADERS frames,
after all DATA frames, may be added to messages with an explicit content
length.
But in H1, it is impossible to have trailers on non-chunked messages. So when
outgoing messages are formatted by the H1 multiplexer, if the message is not
chunked, all trailers must be dropped.
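In the H1 output loop this amounts to something like (simplified):

    case HTX_BLK_TLR:
        if (!(h1m->flags & H1_MF_CHNK))
            break; /* non-chunked message: drop the trailers */
        /* otherwise the trailers are formatted and emitted as usual */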
This patch must be backported to 2.0 and 1.9. However, the patch will have to
be adapted for 1.9.
BUG/MEDIUM: checks: unblock signals in external checks
As discussed in issue #140, processes are forked with signals blocked
resulting in haproxy's kill being ignored. This happens when the command
takes more time to complete than the configured check timeout or interval.
Just calling "sleep 30" every second makes the problem obvious.
The fix simply consists in unblocking the signals in the child after the
fork. It needs to be backported to all stable branches containing external
checks and where signals are blocked on startup. It's unclear when it
started, but the following config exhibits the issue :
    global
        external-check

    listen www
        bind :8001
        timeout client 5s
        timeout server 5s
        timeout connect 5s
        option external-check
        external-check command "$PWD/sleep10.sh"
        server local 127.0.0.1:80 check inter 200

    $ cat sleep10.sh
    #!/bin/sh
    exec /bin/sleep 10
The "sleep" processes keep accumulating for 10 seconds and stabilize
around 25 when the bug is present. Just issuing "killall sleep" has no
effect on them, and stopping haproxy leaves these processes behind.
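The fix boils down to clearing the signal mask in the child branch of the
fork (sketch; the command launch details are simplified):

    pid = fork();
    if (pid == 0) {
        sigset_t set;

        /* haproxy blocks signals at startup; undo that here so the
         * external check can be killed when the timeout strikes */
        sigemptyset(&set);
        sigprocmask(SIG_SETMASK, &set, NULL);
        execvp(args[0], args); /* run the external-check command */
        exit(-1);              /* exec failed */
    }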
BUG/MINOR: mworker/cli: don't output a \n before the response
When using a level lower than admin on the master CLI, a \n is output
before the response. This is caused by the responses to the "operator" or
"user" commands that are sent before the actual command.
To fix this problem we introduce the flag APPCTX_CLI_ST1_NOLF which asks
for a command response not to be followed by the final \n.
This patch makes a special case of the commands "operator" and "user"
followed by a "-" so that they are not followed by a \n.
BUG/MEDIUM: mux-h1: Always release H1C if a shutdown for writes was reported
We must take care of this when the stream is detached from the
connection. Otherwise, on the server side, the connection is inserted in the
list of idle connections of the session. But when reused, because the shutdown
for writes was already caught, nothing is sent to the server and the session
is blocked with a frozen connection.
This patch must be backported to 2.0 and 1.9. It is related to the issue #136
reported on Github.
Olivier Houchard [Fri, 28 Jun 2019 12:10:33 +0000 (14:10 +0200)]
BUG/MEDIUM: ssl: Don't attempt to set alpn if we're not using SSL.
Checks use ssl_sock_set_alpn() to set the ALPN if check-alpn is used, however
check-alpn failed to check if the connection was indeed using SSL, and thus,
would crash if check-alpn was used on a non-SSL connection. Fix this by
making sure the connection uses SSL before attempting to set the ALPN.
BUG/MINOR: mux-h1: Make format errors during output formatting fatal
These errors are unexpected at this stage and there is not much more to do than
to close the connection and leave. So now, when it happens, the flag
H1C_F_CS_ERROR is set on the H1 connection and the flag HTX_FL_PARSING_ERROR is
set on the channel's HTX message.
BUG/MEDIUM: mux-h1: Use buf_room_for_htx_data() to detect too large messages
During headers parsing, an error is returned if the message is too large and
does not fit in the input buffer. The mux h1 used the function b_full() to do
so. But to allow zero copy transfers, in h1_recv(), the input buffer is
pre-aligned and thus a few bytes always remain free.
To fix the bug, as during the trailers parsing, the function
buf_room_for_htx_data() should be used instead.
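So the overflow test becomes (sketch):

    /* b_full() never matches because h1_recv() pre-aligns the buffer and
     * leaves a few bytes free; use the HTX-aware room test instead */
    if (!buf_room_for_htx_data(&h1c->ibuf))
        goto error; /* headers too large for the input buffer */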
BUG/MEDIUM: proto_htx: Don't add EOM on 1xx informational messages
Since the commit b75b5eaf ("MEDIUM: htx: 1xx messages are now part of the final
responses"), these messages are part of the response and should not contain
EOM. This block is skipped during responses parsing, but analyzers still add it
for "100-Continue" and "103-Eraly-Hints". It can also be added for error files
with 1xx status code.
Now, when HAProxy generates such transitional responses, it does not emit EOM
blocks. And informational messages are now forbidden in error files.
BUG/MINOR: memory: Set objects size for pools in the per-thread cache
When a memory pool is created, it may be allocated from a static array. This
happens for "most common" pools, allocated first. Objects of these pools may
also be cached in a pool cache. Of course, to not cache too many entries, we
track the number of cached objects and the total size of the cache.
But the objects size of each pool in the cache (ie, pool_cache[tid][idx].size,
where tid is the thread-id and idx is the index of the pool) was never set. So
the total size of the cache was never limited. Now when a pool is created, if
these objects may be cached, we set the corresponding objects size in the pool
cache.
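The fix is essentially one assignment at pool creation time (sketch; array
names as described above):

    /* record the object size so the per-thread cache total can be bounded */
    for (thr = 0; thr < MAX_THREADS; thr++)
        pool_cache[thr][idx].size = size;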
BUG/MAJOR: mux-h1: Don't crush trash chunk area when outgoing message is formatted
When an outgoing HTX message is formatted before sending it, a trash chunk is
used to do the formatting. Its content is then copied into the output buffer of
the H1 connection. There are some tricks to avoid this last copy. First, if
possible we perform a zero-copy by swapping the area of the HTX buffer with the
one of the output buffer. If zero-copy is not possible, but if the output buffer
is empty, we don't use a trash chunk. To do so, we change the area of the trash
chunk to point to the one of the output buffer. But it is terribly wrong. Trash
chunks are global variables, allocated statically. If the area is changed, the
old one is lost. Worse, the area of the output buffer is dynamically allocated,
so it is released when emptied, leaving the trash chunk with a freed area (in
fact, it is a bit more complicated because buffers are allocated from a memory
pool).
So, honestly, I don't know why we never experienced any problem because of this
bug till now. To fix it, we still use a temporary buffer, but we assign it to a
trash chunk only when other solutions were excluded. This way, we never
overwrite the area of a trash chunk.
BUG/MINOR: htx: Save hdrs_bytes when the HTX start-line is replaced
The HTX start-line contains the number of bytes held by all headers as seen by
the mux during the parsing. So it must not be updated during analysis. It was
done when the start-line is replaced, so this update was removed at this
place. But we still save it from the old start-line to not lose it. It should
not be used outside the mux, but there is no reason to skip it. It is a bug,
however it should have no impact.
Olivier Houchard [Mon, 24 Jun 2019 16:57:39 +0000 (18:57 +0200)]
BUG/MEDIUM: ssl: Don't do anything in ssl_subscribe if we have no ctx.
In ssl_subscribe(), make sure we have a ssl_sock_ctx before doing anything.
When ssl_sock_close() is called, it wakes any subscriber up, and that
subscriber may decide to subscribe again, for some reason. If we no longer
have a context, there's not much we can do.
Olivier Houchard [Mon, 24 Jun 2019 16:19:40 +0000 (18:19 +0200)]
BUG/MEDIUM: connections: Always add the xprt handshake if needed.
In connect_server(), we used to only call xprt_add_hs() if CO_FL_SEND_PROXY
was set during the function call, we would not do it if the flag was set
before connect_server() was called. The rationale at the time was that if the flag
was already set, then the XPRT was already present. But now the xprt_handshake
always removes itself, so we have to re-add it each time, or it wouldn't be
done if the first connection attempt failed.
While I'm there, check any non-ssl handshake flag, instead of just
CO_FL_SEND_PROXY, or we'd miss the SOCKS4 flags.
Olivier Houchard [Mon, 24 Jun 2019 14:08:08 +0000 (16:08 +0200)]
BUG/MEDIUM: stream_interface: Don't add SI_FL_ERR if the state is < SI_ST_CON.
Only add SI_FL_ERR if the stream_interface is connected, or is attempting
a connection. We may get there because the stream_interface's tasklet
was woken up, but before it actually runs, process_stream() may be called,
detect that there was an error, and change the state of the stream_interface
to SI_ST_TAR. When the stream_interface's tasklet then runs, the connection
may still have CO_FL_ERROR, but that error was already accounted for, so
just ignore it.
BUG/MEDIUM: mworker: don't call the thread and fdtab deinit
Before switching to wait mode, the per-thread deinit should not be
called, because we didn't initialize the threads and the fdtab.
The problem is that the master could crash if we try to reload HAProxy.
The commit 944e619 ("MEDIUM: mworker: wait mode use standard init code
path") removed the deinit code by accident, but its fix 7c756a8
("BUG/MEDIUM: mworker: fix FD leak upon reload") was incomplete and did
not took care of the WAIT_MODE.
Tim Duesterhus [Sun, 23 Jun 2019 20:10:12 +0000 (22:10 +0200)]
BUG/MINOR: mworker-prog: Fix segmentation fault during cfgparse
Consider this configuration:
    frontend fe_http
        mode http
        bind *:8080
        default_backend be_http

    backend be_http
        mode http
        server example example.com:80

    program foo bar
Running with valgrind results in:
==16252== Invalid read of size 8
==16252== at 0x52AE3F: cfg_parse_program (mworker-prog.c:233)
==16252== by 0x4823B3: readcfgfile (cfgparse.c:2180)
==16252== by 0x47BCED: init (haproxy.c:1649)
==16252== by 0x404E22: main (haproxy.c:2714)
==16252== Address 0x48 is not stack'd, malloc'd or (recently) free'd
Check whether `ext_child` is valid before attempting to free it and its
contents.
Willy Tarreau [Sat, 22 Jun 2019 06:24:16 +0000 (08:24 +0200)]
BUILD: makefile: do not rely on shell substitutions to determine git version
Solaris's default shell doesn't support substitutions at the beginning or
end of variables, which are still used to determine the version based on
git. Since we added --abbrev=0 we don't need the last one. And using cut
it's trivial to replace the first one, actually simplifying the whole
expression.
Willy Tarreau [Sat, 22 Jun 2019 06:13:24 +0000 (08:13 +0200)]
BUILD: makefile: adjust the sed expression of "make help" for solaris
Solaris's sed doesn't take the 'p' argument on the 's' command so
nothing is printed. Just passing ';p' fixes this without affecting
other implementations. Additionally, optional characters cannot be
matched using a question mark, which is always searched verbatim, so
the leading '#' wasn't stripped. Using \{0,1\} works fine everywhere
so let's use this instead.
Willy Tarreau [Sat, 22 Jun 2019 05:51:02 +0000 (07:51 +0200)]
BUILD: makefile: use :space: instead of digits to count commits
The 'tr' command on Solaris doesn't conform to POSIX and requires
brackets around ranges. So the sequence '0-9' is understood as the
3 characters '0', '-', and '9'. This causes tagged versions (those
with no commit after the last commit) to be numbered as an empty
string, resulting in an error being reported while computing the
version number.
All implementations support '[:space:]' to delete leading spaces,
so let's use this instead.
Willy Tarreau [Sat, 22 Jun 2019 05:41:38 +0000 (07:41 +0200)]
BUILD: mworker: silence two printf format warnings around getpid()
getpid() is documented as returning a pid_t result, not
necessarily an int. This causes a build warning on Solaris 10
because '%d' or '%u' are used in the format passed to snprintf().
Let's just cast the result as an int (respectively unsigned int).
This can be backported to 2.0 and possibly older versions though
it really has no impact.
BUG/MAJOR: sample: Wrong stick-table name parsing in "if/unless" ACL condition.
This bug was introduced by commit 1b8e68e, which supposed the stick-table was
always stored in struct arg at parsing time. This is never the case with the
usage of "if/unless" conditions in stick-tables declared as backends. In this
case, it is the name of the proxy which must be considered as the stick-table
name.
BUG/MEDIUM: lb_fwlc: Don't test the server's lb_tree from outside the lock
In the function fwlc_srv_reposition(), the server's lb_tree is tested from
outside the lock. So it is possible to remove it after the test and then call
eb32_insert() in fwlc_queue_srv() with a NULL root pointer, which is
invalid. Moving the test in the scope of the lock fixes the bug.
This issue was reported on Github, issue #126.
This patch must be backported to 2.0, 1.9 and 1.8.
BUG/MEDIUM: mux-h2: Remove the padding length when a DATA frame size is checked
When a DATA frame is processed for a message with a content-length, we first
take care not to have a frame size that exceeds the amount left to
read. Otherwise, an error is triggered. But we must remove the padding length
from the frame size because the padding is not included in the announced
content-length.
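In other words the comparison must be made on the payload length, something
like (dfl being the frame length and dpl the pad length; names hedged):

    /* the announced content-length covers the payload, not the padding */
    if (h2c->dfl - h2c->dpl > h2s->body_len)
        goto error; /* more payload than the content-length allows */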
BUG/MEDIUM: mux-h2: Reset padlen when several frames are demux
In the function h2_process_demux(), if several frames are parsed, the padding
length must be reset between each frame. Otherwise we may wrongly think a frame
has a padding block because the previous one was padded.
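That is, at the top of each iteration of the demux loop (sketch):

    h2c->dpl = 0; /* forget the previous frame's padding length */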
BUG/MEDIUM: htx: Fully update HTX message when the block value is changed
Everywhere the value length of a block is changed, calling the function
htx_set_blk_value_len(), the HTX message must be updated. But at many places,
because of the recent changes in the HTX structure, this update was only
partially done. tail_addr and head_addr values were not systematically updated.
In fact, the function htx_set_blk_value_len() was designed as an internal
function to the HTX API. And we used it from outside for convenience. But it is
really painful and error-prone to let the caller update the HTX message. So
now, we use the function htx_change_blk_value_len() wherever possible. It
changes the value length of a block and updates the HTX message accordingly.
MINOR: htx: Add the function htx_change_blk_value_len()
As its name suggests, this function changes the value length of a block. But it
also updates the HTX message accordingly. It simplifies the HTX API. The function
htx_set_blk_value_len() is still available and must be used with caution because
this one does not update the HTX message. It just updates the HTX block. It
should be considered as an internal function. When possible,
htx_change_blk_value_len() should be used instead.
This function is used to fix a bug affecting 2.0. So, this patch must be
backported to 2.0.
Tim Duesterhus [Mon, 17 Jun 2019 14:10:07 +0000 (16:10 +0200)]
BUG/MEDIUM: compression: Set Vary: Accept-Encoding for compressed responses
Make HAProxy set the `Vary: Accept-Encoding` response header if it compressed
the server response.
Technically the `Vary` header SHOULD also be set for responses that would
normally be compressed based off the current configuration, but are not due
to a missing or invalid `Accept-Encoding` request header or due to the
maximum compression rate being exceeded.
Not setting the header in these cases does no real harm, though: An
uncompressed response might be returned by a Cache, even if a compressed
one could be retrieved from HAProxy. This increases the traffic to the end
user if the cache is unable to compress itself, but it saves another
roundtrip to HAProxy.
see the discussion on the mailing list: https://www.mail-archive.com/haproxy@formilux.org/msg34221.html
Message-ID: 20190617121708.GA2964@1wt.eu
A small issue remains: The User-Agent is not added to the `Vary` header,
despite being relevant to the response. Adding the User-Agent header would
make responses effectively uncacheable and it's unlikely to see a Mozilla/4
in the wild in 2019.
Add a reg-test to ensure the behaviour as described in this commit message.
see issue #121
Should be backported to all branches with compression (i.e. 1.6+).
BUG/MINOR: mux-h1: Add the header connection in lower case in outgoing messages
When necessary, this header is directly added in outgoing messages by the H1
multiplexer. Because there is no HTX conversion first, the header name is not
converserted to its lower case version. So, it must be added in lower case by
the multiplexer.
Baptiste Assmann [Thu, 13 Jun 2019 11:24:29 +0000 (13:24 +0200)]
MEDIUM: server: server-state global file stored in a tree
Server states can be recovered from either a "global" file (all backends)
or a "local" file (per backend).
The way the algorithm to parse the state file was first implemented was good
enough for a low number of backends and servers per backend.
Basically, for each backend the state file (global or local) is opened,
parsed entirely and for each line we check if it contains data related to
a server from the backend we're currently processing.
We must read the file entirely, just in case some lines for the current
backend are stored at the end of the file.
This does not scale at all!
This patch changes the behavior above for the "global" file only. Now,
the global file is read and parsed once and all lines it contains are
stored in a tree, for faster discovery.
This results in far fewer fopen(), fgets() and strcmp() calls, which makes
loading of very big state files very quick now.
BUG/MEDIUM: h2/htx: Update data length of the HTX when the cookie list is built
When an H2 request is converted into an HTX message, all cookie headers are
grouped into one, each value separated by a semicolon (;). To do so, we add the
header "cookie" with the first value and then we update the value by appending
other cookies. But during this operation, only the size of the HTX block is
updated. And not the data length of the whole HTX message.
It is an old bug and it seemed to work by chance till now. But it may lead to
undefined behaviour from time to time.
Willy Tarreau [Sun, 16 Jun 2019 18:00:26 +0000 (20:00 +0200)]
[RELEASE] Released version 2.0.0
Released version 2.0.0 with the following main changes :
- MINOR: fd: Don't use atomic operations when it's not needed.
- DOC: mworker-prog: documentation for the program section
- MINOR: http: add a new "http-request replace-uri" action
- BUG/MINOR: 51d/htx: The _51d_fetch method, and the methods it calls are now HTX aware.
- MINOR: 51d: Added dummy libraries for the 51Degrees module for testing.
- MINOR: mworker: change formatting in uptime field of "show proc"
- MINOR: mworker: add the HAProxy version in "show proc"
- MINOR: doc: Remove -Ds option in man page
- MINOR: doc: add master-worker in the man page
- MINOR: doc: mention HAPROXY_LOCALPEER in the man
- BUILD: Silence gcc warning about unused return value
- CLEANUP: 51d: move the 51d dummy lib to contrib/51d/src to match the real lib
- BUILD: travis-ci: add 51Degree device detection, update openssl to 1.1.1c
- MINOR: doc: update the manpage and usage message about -S
- BUILD/MINOR: 51d: Updated build registration output to indicate whether the library is a dummy one or not.
- BUG/MEDIUM: h1: Don't wait for handshake if we had an error.
- BUG/MEDIUM: h1: Wait for the connection if the handshake didn't complete.
- BUG/MINOR: task: prevent schedulable tasks from starving under high I/O activity
- BUG/MINOR: fl_trace/htx: Be sure to always forward trailers and EOM
- BUG/MINOR: channel/htx: Call channel_htx_full() from channel_full()
- BUG/MINOR: http: Use the global value to limit the number of parsed headers
- BUG/MINOR: htx: Detect when tail_addr meet end_addr to maximize free rooms
- BUG/MEDIUM: htx: Don't change position of the first block during HTX analysis
- CLEANUP: channel: Remove channel_htx_fwd_payload() and channel_htx_fwd_all()
- BUG/MEDIUM: proto_htx: Introduce the state ENDING during forwarding
- MINOR: htx: Add 3 flags on the start-line to deal with the request schemes
- MINOR: h2: Set flags about the request's scheme on the start-line
- MINOR: mux-h1: Set flags about the request's scheme on the start-line
- MINOR: mux-h2: Forward clients scheme to servers checking start-line flags
- MEDIUM: server: server-state only rely on server name
- CLEANUP: connection: rename the wait_event.task field to .tasklet
- CLEANUP: tasks: rename task_remove_from_tasklet_list() to tasklet_remove_*
- BUG/MEDIUM: connections: Don't call shutdown() if we want to disable linger.
- DOC: add some environment variables in section 2.3
- BUILD: makefile: clarify the "help" output and list options
- BUG/MINOR: mux-h1: Wake busy mux for I/O when message is fully sent
- BUG: tasks: fix bug introduced by latest scheduler cleanup
- BUG/MEDIUM: mux-h2: fix early close with option abortonclose
- BUG/MEDIUM: connections: Don't use ALPN to pick mux when in mode TCP.
- BUG/MEDIUM: connections: Don't try to send early data if we have no mux.
- BUG/MEDIUM: mux-h2: properly account for the appended data in HTX
- BUILD: makefile: further clarify the "help" output and list targets
- BUILD: makefile: rename "linux2628" to "linux-glibc" and remove older targets
- BUILD: travis-ci: switch to linux-glibc instead of linux2628
- DOC: update few references to the linux* targets and change them to linux-glibc
- BUILD: makefile: detect and reject recently removed linux targets
- BUILD: makefile: enable linux namespaces by default on linux
- BUILD: makefile: enable TFO on linux platforms
- BUILD: makefile: enable getaddrinfo on the linux-glibc target
- DOC: small updates to the CONTRIBUTING file
- BUG/MEDIUM: ssl: Make sure we initiate the handshake after using early data.
- CLEANUP: removed obsolete examples and moved a few to better places
- DOC: Fix typos in CONTRIBUTING
- DOC: update the outdated ROADMAP file
- DOC: create a BRANCHES file to explain the life cycle
- DOC: mention in INSTALL haproxy 2.0 is a long-term supported stable version
- BUILD: travis-ci: TFO and GETADDRINFO are now enabled by default
- BUILD: makefile: make the obsolete target detection compatible with make-3.80
- BUILD: tools: work around an internal compiler bug in gcc-3.4
- BUILD: pattern: work around an internal compiler bug in gcc-3.4
- BUILD: makefile: enable USE_RT on Solaris
- BUILD: makefile: do not use echo -n
- DOC: mention a few common build errors in the INSTALL file
Willy Tarreau [Sun, 16 Jun 2019 17:39:44 +0000 (19:39 +0200)]
DOC: mention a few common build errors in the INSTALL file
These are some errors met when trying to build with gcc 3.4 on an
old (13-year-old) Solaris 10 and on an even older Linux 2.4 with
glibc 2.2.5. A few options were enough to fix the build there.
Willy Tarreau [Sat, 15 Jun 2019 19:58:44 +0000 (21:58 +0200)]
DOC: update the outdated ROADMAP file
At least the load balancing part was done. Other points have been carried
over since 1.5 or so; they should go into the issue tracker with no version
indication.
Olivier Houchard [Sat, 15 Jun 2019 18:59:30 +0000 (20:59 +0200)]
BUG/MEDIUM: ssl: Make sure we initiate the handshake after using early data.
When we're done sending/receiving early data, and we add the handshake
flags on the connection, make sure we wake the associated tasklet up, so that
the handshake will be initiated.
Willy Tarreau [Sat, 15 Jun 2019 15:15:12 +0000 (17:15 +0200)]
DOC: small updates to the CONTRIBUTING file
There's an abstract explaining what is discussed in the file, a small
explanation of how the project works, which justifies the measures
taken here, and instructions about what to do when a patch is ignored,
or how to annoy everyone.
Willy Tarreau [Fri, 14 Jun 2019 16:33:56 +0000 (18:33 +0200)]
BUILD: makefile: enable getaddrinfo on the linux-glibc target
getaddrinfo() has been available since glibc 2.3.3 or so and is generally
enabled by distro packagers. The main reason for not enabling it on Linux
in the past is that it was known broken on some libc alternatives. It's
the right moment to enable it by default with glibc.
Willy Tarreau [Fri, 14 Jun 2019 14:57:42 +0000 (16:57 +0200)]
BUILD: makefile: enable TFO on linux platforms
TCP Fast Open is supported on all supported Linux kernels and on all
kernels shipped in supported distros, except the older 2.6.32 that
comes with RHEL6. However the option is harmless, will not prevent
from building and smoothly falls back even if forcefully enabled, so
it makes sense to enable it by default. It's still possible to pass
"USE_TFO=" to force it disabled if really desired.