Willy Tarreau [Wed, 7 Oct 2020 13:58:50 +0000 (15:58 +0200)]
MINOR: listeners: add a new stop_listener() function
This function will be used to definitely stop a listener (e.g. during a
soft_stop). This is actually tricky because it may be called for a proxy
or for a protocol, both of which require locks and already hold some. The
function takes booleans indicating which ones are already held, hoping
this will be enough. It's not well defined whether proto->disable() and
proto->rx_disable() are supposed to be called with any lock held, and
they are used from do_unbind_listener() with all these locks. Some back
annotations ought to be added on this point.
The proxy's listeners count is updated, and the proxy is marked as
disabled and woken up after the last one is gone. Note that a
listener in listen state is already not attached anymore since it
was disabled.
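The lock-ownership pattern described above can be sketched as follows. This is a minimal toy model, not HAProxy's code: `toy_lock`, `toy_proxy`, `toy_proto`, `toy_listener` and `toy_stop_listener()` are hypothetical names, and the "locks" are plain flags so the example runs single-threaded. The two booleans tell the function which locks the caller already holds, so it only takes, and later releases, the missing ones.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy lock: a flag, so that double-locking trips an assert. */
struct toy_lock { int held; };

static void toy_lock_take(struct toy_lock *lk) { assert(!lk->held); lk->held = 1; }
static void toy_lock_drop(struct toy_lock *lk) { assert(lk->held); lk->held = 0; }

enum toy_li_state { TOY_LI_INIT, TOY_LI_LISTEN, TOY_LI_READY };

struct toy_proto { struct toy_lock lock; };

struct toy_proxy {
    struct toy_lock lock;
    int li_all;        /* listeners not stopped yet */
    bool disabled;
    bool woken;        /* stands in for waking the proxy's task */
};

struct toy_listener {
    enum toy_li_state state;
    struct toy_proxy *px;
    struct toy_proto *proto;
};

/* lpx/lpr: true if the caller already holds the proxy/protocol lock. */
void toy_stop_listener(struct toy_listener *l, bool lpx, bool lpr)
{
    struct toy_proxy *px = l->px;

    if (!lpx)
        toy_lock_take(&px->lock);
    if (!lpr)
        toy_lock_take(&l->proto->lock);

    if (l->state != TOY_LI_INIT) {        /* stopping twice is a no-op */
        l->state = TOY_LI_INIT;
        px->li_all--;
        if (px->li_all == 0) {            /* last one gone: disable and wake */
            px->disabled = true;
            px->woken = true;
        }
    }

    /* only release the locks this function took itself */
    if (!lpr)
        toy_lock_drop(&l->proto->lock);
    if (!lpx)
        toy_lock_drop(&px->lock);
}
```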
Willy Tarreau [Fri, 9 Oct 2020 16:25:14 +0000 (18:25 +0200)]
MINOR: listeners: count unstoppable jobs on creation, not deletion
We have to count unstoppable jobs which correspond to worker sockpairs, in
order to know when to count. However the way it's currently done is quite
awkward because these are counted when stopping, making the stop mechanism
non-idempotent. This is definitely something we want to fix before stopping
by protocol or our listener count will quickly go wrong. Now they are
counted when the listeners are created.
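A small model of why counting at creation keeps the stop path idempotent. It assumes, as a labelled guess, that the purpose of the count is to let the process exit once only unstoppable jobs remain; `toy_jobs`, `toy_unstoppable_jobs` and the `toy_listener_*` functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical global counters standing in for the process's job counters. */
static int toy_jobs;
static int toy_unstoppable_jobs;

struct toy_listener {
    bool unstoppable;   /* e.g. a worker sockpair */
    bool stopped;
};

void toy_listener_create(struct toy_listener *l, bool unstoppable)
{
    l->unstoppable = unstoppable;
    l->stopped = false;
    toy_jobs++;
    if (unstoppable)
        toy_unstoppable_jobs++;   /* counted once, at creation */
}

/* Stopping only touches toy_jobs, so calling it twice is harmless. */
void toy_listener_stop(struct toy_listener *l)
{
    if (l->unstoppable || l->stopped)
        return;
    l->stopped = true;
    toy_jobs--;
}

/* Assumed exit condition: only unstoppable jobs are left. */
bool toy_can_quit(void)
{
    return toy_jobs - toy_unstoppable_jobs == 0;
}
```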
Willy Tarreau [Wed, 7 Oct 2020 13:36:16 +0000 (15:36 +0200)]
MINOR: listeners: split delete_listener() in two versions
We'll need an already locked variant of this function so let's make
__delete_listener() which will be called with the protocol lock held
and the listener's lock held.
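The locked/unlocked split can be sketched as below. The commit names the locked variant `__delete_listener()`; this toy model uses the hypothetical name `toy_delete_listener_locked()` instead (identifiers starting with `__` are reserved in standard C), and the flag-based locks, states and `nb_receivers` counter are illustrative assumptions.

```c
#include <assert.h>

/* Hypothetical toy lock: a flag checked with asserts. */
struct toy_lock { int held; };

static void toy_lock_take(struct toy_lock *lk) { assert(!lk->held); lk->held = 1; }
static void toy_lock_drop(struct toy_lock *lk) { assert(lk->held); lk->held = 0; }

enum toy_li_state { TOY_LI_INIT, TOY_LI_ASSIGNED };

struct toy_proto { struct toy_lock lock; int nb_receivers; };

struct toy_listener {
    struct toy_lock lock;
    enum toy_li_state state;
    struct toy_proto *proto;
};

/* Locked variant: the caller must hold the protocol and listener locks. */
void toy_delete_listener_locked(struct toy_listener *l)
{
    assert(l->proto->lock.held && l->lock.held);
    if (l->state == TOY_LI_ASSIGNED) {
        l->state = TOY_LI_INIT;
        l->proto->nb_receivers--;
    }
}

/* Unlocked wrapper: takes the locks in a fixed order, then delegates. */
void toy_delete_listener(struct toy_listener *l)
{
    toy_lock_take(&l->proto->lock);
    toy_lock_take(&l->lock);
    toy_delete_listener_locked(l);
    toy_lock_drop(&l->lock);
    toy_lock_drop(&l->proto->lock);
}
```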
MEDIUM: listeners: now use the listener's ->enable/disable
At each place we used to manipulate the FDs directly we can now call
the listener protocol's enable/disable/rx_enable/rx_disable depending
on whether the state changes on the listener or the receiver. One
exception currently remains in listener_accept() which is a bit special
and which should be split into 2 or 3 parts in the various protocol
layers.
The test of fd_updt in do_unbind_listener() that was added by commit a51885621 ("BUG/MEDIUM: listeners: Don't call fd_stop_recv() if fd_updt
is NULL.") could finally be removed since that part is correctly handled
in the low-level disable() function.
One disable() was added in resume_listener() before switching to LI_FULL
because rx_resume() enables polling on the FD for the receiver while
we want to disable it if the listener is full. There are different
ways to clean this up in the future. One of them could be to consider
that TCP receivers only act at the listener level. But in fact it does
not reflect reality. The reality is that only the receiver is paused
and that the listener's state ought not to be affected here. Ultimately
the resume_listener() function should be split so that the part
controlled by the protocols only acts on the receiver, and that the
receiver itself notifies the upper listener about the change so that
the listener protocol may decide to disable or enable polling. Conversely
the listener should automatically update its receiver when they share the
same state. Since there is no harm proceeding like this, let's keep this
for now.
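The workaround described above (calling disable() after the receiver was resumed, when the listener turns out to be full) can be modelled roughly like this. All names are hypothetical; `polling` stands for polling being enabled on the receiver's FD, and the "full" test on `nbconn`/`maxconn` is an assumption for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: listener state plus the receiver's polling flag. */
enum toy_li_state { TOY_LI_PAUSED, TOY_LI_READY, TOY_LI_FULL };

struct toy_listener {
    enum toy_li_state state;
    bool polling;           /* receiver-level polling on the FD */
    int nbconn, maxconn;
};

static void toy_rx_resume(struct toy_listener *l) { l->polling = true; }
static void toy_disable(struct toy_listener *l)   { l->polling = false; }

void toy_resume_listener(struct toy_listener *l)
{
    if (l->state != TOY_LI_PAUSED)
        return;
    toy_rx_resume(l);                   /* enables polling as a side effect */
    if (l->maxconn && l->nbconn >= l->maxconn) {
        toy_disable(l);                 /* full: undo what rx_resume() enabled */
        l->state = TOY_LI_FULL;
        return;
    }
    l->state = TOY_LI_READY;
}
```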
MINOR: protocol: add a new pair of enable/disable methods for listeners
These methods will be used to enable/disable accepting new connections
so that listeners do not play with FDs directly anymore. Since all the
currently supported protocols work on sockets for now, these are identical
to the rx_enable/rx_disable functions. However they were not defined in
sock.c since it's likely that some will quickly start to differ. At the
moment they're not used.
We have to take care of fd_updt before calling fd_{want,stop}_recv()
because it's allocated fairly late in the boot process and some such
functions may be called very early (e.g. to stop a disabled frontend's
listeners).
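A sketch of how the two pairs of methods could sit side by side in a protocol's operations table, with the listener-level pair simply forwarding to the receiver-level one for a socket-based protocol. The method names mirror the commit, but the struct layout, the signatures and every `toy_*` name are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical receiver: only tracks whether polling is enabled on its FD. */
struct toy_receiver { int fd; bool polling; };
struct toy_listener { struct toy_receiver rx; };

/* Hypothetical per-protocol operations (signatures are assumptions). */
struct toy_proto {
    void (*rx_enable)(struct toy_receiver *rx);
    void (*rx_disable)(struct toy_receiver *rx);
    void (*enable)(struct toy_listener *l);
    void (*disable)(struct toy_listener *l);
};

static void toy_sock_rx_enable(struct toy_receiver *rx)  { rx->polling = true; }
static void toy_sock_rx_disable(struct toy_receiver *rx) { rx->polling = false; }

/* For a socket-based protocol the listener methods just forward to the
 * receiver ones; another protocol could make them differ later. */
static void toy_tcp_enable(struct toy_listener *l)  { toy_sock_rx_enable(&l->rx); }
static void toy_tcp_disable(struct toy_listener *l) { toy_sock_rx_disable(&l->rx); }

const struct toy_proto toy_proto_tcp = {
    .rx_enable  = toy_sock_rx_enable,
    .rx_disable = toy_sock_rx_disable,
    .enable     = toy_tcp_enable,
    .disable    = toy_tcp_disable,
};
```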
MINOR: protocol: add a new pair of rx_enable/rx_disable methods
These methods will be used to enable/disable rx at the receiver level so
that callers don't play with FDs directly anymore. All our protocols use
the generic ones from sock.c at the moment. For now they're not used.
MINOR: sock: provide a set of generic enable/disable functions
These will be used on receivers, to enable or disable receiving on a
listener, which most of the time just consists in enabling/disabling
the file descriptor.
We have to take care of the existence of fd_updt to know whether or not
we may call fd_{want,stop}_recv() since it's not permitted in very
early boot.
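The early-boot guard can be illustrated with the sketch below. `toy_fd_updt` is a hypothetical stand-in for the FD update list, which stays NULL until it is allocated late during startup; unlike the real functions, these return an int purely so the behaviour is observable (an assumption for the example).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for fd_updt: NULL until allocated late in boot. */
static unsigned int *toy_fd_updt;

struct toy_fd { bool want_recv; };

static void toy_fd_want_recv(struct toy_fd *fd) { fd->want_recv = true; }
static void toy_fd_stop_recv(struct toy_fd *fd) { fd->want_recv = false; }

/* Returns 1 if polling was updated, 0 if it was too early to touch the FD. */
int toy_sock_enable(struct toy_fd *fd)
{
    if (toy_fd_updt == NULL)
        return 0;
    toy_fd_want_recv(fd);
    return 1;
}

int toy_sock_disable(struct toy_fd *fd)
{
    if (toy_fd_updt == NULL)
        return 0;
    toy_fd_stop_recv(fd);
    return 1;
}
```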
MINOR: listener: use the protocol's ->rx_resume() method when available
Instead of calling listen() for IPPROTO_TCP in resume_listener(), let's
call the protocol's ->rx_resume() method when defined, which does the same.
This removes another hard-dependency on the fd and underlying protocol
from the generic functions.
MINOR: protocol: implement an ->rx_resume() method
This one undoes ->rx_suspend(): it tries to restore an operational socket.
It was only implemented for TCP since it's the only one we support right
now.
MINOR: protocol: replace ->pause(listener) with ->rx_suspend(receiver)
The ->pause method is inappropriate since it doesn't exactly "pause" a
listener but rather temporarily disables it so that it's not visible at
all to let another process take its place. The term "suspend" is more
suitable, since the "pause" is actually what we'll need to apply to the
FULL and LIMITED states which really need to make a pause in the accept
process. And it goes well with the use of the "resume" function that
will also need to be made per-protocol.
Let's rename the function and make it act on the receiver since it's
already what it essentially does, hence the "rx_" prefix to make it
more explicit.
The protocol struct was a bit reordered because it was becoming a real
mess between the parts related to the listeners and those for the
receivers.
MINOR: protocol: rename the ->listeners field to ->receivers
Since the listeners were split into receiver+listener, this field ought
to have been renamed because it's confusing. It really links receivers
and not listeners, as most of the time it's used via rx.proto_list!
The nb_listeners field was updated accordingly.
MINOR: protocol: directly call enable_listener() from protocol_enable_all()
protocol_enable_all() calls proto->enable_all() for all protocols,
which is always equal to enable_all_listeners() which in turn simply is
a generic loop calling enable_listener() always returning ERR_NONE. Let's
clean up this madness by first calling enable_listener() directly from
protocol_enable_all().
CLEANUP: listeners: remove unused disable_listener and disable_all_listeners
These ones have never been called, they were referenced by the protocol's
disable_all for some protocols but there are no traces of their use, so
not only is it uncertain that the code works, it has never been tested.
Let's remove a bit of complexity starting from there.
MINOR: listeners: move fd_stop_recv() to the receiver's socket code
fd_stop_recv() has no business in the generic listener code: it's per
protocol as some don't need it. For instance with abns@ it could even
lead to fd_stop_recv(-1). And later with QUIC we don't want to touch
the fd at all! Since commit f2cb169487, which delegated fd manipulation
to their respective threads, it used to be impossible to call it down
there, but this is not the case anymore, so let's perform the action in
the protocol-specific code.
By using the same "ret" variable in the "if" block to test the return
value of pause(), the inner one shadows the outer one, so forcing the
result to zero in case of an error has no effect. The
problem is that some listeners used to fail to pause in multi-process
mode and this was not reported, but their failure was automatically
resolved by the last process to pause. By properly checking for errors
we might now possibly report a race once in a while so we may have to
roll this back later if some users meet it.
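The shadowing problem described above is easy to reproduce in isolation. In this sketch, `toy_pause_fn` and both functions are hypothetical; the first one has the buggy shape (an inner `ret` declared inside the `if` block), the second keeps a single variable so the error actually reaches the caller. Variables are declared outside the loops, matching the project's older-compiler convention.

```c
#include <assert.h>

/* Hypothetical pause callback: returns <0 on error, >0 on success. */
typedef int (*toy_pause_fn)(int id);

/* Buggy shape: the inner "ret" shadows the outer one, so forcing it to
 * zero on error is lost when the block ends. */
int toy_pause_all_buggy(toy_pause_fn fn, int n)
{
    int ret = 1;
    int i;

    for (i = 0; i < n; i++) {
        if (fn) {
            int ret = fn(i);    /* shadows the outer "ret" */
            if (ret < 0)
                ret = 0;        /* only updates the inner copy */
        }
    }
    return ret;
}

/* Fixed shape: a single variable carries the error to the caller. */
int toy_pause_all_fixed(toy_pause_fn fn, int n)
{
    int ret = 1;
    int i;

    for (i = 0; i < n; i++) {
        if (fn && fn(i) < 0)
            ret = 0;
    }
    return ret;
}

/* Example callback: the second listener fails to pause. */
static int toy_pause_fails_on_1(int id) { return id == 1 ? -1 : 1; }
```

With `toy_pause_fails_on_1`, the buggy version still reports success while the fixed one reports the failure; building with `-Wshadow` flags the inner declaration.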
The test on ==0 is wrong too since technically speaking a total stop
validates the need for a pause, but stops the listener so it's just
the resume that won't work anymore. We could switch to stopped but
it's an involuntary switch and the user will not know. It is better to
mark it as paused and let the resume continue to fail so that only
the resume will eventually report an error (e.g. abns@).
This must not be backported as there is a risk of side effect by fixing
this bug, given that it hides other bugs itself.
Willy Tarreau [Thu, 8 Oct 2020 14:51:09 +0000 (16:51 +0200)]
MEDIUM: proto_tcp: make the pause() more robust in multi-process
In multi-process, the TCP pause is very brittle and we never noticed
it because the error was lost in the upper layers. The problem is that
shutdown() may fail if another process already did it, and will cause
a process to fail to pause.
What we do here in case of error is that we double-check the socket's
state to verify if it's still accepting connections, and if not, we
can conclude that another process already did the job in parallel.
The difficulty here is that we're trying to eliminate false positives
where some OSes will silently report a success on shutdown() while they
don't shut the socket down, hence this dance of shutw/listen/shutr that
only keeps the compatible ones. A new approach relying on
connect(AF_UNSPEC) would probably provide better results.
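The error-recovery logic can be modelled as below: when shutdown() fails, the socket's state is checked again, and if it no longer accepts connections, another process is assumed to have paused it. This toy model deliberately skips the shutw/listen/shutr dance; `toy_sock`, `toy_shutdown()` and `toy_tcp_pause()` are hypothetical, and the `accepting` flag stands for a real state query (on Linux, for example, getsockopt() with SO_ACCEPTCONN reports whether a socket is listening).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a listening socket shared between processes. */
struct toy_sock {
    bool accepting;        /* what a state query on the socket would report */
    bool shutdown_fails;   /* e.g. another process already shut it down */
};

/* Stand-in for shutdown(): returns 0 on success, -1 on failure. */
static int toy_shutdown(struct toy_sock *s)
{
    if (s->shutdown_fails)
        return -1;
    s->accepting = false;
    return 0;
}

/* Returns 1 if the socket is now paused, -1 on a genuine failure. */
int toy_tcp_pause(struct toy_sock *s)
{
    if (toy_shutdown(s) == 0)
        return 1;

    /* shutdown() failed: double-check the socket's state. If it no longer
     * accepts connections, another process did the job in parallel. */
    if (!s->accepting)
        return 1;

    return -1;
}
```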
MAJOR: signals: use protocol_pause_all() and protocol_resume_all()
When temporarily pausing the listeners with SIGTTOU, we now pause
all listeners via the protocols instead of the proxies. This has the
benefits that listeners are paused regardless of whether or not they
belong to a visible proxy. And for resuming via SIGTTIN we do the
same, which makes it possible to report binding conflicts and address them,
since the operation can be repeated on a per-listener basis instead
of a per-proxy basis.
While in appearance all cases were properly handled, it's impossible
to completely rule out the possibility that something broken used to
work by luck due to the scan ordering which is naturally different,
hence the major tag.
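A generic POSIX sketch of the signal flow described above: the handlers only record the request, and the main loop then performs the pause or resume outside signal context. This is not HAProxy's own signal-registration mechanism; the `toy_*` names are hypothetical and the counters merely stand in for calls to protocol_pause_all() and protocol_resume_all().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Flags set asynchronously, consumed later from the main loop. */
static volatile sig_atomic_t toy_pause_requested;
static volatile sig_atomic_t toy_resume_requested;

static void toy_sig_handler(int sig)
{
    /* Only async-signal-safe work here: record the request. */
    if (sig == SIGTTOU)
        toy_pause_requested = 1;
    else if (sig == SIGTTIN)
        toy_resume_requested = 1;
}

int toy_install_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = toy_sig_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTTOU, &sa, NULL) < 0 || sigaction(SIGTTIN, &sa, NULL) < 0)
        return -1;
    return 0;
}

/* Hypothetical counters standing in for the pause/resume-all calls. */
static int toy_paused_calls, toy_resumed_calls;

/* Called from the main loop: act on pending requests outside signal context. */
void toy_process_signals(void)
{
    if (toy_pause_requested) {
        toy_pause_requested = 0;
        toy_paused_calls++;      /* e.g. pause all listeners of all protocols */
    }
    if (toy_resume_requested) {
        toy_resume_requested = 0;
        toy_resumed_calls++;     /* e.g. resume them; failures stay retryable */
    }
}
```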
These two functions are used to pause and resume all listeners of
all protocols. They use the standard listener functions for this
so they're supposed to handle the situation gracefully regardless
of the upper proxies' states, and they will report completion on
proxies once the switch is performed.
It might be nice to define a particular "failed" state for listeners
that cannot resume and to count them on proxies in order to mention
that they're definitely stuck. On the other hand, the current
situation is retryable which is quite appreciable as well.
MEDIUM: listener/proxy: make the listeners notify about proxy pause/resume
Till now, we used to call pause_proxy()/resume_proxy() to enable/disable
processing on a proxy, which is used during soft reloads. But since we want
to drive this process from the listeners themselves, we have to instead
proceed the other way around so that when we enable/disable a listener,
it checks if it changed anything for the proxy and notifies about updates
at this level.
The detection is made using li_ready=0 for pause(), and li_paused=0
for resume(). Note that we must not include any test for li_bound because
this state is seen by processes which share the listener with another one
and which must not act on it since the other process will do it. As such
the socket behind the FD will automatically be paused and resumed without
its local state changing, but this is the limit of a multi-process system
with shared listeners.
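The detection rule above (the proxy is paused when `li_ready` drops to 0 and resumed when `li_paused` drops to 0) can be sketched as follows. The counters follow the names in the text, while the structures and the `toy_*` functions are hypothetical simplifications; `li_bound` is intentionally not consulted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-proxy counters of listeners in each state. */
struct toy_proxy {
    int li_ready;
    int li_paused;
    bool paused;   /* proxy-level view, derived from its listeners */
};

enum toy_li_state { TOY_LI_PAUSED, TOY_LI_READY };

struct toy_listener { enum toy_li_state state; struct toy_proxy *px; };

void toy_pause_listener(struct toy_listener *l)
{
    if (l->state != TOY_LI_READY)
        return;
    l->state = TOY_LI_PAUSED;
    l->px->li_ready--;
    l->px->li_paused++;
    if (l->px->li_ready == 0)     /* last ready one: notify the proxy */
        l->px->paused = true;
}

void toy_resume_listener(struct toy_listener *l)
{
    if (l->state != TOY_LI_PAUSED)
        return;
    l->state = TOY_LI_READY;
    l->px->li_paused--;
    l->px->li_ready++;
    if (l->px->li_paused == 0)    /* last paused one: notify the proxy */
        l->px->paused = false;
}
```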
MINOR: listeners: check the current listener state earlier in resume_listener()
It's quite confusing to have the test on LI_READY very low in the function
as it should be made much earlier. Just like with previous commit, let's
do it when entering. The additional states (limited, full), however, continue
to go through the whole function.
MINOR: listeners: check the current listener state in pause_listener()
It's better not to try to perform pause() actions on wrong states, so
let's check this and make sure that all callers are now safe. This
means that we must not try to pause a listener which is already paused
(e.g. it could possibly fail if the pause operation isn't idempotent at
the socket level), nor should we try it on earlier states.
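The state filtering can be expressed as a switch that only attempts the socket-level pause from active states and treats an already paused listener as done. The state list and the `toy_pause_listener()` helper are hypothetical, loosely modelled on the states named in this log; `socket_pauses` counts how often the protocol-level pause would have been invoked.

```c
#include <assert.h>

/* Hypothetical listener states, loosely modelled on the ones named above. */
enum toy_li_state {
    TOY_LI_INIT, TOY_LI_ASSIGNED, TOY_LI_LISTEN,
    TOY_LI_PAUSED, TOY_LI_READY, TOY_LI_FULL, TOY_LI_LIMITED,
};

/* Returns 1 if the listener is (now) paused, 0 if its state doesn't allow it. */
int toy_pause_listener(enum toy_li_state *st, int *socket_pauses)
{
    switch (*st) {
    case TOY_LI_PAUSED:
        return 1;                  /* already paused: don't pause again */
    case TOY_LI_READY:
    case TOY_LI_FULL:
    case TOY_LI_LIMITED:
        (*socket_pauses)++;        /* stands in for the protocol's pause */
        *st = TOY_LI_PAUSED;
        return 1;
    default:
        return 0;                  /* earlier states: nothing to pause */
    }
}
```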
MEDIUM: proxy: merge zombify_proxy() with stop_proxy()
The two functions don't need to be distinguished anymore since they have
all the necessary info to act as needed on their listeners. Let's just
pass via stop_proxy() and make it check for each listener which one to
close or not.
Its sole remaining purpose was to display "proxy foo started", which
has little benefit and pollutes output for those with plenty of proxies.
Let's remove it now.
The VTCs were updated to reflect this, because many of them had explicit
counts of dropped lines to match this message.
This is tagged as MEDIUM because some users may be surprised by the
loss of this quite old message.
MEDIUM: proxy: replace proxy->state with proxy->disabled
The remaining proxy states were only used to distinguish an enabled
proxy from a disabled one. Due to the initialization order, both
PR_STNEW and PR_STREADY were equivalent after startup, and they
would only differ from PR_STSTOPPED when the proxy is disabled or
shutdown (which is effectively another way to disable it).
Now we just have a "disabled" field which allows to distinguish them.
It's becoming obvious that start_proxies() is only used to print a
greeting message now, which we'd rather get rid of. zombify_proxy()
and stop_proxy() should probably be merged once their
differences move to the right place.
CLEANUP: peers: don't use the PR_ST* states to mark enabled/disabled
The enabled/disabled config options were stored into a "state" field
that is an integer but contained only PR_STNEW or PR_STSTOPPED, which
is a bit confusing, and causes a dependency with proxies. This was
renamed to "disabled" and is used as a boolean. The field was also
moved to the end of the struct to stop creating a hole and fill another
one.
MINOR: startup: don't rely on PR_STNEW to check for listeners
Instead of looking at listeners in proxies in PR_STNEW state, we'd
rather check for listeners in those not in PR_STSTOPPED as it's only
this state which indicates the proxy was disabled. And let's check
the listeners count instead of testing the list's head.
This state was used to mention that a proxy was in PAUSED state, as opposed
to the READY state. This was causing some trouble because if a listener
failed to resume (e.g. because its port was temporarily in use during the
resume), it was not possible to retry the operation later. Now by checking
the number of READY or PAUSED listeners instead, we can accurately know if
something went bad and try to fix it again later. The case of the temporary
port conflict during resume now works well:
This state is only set when a pause() fails but isn't even set when a
resume() fails. And we cannot recover from this state. Instead, let's
just count remaining ready listeners to decide to emit an error or not.
It's more accurate and will better support new attempts if needed.
Since v1.4 or so, it's almost not possible anymore to set this state. The
only exception is by using the CLI to change a frontend's maxconn setting
below its current usage. This case makes no sense, and for other cases it
doesn't make sense either because "full" is a vague concept when only
certain listeners are full and not all. Let's just remove this unused
state and make it clear that it's not reported. The "ready" or "open"
states will continue to be reported without being misleading as they
will be opposed to "stop".
MINOR: proxy: maintain per-state counters of listeners
The proxy state tries to be synthetic but that doesn't work well with
many listeners, especially for transition phases or after a failed
pause/resume.
In order to address this, we'll instead rely on counters of listeners in
a given state for the 3 major states (ready, paused, listen) and a total
counter. We'll now be able to determine a proxy's state by comparing these
counters only.
This function is used as a wrapper to set a listener's state everywhere.
We'll use it later to maintain some counters in a consistent state when
switching state so it's essential that all state changes go through it.
No functional change was made beyond calling the wrapper.
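The counters and the wrapper described in the two paragraphs above can be combined into one small sketch: every transition goes through a single setter that moves one unit from the old state's counter to the new one, and the proxy's state is then derived from the counters alone. All `toy_*` names, including `li_listen`, are hypothetical, and the three-way READY/PAUSED/STOPPED derivation is an illustrative assumption.

```c
#include <assert.h>
#include <stddef.h>

enum toy_li_state { TOY_LI_INIT, TOY_LI_LISTEN, TOY_LI_PAUSED, TOY_LI_READY };
enum toy_px_state { TOY_PX_STOPPED, TOY_PX_PAUSED, TOY_PX_READY };

/* Hypothetical per-proxy counters: a total plus the three major states. */
struct toy_proxy { int li_all, li_listen, li_paused, li_ready; };
struct toy_listener { enum toy_li_state state; struct toy_proxy *px; };

static int *toy_counter_for(struct toy_proxy *px, enum toy_li_state st)
{
    switch (st) {
    case TOY_LI_LISTEN: return &px->li_listen;
    case TOY_LI_PAUSED: return &px->li_paused;
    case TOY_LI_READY:  return &px->li_ready;
    default:            return NULL;   /* state not tracked */
    }
}

/* Every state change goes through here so the counters never drift. */
void toy_listener_set_state(struct toy_listener *l, enum toy_li_state st)
{
    int *from = toy_counter_for(l->px, l->state);
    int *to = toy_counter_for(l->px, st);

    if (from)
        (*from)--;
    if (to)
        (*to)++;
    l->state = st;
}

/* Proxy state derived from the counters only. */
enum toy_px_state toy_proxy_state(const struct toy_proxy *px)
{
    if (px->li_ready)
        return TOY_PX_READY;
    if (px->li_paused)
        return TOY_PX_PAUSED;
    return TOY_PX_STOPPED;
}
```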
CLEANUP: proxy: remove the first_to_listen hack in zombify_proxy()
This thing was needed for an optimization used in soft_stop() which
doesn't exist anymore, so let's remove it as it's cryptic and hinders
the listeners cleanup.
MINOR: listeners: do not uselessly try to close zombie listeners in soft_stop()
The loop doesn't match anymore since the non-started listeners are in
LI_INIT and even if it had ever worked the benefit of closing zombies
at this point is dubious at best.
MEDIUM: listeners: remove the now unused ZOMBIE state
The zombie state is not used anymore by the listeners, because in the
last two cases where it was tested it couldn't match as it was covered
by the test on the process mask. Instead, the FD is now either in the
LISTEN state or the INIT state. This also avoids forcing the listener
to be single-dimensional because actually belonging to another process
isn't totally exclusive with the other states, which explains some of
the difficulties requiring to check the proc_mask and the fd sometimes.
So let's get rid of it now so that we're not tempted to reuse it.
MEDIUM: deinit: close all receivers/listeners before scanning proxies
Because of the zombie state, proxies have a skewed vision of the state
of listeners, which explains why there are hacks switching the state
from ZOMBIE to INIT in the proxy cleaning loop. This is particularly
complicated and not needed, as all the information is now available
in the protocol list and the fdtab.
What we do here instead is to first close all active listeners or
receivers by protocol and clean their protocol parts. Then we scan the
fdtab to get rid of remaining ones that were necessarily in INIT state
after a previous invocation of delete_listener(). From this point, we
know the listeners are cleaned, they can safely be freed by scanning the
proxies.
MEDIUM: listeners: make unbind_listener() converge if needed
The ZOMBIE state on listener is a real mess. Listeners passing through
this state have lost their consistency with the proxy AND with the fdtab.
Plus this state is not used for all foreign listeners, only for those
belonging to a proxy that entirely runs on another process, otherwise it
stays in INIT state, which makes the usefulness extremely questionable.
But the real issue is that it's impossible to untangle the receivers
from the proxy state as long as we have this because of deinit()...
So what we do here is to start by making unbind_listener() support being
called more than once. This will make it possible to call it again to really close
the FD and finish the operations if it's called with an FD that's in a
fake state (such as INIT but with a valid fd).
Willy Tarreau [Wed, 7 Oct 2020 16:36:54 +0000 (18:36 +0200)]
MEDIUM: init: stop disabled proxies after initializing fdtab
During the startup process we have neither fdtab nor fd_updt for quite
a long time, and as such some operations on the listeners are not
permitted, such as fd_want_*/fd_stop_* or fd_delete(). The latter is of
particular concern because it's used when stopping a disabled frontend,
and it's performed very early during check_config_validity() while there
is no fdtab yet. The trick till now relies on the listener's state which
is a bit brittle.
There is absolutely no valid reason for stopping a proxy's listeners this
early, we can postpone it after init_pollers() which will at least have
allocated fdtab.
MEDIUM: listeners: don't bounce listeners management between queues
During 2.1 development, commit f2cb16948 ("BUG/MAJOR: listener: fix
thread safety in resume_listener()") was introduced to bounce the
enabling/disabling of a listener's FD to one of its threads because
the remains of fd_update_cache() were fundamentally incompatible with
the need to call fd_want_recv() or fd_stop_recv() for another thread.
However, since then we've completely dropped such code and it's totally
safe to use these functions on an FD that is solely used by another
thread (this is even used by the FD migration code). The only remaining
limitation concerning the wake up delay was addressed by previous commit
"MEDIUM: fd: always wake up one thread when enabling a foreing FD".
The current situation forces the FD management to remain in the
pause_listener() and resume_listener() functions just so that it can
bounce between threads, without having the ability to delegate it to
the suitable protocol layer.
So let's first remove this now unneeded workaround.
MEDIUM: fd: always wake up one thread when enabling a foreing FD
Since 2.2 it's safe to enable/disable another thread's FD but the fd_wake
calls will not immediately be considered because nothing wakes the other
threads up. This will have an impact on listeners when deciding to resume
them after they were paused, so at a minimum we want to wake up one of their
threads, just like the scheduler does on task_kill(). This is what this
patch does.
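A rough model of the wake-up decision: after enabling an FD from thread `tid`, if that thread is not among the FD's threads, pick one of the owning threads to wake. The thread-mask layout, the "lowest eligible thread" policy and `toy_thread_to_wake()` are assumptions for illustration, not the actual selection logic.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical: bit i set in thread_mask means thread i may poll the FD. */
struct toy_fd { unsigned long thread_mask; };

/* Returns the thread to wake after enabling <fd> from thread <tid>, or -1
 * if no wakeup is needed (our own FD, or nobody to wake). */
int toy_thread_to_wake(const struct toy_fd *fd, int tid)
{
    int i;

    if (fd->thread_mask & (1UL << tid))
        return -1;                      /* current thread will see it itself */

    for (i = 0; i < (int)(CHAR_BIT * sizeof(fd->thread_mask)); i++)
        if (fd->thread_mask & (1UL << i))
            return i;                   /* lowest eligible foreign thread */
    return -1;                          /* empty mask */
}
```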
Willy Tarreau [Fri, 9 Oct 2020 09:14:35 +0000 (11:14 +0200)]
REGTESTS: mark abns_socket as broken
This test is inherently racy. It regularly pops up on the CI, and I've
spent one hour chasing a bug that apparently doesn't exist, just because
running it 10 times in a row reports from 4 to 8 failures
when built at -O2 and generally even more at -O0. The logs are very
confusing, often reporting that it failed with status 0, with nothing
else wrong. I suspect it might sometimes be the shell command that fails
if it executes before haproxy finishes starting up, which would
also explain the relation with the optimization level. E.g.:
> Testing with haproxy version: 2.2.0
> # top TEST reg-tests/seamless-reload/abns_socket.vtc FAILED (3.006) exit=2
> # top TEST reg-tests/seamless-reload/abns_socket.vtc FAILED (3.006) exit=2
> # top TEST reg-tests/seamless-reload/abns_socket.vtc FAILED (3.009) exit=2
> # top TEST reg-tests/seamless-reload/abns_socket.vtc FAILED (3.008) exit=2
> # top TEST reg-tests/seamless-reload/abns_socket.vtc FAILED (3.007) exit=2
> # top TEST reg-tests/seamless-reload/abns_socket.vtc FAILED (3.007) exit=2
> 6 tests failed, 0 tests skipped, 4 tests passed
Some of the failures include this, suggesting that some barriers could
help:
---- h1 haproxy h1 PID file check failed:
Could not read PID file '/tmp/haregtests-2020-10-09_11-19-40.kgsDB4/vtc.30539.04dbea7f/h1/pid
Since it has been causing false positives and consumed way more
troubleshooting time than it saved, let's mark it as broken so that it
doesn't waste more time. We can bring it back when someone manages to
figure out what the problem is.
BUG/MINOR: http-htx: Expect no body for 204/304 internal HTTP responses
204 and 304 HTTP responses must not contain a message body. These status codes are
correctly handled when the responses are received from a server. But there is no
specific processing for internal HTTP responses (errorfile and http replies).
Now, when errorfiles or http replies are parsed during the configuration
parsing, an error is triggered if a 204/304 message contains a body. An extra
check is also performed to ensure the body length matches the announced
content-length.
This patch should fix the issue #891. It must be backported as far as 2.0. For
2.1 and 2.0, only the http_str_to_htx() function must be fixed, since the
http_parse_http_reply() function does not exist there.
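The two configuration-time checks can be sketched as one validation helper. `toy_check_internal_reply()` is hypothetical, and the exact rules HAProxy applies may differ: here a 204/304 reply is only rejected if it carries a body, and for other statuses an announced Content-Length (or -1 if none) must equal the actual body length, which is exactly the 96-vs-97-byte mismatch fixed in the next commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check for an internal reply (errorfile or http-reply).
 * <cl> is the announced Content-Length, or -1 if none was announced.
 * Returns 0 if valid, -1 otherwise. */
int toy_check_internal_reply(int status, long cl, size_t body_len)
{
    if (status == 204 || status == 304)
        return body_len == 0 ? 0 : -1;  /* these statuses carry no body */
    if (cl >= 0 && (size_t)cl != body_len)
        return -1;                      /* body must match the announced length */
    return 0;
}
```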
BUG/MINOR: http: Fix content-length of the default 500 error
96 bytes are announced in the C-L header for a message body of 97 bytes. This
bug was introduced by the patch 46a030cdd ("CLEANUP: assorted typo fixes in the
code and comments").
This patch must be backported to all versions where the patch above is (the 2.2
for now).
BUG/MEDIUM: mux-h2: Don't handle pending read0 too early on streams
This patch is similar to the previous one on the fcgi. The same is true for the
H2, but the bug is far harder to trigger because of how the protocol flows. Still,
it may explain strange aborts in some edge cases.
A read0 received on the connection must not be handled too early by H2 streams.
If the demux buffer is not empty, the pending read0 must not be considered. The
H2 streams must not be switched to the half-closed remote state in
h2s_wake_one_stream() and the CS_FL_EOS flag must not be set on the associated
conn-stream in h2_rcv_buf(). In short, as long as data are still
pending in the demux buffer, no abort must be reported to the streams.
To fix the issue, a dedicated function has been added, responsible for detecting
pending read0 for a H2 connection. A read0 is reported only if the demux buffer
is empty. This function is used instead of conn_xprt_read0_pending() at some
places.
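The core rule of the dedicated function can be shown in a few lines: a read0 seen by the transport is only reported to the streams once the demux buffer is empty. `toy_conn` and `toy_read0_pending()` are hypothetical; the real function replaces conn_xprt_read0_pending() at some call sites.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection view: only what the decision needs. */
struct toy_conn {
    bool xprt_read0;      /* transport saw the end of input */
    size_t demux_data;    /* bytes still waiting in the demux buffer */
};

/* Report a read0 to the streams only once all received data were demuxed;
 * otherwise the streams would see an abort before their pending data. */
bool toy_read0_pending(const struct toy_conn *c)
{
    return c->xprt_read0 && c->demux_data == 0;
}
```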
Note that the HREM stream state should not be used to report aborts. It is
done in the h2s_wake_one_stream() function and is a legacy of the very first
versions of the mux-h2.
This patch should be backported as far as 2.0. In 1.8, the code is too
different to apply it as-is, but it is probably useless there because the mux-h2
can only be installed on the client side.
BUG/MEDIUM: mux-fcgi: Don't handle pending read0 too early on streams
A read0 received on the connection must not be handled too early by FCGI
streams. If the demux buffer is not empty, the pending read0 must not be
considered. The FCGI streams must not be switched to the half-closed remote state in
fcgi_strm_wake_one_stream() and the CS_FL_EOS flag must not be set on the
associated conn-stream in fcgi_rcv_buf(). In short, as long as data are
still pending in the demux buffer, no abort must be reported to the
streams.
To fix the issue, a dedicated function has been added, responsible for detecting
pending read0 for a FCGI connection. A read0 is reported only if the demux
buffer is empty. This function is used instead of conn_xprt_read0_pending() at
some places.
This patch should fix the issue #886. It must be backported as far as 2.1.
Willy Tarreau [Fri, 9 Oct 2020 03:56:56 +0000 (05:56 +0200)]
BUG/MINOR: makefile: fix a tiny typo in the target list
Previous commit 382001b46 ("BUILD: Add a DragonFlyBSD target") introduced
a tiny typo in the target list ("iopenbs" vs "openbsd"). This will have to
be backported if that patch is backported.
Willy Tarreau [Thu, 8 Oct 2020 16:05:56 +0000 (18:05 +0200)]
DOC: fix a confusing typo on a regsub example
Sébastien reported a confusing example in the doc about regsub when used
with quotes. Nested quotes are already not trivial to grasp, but when
typos are there and result in something valid, it's even worse. The closing
quote ought to have been inside the brackets. However, haproxy will not see
any difference because the single quotes delimit a word and the delimited
word remains the same. Let's just not add yet another level of confusion.
MINOR: mux-h1: Don't wake up the H1C when the output buffer becomes available
There is no reason to wake up the H1 connection when a new output buffer is
retrieved after an allocation failure because only the H1 stream will fill it.
BUG/MINOR: mux-h1: Always set the session on frontend h1 stream
The session is always defined for a frontend connection. When a new client
connection is established, the session is set for the first H1 stream. But on
keep-alive connections, it is not set for the following H1 streams even though it
could be.
This patch is tagged as a bug because it fixes an inconsistency in the H1
streams creation. But it does not fix a known bug.
BUG/MINOR: mux-h1: Be sure to only set CO_RFL_READ_ONCE for the first read
The condition to set CO_RFL_READ_ONCE flag is not really accurate. We must check
the request state on frontend connections only and, conversely, the response
state on backend connections only. Only the parsed side must be considered, not
the opposite one.
CLEANUP: ssl: Release cached SSL sessions on deinit
On deinit, when the server SSL ctx is released, we must take care to release the
cached SSL sessions stored in the array <ssl_ctx.reused_sess>. There are
global.nbthread entries in this array, each one may have a pointer to a cached
session.
This patch should fix the issue #802. No backport needed.
Tim Duesterhus [Mon, 14 Sep 2020 16:01:33 +0000 (18:01 +0200)]
CLEANUP: cache: Fix leak of cconf->c.name during config check
During the config check, the post parsing is not performed. Thus, cache filters
are not fully initialized and their cache names are never released. To be able to
release them, a flag is now set when a cache filter is fully initialized. On
deinit, if the flag is not set, it means the cache name must be freed.
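A sketch of the flag-based ownership rule. It assumes, as the `cconf->c.name` notation suggests, that the name and the resolved cache share a union, which is why deinit cannot tell which member is valid without a flag. `TOY_CACHE_FLT_INIT`, `toy_cache_conf` and both functions are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_CACHE_FLT_INIT 0x1   /* hypothetical: set once post-parsing ran */

struct toy_cache_conf {
    unsigned int flags;
    union {
        char *name;              /* valid before post-parsing */
        void *cache;             /* valid after post-parsing */
    } c;
};

/* Post-parsing: replace the name by the resolved cache and free the name. */
void toy_cache_post_parse(struct toy_cache_conf *cc, void *cache)
{
    char *name = cc->c.name;

    cc->c.cache = cache;
    free(name);
    cc->flags |= TOY_CACHE_FLT_INIT;
}

/* Deinit: if post-parsing never ran (e.g. during a config check), the
 * union still holds the name, which must be freed here. */
void toy_cache_deinit(struct toy_cache_conf *cc)
{
    if (!(cc->flags & TOY_CACHE_FLT_INIT)) {
        free(cc->c.name);
        cc->c.name = NULL;
    }
}
```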
The patch should fix #849. No backport needed.
[Cf: Tim is the patch author, but I added the commit message]
BUG/MINOR: proto_tcp: Report warning messages when listeners are bound
When a TCP listener is bound, in the tcp_bind_listener() function, a warning
message may be reported and should be displayed in verbose mode. But the warning
message is actually lost if the socket is successfully bound because we don't
fill the <errmsg> variable in this case.
This patch should fix the issue #863. No backport is needed.
BUG/MINOR: peers: Inconsistency when dumping peer status codes.
A peer connection status must be considered as valid only if there is an applet
which has been instantiated for the connection to the peer. So, ->statuscode
should be considered as the last known peer connection status from the last
connection to this peer if any. To reflect this, "statuscode" field of peer dump
is renamed to "last_statuscode".
This patch also adds an "active"/"inactive" field after the peer location type
("remote" or "local"), indicating whether or not an applet has been instantiated
for this peer connection.
Thanks to Emeric for noticing this issue.
Remove a variable declaration inside a for-loop. It was introduced by my
patch series implementing dynamic stats. This is not
supported by older gcc versions, notably in the FreeBSD environment of the CI.
Use the new stats module API to integrate the dns counters in the
standard stats. This is done in order to avoid code duplication, keep
the code related to the cli out of dns and take full advantage of the
stats functions, making it possible to print dns stats in csv or json format.
MINOR: stats: display extra proxy stats on the html page
Integrate the additional proxy stats on the html stats page. For each
module, a new column is displayed with the individual stats available as
a tooltip.
MINOR: stats: support clear counters for dynamic stats
Add a boolean 'clearable' to the stats module structure. If set, it forces
all the counters to be reset on 'clear counters' cli command. If not,
the counters are reset only when 'clear counters all' is used.
MEDIUM: stats: integrate static proxies stats in new stats
This is executed on startup with the registered statistics modules. The
existing statistics have been merged in a list containing all
statistics for each domain. This is useful to print all available
statistics in a generic way.
Allocate extra counters for all proxies/servers/listeners instances.
These counters are allocated with the counters from the stats modules
registered on startup.
MEDIUM: stats: add abstract type to store counters
Implement a small API to easily add extra counters inside a structure
instance. This will be used to implement dynamic statistics linked to
every type of object as needed.
The counters are stored in a dynamic array inside the relevant objects.
MEDIUM: stats: define an API to register stat modules
A stat module can be registered to quickly add new statistics on
haproxy. It must be attached to one of the available stats domains. The
registration must be done using INITCALL on STG_REGISTER.
The stat module has a name which should be unique for each new module in
a domain. It also contains a statistics list with their name/desc and a
pointer to a function used to fill the stats from the module counters.
The module also provides the initial counters values used on
automatically allocated counters. The offset for these counters
are stored in the module structure.
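The offset bookkeeping can be sketched as below, assuming each module's
counters are laid out one after the other in the per-object area and that
every new area is filled from the modules' initial values. The names are
hypothetical; in haproxy the registration itself is triggered through
INITCALL, whereas here it is a plain function call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stats module descriptor. */
struct stat_module_sketch {
	const char *name;
	const void *counters;      /* initial counter values */
	size_t counters_size;      /* size of the counters block */
	size_t counters_off;       /* offset assigned at registration */
	struct stat_module_sketch *next;
};

/* Hypothetical per-domain registry. */
struct stat_domain_sketch {
	struct stat_module_sketch *modules;
	size_t total_size;         /* bytes needed per object instance */
};

/* Register <mod> in <dom>: its counters are placed right after those
 * of the previously registered modules.
 */
void stat_register_module(struct stat_domain_sketch *dom,
                          struct stat_module_sketch *mod)
{
	mod->counters_off = dom->total_size;
	dom->total_size += mod->counters_size;
	mod->next = dom->modules;
	dom->modules = mod;
}

/* Fill a freshly allocated counters area with each module's initial values. */
void stat_init_counters(const struct stat_domain_sketch *dom, char *area)
{
	const struct stat_module_sketch *mod;

	for (mod = dom->modules; mod; mod = mod->next)
		memcpy(area + mod->counters_off, mod->counters, mod->counters_size);
}
```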
MEDIUM: stats: add delimiter for static proxy stats on csv
Use the character '-' to mark the end of static statistics on proxy
domain. After this marker, the order of the fields is not guaranteed and
should be parsed with care.
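A consumer of the csv output could locate the marker in the header line
as in the following sketch. It assumes plain comma-separated fields without
quoting, and the function name is hypothetical.

```c
#include <string.h>

/* Return the 0-based index of the first field equal to "-" in a CSV
 * header line, i.e. the number of static fields preceding the marker,
 * or -1 if no marker is present. Assumes simple fields without quoting.
 */
int csv_static_field_count(const char *line)
{
	const char *p = line;
	int idx = 0;

	while (1) {
		size_t len = strcspn(p, ",\n");

		if (len == 1 && p[0] == '-')
			return idx;          /* end of static statistics */
		if (p[len] != ',')
			return -1;           /* end of line without marker */
		p += len + 1;
		idx++;
	}
}
```

Fields found after the returned index should then be looked up by name
rather than by position, since their order is not guaranteed.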
MINOR: stats: define additional flag px cap on domain
This flag can be used to determine for which types of proxy object the
statistics are relevant. It will be useful when adding dynamic
statistics. Currently, this flag is not used.
MINOR: stats: define the concept of domain for statistics
The domain option will be used to attach statistics to objects other
than proxies/listeners/servers. At the moment, only the PROXY
domain is available.
Add a 'domain' argument to the 'show stats' cli command to specify the
domain. Only 'domain proxy' is available now. If not specified, the proxy
domain is used by default.
For HTML output, only proxy statistics will be displayed.
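The argument handling described above can be sketched as a small parser
that falls back to the proxy domain. The enum and function are hypothetical
and do not reflect haproxy's actual CLI parsing code.

```c
#include <string.h>

/* Hypothetical identifiers: only the proxy domain exists for now. */
enum stats_domain_id { STATS_DOMAIN_PROXY = 0 };

/* Scan the words following the command for "domain <name>".
 * Returns STATS_DOMAIN_PROXY when the keyword is absent (default),
 * or -1 for an unknown or missing domain name.
 */
int parse_stats_domain(const char *const *words, int nb_words)
{
	int i;

	for (i = 0; i < nb_words; i++) {
		if (strcmp(words[i], "domain") != 0)
			continue;
		if (i + 1 >= nb_words)
			return -1;                 /* "domain" without a name */
		if (strcmp(words[i + 1], "proxy") == 0)
			return STATS_DOMAIN_PROXY;
		return -1;                         /* unsupported domain */
	}
	return STATS_DOMAIN_PROXY;                 /* default domain */
}
```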
MINOR: hlua: Display debug messages on stderr only in debug mode
Debug messages emitted in lua using core.Debug() or core.log() are now only
displayed on stderr if HAProxy is started in debug mode (-d parameter on the
command line). There is no change for other message levels.
This patch should fix issue #879. It may be backported to all stable
versions.
MINOR: stats: hide px/sv/li fields in applet struct
Use an opaque pointer to store the proxy instance, and group the
server/listener pointers into a single opaque pointer. This makes the
structure easier to extend so that statistics on other types of objects
can be supported in the future.
This patch is needed to extend stat support to components other than
proxy objects.
The prometheus module has been adapted for these changes.
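The idea of the opaque pointers can be sketched as follows: the generic
context only stores void pointers, and the code handling a given domain
casts them back to concrete types. All type and field names are
hypothetical stand-ins, not the real applet structure.

```c
#include <stddef.h>

/* Hypothetical object types standing in for haproxy's proxy/server. */
struct px_sketch  { const char *id; };
struct srv_sketch { const char *id; };

/* Dump context: the generic code does not know the concrete types. */
struct dump_ctx_sketch {
	void *obj1;   /* e.g. the proxy being dumped */
	void *obj2;   /* e.g. the current server or listener */
};

/* Proxy-domain accessor: casts the opaque pointers back to their types.
 * Returns the proxy id and stores the server id (or NULL) in <srv>.
 */
const char *proxy_dump_names(const struct dump_ctx_sketch *ctx, const char **srv)
{
	const struct px_sketch *px = ctx->obj1;
	const struct srv_sketch *sv = ctx->obj2;

	*srv = sv ? sv->id : NULL;
	return px->id;
}
```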
MINOR: stats: add stats size as a parameter for csv/json dump
Make the stats size a parameter of the csv/json dump functions. This is
needed for the upcoming patch which provides dynamic stats. For now the
static value ST_F_TOTAL_FIELDS is passed.
Remove the unused parameter px from stats_dump_one_line.
This patch is needed to extend stat support to components other than
proxy objects.
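A dump function taking the number of fields as a parameter could look like
the sketch below. It is hypothetical and much simpler than the real csv
dump: values are plain integers, and the caller would pass
ST_F_TOTAL_FIELDS today and a larger count once dynamic stats exist.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch: write <nb> values as one CSV line into <out>
 * (<outsz> bytes). The field count is a parameter instead of a
 * compile-time constant. Returns the number of bytes written, or -1
 * if the output would be truncated.
 */
int dump_csv_line(char *out, size_t outsz, const long long *fields, size_t nb)
{
	size_t i, used = 0;

	if (outsz)
		out[0] = '\0';
	for (i = 0; i < nb; i++) {
		int n = snprintf(out + used, outsz - used, "%s%lld",
		                 i ? "," : "", fields[i]);

		if (n < 0 || (size_t)n >= outsz - used)
			return -1;     /* not enough room */
		used += (size_t)n;
	}
	return (int)used;
}
```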
Remove the static qualifier from stats_dump_one_line and stats_putchk and
export them in the header file. These functions will be reusable by other
components to print their statistics.
This patch is needed to extend stat support to components other than
proxy objects.
The json schema appears to be invalid when checked with the validator
from https://www.jsonschemavalidator.net/. Correct it using the
following specification :
http://json-schema.org/draft/2019-09/json-schema-validation.html#rfc.section.9.1
The impact of the bug is not well known, as I am not sure how useful
the json schema is for users. It is probably not used at all, otherwise
this bug would have been reported.
Willy Tarreau [Fri, 2 Oct 2020 15:52:49 +0000 (17:52 +0200)]
BUG/MEDIUM: queue: make pendconn_cond_unlink() really thread-safe
A crash reported in github issue #880 looks impossible unless
pendconn_cond_unlink() occasionally sees a null leaf_p when attempting
to remove an entry, which seems to be confirmed by the reporter. What
seems to be happening is that, depending on compiler optimizations,
this pointer can appear as null while pointers are moved if one of
the node's parents is removed from or inserted into the tree. The
pointer is never explicitly set to null during these operations, but
those pointers are rewritten in multiple steps and nothing prevents this
situation from happening, as there is no particular barrier nor atomic
operation around this.
This test was used to avoid unnecessary locking for already deleted
entries, but looking at the code it appears that pendconn_free() already
resets s->pend_pos, which is used as <p> there, and that the other
callers only reach this point after an error, where the connection will
be dropped as well. So we don't save anything by doing this test, and it
makes the function unsafe. The older code used to check for list emptiness
there and not inside pendconn_unlink(), which explains why the test has
stayed there. Let's just remove it now.
Thanks to @jaroslawr for reporting this issue in great detail and for
testing the proposed fix.
This should be backported to 1.8, where the test on LIST_ISEMPTY should
be moved to pendconn_unlink() instead (inside the lock, just like 2.0+).
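The difference between the removed unlocked test and the fixed pattern can
be sketched with a plain mutex. This is a hypothetical simplification: a
<queued> flag stands in for the tree node's leaf_p, and the real code
operates on an ebtree under the queue lock.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified queue entry: <queued> stands in for the
 * tree node's leaf_p, which other threads may rewrite concurrently.
 */
struct pend_sketch {
	int queued;
};

struct queue_sketch {
	pthread_mutex_t lock;
	int count;
};

/* Racy pattern (for comparison only): reading <queued> without the lock
 * may observe a transient value while another thread moves nodes:
 *
 *	if (!p->queued) return;    // unlocked check: unsafe
 */

/* Safe pattern: only the caller-owned pointer is tested without the
 * lock; the entry's state is tested and changed under the lock.
 */
void pend_cond_unlink(struct queue_sketch *q, struct pend_sketch *p)
{
	if (!p)                  /* entry already released by its owner */
		return;

	pthread_mutex_lock(&q->lock);
	if (p->queued) {
		p->queued = 0;
		q->count--;
	}
	pthread_mutex_unlock(&q->lock);
}
```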
Update the documentation with the new bundle behavior, which no longer uses
the same OpenSSL certificate store but loads each PEM separately, as if
multiple "crt" lines were specified.
BUG/MINOR: tcpcheck: Set socks4 and send-proxy flags before the connect call
Since the health-check refactoring in 2.2, checks through a socks4 proxy
are broken. To fix this bug, the CO_FL_SOCKS4 flag must be set on the connection
before calling the connect() callback function, because this flag is checked to
use the right destination address. The same is done for the CO_FL_SEND_PROXY
flag for consistency.
A reg-test has been added to test the "check-via-socks4" directive.
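Why the ordering matters can be sketched with a stand-in connect callback
that reads the flags to choose its destination. The flag values, structure
and functions are hypothetical; only the "set flags, then connect" order
mirrors the fix.

```c
#include <stddef.h>

/* Hypothetical flag values; only the ordering matters here. */
#define SK_FL_SOCKS4      0x1u
#define SK_FL_SEND_PROXY  0x2u

struct conn_sketch {
	unsigned int flags;
	const char *dst;       /* address actually connected to */
};

/* Stand-in for the connect() callback: it reads the flags to pick
 * the destination, so they must already be set when it runs.
 */
static void sk_connect(struct conn_sketch *conn, const char *server,
                       const char *socks4)
{
	conn->dst = (conn->flags & SK_FL_SOCKS4) ? socks4 : server;
}

/* Fixed ordering: set the flags first, then call connect. */
void check_connect(struct conn_sketch *conn, int via_socks4, int send_proxy,
                   const char *server, const char *socks4)
{
	if (via_socks4)
		conn->flags |= SK_FL_SOCKS4;
	if (send_proxy)
		conn->flags |= SK_FL_SEND_PROXY;
	sk_connect(conn, server, socks4);
}
```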
MEDIUM: tcp-rules: Warn if a track-sc* content rule doesn't depend on content
The warning is only emitted for HTTP frontends. The idea is to encourage the use
of "tcp-request session" rules to track counters that do not depend on the
request content. The documentation has been updated accordingly.
The warning is important because, since the multiplexers were added to the
processing chain, HTTP parsing is performed at a lower level. Thus parsing
errors are detected in the multiplexers, before the stream is created. In HTTP/2,
the error is reported by the multiplexer itself and the stream is never
created. This difference has a number of consequences, one of which is
that HTTP request counting in stick tables only works for valid H2 requests, and
HTTP error tracking in stick tables never considers invalid H2 requests, only
invalid H1 ones. The aim is to do the same with the mux-h1. This change will
not be done in 2.3, but in 2.4. In the end, H1 and H2 parsing errors will
be caught by the multiplexers, at the session level. Thus, tracking counters at
the content level should be reserved for rules using a key based on the request
content or those using ACLs based on the request content.
To be clear, a warning will be emitted for the following rules :
DOC: tcp-rules: Refresh details about L7 matching for tcp-request content rules
Because HTTP messages are now parsed in the HTTP multiplexers, the
content is immediately available when "tcp-request content" rules are
evaluated for an HTTP frontend, so it is worth making the documentation
explicit on this point. In addition, because the parsing is always
already performed, there is no reason to still use "tcp-request content"
rules based on L7 matching, although it remains valid. The recommended way
is to use "http-request" rules instead. Again, it is worth updating the
documentation on this point.
Eric Salama [Fri, 2 Oct 2020 09:58:19 +0000 (11:58 +0200)]
BUG/MINOR: Fix several leaks of 'log_tag' in init().
We use chunk_initstr() to store the program name as the default log-tag.
If the log-tag directive is used in the config file, this chunk is
destroyed and replaced. chunk_initstr() sets the chunk size to 0, so
destroying the chunk releases the chunk itself but not its content.
This happens for the global section and also for proxies.
We fix this by using chunk_initlen() instead of chunk_initstr().
We also check that the memory allocation was successful, otherwise we quit.
This fixes github issue #850.
It can be backported as far as 1.9, with minor adjustments to includes.
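The leak mechanism can be sketched with simplified stand-ins for
chunk_initstr(), chunk_initlen() and chunk_destroy(). The struct and
function names are hypothetical, and the sketch assumes, as the message
implies, that destroying a chunk only frees its storage when its size is
non-zero.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for haproxy's chunk type. */
struct chunk_sketch {
	char *area;
	size_t size;   /* 0: area is not owned and will not be freed */
	size_t data;
};

/* Like chunk_initstr(): references <str> with size 0 (not owned). */
void sk_initstr(struct chunk_sketch *c, char *str)
{
	c->area = str;
	c->size = 0;
	c->data = strlen(str);
}

/* Like chunk_initlen(): records a non-zero size, so destroy frees it. */
void sk_initlen(struct chunk_sketch *c, char *str, size_t size, size_t len)
{
	c->area = str;
	c->size = size;
	c->data = len;
}

/* Like chunk_destroy(): only frees storage when size is non-zero. */
void sk_destroy(struct chunk_sketch *c)
{
	if (!c->size)
		return;        /* an allocated area set via sk_initstr() leaks */
	free(c->area);
	c->area = NULL;
	c->size = c->data = 0;
}

/* Set the default tag from a private copy of <progname>.
 * Returns -1 on allocation failure (the caller is expected to quit).
 */
int set_default_tag(struct chunk_sketch *c, const char *progname)
{
	size_t len = strlen(progname);
	char *dup = malloc(len + 1);

	if (!dup)
		return -1;
	memcpy(dup, progname, len + 1);
	sk_initlen(c, dup, len + 1, len);
	return 0;
}
```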