git.ipfire.org Git - thirdparty/haproxy.git/log

BUG/MAJOR: mux-h2: Don't try to send data if we know it is no longer possible

In h2_send(), if we are in a state where we know it is no longer possible to
send data, we must exit the sending loop to avoid any possiblity to loop
forever. It may happen if the mbuf ring is released while the H2_CF_MUX_MFULL
flag is still set. Here is a possible scenario to trigger the bug :

  1) The mbuf ring is full because we are unable to send data. The
     H2_CF_MUX_MFULL flag is set on the H2 connection.

  2) At this stage, the task timeout expires because the H2 connection is
     blocked. We enter in h2_timeout_task() function. Because the mbuf ring is
     full, we cannot send the GOAWAY frame. Thus the H2_CF_GOAWAY_FAILED flag is
     set. The H2 connection is not released yet because there is still a stream
     attached. Here we leave h2_timeout_task() function.

  3) A bit later, the H2 connection is woken up. If h2_process(), nothing is
     performed by the first attempt to send data, in h2_send(). Then, because
     the H2_CF_GOAWAY_FAILED flag is set, the mbuf ring is released. But the
     H2_CF_MUX_MFULL flag is still there. At this step a second attempt to send
     data is performed.

  4) In h2_send(), we try to send data in a loop. To exist this loop, done
     variable must be set to 1. Because the H2_CF_MUX_MFULL flag is set, we
     don't call h2_process_mux() and done is not updated. Because the mbuf ring
     is now empty, nothing is sent and the H2_CF_MUX_MFULL flag is never
     removed. Now, we loop forever... waiting for the watchdog.

To fix the bug, we now exit the loop if one of these conditions is true :

  - The H2_CF_GOAWAY_FAILED flag is set on the H2 connection
  - The CO_FL_SOCK_WR_SH flag is set on the underlying connection
  - The H2 connection is in the H2_CS_ERROR2 state

This patch should fix the issue #912 and most probably #875. It must be
backported as far as the 1.8.

BUG/MINOR: http-ana: Don't send payload for internal responses to HEAD requests

When an internal response is returned to a client, the message payload must be
skipped if it is a reply to a HEAD request. The payload is removed from the HTX
message just before the message forwarding.

This bugs has been around for a long time. It was already there in the pre-HTX
versions. In legacy HTTP mode, internal errors are not parsed. So this bug
cannot be easily fixed. Thus, this patch should only be backported in all HTX
versions, as far as 2.0. However, the code has significantly changed in the
2.2. Thus in the 2.1 and 2.0, the patch must be entirely reworked.

CLEANUP: compression: Make use of http_get_etag_type()

This commit makes the compressor use http_get_etag_type to validate the
ETag instead of using an ad-hoc condition.

REGTEST: cache: Add if-none-match test case

Test that if-none-match header is properly taken into account and that
when the conditions are fulfilled, a "304 Not Modified" response can be
sent to the client.

Co-authored-by: Tim Duesterhus <tim@bastelstu.be>

MEDIUM: cache: Add support for 'If-None-Match' request header

Partial support of conditional HTTP requests. This commit adds the
support of the 'If-None-Match' header (see RFC 7232#3.2).
When a client specifies a list of ETags through one or more
'If-None-Match' headers, they are all compared to the one that might have
been stored in the corresponding http cache entry until one of them
matches.
If a match happens, a specific "304 Not Modified" response is
sent instead of the cached data. This response has all the stored
headers but no other data (see RFC 7232#4.1). Otherwise, the whole cached data
is sent.
Although unlikely in a GET/HEAD request, the "If-None-Match: *" syntax is
valid and also receives a "304 Not Modified" response (RFC 7434#4.3.2).

This resolves a part of GitHub issue #821.

MEDIUM: cache: Store the ETag information in the cache_entry

When sent by a server for a given resource, the ETag header is
stored in the coresponding cache entry (as any other header). So in
order to perform future ETag comparisons (for subsequent conditional
HTTP requests), we keep the length of the ETag and its offset
relative to the start of the cache_entry.
If no ETag header exists, the length and offset are zero.

MINOR: http: Add etag comparison function

Add a function that compares two etags that might be of different types.
If any of them is weak, the 'W/' prefix is discarded and a strict string
comparison is performed.

Co-authored-by: Tim Duesterhus <tim@bastelstu.be>

MINOR: http: Add `enum etag_type http_get_etag_type(const struct ist)`

http_get_etag_type returns whether a given `etag` is a strong, weak, or invalid
ETag.

BUG/MEDIUM: server: support changing the slowstart value from state-file

If the slowstart value in a state file implies the latest state change
is within the slowstart period, we end up calling srv_update_status()
to reschedule the server's state change but its task is not yet
allocated and remains null, causing a crash on startup.

Make sure srv_update_status() supports being called with partially
initialized servers which do not yet have a task. If the task has to
be scheduled, it will necessarily happen after initialization since
it will result from a state change.

This should be backported wherever server-state is present.

BUILD: makefile: add entries to build common debugging tools

A few tools in contrib/ such as halog, flags, poll and tcploop are
occasionally useful at least to developers, and some of them such as
halog or flags can occasionally break due to some changes in the include
files. As reported in issue #907, their inability to inherit the global
build options also causes some warnings related to some specificities
of the main include files. Let's just add entries in the main makefile
to build them.

CONTRIB: tcploop: remove unused local variables in tcp_pause()

Building with -Wall shows that "pollfd" and "ret" are not used. Silly
copy-paste...

BUG/MINOR: queue: properly report redistributed connections

In commit 5cd4bbd7a ("BUG/MAJOR: threads/queue: Fix thread-safety issues
on the queues management") the counter of transferred connections was
accidently lost, so that when a server goes down with connections in its
queue, it will always be reported that 0 connection were transferred.

This should be backported as far as 1.8 since the patch above was
backported there.

MINOR: ssl: 'ssl-load-extra-del-ext' removes the certificate extension

In issue #785, users are reporting that it's not convenient to load a
".crt.key" when the configuration contains a ".crt".

This option allows to remove the extension of the certificate before
trying to load any extra SSL file (.key, .ocsp, .sctl, .issuer etc.)

The patch changes a little bit the way ssl_sock_load_files_into_ckch()
looks for the file.

BUG/MINOR: listener: close before free in `listener_accept`

safer to close handle before the object is put back in the global pool.

this was introduced by commit 9378bbe0bef4005155d ("MEDIUM: listener:
use protocol->accept_conn() to accept a connection")

this should fix github issue #902

no backport needed.

Signed-off-by: William Dauchy <wdauchy@gmail.com>

MEDIUM: config: report that "nbproc" is deprecated

As previously discussed, nbproc usage is bad, deprecated, and scheduled
for removal in 2.5.

If "nbproc" is found with more than one process while nbthread is not
set, a warning will be emitted encouraging to remove it or to migrate
to nbthread instead. This makes sure the user has an opportunity to
both see the message and silence it.

BUG/MEDIUM: connection: Never cleanup server lists when freeing private conns

When a connection is released, depending on its state, it may be detached from
the session and it may be removed from the server lists. The first case may
happen for private or unsharable active connections. The second one should only
be performed for idle or available connections. We never try to remove a
connection from the server list if it is attached to a session. But it is also
important to never try to remove a private connecion from the server lists, even
if it is not attached to a session. Otherwise, the curr_used_conn server counter
is decremented once too often.

This bug was introduced by the commit 04a24c5ea ("MINOR: connection: don't check
priv flag on free"). It is related to the issue #881. It only affects the 2.3,
no backport is needed.

CLEANUP: task: remove the unused and mishandled global_rqueue_size

This counter is only updated and never used, and in addition it's done
without any atomicity so it's very unlikely to be correct on multi-CPU
systems! Let's just remove it since it's not used.

CLEANUP: tree-wide: reorder a few structures to plug some holes around locks

A few structures were slightly rearranged in order to plug some holes
left around the locks. Sizes ranging from 8 to 32 bytes could be saved
depending on the structures. No performance difference was noticed (none
was expected there), though memory usage might be slightly reduced in
some rare cases.

MINOR: threads: change lock_t to an unsigned int

We don't need to waste the size of a long for the locks: with the plocks,
even an unsigned short would offer enough room for up to 126 threads! Let's
use an unsigned int which will be easier to place in certain structures
and will more conveniently plug some holes, and Atomic ops are at least
as fast on 32-bit as on 64-bit. This will not change anything for 32-bit
platforms.

CLEANUP: threads: don't register an initcall when not debugging

It's a bit overkill to register an initcall to call a function to set
a lock to zero when not debugging, let's just declare the lock as
pre-initialized to zero.

BUILD: ssl: make BoringSSL use its own version numbers

BoringSSL is a fork of OpenSSL 1.1.0, however in
49e9f67d8b7cbeb3953b5548ad1009d15947a523 it has changed version to 1.1.1.

Should fix issue #895.

This must be backported to 2.2, 2.1, 2.0, 1.8

BUG/MINOR: disable dynamic OCSP load with BoringSSL

it was accidently enabled on BoringSSL while
actually it is not supported

wla: Fix part of the issue mentionned in #895.
It fixes build of boringSSL versions prior to commit
https://boringssl.googlesource.com/boringssl/+/49e9f67d8b7cbeb3953b5548ad1009d15947a523

Must be backported in 2.2.

Signed-off-by: William Lallemand <wlallemand@haproxy.org>

MINOR: lb/chash: use a read lock in chash_get_server_hash()

When using a low hash-balance-factor value, it's possible to loop
many times trying to find the best server. Figures in the order of
100-300 times were observed for 1000 servers with a factor of 101
(which seems a bit excessive for such a large farm). Given that
there's nothing in that function that prevents multiple threads
from working in parallel, let's switch to a read lock. Tests on
8 threads show roughly a 2% performance increase with this.

MINOR: lb/first: use a read lock in fas_get_next_server()

The "first" algorithm creates a lot of contention because all threads
focus on the same server by definition (the first available one). By
turning the exclusive lock to a read lock in fas_get_next_server(),
the request rate increases by 16% for 8 threads when many servers are
getting close to their maxconn.

MINOR: lb/leastconn: only take a read lock in fwlc_get_next_server()

This function doesn't change the tree, it only looks for the first
usable server, so let's do that under a read lock to limit the
situations like the ones described in issue #881 where finding a
usable server when dealing with lots of saturated ones can be
expensive. At least threads will now be able to look up in
parallel.

It's interesting to note that s->served is not incremented during the
server choice, nor is the server repositionned. So right now already,
nothing prevents multiple threads from picking the same server. This
will not cause a significant imbalance anyway given that the server
will automatically be repositionned at the right place, but this might
be something to improve in the future if it doesn't come with too high
a cost.

It also looks like the way a server's weight is updated could be
revisited so that the write lock gets tighter at the expense of a
short part of inconsistency between weights and servers still present
in the tree.

MINOR: lb/map: use seek lock and read locks where appropriate

- map_get_server_hash() doesn't need a write lock since it only
  reads the array, let's only use a read lock here.

- map_get_server_rr() only needs exclusivity to adjust the rr_idx
  while looking for its entry. Since this one is not used by
  map_get_server_hash(), let's turn this lock to a seek lock that
  doesn't block reads.

With 8 threads, no significant performance difference was noticed
given that lookups are usually instant with this LB algo so the
lock contention is rare.

MINOR: backend: replace the lbprm lock with an rwlock

It was previously a spinlock, and it happens that a number of LB algos
only lock it for lookups, without performing any modification. Let's
first turn it to an rwlock and w-lock it everywhere. This is strictly
identical.

It was carefully checked that every HA_SPIN_LOCK() was turned to
HA_RWLOCK_WRLOCK() and that HA_SPIN_UNLOCK() was turned to
HA_RWLOCK_WRUNLOCK() on this lock. _INIT and _DESTROY were updated too.

[RELEASE] Released version 2.3-dev7

Released version 2.3-dev7 with the following main changes :
    - CI: travis-ci: replace not defined SSL_LIB, SSL_INC for BotringSSL builds
    - BUG/MINOR: init: only keep rlim_fd_cur if max is unlimited
    - BUG/MINOR: mux-h2: do not stop outgoing connections on stopping
    - MINOR: fd: report an error message when failing initial allocations
    - MINOR: proto-tcp: make use of connect(AF_UNSPEC) for the pause
    - MINOR: sock: add sock_accept_conn() to test a listening socket
    - MINOR: protocol: make proto_tcp & proto_uxst report listening sockets
    - MINOR: sockpair: implement the .rx_listening function
    - CLEANUP: tcp: make use of sock_accept_conn() where relevant
    - CLEANUP: unix: make use of sock_accept_conn() where relevant
    - BUG/MINOR: listener: detect and handle shared sockets stopped in other processes
    - CONTRIB: tcploop: implement a disconnect operation 'D'
    - CLEANUP: protocol: intitialize all of the sockaddr when disconnecting
    - BUG/MEDIUM: deinit: check fdtab before fdtab[fd].owner
    - BUG/MINOR: connection: fix loop iter on connection takeover
    - BUG/MEDIUM: connection: fix srv idle count on conn takeover
    - MINOR: connection: improve list api usage
    - MINOR: mux/connection: add a new mux flag for HOL risk
    - MINOR: connection: don't check priv flag on free
    - MEDIUM: backend: add new conn to session if mux marked as HOL blocking
    - MEDIUM: backend: add reused conn to sess if mux marked as HOL blocking
    - MEDIUM: h2: remove conn from session on detach
    - MEDIUM: fcgi: remove conn from session on detach
    - DOC: Describe reuse safe for HOL handling
    - MEDIUM: proxy: remove obsolete "mode health"
    - MEDIUM: proxy: remove obsolete "monitor-net"
    - CLEANUP: protocol: remove the ->drain() function
    - CLEANUP: fd: finally get rid of fd_done_recv()
    - MINOR: connection: make sockaddr_alloc() take the address to be copied
    - MEDIUM: listener: allocate the connection before queuing a new connection
    - MINOR: session: simplify error path in session_accept_fd()
    - MINOR: connection: add new error codes for accept_conn()
    - MINOR: sock: rename sock_accept_conn() to sock_accepting_conn()
    - MINOR: protocol: add a new function accept_conn()
    - MINOR: sock: implement sock_accept_conn() to accept a connection
    - MINOR: sockpair: implement sockpair_accept_conn() to accept a connection
    - MEDIUM: listener: use protocol->accept_conn() to accept a connection
    - MEDIUM: listener: remove the second pass of fd manipulation at the end
    - MINOR: protocol: add a default I/O callback and put it into the receiver
    - MINOR: log: set the UDP receiver's I/O handler in the receiver
    - MINOR: protocol: register the receiver's I/O handler and not the protocol's
    - CLEANUP: protocol: remove the now unused <handler> field of proto_fam->bind()
    - DOC: improve the documentation for "option nolinger"
    - BUG/MEDIUM: proxy: properly stop backends
    - BUG/MEDIUM: task: bound the number of tasks picked from the wait queue at once
    - MINOR: threads: augment rwlock debugging stats to report seek lock stats
    - MINOR: threads: add the transitions to/from the seek state
    - MEDIUM: task: use an upgradable seek lock when scanning the wait queue
    - BUILD: listener: avoir a build warning when threads are disabled
    - BUG/MINOR: peers: Possible unexpected peer seesion reset after collisions.
    - MINOR: ssl: add volatile flags to ssl samples
    - MEDIUM: backend: reuse connection if using a static sni
    - BUG/MEDIUM: spoe: Unset variable instead of set it if no data provided
    - BUG/MEDIUM: mux-h1: Get the session from the H1S when capturing bad messages
    - BUG/MEDIUM: lb: Always lock the server when calling server_{take,drop}_conn
    - DOC: fix typo in MAX_SESS_STKCTR

DOC: fix typo in MAX_SESS_STKCTR

MAX_SESS_STKCTR is spelled wrongly a couple of times
in the configuration docs (K and C are swapped). This patch
fixes the typos.

BUG/MEDIUM: lb: Always lock the server when calling server_{take,drop}_conn

The server lock must be held when server_take_conn() and server_drop_conn()
lbprm callback functions are called. It is a documented prerequisite but it is
not always performed. It only affects leastconn and fas lb algorithm. Others
don't use these callback functions.

A race condition on the next pending effecive weight (next_eweight) may be
encountered with the leastconn lb algorithm. An agent check may set it to 0
while fwlc_srv_reposition() is called. The server is locked during the
next_eweight update. But because the server lock is not acquired when
fwlc_srv_reposition() is called, we may use it to recompute the server key,
leading to a division by 0.

This patch must be backported as far as 1.8.

BUG/MEDIUM: mux-h1: Get the session from the H1S when capturing bad messages

It is not guaranteed that the backend connection has an owner. It is set when
the connection is created. But when the connection is moved in a server idle
list, the connection owner is set to NULL and may never be set again. On the
other hand, when a mux is created or when a CS is attached, the session is
always defined. The H1 stream always keep a reference on it when it is
created. Thus, when a bad message is captured we should not rely on the
connection owner to retrieve the session. Instead we should get it from the H1
stream.

BUG/MEDIUM: spoe: Unset variable instead of set it if no data provided

If an agent try to set a variable with the NULL data type, an unset is perform
instead to avoid undefined behaviors. Once decoded, such data are translated to
a sample with the type SMP_T_ANY. It is unexpected in HAProxy. When a variable
is set with such sample, no data are attached to the variable. Thus, when the
variable is retrieved later in the transaction, the sample data are
uninitialized, leading to undefined behaviors depending on how it is used. For
instance, it leads to a crash if the debug converter is used on such variable.

This patch should fix the issue #855. It must be backported as far as 1.8.

MEDIUM: backend: reuse connection if using a static sni

Detect if the sni used a constant value and if so, allow to reuse this
connection for later sessions. Use a combination of SMP_USE_INTRN +
!SMP_F_VOLATILE to consider a sample as a constant value.

This features has been requested on github issue #371.

MINOR: ssl: add volatile flags to ssl samples

The ssl samples are not constant over time and change according to the
session. Add the flag SMP_F_VOL_SESS to indicate this.

BUG/MINOR: peers: Possible unexpected peer seesion reset after collisions.

During a peers session collision (two peer sessions opened on both side) we must
mark the peer the session of which will be shutdown as alive, if not ->reconnect
timer will be set with a wrong value if the synchro task expires after the peer
has been reconnected. This possibly leads to unexpected deconnections during handshakes.
Furthermore, this patch cancels any heartbeat tranmimission when a reconnection
is prepared.

BUILD: listener: avoir a build warning when threads are disabled

It's just a __decl_thread() that appeared before the last variable.

MEDIUM: task: use an upgradable seek lock when scanning the wait queue

Right now when running a configuration with many global timers (e.g. many
health checks), there is a lot of contention on the global wait queue
lock because all threads queue up in front of it to scan it.

With 2000 servers checked every 10 milliseconds (200k checks per second),
after 23 seconds running on 8 threads, the lock stats were this high:

  Stats about Lock TASK_WQ:
      write lock  : 9872564
      write unlock: 9872564 (0)
      wait time for write     : 9208.409 msec
      wait time for write/lock: 932.727 nsec
      read lock   : 240367
      read unlock : 240367 (0)
      wait time for read      : 149.025 msec
      wait time for read/lock : 619.991 nsec

i.e. ~5% of the total runtime spent waiting on this specific lock.

With upgradable locks we don't need to work like this anymore. We
can just try to upgade the read lock to a seek lock before scanning
the queue, then upgrade the seek lock to a write lock for each element
we want to delete there and immediately downgrade it to a seek lock.

The benefit is double:
  - all other threads which need to call next_expired_task() before
    polling won't wait anymore since the seek lock is compatible with
    the read lock ;

  - all other threads competing on trying to grab this lock will fail
    on the upgrade attempt from read to seek, and will let the current
    lock owner finish collecting expired entries.

Doing only this has reduced the wake_expired_tasks() CPU usage in a
very large servers test from 2.15% to 1.04% as reported by perf top,
and increased by 3% the health check rate (all threads being saturated).

This is expected to help against (and possibly solve) the problem
described in issue #875.

MINOR: threads: add the transitions to/from the seek state

Since our locks are based on progressive locks, we support the upgradable
seek lock that is compatible with readers and upgradable to a write lock.
The main purpose is to take it while seeking down a tree for modification
while other threads may seek the same tree for an input (e.g. compute the
next event date).

The newly supported operations are:

  HA_RWLOCK_SKLOCK(lbl,l)         pl_take_s(l)      /* N --> S */
  HA_RWLOCK_SKTOWR(lbl,l)         pl_stow(l)        /* S --> W */
  HA_RWLOCK_WRTOSK(lbl,l)         pl_wtos(l)        /* W --> S */
  HA_RWLOCK_SKTORD(lbl,l)         pl_stor(l)        /* S --> R */
  HA_RWLOCK_WRTORD(lbl,l)         pl_wtor(l)        /* W --> R */
  HA_RWLOCK_SKUNLOCK(lbl,l)       pl_drop_s(l)      /* S --> N */
  HA_RWLOCK_TRYSKLOCK(lbl,l)      (!pl_try_s(l))    /* N -?> S */
  HA_RWLOCK_TRYRDTOSK(lbl,l)      (!pl_try_rtos(l)) /* R -?> S */

Existing code paths are left unaffected so this patch doesn't affect
any running code.

MINOR: threads: augment rwlock debugging stats to report seek lock stats

We currently use only read and write lock operations with rwlocks, but
ours also support upgradable seek locks for which we do not report any
stats. Let's add them now when DEBUG_THREAD is enabled.

BUG/MEDIUM: task: bound the number of tasks picked from the wait queue at once

There is a theorical problem in the wait queue, which is that with many
threads, one could spend a lot of time looping on the newly expired tasks,
causing a lot of contention on the global wq_lock and on the global
rq_lock. This initially sounds bening, but if another thread does just
a task_schedule() or task_queue(), it might end up waiting for a long
time on this lock, and this wait time will count on its execution budget,
degrading the end user's experience and possibly risking to trigger the
watchdog if that lasts too long.

The simplest (and backportable) solution here consists in bounding the
number of expired tasks that may be picked from the global wait queue at
once by a thread, given that all other ones will do it as well anyway.

We don't need to pick more than global.tune.runqueue_depth tasks at once
as we won't process more, so this counter is updated for both the local
and the global queues: threads with more local expired tasks will pick
less global tasks and conversely, keeping the load balanced between all
threads. This will guarantee a much lower latency if/when wakeup storms
happen (e.g. hundreds of thousands of synchronized health checks).

Note that some crashes have been witnessed with 1/4 of the threads in
wake_expired_tasks() and, while the issue might or might not be related,
not having reasonable bounds here definitely justifies why we can spend
so much time there.

This patch should be backported, probably as far as 2.0 (maybe with
some adaptations).

BUG/MEDIUM: proxy: properly stop backends

The proxy stopping mechanism was changed with commit 322b9b94e ("MEDIUM:
proxy: make stop_proxy() now use stop_listener()") so that it's now
entirely driven by the listeners. One thing was forgotten though, which
is that pure backends will not stop anymore since they don't have any
listener, and that it's necessary to stop them in order to stop the
health checks.

No backport is needed.

DOC: improve the documentation for "option nolinger"

This reminds the different issues caused by option nolinger, as discussed
in issue #896.

CLEANUP: protocol: remove the now unused <handler> field of proto_fam->bind()

We don't need to specify the handler anymore since it's set in the
receiver. Let's remove this argument from the function and clean up
the remains of code that were still setting it.

MINOR: protocol: register the receiver's I/O handler and not the protocol's

Now we define a new sock_accept_iocb() for socket-based stream protocols
and use it as a wrapper for listener_accept() which now takes a listener
and not an FD anymore. This will allow the receiver's I/O cb to be
redefined during registration, and more specifically to get rid of the
hard-coded hacks in protocol_bind_all() made for syslog.

The previous ->accept() callback in the protocol was removed since it
doesn't have anything to do with accept() anymore but is more generic.
A few places where listener_accept() was compared against the FD's IO
callback for debugging purposes on the CLI were updated.

MINOR: log: set the UDP receiver's I/O handler in the receiver

The I/O handler is syslog_fd_handler(), let's set it when creating
the receivers.

MINOR: protocol: add a default I/O callback and put it into the receiver

For now we're still using the protocol's default accept() function as
the I/O callback registered by the receiver into the poller. While
this is usable for most TCP connections where a listener is needed,
this is not suitable for UDP where a different handler is needed.

Let's make this configurable in the receiver just like the upper layer
is configurable for listeners. In order to ease stream protocols
handling, the protocols will now provide a default I/O callback
which will be preset into the receivers upon allocation so that
almost none of them has to deal with it.

MEDIUM: listener: remove the second pass of fd manipulation at the end

The receiver FDs must not be manipulated by the listener_accept()
function anymore, it must exclusively rely on the job performed by
its listeners, as it is also the only way to keep the receivers
working for established connections regardless of the listener's
state (typically for multiplexed protocols like QUIC). This used
to be necessary when the FDs were adjusted at once only but now
that fd_done() is gone and the need for polling enabled by the
accept_conn() function which detects the EAGAIN, we have nothing
to do there to fixup any possible previous bad decision anymore.

Interestingly, as a side effect of making the code not depend on
the FD anymore, it also removes the need for a second lock, which
increase the accept rate by about 1% on 8 threads.

MEDIUM: listener: use protocol->accept_conn() to accept a connection

Now listener_accept() doesn't have to deal with the incoming FD anymore
(except for a little bit of side band stuff). It directly retrieves a
valid connection from the protocol layer, or receives a well-defined
error code that helps it decide how to proceed. This removes a lot of
hardly maintainable low-level code and opens the function to receive
new protocol stacks.

MINOR: sockpair: implement sockpair_accept_conn() to accept a connection

This is the same as previous commit, but this time for the sockpair-
specific stuff, relying on recv_fd_uxst() instead of accept(), so the
code is simpler. The various errno cases are handled like for regular
sockets, though some of them will probably never happen, but this does
not hurt.

MINOR: sock: implement sock_accept_conn() to accept a connection

The socket-specific accept() code in listener_accept() has nothing to
do there. Let's move it to sock.c where it can be significantly cleaned
up. It will now directly return an accepted connection and provide a
status code instead of letting listener_accept() deal with various errno
values. Note that this doesn't support the sockpair specific code.

The function is now responsible for dealing with its own receiver's
polling state and calling fd_cant_recv() when facing EAGAIN.

One tiny change from the previous implementation is that the connection's
sockaddr is now allocated before trying accept(), which saves a memcpy()
of the resulting address for each accept at the expense of a cheap
pool_alloc/pool_free on the final accept returning EAGAIN. This still
apparently slightly improves accept performance in microbencharks.

MINOR: protocol: add a new function accept_conn()

This per-protocol function will be used to accept an incoming
connection and return it as a struct connection*. As such the protocol
stack's internal representation of a connection will not need to be
handled by the listener code.

MINOR: sock: rename sock_accept_conn() to sock_accepting_conn()

This call was introduced by commit 5ced3e887 ("MINOR: sock: add
sock_accept_conn() to test a listening socket") but is actually quite
confusing because it makes one think the socket will accept a connection
(which is what we want to have in a new function) while it only tells
whether it's configured to accept connections. Let's call it
sock_accepting_conn() instead.

The same change was applied to sockpair which had the same issue.

MINOR: connection: add new error codes for accept_conn()

accept_conn() will be used to accept an incoming connection and return it.
It will have to deal with various error codes. The currently identified
ones were created as CO_AC_*.

MINOR: session: simplify error path in session_accept_fd()

Now that this function is always called with an initialized connection
and that the control layer is always initialized, we don't need to play
games with fdtab[] to decide how to close, we can simply rely on the
regular close path using conn_ctrl_close(), which can be fused with
conn_xprt_close() into conn_full_close().

The code is cleaner because the FD is now used only for some
protocol-specific setup (that will eventually have to move) and to
try to send a hard-coded HTTP 500 error message on raw sockets.

MEDIUM: listener: allocate the connection before queuing a new connection

Till now we would keep a per-thread queue of pending incoming connections
for which we would store:
  - the listener
  - the accepted FD
  - the source address
  - the source address' length

And these elements were first used in session_accept_fd() running on the
target thread to allocate a connection and duplicate them again. Doing
this induces various problems. The first one is that session_accept_fd()
may only run on file descriptors and cannot be reused for QUIC. The second
issue is that it induces lots of memory copies and that the listerner
queue thrashes a lot of cache, consuming 64 bytes per entry.

This patch changes this by allocating the connection before queueing it,
and by only placing the connection's pointer into the queue. Indeed, the
first two calls used to initialize the connection already store all the
information above, which can be retrieved from the connection pointer
alone. So we just have to pop one pointer from the target thread, and
pass it to session_accept_fd() which only needs the FD for the final
settings.

This starts to make the accept path a bit more transport-agnostic, and
saves memory and CPU cycles at the same time (1% connection rate increase
was noticed with 4 threads). Thanks to dividing the accept-queue entry
size from 64 to 8 bytes, its size could be increased from 256 to 1024
connections while still dividing the overall size by two. No single
queue full condition was met.

One minor drawback is that connection may be allocated from one thread's
pool to be used into another one. But this already happens a lot with
connection reuse so there is really nothing new here.

MINOR: connection: make sockaddr_alloc() take the address to be copied

Roughly half of the calls to sockadr_alloc() are made to copy an already
known address. Let's optionally pass it in argument so that the function
can handle the copy at the same time, this slightly simplifies its usage.

CLEANUP: fd: finally get rid of fd_done_recv()

fd_done_recv() used to be useful with the FD cache because it used to
allow to keep a file descriptor active in the poller without being
marked as ready in the cache, saving it from ringing immediately,
without incurring any system call. It was a way to make it yield
to wait for new events leaving a bit of time for others. The only
user left was the connection accepter (listen_accept()). We used
to suspect that with the FD cache removal it had become totally
useless since changing its readiness or not wouldn't change its
status regarding the poller itself, which would be the only one
deciding to report it again.

Careful tests showed that it indeed has exactly zero effect nowadays,
the syscall numbers are exactly the same with and without, including
when enabling edge-triggered polling.

Given that there's no more API available to manipulate it and that it
was directly called as an optimization from listener_accept(), it's
about time to remove it.

CLEANUP: protocol: remove the ->drain() function

No protocol defines it anymore. The last user used to be the monitor-net
stuff that got partially broken already when the tcp_drain() function
moved to conn_sock_drain() with commit e215bba95 ("MINOR: connection:
make conn_sock_drain() work for all socket families") in 1.9-dev2.

A part of this will surely move back later when non-socket connections
arrive with QUIC but better keep the API clean and implement what's
needed in time instead.

MEDIUM: proxy: remove obsolete "monitor-net"

As discussed here during 2.1-dev, "monitor-net" is totally obsolete:

https://www.mail-archive.com/haproxy@formilux.org/msg35204.html

It's fundamentally incompatible with usage of SSL, and imposes the
presence of file descriptors with hard-coded syscalls directly in the
generic accept path.

It's very unlikely that anyone has used it in the last 10 years for
anything beyond testing. In the worst case if anyone would depend
on it, replacing it with "http-request return status 200 if ..." and
"mode http" would certainly do the trick.

The keyword is still detected as special by the config parser to help
users update their configurations appropriately.

MEDIUM: proxy: remove obsolete "mode health"

As discussed here during 2.1-dev, "mode health" is totally obsolete:

https://www.mail-archive.com/haproxy@formilux.org/msg35204.html

It's fundamentally incompatible with usage of SSL, doesn't support
source filtering, and imposes the presence of file descriptors with
hard-coded syscalls directly in the generic accept path.

It's very unlikely that anyone has used it in the last 10 years for
anything beyond testing. In the worst case if anyone would depend
on it, replacing it with "http-request return status 200" and "mode
http" would certainly do the trick.

The keyword is still detected as special by the config parser to help
users update their configurations appropriately.

DOC: Describe reuse safe for HOL handling

Explain the special case of server connections using a protocol subject
to HOL on http-reuse safe mode.

MEDIUM: fcgi: remove conn from session on detach

FCGI mux is marked with HOL blocking. On safe reuse mode, the connection
using it are placed on the sessions instead of the available lists to
avoid sharing it with several clients. On detach, if they are no more
streams, remove the connection from the session before adding it to the
idle list. If there is still used streams, do not add it to available
list as it should be already on the session list.

MEDIUM: h2: remove conn from session on detach

H2 mux is marked with HOL blocking. On safe reuse mode, the connection
using it are placed on the sessions instead of the available lists to
avoid sharing it with several clients. On detach, if they are no more
streams, remove the connection from the session before adding it to the
idle list. If there is still used streams, do not add it to available
list as it should be already on the session list.

MEDIUM: backend: add reused conn to sess if mux marked as HOL blocking

If a connection is using a mux protocol subject to HOL blocking, add it
to the session instead of the available list to avoid sharing it with
other clients on connection reuse.

MEDIUM: backend: add new conn to session if mux marked as HOL blocking

When allocating a new session on connect_server, if the mux protocol is
marked as subject of HOL blocking, add it into session instead of
available list to avoid sharing it with other clients.

MINOR: connection: don't check priv flag on free

Do not check CO_FL_PRIVATE flag to check if the connection is in session
list on conn_free. This is necessary due to the future patches which add
server connections in the session list even if not private, if the mux
protocol is the subject of HOL blocking.

MINOR: mux/connection: add a new mux flag for HOL risk

This flag is used to indicate if the mux protocol is subject to
head-of-line blocking problem.

MINOR: connection: improve list api usage

Replace !LIST_ISEMPTY by LIST_ADDED and LIST_DEL+LIST_INIT by
LIST_DEL_INIT for connection session list.

BUG/MEDIUM: connection: fix srv idle count on conn takeover

On server connection migration from one thread to another, the wrong
idle thread-specific counter is decremented. This bug was introduced
since commit 3d52f0f1f828acb2a74b3f13fcc3fa069106d09f due to the
factorization with srv_use_idle_conn. However, this statement is only
executed from conn_backend_get. Extract the decrement from
srv_use_idle_conn in conn_backend_get and use the correct
thread-specific counter.

Rename the function to srv_use_conn to better reflect its purpose as it
is also used with a newly initialized connection not in the idle list.

As a side change, the connection insertion to available list has also
been extracted to conn_backend_get. This will be useful to be able to
specify an alternative list for protocol subject to HOL risk that should
not be shared between several clients.

This bug is only present in this release and thus do not need a backport.

BUG/MINOR: connection: fix loop iter on connection takeover

The loop always missed one iteration due to the incrementation done on
the for check. Move the incrementation on the loop last statement to fix
this behaviour.

This bug has a very limited impact, not at all visible to the user, but
could be backported to 2.2.

BUG/MEDIUM: deinit: check fdtab before fdtab[fd].owner

When running a pure config check (haproxy -c) we go through the deinit
phase without having allocated fdtab, so we can't blindly dereference
it. The issue was added by recent commit ae7bc4a23 ("MEDIUM: deinit:
close all receivers/listeners before scanning proxies"), no backport is
needed.

CLEANUP: protocol: intitialize all of the sockaddr when disconnecting

In issue #894, Coverity suspects uninitialized values for a socket's
address whose family is AF_UNSPEC but it doesn't know that the address
is not used in this case. It's not on a critical path and working around
it is trivial, let's fully declare the address. We're doing it for both
TCP and UDP, because the same principle appears at two places.

CONTRIB: tcploop: implement a disconnect operation 'D'

This performs a connect(AF_UNSPEC) over an existing connection. This is
mainly for compatibility testing. At this step it only seems to work on
linux for TCP sockets (both listening and established), while SO_LINGER
successfully resets established connections on freebsd and aix.

BUG/MINOR: listener: detect and handle shared sockets stopped in other processes

It may happen that during a temporary listener pause resulting from a
SIGTTOU, one process gets one of its sockets disabled by another process
and will not be able to recover from this situation by itself. For the
protocols supporting this (TCPv4 and TCPv6 at the moment) this situation
is detectable, so when this happens, let's put the listener into the
PAUSED state so that it remains consistent with the real socket state.
One nice effect is that just sending the SIGTTIN signal to the process
is enough to recover the socket in this case.

There is no need to backport this, this behavior has been there forever
and the fix requires to reimplement the getsockopt() call there.

CLEANUP: unix: make use of sock_accept_conn() where relevant

This allows to get rid of one getsockopt(SO_ACCEPTCONN) in the binding
code.

CLEANUP: tcp: make use of sock_accept_conn() where relevant

This allows to get rid of two getsockopt(SO_ACCEPTCONN).

MINOR: sockpair: implement the .rx_listening function

For socket pairs we don't rely on a real listening socket but we need to
have a properly connected UNIX stream socket. This is what the new
sockpair_accept_conn() tries to report. Some corner cases like half
shutdown will still not be detected but that should be sufficient for
most cases we really care about.

MINOR: protocol: make proto_tcp & proto_uxst report listening sockets

Now we introdce a new .rx_listening() function to report if a receiver is
actually a listening socket. The reason for this is to help detect shared
sockets that might have been broken by sibling processes.

MINOR: sock: add sock_accept_conn() to test a listening socket

At several places we need to check if a socket is still valid and still
willing to accept connections. Instead of open-coding this, each time,
let's add a new function for this.

MINOR: proto-tcp: make use of connect(AF_UNSPEC) for the pause

Currently the suspend/resume mechanism for listeners only works on Linux
and we resort to a number of tricks involving shutdown+listen+shutdown
to try to detect failures on other operating systems that do not support
it. But on Linux connect(AF_UNSPEC) also works pretty well and is much
cleaner. It still doesn't work on other operating systems but the error
is easier to detect and appears safer. So let's switch to this.

MINOR: fd: report an error message when failing initial allocations

When starting with a huge maxconn (say 1 billion), the only error seen
is "No polling mechanism available". This doesn't help at all to resolve
the problem. Let's add specific alerts for the failed mallocs. Now we can
get this instead:

[ALERT] 286/154439 (23408) : Not enough memory to allocate 2000000033 entries for fdtab!

This may be backported as far as 2.0 as it helps debugging bad configurations.

BUG/MINOR: mux-h2: do not stop outgoing connections on stopping

There are reports of a few "SC" in logs during reloads when H2 is used
on the backend side. Christopher analysed this as being caused by the
proxy disabled test in h2_process(). As the comment says, this was done
for frontends only, and must absolutely not send a GOAWAY to the backend,
as all it will result in is to make newly queued streams fail.

The fix consists in simply testing the connection side before deciding
to send the GOAWAY.

This may be backported as far as 2.0, though for whatever reason it seems
to manifest itself only since 2.2 (probably due to changes in the outgoing
connection setup sequence).

BUG/MINOR: init: only keep rlim_fd_cur if max is unlimited

On some operating systems, RLIM_INFINITY is set to -1 so that when the
hard limit on the number of FDs is set to unlimited, taking the MAX
of both values keeps rlim_fd_cur and everything works. But on other
systems this values is defined as the highest positive integer. This
is what was observed on a 32-bit AIX 5.1. The effect is that maxsock
becomes 2^31-1 and that fdtab allocation fails.

Note that a simple workaround consists in manually setting maxconn in
the global section.

Let's ignore unlimited as soon as we retrieve rlim_fd_max so that all
systems behave consistently.

This may be backported as far as 2.0, though it doesn't seem like it
has annoyed anyone.

CI: travis-ci: replace not defined SSL_LIB, SSL_INC for BotringSSL builds

after 73b520b958be4ee79b4f09a32d6718a13dc2d1fd variables SSL_LIB, SSL_INC
are not set, but still used by BoringSSL builds. That leads to error
(I wish we could stop on such errors) and using stock openssl instead
of boringssl

[RELEASE] Released version 2.3-dev6

Released version 2.3-dev6 with the following main changes :
    - REGTESTS: use "command" instead of "which" for better POSIX compatibility
    - BUILD: makefile: Update feature flags for OpenBSD
    - DOC: agent-check: fix typo in "fail" word expected reply
    - DOC: crt: advise to move away from cert bundle
    - BUG/MINOR: ssl/crt-list: exit on warning out of crtlist_parse_line()
    - REGTEST: fix host part in balance-uri-path-only.vtc
    - REGTEST: make ssl_client_samples and ssl_server_samples requiret to 2.3
    - REGTEST: the iif converter test requires 2.3
    - REGTEST: make agent-check.vtc require 1.8
    - REGTEST: make abns_socket.vtc require 1.8
    - REGTEST: make map_regm_with_backref require 1.7
    - BUILD: makefile: Update feature flags for FreeBSD
    - OPTIM: backend/random: never queue on the server, always on the backend
    - OPTIM: backend: skip LB when we know the backend is full
    - BUILD: makefile: Fix building with closefrom() support enabled
    - BUILD: makefile: add an EXTRAVERSION variable to ease local naming
    - MINOR: tools: support for word expansion of environment in parse_line
    - BUILD: tools: fix minor build issue on isspace()
    - BUILD: makefile: Enable closefrom() support on Solaris
    - CLEANUP: ssl: Use structured format for error line report during crt-list parsing
    - MINOR: ssl: Add error if a crt-list might be truncated
    - MINOR: ssl: remove uneeded check in crtlist_parse_file
    - BUG/MINOR: Fix several leaks of 'log_tag' in init().
    - DOC: tcp-rules: Refresh details about L7 matching for tcp-request content rules
    - MEDIUM: tcp-rules: Warn if a track-sc* content rule doesn't depend on content
    - BUG/MINOR: tcpcheck: Set socks4 and send-proxy flags before the connect call
    - DOC: ssl: new "cert bundle" behavior
    - BUG/MEDIUM: queue: make pendconn_cond_unlink() really thread-safe
    - CLEANUP: ssl: "bundle" is not an OpenSSL wording
    - MINOR: counters: fix a typo in comment
    - BUG/MINOR: stats: fix validity of the json schema
    - REORG: stats: export some functions
    - MINOR: stats: add stats size as a parameter for csv/json dump
    - MINOR: stats: hide px/sv/li fields in applet struct
    - REORG: stats: extract proxy json dump
    - REORG: stats: extract proxies dump loop in a function
    - MINOR: hlua: Display debug messages on stderr only in debug mode
    - MINOR: stats: define the concept of domain for statistics
    - MINOR: stats: define additional flag px cap on domain
    - MEDIUM: stats: add delimiter for static proxy stats on csv
    - MEDIUM: stats: define an API to register stat modules
    - MEDIUM: stats: add abstract type to store counters
    - MEDIUM: stats: integrate static proxies stats in new stats
    - MINOR: stats: support clear counters for dynamic stats
    - MINOR: stats: display extra proxy stats on the html page
    - MINOR: stats: add config "stats show modules"
    - MINOR: dns/stats: integrate dns counters in stats
    - MINOR: stats: remove for loop declaration
    - DOC: ssl: fix typo about ocsp files
    - BUG/MINOR: peers: Inconsistency when dumping peer status codes.
    - DOC: update INSTALL with supported OpenBSD / FreeBSD versions
    - BUG/MINOR: proto_tcp: Report warning messages when listeners are bound
    - CLEANUP: cache: Fix leak of cconf->c.name during config check
    - CLEANUP: ssl: Release cached SSL sessions on deinit
    - BUG/MINOR: mux-h1: Be sure to only set CO_RFL_READ_ONCE for the first read
    - BUG/MINOR: mux-h1: Always set the session on frontend h1 stream
    - MINOR: mux-h1: Don't wakeup the H1C when output buffer become available
    - CLEANUP: sock-unix: Remove an unreachable goto clause
    - BUG/MINOR: proxy: inc req counter on new syslog messages.
    - BUG/MEDIUM: log: old processes with log foward section don't die on soft stop.
    - MINOR: stats: inc req counter on listeners.
    - MINOR: channel: new getword and getchar functions on channel.
    - MEDIUM: log: syslog TCP support on log forward section.
    - BUG/MINOR: proxy/log: frontend/backend and log forward names must differ
    - DOC: re-work log forward bind statement documentation.
    - DOC: fix a confusing typo on a regsub example
    - BUILD: Add a DragonFlyBSD target
    - BUG/MINOR: makefile: fix a tiny typo in the target list
    - BUILD: makefile: Update feature flags for NetBSD
    - CI: travis-ci: help Coverity to detect BUG_ON() as a real stop
    - DOC: Add missing stats fields in the management doc
    - BUG/MEDIUM: mux-fcgi: Don't handle pending read0 too early on streams
    - BUG/MEDIUM: mux-h2: Don't handle pending read0 too early on streams
    - DOC: Fix typos in configuration.txt
    - BUG/MINOR: http: Fix content-length of the default 500 error
    - BUG/MINOR: http-htx: Expect no body for 204/304 internal HTTP responses
    - REGTESTS: mark abns_socket as broken
    - MEDIUM: fd: always wake up one thread when enabling a foreing FD
    - MEDIUM: listeners: don't bounce listeners management between queues
    - MEDIUM: init: stop disabled proxies after initializing fdtab
    - MEDIUM: listeners: make unbind_listener() converge if needed
    - MEDIUM: deinit: close all receivers/listeners before scanning proxies
    - MEDIUM: listeners: remove the now unused ZOMBIE state
    - MINOR: listeners: do not uselessly try to close zombie listeners in soft_stop()
    - CLEANUP: proxy: remove the first_to_listen hack in zombify_proxy()
    - MINOR: listeners: introduce listener_set_state()
    - MINOR: proxy: maintain per-state counters of listeners
    - MEDIUM: proxy: remove the unused PR_STFULL state
    - MEDIUM: proxy: remove the PR_STERROR state
    - MEDIUM: proxy: remove state PR_STPAUSED
    - MINOR: startup: don't rely on PR_STNEW to check for listeners
    - CLEANUP: peers: don't use the PR_ST* states to mark enabled/disabled
    - MEDIUM: proxy: replace proxy->state with proxy->disabled
    - MEDIUM: proxy: remove start_proxies()
    - MEDIUM: proxy: merge zombify_proxy() with stop_proxy()
    - MINOR: listeners: check the current listener state in pause_listener()
    - MINOR: listeners: check the current listener earlier state in resume_listener()
    - MEDIUM: listener/proxy: make the listeners notify about proxy pause/resume
    - MINOR: protocol: introduce protocol_{pause,resume}_all()
    - MAJOR: signals: use protocol_pause_all() and protocol_resume_all()
    - CLEANUP: proxy: remove the now unused pause_proxies() and resume_proxies()
    - MEDIUM: proto_tcp: make the pause() more robust in multi-process
    - BUG/MEDIUM: listeners: correctly report pause() errors
    - MINOR: listeners: move fd_stop_recv() to the receiver's socket code
    - CLEANUP: protocol: remove the ->disable_all method
    - CLEANUP: listeners: remove unused disable_listener and disable_all_listeners
    - MINOR: listeners: export enable_listener()
    - MINOR: protocol: directly call enable_listener() from protocol_enable_all()
    - CLEANUP: protocol: remove the ->enable_all method
    - CLEANUP: listeners: remove the now unused enable_all_listeners()
    - MINOR: protocol: rename the ->listeners field to ->receivers
    - MINOR: protocol: replace ->pause(listener) with ->rx_suspend(receiver)
    - MINOR: protocol: implement an ->rx_resume() method
    - MINOR: listener: use the protocol's ->rx_resume() method when available
    - MINOR: sock: provide a set of generic enable/disable functions
    - MINOR: protocol: add a new pair of rx_enable/rx_disable methods
    - MINOR: protocol: add a new pair of enable/disable methods for listeners
    - MEDIUM: listeners: now use the listener's ->enable/disable
    - MINOR: listeners: split delete_listener() in two versions
    - MINOR: listeners: count unstoppable jobs on creation, not deletion
    - MINOR: listeners: add a new stop_listener() function
    - MEDIUM: proxy: make stop_proxy() now use stop_listener()
    - MEDIUM: proxy: add mode PR_MODE_PEERS to flag peers frontends
    - MEDIUM: proxy: centralize proxy status update and reporting
    - MINOR: protocol: add protocol_stop_now() to instant-stop listeners
    - MEDIUM: proxy: make soft_stop() stop most listeners using protocol_stop_now()
    - MEDIUM: udp: implement udp_suspend() and udp_resume()
    - MINOR: listener: add a few BUG_ON() statements to detect inconsistencies
    - MEDIUM: listeners: always close master vs worker listeners
    - BROKEN/MEDIUM: listeners: rework the unbind logic to make it idempotent
    - MEDIUM: listener: let do_unbind_listener() decide whether to close or not
    - CLEANUP: listeners: remove the do_close argument to unbind_listener()
    - MINOR: listeners: move the LI_O_MWORKER flag to the receiver
    - MEDIUM: receivers: add an rx_unbind() method in the protocols
    - MINOR: listeners: split do_unbind_listener() in two
    - MEDIUM: listeners: implement protocol level ->suspend/resume() calls
    - MEDIUM: config: mark "grace" as deprecated
    - MEDIUM: config: remove the deprecated and dangerous global "debug" directive
    - BUG/MINOR: proxy: respect the proper format string in sig_pause/sig_listen
    - MINOR: peers: heartbeat, collisions and handshake information for "show peers" command.
    - BUILD: makefile: Enable getaddrinfo() on OS/X

BUILD: makefile: Enable getaddrinfo() on OS/X

Enable getaddrinfo() on OS/X.

MINOR: peers: heartbeat, collisions and handshake information for "show peers" command.

This patch adds "coll" new counter and the heartbeat timer values to "show peers"
command. It also adds the elapsed time since the last handshake to new "last_hdshk"
new peer dump field.

BUG/MINOR: proxy: respect the proper format string in sig_pause/sig_listen

When factoring out the pause/resume error messages in commit 775e00158
("MAJOR: signals: use protocol_pause_all() and protocol_resume_all()")
I forgot that ha_warning() and send_log() take a format string and not
just a const string. No backport is needed, this is 2.3-dev.

MEDIUM: config: remove the deprecated and dangerous global "debug" directive

This one was scheduled for removal in 2.3 since 2.2-dev3 by commit
1b85785bc ("MINOR: config: mark global.debug as deprecated"). Let's
remove it now. It remains totally possible to use -d on the command
line though.

MEDIUM: config: mark "grace" as deprecated

This was introduced 15 years ago or so to delay the stopping of some
services so that a monitoring device could detect its port being down
before services were stopped. Since then, clean reloads were implemented
and this doesn't cope well with reload at all, preventing the new process
from seamlessly binding, and forcing processes to coexist with half-baked
configurations.

Now it has become a real problem because there's a significant code
portion in the proxies that is solely dedicated to this obsolete feature,
and dealing with its special cases eases the introduction of bugs in
other places so it's about time that it goes.

We could tentatively schedule its removal for 2.4 with a hard deadline
for 2.5 in any case.

MEDIUM: listeners: implement protocol level ->suspend/resume() calls

Now we have ->suspend() and ->resume() for listeners at the protocol
level. This means that it now becomes possible for a protocol to redefine
its own way to suspend and resume. The default functions are provided for
TCP, UDP and unix, and they are pass-through to the receiver equivalent
as it used to be till now. Nothing was defined for sockpair since it does
not need to suspend/resume during reloads, hence it will succeed.

MINOR: listeners: split do_unbind_listener() in two

The inner part now goes into the protocol and is used to decide how to
unbind a given protocol's listener. The existing code which is able to
also unbind the receiver was provided as a default function that we
currently use everywhere. Some complex listeners like QUIC will use this
to decide how to unbind without impacting existing connections, possibly
by setting up other incoming paths for the traffic.

MEDIUM: receivers: add an rx_unbind() method in the protocols

This is used as a generic way to unbind a receiver at the end of
do_unbind_listener(). This allows to considerably simplify that function
since we can now let the protocol perform the cleanup. The generic code
was moved to sock.c, along with the conditional rx_disable() call. Now
the code also supports that the ->disable() function of the protocol
which acts on the listener performs the close itself and adjusts the
RX_F_BUOND flag accordingly.

MINOR: listeners: move the LI_O_MWORKER flag to the receiver

This listener flag indicates whether the receiver part of the listener
is specific to the master or to the workers. In practice it's only used
by the master's CLI right now. It's used to know whether or not the FD
must be closed before forking the workers. For this reason it's way more
of a receiver's property than a listener's property, so let's move it
there under the name RX_F_MWORKER. The rest of the code remains
unchanged.

CLEANUP: listeners: remove the do_close argument to unbind_listener()

And also remove it from its callers. This subtle distinction was added as
sort of a hack for the seamless reload feature but is not needed anymore
since the do_close turned unused since commit previous commit ("MEDIUM:
listener: let do_unbind_listener() decide whether to close or not").
This also removes the unbind_listener_no_close() function.

MEDIUM: listener: let do_unbind_listener() decide whether to close or not

The listener contains all the information needed to decide to close on
unbind or not. The rule is the following (when we're not stopping):

  - worker process unbinding from a worker's FD with socket transfer enabled => keep
  - master process unbinding from a master's inherited FD => keep
  - master process unbinding from a master's FD => close
  - master process unbinding from a worker's FD => close
  - worker process unbinding from a master's FD => close
  - worker process unbinding from a worker's FD => close

Let's translate that into the function and stop using the do_close
argument that is a bit obscure for callers. It was not yet removed
to ease code testing.

BROKEN/MEDIUM: listeners: rework the unbind logic to make it idempotent

BROKEN: the failure rate on reg-tests/seamless-reload/abns_socket.vtc has
significantly increased for no obvious reason. It fails 99% of the time vs
10% before.

do_unbind_listener() is not logical and is not even idempotent. It must
not touch the fd if already -1, which also means not touch the receiver.
In addition, when performing a partial stop on a socket (not closing),
we know the socket remains in the listening state yet it's marked as
LI_ASSIGNED, which is confusing as it doesn't translate its real state.

With this change, we make sure that FDs marked for close end up in
ASSIGNED state and that those which are really bound and on which a
listen() was made (i.e. not pause) remain in LISTEN state. This is what
is closest to reality.

Ideally this function should become a default proto->unbind() one but
it may still keep a bit too much state logic to become generalized to
other protocols (e.g. QUIC).

MEDIUM: listeners: always close master vs worker listeners

Right now in enable_listener(), we used to start all enabled
listeners then kill from the workers those that were for the master.
But this is incomplete. We must also close from the master the
listeners that are solely for workers, and do it before we even
start them. Otherwise we end up with a master responding to the
worker CLI connections if the listener remains in listen mode to
translate the socket's real state.

It doesn't seem like it could have caused bugs in the past because we
used to aggressively mark disabled listeners as LI_ASSIGNED despite
the fact that they were still bound and listening. If this patch were
ever seen as a candidate solution for any obscure bug, be careful in
that it subtly relies on the fact that fd_delete() doesn't close
inherited FDs anymore, otherwise that could break the master's ability
to pass inherited FDs on reloads.

MINOR: listener: add a few BUG_ON() statements to detect inconsistencies

We must not have an fd==-1 when switching to certain states. This will
later disappear but for now it helps detecting inconsistencies.

MEDIUM: udp: implement udp_suspend() and udp_resume()

In Linux kernel's net/ipv4/udp.c there's a udp_disconnect() function
which is called when connecting to AF_UNSPEC, and which unhashes a
"connection". This property, which is also documented in connect(2)
both in Linux and Open Group's man pages for datagrams, is interesting
because it allows to reverse a connect() which is in fact a filter on
the source. As such we can suspend a receiver by making it connect to
itself, which will cause it not to receive any traffic anymore, letting
a new one receive it all, then resume it by breaking this connection.

This was tested to work well on Linux, other operating systems should
also be tested. Before this, sending a SIGTTOU to a process having a
UDP syslog forwarder would cause this error:

  [WARNING] 280/194249 (3268) : Paused frontend GLOBAL.
  [WARNING] 280/194249 (3268) : Some proxies refused to pause, performing soft stop now.
  [WARNING] 280/194249 (3268) : Proxy GLOBAL stopped (cumulated conns: FE: 0, BE: 0).
  [WARNING] 280/194249 (3268) : Proxy sylog-loadb stopped (cumulated conns: FE: 0, BE: 0).

With this change, it now proceeds just like with TCP listeners:

  [WARNING] 280/195503 (3885) : Paused frontend GLOBAL.
  [WARNING] 280/195503 (3885) : Paused frontend sylog-loadb.

And SIGTTIN also works:

  [WARNING] 280/195507 (3885) : Resumed frontend GLOBAL.
  [WARNING] 280/195507 (3885) : Resumed frontend sylog-loadb.

On Linux this also works with TCP listeners (which can then be resumed
using listen()) and established TCP sockets (which we currently kill
using setsockopt(so_linger)), both not being portable on other OSes.
UNIX sockets and ABNS sockets do not support it however (connect
always fails). This needs to be further explored to see if other OSes
might benefit from this to perform portable and reliable resets
particularly on the backend side.