Willy Tarreau [Tue, 20 May 2014 12:55:13 +0000 (14:55 +0200)]
MAJOR: checks: move health checks changes to set_server_check_status()
We don't want to manipulate check's statuses anymore in functions
which modify the server's state. So since any check is forced to
call set_server_check_status() exactly once to report the result
of the check, it's the best place to update the check's health.
Willy Tarreau [Fri, 16 May 2014 15:37:50 +0000 (17:37 +0200)]
MEDIUM: checks: simplify server up/down/nolb transitions
We don't have to handle the maintenance transition here anymore so we
can simplify the functions and conditions. This also means that we don't
need the disable/enable functions but only a function to switch to each
new state.
It's worth mentionning that at this stage there are still confusions
between the server state and the checks states. For example, the health
check's state is adjusted from tracked servers changing state, while it
should not be.
Willy Tarreau [Fri, 16 May 2014 11:52:00 +0000 (13:52 +0200)]
MEDIUM: server: allow multi-level server tracking
Now that it is possible to know whether a server is in forced maintenance
or inherits its maintenance status from another one, it is possible to
allow server tracking at more than one level. We still provide a loop
detection however.
Note that for the stats it's a bit trickier since we have to report the
check state which corresponds to the state of the server at the end of
the chain.
Willy Tarreau [Fri, 16 May 2014 09:25:16 +0000 (11:25 +0200)]
MEDIUM: server: properly support and propagate the maintenance status
This change now involves a new flag SRV_ADMF_IMAINT to note that the
maintenance status of a server is inherited from another server. Thus,
we know at each server level in the chain if it's running, in forced
maintenance or in a maintenance status because it tracks another server,
or even in both states.
Disabling a server propagates this flag down to other servers. Enabling
a server flushes the flag down. A server becomes up again once both of
its flags are cleared.
Two new functions "srv_adm_set_maint()" and "srv_adm_set_ready()" are used to
manipulate this maintenance status. They're used by the CLI and the stats
page.
Now the stats page always says "MAINT" instead of "MAINT(via)" and it's
only the chk/down field which reports "via x/y" when the status is
inherited from another server, but it doesn't say it when a server was
forced into maintenance. The CSV output indicates "MAINT (via x/y)"
instead of only "MAINT(via)". This is the most accurate representation.
One important thing is that now entering/leaving maintenance for a
tracking server correctly follows the state of the tracked server.
Willy Tarreau [Fri, 16 May 2014 09:48:10 +0000 (11:48 +0200)]
REORG: checks: put the functions in the appropriate files !
Checks.c has become a total mess. A number of proxy or server maintenance
and queue management functions were put there probably because they were
used there, but that makes the code untouchable. And that's without saying
that their names does not always relate to what they really do!
So let's do a first pass by moving these ones :
- set_backend_down() => backend.c
- redistribute_pending() => queue.c:pendconn_redistribute()
- check_for_pending() => queue.c:pendconn_grab_from_px()
- shutdown_sessions => server.c:srv_shutdown_sessions()
- shutdown_backup_sessions => server.c:srv_shutdown_backup_sessions()
Willy Tarreau [Tue, 13 May 2014 21:41:20 +0000 (23:41 +0200)]
MAJOR: server: use states instead of flags to store the server state
Servers used to have 3 flags to store a state, now they have 4 states
instead. This avoids lots of confusion for the 4 remaining undefined
states.
The encoding from the previous to the new states can be represented
this way :
SRV_STF_RUNNING
| SRV_STF_GOINGDOWN
| | SRV_STF_WARMINGUP
| | |
0 x x SRV_ST_STOPPED
1 0 0 SRV_ST_RUNNING
1 0 1 SRV_ST_STARTING
1 1 x SRV_ST_STOPPING
Note that the case where all bits were set used to exist and was randomly
dealt with. For example, the task was not stopped, the throttle value was
still updated and reported in the stats and in the http_server_state header.
It was the same if the server was stopped by the agent or for maintenance.
It's worth noting that the internal function names are still quite confusing.
Willy Tarreau [Tue, 13 May 2014 17:44:56 +0000 (19:44 +0200)]
REORG/MEDIUM: server: move the maintenance bits out of the server state
Now we introduce srv->admin and srv->prev_admin which are bitfields
containing one bit per source of administrative status (maintenance only
for now). For the sake of backwards compatibility we implement a single
source (ADMF_FMAINT) but the code already checks any source (ADMF_MAINT)
where the STF_MAINTAIN bit was previously checked. This will later allow
us to add ADMF_IMAINT for maintenance mode inherited from tracked servers.
Along doing these changes, it appeared that some places will need to be
revisited when implementing the inherited bit, this concerns all those
modifying the ADMF_FMAINT bit (enable/disable actions on the CLI or stats
page), and the checks to report "via" on the stats page. But currently
the code is harmless.
Willy Tarreau [Tue, 13 May 2014 13:54:22 +0000 (15:54 +0200)]
REORG/MEDIUM: server: split server state and flags in two different variables
Till now, the server's state and flags were all saved as a single bit
field. It causes some difficulties because we'd like to have an enum
for the state and separate flags.
This commit starts by splitting them in two distinct fields. The first
one is srv->state (with its counter-part srv->prev_state) which are now
enums, but which still contain bits (SRV_STF_*).
The flags now lie in their own field (srv->flags).
The function srv_is_usable() was updated to use the enum as input, since
it already used to deal only with the state.
Note that currently, the maintenance mode is still in the state for
simplicity, but it must move as well.
Willy Tarreau [Thu, 22 May 2014 06:24:46 +0000 (08:24 +0200)]
MEDIUM: proxy: make timeout parser a bit stricter
Twice in a week I found people were surprized by a "conditional timeout" not
being respected, because they add "if <cond>" after a timeout, and since
they don't see any error nor read the doc, the expect it to work. Let's
make the timeout parser reject extra arguments to avoid these situations.
Willy Tarreau [Wed, 21 May 2014 15:13:13 +0000 (17:13 +0200)]
BUG/MINOR: stats: tracking servers may incorrectly report an inherited DRAIN status
The DRAIN status is not inherited between tracked servers, so the stats
page must only use the reported server's status and not the tracked
server's status, otherwise it misleadingly indicates a DRAIN state when
a server tracks a draining server, while this is wrong.
Willy Tarreau [Wed, 21 May 2014 14:58:17 +0000 (16:58 +0200)]
BUG/MEDIUM: session: don't clear CF_READ_NOEXP if analysers are not called
As more or less suspected, commit b1982e2 ("BUG/MEDIUM: http/session:
disable client-side expiration only after body") was hazardous. It
introduced a regression causing client side timeout to expire during
connection retries if it's lower than the time needed to cover the
amount of retries, so clients get a 408 when the connection to the
server fails to establish fast enough.
The reason is that the CF_READ_NOEXP flag is set after the MSG_DONE state
is reached, which protects the timeout from being re-armed, then during
the retries, process_session() clears the flag without calling the analyser
(since there's no activity for it), so the timeouts are rearmed.
Ideally, these one-shot flags should be per-analyser, and the analyser
which sets them would be responsible for clearing them, or they would
automatically be cleared when switching to another analyser. Unfortunately
this is not really possible currently.
What can be done however is to only clear them in the following situations :
- we're going to call analysers
- analysers have all been unsubscribed
This method seems reliable enough and approaches the ideal case well enough.
No backport is needed, this bug was introduced in 1.5-dev25.
Conrad Hoffmann [Tue, 20 May 2014 12:28:24 +0000 (14:28 +0200)]
BUG/MEDIUM: polling: fix possible CPU hogging of worker processes after receiving SIGUSR1.
When run in daemon mode (i.e. with at least one forked process) and using
the epoll poller, sending USR1 (graceful shutdown) to the worker processes
can cause some workers to start running at 100% CPU. Precondition is having
an established HTTP keep-alive connection when the signal is received.
The cloned (during fork) listening sockets do not get closed in the parent
process, thus they do not get removed from the epoll set automatically
(see man 7 epoll). This can lead to the process receiving epoll events
that it doesn't feel responsible for, resulting in an endless loop around
epoll_wait() delivering these events.
The solution is to explicitly remove these file descriptors from the epoll
set. To not degrade performance, care was taken to only do this when
neccessary, i.e. when the file descriptor was cloned during fork.
Signed-off-by: Conrad Hoffmann <conrad@soundcloud.com>
[wt: a backport to 1.4 could be studied though chances to catch the bug are low]
Remi Gacogne [Mon, 19 May 2014 08:29:58 +0000 (10:29 +0200)]
MINOR: ssl: SSL_CTX_set_options() and SSL_CTX_set_mode() take a long, not an int
This is a minor fix, but the SSL_CTX_set_options() and
SSL_CTX_set_mode() functions take a long, not an int parameter. As
SSL_OP_ALL is now (since OpenSSL 1.0.0) defined as 0x80000BFFL, I think
it is worth fixing.
Willy Tarreau [Sun, 18 May 2014 06:11:41 +0000 (08:11 +0200)]
BUG/MAJOR: config: don't free valid regex memory
Thomas Heil reported that previous commit 07fcaaa ("MINOR: fix a few
memory usage errors") make haproxy crash when req* rules are used. As
diagnosed by Cyril Bonté, this commit introduced a regression which
makes haproxy free the memory areas allocated for regex even when
they're going to be used, resulting in the crashes.
This patch does three things :
- undo the free() on the valid path
- add regfree() on the error path but only when regcomp() succeeds
- rename err_code to ret_code to avoid confusing the valid return
path with an error path.
Willy Tarreau [Tue, 13 May 2014 16:51:40 +0000 (18:51 +0200)]
MINOR: server: create srv_was_usable() from srv_is_usable() and use a pointer
We used to call srv_is_usable() with either the current state and weights
or the previous ones. This causes trouble for future changes, so let's first
split it in two variants :
- srv_is_usable(srv) considers the current status
- srv_was_usable(srv) considers the previous status
Willy Tarreau [Tue, 13 May 2014 17:27:31 +0000 (19:27 +0200)]
MINOR: server: use functions to detect state changes and to update them
Detecting that a server's status has changed is a bit messy, as well
as it is to commit the status changes. We'll have to add new conditions
soon and we'd better avoid to multiply the number of touched locations
with the high risk of forgetting them.
This commit introduces :
- srv_lb_status_changed() to report if the status changed from the
previously committed one ;
- svr_lb_commit_status() to commit the current status
The function is now used by all load-balancing algorithms.
Willy Tarreau [Tue, 13 May 2014 20:08:20 +0000 (22:08 +0200)]
MINOR: server: remove the SRV_DRAIN flag which can always be deduced
This flag is only a copy of (srv->uweight == 0), so better get rid of
it to reduce some of the confusion that remains in the code, and use
a simple function to return this state based on this weight instead.
Willy Tarreau [Tue, 13 May 2014 19:01:39 +0000 (21:01 +0200)]
MINOR: checks: simplify and improve reporting of state changes when using log-health-checks
Function set_server_check_status() is very weird. It is called at the
end of a check to update the server's state before the new state is even
calculated, and possibly to log status changes, only if the proxy has
"option log-health-checks" set.
In order to do so, it employs an exhaustive list of the combinations
which can lead to a state change, while in practice almost all of
them may simply be deduced from the change of check status. Better,
some changes of check status are currently not detected while they
can be very valuable (eg: changes between L4/L6/TOUT/HTTP 500 for
example).
The doc was updated to reflect this.
Also, a minor change was made to consider s->uweight and not s->eweight
as meaning "DRAIN" since eweight can be null without the DRAIN mode (eg:
throttle, NOLB, ...).
Willy Tarreau [Tue, 13 May 2014 19:58:31 +0000 (21:58 +0200)]
MINOR: stats: improve alignment of color codes to save one line of header
Having both "active or backup DOWN" and "not checked" on the left side of
the color caption inflates the whole header block for no reason. Simply
move them both on the same line and reduce the header height.
Willy Tarreau [Tue, 13 May 2014 15:57:29 +0000 (17:57 +0200)]
BUG/MINOR: checks: tcp-check must not stop on '\0' for binary checks
Abuse of copy-paste has made "tcp-check expect binary" to consider a
buffer starting with \0 as empty! Thanks to Lukas Benes for reporting
this problem and confirming the fix.
Willy Tarreau [Tue, 13 May 2014 11:37:54 +0000 (13:37 +0200)]
BUG/MEDIUM: config: a stats-less config crashes in 1.5-dev25
John-Paul Bader reported a stupid regression in 1.5-dev25, we
forget to check that global.stats_fe is initialized before visiting
its sockets, resulting in a crash.
Thierry FOURNIER [Sun, 11 May 2014 13:15:00 +0000 (15:15 +0200)]
MINOR: acl: set "str" as default match for strings
It appears than many people considers that the default match for a fetch
returning string is "exact match string" aka "str". This patch set this
match as default for strings.
Cyril Bonté [Sun, 11 May 2014 21:10:19 +0000 (23:10 +0200)]
OPTIM: stats: avoid the calculation of a useless link on tracking servers in maintenance
Commit f465994198 removed the "via" link when a tracking server is in maintenance, but
still calculated an empty link that no one can use. We can safely remove it.
Willy Tarreau [Sun, 11 May 2014 07:43:46 +0000 (09:43 +0200)]
MEDIUM: acl: strenghten the option parser to report invalid options
Whatever ACL option beginning with a '-' is considered as a pattern
if it does not match a known option. This is a big problem because
typos are silently ignored, such as "-" or "-mi".
Better clearly check the complete option name and report a parsing
error if the option is unknown.
Willy Tarreau [Sat, 10 May 2014 13:16:43 +0000 (15:16 +0200)]
[RELEASE] Released version 1.5-dev25
Released version 1.5-dev25 with the following main changes :
- MEDIUM: connection: Implement and extented PROXY Protocol V2
- MINOR: ssl: clean unused ACLs declarations
- MINOR: ssl: adds fetchs and ACLs for ssl back connection.
- MINOR: ssl: merge client's and frontend's certificate functions.
- MINOR: ssl: adds ssl_f_sha1 fetch to return frontend's certificate fingerprint
- MINOR: ssl: adds sample converter base64 for binary type.
- MINOR: ssl: convert to binary ssl_fc_unique_id and ssl_bc_unique_id.
- BUG/MAJOR: ssl: Fallback to private session cache if current lock mode is not supported.
- MAJOR: ssl: Change default locks on ssl session cache.
- BUG/MINOR: chunk: Fix function chunk_strcmp and chunk_strcasecmp match a substring.
- MINOR: ssl: add global statement tune.ssl.force-private-cache.
- MINOR: ssl: remove fallback to SSL session private cache if lock init fails.
- BUG/MEDIUM: patterns: last fix was still not enough
- MINOR: http: export the smp_fetch_cookie function
- MINOR: http: generic pointer to rule argument
- BUG/MEDIUM: pattern: a typo breaks automatic acl/map numbering
- BUG/MAJOR: patterns: -i and -n are ignored for inlined patterns
- BUG/MINOR: proxy: unsafe initialization of HTTP transaction when switching from TCP frontend
- BUG/MINOR: http: log 407 in case of proxy auth
- MINOR: http: rely on the message body parser to send 100-continue
- MEDIUM: http: move reqadd after execution of http_request redirect
- MEDIUM: http: jump to dedicated labels after http-request processing
- BUG/MINOR: http: block rules forgot to increment the denied_req counter
- BUG/MINOR: http: block rules forgot to increment the session's request counter
- MEDIUM: http: move Connection header processing earlier
- MEDIUM: http: remove even more of the spaghetti in the request path
- MINOR: http: silently support the "block" action for http-request
- CLEANUP: proxy: rename "block_cond" to "block_rules"
- MEDIUM: http: emulate "block" rules using "http-request" rules
- MINOR: http: remove the now unused loop over "block" rules
- MEDIUM: http: factorize the "auth" action of http-request and stats
- MEDIUM: http: make http-request rules processing return a verdict instead of a rule
- MINOR: config: add minimum support for emitting warnings only once
- MEDIUM: config: inform the user about the deprecatedness of "block" rules
- MEDIUM: config: inform the user that "reqsetbe" is deprecated
- MEDIUM: config: inform the user only once that "redispatch" is deprecated
- MEDIUM: config: warn that '{cli,con,srv}timeout' are deprecated
- BUG/MINOR: auth: fix wrong return type in pat_match_auth()
- BUILD: config: remove a warning with clang
- BUG/MAJOR: http: connection setup may stall on balance url_param
- BUG/MEDIUM: http/session: disable client-side expiration only after body
- BUG/MEDIUM: http: correctly report request body timeouts
- BUG/MEDIUM: http: disable server-side expiration until client has sent the body
- MEDIUM: listener: make the accept function more robust against pauses
- BUILD: syscalls: remove improper inline statement in front of syscalls
- BUILD: ssl: SSL_CTX_set_msg_callback() needs openssl >= 0.9.7
- BUG/MAJOR: session: recover the correct connection pointer in half-initialized sessions
- DOC: add some explanation on the shared cache build options in the readme.
- MEDIUM: proxy: only adjust the backend's bind-process when already set
- MEDIUM: config: limit nbproc to the machine's word size
- MEDIUM: config: check the bind-process settings according to nbproc
- MEDIUM: listener: parse the new "process" bind keyword
- MEDIUM: listener: inherit the process mask from the proxy
- MAJOR: listener: only start listeners bound to the same processes
- MINOR: config: only report a warning when stats sockets are bound to more than 1 process
- CLEANUP: config: set the maxaccept value for peers listeners earlier
- BUG/MINOR: backend: only match IPv4 addresses with RDP cookies
- BUG/MINOR: checks: correctly configure the address family and protocol
- MINOR: tools: split is_addr() and is_inet_addr()
- MINOR: protocols: use is_inet_addr() when only INET addresses are desired
- MEDIUM: unix: add preliminary support for connecting to servers over UNIX sockets
- MEDIUM: checks: only complain about the missing port when the check uses TCP
- MEDIUM: unix: implement support for Linux abstract namespace sockets
- DOC: map_beg was missing from the table of map_* converters
- DOC: ebtree: indicate that prefix insertion/lookup may be used with strings
- MEDIUM: pattern: use ebtree's longest match to index/lookup string beginning
- BUILD: remove the obsolete BSD and OSX makefiles
- MEDIUM: unix: avoid a double connect probe when no data are sent
- DOC: stop referencing the slow git repository in the README
- BUILD: only build the systemd wrapper on Linux 2.6 and above
- DOC: update roadmap with completed tasks
- MEDIUM: session: implement half-closed timeouts (client-fin and server-fin)
Willy Tarreau [Sat, 10 May 2014 12:30:07 +0000 (14:30 +0200)]
MEDIUM: session: implement half-closed timeouts (client-fin and server-fin)
Long-lived sessions are often subject to half-closed sessions resulting in
a lot of sessions appearing in FIN_WAIT state in the system tables, and no
way for haproxy to get rid of them. This typically happens because clients
suddenly disconnect without sending any packet (eg: FIN or RST was lost in
the path), and while the server detects this using an applicative heart
beat, haproxy does not close the connection.
This patch adds two new timeouts : "timeout client-fin" and
"timeout server-fin". The former allows one to override the client-facing
timeout when a FIN has been received or sent. The latter does the same for
server-facing connections, which is less useful.
Willy Tarreau [Sat, 10 May 2014 10:16:21 +0000 (12:16 +0200)]
BUILD: only build the systemd wrapper on Linux 2.6 and above
Attempting to build haproxy-systemd-wrapper on non-linux platforms
sometimes results in build errors. Better move it into an EXTRA
variable which is set to haproxy-systemd-wrapper only on Linux 2.6
and above. Proceeding this way also allows to disable building it
in quick builds (eg: when developing).
Willy Tarreau [Sat, 10 May 2014 09:04:39 +0000 (11:04 +0200)]
DOC: stop referencing the slow git repository in the README
git.1wt.eu is painfully slow and some people experience issues with
it. Better hide it and only advertise git.haproxy.org which is mirrored
on a faster server.
Also replace haproxy.1wt.eu with www.haproxy.org in the download URL
which appears in the stats page.
Willy Tarreau [Sat, 10 May 2014 07:48:28 +0000 (09:48 +0200)]
MEDIUM: unix: avoid a double connect probe when no data are sent
Plain "tcp" health checks sent to a unix socket cause two connect()
calls to be made, one to connect, and a second one to verify that the
connection properly established. But with unix sockets, we get
immediate notification of success, so we can avoid this second
attempt. However we need to ensure that we'll visit the connection
handler even if there's no remaining handshake pending, so for this
we claim we have some data to send in order to enable polling for
writes if there's no more handshake.
Willy Tarreau [Sat, 10 May 2014 07:12:46 +0000 (09:12 +0200)]
BUILD: remove the obsolete BSD and OSX makefiles
These makefiles were aging, with no support for SSL nor zlib, and
they were hard to maintain. GNU make is packaged and provided with
all systems which were relying on these makefiles, so better simply
delete them and enable the new features for everyone.
Willy Tarreau [Sat, 10 May 2014 06:53:48 +0000 (08:53 +0200)]
MEDIUM: pattern: use ebtree's longest match to index/lookup string beginning
Being able to map prefixes to values is already used for IPv4/IPv6
but was not yet used with strings. It can be very convenient to map
directories to server farms but large lists may be slow.
By using ebmb_insert_prefix() and ebmb_lookup_longest(), we can
insert strings with their own length as a prefix, and lookup
candidate strings and ensure that the longest matching one will
be returned, which is the longest string matching the entry.
Willy Tarreau [Sat, 10 May 2014 06:34:01 +0000 (08:34 +0200)]
DOC: ebtree: indicate that prefix insertion/lookup may be used with strings
And indicate what is required for this (that the pattern is properly
terminated by a zero).
(cherry picked from commit c87c93800ce4045b1053302d99a3cd78321a7ec4)
Willy Tarreau [Fri, 9 May 2014 23:49:15 +0000 (01:49 +0200)]
MEDIUM: unix: implement support for Linux abstract namespace sockets
These sockets are the same as Unix sockets except that there's no need
for any filesystem access. The address may be whatever string both sides
agree upon. This can be really convenient for inter-process communications
as well as for chaining backends to frontends.
These addresses are forced by prepending their address with "abns@" for
"abstract namespace".
Willy Tarreau [Fri, 9 May 2014 20:57:47 +0000 (22:57 +0200)]
MEDIUM: unix: add preliminary support for connecting to servers over UNIX sockets
We've had everything in place for this for a while now, we just missed
the connect function for UNIX sockets. Note that in order to connect to
a UNIX socket inside a chroot, the path will have to be relative to the
chroot.
UNIX sockets connect about twice as fast as TCP sockets (or consume
about half of the CPU at the same rate). This is interesting for
internal communications between SSL processes and HTTP processes
for example, or simply to avoid allocating source ports on the
loopback.
The tcp_connect_probe() function is still used to probe a dataless
connection, but it is compatible so that's not an issue for now.
Health checks are not yet fully supported since they require a port.
Willy Tarreau [Fri, 9 May 2014 20:56:10 +0000 (22:56 +0200)]
MINOR: protocols: use is_inet_addr() when only INET addresses are desired
We used to have is_addr() in place to validate sometimes the existence
of an address, sometimes a valid IPv4 or IPv6 address. Replace them
carefully so that is_inet_addr() is used wherever we can only use an
IPv4/IPv6 address.
Willy Tarreau [Fri, 9 May 2014 20:40:55 +0000 (22:40 +0200)]
MINOR: tools: split is_addr() and is_inet_addr()
The is_addr() function indicates if an address is set and is an IPv4
or IPv6 address. Let's rename it is_inet_addr() and make is_addr()
also accept AF_UNIX addresses.
Willy Tarreau [Fri, 9 May 2014 21:38:15 +0000 (23:38 +0200)]
BUG/MINOR: checks: correctly configure the address family and protocol
Currently, mixing an IPv4 and an IPv6 address in checks happens to
work by pure luck because the two protocols use the same functions
at the socket level and both use IPPROTO_TCP. However, they're
definitely wrong as the protocol for the check address is retrieved
from the server's address.
Now the protocol assigned to the connection is the same as the one
the address in use belongs to (eg: the server's address or the
explicit check address).
Willy Tarreau [Fri, 9 May 2014 20:47:50 +0000 (22:47 +0200)]
BUG/MINOR: backend: only match IPv4 addresses with RDP cookies
The RDP cookie extractor compares the 32-bit address from the request
to the address of each server in the farm without first checking that
the server's address is IPv4. This is a leftover from the IPv4 to IPv6
conversion. It's harmless as it's unlikely that IPv4 and IPv6 servers
will be mixed in an RDP farm, but better fix it.
Willy Tarreau [Fri, 9 May 2014 16:48:46 +0000 (18:48 +0200)]
MINOR: config: only report a warning when stats sockets are bound to more than 1 process
Till now a warning was emitted if the "stats bind-process" was not
specified when nbproc was greater than 1. Now we can be much finer
and only emit a warning when at least of the stats socket is bound
to more than one process at a time.
Willy Tarreau [Wed, 7 May 2014 17:22:24 +0000 (19:22 +0200)]
MAJOR: listener: only start listeners bound to the same processes
Now that we know what processes a "bind" statement is attached to, we
have the ability to avoid starting some of them when they're not on the
proper process. This feature is disabled when running in foreground
however, so that debug mode continues to work with everything bound to
the first and only process.
The main purpose of this change is to finally allow the global stats
sockets to be each bound to a different process.
It can also be used to force haproxy to use different sockets in different
processes for the same IP:port. The purpose is that under Linux 3.9 and
above (and possibly other OSes), when multiple processes are bound to the
same IP:port via different sockets, the system is capable of performing
a perfect round-robin between the socket queues instead of letting any
process pick all the connections from a queue. This results in a smoother
load balancing and may achieve a higher performance with a large enough
maxaccept setting.
Willy Tarreau [Fri, 9 May 2014 15:06:11 +0000 (17:06 +0200)]
MEDIUM: listener: inherit the process mask from the proxy
When a process list is specified on either the proxy or the bind lines,
the latter is refined to the intersection of the two. A warning is emitted
if no intersection is found, and the situation is fixed by either falling
back to the first process of the proxy or to all processes.
Willy Tarreau [Wed, 7 May 2014 21:56:38 +0000 (23:56 +0200)]
MEDIUM: config: check the bind-process settings according to nbproc
When a bind-process setting is present in a frontend or backend, we
now verify that the specified process range at least shares one common
process with those defined globally by nbproc. Then if the value is
set, it is reduced to the one enforced by nbproc.
A warning is emitted if process count does not match, and the fix is
done the following way :
- if a single process was specified in the range, it's remapped to
process #1
- if more than one process was specified, the binding is removed
and all processes are usable.
Note that since backends may inherit their settings from frontends,
depending on the declaration order, they may or may not be reported
as warnings.
Willy Tarreau [Fri, 18 Jan 2013 10:29:29 +0000 (11:29 +0100)]
MEDIUM: config: limit nbproc to the machine's word size
Some consistency checks cannot be performed between frontends, backends
and peers at the moment because there is no way to check for intersection
between processes bound to some processes when the number of processes is
higher than the number of bits in a word.
So first, let's limit the number of processes to the machine's word size.
This means nbproc will be limited to 32 on 32-bit machines and 64 on 64-bit
machines. This is far more than enough considering that configs rarely go
above 16 processes due to scalability and management issues, so 32 or 64
should be fine.
This way we'll ensure we can always build a mask of all the processes a
section is bound to.
Willy Tarreau [Fri, 9 May 2014 11:54:22 +0000 (13:54 +0200)]
MEDIUM: proxy: only adjust the backend's bind-process when already set
By default, a proxy's bind_proc is zero, meaning "bind to all processes".
It's only when not zero that its process list is restricted. So we don't
want the frontends to enforce the value on the backends when the backends
are still set to zero.
Emeric Brun [Fri, 9 May 2014 12:01:48 +0000 (14:01 +0200)]
MINOR: ssl: remove fallback to SSL session private cache if lock init fails.
Now, haproxy exit an error saying:
Unable to initialize the lock for the shared SSL session cache. You can retry using
the global statement 'tune.ssl.force-private-cache' but it could increase the CPU
usage due to renegotiation if nbproc > 1.
Emeric Brun [Fri, 9 May 2014 15:11:07 +0000 (17:11 +0200)]
BUG/MINOR: chunk: Fix function chunk_strcmp and chunk_strcasecmp match a substring.
They could match different strings as equal if the chunk was shorter
than the string. Those functions are currently only used for SSL's
certificate DN entry extract.
David S [Fri, 9 May 2014 03:42:08 +0000 (23:42 -0400)]
MEDIUM: connection: Implement and extented PROXY Protocol V2
This commit modifies the PROXY protocol V2 specification to support headers
longer than 255 bytes allowing for optional extensions. It implements the
PROXY protocol V2 which is a binary representation of V1. This will make
parsing more efficient for clients who will know in advance exactly how
many bytes to read. Also, it defines and implements some optional PROXY
protocol V2 extensions to send information about downstream SSL/TLS
connections. Support for PROXY protocol V1 remains unchanged.
Willy Tarreau [Thu, 8 May 2014 19:06:11 +0000 (21:06 +0200)]
BUG/MAJOR: session: recover the correct connection pointer in half-initialized sessions
John-Paul Bader reported a nasty segv which happens after a few hours
when SSL is enabled under a high load. Fortunately he could catch a
stack trace, systematically looking like this one :
The variable "flags" in conn_fd_handler() holds a copy of connection->flags
when entering the function. These flags indicate 41997063 = 0x0280d307 :
- {SOCK,DATA,CURR}_RD_ENA=1 => it's a handshake, waiting for reading
- {SOCK,DATA,CURR}_WR_ENA=0 => no need for writing
- CTRL_READY=1 => FD is still allocated
- XPRT_READY=1 => transport layer is initialized
- ADDR_FROM_SET=1, ADDR_TO_SET=0 => clearly it's a frontend connection
- INIT_DATA=1, WAKE_DATA=1 => processing a handshake (ssl I guess)
- {DATA,SOCK}_{RD,WR}_SH=0 => no shutdown
- ERROR=0, CONNECTED=0 => handshake not completed yet
- WAIT_L4_CONN=0 => normal
- WAIT_L6_CONN=1 => waiting for an L6 handshake to complete
- SSL_WAIT_HS=1 => the pending handshake is an SSL handshake
So this is a handshake is in progress. And the only way to reach line 88
is for the handshake to complete without error. So we know for sure that
ssl_sock_handshake() was called and completed the handshake then removed
the CO_FL_SSL_WAIT_HS flag from the connection. With these flags,
ssl_sock_handshake() does only call SSL_do_handshake() and retruns. So
that means that the problem is necessarily in data->init().
The fd is wrong as reported but is simply mis-decoded as it's the lower
half of the last function pointer.
What happens in practice is that there's an issue with the way we deal
with embryonic sessions during their conversion to regular sessions.
Since they have no stream interface at the beginning, the pointer to
the connection is temporarily stored into s->target. Then during their
conversion, the first stream interface is properly initialized and the
connection is attached to it, then s->target is set to NULL.
The problem is that if anything fails in session_complete(), the
session is left in this intermediate state where s->target is NULL,
and kill_mini_session() is called afterwards to perform the cleanup.
It needs the connection, that it finds in s->target which is NULL,
dereferences it and dies. The only reasons for dying here are a problem
on the TCP connection when doing the setsockopt(TCP_NODELAY) or a
memory allocation issue.
This patch implements a solution consisting in restoring s->target in
session_complete() on the error path. That way embryonic sessions that
were valid before calling it are still valid after.
The bug was introduced in 1.5-dev20 by commit f8a49ea ("MEDIUM: session:
attach incoming connection to target on embryonic sessions"). No backport
is needed.
Special thanks to John for his numerous tests and traces.
1.5-dev24 introduced SSL_CTX_set_msg_callback(), which came with OpenSSL
0.9.7. A build attempt with an older one failed and we're still compatible
with 0.9.6 in 1.5.
Willy Tarreau [Thu, 8 May 2014 20:36:29 +0000 (22:36 +0200)]
BUILD: syscalls: remove improper inline statement in front of syscalls
Trying to build with an old gcc and glibc revealed that we must not
state "inline" in our _syscall* definitions since it's already present
in the declaration making use of the _syscall* macros.
Willy Tarreau [Wed, 7 May 2014 17:47:02 +0000 (19:47 +0200)]
MEDIUM: listener: make the accept function more robust against pauses
During some tests in multi-process mode under Linux, it appeared that
issuing "disable frontend foo" on the CLI to pause a listener would
make the shutdown(read) of certain processes disturb another process
listening on the same socket, resulting in a 100% CPU loop. What
happens is that accept() returns EAGAIN without accepting anything.
Fortunately, we see that epoll_wait() reports EPOLLIN+EPOLLRDHUP
(likely because the FD points to the same file in the kernel), so we
can use that to stop the other process from trying to accept connections
for a short time and try again later, hoping for the situation to change.
We must not disable the FD otherwise there's no way to re-enable it.
Additionally, during these tests, a loop was encountered on EINVAL which
was not caught. Now if we catch an EINVAL, we proceed the same way, in
case the socket is re-enabled later.
Willy Tarreau [Wed, 7 May 2014 12:50:10 +0000 (14:50 +0200)]
BUG/MEDIUM: http: disable server-side expiration until client has sent the body
It's the final part of the 2 previous patches. We prevent the server from
timing out if we still have some data to pass to it. That way, even if the
server runs with a short timeout and the client with a large one, the server
side timeout will only start to count once the client sends everything. This
ensures we don't report a 504 before the server gets the whole request.
It is not certain whether the 1.4 state machine is fully compatible with
this change. Since the purpose is only to ensure that we never report a
server error before a client error if some data are missing from the client
and when the server-side timeout is smaller than or equal to the client's,
it's probably not worth attempting the backport.
Willy Tarreau [Wed, 7 May 2014 12:24:16 +0000 (14:24 +0200)]
BUG/MEDIUM: http: correctly report request body timeouts
This is the continuation of previous patch "BUG/MEDIUM: http/session:
disable client-side expiration only after body".
This one takes care of properly reporting the client-side read timeout
when waiting for a body from the client. Since the timeout may happen
before or after the server starts to respond, we have to take care of
the situation in three different ways :
- if the server does not read our data fast enough, we emit a 504
if we're waiting for headers, or we simply break the connection
if headers were already received. We report either sH or sD
depending on whether we've seen headers or not.
- if the server has not yet started to respond, but has read all of
the client's data and we're still waiting for more data from the
client, we can safely emit a 408 and abort the request ;
- if the server has already started to respond (thus it's a transfer
timeout during a bidirectional exchange), then we silently break
the connection, and only the session flags will indicate in the
logs that something went wrong with client or server side.
This bug is tagged MEDIUM because it touches very sensible areas, however
its impact is very low. It might be worth performing a careful backport
to 1.4 once it has been confirmed that everything is correct and that it
does not introduce any regression.
Willy Tarreau [Tue, 6 May 2014 20:57:53 +0000 (22:57 +0200)]
BUG/MEDIUM: http/session: disable client-side expiration only after body
For a very long time, back in the v1.3 days, we used to rely on a trick
to avoid expiring the client side while transferring a payload to the
server. The problem was that if a client was able to quickly fill the
buffers, and these buffers took some time to reach the server, the
client should not expire while not sending anything.
In order to cover this situation, the client-side timeout was disabled
once the connection to the server was OK, since it implied that we would
at least expire on the server if required.
But there is a drawback to this : if a client stops uploading data before
the end, its timeout is not enforced and we only expire on the server's
timeout, so the logs report a 504.
Since 1.4, we have message body analysers which ensure that we know whether
all the expected data was received or not (HTTP_MSG_DATA or HTTP_MSG_DONE).
So we can fix this problem by disabling the client-side or server-side
timeout at the end of the transfer for the respective side instead of
having it unconditionally in session.c during all the transfer.
With this, the logs now report the correct side for the timeout. Note that
this patch is not enough, because another issue remains : the HTTP body
forwarders do not abort upon timeout, they simply rely on the generic
handling from session.c. So for now, the session is still aborted when
reaching the server timeout, but the culprit is properly reported. A
subsequent patch will address this specific point.
This bug was tagged MEDIUM because of the changes performed. The issue
it fixes is minor however. After some cooling down, it may be backported
to 1.4.
It was reported by and discussed with Rachel Chavez and Patrick Hemmer
on the mailing list.
MINOR: ssl: convert to binary ssl_fc_unique_id and ssl_bc_unique_id.
Previously ssl_fc_unique_id and ssl_bc_unique_id return a string encoded
in base64 of the RFC 5929 TLS unique identifier. This patch modify those fetches
to return directly the ID in the original binary format. The user can make the
choice to encode in base64 using the converter.
MINOR: ssl: adds ssl_f_sha1 fetch to return frontend's certificate fingerprint
ssl_f_sha1 is a binary binary fetch used to returns the SHA-1 fingerprint of
the certificate presented by the frontend when the incoming connection was
made over an SSL/TLS transport layer. This can be used to know which
certificate was chosen using SNI.
MINOR: ssl: adds fetchs and ACLs for ssl back connection.
Adds ssl fetchs and ACLs for outgoinf SSL/Transport layer connection with their
docs:
ssl_bc, ssl_bc_alg_keysize, ssl_bc_cipher, ssl_bc_protocol, ssl_bc_unique_id,
ssl_bc_session_id and ssl_bc_use_keysize.
BUG/MAJOR: http: connection setup may stall on balance url_param
On the mailing list, seri0528@naver.com reported an issue when
using balance url_param or balance uri. The request would sometimes
stall forever.
Cyril Bonté managed to reproduce it with the configuration below :
listen test :80
mode http
balance url_param q
hash-type consistent
server s demo.1wt.eu:80
and found it appeared with this commit : 80a92c0 ("BUG/MEDIUM: http:
don't start to forward request data before the connect").
The bug is subtle but real. The problem is that the HTTP request
forwarding analyzer refrains from starting to parse the request
body when some LB algorithms might need the body contents, in order
to preserve the data pointer and avoid moving things around during
analysis in case a redispatch is later needed. And in order to detect
that the connection establishes, it watches the response channel's
CF_READ_ATTACHED flag.
The problem is that a request analyzer is not subscribed to a response
channel, so it will only see changes when woken for other (generally
correlated) reasons, such as the fact that part of the request could
be sent. And since the CF_READ_ATTACHED flag is cleared once leaving
process_session(), it is important not to miss it. It simply happens
that sometimes the server starts to respond in a sequence that validates
the connection in the middle of process_session(), that it is detected
after the analysers, and that the newly assigned CF_READ_ATTACHED is
not used to detect that the request analysers need to be called again,
then the flag is lost.
The CF_WAKE_WRITE flag doesn't work either because it's cleared upon
entry into process_session(), ie if we spend more than one call not
connecting.
Thus we need a new flag to tell the connection initiator that we are
specifically interested in being notified about connection establishment.
This new flag is CF_WAKE_CONNECT. It is set by the requester, and is
cleared once the connection succeeds, where CF_WAKE_ONCE is set instead,
causing the request analysers to be scanned again.
For future versions, some better options will have to be considered :
- let all analysers subscribe to both request and response events ;
- let analysers subscribe to stream interface events (reduces number
of useless calls)
- change CF_WAKE_WRITE's semantics to persist across calls to
process_session(), but that is different from validating a
connection establishment (eg: no data sent, or no data to send)
The bug was introduced in 1.5-dev23, no backport is needed.
Commit fc6c032 ("MEDIUM: global: add support for CPU binding on Linux ("cpu-map")")
merged into 1.5-dev13 involves a useless test that clang reports as a warning. The
"low" variable cannot be negative here. Issue reported by Charles Carter.
BUG/MINOR: auth: fix wrong return type in pat_match_auth()
Commit 5338eea ("MEDIUM: pattern: The match function browse itself the
list or the tree") changed the return type of pattern matching functions.
One enum was left over in pat_match_auth(). Fortunately, this one equals
zero where a null pointer is expected, so it's cast correctly.
This detected and reported by Charles Carter was introduced in 1.5-dev23,
no backport is needed.
MEDIUM: http: make http-request rules processing return a verdict instead of a rule
Till now we used to return a pointer to a rule, but that makes it
complicated to later add support for registering new actions which
may fail. For example, the redirect may fail if the response is too
large to fit into the buffer.
So instead let's return a verdict. But we needed the pointer to the
last rule to get the address of a redirect and to get the realm used
by the auth page. So these pieces of code have moved into the function
and they produce a verdict.
MEDIUM: http: factorize the "auth" action of http-request and stats
Both use exactly the same mechanism, except for the choice of the
default realm to be emitted when none is selected. It can be achieved
by simply comparing the ruleset with the stats' for now. This achieves
a significant code reduction and further, removes the dependence on
the pointer to the final rule in the caller.
MEDIUM: http: emulate "block" rules using "http-request" rules
The "block" rules are redundant with http-request rules because they
are performed immediately before and do exactly the same thing as
"http-request deny". Moreover, this duplication has led to a few
minor stats accounting issues fixed lately.
Instead of keeping the two rule sets, we now build a list of "block"
rules that we compile as "http-request block" and that we later insert
at the beginning of the "http-request" rules.
The only user-visible change is that in case of a parsing error, the
config parser will now report "http-request block rule" instead of
"blocking condition".
MEDIUM: http: remove even more of the spaghetti in the request path
Some of the remaining interleaving of request processing after the
http-request rules can now safely be removed, because all remaining
actions are mutually exclusive.
So we can move together all those related to an intercepting rule,
then proceed with stats, then with req*.
We still keep an issue with stats vs reqrep which forces us to
keep the stats split in two (detection and action). Indeed, from the
beginning, stats are detected before rewriting and not after. But a
reqdeny rule would stop stats, so in practice we have to first detect,
then perform the action. Maybe we'll be able to kill this in version
1.6.
Till now the Connection header was processed in the middle of the http-request
rules and some reqadd rules. It used to force some http-request actions to be
cut in two parts.
Now with keep-alive, not only that doesn't make any sense anymore, but it's
becoming a total mess, especially since we need to know the headers contents
before proceeding with most actions.
The real reason it was not moved earlier is that the "block" or "http-request"
rules can see a different version if some fields are changed there. But that
is already not reliable anymore since the values observed by the frontend
differ from those in the backend.
This patch is the equivalent of commit f118d9f ("REORG: http: move HTTP
Connection response header parsing earlier") but for the request side. It
has been tagged MEDIUM as it could theorically slightly affect some setups
relying on corner cases or invalid setups, though this does not make real
sense and is highly unlikely.
BUG/MINOR: http: block rules forgot to increment the session's request counter
The session's backend request counters were incremented after the block
rules while these rules could increment the session's error counters,
meaning that we could have more errors than requests reported in a stick
table! Commit 5d5b5d8 ("MEDIUM: proto_tcp: add support for tracking L7
information") is the most responsible for this.
This bug is 1.5-specific and does not need any backport.
BUG/MINOR: http: block rules forgot to increment the denied_req counter
"block" rules used to build the whole response and forgot to increment
the denied_req counters. By jumping to the general "deny" label created
in previous patch, it's easier to fix this.
The issue was already present in 1.3 and remained unnoticed, in part
because few people use "block" nowadays.
MEDIUM: http: jump to dedicated labels after http-request processing
Continue the cleanup of http-request post-processing to remove some
of the interleaved tests. Here we set up a few labels to deal with
the deny and tarpit actions and avoid interleaved ifs.
MEDIUM: http: move reqadd after execution of http_request redirect
We still have a plate of spaghetti in the request processing rules.
All http-request rules are executed at once, then some responses are
built interlaced with other rules that used to be there in the past.
Here, reqadd is executed after an http-req redirect rule is *decided*,
but before it is *executed*.
So let's match the doc and config checks, to put the redirect actually
before the reqadd completely.
MINOR: http: rely on the message body parser to send 100-continue
There's no point in open-coding the sending of 100-continue in
the stats initialization code, better simply rely on the function
designed to process the message body which already does it.
Commit 844a7e7 ("[MEDIUM] http: add support for proxy authentication")
merged in v1.4-rc1 added the ability to emit a status code 407 in auth
responses, but forgot to set the same status in the logs, which still
contain 401.
BUG/MINOR: proxy: unsafe initialization of HTTP transaction when switching from TCP frontend
A switch from a TCP frontend to an HTTP backend initializes the HTTP
transaction. txn->hdr_idx.size is used by hdr_idx_init() but not
necessarily initialized yet here, because the first call to hdr_idx_init()
is in fact placed in http_init_txn(). Moving it before the call is
enough to fix it. We also remove the useless extra confusing call
to hdr_idx_init().
The bug was introduced in 1.5-dev8 with commit ac1932d ("MEDIUM:
tune.http.maxhdr makes it possible to configure the maximum number
of HTTP headers"). No backport to stable is needed.
BUG/MEDIUM: patterns: last fix was still not enough
Last fix did address the issue for inlined patterns, but it was not
enough because the flags are lost as well when updating patterns
dynamically over the CLI.
Also if the same file was used once with -i and another time without
-i, their references would have been merged and both would have used
the same matching method.
It's appear that the patterns have two types of flags. The first
ones are relative to the pattern matching, and the second are
relative to the pattern storage. The pattern matching flags are
the same for all the patterns of one expression. Now they are
stored in the expression. The storage flags are information
returned by the pattern mathing function. This information is
relative to each entry and is stored in the "struct pattern".
Now, the expression matching flags are forwarded to the parse
and index functions. These flags are stored during the
configuration parsing, and they are used during the parse and
index actions.
This issue was introduced in dev23 with the major pattern rework,
and is a continuation of commit a631fc8 ("BUG/MAJOR: patterns: -i
and -n are ignored for inlined patterns"). No backport is needed.