Willy Tarreau [Fri, 23 Nov 2012 16:32:21 +0000 (17:32 +0100)]
MEDIUM: connection: provide a common conn_full_close() function
Several places got the connection close sequence wrong because it
was not obvious. In practice we always need the same sequence when
aborting, so let's have a common function for this.
Willy Tarreau [Fri, 23 Nov 2012 15:32:33 +0000 (16:32 +0100)]
MINOR: checks: fix recv polling after connect()
Commit a522f801 moved a call to __conn_data_want_recv() just after the
connect() call, which is not 100% correct. First, it does not take errors
into account, even though this is harmless. Second, this change will only
be taken into account after the next call to conn_data_polling_update(), which
is not necessarily what is expected (eg: if an error is only reported on
the recv side).
So let's use conn_data_poll_recv() instead, which directly subscribes
the event to polling.
Willy Tarreau [Fri, 23 Nov 2012 15:22:08 +0000 (16:22 +0100)]
BUG/MAJOR: checks: close FD on all timeouts
Since last commit, some timeouts were converted into an error to report
the status, and as a result, the socket was not closed because it was
supposed to have been done during the wake() call.
Close the socket as soon as the timeout is detected to fix the issue.
We also now make sure the connection flags are initialized first.
Willy Tarreau [Fri, 23 Nov 2012 13:43:49 +0000 (14:43 +0100)]
MEDIUM: checks: close the socket as soon as we have a response
Until now, the check socket was closed in the task which handles the
check, which can sometimes be substantially later when many tasks are
running. It's much cleaner to close() in the wake call, which also
helps remove some FD management from the task itself.
The code is faster and smaller, and fast health checks show a more
predictable behaviour.
Willy Tarreau [Fri, 23 Nov 2012 13:16:39 +0000 (14:16 +0100)]
MEDIUM: checks: avoid waking the application up for pure TCP checks
Pure TCP checks only use the SYN/ACK in return to a SYN. By forcing
the system to use delayed ACKs, it is possible to send an RST instead
of the ACK and thus ensure that the application will never be needlessly
woken up. This avoids error logs or counters on checked components since
the application is never made aware of this connection which dies in the
network stack.
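As a minimal sketch of the delayed-ACK part, assuming a Linux system where the TCP_QUICKACK socket option is available: the helper name check_disable_quickack() is hypothetical and is not taken from the HAProxy sources. It only asks the kernel not to acknowledge the SYN/ACK immediately; the reset itself is assumed to come from closing the socket with lingering disabled.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper, for illustration only.
 * Asks the kernel not to send the ACK right away, so that a later
 * close can answer the SYN/ACK with an RST instead. It is meant to
 * be called before connect(). Returns 0 on success, or when the
 * option does not exist on this platform, and -1 on error.
 */
int check_disable_quickack(int fd)
{
	assert(fd >= 0);
#ifdef TCP_QUICKACK
	int zero = 0;

	return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &zero, sizeof(zero));
#else
	return 0; /* assumption: silently skip where unsupported */
#endif
}
```

The option is applied per socket, so only check connections are affected and regular traffic keeps the default ACK behaviour.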
Willy Tarreau [Fri, 23 Nov 2012 13:02:10 +0000 (14:02 +0100)]
BUG/MINOR: checks: slightly clean the state machine up
The process_chk() function still did not consider the timeout when
it was woken up, so a spurious wakeup could trigger a false timeout. Some
checks were now redundant or could not be triggered (eg: L7 timeout).
So remove them and rearrange the timeout detection.
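The idea of checking the expiration date before declaring a timeout can be sketched as follows. All names here (check_arm_timeout, check_tick_expired, check_classify_wakeup) are hypothetical, and the convention that tick value 0 means "no timeout armed" is an assumption of this sketch, not a statement about the actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TICK_ETERNITY 0u /* assumption: 0 means "no timeout armed" */

enum wake_reason { WAKE_SPURIOUS, WAKE_TIMEOUT };

/* Computes a wrapping expiration date in milliseconds, never 0. */
uint32_t check_arm_timeout(uint32_t now, uint32_t delay_ms)
{
	uint32_t expire = now + delay_ms;

	/* the signed comparison below only works for delays < 2^31 ms */
	assert(delay_ms < 0x80000000u);
	return expire == TICK_ETERNITY ? expire + 1 : expire;
}

/* True only if a timeout is armed and its date has been reached. */
bool check_tick_expired(uint32_t expire, uint32_t now)
{
	if (expire == TICK_ETERNITY)
		return false;
	return (int32_t)(now - expire) >= 0;
}

/* A wakeup alone proves nothing: the date must be checked first. */
enum wake_reason check_classify_wakeup(uint32_t expire, uint32_t now)
{
	return check_tick_expired(expire, now) ? WAKE_TIMEOUT : WAKE_SPURIOUS;
}
```

With this shape, a task woken up early for any other reason falls into the WAKE_SPURIOUS case and no false timeout is reported.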
Willy Tarreau [Fri, 23 Nov 2012 11:47:05 +0000 (12:47 +0100)]
MAJOR: checks: rework completely bogus state machine
The porting of checks to using connections was totally bogus. Some checks
were considered successful as soon as the connection was established,
regardless of any response. Some errors would be triggered upon recv
if polling was enabled for send or if the send channel was shut down.
Now the behaviour is much better. It would be cleaner to perform the
fd_delete() in wake_srv_chk() and to process failures and timeouts
separately, but this is already a good start.
Willy Tarreau [Fri, 23 Nov 2012 10:32:12 +0000 (11:32 +0100)]
CLEANUP: checks: rename some server check flags
Some server check flag names were not properly chosen and cause
analysis trouble, especially the CHK_RUNNING one which does not
mean that a check is running but that the server is running...
Here's the rename :
CHK_RUNNING -> CHK_PASSED
CHK_ERROR -> CHK_FAILED
Willy Tarreau [Fri, 23 Nov 2012 08:18:20 +0000 (09:18 +0100)]
MEDIUM: checks: avoid accumulating TIME_WAITs during checks
Some checks which do not induce a close from the server accumulate
local TIME_WAIT sockets because they're cleanly shut down. Typically
TCP probes cause this. This is very problematic when there are many
servers, when the checks are fast or when local source ports are rare.
So now we'll disable lingering on the socket instead of sending a
shutdown. Before doing this we try to drain any possibly pending data.
That way we avoid sending an RST when the server has closed first.
This change means that some servers will see more RSTs, but this is
needed to avoid local source port starvation.
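The drain-then-close sequence described above could look roughly like this. This is a sketch under assumptions: check_close_nolinger() is a hypothetical name, the 1 kB scratch buffer is arbitrary, and the real code may order or guard these steps differently.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
 * Drains pending input without blocking, disables lingering, then
 * closes the socket. Returns the number of bytes drained, or -1 if
 * close() failed.
 */
long check_close_nolinger(int fd)
{
	struct linger nolinger = { .l_onoff = 1, .l_linger = 0 };
	char scratch[1024];
	long drained = 0;
	ssize_t ret;

	assert(fd >= 0);

	/* Consume whatever the server already sent. Per the commit
	 * message, this avoids an RST when the server closed first.
	 */
	while ((ret = recv(fd, scratch, sizeof(scratch), MSG_DONTWAIT)) > 0)
		drained += ret;

	/* With a zero linger time, close() does not go through the
	 * orderly shutdown that would leave a local TIME_WAIT socket.
	 */
	setsockopt(fd, SOL_SOCKET, SO_LINGER, &nolinger, sizeof(nolinger));
	return close(fd) == 0 ? drained : -1;
}
```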
Willy Tarreau [Fri, 23 Nov 2012 07:56:35 +0000 (08:56 +0100)]
BUG/MEDIUM: checks: ensure we completely disable polling upon success
When a check succeeds, it used to only disable receive events while
it should disable both directions. The problem is that if the send
event was reported too, it could re-enable the recv event. In theory
this is not a problem as the task is going to be woken up, but if
there are many tasks in the queue and this task is not processed
immediately, we could theoretically face a storm of unprocessed events
(typically POLL_HUP).
So better stop both directions, prevent the send side from enabling
recv and have the process_chk() code enable both directions. This
will also help detect closes before the check is sent.
Note that all this mess has been inherited from the old code that used
the fd as a flag to report if a check was running. We should have a
dedicated flag and perform the fd_delete() in wake_srv_chk() instead.
Willy Tarreau [Fri, 23 Nov 2012 07:51:32 +0000 (08:51 +0100)]
BUG/MEDIUM: checks: mark the check as stopped after a connect error
Health checks currently still use the connection's fd to know whether
a check is running (this needs to change). When a health check
immediately fails during connect() because of a lack of local resource
(eg: port), we failed to unset the fd, so each time process_chk was
woken up after such an error, it believed a check was still running
and used to close the fd again instead of starting a new check. This
could result in other connections being closed because they were
assigned the same fd value.
The bug is only marked medium because when this happens, the system
is already in a bad state.
A comment was added above tcp_connect_server() to clarify that the
fd is *not* valid on error.
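The failure mode can be illustrated with a small model. Everything below is hypothetical (struct check, check_start, check_is_running and the connect_no_port stub are invented for this sketch); it only assumes, as the commit says, that the fd doubles as the "check running" indicator and that the connect routine may leave a stale value in it on error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: fd == -1 means "no check running". */
struct check {
	int fd;
};

/* Stand-in for the connect routine: returns 0 on success, non-zero
 * on error, in which case *fd is NOT valid.
 */
typedef int (*connect_fn)(int *fd);

int check_start(struct check *chk, connect_fn do_connect)
{
	assert(chk != NULL && chk->fd == -1);

	if (do_connect(&chk->fd) != 0) {
		/* The fix: forget the fd so that the next wakeup starts
		 * a new check instead of closing an unrelated fd.
		 */
		chk->fd = -1;
		return -1;
	}
	return 0;
}

int check_is_running(const struct check *chk)
{
	return chk->fd >= 0;
}

/* Stub simulating a lack of local ports: it leaves a stale value
 * in *fd and reports an error.
 */
int connect_no_port(int *fd)
{
	*fd = 42;
	return -1;
}
```

Without the reset in the error path, check_is_running() would report a running check on fd 42, which may by then belong to another connection.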
Willy Tarreau [Thu, 22 Nov 2012 00:11:33 +0000 (01:11 +0100)]
[RELEASE] Released version 1.5-dev13
Released version 1.5-dev13 with the following main changes :
- BUILD: fix build issue without USE_OPENSSL
- BUILD: fix compilation error with DEBUG_FULL
- DOC: ssl: remove prefer-server-ciphers documentation
- DOC: ssl: surround keywords with quotes
- DOC: fix minor typo on http-send-name-header
- BUG/MEDIUM: acls using IPv6 subnets patterns incorrectly match IPs
- BUG/MAJOR: fix a segfault on option http_proxy and url_ip acl
- MEDIUM: http: accept IPv6 values with (s)hdr_ip acl
- BUILD: report zlib support in haproxy -vv
- DOC: compression: add some details and clean up the formatting
- DOC: Change is_ssl acl to ssl_fc acl in example
- DOC: make it clear what the HTTP request size is
- MINOR: ssl: try to load Diffie-Hellman parameters from cert file
- DOC: ssl: update 'crt' statement on 'bind' about Diffie-Hellman parameters loading
- MINOR: ssl: add elliptic curve Diffie-Hellman support for ssl key generation
- DOC: ssl: add 'ecdhe' statement on 'bind'
- MEDIUM: ssl: add client certificate authentication support
- DOC: ssl: add 'verify', 'cafile' and 'crlfile' statements on 'bind'
- MINOR: ssl: add fetch and ACL 'client_crt' to test a client cert is present
- DOC: ssl: add fetch and ACL 'client_cert'
- MINOR: ssl: add ignore verify errors options
- DOC: ssl: add 'ca-ignore-err' and 'crt-ignore-err' statements on 'bind'
- MINOR: ssl: add fetch and ACL 'ssl_verify_result'
- DOC: ssl: add fetch and ACL 'ssl_verify_result'
- MINOR: ssl: add fetches and ACLs to return verify errors
- DOC: ssl: add fetches and ACLs 'ssl_verify_crterr', 'ssl_verify_caerr', and 'ssl_verify_crterr_depth'
- MINOR: ssl: disable shared memory and locks on session cache if nbproc == 1
- MINOR: ssl: add build param USE_PRIVATE_CACHE to build cache without shared memory
- MINOR: ssl : add statements 'notlsv11' and 'notlsv12' and rename 'notlsv1' to 'notlsv10'.
- DOC: ssl : add statements 'notlsv11' and 'notlsv12' and rename 'notlsv1' to 'notlsv10'.
- MEDIUM: config: authorize frontend and listen without bind.
- MINOR: ssl: add statement 'no-tls-tickets' on bind to disable stateless session resumption
- DOC: ssl: add 'no-tls-tickets' statement documentation.
- BUG/MINOR: ssl: Fix CRL check was not enabled when crlfile was specified.
- BUG/MINOR: build: Fix compilation issue on openssl 0.9.6 due to missing CRL feature.
- BUG/MINOR: conf: Fix 'maxsslconn' statement error if built without OPENSSL.
- BUG/MINOR: build: Fix failure with USE_OPENSSL=1 and USE_FUTEX=1 on archs i486 and i686.
- MINOR: ssl: remove prefer-server-ciphers statement and set it as the default on ssl listeners.
- BUG/MEDIUM: ssl: subsequent handshakes fail after server configuration changes
- MINOR: ssl: add 'crt-base' and 'ca-base' global statements.
- MEDIUM: conf: rename 'nosslv3' and 'notlsvXX' statements 'no-sslv3' and 'no-tlsvXX'.
- MEDIUM: conf: rename 'cafile' and 'crlfile' statements 'ca-file' and 'crl-file'
- MINOR: ssl: use bit fields to store ssl options instead of one int each
- MINOR: ssl: add 'force-sslv3' and 'force-tlsvXX' statements on bind.
- MINOR: ssl: add 'force-sslv3' and 'force-tlsvXX' statements on server
- MINOR: ssl: add defines LISTEN_DEFAULT_CIPHERS and CONNECT_DEFAULT_CIPHERS.
- BUG/MINOR: ssl: Fix issue on server statements 'no-tls*' and 'no-sslv3'
- MINOR: ssl: move ssl context init for servers from cfgparse.c to ssl_sock.c
- MEDIUM: ssl: reject ssl server keywords in default-server statement
- MINOR: ssl: add statement 'no-tls-tickets' on server side.
- MINOR: ssl: add statements 'verify', 'ca-file' and 'crl-file' on servers.
- DOC: Fix rename of options cafile and crlfile to ca-file and crl-file.
- MINOR: sample: manage binary to string type convertion in stick-table and samples.
- MINOR: acl: add parse and match primitives to use binary type on ACLs
- MINOR: sample: export 'sample_get_trash_chunk(void)'
- MINOR: conf: rename all ssl modules fetches using prefix 'ssl_fc' and 'ssl_c'
- MINOR: ssl: add pattern and ACLs fetches 'ssl_fc_protocol', 'ssl_fc_cipher', 'ssl_fc_use_keysize' and 'ssl_fc_alg_keysize'
- MINOR: ssl: add pattern fetch 'ssl_fc_session_id'
- MINOR: ssl: add pattern and ACLs fetches 'ssl_c_version' and 'ssl_f_version'
- MINOR: ssl: add pattern and ACLs fetches 'ssl_c_s_dn', 'ssl_c_i_dn', 'ssl_f_s_dn' and 'ssl_c_i_dn'
- MINOR: ssl: add pattern and ACLs 'ssl_c_sig_alg' and 'ssl_f_sig_alg'
- MINOR: ssl: add pattern and ACLs fetches 'ssl_c_key_alg' and 'ssl_f_key_alg'
- MINOR: ssl: add pattern and ACLs fetches 'ssl_c_notbefore', 'ssl_c_notafter', 'ssl_f_notbefore' and 'ssl_f_notafter'
- MINOR: ssl: add 'crt' statement on server.
- MINOR: ssl: checks the consistency of a private key with the corresponding certificate
- BUG/MEDIUM: ssl: review polling on reneg.
- BUG/MEDIUM: ssl: Fix some reneg cases not correctly handled.
- BUG/MEDIUM: ssl: Fix sometimes reneg fails if requested by server.
- MINOR: build: allow packagers to specify the ssl cache size
- MINOR: conf: add warning if ssl is not enabled and a certificate is present on bind.
- MINOR: ssl: Add tune.ssl.lifetime statement in global.
- MINOR: compression: Enable compression for IE6 w/SP2, IE7 and IE8
- BUG: http: revert broken optimisation from 82fe75c1a79dac933391501b9d293bce34513755
- DOC: duplicate ssl_sni section
- MEDIUM: HTTP compression (zlib library support)
- CLEANUP: use struct comp_ctx instead of union
- BUILD: remove dependency to zlib.h
- MINOR: compression: memlevel and windowsize
- MEDIUM: use pool for zlib
- MINOR: compression: try init in cfgparse.c
- MINOR: compression: init before deleting headers
- MEDIUM: compression: limit RAM usage
- MINOR: compression: tune.comp.maxlevel
- MINOR: compression: maximum compression rate limit
- MINOR: log-format: check number of arguments in cfgparse.c
- BUG/MEDIUM: compression: no Content-Type header but type in configuration
- BUG/MINOR: compression: deinit zlib only when required
- MEDIUM: compression: don't compress when no data
- MEDIUM: compression: use pool for comp_ctx
- MINOR: compression: rate limit in 'show info'
- MINOR: compression: report zlib memory usage
- BUG/MINOR: compression: dynamic level increase
- DOC: compression: unsupported cases.
- MINOR: compression: CPU usage limit
- MEDIUM: http: add "redirect scheme" to ease HTTP to HTTPS redirection
- BUG/MAJOR: ssl: missing tests in ACL fetch functions
- MINOR: config: add a function to indent error messages
- REORG: split "protocols" files into protocol and listener
- MEDIUM: config: replace ssl_conf by bind_conf
- CLEANUP: listener: remove unused conf->file and conf->line
- MEDIUM: listener: add a minimal framework to register "bind" keyword options
- MEDIUM: config: move the "bind" TCP parameters to proto_tcp
- MEDIUM: move bind SSL parsing to ssl_sock
- MINOR: config: improve error reporting for "bind" lines
- MEDIUM: config: move the common "bind" settings to listener.c
- MEDIUM: config: move all unix-specific bind keywords to proto_uxst.c
- MEDIUM: config: enumerate full list of registered "bind" keywords upon error
- MINOR: listener: add a scope field in the bind keyword lists
- MINOR: config: pass the file and line to config keyword parsers
- MINOR: stats: fill the file and line numbers in the stats frontend
- MINOR: config: set the bind_conf entry on listeners created from a "listen" line.
- MAJOR: listeners: use dual-linked lists to chain listeners with frontends
- REORG: listener: move unix perms from the listener to the bind_conf
- BUG: backend: balance hdr was broken since 1.5-dev11
- MINOR: standard: make memprintf() support a NULL destination
- MINOR: config: make str2listener() use memprintf() to report errors.
- MEDIUM: stats: remove the stats_sock struct from the global struct
- MINOR: ssl: set the listeners' data layer to ssl during parsing
- MEDIUM: stats: make use of the standard "bind" parsers to parse global socket
- DOC: move bind options to their own section
- DOC: stats: refer to "bind" section for "stats socket" settings
- DOC: fix index to reference bind and server options
- BUG: http: do not print garbage on invalid requests in debug mode
- BUG/MINOR: config: check the proper pointer to report unknown protocol
- CLEANUP: connection: offer conn_prepare() to set up a connection
- CLEANUP: config: fix typo inteface => interface
- BUG: stats: fix regression introduced by commit 4348fad1
- MINOR: cli: allow to set frontend maxconn to zero
- BUG/MAJOR: http: chunk parser was broken with buffer changes
- MEDIUM: monitor: simplify handling of monitor-net and mode health
- MINOR: connection: add a pointer to the connection owner
- MEDIUM: connection: make use of the owner instead of container_of
- BUG/MINOR: ssl: report the L4 connection as established when possible
- BUG/MEDIUM: proxy: must not try to stop disabled proxies upon reload
- BUG/MINOR: config: use a copy of the file name in proxy configurations
- BUG/MEDIUM: listener: don't pause protocols that do not support it
- MEDIUM: proxy: add the global frontend to the list of normal proxies
- BUG/MINOR: epoll: correctly disable FD polling in fd_rem()
- MINOR: signal: really ignore signals configured with no handler
- MINOR: buffers: add a few functions to write chars, strings and blocks
- MINOR: raw_sock: always report asynchronous connection errors
- MEDIUM: raw_sock: improve connection error reporting
- REORG: connection: rename the data layer the "transport layer"
- REORG: connection: rename app_cb "data"
- MINOR: connection: provide a generic data layer wakeup callback
- MINOR: connection: split conn_prepare() in two functions
- MINOR: connection: add an init callback to the data_cb struct
- MEDIUM: session: use a specific data_cb for embryonic sessions
- MEDIUM: connection: use a generic data-layer init() callback
- MEDIUM: connection: reorganize connection flags
- MEDIUM: connection: only call the data->wake callback on activity
- MEDIUM: connection: make it possible for data->wake to return an error
- MEDIUM: session: register a data->wake callback to process errors
- MEDIUM: connection: don't call the data->init callback upon error
- MEDIUM: connection: it's not the data layer's role to validate the connection
- MEDIUM: connection: automatically disable polling on error
- REORG: connection: move the PROXY protocol management to connection.c
- MEDIUM: connection: add a new local send-proxy transport callback
- MAJOR: checks: make use of the connection layer to send checks
- REORG: server: move the check-specific parts into a check subsection
- MEDIUM: checks: use real buffers to store requests and responses
- MEDIUM: check: add the ctrl and transport layers in the server check structure
- MAJOR: checks: completely use the connection transport layer
- MEDIUM: checks: add the "check-ssl" server option
- MEDIUM: checks: enable the PROXY protocol with health checks
- CLEANUP: checks: remove minor warnings for assigned but not used variables
- MEDIUM: tcp: enable TCP Fast Open on systems which support it
- BUG: connection: fix regression from commit 9e272bf9
- CLEANUP: cttproxy: remove a warning on undeclared close()
- BUG/MAJOR: ensure that hdr_idx is always reserved when L7 fetches are used
- MEDIUM: listener: add support for linux's accept4() syscall
- MINOR: halog: sort output by cookie code
- BUG/MINOR: halog: -ad/-ac report the correct number of output lines
- BUG/MINOR: halog: fix help message for -ut/-uto
- MINOR: halog: add a parameter to limit output line count
- BUILD: accept4: move the socketcall declaration outside of accept4()
- MINOR: server: add minimal infrastructure to parse keywords
- MINOR: standard: make indent_msg() support empty messages
- MEDIUM: server: check for registered keywords when parsing unknown keywords
- MEDIUM: server: move parsing of keyword "id" to server.c
- BUG/MEDIUM: config: check-send-proxy was ignored if SSL was not builtin
- MEDIUM: ssl: move "server" keyword SSL options parsing to ssl_sock.c
- MEDIUM: log: suffix the frontend's name with '~' when using SSL
- MEDIUM: connection: always unset the transport layer upon close
- BUG/MINOR: session: fix some leftover from debug code
- BUG/MEDIUM: session: enable the conn_session_update() callback
- MEDIUM: connection: add a flag to hold the transport layer
- MEDIUM: log: add a new LW_XPRT flag to pin the transport layer
- MINOR: log: make lf_text use a const char *
- MEDIUM: log: report SSL ciphers and version in logs using logformat %sslc/%sslv
- REORG: http: rename msg->buf to msg->chn since it's a channel
- CLEANUP: http: use 'chn' to name channel variables, not 'buf'
- CLEANUP: channel: use 'chn' instead of 'buf' as local variable names
- CLEANUP: tcp: use 'chn' instead of 'buf' or 'b' for channel pointer names
- CLEANUP: stream_interface: use 'chn' instead of 'b' to name channel pointers
- CLEANUP: acl: use 'chn' instead of 'b' to name channel pointers
- MAJOR: channel: replace the struct buffer with a pointer to a buffer
- OPTIM: channel: reorganize struct members to improve cache efficiency
- CLEANUP: session: remove term_trace which is not used anymore
- OPTIM: session: reorder struct session fields
- OPTIM: connection: pack the struct target
- DOC: document relations between internal entities
- MINOR: ssl: add 'ssl_npn' sample/acl to extract TLS/NPN information
- BUILD: ssl: fix shctx build on older compilers
- MEDIUM: ssl: add support for the "npn" bind keyword
- BUG: ssl: fix ssl_sni ACLs to correctly process regular expressions
- MINOR: chunk: provide string compare functions
- MINOR: sample: accept fetch keywords without parenthesis
- MEDIUM: sample: pass an empty list instead of a null for fetch args
- MINOR: ssl: improve socket behaviour upon handshake abort.
- BUG/MEDIUM: http: set DONTWAIT on data when switching to tunnel mode
- MEDIUM: listener: provide a fallback for accept4() when not supported
- BUG/MAJOR: connection: risk of crash on certain tricky close scenario
- MEDIUM: cli: allow the stats socket to be bound to a specific set of processes
- OPTIM: channel: inline channel_forward's fast path
- OPTIM: http: inline http_parse_chunk_size() and http_skip_chunk_crlf()
- OPTIM: tools: inline hex2i()
- CLEANUP: http: rename HTTP_MSG_DATA_CRLF state
- MINOR: compression: automatically disable compression for older browsers
- MINOR: compression: optimize memLevel to improve byte rate
- BUG/MINOR: http: compression should consider all Accept-Encoding header values
- BUILD: fix coexistence of openssl and zlib
- MINOR: ssl: add pattern and ACLs fetches 'ssl_c_serial' and 'ssl_f_serial'
- BUG/MEDIUM: command-line option -D must have precedence over "debug"
- MINOR: tools: add a clear_addr() function to unset an address
- BUG/MEDIUM: tcp: transparent bind to the source only when address is set
- CLEANUP: remove trashlen
- MAJOR: session: detach the connections from the stream interfaces
- DOC: update document describing relations between internal entities
- BUILD: make it possible to specify ZLIB path
- MINOR: compression: add an offload option to remove the Accept-Encoding header
- BUG: compression: disable auto-close and enable MSG_MORE during transfer
- CLEANUP: completely remove trashlen
- MINOR: chunk: add a function to reset a chunk
- CLEANUP: replace chunk_printf() with chunk_appendf()
- MEDIUM: make the trash be a chunk instead of a char *
- MEDIUM: remove remains of BUFSIZE in HTTP auth and sample conversions
- MEDIUM: stick-table: allocate the table key of size buffer size
- BUG/MINOR: stream_interface: don't loop over ->snd_buf()
- BUG/MINOR: session: ensure that we don't retry connection if some data were sent
- OPTIM: session: don't process the whole session when only timers need a refresh
- BUG/MINOR: session: mark the handshake as complete earlier
- MAJOR: connection: remove the CO_FL_CURR_*_POL flag
- BUG/MAJOR: always clear the CO_FL_WAIT_* flags after updating polling flags
- MAJOR: sepoll: make the poller totally event-driven
- OPTIM: stream_interface: disable reading when CF_READ_DONTWAIT is set
- BUILD: compression: remove a build warning
- MEDIUM: fd: don't unset fdtab[].updated upon delete
- REORG: fd: move the speculative I/O management from ev_sepoll
- REORG: fd: move the fd state management from ev_sepoll
- REORG: fd: centralize the processing of speculative events
- BUG: raw_sock: also consider ENOTCONN in addition to EAGAIN
- BUILD: stream_interface: remove si_fd() and its references
- BUILD: compression: enable build in BSD and OSX Makefiles
- MAJOR: ev_select: make the poller support speculative events
- MAJOR: ev_poll: make the poller support speculative events
- MAJOR: ev_kqueue: make the poller support speculative events
- MAJOR: polling: replace epoll with sepoll and remove sepoll
- MAJOR: polling: remove unused callbacks from the poller struct
- MEDIUM: http: refrain from sending "Connection: close" when Upgrade is present
- CLEANUP: channel: remove any reference of the hijackers
- CLEANUP: stream_interface: remove the external task type target
- MAJOR: connection: replace struct target with a pointer to an enum
- BUG: connection: fix typo in previous commit
- BUG: polling: don't skip polled events in the spec list
- MINOR: splice: disable it when the system returns EBADF
- MINOR: build: allow packagers to specify the default maxzlibmem
- BUG: halog: fix broken output limitation
- BUG: proxy: fix server name lookup in get_backend_server()
- BUG: compression: do not always increment the round counter on allocation failure
- BUG/MEDIUM: compression: release the zlib pools between keep-alive requests
- MINOR: global: don't prevent nbproc from being redefined
- MINOR: config: support process ranges for "bind-process"
- MEDIUM: global: add support for CPU binding on Linux ("cpu-map")
- MINOR: ssl: rename and document the tune.ssl.cachesize option
- DOC: update the PROXY protocol spec to support v2
- MINOR: standard: add a simple popcount function
- MEDIUM: adjust the maxaccept per listener depending on the number of processes
- BUG: compression: properly disable compression when content-type does not match
- MINOR: cli: report connection status in "show sess xxx"
- BUG/MAJOR: stream_interface: certain workloads could cause get stuck
- BUILD: cli: fix build when SSL is enabled
- MINOR: cli: report the fd state in "show sess xxx"
- MINOR: cli: report an error message on missing argument to compression rate
- MINOR: http: add some debugging functions to pretty-print msg state names
- BUG/MAJOR: stream_interface: read0 not always handled since dev12
- DOC: documentation on http header capture is wrong
- MINOR: http: allow the cookie capture size to be changed
- DOC: http header capture has not been limited in size for a long time
- DOC: update readme with build methods for BSD
- BUILD: silence a warning on Solaris about usage of isdigit()
- MINOR: stats: report HTTP compression stats per frontend and per backend
- MINOR: log: add '%Tl' to log-format
- MINOR: samples: update the url_param fetch to match parameters in the path
Willy Tarreau [Wed, 21 Nov 2012 07:27:21 +0000 (08:27 +0100)]
MINOR: stats: report HTTP compression stats per frontend and per backend
It was a bit frustrating to have no idea about the bandwidth saved by
HTTP compression. Now we have per-frontend and per-backend stats. The
stats on the HTTP interface are shown in a hover title in the "bytes out"
column if at least something was fed to the compressor. 3 new columns
appeared in the CSV stats output.
Willy Tarreau [Wed, 21 Nov 2012 23:21:46 +0000 (00:21 +0100)]
DOC: http header capture has not been limited in size for a long time
It's been documented for a very long time that captured HTTP headers
were limited to 64 characters, but this has not been the case since
1.3.11 in 2007 (commit cf7f320f), as they all use their own pool and
have no such limit anymore.
Willy Tarreau [Wed, 21 Nov 2012 23:17:38 +0000 (00:17 +0100)]
MINOR: http: allow the cookie capture size to be changed
Some users need more than 64 characters to log large cookies. The limit
was set to 63 characters (and not 64 as previously documented). Now it
is possible to change this using the global "tune.http.cookielen" setting
if required.
Willy Tarreau [Wed, 21 Nov 2012 22:37:37 +0000 (23:37 +0100)]
DOC: documentation on http header capture is wrong
Since commit it is said that only the first value of the first occurrence
of a header is captured. This is wrong. Since the introduction of header
captures in version 1.1 in 2005 (commit e983144d), the WHOLE line of the
LAST occurrence has been captured and the behaviour has never changed.
At this time the doc was correct. The error was introduced in the new doc
in 1.3.14 in 2007 (commit 0ba27505).
Willy Tarreau [Wed, 21 Nov 2012 20:51:53 +0000 (21:51 +0100)]
BUG/MAJOR: stream_interface: read0 not always handled since dev12
The connection handling changes introduced in 1.5-dev12 caused a
regression with commit 9bf9c14c. The issue is that the stream_sock_read0()
callback must update the channel flags to indicate that the side is closed
so that when process_session() is called, it can propagate the close to the
other side and terminate the session.
The issue only appears in HTTP tunnel mode. It's a bit tricky to trigger
the issue: it requires that the request channel is full with data flowing
from the client to the server and that both the response and the read0()
are received at once so that the flags are not updated, and that the HTTP
analyser switches to tunnel mode without being informed that the request
write side is closed. After that, process_session() does not know that the
connection has to be aborted either, and no more events appear on this side,
so the connection stays there forever.
Many thanks to Igor at owind for testing several snapshots and for providing
valuable traces to reproduce and diagnose the issue!
New option 'maxcompcpuusage' in global section.
Sets the maximum CPU usage HAProxy can reach before stopping the
compression for new requests or decreasing the compression level of
current requests. It works like 'maxcomprate' but with the Idle.
Using compression rate limit, the compression level wasn't taking care
of the max compression level during a session because the test was done
on the wrong variable.
Willy Tarreau [Mon, 19 Nov 2012 15:43:14 +0000 (16:43 +0100)]
BUG/MAJOR: stream_interface: certain workloads could cause get stuck
Some very specifically scheduled workloads could sometimes get stuck when
data receive was disabled due to buffer full then re-enabled due to a full
send(). A conn_data_want_recv() had to be set again in this specific case.
This bug was introduced with connection rework and polling changes in dev12.
This patch makes changes in the http_response_forward_body state
machine. It checks if the compress algorithm had consumed data before
swapping the temporary and the input buffer. So it prevents null sized
zlib chunks.
Willy Tarreau [Mon, 19 Nov 2012 13:55:02 +0000 (14:55 +0100)]
BUG: compression: properly disable compression when content-type does not match
Disabling compression based on the content-type was improperly done since the
introduction of the COMP_READY flag, sometimes resulting in truncated responses.
Willy Tarreau [Mon, 19 Nov 2012 11:39:59 +0000 (12:39 +0100)]
MEDIUM: adjust the maxaccept per listener depending on the number of processes
global.tune.maxaccept was used for all listeners. This becomes really not
convenient when some listeners are bound to a single process and other ones
are bound to many processes.
Now we change the principle : we count the number of processes a listener
is bound to, and apply the maxaccept either entirely if there is a single
process, or divided by twice the number of processes in order to maintain
fairness.
The default limit has also been increased from 32 to 64 as it appeared that
on small machines, 32 was too low to achieve high connection rates.
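The rule above can be written as a small function. This is only a sketch: listener_maxaccept() is a hypothetical name, and both the truncating integer division and the floor of 1 are assumptions, since the commit message does not state how rounding is handled.

```c
#include <assert.h>

/* Hypothetical helper, for illustration only.
 * Returns the per-listener accept limit for a listener bound to
 * nbproc processes, given the global limit.
 */
int listener_maxaccept(int global_maxaccept, int nbproc)
{
	int max;

	assert(global_maxaccept > 0 && nbproc >= 1);

	/* a single process gets the full limit */
	if (nbproc == 1)
		return global_maxaccept;

	/* otherwise divide by twice the number of processes */
	max = global_maxaccept / (2 * nbproc);

	/* assumption: never go below one accept per wakeup */
	return max > 0 ? max : 1;
}
```

With the new default of 64, a listener bound to 4 processes would accept at most 8 connections per wakeup under these assumptions.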
Willy Tarreau [Mon, 19 Nov 2012 10:27:29 +0000 (11:27 +0100)]
DOC: update the PROXY protocol spec to support v2
The doc update covers the following points :
- description of protocol version 2
- discourage emission of UNKNOWN and encourage its acceptance
- clarify that each header must fit in an MSS and be sent at once
- provide an example of receiver code that explains how to use MSG_PEEK.
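To give an idea of what a MSG_PEEK-based receiver looks like, here is a minimal sketch. It is not the example from the spec: it handles only the text format of version 1, pp1_consume_header() is a hypothetical name, and it relies on the point above that the whole header is sent at once.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define PP1_MAX_LEN 107 /* longest version 1 line, CRLF included */

/* Hypothetical receiver, for illustration only.
 * Peeks at the beginning of the stream, and if a version 1 header
 * is present, consumes exactly that header and copies it into hdr
 * without the trailing CRLF. Returns the number of bytes consumed,
 * 0 if the stream does not start with a PROXY line (nothing is
 * consumed), or -1 on error or incomplete header.
 */
ssize_t pp1_consume_header(int fd, char *hdr, size_t hdr_size)
{
	char buf[PP1_MAX_LEN];
	ssize_t len, hlen;
	char *cr;

	assert(hdr != NULL && hdr_size > PP1_MAX_LEN);

	/* look at the data without removing it from the socket */
	len = recv(fd, buf, sizeof(buf), MSG_PEEK);
	if (len <= 0)
		return -1;
	if (len < 6 || memcmp(buf, "PROXY ", 6) != 0)
		return 0;

	/* the header ends at the first CRLF */
	cr = memchr(buf, '\r', (size_t)len);
	if (cr == NULL || cr + 1 >= buf + len || cr[1] != '\n')
		return -1;

	/* now really read the header, and only the header */
	hlen = (cr - buf) + 2;
	if (recv(fd, hdr, (size_t)hlen, 0) != hlen)
		return -1;
	hdr[hlen - 2] = '\0';
	return hlen;
}
```

Because only the header bytes are consumed, the payload that follows stays in the socket buffer for the regular protocol parser.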
Willy Tarreau [Fri, 16 Nov 2012 15:12:27 +0000 (16:12 +0100)]
MEDIUM: global: add support for CPU binding on Linux ("cpu-map")
The new "cpu-map" directive allows one to assign the CPU sets that
a process is allowed to bind to. This is useful in combination with
the "nbproc" and "bind-process" directives.
The support is implicit on Linux 2.6.28 and above.
Willy Tarreau [Thu, 15 Nov 2012 16:38:15 +0000 (17:38 +0100)]
MINOR: global: don't prevent nbproc from being redefined
Having nbproc preinitialized to zero is really annoying as it prevents
some checks from being correctly performed. Also the check to prevent
nbproc from being redefined is totally useless, so let's preset it to
1 and remove the test.
Willy Tarreau [Thu, 15 Nov 2012 15:41:22 +0000 (16:41 +0100)]
BUG/MEDIUM: compression: release the zlib pools between keep-alive requests
There was a possible memory leak in the zlib code when the first response of
a keep-alive session was compressed, because the next request would reset the
compression algo, preventing a later call to session_free() from releasing it.
The reason is that it is necessary to release the assigned resources in
http_end_txn_clean_session().
Willy Tarreau [Thu, 15 Nov 2012 13:57:56 +0000 (14:57 +0100)]
BUG: compression: do not always increment the round counter on allocation failure
Zlib (at least 1.2 and 1.3) aborts when it fails to allocate the state, so we
must not count a round on this event. If the state succeeds, then it allocates
all the 4 remaining counters at once.
Emeric Brun [Wed, 14 Nov 2012 10:32:56 +0000 (11:32 +0100)]
MINOR: build: allow packagers to specify the ssl cache size
This is done by passing the default value to SSLCACHESIZE in sessions.
Users can use tune.sslcachesize to change this value.
By default, it is set to 20000 sessions, like the OpenSSL internal cache size.
Currently, a session entry size is between 592 and 616 bytes depending on the arch.
Willy Tarreau [Wed, 14 Nov 2012 23:15:18 +0000 (00:15 +0100)]
BUG: proxy: fix server name lookup in get_backend_server()
The lookup was broken by commit 050536d5. The server ID is
initialized to a negative value but unfortunately not all the
tests were converted. Thanks to Igor at owind for reporting it.
Willy Tarreau [Tue, 13 Nov 2012 19:48:15 +0000 (20:48 +0100)]
BUG: halog: fix broken output limitation
Commit 667c905f introduced parameter -m to halog which limits the size
of the output. Unfortunately it is completely broken in that it doesn't
check that the limit was previously set or not, and also prevents a
simple counting operation from returning anything if a limit is not set.
Note that the -gt and -pct outputs behave differently in the face of this
limit, since they count the valid output lines BEFORE actually producing
the data, so the limit really applies to valid input lines.
This clearly is a kernel issue since all FDs are valid here, so let's
simply disable splice() on the connection when this happens so that
the session correctly recovers from that issue using recv().
Emeric Brun [Thu, 8 Nov 2012 18:21:55 +0000 (19:21 +0100)]
BUG/MEDIUM: ssl: Fix sometimes reneg fails if requested by server.
SSL_do_handshake is not appropriate for reneg, it's only appropriate at the
beginning of a connection. OpenSSL correctly handles renegs using the data
functions, so we use SSL_peek() here to make its state machine progress if
SSL_renegotiate_pending() says a reneg is pending.
Emeric Brun [Thu, 8 Nov 2012 17:02:56 +0000 (18:02 +0100)]
BUG/MEDIUM: ssl: Fix some reneg cases not correctly handled.
SSL may decide to switch to a handshake in the middle of a transfer due to
a reneg. In this case we don't want to re-enable polling because data might
have been left pending in the buffer. We just want to switch immediately to
the handshake mode.
Willy Tarreau [Mon, 12 Nov 2012 00:57:14 +0000 (01:57 +0100)]
BUG: polling: don't skip polled events in the spec list
Commit 09f245 came with a bug : if we don't process events from the
spec list that are also being polled, we can end up with some stuck
events that nobody processes.
We must process all events from the spec list even if they're being
polled in parallel.
Willy Tarreau [Sun, 11 Nov 2012 23:42:33 +0000 (00:42 +0100)]
MAJOR: connection: replace struct target with a pointer to an enum
Instead of storing an (int, ptr) pair in the struct connection
and the struct session, we use a different method : we only store a
pointer to an integer which is stored inside the target object and
which contains a unique type identifier. That way, the pointer allows
us to retrieve the object type (by dereferencing it) and the object's
address (by computing the displacement in the target structure). The
NULL pointer always corresponds to OBJ_TYPE_NONE.
This reduces the size of the connection and session structs. It also
simplifies target assignment and comparison.
In order to improve the generated code, we try to put the obj_type
element at the beginning of all the structs (listener, server, proxy,
si_applet), so that the original and target pointers are always equal.
A lot of code was touched by massive replaces, but the changes are not
that important.
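The principle can be sketched as below. This is a simplified illustration and
not the actual code: the struct layouts are hypothetical, and the helper names
obj_type(), objt_server() and objt_listener() are only assumed here. The
listener deliberately keeps obj_type away from the first position to show the
displacement computation.

```c
#include <stddef.h>

enum obj_type {
    OBJ_TYPE_NONE = 0, /* also what a NULL pointer maps to */
    OBJ_TYPE_LISTENER,
    OBJ_TYPE_SERVER,
};

struct server {
    enum obj_type obj_type; /* first member: &srv->obj_type == (void *)srv */
    int puid;
};

struct listener {
    int luid;
    enum obj_type obj_type; /* not first here, only to show the displacement */
};

/* returns the type of the object designated by <t>, NULL means none */
static enum obj_type obj_type(const enum obj_type *t)
{
    return t ? *t : OBJ_TYPE_NONE;
}

/* returns the server containing <t>, or NULL if <t> is not a server */
static struct server *objt_server(enum obj_type *t)
{
    if (obj_type(t) != OBJ_TYPE_SERVER)
        return NULL;
    return (struct server *)((char *)t - offsetof(struct server, obj_type));
}

/* returns the listener containing <t>, or NULL if <t> is not a listener */
static struct listener *objt_listener(enum obj_type *t)
{
    if (obj_type(t) != OBJ_TYPE_LISTENER)
        return NULL;
    return (struct listener *)((char *)t - offsetof(struct listener, obj_type));
}
```

A target is then assigned with "target = &srv->obj_type" and two targets are
compared with a plain pointer comparison.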
Willy Tarreau [Sun, 11 Nov 2012 22:14:16 +0000 (23:14 +0100)]
CLEANUP: stream_interface: remove the external task type target
Before connections were introduced, it was possible to connect an
external task to a stream interface. However it was left as an
exercise for the brave implementer to find how that ought to be
done.
The feature has been broken since the introduction of connections and
was never fixed, due to lack of users. Better remove this dead
code now.
Willy Tarreau [Sun, 11 Nov 2012 22:05:39 +0000 (23:05 +0100)]
CLEANUP: channel: remove any reference of the hijackers
Hijackers were functions designed to inject data into channels in the
distant past. They became unused around 1.3.16, and since there has
not been any user of this mechanism to date, it's uncertain whether
the mechanism still works (and it's not really useful anymore). So
better remove it as well as the pointer it uses in the channel struct.
Willy Tarreau [Sun, 11 Nov 2012 21:19:57 +0000 (22:19 +0100)]
MEDIUM: http: refrain from sending "Connection: close" when Upgrade is present
Some servers are not totally HTTP-compliant when it comes to parsing the
Connection header. This is particularly true with WebSocket where it happens
from time to time that a server doesn't support having a "close" token along
with the "Upgrade" token in the Connection header. This broken behaviour has
also been noticed on some clients though the problem is less frequent on the
response path.
Sometimes the workaround consists in enabling "option http-pretend-keepalive"
to leave the request Connection header untouched, but this is not always the
most convenient solution. This patch introduces a new solution : haproxy now
also looks for the "Upgrade" token in the Connection header and if it finds
it, then it refrains from adding any other token to the Connection header
(though "keep-alive" and "close" may still be removed if found). The same is
done for the response headers.
This way, WebSocket works with fewer changes even when facing non-compliant
clients or servers. At least it fixes the DISCONNECT issue that was seen
on the websocket.org test.
Note that haproxy does not change its internal mode, it just refrains from
adding new tokens to the connection header.
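The token lookup can be illustrated with the sketch below. It is not the
parser used by haproxy: hdr_has_token() and may_add_conn_token() are
hypothetical names, and the value is assumed to be a NUL-terminated,
comma-separated list of tokens.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if <token> is one of the comma-separated tokens found in the
 * header value <value> (case-insensitive comparison), otherwise 0.
 */
static int hdr_has_token(const char *value, const char *token)
{
    size_t tlen = strlen(token);
    const char *p = value;

    while (*p) {
        const char *start, *end;
        size_t i;

        /* skip separators and leading spaces */
        while (*p == ' ' || *p == '\t' || *p == ',')
            p++;
        start = p;
        while (*p && *p != ',')
            p++;
        end = p;
        /* trim trailing spaces */
        while (end > start && (end[-1] == ' ' || end[-1] == '\t'))
            end--;

        if ((size_t)(end - start) == tlen) {
            for (i = 0; i < tlen; i++)
                if (tolower((unsigned char)start[i]) !=
                    tolower((unsigned char)token[i]))
                    break;
            if (i == tlen)
                return 1;
        }
    }
    return 0;
}

/* Returns 1 if a "close" or "keep-alive" token may be added to a Connection
 * header holding <conn_value>, 0 if "Upgrade" is present.
 */
static int may_add_conn_token(const char *conn_value)
{
    return !hdr_has_token(conn_value, "upgrade");
}
```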
Willy Tarreau [Sun, 11 Nov 2012 16:42:00 +0000 (17:42 +0100)]
MAJOR: polling: replace epoll with sepoll and remove sepoll
Now that all pollers make use of speculative I/O, there is no point
having two epoll implementations, so replace epoll with the sepoll code
and remove sepoll which has just become the standard epoll method.
Willy Tarreau [Sun, 11 Nov 2012 19:49:49 +0000 (20:49 +0100)]
MAJOR: ev_kqueue: make the poller support speculative events
The poller was updated to support speculative events. We'll need this
to fully support SSL.
As a side effect, the code has become much simpler and much more
efficient, by taking advantage of the nice kqueue API which supports
batched updates. All references to fd_sets have disappeared, and only
the fdtab[].spec_e fields are used to decide about file descriptor
state.
Willy Tarreau [Sun, 11 Nov 2012 18:27:15 +0000 (19:27 +0100)]
BUILD: stream_interface: remove si_fd() and its references
si_fd() is not used a lot, and breaks builds on OpenBSD 5.2 which
defines this name for its own purpose. It's easy enough to remove
this one-liner function, so let's do it.
Willy Tarreau [Sun, 11 Nov 2012 19:38:30 +0000 (20:38 +0100)]
BUG: raw_sock: also consider ENOTCONN in addition to EAGAIN
A failed send() may return ENOTCONN when the connection is not yet established.
On Linux, we generally see EAGAIN but on OpenBSD we clearly have ENOTCONN, so
let's ensure we poll for write when we encounter this error.
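A minimal sketch of the resulting error classification is shown below. The
names send_err_wants_poll(), try_send() and enum send_status are hypothetical
and do not come from raw_sock; only the errno values are taken from the
description above.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

enum send_status { SEND_DONE, SEND_WANT_POLL, SEND_ERROR };

/* Returns 1 if a failed send() with errno <err> only means "not ready yet",
 * in which case the caller should poll for write instead of aborting.
 */
static int send_err_wants_poll(int err)
{
    return err == EAGAIN || err == EWOULDBLOCK || err == ENOTCONN;
}

/* Tries to send <len> bytes without blocking and stores the number of bytes
 * accepted by the system in *sent.
 */
static enum send_status try_send(int fd, const void *buf, size_t len,
                                 size_t *sent)
{
    ssize_t ret = send(fd, buf, len, MSG_DONTWAIT);

    *sent = 0;
    if (ret >= 0) {
        *sent = (size_t)ret;
        return SEND_DONE;
    }
    return send_err_wants_poll(errno) ? SEND_WANT_POLL : SEND_ERROR;
}
```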
Willy Tarreau [Sun, 11 Nov 2012 15:05:19 +0000 (16:05 +0100)]
REORG: fd: move the fd state management from ev_sepoll
ev_sepoll already provides everything needed to manage FD events
by only manipulating the speculative I/O list. Nothing there is
sepoll-specific so move all this to fd.
Cyril Bonté [Sat, 10 Nov 2012 18:27:47 +0000 (19:27 +0100)]
BUILD: report zlib support in haproxy -vv
Compression algorithms are not always supported depending on build options.
"haproxy -vv" now reports whether zlib is supported and lists the supported
compression algorithms.
Willy Tarreau [Sat, 10 Nov 2012 16:49:37 +0000 (17:49 +0100)]
BUILD: compression: remove a build warning
gcc emits this warning while building free_zlib() :
src/compression.c: In function `free_zlib':
src/compression.c:403: warning: 'pool' might be used uninitialized in this function
This is not a bug as the pool cannot take other values, but let's
pre-initialize it to null to fix the warning.
MINOR: compression: maximum compression rate limit
This patch adds input and output rate calculation on the HTTP compression
feature.
Compression can be limited with a maximum rate value in kilobytes per
second. The rate is set with the global 'maxcomprate' option. You can
change this value dynamically with 'set rate-limit http-compression
global' on the UNIX socket.
This optimisation causes haproxy to time out requests that result
in two TCP packets, one packet containing the header, and one
packet containing the actual data. This is a very typical type
of response from a lot of servers.
[Willy: I suspect the fix might have an impact on the compression code
which I'm not sure completely handles calls with 0 bytes to forward]
Willy Tarreau [Fri, 9 Nov 2012 17:27:26 +0000 (18:27 +0100)]
OPTIM: stream_interface: disable reading when CF_READ_DONTWAIT is set
CF_READ_DONTWAIT was designed to avoid getting an EAGAIN upon recv() when
very few data are expected. It prevents the reader from looping over
recv(). Unfortunately with speculative I/O, it is very common that the
same event has the time to be called twice before the task handles the
data and disables the recv(). This is because not all tasks are always
processed at once.
Instead of leaving the buffer free-wheeling and doing an EAGAIN, we
disable reading as soon as the first recv() succeeds. This way we're
sure that only the next wakeup of the task will re-enable it if needed.
Doing so has totally removed the EAGAIN we were seeing till now (30% of
recv).
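The difference between the two behaviours can be sketched as below. This is
an illustration only: read_chunks() is a hypothetical helper, its <dontwait>
argument stands for CF_READ_DONTWAIT, and the real code disables the read
event instead of just leaving a loop.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Reads from <fd> in chunks of at most <chunk> bytes (capped to the size of
 * the scratch buffer). When <dontwait> is set, it stops right after the first
 * successful recv(). Otherwise it loops until recv() reports EAGAIN, an end
 * of stream or an error. Returns the number of recv() calls performed and
 * stores the number of bytes read in *total.
 */
static int read_chunks(int fd, size_t chunk, int dontwait, size_t *total)
{
    char buf[256];
    int calls = 0;

    *total = 0;
    if (chunk > sizeof(buf))
        chunk = sizeof(buf);

    for (;;) {
        ssize_t ret = recv(fd, buf, chunk, MSG_DONTWAIT);

        calls++;
        if (ret <= 0)
            break; /* EAGAIN, end of stream or error */
        *total += (size_t)ret;
        if (dontwait)
            break; /* one successful recv() is enough */
    }
    return calls;
}
```

Without the flag, the last call of the loop is always the one which fails,
which is the wasted recv() the change gets rid of.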
Willy Tarreau [Tue, 6 Nov 2012 01:34:46 +0000 (02:34 +0100)]
MAJOR: sepoll: make the poller totally event-driven
At the moment sepoll is not 100% event-driven, because a call to fd_set()
on an event which is already being polled will not change its state.
This causes issues with OpenSSL because if some I/O processing is interrupted
after clearing the I/O event (eg: read all data from a socket, can't put it
all into the buffer), then there is no way to call the SSL_read() again once
the buffer releases some space.
The only real solution is to go 100% event-driven. The principle is to use
the spec list as an event cache and that each time an I/O event is reported
by epoll_wait(), this event is automatically scheduled for addition to the
spec list for future calls until the consumer explicitly asks for polling
or stopping.
Doing this is a bit tricky because sepoll used to provide a substantial
number of optimizations such as event merging. These optimizations have
been maintained : a dedicated update list is affected when events change,
but not the event list, so that updates may cancel themselves without any
side effect such as displacing events. A specific case was considered for
handling newly created FDs as soon as they are detected from within the
poll loop. This ensures that their read or write operation will always be
attempted as soon as possible, thus reducing the number of poll loops and
process_session wakeups. This is especially true for newly accepted fds
which immediately perform their first recv() call.
Two new flags were added to the fdtab[] struct to tag the fact that a file
descriptor already exists in the update list. One flag indicates that a
file descriptor is new and has just been created (fdtab[].new) and the other
one indicates that a file descriptor is already referenced by the update list
(fdtab[].updated). Even if the FD state changes during operations or if the
fd is closed and replaced, it's not an issue because the update flag remains
and is easily spotted during list walks. The flag must absolutely reflect the
presence of the fd in the update list in order to avoid overflowing the update
list with more events than there are distinct fds.
Note that this change also recovers the small performance loss introduced
by its connection counter-part and goes even beyond.
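The role of the "updated" flag can be sketched as follows. It is a simplified
model and not the poller's code: MAXFD, fd_updt[], fd_nbupdt, updt_fd(),
fd_insert() and flush_updates() are hypothetical names, and only the two flags
come from the description above.

```c
#include <assert.h>

#define MAXFD 1024

struct fdtab_entry {
    unsigned char updated; /* fd is already referenced by the update list */
    unsigned char new;     /* fd was just created */
};

static struct fdtab_entry fdtab[MAXFD];
static int fd_updt[MAXFD]; /* update list, at most one entry per fd */
static int fd_nbupdt;      /* number of entries in the update list */

/* Schedules <fd> for an update. Thanks to the flag, the list can never hold
 * more than MAXFD entries, whatever the number of state changes.
 */
static void updt_fd(int fd)
{
    assert(fd >= 0 && fd < MAXFD);
    if (fdtab[fd].updated)
        return;
    fdtab[fd].updated = 1;
    fd_updt[fd_nbupdt++] = fd;
}

/* Registers a newly created fd so that it is handled by the next pass. */
static void fd_insert(int fd)
{
    assert(fd >= 0 && fd < MAXFD);
    fdtab[fd].new = 1;
    updt_fd(fd);
}

/* Walks the update list, clears the flags and returns the number of fds
 * processed. A real poller would compare the old and new states here.
 */
static int flush_updates(void)
{
    int i, n = fd_nbupdt;

    for (i = 0; i < n; i++) {
        int fd = fd_updt[i];

        fdtab[fd].updated = 0;
        fdtab[fd].new = 0;
    }
    fd_nbupdt = 0;
    return n;
}
```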
Willy Tarreau [Mon, 5 Nov 2012 19:00:43 +0000 (20:00 +0100)]
BUG/MAJOR: always clear the CO_FL_WAIT_* flags after updating polling flags
The CO_FL_WAIT_* flags were not cleared after updating polling flags.
This means that any caller of these functions that did not clear it
would enable polling instead of speculative I/O. This happens during
the stream interface update call which is performed from the session
handler for example.
As of now it's not a problem yet because speculative I/O and polling
are handled the same way. However with upcoming changes it does cause
some deadlocks because enabling read processing on a file descriptor
where everything was already read will do nothing until something new
happens on this FD.
The correct fix consists in clearing the flags while leaving the update
functions.
This fix does not need any backport as it was introduced with recent
connection changes (dev12) and not triggered until last commit.
Willy Tarreau [Mon, 5 Nov 2012 16:52:26 +0000 (17:52 +0100)]
MAJOR: connection: remove the CO_FL_CURR_*_POL flag
This is the first step of a series of changes aiming at making the
polling totally event-driven. This first change consists in only
remembering at the connection level whether an FD was enabled or not,
regardless of the fact it was being polled or cached. From now on, an
EAGAIN will always be considered as a change so that the pollers are
able to manage a cache and to flush it based on such events. One of
the noticeable effects is that conn_fd_handler() is called once more
per session (6 instead of 5 min) but other update functions are called
less often.
Note that the performance loss caused by this change at the moment is
quite significant, around 2.5%, but the change is needed to have SSL
working correctly in all situations, even when data were read from the
socket and stored in the invisible cache, waiting for some room in the
channel's buffer.
Willy Tarreau [Mon, 5 Nov 2012 23:14:25 +0000 (00:14 +0100)]
BUG/MINOR: session: mark the handshake as complete earlier
There is a small waste of CPU cycles when no handshake is required on an
accepted connection, because we had to perform one call to conn_fd_handler()
to mark the connection CONNECTED and to call process_session() again to say
that nothing happened.
By marking the connection CONNECTED when there is no pending handshake, we
avoid this extra call to process_session().
Willy Tarreau [Thu, 8 Nov 2012 13:49:17 +0000 (14:49 +0100)]
OPTIM: session: don't process the whole session when only timers need a refresh
Having a global expiration timer for a task means that the tasks are regularly
woken up (at least after each expiration timer). It's totally useless and counter
productive to process the whole session upon each such wakeup, and it's fairly
easy to detect such wakeups, so let's just update the task's timer and return
to sleep when this happens.
For 100k concurrent connections with 10s of timeouts, this can save 10k wakeups
per second, which is not bad.
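The fast path can be sketched as below, under strong simplifications: the
struct, the TASK_WOKEN_* values and the tick helpers are hypothetical, ticks
are plain millisecond counters where 0 means "no timeout", and wrapping is
ignored.

```c
#define TASK_WOKEN_TIMER 0x01 /* woken up by the expiration timer */
#define TASK_WOKEN_IO    0x02 /* woken up by some I/O activity */
#define TICK_ETERNITY    0    /* no timeout set */

struct sess {
    unsigned int rex;         /* read timeout expiration date */
    unsigned int wex;         /* write timeout expiration date */
    unsigned int task_expire; /* next date the task must be woken up */
    int full_runs;            /* number of complete processing passes */
};

/* returns the earliest of two dates, ignoring unset ones */
static unsigned int tick_first(unsigned int a, unsigned int b)
{
    if (a == TICK_ETERNITY)
        return b;
    if (b == TICK_ETERNITY)
        return a;
    return a < b ? a : b;
}

static int tick_is_expired(unsigned int t, unsigned int now)
{
    return t != TICK_ETERNITY && t <= now;
}

/* Returns 0 when only the task's timer was refreshed, 1 when the whole
 * session was processed.
 */
static int process_session_sketch(struct sess *s, unsigned int woken,
                                  unsigned int now)
{
    if (woken == TASK_WOKEN_TIMER &&
        !tick_is_expired(s->rex, now) && !tick_is_expired(s->wex, now)) {
        /* nothing expired: update the timer and go back to sleep */
        s->task_expire = tick_first(s->rex, s->wex);
        return 0;
    }
    s->full_runs++; /* stands for the complete session processing */
    return 1;
}
```

The 10k figure follows from the numbers above: 100k connections each woken up
once per 10s timeout period give 100000 / 10 = 10000 wakeups per second.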
With the global maxzlibmem option, you are able to control the maximum
amount of RAM usable for HTTP compression.
A test is done before each zlib allocation. If there isn't enough available
memory, the test fails and so does the zlib initialization, so data won't be
compressed.
The zlib allocator is not used anymore; 5 pools are used for the zlib
compression. Their sizes depend on the window size and the memLevel in
deflateInit2.
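The test performed before each allocation can be sketched as below. This is
not the pool code: zlib_budget_alloc(), zlib_budget_free() and
zlib_used_memory are hypothetical names, malloc() stands in for the pools, and
a limit of 0 is assumed to mean "unlimited".

```c
#include <stddef.h>
#include <stdlib.h>

static size_t zlib_used_memory; /* bytes currently accounted for zlib */
static size_t maxzlibmem;       /* limit in bytes, 0 = unlimited (assumed) */

/* Allocates <size> bytes for zlib unless this would exceed the limit, in
 * which case NULL is returned and the caller must give up on compression.
 */
static void *zlib_budget_alloc(size_t size)
{
    void *p;

    if (maxzlibmem &&
        (size > maxzlibmem || zlib_used_memory > maxzlibmem - size))
        return NULL;

    p = malloc(size);
    if (p)
        zlib_used_memory += size;
    return p;
}

/* Releases an area of <size> bytes obtained from zlib_budget_alloc(). */
static void zlib_budget_free(void *p, size_t size)
{
    if (!p)
        return;
    free(p);
    zlib_used_memory -= size;
}
```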
Willy Tarreau [Mon, 29 Oct 2012 21:41:31 +0000 (22:41 +0100)]
BUG/MINOR: session: ensure that we don't retry connection if some data were sent
With extra-large buffers, it is possible that a lot of data are sent upon
connection establishment before the session is notified. The issue is how
to handle a send() error after some data were actually sent.
At the moment, only a connection error is reported, causing a new connection
attempt and send() to restart after the last data. We absolutely don't want
to retry the connect() if at least one byte was sent, because those data are
lost.
The solution consists in reporting exactly what happens, which is :
- a successful connection attempt
- a read/write error on the channel
That way we go on with sess_establish(), the response analysers are called
and report the appropriate connection state for the error (typically a server
abort while waiting for a response). This mechanism also guarantees that we
won't retry since it's a success. The logs also report the correct connect
time.
Note that 1.4 is not directly affected because it only attempts one send(),
so it cannot detect a send() failure here and distinguish it from a failed
connection attempt. So no backport is needed. Also, this is just a safety belt
we're taking, since this issue should not happen anymore since previous commit.
Willy Tarreau [Mon, 29 Oct 2012 22:27:14 +0000 (23:27 +0100)]
BUG/MINOR: stream_interface: don't loop over ->snd_buf()
It is stupid to loop over ->snd_buf() because the snd_buf() itself already
loops and stops when system buffers are full. By looping over it again,
we lose the information of the full buffers and perform one useless syscall.
Furthermore, this causes issues when dealing with large uploads while waiting
for a connection to establish, as it can report a server reject of some data
as a connection abort, which is wrong.
1.4 does not have this issue as it loops maximum twice (once for each buffer
half) and exits as soon as system buffers are full. So no backport is needed.