Willy Tarreau [Thu, 8 Nov 2012 13:49:17 +0000 (14:49 +0100)]
OPTIM: session: don't process the whole session when only timers need a refresh
Having a global expiration timer for a task means that the tasks are regularly
woken up (at least after each expiration timer). It's totally useless and counter
productive to process the whole session upon each such wakeup, and it's fairly
easy to detect such wakeups, so let's just update the task's timer and return
to sleep when this happens.
For 100k concurrent connections with 10s of timeouts, this can save 10k wakeups
per second, which is not bad.
With the global maxzlibmem option, you are able ton control the maximum
amount of RAM usable for HTTP compression.
A test is done before each zlib allocation, if the there isn't available
memory, the test fail and so the zlib initialization, so data won't be
compressed.
Don't use the zlib allocator anymore, 5 pools are used for the zlib
compression. Their sizes depends of the window size and the memLevel in
deflateInit2.
Willy Tarreau [Mon, 29 Oct 2012 21:41:31 +0000 (22:41 +0100)]
BUG/MINOR: session: ensure that we don't retry connection if some data were sent
With extra-large buffers, it is possible that a lot of data are sent upon
connection establishment before the session is notified. The issue is how
to handle a send() error after some data were actually sent.
At the moment, only a connection error is reported, causing a new connection
attempt and send() to restart after the last data. We absolutely don't want
to retry the connect() if at least one byte was sent, because those data are
lost.
The solution consists in reporting exactly what happens, which is :
- a successful connection attempt
- a read/write error on the channel
That way we go on with sess_establish(), the response analysers are called
and report the appropriate connection state for the error (typically a server
abort while waiting for a response). This mechanism also guarantees that we
won't retry since it's a success. The logs also report the correct connect
time.
Note that 1.4 is not directly affected because it only attempts one send(),
so it cannot detect a send() failure here and distinguish it form a failed
connection attempt. So no backport is needed. Also, this is just a safe belt
we're taking, since this issue should not happen anymore since previous commit.
Willy Tarreau [Mon, 29 Oct 2012 22:27:14 +0000 (23:27 +0100)]
BUG/MINOR: stream_interface: don't loop over ->snd_buf()
It is stupid to loop over ->snd_buf() because the snd_buf() itself already
loops and stops when system buffers are full. But looping again onto it,
we lose the information of the full buffers and perform one useless syscall.
Furthermore, this causes issues when dealing with large uploads while waiting
for a connection to establish, as it can report a server reject of some data
as a connection abort, which is wrong.
1.4 does not have this issue as it loops maximum twice (once for each buffer
half) and exists as soon as system buffers are full. So no backport is needed.
MINOR: compression: Enable compression for IE6 w/SP2, IE7 and IE8
Some old browsers that have a user-agent starting with "Mozilla/4" do
not support compressison correctly, so disable compression for those.
Internet explorer 6 after Windows XP service pack 2, IE 7, and IE 8,
do however support compression and still have a user agent starting
with Mozilla/4, so we try to enable compression for those.
MSIE has a user-agent on this form:
Mozilla/4.0 (compatible; MSIE <version>; ...)
98% of MSIE 6 SP2 user agents start with
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1
The remaining 2% have additional flags before "SV1".
This simplified matching looking for MSIE at exactly position 25
and SV1 at exacly position 51 gives a few false negatives, so sometimes
a compression opportunity is lost.
A test against 3 hours of traffic to around 3000 news sites worldwide
gives less than 0.007% (70ppm) missed compression opportunities.
Willy Tarreau [Mon, 29 Oct 2012 20:56:59 +0000 (21:56 +0100)]
MEDIUM: stick-table: allocate the table key of size buffer size
Keys are copied from samples to stick_table_key. If a key is larger
than the stick_table_key, we have an overflow. In pratice it does not
happen because it requires :
1) a configuration with tune.bufsize larger than BUFSIZE (common)
2) a stick-table configured with keys strictly larger than buffers
3) extraction of data larger than BUFSIZE (eg: using payload())
Points 2 and 3 don't make any sense for a real world configuration. That
said the issue needs be fixed. The solution consists in allocating it the
same size as the global buffer size, just like the samples. This fixes the
issue.
Willy Tarreau [Mon, 29 Oct 2012 19:44:36 +0000 (20:44 +0100)]
MEDIUM: remove remains of BUFSIZE in HTTP auth and sample conversions
Sample conversions rely on two alternative buffers which were previously
allocated as static bufs of size BUFSIZE. Now they're initialized to the
global buffer size. It was the same for HTTP authentication. Note that it
seems that none of them was prone to any mistake when dealing with the
buffer size, but better stay on the safe side by maintaining the old
assumption that a trash buffer is always "large enough".
Willy Tarreau [Mon, 29 Oct 2012 15:51:55 +0000 (16:51 +0100)]
MEDIUM: make the trash be a chunk instead of a char *
The trash is used everywhere to store the results of temporary strings
built out of s(n)printf, or as a storage for a chunk when chunks are
needed.
Using global.tune.bufsize is not the most convenient thing either.
So let's replace trash with a chunk and directly use it as such. We can
then use trash.size as the natural way to get its size, and get rid of
many intermediary chunks that were previously used.
The patch is huge because it touches many areas but it makes the code
a lot more clear and even outlines places where trash was used without
being that obvious.
Willy Tarreau [Mon, 29 Oct 2012 12:23:11 +0000 (13:23 +0100)]
MINOR: chunk: add a function to reset a chunk
This is a first step in avoiding to constantly reinitialize chunks.
It replaces the old chunk_reset() which was not properly named as it
used to drop everything and was only used by chunk_destroy(). It has
been renamed chunk_drop().
Willy Tarreau [Fri, 26 Oct 2012 23:36:34 +0000 (01:36 +0200)]
BUG: compression: disable auto-close and enable MSG_MORE during transfer
We don't want the lower layer to forward a close while we're compressing,
and we want the system to fuse outgoing TCP segments using MSG_MORE as
much as possible to save round trips that can emerge from sending short
packets with a PUSH flag.
A test on a remote busy DSL line consisting in compressing a 100MB file
on the fly full of zeroes only showed a transfer rate of a few kB/s due
to these round trips.
Willy Tarreau [Fri, 26 Oct 2012 18:10:28 +0000 (20:10 +0200)]
MAJOR: session: detach the connections from the stream interfaces
We will need to be able to switch server connections on a session and
to keep idle connections. In order to achieve this, the preliminary
requirement is that the connections can survive the session and be
detached from them.
Right now they're still allocated at exactly the same place, so when
there is a session, there are always 2 connections. We could soon
improve on this by allocating the outgoing connection only during a
connect().
This current patch touches a lot of code and intentionally does not
change any functionnality. Performance tests show no regression (even
a very minor improvement). The doc has not yet been updated.
Willy Tarreau [Fri, 26 Oct 2012 17:57:58 +0000 (19:57 +0200)]
BUG/MEDIUM: tcp: transparent bind to the source only when address is set
Thomas Heil reported that health checks did not work anymore when a backend
or server has "usesrc clientip". This is because the source address is not
set and tcp_bind_socket() tries to bind to that address anyway.
The solution consists in explicitly clearing the source address in the checks
and to make tcp_bind_socket() avoid binding when the address is not set. This
also has an indirect benefit that a useless bind() syscall will be avoided
when using "source 0.0.0.0 usesrc clientip" in health checks.
Willy Tarreau [Fri, 26 Oct 2012 14:04:28 +0000 (16:04 +0200)]
BUG/MEDIUM: command-line option -D must have precedence over "debug"
From the beginning it has been said that -D must always be used on the
command line from startup scripts so that haproxy does not accidentally
stay in foreground when loaded from init script... Except that this has
not been true for a long time now.
The fix is easy and must be backported to 1.4 too which is affected.
Emeric Brun [Mon, 22 Oct 2012 12:11:22 +0000 (14:11 +0200)]
MINOR: ssl: add pattern and ACLs fetches 'ssl_c_notbefore', 'ssl_c_notafter', 'ssl_f_notbefore' and 'ssl_f_notafter'
ssl_c_notbefore: start date of client cert (string, eg: "121022182230Z" for YYMMDDhhmmss[Z])
ssl_c_notafter: end date of client cert (string, eg: "121022182230Z" for YYMMDDhhmmss[Z])
ssl_f_notbefore: start date of frontend cert (string, eg: "121022182230Z" for YYMMDDhhmmss[Z])
ssl_f_notafter: end date of frontend cert (string, eg: "121022182230Z" for YYMMDDhhmmss[Z])
Emeric Brun [Mon, 22 Oct 2012 10:22:55 +0000 (12:22 +0200)]
MINOR: ssl: add pattern and ACLs fetches 'ssl_c_key_alg' and 'ssl_f_key_alg'
ssl_c_key_alg: algo used to encrypt the client's cert key (ex: rsaEncryption)
ssl_f_key_alg: algo used to encrypt the frontend's cert key (ex: rsaEncryption)
Willy Tarreau [Fri, 26 Oct 2012 13:05:35 +0000 (15:05 +0200)]
BUILD: fix coexistence of openssl and zlib
The crappy zlib and openssl libs both define a free_func as a different typedef.
That's a very clever idea to use such a generic name in general purpose libraries,
really... The zlib one is easier to redefine than openssl's, so let's only fix this
one.
Willy Tarreau [Fri, 26 Oct 2012 09:36:40 +0000 (11:36 +0200)]
MINOR: compression: optimize memLevel to improve byte rate
Decreasing the deflateInit2's memLevel parameter from 9 to 8 does not
affect the compression ratio and increases the compression speed by 12%.
Lower values do not increase transfer speed but decrease the compression
ratio so it looks like 8 is optimal.
Willy Tarreau [Fri, 26 Oct 2012 00:11:25 +0000 (02:11 +0200)]
MINOR: compression: automatically disable compression for older browsers
A number of older browsers have many issues with compressed contents. It
happens that all these older browsers announce themselves as "Mozilla/4"
and that despite not being all broken, the amount of working browsers
announcing themselves this way compared to all other ones is so tiny
that it's not worth wasting cycles trying to adapt to every specific
one.
So let's simply disable compression for these older browsers.
This commit introduces HTTP compression using the zlib library.
http_response_forward_body has been modified to call the compression
functions.
This feature includes 3 algorithms: identity, gzip and deflate:
* identity: this is mostly for debugging, and it was useful for
developping the compression feature. With Content-Length in input, it
is making each chunk with the data available in the current buffer.
With chunks in input, it is rechunking, the output chunks will be
bigger or smaller depending of the size of the input chunk and the
size of the buffer. Identity does not apply any change on data.
* gzip: same as identity, but applying a gzip compression. The data
are deflated using the Z_NO_FLUSH flag in zlib. When there is no more
data in the input buffer, it flushes the data in the output buffer
(Z_SYNC_FLUSH). At the end of data, when it receives the last chunk in
input, or when there is no more data to read, it writes the end of
data with Z_FINISH and the ending chunk.
* deflate: same as gzip, but with deflate algorithm and zlib format.
Note that this algorithm has ambiguous support on many browsers and
no support at all from recent ones. It is strongly recommended not
to use it for anything else than experimentation.
You can't choose the compression ratio at the moment, it will be set to
Z_BEST_SPEED (1), as tests have shown very little benefit in terms of
compression ration when going above for HTML contents, at the cost of
a massive CPU impact.
Compression will be activated depending of the Accept-Encoding request
header. With identity, it does not take care of that header.
To build HAProxy with zlib support, use USE_ZLIB=1 in the make
parameters.
This work was initially started by David Du Colombier at Exceliance.
Willy Tarreau [Thu, 25 Oct 2012 17:04:45 +0000 (19:04 +0200)]
CLEANUP: http: rename HTTP_MSG_DATA_CRLF state
This state's name is confusing as it is only used with chunked encoding
and makes newcomers think it's also related to the content-length. Let's
call it CHUNK_CRLF to clear any doubt on this.
Willy Tarreau [Thu, 25 Oct 2012 22:58:22 +0000 (00:58 +0200)]
OPTIM: tools: inline hex2i()
This tiny function was not inlined because initially not much used.
However it's been used un the chunk parser for a while and it became
one of the most CPU-cycle eater there. By inlining it, the chunk parser
speed was increased by 74 %. We're almost 3 times faster than original
with just the last 4 commits.
Willy Tarreau [Thu, 25 Oct 2012 22:21:52 +0000 (00:21 +0200)]
OPTIM: channel: inline channel_forward's fast path
Most calls to channel_forward() are performed with short byte counts and
are already optimized in channel_forward() taking just a few instructions.
Thus it's a waste of CPU cycles to call a function for this, let's just
inline the short byte count case and fall back to the common one for
remaining situations.
Doing so has increased the chunked encoding parser's performance by 12% !
MEDIUM: http: accept IPv6 values with (s)hdr_ip acl
Commit ceb4ac9c states that IPv6 values are accepted by "hdr_ip" acl,
but the code didn't allow it. This patch provides the ability to accept IPv6
values.
BUG/MEDIUM: acls using IPv6 subnets patterns incorrectly match IPs
Some tests revealed that IPs not in the range of IPv6 subnets incorrectly
matched (for example "acl BUG src 2804::/16" applied to a src IP "127.0.0.1").
This is caused by the acl_match_ip() function applies a mask in host byte
order, whereas it should be in network byte order.
Willy Tarreau [Mon, 22 Oct 2012 21:17:18 +0000 (23:17 +0200)]
MEDIUM: cli: allow the stats socket to be bound to a specific set of processes
Using "stats bind-process", it becomes possible to indicate to haproxy which
process will get the incoming connections to the stats socket. It will also
shut down the warning when nbproc > 1.
Willy Tarreau [Mon, 22 Oct 2012 20:47:55 +0000 (22:47 +0200)]
BUG/MAJOR: connection: risk of crash on certain tricky close scenario
In some circumstances, if the connection to the server is aborted while
some data were planned to be sent and the poller reported an ability to
send, then conn_fd_handler() would still call conn->data->send(), causing
the data layer to dereference the now NULL conn->xprt and crash.
So we have to check for conn->xprt validity before calling the data
layer.
This issue was introduced after 1.5-dev12 so it does not need any backport
and does not affect any released version.
Special thanks go to Cristian Ditoiu who once again provided amazing help
to troubleshoot this bug !
Willy Tarreau [Mon, 22 Oct 2012 17:32:55 +0000 (19:32 +0200)]
MEDIUM: listener: provide a fallback for accept4() when not supported
It happens that on some systems, the libc is recent enough to permit
building with accept4() but the kernel does not support it. The result
is then a disaster since no connection is accepted. We now detect this
and automatically fall back to accept() and fcntl() when this happens.
Please consider the following patch for configuration.txt to clarify meaning
of bufsize, maxrewrite and the size of HTTP request which can be processed.
Willy Tarreau [Sat, 20 Oct 2012 08:38:09 +0000 (10:38 +0200)]
BUG/MEDIUM: http: set DONTWAIT on data when switching to tunnel mode
Jaroslaw Bojar diagnosed an issue when haproxy switches to tunnel mode
after a transfer. The response data are sent with the MSG_MORE flag,
causing them to be needlessly queued in the kernel. In order to fix this,
we set the CF_NEVER_WAIT flag on the channels when switching to tunnel
mode.
One issue remained with client-side keep-alive : if the response is sent
before the end of the request, it suffers the same issue for the same
reason. This is easily addressed by setting the CF_SEND_DONTWAIT flag
on the channel when the response has been parsed and we're waiting for
the other side.
The same issue is present in 1.4 so the fix must be backported.
Willy Tarreau [Fri, 19 Oct 2012 18:52:18 +0000 (20:52 +0200)]
MINOR: ssl: improve socket behaviour upon handshake abort.
While checking haproxy's SSL stack with www.ssllabs.com, it appeared that
immediately closing upon a failed handshake caused a TCP reset to be emitted.
This is because OpenSSL does not consume pending data in the socket buffers.
One side effect is that if the reset packet is lost, the client might not get
it. So now when a handshake fails, we try to clean the socket buffers before
closing, resulting in a clean FIN instead of an RST.
Willy Tarreau [Fri, 19 Oct 2012 17:49:09 +0000 (19:49 +0200)]
MEDIUM: sample: pass an empty list instead of a null for fetch args
ACL and sample fetches use args list and it is really not convenient to
check for null args everywhere. Now for empty args we pass a constant
list of end of lists. It will allow us to remove many useless checks.
Willy Tarreau [Fri, 19 Oct 2012 14:47:23 +0000 (16:47 +0200)]
MINOR: sample: accept fetch keywords without parenthesis
fetch keywords which support arguments do not support being called
without parenthesis even if all arguments are optional. Let's fix
this to allow fetch keywords without parenthesis as is already done
in ACLs.
Willy Tarreau [Fri, 19 Oct 2012 13:18:06 +0000 (15:18 +0200)]
MINOR: chunk: provide string compare functions
It's sometimes needed to be able to compare a zero-terminated string with a
chunk, so we now have two functions to do that, one strcmp() equivalent and
one strcasecmp() equivalent.
Willy Tarreau [Thu, 18 Oct 2012 16:57:14 +0000 (18:57 +0200)]
MEDIUM: ssl: add support for the "npn" bind keyword
The ssl_npn match could not work by itself because clients do not use
the NPN extension unless the server advertises the protocols it supports.
Thanks to Simone Bordet for the explanations on how to get it right.
Willy Tarreau [Sat, 13 Oct 2012 12:33:58 +0000 (14:33 +0200)]
OPTIM: connection: pack the struct target
The struct target contains one int and one pointer, causing it to be
64-bit aligned on 64-bit platforms. By marking it "packed", we can
save 8 bytes in struct connection and as many in struct session on
such platforms.
Willy Tarreau [Sat, 13 Oct 2012 09:09:14 +0000 (11:09 +0200)]
CLEANUP: session: remove term_trace which is not used anymore
This field was used to trace precisely where a session was terminated
but it did not survive code rearchitecture and was not used at all
anymore. Let's get rid of it.
Willy Tarreau [Sat, 13 Oct 2012 08:05:56 +0000 (10:05 +0200)]
OPTIM: channel: reorganize struct members to improve cache efficiency
Now that the buffer is moved out of the channel, it is possible to move
the pointer earlier in the struct and reorder some fields. This new
ordering improves overall performance by 2%, mainly saved in the HTTP
parsers and data transfers.
Willy Tarreau [Fri, 12 Oct 2012 21:49:43 +0000 (23:49 +0200)]
MAJOR: channel: replace the struct buffer with a pointer to a buffer
With this commit, we now separate the channel from the buffer. This will
allow us to replace buffers on the fly without touching the channel. Since
nobody is supposed to keep a reference to a buffer anymore, doing so is not
a problem and will also permit some copy-less data manipulation.
Interestingly, these changes have shown a 2% performance increase on some
workloads, probably due to a better cache placement of data.
Willy Tarreau [Fri, 12 Oct 2012 20:51:15 +0000 (22:51 +0200)]
CLEANUP: http: use 'chn' to name channel variables, not 'buf'
These "buf" were confusing as they were really refering to channels. At
most places, a buffer was really all what was needed, so a struct buffer
was used instead. It is possible that the performance has slightly increased
by the removal of pointer offset in many pointer operations by directly
using the buffer pointer instead of the channel pointer.
Willy Tarreau [Fri, 12 Oct 2012 18:17:54 +0000 (20:17 +0200)]
MEDIUM: log: report SSL ciphers and version in logs using logformat %sslc/%sslv
These two new log-format tags report the SSL protocol version (%sslv) and the
SSL ciphers (%sslc) used for the connection with the client. For instance, to
append these information just after the client's IP/port address information
on an HTTP log line, use the following configuration :
Willy Tarreau [Fri, 12 Oct 2012 16:01:49 +0000 (18:01 +0200)]
MEDIUM: log: add a new LW_XPRT flag to pin the transport layer
This flag will have to be set on log tags which require transport layer
information. They will prevent the conn_xprt_close() call from releasing
the transport layer too early.
Willy Tarreau [Fri, 12 Oct 2012 15:50:05 +0000 (17:50 +0200)]
MEDIUM: connection: add a flag to hold the transport layer
When we start logging SSL information, we need the SSL struct to be
present even past the conn_xprt_close() call. In order to achieve this,
we should use refcounting on the connection and the transport layer. At
the moment it's not worth using plain refcounting as only the logs require
this, so instead of real refcounting we just use a flag which will be set
by the log subsystem when SSL data need to be logged.
What happens then is that the xprt->close() call is ignored and the
transport layer is closed again during session_free(), after the log
line is emitted.
Willy Tarreau [Fri, 12 Oct 2012 15:42:13 +0000 (17:42 +0200)]
BUG/MEDIUM: session: enable the conn_session_update() callback
This callback was introduced by commit 9683e9a0 but never enabled because
the CO_FL_WAKE_DATA flag was not set. The result is that this function is
never called when an SSL handshake fails, so the connection is only closed
on timeout.
Willy Tarreau [Fri, 12 Oct 2012 15:36:40 +0000 (17:36 +0200)]
BUG/MINOR: session: fix some leftover from debug code
Commit 82569f91 moved the health and monitor-net checks to session.c
but a debug test introduced 0& to disable MSG_DONTWAIT in the recv()
call and this debug code remained there. Since the socket is marked
non-blocking, there should be no effect but it's dangerous to keep
such a thing here.
Willy Tarreau [Fri, 12 Oct 2012 15:00:05 +0000 (17:00 +0200)]
MEDIUM: connection: always unset the transport layer upon close
When calling conn_xprt_close(), we always clear the transport pointer
so that all transport layers leave the connection in the same state after
a close. This will also make it safer and cheaper to call conn_xprt_close()
multiple times if needed.
Willy Tarreau [Fri, 12 Oct 2012 12:56:11 +0000 (14:56 +0200)]
MEDIUM: log: suffix the frontend's name with '~' when using SSL
Until now it was not possible to know from the logs whether the incoming
connection was made over SSL or not. In order to address this in the existing
log formats, a new log format %ft was introduced, to log the frontend's name
suffixed with its transport layer. The only transport layer in use right now
is '~' for SSL, so that existing log formats for non-SSL traffic are not
affected at all, and SSL log formats have the frontend's name suffixed with
'~'.
The TCP, HTTP and CLF log format now use %ft instead of %f. This does not
affect existing log formats which still make use of %f however.
Emeric Brun [Thu, 11 Oct 2012 14:11:36 +0000 (16:11 +0200)]
MINOR: ssl: add statements 'verify', 'ca-file' and 'crl-file' on servers.
It now becomes possible to verify the server's certificate using the "verify"
directive. This one only supports "none" and "required", as it does not make
much sense to also support "optional" here.
Willy Tarreau [Wed, 10 Oct 2012 21:04:25 +0000 (23:04 +0200)]
MEDIUM: ssl: move "server" keyword SSL options parsing to ssl_sock.c
All SSL-specific "server" keywords are now processed in ssl_sock.c. At
the moment, there is no more "not implemented" hint when SSL is disabled,
but keywords could be added in server.c if needed.
Willy Tarreau [Wed, 10 Oct 2012 06:56:47 +0000 (08:56 +0200)]
MINOR: standard: make indent_msg() support empty messages
indent_msg() is called with dynamically generated messages, so these
may be empty (NULL) when an empty list is being dumped. Support this
and return a NULL too.
Willy Tarreau [Wed, 10 Oct 2012 06:27:36 +0000 (08:27 +0200)]
MINOR: server: add minimal infrastructure to parse keywords
Just like with the "bind" lines, we'll switch the "server" line
parsing to keyword registration. The code is essentially the same
as for bind keywords, with minor changes such as support for the
default-server keywords and support for variable argument count.