Willy Tarreau [Mon, 1 Aug 2011 18:57:55 +0000 (20:57 +0200)]
[MEDIUM] listeners: add a global listener management task
This global task is used to periodically check for end of resource shortage
and to try to enable queued listeners again. This is important in case some
temporary system-wide shortage is encountered, so that we don't have to wait
for an existing connection to be released before checking the queue again.
For situations where listeners are queued due to the global maxconn being
reached, the task is woken up at least every second. For situations where
a system resource shortage is detected (memory, sockets, ...) the task is
woken up at least every 100 ms. That way, recovery from severe events can
still be achieved under acceptable conditions.
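The two wake-up intervals above can be sketched as a small helper. This is only an illustration of the policy described in the message, not the code from the patch; the enum and function names are hypothetical.

```c
/* Hypothetical reasons for which listeners may sit in the global queue. */
enum queue_reason {
    QUEUE_NONE = 0,      /* nothing queued: no periodic wake-up needed */
    QUEUE_MAXCONN,       /* global maxconn reached */
    QUEUE_SYS_SHORTAGE   /* memory, sockets, ... temporarily exhausted */
};

/* Returns the maximum delay in milliseconds before the management task
 * should run again, or -1 when it may sleep until explicitly woken up.
 */
static int next_wakeup_ms(enum queue_reason reason)
{
    switch (reason) {
    case QUEUE_SYS_SHORTAGE:
        return 100;   /* "at least every 100 ms" */
    case QUEUE_MAXCONN:
        return 1000;  /* "at least every second" */
    default:
        return -1;
    }
}
```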
[BUG] proxy: stats frontend and peers were missing many initializers
This was revealed with one of the very latest patches which caused
the listener_queue not to be initialized on the stats socket frontend.
And in fact a number of other ones were missing too. This is getting so
boring that now we'll always make use of the same function to initialize
any proxy. Doing so has even saved about 500 bytes on the binary due to
the avoided code redundancy.
[MAJOR] proxy: finally get rid of maintain_proxies()
This function is finally not needed anymore, as it has been replaced with
a per-proxy task that is scheduled when some limits are encountered on
incoming connections or when the process is stopping. The savings should
be noticeable on configs with a large number of proxies. The most important
point is that the rate limiting is now enforced in a clean and solid way.
[BUG] proxy: peers must only be stopped once, not upon every call to maintain_proxies
Peers were stopped on every call to maintain_proxies when stopping=1,
while they should only be stopped once upon call to soft_stop(). This
bug has little impact, mostly increased CPU usage. It's not needed to
backport it.
[MINOR] sessions: only wake waiting listeners up if rate limit is OK
Instead of waking a listener up then making it sleep, we only wake them up
if we know their rate limit is fine. In the future we could improve on top
of that by deciding to wake a proxy-specific task in XX milliseconds to
take care of enabling the listeners again.
[MINOR] proxy: make session rate-limit more accurate
Patch d9bbe17b used to limit the rate-limit to off-by-one to avoid
a busy loop when the limit is reached. Now that the listeners are
automatically disabled and queued when a limit is reached, we don't
need this workaround anymore and can bring back the most accurate
computation.
[CLEANUP] proxy: rename a few proxy states (PR_STIDLE and PR_STRUN)
Those states have been replaced with PR_STFULL and PR_STREADY respectively,
as it is what matches them the best now. Also, two occurrences of PR_STIDLE
in peers.c have been removed as this did not provide any form of error recovery
anyway.
[MEDIUM] listeners: don't change listeners states anymore in maintain_proxies
Now maintain_proxies() only changes proxies states and does not affect their
listeners anymore since they are autonomous. A proxy will switch between the
PR_STIDLE and PR_STRUN states depending whether it's saturated or not. Next
step will consist in renaming PR_STIDLE to PR_STFULL. This state is now only
used to report the proxy state in the stats.
[MEDIUM] listeners: queue proxy-bound listeners at the proxy's
All listeners that are limited by a proxy-specific resource are now
queued at the proxy's and not globally. This allows finer-grained
wakeups when releasing resources.
[MEDIUM] listeners: put listeners in queue upon resource shortage
When an accept() fails because of a connection limit or a memory shortage,
we now disable it and queue it so that it's dequeued only when a connection
is released. This has improved the behaviour of the process near the fd limit
as now a listener with no connection (eg: stats) will not loop forever
trying to get its connection accepted.
The solution is still not 100% perfect, as we'd like to have this used when
proxy limits are reached (use a per-proxy list) and for safety, we'd need
to have dedicated tasks to periodically re-enable them (eg: to overcome
temporary system-wide resource limitations when no connection is released).
[MINOR] listeners: add support for queueing resource limited listeners
When a listener encounters a resource shortage, it currently stops until
one re-enables it. This is far from being perfect as it does not yet handle
the case where the single connection from the listener is rejected (eg: the
stats page).
Now we'll have a special status for resource limited listeners and we'll
queue them into one or multiple lists. That way, each time we have to stop
a listener because of a resource shortage, we can enqueue it and change its
state, so that it is dequeued once more resources are available.
This patch currently does not change any existing behaviour, it only adds
the basic building blocks for doing that.
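A minimal sketch of such building blocks could look like the following. All type, state and function names are hypothetical and only illustrate the enqueue/dequeue mechanism described above; the real patch may organize its lists differently.

```c
#include <stddef.h>

/* All names below are hypothetical; they only illustrate the mechanism. */
enum li_state { LI_READY, LI_LIMITED };

struct listener {
    enum li_state state;
    struct listener *next;   /* link in the wait queue */
};

struct listener_queue {
    struct listener *head;
    struct listener *tail;
};

/* Stop a listener because of a resource shortage and remember it. */
static void limit_listener(struct listener_queue *q, struct listener *l)
{
    if (l->state == LI_LIMITED)
        return;               /* already queued, nothing to do */
    l->state = LI_LIMITED;
    l->next = NULL;
    if (q->tail)
        q->tail->next = l;
    else
        q->head = l;
    q->tail = l;
}

/* Called once resources are available again: re-enable every queued
 * listener. Returns the number of listeners that were dequeued.
 */
static int dequeue_all_listeners(struct listener_queue *q)
{
    int woken = 0;
    struct listener *l;

    while ((l = q->head) != NULL) {
        q->head = l->next;
        l->next = NULL;
        l->state = LI_READY;
        woken++;
    }
    q->tail = NULL;
    return woken;
}
```

Keeping the state inside the listener makes a repeated enqueue harmless, so callers do not need to know whether the listener is already waiting.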
[BUG] stream_sock: ensure orphan listeners don't accept too many connections
For listeners that are not bound to a frontend, the limit on the
number of accepted connections is tested at the end of the accept()
loop, but we don't break out of the loop, meaning that if more
connections than what the listener allows are available and if this
is less than the proxy's limits and within the size of a batch, then
they could be accepted. In practice, this problem currently cannot
appear since all listeners are bound to a frontend, and it's a very
minor issue anyway.
1.4 has the same issue (which cannot happen there either), but there
is some code after it, so it's the code cleanup which revealed it.
[MEDIUM] proxy: add a PAUSED state to listeners and move socket tricks out of proxy.c
Managing listeners state is difficult because they have their own state
and can at the same time have theirs dictated by their proxy. The pause
is not done properly, as the proxy code is fiddling with sockets. By
introducing new functions such as pause_listener()/resume_listener(), we
make it a bit more obvious how/when they're supposed to be used. The
listen_proxies() function was also renamed to resume_proxies() since
it's only used for pause/resume.
This patch is the first in a series aiming at getting rid of the maintain_proxies
mess. In the end, proxies should not call enable_listener()/disable_listener()
anymore.
[BUG] stream_sock: disable listener when system resources are exhausted
When an accept() returns -1 ENFILE due to system limits, it leaves the
connection pending in the backlog and epoll() comes back immediately
afterwards trying to make it accept it again. This causes haproxy to
remain at 100% CPU until something makes an accept() possible again.
Now upon such resource shortage, we mark the listener FULL so that we
only enable it again once at least one connection has been released.
In fact we only do that if there are some active connections on this
proxy, so that it has a chance to be marked not full again. This makes
haproxy remain idle when all resources are used, which helps a lot
releasing those resources as fast as possible.
Backport to 1.4 might be desirable but difficult and tricky.
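The decision described in this message can be sketched as a classification of the errno left by a failed accept(). Only ENFILE is named in the message; treating EMFILE, ENOBUFS and ENOMEM the same way is an assumption, and the enum and function names are hypothetical.

```c
#include <errno.h>

/* Hypothetical outcomes for a failed accept(). */
enum accept_action {
    ACCEPT_RETRY_LATER,   /* keep polling the listener as usual */
    ACCEPT_MARK_FULL,     /* stop polling until a connection is released */
    ACCEPT_FAIL           /* unexpected error */
};

/* active_conns is the number of connections currently active on the
 * proxy: the listener is only marked full when at least one of them can
 * be released later and give it a chance to be enabled again.
 */
static enum accept_action classify_accept_error(int err, unsigned int active_conns)
{
    switch (err) {
    case EAGAIN:
    case EINTR:
    case ECONNABORTED:
        return ACCEPT_RETRY_LATER;
    case ENFILE:          /* system-wide file table full (from the message) */
    case EMFILE:          /* assumption: per-process fd limit */
    case ENOBUFS:         /* assumption: no socket buffers */
    case ENOMEM:          /* assumption: no memory */
        return active_conns ? ACCEPT_MARK_FULL : ACCEPT_RETRY_LATER;
    default:
        return ACCEPT_FAIL;
    }
}
```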
[OPTIM] stream_sock: reduce the default number of accepted connections at once
By default on a single process, we accept 100 connections at once. This is too
much on recent CPUs where the cache is constantly thrashing, because we visit
all those connections several times. We should batch the processing slightly
less so that all the accepted sessions may remain in cache during their initial
processing.
Lowering the batch size from 100 to 32 has changed the connection rate for
concurrencies between 5-10k from 67 kcps to 94 kcps on a Core i5 660 (4M L3),
and forward rates from 30k to 39.5k.
Tests on this hardware show that values between 10 and 30 seem to do the job fine.
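The idea of a bounded batch can be illustrated with a loop that stops after a fixed number of accepted connections, even if more are pending. This is a sketch only: the callback type, the constant name and the stand-in backlog function are hypothetical, and a real loop would call accept() on a socket.

```c
/* Hypothetical callback: accepts one pending connection, returns 1 on
 * success and 0 when nothing is left to accept.
 */
typedef int (*accept_one_fn)(void *ctx);

#define ACCEPT_BATCH 32   /* value chosen in the message above */

/* Accept at most ACCEPT_BATCH connections in a row so that the sessions
 * just created are still likely to be in cache when they get processed.
 */
static int accept_batch(accept_one_fn accept_one, void *ctx)
{
    int accepted = 0;

    while (accepted < ACCEPT_BATCH && accept_one(ctx))
        accepted++;
    return accepted;
}

/* Stand-in for a real accept(): ctx points to a count of pending
 * connections.
 */
static int take_from_backlog(void *ctx)
{
    int *pending = ctx;

    if (*pending <= 0)
        return 0;
    (*pending)--;
    return 1;
}
```

Connections left in the backlog are simply picked up on the next polling round.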
[MINOR] session: try to emit a 500 response on memory allocation errors
When we fail to create a session because of memory shortage, let's at
least try to send a 500 message directly on the socket. Even if we don't
have any buffers left, the kernel's orphans management will take care of
delivering the message as long as there are socket buffers left.
[BUG] session: risk of crash on out of memory (1.5-dev regression)
Patch af5149 introduced an issue which can be detected only on out of
memory conditions : a LIST_DEL() may be performed on an uninitialized
struct member instead of a LIST_INIT() during the accept() phase,
causing crashes and memory corruption to occur.
This issue was detected and diagnosed by the Exceliance R&D team.
This is 1.5-specific and very recent, so no existing deployment should
be impacted.
[OPTIM] halog: remove many 'if' by using a function pointer for the filters
There were too many filters, we were losing time in all the "if" statements.
By moving all the filters to independent functions, we made the code cleaner
and slightly faster (3%).
One minor bug was found, the -tc and -st options did not report the number
of output lines, but always zero.
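The technique can be sketched as follows: the filter is selected once, then the per-line loop only makes an indirect call instead of walking a chain of "if" statements. The two filters and all names here are hypothetical and much simpler than halog's real ones.

```c
#include <stddef.h>

/* A filter returns non-zero when the line must be kept. */
typedef int (*line_filter_fn)(const char *line);

/* Hypothetical filters, one function per command-line option. */
static int filter_none(const char *line)
{
    (void)line;
    return 1;
}

static int filter_non_empty(const char *line)
{
    return line[0] != '\0';
}

/* The filter is chosen once by the caller; the loop body no longer
 * tests which option is active for every line.
 */
static size_t count_matching(const char *const *lines, size_t n,
                             line_filter_fn filter)
{
    size_t i, kept = 0;

    for (i = 0; i < n; i++)
        if (filter(lines[i]))
            kept++;
    return kept;
}
```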
[OPTIM] halog: check once for correct line format and reuse the pointer
Almost all filters first check the line format, which takes a lot of code
and requires parsing back and forth. By centralizing this test, we can
save about 15-20 more percent of performance for all filters.
Also, the test was wrong, it was checking that the source IP address was
starting with a digit, which is not always true with local IPv6 addresses.
Instead, we now check that the next field (accept field) starts with an
opening bracket and is followed by a digit between 0 and 3 (day of the
month). Doing this has contributed a 2% speedup because all other field
calculations were relative to a closer field.
Since many fields are relative and some are used a lot, try to cache them
the first time they're used in order to avoid skipping them twice. The
status counts with HTTP pre-check enabled has sped up by 40%.
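The corrected format test described above (an opening bracket followed by a digit between 0 and 3) can be sketched like this. The function name is hypothetical, and locating the start of the accept field is assumed to have been done by the caller.

```c
/* Returns non-zero if <field> looks like the accept date of a log line,
 * i.e. it starts with '[' followed by the first digit of the day of the
 * month (0 to 3). The second test is only evaluated when the first
 * character is '[', so an empty string is handled safely.
 */
static int looks_like_accept_date(const char *field)
{
    return field[0] == '[' && field[1] >= '0' && field[1] <= '3';
}
```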
[MINOR] halog: gain back performance before SKIP_CHAR fix
The SKIP_CHAR fix caused a measurable performance drop. Since we can
consider all chars below 0x20 as delimiters, we can avoid a cache lookup
which requires a char to pointer conversion.
[MEDIUM] http: add support for 'cookie' and 'set-cookie' patterns
This is used to perform cookie-based stickiness with table replication
between multiple masters and across restarts. This partially overrides
some of the appsession capabilities.
Simon Horman [Sat, 25 Jun 2011 00:39:49 +0000 (09:39 +0900)]
[MINOR] Add non-stick server option
Never add connections allocated to this server to a stick-table.
This may be used in conjunction with backup to ensure that
stick-table persistence is disabled for backup servers.
Simon Horman [Fri, 24 Jun 2011 05:50:20 +0000 (14:50 +0900)]
[MINOR] Add rdp_cookie pattern fetch function
This pattern fetch function extracts the value of the rdp cookie <name> as
a string and uses this value to match. This enables implementation of
persistence based on the mstshash cookie. This is typically done if there
is no msts cookie present.
This differs from "balance rdp-cookie" in that any balancing algorithm may
be used and thus the distribution of clients to backend servers is not
linked to a hash of the RDP cookie. It is envisaged that using a balancing
algorithm such as "balance roundrobin" or "balance leastconn" will lead
to a more even distribution of clients to backend servers than the hash
used by "balance rdp-cookie".
Example :
listen tse-farm
bind 0.0.0.0:3389
# wait up to 5s for an RDP cookie in the request
tcp-request inspect-delay 5s
tcp-request content accept if RDP_COOKIE
# apply RDP cookie persistence
persist rdp-cookie
# Persist based on the mstshash cookie
# This only makes sense if
# balance rdp-cookie is not used
stick-table type string size 204800
stick on rdp_cookie(mstshash)
server srv1 1.1.1.1:3389
server srv2 1.1.1.2:3389
Simon Horman [Fri, 24 Jun 2011 05:49:57 +0000 (14:49 +0900)]
[MINOR] Make appsess{,ion}_refresh static
appsession_refresh() and appsess_refresh() are only used inside appsession.c
and thus can be made static.
The only user of appsession_refresh() is appsession_task_init().
These functions have been re-ordered to avoid the need for
a forward-declaration of appsession_refresh().
Simon Horman [Tue, 21 Jun 2011 05:34:58 +0000 (14:34 +0900)]
[MINOR] Allow shutdown of sessions when a server becomes unavailable
This adds the "on-marked-down shutdown-sessions" statement on "server" lines,
which causes all sessions established on a server to be killed at once when
the server goes down. The task's priority is reniced to the highest value
(1024) so that servers holding many tasks don't cause a massive slowdown due
to the wakeup storm.
Simon Horman [Tue, 21 Jun 2011 05:34:57 +0000 (14:34 +0900)]
[MINOR] Add active connection list to server
The motivation for this is to allow iteration of all the connections
of a server without the expense of iterating over the global list
of connections.
The first use of this will be to implement an option to close connections
associated with a server when it is marked as being down or in maintenance
mode.
Simon Horman [Tue, 7 Jun 2011 02:07:50 +0000 (11:07 +0900)]
[CLEANUP] Remove assigned but unused variables
gcc (Debian 4.6.0-2) 4.6.1 20110329 (prerelease)
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
...
src/proto_http.c:3029:14: warning: variable ‘del_cl’ set but not used [-Wunused-but-set-variable]
In file included from ebtree/eb64tree.c:23:0:
ebtree/eb64tree.h: In function ‘__eb64_lookup’:
ebtree/eb64tree.h:128:6: warning: variable ‘node_bit’ set but not used [-Wunused-but-set-variable]
ebtree/eb64tree.h: In function ‘__eb64i_lookup’:
ebtree/eb64tree.h:180:6: warning: variable ‘node_bit’ set but not used [-Wunused-but-set-variable]
In file included from ebtree/ebpttree.h:26:0,
from ebtree/ebimtree.c:23:
ebtree/eb64tree.h: In function ‘__eb64_lookup’:
ebtree/eb64tree.h:128:6: warning: variable ‘node_bit’ set but not used [-Wunused-but-set-variable]
ebtree/eb64tree.h: In function ‘__eb64i_lookup’:
ebtree/eb64tree.h:180:6: warning: variable ‘node_bit’ set but not used [-Wunused-but-set-variable]
In file included from ebtree/ebpttree.h:26:0,
from ebtree/ebistree.h:25,
from ebtree/ebistree.c:23:
ebtree/eb64tree.h: In function ‘__eb64_lookup’:
ebtree/eb64tree.h:128:6: warning: variable ‘node_bit’ set but not used [-Wunused-but-set-variable]
ebtree/eb64tree.h: In function ‘__eb64i_lookup’:
ebtree/eb64tree.h:180:6: warning: variable ‘node_bit’ set but not used [-Wunused-but-set-variable]
[BUG] checks: fix support of Mysqld >= 5.5 for mysql-check
mysqld >= 5.5 wants the client to announce 4.1+ authentication support, even if we have no password, so we do this.
I also checked on a Debian potato mysqld 3.22 and it works too, so I assume we are good from 3.22 to 5.5.
Willy Tarreau [Sun, 5 Jun 2011 13:38:35 +0000 (15:38 +0200)]
[MINOR] config: automatically compute a default fullconn value
The fullconn value is not easy to get right when doing dynamic regulation,
as it should depend on the maxconns of the frontends that can reach a
backend. Since the parameter is mandatory, many configs are found with
an inappropriate default value.
Instead of rejecting configs without a fullconn value, we now set it to
10% of the sum of the configured maxconns of all the frontends which are
susceptible to branch to the backend. That way if new frontends are added,
the backend's fullconn automatically adjusts itself.
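The default described above amounts to a simple computation, sketched below. The function name is hypothetical, and the rounding is an assumption: the message only says "10% of the sum", so this sketch truncates with an integer division and does not enforce any minimum.

```c
#include <stddef.h>

/* Computes a default fullconn for a backend from the maxconn values of
 * the frontends that may branch to it: 10% of their sum, rounded down
 * (assumption).
 */
static unsigned int default_fullconn(const unsigned int *fe_maxconn, size_t nb_fe)
{
    unsigned long long sum = 0;
    size_t i;

    for (i = 0; i < nb_fe; i++)
        sum += fe_maxconn[i];
    return (unsigned int)(sum / 10);
}
```

For example, three frontends with maxconn 2000, 500 and 10000 sum to 12500 and yield a fullconn of 1250.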
Willy Tarreau [Tue, 31 May 2011 16:06:18 +0000 (18:06 +0200)]
[BUG] stats: support url-encoded forms
Bashkim Kasa reported that the stats admin page did not work when colons
were used in server or backend names. This was caused by url-encoding
resulting in ':' being sent as '%3A'. Now we systematically decode the
field names and values to fix this issue.
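Decoding a form field so that '%3A' becomes ':' again can be sketched as below. This is a generic in-place decoder written for illustration, not the function used by the stats code; the names are hypothetical, and turning '+' into a space follows the usual convention for url-encoded forms.

```c
/* Returns the value of a hexadecimal digit, or -1 if <c> is not one. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Decodes <s> in place ("%XX" sequences and '+'). Returns the decoded
 * length, or -1 if a '%' is not followed by two hexadecimal digits. On
 * error the string may have been partially rewritten.
 */
static long url_decode_inplace(char *s)
{
    char *in = s, *out = s;

    while (*in) {
        if (*in == '+') {
            *out++ = ' ';
            in++;
        } else if (*in == '%') {
            int hi = hex_val(in[1]);
            /* in[2] is only read when in[1] is a valid, non-NUL digit */
            int lo = (hi < 0) ? -1 : hex_val(in[2]);

            if (lo < 0)
                return -1;
            *out++ = (char)(hi * 16 + lo);
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
    return (long)(out - s);
}
```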
Willy Tarreau [Mon, 30 May 2011 16:47:41 +0000 (18:47 +0200)]
[MINOR] config: make it possible to specify a cookie even without a server
Since version 1.0.0, it's forbidden to have a cookie specified without at
least one server. This test is useless and makes it complex to write APIs
to iteratively generate working configurations. Remove the test.
Willy Tarreau [Wed, 11 May 2011 18:47:24 +0000 (20:47 +0200)]
[OPTIM] stream_sock: don't use splice on too small payloads
It's more expensive to call splice() on short payloads than to use
recv()+send(). One of the reasons is that doing a splice() involves
allocating a pipe. One other reason is that the kernel will have to
copy the data itself if we try to splice less than a page. So let's fix a
short offset of 4kB below which we don't splice.
A quick test shows that on chunked encoded data, with splice we had
6826 syscalls (1715 splice, 3461 recv, 1650 send) while with this
patch, the same transfer resulted in 5793 syscalls (3896 recv, 1897
send).
Willy Tarreau [Wed, 11 May 2011 18:32:36 +0000 (20:32 +0200)]
[OPTIM] stream_sock: avoid fast-forwarding of partial data
Fast-forwarding between file descriptors is nice but can be counter-productive
when only one part of the buffer is forwarded, because it can result in doubling
the number of send() syscalls. This is what happens on HTTP chunking, because
the chunk data are sent, then the CRLF + next chunk size are parsed and immediately
scheduled for forwarding. This results in two send() for the same block while a
single one would have done it.
Willy Tarreau [Mon, 30 May 2011 16:10:30 +0000 (18:10 +0200)]
[MEDIUM] http: add support for "http-no-delay"
There are some very rare server-to-server applications that abuse the HTTP
protocol and expect the payload phase to be highly interactive, with many
interleaved data chunks in both directions within a single request. This is
absolutely not supported by the HTTP specification and will not work across
most proxies or servers. When such applications attempt to do this through
haproxy, it works but they will experience high delays due to the network
optimizations which favor performance by instructing the system to wait for
enough data to be available in order to only send full packets. Typical
delays are around 200 ms per round trip. Note that this only happens with
abnormal uses. Normal uses such as CONNECT requests or WebSockets are not
affected.
When "option http-no-delay" is present in either the frontend or the backend
used by a connection, all such optimizations will be disabled in order to
make the exchanges as fast as possible. Of course this offers no guarantee on
the functionality, as it may break at any other place. But if it works via
HAProxy, it will work as fast as possible. This option should never be used
by default, and should never be used at all unless such a buggy application
is discovered. The impact of using this option is an increase of bandwidth
usage and CPU usage, which may significantly lower performance in high
latency environments.
This change should be backported to 1.4 since the first report of such a
misuse was in 1.4. Next patch will also be needed.
Willy Tarreau [Mon, 30 May 2011 15:32:53 +0000 (17:32 +0200)]
[CLEANUP] stream_sock: remove unneeded FL_TCP and factor out test
The FL_TCP flag was a leftover from the old days we were using TCP_CORK.
With MSG_MORE it's not needed anymore so we can remove the condition and
sensibly simplify the test.
Willy Tarreau [Wed, 11 May 2011 18:14:03 +0000 (20:14 +0200)]
[MINOR] stream_sock: always clear BF_EXPECT_MORE upon complete transfer
When sending is complete, it's preferred to systematically clear the flags
that were set for that transfer. What could happen is that the to_forward
counter had caused the MSG_MORE flag to be set and BF_EXPECT_MORE not to
be cleared, resulting in this flag being unexpectedly maintained for next
round.
The code has taken extreme care of not doing this till now, but it's not
acceptable that the caller has to know these precise semantics. So let's
unconditionally clear the flag instead.
For the sake of safety, this fix should be backported to 1.4.
Willy Tarreau [Wed, 11 May 2011 17:56:11 +0000 (19:56 +0200)]
[MINOR] http: partially revert the chunking optimization for now
Commit 57f5c1 used to provide a nice improvement on chunked encoding since
it ensured that we did not set a PUSH flag for every chunk or buffer data
part of a chunked transfer.
Some applications appear to erroneously abuse HTTP chunking in order to
get interactive exchanges between a user agent and an origin server with
very small chunks. While it happens to work through haproxy, it's terribly
slow due to the latency added after passing each chunk to the system, which
could wait up to 200ms before pushing them onto the wire.
So we need an interactive mode for such usages. In the meantime, step back
on the optim, but not completely, so that we still keep the flag as long as
we know we're not finished with the current chunk.
This change should be backported to 1.4 too as the issue was discovered
with it.
Willy Tarreau [Wed, 11 May 2011 14:28:49 +0000 (16:28 +0200)]
[MINOR] http: make the "HTTP 200" status code configurable.
This status code is used in response to requests matching "monitor-uri".
Some users need to adjust it to fit their needs (eg: make some strings
appear there). As it's already defined as a chunked string and used
exactly like other status codes, it makes sense to make it configurable
with the usual "errorfile", "errorloc", ...
Willy Tarreau [Wed, 11 May 2011 14:00:54 +0000 (16:00 +0200)]
[MINOR] http: don't report the "haproxy" word on the monitoring response
Some people like to make the monitoring URL testable from unsafe locations.
Reporting haproxy's existence there can sometimes be problematic. This patch
should not be backported to 1.4 because it is possible, even though unlikely,
that some scripts rely on this word to appear there.
As reported by Lauri-Alo Adamson, version 1.5-dev6 doesn't support
stick-tables with a binary type.
This issue was introduced in the commit 4f92d32 where a line was erroneously
deleted, and is 1.5-specific.
[BUG] proto_tcp: fix address binding on remote source
Mark Brooks reported that commit 1b4b7c broke tproxy in 1.5-dev6. Nick
Chalk tracked the issue down to a missing address family setting in
tcp_bind_socket() which resulted in a failure to use get_addr_len().
This issue is 1.5-specific.
[BUG] checks: http-check expect could fail a check on multi-packet responses
Christopher Blencowe reported that the httpchk_expect() function was
lacking a test for incomplete responses : if the server sends only the
headers in the first packet and the body in a subsequent one, there is
a risk that the check fails without waiting for more data. A failure
rate of about 1% was reported.
Released version 1.5-dev6 with the following main changes :
- [BUG] stream_sock: use get_addr_len() instead of sizeof() on sockaddr_storage
- [BUG] TCP source tracking was broken with IPv6 changes
- [BUG] stick-tables did not work when converting IPv6 to IPv4
- [CRITICAL] fix risk of crash when dealing with space in response cookies
[CRITICAL] fix risk of crash when dealing with space in response cookies
When doing fix 24581bae022bcf97ea7818e49ef27d21c92d6aa3 to correctly handle
response cookies, an unfortunate typo was inserted in the less likely code
path, resulting in a risk of crash when cookie-based persistence is enabled
and the server emits a cookie with several spaces around the equal sign.
This bug was noticed during a code backport. Its effects were never reported
because this situation is very unlikely to appear, but it can be provoked on
purpose by the server.
This patch must be backported to 1.4 versions which contain the fix above
(anything > 1.4.8), and to similar 1.3 versions > 1.3.25. 1.5-dev versions
after 1.5-dev2 are affected too.
[BUG] stream_sock: use get_addr_len() instead of sizeof() on sockaddr_storage
John Helliwell reported a runtime issue on Solaris since 1.5-dev5. Traces
show that connect() returns EINVAL, which means the socket length is not
appropriate for the family. Solaris does not like being called with sizeof
and needs the address family's size on sockaddr_storage.
The fix consists in adding a get_addr_len() function which returns the
socket's address length based on its family. Tests show that this works
for both IPv4 and IPv6 addresses.
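A helper in the spirit of get_addr_len() can be sketched as follows. This is an illustration of the idea rather than the function from the patch: the name and the choice of returning 0 for unknown families are assumptions.

```c
#include <sys/socket.h>
#include <netinet/in.h>

/* Returns the address length matching the family stored in <addr>,
 * which is what connect() and bind() expect on systems that reject
 * sizeof(struct sockaddr_storage). Unknown families return 0
 * (assumption).
 */
static socklen_t addr_len_for(const struct sockaddr_storage *addr)
{
    switch (addr->ss_family) {
    case AF_INET:
        return sizeof(struct sockaddr_in);
    case AF_INET6:
        return sizeof(struct sockaddr_in6);
    default:
        return 0;
    }
}
```

A caller would then pass `addr_len_for(&ss)` instead of `sizeof(ss)` as the length argument of connect().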
Willy Tarreau [Mon, 28 Mar 2011 23:10:33 +0000 (01:10 +0200)]
[RELEASE] Released version 1.5-dev5
Released version 1.5-dev5 with the following main changes :
- [BUG] standard: is_addr return value for IPv4 was inverted
- [MINOR] update comment about IPv6 support for server
- [MEDIUM] use getaddrinfo to resolve names if gethostbyname fail
- [DOC] update IPv6 support for bind
- [DOC] document IPv6 support for server
- [DOC] fix a minor typo
- [MEDIUM] IPv6 support for syslog
- [DOC] document IPv6 support for syslog
- [MEDIUM] IPv6 support for stick-tables
- [DOC] document IPv6 support for stick-tables
- [DOC] update ROADMAP file
- [BUG] session: src_conn_cur was returning src_conn_cnt instead
- [MINOR] frontend: add a make_proxy_line function
- [MEDIUM] stream_sock: add support for sending the proxy protocol header line
- [MEDIUM] server: add support for the "send-proxy" option
- [DOC] update the spec on the proxy protocol
- [BUILD] proto_tcp: fix build issue with CTTPROXY
- [DOC] update ROADMAP file
- [MEDIUM] config: rework the IPv4/IPv6 address parser to support host-only addresses
- [MINOR] cfgparse: better report wrong listening addresses and make use of str2sa_range
- [BUILD] add the USE_GETADDRINFO build option
- [TESTS] provide a test case for various address formats
- [BUG] session: conn_retries was not always initialized
- [BUG] log: retrieve the target from the session, not the SI
- [BUG] http: fix possible incorrect forwarded wrapping chunk size (take 2)
- [MINOR] tools: add two macros MID_RANGE and MAX_RANGE
- [BUG] http: fix content-length handling on 32-bit platforms
- [OPTIM] buffers: uninline buffer_forward()
- [BUG] stream_sock: fix handling for server side PROXY protocol
- [MINOR] acl: add support for table_cnt and table_avl matches
- [DOC] update ROADMAP file
Since IPv6 is a different type than IPv4, the pattern fetch functions
src6 and dst6 were added. IPv6 stick-tables can also fetch IPv4 addresses
with src and dst. In this case, the IPv4 addresses are mapped to their
IPv6 counterpart, according to RFC 4291.
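The mapping referred to here is the IPv4-mapped IPv6 address format of RFC 4291 (::ffff:a.b.c.d): 80 zero bits, 16 one bits, then the 32-bit IPv4 address. A minimal sketch, with a hypothetical function name:

```c
#include <string.h>
#include <netinet/in.h>

/* Builds the IPv4-mapped IPv6 address (::ffff:a.b.c.d) of <v4> into
 * <v6>. <v4> is expected in network byte order, as usual for struct
 * in_addr.
 */
static void v4_to_mapped_v6(const struct in_addr *v4, struct in6_addr *v6)
{
    memset(v6->s6_addr, 0, 10);               /* 80 zero bits */
    v6->s6_addr[10] = 0xff;                   /* 16 one bits */
    v6->s6_addr[11] = 0xff;
    memcpy(&v6->s6_addr[12], &v4->s_addr, 4); /* the IPv4 address */
}
```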
Willy Tarreau [Mon, 28 Mar 2011 21:17:54 +0000 (23:17 +0200)]
[BUG] stream_sock: fix handling for server side PROXY protocol
Patch 5ab04ec47c9946a2bbc535687c023215ca813da0 was incomplete,
because if the first send() fails on an empty buffer, we fail
to rearm the polling and we can't establish the connection
anymore.
The issue was reported by Ben Timby who provided large amounts
of traces of various tests helping to reliably reproduce the issue.
Willy Tarreau [Mon, 28 Mar 2011 14:25:58 +0000 (16:25 +0200)]
[OPTIM] buffers: uninline buffer_forward()
Since the latest additions to buffer_forward(), it became too large for
inlining, so let's uninline it. The code size drops by 3kB. Should be
backported to 1.4 too.
Willy Tarreau [Mon, 28 Mar 2011 14:06:28 +0000 (16:06 +0200)]
[BUG] http: fix content-length handling on 32-bit platforms
Despite much care around handling the content-length as a 64-bit integer,
forwarding was broken on 32-bit platforms due to the 32-bit nature of
the ->to_forward member of the "buffer" struct. The issue is that this
member is declared as a long, so while it works OK on 64-bit platforms,
32-bit platforms truncate the content-length to its lower 32 bits.
One solution could consist in turning to_forward to a long long, but it
is used a lot in the critical path, so it's not acceptable to perform
all buffer size computations on 64-bit there.
The fix consists in changing the to_forward member to a strict 32-bit
integer and ensure in buffer_forward() that only the amount of bytes
that can fit into it is considered. Callers of buffer_forward() are
responsible for checking that their data were taken into account. We
arbitrarily ensure we never consider more than 2G at once.
That's the way it was intended to work on 32-bit platforms except that
it did not.
This issue was tracked down hard at Exosec with Bertrand Jacquin,
Thierry Fournier and Julien Thomas. It remained undetected for a long
time because files larger than 4G are almost always transferred in
chunked-encoded format, and most platforms dealing with huge contents
these days run on 64-bit.
The bug affects all 1.5 and 1.4 versions, and must be backported.
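The principle of the fix can be sketched as follows: the counter stays 32-bit, the forwarding function only takes what fits, and it reports how much it took so that the caller can schedule the rest later. The names and the exact cap are assumptions made for illustration; this is not the code of buffer_forward().

```c
/* Assumed cap: "never consider more than 2G at once". */
#define FORWARD_MAX 0x7fffffffU

/* Adds up to <bytes> to the 32-bit counter <*to_forward> without ever
 * exceeding FORWARD_MAX, and returns the number of bytes really taken
 * into account. The caller keeps the remainder of its 64-bit length
 * and calls again once some data have been forwarded.
 */
static unsigned long long forward_capped(unsigned int *to_forward,
                                         unsigned long long bytes)
{
    unsigned long long room, taken;

    if (*to_forward >= FORWARD_MAX)
        return 0;                      /* counter already saturated */
    room = FORWARD_MAX - *to_forward;
    taken = (bytes < room) ? bytes : room;
    *to_forward += (unsigned int)taken;
    return taken;
}
```

With a 5 GB content-length, the first call takes 2147483647 bytes and the caller keeps the remainder as a 64-bit value for later calls.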
Willy Tarreau [Sun, 27 Mar 2011 17:53:06 +0000 (19:53 +0200)]
[BUG] log: retrieve the target from the session, not the SI
Since we now have the copy of the target in the session, use it instead
of relying on the SI for it. The SI drops the target upon unregister()
so applets such as stats were logged as "NOSRV".
Willy Tarreau [Sun, 27 Mar 2011 17:16:56 +0000 (19:16 +0200)]
[BUG] session: conn_retries was not always initialized
Johannes Smith reported some wrong retries count in logs associated with bad
requests. The cause was that the conn_retries field in the stream interface
was only initialized when attempting to connect, but is used when logging,
possibly with an uninitialized value holding last connection's conn_retries.
This could have been avoided by making use of a stream interface initializer.