git.ipfire.org Git - thirdparty/haproxy.git/log

[MEDIUM] listeners: add a global listener management task

This global task is used to periodically check for end of resource shortage
and to try to enable queued listeners again. This is important in case some
temporary system-wide shortage is encountered, so that we don't have to wait
for an existing connection to be released before checking the queue again.

For situations where listeners are queued due to the global maxconn being
reached, the task is woken up at least every second. For situations where
a system resource shortage is detected (memory, sockets, ...) the task is
woken up at least every 100 ms. That way, recovery from severe events can
still be achieved under acceptable conditions.

[BUG] proxy: stats frontend and peers were missing many initializers

This was revealed with one of the very latest patches which caused
the listener_queue not to be initialized on the stats socket frontend.
And in fact a number of other ones were missing too. This is getting so
boring that now we'll always make use of the same function to initialize
any proxy. Doing so has even saved about 500 bytes on the binary due to
the avoided code redundancy.

No backport is needed.

[MAJOR] proxy: finally get rid of maintain_proxies()

This function is finally not needed anymore, as it has been replaced with
a per-proxy task that is scheduled when some limits are encountered on
incoming connections or when the process is stopping. The savings should
be noticeable on configs with a large number of proxies. The most important
point is that the rate limiting is now enforced in a clean and solid way.

[MINOR] task: new function task_schedule() to schedule a wake up

This function is used when a task should be woken up at most at a given
date. This will be used with rate shapers.

[CLEANUP] proxy: merge maintain_proxies() operation inside a single loop

This will help transforming the processing into per-proxy tasks.

[BUG] proxy: peers must only be stopped once, not upon every call to maintain_proxies

Peers were stopped on every call to maintain_proxies when stopping=1,
while they should only be stopped once upon call to soft_stop(). This
bug has little impact, mostly increased CPU usage. It's not needed to
backport it.

[MINOR] sessions: only wake waiting listeners up if rate limit is OK

Instead of waking a listener up then making it sleep, we only wake them up
if we know their rate limit is fine. In the future we could improve on top
of that by deciding to wake a proxy-specific task in XX milliseconds to
take care of enabling the listeners again.

[MINOR] proxy: make session rate-limit more accurate

Patch d9bbe17b used to limit the rate-limit to off-by-one to avoid
a busy loop when the limit is reached. Now that the listeners are
automatically disabled and queued when a limit is reached, we don't
need this workaround anymore and can bring back the most accurate
computation.

[MINOR] stats: report a "WAITING" state for sockets waiting for resource

This is useful when enabling socket-stats to know that a socket is being
waiting for some resource (RAM, global connections, etc...).

[CLEANUP] proxy: rename a few proxy states (PR_STIDLE and PR_STRUN)

Those states have been replaced with PR_STFULL and PR_STREADY respectively,
as it is what matches them the best now. Also, two occurrences of PR_STIDLE
in peers.c have been removed as this did not provide any form of error recovery
anyway.

[MEDIUM] listeners: don't change listeners states anymore in maintain_proxies

Now maintain_proxies() only changes proxies states and does not affect their
listeners anymore since they are autonomous. A proxy will switch between the
PR_STIDLE and PR_STRUN states depending whether it's saturated or not. Next
step will consist in renaming PR_STIDLE to PR_STFULL. This state is now only
used to report the proxy state in the stats.

[MEDIUM] listeners: don't stop proxies when global maxconn is reached

Now we don't have to stop proxies anymore since their listeners will be
queued if they attempt to accept a connection past the global limits.

[MEDIUM] listeners: queue proxy-bound listeners at the proxy's

All listeners that are limited by a proxy-specific resource are now
queued at the proxy's and not globally. This allows finer-grained
wakeups when releasing resource.

[MEDIUM] listeners: put listeners in queue upon resource shortage

When an accept() fails because of a connection limit or a memory shortage,
we now disable it and queue it so that it's dequeued only when a connection
is released. This has improved the behaviour of the process near the fd limit
as now a listener with a no connection (eg: stats) will not loop forever
trying to get its connection accepted.

The solution is still not 100% perfect, as we'd like to have this used when
proxy limits are reached (use a per-proxy list) and for safety, we'd need
to have dedicated tasks to periodically re-enable them (eg: to overcome
temporary system-wide resource limitations when no connection is released).

[MINOR] listeners: add support for queueing resource limited listeners

When a listeners encounters a resource shortage, it currently stops until
one re-enables it. This is far from being perfect as it does not yet handle
the case where the single connection from the listener is rejected (eg: the
stats page).

Now we'll have a special status for resource limited listeners and we'll
queue them into one or multiple lists. That way, each time we have to stop
a listener because of a resource shortage, we can enqueue it and change its
state, so that it is dequeued once more resources are available.

This patch currently does not change any existing behaviour, it only adds
the basic building blocks for doing that.

[MINOR] listeners: add listen_full() to mark a listener full

This is just a cleanup which removes calls to EV_FD_CLR() and state
setting everywhere in the code.

[BUG] stream_sock: ensure orphan listeners don't accept too many connections

For listeners that are not bound to a frontend, the limit on the
number of accepted connections is tested at the end of the accept()
loop, but we don't break out of the loop, meaning that if more
connections than what the listener allows are available and if this
is less than the proxy's limits and within the size of a batch, then
they could be accepted. In practice, this problem currently cannot
appear since all listeners are bound to a frontend, and it's a very
minor issue anyway.

1.4 has the same issue (which cannot happen there either), but there
is some code after it, so it's the code cleanup which revealed it.

[MEDIUM] proxy: add a PAUSED state to listeners and move socket tricks out of proxy.c

Managing listeners state is difficult because they have their own state
and can at the same time have theirs dictated by their proxy. The pause
is not done properly, as the proxy code is fiddling with sockets. By
introducing new functions such as pause_listener()/resume_listener(), we
make it a bit more obvious how/when they're supposed to be used. The
listen_proxies() function was also renamed to resume_proxies() since
it's only used for pause/resume.

This patch is the first in a series aiming at getting rid of the maintain_proxies
mess. In the end, proxies should not call enable_listener()/disable_listener()
anymore.

[BUG] stream_sock: disable listener when system resources are exhausted

When an accept() returns -1 ENFILE du to system limits, it leaves the
connection pending in the backlog and epoll() comes back immediately
afterwards trying to make it accept it again. This causes haproxy to
remain at 100% CPU until something makes an accept() possible again.
Now upon such resource shortage, we mark the listener FULL so that we
only enable it again once at least one connection has been released.
In fact we only do that if there are some active connections on this
proxy, so that it has a chance to be marked not full again. This makes
haproxy remain idle when all resources are used, which helps a lot
releasing those resource as fast as possible.

Backport to 1.4 might be desirable but difficult and tricky.

[OPTIM] stream_sock: reduce the default number of accepted connections at once

By default on a single process, we accept 100 connections at once. This is too
much on recent CPUs where the cache is constantly thrashing, because we visit
all those connections several times. We should batch the processing slightly
less so that all the accepted session may remain in cache during their initial
processing.

Lowering the batch size from 100 to 32 has changed the connection rate for
concurrencies between 5-10k from 67 kcps to 94 kcps on a Core i5 660 (4M L3),
and forward rates from 30k to 39.5k.

Tests on this hardware show that values between 10 and 30 seem to do the job fine.

[MINOR] session: try to emit a 500 response on memory allocation errors

When we fail to create a session because of memory shortage, let's at
least try to send a 500 message directly on the socket. Even if we don't
have any buffers left, the kernel's orphans management will take care of
delivering the message as long as there are socket buffers left.

[BUG] session: risk of crash on out of memory (1.5-dev regression)

Patch af5149 introduced an issue which can be detected only on out of
memory conditions : a LIST_DEL() may be performed on an uninitialized
struct member instead of a LIST_INIT() during the accept() phase,
causing crashes and memory corruption to occur.

This issue was detected and diagnosed by the Exceliance R&D team.

This is 1.5-specific and very recent, so no existing deployment should
be impacted.

[MINOR] Free stick rules on denint()

The motivation for this is that when soft-restart is merged
it will be come more important to free all relevant memory in deinit()

Discovered using valgrind.

[MINOR] Free stick table pool on denint()

The motivation for this is that when soft-restart is merged
it will be come more important to free all relevant memory in deinit()

Discovered using valgrind.

[MINOR] Free tcp rules on denint()

The motivation for this is that when soft-restart is merged
it will be come more important to free all relevant memory in deinit()

Discovered using valgrind.

[MINOR] Free rdp_cookie_name on denint()

The motivation for this is that when soft-restart is merged
it will be come more important to free all relevant memory in deinit()

Discovered using valgrind.

[MINOR] Consistently free expr on error in cfg_parse_listen()

It seems to me that without this change cfg_parse_listen()
may leak memory.

[MINOR] Consistently use error in tcp_parse_tcp_req()

It seems to me that without this change tcp_parse_tcp_req()
may leak memory.

[OPTIM] halog: remove support for tab delimiters in input data

Haproxy does not use tabs when sending logs, and checking for them
wastes no less than 4% of CPU cycles. Better get rid of these tests.

[OPTIM] halog: remove many 'if' by using a function pointer for the filters

There were too many filters, we were losing time in all the "if" statements.
By moving all the filters to independant functions, we made the code cleaner
and slightly faster (3%).

One minor bug was found, the -tc and -st options did not report the number
of output lines, but always zero.

[OPTIM] halog: check once for correct line format and reuse the pointer

Almost all filters first check the line format, which takes a lot of code
and requires parsing back and forth. By centralizing this test, we can
save about 15-20 more percent of performance for all filters.

Also, the test was wrong, it was checking that the source IP address was
starting with a digit, which is not always true with local IPv6 addresses.
Instead, we now check that the next field (accept field) starts with an
opening bracket and is followed by a digit between 0 and 3 (day of the
month). Doing this has contributed a 2% speedup because all other field
calculations were relative to a closer field.

[OPTIM] halog: cache some common fields positions

Since many fields are relative and some are used a lot, try to cache them
the first time they're used in order to avoid skipping them twice. The
status counts with HTTP pre-check enabled has sped up by 40%.

[MINOR] halog: gain back performance before SKIP_CHAR fix

The SKIP_CHAR fix caused a measurable performance drop. Since we can
consider all chars below 0x20 as delimiters, we can avoid a cache lookup
which requires a char to pointer conversion.

[MINOR] halog: add support for HTTP log matching (-H)

Now it's possible to restrict analysis to HTTP-looking logs when passing -H.
-H -v gives the opposite (most likely TCP logs).

[MINOR] halog: make SKIP_CHAR stop on field delimiters

The SKIP_CHAR() macro did not consider field delimiters, causing the timer parser
to be able to search timers at wrong places when fed with TCP logs.

[BUG] halog: correctly handle truncated last line

If last line is truncated (eg: truncated file), then halog would loop on
it forever.

[MEDIUM] http: add support for 'cookie' and 'set-cookie' patterns

This is used to perform cookie-based stickiness with table replication
between multiple masters and across restarts. This partially overrides
some of the appsession capabilities.

[DOC] add missing entry or stick store-response

[MINOR] Add non-stick server option

Never add connections allocated to this sever to a stick-table.
This may be used in conjunction with backup to ensure that
stick-table persistence is disabled for backup servers.

[CLEANUP] Remove unnecessary casts

There is no need to cast when going to or from void *

[MINOR] Add rdp_cookie pattern fetch function

This pattern fetch function extracts the value of the rdp cookie <name> as
a string and uses this value to match. This enables implementation of
persistence based on the mstshash cookie. This is typically done if there
is no msts cookie present.

This differs from "balance rdp-cookie" in that any balancing algorithm may
be used and thus the distribution of clients to backend servers is not
linked to a hash of the RDP cookie. It is envisaged that using a balancing
algorithm such as "balance roundrobin" or "balance leastconnect" will lead
to a more even distribution of clients to backend servers than the hash
used by "balance rdp-cookie".

Example :
listen tse-farm
    bind 0.0.0.0:3389
    # wait up to 5s for an RDP cookie in the request
    tcp-request inspect-delay 5s
    tcp-request content accept if RDP_COOKIE
    # apply RDP cookie persistence
    persist rdp-cookie
    # Persist based on the mstshash cookie
    # This is only useful makes sense if
    # balance rdp-cookie is not used
    stick-table type string size 204800
    stick on rdp_cookie(mstshash)
    server srv1 1.1.1.1:3389
    server srv1 1.1.1.2:3389

[MINOR] Make appsess{,ion}_refresh static

apsession_refresh() and apsess_refressh are only used inside apsession.c
and thus can be made static.

The only use of apsession_refresh() is appsession_task_init().
These functions have been re-ordered to avoid the need for
a forward-declaration of apsession_refresh().

[MINOR] Add down termination condition

If a connection is closed by because the backend became unavailable
then log 'D' as the termination condition.

Signed-off-by: Simon Horman <horms@verge.net.au>

[MINOR] Allow shutdown of sessions when a server becomes unavailable

This adds the "on-marked-down shutdown-sessions" statement on "server" lines,
which causes all sessions established on a server to be killed at once when
the server goes down. The task's priority is reniced to the highest value
(1024) so that servers holding many tasks don't cause a massive slowdown due
to the wakeup storm.

[MINOR] Add active connection list to server

The motivation for this is to allow iteration of all the connections
of a server without the expense of iterating over the global list
of connections.

The first use of this will be to implement an option to close connections
associated with a server when is is marked as being down or in maintenance
mode.

[CLEANUP] session.c: Make functions static where possible

[CLEANUP] peers.h: fix declarations

* The declaration of peer_session_create() does
  not match its definition. As it is only
  used inside of peers.c make it static.

* Make the declaration of peers_register_table()
  match its definition.

* Also, make all functions in peers.c that
  are not also in peers.h static

[CLEANUP] Remove assigned but unused variables

gcc (Debian 4.6.0-2) 4.6.1 20110329 (prerelease)
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

...
src/proto_http.c:3029:14: warning: variable â\80\98del_clâ\80\99 set but not used [-Wunused-but-set-variable]
In file included from ebtree/eb64tree.c:23:0:
ebtree/eb64tree.h: In function â\80\98__eb64_lookupâ\80\99:
ebtree/eb64tree.h:128:6: warning: variable â\80\98node_bitâ\80\99 set but not used [-Wunused-but-set-variable]
ebtree/eb64tree.h: In function â\80\98__eb64i_lookupâ\80\99:
ebtree/eb64tree.h:180:6: warning: variable â\80\98node_bitâ\80\99 set but not used [-Wunused-but-set-variable]
In file included from ebtree/ebpttree.h:26:0,
                 from ebtree/ebimtree.c:23:
ebtree/eb64tree.h: In function â\80\98__eb64_lookupâ\80\99:
ebtree/eb64tree.h:128:6: warning: variable â\80\98node_bitâ\80\99 set but not used [-Wunused-but-set-variable]
ebtree/eb64tree.h: In function â\80\98__eb64i_lookupâ\80\99:
ebtree/eb64tree.h:180:6: warning: variable â\80\98node_bitâ\80\99 set but not used [-Wunused-but-set-variable]
In file included from ebtree/ebpttree.h:26:0,
                 from ebtree/ebistree.h:25,
                 from ebtree/ebistree.c:23:
ebtree/eb64tree.h: In function â\80\98__eb64_lookupâ\80\99:
ebtree/eb64tree.h:128:6: warning: variable â\80\98node_bitâ\80\99 set but not used [-Wunused-but-set-variable]
ebtree/eb64tree.h: In function â\80\98__eb64i_lookupâ\80\99:
ebtree/eb64tree.h:180:6: warning: variable â\80\98node_bitâ\80\99 set but not used [-Wunused-but-set-variable]

[MINOR] Allow showing and clearing by key of string stick tables

[MINOR] Allow showing and clearing by key of integer stick tables

[MINOR] Allow showing and clearing by key of ipv6 stick tables

[MINOR] More flexible clearing of stick table

* Allow clearing of all entries of a table
* Allow clearing of all entries of a table
that match a data filter

[MINOR] Break out all stick table socat command parsing

This will allow reuse for clearing table entries
other than by key

[MINOR] Allow listing of stick table by key

[MINOR] Break out processing of clear table

This will allow the code to be reused
for showing a table filtered by key

[MINOR] Break out dumping table

This will allow the code to be reused
for showing a table filtered by key

Signed-off-by: Simon Horman <horms@verge.net.au>

[CLEANUP] dumpstats: make symbols static where possible

[BUG] checks: fix support of Mysqld >= 5.5 for mysql-check

mysqld >= 5.5 want the client to announce 4.1+ authentication support, even if we have no password, so we do this.
I also check on a debian potato mysqld 3.22 and it works too so i assume we are good from 3.22 to 5.5.

[WT: this must be backported to 1.4]

[CLEANUP] config: remove some left-over printf debugging code from previous patch

Last patch fbb784 unexpectedly left some debugging printf messages which can be
seen in debug mode.

[MINOR] config: automatically compute a default fullconn value

The fullconn value is not easy to get right when doing dynamic regulation,
as it should depend on the maxconns of the frontends that can reach a
backend. Since the parameter is mandatory, many configs are found with
an inappropriate default value.

Instead of rejecting configs without a fullconn value, we now set it to
10% of the sum of the configured maxconns of all the frontends which are
susceptible to branch to the backend. That way if new frontends are added,
the backend's fullconn automatically adjusts itself.

[DOC] Minor spelling fixes and grammatical enhancements

[BUG] stats: support url-encoded forms

Bashkim Kasa reported that the stats admin page did not work when colons
were used in server or backend names. This was caused by url-encoding
resulting in ':' being sent as '%3A'. Now we systematically decode the
field names and values to fix this issue.

[MINOR] config: make it possible to specify a cookie even without a server

Since version 1.0.0, it's forbidden to have a cookie specified without at
least one server. This test is useless and makes it complex to write APIs
to iteratively generate working configurations. Remove the test.

[OPTIM] stream_sock: don't use splice on too small payloads

It's more expensive to call splice() on short payloads than to use
recv()+send(). One of the reasons is that doing a splice() involves
allocating a pipe. One other reason is that the kernel will have to
copy itself if we try to splice less than a page. So let's fix a
short offset of 4kB below which we don't splice.

A quick test shows that on chunked encoded data, with splice we had
6826 syscalls (1715 splice, 3461 recv, 1650 send) while with this
patch, the same transfer resulted in 5793 syscalls (3896 recv, 1897
send).

[OPTIM] stream_sock: avoid fast-forwarding of partial data

Fast-forwarding between file descriptors is nice but can be counter-productive
when only one part of the buffer is forwarded, because it can result in doubling
the number of send() syscalls. This is what happens on HTTP chunking, because
the chunk data are sent, then the CRLF + next chunk size are parsed and immediately
scheduled for forwarding. This results in two send() for the same block while a
single one would have done it.

[OPTIM] http: optimize chunking again in non-interactive mode

Now that we support the http-no-delay mode, we can optimize HTTP
chunking again by always waiting for more data to come until the
last chunk is met.

This patch may or may not be backported to 1.4, it's not a big deal,
it will mainly help for chunks which are aligned with the buffer size.

[MEDIUM] http: add support for "http-no-delay"

There are some very rare server-to-server applications that abuse the HTTP
protocol and expect the payload phase to be highly interactive, with many
interleaved data chunks in both directions within a single request. This is
absolutely not supported by the HTTP specification and will not work across
most proxies or servers. When such applications attempt to do this through
haproxy, it works but they will experience high delays due to the network
optimizations which favor performance by instructing the system to wait for
enough data to be available in order to only send full packets. Typical
delays are around 200 ms per round trip. Note that this only happens with
abnormal uses. Normal uses such as CONNECT requests nor WebSockets are not
affected.

When "option http-no-delay" is present in either the frontend or the backend
used by a connection, all such optimizations will be disabled in order to
make the exchanges as fast as possible. Of course this offers no guarantee on
the functionality, as it may break at any other place. But if it works via
HAProxy, it will work as fast as possible. This option should never be used
by default, and should never be used at all unless such a buggy application
is discovered. The impact of using this option is an increase of bandwidth
usage and CPU usage, which may significantly lower performance in high
latency environments.

This change should be backported to 1.4 since the first report of such a
misuse was in 1.4. Next patch will also be needed.

[CLEANUP] stream_sock: remove unneeded FL_TCP and factor out test

The FL_TCP flag was a leftover from the old days we were using TCP_CORK.
With MSG_MORE it's not needed anymore so we can remove the condition and
sensibly simplify the test.

[MINOR] stream_sock: always clear BF_EXPECT_MORE upon complete transfer

When sending is complete, it's preferred to systematically clear the flags
that were set for that transfer. What could happen is that the to_forward
counter had caused the MSG_MORE flag to be set and BF_EXPECT_MORE not to
be cleared, resulting in this flag being unexpectedly maintained for next
round.

The code has taken extreme care of not doing this till now, but it's not
acceptable that the caller has to know these precise semantics. So let's
unconditionnally clear the flag instead.

For the sake of safety, this fix should be backported to 1.4.

[MINOR] http: partially revert the chunking optimization for now

Commit 57f5c1 used to provide a nice improvement on chunked encoding since
it ensured that we did not set a PUSH flag for every chunk or buffer data
part of a chunked transfer.

Some applications appear to erroneously abuse HTTP chunking in order to
get interactive exchanges between a user agent and an origin server with
very small chunks. While it happens to work through haproxy, it's terribly
slow due to the latency added after passing each chunk to the system, who
could wait up to 200ms before pushing them onto the wire.

So we need an interactive mode for such usages. In the mean time, step back
on the optim, but not completely, so that we still keep the flag as long as
we know we're not finished with the current chunk.

This change should be backported to 1.4 too as the issue was discovered
with it.

[MINOR] http: make the "HTTP 200" status code configurable.

This status code is used in response to requests matching "monitor-uri".
Some users need to adjust it to fit their needs (eg: make some strings
appear there). As it's already defined as a chunked string and used
exactly like other status codes, it makes sense to make it configurable
with the usual "errorfile", "errorloc", ...

[REORG] http: move HTTP error codes back to proto_http.h

This one was left isolated in its own file. It probably is a leftover
from the 1.2->1.3 split.

[MINOR] http: don't report the "haproxy" word on the monitoring response

Some people like to make the monitoring URL testable from unsafe locations.
Reporting haproxy's existence there can sometimes be problematic. This patch
should not be backported to 1.4 because it is possible, eventhough unlikely,
that some scripts rely on this word to appear there.

[BUG] fix binary stick-tables

As reported by Lauri-Alo Adamson, version 1.5-dev6 doesn't support
stick-tables with a binary type.
This issue was introduced in the commit 4f92d32 where a line was erroneously
deleted, and is 1.5-specific.

[BUG] proto_tcp: fix address binding on remote source

Mark Brooks reported that commit 1b4b7c broke tproxy in 1.5-dev6. Nick
Chalk tracked the issue down to a missing address family setting in
tcp_bind_socket() which resulted in a failure to use get_addr_len().
This issue is 1.5-specific.

[DOC] fix minor typo in the "dispatch" doc

Bradley Falzon reported a left-over of a copy-paste from the "disabled"
keyword in the "dispatch" section.

[BUG] checks: http-check expect could fail a check on multi-packet responses

Christopher Blencowe reported that the httpchk_expect() function was
lacking a test for incomplete responses : if the server sends only the
headers in the first packet and the body in a subsequent one, there is
a risk that the check fails without waiting for more data. A failure
rate of about 1% was reported.

This fix must be backported to 1.4.

[RELEASE] Released version 1.5-dev6

Released version 1.5-dev6 with the following main changes :
    - [BUG] stream_sock: use get_addr_len() instead of sizeof() on sockaddr_storage
    - [BUG] TCP source tracking was broken with IPv6 changes
    - [BUG] stick-tables did not work when converting IPv6 to IPv4
    - [CRITICAL] fix risk of crash when dealing with space in response cookies

[CRITICAL] fix risk of crash when dealing with space in response cookies

When doing fix 24581bae022bcf97ea7818e49ef27d21c92d6aa3 to correctly handle
response cookies, an unfortunate typo was inserted in the less likely code
path, resulting in a risk of crash when cookie-based persistence is enabled
and the server emits a cookie with several spaces around the equal sign.

This bug was noticed during a code backport. Its effects were never reported
because this situation is very unlikely to appear, but it can be provoked on
purpose by the server.

This patch must be backported to 1.4 versions which contain the fix above
(anything > 1.4.8), and to similar 1.3 versions > 1.3.25. 1.5-dev versions
after 1.5-dev2 are affected too.

[BUG] stick-tables did not work when converting IPv6 to IPv4

A stick-table of type IPv6 would store a wrong IPv4 address as the
result of an IPv6 to IPv4 conversion. This bug was introduced in
1.5-dev5.

[BUG] TCP source tracking was broken with IPv6 changes

John Helliwell reported a bug when using TCP source address
tracking on Solaris. The bug was introduced in haproxy 1.5-dev5.

[BUG] stream_sock: use get_addr_len() instead of sizeof() on sockaddr_storage

John Helliwell reported a runtime issue on Solaris since 1.5-dev5. Traces
show that connect() returns EINVAL, which means the socket length is not
appropriate for the family. Solaris does not like being called with sizeof
and needs the address family's size on sockaddr_storage.

The fix consists in adding a get_addr_len() function which returns the
socket's address length based on its family. Tests show that this works
for both IPv4 and IPv6 addresses.

[RELEASE] Released version 1.5-dev5

Released version 1.5-dev5 with the following main changes :
    - [BUG] standard: is_addr return value for IPv4 was inverted
    - [MINOR] update comment about IPv6 support for server
    - [MEDIUM] use getaddrinfo to resolve names if gethostbyname fail
    - [DOC] update IPv6 support for bind
    - [DOC] document IPv6 support for server
    - [DOC] fix a minor typo
    - [MEDIUM] IPv6 support for syslog
    - [DOC] document IPv6 support for syslog
    - [MEDIUM] IPv6 support for stick-tables
    - [DOC] document IPv6 support for stick-tables
    - [DOC] update ROADMAP file
    - [BUG] session: src_conn_cur was returning src_conn_cnt instead
    - [MINOR] frontend: add a make_proxy_line function
    - [MEDIUM] stream_sock: add support for sending the proxy protocol header line
    - [MEDIUM] server: add support for the "send-proxy" option
    - [DOC] update the spec on the proxy protocol
    - [BUILD] proto_tcp: fix build issue with CTTPROXY
    - [DOC] update ROADMAP file
    - [MEDIUM] config: rework the IPv4/IPv6 address parser to support host-only addresses
    - [MINOR] cfgparse: better report wrong listening addresses and make use of str2sa_range
    - [BUILD] add the USE_GETADDRINFO build option
    - [TESTS] provide a test case for various address formats
    - [BUG] session: conn_retries was not always initialized
    - [BUG] log: retrieve the target from the session, not the SI
    - [BUG] http: fix possible incorrect forwarded wrapping chunk size (take 2)
    - [MINOR] tools: add two macros MID_RANGE and MAX_RANGE
    - [BUG] http: fix content-length handling on 32-bit platforms
    - [OPTIM] buffers: uninline buffer_forward()
    - [BUG] stream_sock: fix handling for server side PROXY protocol
    - [MINOR] acl: add support for table_cnt and table_avl matches
    - [DOC] update ROADMAP file

[DOC] document IPv6 support for stick-tables

[MEDIUM] IPv6 support for stick-tables

Since IPv6 is a different type than IPv4, the pattern fetch functions
src6 and dst6 were added. IPv6 stick-tables can also fetch IPv4 addresses
with src and dst. In this case, the IPv4 addresses are mapped to their
IPv6 counterpart, according to RFC 4291.

[DOC] update ROADMAP file

[MINOR] acl: add support for table_cnt and table_avl matches

Those trivial matches respectively return the number of entries used
in a stick-table and the number of entries still available in a table.

[BUG] stream_sock: fix handling for server side PROXY protocol

Patch 5ab04ec47c9946a2bbc535687c023215ca813da0 was incomplete,
because if the first send() fails on an empty buffer, we fail
to rearm the polling and we can't establish the connection
anymore.

The issue was reported by Ben Timby who provided large amounts
of traces of various tests helping to reliably reproduce the issue.

[DOC] document IPv6 support for syslog

[MEDIUM] IPv6 support for syslog

[OPTIM] buffers: uninline buffer_forward()

Since the latest additions to buffer_forward(), it became too large for
inlining, so let's uninline it. The code size drops by 3kB. Should be
backported to 1.4 too.

[BUG] http: fix content-length handling on 32-bit platforms

Despite much care around handling the content-length as a 64-bit integer,
forwarding was broken on 32-bit platforms due to the 32-bit nature of
the ->to_forward member of the "buffer" struct. The issue is that this
member is declared as a long, so while it works OK on 64-bit platforms,
32-bit truncate the content-length to the lower 32-bits.

One solution could consist in turning to_forward to a long long, but it
is used a lot in the critical path, so it's not acceptable to perform
all buffer size computations on 64-bit there.

The fix consists in changing the to_forward member to a strict 32-bit
integer and ensure in buffer_forward() that only the amount of bytes
that can fit into it is considered. Callers of buffer_forward() are
responsible for checking that their data were taken into account. We
arbitrarily ensure we never consider more than 2G at once.

That's the way it was intended to work on 32-bit platforms except that
it did not.

This issue was tracked down hard at Exosec with Bertrand Jacquin,
Thierry Fournier and Julien Thomas. It remained undetected for a long
time because files larger than 4G are almost always transferred in
chunked-encoded format, and most platforms dealing with huge contents
these days run on 64-bit.

The bug affects all 1.5 and 1.4 versions, and must be backported.

[MINOR] tools: add two macros MID_RANGE and MAX_RANGE

Those will be used later, they return the largest and middle integer
possible for a given variable or type.

[BUG] http: fix possible incorrect forwarded wrapping chunk size (take 2)

Fix acd20f80 was incomplete, the computed "bytes" value was not used.

This fix must be backported to 1.4.

[BUG] log: retrieve the target from the session, not the SI

Since we now have the copy of the target in the session, use it instead
of relying on the SI for it. The SI drops the target upon unregister()
so applets such as stats were logged as "NOSRV".

[BUG] session: conn_retries was not always initialized

Johannes Smith reported some wrong retries count in logs associated with bad
requests. The cause was that the conn_retries field in the stream interface
was only initialized when attempting to connect, but is used when logging,
possibly with an uninitialized value holding last connection's conn_retries.
This could have been avoided by making use of a stream interface initializer.

This bug is 1.5-specific.

[TESTS] provide a test case for various address formats

[DOC] fix a minor typo

[DOC] document IPv6 support for server

[DOC] update IPv6 support for bind