Willy Tarreau [Mon, 15 Oct 2007 18:36:37 +0000 (20:36 +0200)]
[BUG] fix wrong timeout computation in event_accept()
In case the incoming socket is set for write and not for read (very
unlikely, except in HEALTH mode), the timeout may remain eternity due
to a copy-paste typo.
Willy Tarreau [Tue, 9 Oct 2007 15:14:37 +0000 (17:14 +0200)]
[MEDIUM] moved the sockaddr pointer to the fdtab structure
The stream_sock_* functions had to know about sessions just in
order to get the server's address for a connect() operation. This
is not desirable, particularly for non-IP protocols (eg: PF_UNIX).
Put a pointer to the peer's sockaddr_storage or sockaddr address
in the fdtab structure so that we never need to look further.
With this small change, the stream_sock.c file is now 100% protocol
independent.
[MINOR] report haproxy's version by default on the stats page
For people who manage many haproxies, it is sometimes convenient
to be informed of their version. This patch adds this, with the
option to disable this report by specifying "stats hide-version".
Also, the feature may be permanently disabled by setting the
STATS_VERSION_STRING to "" (empty string), or the format can
simply be adjusted.
Willy Tarreau [Sun, 14 Oct 2007 21:47:04 +0000 (23:47 +0200)]
[MINOR] spread checks also when the server is OK.
Initial patch only managed to spread the checks when the checks
failed. The randomization code needs to be added also in the path
where the server is going fine.
Willy Tarreau [Sun, 14 Oct 2007 21:05:39 +0000 (23:05 +0200)]
[MEDIUM] only consider slow checks when looking for the common interval
When one server in one backend has a very low check interval, it imposes
its value as the minimal interval, causing all other servers to start
their checks close to each other, thus partially voiding the benefits of
the spread checks.
The solution consists in ignoring intervals lower than a given value
(SRV_CHK_INTER_THRES = 1000 ms) when computing the minimal interval,
and then assigning them a start date relative to their own interval
and not the global one.
With this change, the checks distribution clearly looks better.
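A minimal sketch of the interval selection described above. Only
SRV_CHK_INTER_THRES and its value come from this commit message; the
helper names and the array-based layout are assumptions made for
illustration and are not the actual code.

```c
#include <stddef.h>

/* Threshold named in the commit message: intervals below it are ignored
 * when looking for the common (minimal) interval. */
#define SRV_CHK_INTER_THRES 1000 /* ms */

/* Hypothetical helper: returns the smallest check interval (in ms) that is
 * at least SRV_CHK_INTER_THRES, or 0 if no server qualifies. */
static int min_slow_interval(const int *inter, size_t n)
{
    int mininter = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (inter[i] < SRV_CHK_INTER_THRES)
            continue; /* fast checker: must not impose its value */
        if (!mininter || inter[i] < mininter)
            mininter = inter[i];
    }
    return mininter;
}

/* Hypothetical helper: base period used to spread a server's first check.
 * Slow servers use the common interval, fast ones use their own. */
static int spread_base(int inter, int mininter)
{
    if (inter < SRV_CHK_INTER_THRES || !mininter)
        return inter;
    return mininter;
}
```

With intervals of 100, 2000, 5000 and 1500 ms, the 100 ms server no
longer drags the common interval down: the common value is 1500 ms and
the fast server is scheduled relative to its own 100 ms.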
When one server appears at the same position in multiple backends, it
receives all the checks from all the backends exactly at the same time
because the health-checks are only spread within a backend but not
globally.
Attached patch implements per-server start delay in a different way.
Checks are now spread globally - not locally to one backend. It also makes
them start faster - IMHO there is no need to add a 'server->inter' when
calculating the first execution. The calculations were moved from
cfgparse.c to checks.c. There is a new function start_checks(), which is
not called when haproxy is started in MODE_CHECK.
With this patch it is also possible to set a global 'spread-checks'
parameter. It takes a percentage value (1..50, probably something near
5..10 is a good idea) so haproxy adds or removes that many percent to the
original interval after each check. My test shows that with 18 backends,
54 servers total and 10000ms/5% it takes about 45m to mix them completely.
I decided to use the rand/srand pseudo-random number generator. I am
aware it is not recommended for good randomness but a) we do not need a
good random generator here and b) it is probably the most portable one.
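The 'spread-checks' adjustment can be sketched roughly as follows. This
is an illustration only: the function name and the exact formula are
assumptions, the message above only states that a percentage is added to
or removed from the original interval using rand().

```c
#include <stdlib.h>

/* Hypothetical helper: returns 'inter' (in ms) moved by a pseudo-random
 * amount within +/- 'spread' percent. A spread of 0 disables the feature. */
static int jittered_interval(int inter, int spread)
{
    long long range;
    int delta;

    if (spread <= 0 || inter <= 0)
        return inter;

    range = (long long)inter * spread / 100; /* largest allowed deviation */
    if (range <= 0)
        return inter;

    /* rand() is weak but portable, which is enough for this purpose */
    delta = (int)(rand() % (2 * range + 1) - range); /* -range .. +range */
    return inter + delta;
}
```

With a 10000 ms interval and a spread of 5, every result stays between
9500 and 10500 ms.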
Alexandre Cassen [Thu, 11 Oct 2007 18:48:58 +0000 (20:48 +0200)]
[MINOR] add the "nolinger" option to disable data lingering
The following patch gives the ability to tweak the socket linger mode.
You can use this option with "option nolinger" inside a frontend or
backend configuration declaration.
This will help in environments where lots of FIN_WAIT sockets are
encountered.
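For reference, lingering is usually disabled at the socket level with
SO_LINGER set to a zero timeout. The sketch below assumes that this is
what the option does; the helper name is hypothetical.

```c
#include <sys/socket.h>

/* Hypothetical helper: disable lingering on 'fd' so that close() does not
 * leave the socket waiting for unsent data to be acknowledged.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
static int set_nolinger(int fd)
{
    struct linger nolinger = { .l_onoff = 1, .l_linger = 0 };

    return setsockopt(fd, SOL_SOCKET, SO_LINGER,
                      &nolinger, sizeof(nolinger));
}
```

Note that with such a setting the system may close the connection with
an RST instead of a FIN, so pending data can be lost.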
[MEDIUM] do not add a cache-control: header on non-cacheable responses
I noticed that haproxy, with "cookie (...) nocache" option, always adds
"Cache-control: private" at the end of a header list received from this
server:
It may be just redundant (two "Cache-control: private"), but sometimes it
may be quite confusing, as we may end up with two different, more and
less restrictive directives (no-cache & private) and even quite
conflicting directives (eg. public & private):
So, I added and rearranged some code, so now haproxy adds a
"Cache-control: private" header only when the same (private) or a more
restrictive (no-cache) one is not already present. It was done in three
steps:
1. Use check_response_for_cacheability to check if response is
not cacheable. I simply moved this call before http_header_add_tail2.
2. Use TX_CACHEABLE (not TX_CACHE_COOK - apache <= 1.3.26) to check if we
need to add a Cache-control header. If we add it, clear TX_CACHEABLE and
TX_CACHE_COOK.
3. Check cacheability not only with PR_O_CHK_CACHE but also with
PR_O_COOK_NOC, so:
I removed this unlikely since I believe that now it is not so unlikely.
The patch is definitely not perfect; the proxy should probably also
remove "Cache-control: public". Unfortunately, I do not know the code
well enough to do it myself, yet. ;)
Anyway, I think that even now, it should be very useful.
[MINOR] prevent the system from sending an RST when closing health-checks
On Sat, 22 Sep 2007, Willy Tarreau wrote:
> On Sun, Sep 23, 2007 at 03:23:38AM +0200, Krzysztof Oledzki wrote:
> > I noticed that with httpchk, haproxy generates TCP RST at end of a check.
> > IMHO, it would be more polite to send FIN to a server, especially that
> > each TCP RST found by a tcpdump makes me concerned that something is
> > wrong, as it is hard to distinguish between a RST from a httpchk and from
> > a normal request, forwarded for a client.
>
> I have also noticed it very recently. In fact, it's never the
> application (here haproxy) which decides to send an RST, it's the
> system. It does so because the server returns data on a terminated
> socket. I guess it's because the health-check code does not read much
> of the response. In fact, we just need to read enough to process common
> responses. If people are dumb enough to check with something like "GET
> /image.iso", they should expect to get an RST after a few kbytes
> instead of reading the whole file!
Right, that was easy. Attached patch changed what you described. Now
haproxy finishes http checks with FIN.
This patch fixes a nasty bug reported by both glibc and valgrind, which
prevents haproxy from exiting when a new instance starts up (-sf/-st).
==9299== Invalid free() / delete / delete[]
==9299== at 0x401D095: free (in
/usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==9299== by 0x804A377: deinit (haproxy.c:721)
==9299== by 0x804A883: main (haproxy.c:1014)
==9299== Address 0x41859E0 is 0 bytes inside a block of size 21 free'd
==9299== at 0x401D095: free (in
/usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==9299== by 0x804A84B: main (haproxy.c:985)
==9299==
[MAJOR] timeouts and retries could be ignored when switching backend
When switching from a frontend to a backend, the "retries" parameter
was not kept, resulting in the impossibility to reconnect after the
first connection failure. This problem was reported and analyzed by
Krzysztof Oledzki.
While fixing the code, it appeared that some of the backend's timeouts
were not updated in the session when using "use_backend" or "default_backend".
It seems this had no impact but just in case, it's better to set them as
they should have been.
[MEDIUM] pre-initialize timeouts to infinity, not zero
Since the timers have been changed, the timeouts for the default instance
have not been adjusted. This results in unspecified timeouts becoming zero
instead of infinite.
[MINOR] set the log socket receive window to zero bytes
The syslog UDP socket may receive data, which is not cool because those
data accumulate in the system buffers up to the receive socket buffer size.
To prevent this, we set the receive window to zero and try to shutdown(SHUT_RD)
the socket.
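A sketch of the two calls involved, assuming a plain datagram socket.
The helper name is hypothetical. The shutdown() result is ignored here
because the message above only says that we "try" it.

```c
#include <sys/socket.h>

/* Hypothetical helper: make a send-only log socket refuse to buffer
 * incoming data. Returns the result of the SO_RCVBUF setsockopt() call. */
static int quiesce_log_socket(int fd)
{
    int zero = 0;
    int ret;

    /* Ask for an empty receive buffer; the system may round this up to
     * its own minimum. */
    ret = setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &zero, sizeof(zero));

    /* Best effort only: some systems reject this on an unconnected
     * datagram socket, so the result is deliberately ignored. */
    (void)shutdown(fd, SHUT_RD);

    return ret;
}
```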
[MEDIUM] fix configuration sanity checks for TCP listeners
A long chain of if/else prevented many sanity checks from being
performed on TCP listeners, resulting in dangerous configs being
accepted. Removed the offending 'else'.
[BUILD] centralize version and date into one file for each
The version does not appear anymore in the Makefiles nor in
the include files. It was a nightmare to maintain. Now there
is a VERSION file which contains the major version, a VERDATE
file which contains the date for this version and a SUBVERS
file which may contain a sub-version.
A "make version" target has been added to all makefiles to
check the version. The GNU Makefile also has an update-version
target to update those files. This should never be used.
It is still possible to override those values by specifying
them in the equivalent make variables. By default, the GNU
makefile tries to detect a GIT repository and always uses the
version and date from the current repository. This can be
disabled by setting IGNOREGIT to a non-void value.
[MAJOR] remove files distributed under an obscure license
src/chtbl.c, src/hashpjw.c and src/list.c are distributed under
an obscure license. While Aleks and I believe that this license
is OK for haproxy, other people think it is not compatible with
the GPL.
Whether it is or not is not the problem. The fact that it raises
a doubt is sufficient for this problem to be addressed. Arnaud
Cornet rewrote the unclear parts with clean GPLv2 and LGPL code.
The hash algorithm has changed too and the code has been slightly
simplified in the process. A lot of care has been taken in order
to respect the original API as much as possible, including the
LGPL for the exportable parts.
The new code has not been thoroughly tested but it looks OK now.
added "wt_hash" which shows only 60 collisions in 575k values, which
sets it between hashword() and djbx33(). It's also between both in
terms of performance, but the most important part is that its variable
length rotation mechanism should make it really harder to predict and
attack than the other ones.
Willy Tarreau [Fri, 31 Aug 2007 15:01:18 +0000 (17:01 +0200)]
[MAJOR] spec I/O: fix allocations of spec entries for an FD
Under some circumstances, it was possible with speculative I/O to
reallocate multiple entries for the same FD if an fd_{set,clr,set}
or fd_{clr,set,clr} sequences were performed before a schedule.
Fix this by keeping an allocation flag for each fd.
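The idea of a per-fd allocation flag can be sketched as below. All names
and the fixed-size arrays are assumptions made for illustration; they do
not reflect the real speculative I/O structures.

```c
#define MAX_FD 64

/* One flag per fd: set while the fd owns an entry in the spec list. */
static unsigned char spec_allocated[MAX_FD];
static int spec_list[MAX_FD];
static int nbspec;

/* Allocate a spec entry for 'fd' unless it already has one. Repeated
 * set/clr/set sequences before a schedule then reuse the same entry. */
static void spec_alloc(int fd)
{
    if (fd < 0 || fd >= MAX_FD)
        return;
    if (spec_allocated[fd])
        return; /* already has an entry, do not allocate a second one */
    spec_allocated[fd] = 1;
    spec_list[nbspec++] = fd;
}

/* Called once the scheduler has processed the list: release all entries. */
static void spec_release_all(void)
{
    int i;

    for (i = 0; i < nbspec; i++)
        spec_allocated[spec_list[i]] = 0;
    nbspec = 0;
}
```

Because an fd can hold at most one entry, the list can never grow beyond
MAX_FD elements.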
[MEDIUM] stats page: added links for 'refresh' and 'hide down'
The stats page now supports an option to hide servers which are DOWN
and to enable/disable automatic refresh. It is also possible to ask
for an immediate refresh.
[MEDIUM] ensure we never overflow in chunk_printf()
The result of the vsnprintf() called in chunk_printf() must be checked,
and should be added only if lower than the requested size. We simply
return zero if we cannot write the chunk.
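The check can be sketched as follows. The chunk structure below is a
simplified stand-in and the function is not the real chunk_printf(); only
the rule (append only when the vsnprintf() result is lower than the
available size, otherwise return zero) comes from the message above.

```c
#include <stdarg.h>
#include <stdio.h>

/* Simplified stand-in for the chunk type; the fields are assumptions. */
struct chunk {
    char *str; /* start of the buffer */
    int len;   /* bytes already used */
};

/* Appends formatted text to 'chk', whose buffer is 'size' bytes long.
 * Returns the number of bytes added, or 0 if the text did not fit. */
static int chunk_printf_sketch(struct chunk *chk, int size,
                               const char *fmt, ...)
{
    va_list argp;
    int room = size - chk->len;
    int ret;

    if (room <= 0)
        return 0;

    va_start(argp, fmt);
    ret = vsnprintf(chk->str + chk->len, (size_t)room, fmt, argp);
    va_end(argp);

    if (ret < 0 || ret >= room) {
        /* error or truncation: drop the partial output */
        chk->str[chk->len] = '\0';
        return 0;
    }
    chk->len += ret;
    return ret;
}
```

vsnprintf() returns the length the full output would have had, so a
result greater than or equal to the remaining room means truncation.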
[MINOR] add support for "stats refresh <interval>"
Sometimes it may be desirable to automatically refresh the
stats page. Most browsers support the "Refresh:" header with
an interval in seconds. Specifying "stats refresh xxx" will
automatically add this header.
This new configuration manual intends to document every known keyword
of the configuration language. Right now, it enumerates them all and
describes how to use ACLs.
The GCD used when computing the servers' weights causes the total
weight of the backend to appear lower than expected because it is
divided by the GCD. An easy solution consists in recomputing the GCD
from the first server and applying it to the global weight.
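A small sketch of the arithmetic, under the assumption that each server
keeps both its configured weight and its weight divided by the GCD. The
structure and the names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical server weights: 'uweight' is the configured value,
 * 'eweight' is the value after division by the GCD. */
struct srv_weight {
    int uweight;
    int eweight;
};

static int gcd(int a, int b)
{
    while (b) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Divides all weights by their GCD and returns the reduced total. */
static int reduce_weights(struct srv_weight *s, size_t n)
{
    int g = 0, total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        g = gcd(g, s[i].uweight);
    for (i = 0; i < n; i++) {
        s[i].eweight = g ? s[i].uweight / g : 0;
        total += s[i].eweight;
    }
    return total;
}

/* Recomputes the GCD from the first server and applies it to the reduced
 * total so that the reported weight matches the configuration. */
static int reported_total(const struct srv_weight *s, size_t n, int total)
{
    if (!n || !s[0].eweight)
        return total;
    return total * (s[0].uweight / s[0].eweight);
}
```

With configured weights 10, 20 and 30, the reduced total is 6 and the
reported total is 60 again.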
[MEDIUM] improve behaviour with large number of servers per proxy
When a very large number of servers is configured (thousands),
shutting down many of them at once could lead to a large number
of calls to recalc_server_map(), which already takes some time.
This would result in an O(N^3) computation time, leading to
noticeable pauses on slow embedded CPUs on test platforms.
Instead, mark the map as dirty and recalc it only when needed.
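The dirty-flag approach can be sketched like this. The structure and all
function names are hypothetical; a counter stands in for the expensive
recalc_server_map() work.

```c
/* Hypothetical, reduced view of a proxy's server map state. */
struct map_state {
    int nb_up;        /* servers currently usable */
    int dirty;        /* set when the map no longer matches nb_up */
    int recalc_count; /* number of expensive rebuilds, for illustration */
};

/* State change: cheap, only marks the map as needing a rebuild. */
static void server_goes_down(struct map_state *m)
{
    if (m->nb_up > 0)
        m->nb_up--;
    m->dirty = 1;
}

/* Stand-in for the expensive map rebuild. */
static void rebuild_map(struct map_state *m)
{
    m->recalc_count++;
    m->dirty = 0;
}

/* Lookup path: rebuild only when the map is actually needed. */
static int usable_servers(struct map_state *m)
{
    if (m->dirty)
        rebuild_map(m);
    return m->nb_up;
}
```

Shutting down a thousand servers in a row then costs a single rebuild at
the next lookup instead of one rebuild per server.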
[MEDIUM] fade out memory usage when stopping proxies
Now we try to free as many pools as possible when a proxy is stopping.
The reason is that we want to ease the process replacement when applying
a new configuration, without keeping too much unused memory allocated.
[MEDIUM] Added easier support for Doug Lea's malloc (dlmalloc)
It's now as easy as passing "DLMALLOC_SRC=<path_to_dlmalloc.c>" to
build with support for dlmalloc. The dlmalloc source is not provided
with haproxy in order to ensure that people will use either the most
recent, or the most suited version for their platform. The minimal
mmap size is specified in DLMALLOC_THRES, which defaults to 4096. It
should be increased on platforms with larger pages (eg: 8 kB on some
64 bit systems).
Willy Tarreau [Sun, 17 Jun 2007 21:41:40 +0000 (23:41 +0200)]
[RELEASE] Released version 1.3.12 with the following main changes :
- acl: smarter integer comparison support in ACLs
- acl: specify the direction during fetches
- acl: provide the argument length for fetch functions
- acl: provide a reference to the expr to fetch()
- acl: implement matching on header values
- acl: support matching on 'path' component
- acl: permit to return any header when no name specified
- errorfile: use a local file to feed error messages
- negation in ACL conds was not cleared between terms
- fix segfault at exit when using captures
- improve memory freeing upon exit
- acl: support '-i' to ignore case when matching
- str2net() must not change the const char *
- provide default ACLs
- acl: distinguish between request and response headers
- added the 'use_backend' keyword for full content-switching
- acl: added the TRUE and FALSE ACLs.
- shut warnings 'is*' macros from ctype.h on solaris
Willy Tarreau [Sun, 17 Jun 2007 18:40:25 +0000 (20:40 +0200)]
[MEDIUM] acl: added the TRUE and FALSE ACLs.
Those ACLs are sometimes useful for troubleshooting. Two ACL subjects
"always_true" and "always_false" have been added too. They return what
their subject says for every pattern. Also, acl_match_pst() has been
removed.
Willy Tarreau [Sat, 16 Jun 2007 22:36:03 +0000 (00:36 +0200)]
[MINOR] improve memory freeing upon exit
The deinit() function is specialized in memory area freeing.
There was a ton of information that was not released at
exit time, which made valgrind complain. Now, most of the entries
are freed. However, it seems like regfree() does not completely
free a regex (12 bytes lost per regex).
Willy Tarreau [Sat, 16 Jun 2007 21:19:53 +0000 (23:19 +0200)]
[BUG] fix segfault at exit when using captures
Since pools v2, the way pools were destroyed at exit has been incorrect
because it ought to account for users of those pools before freeing
them. This test also ensures there is no double free.
Willy Tarreau [Sun, 10 Jun 2007 22:29:26 +0000 (00:29 +0200)]
[MEDIUM] errorfile: use a local file to feed error messages
It is now possible to read error messages from local files,
using the 'errorfile' keyword. Those files are read during
parsing, so there's no I/O involved. They make it possible
to return custom error messages with custom status and headers.
Willy Tarreau [Sun, 10 Jun 2007 19:28:46 +0000 (21:28 +0200)]
[MEDIUM] acl: support matching on 'path' component
'path', 'path_reg', 'path_beg', 'path_end', 'path_sub', 'path_dir'
and 'path_dom' have been implemented to process the path component
of the URI. It starts after the host part, and stops before the
question mark.
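A sketch of how the path boundaries described above could be located.
This is not the actual fetch function: the name is hypothetical, and the
"scheme://host" handling is a simplifying assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the path inside 'uri' and
 * stores its length in '*len'. The path starts after the host part (if
 * any) and stops before the question mark. Returns NULL when an absolute
 * URI carries no path at all. */
static const char *uri_path(const char *uri, size_t *len)
{
    const char *start = uri;
    const char *end;

    if (*uri != '/') {
        /* absolute form: skip "scheme://host" up to the first slash */
        const char *sep = strstr(uri, "://");

        if (sep) {
            start = strchr(sep + 3, '/');
            if (!start) {
                *len = 0;
                return NULL;
            }
        }
    }

    end = strchr(start, '?');
    *len = end ? (size_t)(end - start) : strlen(start);
    return start;
}
```

For "/img/logo.png?x=1" the path is "/img/logo.png", and for
"http://example.com/a/b?q" it is "/a/b".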
Willy Tarreau [Sun, 10 Jun 2007 17:45:56 +0000 (19:45 +0200)]
[MEDIUM] acl: implement matching on header values
hdr(x), hdr_reg(x), hdr_beg(x), hdr_end(x), hdr_sub(x), hdr_dir(x),
hdr_dom(x), hdr_cnt(x) and hdr_val(x) have been implemented. They
apply to any of the possibly multiple values of header <x>.
Right now, hdr_val() is limited to integer matching, but it should
reasonably be upgraded to match long long ints.
Willy Tarreau [Sun, 10 Jun 2007 08:06:18 +0000 (10:06 +0200)]
[MINOR] acl: specify the direction during fetches
Some fetches such as 'line' or 'hdr' need to know the direction of
the test (request or response). A new 'dir' parameter is now
propagated from the caller to achieve this.
Willy Tarreau [Sat, 9 Jun 2007 21:10:04 +0000 (23:10 +0200)]
[MEDIUM] smarter integer comparison support in ACLs
ACLs now support operators such as 'eq', 'le', 'lt', 'ge' and 'gt'
in order to give more flexibility to the language. Because of this
change, the 'dst_limit' keyword changed to 'dst_conn' and now requires
either a range or a test such as 'dst_conn lt 1000' which is more
understandable.
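The operators can be illustrated with the small sketch below. The enum
and the function names are assumptions; only the five operator keywords
come from the message above.

```c
#include <stddef.h>
#include <string.h>

/* Order matches the 'names' table in parse_int_op(). */
enum acl_int_op { OP_EQ, OP_LE, OP_LT, OP_GE, OP_GT };

/* Hypothetical parser: returns 1 and fills '*op' if 'word' is one of the
 * supported operators, 0 otherwise. */
static int parse_int_op(const char *word, enum acl_int_op *op)
{
    static const char *names[] = { "eq", "le", "lt", "ge", "gt" };
    size_t i;

    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (strcmp(word, names[i]) == 0) {
            *op = (enum acl_int_op)i;
            return 1;
        }
    }
    return 0;
}

/* Returns non-zero if 'value <op> ref' holds. */
static int match_int(enum acl_int_op op, long value, long ref)
{
    switch (op) {
    case OP_EQ: return value == ref;
    case OP_LE: return value <= ref;
    case OP_LT: return value <  ref;
    case OP_GE: return value >= ref;
    case OP_GT: return value >  ref;
    }
    return 0;
}
```

A test such as 'dst_conn lt 1000' then matches as long as the fetched
value is 999 or lower.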
Willy Tarreau [Sun, 3 Jun 2007 15:27:07 +0000 (17:27 +0200)]
[RELEASE] Released version 1.3.11.4 with the following main changes :
- do not re-arm read timeout in SHUTR state
- optimize I/O by detecting system starvation
- the epoll FD must not be shared between processes
- limit the number of events returned by *poll*
Willy Tarreau [Sun, 3 Jun 2007 15:16:49 +0000 (17:16 +0200)]
[MEDIUM] limit the number of events returned by *poll*
By default, epoll/kqueue used to return as many events as possible.
This could sometimes cause huge latencies (latencies of up to 400 ms
have been observed with many thousands of fds at once). Limiting the
number of events returned also reduces the latency by avoiding too
much blind processing. The value is set to 200 by default and can be
changed in the global section using the tune.maxpollevents parameter.
Willy Tarreau [Sun, 3 Jun 2007 14:40:44 +0000 (16:40 +0200)]
[BUG] the epoll FD must not be shared between processes
Recreate the epoll file descriptor after a fork(). It will ensure
that all processes will not share their epoll_fd. Some side effects
were encountered because of this, such as epoll_wait() returning an
FD which was previously deleted, in multi-process mode.
Willy Tarreau [Sun, 3 Jun 2007 12:10:36 +0000 (14:10 +0200)]
[MEDIUM] optimize I/O by detecting system starvation
Compare the results of recv/send with the parameter passed and
detect whether the system has no free buffer space for send()
or has no data anymore for recv(). This dramatically reduces
the number of syscalls (by about 23%).
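The receive side of this test can be sketched as follows. The helper
name and the 'drained' output are assumptions; the principle is that a
short read means the system has nothing more to give right now, so a
second recv() that would only return EAGAIN can be skipped.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: reads at most 'max' bytes from 'fd'. '*drained' is
 * set to 1 when the system returned fewer bytes than requested, meaning
 * that another recv() right now would most likely find nothing. */
static ssize_t recv_once(int fd, char *buf, size_t max, int *drained)
{
    ssize_t ret = recv(fd, buf, max, 0);

    *drained = (ret >= 0 && (size_t)ret < max);
    return ret;
}
```

The same comparison applies to send(): a short write indicates that the
system buffers are full, so the caller can go back to polling at once.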
Willy Tarreau [Sun, 3 Jun 2007 13:59:52 +0000 (15:59 +0200)]
[BUG] do not re-arm read timeout after writing data
A second occurrence of read-timeout rearming was present in stream_sock.c.
To fix the problem, it was necessary to put the shutdown information in
the buffer (already planned).
Willy Tarreau [Sun, 3 Jun 2007 13:25:37 +0000 (15:25 +0200)]
[BUG] do not re-arm read timeout in SHUTR state !
There is a long-time bug causing busy loops when either client-side
or server-side enters a SHUTR state. When writing data to the FD,
it was possible to re-arm the read side if the write had been paused.
Willy Tarreau [Mon, 14 May 2007 00:42:33 +0000 (02:42 +0200)]
[RELEASE] Released version 1.3.11 with the following main changes :
- fixed ev_sepoll again by rewriting the state machine
- switched all timeouts to timevals instead of milliseconds
- improved memory management using mempools v2.
- several minor optimizations
Willy Tarreau [Mon, 14 May 2007 00:11:39 +0000 (02:11 +0200)]
[MINOR] disable useless hint in wake_expired_tasks
wake_expired_tasks() used a hint to avoid scanning the tree in most cases,
but it looks like the hint is more expensive than reaching the first node
in the tree. Disable it for now.
Willy Tarreau [Mon, 14 May 2007 00:02:04 +0000 (02:02 +0200)]
[BUG] fix null timeouts in *poll-based pollers
Introduction of timeval timers broke *poll-based pollers, because the call to
tv_ms_remain may return 0 while the event is not elapsed yet. Now we carefully
check for those cases and round the result up by 1 ms.
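The rounding issue can be illustrated as below. This is a sketch with a
hypothetical function name, not the real tv_ms_remain(): it truncates to
milliseconds and then adds 1 ms whenever the event has not elapsed yet,
so a poller is never given a null timeout for a pending event.

```c
#include <sys/time.h>

/* Hypothetical helper: milliseconds to wait from 'now' until 'exp'.
 * Returns 0 only if 'exp' is already reached; otherwise the truncated
 * remaining time plus 1 ms. */
static unsigned long ms_remain_rounded(const struct timeval *now,
                                       const struct timeval *exp)
{
    long sec = (long)(exp->tv_sec - now->tv_sec);
    long usec = (long)(exp->tv_usec - now->tv_usec);

    if (usec < 0) {
        usec += 1000000;
        sec--;
    }
    if (sec < 0 || (sec == 0 && usec == 0))
        return 0; /* already expired */

    /* truncation alone could yield 0 with up to 999 us left */
    return (unsigned long)sec * 1000 + (unsigned long)usec / 1000 + 1;
}
```

With 500 microseconds left, a plain truncation gives 0 ms and the poller
would spin; the sketch returns 1 ms instead.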
Willy Tarreau [Sun, 13 May 2007 22:39:29 +0000 (00:39 +0200)]
[MAJOR] call garbage collector when doing soft stop
When we're interrupted by another instance, it is very likely
that the other one will need some memory. Now we know how to
free what is not used, so let's do it.
Also only free non-null pointers. Previously, pool_destroy()
did implicitly check for this case, which was incidentally
needed.
Willy Tarreau [Sun, 13 May 2007 22:16:13 +0000 (00:16 +0200)]
[MEDIUM] enhance behaviour of mempools v2
- keep the number of users of each pool
- call the garbage collector on out of memory conditions
- sort the pools by size for faster creation
- force the alignment size to 16 bytes instead of 4*sizeof(void *)
Willy Tarreau [Sun, 13 May 2007 19:29:55 +0000 (21:29 +0200)]
[MAJOR] ported appsession to use mempools v2
Also during this process, a bug was found in appsession_refresh().
It would not automatically requeue the task in the queue, so the
old sessions would not vanish.
Willy Tarreau [Sun, 13 May 2007 09:40:04 +0000 (11:40 +0200)]
[TESTS] updates to hash experimentations
Aleksandar Lazic has collected many hashing algorithms and put them
in one file to ease benchmarking. Some algos look promising, some
of them have been checked further with uri_hash. Some results on
various systems/hardware are stored in hash_results.txt.
Willy Tarreau [Sat, 12 May 2007 23:52:05 +0000 (01:52 +0200)]
[BUG] fix ev_sepoll again, this time with a new state machine
It was possible in ev_sepoll() to ignore certain events if
all speculative events had been processed at once, because
the epoll_wait() timeout was not cleared, thus delaying the
events delivery.
The state machine was complicated, so it has been rewritten.
It seems faster and more correct right now.
Willy Tarreau [Sat, 12 May 2007 20:35:00 +0000 (22:35 +0200)]
[MAJOR] replaced all timeouts with struct timeval
The timeout functions were difficult to manipulate because they were
rounding results to the millisecond. Thus, it was difficult to compare
and to check what expired and what did not. Also, the comparison
functions were heavy with multiplies and divides by 1000. Now, all
timeouts are stored in timevals, reducing the number of operations
for updates and leading to cleaner and more efficient code.
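A sketch of what timeval-based timeouts look like. The function names
are hypothetical and not necessarily those used in the code: the
millisecond delay is converted once when the timer is armed, and each
later expiration test is a plain comparison without multiplies or
divides.

```c
#include <sys/time.h>

/* Returns -1, 0 or 1 if 'a' is before, equal to or after 'b'. */
static int tv_cmp_sketch(const struct timeval *a, const struct timeval *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec ? -1 : 1;
    if (a->tv_usec != b->tv_usec)
        return a->tv_usec < b->tv_usec ? -1 : 1;
    return 0;
}

/* Sets 'tv' to 'from' plus 'ms' milliseconds ('ms' must not be negative). */
static void tv_add_ms_sketch(struct timeval *tv,
                             const struct timeval *from, int ms)
{
    tv->tv_sec = from->tv_sec + ms / 1000;
    tv->tv_usec = from->tv_usec + (ms % 1000) * 1000;
    if (tv->tv_usec >= 1000000) {
        tv->tv_usec -= 1000000;
        tv->tv_sec++;
    }
}
```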
Willy Tarreau [Wed, 9 May 2007 19:57:51 +0000 (21:57 +0200)]
[BUG] two missing states in sepoll transition matrix
Two states were missing in the speculative epoll state transition
matrix. This could cause some timeouts and unhandled events. The
problem showed up in TCP mode with a fast server at high session
rates, but could in theory also affect HTTP mode.