Willy Tarreau [Sat, 9 Jun 2007 21:10:04 +0000 (23:10 +0200)]
[MEDIUM] smarter integer comparison support in ACLs
ACLs now support operators such as 'eq', 'le', 'lt', 'ge' and 'gt'
in order to give more flexibility to the language. Because of this
change, the 'dst_limit' keyword changed to 'dst_conn' and now requires
either a range or a test such as 'dst_conn lt 1000' which is more
understandable.
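As an illustration only, the operator semantics described above can be sketched in C as follows. The function name acl_int_test and its return convention are hypothetical and are not taken from the HAProxy sources.

```c
#include <string.h>

/* Hypothetical sketch: apply one of the integer operators named in the
 * commit message to a fetched value and a reference value.
 * Returns 1 on match, 0 on no match, -1 if the operator is unknown. */
int acl_int_test(const char *op, long value, long ref)
{
    if (strcmp(op, "eq") == 0) return value == ref;
    if (strcmp(op, "le") == 0) return value <= ref;
    if (strcmp(op, "lt") == 0) return value <  ref;
    if (strcmp(op, "ge") == 0) return value >= ref;
    if (strcmp(op, "gt") == 0) return value >  ref;
    return -1; /* not one of eq/le/lt/ge/gt */
}
```

With this sketch, a test such as 'dst_conn lt 1000' would correspond to acl_int_test("lt", current_conns, 1000), where current_conns is whatever value the fetch produced.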
Willy Tarreau [Sun, 3 Jun 2007 15:27:07 +0000 (17:27 +0200)]
[RELEASE] Released version 1.3.11.4 with the following main changes :
- do not re-arm read timeout in SHUTR state
- optimize I/O by detecting system starvation
- the epoll FD must not be shared between processes
- limit the number of events returned by *poll*
Willy Tarreau [Sun, 3 Jun 2007 15:16:49 +0000 (17:16 +0200)]
[MEDIUM] limit the number of events returned by *poll*
By default, epoll/kqueue used to return as many events as possible.
This could sometimes cause huge latencies (latencies of up to 400 ms
have been observed with many thousands of fds at once). Limiting the
number of events returned also reduces the latency by avoiding too
much blind processing. The value is set to 200 by default and can be
changed in the global section using the tune.maxpollevents parameter.
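The effect of capping the event count can be sketched with the standard Linux epoll_wait() call, whose third argument bounds how many events one call may return. The helper name wait_bounded is hypothetical, and allocating the array on each call is a simplification that a real poller would avoid.

```c
#include <stdlib.h>
#include <sys/epoll.h>

/* Hypothetical sketch: wait on epfd but accept at most max_events events
 * per call, the way a tunable such as tune.maxpollevents would bound it.
 * Returns the number of events reported, or -1 on error. */
int wait_bounded(int epfd, int max_events, int timeout_ms)
{
    struct epoll_event *evs;
    int n;

    if (max_events <= 0)
        return -1;

    evs = calloc((size_t)max_events, sizeof(*evs));
    if (!evs)
        return -1;

    /* the kernel never reports more than max_events entries here */
    n = epoll_wait(epfd, evs, max_events, timeout_ms);

    /* a real poller would now process evs[0..n-1] */
    free(evs);
    return n;
}
```

Events that do not fit in one call are not lost: they are reported by the next call, so the cap bounds the work done per loop iteration rather than the total work.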
Willy Tarreau [Sun, 3 Jun 2007 14:40:44 +0000 (16:40 +0200)]
[BUG] the epoll FD must not be shared between processes
Recreate the epoll file descriptor after a fork(). This ensures
that the processes no longer share their epoll_fd. Sharing it caused
some side effects in multi-process mode, such as epoll_wait()
returning an FD which was previously deleted.
Willy Tarreau [Sun, 3 Jun 2007 12:10:36 +0000 (14:10 +0200)]
[MEDIUM] optimize I/O by detecting system starvation
Compare the results of recv/send with the parameter passed and
detect whether the system has no free buffer space for send()
or has no data anymore for recv(). This dramatically reduces
the number of syscalls (by about 23%).
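The read side of this idea can be sketched as below. The helper name recv_detect and the drained flag are hypothetical. The sketch only shows the comparison between the requested length and the returned length, and the same reasoning applies to send().

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical sketch: perform one non-blocking recv() and report whether
 * the kernel buffer is now known to be empty. A short read (fewer bytes
 * than requested, including 0 at end of stream) means a second recv()
 * would bring no more data, so that extra syscall can be skipped. */
ssize_t recv_detect(int fd, void *buf, size_t len, int *drained)
{
    ssize_t n = recv(fd, buf, len, MSG_DONTWAIT);

    /* on error (n < 0) nothing is concluded and drained stays 0 */
    *drained = (n >= 0 && (size_t)n < len);
    return n;
}
```

When drained is set, the caller can go straight back to polling instead of calling recv() again only to receive EAGAIN.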
Willy Tarreau [Sun, 3 Jun 2007 13:59:52 +0000 (15:59 +0200)]
[BUG] do not re-arm read timeout after writing data
A second occurrence of read-timeout rearming was present in stream_sock.c.
To fix the problem, it was necessary to put the shutdown information in
the buffer (already planned).
Willy Tarreau [Sun, 3 Jun 2007 13:25:37 +0000 (15:25 +0200)]
[BUG] do not re-arm read timeout in SHUTR state !
There is a long-standing bug causing busy loops when either client-side
or server-side enters a SHUTR state. When writing data to the FD,
it was possible to re-arm the read side if the write had been paused.
Willy Tarreau [Mon, 14 May 2007 00:42:33 +0000 (02:42 +0200)]
[RELEASE] Released version 1.3.11 with the following main changes :
- fixed ev_sepoll again by rewriting the state machine
- switched all timeouts to timevals instead of milliseconds
- improved memory management using mempools v2.
- several minor optimizations
Willy Tarreau [Mon, 14 May 2007 00:11:39 +0000 (02:11 +0200)]
[MINOR] disable useless hint in wake_expired_tasks
wake_expired_tasks() used a hint to avoid scanning the tree in most cases,
but it looks like the hint is more expensive than reaching the first node
in the tree. Disable it for now.
Willy Tarreau [Mon, 14 May 2007 00:02:04 +0000 (02:02 +0200)]
[BUG] fix null timeouts in *poll-based pollers
Introduction of timeval timers broke *poll-based pollers, because the call to
tv_ms_remain may return 0 while the event is not elapsed yet. Now we carefully
check for those cases and round the result up by 1 ms.
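A minimal sketch of such a rounding is shown below. The function name ms_until is hypothetical and this is not the actual tv_ms_remain code. It only illustrates why a sub-millisecond remainder must not be reported as 0 to a poller whose timeout is expressed in milliseconds.

```c
#include <sys/time.h>

/* Hypothetical sketch: number of milliseconds to wait until 'exp'.
 * Returns 0 only when the date is already reached. Otherwise the
 * truncated millisecond count is bumped by 1, so that an event due in
 * less than 1 ms does not produce a zero (busy-looping) timeout. */
unsigned long ms_until(const struct timeval *now, const struct timeval *exp)
{
    long sec  = (long)(exp->tv_sec  - now->tv_sec);
    long usec = (long)(exp->tv_usec - now->tv_usec);

    if (usec < 0) {          /* borrow one second */
        sec--;
        usec += 1000000;
    }
    if (sec < 0 || (sec == 0 && usec == 0))
        return 0;            /* already expired */

    return (unsigned long)sec * 1000 + (unsigned long)usec / 1000 + 1;
}
```

The cost of this choice is that the poller may sleep up to 1 ms longer than strictly needed, which is preferable to waking up early and spinning.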
Willy Tarreau [Sun, 13 May 2007 22:39:29 +0000 (00:39 +0200)]
[MAJOR] call garbage collector when doing soft stop
When we're interrupted by another instance, it is very likely
that the other one will need some memory. Now we know how to
free what is not used, so let's do it.
Also, only free non-null pointers. Previously, pool_destroy()
implicitly checked for this case, which was incidentally
needed.
Willy Tarreau [Sun, 13 May 2007 22:16:13 +0000 (00:16 +0200)]
[MEDIUM] enhance behaviour of mempools v2
- keep the number of users of each pool
- call the garbage collector on out of memory conditions
- sort the pools by size for faster creation
- force the alignment size to 16 bytes instead of 4*sizeof(void *)
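For the last item, rounding an object size up to a 16-byte boundary can be sketched as follows. The function name pool_align16 is hypothetical, and the sketch does not claim to reproduce how the pool code computes its sizes.

```c
#include <stddef.h>

/* Hypothetical sketch: round 'size' up to the next multiple of 16.
 * Adding 15 then clearing the low four bits works because 16 is a
 * power of two. */
size_t pool_align16(size_t size)
{
    return (size + 15) & ~(size_t)15;
}
```

A fixed 16 gives the same layout on 32-bit and 64-bit builds, whereas 4*sizeof(void *) yields 16 on the former and 32 on the latter.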
Willy Tarreau [Sun, 13 May 2007 19:29:55 +0000 (21:29 +0200)]
[MAJOR] ported appsession to use mempools v2
Also during this process, a bug was found in appsession_refresh().
It would not automatically requeue the task in the queue, so the
old sessions would not vanish.
Willy Tarreau [Sun, 13 May 2007 09:40:04 +0000 (11:40 +0200)]
[TESTS] updates to hash experimentations
Aleksandar Lazic has collected many hashing algorithms and put them
in one file to ease benchmarking. Some algos look promising, some
of them have been checked further with uri_hash. Some results on
various systems/hardware are stored in hash_results.txt.
Willy Tarreau [Sat, 12 May 2007 23:52:05 +0000 (01:52 +0200)]
[BUG] fix ev_sepoll again, this time with a new state machine
It was possible in ev_sepoll() to ignore certain events if
all speculative events had been processed at once, because
the epoll_wait() timeout was not cleared, thus delaying the
events delivery.
The state machine was complicated, so it has been rewritten.
It seems faster and more correct right now.
Willy Tarreau [Sat, 12 May 2007 20:35:00 +0000 (22:35 +0200)]
[MAJOR] replaced all timeouts with struct timeval
The timeout functions were difficult to manipulate because they were
rounding results to the millisecond. Thus, it was difficult to compare
and to check what expired and what did not. Also, the comparison
functions were heavy with multiplies and divides by 1000. Now, all
timeouts are stored in timevals, reducing the number of operations
for updates and leading to cleaner and more efficient code.
Willy Tarreau [Wed, 9 May 2007 19:57:51 +0000 (21:57 +0200)]
[BUG] two missing states in sepoll transition matrix
Two states were missing in the speculative epoll state transition
matrix. This could cause some timeouts and unhandled events. The
problem showed up in TCP mode with a fast server at high session
rates, but could in theory also affect HTTP mode.
Willy Tarreau [Tue, 8 May 2007 23:44:58 +0000 (01:44 +0200)]
[RELEASE] Released version 1.3.10 with the following main changes :
- several fixes in ev_sepoll
- fixed some expiration dates on some tasks
- fixed a bug in connection establishment detection due to speculative I/O
- fixed rare bug occurring on TCP with early close (reported by Andy Smith)
- implemented URI hashing algorithm (Guillaume Dallaire)
- implemented SMTP health checks (Peter van Dijk)
- replaced the rbtree with ul2tree from old scheduler project
- new framework for generic ACL support
- added the 'acl' and 'block' keywords to the config language
- added several ACL criteria and matches (IP, port, URI, ...)
- cleaned up and better modularization for some time functions
- fixed list macros
- fixed useless memory allocation in str2net()
- store the original destination address in the session
Willy Tarreau [Tue, 8 May 2007 22:54:10 +0000 (00:54 +0200)]
[MAJOR] fixed some expiration dates on tasks
The time subsystem really needs fixing. It was still possible
that some tasks with expiration date below the millisecond in
the future caused a busy loop around poll() while waiting for the
timeout to happen.
Willy Tarreau [Tue, 8 May 2007 21:50:35 +0000 (23:50 +0200)]
[MEDIUM] implement SMTP health checks
Peter van Dijk contributed this patch which implements the "smtpchk"
option, which is to SMTP what "httpchk" is to HTTP. By default, it sends
"HELO localhost" to the servers, and waits for the 250 message, but it
can also send a specific request.
Willy Tarreau [Tue, 8 May 2007 17:56:15 +0000 (19:56 +0200)]
[MINOR] implement the ACL keywords 'dst' and 'dport'
The file client.c now provides acl_fetch_dip and acl_fetch_dport
to be able to check the client's destination address and port. The
corresponding ACL keywords 'dst' and 'dport' have been added.
Willy Tarreau [Sun, 6 May 2007 22:58:25 +0000 (00:58 +0200)]
[MEDIUM] added the 'block' keyword to the config language
The new 'block' keyword makes it possible to block a request based on
ACL test results. Block accepts two optional arguments : 'if' <cond>
and 'unless' <cond>.
The request will be blocked with a 403 response if the condition is validated
(if) or if it is not (unless). Do not rely on this one too much, as it's more
of a proof of concept helping in developing other matches.
Willy Tarreau [Sun, 6 May 2007 22:36:48 +0000 (00:36 +0200)]
[MAJOR] new framework for generic ACL support
This framework offers all other subsystems the ability to register
ACL matching criteria. Some generic matching functions are already
provided. Others will come soon and the framework shall evolve.
Willy Tarreau [Tue, 8 May 2007 17:46:30 +0000 (19:46 +0200)]
[MEDIUM] store the original destination address in the session
There are multiple places where the client's destination address is
required. Let's store it in the session when needed, and add a flag
to inform that it has been retrieved.
Willy Tarreau [Tue, 8 May 2007 21:22:43 +0000 (23:22 +0200)]
[TESTS] added a trivial program to benchmark hash algos
The uri_hash.c program makes it very easy to benchmark the
distribution of hash algos. Pass it one word per line, and
it will show the distribution per server for 1 to 10 servers.
Willy Tarreau [Tue, 8 May 2007 12:46:53 +0000 (14:46 +0200)]
[BUG] fix early server close after client close
Problem reported by Andy Smith. If a client sends TCP data
and quickly closes the connection before the server connection
is established, AND the whole buffer can be sent at once when
the connection establishes, then the server side believes that
it can simply abort the connection because the buffer is empty,
without checking that some work was performed.
Fix: ensure that nothing was written before closing.
Willy Tarreau [Wed, 2 May 2007 18:50:16 +0000 (20:50 +0200)]
[MEDIUM] ensure that we always have a null word in config
It is important when parsing the configuration file to ensure that at
least one word is empty to mark the end of the line. This will be
required with ACLs in order to avoid reading past the end of line.
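The idea of always keeping an empty terminating word can be sketched as below. The names split_words and MAX_ARGS_SKETCH are hypothetical and the real parser is more elaborate (quoting, comments, escapes). The sketch only shows that every unused slot, including one extra slot past the maximum, points to an empty string.

```c
#include <ctype.h>
#include <string.h>

#define MAX_ARGS_SKETCH 8

/* Hypothetical sketch: split 'line' in place into whitespace-separated
 * words. args must have MAX_ARGS_SKETCH + 1 slots. Every slot that does
 * not hold a word points to the empty string at the end of the line, so
 * a caller scanning args[] always meets an empty word before running
 * off the array. Returns the number of words found. */
int split_words(char *line, char *args[MAX_ARGS_SKETCH + 1])
{
    char *end = line + strlen(line);  /* the terminating '\0' */
    char *p = line;
    int n = 0;

    while (n < MAX_ARGS_SKETCH) {
        while (*p && isspace((unsigned char)*p))
            p++;                      /* skip leading blanks */
        if (!*p)
            break;
        args[n++] = p;
        while (*p && !isspace((unsigned char)*p))
            p++;                      /* walk over the word */
        if (*p)
            *p++ = '\0';              /* terminate it in place */
    }
    for (int i = n; i <= MAX_ARGS_SKETCH; i++)
        args[i] = end;                /* empty words mark the end */
    return n;
}
```

A consumer such as an ACL parser can then loop with "while (*args[i])" without tracking the word count separately.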
Since the introduction of speculative I/O, it was not always possible
to correctly detect a connection establishment. Particularly, in TCP
mode, there is no data to send and getsockopt() returns no error. The
solution consists in trying a connect() again to get its diagnostic.
[MEDIUM] implement and use tv_cmp2_le instead of tv_cmp2_ms
tv_cmp2_ms handles multiple combinations of tv1 and tv2, but only
one form is used: (tv1 <= tv2). So it is overkill to use it everywhere.
A new function designed to do exactly this has been written for that
purpose: tv_cmp2_le. Also, removed old unused tv_* functions.
The fact that TV_ETERNITY was 0 was very awkward because it
required that comparison functions handled the special case.
Now it is ~0 and all comparisons are performed on unsigned
values, so that it is naturally greater than any other value.
A performance gain of about 2-5% has been noticed.
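The benefit of an all-ones eternity value can be sketched as follows. The names SKETCH_ETERNITY, sketch_tv_eternity and sketch_tv_le are hypothetical, and the sketch assumes that both fields of the timeval are set to all ones, which may differ from the actual encoding used in the code.

```c
#include <sys/time.h>
#include <sys/types.h>

#define SKETCH_ETERNITY (~0UL)

/* Hypothetical sketch: mark a timer as "never expires" by filling both
 * fields with all ones. */
void sketch_tv_eternity(struct timeval *tv)
{
    tv->tv_sec  = (time_t)SKETCH_ETERNITY;
    tv->tv_usec = (suseconds_t)SKETCH_ETERNITY;
}

/* Return 1 if a <= b, 0 otherwise. Fields are compared as unsigned
 * values, so an all-ones eternity sorts after every real date without
 * any dedicated test for it. */
int sketch_tv_le(const struct timeval *a, const struct timeval *b)
{
    unsigned long as = (unsigned long)a->tv_sec;
    unsigned long bs = (unsigned long)b->tv_sec;

    if (as != bs)
        return as < bs;
    return (unsigned long)a->tv_usec <= (unsigned long)b->tv_usec;
}
```

With eternity encoded as 0, the same comparison would need an extra branch on each operand before looking at the values.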
The rbtree-based wait queue consumes a lot of CPU. Use the ul2tree
instead. Lots of cleanups and code reorganizations made it possible
to reduce the task struct and simplify the code a bit.
[RELEASE] Released version 1.3.9 with the following changes :
- modularized the polling mechanisms and use function pointers instead
of macros at many places
- implemented support for FreeBSD's kqueue() polling mechanism
- fixed a warning on OpenBSD : MIN/MAX redefined
- change socket registration order at startup to accommodate kqueue.
- several makefile cleanups to support old shells
- fix build with limits.h once and for all
- ev_epoll: do not rely on fd_sets anymore, use changes stacks instead.
- fdtab now holds the results of polling
- implemented support for speculative I/O processing with epoll()
- remove useless calls to shutdown(SHUT_RD), resulting in small speed boost
- auto-registering of pollers at load time
The principle behind speculative I/O is to speculatively try to
perform I/O before registering the events in the system. This
considerably reduces the number of calls to epoll_ctl() and
sometimes even epoll_wait(), and manages to increase overall
performance by about 10%.
The new poller has been called "sepoll". It is used by default
on Linux when it works. A corresponding option "nosepoll" and
the command line argument "-ds" allow disabling it.
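The principle can be sketched for the read side as follows: attempt the I/O first, and only ask epoll to watch the FD when the attempt reports EAGAIN. The function name speculative_read and the -2 return code are hypothetical, and the sketch leaves out the state tracking a real poller needs (for example, removing the FD from epoll again).

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical sketch: try to read before registering any event.
 * Returns the number of bytes read (0 on end of stream), -2 if the
 * read would have blocked and the FD was registered for EPOLLIN,
 * or -1 on any other error. */
ssize_t speculative_read(int epfd, int fd, void *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, MSG_DONTWAIT);

    if (n >= 0)
        return n;             /* data was ready: no epoll_ctl() needed */

    if (errno == EAGAIN || errno == EWOULDBLOCK) {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };

        /* nothing to read yet: only now ask the kernel to watch the FD */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0)
            return -2;
    }
    return -1;
}
```

When data is usually already there, as with a fast peer, most calls return on the first branch and epoll_ctl() is never invoked for that FD.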
Gcc provides __attribute__((constructor)) which is very convenient
to execute functions at startup right before main(). All the pollers
have been converted to have their register() function declared like
this, so that it is not necessary anymore to call them from a centralized
file.
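The mechanism can be sketched as below, using GCC's __attribute__((constructor)). The registry, the function names and the preference values are hypothetical and only illustrate how each poller can register itself without a central list of calls.

```c
#include <stddef.h>

#define MAX_POLLERS_SKETCH 8

/* Hypothetical registry of available pollers */
struct poller_sketch {
    const char *name;
    int pref;                 /* higher means preferred (arbitrary values) */
};

static struct poller_sketch registered[MAX_POLLERS_SKETCH];
static int nb_registered;

static void register_poller(const char *name, int pref)
{
    if (nb_registered < MAX_POLLERS_SKETCH) {
        registered[nb_registered].name = name;
        registered[nb_registered].pref = pref;
        nb_registered++;
    }
}

/* Each poller declares its own registration function. The attribute
 * makes it run at load time, before main() is entered. */
__attribute__((constructor))
static void select_register_sketch(void)
{
    register_poller("select", 150);
}

__attribute__((constructor))
static void epoll_register_sketch(void)
{
    register_poller("epoll", 300);
}

/* Return the name of the preferred registered poller, or NULL if none */
const char *best_poller(void)
{
    const char *best = NULL;
    int best_pref = -1;

    for (int i = 0; i < nb_registered; i++) {
        if (registered[i].pref > best_pref) {
            best_pref = registered[i].pref;
            best = registered[i].name;
        }
    }
    return best;
}
```

Because the selection is based on a preference value rather than on registration order, the result does not depend on the order in which the constructors happen to run.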
[MAJOR] implemented support for speculative I/O processing
The pollers will now be able to speculatively call the I/O
processing functions and decide whether or not they want to
poll on those FDs. The changes primarily consist in teaching
those functions how to report that they got an EAGAIN.
[MINOR] add support for the polling results in fdtab
Now fdtab can contain the FD_POLL_* events so that the pollers
which can fill them can give useful information to readers and
writers about the precise condition of wakeup.
It may be dangerous to play with fdtab before doing fd_insert()
because the latter is responsible for growing maxfd as needed.
Call fd_insert() first instead.
Patch #cf83df3d162687d9c74783357421bd89f596eaac was stupid. Including
limits.h is portable and easier. At least it now builds on Solaris,
FreeBSD, Linux and OpenBSD.
[MAJOR] delay registering of listener sockets at startup
Some pollers such as kqueue lose their FD across fork(), meaning that
the registered file descriptors are lost too. Now when the proxies are
started by start_proxies(), the file descriptors are not registered yet,
leaving enough time for the fork() to take place and to get a new pollfd.
It will be the first call to maintain_proxies that will register them.