Willy Tarreau [Sun, 10 May 2009 15:20:05 +0000 (17:20 +0200)]
[MINOR] implement per-logger log level limitation
Some people are using haproxy in a shared environment where the
system logger by default sends alert and emerg messages to all
consoles, which happens when all servers go down on a backend for
instance. These people can not always change the system configuration
and would like to limit the outgoing messages level in order not to
disturb the local users.
The addition of an optional 4th field on the "log" line permits
exactly this. The minimal log level ensures that all outgoing logs
will have at least this level. So the logs are not filtered out,
just set to this level.
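A minimal sketch of the clamping described above, assuming the standard syslog severity numbering (0 = emerg, 7 = debug). The function name clamp_log_level() is hypothetical and not taken from the patch:

```c
#include <assert.h>
#include <syslog.h>

/* Syslog severities run from 0 (LOG_EMERG, most severe) to 7 (LOG_DEBUG),
 * so "at least this level" means "numerically no lower than <minlvl>".
 * Messages are never dropped, only re-labelled. */
static int clamp_log_level(int level, int minlvl)
{
    assert(level >= LOG_EMERG && level <= LOG_DEBUG);
    return level < minlvl ? minlvl : level;
}
```

With a minimum level of "err", an "emerg" or "alert" message would be sent as "err", while "info" messages would go out unchanged.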
Benoit [Wed, 25 Mar 2009 12:02:10 +0000 (13:02 +0100)]
[MEDIUM] add support for "balance hdr(name)"
This patch, made by me, allows balancing on any HTTP header
field.
[WT:
made minor changes:
- turned 'balance header name' into 'balance hdr(name)' to match more
closely the ACL syntax for easier future convergence
- renamed the proxy structure fields header_* => hh_*
- made it possible to use the domain name reduction to any header, not
only "host" since it makes sense to do it with other ones.
Otherwise patch looks good.
/WT]
Willy Tarreau [Sun, 10 May 2009 11:12:33 +0000 (13:12 +0200)]
[DOC] rearrange the configuration manual and add a summary
Several people have asked for a summary in order to ease finding
of sections in the configuration manual. It was the opportunity to
tidy it up a bit and rearrange some sections.
Willy Tarreau [Sun, 10 May 2009 09:57:02 +0000 (11:57 +0200)]
[MINOR] add options dontlog-normal and log-separate-errors
Some big traffic sites have trouble dealing with logs and tend to
disable them. Here are two new options to help cope with massive
logs.
- dontlog-normal only disables logging for 100% successful
connections, other ones will still be logged
- log-separate-errors will cause non-100% successful connections
to be logged at level "err" instead of level "info" so that a
properly configured syslog daemon can send them to a different
file for longer retention.
Willy Tarreau [Sun, 10 May 2009 08:18:54 +0000 (10:18 +0200)]
[BUG] O(1) pollers should check their FD before closing it
epoll, sepoll and kqueue pollers should check that their fd is not
closed before attempting to close it, otherwise we can end up with
multiple closes of fd #0 upon exit, which is harmless but dirty.
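The check can be sketched as follows. This is only an illustration: the helper name close_fd_once() and the use of -1 as the "closed" marker are assumptions, not the code from the pollers:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Close *fd only if it is still open, then mark it as closed.
 * Returns 1 if close() was called, 0 if there was nothing to close. */
static int close_fd_once(int *fd)
{
    assert(fd != NULL);
    if (*fd < 0)
        return 0;
    close(*fd);
    *fd = -1;
    return 1;
}
```

Calling it a second time on the same variable is then a no-op instead of a second close().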
Willy Tarreau [Sun, 10 May 2009 07:59:50 +0000 (09:59 +0200)]
[MEDIUM] convert all signals to asynchronous signals
The small list of signals currently handled by haproxy was processed
as soon as they were received. This has caused trouble with calls to
pool_gc2() occurring in the middle of libc's memory management functions,
occasionally causing deadlocks that prevented the old process from leaving.
Now these signals use the new async signal framework and are called
asynchronously, when there is no risk of recursion. This ensures more
reliable operation, especially for sensitive processing such as memory
management.
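The general technique can be sketched as below: the real signal handler only records that a signal arrived, and the registered callback runs later from the main loop. All names here (sig_register(), sig_process_pending(), MAX_SIG) are hypothetical and do not claim to match the framework's actual interface:

```c
#include <assert.h>
#include <signal.h>

#define MAX_SIG 64  /* illustrative bound, not taken from the source */

typedef void (*sig_cb)(int);

static volatile sig_atomic_t sig_pending[MAX_SIG];
static volatile sig_atomic_t sig_any;
static sig_cb sig_handlers[MAX_SIG];

/* Runs in signal context: only records that the signal arrived. */
static void sig_catch(int sig)
{
    if (sig > 0 && sig < MAX_SIG) {
        sig_pending[sig] = 1;
        sig_any = 1;
    }
}

/* Register <cb> to be called later, outside of signal context. */
static void sig_register(int sig, sig_cb cb)
{
    assert(sig > 0 && sig < MAX_SIG);
    sig_handlers[sig] = cb;
    signal(sig, sig_catch);
}

/* Called from the main loop; returns the number of callbacks run. */
static int sig_process_pending(void)
{
    int sig, run = 0;

    if (!sig_any)
        return 0;
    sig_any = 0;
    for (sig = 1; sig < MAX_SIG; sig++) {
        if (sig_pending[sig] && sig_handlers[sig]) {
            sig_pending[sig] = 0;
            sig_handlers[sig](sig);
            run++;
        }
    }
    return run;
}

/* Demo callback: counts how many times it was called. */
static int demo_calls;
static void demo_handler(int sig) { (void)sig; demo_calls++; }
```

Because the callback runs from the main loop, it can safely call functions such as free() which must not be re-entered from a signal handler.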
Willy Tarreau [Sun, 10 May 2009 07:57:21 +0000 (09:57 +0200)]
[MEDIUM] pollers: don't wait if a signal is pending
If an asynchronous signal is received outside of the poller, we don't
want the poller to wait for a timeout to occur before processing it,
so we set its timeout to zero, just like we do with pending tasks in
the run queue.
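A sketch of that timeout decision, with a hypothetical helper name and parameters chosen for illustration only:

```c
#include <assert.h>

/* Return the timeout (in ms) to pass to the poller. <next_expiry_ms> is
 * the delay until the next timer fires. If work is already waiting, either
 * a pending signal or a task in the run queue, the poller must not sleep. */
static int poll_wait_ms(int next_expiry_ms, int run_queue_len, int signals_pending)
{
    assert(next_expiry_ms >= 0);
    if (signals_pending || run_queue_len > 0)
        return 0;
    return next_expiry_ms;
}
```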
Willy Tarreau [Sun, 10 May 2009 06:53:33 +0000 (08:53 +0200)]
[MINOR] add basic signal handling functions
These functions will be used to deliver asynchronous signals in order
to make the signal handling functions more robust. The goal is to keep
the same interface to signal handlers.
I have attached a patch which will add on every http request a new
header 'X-Original-To'. If you have HAProxy running in transparent mode
with a big number of SQUID servers behind it, it is very nice to have
the original destination ip as a common header to make decisions based
on it.
The whole thing is configurable with a new option 'originalto'. I have
updated the source code as well as the documentation. The 'haproxy-en.txt'
and 'haproxy-fr.txt' files are untouched, due to my lack of French
language knowledge. ;)
Also the patch adds this header for IPv4 only. I haven't any IPv6 test
environment running here and don't know if getsockopt() with SO_ORIGINAL_DST
will work on IPv6. If someone knows it and wants to test it I can modify
the diff. Feel free to ask me questions or things which should be changed. :)
Willy Tarreau [Fri, 1 May 2009 09:33:17 +0000 (11:33 +0200)]
[BUG] fix wrong pointer arithmetics in HTTP message captures
The pointer arithmetics was wrong in http_capture_bad_message().
This has no impact right now because only msg->som was affected
by the error and it is currently always 0. But this was a bug waiting
for keepalive support to strike.
[CRITICAL] uninitialized response field can sometimes cause crashes
The response message in the transaction structure was not properly
initialised at session initialisation. In theory it cannot cause any
trouble since the affected field is expected to always remain NULL.
However, in some circumstances, such as building on 64-bit platforms
with certain options, the struct session can be exactly 1024 bytes,
the same size as the requri field, so the pools are merged and the
uninitialised field may contain non-null data, causing crashes if
an invalid response is encountered and archived.
The fix simply consists in correctly initialising the missing fields.
This bug cannot affect architectures where the session pool is not
shared (32-bit architectures), but this is only by pure luck.
[MEDIUM] ensure we don't recursively call pool_gc2()
A race condition exists in the hot reconfiguration code. It is
theoretically possible that the second signal is sent during a free()
in the first list, which can cause crashes or freezes (the latter
has been observed). Just set up a counter to ensure we do not
recurse.
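The counter-based guard can be sketched like this. pool_gc_sketch() and gc_hook are hypothetical stand-ins: the hook simulates a second signal arriving while a collection is in progress:

```c
#include <assert.h>

static int gc_recurse;         /* non-zero while a collection is running */
static int gc_runs;            /* completed collections, for the demo */
static void (*gc_hook)(void);  /* simulates a signal arriving mid-collection */

static void pool_gc_sketch(void)
{
    assert(gc_recurse >= 0);
    if (gc_recurse++)
        goto out;              /* already running: do nothing */

    if (gc_hook)
        gc_hook();             /* simulated re-entry happens here */
    gc_runs++;                 /* the actual freeing would take place here */
out:
    gc_recurse--;
}

/* Demo hook: tries to start a second collection from inside the first. */
static void reenter(void) { pool_gc_sketch(); }
```

The nested call returns immediately, so only one collection actually runs.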
The byte counters have long been 64-bit to avoid overflows. But with
several sites nowadays, we see session counters wrap around every 10 days
or so. So it was the moment to switch counters to 64-bit, including
error and warning counters which can theoretically rise as fast as session
counters even if in practice there is very low risk.
The performance impact should not be noticeable since those counters are
only updated once per session. The stats output has been carefully checked
for proper types on both 32- and 64-bit platforms.
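As a rough check of the figures above, the time before a counter wraps is 2^bits divided by the event rate. The helper below is a hypothetical illustration, and the rate of 5000 sessions per second is an assumed figure, not one from the message:

```c
#include <assert.h>

/* Seconds until an unsigned counter of <bits> bits wraps around when it is
 * incremented <rate> times per second. */
static double wrap_seconds(unsigned bits, double rate)
{
    assert(bits >= 1 && bits <= 64 && rate > 0.0);
    /* 2^bits computed in floating point so that bits == 64 cannot overflow */
    double range = 1.0;
    for (unsigned i = 0; i < bits; i++)
        range *= 2.0;
    return range / rate;
}
```

At the assumed 5000 sessions per second, a 32-bit counter wraps in a little under 10 days, whereas a 64-bit counter would need more than a hundred million years.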
[BUILD] make it possible to pass alternative arch at build time
When trying to build a 32-bit binary on a 64-bit platform, we generally
need to pass "-m32" to gcc, which is not convenient with the current makefile.
Note that this option requires gcc >= 3.
In order to ease parameter passing, a new ARCH= makefile option has been
added. If it receives a target architecture, the corresponding "-m32"/"-m64" and
"-march=xxxx" will be passed to gcc. Only the generic makefile has been
changed to support this option right now as the need only appeared on Linux.
The spec file now makes use of this option so that rpmbuild can automatically
build with the proper architecture.
[MEDIUM] http: capture invalid requests/responses even if accepted
It's useful to be able to accept an invalid header name in a request
or response but still be able to monitor further such errors. Now,
when an invalid request/response is received and accepted due to
an "accept-invalid-http-{request|response}" option, the invalid
request will be captured for later analysis with "show errors" on
the stats socket.
[MEDIUM] http: add options to ignore invalid header names
Sometimes it is required to let invalid requests pass because
applications sometimes take time to be fixed and other servers
do not care. Thus we provide two new options :
option accept-invalid-http-request (for the frontend)
option accept-invalid-http-response (for the backend)
When those options are set, invalid requests or responses do
not cause a 403/502 error to be generated.
Willy Tarreau [Sun, 29 Mar 2009 13:26:57 +0000 (15:26 +0200)]
[RELEASE] Released version 1.3.17
Released version 1.3.17 with the following main changes :
- Update specfile to build for v2.6 kernel.
- [BUG] reset the stream_interface connect timeout upon connect or error
- [BUG] reject unix accepts when connection limit is reached
- [MINOR] show sess: report number of calls to each task
- [BUG] don't call epoll_ctl() on closed sockets
- [BUG] stream_sock: disable I/O on fds reporting an error
- [MINOR] sepoll: don't count two events on the same FD.
- [MINOR] show sess: report a lot more information about sessions
- [BUG] stream_sock: check for shut{r,w} before refreshing some timeouts
- [BUG] don't set an expiration date directly from now_ms
- [MINOR] implement ulltoh() to write HTML-formatted numbers
- [MINOR] stats/html: group digits by 3 to clarify numbers
- [BUILD] remove haproxy-small.spec
- [BUILD] makefile: remove unused references to linux24eold and EPOLL_CTL_WORKAROUND
[PATCH] Update specfile to build for v2.6 kernel.
- Fix date in changelog.
- Stop using deprecated "REGEX=pcre", and start using "USE_PCRE=1" instead.
- Disable RPM-processing of perl dependencies, since haproxy
shouldn't depend on perl, and it's only the examples/check script
that's using perl.
Willy Tarreau [Sun, 29 Mar 2009 11:41:58 +0000 (13:41 +0200)]
[MINOR] implement ulltoh() to write HTML-formatted numbers
This function sets CSS letter spacing after each 3rd digit. The page must
create a class "rls" (right letter spacing) with style "letter-spacing: 0.3em"
in order to use it.
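One possible way to produce such output is sketched below. The exact markup emitted by ulltoh() is not given here, so wrapping every 3-digit group except the last in a span of class "rls" is an assumption, as is the helper name html_group_digits():

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write <n> into <dst>, wrapping every digit group except the last one in
 * <span class="rls">. Returns <dst>, or NULL if <size> is too small. */
static char *html_group_digits(unsigned long long n, char *dst, size_t size)
{
    char digits[32];
    int len = snprintf(digits, sizeof(digits), "%llu", n);
    int group = len % 3 ? len % 3 : 3;   /* length of the leading group */
    size_t used = 0;
    int pos = 0;

    assert(dst != NULL);
    while (pos < len) {
        int last = (pos + group == len);
        int w = snprintf(dst + used, size - used,
                         last ? "%.*s" : "<span class=\"rls\">%.*s</span>",
                         group, digits + pos);
        if (w < 0 || (size_t)w >= size - used)
            return NULL;                 /* output would be truncated */
        used += (size_t)w;
        pos += group;
        group = 3;                       /* all following groups are full */
    }
    return dst;
}
```

The page's style sheet would then define something like ".rls { letter-spacing: 0.3em; }" so that the last digit of each wrapped group is followed by a small gap.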
Willy Tarreau [Sun, 29 Mar 2009 08:18:41 +0000 (10:18 +0200)]
[BUG] stream_sock: check for shut{r,w} before refreshing some timeouts
Under some circumstances, it appears possible to refresh a timeout
just after a side has been shut. For instance, if poll() plans to
call both read and write, and the read side calls chk_snd() which
in turn causes a shutw to occur, then stream_sock_write could update
its write timeout. The same problem happens the other way.
The timeout checks will then not catch these cases because they
ignore timeouts in case of shut{r,w}.
This is very likely to be the major cause of the 100% CPU usage
reported by Bart Bobrowski.
The fix consists in always ensuring that a side is not shut before
updating its timeout.
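The shape of the fix can be sketched as follows. The structure, flag values and function name are hypothetical and only illustrate the "check for shut before refreshing" rule:

```c
#include <assert.h>
#include <stddef.h>

#define SK_SHUTR  0x1u  /* hypothetical flag: read side is shut */
#define SK_SHUTW  0x2u  /* hypothetical flag: write side is shut */
#define TICK_NONE 0u    /* hypothetical "no timeout armed" value */

struct side {
    unsigned flags;
    unsigned wex;   /* write expiration date, in ms */
    unsigned wto;   /* write timeout, in ms */
};

/* Refresh the write expiration date only while the write side is open. */
static void refresh_write_timeout(struct side *s, unsigned now_ms)
{
    assert(s != NULL);
    if (s->flags & SK_SHUTW)
        return;     /* shut: the timeout checks would ignore it anyway */
    s->wex = now_ms + s->wto;
}
```

The read side would be handled symmetrically with its own flag and expiration date.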
Willy Tarreau [Sat, 28 Mar 2009 23:18:14 +0000 (00:18 +0100)]
[MINOR] show sess: report a lot more information about sessions
For complex troubleshooting, it's sometimes useful to be able to
completely dump all the states and flags related to a session.
Now "show sess" will report the stream interfaces and buffers
status for each session.
Willy Tarreau [Sat, 28 Mar 2009 20:10:48 +0000 (21:10 +0100)]
[MINOR] sepoll: don't count two events on the same FD.
sepoll counts the number of speculative events it has processed in
order to remain fair with epoll_wait(). If the same FD is processed
both for read and for write, it is counted twice. Fix this.
Willy Tarreau [Sat, 28 Mar 2009 19:54:53 +0000 (20:54 +0100)]
[BUG] stream_sock: disable I/O on fds reporting an error
Upon read or write error, we cannot immediately close the FD because
we want to first report the error to the upper layer which will do it
itself. However, we want to prevent any further I/O from being performed
on the FD. This is especially important in case of speculative I/O where
nothing else could stop the FD from still being polled until the upper
layer takes care of the condition.
Willy Tarreau [Sat, 28 Mar 2009 18:43:06 +0000 (19:43 +0100)]
[BUG] don't call epoll_ctl() on closed sockets
Some I/O callbacks are able to close their socket themselves. We
want to check this before calling epoll_ctl(EPOLL_CTL_DEL), otherwise
we get a -1 EBADF. Right now it looks like this could not cause any
trouble but the case is racy enough to fix it.
Willy Tarreau [Sat, 28 Mar 2009 10:02:18 +0000 (11:02 +0100)]
[BUG] reject unix accepts when connection limit is reached
unix sockets are not attached to a real frontend, so there is
no way to disable/enable the listener depending on the global
session count. For this reason, if the global maxconn is reached
and a unix socket comes in, it will just be ignored and remain
in the poll list, which will call again indefinitely.
So we need to accept then drop incoming unix connections when
the table is full.
This should not happen with clean configurations since the global
maxconn should provide enough room for unix sockets.
Willy Tarreau [Sun, 22 Mar 2009 22:46:12 +0000 (23:46 +0100)]
[RELEASE] Released version 1.3.16
Released version 1.3.16 with the following main changes :
- [BUILD] Fixed Makefile for linking pcre
- [CONTRIB] selinux policy for haproxy
- [MINOR] show errors: encode backslash as well as non-ascii characters
- [MINOR] cfgparse: some cleanups in the consistency checks
- [MINOR] cfgparse: set backends to "balance roundrobin" by default
- [MINOR] tcp-inspect: permit the use of no-delay inspection
- [MEDIUM] reverse internal proxy declaration order to match configuration
- [CLEANUP] config: catch and report some possibly wrong rule ordering
- [BUG] connect timeout is in the stream interface, not the buffer
- [BUG] session: errors were not reported in termination flags in TCP mode
- [MINOR] tcp_request: let the caller take care of errors and timeouts
- [CLEANUP] http: remove some commented out obsolete code in process_response
- [MINOR] update ebtree to version 4.1
- [MEDIUM] scheduler: get rid of the 4 trees thanks and use ebtree v4.1
- [BUG] sched: don't leave 3 lasts tasks unprocessed when niced tasks are present
- [BUG] scheduler: fix improper handling of duplicates __task_queue()
- [MINOR] sched: permit a task to stay up between calls
- [MINOR] task: keep a task count and clean up task creators
- [MINOR] stats: report number of tasks (active and running)
- [BUG] server check intervals must not be null
- [OPTIM] stream_sock: don't retry to read after a large read
- [OPTIM] buffer: new BF_READ_DONTWAIT flag reduces EAGAIN rates
- [MEDIUM] session: don't resync FSMs on non-interesting changes
- [BUG] check for global.maxconn before doing accept()
- [OPTIM] sepoll: do not re-check whole list upon accepts
Willy Tarreau [Sun, 22 Mar 2009 18:25:46 +0000 (19:25 +0100)]
[OPTIM] sepoll: do not re-check whole list upon accepts
There is already an optimisation in the speculative poller which
causes newly created FDs to be checked immediately after being
created. Unfortunately, this optimisation causes the whole spec
list to be re-checked while we're only interested in the new FDs.
Doing this minor change causes performance gains of up to 6% on
medium-sized objects with a few hundred concurrent connections.
Willy Tarreau [Sat, 21 Mar 2009 21:43:12 +0000 (22:43 +0100)]
[BUG] check for global.maxconn before doing accept()
If the accept() is done before checking for global.maxconn, we can
accept too many connections and encounter a lack of file descriptors
when trying to connect to the server. This is the cause of the
"cannot get a server socket" message encountered in debug mode
during injections with low timeouts.
Willy Tarreau [Sat, 21 Mar 2009 21:09:29 +0000 (22:09 +0100)]
[MEDIUM] session: don't resync FSMs on non-interesting changes
While processing the session, we used to resync the FSMs when buffer
flags changed. But since BF_KERN_SPLICING and BF_READ_DONTWAIT were
introduced, sometimes we could resync after they were set, which is
not what we want. This was because there were some old checks left
which did not mask changes with BF_MASK_STATIC before checking.
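The masking can be sketched as below. The flag values and the contents of the mask are made up for the example; only the "XOR the old and new flags, then mask" technique is the point:

```c
#include <assert.h>

/* Hypothetical flag values; only the masking technique matters here. */
#define F_SHUTR         0x01u
#define F_SHUTW         0x02u
#define F_KERN_SPLICING 0x10u
#define F_READ_DONTWAIT 0x20u
#define F_MASK_STATIC   (F_SHUTR | F_SHUTW)  /* assumed: flags worth a resync */

/* Returns non-zero only if a flag covered by the mask has changed. */
static int needs_resync(unsigned old_flags, unsigned new_flags)
{
    return ((old_flags ^ new_flags) & F_MASK_STATIC) != 0;
}
```

A change limited to the splicing or dontwait flags is thus filtered out and no longer triggers a resync.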
Willy Tarreau [Sat, 21 Mar 2009 20:10:04 +0000 (21:10 +0100)]
[OPTIM] buffer: new BF_READ_DONTWAIT flag reduces EAGAIN rates
When the reader does not expect to read lots of data, it can
set BF_READ_DONTWAIT on the request buffer. When it is set,
the stream_sock_read callback will not try to perform multiple
reads, it will return after only one, and clear the flag.
That way, we can immediately return when waiting for an HTTP
request without trying to read again.
On pure request/responses schemes such as monitor-uri or
redirects, this has completely eliminated the EAGAIN occurrences
and the epoll_ctl() calls, resulting in a performance increase of
about 10%. Similar effects should be observed once we support
HTTP keep-alive since we'll immediately disable reads once we
get a full request.
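The effect on the read loop can be sketched with a simulated data source. Everything below (struct rbuf, read_loop(), the flag value and the cap on reads) is hypothetical and stands in for the real stream_sock_read callback:

```c
#include <assert.h>
#include <stddef.h>

#define BUF_READ_DONTWAIT 0x1u  /* hypothetical stand-in for BF_READ_DONTWAIT */
#define MAX_READ_POLL     8     /* illustrative cap on reads per wakeup */

/* Reader callback: returns bytes read, or -1 when it would block (EAGAIN). */
typedef int (*read_fn)(void *ctx);

struct rbuf {
    unsigned flags;
    int total;    /* bytes accumulated so far */
    int eagain;   /* number of would-block results seen */
};

/* Read until the source would block, unless the one-shot flag is set.
 * Returns the number of read attempts performed. */
static int read_loop(struct rbuf *b, read_fn rd, void *ctx)
{
    int reads = 0;

    assert(b != NULL && rd != NULL);
    while (reads < MAX_READ_POLL) {
        int ret = rd(ctx);
        reads++;
        if (ret < 0) {
            b->eagain++;
            break;
        }
        b->total += ret;
        if (b->flags & BUF_READ_DONTWAIT) {
            b->flags &= ~BUF_READ_DONTWAIT;  /* one-shot: clear and return */
            break;
        }
    }
    return reads;
}

/* Demo source: delivers one 100-byte chunk, then would block. */
static int one_chunk(void *ctx)
{
    int *left = ctx;
    if (*left == 0)
        return -1;
    *left = 0;
    return 100;
}
```

Without the flag, the loop makes a second attempt which only yields EAGAIN; with the flag set, it stops after the first successful read.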
Willy Tarreau [Sat, 21 Mar 2009 19:43:57 +0000 (20:43 +0100)]
[OPTIM] stream_sock: don't retry to read after a large read
If we get very large data at once, it's almost certain that it's
pointless to try to read again, because we got everything we could
get.
Doing this has made all -EAGAIN disappear from splice reads. The
threshold has been put in the global tunable structures so that if
we one day want to make it accessible from user config, it will be
easy to do so.
Willy Tarreau [Sat, 21 Mar 2009 17:13:21 +0000 (18:13 +0100)]
[MINOR] task: keep a task count and clean up task creators
It's sometimes useful at least for statistics to keep a task count.
It's easy to do by forcing the rare task creators to always use the
same functions to create/destroy a task.
Willy Tarreau [Sat, 21 Mar 2009 12:26:05 +0000 (13:26 +0100)]
[MINOR] sched: permit a task to stay up between calls
If a task wants to stay in the run queue, it is possible. It just
needs to wake itself up. We just want to ensure that a reniced
task will be processed at the right instant.
Willy Tarreau [Sat, 21 Mar 2009 11:51:40 +0000 (12:51 +0100)]
[BUG] scheduler: fix improper handling of duplicates __task_queue()
The top of a duplicate tree is not where bit == -1 but at the most
negative bit. This was causing tasks to be queued in reverse order
within duplicates. While this is not dramatic, it's incorrect and
might lead to longer than expected duplicate depths under some
circumstances.
Willy Tarreau [Sat, 21 Mar 2009 10:53:09 +0000 (11:53 +0100)]
[BUG] sched: don't leave 3 lasts tasks unprocessed when niced tasks are present
When there are niced tasks, we would only process #tasks/4 per
turn, without taking care of running #tasks when #tasks was below
4, leaving those tasks waiting for a few other tasks to push them.
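The arithmetic behind the problem: with integer division, 3/4 is 0, so a run queue of 1 to 3 tasks got no budget at all. One way to avoid this, shown here as an assumption rather than as the committed fix, is to round the quarter up:

```c
#include <assert.h>

/* Number of tasks to run in this turn when niced tasks are present:
 * a quarter of the run queue, rounded up so that 1 to 3 tasks still
 * get processed. */
static unsigned tasks_per_turn(unsigned run_queue_len)
{
    return (run_queue_len + 3) / 4;
}
```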
Willy Tarreau [Sat, 21 Mar 2009 09:01:42 +0000 (10:01 +0100)]
[MEDIUM] scheduler: get rid of the 4 trees thanks and use ebtree v4.1
Since we're now able to search from a precise expiration date in
the timer tree using ebtree 4.1, we don't need to maintain 4 trees
anymore. Not only does this simplify the code a lot, but it also
ensures that we can always look 24 days back and ahead, which
doubles the ability of the previous scheduler. Indeed, while based
on absolute values, the timer tree is now relative to <now> as we
can always search from <now>-31 bits.
The run queue uses the exact same principle now, and is now simpler
and a bit faster to process. With these changes alone, an overall
0.5% performance gain was observed.
Tests were performed on the few wrapping cases and everything works
as expected.
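The wrapping comparison this relies on can be sketched as follows, assuming 32-bit millisecond ticks and the usual two's complement conversion. The function name tick_is_before() is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Returns non-zero if tick <a> is strictly before tick <b>, provided both
 * are within 2^31 ms (about 24.8 days) of each other. The unsigned
 * subtraction wraps, and the sign of the result gives the ordering. */
static int tick_is_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}
```

A date just before the 32-bit wrap therefore still compares as earlier than one just after it, which is what allows searching relative to <now>.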
Willy Tarreau [Sun, 15 Mar 2009 21:55:47 +0000 (22:55 +0100)]
[MINOR] tcp_request: let the caller take care of errors and timeouts
tcp_request is not meant to decide how an error or a timeout has to
be handled. It must just apply its rules. Now that the error checks
have been added to the session, we don't need to check them anymore
in tcp_request_inspect(), which will only consider the shutdown which
may be the result of such an error.
That makes a lot more sense since tcp_request is not really waiting
for a request.
Willy Tarreau [Sun, 15 Mar 2009 21:34:05 +0000 (22:34 +0100)]
[BUG] session: errors were not reported in termination flags in TCP mode
In order to get termination flags properly updated, the session was
relying a bit too much on http_return_srv_error() which is http-centric.
A generic srv_error function was implemented in the session in order to
catch all connection abort situations. It was then noticed that a request
abort during a connection attempt was not reported, which is now fixed.
Read and write errors/timeouts were not logged either. It was necessary
to add those tests at 4 new locations.
Now it looks like everything is correctly logged. Most likely some error
checking code could now be removed from some analysers.
Willy Tarreau [Sun, 15 Mar 2009 20:49:00 +0000 (21:49 +0100)]
[BUG] connect timeout is in the stream interface, not the buffer
The connect timeout was not properly detected due to the fact that
it was not correctly initialized. It must be set as the stream interface
timeout, not the buffer's write timeout.
Willy Tarreau [Sun, 15 Mar 2009 14:23:16 +0000 (15:23 +0100)]
[CLEANUP] config: catch and report some possibly wrong rule ordering
There are some configurations in which redirect rules are declared
after use_backend rules. We can also find "block" rules after any
of these ones. The processing sequence is :
- block
- redirect
- use_backend
So as of now we try to detect wrong ordering to warn the user about
a possibly undesired behaviour.
Willy Tarreau [Sun, 15 Mar 2009 13:51:53 +0000 (14:51 +0100)]
[MEDIUM] reverse internal proxy declaration order to match configuration
People are regularly complaining that proxies are linked in reverse
order when reading the stats. This is now definitely fixed because
the proxy order is now fixed to match configuration order.
Willy Tarreau [Sun, 15 Mar 2009 13:43:58 +0000 (14:43 +0100)]
[MINOR] tcp-inspect: permit the use of no-delay inspection
Sometimes it may make sense to be able to immediately apply a verdict
without waiting at all. It was not possible because no inspect-delay
meant no inspection at all. This is now fixed.
Willy Tarreau [Sun, 15 Mar 2009 13:06:41 +0000 (14:06 +0100)]
[MINOR] cfgparse: set backends to "balance roundrobin" by default
When a backend has no LB algo specified and is not in dispatch, proxy
nor transparent mode, use "balance roundrobin" by default instead of
complaining. This will be particularly useful with stats and redirects.
Willy Tarreau [Mon, 9 Mar 2009 21:40:57 +0000 (22:40 +0100)]
[BUG] stream_sock: write timeout must be updated when forwarding !
When data are forwarded between sockets, we must update the output
socket's write timeout. This was forgotten, causing sessions to
unexpectedly expire during long posts.
Willy Tarreau [Mon, 9 Mar 2009 00:03:42 +0000 (01:03 +0100)]
[RELEASE] Released version 1.3.16-rc1
Released version 1.3.16-rc1 with the following main changes :
- appsessions: cleanup DEBUG_HASH and initialize request_counter
- [MINOR] acl: add new keyword "connslots"
- [MINOR] cfgparse: fix off-by 2 in error message size
- [BUILD] fix build with gcc 4.3
- [BUILD] fix MANDIR default location to match documentation
- [TESTS] add a debug patch to help trigger the stats bug
- [BUG] Flush buffers also where there are exactly 0 bytes left
- [MINOR] Allow to specify a domain for a cookie
- [BUG/CLEANUP] cookiedomain -> cookie_domain rename + free(p->cookie_domain)
- [MEDIUM] Fix memory freeing at exit
- [MEDIUM] Fix memory freeing at exit, part 2
- [BUG] Fix listen & more of 2 couples <ip>:<port>
- [DOC] remove buggy comment for use_backend
- [CRITICAL] fix server state tracking: it was O(n!) instead of O(n)
- [MEDIUM] add support for URI hash depth and length limits
- [MINOR] permit renaming of x-forwarded-for header
- [BUILD] fix Makefile.bsd and Makefile.osx for stream_interface
- [BUILD] Haproxy won't compile if DEBUG_FULL is defined
- [MEDIUM] upgrade to ebtree v4.0
- [DOC] update the README file with new build options
- [MEDIUM] reduce risk of event starvation in ev_sepoll
- [MEDIUM] detect streaming buffers and tag them as such
- [MEDIUM] add support for conditional HTTP redirection
- [BUILD] make install should depend on haproxy not "all"
- [DEBUG] add a TRACE macro to facilitate runtime data extraction
- [BUG] event pollers must not wait if a task exists in the run queue
- [BUG] queue management: wake oldest request in queues
- [BUG] log: reported queue position was offed-by-one
- [BUG] fix the dequeuing logic to ensure that all requests get served
- [DOC] documentation for the "retries" parameter was missing.
- [MEDIUM] implement a monotonic internal clock
- [MEDIUM] further improve monotonic clock by check forward jumps
- [OPTIM] add branch prediction hints in list manipulations
- [MAJOR] replace ultree with ebtree in wait-queues
- [BUG] we could segfault during exit while freeing uri_auths
- [BUG] wqueue: perform proper timeout comparisons with wrapping values
- [MINOR] introduce now_ms, the current date in milliseconds
- [BUG] disable buffer read timeout when reading stats
- [MEDIUM] rework the wait queue mechanism
- [BUILD] change declaration of base64tab to fix build with Intel C++
- [OPTIM] shrink wake_expired_tasks() by using task_wakeup()
- [MAJOR] use an ebtree instead of a list for the run queue
- [MEDIUM] introduce task->nice and boot access to statistics
- [OPTIM] task_queue: assume most consecutive timers are equal
- [BUILD] silent a warning in unlikely() with gcc 4.x
- [MAJOR] convert all expiration timers from timeval to ticks
- [BUG] use_backend would not correctly consider "unless"
- [TESTS] added test-acl.cfg to test some ACL combinations
- [MEDIUM] add support for configuration keyword registration
- [MEDIUM] modularize the global "stats" keyword configuration parser
- [MINOR] cfgparse: add support for warnings in external functions
- [MEDIUM] modularize the "timeout" keyword configuration parser
- [MAJOR] implement tcp request content inspection
- [MINOR] acl: add a new parsing function: parse_dotted_ver
- [MINOR] acl: add req_ssl_ver in TCP, to match an SSL version
- [CLEANUP] remove unused include/types/client.h
- [CLEANUP] remove many #include <types/xxx> from C files
- [CLEANUP] remove dependency on obsolete INTBITS macro
- [DOC] document the new "tcp-request" keyword and associated ACLs
- [MINOR] acl: add REQ_CONTENT to the list of default acls
- [MEDIUM] acl: permit fetch() functions to set the result themselves
- [MEDIUM] acl: get rid of dummy values in always_true/always_false
- [MINOR] acl: add the "wait_end" acl verb
- [MEDIUM] acl: enforce ACL type checking
- [MEDIUM] acl: set types on all currently known ACL verbs
- [MEDIUM] acl: when possible, report the name and requirements of ACLs in warnings
- [CLEANUP] remove 65 useless NULL checks before free
- [MEDIUM] memory: update pool_free2() to support NULL pointers
- [MEDIUM] buffers: ensure buffer_shut* are properly called upon shutdowns
- [MEDIUM] process_srv: rely on buffer flags for client shutdown
- [MEDIUM] process_srv: don't rely at all on client state
- [MEDIUM] process_cli: don't rely at all on server state
- [BUG] fix segfault with url_param + check_post
- [BUG] server timeout was not considered in some circumstances
- [BUG] client timeout incorrectly rearmed while waiting for server
- [MAJOR] kill CL_STINSPECT and CL_STHEADERS (step 1)
- [MAJOR] get rid of SV_STANALYZE (step 2)
- [MEDIUM] simplify and centralize request timeout cancellation and request forwarding
- [MAJOR] completely separate HTTP and TCP states on the request path
- [BUG] fix recently introduced loop when client closes early
- [MAJOR] get rid of the SV_STHEADERS state
- [MAJOR] better separation of response processing and server state
- [MAJOR] clearly separate HTTP response processing from TCP server state
- [MEDIUM] remove unused references to {CL|SV}_STSHUT*
- [MINOR] term_trace: add better instrumentations to trace the code
- [BUG] ev_sepoll: closed file descriptors could persist in the spec list
- [BUG] process_response must not enable the read FD
- [BUG] buffers: remove BF_MAY_CONNECT and fix forwarding issue
- [BUG] process_response: do not touch srv_state
- [BUG] maintain_proxies must not disable backends
- [CLEANUP] get rid of BF_SHUT*_PENDING
- [MEDIUM] buffers: add BF_EMPTY and BF_FULL to remove dependency on req/rep->l
- [MAJOR] process_session: rely only on buffer flags
- [MEDIUM] use buffer->wex instead of buffer->cex for connect timeout
- [MEDIUM] centralize buffer timeout checks at the top of process_session
- [MINOR] ensure the termination flags are set by process_xxx
- [MEDIUM] session: move the analysis bit field to the buffer
- [OPTIM] process_cli/process_srv: reduce the number of tests
- [BUG] regparm is broken on gcc < 3
- [BUILD] fix warning in proto_tcp.c with gcc >= 4
- [MEDIUM] merge inspect_exp and txn->exp into request buffer
- [BUG] process_cli/process_srv: don't call shutdown when already done
- [BUG] process_request: HTTP body analysis must return zero if missing data
- [TESTS] test-fsm: 22 regression tests for state machines
- [BUG] Fix empty X-Forwarded-For header name when set in defaults section
- [BUG] fix harmless but wrong fd insertion sequence
- [MEDIUM] make it possible for analysers to follow the whole session
- [MAJOR] rework of the server FSM
- [OPTIM] remove useless fd_set(read) upon shutdown(write)
- [MEDIUM] massive cleanup of process_srv()
- [MEDIUM] second level of code cleanup for process_srv_data
- [MEDIUM] third cleanup and optimization of process_srv_data()
- [MEDIUM] process_srv_data: ensure that we always correctly re-arm timeouts
- [MEDIUM] stream_sock_process_data moved to stream_sock.c
- [MAJOR] make the client side use stream_sock_process_data()
- [MEDIUM] split stream_sock_process_data
- [OPTIM] stream_sock_read must check for null-reads more often
- [MINOR] only call flow analysers when their read side is connected.
- [MEDIUM] reintroduce BF_HIJACK with produce_content
- [MINOR] re-arrange buffer flags and rename some of them
- [MINOR] do not check for BF_SHUTR when computing write timeout
- [OPTIM] ev_sepoll: detect newly created FDs and check them once
- [OPTIM] reduce the number of calls to task_wakeup()
- [OPTIM] force inlining of large functions with gcc >= 3
- [MEDIUM] indicate a reason for a task wakeup
- [MINOR] change type of fdtab[]->owner to void*
- [MAJOR] make stream sockets aware of the stream interface
- [MEDIUM] stream interface: add the ->shutw method as well as in and out buffers
- [MEDIUM] buffers: add BF_READ_ATTACHED and BF_ANA_TIMEOUT
- [MEDIUM] process_session: make use of the new buffer flags
- [CLEANUP] process_session: move debug outputs out of the critical loop
- [MEDIUM] move QUEUE and TAR timers to stream interfaces
- [OPTIM] add compiler hints in tick_is_expired()
- [MINOR] add buffer_check_timeouts() to check what timeouts have fired.
- [MEDIUM] use buffer_check_timeouts instead of stream_sock_check_timeouts()
- [MINOR] add an expiration flag to the stream_sock_interface
- [MAJOR] migrate the connection logic to stream interface
- [MAJOR] add a connection error state to the stream_interface
- [MEDIUM] add the SN_CURR_SESS flag to the session to track open sessions
- [MEDIUM] continue layering cleanups.
- [MEDIUM] stream_interface: added a DISconnected state between CON/EST and CLO
- [MEDIUM] remove stream_sock_update_data()
- [MINOR] maintain a global session list in order to ease debugging
- [BUG] shutw must imply close during a connect
- [MEDIUM] process shutw during connection attempt
- [MEDIUM] make the stream interface control the SHUT{R,W} bits
- [MAJOR] complete layer4/7 separation
- [CLEANUP] move the session-related functions to session.c
- [MINOR] call session->do_log() for logging
- [MINOR] replace the ambiguous client_return function by stream_int_return
- [MINOR] replace client_retnclose() with stream_int_retnclose()
- [MINOR] replace srv_close_with_err() with http_server_error()
- [MEDIUM] make the http server error function a pointer in the session
- [CLEANUP] session.c: removed some migration left-overs in sess_establish()
- [MINOR] stream_sock_data_finish() should not expose fd
- [MEDIUM] extract TCP request processing from HTTP
- [MEDIUM] extract the HTTP tarpit code from process_request().
- [MEDIUM] move the HTTP request body analyser out of process_request().
- [MEDIUM] rename process_request to http_process_request
- [BUG] fix forgotten server session counter
- [MINOR] declare process_session in session.h, not proto_http.h
- [MEDIUM] first pass of lifting to proto_uxst.c:uxst_event_accept()
- [MINOR] add an analyser code for UNIX stats request
- [MINOR] pre-set analyser flags on the listener at registration time
- [BUG] do not forward close from cons to prod with analysers
- [MEDIUM] ensure that sock->shutw() also closes read for init states
- [MINOR] add an analyser state in struct session
- [MAJOR] make unix sockets work again with stats
- [MEDIUM] remove cli_fd, srv_fd, cli_state and srv_state from the session
- [MINOR] move the listener reference from fd to session
- [MEDIUM] reference the current hijack function in the buffer itself
- [MINOR] slightly rebalance stats_dump_{raw,http}
- [MINOR] add a new back-reference type : struct bref
- [MINOR] add back-references to sessions for later use by a dumper.
- [MEDIUM] add support for "show sess" in unix stats socket
- [BUG] do not release the connection slot during a retry
- [BUG] dynamic connection throttling could return a max of zero conns
- [BUG] do not try to pause backends during reload
- [BUG] ensure that listeners from disabled proxies are correctly unbound.
- [BUG] acl-related keywords are not allowed in defaults sections
- [BUG] cookie capture is declared in the frontend but checked on the backend
- [BUG] critical errors should be reported even in daemon mode
- [MINOR] redirect: add support for the "drop-query" option
- [MINOR] redirect: add support for "set-cookie" and "clear-cookie"
- [MINOR] redirect: in prefix mode a "/" means not to change the URI
- [BUG] do not dequeue requests on a dead server
- [BUG] do not dequeue the backend's pending connections on a dead server
- [MINOR] stats: indicate if a task is running in "show sess"
- [BUG] check timeout must not be changed if timeout.check is not set
- [BUG] "option transparent" is for backend, not frontend !
- [MINOR] transfer errors were not reported anymore in data phase
- [MEDIUM] add a send limit to a buffer
- [MEDIUM] don't report buffer timeout when there is I/O activity
- [MEDIUM] indicate when we don't care about read timeout
- [MINOR] add flags to indicate when a stream interface is waiting for space/data
- [MEDIUM] enable inter-stream_interface wakeup calls
- [MAJOR] implement autonomous inter-socket forwarding
- [MINOR] add the splice_len member to the buffer struct in preparation of splice support
- [MEDIUM] stream_sock: factor out the return path in case of no-writes
- [MEDIUM] i/o: rework ->to_forward and ->send_max
- [OPTIM] stream_sock: do not ask for polling on EAGAIN if we have read
- [OPTIM] buffer: replace rlim by max_len
- [OPTIM] stream_sock: factor out the buffer full handling out of the loop
- [CLEANUP] replace a few occurrences of (flags & X) && !(flags & Y)
- [CLEANUP] stream_sock: move the write-nothing condition out of the loop
- [MEDIUM] split stream_sock_write() into callback and core functions
- [MEDIUM] stream_sock_read: call ->chk_snd whenever there are data pending
- [MINOR] stream_sock: fix a few wrong empty calculations
- [MEDIUM] stream_sock: try to send pending data on chk_snd()
- [MINOR] global.maxpipes: add the ability to reserve file descriptors for pipes
- [MEDIUM] splice: add configuration options and set global.maxpipes
- [MINOR] introduce structures required to support Linux kernel splicing
- [MEDIUM] add definitions for Linux kernel splicing
- [MAJOR] complete support for linux 2.6 kernel splicing
- [BUG] reserve some pipes for backends with splice enabled
- [MEDIUM] splice: add hints to support older buggy kernels
- [MEDIUM] introduce pipe pools
- [MEDIUM] splice: make use of pipe pools
- [STATS] report pipe usage in the statistics
- [OPTIM] make global.maxpipes default to global.maxconn/4 when not specified
- [BUILD] fix snapshot date extraction with negative timezones
- [MEDIUM] move global tuning options to the global structure
- [MEDIUM] splice: add the global "nosplice" option
- [BUILD] add USE_LINUX_SPLICE to enable LINUX_SPLICE on linux 2.6
- [BUG] we must not exit if protocol binding only returns a warning
- [MINOR] add support for bind interface name
- [BUG] inform the user when root is expected but not set
- [MEDIUM] add support for source interface binding
- [MEDIUM] add support for source interface binding at the server level
- [MEDIUM] implement bind-process to limit service presence by process
- [DOC] document maxpipes, nosplice, option splice-{auto,request,response}
- [DOC] filled the logging section of the configuration manual
- [DOC] document HTTP status codes
- [DOC] document a few missing info about errorfile
- [BUG] fix random memory corruption using "show sess"
- [BUG] fix unix socket processing of interrupted output
- [DOC] add diagrams of queuing and future ACL design
- [BUILD] proto_http did not build on gcc-2.95
- [BUG] the "source" keyword must first clear optional settings
- [BUG] global.tune.maxaccept must be limited even in mono-process mode
- [MINOR] ensure that http_msg_analyzer updates pointer to invalid char
- [MEDIUM] store a complete dump of request and response errors in proxies
- [MEDIUM] implement error dump on unix socket with "show errors"
- [DOC] document "show errors"
- [MINOR] errors dump must use user-visible date, not internal date.
- [MINOR] time: add __usec_to_1024th to convert usecs to 1024th of second
- [MINOR] add curr_sec_ms and curr_sec_ms_scaled for current second.
- [MEDIUM] measure and report session rate on frontend, backends and servers
- [BUG] the "connslots" keyword was matched as "connlots"
- [MINOR] acl: add 2 new verbs: fe_sess_rate and be_sess_rate
- [MEDIUM] implement "rate-limit sessions" for the frontend
- [BUG] interface binding: length must include the trailing zero
- [BUG] typo in timeout error reporting : report *res and not *err
- [OPTIM] maintain_proxies: only wake up when the frontend will be ready
- [OPTIM] rate-limit: cleaner behaviour on low rates and reduce consumption
- [BUG] switch server-side stream interface to close in case of abort
- [CLEANUP] remove last references to term_trace
- [OPTIM] freq_ctr: do not rotate the counters when reading
- [BUG] disable any analysers for monitoring requests
- [BUG] rate-limit in defaults section was ignored
- [BUG] task: fix handling of duplicate keys
- [OPTIM] task: don't unlink a task from a wait queue when waking it up
- [OPTIM] displace tasks in the wait queue only if absolutely needed
- [MEDIUM] minor update to the task api: let the scheduler queue itself
- [BUG] event_accept() must always wake the task up, even in health mode
- [CLEANUP] task: distinguish between clock ticks and timers
- [OPTIM] task: reduce the number of calls to task_queue()
- [OPTIM] do not re-check req buffer when only response has changed
- [CLEANUP] don't enable kernel splicing when socket is closed
- [CLEANUP] buffer_flush() was misleading, rename it as buffer_erase
- [MINOR] buffers: implement buffer_flush()
- [MEDIUM] rearrange forwarding condition to enable splice during analysis
- [BUILD] build fixes for Solaris
- [BUILD] proto_http did not build on gcc-2.95 (again)
- [CONTRIB] halog: fast log parser for haproxy
- [CONTRIB] halog: faster fgets() and add support for percentile reporting
Willy Tarreau [Sun, 8 Mar 2009 20:38:23 +0000 (21:38 +0100)]
[MEDIUM] rearrange forwarding condition to enable splice during analysis
The forwarding condition was not very clear. We would only enable
forwarding when send_max is zero, and we would only splice when no
analyser is installed. In fact we want to enable forward when there
is no analyser and we want to splice as soon as there is data to
forward, regardless of the analysers.
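As a rough illustration, the two conditions can be written as independent
predicates. This is only a sketch: the struct below is a simplified stand-in
and the helper names are hypothetical, not the actual code.

```c
/* Simplified stand-in for a buffer; field names are assumptions. */
struct buffer {
    unsigned int analysers;    /* bit field of analysers still to run */
    unsigned long to_forward;  /* number of bytes scheduled for forwarding */
};

/* Forwarding may be enabled only once no analyser is left. */
static int may_enable_forward(const struct buffer *b)
{
    return b->analysers == 0;
}

/* Splicing only depends on having data to forward, not on analysers. */
static int may_splice(const struct buffer *b)
{
    return b->to_forward > 0;
}
```

With an analyser still installed and data already scheduled, the first
predicate is false while the second is true, which is the case the old
condition missed.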
Willy Tarreau [Sun, 8 Mar 2009 18:20:25 +0000 (19:20 +0100)]
[OPTIM] do not re-check req buffer when only response has changed
In process_session(), we used to re-run through all the evaluation
loop when only the response had changed. Now we carefully check in
this order :
- changes to the stream interfaces (only SI_ST_DIS)
- changes to the request buffer flags
- changes to the response buffer flags
And we branch to the appropriate section. This saves significant
CPU cycles, which is important since process_session() is one of
the major CPU eaters.
The same changes have been applied to uxst_process_session().
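The ordered checks can be pictured as a small dispatcher. The sketch below
only illustrates the order of the tests; the snapshot struct, the enum and
the function name are assumptions and do not come from process_session().

```c
enum resync_point {
    RESYNC_NONE,        /* nothing changed, evaluation is complete */
    RESYNC_STREAM_INT,  /* a stream interface changed (SI_ST_DIS) */
    RESYNC_REQUEST,     /* request buffer flags changed */
    RESYNC_RESPONSE     /* response buffer flags changed */
};

/* Hypothetical snapshot of the state compared before and after a pass. */
struct snapshot {
    int si_disconnected;     /* non-zero if a stream interface is in DIS */
    unsigned int req_flags;  /* request buffer flags */
    unsigned int rep_flags;  /* response buffer flags */
};

/* Returns the section to branch to, testing in the order listed above. */
static enum resync_point where_to_resume(const struct snapshot *before,
                                         const struct snapshot *after)
{
    if (after->si_disconnected)
        return RESYNC_STREAM_INT;
    if (after->req_flags != before->req_flags)
        return RESYNC_REQUEST;
    if (after->rep_flags != before->rep_flags)
        return RESYNC_RESPONSE;
    return RESYNC_NONE;
}
```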
Willy Tarreau [Sun, 8 Mar 2009 15:35:27 +0000 (16:35 +0100)]
[OPTIM] task: reduce the number of calls to task_queue()
Most of the time, task_queue() will immediately return. By extracting
the preliminary checks and putting them in an inline function, we can
significantly reduce the number of calls to the function itself, and
most of the tests can be optimized away due to the caller's context.
Another minor improvement in process_runnable_tasks() consisted in
taking advantage of the processor's branch prediction unit by making
a special case of the process_session() callback which is by far the
most common one.
All this improved performance by about 1%, mainly during the call
from process_runnable_tasks().
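The inline/out-of-line split may look roughly like the sketch below. The
task layout, the in_wait_queue flag and the task_queue_slow() name are
assumptions made for illustration; the real checks may differ.

```c
/* Hypothetical, simplified task holding only what the sketch needs. */
struct task {
    int expire;         /* wanted expiration date, 0 means none (assumption) */
    int queued_expire;  /* date the task is currently queued at */
    int in_wait_queue;  /* non-zero when linked in the wait queue */
};

static unsigned long slow_path_calls;  /* counts real queue operations */

/* Out-of-line slow path, standing in for the real tree insertion. */
static void task_queue_slow(struct task *t)
{
    slow_path_calls++;
    t->queued_expire = t->expire;
    t->in_wait_queue = 1;
}

/* Inline fast path: the cheap tests run in the caller's context, so the
 * compiler can drop those it can resolve at compile time.
 */
static inline void task_queue(struct task *t)
{
    if (!t->expire)
        return;  /* no timeout, nothing to queue */
    if (t->in_wait_queue && t->queued_expire == t->expire)
        return;  /* already queued at the right place */
    task_queue_slow(t);
}
```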
Willy Tarreau [Sun, 8 Mar 2009 14:53:06 +0000 (15:53 +0100)]
[CLEANUP] task: distinguish between clock ticks and timers
Timers are unsigned and used as tree positions. Ticks are signed and
used as absolute dates within the current time frame. While the two are
normally equal (except zero), it's important not to confuse them in
the code as they are not interchangeable.
We add two inline functions to turn each one into the other.
The comments have also been moved to the proper location, as it was
not easy to understand what was a tick and what was a timer unit.
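A pair of conversion helpers could look like the sketch below. The mapping
of zero is an assumption made for illustration (a zero tick is taken to mean
"no expiration"), and the function names are hypothetical.

```c
#include <stdint.h>

#define TICK_ETERNITY 0  /* assumption: a zero tick means "never expires" */

/* Timer to tick: tree position 0 is valid, but a tick of 0 would read as
 * "never expires", so it is nudged to 1 (hypothetical choice). Values above
 * INT32_MAX rely on the usual two's complement conversion.
 */
static inline int32_t timer_to_tick(uint32_t timer)
{
    return timer ? (int32_t)timer : 1;
}

/* Tick to timer: callers are expected to test for TICK_ETERNITY first. */
static inline uint32_t tick_to_timer(int32_t tick)
{
    return (uint32_t)tick;
}
```

Going through explicit helpers keeps the zero special case in one place
instead of spreading casts over the code.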
Willy Tarreau [Sun, 8 Mar 2009 08:38:41 +0000 (09:38 +0100)]
[MEDIUM] minor update to the task api: let the scheduler queue itself
All the task callbacks had to requeue the task themselves, and update
a global timeout. This was not convenient at all. Now the API has been
simplified. The task callbacks only have to update their expire timer,
and return either a pointer to the task or NULL if the task has been
deleted. The scheduler will take care of requeuing the task at the
proper place in the wait queue.
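The new contract can be sketched as follows. Everything here is a
simplified assumption: the struct, wait_queue_insert() and run_task() are
stand-ins, not the real scheduler.

```c
#include <stddef.h>

struct task {
    int expire;                              /* next expiration date */
    struct task *(*process)(struct task *);  /* task callback */
    int queued_at;                           /* last queue position, -1 if none */
};

/* Stand-in for the real wait queue insertion. */
static void wait_queue_insert(struct task *t)
{
    t->queued_at = t->expire;
}

/* Scheduler side: run the callback, then requeue the task if it survived. */
static struct task *run_task(struct task *t)
{
    t = t->process(t);
    if (t != NULL)
        wait_queue_insert(t);
    return t;
}

/* A callback now only updates its expire timer and returns the task. */
static struct task *refresh_cb(struct task *t)
{
    t->expire += 1000;
    return t;
}

/* A callback that deletes its task reports it by returning NULL. */
static struct task *finish_cb(struct task *t)
{
    (void)t;  /* a real callback would release the task here */
    return NULL;
}
```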
Willy Tarreau [Sun, 8 Mar 2009 06:46:27 +0000 (07:46 +0100)]
[OPTIM] displace tasks in the wait queue only if absolutely needed
We don't need to remove then add tasks in the wait queue every time we
update a timeout. We only need to do that when the new timeout is earlier
than the previous one. We can rely on wake_expired_tasks() to perform the
proper checks and bounce the misplaced tasks in the rare case where this
happens. The motivation behind this is that we very rarely hit timeouts,
so we save a lot of CPU cycles by moving the tasks very rarely. This now
means we can also find tasks with expiration date set to eternity in the
queue, and that is not a problem.
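The lazy requeuing can be sketched as below. This is a simplified model
which ignores tick wrapping and eternity; the struct and the function names
are assumptions, and on_queue_key_reached() only stands for the check that
wake_expired_tasks() is described as doing.

```c
struct task {
    int expire;     /* wanted expiration date */
    int queued_at;  /* date used as the key in the wait queue */
    int moves;      /* number of times the task was reinserted */
};

/* Stand-in for a delete followed by an insert in the wait queue. */
static void requeue(struct task *t)
{
    t->queued_at = t->expire;
    t->moves++;
}

/* Called whenever a timeout is updated. */
static void update_timeout(struct task *t, int new_expire)
{
    t->expire = new_expire;
    if (new_expire < t->queued_at)
        requeue(t);  /* must fire sooner, move it now */
    /* otherwise leave it in place, the expiry scan will sort it out */
}

/* Called by the expiry scan when the queue key is reached. Returns 1 if
 * the task really expired, or 0 if it was misplaced and got bounced.
 */
static int on_queue_key_reached(struct task *t, int now)
{
    if (t->expire > now) {
        requeue(t);
        return 0;
    }
    return 1;
}
```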
Willy Tarreau [Sat, 7 Mar 2009 16:25:21 +0000 (17:25 +0100)]
[OPTIM] task: don't unlink a task from a wait queue when waking it up
In many situations, we wake a task on an I/O event, then queue it
exactly where it was. This is a real waste because we delete/insert
tasks into the wait queue for nothing. The only reason for this is
that there was only one tree node in the task struct.
By adding another tree node, we can have one tree for the timers
(wait queue) and one tree for the priority (run queue). That way,
we can have a task both in the run queue and wait queue at the
same time. The wait queue now really holds timers, which is what
it was designed for.
The net gain is at least 1 delete/insert cycle per session, and up
to 2-3 depending on the workload, since we save one cycle each time
the expiration date is not changed during a wake up.
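A minimal sketch of the two-node layout is shown below. The node type is a
simplified stand-in for a real tree node, and the field and function names
are assumptions.

```c
/* Simplified stand-in for a tree node. */
struct node {
    unsigned int key;  /* position in the tree */
    int linked;        /* non-zero when the node is in a tree */
};

struct task {
    struct node wq;  /* timer position in the wait queue */
    struct node rq;  /* priority position in the run queue */
};

/* Waking a task only touches the run queue node. The wait queue node
 * stays where it is, so no delete/insert cycle is spent on it.
 */
static void task_wakeup(struct task *t, unsigned int prio)
{
    if (t->rq.linked)
        return;  /* already in the run queue */
    t->rq.key = prio;
    t->rq.linked = 1;
}

/* Leaving the run queue does not affect the timer either. */
static void task_done_running(struct task *t)
{
    t->rq.linked = 0;
}
```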
Willy Tarreau [Sat, 7 Mar 2009 23:26:28 +0000 (00:26 +0100)]
[BUG] task: fix handling of duplicate keys
A bug was introduced with the ebtree-based scheduler. It rarely causes
some timeouts to last longer than required when a task's expiration
date is the same as the last queued date and the task is part of a
duplicate tree without being at the top of that tree. In this case, the
task will not be expired until after the duplicate tree has been
flushed.
It is easier to reproduce by setting a very short client timeout (1s)
and sending connections and waiting for them to expire with the 408
status. Then in parallel, inject at about 1kh/s. The bug causes the
connections to sometimes wait longer than 1s before timing out.
The cause was the use of eb_insert_dup() on wrong nodes, as this
function is designed to work only on the top of the dup tree. The
solution consists in updating last_timer only when its bit is -1,
and using it only if its bit is still -1 (top of a dup tree).
The fix has not reduced performance because it only fixes the case
where this bug could fire, which is extremely rare.
Willy Tarreau [Fri, 6 Mar 2009 13:29:25 +0000 (14:29 +0100)]
[OPTIM] freq_ctr: do not rotate the counters when reading
It's easier to take the counter's age into account when consulting it
than to rotate it first. It also saves some CPU cycles and avoids the
multiply for outdated counters, and saves further cycles when multiple
operations need to read the same counter.
The freq_ctr code has also shrunk by one third as a result of these
optimizations.
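Reading without rotating can be sketched as below. This is a simplified
model which uses plain millisecond arithmetic and leaves out any special
handling of very low rates; the struct layout and the read_freq() name are
assumptions.

```c
struct freq_ctr {
    unsigned int curr_sec;  /* second the current counter belongs to */
    unsigned int curr_ctr;  /* events counted during curr_sec */
    unsigned int prev_ctr;  /* events counted during the previous second */
};

/* Read-only estimate of the number of events over the last second.
 * now_ms is the position within the current second, from 0 to 999.
 */
static unsigned int read_freq(const struct freq_ctr *ctr,
                              unsigned int now_sec, unsigned int now_ms)
{
    unsigned int age = now_sec - ctr->curr_sec;
    unsigned int curr, past;

    if (age > 1)
        return 0;  /* both periods are outdated, no multiply needed */

    if (age == 1) {
        curr = 0;  /* the stored "current" second is now the past one */
        past = ctr->curr_ctr;
    } else {
        curr = ctr->curr_ctr;
        past = ctr->prev_ctr;
    }

    /* weight the past second by the share still inside the window */
    return curr + (unsigned int)((unsigned long long)past *
                                 (1000 - now_ms) / 1000);
}
```

Since the counter is taken as const, several readers can consult it within
the same second without any of them paying for a rotation.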
Willy Tarreau [Fri, 6 Mar 2009 12:07:40 +0000 (13:07 +0100)]
[CLEANUP] remove last references to term_trace
term_trace was very useful while reworking the lower layers but has almost
completely been removed from every place it was referenced. Even the few
remaining ones were not accurate, so it's better to completely remove those
references and re-add them from scratch later if needed.
Willy Tarreau [Fri, 6 Mar 2009 11:51:23 +0000 (12:51 +0100)]
[BUG] switch server-side stream interface to close in case of abort
In pure TCP mode, there is no response analyser to switch the server-side
stream interface from INI to CLO when the output has been closed after an
abort. This caused sessions to remain indefinitely active when they were
aborted by the client during a TCP content analysis.
The proper action is to switch the stream interface to the CLO state from
INI when both write enable and shutdown write are set.
Willy Tarreau [Fri, 6 Mar 2009 08:18:27 +0000 (09:18 +0100)]
[OPTIM] rate-limit: cleaner behaviour on low rates and reduce consumption
The rate-limit was applied to the smoothed value, which has a special
case for frequencies below 2 events per period. This caused irregular
limitations when set to 1 session per second.
The proper way to handle this is to compute the number of remaining
events that can occur without reaching the limit. This is what has
been added. It also has the benefit that the frequency calculation
is now done once when entering event_accept(), before the accept()
loop, and not once per accept() loop anymore, thus saving a few CPU
cycles during very high loads.
With this fix, rate limits of 1/s are perfectly respected.
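The "remaining events" approach can be sketched as below. The function
names, the max_accept cap and the pending counter are assumptions made for
illustration; accept() itself is replaced by a simple decrement.

```c
/* Number of events which may still occur in the current period without
 * reaching the limit, given the number already counted.
 */
static unsigned int events_remaining(unsigned int counted, unsigned int limit)
{
    return counted >= limit ? 0 : limit - counted;
}

/* Accept loop using a budget computed once, before the loop. "pending"
 * stands for the connections waiting in the backlog. Returns the number
 * of connections accepted.
 */
static unsigned int accept_batch(unsigned int pending, unsigned int counted,
                                 unsigned int limit, unsigned int max_accept)
{
    unsigned int budget = events_remaining(counted, limit);
    unsigned int accepted = 0;

    if (budget > max_accept)
        budget = max_accept;

    while (accepted < budget && pending > 0) {
        pending--;  /* stands in for a successful accept() */
        accepted++;
    }
    return accepted;
}
```

With a limit of 1 and nothing counted yet, exactly one connection is
accepted, and none once the limit has been reached.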