git.ipfire.org Git - thirdparty/haproxy.git/log

Krzysztof Piotr Oledzki [Thu, 29 Jan 2009 23:52:49 +0000 (00:52 +0100)]

[CRITICAL] fix server state tracking: it was O(n!) instead of O(n)

Using the wrong operator (&& instead of &) causes DOWN->UP
transition to take longer than it should and to produce a lot of
redundant logs. With typical "track" usage (1-6 tracking servers) it
shouldn't make a big difference but for heavily tracked servers
this bug leads to a hang with 100% CPU usage and extremely heavy
log spam.
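
The operator mix-up described above can be illustrated with a minimal sketch. The flag names and values below are hypothetical and are not taken from the haproxy sources; the point is only that a logical AND is true whenever both operands are non-zero, while a bitwise AND actually tests the bit.

```c
#include <stdbool.h>

/* Hypothetical state bits, for illustration only. */
#define SKETCH_SRV_RUNNING 0x0001u
#define SKETCH_SRV_BACKUP  0x0002u

/* Intended test: is any bit of `mask` set in `state`? */
static bool has_flag_bitwise(unsigned int state, unsigned int mask)
{
    return (state & mask) != 0;
}

/* Buggy test: true as soon as both values are non-zero,
 * regardless of which bits are actually set. */
static bool has_flag_logical(unsigned int state, unsigned int mask)
{
    return state && mask;
}
```

With a state that only has the backup bit set, the bitwise test correctly reports that the running bit is absent, while the logical test reports it as present.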

Willy Tarreau [Wed, 4 Feb 2009 21:05:05 +0000 (22:05 +0100)]

[MEDIUM] implement bind-process to limit service presence by process

The "bind-process" keyword lets the admin select which instances may
run on which process (in multi-process mode). It makes it easier to
more evenly distribute the load across multiple processes by avoiding
having too many listen to the same IP:ports.

Willy Tarreau [Wed, 4 Feb 2009 19:20:58 +0000 (20:20 +0100)]

[MEDIUM] add support for source interface binding at the server level

Add support for "interface <name>" after the "source" statement on
the server line.

Willy Tarreau [Wed, 4 Feb 2009 17:46:54 +0000 (18:46 +0100)]

[MEDIUM] add support for source interface binding

Specifying "interface <name>" after the "source" statement allows
one to bind to a specific interface for proxy<->server traffic.

This makes it possible to use multiple links to reach multiple
servers, and to force traffic to pass via an interface different
from the one the system would have chosen based on the routing
table.

Willy Tarreau [Wed, 4 Feb 2009 17:02:48 +0000 (18:02 +0100)]

[BUG] inform the user when root is expected but not set

When haproxy is run by a non-root user but some options require
root privileges, inform the user.

Willy Tarreau [Wed, 4 Feb 2009 16:19:29 +0000 (17:19 +0100)]

[MINOR] add support for bind interface name

By appending "interface <name>" to a "bind" line, it is now possible
to specifically bind to a physical interface name. Note that this
currently only works on Linux and requires root privileges.

Willy Tarreau [Wed, 4 Feb 2009 16:05:23 +0000 (17:05 +0100)]

[BUG] we must not exit if protocol binding only returns a warning

Right now, protocol binding cannot return a warning, but once it
can, we must just print the warning instead of exiting.

Krzysztof Piotr Oledzki [Tue, 27 Jan 2009 20:09:41 +0000 (21:09 +0100)]

[DOC] remove buggy comment for use_backend

"early blocking based on ACLs" is definitely wrong here

Krzysztof Piotr Oledzki [Tue, 27 Jan 2009 15:57:08 +0000 (16:57 +0100)]

[BUG] Fix "listen" with more than 2 <ip>:<port> pairs

Fix "listen www-mutualise 80.248.x.y1:80,80.248.x.y2:80,80.248.x.y3:80":

[ALERT] 309/161509 (15450) : Invalid server address: '80.248.x.y1:80,80.248.x.y2'
[ALERT] 309/161509 (15450) : Error reading configuration file : /etc/haproxy/haproxy.cfg

Bug reported by Laurent Dolosor.

Willy Tarreau [Sun, 25 Jan 2009 15:13:42 +0000 (16:13 +0100)]

[BUILD] add USE_LINUX_SPLICE to enable LINUX_SPLICE on linux 2.6

This will provide high performance data forwarding between sockets,
but it is broken on many kernels and will sometimes forward corrupted
data without some kernel patches. Consider this experimental for now.

Willy Tarreau [Sun, 25 Jan 2009 15:03:28 +0000 (16:03 +0100)]

[MEDIUM] splice: add the global "nosplice" option

Setting "nosplice" in the global section will disable the use of TCP
splicing (both tcpsplice and linux 2.6 splice). The same will be
achieved using the "-dS" parameter on the command line.

Willy Tarreau [Sun, 25 Jan 2009 14:42:27 +0000 (15:42 +0100)]

[MEDIUM] move global tuning options to the global structure

The global tuning options right now only concern the polling mechanisms,
and they are not in the global struct itself. It's not very practical to
add other options so let's move them to the global struct and remove
types/polling.h which was not used for anything else.

Willy Tarreau [Sun, 25 Jan 2009 13:10:48 +0000 (14:10 +0100)]

[BUILD] fix snapshot date extraction with negative timezones

Building with a last commit having a negative time offset would make
"date" complain.

Willy Tarreau [Sun, 25 Jan 2009 13:06:58 +0000 (14:06 +0100)]

[OPTIM] make global.maxpipes default to global.maxconn/4 when not specified

global.maxconn/4 seems to be a good hint for global.maxpipes when that
one must be guessed. If the limit is reached, it's still possible to
set it manually in the configuration.
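
The default described above amounts to a single fallback rule. The sketch below uses a hypothetical structure and function name, and assumes that a value of zero means "not specified"; the log does not show how the real code stores these settings.

```c
/* Hypothetical container for the two global settings discussed above. */
struct global_sketch {
    int maxconn;   /* configured connection limit */
    int maxpipes;  /* 0 is assumed to mean "not specified" */
};

/* Apply the default only when the admin did not set maxpipes. */
static void apply_default_maxpipes(struct global_sketch *g)
{
    if (g->maxpipes == 0)
        g->maxpipes = g->maxconn / 4;
}
```

An explicit maxpipes value is left untouched, which matches the note that the limit can still be set manually.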

Willy Tarreau [Sun, 25 Jan 2009 13:02:00 +0000 (14:02 +0100)]

[STATS] report pipe usage in the statistics

Pipe usage is reported in info and web stats including maxpipes, pipes_free,
and pipes_used.

Willy Tarreau [Sun, 25 Jan 2009 12:56:13 +0000 (13:56 +0100)]

[MEDIUM] splice: make use of pipe pools

Using pipe pools makes pipe management a lot easier. It also allows us
to remove quite a few #ifdefs in areas which depended on whether
support for kernel splicing was present.

The buffer now holds a pointer to a pipe structure which is always NULL
except if there are still data in the pipe. When it needs to use that
pipe, it dynamically allocates it from the pipe pool. When the data is
consumed, the pipe is immediately released.

That way, there is no need anymore to care about pipe closure upon
session termination, nor about pipe creation when trying to use
splice().

Another immediate advantage of this method is that it considerably
reduces the number of pipes needed to use splice(). Tests have shown
that even with 0.2 pipe per connection, almost all sessions can use
splice(), because the same pipe may be used by several consecutive
calls to splice().

Willy Tarreau [Sun, 25 Jan 2009 12:49:53 +0000 (13:49 +0100)]

[MEDIUM] introduce pipe pools

A new data type has been added : pipes. Some pre-allocated empty pipes
are maintained in a pool for users such as splice which use them a lot
for very short times.

Pipes are allocated using get_pipe() and released using put_pipe().
Pipes which are released with pending data are immediately killed.
The struct pipe is small (16 to 20 bytes) and may even be further
reduced by unifying ->data and ->next.

It would be nice to have a dedicated cleanup task which would watch
for the pipes usage and destroy a few of them from time to time.
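
A pool of this kind might look roughly like the sketch below. It is not the haproxy implementation: the structure layout, the `pool_` names, and the free-list counter are assumptions made for illustration, standing in for the get_pipe() and put_pipe() functions named above. It only reproduces the two behaviours the message describes: idle pipes are reused, and pipes released with pending data are destroyed.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical pooled pipe; field names are illustrative only. */
struct pooled_pipe {
    int data;                  /* bytes still held in the pipe */
    int prod;                  /* write end */
    int cons;                  /* read end */
    struct pooled_pipe *next;  /* next idle pipe in the pool */
};

static struct pooled_pipe *pool_head;  /* list of idle, empty pipes */
static int pool_free_count;            /* number of idle pipes */

/* Take an idle pipe from the pool, or create a new one. */
static struct pooled_pipe *pool_get_pipe(void)
{
    struct pooled_pipe *p = pool_head;
    int fds[2];

    if (p) {
        pool_head = p->next;
        pool_free_count--;
        p->next = NULL;
        return p;
    }
    p = malloc(sizeof(*p));
    if (!p)
        return NULL;
    if (pipe(fds) < 0) {
        free(p);
        return NULL;
    }
    p->data = 0;
    p->cons = fds[0];
    p->prod = fds[1];
    p->next = NULL;
    return p;
}

/* Return a pipe to the pool; a pipe with pending data is killed. */
static void pool_put_pipe(struct pooled_pipe *p)
{
    if (p->data) {
        close(p->prod);
        close(p->cons);
        free(p);
        return;
    }
    p->next = pool_head;
    pool_head = p;
    pool_free_count++;
}
```

Killing a pipe that still holds data avoids handing stale bytes to the next user, at the cost of one extra pipe creation later.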

Ross West [Thu, 22 Jan 2009 23:32:41 +0000 (18:32 -0500)]

[BUILD] fix Makefile.bsd and Makefile.osx for stream_interface

Did a full compile of the 1.3.15.7 - 20081208 snapshot on Freebsd-7.x
recently, and noted that there needs to be a quick patch done on the
Makefile for bsd machines.

This was due to the stream_interface replacing the send data commands
in the rewrite Willy did a while ago.

Simple fix, and it compiled cleanly otherwise. Thanks for the work
Willy!

Cheers,
Ross.

Willy Tarreau [Sun, 25 Jan 2009 10:11:32 +0000 (11:11 +0100)]

[MEDIUM] splice: add hints to support older buggy kernels

Kernels before 2.6.27.13 would have splice() return EAGAIN on shutdown.
By adding a few tricks, we can deal with the situation. If splice()
returns EAGAIN and the pipe is empty, then fall back to recv() which
will be able to check if it's an end of connection or not.

The advantage of this method is that it remains transparent for good
kernels since there is no reason that epoll() will return EPOLLIN
without anything to read, and even if it would happen, the recv()
overhead on this check is minimal.
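
The decision taken after a splice() call can be sketched as a small pure function. The enum, the function name, and its parameters are hypothetical; only the EAGAIN handling comes from the commit messages in this log, and the treatment of a zero return as end of input follows the documented behaviour of splice().

```c
#include <errno.h>

/* Hypothetical outcomes of a splice-in attempt. */
enum splice_action {
    ACT_FORWARD,    /* data entered the pipe: go and send it */
    ACT_SHUTDOWN,   /* splice() returned 0: end of input */
    ACT_TRY_RECV,   /* EAGAIN on an empty pipe: let recv() decide */
    ACT_STOP_POLL,  /* EAGAIN with data in the pipe: flush it first */
    ACT_ERROR       /* any other error */
};

/* ret: return value of splice(); err: errno after the call;
 * pipe_bytes: bytes currently held in the pipe. */
static enum splice_action after_splice(long ret, int err, int pipe_bytes)
{
    if (ret > 0)
        return ACT_FORWARD;
    if (ret == 0)
        return ACT_SHUTDOWN;
    if (err == EAGAIN)
        return pipe_bytes == 0 ? ACT_TRY_RECV : ACT_STOP_POLL;
    return ACT_ERROR;
}
```

On a kernel that reports shutdown properly the ACT_TRY_RECV branch should rarely be taken, which is why the workaround is described as transparent.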

Willy Tarreau [Sun, 25 Jan 2009 09:42:05 +0000 (10:42 +0100)]

[BUG] reserve some pipes for backends with splice enabled

If splicing is enabled in a backend, we need to guess how many
pipes will be needed. We used to rely on fullconn, but this leads
to non-working splicing when fullconn is not specified. So we now
fall back to global.maxconn.

Willy Tarreau [Sun, 18 Jan 2009 23:32:22 +0000 (00:32 +0100)]

[MAJOR] complete support for linux 2.6 kernel splicing

This code provides support for linux 2.6 kernel splicing. This feature
appeared in kernel 2.6.25, but initial implementations were awkward and
buggy. A kernel >= 2.6.29-rc1 is recommended, as well as some optimization
patches.

Using pipes, this code is able to pass network data directly between
sockets. The pipes are a bit annoying to manage (fd creation, release,
...) but finally work quite well.

Preliminary tests show that on high bandwidths, there's a substantial
gain (approx +50%, only +20% with kernel workarounds for corruption
bugs). With 2000 concurrent connections, with Myricom NICs, haproxy
now more easily achieves 4.5 Gbps for 1 process and 6 Gbps for two
processes. 8-9 Gbps are easily reached with smaller numbers
of connections.

We also try to splice out immediately after a splice in by taking
advantage of the new ability for a data producer to notify the
consumer that data are available. Doing this ensures that the
data are immediately transferred between sockets without latency,
and without having to re-poll. Performance on small packets has
considerably increased due to this method.

Earlier kernels return only one TCP segment at a time in non-blocking
splice-in mode, while newer return as many segments as may fit in the
pipe. To work around this limitation without hurting more recent kernels,
we try to collect as much data as possible, but we stop when we believe
we have read 16 segments, then we forward everything at once. It also
ensures that even upon shutdown or EAGAIN the data will be forwarded.

Some tricks were necessary because the splice() syscall does not
distinguish between missing data and a full pipe: it returns EAGAIN
in both cases. The trick consists in stopping polling in case of
EAGAIN and a non-empty pipe.

The receiver waits for the buffer to be empty before using the pipe.
This is in order to avoid confusion between buffer data and pipe data.
The BF_EMPTY flag now covers the pipe too.

Right now the code is disabled by default. It needs to be built with
CONFIG_HAP_LINUX_SPLICE, and the instances intended to use splice()
must have "option splice-response" (or option splice-request) enabled.

It is probably desirable to keep a pool of pre-allocated pipes to
avoid having to create them for every session. This will be worked
on later.

Preliminary tests show very good results, even with the kernel
workaround causing one memcpy(). At 3000 connections, performance
has moved from 3.2 Gbps to 4.7 Gbps.

Willy Tarreau [Sun, 18 Jan 2009 20:59:13 +0000 (21:59 +0100)]

[MEDIUM] add definitions for Linux kernel splicing

Some older libc don't define the splice() syscall, and some even
define a wrong one. For this reason, we try our best to declare
it correctly. These definitions still work with recent glibc.

Willy Tarreau [Sun, 18 Jan 2009 20:56:21 +0000 (21:56 +0100)]

[MINOR] introduce structures required to support Linux kernel splicing

When CONFIG_HAP_LINUX_SPLICE is defined, the buffer structure will be
slightly enlarged to support information needed for kernel splicing
on Linux.

A first attempt consisted in putting this information into the stream
interface, but in the long term, it appeared really awkward. This
version puts the information into the buffer. The platform-dependent
part is conditionally added and will only enlarge the buffers when
compiled in.

One new flag has also been added to the buffers: BF_KERN_SPLICING.
It indicates that the application considers it is appropriate to
use splicing to forward remaining data.

Willy Tarreau [Sun, 18 Jan 2009 20:44:07 +0000 (21:44 +0100)]

[MEDIUM] splice: add configuration options and set global.maxpipes

Three new options have been added when CONFIG_HAP_LINUX_SPLICE is
set :
  - splice-request
  - splice-response
  - splice-auto

They are used to enable splicing per frontend/backend. They are also
supported in defaults sections. The "splice-auto" option is meant to
automatically turn splice on for buffers marked as fast streamers.
This should save quite a bunch of file descriptors.

It was required to add a new "options2" field to the proxy structure
because the original "options" is full.

When global.maxpipes is not set, it is automatically adjusted to
the max of the sums of all frontend's and backend's maxconns for
those which have at least one splice option enabled.

Willy Tarreau [Sun, 18 Jan 2009 19:39:42 +0000 (20:39 +0100)]

[MINOR] global.maxpipes: add the ability to reserve file descriptors for pipes

This will be needed to use linux's splice() syscall.

Willy Tarreau [Sun, 18 Jan 2009 16:38:44 +0000 (17:38 +0100)]

[MEDIUM] stream_sock: try to send pending data on chk_snd()

When the producer calls stream_sock_chk_snd(), we now try to send
all pending data asynchronously. If it succeeds, we don't have to
enable polling on the FD which saves about half of the calls to
epoll_wait().

In stream_sock_read(), we finally set the WAIT_ROOM flag as soon as
possible, in preparation of the splice code. We reset it when we
detect that some room has been released either in the buffer or in
the splice.

Willy Tarreau [Sun, 18 Jan 2009 16:37:33 +0000 (17:37 +0100)]

[MINOR] stream_sock: fix a few wrong empty calculations

Willy Tarreau [Sun, 18 Jan 2009 15:25:31 +0000 (16:25 +0100)]

[MEDIUM] stream_sock_read: call ->chk_snd whenever there are data pending

The condition to call ->chk_snd() in stream_sock_read() was suboptimal
because we did not call it when the socket was shut down nor when there
was an error after data were added.

Now we ensure to call it whenever there are data pending.

Also, the "full" condition was handled before calling chk_snd(), which
could cause deadlock issues if chk_snd() did consume some data.

Willy Tarreau [Sun, 18 Jan 2009 14:30:37 +0000 (15:30 +0100)]

[MEDIUM] split stream_sock_write() into callback and core functions

stream_sock_write() has been split in two parts :
  - the poll callback, intended to be called when an I/O event has
    been detected
  - the write() core function, which ought to be usable from various
    other places, possibly not meant to wake the task up.

The code has also been slightly cleaned up in the process. It's more
readable now.

Willy Tarreau [Fri, 9 Jan 2009 12:05:19 +0000 (13:05 +0100)]

[CLEANUP] stream_sock: move the write-nothing condition out of the loop

Some tricks to handle situations where we write nothing were in the
middle of the main loop in stream_sock_write(). This cleanup provides
better source and object code, and slightly shrinks the output code.

Willy Tarreau [Fri, 9 Jan 2009 11:18:24 +0000 (12:18 +0100)]

[CLEANUP] replace a few occurrences of (flags & X) && !(flags & Y)

This construct collapses into ((flags & (X|Y)) == X) when X is a
single-bit flag. This provides a noticeable code shrink and the
output code results in less conditional jumps.
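
The collapsed form can be checked against the original construct with a short sketch. The flag names and values are hypothetical; the equivalence relies on FLAG_X being a single bit, as the message states.

```c
#include <stdbool.h>

#define FLAG_X 0x01u  /* hypothetical single-bit flag */
#define FLAG_Y 0x04u  /* hypothetical second flag */

/* Original construct: X is set and Y is not. */
static bool check_long(unsigned int flags)
{
    return (flags & FLAG_X) && !(flags & FLAG_Y);
}

/* Collapsed construct: one mask and one comparison. */
static bool check_short(unsigned int flags)
{
    return (flags & (FLAG_X | FLAG_Y)) == FLAG_X;
}
```

If FLAG_X covered several bits, the first form would accept any one of them while the second would require all of them, so the two would no longer agree.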

Willy Tarreau [Fri, 9 Jan 2009 10:38:52 +0000 (11:38 +0100)]

[OPTIM] stream_sock: factor out the buffer full handling out of the loop

Handling the buffer full condition is not trivial and this code was
duplicated inside the loop. Move it out of the loop at a single place.

Willy Tarreau [Fri, 9 Jan 2009 10:13:00 +0000 (11:13 +0100)]

[OPTIM] buffer: replace rlim by max_len

In the buffers, the read limit used to leave some room for header
rewriting was set by a pointer to the end of the buffer. Not only
did this require subtractions throughout the code, but it will
also soon no longer be usable when we want to support keepalive.

Let's replace this with a length limit, comparable to the buffer's
length. This has also slightly reduced the code size.

Willy Tarreau [Thu, 8 Jan 2009 09:09:08 +0000 (10:09 +0100)]

[OPTIM] stream_sock: do not ask for polling on EAGAIN if we have read

It is not always wise to return 0 in stream_sock_read() upon EAGAIN,
because if we have read enough data, we should consider that enough
and try again later without polling in between.

We still make a difference between small reads and large reads though.
Small reads still lead to polling because we're sure that there's
nothing left in the system's buffers if we read less than one MSS.

Willy Tarreau [Wed, 7 Jan 2009 23:09:41 +0000 (00:09 +0100)]

[MEDIUM] i/o: rework ->to_forward and ->send_max

The way the buffers and stream interfaces handled ->to_forward was
really not handy for multiple reasons. Now we've moved its control
to the receive-side of the buffer, which is also responsible for
keeping send_max up to date. This makes more sense as it now becomes
possible to send some pre-formatted data followed by forwarded data.

The following explanation has also been added to buffer.h to clarify
the situation. Right now, tests show that the I/O is behaving extremely
well. Some work will have to be done to adapt existing splice code
though.

/* Note about the buffer structure

   The buffer contains two length indicators, one to_forward counter and one
   send_max limit. First, it must be understood that the buffer is in fact
   split in two parts :
     - the visible data (->data, for ->l bytes)
     - the invisible data, typically in kernel buffers forwarded directly from
       the source stream sock to the destination stream sock (->splice_len
       bytes). Those are used only during forward.

   In order not to mix data streams, the producer may only feed the invisible
   data with data to forward, and only when the visible buffer is empty. The
   consumer may not always be able to feed the invisible buffer due to platform
   limitations (lack of kernel support).

   Conversely, the consumer must always take data from the invisible data first
   before ever considering visible data. There is no limit to the size of data
   to consume from the invisible buffer, as platform-specific implementations
   will rarely leave enough control on this. So any byte fed into the invisible
   buffer is expected to reach the destination file descriptor, by any means.
   However, it's the consumer's responsibility to ensure that the invisible
   data has been entirely consumed before consuming visible data. This must be
   reflected by ->splice_len. This is very important as this and only this can
   ensure strict ordering of data between buffers.

   The producer is responsible for decreasing ->to_forward and increasing
   ->send_max. The ->to_forward parameter indicates how many bytes may be fed
   into either data buffer without waking the parent up. The ->send_max
   parameter says how many bytes may be read from the visible buffer. Thus it
   may never exceed ->l. This parameter is updated by any buffer_write() as
   well as any data forwarded through the visible buffer.

   The consumer is responsible for decreasing ->send_max when it sends data
   from the visible buffer, and ->splice_len when it sends data from the
   invisible buffer.

   A real-world example is an HTTP response waiting in a
   buffer to be forwarded. We know the header length (300) and the amount of
   data to forward (content-length=9000). The buffer already contains 1000
   bytes of data after the 300 bytes of headers. Thus the caller will set
   ->send_max to 300 indicating that it explicitly wants to send those data,
   and set ->to_forward to 9000 (content-length). This value must be normalised
   immediately after updating ->to_forward : since there are already 1300 bytes
   in the buffer, 300 of which are already counted in ->send_max, and that size
   is smaller than ->to_forward, we must update ->send_max to 1300 to flush the
   whole buffer, and reduce ->to_forward to 8000. After that, the producer may
   try to feed the additional data through the invisible buffer using a
   platform-specific method such as splice().
*/
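
The normalisation step in the example above can be reproduced with a small model. The structure and the function name are hypothetical and keep only the three fields involved; this is a sketch of the arithmetic in the comment, not the code from buffer.h.

```c
/* Hypothetical, trimmed-down model of the fields discussed above. */
struct buf_model {
    unsigned int l;           /* bytes present in the visible buffer */
    unsigned int send_max;    /* bytes the consumer may send from it */
    unsigned int to_forward;  /* bytes still to forward autonomously */
};

/* Schedule `bytes` for forwarding. Data already sitting in the buffer
 * but not yet counted in send_max is taken from that budget first;
 * only the remainder is left in to_forward. */
static void schedule_forward(struct buf_model *b, unsigned int bytes)
{
    unsigned int pending = b->l - b->send_max;
    unsigned int take = pending < bytes ? pending : bytes;

    b->send_max += take;
    b->to_forward += bytes - take;
}
```

Starting from l = 1300 and send_max = 300, scheduling 9000 bytes leaves send_max at 1300 and to_forward at 8000, matching the figures in the comment.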

Willy Tarreau [Wed, 7 Jan 2009 19:10:39 +0000 (20:10 +0100)]

[MEDIUM] stream_sock: factor out the return path in case of no-writes

Previously, we wrote nothing only if the buffer was empty. Now with
send_max, we can also write nothing because we are not allowed to send
anything due to send_max.

The code starts to look like spaghetti. It needs to be rearranged a
lot before merging the splice patches.

Willy Tarreau [Wed, 7 Jan 2009 18:33:39 +0000 (19:33 +0100)]

[MINOR] add the splice_len member to the buffer struct in preparation of splice support

In preparation of splice support, let's add the splice_len member
to the buffer struct. An earlier implementation made it conditional,
which made the whole logic very complex due to a large number of
ifdefs.

Now BF_EMPTY is only set once both buf->l and buf->splice_len are
zero. splice_len is initialized to zero during buffer creation and
is currently not changed, so the whole logic remains unaffected.

When splice gets merged, splice_len will reflect the number of bytes
in flight out of the buffer but not yet sent, typically in a pipe for
the Linux case.
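
The emptiness rule can be sketched as follows. The structure, the flag value, and the helper name are hypothetical; only the condition itself (both the visible length and splice_len must be zero) comes from the message above.

```c
#define SKETCH_BF_EMPTY 0x0001u  /* hypothetical value for the flag */

/* Hypothetical model of the buffer fields involved. */
struct empty_model {
    unsigned int l;           /* bytes in the visible buffer */
    unsigned int splice_len;  /* bytes in flight outside the buffer */
    unsigned int flags;
};

/* Set the empty flag only when no data remains in either place. */
static void update_empty_flag(struct empty_model *b)
{
    if (b->l == 0 && b->splice_len == 0)
        b->flags |= SKETCH_BF_EMPTY;
    else
        b->flags &= ~SKETCH_BF_EMPTY;
}
```

As long as splice_len stays at zero, this behaves exactly like a test on the visible length alone, which is why existing logic is unaffected.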

Willy Tarreau [Sun, 14 Dec 2008 16:31:54 +0000 (17:31 +0100)]

[MAJOR] implement autonomous inter-socket forwarding

If an analyser sets buf->to_forward to a given value, that many
bytes will be forwarded between the two stream interfaces attached
to a buffer without waking the task up. The same applies once all
analysers have been released. This saves a large number of calls
to process_session() and a number of task_dequeue/queue.

Willy Tarreau [Sun, 14 Dec 2008 13:42:35 +0000 (14:42 +0100)]

[MEDIUM] enable inter-stream_interface wakeup calls

By letting the producer tell the consumer there is data to check,
and the consumer tell the producer there is some space left again,
we can cut in half the number of session wakeups.

This is also an important starting point for future splicing support.

Willy Tarreau [Sun, 14 Dec 2008 12:26:20 +0000 (13:26 +0100)]

[MINOR] add flags to indicate when a stream interface is waiting for space/data

It will soon be required to know when a stream interface is waiting for
buffer data or buffer room. Let's add two flags for that.

Willy Tarreau [Sun, 14 Dec 2008 08:04:47 +0000 (09:04 +0100)]

[MEDIUM] indicate when we don't care about read timeout

Sometimes we don't care about a read timeout, for instance, from the
client when waiting for the server, but we still want the client to
be able to read.

Till now it was done by artificially forcing the read timeout to ETERNITY.
But this will cause trouble when we want the low level stream sock to
communicate without waking the session up. So we add a BF_READ_NOEXP
flag to indicate that when the read timeout is to be set, it might
have to be set to ETERNITY.

Since BF_READ_ENA was not used, we replaced this flag.

Willy Tarreau [Sat, 13 Dec 2008 21:25:59 +0000 (22:25 +0100)]

[MEDIUM] don't report buffer timeout when there is I/O activity

We don't want to report a buffer timeout if there was I/O activity
for the same events. That way we'll not have to always re-arm timeouts
on I/O, without the fear of a timeout triggering too fast.

Willy Tarreau [Sat, 13 Dec 2008 20:12:26 +0000 (21:12 +0100)]

[MEDIUM] add a send limit to a buffer

For keep-alive, line-mode protocols and splicing, we will need to
limit the sender to process a certain amount of bytes. The limit
is automatically set to the buffer size when analysers are detached
from the buffer.

Willy Tarreau [Sun, 14 Dec 2008 10:44:04 +0000 (11:44 +0100)]

[MINOR] transfer errors were not reported anymore in data phase

Willy Tarreau [Tue, 23 Dec 2008 22:13:55 +0000 (23:13 +0100)]

[BUG] "option transparent" is for backend, not frontend !

"option transparent" was set and checked on frontends only while it
is purely a backend thing as it replaces the "balance" mode. For this
reason, it did only work in "listen" sections. This change will then
not affect the rare users of this option.

Willy Tarreau [Sun, 21 Dec 2008 12:00:41 +0000 (13:00 +0100)]

[BUG] check timeout must not be changed if timeout.check is not set

Since the introduction of the new ticks-based scheduler, this causes
health checks to stop after some time because a check timeout is set to eternity.
This fix must be merged into master but not in earlier versions
as it only affects the new scheduler.
(cherry picked from commit e349eb452b655dc1adc059f05ba8b36565753393)

Willy Tarreau [Sun, 7 Dec 2008 23:16:21 +0000 (00:16 +0100)]

[MINOR] stats: indicate if a task is running in "show sess"

It's sometimes useful to know that a task is currently running.

Willy Tarreau [Thu, 4 Dec 2008 08:33:58 +0000 (09:33 +0100)]

[BUG] do not dequeue the backend's pending connections on a dead server

Kai Krueger found that the previous patch was incomplete, because there is
an unconditional call to process_srv_queue() in session_free() which
still causes a dead server to consume pending connections from the
backend.

This call was made unconditional so that we don't leave unserved
connections in the server queue, for instance connections coming
in with "option persist" which can bypass the server status check.
However, the server must not touch the backend's queue if it is down.

Another fear was that some connections might remain unserved when
the server is using a dynamic maxconn if the number of connections
to the backend is too low. Right now, srv_dynamic_maxconn() ensures
this cannot happen, so the call can remain conditional.

The fix consists in allowing a server to process its own queue whatever
its state, but not to touch the backend's queue if it is down. Its
queue should normally be empty when the server is down because it is
redistributed when the server goes down. The only remaining cases are
precisely the persistent connections with "option persist" set, coming
in after the queue has been redispatched. Those ones must still be
processed when a connection terminates.
(cherry picked from commit cd485c44807bfcdb4928dd83c1907636b4e1b6f3)

Willy Tarreau [Sun, 30 Nov 2008 20:51:58 +0000 (21:51 +0100)]

[BUG] do not dequeue requests on a dead server

Kai Krueger reported a problem when a server goes down with active
connections. A lot of connections were drained by that server. Kai
did an amazing job at tracking this bug down to the dequeuing
mechanism which forgets to check the server state before allowing
a request to be sent to a server.

The problem occurs more often with long requests, which have a chance
to complete after the server is completely marked down, and to find
requests in the global queue which have not yet been fetched by other
servers.

The fix consists in ensuring that a server is up before sending it
any new request from the queue.
(cherry picked from commit 80b286a064eaec828b7fd10e98e3f945e8b244f3)
(cherry picked from commit 2e5e0d2853f059a1d09dc81fdbbad9fd03124a98)

Willy Tarreau [Wed, 19 Nov 2008 20:15:17 +0000 (21:15 +0100)]

[MINOR] redirect: in prefix mode a "/" means not to change the URI

If the prefix is set to "/", it means the user does not want to alter
the original URI, so we don't want to insert a new slash before the
original URI.

(cherry-picked from commit 02a35c74942c1bce762e996698add1270e6a5030)
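
The prefix rule can be sketched with a small helper. The function name and signature are hypothetical and do not reflect how haproxy builds its Location header; the sketch only shows that a prefix of "/" is treated as empty so that no extra slash is inserted before the original URI.

```c
#include <stdio.h>
#include <string.h>

/* Write "<prefix><uri>" into dst. A prefix of "/" means the URI must
 * be left unchanged. Returns the length written, or -1 if dst is too
 * small. */
static int build_location(char *dst, size_t size,
                          const char *prefix, const char *uri)
{
    int n;

    if (strcmp(prefix, "/") == 0)
        prefix = "";
    n = snprintf(dst, size, "%s%s", prefix, uri);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Without the special case, a prefix of "/" and a URI of "/index.html" would produce "//index.html".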

Willy Tarreau [Wed, 19 Nov 2008 20:07:09 +0000 (21:07 +0100)]

[MINOR] redirect: add support for "set-cookie" and "clear-cookie"

It is now possible to set or clear a cookie during a redirection. This
is useful for logout pages, or for protecting against some DoSes. Check
the documentation for the options supported by the "redirect" keyword.

(cherry-picked from commit 4af993822e880d8c932f4ad6920db4c9242b0981)

Willy Tarreau [Wed, 19 Nov 2008 19:03:04 +0000 (20:03 +0100)]

[MINOR] redirect: add support for the "drop-query" option

If "drop-query" is present on a "redirect" line using the "prefix" mode,
then the returned Location header will be the request URI without the
query-string. This may be used on some login/logout pages, or when it
must be decided to redirect the user to a non-secure server.

(cherry-picked from commit f2d361ccd73aa16538ce767c766362dd8f0a88fd)
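
Dropping the query string amounts to cutting the URI at the first question mark. The helper below is a hypothetical illustration of that step, not the haproxy code.

```c
#include <string.h>

/* Return the length of `uri` up to, but not including, the first '?'.
 * If there is no query string, the whole URI length is returned. */
static size_t uri_len_without_query(const char *uri)
{
    const char *q = strchr(uri, '?');

    return q ? (size_t)(q - uri) : strlen(uri);
}
```

A caller would then copy only that many bytes of the request URI into the Location header.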

Willy Tarreau [Sun, 16 Nov 2008 06:40:34 +0000 (07:40 +0100)]

[BUG] critical errors should be reported even in daemon mode

Josh Goebel reported that haproxy silently dies when it fails to
chroot. In fact, it does so when in daemon mode, because daemon
mode has been disabling output for ages.

Since the code has been reworked, this could have been changed
because there is no reason for this anymore, hence this patch.
(cherry picked from commit 304d6fb00fe32fca1bd932a301d4afb7d54c92bc)
(cherry picked from commit 50b7f7f12c67322c793f50a6be009f0fd0eec1bb)

Jeremy Hinegardner [Sun, 16 Nov 2008 00:29:03 +0000 (17:29 -0700)]

[BUILD] fix MANDIR default location to match documentation

I found this while building for Fedora.
(cherry picked from commit a2b53f8831b84b7c8647d7e960b84defd3bcbfa8)
(cherry picked from commit 2cac232b966a252951073d7b1a4bba4c4a730978)

Jeffrey 'jf' Lim [Sat, 4 Oct 2008 16:07:00 +0000 (18:07 +0200)]

[MINOR] cfgparse: fix off-by 2 in error message size

was just looking through the source, and noticed this... :)
(cherry picked from commit 63b76be713784f487e8d0c859a85513642fe7bdc)
(cherry picked from commit a801db6c5ea750f93a3795dbb2e70c03e05bbef4)

Willy Tarreau [Fri, 17 Oct 2008 10:01:58 +0000 (12:01 +0200)]

[BUG] cookie capture is declared in the frontend but checked on the backend

Cookie capture would only work by pure luck on the request but
never worked on responses since only the backend was checked. The fix
consists in always checking frontend for cookie captures.
(cherry picked from commit a83c5ba9315a7c47cda2698280b7e49a9d3eb374)

Willy Tarreau [Sun, 12 Oct 2008 15:26:37 +0000 (17:26 +0200)]

[BUG] acl-related keywords are not allowed in defaults sections

Using an ACL-related keyword in the defaults section causes a
segfault during parsing because the list headers are not initialized.
We must initialize list headers for default instance and reject
keywords relying on ACLs.
(cherry picked from commit 1c90a6ec20946a713e9c93995a8e91ed3eeb9da4)
(cherry picked from commit eb8131b4e418b838b2d62d991d91d94482ba49de)

Willy Tarreau [Sun, 12 Oct 2008 10:07:48 +0000 (12:07 +0200)]

[BUG] ensure that listeners from disabled proxies are correctly unbound.

There is a problem when an instance is marked "disabled". Its ports are
still bound but will not be unbound upon termination. This causes processes
to accumulate during soft restarts, and might even cause failures to restart
new ones due to the inability to bind to the same port.

The ideal solution would be to bind all ports at the end of the configuration
parsing. An acceptable workaround is to unbind all listeners of disabled
proxies. This is what the current patch does.
(cherry picked from commit a944218e9c1d5ff1aca34609146389dc680335b7)
(cherry picked from commit 8cfebbb82b87345bade831920177077e7d25840a)

commit | commitdiff | tree

Willy Tarreau [Fri, 10 Oct 2008 15:51:34 +0000 (17:51 +0200)]

[BUG] do not try to pause backends during reload

During a configuration reload, haproxy tried to pause all proxies.
Unfortunately, it also tried to pause backends, which would fail
and cause trouble to the new process since the port was still bound.

(backported from commit eab5c70f93c0a44223f706f6c120ad8d59f28796)
(cherry picked from commit ac1ca38e9b07422e21b5b4778918d243768e5498)

commit | commitdiff | tree

Willy Tarreau [Sun, 14 Sep 2008 15:43:27 +0000 (17:43 +0200)]

[BUG] dynamic connection throttling could return a max of zero conns

srv_dynamic_maxconn() is clearly documented as returning at least 1
possible connection under throttling. But the computation was wrong,
the minimum 1 was divided and got lost in case of very low maxconns.

Apply the MAX(1, max) before returning the result in order to ensure
that a newly appeared server will get some traffic.
(cherry picked from commit 819970098f134453c0934047b3bd3440b0996b55)
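
The effect of applying the clamp last can be shown with a small sketch. This is not the actual srv_dynamic_maxconn() code: the function name `throttled_maxconn` and the percentage-based ratio are assumptions made for the example.

```c
/* Hypothetical sketch, not HAProxy's actual srv_dynamic_maxconn(). */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Scale a connection limit by a throttling ratio expressed in percent
 * (0..100). The clamp is applied after the division, so the integer
 * division can never bring the result down to zero.
 */
static unsigned int throttled_maxconn(unsigned int maxconn,
                                      unsigned int ratio_pct)
{
	unsigned int max = maxconn * ratio_pct / 100U;

	return MAX(1U, max);
}
```

If the minimum were applied before the division, a maxconn of 5 at 10% would round down to zero and the server would receive no traffic at all.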

commit | commitdiff | tree

Willy Tarreau [Sun, 14 Sep 2008 15:40:09 +0000 (17:40 +0200)]

[BUG] do not release the connection slot during a retry

(forward-port of commit 8262d8bd7fdb262c980bd70cb2931e51df07513f)

A bug was introduced during the last queue management fix. If a server
connection fails, the allocated connection slot is released, but it
will be needed again after the turn-around. This also causes more
connections than expected to go to the server because it appears to
have fewer connections than it really has.

Many thanks to Rupert Fiasco, Mark Imbriaco, Cody Fauser, Brian
Gupta and Alexander Staubo for promptly providing configuration
and diagnosis elements to help reproduce this problem easily.

commit | commitdiff | tree

Jeffrey 'jf' Lim [Wed, 3 Sep 2008 17:03:03 +0000 (01:03 +0800)]

[MINOR] acl: add new keyword "connslots"

I'm in the process of setting up one haproxy instance now, and I find
the following acl option useful. I'm not too sure why this option has
not been available before, but I find this useful for my own usage, so
I'm submitting this patch in the hope that it will be useful as well.

The basic idea is to be able to measure the connection slots still
available (connection + queue); anything beyond that can be
redirected to a different backend. 'connslots' = number of available
server connection slots + number of available server queue slots. In
the case where we encounter srv maxconn = 0, or srv maxqueue = 0 (in
which case we don't need to care about connslots), the value you get is
-1. Note also that this code does not take care of dynamic connections
at this point in time.

The reason why I'm using this new acl (as opposed to 'nbsrv') is that
'nbsrv' only reflects whether servers are actually *down*, whereas this
other acl is more fine-grained, and looks into the number of conn
slots available as well.
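
The computation described above can be sketched as follows. The structure and field names are hypothetical simplifications, and summing over all servers of a backend is an assumption of this example rather than something stated in the message.

```c
#include <stddef.h>

/* Hypothetical, simplified server record: only the fields needed here. */
struct srv_slots {
	int maxconn;   /* 0 means no connection limit */
	int cur_conn;  /* connections currently in use */
	int maxqueue;  /* 0 means no queue limit */
	int cur_queue; /* requests currently queued */
};

/* Return the total number of free connection and queue slots across
 * <count> servers, or -1 as soon as one server has no limit, since the
 * number of free slots is then unbounded.
 */
static int connslots(const struct srv_slots *srv, size_t count)
{
	int total = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (srv[i].maxconn == 0 || srv[i].maxqueue == 0)
			return -1;
		total += (srv[i].maxconn - srv[i].cur_conn)
		       + (srv[i].maxqueue - srv[i].cur_queue);
	}
	return total;
}
```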

commit | commitdiff | tree

Willy Tarreau [Sun, 7 Dec 2008 21:29:48 +0000 (22:29 +0100)]

[MEDIUM] add support for "show sess" in unix stats socket

It is now possible to list all known sessions by issuing "show sess"
on the unix stats socket. The format is still rudimentary but it is
very useful for debugging.

The doc has been updated to reflect the new keyword.

commit | commitdiff | tree

Willy Tarreau [Sun, 7 Dec 2008 19:16:23 +0000 (20:16 +0100)]

[MINOR] add back-references to sessions for later use by a dumper.

This is the first step in implementing a session dump tool.
A session dump will need restart points. It will be necessary for
it to get references to sessions which can be moved when the session
dies.

The principle is not that complex : when a session ends, it looks for
any potential back-references. If it finds any, then it moves them to
the next session in the list. The dump function will of course have
to restart from that new point.
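
The principle can be sketched with a simplified singly-linked session list. The types and function names below are hypothetical and only illustrate the hand-over of back-references; the real structures are different.

```c
#include <stddef.h>

struct sess;

/* A back-reference held by a dumper: it records where to resume. */
struct bref {
	struct sess *ref;  /* session the dumper will resume from */
	struct bref *next; /* next back-reference on the same session */
};

struct sess {
	struct sess *next;  /* next session in the global list */
	struct bref *brefs; /* back-references pointing at this session */
};

/* Attach <b> to session <s> so that <s> knows a dumper points to it. */
static void bref_attach(struct bref *b, struct sess *s)
{
	b->ref = s;
	b->next = s->brefs;
	s->brefs = b;
}

/* Called when <s> ends: hand every back-reference over to the next
 * session, or leave it pointing to NULL at the end of the list.
 */
static void sess_release_brefs(struct sess *s)
{
	struct bref *b;

	while ((b = s->brefs) != NULL) {
		s->brefs = b->next;
		b->ref = s->next;
		b->next = NULL;
		if (s->next != NULL)
			bref_attach(b, s->next);
	}
}
```

A dumper that was interrupted on a session which has since died then simply restarts from `ref`, which is either the following session or NULL when the end of the list was reached.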

commit | commitdiff | tree

Willy Tarreau [Sun, 7 Dec 2008 19:00:15 +0000 (20:00 +0100)]

[MINOR] add a new back-reference type : struct bref

This type will be used to maintain back-references to items which
are subject to move between accesses. Typical usage includes session
removal during a listing.

commit | commitdiff | tree

Willy Tarreau [Sun, 7 Dec 2008 17:30:00 +0000 (18:30 +0100)]

[MINOR] slightly rebalance stats_dump_{raw,http}

Both should process the response buffer equally. They now both
clear the hijack bit once done, and both receive a pointer to
the response buffer in their arguments.

commit | commitdiff | tree

Willy Tarreau [Sun, 7 Dec 2008 17:03:29 +0000 (18:03 +0100)]

[MEDIUM] reference the current hijack function in the buffer itself

Instead of calling a hard-coded function to produce data, let's
reference this function into the buffer and call it from there
when BF_HIJACK is set. This goes in the direction of more generic
session management code.
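
As a rough illustration of the idea, the sketch below stores the producer as a function pointer in the buffer and calls it only while the hijack bit is set. The flag value, the field names and the helper functions are assumptions of this example, not the real definitions.

```c
#include <stddef.h>

#define BF_HIJACK 0x0001U /* illustrative value, not the real bit */

struct buffer;
typedef void (*hijack_fn)(struct buffer *);

struct buffer {
	unsigned int flags;
	hijack_fn hijacker; /* data producer to call while BF_HIJACK is set */
	int produced;       /* stand-in for real buffer contents */
};

/* Register <fn> as the producer and mark the buffer as hijacked. */
static void buffer_install_hijacker(struct buffer *b, hijack_fn fn)
{
	b->hijacker = fn;
	b->flags |= BF_HIJACK;
}

/* Generic code: it no longer needs to know which producer is in use. */
static void buffer_run_hijacker(struct buffer *b)
{
	if ((b->flags & BF_HIJACK) && b->hijacker != NULL)
		b->hijacker(b);
}

/* Example producer: emits one chunk, then clears the hijack bit. */
static void produce_once(struct buffer *b)
{
	b->produced++;
	b->flags &= ~BF_HIJACK;
}
```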

commit | commitdiff | tree

Willy Tarreau [Sun, 7 Dec 2008 15:45:10 +0000 (16:45 +0100)]

[MINOR] move the listener reference from fd to session

The listener referenced in the fd was only used to check the
listener state upon session termination. There was no guarantee
that the FD had not been reassigned by the time it was processed,
so this was a bit racy. Having it in the session is more robust.

commit | commitdiff | tree

Willy Tarreau [Sun, 7 Dec 2008 15:27:56 +0000 (16:27 +0100)]

[MEDIUM] remove cli_fd, srv_fd, cli_state and srv_state from the session

Those were previously used by the unix sockets only, and could be
removed.

commit | commitdiff | tree

Willy Tarreau [Sun, 7 Dec 2008 15:06:43 +0000 (16:06 +0100)]

[MAJOR] make unix sockets work again with stats

The unix protocol handler had not been updated during the last
stream_sock changes. This has been done now. There is still a
lot of duplicated code between session.c and proto_uxst.c due
to the way the session is handled. Session.c relies on the existence
of a frontend while it does not exist here.

It is easier to see the difference between the stats part (placed
in dumpstats.c) and the unix-stream part (in proto_uxst.c).

The hijacking function still needs to be dynamically set into the
response buffer, and some cleanup is still required, then all those
changes should be forward-ported to the HTTP part. Adding support
for new keywords should not cause trouble now.

commit | commitdiff | tree

Willy Tarreau [Sun, 7 Dec 2008 13:37:09 +0000 (14:37 +0100)]

[MINOR] add an analyser state in struct session

It will be very convenient to have an analyser state in the session.
It will always be initialized to zero. The analysers can make use of
it, but must reset it to zero when they leave.

commit | commitdiff | tree

Willy Tarreau [Sun, 7 Dec 2008 13:04:04 +0000 (14:04 +0100)]

[MEDIUM] ensure that sock->shutw() also closes read for init states

Non-connected states will never have a chance to receive a shutr event,
so we need to propagate the shutw across the stream interface.

commit | commitdiff | tree

Willy Tarreau [Sun, 7 Dec 2008 12:05:04 +0000 (13:05 +0100)]

[BUG] do not forward close from cons to prod with analysers

We must not forward a close from consumer to producer as long as
an analyser is present.

commit | commitdiff | tree

Willy Tarreau [Sun, 7 Dec 2008 10:50:35 +0000 (11:50 +0100)]

[MINOR] pre-set analyser flags on the listener at registration time

In order to achieve more generic accept() code, we can set the request
analysers at the listener registration time. It's better than doing it
during accept(), and allows more code reuse.

commit | commitdiff | tree

Willy Tarreau [Sun, 7 Dec 2008 10:28:08 +0000 (11:28 +0100)]

[MINOR] add an analyser code for UNIX stats request

The UNIX stats socket will be analysed like any other protocol. Add
an analyser for it.

commit | commitdiff | tree

Willy Tarreau [Mon, 1 Dec 2008 00:44:25 +0000 (01:44 +0100)]

[MEDIUM] first pass of lifting to proto_uxst.c:uxst_event_accept()

The accept function must be adapted to the new framework. It is
still broken, and calling it will still result in a segfault. But
this cleanup is needed anyway.

commit | commitdiff | tree

Willy Tarreau [Mon, 1 Dec 2008 00:35:40 +0000 (01:35 +0100)]

[MINOR] declare process_session in session.h, not proto_http.h

commit | commitdiff | tree

Willy Tarreau [Sun, 30 Nov 2008 23:08:28 +0000 (00:08 +0100)]

[BUG] fix forgotten server session counter

The server session counter was forgotten when the session establishes.

commit | commitdiff | tree

Willy Tarreau [Sun, 30 Nov 2008 22:51:27 +0000 (23:51 +0100)]

[MEDIUM] rename process_request to http_process_request

Now the function only processes HTTP requests and nothing else. Also
pass the request buffer to it.

commit | commitdiff | tree

Willy Tarreau [Sun, 30 Nov 2008 22:36:37 +0000 (23:36 +0100)]

[MEDIUM] move the HTTP request body analyser out of process_request().

A new function http_process_request_body() has been created to process
the request body. Next step is now to clean up process_request().

commit | commitdiff | tree

Willy Tarreau [Sun, 30 Nov 2008 22:28:40 +0000 (23:28 +0100)]

[MEDIUM] extract the HTTP tarpit code from process_request().

The tarpit is now an autonomous, independent analyser.

commit | commitdiff | tree

Willy Tarreau [Sun, 30 Nov 2008 22:15:34 +0000 (23:15 +0100)]

[MEDIUM] extract TCP request processing from HTTP

The TCP analyser has moved to proto_tcp.c. Breaking the function
has required finer use of the return value and adding some tests
to process_session().

commit | commitdiff | tree

Willy Tarreau [Sun, 30 Nov 2008 20:37:12 +0000 (21:37 +0100)]

[MINOR] stream_sock_data_finish() should not expose fd

stream_sock_data_finish was still using a file descriptor as only
argument, while a stream interface is preferred. This is now fixed.

commit | commitdiff | tree

Willy Tarreau [Sun, 30 Nov 2008 20:13:54 +0000 (21:13 +0100)]

[CLEANUP] session.c: removed some migration left-overs in sess_establish()

A few obsolete fd manipulations were left in sess_establish. Obviously
they must go away.

commit | commitdiff | tree

Willy Tarreau [Sun, 30 Nov 2008 19:44:17 +0000 (20:44 +0100)]

[MEDIUM] make the http server error function a pointer in the session

It was a bit awkward to have session.c call return_srv_error() for
HTTP error messages related to servers. The function has been adapted
to be passed a pointer to the faulty stream interface, and is now a
pointer in the session. It is possible that in the future, it will
become a callback in the stream interface itself.

commit | commitdiff | tree

Willy Tarreau [Sun, 30 Nov 2008 19:20:08 +0000 (20:20 +0100)]

[MINOR] replace srv_close_with_err() with http_server_error()

The new function looks like the previous one except that it operates
at the stream interface level and assumes an already closed SI.

Also remove some old unused occurrences of srv_close_with_err().

commit | commitdiff | tree

Willy Tarreau [Sun, 30 Nov 2008 18:48:07 +0000 (19:48 +0100)]

[MINOR] replace client_retnclose() with stream_int_retnclose()

This makes more sense to return a message to a stream interface
than to a session.

senddata.{c,h} have been removed.

commit | commitdiff | tree

Willy Tarreau [Sun, 30 Nov 2008 18:22:53 +0000 (19:22 +0100)]

[MINOR] replace the ambiguous client_return function by stream_int_return

This one applies to a stream interface, which makes more sense.

commit | commitdiff | tree

Willy Tarreau [Sun, 30 Nov 2008 18:02:32 +0000 (19:02 +0100)]

[MINOR] call session->do_log() for logging

In order to avoid having to call per-protocol logging function directly
from session.c, it's better to assign the logging function when the session
is created. This also eliminates a test when the function is needed, and
opens the way to more complete logging functions.

commit | commitdiff | tree

Willy Tarreau [Sun, 30 Nov 2008 17:47:21 +0000 (18:47 +0100)]

[CLEANUP] move the session-related functions to session.c

proto_http.c was not suitable for session-related processing, it was
just convenient for the transformation.

Some more splitting must occur: process_request/response in proto_http.c
must be split again per protocol, and the caller must run a list.

Some functions should be directly attached to the session or the buffer
(eg: perform_http_redirect, return_srv_error, http_sess_log).

commit | commitdiff | tree

Willy Tarreau [Sun, 30 Nov 2008 17:14:12 +0000 (18:14 +0100)]

[MAJOR] complete layer4/7 separation

All the processing has now completely been split in layers. As of
now, everything is still in process_session() which is not the right
place, but the code sequence works. Timeouts, retries, errors, all
work.

The shutdown sequence has been strictly applied: BF_SHUTR/BF_SHUTW
are only assigned by lower layers. Upper layers can only indicate
their wish to close using BF_SHUTR_NOW and BF_SHUTW_NOW.

When a shutdown is performed on a stream interface, the buffer flags
are updated accordingly and re-checked by upper layers. A lot of care
has been taken to ensure that aborts during intermediate connection
setups are correctly handled and shutdowns correctly propagated to
both buffers.
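
The split of responsibilities between layers can be sketched as below. The bit values and function names are illustrative assumptions; only the four flag names come from the message above.

```c
/* Illustrative bit values; the real ones may differ. */
#define BF_SHUTR     0x01U
#define BF_SHUTW     0x02U
#define BF_SHUTR_NOW 0x04U
#define BF_SHUTW_NOW 0x08U

struct buffer {
	unsigned int flags;
};

/* Upper layer: may only express the wish to close for writes. */
static void request_shutw(struct buffer *b)
{
	b->flags |= BF_SHUTW_NOW;
}

/* Lower layer: performs the shutdowns and records the real state.
 * Only this function is allowed to set BF_SHUTR and BF_SHUTW.
 */
static void si_apply_shutdowns(struct buffer *b)
{
	if ((b->flags & BF_SHUTW_NOW) && !(b->flags & BF_SHUTW)) {
		/* real code would shut the socket down for writes here */
		b->flags |= BF_SHUTW;
	}
	if ((b->flags & BF_SHUTR_NOW) && !(b->flags & BF_SHUTR)) {
		/* real code would shut the socket down for reads here */
		b->flags |= BF_SHUTR;
	}
	b->flags &= ~(BF_SHUTR_NOW | BF_SHUTW_NOW);
}
```

After the lower layer has run, upper layers re-check the flags and see the actual state instead of the state they asked for.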

A future evolution would consist in ensuring that BF_SHUT?_NOW may
be set at any time, and applies only when the buffer is empty. This
might help with error messages, but might complicate the processing
of data remaining in buffers.

Some useless buffer flag combinations have been removed.

Stat counters are still broken (eg: per-server total number of sessions).

Error messages should be delayed to the close instant and be produced by
protocol.

Many functions must now move to proper locations.

commit | commitdiff | tree

Willy Tarreau [Thu, 27 Nov 2008 09:30:51 +0000 (10:30 +0100)]

[MEDIUM] make the stream interface control the SHUT{R,W} bits

It's better that the stream interface controls the BF_SHUT* bits so
that they always reflect the real state of the interface.

commit | commitdiff | tree

Willy Tarreau [Thu, 27 Nov 2008 08:25:45 +0000 (09:25 +0100)]

[MEDIUM] process shutw during connection attempt

It sometimes happens that a connection is aborted at the exact same moment
it establishes. We have to close the socket and not only to shut it down
for writes.

Some corner cases remain. We have to handle the shutr/shutw at the stream
interface and only report the status to the buffer, not the opposite.

commit | commitdiff | tree

Willy Tarreau [Sun, 23 Nov 2008 20:33:29 +0000 (21:33 +0100)]

[BUG] shutw must imply close during a connect

The sessions which remained stuck were connecting to the server
when they received a shutw, which caused them to partially
stop. A shutw() during a connect() must imply a close().

commit | commitdiff | tree

Willy Tarreau [Sun, 23 Nov 2008 18:53:55 +0000 (19:53 +0100)]

[MINOR] maintain a global session list in order to ease debugging

Now the global variable 'sessions' will be a doubly-linked list of all
known sessions. The list element is set at the beginning of the session
so that it's easier to follow them all with gdb.

commit | commitdiff | tree

Willy Tarreau [Sun, 23 Nov 2008 18:31:35 +0000 (19:31 +0100)]

[MEDIUM] remove stream_sock_update_data()

Two new functions are used instead : buffer_check_{shutr,shutw}.
It is indeed more adequate to check for new closures only when the
buffer reports them.

Several remaining unclosed connections were detected after a test,
even before this patch, so a bug remains. To reproduce, run the
following for 30 seconds :

inject30l4 -n 20000 -l -t 1000 -P 10 -o 4 -u 100 -s 100 -G 127.0.0.1:8000/

commit | commitdiff | tree

Willy Tarreau [Sun, 23 Nov 2008 16:23:07 +0000 (17:23 +0100)]

[MEDIUM] stream_interface: added a DISconnected state between CON/EST and CLO

There were rare situations where it was not easy to detect that a failed
session attempt had occurred and needed some server cleanup. In particular,
client aborts sometimes lead to session leaks on the server side.

A new state "SI_ST_DIS" (disconnected) has been introduced for this. When
a session has been closed at a stream interface but the server cleanup has
not occurred, this state is entered instead of CLO. The cleanup is then
performed there and the state goes to CLO.

A new diagram has been added to show possible stream_interface state
transitions that can occur in a stream-sock. It makes debugging easier.
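
A reduced state machine may help to see where the new state fits. Only the state names come from the message above; the subset of states, their order, and the helper functions are assumptions of this sketch.

```c
/* Subset of the stream interface states; the order is illustrative. */
enum si_state { SI_ST_CON, SI_ST_EST, SI_ST_DIS, SI_ST_CLO };

struct conn {
	enum si_state state;
	int srv_cleanup_pending; /* server-side accounting not yet undone */
	int cleanups_done;       /* counts how many times cleanup ran */
};

/* Close at the stream interface level: go through DIS whenever some
 * server cleanup is still owed, otherwise go straight to CLO.
 */
static void si_close(struct conn *c)
{
	if (c->state == SI_ST_CON || c->state == SI_ST_EST)
		c->state = c->srv_cleanup_pending ? SI_ST_DIS : SI_ST_CLO;
}

/* Session processing: DIS is the single place where cleanup happens,
 * so it cannot run twice for the same connection.
 */
static void si_process(struct conn *c)
{
	if (c->state == SI_ST_DIS) {
		c->cleanups_done++;
		c->srv_cleanup_pending = 0;
		c->state = SI_ST_CLO;
	}
}
```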

commit | commitdiff | tree

Willy Tarreau [Wed, 12 Nov 2008 00:51:41 +0000 (01:51 +0100)]

[MEDIUM] continue layering cleanups.

The server sessions are now only decremented when entering SI_ST_CER
and SI_ST_CLO states. A state is clearly missing between EST and CLO,
or after CLO (eg: END), because many cleanups are performed upon CLO
and must rely on tricks to ensure being done only once.

The goal of next changes will be to improve what has been started.
Ideally, the FD should only notify the SI about the change, which
should itself only notify the session when it has some news or when
it needs help (eg: redispatch). The buffer's error processing should
not change the FD's status immediately, otherwise we risk race conds
between a pending connect and a shutw (for instance). Also, the new
connect attempt should only be made after layer 7 and all the crap
above buffers.

commit | commitdiff | tree

Willy Tarreau [Tue, 11 Nov 2008 19:20:02 +0000 (20:20 +0100)]

[MEDIUM] add the SN_CURR_SESS flag to the session to track open sessions

It is quite hard to track when the current session has already been counted
or discounted from the server's total number of established sessions. For
this reason, we introduce a new session flag, SN_CURR_SESS, which indicates
if the current session is one of those reported by the server or not. It
simplifies session accounting and makes it far more robust. It also makes
it possible to perform a last-minute cleanup during session_free().

Right now, with this fix and a few more buffer transition fixes, no
sessions were found to remain after a test.
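
The accounting enabled by such a flag can be sketched as follows. The flag value, the structures and the two helpers are hypothetical; the point is that the flag makes counting and discounting idempotent.

```c
#define SN_CURR_SESS 0x01U /* illustrative value, not the real bit */

struct server {
	int cur_sess; /* sessions currently established on this server */
};

struct session {
	unsigned int flags;
	struct server *srv;
};

/* Count the session on its server unless it was already counted. */
static void sess_count(struct session *s)
{
	if (!(s->flags & SN_CURR_SESS)) {
		s->flags |= SN_CURR_SESS;
		s->srv->cur_sess++;
	}
}

/* Discount the session only if it is currently counted. This is safe
 * to call again as a last-minute cleanup when the session is freed.
 */
static void sess_discount(struct session *s)
{
	if (s->flags & SN_CURR_SESS) {
		s->flags &= ~SN_CURR_SESS;
		s->srv->cur_sess--;
	}
}
```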

commit | commitdiff | tree

Willy Tarreau [Mon, 3 Nov 2008 05:26:53 +0000 (06:26 +0100)]

[MAJOR] add a connection error state to the stream_interface

Tracking connection status changes was hard, and some code was
redundant. A new SI_ST_CER state was added to the stream interface
to indicate a past connection error, and an SI_FL_ERR flag was
added to report past I/O error. The stream_sock code does not set
the connection to SI_ST_CLO anymore in case of I/O error, it's
the upper layer which does it. This makes it possible to know
exactly when the file descriptors are allocated.

The new SI_ST_CER state made it possible to split tcp_connection_status()
in two parts, one processing SI_ST_CON and the other one SI_ST_CER.
Synchronous connection errors now make use of this last state, hence
eliminating duplicate code.

Some ib<->ob copy paste errors were found and fixed, and all entities
setting SI_ST_CLO also shut the buffers down.

Some of these stream_interface specific functions and structures
have migrated to a new stream_interface.c file.

Some types of errors are still not detected by the buffers. For
instance, let's assume the following scenario in one single pass
of process_session: a connection sits in SI_ST_TAR state during
a retry. At TAR expiration, a new connection attempt is made, the
connection is obtained and srv->cur_sess is increased. Then the
buffer timeout fires and everything is cleared, the new state
becomes SI_ST_CLO. The cleaning code checks that previous state
was either SI_ST_CON or SI_ST_EST to release the connection. But
that's wrong because last state is still SI_ST_TAR. So the
server's connection count does not get decreased.

This means that prev_state must not be used, and must be replaced
by some transition detection instead of level detection.
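
One possible reading of "transition detection" is to do the accounting at each state change instead of comparing against a state saved at the start of the pass. The sketch below follows that reading; the setter `si_set_state` and the reduced state list are assumptions, not the code that was eventually written.

```c
/* Reduced state list; the order is illustrative. */
enum si_state { SI_ST_TAR, SI_ST_CON, SI_ST_EST, SI_ST_CLO };

struct srv {
	int cur_sess;
};

struct si {
	enum si_state state;
	struct srv *srv;
};

/* States in which the server's connection count includes us. */
static int holds_conn(enum si_state st)
{
	return st == SI_ST_CON || st == SI_ST_EST;
}

/* Adjust the counter on every transition, so that several state
 * changes within one pass are all accounted for.
 */
static void si_set_state(struct si *si, enum si_state new_state)
{
	if (!holds_conn(si->state) && holds_conn(new_state))
		si->srv->cur_sess++;
	else if (holds_conn(si->state) && !holds_conn(new_state))
		si->srv->cur_sess--;
	si->state = new_state;
}
```

In the scenario above, TAR to CON and then CON to CLO within a single pass each adjust the counter, so it ends up balanced even though the state seen at the start of the pass was SI_ST_TAR.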

The following debugging line was useful to track state changes :

fprintf(stderr, "%s:%d: cs=%d ss=%d(%d) rqf=0x%08x rpf=0x%08x\n", __FUNCTION__, __LINE__,
s->si[0].state, s->si[1].state, s->si[1].err_type, s->req->flags, s->rep->flags);

Mirror of https://github.com/haproxy/haproxy.git
