Willy Tarreau [Mon, 25 Jan 2010 00:49:57 +0000 (01:49 +0100)]
[MINOR] buffer_replace2 must never change the ->w entry
This function is used to move data which is located between ->w and ->r,
so it must not touch ->w, otherwise it will displace pending data which
is before the one we're actually overwriting. The issue arose in 1.4 with
some pipelined responses which cause some part of the previous one to
be chopped off when removing the connection: close header, thus
corrupting last response and shifting next one. Those are detected
in the logs because the next response will be a 502 with flags PH.
Note that this does not affect 1.3, still this is a bug that's better
fixed than blindly copy-pasted and woken up again.
Willy Tarreau [Sun, 24 Jan 2010 12:10:43 +0000 (13:10 +0100)]
[MINOR] http: logs must report persistent connections to down servers
When using "option persist" or "force-persist", we want to know from the
logs if the cookie referenced a valid server or a down server. Till here
the flag reported a valid server even if the server was down, which is
misleading. Now we correctly report that the requested server was down.
We can typically see "--DI" when using "option persist" with redispatch,
ad "SCDN" when using force-persist on a down server.
(cherry picked from commit 2a6d88dafe5e5628abb962caf395143d0c0b6024)
Willy Tarreau [Fri, 22 Jan 2010 18:10:05 +0000 (19:10 +0100)]
[MEDIUM] add the "force-persist" statement to force persistence on down servers
This is used to force access to down servers for some requests. This
is useful when validating that a change on a server correctly works
before enabling the server again.
Willy Tarreau [Fri, 22 Jan 2010 13:17:47 +0000 (14:17 +0100)]
[CLEANUP] http_server_error() must not purge a previous pending response
This can cause parts of responses to be truncated in case of
pipelined requests if the second request generates an error
before the first request is completely flushed.
Willy Tarreau [Fri, 15 Jan 2010 22:38:27 +0000 (23:38 +0100)]
[CLEANUP] buffers: remove remains of wrong obsolete length check
A check was performed in buffer_replace2() to compare buffer
length with its read pointer. This has been wrong for a long
time, though it only has an impact when dealing with keep-alive
requests/responses. In theory this should be backported but
the check has no impact without keep-alive.
(cherry picked from commit 43a7e6620b79e0e771dbaf2a60b57c96d9ba60e5)
Willy Tarreau [Fri, 15 Jan 2010 09:35:58 +0000 (10:35 +0100)]
[BUG] check: we must not check for error before reading a response
We can receive data with a notification of socket error. But we
must not check for the error before reading the data, because it
may be an asynchronous error notification that we check too early
while the response we're waiting for is available. If there is an
error, recv() will get it.
This should help with servers that close very fast after the response
and should also slightly lower the CPU usage during very fast checks
on massive amounts of servers since we eliminate one system call.
Cyril Bonté [Sun, 10 Jan 2010 16:01:47 +0000 (17:01 +0100)]
[MINOR] config: don't accept 'appsession' in defaults section
Maybe appsession should be forbidden in the 'defaults' section as it
will not work in the backends.
(cherry picked from commit 3b7a369baa189aa851bed5ea92f5ed4cb5cb4418)
Cyril Bonté [Sat, 9 Jan 2010 23:30:14 +0000 (00:30 +0100)]
[BUG] appsession: possible memory leak in case of out of memory condition
I've tried to follow all the pool_alloc2/pool_free2 calls in the code
to track memory leaks. I've found one which only happens when there's
already no more memory when allocating a new appsession cookie.
Willy Tarreau [Sat, 9 Jan 2010 23:42:19 +0000 (00:42 +0100)]
[MINOR] http redirect: add the ability to append a '/' to the URL
Sometimes it can be desired to return a location which is the same
as the request with a slash appended when there was not one in the
request. A typical use of this is for sending a 301 so that people
don't reference links without the trailing slash. The name of the
new option is "append-slash" and it can be used on "redirect"
statements in prefix mode.
Willy Tarreau [Sat, 9 Jan 2010 23:24:22 +0000 (00:24 +0100)]
[MINOR] http: fix double slash prefix with server redirect
When using server redirection, it is possible to specify a path
consisting of only one slash. While this is discouraged (risk of
loop) it may sometimes be useful combined with content switching.
The prefixing of a '/' then causes two slashes to be returned in
the response. So we now do as with the other redirects, don't
prepend a slash if it's alone.
(cherry picked from commit dcb75c4a83246f4907cdd5ffac9cbd7b71732816)
Willy Tarreau [Sun, 3 Jan 2010 18:47:39 +0000 (19:47 +0100)]
[MINOR] config: some options were missing for "redirect"
Those options were missing in the parser error message :
set-cookie, clear-cookie, drop-query
(cherry picked from commit 963abc33a2ae002b2efb1bc228ccc8dcb1c72d91)
Willy Tarreau [Sun, 3 Jan 2010 18:45:54 +0000 (19:45 +0100)]
[BUG] http: fix cookie parser to support spaces and commas in values
The cookie parser could be fooled by spaces or commas in cookie names
and values, causing the persistence cookie not to be matched if located
just after such a cookie. Now spaces found in values are considered as
part of the value, and spaces, commas and semi-colons found in values
or names, are skipped till next cookie name.
Willy Tarreau [Wed, 30 Dec 2009 00:10:35 +0000 (01:10 +0100)]
[MINOR] config: option forceclose is valid in frontends too
This option was disabled for frontends in the configuration because
it was useless in its initial implementation, though it was still
checked in the code. Let's officially enable it now.
(cherry picked from commit a31e5dff36b0b7a3c831abae1290b9df168fdd6f)
Willy Tarreau [Mon, 28 Dec 2009 17:37:54 +0000 (18:37 +0100)]
[CLEANUP] buffers: wrong size calculation for displaced data
This error was triggered by requests not starting at the beginning
of the buffer. It cannot happen with earlier versions though it might
be a good idea to fix it anyway.
(cherry picked from commit 019fd5bc932e2527c4a7bf196903aa1055537c1f)
Willy Tarreau [Mon, 28 Dec 2009 05:57:33 +0000 (06:57 +0100)]
[MINOR] http: typos on several unlikely() around header insertion
In many places where we perform header insertion, an error control
is performed but due to a mistake, it cannot match any error :
if (unlikely(error) < 0)
instead of
if (unlikely(error < 0))
This prevents error 400 responses from being sent when the buffer is
full due to many header additions. This must be backported to 1.3.
(cherry picked from commit 58cc872848314ef2ecbaf9808ce4bd5f5b20bb69)
Willy Tarreau [Sun, 6 Dec 2009 17:16:59 +0000 (18:16 +0100)]
[BUG] check_post: limit analysis to the buffer length
If "balance url_param XXX check_post" is used, we must bound the
number of bytes analysed to the buffer's length.
(cherry picked from commit dc8017ced6a8ec699a50a409f3c8ce5928ea70fa)
[BUG] config: fix erroneous check on cookie domain names, again
The previous check was correct: the RFC states that it is required
to have a domain-name which contained a dot AND began with a dot.
However, currently some (all?) browsers do not obey this specification,
so such configuration might work.
Willy Tarreau [Thu, 17 Dec 2009 20:12:16 +0000 (21:12 +0100)]
[CLEANUP] second fix for the printf format warning
Fix 500b8f0349fb52678f5143c49f5a8be5c033a988 fixed the patch for the 64 bit
case but caused the opposite type issue to appear on 32 bit platforms. Cast
the difference and be done with it since gcc does not agree on type carrying
the difference between two pointers on 32 and 64 bit platforms.
(cherry picked from commit 3ccf94efd99db5763546750729b5a81e3b7bce19)
[CLEANUP] format '%d' expects type 'int', but argument 5 has type 'long int'
src/cfgparse.c: In function 'readcfgfile':
src/cfgparse.c:4087: warning: format '%d' expects type 'int', but argument 5 has type 'long int'
(cherry picked from commit 500b8f0349fb52678f5143c49f5a8be5c033a988)
Willy Tarreau [Tue, 15 Dec 2009 20:46:25 +0000 (21:46 +0100)]
[MINOR] config: don't report error on all subsequent files on failure
Cyril Bonté found that when an error is detected in one config file, it
is also reported in all other ones, which is wrong. The fix obviously
consists in checking the return code from readcfgfile() and not the
accumulator.
(cherry picked from commit 25a67fae3e2dd374c51c5e50633ea68b08157fab)
Cyril Bonté [Sun, 6 Dec 2009 12:43:42 +0000 (13:43 +0100)]
[BUG] Configuration parser bug when escaping characters
Today I was testing headers manipulation but I met a bug with my first test.
To reproduce it, add for example this line :
rspadd Cache-Control:\ max-age=1500
Check the response header, it will provide :
Cache-Control: max-age=15000 <= the last character is duplicated
This only happens when we use backslashes on the last line of the
configuration file, without returning to the line.
Also if the last line is like :
rspadd Cache-Control:\ max-age=1500\
the last backslash causes a segfault.
This is not due to rspadd but to a more general bug in cfgparse.c :
...
if (skip) {
memmove(line + 1, line + 1 + skip, end - (line + skip + 1));
end -= skip;
}
...
should be :
...
if (skip) {
memmove(line + 1, line + 1 + skip, end - (line + skip));
end -= skip;
}
...
Willy Tarreau [Sun, 6 Dec 2009 12:10:44 +0000 (13:10 +0100)]
[BUG] config: fix error message when config file is not found
Cameron Simpson reported an annoying case where haproxy simply reports
"Error(s) found in configuration file" when the file is not found or
not readable.
Fortunately the parsing function still returns -1 in case of open
error, so we're able to detect the issue from the caller and report
the corresponding errno message.
(cherry picked from commit c438242878c8bdabffaef62dd2859920cc3e7d26)
Willy Tarreau [Mon, 30 Nov 2009 10:50:16 +0000 (11:50 +0100)]
[BUG] x-original-to: name was not set in default instance
This resulted in an empty header name when option originalto
was declared in a default sections.
(cherry picked from commit b86db34fe00ec91909dbcbf5e889bab458dc0ea8)
Alex Williams [Mon, 2 Nov 2009 02:27:13 +0000 (21:27 -0500)]
[MINOR] server tracking: don't care about the tracked server's mode
Right now, an HTTP server cannot track a TCP server and vice-versa.
This patch enables proxy tracking without relying on the proxy's mode
(tcp/http/health). It only requires a matching proxy name to exist. The
original function was renamed to findproxy_mode().
Willy Tarreau [Thu, 3 Dec 2009 22:28:34 +0000 (23:28 +0100)]
[MINOR] config: support passing multiple "domain" statements to cookies
In some environments it is not possible to rely on any wildcard for a
domain name (eg: .com, .net, .fr...) so it is required to send multiple
domain extensions. (Un)fortunately the syntax check on the domain name
prevented that from being done the dirty way. So let's just build a
domain list when multiple domains are passed on the same line.
Willy Tarreau [Mon, 9 Nov 2009 20:27:51 +0000 (21:27 +0100)]
[BUG] config: disable 'option httplog' on TCP proxies
Gabriel Sosa reported that logs were appearing with BADREQ when
'option httplog' was used with a TCP proxy (eg: inherited via a
default instance). This patch detects it and falls back to tcplog
after emitting a warning.
Willy Tarreau [Mon, 9 Nov 2009 20:16:53 +0000 (21:16 +0100)]
[BUG] config: fix wrong handling of too large argument count
Holger Just reported that running ACLs with too many args caused
a segfault during config parsing. This is caused by a wrong test
on argument count. In case of too many arguments on a config line,
the last one was not correctly zeroed. This is now done and we
report the error indicating what part had been truncated.
Cyril Bonté [Wed, 14 Oct 2009 22:15:40 +0000 (00:15 +0200)]
[MEDIUM] appsession: add the "request-learn" option
This patch has 2 goals :
1. I wanted to test the appsession feature with a small PHP code,
using PHPSESSID. The problem is that when PHP gets an unknown session
id, it creates a new one with this ID. So, when sending an unknown
session to PHP, persistance is broken : haproxy won't see any new
cookie in the response and will never attach this session to a
specific server.
This also happens when you restart haproxy : the internal hash becomes
empty and all sessions loose their persistance (load balancing the
requests on all backend servers, creating a new session on each one).
For a user, it's like the service is unusable.
The patch modifies the code to make haproxy also learn the persistance
from the client : if no session is sent from the server, then the
session id found in the client part (using the URI or the client cookie)
is used to associated the server that gave the response.
As it's probably not a feature usable in all cases, I added an option
to enable it (by default it's disabled). The syntax of appsession becomes :
appsession <cookie> len <length> timeout <holdtime> [request-learn]
This helps haproxy repair the persistance (with the risk of losing its
session at the next request, as the user will probably not be load
balanced to the same server the first time).
2. This patch also tries to reduce the memory usage.
Here is a little example to explain the current behaviour :
- Take a Tomcat server where /session.jsp is valid.
- Send a request using a cookie with an unknown value AND a path
parameter with another unknown value :
(I know, it's unexpected to have a request like that on a live service)
Here, haproxy finds the URI session ID and stores it in its internal
hash (with no server associated). But it also finds the cookie session
ID and stores it again.
- As a result, session.jsp sends a new session ID also stored in the
internal hash, with a server associated.
=> For 1 request, haproxy has stored 3 entries, with only 1 which will be usable
The patch modifies the behaviour to store only 1 entry (maximum).
Michael Shuler [Wed, 14 Oct 2009 15:23:03 +0000 (10:23 -0500)]
[DOC] trivial fix for man page
I'm working on helping Arnaud update haproxy in Debian, and one of the
package build warnings I received was about "hyphen where a minus sign
was intended" in the man page - details:
http://lintian.debian.org/tags/hyphen-used-as-minus-sign.html
Patch included in my 1.3.20 Debian package is attached.
Willy Tarreau [Wed, 14 Oct 2009 18:43:22 +0000 (20:43 +0200)]
[RELEASE] Released version 1.3.22
Released version 1.3.22 with the following main changes :
- [BUG] unix socket: don't try to dereference frontend/backends
- [MINOR] unix socket: report the socket path in case of bind error
- [CONTRIB] halog: support searching by response time
- [DOC] add a reminder about obsolete documents
Willy Tarreau [Wed, 14 Oct 2009 18:30:15 +0000 (20:30 +0200)]
[DOC] add a reminder about obsolete documents
haproxy-en.txt and haproxy-fr.txt are outdated but people still refer to
them quite often, generally causing a useless waste of time.
(cherry picked from commit a080eca533c860f038e848274a38ad91dc951df4)
Willy Tarreau [Wed, 14 Oct 2009 13:16:48 +0000 (15:16 +0200)]
[MINOR] unix socket: report the socket path in case of bind error
When an error occurs during binding of the stats unix socket, messages
are far from clear for the user !
(cherry picked from commit 5d53634f3634ed377843d37ca5450ffed43ecda8)
Willy Tarreau [Wed, 14 Oct 2009 13:22:58 +0000 (15:22 +0200)]
[BUG] unix socket: don't try to dereference frontend/backends
John Lauro reported a new crash on 1.3.21 due to a dereferencing bug
of the frontend which does not have any frontend. The bug was introduced
by commit a3e0e0767f55474e676fffa3387dab4d022a0675.
Willy Tarreau [Mon, 12 Oct 2009 04:20:09 +0000 (06:20 +0200)]
[RELEASE] Released version 1.3.21
Released version 1.3.21 with the following main changes :
- [DOC] add missing rate_lim and rate_max
- [BUG] check if rise/fall has an argument and it is > 0
- [MINOR] add "description", "node" and show-node"/"show-desc", remove "node-name", v2
- [DOC] Add information about http://haproxy.1wt.eu/contrib.html
- [MINOR] acl: don't report valid acls as potential mistakes
- [BUG] task.c: don't assing last_timer to node-less entries
- [MINOR] export the hostname variable so that all the code can access it
- [MINOR] stats: add a new node-name setting
- [MINOR] acl: add support for hdr_ip to match IP addresses in headers
- [CLEANUP] remove ifdef MSG_NOSIGNAL and define it instead
- [BUG] buffer_forward() would not correctly consider data already scheduled
- [MAJOR] http: add support for HTTP 1xx informational responses
- [BUILD] stream_interface: fix conflicting declaration
- [CLEANUP] include time.h from freq_ctr.h as it uses "now".
- [MINOR] report list of supported pollers with -vv
- [MEDIUM] new option "independant-streams" to stop updating read timeout on writes
- [BUG] don't refresh timeouts late after detected activity
- [MINOR] acl: add fe_conn, be_conn, queue, avg_queue
- [BUILD] add a 'make tags' target (cherry picked from commit ebe0af4b77bca2042565a3f15fc1f597f5862874)
SaVaGe [Tue, 6 Oct 2009 15:53:37 +0000 (18:53 +0300)]
[BUG] task.c: don't assing last_timer to node-less entries
I noticed that in __eb32_insert , if the tree is empty
(root->b[EB_LEFT] == NULL) , the node.bit is not defined.
However in __task_queue there are checks:
- if (last_timer->node.bit < 0)
- if (task->wq.node.bit < last_timer->node.bit)
which might rely upon an undefined value.
This is how I see it:
1. We insert eb32_node in an empty wait queue tree for a task (called by
process_runnable_tasks() ):
Inserting into empty wait queue &task->wq = 0x72a87c8, last_timer
pointer: (nil)
2. Then, we set the last timer to the same address:
Setting last_timer: (nil) to: 0x72a87c8
3. We get a new task to be inserted in the queue (again called by
process_runnable_tasks()) , before the __task_unlink_wq() is called for
the previous task.
4. At this point, we still have last_timer set to 0x72a87c8 , but since
it was inserted in an empty tree, it doesn't have node.bit and the
values above get dereferenced with undefined value.
The bug has no effect right now because the check for equality is still
made, so the next timer will still be queued at the right place anyway,
without any possible side-effect. But it's a pending bug waiting for a
small change somewhere to strike.
These ACLs are used to check the number of active connections on the
frontend, backend or in a backend's queue. The avg_queue returns the
average number of queued connections per server, and for this, divides
the total number of queued connections by the number of alive servers.
The dst_conn ACL has been slightly changed to more reflect its name and
original usage, which is to return the number of connections on the
destination address/port (the socket) and not the whole frontend.
(cherry picked from commit a36af91951539ee7b24afd1dee58216979efeaea)
[DOC] Add information about http://haproxy.1wt.eu/contrib.html
Add information about http://haproxy.1wt.eu/contrib.html in
the CONTRIB file and remove one useless comment.
(cherry picked from commit 6d45fcd7198300b1744c04398a49724aff729b75)
Willy Tarreau [Sun, 4 Oct 2009 08:56:08 +0000 (10:56 +0200)]
[BUG] don't refresh timeouts late after detected activity
In old versions, before 1.3.16, we had to refresh the timeouts after
each call to process_session() because the stream socket handler did
not do it. Now that the sockets can exchange data for a long period
without calling process_session(), we can detect an old activity and
refresh a timeout long after the last activity, causing too late a
detection of some timeouts.
The fix simply consists in not checking for activity anymore in
stream_sock_data_finish() but only set a timeout if it was not
previously set.
(cherry picked from commit fe8903cc76184ef20109d9ec9729a88368b2ccd7)
Willy Tarreau [Sat, 3 Oct 2009 20:01:18 +0000 (22:01 +0200)]
[MEDIUM] new option "independant-streams" to stop updating read timeout on writes
By default, when data is sent over a socket, both the write timeout and the
read timeout for that socket are refreshed, because we consider that there is
activity on that socket, and we have no other means of guessing if we should
receive data or not.
While this default behaviour is desirable for almost all applications, there
exists a situation where it is desirable to disable it, and only refresh the
read timeout if there are incoming data. This happens on sessions with large
timeouts and low amounts of exchanged data such as telnet session. If the
server suddenly disappears, the output data accumulates in the system's
socket buffers, both timeouts are correctly refreshed, and there is no way
to know the server does not receive them, so we don't timeout. However, when
the underlying protocol always echoes sent data, it would be enough by itself
to detect the issue using the read timeout. Note that this problem does not
happen with more verbose protocols because data won't accumulate long in the
socket buffers.
When this option is set on the frontend, it will disable read timeout updates
on data sent to the client. There probably is little use of this case. When
the option is set on the backend, it will disable read timeout updates on
data sent to the server. Doing so will typically break large HTTP posts from
slow lines, so use it with caution.
Willy Tarreau [Sat, 3 Oct 2009 16:57:08 +0000 (18:57 +0200)]
[MINOR] report list of supported pollers with -vv
During troubleshooting, it's often useful to get the list of supported
pollers but until now it was required to have a working configuration
first. Since the pollers are known before main() is called, let's list
them with the build options.
[MINOR] add "description", "node" and show-node"/"show-desc", remove "node-name", v2
This patch implements "description" (proxy and global) and "node" (global)
options, removes "node-name" and adds "show-node" & "show-desc" options
for "stats". It also changes the way the header lines (with proxy name) and
the statistics are displayed, so stats no longer look so clumsy with very
long names.
Instead of "node-name" it is possible to use show-node/show-desc with
an optional parameter that overrides a default node/description.
backend cust-0045
# report specific values for this customer
stats show-node Europe
stats show-desc Master node for Europe, Asia, Africa
[BUG] check if rise/fall has an argument and it is > 0
Check if rise/fall has an argument and it is > 0 or bad things may happen
in the health checks. ;)
Now it is verified and the code no longer allows for such condition:
backend bad
(...)
server o-f0 192.168.129.27:80 check inter 4000 source 0.0.0.0 rise 0
server o-r0 192.168.129.27:80 check inter 4000 source 0.0.0.0 fall 0
server o-f1 192.168.129.27:80 check inter 4000 source 0.0.0.0 rise
server o-r1 192.168.129.27:80 check inter 4000 source 0.0.0.0 fall
[ALERT] 269/161830 (24136) : parsing [../git/haproxy.cfg:98]: 'rise' has to be > 0.
[ALERT] 269/161830 (24136) : parsing [../git/haproxy.cfg:99]: 'fall' has to be > 0.
[ALERT] 269/161830 (24136) : parsing [../git/haproxy.cfg:100]: 'rise' expects an integer argument.
[ALERT] 269/161830 (24136) : parsing [../git/haproxy.cfg:101]: 'fall' expects an integer argument.
[MAJOR] http: add support for HTTP 1xx informational responses
HTTP supports status codes 100 and 101 to report protocol indications,
which are followed by the requests's response. Till now, haproxy would
only see those responses without parsing subsequent ones. That means
that cookie additions were only performed on 1xx messages for instance,
which does not work since headers must be ignored with 1xx messages.
Also, logs were not terribly useful with the common 100 status code
in response to "Expect: 100-continue" during POST some requests.
This change adds support for such messages. Now haproxy sees them,
forwards them and skips them until it finds a correct response, which
it logs and processes. As an exception, header removal/rewriting still
work on 1xx responses in order to be able to strip out sensible
information that may have accidentely been left by another equipment
(possibly an older haproxy itself). But headers addition are disabled
however.
This change brings the ability to loop on response without data, which
is a starting point to support keepalive. The change is marked as major
as a few fixes had to be performed in the HTTP message parser.
Note: this change is sensible for version 1.3 but it appears correct
and has extensively been tested. Also it fixes a real misbehaviour.
[BUG] buffer_forward() would not correctly consider data already scheduled
The computations in buffer_forward() were only valid if buffer_forward()
was used on a buffer which had no more data scheduled for forwarding.
This is always the case right now so this bug is not yet triggered but
it will soon be. Now we correctly discount the bytes to be forwarded
from the data already present in the buffer.
(cherry picked from commit 91aa577b1f0483c50bfa72b3198cf1a1ff835163)
Willy Tarreau [Sun, 16 Aug 2009 08:29:18 +0000 (10:29 +0200)]
[MINOR] stats: add a new node-name setting
The new "node-name" stats setting enables reporting of a node ID on
the stats page. It is possible to return the system's host name as
well as a specific name.
(cherry picked from commit 1d45b7cbaeeed0426165521fbd9dab4be55b14a4)
Willy Tarreau [Sun, 9 Aug 2009 20:53:49 +0000 (22:53 +0200)]
[RELEASE] Released version 1.3.20
Released version 1.3.20 with the following main changes :
- [BUG] task: fix possible crash when some timeouts are not configured
- [BUG] log: option tcplog would log to global if no logger was defined
Willy Tarreau [Sun, 9 Aug 2009 08:11:45 +0000 (10:11 +0200)]
[BUG] log: option tcplog would log to global if no logger was defined
Romuald du Song reported a strange bug causing "option tcplog" to
unexpectedly use global log parameters if no log server was declared.
Eventhough it can be useful in some circumstances, it only hides
configuration bugs and can even cause traffic logs to be sent to
the wrong logger, since global settings are just for the process.
This has been fixed and a warning has been added for configurations
where tcplog or httplog are set without any logger. This fix must
be backported to 1.3.20, but not to 1.3.15.X in order not to risk
any regression on old configurations.
(cherry picked from commit e7ded1f8698ab954656106baae9cc8da00567511)
Willy Tarreau [Sun, 9 Aug 2009 07:09:54 +0000 (09:09 +0200)]
[BUG] task: fix possible crash when some timeouts are not configured
Cristian Ditoiu reported a major regression when testing 1.3.19 at
transfer.ro. It would crash within a few minutes while 1.3.15.10
was OK. He offered to help so we could run gdb and debug the crash
live. We finally found that the crash was the result of a regression
introduced by recent fix 814c978fb67782ceeaf1db74abfe7083938bedff
(task: fix possible timer drift after update) which makes it possible
for a tree walk to start from a detached task if this task has got
its timeout disabled due to a missing timeout.
The trivial fix below has been extensively tested and confirmed not
to crash anymore.
Special thanks to Cristian who spontaneously provided a lot of help
and trust to debug this issue which at first glance looked impossible
after reading the code and traces, but took less than an hour to spot
and fix when caught live in gdb ! That's really appreciated !
(cherry picked from commit 34e98ea70d09195e8e047b3ea22c23506598687d)
Released version 1.3.19 with the following main changes :
- [MINOR] startup: don't imply -q with -D
- [BUG] ensure that we correctly re-start old process in case of error
- [MEDIUM] add support for binding to source port ranges during connect
- [MEDIUM] support setting a server weight to zero
- [MINOR] make DEFAULT_MAXCONN user-configurable at build time
- [MEDIUM] config: split parser and checker in two functions
- [MEDIUM] config: support loading multiple configuration files
- [BUG] http: redirect rules were processed too early
- [CLEANUP] remove unused DEBUG_PARSE_NO_SPEEDUP define
- [BUG] default ACLs did not properly set the ->requires flag
- [BUILD] report commit date and not author's date as build date
- [BUG] stream_sock: always shutdown(SHUT_WR) before closing
- [BUG] stream_sock: don't stop reading when the poller reports an error
- [BUG] config: tcp-request content only accepts "if" or "unless"
- [BUG] task: fix possible timer drift after update
- [MINOR] stats: better displaying in MSIE
- [MINOR] config: improve error reporting in global section
- [MINOR] config: improve error reporting in listen sections
- [MINOR] config: the "capture" keyword is not allowed in backends
- [MINOR] config: improve error reporting when checking configuration
- [BUILD] fix a minor build warning on AIX
- [BUILD] use "git cmd" instead of "git-cmd"
- [CLEANUP] report 2009 not 2008 in the copyright banner.
- [MINOR] print usage on the stats sockets upon invalid commands
- [MINOR] acl: detect and report potential mistakes in ACLs
- [BUILD] fix incorrect printf arg count with tcp_splice
- [BUG] fix random pauses on last segment of a series
- [BUILD] add support for build under Cygwin
[BUG] fix random pauses on last segment of a series
During a direct data transfer from the server to the client, if the
system did not have enough buffers anymore, haproxy would not enable
write polling again if it could write at least one data chunk. Under
normal conditions, this would remain undetected because the remaining
data would be pushed by next data chunks.
However, when this happens on the last chunk of a session, or the last
in a series in an interactive bidirectional TCP transfer, haproxy would
only start sending again when the read timeout was reached on the side
it stopped writing, causing long pauses on some protocols such as SQL.
This bug was reported by an Exceliance customer who generously offered
to help us by sending large amounts of traces and running various tests
on production systems.
It is quite hard to trigger it but it becomes easier with a ping-pong
TCP service which transfers random data sizes, with a modified version
of send() able to send packets smaller than the average transfer size.
A cleaner fix would imply only updating the write timeout when data
transfers are *attempted*, not succeeded, but that requires more
sensible code changes without fixing the result. It is a candidate
for a later patch though.
(cherry picked from commit c54aef3180c00abc5abe33136f4cfbaa1328bdb1)
[MINOR] acl: detect and report potential mistakes in ACLs
I've discovered a configuration with lots of occurrences of the
following :
acl xxx hdr_beg (host) xxx
The problem is that hdr_beg will match every header against patterns
(host) and xxx due to the space between both, which certainly is not
what the user wanted. Now we detect such ACLs and report a warning
with a suggestion to add "--" between "hdr_beg" and "(host)" if this
is definitely what is wanted.
(cherry picked from commit 404e8ab4615d564a74f92a0d3822b0292dd6224f)
[MINOR] print usage on the stats sockets upon invalid commands
When issuing commands on the unix socket, there's no way to
know if the result is empty or if the command is wrong. This
patch makes invalid command return a help message.
(cherry picked from commit 43e0e3997870e39b78b87d3050f260571e5a7de9)
Newer GIT versions do not support "git-cmd" anymore, so date and version
can be wrong during development builds. Use "git cmd" now. Also fix
git-tar to use "git archive" instead of "git-tar-tree".
(cherry picked from commit 63076b3f611803a9bf9e6012193f89d14ccc32d2)
[MINOR] config: improve error reporting when checking configuration
Do not exit early at the first error found while checking configuration
validity. This particularly helps spotting multiple wrong tracked server
names at once.
(cherry picked from commit bb9250104fdf1096b708181c82172df737341a95)
[MINOR] config: improve error reporting in listen sections
Try not to immediately exit on non-fatal errors while parsing a
listen section, so that the user has a chance to get most of the
errors at once, which is quite convenient especially during config
checks with the -c argument.
(cherry picked from commit 9389379f608d0776988125e52cbac12a2a2ad165)
[MINOR] config: improve error reporting in global section
Try not to immediately exit on non-fatal errors while parsing the
global section, so that the user has a chance to get most of the
errors at once, which is quite convenient especially during config
checks with the -c argument. Some other errors such as unresolved
server names also don't make the parser exit too early.
(cherry picked from commit 058e9074865abd23912127d8847cccb6b5f4eb8b)
MSIE does not correctly display spaced digits. It requires a margin of
at least one pixel. Also, it does not correctly hide empty cells, so we
work around this by setting the background white. Last, the H1 font was
too large, so we reduce it by one size, which is still OK in other
browsers.
(cherry picked from commit da6721ba28d4c917e07ffa7fef1d8707c80c54b6)
When the scheduler detected that a task was misplaced in the timer
queue, it used to place it right again. Unfortunately, it did not
check whether it would still call the new task from its new place.
This resulted in some tasks not getting called on timeout once in
a while, causing a minor drift for repetitive timers. This effect
was only observable with slow health checks and without any activity
because no other task would cause the scheduler to be immediately
called again.
In practice, it does not affect any real-world configuration, but
it's still better to fix it.
(cherry picked from commit 814c978fb67782ceeaf1db74abfe7083938bedff)
[BUG] config: tcp-request content only accepts "if" or "unless"
As reported by Maik Broemme, if something different from "if" or
"unless" was specified after "tcp-request content accept", the
condition would silently remain void. The parser must obviously
complain since this typically corresponds to a forgotten "if".
(cherry picked from commit 606ad73e73600275aae944f00bda4af9976c0be8)
[BUG] stream_sock: don't stop reading when the poller reports an error
As reported by Jean-Baptiste Quenot and Robbie Aelter, sometimes a
backend server error is converted to a 502 error if the backend stops
before reading all the request. The reason is that the remote system
sends a TCP RST packet because there are still unread data pending in
the socket buffer. This RST is translated as a socket error on the
local system, and this error is reported by the poller.
However, most of the time, it's a write error, but the system is
still able to read the remaining pending data, such as in the trace
below :
The recv succeeded while epoll_wait() reported an error.
Note: This case is very hard to reproduce and requires that the backend
server is reached via the loopback in order to minimise latency and
reduce the risk of sent data being ACKed.
(cherry picked from commit 7154365cc60b124b543db4e98faedc75c0f3a2cb)
[BUG] stream_sock: always shutdown(SHUT_WR) before closing
When we close a socket with unread data in the buffer, or when the
nolinger option is set, we regularly lose the last fragment, which
often contains the error message. This typically occurs when sending
too large a request. Only the RST is seen due to the close() (since
not all data were read) and the output message never reaches the
network.
Doing a shutdown() before the close() solves this annoying issue
because the data are really pushed before the system sends the RST.
(cherry picked from commit 720058cdcbd5285dc4e4a48216b10c9b96000141)
[BUILD] report commit date and not author's date as build date
By default, when building from a git tree, haproxy's release date is
set to the last commit's date. But it was the wrong date which was
used, the initial patch's date, which can cause time jumps in the
past when an old patch gets merged. What we want is the commit date,
which reflects the correct code history.
(cherry picked from commit 446024e7fb5faef86cd6e2c0aba3c4524ad77705)
[BUG] http: redirect rules were processed too early
redirect rules are documented as being processed last before
use_backend but were mistakenly processed before block rules.
Fortunately very few people use a mix of block and redirect
rules, so this bug has never been reported yet.
(cherry picked from commit 06b917c7abcd7313263d551eaecda1b31b9c03b1)
Willy Tarreau [Mon, 22 Jun 2009 14:02:30 +0000 (16:02 +0200)]
[MEDIUM] config: support loading multiple configuration files
We now support up to 10 distinct configuration files. They are
all loaded in the order defined by -f <file1> -f <file2> ...
This can be useful in order to store global, private, public,
etc... configurations in distinct files.
(cherry picked from commit 5d01a63b7862235fdd3119cb29d5a0cfd04edb91)
Willy Tarreau [Mon, 22 Jun 2009 13:48:36 +0000 (15:48 +0200)]
[MEDIUM] config: split parser and checker in two functions
This is a first step towards support of multiple configuration files.
Now readcfgfile() only reads a file in memory and performs very minimal
parsing. The checks are performed afterwards.
(cherry picked from commit 915e1ebe63b2137fa1634ebc9553f5b73ae2fd75)
Willy Tarreau [Mon, 15 Jun 2009 14:33:36 +0000 (16:33 +0200)]
[MINOR] make DEFAULT_MAXCONN user-configurable at build time
The only way to set this previously was to set SYSTEM_MAXCONN
which serves a different purpose.
(cherry picked from commit c9fe4562c24ebacdfdf55631636e5a5a0395e43c)
Willy Tarreau [Mon, 15 Jun 2009 08:56:05 +0000 (10:56 +0200)]
[MEDIUM] support setting a server weight to zero
Sometimes it is useful to be able to set a server's weight to zero.
It allows the server to receive only persistent traffic but never
normal traffic.
(cherry picked from commit 6704d67d656574a602ddf81a603cdb4f482f90a9)
Yitzhak Sapir [Sun, 14 Jun 2009 16:27:54 +0000 (18:27 +0200)]
[BUILD] add support for build under Cygwin
After considering various possibilities, we compiled haproxy under cygwin.
Attached is an updated full diff that also has the TARGET=cygwin documented.
The whole thing compiles and installs with this diff only.
In cygwin 1.7 (now in beta), there is apparently support for ipv6. Cygwin
1.5 (later versions, anyway) already includes some support in the form of a
define USE_IPV6. When defined, it declares the sockaddr_in6 struct and
possibly other things. The above definition AF_INET6=23 is taken from
their /usr/include/socket.h file (where it is #if 0'd out).
We are running into a socket limit. It appears that Cygwin (running on
Windows 2003 Server) will only allow us to set ulimit -n (maximum open
files) to 3200, which means we're a little short of 1600 connections.
The limit of 3200 is an internal Cygwin limit. Perhaps they can raise it in
the future. Using the nbproc option, I was able to bring up 10 servers. It
seems to me that they were able to handle over 2000 connections (even though
each had maxconn 1500 set, and the hard Cygwin fd limit).
(cherry picked from commit 32087312e3a4ad483440d371b0b1769db23946d3)
Willy Tarreau [Wed, 10 Jun 2009 09:09:37 +0000 (11:09 +0200)]
[MEDIUM] add support for binding to source port ranges during connect
Some users are already hitting the 64k source port limit when
connecting to servers. The system usually maintains a list of
unused source ports, regardless of the source IP they're bound
to. So in order to go beyond the 64k concurrent connections, we
have to manage the source ip:port lists ourselves.
The solution consists in assigning a source port range to each
server and use a free port in that range when connecting to that
server, either for a proxied connection or for a health check.
The port must then be put back into the server's range when the
connection is closed.
This mechanism is used only when a port range is specified on
a server. It makes it possible to reach 64k connections per
server, possibly all from the same IP address. Right now it
should be more than enough even for huge deployments.
(cherry picked from commit c6f4ce8fc4a9da9f4c31e8d088fab1ed4f631ed0)
Willy Tarreau [Tue, 9 Jun 2009 12:36:00 +0000 (14:36 +0200)]
[BUG] ensure that we correctly re-start old process in case of error
When a new process fails to grab some ports, it sends a signal to
the old process in order to release them. Then it tries to bind
again. If it still fails (eg: one of the ports is bound to a
completely different process), it must send the continue signal
to the old process so that this one re-binds to the ports. This
is correctly done, but the newly bound ports are not released
first, which sometimes causes the old process to remain running
with no port bound. The fix simply consists in unbinding all
ports before sending the signal to the old process.
(cherry picked from commit f68da4603a092f35af627c459dbc714d9fa796e9)
Willy Tarreau [Mon, 18 May 2009 14:29:51 +0000 (16:29 +0200)]
[MINOR] startup: don't imply -q with -D
It is recommended to have -D in init scripts, but -D also implies
quiet mode, which hides warning messages, and both options are now
completely unrelated. Remove the implication to get warnings with
-D.
Willy Tarreau [Sun, 10 May 2009 18:27:47 +0000 (20:27 +0200)]
[RELEASE] Released version 1.3.18
Released version 1.3.18 with the following main changes :
- [MEDIUM] add support for "balance hdr(name)"
- [CLEANUP] give a little bit more information in error message
- [MINOR] add X-Original-To: header
- [BUG] x-original-to: fix missing initialization to default value
- [BUILD] spec file: fix broken pipe during rpmbuild and add man file
- [MINOR] improve reporting of misplaced acl/reqxxx rules
- [MEDIUM] http: add options to ignore invalid header names
- [MEDIUM] http: capture invalid requests/responses even if accepted
- [BUILD] add format(printf) to printf-like functions
- [MINOR] fix several printf formats and missing arguments
- [BUG] stats: total and lbtot are unsigned
- [MINOR] fix a few remaining printf-like formats on 64-bit platforms
- [CLEANUP] remove unused make option from haproxy.spec
- [BUILD] make it possible to pass alternative arch at build time
- [MINOR] switch all stat counters to 64-bit
- [MEDIUM] ensure we don't recursively call pool_gc2()
- [CRITICAL] uninitialized response field can sometimes cause crashes
- [BUG] fix wrong pointer arithmetics in HTTP message captures
- [MINOR] rhel init script : support the reload operation
- [MINOR] add basic signal handling functions
- [BUILD] add signal.o to all makefiles
- [MEDIUM] call signal_process_queue from run_poll_loop
- [MEDIUM] pollers: don't wait if a signal is pending
- [MEDIUM] convert all signals to asynchronous signals
- [BUG] O(1) pollers should check their FD before closing it
- [MINOR] don't close stdio fds twice
- [MINOR] add options dontlog-normal and log-separate-errors
- [DOC] minor fixes and rearrangements
- [BUG] fix parser crash on unconditional tcp content rules
- [DOC] rearrange the configuration manual and add a summary
- [MINOR] standard: provide a new 'my_strndup' function
- [MINOR] implement per-logger log level limitation
- [MINOR] compute the max of sessions/s on fe/be/srv
- [MINOR] stats: report max sessions/s and limit in CSV export
- [MINOR] stats: report max sessions/s and limit in HTML stats
- [MINOR] stats/html: use the arial font before helvetica
Willy Tarreau [Sun, 10 May 2009 18:08:10 +0000 (20:08 +0200)]
[MINOR] stats/html: use the arial font before helvetica
The stats HTML output were barely readable on some browsers such as
firefox on Linux, due to the selected helvetica font which is too
small. Specifying "arial" first fixes the issue without changing the
table size. Also, the default size of 0.8em choosen to get 10px out
of 12px is wrong because it gets 9px when rounded down.
Willy Tarreau [Sun, 10 May 2009 16:52:49 +0000 (18:52 +0200)]
[MINOR] compute the max of sessions/s on fe/be/srv
Some users want to keep the max sessions/s seen on servers, frontends
and backends for capacity planning. It's easy to grab it while the
session count is updated, so let's keep it.